Schedule

Poster Session #1 and Registration. Coffee will be available.

  1. SMRS 2019 overview
  2. cellF is the world’s first neuron-driven synthesiser. It is a collaborative project at the cutting edge of experimental art and music that brings together artists, musicians, designers and scientists to create a cybernetic musical entity. cellF is an autonomous, bio-analogue electronic musical instrument designed to operate independently and interact with human musicians. It posits a future where the musician and musical instrument are one, a scenario in which personalised ‘bio-instruments’ that contain unique signatures become possible.
    The instrument is controlled by a bioengineered neural network or ‘brain’ derived from skin cells using induced pluripotent stem cell (iPSC) technology, housed in a custom-built synthesiser ‘body’. Artist Guy Ben-Ary envisioned the project to realise a juvenile dream of becoming a rock star by creating a bio-analogue alter-ego to perform with other musicians. This fantasy has played out on the world stage, with collaborations with leading improvisers such as Han Bennink, Jaap Blonk, Okkyung Lee and Chris Abrahams, amongst others, at international music festivals and gallery spaces such as Ars Electronica (Austria), CTM Festival (Germany), Mona Foma (Australia) and Kapelica (Slovenia).
    Darren Moore, musical director for the cellF project, will introduce the project in his presentation, highlighting its background, biotechnology, design and performance capabilities.

    Darren Moore is a Lecturer in Music at Lasalle and a drummer and electronic musician working in the fields of jazz, experimental music and multimedia throughout South East Asia, Australia, Japan and Europe. He completed his Doctorate of Musical Arts at Griffith University in 2013, which looked at the adaptation of Carnatic Indian rhythms to the drum set. His research interests centre on his improvisatory practice on drum set and modular synthesiser. Darren is the musical director for the cellF project, a multi-disciplinary bio-art work bringing together artists, scientists, musicians and electrical engineers to produce a neural-driven analogue synthesiser for real-time performance and collaboration.
  3. This talk outlines an approach to creating improvised music which features the interplay between a human performer and elements generated algorithmically. Central to this approach is the creation of a feedback system that allows the human to mediate and shape the generated material, and thereby the overall performance. A high-level model of the music improvisation process is presented, and several systems will be shown that demonstrate some of the ways this interactive composition model can be realised.

    Michael Spicer has a PhD in music and an MSc in Computer Science, and is constantly looking for ways to combine these two areas. He has been performing professionally as a keyboard/synthesizer/flute player since the late 1970s. He was a member of the popular Australian folk/rock group “Redgum” in the 1980s. He is currently teaching at Singapore Polytechnic and performing in Singapore with the improvisation group “Sonic Escapade”.
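
    As a rough illustration of the feedback idea described above (and not the presenter's actual system), the sketch below lets a feature extracted from the human performer's audio steer a parameter of a virtual performer, whose output the human in turn hears and responds to.

    ```python
    # A minimal sketch of the feedback loop: a feature from the human performer's
    # audio steers a parameter of an algorithmic "ensemble member".
    import random

    class VirtualPerformer:
        def __init__(self, base_density=0.5):
            self.density = base_density  # probability of playing on a given beat

        def react(self, human_loudness):
            # Louder human playing nudges the agent toward denser output.
            self.density = 0.8 * self.density + 0.2 * human_loudness

        def step(self):
            # Return a MIDI note number, or None for silence, on this beat.
            return random.choice(range(60, 72)) if random.random() < self.density else None

    # Simulated loop: in a live system, human_loudness would come from real-time
    # audio analysis (e.g. RMS of the input signal) and the notes would be synthesised.
    agent = VirtualPerformer()
    for human_loudness in [0.2, 0.4, 0.9, 0.7, 0.1]:
        agent.react(human_loudness)
        print(round(agent.density, 2), agent.step())
    ```
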
  4. This presentation examines the communication strategies between sound and visual artists who interact and engage in a semi-structured sound-visual improvisatory performance (SVimprovisation). An empirical study was conducted in which all ensemble rehearsals and performances were observed and videotaped. A member-check procedure and a focus group discussion were employed after each rehearsal and performance in order to validate and understand the performers’ actions and thoughts as the improvisation unfurls. The findings focus on how the performers coordinate with the given content (the minimal structures of the composition) and on the process of collaboration as the interactions unfurl. Results indicated three modes of coordination (a process of detection and reaction, body language and/or eye contact, and no clear live communication when nearing the upcoming section or movement), together with three modes of collaboration (no instant interaction, unidirectional interaction and bi-directional interaction). At the same time, five common features in sound and visual presentation were identified which served as the common language between the performers from different modalities: the articulations, speed, direction, texture and density of the presentation. By compiling the findings, the results provide an initial framework for how performers from different modalities develop an acute moment-by-moment sense of coordination and communicate with each other as the performance unfolds.

    Chow Jun Yan (muscjy@nus.edu.sg) received his Ph.D. degree in 2018 from the National University of Singapore under Professor Lonce Wyse, with the thesis title Communication Between Sound and Visual Artists in Improvisatory Performance. He is currently working as a teaching assistant at the Yong Siew Toh Conservatory of Music, National University of Singapore.
  5. The use of music in stroke rehabilitation therapy has been shown to improve motor function, and studies suggest that robot-assisted rehabilitation therapy can lead to sustained improvements in motor function. In this study, private robot therapy patients enrolled in a 6-week program completed one therapy session while listening to self-selected music via noise-reducing headphones. The robot performance data from this session were compared to performance data from a therapy session where they wore noise-reducing headphones but did not receive music. Additionally, these data were compared to those of a control group who never listened to music. The participants (non-control group) could also choose to continue listening to music for the remainder of their sessions. The second part of the study compares 6 patients who listened to music for the entire 6-week program against 7 patients (with similar Fugl-Meyer scores) who did not. The results of this study will be presented.

    Australian composer/sound designer Ross Williams has written music and designed sound across a range of styles for theatre, feature film, concert hall, dance, museum installation and interactive media. Since studying composition in Australia and the United States, his works have been performed internationally. His works for award-winning abstract, documentary and narrative films have been shown in festivals around the world. He holds a BMus from the University of Western Australia and a Masters and Doctorate in Musical Arts from Rice University, Texas.

    He is the Assistant Professor of Sound Design for Film and Animation at the School of Art, Design and Media at Nanyang Technological University, Singapore. His main focus is on the sound design and score of documentary, fiction and abstract/experimental film and animation.

    His research interests include implementation of audio stimuli to improve effectiveness of robotic motor training, urban soundscapes as cultural heritage, immersive sound and memory, and the aesthetics of sound design for narrative and experimental film. Recent works for animation and experimental film have explored the interplay between literal and abstract sound design where sound and image relationships are constantly redefined. Recent collaborative research projects have investigated the representation of cultural heritage in VR.

A buffet lunch will be served in the lobby. Posters will be hosted from 1:00pm until 1:45pm.

  1. Music acoustics is the study of the physical elements involved in musical performance: both the instrument (its design, material and construction, and its subsequent acoustic behaviour) and how performers interact with the instrument to create a musical performance. In this short sharing, I will introduce the audio/acoustics research group at SUTD and share some aspects of the work undertaken to date, as well as aspirations for the future, with invitations for collaboration.

    Jer-Ming Chen, SUTD
  2. I will briefly share one of my results obtained by intracellular recordings from single primary auditory cortex neurons in the living rat, that tones evoke synaptic excitation followed by synaptic inhibition. I will then describe how excitation followed by inhibition or suppression may be relevant for some aspects of elementary stream segregation, as hypothesized by Fishman and Steinschneider, and by Christophe Micheyl and colleagues. Finally, I will explain using the example of cochlear implants, why the study of plasticity is a major theme in auditory neuroscience.

    Andrew Tan is an Assistant Professor in the Department of Physiology and the Neurobiology Programme at the National University of Singapore. He studies the auditory cortex, being interested in the synaptic organization that influences how we perceive sound arriving at the ear, and the plasticity of that organization during learning. Andrew received his primary and secondary education at the Anglo-Chinese schools in Singapore, studied biology and physics at the Massachusetts Institute of Technology, undertook doctoral work in neuroscience at the University of California, San Francisco, and did postdoctoral research at the University of Oregon and The University of Texas at Austin. He helped develop the state of the art in intracellular measurement of single neuron activity in living animals. He co-published the first complete profiles of the tone-evoked excitation and inhibition received by single auditory cortex neurons, and co-initiated the first successful whole cell study of the behaving monkey.
  3. Music can be viewed from two main perspectives. From the physical side, it follows the same definition as any other sound – a propagating disturbance of pressure. On a perceptual level, those disturbances are decoded within the auditory system, resulting in an individual experience of sound affected by our personality, interests, mood, listening habits, and many other psychophysical aspects.
    Hearing aids are a commonly used solution for people suffering from hearing loss. While they are primarily optimized to maximize patients’ speech understanding, there is a range of technical parameters and perceptual factors that need to be examined separately for music perception. Those considerations, together with examples of the solutions on offer, will be discussed in the talk.

    Sonia Stasiak (Manager Audiology, Research and Development, Sivantos Pte Ltd)
    Sonia Stasiak is head of the Applied Audiological Research SG team in the Research and Development department of Sivantos Pte Ltd, one of the world’s leading manufacturers of hearing aids, where she and her team are directly involved in the development, optimization and verification of products. She graduated with a Master of Science degree in Acoustics from Adam Mickiewicz University in Poznan, Poland. On top of her core work on hearing aids, her scientific interests include psychoacoustics as well as improvements in speech understanding and music perception for hearing-impaired users of hearing aids. She is also a concert flute and piccolo player with almost 7 years’ experience as a city orchestra musician in her homeland. She is a member of SAPS (Society of Audiology Professionals Singapore). Originally from Poland, she has been working in Singapore for over 7 years.
  1. Musiio makes music discoverable with an Artificial Intelligence (AI) that can ‘listen’ to millions of tracks at once. It is able to recognise thousands of features from every single audio track and can categorise music with an accuracy of greater than 90%. Highly accurate tagging, precise musical feature-based search and personalised auto-playlisting at scale are now all possible with Musiio’s suite of AI products.

    Hazel Savage, Mack Hampson
  2. BandLab is an easy-to-use, all-in-one social music creation platform.

    Gerry Beauregard, Taemin Cho
  3. Maia is a start-up company based in New Zealand and Singapore, augmenting human capabilities with technology.

    Yvonne Chua, Shamane Siriwardhana
  1. Speech-to-singing (STS) conversion is the task of converting the read lyrics of a song, spoken in a natural manner, to proper singing. It is an enabling technology for many innovative services and applications, such as beautifying the singing renditions of amateur singers, automatically generating reference singing for vocal learners, personalising singing synthesis systems, etc. The most important aspect of the task is to change the prosody of the natural speech to match that of proper singing, while retaining the linguistic content and the speaker’s identity. STS conversion is a challenging task because speaking and singing differ in many ways. We need to find a temporal alignment between speech and singing signals in order to convert and combine parameters of speech with those of singing signals. This is particularly difficult because speech and singing exhibit different characteristics. STS conversion is currently implemented in two ways, using model-based and template-based techniques. These two techniques are very similar to each other, except in the way in which the reference prosody characteristics are set up for conversion. The quality of the output of current STS systems is limited by the accuracy of temporal alignment, spectral conversion, and analysis-synthesis by a vocoder. In this talk, we will discuss the major challenges in implementing an STS conversion system, the prominent strategies for STS conversion and the status of current research in the area, and indicate potential future directions.

    Karthika Vijayan (vijayan.karthika@nus.edu.sg) received her Ph.D. degree from the Indian Institute of Technology (IIT) Hyderabad, in 2016. She has been a research fellow at the Department of Electrical and Computer Engineering, National University of Singapore, since 2017. Her research interests include speech and singing signal processing and characterization. She is a member of the IEEE, the International Speech Communication Association (ISCA) and the Asia Pacific Signal and Information Processing Association (APSIPA). She has received several awards, including Research Excellence awards from IIT Hyderabad in two consecutive years (2014 and 2015) and the Springer book prize at the 2017 APSIPA Annual Summit and Conference.
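
    As a hedged sketch of the temporal-alignment step described in the abstract above, the snippet below uses dynamic time warping over MFCC features with librosa; the file names and parameters are placeholders, and a real STS system would go on to transfer the template's F0 contour and durations onto the aligned speech before resynthesis.

    ```python
    # Alignment step only, assuming librosa is available and 'speech.wav' /
    # 'singing.wav' are hypothetical recordings of the same lyrics.
    import librosa

    speech, sr = librosa.load("speech.wav", sr=16000)
    singing, _ = librosa.load("singing.wav", sr=16000)

    # Frame-level features for both signals.
    mfcc_speech = librosa.feature.mfcc(y=speech, sr=sr, n_mfcc=13)
    mfcc_singing = librosa.feature.mfcc(y=singing, sr=sr, n_mfcc=13)

    # Dynamic time warping gives a frame-to-frame correspondence (warping path)
    # between the spoken and sung renditions.
    D, wp = librosa.sequence.dtw(X=mfcc_speech, Y=mfcc_singing, metric="euclidean")
    print(wp[::-1][:10])  # first few (speech frame, singing frame) index pairs
    ```
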
  2. In this talk, we'll explore some of the intricacies of detecting and classifying a wide range of animal sounds in acoustic recordings. Some of the key defining features of the problem are a strong presence of disturbance signals emanating from the environment, as well as a large, potentially unbounded number of classes of interest.

    Alf Fredrik Christian Bagge Ka received his MSc in EE in 2014 and his PhD in robotics and machine learning in 2019 from Lund University, Sweden. His research interests include the intersection between machine learning and system identification, as well as software for scientific computing.
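
    One simple way to cope with an unbounded set of classes, sketched below purely as an illustration (not the speaker's method), is to reject low-confidence predictions so that unknown species and environmental disturbance are not forced into one of the known classes.

    ```python
    # Open-set handling by thresholding classifier confidence.
    import numpy as np

    def classify_with_reject(scores, labels, threshold=0.6):
        """scores: unnormalised class scores from any detector/classifier."""
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()
        k = int(np.argmax(probs))
        return labels[k] if probs[k] >= threshold else "unknown/noise"

    labels = ["koel", "hornbill", "cicada"]
    print(classify_with_reject(np.array([2.0, 0.1, -1.0]), labels))  # confident -> "koel"
    print(classify_with_reject(np.array([0.3, 0.2, 0.1]), labels))   # ambiguous -> "unknown/noise"
    ```
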
  3. Among other possibilities, this synth allows timbres to be created in which the set of overtones is close to the harmonic series but deviates from it in specific ways, for example in which the frequencies of the overtones are stretched or compressed relative to a true harmonic series. I'll demonstrate how such stretched timbres have a dramatic effect on our perception of what it means for note pitches to be "in tune" - for example how, in order to sound in tune, octaves between notes in melodic and harmonic sequences may have to deviate by up to half a semitone from the normal 2:1 frequency ratio. More radical stretches of harmonics and note pitches hint at new musical possibilities, even new harmonic languages.
    I'll also demonstrate "equally-tempered timbres" comprising near-harmonic overtone series in which each harmonic is forced to the nearest equal-temperament frequency, and how such straitjacketing to equal temperament affects the tonal qualities of isolated notes and multi-voice harmony.

    Pete Kellock has a diverse background combining music, technology and entrepreneurship. He holds degrees in physics / maths and in music, plus a PhD in Electronic Music. His professional experience ranges from writing software and designing electronic hardware to founding technology companies and mentoring young entrepreneurs. In the mid 1980s he founded a company called Zyklus Ltd to create the Midi Performance System, a revolutionary interactive music sequencer, still used by a handful of musicians in Europe. In 2001 he founded and led muvee Technologies, a Singapore startup which pioneered automatic video editing software, to date shipping hundreds of millions of copies worldwide. His musical experience includes freelancing as a horn player in classical orchestras, playing & producing rock music, and writing electronic music in his home studio. In his search for new sounds and forms of musical expression he’s currently using Max and Max for Live to explore ideas in additive synthesis, stretched-frequency timbres, vectorized harmony, algorithmic melodic expression and other areas.
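
    A minimal additive-synthesis sketch of the ideas above, assuming a simple stretch rule f_n = f0 * n^alpha and quantisation of each partial to 12-tone equal temperament; this is an illustration, not Pete Kellock's Max implementation.

    ```python
    import numpy as np

    def stretched_partials(f0, n_partials, alpha=1.03):
        n = np.arange(1, n_partials + 1)
        return f0 * n ** alpha              # alpha = 1.0 gives a true harmonic series

    def quantise_to_12tet(freqs, ref=440.0):
        semitones = np.round(12 * np.log2(freqs / ref))
        return ref * 2 ** (semitones / 12)  # snap each partial to the nearest ET pitch

    def render(freqs, dur=1.0, sr=44100):
        t = np.arange(int(dur * sr)) / sr
        amps = 1.0 / np.arange(1, len(freqs) + 1)   # gentle spectral roll-off
        tone = sum(a * np.sin(2 * np.pi * f * t) for a, f in zip(amps, freqs))
        return tone / np.abs(tone).max()

    partials = stretched_partials(220.0, 12)
    print(np.round(partials, 1))                    # stretched overtone frequencies
    print(np.round(quantise_to_12tet(partials), 1)) # "equally-tempered timbre" variant
    tone = render(quantise_to_12tet(partials))      # write out with soundfile/scipy to audition
    ```
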
Venue: Orchestra Hall (YST 3rd floor)
  1. A step beyond is a musical duet between a human musician and an artificial agent (MASSE). A notable aspect of this performance is that the system goes beyond musical support to influence the musicians' experience of partnership. The partner characteristics are observed through dialogic interaction, mutually influencing changes in musical material and coordinated changes of musical characteristics between the agent and the human. The system was originally implemented through an approach to musical responsiveness based on rhythmic notions of perceptual stability and togetherness. In this performance, the MASSE system is further expanded to respond to attributes of pitch and timbre, in addition to the rhythm attributes.

    Performer: Dirk Stromberg, Phallophone
  2. This performance will consist of three flute/computer improvisations. The music is produced as a result of the interactions of a human performer and an ensemble of virtual “performers” implemented as autonomous software agents. Each improvisation contains several distinct musical layers, but all the sounds heard by the audience are directly or indirectly derived from the sound made by the flute. This is one way of creating an “extended” instrument that is responsive enough for the performer to perceive how the agent system reacts to performance nuances and note choices, and to effectively “play” the ensemble.

    Michael Spicer, flute and electronics

  3. “You are willingly mutated by intimate machines, abducted by audio into the populations of your bodies. Sound machines throw you onto the shores of the skin you're in. The hypersensual cyborg experiences herself as a galaxy of audiotactile sensations.” - Kodwo Eshun, More Brilliant Than The Sun, Quartet Books, 1998

    “Constantly talking isn’t necessarily communicating.” – Joel Barish, Eternal Sunshine of the Spotless Mind, 2004


    Sifrmu is an ongoing project which identifies encryption as a form and process of intimacy. The work revolves around a programme which encrypts plaintext into ciphertext, simultaneously producing MIDI messages as another layer to this encryption process.

    In a scene from Charlie Kaufman’s Eternal Sunshine of the Spotless Mind, Clementine struggles to have a conversation with Joel. Both characters are in bed together, and Clementine laments how Joel often keeps to himself, whereas she is open about how she feels and tells him everything. Joel, half asleep, says, “Constantly talking isn’t necessarily communicating.” Noticeably upset, Clementine goes on to say that she does not constantly talk, citing that she only wants to get to know him. Clementine later says, “People have to share things. That’s what intimacy is.”

    In thinking about relationships and the process of embodying one another, I am thinking not just about the course of our interactions, but also about their afterlife.

    I am convinced of this idea that each interaction consists of a process in which we encrypt each other into our bodies: we smell one another and lock in that scent, and if we ever come into contact with such a smell again, we decrypt a part of each other and are reminded of our encounter, interaction or shared experience. Encryption is a process beyond the idea of hiding or protecting; it is also a process of intimacy. It is an outcome of containing and safeguarding a shared relation which no one else can access or embody.

    In these violent times, privacy and the encryption of personal information are highly valued, and have become a premium which only the privileged few can afford. It is time we reorientate this position and propose a wider, more inclusive approach to thinking about the notions of privacy and encryption. This must be a process that is detached from the privileges of socio-economic and political order: a distributed, personal and intimate process of caring for and protecting one another.

    At least to me, that is one micro-utopian ideal of encryption in the context of care and relationships.

    I've been dabbling with cryptography, specifically ciphertexts, and I've found myself at an intersection where poetry comes into contact with ciphers. For this new work, I'm writing new poems in English and Malay and encoding them with a personally formulated encryption method, in which the ciphertexts are written in Jawi, which is essentially the Arabic alphabet with some additional letters (specifically the consonants C, G and P) for written Bahasa Melayu. Each of these pieces is encrypted with a unique key which is required to decipher the text. To write these texts, I've written a programme which allows me to write in Roman letters and have the text encrypted into Jawi.

    Bani Haykal
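
    As a toy illustration of the pipeline the statement above describes (plaintext to ciphertext to MIDI notes), and emphatically not Bani Haykal's actual encryption method, a few lines of Python might look like this:

    ```python
    # Keyed substitution as a stand-in cipher, then map each ciphertext byte to a MIDI note.
    def encrypt(plaintext, key):
        return bytes((b + key[i % len(key)]) % 256 for i, b in enumerate(plaintext.encode("utf-8")))

    def to_midi_notes(ciphertext, low=36, high=84):
        span = high - low
        return [low + (b % span) for b in ciphertext]

    cipher = encrypt("people have to share things", key=b"sifrmu")
    print(cipher.hex())
    print(to_midi_notes(cipher))  # note numbers to be sent out as note_on messages
    ```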



Poster presentations

  1. Abstract: With the advent of neural networks, there have been huge improvements in the accuracy of music transcription systems. While many studies focus on the design of neural network architectures for music transcription, only a few papers include an in-depth study of the input representations for the neural network. In our work, we compare different input representations such as the STFT spectrogram, the log-frequency spectrogram and the constant-Q transform (CQT), and study the effects of these different input representations on music transcription accuracy.

    Authors: Kin Wai Cheuk, Kat Agres, Dorien Herremans

    Bio: Kin Wai Cheuk is a Ph.D. student at the Singapore University of Technology and Design, under the supervision of Professor Dorien Herremans and Dr. Kat Agres. He received both his Bachelor of Science in Physics (Minor in Music) and his Master of Philosophy in Mechanical Engineering from The University of Hong Kong. His research interest is neural-network-based music composition.
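
    The three input representations compared in the poster can be computed, for example, with librosa as sketched below; the file name, hop size and bin counts are placeholders rather than the authors' settings.

    ```python
    import numpy as np
    import librosa

    y, sr = librosa.load("piano_excerpt.wav", sr=22050)   # hypothetical audio file

    stft = np.abs(librosa.stft(y, n_fft=2048, hop_length=512))       # linear-frequency STFT
    cqt = np.abs(librosa.cqt(y, sr=sr, hop_length=512,
                             bins_per_octave=12, n_bins=88))         # constant-Q transform
    # One way to obtain a log-frequency spectrogram is to map the STFT bins onto a
    # logarithmically spaced axis, e.g. via a mel filterbank:
    logfreq = librosa.feature.melspectrogram(S=stft**2, sr=sr, n_mels=229)

    for name, rep in [("STFT", stft), ("CQT", cqt), ("log-freq", logfreq)]:
        print(name, rep.shape)  # (frequency bins, time frames) fed to the network
    ```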

  2. Abstract: Emotion and music are intrinsically connected, and researchers have had limited success in employing computational models to predict perceived emotion in music. Here, we use computational dimension reduction techniques to discover meaningful representations of music. For static emotion prediction, i.e., predicting one valence/arousal value for each 45s musical excerpt, we explore the use of triplet neural networks for discovering a representation that differentiates emotions more effectively. This reduced representation is then used in a classification model, which outperforms the original model trained on raw audio. For dynamic emotion prediction, i.e., predicting one valence/arousal value every 500ms, we examine how meaningful representations can be learned through a variational autoencoder (a state-of-the-art architecture effective in untangling information-rich structures in noisy signals). Although vastly reduced in dimensionality, our model achieves state-of-the-art performance for emotion prediction accuracy. This approach enables us to identify which features underlie emotion content in music.

    Authors: Dorien Herremans, Kin Wai Cheuk, Yin-Jyun Luo, Kat Agres
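
    The triplet idea mentioned in the abstract can be illustrated with the standard triplet loss, sketched below on hypothetical embeddings (not the authors' network or data): the loss pulls an anchor excerpt's embedding towards a same-emotion positive and away from a different-emotion negative.

    ```python
    import numpy as np

    def triplet_loss(anchor, positive, negative, margin=0.2):
        d_pos = np.sum((anchor - positive) ** 2)   # anchor-positive distance
        d_neg = np.sum((anchor - negative) ** 2)   # anchor-negative distance
        return max(0.0, d_pos - d_neg + margin)

    # Hypothetical 3-D embeddings of three 45 s excerpts.
    a = np.array([0.1, 0.9, 0.0])   # anchor: high-arousal excerpt
    p = np.array([0.2, 0.8, 0.1])   # positive: similar emotion
    n = np.array([0.9, 0.1, 0.3])   # negative: different emotion
    print(triplet_loss(a, p, n))    # zero here: the embeddings are already well separated
    ```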

  3. Abstract: Aphasia caregivers serve as an important link between medical practitioners and patients and assume most of the responsibility for the patients’ well-being. On top of their multitude of care responsibilities, they are expected to act as stand-in therapists and to guide the patients in speech rehabilitation when the therapists are not present. This work investigates the current practices and communication challenges involved in the process of caregiver training for speech rehabilitation. It then builds on these new insights to ideate design solutions to tackle the identified challenges. Three design strategies were proposed: (1) connectivity, (2) multimodality and (3) integration, to address these groups of challenges respectively: (a) uncertainty due to lack of feedback from both caregiver and therapist, (b) the caregiver’s lack of guidance to conduct exercises at home and (c) the caregiver’s lack of understanding and motivation to conduct exercises. A caregiver support system was also developed and iterated through experimental studies to show how design can help the caregivers of aphasic patients bridge the expertise gap.

    Author: Thanh Pham Ha

    Bio: Ha was a Master’s student at the Department of Communications and New Media, NUS. She is now a design practitioner who has worked on multiple digital products. Her interest lies in applying design processes to use art & music as a channel to solve challenging problems.

  4. Abstract: 30 musicians (10 singers, 10 Dizi players and 10 string players) were asked to perform a pitch singing task. It was discovered that musicians are not consistent in their pitch and that it is difficult to determine the musical temperament they prefer. None of the musicians shared the same temperament when compared individually or as a section. It was also discovered that the Dizi musicians’ data were clustered closely together, whereas the string musicians’ data were more spread out. Two observations stand out strongly among all musicians in this research, i.e. the preference for a flatter "Fa" and a sharper "Ti".

    Author: Hsien Han

    Bio: Hsien Han is a PhD student from SUTD, under the supervision of Dr Chen Jer Ming.
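
    A sketch of the kind of measurement that underlies observations such as a flatter "Fa" or a sharper "Ti" (assumed here, not the authors' exact pipeline): express each performed frequency as a deviation in cents from equal-tempered and just-intonation references.

    ```python
    import math

    def cents(f, f_ref):
        return 1200 * math.log2(f / f_ref)      # deviation of f from f_ref in cents

    tonic = 261.63                              # C4 as "Do"
    just_fa, et_fa = tonic * 4 / 3, tonic * 2 ** (5 / 12)

    performed_fa = 348.0                        # hypothetical measured pitch
    print(round(cents(performed_fa, et_fa), 1))    # deviation from the 12-TET Fa
    print(round(cents(performed_fa, just_fa), 1))  # deviation from the just-intonation Fa
    ```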

  5. Abstract: One of the challenges in the development of computational improvisation systems is to design systems that improve co-creative experiences and engender a sense of creative partnership. Prior work on computational improvisation systems has engendered a sense of partnership through the perceived agency of the system, but is less focused on directly impacting co-creative experiences. This work offers an alternative approach to studying creative musical partnerships by directly impacting musicians' co-creative experiences. Towards this goal, we develop a new representation of a musical context based on two dimensions - stability and togetherness - that are integral to musical decision-making during co-creation. This representation aids the musical decision-making of an agent that monitors changes in the musical context and alters its musical response based on internal goals. The agent was evaluated through human experiments in which it improvised rhythmic duets with musicians, who then reported their experiences. In these performances, musicians identified three characteristics of interaction that distinguished their sense of co-creating with a tool from co-creating with a collaborator. The different characteristics also correspond to aspects of group creativity through emergent symbolic interaction, perceivable unpredictability, and a sense of negotiating differences in understanding. Through the development of a new representation of musical context and a decision-making procedure that directly impacts the musicians' co-creative experiences, this work offers an alternative approach to studying creative computational partnerships and to developing semi-autonomous music partners.

    Author: Prashanth T.R.

    Bio: Prashanth is a Ph.D. student in the Department of Communications and New Media, NUS. His interests revolve around the development of intelligent music technologies for creative collaboration with humans. His Ph.D. research focuses on addressing the issues of developing creative music technologies that transcend their use as creative support tools and are recognized as computational partners in improvised musical collaboration. At the Singapore Music Research Symposium 2019, he will be presenting his research on the design of the creative computational partner, and demonstrating its application in live music performance.

  6. Abstract: Singapore has a notably multiracial population, and along with it, a diversity of musical traditions. We seek to investigate and understand the music of the different ethnicities of Singapore using state-of-the-art techniques from music information retrieval (MIR). This initial project aims to discover the relationships between different musical cultures in Singapore, with a focus on Chinese, Malay, and Indian music. Using the Spotify Web API, we have collected a total of 15,930 musical clips (30 sec each in duration). In our preliminary analysis, we extracted low-level features such as MFCCs and Chroma using openSMILE to classify Chinese, Malay, Hindi and Tamil music. Performance across various machine learning algorithms, such as Logistic Regression and SVM, was evaluated. In this preliminary analysis, 318 features were used (retaining 98% of the variance) to achieve the highest prediction accuracy (65%). We are working towards improving classification accuracy by cleaning the dataset and extracting high-level musical features.

    Authors: Fajilatun Nahar, Dorien Herremans, Kat Agres

    Bio: Fajilatun Nahar is currently a Master’s student at the Singapore University of Technology and Design (SUTD), under the supervision of Assistant Professor Dorien Herremans. She received her BSc in Computer Science and Engineering from North South University, Bangladesh. Prior to her Master’s, she worked as a software developer for more than 3 years. Currently, she is exploring Music Information Retrieval and Machine Learning topics. She is also interested in bringing her software development skills into her research.
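
    The classification stage described in the abstract might look roughly like the sketch below, where the openSMILE feature matrix is replaced by random placeholder data, PCA retains 98% of the variance, and a Logistic Regression classifier is cross-validated.

    ```python
    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(400, 1000))             # stand-in for openSMILE features
    y = rng.integers(0, 4, size=400)             # Chinese / Malay / Hindi / Tamil labels

    clf = make_pipeline(
        StandardScaler(),
        PCA(n_components=0.98),                  # keep 98% of the variance
        LogisticRegression(max_iter=1000),
    )
    print(cross_val_score(clf, X, y, cv=5).mean())  # chance level on this random data
    ```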

  7. Abstract: This work investigated the factors that influence the difference between a musician’s live performance and recording self-evaluation. 52 musicians self-evaluated their live performance and recording under varying conditions. Results showed that musicians perceived their live performance and recording differently. However, this difference was not reflected in their overall quality ratings. Significantly different levels of self-efficacy, perfectionistic evaluative concerns, trait anxiety, and cognitive anxiety were found between participants who rated their live performance more favourably and participants who rated their recording more favourably. When the recording self-evaluation was controlled for, similar factors were found to predict the self-evaluation of live performance. Furthermore, cognitive anxiety positively correlated with the increased difference between live performance and recording self-evaluation. Additionally, the participant’s opinion of the effects of their anxiety on the performance was a stronger predictor than the intensity of the anxiety experienced. Interestingly, participants who frequently listened to their recordings did not display a reduced difference between self-evaluation of live performance and recording. However, due to the design of the study, no causal inferences were established.

    Author: Paul Huang

    Bio: Paul Huang recently completed his postgraduate studies in Performance Science at the Royal College of Music (RCM). He received his Bachelor of Music in flute performance from the RCM and the Nanyang Academy of Fine Arts in 2015. He remains an active freelance flutist, performing regularly with orchestras and chamber ensembles. Paul’s research interests lie in performance psychology, performance optimisation, and quantitative research. He is currently working on a project exploring self-evaluation of live performance and recording in musicians.

  8. Abstract: One can easily visualise a simple pendulum as a model for a sine-wave oscillator, arguably the most basic building block for sound synthesis. However, what if this pendulum is not simple, but chaotic, e.g. a double pendulum? While other chaotic maps have been used as audio-rate oscillators, chaotic pendula have hardly been used for this purpose. This poster explores how one can model chaotic (physical) pendula as audio-rate oscillators, with an example found in Timothy S. H. Tan’s sound installation The Double Double Pendula (The DDP), and discusses the challenges involved.

    Author: Timothy S. H. Tan

    Bio: Timothy S. H. Tan (timbretan) is an algorithmic sound artist who believes in sonifying the obscure. Ever since his first encounter with Gumowski-Mira maps in 2016, he has been playing with chaotic maps as algorithmic controls for spatialisation and timbres. Tan uses audio particle systems to make the listening space as lively as visuals, and even develops his own chaotic synths. Also a composer, Tan often grapples with themes of complex puzzles and megalomania, and is no stranger to dark, aggressive and ironic stories. After finishing his Master’s in Sonology (Computer Music) in The Hague (NL), Tan has returned to Singapore and now experiments with immersive media. He also makes software for other people’s installations. His notable works include Cells #2 for surround speaker setup, The Double Double Pendula (The DDP) sound installation, and The Virtual Spatial Musical Instrument (The VSMI), a virtual reality (VR) sound installation.
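
    A rough sketch of a double pendulum driven at audio rate is given below, assuming simple Euler integration and a direct mapping of one angle to the output signal; the installation's actual model and mapping are more elaborate.

    ```python
    import numpy as np

    def double_pendulum_osc(n_samples, sr=44100, g=9.81, l1=1.0, l2=1.0, m1=1.0, m2=1.0,
                            th1=2.0, th2=2.5, w1=0.0, w2=0.0, rate=200.0):
        dt = rate / sr                  # "rate" scales how fast the pendulum evolves
        out = np.empty(n_samples)
        for i in range(n_samples):
            d = th1 - th2
            den = 2 * m1 + m2 - m2 * np.cos(2 * d)
            a1 = (-g * (2 * m1 + m2) * np.sin(th1) - m2 * g * np.sin(th1 - 2 * th2)
                  - 2 * np.sin(d) * m2 * (w2**2 * l2 + w1**2 * l1 * np.cos(d))) / (l1 * den)
            a2 = (2 * np.sin(d) * (w1**2 * l1 * (m1 + m2) + g * (m1 + m2) * np.cos(th1)
                  + w2**2 * l2 * m2 * np.cos(d))) / (l2 * den)
            w1, w2 = w1 + a1 * dt, w2 + a2 * dt
            th1, th2 = th1 + w1 * dt, th2 + w2 * dt
            out[i] = np.sin(th2)        # chaotic waveform, bounded in [-1, 1]
        return out

    signal = double_pendulum_osc(44100)  # one second of audio; normalise and play back
    ```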

  9. Abstract: Stroke can have a severe impact on an individual’s quality of life, leading to consequences such as motor loss and communication problems, especially among the elderly. Studies have shown that early and easy access to stroke rehabilitation can improve an elderly individual’s quality of life, and that telerehabilitation is a solution that facilitates this. In this work, we visualize movement to music during rehabilitation exercises captured by the Kinect motion sensor, using a dedicated Serious Game called ‘Move to the Music’ (MoMu), so as to provide a quantitative view of progress made by patients in motor rehabilitation for healthcare professionals to track remotely.

    Authors: Praveena Satkunarajah & Kat Agres

    Bio: Praveena is currently a research engineer at the Institute for High Performance Computing. She obtained her Bachelor's in Computer Science from Nanyang Technological University. Her research interests include Data Mining, Machine Learning and Cognitive Science.

  10. Abstract: In this paper, we learn disentangled representations of timbre and pitch for musical instrument sounds. We adapt a framework based on variational autoencoders with Gaussian mixture latent distributions. Specifically, we use two separate encoders to learn distinct latent spaces for timbre and pitch, which form Gaussian mixture components representing instrument identity and pitch, respectively. For reconstruction, latent variables of timbre and pitch are sampled from corresponding mixture components, and are concatenated as the input to a decoder. We show the model's efficacy using latent space visualization, and a quantitative analysis indicates the discriminability of these spaces, even with a limited number of instrument labels for training. The model allows for controllable synthesis of selected instrument sounds by sampling from the latent spaces. To evaluate this, we trained instrument and pitch classifiers using original labeled data. These classifiers achieve high F-scores when tested on our synthesized sounds, which verifies the model’s performance of controllable realistic timbre/pitch synthesis. Our model also enables timbre transfer between multiple instruments, with a single encoder-decoder architecture, which is evaluated by measuring the shift in the posterior of instrument classification. Our in-depth evaluation confirms the model's ability to successfully disentangle timbre and pitch.

    Author: Yin-Jyun Luo

    Bio: Yin-Jyun Luo is a Ph.D. student at the Singapore University of Technology and Design, under the supervision of Professor Dorien Herremans and Professor Kat Agres. He was a research assistant in the Music and Culture Technology Lab led by Dr. Li Su at the Institute of Information Science, Academia Sinica, Taiwan. He received a Master of Science in Music Technology from National Chiao Tung University, Taiwan. Yin-Jyun is currently working on disentangled and interpretable representation learning of music and audio using deep learning.
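
    The dual-encoder idea can be sketched compactly as below; the Gaussian-mixture priors, network sizes and training objective are simplified away, so this is an illustration rather than the paper's implementation.

    ```python
    import torch
    import torch.nn as nn

    class Encoder(nn.Module):
        def __init__(self, in_dim, z_dim):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
            self.mu, self.logvar = nn.Linear(256, z_dim), nn.Linear(256, z_dim)

        def forward(self, x):
            h = self.net(x)
            mu, logvar = self.mu(h), self.logvar(h)
            z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterisation
            return z, mu, logvar

    class DisentanglingVAE(nn.Module):
        def __init__(self, in_dim=513, z_timbre=16, z_pitch=16):
            super().__init__()
            self.enc_timbre = Encoder(in_dim, z_timbre)  # latent intended for instrument identity
            self.enc_pitch = Encoder(in_dim, z_pitch)    # latent intended for pitch
            self.dec = nn.Sequential(nn.Linear(z_timbre + z_pitch, 256), nn.ReLU(),
                                     nn.Linear(256, in_dim))

        def forward(self, x):
            zt, *_ = self.enc_timbre(x)
            zp, *_ = self.enc_pitch(x)
            return self.dec(torch.cat([zt, zp], dim=-1))  # concatenated latents -> reconstruction

    x = torch.randn(8, 513)             # e.g. a batch of spectral frames
    print(DisentanglingVAE()(x).shape)  # torch.Size([8, 513])
    ```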


Back to SMRS2019