Poster Session #1 and Registration. Coffee will be available.
A buffet lunch will be served in the lobby. Posters will be on display from 1:00 until 1:45pm.
Abstract: With the advent of neural networks, there have been huge improvements in the accuracy of music transcription systems. While many studies focus on the design of neural network architectures for music transcription, only a few papers include an in-depth study of the input representations for the neural network. In our work, we compare different input representations, such as the STFT spectrogram, the log-frequency spectrogram, and the constant-Q transform (CQT), and study the effects of these different input representations on music transcription accuracy.
Authors: Kin Wai Cheuk, Kat Agres, Dorien Herremans
Bio: Kin Wai Cheuk is a Ph.D. student at the Singapore University of Technology and Design, under the supervision of Professor Dorien Herremans and Dr. Kat Agres. He received both his Bachelor of Science in Physics (with a Minor in Music) and his Master of Philosophy in Mechanical Engineering from The University of Hong Kong. His research interest is neural network-based music composition.
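The key difference between the representations compared above is how their frequency bins are spaced. As a minimal illustrative sketch (not the authors' code), the bin center frequencies of an STFT are linearly spaced, while constant-Q bins are geometrically spaced so that each octave receives the same number of bins; the default values below are common conventions, not parameters from the paper:

```python
import math

def cqt_frequencies(n_bins, fmin=32.70, bins_per_octave=12):
    """Center frequencies of constant-Q bins: geometrically spaced,
    so every octave gets the same number of bins."""
    return [fmin * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]

def stft_frequencies(n_fft, sr):
    """Center frequencies of STFT bins: linearly spaced from 0 Hz
    up to the Nyquist frequency sr / 2."""
    return [k * sr / n_fft for k in range(n_fft // 2 + 1)]
```

With 12 bins per octave, bin 12 of the CQT sits exactly one octave above `fmin`, which is why the CQT aligns naturally with musical pitch, whereas low notes fall between the widely spaced low-frequency STFT bins.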
Abstract: Emotion and music are intrinsically connected, and researchers have had limited success in employing computational models to predict perceived emotion in music. Here, we use computational dimension reduction techniques to discover meaningful representations of music. For static emotion prediction, i.e., predicting one valence/arousal value for each 45s musical excerpt, we explore the use of triplet neural networks for discovering a representation that differentiates emotions more effectively. This reduced representation is then used in a classification model, which outperforms the original model trained on raw audio. For dynamic emotion prediction, i.e., predicting one valence/arousal value every 500ms, we examine how meaningful representations can be learned through a variational autoencoder (a state-of-the-art architecture effective in untangling information-rich structures in noisy signals). Although vastly reduced in dimensionality, our model achieves state-of-the-art performance for emotion prediction accuracy. This approach enables us to identify which features underlie emotion content in music.
Authors: Dorien Herremans, Kin Wai Cheuk, Yin-Jyun Luo, Kat Agres
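Triplet networks of the kind mentioned above are trained so that an anchor excerpt ends up closer to an excerpt with similar emotion (the positive) than to one with different emotion (the negative). A minimal sketch of the standard triplet loss, assuming squared Euclidean distance and a margin hyperparameter (details not specified in the abstract):

```python
def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss on embedding vectors (plain lists here).
    Zero when the positive is already closer than the negative by at
    least `margin`; positive otherwise."""
    d_pos = sum((a - p) ** 2 for a, p in zip(anchor, positive))
    d_neg = sum((a - n) ** 2 for a, n in zip(anchor, negative))
    return max(d_pos - d_neg + margin, 0.0)
```

Minimizing this loss over many triplets pulls same-emotion excerpts together in the learned space, which is what makes the reduced representation easier for a downstream classifier to separate.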
Abstract: Aphasia caregivers serve as an important link between medical practitioners and patients, and assume the most responsibility for the patients’ well-being. On top of their multitude of care responsibilities, they are expected to act as stand-in therapists and to guide the patients in speech rehabilitation when the therapists are not present. This work investigates the current practices and communication challenges involved in the process of caregiver training for speech rehabilitation. It then builds on these new insights to ideate design solutions to tackle the identified challenges. Three design strategies were proposed: (1) connectivity, (2) multimodality and (3) integration, to address these groups of challenges respectively: (a) uncertainty due to lack of feedback from both caregiver and therapist, (b) the caregiver’s lack of guidance to conduct exercises at home and (c) the caregiver’s lack of understanding and motivation to conduct exercises. A caregiver support system was also developed and iterated through experimental studies to show how design can help the caregivers of aphasic patients to bridge the expertise gap.
Author: Thanh Pham Ha
Bio: Ha was a Master's student at the Department of Communications and New Media, NUS. She is now a design practitioner who has worked on multiple digital products. Her interest lies in applying design processes to use art and music as a channel to solve challenging problems.
Abstract: 30 musicians (10 singers, 10 Dizi and 10 string musicians) were asked to perform a pitch singing task. It was discovered that musicians are not consistent in their pitch, and it is difficult to determine the musical temperament that they prefer. None of the musicians shared the same temperament, whether compared individually or as a section. It was also discovered that the Dizi musicians’ data were clustered closely together, compared to the string musicians, whose data were more spread out. Two observations stand out strongly among all musicians in this research, i.e., the preference for a flatter "Fa" and a sharper "Ti".
Author: Hsien Han
Bio: Hsien Han is a PhD student at SUTD, under the supervision of Dr Chen Jer Ming.
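Pitch deviations of the kind reported above are conventionally measured in cents, where 100 cents is one equal-tempered semitone. As an illustrative sketch (not the study's analysis code): the just-intonation "Fa" (ratio 4/3) lies about 2 cents below the 500-cent equal-tempered fourth, while the Pythagorean "Ti" (ratio 243/128) lies about 10 cents above the 1100-cent equal-tempered major seventh, directions consistent with the flatter-Fa / sharper-Ti observation:

```python
import math

def cents(f, ref):
    """Size of the interval from frequency `ref` to `f` in cents
    (100 cents = one equal-tempered semitone, 1200 = one octave)."""
    return 1200.0 * math.log2(f / ref)
```

For example, `cents(4/3, 1)` gives roughly 498.0 cents and `cents(243/128, 1)` roughly 1109.8 cents, against 500 and 1100 in equal temperament.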
Abstract: One of the challenges in the development of computational improvisation systems is to design systems that improve co-creative experiences and engender a sense of creative partnership. Prior work on computational improvisation systems has engendered a sense of partnership through the perceived agency of the system, but is less focused on directly impacting co-creative experiences. This work offers an alternative approach to study creative musical partnerships by directly impacting musicians' co-creative experiences. Towards this goal, we develop a new representation of a musical context based on two dimensions - stability and togetherness - that are integral to musical decision-making during co-creation. This representation aids the musical decision-making of an agent that monitors changes in the musical context and alters its musical response based on internal goals. The agent was evaluated through human experiments in which it improvised rhythmic duets with musicians, who then reported their experiences. In these performances, musicians identified three characteristics of interaction that distinguished their sense of co-creating with a tool from that of co-creating with a collaborator. These characteristics also correspond to aspects of group creativity through emergent symbolic interaction, perceivable unpredictability, and a sense of negotiating differences in understanding. Through the development of a new representation of musical context and a decision-making procedure that directly impacts the musicians' co-creative experiences, this work offers an alternative approach to study creative computational partnerships and to develop semi-autonomous music partners.
Author: Prashanth T.R.
Bio: Prashanth is a Ph.D. student in the Department of Communications and New Media, NUS. His interests revolve around the development of intelligent music technologies for creative collaboration with humans. His Ph.D. research focuses on addressing the issues of developing creative music technologies that transcend their use as creative support tools and are recognized as computational partners in improvised musical collaboration. At the Singapore Music Research Symposium 2019, he will be presenting his research on the design of the creative computational partner, and demonstrating its application in live music performance.
Abstract: Singapore has a notably multiracial population, and along with it, a diversity of musical traditions. We seek to investigate and understand the music of the different ethnicities of Singapore using state-of-the-art techniques from music information retrieval (MIR). This initial project aims to discover the relationships between different musical cultures in Singapore, with a focus on Chinese, Malay, and Indian music. Using the Spotify Web API, we have collected a total of 15,930 musical clips (30 sec each in duration). In our preliminary analysis, we extracted low-level features such as MFCCs and Chroma using OpenSMILE to classify Chinese, Malay, Hindi and Tamil music. Performance across various machine learning algorithms, such as Logistic Regression and SVM, was evaluated. In our preliminary analysis, 318 features were used (retaining 98% of the variance) to achieve the highest prediction accuracy (of 65%). We are working towards improving classification accuracy by cleaning the dataset and extracting high-level musical features.
Authors: Fajilatun Nahar, Dorien Herremans, Kat Agres
Bio: Fajilatun Nahar is currently a Master's student at the Singapore University of Technology and Design (SUTD), under the supervision of Assistant Professor Dorien Herremans. She received her BSc in Computer Science and Engineering from North South University, Bangladesh. Prior to her Master's, she worked as a software developer for more than three years. Currently, she is exploring topics in Music Information Retrieval and Machine Learning. She is also interested in bringing her software development skills into her research.
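The "318 features retaining 98% of the variance" step above is the standard way of choosing a reduced dimensionality after an eigendecomposition such as PCA: keep the smallest number of leading components whose cumulative variance share reaches the target. A minimal sketch of that selection rule (illustrative, not the project's code):

```python
def n_components_for_variance(eigenvalues, target=0.98):
    """Smallest number of leading components whose cumulative share
    of the total variance reaches `target` (a fraction in (0, 1])."""
    total = sum(eigenvalues)
    running = 0.0
    for i, ev in enumerate(sorted(eigenvalues, reverse=True), start=1):
        running += ev
        if running / total >= target:
            return i
    return len(eigenvalues)
```

For instance, with component variances `[50, 30, 15, 4, 1]` a 95% target keeps the first three components; the same rule applied at 98% to the full feature covariance spectrum is what yields the 318 retained features mentioned above.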
Abstract: This work investigated the factors that influence the difference between a musician’s live performance and recording self-evaluation. 52 musicians self-evaluated their live performance and recording under varying conditions. Results showed that musicians perceived their live performance and recording differently. However, this difference was not reflected in their overall quality ratings. Significantly different levels of self-efficacy, perfectionistic evaluative concerns, trait anxiety, and cognitive anxiety were found between participants who rated their live performance more favourably and participants who rated their recording more favourably. When the recording self-evaluation was controlled for, similar factors were found to predict the self-evaluation of live performance. Furthermore, cognitive anxiety positively correlated with the increased difference between live performance and recording self-evaluation. Additionally, the participant’s opinion of the effects of their anxiety on the performance was a stronger predictor than the intensity of the anxiety experienced. Interestingly, participants who frequently listened to their recordings did not display a reduced difference between self-evaluation of live performance and recording. However, due to the design of the study, no causal inferences were established.
Author: Paul Huang
Bio: Paul Huang recently completed his postgraduate studies in Performance Science at the Royal College of Music (RCM). He received his Bachelor of Music in flute performance from the RCM and Nanyang Academy of Fine Arts in 2015. He remains an active freelance flutist, performing regularly with orchestras and chamber ensembles. Paul’s research interests lie in performance psychology, performance optimisation, and quantitative research. He is currently working on a project exploring self-evaluation of live performance and recording in musicians.
Abstract: One can easily visualise a simple pendulum as a model for a sine-wave oscillator, arguably the most basic building block for sound synthesis. However, what if this pendulum is not simple, but chaotic, e.g. a double pendulum? Moreover, while other chaotic maps have been used as audio-rate oscillators, the use of chaotic pendula for such purposes is almost non-existent. This poster explores how one can model chaotic (physical) pendula as audio-rate oscillators, with an example found in Timothy S. H. Tan’s sound installation The Double Double Pendula (The DDP), and discusses the challenges involved.
Author: Timothy S. H. Tan
Bio: Timothy S. H. Tan (timbretan) is an algorithmic sound artist who believes in sonifying the obscure. Ever since his first encounter with Gumowski-Mira maps in 2016, he has been playing with chaotic maps as algorithmic controls for spatialisation and timbres. Tan uses audio particle systems to make the listening space as lively as visuals, and even develops his own chaotic synths. Also a composer, Tan often grapples with themes of complex puzzles and megalomania, and is no stranger to dark, aggressive and ironic stories. After finishing his Master's in Sonology (Computer Music) in The Hague (NL), Tan returned to Singapore and now experiments with immersive media. He also makes software for other people’s installations. His notable works include Cells #2 for surround speaker setup, The Double Double Pendula (The DDP) sound installation, and The Virtual Spatial Musical Instrument (The VSMI), a virtual reality (VR) sound installation.
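One way to realise the idea above is to numerically integrate the double pendulum's equations of motion and read out a bounded function of one angle as the waveform. The sketch below uses the standard textbook equations with a simple explicit Euler step and sin(θ₂) as the output; it is an illustrative toy, not the implementation used in The DDP (a practical oscillator would want a higher-order integrator and a much smaller step relative to the audio rate):

```python
import math

def double_pendulum_osc(n, dt=1e-3, g=9.81,
                        m1=1.0, m2=1.0, l1=1.0, l2=1.0,
                        th1=2.0, th2=2.5):
    """Generate n samples by integrating a double pendulum (explicit
    Euler) and reading out sin(theta2), which is bounded in [-1, 1]
    and therefore safe to use directly as an audio signal."""
    w1 = w2 = 0.0  # angular velocities
    out = []
    for _ in range(n):
        d = th1 - th2
        den = 2.0 * m1 + m2 - m2 * math.cos(2.0 * d)
        # standard equations of motion for the two angular accelerations
        a1 = (-g * (2.0 * m1 + m2) * math.sin(th1)
              - m2 * g * math.sin(th1 - 2.0 * th2)
              - 2.0 * math.sin(d) * m2
                * (w2 * w2 * l2 + w1 * w1 * l1 * math.cos(d))) / (l1 * den)
        a2 = (2.0 * math.sin(d)
              * (w1 * w1 * l1 * (m1 + m2)
                 + g * (m1 + m2) * math.cos(th1)
                 + w2 * w2 * l2 * m2 * math.cos(d))) / (l2 * den)
        w1 += a1 * dt
        w2 += a2 * dt
        th1 += w1 * dt
        th2 += w2 * dt
        out.append(math.sin(th2))
    return out
```

Because the system is chaotic, tiny changes to the initial angles `th1`/`th2` yield waveforms that diverge over time, which is precisely what makes such pendula interesting, and difficult, as oscillators.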
Abstract: Stroke can have a severe impact on an individual’s quality of life, leading to consequences such as motor loss and communication problems, especially among the elderly. Studies have shown that early and easy access to stroke rehabilitation can improve an elderly individual’s quality of life, and that telerehabilitation is a solution that facilitates this. In this work, we visualize movement to music during rehabilitation exercises captured by the Kinect motion sensor, using a dedicated Serious Game called ‘Move to the Music’ (MoMu), so as to provide a quantitative view of progress made by patients in motor rehabilitation for healthcare professionals to track remotely.
Authors: Praveena Satkunarajah & Kat Agres
Bio: Praveena is currently a research engineer at the Institute for High Performance Computing. She obtained her Bachelor's in Computer Science from Nanyang Technological University. Her research interests include Data Mining, Machine Learning and Cognitive Science.
Abstract: In this paper, we learn disentangled representations of timbre and pitch for musical instrument sounds. We adapt a framework based on variational autoencoders with Gaussian mixture latent distributions. Specifically, we use two separate encoders to learn distinct latent spaces for timbre and pitch, which form Gaussian mixture components representing instrument identity and pitch, respectively. For reconstruction, latent variables of timbre and pitch are sampled from the corresponding mixture components, and are concatenated as the input to a decoder. We show the model's efficacy using latent space visualization, and a quantitative analysis indicates the discriminability of these spaces, even with a limited number of instrument labels for training. The model allows for controllable synthesis of selected instrument sounds by sampling from the latent spaces. To evaluate this, we trained instrument and pitch classifiers using original labeled data. These classifiers achieve high F-scores when tested on our synthesized sounds, which verifies the model’s performance of controllable realistic timbre/pitch synthesis. Our model also enables timbre transfer between multiple instruments, with a single encoder-decoder architecture, which is evaluated by measuring the shift in the posterior of instrument classification. Our in-depth evaluation confirms the model's ability to successfully disentangle timbre and pitch.
Author: Yin-Jyun Luo
Bio: Yin-Jyun Luo is a Ph.D. student at the Singapore University of Technology and Design, under the supervision of Professor Dorien Herremans and Professor Kat Agres. He was a research assistant in the Music and Culture Technology Lab led by Dr. Li Su at the Institute of Information Science, Academia Sinica, Taiwan. He received a Master of Science in Music Technology from National Chiao Tung University, Taiwan. Yin-Jyun is currently working on disentangled and interpretable representation learning of music and audio using deep learning.
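The sampling-and-concatenation step described in the abstract rests on the standard VAE reparameterization trick: a latent variable is drawn as z = μ + σ·ε with ε ~ N(0, 1), and the timbre and pitch latents are then joined into a single decoder input. A minimal sketch of those two operations on plain lists (illustrative only; the paper's model operates on tensors with learned μ and log-variance per mixture component):

```python
import math
import random

def reparameterize(mu, log_var):
    """Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, 1),
    where sigma = exp(0.5 * log_var). Keeps sampling differentiable
    with respect to mu and log_var in an actual VAE."""
    return [m + math.exp(0.5 * lv) * random.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def decoder_input(z_timbre, z_pitch):
    """Concatenate the timbre and pitch latents into one decoder input."""
    return z_timbre + z_pitch
```

With a very negative log-variance, σ ≈ 0 and the sample collapses to the mean, which is why fixing one latent while sampling the other gives the controllable timbre/pitch synthesis the abstract describes.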