Miss Elona Shatri

Email: e.shatri@qmul.ac.uk
Room Number: Engineering, Eng 403
Website: https://www.elonashatri.co.uk
Profile
Project Title:
Optical Music Recognition using Deep Learning
Abstract:
The proposed PhD focuses on developing novel techniques for optical music recognition (OMR) using deep neural networks (DNNs). The research will be carried out in collaboration with Steinberg Media Technologies, opening the opportunity to work with and test the research outcomes in leading music notation software.
Musicians, composers, arrangers, orchestrators and other users of music notation have long dreamed of simply taking a photo or scan of sheet music and bringing it into a music notation application, where they can make changes, rearrange, transpose, or simply listen to it being played by the computer. The PhD aims to investigate and demonstrate a novel approach to converting images of sheet music into a semantic representation such as MusicXML and/or MEI. The research will be carried out in the context of designing a music recognition engine capable of ingesting, optically correcting, processing and recognising multiple pages of handwritten or printed music, from images captured by mobile phone or from low-resolution copyright-free scans from the International Music Score Library Project (IMSLP). The main objective is to output semantic mark-up identifying as many notational elements and as much text as possible, along with their positions in the original image. Prior solutions have been algorithmic, involving layers of hand-crafted rules applied to traditional feature detection techniques such as edge detection. An opportunity exists to develop and evaluate new approaches based on DNNs and other machine learning techniques.
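To make the target output concrete, the toy sketch below (Python, standard library only) assembles a single MusicXML `<note>` element of the kind such an engine would emit. It is deliberately simplified: a real export would include the full `score-partwise` scaffolding, part list and `divisions` context, all omitted here, and the helper name is invented for illustration.

```python
import xml.etree.ElementTree as ET

def make_note(step, octave, duration_divisions, note_type):
    """Build one simplified MusicXML <note> element.
    Many required siblings of a full, schema-valid score are omitted."""
    note = ET.Element("note")
    pitch = ET.SubElement(note, "pitch")
    ET.SubElement(pitch, "step").text = step            # letter name, e.g. "C"
    ET.SubElement(pitch, "octave").text = str(octave)   # scientific octave number
    ET.SubElement(note, "duration").text = str(duration_divisions)
    ET.SubElement(note, "type").text = note_type        # graphical value, e.g. "whole"
    return note

n = make_note("C", 4, 4, "whole")
print(ET.tostring(n, encoding="unicode"))
# → <note><pitch><step>C</step><octave>4</octave></pitch><duration>4</duration><type>whole</type></note>
```

The point of a semantic target like this, rather than a flat list of detected glyphs, is that positions, pitches and durations are already related to one another and can be re-engraved, transposed or played back directly.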
State-of-the-art Optical Music Recognition (OMR) can already recognise clean sheet music with very high accuracy, but fixing the remaining errors may take just as long as, if not longer than, transcribing the music into notation software by hand. A new method that improves recognition rates will allow users who are less adept at inputting notes into a music notation application to get better results more quickly. Another challenge is the variability in the quality of the input (particularly images captured with smartphones) and how best to preprocess the images to improve recognition quality in subsequent stages of the pipeline.

The application of cutting-edge techniques in data science, including machine learning and particularly convolutional neural networks (CNNs), may yield better results than traditional methods. To this end, research will start by testing VGG-like architectures (https://arxiv.org/abs/1409.1556) and residual networks (e.g. ResNet, https://arxiv.org/pdf/1512.03385.pdf) for the recognition of handwritten and/or low-resolution printed sheet music. The same techniques may also prove useful in earlier stages of the pipeline, such as document detection and feature detection. It would be desirable to recognise close to all individual objects in the score. One of the first objectives will be to establish a methodology for quantifying the differences between the reference data and the recognised data. Furthermore, data augmentation can be supported by existing Steinberg software.

The ideal candidate would have previous experience of training machine learning models and would be familiar with Western music notation. Being well versed in image acquisition, processing techniques, and computer vision would be a significant advantage.
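As a minimal illustration of the residual idea behind ResNet (not the project's actual architecture), the sketch below substitutes plain linear layers for convolutions: a residual block computes y = x + F(x), so the skip connection adds the input back onto the transformed signal, which is what lets very deep stacks train. All names and weights are illustrative.

```python
def relu(v):
    """Element-wise rectified linear unit."""
    return [max(x, 0.0) for x in v]

def matvec(w, v):
    """Plain matrix-vector product."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]

def residual_block(v, w1, w2):
    """y = v + F(v), where F is two linear layers with a ReLU between.
    The identity skip connection is the defining feature of ResNet blocks."""
    f = matvec(w2, relu(matvec(w1, v)))
    return [vi + fi for vi, fi in zip(v, f)]

x = [1.0, -2.0, 0.5]
zero = [[0.0] * 3 for _ in range(3)]
print(residual_block(x, zero, zero))  # with F == 0 the block is the identity: [1.0, -2.0, 0.5]
```

Because the block defaults to the identity when F contributes nothing, stacking many such blocks does not degrade the signal, whereas a deep stack of ordinary layers easily would.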
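One plausible starting point for quantifying the differences between reference and recognised data (a sketch under assumptions, not the project's chosen methodology) is a symbol-level error rate: linearise both scores into symbol sequences and normalise their edit distance by the reference length. The token names below are invented for illustration.

```python
def symbol_error_rate(reference, recognised):
    """Levenshtein distance between two symbol sequences,
    normalised by the reference length (single-row DP)."""
    m, n = len(reference), len(recognised)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i            # prev holds the diagonal cell
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,        # deletion
                        dp[j - 1] + 1,    # insertion
                        prev + (reference[i - 1] != recognised[j - 1]))  # substitution
            prev = cur
    return dp[n] / max(m, 1)

ref = ["clef.G", "note.C4.quarter", "note.D4.quarter", "barline"]
hyp = ["clef.G", "note.C4.quarter", "note.E4.quarter", "barline"]
print(symbol_error_rate(ref, hyp))  # one substitution out of four symbols → 0.25
```

A metric of this kind makes the trade-off in the abstract measurable: it directly reflects how many corrections a user would still have to make by hand after recognition.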
C4DM theme affiliation:
Music Informatics