Optical Music Recognition

This post is incomplete and littered with spelling errors. I’m only posting it to shame myself into finishing it after sitting on this draft for the past 5 months.

https://github.com/NotJoeMartinez/MusicNotesML https://github.com/aashrafh/Mozart

Last year I participated in the Microsoft Student Hackathon. There was a “Hack For Education” challenge, and I wanted to make something that would have helped me when I was a music student. I played the violin for 8 years while I was in grade school, and I always struggled to read sheet music.

Some music theory#

It’s been a while since I’ve actually played my violin, much less analyzed sheet music (so go easy on me), but this is my best explanation of an excerpt from “Mary Had a Little Lamb”. If you already know how to read sheet music, you can skip this part.

This symbol means the notes are in the treble clef. There are a bunch of clefs used to write music for different instruments, usually corresponding to how high or low a pitch an instrument can produce. The violin uses the treble clef, which is used for higher-pitched instruments. Depending on the clef, the symbols to the right of it map to different notes to play.

To the right of the treble clef is the key signature. A key signature is a series of sharps ♯ or flats ♭ placed to the right of the clef. These tell the musician to shift certain notes up or down by a semitone: a sharp raises a note and a flat lowers it. So, very loosely, the more sharps in the key signature the higher the music will sound, and the more flats the more depressing it will sound.

These things are the notes. If we look at them by themselves (without the lines in the background), they are just three sets of four “beats” and one long beat at the end. Without the staff they don’t have much meaning.

Behind everything is the staff. The staff is five horizontal lines that represent musical pitch. When note symbols are placed on, above, below, or in between the lines of the staff, they represent a unique pitch that has a letter in the musical alphabet, ranging from A to G. Notes lower on the staff have a lower pitch and notes higher on the staff have a higher pitch. There are only 7 letters in this alphabet, so the letters repeat themselves once they reach G.
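The staff-to-letter mapping can be sketched in a few lines of Python. This is only an illustration: the function name `treble_note`, the idea of counting "steps" up from the bottom line, and the octave numbers (scientific pitch notation, where middle C is C4) are my own assumptions and not something from the hackathon project.

```python
LETTERS = "CDEFGAB"  # octave numbers roll over at C in scientific pitch notation


def treble_note(step: int) -> str:
    """Name the note `step` positions above the bottom line of a treble staff.

    Step 0 is the bottom line (E4), step 1 is the space above it (F4), and so on.
    Negative steps go below the staff.
    """
    # Count letter steps from C0 so the octave falls out of integer division.
    bottom_line = 4 * 7 + LETTERS.index("E")  # E4
    index = bottom_line + step
    return f"{LETTERS[index % 7]}{index // 7}"


# The five lines of the staff, bottom to top.
print([treble_note(step) for step in range(0, 9, 2)])
# The four spaces between them.
print([treble_note(step) for step in range(1, 8, 2)])
```

The modulo is what makes the seven letters repeat, and the integer division is what tells two notes with the same letter apart.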

This is what every note in G major maps to over two octaves, meaning that the G on the left will be played in a different way than the G on the right. (I know it’s confusing, but stick with me.)
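To make the "same letter, different note" idea concrete, here is a rough sketch that spells out a major scale over two octaves. The starting octave (G3, the violin’s lowest string) is my assumption, as are the names `major_scale` and `note_name`; the whole-step/half-step pattern is the standard one for any major scale.

```python
CHROMATIC = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
MAJOR_STEPS = [2, 2, 1, 2, 2, 2, 1]  # whole/half-step pattern of a major scale


def note_name(semitone: int) -> str:
    """Turn a semitone count from C0 into a name like 'F#4' (sharps only)."""
    return f"{CHROMATIC[semitone % 12]}{semitone // 12}"


def major_scale(tonic: str, octave: int, octaves: int = 1) -> list[str]:
    """List the notes of a major scale, starting on `tonic` in `octave`."""
    semitone = octave * 12 + CHROMATIC.index(tonic)
    notes = [note_name(semitone)]
    for _ in range(octaves):
        for step in MAJOR_STEPS:
            semitone += step
            notes.append(note_name(semitone))
    return notes


print(major_scale("G", 3, octaves=2))
```

The list contains three different Gs (G3, G4 and G5), and the only sharp is F#, which matches the single sharp in the G major key signature. Because the sketch only spells notes with sharps, it would misname the notes of flat keys.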

How these notes map to what your fingers do#

Simply put, sheet music is just a way of telling you what to do with your fingers and bow to play a piece of music. A violin has four strings: G, D, A and E.

Humans have four fingers. Each string on the violin has four (really eight) specific places where you can put down a finger while bowing to produce a note. This configuration is known as “first position”, because there are a few more places to put your hand along the neck, but this is the main one.

Because the violin is such an elitist instrument, these finger positions are not marked by anything, unlike the frets on a guitar. This means players need to build muscle memory of where their fingers should be. This is usually done with the help of tapes placed at the finger locations.

Fingering#

It took me years to finally stop using finger tapes. But something that I struggled with much longer was mentally mapping the notes on the page to the fingers I needed to have down. A way beginners get past this is by writing the finger numbers down on the sheet music like this. A quick search for “fingering” in the r/violin subreddit shows some more real-world examples.
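The note-to-finger mapping is mechanical enough to sketch in code. This is a simplified illustration and not how the hackathon app worked: the names `diatonic_index` and `first_position` are made up, the open strings are assumed to be G3, D4, A4 and E5, and sharps and flats are ignored (they change where the finger lands, not which finger you use).

```python
LETTERS = "CDEFGAB"
OPEN_STRINGS = ["G3", "D4", "A4", "E5"]  # violin strings, lowest to highest


def diatonic_index(note: str) -> int:
    """Count letter steps from C0, ignoring sharps and flats ('F#4' counts as F4)."""
    letter, octave = note[0], int(note[-1])
    return octave * 7 + LETTERS.index(letter)


def first_position(note: str) -> tuple[str, int]:
    """Return (string, finger) for a note in first position; finger 0 is the open string."""
    target = diatonic_index(note)
    # Walk from the highest string down so an open string wins over a fourth finger.
    for open_note in reversed(OPEN_STRINGS):
        finger = target - diatonic_index(open_note)
        if 0 <= finger <= 4:
            return open_note[0], finger
    raise ValueError(f"{note} is outside first position")


# Opening of "Mary Had a Little Lamb", written here in G major as an example.
melody = ["B4", "A4", "G4", "A4", "B4", "B4", "B4"]
print([first_position(note) for note in melody])
```

Each tuple is the string to play and the finger number a beginner would pencil in above the note. Preferring the open string over the fourth finger is a design choice that suits beginners; more advanced players often do the opposite.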

PhotoMath#

PhotoMath is an application that uses optical character recognition to let users scan handwritten or printed math problems into a calculator that solves the problem in front of you, even showing the steps. This technology has been around for years, and I was always hoping that one day something similar would come out for reading sheet music (a PhotoMath for sheet music). Ideally the final product would allow users to upload an image of sheet music, and it would overlay the proper fingerings or other useful information on the image.

Optical Music Recognition#

Optical music recognition (OMR) is a field of computer vision interested in reading musical notation from sheet music. It’s similar to optical character recognition although there is much less research in this area.

ML ain’t easy#

Full disclosure: machine learning and computer vision are basically magic to me. I was once tasked with building a quality control system as a research assistant for a lab at my university and failed miserably after sinking about six months of my life into the project. Turns out my experience writing dumb Python scripts doesn’t translate well to a field that requires a deep knowledge of advanced linear algebra and calculus.

However, in my desperate attempts to produce something of value, I did manage to pick up a few “tricks” which allowed me to experiment with the power of ML technology without understanding how it works. There are a bunch of tools that abstract the complexity of ML behind a couple of Python libraries so dummies like me can get a prototype working with little effort.

During this overnight Hackathon, my team and I managed to get a working proof-of-concept web app. When given a properly cropped excerpt of music, it returns an image with the proper fingerings overlaid. We even got it to work with several instruments. However, this would not have been possible if we had tried to build the model from scratch (even if you gave me six months).