Generate Music using Deep Learning
The plan is to develop a music generation model using deep learning, specifically RNNs (recurrent neural networks) and LSTMs (long short-term memory networks). The model will be trained on a music database.
The aim is to create a suitable architecture for implementing these models and modify the code such that the music generated aligns with key music theory concepts like key, time signature, and basic resolution.
The broad domain of the work is called Music Information Retrieval (MIR), which sits at the intersection of audio processing, deep learning, and some cognitive psychology.
To start, get familiar with the basics of music theory: https://youtu.be/xZgU57B3ZGg
Reference for Audio Processing - https://drive.google.com/file/d/1foZ8huM08RQG_ib6jHR19eLiSqvBYz1G/view?usp=drive_link
What distinguishes a musical piece from ordinary audio?
- Elements of sound that repeat after each bar of the time signature
- Scale (not always present, but a good assumption for a basic model)
- Rhythm
- BPM
- Frequency ranges of different instruments: guitars sit in the mids, bass in the lows and low mids, cymbals and hi-hats in the highs.
- Parts of a song such as verse, chorus, bridge, prechorus, intro, etc.
Music generated by LSTMs generally lacks macro-periodicity, i.e. the song sections mentioned above (main line, stanza) and a return to the chorus at regular intervals.
To capture the macro-periodicity and structure of music (like verses, choruses, bridges):
- Hierarchical Models: These can use lags (dependencies on events from many steps earlier) to learn how musical phrases relate to each other over longer periods, such as predicting when to introduce a chorus after a verse.
- Attention Mechanisms: These can focus on different parts of a musical piece, effectively handling long-term dependencies and ensuring the generation of music with coherent structure.
- Markov Models or Hidden Markov Models (HMMs): These borrow from classical time series analysis. They explicitly model transitions between states, which can be mapped to different sections of the song (e.g., verse, chorus), and so provide a structured way to handle sequences with clear periodic patterns.
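To make the Markov idea concrete, here is a minimal sketch of a first-order Markov chain over song sections. The section names and transition probabilities are hypothetical values chosen for illustration, not numbers learned from any dataset; in practice they would be estimated from annotated songs.

```python
import random

# Hypothetical transition probabilities between song sections
# (illustrative values only, not estimated from real data).
TRANSITIONS = {
    "intro": {"verse": 1.0},
    "verse": {"prechorus": 0.5, "chorus": 0.5},
    "prechorus": {"chorus": 1.0},
    "chorus": {"verse": 0.5, "bridge": 0.2, "outro": 0.3},
    "bridge": {"chorus": 1.0},
    "outro": {},  # terminal state: no outgoing transitions
}


def sample_structure(start="intro", max_sections=12, seed=None):
    """Sample a sequence of section labels from the Markov chain."""
    rng = random.Random(seed)
    sections = [start]
    while len(sections) < max_sections:
        options = TRANSITIONS[sections[-1]]
        if not options:  # reached a terminal section
            break
        names = list(options)
        weights = [options[name] for name in names]
        sections.append(rng.choices(names, weights=weights, k=1)[0])
    return sections


if __name__ == "__main__":
    print(sample_structure(seed=0))
```

A sampled section sequence like this could then condition a note-level generator (for example, an LSTM), so that the chorus material is reused each time the chain returns to the "chorus" state.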
Assignment 1
Evaluation metrics are critical in any machine learning scenario, and this one is no exception. Because the domain is niche, we have to define our own metrics according to what we want to measure. Assume you have generated a melody from a model trained on some training set. Define 3 metrics and a method for evaluating each (preferably mathematical). For example, one metric of musicality is rhythmic periodicity: how likely the generated audio is to follow a steady BPM. To evaluate it, first find the beat onsets, then measure the variance of the intervals between consecutive beats. Low variance implies a tighter BPM.
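The beat-interval example can be sketched in a few lines. This is a minimal illustration that assumes the beat onsets are already available as a list of times in seconds (from tapping or from any onset detector); the function name and the returned keys are hypothetical.

```python
from statistics import mean, pvariance


def rhythmic_regularity(onset_times):
    """Summarise how steady the beat is, given beat onset times in seconds.

    Returns the estimated tempo and the variance of the beat intervals.
    Lower variance means a tighter, more regular pulse.
    """
    if len(onset_times) < 3:
        raise ValueError("need at least three onsets to measure interval variance")
    # Time between each pair of consecutive beats.
    intervals = [b - a for a, b in zip(onset_times, onset_times[1:])]
    return {
        "bpm": 60.0 / mean(intervals),
        "interval_variance": pvariance(intervals),
    }


if __name__ == "__main__":
    steady = [0.0, 0.5, 1.0, 1.5, 2.0]   # perfectly even beats
    loose = [0.0, 0.4, 1.0, 1.45, 2.0]   # same span, uneven beats
    print(rhythmic_regularity(steady))
    print(rhythmic_regularity(loose))
```

Because the variance is expressed in seconds squared, it depends on tempo; dividing the standard deviation by the mean interval would give a tempo-independent variant if melodies at different BPMs need to be compared.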
- Rhythmic Regularity: assesses how well the generated melody sticks to a specified beat or pattern. A melody with higher rhythmic regularity is perceived as more rhythmically consistent.
- The beat onsets can be detected by manually tapping along or with an ODF (onset detection function).
- The time intervals between successive beat onsets can then be calculated.
- The variance of these intervals measures rhythmic periodicity. Lower variance corresponds to more rhythmic regularity.
- Melodic Contour Smoothness: measures how smooth the pitch transitions are from one note to another within the melody. Smooth contours are typically perceived as more natural and pleasing.
- The melody can be represented as a sequence of pitch values P = {p_1, p_2, ..., p_n}, where p_i is the pitch of the i-th note.
- Compute delta p_i = |p_(i+1) - p_i|, the absolute difference between consecutive pitch values, for i = 1, ..., n-1.
- Compute the mean and variance of the delta p_i values to quantify the smoothness of the melodic contour. A lower mean indicates smoother transitions.
- Phrase Structure: evaluates the coherence and organization of the melody into distinct musical phrases, each with a clear beginning, middle, and ending.
- The melody can be segmented into phrases based on musical criteria such as contour, rhythm, and harmonic progression.
- Measure the consistency and clarity of phrase boundaries by comparing structural elements (pitch patterns, rhythmic motifs, etc.) at the boundaries of the identified phrases.
- This metric is more qualitative and involves subjective evaluation.
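The last two metrics can be sketched as follows. This is a rough illustration under stated assumptions, not a definitive implementation: pitches are assumed to be MIDI note numbers, notes are assumed to be (onset in seconds, pitch) pairs, and phrases are split wherever the gap between onsets exceeds a hypothetical threshold. The phrase score is only one possible quantitative proxy (exact match of opening interval patterns) for what is otherwise a subjective judgment.

```python
from itertools import combinations
from statistics import mean, pvariance


def contour_smoothness(pitches):
    """Return (mean, variance) of absolute steps between consecutive pitches."""
    if len(pitches) < 2:
        raise ValueError("need at least two pitches")
    steps = [abs(b - a) for a, b in zip(pitches, pitches[1:])]
    return mean(steps), pvariance(steps)


def segment_phrases(notes, gap_threshold=1.0):
    """Split (onset_seconds, pitch) notes wherever the onset gap exceeds the threshold.

    The gap rule and the default threshold are assumptions for illustration.
    """
    if not notes:
        return []
    phrases = [[notes[0]]]
    for prev, cur in zip(notes, notes[1:]):
        if cur[0] - prev[0] > gap_threshold:
            phrases.append([])  # long gap: start a new phrase
        phrases[-1].append(cur)
    return phrases


def boundary_consistency(notes, gap_threshold=1.0, motif_length=3):
    """Fraction of phrase pairs whose opening interval patterns match exactly."""
    motifs = []
    for phrase in segment_phrases(notes, gap_threshold):
        pitches = [pitch for _, pitch in phrase[: motif_length + 1]]
        # Use pitch intervals so transposed repeats still count as a match.
        motifs.append(tuple(b - a for a, b in zip(pitches, pitches[1:])))
    pairs = list(combinations(motifs, 2))
    if not pairs:
        return 0.0  # fewer than two phrases: nothing to compare
    return sum(a == b for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    print(contour_smoothness([60, 62, 64, 65, 67]))  # stepwise melody
    print(contour_smoothness([60, 72, 55, 79, 60]))  # leap-heavy melody
```

Comparing intervals rather than absolute pitches means a phrase repeated in a different register still scores as consistent; a softer similarity measure (for example, edit distance between motifs) would be a natural refinement.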