TY - GEN
T1 - Evaluating language models of tonal harmony
AU - Sears, David R.W.
AU - Korzeniowski, Filip
AU - Widmer, Gerhard
PY - 2018/1/1
Y1 - 2018/1/1
AB - This study borrows and extends probabilistic language models from natural language processing to discover the syntactic properties of tonal harmony. Language models come in many shapes and sizes, but their central purpose is always the same: to predict the next event in a sequence of letters, words, notes, or chords. However, few studies employing such models have evaluated the most state-of-the-art architectures using a large-scale corpus of Western tonal music, instead preferring to use relatively small datasets containing chord annotations from contemporary genres like jazz, pop, and rock. Using symbolic representations of prominent instrumental genres from the common-practice period, this study applies a flexible, data-driven encoding scheme to (1) evaluate Finite Context (or n-gram) models and Recurrent Neural Networks (RNNs) in a chord prediction task; (2) compare predictive accuracy from the best-performing models for chord onsets from each of the selected datasets; and (3) explain differences between the two model architectures in a regression analysis. We find that Finite Context models using the Prediction by Partial Match (PPM) algorithm outperform RNNs, particularly for the piano datasets, with the regression model suggesting that RNNs struggle with particularly rare chord types.
UR - http://www.scopus.com/inward/record.url?scp=85069862591&partnerID=8YFLogxK
M3 - Conference contribution
T3 - Proceedings of the 19th International Society for Music Information Retrieval Conference, ISMIR 2018
SP - 211
EP - 217
BT - Proceedings of the 19th International Society for Music Information Retrieval Conference, ISMIR 2018
A2 - Gomez, Emilia
A2 - Hu, Xiao
A2 - Humphrey, Eric
A2 - Benetos, Emmanouil
PB - International Society for Music Information Retrieval
Y2 - 23 September 2018 through 27 September 2018
ER -