Beatles Genome Project Part I


Distributions and Transitions


This article describes analytics performed on a corpus of Beatles songs selected from Rolling Stone’s Top 100 Beatles songs. The source material is The Beatles Fakebook, which I typed into a wiki using TinyNotation from the Music21 project, following the approach in my discussion of music data entry.

The statistical analysis of music is an informatics field in its infancy. I’m deeply indebted to its pioneers, among them David Huron and David Temperley, who inspired me to pursue this analysis. If you have even a passing interest in this field, you are obliged to read their books Sweet Anticipation, Music and Probability, and The Cognition of Basic Musical Structures. I’m also indebted to Trevor DeClercq’s article “A Corpus Analysis of Rock Music,” which provides the Rolling Stone Top 100 and common-practice chord data that I compare against Lennon and McCartney. For the common-practice melody corpus, I use results from David Huron’s Sweet Anticipation.

I use music theory notation throughout this article; feel free to check out primers on Roman numeral chord notation (like IV -> V -> I) and scale degree notation (like 1 for the tonic, b3, and 5).

One thing you’ll notice in the evolution of music analytics is the increasing sophistication of computer tools for analysis, as technology improves and the field attracts more technically trained minds. Most recently, the Music21 project has brought the flexibility of Python scripting (which I use here) to musicology, and the beautiful user interface of — made by physicists just like myself! — means that flexible rock-music corpuses can be generated in no time flat. Imagine a rock music course in which students enter their own songs, which can then be analyzed and compared!

Chord Distributions

When someone learns that I’m a Beatles fan (who isn’t, really?), the first question they ask is, “Who do you like better: John or Paul?” Personally, I can’t decide! But the question leaves me wondering: are there statistical differences in their compositions that I could extract from the dataset? Will I find that McCartney’s more upbeat tunes have a stronger major quality? From a harmonic perspective, it turns out, there is very little difference.

Unsurprisingly, they both emphasize the common I, IV, and V, relying progressively less on the vi and ii. John clearly prefers his iii (think “Help!”) and iv (think “Nowhere Man”), whereas Paul is more strongly associated with the II (think of the refrain in “Yesterday” or “Eight Days a Week”).

How do they compare to the Rolling Stone rock corpus (RS100) and to the classical music corpus (known as the common practice)?

(The latter two datasets record only chord roots, not whether a chord is major or minor, hence the ambiguity for chords built on the fourth, fifth, sixth, and second scale degrees.)

The first thing to note is that all three corpuses sit on the I most often, about one-third of the time. Among the next two most common chords, the IV and the V, both the Beatles and the rock corpus give roughly equal weight. In the common practice, however, the V gets significantly more attention.

Interestingly, it turns out that the “rock chords” we often associate with The Beatles (for instance, the bVII -> IV -> I of “She Said She Said”) don’t actually appear in their work as often as they do in the rock corpus. If anything, the rock music that followed emphasized these chords significantly more. Another interesting detail: The Beatles love the minor vi, ii, and iii significantly more than the rock corpus does, bringing them closer in line with classical music. This fact reflects their connections to Tin Pan Alley and jazz.
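Because the RS100 and common-practice data record only roots, comparing them with the Beatles data means collapsing chord quality first. Here is a minimal Python sketch of that step, assuming each chord is stored as a bare Roman-numeral string (no inversions or sevenths) with an optional leading b or # accidental. The helper names and the toy chord list are hypothetical, not taken from the corpus or from Music21.

```python
from __future__ import annotations

from collections import Counter


def chord_root(roman: str) -> str:
    """Drop chord quality from a Roman numeral, keeping accidental and root.

    Hypothetical helper: 'vi' and 'VI' both map to 'VI', 'iv' maps to 'IV',
    and 'bVII' stays 'bVII'. Assumes bare numerals without figures like '7'.
    """
    accidental = ""
    body = roman
    # Peel off any leading accidentals so they are not upper-cased.
    while body and body[0] in "b#":
        accidental += body[0]
        body = body[1:]
    return accidental + body.upper()


def root_distribution(chords: list[str]) -> dict[str, float]:
    """Fraction of chords on each root, ignoring major/minor quality."""
    counts = Counter(chord_root(c) for c in chords)
    total = sum(counts.values())
    return {root: n / total for root, n in counts.items()}


# Toy chord sequence for illustration only (not the Beatles data).
chords = ["I", "vi", "IV", "V", "I", "iv", "I", "bVII", "IV", "I"]
dist = root_distribution(chords)
print(dist)
```

After this step, a Beatles "vi" and "VI" land in the same bucket as the root-only "VI" of the other corpora, so the three distributions can be compared root by root.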

Chord Transitions

If we examine chords in the time domain, we can construct a simple Markov chain by counting the transitions between consecutive chords. Above, I show the total number of transitions from each antecedent chord (left axis) to each consequent chord (bottom axis), encoded by color. In the Beatles corpus, the most common chord transitions are:

  1. I -> IV
  2. IV -> I
  3. V -> I
  4. I -> V
  5. IV -> V
  6. I -> vi
  7. I -> bVII
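Counting transitions amounts to tallying every pair of adjacent chords. The sketch below is one way to do it in plain Python, assuming each song is already a list of Roman-numeral strings; `count_transitions` and the toy progressions are hypothetical illustrations, not the code or data behind the plot.

```python
from __future__ import annotations

from collections import Counter
from collections.abc import Iterable, Sequence


def count_transitions(songs: Iterable[Sequence[str]]) -> Counter:
    """Count (antecedent, consequent) chord pairs across all songs.

    Pairs are formed only within a song, never across song boundaries.
    """
    counts: Counter = Counter()
    for chords in songs:
        # zip pairs each chord with the one that immediately follows it
        counts.update(zip(chords, chords[1:]))
    return counts


# Toy progressions for illustration only (not the Beatles corpus).
toy_songs = [
    ["I", "IV", "I", "V", "I"],
    ["I", "vi", "IV", "V", "I"],
    ["I", "bVII", "IV", "I"],
]

transitions = count_transitions(toy_songs)
for (a, b), n in transitions.most_common(3):
    print(f"{a} -> {b}: {n}")
```

These raw counts are the unnormalized transition matrix; each row sums to the number of times that chord was followed by another chord.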

The prevalence of the I as an antecedent shouldn’t be surprising, given how common the I is overall. Also expected is the emphasis on I -> IV -> V, but perhaps more surprising are the I -> bVII (which typically continues to IV) and the I -> vi. These last two are Lennon favorites; the former connotes the Mixolydian “circle of fourths” that became a dominant theme in pop/rock music, and the latter the doo-wop of the 1950s.

If we normalize each row, we get a sense of which chord is likely to come next, given an antecedent. By this metric, the most inevitable chord transitions are:

  1. IV -> I
  2. V -> I
  3. iv -> I
  4. v -> I
  5. bVII -> IV
  6. bIII -> IV
  7. II -> V
  8. III -> vi
  9. ii -> V
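Row normalization divides each count by its row total, turning the matrix into conditional probabilities P(consequent | antecedent). A minimal sketch follows, assuming the counts are held in a `Counter` keyed by (antecedent, consequent) tuples; `row_normalize` and the toy counts are hypothetical and only illustrate the arithmetic.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def row_normalize(pair_counts: Counter) -> dict[str, dict[str, float]]:
    """Convert (antecedent, consequent) counts into P(consequent | antecedent)."""
    row_totals: defaultdict[str, int] = defaultdict(int)
    for (antecedent, _), n in pair_counts.items():
        row_totals[antecedent] += n
    probs: dict[str, dict[str, float]] = {}
    for (antecedent, consequent), n in pair_counts.items():
        # setdefault creates the row the first time an antecedent appears
        probs.setdefault(antecedent, {})[consequent] = n / row_totals[antecedent]
    return probs


# Toy transition counts for illustration only (not the Beatles data).
toy_counts = Counter({
    ("IV", "I"): 6, ("IV", "V"): 2,
    ("bVII", "IV"): 3, ("bVII", "I"): 1,
    ("I", "IV"): 4, ("I", "V"): 3, ("I", "vi"): 3,
})
probs = row_normalize(toy_counts)

# Rank every transition by its conditional probability ("inevitability").
ranked = sorted(
    ((p, a, b) for a, row in probs.items() for b, p in row.items()),
    reverse=True,
)
for p, a, b in ranked[:3]:
    print(f"{a} -> {b}: {p:.2f}")
```

One caveat with this ranking: rows for rare antecedents are estimated from only a handful of transitions, so a high conditional probability there is less certain than the same value in a well-populated row.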

The minor substitutions iv and v are almost always used to color a transition back to I (such as the minor-four miracle move IV -> iv -> I). It is also notable how often IV follows chords borrowed from the parallel minor, such as bVII and bIII. The transitions II -> V and III -> vi feature the most common secondary dominants (V of V and V of vi).

Scale Degree Distributions and Transitions

If we express each melody note relative to the tonic of the key, we can examine how scale degrees are distributed in the melodies of Lennon and McCartney. Above, I show these distributions; in general the two composers use very similar profiles, just as with their chords. However, subtle differences do exist. Lennon is far more likely to use non-diatonic tones like b2, b5, and b6, and to a lesser extent b3 and b7 (the common blue notes). Interestingly, McCartney uses more of the 4th and 6th scale degrees, corroborating his greater use of the IV and vi chords (in particular, think of “Eleanor Rigby” or “She’s Leaving Home” and his many throwback tunes). Of particular interest is Lennon’s preference for the 5th scale degree over the 1st (tonic): while both notes are stable tones, the 5th is less so. Together, these observations hint at ways Lennon produced “edgier” songs.

Comparing Lennon and McCartney against the common practice, we see a much stronger presence of non-diatonic notes, the tonic, and the sixth, likely for the same reasons these features distinguished the two composers from each other.
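Expressing notes relative to the tonic comes down to pitch-class arithmetic modulo 12. The sketch below works on plain MIDI note numbers rather than Music21 objects and assumes the tonic is given as a pitch class (0 = C). It spells chromatic degrees with flats (b5 rather than #4) to match this article; the helper names and the toy melody are hypothetical.

```python
from __future__ import annotations

from collections import Counter

# Scale-degree labels by semitones above the tonic, spelled with flats.
DEGREE_LABELS = ["1", "b2", "2", "b3", "3", "4", "b5", "5", "b6", "6", "b7", "7"]


def scale_degree(midi_pitch: int, tonic_pc: int) -> str:
    """Label a MIDI pitch by its scale degree relative to a tonic pitch class."""
    # Python's % always returns a non-negative result for a positive modulus.
    return DEGREE_LABELS[(midi_pitch - tonic_pc) % 12]


def degree_distribution(melody: list[int], tonic_pc: int) -> dict[str, float]:
    """Fraction of notes on each of the 12 scale degrees (zeros included)."""
    counts = Counter(scale_degree(p, tonic_pc) for p in melody)
    total = sum(counts.values())
    return {d: counts[d] / total for d in DEGREE_LABELS}


# Toy melody in C major (MIDI 60 = middle C): do-re-mi-fa-sol-fa-mi-re-do.
melody = [60, 62, 64, 65, 67, 65, 64, 62, 60]
dist = degree_distribution(melody, tonic_pc=0)
print(dist)
```

Because every song is mapped onto the same twelve labels regardless of key, distributions from different songs and composers can be summed and compared directly.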

When examining the scale-degree transition matrix, it’s important to normalize each row, because the absolute (raw count) matrix differs little from what we would expect if consecutive notes were independent, namely the product of the antecedent and consequent scale-degree probabilities. The row-normalized matrix is what I plot above.

We see some expected behavior; for instance, most notes are followed by 1 or 5, which are the most common notes overall. Interestingly, bucking the usual pull of half-step leading tones, 3 is more likely to move down to 2 than up to 4, and 7 is more likely to move down to 6 than to resolve up to 1. Surprisingly, the strongest pull in the matrix is from 2 to 1.
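Since frequent notes like 1 and 5 dominate every row, one way to separate a genuine tendency from overall frequency is to divide each conditional probability by the consequent's marginal probability. This "lift" is a swapped-in illustration, not necessarily what the plots above show: a lift above 1 means the move happens more often than independence would predict. The sketch assumes a melody already converted to scale-degree labels; the function name and the toy sequence are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def transition_tables(
    degrees: list[str],
) -> tuple[dict[tuple[str, str], float], dict[str, float]]:
    """Return (P(next | prev), marginal P(next)) for a scale-degree sequence."""
    pairs = Counter(zip(degrees, degrees[1:]))
    row_totals: Counter = Counter()
    consequent_totals: Counter = Counter()
    for (prev, nxt), n in pairs.items():
        row_totals[prev] += n
        consequent_totals[nxt] += n
    cond = {(prev, nxt): n / row_totals[prev] for (prev, nxt), n in pairs.items()}
    # Under independence, every row would simply equal this marginal.
    n_pairs = sum(pairs.values())
    marginal = {d: n / n_pairs for d, n in consequent_totals.items()}
    return cond, marginal


# Toy scale-degree sequence for illustration only (not the Beatles data).
degrees = ["1", "2", "3", "2", "1", "5", "6", "5", "3", "2", "1"]
cond, marginal = transition_tables(degrees)

# Lift > 1: the move occurs more often than independence predicts.
lift = {pair: p / marginal[pair[1]] for pair, p in cond.items()}
print(sorted(lift.items(), key=lambda kv: kv[1], reverse=True)[:3])
```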

Interval Distributions and Transitions

Musical notes follow one another in time just like chords, so we can examine not only the distribution of individual notes but also of notes in pairs. Any two consecutive notes form an interval, which describes the pitch distance between them. However, because intervals don’t carry the same semantic content as scale degrees, differences between composers are harder to interpret. Above, I show the distribution of melodic intervals for their combined corpus, compared against the common practice.

Most melodies are relatively scalar (do-re-mi-fa-sol-fa-mi-re-do) with occasional leaps (do-sol-do), and this is reflected in the clustering of intervals around small values. Interestingly, small intervals tend to descend and leaps tend to ascend in both corpuses, a pattern corroborated both in the literature and by common experience: for instance, the melody of “Somewhere Over the Rainbow” begins with an upward octave leap followed by many downward steps.
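Signed melodic intervals are simply differences between consecutive pitches. The sketch below, again on plain MIDI numbers, measures the step/leap direction asymmetry described above. The cutoff of 2 semitones for a "step" is an assumption (whole and half steps count as steps; anything larger is a leap), repeated notes are ignored, and the function names and the toy phrase are hypothetical.

```python
from __future__ import annotations


def melodic_intervals(midi_pitches: list[int]) -> list[int]:
    """Signed intervals in semitones between consecutive notes (positive = up)."""
    return [b - a for a, b in zip(midi_pitches, midi_pitches[1:])]


def direction_bias(intervals: list[int], step_max: int = 2) -> tuple[float, float]:
    """Return (fraction of steps that descend, fraction of leaps that ascend).

    Steps are 1..step_max semitones; leaps are larger. Unisons (0) are skipped.
    Returns NaN for a category with no members.
    """
    steps = [i for i in intervals if 0 < abs(i) <= step_max]
    leaps = [i for i in intervals if abs(i) > step_max]
    steps_down = sum(i < 0 for i in steps) / len(steps) if steps else float("nan")
    leaps_up = sum(i > 0 for i in leaps) / len(leaps) if leaps else float("nan")
    return steps_down, leaps_up


# Toy phrase: an octave leap up, then a stepwise descent (not a transcription).
phrase = [60, 72, 71, 69, 67, 65, 64, 62, 60]
intervals = melodic_intervals(phrase)
steps_down, leaps_up = direction_bias(intervals)
print(intervals, steps_down, leaps_up)
```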

As with the absolute scale-degree transition matrix, the absolute interval transition matrix doesn’t differ substantially from what independent intervals would produce. Row normalizing, however, reveals an interesting trend: melodic intervals are often followed by the same interval in the opposite direction, so that an upward step is followed by a downward step and vice versa. This shows up as the downward diagonal in the plot above. This preference for arches and dips over straight upward or downward runs can be explained in part by the need to keep melodies within a limited vocal or instrumental range.
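The reversal tendency can be read directly off a row-normalized interval transition table as P(next interval = -i | current interval = i). A minimal sketch, assuming MIDI input; `interval_transition_probs`, `reversal_rate`, and the toy melody are hypothetical names and data chosen for illustration.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def interval_transition_probs(midi_pitches: list[int]) -> dict[tuple[int, int], float]:
    """Row-normalized P(next interval | current interval) for one melody."""
    intervals = [b - a for a, b in zip(midi_pitches, midi_pitches[1:])]
    pairs = Counter(zip(intervals, intervals[1:]))
    totals: defaultdict[int, int] = defaultdict(int)
    for (first, _), n in pairs.items():
        totals[first] += n
    return {(first, second): n / totals[first] for (first, second), n in pairs.items()}


def reversal_rate(probs: dict[tuple[int, int], float], interval: int) -> float:
    """P(next interval == -interval | current interval == interval); 0.0 if unseen."""
    return probs.get((interval, -interval), 0.0)


# Toy neighbor-tone melody: mostly up a step, then back down.
melody = [60, 62, 60, 62, 64, 62, 60]
probs = interval_transition_probs(melody)
print(reversal_rate(probs, 2), reversal_rate(probs, -2))
```

Summed over a whole corpus, the reversal entries (i, -i) are exactly the cells that form the diagonal described above.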

Continue to Beatles Genome Project Part II
