
2022-12-30

Top 2000: web scraping, interactive plots & holiday fun

Keywords: music, web scraping, data visualization, interactive data

Ever since I met my current partner, the Top 2000 has been an integral part of my holiday season. The Top 2000 is an annual Dutch music marathon based on the 2000 ‘best songs ever made’ (according to middle-aged Dutch people). The Top started around the turn of the century (the first edition aired in December 1999) and runs 24 hours a day from Christmas Eve till New Year’s. The number 1 (the best song EVER) is usually Queen’s Bohemian Rhapsody, which therefore ushers in the new year in most Dutch living rooms.

At first I didn’t care much for listening to a list in which I neither knew nor liked more than half of the songs, with a painfully high share of 70s and 80s rock and very little hip hop (or songs from after 2000, for that matter: my desperate attempt to vote Kanye West, Beyoncé, and Ariana Grande into the list failed miserably, with only 6 of my 35 picks actually making the cut). But I’m starting to understand the fun of not having to choose what to listen to for a week, of being annoyed at the old-fashioned choices that do make the cut, and of recognizing more and more songs as the week progresses.

Enter: data visualization

So, in the few days between Christmas and New Year’s, I decided to practice my data visualization skills. I wanted to try out Plotly’s interactive plots, and I was also eager to dip my toes into web scraping. So here we are: some loosely connected visualizations of the Top 2000, not made to support any particular conclusion (though they did confirm the bias towards 70s and 80s rock I had suspected), but just because it’s fun. All code and data can be found on GitHub.

I would highly recommend viewing the visualizations on a laptop or tablet instead of a smartphone.

Counting Genres & Supergenres

With the Excel sheet of the Top 2000 that NPO Radio 2 provides, it was easy enough to create a small interactive visualization of the songs, artists, and release years. But I thought one more dimension would add a lot of value: musical genre. To get the genre of each song I used Last.fm, which lists the genres of each artist’s music. I wrote a small loop and a few functions that build a link to each artist’s page on Last.fm, and with the BeautifulSoup package I extracted the genres from the HTML of that page. By simply counting the number of songs in each genre, I was able to create a fun plot, shown below (also visible here). Note that songs are counted double (or triple, quadruple, or quintuple), as most of them are tagged with more than one genre.
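A minimal sketch of the scraping and counting steps might look like the following. It is not the code from the repository. Last.fm artist pages do live under `https://www.last.fm/music/<artist>`, but the tag markup assumed here (`<li class="tag"><a href="/tag/rock">rock</a></li>`) may differ from the live site. To keep the example dependency-free it swaps BeautifulSoup for Python’s built-in `html.parser`. The helper names (`artist_url`, `extract_genres`, `count_genres`) are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from html.parser import HTMLParser
from urllib.parse import quote_plus

LASTFM_BASE = "https://www.last.fm/music/"


def artist_url(artist: str) -> str:
    """Build the Last.fm page URL for an artist (spaces become '+')."""
    return LASTFM_BASE + quote_plus(artist)


class TagParser(HTMLParser):
    """Collect link text inside <li class="tag"> items.

    ASSUMPTION: tags are marked up roughly as
    <li class="tag"><a href="/tag/rock">rock</a></li>; verify against the live page.
    """

    def __init__(self) -> None:
        super().__init__()
        self.in_tag_item = False
        self.in_link = False
        self.tags: list[str] = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "li" and "tag" in classes:
            self.in_tag_item = True
        elif tag == "a" and self.in_tag_item:
            self.in_link = True

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = False
        elif tag == "li":
            self.in_tag_item = False

    def handle_data(self, data):
        text = data.strip()
        if self.in_link and text:
            self.tags.append(text.lower())


def extract_genres(html: str) -> list[str]:
    """Return the artist's tags in page order, lower-cased."""
    parser = TagParser()
    parser.feed(html)
    parser.close()
    return parser.tags


def count_genres(genres_per_song: list[list[str]]) -> Counter:
    """Count every tag of every song: a song with 3 tags adds 1 to 3 genres."""
    return Counter(genre for genres in genres_per_song for genre in genres)


if __name__ == "__main__":
    sample = (
        '<ul class="tags-list"><li class="tag"><a href="/tag/rock">Rock</a></li>'
        '<li class="tag"><a href="/tag/glam+rock">glam rock</a></li></ul>'
    )
    print(artist_url("Guns N' Roses"))  # https://www.last.fm/music/Guns+N%27+Roses
    print(extract_genres(sample))       # ['rock', 'glam rock']
    print(count_genres([["rock", "glam rock"], ["rock", "pop"]]))
```

To run it against the real site you would download each page (for example with `urllib.request.urlopen(artist_url(artist))`) and pass the decoded HTML to `extract_genres`. With BeautifulSoup, the parser class shrinks to something like `[a.get_text(strip=True) for a in soup.select("li.tag a")]`.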

These genres were quite specific in some cases, which makes them fun to look through but not very descriptive. I therefore generalized the genres into several supergenres, combining a few lists by Multimediaeval and adding genres where necessary (for example ‘nederhop’, ‘nederpop’, and ‘gabber’). For this I used only the first genre of each song (or the second genre if the first one didn’t have a supergenre). This resulted in the graph shown below (also visible here).
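The first-genre rule with its fallback boils down to a dictionary lookup. The table below is a tiny hypothetical excerpt, not the actual Multimediaeval-based mapping, and `song_supergenre` is an illustrative name. Unlike the genre counts above, each song now counts exactly once.

```python
from collections import Counter

# HYPOTHETICAL excerpt of a genre -> supergenre table; the real table combined
# Multimediaeval lists with hand-added genres such as 'nederpop' and 'gabber'.
SUPERGENRE = {
    "rock": "rock",
    "classic rock": "rock",
    "hard rock": "rock",
    "pop": "pop",
    "nederpop": "pop",
    "hip-hop": "hip hop",
    "nederhop": "hip hop",
    "gabber": "electronic",
}


def song_supergenre(genres, table=SUPERGENRE, max_tags=2):
    """Supergenre of the first genre that has one, checking at most `max_tags`.

    With max_tags=2 this mirrors the rule in the text: use the first genre,
    or the second if the first has no supergenre. Returns None otherwise.
    """
    for genre in genres[:max_tags]:
        if genre in table:
            return table[genre]
    return None


if __name__ == "__main__":
    songs = [["classic rock", "rock"], ["seen live", "nederpop"], ["dutch", "seen live", "pop"]]
    supergenres = [song_supergenre(g) for g in songs]
    print(supergenres)  # ['rock', 'pop', None]
    print(Counter(supergenres))
```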

Visualizing all songs

When I started out, I had already created one plot whose look I liked: a scatterplot with every song as a dot. But it needed more content, and the supergenres were the third variable I was looking for. Try out this plot (also visible here).
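One way to build such a scatterplot is to add one marker trace per supergenre, so that clicking Plotly’s legend toggles genres on and off. The sketch below assembles the figure as a plain dictionary in Plotly’s JSON figure format. It assumes each song is a dict with the hypothetical keys `title`, `artist`, `year`, `position`, and `supergenre`. The axis choice (release year against list position) is also an assumption.

```python
from collections import defaultdict


def build_scatter_figure(songs, genre_key="supergenre"):
    """Build a Plotly figure spec (plain dict) with one marker trace per genre.

    Songs without a value for `genre_key` are grouped under 'unknown'.
    """
    groups = defaultdict(list)
    for song in songs:
        groups[song.get(genre_key) or "unknown"].append(song)

    traces = []
    for genre in sorted(groups):
        members = groups[genre]
        traces.append({
            "type": "scatter",
            "mode": "markers",
            "name": genre,
            "x": [s["year"] for s in members],
            "y": [s["position"] for s in members],
            "text": [f"{s['artist']} - {s['title']}" for s in members],
            # Show "artist - title" plus year and position on hover.
            "hovertemplate": "%{text}<br>%{x}, #%{y}<extra></extra>",
        })

    layout = {
        "xaxis": {"title": {"text": "Release year"}},
        # Reverse the y-axis so that number 1 sits at the top.
        "yaxis": {"title": {"text": "Position"}, "autorange": "reversed"},
    }
    return {"data": traces, "layout": layout}


if __name__ == "__main__":
    fig = build_scatter_figure([
        {"title": "Bohemian Rhapsody", "artist": "Queen", "year": 1975,
         "position": 1, "supergenre": "rock"},
    ])
    print([trace["name"] for trace in fig["data"]])  # ['rock']
```

Plotly accepts this structure directly: `plotly.graph_objects.Figure(fig).write_html("top2000.html")` writes a standalone interactive page. The `genre_key` parameter lets the same function color the dots by a different grouping.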

What I don’t love is that the supergenres are not as general as I would like for this type of plot. I therefore created ‘supersupergenres’, which generalize the supergenres once more to give a slightly more intuitive sense of the data (also visible here). I must admit, though, that my musical knowledge is limited, so some of the groupings might be unusual.
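The extra level is just a second lookup applied to each song’s supergenre. The groupings and the name `supersupergenre` below are hypothetical placeholders, not the ones used in the plot (and, like the originals, they are open to debate).

```python
# HYPOTHETICAL supergenre -> supersupergenre table, for illustration only.
SUPERSUPERGENRE = {
    "rock": "rock & metal",
    "metal": "rock & metal",
    "punk": "rock & metal",
    "pop": "pop",
    "hip hop": "urban",
    "r&b": "urban",
    "electronic": "electronic & dance",
    "dance": "electronic & dance",
}


def supersupergenre(supergenre, table=SUPERSUPERGENRE):
    """Collapse a supergenre one level further; missing or unmapped ones become 'other'."""
    if supergenre is None:
        return "other"
    return table.get(supergenre, "other")


if __name__ == "__main__":
    for sg in ["punk", "hip hop", "folk", None]:
        print(sg, "->", supersupergenre(sg))
```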

Of course, there are a few shortcomings. A genre per song instead of per artist might have been more descriptive, but I couldn’t find a suitable website for that. I also would have liked to base the supergenre on all 4 or 5 genres instead of only the first one, but this got very complicated very fast. Some artists were also not listed on Last.fm; I fixed some of these manually, but I’m sure the information on Dutch artists is somewhat more limited. Finally, the mapping of subgenres to supergenres is also quite limited, although I tried to get the most out of it.
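For what it’s worth, one possible way to use all of a song’s tags without the logic exploding is a weighted vote: map every tag to its supergenre and let earlier tags count more. This is not what the post does. The 1/rank weighting, the example table, and the name `vote_supergenre` are assumptions for illustration.

```python
from collections import Counter

# HYPOTHETICAL mini table, only to make the example runnable.
EXAMPLE_TABLE = {"pop": "pop", "rock": "rock", "hard rock": "rock", "classic rock": "rock"}


def vote_supergenre(genres, table):
    """Pick the supergenre best supported across all of a song's tags.

    The tag at position k (1-based) contributes 1/k to its supergenre, so the
    first tag weighs most, but several later tags can together outvote it.
    Tags without a supergenre are skipped. Returns None if none map.
    """
    scores = Counter()
    for rank, genre in enumerate(genres, start=1):
        supergenre = table.get(genre)
        if supergenre is not None:
            scores[supergenre] += 1 / rank
    if not scores:
        return None
    return scores.most_common(1)[0][0]


if __name__ == "__main__":
    # pop: 1.0 vs rock: 0.5 + 0.33 = 0.83
    print(vote_supergenre(["pop", "rock", "classic rock"], EXAMPLE_TABLE))  # pop
    # pop: 1.0 vs rock: 0.5 + 0.33 + 0.25 = 1.08
    print(vote_supergenre(["pop", "rock", "hard rock", "classic rock"], EXAMPLE_TABLE))  # rock
```

The weighting is the main design knob: equal weights turn this into a plain majority vote, while steeper weights (say 1/rank²) drift back towards the first-genre rule.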

I hope you enjoy these plots and can find your own song picks in them (and agree with their genre!). Happy New Year!