metamuffin's personal website


Correlating music artists

I listen to a lot of music, so every few months my music collection gets boring again. So far I have asked friends to recommend music to me, but I am running out of friends now too. So I came up with a new solution over a few days.

I want to find new music that I might like too. After some research I found MusicBrainz (an open database that aims to catalogue artists and recordings) and ListenBrainz (a service to which you can submit what you are listening to). Both databases are useful for this project. The high-level goal is to find out what people who have a lot of music in common with me like to listen to. For that, the number of listeners that each pair of artists shares is relevant. I use the word 'listen' to refer to one playthrough of a track.

The Procedure

Parse data & drop unnecessary detail

All of the ListenBrainz JSON files are parsed, and only the information about how many listens each user has submitted for which artist is kept. The result is stored in a B-tree map on my disk (the sled library is great for that).

The B-tree stores its entries in key order, so I can iterate through all artists of a user by scanning the prefix (user, …).
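A minimal sketch of such a prefix scan. It uses an in-memory `std::collections::BTreeMap` as a stand-in for the on-disk tree, and the key layout (user, artist) -> listen count as well as the names `ListenCounts`, `add_listen` and `artists_of` are assumptions made for illustration, not the code used for this project.

```rust
use std::collections::BTreeMap;

/// Assumed key layout: (user, artist) -> number of listens.
type ListenCounts = BTreeMap<(String, String), u64>;

/// Record one listen of `artist` by `user` (hypothetical helper).
fn add_listen(counts: &mut ListenCounts, user: &str, artist: &str) {
    *counts
        .entry((user.to_string(), artist.to_string()))
        .or_insert(0) += 1;
}

/// Return all (artist, listens) pairs of one user.
/// Because keys are ordered, every key of `user` sits in one contiguous run
/// that starts at (user, "") and ends when the user component changes.
fn artists_of(counts: &ListenCounts, user: &str) -> Vec<(String, u64)> {
    let start = (user.to_string(), String::new());
    counts
        .range(start..)
        .take_while(|((u, _), _)| u.as_str() == user)
        .map(|((_, artist), n)| (artist.clone(), *n))
        .collect()
}
```

Calling `artists_of(&counts, "alice")` only touches the keys of that one user, which is what makes the ordered layout useful here.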

Create a graph

Next, an undirected graph with weighted edges is generated, where nodes are artists and edges represent shared listens. For each user and each pair of artists that user listens to, the weight of the edge connecting the pair is incremented by the sum of the logarithms of the user's playthrough counts for the two artists. This means that artists that share listeners are connected, and because of the logarithms, a user who listens to an artist a lot does not add weight in proportion to their listen count.

Mapping: (artist, artist) -> weight. (Every key (x, y) is treated as identical to (y, x), so that edges are undirected.)
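The weight update could be sketched roughly like this. Several things here are assumptions rather than details from the project: the natural logarithm as the base, storing each edge under both key orders to make it undirected, and the names `Graph` and `add_user`. An in-memory `BTreeMap` again stands in for the on-disk tree.

```rust
use std::collections::BTreeMap;

/// (artist, artist) -> accumulated weight.
/// Assumption: (x, y) and (y, x) are both stored and always updated together.
type Graph = BTreeMap<(String, String), f64>;

/// Add one user's contribution to the graph (hypothetical helper).
/// `listens` holds that user's (artist, listen count) pairs, with each artist
/// appearing once and every count being at least 1.
fn add_user(graph: &mut Graph, listens: &[(String, u64)]) {
    for (i, (a, count_a)) in listens.iter().enumerate() {
        // Visit every unordered pair of artists exactly once.
        for (b, count_b) in &listens[i + 1..] {
            // Sum of the logarithms of both listen counts (base e assumed).
            // Note that ln(1) = 0, so single listens add no weight here.
            let w = (*count_a as f64).ln() + (*count_b as f64).ln();
            *graph.entry((a.clone(), b.clone())).or_insert(0.0) += w;
            *graph.entry((b.clone(), a.clone())).or_insert(0.0) += w;
        }
    }
}
```

With this rule, a user with 100 listens of one artist contributes ln(100) ≈ 4.6 for that artist instead of 100, which is the dampening effect described above.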

Query artists

The graph tree can now be queried by scanning with a prefix of one artist, e.g. ("The Beatles", …), and all correlated artists are returned with a weight. The top-weighted results are kept and saved.
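A query along these lines might look like the following sketch. It assumes that every edge is reachable under the queried artist's prefix (for example because both key orders are stored), uses an in-memory `BTreeMap` in place of the on-disk tree, and the name `top_correlated` is made up for illustration.

```rust
use std::collections::BTreeMap;

/// (artist, artist) -> weight, with every edge assumed to be present
/// under the prefix of each of its two artists.
type Graph = BTreeMap<(String, String), f64>;

/// Return the `n` most strongly correlated artists for `artist`,
/// heaviest edge first (hypothetical helper).
fn top_correlated(graph: &Graph, artist: &str, n: usize) -> Vec<(String, f64)> {
    // Prefix scan: start at (artist, "") and stop once the first
    // key component is no longer the queried artist.
    let start = (artist.to_string(), String::new());
    let mut hits: Vec<(String, f64)> = graph
        .range(start..)
        .take_while(|((a, _), _)| a.as_str() == artist)
        .map(|((_, other), w)| (other.clone(), *w))
        .collect();
    // Sort by descending weight; total_cmp provides a total order on f64.
    hits.sort_by(|x, y| y.1.total_cmp(&x.1));
    hits.truncate(n);
    hits
}
```

Sorting the whole result is the simplest option; for artists with very many neighbours, a bounded heap would avoid holding all of them at once.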

Notes

Two issues appeared during this project that led to the following fixes:

Results

In a couple of minutes I rendered about 2.2 million HTML documents with my results. They are available at https://metamuffin.org/artist-correl/{name}. Some example links:

Numbers

Article written by metamuffin, text licenced under CC BY-ND 4.0, non-trivial code blocks under GPL-3.0-only except where indicated otherwise