Demo: https://www.youtube.com/watch?v=bk7PKtHLudE
Demo video description follows.
This is a demo of a visualization of music semantics on the Spotify streaming platform.
It was created with the following tools in the Python programming language:
- Spotify and its Web API for gathering music previews from the platform's 126 genres
- OpenL3 for converting those previews into deep audio embeddings for comparison
- scikit-learn and NumPy for processing the embeddings via K-means clustering and Principal Component Analysis
- Matplotlib for rendering the Voronoi diagram
- mplcursors for mouse interactivity
- Pygame for playing audio
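A rough sketch of how the processing steps listed above could fit together, using random vectors in place of real OpenL3 embeddings. The embedding size (512), the cluster count (12), running K-means on the 2D projection rather than on the full embeddings, and building the diagram with SciPy's `Voronoi` (SciPy is not in the tool list above) are all assumptions made for illustration, not details taken from the project.

```python
import numpy as np
from scipy.spatial import Voronoi
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Stand-in data: 945 tracks, one pooled embedding per track.
# The dimension of 512 is an assumption for this sketch.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(945, 512))

# Principal Component Analysis: project each track to a 2D plot position.
points_2d = PCA(n_components=2, random_state=0).fit_transform(embeddings)

# K-means: assign each track to a cluster (12 is an arbitrary choice here).
kmeans = KMeans(n_clusters=12, n_init=10, random_state=0)
labels = kmeans.fit_predict(points_2d)

# Voronoi diagram with one cell per track (assumed layout).
vor = Voronoi(points_2d)
```

With Matplotlib installed, `scipy.spatial.voronoi_plot_2d(vor)` would draw the cells, and `labels` could be used to color them by cluster.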
The results so far are extremely promising, even with my very small sample size of 945 songs. I hope to scale this up and play around with different clustering algorithms to eventually create autogenerated sets for a radio show at MIT's WMBR. Stay tuned, possibly.
A collection of insights and other notes I made while explaining the project to some friends:
- Something I whipped up. I used Spotify's API to download the previews you hear when interacting with an embedded track or album on another site, and ran them through this deep audio embeddings model: https://github.com/marl/openl3
- The dataset includes 10 songs from each one of Spotify's 126 genres.
- I'm aiming to get to around 100K songs, but TensorFlow doesn't support the latest version of CUDA yet, so I had to let the generation of this dataset, about 1K songs, run on my machine overnight. I'm probably just going to need to downgrade CUDA, which is going to be a major pain.
- I'm hoping to use this for my WMBR radio programs; I think there are a lot of cool ways this can be used to autogenerate sets.
- Big brain idea:
  - Get tensorflow-metal working on my Mac [I recently acquired a Mac Studio to use alongside my Linux tower].
  - Have the PC churn for a few days.
  - ?
  - Profit!
- Added support for playing the song I'm hovering over using Pygame; very fun!
- Holy cow, this actually works!
- The rock is near the other rock, the Spanish music is near the other Spanish music!
- The actual clustering isn't all that great, probably because I have too many clusters and too few data points, but the embeddings work great, as does my decision to use mean pooling (each preview actually generates 30 embeddings, which I then have to combine).
- This is amazing and hilarious! I can go from Aerosmith to Travis Scott to Weezer to Louis Armstrong in a few flicks of the mouse. I'm basically emulating some insanely indecisive guy flicking between radio stations at a mile a minute with zero latency.
- There are some songs which do seem too close to one another, so I probably need to play around with adding more dimensions to the embedding, switching between max and mean pooling, and playing around with the params of my clustering and dimension reduction algos.
- Man, making this graph interactive and play audio was a stroke of genius on my part, not to be too self-aggrandizing, of course. I basically just cobbled a bunch of technologies I read about together.
- One thing I noticed is that the intensity rises as you get closer to the center.
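For anyone curious about the pooling step mentioned above, here is a minimal sketch of mean versus max pooling over the 30 embeddings from a single preview. The helper name `pool_embeddings` and the toy 4-dimensional frames are hypothetical, chosen only to keep the numbers easy to check; real embeddings would be much wider.

```python
import numpy as np

def pool_embeddings(frames: np.ndarray, method: str = "mean") -> np.ndarray:
    """Collapse a (n_frames, dim) array into a single (dim,) vector."""
    if method == "mean":
        return frames.mean(axis=0)  # average each dimension across frames
    if method == "max":
        return frames.max(axis=0)   # keep the largest value per dimension
    raise ValueError(f"unknown pooling method: {method}")

# Toy stand-in for one preview: 30 frame embeddings of dimension 4.
frames = np.arange(120, dtype=float).reshape(30, 4)

mean_vec = pool_embeddings(frames, "mean")  # [58., 59., 60., 61.]
max_vec = pool_embeddings(frames, "max")    # [116., 117., 118., 119.]
```

Mean pooling smooths over the whole preview, while max pooling keeps only the strongest activation per dimension, so switching between them can change which songs end up close together.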