Creating a Map of Music with Spotify and Deep Audio Embeddings

Demo: https://www.youtube.com/watch?v=bk7PKtHLudE

Demo video description follows.

This is a demo of a visualization of music semantics on the Spotify streaming platform.

It was created with the following tools in the Python programming language:

Spotify and its Web API for gathering music previews from the platform's 126 genres
OpenL3 for converting those previews into deep audio embeddings for comparison
scikit-learn and NumPy for processing the embeddings via K-means clustering and Principal Component Analysis (PCA)
Matplotlib for rendering the Voronoi diagram
mplcursors for mouse interactivity
Pygame for playing audio
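
As a rough illustration of the processing step in the list above, here is a minimal sketch of K-means clustering followed by a PCA projection to two dimensions. It is not the project's actual code: the function name `cluster_and_project`, the 512-dimensional random vectors standing in for real OpenL3 embeddings, the cluster count of 8, and the choice to cluster in the full embedding space before projecting are all assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA


def cluster_and_project(embeddings, n_clusters=8, seed=0):
    """Cluster track embeddings with K-means, then project them to 2D with PCA.

    `embeddings` is an (n_tracks, n_dims) array with one vector per track.
    Clustering before projecting is an assumption of this sketch.
    """
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = kmeans.fit_predict(embeddings)

    # Reduce to two components so every track gets an (x, y) position on the map.
    points_2d = PCA(n_components=2, random_state=seed).fit_transform(embeddings)
    return labels, points_2d


# Random data stands in for real embeddings (hypothetical: 945 tracks, 512 dims).
rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(945, 512))

labels, points_2d = cluster_and_project(fake_embeddings)
print(labels.shape, points_2d.shape)
```

The 2D points are what a scatter plot or Voronoi diagram would be drawn from, with the cluster labels available for coloring.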

The results so far are extremely promising, even with my very small sample size of 945 songs. I hope to scale this up and play around with different clustering algorithms to eventually create autogenerated sets for a radio show at MIT's WMBR. Stay tuned, possibly.

A collection of insights and other notes I made while explaining the project to some friends:

Something I whipped up. I used Spotify's API to download the audio previews you hear when interacting with an embedded track or album on another site, and ran them through this deep audio embeddings model: https://github.com/marl/openl3
The dataset includes 10 songs from each one of Spotify's 126 genres.
I'm aiming to get to around 100K songs, but TensorFlow doesn't support the latest version of CUDA yet, so I had to let the generation of this dataset, about 1K songs, run on my machine overnight. I'm probably just going to need to downgrade CUDA, which is going to be a major pain.
I'm hoping to use this for my WMBR radio programs; I think there are a lot of cool ways this can be used to autogenerate sets.
Big brain idea:
1. Get tensorflow-metal working on my Mac [I recently acquired a Mac Studio to use alongside my Linux tower].
2. Have the PC churn for a few days.
3. ?
4. Profit!
Added support for playing the song I'm hovering over using Pygame; very fun!
Holy cow, this actually works!
The rock is near the other rock, the Spanish music is near the other Spanish music!
The actual clustering isn't all that great, probably because I have too many clusters and too few data points, but the embeddings work great, along with my decision to use mean pooling (each preview actually generates 30 embeddings, which I then have to combine).
This is amazing and hilarious! I can go from Aerosmith to Travis Scott to Weezer to Louis Armstrong in a few flicks of the mouse. I'm basically emulating some insanely indecisive guy flicking between radio stations at a mile a minute with zero latency.
There are some songs which do seem too close to one another, so I probably need to play around with adding more dimensions to the embedding, switching between max and mean pooling, and playing around with the params of my clustering and dimension reduction algos.
Man, making this graph interactive and play audio was a stroke of genius on my part, not to be too self-aggrandizing, of course. I basically just cobbled a bunch of technologies I read about together.
One thing I noticed is that the intensity rises as you get closer to the center.
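
The notes above mention combining the 30 embeddings each preview produces into one vector with mean pooling, and possibly trying max pooling instead. Here is a minimal sketch of both using NumPy. The helper name `pool_embeddings` and the 512-dimensional random frames are assumptions for illustration, not the project's actual code.

```python
import numpy as np


def pool_embeddings(frames, method="mean"):
    """Collapse per-frame embeddings of shape (n_frames, n_dims) into one vector.

    "mean" averages each dimension over the frames; "max" keeps the largest
    value seen in each dimension.
    """
    frames = np.asarray(frames, dtype=float)
    if frames.ndim != 2:
        raise ValueError("expected a 2D array of shape (n_frames, n_dims)")
    if method == "mean":
        return frames.mean(axis=0)
    if method == "max":
        return frames.max(axis=0)
    raise ValueError(f"unknown pooling method: {method!r}")


# Hypothetical input: 30 frame embeddings for one preview, 512 dimensions each.
rng = np.random.default_rng(0)
frames = rng.normal(size=(30, 512))

mean_vector = pool_embeddings(frames, "mean")
max_vector = pool_embeddings(frames, "max")
print(mean_vector.shape, max_vector.shape)
```

Mean pooling smooths over the whole preview, while max pooling emphasizes the strongest activation in each dimension, so switching between them can change which songs end up close together.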

Repository: SuperSonicHub1/spotify-audio-embeddings
