TalkPlayData 2
Raw JSON data from conversational music recommendation sessions
This is a demo page for our paper: "TalkPlayData 2: An Agentic Synthetic Data Pipeline for Multimodal Conversational Music Recommendation".
On this page, we currently show 3 data points from the test set as a preview. The test set already contains 1,000 data points, and we plan to release over 20,000 conversations for the training set by the end of September.
Links:
- 📄 Paper: PDF (First Version) - arXiv submission pending
- 📊 Dataset: Hugging Face (currently test set with 1,000 conversations available)
- 💻 Generation Code: GitHub (includes verbatim copies of all prompts + working public version)
Note: The generation code does not include the base dataset (LFM-2B and Spotify previews) due to copyright restrictions.
What is TalkPlayData 2?
We generate realistic conversation data for music recommendation research, covering various conversation scenarios and the multimodal aspects of music.
We achieve this by letting two LLMs talk to each other about music, since LLMs can talk to each other coherently and naturally. But we need grounding music data to make the conversation realistic and relevant to real music items. So, we use a listening session dataset to condition the conversation, where each session's tracks become the recommendation pool of the resulting conversation.
But that's not enough. We also condition the Listener LLM with a conversation goal, which the Goal LLM tailors to the recommendation pool. And to do it better, we also condition the Listener LLM with a Listener Profile, which is based on basic demographics and inferred information about the listener.
Information Imbalance is an important aspect of TalkPlayData 2. This Listener Profile is shared with the Goal LLM, Listener LLM, and the Recsys LLM. However, the conversation goal is shared only with the Listener LLM, which queries and responds to the Recsys LLM to achieve the goal. The Recsys LLM doesn't know the goal, but it knows the recommendation pool and recommends music each turn based on the Listener LLM's messages.
As a result, this is essentially an agentic pipeline, simulating real-world conversations between a listener and a music recommender.

Pipeline for generating conversational music recommendation data
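To make the flow concrete, here is a minimal sketch of the generation loop described above. All names in it (GoalLLM-style objects, write_goal, speak, recommend, and so on) are hypothetical placeholders for illustration, not the actual TalkPlayData 2 code.

```python
# Minimal sketch of the agentic generation loop. Every name below
# (goal_llm, listener_llm_cls, recsys_llm_cls, and their methods)
# is a hypothetical placeholder, not the actual TalkPlayData 2 API.

def generate_conversation(goal_llm, listener_llm_cls, recsys_llm_cls,
                          session_tracks, listener_profile, max_turns=10):
    pool = session_tracks  # one listening session -> the recommendation pool

    # The Goal LLM writes a conversation goal conditioned on the pool and profile.
    goal = goal_llm.write_goal(pool, listener_profile)

    # Information imbalance:
    #   - the Listener LLM knows the goal and the profile,
    #   - the Recsys LLM knows the pool and the profile, but NOT the goal.
    listener = listener_llm_cls(goal=goal, profile=listener_profile)
    recsys = recsys_llm_cls(pool=pool, profile=listener_profile)

    conversation = []
    for _ in range(max_turns):
        query = listener.speak(conversation)                  # listener pursues its goal
        track, reply = recsys.recommend(conversation, query)  # recsys picks from the pool
        conversation.append({"role": "listener", "text": query})
        conversation.append({"role": "recsys", "text": reply, "track": track})
        if listener.is_goal_achieved(conversation):
            break
    return conversation
```

The key point is that the goal never reaches the Recsys LLM; it only shapes the Listener LLM's queries, so the Recsys LLM has to infer the listener's intent from the conversation itself, as a real recommender would.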
Multimodal and Conversational, in one LLM.
In our pipeline, all the LLMs are multimodal - they can listen to audio and see images, in addition to understanding various sub-modalities within text, such as lyrics and chords.
Not to mention they're conversational - just like in TalkPlay 1, this is why we use LLMs as the recommendation engine.
This follows the broader trend of bigger models and fewer components - extending the scope of a single model (the LLM). It is an important contrast to existing systems, which use different, separate components to handle different modalities, connected non-differentiably.
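As an illustration of what "one multimodal LLM" means here, the snippet below builds a single chat-style message that carries metadata, lyrics, album art, and an audio preview together. The schema and file names are assumptions for illustration, not the pipeline's actual message format.

```python
# Hypothetical chat-style message carrying all modalities of one track at once.
# Field names and file paths are illustrative assumptions, not the real schema.
track_message = {
    "role": "user",
    "content": [
        {"type": "text",  "text": "Track: 'Example Song' by Example Artist (2019)"},
        {"type": "text",  "text": "Lyrics excerpt: ..."},    # text sub-modality
        {"type": "image", "path": "album_cover.jpg"},        # album art
        {"type": "audio", "path": "audio_preview_30s.mp3"},  # preview clip
    ],
}

# A single multimodal LLM consumes this message directly, instead of routing
# each part through a separate, non-differentiably connected component.
```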
So, what can you do with TalkPlayData 2?
We have released the code - you can generate your own data.
Or you can use our test split to evaluate your system.
Most importantly, with the train split (to be released soon), you can train a music recommender that is natively conversational and multimodal.
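For example, evaluating against the test split could look roughly like this, assuming the standard Hugging Face datasets library. The repository id and field names below are placeholders, so check the dataset page linked above for the actual ones.

```python
from datasets import load_dataset

# Placeholder repository id and field names - see the Hugging Face link above.
test_set = load_dataset("your-org/talkplaydata-2", split="test")

for conversation in test_set.select(range(3)):   # e.g., the 3 preview data points
    for turn in conversation["turns"]:           # assumed field name
        print(turn["role"], ":", turn.get("text", ""))
```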