
Personalized audio auto-tagging as proxy for contextual music recommendation

Abstract : The exponential growth of online services and user data has changed how we interact with various services, and how we explore and select new products. Hence, there is a growing need for methods to recommend the appropriate items for each user. In the case of music, it is even more important to recommend the right items at the right moment. It has been well documented that the context, i.e. the listening situation of the users, strongly influences their listening preferences. Hence, there has been increasing attention towards developing context-aware recommendation systems. State-of-the-art approaches are sequence-based models that aim to predict the tracks in the next session using available contextual information. However, these approaches lack interpretability and operate as hit-or-miss, with no room for user involvement. Additionally, few previous approaches focused on studying how the audio content relates to these situational influences and, to an even lesser extent, on making use of the audio content in providing contextual recommendations. Hence, these approaches suffer from a lack of interpretability. In this dissertation, we study the potential of using the audio content primarily to disambiguate the listening situations, providing a pathway for interpretable recommendations based on the situation. First, we study the potential listening situations that influence or change the listening preferences of the users. We developed a semi-automated approach to link the listened tracks to the listening situation, using playlist titles as a proxy. Through this approach, we were able to collect datasets of music tracks labelled with their situational use. We proceeded with studying the use of music auto-taggers to identify potential listening situations using the audio content. These studies led to the conclusion that the situational use of a track is highly user-dependent.
Hence, we proceeded with extending the music auto-taggers to a user-aware model to make personalized predictions. Our studies showed that including the user in the loop significantly improves the performance of predicting the situations. This user-aware music auto-tagger enabled us to tag a given track through the audio content with its potential situational use for a given user, by leveraging their listening history. Finally, to successfully employ this approach for a recommendation task, we needed a different method to predict the potential current situations of a given user. To this end, we developed a model to predict the situation given the data transmitted from the user's device to the service and the demographic information of the given user. Our evaluations show that the models can successfully learn to discriminate the potential situations and rank them accordingly. By combining the two models, the auto-tagger and the situation predictor, we developed a framework to generate situational sessions in real time and propose them to the user. This framework provides an alternative pathway to recommending situational sessions, aside from the primary sequential recommendation system deployed by the service. It is interpretable, and it addresses the cold-start problem by recommending tracks based on their content.
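The combination of the two models described in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the names `Track`, `rank_situations`, `build_session` and `propose_sessions` are hypothetical, and the numeric scores are toy stand-ins for what a situation predictor and a user-aware auto-tagger might output.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Track:
    track_id: str
    # Assumption: per-situation scores for one user, standing in for the
    # output of a user-aware auto-tagger applied to the audio content.
    situation_scores: Dict[str, float]


def rank_situations(situation_probs: Dict[str, float]) -> List[str]:
    """Order situations from most to least likely.

    `situation_probs` stands in for the output of a situation predictor
    fed with device data and demographic information (assumption).
    """
    return sorted(situation_probs, key=lambda s: situation_probs[s], reverse=True)


def build_session(tracks: List[Track], situation: str, size: int = 3) -> List[str]:
    """Pick the `size` tracks with the highest score for one situation."""
    scored = [(t.situation_scores.get(situation, 0.0), t.track_id) for t in tracks]
    # Highest score first; ties broken by track id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [track_id for _, track_id in scored[:size]]


def propose_sessions(
    situation_probs: Dict[str, float],
    tracks: List[Track],
    top_k: int = 2,
    size: int = 3,
) -> Dict[str, List[str]]:
    """Build one labelled session for each of the `top_k` likeliest situations."""
    top_situations = rank_situations(situation_probs)[:top_k]
    return {s: build_session(tracks, s, size) for s in top_situations}


if __name__ == "__main__":
    # Toy values only; none of these numbers come from the dissertation.
    probs = {"running": 0.6, "work": 0.3, "sleep": 0.1}
    catalog = [
        Track("a", {"running": 0.9, "work": 0.2}),
        Track("b", {"running": 0.4, "work": 0.8}),
        Track("c", {"running": 0.7}),
    ]
    print(propose_sessions(probs, catalog, top_k=2, size=2))
```

Because each proposed session is keyed by a situation label, the user can see why the tracks were suggested, and because the track scores are assumed to come from the audio content, a track with no listening history could still be ranked.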
Contributor : ABES STAR
Submitted on : Wednesday, April 6, 2022 - 5:18:14 PM
Last modification on : Friday, April 8, 2022 - 2:18:01 PM
Long-term archiving on : Thursday, July 7, 2022 - 7:19:15 PM


Version validated by the jury (STAR)


  • HAL Id : tel-03633097, version 1



Karim Magdi Abdelfattah Ibrahim. Personalized audio auto-tagging as proxy for contextual music recommendation. Multimedia [cs.MM]. Institut Polytechnique de Paris, 2021. English. ⟨NNT : 2021IPPAT039⟩. ⟨tel-03633097⟩


