Unsupervised Classification in Twitter based on Joint Complexity

Abstract: Joint Complexity (JC) has recently been proposed as an efficient method for text analysis. It has shown promise for posts from online social networking (OSN) and microblogging platforms (e.g. Twitter), where dictionary-based or semantic-analysis-based methods struggle with the abundance of OSN-specific abbreviations and nonstandard spelling. In this work, we propose an unsupervised clustering methodology based on Joint Complexity that can handle short messages from the Twitter microblogging platform on the scale of millions. We demonstrate its capabilities by extracting relevant topics, phrases of attention and the corresponding clusters of tweets from a sample of over 9 million tweets. We show that JC-based clustering can track the spatiotemporal distribution of conversation trends efficiently, without requiring extensive a priori assumptions about the dictionary of words or the language used, and without computationally expensive semantic analysis resources. We further compare clustering results at the level of individual tweets with results at the level of tweets aggregated by the user who posted them.
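As an illustration of the quantity the abstract refers to, the joint complexity of two texts is commonly defined as the number of distinct factors (substrings) that the two texts have in common. The sketch below is a naive version for short strings such as tweets, not the authors' implementation: the function names are hypothetical, counting only non-empty substrings is an assumption, and implementations in the JC literature typically rely on suffix trees rather than enumerating substrings explicitly.

```python
def factors(text: str) -> set[str]:
    """Return all distinct non-empty substrings (factors) of text.

    Naive enumeration: quadratic in the number of substrings, so only
    suitable for short inputs. Hypothetical helper for illustration.
    """
    n = len(text)
    return {text[i:j] for i in range(n) for j in range(i + 1, n + 1)}


def joint_complexity(a: str, b: str) -> int:
    """Count the distinct non-empty substrings shared by a and b."""
    return len(factors(a) & factors(b))


if __name__ == "__main__":
    # The common factors of "abc" and "bcd" are "b", "c" and "bc".
    print(joint_complexity("abc", "bcd"))
```

Because the score depends only on shared character sequences, it needs no dictionary or language model, which matches the motivation given in the abstract. A clustering step would then operate on pairwise scores of this kind; how the scores are normalized and grouped is not specified in this record.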
Document type:
Conference paper
International Conference on Computational Social Science (ICCSS), Jun 2016, Chicago, United States. 2016

https://hal-polytechnique.archives-ouvertes.fr/hal-01299629
Contributor: Dimitrios Milioris
Submitted on: Friday, 8 April 2016 - 04:21:14
Last modified on: Thursday, 9 February 2017 - 15:17:16

Identifiers

  • HAL Id: hal-01299629, version 1

Citation

Dániel Kondor, Dimitrios Milioris. Unsupervised Classification in Twitter based on Joint Complexity. International Conference on Computational Social Science (ICCSS), Jun 2016, Chicago, United States. 2016. <hal-01299629>
