URL: https://hal-polytechnique.archives-ouvertes.fr/hal-00907364

Authors:
Milioris, Dimitrios
  Alcatel-Lucent Bell Labs France [Nozay] - Alcatel-Lucent Bell Labs France
  LINCS - Laboratory of Information, Network and Communication Sciences - UPMC - Université Pierre et Marie Curie - Paris 6 - Inria - Institut National de Recherche en Informatique et en Automatique - IMT - Institut Mines-Télécom [Paris]
  X - École polytechnique
  HIPERCOM2 - High PERformance COMmunications - Inria Paris-Rocquencourt - Inria - Institut National de Recherche en Informatique et en Automatique
Jacquet, Philippe
  Alcatel-Lucent Bell Labs France [Nozay] - Alcatel-Lucent Bell Labs France
  LINCS - Laboratory of Information, Network and Communication Sciences - UPMC - Université Pierre et Marie Curie - Paris 6 - Inria - Institut National de Recherche en Informatique et en Automatique - IMT - Institut Mines-Télécom [Paris]

Title: Joint Sequence Complexity Analysis: Application to Social Networks Information Flow
Record: HAL CCSD, 2014; Milioris, Dimitrios; 2013-11-21 10:48:47; 2023-02-28 15:36:23; 2013-11-21 10:48:47
Language: en
Document type: Journal articles
DOI: 10.1002/bltj.21647

Abstract:
In this paper, we study joint sequence complexity and its applications to finding similarities between sequences, up to the discrimination of their sources. The complexity of a sequence is defined mathematically as the number of distinct subsequences it contains. Sequences that share many common parts have a higher joint complexity. A sequence is decomposed into its subcomponents with suffix trees, a simple, fast, and low-complexity method for storing them and recalling them from memory, especially for short sequences. Joint complexity is used to evaluate the similarity between sequences generated by different Markov sources. Markov models describe the generation of natural text well, and their performance can be predicted via linear algebra, combinatorics, and asymptotic analysis.
We evaluate the approach on datasets from different natural languages, using both short and long sequences, with very promising results. The goal is to perform automated online sequence analysis on information streams, e.g., on social networks such as Twitter.
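
As an illustration of the definitions in the abstract, the following minimal Python sketch counts the distinct parts of a sequence and the parts shared by two sequences. It rests on two assumptions that the record itself does not state: "subsequences" is read here as contiguous substrings (the objects a suffix tree indexes), and joint complexity is taken to be the number of distinct substrings common to both sequences. The helper names `factors`, `complexity`, and `joint_complexity` are hypothetical. For brevity, the sketch enumerates substrings into a plain set, which costs quadratic time and space, instead of building the suffix trees the abstract describes.

```python
def factors(s):
    """Return the set of distinct non-empty contiguous substrings of s.

    Assumption: the abstract's "subsequences" are contiguous substrings.
    Naive enumeration, not a suffix tree; fine for short sequences only.
    """
    n = len(s)
    return {s[i:j] for i in range(n) for j in range(i + 1, n + 1)}


def complexity(s):
    """Number of distinct non-empty substrings of s."""
    return len(factors(s))


def joint_complexity(a, b):
    """Number of distinct non-empty substrings shared by a and b (assumed definition)."""
    return len(factors(a) & factors(b))


if __name__ == "__main__":
    print(complexity("aba"))               # 5: a, b, ab, ba, aba
    print(joint_complexity("aba", "bab"))  # 4: a, b, ab, ba
    print(joint_complexity("abc", "xyz"))  # 0: nothing in common
```

Under these assumptions, two short messages with a high shared count relative to their lengths would be treated as similar. A suffix-tree implementation would return the same counts while avoiding the quadratic blow-up on long sequences.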