Defended by Ke LI at Sorbonne Université on 22/06/2021. Supervised by Bernd Amann and Hubert Naacke. I was delighted to serve as a reviewer (rapporteur) for this very interesting thesis, rich in both formalism and development.
There is an increasing demand for practical tools to explore the evolution of scientific research published in bibliographic archives such as the Web of Science (WoS), arXiv, PubMed or ISTEX. Revealing meaningful evolution patterns in these document archives has many applications, and the approach can be extended to synthesize narratives from datasets in multiple domains, including news stories, research papers, legal cases and works of literature. In this thesis, we propose a data model and query language for the visualization and exploration of topic evolution graphs. Our model is independent of any particular topic extraction and alignment method and provides a set of semantic and structural metrics for characterizing and filtering meaningful topic evolution patterns. These metrics are particularly useful for visualizing and exploring large topic evolution graphs. We also present a prototype implementation of our model on top of Apache Spark, together with experimental results obtained on four real-world document archives.
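To make the idea of a topic evolution graph more concrete, here is a minimal in-memory sketch. It is not the thesis's data model or its metrics. It simply assumes that nodes are topics tagged with a time period, that directed edges link a topic to an aligned topic in a later period with a weight in [0, 1], and that two common structural patterns, a "split" (out-degree above 1) and a "merge" (in-degree above 1), can be detected after filtering out weak alignments. The names `Topic`, `TopicEvolutionGraph`, `filter`, `splits` and `merges` are hypothetical, and the example data is invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Topic:
    """A topic observed in one time period (hypothetical representation)."""
    period: int  # e.g. publication year of the time slice
    label: str   # illustrative topic identifier


@dataclass
class TopicEvolutionGraph:
    """Directed graph: edges[(src, dst)] = alignment weight in [0, 1] (assumed)."""
    edges: dict[tuple[Topic, Topic], float] = field(default_factory=dict)

    def add_edge(self, src: Topic, dst: Topic, weight: float) -> None:
        # Evolution edges must point forward in time.
        if dst.period <= src.period:
            raise ValueError("edges must go from an earlier to a later period")
        self.edges[(src, dst)] = weight

    def filter(self, min_weight: float) -> "TopicEvolutionGraph":
        # Keep only alignments at least as strong as the threshold.
        kept = {e: w for e, w in self.edges.items() if w >= min_weight}
        return TopicEvolutionGraph(kept)

    def splits(self) -> set[Topic]:
        # A topic that evolves into more than one later topic.
        out_degree = Counter(src for src, _ in self.edges)
        return {t for t, n in out_degree.items() if n > 1}

    def merges(self) -> set[Topic]:
        # A topic that several earlier topics evolve into.
        in_degree = Counter(dst for _, dst in self.edges)
        return {t for t, n in in_degree.items() if n > 1}


# Illustrative usage with invented topics and weights.
a = Topic(2019, "neural nets")
b = Topic(2020, "deep learning")
c = Topic(2020, "graph neural nets")
d = Topic(2021, "geometric DL")

g = TopicEvolutionGraph()
g.add_edge(a, b, 0.8)
g.add_edge(a, c, 0.6)
g.add_edge(b, d, 0.3)
g.add_edge(c, d, 0.7)

print(g.splits())              # {Topic(period=2019, label='neural nets')}
print(g.merges())              # {Topic(period=2021, label='geometric DL')}
print(g.filter(0.5).merges())  # set(): the weak b -> d edge is dropped
```

The example shows why filtering matters for exploration: the merge into "geometric DL" disappears once weak alignments are removed, so the patterns a user sees depend on the chosen threshold. A prototype over large archives, such as the Spark-based one described above, would distribute these computations rather than keep the graph in a single dictionary.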