Empowering Investigative Journalism with Graph-based Heterogeneous Data Management
Journal article, Bulletin of the Technical Committee on Data Engineering, 2021

Abstract

Investigative Journalism (IJ, in short) is a staple of modern, democratic societies. IJ often requires working with large, dynamic sets of heterogeneous, schema-less data sources, which can be structured, semi-structured, or textual; this limits the applicability of classical data integration approaches. In prior work, we developed ConnectionLens, a system that integrates such sources into a single heterogeneous graph by leveraging Information Extraction (IE) techniques; users can then query the graph by means of keywords, and explore query results and their neighborhood through an interactive GUI. Our keyword search problem is complicated by the heterogeneity of the graph and by the lack of a result score function that would enable pruning of the search space. In this work, we describe an actual IJ application studying conflicts of interest in the biomedical domain, and we show how ConnectionLens supports it. We then present novel techniques that address the scalability challenges raised by this application: the first reduces the significant IE costs incurred while building the graph; the second is a novel, parallel, in-memory keyword search engine that achieves orders-of-magnitude speed-ups over our previous engine. Our experimental study on the real-world IJ application data confirms the benefits of our contributions.
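The abstract names two steps, integrating sources into one graph through extracted entities and answering keyword queries over that graph, without detailing either algorithm. The following is a minimal, generic sketch of both ideas under stated assumptions; it is not ConnectionLens's implementation. Naive substring matching against a fixed entity list stands in for real Information Extraction, keyword search uses textbook distinct-root semantics (one multi-source BFS per keyword), and ranking by total hop count is purely illustrative rather than ConnectionLens's score function. The helper names, node identifiers, and toy data are all hypothetical.

```python
from collections import deque


def add_edge(adj, u, v):
    """Insert an undirected edge between nodes u and v."""
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)


def build_graph(sources, known_entities):
    """One node per source item, linked to an entity node for every known
    entity name found in its text. Naive substring matching is a stand-in
    for real Information Extraction (an assumption of this sketch)."""
    adj, labels = {}, {}
    for node_id, text in sources.items():
        labels[node_id] = text
        adj.setdefault(node_id, set())
        for ent in known_entities:
            if ent.lower() in text.lower():
                ent_id = "entity:" + ent
                labels[ent_id] = ent
                add_edge(adj, node_id, ent_id)
    return adj, labels


def bfs(adj, starts):
    """Multi-source BFS from all nodes matching one keyword. Returns the hop
    distance of each reachable node and its BFS parent toward a match."""
    dist = {s: 0 for s in starts}
    parent = {s: None for s in starts}
    queue = deque(starts)
    while queue:
        u = queue.popleft()
        for v in sorted(adj[u]):  # sorted only to make output deterministic
            if v not in dist:
                dist[v] = dist[u] + 1
                parent[v] = u
                queue.append(v)
    return dist, parent


def path_to_match(parent, node):
    """Follow BFS parents from node back to the keyword match it reached."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def keyword_search(adj, labels, keywords):
    """Distinct-root search: a root is any node that reaches a match for
    every keyword. Results are (score, root, paths), ranked by total hop
    count (an illustrative score, not ConnectionLens's)."""
    runs = []
    for kw in keywords:
        matches = sorted(n for n, lab in labels.items() if kw.lower() in lab.lower())
        if not matches:
            return []  # some keyword matches nothing, so no answer exists
        runs.append(bfs(adj, matches))
    roots = set(runs[0][0]).intersection(*(dist for dist, _ in runs[1:]))
    results = []
    for root in roots:
        score = sum(dist[root] for dist, _ in runs)
        paths = [path_to_match(parent, root) for _, parent in runs]
        results.append((score, root, paths))
    return sorted(results, key=lambda r: (r[0], r[1]))


# Hypothetical toy data: a relational row, a JSON record, and a text snippet.
sources = {
    "db:row1": "Dr. Alice Martin, cardiology, Hospital X",
    "json:payment7": "{'recipient': 'Alice Martin', 'payer': 'PharmaCo'}",
    "text:article3": "PharmaCo funded a study on statins.",
}
adj, labels = build_graph(sources, ["Alice Martin", "PharmaCo"])
score, root, paths = keyword_search(adj, labels, ["cardiology", "statins"])[0]
```

Here no single source mentions both keywords; the answer only exists because the extracted entities "Alice Martin" and "PharmaCo" link the row, the payment record, and the article into one connected path. Each keyword costs one BFS from all of its matches at once, and a root's answer tree is the union of its per-keyword paths. An exhaustive search like this one cannot stop early without a score that bounds unexplored results, which is the pruning difficulty the abstract mentions.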
Origin: Files produced by the author(s)

Dates and versions

hal-03337650, version 1 (06-01-2022)

Cite

Angelos-Christos Anadiotis, Oana Balalau, Théo Bouganim, Francesco Chimienti, Helena Galhardas, et al. Empowering Investigative Journalism with Graph-based Heterogeneous Data Management. Bulletin of the Technical Committee on Data Engineering, in press. ⟨hal-03337650⟩