Investigative reporters trawl many sources and gather information as part of their work. One routine task is deciding whether one piece of information refers to the same thing as another by weighing many factors. Another is finding matching fragments of evidence across a multitude of locations and piecing them together into a compelling story. To perform these tasks, reporters have traditionally relied on pen and paper.
The Organized Crime and Corruption Reporting Project (OCCRP), a network of investigative reporters in over 45 countries that has worked on projects such as the Panama Papers and the Global Laundromat, aims to assist journalists in analyzing evidence with the help of data tooling.
Data tooling has emerged as a key factor in investigative reporting, and OCCRP uses graph data systems to analyze offshore company ownership, money laundering, and state capture. OCCRP has developed the open-source Aleph system and the FollowTheMoney ontology as key components for processing data from both structured databases and unstructured document sets.
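To make this concrete, FollowTheMoney models each record as a typed entity: an identifier, a schema such as "Person" or "Company", and a mapping of property names to lists of values. The sketch below builds two such entities in that general shape and merges their properties, as a system like Aleph might when it decides two records describe the same person. The helper functions and sample values are illustrative assumptions, not the actual FollowTheMoney library API.

```python
# Illustrative sketch of the FollowTheMoney entity shape:
# an id, a schema (e.g. "Person", "Company"), and a mapping
# of property names to lists of values.

def make_entity(entity_id, schema, **props):
    """Build an FtM-style entity dict (hypothetical helper)."""
    return {
        "id": entity_id,
        "schema": schema,
        "properties": {key: list(values) for key, values in props.items()},
    }

def merge_properties(a, b):
    """Union the property values of two entities with the same schema."""
    assert a["schema"] == b["schema"], "only merge entities of one schema"
    merged = {key: list(values) for key, values in a["properties"].items()}
    for key, values in b["properties"].items():
        merged[key] = sorted(set(merged.get(key, [])) | set(values))
    return merged

person = make_entity("p1", "Person", name=["Jane Doe"], nationality=["pa"])
same_person = make_entity("p2", "Person", name=["Jane Doe", "J. Doe"])
print(merge_properties(person, same_person))
```

Multi-valued properties matter here: the same person may appear under several name spellings across leaks, so merging keeps every variant rather than picking one.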
While these technologies build on data integration and linked data, the objective is to make the most of the limited resources available to investigative reporters. Accordingly, OCCRP has developed one of the field's largest data engines, combining 500 million entities from 600 data sources that draw on both open data and confidentially sourced information.
By combining knowledge graphs with machine learning techniques such as graph embeddings, which transform FollowTheMoney objects into numerical vector representations, OCCRP helps investigative reporters compare complex data. In turn, reporters' feedback on what constitutes a strong link between two data sets and what is merely a weak match will be used to train the open-source system.
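The comparison step above can be sketched in miniature: once entities are embedded as vectors, candidate matches can be ranked by a similarity measure such as cosine similarity. The vectors and the threshold below are invented for illustration; in a real system the embeddings would be learned from graph structure and the threshold tuned against reporter feedback.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy embeddings standing in for learned FollowTheMoney entity vectors.
jane_doe = [0.9, 0.1, 0.4]
j_doe = [0.85, 0.15, 0.38]
acme_corp = [0.1, 0.9, 0.2]

MATCH_THRESHOLD = 0.95  # assumed cut-off, not a published OCCRP value

print(cosine_similarity(jane_doe, j_doe) > MATCH_THRESHOLD)     # likely match
print(cosine_similarity(jane_doe, acme_corp) > MATCH_THRESHOLD)  # unrelated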
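```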
This article was originally published by the Global Investigative Journalism Network.