Document Forensics

Analysing large numbers of documents is a common and time-consuming task. For instance, investigating unsavoury business practices (e.g., slavery, fraud, bribery) can involve processing large numbers of contracts, annual reports and external (news) sources that may reflect on a company’s reputation and relations. Currently, this is a labour-intensive task that relies mainly on text search to identify relevant documents, which are then processed manually.
In this project we will apply methods to extract relevant concepts (e.g., the names of suppliers, the type of relationship between companies, or executive management) from unstructured documents (e.g., news) as well as semi-structured documents (e.g., contracts and financial reports). The extracted information will be used to populate knowledge graphs and link them to publicly available knowledge graphs. These knowledge graphs should record the temporal scope (the period during which a relation held) and the provenance (the document it was extracted from) of each extracted relation and property. This will enable automated reasoning about companies and their relationships, such as ownership structures or supply chains, and about how these change over time. It will also allow external news sources, as well as document collections such as the Panama Papers, to be leveraged in investigations and due diligence processes to automatically identify suspicious entities that companies interact with, even when this interaction is indirect.
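To make temporal scope, provenance and indirect interaction concrete, here is a minimal Python sketch. It assumes a simple in-memory store of extracted relations rather than a real knowledge-graph system. The `Relation` class, the `holds_on` and `indirect_links` functions, the predicate labels, and all entity and file names are hypothetical. Each relation carries a validity interval and the document it came from. A breadth-first search over the relations valid on a given date finds flagged entities reachable within a few hops and returns the chain of supporting facts.

```python
from collections import deque
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Relation:
    """One extracted fact with its temporal scope and provenance (hypothetical schema)."""
    subject: str
    predicate: str             # illustrative labels, e.g. "buys_from", "owned_by"
    obj: str
    valid_from: date
    valid_to: Optional[date]   # None = open-ended (still valid)
    source: str                # document the fact was extracted from


def holds_on(rel: Relation, day: date) -> bool:
    """True if the relation is valid on the given day."""
    return rel.valid_from <= day and (rel.valid_to is None or day <= rel.valid_to)


def indirect_links(relations, start, flagged, day, max_hops=4):
    """Breadth-first search from `start` over relations valid on `day`.

    Edges are followed in both directions, since any relation counts as an
    interaction. Returns {flagged_entity: [Relation, ...]}: the chain of
    facts (each with its source document) linking `start` to that entity.
    """
    adjacency = {}
    for rel in relations:
        if holds_on(rel, day):
            adjacency.setdefault(rel.subject, []).append((rel.obj, rel))
            adjacency.setdefault(rel.obj, []).append((rel.subject, rel))

    found = {}
    visited = {start}
    queue = deque([(start, [])])
    while queue:
        node, path = queue.popleft()
        if node in flagged and node != start:
            found[node] = path
        if len(path) >= max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour, rel in adjacency.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, path + [rel]))
    return found


# Hypothetical facts extracted from a contract and a leaked document.
facts = [
    Relation("AcmeCorp", "buys_from", "SupplierA",
             date(2015, 1, 1), None, "contract_017.pdf"),
    Relation("SupplierA", "owned_by", "ShellCo",
             date(2014, 6, 1), date(2018, 12, 31), "leak_0423.txt"),
]

links = indirect_links(facts, "AcmeCorp", {"ShellCo"}, date(2016, 3, 1))
for rel in links["ShellCo"]:
    print(rel.subject, rel.predicate, rel.obj, "<-", rel.source)
```

On the 2016 query date, the flagged entity is reached in two hops, and each hop names the document that supports it. Asking the same question for 1 January 2019 returns no link, because the hypothetical ownership fact expires at the end of 2018. A real system would more likely keep such facts in a graph database or an RDF store; the in-memory version only illustrates the idea.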
At the outset of the project, we will construct a small corpus of relevant questions for document forensics tasks, together with hand-crafted gold-standard answers, to serve as a benchmark for project success. The questions will be grouped into sets of increasing difficulty: questions answerable from a single document, questions that require multiple documents, and questions answerable only with background knowledge.
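One way to use such a benchmark is to score each difficulty tier separately, so that progress on the harder sets stays visible. The minimal Python sketch that follows assumes each gold-standard answer is a set of entity names and uses set-based F1 as the metric. The project description does not specify a metric. The tier identifiers, `BenchmarkQuestion`, `f1`, `score_by_tier` and the example questions are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

# Difficulty tiers from the project plan; the identifiers are hypothetical.
TIERS = ("single_document", "multi_document", "background_knowledge")


@dataclass(frozen=True)
class BenchmarkQuestion:
    question: str
    tier: str          # one of TIERS
    gold: frozenset    # hand-crafted gold-standard answer set

    def __post_init__(self):
        if self.tier not in TIERS:
            raise ValueError(f"unknown tier: {self.tier}")


def f1(predicted: set, gold: frozenset) -> float:
    """Set-based F1 between predicted and gold answers (assumed metric)."""
    if not predicted and not gold:
        return 1.0
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def score_by_tier(questions, answers):
    """Mean F1 per tier; `answers` maps question text to a set of predicted answers."""
    scores = defaultdict(list)
    for q in questions:
        predicted = set(answers.get(q.question, ()))
        scores[q.tier].append(f1(predicted, q.gold))
    return {tier: sum(vals) / len(vals) for tier, vals in scores.items()}


# Hypothetical benchmark entries and system answers.
q1 = BenchmarkQuestion("Who supplies AcmeCorp?", "single_document",
                       frozenset({"SupplierA"}))
q2 = BenchmarkQuestion("Which owners of AcmeCorp's suppliers appear in leaked records?",
                       "background_knowledge", frozenset({"ShellCo", "HoldCo"}))
answers = {q1.question: {"SupplierA"}, q2.question: {"ShellCo"}}

print(score_by_tier([q1, q2], answers))
```

Here the single-document question is answered exactly (F1 of 1.0). The background-knowledge question finds one of two gold answers, giving precision 1.0 and recall 0.5, and therefore F1 of 2/3. Tiers with no questions simply do not appear in the result.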
If successful, this project will open up possibilities to help ensure fair trade practices and to substantially reduce the effort required to comply with regulations aimed at combating money laundering, terrorist financing, bribery, etc.