Appropriate Measures for Security: Investigating Legal and Technical Requirements under the GDPR

The General Data Protection Regulation (GDPR) has applied in the EU since May 2018, but there is still much uncertainty about how to meet its demands in practice. For instance, Article 32 states that the controller and the processor “shall implement appropriate technical and organisational measures to ensure a level of security appropriate to the risk”. The GDPR gives some indication of the factors that should drive the choice of appropriate measures, but its wording admits multiple interpretations. Thus, when reading the regulation’s demands, one question arises: how can we devise and put into practice technical measures that satisfy the technical and legal demands of Article 32? This is the driving question of this project.

While concrete guidelines on how to comply with the GDPR’s demands are still lacking, information on what is not compliant is already available: approximately 90 fines have been imposed for violations of Article 32. This project will analyse the decisions issued by the Data Protection Authorities (DPAs) imposing fines for breaches of Article 32 GDPR. In particular, it will examine how the DPAs interpret and apply the factors mentioned in this provision, and it will map the findings on the security of data processing into tangible, concrete guidelines that help organisations implement suitable security measures. This interdisciplinary project combines information security (Dayana Spagnuelo) and human rights law (Magdalena Jozwiak).

Supervisors: Dayana Spagnuelo & Magdalena Jozwiak

Information Systems Complexity and Sustainability

The rising complexity of software systems presents managers with major challenges in managing their application landscapes, and it reduces both efficiency and business agility. In a similar vein, it hinders software architects in making informed design decisions: they are asked to continuously evolve the software while ensuring its reliability and technical qualities such as performance and security.

This project addresses the question: how can we get a grip on the complexity of software landscapes, so that we understand how decisions made over time influence their technical and business sustainability?
It will study the software landscape of a large organization and derive a suite of metrics to help analyse the relation between complexity and sustainability.

Supervisors: Patricia Lago & Bart Hooff

Mapping Communication Science with Living Literature Reviews

Literature reviews are an invaluable asset in communication science for bringing together work from different approaches (e.g., linguistics, political science, psychology). Such reviews, however, take considerable effort to compile and quickly become outdated. Unfortunately, there are no incentives or systems in place to keep them up to date, and doing so would be difficult and time-consuming given the narrative paper format in which they are published.
The concept and technology of nanopublications could help researchers with these problems. Nanopublications are a container format for publishing scientific (and other) statements as small pieces of Linked Data. This project investigates whether nanopublications can yield machine-interpretable, interoperable, and easily updatable literature reviews in communication science. To do so, we will develop a general model for literature reviews in communication science and then apply it to a concrete case taken from an existing literature review. We will thereby demonstrate how this approach allows literature reviews to be “living”, in the sense that they can be kept up to date in a manner that is user-friendly, open, and provenance-aware.

Supervisors: Tobias Kuhn & Mickey Steijaert

Protein Transformers: Large Transferable Language Models for Protein Sequences

Recently, deep language models like BERT and GPT-2 have shown a remarkable ability to generalize across domains. Models pre-trained on large amounts of general-domain data yield representations that capture high-level semantics and can be fine-tuned for domains where little data is available.

We will adapt deep language models from the natural language domain to the domain of protein sequences. Both domains use sequences of tokens from a finite alphabet, so existing language models can be applied with little modification. If this approach is successful, it will yield representations of protein sequences that capture high-level semantic concepts from raw data, which may benefit drug discovery, biomedical analysis, and biomolecular information retrieval.

Supervisors: Maurits Dijkstra & Peter Bloem

The Cycle of News in Chronicles from Eighteenth Century Holland: A Stylometric Approach

Scholars agree that cultural changes in early modern Europe (c. 1500-1800) were both accompanied and precipitated by an information revolution. The use of printed media filtered down into local chronicles. These are handwritten narratives, usually produced by middle-class authors, that recorded events and phenomena the authors considered important (local politics, upheavals, climate, prices, crime, deaths). Authors frequently copied excerpts from earlier chronicles, official documents, local announcements and by-laws, and increasingly copied or inserted printed material such as ballads, pamphlets, and newspapers, without stating that they were copying (Pollmann 2016). This project will focus on automatically identifying which parts of the chronicles are in the chronicler’s own words and which parts might be copied. We will apply both close reading and computational stylometry techniques that are often used in authorship verification. Students will work within the framework of the NWO project Chronicling Novelty. New Knowledge in The Netherlands (1500-1850).
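As an illustration of the kind of stylometric comparison involved, the sketch below compares character n-gram profiles with cosine similarity, a common baseline in authorship verification. It assumes a reference text known to be in the chronicler's own words; the function names, the trigram size, and the 0.5 threshold are hypothetical choices for illustration, not the project's method, and real work would need careful handling of early modern spelling variation.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in lower-cased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram frequency vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())  # missing grams count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def flag_possible_copies(reference: str, passages: list[str],
                         threshold: float = 0.5, n: int = 3) -> list[int]:
    """Return indices of passages whose n-gram profile is stylistically distant
    from the reference text (similarity below a hypothetical threshold)."""
    ref_profile = char_ngrams(reference, n)
    return [
        i for i, passage in enumerate(passages)
        if cosine_similarity(ref_profile, char_ngrams(passage, n)) < threshold
    ]
```

Flagged passages are only candidates for copied material; close reading would still be needed to confirm whether a stylistic shift reflects copying or simply a change of topic.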

Supervisors: Roser Morante & Erika Kuijpers