A legal analysis of text- and data-mining

In the area of copyright law, EU and national policy makers are currently discussing the need to adopt a new copyright limitation for the purpose of text and data mining. This new provision would make text and data mining activities ‘immune’ to copyright claims. Text and data mining could then be carried out without an obligation to ask copyright owners for permission and to clear rights. This copyright immunity would thus facilitate the mining of copyrighted material, such as literary, scientific and journalistic texts, blogs and tweets.

Against this background, the project aims to provide a detailed description of how text and data mining works, and of the different players and acts of use involved. It also provides a legal analysis of the different acts of use in the light of present copyright legislation, to clarify to what extent text and data mining may amount to copyright infringement.

Business objective driven data mining

A typical data mining project seeks to answer questions such as “which prospects will respond to offer X?”, “which credit applicants will repay their loans?” or “who will defect to the competition?” on the basis of databases with examples of previous interactions. Often, however, the resulting models do not perform optimally. For this reason, data mining practitioners turn to methods such as ROC curves to assess their models. This project aims to develop optimisation methods for machine learning approaches such as SVMs that do allow for this.
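As a rough illustration of how a ROC curve can be used to assess a scoring model, the following minimal sketch sweeps a decision threshold over model scores and computes the area under the resulting curve. The function names `roc_points` and `auc` and the example labels and scores are assumptions made for illustration; they are not taken from the project.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points of a ROC curve.

    labels: list of 0/1 ground-truth outcomes (1 = positive, e.g. "responded").
    scores: list of model scores, higher meaning "more likely positive".
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")

    # Rank examples from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Examples with equal scores cross the threshold together.
        score = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == score:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve, computed with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical example data: 1 = customer responded, 0 = did not respond.
example_labels = [1, 1, 0, 1, 0, 0]
example_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
example_auc = auc(roc_points(example_labels, example_scores))
```

An area of 1.0 means the model ranks every positive example above every negative one, while 0.5 corresponds to random ranking.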

Developing a linked, open database of social science measures

Recent troubling findings on the consequences of a lack of transparency in research and reporting have prompted a strong trend towards Open Science within the social sciences. We propose to generate such functionality by designing a participatory, linked, open database of social science measures, that is, of operational definitions (e.g., scales) of theoretical concepts.

Dynamic pricing incentives for participatory sensing

As smartphones become prevalent in the mobile industry, they are expected to replace application-specific sensors. Their wireless connectivity, GPS-based localization capability, and operating system can provide a platform for general-purpose sensing. Furthermore, smartphones carried by users add mobility to static sensors, covering a dynamic range.
To stimulate user participation, we design and evaluate novel auction-based dynamic pricing incentive mechanisms in which users can sell their sensing data to a service provider at their claimed bid prices. The proposed incentive mechanism should focus on minimizing and stabilizing the incentive cost while maintaining an adequate number of participants by preventing users from dropping out of participatory sensing applications.
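To make the idea of an auction over claimed bid prices concrete, here is a minimal sketch of a single reverse-auction round in which the service provider buys data from the lowest bidders. The selection rule (pick the cheapest bids and pay each winner their bid), the function name `run_auction_round` and the example bids are assumptions for illustration only; the mechanisms developed in the project may differ.

```python
def run_auction_round(bids, num_winners):
    """Select the cheapest bids in one round of a reverse auction.

    bids: dict mapping a user id to that user's claimed bid price.
    num_winners: how many users the service provider buys data from.
    Returns (list of winning user ids, total incentive cost).
    """
    # Sort by price; ties are broken by user id so the result is deterministic.
    ranked = sorted(bids.items(), key=lambda item: (item[1], item[0]))
    winners = ranked[:num_winners]
    total_cost = sum(price for _, price in winners)
    return [user for user, _ in winners], total_cost


# Hypothetical bids from four participants.
example_bids = {"a": 5.0, "b": 2.0, "c": 3.5, "d": 4.0}
example_winners, example_cost = run_auction_round(example_bids, 2)
```

A rule this simple keeps the cost of one round low, but users who lose repeatedly may drop out, which is the kind of effect an incentive mechanism has to guard against.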

Genderc – gendered dimensions in ERC Grant selection

The goal of this research is to identify possible gender-specific influences on the assessment of applications for the ERC (European Research Council) Starting Grant. This is done by analyzing the official documents related to the formal criteria and the application of these criteria during the peer review process, with particular attention paid to the potential gender-specific application of the concept of scientific excellence. The selection of panel members is also analyzed using the practices described here. Based on the empirical evidence collected, recommendations for ERC funding practices will be formulated and discussed in workshops with those responsible. The POLICIES team leads the consortium with VU Amsterdam and Tecnalia.

Identification of participants in the Psalms

A great challenge in reading the Hebrew poetry of the Psalms is the identification of participants. The major cause of this problem is a continual shift in person, number and gender (so-called PNG-shifts) in the text. In this pilot project we used the annotated database of the Hebrew Bible prepared by the Eep Talstra Centre for Bible and Computer to experiment with a systematic analysis of PNG-shifts, the demarcation of direct speech sections (e.g. an oracle by God is often not introduced as such but has to be inferred from a change of speaker [God] into addressee [human]), and the identification of participants.
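The notion of a PNG-shift can be sketched in a few lines of code: given a sequence of participant references annotated with person, number and gender, report every position where at least one of these features changes. The `Reference` record, the function `find_png_shifts` and the sample annotations are hypothetical; they do not reflect the actual format of the annotated database or any real passage.

```python
from typing import NamedTuple


class Reference(NamedTuple):
    """One participant reference with its grammatical features (hypothetical format)."""
    clause: int
    person: str
    number: str
    gender: str


def find_png_shifts(refs):
    """Return (clause, changed features) for each shift between consecutive references."""
    shifts = []
    for prev, curr in zip(refs, refs[1:]):
        changed = [
            feature
            for feature in ("person", "number", "gender")
            if getattr(prev, feature) != getattr(curr, feature)
        ]
        if changed:
            shifts.append((curr.clause, changed))
    return shifts


# Invented annotations for illustration, not taken from a real psalm.
example_refs = [
    Reference(1, "3", "sg", "m"),
    Reference(2, "2", "sg", "m"),
    Reference(3, "2", "sg", "m"),
    Reference(4, "1", "pl", "unknown"),
]
example_shifts = find_png_shifts(example_refs)
```

Detecting a shift is only the first step; deciding whether it signals a new participant or a change of speaker, as in an unmarked oracle, still requires interpretation.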

INVENiT: Researching Early Modern Creative Industries

Humanities researchers depend in their research on the efficiency and effectiveness of the search functionality provided by various online cultural heritage collections (e.g. images, videos and textual material). Currently, many cultural heritage institutions do not provide the interactivity and transparency that humanities scholars need. In INVENiT, we aim to connect the image database and metadata of the Rijksmuseum with bibliographical data of STCN – Short Title Catalogue of the Netherlands (1550-1800).

Knowledge sharing for the rural poor

Knowledge sharing is a key influence on the development of the rural poor, with ICT as a critical enabler, providing for instance critical market data or weather information to subsistence farmers through low-tech, mobile or radio technologies. In this project, we take a Linked Data approach, thereby adding a new dimension to ICT4D research and practice. Linked Data allows for flexible, multi-layered knowledge sharing, independent of infrastructure and interfaces.

Medical Trust Networks

When it comes to health, the online debate can be very intense, involving a range of actors, from government and science institutions to citizens voicing opinions in (organized) patient forums, blogs, and tweets. This project will conduct a first investigation into detecting belief system dynamics in online trust networks, aiming to study how beliefs converge, collide, and are countered, and how (dis)trust develops within and between trust networks over time.

Mining causal graphs from patient records

Electronic patient records are a rich resource of current practice in the health domain; practitioners meticulously record how they diagnosed and treated their patients. Often these records provide a more up-to-date overview of treatment patterns than medical guidelines, as medical practice often differs for good reasons from the idealised guidelines. In this project, we will create a structured graph representation of medical practitioners’ actions in response to observing particular symptoms. This graph will allow us to analyse the types of treatments practitioners choose, and to compare these treatments to those proposed in guidelines.
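One simple way to picture such a graph is as weighted edges from observed symptoms to recorded actions, where the weight counts how often the two occur together in a record. The sketch below assumes each record has already been reduced to a pair of symptom and action lists; the function name `build_action_graph`, this record format and the placeholder labels are illustrative assumptions, and co-occurrence counts alone do not establish causality.

```python
from collections import Counter, defaultdict


def build_action_graph(records):
    """Build a symptom -> action graph with co-occurrence counts as edge weights.

    records: iterable of (symptoms, actions) pairs, each a list of labels.
    Returns a dict mapping each symptom to a dict {action: count}.
    """
    edge_counts = Counter()
    for symptoms, actions in records:
        for symptom in symptoms:
            for action in actions:
                edge_counts[(symptom, action)] += 1

    graph = defaultdict(dict)
    for (symptom, action), count in edge_counts.items():
        graph[symptom][action] = count
    return dict(graph)


# Placeholder records with made-up labels, not real clinical data.
example_records = [
    (["symptom_A"], ["treatment_X"]),
    (["symptom_A", "symptom_B"], ["treatment_X", "treatment_Y"]),
    (["symptom_B"], ["treatment_Y"]),
]
example_graph = build_action_graph(example_records)
```

The resulting edge weights could then be compared with the actions a guideline recommends for the same symptom.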

Polemics Visualized – Experiments in Syriac text comparison

The purpose of the Polemics Visualized pilot project is to explore the possibilities of computational linguistics and natural language processing for use in theological research on Classical Syriac texts. More specifically, we would like to answer the question of whether Ephrem the Syrian, who wrote extensive polemics against Bardaisan, a theologian living two centuries earlier, was indeed discussing the same issues that Bardaisan addressed in his only remaining work.
Syriac, a language from the Aramaic family, was the lingua franca of the Middle East for centuries. Many important theological documents from the period of the formation of the early church were written in Syriac. These texts form a considerable corpus: the published works of Ephrem the Syrian alone exceed 500,000 words. The theological study of textual corpora of this size would benefit greatly from computational analysis.

Political discourse in the news

The relation between the political arena and media coverage occupies a central place in Political Communication research. Studies into the newsworthiness of political events and actors show that especially powerful actors get most coverage, while actual work in the political arena has less effect. Using a linked data set, we have shown long-term changes and processes in negativity and personalization in the Dutch news.

Quantifying the experience of a paintings exhibition with wearable sensors

Museum curators and staff design exhibits to appeal to visitors. But once exhibits open, staff and researchers know little about how people experience the shows, relying on subjective visitor feedback. In this project, we supplement self-reported data with sensor technology to unobtrusively record behavior.

Success and Failure of Enterprise Systems

Enterprise Systems (ES) are large, integrated information systems that combine various ICT functions within and between organizations. Implementation of these systems is costly and time-consuming, and often fails. How can Enterprise Systems seemingly both fail and succeed?

Swarm Hacking

The project will carry out a twofold exploration of “Swarm Hacking”. The technical track will explore the possibilities of hacking a group of robots, while the legal track will interpret these activities based on current legislation. The overall goal is to produce a wake-up call, i.e., a case study that will be used (pro)actively to create awareness in the related professional communities as well as in the general public and politics.

Time will tell a different story

Historical research has gone through significant changes over time, in an effort to produce the most objective presentations of the past. Facts may remain the same, but the perspectives from which historians describe people and events change. The angle of analysis evolves hand in hand with changes in time, society and public opinion. This project will investigate how these changes can be traced in historical texts.

Words in and out of the social context: The case of Hyperbole

One of the biggest challenges in language and communication research is the fact that expressions can change meaning depending on the referent. In this project, we take on the challenge of developing a method to code linguistic expressions using both their linguistic features and the social context in which the utterance was made.