In September, we partnered with Mondragon University (Spain), Università degli Studi del Sannio (Italy), Simula Research Laboratory (Norway), Oslo Metropolitan University (Norway), University of Málaga (Spain), and Zürcher Hochschule für Angewandte Wissenschaften (Switzerland) to host a mini hackathon challenge on Nebula with the 11 PhD students who are part of the InnoGuard European Union project. The project focuses on the safety and trustworthiness of autonomous cyber-physical systems, with the goal of developing new methods for increasing the reliability and robustness of quality assurance procedures. The task was to investigate the ethics of LLMs from the perspectives of bias, privacy, and copyright.
The challenge produced a set of intriguing findings:
- When exploring AI in college admissions, the participants found DeepSeek to be particularly biased.
- On the topic of ethical bias in self-driving cars, DeepSeek once again produced the highest number of "problematic outputs", while GPT-OSS showed the least bias.
- In exploring AI assistance in healthcare decisions, the PhD students found bias issues spanning not only gender and race but also BMI and age.
- Bias in LLM-based hiring was most prominent in Llama (which preferred male, Western candidates and used disability as a factor), while Gemini remained neutral across the board.
- Last but not least, an exploration of LLM bias across several job types found that LLMs exhibit strong gender and racial biases.
Nebula served as an integral part of the PhD candidates' research during the hackathon. We take this opportunity to thank all the participants for the high-quality work they produced in such a short time.