Stefan Grafberger

Ph.D. Student

University of Amsterdam

Biography

I am a Ph.D. student at the University of Amsterdam in the Intelligent Data Engineering Lab, conducting research at the intersection of data management and machine learning. I mainly publish at conferences like SIGMOD and VLDB.

My Ph.D. advisors are Sebastian Schelter and Paul Groth. I work on responsible data management (also in collaboration with Julia Stoyanovich). Before my Ph.D., I completed my master's degree at TU Munich with Thomas Neumann and Alfons Kemper, focusing on databases.

During my studies, I interned with Microsoft GSL, Amazon Research, and Oracle Labs, and worked as a research assistant at TU Munich.

Recent Publications

(2024). Red Onions, Soft Cheese and Data: From Food Safety to Data Traceability for Responsible AI. IEEE Data Engineering Bulletin.

(2023). mlwhatif: What If You Could Stop Re-Implementing Your Machine Learning Pipeline Analyses Over and Over?. VLDB (demo).

(2023). Towards Declarative Systems for Data-Centric Machine Learning. Data-Centric Machine Learning Research workshop (DMLR) at ICML (abstract).

(2023). Provenance Tracking for End-to-End Machine Learning Pipelines. ProvenanceWeek at ACM Web Conference (poster).

(2023). Automating and Optimizing Data-Centric What-If Analyses on Native Machine Learning Pipelines. ACM SIGMOD.

CV

During my studies, I interned with Microsoft GSL, Amazon Research, and Oracle Labs, and worked as a research assistant at TU Munich. I also interned and worked as a working student at TNG Technology Consulting in Munich, and was a teaching assistant at the University of Augsburg.

In the past, I worked on deequ, a library for ‘unit-testing’ large datasets with Apache Spark; PGX, an in-memory graph analytics framework; and Umbra, a disk-based database with in-memory performance. Currently, I work on mlinspect and mlwhatif, with the goal of diagnosing and mitigating robustness and reliability issues in machine learning pipelines.

Contact

I’m reachable via email at s.grafberger@uva.nl.