Publications

Rana Alotaibi, Yuanyuan Tian, Stefan Grafberger, Jesus Camacho-Rodriguez, Nicolas Bruno, Brian Kroth, Sergiy Matusevych, Ashvin Agrawal, Mahesh Behera, Ashit Gosalia, Cesar Galindo-Legaria, Milind Joshi, Milan Potocnik, Beysim Sezgin, Xiaoyu Li, Carlo Curino (2025). Towards Query Optimizer as a Service (QOaaS) in a Unified LakeHouse Platform: Can One QO Rule Them All? Conference on Innovative Data Systems Research (CIDR).

Sebastian Schelter, Shubha Guha, Stefan Grafberger (2024). Automated Provenance-Based Screening of ML Data Preparation Pipelines. Datenbank-Spektrum.

Stefan Grafberger (2024). Instrumentation and Analysis of Native ML Pipelines via Logical Query Plans. PhD Workshop at VLDB.

Sebastian Schelter, Stefan Grafberger, Maarten de Rijke (2024). Snapcase - Regain Control over Your Predictions with Low-Latency Machine Unlearning. VLDB (demo).

Stefan Grafberger, Paul Groth, Sebastian Schelter (2024). Towards Interactively Improving ML Data Preparation Code via “Shadow Pipelines”. Data Management for End-to-End Machine Learning workshop at ACM SIGMOD.

Stefan Grafberger, Zeyu Zhang, Sebastian Schelter, Ce Zhang (2024). Red Onions, Soft Cheese and Data: From Food Safety to Data Traceability for Responsible AI. IEEE Data Engineering Bulletin (Special Issue on Data-Centric Responsible AI).

Stefan Grafberger, Bojan Karlas, Paul Groth, Sebastian Schelter (2023). Towards Declarative Systems for Data-Centric Machine Learning. Data-Centric Machine Learning Research workshop (DMLR) at ICML (abstract).

Stefan Grafberger, Shubha Guha, Paul Groth, Sebastian Schelter (2023). mlwhatif: What If You Could Stop Re-Implementing Your Machine Learning Pipeline Analyses Over and Over? VLDB (demo).

Stefan Grafberger, Paul Groth, Sebastian Schelter (2023). Provenance Tracking for End-to-End Machine Learning Pipelines. ProvenanceWeek at ACM Web Conference (poster).

Stefan Grafberger, Paul Groth, Sebastian Schelter (2023). Automating and Optimizing Data-Centric What-If Analyses on Native Machine Learning Pipelines. ACM SIGMOD.

Sebastian Schelter, Stefan Grafberger, Shubha Guha, Bojan Karlas, Ce Zhang (2023). Proactively Screening Machine Learning Pipelines with ArgusEyes. ACM SIGMOD (demo).

Stefan Grafberger, Paul Groth, Sebastian Schelter (2022). Towards Data-Centric What-If Analysis for Native Machine Learning Pipelines. Data Management for End-to-End Machine Learning workshop at ACM SIGMOD.

Stefan Grafberger, Paul Groth, Julia Stoyanovich, Sebastian Schelter (2021). Data Distribution Debugging in Machine Learning Pipelines. The VLDB Journal — The International Journal on Very Large Data Bases (Special Issue on Data Science for Responsible Data Management).

Sebastian Schelter, Stefan Grafberger, Shubha Guha, Olivier Sprangers, Bojan Karlas, Ce Zhang (2021). Screening Native Machine Learning Pipelines with ArgusEyes. Conference on Innovative Data Systems Research (CIDR, abstract).

Sebastian Schelter, Stefan Grafberger, Ted Dunning (2021). HedgeCut: Maintaining Randomized Trees for Low-Latency Machine Unlearning. ACM SIGMOD.

Stefan Grafberger, Shubha Guha, Julia Stoyanovich, Sebastian Schelter (2021). mlinspect: a Data Distribution Debugger for Machine Learning Pipelines. ACM SIGMOD (demo).

Stefan Grafberger, Julia Stoyanovich, Sebastian Schelter (2020). Lightweight Inspection of Data Preprocessing in Native Machine Learning Pipelines. Conference on Innovative Data Systems Research (CIDR).

Sebastian Schelter, Stefan Grafberger, Philipp Schmidt, Tammo Rukat, Mario Kiessling, Andrey Taptunov, Felix Biessmann, Dustin Lange (2019). Differential Data Quality Verification on Partitioned Data. International Conference on Data Engineering (ICDE).

Sebastian Schelter, Stefan Grafberger, Philipp Schmidt, Tammo Rukat, Mario Kiessling, Andrey Taptunov, Felix Biessmann, Dustin Lange (2018). Deequ - Data Quality Validation for Machine Learning Pipelines. Machine Learning Systems workshop at the Conference on Neural Information Processing Systems (NeurIPS).