Publications

(2024). Towards Query Optimizer as a Service (QOaaS) in a Unified LakeHouse Platform: Can One QO Rule Them All? Conference on Innovative Data Systems Research (CIDR).

(2024). Automated Provenance-Based Screening of ML Data Preparation Pipelines. Datenbank-Spektrum.

(2024). Snapcase - Regain Control over Your Predictions with Low-Latency Machine Unlearning. VLDB (demo).

(2024). Towards Interactively Improving ML Data Preparation Code via “Shadow Pipelines”. Data Management for End-to-End Machine Learning workshop at ACM SIGMOD.

(2024). Red Onions, Soft Cheese and Data: From Food Safety to Data Traceability for Responsible AI. IEEE Data Engineering Bulletin (Special Issue on Data-Centric Responsible AI).

(2023). Towards Declarative Systems for Data-Centric Machine Learning. Data-Centric Machine Learning Research workshop (DMLR) at ICML (abstract).

(2023). mlwhatif: What If You Could Stop Re-Implementing Your Machine Learning Pipeline Analyses Over and Over? VLDB (demo).

(2023). Provenance Tracking for End-to-End Machine Learning Pipelines. ProvenanceWeek at ACM Web Conference (poster).

(2023). Automating and Optimizing Data-Centric What-If Analyses on Native Machine Learning Pipelines. ACM SIGMOD.

(2023). Proactively Screening Machine Learning Pipelines with ArgusEyes. ACM SIGMOD (demo).

(2022). Towards Data-Centric What-If Analysis for Native Machine Learning Pipelines. Data Management for End-to-End Machine Learning workshop at ACM SIGMOD.

(2021). Data Distribution Debugging in Machine Learning Pipelines. The VLDB Journal — The International Journal on Very Large Data Bases (Special Issue on Data Science for Responsible Data Management).

(2021). Screening Native Machine Learning Pipelines with ArgusEyes. Conference on Innovative Data Systems Research (CIDR, abstract).

(2021). HedgeCut: Maintaining Randomized Trees for Low-Latency Machine Unlearning. ACM SIGMOD.

(2021). mlinspect: a Data Distribution Debugger for Machine Learning Pipelines. ACM SIGMOD (demo).

(2020). Lightweight Inspection of Data Preprocessing in Native Machine Learning Pipelines. Conference on Innovative Data Systems Research (CIDR).

(2019). Differential Data Quality Verification on Partitioned Data. International Conference on Data Engineering (ICDE).

(2018). Deequ - Data Quality Validation for Machine Learning Pipelines. Machine Learning Systems workshop at the Conference on Neural Information Processing Systems (NeurIPS).
