1

Towards Query Optimizer as a Service (QOaaS) in a Unified LakeHouse Platform: Can One QO Rule Them All?

Customer demand, regulatory pressure, and engineering efficiency are the driving forces behind the industry-wide trend of moving from siloed engines and services that are optimized in isolation to highly integrated solutions. This is confirmed by the …

Automated Provenance-Based Screening of ML Data Preparation Pipelines

Software systems that learn from data via machine learning (ML) are being deployed in increasing numbers in real world application scenarios. These ML applications contain complex data preparation pipelines, which take several raw inputs, integrate, …

Snapcase - Regain Control over Your Predictions with Low-Latency Machine Unlearning

The ‘right-to-be-forgotten’ requires the removal of personal data from trained machine learning (ML) models with machine unlearning. Conducting such unlearning with low latency is crucial for responsible data management, for example in a scenario …

Towards Interactively Improving ML Data Preparation Code via “Shadow Pipelines”

Data scientists develop ML pipelines in an iterative manner: they repeatedly screen a pipeline for potential issues, debug it, and then revise and improve its code according to their findings. However, this manual process is tedious and error-prone. …

Red Onions, Soft Cheese and Data: From Food Safety to Data Traceability for Responsible AI

Software systems that learn from data with AI and machine learning (ML) are becoming ubiquitous and are increasingly used to automate impactful decisions. The risks arising from this widespread use of AI/ML are garnering attention from policy makers, …

mlwhatif: What If You Could Stop Re-Implementing Your Machine Learning Pipeline Analyses Over and Over?

Software systems that learn from data with machine learning (ML) are used in critical decision-making processes. Unfortunately, real-world experience shows that the pipelines for data preparation, feature encoding and model training in ML systems are …

Provenance Tracking for End-to-End Machine Learning Pipelines

Software systems that learn from data are being deployed in increasing numbers in real-world application scenarios. It is a difficult and tedious task to ensure at development time that the end-to-end ML pipelines for such applications adhere to …

Automating and Optimizing Data-Centric What-If Analyses on Native Machine Learning Pipelines

Proactively Screening Machine Learning Pipelines with ArgusEyes

Software systems that learn from data with machine learning (ML) are ubiquitous. ML pipelines in these applications often suffer from a variety of data-related issues, such as data leakage, label errors or fairness violations, which require reasoning …

Towards Data-Centric What-If Analysis for Native Machine Learning Pipelines

An important task of data scientists is to understand the sensitivity of their models to changes in the data that the models are trained and tested upon. Currently, conducting such data-centric what-if analyses requires significant and costly manual …