HedgeCut: Maintaining Randomized Trees for Low-Latency Machine Unlearning

Software systems that learn from user data with machine learning (ML) have become ubiquitous in recent years. Recent laws such as the General Data Protection Regulation (GDPR) require organisations that process personal data to delete user data …

mlinspect: a Data Distribution Debugger for Machine Learning Pipelines

Machine learning (ML) is increasingly used to automate impactful decisions, and the risks arising from this widespread use are garnering attention from policymakers, scientists, and the media. ML applications are often very brittle with respect to …

Lightweight Inspection of Data Preprocessing in Native Machine Learning Pipelines

Machine learning (ML) is increasingly used to automate impactful decisions, and the risks arising from this widespread use are garnering attention from policymakers, scientists, and the media. ML applications are often very brittle with respect to …

Differential Data Quality Verification on Partitioned Data

Modern companies and institutions rely on data to guide every single decision. Missing or incorrect information seriously compromises any decision process. In previous work, we presented Deequ, a Spark-based library for automating the verification of …

Deequ - Data Quality Validation for Machine Learning Pipelines

Modern machine learning (ML) systems are composed of complex ML pipelines, which typically make many implicit assumptions about the data they consume (e.g., about the scales of variables, the presence of missing values or the dictionary of …