Software systems that learn from data are increasingly deployed in real-world applications. Ensuring at development time that the end-to-end ML pipelines behind such applications adhere to sound experimentation practices and compliance requirements is a difficult and tedious task. Identifying potential correctness issues currently demands a high degree of discipline, knowledge, and time from data scientists, who often resort to one-off solutions built on specialised frameworks that are incompatible with the rest of the data science ecosystem. We propose techniques to automatically screen ML pipelines for many common correctness issues, given only access to their relational inputs, matrix outputs, and the corresponding provenance. We extract these artifacts automatically; as a consequence, our approach is lightweight and requires no code changes in the natively written ML pipeline. We design the prototypical platform ‘ArgusEyes’ to screen ML pipelines combining code from various popular ML libraries, and additionally enable the computation of important metadata such as group fairness metrics or data valuation with Shapley values. We discuss how ‘ArgusEyes’ identifies the semantics and lineage of common artifacts in classification tasks, apply our platform to several example pipelines with real-world data, and showcase how to integrate it into a continuous integration workflow on GitHub.
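To make the kind of metadata mentioned above concrete, the following is a minimal sketch of one group fairness metric, the demographic parity difference, computed from a pipeline's binary predictions and a sensitive group attribute. The function name and inputs are illustrative assumptions, not the ArgusEyes API.

```python
# Illustrative sketch only: demographic parity difference between two groups,
# i.e. the absolute gap in positive-prediction rates. Not the ArgusEyes API.

def demographic_parity_difference(predictions, groups):
    """Absolute difference in positive-prediction rates between two groups.

    predictions: list of binary labels (0/1) produced by the pipeline
    groups:      list of group identifiers, aligned with predictions
    """
    rates = {}
    for g in set(groups):
        # positive-prediction rate within group g
        members = [p for p, grp in zip(predictions, groups) if grp == g]
        rates[g] = sum(members) / len(members)
    a, b = rates.values()  # assumes exactly two groups
    return abs(a - b)

preds  = [1, 0, 1, 1, 0, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(demographic_parity_difference(preds, groups))  # |0.75 - 0.25| = 0.5
```

A screening platform would compute such a metric automatically by tracing the lineage from the sensitive input column to the prediction matrix, rather than requiring the data scientist to wire it up by hand.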