Automate your data quality checks
Description:
Automate regular data quality (DQ) checks with a dedicated tool. Add the tool to your data pipeline so that it produces DQ reports on every run. The data team can then use these reports to find weaknesses in its data and make changes based on them.
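As an illustration of what such an automated check could look like inside a pipeline step, here is a minimal sketch in plain Python. It does not use the API of either tool linked in this entry. The names CheckResult, completeness, uniqueness, run_checks and format_report, the sample rows, and the thresholds are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Any, Callable

Row = dict[str, Any]


@dataclass
class CheckResult:
    """Outcome of one DQ check, as it would appear in a report."""
    name: str
    passed: bool
    detail: str


def completeness(rows: list[Row], column: str, min_ratio: float) -> CheckResult:
    """Pass if at least `min_ratio` of the rows have a non-null `column`."""
    filled = sum(1 for r in rows if r.get(column) is not None)
    ratio = filled / len(rows) if rows else 0.0
    return CheckResult(
        f"completeness({column})",
        ratio >= min_ratio,
        f"{ratio:.0%} filled, need {min_ratio:.0%}",
    )


def uniqueness(rows: list[Row], column: str) -> CheckResult:
    """Pass if the non-null values of `column` contain no duplicates."""
    values = [r[column] for r in rows if r.get(column) is not None]
    duplicates = len(values) - len(set(values))
    return CheckResult(
        f"uniqueness({column})",
        duplicates == 0,
        f"{duplicates} duplicate value(s)",
    )


def run_checks(
    rows: list[Row], checks: list[Callable[[list[Row]], CheckResult]]
) -> list[CheckResult]:
    """Run every configured check against one batch of data."""
    return [check(rows) for check in checks]


def format_report(results: list[CheckResult]) -> str:
    """Render the results as a plain-text DQ report, one line per check."""
    return "\n".join(
        f"{'PASS' if r.passed else 'FAIL'}  {r.name}: {r.detail}" for r in results
    )


# Hypothetical batch, standing in for data loaded by an earlier pipeline step.
rows = [
    {"id": 1, "region": "Brno", "cases": 12},
    {"id": 2, "region": None, "cases": 7},
    {"id": 2, "region": "Praha", "cases": 30},
    {"id": 4, "region": "Ostrava", "cases": None},
]

# Assumed thresholds; in practice they come from the team's knowledge of the data.
checks = [
    lambda r: completeness(r, "region", 0.9),
    lambda r: completeness(r, "cases", 0.5),
    lambda r: uniqueness(r, "id"),
]

results = run_checks(rows, checks)
report = format_report(results)
print(report)
```

In a real pipeline the report would typically be stored or sent to the data team after each run instead of printed, and a failed check could stop the downstream steps.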
Links:
https://github.com/lisehr/dq-meerkat
https://github.com/great-expectations/great_expectations
Keywords:
data quality, data completeness, data reliability
Motivation:
Clean and accurate data is crucial for a business to operate reliably. The DQ team should therefore measure data quality and change the pipelines based on its findings.
Requirements/Prerequisites:
databases, knowledge of the data
Level:
generic: high level abstract best practice, metalevel category (e.g. manage architectures)
Application domain:
Data science (analysis & visualisation)
Main phase:
Data Science: Preparation/Integration, Data Science: Modeling/Training/Evaluation
Related literature:
MOSES, Barr. Data Quality Fundamentals. O'Reilly Media, 2022. ISBN 1098112040.
In which projects do/did you use this practice?
Covid Dashboard, OpenData API, OpenData
Software Engineer, Data Analyst
0–2 years of experience
Masaryk University
1. How do you rate the potential benefit for your projects? | 5 |
2. How often are you using that practice? | 2 |
3. What is the effort to introduce the practice in your project upfront? | 4 |
4. What is the effort to apply the best practice in your project on a daily basis? | 3 |
Questions 1, 3 and 4 (1 = Low, 5 = High)
Question 2 (1 = Never, 5 = Always)