Data Preprocessing
Description:
Data cleaning, handling of missing values (e.g. imputation), feature selection, correlation analysis, removal of constant and duplicated values, normalization, outlier removal
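The following is a minimal sketch of how some of the listed steps could be chained, using only the Python standard library. Everything specific in it is an assumption made for illustration: the function names, the column names (`temp`, `pressure`, `unit`), the sample values, median imputation, the 1.5 × IQR outlier rule, and min-max scaling are not prescribed by this practice. Feature selection and correlation analysis are left out, and real projects would typically rely on a dataframe library rather than lists of dictionaries.

```python
from statistics import median, quantiles


def drop_duplicate_rows(rows):
    """Keep only the first occurrence of each identical row."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def impute_median(rows, columns):
    """Replace None with the median of the observed values per column."""
    fills = {c: median(r[c] for r in rows if r[c] is not None) for c in columns}
    return [{c: fills[c] if r[c] is None else r[c] for c in columns} for r in rows]


def drop_constant_columns(rows, columns):
    """Remove columns that hold a single value in every row."""
    keep = [c for c in columns if len({r[c] for r in rows}) > 1]
    return [{c: r[c] for c in keep} for r in rows], keep


def remove_iqr_outliers(rows, columns, k=1.5):
    """Drop rows with any value outside [Q1 - k*IQR, Q3 + k*IQR]."""
    bounds = {}
    for c in columns:
        q1, _, q3 = quantiles([r[c] for r in rows], n=4, method="inclusive")
        iqr = q3 - q1
        bounds[c] = (q1 - k * iqr, q3 + k * iqr)
    return [
        r for r in rows
        if all(bounds[c][0] <= r[c] <= bounds[c][1] for c in columns)
    ]


def min_max_normalize(rows, columns):
    """Scale each column linearly to the range [0, 1]."""
    out = [dict(r) for r in rows]
    for c in columns:
        lo = min(r[c] for r in rows)
        span = max(r[c] for r in rows) - lo
        for r in out:
            r[c] = (r[c] - lo) / span if span else 0.0
    return out


def preprocess(rows):
    """Run the steps in one possible order (the order is an assumption)."""
    columns = list(rows[0])
    rows = drop_duplicate_rows(rows)
    rows = impute_median(rows, columns)
    rows, columns = drop_constant_columns(rows, columns)
    rows = remove_iqr_outliers(rows, columns)
    return min_max_normalize(rows, columns)


# Hypothetical sensor readings, invented for this example.
raw = [
    {"temp": 20.0, "pressure": 1.0, "unit": 1.0},
    {"temp": 21.0, "pressure": 1.2, "unit": 1.0},
    {"temp": 21.0, "pressure": 1.2, "unit": 1.0},   # exact duplicate
    {"temp": None, "pressure": 1.1, "unit": 1.0},   # missing value
    {"temp": 22.0, "pressure": 1.3, "unit": 1.0},
    {"temp": 23.0, "pressure": 1.4, "unit": 1.0},
    {"temp": 500.0, "pressure": 1.2, "unit": 1.0},  # implausible reading
]

cleaned = preprocess(raw)
for row in cleaned:
    print(row)
```

The order of the steps matters: here the outlier filter runs after imputation, so the imputed median is still influenced by the implausible reading. Whether that is acceptable depends on the data and is a domain decision.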
Links:
https://en.wikipedia.org/wiki/Data_pre-processing
Keywords:
Data cleaning, Data handling, Feature selection
Motivation:
Data preprocessing transforms raw data into informative data for further use in modeling, analysis, and similar tasks. Proper preprocessing helps avoid overfitting, bias, and low-quality information.
Requirements/Prerequisites:
Domain knowledge and methodological knowledge
Level:
activity: description of what you have to do at your specific level (e.g. define interface)
Application domain:
Data science (analysis & visualisation)
Main phase:
Data Science: Preparation/Integration
Related literature:
GARCÍA, Salvador; LUENGO, Julián; HERRERA, Francisco. Data preprocessing in data mining. Cham, Switzerland: Springer International Publishing, 2015.
In which projects do/did you use this practice?
FDI, COGNIPLANT, SmartDD, DeepRed
Data analyst
3–5 years of experience
Software Competence Center Hagenberg
1. How do you rate the potential benefit for your projects? | 5 |
2. How often are you using that practice? | 5 |
3. What is the effort to introduce the practice in your project upfront? | 4 |
4. What is the effort to apply the best practice in your project on a daily basis? | 3 |
Questions 1, 3 and 4 (1 = Low, 5 = High)
Question 2 (1 = Never, 5 = Always)