Machine Learning Justin 18 October 2024
Next up in our machine learning concepts series, we are going to discuss exploratory data analysis (EDA). EDA is a cornerstone of data science; you can think of it as the process of getting to know your data by uncovering patterns, trends, relationships, and potential issues. It helps you understand the data’s structure, distributions, and key characteristics, which in turn guide the decisions you make later in the process.
Calculating basic measures such as mean, median, standard deviation, range, and quartiles allows us to understand the central tendency, spread, and shape of our numerical data.
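As a rough illustration of these measures, here is a minimal sketch that uses only Python's standard `statistics` module. The `ages` sample and the `describe` helper are made up for this example and are not part of any particular dataset or library.

```python
import statistics

# Hypothetical sample: ages of ten customers (made-up values for illustration).
ages = [23, 25, 31, 35, 35, 38, 41, 44, 52, 67]

def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    # quantiles(n=4) returns the three quartile cut points (Q1, median, Q3).
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "q1": q1,
        "q3": q3,
    }

summary = describe(ages)
for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
```

In this sample the mean (39.1) sits above the median (36.5), which hints that a few larger values are pulling the distribution to the right.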
Using charts and graphs we can easily visualize our data to identify patterns and relationships.
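To give a feel for what a chart reveals, the sketch below builds a simple text histogram with the standard library only, so it runs without a plotting package. The `ages` sample and the `bin_counts` and `print_histogram` helpers are hypothetical; in practice you would more likely reach for a dedicated plotting library.

```python
from collections import Counter

# Hypothetical sample: ages of ten customers (made-up values for illustration).
ages = [23, 25, 31, 35, 35, 38, 41, 44, 52, 67]

def bin_counts(values, bin_width):
    """Group values into fixed-width bins and return {bin_start: count}."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

def print_histogram(counts, bin_width):
    """Print one row of '#' marks per non-empty bin."""
    for start, count in counts.items():
        print(f"{start:>3}-{start + bin_width - 1:<3} {'#' * count}")

bins = bin_counts(ages, bin_width=10)
print_histogram(bins, bin_width=10)
```

Even in this crude form, the shape is visible at a glance: most values fall in the 30-39 bin, with a thin tail toward the higher ages.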
We can also measure the strength and direction of linear relationships between two or more numerical variables.
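One common way to quantify a linear relationship is the Pearson correlation coefficient, which ranges from -1 (perfect negative) to 1 (perfect positive). The sketch below computes it by hand for clarity; the `pearson_r` helper and the study-hours data are assumptions made up for this example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (numerator) and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours studied and exam scores for five students.
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 58, 65, 70, 78]

r = pearson_r(hours_studied, exam_score)
print(f"Pearson r = {r:.3f}")
```

A value close to 1, as here, indicates a strong positive linear relationship. It says nothing about causation, and it can miss relationships that are not linear.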
Univariate analysis is the exploration of individual variables, one at a time, in order to identify their distributions, patterns, and outliers. Bivariate analysis explores these same areas, but in terms of the relationship between two variables.
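The difference can be sketched with a small example: counting how often each category occurs looks at one variable alone, while comparing average spend across categories looks at two variables together. The `records` data and its field meanings are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical records: (subscription plan, monthly spend in dollars).
records = [
    ("basic", 20), ("basic", 25), ("pro", 60),
    ("pro", 70), ("basic", 30), ("pro", 65),
]

# Univariate: look at one variable on its own (how common is each plan?).
plan_counts = Counter(plan for plan, _ in records)

# Bivariate: look at two variables together (how does spend vary by plan?).
spend_by_plan = defaultdict(list)
for plan, spend in records:
    spend_by_plan[plan].append(spend)
mean_spend = {plan: mean(values) for plan, values in spend_by_plan.items()}

print("Plan counts:", dict(plan_counts))
print("Mean spend by plan:", mean_spend)
```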
Identify and handle missing values, outliers, duplicates, and other inconsistencies based on your understanding of the data and its domain. The difference between data cleaning in the EDA phase and data cleaning that was previously done in the data pre-processing phase is that in the EDA phase, we’re dealing with numerical representations of the raw data that we cleaned during data pre-processing.
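A minimal sketch of these checks might look like the following. The `rows` data, the `drop_duplicates` and `iqr_outliers` helpers, and the choice of the 1.5 × IQR rule for outliers are all assumptions for illustration; the right rule depends on your data and its domain. Note that the sketch only flags suspicious rows; deciding whether to remove or replace them is left to you.

```python
import statistics

# Hypothetical sensor readings (made-up values for illustration).
rows = [
    {"id": 1, "temp": 21.5},
    {"id": 2, "temp": 22.1},
    {"id": 2, "temp": 22.1},  # exact duplicate of the previous row
    {"id": 3, "temp": None},  # missing value
    {"id": 4, "temp": 20.9},
    {"id": 5, "temp": 85.0},  # suspiciously high
    {"id": 6, "temp": 21.8},
    {"id": 7, "temp": 22.4},
]

def drop_duplicates(records):
    """Remove exact duplicate rows, keeping the first occurrence."""
    seen, unique = set(), []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

def iqr_outliers(records, field, k=1.5):
    """Flag rows whose value lies more than k * IQR outside the quartiles."""
    values = [rec[field] for rec in records]
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [rec for rec in records if not low <= rec[field] <= high]

deduped = drop_duplicates(rows)
missing = [rec for rec in deduped if rec["temp"] is None]
complete = [rec for rec in deduped if rec["temp"] is not None]
outliers = iqr_outliers(complete, "temp")

print("Rows after de-duplication:", len(deduped))
print("Rows with missing temp:", [rec["id"] for rec in missing])
print("Rows flagged as outliers:", [rec["id"] for rec in outliers])
```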
As with data pre-processing, it’s important to document every action taken as part of the EDA phase. This includes the EDA techniques used and the data they were applied to, the analysis findings, and any data that was removed or replaced.
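One lightweight way to keep such a record is a simple log that you append to as you go. The sketch below is only one possible format; the `log_step` helper, its field names, and the sample entry are hypothetical.

```python
import json
from datetime import date

eda_log = []

def log_step(technique, columns, finding, action=None):
    """Append one EDA step to the log and return the new entry."""
    entry = {
        "date": date.today().isoformat(),
        "technique": technique,  # which EDA technique was used
        "columns": columns,      # what data it was applied to
        "finding": finding,      # what the analysis showed
        "action": action,        # any data removed or replaced (None if nothing changed)
    }
    eda_log.append(entry)
    return entry

# Hypothetical entry for illustration.
log_step(
    technique="IQR outlier check",
    columns=["temp"],
    finding="1 value above the upper fence",
    action="flagged row id=5 for review; nothing removed",
)

print(json.dumps(eda_log, indent=2))
```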
As with most steps in creating a machine learning model, exploratory data analysis is an iterative process and should not be rushed. The goal is to visualize and display the data in ways that let us interpret it meaningfully.
Written by: Justin
Tagged as: ML, R, Spreadsheet, Feature Engineering, Documentation, Feature, LLM, Exploratory Data Analysis, EDA, Correlation, Visualization, Univariate, Bivariate, Machine Learning, Python.