Session objective. Learn to perform exploratory data analysis (EDA) of tabular data with the help of LLMs. This includes data visualization, interpretation, and statistical summarization.
Step 1. Download the data: Project Management CSV
Step 2. Upload it to ChatGPT
Step 3. See the first 5 rows in the CSV
display the first 5 rows of the raw data
Step 4. Clean the rows
remove empty and unnecessary header cells
display the first 5 rows of the clean data
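To see roughly what the prompts in Steps 3 and 4 ask ChatGPT to do, here is a minimal sketch in plain Python. The sample text in RAW_CSV is a hypothetical stand-in for the downloaded file (the real layout may differ), and the helper names read_rows and clean_rows are illustrative. It assumes the real header row is the one containing 'Task Name'.

```python
import csv
import io

# Hypothetical stand-in for the downloaded file: a title row and a blank
# row sit above the real header, a common pattern in exported sheets.
RAW_CSV = """Project Management Data,,,
,,,
Task Name,Start Date,End Date,Progress
Design,2024-01-02,2024-01-10,100
Build,2024-01-11,2024-02-01,60
,,,
Test,2024-02-02,2024-02-15,0
"""


def read_rows(text):
    """Return every CSV row as a list of stripped strings."""
    return [[cell.strip() for cell in row] for row in csv.reader(io.StringIO(text))]


def clean_rows(rows, header_marker="Task Name"):
    """Drop everything above the header and all fully empty rows."""
    # Assumption: the first row containing header_marker is the real header.
    header_index = next(i for i, row in enumerate(rows) if header_marker in row)
    header = rows[header_index]
    body = [row for row in rows[header_index + 1:] if any(row)]
    return [dict(zip(header, row)) for row in body]


raw = read_rows(RAW_CSV)
clean = clean_rows(raw)

print("Raw data, first 5 rows:")
for row in raw[:5]:
    print(row)

print("Clean data, first 5 rows:")
for record in clean[:5]:
    print(record)
```

Comparing the two printouts is the point of the exercise: the raw view shows the title and blank rows that the clean view no longer contains.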
Classical EDA involves using summary statistics, distributions, and visualizations to understand the structure and quality of data. It helps detect patterns, correlations, and anomalies before applying advanced models.
PDF with Classical EDA Prompts
Step 5. Outlier Detection
Outlier detection is the process of finding data points that are significantly different from the majority of the dataset. It helps identify errors, rare events, or unusual behaviors that may impact analysis or decision-making.
detect numeric outliers based on advanced models
(takes time)
why are they outliers?
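The prompt above asks for advanced models, and what ChatGPT actually runs may vary. As a simpler point of comparison, the sketch below uses Tukey's interquartile range (IQR) rule, which is not an advanced model but makes the "why are they outliers?" question easy to answer: a value is flagged when it falls outside the printed fences. The days_required values are hypothetical, and the function name iqr_outliers is illustrative.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return (low_fence, high_fence, outliers) using Tukey's IQR rule."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in values if v < low or v > high]
    return low, high, outliers


# Hypothetical "days required" values; one task is far longer than the rest.
days_required = [5, 7, 8, 9, 10, 12, 60]

low, high, outliers = iqr_outliers(days_required)
print(f"Fences: [{low}, {high}]")
print(f"Outliers: {outliers}")
```

A model-based detector can flag different points than this rule, so it is worth asking ChatGPT which method it used before interpreting its answer.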
Step 6. Multivariate analysis
Multivariate analysis looks at the relationships between three or more variables at once to uncover complex patterns. It helps in understanding how multiple factors interact together rather than in isolation.
create and display in chat a summary table of relationship between project name, days required and progress
how many tasks have been completed? display a table in chat with their respective details
create a timeline chart of all completed tasks
This involves the 'Task Name', 'Start Date', 'End Date', and 'Progress' columns.
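The completed-task prompts above can be sketched as follows. The column names come from this handout, but the records are hypothetical, and the sketch assumes that 'Progress' is a percentage where 100 means completed and that dates are in ISO format (YYYY-MM-DD). It prints the rows a timeline chart would be drawn from rather than drawing the chart itself.

```python
from datetime import date

# Hypothetical cleaned records using the four columns named above.
tasks = [
    {"Task Name": "Design", "Start Date": "2024-01-02", "End Date": "2024-01-10", "Progress": 100},
    {"Task Name": "Build", "Start Date": "2024-01-11", "End Date": "2024-02-01", "Progress": 60},
    {"Task Name": "Test", "Start Date": "2024-02-02", "End Date": "2024-02-15", "Progress": 0},
    {"Task Name": "Kickoff", "Start Date": "2024-01-01", "End Date": "2024-01-02", "Progress": 100},
]


def days_required(task):
    """Number of days between a task's start and end dates."""
    start = date.fromisoformat(task["Start Date"])
    end = date.fromisoformat(task["End Date"])
    return (end - start).days


# Assumption: a task counts as completed when Progress equals 100.
# ISO date strings sort chronologically, so sorting on the raw string works.
completed = sorted(
    (t for t in tasks if t["Progress"] == 100),
    key=lambda t: t["Start Date"],
)

print(f"{len(completed)} completed task(s)")
print(f"{'Task Name':<10} {'Start Date':<10} {'End Date':<10} {'Days':>4}")
for t in completed:
    print(f"{t['Task Name']:<10} {t['Start Date']:<10} {t['End Date']:<10} {days_required(t):>4}")
```

Checking ChatGPT's count of completed tasks against a filter like this is a quick way to confirm it read the 'Progress' column the way you expect.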
Step 7. Sorting
sort in-progress tasks based on the days required
display all the unique values in each column
make a scatterplot and analyze correlation between task name and days required
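The sorting and unique-value prompts in this step can be sketched in plain Python as well. The records are hypothetical, 'Days Required' is an assumed column name, and "in progress" is assumed to mean a progress value strictly between 0 and 100.

```python
# Hypothetical records; "Days Required" is an assumed column name.
tasks = [
    {"Task Name": "Build", "Progress": 60, "Days Required": 21},
    {"Task Name": "Design", "Progress": 100, "Days Required": 8},
    {"Task Name": "Docs", "Progress": 30, "Days Required": 5},
    {"Task Name": "Test", "Progress": 0, "Days Required": 13},
]

# Assumption: in progress means started (above 0) but not finished (below 100).
in_progress = sorted(
    (t for t in tasks if 0 < t["Progress"] < 100),
    key=lambda t: t["Days Required"],
)

# Collect the distinct values found in each column, sorted for readability.
unique_values = {
    column: sorted({t[column] for t in tasks})
    for column in tasks[0]
}

print("In-progress tasks, shortest first:")
for t in in_progress:
    print(f"  {t['Task Name']}: {t['Days Required']} days")

print("Unique values per column:")
for column, values in unique_values.items():
    print(f"  {column}: {values}")
```

Listing unique values is also a useful cleaning check, since misspelled or inconsistently formatted entries show up as separate values.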