Assignment 2 297

Uploaded by

satwiksharma.kn1


DVER

ASSIGNMENT 2
BHUVANA SREE
VU22CSEN0300297
* Explore new dataset
* Load/Import dataset
* Understand the data (size, dimensions, type)
* Perform Data Preprocessing
* Descriptive Statistics
* Analysis
* Visualization summary
* Detect the outliers
* Conclude the findings - Insights

Introduction
This report presents an analysis of a synthetic dataset on employee performance within a
fictional company. The dataset is generated in R by random sampling and consists of 100 records,
capturing key attributes: Employee ID, Department, Salary, Years of Experience, Job Satisfaction,
Performance Score, and Age. The objective of this analysis is to explore these attributes to
identify trends, relationships, and potential outliers that may influence overall employee performance.
We begin by understanding the structure and composition of the dataset, followed by the data
preprocessing steps needed to ensure data quality. Descriptive statistics then summarize the
central tendency and variability of the key metrics. Finally, we visualize the main distributions
and investigate outliers in the dataset.

1. Load Necessary Libraries

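# Load necessary libraries
library(ggplot2)      # static plots (histogram, box plots)
library(dplyr)        # data manipulation: %>%, summarise(), filter()
library(htmlwidgets)  # saveWidget() for exporting HTML files
library(plotly)       # ggplotly() for interactive versions of the plots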

2. Create the Dataset

# Set seed for reproducibility
set.seed(789)

# Create an employee performance dataset
employee_data <- data.frame(
  Employee_ID = 1:100,
  Department = sample(c("HR", "IT", "Sales", "Marketing"), 100, replace = TRUE),
  Salary = sample(30000:120000, 100, replace = TRUE),
  Years_of_Experience = sample(1:30, 100, replace = TRUE),
  Job_Satisfaction = sample(1:10, 100, replace = TRUE),
  Performance_Score = sample(50:100, 100, replace = TRUE),
  Age = sample(22:60, 100, replace = TRUE)
)

# View the first few rows of the dataset
head(employee_data)

3. Understand the Data

# Check dimensions and structure

dim(employee_data) # Dimensions

str(employee_data) # Structure of the dataset

summary(employee_data) # Summary statistics
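
As a complement to dim(), str(), and summary(), the following minimal sketch builds a one-row-per-column overview of types and distinct values. It uses base R only; the data frame `toy` and its values are hypothetical stand-ins for employee_data, not the report's actual data.

```r
# Toy stand-in for employee_data (hypothetical values, same column names as the report)
toy <- data.frame(
  Employee_ID = 1:4,
  Department = c("HR", "IT", "Sales", "IT"),
  Salary = c(42000L, 88000L, 61000L, 105000L),
  stringsAsFactors = FALSE
)

# One row per column: its class and its number of distinct values
col_overview <- data.frame(
  column = names(toy),
  class = vapply(toy, function(x) class(x)[1], character(1)),
  n_unique = vapply(toy, function(x) length(unique(x)), integer(1)),
  row.names = NULL
)

print(col_overview)
```

A column such as Department, with few distinct values relative to the number of rows, is a natural candidate for conversion to a factor.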

4. Data Preprocessing

# Check for missing data
missing_data <- sum(is.na(employee_data))

if (missing_data > 0) {
  cat("There are", missing_data, "missing values in the dataset.\n")
} else {
  cat("No missing values in the dataset.\n")
}

# Convert Department to factor
employee_data$Department <- as.factor(employee_data$Department)
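
The total count of missing values does not show which columns are affected. A minimal base-R sketch of a per-column check follows; the data frame `toy` and its NA placement are hypothetical, chosen only to illustrate the idea (the simulated employee_data contains no missing values).

```r
# Hypothetical data with a few missing values, for illustration only
toy <- data.frame(
  Salary = c(42000, NA, 61000, 105000),
  Age = c(25, 31, NA, NA),
  Department = c("HR", "IT", "Sales", "IT"),
  stringsAsFactors = FALSE
)

# Number of missing values in each column
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# Keep only the rows that have no missing value in any column
complete_rows <- toy[complete.cases(toy), ]
print(nrow(complete_rows))
```

Whether incomplete rows should be dropped or imputed depends on the analysis; the sketch shows only how to locate them.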

5. Descriptive Statistics

# Calculate descriptive statistics
desc_stats <- employee_data %>%
  summarise(
    Mean_Salary = mean(Salary),
    Median_Salary = median(Salary),
    Mean_Job_Satisfaction = mean(Job_Satisfaction),
    Mean_Performance_Score = mean(Performance_Score),
    # range() returns two values, so report minimum and maximum age separately
    Min_Age = min(Age),
    Max_Age = max(Age)
  )

print(desc_stats)
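
The overall means can hide differences between departments. The sketch below computes per-department means with base R's aggregate(), so it runs without extra packages; in dplyr the same idea is usually written with group_by() followed by summarise(). The data frame `toy` and its values are hypothetical.

```r
# Hypothetical data, for illustration only
toy <- data.frame(
  Department = c("HR", "IT", "IT", "Sales", "HR", "Sales"),
  Salary = c(40000, 90000, 70000, 60000, 50000, 65000),
  Performance_Score = c(70, 85, 95, 60, 80, 75),
  stringsAsFactors = FALSE
)

# Mean Salary and mean Performance_Score for each department
dept_means <- aggregate(
  cbind(Salary, Performance_Score) ~ Department,
  data = toy,
  FUN = mean
)

print(dept_means)
```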

6. Analysis and Visualization

a. Distribution of Salary

# Distribution of Salary

salary_plot <- ggplot(employee_data, aes(x = Salary)) +

geom_histogram(binwidth = 5000, fill = "skyblue", color = "black") +

labs(title = "Distribution of Salary", x = "Salary", y = "Frequency")

# Print the plot

print(salary_plot)

b. Job Satisfaction Box Plot

# Job Satisfaction Box Plot

job_satisfaction_plot <- ggplot(employee_data, aes(y = Job_Satisfaction)) +

geom_boxplot(fill = "lightgreen") +

labs(title = "Job Satisfaction Distribution", y = "Satisfaction Rating")

# Print the plot

print(job_satisfaction_plot)

7. Detect Outliers

# Identify outliers in Salary using the IQR method
Q1 <- quantile(employee_data$Salary, 0.25)
Q3 <- quantile(employee_data$Salary, 0.75)

# Named salary_iqr to avoid masking the built-in stats::IQR() function
salary_iqr <- Q3 - Q1

# Define outlier thresholds
outlier_threshold_low <- Q1 - 1.5 * salary_iqr
outlier_threshold_high <- Q3 + 1.5 * salary_iqr

# Get outlier information
# (salaries are drawn uniformly from 30000:120000, so this will likely return zero rows)
outliers <- employee_data %>%
  filter(Salary < outlier_threshold_low | Salary > outlier_threshold_high)

# Print outliers
print(outliers)

# Visualize Outliers

outlier_plot <- ggplot(employee_data, aes(x = "", y = Salary)) +

geom_boxplot() +

labs(title = "Box Plot of Salary with Outliers", y = "Salary")

# Print the outlier plot

print(outlier_plot)
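
The IQR rule used above can be wrapped in a small helper so that it can be applied to any numeric column. This is a minimal base-R sketch; the function name `iqr_bounds`, its parameter `k`, and the salary vector are hypothetical and not part of the original analysis.

```r
# Lower and upper outlier thresholds: Q1 - k * IQR and Q3 + k * IQR
iqr_bounds <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  spread <- q[2] - q[1]
  c(low = q[1] - k * spread, high = q[2] + k * spread)
}

# Hypothetical salaries with one extreme value, for illustration only
salaries <- c(48000, 50000, 52000, 54000, 56000, 58000, 60000, 250000)

bounds <- iqr_bounds(salaries)
flagged <- salaries[salaries < bounds["low"] | salaries > bounds["high"]]

print(bounds)
print(flagged)
```

With R's default quantile method, the quartiles of this vector are 51500 and 58500, so the thresholds are 41000 and 69000 and only the value 250000 is flagged.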

8. Export Interactive Plots

# Convert plots to interactive

salary_plotly <- ggplotly(salary_plot)

job_satisfaction_plotly <- ggplotly(job_satisfaction_plot)

# Save the interactive plots as HTML

saveWidget(salary_plotly, "salary_distribution_plot.html", selfcontained = TRUE)

saveWidget(job_satisfaction_plotly, "job_satisfaction_plot.html", selfcontained = TRUE)

Conclusion

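In this structured analysis of the employee performance dataset, we walked through each phase, from data
creation and preprocessing to descriptive statistics, visualization, and outlier detection. Applied to real
HR records, the same workflow can help organizations understand salary distributions, check how job
satisfaction relates to performance, and identify outliers that may indicate areas for further investigation.
Because every column here was sampled independently and uniformly, the simulated data should not be
expected to show genuine relationships or many outliers; the value of the exercise lies in the workflow itself.
Such findings can ultimately guide management practices aimed at improving employee engagement and productivity.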
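
The relationship between job satisfaction and performance mentioned above can be checked with a correlation coefficient. The sketch below is a hedged illustration rather than part of the original analysis: it regenerates two columns with the same sample() ranges as the report, but because it draws only these two vectors, the values will not match those in employee_data even with the same seed.

```r
set.seed(789)
n <- 100

# Same sampling ranges as the report; drawn independently of each other
job_satisfaction <- sample(1:10, n, replace = TRUE)
performance_score <- sample(50:100, n, replace = TRUE)

# Pearson correlation (the default method of cor())
r <- cor(job_satisfaction, performance_score)
print(r)
```

Since the two variables are sampled independently, any nonzero correlation here reflects sampling noise. On real data, a value far from zero would warrant a closer look, for example with cor.test().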

