Assignment 2 297

Uploaded by

satwiksharma.kn1

DVER

ASSIGNMENT 2
BHUVANA SREE
VU22CSEN0300297
* Explore new dataset
* Load/Import dataset
* Understand the data (size, dimensions, type)
* Perform Data Preprocessing
* Descriptive Statistics
* Analysis
* Visualization summary
* Detect the outliers
* Conclude the findings - Insights

Introduction
This report presents an in-depth analysis of a dataset focused on employee performance within a
fictional company. The dataset consists of 100 records, capturing key attributes such as Employee ID,
Department, Salary, Years of Experience, Job Satisfaction, Performance Score, and Age. The objective
of this analysis is to explore these attributes to identify trends, relationships, and potential outliers
that may influence overall employee performance.
We will begin by understanding the structure and composition of the dataset, followed by necessary
data preprocessing steps to ensure data quality. Descriptive statistics will provide insights into the
central tendencies and variability of the key metrics. We will then visualize important aspects of the
data to illustrate trends and distributions, followed by a thorough investigation of outliers in the
dataset.

1. Load Necessary Libraries

# Load necessary libraries


library(ggplot2)
library(dplyr)
library(htmlwidgets)
library(plotly)
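
If any of the four packages is not installed, the library() calls above stop with an error. As a minimal sketch (the variable names required_pkgs and missing_pkgs are our own, not part of the assignment), the check below lists which packages would still need to be installed, without loading or installing anything:

```r
# Packages used in this report
required_pkgs <- c("ggplot2", "dplyr", "htmlwidgets", "plotly")

# requireNamespace() returns TRUE if a package can be loaded, FALSE otherwise
is_available <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)

# Keep only the names of packages that are not available
missing_pkgs <- required_pkgs[!is_available]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are available.")
}
```

Any package reported as missing can then be installed with install.packages() before running the rest of the script.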

2. Create the Dataset

# Set seed for reproducibility

set.seed(789)
# Create an employee performance dataset

employee_data <- data.frame(

Employee_ID = 1:100,

Department = sample(c("HR", "IT", "Sales", "Marketing"), 100, replace = TRUE),

Salary = sample(30000:120000, 100, replace = TRUE),

Years_of_Experience = sample(1:30, 100, replace = TRUE),

Job_Satisfaction = sample(1:10, 100, replace = TRUE),

Performance_Score = sample(50:100, 100, replace = TRUE),

Age = sample(22:60, 100, replace = TRUE)

)

# View the first few rows of the dataset

head(employee_data)

3. Understand the Data

# Check dimensions and structure

dim(employee_data) # Dimensions

str(employee_data) # Structure of the dataset

summary(employee_data) # Summary statistics


4. Data Preprocessing

# Check for missing data

missing_data <- sum(is.na(employee_data))

if (missing_data > 0) {

cat("There are", missing_data, "missing values in the dataset.\n")

} else {

cat("No missing values in the dataset.\n")

}

# Convert Department to factor

employee_data$Department <- as.factor(employee_data$Department)
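
The check above reports only a single total. When a dataset does contain gaps, it usually helps to know which columns they are in. The sketch below uses a small hypothetical data frame (toy_employees, invented for illustration and not part of the assignment data) to show a per-column count and the number of fully complete rows:

```r
# Hypothetical toy data with deliberately missing values
toy_employees <- data.frame(
  Employee_ID = 1:4,
  Salary      = c(50000, NA, 72000, 65000),
  Age         = c(30, 41, NA, NA)
)

# Count missing values in each column
na_counts <- colSums(is.na(toy_employees))
print(na_counts)

# Count rows that have no missing values at all
complete_rows <- sum(complete.cases(toy_employees))
print(complete_rows)
```

The same two calls can be applied to employee_data directly. Since that dataset is generated with sample(), we would expect every count to be zero.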

5. Descriptive Statistics

# Calculate descriptive statistics

desc_stats <- employee_data %>%

summarise(

Mean_Salary = mean(Salary),
Median_Salary = median(Salary),

Mean_Job_Satisfaction = mean(Job_Satisfaction),

Mean_Performance_Score = mean(Performance_Score),

# range() returns two values, so report the minimum and maximum separately
Min_Age = min(Age),

Max_Age = max(Age)

)

print(desc_stats)
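
The statistics above describe the company as a whole. A natural follow-up is to compare departments. The sketch below is one possible way to do that with base R's aggregate(); it runs on a small hypothetical data frame (toy_dept, invented for illustration) so that the expected means are easy to verify by hand:

```r
# Hypothetical toy data; column names mirror the report's dataset
toy_dept <- data.frame(
  Department        = c("HR", "HR", "IT", "IT"),
  Salary            = c(40000, 60000, 80000, 100000),
  Performance_Score = c(70, 80, 90, 100),
  stringsAsFactors  = FALSE
)

# Mean salary and mean performance score for each department
dept_means <- aggregate(
  cbind(Salary, Performance_Score) ~ Department,
  data = toy_dept,
  FUN  = mean
)

print(dept_means)
```

Replacing toy_dept with employee_data would give the same breakdown for the four departments in the report.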

6. Analysis and Visualization

a. Distribution of Salary

# Distribution of Salary

salary_plot <- ggplot(employee_data, aes(x = Salary)) +

geom_histogram(binwidth = 5000, fill = "skyblue", color = "black") +

labs(title = "Distribution of Salary", x = "Salary", y = "Frequency")

# Print the plot

print(salary_plot)

b. Job Satisfaction Box Plot


# Job Satisfaction Box Plot

job_satisfaction_plot <- ggplot(employee_data, aes(y = Job_Satisfaction)) +

geom_boxplot(fill = "lightgreen") +

labs(title = "Job Satisfaction Distribution", y = "Satisfaction Rating")

# Print the plot

print(job_satisfaction_plot)
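
The two plots above look at one variable at a time. To examine how job satisfaction relates to performance, a correlation coefficient is a simple starting point. The sketch below is an illustration on hypothetical toy values (toy_scores is not part of the assignment data); it uses Spearman's rank correlation, which we assume is a reasonable choice for a 1 to 10 rating scale:

```r
# Hypothetical toy data; column names mirror the report's dataset
toy_scores <- data.frame(
  Job_Satisfaction  = c(2, 4, 5, 7, 9),
  Performance_Score = c(55, 63, 60, 78, 90)
)

# Spearman's rank correlation between satisfaction and performance
rho <- cor(
  toy_scores$Job_Satisfaction,
  toy_scores$Performance_Score,
  method = "spearman"
)

print(rho)
```

In employee_data the two columns are sampled independently of each other, so the same calculation should give a value close to zero there.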

7. Detect Outliers

# Identify outliers in Salary using IQR method

Q1 <- quantile(employee_data$Salary, 0.25)

Q3 <- quantile(employee_data$Salary, 0.75)

salary_iqr <- Q3 - Q1 # named salary_iqr to avoid masking the built-in IQR() function

# Define outlier thresholds

outlier_threshold_low <- Q1 - 1.5 * salary_iqr

outlier_threshold_high <- Q3 + 1.5 * salary_iqr

# Get outlier information

outliers <- employee_data %>%

filter(Salary < outlier_threshold_low | Salary > outlier_threshold_high)

# Print outliers
print(outliers)

# Visualize Outliers

outlier_plot <- ggplot(employee_data, aes(x = "", y = Salary)) +

geom_boxplot() +

labs(title = "Box Plot of Salary with Outliers", y = "Salary")

# Print the outlier plot

print(outlier_plot)
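
The IQR rule used above can be wrapped in a small helper so that it is easy to apply to other numeric columns. This is a sketch only: the function name iqr_outlier_flags and the toy salary vector are our own, chosen so that exactly one value falls outside the thresholds.

```r
# Flag values outside [Q1 - k * IQR, Q3 + k * IQR]; k = 1.5 is the usual choice
iqr_outlier_flags <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  spread <- q[2] - q[1]
  x < q[1] - k * spread | x > q[2] + k * spread
}

# Hypothetical salaries with one extreme value at the end
toy_salaries <- c(40000, 42000, 45000, 47000, 50000, 52000, 55000, 300000)

flags <- iqr_outlier_flags(toy_salaries)

# Show only the values flagged as outliers
print(toy_salaries[flags])
```

Salaries in employee_data are drawn evenly from 30000 to 120000, so the thresholds typically fall outside that range and the outlier table is likely to be empty. An empty result in that case reflects how the data was generated, not a mistake in the code.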

8. Export Interactive Plots

# Convert plots to interactive

salary_plotly <- ggplotly(salary_plot)

job_satisfaction_plotly <- ggplotly(job_satisfaction_plot)

# Save the interactive plots as HTML

saveWidget(salary_plotly, "salary_distribution_plot.html", selfcontained = TRUE)

saveWidget(job_satisfaction_plotly, "job_satisfaction_plot.html", selfcontained = TRUE)

Conclusion

In this structured analysis of the employee performance dataset, we walked through each phase, from data
creation and preprocessing to descriptive statistics, visualization, and outlier detection. Because the values
are randomly sampled for a fictional company, the results illustrate the workflow rather than real workforce
trends. Applied to real records, the same steps can help organizations understand salary distributions, examine
how job satisfaction relates to performance, and identify outliers that warrant further investigation. Such
findings can ultimately guide management practices aimed at improving employee engagement and productivity.
