
DVDA Laboratory Manual for Data Analysis

The document outlines a series of experiments for a Data Visualization and Data Analytics laboratory course at Parul University, covering various methods and tools such as MS-Excel, Python libraries, and machine learning algorithms. Each experiment includes specific aims, codes, and outputs related to data analysis, statistical measures, and visualization techniques. Tools like Weka and Tableau are also discussed for their applications in data mining and visualization.

Uploaded by

Divyaraj Gohil
Copyright © All Rights Reserved


DVDA Manual PIT

Data structure And Algorithms (Parul University)



Downloaded by Divyraj Gohil ([email protected])

Faculty of Engineering and Technology (PIT)


Subject Name: DVDA LABORATORY
Subject Code: 203105308

EXPERIMENT NO: 1
AIM: Use MS-Excel to create pivot table & apply statistical measures to it.

CODE:
Step1: Create a normal table

Step2: Fill the table with random values using =RANDBETWEEN(m, n), where m is the lower limit and n is the upper limit (both inclusive). This formula fills the cells with random whole numbers.

Step3: After filling in all the data, calculate the total marks of each student with =SUM(first_cell:last_cell), for example =SUM(C2:G2) if the subject marks are in columns C to G. Enter the formula for the first student, then drag it down so that the total is calculated for every student.

Step4: To calculate the average marks of a student, use =AVERAGE(first_cell:last_cell) over the subject marks, or equivalently =sum/n, where sum is the total of all subjects and n is the number of subjects.

Step5: To assign a grade to each student, use nested IF conditions. For example, with the average in cell I2:

=IF(I2>=60,"A",IF(I2>=50,"B",IF(I2>=40,"C","F")))

Any average below 40 is graded F.

Step6: Select the table, click Insert > PivotTable, choose whether the pivot table should be placed in a new worksheet or in the existing worksheet, and press OK.

Step7: Your pivot table is ready.
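The spreadsheet steps above can also be mirrored in pandas to check the logic. This is a minimal sketch, not part of the Excel procedure: the 10 students, the subject names Sub1 to Sub5 and the 0 to 100 mark range are hypothetical, and `pivot_table` with several aggregation functions stands in for the statistical measures applied to the Excel pivot table.

```python
import numpy as np
import pandas as pd

# Hypothetical setup: 10 students, 5 subjects, marks between 0 and 100
rng = np.random.default_rng(0)
subjects = ["Sub1", "Sub2", "Sub3", "Sub4", "Sub5"]
marks = pd.DataFrame(
    # integers() excludes its upper bound, so 101 gives 0..100 like =RANDBETWEEN(0, 100)
    rng.integers(0, 101, size=(10, len(subjects))),
    columns=subjects,
)
marks.insert(0, "Student", [f"S{i + 1}" for i in range(10)])

# Step3 and Step4: total and average per student
marks["Total"] = marks[subjects].sum(axis=1)
marks["Average"] = marks[subjects].mean(axis=1)

# Step5: the same thresholds as the nested IF formula
def grade(avg):
    if avg >= 60:
        return "A"
    if avg >= 50:
        return "B"
    if avg >= 40:
        return "C"
    return "F"

marks["Grade"] = marks["Average"].apply(grade)

# Step6: pivot table with statistical measures of the average, per grade
pivot = marks.pivot_table(index="Grade", values="Average",
                          aggfunc=["count", "mean", "min", "max"])
print(marks)
print(pivot)
```

Grouping by grade is only one choice of pivot row; any column (for example a subject) can be used as the index instead.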


OUTPUT:


EXPERIMENT NO: 2
Aim: Use the table created in the previous practical to generate different charts.

Code:
              Column Labels
Values        MOTO   Grand Total
Sum of JAN     212           212
Sum of FEB     469           469
Sum of MAR     161           161
Sum of APR     150           150
Sum of MAY     125           125
Sum of JUN     297           297
Sum of JUL     438           438
Sum of AUG     381           381
Sum of SEP     398           398
Sum of OCT     571           571
Sum of NOV     445           445
Sum of DEC     288           288

OUTPUT:

[Chart: column chart titled "MOTO" showing Sum of JAN to Sum of DEC; y-axis 0 to 600]


[Chart: pie chart titled "MOTO" showing each month's share (Sum of JAN to Sum of DEC) of the yearly total]

[Chart: clustered column chart of Sum of JAN to Sum of DEC for GOOGLE, IPHONE, IQOO, MOTO, NOKIA, ONE PLUS, OPPO, SAMSUNG and VIVO; y-axis 0 to 800]

[Chart: stacked column chart of the monthly sums (Sum of JAN to Sum of DEC) for each brand, GOOGLE to VIVO; y-axis 0 to 6000]
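The MOTO column and pie charts can also be produced in Python from the pivot-table values listed for this experiment. A minimal sketch, assuming matplotlib is installed; the output file name `moto_charts.png` is hypothetical, and the Agg backend is used only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # render without a display; remove to open chart windows
import matplotlib.pyplot as plt

# Monthly sums for MOTO, taken from the pivot table of this experiment
months = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]
moto = [212, 469, 161, 150, 125, 297, 438, 381, 398, 571, 445, 288]

fig, (ax_bar, ax_pie) = plt.subplots(1, 2, figsize=(14, 5))

# Column chart: one bar per month
ax_bar.bar(months, moto)
ax_bar.set_title("MOTO")

# Pie chart: each month's share of the yearly total, labelled in whole percent
ax_pie.pie(moto, labels=months, autopct="%1.0f%%")
ax_pie.set_title("MOTO")

fig.tight_layout()
fig.savefig("moto_charts.png")  # hypothetical output file name

# The Grand Total and the rounded shares shown on the pie chart
total = sum(moto)
shares = {m: round(100 * v / total) for m, v in zip(months, moto)}
print("Grand total:", total)
print(shares)
```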


EXPERIMENT NO:3
AIM: Perform histogram analysis of a given dataset using the Data Analysis toolbox of Excel.

Code:
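Step1: Create a table of the marks of 50 students in 5 subjects and fill it with random values using =RANDBETWEEN(m, n), where m is the lower limit and n is the upper limit.

Step2: Calculate the average marks of each of the 50 students.

Step3: Open Data > Data Analysis > Histogram (if Data Analysis is not shown, enable the Analysis ToolPak add-in under File > Options > Add-ins). Select the column to analyse as the Input Range, optionally give a Bin Range, tick Chart Output and press OK.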
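The frequency table behind the Excel histogram can be reproduced in Python. A minimal sketch, assuming the same bin convention as Excel's FREQUENCY function: each bin counts values greater than the previous limit and less than or equal to its own limit, with an extra "More" bin for values above the last limit. The averages and bin limits below are hypothetical.

```python
from bisect import bisect_left

def excel_style_histogram(values, bins):
    """Count values per bin: (previous limit, limit], plus a final 'More' bin."""
    bins = sorted(bins)
    counts = [0] * (len(bins) + 1)          # last slot is the 'More' bin
    for v in values:
        counts[bisect_left(bins, v)] += 1   # index of the first limit >= v
    labels = [str(b) for b in bins] + ["More"]
    return dict(zip(labels, counts))

# Hypothetical student averages and bin limits
averages = [35, 42, 40, 58, 61, 75, 88, 20, 49, 50]
bin_limits = [20, 40, 60, 80]

result = excel_style_histogram(averages, bin_limits)
for label, count in result.items():
    print(label, count)
```

`bisect_left` is used because a value equal to a limit (such as 40) belongs to that limit's bin, not the next one.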

Output:


EXPERIMENT NO: 4
AIM: Use Python libraries to generate charts from data stored in Excel.

CODE:
import pandas as pd
import matplotlib.pyplot as plt

# Read the marks sheet (reading .xlsx files requires the openpyxl package)
d = pd.read_excel(r'F:\SEMESTER 5\DATA VISULIZATION AND DATA ANALYTICS\LAB\Marks.xlsx')
print(d)

# Measures of central tendency for each subject
meanm = d["Maths"].mean()
medianm = d["Maths"].median()
modem = d["Maths"].mode()      # mode() returns a Series, since there can be several modes

meanp = d["Physics"].mean()
medianp = d["Physics"].median()
modep = d["Physics"].mode()

# Pairwise correlation of the numeric columns only (text columns are skipped)
cor = d.corr(numeric_only=True)

print("Maths:")
print("Mean: ", meanm)
print("Median: ", medianm)
print("Mode: ", modem.tolist())

print("Physics:")
print("Mean: ", meanp)
print("Median: ", medianp)
print("Mode: ", modep.tolist())

print("Correlation: ")
print(cor)

# Overlaid histograms of the two subjects
d[["Maths", "Physics"]].plot(kind="hist", alpha=0.5, bins=10)
plt.xlabel("Marks")
plt.show()
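Without the Marks.xlsx file, the same statistics can be checked on a small in-memory DataFrame. A minimal sketch: the six students and their marks are hypothetical, and `corr(numeric_only=True)` assumes pandas 1.5 or later.

```python
import pandas as pd

# Hypothetical marks of six students (stands in for Marks.xlsx)
d = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E", "F"],
    "Maths": [70, 85, 70, 60, 90, 75],
    "Physics": [65, 80, 72, 58, 88, 70],
})

print("Maths mean:", d["Maths"].mean())              # 75.0
print("Maths median:", d["Maths"].median())          # 72.5, the mean of 70 and 75
print("Maths mode:", d["Maths"].mode().tolist())     # [70]

# numeric_only=True skips the text column 'Name'
print(d.corr(numeric_only=True))
```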

OUTPUT:


EXPERIMENT NO: 5
AIM: Perform multiple linear regression on data.

CODE:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
import matplotlib.pyplot as plt

df = pd.read_excel(r'H:\CODES\PYTHON\JEET.xlsx')
print(df.head())

# 'V' is the target; every other column is used as a predictor
x = df.drop(['V'], axis=1).values
y = df['V'].values
print(x)
print(y)

# Hold out 30% of the rows for testing
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)

ml = LinearRegression()
ml.fit(x_train, y_train)

y_pred = ml.predict(x_test)
print(y_pred)
print("R2 score:", r2_score(y_test, y_pred))

# Scatter plot of actual against predicted values
plt.figure(figsize=(15, 10))
plt.scatter(y_test, y_pred)
plt.xlabel('Original')
plt.ylabel('Predicted')
plt.title('Original vs Predicted')
plt.show()

# Table of actual values, predictions and residuals
pred_y_df = pd.DataFrame({'Original Value': y_test,
                          'Predicted Value': y_pred,
                          'Difference': y_test - y_pred})
print(pred_y_df[0:20])
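Because JEET.xlsx is not included, the same `LinearRegression` workflow can be sanity-checked on hypothetical noise-free data generated from V = 3 + 2*x1 - 1.5*x2. With no noise, the fitted intercept and coefficients should recover those values and R2 should be 1 up to rounding, which shows how to read `intercept_` and `coef_` in the lab output.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Hypothetical noise-free data: V = 3 + 2*x1 - 1.5*x2
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(50, 2))
v = 3 + 2 * x[:, 0] - 1.5 * x[:, 1]

ml = LinearRegression().fit(x, v)
print("intercept:", ml.intercept_)          # close to 3
print("coefficients:", ml.coef_)            # close to [2, -1.5]
print("R2:", r2_score(v, ml.predict(x)))    # close to 1.0, since there is no noise
```

Each coefficient is the expected change in V for a one-unit increase in that predictor while the other predictors are held fixed.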

OUTPUT


EXPERIMENT NO: 6
AIM: Perform logistic regression on a dataset and interpret the
regression table.

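LOGISTIC REGRESSION: Logistic regression estimates the probability of an event occurring, such as voted or didn't vote, based on a given set of independent variables. The digits dataset used below has ten classes (the digits 0 to 9), so the model here is a multiclass logistic regression.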
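To make the probability interpretation concrete, the sketch below fits a binary logistic regression and checks that the predicted probability equals the sigmoid 1 / (1 + exp(-(b0 + b1*x))) of the fitted intercept and coefficient. The hours-studied and pass/fail data are hypothetical. Note that scikit-learn's `LogisticRegression` applies L2 regularisation by default (C=1.0), so its coefficients differ somewhat from an unpenalised fit such as a statsmodels regression table.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: hours studied and pass (1) / fail (0)
hours = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
passed = np.array([0, 0, 0, 1, 0, 1, 1, 1])

model = LogisticRegression()
model.fit(hours, passed)

b0 = model.intercept_[0]   # intercept
b1 = model.coef_[0, 0]     # coefficient for 'hours'

# P(pass) = 1 / (1 + exp(-(b0 + b1 * hours)))
x_new = np.array([[4.5]])
p_manual = 1.0 / (1.0 + np.exp(-(b0 + b1 * x_new[0, 0])))
p_sklearn = model.predict_proba(x_new)[0, 1]   # column 1 is class 1 ("pass")

print("intercept:", b0, "coefficient:", b1)
print("odds ratio per extra hour:", np.exp(b1))
print("P(pass | 4.5 h), manual :", p_manual)
print("P(pass | 4.5 h), sklearn:", p_sklearn)
```

A positive coefficient means each extra hour multiplies the odds of passing by exp(b1).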

CODE:
from sklearn.datasets import load_digits
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

digits = load_digits()
print(dir(digits))            # available fields: data, images, target, ...
print(digits.data[4])         # one 8x8 image flattened to 64 pixel values

plt.gray()
plt.matshow(digits.images[1])
plt.show()
print(digits.target[0:5])     # labels of the first five images

x_train, x_test, y_train, y_test = train_test_split(digits.data, digits.target, test_size=0.2)
print(len(x_test))

# max_iter is raised because the default (100) often stops before convergence on this data
logistic = LogisticRegression(max_iter=5000)
logistic.fit(x_train, y_train)
print("Accuracy:", logistic.score(x_test, y_test))

# Inspect one sample and compare its true label with the prediction
plt.matshow(digits.images[84])
plt.show()
print("Actual:", digits.target[84])
print("Predicted:", logistic.predict([digits.data[84]]))

OUTPUT


EXPERIMENT NO: 7
AIM: Use a dataset and apply KNN to get insights from data

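KNN ALGORITHM: K-nearest neighbours (KNN) is a simple algorithm that stores all the available cases and classifies a new case based on a similarity measure (usually a distance): the new case is assigned the class that is most common among its K most similar stored cases.

K denotes the number of nearest neighbours considered.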
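A minimal sketch of the idea, assuming scikit-learn's bundled Iris dataset (not necessarily the dataset used in the lab) and K = 5. The hand-written `knn_predict` function (a hypothetical helper) stores the training cases, measures Euclidean distance to each of them and takes a majority vote; scikit-learn's `KNeighborsClassifier` is run alongside for comparison.

```python
import numpy as np
from collections import Counter
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def knn_predict(x_train, y_train, x_query, k=5):
    """Classify one query point by majority vote among its k nearest training cases."""
    # Euclidean distance from the query to every stored training case
    dists = np.sqrt(((x_train - x_query) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]        # indices of the k closest cases
    votes = Counter(y_train[nearest])      # count the class labels among them
    return votes.most_common(1)[0][0]

iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0)

manual = np.array([knn_predict(x_train, y_train, q, k=5) for q in x_test])

clf = KNeighborsClassifier(n_neighbors=5).fit(x_train, y_train)
library = clf.predict(x_test)

print("hand-written accuracy:", (manual == y_test).mean())
print("scikit-learn accuracy:", clf.score(x_test, y_test))
print("agreement between the two:", (manual == library).mean())
```

The two can differ slightly on ties, because they break equal distances and equal votes differently.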

CODE:


EXPERIMENT NO: 8
AIM: Use a dataset & apply K means clustering to get insights from data

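K MEANS CLUSTERING: K-means clustering is one of the simplest unsupervised learning algorithms for clustering. It partitions the data into K clusters by repeatedly assigning each point to its nearest cluster centre (centroid) and then moving each centroid to the mean of the points assigned to it.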
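The assign-then-update loop described above (Lloyd's algorithm) fits in a few lines of NumPy. A minimal sketch on hypothetical data, two well-separated groups of 2-D points, so K = 2 is known in advance; in practice scikit-learn's `KMeans` adds smarter initialisation and multiple restarts.

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm: returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for every point
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its points (keep it if empty)
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Hypothetical data: two well-separated groups of 2-D points
rng = np.random.default_rng(42)
group_a = rng.normal(loc=[0, 0], scale=0.5, size=(50, 2))
group_b = rng.normal(loc=[5, 5], scale=0.5, size=(50, 2))
data = np.vstack([group_a, group_b])

centroids, labels = kmeans(data, k=2)
print("centroids:\n", centroids)
print("cluster sizes:", np.bincount(labels))
```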

CODE:


EXPERIMENT NO: 9
AIM: Study tools such as Orange, Tableau and Weka for data visualization.

Weka Tool:
Weka is a very popular open-source data mining tool developed at the University of Waikato in New Zealand in 1992. It is a Java-based tool that can be used to apply various machine learning and data mining algorithms, which are themselves written in Java. Its simplicity has made it a landmark for machine learning and data mining work. Weka can read data from files and from several different databases, and it can also import data from the internet, from web pages or from a remote SQL database server when given the URL of the resource. Weka is one of the most commonly used data mining tools, owing to its fast performance and its support for the major classification and clustering algorithms, and it is easy to download and deploy. Weka provides both a GUI and a CLI for performing data mining and supports the common data mining tasks well [16]. It supports several data formats, such as CSV (comma-separated values), ARFF and binary files. Weka focuses more on textual representation of the data than on visualization: it does provide some visualizations, but they are very generic, and it does not present the results of processing as effectively or understandably as RapidMiner does. Weka performs well when the dataset is not large; with large datasets it can run into performance issues. Weka also supports filtering out data or attributes.
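Since Weka reads ARFF files, a small pandas DataFrame can be written out in that format. A minimal sketch, assuming attribute names and values contain no spaces, commas or quotes (no escaping is done) and that there are no missing values; `to_arff` and the `weather` data are hypothetical. Numeric columns become NUMERIC attributes and all other columns become nominal attributes listing their distinct values.

```python
import pandas as pd

def to_arff(df, relation):
    """Return the DataFrame as ARFF text (no quoting or missing-value handling)."""
    lines = [f"@RELATION {relation}", ""]
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            lines.append(f"@ATTRIBUTE {col} NUMERIC")
        else:
            values = ",".join(sorted(df[col].astype(str).unique()))
            lines.append(f"@ATTRIBUTE {col} {{{values}}}")
    lines += ["", "@DATA"]
    for row in df.itertuples(index=False):
        lines.append(",".join(str(v) for v in row))
    return "\n".join(lines) + "\n"

# Hypothetical example data
weather = pd.DataFrame({
    "temperature": [85, 80, 64],
    "play": ["no", "no", "yes"],
})
text = to_arff(weather, "weather")
print(text)
```

Saving `text` to a file with the `.arff` extension lets it be opened from the Preprocess tab.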

Tableau Tool:
Tableau is a powerful data visualization tool used in business intelligence and data analysis. Tableau Software was founded by Chris Stolte, Christian Chabot and Pat Hanrahan in January 2003 [18]. Tableau's visualizations make it much easier to gain knowledge about the data being worked on, which can support more accurate predictions. “The product queries relational databases, cubes, cloud databases, and spread sheets and then generates a number of graph types that can be combined into dashboards which can be securely shared over a computer network or the internet” [18]. Unlike RapidMiner and Weka, Tableau does not implement data mining algorithms; it provides visualizations of the data. For data mining, Tableau provides integration with R, another popular statistical analysis tool (see Appendix). “Tableau offers five main products namely Tableau Desktop, Tableau Server, Tableau Online, Tableau Reader and Tableau Public. Tableau Public and Tableau Reader are available freely, whereas Tableau Server and Tableau Desktop come with a free trial period afterwards which the user has to pay”. Tableau makes it possible to explore and present data in a simpler and more attractive way, and working on projects with it is less time-consuming and easy to handle. Tableau's Dashboard feature is a collection of worksheets that can be easily imported from anywhere.

Orange Tool:
Orange is an open-source, component-based software suite for machine learning and data mining, with strong support for data visualization.
Its components are called ‘widgets’.
Widgets offer major functionalities like
• Showing data table and allowing to select features
• Reading the data
• Training predictors and to compare learning algorithms
• Visualizing data elements etc.
Its visual, widget-based workflows also make analysis more interactive and engaging than many traditional analytics tools.

PRE-PROCESSING IN WEKA
Start Weka

From the Weka GUI Chooser, select Explorer as the application.

Step1: In the Preprocess tab, click Open file... and select your dataset.

Step2: In the Filter section, click Choose and select any filters you want to apply to the dataset.

Weka provides supervised and unsupervised filters for both attributes and instances.

In our example we have not selected any filter.

Step3: Select the attributes of the dataset you want to keep for analysis.

You can remove attributes you do not need by ticking them and clicking Remove.

In this example, we have kept all the attributes for a sample visualization.

Again, depending on your needs, you can select the number of attributes you want to use.

STEPS IN WEKA

Step 1: Start Weka

Step 2: Perform the steps described in the preprocessing section above

Step 3: Go to the Classify tab and select NaiveBayes (under bayes) as the classifier.

Depending on which classification algorithm you want to implement, select the corresponding classifier.

Step 4: Click on the classifier name to open the NaiveBayes object editor.

In our example, we keep all the default properties.

Step 5: Select one attribute to be the class label for classification.

We select “state” as the label in our example.

Step 6: After selecting the label, click Start to run the classification algorithm.

Step 7: The classification results for the attributes, based on the label, are displayed in the Classifier output pane.

Step 8: Weka also provides visualization options: right-click the particular entry in the Result list.
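A rough scikit-learn analogue of this Weka run, for readers working in Python. A sketch under assumptions: the Iris dataset stands in for the lab's dataset (whose "state" label is not available here), `GaussianNB` models each numeric attribute with a normal distribution per class, similar to Weka's NaiveBayes with default settings, and 10-fold cross-validation mirrors the Classify tab's default test option. Results will not match Weka's output exactly.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

# Iris stands in for the lab dataset; its class label is iris.target
iris = load_iris()

# 10-fold cross-validation of a Gaussian Naive Bayes classifier
scores = cross_val_score(GaussianNB(), iris.data, iris.target, cv=10)
print("accuracy per fold:", scores.round(3))
print("mean accuracy:", scores.mean())
```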


EXPERIMENT NO: 10
AIM: Given a case study: Interactive Data Analytics with Power BI.

1. HEATHROW
• Heathrow Airport is an international airport in London. It is the second-busiest airport in the world by international passenger traffic, after Dubai International Airport, and the seventh-busiest in terms of total passenger traffic.

THE CHALLENGE
• As the world’s seventh-busiest airport by overall passenger traffic, Heathrow demands a high level of efficiency and effort from its ground management to keep the airport functioning properly. Managing over 200,000 passengers every day is a challenging task for airport authorities and ground staff. Every department needs to be in close coordination to manage the passenger traffic and give passengers a smooth experience. At such busy airports, every day brings new challenges and uncertainties. Unexpected disruptions to the workflow, caused by stormy weather, delayed or cancelled flights, shifts in jet streams and so on, disturb the entire operation and throw passengers as well as airport employees into turmoil.
• The airport needed a central digital management system as a solution to this problem. Such a system would take the large amounts of data produced by the airport’s operational systems and transform it into useful visual insights. Airport staff could then use the interpretations produced by the BI tool to improve operations and passenger management.
THE CHANGE
• Heathrow chose Microsoft Power BI as its business intelligence software and Microsoft Azure for cloud services. The airport has deployed Microsoft Azure technology to collect data from back-end operational systems at the airport, such as check-in counters, baggage tracking systems, flight schedules, weather tracking systems, cargo tracking and many more.
• The operational data from these systems are forwarded to business intelligence
platforms like Power BI. In Power BI, users shape this data into useful information
that the airport staff can use.


• Power BI transforms the raw information into informative visuals showing the different statuses and statistics of the airport systems. Ground staff such as baggage handlers, gate agents and air traffic controllers then use this information to operate properly and manage passengers.

• Services such as Azure Stream Analytics, Azure Data Lake Analytics, and Azure SQL
Database are used to extract, clean and prepare operational data in real-time. This
data is about flight movements, security queues, passenger transfers, and
immigration queues. Ultimately, Power BI uses data from these Azure services for
analysis and interpretation.
• Operational data from different data sources comes into Power BI, where its tools transform the data into meaningful insights through visual reports, graphics and dashboards. About 75,000 airport employees have information at their fingertips thanks to Power BI.

• Consider a real-world example. A change in the jet stream may delay about 20 flights in a day, leaving about 6,000 passengers waiting at the airport at a given time and increasing passenger traffic and density. Power BI works as the centralized information system: the airport uses it to announce the sudden passenger influx to different sections such as food outlets, immigration, customs, gate attenders and baggage handlers. This gives them time to prepare to attend to the passengers.
• With a BI solution like Power BI, airport staff are notified in advance about probable delays and sudden rushes of passengers. This helps management groups and other employees take suitable actions in advance, such as increasing food stock, adding extra passenger buses, increasing ground staff and directing passengers to the waiting area, to avoid any last-minute rush.
• Thus, Heathrow has benefited from Power BI in more than one way and is highly satisfied with its capabilities, which help the airport give passengers a hassle-free experience. Heathrow is also extending its Power BI applications by trying to anticipate passenger flow at the airport to avoid unexpected disruptions for passengers.

STEPS TO ANALYZE DATA IN POWER BI


Step1: Download and install Power BI Desktop on your PC.

Step2: Download a dataset to analyze. Here, a dataset of chocolate sales to different countries, including information about the dealer, was used.

Step 3: Open Power BI. The home page is displayed.


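Step 4: Import the file you want to analyze (for example, through Get data on the Home ribbon).

Step 5: Build visuals such as charts, tables and maps from the imported fields to analyze the data as required.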
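A typical first visual in Power BI is a table or bar chart of total sales per country. The same summary can be sketched in pandas for comparison. This assumes hypothetical rows and hypothetical column names `Country`, `Sales Person` and `Amount`; the real chocolate-sales file may be laid out differently.

```python
import pandas as pd

# Hypothetical rows standing in for the chocolate-sales dataset
sales = pd.DataFrame({
    "Country": ["India", "UK", "India", "USA", "UK", "USA"],
    "Sales Person": ["P1", "P2", "P1", "P3", "P2", "P3"],
    "Amount": [500, 300, 200, 700, 100, 400],
})

# Equivalent of a table visual: total amount per country, largest first
by_country = (sales.groupby("Country", as_index=False)["Amount"].sum()
                   .sort_values("Amount", ascending=False))
print(by_country)
```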
