
Sanjeev Agrawal Global Educational University Bhopal

School of Engineering and Technology

Subject: Machine Learning Lab    Subject Code: CS-702

Semester: VII    Year: IV

Index

S.No.  List of Experiments

1      Installation of Anaconda Distribution in Windows Operating System.

2      Case study of how to get various datasets for training.

3      Import Dataset, using Dataset loading utilities (sklearn).

4      Implementation of Linear Regression Algorithm.

5      Implementation of K-Means Clustering Algorithm.

6      Study of Training and Testing Data Sets.

7      Implementation of Logistic Regression.

8      Visualization of Data using Matplotlib.

9      Implementation of Decision Trees

10     Implementation of Support Vector Machines (SVM)

Name: Ravi Agnihotri    Enrollment: 22BTE3CSE30011


Experiment 01

Aim:- Installation of Anaconda and Python in Windows Operating System

To learn machine learning, we will use the Python programming language. To use
Python for machine learning, we need to install it on our computer system along
with a compatible IDE (Integrated Development Environment). In this practical,
we will learn to install Python and an IDE with the help of the Anaconda
distribution.
Anaconda distribution is a free and open-source platform for the Python and R
programming languages. It can be installed on Windows, Linux, and macOS. It
provides more than 1500 Python/R data science packages which are suitable for
developing machine learning and deep learning models.
The Anaconda distribution installs Python together with tools such as Jupyter
Notebook, Spyder, and Anaconda Prompt. Hence it is a very convenient packaged
solution which you can easily download and install on your computer. It will
automatically install Python and some basic IDEs and libraries with it.
The steps below show how to download and install Anaconda and an IDE:

Step-1: Download Anaconda Python:

To download Anaconda, open your browser, search for "Download Anaconda Python",
and click on the first link, as shown in the image below. Alternatively, you can
download it directly from this link:
https://www.anaconda.com/distribution/#download-section

• After clicking on the first link, you will reach the download page of Anaconda, as shown in
the image below:

• Anaconda is available for Windows, Linux, and macOS, so you can download it for your OS
by clicking on one of the available options shown in the image below. It offers Python 2.7
and Python 3.7 versions; since 3.7 is the newer of the two, we will download the Python 3.7
version. After clicking on the download option, the download will start on your computer.

Step-2: Install Anaconda Python (Python 3.7 version):

Once the download is complete, go to Downloads and double-click on the ".exe" file of Anaconda
(Anaconda3-2019.03-Windows-x86_64.exe). It will open a setup window for the Anaconda
installation, as shown in the image below. Click on Next.

• It will open a License Agreement window. Click on the "I Agree" option and move further.

• In the next window, you will get two options for installation, as shown in the image below.
Select the first option (Just Me) and click on Next.

• Now you will get a window for the installation location. Here you can leave it as the default
or change it by browsing to a location, and then click on Next. Consider the image below:

• Now select the second option, and click on Install.

• Once the installation is complete, click on Next.

• The installation is now complete. Tick the checkboxes if you want to learn more about
Anaconda and Anaconda Cloud. Click on Finish to end the process.

Step-3: Open Anaconda Navigator
• After successful installation of Anaconda, use Anaconda Navigator to launch a Python IDE
such as Spyder or Jupyter Notebook.
• To open Anaconda Navigator, press the Windows key, search for Anaconda Navigator,
and click on it. Consider the image below:

• After opening the Navigator, launch the Spyder IDE by clicking on the Launch button given
below Spyder. It will install the Spyder IDE on your system.

Run your Python program in the Spyder IDE.
• Open the Spyder IDE; it will look like the image below:

• Write your first program, and save it using the .py extension.
• Run the program using the triangle Run button.
• You can check the program's output in the console pane at the bottom right.
Step-4: Close the Spyder IDE.
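
A possible first program for the "write, save, and run" step above is a short check that the installation works. The sketch below is only an illustration: the file name first_program.py, the helper check_packages, and the list of packages checked are assumptions, not part of the original manual.

```python
# first_program.py (hypothetical file name)
import importlib.util
import sys


def check_packages(names):
    """Return a dict mapping each package name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Show which Python interpreter Anaconda installed.
print("Python version:", sys.version.split()[0])

# Packages the later experiments rely on (an assumed list).
for name, found in check_packages(["numpy", "pandas", "sklearn", "matplotlib"]).items():
    print(f"{name:12s} {'installed' if found else 'missing'}")
```

If any package is reported as missing, it can usually be installed from Anaconda Prompt with conda install followed by the package name.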

Experiment 02

Aim:- Case study of how to get various datasets for training.
The key to success in the field of machine learning, or to becoming a great data scientist, is to
practice with different types of datasets. But finding a suitable dataset for each kind of
machine learning project is a difficult task. So, this experiment lists sources from which you
can easily get a dataset suited to your project.
Before looking at the sources of machine learning datasets, let's discuss datasets.

What is a dataset?

A dataset is a collection of data in which the data is arranged in some order. A dataset can
contain anything from a simple array to a database table. The table below shows an example
of a dataset:

Country   Age   Salary   Purchased
India     38    48000    No
France    43    45000    Yes
Germany   30    54000    No
France    48    65000    No
Germany   40             Yes
India     35    58000    Yes

A tabular dataset can be understood as a database table or matrix, where each column
corresponds to a particular variable and each row corresponds to one record of the dataset.
The most widely supported file type for a tabular dataset is the comma-separated values
(CSV) file.
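
As an illustration of the tabular form described above, the example table can be built as a pandas DataFrame and round-tripped through CSV text. This is a minimal sketch: the variable names are assumptions, and the Salary value that is absent in the fifth row of the table is kept as a missing value rather than filled in.

```python
import io

import numpy as np
import pandas as pd

# The example table; the missing Salary in the fifth row is stored as NaN.
dataset = pd.DataFrame({
    "Country": ["India", "France", "Germany", "France", "Germany", "India"],
    "Age": [38, 43, 30, 48, 40, 35],
    "Salary": [48000, 45000, 54000, 65000, np.nan, 58000],
    "Purchased": ["No", "Yes", "No", "No", "Yes", "Yes"],
})

# Write the table as CSV text and read it back, as one would with a .csv file.
csv_text = dataset.to_csv(index=False)
restored = pd.read_csv(io.StringIO(csv_text))

print(restored)
print(restored.isna().sum())  # number of missing values in each column
```

Each column is one variable and each row is one record; the empty CSV field comes back as a missing value when the file is read.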

Types of data in datasets

• Numerical data: Such as house price, temperature, etc.
• Categorical data: Such as Yes/No, True/False, Blue/Green, etc.
• Ordinal data: These data are similar to categorical data, but the categories have a
meaningful order and can therefore be compared (for example small, medium, large).
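
The three kinds of data listed above can be represented in pandas as follows. This is a sketch with hypothetical column names and values; the ordered categorical type is one possible way to model ordinal data.

```python
import pandas as pd

df = pd.DataFrame({
    # Numerical data: plain numbers.
    "house_price": [250000.0, 310000.0, 180000.0],
    # Categorical data: a fixed set of labels with no order.
    "has_garden": pd.Categorical(["Yes", "No", "Yes"]),
    # Ordinal data: labels with an explicit order.
    "size": pd.Categorical(
        ["small", "large", "medium"],
        categories=["small", "medium", "large"],
        ordered=True,
    ),
})

print(df.dtypes)
# Ordered categories support comparisons; unordered ones do not.
print(df["size"].min(), df["size"].max())
print((df["size"] > "small").tolist())
```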

Note: Real-world datasets are often very large, which makes them difficult to manage and
process at the beginner level. Therefore, to practice machine learning algorithms, we can
use a small dummy dataset.

Need of Dataset
To work on machine learning projects, we need a large amount of data, because without
data one cannot train ML/AI models. Collecting and preparing the dataset is one of the most
crucial parts of creating an ML/AI project.
The technology behind an ML project cannot work properly if the dataset is not well
prepared and pre-processed.
During the development of an ML project, the developers rely completely on the datasets. In
building ML applications, datasets are divided into two parts:

• Training dataset
• Test dataset

Note: These datasets are large, so you need a fast internet connection to download them.

Popular sources for Machine Learning datasets
Below is a list of dataset sources that are freely available for the public to work with:

1. Kaggle Datasets

Kaggle is one of the best sources of datasets for data scientists and machine learning
practitioners. It allows users to find, download, and publish datasets in an easy way. It also
provides the opportunity to work with other machine learning engineers and solve difficult
data science tasks.
Kaggle provides high-quality datasets in different formats that we can easily find and
download. The link for Kaggle datasets is https://www.kaggle.com/datasets.

2. UCI Machine Learning Repository

The UCI Machine Learning Repository is one of the great sources of machine learning
datasets. This repository contains databases, domain theories, and data generators that are
widely used by the machine learning community for the analysis of ML algorithms.
Since 1987, it has been widely used by students, professors, and researchers as a primary
source of machine learning datasets.
It classifies the datasets by machine learning problem and task, such as Regression,
Classification, and Clustering. It also contains some popular datasets such as the Iris
dataset, the Car Evaluation dataset, and the Poker Hand dataset.
The link for the UCI Machine Learning Repository is https://archive.ics.uci.edu/ml/index.php.

3. Datasets via AWS

We can search, download, access, and share the datasets that are publicly available via AWS
resources. These datasets are accessed through AWS resources but are provided and
maintained by different government organizations, researchers, businesses, or individuals.
Anyone can analyze the shared data and build various services on it via AWS resources. A
shared dataset on the cloud lets users spend more time on data analysis and less on data
acquisition.
This source provides various types of datasets with examples and ways to use them. It also
provides a search box with which we can search for the required dataset. Anyone can add a
dataset or example to the Registry of Open Data on AWS.
The link for the resource is https://registry.opendata.aws/.

4. Google's Dataset Search Engine

Google Dataset Search is a search engine launched by Google on September 5, 2018.
This source helps researchers to find online datasets that are freely available for use.
The link for the Google dataset search engine is https://toolbox.google.com/datasetsearch.

5. Microsoft Datasets

Microsoft has launched the "Microsoft Research Open Data" repository, a collection of free
datasets in various areas such as natural language processing, computer vision, and
domain-specific sciences.
Using this resource, we can download the datasets to use on the current device, or we can
use them directly on the cloud infrastructure.
The link to download or use the datasets from this resource is https://msropendata.com/.

6. Awesome Public Dataset Collection

The Awesome Public Datasets collection provides high-quality datasets arranged in a
well-organized list by topic, such as Agriculture, Biology, Climate, and Complex Networks.
Most of the datasets are available free of charge, but some are not, so it is better to check
the license before downloading a dataset.
The link to download datasets from the Awesome Public Datasets collection is
https://github.com/awesomedata/awesome-public-datasets.

7. Government Datasets
There are different sources of government-related data. Various countries publish, for
public use, government data collected from their different departments.
The goal of providing these datasets is to increase the transparency of government work
and to encourage innovative uses of the data. Below are some government dataset sources:

• Indian Government dataset
• US Government Dataset
• Northern Ireland Public Sector Datasets
• European Union Open Data Portal

8. Computer Vision Datasets

VisualData provides a number of good datasets that are specific to computer vision tasks
such as image classification, video classification, and image segmentation. Therefore, if
you want to build a project on deep learning or image processing, you can refer to this source.
The link for downloading datasets from this source is https://www.visualdata.io/.

9. Scikit-learn dataset

Scikit-learn is a great source for machine learning enthusiasts. This source provides both toy
and real-world datasets. These datasets can be obtained from the sklearn.datasets package
using the general dataset API.
The toy datasets available in scikit-learn can be loaded using predefined functions such as
load_boston([return_X_y]) (removed in scikit-learn 1.2) and load_iris([return_X_y]), rather
than importing a file from an external source. These toy datasets are too small to be
representative of real-world projects.
The link to download datasets from this source is
https://scikit-learn.org/stable/datasets/index.html.
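
As a brief illustration of the predefined loader functions mentioned above, the sketch below loads two toy datasets with return_X_y=True, which returns the features and the target as two arrays. The choice of load_iris and load_wine is only an example.

```python
from sklearn.datasets import load_iris, load_wine

# return_X_y=True returns (features, target) instead of a Bunch object.
X_iris, y_iris = load_iris(return_X_y=True)
X_wine, y_wine = load_wine(return_X_y=True)

print("iris:", X_iris.shape, y_iris.shape)
print("wine:", X_wine.shape, y_wine.shape)
```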

Experiment 03

Aim:- Import Dataset, using Dataset loading utilities (sklearn).
scikit-learn (formerly scikits.learn and also known as sklearn) is a free software machine
learning library for the Python programming language. scikit-learn makes available a host of
datasets for testing learning algorithms.
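
The program for this experiment is not reproduced in the text, so the following is only a minimal sketch of what importing a dataset with the sklearn loading utilities can look like. It assumes the Iris dataset and a scikit-learn version that supports the as_frame option (0.23 or later).

```python
from sklearn.datasets import load_iris

# as_frame=True makes the loader return pandas objects instead of NumPy arrays.
iris = load_iris(as_frame=True)

print(iris.feature_names)       # names of the four measurement columns
print(list(iris.target_names))  # names of the three classes
df = iris.frame                 # features and target in a single DataFrame
print(df.head())
print(df.shape)
```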

Output

Experiment 04
Aim:- Implementation of Linear Regression Algorithm

What is linear regression?

Linear regression analysis is used to predict the value of a variable based on the value of
another variable. The variable you want to predict is called the dependent variable. The
variable you are using to predict the other variable's value is called the independent variable.

Out[4]:    area
        0     0
        1     1
        2     2
        3     3
        4     4

In[5]: price = df.price
       price

Out[5]: 0    2
        1    3
        2    5
        3    4
        4    6
        Name: price, dtype: int64

In[6]: # Create linear regression object
       reg = linear_model.LinearRegression()
       reg.fit(new_df, price)

Out[6]: LinearRegression()

In[7]: reg.predict([[10]])

C:\Users\Aftab\anaconda3\lib\site-packages\sklearn\base.py:450: UserWarning: X does not have valid feature
ression was fitted with feature names
  warnings.warn(

Out[7]: array([11.2])
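
The first cells of the session (the imports and the creation of df and new_df) are not part of the transcript above. The sketch below is one self-contained way to reproduce it, under the assumption that area takes the values 0 to 4 and that new_df is df without the price column; with these values the fitted line gives the same prediction, 11.2, as Out[7].

```python
import pandas as pd
from sklearn import linear_model

# Assumed reconstruction of the data used in the session.
df = pd.DataFrame({"area": [0, 1, 2, 3, 4], "price": [2, 3, 5, 4, 6]})
new_df = df.drop("price", axis="columns")  # independent variable
price = df.price                           # dependent variable

# Create the linear regression object and fit it.
reg = linear_model.LinearRegression()
reg.fit(new_df, price)

print("slope:", reg.coef_[0], "intercept:", reg.intercept_)

# Passing a DataFrame with the same column name avoids the feature-name warning.
prediction = reg.predict(pd.DataFrame({"area": [10]}))
print(prediction)
```

The prediction follows from the fitted line price = 0.9 * area + 2.2, which gives 11.2 for area = 10.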

Experiment 05
Aim:- Implementation of K-Means Clustering Algorithm

What is K-Means Clustering?

K-Means Clustering is an unsupervised learning algorithm which groups an unlabeled dataset
into different clusters. Here K defines the number of clusters to be created in the process:
if K=2, there will be two clusters, for K=3 there will be three clusters, and so on.

"It is an iterative algorithm that divides the unlabeled dataset into k different clusters in such
a way that each data point belongs to only one group that has similar properties."
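
The iterative idea described above can be sketched directly in NumPy: assign every point to its nearest centre, recompute each centre as the mean of its points, and repeat until the centres stop moving. This is an illustrative sketch, not the scikit-learn implementation; the function name kmeans, the sample points, and the simple evenly spaced choice of starting centres are assumptions (scikit-learn's KMeans uses k-means++ initialisation by default).

```python
import numpy as np


def kmeans(points, k, n_iter=100):
    """Plain k-means iteration; returns (labels, centers)."""
    # Assumed initialisation: k points spread evenly through the data order.
    start = np.linspace(0, len(points) - 1, k).astype(int)
    centers = points[start].astype(float)
    for _ in range(n_iter):
        # Distance from every point to every centre, shape (n_points, k).
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Each new centre is the mean of its points (kept as-is if it has none).
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # converged: the assignments can no longer change
        centers = new_centers
    return labels, centers


# Hypothetical data: two well-separated groups of three points each.
points = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                   [8.0, 8.0], [8.2, 7.9], [7.9, 8.1]])
labels, centers = kmeans(points, k=2)
print(labels)
print(centers)
```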

In[4]: km = KMeans(n_clusters=3)
       predicted = km.fit_predict(df[['rollno','marks']])
       predicted
Out[4]: array([2, 2, 2, 2, 1, 1, 1, 0, 0, 0, 0, 0, 2, 2, 0, 0, 1, 1, 1, 1])

In[5]: df['cluster'] = predicted
       df.head()
       df1 = df[df.cluster==0]
       df2 = df[df.cluster==1]
       df3 = df[df.cluster==2]
       plt.scatter(df1.rollno, df1['marks'], color='green')
       plt.scatter(df2.rollno, df2['marks'], color='red')
       plt.scatter(df3.rollno, df3['marks'], color='blue')
       plt.xlabel('rollno')
       plt.ylabel('marks')
Out[5]: Text(0, 0.5, 'marks')

[Figure: scatter plot of marks against rollno, with the three clusters shown in green, red, and blue]

In[6]: scale = MinMaxScaler()
       scale.fit(df[['marks']])
       df['marks'] = scale.transform(df[['marks']])

       scale.fit(df[['rollno']])
       df['rollno'] = scale.transform(df[['rollno']])

In[7]: km = KMeans(n_clusters=3)
       predicted = km.fit_predict(df[['rollno','marks']])
       predicted
Out[7]: array([2, 2, 2, 2, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1,  , 1])

In[8]: df = df.drop(['cluster'], axis='columns')

In[9]: df['cluster'] = predicted
       df.head()

       df1 = df[df.cluster==0]
       df2 = df[df.cluster==1]
       df3 = df[df.cluster==2]
       plt.scatter(df1.rollno, df1['marks'], color='green')
       plt.scatter(df2.rollno, df2['marks'], color='red')
       plt.scatter(df3.rollno, df3['marks'], color='blue')
       plt.xlabel('rollno')
       plt.ylabel('marks')
Out[9]: Text(0, 0.5, 'marks')

[Figure: scatter plot of scaled marks against scaled rollno, colored by cluster]

In[10]: km.cluster_centers_

Out[10]: array([[0.1372549 , 0.11585945],
                [0.72268908, 0.8974359 ],
                [0.86764106, 0.1965812 ]])

In[11]: plt.scatter(df1.rollno, df1['marks'], color='green')
        plt.scatter(df2.rollno, df2['marks'], color='red')
        plt.scatter(df3.rollno, df3['marks'], color='blue')
        plt.scatter(km.cluster_centers_[:,0], km.cluster_centers_[:,1], color='black', marker='*')
        plt.xlabel('rollno')
        plt.ylabel('marks')
Out[11]: Text(0, 0.5, 'marks')

[Figure: scatter plot of the scaled clusters with the cluster centers marked by black stars]


Experiment 06
Aim:- Study of Training and Testing Data Sets.

Training and Testing Dataset:-

The training data is the biggest (in size) subset of the original dataset and is used to train or
fit the machine learning model. The training data is fed to the ML algorithm first, which lets it
learn how to make predictions for the given task.

The test dataset is another subset of the original data, which is used to check the accuracy of
the model.

[12] y

0    5000
1    4500
2    7000
3    6000
4    3500
5    2000
Name: price, dtype: int64

[13] from sklearn.model_selection import train_test_split
     X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=10)

[14] X_train

     distance  year
0    500       5
3    200       2
4    400       7
1    600       3

[15] X_test

     distance  year
2
5    800       9

[16] y_train

0    5000
3    6000
4    3500
1    4500
Name: price, dtype: int64

[ ] y_test

2    7000
5    2000


• The pre-processed data cannot be given to algorithms directly; first we need to decide which
variables of our data are independent and which are dependent. An example dataset is used here.

• This data consists of 6 samples and 3 columns. The data is about scooter prices, which
depend on the distance travelled and the number of years the scooter has been in use; hence
price is the dependent variable, and distance and years are the independent variables.

• X holds the independent variables (distance and years) and y holds the dependent
variable (price).

• To generate the training and testing data, we need to import train_test_split from
sklearn.model_selection. We pass it the independent and dependent variables, along with
either the test size or the train size. random_state keeps the results (the training and testing
data) the same across different runs; if it is not specified, the split changes on every run.
Finally we obtain X_train, X_test, y_train and y_test.
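
The points above can be put together in one self-contained sketch. The cell that creates the data is not part of the text, so the DataFrame below is an assumption: the prices and most distance and year values follow the session output, while the distance and year of the row with index 2 are hypothetical placeholders because they are unreadable in the source.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

data = pd.DataFrame({
    "distance": [500, 600, 700, 200, 400, 800],  # index 2 (700) is a placeholder
    "year": [5, 3, 4, 2, 7, 9],                  # index 2 (4) is a placeholder
    "price": [5000, 4500, 7000, 6000, 3500, 2000],
})

X = data[["distance", "year"]]  # independent variables
y = data["price"]               # dependent variable

# 30% of 6 samples is rounded up to 2 test rows; the other 4 are used for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=10
)

print(X_train)
print(X_test)
print(y_train)
print(y_test)
```

Running the split again with the same random_state returns the same rows; leaving random_state out gives a different split on each run.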

Experiment 07
Aim:- Implementation of Logistic Regression.

What is Logistic Regression?

Logistic regression is one of the most popular machine learning algorithms and comes under
the supervised learning technique. It is used for predicting a categorical dependent variable
from a given set of independent variables.

Logistic regression is very similar to linear regression except in how the two are used. Linear
regression is used for solving regression problems, whereas logistic regression is used for
solving classification problems.

[6] from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(df[['age']], df.results, train_size=0.8, random_state=10)

[7] from sklearn.linear_model import LogisticRegression
    model = LogisticRegression()
    model.fit(X_train, y_train)

    LogisticRegression()

[8] y_predicted = model.predict(X_test)
    y_predicted

    array([1, 1, 0, 0, 0, 0])

[9] model.score(X_test, y_test)

    1.0
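
The cells that create df are not included in the session above. The following is a self-contained sketch of the same workflow on hypothetical data: the ages and the 0/1 results are invented for illustration and are not the values used in the original session.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical data: one feature (age) and a binary outcome.
ages = np.array([18, 22, 25, 27, 28, 30, 31, 33,
                 47, 50, 52, 55, 58, 60, 62, 65]).reshape(-1, 1)
results = np.array([0, 0, 0, 0, 0, 0, 0, 0,
                    1, 1, 1, 1, 1, 1, 1, 1])

X_train, X_test, y_train, y_test = train_test_split(
    ages, results, train_size=0.8, random_state=10
)

model = LogisticRegression()
model.fit(X_train, y_train)

print(model.predict(X_test))         # predicted class for each test row
print(model.score(X_test, y_test))   # fraction of correct predictions
print(model.predict_proba([[40]]))   # class probabilities for age 40
new_predictions = model.predict(np.array([[20], [60]]))
print(new_predictions)
```

Unlike linear regression, the model outputs a class label, and predict_proba exposes the underlying probability for each class.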

Experiment 08
Aim:- Visualization of Data using Matplotlib.
Data visualization is the representation of information and data in a pictorial or graphical
format (for example charts, graphs, and maps). Data visualization tools provide an accessible
way to see and understand trends, patterns, and outliers in data. Data visualization tools and
technologies are essential for analyzing massive amounts of information and making
data-driven decisions. The idea of using pictures to understand data has been used for
centuries. General types of data visualization are charts, tables, graphs, maps, and dashboards.

• Line plot

• Bar plot

• Pie chart
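
The plotting programs and their figures for this experiment are not reproduced in the text. The sketch below shows one way to draw the three chart types with Matplotlib; the month and sales values and the output file name charts.png are hypothetical.

```python
import matplotlib.pyplot as plt

# Hypothetical data used only to illustrate the three chart types.
months = ["Jan", "Feb", "Mar", "Apr"]
sales = [120, 150, 90, 180]

fig, axes = plt.subplots(1, 3, figsize=(12, 4))

# Line plot: shows how a value changes along an ordered axis.
axes[0].plot(months, sales, marker="o")
axes[0].set_title("Line plot")
axes[0].set_xlabel("month")
axes[0].set_ylabel("sales")

# Bar plot: compares values across categories.
axes[1].bar(months, sales, color="steelblue")
axes[1].set_title("Bar plot")
axes[1].set_xlabel("month")
axes[1].set_ylabel("sales")

# Pie chart: shows each category's share of the total.
axes[2].pie(sales, labels=months, autopct="%1.1f%%")
axes[2].set_title("Pie chart")

fig.tight_layout()
fig.savefig("charts.png")  # writes the figure to an image file
```

In Spyder or Jupyter Notebook the figure is usually displayed directly; in a plain script, plt.show() can be called instead of saving to a file.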
