Big Data Assignment Revised

The document discusses analyzing an Indian election data set from 2014 and 2019 containing information on states, cities, candidates, political parties, votes. It describes using Apache Hive to perform SQL-like queries to draw patterns and conclusions from the large data, such as which states had maximum voting and comparing results to the 2014 election. A blueprint for the analysis includes comparing vote share by party, margin of winning, plots, and a correlation matrix.

Uploaded by

Harshit Sukhija


Indian General Election

What is a data set

A data set is a collection of related, discrete items of data that may be
accessed individually, in combination, or managed as a whole entity.
A data set is organized into some type of data structure. In a database, for
example, a data set might contain a collection of business data (names, salaries,
contact information, sales figures, and so forth). The database itself can be
considered a data set, as can bodies of data within it related to a particular type
of information, such as sales data for a particular corporate department.
The term data set originated with IBM, where its meaning was similar to that
of file. In an IBM mainframe operating system, a data set is a named collection
of data that contains individual data units organized (formatted) in a specific,
IBM-prescribed way and accessed by a specific access method based on the
data set organization. Types of data set organization include sequential, relative
sequential, indexed sequential, and partitioned. Access methods include the
Virtual Storage Access Method (VSAM) and the Indexed Sequential Access
Method (ISAM).

Types of data set

1 - Big data
2 - Structured, unstructured, semi-structured data
3 - Time-stamped data
4 - Machine data
5 - Spatiotemporal data
6 - Open data
7 - Dark data
8 - Real time data
9 - Genomics data
10 - Operational data
11 - High-dimensional data
12 - Unverified outdated data
13 - Translytic Data

Description of the data set

The data set that we were provided to work on consists of data from the Indian
general elections of 2014 and 2019.

Parameters provided are:-

State, City, Candidate name, Political Party name, EVM votes, total votes and
percentage of votes from that particular city.

STATES
Uttar Pradesh 16%
Maharashtra 11%
Tamil Nadu 11%
Bihar 8%

Party
Independent 37%
BSP 6%
INC 5%
Other 57%

State = 34 unique values

Rank = 1 to 43
PC (city) = 508 unique values.
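
The parameters listed above can be pictured as one record per candidate per city. The sketch below is only an illustration: the field names, the sample rows, and the assumption that the percentage is the candidate's total votes divided by all votes cast in that city are ours, not taken from the data set itself.

```python
from dataclasses import dataclass


@dataclass
class CandidateResult:
    """One row of the election data (field names are hypothetical)."""
    state: str
    city: str          # the "PC (city)" column
    candidate: str
    party: str
    evm_votes: int
    total_votes: int


def vote_percentage(row: CandidateResult, city_total: int) -> float:
    """Candidate's share of all votes cast in the city, in percent.

    Assumes the percentage column is total_votes / city_total * 100.
    """
    if city_total <= 0:
        raise ValueError("city_total must be positive")
    return 100.0 * row.total_votes / city_total


# Made-up sample rows, for illustration only.
rows = [
    CandidateResult("State A", "City 1", "Candidate X", "Party P", 590, 600),
    CandidateResult("State A", "City 1", "Candidate Y", "Party Q", 395, 400),
]

city_total = sum(r.total_votes for r in rows)
shares = [vote_percentage(r, city_total) for r in rows]
print(shares)
```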

Why analyse this data set:-

A large amount of data is generated daily during Indian elections. It is
important to analyse this data because it helps in gaining insights and
supports a better decision-making process. We will be able to define the
problems in the data set and find the most appropriate solutions to them.

Business understanding

The problem in this data set is to analyze and find the pattern or behavior of
different people from different states when they vote for a political party.
For this we can use various tools such as Hive, Sqoop, Pig, an RDBMS, etc.

The basic objective is to draw patterns from the data and to reach conclusions
such as which state recorded the maximum voting.

Analytical approach

Hive provides a SQL-like declarative language, called HiveQL, which is used for
expressing queries. Using HiveQL, users familiar with SQL are able to perform
data analysis very easily.

Apache Hive is the tool which we are going to use for the analysis. Apache
Hive helps with querying and managing large data sets. It is often used as an
ETL tool in the Hadoop ecosystem. It is a data warehouse framework for querying
and analysis of data that is stored in HDFS. Hive is open-source software that
lets programmers analyze large data sets on Hadoop.

The size of data sets being collected and analyzed in the industry for business
intelligence is growing, which is making traditional data warehousing
solutions more expensive.
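
As a rough illustration of the kind of SQL-like query this approach relies on, the sketch below finds the state with the most votes. It runs the query against an in-memory SQLite database from Python as a stand-in for Hive; the table name `results`, the column names, and the sample rows are all made up, and the statement may need adjusting before it is run as HiveQL.

```python
import sqlite3

# In-memory SQLite database used here only as a stand-in for a Hive table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results ("
    "state TEXT, pc TEXT, candidate TEXT, party TEXT, total_votes INTEGER)"
)

# Made-up sample rows, for illustration only.
sample_rows = [
    ("State A", "City 1", "Candidate X", "Party P", 600),
    ("State A", "City 1", "Candidate Y", "Party Q", 400),
    ("State B", "City 2", "Candidate Z", "Party P", 700),
]
conn.executemany("INSERT INTO results VALUES (?, ?, ?, ?, ?)", sample_rows)

# Aggregate votes per state and keep the state with the largest total.
query = """
SELECT state, SUM(total_votes) AS votes
FROM results
GROUP BY state
ORDER BY votes DESC
LIMIT 1
"""
top_state = conn.execute(query).fetchone()
print(top_state)
conn.close()
```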

Blueprint is as below

• Comparison of vote share by party.
• Comparison of margin of winning for different parties.
• Scatter plot and density plot.
• Comparison of results with 2014 election results.
• Plot of correlation matrix.
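
The first two items of the blueprint can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions: the rows are made up, vote share is taken as a party's votes over all votes in the sample, and the margin of winning is taken as the winner's votes minus the runner-up's votes in each city.

```python
from collections import defaultdict

# Made-up sample rows: (city, party, votes). For illustration only.
rows = [
    ("City 1", "Party P", 600),
    ("City 1", "Party Q", 400),
    ("City 2", "Party P", 300),
    ("City 2", "Party Q", 700),
]


def vote_share_by_party(rows):
    """Return each party's percentage of all votes in the sample."""
    totals = defaultdict(int)
    for _, party, votes in rows:
        totals[party] += votes
    grand_total = sum(totals.values())
    return {party: 100.0 * v / grand_total for party, v in totals.items()}


def winning_margins(rows):
    """Return {city: (winning party, margin over the runner-up)}."""
    by_city = defaultdict(list)
    for city, party, votes in rows:
        by_city[city].append((votes, party))
    margins = {}
    for city, results in by_city.items():
        results.sort(reverse=True)  # highest vote count first
        winner_votes, winner = results[0]
        runner_up_votes = results[1][0] if len(results) > 1 else 0
        margins[city] = (winner, winner_votes - runner_up_votes)
    return margins


shares = vote_share_by_party(rows)
margins = winning_margins(rows)
print(shares)
print(margins)
```

Running the same two functions on the 2014 and 2019 rows separately would give the figures needed for the year-on-year comparison.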

Submitted by:-

Harshit Sukhija:- 18MBA7118

Prerna Sharma:- 18MBA7106

Palak Sharma:- 18MBA7108

Lakshay Sharma:- 18MBA7081
