Lahore School of Economics Data Analysis and Statistical Methods Winter 2020

This document contains an assignment from Lahore School of Economics for a course on Data Analysis and Statistical Methods. It includes questions about sources of big data, categories of business analytics, sources of big data analytics, and exploring two datasets related to Pakistan elections and yellow pages businesses. The assignment provides context, variable information, and suggestions for further analysis of the election and business datasets.

Uploaded by

Zohraiz Malik

Lahore School of Economics

Data Analysis and Statistical Methods


Winter 2020

Name: Zohraiz Mubarik Section:__D_______________

Date: 02/09/2020 Score:___________________

Assignment 2
Q1: Explain different sources of Big Data.
• Transactional data: The main purpose of a Transaction Processing System is to capture
information and update the data used for operational decisions in an organization. There
are two ways to process transactions: batch processing, which processes the data as a
single unit over a period of time, and real-time processing, where data are processed
immediately.
• Social media data: People at almost every location in the world share information
through social media, which helps customers make purchasing decisions by glancing at the
feedback, customer complaints and miscellaneous services provided with a product.
Consumer sentiment is also expressed on social media, which helps companies make
production decisions.
• Internet applications: There are numerous online e-commerce websites (such as Amazon,
Flipkart, Alibaba, eBay, Paytm, bookmyshow.com etc.), search engines (Google, Yahoo,
Bing, etc.) and online banking applications that millions of users log in to and use
daily. During their searches or transactions, various click streams and logs are
generated, which could be of value.
• Data from electronic instruments: There are numerous electronic devices such as
smartphones, RFID tags, GPS sensors, machines connected to networks, scanners and
cameras, which generate high volumes of data. These are other sources of big data.
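
The difference between batch and real-time transaction processing can be illustrated with a minimal Python sketch. The transaction amounts and function names below are hypothetical and serve only as an illustration, not as a description of any actual Transaction Processing System.

```python
# Hypothetical transaction amounts, invented for illustration only.
transactions = [120.0, 75.5, 210.0, 40.0]


def process_batch(items):
    """Batch processing: collect the transactions, then update once."""
    return sum(items)  # a single update for the whole batch


def process_real_time(items):
    """Real-time processing: update the balance as each transaction arrives."""
    balance = 0.0
    history = []
    for amount in items:
        balance += amount        # update immediately
        history.append(balance)  # the balance is current after every transaction
    return history


batch_total = process_batch(transactions)
running_balances = process_real_time(transactions)
print(batch_total)
print(running_balances)
```

Both modes end at the same total; the difference is that the real-time version has an up-to-date balance after every single transaction, while the batch version only has one after the whole batch is processed.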

Q2: Explain the categories of Business Analytics.

Business analytics can be classified into three categories based on the purpose of use:
descriptive, predictive and prescriptive.

• Descriptive analytics explains a phenomenon from past data through reports and
dashboards, which helps in understanding what has happened.
• Predictive analytics helps us understand what can happen. It supports predictions based
on past data, correlations between variables and patterns.
• Prescriptive analytics helps us understand different outcomes under different scenarios.
It consists of various tools such as optimization, simulation and what-if analysis, in
which the set of input parameters is changed.
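
The three categories can be contrasted on one small example. The sketch below is only an illustration: the monthly sales figures, the unit cost, the price scenarios and the assumed demand response are all hypothetical, and a simple least-squares trend line stands in for whatever predictive model would be used in practice.

```python
from statistics import mean

# Hypothetical monthly unit sales, invented for illustration only.
months = [1, 2, 3, 4]
sales = [100, 110, 120, 130]

# Descriptive: summarize what has happened.
average_sales = mean(sales)

# Predictive: fit a least-squares trend line and extrapolate to month 5.
x_bar, y_bar = mean(months), mean(sales)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(months, sales)) / sum(
    (x - x_bar) ** 2 for x in months
)
intercept = y_bar - slope * x_bar
forecast = intercept + slope * 5

# Prescriptive: what-if analysis over hypothetical price scenarios.
# Assumed demand response: every unit of price above 10 loses 8 units of demand.
unit_cost = 6


def profit(price):
    demand = forecast - 8 * (price - 10)
    return (price - unit_cost) * demand


scenarios = [10, 14, 18, 22]
best_price = max(scenarios, key=profit)

print(average_sales, forecast, best_price)
```

The descriptive step only reports the past, the predictive step extends it one month ahead, and the prescriptive step compares outcomes across scenarios to recommend an action.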

Q3: Explain sources of Big Data analytics.

• Text Analytics consists of document representation, enterprise search systems, search
engines, relevance feedback and query processing. For example, billions of customer
searches for a particular product on Google, or searches on Amazon’s website, provide an
indicator of a customer’s intention to purchase the product.
• Audio and Video Analytics: Audio analytics takes seconds to process audio through
technology, mainly for safety purposes in an organization, and can track a wide range of
sounds in the environment. Video analytics is used to process and analyze videos from a
variety of fields and industries. This helps in extracting events that are useful for
operational decisions.
• Web Analytics: Online retailer Amazon uses data mining techniques to mine big data
such as click streams, web searches, order history and other online activity to derive
intelligence. This intelligence is used to make decisions about product promotions, and
it is working successfully for companies such as Amazon.
• Network Analytics provides information about devices that are connected to a network
and how they interact with each other. This information helps in designing network
policies and making actionable decisions that improve business performance and reduce
costs.

Q4: Explore the “Predict Pakistan Elections 2018” dataset retrieved from
(https://www.kaggle.com/zusmani/predict-pakistan-elections-2018/kernels). Explain the
context, datatype, time regime, variable information, metadata etc. Discuss a few questions
that are already answered (Hint: Kernel activity “Voter Behavior and Voting Reasons”
and “2002/2008/2013 Elections Visualizations”) and what can be explored further from it.

We predict a historic voter turnout of 57-61% in this election. Historically, turnout has
averaged 45% since 1977 (lowest 35% in 1997, highest 55% in 1977, and 53% in the last
elections). Pakistan ranked 164th out of 169 nations in voter turnout; Australia ranked
first with 94.5% turnout.

Voter participation in the country is very diverse: historically, Musakhel and Kohlu yield
less than 25%, whereas Layyah and Khanewal yield more than 60%, and everything else is in
between. Punjab has the highest and Balochistan the lowest voter turnout.

The contest will bring 3,675 candidates for 272 National Assembly seats, that is, 13
candidates on average per seat. PTI has fielded 244 candidates (the highest number for any
political party). Islamabad will see 76 candidates fighting for just 3 seats to rule the
capital, which guarantees a psychological edge.
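
The per-seat averages quoted above can be checked with a few lines of Python. The figures are taken from the text; the variable names are chosen here for illustration.

```python
# Figures quoted in the dataset description above.
total_candidates = 3675
total_seats = 272
islamabad_candidates = 76
islamabad_seats = 3

national_average = total_candidates / total_seats
islamabad_average = islamabad_candidates / islamabad_seats

print(round(national_average, 1))   # 13.5
print(round(islamabad_average, 1))  # 25.3
```

The national figure is closer to 13.5 than to 13, and Islamabad has roughly 25 candidates per seat, almost twice the national average.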

There are quite a few interesting facts about these elections; for example, we will see the
highest number of Lotas (candidates who often change their party affiliation) ever. PTI
believes it will win the election no matter what, while the survey pundits predict a
PML(N) lead of at least 13% over PTI.

The history of elections goes hand in hand with charges of corruption, voter fraud, ghost
votes, interference by the deep state and violence. There is (almost) no country in the
world without the fear or accusation of such incidents in its elections.

We are releasing the complete National Assembly election results dataset for the 2002, 2008
and 2013 elections as CSV files for the public, and calling on all data scientists,
international observers and journalists out there to help us achieve our aspirations.

Time Regime: The data collected are in a panel format and hold information from the period
2013 to 2018. The dataset covers election results for the National Assembly of Pakistan for
2002, 2008 and 2013.

Variables: The file contains the variables Seat, Constituency, Candidates Name, Party
Affiliation, Votes, Total Valid Votes, Total Rejected Votes, Total Votes, Total Registered
Voters and Turnout for each seat.
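
As a rough sketch of how these variables could be used, the Python snippet below finds the winner of each seat and recomputes turnout. The rows are hypothetical and invented for illustration; only the column names follow the variable list above, and the headers in the actual CSV files may differ. Turnout is assumed here to be Total Votes divided by Total Registered Voters.

```python
import csv
import io

# Hypothetical rows, invented for illustration only.
sample = """Seat,Candidates Name,Party Affiliation,Votes,Total Votes,Total Registered Voters
NA-X,Candidate A,Party 1,60000,102000,200000
NA-X,Candidate B,Party 2,40000,102000,200000
NA-Y,Candidate C,Party 2,30000,51000,150000
NA-Y,Candidate D,Party 1,20000,51000,150000
"""

rows = list(csv.DictReader(io.StringIO(sample)))

winners = {}  # seat -> (candidate name, votes)
turnout = {}  # seat -> turnout in percent

for row in rows:
    seat = row["Seat"]
    votes = int(row["Votes"])
    # Keep the candidate with the most votes seen so far for this seat.
    if seat not in winners or votes > winners[seat][1]:
        winners[seat] = (row["Candidates Name"], votes)
    # Assumed definition: turnout = total votes / registered voters.
    turnout[seat] = round(
        100 * int(row["Total Votes"]) / int(row["Total Registered Voters"]), 1
    )

print(winners)
print(turnout)
```

Repeating the same calculation on the 2002, 2008 and 2013 files would allow turnout and winning parties to be compared across elections.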

Metadata: The data cover different aspects of Pakistan’s election schedule. Contributors
to this data are from Canada, the United States, Pakistan and India.

Q5: Explore the dataset “Yellow Pages of Pakistan” retrieved from
(https://www.kaggle.com/mpasha96/yellow-pages-of-pakistan). Explain the
context, datatype, time regime, variable information, metadata etc. What analysis would
you suggest to obtain good insights from data?

The dataset enables people to explore local businesses in Pakistan. It might help the local
community gather information about local businesses. It also contributes to the local
economic development of Pakistan by bridging traders and manufacturers.

Geography: Pakistan

Time period: 1990-2017

Dataset: The dataset contains information on approximately 67,000 businesses in Pakistan
(~5,000 in each CSV file).

Features: The dataset has a total of 7 columns:

• Business Name

• Contact Name

• Telephone

• Website

• Services (Description of types of products/services provided by the business)

• Address

• City

Datatype: Cross-sectional data
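
One simple analysis that this structure supports is counting businesses per city and counting the most common service terms. The sketch below is only an illustration: the records are hypothetical, only the field names follow the column list above, and splitting the Services text on commas is an assumption about how that column is formatted.

```python
from collections import Counter

# Hypothetical records, invented for illustration only.
businesses = [
    {"Business Name": "Shop A", "City": "Lahore", "Services": "textile, garments"},
    {"Business Name": "Shop B", "City": "Karachi", "Services": "textile, export"},
    {"Business Name": "Shop C", "City": "Lahore", "Services": "printing"},
]

# Number of listed businesses in each city.
by_city = Counter(b["City"] for b in businesses)

# Frequency of each service term (assumes comma-separated descriptions).
service_terms = Counter(
    term.strip().lower()
    for b in businesses
    for term in b["Services"].split(",")
)

print(by_city.most_common())
print(service_terms.most_common(1))
```

On the full dataset, the same counts would show which cities have the most listed businesses and which kinds of services dominate in each city.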
