Startup Analysis
By: Roshik Ganesan, Utsav Malpani, Vignesh Srinivas
Project Guide: Dr. Jongwook Woo
Contents
• Introduction
• Project Scope
• Data Set
• Work Flow
• Hardware Specifications
• Queries
• Data Visualizations
• Summary
• Recommendations
• Limitations
• References
Introduction
• Startup: a company designed to grow fast. Being newly founded does not in itself make a company a startup; it should also explore an unknown or innovative business model in order to disrupt existing markets.
• E.g., Amazon, Uber
• In the past few years the number of new startups has increased rapidly, but not all of them succeed.
• Through this analysis we aim to give future entrepreneurs a guide to the segments in which they should invest, the place where they should start, whom they should approach for funding and whom they can contact if they want their company to be acquired.
Project Scope
• Analyzed two decades of startup venture data.
• The data was analyzed using Azure HDInsight and HiveQL.
• Data in HDFS was accessed using CloudBerry.
• The data was visualized using Tableau.
Data Set
• Dataset link: https://www.dropbox.com/s/jr997tkty186apu/Crunchbase.xlsx?dl=0
• Startup data covering over 20 years, with market segment and funding details.
• The dataset comprises:
  • Funding details
  • Investment details
  • Acquisition details
• Format: CSV
• Size: 200 MB
Work Flow Diagram
Hardware Specification
• Number of head nodes = 2 | CPU = 8 cores
• Number of worker nodes = 2 | CPU = 8 cores
• RAM: 24 GB
• Disks: 16
• Local SSD: 400 GB
• Cluster: Hadoop on Azure HDInsight
CloudBerry For Azure
Queries Executed
Queries Used
• -- Create the companies table over pipe-delimited data, skipping the header row
DROP TABLE IF EXISTS companies;
CREATE EXTERNAL TABLE IF NOT EXISTS companies (
  name STRING, market STRING, funding_total_usd INT, status STRING,
  country_code STRING, state_code STRING, region STRING, city STRING,
  funding_rounds INT, founded_at DATE, founded_year INT,
  first_funding_at DATE, last_funding_at DATE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
TBLPROPERTIES ("skip.header.line.count"="1");
LOAD DATA INPATH 'Companies.csv' INTO TABLE companies;
• -- Rank acquired companies by acquisition price minus total funding raised
SELECT DISTINCT c.name, c.funding_total_usd, a.acquirer_name, a.price_amount,
  (a.price_amount - c.funding_total_usd) AS acqvalue, c.market
FROM companies c
JOIN acquirer a ON (c.name = a.company_name)
-- ORDER BY gives a total ordering; SORT BY only orders rows within each reducer
ORDER BY acqvalue DESC;
Most Lucrative Market Segments
Leading Markets:
• Biotechnology
• Software
The pie chart shows that Biotechnology is the most funded market, with 35% of the entire funding, followed by Software and Clean Technology with 21% and 19% respectively.
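The funding split described above could be reproduced with a query along the following lines. This is a minimal sketch against the companies table defined under Queries Used; it is not necessarily the query the authors ran, and it assumes funding_total_usd holds each company's total funding in USD.

```sql
-- Total funding per market, largest first (illustrative sketch).
SELECT market,
       SUM(funding_total_usd) AS total_funding_usd
FROM companies
WHERE funding_total_usd IS NOT NULL
GROUP BY market
ORDER BY total_funding_usd DESC
LIMIT 10;
```

Dividing each market's total by the sum over all markets gives the percentage shares plotted in a pie chart.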
Countries With The Maximum Startups
• USA
• United Kingdom
• Canada
• China
• India
This analysis shows that the USA leads with 23k registered startups, followed by the United Kingdom, Canada, China and India.
Popular Cities For Startups
• San Francisco
• New York
• London
• Austin
• Cambridge
Drilling Deeper
San Francisco tops the list of cities with 2k startups, followed by New York and London with 1.9k and 1k respectively.
Year With The Most Startups
• Most startups are in the Software market
• Steady increase in the number of startups
• 2012: the most startups in every segment
Facts to note:
• Although Bio-Technology receives the highest average funding, the number of startups in that market has always remained low.
• Software startups increased by 93.33% from 2000 to 2012.
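A year-by-year count for one market could be obtained roughly as follows. This is a sketch only: the 'Software' label is an assumption about how the market column is populated, and the query relies on the founded_year column from the companies table.

```sql
-- Startups founded per year in one market (illustrative sketch).
-- 'Software' is an assumed value of the market column.
SELECT founded_year,
       COUNT(*) AS startups
FROM companies
WHERE market = 'Software'
  AND founded_year BETWEEN 2000 AND 2012
GROUP BY founded_year
ORDER BY founded_year;
```

The percentage increase between two years is then (count in 2012 - count in 2000) / (count in 2000) x 100.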
Companies With Maximum Funding
Company         Market                Funding Received (USD)
Zebra Tech      Software              200 Million
Quad/Graphics   Business - Printing   190 Million
Solyndra        Manufacturing         156 Million
Uber            Transportation        150 Million
Markets with High Average Funding
• Natural Gas - 400M
• Oil and Gas - 171M
• Content Creators - 121M
• Custom Retail - 119M
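Average funding per market could be computed with a query like the one below. It is a hedged sketch rather than the authors' query; the companies_in_market column is added here for illustration, because a market with only a handful of companies can show a very high average.

```sql
-- Average funding per market, with the number of companies behind each average
-- (illustrative sketch; companies_in_market is a hypothetical helper column).
SELECT market,
       AVG(funding_total_usd) AS avg_funding_usd,
       COUNT(*)               AS companies_in_market
FROM companies
WHERE funding_total_usd IS NOT NULL
GROUP BY market
ORDER BY avg_funding_usd DESC
LIMIT 10;
```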
Companies With Maximum Investment
• Finance: Goldman Sachs
• Hardware: Intel
• Software: Intel, Google
• Healthcare: New Enterprise Associates
The analysis gives an idea of which investors startups should target when looking for funding.
Investment And Acquisition
E-Commerce > Video Games > Mobile
Interesting Note:
• Companies from the highest funded market domain and companies from the high average funding market domains are acquired the least.
• 7 of the 10 companies with the highest value are from the software or the entertainment market.
States with Maximum Acquisitions - US
• CA - 3491
• NY - 1163
• Texas - 476
14.5% of the acquisitions are from the state of California, followed by New York with 5%.
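The per-state counts could be derived by joining the two tables used in the earlier acquisition query. This sketch assumes the counts refer to the state of the acquired company, that country_code uses the value 'USA', and that acquirer.company_name matches companies.name; none of these is confirmed by the slides.

```sql
-- Acquired companies per US state (illustrative sketch).
-- Assumes 'USA' is the country_code value and that state_code
-- describes the acquired company, not the acquirer.
SELECT c.state_code,
       COUNT(DISTINCT c.name) AS acquired_companies
FROM companies c
JOIN acquirer a ON (c.name = a.company_name)
WHERE c.country_code = 'USA'
GROUP BY c.state_code
ORDER BY acquired_companies DESC;
```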
Summary
Best of all:
• Market Segment: Bio-Technology & Software
• Country: USA, UK
• City: San Francisco, New York
• Competitive Market: Software
• Highest Average Funding: Natural Gas, Oil and Gas, Content Creators
• Prospective Investors: Goldman Sachs, Intel, Google, New Enterprise Associates
• High Acquisitions: California, New York
Recommendations
• Companies seeking funding from foreign companies could approach Goldman Sachs, Intel or Google, depending on their market.
• Entrepreneurs looking for long-term company ownership could target the following market domains:
  • Bio-Technology
  • Natural Gas
  • Oil and Gas
  Perks: high funding and company ownership.
• Entrepreneurs looking only to profit from a company could venture into markets like software, entertainment, E-Commerce or Social Media, preferably in CA or NY.
  Perks: low investment and high chances of acquisition.
Limitations
• With the few parameters available in the data, we were able to provide a solution that caters to the basic needs of a startup.
• Had the data been more detailed, e.g. with information on the scale of each company (large or small), it would have been possible to analyze the funding that a particular segment would receive.
• The dataset also did not detail the investment each company made on its own, which would have helped in suggesting the amount required to build a startup.
GitHub
https://github.com/vigyr/Calstatela
References
• [1] https://www.techinasia.com/talk/27-striking-facts-startups-world-infographic
• [2] https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-tutorial-get-started-windows
Thank You
Shoot the Questions

Kick Start Startup Guide
