BY: SHIVAM JAIN & CHAITANYA BANKANHAL
SYCM,
EKLAVYA SIKSHAN SANSTHA’S POLYTECHNIC,
PUNE
WHAT IS BIG DATA?
• There is no single standard definition…
• Data sets whose size is beyond the ability of commonly used software tools to capture, curate,
manage & process within a tolerable elapsed time.
• In 2012, Gartner updated its definition: "Big data is high volume, high velocity, and/or high
variety information assets that require new forms of processing to enable enhanced decision
making, insight discovery and process optimization."
THE 3V’S CLASSIFICATION OF BIG DATA
• Volume
The quantity of data generated
• Velocity
The rate at which data is generated and has to be processed
• Variety
The different types of data that have to be stored.
VOLUME
• Every day…
  • More than 1.5 billion shares are traded on the NYSE.
  • Facebook stores more than 2.6 billion likes & comments.
• Every minute…
  • McDonald's serves 2,000 customers.
  • A new user is registered on Gmail.
• Every second…
  • Banks process more than 10,000 transactions.
VELOCITY
• Data is being generated fast and needs to be processed fast.
• Late decisions → missed opportunities
Examples
• E-promotions: based on your location, your purchase history and what you like → send promotions
right now for the store next to you.
• Healthcare monitoring: sensors monitor your activities and body → any abnormal
measurement requires an immediate reaction.
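The healthcare example can be sketched in a few lines of Python. This is only an illustration of processing readings as they arrive instead of storing them for later analysis. The function name `abnormal_readings`, the heart-rate limits and the sample readings are all assumptions made for the example, not values from any real monitoring system.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical limits for heart rate in beats per minute.
# Illustrative only, not medical guidance.
LOW_BPM = 40
HIGH_BPM = 140


def abnormal_readings(stream: Iterable[Tuple[int, int]]) -> Iterator[Tuple[int, int]]:
    """Yield (timestamp, bpm) pairs outside the limits as soon as they arrive."""
    for timestamp, bpm in stream:
        if bpm < LOW_BPM or bpm > HIGH_BPM:
            # In a real system this is where an alert would be raised.
            yield timestamp, bpm


# Made-up sample readings: (seconds since start, heart rate).
readings = [(0, 72), (1, 75), (2, 180), (3, 74), (4, 35)]
alerts = list(abnormal_readings(readings))
print(alerts)  # [(2, 180), (4, 35)]
```

Because the function is a generator, each reading is checked the moment it is read from the stream, which is the point of velocity: the reaction does not wait for the whole data set.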
VARIETY
• Various formats, types and structures.
• Text, numerical, images, audio, videos, sequences, time series, social media data,
multidimensional arrays, etc.
• A single application can generate or collect many types of data.
ADVANTAGES
• Ability to make better decisions and take meaningful actions at the right time.
• Cost reduction.
• Technologies such as MapReduce, Hive and Impala make it possible to run queries
without changing the underlying data structures.
LIMITATIONS
• Big risks to security and privacy.
• Difficult to learn; requires expert training to use in an organization.
• Establishing relationships in the data and applying algorithms to it is very difficult.
LATEST TECHNOLOGIES AND DEVELOPMENT
• Hadoop
• MapReduce
• MongoDB
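To give a feel for MapReduce, here is a minimal single-machine sketch of the classic word-count example in plain Python. It does not use Hadoop or any real MapReduce API. The function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are made up for this illustration, and a real cluster would run the map and reduce steps in parallel on many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one line of input."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data is big", "data is everywhere"])
print(counts)  # {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```

The map and reduce functions each look at only one line or one key at a time, which is what lets a framework such as Hadoop spread the same work across many nodes.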