BIG DATA
Final
Presentation
By: Hemanth Aroumougam
Friday, April 4, 14
During the first generation....
Employees in
companies started
entering data into
computer systems
As the second generation
comes...
But now as generations
move on there is a
third one to this list
and it is...
Nowadays, even machines are
automatically entering data into
computer systems.
BIG DATA is the term for a collection of
data sets so large and complex that it
becomes difficult to process using on-hand
database management tools or traditional
data processing applications.
• Big data is a popular term used to describe
the exponential growth and availability of
data, both structured and unstructured.
Big data is a buzzword, or catch-phrase, used to
describe a massive volume of both structured
and unstructured data that is so large that it's
difficult to process using traditional database
and software techniques. In most enterprise
scenarios the data is too big or it moves too
fast or it exceeds current processing capacity.
BIG DATA has three
defining properties, or
dimensions, known as
the 3Vs.
The 3Vs are...
• Volume
• Variety
• Velocity
Volume:
Big volume covers both simple
SQL analytics and complex
non-SQL analytics. In other
words, volume refers to the
amount of data.
SQL
• SQL stands for Structured Query Language.
• SQL is a standardized query language for
requesting information from a database.
• The first commercial SQL-based database
system was introduced in 1979 by Relational
Software, Inc., which later became Oracle
Corporation.
• Historically, SQL has been the favorite query
language for database management systems
running on minicomputers and mainframes.
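To make "requesting information from a database" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The employees table, its columns, and its rows are invented for illustration and are not from the presentation.

```python
import sqlite3

# Hypothetical in-memory table; the table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Asha", "Sales", 52000), ("Ben", "Sales", 48000), ("Chen", "IT", 61000)],
)

# A SQL query states what information is wanted, not how to look it up.
rows = conn.execute(
    "SELECT department, COUNT(*), AVG(salary) FROM employees "
    "GROUP BY department ORDER BY department"
).fetchall()
print(rows)
conn.close()
```

The query returns one row per department with the head count and the average salary.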
Volume
Petabyte (PB)
Terabyte (TB)
Gigabyte (GB)
Megabyte (MB)
Kilobyte (KB)
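The ladder of units above can be sketched as a small conversion helper. This assumes decimal (SI) units, where each step is a factor of 1,000; binary units use 1,024 instead. The function name human_readable is hypothetical.

```python
# Decimal (SI) units: each unit is 1,000 times the previous one.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB"]

def human_readable(num_bytes: int) -> str:
    """Express a byte count in the largest unit that keeps the value below 1,000."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit that fits, or at PB, the largest unit listed.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:.1f} {unit}"
        value /= 1000

print(human_readable(1_500))
print(human_readable(2 * 10**15))
```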
Variety:
A large number of diverse data
sources must be integrated. In other
words, variety refers to the
number of different types of data.
VARIETY
• Structured Data
• Unstructured Data
• Semi-structured Data
Structured Data
• Structured data is data that resides in a fixed
field within a record or file. This includes data
contained in relational databases and spreadsheets.
Structured data has the advantage of being
easily entered, stored, queried and analyzed.
EXAMPLES OF STRUCTURED DATA
• Library catalogues (date, author, place, subject, etc.)
• Census records (birth, income, employment, place, etc.)
• Phone numbers (and the phone book)
• Economic data (GDP, PPI, ASX, etc.)
• XML-TEI (bringing structure to the text through tagging particular
elements, like versions of the word "canal" in 17th-century Dutch)
• Databases
• Data warehouses
• Enterprise systems (CRM, ERP, etc.)
Semi-structured Data
• Semi-structured data is a form of
structured data that does not conform to
the formal structure of data models
associated with relational databases or
other forms of data tables.
EXAMPLES OF SEMI-STRUCTURED DATA
• Web Pages
• Information Integration
• XML
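As a rough illustration of semi-structured data, the sketch below parses a few JSON records that label their own fields but do not share one fixed schema. JSON is used here as a stand-in for the XML mentioned above, and the records are invented for illustration.

```python
import json

# Hypothetical records: each one describes itself, but they share no fixed schema.
raw = """
[
  {"name": "Asha", "email": "asha@example.com"},
  {"name": "Ben", "phones": ["555-0100", "555-0101"]},
  {"name": "Chen", "email": "chen@example.com", "address": {"city": "Pune"}}
]
"""
records = json.loads(raw)

# The keys let a program locate fields, but every field has to be treated as optional.
emails = [r.get("email", "unknown") for r in records]
all_keys = sorted({key for r in records for key in r})
print(emails)
print(all_keys)
```

A relational table would need every row to have the same columns; here each record carries only the fields it has.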
Unstructured Data
• Unstructured data refers to information
that either does not have a pre-defined data
model or is not organized in a pre-defined
manner. Unstructured information is typically
text-heavy. In other words, unstructured data
sits at the other end of the spectrum from
structured data. It might be in any form: text,
audio, video. We cannot tell what the data
means just by looking at it unless we apply
human understanding to it.
EXAMPLES OF UNSTRUCTURED DATA
• Book
• Story
• Heavy text
• Audio
• Video
• RSS feeds
• Word documents
• Excel spreadsheets
• Email messages
Velocity:
Velocity refers to the speed
at which the data is processed.
TYPES OF VELOCITY
• REAL TIME ANALYSIS
• NEAR REAL TIME
• PERIODIC
• BATCH
Benefits of Batch Processing
• It can shift the time of job processing to when the computing
resources are less busy.
• It avoids idling the computing resources with minute-by-minute
manual intervention and supervision.
• By keeping the overall rate of utilization high, it amortizes the cost
of the computer, especially an expensive one.
• It allows the system to use different priorities for batch and
interactive work.
• Rather than running one program multiple times to process one
transaction each time, batch processes run the program only
once for many transactions, reducing system overhead.
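The last benefit, reduced system overhead, can be sketched with a toy cost model. The startup and per-transaction costs below are assumed numbers chosen only to show the arithmetic, not measurements.

```python
# Hypothetical cost model: every program start has a fixed overhead,
# and every transaction costs a fixed amount of work.
STARTUP_COST = 5   # assumed units of overhead per program run
WORK_COST = 1      # assumed units of work per transaction

def cost_one_at_a_time(n_transactions: int) -> int:
    # The program is started once for every single transaction.
    return n_transactions * (STARTUP_COST + WORK_COST)

def cost_batched(n_transactions: int, batch_size: int) -> int:
    # The program is started once per batch and handles many transactions per run.
    n_batches = -(-n_transactions // batch_size)  # ceiling division
    return n_batches * STARTUP_COST + n_transactions * WORK_COST

print(cost_one_at_a_time(1000))
print(cost_batched(1000, 250))
```

Under these assumptions, 1,000 transactions cost 6,000 units when run one at a time and 1,020 units when run in four batches of 250.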
ORACLE BIG DATA SOLUTION
• Oracle describes itself as the first vendor to offer a
complete and integrated solution to address the full
spectrum of enterprise big data requirements. Oracle's
big data strategy is centered on the idea that you can
extend your current enterprise information
architecture to incorporate big data. New big data
technologies, such as Hadoop and Oracle NoSQL
Database, run alongside your Oracle data
warehouse to deliver business value and address
your big data requirements.
Advantages and
Disadvantages of
BIG DATA
ADVANTAGES
• Data mining makes it easier to find correlations
• Analysis is more computation-driven, so accuracy is higher
• Data is now combined into one large mass, which allows links to be
found
• For example: a company with decades of information can make use of
Big Data and data analysis to create competitive advantages and
open new business opportunities
• Big Data emerged because companies were finding it hard to manage all
their data
• Creates new growth opportunities and lots of jobs
DISADVANTAGES
• Significant security and privacy risks
• Expensive: you need to spend a lot to get it working
• Requires a lot of analysis: uncovering patterns, applying algorithms, and
finding connections and relationships
• Analysts still need specialized skills, and the right skill set is hard
to find
BIG DATA Software
• Hadoop: Apache Software Foundation
• MongoDB: MongoDB, Inc.
• Apache Hadoop is an open-source software framework for the storage and
large-scale processing of data sets on clusters of commodity
hardware. It is licensed under the Apache License 2.0. The Apache
Hadoop framework is composed of the following modules:
• Hadoop Common – contains libraries and utilities needed by other
Hadoop modules.
• Hadoop Distributed File System (HDFS) – a distributed file system
that stores data on commodity machines, providing very high
aggregate bandwidth across the cluster.
• Hadoop YARN – a resource-management platform responsible for
managing compute resources in clusters and using them for scheduling
of users' applications.
• Hadoop MapReduce – a programming model for large-scale data
processing.
• Written in: Java
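The MapReduce programming model can be sketched in a few lines of plain Python. This is a single-process illustration of the map, shuffle, and reduce steps applied to a word count; it does not use Hadoop or its Java API, and the function names are hypothetical.

```python
from collections import defaultdict

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in one input record.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle: group all emitted values by their key.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Reduce: combine the values for one key into a single result.
    return key, sum(values)

lines = ["big data is big", "data moves fast"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)
```

In a real cluster the map and reduce calls run in parallel on many machines, with the framework handling the shuffle between them.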
• MongoDB is big data software whose name comes from the word
"humongous". MongoDB is a cross-platform document-oriented
database. A document-oriented database is a computer program
designed for storing, retrieving, and managing document-oriented
information, also known as semi-structured data. It is classified as
NoSQL. A NoSQL database provides a mechanism for storage and
retrieval of data that is modeled by means other than the tabular
relations used in relational databases.
• MarkLogic is an American software company that makes a NoSQL
database.
• Written in: C++
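To show what "document-oriented" means in practice, here is a minimal sketch that models a collection as a list of Python dictionaries. It is plain Python rather than the MongoDB API, and the documents and helper functions (find, with_tag) are invented for illustration.

```python
# Hypothetical in-memory "collection"; this is plain Python, not the MongoDB API.
collection = [
    {"_id": 1, "title": "Sensor log", "tags": ["car", "fault"], "reading": {"temp": 91}},
    {"_id": 2, "title": "Tweet", "tags": ["retail"], "text": "Love these shoes"},
    {"_id": 3, "title": "Invoice", "tags": ["finance"], "total": 120.5},
]

def find(docs, **criteria):
    # Return documents whose top-level fields equal every given criterion.
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]

def with_tag(docs, tag):
    # Return documents whose "tags" list contains the given tag.
    return [d for d in docs if tag in d.get("tags", [])]

print(find(collection, title="Tweet"))
print([d["_id"] for d in with_tag(collection, "fault")])
```

Each document keeps its own fields, including nested ones, so no table schema has to be declared in advance.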
• Enterprise NoSQL Database Technology
• Best Big Data Search
• Real-time Your Hadoop
Enterprise NoSQL Database Technology
• For more than a decade, MarkLogic has
delivered a powerful, agile, and trusted
enterprise-grade NoSQL (Not Only SQL)
database that enables organizations to turn all
data into valuable and actionable information.
Key features include ACID transactions,
horizontal scaling, real-time indexing, high
availability, disaster recovery,
government-grade security, and more.
Best Big Data Search
• Search all data for more value. Bring all relevant content back to users
– unstructured and structured, internal and public.
• Real-time updates. Real-time results. When documents are updated or
inserted, they are available for search immediately.
• Able to query all types of data. Structured, semi-structured, and
unstructured content are all supported within the same queries.
• Real-time alerts for fast response. MarkLogic has the highest
performance alerting engine available, capable of running millions of
custom queries on each and every change to the document repository
– no polling required.
• Search you can bank on. Businesses that count on revenue through
paid content search and retrieval trust MarkLogic to deliver.
MarkLogic's scale-out, real-time platform is more than a
search engine linked to a content repository – it is the most
complete platform for building search-oriented applications.
Real-time Your Hadoop
• Get more power out of Hadoop. Hadoop and MarkLogic together can
allow you to tackle problems that would be difficult or impossible to
address with either technology alone.
• Save money by leveraging common infrastructure. Using MarkLogic and
Hadoop Distributed File System (HDFS) enables common
batch-processing infrastructure to be used across many different
projects and applications.
• Enterprise-class support for Hadoop. MarkLogic's partnership with Intel
provides a strong, supported platform for building secure, enterprise-class
Big Data applications with Apache Hadoop.
• Seamlessly combine the power of MapReduce with MarkLogic's real-time,
interactive analysis and indexing on a single, unified platform.
What can you
accomplish with
BIG DATA?
Dialogue with Consumers
• Today's consumers are a tough nut to crack. They look around a lot
before they buy. You want customers to buy your products.
• Big Data allows you to profile these increasingly vocal and fickle
little 'tyrants' in a far-reaching manner so that you can engage in an
almost one-on-one, real-time conversation with them. This is not
actually a luxury. If you don't treat them the way they want to be
treated, they will leave you in the blink of an eye.
Re-develop your Products
• Big Data can also help you understand how others perceive your
products so that you can adapt them.
• Analysis of unstructured social media text allows you to uncover
the sentiments of your customers and even segment those in
different geographical locations or among different demographic
groups.
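A very small sketch of this idea: score each post against word lists and total the scores per region. The word lists, posts, and region labels are all invented for illustration; real sentiment analysis would use a much larger lexicon or a trained model.

```python
from collections import defaultdict

# Tiny illustrative word lists, not a real sentiment lexicon.
POSITIVE = {"love", "great", "good"}
NEGATIVE = {"hate", "bad", "broken"}

def sentiment(text):
    # Score = number of positive words minus number of negative words.
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

# Hypothetical posts, each tagged with the region it came from.
posts = [
    ("north", "love the new design"),
    ("north", "great product good price"),
    ("south", "arrived broken bad support"),
]

# Segment the sentiment by geographical location.
by_region = defaultdict(int)
for region, text in posts:
    by_region[region] += sentiment(text)
print(dict(by_region))
```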
Perform Risk Analysis
• Success depends not only on how you run your company. Social and
economic factors are crucial for your accomplishments as well.
Predictive analytics, fueled by Big Data, allows you to scan and
analyze newspaper reports or social media feeds so that you
constantly keep up to speed on the latest developments in your
industry and its environment.
• Detailed health tests on your suppliers and customers are another
goodie that comes with Big Data. This will allow you to take action
when one of them is at risk of defaulting.
Keeping your data safe
• You can map the entire data landscape across your company with
Big Data tools, thus allowing you to analyze the threats that you
face internally.
• You will be able to detect potentially sensitive information that is
not protected in an appropriate manner and make sure it is stored
according to regulatory requirements.
Where is
BIG DATA
used, and how?
Big Data is used in
many fields like....
Car Makers
• Fault logging and cost predictions: Car makers
place hundreds of sensors on components around the car which
constantly log data on performance and faults. All of this data can be
used to reengineer designs for more efficient products and to predict
the likely strain of warranty repairs on cost and manpower.
TOYOTA
WHERE: From factories and from sensors
Data Center (Headquarters)
NEEDS: Safety and quality analysis
BENEFITS: Feedback from design
Finance
• B2B supplier profiling: Finance professionals can use big
data to check on the 'health' of their suppliers and business
partners. They can monitor a variety of indicators, including when
creditors pay their bills and whether there is any change.
• Fraud detection: Companies like Visa are using big data to
create fraud detection models which can flag potential
fraudsters.
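As a loose sketch of what "flagging" could look like, the rule below marks a transaction as suspicious when it is far above a card's typical spend. This is an invented rule of thumb, not the model any company actually uses; the function name, the factor of 3, and the amounts are assumptions.

```python
from statistics import median

def flag_suspicious(history, new_amount, factor=3.0):
    # Flag a transaction that is far larger than the card's typical (median) spend.
    return new_amount > factor * median(history)

# Hypothetical past transaction amounts for one card.
history = [20.0, 35.0, 18.0, 42.0, 25.0]
print(flag_suspicious(history, 30.0))
print(flag_suspicious(history, 400.0))
```

Real fraud models combine many more signals, such as location, merchant type, and timing, and are trained on large volumes of historical transactions.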
VISA
WHERE: Wherever customers buy
Data Center (Headquarters)
NEEDS: Detect fraud, customer behavior
BENEFITS: Personal recommendations
General Manufacturing
• Simulations: Manufacturers can take real data from their
products on the market and then run simulations based on what
would happen if they changed one particular component or design
aspect. They can then find ways to make the product cheaper, more
reliable or more environmentally friendly. The Formula 1 racing
teams are particularly adept in this area, as are advanced aerospace
companies.
• Expanded product design modeling: Similarly, with
new big-data-enabled computer-aided design programs, product
designers can substitute components or materials from huge
databases and then access in-depth information on how this affects
the final product, including the ramifications for cost, production
processes, environmental effects, legislative requirements, supply
chain and so on.
GM
WHERE: Several branches
Data Center (GM Headquarters in Gurgaon)
NEEDS: Safety and quality analysis
BENEFITS: Awareness and indication of what to fix
Policing
• Suspect tracking: By combining CCTV images, facial
recognition software, travel trends and identifiers on travel cards,
police forces can capture criminals by automatically linking people
to their likely destinations on buses and metro systems. This allows
police to catch those that they miss at the scene of the crime and
also to control arrest statistics, meeting targets for arrests in one
London borough, for instance, as needed.
CBI
WHERE: Several branches
Data Center (CBI Headquarters in Delhi)
NEEDS: To identify a person's behavior and actions
BENEFITS: Gives awareness of what that person is likely to
do next and what their next plan is
Utilities (oil & gas)
• Asset monitoring: As with the machines in manufacturing
plants, the utilities companies use big data to keep track of all of
their assets spread across a country, continent or the globe. This
enables them to fix any broken asset (such as a sewage cleansing
plant, a leaking pipe or a gas pump), perform pre-emptive running
maintenance or isolate areas in which repair actions have been
ineffective.
CHEVRON
WHERE: From the machines in the manufacturing plants
Data Center (Chevron Headquarters)
NEEDS: To keep track of what is going on in the
manufacturing plant, such as broken pipes and leaks
BENEFITS: Gives them feedback on designs so
they know how to improve the
construction of the manufacturing plant,
which is their main source of
oil and gas
Retail and Marketing
• Mood mapping: Retailers use feeds from social networks to
build an understanding of how their products and company
reputation are seen among the public. With the constant streams of
opinions from Facebook, Twitter, Google+ and the like, companies
are able to cheaply and quickly gather large samples of customer
opinion.
Air Jordan
WHERE: From social media networking sites
Data Center (Air Jordan Headquarters)
NEEDS: Customer behavior; helps to find out opinions,
feelings, and feedback about their brand
BENEFITS: Gives them feedback on what
customers think about their
product, which they can use
to improve it
THANK
YOU !!!


Big Data Final Presentation

  • 1. BIG DATA Final Presentation By: Hemanth Aroumougam Friday, April 4, 14
  • 2. During the first generation.... Friday, April 4, 14
  • 3. Employees in companies started entering data into computer systems Friday, April 4, 14
  • 4. As the second generation comes... Friday, April 4, 14
  • 6. But now as generations move on there is a third one to this list and it is... Friday, April 4, 14
  • 7. Now a days even machines are automatically entering data into computer systems. Friday, April 4, 14
  • 10. BIG DATA is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. Friday, April 4, 14
  • 11. • Big data is a popular term used to describe the exponential growth and availability of data, both structured and unstructured. Friday, April 4, 14
  • 12. Big data is a buzzword, or catch-phrase, used to describe a massive volume of both structured and unstructured data that is so large that it's difficult to process using traditional database and software techniques. In most enterprise scenarios the data is too big or it moves too fast or it exceeds current processing capacity. Friday, April 4, 14
  • 13. In BIG DATA there are 3Vs which are the defining properties and the dimensions of Big Data Friday, April 4, 14
  • 14. The 3Vs are... Friday, April 4, 14
  • 16. Volume- BigVolume consists of simple SQL analytics and with complex non-SQL analytics. In other words volume refers to the amount of data. Friday, April 4, 14
  • 17. SQL • SQL Stands for Structured Query Language. • SQL is a standardized query language for requesting information from a database. • SQL was first introduced as a commercial database system in 1979 by the Oracle Corporation. • Historically, SQL has been the favorite query language for database management systems running on minicomputers and mainframes. Friday, April 4, 14
  • 18. Volume Petabyte (PB) Terabyte (TB) Gigabyte (GB) Megabyte (MB) Kilobyte (KB) Friday, April 4, 14
  • 19. Variety- Large number of diverse data sources to integrate. In other words variety is basically referring to the number of different types of data. Friday, April 4, 14
  • 21. Structured Data • Structured Data is data that resides in a fixed field within a record or file is called structured data.This includes data contained in relational databases and spreadsheets. Structured data has the advantage of being easily entered, stored, queried and analyzed. Friday, April 4, 14
  • 22. • Library Catalogues (date, author, place, subject, etc) • Census records (birth, income, employment, place etc.) • Phone numbers (and the phone book) • Economic data (GDP, PPI, ASX etc.) • XML-TEI (bringing structure to the text through tagging particular elements like versions of the word ”canal’ in 17th C Dutch. • Databases • Data warehouse • Enterprise systems (CRM, ERP, etc) EXAMPLES OF STRUCTURED DATA Friday, April 4, 14
  • 23. Semi structured Data • Semi-structured data is a form of structured data that does not conform with the formal structure of data models associated with relational databases or other forms of data tables Friday, April 4, 14
  • 24. • Web Pages • Information Integration • XML EXAMPLES OF SEMI STRUCTURED DATA Friday, April 4, 14
  • 25. Unstructured Data • Unstructured Data refers to information that either does not have a pre-defined data model or is not organized in a predefined manner. Unstructured information is typically text-heavy. In other words unstructured data is something that is at the other end of the spectrum. It might be in any form: text, audio, video.We definitely don’t know from looking at the data what it means ,unless we apply human understanding to it. Friday, April 4, 14
  • 26. EXAMPLES OF UNSTRUCTURED DATA • Book • Story • Heavy text • audio • video • RSS Feeds • Word documents • Excel Spreadsheets • Email messages Friday, April 4, 14
  • 27. Velocity- Velocity is basically referring to the speed in which the data is processed. Friday, April 4, 14
  • 28. TYPES OFVELOCITY REAL TIME ANALYSIS NEAR REAL TIME PERIODIC BATCH Friday, April 4, 14
  • 29. Benefits of Batch Processing. It can shift the time of job processing to when the computing resources are less busy. • It avoids idling the computing resources with minute-by-minute manual intervention and supervision. • By keeping high overall rate of utilization, it amortizes the computer, especially an expensive one. • It allows the system to use different priorities for batch and interactive work. • Rather than running one program multiple times to process one transaction each time, batch processes will run the program only once for many transactions, reducing system overhead. Friday, April 4, 14
  • 33. ORACLE BIG DATA SOLUTION • Oracle is the first vendor to offer a complete and integrated solution to address the full spectrum of enterprise big data requirements. Oracle’s big data strategy is centered on the idea that you can extend your current enterprise information architecture to incorporate big data. New big data technologies, such as Hadoop and Oracle NoSQL database, run alongside your Oracle data warehouse to deliver business value and address your big data requirements. Friday, April 4, 14
  • 35. Advantages and Disadvantages of BIG DATA Friday, April 4, 14
  • 36. ADVANTAGES • Data mining allows uses are that you can find correlations easier • More calculated now therefore accuracy is higher • Data is now combined into a big mass which allows for links to be found • For example: company with decades of information can make use of Big Data and data analysis to create competitive advantages and open new business opportunities • Started because companies have been finding it hard to manage all their data  • Creates new growth opportunities, lots of jobs Friday, April 4, 14
  • 37. DISADVANTAGES • Big risks on security and privacy • Challenges arise: expensive, need to spend a lot to get it working • A lot of analyzing: uncover patterns, apply algorithms, connections relationships • Still need specialization regarding the analysts; hard to find the right skill set Friday, April 4, 14
  • 39. •Hadoop- Apache Foundation •MongoDB- Mongo, Inc Friday, April 4, 14
  • 40. • Apache Hadoop is an open source data framework for storage and large scale processing for data sets on clusters of commodity hardwares. It is licensed under the Apache License 2.0.  The Apache Hadoop framework is composed of the following modules: • Hadoop Common – contains libraries and utilities needed by other Hadoop modules. • Hadoop Distributed File System (HDFS) – a distributed file-system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster. • HadoopYARN – a resource-management platform responsible for managing compute resources in clusters and using them for scheduling of users' applications. • Hadoop MapReduce – a programming model for large scale data processing. • This is written in- Java Friday, April 4, 14
  • 41. • MongoDB is a big data software which came from the word “humongous”. MongoDB is a cross-platform document-oriented database.A document-oriented database is a computer program designed for storing, retrieving, and managing document-oriented information, also known as semi-structured data.This is classified as NoSQL.  A NoSQL database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. • MarkLogic is an American Business company that makes NoSQL database. • Language written in- C++ Friday, April 4, 14
  • 43. •Enterprise NoSQL Database Technology •Best Big Data Search •Real-timeYour Hadoop Friday, April 4, 14
  • 44. Enterprise NoSQL Database Technology • For more than a decade, MarkLogic has delivered a powerful, agile, and trusted enterprise-grade NoSQL (Not Only SQL) database that enables organizations to turn all data into valuable and actionable information. Key features include ACID transactions, horizontal scaling, real-time indexing, high availability, disaster recovery, government- grade security, and more. Friday, April 4, 14
  • 45. Best Big Data Research • Search all data for more value. Bring all relevant content back to users – unstructured and structured, internal and public. • Real-time updates. Real-time results.When documents are updated or inserted, they are available for search immediately. • Able to query all types of data. Structured, semi-structured, and unstructured content are all supported within the same queries. • Real-time alerts for fast response. MarkLogic has the highest performance alerting engine available, capable of running millions of custom queries on each and every change to the document repository – no polling required. • Search you can bank on. Businesses that count on revenue through paid content search and retrieval trust MarkLogic to deliver. MarkLogic’s scale-out, real-time platform is more than a search engine linked to a content repository – it is the most complete platform for building search-oriented applications. Friday, April 4, 14
  • 46. Real-Time Your Hadoop • Get more power out of Hadoop. Hadoop and MarkLogic together can allow you to tackle problems that would be difficult or impossible to address by either technology alone. • Save money by leveraging common infrastructure. Using MarkLogic and Hadoop Distributed File System (HDFS) enables common batch-processing infrastructure to be used across many different projects and applications. • Enterprise-class support for Hadoop. Our partnership with Intel provides a strong, supported platform for building secure, enterprise-class Big Data Applications with Apache Hadoop. • Seamlessly combine the power of MapReduce with MarkLogic’s real-time, interactive analysis and indexing on a single, unified platform.
  • 48. What can you accomplish with BIG DATA?
  • 49. Dialogue with Consumers • Today’s consumers are a tough nut to crack. They look around a lot before they buy. You want to persuade customers to buy your products. • Big Data allows you to profile these increasingly vocal and fickle little ‘tyrants’ in a far-reaching manner so that you can engage in an almost one-on-one, real-time conversation with them. This is not actually a luxury. If you don’t treat them the way they want to be treated, they will leave you in the blink of an eye.
  • 50. Re-develop your Products • Big Data can also help you understand how others perceive your products so that you can adapt them. • Analysis of unstructured social media text allows you to uncover the sentiments of your customers and even segment them by geographical location or demographic group.
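As a rough sketch of segmenting sentiment by location, the snippet below scores posts with a tiny word list and averages the scores per region. The word lists, the sample posts, and the function names are all assumptions made for illustration; real sentiment analysis uses far richer models.

```python
from collections import defaultdict

# Hypothetical, deliberately tiny sentiment lexicons.
POSITIVE = {"love", "great", "good", "excellent"}
NEGATIVE = {"hate", "bad", "poor", "broken"}


def score(text):
    """Positive-word count minus negative-word count for one post."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def average_sentiment_by(posts, key):
    """Group posts by a field (e.g. 'region') and average their scores."""
    scores = defaultdict(list)
    for post in posts:
        scores[post[key]].append(score(post["text"]))
    return {group: sum(s) / len(s) for group, s in scores.items()}


posts = [
    {"region": "North", "text": "I love this phone, great battery!"},
    {"region": "North", "text": "Screen arrived broken."},
    {"region": "South", "text": "Poor service and bad packaging."},
]

by_region = average_sentiment_by(posts, "region")
```

Passing a different key, such as an age group field, would segment the same posts by demographic instead of geography.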
  • 51. Perform Risk Analysis • Success depends not only on how you run your company. Social and economic factors are crucial for your accomplishments as well. Predictive analytics, fueled by Big Data, allows you to scan and analyze newspaper reports or social media feeds so that you always keep up to speed on the latest developments in your industry and its environment. • Detailed health checks on your suppliers and customers are another benefit that comes with Big Data. They allow you to take action when one of them is at risk of defaulting.
  • 52. Keeping your data safe • You can map the entire data landscape across your company with Big Data tools, thus allowing you to analyze the threats that you face internally. • You will be able to detect potentially sensitive information that is not protected in an appropriate manner and make sure it is stored according to regulatory requirements.
  • 54. Where and how is BIG DATA used?
  • 55. Big Data is used in many fields, such as...
  • 56. Car Makers • Fault logging and cost predictions: Car makers place hundreds of sensors on components around the car which constantly log data on performance and faults. All of this data can be used to reengineer designs for more efficient products and to predict the likely strain of warranty repairs on cost and manpower.
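A back-of-the-envelope version of the warranty cost prediction might look like the sketch below: count logged faults per component in a monitored sample, then scale the fault rate to the number of cars sold. The function name, the sample log, and the repair prices are hypothetical, and the linear scaling is a simplifying assumption.

```python
from collections import Counter


def projected_warranty_cost(fault_log, cars_monitored, cars_sold, repair_cost):
    """Scale the fault rate seen in monitored cars to the whole fleet sold."""
    fault_counts = Counter(entry["component"] for entry in fault_log)
    projection = {}
    for component, count in fault_counts.items():
        faults_per_car = count / cars_monitored
        projection[component] = faults_per_car * cars_sold * repair_cost[component]
    return projection


# Hypothetical sensor fault log from 4 monitored cars.
fault_log = [
    {"car": "A", "component": "brake_sensor"},
    {"car": "B", "component": "brake_sensor"},
    {"car": "C", "component": "fuel_pump"},
]
repair_cost = {"brake_sensor": 200, "fuel_pump": 80}  # assumed cost per repair

projection = projected_warranty_cost(
    fault_log, cars_monitored=4, cars_sold=1000, repair_cost=repair_cost
)
```

Components with the largest projected cost are the natural candidates for redesign.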
  • 58. TOYOTA • WHERE: From factories and from sensors • Data Center (Headquarters) • NEEDS: Safety and quality analysis • BENEFITS: Feedback from design
  • 59. Finance • B2B supplier profiling: Finance professionals can use big data to check on the ‘health’ of their suppliers and business partners. They can monitor a variety of indicators including when creditors pay their bills and whether there is any change • Fraud detection: Companies like Visa are using big data to create fraud detection models which can flag up potential fraudsters.
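One very simple way to flag a suspicious payment is to compare it with the cardholder's own spending history. The sketch below uses a z-score threshold; this is an assumed stand-in technique for illustration, not the model any real card company uses, and the function name and threshold are hypothetical.

```python
from statistics import mean, stdev


def is_suspicious(history, amount, threshold=3.0):
    """Flag an amount that lies far from the customer's usual spending.

    `history` needs at least two past amounts so a spread can be computed.
    """
    average = mean(history)
    spread = stdev(history)
    if spread == 0:
        # All past amounts were identical; any different amount stands out.
        return amount != average
    z_score = abs(amount - average) / spread
    return z_score > threshold


past_purchases = [20, 25, 22, 18, 30]  # hypothetical purchase amounts

flag_large = is_suspicious(past_purchases, 500)
flag_normal = is_suspicious(past_purchases, 24)
```

A production model would combine many more signals, such as location, merchant, and time of day, but the principle of scoring each transaction against learned normal behavior is the same.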
  • 60. VISA • WHERE: Wherever customers buy • Data Center (Headquarters) • NEEDS: Detect fraud, customers’ behavior • BENEFITS: Personal recommendations
  • 61. General Manufacturing • Simulations: Manufacturers can take real data from their products on the market and then run simulations based on what would happen if they changed one particular component or design aspect. They can then find ways to make the product cheaper, more reliable or more environmentally friendly. The Formula 1 racing teams are particularly adept in this area, as are advanced aerospace companies. • Expanded product design modeling: Similarly, with new big-data-enabled computer-aided design programs, product designers can substitute components or materials from huge databases and then access in-depth information on how this affects the final product, including the ramifications on cost, production processes, environmental effects, legislative requirements, supply chain and so on.
  • 63. GM • WHERE: Several branches • Data Center (GM Headquarters in Gurgaon) • NEEDS: Safety and quality analysis • BENEFITS: Awareness and indication of what to fix
  • 64. Policing • Suspect tracking: By combining CCTV images, facial recognition software, travel trends and identifiers on travel cards, police forces can capture criminals by automatically linking people to their likely destinations on buses and metro systems. This allows police to catch those that they miss at the scene of the crime and also to control arrest statistics, meeting targets for arrests in one London borough, for instance, as needed.
  • 66. CBI • WHERE: Several branches • Data Center (CBI Headquarters in Delhi) • NEEDS: To identify a person’s behavior and actions • BENEFITS: Gives awareness of what that person is going to do next. What is their next plan?
  • 67. Utilities (oil & gas) • Asset monitoring: As with the machines in manufacturing plants, utility companies use big data to keep track of all of their assets spread across a country, continent or the globe. This enables them to fix any broken asset (such as a sewage cleansing plant, a leaking pipe or a gas pump), perform pre-emptive running maintenance or isolate areas in which repair actions have been ineffective.
  • 69. CHEVRON • WHERE: From the machines in the manufacturing plants • Data Center (Chevron Headquarters) • NEEDS: To keep track of what is going on in the manufacturing plant, such as broken pipes, leakage, etc. • BENEFITS: This gives them feedback from designs so they know how to improve the construction of the manufacturing plant, because that is their main source of oil and gas.
  • 70. Retail and Marketing • Mood mapping: Retailers use feeds from social networks to build an understanding of how their products and company reputation are seen among the public. With the constant streams of opinions from Facebook, Twitter, Google+ and the like, companies are able to cheaply and quickly gather large samples of customer opinion.
  • 73. Air Jordan • WHERE: From social media networking sites • Data Center (Air Jordan Headquarters) • NEEDS: Customers’ behavior; helps to find out opinions, feelings, and feedback on their brand. • BENEFITS: This gives them feedback on what customers think about their product, which they can use to improve it.