Introduction to Data Streams
Concepts
What is a data stream?
• Golab & Özsu (2003): “A data stream is a real-time, continuous, ordered
(implicitly by arrival time or explicitly by timestamp) sequence of items.
It is impossible to control the order in which items arrive, nor is it
feasible to locally store a stream in its entirety.”
• Massive volumes of data, items arrive at a high rate.
Data Streams
• A data stream is a (potentially unbounded) sequence of tuples. Each
tuple consists of a set of attributes, similar to a row in a database table.
• Transactional data streams: log interactions between entities
  • Credit card: purchases by consumers from merchants
  • Telecommunications: phone calls by callers to dialed parties
  • Web: accesses by clients of resources at servers
• Measurement data streams: monitor evolution of entity states
  • Sensor networks: physical phenomena, road traffic
  • IP network: traffic at router interfaces
  • Earth climate: temperature, moisture at weather stations
Examples of Stream Sources
Before proceeding, let us consider some of the ways in which stream data arises naturally.
Sensor Data: Imagine a temperature sensor bobbing about in the ocean, sending back to a base
station a reading of the surface temperature each hour. The data produced by this sensor is a stream
of real numbers, and at one reading per hour the rate is trivial. But suppose we deploy many such
sensors, say a million of them, each reporting ten times per second. At 4 bytes per reading, we then
have about 3.5 terabytes arriving every day, and we definitely need to think about
what can be kept in working storage and what can only be archived.
Image Data: Satellites often send down to Earth streams consisting of many terabytes of images per
day. Surveillance cameras produce images with lower resolution than satellites, but there can be many
of them, each producing a stream of images at intervals like one second.
Internet and Web Traffic: A switching node in the middle of the Internet receives streams of IP
packets from many inputs and routes them to its outputs. Web sites receive streams of various types.
For example, Google receives several hundred million search queries per day. Yahoo! accepts billions
of “clicks” per day on its various sites.
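The 3.5-terabyte figure in the sensor example can be checked with a short calculation. The deployment parameters used here (one million sensors, ten readings per second, 4 bytes per reading) are assumptions chosen for illustration, and the function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume_bytes(num_sensors, readings_per_second, bytes_per_reading):
    """Bytes arriving per day from a fleet of identical sensors."""
    return num_sensors * readings_per_second * SECONDS_PER_DAY * bytes_per_reading


# One sensor reporting once per hour: a negligible stream.
single = daily_volume_bytes(1, 1 / 3600, 4)

# Assumed fleet: a million sensors, ten 4-byte readings per second each.
fleet = daily_volume_bytes(1_000_000, 10, 4)

print(f"single sensor: {single:.0f} bytes/day")
print(f"sensor fleet:  {fleet / 1e12:.2f} TB/day")
```

Under these assumptions the single sensor produces under a hundred bytes per day, while the fleet produces roughly 3.5 terabytes per day, which is where the question of working storage versus archival storage becomes unavoidable.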
Characteristics of Data Streams
• Characteristics
  • Huge volumes of continuous data, possibly infinite
  • Fast-changing data that requires a fast, real-time response
  • The data stream model captures today’s data processing needs well
  • Random access is expensive, so algorithms must work in a single scan (each item can be
    looked at only once)
  • Store only a summary of the data seen so far
  • Most stream data are low-level or multi-dimensional in nature and need multi-level,
    multi-dimensional processing
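As a minimal sketch of what "single scan" and "store only a summary" can mean in practice, the following keeps a running mean and variance using Welford's online update. Each item is seen once and then discarded, and the summary is three numbers regardless of stream length. The class name and the sample readings are made up for illustration.

```python
class RunningStats:
    """Constant-size summary of a numeric stream (Welford's online algorithm)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        """Fold one item into the summary; the item itself is not stored."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Population variance of the items seen so far."""
        return self._m2 / self.count if self.count else 0.0


def readings():
    """Stand-in for an unbounded stream; here just a few sample values."""
    yield from (2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0)


stats = RunningStats()
for value in readings():
    stats.update(value)  # one look at each item

print(stats.count, stats.mean, stats.variance)
```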
Applications of data stream processing
• Data stream processing
  • Process queries (compute statistics, activate alarms)
  • Apply data mining algorithms
• Requirements
  • Real-time processing
  • One-pass processing
  • Bounded storage (no complete storage of streams)
  • Possibly consider several streams
• Let’s go deeper into some examples
  • Network management
  • Stock monitoring
Network management
Network management (cont.)
Stock monitoring
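As a purely hypothetical illustration of a monitoring query that activates alarms under the requirements listed earlier (real-time, one pass, bounded storage), the sketch below watches a stream of stock ticks and raises an alarm when a price deviates from its exponential moving average by more than a threshold. The function name, the smoothing factor `alpha`, the 5% threshold, and the tick data are all assumptions, not taken from the slides.

```python
def price_alarms(ticks, alpha=0.2, threshold=0.05):
    """Yield (symbol, price, baseline) whenever a price moves more than
    `threshold` (as a fraction) away from that symbol's moving average.

    State is one float per symbol, so storage is bounded by the number
    of symbols, not by the length of the stream.
    """
    baseline_by_symbol = {}
    for symbol, price in ticks:
        baseline = baseline_by_symbol.get(symbol)
        if baseline is None:
            # First tick for this symbol: nothing to compare against yet.
            baseline_by_symbol[symbol] = price
            continue
        if abs(price - baseline) / baseline > threshold:
            yield symbol, price, baseline
        # Update the exponential moving average after the check.
        baseline_by_symbol[symbol] = alpha * price + (1 - alpha) * baseline


ticks = [("ACME", 100.0), ("ACME", 101.0), ("ACME", 110.0), ("ACME", 102.0)]
alarms = list(price_alarms(ticks))
for symbol, price, baseline in alarms:
    print(f"ALARM {symbol}: price {price} vs baseline {baseline:.2f}")
```

Because `price_alarms` is a generator, alarms are emitted as ticks arrive rather than after the stream ends, which is the behaviour a continuous monitoring query needs.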
A data-stream-management system (DSMS)
• Streams may be archived in a large archival store, but we assume it is not possible to
answer queries from the archival store.
• The archival store can be examined only under special circumstances, using
time-consuming retrieval processes.
• There is also a working store, into which summaries or parts of streams may be placed,
and which can be used for answering queries.
• The working store might be disk, or it might be main memory, depending on how fast we
need to process queries.
• Either way, its capacity is limited enough that it cannot store all the data from all
the streams.
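The division of labour between the two stores can be sketched as follows. This is a toy model under stated assumptions, not the design of any real DSMS: the class name `TinyDSMS` is hypothetical, the archival store is represented by an arbitrary write-only callback, and the working store holds a bounded buffer of recent items plus a two-number summary.

```python
from collections import deque


class TinyDSMS:
    """Toy model: every item is archived, but queries touch only the working store."""

    def __init__(self, working_capacity, archive):
        self._archive = archive                        # write-only sink (assumed slow, huge)
        self._recent = deque(maxlen=working_capacity)  # part of the stream
        self._count = 0                                # summary of the whole stream
        self._total = 0.0

    def ingest(self, item):
        self._archive(item)        # archived, never read back by queries
        self._recent.append(item)  # oldest item is dropped when the buffer is full
        self._count += 1
        self._total += item

    def recent_max(self):
        """Answered from the stored part of the stream."""
        return max(self._recent) if self._recent else None

    def overall_mean(self):
        """Answered from the summary alone."""
        return self._total / self._count if self._count else None


archived = []  # stand-in for the archival store
dsms = TinyDSMS(working_capacity=3, archive=archived.append)
for reading in [5, 1, 9, 2, 3]:
    dsms.ingest(reading)

print(dsms.recent_max(), dsms.overall_mean(), len(archived))
```

Note that `recent_max` reflects only the last three readings, while `overall_mean` covers the whole stream because the summary is updated on every arrival.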
Generic DSMS Architecture
[Figure: generic DSMS architecture, from Golab & Özsu 2003. Components shown: Streaming Inputs, Input Monitor, Working Storage, Summary Storage, Static Storage, Query Repository, Query Processor, Output Buffer, Streaming Outputs. External inputs: User Queries, Updates to Static Data.]
Architecture: Stream Query Processing
SDMS (Stream Data
Management System)
Data Stream Management Systems
DBMS versus DSMS (Data Stream Management System)

DBMS | DSMS
Persistent relations | Transient streams
One-time queries | Continuous queries
Random access | Sequential access
“Unbounded” disk store | Bounded main memory
Only current state matters | Historical data is important
No real-time services | Real-time requirements
Relatively low update rate | Possibly multi-GB arrival rate
Data at any granularity | Data at fine granularity
Assume precise data | Data stale/imprecise
Access plan determined by query processor, physical DB design | Unpredictable/variable data arrival and characteristics
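The contrast between one-time and continuous queries can be made concrete with a small sketch. The function names and the data are hypothetical; the point is only that the first runs once over stored data and returns a single answer, while the second is evaluated as items arrive and keeps updating its answer.

```python
def one_time_count(table, threshold):
    """DBMS style: scan a stored relation once and return one answer."""
    return sum(1 for value in table if value > threshold)


def continuous_count(stream, threshold):
    """DSMS style: sequential access only; emit an updated answer per arrival."""
    count = 0
    for value in stream:
        if value > threshold:
            count += 1
        yield count


table = [3, 8, 1, 9]
answer = one_time_count(table, 5)

# The same values, now arriving one at a time.
answers = list(continuous_count(iter(table), 5))

print(answer)
print(answers)
```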
Existing DSMS
Challenges of Stream Data Processing
• Multiple, continuous, rapid, time-varying, ordered streams
• Main memory computations
• Queries are often continuous
  • Evaluated continuously as stream data arrives
  • Answer updated over time
• Queries are often complex
  • Beyond element-at-a-time processing
  • Beyond stream-at-a-time processing
  • Beyond relational queries (scientific, data mining, OLAP)
• Multi-level/multi-dimensional processing and data mining
  • Most stream data are low-level or multi-dimensional in nature
How to deal with Big Data Streams?
Approximate answers to queries
• When?
  • Queries needing unbounded memory
  • Too many queries, streams that arrive too fast, or response-time requirements that are
    too strict
  • CPU limit
  • Memory limit
• Solution: approximate answers to queries
  • Sliding windows
  • Sampling and load shedding
  • Synopses (compact summaries of the data seen so far)
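A sliding window answers a query over only the most recent part of the stream. The sketch below maintains a time-based window sum. It is a minimal illustration that assumes timestamps arrive in non-decreasing order; the class name and the window width are hypothetical.

```python
from collections import deque


class TimeWindowSum:
    """Sum of the values whose timestamps fall within the last `width` time units."""

    def __init__(self, width):
        self.width = width
        self._items = deque()  # (timestamp, value) pairs still inside the window
        self._sum = 0

    def add(self, timestamp, value):
        """Insert a new item, then expire everything that has left the window."""
        self._items.append((timestamp, value))
        self._sum += value
        while self._items and self._items[0][0] <= timestamp - self.width:
            _, expired = self._items.popleft()
            self._sum -= expired

    def total(self):
        return self._sum


window = TimeWindowSum(width=10)
for ts, value in [(0, 1), (4, 2), (9, 3), (12, 4)]:
    window.add(ts, value)
total_at_12 = window.total()  # the item at t=0 has expired

window.add(20, 5)
total_at_20 = window.total()  # the items at t=4 and t=9 have expired

print(total_at_12, total_at_20)
```

Memory here is proportional to the number of items inside the window, not to the length of the stream. If even that is too much at high arrival rates, the window contents themselves would have to be sampled or summarized.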
Streaming Computing Approaches
• Two approaches for handling such streams
  • Use a time window, and query the window as a static table
  • When you can’t store the collected data, or need to keep track of historical data:
    • Sampling
    • Filtering
    • Counting
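For the sampling approach, one standard technique is reservoir sampling (Algorithm R), which keeps a uniform random sample of fixed size `k` from a stream of unknown length in a single pass. The slides do not name a specific sampling method, so this is offered as one possible illustration; the function name and parameters are hypothetical.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return k items chosen uniformly at random from `stream` in one pass.

    Only the k-item reservoir is kept in memory. If the stream has fewer
    than k items, all of them are returned.
    """
    if rng is None:
        rng = random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)  # fill the reservoir first
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)   # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


sample = reservoir_sample(range(1000), k=10, rng=random.Random(42))
short = reservoir_sample(range(3), k=10)
print(sample)
print(short)
```

Passing a seeded `random.Random` makes a run repeatable, which is convenient when testing code that consumes the sample.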

Lecture6 introduction to data streams

  • 1. Introduction to Data Streams Concepts
  • 2. What is a data stream? • Golab & Özsu (2003): “A data stream is a real-time, continuous, ordered (implicitly by arrival time or explicitly by timestamp) sequence of items. It is impossible to control the order in which items arrive, nor is it feasible to locally store a stream in its entirety.” • Massive volumes of data; items arrive at a high rate.
  • 3. Data Streams • A data stream is a (potentially unbounded) sequence of tuples. Each tuple consists of a set of attributes, similar to a row in a database table. • Transactional data streams: log interactions between entities • Credit card: purchases by consumers from merchants • Telecommunications: phone calls by callers to dialed parties • Web: accesses by clients of resources at servers • Measurement data streams: monitor evolution of entity states • Sensor networks: physical phenomena, road traffic • IP network: traffic at router interfaces • Earth climate: temperature, moisture at weather stations
  • 4. Examples of Stream Sources • Before proceeding, let us consider some of the ways in which stream data arises naturally.
      • Sensor Data: Imagine a temperature sensor bobbing about in the ocean, sending back to a base station a reading of the surface temperature each hour. The data produced by this sensor is a stream of real numbers, and at one reading per hour its data rate is very low. Now suppose we deploy a million such sensors, each sending back ten readings per second. At 4 bytes per reading, we have about 3.5 terabytes arriving every day, and we definitely need to think about what can be kept in working storage and what can only be archived.
      • Image Data: Satellites often send down to earth streams consisting of many terabytes of images per day. Surveillance cameras produce images with lower resolution than satellites, but there can be many of them, each producing a stream of images at intervals like one second.
      • Internet and Web Traffic: A switching node in the middle of the Internet receives streams of IP packets from many inputs and routes them to its outputs. Web sites receive streams of various types. For example, Google receives several hundred million search queries per day. Yahoo! accepts billions of “clicks” per day on its various sites.
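
The 3.5-terabyte figure for sensor data can be checked with back-of-the-envelope arithmetic. The sketch below assumes one million sensors, each sending ten 4-byte readings per second. These three numbers are illustrative assumptions, chosen because they reproduce the quoted daily volume.

```python
# Back-of-the-envelope check of the daily sensor data volume.
# All three parameters below are assumptions made for illustration.
SENSORS = 1_000_000          # assumed number of deployed sensors
READINGS_PER_SECOND = 10     # assumed reporting rate per sensor
BYTES_PER_READING = 4        # assumed size of one real-valued reading
SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume_bytes(sensors, rate_hz, bytes_per_reading):
    """Return the number of bytes arriving per day across all sensors."""
    return sensors * rate_hz * bytes_per_reading * SECONDS_PER_DAY


total = daily_volume_bytes(SENSORS, READINGS_PER_SECOND, BYTES_PER_READING)
print(f"{total / 1e12:.2f} TB per day")
```

Under these assumptions the total comes to roughly 3.5 terabytes per day, which is far more than a working store is expected to hold.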
  • 5. Characteristics of Data Streams • Huge volumes of continuous data, possibly infinite • Fast changing, requiring fast, real-time response • The data stream model captures today's data processing needs well • Random access is expensive, so single-scan algorithms are needed (each item can be looked at only once) • Store only a summary of the data seen thus far • Most stream data are low-level or multi-dimensional in nature and need multi-level, multi-dimensional processing
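
The "single scan" and "store only the summary" characteristics can be illustrated with a running mean and variance. This is a minimal sketch using Welford's online update, which is one possible summary chosen here for illustration; the names `RunningStats` and `summarize` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Constant-size summary of all values seen so far."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        # Welford's update: each value is looked at exactly once.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Population variance of the values seen so far.
        return self.m2 / self.count if self.count else 0.0


def summarize(stream):
    """Consume the stream in one pass and return only the summary."""
    stats = RunningStats()
    for x in stream:
        stats.update(x)
    return stats
```

The summary stays at three numbers regardless of stream length, so the stream itself never needs to be stored.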
  • 6. Applications of data stream processing • Data stream processing • Process queries (compute statistics, activate alarms) • Apply data mining algorithms • Requirements • Real-time processing • One-pass processing • Bounded storage (no complete storage of streams) • Possibly consider several streams • Let’s go deeper into some examples • Network management • Stock monitoring
  • 10. A data-stream-management system (DSMS) • Streams may be archived in a large archival store, but we assume it is not possible to answer queries from the archival store. • It could be examined only under special circumstances using time-consuming retrieval processes. • There is also a working store, into which summaries or parts of streams may be placed, and which can be used for answering queries. • The working store might be disk, or it might be main memory, depending on how fast we need to process queries. • But either way, it is of sufficiently limited capacity that it cannot store all the data from all the streams.
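
The split between archival store and working store can be sketched as follows. This is a toy model under stated assumptions: the class `StreamStores` and its methods are hypothetical, the archive is a plain list standing in for slow bulk storage, and the working store is a fixed-capacity buffer of the most recent items.

```python
from collections import deque


class StreamStores:
    """Toy model of a DSMS with an archival store and a working store."""

    def __init__(self, working_capacity):
        # Stand-in for the archival store: written to, never queried here.
        self.archive = []
        # Working store of limited capacity: the oldest items are evicted.
        self.working = deque(maxlen=working_capacity)
        # A tiny summary kept alongside the recent items.
        self.total_seen = 0

    def ingest(self, item):
        self.archive.append(item)
        self.working.append(item)
        self.total_seen += 1

    def query_recent_max(self):
        # Queries are answered from the working store only.
        return max(self.working) if self.working else None
```

A usage example: after ingesting 5, 9, 1, 2, 3 with a capacity of 3, the working store holds only 1, 2, 3, so the query sees a maximum of 3 even though 9 passed through the stream.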
  • 11. Generic DSMS Architecture [Golab & Özsu 2003] • Figure: streaming inputs pass through an input monitor into working storage, summary storage, and static storage; a query processor, fed by user queries held in a query repository and by updates to static data, writes results to an output buffer that produces the streaming outputs.
  • 12. Architecture: Stream Query Processing SDMS (Stream Data Management System)
  • 14. DBMS versus DSMS (Data Stream Management System)
      • DBMS: persistent relations. DSMS: transient streams.
      • DBMS: one-time queries. DSMS: continuous queries.
      • DBMS: random access. DSMS: sequential access.
      • DBMS: “unbounded” disk store. DSMS: bounded main memory.
      • DBMS: only current state matters. DSMS: historical data is important.
      • DBMS: no real-time services. DSMS: real-time requirements.
      • DBMS: relatively low update rate. DSMS: possibly multi-GB arrival rate.
      • DBMS: data at any granularity. DSMS: data at fine granularity.
      • DBMS: assume precise data. DSMS: data stale/imprecise.
      • DBMS: access plan determined by query processor, physical DB design. DSMS: unpredictable/variable data arrival and characteristics.
  • 16. Challenges of Stream Data Processing • Multiple, continuous, rapid, time-varying, ordered streams • Main memory computations • Queries are often continuous • Evaluated continuously as stream data arrives • Answer updated over time • Queries are often complex • Beyond element-at-a-time processing • Beyond stream-at-a-time processing • Beyond relational queries (scientific, data mining, OLAP) • Multi-level/multi-dimensional processing and data mining • Most stream data are at low-level or multi-dimensional in nature
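
The idea of a continuous query, evaluated as each item arrives with its answer updated over time, can be sketched with a Python generator. The function name `continuous_average` is hypothetical, and a running average is used only as a simple stand-in for a real query.

```python
def continuous_average(stream):
    """Yield the updated average after every arriving item.

    Unlike a one-time query, this never waits for the end of the input:
    each new item immediately produces a refreshed answer.
    """
    count = 0
    total = 0.0
    for value in stream:
        count += 1
        total += value
        yield total / count
```

For the input 10, 20, 60 the successive answers are 10.0, 15.0, and 30.0. A consumer can act on each one without the stream ever ending.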
  • 17. How to deal with Big Data Streams?
  • 18. Approximate answers to queries • When? • Queries needing unbounded memory • Too many queries, streams that are too rapid, or response-time requirements that are too strict • CPU limit • Memory limit • Solution: approximate answers to queries • Sliding windows • Sampling and load shedding • Definition of synopsis
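
One way to realize the sampling bullet is reservoir sampling, which keeps a uniform random sample of fixed size from a stream of unknown length. The slide does not name a specific technique, so this choice is an assumption made for illustration; the function name and its `rng` parameter are hypothetical.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return k items chosen uniformly at random from the stream.

    Uses one pass and O(k) memory (the classic "Algorithm R").
    If the stream has fewer than k items, all of them are returned.
    """
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir
```

Approximate answers can then be computed on the sample instead of on the full stream, trading accuracy for bounded memory.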
  • 19. Streaming Computing Approaches • Two approaches for handling such streams • Use a time window, and query the window as a static table • When you can’t store the collected data, or need to keep track of historical data, use summaries: sampling, filtering, counting
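
The first approach, a time window that is queried like a static table, can be sketched as below. The class `TimeWindow` is hypothetical, and the sketch assumes items arrive with non-decreasing timestamps measured in seconds.

```python
from collections import deque


class TimeWindow:
    """Keep only the items from the last `width_seconds` of the stream."""

    def __init__(self, width_seconds):
        self.width = width_seconds
        self.items = deque()  # (timestamp, value) pairs, oldest first

    def add(self, timestamp, value):
        # Assumes timestamps never decrease.
        self.items.append((timestamp, value))
        # Evict items that fell out of the window ending at `timestamp`.
        while self.items and self.items[0][0] <= timestamp - self.width:
            self.items.popleft()

    def values(self):
        # The window contents can be queried like a small static table.
        return [v for _, v in self.items]
```

With a 10-second window, adding values at times 0, 5, 12, and 15 leaves only the last two in the window, so an aggregate such as `sum(window.values())` reflects recent data only.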