Talk I gave at Strata + Hadoop World in Barcelona on November 21, 2014.
In this talk I discuss our experience with realtime analysis of high-volume event data streams.
Realtime Data Analysis Patterns
1. Realtime Data Analysis Patterns
Mikio Braun
@mikiobraun
streamdrill & TU Berlin
O'Reilly Strata + Hadoop World, Barcelona
Nov 21, 2014
2. How it all started: Realtime Twitter Retweet Trends
Rails app + PostgreSQL
About 100 tweets/second, and it got worse
3. Road from there
● Version 1.0: Rails + PostgreSQL
– store and batch
● Version 2.0: Scala + Cassandra
– stream processing & working data on disk
● Version 3.0: streamdrill
– “in-memory realtime analytics database”
– approximate algorithms to bound resource usage
– moderate parallelism for some things
4. Lessons learned?
Not just one kind of realtime.
6. Two Dimensions of Real-Time
Complexity:
● counting
● trends
● outlier detection
● recommendation
● prediction (churn, etc.)
Latency:
● now (ms, RTB)
● seconds (fraud)
● hours (monitoring)
● days (reporting)
7. What makes realtime hard
● Many Events
– 100 events / second
– 360k per hour
– 8.6M per day
– 260M per month
– 3.2B per year
● Many Objects
(Image: http://www.flickr.com/photos/arenamontanus/269158554/)
8. Classes of Realtime
● Events per second (100s? 1000s? 10k?)
● Number of objects (A few dozen? Millions?)
● Complexity (Counting? Trends?)
● Latency (Milliseconds? Hours?)
10. Data Acquisition
● Flat files / HDFS
● Apache Flume / Logstash
● Apache Kafka for distributed logging
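Not on the original slide — as a hedged illustration of the Kafka option for acquisition, a minimal producer could look like the sketch below (the broker address, the topic name "events" and the string payloads are assumptions):

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object EventProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092") // assumed broker address
    props.put("key.serializer",
      "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer",
      "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    // key: user id (used for partitioning), value: the raw event, e.g. a JSON tweet
    producer.send(new ProducerRecord("events", "user-42", """{"type":"retweet"}"""))
    producer.close()
  }
}
```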
11. Processing
● Depending on Latency: Batch or Streaming
● Batch
– Apache Hadoop
– Apache Spark
– Apache Flink
● Streaming
– Apache Storm
– Apache Samza
12. Query Layer
● Hadoop/Storm/Spark have no query layer
● Need a DB backend like Redis to store and serve the results
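Not in the original deck — a minimal sketch of such a query layer, assuming the processing job writes counters into a Redis sorted set via the Jedis client (the key name "trend:hashtags" is made up):

```scala
import redis.clients.jedis.Jedis
import scala.collection.JavaConverters._

object TrendQueries {
  def main(args: Array[String]): Unit = {
    val redis = new Jedis("localhost", 6379)

    // The processing job (Storm/Spark/...) increments counters as results arrive:
    redis.zincrby("trend:hashtags", 1.0, "#trading")

    // The query layer serves top-k results without touching the processing job:
    val top10 = redis.zrevrangeWithScores("trend:hashtags", 0, 9).asScala
    top10.foreach(t => println(s"${t.getElement}: ${t.getScore}"))

    redis.close()
  }
}
```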
13. Lambda Architecture: Mixing Batch & Streaming
http://lambda-architecture.net/
15. Scaling vs. Approximation
● Scaling is expensive
● Not all results are relevant
● Data changes all the time anyway
● Approximate: trade accuracy for resource usage
17. Heavy Hitters
● Count activities over large item sets (millions or more, e.g. IP addresses, Twitter users)
● Interested in the most active elements only
● Fixed table of counts, e.g.: frank 15, paul 12, jan 8, felix 5, leo 3, alex 2
● Case 1: element already in the table → increment its count (paul: 12 → 13)
● Case 2: new element → replace the least active entry and inherit its count plus one (nico replaces alex: 2 → 3)
Metwally, Agrawal, El Abbadi, Efficient Computation of Frequent and Top-k Elements in Data Streams, International Conference on Database Theory, 2005
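The following is not from the slides — a minimal Scala sketch of the Space-Saving scheme described above: a fixed table of at most k counters, where a previously unseen element evicts the current minimum and inherits its count plus one.

```scala
import scala.collection.mutable

// Space-Saving (Metwally et al., 2005), simplified: at most k counters are kept.
class SpaceSaving(k: Int) {
  private val counts = mutable.Map.empty[String, Long]

  def observe(item: String): Unit = {
    if (counts.contains(item)) {
      counts(item) += 1                      // case 1: element already tracked
    } else if (counts.size < k) {
      counts(item) = 1                       // table not full yet
    } else {
      val (minItem, minCount) = counts.minBy(_._2)
      counts.remove(minItem)                 // case 2: evict the least active element
      counts(item) = minCount + 1            // new element inherits its count + 1
    }
  }

  def topK(n: Int): Seq[(String, Long)] =
    counts.toSeq.sortBy(-_._2).take(n)
}
```

With k = 6 this reproduces the example above: when nico arrives, alex (count 2) is evicted and nico enters with count 3.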
18. Count Min Sketch
● Summarize histograms over large feature sets
● Like Bloom filters, but better
● Structure: n different hash functions (rows) over m bins (columns); each update increments one bin per hash function
● Query: take the minimum count over all hash functions
G. Cormode and S. Muthukrishnan. An improved data stream summary: The count-min sketch and its applications. LATIN 2004, J. Algorithms 55(1): 58–75 (2005).
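Again not part of the original deck — a small sketch of the structure described above, with d hash functions over w bins (MurmurHash3 seeded per row is an assumption; real implementations choose d and w from the desired error bounds):

```scala
import scala.util.hashing.MurmurHash3

// Count-min sketch: d rows (one per hash function) of w bins; query = min over rows.
class CountMinSketch(d: Int, w: Int) {
  private val table = Array.ofDim[Long](d, w)

  private def bin(row: Int, item: String): Int = {
    val h = MurmurHash3.stringHash(item, row)  // one seeded hash per row
    ((h % w) + w) % w                          // map to a non-negative bin index
  }

  def add(item: String, count: Long = 1L): Unit =
    for (row <- 0 until d) table(row)(bin(row, item)) += count

  def estimate(item: String): Long =
    (0 until d).map(row => table(row)(bin(row, item))).min
}
```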
19. Hyper Log Log
● Hash stream to generate random bit strings
● Look for infrequent events
● If the probability is one hundredth → should have seen about 100 events on average once it occurs.
● Average to improve estimate.
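A hedged sketch of this idea follows (the register count, hash function and bias constant are assumptions; the published algorithm adds small- and large-range corrections that are omitted here):

```scala
import scala.util.hashing.MurmurHash3

// HyperLogLog-style distinct counting: each register remembers the longest run of
// leading zero bits seen among the hashes routed to it; rare patterns imply many
// distinct elements, and averaging over registers stabilizes the estimate.
class HyperLogLog(b: Int = 10) {
  private val m = 1 << b                          // number of registers
  private val registers = Array.fill(m)(0)
  private val alpha = 0.7213 / (1.0 + 1.079 / m)  // bias correction (valid for m >= 128)

  def add(item: String): Unit = {
    val h = MurmurHash3.stringHash(item)
    val idx = h >>> (32 - b)                      // first b bits pick a register
    val rest = h << b                             // remaining bits
    val rank = if (rest == 0) 33 - b else Integer.numberOfLeadingZeros(rest) + 1
    registers(idx) = math.max(registers(idx), rank)
  }

  def estimate: Double = {
    val z = 1.0 / registers.map(r => math.pow(2.0, -r)).sum
    alpha * m * m * z
  }
}
```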
20. Comparing Approx. Algorithms
● Heavy Hitters:
– approx. counts + top-k
– large memory requirement
● Count Min Sketch
– approx. counts for everything, but no top-k list and no stored elements
– needs its size to be fixed beforehand
● HyperLogLog
– approx. number of distinct elements
21. Exponential Decay
22. Beyond Counting
23. Streamdrill & Demos
● Realtime Analysis Solutions
● Core Engine:
– Heavy Hitters + exponential decay + secondary indices
– Instant counts & top-k results over time windows
– In-memory
– Written in Scala
● Modules
– Profiling and Trending
– Recommendations
– Count Distinct
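The exponential decay mentioned above is not spelled out on the slides; as a hedged illustration (the half-life parameter is an assumption), a decayed counter adds each new activity on top of an exponentially shrinking previous score, which is what turns plain counts into trends over an implicit time window:

```scala
// Exponentially decayed counter: older activity contributes less and less,
// so the score approximates activity within a recent time window.
class DecayingCounter(halfLifeSeconds: Double) {
  private var score = 0.0
  private var lastUpdate = 0.0  // timestamp (seconds) of the last update

  def update(now: Double, amount: Double = 1.0): Unit = {
    val decay = math.pow(0.5, (now - lastUpdate) / halfLifeSeconds)
    score = score * decay + amount
    lastUpdate = now
  }

  def valueAt(now: Double): Double =
    score * math.pow(0.5, (now - lastUpdate) / halfLifeSeconds)
}
```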
24. Example: Twitter Stock Analysis
http://play.streamdrill.com/vis/
25. Example: Twitter Stock Analysis
● Trends:
– symbol:combinations $AAPL:$GOOG
– symbol:hashtag $AAPL:#trading
– symbol:keywords $GOOG:disruption
– symbol:mentions $GOOG:WallStreetCom
– symbol trend $AAPL
– symbol:url $FB:http://on.wsj.com/15fHaZW
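As a purely hypothetical sketch (the Tweet case class and its fields are made up), this is roughly how a single tweet could be expanded into the composite trend keys listed above before being counted:

```scala
case class Tweet(symbols: Seq[String], hashtags: Seq[String],
                 mentions: Seq[String], urls: Seq[String])

// Expand one tweet into the composite keys that the trend counters track.
def trendUpdates(t: Tweet): Seq[String] =
  t.symbols.flatMap { s =>
    Seq(s) ++                                         // plain symbol trend
      t.symbols.filter(_ != s).map(o => s"$s:$o") ++  // symbol:symbol combinations
      t.hashtags.map(h => s"$s:$h") ++                // symbol:hashtag
      t.mentions.map(m => s"$s:$m") ++                // symbol:mention
      t.urls.map(u => s"$s:$u")                       // symbol:url
  }

// trendUpdates(Tweet(Seq("$AAPL", "$GOOG"), Seq("#trading"), Nil, Nil)) yields
// $AAPL, $AAPL:$GOOG, $AAPL:#trading, $GOOG, $GOOG:$AAPL, $GOOG:#trading
```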
26. Example: Twitter Stock Analysis
27. Example: Twitter Stock Analysis
28. Example: Twitter Stock Analysis
Data flow: Twitter → (tweets) → Tweet Analyzer → (updates) → streamdrill, queried from JavaScript via REST
29. Realtime User Profiles
30. Realtime User Profiles
31. Realtime User Profiles
32. Realtime User Profiles
33. Realtime user profiles
● Process 10k events / second on one machine
● Track about 1 Million counts per 1 GB
● Shard by user for higher accuracy
34. Realtime Data Analysis Patterns
● Acquisition / Processing / Query Layer
● Acquisition: Flat files and distributed logs
● Processing: Scaling batch or streaming
● Query Layer: Separate query from processing
● Lambda and Kappa Architecture
● Approximation as alternative to scaling
● Trends with indices as building blocks for data analysis
35. Thank You
Mikio Braun
[email protected]
@mikiobraun
Mikio L. Braun, @mikiobraun Realtime Data Analysis Patterns (c) 2014 by Mikio Braun