unit4

Chapter 8 discusses the capabilities and limitations of consumer devices in accessing streaming data, categorizing client applications into UI/End-User Applications, Integration with Third-Party or Stream Processors, and Hybrid Message Logging techniques. It highlights potential data loss areas, client-side considerations for handling data streams, and the importance of implementing acknowledgment systems for reliable message processing. The chapter emphasizes the need for developers to ensure efficient data handling and error management in streaming applications.

Uploaded by

ssitavinya2022

Chapter 8: Consumer device capabilities and limitations accessing the data

1. Explain the three general categories of client applications.

Figure: The three general categories of client applications


The figure shows different ways apps use streaming data, but it doesn’t specify
the exact communication protocol used between the data source (API) and the
client. That will be discussed later.
A key idea is that filtering and aggregating data are common across all app
types. Sometimes this happens on the client (app) side, but often it's better to
do it within the streaming system itself.
1. UI/End-User Applications (Dashboard/Application)
o These are apps that directly use streaming data.
o Examples: dashboards, messaging apps, multiplayer games.
o They process and display data right on the client side.
2. Integration with Third-Party or Stream Processors
o These apps use streaming data to perform business logic and then
send it to another system.
o Example: sending transaction data to an external fraud detection
service.
o Sometimes the other system is also internal (within your
organization).

2. Explain the client-not-reading-fast-enough situation with two approaches.
OR
Explain the generalized server-sent events and WebSockets data flow showing a slow client, with a neat diagram.

Figure: Generalized server-sent events and WebSockets data flow showing a slow client
The client handles the first two messages quickly, but processing message 3
takes too long. Meanwhile, messages 4 through 6 arrive. There are two options:
• Option 1: Hold the extra messages in memory (safe if it’s just a few).
• Option 2: Keep sending them, but that risks data loss if the client can’t
handle the load.
A good streaming API should let clients know they’re falling behind. Sadly,
many third-party APIs don’t do that. If you’re building your own API,
definitely include this feature.
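The two options above can be sketched as a small bounded per-client buffer on the server side. This is a minimal sketch; the class name and the buffer limit are illustrative, not from the chapter:

```javascript
// Sketch: a bounded buffer for a slow client (illustrative names).
// Option 1: hold extra messages in memory, up to a limit.
// Option 2 risk: past the limit, messages must be dropped (data loss).
class ClientBuffer {
  constructor(limit = 100) {
    this.limit = limit;
    this.queue = [];
    this.dropped = 0; // count of messages lost once the buffer is full
  }
  push(msg) {
    if (this.queue.length >= this.limit) {
      this.dropped += 1; // the client is too slow: this message is lost
      return false;
    }
    this.queue.push(msg);
    return true;
  }
  next() {
    return this.queue.shift(); // the client consumes at its own pace
  }
}
```

A server built this way can also report `dropped` back to the client, which is exactly the kind of "you are falling behind" signal the text recommends.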

Client-Side Considerations
As a developer, you should ask yourself:
1. Am I keeping up with the stream?
2. What happens if I don’t?
3. How can I scale my client to handle more data?

Working with Third-Party Streaming APIs


Some APIs (like Twitter’s) warn you if you fall behind: Twitter sends a “stall
warning” every 5 minutes. But if you keep falling behind, it might
disconnect you without telling you exactly when or why.
If the API doesn’t send any warning, here’s a tip:
Check the timestamps on the messages. If you notice the data is older than
expected, you’re likely falling behind.
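The timestamp tip above can be sketched as a small helper; the 30-second lag threshold is an assumption, not a value from the chapter:

```javascript
// Sketch: detect that the client is falling behind by comparing a
// message's timestamp to the local clock (threshold is illustrative).
function isFallingBehind(messageTimestampMs, nowMs = Date.now(), maxLagMs = 30000) {
  // If the message is older than the allowed lag, we are behind.
  return nowMs - messageTimestampMs > maxLagMs;
}
```

Calling this on each incoming message gives the lag warning that a third-party API may never send on its own.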

Building Your Own Streaming API


If you control the whole system, be clear about what happens when a client
lags:
• Send status messages when a client is too slow.
• Log these messages so developers can troubleshoot.
• Include helpful documentation, but also provide real-time indicators,
because not everyone reads docs.
Remember: don’t overload a struggling client with more extra messages
than it can handle.

3. Explain the different areas where data loss can occur.

Figure: A streaming client with potential areas of data loss in receiving, processing, and sending data

Data can be lost in four places:


1. When sent by the streaming API
2. During processing (usually due to bugs)
3. When sending it out to another system
4. In the UI display

• When sent by the streaming API


The data might not reach your system due to network issues or if the API
is overwhelmed.
• During processing
If your code has a bug or crashes while handling the data, it might get
lost.
• When sending it out to another system
After processing, if your system fails while sending the data to another
service, that message could be lost.

• In the UI display
Even if the data is processed correctly, it might not show up in the user
interface due to rendering errors or browser issues.

4. Explain hybrid message logging with full acknowledgment and ACKing.

Figure: HML with full acknowledgment shown in context

It’s critical to make sure a message is processed exactly once, especially when
sending data to a third-party system (like an order service) that we don’t
control and that might not handle duplicates well.
The best way to do this is by using acknowledgments:
• The client acknowledges the streaming API after processing the message.
• The third-party system also sends an acknowledgment after receiving the
data.
However, this can be tricky:
• The streaming API might not support acknowledgments.
• You might not control the third-party system either.

To handle this, keep a record of processed messages, using a unique ID or
hash. This adds complexity and requires more storage, but helps avoid
duplicates—even if the system crashes.
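Keeping a record of processed messages can be sketched with an in-memory set of IDs; here the `Set` is a stand-in for the durable store that would actually survive a crash, and the names are illustrative:

```javascript
// Sketch: skip duplicate messages by remembering processed IDs.
// An in-memory Set stands in for a durable store that survives crashes.
const processedIds = new Set();

function processOnce(message, handler) {
  if (processedIds.has(message.id)) {
    return false; // duplicate: already handled, skip it
  }
  handler(message);             // do the real work
  processedIds.add(message.id); // record it so a redelivery is ignored
  return true;
}
```

Note the trade-off the text mentions: this record costs extra storage and bookkeeping, but a redelivered message is now detected and skipped.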

Figure: HML with partial ACKing and local storage


When using multiple streaming API servers, one may crash or go offline for
maintenance, causing your client to switch to another server. Since the new
server might resend messages already processed—and the API may not track
what was delivered—you risk handling duplicates.
To prevent this, your client needs to keep a record of processed messages in a
distributed store (like Redis or a database). This shared storage ensures that
across reconnects and server switches, you can check which messages you've
already seen and avoid reprocessing them, enabling reliable, exactly-once
message handling even during failures.

5. Explain hybrid message logging with partial ACKing and
distributed storage for processed messages.

Figure: HML with partial ACKing and distributed storage for processed messages
When one streaming server goes down and the client connects to another, the
new server might resend messages you’ve already processed. So, you need to
always check for and skip duplicates using a distributed store.
But if the stream is very fast, checking a remote store every time can slow
things down. To avoid this, keep recent messages in a local cache and
occasionally sync with the distributed store—this keeps things quick and
reliable.
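The local-cache-plus-sync idea can be sketched as follows. The remote store object is a hypothetical stand-in for a Redis or database client, and the interface is an assumption:

```javascript
// Sketch: check a fast local cache first, and batch newly seen IDs
// for an occasional sync to a distributed store (e.g. Redis).
// The remoteStore object is a hypothetical stand-in.
class DedupCache {
  constructor(remoteStore) {
    this.local = new Set();    // recent IDs, checked on the hot path
    this.pending = [];         // IDs waiting to be synced remotely
    this.remote = remoteStore; // stand-in for a Redis/database client
  }
  seen(id) {
    return this.local.has(id); // fast path: no network round trip
  }
  record(id) {
    this.local.add(id);
    this.pending.push(id);
  }
  sync() {                     // call occasionally, e.g. on a timer
    this.remote.addAll(this.pending);
    this.pending = [];
  }
}
```

The design choice is exactly the one the text describes: the hot path touches only local memory, while durability across reconnects comes from the periodic batch sync to the shared store.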

6. Explain the steps required for exactly-once processing in Node.js, with a neat diagram.

Figure: Node.js client with the steps required for exactly-once processing
The steps in the flow are as follows:
1. Royalty event— In this step we receive the event from the
streaming API.
2. Log event— This is the RBML side of the HML algorithm. Once
we receive the message we store it.
3. Store event— We store the event in RocksDB which works great
as a local store that meets the needs of fast insert and delete.
4. Process event— At this point we’ve stored the original event we
received and can now send it safely.
5. Log event— We will log the event before sending it—this is the
SBML side of the HML algorithm, where we store the message
we are about to send in case we have a problem during the
send.

6. Store event— Store the event as we did in step 3.
7. Send payment request— We’re now ready to send the
payment request to our payment processor.
8. Receive ACK— Wait for an “ACK” confirming the payment was
successful. If the payment processor doesn’t support ACKs, try
querying the system to confirm success. If that’s not possible,
use a more asynchronous approach or run a separate process to
verify the payment later.
9. Remove event— At this point we know that the payment
system has processed the request, so we can safely delete the
royalty event received in step 1 from our backing store.
10. Remove— This is a remove from the RocksDB. In this case,
it will remove the royalty event we received in step 1.
11. ACK event— After processing, send an ACK to the
streaming API to confirm the message is done. If the API
doesn’t support ACKs, use another way to signal completion.
Since ACKs might get lost, add a check before processing to
avoid handling the same message twice.
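The eleven steps above can be compressed into a sketch. A `Map` stands in for RocksDB, the `paymentClient` object is a hypothetical stand-in for the real payment processor, and the key names are illustrative:

```javascript
// Sketch of the HML exactly-once flow. A Map stands in for RocksDB,
// and paymentClient is a hypothetical stand-in for the payment processor.
const localStore = new Map();

function handleRoyaltyEvent(event, paymentClient) {
  // Steps 1-3: RBML side -- log and store the received event before any work.
  localStore.set(`received:${event.id}`, event);

  // Steps 4-6: SBML side -- store what we are about to send,
  // in case there is a problem during the send.
  const request = { eventId: event.id, amount: event.amount };
  localStore.set(`sending:${event.id}`, request);

  // Steps 7-8: send the payment request and wait for the ACK.
  const ack = paymentClient.send(request);
  if (!ack) throw new Error('no ACK from payment processor');

  // Steps 9-10: the payment was processed, so both records can be removed.
  localStore.delete(`received:${event.id}`);
  localStore.delete(`sending:${event.id}`);

  // Step 11: an ACK back to the streaming API would happen here.
  return true;
}
```

If the process crashes before step 9, the stored `received:` and `sending:` records remain in the backing store, which is what allows recovery without losing or double-sending the event.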
