If you are currently working with Spark Structured Streaming, I am trying to understand how it continuously streams data as desired.
For example, once you set up a Spark stream in a Databricks notebook, you are advised to schedule that notebook in a job with Continuous mode, which ensures that it keeps running as desired.
Production considerations for Structured Streaming (Databricks)
Since there is no continuous job option for notebook runs in Fabric (only scheduled runs), how does streaming work in a Fabric notebook, which can't have a continuous job run (as in Databricks) to keep the stream going as desired?
Thank you in advance.
Hi @smpa01, thank you for reaching out to the Microsoft Community Forum.
Microsoft Fabric doesn't support Continuous mode for notebooks the way Databricks does. Instead, Fabric notebooks run in batch mode: each scheduled run starts a Spark session, executes the code, and then shuts down, which stops any streaming queries. This means Fabric can't natively support always-on, continuous streaming jobs out of the box.
However, you can achieve near-continuous streaming using Spark Structured Streaming with checkpointing and frequent scheduling. To do this, write your streaming logic to include a checkpointLocation, which allows Spark to persist the query’s state and resume from where it left off in the next run. Set a short trigger interval, such as 10 seconds, so the query processes small batches of data frequently during each run.
To keep the notebook alive long enough for meaningful processing, add a time.sleep() call, typically for about 5 minutes. This gives your streaming query time to process incoming data before the Spark session ends. Finally, schedule the notebook to run every 5 to 10 minutes. Combined with checkpointing, this setup ensures that each run continues smoothly from the last, minimizing data gaps. Expect around 5–10 minutes of latency per cycle, plus 30–60 seconds for cluster spin-up time.
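A minimal sketch of that pattern, assuming a JSON file source and a Delta target in the Lakehouse (the paths, schema, and source format below are placeholders, and `spark` is the session a Fabric notebook provides):

```python
import time

# All paths, the schema, and the source format are assumed placeholders;
# substitute your own source and Lakehouse locations.
source_path = "Files/landing/events"          # hypothetical folder where new files arrive
checkpoint_path = "Files/checkpoints/events"  # lets the next scheduled run resume from here
target_path = "Tables/events_bronze"          # hypothetical Delta target table

# Define the streaming read (file sources need an explicit schema).
stream_df = (
    spark.readStream
    .format("json")
    .schema("id STRING, payload STRING, event_ts TIMESTAMP")
    .load(source_path)
)

# Start the query with a short trigger so it processes small
# micro-batches frequently while the session is alive.
query = (
    stream_df.writeStream
    .format("delta")
    .option("checkpointLocation", checkpoint_path)
    .outputMode("append")
    .trigger(processingTime="10 seconds")
    .start(target_path)
)

# Keep the notebook (and its Spark session) alive for ~5 minutes of
# processing, then stop the query so the scheduled run ends cleanly.
time.sleep(5 * 60)
query.stop()
```

Because the processed offsets are recorded under checkpointLocation, the next scheduled run resumes exactly where this one stopped, so data is neither skipped nor reprocessed between runs.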
If this helped solve the issue, please consider marking it 'Accept as Solution' so others with similar queries may find it more easily. If not, please share the details, always happy to help.
Thank you.
@v-hashadapu Thanks for the explanation and response.
With batch streaming, the streaming would probably work as desired for immutable data.
But batch streaming would not work for a scenario where the source data mutates and the goal is to stream every mutation downstream.
So with the current capabilities, true always-on streaming (as in Databricks) is not possible for any use case scenario, is it fair to conclude that?
Hi @smpa01, yes, for now that is the case, but hopefully it will change soon.
If this helped solve the issue, please consider marking it 'Accept as Solution' so others with similar queries may find it more easily. If not, please share the details, always happy to help.
Thank you.