A widely preferred pattern for data engineering with a lakehouse, for us, has been the creation of external Delta tables. This is only possible for data sources that can be consumed from a notebook.
However, there are data sources beyond that, and the only alternative for them is a Gen2 dataflow. But a Gen2 dataflow only inserts into a lakehouse table. Is there any way to insert into a chosen lakehouse subfolder instead of a table?
I don't think it is doable now. If that is the case, is it on the cards?
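For context, the external Delta table pattern described above usually amounts to writing Delta files into a chosen subfolder from a notebook and registering a table over that folder. Below is a minimal sketch for a Fabric notebook, assuming a default lakehouse is attached; the paths and the table name are hypothetical placeholders.

```python
# Minimal sketch of the external Delta table pattern in a Fabric notebook.
# Assumptions: a default lakehouse is attached (so relative "Files/..." paths
# resolve), `spark` is the session the notebook provides, and the paths and
# the bronze_orders table name are hypothetical placeholders.

# Read from a source the notebook can reach (here, a hypothetical CSV drop).
df = spark.read.option("header", True).csv("Files/landing/orders/*.csv")

# Land the raw data as Delta in a chosen subfolder under Files.
df.write.format("delta").mode("overwrite").save("Files/bronze/orders")

# Register an external (unmanaged) table over that folder so it can be queried
# without moving the data into the managed Tables area.
spark.sql("""
    CREATE TABLE IF NOT EXISTS bronze_orders
    USING DELTA
    LOCATION 'Files/bronze/orders'
""")
```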
At the moment, Dataflow Gen2 only loads data to tables. Do feel free to suggest new destinations (and formats) in the Fabric Ideas portal (https://aka.ms/FabricIdeas).
An alternative is to leverage the copy activity or a copy job, especially since the bronze layer typically holds files in their raw state: no transformation should be performed at that layer, so a simple copy activity should be good enough. If a connector is missing from the copy job / copy activity, would you mind letting us know what the source is? You can also post a new idea for such a connector in the Ideas portal.
I'm curious, what are the benefits of writing to files instead of just appending to a lakehouse bronze delta table?
In a well-architected Data Lake, data flows through three layers: Bronze, Silver, and Gold.
The Bronze layer is where raw data from various sources like on-prem SQL, SharePoint, Azure SQL, Oracle, APIs, and Databricks is ingested. Using external tables for this layer is highly advantageous: the raw data stays in files for auditing and reprocessing, yet remains queryable as Delta tables.
This design pattern forms the backbone of a resilient, scalable, and auditable Data Lake architecture.
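To make the layering concrete, a Bronze-to-Silver hop in a notebook might look roughly like the sketch below; the paths, column names, and table name are hypothetical, and `spark` is the session provided by the Fabric notebook.

```python
from pyspark.sql import functions as F

# Bronze: read the raw Delta files exactly as they were landed (hypothetical path).
bronze_df = spark.read.format("delta").load("Files/bronze/orders")

# Silver: apply light cleansing and conforming before promoting the data.
silver_df = (
    bronze_df
    .dropDuplicates(["order_id"])                   # hypothetical key column
    .withColumn("processed_at", F.current_timestamp())
)

# Write the curated result to a managed lakehouse table.
silver_df.write.format("delta").mode("overwrite").saveAsTable("silver_orders")
```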
Hi @smpa01
You are right, Dataflow Gen2 currently supports writing data only to Lakehouse tables, not to specific subfolders.
One thing you can try is to use Dataflow Gen2 to land the data in a staging table, then read that staging table in a notebook and write it to your desired location.
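A rough sketch of that workaround, assuming Dataflow Gen2 has landed the data in a hypothetical staging_orders table and the notebook has the target lakehouse attached as its default:

```python
# Read the table that Dataflow Gen2 loaded (hypothetical name).
staging_df = spark.read.table("staging_orders")

# Write it as Delta into the desired subfolder under Files; the path is a
# hypothetical placeholder, and other formats (e.g. Parquet) work the same way.
(
    staging_df.write
    .format("delta")
    .mode("append")
    .save("Files/bronze/orders_from_dataflow")
)
```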
That's honestly too much to maintain: pre_bronze -> bronze -> silver, and so on.
Dataflows have an advantage over notebooks when it comes to connecting to certain sources that don't have equivalent connectors available in notebooks, for example on-premises SQL Server, SharePoint, etc. In such cases, there is no alternative but to use a dataflow.
Currently, dataflows remain relevant largely because of this. So, for writing to a destination, it only makes sense that Dataflow Gen2 provides the same options as a notebook.
To keep up with the norm, Dataflow Gen2 must offer the ability to write to subfolders. After all, bronze data should land in files for an audit trail.