Danny van den Broek
Almere, Flevoland, Netherlands
600 followers
500+ connections
Explore more posts
Abhisek Sahu
Databricks on Azure is everywhere, stealing the spotlight and making waves in the market! But let's be honest: it can be tricky to truly understand these concepts until you dive into them yourself. That's where these free end-to-end projects come in, your shortcut to clarity and hands-on mastery! 🚀 Believe me, these projects are cutting-edge and align with the latest technologies.
By exploring these projects, you'll gain a comprehensive understanding of key Azure data engineering services and Databricks concepts, including:
▪️ How Databricks works and its integration with Azure
▪️ Delta Lake and its benefits
▪️ Full and incremental data loads
▪️ Implementing SCD Type 2
▪️ CI/CD DevOps integration
▪️ Lakehouse Medallion Architecture with Bronze, Silver, and Gold layers
▪️ Building ETL pipelines using Azure Data Factory
▪️ An overview of PySpark
▪️ Data governance with Unity Catalog
▪️ Insights into Azure Synapse Analytics
…and so much more!
Check out these free projects:
👉 End to End Azure Data Engineering Project by Sumit Sir (Sumit Mittal)
🔗 https://lnkd.in/giN6pEEy
👉 Azure End-To-End Data Engineering Project (Job Ready) | Azure Data Engineering Bootcamp (Ansh Lamba)
🔗 https://lnkd.in/gJfPHQsV
👉 Olympic Data Analytics | Azure End-To-End Data Engineering Project (Darshil Parmar)
🔗 https://lnkd.in/gb_gwT3R
👉 An End to End Azure Data Engineering Real Time Project Demo | Get Hired as an Azure Data Engineer (Mr. K Talks Tech)
🔗 https://lnkd.in/gQ7RT5kJ
👉 End to End Big Data Engineering Project with Azure | Big Data Engineering (From Scratch) (Mayank Aggarwal)
🔗 https://lnkd.in/gTJFAu64
👉 Azure Data Engineer Project | Azure Data Engineer End to End Project | Azure data Engineer Job (Aditya Chandak)
🔗 https://lnkd.in/gsPWybWw
👉 Azure End-To-End Data Engineering Project for Beginners (FREE Account) | SQL DB Tutorial (Luke J. Byrne)
🔗 https://lnkd.in/g-gPtg2P
👉 Azure End-To-End Data Engineering Project (From Scratch!) (Ansh Lamba)
🔗 https://lnkd.in/gAQaJpZm
👉 Football Data Analytics | Azure End To End Data Engineering Project (CodeWithYu, Yusuf Ganiyu)
🔗 https://lnkd.in/gNHrhDV5
Dive in to elevate your skills and master the world of Azure data engineering!
📕 Sharing 'Azure Databricks End to End Project' for your learning reference.
Doc credits: respective creator on LinkedIn.
♻️ Repost this to help job seekers.
P.S.: I share valuable job search tips, industry insights, learning resources, and the latest updates in the data domain, all for free. Join thousands of other readers here → https://lnkd.in/g-ZtB4Yf
Let's learn Azure Databricks!
#dataengineer #azure #databricks #projects #data #azuredataengineer #cloud #microsoft #cloudcomputing
464
43 comments
Dipankar Mazumdar
A few days ago I published my article "Apache Parquet vs. Newer File Formats (BtrBlocks, FastLanes, Lance, Vortex)", looking at why new formats are emerging and how they compare to Apache Parquet/ORC.
Last week, a new research paper introduced F3 (Future-proof File Format), a next-gen open-source format designed to move beyond Parquet/ORC's limitations. It is an extremely detailed read, but here are some of the things you might want to know.
✅ Problem framing: Parquet and ORC were built for hardware and workload assumptions that no longer hold. Cloud object storage, wide ML tables, vector embeddings, and random access make them inefficient.
✅ Core principles: F3 is built on interoperability, extensibility, and efficiency, exactly the gaps newer formats have been trying to fill.
✅ Metadata redesign: FlatBuffers replace Thrift/Protobuf for zero-copy, column-level metadata access, avoiding full footer deserialization.
✅ Decoupled layout: F3 introduces IOUnits and EncUnits, breaking the tight coupling of row groups, which gives better memory usage, predictable I/O, and cloud-friendly flushes.
✅ Flexible dictionaries: Dictionaries can be local, global, or shared across columns, enabling better compression ratios than Parquet's fixed row-group scope.
✅ Decoding API: A stable, language-agnostic API that always outputs Apache Arrow arrays, giving consistent interoperability across systems.
✅ Wasm-embedded decoders: Every F3 file ships with its decoder in WebAssembly, so files stay readable without waiting for library upgrades.
✅ Performance: Faster metadata parsing, competitive compression and throughput, and much better random access than Parquet/ORC.
What's different here is that the F3 authors explicitly argue that even newer formats like Lance, Nimble, and BtrBlocks repeat the same mistake as Parquet/ORC: locking into present-day assumptions. Their bet is that by building extensibility and interoperability into the core (e.g., via Wasm decoders), F3 won't need to be replaced a decade later.
Regardless of how widely F3 is adopted, I think this reinforces what I highlighted in my blog: the file format layer is being actively reimagined!
Links to the paper and my blog are in the comments.
#dataengineering #softwareengineering
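To make the dictionary-scope point concrete, here is a minimal Python sketch; it is not F3's encoder and does not reflect its file layout. It dictionary-encodes one column split into two row groups, first with a separate dictionary per row group (the Parquet-style local scope the post contrasts against), then with a single dictionary shared by all row groups. The function `dictionary_encode` and the sample country codes are illustrative assumptions.

```python
def dictionary_encode(values, dictionary=None):
    """Encode values as integer codes, adding unseen values to the dictionary.

    Passing the same dictionary across calls makes it shared (global scope).
    """
    if dictionary is None:
        dictionary = {}
    codes = []
    for value in values:
        if value not in dictionary:
            dictionary[value] = len(dictionary)
        codes.append(dictionary[value])
    return codes, dictionary


row_groups = [["NL", "DE", "NL"], ["DE", "NL", "FR"]]

# Local scope: every row group builds and stores its own dictionary.
local = [dictionary_encode(rg) for rg in row_groups]
local_entries = sum(len(d) for _, d in local)

# Global scope: all row groups encode into one shared dictionary.
shared = {}
global_codes = [dictionary_encode(rg, shared)[0] for rg in row_groups]
global_entries = len(shared)

print(local_entries, global_entries)  # 5 3
```

With values repeated across row groups, the local scheme stores "NL" and "DE" twice, while the shared dictionary stores each distinct value once. That duplicated dictionary storage is one reason a wider dictionary scope can compress better.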
332
6 comments
Ilya Vladimirskiy
Just in case you want to explain the difference between the ETL (extract-transform-load) and ELT (extract-load-transform) approaches in data engineering to your 5-year-old, I have something for you. It's actually easy: either you get something, store it, and make sense of it later (ELT); or you get something, make some sense of it, and store only the results (ETL). Which is better? It depends on the use case and a few parameters, such as the size of your storage space and the complexity of the processing needed.
Update: for more on ETL and ELT, check out the episode of my video blog dedicated to this topic: https://lnkd.in/eP52bjFc
#data #leadership #etl #elt #apples
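The same apple analogy as a minimal Python sketch. Both pipelines share one `extract` and one `transform`; only the order differs. ETL stores just the computed result, while ELT stores the raw data and runs the transformation later, on demand. The fruit data and all function names are hypothetical illustrations, not part of any framework.

```python
def extract():
    # Hypothetical source: a basket of picked fruit.
    return ["apple", "apple", "pear", "apple", "rotten apple"]


def transform(items):
    # "Make sense of it": count only the good apples.
    return {"good_apples": sum(1 for item in items if item == "apple")}


def etl():
    storage = {}
    storage["results"] = transform(extract())  # transform first, store only results
    return storage


def elt():
    storage = {}
    storage["raw"] = extract()  # load the raw data first, untouched

    def query():
        # Transform later, on demand, against the stored raw data.
        return transform(storage["raw"])

    return storage, query


etl_storage = etl()
elt_storage, elt_query = elt()
print(etl_storage)                             # {'results': {'good_apples': 3}}
print(len(elt_storage["raw"]), elt_query())    # 5 {'good_apples': 3}
```

The trade-off from the post shows up directly: ETL keeps storage small but discards the pears and rotten apples for good, while ELT keeps everything so a different `transform` can be applied later, at the cost of storing more.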
698
65 comments
Olga Maydanchik
Part 4. I often see data lineage tools fall short during cross-platform data movement, especially when data pipelines mix legacy and cloud systems. In these cases, data flows can get messy. Examples:
- mainframe → COBOL files → Azure Delta Lake → Snowflake
- COBOL files → Hadoop → Oracle → SQL Server → Tableau
Most lineage tools see only parts, not the full picture: legacy hops, parallel paths, and cross-platform transformations are not captured. While these tools can track data within a single system, they struggle across multiple technologies. As a result, end-to-end lineage is rarely automatic. We need a mix of tools, expertise, and manual mapping to see the whole journey. This incomplete lineage makes impact analysis, governance, and troubleshooting much harder than expected.
Part 1 of the common lineage issues: https://lnkd.in/gwPs5Zfq
Part 2: https://lnkd.in/gQAxhciw
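The "tools plus manual mapping" point can be pictured as graph stitching. Each tool contributes only the edges it can see inside its own platform, a person adds the cross-platform hop nobody captured, and only then does an end-to-end path exist. Below is a minimal Python sketch using the post's first example. The node names and the split between tool-captured and manual edges are hypothetical, and real lineage graphs also carry column-level detail and transformation metadata.

```python
from collections import defaultdict, deque

# Edges each tool can see inside its own platform (hypothetical).
tool_edges = [
    ("mainframe", "cobol_files"),   # captured by a legacy scanner
    ("delta_lake", "snowflake"),    # captured by a cloud lineage tool
]
# The cross-platform hop no tool captured, added by manual mapping.
manual_edges = [("cobol_files", "delta_lake")]


def find_path(edges, source, target):
    """Breadth-first search for a lineage path from source to target."""
    graph = defaultdict(list)
    for upstream, downstream in edges:
        graph[upstream].append(downstream)
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in graph[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no end-to-end lineage can be derived


print(find_path(tool_edges, "mainframe", "snowflake"))  # None
print(find_path(tool_edges + manual_edges, "mainframe", "snowflake"))
```

Without the manual edge, impact analysis from the mainframe would stop at the COBOL files; with it, the full chain to Snowflake is visible.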
58
20 comments
Samiul Hossain Fahim
🚀 Spark Incremental Loads Just Got Quicker & Cleaner! 🚀
Tired of reprocessing your entire dataset every time you need to update your analytics? When dealing with large volumes of data, especially from cloud storage, efficient incremental loading is key to performance and cost savings. One of the most elegant and powerful ways to achieve this in Databricks Spark, particularly with Auto Loader, is to use file modification time through the modifiedAfter option.
Why this approach is a game-changer:
* Precision loading: Instead of blindly scanning all historical files, modifiedAfter lets you tell Auto Loader exactly where to start, processing only files modified (or created) after a specific timestamp.
* Optimized initial scans: For massive source directories, this drastically reduces the amount of data processed when your stream first starts or restarts. No more sifting through years of old data!
* Clean & efficient data pipelines: By focusing only on new or updated data, you streamline your ingestion process, leading to faster job execution and lower resource consumption.
* Simplicity with Auto Loader: Auto Loader's robust checkpointing, combined with modifiedAfter, provides a nearly hands-off way to maintain exactly-once processing guarantees for your incremental data.
How it works (in essence): You set the modifiedAfter option on your spark.readStream.format("cloudFiles") call with a precise timestamp. Auto Loader then filters out anything older than that time during its initial discovery phase. This method is particularly effective when new data arrives as new files, or when existing files are updated (if cloudFiles.allowOverwrites is configured carefully).
If you're building data lakes or data warehouses on Databricks, mastering incremental loads with modifiedAfter is a must for scalable and cost-effective data pipelines.
#Databricks #SparkSQL #DataEngineering #DataPipelines #IncrementalLoad #AutoLoader #BigData #DataLake #CloudData #AnalyticsEngineering
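Auto Loader itself is configured through options on `spark.readStream.format("cloudFiles")`, and its file-discovery logic runs inside Databricks. The sketch below is only a plain-Python simulation of the two ideas the post combines: a modification-time cutoff (what `modifiedAfter` expresses) and a record of already-processed files (standing in for Auto Loader's checkpoint). The function `discover_files`, the directory listing, and the timestamps are all hypothetical.

```python
from datetime import datetime, timezone


def discover_files(listing, modified_after, checkpoint):
    """Pick files to ingest: newer than the cutoff and not yet checkpointed.

    listing: dict of path -> modification time (a hypothetical directory listing)
    checkpoint: set of already-processed paths (stand-in for checkpoint state)
    """
    to_ingest = sorted(
        path
        for path, mtime in listing.items()
        if mtime > modified_after and path not in checkpoint
    )
    checkpoint.update(to_ingest)  # record them so a restart does not re-ingest
    return to_ingest


def utc(*args):
    return datetime(*args, tzinfo=timezone.utc)


listing = {
    "landing/2022/orders.json": utc(2022, 5, 1),
    "landing/2024/orders_01.json": utc(2024, 3, 1),
    "landing/2024/orders_02.json": utc(2024, 4, 1),
}
cutoff = utc(2024, 1, 1)
checkpoint = set()

print(discover_files(listing, cutoff, checkpoint))  # the two 2024 files
listing["landing/2024/orders_03.json"] = utc(2024, 5, 1)
print(discover_files(listing, cutoff, checkpoint))  # only orders_03.json
```

The cutoff keeps the 2022 file out of the very first run, and the checkpoint keeps later runs from re-ingesting files that were already processed.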
143
1 comment
Saurabh .D. Tikekar
👷‍♂️ Who built this Azure Data Factory pipeline?!
Late-night debugging. No documentation. No alerts. Just deeply nested ForEach loops and broken lookups. If you've ever inherited an ADF pipeline, you know the pain.
🚨 Real talk: Azure Data Factory is powerful, but without proper naming conventions, logging, and alerting, it quickly becomes a black box of chaos.
💡 Pro tips:
- Document your pipelines
- Set alerts for failure triggers
- Use parameterized datasets
- Respect your future self (and teammates) 😅
Because when ADF breaks in production, it doesn't whisper... it screams.
#DataEngineering #AzureDataFactory #ADF #DataPipelines #TechHumor #CloudEngineering #DevOps #LinkedInHumor #DataOps
396
11 comments
Dumky de Wilde
Running dbt models on Snowflake? Did you know you can easily leverage the query tag to get cost insights per dbt model? Here's my process and code to:
- Identify which dbt models burn through those credits 🔥
- Understand how much your dbt tests are costing you 💸
- Calculate how much you're paying for 'empty' (idle) warehouses 📦
https://lnkd.in/eqAdqMEt
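The author's own process is behind the link. As a rough, independent illustration of the idea: once each dbt node sets a query tag, rows from Snowflake's `ACCOUNT_USAGE.QUERY_HISTORY` view (which has `QUERY_TAG`, `WAREHOUSE_SIZE`, and `EXECUTION_TIME` in milliseconds) can be grouped by tag. The minimal Python sketch below estimates credits as execution time multiplied by the warehouse's hourly credit rate, and treats metered credits not attributed to any query as idle cost. The tag format `dbt:model:<name>`, the sample rows, the metered total, and the rate table are assumptions. The estimate also ignores concurrency (queries sharing a running warehouse) and minimum billing increments.

```python
from collections import defaultdict

# Assumed standard-warehouse credit rates per hour; verify against current pricing.
CREDITS_PER_HOUR = {"X-Small": 1, "Small": 2, "Medium": 4, "Large": 8}


def credits_by_query_tag(rows):
    """Rough credit estimate per query tag: execution hours x warehouse rate."""
    totals = defaultdict(float)
    for row in rows:
        rate = CREDITS_PER_HOUR[row["warehouse_size"]]
        hours = row["execution_time_ms"] / 1000 / 3600
        totals[row["query_tag"]] += rate * hours
    return dict(totals)


def idle_credits(metered_credits, estimates):
    """Credits billed for the warehouse but not attributed to any query."""
    return metered_credits - sum(estimates.values())


# Hypothetical rows, shaped like a subset of QUERY_HISTORY columns.
rows = [
    {"query_tag": "dbt:model:orders", "warehouse_size": "X-Small", "execution_time_ms": 3_600_000},
    {"query_tag": "dbt:model:customers", "warehouse_size": "Medium", "execution_time_ms": 450_000},
    {"query_tag": "dbt:test:not_null_orders_id", "warehouse_size": "X-Small", "execution_time_ms": 180_000},
]
estimates = credits_by_query_tag(rows)
test_credits = sum(v for tag, v in estimates.items() if tag.startswith("dbt:test:"))
print(estimates)
print(test_credits, idle_credits(2.0, estimates))  # 2.0 = hypothetical metered total
```

A prefix convention in the tag (`dbt:model:` vs `dbt:test:`) is what makes the "how much do my tests cost" question a simple filter.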
41
1 comment
Naval Yemul
🚀 Demystifying Lakehouse Federation in Databricks
Curious about how query federation and catalog federation work in Databricks? 🤔 I've put together a quick guide that explains:
🔹 What Lakehouse Federation is
🔹 Key differences between query and catalog federation
🔹 Practical use cases for both
🔹 Simple setup steps to get started
Whether you're a data engineer, analyst, or architect, this PDF will help you understand how to run queries seamlessly across multiple data sources without moving all your data into one place.
📥 Download & read the PDF: [Attached]
💡 Follow me for more insights on Databricks, Azure Data, Data Engineering & AI.
#Databricks #Lakehouse #DataEngineering #AzureData #BigData #CloudComputing #AI #MachineLearning #DataArchitecture #TechLearning #MicrosoftFabric #LearningOnLinkedIn
65
1 comment
Wael Dagash
List vs Generator in Python: what's the difference?
A list holds everything in memory at once.
A generator gives you one item at a time, only when needed.
When working with large data, this difference matters: generators are more memory-efficient, which often makes them the better choice for performance on large datasets.
#waeldagash #dataEngineering
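A small Python sketch of the difference. `sys.getsizeof` measures only the container object itself (for a list, its array of element references, not the integers they point to), which is enough to show the contrast: the list grows with `n`, while the generator object stays the same small size because it computes each square lazily. One trade-off to keep in mind: a generator can be consumed only once.

```python
import sys


def squares_list(n):
    return [i * i for i in range(n)]   # builds every element up front


def squares_gen(n):
    return (i * i for i in range(n))   # produces elements one at a time


n = 1_000_000
as_list = squares_list(n)
as_gen = squares_gen(n)

print(sys.getsizeof(as_list))  # millions of bytes: one reference per element
print(sys.getsizeof(as_gen))   # small, and the same for any n
print(sum(as_gen) == sum(as_list))  # True, but as_gen is now exhausted
print(sum(as_gen))                  # 0: a generator cannot be replayed
```

If you need to iterate more than once, or need random access or `len()`, a list is the right tool; for a single streaming pass over large data, a generator avoids holding everything in memory.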
32
3 comments
Avinash S.
When a pipeline breaks at 2 AM, nobody cares whether it was built in Spark, Glue, or Airflow. What matters is:
✅ Did you model the data in a way that's easy to recover?
✅ Did you add logs, retries, and alerts?
✅ Can you explain the fix in a way your stakeholders understand?
Good data engineers aren't just tool experts; they're problem solvers who think ahead, build for reliability, and speak the language of the business.
And if you're preparing for interviews, here are some common data modeling questions you should be ready for:
1️⃣ Explain the difference between 3NF and star schema. When would you use each?
2️⃣ How do you design a schema for a slowly changing dimension?
3️⃣ What are the pros and cons of denormalization in analytics systems?
4️⃣ How would you design tables for tracking historical changes in customer data?
5️⃣ Given a business case (e.g., e-commerce orders), how would you design the fact and dimension tables?
💡 I have compiled a list of 25 #DataModeling #Interview #Questions with solutions that every data engineer should know. To get it, comment your email ID.
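For questions 2 and 4, one common answer is a Type 2 slowly changing dimension: instead of overwriting a changed attribute, you expire the current row and append a new version with validity dates and a current-row flag. Below is a minimal in-memory Python sketch, assuming a hypothetical `CustomerVersion` table that tracks a single attribute (city). In a warehouse or lakehouse, the same logic is typically written as a MERGE statement.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class CustomerVersion:
    customer_id: int
    city: str
    valid_from: date
    valid_to: Optional[date]  # None means "still valid"
    is_current: bool


def apply_scd2(dimension, incoming, load_date):
    """Return a new dimension with SCD Type 2 history applied.

    incoming: dict of customer_id -> latest city from the source system.
    """
    result = list(dimension)
    current = {row.customer_id: i for i, row in enumerate(result) if row.is_current}
    for customer_id, city in incoming.items():
        idx = current.get(customer_id)
        if idx is not None and result[idx].city == city:
            continue  # unchanged: keep the current version as-is
        if idx is not None:
            # Changed: expire the old version instead of overwriting it.
            result[idx] = replace(result[idx], valid_to=load_date, is_current=False)
        # New or changed customer: append a fresh current version.
        result.append(CustomerVersion(customer_id, city, load_date, None, True))
    return result


dim = [CustomerVersion(1, "Amsterdam", date(2023, 1, 1), None, True)]
dim = apply_scd2(dim, {1: "Rotterdam", 2: "Utrecht"}, date(2024, 1, 1))
for row in dim:
    print(row)
```

Customer 1 now has two rows: the expired Amsterdam version and the current Rotterdam version. That is what makes "where did this customer live on date X?" answerable with a simple `valid_from`/`valid_to` range filter.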
20
6 comments
Others named Danny van den Broek in the Netherlands
Danny van den Broek
Co-owner at Meubelfabriek Henk van den Broek
Schijndel
Danny van den Broek
Amsterdam Area
Danny Van den Broek
Netherlands
Danny van den Broek
Netherlands
20 others named Danny van den Broek in the Netherlands are on LinkedIn