
Data Engineering in Microsoft Fabric


Welcome, Fabricators! 🚀

This post is all about the Data Engineering experience in Microsoft Fabric. I walk through this with Simon as part of the Advancing Fabric series on the Advancing Analytics YouTube channel, but here’s a written synopsis for those who prefer to read their content.

What is the Data Engineering experience?

The Data Engineering experience in Fabric centres around the Lakehouse as its main artifact. This is essentially a storage container in Fabric, but creating one also provisions a dataset artifact and a SQL endpoint.

Creating Your Lakehouse

At the heart of data engineering in Fabric lies the concept of a data lakehouse. For all intents and purposes, the Lakehouse artifact is the foundation of our data lakehouse architecture.

There’s no set medallion architecture structure or framework for ingesting and processing data, but we have the storage container, which we can access from notebooks, and a SQL endpoint, so we’re most of the way there.

  1. Getting Started: Create a Lakehouse
    • Navigate to Your Workspace: If you’re already familiar with Fabric or Power BI, you know how to find your workspace. If not, you can get to your own or create a new one from the side pane.
    • Create a Lakehouse artifact: You can then simply create a new lakehouse artifact from the menu. Name it something sensible. REMEMBER - Fabric doesn’t do the medallion architecture for you - I talk more about Fabric architectures in another post.
    • Lakehouse Explorer: Once created, you should see three artifacts: your lakehouse, a lakehouse dataset, and a SQL endpoint. Open the lakehouse artifact and you can see the Tables and Files folders. You’re in the Lakehouse Explorer.
      • Tables are where you’ll see your Delta entities
      • Files are where you’ll see all other data formats, presented simply as files rather than data tables.
  2. Writing Spark Notebooks Now, let’s roll up our sleeves and create a Spark notebook:
    • Start a New Notebook: Click the “New” button and select “Notebook.” Give it a snazzy name – maybe “Sparky Adventures.”
    • Starter Pools: Fabric provides starter pools – pre-configured Spark clusters for your notebooks. Choose one of those, or create your own custom pool from the workspace settings menu.
    • Writing Spark Code: Inside your notebook, you can write in Python (PySpark), Scala, Spark SQL, or R. The choice is yours.
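To make the steps above concrete, here’s a minimal sketch of a first notebook cell. The file path, table name, and helper names are illustrative rather than from the post; the one thing you can rely on is that Fabric notebooks expose a pre-configured `spark` session, so there’s no session-builder boilerplate.

```python
# Minimal sketch of a first notebook cell, assuming a CSV has been
# uploaded to the lakehouse Files area. Fabric notebooks provide a
# ready-made `spark` session, so no SparkSession.builder code is needed.

def files_path(name: str) -> str:
    # Relative paths in a notebook attached to a lakehouse resolve
    # against that lakehouse, so "Files/..." works directly.
    return f"Files/{name}"

def load_csv_to_delta(name: str, table: str) -> None:
    # Hypothetical helper: read a raw CSV from Files and register it
    # as a Delta table so it shows up under Tables in the explorer.
    df = spark.read.option("header", "true").csv(files_path(name))
    df.write.format("delta").mode("overwrite").saveAsTable(table)

# Example usage inside a Fabric notebook:
# load_csv_to_delta("sales/raw_sales.csv", "sales")
```

Writing to a table with `saveAsTable` is what moves data from the Files side of the explorer to the Tables side as a Delta entity.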

Why Data Engineering

It’s important to understand why you would choose the Data Engineering experience in Fabric. Users coming from Power BI or a data warehousing world might wonder why they would use Spark at all, while Spark users coming from tools such as Databricks may be curious how Spark fits within Fabric. Spark gives us the ability to parallelise our ETL processes and easily parameterise processing notebooks, meaning we can load multiple tables using a single script with a loop.
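The “one notebook, many tables” pattern mentioned above can be sketched roughly as follows. The table list and landing-zone layout are invented for illustration, and `spark` is again the session a Fabric notebook supplies:

```python
# Sketch of loading multiple tables from a single parameterised
# notebook, instead of one pipeline + SQL script per table.
# Table names and folder layout here are illustrative assumptions.

tables = ["customers", "orders", "products"]

def source_path(table: str) -> str:
    # Hypothetical landing-zone layout under the lakehouse Files area.
    return f"Files/landing/{table}"

def ingest(table: str) -> None:
    # Read the raw files for one table and overwrite its Delta table.
    df = spark.read.parquet(source_path(table))
    df.write.format("delta").mode("overwrite").saveAsTable(table)

# In the notebook, one loop handles every table:
# for t in tables:
#     ingest(t)
```

Adding a new table then becomes a one-line change to the list rather than a new pipeline, which is exactly the automation benefit described below.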

Rather than creating a new pipeline and SQL script each time a new table is added, users can automate the process and easily make corrections. The web UI allows users to see each other’s work in real-time and collaborate on notebooks. In addition, users have the option to work locally using Visual Studio Code.

The data engineering experience in Fabric is designed for users who are already familiar with Spark and want to easily access and manipulate data in their Fabric workspace. It provides a notebook development environment for writing and running Spark queries, with the ability to bring in external data. Users may need to get accustomed to the fact that files not yet registered as tables won’t display their associated metadata when browsed.
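If that missing metadata matters for your data, the fix is to register the file as a Delta table. A small sketch, assuming a parquet folder already sits under Files (the path, table name, and helper are hypothetical):

```python
# Sketch: promote data that only exists under Files into a Delta
# table, so the Lakehouse Explorer can display its schema/metadata.
# Path and table name are illustrative; `spark` is Fabric's session.

def table_location(table: str) -> str:
    # Registered tables appear under the lakehouse Tables area.
    return f"Tables/{table}"

def register_as_table(path: str, table: str) -> None:
    # Reading infers the schema from the parquet files; writing with
    # saveAsTable records it so the explorer can show it.
    df = spark.read.parquet(path)
    df.write.format("delta").mode("overwrite").saveAsTable(table)

# Example usage inside a Fabric notebook:
# register_as_table("Files/raw/events", "events")
```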

The main use cases of the data engineering experience include ETL automation and early stage data processing in the lakehouse architecture. The data engineering experience in Fabric simplifies these tasks through the use of Spark, making them repeatable and efficient.

