
An Overview of Microsoft Fabric Going Into 2024

What can Microsoft Fabric Bring to the Table in 2024?

Photo by Ricardo Loaiza on Unsplash

Overview

  1. Introduction
  2. What is Microsoft Fabric?
  3. The Major Components of Microsoft Fabric
  4. 3 Upsides to Using Microsoft Fabric
  5. 3 Downsides to Using Microsoft Fabric
  6. Should you Change?
  7. Wrapping Up

Introduction

Microsoft Fabric is advertised by Microsoft as an all-encompassing platform for data analytics, data engineering, and AI. It was introduced in preview in the spring of 2023 and was made generally available for purchase in November 2023. The platform builds on features from existing services like Azure Synapse Analytics, Azure Data Factory, Azure Data Lake Storage Gen2, Microsoft Purview, and Power BI.

In this blog post, I want to give you a high-level overview of the Microsoft Fabric platform going into 2024. Specifically, I want to give you answers to the following questions:

  • What parts of the data lifecycle does Microsoft Fabric cover?
  • What is each component in Microsoft Fabric trying to achieve?
  • What are the upsides and downsides to using Microsoft Fabric?
  • Should you consider migrating to Microsoft Fabric?

My experience with Microsoft Fabric is based on a four-day in-depth course from Microsoft that I attended, as well as experimentation with Microsoft Fabric over the last couple of months. I also have broad experience with the tools that Microsoft Fabric takes inspiration from. I am, however, not affiliated with Microsoft and gain nothing monetarily from either overselling or underselling Microsoft Fabric, so I can give an unbiased overview of the platform.

As you will be able to tell from the rest of the blog post, I think that Microsoft Fabric offers some genuinely useful features for a unified data platform. However, like everything else, the choice of migrating to Microsoft Fabric will depend on a lot of factors.


What is Microsoft Fabric?

Let’s try to get an overview of Microsoft Fabric and what it is trying to achieve.

Microsoft Fabric is an all-encompassing data & analytics platform that handles data from the collection stage to the analytics stage. This includes data storage, data pipelines, data alerts, data lineage, data governance, AI features, Power BI integration, and more. The platform is built on previous Microsoft services and collects many existing features into a single package.

According to Microsoft, there are four focus areas of Microsoft Fabric that shape its goals and what it is trying to achieve:

Image from Microsoft’s publicly available learning material

Complete Analytics Platform

The Microsoft Fabric platform is a full-fledged ecosystem that aims to give you a complete package of what you need in a data & analytics platform.

Most data platforms, such as Databricks and Azure Synapse Analytics, are PaaS (Platform as a Service) offerings, where the supplier handles things like operating systems, maintenance, and distribution of workloads, while you control the code and data. Microsoft Fabric profiles itself as a SaaS (Software as a Service) platform, where the supplier takes a larger role in the code and configuration. This is achieved through a bigger focus on low-code tools like Data Factory, Data Activator, and Power BI, as we shall see.

I think that calling Microsoft Fabric a SaaS platform is somewhat of a stretch. For large-scale projects there is still a real need to write code, whether that is Spark, SQL, or Python. Nevertheless, the Microsoft Fabric platform is genuinely a step toward SaaS data platforms in that it relies more heavily on low-code/no-code tools.

Microsoft Fabric also emphasizes governance and security through features previously available in the Microsoft Purview service. This includes securing data ownership by grouping data into domains and workspaces. It also includes visibility through data catalogs and lineage, making the solution scalable without losing track of which data is available and to whom.

Lake Centric and Open

Microsoft Fabric uses OneLake, which promises to simplify data storage and maintenance and reduce data copying.

OneLake is a single, unified, logical data lake for your whole organization – Microsoft Documentation

OneLake builds on Azure Data Lake Storage Gen2, which most Azure users have experience with. It is designed to be a single, unified place for storing data across the organization, rather than setting up multiple data lakes for different branches and teams. Hence you can have one, and only one, OneLake connected to a Microsoft Fabric tenant. Ownership of data is handled within OneLake through organizational features like workspaces and domains.

OneLake can support any file format, whether structured or unstructured. However, it is a bit partial to the Delta Parquet format, since any warehouses or lakehouses within Fabric store their data in this format by default.

OneLake uses shortcuts, a feature that emulates the shortcuts we are all familiar with from our local machines. Shortcuts are used to share data without data duplication problems. Users with the right privileges can make shortcuts across workspaces and even to external services like S3 storage in AWS and Dataverse storage in the low-code Power Platform.

Empower Every Business User

The Microsoft Fabric user interface is very familiar to Power BI users and is fairly intuitive for most people coming from other data platforms. This allows business users on the analytics side to take a bigger role in managing data storage and data transformations.

Microsoft Fabric also goes to great lengths to integrate with two other platforms that business users love – Power Platform and Microsoft 365. Both of these integrations allow business users to get closer to the data and collaborate more seamlessly with data engineers and data scientists.

AI-Powered

Finally, Microsoft Fabric has integrated AI into the platform in different ways. One aspect of this is incorporating LLMs (large language models) like GPT models and Microsoft Copilot to speed up the development process. These tools are starting to be heavily incorporated into the platform. If this incorporation is successful, it will be a major selling point for Microsoft Fabric.

AI has also arrived in Microsoft Fabric in another way. You can now train machine learning models and do everything from running experiments to saving and deploying models. This seems to be built on experiences Microsoft has drawn from the Azure Machine Learning service, where all of this has been possible for some time. So while this is not a new feature in the grand scheme of Microsoft Azure, it is new that it is so tightly coupled with data engineering tasks in Microsoft Fabric.

In Azure Synapse Analytics, there were no serious features available for machine learning. Other platforms like Databricks have had ML coupled with data engineering for quite some time. So in this regard, Microsoft is catching up to what is expected of modern data & analytics platforms.


The Major Components of Microsoft Fabric

The following illustration from Microsoft highlights the components that together constitute the Microsoft Fabric platform. Let’s go through each of them briefly.

Image from Microsoft’s publicly available learning material

OneLake

I’ve already talked a bit about OneLake but would like to add some meat to the bone. As the illustration below shows, OneLake acts as a common foundation for the other components.

Image from Microsoft’s publicly available learning material

Workloads from these components automatically store their data in the OneLake workspace folders. The data in OneLake is then indexed for multiple purposes. One of these is data lineage, where you can track which transformations have been applied to a dataset. Another is PII (personally identifiable information) scans, where sensitive information can be highlighted.

I personally think that one of the biggest advantages of OneLake is transparency. When working in platforms like Azure Synapse Analytics, it becomes difficult for everyone to keep track of what data is available and which transformations have already been applied to it. A data analyst might get access to the Azure Data Lake Storage Gen2 account to fetch the finished data for visualizations, but have little knowledge of which transformations were applied to get the data into that form. While there are ways to handle this by involving Microsoft Purview, it is a bit cumbersome. Having transparency by default is a feature that will not make headlines but is crucial for better collaboration.

Data Factory

Data Factory is an existing service in the Azure ecosystem that has been incorporated into the Microsoft Fabric platform. This tool is used to connect to data sources like databases, streaming systems like Kafka, documents in SharePoint, and tons of other sources. Then you can write data pipelines to transform the data in simple steps and automate the pipeline management.

Image from Microsoft’s publicly available learning material

Data Factory also includes Dataflow Gen2, which is a low-code tool for data ingestion and simple transformations. Users of Power BI will find it very familiar, since Dataflow Gen2 looks a lot like the Power Query Editor they are used to. In this way, data analysts and business users can take a bigger role in data ingestion and processing.

Data Factory was already present within the Azure Synapse Analytics platform through the feature called pipelines. Hence the inclusion of Data Factory in Microsoft Fabric is nothing less than expected.

Synapse Data Engineering

Some simple transformations can be done with the low-code tools in Data Factory. For more complicated processing, you can use Synapse Data Engineering to set up Spark jobs and notebooks to wrangle the data in a more customized way.

Image from Microsoft’s publicly available learning material

Synapse Data Engineering also allows you to set up lakehouses, where you can manage both structured and unstructured data in a single location. The data can then be transformed using Spark jobs and notebooks. The lakehouse also comes with a SQL analytics endpoint so that you can write SQL-based queries to fetch the data. Note that the SQL analytics endpoint is designed for read operations only. Most data engineers will spend much of their time in Microsoft Fabric working within Synapse Data Engineering.

Synapse Data Science

A departure from Azure Synapse Analytics is the inclusion of data science, specifically the lifecycle of machine learning model development.

Image from Microsoft’s publicly available learning material

Synapse Data Science includes model hosting, experiments, and deployment of ML models. There is a built-in MLflow experience, so tracking parameters and metrics is simplified. Microsoft Fabric also supports autologging, a feature that simplifies the logging experience.

ML models can be trained with Python/PySpark and R (SparkR/sparklyr). Popular libraries like scikit-learn can easily be incorporated, and the experience of developing models has been made a lot simpler. Other Azure-based AI tools like Azure OpenAI Service and Text Analytics can also easily be used from Microsoft Fabric. This connection is in preview as of now, but will likely include more of the Microsoft AI services in the future.

I think that some more time, testing, and further development is needed before Microsoft Fabric can be called a full-fledged MLOps platform, but the changes they have already made are impressive.

Synapse Data Warehousing

I mentioned earlier that lakehouses in Microsoft Fabric have a SQL analytics endpoint where you can run read-only SQL queries on the data (as well as create views). Microsoft Fabric also has a fully functioning data warehouse solution that supports DDL and DML queries.

Image from Microsoft’s publicly available learning material

Hence with Synapse Data Warehousing you have a fully fledged data warehouse with T-SQL capabilities. Whether to choose the SQL analytics endpoint of a lakehouse or a full data warehouse is a trade-off that needs to be considered in most situations. Microsoft has a lot of documentation that illustrates the trade-off and which features you get from each option.

Synapse Real-Time Analytics

The expectation of real-time, on-demand data is addressed by Synapse Real-Time Analytics. Many systems collect data continuously to display in dashboards or to be used in ML models. Examples include IoT data from sensors or browsing data from customers on a website. The Synapse Real-Time Analytics component of Microsoft Fabric tackles streaming data comprehensively.

Image from Microsoft’s publicly available learning material

It uses KQL (Kusto Query Language) for querying event streams. It is optimized for time-series data and has many features that support automatic partitioning and scaling. The end result can easily be integrated with other components in Microsoft Fabric, such as Synapse Data Engineering or Power BI.

Power BI

I doubt that Power BI needs much of an introduction. It has been one of the de facto visualization and dashboard solutions for the last decade. With Power BI you can create gorgeous, automatically refreshed dashboards that can be distributed to people with the right access and privileges.

Image from Microsoft’s publicly available learning material

The new thing in Microsoft Fabric is that Power BI is so tightly integrated with the rest of the data & analytics platform. Previously, data engineers could work in Azure Synapse Analytics while data analysts worked in Power BI with minimal interaction between them. In Microsoft Fabric, data analysts are encouraged to take a bigger role in data processing, while data engineers are encouraged to think more about how the data will facilitate insights in the visualization stage.

There is also a new connection mode called Direct Lake mode that seems very promising as a middle ground between speed and avoiding data duplication. This is optimal for large datasets with frequent updates. I have not done any benchmarking, but I am cautiously optimistic that this might be valuable in many cases.

Image from Microsoft’s publicly available learning material

Data Activator

The final component of the Microsoft Fabric platform is Data Activator. Data Activator is a no-code tool for taking action whenever a condition or pattern is detected in the data. It sets up reflexes – items that contain the information needed to connect to data, monitor it for certain conditions, and then act through triggers.

Image from Microsoft’s publicly available learning material

No-code rules and triggers send notifications in applications like Microsoft Teams or Outlook to flag interesting changes. It is also possible to use Power Automate in the Power Platform to write custom workflows for how an end user should be alerted.

Alert systems could previously be integrated with Azure Synapse Analytics through Logic Apps or other services. But alert systems quickly become neglected if there is even a hint of effort required to connect them, so having Data Activator as part of Microsoft Fabric is great. In my opinion it offers nothing revolutionary, but it makes the whole Microsoft Fabric platform more holistic.
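Conceptually, a reflex pairs a monitored condition with a trigger action. Data Activator itself is no-code, so the following deliberately simplified Python sketch (all names made up for illustration) only conveys the idea:

```python
# Hypothetical sketch of the reflex concept: a condition on incoming
# events paired with a trigger action.
def make_reflex(condition, action):
    def reflex(event):
        if condition(event):
            action(event)
    return reflex

alerts = []
reflex = make_reflex(
    condition=lambda e: e["temperature"] > 30,                         # monitor
    action=lambda e: alerts.append(f"High temp: {e['temperature']}"),  # trigger
)

for event in [{"temperature": 21}, {"temperature": 35}]:
    reflex(event)

print(alerts)  # ['High temp: 35']
```

In Data Activator, the "action" would typically be a Teams or Outlook notification, or a Power Automate workflow.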


3 Upsides to Using Microsoft Fabric

Now that we have described the components that go into Microsoft Fabric, I want to discuss some upsides to Microsoft Fabric. These upsides are based on my subjective interests.

SaaS and Learning Curve

With Microsoft Fabric being a step toward a SaaS solution, upskilling should be faster and available to a larger group of people than with PaaS solutions. Specifically, I think the hope is that data analysts can take on more tasks that traditionally belonged to data engineers. This includes monitoring data, setting up data pipelines, and writing code for data transformations. My initial experiments with Data Activator and OneLake confirm that it is indeed very quick to get started with. The components also have a friendly interface that does not look intimidating at first. I think this will encourage data analysts to try out and experiment with tasks that were previously left to data engineers.

Much is also being done in terms of learning material. I attended a free four-day digital course from Microsoft that aimed to teach the basics of the Fabric platform. A new Microsoft certification called Microsoft Certified: Fabric Analytics Engineer Associate will also launch in 2024. It seems like Microsoft is really committing to Microsoft Fabric and is willing to produce a lot of learning material for this solution.

Closing the Gap Between Data Analysts and Data Engineers

I have briefly touched on this previously but want to focus on it a bit more. In the data sphere, there are many roles, such as data scientist, data engineer, data analyst, ML engineer, and data architect. While there are some clear differences between these roles, the data field has become unnecessarily fragmented in terms of roles. The fragmentation is not based on different ideologies or anything lofty like that, but simply on a separation of tools. Microsoft Fabric makes this more cohesive and less fragmented in the way it is structured.

A data engineer should think about how the end result of the data transformations will be used in visualizations. Similarly, a data analyst should care about which data transformations take place before the data is ready for visualization. What often happens in real life is that data engineers and data analysts are segregated into different tools and have a minimal interface between them. This interface is requirement-based and does not really facilitate collaboration or sharing of insights. The result can be much back-and-forth and silos that work independently.

With Microsoft Fabric, data analysts and data engineers are encouraged to work closely with each other. They can more easily view each other’s work and contribute outside their own specialty. This does not necessarily mean that data engineers will design Power BI dashboards or that data analysts will write Spark code. But the intersection between a data analyst and a data engineer will be bigger, and more collaboration is possible. I think that this is a major highlight of a data & analytics platform.

A Platform Where AI is not an Afterthought

In many data & analytics platforms, AI and machine learning are more of an afterthought than anything else. They offer some hosting features for ML models but are really data platforms first and foremost, with some AI features as icing on the cake.

Microsoft Fabric takes a different approach and places AI front and center. Not only are the ML model lifecycle features relatively competitive, but native integration with LLMs like Microsoft Copilot and GPT models is being built into the platform carefully. Since Microsoft is a major player in generative AI, it is useful to have access to new developments and improvements as quickly as possible.

It seems like Microsoft Fabric is also gradually building more connections to other Azure AI Services (previously called Azure Cognitive Services). These services can of course be used through their respective endpoints as separate services, but Microsoft Fabric is trying to make the connection as smooth as possible. I think that within half a year, most of the Azure AI Services will be easily accessible from Microsoft Fabric. Having advanced document intelligence for parsing PDF documents or advanced text-to-speech easily available with the click of a button is something most other data platforms will struggle to compete with.


3 Downsides to Using Microsoft Fabric

To counter the upsides to Microsoft Fabric I gave previously, here are some possible downsides to using Microsoft Fabric. Again, these are based on my own concerns.

Uniformization vs Lock-in

Microsoft Fabric certainly unifies a lot of existing Azure solutions into a single package with its own billing and the unified OneLake. However, this also means that you are encouraged to use Microsoft Fabric as a full-fledged solution, rather than as a single piece in a microservice architecture. This locks the data platform much more tightly into the Microsoft ecosystem. Whether this is good or bad depends on which other tools you are using and what your ambitions are for the platform going forward. For many, this might be a downside they are not willing to compromise on.

The Double-Edged Sword of Low-Code

The advantage of low-code is that it allows more people to engage. Data analysts and business users can take on a bigger set of tasks with Microsoft Fabric. But low-code is a double-edged sword in this regard. The simplicity of low-code typically also means fewer possibilities for customization. The more GUI-based a tool is, the fewer options are available for fine-tuning.

As a concrete example, Data Factory is a low-code tool that can extract data from, e.g., transactional databases. But the functionality Data Factory offers is less than what you could achieve by writing SQL queries to fetch the data. This is natural, as SQL is a full declarative language, while Data Factory has a set of presets and options to configure. Hence Data Factory will do the trick nine times out of ten, but sometimes writing it out in code gives you more possibilities.
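To make this concrete, here is a small hypothetical illustration using SQLite from Python: a running total per customer is one window function in SQL, but it is exactly the kind of query a GUI preset tends not to expose:

```python
import sqlite3

# In-memory toy database standing in for a transactional source.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount REAL);
    INSERT INTO orders VALUES ('a', 10), ('a', 30), ('b', 20);
""")

# A running total per customer: trivial in SQL, but typically beyond
# what a low-code preset offers.
rows = conn.execute("""
    SELECT customer, amount,
           SUM(amount) OVER (
               PARTITION BY customer ORDER BY rowid
           ) AS running_total
    FROM orders
    ORDER BY customer, rowid
""").fetchall()

print(rows)  # [('a', 10.0, 10.0), ('a', 30.0, 40.0), ('b', 20.0, 20.0)]
```

The point is not this particular query but the general pattern: a declarative language lets you compose arbitrary logic, while a preset-driven tool caps out at whatever its designers anticipated.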

The fact that Microsoft Fabric is going down the low-code road might not make everyone ecstatic. I am quite happy with the balance they have struck between low-code and code-based tools. Nevertheless, a few more steps in the low-code direction would make the platform more difficult to manage for me and many others with coding backgrounds. This is a future development worth watching closely.

New Technology – Less Competence

This one is true for any new technology. There are few people out there who are very comfortable with Microsoft Fabric yet. If you are trying to build an internal team, then requiring Fabric experience of new hires is probably too much to expect. You need to do internal upskilling and spend some time finding out which patterns work in Microsoft Fabric and which don't. The upside is that Fabric draws so much from services like Synapse Analytics, Data Factory, and Power BI that a background in these should be enough to get started with Microsoft Fabric quickly.


Should you Change?

Changing your existing solution to Microsoft Fabric is a complicated decision. Basing such an important decision on a single blog post would be foolish. Yet here are two clear-cut cases:

  • Are you using Synapse Analytics as a data platform? Are you using other tools such as Microsoft Purview and Power BI? Do you find it cumbersome to connect the services together and difficult to keep track of them? In this case, migration is promising. Making the change could make your data platform more manageable. Start experimenting with Microsoft Fabric! Try to duplicate some of the data pipelines you have in your existing data platform. If this brings good results, then you have a serious contender for a new data platform.
  • Are you using tools where several of them are outside the Microsoft platform? Perhaps you are not using Power BI but a different dashboard solution like Grafana? Do you have a focus on code-based tools and open-source tooling? Don’t switch to Microsoft Fabric in this case. You should still keep Microsoft Fabric on your radar. But your princess is unfortunately in another castle.

Outside of clear-cut cases like these, you have to experiment with Microsoft Fabric. Only then will you be able to grasp whether it fits your needs. Changing your data & analytics platform is a heavy decision. Technical competence and strong business understanding are both needed to succeed.


Wrapping Up

Photo by Spencer Bergen on Unsplash

I hope this gave you an honest overview of Microsoft Fabric and what it can offer. If you are interested in AI, data science, or data engineering, feel free to follow me or connect on LinkedIn. What is your experience with Microsoft Fabric? I'd love to hear what you have to say 😃

Like my writing? Check out some of my other posts for more content:

