Simplifying D365 Analytics with Export to Data Lake
Existing solutions to get data out of Dynamics 365 (D365) for analytics or business intelligence involve third-party tools or the data export service connected to a potentially costly Azure SQL database. You’re then encouraged to carefully select the exported data to minimise storage and compute costs. You may then move the data again to a data warehouse, where you perform transformations to make it usable and consumable for self-service BI. None of that empowers deeper analytics and machine learning using tools like Databricks for example.
The alternative, data engineering approach, would be to set up a custom data pipeline to pull data from the common data service (CDS) into an Azure Data Lake; a cost-effective, scalable storage platform designed for analytics performance, using an orchestration tool like Azure Data Factory. This requires data engineering skills to set up and maintain while only attracting a (minimal) cost for the data factory and data lake, in comparison to an Azure SQL database.
With Dynamics 365 sitting on top of Microsoft’s Common Data Service (CDS), data is stored and organised in a secure and standardised format, the Common Data Model (CDM). The CDS goes beyond D365 and underpins the entire Power Platform providing a predictable model of data across many systems and tools.
A feature just hitting GA this week simplifies your access to CDS data, mirroring Microsoft’s vision for the modern data platform by bringing the data out to an Azure Data Lake, as a central store for raw data. This is the key to getting the data where you need it for data warehousing, machine learning, reporting and any other integration purposes.
The Export to Data Lake service is a pipeline to export CDS data to Azure Data Lake Gen 2, continuously after an initial load and in regular snapshots. This works for both standard and custom entities and replicates all operations (create, update and delete).
The Export to Data Lake service is enabled through the Power Platform admin portal. It requires minimal configuration with just the connection to an existing data lake and selection of required entities, making it very simple to set up.
Some caveats exist that need to be in place beforehand:
- The CDS and Azure Data Lake must be in the same region.
- The user setting up the service must have Owner rights on the data lake resource, to allow delegation of permissions.
- Any entities you want to use with the service must have change tracking enabled.
You can view the sync status and record count of each entity within the Power Platform portal. Depending on the size of your environment, it may take time to complete the initial sync.
If you connect to your data lake using Azure Storage Explorer you can see the hierarchy set up for the CDS entities. Data is partitioned by year and a snapshot subfolder containing daily snapshots of that entity (if changes take place).
Within the main folder there’s also a JSON model file with all data model information and current sync status across each entity. This provides valuable metadata for data egress and is a good source for monitoring.