Fabric Frenzy #3
There is a wealth of new opportunities for data analysis and insight in Microsoft Fabric, and new features are constantly being added, which makes it quite a challenge to stay updated and organized.
We want you to be fully updated with the coolest options in Microsoft Fabric. That's why once a month we give you an update on the new features and tips and tricks on how to take full advantage of Microsoft Fabric.
On the last Wednesday of each month, we will make sure to:
- Give you an overview of the most important new features in Microsoft Fabric
- Share tips and tricks to make better use of new as well as old features
- Give you concrete examples of cool business applications
Hey everyone, and welcome to Fabric Frenzy, your monthly source of news, spotlights, deep dives and more around Microsoft Fabric and Power BI. I've been looking forward so much to presenting for you today, because we have some interesting topics on the agenda. As usual we have a spotlight, and this week that spotlight will be on the new Direct Lake, a new storage mode for Power BI that is really neat and works together with Microsoft Fabric.

For anyone here who has missed the news of what Microsoft Fabric is: Fabric is the new Microsoft data platform, embracing all the different workloads we already know and love from Azure Synapse Analytics, from Azure Data Factory and from Power BI, and it also includes a completely new workload, the new Data Activator. So it's a common experience for working with everything we do around data. Microsoft Fabric is part of the platform and the UI we are already used to from Power BI, and it's just an easy way to get started with your entire data stack, end to end.

So, what's the news? It is September, we have some new releases, and we recently had the Power BI Next Step conference here in Copenhagen. During that conference we had a visit from Microsoft, from Mo, who is a principal program manager there and who announced and revealed some very exciting new things that will come in the near future. Before we talk about those, we also got some news that is already here. We got SQL project support for Fabric warehouses. We got the new workspace folders, which are among the features still to come. Then we have calculation groups for Desktop, upgrades to Direct Lake, and finally a North Star, a direction for visualizations. So a lot of interesting things: features that you can go in and play with today, but also features that we still have to wait a bit longer to use.

Let's start with SQL projects for the Fabric warehouse. With SQL projects you'll be able to do more source control integration,
so you can treat your database code as real code that can be stored in version control, for example Git. You can use SQL database projects to keep track of changes, do code reviews and maintain a version history of that code, and finally run testing and schema validation and deploy using SQL projects. So it's really a way to make your whole DevOps and CI/CD around your databases a bit more robust and a bit more version controllable. A very nice addition all in all. It's available to use already today, and it works with SSDT and with Azure Data Studio, so I definitely recommend checking that out.

Then we got one of the reveals at Next Step, something people have been asking for for a long time: workspace folders. It's not here yet, unfortunately. I would love that; I think we've been asking for this in the community for five, six, maybe even more years, and it seems like it's getting very close to finally being here. For me this is a feature that I would be able to use immediately. I would go today and use it to organize my Power BI workspaces or my Microsoft Fabric workspaces. When we have the feature, it seems like it's going to be part of clicking that New button, which will allow us to create new folders. Officially they seem to be called subfolders, and it is confirmed that they support nesting, so we can create folders within folders and build an actual folder structure. It seems like a small thing, and it seems almost too much to celebrate such basic functionality, but not having had it for so many years, it really excites me that we can finally start bringing true organization to our workspaces. I hope someone out there is as excited as I am about this, because it will be huge.

Then we have calculation groups. Calculation groups is a feature that has actually been in Power BI for quite some time. It allows us to streamline our measures, and
it makes our reporting more flexible. The new thing is that calculation groups will be in Desktop, but more on that in a moment; let's first make a quick introduction to calculation groups for anyone who doesn't know about them yet.

Let's imagine that we have a bunch of measures in our dataset. In this example it would be something like sales amount, sales cost, sales discount and margin. Once we have all those metrics, we are probably going to want to do some sort of time intelligence analysis on them. We want to look at sales amount, but we also want to look at sales amount last year and compare the two. Maybe we also want to see sales amount year to date and watch how it progresses over time, and there could be even more. If we wanted to do this today, as you can see on the slide, we would have to create not only the four measures but also four new measures for each of these time intelligence calculations, summing up to 16 different measures already. Now imagine it's not four metrics but 10 or 20, and it's not just three time intelligence dimensions but maybe 10 as well. We would very quickly have to maintain and develop 200 or more measures, which becomes too much.

Calculation groups help with that. Imagine if we could just create our measures, our metrics, as what are called calculation items, a flexible way where we can pick and choose between them, and we could do the same for our time intelligence and set it up so that they can actually mix and match. This is exactly what we can do with calculation groups. They allow us to create this kind of template where, with only two objects, two calculation groups, we can mix and match between all the elements within them. So it becomes much simpler: in this case we only have to care about seven different objects. Once we do our reporting, we just pick which kind of time intelligence analysis we want to do, we pick which kind of metric we want to analyze, and combining those two we get
the exact number we want. So no more having to maintain 200 different measures. Really cool. But this is not new. We've actually had this for a long time; we've just had to rely on third-party tools to go into the data model and configure these calculation groups. So we've been able to do this for a long time, it's just been, well, complex. The new thing is that it has been revealed that there will be calculation groups in the Power BI Desktop UI in the near future. We don't know much more yet, but as the image here from Next Step shows, it is part of what is being built and definitely on the roadmap. So we're starting to see some cool things here. Calculation groups are very neat, and if you didn't know about them yet, they are definitely worth using in your datasets and reporting; they really streamline things.

Then we also have some news on the data visualization front. Back in the day, if we wanted to create some sort of card-style visual like this, we would have to use so many elements. We would have to use a bunch of shapes for the square and the square to the side, as well as the icons and the actual measures, and it just became a lot of elements. With the new card visual we can do all this with only one visualization. So it's going to make things simpler, but it's also going to speed things up reporting-wise; it's not going to put the same drain on your report rendering as before. This new card visual is already available now, it's still being developed, and more features will be added to it.

Then we also got the news that there will be a new button slicer released in October, so I'm looking very much forward to that. It seems to give us the option to make more clickable, icon- and button-based slicers, similar to what we have with third-party visualizations like Chiclet Slicer, but again simpler and easier to work with.

And finally we were shown the
North Star of the data visualizations in Power BI, that is, some of the capabilities we're looking at getting in the future. There's a lot to unwrap here and I'm not going to go into detail with all of it, but we can already see the power of the new button slicers at the top: very clickable, sleek, great-looking buttons. We also have card visuals, as you can see on the left, with trend lines and many other features, and there's a lot to dig into here. You'll also see that the line chart is looking to get an upgrade, with more options around formatting and, in general, the analytical capabilities of the visualizations. So we're looking at a future with many more capabilities to do much more with only native visualizations, which I personally look forward to. Third-party visualizations can be great, but they can also have a lot of limitations and add a bunch of extra work in having to maintain and govern which external visualizations we are allowed to use, by whom and how.

So that was some of the news we had, but we also have the spotlight, which comes with its own set of news this time. We're putting the spotlight on Direct Lake, the new connection between OneLake and Power BI. Direct Lake received two major pieces of news this month. The first one is that Direct Lake now supports calculation groups. That's nice, except that we don't have the option to use calculation groups in Desktop yet, so we cannot really use that feature yet, which was a bit of an "okay, well". The second thing we got, and this is where things tie together, was XMLA endpoint support. We can now edit our Direct Lake datasets with third-party tools like Tabular Editor, and in that way we can also start adding calculation groups to the dataset. So we are actually able to do this, but it does come with a bunch of consequences, which I will talk a little more about in a few minutes.

So let's talk
about what Direct Lake even is. In Power BI we have different storage modes. We have import mode, which is probably what most people are familiar with. It's the mode where you refresh your data daily or hourly, and the data is copied from a source and stored inside your Power BI dataset; it could also be stored in a Power BI dataflow. We also have DirectQuery, where you establish a live connection to the data source and send queries that load the data in real time. So if you connect to a SQL database, every time you update a visual it will send a new query to that database and get a response. That is much more live, but slower in the reporting, and it puts a larger strain on the source system. Then we have dual, which is a hybrid between import mode and DirectQuery, allowing us to use one storage mode for some tables and combine it with the other for some queries. And then composite models, which enable us to use multiple storage modes in one model, except that the new Direct Lake mode doesn't really work with composite models.

And finally we have this new Direct Lake mode, which is a live connection to OneLake. It works like a real-time connection, like DirectQuery, but it also has the performance of import mode, so we get the best of both worlds with Direct Lake. Strictly speaking we can use Direct Lake in our composite models, but it is going to fall back to DirectQuery anyway. So be aware that if you ever use Direct Lake in any sense other than Direct Lake itself, it's just going to behave like a DirectQuery or, sorry, dual storage mode.

So we have these three storage modes that all work on the Microsoft Fabric platform. We have import mode, which can import the data from your lakehouse or your warehouse through the SQL endpoint. Technically you can also connect to the files you have in your data lake or in your OneLake, but normally you would want to
connect through the SQL endpoint. Then we also have DirectQuery, which is just going to query that SQL endpoint, and again it can be connected to both the warehouse and the lakehouse. And finally we have Direct Lake, the best of both worlds, which can connect directly to those Delta tables, but currently it only works with the lakehouse object in Power BI.

How it works is that whenever you send a query to your dataset (you have a report, you update some numbers, you change the visualization, or you just want to refresh the elements on the page), it's going to send queries to the data model. This is what it already does today with import mode. What happens with the Direct Lake connection is that it is then going to send a request to OneLake itself and fetch the data from the lakehouse. Since the lakehouse already has this standardized Delta format with Parquet files, it can very quickly cache those files and pull them into the memory engine of Power BI, or of Direct Lake, and then you'll have that data very quickly available in a cached state where your report can use it.

But that also means that the first time you run a query it may take a bit of time, because it has to send the request to the lakehouse, pull in the data and store it in the memory cache, and only then is it ready and optimized for quick reads. Until then you're going to have a delay that is somewhere between import mode and DirectQuery. It's not going to be very long and sluggish, but it is going to take time. So just be aware that Direct Lake at its best is as fast as import mode, but at times it can be slow for the user. In many ways it is the best of both worlds, but there are of course also trade-offs and downsides.

So let me show you a quick demo of how performant Direct Lake really is. I have a dataset here, and for this dataset I've cleared the cache, so it should be a clean slate. I would say a live demo
sometimes doesn't behave that way, but let's try to test it out. Let's try to change one of the slicers here and see how much time it takes to update the page. Something like this, and it starts updating. It takes a few seconds. This is not import-level performance; this is actually closer to what we would see with DirectQuery. After 10 or 11 seconds the page is updated. Not horrible, but not a nice experience for the end user.

But let me try again, because what happened just now is that we sent this first query, and it went in, pulled the data from the lakehouse and put it into the memory cache of the Direct Lake dataset itself. So it should be much faster the second time around. Let's try. I'm just going to change the slicer again to a different value, and it finished in half a second. So from having to spend 10 or 11 seconds on the first go, it is now optimized right into memory and ready to be used very fast.

The question is then: what happens if I have to pull in a new column that I have not used yet? Well, let's try. We can pull in something like location. The ID is already here, so we'll just try to pull in this one. It's only one column, and it took a little bit of time, probably because it wasn't in memory. But if I update this again, it will probably take less, and you see here that the column was now also loaded into memory.

So with import mode we load all our data into our dataset at one time, but with Direct Lake it's done on an ad hoc basis. Whenever you run a query, it's going to hit the columns you're using for that query and store them in memory, so they're ready to be fast the next time you use them. That also means that the more a column is used in your dataset, and that's not by a single user but across your whole user base, the more frequently it will be in memory and ready to be used again. So it's very flexible, very adaptive
and very much the best of both worlds. But let's say we want to test what is actually in memory now and what is not. For that we can use DAX Studio and actually dig into the model and see a little bit of what is happening behind the scenes. Great, next thing... and of course I was logged out in the meantime. Well, we will not do that today, but anyway, with DAX Studio you should be able to query the DMVs, the system metadata tables, and that will show you how much data you have pulled into the memory of the data model, as well as a so-called temperature. There's a built-in feature in the new Direct Lake datasets called temperature, which helps determine which columns are used more frequently than others. Based on that, columns are either kept in memory or evicted if they're battling for the capacity you have available.

So that was Direct Lake. As I said, we got XMLA endpoint functionality for Direct Lake, so we can now go and create calculation groups there, and we're starting to be able to do most of the things we can do in a standard dataset. So that's cool, but be aware that if you ever use third-party tools to make any edits to your Direct Lake dataset, you will no longer be able to edit it in web modeling. So if you have business users who also need to be able to make edits to your dataset, or you yourself want to be able to make those edits in the Power BI service, don't run it through an external tool with the XMLA endpoint, because it is going to disable that option and you will not be able to go back. It is kind of a point of no return: if you use the XMLA endpoint, you're forced to use it for all your changes going forward. So be aware of that.

One last thing with Direct Lake is that it also supports live updates. Whenever you have new data, you can
ask Direct Lake to keep that data automatically updated in your dataset. If you look here in the service, under the settings for a given dataset, you have this option under Refresh called "Keep your Direct Lake data up to date". It allows you to set things up so that whenever changes are made to the data in OneLake, those changes are automatically pushed to the Direct Lake tables in your dataset. This is really cool, because it means that if I load in new data, and I do this every two minutes, five minutes, something like that, that data is going to be immediately ready to be used in my datasets. I don't have to think about refreshes, I don't have to think about refreshing every hour or every day; it's just going to be live updated.

There's no such thing as a free lunch here either, though, so this also comes with some consequences. If you enable this option, it is going to refresh your dataset every time new data is entered. That's nice, because that is what we want. But another thing that happens is that every time our dataset is refreshed, it's also going to flush, which means delete, the cached data we have inside the dataset. So all that effort we just made, sending some queries, using multiple columns, getting the dataset to load that data and cache it, well, all that effort is then reset, and we start from scratch with a dataset that has nothing loaded in memory, which is going to be slow on the first go.

I can show you an example. If we go to this specific dataset and click the refresh button, because we can also do this manually, it's going to refresh, and since it's not in import mode, the refresh is only about updating the metadata and, well, deleting this cache. So now that I've refreshed, we can go back to the same report and try to change the slicer again to see what happens. I just changed it, and, well, we're not back to the 10 seconds,
which is probably due to the local report caching some of the visuals. So some of the visuals were cached locally, but for some of them it now took two, maybe three seconds. So it is getting slower again, and that will happen if you refresh. Deciding whether to refresh automatically, or to keep it fast and control when you refresh, is a new design choice we have to think about.

The easy choice is of course just to let it catch any updates. The problem with that is that you can have data that is only partially updated. Maybe you updated a dimension but didn't update your fact table yet, or the other way around, and if that happens you may have some inconsistent data, and you may even have errors in your data. Whereas if you control when the refresh happens, for example by triggering an API to refresh the dataset after loading in all of your data, then you have a bit more control, but the data may also be delayed by a few seconds. Finally, you can also just rely on the old practice of setting up an hourly or half-hourly refresh of your dataset, and it will go and update the data at those intervals. So you get some choices in how you want to tune this.

And finally, you can also set up your own kind of engine or functionality to keep your dataset warm. Whenever you do a refresh, you could use a notebook or an API call or something that hits the data model, queries the most used columns and makes sure they're warm from the beginning, so that no user has to be the first one to use the dataset and experience it being slow. So depending on your report and your dataset, you may want to spend this extra effort to give your users a sleeker experience, or you may choose not to. There's a lot of flexibility, a lot of potential and a lot of speed in this dataset.

What I forgot to mention, or what I didn't
mention, is that this report we've been looking at, this report that actually seems to be running blazingly fast, where I can click around and it updates very close to what I would expect from an import mode report, is actually a report with 900 million rows. So almost a billion data rows, and it's a 36-gigabyte data model. This is definitely not a small model that runs fast because it was the convenient example to bring here. This is actually a very, very large data model, a data model where, if we did use DirectQuery, it would take 10 or 20 seconds to update every page.

So when Microsoft says that we are getting something that is the best of both worlds, that it has import-level performance but DirectQuery freshness, I am almost close to agreeing. In most cases it's going to be import-level performance; in some cases it's going to be a bit slower. And in most cases it's going to be DirectQuery freshness, but only if you set up the auto update on the dataset. So we have full flexibility to configure some ideal state, the best trade-off between those two, but it's just not like we're getting everything out of the box that is the best and best and best. We get a very, very good trade-off here. So I'd agree it's the best of both worlds. It's not that it's better than import mode or better than DirectQuery, but it's the best compromise I've seen so far in Power BI and Microsoft Fabric.

So thank you so much for listening in today. I hope you learned something, and I hope you enjoyed it and saw some potential in Direct Lake, because it's definitely going to be part of the future of how we design datasets and how we set up our end-to-end data platforms in Microsoft Fabric and in Power BI. So, bye.
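
A note on the Direct Lake segment: the caching behaviour described in the demo (columns loaded on first use, hot columns kept in memory, a refresh flushing everything, and an optional warm-up query afterwards) can be sketched as a small toy model in Python. This is only an illustration of the idea, not a Fabric or Power BI API. The class name, the `capacity` parameter, the column names and the use of a plain hit counter as "temperature" are all assumptions made for the sketch; the real engine's eviction rules are not described in the talk.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDirectLakeCache:
    """Toy model of on-demand column caching (hypothetical, not a real API)."""

    capacity: int  # assumed limit: how many columns fit in memory at once
    temperature: dict[str, int] = field(default_factory=dict)  # column -> hit count

    def query(self, columns: list[str]) -> list[str]:
        """Simulate a query and return the columns that were cold (not cached)."""
        cold = [c for c in columns if c not in self.temperature]
        # Every column touched by the query gets "warmer".
        for c in columns:
            self.temperature[c] = self.temperature.get(c, 0) + 1
        # When over capacity, evict the coldest column not used by this query.
        while len(self.temperature) > self.capacity:
            candidates = [c for c in self.temperature if c not in columns]
            if not candidates:
                break
            coldest = min(candidates, key=lambda c: self.temperature[c])
            del self.temperature[coldest]
        return cold

    def refresh(self) -> None:
        """Simulate a dataset refresh: the cached columns are flushed."""
        self.temperature.clear()

    def warm_up(self, columns: list[str]) -> None:
        """Simulate a post-refresh warm-up that touches the most used columns."""
        self.query(columns)


if __name__ == "__main__":
    cache = ToyDirectLakeCache(capacity=2)
    print("first query, cold columns:", cache.query(["Date", "Sales"]))
    print("second query, cold columns:", cache.query(["Date", "Sales"]))
    cache.refresh()
    print("after refresh, cold columns:", cache.query(["Date", "Sales"]))
```

In this sketch a query that returns an empty list of cold columns stands in for the fast "half a second" case from the demo, and a non-empty list stands in for the slow first hit. Calling `warm_up` right after `refresh` is the toy equivalent of using a notebook or API call to keep the dataset warm.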