Fabric Frenzy #2
There is a wealth of new opportunities for data analysis and insight in Microsoft Fabric, and new features are constantly being added, making it quite a challenge to stay up to date and organized.
We want you to be fully up to date with the coolest options in Microsoft Fabric. That's why, once a month, we give you an update on the new features, along with tips and tricks on how to take full advantage of Microsoft Fabric.
On the last Wednesday of each month, we will make sure to:
- Give you an overview of the most important new features in Microsoft Fabric
- Come up with tips and tricks to make better use of new as well as old features
- Give you concrete examples of cool business applications
Transcript
Hi everyone, and welcome to Fabric Frenzy, your monthly source of news, spotlights and more, all around Fabric and Power BI and everything in between. Today I have Yen here in the studio, my great colleague, who is here to talk about some exciting new things. Yeah, excited and glad to be here. Glad to have you.
Today we'll go through the news around Fabric and Power BI, and of course we also have our monthly spotlight. Today that spotlight is shone on the Fabric Warehouse, especially some of the new things we can do with the warehouse in Fabric, so that will be interesting.
Before we start: if we have any viewers who are new to Fabric, who come from a Power BI background, from Azure Synapse or from some other data platform, and are a little bit confused about what Fabric even is, I'll just sum it up shortly. Fabric is the new end-to-end data platform from Microsoft, stitching together the tried and trusted experiences we've been used to from Data Factory, Azure Synapse and Power BI, alongside a brand new offering called Data Activator, which is meant to help us take action on the output of our data. All of these are combined into a unified, web-based platform. It's software as a service, it's easy and approachable, and it really puts the power of end-to-end data pipelines back in the hands of everyday users, but with the added governance of a centralized governance system. So it's a really great offering for anyone who's looking into building a new data platform, or who's thinking about what their future data platform should be. Yeah, exciting. Which is probably also why some impartial evaluators have ranked Microsoft extremely high in the most recent Forrester Wave. Fabric, Power BI and Synapse together make such a powerful mix of tools.
That should clarify the most basic things for new people. So what's new here in August? We didn't have a Fabric Frenzy in July, since it was vacation time, so we're also covering some of that. And what did we get? We got XMLA write support for Direct Lake datasets, we got warehouse sharing and more, we got Git integration, warehouse upgrades, and we got capacity metrics in Fabric. Let's get started: what does all this even mean?
It was a big month for news in Fabric. Now that we've moved from covering only Power BI to also covering Fabric, each of the individual experiences in Fabric has its own set of news, so there has been a lot of it. We've tried our best to curate the most important items, and also the ones related to this month's spotlight, which is the warehouse.
So let's start with the warehouse and the upgrades it has gotten. First of all, and the one I'm most excited about today, is dbt support for the Fabric Data Warehouse, which is also what my colleague Yen will show you a bit later. We'll show you how dbt can really help with creating better transformations in a data warehouse in Microsoft Fabric.
We also got some quality-of-life upgrades. We got automatic updates for statistics. Just a few months back, we had to update the statistics in our SQL manually, but now they are updated automatically every time we load data and every time we query data. They are simply kept up to date, so we don't have to care about this, and it brings us closer to what we're used to from a classic SQL Server or SQL database.
We got UPDATE FROM and DELETE FROM in T-SQL, so we're getting closer and closer to being able to run all the SQL commands we know from SQL Server in the warehouse as well, and we are very close to full parity. A colleague of mine, Brian, and I tested out some of the features, and he found out we can even do recursive CTEs, we can do CTAS, we can do window functions; we can do most of the things in the Fabric Warehouse that we expect from T-SQL. There are only a few limitations left: we cannot use the nvarchar data type, we cannot use TRUNCATE, and there are a few smaller things as well. But the most important things are there, and we just hope we'll also get TRUNCATE in the near future.
Then we have some optimizations in data movement during query execution, which is a fancy way of saying queries will run faster. We don't actually see this component, whatever it does; they changed something in the back end, and things are going faster. And it is going faster: a colleague of mine went to Sweden last weekend and gave a talk comparing the new Fabric Warehouse with the old serverless SQL pools, and found that it is indeed faster on almost all types of queries, especially on large amounts of data. So it's definitely not a small-scale solution; it's a full-fledged SQL solution.
We also got the option to use Dataflows Gen2 with the warehouse as a source. That opens up doing our transformations with dataflows: get the data from the warehouse and store it back into the warehouse, making the combination of warehouse and dataflow a new, viable option for a simple, easy-to-use architecture.
Then we finally got zero-copy table clones, a feature where we can copy a table in its current state without actually copying the data; the clone just points back to the same data, which resides in the data lake.
And finally we got data warehouse sharing, which enables us to share a warehouse with individual people internally or externally, with different degrees of permissions. We can allow people to read the data using SQL, we can allow people to read the data using Apache Spark, or we can allow people to build Power BI reports on the dataset that's connected to our warehouse. Later we of course also hope to get even finer granularity, something like row-level security and more granular controls. But for now, at least, we are able to share a data warehouse without having to invite people into our Fabric workspaces. It's a really great, much-needed feature, and it's not only for warehouses: if we look at the pictures here, you'll see that the lakehouse has a similar option, and you can even share a KQL Queryset from Real-Time Analytics with your organization or with specific people. Each has its own set of permissions, but they are all open to sharing.
Another thing, which is actually not new but is important to emphasize, is capacity metrics. There's a Capacity Metrics app ready for you to use, so if you have a Fabric capacity, you can already check how much your different artifacts and jobs consume and whether you're hitting some sort of limit, and you can monitor, govern and administer that. Very neat.
Then we have Git integration. I know that's something you also care a lot about. Yeah, a lot of developers have been looking forward to this. Yes, absolutely. We need versioning and repositories around our objects and around the changes we make in Fabric for it to be a viable pro-developer tool, and with this recent Git integration I think we took a very big step toward that goal. Currently we can connect to Git through Azure DevOps, and that's it so far, but I could imagine GitHub coming as well. I would assume so.
Then we have XMLA write support for Direct Lake. In Fabric we got a new data-loading method called Direct Lake, which lets our dataset connect directly to our lakehouses, without having to create an intermediate dataset that needs to be imported and without having to use DirectQuery. The promise of Direct Lake was that we would get the performance of an import dataset with the real-time freshness of DirectQuery, so the best of both worlds, and it's looking really promising so far. What we haven't been able to do until now is use the XMLA endpoint to write changes to our datasets in Direct Lake mode. That feature has now been released, which means we can make changes to a dataset programmatically: we can write a script, connect to the dataset and write changes to it. It also opens up for third-party tools to connect to and modify the dataset, which gives a lot of flexibility. There is one current limitation: Direct Lake datasets that are created or modified with XMLA-based tools can currently no longer be edited with the web modeling feature. So if you start using a programmatic approach, you commit yourself to it from then on. That's a limitation you should be aware of, but it's a very cool and much-needed feature.
And finally, in the Power BI experience, we have a little bonus feature, because there's been a lot of communication around it. Personally I'm not sure it's the biggest deal ever, but for visually creating reports it's a much-requested feature. Bombshell feature! Yeah, yeah. We finally got smooth lines, which means we don't have to look at reports with clunky edges, and they can look neat and aesthetically pleasing. You can have opinions back and forth about whether this is important, but it's asked for a lot and report developers really like it. And I actually think Microsoft did a good job here, researching the feature and treating it seriously instead of just implementing something that could create misleading data stories. That's always the risk with smoothing, but they made sure the smoothed line never overshoots the actual peaks of your data and generally isn't misleading, which smoothed graphs really have a tendency to be.
So those were some of the most important news items, but we also brought probably our favorite news item from the release. At least you and I have spent some time looking at it. We did.
For a few years there's been a term, obviously a marketing term, the "modern data stack", which used to comprise Snowflake and dbt. Besides being a marketing term, the modern data stack gave people the flexibility to use a templated way of handling things with dbt together with the warehouse capabilities of Snowflake, which was cool, but so far it didn't really connect with Power BI or the other Microsoft products. With the release of Fabric, we've now got an alternative modern data stack in Fabric, because we can now integrate dbt with our Fabric Warehouse.
So what enables this? As opposed to the lakehouse in Fabric, the warehouse has a write engine for SQL. With any of the lakehouses or warehouses in Fabric you can read data with SQL, but with the warehouse you can also write data, so we can use UPDATE statements, DELETE statements and things like that. By enabling that, and by aligning with dbt in general, Microsoft has also developed a dbt adapter, which lets us use the warehouse together with dbt and finally explore what dbt can do. So now we can use dbt with Fabric.
So the question to you, Yen, is: why should we? Well, there are a number of reasons for that, actually. What dbt brings to the equation in this case is that we can easily collaborate on code, and we have source control.
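In dbt, each model is a SQL SELECT statement, and models point to each other through ref() calls rather than hard-coded table names; dbt uses those references to work out the build order and the lineage, and to compile the same code for different targets. The minimal Python sketch below illustrates that idea under simplified assumptions. It is not dbt's implementation (dbt renders models with the Jinja2 template engine and manages targets through its profiles); it only recognises the form {{ ref('name') }}, and the model names, the SQL and the dev/prod schema names are hypothetical. It uses the standard library's re module to find references and graphlib.TopologicalSorter to derive a build order.

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical models for a raw -> enriched -> curated layout.
# Names and SQL are illustrative only, not taken from the demo project.
MODELS = {
    "raw_customers": "select customer_id, name from landing.customers",
    "raw_orders": "select order_id, customer_id, order_date from landing.orders",
    "enriched_customers": (
        "select c.customer_id, c.name, min(o.order_date) as first_order_date "
        "from {{ ref('raw_customers') }} as c "
        "left join {{ ref('raw_orders') }} as o on o.customer_id = c.customer_id "
        "group by c.customer_id, c.name"
    ),
    "dim_customers": "select * from {{ ref('enriched_customers') }}",
}

# Matches {{ ref('model_name') }} only; real dbt renders full Jinja templates.
REF_PATTERN = re.compile(r"\{\{\s*ref\('([^']+)'\)\s*\}\}")


def dependencies(sql: str) -> set[str]:
    """Return the names of the models that this model's SQL refers to."""
    return set(REF_PATTERN.findall(sql))


def build_order(models: dict[str, str]) -> list[str]:
    """Order models so that every model comes after the models it refers to."""
    graph = {name: dependencies(sql) for name, sql in models.items()}
    return list(TopologicalSorter(graph).static_order())


def compile_sql(sql: str, target_schema: str) -> str:
    """Replace each ref() with a table name qualified by the chosen target schema."""
    return REF_PATTERN.sub(lambda m: f"{target_schema}.{m.group(1)}", sql)


if __name__ == "__main__":
    print(build_order(MODELS))
    # The same model compiles against a development or a production schema.
    print(compile_sql(MODELS["dim_customers"], "dev"))
    print(compile_sql(MODELS["dim_customers"], "prod"))
```

Because the build order falls out of the references, adding a model only means writing its SELECT and referring to its inputs. A reference cycle would make graphlib raise CycleError, much as dbt reports an error for a cyclic project.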
There are a number of features available from dbt that we can now use on the Fabric platform. We've got things like testing: we can write our own tests, define them in the dbt framework and have them executed on the Fabric platform. We have, as I mentioned, version control, which is great for collaboration and also for documenting your project. We have something called target profiles: with multiple target profiles it's quite easy to switch between your development, test and production environments, and to compile and deploy the same code to all of them. There's a concept of modular code, which makes it easier to reuse code components within the same project or across projects. And finally, one of the real benefits of the dbt framework and the processes around it is that you get automated documentation from the work you're actually doing. No excuse not to do your documentation! Not really.
So if you made a project with dbt in a warehouse, how much time would you then have to spend generating this automated documentation? Well, some of the documentation, like the relationships between the tables and the queries you write, just comes from writing the code, which as a developer is a really great thing. Out of the box, you just click run, and you automatically get quite a lot of the project documentation. You can see what objects you have in the project, you can see lineage, and the relationships between the objects as well, which is a really great feature. Nice.
We actually tried this out together, building a simple architecture for how to handle this. We ingested data with data pipelines in Fabric into a data warehouse, and then used dbt for all the transformation workloads. Yeah, and we were able to build a data warehouse that could rebuild itself within seconds. We can make alterations and transformations and deploy them to multiple endpoints, all with just a few commands in the dbt console app. I still remember that when we were testing it out and still exploring the capabilities, we asked ourselves what it would take to put it into production. We wrote one line of code, and it deployed the whole data warehouse to production: building all the tables, automatically generating the documentation, making sure there was proper lineage, orchestrating and executing all the data transformations, copying all the data and running the tests. Within minutes we had a full-fledged production data warehouse. Yeah, with just one more word. That was really cool.
And you brought a demo today to explore a little of how it looks and what we can use it for. Yeah, there's limited time for showing the demo, but we'll go through some of the processes and workflows involved in using dbt. One thing worth mentioning is that dbt is a transformation tool, not a tool for loading data, but workflow-wise there's a lot to gain from using this platform.
As Matias mentioned, to make a demo, or a quick proof of concept of using this tool on the Fabric platform, we built a simple data warehouse where we load data from sources, make some transformations on it, and end up presenting it as dimensions or facts, as you would normally have in an analytics platform. So we basically followed the most common standard architecture: a raw layer where we landed the data, an enriched layer where we cleaned up the data, and finally a curated, or gold, layer with the actual modeled data. Exactly. We kept it a fairly simple demo or proof of concept, but we still managed to do some transformations.
The version of dbt that is currently supported for Fabric is dbt Core, which is a Python library and a console application. Cool. So what are we looking at on this screen? We're looking at my Visual Studio Code, where I've written some of the queries; we'll go through examples of the different tiers.
One of the things we do when we start referencing our sources is to put in references. We save these as SQL files with the .sql extension, but in reality they are Jinja2 templates, which is a Python concept, so we can think of them as variables that point to an actual data source. So instead of a variable that holds a value, it's a variable that holds a data source? Yeah, and that's really what's behind the scenes of this application: what makes all the automation, templating and standardizing possible is this way of having variables for our data sources.
Having defined the raw layer, which pulls the source data we loaded into the warehouse, we do a bunch of transformations in this script. Again, in this layer we are referencing the raw table, the object we made before, and in this case we're also referencing a snippet of code that inserts first order dates. So we can also do joins and everything; it's SQL at the bottom. All we have to do is write queries. You write your SQL basically the way you're used to, but you have to get used to using the references instead of the table names; that's basically the difference. Cool.
Finally, we turn it into the final dimension, and that's the whole solution. That's the code we have to write. Of course there are a few tweaks: you also have to write some YAML files, but most of it is auto-generated. This is the core of the development workflow, right? Your SQL queries do only the value-adding work, and not everything around it that can be a bit tedious. There's also an option to enhance your documentation through the YAML files, where you can elaborate further.
Speaking of documentation, can you show us what this auto documentation looks like? Certainly, we can generate it right here. You can see the command in the command line down here: you run dbt docs generate and then dbt docs serve, and then you get a nice documentation site. If I wanted a quick overview of my platform, what would be the easiest way? A good view for that is the lineage graph, which we can take a look at here. In this case you can see the lineage, how the data flows through the objects we have. So if we look at dim customers, I can see all the other tables and views involved in creating it. Nice. What if I was curious and wanted to dig into enriched customers, the one with the most transformations, to see what's happening there? It's as easy as right-clicking the object on the page and choosing View documentation. Nice. And on this page I can even see the owner, type and language, the most basic details. You get a lot of details about the objects: all of the columns and data types, even with comments, if you've put comments into the YAML files. Nice. And finally you're also able to see the code of your file. But this still contains these variables, these template thingies. Can I also see the actual compiled code? Leading questions.
Yeah, you like that. I know. So yeah, the interesting bit here is that I referenced a snippet of code, which has been inserted as a separate CTE in this code, and you can also see that the references we put into the Jinja template as variables are now replaced with the actual table names. So if I was worried about this Jinja influence on my SQL and what it would actually result in, I could come here and audit exactly what runs? Exactly. You can see the exact code that is put into production in the environments you've set up. That's awesome.
What if I was concerned about something like data quality? Can I set up some tests? Yeah, we mentioned earlier that we can define tests in the YAML files. The tests can be built-in tests, since dbt comes with built-in tests for the most common things, but you can also write your own. This is an example where we've defined a test. Oh, so at the beginning I see something called unique. Yep. But what does this actually do? Well, I'm glad you asked. The compiled output is here, and you can see the SQL that is executed to perform the test. From here you can see it's grouping by customer ID, counting across the whole table, and making sure that each customer ID exists only one time. And if it's more than one, it means we have duplicates. Yep, and they're not unique. Precisely. And what happens if you have duplicates? You have an option to save the failed data to a separate table, where you can analyze the cause of the failures. That's really cool. So could I take that table and build my own Power BI report on all the data quality issues I have? Exactly, and you get the data to prove it. Really powerful. I mean, that is really neat. Cool, great demo.
So is there anything else we should know? There are a few other noteworthy features in the dbt framework. You also get a concept of snapshots, which is basically tracking slowly changing dimensions in your data. You can set that up per table, based either on a timestamp or on a comparison of columns in the data. You're able to seed data: currently only CSV is supported, but if you have CSV data that you want to seed into the tables, you can do that. You've also got a concept of data freshness: if you've got timestamps on your data, you can set up checks that tell you how old your data is, and if it passes a threshold for data age, you get either a warning or an error. Cool. And one thing I forgot to ask you: what do I have to pay for dbt in its current form? Well, in the current form, where we're using dbt Core, it's actually free. As I mentioned, you can download it with pip and use it, and since it's a Python library and open source, there are a lot of opportunities. Nice, that opens up quite a lot of possibilities. Well, thank you for a very thorough rundown of dbt. And thank you for listening in. I hope you got a little more excited about the warehouse feature; I certainly did when I tried this out. It's a very powerful offering and can really compete with the lakehouse in Fabric as well. I hope you enjoyed the news, the tips and today's spotlight, and thank you for joining the studio today. Thanks for having me. Enjoy your day, everyone out there. Yeah, have a great day. Bye-bye.
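The unique test walked through in the demo can be reproduced outside dbt to see what it checks. The sketch below is a stand-in, not dbt itself: it uses Python's built-in sqlite3 module instead of a Fabric warehouse, a hypothetical customers table and a hypothetical failures-table name, and runs a query of roughly the shape described for the compiled test (group by customer ID, count, keep any ID that appears more than once). The test passes when the query returns no rows; copying the failing rows into a table mirrors dbt's option to store test failures for later analysis.

```python
import sqlite3

# Roughly the shape of a "unique" test on customer_id:
# any row returned is a failure; zero rows means the test passes.
UNIQUE_TEST_SQL = """
select customer_id, count(*) as n_records
from customers
where customer_id is not null
group by customer_id
having count(*) > 1
"""

# Hypothetical name for the table that keeps failing rows for later analysis.
FAILURES_TABLE = "unique_customers_customer_id_failures"


def make_demo_db() -> sqlite3.Connection:
    """In-memory stand-in for a warehouse table, with one duplicated customer ID."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        create table customers (customer_id integer, name text);
        insert into customers values
            (1, 'Ann'), (2, 'Bo'), (2, 'Bo (duplicate)'), (3, 'Cy');
        """
    )
    return conn


def run_unique_test(conn: sqlite3.Connection, store_failures: bool = False) -> list[tuple]:
    """Run the uniqueness check; optionally copy the failing rows into a table."""
    failures = conn.execute(UNIQUE_TEST_SQL).fetchall()
    if store_failures:
        conn.execute(f"drop table if exists {FAILURES_TABLE}")
        conn.execute(f"create table {FAILURES_TABLE} as {UNIQUE_TEST_SQL}")
    return failures


if __name__ == "__main__":
    conn = make_demo_db()
    failures = run_unique_test(conn, store_failures=True)
    print("PASS" if not failures else f"FAIL: {failures}")  # prints FAIL: [(2, 2)]
```

The stored failures table is an ordinary table, so it can be queried or reported on like any other, which is the Power BI idea raised in the demo.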