Data Warehousing in Microsoft Fabric

25 May 2023

In this second session, Mathias Halkjær Petersen and guest Andy Cutler, Microsoft Data Platform MVP, talk about how Microsoft Fabric introduces a whole new set of SQL-based artifacts, enabling us to build a modern version of the tried-and-tested data warehouse.

Hey everyone, and welcome to the Microsoft Fabric session on data warehousing, with Andy Cutler and myself here in the studio. This week, Christmas came early, something like that: we got a gift from Microsoft, the Microsoft Fabric platform. Microsoft Fabric is the new unified platform that combines all the capabilities and tools from data warehousing, Data Factory, Synapse Analytics, serverless SQL, dedicated SQL, all the tools we know and love from Azure, into one unified software-as-a-service platform. It lives in the same place where we used to have the Power BI service, so right next to Power BI we get a bunch of extra capabilities, and a lot of other things have been announced as coming soon, like Data Activator and more. We just expanded our toolbox with so many tools that I cannot even count them, and we have been working almost day and night for the past few days to get a grasp of everything around the platform, every tool and technology. Today we are ready to present and broadcast about Microsoft Fabric for all of you.

This is the second session of today. We had one earlier, which you should be able to watch on YouTube if you missed it, where we covered Fabric as a whole. In this session we are covering the data warehousing aspect of Microsoft Fabric. Later today we will have a visit from Microsoft to explore the changes to Power BI as a result of Microsoft Fabric, with a specific focus on the new Direct Lake feature, which should be really interesting, and we finish the day with a talk about how we should think about this whole concept of Microsoft Fabric as an organization. But right now it's time for a session that I've very much looked forward to, a session on how we can do data warehousing in a software-as-a-service platform in Microsoft
Fabric. And to tell you about this we have invited none other than Andy Cutler, a Microsoft Data Platform MVP and independent consultant, who is probably one of the leading experts on anything around the Azure data platform, Azure Synapse Analytics and so forth. So welcome to the studio, Andy, I've been looking so much forward to having you.

Good morning, Mathias, how are you doing?

Good morning, I'm doing great. How about you?

Very well, although I'm pretty jealous. I actually want to be in that studio where you are right now, it looks really good. But thank you for inviting me on.

Yeah, I had colleagues asking me, shouldn't we just invite him over here? And I'm like, I'm pretty sure it's a bit far from where you are to Denmark just to be here.

Next time, with a bit more notice, I'll be there.

Or a virtual studio, yeah. So how are you, are you all rested? Because I'm not, I think I've been up all day and all night for the past few days.

Yeah, it's been a strange one, to have been hands-on with Fabric for the last few months and then see it all announced, with that big bang of all of those sessions that went public around data integration, data science, all the workloads within Fabric. It's almost like that feeling of relief: it's out there now. But as you said, there have been sessions through the night, and what you want to do is be in those sessions, chatting with people as it's happening and as the announcements are coming through. So you've got your day-to-day workload, and then you've got to jump into all these sessions after hours as well, plus all the social media stuff. I think I'm going to sleep well this weekend.

Yeah, I know the feeling. That's good. But today you brought us a session on Synapse Data Factory, which is one of the things that I am personally very, very interested in
hearing more about.

Yeah, data warehousing, yes, we're going to talk about data warehousing. So a couple of days ago, Mathias, well, Fellowmind are running this really great series of Fabric videos and invited me on. So this is really some of the things that I've grabbed from the publicly available information and put down. That's the other thing: because we've been hands-on with it over the last few months, we need to make sure that we're not talking about anything that is still private. So everything that you see now is publicly available information. It's right there in your Power BI tenant if you want to go ahead and enable Fabric for a few users to test out. Everything you see here is publicly accessible information that Microsoft have now been marketing. But this isn't just about me sitting here and talking about it for 30 to 35 minutes. Well, that's not going to be that interesting. No, exactly. We're going to have some conversations about it, because it deserves talking about, given what Microsoft have done with the data warehousing side of things.

So let's just jump in. If anyone wants to connect with me, that QR code takes you to my Twitter account, Mr Andy Cutler. If you want to ask any questions, ask for any resources around any of this, or carry on the discussion about what we're talking about today, feel free. And there's my LinkedIn page there as well.

So, Synapse data warehousing in Fabric. I've put together some topics to talk through. The first thing we're going to talk about is the current Synapse dedicated and serverless SQL pools. We have to, we can't just be like, oh look, there's Fabric
data warehousing, and then disregard everything that's happening with the current Synapse dedicated and serverless pools. It's also helpful to talk about that to understand the Fabric Synapse Data Warehouse, because it's branded Synapse, so what does that mean in terms of both dedicated and serverless? Then we'll talk about the SQL resources in Fabric in terms of the artifacts that get created, because there's some clarification there that I think is particularly useful in terms of SQL endpoints and data warehouses within Fabric. And then learning resources: I've put together a list of all the things that may interest you if you want to dive into data warehousing specifically.

Because, like we said in the intro, there's a lot of stuff around Fabric. What's also been quite funny this week is that, because we've been given information week by week as we've worked with it in private preview, it's never felt overwhelming, it's never felt like there's been a lot to cover. Whereas all of a sudden it gets released this week and there's so much stuff. And of course I'm working through all of the officially released documentation as well, and I've just realized how much there is.

Yeah, and I'm overwhelmed, and I assume you are too. So imagine the audience out there who had never heard about this until two days ago.

Exactly, it's wild. Even me, having had access, it's like, oh my word, this is quite overwhelming. There's the overall discussion of what Fabric is, then there's OneLake and all of the intricacies around OneLake and the Direct Lake functionality, then the warehousing, the data science and the real time. So all of a sudden I've got this huge appreciation and empathy with anybody
coming at Fabric and thinking, where do I begin? The thing is, though, you can jump into a workload. I think what's of benefit for everybody is if they just pick a workload, and that gives context to everything that you use in Fabric. So instead of coming at it from the top down and looking at all these myriad workloads that you can run, go bottom up: pick a workload like data warehousing, build up from there, and then look to integrate it with other areas of Fabric, because otherwise, yes, it will become a bit overwhelming. So the learning resources are just some links to some of the overall Fabric resources, but also to the blog where Microsoft are posting information about each area, and the Learn modules. They've already put together the Learn modules, perhaps a precursor to a Fabric certification, we don't know yet. And then there's some data warehouse specific information in there as well.

Nice.

So like I said, my slides are pretty basic, they're just talking points. So let's talk about Synapse dedicated and serverless SQL pools and how we've got to where we are with the Fabric side of data warehousing. I think that's going to give us some context, right, Mathias, to see where we've come from and where we're going?

Yep, absolutely. So maybe I can start, for some of the Power BI users out there, with why we even had to talk about Synapse in the first place. If you ask me, that is because Power BI used to be a front-end tool. We had the most amazing data modeling capabilities, we had the most amazing reporting capabilities, but when it came to stuff like actual dimensional modeling, creating slowly changing dimensions, surrogate keys, or anything around history and keeping historical data, snapshots, stuff like
that, we hit so many limitations on the back end if we worked only with Power BI, which is where Synapse really shone and gave us all that data warehousing capability. And we did that in Synapse, so what were the tools we used in Synapse to accomplish that?

Yeah, so when we go through some of the key points of the Synapse data warehouse platform, it'll provide a bit more context as to why we're talking about Synapse and dedicated and serverless. But ultimately, Synapse is a platform as a service, it's a PaaS service. When we provision a Synapse workspace, we can provision a dedicated SQL pool, as many dedicated SQL pools as we want, or as many as our wallet allows us to buy. We can configure the size of those SQL pools, and we've got settings that we can change. So it's a PaaS service, like Azure SQL Database: it still gives us all the functionality of SQL, but it also lets us tinker with some of the settings, and we don't have to worry about hardware or patching or anything like that. So Synapse dedicated and serverless SQL pools are PaaS services, and with Fabric we're moving to SaaS, software as a service, where the SQL side of things is even more abstracted away. Unless we're configuring the size of the Fabric capacity, we're not able to go in and start changing compute settings for the SQL side of things. That's an interesting point for when we come to workloads in a slide later on.

With dedicated SQL pools, we're essentially ingesting data into its own storage. That storage is backed by SSDs, so it's fast, there are fast reads. It's an MPP, a massively parallel processing system, so it has compute nodes, and as we've said, we can configure it and say, well, I want two nodes or
I want four nodes, et cetera. Then along came serverless SQL pools when Synapse was announced. Dedicated used to be a standalone product called Azure SQL Data Warehouse, which itself was based on the Parallel Data Warehouse system. What's quite funny is that some of the system views and DMVs within dedicated SQL pools still have the PDW prefix, which stood for Parallel Data Warehouse. So that product has evolved over time into what we know today as dedicated. For serverless, Microsoft wrote a new engine, the Polaris engine, where we're connecting to external data, so we're reading data from the data lake. Serverless differs from dedicated in that we're not actually importing any data into serverless, and we don't have any control over the number of clusters that are running or the amount of power. It's just: get the results as quickly as you can from serverless.

However, for the non-Synapse-informed people out there: if you don't load data into anything, how do you store your data? How do you use your data with serverless? What is the storage behind it?

Yeah, great point: data lakes. All of the data that you want to query is in the data lake, which is why serverless SQL pools became very popular in a lakehouse architecture. So if you're processing data using Spark and you're writing that data out to, let's say, the Delta Lake format (keep Delta in mind for when we chat about that in a bit), you're landing your data in that Delta format, and then along comes serverless and it can just query that Delta data using standard SQL. You can connect client tools to it, so Power BI is a very common use case, where you still want to do some SQL transformation of data in the data lake. You might have volumes of data in your data lake that you want an intermediate compute engine to crunch and aggregate down before loading into Power
BI. But essentially, as you can see there, we've got two SQL services. So we've had a lot of, let's say, sessions, blogs and videos about: what's the difference between dedicated and serverless, which should I use in which scenario, and, more importantly, why do I have to choose between two SQL endpoints, because I'm a little confused about which one to use.

Yeah, and then on top of that, the choice is influenced by: are you using Spark for your data transformation, do you want to do it with SQL, how are you loading in your data? It all becomes complicated.

Yeah, it's easy for us to sit here and say, well, I know in what scenario to use dedicated and in what scenario to use serverless, but everybody else may have to go through a journey of understanding those two services. And the other issue is that there are two different endpoints that you need to connect to, so two different server URLs to work with, and they don't talk to each other. Within dedicated, you wouldn't be able to write a query straight against serverless SQL pools, you'd have to write it against the data that's in the data lake. And within serverless SQL pools, you're not able to query data from dedicated. I would have liked to have been able to, let's say, enrich data coming from the data lake with some reference data in my main data warehouse that's sat in dedicated.

So it's been semi-fragmented. I mean, it was unified in some way, but it was still kind of fragmented: different tools didn't really talk to each other, and you needed to be very conscious about how you set up your architecture to make sure that they could actually connect and use each other.

Yes.

So what does Microsoft Fabric then change around this? What's the news?
So, and we've had some discussions on Twitter about this as well, they've simplified things, but that simplification also needs explaining, so I don't know whether that's the correct way of saying it, that they've actually simplified everything. But in terms of the SQL technology within Fabric, within Fabric Synapse data warehouses, you've got one SQL service. Now, this isn't as simple as saying they've merged dedicated and serverless into a single product, because dedicated has storage nodes that it's storing data in, whereas Fabric is storing data in a different format, which we'll get on to in a bit. But we have one SQL engine, that's the most important thing, and it's based more on serverless SQL pools than on dedicated. So the engine that drives serverless SQL pools is essentially what's underpinning Synapse data warehousing. That's why we've got the concept of lakehouses that we can query with SQL in Fabric.

So do we know, or are we speculating at this point? Because we know some things about, for example, the VertiPaq engine in Power BI, but do we know anything about the engine?

Right, so we're on a public broadcast, so I think the answer to that is we have been speculating. We've been asking, and we've been getting certain bits of information. So that's where we are at this point in time, until Microsoft do the decent thing and let us know what they've done under the hood. Again, this is going to be proprietary technology, so Microsoft will want to protect it. But because of what's happened in Fabric Synapse data warehousing, I think it's a good assumption that more of it is pulled from the serverless SQL pools engine.
The reason for that will come up when we talk about some of the points later on around Fabric. What I've got there is SQL endpoints and Warehouse in terms of Synapse data warehousing. This is, again, an important consideration for when you're creating artifacts within Fabric, so I'm just going to bring up two very simple talking points: SQL endpoints and warehouses. When I said Microsoft have simplified things, they have from an engine perspective. We've got one SQL engine that we're using, so it doesn't matter whether there's data sat in the lakehouse that we're querying or we're going to import data into a warehouse: we're using the same SQL engine, and it's got the same endpoint. Great. So we've got one set of SQL commands to know and understand, and we don't have to learn the idiosyncrasies of two different systems, dedicated and serverless. However, when we come to using certain SQL features and certain SQL syntax, there's a difference between what is accessible through a SQL endpoint and what's accessible through a warehouse.

There are going to be lots of sessions around OneLake. So I think, yes, pick a workload to understand, but then also, in parallel, do the reading up and learning about OneLake.

Pick one workload and then also pick OneLake, yeah, that makes sense.

Yeah, basically: data warehousing, OneLake. Power BI, OneLake. Machine learning, OneLake. Real-time streaming, OneLake. Because OneLake is underpinning all of the workloads, with this concept of just one copy of your data that you can access from all the different workloads.

Yeah, when they call the new platform a unified platform, I think the one thing that really makes it a unified platform is not the UI, it is OneLake, in my opinion.

Yes, yes. I was having a conversation with someone the other day about batch versus real-time loading for data warehouses,
where sometimes we're trying to use batch processes to provide some lower-latency, real-time data ingestion, where it's not really meant for that. And one of the problems in data warehousing is how many times you're picking data up, transforming it and landing it somewhere. So we're hoping that with Fabric we're reducing the overhead of that. If you've already pre-processed some data and you have it somewhere, you should be able to just access that data through the virtualization features within Fabric, rather than having to create a copy of that data as you do in other platforms.

So when we're talking about the SQL side of things, when we create a lakehouse, where we can store files and transform those files into queryable objects, we automatically get a SQL endpoint created. It is very similar to when you spin up a Synapse workspace and connect it to data lake storage: serverless SQL pools is there automatically for you to go and query the data lake. It's the same thing here in Fabric. You create the lakehouse, you populate it with files, which are stored in OneLake, and you can then transform that data, in all its weird and wonderful formats, into the readable format within Fabric, which is tables within the lakehouse. That transforms it into the storage format that's being used in Fabric, which is Delta Lake. Microsoft have gone all in on Delta in terms of the storage format, so it doesn't matter whether you're creating a lakehouse table or a warehouse table, the underlying storage is based on Delta. That is a significant shift from the existing technologies, and it's more aligned with what you're doing with serverless SQL pools. But you get the SQL endpoint created as part of this lakehouse creation, and that SQL
endpoint is what allows you to write SQL against the data that's in your lakehouse.

So what you're saying is, when we start working with Microsoft Fabric, we need to make a choice between a warehouse as a storage method or a lakehouse. But no matter what we choose, we still get some SQL capabilities, not necessarily the same ones: we get a SQL endpoint on top of the lakehouse, and we have SQL capabilities directly inside the data warehouse itself. They're similar but different, those two?

Yes, and the differences are more about syntax, which is preferable to two different engines with their own idiosyncrasies that you've got to understand. On the left-hand side, with SQL endpoints, that's more of the lakehouse architecture, where you're not doing any SQL-based loading or transformation. You're just able to access the data that's been transformed in the lakehouse and query it with the SQL endpoint. Ultimately, you can either use the SQL endpoint to build Power BI datasets and then reports, or connect directly to the lakehouse using a Power BI dataset and work from there. But the most important thing to consider is that the SQL endpoint doesn't allow you to do any form of modification. If you've written your data into the lakehouse and created your tables, let's say you've used Spark, pipelines or Dataflow Gen2, however you've landed it in the Delta format, what you then can't do is use the SQL endpoint to modify the data. You can't go in and start updating and deleting, because it is an endpoint for querying. So that's where the functionality ends there. And that makes sense, and this is just my opinion, because otherwise you could go in and start modifying the data. One of people's criticisms of serverless SQL pools is that
you couldn't modify data with it, but I always thought that made sense, because otherwise you could get into a bit of a mess where you're writing data through one system and then modifying it with serverless. I was never a fan of adding that particular bit of functionality. So it's the same here with SQL endpoints.

But we can create a warehouse. We can go into Fabric and say, right, I want a warehouse, and that warehouse has got nothing to do with a lakehouse. It is a standalone Warehouse artifact. That's the SQL service where you've got full DML, where you can insert, update and delete. So you can import your data into a warehouse and work with it there, independent of it being connected to a lakehouse. But the important thing here is that we can mount a lakehouse SQL endpoint into a warehouse and have access to it. So when I spoke about dedicated and serverless SQL pools not really being able to talk to each other, we're able to do that here, in terms of the whole data virtualization piece. We can create a warehouse, mount one lakehouse, two lakehouses, whatever, into that, and we can cross-query and read. So we've got full compatibility with what we're doing with the data. It's just important to clarify that, because I think there's going to be some confusion about what you can do with the SQL endpoint versus the warehouse. You may go into a SQL endpoint, see the SQL script interface and start trying to update data, and you'll get an error saying you can't update this data. I think that's just worth calling out in terms of the differences between a SQL endpoint and a warehouse.

So when you choose the warehouse, you're doubling down on SQL, whereas when you choose the lakehouse, you are opening up the possibility of doing
something Python, something notebook, something more data engineering.

That is an absolutely fantastic point to make as well, yes. In fact, you could probably add to this slide 'data engineering' above the SQL endpoint and 'data warehousing' above the warehouse. So, for example, if you are a SQL shop, and you don't do a lot of Spark, maybe some light-touch Spark in terms of loading data, but you are a SQL-based shop, and let's be honest, there are going to be lots and lots of data teams out there that are running on-premises SQL Server, or SQL Server in the cloud, or using Azure SQL Database or dedicated SQL pools for warehousing. They're going to be writing SQL.

Absolutely.

So with creating a warehouse, you've got support for SQL. I'll be honest, I can't remember what is and isn't supported yet in terms of the SQL syntax, but if you've got your SQL-based ETL or ELT processes, then the warehouse is going to be perfect, because you're going to be able to create your stored procedures with all your loading logic, and you're going to be able to put those stored procedures into pipelines and orchestrate the loading and running of them. So to anyone coming into Fabric and thinking, oh, okay, so now I've got to use Spark to do all of my data engineering and loading: no, no, no. You can, if you are a Spark shop, so you can come in and use the lakehouse and use notebooks and do all that funky stuff with PySpark. But if you are a SQL shop, then Warehouse has you covered. And it's all going to be over the same data, and it's all going to be surfaced through the same connection string, the same objects in the end. So yeah, that's a great call-out, Mathias.

Should we dive into it? I'm assuming you have some details on
the inner workings: what can we do, what's inside? You said you had something about file formats, stuff like that.

Yeah, so I think the first thing is around the data storage format, which is probably the biggest departure that we've seen in terms of a storage format for Microsoft in one of their main database products, because we are now using Delta, which is backed by the Parquet file format. I have seen those that are more familiar with Power BI getting a bit worried about Delta and having to understand Parquet and things like that, and I don't think you need to be too worried about understanding the mechanics at this stage.

Yes, you need to understand it the same way you had to understand the inner workings of a dataflow's storage files, and basically you didn't need to understand that either. You need to understand how you work with it, not what's going on inside of it.

Yeah. To be honest with you, I couldn't explain to you the inner workings of an .mdf file and an .ldf file in SQL Server. I know what they are and I know what they contain, but other than that, no. It's the same here. Now, there are optimizations that will be applied to the storage, and settings that you can apply, so there's documentation around how Microsoft have implemented certain optimization features within Delta. But Delta is effectively underpinning the storage mechanism here.

As we've touched on, it's a SaaS platform, so you're going to configure your Fabric capacity with a certain amount of compute, a certain amount of cores, and that's going to be shared. Your Spark workloads, your Power BI workloads, your pipelines and dataflows, your SQL endpoints, lakehouse, warehouse: it's all going to use that shared
compute. So again, we don't have to worry about understanding the different types of compute from all the different services, but with that you're relinquishing a certain amount of power in being able to configure the compute and isolate it. So do we truly have this separation of storage and compute? Well, yes. The storage format is Delta, and there have been lots of discussions out there about the ability for outside engines to write Delta into Fabric, for example Databricks, probably the ubiquitous data processing platform for Spark. The storage, i.e. the Delta data, which holds the data, the schema, the statistics, the transaction logs, is all self-contained, and then the compute can just come along, whether it's Fabric compute, Databricks compute, Synapse compute, whatever, and talk to that data.

And we touched on that cross-querying, ultimately under the umbrella term of data virtualization, where we want to process our data once and then try to reuse it as much as possible without moving it. Now, maybe we're not too worried about costs in terms of copying data, because it's relatively inexpensive to store that data, but obviously the data volumes grow if you are copying data around, and copying data takes time as well. So if you can minimize the number of times you're moving and copying data around, it's going to speed everything up. You can cross-query lakehouses and warehouses, so that you're not having to repeat loading from a source into multiple different areas within Fabric, duplicating data.

So I'm going to bring these two up, auto-scale and self-optimizing. Because it's a software-as-a-service platform, when
we're running SQL workloads, those workloads are going to take the compute they require to get their jobs done as quickly as possible. Now, there's probably going to be some more information coming about the way Fabric compute works with the workloads it's given. We've got this concept of bursting workloads, and we've got the concept of smoothing, where yes, you're going to buy a certain amount of capacity, but if certain workloads need to go beyond that a little bit, then it can burst, it can smooth out. So there's going to be some great information about what the compute is doing under the hood, but suffice to say that when you're running workloads within Fabric, it's going to be autoscaling to get those workloads done.

The interesting point there is self-optimizing, where I say "no more workload groups". Within dedicated SQL pools, you had the concept of being able to create a single dedicated SQL pool with, let's say, 600 GB of RAM, so that was DWU 1000, two compute nodes. What you could do is take that dedicated SQL pool and partition it off in terms of resources. For example, you could create a workload group in dedicated just for ETL and say it always gets 50 percent. It's going to get 300 GB, it's going to get the compute power associated with 50 percent all the time, but it's not going to get any more than 50 percent. Then you could have 25 percent dedicated to just querying, and another 25 percent for, let's say, downstream loading of models or ML integration or something like that. But you had to set that up. You had to go in and configure that; otherwise dedicated would just say, okay, a user is going to come in and I'll give that user an amount of compute based on their user settings, and that's it, I'll just do it based on that,
but here we've got self-optimizing. So again, I want some more information about the self-optimizing part of this, because that means the boundaries of the workloads are going to be shifting. When my SQL-based workloads are happening, and I've also got some Power BI workloads over here, and then maybe some Spark workloads and ML workloads, it's going to be self-optimizing. That's going to be closely linked in with...

But we also want to protect ourselves. We don't want one wonky notebook or broken job ruining everything else within our Fabric capacity, if that makes sense.

Yeah, exactly.

And we want to add that in our last session of the day, starting at three in the afternoon Central European Time, we will go more in depth on what we know about capacities and pricing in Fabric. So just a little teaser there.

Brilliant, yes. So that's three essential things: pick a workload, understand OneLake, and then understand the licensing. All these things are adding up now. But I'm just having a look at the time, so in terms of loading data, getting data around: in the Synapse Data Warehouse, if we've created our own warehouse, we can use the most recommended command to load data in. This was the command that was most recommended for dedicated SQL pools, which is COPY. You can think of the COPY command as just a SQL statement where you point to an external location, a data lake, from which you want to load your data into the warehouse, using SQL syntax and some options to maybe auto-create the table, etc. That was the command that fully utilized scale-out and parallel processing, so it's really good to see it here. But of course we can also use pipelines, so Data Factory within Fabric, and Dataflow Gen2. So there was a comment made, so, you
know, forgive me, I'm just very quickly scrolling through, but there was a comment made about mapping data flows, which are conspicuously absent from Fabric at the moment. Those mapping data flows were basically a visual way of creating Spark commands, really. That's not here. What we do have is the Power Query ETL process in Dataflow Gen2, which obviously has all those source connectors to connect to all your different data sources, but it now has destinations. I think it might have been MIM on Twitter, or it might have been someone else, talking about the fact that Dataflow Gen2, hey, it's an ETL tool now.

Absolutely. So Andy, we have half a minute left, so can we wrap up? For the users out there who are coming from an on-prem SQL warehouse, or are Power BI users or something like that, what are the key takeaways here? What's the key message?

The key message here is that your workloads will be supported. You don't have to worry. I've just taken this from the official Microsoft documentation, which shows that we've got all those data sources; we can ingest them, we can mount them, you know, to get at copies of them. But in our data warehouse we can transform using SQL. We've got our stored procedures, we've got our happy place in terms of being able to use SQL to do our data warehousing, and then downstream we can expose that with Power BI models. So I think the key takeaway is that, yes, there's going to be a lot of discussion about all the different ways and means you can work with Fabric, but if you're coming from a SQL background, it's got you covered.

Absolutely, awesome. So we are able to build an actual SQL data warehouse in Microsoft Fabric. Under the hood it may use other tools and other storage formats and a lot of abstraction, but from our perspective, from our user perspective, it is a SQL-based data
warehouse as we know and love it, and we can build it and work with it with SQL. Amazing. Thank you so much for coming, Andy. It has been an absolute pleasure and a learning experience, insightful as always. If you send me the links, we can share them on the channel and on the event as well.

Yes, will do.

Thank you for joining today, and enjoy the rest of the week working and tweaking more with Microsoft Fabric. And to the viewers out there: we'll have a short lunch break now and be ready for the next session at 1 pm, so that's one in the afternoon, with a visit from Microsoft: glass Amazon, senior program manager from the Power BI CAT team, who will talk about the influence of Direct Lake for Power BI today. So thank you for coming, Andy, and have a nice day, every one of you. Enjoy.

Thank you very much, thank you.
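
The COPY command described in the session can be sketched roughly as follows. This is a minimal illustration, assuming Python is used only to assemble the statement text; nothing is executed against a warehouse. The helper name `build_copy_statement`, the table `dbo.sales`, and the storage URL are hypothetical and not from the session, and only the `FILE_TYPE` option is shown.

```python
# Hypothetical helper: builds the text of a T-SQL COPY INTO statement.
# It does not connect to or run anything against a warehouse.

ALLOWED_FILE_TYPES = {"CSV", "PARQUET"}


def build_copy_statement(table: str, source_url: str, file_type: str = "PARQUET") -> str:
    """Return a COPY INTO statement that loads files at source_url into table."""
    normalized = file_type.upper()
    if normalized not in ALLOWED_FILE_TYPES:
        raise ValueError(f"Unsupported file type: {file_type}")
    # Double any single quotes so the URL stays inside its SQL string literal.
    safe_url = source_url.replace("'", "''")
    # Note: the table name is inserted as-is; this sketch assumes a trusted value.
    return (
        f"COPY INTO {table}\n"
        f"FROM '{safe_url}'\n"
        f"WITH (FILE_TYPE = '{normalized}');"
    )


if __name__ == "__main__":
    # Hypothetical table name and data lake location, for illustration only.
    print(
        build_copy_statement(
            "dbo.sales",
            "https://example.dfs.core.windows.net/landing/sales/*.parquet",
        )
    )
```

The resulting text would then be run in the warehouse's SQL query editor or through a SQL client; authentication and further load options are left out of this sketch.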
