WEBVTT

1
00:00:00.160 --> 00:00:03.040
Welcome to the deep dive. Today, we're going to be

2
00:00:03.080 --> 00:00:07.400
diving into the world of automated machine learning, specifically

3
00:00:07.679 --> 00:00:10.759
on Microsoft's Azure platform. You know, think of it as

4
00:00:10.839 --> 00:00:16.879
AI simplified. And to help us navigate this really complex world,

5
00:00:17.280 --> 00:00:20.399
we have Dennis Sawyers' practical guide to AutoML on Azure.

6
00:00:20.600 --> 00:00:22.320
Yes, this book is great.

7
00:00:22.440 --> 00:00:25.679
So our goal today is to give you

8
00:00:25.719 --> 00:00:29.480
a solid understanding of what AutoML is, what its

9
00:00:29.559 --> 00:00:31.920
benefits are, and then how you can actually use it

10
00:00:31.920 --> 00:00:32.399
on Azure.

11
00:00:32.479 --> 00:00:34.000
Awesome, that's good.

12
00:00:34.000 --> 00:00:37.399
The book starts off with a pretty startling statistic:

13
00:00:37.520 --> 00:00:40.359
eighty-seven percent of AI projects fail.

14
00:00:41.200 --> 00:00:42.560
Wow, that's a lot.

15
00:00:42.759 --> 00:00:45.799
That's a lot of unrealized potential. Yeah, it is, especially

16
00:00:45.799 --> 00:00:48.240
considering the resources that are being poured into this field.

17
00:00:48.399 --> 00:00:50.359
It really is, and it's interesting.

18
00:00:50.439 --> 00:00:52.520
We'll see even more, because the book

19
00:00:52.320 --> 00:00:55.240
digs into why these projects fail.

20
00:00:55.359 --> 00:00:58.960
The book digs into why. And when they fail,

21
00:00:59.000 --> 00:01:02.280
it's not always a dramatic implosion.

22
00:01:01.840 --> 00:01:04.640
Right, it's often a very slow burn. It's like a

23
00:01:04.680 --> 00:01:08.799
slow leak in a spaceship, just slowly draining resources and

24
00:01:08.840 --> 00:01:13.200
hope. Exactly. And as the book details, the traditional

25
00:01:13.239 --> 00:01:17.439
machine learning workflow is incredibly time-consuming and complex, so much

26
00:01:17.400 --> 00:01:18.879
more than just building models.

27
00:01:18.920 --> 00:01:22.040
It's not just building the model, it's the data cleaning,

28
00:01:22.400 --> 00:01:27.840
the feature engineering, and then the dreaded hyperparameter tuning. Oh,

29
00:01:27.879 --> 00:01:30.159
tell me about it. Which can be a real headache.

30
00:01:30.200 --> 00:01:33.680
It's like solving a Rubik's cube blindfolded while riding

31
00:01:33.719 --> 00:01:34.359
a unicycle.

32
00:01:34.480 --> 00:01:35.159
Absolutely.

33
00:01:35.359 --> 00:01:37.840
And then, as the book points out, many data scientists

34
00:01:37.840 --> 00:01:40.640
aren't actually trained in deploying these models in the real world.

35
00:01:40.719 --> 00:01:40.879
Oh,

36
00:01:40.959 --> 00:01:44.719
interesting. So they end up with these really fragile,

37
00:01:44.959 --> 00:01:48.760
cobbled-together solutions that are just waiting to

38
00:01:48.760 --> 00:01:50.719
crumble. Right.

39
00:01:50.680 --> 00:01:53.519
So this is where AutoML comes in,

40
00:01:53.719 --> 00:01:57.599
offering a potential solution to this ROI dilemma that exists

41
00:01:57.599 --> 00:01:58.640
within data science.

42
00:01:58.840 --> 00:02:02.359
Yeah, that ROI problem. Okay, so elevator pitch time. What

43
00:02:02.519 --> 00:02:05.120
is AutoML, and how does it solve this problem?

44
00:02:05.439 --> 00:02:08.599
Okay. Imagine having an AI assistant that takes care of

45
00:02:08.680 --> 00:02:11.800
all of the heavy lifting in that machine learning process. Ooh,

46
00:02:11.879 --> 00:02:16.159
I like that. AutoML can train multiple models simultaneously using

47
00:02:16.159 --> 00:02:20.319
the latest algorithms.
It handles feature engineering, fine-tunes

48
00:02:20.319 --> 00:02:24.400
those pesky hyperparameters, and even offers built-in

49
00:02:24.479 --> 00:02:29.560
explainability features, which are really crucial for building trust and transparency.

50
00:02:29.680 --> 00:02:31.319
Wait, so does that mean data scientists are going to

51
00:02:31.319 --> 00:02:31.879
be out of a job?

52
00:02:32.120 --> 00:02:36.439
Not at all. AutoML actually empowers data scientists. It

53
00:02:36.520 --> 00:02:39.599
frees them up to focus on the higher-level aspects

54
00:02:39.639 --> 00:02:44.479
of AI, like problem definition, strategy, and interpreting those results.

55
00:02:44.599 --> 00:02:48.000
So it's more of a collaboration between human expertise

56
00:02:48.599 --> 00:02:53.599
and AI efficiency. Now, let's talk Azure. Why is Azure

57
00:02:54.280 --> 00:02:56.759
so central to this AutoML story?

58
00:02:56.960 --> 00:03:01.960
So the book focuses on Azure because Microsoft's cloud platform

59
00:03:02.479 --> 00:03:06.240
offers a comprehensive suite of tools for AutoML, and

60
00:03:06.240 --> 00:03:10.439
that's all through the Azure Machine Learning Service, or AMLS, as

61
00:03:10.439 --> 00:03:10.879
it's called.

62
00:03:10.960 --> 00:03:13.560
So it's like our AutoML playground. Exactly.

63
00:03:13.680 --> 00:03:16.479
And the book actually guides us through

64
00:03:16.520 --> 00:03:19.719
setting up an AMLS workspace, and it gets us familiar

65
00:03:19.719 --> 00:03:21.719
with all the features we have at our disposal.

66
00:03:21.960 --> 00:03:22.240
Nice.

67
00:03:22.479 --> 00:03:26.319
One key concept is compute, which is essentially the

68
00:03:26.360 --> 00:03:28.560
engine powering your AutoML jobs.

69
00:03:28.599 --> 00:03:32.360
So compute is like the horsepower behind our AI engine.

70
00:03:32.400 --> 00:03:36.159
That's a great analogy, and the book distinguishes between compute

71
00:03:36.159 --> 00:03:40.199
instances, which are for simpler tasks, and compute clusters

72
00:03:40.199 --> 00:03:43.840
for those more demanding, resource-intensive workloads.

73
00:03:43.479 --> 00:03:46.039
So when you need to kick it into high gear. Exactly.

74
00:03:46.159 --> 00:03:47.479
And here's where it gets interesting.

75
00:03:47.680 --> 00:03:49.360
Okay. These compute

76
00:03:48.960 --> 00:03:53.599
clusters can autoscale, meaning you only

77
00:03:53.639 --> 00:03:57.280
pay for what you use, which is a

78
00:03:57.360 --> 00:04:00.360
major win for budget-conscious projects.

79
00:04:00.400 --> 00:04:04.520
Smart. So we have our workspace, our powerful compute engine.

80
00:04:05.080 --> 00:04:07.240
What about the fuel, the data?

81
00:04:07.520 --> 00:04:12.960
Yes, data is the lifeblood of any AI project. AMLS

82
00:04:13.280 --> 00:04:17.120
actually works with datasets, which act as

83
00:04:17.160 --> 00:04:20.040
pointers to your data sources, whether they reside in a

84
00:04:20.120 --> 00:04:23.879
storage account or a SQL database. And Azure

85
00:04:24.120 --> 00:04:28.079
even provides open datasets for experimentation.

86
00:04:27.480 --> 00:04:29.040
Free data to play with. I like that.

87
00:04:29.240 --> 00:04:32.439
Yeah, like the diabetes dataset that's mentioned in

88
00:04:32.439 --> 00:04:32.759
the book.

89
00:04:32.839 --> 00:04:35.639
So I give AutoML my data and it magically

90
00:04:35.680 --> 00:04:36.800
spins out a perfect model?

91
00:04:37.120 --> 00:04:40.120
Not quite magic, but AutoML does do a lot of

92
00:04:40.160 --> 00:04:42.720
the heavy lifting behind the scenes. So the book explains

93
00:04:43.120 --> 00:04:45.759
how AutoML starts with what are called data

94
00:04:45.480 --> 00:04:48.360
guardrails. Data guardrails?

95
00:04:47.720 --> 00:04:52.319
Which are automated data quality checks that can identify potential

96
00:04:52.360 --> 00:04:53.319
issues right out

97
00:04:53.199 --> 00:04:57.240
of the gate. So bad data equals bad outcomes. Exactly, garbage in,

98
00:04:57.279 --> 00:04:57.879
garbage out.

99
00:04:58.000 --> 00:05:01.519
So once the data passes inspection, what's next? AutoML

100
00:05:01.680 --> 00:05:06.199
kicks into high gear with intelligent feature engineering. So

101
00:05:06.279 --> 00:05:11.279
it handles tasks like dealing with missing values, transforming categorical

102
00:05:11.360 --> 00:05:15.759
variables, for example using one-hot encoding, which the

103
00:05:15.800 --> 00:05:20.040
book explains really well, and it even generates new features. Wow.

104
00:05:20.279 --> 00:05:23.279
And it tailors all of this to the specific algorithms

105
00:05:23.279 --> 00:05:24.079
that it will be using.

106
00:05:24.399 --> 00:05:28.000
So it's not just blindly throwing algorithms at the data,

107
00:05:28.519 --> 00:05:32.399
it's strategically preparing the data for each algorithm. Exactly. That's

108
00:05:32.399 --> 00:05:36.879
pretty impressive. And speaking of algorithms, AutoML

109
00:05:36.920 --> 00:05:38.680
has a whole arsenal at its disposal.

110
00:05:38.759 --> 00:05:40.160
It has a whole arsenal,

111
00:05:40.040 --> 00:05:44.399
ranging from classic regression and classification models to more advanced

112
00:05:44.439 --> 00:05:47.600
techniques like gradient boosting and deep learning. So how does

113
00:05:47.639 --> 00:05:50.639
AutoML decide which algorithm to use? Does it just pick

114
00:05:50.680 --> 00:05:51.519
one at random?

115
00:05:51.600 --> 00:05:56.040
It does not. It systematically tests multiple algorithms in parallel.

116
00:05:56.079 --> 00:05:58.800
Got it.
And while it's doing so, it also fine-tunes

117
00:05:58.839 --> 00:06:02.240
the hyperparameters for each one, so it's searching

118
00:06:02.319 --> 00:06:03.839
for the best-performing combination.

119
00:06:04.199 --> 00:06:08.040
It's like having a team of data scientists working tirelessly

120
00:06:08.120 --> 00:06:12.680
behind the scenes, optimizing every step of the process. Yes.

121
00:06:13.160 --> 00:06:15.040
So does this mean I can just sit back and

122
00:06:15.079 --> 00:06:16.759
wait for the perfect model to appear?

123
00:06:17.120 --> 00:06:20.800
Well, not quite. The book emphasizes that while AutoML

124
00:06:20.920 --> 00:06:23.600
is a powerful tool, it's not a magic bullet.

125
00:06:23.639 --> 00:06:27.040
Okay, fair enough. So where does human judgment come in?

126
00:06:27.519 --> 00:06:31.160
You play a crucial role in defining the problem,

127
00:06:31.319 --> 00:06:35.360
choosing the right evaluation metrics, and, most importantly, interpreting

128
00:06:35.399 --> 00:06:35.959
the results.

129
00:06:36.079 --> 00:06:39.439
So there's still that need for that human touch, that

130
00:06:39.600 --> 00:06:42.319
understanding of the nuances of the problem that we're trying

131
00:06:42.319 --> 00:06:45.399
to solve. Absolutely. And speaking of getting our hands dirty,

132
00:06:45.759 --> 00:06:50.240
the book actually dives into building models using Azure Machine

133
00:06:50.279 --> 00:06:53.800
Learning Studio, which is a visual interface for working

134
00:06:53.800 --> 00:06:57.120
with AMLS. Exactly. It's designed to be user-friendly even

135
00:06:57.160 --> 00:06:58.879
if you're not a coding guru. Yes.

136
00:06:59.079 --> 00:07:02.560
And the book walks us through creating our first classification

137
00:07:02.759 --> 00:07:07.079
model using the classic Titanic passenger data. Ah,

138
00:07:07.319 --> 00:07:11.560
the Titanic dataset, a data science rite of passage.
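
NOTE This is a minimal plain-Python sketch of the one-hot encoding mentioned in the featurization discussion. The input values are hypothetical port-of-embarkation codes, loosely echoing the Titanic example. AutoML performs its own featurization, and this is not its implementation.
```python
def one_hot(values):
    """Encode category labels as 0/1 indicator columns, one column per distinct category."""
    categories = sorted(set(values))  # fixed, reproducible column order
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows
# Hypothetical embarkation codes, one per passenger
cols, encoded = one_hot(["S", "C", "S", "Q"])
print(cols)     # ['C', 'Q', 'S']
print(encoded)  # [[0, 0, 1], [1, 0, 0], [0, 0, 1], [0, 1, 0]]
```
Each row contains exactly one 1. A model can therefore use the category without reading a false numeric order into labels like S, C, and Q.
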

139
00:07:11.120 --> 00:07:15.319
Exactly, you know, predicting who survived and who didn't based

140
00:07:15.360 --> 00:07:19.720
on factors like age, gender, ticket class, and fare. And the

141
00:07:19.759 --> 00:07:24.199
book makes it surprisingly straightforward. You upload the data, tell

142
00:07:24.240 --> 00:07:27.079
AutoML you're doing a classification task, and you let

143
00:07:27.079 --> 00:07:28.000
it work its magic.

144
00:07:28.079 --> 00:07:30.639
And while it's working its magic, we can monitor those

145
00:07:30.720 --> 00:07:34.399
data guardrails and see if any potential issues are flagged.

146
00:07:34.759 --> 00:07:37.560
Yeah, it's like having a built-in data quality watchdog.

147
00:07:37.959 --> 00:07:38.600
I like that.

148
00:07:38.800 --> 00:07:42.279
Now, once AutoML finishes training its set of models, it's

149
00:07:42.279 --> 00:07:44.000
time to put on our evaluation hats.

150
00:07:44.079 --> 00:07:45.759
So we've got a bunch of models. What do

151
00:07:45.839 --> 00:07:46.360
we do with them?

152
00:07:46.480 --> 00:07:51.600
AutoML gives us a smorgasbord of metrics: accuracy, precision, recall,

153
00:07:51.839 --> 00:07:56.199
and it gives us those helpful confusion matrices to visualize

154
00:07:56.199 --> 00:07:57.720
how each model is performing.

155
00:07:57.920 --> 00:08:01.480
I'll be honest, sometimes those metrics can feel a little overwhelming,

156
00:08:01.800 --> 00:08:03.199
especially if you're new to machine learning.

157
00:08:03.240 --> 00:08:03.600
I hear you.

158
00:08:03.680 --> 00:08:07.079
There are just so many numbers. The book clearly

159
00:08:07.120 --> 00:08:10.720
explains each metric and helps us understand which

160
00:08:10.759 --> 00:08:14.480
ones are most important for different types of problems.

161
00:08:14.519 --> 00:08:17.959
And remember those explainability features? We can use them to

162
00:08:18.040 --> 00:08:23.120
understand why a model is making certain predictions,

163
00:08:23.399 --> 00:08:26.040
which is crucial for building trust and transparency.

164
00:08:26.199 --> 00:08:28.920
So it's not just blindly trusting the numbers. It's about

165
00:08:29.000 --> 00:08:30.800
understanding the reasoning behind them.

166
00:08:30.920 --> 00:08:31.480
Absolutely.

167
00:08:32.080 --> 00:08:34.919
Now, let's say we've found a model that looks pretty

168
00:08:34.919 --> 00:08:38.679
promising based on the metrics and the explainability checks. What

169
00:08:38.759 --> 00:08:40.759
happens next?

170
00:08:40.679 --> 00:08:44.279
Then it's time to deploy. We need to make our

171
00:08:44.360 --> 00:08:48.600
model available for use, either for batch processing of

172
00:08:48.679 --> 00:08:52.000
large datasets or for real-time predictions.

173
00:08:52.120 --> 00:08:53.879
So we're taking our model out of the lab and

174
00:08:53.919 --> 00:08:55.039
putting it into the real world.

175
00:08:55.159 --> 00:08:55.679
Exactly.

176
00:08:55.759 --> 00:08:58.639
But we've mainly been talking about classification models.

177
00:08:58.759 --> 00:09:01.559
What about problems where you need to predict a numerical

178
00:09:01.639 --> 00:09:03.360
value instead of a category?

179
00:09:03.480 --> 00:09:07.360
The book covers that too. It dives into building regression

180
00:09:07.399 --> 00:09:11.440
models using AutoML on Azure. It uses the diabetes

181
00:09:11.559 --> 00:09:15.000
dataset to predict the progression of the disease

182
00:09:15.240 --> 00:09:16.480
based on different factors.
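
NOTE The search described earlier can be sketched as a plain exhaustive sweep on a tiny regression problem. In that search, AutoML tries several algorithms, tunes their hyperparameters, and keeps the best-performing combination. The data, the two candidate algorithms (a mean baseline and k-nearest neighbours over several k), and scoring by validation MAE are all illustrative assumptions. AutoML's real search is smarter than trying every option.
```python
from statistics import mean
def mean_model(train):
    """Baseline: always predict the average training target."""
    avg = mean(y for _, y in train)
    return lambda x: avg
def knn_model(train, k):
    """k-nearest-neighbour regression on a single numeric feature."""
    def predict(x):
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        return mean(y for _, y in nearest)
    return predict
def validation_mae(model, valid):
    """Mean absolute error of a fitted model on held-out (x, y) pairs."""
    return mean(abs(model(x) - y) for x, y in valid)
def sweep(train, valid):
    """Score every (algorithm, hyperparameter) candidate and return the lowest-error one."""
    candidates = [("mean", lambda: mean_model(train))]
    candidates += [(f"knn k={k}", lambda k=k: knn_model(train, k)) for k in (1, 2, 3)]
    scores = {name: validation_mae(build(), valid) for name, build in candidates}
    return min(scores, key=scores.get), scores
# Hypothetical (feature, target) pairs
train = [(1, 2.0), (2, 4.1), (3, 6.0), (4, 8.2), (5, 9.9)]
valid = [(1.5, 3.0), (4.5, 9.0)]
best, scores = sweep(train, valid)
print(best)  # knn k=2
```
Real AutoML systems search more cleverly and usually cross-validate. The underlying idea of scoring every candidate and keeping the best one is the same.
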

183
00:09:16.559 --> 00:09:20.360
So regression is about predicting numbers, right, like stock prices

184
00:09:20.480 --> 00:09:24.679
or sales figures, or in this case, the severity of

185
00:09:24.720 --> 00:09:27.720
a medical condition. Exactly. And while the process is

186
00:09:27.759 --> 00:09:31.759
similar to classification, obviously the metrics and the algorithms used

187
00:09:31.759 --> 00:09:35.080
are different, but AutoML still handles all that heavy lifting.

188
00:09:35.200 --> 00:09:37.440
It does. But we need to use the right tool

189
00:09:37.519 --> 00:09:40.360
for the job. Absolutely. I bet there are some tips

190
00:09:40.399 --> 00:09:42.720
and tricks for getting the best performance out of AutoML

191
00:09:42.879 --> 00:09:44.879
for these regression problems.

192
00:09:45.080 --> 00:09:48.120
There are, and the book offers some great insights.

193
00:09:48.200 --> 00:09:49.200
Okay, lay it on me.

194
00:09:49.600 --> 00:09:53.840
Sometimes converting a regression problem into a classification problem can

195
00:09:53.879 --> 00:09:55.279
actually improve performance.

196
00:09:55.600 --> 00:09:57.879
Hold on, how do you convert a problem about predicting

197
00:09:57.919 --> 00:10:01.360
a number into a problem about predicting a category?

198
00:10:01.480 --> 00:10:05.000
It's all about binning. You divide the

199
00:10:05.120 --> 00:10:09.279
range of possible numerical values into categories, or bins.

200
00:10:09.360 --> 00:10:12.600
So instead of predicting the exact price of a house,

201
00:10:12.879 --> 00:10:15.559
you might predict whether it falls into a low, medium,

202
00:10:15.639 --> 00:10:16.759
or high price range.

203
00:10:16.960 --> 00:10:20.360
It's like simplifying the problem to help AutoML find patterns

204
00:10:20.360 --> 00:10:22.840
more easily. Exactly. Okay, what other tips does the

205
00:10:22.840 --> 00:10:23.360
book offer?
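
NOTE The binning trick described above can be sketched with Python's standard bisect module. Each numeric target is mapped to a class label before the data goes to a classification run. The prices and cut points are hypothetical. In practice the edges would come from domain knowledge or from quantiles of your own data.
```python
import bisect
def bin_target(values, edges, labels):
    """Map numeric targets to class labels.
    edges are sorted cut points; a value equal to a cut point goes to the higher bin."""
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly one more label than cut points")
    return [labels[bisect.bisect_right(edges, v)] for v in values]
# Hypothetical house prices and price-range cut points
prices = [95_000, 180_000, 250_000, 410_000]
print(bin_target(prices, [150_000, 300_000], ["low", "medium", "high"]))
# ['low', 'medium', 'medium', 'high']
```
Coarser bins make the task easier to learn but discard precision, so the number of bins is itself worth experimenting with.
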

206
00:10:23.519 --> 00:10:28.440
It emphasizes the importance of experimenting with different primary metrics. So,

207
00:10:28.600 --> 00:10:31.279
for example, you might find that a metric like mean

208
00:10:31.399 --> 00:10:36.159
absolute error, or MAE, gives you a more meaningful evaluation than

209
00:10:36.200 --> 00:10:39.320
something like root mean squared error, or RMSE.

210
00:10:39.720 --> 00:10:41.559
Right, so it all depends on what you're trying to

211
00:10:41.600 --> 00:10:42.519
achieve with your model.

212
00:10:42.639 --> 00:10:43.039
It does.

213
00:10:43.240 --> 00:10:45.840
Okay, so there's still a lot of room for human

214
00:10:45.919 --> 00:10:49.960
judgment and decision-making, even with AutoML handling so much

215
00:10:49.960 --> 00:10:53.080
of that complexity. Now I have to ask, what about

216
00:10:53.080 --> 00:10:55.120
those situations where you need to train not just one

217
00:10:55.200 --> 00:10:56.559
or two models, but hundreds,

218
00:10:57.120 --> 00:10:58.360
Many models.

219
00:10:57.960 --> 00:10:59.600
maybe even thousands of models?

220
00:10:59.720 --> 00:11:03.000
That's where the book really shines. It introduces a tool called

221
00:11:03.000 --> 00:11:08.200
the Many Models Solution Accelerator, or MMSA. Think

222
00:11:08.200 --> 00:11:10.919
of it as an AutoML factory. You feed it

223
00:11:11.080 --> 00:11:14.360
your data, and it automatically splits it up

224
00:11:14.399 --> 00:11:18.960
based on certain criteria, like different stores, products, or regions, and

225
00:11:19.000 --> 00:11:22.480
then it uses the power of AutoML to train

226
00:11:22.639 --> 00:11:26.320
a separate model for each of those subsets, all running

227
00:11:26.360 --> 00:11:27.039
in parallel.

228
00:11:27.519 --> 00:11:28.720
That's a lot of models.

229
00:11:28.799 --> 00:11:29.639
It is a lot of models.

230
00:11:29.679 --> 00:11:32.320
So why would someone need to create so many models?
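
NOTE A small worked comparison shows why the choice between MAE and RMSE as a primary metric matters. Both sets of predictions are invented and have the same MAE. RMSE squares each error before averaging, so it doubles for the set with one large miss.
```python
import math
def mae(actual, predicted):
    """Mean absolute error: every unit of error counts the same."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
def rmse(actual, predicted):
    """Root mean squared error: squaring makes large errors dominate."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))
actual = [10, 10, 10, 10]
steady = [12, 8, 12, 8]      # every prediction off by 2
one_miss = [10, 10, 10, 18]  # three exact, one off by 8
print(mae(actual, steady), rmse(actual, steady))      # 2.0 2.0
print(mae(actual, one_miss), rmse(actual, one_miss))  # 2.0 4.0
```
If an occasional large miss is costly, RMSE reflects that. If all errors matter equally, MAE is the more direct summary.
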

231
00:11:32.360 --> 00:11:36.759
It's incredibly useful when you need granular, customized models.

232
00:11:36.799 --> 00:11:37.320
Got it.

233
00:11:37.879 --> 00:11:40.480
The book gives an example of a retail chain that

234
00:11:40.519 --> 00:11:44.120
wants to forecast demand for each product in each store.

235
00:11:44.679 --> 00:11:48.679
By using MMSA, they can create thousands of custom models

236
00:11:48.720 --> 00:11:52.679
that account for all the unique factors that might influence sales.

237
00:11:53.000 --> 00:11:55.240
Wow. So it's like hyper-personalized AI.

238
00:11:55.799 --> 00:11:56.240
It is.

239
00:11:56.399 --> 00:11:57.200
That's incredible.

240
00:11:57.440 --> 00:11:58.759
I imagine there are things to keep in

241
00:11:58.639 --> 00:12:01.039
mind, right? There have to be some catches.

242
00:12:01.159 --> 00:12:03.679
One of the key points the book highlights is choosing

243
00:12:03.720 --> 00:12:05.200
the right partition columns.

244
00:12:05.759 --> 00:12:06.200
Partition columns?

245
00:12:06.200 --> 00:12:08.039
These are the criteria that you use to divide your

246
00:12:08.080 --> 00:12:11.320
data into those subsets. You need to think carefully about

247
00:12:11.360 --> 00:12:15.039
which factors are most likely to influence the target variable

248
00:12:15.080 --> 00:12:16.200
that you're trying to predict.

249
00:12:16.639 --> 00:12:19.000
So it's like making sure you're slicing and dicing your

250
00:12:19.080 --> 00:12:22.039
data along the right lines so that the models you

251
00:12:22.120 --> 00:12:25.799
create are actually meaningful and relevant. Precisely. Now, once you've

252
00:12:25.799 --> 00:12:29.039
trained all these models using the MMSA, you still need

253
00:12:29.080 --> 00:12:31.480
to deploy them. You do, and that can get even

254
00:12:31.519 --> 00:12:35.480
more complex when you're dealing with thousands of models.

255
00:12:35.639 --> 00:12:37.840
Instead of just one. It does.
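
NOTE The partition-column idea can be sketched in plain Python: group rows by the chosen columns, then fit one model per group. The column names and figures are hypothetical. Each per-group "model" is just an average, so the grouping logic stays visible. The Many Models Solution Accelerator instead trains a full AutoML model for every partition and runs them in parallel.
```python
from collections import defaultdict
from statistics import mean
def train_per_partition(rows, partition_cols, target):
    """Group rows by their partition-column values and fit one stand-in model per group."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[c] for c in partition_cols)  # e.g. ("A", "apples")
        groups[key].append(row[target])
    # Stand-in "model": the group's average target, used as its forecast
    return {key: mean(values) for key, values in groups.items()}
sales = [
    {"store": "A", "product": "apples", "units": 10},
    {"store": "A", "product": "apples", "units": 14},
    {"store": "A", "product": "pears", "units": 3},
    {"store": "B", "product": "apples", "units": 7},
]
models = train_per_partition(sales, ["store", "product"], "units")
print(len(models))  # 3
print(models[("A", "apples")])  # forecast for store A, apples (average of 10 and 14)
```
Partitioning on store and product yields one model per store-product pair. Partitioning on store alone would give fewer, coarser models, which is why the choice of partition columns deserves care.
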

256
00:12:38.320 --> 00:12:39.759
So how do you manage all of that?

257
00:12:40.360 --> 00:12:43.440
The book really stresses the importance of automation here.

258
00:12:43.519 --> 00:12:45.759
It introduces a tool called Azure Data

259
00:12:45.559 --> 00:12:47.480
Factory, which can

260
00:12:47.360 --> 00:12:52.240
help orchestrate complex data flows, connect to various data sources,

261
00:12:52.519 --> 00:12:55.840
transform data, and even trigger those ML pipelines that we

262
00:12:55.840 --> 00:12:56.720
talked about earlier.

263
00:12:56.759 --> 00:12:59.240
So if those ML pipelines are like assembly lines for

264
00:12:59.279 --> 00:13:02.519
building our models, then Azure Data Factory is like

265
00:13:02.639 --> 00:13:05.360
the logistics manager, making sure all the raw materials and

266
00:13:05.399 --> 00:13:08.600
finished products are flowing smoothly. Perfect. And in addition to

267
00:13:08.600 --> 00:13:11.159
all this, the book is packed with helpful tips and

268
00:13:11.159 --> 00:13:13.799
tricks for getting the most out of AutoML,

269
00:13:14.120 --> 00:13:16.600
regardless of the type of problem that you're tackling.

270
00:13:16.720 --> 00:13:19.960
I love that practical advice. It's like having a seasoned

271
00:13:19.960 --> 00:13:23.399
AutoML expert looking over your shoulder, guiding you

272
00:13:23.440 --> 00:13:24.600
along the way. Exactly.

273
00:13:24.799 --> 00:13:28.799
And the book really encourages a spirit of experimentation.

274
00:13:28.879 --> 00:13:31.200
It wants you to be a data detective,

275
00:13:31.279 --> 00:13:33.720
to try different approaches and see what works

276
00:13:33.720 --> 00:13:34.919
best for your situation.

277
00:13:35.720 --> 00:13:38.480
That's what makes data science so exciting. It's not about

278
00:13:38.480 --> 00:13:44.000
blindly following rules.
It's about exploring, discovering, and finding creative solutions.

279
00:13:44.120 --> 00:13:44.879
Absolutely.

280
00:13:45.120 --> 00:13:47.279
We've talked a lot about Azure Machine Learning Studio,

281
00:13:47.320 --> 00:13:51.519
which has that visual, user-friendly interface, but for those

282
00:13:51.559 --> 00:13:55.000
who prefer to work with code, the book also covers

283
00:13:55.000 --> 00:13:58.559
the Azure Machine Learning SDK for Python. It does. So

284
00:13:58.879 --> 00:14:01.039
for those who are comfortable coding, there's a way to

285
00:14:01.039 --> 00:14:04.320
get even more control and flexibility over that AutoML

286
00:14:04.440 --> 00:14:07.399
process. There is, and the book walks us through

287
00:14:07.639 --> 00:14:11.080
using Jupyter notebooks within Azure Machine Learning, which is a

288
00:14:11.240 --> 00:14:15.320
very popular way to write and execute Python code for

289
00:14:15.440 --> 00:14:18.320
data science tasks. It is, and you can actually use

290
00:14:18.440 --> 00:14:22.240
those powerful compute clusters that we talked about earlier to

291
00:14:22.399 --> 00:14:25.639
run your code in the cloud, giving you access

292
00:14:25.679 --> 00:14:27.159
to tons of processing power.

293
00:14:27.240 --> 00:14:27.879
Absolutely.

294
00:14:28.240 --> 00:14:31.360
It really is amazing how cloud computing has made these

295
00:14:31.399 --> 00:14:35.519
really complex tasks so much more accessible. It has. Now

296
00:14:35.559 --> 00:14:38.200
I'm curious about the different ways that you can actually

297
00:14:38.240 --> 00:14:41.879
deploy models once they're trained. We talked about batch scoring

298
00:14:41.919 --> 00:14:44.559
and real-time scoring, but the book also mentioned

299
00:14:44.559 --> 00:14:46.159
something called ML pipelines.

300
00:14:46.440 --> 00:14:50.120
ML pipelines are a fantastic way to automate your entire

301
00:14:50.240 --> 00:14:54.639
machine learning workflow, from data preprocessing and feature engineering

302
00:14:54.919 --> 00:14:56.919
to model training and deployment.

303
00:14:57.240 --> 00:14:59.440
So it's like having an assembly line for your AI,

304
00:15:00.279 --> 00:15:02.799
ensuring that each step is executed in the right order,

305
00:15:03.080 --> 00:15:07.440
with the right settings. Absolutely. Automation, efficiency, consistency.

306
00:15:07.440 --> 00:15:09.559
It sounds like a well-oiled machine. Exactly.

307
00:15:09.600 --> 00:15:12.159
And to take it a step further, the book introduces

308
00:15:12.240 --> 00:15:13.759
us to Azure Data

309
00:15:13.559 --> 00:15:15.440
Factory.

310
00:15:15.519 --> 00:15:19.159
This is a cloud-based data integration service that

311
00:15:19.200 --> 00:15:23.159
can handle even more complex data flows, connecting to different

312
00:15:23.240 --> 00:15:26.679
data sources, transforming your data, and even

313
00:15:26.720 --> 00:15:28.519
triggering your ML pipelines.

314
00:15:28.720 --> 00:15:31.279
So if ML pipelines are the assembly lines, then Azure

315
00:15:31.320 --> 00:15:34.679
Data Factory is like the logistics manager, making sure that

316
00:15:34.759 --> 00:15:38.639
all the raw materials and the finished products are flowing smoothly.

317
00:15:38.799 --> 00:15:39.879
A great way to put it.

318
00:15:40.000 --> 00:15:42.559
Now, we've covered a lot of ground here, from the

319
00:15:42.679 --> 00:15:46.320
really nitty-gritty details of data preparation and feature engineering

320
00:15:46.679 --> 00:15:49.960
to the broader concepts of model deployment and automation. We have.

321
00:15:50.159 --> 00:15:53.840
But let's not forget about one really crucial aspect: the

322
00:15:54.080 --> 00:15:58.120
human element.
Building a model is only part of the story.

323
00:15:58.159 --> 00:16:02.320
What about gaining the trust and buy-in of the

324
00:16:02.360 --> 00:16:04.320
people who will actually be using these models?

325
00:16:04.440 --> 00:16:06.279
That's a great point. You know, we can build the

326
00:16:06.320 --> 00:16:09.320
most sophisticated AI in the world, but if people don't

327
00:16:09.360 --> 00:16:11.279
trust it or understand how

328
00:16:11.120 --> 00:16:12.480
it works, what good is it?

329
00:16:12.200 --> 00:16:14.799
It's not going to be very useful, right? And that's

330
00:16:14.840 --> 00:16:19.200
why those explainability features we keep talking about are so important.

331
00:16:19.799 --> 00:16:24.799
Extremely. The book emphasizes the need to clearly articulate how

332
00:16:24.840 --> 00:16:28.320
a model is making decisions, especially in industries that have

333
00:16:28.399 --> 00:16:31.159
strict regulations, or where the stakes

334
00:16:30.840 --> 00:16:34.240
are high, right, like healthcare or finance. Exactly. So it's

335
00:16:34.279 --> 00:16:37.039
not enough to just say, "The computer says this is

336
00:16:37.080 --> 00:16:39.399
the best course of action." We need to be able

337
00:16:39.399 --> 00:16:42.000
to back that up with insights and evidence.

338
00:16:41.639 --> 00:16:44.519
Absolutely, and AutoML gives us the tools to do

339
00:16:44.759 --> 00:16:48.200
just that. You can use feature importance scores to see

340
00:16:48.200 --> 00:16:51.480
which factors are driving the model's predictions. You can even

341
00:16:51.559 --> 00:16:55.039
drill down into individual predictions to understand why the model

342
00:16:55.080 --> 00:16:56.519
made a specific decision.

343
00:16:56.840 --> 00:16:58.879
So it's like having a transparent AI where we can

344
00:16:58.919 --> 00:17:02.159
peek under the hood and see what's going on. Exactly.

345
00:17:02.200 --> 00:17:04.200
Now, before we wrap up this part of our deep dive,

346
00:17:05.079 --> 00:17:07.119
I want to highlight something that really stood out to

347
00:17:07.160 --> 00:17:09.720
me while reading the book. It mentions that AutoML

348
00:17:09.799 --> 00:17:13.279
can actually be used in other Microsoft products besides Azure

349
00:17:13.359 --> 00:17:16.359
Machine Learning Studio. Oh, so it's not just limited

350
00:17:16.359 --> 00:17:17.759
to this one platform? Not at all.

351
00:17:18.119 --> 00:17:21.079
The book talks about integrating AutoML with tools like

352
00:17:21.240 --> 00:17:26.279
Power BI, which is a powerful data visualization and business

353
00:17:26.319 --> 00:17:27.960
intelligence platform. So you

354
00:17:27.920 --> 00:17:32.079
could create these interactive dashboards that not only display the data

355
00:17:32.279 --> 00:17:35.960
but use AutoML to generate predictions and insights.

356
00:17:36.359 --> 00:17:40.079
That's next level. That takes data storytelling to a whole

357
00:17:40.279 --> 00:17:42.960
new level. It does. And the book also touches on

358
00:17:43.359 --> 00:17:47.359
using AutoML with Azure Synapse Analytics, which is a

359
00:17:47.480 --> 00:17:50.680
cloud-based data warehousing and analytics

360
00:17:50.240 --> 00:17:53.880
service. Exactly, and this opens up even more possibilities

361
00:17:53.920 --> 00:17:57.559
for working with massive datasets and building these

362
00:17:57.599 --> 00:17:59.160
enterprise-scale AI solutions.

363
00:17:59.200 --> 00:18:01.799
It sounds like the possibilities are pretty much endless. They are.

364
00:18:02.079 --> 00:18:04.640
We've covered so much ground, but it feels like we've

365
00:18:04.640 --> 00:18:06.759
only just begun to scratch the surface of what AutoML

366
00:18:06.839 --> 00:18:07.880
on Azure can do.

367
00:18:08.039 --> 00:18:09.599
We've just scratched the surface.

368
00:18:09.799 --> 00:18:11.480
So as we move into the next part of our

369
00:18:11.480 --> 00:18:15.880
deep dive, I'm curious: where should someone start if

370
00:18:15.880 --> 00:18:18.680
they want to dive in and explore this world of

371
00:18:18.759 --> 00:18:20.559
AutoML on Azure?

372
00:18:20.799 --> 00:18:24.359
Well, this book we've been discussing is an excellent starting point.

373
00:18:24.519 --> 00:18:27.319
Dennis Sawyers has done a really great job of creating

374
00:18:27.359 --> 00:18:30.319
a practical, easy-to-follow guide. I agree, packed

375
00:18:30.359 --> 00:18:35.240
with real-world examples, code snippets, and tons of helpful advice.

376
00:18:35.119 --> 00:18:38.799
And there are tons of online resources, including Microsoft's own

377
00:18:38.880 --> 00:18:43.920
documentation and tutorials. Exactly. Plus, the Azure community is

378
00:18:44.000 --> 00:18:45.640
incredibly active and supportive.

379
00:18:45.720 --> 00:18:46.039
It is.

380
00:18:46.119 --> 00:18:48.440
It's great. So if you have questions or get stuck,

381
00:18:48.880 --> 00:18:51.519
you'll find plenty of people willing to help. You will.

382
00:18:51.799 --> 00:18:54.640
I think the key takeaway here is that AutoML is

383
00:18:54.680 --> 00:18:57.079
not some futuristic concept.

384
00:18:57.279 --> 00:18:57.839
No, it's not.

385
00:18:58.240 --> 00:18:59.920
It's here, it's here now.

386
00:19:00.079 --> 00:19:01.599
It's now. A powerful tool.

387
00:19:01.720 --> 00:19:03.920
It's a powerful tool that's available right now.

388
00:19:03.799 --> 00:19:05.960
And it's making AI more accessible

389
00:19:05.559 --> 00:19:08.039
than ever before. And with the power of AutoML

390
00:19:08.160 --> 00:19:13.039
on Azure, anyone can become an AI innovator. So to

391
00:19:13.200 --> 00:19:17.720
our listener, we challenge you: what problem will you solve

392
00:19:18.039 --> 00:19:20.960
with AutoML?
That's a great question. That's something to

393
00:19:21.000 --> 00:19:23.319
think about. Think about it. We'll be back in just

394
00:19:23.359 --> 00:19:25.599
a moment with part two of our deep dive into

395
00:19:25.599 --> 00:19:29.720
AutoML on Azure. Welcome back to our deep dive into

396
00:19:29.759 --> 00:19:33.319
AutoML on Azure. Now, before the break, we were

397
00:19:33.319 --> 00:19:35.599
talking about how AutoML can be used in other

398
00:19:35.680 --> 00:19:39.519
Microsoft products besides just Azure Machine Learning Studio.

399
00:19:40.079 --> 00:19:42.680
So it's not just limited to that one platform? Not

400
00:19:42.799 --> 00:19:44.079
at all. Okay, so tell me more.

401
00:19:44.240 --> 00:19:47.119