WEBVTT 1 00:00:00.040 --> 00:00:04.480 Hey, everyone, welcome to the deep dive. So data-driven 2 00:00:04.480 --> 00:00:06.280 decisions are pretty much everywhere 3 00:00:05.879 --> 00:00:08.919 now, right? Absolutely, and machine learning is, you know, a 4 00:00:08.960 --> 00:00:10.199 massive part of that picture. 5 00:00:10.359 --> 00:00:13.320 It really is. But let's be honest. Jumping into machine 6 00:00:13.359 --> 00:00:16.719 learning, it can still feel like, well, like a whole 7 00:00:16.760 --> 00:00:19.199 new world to navigate for a lot of folks. Definitely. 8 00:00:19.280 --> 00:00:25.440 What if there was something powerful, really versatile, and genuinely accessible, 9 00:00:25.760 --> 00:00:27.079 especially if you're already working 10 00:00:26.920 --> 00:00:29.120 in the .NET world? And that's exactly what we're 11 00:00:29.120 --> 00:00:32.119 digging into today. We're doing a deep dive into 12 00:00:32.200 --> 00:00:35.479 ML.NET. That's Microsoft's machine learning framework, right? 13 00:00:35.679 --> 00:00:37.880 Our mission here is basically to pull out the most 14 00:00:37.920 --> 00:00:41.600 important stuff from the, well, the comprehensive guide on it. 15 00:00:41.880 --> 00:00:43.280 We want to give you a real shortcut. 16 00:00:43.359 --> 00:00:47.079 Yeah, it gets you well informed on building, optimizing, and 17 00:00:47.520 --> 00:00:50.560 deploying machine learning models using ML.NET. Exactly. 18 00:00:50.600 --> 00:00:52.960 You'll get the basics, see how it's actually used, understand 19 00:00:53.000 --> 00:00:55.520 the steps even if you're not, you know, a machine 20 00:00:55.600 --> 00:00:56.119 learning guru. 21 00:00:56.240 --> 00:00:59.079 Right. Now, so where did it even come from? Let's 22 00:00:59.119 --> 00:01:01.320 start there. ML.NET, it actually started life as 23 00:01:01.320 --> 00:01:02.799 an internal thing at Microsoft.
24 00:01:03.200 --> 00:01:06.079 Oh, interesting, like an in-house tool? Exactly. 25 00:01:05.840 --> 00:01:09.079 Which kind of shows you how central machine learning was 26 00:01:09.079 --> 00:01:10.680 becoming to them even back then. 27 00:01:10.920 --> 00:01:13.959 And what's really compelling is that then, in 2018, 28 00:01:14.519 --> 00:01:17.560 they did something pretty big. They released it as open source, 29 00:01:18.079 --> 00:01:21.319 which wasn't just, like, dropping code. It was really an invitation, 30 00:01:21.480 --> 00:01:21.959 wasn't it? 31 00:01:21.959 --> 00:01:26.640 It really was. It means ML.NET has grown massively, 32 00:01:26.840 --> 00:01:29.920 not just from Microsoft engineers, but the whole developer community 33 00:01:30.000 --> 00:01:30.680 chipping in. 34 00:01:30.640 --> 00:01:34.719 That open-source approach, that brings transparency, collective brainpower. Yeah, 35 00:01:34.799 --> 00:01:36.480 makes it really solid. Totally. 36 00:01:36.640 --> 00:01:39.599 And the real power, I think, comes down to how 37 00:01:39.920 --> 00:01:42.040 accessible and versatile it is. It's not just built for 38 00:01:42.120 --> 00:01:45.359 one specific kind of problem. It handles a whole bunch 39 00:01:45.400 --> 00:01:50.159 of ML tasks, you know, classification, regression, clustering, even those 40 00:01:50.200 --> 00:01:54.439 complex recommendation systems. It's like a full AI toolkit. 41 00:01:54.079 --> 00:01:56.560 And it's platform independent, runs on Windows, 42 00:01:56.519 --> 00:01:58.400 Linux, macOS. Yep, works wherever you work. 43 00:01:58.480 --> 00:02:00.680 But the real kicker, I think, especially for listeners who 44 00:02:00.680 --> 00:02:03.120 are .NET devs, is that deep integration. 45 00:02:03.519 --> 00:02:04.280 Oh, absolutely.
46 00:02:04.280 --> 00:02:06.480 If you're already working in C#, F#, maybe 47 00:02:06.560 --> 00:02:10.039 VB.NET, you use the tools you know, the libraries 48 00:02:10.080 --> 00:02:13.439 you know. That barrier to entry just drops way down. 49 00:02:13.639 --> 00:02:16.280 It does, and that adaptability, it means you can stick 50 00:02:16.319 --> 00:02:19.919 machine learning smarts right into your desktop apps, web apps, mobile 51 00:02:19.639 --> 00:02:22.960 apps. So it's not just for dedicated ML engineers anymore. Exactly. 52 00:02:23.000 --> 00:02:25.400 It kind of democratizes it within the .NET space. 53 00:02:25.680 --> 00:02:27.800 Any .NET dev can start using AI. 54 00:02:28.479 --> 00:02:31.360 Okay, so inevitably people will ask, how does it compare, 55 00:02:32.479 --> 00:02:35.479 you know, to other frameworks out there? Like, scikit-learn 56 00:02:35.599 --> 00:02:38.000 is huge in the Python world, right? It is. 57 00:02:37.919 --> 00:02:41.560 And it's fantastic. Scikit-learn offers a really comprehensive set 58 00:02:41.599 --> 00:02:44.120 of tools for Python developers. 59 00:02:43.639 --> 00:02:44.120 No doubt. 60 00:02:44.280 --> 00:02:47.080 But ML.NET, well, ML.NET really shines in its niche, 61 00:02:47.120 --> 00:02:51.039 which is accessibility and just ease of use, especially if 62 00:02:51.039 --> 00:02:53.520 you're already comfortable in .NET. The big advantage is 63 00:02:53.599 --> 00:02:57.479 working where you already live, basically. Which leads to the question, right, 64 00:02:57.560 --> 00:02:59.879 why go learn a whole new language and toolchain 65 00:03:00.080 --> 00:03:02.199 if you can bring ML right into your existing C# 66 00:03:02.360 --> 00:03:03.560 projects, say? 67 00:03:03.400 --> 00:03:05.560 Exactly, leverage the skills you've already got. 68 00:03:05.759 --> 00:03:07.599 That makes a ton of sense. Using what you know 69 00:03:07.759 --> 00:03:12.439 is always faster.
Okay, but maybe stepping back for a 70 00:03:12.439 --> 00:03:17.639 second, for someone who's heard of machine learning but isn't totally clear, yeah, 71 00:03:17.719 --> 00:03:20.159 can we break down the basics? What is it? And 72 00:03:20.199 --> 00:03:22.319 then how does ML.NET help? 73 00:03:22.479 --> 00:03:26.120 Yeah, good idea. So, at its heart, machine learning is 74 00:03:26.120 --> 00:03:30.680 about letting computer systems learn from data without you having 75 00:03:30.680 --> 00:03:33.319 to explicitly program every single rule. 76 00:03:33.479 --> 00:03:37.159 Okay, so they learn how? Like recognizing patterns? 77 00:03:37.280 --> 00:03:41.039 Yeah, making predictions? Precisely. Recognizing patterns, making predictions, even getting 78 00:03:41.080 --> 00:03:42.719 better over time based on the data they 79 00:03:42.639 --> 00:03:46.000 see. Which naturally brings up: what are they learning from? 80 00:03:46.080 --> 00:03:47.879 That probably needs a few key terms. 81 00:03:47.639 --> 00:03:50.280 Right, it does. So first you've got data. That's just 82 00:03:50.360 --> 00:03:53.080 the raw stuff: text, images, numbers, whatever 83 00:03:52.840 --> 00:03:54.840 you're working with. Okay, data, got it. 84 00:03:55.080 --> 00:03:57.439 Then within that data you have features. These are the 85 00:03:57.439 --> 00:04:01.400 specific measurable characteristics or attributes the model looks at. 86 00:04:01.560 --> 00:04:04.039 Like if you're predicting house prices, the features might be 87 00:04:04.080 --> 00:04:05.159 square footage, number of 88 00:04:05.199 --> 00:04:08.599 bedrooms. Exactly, location, age of the house, those kinds of things, 89 00:04:08.639 --> 00:04:09.840 things that influence the outcome. 90 00:04:09.960 --> 00:04:11.439 Okay. And then labels? 91 00:04:11.759 --> 00:04:14.840 Labels are the answer you're trying to predict, the target variable. 92 00:04:15.240 --> 00:04:18.480 So in spam detection, the label is spam or not.
93 00:04:18.480 --> 00:04:21.160 Spam, simple enough. And the models 94 00:04:20.759 --> 00:04:24.800 are? The models are the algorithms themselves. They're trained on 95 00:04:24.839 --> 00:04:27.319 the data to learn the relationship between the features and 96 00:04:27.360 --> 00:04:30.399 the labels. They're kind of the brain that makes the prediction. 97 00:04:30.759 --> 00:04:35.839 Gotcha. So data, features, labels, models: the building 98 00:04:35.439 --> 00:04:38.480 blocks. Right, and these models learn in different ways. The 99 00:04:38.519 --> 00:04:41.639 main paradigm is probably supervised learning. 100 00:04:41.879 --> 00:04:44.600 Supervised meaning it has the answers already? 101 00:04:44.319 --> 00:04:47.439 Kind of. It learns from labeled data, data where you 102 00:04:47.480 --> 00:04:50.920 already know the correct output, the label. This is super 103 00:04:50.920 --> 00:04:52.199 common for tasks like 104 00:04:52.199 --> 00:04:54.759 classification, assigning categories. 105 00:04:54.240 --> 00:04:57.480 Right, or regression, where you're predicting a continuous number like 106 00:04:57.519 --> 00:04:57.959 a price. 107 00:04:58.160 --> 00:04:59.079 Okay, what else? 108 00:04:59.560 --> 00:05:03.600 Then you have unsupervised learning. This deals with unlabeled data. 109 00:05:03.959 --> 00:05:06.240 The goal isn't to predict a known label, but to 110 00:05:06.319 --> 00:05:09.439 find hidden structures or patterns within the data itself. 111 00:05:09.519 --> 00:05:12.519 Ah, so like grouping similar customers together? Exactly. 112 00:05:12.600 --> 00:05:15.920 Clustering is a classic example, or simplifying data, which is 113 00:05:15.959 --> 00:05:20.399 dimensionality reduction. And just briefly, there's also reinforcement learning, training 114 00:05:20.399 --> 00:05:25.639 agents through rewards and penalties, and semi-supervised, which mixes 115 00:05:25.759 --> 00:05:27.199 labeled and unlabeled data.
116 00:05:27.439 --> 00:05:30.240 That's quite a spectrum. And ML.NET has tools for these? 117 00:05:30.399 --> 00:05:33.399 It does, for a really wide array. Take classification. That's 118 00:05:33.439 --> 00:05:36.920 putting data into predefined buckets. It could be binary classification, 119 00:05:37.040 --> 00:05:40.120 just two options, like spam or not spam, fraud or not fraud. 120 00:05:40.160 --> 00:05:43.600 Sentiment analysis, positive or negative review, all those yes/no 121 00:05:43.759 --> 00:05:45.079 type questions. 122 00:05:44.879 --> 00:05:47.800 Or it could be multiclass classification, more than two 123 00:05:47.800 --> 00:05:48.759 options? Yep. 124 00:05:49.120 --> 00:05:53.040 Think recognizing handwritten numbers zero through nine, or sorting news 125 00:05:53.079 --> 00:05:56.399 articles into topics like sports, politics, tech. 126 00:05:56.759 --> 00:05:58.800 Okay, what about predicting numbers? 127 00:05:59.079 --> 00:06:03.199 That's regression: predicting house prices, stock prices, sales figures, any 128 00:06:03.279 --> 00:06:05.759 numerical value. ML.NET is great at that too. 129 00:06:05.879 --> 00:06:08.480 And you mentioned clustering earlier, finding 130 00:06:08.120 --> 00:06:12.120 groups? Right. Grouping similar data points without knowing the groups beforehand, 131 00:06:12.240 --> 00:06:15.959 really useful for customer segmentation, finding different user types, or 132 00:06:15.959 --> 00:06:18.000 even anomaly detection in some cases. 133 00:06:18.720 --> 00:06:22.319 Speaking of anomaly detection, finding the odd ones 134 00:06:22.079 --> 00:06:25.480 out? Exactly, spotting unusual patterns that don't fit the norm. 135 00:06:25.920 --> 00:06:31.079 Think network intrusion detection or monitoring machine health for predictive maintenance. 136 00:06:31.160 --> 00:06:32.879 It sounds incredibly useful. And
137 00:06:32.759 --> 00:06:37.920 it goes further: ranking search results, building recommendation engines, forecasting 138 00:06:37.959 --> 00:06:43.040 future trends, image classification, object detection in images. It covers 139 00:06:43.079 --> 00:06:43.680 a lot of ground. 140 00:06:43.800 --> 00:06:47.480 That's a seriously broad toolkit, and what gets really interesting 141 00:06:47.519 --> 00:06:51.759 is seeing how this translates into actual business value. Can 142 00:06:51.800 --> 00:06:53.439 we talk about some real-world examples? 143 00:06:53.519 --> 00:06:57.079 Sure. Look at e-commerce. Recommendation systems are huge, suggesting 144 00:06:57.120 --> 00:06:59.040 products you might like based on what you've looked at 145 00:06:59.120 --> 00:06:59.480 or bought. 146 00:07:00.000 --> 00:07:03.360 Amazon saying "customers who bought this also bought..." Precisely. 147 00:07:03.600 --> 00:07:07.920 That's ML. It also drives dynamic pricing, adjusting prices based 148 00:07:07.920 --> 00:07:11.279 on demand, and analyzing customer reviews automatically for sentiment. 149 00:07:11.480 --> 00:07:14.680 And in healthcare, I know there's a lot happening there. Huge 150 00:07:14.360 --> 00:07:19.000 potential. Things like medical diagnosis support, predicting disease risk. You 151 00:07:19.079 --> 00:07:23.079 mentioned the Cleveland Clinic heart disease data earlier. ML.NET can 152 00:07:23.120 --> 00:07:26.000 analyze data like that to help identify patterns linked to 153 00:07:26.040 --> 00:07:28.959 heart disease, potentially leading to earlier diagnoses. 154 00:07:29.079 --> 00:07:31.519 Wow, that's impactful. Definitely. 155 00:07:32.279 --> 00:07:36.560 Then in manufacturing, think predictive maintenance: using sensor data to 156 00:07:36.600 --> 00:07:40.000 predict when a machine might fail before it happens. Saves 157 00:07:40.040 --> 00:07:41.360 a lot of downtime and money. 158 00:07:41.480 --> 00:07:42.000 Makes sense.
159 00:07:42.120 --> 00:07:46.120 Finance: big use cases in fraud detection, spotting suspicious transactions 160 00:07:46.120 --> 00:07:50.720 in real time, and complex risk assessment for loans or investments. 161 00:07:50.240 --> 00:07:51.959 And even just everyday customer service. 162 00:07:52.079 --> 00:07:56.199 Yeah, powering smarter chatbots that can actually understand and resolve issues, 163 00:07:56.519 --> 00:08:00.399 or automatically analyzing customer feedback, emails, or calls for sentiment 164 00:08:00.480 --> 00:08:03.040 to see if people are generally happy or frustrated. It's 165 00:08:03.079 --> 00:08:04.079 really weaving into 166 00:08:03.879 --> 00:08:06.680 everything. It really is. Okay, so if someone's convinced they 167 00:08:06.720 --> 00:08:10.600 want to try this, what's the actual process, the workflow 168 00:08:10.600 --> 00:08:12.759 for building a model with ML.NET? Right. 169 00:08:13.040 --> 00:08:18.360 ML.NET guides you through a pretty clear, logical workflow. 170 00:08:18.399 --> 00:08:22.240 It breaks down into data preparation, then model training, followed 171 00:08:22.279 --> 00:08:26.279 by model evaluation and tuning, and finally model deployment and integration. 172 00:08:26.519 --> 00:08:30.199 A structured approach keeps things manageable. Exactly. Okay, let's start 173 00:08:30.240 --> 00:08:34.960 at the beginning: data preparation. You said earlier, garbage in, garbage out. 174 00:08:35.000 --> 00:08:35.960 This sounds critical. 175 00:08:36.279 --> 00:08:39.759 It absolutely is. The quality and, honestly, the quantity of 176 00:08:39.799 --> 00:08:43.000 your data directly shape how good your model can be. 177 00:08:43.559 --> 00:08:45.639 So getting the data right is job one. 178 00:08:45.879 --> 00:08:48.039 So how do you ensure good data? Step one is 179 00:08:48.159 --> 00:08:48.919 just getting it, right?
180 00:08:49.039 --> 00:08:55.080 Data collection, yep. Gathering it from wherever it lives: text files, CSVs, databases, APIs. 181 00:08:55.600 --> 00:08:57.720 ML.NET helps here with tools like TextLoader for 182 00:08:57.799 --> 00:09:01.440 files and DatabaseLoader for databases, simplifying that import process. 183 00:09:01.519 --> 00:09:04.480 Okay, so you've got the raw data, but it's probably messy. 184 00:09:04.240 --> 00:09:07.000 Almost always, which brings us to data cleaning. This is 185 00:09:07.000 --> 00:09:11.879 about finding and fixing problems: errors, inconsistencies, missing values. Big 186 00:09:11.919 --> 00:09:12.440 part of the job. 187 00:09:12.519 --> 00:09:14.000 How do you fix, like, missing values? 188 00:09:14.080 --> 00:09:16.879 A couple of ways. You could just remove rows or 189 00:09:16.919 --> 00:09:20.360 columns that have missing data using something like drop missing values, 190 00:09:20.840 --> 00:09:22.440 but you have to be careful there. You don't want 191 00:09:22.440 --> 00:09:25.679 to throw away too much valuable info. Often a better 192 00:09:25.720 --> 00:09:29.039 approach is imputation. That's where you replace the missing value 193 00:09:29.639 --> 00:09:32.759 with a best guess, maybe the average (mean) or the middle 194 00:09:32.840 --> 00:09:36.440 value (median) for that feature. ML.NET has tools like 195 00:09:36.559 --> 00:09:37.879 ReplaceMissingValues for that. 196 00:09:38.039 --> 00:09:41.600 Okay, clean data. What next? I remember you mentioned things 197 00:09:41.639 --> 00:09:42.720 like text labels need 198 00:09:42.600 --> 00:09:47.039 handling. Right, that's data encoding. Most algorithms need numbers, not text 199 00:09:47.120 --> 00:09:50.159 like red or blue, so you have to convert categorical data. 200 00:09:50.759 --> 00:09:54.480 Common ways are one-hot encoding, where each category, red, green, 201 00:09:54.559 --> 00:09:56.879 blue, gets its own column with a one or zero.
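The imputation and one-hot encoding steps just described can be sketched in a few lines. This is a minimal, framework-independent illustration in plain Python, not ML.NET code; the helper names `impute_median` and `one_hot` and the sample values are assumptions made up for the example.

```python
from statistics import median


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def one_hot(categories):
    """Return (column_names, rows): one 0/1 column per distinct category."""
    columns = sorted(set(categories))
    rows = [[1 if value == col else 0 for col in columns] for value in categories]
    return columns, rows


# Hypothetical sample data: one numeric feature with a gap, one categorical feature.
ages = impute_median([20, None, 40, 60])
columns, encoded = one_hot(["red", "green", "blue", "red"])

print(ages)      # the gap is filled with the median of 20, 40, 60
print(columns)   # one column per category, in sorted order
print(encoded)   # each row has exactly one 1
```

The median is used here rather than the mean because it is less sensitive to outliers; either choice matches the "best guess" idea from the discussion.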
202 00:09:57.080 --> 00:10:00.279 Ah, okay, so like is_red, is_green, is_blue? Exactly. 203 00:10:00.679 --> 00:10:03.200 Or label encoding, where you just assign a unique number 204 00:10:03.200 --> 00:10:06.080 to each category, like red = 1, green = 2, blue = 3. 205 00:10:06.720 --> 00:10:08.639 The choice depends on the algorithm and the data. 206 00:10:08.799 --> 00:10:11.840 Got it. Anything else in data prep? Yes, 207 00:10:11.639 --> 00:10:15.440 super important: normalization and scaling. Think about your features. Age 208 00:10:15.519 --> 00:10:18.159 might be twenty to eighty, while income might be thirty 209 00:10:18.159 --> 00:10:21.320 thousand to five hundred thousand. Huge difference in scale. 210 00:10:20.960 --> 00:10:22.759 And that difference can mess up the model? 211 00:10:22.960 --> 00:10:26.120 It can. Some algorithms give more weight to features with 212 00:10:26.279 --> 00:10:28.919 larger values just because they're bigger numbers, not because they're 213 00:10:28.919 --> 00:10:32.440 more important. So scaling puts everything on a level playing field. 214 00:10:32.679 --> 00:10:33.360 How do you do that? 215 00:10:33.559 --> 00:10:37.200 Techniques like standardization, which transforms data to have a mean 216 00:10:37.279 --> 00:10:41.080 of zero and a standard deviation of one, or min-max scaling, 217 00:10:41.120 --> 00:10:45.159 which squishes everything into a specific range, usually zero to one. 218 00:10:45.279 --> 00:10:48.480 Okay, that makes sense. Prevents one feature from dominating. Exactly. 219 00:10:49.080 --> 00:10:52.120 And the last piece, which can be really powerful, is 220 00:10:52.200 --> 00:10:53.279 feature engineering. 221 00:10:53.480 --> 00:10:57.480 Engineering features? Like creating new ones? Precisely.
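The two scaling techniques described above, standardization and min-max scaling, come down to simple arithmetic. The following is a small plain-Python sketch, not ML.NET's own normalizers; the function names and the sample ages are illustrative assumptions, and both functions assume the feature is not constant (otherwise they would divide by zero).

```python
from statistics import mean, pstdev


def standardize(values):
    """Shift and scale so the result has mean 0 and standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation; assumed non-zero
    return [(v - mu) / sigma for v in values]


def min_max_scale(values):
    """Squash values linearly into the range [0, 1]; assumes max > min."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical feature column matching the age range from the discussion.
ages = [20, 50, 80]
z_scores = standardize(ages)
scaled = min_max_scale(ages)

print(z_scores)  # centered on 0, spread of 1
print(scaled)    # smallest value maps to 0.0, largest to 1.0
```

Applying the same transformation to an income column would put it on a comparable footing with age, which is the "level playing field" point made in the conversation.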
222 00:10:57.639 --> 00:11:01.559 It's about using your domain knowledge, or sometimes just creativity, 223 00:11:01.919 --> 00:11:05.600 to transform the raw data into features that are more 224 00:11:05.679 --> 00:11:06.720 meaningful for the model. 225 00:11:06.840 --> 00:11:09.360 Like the BMI example you gave earlier, calculating it from 226 00:11:09.360 --> 00:11:09.879 height and weight. 227 00:11:10.120 --> 00:11:13.399 Perfect example. Instead of giving the model raw height and weight, 228 00:11:13.480 --> 00:11:15.879 you give it BMI, which might be a much better 229 00:11:15.960 --> 00:11:20.320 predictor for certain health outcomes. Good feature engineering can seriously 230 00:11:20.360 --> 00:11:24.000 boost your model's accuracy, make it easier to understand, and 231 00:11:24.080 --> 00:11:28.200 even help prevent overfitting. It's where art meets science 232 00:11:27.879 --> 00:11:31.559 a bit. Fascinating. So data prep is done. Now we 233 00:11:31.600 --> 00:11:35.879 get to the core, right? Model training, evaluation, and tuning. 234 00:11:36.159 --> 00:11:39.600 Yep, the brains of the operation, and ML.NET gives you 235 00:11:39.639 --> 00:11:42.039 a couple of great ways to approach this: the 236 00:11:42.120 --> 00:11:45.120 ML.NET Model Builder and the ML.NET CLI. 237 00:11:45.679 --> 00:11:48.039 Okay, tell me about the Model Builder. Sounds friendly. It 238 00:11:48.080 --> 00:11:48.480 really is. 239 00:11:48.559 --> 00:11:51.360 It's a graphical tool right inside Visual Studio. Makes things 240 00:11:51.440 --> 00:11:54.080 much simpler, especially if you're starting out. It helps with 241 00:11:54.200 --> 00:11:57.639 preprocessing, picking models. It even does some automated feature 242 00:11:57.639 --> 00:12:00.559 engineering and hyperparameter tuning for you. 243 00:12:00.320 --> 00:12:02.279 So it hides some of the complexity? It does.
244 00:12:02.320 --> 00:12:05.519 You can visually pick your scenario, like sentiment analysis using 245 00:12:05.600 --> 00:12:08.039 Yelp reviews, point it at your data, choose where to 246 00:12:08.080 --> 00:12:10.480 run it, like your local machine, and it guides you through. 247 00:12:10.519 --> 00:12:11.840 It even suggests algorithms. 248 00:12:12.120 --> 00:12:15.360 Nice. What about the CLI, the command-line interface? 249 00:12:15.639 --> 00:12:18.519 That's for folks who like the command line or need automation. 250 00:12:18.600 --> 00:12:22.200 It's very powerful for scripting, for integrating ML model training 251 00:12:22.279 --> 00:12:26.559 into, say, a CI/CD pipeline or other automated workflows. 252 00:12:26.720 --> 00:12:27.879 How do you get these tools? 253 00:12:27.960 --> 00:12:31.360 Pretty standard .NET ways. NuGet package manager for 254 00:12:31.399 --> 00:12:34.279 the libraries, and for the CLI, it's a .NET tool, 255 00:12:34.320 --> 00:12:38.120 so dotnet tool install -g mlnet. Straightforward. 256 00:12:38.360 --> 00:12:42.840 Okay, tools installed. How does the actual training and evaluation work? 257 00:12:43.039 --> 00:12:45.919 Well, first you pick a learning algorithm. ML.NET calls 258 00:12:45.960 --> 00:12:49.000 these trainers, and they cleverly package an algorithm with a 259 00:12:49.039 --> 00:12:53.159 specific task type. Then, crucially, you split your prepared 260 00:12:52.919 --> 00:12:55.639 data into training and testing sets. Exactly. 261 00:12:55.679 --> 00:12:57.519 You train the model on the training set, let it 262 00:12:57.559 --> 00:13:00.039 learn the patterns. Then you use the testing set, that 263 00:13:00.159 --> 00:13:02.399 data the model hasn't seen before, to evaluate how well 264 00:13:02.399 --> 00:13:02.960 it performs. 265 00:13:03.080 --> 00:13:04.799 And how do you measure performance? 266 00:13:04.679 --> 00:13:07.720 Depends on the task.
For classification, you might look at accuracy, 267 00:13:07.759 --> 00:13:11.840 how many did it get right? Maybe log loss. For regression, 268 00:13:11.960 --> 00:13:16.639 common metrics are mean absolute error (MAE) or root mean 269 00:13:16.639 --> 00:13:20.320 squared error (RMSE), measuring how far off the predictions were 270 00:13:20.320 --> 00:13:20.960 on average. 271 00:13:21.080 --> 00:13:24.039 Okay, and choosing the algorithm, the trainer, you said it's 272 00:13:24.080 --> 00:13:25.080 part art, part science. 273 00:13:25.679 --> 00:13:28.440 It is. Factors like your data size and type, how 274 00:13:28.440 --> 00:13:31.440 fast you need training to be, how complex or interpretable 275 00:13:31.480 --> 00:13:34.080 you need the model to be, they all play a role. 276 00:13:34.279 --> 00:13:36.759 ML.NET has trainers for all the tasks 277 00:13:36.480 --> 00:13:39.559 you mentioned? Pretty much, yeah. Like SdcaRegressionTrainer 278 00:13:39.559 --> 00:13:43.279 for regression, LightGbmBinaryTrainer for binary classification, 279 00:13:43.759 --> 00:13:46.200 KMeansTrainer for clustering, and many others. 280 00:13:46.399 --> 00:13:48.879 And I remember you mentioned ONNX. Some trainers can 281 00:13:48.879 --> 00:13:49.600 export to that? 282 00:13:49.600 --> 00:13:53.360 That's right. ONNX is the Open Neural Network Exchange format. 283 00:13:53.799 --> 00:13:56.039 Being able to export means you can train in ML.NET 284 00:13:56.080 --> 00:13:58.399 but maybe deploy or use the model in 285 00:13:58.440 --> 00:14:03.799 other environments or frameworks like TensorFlow or PyTorch. Huge for interoperability. 286 00:14:04.200 --> 00:14:06.559 What kinds of algorithms are we talking about, broadly? 287 00:14:06.720 --> 00:14:09.519 A good mix.
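For reference, the regression metrics mentioned above, MAE and RMSE, can be computed by hand as follows. This is a generic plain-Python sketch rather than ML.NET's evaluator output; the function names and the sample prices are assumptions chosen for illustration.

```python
from math import sqrt


def mean_absolute_error(actual, predicted):
    """Average size of the errors, ignoring their sign."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the average squared error; penalizes large misses more."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical house prices (in thousands) and a model's predictions for them.
actual = [200, 300, 400]
predicted = [210, 290, 430]

mae = mean_absolute_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
print(mae, rmse)
```

Because squaring emphasizes the single 30-unit miss, RMSE comes out larger than MAE on this data, which is the usual reason to report both.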
You've got linear models like averaged perceptron, 288 00:14:09.600 --> 00:14:13.320 often good for text; powerful decision tree methods like LightGBM, 289 00:14:13.480 --> 00:14:17.639 FastTree, FastForest, great for tabular data; GAMs, which are 290 00:14:17.639 --> 00:14:22.000 good for explainability; SVMs; k-means for clustering; naive Bayes; 291 00:14:22.840 --> 00:14:27.200 even matrix factorization for building recommendation systems. Lots of choice. 292 00:14:27.320 --> 00:14:31.159 Okay, so you train a model, evaluate it, but maybe 293 00:14:31.200 --> 00:14:33.639 it's not good enough yet. What's hyperparameter tuning? 294 00:14:33.799 --> 00:14:37.759 Ah, good question. Hyperparameters are like the settings or 295 00:14:37.840 --> 00:14:41.000 knobs on your learning algorithm before you start training. They 296 00:14:41.039 --> 00:14:44.960 aren't learned from the data itself. You, the developer, set them. 297 00:14:45.159 --> 00:14:45.960 What do they control? 298 00:14:46.399 --> 00:14:49.720 Things like how fast the model learns (learning rate), how 299 00:14:49.759 --> 00:14:52.559 complex it's allowed to get, how much it tries to 300 00:14:52.600 --> 00:14:57.399 avoid overfitting (regularization), maybe things like maximum number of iterations, 301 00:14:57.399 --> 00:14:58.360 how long it trains for. 302 00:14:58.759 --> 00:15:01.519 You need to find the best setting for these knobs? Exactly. 303 00:15:02.120 --> 00:15:05.600 That's tuning. ML.NET offers ways to help automate this, like 304 00:15:05.679 --> 00:15:08.919 grid search, trying all combinations in a grid, or you 305 00:15:08.960 --> 00:15:12.480 can implement things like random search or Bayesian optimization to 306 00:15:12.600 --> 00:15:14.519 explore the possibilities more intelligently. 307 00:15:14.759 --> 00:15:18.399 Seems important for squeezing out performance. What about cross-validation? 308 00:15:18.919 --> 00:15:21.720 Heard that term a lot. Crucial technique.
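To make the grid search idea above concrete, here is a minimal sketch in plain Python. It is not ML.NET's tuning API: `grid_search` is a hypothetical helper, and `toy_score` is a made-up stand-in for "train a model with these settings and score it on validation data".

```python
from itertools import product


def grid_search(score, grid):
    """Try every combination in the grid and keep the best-scoring one."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        current = score(params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


def toy_score(params):
    # Hypothetical objective that peaks at learning_rate=0.1, max_iterations=100.
    # In practice this would train a model and return a validation metric.
    lr_penalty = (params["learning_rate"] - 0.1) ** 2
    iter_penalty = ((params["max_iterations"] - 100) / 100) ** 2
    return -(lr_penalty + iter_penalty)


grid = {
    "learning_rate": [0.01, 0.1, 1.0],
    "max_iterations": [50, 100, 200],
}
best_params, best_score = grid_search(toy_score, grid)
print(best_params)
```

The cost grows with the product of the list lengths (nine evaluations here), which is why random search or Bayesian optimization tends to be preferred for larger grids.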
It helps you 309 00:15:21.759 --> 00:15:24.240 get a more reliable estimate of how well your model 310 00:15:24.240 --> 00:15:27.200 will actually perform on unseen data, and it's a key 311 00:15:27.240 --> 00:15:28.519 way to fight overfitting. 312 00:15:28.639 --> 00:15:29.120 How does it work? 313 00:15:29.240 --> 00:15:32.840 The most common type is k-fold cross-validation. You split 314 00:15:32.840 --> 00:15:36.639 your data into, say, k = 5 chunks, or folds. You 315 00:15:36.679 --> 00:15:39.960 train the model five times. Each time, you train on 316 00:15:40.039 --> 00:15:43.600 four folds and test on the one leftover fold, rotating 317 00:15:43.600 --> 00:15:45.279 which fold is the test set. 318 00:15:45.399 --> 00:15:48.039 Ah, so you test on all the data eventually, but 319 00:15:48.120 --> 00:15:50.480 never on data it was trained on in that specific run. 320 00:15:50.639 --> 00:15:53.679 Exactly. Then you average the performance across the five runs. 321 00:15:53.799 --> 00:15:56.440 Gives you a much more robust evaluation than a single 322 00:15:56.559 --> 00:15:57.320 train/test split. 323 00:15:57.559 --> 00:15:58.480 Are there variations? 324 00:15:58.679 --> 00:16:01.639 Yep. Leave-one-out is where k equals your number 325 00:16:01.679 --> 00:16:05.200 of data points. Very thorough but slow for large data 326 00:16:05.200 --> 00:16:09.080 sets. And stratified k-fold is important for imbalanced data, ensuring 327 00:16:09.120 --> 00:16:11.879 each fold has roughly the same proportion of different classes 328 00:16:11.919 --> 00:16:13.120 as the original data set. 329 00:16:13.159 --> 00:16:15.840 Okay, that whole training and tuning process makes sense. So 330 00:16:16.279 --> 00:16:18.759 you've got a model you're happy with. What's next? Model 331 00:16:18.759 --> 00:16:21.440 deployment and integration, getting 332 00:16:21.200 --> 00:16:23.440 it out there. Right, bringing it to life.
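The k-fold rotation described above can be sketched as an index splitter. This is a plain-Python illustration, not ML.NET's cross-validation API; `k_fold_indices` is a hypothetical helper, and it assumes the rows have already been shuffled, since it cuts folds in order.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) once per fold.

    Folds are contiguous slices, so shuffle the data beforehand if row
    order carries meaning. Any remainder is spread over the first folds.
    """
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# Ten rows, five folds: each run trains on eight rows and tests on two.
folds = list(k_fold_indices(10, 5))
for train, test in folds:
    print("train:", train, "test:", test)
```

Each row lands in exactly one test fold, so averaging a metric over the five runs uses every row for evaluation without ever scoring a model on data it was trained on in that run.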
And a 333 00:16:23.519 --> 00:16:25.399 key part of this is being able to save and 334 00:16:25.480 --> 00:16:28.080 load your trained model. You want to preserve this hard-earned 335 00:16:28.120 --> 00:16:29.240 knowledge. 336 00:16:28.919 --> 00:16:30.759 Right, definitely don't want to retrain every 337 00:16:30.600 --> 00:16:34.919 time. No way. Reusing pretrained models saves a massive 338 00:16:34.960 --> 00:16:39.639 amount of time. So the process involves serialization. Serialization? It's 339 00:16:39.679 --> 00:16:44.320 basically converting the model's internal state, its structure, the learned weights, 340 00:16:44.440 --> 00:16:47.720 all metadata, into a format that can be saved to 341 00:16:47.799 --> 00:16:50.159 a file, usually a compact byte stream. 342 00:16:50.480 --> 00:16:52.879 Okay, saving it. And getting it back? 343 00:16:52.759 --> 00:16:56.440 That's deserialization: reading that saved file, that byte stream, 344 00:16:56.679 --> 00:16:59.000 and converting it back into a working model object in 345 00:16:59.039 --> 00:17:01.200 your application, ready to make predictions. 346 00:17:01.279 --> 00:17:02.480 What format does it save in? 347 00:17:02.799 --> 00:17:07.240 ML.NET's default is its own native binary format. It's generally 348 00:17:07.240 --> 00:17:10.119 the fastest and supports all ML.NET features, so it's 349 00:17:10.200 --> 00:17:11.079 often recommended. 350 00:17:11.160 --> 00:17:13.319 But you also mentioned ONNX before. Yes, 351 00:17:13.240 --> 00:17:15.720 and that's a huge plus. You can also save models 352 00:17:15.720 --> 00:17:19.400 in the ONNX format. Because ONNX is an open standard, 353 00:17:19.519 --> 00:17:22.160 this means you can potentially take your ML.NET-trained 354 00:17:22.240 --> 00:17:25.759 model and use it in Python with TensorFlow, or deploy 355 00:17:25.799 --> 00:17:29.920 it on specialized hardware or in cloud services that understand ONNX.
356 00:17:30.079 --> 00:17:31.920 Big for flexibility. Makes sense. 357 00:17:32.039 --> 00:17:34.400 Any tips for managing these saved models? 358 00:17:34.200 --> 00:17:37.599 Yeah, good practices help. Maybe save the model architecture and 359 00:17:37.640 --> 00:17:40.920 the learned weights separately if you update weights often. Use 360 00:17:40.960 --> 00:17:44.200 clear versioning in your file names, like sentiment-model-v1.2.zip. 361 00:17:44.279 --> 00:17:47.920 Have good naming conventions, 362 00:17:48.200 --> 00:17:51.440 and maybe add integrity checks like file hashing to make 363 00:17:51.480 --> 00:17:53.799 sure the model file hasn't been corrupted when you load it. 364 00:17:53.920 --> 00:17:58.359 Okay, model saved. Now, getting it used. Deploying to the 365 00:17:58.359 --> 00:18:00.000 cloud seems like a common goal. 366 00:18:00.200 --> 00:18:04.200 Very common, and for good reason. The cloud offers huge advantages. 367 00:18:04.720 --> 00:18:09.440 Scalability is a big one: automatically handle more or less prediction traffic. Accessibility: 368 00:18:09.519 --> 00:18:12.839 use your model from anywhere. Cost effectiveness: often pay as 369 00:18:12.880 --> 00:18:16.000 you go. And real-time inference for applications needing instant 370 00:18:16.039 --> 00:18:20.240 predictions. Like the fraud detection you mentioned? Exactly, or live recommendations. 371 00:18:20.519 --> 00:18:23.160 And ML.NET is designed to be cloud agnostic. It works 372 00:18:23.160 --> 00:18:26.039 well with Azure, of course, but also AWS, Google Cloud. 373 00:18:26.079 --> 00:18:28.240 How do you typically deploy it, say, on Azure? 374 00:18:28.519 --> 00:18:32.920 A very popular pattern is using Azure Functions. That's serverless compute. 375 00:18:33.200 --> 00:18:36.079 You upload your code, including loading and running your model, 376 00:18:36.599 --> 00:18:40.079 and Azure handles the scaling automatically.
You combine that with 377 00:18:40.119 --> 00:18:43.519 a web API, often using something like ASP.NET Core, 378 00:18:43.680 --> 00:18:46.720 to expose your model's prediction function as a simple REST 379 00:18:46.799 --> 00:18:48.079 endpoint over HTTP. 380 00:18:48.440 --> 00:18:50.680 So other applications can just call that API to get 381 00:18:50.680 --> 00:18:51.759 predictions? Precisely. 382 00:18:52.119 --> 00:18:56.079 Web apps, mobile apps, IoT devices, other back-end services. 383 00:18:56.519 --> 00:18:59.599 They just make a standard web request to your Azure Function API, 384 00:19:00.079 --> 00:19:02.319 send the input data, and get the prediction back. 385 00:19:02.440 --> 00:19:05.200 Well, that's pretty seamless. What's the process look like? 386 00:19:05.400 --> 00:19:08.160 Generally you'd prepare your ML.NET model, maybe convert it 387 00:19:08.200 --> 00:19:11.720 to ONNX for broad compatibility with Azure services. Then 388 00:19:11.759 --> 00:19:14.440 you set up an Azure Function, usually with an HTTP trigger. 389 00:19:14.920 --> 00:19:16.799 You write the function code to load the model and 390 00:19:16.839 --> 00:19:19.759 implement the API endpoint logic to handle incoming requests and 391 00:19:19.799 --> 00:19:20.680 return predictions. 392 00:19:20.759 --> 00:19:25.799 Okay, deployed. But the real world is messy, right? How 393 00:19:25.799 --> 00:19:27.960 do you make sure your deployed model keeps working well? 394 00:19:28.039 --> 00:19:29.599 That sounds like a whole other challenge. 395 00:19:30.039 --> 00:19:33.039 It is, and that's where best practices for production environments 396 00:19:33.079 --> 00:19:35.519 come in. It's not enough just to deploy. You have 397 00:19:35.599 --> 00:19:38.720 to maintain and monitor. Like what kind of things? First, 398 00:19:38.839 --> 00:19:43.759 performance and optimization.
Constantly monitor things like prediction speed (latency), 399 00:19:44.039 --> 00:19:47.240 how many requests it can handle (throughput), and critically its 400 00:19:47.240 --> 00:19:50.880 accuracy over time. You might use techniques like quantization to 401 00:19:50.960 --> 00:19:54.000