WEBVTT 1 00:00:00.080 --> 00:00:03.359 Okay, let's unpack this. We're diving deep into machine learning today, 2 00:00:03.399 --> 00:00:07.080 but maybe not in the way you'd expect. We're skipping 3 00:00:07.120 --> 00:00:10.519 the basic code tutorials. Our mission really is twofold. First, 4 00:00:10.720 --> 00:00:14.000 clear up that fuzzy line between AI and ML. Second, 5 00:00:14.199 --> 00:00:16.600 and this is the big one, expose what might be 6 00:00:16.640 --> 00:00:20.719 the hardest part of building these models. Hint: it's probably 7 00:00:20.719 --> 00:00:23.160 not the coding. And then we'll look at some 8 00:00:23.239 --> 00:00:26.879 high-level tools, specifically the IBM Watson suite, that aim to 9 00:00:26.920 --> 00:00:28.640 sort of shortcut all that complexity. 10 00:00:28.760 --> 00:00:31.760 Sound good? Sounds great. And yeah, starting with that AI 11 00:00:31.879 --> 00:00:34.840 versus ML distinction is, well, essential. People use them 12 00:00:34.840 --> 00:00:37.719 interchangeably all the time, but they're really not the same thing. 13 00:00:37.799 --> 00:00:41.240 AI, artificial intelligence: that's the really big umbrella term, right? 14 00:00:41.280 --> 00:00:44.000 It covers any time a machine does something we normally think 15 00:00:44.039 --> 00:00:46.359 requires intelligence to achieve a goal. 16 00:00:46.479 --> 00:00:48.960 Okay, so, like, think way back: a simple 17 00:00:49.039 --> 00:00:52.600 tic-tac-toe game programmed with fixed rules. If it plays to 18 00:00:52.679 --> 00:00:55.960 win based on those rules, that's AI. Exactly. 19 00:00:55.960 --> 00:01:00.679 It's following programmed instructions to simulate intelligence. It's not learning, 20 00:01:00.840 --> 00:01:04.079 just executing. Machine learning, or ML, is different. It's a 21 00:01:04.159 --> 00:01:07.480 subset of AI. 
This is where the system actually improves 22 00:01:07.480 --> 00:01:10.879 its performance on a task without being explicitly programmed for 23 00:01:10.959 --> 00:01:11.920 every single step. 24 00:01:12.079 --> 00:01:14.799 Ah. So instead of programming the "if this, then that" 25 00:01:14.879 --> 00:01:15.640 for tic-tac-toe. 26 00:01:15.439 --> 00:01:18.959 Right, you'd feed it, say, thousands of recorded games, 27 00:01:19.000 --> 00:01:22.400 just the data, and the ML algorithm itself works out 28 00:01:22.439 --> 00:01:25.719 the patterns, the statistics of what moves lead to wins 29 00:01:25.799 --> 00:01:28.840 or losses. It basically builds its own strategy, its own 30 00:01:28.879 --> 00:01:31.000 sort of functional equation from experience. 31 00:01:31.079 --> 00:01:34.480 And underlying these algorithms are some core mathematical ideas. We 32 00:01:34.519 --> 00:01:36.640 hear terms like linear regression. 33 00:01:36.319 --> 00:01:39.760 Yeah, the classic y = mx + c, just finding the best 34 00:01:39.799 --> 00:01:42.959 line through data points. Simple but powerful. 35 00:01:42.640 --> 00:01:45.920 Or things like support vector machines. Those sound more complex. 36 00:01:45.599 --> 00:01:48.079 They are. SVMs are really good when the boundary between 37 00:01:48.079 --> 00:01:51.840 your data categories isn't a straight line. Think complex 38 00:01:51.879 --> 00:01:54.159 patterns or spotting outliers. 39 00:01:53.640 --> 00:01:57.239 And k-nearest neighbors? That sounds more intuitive. It is, 40 00:01:57.480 --> 00:02:01.719 conceptually. kNN is instance-based, meaning it doesn't fit an equation in advance. 41 00:02:02.040 --> 00:02:04.560 It just looks at a new data point and classifies 42 00:02:04.599 --> 00:02:07.680 it based on, well, what its labeled nearest neighbors are in 43 00:02:07.719 --> 00:02:10.439 the data space. Simple distance calculation, 44 00:02:10.639 --> 00:02:13.400 essentially. So the common thread is math. 
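To make the nearest-neighbor idea concrete, here is a minimal sketch in plain Python, not taken from the episode. The function name `knn_predict`, the toy points, and the labels are all hypothetical. It assumes the classification form of kNN, where each training point carries a known label and the new point takes the majority label of its k closest neighbors.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every labeled training point
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbors and count their labels
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two clusters in a 2-D feature space
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["small", "small", "small", "large", "large", "large"]

prediction = knn_predict(points, labels, query=(1.1, 0.9), k=3)
```

Nothing is fitted here: all the work happens at prediction time, which is why the whole method reduces to a distance calculation and a vote.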
They all need 45 00:02:13.479 --> 00:02:16.400 numerical inputs to crunch the numbers and find that equation. 46 00:02:16.599 --> 00:02:21.479 Precisely. They're sophisticated calculators at their core, and that numerical 47 00:02:21.560 --> 00:02:23.639 need leads us right into the thick of it. 48 00:02:23.800 --> 00:02:26.560 Right. Here's where it gets, as you said, really interesting, 49 00:02:26.680 --> 00:02:30.479 because if the algorithm is the calculator, the data is 50 00:02:30.520 --> 00:02:35.039 the fuel. And everything we've looked at suggests coding the model, 51 00:02:35.199 --> 00:02:37.840 choosing the algorithm, that's often the easier 52 00:02:37.520 --> 00:02:41.199 part. Oh, absolutely, far easier. Data preparation and something called 53 00:02:41.240 --> 00:02:43.879 feature engineering: that's where the real time sink is. That's 54 00:02:43.919 --> 00:02:45.039 the hardest part, easily. 55 00:02:45.280 --> 00:02:48.199 That seems counterintuitive. Why is wrangling the data so much 56 00:02:48.199 --> 00:02:50.479 harder than building the prediction engine itself? 57 00:02:50.680 --> 00:02:54.960 Because real-world data is messy: it's incomplete, it's inconsistent, 58 00:02:55.080 --> 00:02:58.520 it's often in the wrong format. The algorithms, like those 59 00:02:58.560 --> 00:03:01.719 calculators, are actually quite robust once they have clean input, 60 00:03:02.080 --> 00:03:05.000 but they are incredibly picky about getting that clean input. 61 00:03:05.319 --> 00:03:08.840 The complexity is in taming the chaos before the math starts. 62 00:03:09.080 --> 00:03:12.120 Okay, walk us through that taming process. What are the 63 00:03:12.199 --> 00:03:15.719 key headaches, the terms a learner really needs to grasp? 64 00:03:15.719 --> 00:03:18.639 Well, first up is just inspection. You load the data, 65 00:03:18.639 --> 00:03:20.759 maybe using a tool like pandas in Python, and you 66 00:03:20.800 --> 00:03:23.919 look at it. 
You'd use functions like, say, 67 00:03:23.960 --> 00:03:26.080 df.info() to check for missing values. See that non-null 68 00:03:26.080 --> 00:03:28.680 count? If it's less than your total rows, you've 69 00:03:28.680 --> 00:03:29.680 got gaps. 70 00:03:29.479 --> 00:03:31.280 And you can't just leave the gaps? Nope. 71 00:03:31.680 --> 00:03:33.960 The math breaks down. So you have to decide: do 72 00:03:34.000 --> 00:03:36.479 I fill them in, maybe using fillna() with the average 73 00:03:36.560 --> 00:03:39.039 value of that column, or do I just drop those 74 00:03:39.159 --> 00:03:40.879 rows entirely? That's a judgment call. 75 00:03:41.039 --> 00:03:43.560 Okay, so handling missing data. What else? 76 00:03:43.759 --> 00:03:47.560 Then there's accessing the specific data you need, the features 77 00:03:47.719 --> 00:03:51.199 or columns. You might use methods like df.loc 78 00:03:51.560 --> 00:03:55.800 or just df[column_name], like df["population"]. And crucially, 79 00:03:55.840 --> 00:03:58.800 you often need to normalize or scale features. If one 80 00:03:58.800 --> 00:04:01.919 feature is age from zero to one hundred and another 81 00:04:02.039 --> 00:04:05.280 is income from zero to millions, the income scale could 82 00:04:05.360 --> 00:04:08.520 totally dominate the learning process just because the numbers are bigger. 83 00:04:08.639 --> 00:04:10.400 You need to bring them to a comparable scale. 84 00:04:10.680 --> 00:04:13.039 Makes sense. But then you hit what you call the 85 00:04:13.120 --> 00:04:16.720 language barrier, the fact that models only speak math. Exactly. 86 00:04:16.759 --> 00:04:18.959 What do you do with categorical data, things like color 87 00:04:19.040 --> 00:04:22.079 names (red, blue, green), or maybe city names or product types? 88 00:04:22.120 --> 00:04:22.759 These aren't numbers. 89 00:04:22.879 --> 00:04:25.319 You can't just assign red one, blue two, green three. 
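The gap-filling and scaling steps described above can be sketched without any library, which makes the arithmetic visible. This is an illustrative sketch only: the helper names `fill_missing_with_mean` and `min_max_scale` and the sample columns are hypothetical, and min-max scaling is just one of several common ways to bring features onto a comparable scale. In pandas, the fill step is usually written with `fillna`, as mentioned in the conversation.

```python
def fill_missing_with_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def min_max_scale(values):
    """Rescale linearly so the smallest value becomes 0.0 and the largest 1.0.

    Assumes the column is not constant (max > min), otherwise this divides by zero.
    """
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]

# Hypothetical columns: one missing age, incomes on a much larger scale
ages = [20, None, 40, 60]
incomes = [30_000, 250_000, 1_000_000, 60_000]

ages_filled = fill_missing_with_mean(ages)   # the gap becomes the column mean
ages_scaled = min_max_scale(ages_filled)     # now between 0.0 and 1.0
incomes_scaled = min_max_scale(incomes)      # same 0.0 to 1.0 range as age
```

After scaling, age and income both live in the same range, so neither dominates simply because its raw numbers are larger.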
90 00:04:25.360 --> 00:04:28.399 Right? You absolutely cannot, because the algorithm would interpret that 91 00:04:28.519 --> 00:04:31.160 as green being somehow three times as much as red, 92 00:04:31.720 --> 00:04:34.279 or blue being more than red. It imposes a false 93 00:04:34.319 --> 00:04:36.600 mathematical relationship that doesn't exist. 94 00:04:36.720 --> 00:04:39.759 So that data is useless unless you transform it. 95 00:04:39.720 --> 00:04:42.920 Completely useless to the algorithm in its raw state. This 96 00:04:42.959 --> 00:04:45.560 is where we need a technique called one-hot encoding. 97 00:04:45.959 --> 00:04:49.160 One-hot encoding. Okay, how does that work? It's clever, 98 00:04:49.279 --> 00:04:52.560 actually. Instead of one column with red, blue, green, you 99 00:04:52.600 --> 00:04:56.199 create three new columns, maybe is_red, is_blue, is_green. 100 00:04:56.759 --> 00:04:58.879 For a row that was red, the is_red column 101 00:04:58.879 --> 00:05:00.759 gets a one, and the other two get a zero. 102 00:05:01.040 --> 00:05:03.480 For a blue row, is_blue gets one, others get zero. 103 00:05:04.319 --> 00:05:08.240 Now you have purely numerical data, just zeros and ones 104 00:05:08.279 --> 00:05:12.160 representing the categories, but without that fake ordering problem. The 105 00:05:12.199 --> 00:05:13.240 algorithm can handle that. 106 00:05:13.399 --> 00:05:16.879 Got it. So lots of cleaning, filling gaps, scaling, and 107 00:05:16.959 --> 00:05:19.920 this one-hot encoding for categories. That sounds like a 108 00:05:19.959 --> 00:05:20.680 lot of steps. 109 00:05:20.839 --> 00:05:23.439 It is, and it requires careful thought at each stage. 110 00:05:23.519 --> 00:05:26.000 Get it wrong and your model's predictions will be meaningless, 111 00:05:26.040 --> 00:05:27.959 no matter how sophisticated the algorithm is. 112 00:05:27.959 --> 00:05:29.879 Okay. 
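The one-hot encoding described above can be sketched in a few lines of plain Python. The function name `one_hot_encode` is hypothetical, and the `is_` column prefix simply mirrors the naming used in the conversation. Libraries offer the same transformation ready-made (pandas has `get_dummies`, for instance), so this is only meant to show what happens to the data.

```python
def one_hot_encode(values):
    """Turn a list of category names into rows of 0/1 indicator columns."""
    categories = sorted(set(values))               # stable column order
    columns = [f"is_{name}" for name in categories]
    rows = [
        [1 if value == name else 0 for name in categories]
        for value in values
    ]
    return columns, rows

colors = ["red", "blue", "green", "red"]
columns, rows = one_hot_encode(colors)
# columns -> ['is_blue', 'is_green', 'is_red']
# rows[0] -> [0, 0, 1]   (the first row was "red")
```

Every row contains exactly one 1, so no category is numerically larger than another and the fake ordering problem disappears.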
So let's say we've done all that, the data 113 00:05:30.040 --> 00:05:33.399 is pristine and numerical. How does the model actually learn the 114 00:05:33.399 --> 00:05:36.360 best coefficients, those a values in the equation $a_0 115 00:05:36.399 --> 00:05:37.920 + a_1 x_1 + \dots$? 116 00:05:38.199 --> 00:05:40.920 It learns through, well, trial and error, lots of it, 117 00:05:41.079 --> 00:05:44.879 very quickly. It starts with random guesses for those coefficients, 118 00:05:45.000 --> 00:05:48.240 the a values. It makes a prediction using those random values. 119 00:05:48.560 --> 00:05:51.759 Then it compares its prediction to the actual known answer 120 00:05:51.839 --> 00:05:54.680 in the training data. It calculates how wrong it was 121 00:05:54.800 --> 00:05:57.680 using something called a loss function. A common one is 122 00:05:57.800 --> 00:06:01.720 mean squared error, or MSE. It just measures the average 123 00:06:01.720 --> 00:06:03.680 squared difference between prediction 124 00:06:03.399 --> 00:06:05.480 and reality. So it measures the ouch. 125 00:06:05.319 --> 00:06:08.120 Pretty much, and the goal is to minimize that ouch. 126 00:06:08.720 --> 00:06:12.000 Based on the error, the model slightly adjusts its coefficients 127 00:06:12.000 --> 00:06:13.759 in the direction that should reduce the error 128 00:06:13.800 --> 00:06:14.240 next time. 129 00:06:14.519 --> 00:06:17.600 It does this over and over again: making predictions, calculating 130 00:06:17.720 --> 00:06:21.439 error, adjusting coefficients. Each full pass through the entire data set 131 00:06:21.480 --> 00:06:22.360 doing this is called an 132 00:06:22.319 --> 00:06:25.399 epoch. And it just keeps doing epochs until the error 133 00:06:25.519 --> 00:06:26.639 is as low as 134 00:06:26.519 --> 00:06:30.279 possible? Or until the error stops improving significantly. 
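The predict, measure, adjust loop described above is gradient descent on the MSE loss. The sketch below fits a one-feature line under a few assumptions that are not from the episode: the names `mse` and `fit_line`, the learning rate, the epoch count, and the toy data are all illustrative, and the coefficients start at zero rather than at random values so the result is reproducible.

```python
def mse(a0, a1, xs, ys):
    """Mean squared error of the line y = a0 + a1 * x on the data."""
    return sum((a0 + a1 * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit_line(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = a0 + a1 * x by repeatedly nudging a0 and a1 to reduce MSE."""
    a0, a1 = 0.0, 0.0            # fixed starting guess (often random in practice)
    n = len(xs)
    for _ in range(epochs):      # one epoch = one full pass over the data
        errors = [a0 + a1 * x - y for x, y in zip(xs, ys)]
        # Gradients of the MSE with respect to a0 and a1
        grad_a0 = 2 * sum(errors) / n
        grad_a1 = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        # Step against the gradient, the direction that lowers the error
        a0 -= learning_rate * grad_a0
        a1 -= learning_rate * grad_a1
    return a0, a1

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]   # generated from y = 1 + 2x
a0, a1 = fit_line(xs, ys)
```

A fixed epoch count keeps the sketch short. Stopping when the loss no longer improves meaningfully, as mentioned in the conversation, is the more common rule in practice.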
Yeah, it's 135 00:06:30.319 --> 00:06:33.560 basically finding the coefficient values that best fit the patterns 136 00:06:33.560 --> 00:06:35.879 in the data by minimizing that loss function. 137 00:06:36.160 --> 00:06:40.000 Okay, that makes sense, which brings us to evaluating the 138 00:06:40.040 --> 00:06:44.000 model once it's trained. "Metrics matter," you called it, and 139 00:06:44.040 --> 00:06:47.519 you mentioned earlier that if I brag about ninety-five percent accuracy, 140 00:06:47.680 --> 00:06:50.519 you might be suspicious. Why isn't high accuracy good? 141 00:06:50.839 --> 00:06:54.000 It can be, but it can also be incredibly misleading, 142 00:06:54.160 --> 00:06:56.399 especially with what we call skewed data sets. 143 00:06:56.439 --> 00:06:58.040 Skewed meaning unbalanced? 144 00:06:58.199 --> 00:07:01.680 Exactly. Imagine you're trying to detect a rare disease that 145 00:07:01.800 --> 00:07:05.199 only affects one percent of the population. A lazy model 146 00:07:05.240 --> 00:07:08.959 could just predict "no disease" for absolutely everyone. It would 147 00:07:08.959 --> 00:07:11.279 be wrong one percent of the time, but right 148 00:07:11.360 --> 00:07:14.079 ninety-nine percent of the time, so ninety-nine percent accuracy. 149 00:07:14.120 --> 00:07:16.240 So it would be completely useless; it never finds the 150 00:07:16.279 --> 00:07:17.879 actual disease cases. Precisely. 151 00:07:17.920 --> 00:07:20.560 That's why simple accuracy fails on skewed data. It doesn't 152 00:07:20.560 --> 00:07:22.120 tell you if the model is good at finding the 153 00:07:22.120 --> 00:07:22.959 thing you actually 154 00:07:22.759 --> 00:07:25.600 care about. So we need smarter metrics. You mentioned precision 155 00:07:25.600 --> 00:07:29.040 and recall. These involve true positives, false positives, all that? 
156 00:07:29.720 --> 00:07:35.319 Yes, the confusion matrix terms: TP, TN, FP, FN. True positive, 157 00:07:35.519 --> 00:07:40.480 true negative, false positive, false negative. Precision asks: of all 158 00:07:40.480 --> 00:07:43.279 the times the model predicted something was positive, like "disease 159 00:07:43.360 --> 00:07:46.879 found," how many times was it actually right? The formula 160 00:07:47.000 --> 00:07:50.720 is $TP / (TP + FP)$. It's about minimizing the 161 00:07:50.759 --> 00:07:52.959 false positives, predicting something that isn't there. 162 00:07:53.040 --> 00:07:56.000 Okay. So precision is about the accuracy of the positive predictions. 163 00:07:56.040 --> 00:07:56.879 What about recall? 164 00:07:57.399 --> 00:08:01.000 Recall, which is also called sensitivity or true positive rate, 165 00:08:01.480 --> 00:08:04.360 asks a different question: of all the things that actually 166 00:08:04.360 --> 00:08:06.959 were positive in the real data, how many did the 167 00:08:07.000 --> 00:08:11.600 model successfully find? The formula is $TP / (TP + FN)$. 168 00:08:12.000 --> 00:08:15.120 It's about minimizing false negatives, missing things you should have found. 169 00:08:15.240 --> 00:08:20.920 Ah, okay. Minimizing false positives (precision) versus minimizing false negatives (recall), 170 00:08:21.399 --> 00:08:23.120 and I guess you can't always maximize both. 171 00:08:23.399 --> 00:08:25.759 Often there's a trade-off. Tuning a model to be 172 00:08:25.800 --> 00:08:28.759 extremely precise might make it miss some actual positive cases 173 00:08:28.839 --> 00:08:31.439 (lower recall). Tuning for extremely high recall might mean you 174 00:08:31.480 --> 00:08:33.720 get more false alarms (lower precision). 175 00:08:33.519 --> 00:08:37.519 And the right balance depends entirely on the consequences of 176 00:08:37.559 --> 00:08:40.639 getting it wrong. Can you give us those examples again? 177 00:08:40.679 --> 00:08:41.399 They were really clear. 
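The two formulas, and the rare-disease example from earlier in the conversation, can be checked with a few lines of Python. The counts below are made up for illustration (1,000 people, 10 of them sick). Returning 0.0 when a denominator is zero is a convention chosen for this sketch, not something stated in the episode.

```python
def precision(tp, fp):
    """Of everything predicted positive, the share that was truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """Of everything truly positive, the share the model found."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def accuracy(tp, tn, fp, fn):
    """Share of all predictions that were correct."""
    return (tp + tn) / (tp + tn + fp + fn)

# The "lazy" model: it predicts "no disease" for all 1,000 people.
lazy = dict(tp=0, tn=990, fp=0, fn=10)
lazy_accuracy = accuracy(**lazy)                # 99% correct overall
lazy_recall = recall(lazy["tp"], lazy["fn"])    # yet it finds none of the sick

# A hypothetical model that actually flags cases
real = dict(tp=8, tn=970, fp=20, fn=2)
real_precision = precision(real["tp"], real["fp"])   # 8 of 28 alarms were right
real_recall = recall(real["tp"], real["fn"])         # 8 of 10 sick people found
```

The second model has lower accuracy than the lazy one (978 correct out of 1,000 versus 990), yet it is the only one of the two that finds any patients, which is the point about accuracy on skewed data.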
178 00:08:41.480 --> 00:08:44.120 Sure. Let's take tumor prediction. What's the worst kind of 179 00:08:44.240 --> 00:08:44.799 error there? 180 00:08:44.960 --> 00:08:48.480 A false positive, right? Telling a healthy patient they have cancer. 181 00:08:48.519 --> 00:08:53.919 That's psychologically devastating and leads to unnecessary, potentially harmful treatments. 182 00:08:54.039 --> 00:08:58.480 Exactly. So in that case you need extremely high precision. 183 00:08:58.559 --> 00:09:01.240 You want to be very, very sure when you say cancer. 184 00:09:01.840 --> 00:09:05.600 You might tolerate slightly lower recall, meaning you might miss 185 00:09:05.639 --> 00:09:09.440 a few tumors initially (a false negative), because hopefully 186 00:09:09.519 --> 00:09:12.200 follow-up tests or screenings will catch those later. The cost 187 00:09:12.240 --> 00:09:14.399 of a false positive is just too high. 188 00:09:14.559 --> 00:09:18.679 Okay, high precision for tumors. Now flip it. What about, say, 189 00:09:18.720 --> 00:09:21.120 detecting shoplifters in the store security feed? 190 00:09:21.279 --> 00:09:22.919 Right, what's the worst error there? 191 00:09:23.080 --> 00:09:26.679 A false negative: missing someone who is shoplifting. The store 192 00:09:26.679 --> 00:09:28.639 loses merchandise, the crime goes 193 00:09:28.519 --> 00:09:33.679 unaddressed. Precisely. So here you need high recall. You want 194 00:09:33.679 --> 00:09:36.679 to catch as many actual incidents as possible. You might 195 00:09:36.759 --> 00:09:39.600 tolerate a few false positives, maybe flagging an innocent 196 00:09:39.639 --> 00:09:43.320 shopper occasionally who then gets quickly cleared by security. That's 197 00:09:43.360 --> 00:09:46.240 annoying for the customer, sure, but it's often seen as 198 00:09:46.600 --> 00:09:51.080 less costly than letting actual theft happen repeatedly. High recall 199 00:09:51.159 --> 00:09:52.480 is the priority. 
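One way to see the trade-off in the two scenarios above is to move the decision threshold on a model's scores. The sketch below uses made-up scores and labels, and the function name `precision_recall_at` is hypothetical. A strict threshold produces few but reliable alarms (high precision), while a lenient threshold catches every positive at the cost of false alarms (high recall).

```python
def precision_recall_at(threshold, scores, labels):
    """Precision and recall when scores >= threshold are predicted positive."""
    predicted = [score >= threshold for score in scores]
    tp = sum(p and y == 1 for p, y in zip(predicted, labels))
    fp = sum(p and y == 0 for p, y in zip(predicted, labels))
    fn = sum((not p) and y == 1 for p, y in zip(predicted, labels))
    # With no positive predictions, precision is reported as 1.0 by convention
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical model scores and true labels (1 = positive)
scores = [0.95, 0.90, 0.80, 0.60, 0.55, 0.40, 0.30, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

strict = precision_recall_at(0.85, scores, labels)    # few, confident alarms
lenient = precision_recall_at(0.35, scores, labels)   # many alarms
```

With these numbers the strict threshold gives precision 1.0 and recall 0.5, while the lenient one gives recall 1.0 and precision 4/6.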
200 00:09:52.000 --> 00:09:54.360 That really drives it home. It's not just about the math, 201 00:09:54.480 --> 00:09:57.360 it's about the real-world impact of different kinds of errors. 202 00:09:57.399 --> 00:10:01.639 So tools like scikit-learn's precision_recall_curve, or looking 203 00:10:01.639 --> 00:10:05.159 at ROC curves and AUC scores, they help you find 204 00:10:05.200 --> 00:10:06.639 that sweet spot? Exactly. 205 00:10:06.840 --> 00:10:09.200 They visualize the trade-off and help you choose a 206 00:10:09.240 --> 00:10:12.840 model threshold that balances precision and recall appropriately for your 207 00:10:12.879 --> 00:10:16.440 specific problem. There's no single best score. It depends on 208 00:10:16.480 --> 00:10:17.679 the context. 209 00:10:17.159 --> 00:10:19.440 Which is a great transition. We've talked about the pain 210 00:10:19.519 --> 00:10:22.480 of data prep, the nuances of metrics. Now let's talk 211 00:10:22.480 --> 00:10:23.559 about making it easier. 212 00:10:23.679 --> 00:10:27.039 Yes, knowledge is great, but applying it efficiently is key. 213 00:10:27.600 --> 00:10:30.879 Given how much manual effort goes into cleaning, tuning, and testing, 214 00:10:31.159 --> 00:10:33.360 let's look at the tools designed to abstract that away. 215 00:10:33.519 --> 00:10:35.799 The IBM Watson suite is a prime example 216 00:10:35.399 --> 00:10:39.039 here. Right, the automation aspect. Let's start with optimizing the 217 00:10:39.080 --> 00:10:43.480 model itself. Traditionally, after data cleaning, you face this huge 218 00:10:43.519 --> 00:10:47.240 task of trying different models, right? Decision trees, random forests, 219 00:10:47.320 --> 00:10:48.559 boosted trees. 220 00:10:48.440 --> 00:10:51.799 Dozens of them, potentially, and for each model type you 221 00:10:51.840 --> 00:10:54.440 have to do hyperparameter tuning. Hyperparameters. 
222 00:10:55.120 --> 00:10:58.320 Those are the knobs and dials inside the algorithm itself, 223 00:10:58.759 --> 00:11:01.480 like how deep a decision tree should grow (max_depth), 224 00:11:01.559 --> 00:11:04.279 or how many trees a random forest should use, 225 00:11:04.519 --> 00:11:08.240 n_estimators. Exactly, and finding the best combination of these settings 226 00:11:08.320 --> 00:11:12.200 is crucial for performance. The traditional way is often brute force, 227 00:11:12.360 --> 00:11:15.840 like grid search with cross-validation. You define a grid of 228 00:11:15.960 --> 00:11:19.639 possible values for each hyperparameter, and the computer systematically 229 00:11:19.639 --> 00:11:23.200 tries every single combination. It can take hours, even days, 230 00:11:23.240 --> 00:11:25.240 depending on the data and the model complexity. 231 00:11:25.279 --> 00:11:28.759 Okay, so that sounds incredibly tedious and computationally expensive. How 232 00:11:28.799 --> 00:11:30.879 does something like AutoAI shortcut this? 233 00:11:31.159 --> 00:11:35.919 AutoAI is designed specifically for this structured data optimization problem. 234 00:11:36.039 --> 00:11:39.440 It's pretty remarkable. You essentially give it your clean data set, 235 00:11:39.519 --> 00:11:41.720 tell it which column you want to predict, like median house 236 00:11:41.799 --> 00:11:44.399 value, or MEDV, in a housing data set, and then 237 00:11:44.519 --> 00:11:49.000 it just goes. It analyzes the data. It intelligently selects 238 00:11:49.039 --> 00:11:52.679 and applies data transformations. It builds multiple candidate pipelines using 239 00:11:52.759 --> 00:11:57.799 various algorithms. 
It performs sophisticated hyperparameter optimization automatically, far beyond 240 00:11:57.799 --> 00:12:00.240 simple grid search, and then it ranks all the 241 00:12:00.240 --> 00:12:03.360 tested pipelines based on metrics relevant to your problem, like 242 00:12:03.519 --> 00:12:05.279 RMSE, root mean squared error. 243 00:12:05.320 --> 00:12:08.720 And the key part is you don't write the modeling code. 244 00:12:08.559 --> 00:12:11.159 Not a single line for the model training and tuning part. 245 00:12:11.320 --> 00:12:13.519 It automates what used to be weeks of a data 246 00:12:13.519 --> 00:12:17.159 scientist's iterative work, presenting you with the best-performing models 247 00:12:17.200 --> 00:12:17.679 ready to go. 248 00:12:17.960 --> 00:12:21.879 Wow. Okay, that tackles structured tabular data. But what about 249 00:12:21.879 --> 00:12:26.200 the really messy stuff, unstructured text, images? We know traditional 250 00:12:26.320 --> 00:12:30.039 natural language processing, NLP, is a beast. You have to 251 00:12:30.039 --> 00:12:33.320 scrape text, clean it, filter out common stop words like 252 00:12:33.559 --> 00:12:36.960 "the" and "a", convert words to numbers using complex methods 253 00:12:37.000 --> 00:12:39.600 like word embeddings. It's a whole field in itself. It 254 00:12:39.679 --> 00:12:40.039 really is. 255 00:12:40.120 --> 00:12:43.320 Building a good NLP pipeline from scratch can take months 256 00:12:43.399 --> 00:12:46.559 or even years of specialized effort. This is where something 257 00:12:46.600 --> 00:12:49.960 like Watson Discovery comes in. It aims to bypass almost 258 00:12:50.000 --> 00:12:52.720 all of that initial heavy lifting for text analysis. 259 00:12:52.799 --> 00:12:53.799 How so? What does it do? 
260 00:12:54.159 --> 00:12:57.440 It provides powerful preprocessing out of the box, things like 261 00:12:57.480 --> 00:13:01.480 optical character recognition (OCR) to pull text from scanned documents, 262 00:13:01.720 --> 00:13:05.120 automatic text extraction from various file types. But the real 263 00:13:05.200 --> 00:13:07.559 magic is in the enrichments. Instead of you training a 264 00:13:07.600 --> 00:13:11.120 model for months just to recognize names or places, Discovery 265 00:13:11.159 --> 00:13:14.799 comes preloaded with enrichments like entity extraction (finding people, 266 00:13:14.879 --> 00:13:20.759 companies, locations), concept tagging (identifying key ideas), sentiment analysis (positive or negative tone), 267 00:13:20.840 --> 00:13:24.120 and more. You get deep insights almost instantly. So 268 00:13:24.000 --> 00:13:27.039 it's like having a pre-trained NLP expert ready to 269 00:13:27.080 --> 00:13:29.519 analyze huge volumes of documents. 270 00:13:29.559 --> 00:13:31.159 That's a good way to put it. And you can 271 00:13:31.240 --> 00:13:35.720 query these analyzed collections using the Discovery Query Language, or DQL. 272 00:13:36.360 --> 00:13:39.919 You use simple operators like :: for an exact match 273 00:13:40.039 --> 00:13:44.399 or : for contains to pinpoint specific information across potentially 274 00:13:44.480 --> 00:13:48.200 millions of documents without writing complex NLP code. 275 00:13:48.360 --> 00:13:50.399 Okay, that's text. What about images? 276 00:13:50.759 --> 00:13:55.720 Similar idea, very similar principle, with Watson Visual Recognition. Image analysis, 277 00:13:55.879 --> 00:13:59.879 especially using deep learning, is another complex field. Visual Recognition 278 00:14:00.080 --> 00:14:03.360 offers pre-built capabilities. 
You can use it for image classification, 279 00:14:03.559 --> 00:14:05.639 like telling the difference between a photo of a husky 280 00:14:05.720 --> 00:14:08.399 and a photo of a beagle, or for object detection, 281 00:14:08.559 --> 00:14:11.360 finding and maybe even counting specific things within an image, 282 00:14:11.360 --> 00:14:14.399 like identifying all the cars or people in a street scene. Again, 283 00:14:14.440 --> 00:14:16.759 it abstracts away the need to build and train those 284 00:14:16.799 --> 00:14:18.720 complex deep learning models yourself. 285 00:14:18.879 --> 00:14:21.840 It seems like a recurring theme: abstracting the complexity of 286 00:14:21.840 --> 00:14:24.720 the underlying ML. Let's touch on one more automation piece: 287 00:14:25.279 --> 00:14:28.639 building chatbots or conversational interfaces with Watson Assistant. 288 00:14:28.799 --> 00:14:30.279 How does that simplify things? 289 00:14:30.480 --> 00:14:34.039 It uses a fairly intuitive structure. You define the user's 290 00:14:34.080 --> 00:14:36.879 intents, what they're trying to achieve, often marked with a 291 00:14:36.919 --> 00:14:40.679 hash, like #order_pizza. Then you define entities, the 292 00:14:40.720 --> 00:14:44.559 specific pieces of information relevant to those intents, marked with an 293 00:14:44.679 --> 00:14:47.240 at sign, like @pizza_size or @topping. 294 00:14:47.600 --> 00:14:50.559 So intent is the goal, entity is the detail. How 295 00:14:50.559 --> 00:14:52.679 do they connect? Through dialogs. 296 00:14:53.360 --> 00:14:56.360 You build a flowchart, essentially, that defines the conversation. 297 00:14:56.799 --> 00:14:59.559 If the user expresses the #order_pizza intent, the 298 00:14:59.559 --> 00:15:02.519 dialog might then ask for the @pizza_size 299 00:15:02.559 --> 00:15:03.799 and @topping entities. 
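The intent, entity, and dialog relationship can be mimicked in a few lines of plain Python. To be clear, this is not the Watson Assistant API or its configuration format. `REQUIRED_ENTITIES`, `PROMPTS`, and `next_reply` are hypothetical names, and the sketch only illustrates the flowchart idea: given a recognized intent, ask for whichever required entity is still missing.

```python
# Hypothetical dialog definition: intent -> entities it needs before acting
REQUIRED_ENTITIES = {
    "#order_pizza": ["@pizza_size", "@topping"],
}

PROMPTS = {
    "@pizza_size": "What size would you like?",
    "@topping": "Which topping would you like?",
}

def next_reply(intent, collected):
    """Return the bot's next message given the intent and entities so far."""
    if intent not in REQUIRED_ENTITIES:
        return "Sorry, I did not understand that."
    for entity in REQUIRED_ENTITIES[intent]:
        if entity not in collected:
            return PROMPTS[entity]          # ask for the first missing detail
    size, topping = collected["@pizza_size"], collected["@topping"]
    return f"Ordering a {size} {topping} pizza."

collected = {}
collected["@pizza_size"] = "large"          # user: "I want a large pizza"
first = next_reply("#order_pizza", collected)
collected["@topping"] = "pepperoni"         # user: "pepperoni"
second = next_reply("#order_pizza", collected)
```

Here `first` is the topping prompt and `second` confirms the full order, because the `collected` dictionary carries the size forward between turns.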
300 00:15:04.279 --> 00:15:06.519 How does it remember what the user already said? Like 301 00:15:06.559 --> 00:15:08.159 if I say I want a large pizza and then 302 00:15:08.240 --> 00:15:09.759 later say pepperoni. 303 00:15:10.279 --> 00:15:13.919 That's handled by features like slots and context variables. Slots 304 00:15:13.919 --> 00:15:15.799 are defined within an intent to make sure the bot 305 00:15:15.840 --> 00:15:19.440 gathers all necessary entities. If it needs size and topping 306 00:15:19.600 --> 00:15:21.840 and you only gave the size, a slot can prompt 307 00:15:21.879 --> 00:15:24.519 for the topping. Context variables are like the bot's 308 00:15:24.600 --> 00:15:27.480 short-term memory. It can store the fact that pizza size 309 00:15:27.519 --> 00:15:29.960 is large in a context variable. So when you just say pepperoni, 310 00:15:29.960 --> 00:15:31.919 it knows you mean pepperoni for the large pizza you 311 00:15:31.919 --> 00:15:34.679 already mentioned. It maintains the state of the conversation. 312 00:15:35.000 --> 00:15:38.759 Okay, so we've got these powerful, often automated ways to 313 00:15:38.799 --> 00:15:42.480 build specialized ML models and services using tools like Watson: 314 00:15:43.039 --> 00:15:48.240 AutoAI for structured data, Discovery for text, Visual Recognition for images, 315 00:15:48.279 --> 00:15:50.840 Assistant for conversations. So what does this all mean? How 316 00:15:50.919 --> 00:15:53.559 do these things actually get used? How do we move 317 00:15:53.559 --> 00:15:55.720 from these tools to a live application? 318 00:15:56.120 --> 00:15:58.480 Good question. You need to get them into a production 319 00:15:58.600 --> 00:16:01.519 environment where users or other systems can interact with them. 320 00:16:01.799 --> 00:16:04.159 A common approach is to build a back-end application, 321 00:16:04.679 --> 00:16:08.519 maybe using a Python web framework like Flask. 
This Flask 322 00:16:08.600 --> 00:16:11.559 app acts as a middleman. It receives requests, maybe from 323 00:16:11.559 --> 00:16:13.519 a web page or mobile app, figures out what needs 324 00:16:13.559 --> 00:16:16.600 to happen, calls the relevant Watson API, like Discovery 325 00:16:16.639 --> 00:16:19.200 or Assistant, gets the result, and sends it 326 00:16:19.159 --> 00:16:22.440 back. And deploying that Flask app, is that complex too? It 327 00:16:22.360 --> 00:16:25.240 can be, but platform-as-a-service offerings like IBM 328 00:16:25.279 --> 00:16:28.440 Cloud with Cloud Foundry really simplify it. Often it's as 329 00:16:28.440 --> 00:16:31.159 simple as navigating to your project directory in the command 330 00:16:31.159 --> 00:16:34.799 line and typing cf push. The platform handles provisioning servers, 331 00:16:34.879 --> 00:16:38.639 load balancing, all the infrastructure stuff. It can be incredibly fast. 332 00:16:38.600 --> 00:16:41.600 So the path to production can be streamlined too. Are 333 00:16:41.600 --> 00:16:45.759 there other useful utility services that often plug into these systems? 334 00:16:45.799 --> 00:16:47.600 You mentioned a couple. Yeah, a couple 335 00:16:47.399 --> 00:16:50.879 of really useful ones come to mind. First, the Tone Analyzer. 336 00:16:51.559 --> 00:16:54.799 This service specifically analyzes text, not just for what 337 00:16:54.919 --> 00:16:57.960 is said, but how it's said. It uses NLP to 338 00:16:58.039 --> 00:17:01.480 detect emotional and language tones. What kind of tones? It 339 00:17:01.480 --> 00:17:05.200 breaks them down. There are emotional tones, things like anger, fear, 340 00:17:05.759 --> 00:17:11.359 joy, sadness, and then language tones: analytical, tentative, confident. 341 00:17:11.599 --> 00:17:13.720 I could see how that would be useful, like monitoring 342 00:17:13.720 --> 00:17:15.400 customer support chats or reviews. 
343 00:17:15.519 --> 00:17:20.079 Absolutely. Understanding the tone helps companies gauge customer sentiment, identify 344 00:17:20.240 --> 00:17:24.160 urgent issues, or even tailor responses dynamically. And the other utility? 345 00:17:24.359 --> 00:17:28.240 Text to speech, or TTS: exactly the kind of technology 346 00:17:28.279 --> 00:17:30.559 needed to voice a script like this one, actually. It 347 00:17:30.599 --> 00:17:33.599 takes written text and converts it into natural-sounding speech. 348 00:17:34.160 --> 00:17:38.119 Modern TTS services offer various high-quality voices, different languages, 349 00:17:38.400 --> 00:17:41.480 and you can even customize the output using SSML. That's 350 00:17:41.480 --> 00:17:46.720 Speech Synthesis Markup Language. It lets you control pronunciation, pauses, emphasis, pitch, 351 00:17:47.079 --> 00:17:49.559 making the synthesized speech sound much less robotic. 352 00:17:49.720 --> 00:17:52.920 Right, bringing it full circle. So to recap, we've seen 353 00:17:52.960 --> 00:17:56.319 that while algorithms are key, the real bear in ML 354 00:17:56.480 --> 00:18:00.079 is often data preparation and feature engineering. We learned that 355 00:18:00.119 --> 00:18:03.440 simple accuracy can lie, and we need metrics like precision 356 00:18:03.480 --> 00:18:06.480 and recall, balanced carefully based on the real-world consequences 357 00:18:06.480 --> 00:18:08.960 of errors. And then we saw how suites like IBM 358 00:18:09.039 --> 00:18:13.480 Watson provide powerful shortcuts: AutoAI for optimizing models on structured 359 00:18:13.519 --> 00:18:17.000 data without coding, Discovery and Visual Recognition for extracting insights 360 00:18:17.000 --> 00:18:21.759 from unstructured text and images, and Assistant for building conversational interfaces. 
361 00:18:21.359 --> 00:18:23.960 Plus utilities like Tone Analyzer and Text to Speech to 362 00:18:24.000 --> 00:18:28.519 add further capabilities, all deployable relatively easily via cloud platforms. 363 00:18:28.680 --> 00:18:31.599 Okay, so you, the listener, should now have a much 364 00:18:31.640 --> 00:18:36.240 clearer picture of both the deep challenges in ML (data quality, 365 00:18:36.359 --> 00:18:40.559 metric choice) and also the sophisticated tools emerging to automate 366 00:18:40.599 --> 00:18:42.480 and abstract away a lot of that complexity. 367 00:18:42.799 --> 00:18:45.880 And we saw specifically how tools like AutoAI can take 368 00:18:45.920 --> 00:18:49.519 over complex tasks like model selection and hyperparameter tuning, 369 00:18:49.839 --> 00:18:52.279 things that used to be purely the domain of the 370 00:18:52.319 --> 00:18:55.319 expert coder. Which leads to, I think, a really interesting 371 00:18:55.359 --> 00:18:57.680 final thought for you to chew on. As these incredibly 372 00:18:57.680 --> 00:19:01.319 powerful tools increasingly automate the how (the coding, the tuning, 373 00:19:01.400 --> 00:19:04.720 the model selection itself), where should the modern data learner 374 00:19:04.799 --> 00:19:07.759 focus their energy next? Is the most valuable skill becoming 375 00:19:07.759 --> 00:19:10.519 an even deeper mastery of the underlying code and mathematics, 376 00:19:11.000 --> 00:19:13.960 or is it shifting towards mastering the data itself, its quality, 377 00:19:14.000 --> 00:19:17.119 its nuances, its preparation, and ultimately the interpretation of what 378 00:19:17.160 --> 00:19:20.319 the automated tools tell us? Where does the essential human 379 00:19:20.359 --> 00:19:22.640 expertise lie now? Something to think about.