WEBVTT

1
00:00:00.080 --> 00:00:03.359
<v Speaker 1>Okay, let's unpack this. We're diving deep into machine learning today,

2
00:00:03.399 --> 00:00:07.080
<v Speaker 1>but maybe not in the way you'd expect. We're skipping

3
00:00:07.120 --> 00:00:10.519
<v Speaker 1>the basic code tutorials. Our mission really is twofold. First,

4
00:00:10.720 --> 00:00:14.000
<v Speaker 1>clear up that fuzzy line between AI and ML. Second,

5
00:00:14.199 --> 00:00:16.600
<v Speaker 1>and this is the big one, expose what might be

6
00:00:16.640 --> 00:00:20.719
<v Speaker 1>the hardest part of building these models. Hint, it's probably

7
00:00:20.719 --> 00:00:23.160
<v Speaker 1>not the coding. And then we'll look at some high-

8
00:00:23.239 --> 00:00:26.879
<v Speaker 1>level tools, specifically the IBM Watson suite, that aim to

9
00:00:26.920 --> 00:00:28.640
<v Speaker 1>sort of shortcut all that complexity. Sound good?

10
00:00:28.760 --> 00:00:31.760
<v Speaker 2>Sounds great. And yeah, starting with that AI

11
00:00:31.879 --> 00:00:34.840
<v Speaker 2>versus ML distinction, well, it's essential. People use them

12
00:00:34.840 --> 00:00:37.719
<v Speaker 2>interchangeably all the time, but they're really not the same thing.

13
00:00:37.799 --> 00:00:41.240
<v Speaker 2>AI, artificial intelligence, that's the really big umbrella term, right?

14
00:00:41.280 --> 00:00:44.000
<v Speaker 2>It covers anytime a machine does something we normally think

15
00:00:44.039 --> 00:00:46.359
<v Speaker 2>requires intelligence to achieve a goal.

16
00:00:46.479 --> 00:00:48.960
<v Speaker 1>Okay, so, like, think way back: a simple tic-tac-

17
00:00:49.039 --> 00:00:52.600
<v Speaker 1>toe game programmed with fixed rules. If it plays to

18
00:00:52.679 --> 00:00:55.960
<v Speaker 1>win based on those rules, that's AI?

19
00:00:55.960 --> 00:01:00.679
<v Speaker 2>Exactly. It's following programmed instructions to simulate intelligence. It's not learning,

20
00:01:00.840 --> 00:01:04.079
<v Speaker 2>just executing. Machine learning, or ML, is different. It's a

21
00:01:04.159 --> 00:01:07.480
<v Speaker 2>subset of AI. This is where the system actually improves

22
00:01:07.480 --> 00:01:10.879
<v Speaker 2>its performance on a task without being explicitly programmed for

23
00:01:10.959 --> 00:01:11.920
<v Speaker 2>every single step.

24
00:01:12.079 --> 00:01:14.799
<v Speaker 1>Ah. So instead of programming the if-this-then-that

25
00:01:14.879 --> 00:01:15.640
<v Speaker 1>for tic-tac-toe...

26
00:01:15.439 --> 00:01:18.959
<v Speaker 2>Right, you'd feed it, say, thousands of recorded games,

27
00:01:19.000 --> 00:01:22.400
<v Speaker 2>just the data, and the ML algorithm itself works out

28
00:01:22.439 --> 00:01:25.719
<v Speaker 2>the patterns, the statistics of what moves lead to wins

29
00:01:25.799 --> 00:01:28.840
<v Speaker 2>or losses. It basically builds its own strategy, its own

30
00:01:28.879 --> 00:01:31.000
<v Speaker 2>sort of functional equation from experience.

31
00:01:31.079 --> 00:01:34.480
<v Speaker 1>And underlying these algorithms are some core mathematical ideas. We

32
00:01:34.519 --> 00:01:36.640
<v Speaker 1>hear terms like linear regression.

33
00:01:36.319 --> 00:01:39.760
<v Speaker 2>Yeah, the classic y = mx + c, just finding the best

34
00:01:39.799 --> 00:01:42.959
<v Speaker 2>line through data points, simple but powerful.
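
NOTE
Editor's sketch, not part of the spoken audio: a minimal illustration of "finding the best line through data points" for the model y = m*x + c.
It assumes ordinary least squares as the meaning of "best"; the function name fit_line and the sample points are made up for illustration.
```python
# Closed-form least-squares fit of a straight line y = m*x + c.
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is covariance(x, y) divided by variance(x).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # The fitted line always passes through the point of means.
    c = mean_y - m * mean_x
    return m, c
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # made-up points lying exactly on y = 2x + 1
m, c = fit_line(xs, ys)
print(f"slope={m:.2f}, intercept={c:.2f}")
```
With noisy real data the points would not sit on the line, and the same formulas would return the line with the smallest total squared vertical error.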

35
00:01:42.640 --> 00:01:45.920
<v Speaker 1>Or things like support vector machines. Those sound more complex.

36
00:01:45.599 --> 00:01:48.079
<v Speaker 2>They are. SVMs are really good when the boundary between

37
00:01:48.079 --> 00:01:51.840
<v Speaker 2>your data categories isn't a straight line. Think complex

38
00:01:51.879 --> 00:01:54.159
<v Speaker 2>patterns or spotting outliers.

39
00:01:53.640 --> 00:01:57.239
<v Speaker 1>And k-nearest neighbors? That sounds more intuitive.

40
00:01:57.480 --> 00:02:01.719
<v Speaker 2>It is, conceptually. kNN is a lazy learner, meaning it needs no separate training step, just labeled examples.

41
00:02:02.040 --> 00:02:04.560
<v Speaker 2>It just looks at a new data point and classifies

42
00:02:04.599 --> 00:02:07.680
<v Speaker 2>it based on well what its nearest neighbors are in

43
00:02:07.719 --> 00:02:10.439
<v Speaker 2>the data space. Simple distance calculation.
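
NOTE
Editor's sketch, not part of the spoken audio: one way the "nearest neighbors by distance" idea could look in code.
It assumes every stored point already carries a known label, which is how kNN classification is usually set up; the name knn_predict, the points and the labels are hypothetical.
```python
import math
from collections import Counter
def knn_predict(train, query, k=3):
    # train is a list of ((x, y), label) pairs; query is an (x, y) point.
    # Sort the stored points by Euclidean distance to the query.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    # Majority vote among the k closest labeled points.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]
train = [
    ((1.0, 1.0), "red"), ((1.5, 2.0), "red"), ((2.0, 1.0), "red"),
    ((6.0, 6.0), "blue"), ((7.0, 6.5), "blue"), ((6.5, 7.0), "blue"),
]
prediction = knn_predict(train, (1.8, 1.4))
print(prediction)
```
There is no fitting step here at all: the "model" is just the stored points plus a distance calculation at prediction time.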

44
00:02:10.639 --> 00:02:13.400
<v Speaker 1>Essentially, so the common thread is math. They all need

45
00:02:13.479 --> 00:02:16.400
<v Speaker 1>numerical inputs to crunch the numbers and find that equation.

46
00:02:16.599 --> 00:02:21.479
<v Speaker 2>Precisely, they're sophisticated calculators at their core, and that numerical

47
00:02:21.560 --> 00:02:23.639
<v Speaker 2>need leads us right into the thick of it.

48
00:02:23.800 --> 00:02:26.560
<v Speaker 1>Right. Here's where it gets, as you said, really interesting,

49
00:02:26.680 --> 00:02:30.479
<v Speaker 1>because if the algorithm is the calculator, the data is

50
00:02:30.520 --> 00:02:35.039
<v Speaker 1>the fuel. And everything we've looked at suggests coding the model,

51
00:02:35.199 --> 00:02:37.840
<v Speaker 1>choosing the algorithm, that's often the easier part.

52
00:02:37.520 --> 00:02:41.199
<v Speaker 2>Oh, absolutely, far easier. Data preparation and something called

53
00:02:41.240 --> 00:02:43.879
<v Speaker 2>feature engineering. That's where the real time sink is. That's

54
00:02:43.919 --> 00:02:45.039
<v Speaker 2>the hardest part, easily.

55
00:02:45.280 --> 00:02:48.199
<v Speaker 1>That seems counterintuitive. Why is wrangling the data so much

56
00:02:48.199 --> 00:02:50.479
<v Speaker 1>harder than building the prediction engine itself?

57
00:02:50.680 --> 00:02:54.960
<v Speaker 2>Because real-world data is messy. It's incomplete, it's inconsistent,

58
00:02:55.080 --> 00:02:58.520
<v Speaker 2>it's often in the wrong format. The algorithms, like those

59
00:02:58.560 --> 00:03:01.719
<v Speaker 2>calculators, are actually quite robust once they have clean input,

60
00:03:02.080 --> 00:03:05.000
<v Speaker 2>but they are incredibly picky about getting that clean input.

61
00:03:05.319 --> 00:03:08.840
<v Speaker 2>The complexity is in taming the chaos before the math starts.

62
00:03:09.080 --> 00:03:12.120
<v Speaker 1>Okay, walk us through that taming process. What are the

63
00:03:12.199 --> 00:03:15.719
<v Speaker 1>key headaches, the terms a learner really needs to grasp?

64
00:03:15.719 --> 00:03:18.639
<v Speaker 2>Well, first up is just inspection. You load the data,

65
00:03:18.639 --> 00:03:20.759
<v Speaker 2>maybe using a tool like pandas in Python, and you

66
00:03:20.800 --> 00:03:23.919
<v Speaker 2>look at it. You'd use functions like, say, df.info()

67
00:03:23.960 --> 00:03:26.080
<v Speaker 2>to check for missing values. See that non-null

68
00:03:26.080 --> 00:03:28.680
<v Speaker 2>count? If it's less than your total rows, you've

69
00:03:28.680 --> 00:03:29.680
<v Speaker 2>got gaps.

70
00:03:29.479 --> 00:03:31.280
<v Speaker 1>And you can't just leave gaps, can you?

71
00:03:31.680 --> 00:03:33.960
<v Speaker 2>Nope. The math breaks down. So you have to decide: do

72
00:03:34.000 --> 00:03:36.479
<v Speaker 2>I fill them in, maybe using fillna() with the average

73
00:03:36.560 --> 00:03:39.039
<v Speaker 2>value of that column, or do I just drop those

74
00:03:39.159 --> 00:03:40.879
<v Speaker 2>rows entirely. That's a judgment call.
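
NOTE
Editor's sketch, not part of the spoken audio: the inspect, fill or drop choice for missing values, written with pandas.
The DataFrame contents and the column names city and population are made up; info(), isna(), fillna() and dropna() are standard pandas calls.
```python
import pandas as pd
df = pd.DataFrame({
    "city": ["A", "B", "C", "D"],
    "population": [1000.0, None, 3000.0, 2000.0],  # one gap on purpose
})
# Inspection: population reports 3 non-null values even though there are 4 rows.
df.info()
missing = int(df["population"].isna().sum())
# Option 1: fill the gap with the column average (here the mean of the 3 known values).
filled = df.assign(population=df["population"].fillna(df["population"].mean()))
# Option 2: drop any row whose population is missing.
dropped = df.dropna(subset=["population"])
print(missing, filled["population"].tolist(), len(dropped))
```
Filling keeps every row but invents a value; dropping keeps only observed values but shrinks the data set, which is the judgment call described above.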

75
00:03:41.039 --> 00:03:43.560
<v Speaker 1>Okay, so handling missing data. What else?

76
00:03:43.759 --> 00:03:47.560
<v Speaker 2>Then there's accessing the specific data you need, the features

77
00:03:47.719 --> 00:03:51.199
<v Speaker 2>or columns. You might use methods like df.loc

78
00:03:51.560 --> 00:03:55.800
<v Speaker 2>or just df[column_name], like df["population"]. And crucially,

79
00:03:55.840 --> 00:03:58.800
<v Speaker 2>you often need to normalize or scale features. If one

80
00:03:58.800 --> 00:04:01.919
<v Speaker 2>feature is age from zero to one hundred and another

81
00:04:02.039 --> 00:04:05.280
<v Speaker 2>is income from zero to millions, the income scale could

82
00:04:05.360 --> 00:04:08.520
<v Speaker 2>totally dominate the learning process just because the numbers are bigger.

83
00:04:08.639 --> 00:04:10.400
<v Speaker 2>You need to bring them to a comparable scale.
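
NOTE
Editor's sketch, not part of the spoken audio: bringing age and income onto a comparable scale.
Min-max scaling to the range 0..1 is assumed here as one common choice (standardizing to zero mean and unit variance is another); the sample values are made up.
```python
def min_max_scale(values):
    # Map the smallest value to 0.0 and the largest to 1.0.
    # Note: a constant column (hi == lo) would divide by zero and needs special handling.
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]
ages = [20, 40, 60, 100]
incomes = [0, 50_000, 250_000, 1_000_000]
scaled_ages = min_max_scale(ages)
scaled_incomes = min_max_scale(incomes)
print(scaled_ages)
print(scaled_incomes)
```
After scaling, both features live in the same 0..1 range, so neither dominates a distance or gradient calculation purely because of its units.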

84
00:04:10.680 --> 00:04:13.039
<v Speaker 1>Makes sense. But then you hit what you call the

85
00:04:13.120 --> 00:04:16.720
<v Speaker 1>language barrier, the fact that models only speak math.

86
00:04:16.759 --> 00:04:18.959
<v Speaker 2>Exactly. What do you do with categorical data? Things like color

87
00:04:19.040 --> 00:04:22.079
<v Speaker 2>names red, blue, green, or maybe city names or product types.

88
00:04:22.120 --> 00:04:22.759
<v Speaker 2>These aren't numbers.

89
00:04:22.879 --> 00:04:25.319
<v Speaker 1>You can't just assign red one, blue two, green three, right?

90
00:04:25.360 --> 00:04:28.399
<v Speaker 2>You absolutely cannot, because the algorithm would interpret that

91
00:04:28.519 --> 00:04:31.160
<v Speaker 2>as green being somehow three times as much as red,

92
00:04:31.720 --> 00:04:34.279
<v Speaker 2>or blue being more than red. It imposes a false

93
00:04:34.319 --> 00:04:36.600
<v Speaker 2>mathematical relationship that doesn't exist.

94
00:04:36.720 --> 00:04:39.759
<v Speaker 1>So that data is useless unless you transform it.

95
00:04:39.720 --> 00:04:42.920
<v Speaker 2>Completely useless to the algorithm in its raw state. This

96
00:04:42.959 --> 00:04:45.560
<v Speaker 2>is where we need a technique called one-hot encoding.

97
00:04:45.959 --> 00:04:49.160
<v Speaker 1>One-hot encoding. Okay, how does that work?

98
00:04:49.279 --> 00:04:52.560
<v Speaker 2>It's clever, actually. Instead of one column with red, blue, green, you

99
00:04:52.600 --> 00:04:56.199
<v Speaker 2>create three new columns, maybe is_red, is_blue, is_green.

100
00:04:56.759 --> 00:04:58.879
<v Speaker 2>For a row that was red, the is_red column

101
00:04:58.879 --> 00:05:00.759
<v Speaker 2>gets a one, and the other two get a zero.

102
00:05:01.040 --> 00:05:03.480
<v Speaker 2>For a blue row, is_blue gets one, others get zero.

103
00:05:04.319 --> 00:05:08.240
<v Speaker 2>Now you have purely numerical data, just zeros and ones

104
00:05:08.279 --> 00:05:12.160
<v Speaker 2>representing the categories, but without that fake ordering problem. The

105
00:05:12.199 --> 00:05:13.240
<v Speaker 2>algorithm can handle that.
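
NOTE
Editor's sketch, not part of the spoken audio: one-hot encoding a color column by hand.
The helper name one_hot is hypothetical; the is_<category> column names follow the red, blue, green example in the conversation.
```python
def one_hot(values):
    # One output column per distinct category, in sorted order.
    categories = sorted(set(values))
    # Each row gets a 1 in its own category's column and 0 everywhere else.
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]
rows = one_hot(["red", "blue", "green", "red"])
for row in rows:
    print(row)
```
In pandas the same transformation is usually done with pd.get_dummies, which builds these indicator columns for you.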

106
00:05:13.399 --> 00:05:16.879
<v Speaker 1>Got it. So lots of cleaning, filling gaps, scaling, and

107
00:05:16.959 --> 00:05:19.920
<v Speaker 1>this one-hot encoding for categories. That sounds like a

108
00:05:19.959 --> 00:05:20.680
<v Speaker 1>lot of steps.

109
00:05:20.839 --> 00:05:23.439
<v Speaker 2>It is, and it requires careful thought at each stage.

110
00:05:23.519 --> 00:05:26.000
<v Speaker 2>Get it wrong and your model's predictions will be meaningless,

111
00:05:26.040 --> 00:05:27.959
<v Speaker 2>no matter how sophisticated the algorithm is.

112
00:05:27.959 --> 00:05:29.879
<v Speaker 1>Okay. So let's say we've done all that, the data

113
00:05:30.040 --> 00:05:33.399
<v Speaker 1>is pristine, numerical. How does the model actually learn the

114
00:05:33.399 --> 00:05:36.360
<v Speaker 1>best coefficients, those A values in the equation

115
00:05:36.399 --> 00:05:37.920
<v Speaker 1>y = a0 + a1*x1 + ...?

116
00:05:38.199 --> 00:05:40.920
<v Speaker 2>It learns through, well, trial and error, lots of it,

117
00:05:41.079 --> 00:05:44.879
<v Speaker 2>very quickly. It starts with random guesses for those coefficients

118
00:05:45.000 --> 00:05:48.240
<v Speaker 2>the A values. It makes a prediction using those random values.

119
00:05:48.560 --> 00:05:51.759
<v Speaker 2>Then it compares its prediction to the actual known answer

120
00:05:51.839 --> 00:05:54.680
<v Speaker 2>in the training data. It calculates how wrong it was

121
00:05:54.800 --> 00:05:57.680
<v Speaker 2>using something called a loss function. A common one is

122
00:05:57.800 --> 00:06:01.720
<v Speaker 2>mean squared error, or MSE. It just measures the average

123
00:06:01.720 --> 00:06:03.680
<v Speaker 2>squared difference between prediction and reality.

124
00:06:03.399 --> 00:06:05.480
<v Speaker 1>So it measures the ouch.

125
00:06:05.319 --> 00:06:08.120
<v Speaker 2>Pretty much, and the goal is to minimize that ouch.

126
00:06:08.720 --> 00:06:12.000
<v Speaker 2>Based on the error, the model slightly adjusts its coefficients

127
00:06:12.000 --> 00:06:13.759
<v Speaker 2>in the direction that should reduce the error

128
00:06:13.800 --> 00:06:14.240
<v Speaker 2>next time.

129
00:06:14.519 --> 00:06:17.600
<v Speaker 2>It does this over and over again: making predictions, calculating

130
00:06:17.720 --> 00:06:21.439
<v Speaker 2>error, adjusting coefficients. Each full pass through the entire data set

131
00:06:21.480 --> 00:06:22.360
<v Speaker 2>doing this is called an epoch.

132
00:06:22.319 --> 00:06:25.399
<v Speaker 1>And it just keeps doing epochs until the error

133
00:06:25.519 --> 00:06:26.639
<v Speaker 1>is as low as possible?

134
00:06:26.519 --> 00:06:30.279
<v Speaker 2>Or until the error stops improving significantly. Yeah, it's

135
00:06:30.319 --> 00:06:33.560
<v Speaker 2>basically finding the coefficient values that best fit the patterns

136
00:06:33.560 --> 00:06:35.879
<v Speaker 2>in the data by minimizing that loss function.
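
NOTE
Editor's sketch, not part of the spoken audio: the predict, measure, adjust loop for a one-feature linear model.
Plain batch gradient descent on mean squared error is assumed; the learning rate, epoch count and data are made up, and the coefficients start at zero rather than random values so the run is repeatable.
```python
def train(xs, ys, lr=0.05, epochs=2000):
    a0, a1 = 0.0, 0.0  # initial guess for intercept and slope
    n = len(xs)
    history = []       # loss after each epoch, to watch it fall
    for _ in range(epochs):  # one epoch = one full pass over the data
        preds = [a0 + a1 * x for x in xs]
        errors = [p - y for p, y in zip(preds, ys)]
        mse = sum(e * e for e in errors) / n  # the loss function
        history.append(mse)
        # Gradients of MSE with respect to a0 and a1.
        grad_a0 = 2 * sum(errors) / n
        grad_a1 = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        # Nudge each coefficient in the direction that lowers the loss.
        a0 -= lr * grad_a0
        a1 -= lr * grad_a1
    return a0, a1, history
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # made-up data generated by y = 1 + 2x
a0, a1, history = train(xs, ys)
print(round(a0, 4), round(a1, 4), history[0], history[-1])
```
A real training loop would usually stop early once the loss stops improving instead of always running a fixed number of epochs.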

137
00:06:36.160 --> 00:06:40.000
<v Speaker 1>Okay, that makes sense, which brings us to evaluating the

138
00:06:40.040 --> 00:06:44.000
<v Speaker 1>model once it's trained. "Metrics matter," you called it, and

139
00:06:44.040 --> 00:06:47.519
<v Speaker 1>you mentioned earlier that if I brag about ninety-five percent accuracy,

140
00:06:47.680 --> 00:06:50.519
<v Speaker 1>you might be suspicious. Why? Isn't high accuracy good?

141
00:06:50.839 --> 00:06:54.000
<v Speaker 2>It can be, but it can also be incredibly misleading,

142
00:06:54.160 --> 00:06:56.399
<v Speaker 2>especially with what we call skewed data sets.

143
00:06:56.439 --> 00:06:58.040
<v Speaker 1>Skewed, meaning unbalanced?

144
00:06:58.199 --> 00:07:01.680
<v Speaker 2>Exactly. Imagine you're trying to detect a rare disease that

145
00:07:01.800 --> 00:07:05.199
<v Speaker 2>only affects one percent of the population. A lazy model

146
00:07:05.240 --> 00:07:08.959
<v Speaker 2>could just predict no disease for absolutely everyone. It would

147
00:07:08.959 --> 00:07:11.279
<v Speaker 2>be wrong one percent of the time, but right ninety

148
00:07:11.360 --> 00:07:14.079
<v Speaker 2>nine percent of the time, so ninety nine percent accuracy.
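
NOTE
Editor's sketch, not part of the spoken audio: the rare-disease example as arithmetic.
It assumes 1,000 hypothetical patients with a 1 percent disease rate and a model that always answers "no disease".
```python
# Label 1 means "has the disease", label 0 means "healthy".
actual = [1] * 10 + [0] * 990
predicted = [0] * 1000  # the lazy model: never predicts the disease
correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)
# How many real disease cases did the model actually flag?
found = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
print(accuracy, found)
```
The accuracy figure looks excellent while the count of detected cases shows the model is useless for its actual purpose.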

149
00:07:14.120 --> 00:07:16.240
<v Speaker 1>So it would be completely useless. It never finds the

150
00:07:16.279 --> 00:07:17.879
<v Speaker 1>actual disease cases.

151
00:07:17.920 --> 00:07:20.560
<v Speaker 2>Precisely. That's why simple accuracy fails on skewed data. It doesn't

152
00:07:20.560 --> 00:07:22.120
<v Speaker 2>tell you if the model is good at finding the

153
00:07:22.120 --> 00:07:22.959
<v Speaker 2>thing you actually care about.

154
00:07:22.759 --> 00:07:25.600
<v Speaker 1>So we need smarter metrics. You mentioned precision

155
00:07:25.600 --> 00:07:29.040
<v Speaker 1>and recall. These involve true positives, false positives, all that?

156
00:07:29.720 --> 00:07:35.319
<v Speaker 2>Yes, the confusion matrix terms: TP, TN, FP, FN. True positive,

157
00:07:35.519 --> 00:07:40.480
<v Speaker 2>true negative, false positive, false negative. Precision asks: of all

158
00:07:40.480 --> 00:07:43.279
<v Speaker 2>the times the model predicted something was positive, like disease

159
00:07:43.360 --> 00:07:46.879
<v Speaker 2>found, how many times was it actually right? The formula

160
00:07:47.000 --> 00:07:50.720
<v Speaker 2>is TP / (TP + FP). It's about minimizing the

161
00:07:50.759 --> 00:07:52.959
<v Speaker 2>false positives, predicting something that isn't there.

162
00:07:53.040 --> 00:07:56.000
<v Speaker 1>Okay. So precision is about the accuracy of the positive predictions.

163
00:07:56.040 --> 00:07:56.879
<v Speaker 1>What about recall?

164
00:07:57.399 --> 00:08:01.000
<v Speaker 2>Recall, which is also called sensitivity or true positive rate,

165
00:08:01.480 --> 00:08:04.360
<v Speaker 2>asks a different question: of all the things that actually

166
00:08:04.360 --> 00:08:06.959
<v Speaker 2>were positive in the real data, how many did the

167
00:08:07.000 --> 00:08:11.600
<v Speaker 2>model successfully find? The formula is TP / (TP + FN).

168
00:08:12.000 --> 00:08:15.120
<v Speaker 2>It's about minimizing false negatives, missing things you should have found.
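
NOTE
Editor's sketch, not part of the spoken audio: counting TP, TN, FP and FN and turning them into precision and recall.
The function names and the ten made-up labels are illustrative only; returning 0.0 when a denominator is zero is a convention chosen for this sketch.
```python
def confusion_counts(actual, predicted):
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # found, and really there
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # correctly left alone
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false alarm
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # missed case
    return tp, tn, fp, fn
def precision(tp, fp):
    # Of everything flagged positive, what fraction was right?
    return tp / (tp + fp) if (tp + fp) else 0.0
def recall(tp, fn):
    # Of everything actually positive, what fraction was found?
    return tp / (tp + fn) if (tp + fn) else 0.0
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
tp, tn, fp, fn = confusion_counts(actual, predicted)
print(tp, tn, fp, fn, precision(tp, fp), recall(tp, fn))
```
Here the model raises one false alarm and misses one real case, so precision and recall happen to come out equal.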

169
00:08:15.240 --> 00:08:20.920
<v Speaker 1>Ah, okay. Minimizing false positives, precision, versus minimizing false negatives, recall,

170
00:08:21.399 --> 00:08:23.120
<v Speaker 1>and I guess you can't always maximize both.

171
00:08:23.399 --> 00:08:25.759
<v Speaker 2>Often there's a trade-off. Tuning a model to be

172
00:08:25.800 --> 00:08:28.759
<v Speaker 2>extremely precise might make it miss some actual positive cases,

173
00:08:28.839 --> 00:08:31.439
<v Speaker 2>lower recall. Tuning for extremely high recall might mean you

174
00:08:31.480 --> 00:08:33.720
<v Speaker 2>get more false alarms, lower precision.

175
00:08:33.519 --> 00:08:37.519
<v Speaker 1>And the right balance depends entirely on the consequences of

176
00:08:37.559 --> 00:08:40.639
<v Speaker 1>getting it wrong. Can you give us those examples again?

177
00:08:40.679 --> 00:08:41.399
<v Speaker 1>They were really clear.

178
00:08:41.480 --> 00:08:44.120
<v Speaker 2>Sure. Let's take tumor prediction. What's the worst kind of

179
00:08:44.240 --> 00:08:44.799
<v Speaker 2>error there?

180
00:08:44.960 --> 00:08:48.480
<v Speaker 1>A false positive, right, telling a healthy patient they have cancer.

181
00:08:48.519 --> 00:08:53.919
<v Speaker 1>That's psychologically devastating and leads to unnecessary, potentially harmful treatments.

182
00:08:54.039 --> 00:08:58.480
<v Speaker 2>Exactly. So in that case you need extremely high precision.

183
00:08:58.559 --> 00:09:01.240
<v Speaker 2>You want to be very very sure when you say cancer.

184
00:09:01.840 --> 00:09:05.600
<v Speaker 2>You might tolerate slightly lower recall, meaning you might miss

185
00:09:05.639 --> 00:09:09.440
<v Speaker 2>a few tumors initially, a false negative, because hopefully follow-

186
00:09:09.519 --> 00:09:12.200
<v Speaker 2>up tests or screenings will catch those later. The cost

187
00:09:12.240 --> 00:09:14.399
<v Speaker 2>of a false positive is just too high.

188
00:09:14.559 --> 00:09:18.679
<v Speaker 1>Okay, high precision for tumors. Now flip it. What about say,

189
00:09:18.720 --> 00:09:21.120
<v Speaker 1>detecting shoplifters in the store security feed?

190
00:09:21.279 --> 00:09:22.919
<v Speaker 2>Right, what's the worst error there?

191
00:09:23.080 --> 00:09:26.679
<v Speaker 1>A false negative, missing someone who is shoplifting. The store

192
00:09:26.679 --> 00:09:28.639
<v Speaker 1>loses merchandise, the crime goes unaddressed.

193
00:09:28.519 --> 00:09:33.679
<v Speaker 2>Precisely. So here you need high recall. You want

194
00:09:33.679 --> 00:09:36.679
<v Speaker 2>to catch as many actual incidents as possible. You might

195
00:09:36.759 --> 00:09:39.600
<v Speaker 2>tolerate a few false positives, maybe flagging an innocent

196
00:09:39.639 --> 00:09:43.320
<v Speaker 2>shopper occasionally who then gets quickly cleared by security. That's

197
00:09:43.360 --> 00:09:46.240
<v Speaker 2>annoying for the customer, sure, but it's often seen as

198
00:09:46.600 --> 00:09:51.080
<v Speaker 2>less costly than letting actual theft happen repeatedly. High recall

199
00:09:51.159 --> 00:09:52.480
<v Speaker 2>is the priority.

200
00:09:52.000 --> 00:09:54.360
<v Speaker 1>That really drives it home. It's not just about the math,

201
00:09:54.480 --> 00:09:57.360
<v Speaker 1>it's about the real world impact of different kinds of errors.

202
00:09:57.399 --> 00:10:01.639
<v Speaker 1>So tools like scikit-learn's precision_recall_curve, or looking

203
00:10:01.639 --> 00:10:05.159
<v Speaker 1>at ROC curves and AUC scores. They help you find

204
00:10:05.200 --> 00:10:06.639
<v Speaker 1>that sweet spot?

205
00:10:06.840 --> 00:10:09.200
<v Speaker 2>Exactly. They visualize the trade-off and help you choose a

206
00:10:09.240 --> 00:10:12.840
<v Speaker 2>model threshold that balances precision and recall appropriately for your

207
00:10:12.879 --> 00:10:16.440
<v Speaker 2>specific problem. There's no single best score. It depends on

208
00:10:16.480 --> 00:10:17.679
<v Speaker 2>the context.
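
NOTE
Editor's sketch, not part of the spoken audio: how moving a decision threshold trades precision against recall.
The scores and labels are made up, and precision is reported as 1.0 when nothing is predicted positive, which is a convention chosen for this sketch.
```python
def precision_recall_at(scores, labels, threshold):
    # Anything scoring at or above the threshold is predicted positive.
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(predicted, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(predicted, labels) if p == 0 and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
scores = [0.95, 0.90, 0.80, 0.60, 0.55, 0.40, 0.30, 0.10]  # model confidence per case
labels = [1, 1, 0, 1, 0, 1, 0, 0]                          # what was actually true
strict = precision_recall_at(scores, labels, 0.85)  # cautious: flag only when very sure
loose = precision_recall_at(scores, labels, 0.35)   # eager: flag anything plausible
print("strict:", strict)
print("loose:", loose)
```
The strict threshold suits the tumor scenario described above and the loose one suits the shoplifting scenario; scikit-learn's precision_recall_curve computes the same pairs for every possible threshold at once.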

209
00:10:17.159 --> 00:10:19.440
<v Speaker 1>Which is a great transition. We've talked about the pain

210
00:10:19.519 --> 00:10:22.480
<v Speaker 1>of data prep, the nuances of metrics. Now let's talk

211
00:10:22.480 --> 00:10:23.559
<v Speaker 1>about making it easier.

212
00:10:23.679 --> 00:10:27.039
<v Speaker 2>Yes, knowledge is great, but applying it efficiently is key.

213
00:10:27.600 --> 00:10:30.879
<v Speaker 2>Given how much manual effort goes into cleaning, tuning, and testing,

214
00:10:31.159 --> 00:10:33.360
<v Speaker 2>let's look at the tools designed to abstract that away.

215
00:10:33.519 --> 00:10:35.799
<v Speaker 2>The IBM Watson suite is a prime example here.

216
00:10:35.399 --> 00:10:39.039
<v Speaker 1>Right, the automation aspect. Let's start with optimizing the

217
00:10:39.080 --> 00:10:43.480
<v Speaker 1>model itself. Traditionally, after data cleaning, you face this huge

218
00:10:43.519 --> 00:10:47.240
<v Speaker 1>task of trying different models, right? Decision trees, random forests,

219
00:10:47.320 --> 00:10:48.559
<v Speaker 1>boosted trees.

220
00:10:48.440 --> 00:10:51.799
<v Speaker 2>Dozens of them potentially, and for each model type you

221
00:10:51.840 --> 00:10:54.440
<v Speaker 2>have to do hyperparameter tuning.

222
00:10:55.120 --> 00:10:58.320
<v Speaker 1>Hyperparameters, those are the knobs and dials inside the algorithm itself,

223
00:10:58.759 --> 00:11:01.480
<v Speaker 1>like how deep a decision tree should grow, max_depth,

224
00:11:01.559 --> 00:11:04.279
<v Speaker 1>or how many trees a random forest should use, n_estimators.

225
00:11:04.519 --> 00:11:08.240
<v Speaker 2>Exactly, and finding the best combination of these settings

226
00:11:08.320 --> 00:11:12.200
<v Speaker 2>is crucial for performance. The traditional way is often brute force,

227
00:11:12.360 --> 00:11:15.840
<v Speaker 2>like grid search cross validation. You define a grid of

228
00:11:15.960 --> 00:11:19.639
<v Speaker 2>possible values for each hyperparameter, and the computer systematically

229
00:11:19.639 --> 00:11:23.200
<v Speaker 2>tries every single combination. It can take hours, even days,

230
00:11:23.240 --> 00:11:25.240
<v Speaker 2>depending on the data and the model complexity.
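
NOTE
Editor's sketch, not part of the spoken audio: why brute-force grid search gets expensive.
The grid values are made up, and validation_score is a hypothetical stand-in for "train with cross-validation and return a score"; a real search would fit a model at that point.
```python
from itertools import product
param_grid = {
    "max_depth": [2, 4, 8, 16],
    "n_estimators": [50, 100, 200],
    "min_samples_leaf": [1, 5, 10],
}
def validation_score(params):
    # Toy scoring rule that peaks at max_depth=8, n_estimators=100, min_samples_leaf=5.
    return (-abs(params["max_depth"] - 8)
            - abs(params["n_estimators"] - 100) / 50
            - abs(params["min_samples_leaf"] - 5) / 5)
names = list(param_grid)
# Every combination of every value: the "grid".
combos = [dict(zip(names, values)) for values in product(*param_grid.values())]
best = max(combos, key=validation_score)
cv_folds = 5
total_fits = len(combos) * cv_folds  # each combination is trained once per fold
print(len(combos), total_fits, best)
```
Adding one more hyperparameter with four candidate values would multiply the number of model fits by four, which is how searches stretch into hours or days; scikit-learn packages this pattern as GridSearchCV.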

231
00:11:25.279 --> 00:11:28.759
<v Speaker 1>Okay, so that sounds incredibly tedious and computationally expensive. How

232
00:11:28.799 --> 00:11:30.879
<v Speaker 1>does something like AutoAI shortcut this?

233
00:11:31.159 --> 00:11:35.919
<v Speaker 2>AutoAI is designed specifically for this structured data optimization problem.

234
00:11:36.039 --> 00:11:39.440
<v Speaker 2>It's pretty remarkable. You essentially give it your clean data set,

235
00:11:39.519 --> 00:11:41.720
<v Speaker 2>tell it which column you want to predict, like median house

236
00:11:41.799 --> 00:11:44.399
<v Speaker 2>value, or MEDV, in a housing data set, and then

237
00:11:44.519 --> 00:11:49.000
<v Speaker 2>it just goes. It analyzes the data, It intelligently selects

238
00:11:49.039 --> 00:11:52.679
<v Speaker 2>and applies data transformations. It builds multiple candidate pipelines using

239
00:11:52.759 --> 00:11:57.799
<v Speaker 2>various algorithms. It performs sophisticated hyperparameter optimization automatically, far beyond

240
00:11:57.799 --> 00:12:00.240
<v Speaker 2>simple grid search, and then it ranks all the

241
00:12:00.240 --> 00:12:03.360
<v Speaker 2>tested pipelines based on metrics relevant to your problem, like

242
00:12:03.519 --> 00:12:05.279
<v Speaker 2>RMSE, root mean squared error.
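
NOTE
Editor's sketch, not part of the spoken audio: ranking candidate pipelines by RMSE.
This only illustrates the ranking idea and is not AutoAI's API; the pipeline names, house values and predictions are all made up.
```python
import math
def rmse(actual, predicted):
    # Root of the average squared difference between truth and prediction.
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))
actual = [24.0, 21.6, 34.7, 33.4]  # made-up true house values
candidates = {
    "pipeline_a": [25.0, 20.6, 35.7, 32.4],  # off by about 1 everywhere
    "pipeline_b": [20.0, 25.6, 30.7, 37.4],  # off by about 4 everywhere
    "pipeline_c": [24.0, 21.6, 34.7, 32.4],  # off by about 1 on a single house
}
# Lower RMSE is better, so sort ascending.
leaderboard = sorted(candidates, key=lambda name: rmse(actual, candidates[name]))
for name in leaderboard:
    print(name, round(rmse(actual, candidates[name]), 3))
```
Because RMSE is in the same units as the target, a value of 1.0 here would mean predictions are typically off by about one unit of house value.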

243
00:12:05.320 --> 00:12:08.720
<v Speaker 1>And the key part is you don't write the modeling code.

244
00:12:08.559 --> 00:12:11.159
<v Speaker 2>Not a single line for the model training and tuning part.

245
00:12:11.320 --> 00:12:13.519
<v Speaker 2>It automates what used to be weeks of a data

246
00:12:13.519 --> 00:12:17.159
<v Speaker 2>scientist's iterative work, presenting you with the best performing models

247
00:12:17.200 --> 00:12:17.679
<v Speaker 2>ready to go.

248
00:12:17.960 --> 00:12:21.879
<v Speaker 1>Wow. Okay, that tackles structured tabular data. But what about

249
00:12:21.879 --> 00:12:26.200
<v Speaker 1>the really messy stuff, unstructured text, images? We know traditional

250
00:12:26.320 --> 00:12:30.039
<v Speaker 1>natural language processing, NLP, is a beast. You have to

251
00:12:30.039 --> 00:12:33.320
<v Speaker 1>scrape text, clean it, filter out common stop words like

252
00:12:33.559 --> 00:12:36.960
<v Speaker 1>"the" and "a," convert words to numbers using complex methods

253
00:12:37.000 --> 00:12:39.600
<v Speaker 1>like word embeddings. It's a whole field in itself.

254
00:12:39.679 --> 00:12:40.039
<v Speaker 2>It really is.
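
NOTE
Editor's sketch, not part of the spoken audio: the clean, filter, convert-to-numbers steps of a hand-built text pipeline.
Simple word counts (a bag of words) are used here in place of the word embeddings mentioned in the conversation, because they fit in a few lines; the stop-word list and sentences are made up.
```python
import re
from collections import Counter
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is"}  # tiny illustrative list
def tokenize(text):
    # Lowercase, keep alphabetic words, drop stop words.
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]
def bag_of_words(tokens, vocabulary):
    # One count per vocabulary word, in vocabulary order.
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]
docs = [
    "The pizza is great and the service is fast.",
    "A slow delivery and a cold pizza.",
]
tokens = [tokenize(d) for d in docs]
vocabulary = sorted({w for t in tokens for w in t})
vectors = [bag_of_words(t, vocabulary) for t in tokens]
print(vocabulary)
print(vectors)
```
Each document is now a row of numbers a model can consume, though counts like these ignore word order and meaning, which is part of why real NLP pipelines grow so large.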

255
00:12:40.120 --> 00:12:43.320
<v Speaker 2>Building a good NLP pipeline from scratch can take months

256
00:12:43.399 --> 00:12:46.559
<v Speaker 2>or even years of specialized effort. This is where something

257
00:12:46.600 --> 00:12:49.960
<v Speaker 2>like Watson Discovery comes in. It aims to bypass almost

258
00:12:50.000 --> 00:12:52.720
<v Speaker 2>all of that initial heavy lifting for text analysis.

259
00:12:52.799 --> 00:12:53.799
<v Speaker 1>How so, what does it do?

260
00:12:54.159 --> 00:12:57.440
<v Speaker 2>It provides powerful preprocessing out of the box, things like

261
00:12:57.480 --> 00:13:01.480
<v Speaker 2>optical character recognition OCR to pull text from scanned documents,

262
00:13:01.720 --> 00:13:05.120
<v Speaker 2>automatic text extraction from various file types. But the real

263
00:13:05.200 --> 00:13:07.559
<v Speaker 2>magic is in the enrichments. Instead of you training a

264
00:13:07.600 --> 00:13:11.120
<v Speaker 2>model for months just to recognize names or places, Discovery

265
00:13:11.159 --> 00:13:14.799
<v Speaker 2>comes preloaded with enrichments like entity extraction (finding people,

266
00:13:14.879 --> 00:13:20.759
<v Speaker 2>companies, locations), concept tagging (identifying key ideas), sentiment analysis (positive or negative tone),

267
00:13:20.840 --> 00:13:24.120
<v Speaker 2>and more. You get deep insights almost instantly.

268
00:13:24.000 --> 00:13:27.039
<v Speaker 1>So it's like having a pretrained NLP expert ready to

269
00:13:27.080 --> 00:13:29.519
<v Speaker 1>analyze huge volumes of documents.

270
00:13:29.559 --> 00:13:31.159
<v Speaker 2>That's a good way to put it. And you can

271
00:13:31.240 --> 00:13:35.720
<v Speaker 2>query these analyzed collections using the Discovery Query Language or DQL.

272
00:13:36.360 --> 00:13:39.919
<v Speaker 2>You use simple operators like a double colon for an exact match

273
00:13:40.039 --> 00:13:44.399
<v Speaker 2>or a single colon for contains, to pinpoint specific information across potentially

274
00:13:44.480 --> 00:13:48.200
<v Speaker 2>millions of documents without writing complex NLP code.

275
00:13:48.360 --> 00:13:50.399
<v Speaker 1>Okay, that's text. What about images?

276
00:13:50.759 --> 00:13:55.720
<v Speaker 2>Similar idea, very similar principle with Watson Visual Recognition. Image analysis,

277
00:13:55.879 --> 00:13:59.879
<v Speaker 2>especially using deep learning, is another complex field. Visual Recognition

278
00:14:00.080 --> 00:14:03.360
<v Speaker 2>offers pre built capabilities. You can use it for image classification,

279
00:14:03.559 --> 00:14:05.639
<v Speaker 2>like telling the difference between a photo of a husky

280
00:14:05.720 --> 00:14:08.399
<v Speaker 2>and a photo of a beagle, or for object detection

281
00:14:08.559 --> 00:14:11.360
<v Speaker 2>finding and maybe even counting specific things within an image,

282
00:14:11.360 --> 00:14:14.399
<v Speaker 2>like identifying all the cars or people in a street scene. Again,

283
00:14:14.440 --> 00:14:16.759
<v Speaker 2>it abstracts away the need to build and train those

284
00:14:16.799 --> 00:14:18.720
<v Speaker 2>complex deep learning models yourself.

285
00:14:18.879 --> 00:14:21.840
<v Speaker 1>It seems like a recurring theme abstracting the complexity of

286
00:14:21.840 --> 00:14:24.720
<v Speaker 1>the underlying ML. Let's touch on one more automation piece:

287
00:14:25.279 --> 00:14:28.639
<v Speaker 1>building chatbots or conversational interfaces with Watson Assistant.

288
00:14:28.799 --> 00:14:30.279
<v Speaker 1>How does that simplify things?

289
00:14:30.480 --> 00:14:34.039
<v Speaker 2>It uses a fairly intuitive structure. You define the user's

290
00:14:34.080 --> 00:14:36.879
<v Speaker 2>intents, what they're trying to achieve, often marked with a

291
00:14:36.919 --> 00:14:40.679
<v Speaker 2>hash, like #order_pizza. Then you define entities, the

292
00:14:40.720 --> 00:14:44.559
<v Speaker 2>specific pieces of information relevant to those intents, marked with an

293
00:14:44.679 --> 00:14:47.240
<v Speaker 2>at sign, like @pizza_size or @topping.

294
00:14:47.600 --> 00:14:50.559
<v Speaker 1>So intent is the goal, entity is the detail. How

295
00:14:50.559 --> 00:14:52.679
<v Speaker 1>do they connect?

296
00:14:53.360 --> 00:14:56.360
<v Speaker 2>Through dialogues. You build a flowchart, essentially, that defines the conversation.

297
00:14:56.799 --> 00:14:59.559
<v Speaker 2>If the user expresses the #order_pizza intent, the

298
00:14:59.559 --> 00:15:02.519
<v Speaker 2>dialogue might then ask for the @pizza_size

299
00:15:02.559 --> 00:15:03.799
<v Speaker 2>and @topping entities.

300
00:15:04.279 --> 00:15:06.519
<v Speaker 1>How does it remember what the user already said, Like

301
00:15:06.559 --> 00:15:08.159
<v Speaker 1>if I say I want a large pizza and then

302
00:15:08.240 --> 00:15:09.759
<v Speaker 1>later say pepperoni.

303
00:15:10.279 --> 00:15:13.919
<v Speaker 2>That's handled by features like slots and context variables. Slots

304
00:15:13.919 --> 00:15:15.799
<v Speaker 2>are defined within an intent to make sure the bot

305
00:15:15.840 --> 00:15:19.440
<v Speaker 2>gathers all necessary entities. If it needs size and topping

306
00:15:19.600 --> 00:15:21.840
<v Speaker 2>and you only gave the size, a slot can prompt

307
00:15:21.879 --> 00:15:24.519
<v Speaker 2>for the topping. Context variables are like the bot's short

308
00:15:24.600 --> 00:15:27.480
<v Speaker 2>term memory. It can store the fact that pizza size

309
00:15:27.519 --> 00:15:29.960
<v Speaker 2>is large in a context variable. So when you just say pepperoni,

310
00:15:29.960 --> 00:15:31.919
<v Speaker 2>it knows you mean pepperoni for the large pizza you

311
00:15:31.919 --> 00:15:34.679
<v Speaker 2>already mentioned. It maintains the state of the conversation.
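
NOTE
Editor's sketch, not part of the spoken audio: slots and context variables in miniature.
This is plain Python, not Watson Assistant's configuration or API; the slot names, entity values and the helpers extract_entities and respond are hypothetical, and intent detection is left out entirely.
```python
REQUIRED_SLOTS = ["pizza_size", "topping"]
ENTITY_VALUES = {
    "pizza_size": {"small", "medium", "large"},
    "topping": {"pepperoni", "mushroom", "cheese"},
}
def extract_entities(utterance):
    # Naive entity spotting: look for known values among the words.
    words = utterance.lower().replace(",", " ").split()
    return {name: w for name, values in ENTITY_VALUES.items() for w in words if w in values}
def respond(utterance, context):
    # The context dict plays the role of short-term memory across turns.
    context.update(extract_entities(utterance))
    # Slot filling: prompt for the first required detail still missing.
    for slot in REQUIRED_SLOTS:
        if slot not in context:
            return f"What {slot.replace('_', ' ')} would you like?"
    return f"Ordering a {context['pizza_size']} {context['topping']} pizza."
context = {}
first = respond("I want a large pizza", context)
second = respond("pepperoni", context)
print(first)
print(second)
```
Because the size from the first turn stays in the context, the single word in the second turn is enough to complete the order.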

312
00:15:35.000 --> 00:15:38.759
<v Speaker 1>Okay, so we've got these powerful, often automated ways to

313
00:15:38.799 --> 00:15:42.480
<v Speaker 1>build specialized ML models and services using tools like Watson:

314
00:15:43.039 --> 00:15:48.240
<v Speaker 1>AutoAI for structured data, Discovery for text, Visual Recognition for images,

315
00:15:48.279 --> 00:15:50.840
<v Speaker 1>Assistant for conversations. So what does this all mean? How

316
00:15:50.919 --> 00:15:53.559
<v Speaker 1>do these things actually get used? How do we move

317
00:15:53.559 --> 00:15:55.720
<v Speaker 1>from these tools to a live application?

318
00:15:56.120 --> 00:15:58.480
<v Speaker 2>Good question. You need to get them into a production

319
00:15:58.600 --> 00:16:01.519
<v Speaker 2>environment where users or other systems can interact with them.

320
00:16:01.799 --> 00:16:04.159
<v Speaker 2>A common approach is to build a backend application,

321
00:16:04.679 --> 00:16:08.519
<v Speaker 2>maybe using a Python web framework like Flask. This Flask

322
00:16:08.600 --> 00:16:11.559
<v Speaker 2>app acts as a middleman. It receives requests maybe from

323
00:16:11.559 --> 00:16:13.519
<v Speaker 2>a web page or mobile app, figures out what needs

324
00:16:13.559 --> 00:16:16.600
<v Speaker 2>to happen, calls the relevant Watson API, like Discovery

325
00:16:16.639 --> 00:16:19.200
<v Speaker 2>or Assistant, gets the result, and sends it back.

326
00:16:19.159 --> 00:16:22.440
<v Speaker 1>And deploying that Flask app, is that complex too?

327
00:16:22.360 --> 00:16:25.240
<v Speaker 2>It can be, but platform as a service offerings like IBM

328
00:16:25.279 --> 00:16:28.440
<v Speaker 2>Cloud with Cloud Foundry really simplify it. Often it's as

329
00:16:28.440 --> 00:16:31.159
<v Speaker 2>simple as navigating to your project directory in the command

330
00:16:31.159 --> 00:16:34.799
<v Speaker 2>line and typing cf push. The platform handles provisioning servers,

331
00:16:34.879 --> 00:16:38.639
<v Speaker 2>load balancing, all the infrastructure stuff. It can be incredibly fast.
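
NOTE
Editor's sketch, not part of the episode audio: the middleman role described for the Flask app, reduced to a framework-free routing function. The names handle_request, fake_discovery, and fake_assistant are hypothetical stubs. In a real Flask app a function like this would sit inside a route handler, and the stubs would be replaced by actual Watson SDK calls, whose signatures the episode does not give.
```python
from typing import Callable, Dict, Tuple
def fake_discovery(text: str) -> dict:
    # Stand-in for a document search service call (assumption, not a real API).
    return {"service": "discovery", "echo": text}
def fake_assistant(text: str) -> dict:
    # Stand-in for a conversational service call (assumption, not a real API).
    return {"service": "assistant", "echo": text}
# Map each kind of incoming request to the service that should handle it.
SERVICES: Dict[str, Callable[[str], dict]] = {
    "search": fake_discovery,
    "chat": fake_assistant,
}
def handle_request(payload: dict) -> Tuple[dict, int]:
    # Decide what needs to happen, call the relevant service, return body and status.
    kind = payload.get("kind")
    text = payload.get("text", "")
    service = SERVICES.get(kind)
    if service is None:
        return {"error": f"unknown request kind: {kind!r}"}, 400
    return service(text), 200
```
Keeping the routing in a plain function like this makes it easy to test without starting a web server; the web framework then only has to parse the request and serialize the response.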

332
00:16:38.600 --> 00:16:41.600
<v Speaker 1>So the path to production can be streamlined too. Are

333
00:16:41.600 --> 00:16:45.759
<v Speaker 1>there other useful utility services that often plug into these systems?

334
00:16:45.799 --> 00:16:47.600
<v Speaker 1>You mentioned a couple.

335
00:16:47.399 --> 00:16:50.879
<v Speaker 2>Yeah, a couple of really useful ones come to mind. First, the Tone Analyzer.

336
00:16:51.559 --> 00:16:54.799
<v Speaker 2>This service specifically analyzes text, not just for what

337
00:16:54.919 --> 00:16:57.960
<v Speaker 2>is said, but how it's said. It uses NLP to

338
00:16:58.039 --> 00:17:01.480
<v Speaker 2>detect emotional and language tones, two kinds of tones. It

339
00:17:01.480 --> 00:17:05.200
<v Speaker 2>breaks them down. There are emotional tones, things like anger, fear,

340
00:17:05.759 --> 00:17:11.359
<v Speaker 2>joy, sadness, and then language tones: analytical, tentative, confident.

341
00:17:11.599 --> 00:17:13.720
<v Speaker 1>I could see how that would be useful, like monitoring

342
00:17:13.720 --> 00:17:15.400
<v Speaker 1>customer support chats or reviews.

343
00:17:15.519 --> 00:17:20.079
<v Speaker 2>Absolutely. Understanding the tone helps companies gauge customer sentiment, identify

344
00:17:20.240 --> 00:17:24.160
<v Speaker 2>urgent issues, or even tailor responses dynamically. And the other utility:

345
00:17:24.359 --> 00:17:28.240
<v Speaker 2>text to speech, or TTS, exactly the kind of technology

346
00:17:28.279 --> 00:17:30.559
<v Speaker 2>needed to voice a script like this one, actually. It

347
00:17:30.599 --> 00:17:33.599
<v Speaker 2>takes written text and converts it into natural sounding speech.

348
00:17:34.160 --> 00:17:38.119
<v Speaker 2>Modern TTS services offer various high-quality voices, different languages,

349
00:17:38.400 --> 00:17:41.480
<v Speaker 2>and you can even customize the output using SSML. That's

350
00:17:41.480 --> 00:17:46.720
<v Speaker 2>Speech Synthesis Markup Language. It lets you control pronunciation, pauses, emphasis, pitch,

351
00:17:47.079 --> 00:17:49.559
<v Speaker 2>making the synthesized speech sound much less robotic.
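
NOTE
Editor's sketch, not part of the episode audio: a small helper that builds an SSML string with a pause and emphasis, two of the controls mentioned above. The function name build_ssml and its parameters are hypothetical. The speak, break, and emphasis elements come from the W3C SSML specification; which elements a particular TTS service honors varies, so treat this as an illustration rather than a service-specific recipe.
```python
from xml.sax.saxutils import escape
def build_ssml(sentence: str, emphasized: str, pause_ms: int = 400) -> str:
    # Split the sentence around the first occurrence of the phrase to emphasize.
    before, found, after = sentence.partition(emphasized)
    if not found:
        # Nothing to emphasize: wrap the escaped sentence as plain SSML.
        return f"<speak>{escape(sentence)}</speak>"
    return (
        "<speak>"
        f"{escape(before)}"
        f'<break time="{pause_ms}ms"/>'
        f'<emphasis level="strong">{escape(emphasized)}</emphasis>'
        f"{escape(after)}"
        "</speak>"
    )
```
Escaping the text matters because characters such as an ampersand or angle bracket in the input would otherwise produce invalid markup.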

352
00:17:49.720 --> 00:17:52.920
<v Speaker 1>Right, bringing it full circle. So to recap, we've seen

353
00:17:52.960 --> 00:17:56.319
<v Speaker 1>that while algorithms are key, the real bear in ML

354
00:17:56.480 --> 00:18:00.079
<v Speaker 1>is often data preparation and feature engineering. We learned that

355
00:18:00.119 --> 00:18:03.440
<v Speaker 1>simple accuracy can lie, and we need metrics like precision

356
00:18:03.480 --> 00:18:06.480
<v Speaker 1>and recall, balanced carefully based on the real-world consequences

357
00:18:06.480 --> 00:18:08.960
<v Speaker 1>of errors. And then we saw how suites like IBM

358
00:18:09.039 --> 00:18:13.480
<v Speaker 1>Watson provide powerful shortcuts: AutoAI for optimizing models on structured

359
00:18:13.519 --> 00:18:17.000
<v Speaker 1>data without coding, Discovery and Visual Recognition for extracting insights

360
00:18:17.000 --> 00:18:21.759
<v Speaker 1>from unstructured text and images, and Assistant for building conversational interfaces.

361
00:18:21.359 --> 00:18:23.960
<v Speaker 2>Plus utilities like Tone Analyzer and Text to Speech to

362
00:18:24.000 --> 00:18:28.519
<v Speaker 2>add further capabilities, all deployable relatively easily via cloud platforms.

363
00:18:28.680 --> 00:18:31.599
<v Speaker 1>Okay, so you, the listener, should now have a much

364
00:18:31.640 --> 00:18:36.240
<v Speaker 1>clearer picture of both the deep challenges in ML, data quality,

365
00:18:36.359 --> 00:18:40.559
<v Speaker 1>metric choice, and also the sophisticated tools emerging to automate

366
00:18:40.599 --> 00:18:42.480
<v Speaker 1>and abstract away a lot of that complexity.

367
00:18:42.799 --> 00:18:45.880
<v Speaker 2>And we saw specifically how tools like AutoAI can take

368
00:18:45.920 --> 00:18:49.519
<v Speaker 2>over complex tasks like model selection and hyperparameter tuning,

369
00:18:49.839 --> 00:18:52.279
<v Speaker 2>things that used to be purely the domain of the

370
00:18:52.319 --> 00:18:55.319
<v Speaker 2>expert coder. Which leads to, I think, a really interesting

371
00:18:55.359 --> 00:18:57.680
<v Speaker 2>final thought for you to chew on. As these incredibly

372
00:18:57.680 --> 00:19:01.319
<v Speaker 2>powerful tools increasingly automate the how, the coding, the tuning,

373
00:19:01.400 --> 00:19:04.720
<v Speaker 2>the model selection itself, where should the modern data learner

374
00:19:04.799 --> 00:19:07.759
<v Speaker 2>focus their energy next? Is the most valuable skill becoming

375
00:19:07.759 --> 00:19:10.519
<v Speaker 2>an even deeper mastery of the underlying code and mathematics,

376
00:19:11.000 --> 00:19:13.960
<v Speaker 2>or is it shifting towards mastering the data itself, its quality,

377
00:19:14.000 --> 00:19:17.119
<v Speaker 2>its nuances, its preparation, and ultimately the interpretation of what

378
00:19:17.160 --> 00:19:20.319
<v Speaker 2>the automated tools tell us? Where does the essential human

379
00:19:20.359 --> 00:19:22.640
<v Speaker 2>expertise lie now? Something to think about.
