WEBVTT

1
00:00:00.160 --> 00:00:03.279
<v Speaker 1>Welcome to the deep dive. You know, when most people

2
00:00:03.319 --> 00:00:07.320
<v Speaker 1>hear machine learning or maybe AI, I think the first

3
00:00:07.320 --> 00:00:09.960
<v Speaker 1>thing that comes to mind is the code.

4
00:00:10.080 --> 00:00:13.240
<v Speaker 2>Right. Oh, absolutely. Python scripts, neural nets, all that complex

5
00:00:13.279 --> 00:00:15.359
<v Speaker 2>engineering stuff. That's the flashy part.

6
00:00:15.519 --> 00:00:17.199
<v Speaker 1>Yeah, the engine. It is the engine. Yeah.

7
00:00:17.199 --> 00:00:19.679
<v Speaker 2>But what our source material for today really emphasizes, and

8
00:00:19.719 --> 00:00:23.440
<v Speaker 2>it's looking specifically at the prerequisites for even building those engines,

9
00:00:24.199 --> 00:00:27.920
<v Speaker 2>is that the real foundation isn't the code. Okay,

10
00:00:28.160 --> 00:00:32.000
<v Speaker 2>it's math. Specifically, it's statistics. You could almost call it

11
00:00:32.039 --> 00:00:33.640
<v Speaker 2>a preliminary requirement.

12
00:00:33.799 --> 00:00:35.759
<v Speaker 1>Right. So that's our mission today, then. We're aiming to

13
00:00:35.759 --> 00:00:38.039
<v Speaker 1>give you a bit of an intellectual shortcut here. We

14
00:00:38.079 --> 00:00:41.679
<v Speaker 1>want to pull out the essential statistical concepts, the

15
00:00:41.679 --> 00:00:45.600
<v Speaker 1>core vocabulary and the toolkit you need for exploring data,

16
00:00:45.840 --> 00:00:48.600
<v Speaker 1>cleaning it up, getting ready for predictive modeling.

17
00:00:48.280 --> 00:00:51.280
<v Speaker 2>Basically saving you the trouble of reading the whole textbook page by page.

18
00:00:51.079 --> 00:00:54.679
<v Speaker 1>Exactly. This is about getting that statistical fluency

19
00:00:54.719 --> 00:00:57.200
<v Speaker 1>you need before you even think about training a model.

20
00:00:56.920 --> 00:01:00.520
<v Speaker 2>And it's not just about passing some exam.

21
00:01:00.920 --> 00:01:05.359
<v Speaker 2>You genuinely need these concepts because, well, every single step

22
00:01:05.400 --> 00:01:07.640
<v Speaker 2>in an ML pipeline, from the moment you get the

23
00:01:07.719 --> 00:01:11.359
<v Speaker 2>data to evaluating how well your model did, is fundamentally

24
00:01:11.400 --> 00:01:12.519
<v Speaker 2>a statistical operation.

25
00:01:13.200 --> 00:01:15.879
<v Speaker 1>Okay, so where do we start? I guess right at

26
00:01:15.879 --> 00:01:20.000
<v Speaker 1>the beginning, recognizing what kind of data you're even dealing with.

27
00:01:20.319 --> 00:01:23.439
<v Speaker 2>That's the spot. The sources remind us that data collection

28
00:01:23.560 --> 00:01:26.959
<v Speaker 2>isn't just, you know, chaos. It's usually driven by trying

29
00:01:26.959 --> 00:01:29.000
<v Speaker 2>to answer some real-world question.

30
00:01:28.879 --> 00:01:31.560
<v Speaker 1>Like market research before you launch a product, maybe.

31
00:01:31.400 --> 00:01:33.760
<v Speaker 2>Exactly. Is this product feasible? Who are we trying to

32
00:01:33.799 --> 00:01:34.719
<v Speaker 2>reach? That kind of thing.

33
00:01:34.840 --> 00:01:37.280
<v Speaker 1>And the answers we get, the actual numbers we collect

34
00:01:37.280 --> 00:01:37.959
<v Speaker 1>and store?

35
00:01:38.000 --> 00:01:41.799
<v Speaker 2>Those are grouped into what statisticians call random variables. They're

36
00:01:41.879 --> 00:01:44.719
<v Speaker 2>the numerical backbone of whatever research you're doing.

37
00:01:44.840 --> 00:01:47.760
<v Speaker 1>Okay, random variables. So how do we bring some order

38
00:01:47.799 --> 00:01:49.319
<v Speaker 1>to that? How do we structure them?

39
00:01:49.439 --> 00:01:52.200
<v Speaker 2>Well, we mainly split them based on what they can

40
00:01:52.239 --> 00:01:55.719
<v Speaker 2>actually measure. First up, you've got discrete random variables.

41
00:01:55.760 --> 00:01:57.079
<v Speaker 1>Discrete, meaning separate?

42
00:01:57.359 --> 00:01:59.519
<v Speaker 2>Yeah, think fixed counts. They have to be whole numbers.

43
00:01:59.519 --> 00:02:03.319
<v Speaker 2>It can be, like, counting how many people clicked on

44
00:02:03.359 --> 00:02:05.719
<v Speaker 2>an ad or the number of gold medals a country

45
00:02:05.719 --> 00:02:09.560
<v Speaker 2>won in the Olympics. Fixed counts, definite fixed counts.

46
00:02:09.400 --> 00:02:12.360
<v Speaker 1>So what's the other kind, the stuff that isn't fixed counts?

47
00:02:12.599 --> 00:02:15.840
<v Speaker 2>That would be your continuous random variable. This one stores

48
00:02:15.919 --> 00:02:19.879
<v Speaker 2>values that can be decimals or floats, and, theoretically at

49
00:02:19.919 --> 00:02:22.759
<v Speaker 2>least, you could measure them with infinite precision.

50
00:02:22.479 --> 00:02:24.000
<v Speaker 1>Like height or weight?

51
00:02:24.240 --> 00:02:28.800
<v Speaker 2>Perfect examples: height, weight, temperature. You can always, in theory,

52
00:02:29.120 --> 00:02:31.479
<v Speaker 2>add another decimal place to make the measurement finer.

53
00:02:31.800 --> 00:02:34.960
<v Speaker 1>Okay, that makes sense for numbers. But what if the

54
00:02:35.039 --> 00:02:37.280
<v Speaker 1>data isn't a number at all? Like if it's just

55
00:02:37.319 --> 00:02:40.199
<v Speaker 1>a label, someone's city or maybe their preferred brand?

56
00:02:40.319 --> 00:02:44.000
<v Speaker 2>Ah, good question. Then you're working with categorical variables. And

57
00:02:44.039 --> 00:02:47.120
<v Speaker 2>this is where we need another layer of distinction, because

58
00:02:47.479 --> 00:02:50.960
<v Speaker 2>how an ML algorithm handles these depends a lot on

59
00:02:51.000 --> 00:02:54.680
<v Speaker 2>whether the categories have some kind of internal meaning or order.

60
00:02:54.759 --> 00:02:57.680
<v Speaker 1>Wait, internal meaning? Why does that matter? Isn't red just

61
00:02:57.919 --> 00:02:58.840
<v Speaker 1>red to a computer?

62
00:02:59.159 --> 00:03:02.759
<v Speaker 2>It matters quite a bit, actually, mostly because it impacts

63
00:03:02.840 --> 00:03:05.919
<v Speaker 2>how you encode that data before feeding it to a model.

64
00:03:06.759 --> 00:03:11.159
<v Speaker 2>If the categories have absolutely no inherent rank or order,

65
00:03:11.560 --> 00:03:15.439
<v Speaker 2>we call them nominal variables. Okay. Think gender, like male, female,

66
00:03:16.000 --> 00:03:19.479
<v Speaker 2>or maybe types of fruit: apple, banana, orange. You can't

67
00:03:19.520 --> 00:03:23.520
<v Speaker 2>really rank one above the other logically. They're just distinct groups.

68
00:03:23.400 --> 00:03:26.159
<v Speaker 1>Right, distinct groups. Makes sense. But what if they can

69
00:03:26.199 --> 00:03:27.240
<v Speaker 1>be ranked?

70
00:03:27.159 --> 00:03:33.000
<v Speaker 2>Then it's an ordinal variable. Think about, say, a customer satisfaction rating: low, medium, high.

71
00:03:33.360 --> 00:03:35.840
<v Speaker 1>Ah, okay, there's a clear hierarchy there.

72
00:03:35.439 --> 00:03:38.840
<v Speaker 2>Exactly. And knowing this difference is key before you

73
00:03:38.879 --> 00:03:42.800
<v Speaker 2>start your feature engineering. For nominal variables, the algorithm often needs

74
00:03:42.800 --> 00:03:46.599
<v Speaker 2>to treat each category as totally separate, maybe using something

75
00:03:46.639 --> 00:03:49.759
<v Speaker 2>called one-hot encoding. But for ordinal variables, you might

76
00:03:49.800 --> 00:03:52.439
<v Speaker 2>be able to use encoding methods that preserve that ranking,

77
00:03:52.800 --> 00:03:56.120
<v Speaker 2>which can sometimes make the model simpler or even more accurate.

78
00:03:56.520 --> 00:03:58.840
<v Speaker 2>So yeah, knowing this distinction is pretty fundamental.
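
NOTE
Editor's sketch of the nominal versus ordinal encoding idea discussed above.
This is a minimal illustration under stated assumptions, not a production encoder:
the helper names one_hot and ordinal_encode and the sample categories are hypothetical.
```python
# Nominal data: one-hot encoding gives each category its own 0/1 slot, with no implied order.
def one_hot(value, categories):
    return [1 if value == c else 0 for c in categories]
# Ordinal data: encoding by rank preserves the ordering low < medium < high.
def ordinal_encode(value, ordered_levels):
    return ordered_levels.index(value)
fruits = ["apple", "banana", "orange"]   # hypothetical nominal categories
levels = ["low", "medium", "high"]       # hypothetical ordinal levels
print(one_hot("banana", fruits))         # [0, 1, 0]
print(ordinal_encode("high", levels))    # 2
```
Rank encoding only makes sense when the order is real; applying it to fruit would invent a ranking that is not there.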

79
00:03:59.000 --> 00:04:00.719
<v Speaker 1>All right, so we've figured out what kind of

80
00:04:00.800 --> 00:04:04.719
<v Speaker 1>variables we have. What's the immediate next step? Usually it's

81
00:04:04.800 --> 00:04:09.159
<v Speaker 1>descriptive statistics, right? Trying to summarize potentially huge data sets.

82
00:04:09.240 --> 00:04:12.919
<v Speaker 2>Yes, exactly. We're moving from just defining things to actually

83
00:04:12.960 --> 00:04:15.080
<v Speaker 2>starting to tell the story hidden in the data. The first

84
00:04:15.120 --> 00:04:18.519
<v Speaker 2>step is usually summarizing it, focusing on its center and

85
00:04:18.560 --> 00:04:19.120
<v Speaker 2>its spread.

86
00:04:19.279 --> 00:04:22.439
<v Speaker 1>Okay, center and spread. Let's start with the center. Measures

87
00:04:22.439 --> 00:04:24.879
<v Speaker 1>of central tendency, is that the term?

88
00:04:24.879 --> 00:04:27.439
<v Speaker 2>That's the one, and the big three here are the mean, the median, and

89
00:04:27.480 --> 00:04:28.079
<v Speaker 2>the mode.

90
00:04:28.399 --> 00:04:31.120
<v Speaker 1>Everyone knows the mean, the average, right? Add them all up,

91
00:04:31.160 --> 00:04:34.439
<v Speaker 1>divide by how many there are. Seems simple, But what's

92
00:04:34.480 --> 00:04:37.600
<v Speaker 1>the specific ML insight? Why is it so important?

93
00:04:37.839 --> 00:04:41.240
<v Speaker 2>Well, mathematically, the mean is the center of balance for

94
00:04:41.279 --> 00:04:44.240
<v Speaker 2>your data. But what's really interesting is how it connects

95
00:04:44.240 --> 00:04:47.279
<v Speaker 2>directly to prediction. How so? When you build, say, a

96
00:04:47.399 --> 00:04:51.439
<v Speaker 2>simple linear regression model, what you're essentially doing is trying

97
00:04:51.439 --> 00:04:54.959
<v Speaker 2>to draw a line that minimizes the squared distance between

98
00:04:54.959 --> 00:04:57.279
<v Speaker 2>that line and all your data points. Yeah. The mean

99
00:04:57.360 --> 00:05:00.160
<v Speaker 2>turns out to be the single value that inherently

100
00:05:00.199 --> 00:05:01.720
<v Speaker 2>minimizes that squared error.

101
00:05:02.040 --> 00:05:05.040
<v Speaker 1>Huh. So it's like the best guess if you knew

102
00:05:05.079 --> 00:05:05.759
<v Speaker 1>nothing else.

103
00:05:06.160 --> 00:05:09.720
<v Speaker 2>It's the optimal point prediction if you had zero other information. Yes,

104
00:05:09.839 --> 00:05:11.120
<v Speaker 2>it's a point of minimum error.
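
NOTE
Editor's sketch of the claim above that the mean is the single value minimizing squared error.
It is a numeric spot check on a hypothetical sample, not a proof; the helper name
sum_squared_error and the data are assumptions for illustration. In regression terms this
corresponds to a model with only a constant term and no input features.
```python
from statistics import mean
# Total squared distance between every data point and one constant guess.
def sum_squared_error(data, guess):
    return sum((x - guess) ** 2 for x in data)
data = [2.0, 4.0, 4.0, 5.0, 10.0]   # hypothetical sample
center = mean(data)                 # 5.0
# Try the mean and a few nearby guesses, then keep the one with the lowest error.
candidates = [center - 1.0, center - 0.1, center, center + 0.1, center + 1.0]
errors = {c: sum_squared_error(data, c) for c in candidates}
best = min(errors, key=errors.get)
print(best == center, errors[center])   # True 36.0
```
Moving the guess away from the mean in either direction raises the total squared error.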

105
00:05:11.279 --> 00:05:15.279
<v Speaker 1>Okay, but the mean has that famous weakness, right, the

106
00:05:15.279 --> 00:05:18.199
<v Speaker 1>outlier problem. Like if you're averaging salaries in a

107
00:05:18.199 --> 00:05:21.720
<v Speaker 1>small startup and suddenly the CEO's twenty-million-dollar salary

108
00:05:21.759 --> 00:05:22.519
<v Speaker 1>gets added in.

109
00:05:22.279 --> 00:05:26.759
<v Speaker 2>Exactly. That one massive outlier just yanks the average

110
00:05:26.800 --> 00:05:30.360
<v Speaker 2>way, way up, making it not very representative of the

111
00:05:30.360 --> 00:05:31.360
<v Speaker 2>typical employee.

112
00:05:31.439 --> 00:05:32.720
<v Speaker 1>So that's where the median comes in.

113
00:05:32.959 --> 00:05:36.360
<v Speaker 2>Precisely. The median is the exact middle value when you

114
00:05:36.399 --> 00:05:38.800
<v Speaker 2>sort your data from smallest to largest. Fifty percent of

115
00:05:38.800 --> 00:05:40.720
<v Speaker 2>the data is below it, fifty percent is above it.

116
00:05:40.759 --> 00:05:43.279
<v Speaker 1>And because it only cares about the middle position.

117
00:05:43.519 --> 00:05:46.759
<v Speaker 2>It's incredibly robust to those extreme outliers. That twenty-million-dollar

118
00:05:46.759 --> 00:05:50.279
<v Speaker 2>salary doesn't really affect the median much, if at all.

119
00:05:50.519 --> 00:05:52.759
<v Speaker 1>And if you have an even number of data points,

120
00:05:53.519 --> 00:05:54.839
<v Speaker 1>so there's no single middle value?

121
00:05:54.959 --> 00:05:57.319
<v Speaker 2>Simple. You just take the average of the two middle values.

122
00:05:57.639 --> 00:05:59.480
<v Speaker 2>Still gives you that robust central point.
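
NOTE
Editor's sketch of the outlier example above, using Python's standard statistics module.
The salary figures are hypothetical and chosen only to mirror the startup-plus-CEO story.
The second list has an even number of values, so the median is the average of the two middle ones.
```python
from statistics import mean, median
salaries = [50_000, 55_000, 60_000, 65_000, 70_000]   # hypothetical startup salaries
with_ceo = salaries + [20_000_000]                    # add one extreme outlier
print(mean(salaries), median(salaries))   # 60000 60000
print(mean(with_ceo), median(with_ceo))   # mean is about 3.38 million; median is 62500.0
```
One added value moves the mean by millions, while the median shifts by only 2,500.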

123
00:05:59.560 --> 00:06:03.199
<v Speaker 1>Okay, so the mean is error-minimizing but sensitive to outliers,

124
00:06:03.240 --> 00:06:05.480
<v Speaker 1>the median is robust. What about the third one, the mode?

125
00:06:05.839 --> 00:06:08.920
<v Speaker 2>The mode is even simpler. It's just the value that

126
00:06:09.000 --> 00:06:12.079
<v Speaker 2>shows up most often in your data set. Most frequent? Yep.

127
00:06:12.759 --> 00:06:16.399
<v Speaker 2>It's typically most useful for categorical data, finding the most

128
00:06:16.399 --> 00:06:18.759
<v Speaker 2>popular choice or the most common group.

129
00:06:18.839 --> 00:06:20.360
<v Speaker 1>Any quirks with the mode?

130
00:06:20.519 --> 00:06:23.439
<v Speaker 2>A couple of interesting ones. It's the only measure of center that

131
00:06:23.519 --> 00:06:26.360
<v Speaker 2>might not exist at all, when no value repeats, which sounds

132
00:06:26.399 --> 00:06:28.920
<v Speaker 2>weird but can happen. And you can also have more

133
00:06:28.959 --> 00:06:32.560
<v Speaker 2>than one mode. Like bimodal? Exactly, bimodal, if there

134
00:06:32.560 --> 00:06:35.240
<v Speaker 2>are two peaks, or even multimodal. That can be a

135
00:06:35.240 --> 00:06:37.480
<v Speaker 2>clue that your data might actually be composed of a

136
00:06:37.519 --> 00:06:39.639
<v Speaker 2>couple of different underlying groups or clusters.
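
NOTE
Editor's sketch of the mode and its quirks, using statistics.multimode (Python 3.8 or later).
All three sample lists are hypothetical. multimode returns every value tied for most frequent,
in the order first seen, which makes bimodal data and the no-repeats case easy to spot.
```python
from statistics import multimode
ratings = ["low", "high", "medium", "high", "low", "high"]   # hypothetical categorical data
print(multimode(ratings))       # ['high']
bimodal = [1, 2, 2, 2, 3, 7, 8, 8, 8, 9]                     # two peaks, at 2 and at 8
print(multimode(bimodal))       # [2, 8]
all_unique = [1, 2, 3, 4]                                    # no value repeats
print(multimode(all_unique))    # [1, 2, 3, 4]
```
When every value ties, as in the last list, there is no single meaningful mode.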

137
00:06:39.680 --> 00:06:43.279
<v Speaker 1>Okay, so we found the center using mean, median or mode.

138
00:06:43.800 --> 00:06:47.439
<v Speaker 1>But you said center alone isn't enough. Two data sets

139
00:06:47.439 --> 00:06:50.000
<v Speaker 1>could have the same mean but look totally different.

140
00:06:50.319 --> 00:06:53.439
<v Speaker 2>Right. Imagine one data set clustered tightly around the mean

141
00:06:54.040 --> 00:06:57.519
<v Speaker 2>and another spread way out, same mean, very different story.

142
00:06:57.720 --> 00:07:01.480
<v Speaker 2>That's why we need measures of dispersion, or spread, and

143
00:07:01.519 --> 00:07:04.199
<v Speaker 2>the main ones are variance and standard deviation, SD.

144
00:07:04.319 --> 00:07:07.240
<v Speaker 1>Okay, variance and SD. They both measure spread, right? How

145
00:07:07.279 --> 00:07:09.639
<v Speaker 1>far data points tend to be from the center, usually

146
00:07:09.680 --> 00:07:10.000
<v Speaker 1>the mean.

147
00:07:10.240 --> 00:07:13.600
<v Speaker 2>That's the core idea. A high value for either variance

148
00:07:13.639 --> 00:07:17.600
<v Speaker 2>or SD means the data is really spread out, dispersed widely.

149
00:07:18.160 --> 00:07:21.000
<v Speaker 2>A small value means everything's huddled close to the mean.

150
00:07:21.879 --> 00:07:24.519
<v Speaker 1>So if they measure the same basic thing, why do

151
00:07:24.560 --> 00:07:27.879
<v Speaker 1>we need both? What's the practical difference, especially thinking about

152
00:07:27.879 --> 00:07:28.720
<v Speaker 1>machine learning?

153
00:07:28.920 --> 00:07:32.519
<v Speaker 2>Okay, so mathematically, the standard deviation is just the square

154
00:07:32.600 --> 00:07:35.399
<v Speaker 2>root of the variance. The absolute key difference is the

155
00:07:35.519 --> 00:07:39.600
<v Speaker 2>units. Units? Yeah. Variance is calculated using squared differences, so

156
00:07:39.639 --> 00:07:42.800
<v Speaker 2>its units are the square of the original data's units.

157
00:07:43.319 --> 00:07:45.920
<v Speaker 2>If you measure height in meters, the variance is

158
00:07:45.959 --> 00:07:49.360
<v Speaker 2>in meters squared, which is kind of awkward to interpret directly.

159
00:07:49.319 --> 00:07:50.160
<v Speaker 1>Not very intuitive.

160
00:07:50.240 --> 00:07:53.240
<v Speaker 2>But the standard deviation, because it's the square root, is

161
00:07:53.279 --> 00:07:55.879
<v Speaker 2>back in the original units. So if your height data

162
00:07:55.920 --> 00:07:58.160
<v Speaker 2>is in meters, the SD is also in meters.

163
00:07:58.240 --> 00:08:01.439
<v Speaker 1>Ah. Okay, so SD is easier to compare directly

164
00:08:01.439 --> 00:08:02.079
<v Speaker 1>to the mean.

165
00:08:02.199 --> 00:08:06.199
<v Speaker 2>Much easier. It makes SD far better for interpretation, for reporting,

166
00:08:06.439 --> 00:08:09.920
<v Speaker 2>and really crucially for something called feature scaling or normalization

167
00:08:10.040 --> 00:08:10.480
<v Speaker 2>in ML.

168
00:08:10.600 --> 00:08:11.519
<v Speaker 1>Why feature scaling?

169
00:08:11.720 --> 00:08:14.160
<v Speaker 2>Well, often in ML you have features measured on totally

170
00:08:14.160 --> 00:08:18.240
<v Speaker 2>different scales, maybe age in years, income in thousands of dollars,

171
00:08:18.279 --> 00:08:21.920
<v Speaker 2>height in centimeters. Models can sometimes struggle with that or give

172
00:08:21.959 --> 00:08:24.959
<v Speaker 2>too much weight to features with larger numerical values.

173
00:08:24.720 --> 00:08:26.160
<v Speaker 1>So you need to put them on a level playing

174
00:08:26.160 --> 00:08:27.040
<v Speaker 1>field?

175
00:08:27.199 --> 00:08:29.600
<v Speaker 2>Exactly. You often rescale features so they have a mean of

176
00:08:29.680 --> 00:08:33.000
<v Speaker 2>zero and a standard deviation of one, And standard deviation

177
00:08:33.120 --> 00:08:35.480
<v Speaker 2>is the metric you use to do that rescaling properly.

178
00:08:35.840 --> 00:08:39.279
<v Speaker 2>It's fundamental for preprocessing data for many algorithms.
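
NOTE
Editor's sketch of variance versus standard deviation and of rescaling to mean zero, SD one.
The height values are hypothetical. The population formulas pvariance and pstdev are used here
as an assumption; statistics.variance and statistics.stdev are the sample versions.
```python
from statistics import mean, pstdev, pvariance
heights_m = [1.5, 1.6, 1.7, 1.8, 1.9]   # hypothetical heights in meters
mu = mean(heights_m)
var = pvariance(heights_m)   # units: meters squared, about 0.02
sd = pstdev(heights_m)       # units: meters, about 0.1414
# Standardize: subtract the mean, then divide by the standard deviation.
z_scores = [(h - mu) / sd for h in heights_m]
print(round(var, 4), round(sd, 4))
print(abs(mean(z_scores)) < 1e-9, abs(pstdev(z_scores) - 1) < 1e-9)   # True True
```
After standardizing, features measured in years, dollars, or centimeters all end up on the same unitless scale.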

179
00:08:39.399 --> 00:08:42.960
<v Speaker 1>Okay, we've gone from defining data types to summarizing them

180
00:08:43.000 --> 00:08:46.200
<v Speaker 1>with center and spread. Now how do we pivot towards

181
00:08:46.279 --> 00:08:49.480
<v Speaker 1>using this data for prediction. That feels like the next

182
00:08:49.519 --> 00:08:50.320
<v Speaker 1>logical step.

183
00:08:50.879 --> 00:08:53.559
<v Speaker 2>It is, and that pivot really starts by defining a

184
00:08:53.559 --> 00:08:57.200
<v Speaker 2>potential cause and effect relationship. This is where we introduce

185
00:08:57.240 --> 00:09:00.080
<v Speaker 2>the concepts of dependent and independent variables.

186
00:08:59.600 --> 00:09:01.720
<v Speaker 1>Right, setting up the experiment, essentially.

187
00:09:01.799 --> 00:09:05.799
<v Speaker 2>Pretty much. We're defining our modeling goal. What factor

188
00:09:05.879 --> 00:09:10.120
<v Speaker 2>are we changing or observing? That's the independent variable. And what

189
00:09:10.200 --> 00:09:14.080
<v Speaker 2>outcome are we measuring the effect on? That's the dependent variable.

190
00:09:14.360 --> 00:09:17.879
<v Speaker 1>So the independent variable is the input, the thing we control,

191
00:09:18.080 --> 00:09:20.159
<v Speaker 1>or the factor we think is causing a change.

192
00:09:20.200 --> 00:09:23.240
<v Speaker 2>Exactly. Like in a drug trial, the dosage level would

193
00:09:23.240 --> 00:09:26.879
<v Speaker 2>be the independent variable. Or using an example from the source,

194
00:09:27.200 --> 00:09:29.440
<v Speaker 2>maybe the type of pitch a pitcher throws to a batter.

195
00:09:29.600 --> 00:09:30.639
<v Speaker 2>That's the input being varied.

196
00:09:30.559 --> 00:09:33.240
<v Speaker 1>And the dependent variable is the output, the result,

197
00:09:33.399 --> 00:09:35.600
<v Speaker 1>what happens because of the independent variable?

198
00:09:35.759 --> 00:09:38.919
<v Speaker 2>Yes, it's the variable being tested or measured that responds

199
00:09:38.960 --> 00:09:42.679
<v Speaker 2>to the changes. In that baseball example, the batter's performance,

200
00:09:42.720 --> 00:09:45.240
<v Speaker 2>did they hit it, and how well? That's the dependent variable.

201
00:09:45.559 --> 00:09:47.320
<v Speaker 2>Its value depends on the pitch type.

202
00:09:47.440 --> 00:09:50.960
<v Speaker 1>And getting these two defined correctly seems absolutely critical. It's

203
00:09:50.960 --> 00:09:54.360
<v Speaker 1>basically framing the entire problem you want your mL model

204
00:09:54.399 --> 00:09:54.879
<v Speaker 1>to solve.

205
00:09:55.080 --> 00:09:58.240
<v Speaker 2>It is. You're specifying the relationship you intend to model

206
00:09:58.279 --> 00:09:58.759
<v Speaker 2>and predict.

207
00:09:58.879 --> 00:10:02.120
<v Speaker 1>Now, underpinning all of this statistical analysis, all these

208
00:10:02.120 --> 00:10:05.639
<v Speaker 1>measurements and relationships, there's a really core principle that gives

209
00:10:05.720 --> 00:10:08.279
<v Speaker 1>us confidence in the results, right? The law of large

210
00:10:08.320 --> 00:10:09.480
<v Speaker 1>numbers, the LLN.

211
00:10:09.600 --> 00:10:12.639
<v Speaker 2>Ah. Yes, the LLN. It's absolutely fundamental. It's kind of

212
00:10:12.679 --> 00:10:15.000
<v Speaker 2>the bedrock that makes statistics work reliably.

213
00:10:15.240 --> 00:10:17.799
<v Speaker 1>So what does it state, in simple terms?

214
00:10:17.759 --> 00:10:21.000
<v Speaker 2>It basically says that if you repeat the same experiment over

215
00:10:21.080 --> 00:10:23.360
<v Speaker 2>and over and over again a huge number of times,

216
00:10:23.799 --> 00:10:26.559
<v Speaker 2>the average of the results you get will get closer

217
00:10:26.559 --> 00:10:30.240
<v Speaker 2>and closer to the true expected theoretical value.

218
00:10:30.320 --> 00:10:31.159
<v Speaker 1>Like flipping a coin?

219
00:10:31.320 --> 00:10:34.679
<v Speaker 2>Perfect example. Flip a coin just ten times, and you might

220
00:10:34.720 --> 00:10:38.080
<v Speaker 2>easily get, say, seven heads and three tails. That's pretty

221
00:10:38.120 --> 00:10:40.679
<v Speaker 2>far from the expected fifty-fifty, right? But flip

222
00:10:40.720 --> 00:10:43.440
<v Speaker 2>that same coin a million times or ten million times,

223
00:10:44.039 --> 00:10:46.120
<v Speaker 2>the ratio of heads to tails is going to get

224
00:10:46.159 --> 00:10:49.360
<v Speaker 2>incredibly close to exactly one to one. It converges on

225
00:10:49.399 --> 00:10:51.200
<v Speaker 2>the true probability.

226
00:10:50.639 --> 00:10:54.120
<v Speaker 1>And it's that convergence that lets us trust statistical methods?

227
00:10:54.279 --> 00:10:57.840
<v Speaker 2>Exactly. It validates the whole idea of using probabilities and statistics

228
00:10:57.840 --> 00:11:01.799
<v Speaker 2>derived from experiments or samples to understand underlying truths. It

229
00:11:01.799 --> 00:11:04.399
<v Speaker 2>allows us to have confidence in probabilistic models.
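
NOTE
Editor's sketch of the coin-flip illustration of the law of large numbers above.
It simulates a fair coin with Python's random module; the helper name heads_fraction,
the seed, and the flip counts are assumptions for illustration, and exact fractions
depend on the seed.
```python
import random
# Fraction of heads in n_flips simulated fair coin flips.
def heads_fraction(n_flips, rng):
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips
rng = random.Random(0)   # fixed seed so the run is repeatable
for n in (10, 1_000, 100_000):
    print(n, heads_fraction(n, rng))
```
Small runs can land far from 0.5, while the largest run should sit very close to it.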

230
00:11:04.519 --> 00:11:07.240
<v Speaker 1>So the LLN gives us the confidence then to take

231
00:11:07.320 --> 00:11:09.600
<v Speaker 1>results we see in a smaller sample of data and

232
00:11:09.639 --> 00:11:13.200
<v Speaker 1>make reasonable conclusions about the entire population it came from,

233
00:11:13.240 --> 00:11:15.039
<v Speaker 1>which sounds like statistical inference.

234
00:11:15.440 --> 00:11:19.159
<v Speaker 2>That's precisely what statistical inference is about, and it leads

235
00:11:19.240 --> 00:11:22.200
<v Speaker 2>directly to the main framework we use for making those decisions.

236
00:11:22.399 --> 00:11:23.480
<v Speaker 2>Hypothesis testing.

237
00:11:23.840 --> 00:11:27.320
<v Speaker 1>Okay, hypothesis testing. This is where we formally test an

238
00:11:27.320 --> 00:11:28.600
<v Speaker 1>idea using the data.

239
00:11:28.720 --> 00:11:31.559
<v Speaker 2>Yes, it's the structured process where we use the summary

240
00:11:31.559 --> 00:11:34.960
<v Speaker 2>statistics we calculated, combined with our understanding of probability and

241
00:11:35.000 --> 00:11:39.200
<v Speaker 2>the LLN to draw conclusions about a whole population based

242
00:11:39.320 --> 00:11:40.879
<v Speaker 2>only on evidence from a sample.

243
00:11:41.240 --> 00:11:44.440
<v Speaker 1>And it usually involves setting up two competing ideas beforehand.

244
00:11:44.559 --> 00:11:47.960
<v Speaker 2>Correct. You have a kind of statistical showdown. The main

245
00:11:48.000 --> 00:11:49.879
<v Speaker 2>goal is to see if there's enough evidence in your

246
00:11:49.919 --> 00:11:52.639
<v Speaker 2>sample data to reject the null hypothesis.

247
00:11:52.759 --> 00:11:56.159
<v Speaker 1>The null hypothesis being the default skeptical position.

248
00:11:56.320 --> 00:11:59.440
<v Speaker 2>Always. It's the statement of no effect, no difference, or

249
00:11:59.440 --> 00:12:03.200
<v Speaker 2>no relation. For example, this new drug has no effect

250
00:12:03.279 --> 00:12:05.720
<v Speaker 2>on recovery time compared to the placebo. It's the

251
00:12:05.720 --> 00:12:08.639
<v Speaker 2>status quo assumption, and we test that against the

252
00:12:08.679 --> 00:12:12.000
<v Speaker 2>alternative hypothesis. This is the statement that contradicts the null.

253
00:12:12.440 --> 00:12:15.240
<v Speaker 2>It's what you, as the researcher, might actually suspect or

254
00:12:15.279 --> 00:12:18.519
<v Speaker 2>hope to prove, like, no, this new drug does reduce

255
00:12:18.559 --> 00:12:19.240
<v Speaker 2>recovery time.

256
00:12:19.480 --> 00:12:22.960
<v Speaker 1>So the whole process is about gathering enough statistical evidence

257
00:12:23.320 --> 00:12:27.679
<v Speaker 1>to confidently say, okay, we can reject the no-effect

258
00:12:27.759 --> 00:12:29.679
<v Speaker 1>idea in favor of the there-is-an-effect idea.

259
00:12:29.600 --> 00:12:33.759
<v Speaker 2>Precisely. And that level of statistical confidence, often

260
00:12:33.799 --> 00:12:36.960
<v Speaker 2>expressed as a p-value or a confidence interval, is

261
00:12:37.000 --> 00:12:39.679
<v Speaker 2>what determines whether you feel justified in acting on your

262
00:12:39.720 --> 00:12:41.759
<v Speaker 2>findings or making a claim about the population.
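
NOTE
Editor's sketch of the null versus alternative showdown, using the drug-and-placebo example above.
The episode does not name a specific test, so a permutation test is swapped in here as an assumption.
The recovery times are hypothetical, and the helper name permutation_p_value, the shuffle count,
and the 0.05 cutoff are illustrative choices.
```python
import random
# One-sided p-value: how often do randomly shuffled labels produce a gap at least as large as observed?
def permutation_p_value(treated, control, n_shuffles=10_000, seed=0):
    rng = random.Random(seed)
    k = len(treated)
    observed = sum(control) / len(control) - sum(treated) / k   # positive if treatment shortens recovery
    pooled = list(treated) + list(control)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)   # under the null hypothesis, group labels are interchangeable
        gap = sum(pooled[k:]) / len(control) - sum(pooled[:k]) / k
        if gap >= observed:
            hits += 1
    return hits / n_shuffles
drug = [6, 7, 5, 6, 8, 5, 6, 7]        # hypothetical recovery days with the drug
placebo = [9, 8, 10, 9, 7, 9, 10, 8]   # hypothetical recovery days with the placebo
p = permutation_p_value(drug, placebo)
print(p, "reject the null" if p < 0.05 else "fail to reject the null")
```
A small p-value means a gap this large would rarely appear if the drug truly had no effect.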

263
00:12:42.000 --> 00:12:44.840
<v Speaker 1>All right, let's pull this together. We've walked through quite

264
00:12:44.840 --> 00:12:48.799
<v Speaker 1>a statistical toolkit, understanding the different types of variables you encounter.

265
00:12:48.679 --> 00:12:51.679
<v Speaker 2>Discrete, continuous, nominal, ordinal.

266
00:12:51.480 --> 00:12:54.639
<v Speaker 1>Yeah, then summarizing them with measures of center like the

267
00:12:54.720 --> 00:12:59.679
<v Speaker 1>mean and median, understanding spread with standard deviation, especially why SD is

268
00:13:00.159 --> 00:13:01.480
<v Speaker 1>useful practically.

269
00:13:01.159 --> 00:13:03.559
<v Speaker 2>Right, those units matter for comparison and scaling.

270
00:13:03.759 --> 00:13:06.600
<v Speaker 1>Then we moved into setting up predictions by defining dependent

271
00:13:06.679 --> 00:13:10.399
<v Speaker 1>and independent variables, and finally, the framework for making decisions

272
00:13:10.440 --> 00:13:14.399
<v Speaker 1>based on sample data, hypothesis testing, built on the confidence

273
00:13:14.440 --> 00:13:16.320
<v Speaker 1>given by the law of large numbers.

274
00:13:16.559 --> 00:13:19.000
<v Speaker 2>It really does form the essential foundation. You can see

275
00:13:19.000 --> 00:13:22.320
<v Speaker 2>how these concepts are, well, mandatory before you jump into

276
00:13:22.320 --> 00:13:24.039
<v Speaker 2>the more complex ML algorithms.

277
00:13:24.159 --> 00:13:27.000
<v Speaker 1>They really are the entry point for any serious study

278
00:13:27.080 --> 00:13:27.639
<v Speaker 1>or application.

279
00:13:28.000 --> 00:13:30.080
<v Speaker 2>But here's a final thought, something that connects back to

280
00:13:30.120 --> 00:13:34.399
<v Speaker 2>that law of large numbers. The LLN guarantees convergence. It

281
00:13:34.399 --> 00:13:38.799
<v Speaker 2>gives us certainty, but only over a massive number of trials.

282
00:13:39.159 --> 00:13:40.799
<v Speaker 2>A million coin flips.

283
00:13:40.519 --> 00:13:42.639
<v Speaker 1>Right, it requires huge scale.

284
00:13:43.200 --> 00:13:46.200
<v Speaker 2>Exactly. But in the real world, doing market analysis, building a

285
00:13:46.200 --> 00:13:50.480
<v Speaker 2>product prototype, maybe even running a clinical trial, we almost

286
00:13:50.559 --> 00:13:53.519
<v Speaker 2>never have a million data points. We work with samples,

287
00:13:53.799 --> 00:13:57.759
<v Speaker 2>sometimes relatively small samples because collecting data is expensive or

288
00:13:57.759 --> 00:13:58.480
<v Speaker 2>time-consuming.

289
00:13:58.759 --> 00:14:02.480
<v Speaker 1>So the certainty we get isn't absolute. It's usually probabilistic,

290
00:14:02.559 --> 00:14:05.360
<v Speaker 1>like saying we're ninety five percent confident or maybe ninety

291
00:14:05.399 --> 00:14:06.840
<v Speaker 1>nine percent confident.

292
00:14:06.480 --> 00:14:09.600
<v Speaker 2>Right, which leads to the provocative question. If the law

293
00:14:09.639 --> 00:14:13.759
<v Speaker 2>of large numbers only guarantees truth over immense scale, how

294
00:14:13.799 --> 00:14:16.840
<v Speaker 2>often are our everyday decisions, maybe in business, launching a

295
00:14:16.840 --> 00:14:20.639
<v Speaker 2>new feature, or even interpreting a political poll, actually based

296
00:14:20.639 --> 00:14:22.679
<v Speaker 2>on what could be called the fallacy of small numbers?

297
00:14:22.519 --> 00:14:25.360
<v Speaker 1>Meaning we're drawing conclusions from samples that might be

298
00:14:25.399 --> 00:14:28.559
<v Speaker 1>too small to really trust the LLN's guarantee?

299
00:14:30.480 --> 00:14:34.080
<v Speaker 2>Potentially. So the question for you, the listener, is what level

300
00:14:34.080 --> 00:14:37.000
<v Speaker 2>of statistical certainty, that ninety-five percent, that ninety-nine

301
00:14:37.039 --> 00:14:40.240
<v Speaker 2>percent, are you willing to accept? Especially when you're moving

302
00:14:40.320 --> 00:14:44.039
<v Speaker 2>from analyzing a potentially small, expensive sample to making a

303
00:14:44.080 --> 00:14:47.360
<v Speaker 2>big assumption about the entire population, an assumption that could

304
00:14:47.360 --> 00:14:49.519
<v Speaker 2>have major consequences, maybe cost millions.

305
00:14:49.600 --> 00:14:51.039
<v Speaker 1>How much uncertainty can you live with?

306
00:14:51.240 --> 00:14:54.720
<v Speaker 2>What's your threshold for risk framed in that statistical confidence?

307
00:14:54.960 --> 00:14:57.279
<v Speaker 1>Definitely something to think about. A great place to leave

308
00:14:57.279 --> 00:14:59.240
<v Speaker 1>it for this deep dive. Thanks for joining us.
