WEBVTT

1
00:00:00.040 --> 00:00:04.559
<v Speaker 1>Okay, let's unpack this. Today we're diving into a really

2
00:00:04.559 --> 00:00:10.359
<v Speaker 1>fascinating resource, this comprehensive workshop all about statistics and calculus,

3
00:00:10.839 --> 00:00:12.720
<v Speaker 1>but specifically using Python.

4
00:00:12.519 --> 00:00:15.919
<v Speaker 2>Right, and our mission really is to pull out

5
00:00:15.960 --> 00:00:17.160
<v Speaker 2>the most important bits of knowledge.

6
00:00:16.960 --> 00:00:20.359
<v Speaker 1>Exactly, and show how Python can take these

7
00:00:20.359 --> 00:00:24.039
<v Speaker 1>concepts which, let's be honest, can seem pretty intimidating.

8
00:00:23.559 --> 00:00:27.440
<v Speaker 2>Oh, definitely. Math, stats, they have that reputation.

9
00:00:27.440 --> 00:00:31.760
<v Speaker 1>And turn them into genuinely powerful, practical tools, tools you can

10
00:00:31.879 --> 00:00:33.000
<v Speaker 1>use to understand the world.

11
00:00:33.079 --> 00:00:35.280
<v Speaker 2>And what's great, I think, is how Python makes it

12
00:00:35.320 --> 00:00:38.640
<v Speaker 2>not just accessible or efficient, but actually kind of engaging.

13
00:00:38.759 --> 00:00:42.439
<v Speaker 1>You know, it really does feel different, almost fun sometimes.

14
00:00:43.240 --> 00:00:44.840
<v Speaker 1>So to get there, we kind of start at the

15
00:00:44.840 --> 00:00:47.039
<v Speaker 1>beginning: the foundations of Python itself.

16
00:00:47.119 --> 00:00:47.840
<v Speaker 2>Building blocks.

17
00:00:47.880 --> 00:00:53.560
<v Speaker 1>Yeah, the toolkit. So basic data structures first: strings, lists, tuples, dictionaries.

18
00:00:53.600 --> 00:00:54.520
<v Speaker 1>These are how you hold your information.

19
00:00:54.399 --> 00:00:57.000
<v Speaker 2>And dictionaries are a good example, right, with their

20
00:00:57.079 --> 00:00:59.840
<v Speaker 2>key-value pairs. The workshop mentions using them for something

21
00:00:59.840 --> 00:01:01.240
<v Speaker 2>like a shopping cart calculation.

22
00:01:02.439 --> 00:01:05.680
<v Speaker 1>And if you look for an item, a key, that

23
00:01:05.840 --> 00:01:06.599
<v Speaker 1>isn't in your dictionary,

24
00:01:06.439 --> 00:01:09.239
<v Speaker 2>you get that KeyError, right.

25
00:01:09.079 --> 00:01:11.879
<v Speaker 1>Which isn't just an error message. It's Python telling you, hey,

26
00:01:11.920 --> 00:01:14.640
<v Speaker 1>this isn't here. It forces you to think about handling those

27
00:01:14.640 --> 00:01:16.239
<v Speaker 1>missing items properly.

28
00:01:16.040 --> 00:01:17.840
<v Speaker 2>Which is critical for reliable code.

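NOTE
Illustrative sketch of the shopping cart idea and the KeyError discussed above.
The item names, prices, and the cart_total helper are assumptions made up for
this example; they are not taken from the workshop.
```python
# Hypothetical price table and cart; "caviar" has no price on purpose.
prices = {"apple": 0.50, "bread": 2.25, "milk": 1.20}
cart = {"apple": 4, "milk": 1, "caviar": 1}
def cart_total(cart, prices):
    """Sum price * quantity, collecting items that have no known price."""
    total = 0.0
    missing = []
    for item, quantity in cart.items():
        try:
            total += prices[item] * quantity  # raises KeyError if item is absent
        except KeyError:
            missing.append(item)              # handle the missing key explicitly
    return total, missing
total, missing = cart_total(cart, prices)
print(f"total={total:.2f}, missing={missing}")
```
Catching KeyError is one option; prices.get(item, default) is another if a default price makes sense.
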
29
00:01:18.000 --> 00:01:18.120
<v Speaker 1>Right.

30
00:01:18.400 --> 00:01:21.760
<v Speaker 2>Then, beyond just storing data, you need to control how

31
00:01:21.760 --> 00:01:22.640
<v Speaker 2>the program runs.

32
00:01:22.680 --> 00:01:26.719
<v Speaker 1>Control flow, yeah. Your if, elif, else for decisions,

33
00:01:27.239 --> 00:01:29.120
<v Speaker 1>your for loops for doing things repeatedly.

34
00:01:29.239 --> 00:01:31.959
<v Speaker 2>And Python's readability is a big plus here. It feels

35
00:01:32.280 --> 00:01:34.400
<v Speaker 2>quite intuitive compared to some other languages.

36
00:01:34.519 --> 00:01:37.319
<v Speaker 1>It does. Now this leads into something really powerful: functions

37
00:01:38.079 --> 00:01:39.799
<v Speaker 1>and recursion.

38
00:01:39.959 --> 00:01:44.560
<v Speaker 2>Ah yes, functions. Packaging up logic, input, output, but really

39
00:01:44.920 --> 00:01:46.599
<v Speaker 2>breaking down big problems.

40
00:01:46.719 --> 00:01:50.480
<v Speaker 1>Exactly. And recursion, where a function calls itself. That can seem

41
00:01:50.519 --> 00:01:51.519
<v Speaker 1>a bit mind-bending at first.

42
00:01:51.400 --> 00:01:54.799
<v Speaker 2>It can, but the Sudoku solver example in the

43
00:01:54.799 --> 00:01:58.159
<v Speaker 2>workshop is perfect for illustrating it. How so? Well, think

44
00:01:58.159 --> 00:02:01.519
<v Speaker 2>about solving Sudoku manually. You try a number. Maybe it leads

45
00:02:01.560 --> 00:02:03.599
<v Speaker 2>to a dead end, so you backtrack, right? You erase

46
00:02:03.640 --> 00:02:06.359
<v Speaker 2>it and try something else. Recursion in Python can work

47
00:02:06.439 --> 00:02:08.919
<v Speaker 2>just like that. If a path doesn't work out, the

48
00:02:08.960 --> 00:02:13.439
<v Speaker 2>function effectively returns false, signaling it needs to back up

49
00:02:13.520 --> 00:02:17.520
<v Speaker 2>and try a different possibility. It explores the solution space.

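NOTE
A minimal sketch of the recursive backtracking idea described above. This is
not the workshop's solver; the function names and the sample puzzle are
assumptions chosen for illustration. Empty cells are written as 0.
```python
def is_valid(grid, row, col, digit):
    """Return True if digit can go at (row, col) without breaking Sudoku rules."""
    if digit in grid[row]:
        return False
    if any(grid[r][col] == digit for r in range(9)):
        return False
    top, left = 3 * (row // 3), 3 * (col // 3)
    return all(grid[r][c] != digit for r in range(top, top + 3) for c in range(left, left + 3))
def solve(grid):
    """Fill empty cells in place; return False to signal a dead end."""
    for row in range(9):
        for col in range(9):
            if grid[row][col] == 0:
                for digit in range(1, 10):
                    if is_valid(grid, row, col, digit):
                        grid[row][col] = digit   # try a number
                        if solve(grid):          # recurse on the remaining cells
                            return True
                        grid[row][col] = 0       # dead end: erase and backtrack
                return False                     # no digit fits in this cell
    return True                                  # no empty cells left
puzzle = [
    [5, 3, 0, 0, 7, 0, 0, 0, 0],
    [6, 0, 0, 1, 9, 5, 0, 0, 0],
    [0, 9, 8, 0, 0, 0, 0, 6, 0],
    [8, 0, 0, 0, 6, 0, 0, 0, 3],
    [4, 0, 0, 8, 0, 3, 0, 0, 1],
    [7, 0, 0, 0, 2, 0, 0, 0, 6],
    [0, 6, 0, 0, 0, 0, 2, 8, 0],
    [0, 0, 0, 4, 1, 9, 0, 0, 5],
    [0, 0, 0, 0, 8, 0, 0, 7, 9],
]
solved = solve(puzzle)
print(solved)
```
Returning False is the signal to the caller that it should back up and try a different digit.
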
50
00:02:17.560 --> 00:02:21.280
<v Speaker 1>Very elegant, but writing code is one thing. Making sure

51
00:02:21.319 --> 00:02:23.319
<v Speaker 1>it works and stays working is another.

52
00:02:23.680 --> 00:02:28.280
<v Speaker 2>Debugging, absolutely crucial. The workshop mentions simple things like, you know,

53
00:02:28.400 --> 00:02:32.080
<v Speaker 2>just using print statements to see variable values. Print debugging,

54
00:02:32.120 --> 00:02:35.560
<v Speaker 2>the classic approach. It still works. But also more advanced

55
00:02:35.560 --> 00:02:38.439
<v Speaker 2>tools like pdb, the Python debugger.

56
00:02:38.080 --> 00:02:40.360
<v Speaker 1>That lets you step through the code line by line.

57
00:02:40.039 --> 00:02:44.360
<v Speaker 2>Exactly. Pause execution, inspect everything, find exactly where

58
00:02:44.360 --> 00:02:45.479
<v Speaker 2>things go off the rails.

59
00:02:45.840 --> 00:02:50.120
<v Speaker 1>And we can't forget version control, Git and GitHub.

60
00:02:50.000 --> 00:02:53.039
<v Speaker 2>Non-negotiable, really, especially for anything more than a tiny

61
00:02:53.080 --> 00:02:55.000
<v Speaker 2>script or if you're working with others.

62
00:02:55.039 --> 00:02:57.240
<v Speaker 1>It's like a safety net and a collaboration hub rolled

63
00:02:57.280 --> 00:02:59.280
<v Speaker 1>into one. You track changes, you can go back in time.

64
00:02:59.360 --> 00:03:01.800
<v Speaker 2>You set up your little repository, link it to GitHub,

65
00:03:01.840 --> 00:03:04.719
<v Speaker 2>and then just git push your changes. Keeps everything organized.

66
00:03:04.919 --> 00:03:08.199
<v Speaker 1>Okay, so foundations are set. Now, how do we actually

67
00:03:08.240 --> 00:03:11.879
<v Speaker 1>start wrestling with data? This is where Python's analytical libraries

68
00:03:11.919 --> 00:03:12.319
<v Speaker 1>come in.

69
00:03:12.400 --> 00:03:16.319
<v Speaker 2>Right, and the main workhorse for anything numerical or scientific

70
00:03:16.439 --> 00:03:17.159
<v Speaker 2>is NumPy.

71
00:03:17.240 --> 00:03:19.800
<v Speaker 1>NumPy arrays. They're different from standard Python lists.

72
00:03:19.560 --> 00:03:22.719
<v Speaker 2>Very different. Much more flexible, especially for multidimensional

73
00:03:22.800 --> 00:03:26.759
<v Speaker 2>data. Think spreadsheets, images, 3D simulations. NumPy

74
00:03:26.960 --> 00:03:28.960
<v Speaker 2>handles that structure naturally.

75
00:03:28.599 --> 00:03:31.759
<v Speaker 1>And the speed. The workshop had that comparison.

76
00:03:31.800 --> 00:03:34.719
<v Speaker 2>Oh yeah, the vectorized operations, it's night and day. A

77
00:03:34.800 --> 00:03:37.759
<v Speaker 2>regular for loop doing multiplication might take, what was it,

78
00:03:37.879 --> 00:03:38.599
<v Speaker 2>half a second?

79
00:03:38.840 --> 00:03:40.479
<v Speaker 1>About 0.5423 seconds.

80
00:03:40.520 --> 00:03:43.000
<v Speaker 2>Yeah, and the NumPy vectorized version?

81
00:03:42.919 --> 00:03:46.039
<v Speaker 1>0.0005 seconds, a tiny fraction.

82
00:03:46.240 --> 00:03:49.520
<v Speaker 2>It's just fundamentally faster because it processes entire arrays at

83
00:03:49.560 --> 00:03:52.000
<v Speaker 2>once using highly optimized C code underneath.

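NOTE
A rough sketch of the loop versus vectorized comparison mentioned above,
assuming NumPy is installed. The array size and the multiply-by-two operation
are arbitrary choices, and timings will differ from the figures quoted in the
conversation depending on the machine.
```python
import time
import numpy as np
n = 1_000_000                        # array size is an arbitrary choice
values = np.arange(n, dtype=np.float64)
start = time.perf_counter()
loop_result = np.empty(n)
for i in range(n):                   # element-by-element Python loop
    loop_result[i] = values[i] * 2.0
loop_seconds = time.perf_counter() - start
start = time.perf_counter()
vector_result = values * 2.0         # one vectorized operation on the whole array
vector_seconds = time.perf_counter() - start
print(f"loop: {loop_seconds:.4f}s, vectorized: {vector_seconds:.4f}s")
```
Both approaches produce the same numbers; only the time taken differs.
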
84
00:03:52.159 --> 00:03:55.560
<v Speaker 1>That kind of speed up changes what's even possible to analyze.

85
00:03:55.879 --> 00:03:57.960
<v Speaker 1>Huge data sets become manageable.

86
00:03:57.560 --> 00:04:02.120
<v Speaker 2>Totally. And a key point for doing analysis: reproducibility. Setting

87
00:04:02.159 --> 00:04:05.759
<v Speaker 2>the random seed with np.random.seed(123).

88
00:04:05.639 --> 00:04:08.319
<v Speaker 1>So even if you use random numbers, you get

89
00:04:08.319 --> 00:04:11.159
<v Speaker 1>the same random sequence each time you run it.

90
00:04:11.360 --> 00:04:14.240
<v Speaker 2>Exactly. Ensures your results are consistent and someone else can reproduce

91
00:04:14.240 --> 00:04:16.120
<v Speaker 2>your work. Critical for science.

92
00:04:16.399 --> 00:04:20.480
<v Speaker 1>Okay, so NumPy handles the raw numbers, but often

93
00:04:20.839 --> 00:04:23.879
<v Speaker 1>data comes in tables like spreadsheets.

94
00:04:23.160 --> 00:04:25.920
<v Speaker 2>And that's where pandas comes in. Pandas DataFrames are

95
00:04:25.920 --> 00:04:27.959
<v Speaker 2>the go to for tabular data.

96
00:04:27.680 --> 00:04:32.720
<v Speaker 1>So you can load data, look at rows, columns, manipulate things.

97
00:04:32.519 --> 00:04:36.959
<v Speaker 2>Yep, initialize a DataFrame, access data, rename columns to

98
00:04:36.959 --> 00:04:40.319
<v Speaker 2>be clearer, fill in missing values, sort the data to

99
00:04:40.360 --> 00:04:43.399
<v Speaker 2>see trends. All standard operations.

100
00:04:42.959 --> 00:04:44.800
<v Speaker 1>And it has that handy describe.

101
00:04:45.079 --> 00:04:48.040
<v Speaker 2>Well, describe is great for a quick overview. For numerical columns,

102
00:04:48.040 --> 00:04:51.519
<v Speaker 2>it gives you count, mean, standard deviation, min, max, quartiles.

103
00:04:51.399 --> 00:04:55.720
<v Speaker 1>A quick statistical summary. What about non-numerical, like

104
00:04:55.920 --> 00:04:56.879
<v Speaker 1>text data?

105
00:04:56.720 --> 00:04:59.160
<v Speaker 2>It handles that too. It'll show things like the number

106
00:04:59.160 --> 00:05:01.439
<v Speaker 2>of unique entries, the most frequent one. For stats like

107
00:05:01.519 --> 00:05:04.480
<v Speaker 2>mean that don't apply, it shows NaN, not a number.

108
00:05:04.279 --> 00:05:06.560
<v Speaker 1>Makes sense. And if you need numbers for, say, a

109
00:05:06.600 --> 00:05:09.639
<v Speaker 1>machine learning model, there was mention of one-hot encoding.

110
00:05:09.800 --> 00:05:12.279
<v Speaker 2>Right, that's a common way to turn categorical features like

111
00:05:12.279 --> 00:05:15.519
<v Speaker 2>color red, color blue into numerical columns, usually zeros

112
00:05:15.600 --> 00:05:16.000
<v Speaker 2>and ones.

113
00:05:16.199 --> 00:05:19.279
<v Speaker 1>But it adds more columns, right? That's the drawback.

114
00:05:19.360 --> 00:05:22.199
<v Speaker 2>Exactly. It increases the dimensionality of your data, which can sometimes

115
00:05:22.279 --> 00:05:24.040
<v Speaker 2>make things more complex. It's a trade off.

116
00:05:24.439 --> 00:05:26.839
<v Speaker 1>Okay, so we have the data wrangled, how do we

117
00:05:26.879 --> 00:05:27.639
<v Speaker 1>see what's going on?

118
00:05:27.839 --> 00:05:32.399
<v Speaker 2>Visualization. Matplotlib and Seaborn are the key libraries here, turning

119
00:05:32.480 --> 00:05:34.120
<v Speaker 2>numbers into pictures.

120
00:05:33.800 --> 00:05:37.680
<v Speaker 1>Scatter plots, line graphs, bar charts, the usual suspects.

121
00:05:37.560 --> 00:05:41.519
<v Speaker 2>All those, yeah. Grouped bar charts for comparing categories

122
00:05:41.600 --> 00:05:46.120
<v Speaker 2>side by side, histograms to see distributions. You can tweak

123
00:05:46.199 --> 00:05:49.759
<v Speaker 2>histograms too, like setting density=True to compare shapes even

124
00:05:49.759 --> 00:05:53.199
<v Speaker 2>if sample sizes differ, or changing the number of bins.

125
00:05:53.040 --> 00:05:55.519
<v Speaker 1>And heat maps. I always find those interesting.

126
00:05:55.319 --> 00:05:58.439
<v Speaker 2>Very useful, especially for correlation matrices. You can instantly see

127
00:05:58.439 --> 00:06:01.680
<v Speaker 2>which variables tend to move together. It's a great visual shortcut.

128
00:06:01.839 --> 00:06:04.040
<v Speaker 1>So the workshop puts this into practice with a real

129
00:06:04.120 --> 00:06:06.720
<v Speaker 1>data set, the Apple App Store games.

130
00:06:06.959 --> 00:06:09.920
<v Speaker 2>Yes, a practical example, and it highlights the importance of

131
00:06:10.000 --> 00:06:13.000
<v Speaker 2>data prep. You know, cleaning things up first.

132
00:06:12.920 --> 00:06:16.040
<v Speaker 1>Like changing column names, setting the ID as the index,

133
00:06:16.680 --> 00:06:18.120
<v Speaker 1>dropping columns that aren't useful.

134
00:06:17.959 --> 00:06:21.480
<v Speaker 2>Right, like the URL or icon URL. And dealing

135
00:06:21.480 --> 00:06:24.560
<v Speaker 2>with missing data is huge. The subtitle column had like

136
00:06:24.759 --> 00:06:27.079
<v Speaker 2>eleven thousand missing values.

137
00:06:27.519 --> 00:06:29.439
<v Speaker 1>Wow. And the user ratings had a lot missing too.

138
00:06:29.319 --> 00:06:32.759
<v Speaker 2>Over nine thousand missing average user rating values. So a

139
00:06:32.800 --> 00:06:36.079
<v Speaker 2>key step was filtering, only keeping games with at least

140
00:06:36.199 --> 00:06:37.079
<v Speaker 2>thirty ratings.

141
00:06:37.240 --> 00:06:40.000
<v Speaker 1>Why thirty? Just to have enough data for stats to

142
00:06:40.040 --> 00:06:40.680
<v Speaker 1>be meaningful?

143
00:06:40.879 --> 00:06:44.600
<v Speaker 2>Basically, yes, it's a common threshold for technical reasons to

144
00:06:44.720 --> 00:06:46.399
<v Speaker 2>ensure some reliability in the averages.

145
00:06:46.480 --> 00:06:49.079
<v Speaker 1>And after all that cleaning and filtering, what did they find?

146
00:06:49.560 --> 00:06:52.480
<v Speaker 2>One really interesting finding was that the distribution of average

147
00:06:52.560 --> 00:06:56.240
<v Speaker 2>user ratings looked almost identical for free games versus paid games.

148
00:06:56.279 --> 00:06:58.879
<v Speaker 1>Really? So paying doesn't necessarily mean people like the game more?

149
00:06:58.839 --> 00:07:01.240
<v Speaker 2>It seems not, at least in this

150
00:07:01.319 --> 00:07:05.639
<v Speaker 2>data set. Suggests maybe game quality itself or user experience

151
00:07:05.759 --> 00:07:07.720
<v Speaker 2>is the dominant factor, not the price tag.

152
00:07:07.879 --> 00:07:11.000
<v Speaker 1>Fascinating. Okay, so we've tamed the data. Now let's get

153
00:07:11.000 --> 00:07:15.079
<v Speaker 1>into the statistical side, drawing deeper insights, making predictions.

154
00:07:14.600 --> 00:07:17.319
<v Speaker 2>Right, moving beyond just describing the data we have.

155
00:07:17.040 --> 00:07:21.360
<v Speaker 1>Which brings up that distinction: descriptive versus inferential statistics.

156
00:07:21.680 --> 00:07:25.720
<v Speaker 2>Yeah. Descriptive is summarizing what you see: average, spread, things

157
00:07:25.759 --> 00:07:29.560
<v Speaker 2>like that. Inferential is using your sample to say something

158
00:07:29.600 --> 00:07:33.759
<v Speaker 2>about the bigger picture or about unseen data, making inferences.

159
00:07:33.959 --> 00:07:37.240
<v Speaker 1>And a lot of that involves probability, dealing with randomness.

160
00:07:37.360 --> 00:07:39.839
<v Speaker 2>It does. And the interesting thing is, while one random

161
00:07:39.879 --> 00:07:43.800
<v Speaker 2>event is unpredictable, like one coin flip. Heads or tails,

162
00:07:43.800 --> 00:07:46.639
<v Speaker 2>who knows? A lot of random events become surprisingly predictable.

163
00:07:46.720 --> 00:07:49.360
<v Speaker 2>Flip that coin one thousand times and you're almost certainly

164
00:07:49.360 --> 00:07:50.800
<v Speaker 2>going to get around five hundred heads.

165
00:07:50.920 --> 00:07:55.079
<v Speaker 1>The workshop used a die-tossing example, a million times.

166
00:07:54.879 --> 00:07:58.639
<v Speaker 2>Yeah, to show calculating probability from relative frequency. P(odd number)

167
00:07:58.639 --> 00:08:01.000
<v Speaker 2>came out around 0.5001, very close

168
00:08:01.040 --> 00:08:04.160
<v Speaker 2>to the theoretical 0.5. P(less than five) was

169
00:08:04.160 --> 00:08:06.920
<v Speaker 2>about 0.666, again very close to 4/6,

170
00:08:06.920 --> 00:08:07.720
<v Speaker 2>or 2/3.

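NOTE
A minimal sketch of estimating probabilities from relative frequency, as in the
die-tossing example above. It uses only the standard library; the seed value
and the estimate helper are assumptions for illustration, so the exact decimals
will not match the workshop's run.
```python
import random
random.seed(123)  # fixed seed so repeated runs give the same estimates
def estimate(event, n_tosses=1_000_000):
    """Relative frequency of event over n_tosses rolls of a fair six-sided die."""
    rolls = random.choices(range(1, 7), k=n_tosses)
    return sum(1 for r in rolls if event(r)) / n_tosses
p_odd = estimate(lambda r: r % 2 == 1)       # theoretical value 1/2
p_less_than_5 = estimate(lambda r: r < 5)    # theoretical value 4/6 = 2/3
print(p_odd, p_less_than_5)
```
With a million tosses the estimates typically land within a fraction of a percent of the theoretical values.
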
171
00:08:07.879 --> 00:08:12.000
<v Speaker 1>This predictability leads nicely into the roulette example explaining expected value.

172
00:08:12.079 --> 00:08:14.399
<v Speaker 2>It's a classic. Say you bet one dollar on red.

173
00:08:14.519 --> 00:08:16.800
<v Speaker 2>You win one dollar if it lands red, lose one

174
00:08:16.839 --> 00:08:17.920
<v Speaker 2>dollar if black or green.

175
00:08:18.279 --> 00:08:21.040
<v Speaker 1>But there are those two green spaces, zero and double

176
00:08:21.160 --> 00:08:22.319
<v Speaker 1>zero.

177
00:08:22.720 --> 00:08:25.839
<v Speaker 2>Exactly. They tip the odds slightly in the casino's favor. Over

178
00:08:25.959 --> 00:08:29.680
<v Speaker 2>many bets, the expected value for the gambler is slightly negative,

179
00:08:29.720 --> 00:08:32.840
<v Speaker 2>about minus two point seven cents per dollar bet.

180
00:08:32.879 --> 00:08:35.720
<v Speaker 1>And that small negative amount for the player is the

181
00:08:35.720 --> 00:08:36.960
<v Speaker 1>casino's profit margin.

182
00:08:37.120 --> 00:08:41.039
<v Speaker 2>Precisely, that's the house edge built right into the probabilities.

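NOTE
A small worked calculation of the expected value of a one dollar bet on red,
assuming a wheel with 18 red, 18 black, and some number of green pockets. The
helper name is an assumption. Under these assumptions a wheel with two green
pockets gives about -5.3 cents per dollar, while a wheel with a single green
pocket gives about -2.7 cents, which is the figure quoted above.
```python
from fractions import Fraction
def expected_value_red(n_green):
    """Expected profit of a $1 bet on red with 18 red, 18 black, n_green green pockets."""
    pockets = 36 + n_green
    p_win = Fraction(18, pockets)
    return p_win * 1 + (1 - p_win) * (-1)   # win $1 with p_win, lose $1 otherwise
ev_double_zero = expected_value_red(2)      # wheel with 0 and 00
ev_single_zero = expected_value_red(1)      # wheel with only 0
print(float(ev_double_zero), float(ev_single_zero))
```
Exact fractions are used here so no rounding error enters the comparison.
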
183
00:08:41.159 --> 00:08:44.720
<v Speaker 1>This idea of large numbers leading to predictable averages sounds

184
00:08:44.799 --> 00:08:46.080
<v Speaker 1>like the central limit theorem.

185
00:08:46.159 --> 00:08:50.679
<v Speaker 2>You got it. The CLT, hugely important concept. It basically says,

186
00:08:50.919 --> 00:08:54.039
<v Speaker 2>if your sample size is large enough, usually thirty or

187
00:08:54.080 --> 00:08:55.639
<v Speaker 2>more is a rule of thumb, then...

188
00:08:55.559 --> 00:08:58.480
<v Speaker 1>the distribution of the sample means will look like a

189
00:08:58.519 --> 00:08:59.399
<v Speaker 1>normal bell curve.

190
00:08:59.240 --> 00:09:02.519
<v Speaker 2>Exactly, even if the original data source isn't normally

191
00:09:02.559 --> 00:09:05.720
<v Speaker 2>distributed at all. It could be uniform, skewed, whatever. The

192
00:09:05.799 --> 00:09:08.840
<v Speaker 2>averages from large samples will tend towards normality.

193
00:09:09.000 --> 00:09:12.360
<v Speaker 1>The workshop had that example drawing samples from a uniform distribution.

194
00:09:12.559 --> 00:09:15.720
<v Speaker 2>Yeah, ten thousand samples. The histogram of the sample averages

195
00:09:15.759 --> 00:09:18.559
<v Speaker 2>looked almost perfectly like a bell curve, fitting the normal

196
00:09:18.600 --> 00:09:21.919
<v Speaker 2>distribution predicted by the CLT. It's quite striking visually.

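NOTE
A sketch of the central limit theorem experiment described above, using only
the standard library. The sample size of 30, the seed, and the variable names
are assumptions; the workshop's own code may differ. For a uniform(0, 1)
source, theory predicts sample means centered on 0.5 with standard deviation
sqrt(1/12) / sqrt(sample_size).
```python
import random
import statistics
random.seed(123)
sample_size = 30      # observations averaged in each sample (assumed)
n_samples = 10_000    # number of sample means collected
sample_means = [
    statistics.fmean([random.random() for _ in range(sample_size)])
    for _ in range(n_samples)
]
mean_of_means = statistics.fmean(sample_means)
spread_of_means = statistics.stdev(sample_means)
predicted_spread = (1 / 12) ** 0.5 / sample_size ** 0.5
print(mean_of_means, spread_of_means, predicted_spread)
```
Plotting a histogram of sample_means is what produces the bell shape mentioned in the conversation.
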
197
00:09:22.200 --> 00:09:26.080
<v Speaker 1>Okay, so sample means follow a normal distribution if the

198
00:09:26.120 --> 00:09:29.840
<v Speaker 1>sample is big enough, But any single sample mean might

199
00:09:29.879 --> 00:09:32.879
<v Speaker 1>still be off from the true population mean. How do

200
00:09:32.919 --> 00:09:34.399
<v Speaker 1>we account for that uncertainty?

201
00:09:34.639 --> 00:09:37.399
<v Speaker 2>That's where confidence intervals come in. They give you a range,

202
00:09:37.559 --> 00:09:39.240
<v Speaker 2>not just a single point estimate.

203
00:09:39.440 --> 00:09:42.320
<v Speaker 1>Like in election polls, they often report a margin of error.

204
00:09:42.360 --> 00:09:46.159
<v Speaker 2>Exactly. A ninety five percent confidence interval means we're ninety

205
00:09:46.159 --> 00:09:49.240
<v Speaker 2>five percent confident that the true population value lies within

206
00:09:49.279 --> 00:09:53.039
<v Speaker 2>this range. The workshop example mentioned a small poll: four

207
00:09:53.039 --> 00:09:55.519
<v Speaker 2>to six people out of ten might vote for someone.

208
00:09:55.679 --> 00:09:58.960
<v Speaker 2>The interval reflects the uncertainty due to the small sample size.

209
00:09:59.080 --> 00:10:03.799
<v Speaker 1>Got it. So intervals quantify uncertainty. What about testing specific claims?

210
00:10:03.919 --> 00:10:05.080
<v Speaker 1>Hypothesis testing?

211
00:10:05.200 --> 00:10:08.360
<v Speaker 2>Right. This is about formally testing if a statistic you

212
00:10:08.440 --> 00:10:11.440
<v Speaker 2>observe is significantly different from what you'd expect under some

213
00:10:11.519 --> 00:10:12.320
<v Speaker 2>default assumption.

214
00:10:12.519 --> 00:10:13.919
<v Speaker 1>It has three parts to it.

215
00:10:14.320 --> 00:10:18.000
<v Speaker 2>Yep. First the hypotheses: the null hypothesis, H0, which is

216
00:10:18.000 --> 00:10:21.879
<v Speaker 2>the default or no-effect assumption, and the alternative hypothesis, Ha,

217
00:10:21.960 --> 00:10:23.600
<v Speaker 2>which is what you're trying to find evidence for.

218
00:10:23.919 --> 00:10:27.240
<v Speaker 1>Like Richard the baker. H0 is that his factory still

219
00:10:27.240 --> 00:10:28.600
<v Speaker 1>makes fifteen thousand loaves.

220
00:10:28.879 --> 00:10:35.159
<v Speaker 2>Correct, H0: mean equals 15,000. Ha might be mean not equal to 15,000,

221
00:10:35.840 --> 00:10:38.559
<v Speaker 2>or maybe mean greater than 15,000 if he hopes the new equipment

222
00:10:38.679 --> 00:10:43.799
<v Speaker 2>increased output. Okay, hypotheses first. Then you calculate a

223
00:10:43.840 --> 00:10:47.039
<v Speaker 2>test statistic based on your data, and finally, the p-value.

224
00:10:47.159 --> 00:10:49.120
<v Speaker 1>The p-value, that tells you...

225
00:10:49.120 --> 00:10:52.320
<v Speaker 2>It's the probability of seeing your data, or something even more extreme,

226
00:10:52.519 --> 00:10:55.840
<v Speaker 2>if the null hypothesis were actually true. A small

227
00:10:56.080 --> 00:10:59.240
<v Speaker 2>p-value suggests your data is unlikely under the null, providing

228
00:10:59.240 --> 00:11:01.320
<v Speaker 2>evidence for the alternative.

229
00:11:01.559 --> 00:11:04.639
<v Speaker 1>Makes sense. Now there was that important warning about correlation and causation.

230
00:11:04.840 --> 00:11:08.600
<v Speaker 2>Ah, yes. The communities data set activity found higher test

231
00:11:08.600 --> 00:11:11.399
<v Speaker 2>scores in groups with more Internet access. The P value

232
00:11:11.440 --> 00:11:13.600
<v Speaker 2>was small, indicating a significant difference.

233
00:11:13.720 --> 00:11:15.360
<v Speaker 1>So more internet equals better scores?

234
00:11:15.120 --> 00:11:18.559
<v Speaker 2>Not necessarily. That's the crucial point. Correlation does not

235
00:11:18.679 --> 00:11:19.639
<v Speaker 2>imply causation.

236
00:11:19.840 --> 00:11:22.120
<v Speaker 1>There could be another factor involved.

237
00:11:21.840 --> 00:11:25.279
<v Speaker 2>Exactly. A lurking variable, like the overall wealth or socioeconomic status

238
00:11:25.279 --> 00:11:28.159
<v Speaker 2>of a community, could be driving both higher internet access

239
00:11:28.200 --> 00:11:31.200
<v Speaker 2>and higher test scores. You can't conclude causation just from

240
00:11:31.200 --> 00:11:33.559
<v Speaker 2>the correlation. Always have to be careful.

241
00:11:33.399 --> 00:11:36.919
<v Speaker 1>A vital lesson. And you mentioned machine learning models like

242
00:11:36.960 --> 00:11:39.480
<v Speaker 1>linear regression. They fit in here too.

243
00:11:39.639 --> 00:11:43.759
<v Speaker 2>Yeah, they're essentially another form of inferential statistics. You build

244
00:11:43.759 --> 00:11:47.320
<v Speaker 2>a model on known data to make predictions about unseen data.

245
00:11:47.320 --> 00:11:48.200
<v Speaker 2>It's all connected.

246
00:11:48.559 --> 00:11:51.639
<v Speaker 1>Okay, let's shift to calculus, but again through the lens

247
00:11:51.720 --> 00:11:55.159
<v Speaker 1>of Python, making it practical. We talked about functions earlier.

248
00:11:55.000 --> 00:11:58.639
<v Speaker 2>Right, and that core rule: one input, only one output. The

249
00:11:58.720 --> 00:12:02.519
<v Speaker 2>vertical line test helps visualize that. A circle fails it,

250
00:12:02.679 --> 00:12:05.720
<v Speaker 2>so y isn't a simple function of x for a circle.

251
00:12:05.440 --> 00:12:08.320
<v Speaker 1>And functions can be transformed, shifted, scaled.

252
00:12:08.039 --> 00:12:12.039
<v Speaker 2>Yep. Adding a constant shifts vertically. Adding inside, like

253
00:12:12.200 --> 00:12:17.279
<v Speaker 2>f(x + c), shifts horizontally. Multiplying stretches or shrinks. Python's

254
00:12:17.320 --> 00:12:20.720
<v Speaker 2>plotting makes seeing these transformations really intuitive.

255
00:12:20.320 --> 00:12:22.960
<v Speaker 1>And Python helps solve equations too, even tricky ones.

256
00:12:23.039 --> 00:12:25.879
<v Speaker 2>Oh, definitely. Simple linear ones like 3x - 5 = 6,

257
00:12:25.919 --> 00:12:27.519
<v Speaker 2>Python can solve that easily, though you can do it

258
00:12:27.519 --> 00:12:30.279
<v Speaker 2>by hand. But for polynomials like x^3 - 7x^2

259
00:12:30.279 --> 00:12:33.360
<v Speaker 2>+ 15x - 9, that looks harder. Python

260
00:12:33.399 --> 00:12:36.320
<v Speaker 2>can help factor it and find the roots, in this case

261
00:12:36.639 --> 00:12:40.200
<v Speaker 2>x = 1 and x = 3. Libraries like SymPy can

262
00:12:40.240 --> 00:12:45.159
<v Speaker 2>even handle symbolic math, solving systems of nonlinear equations algebraically.

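NOTE
A quick check of the cubic discussed above, x^3 - 7x^2 + 15x - 9, with roots
x = 1 and x = 3. SymPy could factor this symbolically; this sketch sticks to
the standard library instead, and the helper names are assumptions for
illustration. It verifies that (x - 1)(x - 3)^2 expands to the same cubic.
```python
def poly_eval(coeffs, x):
    """Evaluate a polynomial with coefficients in descending order (Horner's rule)."""
    result = 0
    for c in coeffs:
        result = result * x + c
    return result
def poly_mul(a, b):
    """Multiply two polynomials given as descending coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out
cubic = [1, -7, 15, -9]                                    # x^3 - 7x^2 + 15x - 9
factored = poly_mul([1, -1], poly_mul([1, -3], [1, -3]))   # (x - 1)(x - 3)^2
integer_roots = [x for x in range(-10, 11) if poly_eval(cubic, x) == 0]
print(factored, integer_roots)
```
The root at x = 3 is a double root, which is why only two distinct values appear.
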
263
00:12:45.279 --> 00:12:48.080
<v Speaker 1>Wow. What about sequences and series? They popped up too.

264
00:12:48.120 --> 00:12:52.519
<v Speaker 2>Yeah, arithmetic and geometric sequences. They have direct applications in finance,

265
00:12:52.600 --> 00:12:56.039
<v Speaker 2>like calculating compound interest for retirement savings.

266
00:12:55.519 --> 00:12:57.960
<v Speaker 1>401(k) calculations.

267
00:12:57.360 --> 00:13:00.639
<v Speaker 2>Exactly, or even modeling things like bacterial growth which often

268
00:13:00.679 --> 00:13:02.039
<v Speaker 2>follows a geometric sequence.

269
00:13:02.159 --> 00:13:04.799
<v Speaker 1>Useful stuff. And a quick mention of trigonometry and vectors.

270
00:13:04.879 --> 00:13:08.840
<v Speaker 2>Right. Sine, cosine, tangent for angles, the Pythagorean theorem

271
00:13:08.919 --> 00:13:12.919
<v Speaker 2>for right triangles, and vectors for quantities with magnitude and direction.

272
00:13:13.200 --> 00:13:16.480
<v Speaker 1>The dot product came up for finding the angle between vectors.

273
00:13:16.559 --> 00:13:19.039
<v Speaker 2>Yep, if the dot product is zero, the vectors are

274
00:13:19.159 --> 00:13:22.080
<v Speaker 2>orthogonal, perpendicular. Useful in physics and graphics.

275
00:13:22.200 --> 00:13:26.120
<v Speaker 1>Okay, now the core calculus concepts, derivatives and integrals, made

276
00:13:26.120 --> 00:13:29.039
<v Speaker 1>practical with Python. Derivatives first: rate of change.

277
00:13:29.039 --> 00:13:32.679
<v Speaker 2>Instantaneous rate of change, how fast something is changing

278
00:13:32.679 --> 00:13:36.360
<v Speaker 2>at a specific point. Traditionally, finding derivatives involves a lot

279
00:13:36.360 --> 00:13:38.279
<v Speaker 2>of algebra and limit rules.

280
00:13:38.000 --> 00:13:40.600
<v Speaker 1>The tedious algebraic manipulations.

281
00:13:39.960 --> 00:13:44.240
<v Speaker 2>Exactly, but Python lets you do it numerically. You approximate

282
00:13:44.279 --> 00:13:46.879
<v Speaker 2>the slope using a tiny change in x, like h

283
00:13:46.960 --> 00:13:50.039
<v Speaker 2>equals 0.000001. You

284
00:13:50.080 --> 00:13:53.159
<v Speaker 2>calculate f(x + h) minus f(x), divided by h.

285
00:13:53.080 --> 00:13:55.440
<v Speaker 1>So you get the slope, the rate of change, without the

286
00:13:55.639 --> 00:13:57.480
<v Speaker 1>complex algebra.

287
00:13:57.360 --> 00:13:58.879
<v Speaker 2>Pretty much. And once you have the slope at a point, you

288
00:13:58.879 --> 00:14:01.720
<v Speaker 2>can find the equation of the tangent line there. Very

289
00:14:01.759 --> 00:14:03.919
<v Speaker 2>powerful for optimization and analysis.

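NOTE
A minimal sketch of the numerical derivative and tangent line described above.
The function names and the example f(x) = x^2 are assumptions chosen for
illustration; h = 0.000001 follows the value mentioned in the conversation.
```python
def derivative(f, x, h=1e-6):
    """Forward-difference estimate of f'(x): (f(x + h) - f(x)) / h."""
    return (f(x + h) - f(x)) / h
def tangent_line(f, x0):
    """Return slope m and intercept b of the tangent y = m*x + b at x0."""
    m = derivative(f, x0)
    b = f(x0) - m * x0
    return m, b
def f(x):
    return x ** 2   # example function; its exact derivative is 2x
slope, intercept = tangent_line(f, 3.0)
print(slope, intercept)
```
For this example the estimate should be close to the exact slope 6 and intercept -9.
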
290
00:14:04.000 --> 00:14:07.159
<v Speaker 1>Okay. And integrals, the opposite, kind of adding things up?

291
00:14:07.279 --> 00:14:10.759
<v Speaker 2>Conceptually, yes, adding up areas or volumes by slicing them

292
00:14:10.759 --> 00:14:14.440
<v Speaker 2>into many tiny pieces. Old methods like Riemann sums use

293
00:14:14.559 --> 00:14:17.240
<v Speaker 2>rectangles but weren't very accurate with few slices.

294
00:14:17.519 --> 00:14:20.559
<v Speaker 1>Python uses trapezoids, the trap_integral function.

295
00:14:20.879 --> 00:14:24.240
<v Speaker 2>Right. Using trapezoids gives a better approximation, and because Python can

296
00:14:24.279 --> 00:14:28.039
<v Speaker 2>handle thousands or millions of slices easily, the numerical integration

297
00:14:28.120 --> 00:14:32.200
<v Speaker 2>becomes incredibly accurate. The workshop example showed just five trapezoids

298
00:14:32.200 --> 00:14:34.159
<v Speaker 2>getting the error down to three percent.

299
00:14:34.000 --> 00:14:37.559
<v Speaker 1>And this lets you calculate volumes of complex shapes, solids of revolution.

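NOTE
A sketch of trapezoid-rule integration as discussed above. The signature of
trap_integral here is an assumption, not necessarily the workshop's, and the
test integrand sin(x) on [0, pi] (exact area 2) is chosen for illustration.
```python
import math
def trap_integral(f, a, b, num):
    """Approximate the integral of f from a to b with num trapezoids."""
    width = (b - a) / num
    area = 0.5 * (f(a) + f(b))          # end points count half
    for i in range(1, num):
        area += f(a + i * width)        # interior points count fully
    return area * width
exact = 2.0                              # integral of sin(x) from 0 to pi
coarse = trap_integral(math.sin, 0, math.pi, 5)
fine = trap_integral(math.sin, 0, math.pi, 100_000)
coarse_error_pct = abs(coarse - exact) / exact * 100
print(coarse, fine, coarse_error_pct)
```
With this integrand, five trapezoids are off by a few percent, and many slices close the gap almost entirely.
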
300
00:14:37.440 --> 00:14:40.279
<v Speaker 2>Exactly like rotating a curve to make a bowl shape

301
00:14:40.279 --> 00:14:44.159
<v Speaker 2>a paraboloid, or solving optimization problems like finding the maximum

302
00:14:44.200 --> 00:14:46.600
<v Speaker 2>volume cone you can fit inside a sphere.

303
00:14:46.600 --> 00:14:49.440
<v Speaker 1>But the real power seems to be in differential equations.

304
00:14:49.480 --> 00:14:52.720
<v Speaker 2>Absolutely. These describe situations where the rate of change of

305
00:14:52.759 --> 00:14:55.840
<v Speaker 2>something depends on its current value. Finding the function itself

306
00:14:55.840 --> 00:14:59.600
<v Speaker 2>can be very hard or even impossible algebraically.

307
00:14:59.080 --> 00:15:04.960
<v Speaker 1>But Python offers numerical methods: Euler's method, Runge-Kutta, RK4.

308
00:15:05.200 --> 00:15:08.799
<v Speaker 2>Yes, these are algorithmic approaches. You start with an initial

309
00:15:08.799 --> 00:15:11.960
<v Speaker 2>condition and step forward in small time increments, using the

310
00:15:11.960 --> 00:15:15.039
<v Speaker 2>derivative information to predict the next value. It's like building

311
00:15:15.120 --> 00:15:16.919
<v Speaker 2>the solution step by step.

312
00:15:16.679 --> 00:15:19.559
<v Speaker 1>And this opens up modeling for tons of real world things.

313
00:15:19.639 --> 00:15:22.480
<v Speaker 2>Oh a huge range. The workshop listed quite a few.

314
00:15:22.679 --> 00:15:25.799
<v Speaker 1>Let's recap some. Interest calculations, how money grows.

315
00:15:25.639 --> 00:15:29.320
<v Speaker 2>Yep, modeling compound interest: one thousand dollars growing to one

316
00:15:29.360 --> 00:15:32.240
<v Speaker 2>million dollars in about eighty-six years at eight percent.

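NOTE
A sketch of Euler's method applied to the interest example above, assuming
continuous compounding modeled as dy/dt = rate * y. The function name, the
step size, and the continuous-compounding assumption are illustrative choices,
not confirmed details of the workshop's code.
```python
def years_to_reach(target, principal=1000.0, rate=0.08, dt=0.001):
    """Euler's method for dy/dt = rate * y, stepping until y reaches target."""
    y, t = principal, 0.0
    while y < target:
        y += rate * y * dt   # next value = current value + slope * step
        t += dt
    return t
years = years_to_reach(1_000_000)
print(round(years, 1))
```
Under these assumptions the result comes out at roughly 86 years, in line with the figure quoted in the conversation.
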
317
00:15:32.080 --> 00:15:34.480
<v Speaker 1>Population growth, like Kenya's doubling time.

318
00:15:34.159 --> 00:15:38.080
<v Speaker 2>Right, modeling exponential growth, or maybe logistic growth if

319
00:15:38.080 --> 00:15:41.600
<v Speaker 2>there are limiting factors, how policy changes might affect growth rates.

320
00:15:41.360 --> 00:15:44.519
<v Speaker 1>Radioactive decay, carbon-14 dating.

321
00:15:44.600 --> 00:15:48.960
<v Speaker 2>Exactly. The half-life calculation is a classic differential

322
00:15:49.000 --> 00:15:51.840
<v Speaker 2>equation problem used to date artifacts.
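
NOTE
Editor's sketch of the half-life calculation mentioned here, assuming simple exponential decay dN/dt = -k * N
and the commonly cited carbon-14 half-life of about 5730 years. The function names are hypothetical.
```python
import math
# Commonly cited half-life of carbon-14, in years (an assumption, not a figure from the episode).
HALF_LIFE_C14 = 5730.0
# Decay constant k such that N(t) = N0 * exp(-k * t) halves every half_life years.
def decay_constant(half_life):
    return math.log(2) / half_life
# Age of a sample given the fraction of the original carbon-14 that remains.
def age_from_fraction(fraction_remaining, half_life=HALF_LIFE_C14):
    return -math.log(fraction_remaining) / decay_constant(half_life)
print(age_from_fraction(0.5), age_from_fraction(0.25))
```
A sample with half of its carbon-14 left is one half-life old, and one with a quarter left is two half-lives old.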

323
00:15:52.120 --> 00:15:55.240
<v Speaker 1>Newton's law of cooling, like figuring out time of death.

324
00:15:55.360 --> 00:15:59.120
<v Speaker 2>That's a famous application, yes. Or just modeling how any

325
00:15:59.159 --> 00:16:02.440
<v Speaker 2>object cools down or warms up towards the ambient temperature.
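
NOTE
Editor's sketch of Newton's law of cooling, dT/dt = -k * (T - T_ambient), stepped forward with Euler's method
and compared with the known closed-form solution. The temperatures, the constant k and the function names
are made-up illustration values, not figures from the workshop.
```python
import math
# Closed-form solution: the gap to the ambient temperature decays exponentially.
def cooling_exact(t, temp0, ambient, k):
    return ambient + (temp0 - ambient) * math.exp(-k * t)
# Numerical version: recompute the cooling rate at every small time step.
def cooling_euler(t_end, temp0, ambient, k, n_steps=10000):
    dt = t_end / n_steps
    temp = temp0
    for _ in range(n_steps):
        temp += -k * (temp - ambient) * dt
    return temp
# Hypothetical scenario: a 90 degree object in a 20 degree room, k = 0.05 per minute, after 30 minutes.
exact = cooling_exact(30.0, 90.0, 20.0, 0.05)
approx = cooling_euler(30.0, 90.0, 20.0, 0.05)
print(exact, approx)
```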

326
00:16:02.559 --> 00:16:04.720
<v Speaker 1>Mixture problems, salt in a tank.

327
00:16:04.679 --> 00:16:08.159
<v Speaker 2>Yeah, tracking the concentration of a substance as fluids flow

328
00:16:08.240 --> 00:16:10.200
<v Speaker 2>in and out, common in chemical engineering.

329
00:16:10.240 --> 00:16:13.679
<v Speaker 1>Projectile motion, calculating a ball's trajectory.

330
00:16:13.799 --> 00:16:18.120
<v Speaker 2>Mm-hmm. Python can constantly recalculate velocity and position, accounting for gravity

331
00:16:18.200 --> 00:16:22.120
<v Speaker 2>and air resistance, much more realistically than simple formulas.
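
NOTE
Editor's sketch of the recalculation loop described here: velocity and position are updated every small time step.
Air resistance is modeled as a simple linear drag term, which is an assumption; the workshop's drag model is not stated.
The launch speed, angle and drag coefficient are illustration values.
```python
import math
# Horizontal distance travelled before the ball drops back below launch height.
def projectile_range(speed, angle_deg, drag=0.0, dt=0.001, g=9.81):
    angle = math.radians(angle_deg)
    vx, vy = speed * math.cos(angle), speed * math.sin(angle)
    x, y = 0.0, 0.0
    while y >= 0.0:
        # Move using the current velocity, then update the velocity from gravity and drag.
        x += vx * dt
        y += vy * dt
        vx += -drag * vx * dt
        vy += (-g - drag * vy) * dt
    return x
no_drag = projectile_range(20.0, 45.0)
with_drag = projectile_range(20.0, 45.0, drag=0.1)
# Textbook drag-free range formula, used only as a sanity check on the simulation.
ideal = 20.0 ** 2 * math.sin(math.radians(90.0)) / 9.81
print(no_drag, with_drag, ideal)
```
Without drag the simulated range should sit close to the textbook formula; with drag it comes out shorter.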

332
00:16:21.639 --> 00:16:23.240
<v Speaker 1>And even predator-prey scenarios.

333
00:16:23.440 --> 00:16:26.360
<v Speaker 2>A fox chasing a rabbit, yes, showing exactly where the

334
00:16:26.399 --> 00:16:29.679
<v Speaker 2>fox intercepts the rabbit, y equals twenty-three point ninety-nine

335
00:16:29.759 --> 00:16:32.039
<v Speaker 2>in the example. It models pursuit curves.

336
00:16:32.039 --> 00:16:34.600
<v Speaker 1>So the big advantage is avoiding the complex algebra and

337
00:16:34.679 --> 00:16:36.279
<v Speaker 1>just letting Python crunch the numbers.

338
00:16:36.360 --> 00:16:39.919
<v Speaker 2>Essentially, yes. Modeling using Python and running simulations has saved

339
00:16:40.000 --> 00:16:42.000
<v Speaker 2>us a lot of algebra and still got us very

340
00:16:42.039 --> 00:16:45.480
<v Speaker 2>accurate answers. You can use brute force by recalculating thousands

341
00:16:45.519 --> 00:16:46.000
<v Speaker 2>of times.

342
00:16:46.120 --> 00:16:48.559
<v Speaker 1>Very cool. And finally, a brief look at matrices and

343
00:16:48.639 --> 00:16:49.559
<v Speaker 1>Markov chains.

344
00:16:49.759 --> 00:16:53.480
<v Speaker 2>Right. Matrices are fundamental in linear algebra, AI, machine learning,

345
00:16:53.519 --> 00:16:58.000
<v Speaker 2>and Markov chains model systems transitioning between states based on probabilities.

346
00:16:58.080 --> 00:17:00.559
<v Speaker 1>The example was a text predictor.

347
00:17:00.360 --> 00:17:04.759
<v Speaker 2>Yeah, using the probability of one word following another, state transitions, to generate new text.

348
00:17:04.480 --> 00:17:05.400
<v Speaker 1>Yeah.

349
00:17:05.480 --> 00:17:08.519
<v Speaker 2>A basic but illustrative example of Markov chains in action.
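
NOTE
Editor's sketch of a word-level Markov chain text predictor in the spirit of the example mentioned here.
Each word is a state, and the next word is drawn from the words that followed it in the training text.
The corpus, function names and seed are hypothetical; the workshop's own implementation is not shown in the episode.
```python
import random
from collections import defaultdict
# Map each word to the list of words that followed it; repeats encode the transition probabilities.
def build_transitions(text):
    words = text.split()
    transitions = defaultdict(list)
    for current, following in zip(words, words[1:]):
        transitions[current].append(following)
    return dict(transitions)
# Walk the chain: repeatedly pick a random follower of the current word.
def generate(transitions, start, length, seed=0):
    rng = random.Random(seed)
    word, output = start, [start]
    for _ in range(length - 1):
        followers = transitions.get(word)
        if not followers:
            break  # dead end: this word never had a successor in the corpus
        word = rng.choice(followers)
        output.append(word)
    return output
corpus = "the fox chases the rabbit and the rabbit runs from the fox"
transitions = build_transitions(corpus)
generated = generate(transitions, "the", 8)
print(" ".join(generated))
```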

350
00:17:09.319 --> 00:17:12.319
<v Speaker 1>So, wrapping this all up, what's the big takeaway here?

351
00:17:12.559 --> 00:17:15.839
<v Speaker 2>I think it's that Python, with these incredible libraries, really

352
00:17:15.880 --> 00:17:19.079
<v Speaker 2>acts like a universal translator for math and stats.

353
00:17:18.799 --> 00:17:21.960
<v Speaker 1>Taking abstract concepts and making them tools for solving real

354
00:17:22.000 --> 00:17:22.920
<v Speaker 1>problems.

355
00:17:22.920 --> 00:17:27.039
<v Speaker 2>Exactly. Whether it's finance, biology, physics, social science, you can model

356
00:17:27.200 --> 00:17:30.839
<v Speaker 2>complex systems, make predictions, understand dynamics.

357
00:17:30.359 --> 00:17:33.480
<v Speaker 1>Without necessarily needing a PhD in advanced mathematics to do

358
00:17:33.519 --> 00:17:34.839
<v Speaker 1>the calculations by hand.

359
00:17:35.000 --> 00:17:38.599
<v Speaker 2>Right. It democratizes the ability to use these powerful techniques. You

360
00:17:38.720 --> 00:17:41.160
<v Speaker 2>leverage the computational power to get insights.

361
00:17:41.200 --> 00:17:44.319
<v Speaker 1>It lets you ask "what if" and get remarkably accurate

362
00:17:44.319 --> 00:17:46.759
<v Speaker 1>answers through simulation and numerical methods.

363
00:17:46.839 --> 00:17:50.319
<v Speaker 2>It really does shift the focus from algebraic manipulation to

364
00:17:50.440 --> 00:17:52.559
<v Speaker 2>understanding the concepts and applying them.

365
00:17:52.839 --> 00:17:56.480
<v Speaker 1>So here's something to think about. What problem, maybe something

366
00:17:56.519 --> 00:17:59.920
<v Speaker 1>that seemed mathematically impossible or just way too complex before,

367
00:18:00.599 --> 00:18:03.920
<v Speaker 1>might you approach differently now, knowing that Python could

368
00:18:03.960 --> 00:18:05.160
<v Speaker 1>be your computational guide?
