WEBVTT

1
00:00:00.080 --> 00:00:03.439
<v Speaker 1>Okay, let's unpack this today. We are diving deep into

2
00:00:03.799 --> 00:00:07.599
<v Speaker 1>something well, pretty fundamental if you work with data and

3
00:00:07.679 --> 00:00:10.640
<v Speaker 1>numbers in Python. We're talking about the NumPy library.

4
00:00:10.679 --> 00:00:14.160
<v Speaker 2>Oh yeah, it's absolutely bedrock. If you've ever felt the

5
00:00:14.199 --> 00:00:18.079
<v Speaker 2>pain of doing complex math or managing big data sets

6
00:00:18.120 --> 00:00:22.839
<v Speaker 2>with just standard Python lists, NumPy is basically the answer.

7
00:00:23.000 --> 00:00:25.120
<v Speaker 1>And we've got a great guide for this deep dive.

8
00:00:25.160 --> 00:00:28.320
<v Speaker 1>It's a book called Learning NumPy Array. It really takes

9
00:00:28.359 --> 00:00:32.000
<v Speaker 1>you from the core ideas right through to tackling some

10
00:00:32.039 --> 00:00:32.840
<v Speaker 1>real world stuff.

11
00:00:32.880 --> 00:00:35.240
<v Speaker 2>It does, yeah, and it's good because it doesn't just

12
00:00:35.280 --> 00:00:38.119
<v Speaker 2>throw syntax at you. It explains why NumPy is built

13
00:00:38.119 --> 00:00:40.520
<v Speaker 2>the way it is. That's really key to you using

14
00:00:40.520 --> 00:00:41.759
<v Speaker 2>it effectively. Exactly.

15
00:00:42.119 --> 00:00:44.200
<v Speaker 1>So our mission here is to pull out the most

16
00:00:44.280 --> 00:00:47.920
<v Speaker 1>valuable insights, those aha moments from the material, and give you

17
00:00:47.960 --> 00:00:51.439
<v Speaker 1>a shortcut, basically, to understanding why NumPy is so critical,

18
00:00:51.960 --> 00:00:55.079
<v Speaker 1>how it gets that amazing speed and efficiency, and how

19
00:00:55.119 --> 00:00:58.039
<v Speaker 1>you can actually use it from basic array stuff to

20
00:00:58.479 --> 00:01:01.240
<v Speaker 1>data analysis, maybe even some predictive modeling later on.

21
00:01:01.600 --> 00:01:02.759
<v Speaker 2>Sounds good. Let's jump in.

22
00:01:02.880 --> 00:01:07.200
<v Speaker 1>Okay, so, starting right at the beginning, what is NumPy

23
00:01:07.200 --> 00:01:10.599
<v Speaker 1>really at its core? And why should someone doing

24
00:01:10.719 --> 00:01:13.400
<v Speaker 1>numerical work in Python absolutely care?

25
00:01:13.400 --> 00:01:16.840
<v Speaker 2>Think of it as Python's high performance engine for well,

26
00:01:16.959 --> 00:01:20.439
<v Speaker 2>anything involving arrays or matrices of numbers. It gives

27
00:01:20.439 --> 00:01:23.719
<v Speaker 2>you this core object, the ndarray, the N-dimensional

28
00:01:23.760 --> 00:01:26.439
<v Speaker 2>array, and that's specifically built for numerical work.

29
00:01:26.519 --> 00:01:29.239
<v Speaker 1>And the difference compared to a standard Python list is huge,

30
00:01:29.280 --> 00:01:31.959
<v Speaker 1>isn't it? That efficiency? That speed? That's one of the

31
00:01:32.000 --> 00:01:33.079
<v Speaker 1>first big takeaways.

32
00:01:33.120 --> 00:01:35.519
<v Speaker 2>Absolutely massive. You know, Python lists are super flexible, they

33
00:01:35.519 --> 00:01:38.840
<v Speaker 2>can hold anything, but that flexibility it costs you when

34
00:01:38.840 --> 00:01:43.040
<v Speaker 2>you're doing math. Right. NumPy arrays, they're homogeneous. All elements

35
00:01:43.079 --> 00:01:45.640
<v Speaker 2>are the same data type. That means NumPy can store

36
00:01:45.680 --> 00:01:48.719
<v Speaker 2>them way more efficiently in memory. Okay. Plus, a lot

37
00:01:48.719 --> 00:01:51.560
<v Speaker 2>of NumPy is actually written in C, so the computations

38
00:01:51.560 --> 00:01:53.159
<v Speaker 2>themselves are just lightning fast.

39
00:01:53.239 --> 00:01:57.599
<v Speaker 1>And that homogeneity enables one of NumPy's, like, superpowers, right?

40
00:01:57.680 --> 00:02:02.120
<v Speaker 2>Vectorization. Exactly, vectorized operations. Instead of writing for loops to

41
00:02:02.319 --> 00:02:03.439
<v Speaker 2>manually iterate

42
00:02:03.159 --> 00:02:05.280
<v Speaker 1>through numbers, it's tedious and slow.

43
00:02:05.959 --> 00:02:09.080
<v Speaker 2>Really slow. Yeah, you just apply operations to the entire

44
00:02:09.240 --> 00:02:11.520
<v Speaker 2>array at once, kind of like how you'd write it

45
00:02:11.520 --> 00:02:12.240
<v Speaker 2>in math notation.

46
00:02:12.479 --> 00:02:14.520
<v Speaker 1>Much cleaner code, much cleaner.

47
00:02:14.400 --> 00:02:17.680
<v Speaker 2>And crucially, much, much faster, because it's running optimized C

48
00:02:17.840 --> 00:02:18.680
<v Speaker 2>code underneath.

49
00:02:18.840 --> 00:02:21.120
<v Speaker 1>That's where the real performance gain kicks in, especially with

50
00:02:21.159 --> 00:02:22.319
<v Speaker 1>bigger data sets. I guess.

51
00:02:22.360 --> 00:02:25.360
<v Speaker 2>Oh, definitely. Getting started is usually just a simple install.

52
00:02:25.400 --> 00:02:27.840
<v Speaker 2>The book covers that for Windows, Linux, Mac, you know,

53
00:02:28.400 --> 00:02:30.479
<v Speaker 2>But the real proof is seeing it in action.

54
00:02:30.560 --> 00:02:32.840
<v Speaker 1>Okay, yeah, and this is where it gets really interesting.

55
00:02:32.879 --> 00:02:37.599
<v Speaker 1>I thought that vector addition example, right, adding two lists

56
00:02:37.599 --> 00:02:41.240
<v Speaker 1>in pure Python with loops. Sure it works, but it's slow.

57
00:02:41.680 --> 00:02:44.639
<v Speaker 1>Then you see the NumPy version: one line and

58
00:02:44.719 --> 00:02:46.879
<v Speaker 1>dramatically faster, dramatically faster.

59
00:02:47.080 --> 00:02:49.439
<v Speaker 2>And the key thing there it's not just that it's

60
00:02:49.479 --> 00:02:54.159
<v Speaker 2>faster for one small example. That performance gap scales. With

61
00:02:54.280 --> 00:02:57.360
<v Speaker 2>small arrays, maybe you don't notice much, but when you

62
00:02:57.400 --> 00:03:01.280
<v Speaker 2>have millions, billions of elements, NumPy isn't just nice

63
00:03:01.319 --> 00:03:04.599
<v Speaker 2>to have, it's essential. It makes operations feasible that would

64
00:03:04.680 --> 00:03:06.360
<v Speaker 2>just be impossibly slow with lists.
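
NOTE
Editor's sketch of the vector addition comparison described above. The array
size and the squares-plus-cubes inputs are assumptions chosen for illustration,
not necessarily the book's exact listing. It only shows that the loop and the
vectorized expression compute the same result; no timings are claimed.
```python
import numpy as np
def add_with_loop(xs, ys):
    """Element-wise sum of two equal-length lists, one pair at a time."""
    result = []
    for x, y in zip(xs, ys):
        result.append(x + y)
    return result
n = 1000  # hypothetical size, chosen only for this sketch
a_list = [i ** 2 for i in range(n)]
b_list = [i ** 3 for i in range(n)]
loop_sum = add_with_loop(a_list, b_list)
# NumPy version: one vectorized expression over whole arrays.
a = np.arange(n, dtype=np.int64) ** 2
b = np.arange(n, dtype=np.int64) ** 3
vector_sum = a + b
```
Wrapping each version in timeit for growing n is one way to see the gap scale.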

65
00:03:06.599 --> 00:03:09.599
<v Speaker 1>Okay, so the core idea is speed and efficiency built

66
00:03:09.599 --> 00:03:12.400
<v Speaker 1>around the special array type. Let's talk more about the

67
00:03:12.439 --> 00:03:14.400
<v Speaker 1>ndarray itself, the fundamental building

68
00:03:14.080 --> 00:03:16.639
<v Speaker 2>block. Right, the ndarray. So, like we said, the defining

69
00:03:16.680 --> 00:03:21.680
<v Speaker 2>thing is homogeneity: all elements, same data type. This predictability,

70
00:03:21.719 --> 00:03:24.639
<v Speaker 2>knowing how big each element is, lets NumPy lay

71
00:03:24.639 --> 00:03:29.199
<v Speaker 2>out the data really efficiently in memory contiguously.

72
00:03:28.680 --> 00:03:32.319
<v Speaker 1>And they're zero indexed like Python lists.

73
00:03:32.039 --> 00:03:34.919
<v Speaker 2>Yep, zero indexed. First element is at index zero, and

74
00:03:34.960 --> 00:03:38.960
<v Speaker 2>the array object itself knows things about its structure. Metadata. Exactly.

75
00:03:39.000 --> 00:03:40.639
<v Speaker 2>You can ask it for its shape, like is it.

76
00:03:40.639 --> 00:03:42.479
<v Speaker 1>A row, a matrix, a three D.

77
00:03:42.520 --> 00:03:45.400
<v Speaker 2>cube. Precisely. Or its dtype, to know what type

78
00:03:45.439 --> 00:03:48.520
<v Speaker 2>of data it holds. Right. size for the total number

79
00:03:48.520 --> 00:03:50.599
<v Speaker 2>of elements. itemsize tells you how many bytes each

80
00:03:50.639 --> 00:03:53.759
<v Speaker 2>element uses, and ndim for the number of dimensions.

81
00:03:54.080 --> 00:03:55.639
<v Speaker 2>You use these attributes all the time.
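
NOTE
Editor's sketch of the attributes just listed (shape, dtype, size, itemsize,
ndim). The 3x4 float64 array is a hypothetical example, not taken from the book.
```python
import numpy as np
grid = np.zeros((3, 4), dtype=np.float64)  # hypothetical 3x4 array
shape = grid.shape        # (3, 4): three rows, four columns
dtype = grid.dtype        # float64
size = grid.size          # 12 elements in total
itemsize = grid.itemsize  # 8 bytes per float64 element
ndim = grid.ndim          # 2 dimensions
total_bytes = size * itemsize  # matches grid.nbytes
```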

82
00:03:55.719 --> 00:03:57.840
<v Speaker 1>And the data types themselves, the dtypes, there's a

83
00:03:57.960 --> 00:03:58.520
<v Speaker 1>decent range.

84
00:03:58.560 --> 00:04:02.039
<v Speaker 2>Oh yeah, everything you'd expect for numerical work: integers, signed

85
00:04:02.080 --> 00:04:04.599
<v Speaker 2>and unsigned, different sizes like 32-bit, 64-bit;

86
00:04:04.639 --> 00:04:10.240
<v Speaker 2>floats, single and double precision; Booleans; complex numbers. And these

87
00:04:10.280 --> 00:04:13.680
<v Speaker 2>are represented by special dtype objects like int64,

88
00:04:13.680 --> 00:04:16.759
<v Speaker 2>float32. You can convert between types too,

89
00:04:16.839 --> 00:04:17.360
<v Speaker 2>but you have to.

90
00:04:17.279 --> 00:04:20.199
<v Speaker 1>Be careful, like you can't just stuff a complex number

91
00:04:20.240 --> 00:04:22.279
<v Speaker 1>into an integer array. Exactly.

92
00:04:22.639 --> 00:04:25.800
<v Speaker 2>NumPy will complain, rightly so, with a TypeError. It

93
00:04:25.879 --> 00:04:27.800
<v Speaker 2>knows that doesn't make sense mathematically.

94
00:04:27.959 --> 00:04:31.319
<v Speaker 1>Well, you stress homogeneity is key, but the book mentions

95
00:04:31.360 --> 00:04:34.600
<v Speaker 1>something called record data types. That sounds like it breaks

96
00:04:34.600 --> 00:04:35.000
<v Speaker 1>the rule.

97
00:04:35.120 --> 00:04:38.959
<v Speaker 2>Oh, it's a clever exception. It's for structured data. Think

98
00:04:39.000 --> 00:04:42.279
<v Speaker 2>of like a spreadsheet row okay, where each cell in

99
00:04:42.319 --> 00:04:44.639
<v Speaker 2>the row might be different: a name, which is text,

100
00:04:44.920 --> 00:04:47.279
<v Speaker 2>an age, an integer; a height, a float.

101
00:04:47.360 --> 00:04:48.079
<v Speaker 1>Mixed types.

102
00:04:48.240 --> 00:04:51.079
<v Speaker 2>A record dtype lets you define named fields within

103
00:04:51.120 --> 00:04:54.319
<v Speaker 2>the array element structure. Each field can have its own

104
00:04:54.360 --> 00:04:55.399
<v Speaker 2>specific data type.

105
00:04:55.480 --> 00:04:58.040
<v Speaker 1>Ah. Okay, so the elements are all the same record type.

106
00:04:58.079 --> 00:05:01.319
<v Speaker 1>But inside that record, the types can differ. Exactly.

107
00:05:01.000 --> 00:05:07.680
<v Speaker 2>The inventory example in the book nails it: item name, string; count, integer; price, float;

108
00:05:08.160 --> 00:05:11.519
<v Speaker 2>all stored together efficiently in a NumPy array structure. It's

109
00:05:11.560 --> 00:05:12.600
<v Speaker 2>great for tabular data.
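
NOTE
Editor's sketch of a record (structured) dtype in the spirit of the inventory
example. The field names, the string width of 20, and the two rows are
assumptions made up for illustration; the book's own listing may differ.
```python
import numpy as np
# Hypothetical fields: a text name, an integer count, a float price.
item_dtype = np.dtype([("name", "U20"), ("count", np.int32), ("price", np.float64)])
inventory = np.array([("widget", 42, 3.5), ("gadget", 7, 12.25)], dtype=item_dtype)
names = inventory["name"]  # pull out one named field as its own array
total_value = (inventory["count"] * inventory["price"]).sum()
```
Every element has the same record type, so the array is still homogeneous even
though the fields inside each record differ.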

110
00:05:12.759 --> 00:05:15.680
<v Speaker 1>Okay, that makes sense. So we have these efficient arrays,

111
00:05:15.920 --> 00:05:18.920
<v Speaker 1>how do we work with them? Slicing comes first. It feels

112
00:05:18.959 --> 00:05:20.160
<v Speaker 1>familiar from lists.

113
00:05:20.399 --> 00:05:24.240
<v Speaker 2>Yeah, for one dimensional arrays, slicing is exactly like Python lists. Yeah,

114
00:05:24.319 --> 00:05:26.480
<v Speaker 2>grab a chunk using start, stop, step.

115
00:05:26.560 --> 00:05:29.000
<v Speaker 1>But it gets more interesting with multiple dimensions.

116
00:05:29.079 --> 00:05:32.839
<v Speaker 2>Right, you can slice along each axis or dimension, and

117
00:05:32.879 --> 00:05:34.720
<v Speaker 2>then you get into reshaping and flattening.

118
00:05:34.759 --> 00:05:38.480
<v Speaker 1>Flattening, like ravel or flatten? That takes a multidimensional array

119
00:05:38.519 --> 00:05:40.040
<v Speaker 1>and just makes it one long row.

120
00:05:40.160 --> 00:05:43.279
<v Speaker 2>Yep, turns it into one D. But there's a really

121
00:05:43.319 --> 00:05:47.439
<v Speaker 2>critical difference between those two you mentioned. Ravel might give

122
00:05:47.480 --> 00:05:49.560
<v Speaker 2>you back a view of the original data. They could

123
00:05:49.560 --> 00:05:53.439
<v Speaker 2>share the same memory, whereas flatten always creates a brand new,

124
00:05:53.560 --> 00:05:55.120
<v Speaker 2>independent copy.
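
NOTE
Editor's sketch of the ravel versus flatten difference. The 2x3 array is a
hypothetical example. ravel returns a view here because the data is contiguous;
in other layouts it may have to copy, which is why the speaker says "might".
```python
import numpy as np
original = np.arange(6).reshape(2, 3)
raveled = original.ravel()      # a view in this case: shares memory with original
flattened = original.flatten()  # always an independent copy
raveled[0] = 100    # writes through to original[0, 0]
flattened[1] = -1   # does not touch original
ravel_shares = np.shares_memory(raveled, original)
flatten_shares = np.shares_memory(flattened, original)
```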

125
00:05:55.279 --> 00:05:58.600
<v Speaker 1>And knowing that difference is vital, right, because changing a

126
00:05:58.720 --> 00:06:01.000
<v Speaker 1>view changes the original. Absolutely vital.

127
00:06:01.279 --> 00:06:03.120
<v Speaker 2>We definitely need to circle back to that view versus

128
00:06:03.120 --> 00:06:05.160
<v Speaker 2>copy thing. It catches so many people out. Okay.

129
00:06:05.720 --> 00:06:08.639
<v Speaker 1>You can also reshape arrays by setting the shape attribute

130
00:06:08.639 --> 00:06:12.199
<v Speaker 1>directly or using resize, and transpose, or just .T,

131
00:06:12.319 --> 00:06:13.639
<v Speaker 1>for swapping rows and columns.

132
00:06:13.839 --> 00:06:16.360
<v Speaker 2>Yeah, .T is super common, especially if you're doing anything

133
00:06:16.399 --> 00:06:17.319
<v Speaker 2>with linear algebra.

134
00:06:17.480 --> 00:06:20.680
<v Speaker 1>Then there's stacking, combining arrays together, right?

135
00:06:20.759 --> 00:06:23.279
<v Speaker 2>You can stack them horizontally, side by side, with

136
00:06:23.319 --> 00:06:25.600
<v Speaker 1>hstack, or vertically, one above the other,

137
00:06:25.519 --> 00:06:28.439
<v Speaker 2>with vstack. Exactly, or even depth-wise with dstack

138
00:06:28.600 --> 00:06:32.480
<v Speaker 2>for 3-D arrays. The general function behind these

139
00:06:32.600 --> 00:06:37.079
<v Speaker 2>is concatenate, where you explicitly say which axis to join

140
00:06:37.120 --> 00:06:38.120
<v Speaker 2>along.

141
00:06:38.079 --> 00:06:39.759
<v Speaker 1>The opposite is splitting them apart.

142
00:06:39.920 --> 00:06:42.279
<v Speaker 2>Yeah, hsplit, vsplit, dsplit, or the general

143
00:06:42.319 --> 00:06:45.759
<v Speaker 2>split function takes a big array, carves it into smaller ones.
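
NOTE
Editor's sketch of the stacking and splitting functions named above. The two
small 2x2 arrays are hypothetical inputs chosen so the shapes are easy to check.
```python
import numpy as np
a = np.arange(4).reshape(2, 2)    # [[0, 1], [2, 3]]
b = a * 10                        # [[0, 10], [20, 30]]
side_by_side = np.hstack((a, b))  # shape (2, 4)
stacked = np.vstack((a, b))       # shape (4, 2)
layered = np.dstack((a, b))       # shape (2, 2, 2)
# concatenate is the general form: name the axis to join along.
same_as_hstack = np.concatenate((a, b), axis=1)
# Splitting reverses the stacking.
left, right = np.hsplit(side_by_side, 2)
top, bottom = np.vsplit(stacked, 2)
```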

144
00:06:45.839 --> 00:06:48.560
<v Speaker 1>Okay, these seem like the day to day tools for

145
00:06:48.920 --> 00:06:50.120
<v Speaker 1>juggling array shapes.

146
00:06:50.199 --> 00:06:52.800
<v Speaker 2>They really are your bread and butter for manipulation.

147
00:06:53.399 --> 00:06:55.519
<v Speaker 1>So let's double back, like you said to that really

148
00:06:55.560 --> 00:07:01.120
<v Speaker 1>critical idea, views versus copies. The book really emphasizes it.

149
00:07:01.240 --> 00:07:03.959
<v Speaker 2>And for very good reason. It's probably one of the most important,

150
00:07:04.439 --> 00:07:07.720
<v Speaker 2>maybe subtle things to grasp to avoid weird bugs. Okay,

151
00:07:08.360 --> 00:07:11.800
<v Speaker 2>When you do certain NumPy operations, slicing is a big one,

152
00:07:12.360 --> 00:07:15.600
<v Speaker 2>or using the view method explicitly, NumPy tries to be efficient.

153
00:07:15.759 --> 00:07:18.040
<v Speaker 2>It doesn't want to waste time and memory copying

154
00:07:18.160 --> 00:07:19.519
<v Speaker 2>data if it doesn't have to.

155
00:07:19.759 --> 00:07:21.839
<v Speaker 1>So it gives you a view. What does that actually mean?

156
00:07:22.360 --> 00:07:25.160
<v Speaker 2>It means the new array object you get back, yeah,

157
00:07:25.199 --> 00:07:28.800
<v Speaker 2>shares the same underlying data in memory as the original array.

158
00:07:29.319 --> 00:07:31.480
<v Speaker 2>It's just looking, potentially, at a different part of it,

159
00:07:31.839 --> 00:07:33.600
<v Speaker 2>or maybe the same part with a different shape or

160
00:07:33.639 --> 00:07:34.639
<v Speaker 2>data type.

161
00:07:34.360 --> 00:07:37.439
<v Speaker 1>Which means if I change the data in the view,

162
00:07:37.759 --> 00:07:38.759
<v Speaker 1>you are changing.

163
00:07:38.439 --> 00:07:41.399
<v Speaker 2>The data in the original array too, because it's the

164
00:07:41.439 --> 00:07:43.920
<v Speaker 2>same data. It's not a separate snapshot.

165
00:07:44.199 --> 00:07:46.439
<v Speaker 1>Whoa, okay, that's huge.

166
00:07:46.399 --> 00:07:48.959
<v Speaker 2>It really is. The Lena image example in the book

167
00:07:49.000 --> 00:07:51.319
<v Speaker 2>makes it super clear. You take a slice of the

168
00:07:51.360 --> 00:07:54.839
<v Speaker 2>image array, maybe representing her hat. You set all the

169
00:07:54.839 --> 00:07:58.000
<v Speaker 2>pixel values in that slice to black, thinking you're just

170
00:07:58.040 --> 00:08:01.120
<v Speaker 2>modifying the slice. But then you look at the original.

171
00:08:00.800 --> 00:08:03.720
<v Speaker 1>Image and her hat is blacked out on the original too.

172
00:08:03.639 --> 00:08:06.959
<v Speaker 2>Exactly because the slice was just a view into the

173
00:08:07.000 --> 00:08:08.480
<v Speaker 2>original image's data buffer.

174
00:08:09.120 --> 00:08:11.120
<v Speaker 1>Okay, so how do you avoid that? If you don't

175
00:08:11.120 --> 00:08:12.240
<v Speaker 1>want to change the original?

176
00:08:12.319 --> 00:08:15.519
<v Speaker 2>You have to explicitly ask for a copy using the

177
00:08:15.560 --> 00:08:18.600
<v Speaker 2>.copy() method. That tells NumPy, no, I want a

178
00:08:18.639 --> 00:08:21.920
<v Speaker 2>completely separate version of this data in new memory. Then

179
00:08:22.000 --> 00:08:25.560
<v Speaker 2>if you modify the copy, the original is totally unaffected.

180
00:08:26.000 --> 00:08:29.759
<v Speaker 1>Right. So views are efficient but dangerous if you're not careful.

181
00:08:30.079 --> 00:08:32.840
<v Speaker 1>Copies are safe, but use more memory and take time.

182
00:08:33.360 --> 00:08:35.320
<v Speaker 2>That's the trade off. You need to know when you're

183
00:08:35.320 --> 00:08:37.360
<v Speaker 2>getting a view and when you're getting a copy, and

184
00:08:37.480 --> 00:08:40.799
<v Speaker 2>use .copy() when you need independence. It's fundamental.
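
NOTE
Editor's sketch of the view versus copy behaviour discussed above. The small
4x4 array of 255s is a hypothetical stand-in for image data, not the image used
in the book.
```python
import numpy as np
image = np.full((4, 4), 255, dtype=np.uint8)  # hypothetical stand-in for an image
patch_view = image[:2, :2]         # basic slicing returns a view
patch_copy = image[2:, 2:].copy()  # explicit, independent copy
patch_view[...] = 0  # also blacks out image[:2, :2]
patch_copy[...] = 0  # leaves image[2:, 2:] untouched
view_shares = np.shares_memory(patch_view, image)
copy_shares = np.shares_memory(patch_copy, image)
```
When it is unclear which one an operation returned, np.shares_memory is one way
to check before writing to the result.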

185
00:08:40.960 --> 00:08:45.799
<v Speaker 1>Oh, okay, fundamental indeed. Now, beyond basic slicing, NumPy

186
00:08:45.960 --> 00:08:48.559
<v Speaker 1>has more advanced ways to index arrays.

187
00:08:49.120 --> 00:08:52.559
<v Speaker 2>Fancy indexing, yeah, fancy indexing. It basically lets you select

188
00:08:52.639 --> 00:08:56.960
<v Speaker 2>elements using things other than simple integer slices. Primarily you

189
00:08:57.080 --> 00:08:59.039
<v Speaker 2>use lists of indices or Boolean arrays.

190
00:08:59.080 --> 00:09:01.360
<v Speaker 1>So instead of describing a block like 2:5,

191
00:09:01.480 --> 00:09:03.720
<v Speaker 1>I could give it a list, say one, five, seven,

192
00:09:03.919 --> 00:09:06.399
<v Speaker 1>to pick out just those specific rows.

193
00:09:06.159 --> 00:09:09.559
<v Speaker 2>or columns. Exactly. You can pinpoint specific scattered elements. The

194
00:09:09.559 --> 00:09:12.559
<v Speaker 2>book's example using ix_ with lists to kind of shuffle

195
00:09:12.600 --> 00:09:15.720
<v Speaker 2>parts of the Lena image around visually shows this power.

196
00:09:15.559 --> 00:09:18.480
<v Speaker 1>Right, and Boolean indexing, that sounds like filtering. It is.

197
00:09:18.559 --> 00:09:22.159
<v Speaker 2>It's incredibly useful for filtering data based on conditions. You

198
00:09:22.240 --> 00:09:24.559
<v Speaker 2>create an array of true and false values.

199
00:09:24.559 --> 00:09:27.360
<v Speaker 1>Usually by applying some comparison to your data array like

200
00:09:27.440 --> 00:09:28.639
<v Speaker 1>data > 10. Precisely.

201
00:09:28.840 --> 00:09:31.559
<v Speaker 2>Yeah, and then you use that Boolean array as the

202
00:09:31.639 --> 00:09:33.759
<v Speaker 2>index for your original data array.

203
00:09:33.639 --> 00:09:35.320
<v Speaker 1>And NumPy just gives you back the elements where

204
00:09:35.320 --> 00:09:36.720
<v Speaker 1>the Boolean array was true.

205
00:09:37.039 --> 00:09:40.440
<v Speaker 2>Yep, only those elements that met the condition. The example

206
00:09:40.440 --> 00:09:43.200
<v Speaker 2>of putting dots along the diagonal of the Lena image

207
00:09:43.720 --> 00:09:46.000
<v Speaker 2>is a neat way to see it selecting pixels based

208
00:09:46.000 --> 00:09:48.399
<v Speaker 2>on whether their row and column index are equal.
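
NOTE
Editor's sketch of fancy and Boolean indexing. The 5x5 array, the row list, and
the threshold of 10 are hypothetical choices for illustration. The diagonal mask
mirrors the idea of selecting elements whose row and column index are equal.
```python
import numpy as np
data = np.arange(25).reshape(5, 5)
picked_rows = data[[1, 3, 4]]  # fancy indexing with a list of row indices (a copy)
mask = data > 10               # Boolean array built from a comparison
large_values = data[mask]      # 1-D array of the elements where mask is True
rows, cols = np.indices(data.shape)
on_diagonal = rows == cols     # True where row index equals column index
marked = data.copy()
marked[on_diagonal] = -1       # overwrite only the diagonal elements
```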

209
00:09:48.519 --> 00:09:52.039
<v Speaker 1>Okay, that seems really powerful for data selection. Now stride

210
00:09:52.120 --> 00:09:55.000
<v Speaker 1>tricks, that sounds a bit more arcane.

211
00:09:55.080 --> 00:09:58.080
<v Speaker 2>It is definitely a more advanced concept. Yeah, but the

212
00:09:58.120 --> 00:10:01.000
<v Speaker 2>idea behind it is fascinating and it really shows off

213
00:10:01.039 --> 00:10:04.600
<v Speaker 2>how NumPy thinks about memory. Okay, so we know

214
00:10:04.679 --> 00:10:08.159
<v Speaker 2>NumPy stores array data in one contiguous block of memory,

215
00:10:08.320 --> 00:10:12.519
<v Speaker 2>right, because of homogeneity. Uh-huh. Stride tricks let you

216
00:10:12.559 --> 00:10:15.159
<v Speaker 2>create views of that same block of memory, but you

217
00:10:15.200 --> 00:10:17.919
<v Speaker 2>tell NumPy to interpret it with a completely different structure.

218
00:10:18.320 --> 00:10:20.519
<v Speaker 2>You do this by specifying the strides.

219
00:10:20.120 --> 00:10:22.240
<v Speaker 1>Strides like how many bytes to jump to get to

220
00:10:22.240 --> 00:10:23.360
<v Speaker 1>the next element.

221
00:10:23.399 --> 00:10:26.039
<v Speaker 2>Exactly, how many bytes to step to get to the

222
00:10:26.080 --> 00:10:28.399
<v Speaker 2>next element in the same row, and how many bytes

223
00:10:28.440 --> 00:10:30.080
<v Speaker 2>to step to get to the next element in the

224
00:10:30.120 --> 00:10:33.440
<v Speaker 2>same column, which is usually just the size of one row. Okay,

225
00:10:33.639 --> 00:10:36.879
<v Speaker 2>with stride tricks, using functions like as_strided, you can

226
00:10:37.000 --> 00:10:39.919
<v Speaker 2>manipulate those step sizes. You can tell NumPy: to

227
00:10:39.960 --> 00:10:42.639
<v Speaker 2>get to the next element in this dimension step forward

228
00:10:42.639 --> 00:10:45.879
<v Speaker 2>this many bytes, even if that overlaps with previous data

229
00:10:46.000 --> 00:10:47.919
<v Speaker 2>or creates a totally different logical layout.

230
00:10:48.039 --> 00:10:51.120
<v Speaker 1>The Sudoku example in the book was wild taking a

231
00:10:51.240 --> 00:10:52.720
<v Speaker 1>nine by nine grid.

232
00:10:52.559 --> 00:10:55.039
<v Speaker 2>Yeah, and using as_strided to make NumPy

233
00:10:55.200 --> 00:10:58.000
<v Speaker 2>see that same nine by nine block of memory not

234
00:10:58.120 --> 00:11:00.600
<v Speaker 2>just as nine rows of nine numbers, but as an

235
00:11:00.679 --> 00:11:01.879
<v Speaker 2>array of three by three.

236
00:11:01.720 --> 00:11:04.399
<v Speaker 1>squares, without copying any data?

237
00:11:04.200 --> 00:11:06.919
<v Speaker 2>Without copying anything. You're just giving NumPy a new recipe, new

238
00:11:06.919 --> 00:11:09.440
<v Speaker 2>strides for how to walk through the existing memory to

239
00:11:09.679 --> 00:11:11.120
<v Speaker 2>perceive these three-by-three blocks.
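
NOTE
Editor's sketch of the nine-by-nine to three-by-three-blocks idea using
numpy.lib.stride_tricks.as_strided. The grid filled with 0..80 is a hypothetical
stand-in for a puzzle, and the stride layout is one assumed way to express the
block view. as_strided does no bounds checking, so wrong strides can read
unrelated memory; treat the result as read-only.
```python
import numpy as np
from numpy.lib.stride_tricks import as_strided
grid = np.arange(81, dtype=np.int64).reshape(9, 9)  # hypothetical stand-in for a puzzle
step = grid.itemsize  # bytes per element
# Byte steps for (block row, block column, row within block, column within block).
strides = (27 * step, 3 * step, 9 * step, 1 * step)
squares = as_strided(grid, shape=(3, 3, 3, 3), strides=strides)
middle_right = squares[1, 2]  # the block covering rows 3..5 and columns 6..8
```
Here squares[i, j] is the three-by-three block at block row i and block column
j, backed by the same memory as grid.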

240
00:11:11.159 --> 00:11:14.559
<v Speaker 1>Wow, So that efficient memory layout isn't just for raw speed.

241
00:11:14.639 --> 00:11:18.240
<v Speaker 1>It enables these incredibly clever ways to access structured data

242
00:11:18.320 --> 00:11:19.879
<v Speaker 1>within the array. Exactly.

243
00:11:20.080 --> 00:11:24.000
<v Speaker 2>It really highlights how NumPy leverages that contiguous memory.

244
00:11:24.279 --> 00:11:26.519
<v Speaker 2>You can instantly get all the three by three blocks

245
00:11:26.600 --> 00:11:29.879
<v Speaker 2>or overlapping windows for signal processing just by defining the

246
00:11:29.919 --> 00:11:33.399
<v Speaker 2>right strides. It raises that question, doesn't it? How this

247
00:11:33.519 --> 00:11:36.720
<v Speaker 2>simple contiguous block enables such complex views?

248
00:11:36.840 --> 00:11:42.919
<v Speaker 1>Mind blown, slightly. Okay, one more fundamental concept: broadcasting. This

249
00:11:42.960 --> 00:11:45.919
<v Speaker 1>one seems simpler but pops up everywhere. It does.

250
00:11:46.559 --> 00:11:49.720
<v Speaker 2>Broadcasting is how NumPy handles operations like addition or

251
00:11:49.799 --> 00:11:53.120
<v Speaker 2>multiplication between arrays that don't have the exact same.

252
00:11:52.960 --> 00:11:55.679
<v Speaker 1>Shape, like adding a single number to every element in

253
00:11:55.720 --> 00:11:56.399
<v Speaker 1>an array.

254
00:11:56.240 --> 00:11:59.080
<v Speaker 2>That's a classic example, or multiplying an entire array by

255
00:11:59.080 --> 00:12:02.159
<v Speaker 2>a scalar. The rule basically is that NumPy tries to

256
00:12:02.240 --> 00:12:06.240
<v Speaker 2>stretch or duplicate the smaller array's dimensions so that its

257
00:12:06.279 --> 00:12:09.000
<v Speaker 2>shape becomes compatible with the larger array for the

258
00:12:09.039 --> 00:12:09.759
<v Speaker 2>element-wise operation.

259
00:12:09.919 --> 00:12:11.799
<v Speaker 1>The audio volume example is perfect for this. You have

260
00:12:11.799 --> 00:12:12.879
<v Speaker 1>an array of audio.

261
00:12:12.559 --> 00:12:14.559
<v Speaker 2>samples. Right, maybe thousands of numbers. Then you

262
00:12:14.519 --> 00:12:16.600
<v Speaker 1>just multiply it by 0.2 to make it quieter.

263
00:12:16.720 --> 00:12:20.200
<v Speaker 2>Yep, NumPy doesn't actually create a massive array filled

264
00:12:20.240 --> 00:12:23.559
<v Speaker 2>with 0.2s to match the audio data size. That

265
00:12:23.600 --> 00:12:27.039
<v Speaker 2>would be really inefficient. It just understands the broadcasting rule.

266
00:12:27.360 --> 00:12:30.440
<v Speaker 2>It sees you're multiplying an n-dimensional array by a scalar,

267
00:12:30.679 --> 00:12:33.080
<v Speaker 2>which is like a zero dimensional array. It knows to

268
00:12:33.120 --> 00:12:36.200
<v Speaker 2>apply that scalar multiplication to every single element of the

269
00:12:36.279 --> 00:12:37.360
<v Speaker 2>n dimensional array.

270
00:12:37.240 --> 00:12:40.159
<v Speaker 1>in one go, using that fast C code again. Exactly.

271
00:12:40.240 --> 00:12:42.080
<v Speaker 2>So what does this all mean for you? It means

272
00:12:42.080 --> 00:12:44.279
<v Speaker 2>you can write really intuitive code like audio_data * 0.2

273
00:12:44.360 --> 00:12:47.480
<v Speaker 2>or array + 5 without writing loops. NumPy

274
00:12:47.480 --> 00:12:50.000
<v Speaker 2>figures out how to make the shapes compatible efficiently.

275
00:12:50.240 --> 00:12:52.120
<v Speaker 2>It makes array math much cleaner and faster.
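
NOTE
Editor's sketch of broadcasting. The four audio samples, the 2x3 table, and the
offsets are hypothetical values. It shows the scalar cases mentioned above plus
one case where a 1-D row is stretched across a 2-D array.
```python
import numpy as np
audio = np.array([1000, -2000, 3000, -4000], dtype=np.int16)  # hypothetical samples
quieter = audio * 0.2       # the scalar is applied to every sample
shifted = np.arange(3) + 5  # adds 5 to each element
table = np.arange(6).reshape(2, 3)
offsets = np.array([10, 20, 30])
adjusted = table + offsets  # the row of offsets is stretched over both rows
```
Note that multiplying int16 samples by 0.2 yields a float array; converting back
to int16 for playback would be a separate, explicit step.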

276
00:12:52.559 --> 00:12:57.000
<v Speaker 1>Okay, so we've got the foundation efficient arrays, data types, manipulation,

277
00:12:57.200 --> 00:13:02.480
<v Speaker 1>the crucial views versus copies, fancy indexing, broadcasting. That's a

278
00:13:02.519 --> 00:13:04.200
<v Speaker 1>powerful toolkit just on its own.

279
00:13:04.320 --> 00:13:04.919
<v Speaker 2>Absolutely.

280
00:13:04.960 --> 00:13:07.039
<v Speaker 1>Now let's talk about putting it to work. The book

281
00:13:07.080 --> 00:13:10.960
<v Speaker 1>moves into actual data analysis, prediction, linking up with other

282
00:13:11.039 --> 00:13:11.639
<v Speaker 1>libraries, right?

283
00:13:11.679 --> 00:13:14.840
<v Speaker 2>Applying these tools. The basic data analysis example using weather

284
00:13:14.919 --> 00:13:18.399
<v Speaker 2>data from a station in the Netherlands, De Bilt, I

285
00:13:18.440 --> 00:13:20.080
<v Speaker 2>think it's very practical.

286
00:13:19.679 --> 00:13:22.320
<v Speaker 1>Shows how you load data from a file maybe a

287
00:13:22.360 --> 00:13:25.080
<v Speaker 1>CSV or a text file using loadtxt.

288
00:13:24.879 --> 00:13:28.360
<v Speaker 2>And it immediately hits a real world issue, messy data

289
00:13:29.159 --> 00:13:30.679
<v Speaker 2>missing values.

290
00:13:30.320 --> 00:13:33.039
<v Speaker 1>Yeah, which happens all the time. In that data set,

291
00:13:33.240 --> 00:13:36.960
<v Speaker 1>missing values were marked with like meta I or something yeah.

292
00:13:36.759 --> 00:13:40.759
<v Speaker 2>Some special code, and the book shows how you typically

293
00:13:40.759 --> 00:13:44.399
<v Speaker 2>handle that, maybe filter them out or more often convert

294
00:13:44.399 --> 00:13:48.879
<v Speaker 2>them into NumPy's special NaN value, not a number. Right, NaN,

295
00:13:49.200 --> 00:13:53.200
<v Speaker 2>because NumPy's NaN-aware math functions know how to handle NaNs correctly,

296
00:13:53.639 --> 00:13:55.919
<v Speaker 2>like ignoring them when calculating a mean.

297
00:13:56.120 --> 00:13:59.080
<v Speaker 1>Okay, so data loaded cleaned up a bit, then doing

298
00:13:59.120 --> 00:14:01.399
<v Speaker 1>the actual analysis is easy, super easy.

299
00:14:01.399 --> 00:14:05.960
<v Speaker 2>With NumPy. Want the average temperature? .mean(). Max wind speed? .max(). Standard deviation?

300
00:14:06.039 --> 00:14:09.080
<v Speaker 2>.std(). You apply these functions directly to your arrays

301
00:14:09.159 --> 00:14:10.080
<v Speaker 2>or columns of data.

302
00:14:10.159 --> 00:14:13.159
<v Speaker 1>The example showed calculating things like the daily temperature range

303
00:14:13.240 --> 00:14:16.120
<v Speaker 1>max minus min, or looking at yearly averages.
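
NOTE
Editor's sketch of summary statistics on data with missing values. The five
daily readings are made-up numbers, not the weather station data from the book.
It uses the NaN-aware variants (nanmean, nanmax, nanstd); the plain np.mean
propagates NaN rather than skipping it.
```python
import numpy as np
# Hypothetical daily readings; np.nan marks a missing value.
t_max = np.array([12.0, 15.5, np.nan, 9.0, 11.5])
t_min = np.array([4.0, 7.5, 3.0, np.nan, 2.5])
mean_max = np.nanmean(t_max)  # average of the four valid readings
hottest = np.nanmax(t_max)
spread = np.nanstd(t_max)
daily_range = t_max - t_min   # NaN wherever either input is missing
plain_mean = np.mean(t_max)   # NaN, because the ordinary mean propagates it
```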

304
00:14:16.360 --> 00:14:19.200
<v Speaker 2>Yeah. And while you know one station's data isn't proof

305
00:14:19.200 --> 00:14:21.519
<v Speaker 2>of global warming or anything of course, it gives you

306
00:14:21.559 --> 00:14:24.600
<v Speaker 2>a taste of using these tools for exploring trends. You

307
00:14:24.639 --> 00:14:27.279
<v Speaker 2>could do the same for wind, pressure, humidity, whatever's in

308
00:14:27.279 --> 00:14:27.840
<v Speaker 2>the data sets.

309
00:14:27.840 --> 00:14:31.799
<v Speaker 1>Okay, so that's understanding past data. What about predicting the future?

310
00:14:32.440 --> 00:14:34.960
<v Speaker 1>The book touches on simple predictive analytics.

311
00:14:35.279 --> 00:14:39.039
<v Speaker 2>Yeah, moving from description to forecasting. The core idea is

312
00:14:39.120 --> 00:14:42.120
<v Speaker 2>using historical patterns to guess what might happen.

313
00:14:41.919 --> 00:14:44.519
<v Speaker 1>next. Like with the temperature data. Exactly.

314
00:14:45.120 --> 00:14:49.480
<v Speaker 2>The book mentions basic concepts like autoregressive models or ar models.

315
00:14:49.919 --> 00:14:54.440
<v Speaker 2>Simple idea: predict tomorrow's temperature based on today's, yesterday's, the

316
00:14:54.519 --> 00:14:55.159
<v Speaker 2>day before.

317
00:14:55.320 --> 00:14:58.200
<v Speaker 1>Using past values to predict the future.

318
00:14:58.120 --> 00:15:00.519
<v Speaker 2>Right, and it hints at how you'd actually fit a

319
00:15:00.559 --> 00:15:03.559
<v Speaker 2>model like that to your data. This often involves bringing

320
00:15:03.600 --> 00:15:07.399
<v Speaker 2>in tools from the wider scientific Python world like SciPy. Yeah,

321
00:15:07.480 --> 00:15:10.360
<v Speaker 2>maybe using something like scipy.optimize.leastsq

322
00:15:10.480 --> 00:15:13.360
<v Speaker 2>to find the model parameters that best match the historical data.
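
NOTE
Editor's sketch of fitting a small autoregressive model. The speakers mention
scipy.optimize.leastsq; this sketch swaps in np.linalg.lstsq instead, which
works here because an AR model is linear in its coefficients and it keeps the
example NumPy-only. The AR(2) coefficients and the noise-free synthetic series
are assumptions made up for illustration, not the book's temperature data.
```python
import numpy as np
true_coeffs = np.array([0.6, 0.3])  # hypothetical AR(2) coefficients
series = np.empty(40)
series[:2] = [1.0, 2.0]
for t in range(2, series.size):
    # Noise-free on purpose, so the fit can recover the coefficients closely.
    series[t] = true_coeffs[0] * series[t - 1] + true_coeffs[1] * series[t - 2]
# Each row holds the two previous values; the target is the value that follows.
lagged = np.column_stack((series[1:-1], series[:-2]))
target = series[2:]
fitted, *_ = np.linalg.lstsq(lagged, target, rcond=None)
next_value = fitted @ series[[-1, -2]]  # one-step-ahead prediction
```
With real, noisy data the fitted coefficients would only approximate any
underlying pattern, and the choice of how many lags to use becomes a modeling
decision.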

323
00:15:14.080 --> 00:15:17.240
<v Speaker 2>It also mentions that tools like pandas, which builds on NumPy,

324
00:15:17.600 --> 00:15:20.480
<v Speaker 2>are great for summarizing data and looking for correlations before

325
00:15:20.480 --> 00:15:21.519
<v Speaker 2>you even start modeling.

326
00:15:21.759 --> 00:15:25.679
<v Speaker 1>Makes sense. Another area is signal processing, analyzing data that

327
00:15:25.759 --> 00:15:28.039
<v Speaker 1>changes over time, often with cycles.

328
00:15:27.879 --> 00:15:31.279
<v Speaker 2>Right, like the sunspot data example. Sunspots have these known cycles,

329
00:15:31.360 --> 00:15:32.320
<v Speaker 2>roughly eleven years.

330
00:15:32.080 --> 00:15:35.720
<v Speaker 1>And signal processing techniques help analyze those patterns.

331
00:15:36.000 --> 00:15:39.639
<v Speaker 2>Yep. The book mentions smoothing, like using a moving average

332
00:15:39.960 --> 00:15:42.600
<v Speaker 2>to filter out short-term noise and see the underlying

333
00:15:42.639 --> 00:15:43.679
<v Speaker 2>trend more clearly.
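
NOTE
Editor's sketch, not taken from the book: a simple moving average written
with np.convolve. The moving_average helper and the yearly counts are
made up for illustration and are not real sunspot numbers.
```python
import numpy as np
def moving_average(values, window):
    """Simple moving average; returns len(values) - window + 1 points."""
    weights = np.ones(window) / window
    return np.convolve(values, weights, mode="valid")
# Made-up yearly counts standing in for a real sunspot series.
counts = np.array([5.0, 11.0, 16.0, 23.0, 36.0, 58.0, 29.0, 20.0, 10.0, 8.0])
smoothed = moving_average(counts, window=3)
```
With mode="valid" the output is shorter than the input, because only
positions where the whole window fits are kept. The smoothed maximum is also
lower than the raw peak, since each point is averaged with its neighbors.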

334
00:15:44.240 --> 00:15:47.120
<v Speaker 1>Though it notes simple moving averages aren't always the best

335
00:15:47.159 --> 00:15:50.080
<v Speaker 1>for cyclical data like sunspots.

336
00:15:49.559 --> 00:15:52.720
<v Speaker 2>Correct, they can distort the peaks and troughs, so it

337
00:15:52.840 --> 00:15:56.080
<v Speaker 2>hints at more advanced stuff like decomposing a signal into

338
00:15:56.080 --> 00:16:00.000
<v Speaker 2>its core components. It mentions techniques like EMD, empirical mode

339
00:16:00.120 --> 00:16:04.080
<v Speaker 2>decomposition, to break down the sunspot signal into intrinsic mode

340
00:16:04.080 --> 00:16:07.679
<v Speaker 2>functions, or IMFs, to better analyze those cycles.

341
00:16:07.799 --> 00:16:10.759
<v Speaker 1>And NumPy is doing the heavy lifting numerically for these

342
00:16:10.759 --> 00:16:11.960
<v Speaker 1>algorithms?

343
00:16:12.320 --> 00:16:16.240
<v Speaker 2>Exactly. It provides the array operations needed to implement these complex

344
00:16:16.440 --> 00:16:17.759
<v Speaker 2>signal processing methods.

345
00:16:17.960 --> 00:16:20.679
<v Speaker 1>It really feels like NumPy isn't just standalone. It's like

346
00:16:20.799 --> 00:16:26.240
<v Speaker 1>the central hub for this whole ecosystem of scientific Python tools.

347
00:16:26.279 --> 00:16:28.159
<v Speaker 2>That's a great way to put it. It's the common language,

348
00:16:28.200 --> 00:16:31.200
<v Speaker 2>the common data structure. SciPy, as we mentioned, builds directly

349
00:16:31.200 --> 00:16:34.159
<v Speaker 2>on NumPy. It adds tons more advanced scientific tools.

350
00:16:33.840 --> 00:16:35.679
<v Speaker 1>Like what kind of things?

351
00:16:35.480 --> 00:16:42.519
<v Speaker 2>Oh, numerical integration, solving differential equations, interpolation, optimization algorithms, more linear algebra,

352
00:16:42.519 --> 00:16:47.279
<v Speaker 2>statistical functions, stuff that goes beyond NumPy's core array focus.

353
00:16:47.000 --> 00:16:49.960
<v Speaker 1>And scikit-learn, the big machine learning library?

354
00:16:49.679 --> 00:16:54.000
<v Speaker 2>Hugely reliant on NumPy. Almost everything in scikit-learn

355
00:16:54.080 --> 00:16:58.240
<v Speaker 2>expects input data as NumPy arrays: your features, your targets.

356
00:16:58.720 --> 00:17:01.679
<v Speaker 2>And it often outputs results as NumPy arrays too:

357
00:17:02.399 --> 00:17:07.039
<v Speaker 2>predictions, model coefficients. The book points to examples like clustering

358
00:17:07.039 --> 00:17:10.400
<v Speaker 2>stock data or using scikit-image, which is related, for

359
00:17:10.480 --> 00:17:13.839
<v Speaker 2>image processing, like finding corners in a picture, all powered

360
00:17:13.839 --> 00:17:15.240
<v Speaker 2>by NumPy arrays underneath.

361
00:17:15.359 --> 00:17:18.720
<v Speaker 1>And what if Python itself, even with NumPy's C backend,

362
00:17:18.920 --> 00:17:21.519
<v Speaker 1>isn't fast enough for some really critical part of your code.

363
00:17:21.559 --> 00:17:24.240
<v Speaker 2>That's where Cython comes in. Cython, it's a language that's

364
00:17:24.359 --> 00:17:26.200
<v Speaker 2>kind of a mix of Python and C. You can

365
00:17:26.240 --> 00:17:29.599
<v Speaker 2>write Python-like code, add some static type declarations, and

366
00:17:29.680 --> 00:17:31.119
<v Speaker 2>Cython compiles it down to efficient C code.

367
00:17:31.039 --> 00:17:33.599
<v Speaker 1>And it works well with NumPy?

368
00:17:33.599 --> 00:17:37.599
<v Speaker 2>Very well, because NumPy arrays already have that underlying C structure.

369
00:17:37.960 --> 00:17:40.559
<v Speaker 2>Cython code can operate on the array data directly at

370
00:17:40.640 --> 00:17:44.799
<v Speaker 2>C speeds without the Python interpreter overhead. It's great for

371
00:17:44.839 --> 00:17:49.119
<v Speaker 2>optimizing bottlenecks or for wrapping existing C or C++

372
00:17:49.119 --> 00:17:51.000
<v Speaker 2>libraries to use them from Python.

373
00:17:51.279 --> 00:17:54.039
<v Speaker 1>So the ecosystem is NumPy at the core, SciPy for

374
00:17:54.160 --> 00:17:58.079
<v Speaker 1>more science and math tools, scikit-learn for ML, Cython for

375
00:17:58.119 --> 00:18:00.680
<v Speaker 1>speed optimization. It's quite layered.

376
00:18:00.880 --> 00:18:03.519
<v Speaker 2>It is. And it's still evolving. The book even looks ahead, mentioning

377
00:18:03.559 --> 00:18:05.160
<v Speaker 2>projects like Blaze.

378
00:18:05.160 --> 00:18:06.160
<v Speaker 1>Blaze? What's the idea there?

379
00:18:06.640 --> 00:18:09.079
<v Speaker 2>The goal is to take NumPy's array-based way of

380
00:18:09.119 --> 00:18:11.359
<v Speaker 2>thinking and extend it to data sets that are too

381
00:18:11.400 --> 00:18:12.519
<v Speaker 2>big for memory.

382
00:18:12.400 --> 00:18:15.559
<v Speaker 1>Ah, big data territory, or streaming data.

383
00:18:15.240 --> 00:18:19.559
<v Speaker 2>Exactly, applying similar principles of efficient array-oriented computation

384
00:18:19.799 --> 00:18:22.640
<v Speaker 2>to distributed systems or data streams. It shows that this

385
00:18:22.759 --> 00:18:25.400
<v Speaker 2>core idea started by NumPy is still expanding.

386
00:18:25.559 --> 00:18:27.720
<v Speaker 1>That's really cool. Okay, before we wrap up, one last

387
00:18:27.759 --> 00:18:33.359
<v Speaker 1>practical point: the book covers good development practices. Profiling, debugging, testing.

388
00:18:33.119 --> 00:18:36.279
<v Speaker 2>Yeah, really important, especially with numerical code, where small errors can

389
00:18:36.319 --> 00:18:40.480
<v Speaker 2>sometimes lead to big problems or just slow things down unnecessarily.

390
00:18:40.000 --> 00:18:42.519
<v Speaker 1>Profiling helps you find where your code is spending its

391
00:18:42.559 --> 00:18:44.359
<v Speaker 1>time, the bottlenecks.

392
00:18:44.599 --> 00:18:47.640
<v Speaker 2>Right. Debugging is for tracking down errors when things go wrong,

393
00:18:48.079 --> 00:18:52.000
<v Speaker 2>and testing, writing automated tests, is crucial for making sure

394
00:18:52.039 --> 00:18:55.440
<v Speaker 2>your code actually works as expected and stays working when

395
00:18:55.440 --> 00:18:56.720
<v Speaker 2>you make changes later.

396
00:18:56.680 --> 00:18:59.880
<v Speaker 1>And Python and its ecosystem have tools for these.

397
00:19:00.079 --> 00:19:04.440
<v Speaker 2>Definitely. Python has built-in profilers. IPython has magic

398
00:19:04.440 --> 00:19:07.200
<v Speaker 2>commands like %debug for jumping into the debugger right

399
00:19:07.240 --> 00:19:10.519
<v Speaker 2>after an error. There are standalone debuggers like pdb,

400
00:19:10.880 --> 00:19:14.799
<v Speaker 2>and for testing, Python's unittest module is standard, but libraries

401
00:19:14.839 --> 00:19:17.880
<v Speaker 2>like nose or pytest are very popular, especially in

402
00:19:17.920 --> 00:19:21.599
<v Speaker 2>the scientific community. They have helpful tools like special functions

403
00:19:21.599 --> 00:19:25.559
<v Speaker 2>for comparing floating point arrays where exact equality is often tricky.

404
00:19:25.720 --> 00:19:28.240
<v Speaker 2>These are all essential habits for writing robust code.
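
NOTE
Editor's sketch, not taken from the book: why exact equality is tricky for
floating point arrays, and how a tolerance-based check handles it. It uses
assert_allclose from NumPy's own numpy.testing module; the sample values and
the tolerance are assumptions for illustration.
```python
import numpy as np
from numpy.testing import assert_allclose
# 0.1 + 0.2 is not exactly 0.3 in binary floating point.
computed = np.array([0.1 + 0.2, 2.0])
expected = np.array([0.3, 2.0])
# Element-by-element exact comparison fails on the first entry.
exact_match = bool(np.array_equal(computed, expected))
# Tolerance-based comparison passes: the values agree to within rtol.
assert_allclose(computed, expected, rtol=1e-12)
```
Here exact_match is False even though the arrays are equal for any practical
purpose, which is the situation tolerance-based assertions are meant for.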

405
00:19:28.279 --> 00:19:31.079
<v Speaker 1>Wow, okay, that was definitely a deep dive. We started

406
00:19:31.119 --> 00:19:34.200
<v Speaker 1>with that fundamental idea, the ndarray giving this huge efficiency

407
00:19:34.200 --> 00:19:35.160
<v Speaker 1>boost over lists.

408
00:19:35.319 --> 00:19:38.920
<v Speaker 2>Yeah, thanks to homogeneity, those vectorized operations, the C core.

409
00:19:39.240 --> 00:19:41.799
<v Speaker 1>We looked at the array's structure, its attributes, how to

410
00:19:41.839 --> 00:19:44.559
<v Speaker 1>manipulate it with reshaping, stacking.

411
00:19:44.359 --> 00:19:49.279
<v Speaker 2>Grappled with that critical views versus copies distinction.

412
00:19:49.880 --> 00:19:54.079
<v Speaker 1>Definitely critical. Then dove into advanced indexing, fancy indexing, boolean indexing, those

413
00:19:54.240 --> 00:19:57.000
<v Speaker 1>mind-bending stride tricks, and the super useful broadcasting.

414
00:19:56.759 --> 00:19:59.519
<v Speaker 2>And then saw how it all comes together for

415
00:19:59.599 --> 00:20:04.759
<v Speaker 2>real work: basic data analysis, hints of prediction and signal processing.

416
00:20:04.440 --> 00:20:08.960
<v Speaker 1>And its absolutely central role in that wider scientific

417
00:20:08.960 --> 00:20:13.920
<v Speaker 1>Python world, connecting with SciPy, scikit-learn, Cython.

418
00:20:13.839 --> 00:20:15.839
<v Speaker 2>Even looking forward with things like Blaze.

419
00:20:15.519 --> 00:20:18.039
<v Speaker 1>You know, pulling these insights from the book, it really

420
00:20:18.079 --> 00:20:21.200
<v Speaker 1>feels like understanding why NumPy works this way, the views,

421
00:20:21.240 --> 00:20:23.519
<v Speaker 1>the broadcasting, the C backend, is the key. It's

422
00:20:23.559 --> 00:20:26.079
<v Speaker 1>the shortcut to writing code that's not just correct, but

423
00:20:26.119 --> 00:20:30.240
<v Speaker 1>also fast and, well, Pythonic in this numerical context.

424
00:20:30.480 --> 00:20:34.160
<v Speaker 2>Absolutely. So, thinking about all that power, using NumPy and

425
00:20:34.200 --> 00:20:37.960
<v Speaker 2>friends for complex analysis, prediction, image stuff, integrating with all

426
00:20:38.000 --> 00:20:41.240
<v Speaker 2>these libraries, and the future potential for even bigger data,

427
00:20:41.640 --> 00:20:44.000
<v Speaker 2>it makes you wonder, right, what kinds of problems that

428
00:20:44.039 --> 00:20:47.519
<v Speaker 2>maybe seem overwhelming today might actually become solvable with these

429
00:20:47.519 --> 00:20:50.319
<v Speaker 2>tools tomorrow? What for you stands out as the most

430
00:20:50.319 --> 00:20:53.319
<v Speaker 2>surprising or maybe powerful capability we touched on
