WEBVTT 1 00:00:00.080 --> 00:00:03.439 Okay, let's unpack this. Today we are diving deep into 2 00:00:03.799 --> 00:00:07.599 something, well, pretty fundamental if you work with data and 3 00:00:07.679 --> 00:00:10.640 numbers in Python. We're talking about the NumPy library. 4 00:00:10.679 --> 00:00:14.160 Oh yeah, it's absolutely bedrock. If you've ever felt the 5 00:00:14.199 --> 00:00:18.079 pain of doing complex math or managing big data sets 6 00:00:18.120 --> 00:00:22.839 with just standard Python lists, NumPy is basically the answer. 7 00:00:23.000 --> 00:00:25.120 And we've got a great guide for this deep dive. 8 00:00:25.160 --> 00:00:28.320 It's a book called Learning NumPy Array. It really takes 9 00:00:28.359 --> 00:00:32.000 you from the core ideas right through to tackling some 10 00:00:32.039 --> 00:00:32.840 real-world stuff. 11 00:00:32.880 --> 00:00:35.240 It does, yeah, and it's good because it doesn't just 12 00:00:35.280 --> 00:00:38.119 throw syntax at you. It explains why NumPy is built 13 00:00:38.119 --> 00:00:40.520 the way it is. That's really key to using 14 00:00:40.520 --> 00:00:41.759 it effectively. Exactly. 15 00:00:42.119 --> 00:00:44.200 So our mission here is to pull out the most 16 00:00:44.280 --> 00:00:47.920 valuable insights, those aha moments from the material, to give you 17 00:00:47.960 --> 00:00:51.439 a shortcut, basically, to understanding why NumPy is so critical, 18 00:00:51.960 --> 00:00:55.079 how it gets that amazing speed and efficiency, and how 19 00:00:55.119 --> 00:00:58.039 you can actually use it, from basic array stuff to 20 00:00:58.479 --> 00:01:01.240 data analysis, maybe even some predictive modeling later on. 21 00:01:01.600 --> 00:01:02.759 Sounds good. Let's jump in. 22 00:01:02.880 --> 00:01:07.200 Okay. So, starting right at the beginning, what is 23 00:01:07.200 --> 00:01:10.599 NumPy really at its core? 
And why should someone doing 24 00:01:10.719 --> 00:01:13.400 numerical work in Python absolutely care? Well, 25 00:01:13.400 --> 00:01:16.840 think of it as Python's high-performance engine for, well, 26 00:01:16.959 --> 00:01:20.439 anything involving arrays or matrices of numbers. It gives 27 00:01:20.439 --> 00:01:23.719 you this core object, the ndarray. Right, the 28 00:01:23.760 --> 00:01:26.439 ndarray, and that's specifically built for numerical work. 29 00:01:26.519 --> 00:01:29.239 And the difference compared to a standard Python list is huge, 30 00:01:29.280 --> 00:01:31.959 isn't it? That efficiency? That speed? That's one of the 31 00:01:32.000 --> 00:01:33.079 first big takeaways. 32 00:01:33.120 --> 00:01:35.519 Absolutely massive. You know, Python lists are super flexible, they 33 00:01:35.519 --> 00:01:38.840 can hold anything, but that flexibility, it costs you when 34 00:01:38.840 --> 00:01:43.040 you're doing math. Right. NumPy arrays, they're homogeneous. All elements 35 00:01:43.079 --> 00:01:45.640 are the same data type. That means NumPy can store 36 00:01:45.680 --> 00:01:48.719 them way more efficiently in memory. Okay. Plus, a lot 37 00:01:48.719 --> 00:01:51.560 of NumPy is actually written in C, so the computations 38 00:01:51.560 --> 00:01:53.159 themselves are just lightning fast. 39 00:01:53.239 --> 00:01:57.599 And that homogeneity enables one of NumPy's, like, superpowers, right? 40 00:01:57.680 --> 00:02:02.120 Vectorization. Exactly, vectorized operations. Instead of writing for loops to 41 00:02:02.319 --> 00:02:03.439 manually iterate 42 00:02:03.159 --> 00:02:05.280 through numbers, which is tedious and slow. 43 00:02:05.959 --> 00:02:09.080 Really slow, yeah. You just apply operations to the entire 44 00:02:09.240 --> 00:02:11.520 array at once, kind of like how you'd write it 45 00:02:11.520 --> 00:02:12.240 in math notation. 46 00:02:12.479 --> 00:02:14.520 Much cleaner code. Much cleaner. 
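To make the loop-versus-vectorized contrast concrete, here is a minimal sketch. The sample values are made up for illustration and are not taken from the book.

```python
import numpy as np

# Pure-Python approach: walk two lists element by element.
a_list = list(range(5))
b_list = [x ** 2 for x in a_list]
loop_sum = [a + b for a, b in zip(a_list, b_list)]

# Vectorized approach: one expression applied to whole arrays at once.
a = np.arange(5)
b = a ** 2
vector_sum = a + b

print(loop_sum)
print(vector_sum)
```

Both versions compute the same values; the NumPy one hands the element-wise loop to compiled code instead of the Python interpreter.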
47 00:02:14.400 --> 00:02:17.680 And crucially, much, much faster, because it's running optimized C 48 00:02:17.840 --> 00:02:18.680 code underneath. 49 00:02:18.840 --> 00:02:21.120 That's where the real performance gain kicks in, especially with 50 00:02:21.159 --> 00:02:22.319 bigger data sets, I guess. 51 00:02:22.360 --> 00:02:25.360 Oh, definitely. Getting started is usually just a simple install. 52 00:02:25.400 --> 00:02:27.840 The book covers that for Windows, Linux, Mac, you know. 53 00:02:28.400 --> 00:02:30.479 But the real proof is seeing it in action. 54 00:02:30.560 --> 00:02:32.840 Okay, yeah, and this is where it gets really interesting, 55 00:02:32.879 --> 00:02:37.599 I thought. That vector addition example, right? Adding two lists 56 00:02:37.599 --> 00:02:41.240 in pure Python with loops. Sure, it works, but it's slow. 57 00:02:41.680 --> 00:02:44.639 Then you see the NumPy version: one line, and 58 00:02:44.719 --> 00:02:46.879 dramatically faster. Dramatically faster. 59 00:02:47.080 --> 00:02:49.439 And the key thing there: it's not just that it's 60 00:02:49.479 --> 00:02:54.159 faster for one small example. That performance gap scales. With 61 00:02:54.280 --> 00:02:57.360 small arrays, maybe you don't notice much, but when you 62 00:02:57.400 --> 00:03:01.280 have millions, billions of elements, NumPy isn't just nice 63 00:03:01.319 --> 00:03:04.599 to have, it's essential. It makes operations feasible that would 64 00:03:04.680 --> 00:03:06.360 just be impossibly slow with lists. 65 00:03:06.599 --> 00:03:09.599 Okay, so the core idea is speed and efficiency built 66 00:03:09.599 --> 00:03:12.400 around the special array type. Let's talk more about the 67 00:03:12.439 --> 00:03:14.400 ndarray itself, the fundamental building 68 00:03:14.080 --> 00:03:16.639 block. Right, the ndarray. So, like we said, the defining 69 00:03:16.680 --> 00:03:21.680 thing is homogeneity: all elements, same data type. 
This predictability, 70 00:03:21.719 --> 00:03:24.639 knowing how big each element is, lets NumPy lay 71 00:03:24.639 --> 00:03:29.199 out the data really efficiently in memory, contiguously. 72 00:03:28.680 --> 00:03:32.319 And they're zero-indexed, like Python lists. 73 00:03:32.039 --> 00:03:34.919 Yep, zero-indexed. First element is at index zero, and 74 00:03:34.960 --> 00:03:38.960 the array object itself knows things about its structure. Metadata. Exactly. 75 00:03:39.000 --> 00:03:40.639 You can ask it for its shape, like, is it 76 00:03:40.639 --> 00:03:42.479 a row, a matrix, a 3D 77 00:03:42.520 --> 00:03:45.400 cube? Precisely. Or its dtype, to know what type 78 00:03:45.439 --> 00:03:48.520 of data it holds. Right. size for the total number 79 00:03:48.520 --> 00:03:50.599 of elements. itemsize tells you how many bytes each 80 00:03:50.639 --> 00:03:53.759 element uses, and ndim for the number of dimensions. 81 00:03:54.080 --> 00:03:55.639 You use these attributes all the time. 82 00:03:55.719 --> 00:03:57.840 And the data types themselves, the dtypes, there's a 83 00:03:57.960 --> 00:03:58.520 decent range. 84 00:03:58.560 --> 00:04:02.039 Oh yeah, everything you'd expect for numerical work: integers, signed 85 00:04:02.080 --> 00:04:04.599 and unsigned, different sizes like thirty-two bit, sixty-four 86 00:04:04.639 --> 00:04:10.240 bit, floats, single and double precision, booleans, complex numbers. And these 87 00:04:10.280 --> 00:04:13.680 are represented by special dtype objects like int64, 88 00:04:13.680 --> 00:04:16.759 float32. You can convert between types too, 89 00:04:16.839 --> 00:04:17.360 but you have to 90 00:04:17.279 --> 00:04:20.199 be careful. Like, you can't just stuff a complex number 91 00:04:20.240 --> 00:04:22.279 into an integer array. Exactly. 92 00:04:22.639 --> 00:04:25.800 NumPy will complain, rightly so, with a TypeError. 
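A short sketch of the attributes just listed, plus the complex-into-integer failure. The array `m` is an illustrative placeholder, not an example from the book.

```python
import numpy as np

# A small 2x3 integer array (illustrative values only).
m = np.arange(6, dtype=np.int64).reshape(2, 3)

# The metadata attributes mentioned in the discussion.
print(m.shape)     # rows and columns
print(m.dtype)     # element type
print(m.size)      # total number of elements
print(m.itemsize)  # bytes per element
print(m.ndim)      # number of dimensions

# Converting between types produces a new array.
as_float = m.astype(np.float32)

# Storing a complex value in an integer array is rejected.
try:
    m[0, 0] = 1 + 2j
    complex_rejected = False
except TypeError:
    complex_rejected = True

print(complex_rejected)
```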
It 93 00:04:25.879 --> 00:04:27.800 knows that doesn't make sense mathematically. 94 00:04:27.959 --> 00:04:31.319 Well, you stress homogeneity is key, but the book mentions 95 00:04:31.360 --> 00:04:34.600 something called record data types. That sounds like it breaks 96 00:04:34.600 --> 00:04:35.000 the rule. 97 00:04:35.120 --> 00:04:38.959 Oh, it's a clever exception. It's for structured data. Think 98 00:04:39.000 --> 00:04:42.279 of, like, a spreadsheet row, okay, where each cell in 99 00:04:42.319 --> 00:04:44.639 the row might be different: a name, which is text, 100 00:04:44.920 --> 00:04:47.279 an age, an integer, a height, a float. 101 00:04:47.360 --> 00:04:48.079 Mixed types. 102 00:04:48.240 --> 00:04:51.079 A record dtype lets you define named fields within 103 00:04:51.120 --> 00:04:54.319 the array element structure. Each field can have its own 104 00:04:54.360 --> 00:04:55.399 specific data type. 105 00:04:55.480 --> 00:04:58.040 Ah, okay, so the elements are all the same record type, 106 00:04:58.079 --> 00:05:01.319 but inside that record, types can differ. Exactly. 107 00:05:01.000 --> 00:05:07.680 The inventory example in the book nails it: item name, a string, count, an integer, price, a float, 108 00:05:08.160 --> 00:05:11.519 all stored together efficiently in a NumPy array structure. It's 109 00:05:11.560 --> 00:05:12.600 great for tabular data. 110 00:05:12.759 --> 00:05:15.680 Okay, that makes sense. So we have these efficient arrays. 111 00:05:15.920 --> 00:05:18.920 How do we work with them? Slicing comes first. Feels 112 00:05:18.959 --> 00:05:20.160 familiar from lists. 113 00:05:20.399 --> 00:05:24.240 Yeah, for one-dimensional arrays, slicing is exactly like Python lists. Yeah, 114 00:05:24.319 --> 00:05:26.480 grab a chunk using start, stop, step. 115 00:05:26.560 --> 00:05:29.000 But it gets more interesting with multiple dimensions. 
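A record dtype along the lines of the inventory example might look like this. The field names, sizes, and rows are assumptions for illustration; the book's exact definition may differ.

```python
import numpy as np

# Hypothetical inventory record: a name, a count, and a price per item.
item_dtype = np.dtype([
    ("name", "U20"),        # text, up to 20 characters
    ("count", np.int32),    # integer field
    ("price", np.float64),  # float field
])

inventory = np.array(
    [("widget", 13, 2.50), ("gadget", 42, 9.99)],
    dtype=item_dtype,
)

# Every element has the same record type, but fields differ in type.
print(inventory["name"])
total_units = inventory["count"].sum()
print(total_units)
```

Indexing by field name returns an ordinary homogeneous array for that column, which is why this suits tabular data.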
116 00:05:29.079 --> 00:05:32.839 Right, you can slice along each axis or dimension, and 117 00:05:32.879 --> 00:05:34.720 then you get into reshaping and flattening. 118 00:05:34.759 --> 00:05:38.480 Flattening, like ravel or flatten? That takes a multidimensional array 119 00:05:38.519 --> 00:05:40.040 and just makes it one long row. 120 00:05:40.160 --> 00:05:43.279 Yep, turns it into 1D. But there's a really 121 00:05:43.319 --> 00:05:47.439 critical difference between those two you mentioned. ravel might give 122 00:05:47.480 --> 00:05:49.560 you back a view of the original data. They could 123 00:05:49.560 --> 00:05:53.439 share the same memory, whereas flatten always creates a brand new, 124 00:05:53.560 --> 00:05:55.120 independent copy. 125 00:05:55.279 --> 00:05:58.600 And knowing that difference is vital, right, because changing a 126 00:05:58.720 --> 00:06:01.000 view changes the original. Absolutely vital. 127 00:06:01.279 --> 00:06:03.120 We definitely need to circle back to that view versus 128 00:06:03.120 --> 00:06:05.160 copy thing. It catches so many people out. Okay. 129 00:06:05.720 --> 00:06:08.639 You can also reshape arrays by setting the shape attribute 130 00:06:08.639 --> 00:06:12.199 directly or using resize, and transpose, or just .T, 131 00:06:12.319 --> 00:06:13.639 for swapping rows and columns. 132 00:06:13.839 --> 00:06:16.360 Yeah, .T is super common, especially if you're doing anything 133 00:06:16.399 --> 00:06:17.319 with linear algebra. 134 00:06:17.480 --> 00:06:20.680 Then there's stacking, combining arrays together, right? 135 00:06:20.759 --> 00:06:23.279 You can stack them horizontally, side by side, with 136 00:06:23.319 --> 00:06:25.600 hstack, or vertically, one above the other, 137 00:06:25.519 --> 00:06:28.439 with vstack. Exactly. Or even depth-wise with 138 00:06:28.600 --> 00:06:32.480 dstack for 3D arrays. 
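The ravel-versus-flatten difference and the basic stacking calls can be checked directly. This is a small sketch on a made-up 2x3 array.

```python
import numpy as np

grid = np.arange(6).reshape(2, 3)

flat_copy = grid.flatten()     # always an independent copy
flat_maybe_view = grid.ravel() # a view when the memory layout allows it

# np.shares_memory reports whether two arrays overlap in memory.
copy_shares = np.shares_memory(grid, flat_copy)
ravel_shares = np.shares_memory(grid, flat_maybe_view)
print(copy_shares, ravel_shares)

# Transpose swaps rows and columns.
transposed = grid.T

# Stacking two arrays side by side and one above the other.
side_by_side = np.hstack((grid, grid))
one_above_other = np.vstack((grid, grid))
print(transposed.shape, side_by_side.shape, one_above_other.shape)
```

For a freshly created contiguous array like this one, `ravel` returns a view; on non-contiguous input it may have to copy.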
The general function behind these 139 00:06:32.600 --> 00:06:37.079 is concatenate, where you explicitly say which axis to join 140 00:06:37.120 --> 00:06:38.120 along. And 141 00:06:38.079 --> 00:06:39.759 the opposite is splitting them apart. 142 00:06:39.920 --> 00:06:42.279 Yeah, hsplit, vsplit, dsplit, or the general 143 00:06:42.319 --> 00:06:45.759 split function takes a big array, carves it into smaller ones. 144 00:06:45.839 --> 00:06:48.560 Okay, these seem like the day-to-day tools for 145 00:06:48.920 --> 00:06:50.120 juggling array shapes. 146 00:06:50.199 --> 00:06:52.800 They really are your bread and butter for manipulation. 147 00:06:53.399 --> 00:06:55.519 So let's double back, like you said, to that really 148 00:06:55.560 --> 00:07:01.120 critical idea, views versus copies. The book really emphasizes it, and for 149 00:07:01.240 --> 00:07:03.959 very good reason. It's probably one of the most important, 150 00:07:04.439 --> 00:07:07.720 maybe subtle, things to grasp to avoid weird bugs. Okay. 151 00:07:08.360 --> 00:07:11.800 When you do certain NumPy operations, slicing is a big one, 152 00:07:12.360 --> 00:07:15.600 or using the view method explicitly, NumPy tries to be efficient. 153 00:07:15.759 --> 00:07:18.040 It doesn't want to waste time and memory copying 154 00:07:18.160 --> 00:07:19.519 data if it doesn't have to. 155 00:07:19.759 --> 00:07:21.839 So it gives you a view. What does that actually mean? 156 00:07:22.360 --> 00:07:25.160 It means the new array object you get back, yeah, 157 00:07:25.199 --> 00:07:28.800 shares the same underlying data in memory as the original array. 158 00:07:29.319 --> 00:07:31.480 It's just looking, potentially, at a different part of it, 159 00:07:31.839 --> 00:07:33.600 or maybe the same part with a different shape or 160 00:07:33.639 --> 00:07:34.639 data type. 161 00:07:34.360 --> 00:07:37.439 Which means if I change the data in the view, 162 00:07:37.759 --> 00:07:38.759 you are changing 
163 00:07:38.439 --> 00:07:41.399 the data in the original array too, because it's the 164 00:07:41.439 --> 00:07:43.920 same data. It's not a separate snapshot. 165 00:07:44.199 --> 00:07:46.439 Whoa, okay, that's huge. 166 00:07:46.399 --> 00:07:48.959 It really is. The Lena image example in the book 167 00:07:49.000 --> 00:07:51.319 makes it super clear. You take a slice of the 168 00:07:51.360 --> 00:07:54.839 image array, maybe representing her hat. You set all the 169 00:07:54.839 --> 00:07:58.000 pixel values in that slice to black, thinking you're just 170 00:07:58.040 --> 00:08:01.120 modifying the slice. But then you look at the original 171 00:08:00.800 --> 00:08:03.720 image, and her hat is blacked out on the original too. 172 00:08:03.639 --> 00:08:06.959 Exactly, because the slice was just a view into the 173 00:08:07.000 --> 00:08:08.480 original image's data buffer. 174 00:08:09.120 --> 00:08:11.120 Okay, so how do you avoid that, if you don't 175 00:08:11.120 --> 00:08:12.240 want to change the original? 176 00:08:12.319 --> 00:08:15.519 You have to explicitly ask for a copy using the 177 00:08:15.560 --> 00:08:18.600 .copy method. That tells NumPy, no, I want a 178 00:08:18.639 --> 00:08:21.920 completely separate version of this data in new memory. Then 179 00:08:22.000 --> 00:08:25.560 if you modify the copy, the original is totally unaffected. 180 00:08:26.000 --> 00:08:29.759 Right. So views are efficient but dangerous if you're not careful. 181 00:08:30.079 --> 00:08:32.840 Copies are safe, but use more memory and take time. 182 00:08:33.360 --> 00:08:35.320 That's the trade-off. You need to know when you're 183 00:08:35.320 --> 00:08:37.360 getting a view and when you're getting a copy, and 184 00:08:37.480 --> 00:08:40.799 use .copy when you need independence. It's fundamental. 185 00:08:40.960 --> 00:08:45.799 Oh, okay, fundamental. 
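The blacked-out-hat effect is easy to reproduce on a tiny stand-in array. This sketch uses a made-up 4x4 "image" of white pixels rather than the book's picture.

```python
import numpy as np

# Stand-in for an image: a 4x4 block of white pixels (255).
image = np.full((4, 4), 255, dtype=np.uint8)

# A slice is a view: writing to it writes into the original buffer.
patch_view = image[:2, :2]
patch_view[:] = 0
print(image)

# An explicit copy is independent: the original stays untouched.
patch_copy = image[2:, 2:].copy()
patch_copy[:] = 0
print(image[2:, 2:])
```

After the first assignment the top-left corner of `image` is black; after the second, the bottom-right corner is still white because only the copy changed.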
Indeed. Now, beyond basic slicing, NumPy 186 00:08:45.960 --> 00:08:48.559 has more advanced ways to index arrays. 187 00:08:49.120 --> 00:08:52.559 Fancy indexing. Yeah, fancy indexing. It basically lets you select 188 00:08:52.639 --> 00:08:56.960 elements using things other than simple integer slices. Primarily you 189 00:08:57.080 --> 00:08:59.039 use lists of indices or boolean arrays. 190 00:08:59.080 --> 00:09:01.360 So instead of describing a block like 2:5, 191 00:09:01.480 --> 00:09:03.720 I could give it a list, say one, five, seven, 192 00:09:03.919 --> 00:09:06.399 to pick out just those specific rows 193 00:09:06.159 --> 00:09:09.559 or columns. Exactly. You can pinpoint specific scattered elements. The 194 00:09:09.559 --> 00:09:12.559 book's example using np.ix_ with lists to kind of shuffle 195 00:09:12.600 --> 00:09:15.720 parts of the Lena image around visually shows this power. 196 00:09:15.559 --> 00:09:18.480 Right, and boolean indexing, that sounds like filtering. It is. 197 00:09:18.559 --> 00:09:22.159 It's incredibly useful for filtering data based on conditions. You 198 00:09:22.240 --> 00:09:24.559 create an array of True and False values, 199 00:09:24.559 --> 00:09:27.360 usually by applying some comparison to your data array, like 200 00:09:27.440 --> 00:09:28.639 data > 10. Precisely. 201 00:09:28.840 --> 00:09:31.559 Yeah, and then you use that boolean array as the 202 00:09:31.639 --> 00:09:33.759 index for your original data array. 203 00:09:33.639 --> 00:09:35.320 And NumPy just gives you back the elements where 204 00:09:35.320 --> 00:09:36.720 the boolean array was True. 205 00:09:37.039 --> 00:09:40.440 Yep, only those elements that met the condition. The example 206 00:09:40.440 --> 00:09:43.200 of putting dots along the diagonal of the Lena image 207 00:09:43.720 --> 00:09:46.000 is a neat way to see it, selecting pixels based 208 00:09:46.000 --> 00:09:48.399 on whether their row and column index are equal. 
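Here is a compact sketch of the three selection styles just described: index lists, boolean masks, and `np.ix_`. The arrays and the threshold of 10 are illustrative assumptions.

```python
import numpy as np

data = np.array([3, 12, 7, 25, 10, 18])

# Index list: pick scattered positions.
picked = data[[1, 3, 5]]

# Boolean mask: keep only elements that satisfy a condition.
mask = data > 10
filtered = data[mask]

# np.ix_ builds an open mesh, here selecting rows 0, 2 and columns 1, 3.
table = np.arange(16).reshape(4, 4)
sub_table = table[np.ix_([0, 2], [1, 3])]

# Paired index arrays select the diagonal (row index equals column index).
canvas = np.zeros((4, 4), dtype=int)
idx = np.arange(4)
canvas[idx, idx] = 1

print(picked, filtered)
print(sub_table)
print(canvas)
```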
209 00:09:48.519 --> 00:09:52.039 Okay, that seems really powerful for data selection. Now, stride 210 00:09:52.120 --> 00:09:55.000 tricks. That sounds a bit more arcane. 211 00:09:55.080 --> 00:09:58.080 It is definitely a more advanced concept, yeah, but the 212 00:09:58.120 --> 00:10:01.000 idea behind it is fascinating, and it really shows off 213 00:10:01.039 --> 00:10:04.600 how NumPy thinks about memory. Okay, so we know 214 00:10:04.679 --> 00:10:08.159 NumPy stores array data in one contiguous block of memory, 215 00:10:08.320 --> 00:10:12.519 right, because of homogeneity. Uh-huh. Stride tricks let you 216 00:10:12.559 --> 00:10:15.159 create views of that same block of memory, but you 217 00:10:15.200 --> 00:10:17.919 tell NumPy to interpret it with a completely different structure. 218 00:10:18.320 --> 00:10:20.519 You do this by specifying the strides. 219 00:10:20.120 --> 00:10:22.240 Strides, like how many bytes to jump to get to 220 00:10:22.240 --> 00:10:23.360 the next element? 221 00:10:23.399 --> 00:10:26.039 Exactly, how many bytes to step to get to the 222 00:10:26.080 --> 00:10:28.399 next element in the same row, and how many bytes 223 00:10:28.440 --> 00:10:30.080 to step to get to the next element in the 224 00:10:30.120 --> 00:10:33.440 same column, which is usually just the size of one row. Okay. 225 00:10:33.639 --> 00:10:36.879 With stride tricks, using functions like as_strided, you can 226 00:10:37.000 --> 00:10:39.919 manipulate those step sizes. You can tell NumPy, to 227 00:10:39.960 --> 00:10:42.639 get to the next element in this dimension, step forward 228 00:10:42.639 --> 00:10:45.879 this many bytes, even if that overlaps with previous data 229 00:10:46.000 --> 00:10:47.919 or creates a totally different logical layout. 230 00:10:48.039 --> 00:10:51.120 The Sudoku example in the book was wild, taking a 231 00:10:51.240 --> 00:10:52.720 nine-by-nine grid. 
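The byte-step idea can be sketched on a nine-by-nine grid, viewing it as a 3x3 arrangement of 3x3 blocks. This is an illustrative reconstruction, not the book's code; the grid values are placeholders, and `as_strided` performs no bounds checking, so wrong strides can read arbitrary memory.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

# Placeholder 9x9 grid holding 0..80 so each cell is easy to identify.
grid = np.arange(81, dtype=np.int64).reshape(9, 9)
row_step, col_step = grid.strides  # bytes to the next row / next column

# blocks[i, j, k, l] maps to grid[3*i + k, 3*j + l]:
# jump 3 rows per block row, 3 columns per block column,
# then 1 row and 1 column inside a block.
blocks = as_strided(
    grid,
    shape=(3, 3, 3, 3),
    strides=(3 * row_step, 3 * col_step, row_step, col_step),
)

print(blocks[1, 2])      # the block covering rows 3..5, columns 6..8
print(grid[3:6, 6:9])    # same values, obtained by ordinary slicing
```

No data is copied: `blocks` is just a different walk through the same buffer.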
232 00:10:52.559 --> 00:10:55.039 Yeah, and using as_strided to make NumPy 233 00:10:55.200 --> 00:10:58.000 see that same nine-by-nine block of memory not 234 00:10:58.120 --> 00:11:00.600 just as nine rows of nine numbers, but as an 235 00:11:00.679 --> 00:11:01.879 array of three-by-three 236 00:11:01.720 --> 00:11:04.399 squares. Without copying any data? Without 237 00:11:04.200 --> 00:11:06.919 copying anything. You're just giving NumPy a new recipe, new 238 00:11:06.919 --> 00:11:09.440 strides, for how to walk through the existing memory to 239 00:11:09.679 --> 00:11:11.120 perceive these three-by-three blocks. 240 00:11:11.159 --> 00:11:14.559 Wow. So that efficient memory layout isn't just for raw speed. 241 00:11:14.639 --> 00:11:18.240 It enables these incredibly clever ways to access structured data 242 00:11:18.320 --> 00:11:19.879 within the array. Exactly. 243 00:11:20.080 --> 00:11:24.000 It really highlights how NumPy leverages that contiguous memory. 244 00:11:24.279 --> 00:11:26.519 You can instantly get all the three-by-three blocks, 245 00:11:26.600 --> 00:11:29.879 or overlapping windows for signal processing, just by defining the 246 00:11:29.919 --> 00:11:33.399 right strides. It raises that question, doesn't it, how this 247 00:11:33.519 --> 00:11:36.720 simple contiguous block enables such complex views? 248 00:11:36.840 --> 00:11:42.919 Mind blown, slightly. Okay, one more fundamental concept: broadcasting. This 249 00:11:42.960 --> 00:11:45.919 one seems simpler but pops up everywhere. It does. 250 00:11:46.559 --> 00:11:49.720 Broadcasting is how NumPy handles operations like addition or 251 00:11:49.799 --> 00:11:53.120 multiplication between arrays that don't have the exact same 
254 00:11:56.240 --> 00:11:59.080 That's a classic example, or multiplying an entire array by 255 00:11:59.080 --> 00:12:02.159 a scalar. The rule basically is that NumPy tries to 256 00:12:02.240 --> 00:12:06.240 stretch or duplicate the smaller array's dimensions so that its 257 00:12:06.279 --> 00:12:09.000 shape becomes compatible with the larger array for the element-wise 258 00:12:09.039 --> 00:12:09.759 operation. 259 00:12:09.919 --> 00:12:11.799 The audio volume example is perfect for this. You have 260 00:12:11.799 --> 00:12:12.879 an array of audio 261 00:12:12.559 --> 00:12:14.559 samples. Right, maybe thousands of numbers. Then you 262 00:12:14.519 --> 00:12:16.600 just multiply it by 0.2 to make it quieter. 263 00:12:16.720 --> 00:12:20.200 Yep. NumPy doesn't actually create a massive array filled 264 00:12:20.240 --> 00:12:23.559 with 0.2s to match the audio data size. That 265 00:12:23.600 --> 00:12:27.039 would be really inefficient. It just understands the broadcasting rule. 266 00:12:27.360 --> 00:12:30.440 It sees you're multiplying an n-dimensional array by a scalar, 267 00:12:30.679 --> 00:12:33.080 which is like a zero-dimensional array. It knows to 268 00:12:33.120 --> 00:12:36.200 apply that scalar multiplication to every single element of the 269 00:12:36.279 --> 00:12:37.360 n-dimensional array 270 00:12:37.240 --> 00:12:40.159 in one go, using that fast C code again. Exactly. 271 00:12:40.240 --> 00:12:42.080 So what does this all mean for you? It means 272 00:12:42.080 --> 00:12:44.279 you can write really intuitive code like audio_data * 0.2 273 00:12:44.360 --> 00:12:47.480 or array + 5 without writing loops. 274 00:12:47.480 --> 00:12:50.000 NumPy figures out how to make the shapes compatible efficiently. 275 00:12:50.240 --> 00:12:52.120 It makes array math much cleaner and faster. 
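A minimal broadcasting sketch, assuming a few made-up audio samples in place of a real recording:

```python
import numpy as np

# Made-up audio samples (a real clip would have thousands of values).
audio_data = np.array([1000, -2000, 3000, -4000], dtype=np.int16)

# Scalar broadcasting: the scalar is applied to every element.
quieter = audio_data * 0.2
shifted = audio_data + 5

# Broadcasting also stretches a 1D row across the rows of a 2D array.
table = np.arange(6).reshape(2, 3)
row = np.array([10, 20, 30])
combined = table + row

print(quieter)
print(shifted)
print(combined)
```

In the last case the shapes (2, 3) and (3,) are compatible because the trailing dimensions match, so `row` is reused for each row of `table` without being physically duplicated.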
276 00:12:52.559 --> 00:12:57.000 Okay, so we've got the foundation: efficient arrays, data types, manipulation, 277 00:12:57.200 --> 00:13:02.480 the crucial views versus copies, fancy indexing, broadcasting. That's a 278 00:13:02.519 --> 00:13:04.200 powerful toolkit just on its own. 279 00:13:04.320 --> 00:13:04.919 Absolutely. 280 00:13:04.960 --> 00:13:07.039 Now let's talk about putting it to work. The book 281 00:13:07.080 --> 00:13:10.960 moves into actual data analysis, prediction, linking up with other 282 00:13:11.039 --> 00:13:11.639 libraries. Right, 283 00:13:11.679 --> 00:13:14.840 applying these tools. The basic data analysis example, using weather 284 00:13:14.919 --> 00:13:18.399 data from a station in the Netherlands, De Bilt, I 285 00:13:18.440 --> 00:13:20.080 think, is very practical. 286 00:13:19.679 --> 00:13:22.320 It shows how you load data from a file, maybe a 287 00:13:22.360 --> 00:13:25.080 CSV or a text file, using loadtxt. 288 00:13:24.879 --> 00:13:28.360 And it immediately hits a real-world issue: messy data, 289 00:13:29.159 --> 00:13:30.679 missing values. 290 00:13:30.320 --> 00:13:33.039 Yeah, which happens all the time. In that data set, 291 00:13:33.240 --> 00:13:36.960 missing values were marked with, like, meta I or something. Yeah, 292 00:13:36.759 --> 00:13:40.759 some special code, and the book shows how you typically 293 00:13:40.759 --> 00:13:44.399 handle that: maybe filter them out or, more often, convert 294 00:13:44.399 --> 00:13:48.879 them into NumPy's special NaN value, not a number. Right, NaN, 295 00:13:49.200 --> 00:13:53.200 because NumPy's math functions often know how to handle NaNs correctly, 296 00:13:53.639 --> 00:13:55.919 like ignoring them when calculating a mean. 297 00:13:56.120 --> 00:13:59.080 Okay, so data loaded, cleaned up a bit. Then doing 298 00:13:59.120 --> 00:14:01.399 the actual analysis is easy? Super easy. 
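The load-then-clean step might look like the sketch below. The sentinel value -9999 and the three-row data are hypothetical stand-ins, since the actual missing-value marker in the book's data set is not clear from the discussion. Note that the plain `np.mean` propagates NaN; it is the NaN-aware variants such as `np.nanmean` that skip missing entries.

```python
import io
import numpy as np

MISSING = -9999.0  # hypothetical marker for a missing reading

# Stand-in for a file on disk: date, temperature.
raw = io.StringIO("20240101,3.5\n20240102,-9999\n20240103,5.5\n")
data = np.loadtxt(raw, delimiter=",")

# Replace the sentinel with NaN in the temperature column.
temps = np.where(data[:, 1] == MISSING, np.nan, data[:, 1])

plain_mean = np.mean(temps)     # NaN, because one entry is NaN
clean_mean = np.nanmean(temps)  # ignores the NaN entry

print(plain_mean, clean_mean)
```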
299 00:14:01.399 --> 00:14:05.960 With NumPy. Want the average temperature? .mean. Max wind speed? .max. Standard deviation? 300 00:14:06.039 --> 00:14:09.080 .std. You apply these functions directly to your arrays 301 00:14:09.159 --> 00:14:10.080 or columns of data. 302 00:14:10.159 --> 00:14:13.159 The example showed calculating things like the daily temperature range, 303 00:14:13.240 --> 00:14:16.120 max minus min, or looking at yearly averages. 304 00:14:16.360 --> 00:14:19.200 Yeah. And while, you know, one station's data isn't proof 305 00:14:19.200 --> 00:14:21.519 of global warming or anything, of course, it gives you 306 00:14:21.559 --> 00:14:24.600 a taste of using these tools for exploring trends. You 307 00:14:24.639 --> 00:14:27.279 could do the same for wind, pressure, humidity, whatever's in 308 00:14:27.279 --> 00:14:27.840 the data set. 309 00:14:27.840 --> 00:14:31.799 Okay, so that's understanding past data. What about predicting the future? 310 00:14:32.440 --> 00:14:34.960 The book touches on simple predictive analytics. 311 00:14:35.279 --> 00:14:39.039 Yeah, moving from description to forecasting. The core idea is 312 00:14:39.120 --> 00:14:42.120 using historical patterns to guess what might happen 313 00:14:41.919 --> 00:14:44.519 next. Like with the temperature data? Exactly. 314 00:14:45.120 --> 00:14:49.480 The book mentions basic concepts like autoregressive models, or AR models. 315 00:14:49.919 --> 00:14:54.440 Simple idea: predict tomorrow's temperature based on today's, yesterday's, the 316 00:14:54.519 --> 00:14:55.159 day before. 317 00:14:55.320 --> 00:14:58.200 Using past values to predict the future. 318 00:14:58.120 --> 00:15:00.519 Right, and it hints at how you'd actually fit a 319 00:15:00.559 --> 00:15:03.559 model like that to your data. This often involves bringing 320 00:15:03.600 --> 00:15:07.399 in tools from the wider scientific Python world, like SciPy. 
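The autoregressive idea can be sketched with NumPy alone. This is not the book's fitting code: it swaps in `np.linalg.lstsq` (ordinary linear least squares) instead of a SciPy optimizer, and it uses a synthetic series with known coefficients 0.6 and 0.3 so the result can be checked.

```python
import numpy as np

# Synthetic series obeying x[t] = 0.6 * x[t-1] + 0.3 * x[t-2] exactly.
series = np.empty(50)
series[0], series[1] = 1.0, 2.0
for t in range(2, 50):
    series[t] = 0.6 * series[t - 1] + 0.3 * series[t - 2]

# Design matrix of lagged values: column 0 is x[t-1], column 1 is x[t-2].
lagged = np.column_stack((series[1:-1], series[:-2]))
target = series[2:]

# Least-squares fit of the two AR coefficients.
coeffs, *_ = np.linalg.lstsq(lagged, target, rcond=None)

# One-step-ahead forecast from the two most recent values.
next_value = coeffs @ series[[-1, -2]]

print(coeffs, next_value)
```

With real temperature data the fit would not be exact, and the residual would show how much the simple model leaves unexplained.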
Yeah, 321 00:15:07.480 --> 00:15:10.360 maybe using something like scipy.optimize.leastsq 322 00:15:10.480 --> 00:15:13.360 to find the model parameters that best match the historical data. 323 00:15:14.080 --> 00:15:17.240 It also mentions that tools like pandas, which builds on NumPy, 324 00:15:17.600 --> 00:15:20.480 are great for summarizing data and looking for correlations before 325 00:15:20.480 --> 00:15:21.519 you even start modeling. 326 00:15:21.759 --> 00:15:25.679 Makes sense. Another area is signal processing, analyzing data that 327 00:15:25.759 --> 00:15:28.039 changes over time, often with cycles. Right, 328 00:15:27.879 --> 00:15:31.279 like the sunspot data example. Sunspots have these known cycles, 329 00:15:31.360 --> 00:15:32.320 roughly eleven 330 00:15:32.080 --> 00:15:35.720 years, and signal processing techniques help analyze those patterns. 331 00:15:36.000 --> 00:15:39.639 Yep. The book mentions smoothing, like using a moving average 332 00:15:39.960 --> 00:15:42.600 to filter out short-term noise and see the underlying 333 00:15:42.639 --> 00:15:43.679 trend more clearly. 334 00:15:44.240 --> 00:15:47.120 Though it notes simple moving averages aren't always the best 335 00:15:47.159 --> 00:15:50.080 for cyclical data like sunspots. 336 00:15:49.559 --> 00:15:52.720 Correct, they can distort the peaks and troughs, so it 337 00:15:52.840 --> 00:15:56.080 hints at more advanced stuff, like decomposing a signal into 338 00:15:56.080 --> 00:16:00.000 its core components. It mentions techniques like EMD, empirical mode 339 00:16:00.120 --> 00:16:04.080 decomposition, to break down the sunspot signal into intrinsic mode 340 00:16:04.080 --> 00:16:07.679 functions, or IMFs, to better analyze those cycles. 341 00:16:07.799 --> 00:16:10.759 And NumPy is doing the heavy lifting numerically for these 342 00:16:10.759 --> 00:16:11.960 algorithms. Exactly. 
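The moving-average smoothing mentioned above can be written as a convolution with equal weights. The helper name `moving_average` and the sample signal are illustrative assumptions, not the book's code.

```python
import numpy as np

def moving_average(values, window):
    """Average each run of `window` consecutive values (hypothetical helper)."""
    weights = np.ones(window) / window
    # mode="valid" keeps only positions where the window fully overlaps.
    return np.convolve(values, weights, mode="valid")

signal = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
smoothed = moving_average(signal, 3)

print(smoothed)
```

The output is shorter than the input by `window - 1` samples, which is one reason a plain moving average needs care near the edges of a series.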
343 00:16:12.320 --> 00:16:16.240 It provides the array operations needed to implement these complex 344 00:16:16.440 --> 00:16:17.759 signal processing methods. 345 00:16:17.960 --> 00:16:20.679 It really feels like NumPy isn't just standalone. It's like 346 00:16:20.799 --> 00:16:26.240 the central hub for this whole ecosystem of scientific Python tools. 347 00:16:26.279 --> 00:16:28.159 That's a great way to put it. It's the common language, 348 00:16:28.200 --> 00:16:31.200 the common data structure. SciPy, as we mentioned, builds directly 349 00:16:31.200 --> 00:16:34.159 on NumPy. It adds tons more advanced scientific tools. 350 00:16:33.840 --> 00:16:35.679 Like what kind of things? Oh, 351 00:16:35.480 --> 00:16:42.519 numerical integration, solving differential equations, interpolation, optimization algorithms, more linear algebra, 352 00:16:42.519 --> 00:16:47.279 statistical functions, stuff that goes beyond NumPy's core array focus. 353 00:16:47.000 --> 00:16:49.960 And scikit-learn, the big machine learning 354 00:16:49.679 --> 00:16:54.000 library? Hugely reliant on NumPy. Almost everything in scikit-learn 355 00:16:54.080 --> 00:16:58.240 expects input data as NumPy arrays, your features, your targets, 356 00:16:58.720 --> 00:17:01.679 and it often outputs results as NumPy arrays too: 357 00:17:02.399 --> 00:17:07.039 predictions, model coefficients. The book points to examples like clustering 358 00:17:07.039 --> 00:17:10.400 stock data or using scikit-image, which is related, for 359 00:17:10.480 --> 00:17:13.839 image processing, like finding corners in a picture, all powered 360 00:17:13.839 --> 00:17:15.240 by NumPy arrays underneath. 361 00:17:15.359 --> 00:17:18.720 And what if Python itself, even with NumPy's C backend, 362 00:17:18.920 --> 00:17:21.519 isn't fast enough for some really critical part of your code? 363 00:17:21.559 --> 00:17:24.240 That's where Cython comes in. Cython. 
It's a language that's 364 00:17:24.359 --> 00:17:26.200 kind of a mix of Python and C. You can 365 00:17:26.240 --> 00:17:29.599 write Python-like code, add some static type declarations, and 366 00:17:29.680 --> 00:17:31.119 Cython compiles it down to 367 00:17:31.039 --> 00:17:33.599 efficient C code. And it works well with NumPy? 368 00:17:33.599 --> 00:17:37.599 Very well, because NumPy arrays already have that underlying C structure. 369 00:17:37.960 --> 00:17:40.559 Cython code can operate on the array data directly at 370 00:17:40.640 --> 00:17:44.799 C speeds without the Python interpreter overhead. It's great for 371 00:17:44.839 --> 00:17:49.119 optimizing bottlenecks or for wrapping existing C or C++ 372 00:17:49.119 --> 00:17:51.000 libraries to use them from Python. 373 00:17:51.279 --> 00:17:54.039 So the ecosystem is NumPy at the core, SciPy for 374 00:17:54.160 --> 00:17:58.079 more science and math tools, scikit-learn for ML, Cython for 375 00:17:58.119 --> 00:18:00.680 speed optimization. It's quite layered. It is. 376 00:18:00.880 --> 00:18:03.519 And it's still evolving. The book even looks ahead, mentioning 377 00:18:03.559 --> 00:18:05.160 projects like Blaze. Blaze? 378 00:18:05.160 --> 00:18:06.160 What's the idea there? 379 00:18:06.640 --> 00:18:09.079 The goal is to take NumPy's array-based way of 380 00:18:09.119 --> 00:18:11.359 thinking and extend it to data sets that are too 381 00:18:11.400 --> 00:18:12.519 big for memory. 382 00:18:12.400 --> 00:18:15.559 Ah, big data territory, or streaming 383 00:18:15.240 --> 00:18:19.559 data. Exactly, applying similar principles of efficient array-oriented computation 384 00:18:19.799 --> 00:18:22.640 to distributed systems or data streams. It shows that this 385 00:18:22.759 --> 00:18:25.400 core idea started by NumPy is still expanding. 386 00:18:25.559 --> 00:18:27.720 That's really cool. 
Okay. Before we wrap up, one last 387 00:18:27.759 --> 00:18:33.359 practical point. The book covers good development practices: profiling, debugging, testing. Yeah, 388 00:18:33.119 --> 00:18:36.279 really important, especially with numerical code, where small errors can 389 00:18:36.319 --> 00:18:40.480 sometimes lead to big problems or just slow things down unnecessarily. 390 00:18:40.000 --> 00:18:42.519 Profiling helps you find where your code is spending its 391 00:18:42.559 --> 00:18:44.359 time, the bottlenecks. Right. 392 00:18:44.599 --> 00:18:47.640 Debugging is for tracking down errors when things go wrong, 393 00:18:48.079 --> 00:18:52.000 and testing, writing automated tests, is crucial for making sure 394 00:18:52.039 --> 00:18:55.440 your code actually works as expected and stays working when 395 00:18:55.440 --> 00:18:56.720 you make changes later. 396 00:18:56.680 --> 00:18:59.880 And Python and its ecosystem have tools for these? 397 00:19:00.079 --> 00:19:04.440 Definitely. Python has built-in profilers. IPython has magic 398 00:19:04.440 --> 00:19:07.200 commands like %debug for jumping into the debugger right 399 00:19:07.240 --> 00:19:10.519 after an error. There are standalone debuggers like pdb, 400 00:19:10.880 --> 00:19:14.799 and for testing, Python's unittest module is standard, but libraries 401 00:19:14.839 --> 00:19:17.880