WEBVTT 1 00:00:00.120 --> 00:00:02.200 Welcome to the deep Dive. We're of the show that 2 00:00:02.200 --> 00:00:05.160 cuts through the noise, getting straight to the fascinating core 3 00:00:05.280 --> 00:00:09.039 of well complex topics. That's the plan, and today we 4 00:00:09.119 --> 00:00:12.960 are diving headfirst into the fundamentals of computer science. We're 5 00:00:13.039 --> 00:00:18.239 drawing insights from computer science distilled by a Watston Fererrafilo. 6 00:00:17.879 --> 00:00:20.239 A great resource, absolutely. 7 00:00:19.839 --> 00:00:22.719 And our mission isn't just for coders, right, It's for 8 00:00:22.760 --> 00:00:26.839 anyone who wants to sharpen their computational. 9 00:00:26.160 --> 00:00:30.719 Thinking exactly to really understand the invisible forces powering our 10 00:00:30.760 --> 00:00:31.440 digital world. 11 00:00:31.920 --> 00:00:34.359 And you know it goes way beyond just the machines. 12 00:00:34.039 --> 00:00:37.280 Themselves, it really does. There's that famous quote from Esger Dikstra, 13 00:00:37.359 --> 00:00:40.119 a real pioneer. Oh yeah, he said, computer science is 14 00:00:40.159 --> 00:00:43.079 not about machines in the same way that astronomy is 15 00:00:43.119 --> 00:00:44.240 not about telescopes. 16 00:00:44.359 --> 00:00:46.000 Huh. I like that. 17 00:00:46.119 --> 00:00:49.719 It's fundamentally about problem solving. It's a systematic way of 18 00:00:49.759 --> 00:00:53.280 thinking that helps us break down incredibly complex challenges. 19 00:00:53.520 --> 00:00:56.679 That quote perfectly sets the stage, doesn't it. So you 20 00:00:56.840 --> 00:00:59.159 listening are about to take a shortcut sort of to 21 00:00:59.240 --> 00:01:02.880 being truly well informed about these foundations, the stuff that 22 00:01:02.880 --> 00:01:07.000 builds everything from your phone apps to global networks. Okay, 23 00:01:07.079 --> 00:01:10.359 so if computer science isn't just about the physical machines, 24 00:01:11.840 --> 00:01:14.120 where do we even begin when we talk about the 25 00:01:14.200 --> 00:01:16.120 art of solving computational problems? 26 00:01:16.319 --> 00:01:18.840 Well, we begin right at the start, taking a big, 27 00:01:19.079 --> 00:01:20.920 maybe fuzzy idea and breaking. 28 00:01:20.640 --> 00:01:22.560 It down into smaller pieces. 29 00:01:22.200 --> 00:01:26.480 Exactly definable chunks. Translating a human thought into something that 30 00:01:26.680 --> 00:01:29.000 you know could eventually be processed step by step. 31 00:01:29.200 --> 00:01:31.840 Okay, that makes a lot of sense. So what tools 32 00:01:31.840 --> 00:01:35.480 do we use for that initial modeling of ideas well? 33 00:01:36.079 --> 00:01:38.840 Visualizing the process is a great start. I think of flow. 34 00:01:38.680 --> 00:01:40.480 Charts, right, like drawing it out. 35 00:01:40.680 --> 00:01:43.480 Yeah, the book mentioned how wikipedians actually used a flow 36 00:01:43.560 --> 00:01:46.120 chart to map out their whole collaborative editing process. 37 00:01:46.280 --> 00:01:48.480 Really for human interaction. 38 00:01:48.280 --> 00:01:52.920 Yep, showing how discussions, edits agreements would flow. It's powerful 39 00:01:52.920 --> 00:01:56.000 for complex interactions, not just code or you know, a 40 00:01:56.040 --> 00:01:59.359 simpler example just mapping the steps to find the biggest 41 00:01:59.359 --> 00:02:00.200 of three numbers. 42 00:02:00.319 --> 00:02:02.439 I can see how that would clarify things. Okay, what 43 00:02:02.519 --> 00:02:05.760 about pseudo code. I've had that term thrown around. Ugh. 44 00:02:05.879 --> 00:02:12.240 Pseudocode another fantastic tool. It's like human friendly code, so 45 00:02:12.400 --> 00:02:16.960 not actual programming language. It completely disregards strict syntax rules. 46 00:02:17.000 --> 00:02:19.080 You just write out the steps in plain English or 47 00:02:19.080 --> 00:02:20.000 whatever works for you. 48 00:02:20.080 --> 00:02:24.400 So flow chart, pseudocode, they're basically tools for thinking clearly. 49 00:02:24.439 --> 00:02:27.520 Then that's it, helping you visualize and refine the process 50 00:02:27.759 --> 00:02:32.039 before you even think about writing actual rentable code, structuring 51 00:02:32.039 --> 00:02:32.479 your thoughts. 52 00:02:32.520 --> 00:02:36.719 Logically, and speaking of logic, the book makes this interesting point. 53 00:02:37.000 --> 00:02:40.400 Coders work with logic so much it messes their minds, 54 00:02:40.879 --> 00:02:42.680 but often use it unknowingly. 55 00:02:43.039 --> 00:02:44.599 Uh huh, Yeah, there's some truth to that. 56 00:02:44.719 --> 00:02:48.280 So what's the benefit of understanding logic consciously? 57 00:02:48.439 --> 00:02:51.919 It's about precision. Formal logic lets us work with definite 58 00:02:51.960 --> 00:02:54.599 true or false statements black and white pretty much. Yeah, 59 00:02:54.719 --> 00:02:59.080 understanding logical operators A and D or xor conditional negation. 60 00:02:59.360 --> 00:03:00.199 It's all boole. 61 00:03:00.159 --> 00:03:03.039 Algebra really, giving us rules to simplify things exactly. 62 00:03:03.039 --> 00:03:05.199 The source has a great example the hot server problem. 63 00:03:05.360 --> 00:03:08.360 A server crashes if it's overheating and the AC is off, okay, 64 00:03:08.439 --> 00:03:10.639 or if it's overheating and the Chassi cooler fails. 65 00:03:10.960 --> 00:03:13.719 Right, So you'd write that out as a logical expression. 66 00:03:13.199 --> 00:03:19.120 Precisely, maybe overheating andy acoff or overheating and de cooler failed. 67 00:03:19.199 --> 00:03:21.120 What's complicated, But here's the power. 68 00:03:21.719 --> 00:03:26.319 Using logic rules like distributivity from Boolean algebra, you simplify 69 00:03:26.360 --> 00:03:30.479 that it becomes overheating and d acoff or cooler failed. 70 00:03:30.639 --> 00:03:31.680 Much clearer, right. 71 00:03:31.639 --> 00:03:34.520 Yeah, that's much simpler. You see immediately overheating is key 72 00:03:34.560 --> 00:03:36.240 plus one of the cooling failures. 73 00:03:36.280 --> 00:03:39.960 Exactly. This kind of simplification saves a ton of effort 74 00:03:40.000 --> 00:03:42.479 in complex systems. It gives you the precise conditions. 75 00:03:42.520 --> 00:03:45.759 Okay, great illustration, And to analyze these logical models you 76 00:03:45.840 --> 00:03:47.479 mentioned truth tables. 77 00:03:47.759 --> 00:03:52.840 Ah, yes, truth tables are like comprehensive diagnostic tools. Oh 78 00:03:52.919 --> 00:03:56.479 so they lay out every single possible combination of inputs 79 00:03:56.520 --> 00:03:59.599 truefall for each condition and show the outcome for each one. 80 00:04:00.120 --> 00:04:01.199 Check every scenario. 81 00:04:01.400 --> 00:04:04.960 Yep. The book's fragile system example about designing a database 82 00:04:05.280 --> 00:04:09.479 had four variables that means sixteen possible states. A truth 83 00:04:09.479 --> 00:04:10.879 table maps mole out. 84 00:04:10.800 --> 00:04:12.280 Ensures you don't miss any weird. 85 00:04:12.080 --> 00:04:14.360 Edge cases precisely. It's systematic. 86 00:04:14.599 --> 00:04:17.720 Now, what's really fascinating here is how this abstract logic 87 00:04:18.120 --> 00:04:21.839 connects directly to the hardware. Right, how do true false 88 00:04:21.879 --> 00:04:23.680 becomes something a machine understands? 89 00:04:23.800 --> 00:04:26.920 It's kind of brilliant in its simplicity, True and false 90 00:04:27.040 --> 00:04:30.720 become high or low voltage signals signals right, right, And 91 00:04:30.800 --> 00:04:35.759 these logical operations an D or not T are performed 92 00:04:35.800 --> 00:04:38.560 by tiny electrical circuits. We call them logic gates. 93 00:04:38.639 --> 00:04:39.040 Got it. 94 00:04:39.240 --> 00:04:41.759 The book even shows how you combine these gates to 95 00:04:41.879 --> 00:04:44.839 actually some numbers, like how two plus three weekls five 96 00:04:44.920 --> 00:04:46.120 works at the binary level. 97 00:04:46.240 --> 00:04:48.839 Wow, so's the CPU, the brain of the computer. 98 00:04:48.959 --> 00:04:52.160 It's working. Principles are still fundamentally based on this Boolean algebra. 99 00:04:52.439 --> 00:04:56.519 It's millions billions of these tiny switches manipulating voltages. 100 00:04:56.600 --> 00:05:00.360 Incredible. Okay, So logic defines conditions. What about the sheer 101 00:05:00.360 --> 00:05:03.680 scale of possibilities that brings us to counting? Why is 102 00:05:03.720 --> 00:05:06.000 something basic counting so vital here? 103 00:05:06.040 --> 00:05:08.480 Because it builds your intuition, your gut feeling for how 104 00:05:08.519 --> 00:05:12.680 possibilities combine and often explode explode. Yeah, it's not about 105 00:05:12.720 --> 00:05:16.199 memorizing formulas, but grasping the principles, like cracking a simple 106 00:05:16.279 --> 00:05:18.360 pin two digits one letter. 107 00:05:18.399 --> 00:05:21.759 Okay, so ten choices, ten choices, twenty six choices, right. 108 00:05:21.680 --> 00:05:24.680 You multiply them ten by ten by twenty six, it's 109 00:05:24.720 --> 00:05:28.639 twenty six hundred possibilities, or the team building example. Twenty 110 00:05:28.639 --> 00:05:31.600 three candidates each can be in or out. 111 00:05:31.399 --> 00:05:33.639 So two choices for each of the twenty three. 112 00:05:33.759 --> 00:05:36.279 That's two to the power of twenty three possible teams. 113 00:05:36.519 --> 00:05:38.040 The numbers grow fast. 114 00:05:37.839 --> 00:05:40.879 And sometimes they grow astronomically fast, like with permutations right 115 00:05:40.920 --> 00:05:42.519 where order matters exactly. 116 00:05:42.759 --> 00:05:45.160 Permutations are about the number of ways to order things. 117 00:05:45.360 --> 00:05:47.319 The classic traveling salesman. 118 00:05:47.040 --> 00:05:48.720 Problem finding the shortest route. 119 00:05:48.800 --> 00:05:52.519 Yeah, just fifteen cities, yeh. Try to take every possible 120 00:05:52.600 --> 00:05:56.079 route order you're looking at over one point three trillion 121 00:05:56.360 --> 00:05:59.879 routes trillion. Wow, And if each calculation takes a microsecond, 122 00:06:00.040 --> 00:06:03.279 that's fifteen days of compute time for twenty cities. It 123 00:06:03.360 --> 00:06:05.319 jumps to seventy seven thousand years. 124 00:06:05.399 --> 00:06:06.360 That's insane. 125 00:06:06.519 --> 00:06:09.639 The precious tune example too, thirteen note six note melodies, 126 00:06:10.199 --> 00:06:13.959 over one point two million possibilities. It shows that factorials 127 00:06:14.000 --> 00:06:16.480 just make things explode computationally staggering. 128 00:06:16.519 --> 00:06:18.839 Okay, what if the order doesn't matter? Like in combination. 129 00:06:19.079 --> 00:06:22.519 Combinations are about selecting items where arrangement isn't important, like 130 00:06:22.600 --> 00:06:23.680 the chess queen's. 131 00:06:23.319 --> 00:06:25.480 Problem placing eight queens so none attack each other. 132 00:06:25.720 --> 00:06:27.600 Right, if we were just choosing eight squares out of 133 00:06:27.600 --> 00:06:30.040 sixty four, ignoring the attack rules for a moment, that's 134 00:06:30.040 --> 00:06:32.800 sixty four, choose eight. It was about four point four 135 00:06:32.839 --> 00:06:34.560 billion ways. 136 00:06:33.920 --> 00:06:38.040 Still huge, but less than the traveling salesman for fifteen cities. 137 00:06:38.160 --> 00:06:42.000 Often significantly smaller than permutations. Yeah yeah, but still massive 138 00:06:42.079 --> 00:06:42.920 numbers to deal with. 139 00:06:43.040 --> 00:06:45.519 And then there are sums. I remember Gauss's trick for 140 00:06:45.560 --> 00:06:48.079 adding one to one hundred. How does that play in? 141 00:06:48.160 --> 00:06:50.720 It's a classic shortcut, right, Yeah. That one plus two 142 00:06:50.720 --> 00:06:53.360 plus ro plus and n plus one two formula the 143 00:06:53.360 --> 00:06:56.240 flying cheap airline ticket problem. And the source uses this 144 00:06:56.879 --> 00:06:59.759 finding the cheapest flight over thirty days. You check every 145 00:06:59.759 --> 00:07:00.480 pair of star. 146 00:07:00.399 --> 00:07:02.680 Ten dates, so day one to day two, day one 147 00:07:02.800 --> 00:07:05.560 to day three up to day twenty ninety day thirty exactly. 148 00:07:05.639 --> 00:07:07.720 That's a sum one plus two plus b L plus 149 00:07:07.720 --> 00:07:09.879 twenty nine, which Gauss's trick tells you as four hundred 150 00:07:09.879 --> 00:07:12.759 and thirty five pairs. Shows how even simple sums lead 151 00:07:12.800 --> 00:07:13.839 to lots of operations. 152 00:07:13.879 --> 00:07:15.920 And this leads into probability, too, doesn't it. 153 00:07:16.040 --> 00:07:20.759 Absolutely. Probability concepts are critical for managing uncertainty thinking about 154 00:07:20.800 --> 00:07:24.519 independent events. Hey, like backing up data using three cheap 155 00:07:24.600 --> 00:07:27.680 discs might actually give you a lower overall failure risk 156 00:07:27.959 --> 00:07:31.079 than one super expensive disc if their failures are independent, 157 00:07:31.319 --> 00:07:32.240 interesting trade off. 158 00:07:32.360 --> 00:07:35.920 Or mutually exclusive events like choosing one subscription plan. And 159 00:07:36.079 --> 00:07:40.439 understanding this helps avoid things like the gambler's fallacy, believing. 160 00:07:40.000 --> 00:07:42.439 A coin is due for heads after a run of tails. 161 00:07:42.600 --> 00:07:45.920 Yeah, past independent events just don't affect future ones. It 162 00:07:45.959 --> 00:07:50.319 seems obvious, but it trips people up. So okay, with modeling, 163 00:07:50.720 --> 00:07:54.759 logic and counting, we can define and understand problems. But 164 00:07:54.839 --> 00:07:57.439 we have a solution. How do we know if it's 165 00:07:57.439 --> 00:07:59.839 actually good or you know, fast? 166 00:08:00.160 --> 00:08:03.160 Yeah, that feels like the crucial next step. It brings 167 00:08:03.199 --> 00:08:07.040 us to complexity. The book asks sorting twenty six cards, 168 00:08:07.120 --> 00:08:09.360 how long does it take? What if you have fifty 169 00:08:09.399 --> 00:08:11.199 two cards? Does it just take twice as long? 170 00:08:11.279 --> 00:08:13.959 It's often more nuanced than that, and it's absolutely critical. 171 00:08:13.959 --> 00:08:16.399 We talk about time complexity, usually written as T on 172 00:08:16.639 --> 00:08:17.639 N T O N. 173 00:08:17.759 --> 00:08:18.000 Yeah. 174 00:08:18.040 --> 00:08:20.680 It measures how the number of operations and algorithm performs 175 00:08:20.720 --> 00:08:24.279 scales with the input size N, and we usually focus 176 00:08:24.279 --> 00:08:25.079 on the worst case. 177 00:08:25.160 --> 00:08:27.120 Planning for the worst makes sense. 178 00:08:26.920 --> 00:08:29.040 And the standard way to talk about this growth is 179 00:08:29.079 --> 00:08:32.159 big O notation. It focuses on the dominant term how 180 00:08:32.159 --> 00:08:33.320 it scales, like. 181 00:08:33.320 --> 00:08:37.200 The biggest factor at n gets large exactly. For example, 182 00:08:37.360 --> 00:08:39.960 a simple method like selection sword is O N two 183 00:08:40.799 --> 00:08:43.519 O of n squared, meaning meaning if you double the 184 00:08:43.559 --> 00:08:46.919 input size n, the time it takes roughly quadruples. Because 185 00:08:46.919 --> 00:08:49.399 of that N squared factor, it grows quadratically. 186 00:08:49.440 --> 00:08:52.559 Ouch. And here's where we get that aha moment, right. 187 00:08:53.200 --> 00:08:56.000 Comparing something like om log in to O N two. 188 00:08:57.120 --> 00:08:58.440 The difference is huge. 189 00:08:58.639 --> 00:09:01.919 It's absolutely mass especially for large inputs. The source has 190 00:09:01.919 --> 00:09:03.759 a table. It's shocking. 191 00:09:03.879 --> 00:09:04.639 Give us an idea. 192 00:09:04.679 --> 00:09:07.480 Okay, for a million items, an O N two algorithm 193 00:09:07.600 --> 00:09:10.639 might literally take years to rent years. Yeah, But an 194 00:09:10.639 --> 00:09:13.799 O N log n algorithm for the same million items, 195 00:09:13.840 --> 00:09:14.480 it might take. 196 00:09:14.399 --> 00:09:17.480 Minutes minutes versus years. That's incredible. 197 00:09:17.600 --> 00:09:20.200 That log in factor makes a profound difference. It often 198 00:09:20.200 --> 00:09:23.159 comes from algorithms that repeatedly cut the problem size in half. 199 00:09:23.440 --> 00:09:26.120 It's why finding efficient algorithms is so important. 200 00:09:26.200 --> 00:09:28.440 So O N two is bad for big inputs, O 201 00:09:28.600 --> 00:09:30.679 en log in is much better. What about even worse 202 00:09:30.720 --> 00:09:32.799 things like exponential algorithms? 203 00:09:32.799 --> 00:09:35.799 Ah, those are the scary ones, things like two oh 204 00:09:35.919 --> 00:09:38.600 of two. To the end's they grow so incredibly fast 205 00:09:38.600 --> 00:09:42.440 they're considered not runnable for anything but really tiny inputs. 206 00:09:42.759 --> 00:09:46.879 Even supercomputers can't handle them. Once NN gets even moderately. 207 00:09:46.480 --> 00:09:49.720 Large, like the traveling salesman problem again exactly. 208 00:09:50.000 --> 00:09:53.240 That often falls into this category, along with other famously 209 00:09:53.480 --> 00:09:57.559 hard problems called NP complete problems like the knapsack problem. 210 00:09:57.679 --> 00:09:59.559 And isn't there a prize for solving one of those 211 00:09:59.600 --> 00:10:01.080 efficient Huh? 212 00:10:01.159 --> 00:10:03.679 Yes? Yeah, that's the million dollar prize from the Clay 213 00:10:03.679 --> 00:10:07.559 Mathematics Institute. If you find a non exponential algorithm for 214 00:10:07.639 --> 00:10:11.320 any NP complete problem, you solve a major open question 215 00:10:11.399 --> 00:10:12.279 in computer science. 216 00:10:12.399 --> 00:10:15.639 Wow. Okay, So knowing this, how do we devise clever 217 00:10:15.759 --> 00:10:18.960 strategies to solve problems more efficiently? Like the quote says, 218 00:10:19.120 --> 00:10:21.440 if you find a good move, look for a better one. 219 00:10:21.519 --> 00:10:22.799 Right, we need strategies. 220 00:10:23.080 --> 00:10:26.080 Let's start simple iteration just using loops right yep. 221 00:10:26.480 --> 00:10:29.879 Repeating a process like merging two sorted lists you just 222 00:10:30.240 --> 00:10:33.240 iteratively compare the front items simple and effective. 223 00:10:33.399 --> 00:10:35.759 But iteration can lead to bad complexity too. 224 00:10:35.960 --> 00:10:39.159 It can if you have nested loops, like when generating 225 00:10:39.159 --> 00:10:41.480 a power set all possible subsets. 226 00:10:41.039 --> 00:10:43.879 Of items like all combinations pizza toppings sort of. 227 00:10:43.879 --> 00:10:46.840 Yeah, that can quickly lead to O two end complexity 228 00:10:47.039 --> 00:10:51.360 exponential Again. Then there's recursion, which can be quite elegant. 229 00:10:51.360 --> 00:10:53.080 Where a function calls itself. 230 00:10:52.799 --> 00:10:56.399 Exactly, it calls clones of itself to solve smaller versions 231 00:10:56.440 --> 00:10:59.559 of the same problem. Really intuitive for things like calculating 232 00:10:59.559 --> 00:11:03.080 fibinih numbers or checking if a word is a palindrome. 233 00:11:03.159 --> 00:11:04.840 Simpler code often often. 234 00:11:04.679 --> 00:11:09.480 Yes, but there's a trade off. Recursion can have computational overhead. 235 00:11:09.919 --> 00:11:12.159 The computer needs to keep track of all those calls, 236 00:11:12.480 --> 00:11:15.279 which can use more memory in CPU time than a 237 00:11:15.320 --> 00:11:17.039 straightforward iterative loop. 238 00:11:17.200 --> 00:11:21.200 Simplicity versus performance a classic dilemma. It often is what 239 00:11:21.240 --> 00:11:24.000 about just trying everything? Brute force? 240 00:11:24.320 --> 00:11:27.279 That's exactly what it sounds like. Inspecting all possible solutions 241 00:11:27.360 --> 00:11:30.840 seems inefficient. Often it is very naive, but sometimes it's 242 00:11:30.879 --> 00:11:33.639 the only direct way with the starting point. For the 243 00:11:33.679 --> 00:11:36.320 best trade problem finding the best days to buy and 244 00:11:36.360 --> 00:11:37.320 sell gold, say. 245 00:11:37.360 --> 00:11:40.639 You check every possible by day and sell day pair. 246 00:11:40.759 --> 00:11:43.960 Yep, that's o N two. For the knapsack problem, checking 247 00:11:44.000 --> 00:11:47.120 every combination of items that's two n A two quickly 248 00:11:47.159 --> 00:11:48.639 becomes impossible. 249 00:11:48.240 --> 00:11:51.279 So brute force is often too slow. How do we 250 00:11:51.320 --> 00:11:54.440 make it smarter? Can we avoid checking everything? 251 00:11:54.679 --> 00:11:57.720 Absolutely? A key optimization is backtracking. 252 00:11:57.840 --> 00:11:58.759 How does that work? 253 00:11:59.080 --> 00:12:02.200 Think of the eight queen puzzle again. Instead of placing 254 00:12:02.279 --> 00:12:06.960 queens randomly, backtracking places them one by one, ensuring each 255 00:12:07.039 --> 00:12:08.399 placement is valid so far. 256 00:12:08.480 --> 00:12:10.440 And if it hits a dead end, if. 257 00:12:10.279 --> 00:12:12.840 It places a queen and realizes there's no valid spot 258 00:12:12.840 --> 00:12:15.960 for the next one, it rolls back its last move, 259 00:12:16.240 --> 00:12:18.240 takes the queen off, and tries a different spot in 260 00:12:18.279 --> 00:12:19.240 the previous. 261 00:12:18.879 --> 00:12:21.720 Row, failing early. Like you said, exactly, Yeah. 262 00:12:21.639 --> 00:12:24.279 It prunes entire branches of the search tree that can't 263 00:12:24.279 --> 00:12:28.039 possibly lead to a solution, cuts down the search space dramatically. 264 00:12:28.320 --> 00:12:31.720 That's clever. But what if even backtracking is too slow? 265 00:12:31.960 --> 00:12:33.840 Or maybe we just need a good enough answer, not 266 00:12:33.919 --> 00:12:35.279 the absolute perfect one. 267 00:12:35.399 --> 00:12:38.200 Right. That's where heuristics come in. They're like rules of thumb. 268 00:12:37.960 --> 00:12:40.360 Or shortcuts, not guaranteed to be optimal. 269 00:12:40.039 --> 00:12:42.720 Not guaranteed no, but they often lead to a pretty 270 00:12:42.759 --> 00:12:46.639 good solution much much faster. A common one is the 271 00:12:46.679 --> 00:12:50.519 greedy approach. Yeah. For the knapsack problem, a greedy approach 272 00:12:50.600 --> 00:12:53.720 might just be keep packing the most valuable items per 273 00:12:53.720 --> 00:12:57.600 pound until the bag is full, don't overthink future possibilities. 274 00:12:57.720 --> 00:12:58.960 Simple does it work? 275 00:12:59.200 --> 00:13:04.080 Sometimes well. For the traveling salesman, a greedy heuristic might 276 00:13:04.120 --> 00:13:08.120 be always go to the nearest unvisited city next. Not 277 00:13:08.279 --> 00:13:12.720 optimal but fast, and sometimes like connecting settlements with minimum 278 00:13:12.799 --> 00:13:16.679 wire power grid problem, the greedy approach actually is the 279 00:13:16.720 --> 00:13:17.559 optimal solution. 280 00:13:17.879 --> 00:13:20.879 Interesting, Okay. One more strategy branch and bound. How does 281 00:13:20.919 --> 00:13:21.360 that fit? In? 282 00:13:21.559 --> 00:13:24.559 Branch and bound is a more sophisticated strategy, usually for 283 00:13:24.600 --> 00:13:27.799 optimization problems where you want the absolute best solution. 284 00:13:27.679 --> 00:13:29.480 Like backtracking, but smarter kind of. 285 00:13:29.639 --> 00:13:33.039 It systematically searches dividing the problem up. But the key 286 00:13:33.200 --> 00:13:36.320 is it uses bounds, estimates of the best possible outcome 287 00:13:36.360 --> 00:13:37.120 in a given branch. 288 00:13:37.440 --> 00:13:37.919 Bounds. 289 00:13:38.120 --> 00:13:41.000 Yeah, Like for the knapsack problem, you could calculate an 290 00:13:41.000 --> 00:13:44.519 optimistic upper bound on the profit you could get from 291 00:13:44.519 --> 00:13:48.159 a partial solution, for example, allowing fractions of items. If 292 00:13:48.200 --> 00:13:50.600 that upper bound is still worse than a complete solution 293 00:13:50.679 --> 00:13:51.559 you've already found. 294 00:13:51.320 --> 00:13:53.960 Elsewhere, you just ditch that whole branch exactly. 295 00:13:54.559 --> 00:13:57.480 You prune it, knowing it can't possibly lead to the 296 00:13:57.480 --> 00:14:01.639 best overall answer. This raise is in shortant question, how 297 00:14:01.679 --> 00:14:05.360 can code go from super slow to reasonably paced. This 298 00:14:05.440 --> 00:14:08.240 kind of intelligent pruning is often the key. It lets 299 00:14:08.240 --> 00:14:11.600 you find the optimal solution way faster than simple brute 300 00:14:11.600 --> 00:14:13.080 force or basic backtracking. 301 00:14:13.240 --> 00:14:15.879 Okay, so we've got ways to think about problems, ways 302 00:14:15.879 --> 00:14:18.960 to strategize for efficiency. But what about the data itself? 303 00:14:19.360 --> 00:14:22.440 The information these algorithms work on. How do we manage that. 304 00:14:22.919 --> 00:14:26.320 Great question that brings us to organizing the data effectively? 305 00:14:26.360 --> 00:14:30.120 Which starts with abstract data types or ADTs. The book 306 00:14:30.200 --> 00:14:32.000 uses a card dashboard analogy. 307 00:14:32.120 --> 00:14:33.960 Yeah, it's a good one. The dashboard hides all the 308 00:14:33.960 --> 00:14:37.519 complex engine mechanics, right, you just use the control speed fuel. 309 00:14:37.639 --> 00:14:40.559 So ADT's hide the data details exactly if. 310 00:14:40.480 --> 00:14:43.600 They define a set of operations like push, pop, find 311 00:14:43.600 --> 00:14:46.480 without specifying how those operations are implemented or how the 312 00:14:46.559 --> 00:14:50.480 data is stored underneath. What's the advantage Simpler code flexibility. 313 00:14:50.759 --> 00:14:53.840 You can swap out the underline implementation later without changing 314 00:14:53.840 --> 00:14:57.879 the code that uses the ADT Reasonability, better organization, fewer bugs, 315 00:14:58.559 --> 00:15:02.240 lots of benefits. Interact with the interface, not the messy internals. 316 00:15:02.879 --> 00:15:05.320 And there are several common ADTs we use all the time, 317 00:15:05.399 --> 00:15:08.559 like a stack YEP, a stack is lifo last in, 318 00:15:08.879 --> 00:15:11.840 first out. Think of the undoe button in your text editor. 319 00:15:11.960 --> 00:15:14.480 The last thing you did gets undone first, like a stack. 320 00:15:14.240 --> 00:15:17.919 Of plates exactly. Then there's a queue which is fifo 321 00:15:18.120 --> 00:15:21.080 first in, first out, like a line for coffee or 322 00:15:21.159 --> 00:15:22.200 print jobs waiting for. 323 00:15:22.159 --> 00:15:23.919 The printer first come for served. 324 00:15:24.080 --> 00:15:27.720 Right. A priority queue is similar, but items have priorities. 325 00:15:28.039 --> 00:15:30.639 Higher priority items jump the queue like in an er. 326 00:15:30.840 --> 00:15:31.360 Makes sense. 327 00:15:31.480 --> 00:15:34.120 A list lets you insert or remove items anywhere. A 328 00:15:34.200 --> 00:15:37.080 map or dictionary stores key value pairs like a word 329 00:15:37.080 --> 00:15:40.480 in its definition, and a set holds unique items, like 330 00:15:40.519 --> 00:15:42.159 a collection of unique user names. 331 00:15:42.279 --> 00:15:44.360 So those are the abstract ideas. What about the actual 332 00:15:44.440 --> 00:15:47.799 data structures, the concrete ways data is organized in memory? 333 00:15:47.919 --> 00:15:50.720 Right, this is the implementation layer, like a rayse arrays 334 00:15:50.759 --> 00:15:54.039 or fundamental data storage side by side in memory. This 335 00:15:54.159 --> 00:15:57.240 gives you instant access. Oh one, if you know the index, 336 00:15:57.320 --> 00:16:00.759 the position that they're rigid, Yeah, fixed size. Usually resizing 337 00:16:00.799 --> 00:16:02.639 can be slow because you might have to copy everything. 338 00:16:02.960 --> 00:16:05.799 Then you have linked lists, chains of items. 339 00:16:05.480 --> 00:16:07.120 Cans of cells. Yeah yeah, they don't need to be 340 00:16:07.159 --> 00:16:10.519 contiguous in memory, very flexible for adding or removing items, 341 00:16:10.559 --> 00:16:13.759 just relink the pointers. But finding an item takes time, 342 00:16:14.600 --> 00:16:16.639 oh when because you might have to follow the chain 343 00:16:16.639 --> 00:16:17.159 from the start. 344 00:16:17.200 --> 00:16:21.279 Trade offs again, speed versus flexibility always trade offs. 345 00:16:21.799 --> 00:16:25.879 Then we get into trees. Hierarchical structures like a family 346 00:16:25.960 --> 00:16:30.279 tree sort of binary search trees are really important. They 347 00:16:30.360 --> 00:16:32.879 keep data sorted in a way that allows very fast 348 00:16:32.919 --> 00:16:36.440 searching how fast O log in average. If the tree 349 00:16:36.440 --> 00:16:39.879 is balanced, that means it doesn't get too lopsided. Keeping 350 00:16:39.960 --> 00:16:42.960 trees balanced is key, which leads to self balancing trees 351 00:16:43.000 --> 00:16:45.759 like avl or red black trees and things like B 352 00:16:45.919 --> 00:16:48.240 trees are optimalist for storing tree structures on disk. 353 00:16:48.600 --> 00:16:51.639 Okay, and one more structure. Hash tables. You said they're 354 00:16:51.679 --> 00:16:52.240 like magic. 355 00:16:52.480 --> 00:16:56.039 They feel like magic. Sometimes they allow finding items in 356 00:16:56.320 --> 00:17:00.759 one time on average constant time. They use hash function, 357 00:17:01.240 --> 00:17:04.640 a special calculation to convert the item's key directly into 358 00:17:04.640 --> 00:17:06.680 a memory address or close. 359 00:17:06.400 --> 00:17:08.920 To it, so you jump right to the data pretty much. 360 00:17:08.960 --> 00:17:12.519 The main challenge is hash collisions when different keys hash 361 00:17:12.599 --> 00:17:14.640 to the same spot. There are ways to handle that, 362 00:17:14.920 --> 00:17:17.160 but yeah, hash tables are behind maps and sets when 363 00:17:17.200 --> 00:17:18.880 you need that super fast access. 364 00:17:18.960 --> 00:17:23.000 Okay, stepping back from individual structures, managing huge amounts of 365 00:17:23.079 --> 00:17:26.880 data brings us to databases and relational databases have been 366 00:17:26.960 --> 00:17:28.880 king for a long time. How do they work? 367 00:17:29.079 --> 00:17:33.640 They organize data into tables think spreadsheets basically with rows 368 00:17:33.640 --> 00:17:36.880 and columns with strict rules, very strict rules defined by 369 00:17:36.880 --> 00:17:39.599 a schema. It dictates what kind of data goes in 370 00:17:39.680 --> 00:17:43.839 each column field for each rope entry. But the real 371 00:17:43.920 --> 00:17:47.119 power isn't just the tables. It's the relationships between tables. 372 00:17:47.160 --> 00:17:47.960 How do you link them? 373 00:17:48.240 --> 00:17:51.480 Using primary keys unique IDs for each row and a table, 374 00:17:51.599 --> 00:17:54.319 and foreign keys which are references to primary keys in 375 00:17:54.440 --> 00:17:55.119 other tables. 376 00:17:55.200 --> 00:17:57.279 Ah, So you don't repeat information. 377 00:17:57.240 --> 00:18:01.000 Exactly, You avoid data duplication. I don't store a customer's 378 00:18:01.000 --> 00:18:03.640 full address with every single order they place. You have 379 00:18:03.680 --> 00:18:06.720 a customer's table and an order's table linked by a 380 00:18:06.720 --> 00:18:11.000 customer id. This process is called normalization. It keeps data 381 00:18:11.039 --> 00:18:12.920 consistent and reduces redundancy. 382 00:18:13.160 --> 00:18:17.319 And we talk to these databases using SQL right structured query. 383 00:18:17.039 --> 00:18:18.759 Language, that's the standard language. Yeah. 384 00:18:18.839 --> 00:18:21.480 Key commands are like select from. 385 00:18:21.480 --> 00:18:25.079 Where, order by for sorting, group by why for aggregation, 386 00:18:25.559 --> 00:18:29.039 and crucially join what is join do? Join lets you 387 00:18:29.079 --> 00:18:32.680 combine data from multiple related tables based on those keys 388 00:18:32.720 --> 00:18:36.359 we talked about, like getting customer names alongside their order details. 389 00:18:36.480 --> 00:18:37.759 It's incredibly powerful. 390 00:18:37.799 --> 00:18:40.160 That also maybe slow it can be. 391 00:18:40.400 --> 00:18:43.160 Joining very large tables is computationally expensive. That's kind of 392 00:18:43.160 --> 00:18:45.960 the power and weakness of the relational model. So to 393 00:18:45.960 --> 00:18:48.079 speed things up, we use indexing. 394 00:18:47.759 --> 00:18:49.960 Like an index in a book similar idea. 395 00:18:50.079 --> 00:18:52.799 It's a separate data structure, often a self balancing tree 396 00:18:53.039 --> 00:18:57.000 that maps index values like IDs to their actual location 397 00:18:57.079 --> 00:18:57.559 in the table. 398 00:18:57.680 --> 00:19:00.200 So queries are faster, much faster. 399 00:19:00.599 --> 00:19:03.960 O log instead of scanning the whole table. Sorting is 400 00:19:03.960 --> 00:19:08.039 faster too, But indexes have costs, take up space, and 401 00:19:08.079 --> 00:19:10.039