WEBVTT

1
00:00:00.120 --> 00:00:02.200
<v Speaker 1>Welcome to the deep Dive. We're of the show that

2
00:00:02.200 --> 00:00:05.160
<v Speaker 1>cuts through the noise, getting straight to the fascinating core

3
00:00:05.280 --> 00:00:09.039
<v Speaker 1>of well complex topics. That's the plan, and today we

4
00:00:09.119 --> 00:00:12.960
<v Speaker 1>are diving headfirst into the fundamentals of computer science. We're

5
00:00:13.039 --> 00:00:18.239
<v Speaker 1>drawing insights from computer science distilled by a Watston Fererrafilo.

6
00:00:17.879 --> 00:00:20.239
<v Speaker 2>A great resource, absolutely.

7
00:00:19.839 --> 00:00:22.719
<v Speaker 1>And our mission isn't just for coders, right, It's for

8
00:00:22.760 --> 00:00:26.839
<v Speaker 1>anyone who wants to sharpen their computational.

9
00:00:26.160 --> 00:00:30.719
<v Speaker 2>Thinking exactly to really understand the invisible forces powering our

10
00:00:30.760 --> 00:00:31.440
<v Speaker 2>digital world.

11
00:00:31.920 --> 00:00:34.359
<v Speaker 1>And you know it goes way beyond just the machines.

12
00:00:34.039 --> 00:00:37.280
<v Speaker 2>Themselves, it really does. There's that famous quote from Esger Dikstra,

13
00:00:37.359 --> 00:00:40.119
<v Speaker 2>a real pioneer. Oh yeah, he said, computer science is

14
00:00:40.159 --> 00:00:43.079
<v Speaker 2>not about machines in the same way that astronomy is

15
00:00:43.119 --> 00:00:44.240
<v Speaker 2>not about telescopes.

16
00:00:44.359 --> 00:00:46.000
<v Speaker 1>Huh. I like that.

17
00:00:46.119 --> 00:00:49.719
<v Speaker 2>It's fundamentally about problem solving. It's a systematic way of

18
00:00:49.759 --> 00:00:53.280
<v Speaker 2>thinking that helps us break down incredibly complex challenges.

19
00:00:53.520 --> 00:00:56.679
<v Speaker 1>That quote perfectly sets the stage, doesn't it. So you

20
00:00:56.840 --> 00:00:59.159
<v Speaker 1>listening are about to take a shortcut sort of to

21
00:00:59.240 --> 00:01:02.880
<v Speaker 1>being truly well informed about these foundations, the stuff that

22
00:01:02.880 --> 00:01:07.000
<v Speaker 1>builds everything from your phone apps to global networks. Okay,

23
00:01:07.079 --> 00:01:10.359
<v Speaker 1>so if computer science isn't just about the physical machines,

24
00:01:11.840 --> 00:01:14.120
<v Speaker 1>where do we even begin when we talk about the

25
00:01:14.200 --> 00:01:16.120
<v Speaker 1>art of solving computational problems?

26
00:01:16.319 --> 00:01:18.840
<v Speaker 2>Well, we begin right at the start, taking a big,

27
00:01:19.079 --> 00:01:20.920
<v Speaker 2>maybe fuzzy idea and breaking.

28
00:01:20.640 --> 00:01:22.560
<v Speaker 1>It down into smaller pieces.

29
00:01:22.200 --> 00:01:26.480
<v Speaker 2>Exactly definable chunks. Translating a human thought into something that

30
00:01:26.680 --> 00:01:29.000
<v Speaker 2>you know could eventually be processed step by step.

31
00:01:29.200 --> 00:01:31.840
<v Speaker 1>Okay, that makes a lot of sense. So what tools

32
00:01:31.840 --> 00:01:35.480
<v Speaker 1>do we use for that initial modeling of ideas well?

33
00:01:36.079 --> 00:01:38.840
<v Speaker 2>Visualizing the process is a great start. I think of flow.

34
00:01:38.680 --> 00:01:40.480
<v Speaker 1>Charts, right, like drawing it out.

35
00:01:40.680 --> 00:01:43.480
<v Speaker 2>Yeah, the book mentioned how wikipedians actually used a flow

36
00:01:43.560 --> 00:01:46.120
<v Speaker 2>chart to map out their whole collaborative editing process.

37
00:01:46.280 --> 00:01:48.480
<v Speaker 1>Really for human interaction.

38
00:01:48.280 --> 00:01:52.920
<v Speaker 2>Yep, showing how discussions, edits agreements would flow. It's powerful

39
00:01:52.920 --> 00:01:56.000
<v Speaker 2>for complex interactions, not just code or you know, a

40
00:01:56.040 --> 00:01:59.359
<v Speaker 2>simpler example just mapping the steps to find the biggest

41
00:01:59.359 --> 00:02:00.200
<v Speaker 2>of three numbers.

42
00:02:00.319 --> 00:02:02.439
<v Speaker 1>I can see how that would clarify things. Okay, what

43
00:02:02.519 --> 00:02:05.760
<v Speaker 1>about pseudo code. I've had that term thrown around. Ugh.

44
00:02:05.879 --> 00:02:12.240
<v Speaker 2>Pseudocode another fantastic tool. It's like human friendly code, so

45
00:02:12.400 --> 00:02:16.960
<v Speaker 2>not actual programming language. It completely disregards strict syntax rules.

46
00:02:17.000 --> 00:02:19.080
<v Speaker 2>You just write out the steps in plain English or

47
00:02:19.080 --> 00:02:20.000
<v Speaker 2>whatever works for you.

48
00:02:20.080 --> 00:02:24.400
<v Speaker 1>So flow chart, pseudocode, they're basically tools for thinking clearly.

49
00:02:24.439 --> 00:02:27.520
<v Speaker 2>Then that's it, helping you visualize and refine the process

50
00:02:27.759 --> 00:02:32.039
<v Speaker 2>before you even think about writing actual rentable code, structuring

51
00:02:32.039 --> 00:02:32.479
<v Speaker 2>your thoughts.

52
00:02:32.520 --> 00:02:36.719
<v Speaker 1>Logically, and speaking of logic, the book makes this interesting point.

53
00:02:37.000 --> 00:02:40.400
<v Speaker 1>Coders work with logic so much it messes their minds,

54
00:02:40.879 --> 00:02:42.680
<v Speaker 1>but often use it unknowingly.

55
00:02:43.039 --> 00:02:44.599
<v Speaker 2>Uh huh, Yeah, there's some truth to that.

56
00:02:44.719 --> 00:02:48.280
<v Speaker 1>So what's the benefit of understanding logic consciously?

57
00:02:48.439 --> 00:02:51.919
<v Speaker 2>It's about precision. Formal logic lets us work with definite

58
00:02:51.960 --> 00:02:54.599
<v Speaker 2>true or false statements black and white pretty much. Yeah,

59
00:02:54.719 --> 00:02:59.080
<v Speaker 2>understanding logical operators A and D or xor conditional negation.

60
00:02:59.360 --> 00:03:00.199
<v Speaker 2>It's all boole.

61
00:03:00.159 --> 00:03:03.039
<v Speaker 1>Algebra really, giving us rules to simplify things exactly.

62
00:03:03.039 --> 00:03:05.199
<v Speaker 2>The source has a great example the hot server problem.

63
00:03:05.360 --> 00:03:08.360
<v Speaker 2>A server crashes if it's overheating and the AC is off, okay,

64
00:03:08.439 --> 00:03:10.639
<v Speaker 2>or if it's overheating and the Chassi cooler fails.

65
00:03:10.960 --> 00:03:13.719
<v Speaker 1>Right, So you'd write that out as a logical expression.

66
00:03:13.199 --> 00:03:19.120
<v Speaker 2>Precisely, maybe overheating andy acoff or overheating and de cooler failed.

67
00:03:19.199 --> 00:03:21.120
<v Speaker 1>What's complicated, But here's the power.

68
00:03:21.719 --> 00:03:26.319
<v Speaker 2>Using logic rules like distributivity from Boolean algebra, you simplify

69
00:03:26.360 --> 00:03:30.479
<v Speaker 2>that it becomes overheating and d acoff or cooler failed.

70
00:03:30.639 --> 00:03:31.680
<v Speaker 2>Much clearer, right.

71
00:03:31.639 --> 00:03:34.520
<v Speaker 1>Yeah, that's much simpler. You see immediately overheating is key

72
00:03:34.560 --> 00:03:36.240
<v Speaker 1>plus one of the cooling failures.

73
00:03:36.280 --> 00:03:39.960
<v Speaker 2>Exactly. This kind of simplification saves a ton of effort

74
00:03:40.000 --> 00:03:42.479
<v Speaker 2>in complex systems. It gives you the precise conditions.

75
00:03:42.520 --> 00:03:45.759
<v Speaker 1>Okay, great illustration, And to analyze these logical models you

76
00:03:45.840 --> 00:03:47.479
<v Speaker 1>mentioned truth tables.

77
00:03:47.759 --> 00:03:52.840
<v Speaker 2>Ah, yes, truth tables are like comprehensive diagnostic tools. Oh

78
00:03:52.919 --> 00:03:56.479
<v Speaker 2>so they lay out every single possible combination of inputs

79
00:03:56.520 --> 00:03:59.599
<v Speaker 2>truefall for each condition and show the outcome for each one.

80
00:04:00.120 --> 00:04:01.199
<v Speaker 1>Check every scenario.

81
00:04:01.400 --> 00:04:04.960
<v Speaker 2>Yep. The book's fragile system example about designing a database

82
00:04:05.280 --> 00:04:09.479
<v Speaker 2>had four variables that means sixteen possible states. A truth

83
00:04:09.479 --> 00:04:10.879
<v Speaker 2>table maps mole out.

84
00:04:10.800 --> 00:04:12.280
<v Speaker 1>Ensures you don't miss any weird.

85
00:04:12.080 --> 00:04:14.360
<v Speaker 2>Edge cases precisely. It's systematic.

86
00:04:14.599 --> 00:04:17.720
<v Speaker 1>Now, what's really fascinating here is how this abstract logic

87
00:04:18.120 --> 00:04:21.839
<v Speaker 1>connects directly to the hardware. Right, how do true false

88
00:04:21.879 --> 00:04:23.680
<v Speaker 1>becomes something a machine understands?

89
00:04:23.800 --> 00:04:26.920
<v Speaker 2>It's kind of brilliant in its simplicity, True and false

90
00:04:27.040 --> 00:04:30.720
<v Speaker 2>become high or low voltage signals signals right, right, And

91
00:04:30.800 --> 00:04:35.759
<v Speaker 2>these logical operations an D or not T are performed

92
00:04:35.800 --> 00:04:38.560
<v Speaker 2>by tiny electrical circuits. We call them logic gates.

93
00:04:38.639 --> 00:04:39.040
<v Speaker 1>Got it.

94
00:04:39.240 --> 00:04:41.759
<v Speaker 2>The book even shows how you combine these gates to

95
00:04:41.879 --> 00:04:44.839
<v Speaker 2>actually some numbers, like how two plus three weekls five

96
00:04:44.920 --> 00:04:46.120
<v Speaker 2>works at the binary level.

97
00:04:46.240 --> 00:04:48.839
<v Speaker 1>Wow, so's the CPU, the brain of the computer.

98
00:04:48.959 --> 00:04:52.160
<v Speaker 2>It's working. Principles are still fundamentally based on this Boolean algebra.

99
00:04:52.439 --> 00:04:56.519
<v Speaker 2>It's millions billions of these tiny switches manipulating voltages.

100
00:04:56.600 --> 00:05:00.360
<v Speaker 1>Incredible. Okay, So logic defines conditions. What about the sheer

101
00:05:00.360 --> 00:05:03.680
<v Speaker 1>scale of possibilities that brings us to counting? Why is

102
00:05:03.720 --> 00:05:06.000
<v Speaker 1>something basic counting so vital here?

103
00:05:06.040 --> 00:05:08.480
<v Speaker 2>Because it builds your intuition, your gut feeling for how

104
00:05:08.519 --> 00:05:12.680
<v Speaker 2>possibilities combine and often explode explode. Yeah, it's not about

105
00:05:12.720 --> 00:05:16.199
<v Speaker 2>memorizing formulas, but grasping the principles, like cracking a simple

106
00:05:16.279 --> 00:05:18.360
<v Speaker 2>pin two digits one letter.

107
00:05:18.399 --> 00:05:21.759
<v Speaker 1>Okay, so ten choices, ten choices, twenty six choices, right.

108
00:05:21.680 --> 00:05:24.680
<v Speaker 2>You multiply them ten by ten by twenty six, it's

109
00:05:24.720 --> 00:05:28.639
<v Speaker 2>twenty six hundred possibilities, or the team building example. Twenty

110
00:05:28.639 --> 00:05:31.600
<v Speaker 2>three candidates each can be in or out.

111
00:05:31.399 --> 00:05:33.639
<v Speaker 1>So two choices for each of the twenty three.

112
00:05:33.759 --> 00:05:36.279
<v Speaker 2>That's two to the power of twenty three possible teams.

113
00:05:36.519 --> 00:05:38.040
<v Speaker 2>The numbers grow fast.

114
00:05:37.839 --> 00:05:40.879
<v Speaker 1>And sometimes they grow astronomically fast, like with permutations right

115
00:05:40.920 --> 00:05:42.519
<v Speaker 1>where order matters exactly.

116
00:05:42.759 --> 00:05:45.160
<v Speaker 2>Permutations are about the number of ways to order things.

117
00:05:45.360 --> 00:05:47.319
<v Speaker 2>The classic traveling salesman.

118
00:05:47.040 --> 00:05:48.720
<v Speaker 1>Problem finding the shortest route.

119
00:05:48.800 --> 00:05:52.519
<v Speaker 2>Yeah, just fifteen cities, yeh. Try to take every possible

120
00:05:52.600 --> 00:05:56.079
<v Speaker 2>route order you're looking at over one point three trillion

121
00:05:56.360 --> 00:05:59.879
<v Speaker 2>routes trillion. Wow, And if each calculation takes a microsecond,

122
00:06:00.040 --> 00:06:03.279
<v Speaker 2>that's fifteen days of compute time for twenty cities. It

123
00:06:03.360 --> 00:06:05.319
<v Speaker 2>jumps to seventy seven thousand years.

124
00:06:05.399 --> 00:06:06.360
<v Speaker 1>That's insane.

125
00:06:06.519 --> 00:06:09.639
<v Speaker 2>The precious tune example too, thirteen note six note melodies,

126
00:06:10.199 --> 00:06:13.959
<v Speaker 2>over one point two million possibilities. It shows that factorials

127
00:06:14.000 --> 00:06:16.480
<v Speaker 2>just make things explode computationally staggering.

128
00:06:16.519 --> 00:06:18.839
<v Speaker 1>Okay, what if the order doesn't matter? Like in combination.

129
00:06:19.079 --> 00:06:22.519
<v Speaker 2>Combinations are about selecting items where arrangement isn't important, like

130
00:06:22.600 --> 00:06:23.680
<v Speaker 2>the chess queen's.

131
00:06:23.319 --> 00:06:25.480
<v Speaker 1>Problem placing eight queens so none attack each other.

132
00:06:25.720 --> 00:06:27.600
<v Speaker 2>Right, if we were just choosing eight squares out of

133
00:06:27.600 --> 00:06:30.040
<v Speaker 2>sixty four, ignoring the attack rules for a moment, that's

134
00:06:30.040 --> 00:06:32.800
<v Speaker 2>sixty four, choose eight. It was about four point four

135
00:06:32.839 --> 00:06:34.560
<v Speaker 2>billion ways.

136
00:06:33.920 --> 00:06:38.040
<v Speaker 1>Still huge, but less than the traveling salesman for fifteen cities.

137
00:06:38.160 --> 00:06:42.000
<v Speaker 2>Often significantly smaller than permutations. Yeah yeah, but still massive

138
00:06:42.079 --> 00:06:42.920
<v Speaker 2>numbers to deal with.

139
00:06:43.040 --> 00:06:45.519
<v Speaker 1>And then there are sums. I remember Gauss's trick for

140
00:06:45.560 --> 00:06:48.079
<v Speaker 1>adding one to one hundred. How does that play in?

141
00:06:48.160 --> 00:06:50.720
<v Speaker 2>It's a classic shortcut, right, Yeah. That one plus two

142
00:06:50.720 --> 00:06:53.360
<v Speaker 2>plus ro plus and n plus one two formula the

143
00:06:53.360 --> 00:06:56.240
<v Speaker 2>flying cheap airline ticket problem. And the source uses this

144
00:06:56.879 --> 00:06:59.759
<v Speaker 2>finding the cheapest flight over thirty days. You check every

145
00:06:59.759 --> 00:07:00.480
<v Speaker 2>pair of star.

146
00:07:00.399 --> 00:07:02.680
<v Speaker 1>Ten dates, so day one to day two, day one

147
00:07:02.800 --> 00:07:05.560
<v Speaker 1>to day three up to day twenty ninety day thirty exactly.

148
00:07:05.639 --> 00:07:07.720
<v Speaker 2>That's a sum one plus two plus b L plus

149
00:07:07.720 --> 00:07:09.879
<v Speaker 2>twenty nine, which Gauss's trick tells you as four hundred

150
00:07:09.879 --> 00:07:12.759
<v Speaker 2>and thirty five pairs. Shows how even simple sums lead

151
00:07:12.800 --> 00:07:13.839
<v Speaker 2>to lots of operations.

152
00:07:13.879 --> 00:07:15.920
<v Speaker 1>And this leads into probability, too, doesn't it.

153
00:07:16.040 --> 00:07:20.759
<v Speaker 2>Absolutely. Probability concepts are critical for managing uncertainty thinking about

154
00:07:20.800 --> 00:07:24.519
<v Speaker 2>independent events. Hey, like backing up data using three cheap

155
00:07:24.600 --> 00:07:27.680
<v Speaker 2>discs might actually give you a lower overall failure risk

156
00:07:27.959 --> 00:07:31.079
<v Speaker 2>than one super expensive disc if their failures are independent,

157
00:07:31.319 --> 00:07:32.240
<v Speaker 2>interesting trade off.

158
00:07:32.360 --> 00:07:35.920
<v Speaker 1>Or mutually exclusive events like choosing one subscription plan. And

159
00:07:36.079 --> 00:07:40.439
<v Speaker 1>understanding this helps avoid things like the gambler's fallacy, believing.

160
00:07:40.000 --> 00:07:42.439
<v Speaker 2>A coin is due for heads after a run of tails.

161
00:07:42.600 --> 00:07:45.920
<v Speaker 1>Yeah, past independent events just don't affect future ones. It

162
00:07:45.959 --> 00:07:50.319
<v Speaker 1>seems obvious, but it trips people up. So okay, with modeling,

163
00:07:50.720 --> 00:07:54.759
<v Speaker 1>logic and counting, we can define and understand problems. But

164
00:07:54.839 --> 00:07:57.439
<v Speaker 1>we have a solution. How do we know if it's

165
00:07:57.439 --> 00:07:59.839
<v Speaker 1>actually good or you know, fast?

166
00:08:00.160 --> 00:08:03.160
<v Speaker 2>Yeah, that feels like the crucial next step. It brings

167
00:08:03.199 --> 00:08:07.040
<v Speaker 2>us to complexity. The book asks sorting twenty six cards,

168
00:08:07.120 --> 00:08:09.360
<v Speaker 2>how long does it take? What if you have fifty

169
00:08:09.399 --> 00:08:11.199
<v Speaker 2>two cards? Does it just take twice as long?

170
00:08:11.279 --> 00:08:13.959
<v Speaker 1>It's often more nuanced than that, and it's absolutely critical.

171
00:08:13.959 --> 00:08:16.399
<v Speaker 1>We talk about time complexity, usually written as T on

172
00:08:16.639 --> 00:08:17.639
<v Speaker 1>N T O N.

173
00:08:17.759 --> 00:08:18.000
<v Speaker 2>Yeah.

174
00:08:18.040 --> 00:08:20.680
<v Speaker 1>It measures how the number of operations and algorithm performs

175
00:08:20.720 --> 00:08:24.279
<v Speaker 1>scales with the input size N, and we usually focus

176
00:08:24.279 --> 00:08:25.079
<v Speaker 1>on the worst case.

177
00:08:25.160 --> 00:08:27.120
<v Speaker 2>Planning for the worst makes sense.

178
00:08:26.920 --> 00:08:29.040
<v Speaker 1>And the standard way to talk about this growth is

179
00:08:29.079 --> 00:08:32.159
<v Speaker 1>big O notation. It focuses on the dominant term how

180
00:08:32.159 --> 00:08:33.320
<v Speaker 1>it scales, like.

181
00:08:33.320 --> 00:08:37.200
<v Speaker 2>The biggest factor at n gets large exactly. For example,

182
00:08:37.360 --> 00:08:39.960
<v Speaker 2>a simple method like selection sword is O N two

183
00:08:40.799 --> 00:08:43.519
<v Speaker 2>O of n squared, meaning meaning if you double the

184
00:08:43.559 --> 00:08:46.919
<v Speaker 2>input size n, the time it takes roughly quadruples. Because

185
00:08:46.919 --> 00:08:49.399
<v Speaker 2>of that N squared factor, it grows quadratically.

186
00:08:49.440 --> 00:08:52.559
<v Speaker 1>Ouch. And here's where we get that aha moment, right.

187
00:08:53.200 --> 00:08:56.000
<v Speaker 1>Comparing something like om log in to O N two.

188
00:08:57.120 --> 00:08:58.440
<v Speaker 1>The difference is huge.

189
00:08:58.639 --> 00:09:01.919
<v Speaker 2>It's absolutely mass especially for large inputs. The source has

190
00:09:01.919 --> 00:09:03.759
<v Speaker 2>a table. It's shocking.

191
00:09:03.879 --> 00:09:04.639
<v Speaker 1>Give us an idea.

192
00:09:04.679 --> 00:09:07.480
<v Speaker 2>Okay, for a million items, an O N two algorithm

193
00:09:07.600 --> 00:09:10.639
<v Speaker 2>might literally take years to rent years. Yeah, But an

194
00:09:10.639 --> 00:09:13.799
<v Speaker 2>O N log n algorithm for the same million items,

195
00:09:13.840 --> 00:09:14.480
<v Speaker 2>it might take.

196
00:09:14.399 --> 00:09:17.480
<v Speaker 1>Minutes minutes versus years. That's incredible.

197
00:09:17.600 --> 00:09:20.200
<v Speaker 2>That log in factor makes a profound difference. It often

198
00:09:20.200 --> 00:09:23.159
<v Speaker 2>comes from algorithms that repeatedly cut the problem size in half.

199
00:09:23.440 --> 00:09:26.120
<v Speaker 2>It's why finding efficient algorithms is so important.

200
00:09:26.200 --> 00:09:28.440
<v Speaker 1>So O N two is bad for big inputs, O

201
00:09:28.600 --> 00:09:30.679
<v Speaker 1>en log in is much better. What about even worse

202
00:09:30.720 --> 00:09:32.799
<v Speaker 1>things like exponential algorithms?

203
00:09:32.799 --> 00:09:35.799
<v Speaker 2>Ah, those are the scary ones, things like two oh

204
00:09:35.919 --> 00:09:38.600
<v Speaker 2>of two. To the end's they grow so incredibly fast

205
00:09:38.600 --> 00:09:42.440
<v Speaker 2>they're considered not runnable for anything but really tiny inputs.

206
00:09:42.759 --> 00:09:46.879
<v Speaker 2>Even supercomputers can't handle them. Once NN gets even moderately.

207
00:09:46.480 --> 00:09:49.720
<v Speaker 1>Large, like the traveling salesman problem again exactly.

208
00:09:50.000 --> 00:09:53.240
<v Speaker 2>That often falls into this category, along with other famously

209
00:09:53.480 --> 00:09:57.559
<v Speaker 2>hard problems called NP complete problems like the knapsack problem.

210
00:09:57.679 --> 00:09:59.559
<v Speaker 1>And isn't there a prize for solving one of those

211
00:09:59.600 --> 00:10:01.080
<v Speaker 1>efficient Huh?

212
00:10:01.159 --> 00:10:03.679
<v Speaker 2>Yes? Yeah, that's the million dollar prize from the Clay

213
00:10:03.679 --> 00:10:07.559
<v Speaker 2>Mathematics Institute. If you find a non exponential algorithm for

214
00:10:07.639 --> 00:10:11.320
<v Speaker 2>any NP complete problem, you solve a major open question

215
00:10:11.399 --> 00:10:12.279
<v Speaker 2>in computer science.

216
00:10:12.399 --> 00:10:15.639
<v Speaker 1>Wow. Okay, So knowing this, how do we devise clever

217
00:10:15.759 --> 00:10:18.960
<v Speaker 1>strategies to solve problems more efficiently? Like the quote says,

218
00:10:19.120 --> 00:10:21.440
<v Speaker 1>if you find a good move, look for a better one.

219
00:10:21.519 --> 00:10:22.799
<v Speaker 2>Right, we need strategies.

220
00:10:23.080 --> 00:10:26.080
<v Speaker 1>Let's start simple iteration just using loops right yep.

221
00:10:26.480 --> 00:10:29.879
<v Speaker 2>Repeating a process like merging two sorted lists you just

222
00:10:30.240 --> 00:10:33.240
<v Speaker 2>iteratively compare the front items simple and effective.

223
00:10:33.399 --> 00:10:35.759
<v Speaker 1>But iteration can lead to bad complexity too.

224
00:10:35.960 --> 00:10:39.159
<v Speaker 2>It can if you have nested loops, like when generating

225
00:10:39.159 --> 00:10:41.480
<v Speaker 2>a power set all possible subsets.

226
00:10:41.039 --> 00:10:43.879
<v Speaker 1>Of items like all combinations pizza toppings sort of.

227
00:10:43.879 --> 00:10:46.840
<v Speaker 2>Yeah, that can quickly lead to O two end complexity

228
00:10:47.039 --> 00:10:51.360
<v Speaker 2>exponential Again. Then there's recursion, which can be quite elegant.

229
00:10:51.360 --> 00:10:53.080
<v Speaker 1>Where a function calls itself.

230
00:10:52.799 --> 00:10:56.399
<v Speaker 2>Exactly, it calls clones of itself to solve smaller versions

231
00:10:56.440 --> 00:10:59.559
<v Speaker 2>of the same problem. Really intuitive for things like calculating

232
00:10:59.559 --> 00:11:03.080
<v Speaker 2>fibinih numbers or checking if a word is a palindrome.

233
00:11:03.159 --> 00:11:04.840
<v Speaker 1>Simpler code often often.

234
00:11:04.679 --> 00:11:09.480
<v Speaker 2>Yes, but there's a trade off. Recursion can have computational overhead.

235
00:11:09.919 --> 00:11:12.159
<v Speaker 2>The computer needs to keep track of all those calls,

236
00:11:12.480 --> 00:11:15.279
<v Speaker 2>which can use more memory in CPU time than a

237
00:11:15.320 --> 00:11:17.039
<v Speaker 2>straightforward iterative loop.

238
00:11:17.200 --> 00:11:21.200
<v Speaker 1>Simplicity versus performance a classic dilemma. It often is what

239
00:11:21.240 --> 00:11:24.000
<v Speaker 1>about just trying everything? Brute force?

240
00:11:24.320 --> 00:11:27.279
<v Speaker 2>That's exactly what it sounds like. Inspecting all possible solutions

241
00:11:27.360 --> 00:11:30.840
<v Speaker 2>seems inefficient. Often it is very naive, but sometimes it's

242
00:11:30.879 --> 00:11:33.639
<v Speaker 2>the only direct way with the starting point. For the

243
00:11:33.679 --> 00:11:36.320
<v Speaker 2>best trade problem finding the best days to buy and

244
00:11:36.360 --> 00:11:37.320
<v Speaker 2>sell gold, say.

245
00:11:37.360 --> 00:11:40.639
<v Speaker 1>You check every possible by day and sell day pair.

246
00:11:40.759 --> 00:11:43.960
<v Speaker 2>Yep, that's o N two. For the knapsack problem, checking

247
00:11:44.000 --> 00:11:47.120
<v Speaker 2>every combination of items that's two n A two quickly

248
00:11:47.159 --> 00:11:48.639
<v Speaker 2>becomes impossible.

249
00:11:48.240 --> 00:11:51.279
<v Speaker 1>So brute force is often too slow. How do we

250
00:11:51.320 --> 00:11:54.440
<v Speaker 1>make it smarter? Can we avoid checking everything?

251
00:11:54.679 --> 00:11:57.720
<v Speaker 2>Absolutely? A key optimization is backtracking.

252
00:11:57.840 --> 00:11:58.759
<v Speaker 1>How does that work?

253
00:11:59.080 --> 00:12:02.200
<v Speaker 2>Think of the eight queen puzzle again. Instead of placing

254
00:12:02.279 --> 00:12:06.960
<v Speaker 2>queens randomly, backtracking places them one by one, ensuring each

255
00:12:07.039 --> 00:12:08.399
<v Speaker 2>placement is valid so far.

256
00:12:08.480 --> 00:12:10.440
<v Speaker 1>And if it hits a dead end, if.

257
00:12:10.279 --> 00:12:12.840
<v Speaker 2>It places a queen and realizes there's no valid spot

258
00:12:12.840 --> 00:12:15.960
<v Speaker 2>for the next one, it rolls back its last move,

259
00:12:16.240 --> 00:12:18.240
<v Speaker 2>takes the queen off, and tries a different spot in

260
00:12:18.279 --> 00:12:19.240
<v Speaker 2>the previous.

261
00:12:18.879 --> 00:12:21.720
<v Speaker 1>Row, failing early. Like you said, exactly, Yeah.

262
00:12:21.639 --> 00:12:24.279
<v Speaker 2>It prunes entire branches of the search tree that can't

263
00:12:24.279 --> 00:12:28.039
<v Speaker 2>possibly lead to a solution, cuts down the search space dramatically.

264
00:12:28.320 --> 00:12:31.720
<v Speaker 1>That's clever. But what if even backtracking is too slow?

265
00:12:31.960 --> 00:12:33.840
<v Speaker 1>Or maybe we just need a good enough answer, not

266
00:12:33.919 --> 00:12:35.279
<v Speaker 1>the absolute perfect one.

267
00:12:35.399 --> 00:12:38.200
<v Speaker 2>Right. That's where heuristics come in. They're like rules of thumb.

268
00:12:37.960 --> 00:12:40.360
<v Speaker 1>Or shortcuts, not guaranteed to be optimal.

269
00:12:40.039 --> 00:12:42.720
<v Speaker 2>Not guaranteed no, but they often lead to a pretty

270
00:12:42.759 --> 00:12:46.639
<v Speaker 2>good solution much much faster. A common one is the

271
00:12:46.679 --> 00:12:50.519
<v Speaker 2>greedy approach. Yeah. For the knapsack problem, a greedy approach

272
00:12:50.600 --> 00:12:53.720
<v Speaker 2>might just be keep packing the most valuable items per

273
00:12:53.720 --> 00:12:57.600
<v Speaker 2>pound until the bag is full, don't overthink future possibilities.

274
00:12:57.720 --> 00:12:58.960
<v Speaker 1>Simple does it work?

275
00:12:59.200 --> 00:13:04.080
<v Speaker 2>Sometimes well. For the traveling salesman, a greedy heuristic might

276
00:13:04.120 --> 00:13:08.120
<v Speaker 2>be always go to the nearest unvisited city next. Not

277
00:13:08.279 --> 00:13:12.720
<v Speaker 2>optimal but fast, and sometimes like connecting settlements with minimum

278
00:13:12.799 --> 00:13:16.679
<v Speaker 2>wire power grid problem, the greedy approach actually is the

279
00:13:16.720 --> 00:13:17.559
<v Speaker 2>optimal solution.

280
00:13:17.879 --> 00:13:20.879
<v Speaker 1>Interesting, Okay. One more strategy branch and bound. How does

281
00:13:20.919 --> 00:13:21.360
<v Speaker 1>that fit? In?

282
00:13:21.559 --> 00:13:24.559
<v Speaker 2>Branch and bound is a more sophisticated strategy, usually for

283
00:13:24.600 --> 00:13:27.799
<v Speaker 2>optimization problems where you want the absolute best solution.

284
00:13:27.679 --> 00:13:29.480
<v Speaker 1>Like backtracking, but smarter kind of.

285
00:13:29.639 --> 00:13:33.039
<v Speaker 2>It systematically searches dividing the problem up. But the key

286
00:13:33.200 --> 00:13:36.320
<v Speaker 2>is it uses bounds, estimates of the best possible outcome

287
00:13:36.360 --> 00:13:37.120
<v Speaker 2>in a given branch.

288
00:13:37.440 --> 00:13:37.919
<v Speaker 1>Bounds.

289
00:13:38.120 --> 00:13:41.000
<v Speaker 2>Yeah, Like for the knapsack problem, you could calculate an

290
00:13:41.000 --> 00:13:44.519
<v Speaker 2>optimistic upper bound on the profit you could get from

291
00:13:44.519 --> 00:13:48.159
<v Speaker 2>a partial solution, for example, allowing fractions of items. If

292
00:13:48.200 --> 00:13:50.600
<v Speaker 2>that upper bound is still worse than a complete solution

293
00:13:50.679 --> 00:13:51.559
<v Speaker 2>you've already found.

294
00:13:51.320 --> 00:13:53.960
<v Speaker 1>Elsewhere, you just ditch that whole branch exactly.

295
00:13:54.559 --> 00:13:57.480
<v Speaker 2>You prune it, knowing it can't possibly lead to the

296
00:13:57.480 --> 00:14:01.639
<v Speaker 2>best overall answer. This raise is in shortant question, how

297
00:14:01.679 --> 00:14:05.360
<v Speaker 2>can code go from super slow to reasonably paced. This

298
00:14:05.440 --> 00:14:08.240
<v Speaker 2>kind of intelligent pruning is often the key. It lets

299
00:14:08.240 --> 00:14:11.600
<v Speaker 2>you find the optimal solution way faster than simple brute

300
00:14:11.600 --> 00:14:13.080
<v Speaker 2>force or basic backtracking.

301
00:14:13.240 --> 00:14:15.879
<v Speaker 1>Okay, so we've got ways to think about problems, ways

302
00:14:15.879 --> 00:14:18.960
<v Speaker 1>to strategize for efficiency. But what about the data itself?

303
00:14:19.360 --> 00:14:22.440
<v Speaker 1>The information these algorithms work on. How do we manage that.

304
00:14:22.919 --> 00:14:26.320
<v Speaker 2>Great question that brings us to organizing the data effectively?

305
00:14:26.360 --> 00:14:30.120
<v Speaker 1>Which starts with abstract data types or ADTs. The book

306
00:14:30.200 --> 00:14:32.000
<v Speaker 1>uses a card dashboard analogy.

307
00:14:32.120 --> 00:14:33.960
<v Speaker 2>Yeah, it's a good one. The dashboard hides all the

308
00:14:33.960 --> 00:14:37.519
<v Speaker 2>complex engine mechanics, right, you just use the control speed fuel.

309
00:14:37.639 --> 00:14:40.559
<v Speaker 1>So ADT's hide the data details exactly if.

310
00:14:40.480 --> 00:14:43.600
<v Speaker 2>They define a set of operations like push, pop, find

311
00:14:43.600 --> 00:14:46.480
<v Speaker 2>without specifying how those operations are implemented or how the

312
00:14:46.559 --> 00:14:50.480
<v Speaker 2>data is stored underneath. What's the advantage Simpler code flexibility.

313
00:14:50.759 --> 00:14:53.840
<v Speaker 2>You can swap out the underline implementation later without changing

314
00:14:53.840 --> 00:14:57.879
<v Speaker 2>the code that uses the ADT Reasonability, better organization, fewer bugs,

315
00:14:58.559 --> 00:15:02.240
<v Speaker 2>lots of benefits. Interact with the interface, not the messy internals.

316
00:15:02.879 --> 00:15:05.320
<v Speaker 2>And there are several common ADTs we use all the time,

317
00:15:05.399 --> 00:15:08.559
<v Speaker 2>like a stack YEP, a stack is lifo last in,

318
00:15:08.879 --> 00:15:11.840
<v Speaker 2>first out. Think of the undoe button in your text editor.

319
00:15:11.960 --> 00:15:14.480
<v Speaker 1>The last thing you did gets undone first, like a stack.

320
00:15:14.240 --> 00:15:17.919
<v Speaker 2>Of plates exactly. Then there's a queue which is fifo

321
00:15:18.120 --> 00:15:21.080
<v Speaker 2>first in, first out, like a line for coffee or

322
00:15:21.159 --> 00:15:22.200
<v Speaker 2>print jobs waiting for.

323
00:15:22.159 --> 00:15:23.919
<v Speaker 1>The printer first come for served.

324
00:15:24.080 --> 00:15:27.720
<v Speaker 2>Right. A priority queue is similar, but items have priorities.

325
00:15:28.039 --> 00:15:30.639
<v Speaker 2>Higher priority items jump the queue like in an er.

326
00:15:30.840 --> 00:15:31.360
<v Speaker 1>Makes sense.

327
00:15:31.480 --> 00:15:34.120
<v Speaker 2>A list lets you insert or remove items anywhere. A

328
00:15:34.200 --> 00:15:37.080
<v Speaker 2>map or dictionary stores key value pairs like a word

329
00:15:37.080 --> 00:15:40.480
<v Speaker 2>in its definition, and a set holds unique items, like

330
00:15:40.519 --> 00:15:42.159
<v Speaker 2>a collection of unique user names.

331
00:15:42.279 --> 00:15:44.360
<v Speaker 1>So those are the abstract ideas. What about the actual

332
00:15:44.440 --> 00:15:47.799
<v Speaker 1>data structures, the concrete ways data is organized in memory?

333
00:15:47.919 --> 00:15:50.720
<v Speaker 2>Right, this is the implementation layer, like a rayse arrays

334
00:15:50.759 --> 00:15:54.039
<v Speaker 2>or fundamental data storage side by side in memory. This

335
00:15:54.159 --> 00:15:57.240
<v Speaker 2>gives you instant access. Oh one, if you know the index,

336
00:15:57.320 --> 00:16:00.759
<v Speaker 2>the position that they're rigid, Yeah, fixed size. Usually resizing

337
00:16:00.799 --> 00:16:02.639
<v Speaker 2>can be slow because you might have to copy everything.

338
00:16:02.960 --> 00:16:05.799
<v Speaker 1>Then you have linked lists, chains of items.

339
00:16:05.480 --> 00:16:07.120
<v Speaker 2>Cans of cells. Yeah yeah, they don't need to be

340
00:16:07.159 --> 00:16:10.519
<v Speaker 2>contiguous in memory, very flexible for adding or removing items,

341
00:16:10.559 --> 00:16:13.759
<v Speaker 2>just relink the pointers. But finding an item takes time,

342
00:16:14.600 --> 00:16:16.639
<v Speaker 2>oh when because you might have to follow the chain

343
00:16:16.639 --> 00:16:17.159
<v Speaker 2>from the start.

344
00:16:17.200 --> 00:16:21.279
<v Speaker 1>Trade offs again, speed versus flexibility always trade offs.

345
00:16:21.799 --> 00:16:25.879
<v Speaker 2>Then we get into trees. Hierarchical structures like a family

346
00:16:25.960 --> 00:16:30.279
<v Speaker 2>tree sort of binary search trees are really important. They

347
00:16:30.360 --> 00:16:32.879
<v Speaker 2>keep data sorted in a way that allows very fast

348
00:16:32.919 --> 00:16:36.440
<v Speaker 2>searching how fast O log in average. If the tree

349
00:16:36.440 --> 00:16:39.879
<v Speaker 2>is balanced, that means it doesn't get too lopsided. Keeping

350
00:16:39.960 --> 00:16:42.960
<v Speaker 2>trees balanced is key, which leads to self balancing trees

351
00:16:43.000 --> 00:16:45.759
<v Speaker 2>like avl or red black trees and things like B

352
00:16:45.919 --> 00:16:48.240
<v Speaker 2>trees are optimalist for storing tree structures on disk.

353
00:16:48.600 --> 00:16:51.639
<v Speaker 1>Okay, and one more structure. Hash tables. You said they're

354
00:16:51.679 --> 00:16:52.240
<v Speaker 1>like magic.

355
00:16:52.480 --> 00:16:56.039
<v Speaker 2>They feel like magic. Sometimes they allow finding items in

356
00:16:56.320 --> 00:17:00.759
<v Speaker 2>one time on average constant time. They use hash function,

357
00:17:01.240 --> 00:17:04.640
<v Speaker 2>a special calculation to convert the item's key directly into

358
00:17:04.640 --> 00:17:06.680
<v Speaker 2>a memory address or close.

359
00:17:06.400 --> 00:17:08.920
<v Speaker 1>To it, so you jump right to the data pretty much.

360
00:17:08.960 --> 00:17:12.519
<v Speaker 2>The main challenge is hash collisions when different keys hash

361
00:17:12.599 --> 00:17:14.640
<v Speaker 2>to the same spot. There are ways to handle that,

362
00:17:14.920 --> 00:17:17.160
<v Speaker 2>but yeah, hash tables are behind maps and sets when

363
00:17:17.200 --> 00:17:18.880
<v Speaker 2>you need that super fast access.

364
00:17:18.960 --> 00:17:23.000
<v Speaker 1>Okay, stepping back from individual structures, managing huge amounts of

365
00:17:23.079 --> 00:17:26.880
<v Speaker 1>data brings us to databases and relational databases have been

366
00:17:26.960 --> 00:17:28.880
<v Speaker 1>king for a long time. How do they work?

367
00:17:29.079 --> 00:17:33.640
<v Speaker 2>They organize data into tables think spreadsheets basically with rows

368
00:17:33.640 --> 00:17:36.880
<v Speaker 2>and columns with strict rules, very strict rules defined by

369
00:17:36.880 --> 00:17:39.599
<v Speaker 2>a schema. It dictates what kind of data goes in

370
00:17:39.680 --> 00:17:43.839
<v Speaker 2>each column field for each rope entry. But the real

371
00:17:43.920 --> 00:17:47.119
<v Speaker 2>power isn't just the tables. It's the relationships between tables.

372
00:17:47.160 --> 00:17:47.960
<v Speaker 1>How do you link them?

373
00:17:48.240 --> 00:17:51.480
<v Speaker 2>Using primary keys unique IDs for each row and a table,

374
00:17:51.599 --> 00:17:54.319
<v Speaker 2>and foreign keys which are references to primary keys in

375
00:17:54.440 --> 00:17:55.119
<v Speaker 2>other tables.

376
00:17:55.200 --> 00:17:57.279
<v Speaker 1>Ah, So you don't repeat information.

377
00:17:57.240 --> 00:18:01.000
<v Speaker 2>Exactly, You avoid data duplication. I don't store a customer's

378
00:18:01.000 --> 00:18:03.640
<v Speaker 2>full address with every single order they place. You have

379
00:18:03.680 --> 00:18:06.720
<v Speaker 2>a customer's table and an order's table linked by a

380
00:18:06.720 --> 00:18:11.000
<v Speaker 2>customer id. This process is called normalization. It keeps data

381
00:18:11.039 --> 00:18:12.920
<v Speaker 2>consistent and reduces redundancy.

382
00:18:13.160 --> 00:18:17.319
<v Speaker 1>And we talk to these databases using SQL right structured query.

383
00:18:17.039 --> 00:18:18.759
<v Speaker 2>Language, that's the standard language. Yeah.

384
00:18:18.839 --> 00:18:21.480
<v Speaker 1>Key commands are like select from.

385
00:18:21.480 --> 00:18:25.079
<v Speaker 2>Where, order by for sorting, group by why for aggregation,

386
00:18:25.559 --> 00:18:29.039
<v Speaker 2>and crucially join what is join do? Join lets you

387
00:18:29.079 --> 00:18:32.680
<v Speaker 2>combine data from multiple related tables based on those keys

388
00:18:32.720 --> 00:18:36.359
<v Speaker 2>we talked about, like getting customer names alongside their order details.

389
00:18:36.480 --> 00:18:37.759
<v Speaker 2>It's incredibly powerful.

390
00:18:37.799 --> 00:18:40.160
<v Speaker 1>That also maybe slow it can be.

391
00:18:40.400 --> 00:18:43.160
<v Speaker 2>Joining very large tables is computationally expensive. That's kind of

392
00:18:43.160 --> 00:18:45.960
<v Speaker 2>the power and weakness of the relational model. So to

393
00:18:45.960 --> 00:18:48.079
<v Speaker 2>speed things up, we use indexing.

394
00:18:47.759 --> 00:18:49.960
<v Speaker 1>Like an index in a book similar idea.

395
00:18:50.079 --> 00:18:52.799
<v Speaker 2>It's a separate data structure, often a self balancing tree

396
00:18:53.039 --> 00:18:57.000
<v Speaker 2>that maps index values like IDs to their actual location

397
00:18:57.079 --> 00:18:57.559
<v Speaker 2>in the table.

398
00:18:57.680 --> 00:19:00.200
<v Speaker 1>So queries are faster, much faster.

399
00:19:00.599 --> 00:19:03.960
<v Speaker 2>O log instead of scanning the whole table. Sorting is

400
00:19:03.960 --> 00:19:08.039
<v Speaker 2>faster too, But indexes have costs, take up space, and

401
00:19:08.079 --> 00:19:10.039
<v Speaker 2>they slow down updates because the index needs to be

402
00:19:10.119 --> 00:19:10.680
<v Speaker 2>updated too.

403
00:19:10.799 --> 00:19:12.720
<v Speaker 1>So don't just index everything.

404
00:19:12.880 --> 00:19:16.200
<v Speaker 2>Definitely not, use tools to see where indexes actually help.

405
00:19:16.599 --> 00:19:20.799
<v Speaker 2>And finally, transactions are vital why they ensure operations happen

406
00:19:20.880 --> 00:19:24.640
<v Speaker 2>completely or not at all. Think transferring money. You need

407
00:19:24.680 --> 00:19:27.960
<v Speaker 2>to debit one account and credit another. A transaction bundles

408
00:19:27.960 --> 00:19:31.200
<v Speaker 2>those together. If anything fails midway, the whole thing rolls back.

409
00:19:31.400 --> 00:19:34.000
<v Speaker 2>Prevents data inconsistencies, right, crucial.

410
00:19:33.680 --> 00:19:36.799
<v Speaker 1>For things like banking. Okay, but relational isn't the only

411
00:19:36.839 --> 00:19:38.599
<v Speaker 1>game in town anymore. We hear a lot about no

412
00:19:38.720 --> 00:19:40.640
<v Speaker 1>SQL databases. How are they different?

413
00:19:40.920 --> 00:19:44.319
<v Speaker 2>They generally bitch the rigid table structure and relationships for

414
00:19:44.480 --> 00:19:46.200
<v Speaker 2>more flexibility.

415
00:19:45.640 --> 00:19:47.119
<v Speaker 1>Like documents stores.

416
00:19:46.880 --> 00:19:50.119
<v Speaker 2>Yeah, popular ones like Mango dB. Data is stored in

417
00:19:50.119 --> 00:19:54.480
<v Speaker 2>flexible documents, often jason like grouped in collections, no fixed schema.

418
00:19:54.599 --> 00:19:58.039
<v Speaker 2>Usually data might be duplicated intentionally for faster reads.

419
00:19:58.279 --> 00:20:01.000
<v Speaker 1>So different philosophy than normal is very.

420
00:20:00.839 --> 00:20:04.079
<v Speaker 2>Different, more application centric than data centric. Then you have

421
00:20:04.160 --> 00:20:05.000
<v Speaker 2>key value.

422
00:20:04.680 --> 00:20:07.519
<v Speaker 1>Stores, simple key simple value.

423
00:20:07.279 --> 00:20:11.200
<v Speaker 2>The simplest think REDTIS or memcached mainly used for cashing,

424
00:20:11.319 --> 00:20:13.119
<v Speaker 2>getting that one access speed.

425
00:20:13.200 --> 00:20:14.240
<v Speaker 1>And graph databases.

426
00:20:14.839 --> 00:20:19.519
<v Speaker 2>Those are cool designed specifically for networked data. Nodes represent

427
00:20:19.720 --> 00:20:25.359
<v Speaker 2>entities like people, Edges represent relationships like friendships. If your

428
00:20:25.440 --> 00:20:28.119
<v Speaker 2>data looks like a network, a graph database like neo

429
00:20:28.200 --> 00:20:30.200
<v Speaker 2>fog is often the most natural.

430
00:20:29.880 --> 00:20:32.680
<v Speaker 1>Fit, and no sequels is often linked with big data.

431
00:20:32.359 --> 00:20:36.839
<v Speaker 2>Often yeah the three v's volume, velocity, variety, no sequels.

432
00:20:36.880 --> 00:20:41.000
<v Speaker 2>Flexibility can handle massive, fast changing unstructured data better than

433
00:20:41.119 --> 00:20:43.079
<v Speaker 2>rigid relational models, sometimes.

434
00:20:42.680 --> 00:20:43.960
<v Speaker 1>So segle versus no SQL.

435
00:20:43.960 --> 00:20:47.160
<v Speaker 2>It's a trade off, big trade off. Relational prioritizes data

436
00:20:47.200 --> 00:20:52.160
<v Speaker 2>consistency via normalization and transactions. No SQL often prioritizes flexibility,

437
00:20:52.240 --> 00:20:55.920
<v Speaker 2>speed and scalability. But consistency might need more careful handling

438
00:20:55.960 --> 00:20:57.480
<v Speaker 2>by the application developer, and.

439
00:20:57.400 --> 00:21:01.559
<v Speaker 1>For truly massive scale databases can be running on multiple

440
00:21:01.599 --> 00:21:02.640
<v Speaker 1>machines YEP to.

441
00:21:02.640 --> 00:21:06.519
<v Speaker 2>Handle huge loads or ensure high availability. This involves techniques

442
00:21:06.559 --> 00:21:10.839
<v Speaker 2>like replication and charting copying data. Single master replication has

443
00:21:10.880 --> 00:21:14.240
<v Speaker 2>one machine handle rights, others handle reads, good for read

444
00:21:14.279 --> 00:21:18.319
<v Speaker 2>heavy loads. Multi master lets multiple machines handle rights better

445
00:21:18.359 --> 00:21:22.119
<v Speaker 2>for right scaling, but more complex and charting. That's splitting

446
00:21:22.160 --> 00:21:26.119
<v Speaker 2>the data itself across multiple machines like customers am go

447
00:21:26.240 --> 00:21:29.319
<v Speaker 2>on server one and Z on server two. Often combined

448
00:21:29.319 --> 00:21:35.680
<v Speaker 2>with replication, but distribution introduces a huge challenge data consistency.

449
00:21:35.119 --> 00:21:37.200
<v Speaker 1>Making sure all the machines agree exactly.

450
00:21:37.599 --> 00:21:41.119
<v Speaker 2>There's a fundamental trade off between consistency and performance availability.

451
00:21:41.759 --> 00:21:45.359
<v Speaker 2>The movie ticket example, two people by the last ticket

452
00:21:45.519 --> 00:21:49.079
<v Speaker 2>via different server simultaneously. How do you resolve that instantly

453
00:21:49.160 --> 00:21:53.279
<v Speaker 2>without slowing everything down tricky. Many systems use eventual consistency,

454
00:21:53.319 --> 00:21:56.200
<v Speaker 2>meaning the data will eventually become consistent across all nodes,

455
00:21:56.319 --> 00:21:58.759
<v Speaker 2>but there might be brief periods where different users see

456
00:21:58.799 --> 00:22:01.160
<v Speaker 2>slightly different data. It's a design choice.

457
00:22:01.200 --> 00:22:04.759
<v Speaker 1>We also briefly touched on GIS and serialization formats.

458
00:22:04.519 --> 00:22:08.680
<v Speaker 2>Right, gis for spatial data maps, locations, nearest hospital queries,

459
00:22:09.119 --> 00:22:14.160
<v Speaker 2>and serialization formats like SQL, dumps, XML, CSV, and especially JSON,

460
00:22:14.359 --> 00:22:17.480
<v Speaker 2>which is kind of becoming the universal standard for exchanging

461
00:22:17.519 --> 00:22:18.440
<v Speaker 2>data between systems.

462
00:22:18.480 --> 00:22:22.039
<v Speaker 1>Okay, pew, we have the thought process, the efficiency strategies,

463
00:22:22.119 --> 00:22:24.839
<v Speaker 1>ways to manage data, But how does it all actually

464
00:22:24.960 --> 00:22:27.759
<v Speaker 1>run on the metal? How do computers work under the hood?

465
00:22:28.039 --> 00:22:30.920
<v Speaker 2>Right, let's get down to the hardware, the computer architecture

466
00:22:31.160 --> 00:22:31.799
<v Speaker 2>CPU and.

467
00:22:31.799 --> 00:22:35.440
<v Speaker 1>RAM SO RAM random access memory. What is it?

468
00:22:35.480 --> 00:22:39.000
<v Speaker 2>Fundamentally, it's basically a grid of millions or billions of

469
00:22:39.000 --> 00:22:43.079
<v Speaker 2>tiny memory cells. Each cell has a unique numerical.

470
00:22:42.640 --> 00:22:44.160
<v Speaker 1>Address and you access them.

471
00:22:44.079 --> 00:22:47.880
<v Speaker 2>Using electrical signals on buses and address PUS specifies which

472
00:22:47.920 --> 00:22:50.559
<v Speaker 2>cell and a data bus carries the actual data as

473
00:22:50.599 --> 00:22:53.839
<v Speaker 2>binary ones and narlows represented by high low voltage to

474
00:22:53.839 --> 00:22:57.039
<v Speaker 2>be read written. And the CPU central processing unit is

475
00:22:57.079 --> 00:22:58.160
<v Speaker 2>the brain doing the work.

476
00:22:58.319 --> 00:22:58.839
<v Speaker 1>What does it do?

477
00:22:59.119 --> 00:23:01.799
<v Speaker 2>It has its own suit profast internal storage called registers,

478
00:23:02.160 --> 00:23:04.480
<v Speaker 2>and it understands a specific set of simple commands. It's

479
00:23:04.480 --> 00:23:06.920
<v Speaker 2>instruction set, things like ad number eight to number B,

480
00:23:07.519 --> 00:23:10.400
<v Speaker 2>move data from memory location x to register y compare

481
00:23:10.440 --> 00:23:11.039
<v Speaker 2>two numbers.

482
00:23:11.319 --> 00:23:13.799
<v Speaker 1>So computer code deep down is just a sequence of

483
00:23:13.839 --> 00:23:15.720
<v Speaker 1>these simple instructions.

484
00:23:15.079 --> 00:23:19.079
<v Speaker 2>Exactly a sequence of numbers representing the CPU operations. The

485
00:23:19.200 --> 00:23:23.039
<v Speaker 2>CPU runs in a constant fetch execute cycle fetch execute YEP,

486
00:23:23.519 --> 00:23:26.960
<v Speaker 2>guided by the program counter PC, which holds the memory

487
00:23:26.960 --> 00:23:30.319
<v Speaker 2>address of the next instruction. The CPU fetches that instruction

488
00:23:30.359 --> 00:23:34.559
<v Speaker 2>from RAM, decodes it, executes it, maybe fetching data from RAM,

489
00:23:34.880 --> 00:23:37.960
<v Speaker 2>doing math, storing results back, and then moves the PC

490
00:23:38.079 --> 00:23:40.960
<v Speaker 2>to the next instruction, over and over, billions of times

491
00:23:41.000 --> 00:23:41.400
<v Speaker 2>a second.

492
00:23:41.480 --> 00:23:45.359
<v Speaker 1>It's kind of amazing. Everything we do browsing, gaming spreadsheets

493
00:23:45.440 --> 00:23:48.680
<v Speaker 1>boils down to that cycle executing simple math and data

494
00:23:48.720 --> 00:23:49.759
<v Speaker 1>moves pretty much.

495
00:23:49.839 --> 00:23:53.279
<v Speaker 2>Yeah, the old Space Invaders game around three thousand machine

496
00:23:53.319 --> 00:23:56.839
<v Speaker 2>instructions on a two miller heard CPU. Modern CPUs do

497
00:23:57.000 --> 00:23:58.200
<v Speaker 2>vastly more, much.

498
00:23:58.039 --> 00:24:00.480
<v Speaker 1>Faster, and different CPUs speak different languages.

499
00:24:00.599 --> 00:24:04.319
<v Speaker 2>Right architectures, right CPU architectures like by eighty six. Most

500
00:24:04.319 --> 00:24:08.119
<v Speaker 2>PCs MAX or arm phones, tablets New or Max have

501
00:24:08.160 --> 00:24:11.279
<v Speaker 2>different instruction sets. Code compiled for one won't run on

502
00:24:11.319 --> 00:24:11.599
<v Speaker 2>the other.

503
00:24:11.839 --> 00:24:14.880
<v Speaker 1>And there's Indian this too, big Indian versus little Indian.

504
00:24:14.960 --> 00:24:15.920
<v Speaker 1>How numbers are stored.

505
00:24:16.119 --> 00:24:20.240
<v Speaker 2>Yeah, most CPUs today are a little Indian. Least significant

506
00:24:20.240 --> 00:24:24.880
<v Speaker 2>bite first. But interestingly, network traffic often uses big Indian,

507
00:24:25.119 --> 00:24:26.960
<v Speaker 2>a legacy from early router hardware.

508
00:24:27.759 --> 00:24:30.559
<v Speaker 1>So how do we get from say, Python or C

509
00:24:30.640 --> 00:24:34.599
<v Speaker 1>plus plus code to those specific CPU instructions. That's the

510
00:24:34.680 --> 00:24:35.880
<v Speaker 1>job of a compiler, right.

511
00:24:36.079 --> 00:24:39.319
<v Speaker 2>The compiler is that magic program that translates high level,

512
00:24:39.400 --> 00:24:42.440
<v Speaker 2>human readable code into low level machine code for a

513
00:24:42.519 --> 00:24:44.160
<v Speaker 2>specific CPU architecture.

514
00:24:44.400 --> 00:24:46.279
<v Speaker 1>Is it true any code can be boiled down to

515
00:24:46.319 --> 00:24:47.000
<v Speaker 1>simple moves?

516
00:24:47.200 --> 00:24:50.880
<v Speaker 2>Theoretically yes, based on Turing completeness, It's been shown that

517
00:24:51.079 --> 00:24:54.480
<v Speaker 2>even just a single instruction type like mov moved data,

518
00:24:54.640 --> 00:24:58.319
<v Speaker 2>if used cleverly, is powerful enough to compute anything computable.

519
00:24:58.599 --> 00:25:00.680
<v Speaker 2>And programs don't just talk to this CPU. They need

520
00:25:00.680 --> 00:25:01.880
<v Speaker 2>the operating system too.

521
00:25:01.880 --> 00:25:05.400
<v Speaker 1>Right Windows, Mac, OS, Linux, for things like reading files,

522
00:25:05.440 --> 00:25:08.279
<v Speaker 1>getting keyboard input, drawing on the screen. Programs make system

523
00:25:08.359 --> 00:25:10.039
<v Speaker 1>calls to the OS for these services.

524
00:25:10.119 --> 00:25:12.599
<v Speaker 2>So compiled code targets both a CPU and an.

525
00:25:12.480 --> 00:25:15.200
<v Speaker 1>OS, and compiles are smart. They optimize the code.

526
00:25:15.359 --> 00:25:21.559
<v Speaker 2>Oh yeah, good compilers do incredible optimizations, rearranging instructions, eliminating

527
00:25:21.599 --> 00:25:25.480
<v Speaker 2>redundant steps, making the machine code much faster than the

528
00:25:25.599 --> 00:25:26.559
<v Speaker 2>naive translation.

529
00:25:26.759 --> 00:25:31.799
<v Speaker 1>So the advice is write clean code, don't prematurely optimize yourself.

530
00:25:32.039 --> 00:25:35.799
<v Speaker 2>Generally yes, focus on clarity. If you have performance problems,

531
00:25:36.599 --> 00:25:40.400
<v Speaker 2>use profiling tools to find the real bottlenecks, then optimize

532
00:25:40.440 --> 00:25:43.640
<v Speaker 2>those specific parts. Let the compiler handle the general stuff.

533
00:25:43.720 --> 00:25:46.759
<v Speaker 1>What about languages like JavaScript or Python? They don't always

534
00:25:46.799 --> 00:25:48.319
<v Speaker 1>use compilers upfront, right.

535
00:25:48.359 --> 00:25:51.920
<v Speaker 2>Those are often scripting languages that use interpreters, and interpreter

536
00:25:52.039 --> 00:25:54.599
<v Speaker 2>reads and executes the code line by line or translates

537
00:25:54.640 --> 00:25:55.079
<v Speaker 2>it on the fly.

538
00:25:55.440 --> 00:25:57.119
<v Speaker 1>Slower than compiled code.

539
00:25:56.960 --> 00:25:59.799
<v Speaker 2>Usually yes, yeah, but often faster for development you don't

540
00:25:59.839 --> 00:26:03.400
<v Speaker 2>have long compile step. Some languages like Go try to

541
00:26:03.400 --> 00:26:06.200
<v Speaker 2>bridge the gap with super fast compilation, and.

542
00:26:06.160 --> 00:26:08.799
<v Speaker 1>People can reverse this. Look at the machine code.

543
00:26:08.599 --> 00:26:11.920
<v Speaker 2>Yep, disassembly and reverse engineering, taking the binary machine code

544
00:26:11.960 --> 00:26:14.319
<v Speaker 2>and trying to figure out the original logic. Hackers do

545
00:26:14.359 --> 00:26:18.039
<v Speaker 2>it to bypass licenses. Security researchers do it to find vulnerabilities,

546
00:26:18.480 --> 00:26:19.920
<v Speaker 2>like with stucksnet.

547
00:26:19.359 --> 00:26:22.319
<v Speaker 1>Which highlights the value of open source software.

548
00:26:22.200 --> 00:26:26.920
<v Speaker 2>Exactly with open source like Linux, many eyes can scrutinize

549
00:26:26.920 --> 00:26:30.359
<v Speaker 2>the source code for flaws, which arguably leads to better

550
00:26:30.400 --> 00:26:33.359
<v Speaker 2>security than closed source systems. Or only the vendor sees

551
00:26:33.400 --> 00:26:34.200
<v Speaker 2>the code, okay.

552
00:26:34.359 --> 00:26:37.400
<v Speaker 1>Last piece of the hardware puzzle, the memory hierarchy. You

553
00:26:37.440 --> 00:26:40.079
<v Speaker 1>mentioned caches earlier. Why do we need different types of

554
00:26:40.119 --> 00:26:41.400
<v Speaker 1>memory Because.

555
00:26:41.039 --> 00:26:45.200
<v Speaker 2>Of the processor memory gap. CPUs got incredibly fast, much

556
00:26:45.240 --> 00:26:49.119
<v Speaker 2>faster than main memory RAM. If the CPU had to

557
00:26:49.119 --> 00:26:52.119
<v Speaker 2>wait for RAM every single time it needed data, it

558
00:26:52.160 --> 00:26:53.759
<v Speaker 2>would spend most of its time idle.

559
00:26:54.240 --> 00:26:57.240
<v Speaker 1>So cash's L one, L two, L three bridge that

560
00:26:57.319 --> 00:26:58.240
<v Speaker 1>gap precisely.

561
00:26:58.440 --> 00:27:02.039
<v Speaker 2>They are small, extremely fast memory banks right on or

562
00:27:02.119 --> 00:27:04.759
<v Speaker 2>very near the CPU chip. They store copies of recently

563
00:27:04.839 --> 00:27:05.759
<v Speaker 2>used data from RAM.

564
00:27:05.880 --> 00:27:07.240
<v Speaker 1>How do they know what data to store?

565
00:27:07.319 --> 00:27:10.319
<v Speaker 2>The exploit locality temporal locality. If you just use some data,

566
00:27:10.319 --> 00:27:13.319
<v Speaker 2>you'll probably use it again soon, spatial locality. If you

567
00:27:13.400 --> 00:27:15.799
<v Speaker 2>just use data at address X, you'll probably use data

568
00:27:15.839 --> 00:27:16.480
<v Speaker 2>near X soon.

569
00:27:16.759 --> 00:27:19.680
<v Speaker 1>So the CASH keeps that nearby recent data close exactly.

570
00:27:19.759 --> 00:27:23.440
<v Speaker 2>Accessing the OL one CASH might take say ten CPU cycles,

571
00:27:23.680 --> 00:27:26.480
<v Speaker 2>Axasing RAM might take a thousand cycles. Cashes make a

572
00:27:26.480 --> 00:27:28.359
<v Speaker 2>massive difference by reducing that waiting time.

573
00:27:28.599 --> 00:27:31.400
<v Speaker 1>And then below RAM and the higher tricky is secondary

574
00:27:31.440 --> 00:27:33.920
<v Speaker 1>memory like hard disks or SSDs.

575
00:27:34.079 --> 00:27:38.200
<v Speaker 2>Right, primary memory is RAM, volatile fast. Secondary memory is

576
00:27:38.240 --> 00:27:43.000
<v Speaker 2>non volatile storage hard discs, HDDs, solid state drives as eight.

577
00:27:43.039 --> 00:27:44.119
<v Speaker 1>Difference is huge.

578
00:27:44.119 --> 00:27:48.079
<v Speaker 2>Again, dramatic RAM is maybe a thousand cycles away a

579
00:27:48.119 --> 00:27:52.119
<v Speaker 2>traditional hard disc maybe a million cycles. SSDs are much

580
00:27:52.160 --> 00:27:55.079
<v Speaker 2>faster than hgds, but still way slower.

581
00:27:54.720 --> 00:27:56.920
<v Speaker 1>Than RAM, and if you run out of RAM.

582
00:27:56.640 --> 00:28:00.519
<v Speaker 2>The system starts trashing constantly swapping data between RAM and

583
00:28:00.599 --> 00:28:04.359
<v Speaker 2>the much slower disc. The computer becomes practically unusable. Insufficient

584
00:28:04.440 --> 00:28:06.839
<v Speaker 2>RAM is a huge performance killer, maybe a top cause

585
00:28:06.839 --> 00:28:10.359
<v Speaker 2>of server failure. Wow, this hierarchy extends further external drives,

586
00:28:10.640 --> 00:28:15.240
<v Speaker 2>tape backups, per schary storage, each level bigger, cheaper, but slower,

587
00:28:15.640 --> 00:28:18.960
<v Speaker 2>and the key insight identifying the data your application uses

588
00:28:18.960 --> 00:28:21.599
<v Speaker 2>most frequently and making that data faster to access by

589
00:28:21.599 --> 00:28:24.440
<v Speaker 2>loading it into RAM or leveraging caches effectively is one

590
00:28:24.480 --> 00:28:27.200
<v Speaker 2>of the biggest levers you have for improving program speed.

591
00:28:27.279 --> 00:28:29.920
<v Speaker 1>Okay, wow, we have really covered a ton of ground today.

592
00:28:29.920 --> 00:28:32.599
<v Speaker 1>I mean, from pure logic and problem solving.

593
00:28:32.440 --> 00:28:37.240
<v Speaker 2>Through designing efficient algorithms, managing data structures and databases.

594
00:28:36.799 --> 00:28:39.160
<v Speaker 1>Right down to the CPU cycles and memory hierarchy.

595
00:28:39.200 --> 00:28:41.599
<v Speaker 2>It's quite a journey, it really is, and hopefully shows

596
00:28:41.640 --> 00:28:45.119
<v Speaker 2>that computer science is so much more than just typing code.

597
00:28:45.160 --> 00:28:47.880
<v Speaker 1>It's a fundamental way of thinking, Isn't it Absolutely a

598
00:28:47.880 --> 00:28:52.000
<v Speaker 1>powerful toolkit for approaching problems systematically, and that applies way

599
00:28:52.000 --> 00:28:56.279
<v Speaker 1>beyond just computers. So you've peered behind the curtain, got

600
00:28:56.279 --> 00:28:59.839
<v Speaker 1>the distilled essence. Hopefully, What does this all mean for you?

601
00:29:00.559 --> 00:29:00.759
<v Speaker 2>Yeah?

602
00:29:00.799 --> 00:29:02.720
<v Speaker 1>How can you use this now that you've got this

603
00:29:02.920 --> 00:29:06.400
<v Speaker 1>lens of computational thinking? What complex problem in your own

604
00:29:06.440 --> 00:29:11.119
<v Speaker 1>life could be? Anything? Work, project planning, something complicated.

605
00:29:10.799 --> 00:29:12.400
<v Speaker 2>Even just organizing your thoughts?

606
00:29:12.960 --> 00:29:16.839
<v Speaker 1>Right? Could you simplify it by applying one of these tools,

607
00:29:17.440 --> 00:29:22.720
<v Speaker 1>breaking it down, logically, optimizing a process, organizing the information better.

608
00:29:22.960 --> 00:29:25.039
<v Speaker 2>It's worth thinking about. These aren't just for computers.

609
00:29:25.039 --> 00:29:27.720
<v Speaker 1>Definitely something to mall over. Yeah, we encourage you to

610
00:29:27.799 --> 00:29:31.119
<v Speaker 1>keep exploring these ideas and of course keep tuning in

611
00:29:31.160 --> 00:29:32.440
<v Speaker 1>for more deep dives.
