WEBVTT

1
00:00:00.080 --> 00:00:02.200
<v Speaker 1>Welcome back to the deep dive, where we take a

2
00:00:02.240 --> 00:00:06.200
<v Speaker 1>stack of information and transform it into instant expertise tailored

3
00:00:06.280 --> 00:00:09.599
<v Speaker 1>just for you. Today, we're plunging into well, the very

4
00:00:09.599 --> 00:00:13.519
<v Speaker 1>bedrock of computer science, data structures and algorithms.

5
00:00:13.599 --> 00:00:16.359
<v Speaker 2>Yeah, it's a foundational topic. We're drawing our insights from

6
00:00:16.519 --> 00:00:19.879
<v Speaker 2>Ruby data structures and Algorithms, which is a really comprehensive resource.

7
00:00:20.079 --> 00:00:22.480
<v Speaker 2>It helps us understand not just what these concepts are,

8
00:00:22.519 --> 00:00:25.480
<v Speaker 2>but really why they're so critical to everything you interact

9
00:00:25.519 --> 00:00:29.559
<v Speaker 2>with in the digital world. So our mission today is

10
00:00:29.600 --> 00:00:33.320
<v Speaker 2>to kind of distill the core principles, maybe reveal some

11
00:00:33.359 --> 00:00:37.079
<v Speaker 2>surprising implications of different design choices, and also highlight how

12
00:00:37.159 --> 00:00:41.840
<v Speaker 2>Ruby's unique philosophy shapes these essential building blocks of code.

13
00:00:42.039 --> 00:00:44.960
<v Speaker 1>Think of this as pulling back the curtain on the powerful,

14
00:00:45.000 --> 00:00:48.159
<v Speaker 1>invisible machinery that makes your favorite apps tick. You'll hopefully

15
00:00:48.280 --> 00:00:52.399
<v Speaker 1>uncover insights that might change how you think about efficiency, flexibility,

16
00:00:52.799 --> 00:00:56.880
<v Speaker 1>and even the subtle art of crafting robust software. Okay,

17
00:00:56.920 --> 00:00:59.840
<v Speaker 1>let's start at the absolute beginning foundational ideas. We hear

18
00:00:59.840 --> 00:01:03.280
<v Speaker 1>a lot about abstract data types or ADTs. What's the

19
00:01:03.920 --> 00:01:05.319
<v Speaker 1>core concept behind an ADT?

20
00:01:05.799 --> 00:01:09.400
<v Speaker 2>Right? Well, and ADT is essentially the pure abstract idea

21
00:01:09.439 --> 00:01:12.120
<v Speaker 2>of what a data structure should do. It's completely separate

22
00:01:12.120 --> 00:01:13.719
<v Speaker 2>from how it's implemented on a computer.

23
00:01:14.239 --> 00:01:14.400
<v Speaker 1>Right.

24
00:01:14.680 --> 00:01:16.879
<v Speaker 2>It defines a set of values could be all the

25
00:01:16.920 --> 00:01:20.319
<v Speaker 2>integers or maybe just true and false, and the operations

26
00:01:20.319 --> 00:01:23.480
<v Speaker 2>you can perform on them, so I think addition, subtraction,

27
00:01:23.840 --> 00:01:25.959
<v Speaker 2>or logical negation, stuff like that.

28
00:01:26.079 --> 00:01:26.640
<v Speaker 1>Oh okay.

29
00:01:26.760 --> 00:01:30.319
<v Speaker 2>The real insight here, I think is this principle you

30
00:01:30.400 --> 00:01:33.400
<v Speaker 2>define the behavior first, focusing on what you need to

31
00:01:33.480 --> 00:01:37.000
<v Speaker 2>achieve before you even consider the specific coding details.

32
00:01:37.359 --> 00:01:40.439
<v Speaker 1>So it's almost like setting up a conceptual contract for

33
00:01:40.519 --> 00:01:42.359
<v Speaker 1>what your data will be and how it will behave

34
00:01:42.680 --> 00:01:45.079
<v Speaker 1>even before you write a single line of code precisely.

35
00:01:45.159 --> 00:01:47.760
<v Speaker 2>Yeah, that's a great analogy. And then once that conceptual

36
00:01:47.799 --> 00:01:50.879
<v Speaker 2>ADT is actually built in a programming language, well that's

37
00:01:50.879 --> 00:01:52.120
<v Speaker 2>when we call it a data type.

38
00:01:52.159 --> 00:01:52.719
<v Speaker 1>Gotcha.

39
00:01:52.799 --> 00:01:56.159
<v Speaker 2>So, for example, the abstract integer ADT becomes the concrete

40
00:01:56.480 --> 00:01:59.280
<v Speaker 2>type in Java or maybe the integer class in Ruby.

41
00:01:59.840 --> 00:02:02.719
<v Speaker 2>And the specific way we arrange that data in memory

42
00:02:03.519 --> 00:02:06.200
<v Speaker 2>to represent those values, that's our data structure.

43
00:02:06.319 --> 00:02:08.400
<v Speaker 1>And the algorithms where do they fit into this picture?

44
00:02:08.599 --> 00:02:11.879
<v Speaker 2>Ah, algorithms, they're the step by step instructions kind of

45
00:02:11.879 --> 00:02:15.000
<v Speaker 2>like the recipes that bring an ADT's operations to life.

46
00:02:15.639 --> 00:02:18.360
<v Speaker 2>They use the data structures you've chosen to perform those

47
00:02:18.360 --> 00:02:22.039
<v Speaker 2>defined actions. So you see, data structures and algorithms are

48
00:02:22.080 --> 00:02:24.199
<v Speaker 2>really two sides of the same coin. You can't really

49
00:02:24.199 --> 00:02:26.599
<v Speaker 2>have one without the other, and together they let us

50
00:02:26.639 --> 00:02:30.319
<v Speaker 2>implement these powerful abstract concepts in our programs.

51
00:02:30.599 --> 00:02:33.759
<v Speaker 1>Okay, that makes sense. Now, as we build these systems,

52
00:02:33.919 --> 00:02:39.120
<v Speaker 1>making sure they're correct is well. Paramount Our source emphasizes

53
00:02:39.199 --> 00:02:42.520
<v Speaker 1>something called assertions. What role do they play? Right?

54
00:02:42.599 --> 00:02:46.560
<v Speaker 2>Assertions? An assertion is simply a statement that must be

55
00:02:46.599 --> 00:02:49.960
<v Speaker 2>true at a specific point in a program's execution. They

56
00:02:50.000 --> 00:02:53.719
<v Speaker 2>serve two main purposes, one clearly documenting constraints for human

57
00:02:53.759 --> 00:02:57.800
<v Speaker 2>developers reading the code, and two crucially checking those constraints

58
00:02:57.879 --> 00:03:01.319
<v Speaker 2>during execution to catch faults, you know as early as possible.

59
00:03:01.560 --> 00:03:04.400
<v Speaker 1>What are the most vital types of assertions? Developers tend

60
00:03:04.439 --> 00:03:05.000
<v Speaker 1>to rely.

61
00:03:04.840 --> 00:03:08.800
<v Speaker 2>On, good question. The most common and i'd say crucial

62
00:03:08.879 --> 00:03:13.479
<v Speaker 2>types are preconditions and post conditions. A precondition make sure

63
00:03:13.479 --> 00:03:16.400
<v Speaker 2>your inputs are valid before an operation even starts.

64
00:03:16.520 --> 00:03:19.319
<v Speaker 1>Like making sure a square root function only gets a

65
00:03:19.360 --> 00:03:21.960
<v Speaker 1>non negative number. You can't square root, a negative right,

66
00:03:22.280 --> 00:03:27.000
<v Speaker 1>and a post condition verifies the outcome after the operation

67
00:03:27.120 --> 00:03:30.159
<v Speaker 1>is done, confirming the result is what you actually expect it.

68
00:03:30.280 --> 00:03:30.599
<v Speaker 2>Okay.

69
00:03:31.159 --> 00:03:34.360
<v Speaker 1>There are also class invariants which maintain the integrity of

70
00:03:34.400 --> 00:03:38.520
<v Speaker 1>an object between method calls, and some other specialized types too.

71
00:03:38.599 --> 00:03:42.680
<v Speaker 1>But those core assertions, preconditions and post conditions, that's where

72
00:03:42.680 --> 00:03:44.439
<v Speaker 1>a lot of your bug catching power lies.

73
00:03:44.639 --> 00:03:47.919
<v Speaker 2>That sounds incredibly helpful for catching bugs early. But how

74
00:03:47.919 --> 00:03:50.479
<v Speaker 2>does Ruby handle these if at all? Is it built in?

75
00:03:50.759 --> 00:03:53.400
<v Speaker 1>Well, what's kind of fascinating about Ruby here is its approach.

76
00:03:54.000 --> 00:03:57.560
<v Speaker 1>It provides no direct, built in support for assertions in

77
00:03:57.639 --> 00:03:58.599
<v Speaker 1>the language itself.

78
00:03:59.039 --> 00:03:59.360
<v Speaker 2>None.

79
00:03:59.599 --> 00:04:03.680
<v Speaker 1>Nope. And what's more, Ruby's dynamic weekly type nature means

80
00:04:03.919 --> 00:04:07.039
<v Speaker 1>it doesn't even enforce basic type checking for method parameters

81
00:04:07.120 --> 00:04:10.159
<v Speaker 1>or return values at the language level. So those crucial

82
00:04:10.280 --> 00:04:14.560
<v Speaker 1>type related pre and post conditions, they aren't automatically checked. Wow.

83
00:04:15.039 --> 00:04:18.199
<v Speaker 1>So if you're a Ruby programmer, it sounds like the

84
00:04:18.360 --> 00:04:23.160
<v Speaker 1>entire burden of checking assertions, of enforcing these constraints falls

85
00:04:23.199 --> 00:04:26.800
<v Speaker 1>squarely on your shoulders. That freedom could certainly make certain

86
00:04:26.839 --> 00:04:29.600
<v Speaker 1>types of bugs well much harder to find, couldn't it.

87
00:04:29.720 --> 00:04:33.160
<v Speaker 2>Absolutely, it's a trade off for example, many unintended or

88
00:04:33.959 --> 00:04:37.680
<v Speaker 2>erroneous array references might be considered legal by the Ruby interpreter.

89
00:04:38.120 --> 00:04:41.000
<v Speaker 2>They might just return nil instead of raising an error

90
00:04:41.040 --> 00:04:43.839
<v Speaker 2>like you'd see in a more strongly typed language. So yeah,

91
00:04:43.839 --> 00:04:47.360
<v Speaker 2>it truly pushes the responsibility for robust error checking onto

92
00:04:47.360 --> 00:04:51.560
<v Speaker 2>the developer using Ruby. Now, when we categorize data types

93
00:04:51.600 --> 00:04:54.160
<v Speaker 2>and programming, they typically fall into two buckets.

94
00:04:54.240 --> 00:04:54.360
<v Speaker 1>Right.

95
00:04:54.399 --> 00:04:58.480
<v Speaker 2>You've got simple types like individual integers or booleans, things

96
00:04:58.519 --> 00:05:00.920
<v Speaker 2>that can't really be broken down for there, and then

97
00:05:01.000 --> 00:05:03.879
<v Speaker 2>structured types, which are composed of multiple parts, things like

98
00:05:04.000 --> 00:05:05.040
<v Speaker 2>arrays or hashes.

99
00:05:05.480 --> 00:05:08.959
<v Speaker 1>And Ruby has some really interesting nuances here, even with

100
00:05:09.000 --> 00:05:12.800
<v Speaker 1>things that seem like simple types, especially it's string and

101
00:05:12.839 --> 00:05:14.360
<v Speaker 1>symbol classes.

102
00:05:13.879 --> 00:05:17.800
<v Speaker 2>It really does. Ruby's string class creates mutable sequences of

103
00:05:17.920 --> 00:05:21.279
<v Speaker 2>Unicode characters. Mutable means you can change them after you

104
00:05:21.360 --> 00:05:26.519
<v Speaker 2>create them. Okay, But the symbol class creates immutable character sequences.

105
00:05:27.399 --> 00:05:32.000
<v Speaker 2>And here's the surprising implication. Symbols are stored uniquely in

106
00:05:32.120 --> 00:05:33.959
<v Speaker 2>Ruby's interpreter's symbol table.

107
00:05:34.079 --> 00:05:35.160
<v Speaker 1>What does that mean exactly?

108
00:05:35.439 --> 00:05:38.879
<v Speaker 2>It guarantees only a single instance exists for any given

109
00:05:38.959 --> 00:05:42.519
<v Speaker 2>character sequence like dot my symbol. So while you might

110
00:05:42.560 --> 00:05:45.319
<v Speaker 2>just think of them as different kinds of text. Ruby

111
00:05:45.360 --> 00:05:48.839
<v Speaker 2>effectively offers two distinct implementations of a string eighty T.

112
00:05:49.639 --> 00:05:53.120
<v Speaker 2>Each has different performance and memory characteristics, making them suited

113
00:05:53.160 --> 00:05:54.279
<v Speaker 2>for different use cases.

114
00:05:54.360 --> 00:05:55.959
<v Speaker 1>It's quite clever interesting.

115
00:05:56.040 --> 00:05:59.120
<v Speaker 2>Another crucial structured type in Ruby is, of course, the array.

116
00:05:59.800 --> 00:06:02.759
<v Speaker 2>Now Unlike the fixed sized static arrays you might find

117
00:06:02.759 --> 00:06:06.360
<v Speaker 2>in some older languages, Ruby arrays are dynamic.

118
00:06:06.000 --> 00:06:08.040
<v Speaker 1>Arrays, meaning they can grow exactly.

119
00:06:08.199 --> 00:06:10.639
<v Speaker 2>Their size can change at runtime. When an array needs

120
00:06:10.639 --> 00:06:12.879
<v Speaker 2>more memory to grow, the system has to allocate a

121
00:06:12.920 --> 00:06:16.519
<v Speaker 2>whole new, larger chunk of memory right, copy all the

122
00:06:16.560 --> 00:06:19.600
<v Speaker 2>existing elements over, and then free up the old memory space.

123
00:06:20.639 --> 00:06:25.959
<v Speaker 2>It is that reallocation is computationally expensive, so to minimize

124
00:06:26.000 --> 00:06:30.399
<v Speaker 2>how often this happens, Ruby often doubles the array's allocated size,

125
00:06:30.480 --> 00:06:33.920
<v Speaker 2>even if you only ask for say one more slot.

126
00:06:34.399 --> 00:06:36.720
<v Speaker 2>It's trying to anticipate future growth.

127
00:06:36.680 --> 00:06:40.000
<v Speaker 1>That proactiicizing makes a lot of sense avoiding those frequent,

128
00:06:40.120 --> 00:06:44.399
<v Speaker 1>expensive copy operations. But Ruby arrays have some other truly

129
00:06:44.480 --> 00:06:47.480
<v Speaker 1>unique characteristics too, don't they Things that really set them

130
00:06:47.519 --> 00:06:49.079
<v Speaker 1>apart from arrays in other languages.

131
00:06:49.199 --> 00:06:53.279
<v Speaker 2>I absolutely do. For one, Ruby arrays automatically expand if

132
00:06:53.279 --> 00:06:55.600
<v Speaker 2>you try to store a value beyond their current end.

133
00:06:55.920 --> 00:06:59.000
<v Speaker 2>To the programmer, they kind of seem unbounded, okay, And

134
00:06:59.040 --> 00:07:00.879
<v Speaker 2>if a location within when the array doesn't have an

135
00:07:00.920 --> 00:07:03.839
<v Speaker 2>assigned value, it just defaults to nil, right nil. Ruby

136
00:07:03.879 --> 00:07:07.160
<v Speaker 2>also allows negative indices, so a one directly references the

137
00:07:07.240 --> 00:07:09.480
<v Speaker 2>very last element, A two to the second to last,

138
00:07:09.519 --> 00:07:12.000
<v Speaker 2>and so on. Bandy, and maybe most strikingly, if you

139
00:07:12.040 --> 00:07:14.600
<v Speaker 2>try to access an index that's completely out of bounds,

140
00:07:14.680 --> 00:07:17.160
<v Speaker 2>it just returns nil instead of causing an error.

141
00:07:18.319 --> 00:07:22.920
<v Speaker 1>That unbounded nil defaulting behavior seems incredibly flexible, almost convenient

142
00:07:22.959 --> 00:07:25.560
<v Speaker 1>on the surface. Yeah, but when does that flexibility become

143
00:07:25.800 --> 00:07:27.120
<v Speaker 1>maybe a hidden trap.

144
00:07:26.920 --> 00:07:31.199
<v Speaker 2>For developers exactly. That's the crux of it. While it's convenient,

145
00:07:31.360 --> 00:07:36.240
<v Speaker 2>this freedom can inadvertently hide bugs. Think about it. In

146
00:07:36.279 --> 00:07:39.240
<v Speaker 2>a language like Java, if you mistakenly try to assign

147
00:07:39.240 --> 00:07:42.240
<v Speaker 2>a string value to an array meant for integers, the

148
00:07:42.279 --> 00:07:45.360
<v Speaker 2>compiler would immediately flag it as an error boom cauterly

149
00:07:45.560 --> 00:07:48.879
<v Speaker 2>makes sense. In Ruby, the interpreter won't complain at that point.

150
00:07:48.959 --> 00:07:51.279
<v Speaker 2>It might just happily put a string where you expected

151
00:07:51.319 --> 00:07:55.079
<v Speaker 2>an integer, or return nil from an invalid array access.

152
00:07:55.600 --> 00:07:58.120
<v Speaker 2>This makes those types of books potentially much harder to

153
00:07:58.160 --> 00:08:01.639
<v Speaker 2>diagnose until much later, maybe when something unexpected happens further

154
00:08:01.720 --> 00:08:02.560
<v Speaker 2>down the line.

155
00:08:02.759 --> 00:08:05.600
<v Speaker 1>So the core takeaway here seems to be that Ruby rays,

156
00:08:06.319 --> 00:08:10.360
<v Speaker 1>because they can store arbitrary values and grow automatically, behave

157
00:08:10.439 --> 00:08:14.519
<v Speaker 1>more like highly flexible lists than the strict homogeneous arrayse

158
00:08:14.560 --> 00:08:16.000
<v Speaker 1>you find in many other languages.

159
00:08:16.079 --> 00:08:18.959
<v Speaker 2>Precisely, and they also come with an incredibly rich set

160
00:08:19.000 --> 00:08:22.839
<v Speaker 2>of built in methods. They effectively bundle in set operations

161
00:08:22.920 --> 00:08:26.680
<v Speaker 2>like membership testing or unions, string like operations, and even

162
00:08:26.720 --> 00:08:30.600
<v Speaker 2>built in stack operations push pop and Q operations shift

163
00:08:30.879 --> 00:08:34.279
<v Speaker 2>a pen. They're very versatile. So moving beyond those simple types,

164
00:08:34.480 --> 00:08:38.679
<v Speaker 2>containers are a really crucial category of complex abstract data types.

165
00:08:39.080 --> 00:08:42.679
<v Speaker 2>A container is simply an entity that holds a finite

166
00:08:42.759 --> 00:08:45.120
<v Speaker 2>number of other entities. Think of it like a box

167
00:08:45.679 --> 00:08:48.080
<v Speaker 2>or a bag designed to organize things.

168
00:08:47.919 --> 00:08:51.039
<v Speaker 1>That box or bag. Analogy is helpful, it helps visualize it.

169
00:08:51.399 --> 00:08:54.039
<v Speaker 1>But if I'm a developer trying to choose the right container,

170
00:08:54.320 --> 00:08:56.600
<v Speaker 1>what are the critical distinctions I need to keep in mind?

171
00:08:56.720 --> 00:08:58.840
<v Speaker 1>How do I pick the right box for my data?

172
00:08:58.879 --> 00:09:01.799
<v Speaker 2>That's a great question. Key distinctions really boil down to

173
00:09:01.960 --> 00:09:06.840
<v Speaker 2>three main properties. First, their structure. Do they hold elements

174
00:09:06.840 --> 00:09:09.360
<v Speaker 2>in a specific order like a list does, or are

175
00:09:09.360 --> 00:09:10.440
<v Speaker 2>they unordered like a set?

176
00:09:10.519 --> 00:09:11.519
<v Speaker 1>Okay, order matters.

177
00:09:11.639 --> 00:09:15.279
<v Speaker 2>Second, access restrictions. Can you add, remove, or look at

178
00:09:15.279 --> 00:09:18.559
<v Speaker 2>elements anywhere in the container or only at specific points

179
00:09:18.600 --> 00:09:20.399
<v Speaker 2>like just the top of a stack, right like a

180
00:09:20.440 --> 00:09:24.320
<v Speaker 2>stack or queue exactly? And finally, keyed access. Can you

181
00:09:24.360 --> 00:09:27.559
<v Speaker 2>retrieve elements using some unique identifier like how a map

182
00:09:27.639 --> 00:09:30.320
<v Speaker 2>uses a key to find its associated value.

183
00:09:30.120 --> 00:09:32.559
<v Speaker 1>Structure access keyed access? Got it?

184
00:09:32.960 --> 00:09:35.120
<v Speaker 2>Yeah, And we can sort of visualize this with a

185
00:09:35.159 --> 00:09:38.480
<v Speaker 2>conceptual hierarchy like a family tree. At the root, you

186
00:09:38.559 --> 00:09:41.440
<v Speaker 2>might have a very general container. All it knows is

187
00:09:41.480 --> 00:09:44.200
<v Speaker 2>its size, whether it's empty, and how to clear itself

188
00:09:44.240 --> 00:09:47.480
<v Speaker 2>out basic stuff. Right, Then you branch out. You might

189
00:09:47.519 --> 00:09:50.600
<v Speaker 2>have collection, which is a traversible container, meaning you can

190
00:09:50.639 --> 00:09:54.320
<v Speaker 2>get to all its elements. This includes things like lists, sets,

191
00:09:54.360 --> 00:09:57.759
<v Speaker 2>and maps, And separately, you might have dispenser, which is

192
00:09:57.799 --> 00:10:00.960
<v Speaker 2>a non troversible container with those restricts access points we

193
00:10:01.000 --> 00:10:02.799
<v Speaker 2>talked about, like stacks and queues.

194
00:10:03.360 --> 00:10:06.320
<v Speaker 1>These sound a lot like interfaces, which is a common

195
00:10:06.600 --> 00:10:11.519
<v Speaker 1>concept in object oriented programming. But how does Ruby handle

196
00:10:11.639 --> 00:10:16.000
<v Speaker 1>these structural ideas given its famously flexible duck typing approach.

197
00:10:16.679 --> 00:10:18.960
<v Speaker 1>It doesn't really do interfaces in the same way, does it.

198
00:10:19.080 --> 00:10:21.120
<v Speaker 2>You're absolutely right to point that out. Ruby doesn't have

199
00:10:21.159 --> 00:10:24.360
<v Speaker 2>explicit interfaces like Java or c sharp do, where a

200
00:10:24.399 --> 00:10:28.240
<v Speaker 2>class formally declares it implements an interface. Instead, it relies

201
00:10:28.279 --> 00:10:31.840
<v Speaker 2>heavily on duck typing. If it walks like a duck, exactly,

202
00:10:31.960 --> 00:10:34.159
<v Speaker 2>if it walks like a duck and quacks like a duck,

203
00:10:34.360 --> 00:10:37.159
<v Speaker 2>Ruby treats it as a duck. When an object calls

204
00:10:37.200 --> 00:10:40.399
<v Speaker 2>a method on another object, Ruby simply checks at run

205
00:10:40.480 --> 00:10:44.360
<v Speaker 2>time if the receiving object actually implements that method. Does

206
00:10:44.399 --> 00:10:45.360
<v Speaker 2>it respond to that.

207
00:10:45.279 --> 00:10:47.559
<v Speaker 1>Message ah, run time checking yes.

208
00:10:48.120 --> 00:10:51.360
<v Speaker 2>It achieves pretty much the same goal as formal interfaces,

209
00:10:51.399 --> 00:10:54.679
<v Speaker 2>making sure method calls are valid, but without requiring classes

210
00:10:54.759 --> 00:10:58.919
<v Speaker 2>to declare upfront which contracts or interfaces they fulfill. It's

211
00:10:58.960 --> 00:11:03.080
<v Speaker 2>a very dynamic, flexible way to handle type patibility.

212
00:11:03.159 --> 00:11:06.320
<v Speaker 1>Okay, let's dive into some specific linear containers then, starting

213
00:11:06.320 --> 00:11:09.519
<v Speaker 1>with stacks. I'm picturing that stack of plates or maybe

214
00:11:09.559 --> 00:11:11.639
<v Speaker 1>a pile of shirts. The last thing you put on

215
00:11:11.759 --> 00:11:13.720
<v Speaker 1>is the first thing you take off the LEFO.

216
00:11:14.080 --> 00:11:17.279
<v Speaker 2>That's precisely the model. A stack is an ordered container.

217
00:11:17.480 --> 00:11:19.840
<v Speaker 2>And the key thing is that access is restricted to

218
00:11:19.960 --> 00:11:23.000
<v Speaker 2>just one end, which we call the top. The core

219
00:11:23.080 --> 00:11:25.759
<v Speaker 2>operations are push for adding an element to the top,

220
00:11:25.799 --> 00:11:28.799
<v Speaker 2>putting a plate on right, pop for removing the top element.

221
00:11:28.600 --> 00:11:30.039
<v Speaker 1>Taking the top plate off yep.

222
00:11:29.840 --> 00:11:31.840
<v Speaker 2>And top for just peaking at the top element without

223
00:11:31.840 --> 00:11:32.679
<v Speaker 2>actually removing it.

224
00:11:32.919 --> 00:11:38.799
<v Speaker 1>Okay, where does this last in first out logic really shine?

225
00:11:38.840 --> 00:11:40.320
<v Speaker 1>Where is it used in practice?

226
00:11:40.519 --> 00:11:43.919
<v Speaker 2>Oh? All over the place. A classic example is managing

227
00:11:43.960 --> 00:11:47.879
<v Speaker 2>function calls in a program. When function A calls function B,

228
00:11:47.879 --> 00:11:51.120
<v Speaker 2>b's context goes on the stack. If B call C,

229
00:11:51.120 --> 00:11:54.159
<v Speaker 2>C goes on top. When C finishes it pops off.

230
00:11:54.200 --> 00:11:56.759
<v Speaker 2>Then B then a last one called is the first

231
00:11:56.759 --> 00:12:00.000
<v Speaker 2>to finish, right the call stack exactly. Another good example

232
00:12:00.200 --> 00:12:03.440
<v Speaker 2>is maybe reversing something like if you read characters of

233
00:12:03.480 --> 00:12:05.759
<v Speaker 2>a word one by one and push them onto a stack,

234
00:12:05.840 --> 00:12:07.039
<v Speaker 2>then pop them off, you'll get.

235
00:12:06.919 --> 00:12:08.759
<v Speaker 1>The word in reverse simple but effective.

236
00:12:09.120 --> 00:12:12.120
<v Speaker 2>Very or that print school or example we mentioned earlier,

237
00:12:12.159 --> 00:12:15.120
<v Speaker 2>pushing pages on popping them off to print in reverse order.

238
00:12:15.200 --> 00:12:18.240
<v Speaker 1>Gotcha. Now, when it comes to actually implementing a stack,

239
00:12:18.399 --> 00:12:19.840
<v Speaker 1>what are the main ways to do it?

240
00:12:20.039 --> 00:12:23.600
<v Speaker 2>There are basically two primary approaches. You can use contiguous

241
00:12:23.639 --> 00:12:27.080
<v Speaker 2>memory locations, which usually means using an array, or you

242
00:12:27.120 --> 00:12:30.320
<v Speaker 2>can use linked structures. In Ruby, It's built in array

243
00:12:30.360 --> 00:12:34.240
<v Speaker 2>class is particularly well suited for a contiguous stack implementation.

244
00:12:34.879 --> 00:12:38.879
<v Speaker 2>Why because it already has push and pop methods built

245
00:12:38.919 --> 00:12:42.279
<v Speaker 2>right in and they automatically handle the memory expansion and

246
00:12:42.320 --> 00:12:43.080
<v Speaker 2>shrinking for you.

247
00:12:43.200 --> 00:12:45.720
<v Speaker 1>It's very convenient, nice and the linked way.

248
00:12:46.200 --> 00:12:49.240
<v Speaker 2>For a linked implementation, you typically use what's called a

249
00:12:49.279 --> 00:12:52.799
<v Speaker 2>singly linked list. You have nodes each pointing to the next,

250
00:12:52.960 --> 00:12:54.919
<v Speaker 2>and you just keep a reference to the headnode, which

251
00:12:54.919 --> 00:12:57.559
<v Speaker 2>acts as your top node. The advantage here is that

252
00:12:57.600 --> 00:13:00.519
<v Speaker 2>it never really becomes full unless you complete run out

253
00:13:00.519 --> 00:13:03.080
<v Speaker 2>of memory, and it can sometimes be more efficient with

254
00:13:03.159 --> 00:13:05.320
<v Speaker 2>space only using what it needs.

255
00:13:05.919 --> 00:13:09.879
<v Speaker 1>Interesting trade offs. Okay, so then we have queues. Unlike stacks,

256
00:13:09.919 --> 00:13:14.159
<v Speaker 1>these are first in, first out FIFO, like a line

257
00:13:14.200 --> 00:13:16.080
<v Speaker 1>at the bank or for free lunch.

258
00:13:16.240 --> 00:13:18.799
<v Speaker 2>Exactly like a line, a queue is a dispenser where

259
00:13:18.879 --> 00:13:20.600
<v Speaker 2>elements are inserted at one end.

260
00:13:20.600 --> 00:13:23.320
<v Speaker 1>Called the rear, joining the back of the line right, and.

261
00:13:23.279 --> 00:13:25.840
<v Speaker 2>They're removed or access from the other end the front,

262
00:13:25.919 --> 00:13:30.000
<v Speaker 2>getting served at the front precisely. Key operations include enter

263
00:13:30.240 --> 00:13:32.960
<v Speaker 2>or on queue to add to the rear, leave or

264
00:13:33.000 --> 00:13:35.279
<v Speaker 2>to queue to remove from the front and front to

265
00:13:35.399 --> 00:13:37.679
<v Speaker 2>just peek at the element at the front without removing it.

266
00:13:37.799 --> 00:13:40.559
<v Speaker 1>And there's a catch with leave in front, isn't there?

267
00:13:40.720 --> 00:13:44.679
<v Speaker 2>Yes, both leave and front share an important precondition. The

268
00:13:44.799 --> 00:13:48.000
<v Speaker 2>queue must not be empty. You can't serve someone if

269
00:13:48.000 --> 00:13:48.840
<v Speaker 2>the line is empty.

270
00:13:49.399 --> 00:13:53.639
<v Speaker 1>Makes sense. So how would our print spooler use a

271
00:13:53.759 --> 00:13:56.120
<v Speaker 1>queue instead of the stack we talked about earlier?

272
00:13:56.240 --> 00:13:59.080
<v Speaker 2>Well, a queue is perfect for managing print jobs fairly.

273
00:13:59.480 --> 00:14:02.559
<v Speaker 2>As new print jobs arrive from different users, they enter

274
00:14:02.600 --> 00:14:05.240
<v Speaker 2>the queue at the rear. When the printer becomes free,

275
00:14:05.440 --> 00:14:07.240
<v Speaker 2>it takes the job at the front, the one that's

276
00:14:07.279 --> 00:14:08.799
<v Speaker 2>been waiting the longest, by having it.

277
00:14:08.799 --> 00:14:11.519
<v Speaker 1>Leave the queue first come, first served exactly.

278
00:14:12.080 --> 00:14:15.480
<v Speaker 2>This guarantees no job gets held up indefinitely. Everyone gets

279
00:14:15.519 --> 00:14:18.320
<v Speaker 2>their turn in the order they arrived. It ensures fairness.

280
00:14:18.759 --> 00:14:22.039
<v Speaker 1>Okay, So when we think about implementing queues, what's the

281
00:14:22.080 --> 00:14:25.080
<v Speaker 1>core challenge, especially if we try to use a standard array,

282
00:14:25.679 --> 00:14:27.360
<v Speaker 1>and how do we usually solve it?

283
00:14:27.759 --> 00:14:31.639
<v Speaker 2>Right with a contiguous array implementation, The challenge is that

284
00:14:31.799 --> 00:14:35.519
<v Speaker 2>removing an element from the front would normally require shifting

285
00:14:35.639 --> 00:14:38.919
<v Speaker 2>every single other element down one spot to fill the gap,

286
00:14:39.120 --> 00:14:39.399
<v Speaker 2>and it.

287
00:14:39.399 --> 00:14:42.320
<v Speaker 1>Sounds terribly inefficient, especially for long queues.

288
00:14:42.639 --> 00:14:45.120
<v Speaker 2>It is it would make the leave operation very slow.

289
00:14:45.639 --> 00:14:48.159
<v Speaker 2>So the clever solution is to use what's called a

290
00:14:48.240 --> 00:14:52.399
<v Speaker 2>circular array or ring buffer. Imagine the array is bent

291
00:14:52.440 --> 00:14:55.639
<v Speaker 2>into a circle. The data can float within the array,

292
00:14:55.960 --> 00:14:58.639
<v Speaker 2>and the front and rear printers can wrap around from

293
00:14:58.639 --> 00:14:59.600
<v Speaker 2>the end back.

294
00:14:59.399 --> 00:15:01.519
<v Speaker 1>To the start. Ah, so you don't have to shift

295
00:15:01.559 --> 00:15:02.679
<v Speaker 1>everything exactly.

296
00:15:03.120 --> 00:15:07.519
<v Speaker 2>This dramatically improves efficiency by avoiding those constant, costly data

297
00:15:07.600 --> 00:15:09.360
<v Speaker 2>shifts when elements leave the front.

298
00:15:09.759 --> 00:15:11.519
<v Speaker 1>Clever, but is that the preferred way?

299
00:15:11.679 --> 00:15:14.879
<v Speaker 2>It works? But a linked implementation is often generally preferred

300
00:15:14.879 --> 00:15:18.000
<v Speaker 2>For cues. You typically use a singly linked list, but

301
00:15:18.080 --> 00:15:20.600
<v Speaker 2>this time you need pointers to both the front ptr

302
00:15:20.759 --> 00:15:23.720
<v Speaker 2>the head and the rear ptr the tail to allow

303
00:15:23.720 --> 00:15:26.120
<v Speaker 2>efficient additions at the back and removals from the front.

304
00:15:26.240 --> 00:15:27.200
<v Speaker 1>Why is that often better?

305
00:15:27.399 --> 00:15:31.600
<v Speaker 2>Well, it's truly unbounded, again, limited only by memory. It's

306
00:15:31.679 --> 00:15:35.840
<v Speaker 2>usually very efficient in space usage, and both entering and

307
00:15:36.039 --> 00:15:40.480
<v Speaker 2>leaving are very fast constant time operations because you don't

308
00:15:40.480 --> 00:15:42.559
<v Speaker 2>need to traverse the whole list, just update a couple

309
00:15:42.600 --> 00:15:43.159
<v Speaker 2>of pointers.

310
00:15:43.240 --> 00:15:47.080
<v Speaker 1>Got it. Lots of implementation choices with different trade offs.

311
00:15:47.120 --> 00:15:51.080
<v Speaker 2>Absolutely. Now let's shift gears slightly and turn our attention

312
00:15:51.200 --> 00:15:55.960
<v Speaker 2>back to collections. Remember those they're traversible containers.

313
00:15:55.519 --> 00:15:56.960
<v Speaker 1>Right, meaning you can access all their.

314
00:15:56.879 --> 00:16:00.000
<v Speaker 2>Elements exactly, and that process of accessing all the other

315
00:16:00.000 --> 00:16:01.279
<v Speaker 2>elements is called iteration.

316
00:16:01.639 --> 00:16:04.279
<v Speaker 1>Okay, iteration, But there are different ways to actually do

317
00:16:04.399 --> 00:16:05.440
<v Speaker 1>the iterating, aren't there?

318
00:16:05.600 --> 00:16:09.440
<v Speaker 2>Yes, there are a few main styles. Iteration can be external,

319
00:16:09.840 --> 00:16:13.720
<v Speaker 2>where a separate entity an iterator object, controls the traversal.

320
00:16:14.159 --> 00:16:17.000
<v Speaker 2>It asks the collection for the next element, then the next,

321
00:16:17.120 --> 00:16:20.440
<v Speaker 2>giving you fine grained control over the process.

322
00:16:20.120 --> 00:16:22.240
<v Speaker 1>Like having a separate remote control for the collection.

323
00:16:22.440 --> 00:16:25.480
<v Speaker 2>Kind of yeah. The well known iterator design pattern is

324
00:16:25.480 --> 00:16:29.320
<v Speaker 2>a formal model for these external iterators. Or iteration can

325
00:16:29.360 --> 00:16:33.080
<v Speaker 2>be internal. Here, the collection itself provides a method maybe

326
00:16:33.120 --> 00:16:35.360
<v Speaker 2>called each or four each, that accepts a block of

327
00:16:35.440 --> 00:16:38.960
<v Speaker 2>code like a function or lambda, and applies that code

328
00:16:38.960 --> 00:16:42.279
<v Speaker 2>to each element within the collection. The collection manages the

329
00:16:42.279 --> 00:16:43.279
<v Speaker 2>traversal internally.

330
00:16:43.399 --> 00:16:45.759
<v Speaker 1>Okay, external control versus internal application.

331
00:16:46.000 --> 00:16:49.000
<v Speaker 2>Right, And given its reputation for flexibility, how does Ruby

332
00:16:49.039 --> 00:16:51.120
<v Speaker 2>fit into this picture? Does it prefer one way?

333
00:16:51.559 --> 00:16:56.399
<v Speaker 1>Ruby is remarkably versatile here. It actually supports most of

334
00:16:56.440 --> 00:16:59.679
<v Speaker 1>the common iteration alternatives you find in different languages.

335
00:16:59.320 --> 00:17:02.320
<v Speaker 2>But it's preferred mechanism. The most idiomatic way to traverse

336
00:17:02.320 --> 00:17:06.079
<v Speaker 2>collections in Ruby is internal iteration. This is primarily done

337
00:17:06.119 --> 00:17:09.160
<v Speaker 2>through the innumerable mix and module. If a class includes

338
00:17:09.160 --> 00:17:12.920
<v Speaker 2>innumerable and defines in each method, it automatically gets access

339
00:17:12.960 --> 00:17:17.000
<v Speaker 2>to over twenty powerful internal iteration and collection processing methods

340
00:17:17.039 --> 00:17:19.000
<v Speaker 2>like map, select, reduce, and so on.

341
00:17:19.160 --> 00:17:22.319
<v Speaker 1>Wow, innumerable is powerful, then hugely powerful.

342
00:17:22.680 --> 00:17:27.160
<v Speaker 2>But Ruby also supports enumerator objects. These are interesting because

343
00:17:27.400 --> 00:17:30.279
<v Speaker 2>they're kind of like iterators that can perform both internal

344
00:17:30.359 --> 00:17:34.680
<v Speaker 2>and external iteration. They can even handle things like exceptions

345
00:17:34.720 --> 00:17:37.200
<v Speaker 2>if you need to stop the iteration process early. So

346
00:17:37.319 --> 00:17:41.480
<v Speaker 2>Ruby gives developers a really nice blend of options for iteration.

347
00:17:41.359 --> 00:17:45.400
<v Speaker 1>Very flexible. Let's talk about a fundamental collection type lists.

348
00:17:45.559 --> 00:17:48.720
<v Speaker 2>Right lists are ordered linear collections. They're absolutely fundamental for

349
00:17:48.759 --> 00:17:52.599
<v Speaker 2>countless applications. Common operations you'd expect on a list include

350
00:17:52.720 --> 00:17:56.200
<v Speaker 2>inserting an element at a specific index, deleting an element

351
00:17:56.240 --> 00:17:59.960
<v Speaker 2>at a specific index, delete heat, accessing elements by the

352
00:18:00.160 --> 00:18:04.519
<v Speaker 2>index using the square brackets, replacing elements by index, finding

353
00:18:04.559 --> 00:18:07.720
<v Speaker 2>the index of a particular element, and maybe creating slices

354
00:18:07.799 --> 00:18:09.160
<v Speaker 2>which are like sub lists.

355
00:18:09.240 --> 00:18:11.720
<v Speaker 1>Yeah, that example you gave earlier. A calendar programs to do.

356
00:18:11.839 --> 00:18:14.720
<v Speaker 1>List seems like a perfect fit. Items have an order,

357
00:18:14.720 --> 00:18:17.559
<v Speaker 1>maybe based on precedence, and need to add, remove, or

358
00:18:17.640 --> 00:18:20.359
<v Speaker 1>maybe move them around freely. A list seems ideal.

359
00:18:20.480 --> 00:18:23.279
<v Speaker 2>It is a list ADT is the perfect container for

360
00:18:23.319 --> 00:18:25.880
<v Speaker 2>that kind of dynamic ordered data.

361
00:18:25.960 --> 00:18:28.799
<v Speaker 1>So when we talk about implementing lists, what are the

362
00:18:28.799 --> 00:18:31.200
<v Speaker 1>primary trade offs? We touched on this a bit with

363
00:18:31.279 --> 00:18:33.839
<v Speaker 1>stacks and queues using arrays versus link.

364
00:18:33.720 --> 00:18:37.599
<v Speaker 2>Structures exactly the same core trade off. Supply lists are

365
00:18:37.640 --> 00:18:42.039
<v Speaker 2>actually very straightforward to implement with a contiguous implementation, Elements

366
00:18:42.039 --> 00:18:45.319
<v Speaker 2>are stored sequentially right next to each other in an array.

367
00:18:45.960 --> 00:18:48.960
<v Speaker 2>Static arrays have a fixed size, which can be limiting.

368
00:18:49.400 --> 00:18:52.680
<v Speaker 2>Dynamic arrays like the ones Ruby provides, grow as needed.

369
00:18:52.839 --> 00:18:54.839
<v Speaker 2>Remember the doubling strategy.

370
00:18:54.440 --> 00:18:56.200
<v Speaker 1>Right to avoid frequent reallocations.

371
00:18:56.319 --> 00:18:59.559
<v Speaker 2>Yeah. Now, the big trade off with contiguous lists is

372
00:18:59.640 --> 00:19:03.119
<v Speaker 2>insert and deletion in the middle. If you insert or

373
00:19:03.160 --> 00:19:05.640
<v Speaker 2>delete an element somewhere in the middle of a long list,

374
00:19:05.720 --> 00:19:08.559
<v Speaker 2>you might have to shift many many other elements up

375
00:19:08.640 --> 00:19:11.359
<v Speaker 2>or down to make space or close the gap. This

376
00:19:11.519 --> 00:19:16.400
<v Speaker 2>makes those operations potentially slow on in the worst case, proportional.

377
00:19:15.759 --> 00:19:17.799
<v Speaker 1>To the list length, But accessing is fast.

378
00:19:17.920 --> 00:19:22.200
<v Speaker 2>But accessing an element directly by its index number, that's instantaneous.

379
00:19:22.359 --> 00:19:24.799
<v Speaker 2>Oh one, because you know exactly where it is in

380
00:19:24.839 --> 00:19:26.079
<v Speaker 2>memory based on the index.

381
00:19:26.160 --> 00:19:28.960
<v Speaker 1>Okay, and what about linked implementations for lists?

382
00:19:29.160 --> 00:19:32.920
<v Speaker 2>For a linked implementation, you use that series of nodes

383
00:19:32.960 --> 00:19:36.759
<v Speaker 2>we talked about, connected by references or pointers. You can

384
00:19:36.880 --> 00:19:39.839
<v Speaker 2>use singly linked lists each node points to the next,

385
00:19:39.960 --> 00:19:42.720
<v Speaker 2>or doubly linked lists each node points to the next

386
00:19:42.799 --> 00:19:43.119
<v Speaker 2>and the.

387
00:19:43.079 --> 00:19:45.759
<v Speaker 1>Previous, and the trade off here is flipped pretty much.

388
00:19:46.279 --> 00:19:49.799
<v Speaker 2>The advantage is that insertion and deletion are very fast.

389
00:19:49.920 --> 00:19:51.880
<v Speaker 2>Once you've found the node where you want to insert

390
00:19:51.960 --> 00:19:54.480
<v Speaker 2>or delete, you just need to update a few pointers.

391
00:19:54.519 --> 00:19:56.680
<v Speaker 2>It doesn't matter how long the list is, right, But

392
00:19:56.799 --> 00:20:00.799
<v Speaker 2>the significant drawback is accessing an element by its index.

393
00:20:01.359 --> 00:20:04.920
<v Speaker 2>Since the nodes aren't stored contiguously, to find the element

394
00:20:05.000 --> 00:20:08.079
<v Speaker 2>at index, say one thousand, you have to start at

395
00:20:08.119 --> 00:20:11.480
<v Speaker 2>the beginning ahead and follow the links one thousand times.

396
00:20:11.559 --> 00:20:14.119
<v Speaker 1>Ah, So accessing by index become slow.

397
00:20:14.240 --> 00:20:15.359
<v Speaker 2>Oh m, exactly.

398
00:20:15.559 --> 00:20:15.799
<v Speaker 1>Yeah.

399
00:20:15.839 --> 00:20:18.480
<v Speaker 2>So this raises that important question again, when would you

400
00:20:18.599 --> 00:20:19.720
<v Speaker 2>choose one over the other.

401
00:20:20.119 --> 00:20:21.680
<v Speaker 1>It depends on how you use the list.

402
00:20:21.519 --> 00:20:24.480
<v Speaker 2>Right precisely, if your list is going to be modified, frequently,

403
00:20:24.559 --> 00:20:27.359
<v Speaker 2>lots of insertions and deletions, especially in the middle, but

404
00:20:27.400 --> 00:20:30.039
<v Speaker 2>you don't need to jump to specific indices very often,

405
00:20:30.559 --> 00:20:34.119
<v Speaker 2>then a linked implementation is generally better. But if access

406
00:20:34.160 --> 00:20:36.279
<v Speaker 2>to elements in the middle of a long list by

407
00:20:36.319 --> 00:20:41.559
<v Speaker 2>their index is frequent and changes, insertions deletions are relatively rare,

408
00:20:42.000 --> 00:20:46.200
<v Speaker 2>then a contiguous implementation like an array is usually superior.

409
00:20:46.440 --> 00:20:50.079
<v Speaker 1>Understanding those access patterns is crucial for performance.

410
00:20:49.680 --> 00:20:50.559
<v Speaker 2>Absolutely key.

411
00:20:50.680 --> 00:20:52.640
<v Speaker 1>And what about rubies built in array How does it

412
00:20:52.680 --> 00:20:54.960
<v Speaker 1>compare to these conceptual list implementations.

413
00:20:55.240 --> 00:20:58.279
<v Speaker 2>Well, the Ruby array class is already remarkably similar to

414
00:20:58.319 --> 00:21:03.240
<v Speaker 2>a contiguously implemented list ADT. It effectively implements most, if

415
00:21:03.279 --> 00:21:06.799
<v Speaker 2>not all, of the common list interface operations we discussed,

416
00:21:07.079 --> 00:21:10.640
<v Speaker 2>and behaves almost identically in terms of performance, characteristics, fast

417
00:21:10.680 --> 00:21:14.599
<v Speaker 2>index access, potentially slower mid list insertions solutions.

418
00:21:14.839 --> 00:21:16.559
<v Speaker 1>So, if you need a list in Ruby.

419
00:21:16.319 --> 00:21:19.200
<v Speaker 2>The simplest and often the most efficient way to implement

420
00:21:19.200 --> 00:21:22.000
<v Speaker 2>a customer array list in Ruby is simply to subclass

421
00:21:22.000 --> 00:21:25.519
<v Speaker 2>it's powerful built in array class, or often just use

422
00:21:25.519 --> 00:21:27.640
<v Speaker 2>the array class directly. It already does most of what

423
00:21:27.640 --> 00:21:28.519
<v Speaker 2>you'd want from a list.

424
00:21:28.640 --> 00:21:31.599
<v Speaker 1>Got it? Okay? We've mentioned efficiency and performance quite a

425
00:21:31.599 --> 00:21:34.079
<v Speaker 1>bit in one. But how do we actually measure how

426
00:21:34.119 --> 00:21:36.279
<v Speaker 1>good an algorithm is? Is it really just about timing

427
00:21:36.359 --> 00:21:38.240
<v Speaker 1>how fast it runs on my specific computer.

428
00:21:38.599 --> 00:21:41.720
<v Speaker 2>That's a really great question and the answer is no,

429
00:21:42.000 --> 00:21:46.400
<v Speaker 2>not really. Simply timing programs isn't a reliable enough measure.

430
00:21:46.799 --> 00:21:50.079
<v Speaker 2>There are just too many variables, too many confounding factors,

431
00:21:50.160 --> 00:21:53.599
<v Speaker 2>as the source calls them. Well, the programming language you

432
00:21:53.680 --> 00:21:57.079
<v Speaker 2>use makes a difference, the compiler or interpreter used, the

433
00:21:57.119 --> 00:21:59.720
<v Speaker 2>actual speed of the machine you're running it on, and

434
00:21:59.720 --> 00:22:02.400
<v Speaker 2>what else that machine is doing its load, even the

435
00:22:02.440 --> 00:22:05.559
<v Speaker 2>operating system, and frankly, the skill of the programmer who

436
00:22:05.559 --> 00:22:09.599
<v Speaker 2>wrote the specific implementation. All these things make it really

437
00:22:09.680 --> 00:22:12.839
<v Speaker 2>hard to get a trustworthy measure of the algorithm's inherent

438
00:22:12.960 --> 00:22:15.119
<v Speaker 2>work just by using a stopwatch.

439
00:22:15.200 --> 00:22:18.119
<v Speaker 1>Okay, so if direct timing is unreliable because of all

440
00:22:18.160 --> 00:22:22.119
<v Speaker 1>those factors, what can we measure instead to understand an

441
00:22:22.160 --> 00:22:23.640
<v Speaker 1>algorithm's true performance.

442
00:22:23.720 --> 00:22:26.599
<v Speaker 2>It's essence, we need to abstract away from the specific

443
00:22:26.640 --> 00:22:30.480
<v Speaker 2>implementation details and focus on the algorithm. And abstraction, we

444
00:22:30.559 --> 00:22:32.799
<v Speaker 2>measure the amount of work done, not in seconds, but

445
00:22:32.880 --> 00:22:36.799
<v Speaker 2>by counting basic operations. Basic operation, Yeah, those operations that

446
00:22:36.799 --> 00:22:40.359
<v Speaker 2>are performed most often and fundamentally impact the algorithm's execution

447
00:22:40.480 --> 00:22:44.480
<v Speaker 2>time as the input size grows. For instance, in searching

448
00:22:44.559 --> 00:22:48.000
<v Speaker 2>or sorting algorithms, a key basic operation is often the

449
00:22:48.079 --> 00:22:51.079
<v Speaker 2>number of times we compare two keys or values. That's

450
00:22:51.119 --> 00:22:52.079
<v Speaker 2>a good proxy for the.

451
00:22:52.119 --> 00:22:54.599
<v Speaker 1>Work being done. Okay, count the core work right now.

452
00:22:54.640 --> 00:22:57.480
<v Speaker 2>Algorithms don't always perform the exact same number of these

453
00:22:57.519 --> 00:23:00.319
<v Speaker 2>basic operations for every input of a certain size, so

454
00:23:00.359 --> 00:23:04.200
<v Speaker 2>we usually consider three cases for analysis. First, there's best

455
00:23:04.240 --> 00:23:08.640
<v Speaker 2>case complexity, often written as b N. That's the minimum

456
00:23:08.720 --> 00:23:11.680
<v Speaker 2>number of operations the algorithm might perform for an input

457
00:23:11.720 --> 00:23:15.200
<v Speaker 2>of size N. Example, sure, take a simple sequential search,

458
00:23:15.440 --> 00:23:17.839
<v Speaker 2>just looking through a list from beginning to end. If

459
00:23:17.839 --> 00:23:19.880
<v Speaker 2>the item you're looking for happens to be the very

460
00:23:19.920 --> 00:23:23.240
<v Speaker 2>first one, that's the best case. Just one comparison, bn

461
00:23:23.440 --> 00:23:27.039
<v Speaker 2>N one got it. Then there's worst case complexity wn

462
00:23:27.240 --> 00:23:31.240
<v Speaker 2>that's the maximum number of operations for sequential search. The

463
00:23:31.279 --> 00:23:33.319
<v Speaker 2>worst case is if the item isn't in the list

464
00:23:33.319 --> 00:23:35.680
<v Speaker 2>at all, or if it's the very last element. You

465
00:23:35.680 --> 00:23:38.680
<v Speaker 2>have to compare it with all n elements wnn.

466
00:23:38.279 --> 00:23:39.799
<v Speaker 1>N Okay, best and worse what else?

467
00:23:40.039 --> 00:23:44.440
<v Speaker 2>Finally, there's average case complexity an This tries to describe

468
00:23:44.480 --> 00:23:47.640
<v Speaker 2>the algorithm's behavior over a wide range of typical or

469
00:23:47.680 --> 00:23:51.680
<v Speaker 2>random inputs. It often requires making some reasonable assumptions about

470
00:23:51.720 --> 00:23:55.559
<v Speaker 2>the distribution of possible inputs. For sequential search, the average

471
00:23:55.599 --> 00:23:57.400
<v Speaker 2>is often around ND two comparisons.

472
00:23:57.960 --> 00:24:00.920
<v Speaker 1>This seems like where big O notation becomes really useful,

473
00:24:00.920 --> 00:24:02.960
<v Speaker 1>doesn't it to simplify these complexities?

474
00:24:03.079 --> 00:24:07.680
<v Speaker 2>It absolutely is. Big O notation written as of describes

475
00:24:07.720 --> 00:24:11.000
<v Speaker 2>the asymptotic growth rate of a function, or its order

476
00:24:11.119 --> 00:24:13.839
<v Speaker 2>of growth. It gives us a standard way to talk

477
00:24:13.839 --> 00:24:16.799
<v Speaker 2>about how the work required by an algorithm grows as

478
00:24:16.839 --> 00:24:19.119
<v Speaker 2>the input size end gets really really.

479
00:24:18.920 --> 00:24:21.839
<v Speaker 1>Large asymptotic Focusing on large inputs.

480
00:24:21.599 --> 00:24:24.960
<v Speaker 2>Exactly, we're generally more interested in the big differences in

481
00:24:25.000 --> 00:24:27.559
<v Speaker 2>efficiency that show up when you're dealing with large amounts

482
00:24:27.599 --> 00:24:30.720
<v Speaker 2>of data, rather than tiny differences in constant factors or

483
00:24:30.759 --> 00:24:34.200
<v Speaker 2>performance on small inputs. Big OH helps us partition algorithms

484
00:24:34.240 --> 00:24:37.279
<v Speaker 2>into groups with roughly equivalent efficiency scaling.

485
00:24:37.559 --> 00:24:38.960
<v Speaker 1>So for that sequential search.

486
00:24:39.319 --> 00:24:42.920
<v Speaker 2>For sequential search, whether you look at the best case one,

487
00:24:43.200 --> 00:24:46.519
<v Speaker 2>worst case n, or average case N two, we'd say

488
00:24:46.519 --> 00:24:49.960
<v Speaker 2>the algorithm is linear. Its complexity grows roughly in proportion

489
00:24:50.079 --> 00:24:52.680
<v Speaker 2>to end, So we say sequential search is an.

490
00:24:52.559 --> 00:24:55.759
<v Speaker 1>O n Okay, big O gives us the dominant growth

491
00:24:55.839 --> 00:24:56.720
<v Speaker 1>rate precisely.

492
00:24:56.839 --> 00:24:59.880
<v Speaker 2>Now that we've analyzed search efficiency, let's apply this thinking

493
00:25:00.079 --> 00:25:04.519
<v Speaker 2>to sorting algorithms. The basic sorting algorithms things like bubble sort,

494
00:25:04.599 --> 00:25:06.960
<v Speaker 2>selection sort, and insertion sort. They typically have an O

495
00:25:07.119 --> 00:25:11.200
<v Speaker 2>N two or quadratic worst case and average case complexity

496
00:25:11.319 --> 00:25:11.960
<v Speaker 2>and squared.

497
00:25:12.119 --> 00:25:14.119
<v Speaker 1>That sounds like it gets slow fast it does.

498
00:25:14.319 --> 00:25:16.720
<v Speaker 2>It means if you double the input size, the time

499
00:25:16.799 --> 00:25:21.200
<v Speaker 2>taken roughly quadruples. Performance degrades quadratically as the input size grows.

500
00:25:21.440 --> 00:25:25.119
<v Speaker 2>So for large data sets, these simple sorts become very impractical.

501
00:25:25.480 --> 00:25:28.480
<v Speaker 1>Although you mentioned insertion sort has a slightly better case, right.

502
00:25:28.480 --> 00:25:31.440
<v Speaker 2>Insertion sort is interesting because its best case, if the

503
00:25:31.480 --> 00:25:34.519
<v Speaker 2>list is already sorted or nearly sorted, is actually much faster.

504
00:25:34.599 --> 00:25:37.880
<v Speaker 2>It's o in linear, but its average and worst are

505
00:25:37.960 --> 00:25:39.279
<v Speaker 2>still O N two.

506
00:25:39.640 --> 00:25:42.440
<v Speaker 1>So for any serious amount of data, these simple O

507
00:25:42.640 --> 00:25:45.039
<v Speaker 1>N two sorts probably aren't going to cut it. What

508
00:25:45.079 --> 00:25:48.000
<v Speaker 1>are the algorithms we really rely on for speed when

509
00:25:48.039 --> 00:25:49.400
<v Speaker 1>sorting large lists?

510
00:25:49.720 --> 00:25:53.079
<v Speaker 2>For larger data sets, we absolutely turn to faster, more

511
00:25:53.119 --> 00:25:57.839
<v Speaker 2>sophisticated algorithms. The most famous are probably merge sort and

512
00:25:57.920 --> 00:26:01.359
<v Speaker 2>quick sort. Both of these typical use a powerful strategy

513
00:26:01.400 --> 00:26:02.720
<v Speaker 2>called divide and conquer.

514
00:26:02.880 --> 00:26:06.200
<v Speaker 1>Divide and conquer break the problem down exactly.

515
00:26:06.640 --> 00:26:10.119
<v Speaker 2>Merge sort works by recursively dividing the list into two halves,

516
00:26:10.279 --> 00:26:13.519
<v Speaker 2>sorting each half independently, and then merging the two sorted

517
00:26:13.559 --> 00:26:15.359
<v Speaker 2>halves back together into a single.

518
00:26:15.119 --> 00:26:17.440
<v Speaker 1>Sorted list, and its performance merch.

519
00:26:17.240 --> 00:26:21.359
<v Speaker 2>Sort is remarkably consistent. It's o n log n pronounced

520
00:26:21.519 --> 00:26:24.640
<v Speaker 2>n log n in its best, worst, and average cases.

521
00:26:25.119 --> 00:26:28.359
<v Speaker 2>That logarithmic factor makes a huge difference compared to end.

522
00:26:28.279 --> 00:26:31.200
<v Speaker 1>Two N log n is much better than n squared.

523
00:26:31.039 --> 00:26:33.920
<v Speaker 2>Much much better for a large N. The main downside

524
00:26:33.920 --> 00:26:36.400
<v Speaker 2>of merge sort is that it usually requires extra memory

525
00:26:36.440 --> 00:26:39.440
<v Speaker 2>space roughly proportional to end to perform that merging step.

526
00:26:39.599 --> 00:26:40.960
<v Speaker 1>Okay, what about quicksort.

527
00:26:41.079 --> 00:26:43.920
<v Speaker 2>Quick sort also uses divide and conquer, but in a

528
00:26:43.920 --> 00:26:46.680
<v Speaker 2>different way. It selects an element from the list called

529
00:26:46.680 --> 00:26:49.160
<v Speaker 2>the pivot. Then it partitions the rest of the list

530
00:26:49.200 --> 00:26:51.960
<v Speaker 2>around the pivot. All elements smaller than the pivot go

531
00:26:52.039 --> 00:26:54.319
<v Speaker 2>to its left, all elements greater go to its right.

532
00:26:54.839 --> 00:26:58.440
<v Speaker 2>After partitioning, it recursively calls quicksort on the sub list

533
00:26:58.480 --> 00:27:00.440
<v Speaker 2>to the left and the sub list to the right.

534
00:27:00.680 --> 00:27:02.039
<v Speaker 1>How does quick sort perform.

535
00:27:02.359 --> 00:27:05.960
<v Speaker 2>Quicksort's best and average case complexity is also o N

536
00:27:06.160 --> 00:27:09.119
<v Speaker 2>log end just like merged sort, and in practice, quick

537
00:27:09.160 --> 00:27:11.880
<v Speaker 2>sort is often slightly faster than merge sort on average

538
00:27:11.960 --> 00:27:15.519
<v Speaker 2>due to lower constant factors in its operations. It's incredibly

539
00:27:15.559 --> 00:27:16.759
<v Speaker 2>fast most of the time.

540
00:27:16.720 --> 00:27:19.559
<v Speaker 1>But quicksort has that notorious Achilles heel we talked about,

541
00:27:19.599 --> 00:27:22.039
<v Speaker 1>right it's worst case scenario.

542
00:27:21.880 --> 00:27:25.480
<v Speaker 2>It certainly does. Unlike merged sort, quick sort's worst case

543
00:27:25.519 --> 00:27:29.200
<v Speaker 2>complexity can degrade significantly. It can become o N two

544
00:27:29.359 --> 00:27:32.039
<v Speaker 2>when the that It happens if the pivot selection consistently

545
00:27:32.119 --> 00:27:35.880
<v Speaker 2>leads to highly imbalanced partitions. For example, if you always

546
00:27:35.880 --> 00:27:38.319
<v Speaker 2>pick the first element as the pivot and the list

547
00:27:38.359 --> 00:27:41.680
<v Speaker 2>happens to be already sorted or reverse sorted, each partition

548
00:27:41.759 --> 00:27:44.799
<v Speaker 2>step will only separate one element from the rest. This

549
00:27:44.960 --> 00:27:48.240
<v Speaker 2>turns the recursive process into something much closer to O

550
00:27:48.480 --> 00:27:48.839
<v Speaker 2>EN two.

551
00:27:49.079 --> 00:27:54.119
<v Speaker 1>Yikes. So how do real world implementations avoid that two trap?

552
00:27:54.319 --> 00:27:54.960
<v Speaker 1>Good question.

553
00:27:55.559 --> 00:27:59.640
<v Speaker 2>Practical implementations of quick sort almost always include improvements to

554
00:27:59.680 --> 00:28:03.000
<v Speaker 2>mitt gate this worst case. One common strategy is to

555
00:28:03.039 --> 00:28:05.279
<v Speaker 2>choose a better pivot for instance, using the median of

556
00:28:05.319 --> 00:28:08.079
<v Speaker 2>three method picking the median of the first, middle, and

557
00:28:08.160 --> 00:28:12.440
<v Speaker 2>last elements. This makes pathologically bad partitions much less likely.

558
00:28:12.640 --> 00:28:12.920
<v Speaker 1>Okay.

559
00:28:13.119 --> 00:28:15.920
<v Speaker 2>Another optimization is to switch to a simpler sort like

560
00:28:15.960 --> 00:28:19.400
<v Speaker 2>insertion sort for very small sublists, say fewer than ten

561
00:28:19.480 --> 00:28:23.200
<v Speaker 2>twenty elements. This reduces the overhead of recursive calls where

562
00:28:23.200 --> 00:28:26.079
<v Speaker 2>tiny problems where quicksort's complexity isn't needed.

563
00:28:26.279 --> 00:28:30.720
<v Speaker 1>Clever optimizations. Now, this raises an important practical point. Why

564
00:28:30.839 --> 00:28:33.920
<v Speaker 1>is rubies built in array dot sort method often significantly

565
00:28:33.960 --> 00:28:36.559
<v Speaker 1>faster than the standard quick sort implementation? You might write

566
00:28:36.559 --> 00:28:37.519
<v Speaker 1>yourself in Ruby.

567
00:28:37.799 --> 00:28:41.359
<v Speaker 2>Ah, yeah, rubies built in array dot sort method and

568
00:28:41.440 --> 00:28:45.680
<v Speaker 2>its related sort is highly optimized. It's typically implemented down

569
00:28:45.720 --> 00:28:48.559
<v Speaker 2>at the C level within the Ruby interpreter, which is

570
00:28:48.720 --> 00:28:52.359
<v Speaker 2>much faster than executing Ruby code directly. Lower level implementation

571
00:28:52.480 --> 00:28:55.880
<v Speaker 2>helps definitely, Plus it often doesn't use just a simple

572
00:28:55.960 --> 00:29:00.880
<v Speaker 2>quick sort. Modern implementations frequently employ advanced hybrid SOT algorithms.

573
00:29:01.319 --> 00:29:04.000
<v Speaker 2>A common one is called introspective.

574
00:29:03.279 --> 00:29:05.720
<v Speaker 1>Sort or intra sort introsort. What's up?

575
00:29:05.759 --> 00:29:09.000
<v Speaker 2>It basically starts with quicksort because it's fast on average,

576
00:29:09.240 --> 00:29:12.400
<v Speaker 2>but it monitors the recursion depth if the recursion gets

577
00:29:12.400 --> 00:29:15.359
<v Speaker 2>too deep, which indicates a potential O N two worst

578
00:29:15.359 --> 00:29:18.839
<v Speaker 2>case scenario is developing, which is strategy exactly. It switches

579
00:29:18.880 --> 00:29:21.839
<v Speaker 2>over to a different algorithm like heapsort, which has a

580
00:29:22.039 --> 00:29:25.519
<v Speaker 2>guaranteed O N log en worst case performance, even if

581
00:29:25.519 --> 00:29:28.400
<v Speaker 2>it's slightly slower on average than quicksort. So intrat sort

582
00:29:28.519 --> 00:29:31.039
<v Speaker 2>gets the average case speed of quicksort, all avoiding its

583
00:29:31.079 --> 00:29:33.480
<v Speaker 2>O N two watch case best of both worlds.

584
00:29:33.519 --> 00:29:36.200
<v Speaker 1>Really, that's really smart. It's an example of how built

585
00:29:36.240 --> 00:29:40.839
<v Speaker 1>in language features can leverage sophisticated algorithmic knowledge. Okay, let's

586
00:29:40.880 --> 00:29:43.599
<v Speaker 1>loop back to stacks for just a moment. Our source

587
00:29:43.640 --> 00:29:47.400
<v Speaker 1>material highlights a really interesting and strong connection between stacks

588
00:29:47.440 --> 00:29:49.240
<v Speaker 1>and recursion. Can you elaborate on that.

589
00:29:49.440 --> 00:29:54.200
<v Speaker 2>Indeed, it's actually a fundamental principle in computer science. Every

590
00:29:54.279 --> 00:29:58.480
<v Speaker 2>recursive algorithm and algorithm that calls itself can, by definition

591
00:29:58.920 --> 00:30:03.960
<v Speaker 2>be rewritten as a non recursive iterative algorithm that explicitly

592
00:30:04.079 --> 00:30:07.119
<v Speaker 2>uses a stack data structure to manage its state. Every

593
00:30:07.160 --> 00:30:10.359
<v Speaker 2>single one, every single one, the recursion the language provides

594
00:30:10.400 --> 00:30:13.319
<v Speaker 2>implicitly uses a call stack behind the scenes. You can

595
00:30:13.440 --> 00:30:16.440
<v Speaker 2>always replace that implicit stack with an explicit one. You

596
00:30:16.480 --> 00:30:17.240
<v Speaker 2>manage yourself.

597
00:30:17.519 --> 00:30:21.000
<v Speaker 1>So let's take a concrete problem, maybe checking for balance

598
00:30:21.079 --> 00:30:23.240
<v Speaker 1>brackets in a string. You know, make sure all the

599
00:30:23.240 --> 00:30:26.599
<v Speaker 1>opening brackets are like or have matching closing brackets, or

600
00:30:26.720 --> 00:30:29.640
<v Speaker 1>could you solve that recursively or with a stack.

601
00:30:29.759 --> 00:30:32.519
<v Speaker 2>That's a perfect example. Yes, you could absolutely solve the

602
00:30:32.519 --> 00:30:35.519
<v Speaker 2>balance brackets problem either way. You could write a recursive

603
00:30:35.559 --> 00:30:38.920
<v Speaker 2>function that processes the string, or you could write an

604
00:30:38.960 --> 00:30:41.960
<v Speaker 2>iterative function that pushes opening brackets onto a stack and

605
00:30:42.000 --> 00:30:44.759
<v Speaker 2>pops them when it finds a matching closing bracket.

606
00:30:44.799 --> 00:30:46.799
<v Speaker 1>Which way is better for balance brackets?

607
00:30:46.839 --> 00:30:51.799
<v Speaker 2>Honestly, both approaches are about equally complicated to conceptualize and write.

608
00:30:52.359 --> 00:30:56.240
<v Speaker 2>Neither is significantly easier than the other. However, this isn't

609
00:30:56.279 --> 00:31:02.119
<v Speaker 2>always the case. Consider evaluating mathematical expression. For prefix expressions,

610
00:31:02.319 --> 00:31:05.440
<v Speaker 2>where the operator comes before its operas like plus two three,

611
00:31:05.759 --> 00:31:09.319
<v Speaker 2>a recursive algorithm is generally much much easier and more

612
00:31:09.400 --> 00:31:13.720
<v Speaker 2>natural to write. Conversely, for postfix expressions also known as

613
00:31:13.759 --> 00:31:17.200
<v Speaker 2>reverse polish notation, where the operator comes after its operas

614
00:31:17.319 --> 00:31:20.799
<v Speaker 2>like two to three plus, a stack based iterative algorithm

615
00:31:20.880 --> 00:31:23.279
<v Speaker 2>is far simpler and more intuitive.

616
00:31:23.680 --> 00:31:24.799
<v Speaker 1>So the lesson is the.

617
00:31:24.880 --> 00:31:27.240
<v Speaker 2>Lesson for you as a developer is that while it's

618
00:31:27.279 --> 00:31:30.400
<v Speaker 2>always possible to use either recursion or an explicit stack,

619
00:31:30.759 --> 00:31:33.680
<v Speaker 2>you should probably sketch out both approaches mentally or even

620
00:31:33.720 --> 00:31:36.279
<v Speaker 2>on paper to see which one lends itself more naturally

621
00:31:36.319 --> 00:31:39.039
<v Speaker 2>and leads to cleaner, easier to understand code for the

622
00:31:39.119 --> 00:31:41.039
<v Speaker 2>specific problem you're trying to solve.

623
00:31:41.279 --> 00:31:44.279
<v Speaker 1>Choose the right tool for the job, even between recursion

624
00:31:44.319 --> 00:31:47.400
<v Speaker 1>and iteration with a stack exactly. Now, what about tail recursion?

625
00:31:47.440 --> 00:31:48.799
<v Speaker 1>Is that a special kind of recursion?

626
00:31:49.000 --> 00:31:52.279
<v Speaker 2>It is a very special and important case. A tail

627
00:31:52.359 --> 00:31:56.799
<v Speaker 2>recursive algorithm is one where if a recursive call happens,

628
00:31:57.079 --> 00:31:59.880
<v Speaker 2>it is the very last thing done in that execution

629
00:32:00.119 --> 00:32:03.680
<v Speaker 2>path of the function. There's no computation done after the

630
00:32:03.720 --> 00:32:04.880
<v Speaker 2>recursive call returns.

631
00:32:04.920 --> 00:32:05.759
<v Speaker 1>Why is that special?

632
00:32:06.000 --> 00:32:10.480
<v Speaker 2>Because these tail recursive algorithms are particularly efficient. Compilers or

633
00:32:10.480 --> 00:32:15.799
<v Speaker 2>interpreters that support tail call optimization TCO can recognize this pattern.

634
00:32:16.119 --> 00:32:18.720
<v Speaker 2>They can replace the recursive call with a simple jump,

635
00:32:19.000 --> 00:32:22.920
<v Speaker 2>essentially turning the recursion into an efficient loop without needing

636
00:32:22.920 --> 00:32:24.759
<v Speaker 2>to add a new frame to the call stacks.

637
00:32:24.839 --> 00:32:26.319
<v Speaker 1>So no stack build up.

638
00:32:26.440 --> 00:32:29.359
<v Speaker 2>Right because no additional data or context needs to be

639
00:32:29.400 --> 00:32:32.440
<v Speaker 2>saved for when the recursive call returns. Since the last action,

640
00:32:32.599 --> 00:32:34.960
<v Speaker 2>you don't need the stack space. This is always more

641
00:32:35.000 --> 00:32:37.960
<v Speaker 2>efficient in terms of both time and memory than standard

642
00:32:38.000 --> 00:32:39.440
<v Speaker 2>recursion that builds up the stack.

643
00:32:39.640 --> 00:32:41.680
<v Speaker 1>Does Ruby support tail call optimization?

644
00:32:42.240 --> 00:32:45.640
<v Speaker 2>Ugh, that's a tricky point. In Ruby standard Ruby interpreters

645
00:32:45.680 --> 00:32:49.480
<v Speaker 2>historically did not perform TCO by default, Although some alternative

646
00:32:49.519 --> 00:32:53.440
<v Speaker 2>implementations or specific configurations might offer it, it's not something

647
00:32:53.480 --> 00:32:56.640
<v Speaker 2>you can universally rely on in the Ruby ecosystem like

648
00:32:56.680 --> 00:32:58.480
<v Speaker 2>you might in some functional languages.

649
00:32:58.720 --> 00:33:02.519
<v Speaker 1>Good to know. Okay. Let's transition out to some more

650
00:33:02.519 --> 00:33:04.799
<v Speaker 1>complex nonlinear data structures.

651
00:33:05.480 --> 00:33:09.319
<v Speaker 2>Trees right trees a very important category. Let's start with

652
00:33:09.400 --> 00:33:14.440
<v Speaker 2>binary search trees, or bsts. They're incredibly powerful and widely used.

653
00:33:14.680 --> 00:33:17.799
<v Speaker 2>A BST is an ordered tree structure where every node

654
00:33:17.880 --> 00:33:20.960
<v Speaker 2>has at most two children, a left child and a

655
00:33:21.039 --> 00:33:21.640
<v Speaker 2>right child.

656
00:33:21.960 --> 00:33:24.200
<v Speaker 1>Binary means two children bax correct.

657
00:33:24.480 --> 00:33:26.519
<v Speaker 2>And the crucial property, the thing that makes it a

658
00:33:26.519 --> 00:33:30.440
<v Speaker 2>search tree is the ordering constraint. For any given node,

659
00:33:30.559 --> 00:33:33.519
<v Speaker 2>all the value stored in its left subtree must be

660
00:33:33.640 --> 00:33:36.079
<v Speaker 2>less than the node's own value. Okay, and all the

661
00:33:36.160 --> 00:33:38.680
<v Speaker 2>values stored in its right subtree must be greater than

662
00:33:38.720 --> 00:33:40.200
<v Speaker 2>the node's own value less.

663
00:33:40.000 --> 00:33:42.079
<v Speaker 1>Than to the left greater than to the right. That

664
00:33:42.119 --> 00:33:45.400
<v Speaker 1>structure seems specifically designed for fast lookups, doesn't It almost

665
00:33:45.440 --> 00:33:47.839
<v Speaker 1>like doing a binary search, but on a tree structure.

666
00:33:48.039 --> 00:33:52.160
<v Speaker 2>That's precisely the goal and the advantage. Operations like searching

667
00:33:52.160 --> 00:33:54.519
<v Speaker 2>for a value and serting a new value, or even

668
00:33:54.599 --> 00:33:57.839
<v Speaker 2>deleting a value all typically involves starting at the root

669
00:33:57.880 --> 00:34:01.440
<v Speaker 2>node and comparing the target value with the current nodes value.

670
00:34:01.680 --> 00:34:04.400
<v Speaker 2>If the target is smaller, you go left, If it's larger,

671
00:34:04.440 --> 00:34:08.119
<v Speaker 2>you go right. You repeat this process moving down the tree,

672
00:34:08.400 --> 00:34:10.519
<v Speaker 2>effectively mimicking a binary.

673
00:34:10.119 --> 00:34:11.559
<v Speaker 1>Search, and that makes it fast.

674
00:34:11.960 --> 00:34:15.559
<v Speaker 2>On average. Yes, if the tree is reasonably balanced, meaning

675
00:34:15.639 --> 00:34:19.119
<v Speaker 2>it's sort of bushy and not too skewed, these operations

676
00:34:19.400 --> 00:34:24.000
<v Speaker 2>search insertion deletion are very fast. They typically take logarithmic

677
00:34:24.039 --> 00:34:26.400
<v Speaker 2>time relative to the number of nodes n which we

678
00:34:26.519 --> 00:34:28.079
<v Speaker 2>rate as olgn or.

679
00:34:28.039 --> 00:34:31.199
<v Speaker 1>Olog in logarithmic time is great much faster than linear.

680
00:34:31.239 --> 00:34:34.920
<v Speaker 2>Absolutely, But there's a significant catch. Isn't there a worst

681
00:34:34.960 --> 00:34:38.239
<v Speaker 2>case scenario that can completely undermine all that wonderful olog

682
00:34:38.320 --> 00:34:42.039
<v Speaker 2>and efficiency. Yes, there is a major catch with simple bsts.

683
00:34:42.480 --> 00:34:44.800
<v Speaker 2>In the worst case scenario, if you insert values into

684
00:34:44.800 --> 00:34:47.519
<v Speaker 2>the tree in a sordid order like one, two, three, four, five,

685
00:34:47.920 --> 00:34:50.719
<v Speaker 2>or a reverse sortied order five four three two one.

686
00:34:51.039 --> 00:34:53.159
<v Speaker 2>The tree doesn't become bushy at all. It becomes long

687
00:34:53.280 --> 00:34:57.559
<v Speaker 2>and skinny, essentially degenerating into a linked list structure. Every

688
00:34:57.599 --> 00:35:00.079
<v Speaker 2>new node just gets added as the right child or

689
00:35:00.159 --> 00:35:01.840
<v Speaker 2>left child of the previous one.

690
00:35:01.920 --> 00:35:04.159
<v Speaker 1>Oh, so it just becomes a line exactly.

691
00:35:04.559 --> 00:35:07.920
<v Speaker 2>And in this degenerate list like scenario, searching for an

692
00:35:07.920 --> 00:35:11.639
<v Speaker 2>element might require traversing all the way down this long chain,

693
00:35:12.159 --> 00:35:16.880
<v Speaker 2>so search, insertion and deletion can all become on linear time,

694
00:35:17.039 --> 00:35:19.800
<v Speaker 2>completely losing the logarithmic efficiency benefit we want it.

695
00:35:19.880 --> 00:35:23.119
<v Speaker 1>That's bad, that defeats the whole purpose. So the crucial

696
00:35:23.199 --> 00:35:26.760
<v Speaker 1>question for anyone designing systems that rely on bsts yeah,

697
00:35:26.880 --> 00:35:31.519
<v Speaker 1>becomes how can we prevent this worst case degeneration and

698
00:35:31.599 --> 00:35:35.559
<v Speaker 1>guarantee that nice, consistent olog in performance, right.

699
00:35:35.360 --> 00:35:38.199
<v Speaker 2>And the answer is to use balanced search trees. These

700
00:35:38.199 --> 00:35:40.800
<v Speaker 2>are more sophisticated types of binary search trees that have

701
00:35:40.840 --> 00:35:44.360
<v Speaker 2>additional rules or mechanisms to actively maintain a relatively bushy

702
00:35:44.440 --> 00:35:47.719
<v Speaker 2>or balanced structure even as elements are inserted and deleted.

703
00:35:48.000 --> 00:35:50.800
<v Speaker 2>They prevent the tree from becoming too skewed or degenerate,

704
00:35:51.119 --> 00:35:54.480
<v Speaker 2>thereby guaranteeing olog in performance for all the key operations,

705
00:35:54.760 --> 00:35:57.039
<v Speaker 2>regardless of the order in which data arrives.

706
00:35:57.360 --> 00:36:00.920
<v Speaker 1>Okay, balanced trees are the solution. Us about one type?

707
00:36:01.000 --> 00:36:02.079
<v Speaker 1>Maybe AVL trees.

708
00:36:02.199 --> 00:36:05.039
<v Speaker 2>Sure ABL trees are one of the classic examples of

709
00:36:05.079 --> 00:36:08.239
<v Speaker 2>self balancing binary search trees. They were named after their

710
00:36:08.280 --> 00:36:12.920
<v Speaker 2>inventors Adelsenvelski and Landis and AVL tree is a BST

711
00:36:13.159 --> 00:36:16.679
<v Speaker 2>with an additional balance condition. For every single node in

712
00:36:16.679 --> 00:36:18.960
<v Speaker 2>the tree, the heights of its left subtree and its

713
00:36:19.079 --> 00:36:21.719
<v Speaker 2>right subtree can differ by at most.

714
00:36:21.519 --> 00:36:25.079
<v Speaker 1>One height difference of at most one. That sounds strict.

715
00:36:25.320 --> 00:36:26.199
<v Speaker 1>It is quite strict.

716
00:36:26.360 --> 00:36:29.159
<v Speaker 2>When you insert or delete a node in an AVL tree,

717
00:36:29.199 --> 00:36:32.719
<v Speaker 2>the operation might temporarily violate this balance condition somewhere up

718
00:36:32.719 --> 00:36:36.360
<v Speaker 2>the tree. When that happens, the tree automatically performs specific

719
00:36:36.360 --> 00:36:40.719
<v Speaker 2>restructuring operations called rotations single or double rotations at the

720
00:36:40.840 --> 00:36:43.840
<v Speaker 2>unbalanced nodes to restore the height balance property, ensuring the

721
00:36:43.880 --> 00:36:45.559
<v Speaker 2>O log and guarantee is maintained.

722
00:36:45.760 --> 00:36:49.960
<v Speaker 1>Okay, AVL trees use rotations to keep things balanced. What

723
00:36:50.000 --> 00:36:51.760
<v Speaker 1>about two three trees? They sound different?

724
00:36:52.039 --> 00:36:55.719
<v Speaker 2>They are different. Yeah. Two three trees are another type

725
00:36:55.719 --> 00:36:59.320
<v Speaker 2>of balanced search tree, but they aren't strictly binary trees.

726
00:36:59.440 --> 00:37:02.320
<v Speaker 2>They achieve balance in a different way. Two three trees

727
00:37:02.360 --> 00:37:06.000
<v Speaker 2>are defined as being perfectly balanced, which means all their

728
00:37:06.119 --> 00:37:08.760
<v Speaker 2>leaf nodes, the nodes at the bottom with no children,

729
00:37:08.960 --> 00:37:12.199
<v Speaker 2>are always on the exact same level or depth perfectly balanced.

730
00:37:12.199 --> 00:37:13.159
<v Speaker 1>How do they manage that?

731
00:37:13.400 --> 00:37:15.360
<v Speaker 2>They do it by allowing nodes to have either two

732
00:37:15.480 --> 00:37:18.800
<v Speaker 2>children or three children. Every internal node in a two

733
00:37:18.880 --> 00:37:21.360
<v Speaker 2>three tree is either a two node, meaning it holds

734
00:37:21.400 --> 00:37:24.159
<v Speaker 2>one data value and has two children left and right,

735
00:37:24.280 --> 00:37:27.239
<v Speaker 2>maintaining the search property, or it's a three node, meaning

736
00:37:27.280 --> 00:37:30.119
<v Speaker 2>it holds two data values, say small and large, and

737
00:37:30.199 --> 00:37:33.599
<v Speaker 2>has three children left, middle, and right, again maintaining order.

738
00:37:33.719 --> 00:37:36.159
<v Speaker 1>So nodes can hold more data and have more children.

739
00:37:36.440 --> 00:37:39.960
<v Speaker 2>Right when you insert or delete, you find the right spot.

740
00:37:40.519 --> 00:37:43.599
<v Speaker 2>Insertion might cause a node to temporarily have too many values,

741
00:37:43.679 --> 00:37:46.400
<v Speaker 2>like a four node. When this happens, the middle value

742
00:37:46.440 --> 00:37:49.639
<v Speaker 2>gets promoted up to the parent node, splitting the overflowing

743
00:37:49.679 --> 00:37:53.119
<v Speaker 2>node into two valid two nodes. This splitting process might

744
00:37:53.159 --> 00:37:56.239
<v Speaker 2>propagate upwards. Deletion might cause a node to have too

745
00:37:56.320 --> 00:38:00.000
<v Speaker 2>few values, like a one node. This trigger's combining or

746
00:38:00.079 --> 00:38:03.840
<v Speaker 2>borrowing values from siblings, potentially propagating changes upwards as well.

747
00:38:04.519 --> 00:38:08.000
<v Speaker 2>These mechanisms ensure the tree always remains perfectly balanced, with

748
00:38:08.039 --> 00:38:11.760
<v Speaker 2>all leaves at the same depth, guaranteeing olog in performance.

749
00:38:12.000 --> 00:38:15.679
<v Speaker 1>Wow. AVL trees and two three trees sound quite complex

750
00:38:15.719 --> 00:38:17.559
<v Speaker 1>to implement compared to a simple BST.

751
00:38:17.880 --> 00:38:21.000
<v Speaker 2>They definitely are more complex to implement correctly. There's more

752
00:38:21.039 --> 00:38:24.159
<v Speaker 2>overhead involved in the insertion and deletion operations due to

753
00:38:24.199 --> 00:38:27.840
<v Speaker 2>the balancing checks and restructuring rotations or splitting combining. But

754
00:38:27.920 --> 00:38:30.639
<v Speaker 2>what's fascinating and why they're so important is that they

755
00:38:30.679 --> 00:38:35.199
<v Speaker 2>trade that increased complexity during modifications for consistent, highly efficient

756
00:38:35.239 --> 00:38:40.079
<v Speaker 2>olog in performance across all critical operations search insertion, and dilution,

757
00:38:40.199 --> 00:38:41.639
<v Speaker 2>regardless of the input data.

758
00:38:41.480 --> 00:38:46.039
<v Speaker 1>Patterns, So that guaranteed performance makes the complexity worthwhile in many.

759
00:38:45.880 --> 00:38:50.719
<v Speaker 2>Applications absolutely, especially in databases, file systems, or any situation

760
00:38:50.800 --> 00:38:54.880
<v Speaker 2>where you need reliably fast search, insertion, and deletion on

761
00:38:55.039 --> 00:38:57.400
<v Speaker 2>potentially large dynamic data sets.

762
00:38:57.440 --> 00:39:00.239
<v Speaker 1>Okay, that makes sense. Now, let's talk about two very

763
00:39:00.239 --> 00:39:04.840
<v Speaker 1>common collection types that often leverage these efficient structures. Sets

764
00:39:05.039 --> 00:39:06.039
<v Speaker 1>and maps. Right.

765
00:39:06.159 --> 00:39:09.760
<v Speaker 2>Sets and maps are fundamental abstract data types themselves. A

766
00:39:09.880 --> 00:39:14.360
<v Speaker 2>set is conceptually an unordered collection where each element is unique.

767
00:39:14.519 --> 00:39:17.960
<v Speaker 2>It can appear at most once. Think of a mathematical.

768
00:39:17.400 --> 00:39:19.199
<v Speaker 1>Set no duplicates allowed.

769
00:39:18.880 --> 00:39:22.599
<v Speaker 2>Exactly, and a map, which is also sometimes called a table.

770
00:39:22.719 --> 00:39:26.000
<v Speaker 2>A dictionary or an associative array is an unordered collection

771
00:39:26.039 --> 00:39:29.239
<v Speaker 2>where values are stored and retrieved using a unique key.

772
00:39:29.880 --> 00:39:32.320
<v Speaker 2>It acts like a mathematical function, mapping keys to their

773
00:39:32.320 --> 00:39:35.800
<v Speaker 2>corresponding values. Each key must be unique within the map.

774
00:39:36.079 --> 00:39:39.840
<v Speaker 1>Key value pairs like a dictionary mapping words keys to

775
00:39:39.880 --> 00:39:41.199
<v Speaker 1>the definitions values.

776
00:39:41.320 --> 00:39:44.000
<v Speaker 2>Perfect analogy. Now how do we implement these efficiently?

777
00:39:44.199 --> 00:39:46.400
<v Speaker 1>Yeah, you wouldn't just use a plan array or linked list,

778
00:39:46.400 --> 00:39:49.159
<v Speaker 1>would you. Searching for a key or checking for set

779
00:39:49.159 --> 00:39:52.119
<v Speaker 1>membership could become slownn.

780
00:39:51.519 --> 00:39:55.559
<v Speaker 2>Generally, no storing set elements or map key value pairs

781
00:39:55.599 --> 00:39:58.960
<v Speaker 2>in simple arrays or linked lists is usually inefficient for

782
00:39:59.039 --> 00:40:03.559
<v Speaker 2>anything but the small collections, precisely because search insertion and

783
00:40:03.639 --> 00:40:08.360
<v Speaker 2>deletion often end up being on However, those balanced search

784
00:40:08.400 --> 00:40:11.840
<v Speaker 2>trees we just discussed, like AVL trees or two three trees,

785
00:40:12.079 --> 00:40:15.760
<v Speaker 2>or their relatives like red black trees are excellent candidates

786
00:40:15.760 --> 00:40:19.559
<v Speaker 2>for implementing both sets and maps, how so because they

787
00:40:19.599 --> 00:40:23.880
<v Speaker 2>provide those fast, guaranteed O log in operations for insertion, dilution,

788
00:40:24.039 --> 00:40:26.800
<v Speaker 2>and searching based on the set elements value or the

789
00:40:26.800 --> 00:40:30.239
<v Speaker 2>maps key. If you store set elements or map entries

790
00:40:30.320 --> 00:40:32.679
<v Speaker 2>keyed by their key in a balance suitch tree, all

791
00:40:32.719 --> 00:40:35.039
<v Speaker 2>the fundamental operations become very efficient.

792
00:40:35.079 --> 00:40:38.039
<v Speaker 1>Plus there's an added benefit with tree based maps. Right yes.

793
00:40:38.320 --> 00:40:40.320
<v Speaker 2>A nice side effect of using search trees is that

794
00:40:40.360 --> 00:40:42.920
<v Speaker 2>you can easily traverse the set elements or map entries

795
00:40:42.960 --> 00:40:45.719
<v Speaker 2>in sorted order, either by element value for sets or

796
00:40:45.760 --> 00:40:49.320
<v Speaker 2>by key for maps. Sometimes that order traversal is very useful.

797
00:40:49.480 --> 00:40:52.559
<v Speaker 1>Okay, So balance FREEZ give us o log in sets

798
00:40:52.559 --> 00:40:55.159
<v Speaker 1>and maps. Is there a way to get even faster?

799
00:40:55.360 --> 00:40:58.000
<v Speaker 1>Maybe close to one? Constant time access?

800
00:40:58.480 --> 00:41:02.159
<v Speaker 2>Ah, the quester instant acts us. In an ideal world,

801
00:41:02.280 --> 00:41:05.039
<v Speaker 2>retrieving a value from a map given its key, or

802
00:41:05.119 --> 00:41:09.000
<v Speaker 2>checking if an element isn't a set would be instantaneous. Right, oh, one,

803
00:41:09.400 --> 00:41:11.400
<v Speaker 2>that's precisely the goal of hashing.

804
00:41:11.679 --> 00:41:14.000
<v Speaker 1>Hashing, I've heard that term. How does it work?

805
00:41:14.440 --> 00:41:17.280
<v Speaker 2>Hashing uses a special function called a hash function to

806
00:41:17.320 --> 00:41:20.000
<v Speaker 2>do something quite clever. It takes an input key, which

807
00:41:20.039 --> 00:41:23.400
<v Speaker 2>could be a string, number or any object, and mathematically

808
00:41:23.440 --> 00:41:26.800
<v Speaker 2>transforms it into an integer. This integer is then typically

809
00:41:26.920 --> 00:41:29.519
<v Speaker 2>used as an index into a regular array, which we

810
00:41:29.559 --> 00:41:30.559
<v Speaker 2>call the hash table.

811
00:41:30.760 --> 00:41:33.199
<v Speaker 1>So the hash function map's a key directly to an

812
00:41:33.239 --> 00:41:33.760
<v Speaker 1>arrayse loote.

813
00:41:33.840 --> 00:41:36.239
<v Speaker 2>Ideally, yes, the value associated with the key in a

814
00:41:36.280 --> 00:41:38.559
<v Speaker 2>map or just an indicator of presence in a set,

815
00:41:38.800 --> 00:41:41.440
<v Speaker 2>is stored at that calculated array index. If the hash

816
00:41:41.519 --> 00:41:44.840
<v Speaker 2>function is good and distributes keys evenly across the array indicies,

817
00:41:44.920 --> 00:41:48.320
<v Speaker 2>you can achieve incredibly fast access effectively oh one on

818
00:41:48.400 --> 00:41:51.800
<v Speaker 2>average for insertion, dilution, and searching. You just compute the hash,

819
00:41:51.840 --> 00:41:53.960
<v Speaker 2>go straight to the array index and find or place

820
00:41:54.000 --> 00:41:54.360
<v Speaker 2>your data.

821
00:41:54.599 --> 00:41:58.679
<v Speaker 1>That sounds almost magical oh one average time. But there

822
00:41:58.760 --> 00:42:01.679
<v Speaker 1>must be a catch. What happens if two different keys,

823
00:42:02.000 --> 00:42:04.800
<v Speaker 1>after being put through the hash function, end up mapping

824
00:42:04.880 --> 00:42:06.119
<v Speaker 1>to the same array index.

825
00:42:06.320 --> 00:42:09.119
<v Speaker 2>Ah, You've hit the crucial issue. That's called a collision.

826
00:42:09.760 --> 00:42:12.679
<v Speaker 2>Since the hash function maps a potentially huge universe of

827
00:42:12.760 --> 00:42:15.840
<v Speaker 2>keys down to a smaller fixed number of array indices,

828
00:42:16.119 --> 00:42:18.840
<v Speaker 2>collisions are practically bound to occur sometimes.

829
00:42:19.320 --> 00:42:21.599
<v Speaker 1>So a key part of hashing isn't just a hash function,

830
00:42:21.719 --> 00:42:24.119
<v Speaker 1>it's how you handle those collisions exactly.

831
00:42:24.599 --> 00:42:28.960
<v Speaker 2>A robust hashing implementation needs a good collision resolution scheme.

832
00:42:29.760 --> 00:42:32.280
<v Speaker 2>There are two main families of approaches. One is called

833
00:42:32.360 --> 00:42:35.960
<v Speaker 2>chaining or separate chaining. Here, each slot in the hash

834
00:42:36.000 --> 00:42:39.039
<v Speaker 2>table array doesn't hold just one value, but rather a

835
00:42:39.079 --> 00:42:41.800
<v Speaker 2>pointer to a link list or some other collection of

836
00:42:41.840 --> 00:42:44.719
<v Speaker 2>all the key value pairs that hash to that same index.

837
00:42:44.800 --> 00:42:47.119
<v Speaker 1>So collisions just get added to the list at that spot.

838
00:42:47.360 --> 00:42:49.559
<v Speaker 2>Right to find a key, you hash it to get

839
00:42:49.559 --> 00:42:51.679
<v Speaker 2>the index. Then you might have to search through the

840
00:42:51.679 --> 00:42:54.760
<v Speaker 2>short link list at that index. The other main approach

841
00:42:54.800 --> 00:42:58.280
<v Speaker 2>is open addressing. Here all key value pairs are stored

842
00:42:58.280 --> 00:43:01.840
<v Speaker 2>directly within the hash table array. If you hash a

843
00:43:01.920 --> 00:43:04.719
<v Speaker 2>key and find that slot is already occupied by a different.

844
00:43:04.480 --> 00:43:05.320
<v Speaker 1>Key, what do you do?

845
00:43:05.840 --> 00:43:09.519
<v Speaker 2>You use a systematic procedure called probing to find the

846
00:43:09.519 --> 00:43:13.639
<v Speaker 2>next available free slot in the table. Ag Linear probing

847
00:43:13.719 --> 00:43:16.400
<v Speaker 2>just checks the next slot, then the next wrapping around.

848
00:43:16.760 --> 00:43:20.880
<v Speaker 2>Quadratic probing checks slots further away. The key value pair

849
00:43:21.000 --> 00:43:24.360
<v Speaker 2>is stored in the first empty slot found. Searching involves

850
00:43:24.360 --> 00:43:25.760
<v Speaker 2>following the same probe sequence.

851
00:43:25.840 --> 00:43:30.320
<v Speaker 1>Chaining versus open addressing. Complex stuff does how full the

852
00:43:30.400 --> 00:43:30.960
<v Speaker 1>table is.

853
00:43:30.920 --> 00:43:34.679
<v Speaker 2>Matter critically important. The performance of hashing heavily depends on

854
00:43:34.719 --> 00:43:38.239
<v Speaker 2>the load factor, usually denoted by lambda. It's the ratio

855
00:43:38.360 --> 00:43:41.119
<v Speaker 2>of the number of elements stored and divided by the

856
00:43:41.239 --> 00:43:43.880
<v Speaker 2>total size of the hash table array. If the load

857
00:43:43.880 --> 00:43:46.880
<v Speaker 2>factor gets too high at either the table gets too full,

858
00:43:47.159 --> 00:43:52.039
<v Speaker 2>collisions become much more frequent, and the performance starts to degrade.

859
00:43:52.199 --> 00:43:55.360
<v Speaker 2>With chaining, the average search time becomes proportional to the

860
00:43:55.360 --> 00:43:59.880
<v Speaker 2>load factor. With open addressing, performance degrades even more sharply

861
00:44:00.039 --> 00:44:02.719
<v Speaker 2>as the load factor approaches one one hundred percent full,

862
00:44:03.039 --> 00:44:05.760
<v Speaker 2>because finding empty slots becomes very hard.

863
00:44:05.880 --> 00:44:08.320
<v Speaker 1>So you need to keep the hash table reasonably sized,

864
00:44:08.400 --> 00:44:10.079
<v Speaker 1>maybe resize it if it gets too full.

865
00:44:10.199 --> 00:44:15.159
<v Speaker 2>Precisely, good hashtable implementations often automatically resize the underlying array

866
00:44:15.519 --> 00:44:18.760
<v Speaker 2>zamag doubling it, and rehash all these existing elements into

867
00:44:18.760 --> 00:44:21.360
<v Speaker 2>the new larger table when the load factor exceeds a

868
00:44:21.360 --> 00:44:24.199
<v Speaker 2>certain threshold like point seven or point seventy five.

869
00:44:24.320 --> 00:44:25.920
<v Speaker 1>Okay, so what's the bottom line on hashing.

870
00:44:26.239 --> 00:44:29.079
<v Speaker 2>The bottom line is that hashing provides very fast, effectively

871
00:44:29.199 --> 00:44:33.280
<v Speaker 2>constant time average performance oh one for the core operations

872
00:44:33.280 --> 00:44:37.119
<v Speaker 2>of insertion, dilution, and searching or membership testing for sets.

873
00:44:37.519 --> 00:44:40.920
<v Speaker 2>This generally makes hash maps like Ruby's hash class and

874
00:44:41.000 --> 00:44:44.800
<v Speaker 2>hash sets significantly faster on average than their tree based counterparts,

875
00:44:44.960 --> 00:44:47.760
<v Speaker 2>which are o log in provided their backing hash tables

876
00:44:47.760 --> 00:44:50.199
<v Speaker 2>are well managed and don't become excessively.

877
00:44:49.679 --> 00:44:51.239
<v Speaker 1>Full, and Ruby helps here.

878
00:44:51.480 --> 00:44:54.840
<v Speaker 2>Yeah, it's notable that Ruby's base object class includes a

879
00:44:54.920 --> 00:44:59.199
<v Speaker 2>default hash method and an equality method. This means pretty

880
00:44:59.280 --> 00:45:01.199
<v Speaker 2>much any Ruby the object can be used as a

881
00:45:01.280 --> 00:45:03.519
<v Speaker 2>key in a hash table out of the box. And

882
00:45:03.639 --> 00:45:06.960
<v Speaker 2>Ruby's built in hash class is a highly optimized implementation

883
00:45:07.039 --> 00:45:10.000
<v Speaker 2>of a hash map, effectively implementing most of the map

884
00:45:10.039 --> 00:45:13.719
<v Speaker 2>ADT interface very efficiently for you. It's one of Ruby's

885
00:45:13.760 --> 00:45:16.199
<v Speaker 2>most used and powerful built in types.

886
00:45:16.480 --> 00:45:19.559
<v Speaker 1>Great Okay. One last major category of data structures to

887
00:45:19.559 --> 00:45:20.679
<v Speaker 1>cover from our source.

888
00:45:20.880 --> 00:45:25.320
<v Speaker 2>Graphs right graphs. Graphs are incredibly flexible and powerful structures

889
00:45:25.400 --> 00:45:29.400
<v Speaker 2>used to model relationships and connections between entities. Formally, a

890
00:45:29.440 --> 00:45:32.320
<v Speaker 2>graph is just a collection of vertices also called nodes,

891
00:45:32.719 --> 00:45:35.519
<v Speaker 2>and a collection of edges that connect pairs of vertices.

892
00:45:35.639 --> 00:45:38.360
<v Speaker 1>Vertices and edges like cities and the roads connecting them.

893
00:45:38.440 --> 00:45:41.039
<v Speaker 2>That's a perfect analogy. The cities are vertices, the roads

894
00:45:41.119 --> 00:45:44.320
<v Speaker 2>or edges. Edges can be undirected, meaning the connection works

895
00:45:44.360 --> 00:45:46.920
<v Speaker 2>both ways like a typical two way road, or they

896
00:45:46.960 --> 00:45:49.239
<v Speaker 2>can be directed, meaning the connection only goes one way

897
00:45:49.320 --> 00:45:51.920
<v Speaker 2>like a one way street. A graph with directed edges

898
00:45:52.000 --> 00:45:53.559
<v Speaker 2>is often called a digraph.

899
00:45:53.679 --> 00:45:57.440
<v Speaker 1>Okay, undirected or directed. What are some other key graph

900
00:45:57.519 --> 00:45:58.280
<v Speaker 1>terms we should know?

901
00:45:58.519 --> 00:46:03.639
<v Speaker 2>Some important concepts include adjacency. Two vertices are adjacent if

902
00:46:03.679 --> 00:46:07.679
<v Speaker 2>there's an edge directly connecting them. Path a sequence of

903
00:46:07.760 --> 00:46:10.880
<v Speaker 2>vertices where each adjacent pair in the sequence is connected

904
00:46:10.880 --> 00:46:13.320
<v Speaker 2>by an edge, like a route from one city to

905
00:46:13.400 --> 00:46:17.480
<v Speaker 2>another following the roads cycle. A path that starts and

906
00:46:17.559 --> 00:46:20.480
<v Speaker 2>ends at the same vertex and doesn't repeat edges, usually

907
00:46:20.559 --> 00:46:24.920
<v Speaker 2>like a round trim connected graph. For undirected graphs, a

908
00:46:24.960 --> 00:46:27.519
<v Speaker 2>graph is connected if there's a path between every pair

909
00:46:27.599 --> 00:46:30.800
<v Speaker 2>of distinct vertices. You can get from anywhere to anywhere else.

910
00:46:30.840 --> 00:46:33.239
<v Speaker 1>It's spanning tree. You mentioned that earlier, right.

911
00:46:33.519 --> 00:46:36.239
<v Speaker 2>A spanning free of a connected undirected graph is a

912
00:46:36.280 --> 00:46:38.960
<v Speaker 2>subgraph that includes all the vertices of the original graph

913
00:46:39.159 --> 00:46:41.599
<v Speaker 2>and is also a tree, meaning it's connected and has

914
00:46:41.639 --> 00:46:44.679
<v Speaker 2>no cycles. It's like finding a minimal set of edges

915
00:46:44.760 --> 00:46:48.840
<v Speaker 2>needed to keep all the vertices connected. Think network backbones.

916
00:46:48.960 --> 00:46:52.000
<v Speaker 1>Okay, so how do we actually represent these graphs in

917
00:46:52.039 --> 00:46:55.119
<v Speaker 1>a program? How do we store the vertices and edges?

918
00:46:55.440 --> 00:46:58.559
<v Speaker 2>There are two main standard ways to implement graphs. One

919
00:46:58.599 --> 00:47:02.119
<v Speaker 2>is the adjacency matrix. If you have vvertices, you create

920
00:47:02.159 --> 00:47:05.360
<v Speaker 2>a V by V matrix, usually of booleans or integers.

921
00:47:05.800 --> 00:47:08.760
<v Speaker 2>The entry at matrix C is true or one if

922
00:47:08.800 --> 00:47:10.960
<v Speaker 2>there's an edge from vertex E to vertex J, and

923
00:47:11.159 --> 00:47:12.039
<v Speaker 2>false or zero.

924
00:47:12.119 --> 00:47:15.760
<v Speaker 1>Otherwise a big grid showing all possible connections exactly.

925
00:47:16.119 --> 00:47:18.840
<v Speaker 2>This is simple and allows very fast checking if an

926
00:47:18.960 --> 00:47:22.039
<v Speaker 2>edge exists between two specific vertices, but it can use

927
00:47:22.079 --> 00:47:24.719
<v Speaker 2>a lot of memory OV two even if the graph

928
00:47:24.800 --> 00:47:28.719
<v Speaker 2>has very few edges a sparse graph, it's generally better

929
00:47:28.760 --> 00:47:31.519
<v Speaker 2>for dense graphs where many pairs of vertices are connected

930
00:47:31.599 --> 00:47:33.360
<v Speaker 2>man the other way. The other common way is the

931
00:47:33.400 --> 00:47:36.679
<v Speaker 2>adjacency list. Here you typically have an array or list

932
00:47:36.760 --> 00:47:39.960
<v Speaker 2>with one entry for each vertex. The entry for vertex

933
00:47:40.079 --> 00:47:42.480
<v Speaker 2>I contains a link to list or some other collection

934
00:47:42.880 --> 00:47:44.920
<v Speaker 2>of all the vertices J such that there is an

935
00:47:44.960 --> 00:47:45.480
<v Speaker 2>edge from I.

936
00:47:45.639 --> 00:47:49.639
<v Speaker 1>To J, so each vertex just lists its direct neighbors right.

937
00:47:50.039 --> 00:47:52.880
<v Speaker 2>This is usually much more space efficient for sparse graphs,

938
00:47:53.039 --> 00:47:56.960
<v Speaker 2>using memory proportional to V plus e vertices plus edges.

939
00:47:57.480 --> 00:48:00.079
<v Speaker 2>Checking for a specific edge might be slightly slow, or

940
00:48:00.239 --> 00:48:02.519
<v Speaker 2>you might have to scan a list, but iterating over

941
00:48:02.559 --> 00:48:05.840
<v Speaker 2>all neighbors of a vertex is very efficient. This is

942
00:48:05.880 --> 00:48:09.000
<v Speaker 2>often the preferred representation for sparse graphs, which are common

943
00:48:09.039 --> 00:48:09.800
<v Speaker 2>in practice.

944
00:48:10.079 --> 00:48:13.400
<v Speaker 1>Matrix versus list depends on density. Makes sense.

945
00:48:13.679 --> 00:48:16.400
<v Speaker 2>So once we have a graph represented, we need algorithms

946
00:48:16.440 --> 00:48:19.199
<v Speaker 2>to do things with it, like explore it or find paths.

947
00:48:19.599 --> 00:48:22.000
<v Speaker 2>What are the main ways to search or traverse a graph?

948
00:48:22.559 --> 00:48:26.320
<v Speaker 2>The two fundamental graph traversal algorithms are depth first search

949
00:48:26.679 --> 00:48:28.679
<v Speaker 2>DFS and breadth first search.

950
00:48:28.679 --> 00:48:31.599
<v Speaker 1>BFFs dept first search going deep exactly.

951
00:48:31.920 --> 00:48:35.000
<v Speaker 2>DFS starts at a chosen vertex. It explores as far

952
00:48:35.039 --> 00:48:37.599
<v Speaker 2>as possible along one path before it has to backtrack.

953
00:48:37.960 --> 00:48:41.480
<v Speaker 2>It picks an unvisited neighbor, visits it, and immediately recurses

954
00:48:41.480 --> 00:48:43.920
<v Speaker 2>from there. If it hits a dead end a node

955
00:48:43.920 --> 00:48:47.199
<v Speaker 2>with no unvisited neighbors, it backtracks to the previous node

956
00:48:47.199 --> 00:48:48.800
<v Speaker 2>and tries a different unvisited neighbor.

957
00:48:48.960 --> 00:48:51.880
<v Speaker 1>Like exploring a maze by always taking the first available

958
00:48:51.920 --> 00:48:54.079
<v Speaker 1>path until you hit a wall, then backing up.

959
00:48:54.280 --> 00:48:57.559
<v Speaker 2>That's a great analogy. DFS is conceptually similar to a

960
00:48:57.599 --> 00:49:01.239
<v Speaker 2>pre ordered traversal of a tree and be implemented elegantly

961
00:49:01.360 --> 00:49:04.760
<v Speaker 2>using recursion or inneratively using an explicit stack to keep

962
00:49:04.800 --> 00:49:07.679
<v Speaker 2>track of the path being explored. Its worst case time

963
00:49:07.719 --> 00:49:11.239
<v Speaker 2>complexity is ov plus e proportional to the number of

964
00:49:11.320 --> 00:49:14.840
<v Speaker 2>vertices plus the number of edges, because in the worst case,

965
00:49:14.880 --> 00:49:17.800
<v Speaker 2>it visits every vertex and traverses every edge.

966
00:49:17.599 --> 00:49:21.280
<v Speaker 1>Once Okay and breadth first search going wide.

967
00:49:20.920 --> 00:49:24.840
<v Speaker 2>Precisely, BFS starts at a chosen vertex and explores the

968
00:49:24.840 --> 00:49:28.199
<v Speaker 2>graph layer by layer. First, it visits the starting vertex,

969
00:49:28.519 --> 00:49:31.239
<v Speaker 2>Then it visits all vertices that are directly adjacent to

970
00:49:31.280 --> 00:49:34.000
<v Speaker 2>the start vertex distance one. Then it visits all their

971
00:49:34.079 --> 00:49:37.679
<v Speaker 2>unvisited neighbors distance two, and so on. It visits vertices

972
00:49:37.679 --> 00:49:40.480
<v Speaker 2>in increasing order of their distance from the starting vertex.

973
00:49:40.760 --> 00:49:44.159
<v Speaker 2>It explores closest vertices first before moving further out.

974
00:49:44.079 --> 00:49:46.440
<v Speaker 1>Like ripples spreading out from a stone dropped in water.

975
00:49:46.639 --> 00:49:50.280
<v Speaker 2>Another excellent analogy. BFS is typically implemented using a Q

976
00:49:50.519 --> 00:49:53.519
<v Speaker 2>data structure to keep track of the vertices to visit. Next,

977
00:49:53.960 --> 00:49:56.280
<v Speaker 2>you in queue the start node. Then, while the queue

978
00:49:56.280 --> 00:49:58.679
<v Speaker 2>isn't empty u d q a node, visit it and

979
00:49:58.719 --> 00:50:02.639
<v Speaker 2>in queue all its unvisited neighbors. Like DFS, its worst

980
00:50:02.679 --> 00:50:05.320
<v Speaker 2>case time complexity is also ov plus e.

981
00:50:05.519 --> 00:50:09.920
<v Speaker 1>DFS uses a stack implicitly or explicitly, BFS uses a

982
00:50:10.000 --> 00:50:13.639
<v Speaker 1>que both OV plus E and these searches aren't just

983
00:50:13.719 --> 00:50:16.719
<v Speaker 1>theoretical exercises, right, They have practical applications.

984
00:50:16.840 --> 00:50:20.039
<v Speaker 2>Oh, absolutely, They form the basis for solving many important

985
00:50:20.079 --> 00:50:23.920
<v Speaker 2>graph problems. For example, both dfs and BFS can be

986
00:50:24.039 --> 00:50:26.639
<v Speaker 2>used to determine if a path exists between two given

987
00:50:26.719 --> 00:50:29.599
<v Speaker 2>vertices and a graph. They can find all reachable vertices

988
00:50:29.599 --> 00:50:30.519
<v Speaker 2>from a starting point.

989
00:50:30.679 --> 00:50:31.039
<v Speaker 1>What else.

990
00:50:31.119 --> 00:50:34.559
<v Speaker 2>BFS has a particularly important property when used on an

991
00:50:34.599 --> 00:50:36.960
<v Speaker 2>unweighted graph, where all edges have the same cost or

992
00:50:37.079 --> 00:50:40.280
<v Speaker 2>length typically one, it finds the shortest path in terms

993
00:50:40.280 --> 00:50:42.679
<v Speaker 2>of the number of edges between the starting vertex and

994
00:50:42.800 --> 00:50:46.239
<v Speaker 2>all other reachable vertices. That's incredibly useful, finding the shortest

995
00:50:46.320 --> 00:50:49.519
<v Speaker 2>route yep. And both dfs and BFS can be easily

996
00:50:49.519 --> 00:50:52.159
<v Speaker 2>adapted to generate a spanning tree for a connected graph

997
00:50:52.519 --> 00:50:54.679
<v Speaker 2>simply by keeping track of the edges that lead to

998
00:50:54.679 --> 00:50:57.519
<v Speaker 2>the discovery of each new vertex during the traversal.

999
00:50:57.800 --> 00:51:03.519
<v Speaker 1>Wow. Okay, from the fundamental building blocks of ADTs, through lists, stacks,

1000
00:51:03.599 --> 00:51:06.679
<v Speaker 1>queues all the way to the intricate dance of balanced

1001
00:51:06.679 --> 00:51:10.320
<v Speaker 1>trees and these powerful graph algorithms, we've really taken a

1002
00:51:10.320 --> 00:51:11.840
<v Speaker 1>deep dive today, we certainly have.

1003
00:51:12.039 --> 00:51:15.880
<v Speaker 2>We've hopefully illuminated how data structures are these crucial arrangements

1004
00:51:15.880 --> 00:51:18.800
<v Speaker 2>of data and memory, how algorithms are the procedures that

1005
00:51:18.840 --> 00:51:22.440
<v Speaker 2>manipulate them, and how abstract data types provide that essential

1006
00:51:22.480 --> 00:51:26.400
<v Speaker 2>conceptual blueprint separating the what from the how. Yeah, we've

1007
00:51:26.440 --> 00:51:29.679
<v Speaker 2>also explored the particular strengths and maybe some surprising quirks

1008
00:51:30.000 --> 00:51:33.599
<v Speaker 2>of Rubies built in types, especially it's very flexible dynamic

1009
00:51:33.639 --> 00:51:36.239
<v Speaker 2>arrays and the implications of its duct typing philosophy.

1010
00:51:36.559 --> 00:51:39.880
<v Speaker 1>And we definitely unpacked the whole world of containers, from

1011
00:51:39.880 --> 00:51:43.400
<v Speaker 1>those simple linear stacks and cues to the incredibly versatile lists,

1012
00:51:43.679 --> 00:51:46.559
<v Speaker 1>and then onto the more complex but powerful trees, sets,

1013
00:51:46.719 --> 00:51:50.320
<v Speaker 1>maps and finely graphs. Perhaps most importantly, we've got to

1014
00:51:50.360 --> 00:51:54.360
<v Speaker 1>handle on analyzing algorithms, understanding that big O notation that

1015
00:51:54.400 --> 00:51:57.280
<v Speaker 1>tells us why some solutions are lightning fast for large

1016
00:51:57.280 --> 00:52:00.519
<v Speaker 1>inputs or others just grind to a halt exactly.

1017
00:52:00.559 --> 00:52:03.599
<v Speaker 2>And understanding these core concepts isn't just you know, academic

1018
00:52:03.639 --> 00:52:06.760
<v Speaker 2>knowledge for computer science exams. It really empowers you, as

1019
00:52:06.760 --> 00:52:10.920
<v Speaker 2>a developer or anyone interacting with technology to make informed decisions.

1020
00:52:11.400 --> 00:52:15.599
<v Speaker 2>Knowing the trade offs contiguous versus linked implementations, recursive versus

1021
00:52:15.639 --> 00:52:20.039
<v Speaker 2>iterative approaches. How Ruby's specific design choices influence performance. That's

1022
00:52:20.119 --> 00:52:23.599
<v Speaker 2>key to designing efficient, robust software and understanding why things

1023
00:52:23.639 --> 00:52:24.440
<v Speaker 2>work the way they do.

1024
00:52:24.840 --> 00:52:27.519
<v Speaker 1>So for you the listener, maybe think about your own

1025
00:52:27.599 --> 00:52:31.920
<v Speaker 1>daily digital interactions, every single time you search for something online,

1026
00:52:32.039 --> 00:52:34.840
<v Speaker 1>or send a message, or even just load a web page. Yeah,

1027
00:52:35.000 --> 00:52:38.360
<v Speaker 1>these very data structures and algorithms we've been discussing are

1028
00:52:38.360 --> 00:52:43.360
<v Speaker 1>working tirelessly, constantly behind the scenes, shaping that entire experience.

1029
00:52:43.719 --> 00:52:46.320
<v Speaker 2>Yeah, it's everywhere. And this kind of brings us to

1030
00:52:46.480 --> 00:52:50.000
<v Speaker 2>a compelling final thought, maybe something to chew on. If

1031
00:52:50.000 --> 00:52:53.320
<v Speaker 2>we as a field are continuously striving for better ways

1032
00:52:53.360 --> 00:52:57.400
<v Speaker 2>to organize and process information, constantly improving on basic structures

1033
00:52:57.440 --> 00:53:01.840
<v Speaker 2>like bsts by inventing things like trees and two three trees,

1034
00:53:02.000 --> 00:53:06.280
<v Speaker 2>or optimizing sorting algorithms with clever techniques like introspective sort,

1035
00:53:06.800 --> 00:53:10.119
<v Speaker 2>What surprising new applications, or perhaps entirely new kinds of

1036
00:53:10.119 --> 00:53:14.599
<v Speaker 2>efficiency breakthroughs might emerge next? What problems might become solvable

1037
00:53:14.599 --> 00:53:17.239
<v Speaker 2>that we haven't even properly conceived of yet, just through

1038
00:53:17.239 --> 00:53:18.840
<v Speaker 2>better data structures and algorithms.

1039
00:53:19.119 --> 00:53:22.679
<v Speaker 1>That's a fascinating question to ponder, the relentless drive for

1040
00:53:22.760 --> 00:53:26.920
<v Speaker 1>efficiency in organizing information. We really hope this deep dive

1041
00:53:26.960 --> 00:53:29.159
<v Speaker 1>has given you a whole new perspective, maybe, on the

1042
00:53:29.199 --> 00:53:32.360
<v Speaker 1>often invisible architecture of the digital world all around us.

1043
00:53:33.039 --> 00:53:35.199
<v Speaker 1>Join us next time for another deep dive right here

1044
00:53:35.360 --> 00:53:36.119
<v Speaker 1>on the Deep Dive
