WEBVTT

1
00:00:00.160 --> 00:00:03.960
<v Speaker 1>Welcome to the deep dive. Today, we're tackling something fundamental

2
00:00:04.240 --> 00:00:07.799
<v Speaker 1>yet often unseen in the world of coding: the compiler.

3
00:00:08.320 --> 00:00:12.359
<v Speaker 1>Think of it as the architect behind the scenes, taking

4
00:00:12.400 --> 00:00:15.919
<v Speaker 1>your human-readable instructions and blueprinting them into the precise

5
00:00:16.039 --> 00:00:18.679
<v Speaker 1>machine language your computer understands. Precisely.

6
00:00:19.039 --> 00:00:21.199
<v Speaker 2>And for this deep dive, we're not just talking about

7
00:00:21.199 --> 00:00:24.760
<v Speaker 2>compilers conceptually. We're getting into the initial stages of building

8
00:00:24.760 --> 00:00:28.359
<v Speaker 2>our own. Our mission is to really dissect the core

9
00:00:28.440 --> 00:00:32.560
<v Speaker 2>transformations required to compile even the most, well, the simplest

10
00:00:32.600 --> 00:00:33.960
<v Speaker 2>C programs.

11
00:00:33.679 --> 00:00:37.039
<v Speaker 1>And to guide this architectural exploration, we'll be using excerpts

12
00:00:37.039 --> 00:00:40.439
<v Speaker 1>from Writing a C Compiler as our foundational text. It's

13
00:00:40.479 --> 00:00:43.240
<v Speaker 1>our guide as we uncover how source code undergoes its

14
00:00:43.280 --> 00:00:47.600
<v Speaker 1>initial metamorphosis into executable form. Okay, let's unpack this. Let's

15
00:00:47.640 --> 00:00:51.119
<v Speaker 1>do it. Our compiler construction starts with establishing a modular

16
00:00:51.159 --> 00:00:53.039
<v Speaker 1>pipeline of four key stages.

17
00:00:53.399 --> 00:00:55.920
<v Speaker 2>That's right. Yeah, even for the simplest stuff, like maybe

18
00:00:55.960 --> 00:00:59.840
<v Speaker 2>not even Hello World, maybe just returning a number, we'll

19
00:00:59.840 --> 00:01:02.000
<v Speaker 2>be setting up a four-pass structure. The first of

20
00:01:01.960 --> 00:01:04.760
<v Speaker 1>these is the lexer, or tokenizer.

21
00:01:04.280 --> 00:01:05.920
<v Speaker 2>Right, or tokenizer. Yeah, same thing.

22
00:01:06.200 --> 00:01:09.159
<v Speaker 1>So the lexer's fundamental task is to scan our C

23
00:01:09.319 --> 00:01:12.159
<v Speaker 1>code and break it down into its essential components. Right,

24
00:01:12.239 --> 00:01:15.400
<v Speaker 1>like identifying the individual building blocks before you start assembling

25
00:01:15.400 --> 00:01:16.120
<v Speaker 1>them. Exactly.

26
00:01:16.480 --> 00:01:23.280
<v Speaker 2>These smallest meaningful units are the tokens. Think about curly braces,

27
00:01:23.599 --> 00:01:27.439
<v Speaker 2>defining scope, the semicolon ending statements, keywords like

28
00:01:27.519 --> 00:01:31.319
<v Speaker 2>int, right, int, return, and then the identifiers you create

29
00:01:31.359 --> 00:01:34.280
<v Speaker 2>for functions, variables. All of these are distinct tokens. The

30
00:01:34.400 --> 00:01:37.760
<v Speaker 2>lexer reads your code character by character and groups them intelligently.

31
00:01:37.920 --> 00:01:40.680
<v Speaker 1>I see. So a basic line like int main, return

32
00:01:40.760 --> 00:01:44.599
<v Speaker 1>32 would be segmented into tokens: int, main, return,

33
00:01:44.680 --> 00:01:46.079
<v Speaker 1>32, et cetera.

34
00:01:45.959 --> 00:01:47.439
<v Speaker 2>Exactly that sequence.

35
00:01:47.519 --> 00:01:51.439
<v Speaker 1>Okay, what's the next step after this initial segmentation?

36
00:01:51.159 --> 00:01:55.280
<v Speaker 2>Following tokenization comes the parser. It takes that linear sequence

37
00:01:55.280 --> 00:01:56.439
<v Speaker 2>of tokens.

38
00:01:55.959 --> 00:01:57.879
<v Speaker 1>Just a list, basically. Just a list.

39
00:01:57.879 --> 00:02:01.239
<v Speaker 2>And imposes a hierarchical structure. It constructs what's known as

40
00:02:01.239 --> 00:02:04.359
<v Speaker 2>an abstract syntax tree, or an AST.

41
00:02:04.920 --> 00:02:06.840
<v Speaker 1>Okay, so instead of just a flat list, we get

42
00:02:06.840 --> 00:02:09.879
<v Speaker 1>a structured representation that shows how these tokens relate to

43
00:02:09.919 --> 00:02:10.319
<v Speaker 1>each other.

44
00:02:10.599 --> 00:02:13.919
<v Speaker 2>Precisely. Think of it as moving from, say, a list

45
00:02:13.960 --> 00:02:15.919
<v Speaker 2>of ingredients to an actual recipe.

46
00:02:15.960 --> 00:02:17.319
<v Speaker 1>Oh, nice analogy.

47
00:02:17.599 --> 00:02:21.000
<v Speaker 2>The AST is this tree-like structure that embodies the

48
00:02:21.039 --> 00:02:25.199
<v Speaker 2>grammatical rules of C and reveals the program's operational flow.

49
00:02:25.879 --> 00:02:29.199
<v Speaker 2>It's a format that lets the compiler analyze and, well,

50
00:02:29.360 --> 00:02:33.199
<v Speaker 2>understand the code's intent much better than just that stream of

51
00:02:33.240 --> 00:02:37.039
<v Speaker 1>tokens. Right, because just having a list of words doesn't

52
00:02:37.080 --> 00:02:39.680
<v Speaker 1>tell you the sentence structure or the meaning. Exactly. So,

53
00:02:39.719 --> 00:02:44.000
<v Speaker 1>for our example, int main, return 32, what would

54
00:02:44.039 --> 00:02:46.400
<v Speaker 1>the AST look like in its simplest form?

55
00:02:46.719 --> 00:02:50.759
<v Speaker 2>In a simplified view, the root might be a program node. Okay.

56
00:02:50.919 --> 00:02:53.199
<v Speaker 2>Descending from that, you'd have a function node for main.

57
00:02:53.520 --> 00:02:56.360
<v Speaker 2>Inside the function there'd be a return node, and finally,

58
00:02:56.680 --> 00:03:00.039
<v Speaker 2>connected to return, a constant node holding the value.

59
00:02:59.800 --> 00:03:04.439
<v Speaker 1>there. So it visually organizes the program's components and their relationships.

60
00:03:04.560 --> 00:03:07.039
<v Speaker 1>Function contains a return statement, which returns a

61
00:03:07.000 --> 00:03:09.240
<v Speaker 2>constant. Exactly, it captures that structure.

62
00:03:09.360 --> 00:03:12.960
<v Speaker 1>Interesting. Now the compiler has this structured tree. What happens

63
00:03:12.960 --> 00:03:14.120
<v Speaker 1>next in the transformation?

64
00:03:14.479 --> 00:03:17.120
<v Speaker 2>This is where the translation to a lower level language begins.

65
00:03:17.759 --> 00:03:21.800
<v Speaker 2>The code generation pass takes the AST, the tree we

66
00:03:21.919 --> 00:03:25.280
<v Speaker 2>just built, yes, that tree, and translates it into assembly

67
00:03:25.319 --> 00:03:26.680
<v Speaker 2>language instructions. Assembly.

68
00:03:26.759 --> 00:03:29.080
<v Speaker 1>Okay, that's much closer to the hardware, right? Exactly.

69
00:03:29.280 --> 00:03:32.840
<v Speaker 2>And it's important to understand at this stage the compiler

70
00:03:33.039 --> 00:03:37.759
<v Speaker 2>isn't directly writing out, like, a human-readable assembly text file.

71
00:03:38.039 --> 00:03:42.840
<v Speaker 2>Oh, okay. It's still manipulating data structures internally. It's creating

72
00:03:42.879 --> 00:03:46.120
<v Speaker 2>an in-memory representation of these assembly instructions.

73
00:03:46.199 --> 00:03:48.800
<v Speaker 1>So the creation of the actual .s file comes later?

74
00:03:49.120 --> 00:03:53.199
<v Speaker 2>Yes, that's the job of the fourth initial pass, code emission. Right.

75
00:03:53.360 --> 00:03:57.400
<v Speaker 2>This pass takes the in-memory assembly representation that was just

76
00:03:57.400 --> 00:04:02.039
<v Speaker 2>created and finally writes it out, serializes it into a file,

77
00:04:02.400 --> 00:04:04.240
<v Speaker 2>usually with the .s extension.

78
00:04:03.879 --> 00:04:06.000
<v Speaker 1>And that .s file is what the assembler and linker

79
00:04:06.439 --> 00:04:07.120
<v Speaker 1>use later on.

80
00:04:07.240 --> 00:04:09.240
<v Speaker 2>That's the one. They take that and produce your final

81
00:04:09.319 --> 00:04:10.400
<v Speaker 2>executable program.

82
00:04:10.520 --> 00:04:13.360
<v Speaker 1>It seems like a considerable number of steps for such

83
00:04:13.360 --> 00:04:15.879
<v Speaker 1>a basic program, you know, return 32. Why not

84
00:04:15.919 --> 00:04:17.480
<v Speaker 1>a more direct approach?

85
00:04:17.240 --> 00:04:19.759
<v Speaker 2>That's a really fair question. Again, it might seem like

86
00:04:19.800 --> 00:04:24.519
<v Speaker 2>overkill for these tiny examples, but establishing this multipass architecture

87
00:04:24.560 --> 00:04:28.199
<v Speaker 2>right from the start gives us huge advantages later. By

88
00:04:28.240 --> 00:04:33.000
<v Speaker 2>separating concerns, lexical analysis here, syntax there, codegen over here.

89
00:04:33.360 --> 00:04:39.160
<v Speaker 2>Modularity. Exactly, modularity. We create a more maintainable system. Imagine

90
00:04:39.199 --> 00:04:42.160
<v Speaker 2>trying to translate, I don't know, a complex novel straight

91
00:04:42.199 --> 00:04:45.600
<v Speaker 2>into another language without first understanding the grammar and structure.

92
00:04:45.720 --> 00:04:46.680
<v Speaker 1>Yeah, that would be a mess.

93
00:04:46.680 --> 00:04:49.839
<v Speaker 2>It would be incredibly hard. This separation lets us handle

94
00:04:49.839 --> 00:04:52.160
<v Speaker 2>more complexity later without having to rip everything up and

95
00:04:52.160 --> 00:04:55.680
<v Speaker 2>start again. It's really about building a scalable foundation.

96
00:04:55.600 --> 00:04:58.279
<v Speaker 1>That makes a lot of sense, planning for future complexity. Okay,

97
00:04:58.319 --> 00:05:01.160
<v Speaker 1>speaking of assembly language, can we actually peek under the

98
00:05:01.160 --> 00:05:04.639
<v Speaker 1>hood, see what a real compiler like GCC generates for

99
00:05:04.800 --> 00:05:06.199
<v Speaker 1>a simple C program?

100
00:05:06.240 --> 00:05:09.160
<v Speaker 2>That's an excellent idea. It helps ground this discussion. For

101
00:05:09.199 --> 00:05:12.399
<v Speaker 2>a simple program like int main, return 2, saved as,

102
00:05:12.439 --> 00:05:14.959
<v Speaker 2>say, return_2.c, you can use GCC with

103
00:05:15.000 --> 00:05:18.279
<v Speaker 2>some specific command-line flags: gcc -S -O

104
00:05:18.360 --> 00:05:22.600
<v Speaker 2>-fno-asynchronous-unwind-tables -fcf-protection=none

105
00:05:22.639 --> 00:05:23.839
<v Speaker 2>return_2.c. Whoa.

106
00:05:24.040 --> 00:05:26.480
<v Speaker 1>Okay, lots of flags there, but the key is

107
00:05:26.639 --> 00:05:27.120
<v Speaker 1>-S?

108
00:05:27.160 --> 00:05:29.040
<v Speaker 2>This is the main one, telling it to stop after

109
00:05:29.079 --> 00:05:32.480
<v Speaker 2>compilation and output assembly. The others just simplify the output

110
00:05:32.519 --> 00:05:33.399
<v Speaker 2>a bit for our purposes.

111
00:05:33.439 --> 00:05:35.199
<v Speaker 1>Got it. And what's the output look like?

112
00:05:35.360 --> 00:05:38.519
<v Speaker 2>This command generates a file, probably return_2.s, with

113
00:05:38.600 --> 00:05:41.720
<v Speaker 2>the assembly for that C program. You'll likely see something

114
00:05:41.759 --> 00:05:45.879
<v Speaker 2>really simple, like: .globl main, main:, movl $2,

115
00:05:45.920 --> 00:05:48.480
<v Speaker 2>%eax, ret.

116
00:05:48.560 --> 00:05:50.839
<v Speaker 1>Okay, that's definitely not C. What's going on

117
00:05:50.800 --> 00:05:53.079
<v Speaker 2>here? Let's break it down. The syntax is AT&T

118
00:05:53.120 --> 00:05:56.439
<v Speaker 2>assembly syntax, common on Linux and macOS. That

119
00:05:56.560 --> 00:05:59.680
<v Speaker 2>first line, .globl main, that starts with a period, right?

120
00:06:00.079 --> 00:06:02.560
<v Speaker 2>That means it's an assembler directive. It's an instruction for

121
00:06:02.600 --> 00:06:06.000
<v Speaker 2>the assembler itself, not the CPU. .globl main just

122
00:06:06.000 --> 00:06:08.040
<v Speaker 2>makes the main symbol visible outside this

123
00:06:08.040 --> 00:06:10.600
<v Speaker 1>file. A symbol, like a label or a name for

124
00:06:10.639 --> 00:06:11.560
<v Speaker 1>a place in memory?

125
00:06:11.639 --> 00:06:14.759
<v Speaker 2>Precisely. Here, main is a symbol representing the starting address

126
00:06:14.759 --> 00:06:17.199
<v Speaker 2>of our main function. The compiler doesn't know the final

127
00:06:17.199 --> 00:06:19.680
<v Speaker 2>address yet. The linker figures that out later. Ah.

128
00:06:19.759 --> 00:06:22.959
<v Speaker 1>The linker's job, resolving symbols. Exactly.

129
00:06:23.079 --> 00:06:27.360
<v Speaker 2>It resolves them, assigns actual memory locations. If code refers

130
00:06:27.399 --> 00:06:30.439
<v Speaker 2>to main, the linker patches in the real address. That's

131
00:06:30.519 --> 00:06:31.399
<v Speaker 2>called relocation.

132
00:06:31.560 --> 00:06:33.639
<v Speaker 1>So main: on the next line is just marking the

133
00:06:33.680 --> 00:06:35.600
<v Speaker 1>spot, the start of the code. Exactly.

134
00:06:35.680 --> 00:06:39.560
<v Speaker 2>It's the label. Then movl $2, %eax. That's

135
00:06:39.560 --> 00:06:40.319
<v Speaker 2>a real instruction.

136
00:06:40.439 --> 00:06:43.199
<v Speaker 1>movl, move long, 32-bit?

137
00:06:43.279 --> 00:06:47.360
<v Speaker 2>Yep, 32-bit integer. $2 means the literal value

138
00:06:47.040 --> 00:06:49.720
<v Speaker 1>2, an immediate value. Right, and

139
00:06:49.720 --> 00:06:52.639
<v Speaker 2>%eax is a register, a small, fast storage spot inside

140
00:06:52.680 --> 00:06:55.879
<v Speaker 2>the CPU. So this instruction puts the value 2 into

141
00:06:55.920 --> 00:06:57.160
<v Speaker 2>the %eax register.

142
00:06:57.279 --> 00:07:00.800
<v Speaker 1>Okay, but why %eax specifically? Convention.

143
00:07:01.199 --> 00:07:04.399
<v Speaker 2>In many standard ways functions call each other, calling conventions,

144
00:07:04.439 --> 00:07:07.439
<v Speaker 2>the %eax register is designated to hold the function's

145
00:07:07.439 --> 00:07:08.079
<v Speaker 2>return value.

146
00:07:08.160 --> 00:07:11.079
<v Speaker 1>Oh, okay. So because our C code returns 2, we put

147
00:07:10.879 --> 00:07:13.160
<v Speaker 2>2 in %eax so whoever called main can find

148
00:07:13.160 --> 00:07:13.879
<v Speaker 2>the result

149
00:07:13.600 --> 00:07:16.480
<v Speaker 1>there. Makes sense. And the last line, ret?

150
00:07:16.360 --> 00:07:18.279
<v Speaker 2>Just means return. It tells the CPU to go back to

151
00:07:18.319 --> 00:07:20.600
<v Speaker 2>where main was called from. So yeah, those four lines

152
00:07:20.639 --> 00:07:23.399
<v Speaker 2>are the complete assembly for a tiny C program.
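
NOTE
A minimal Python sketch of the last two passes as described in this episode, assuming
Linux-style symbol names (macOS would prefix the symbol as _main). The names Mov, Ret,
AsmFunction, codegen_return_constant and emit are illustrative assumptions, not the guide's API.
```python
from dataclasses import dataclass
# In-memory assembly nodes (hypothetical names chosen for this sketch).
@dataclass
class Mov:
    src: str
    dst: str
@dataclass
class Ret:
    pass
@dataclass
class AsmFunction:
    name: str
    instructions: list
def codegen_return_constant(name: str, value: int) -> AsmFunction:
    # Code generation: place the constant in %eax, the return-value register, then return.
    return AsmFunction(name, [Mov(f"${value}", "%eax"), Ret()])
def emit(fn: AsmFunction) -> str:
    # Code emission: serialize the in-memory instructions as AT&T-syntax text.
    lines = [f"    .globl {fn.name}", f"{fn.name}:"]
    for ins in fn.instructions:
        if isinstance(ins, Mov):
            lines.append(f"    movl {ins.src}, {ins.dst}")
        elif isinstance(ins, Ret):
            lines.append("    ret")
    return "\n".join(lines) + "\n"
asm_text = emit(codegen_return_constant("main", 2))
print(asm_text, end="")
```
Keeping Mov and Ret as data until emit runs is what lets a later pass rewrite
instructions before any text is written to the .s file.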

153
00:07:23.480 --> 00:07:24.879
<v Speaker 1>That's surprisingly direct.

154
00:07:25.319 --> 00:07:25.639
<v Speaker 2>Cool.

155
00:07:26.120 --> 00:07:28.120
<v Speaker 1>So when we compile a C program, even with our

156
00:07:28.120 --> 00:07:31.600
<v Speaker 1>own simple compiler, what's the typical sequence of operations overall?

157
00:07:31.879 --> 00:07:34.439
<v Speaker 2>Right. So while our first compiler focuses mainly on that

158
00:07:34.680 --> 00:07:36.199
<v Speaker 2>compilation to assembly.

159
00:07:35.839 --> 00:07:38.879
<v Speaker 1>step, step two in the usual process. Yeah, the standard

160
00:07:38.480 --> 00:07:42.519
<v Speaker 2>C process has a few phases. First, there's preprocessing.

161
00:07:42.079 --> 00:07:44.680
<v Speaker 1>Handling #include and macros and stuff.

162
00:07:44.720 --> 00:07:48.160
<v Speaker 2>Exactly. Commands like gcc -E do this. It often outputs a

163
00:07:48.199 --> 00:07:48.800
<v Speaker 2>.i file.

164
00:07:49.040 --> 00:07:53.040
<v Speaker 1>Then comes compilation proper, our focus, generating the .s assembly

165
00:07:53.079 --> 00:07:53.800
<v Speaker 1>file. Correct.

166
00:07:53.879 --> 00:07:56.240
<v Speaker 2>Then in a full setup you have assembly and linking,

167
00:07:56.959 --> 00:07:59.600
<v Speaker 2>usually just gcc ASSEMBLY_FILE -o OUTPUT_FILE.

168
00:08:00.279 --> 00:08:04.000
<v Speaker 1>Takes the .s file, makes machine code, links libraries.

169
00:08:03.560 --> 00:08:06.439
<v Speaker 2>And gives you the final executable. Right. Our initial compiler

170
00:08:06.480 --> 00:08:08.560
<v Speaker 2>will sort of stub out that last step, relying on

171
00:08:08.600 --> 00:08:10.560
<v Speaker 2>the system's assembler and linker. Gotcha.

172
00:08:10.879 --> 00:08:13.160
<v Speaker 1>And for our own compiler driver, the command line tool

173
00:08:13.199 --> 00:08:14.439
<v Speaker 1>we're building, how should

174
00:08:14.199 --> 00:08:17.160
<v Speaker 2>that behave? Good question. It should take the path to

175
00:08:17.199 --> 00:08:19.800
<v Speaker 2>a C source file, like your_compiler /path/to/program.c.

176
00:08:19.920 --> 00:08:23.040
<v Speaker 2>If it works, success, it should create an

177
00:08:23.040 --> 00:08:26.199
<v Speaker 2>executable in the same directory, same name, but no .c.

178
00:08:26.480 --> 00:08:29.439
<v Speaker 2>So /path/to/program, and exit with code zero. And

179
00:08:29.480 --> 00:08:32.720
<v Speaker 2>if it fails, nonzero exit code and, crucially, no

180
00:08:32.799 --> 00:08:36.519
<v Speaker 2>output files, no executable, clean failure.
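
NOTE
A rough Python sketch of the driver contract just described: derive the executable path
by dropping .c, return 0 on success, and on failure return nonzero and leave no output
behind. The names output_path and drive, and the compile_fn callback that stands in for
the whole pipeline, are hypothetical helpers for illustration only.
```python
import os
import sys
def output_path(source: str) -> str:
    # /path/to/program.c becomes /path/to/program (same directory, no .c).
    root, ext = os.path.splitext(source)
    if ext != ".c":
        raise ValueError(f"expected a .c file, got {source!r}")
    return root
def drive(source: str, compile_fn) -> int:
    # compile_fn(source, exe) is assumed to write the executable or raise on any error.
    try:
        exe = output_path(source)
    except ValueError as err:
        print(f"error: {err}", file=sys.stderr)
        return 1
    try:
        compile_fn(source, exe)
    except Exception as err:
        print(f"error: {err}", file=sys.stderr)
        # Clean failure: remove any partial output before reporting the error.
        if os.path.exists(exe):
            os.remove(exe)
        return 1
    return 0
```
A real driver would pass the value returned by drive to sys.exit so that the test
harness can read the exit code.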

181
00:08:36.360 --> 00:08:39.559
<v Speaker 1>Clear rules. And I saw mentions of --lex and --parse

182
00:08:39.559 --> 00:08:40.559
<v Speaker 1>options in the notes?

183
00:08:40.759 --> 00:08:43.120
<v Speaker 2>Uh, yeah, those are mostly for testing and debugging our

184
00:08:43.120 --> 00:08:46.240
<v Speaker 2>compiler. --lex means just run the lexer

185
00:08:45.919 --> 00:08:48.279
<v Speaker 1>and stop. Check tokenizing. Right, and

186
00:08:48.399 --> 00:08:52.399
<v Speaker 2>--parse runs the lexer and parser, builds the AST, then stops.

187
00:08:52.840 --> 00:08:55.039
<v Speaker 2>Neither should create any output files. They just check those

188
00:08:55.039 --> 00:08:55.960
<v Speaker 2>stages internally.

189
00:08:56.039 --> 00:08:59.039
<v Speaker 1>Okay, that makes sense for development. All right, we've got

190
00:08:59.039 --> 00:09:01.840
<v Speaker 1>a solid high-level picture: the four passes, the standard

191
00:09:01.840 --> 00:09:06.240
<v Speaker 1>GCC flow. Let's dive deeper into the lexer and parser

192
00:09:06.279 --> 00:09:08.720
<v Speaker 1>now. They're the first big hurdle in building our own,

193
00:09:08.799 --> 00:09:09.600
<v Speaker 1>right? Absolutely.

194
00:09:09.919 --> 00:09:13.679
<v Speaker 2>Chapter two of the guide digs into building these starting

195
00:09:13.679 --> 00:09:16.799
<v Speaker 2>with the lexer. As we said, its job is finding tokens,

196
00:09:17.480 --> 00:09:19.879
<v Speaker 2>and one simplifying assumption we make early on is that

197
00:09:19.919 --> 00:09:22.879
<v Speaker 2>our C files only use ASCII characters.

198
00:09:22.519 --> 00:09:25.559
<v Speaker 1>Just standard ASCII for now. Sensible starting point. How

199
00:09:25.559 --> 00:09:27.720
<v Speaker 1>do we actually test if our lexer is doing the

200
00:09:27.799 --> 00:09:28.159
<v Speaker 1>right thing?

201
00:09:28.320 --> 00:09:31.240
<v Speaker 2>The guide provides a test_compiler tool, which is super helpful.

202
00:09:31.360 --> 00:09:33.960
<v Speaker 2>It comes with test programs. In tests/chapter_2, you'll

203
00:09:33.960 --> 00:09:35.679
<v Speaker 2>find directories like invalid_lex.

204
00:09:35.320 --> 00:09:37.559
<v Speaker 1>Programs that should fail the lexer?

205
00:09:37.480 --> 00:09:41.320
<v Speaker 2>Exactly, bad tokens, weird characters, and then invalid_parse and

206
00:09:41.399 --> 00:09:45.519
<v Speaker 2>valid directories for later stages. You test the lexer specifically

207
00:09:46.039 --> 00:09:50.120
<v Speaker 2>using ./test_compiler /path/to/your_compiler --chapter 2 --stage lex.

208
00:09:50.000 --> 00:09:53.159
<v Speaker 1>Okay, so that command runs our compiler in lex-only mode

209
00:09:53.240 --> 00:09:56.440
<v Speaker 1>against those test cases and checks if it accepts or

210
00:09:56.480 --> 00:09:57.519
<v Speaker 1>rejects them correctly.

211
00:09:57.600 --> 00:10:00.399
<v Speaker 2>That's the idea. It verifies the pass/fail behavior. It

212
00:10:00.440 --> 00:10:03.600
<v Speaker 2>doesn't necessarily check the exact stream of tokens for the

213
00:10:03.639 --> 00:10:04.759
<v Speaker 2>valid files,

214
00:10:04.399 --> 00:10:07.159
<v Speaker 1>though. Ah, so for that level of detail, we'd need

215
00:10:07.200 --> 00:10:08.240
<v Speaker 1>our own unit tests.

216
00:10:08.480 --> 00:10:11.759
<v Speaker 2>Precisely, you'd write tests to feed it valid code and

217
00:10:11.799 --> 00:10:14.360
<v Speaker 2>assert that the token list matches exactly what you expect,

218
00:10:14.639 --> 00:10:17.159
<v Speaker 2>and feed it invalid code to check the error messages.

219
00:10:17.279 --> 00:10:20.960
<v Speaker 1>Got it. Any key implementation tips for the lexer itself? Yes,

220
00:10:20.879 --> 00:10:23.360
<v Speaker 2>a couple of important ones. First, when you see something

221
00:10:23.360 --> 00:10:25.679
<v Speaker 2>that looks like an identifier, a sequence of letters, numbers,

222
00:10:25.759 --> 00:10:29.519
<v Speaker 2>underscores, like main or my_variable, right, your logic, maybe

223
00:10:29.559 --> 00:10:32.799
<v Speaker 2>your regex, will probably also match keywords like int or return.

224
00:10:33.399 --> 00:10:37.320
<v Speaker 2>The efficient way is: first, recognize it as a generic identifier.

225
00:10:37.799 --> 00:10:40.000
<v Speaker 2>Then check if that identifier happens to be on the

226
00:10:40.039 --> 00:10:41.320
<v Speaker 2>list of reserved keywords.

227
00:10:41.559 --> 00:10:45.799
<v Speaker 1>Ah, don't try to make the initial pattern distinguish them. Identify, then

228
00:10:45.720 --> 00:10:49.639
<v Speaker 2>classify. Exactly, two steps. The other thing is, don't rely

229
00:10:49.759 --> 00:10:52.720
<v Speaker 2>only on whitespace to split tokens. Oh, right. Think

230
00:10:52.720 --> 00:10:55.879
<v Speaker 2>about main(). That's three tokens, main, (, and ), and no whitespace

231
00:10:55.919 --> 00:10:59.240
<v Speaker 2>separating them. If you just split on spaces, you'd get

232
00:10:59.279 --> 00:10:59.559
<v Speaker 2>it wrong.
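
NOTE
The two lexer tips above can be sketched in Python roughly as follows. This is a minimal
illustration under stated assumptions, not the guide's implementation: the token kinds,
the KEYWORDS set (void is an assumption here) and the lex function name are hypothetical.
```python
import re
KEYWORDS = {"int", "void", "return"}
# Patterns are tried in order at the current position. The trailing \b keeps
# input such as 123abc from being accepted as a constant followed by a name.
TOKEN_PATTERNS = [
    ("identifier", re.compile(r"[A-Za-z_]\w*\b", re.ASCII)),
    ("constant", re.compile(r"[0-9]+\b", re.ASCII)),
    ("punct", re.compile(r"[(){};]")),
]
def lex(source: str) -> list:
    tokens = []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():
            # Whitespace is skipped, but it is not what separates tokens.
            pos += 1
            continue
        for kind, pattern in TOKEN_PATTERNS:
            match = pattern.match(source, pos)
            if match:
                text = match.group()
                # Step two: a generic identifier on the reserved list becomes a keyword.
                if kind == "identifier" and text in KEYWORDS:
                    kind = "keyword"
                tokens.append((kind, text))
                pos = match.end()
                break
        else:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
    return tokens
```
Because matching always resumes at the end of the previous token, main() lexes into
three tokens even though nothing separates them.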

233
00:10:59.799 --> 00:11:03.399
<v Speaker 1>Good point. Okay, so the lexer spits out tokens. Then the

234
00:11:03.440 --> 00:11:06.120
<v Speaker 1>parser steps in to build the AST. Exactly.

235
00:11:06.480 --> 00:11:09.960
<v Speaker 2>The parser takes that flat stream and gives it structure

236
00:11:10.159 --> 00:11:13.679
<v Speaker 2>hierarchy based on the C grammar. The AST is the

237
00:11:13.759 --> 00:11:15.480
<v Speaker 2>data structure holding that hierarchy.

238
00:11:15.879 --> 00:11:18.879
<v Speaker 1>We saw the simple AST for return 32. What

239
00:11:18.960 --> 00:11:22.440
<v Speaker 1>about something slightly more complex, like an if statement. How

240
00:11:22.440 --> 00:11:23.559
<v Speaker 1>does the hierarchy show up there?

241
00:11:23.600 --> 00:11:27.000
<v Speaker 2>Okay, good example. Let's say you have: if a is less than b, return

242
00:11:27.039 --> 00:11:29.600
<v Speaker 2>2 + 2. Right, the top AST node might be

243
00:11:29.639 --> 00:11:32.159
<v Speaker 2>an if node. This if node would have say, two

244
00:11:32.200 --> 00:11:34.279
<v Speaker 2>main children, one for the condition.

245
00:11:34.080 --> 00:11:36.159
<v Speaker 1>a less than b, which itself might be structured?

246
00:11:36.200 --> 00:11:38.480
<v Speaker 2>Oh yeah, that condition could be a binary op node

247
00:11:38.519 --> 00:11:40.879
<v Speaker 2>for less-than, with its own children for the variable a and

248
00:11:40.919 --> 00:11:41.440
<v Speaker 2>the variable b.

249
00:11:41.559 --> 00:11:43.679
<v Speaker 1>Okay, and the other child of the if?

250
00:11:43.720 --> 00:11:46.399
<v Speaker 2>That would be the then block, return 2 + 2.

251
00:11:46.840 --> 00:11:48.960
<v Speaker 2>That could be a return node, and its child would

252
00:11:48.960 --> 00:11:52.080
<v Speaker 2>be another binary op node for the plus with two

253
00:11:52.159 --> 00:11:54.399
<v Speaker 2>constant children, both holding 2. Wow.

254
00:11:54.440 --> 00:11:57.639
<v Speaker 1>Okay, So the tree really mirrors the nesting and the

255
00:11:57.679 --> 00:12:00.679
<v Speaker 1>logic: if, condition, then, and the condition and the then part

256
00:12:00.720 --> 00:12:02.279
<v Speaker 1>have their own little subtrees.

257
00:12:02.360 --> 00:12:04.879
<v Speaker 2>Precisely. It captures that structure directly, which is what the

258
00:12:04.879 --> 00:12:08.600
<v Speaker 2>next stages need. Now, to define these AST structures formally

259
00:12:08.679 --> 00:12:12.759
<v Speaker 2>and, importantly, in a language-neutral way, the guide introduces

260
00:12:12.759 --> 00:12:14.320
<v Speaker 2>something called ASDL.

261
00:12:13.919 --> 00:12:16.879
<v Speaker 1>ASDL, Zephyr Abstract Syntax Description Language?

262
00:12:16.879 --> 00:12:18.679
<v Speaker 2>That's the one. It's just a formal way to write

263
00:12:18.679 --> 00:12:20.159
<v Speaker 2>down what our AST nodes look like.

264
00:12:20.279 --> 00:12:22.960
<v Speaker 1>Okay, So what does the ASDL look like for our

265
00:12:23.240 --> 00:12:25.519
<v Speaker 1>super simple C subset in chapter two?

266
00:12:25.840 --> 00:12:30.759
<v Speaker 2>It's pretty minimal. It's like: program = Program(function_definition); function_definition = Function(identifier name,

267
00:12:30.799 --> 00:12:33.399
<v Speaker 2>statement body); statement = Return(exp); exp = Constant(int).

268
00:12:34.000 --> 00:12:37.039
<v Speaker 1>Okay, let's decode that a program is just a program

269
00:12:37.080 --> 00:12:40.240
<v Speaker 1>node containing one function definition. Yep. A function definition is

270
00:12:40.279 --> 00:12:43.360
<v Speaker 1>a function node. It has a name, which is an

271
00:12:43.360 --> 00:12:46.879
<v Speaker 1>identifier type, and a body which is a statement type.

272
00:12:47.120 --> 00:12:49.759
<v Speaker 2>Right, and those words name and body are just field

273
00:12:49.840 --> 00:12:51.000
<v Speaker 2>names helpful labels.

274
00:12:51.039 --> 00:12:54.279
<v Speaker 1>Gotcha. Then a statement can only be a return node

275
00:12:54.360 --> 00:12:57.080
<v Speaker 1>containing an exp, an expression, for now. Yes. And an exp

276
00:12:57.159 --> 00:12:59.720
<v Speaker 1>can only be a constant node holding an int.

277
00:13:00.159 --> 00:13:03.360
<v Speaker 2>That's it for chapter two. Identifier and int are like

278
00:13:03.440 --> 00:13:05.000
<v Speaker 2>built-in ASDL types.

279
00:13:05.480 --> 00:13:08.720
<v Speaker 1>So when we implement this, say, in Python or Rust

280
00:13:08.840 --> 00:13:12.639
<v Speaker 1>or Java, we'll create classes or data types that match

281
00:13:12.720 --> 00:13:14.759
<v Speaker 1>this ASDL structure? Exactly.

282
00:13:15.120 --> 00:13:18.759
<v Speaker 2>Functional languages might use algebraic data types. OOP languages might

283
00:13:18.840 --> 00:13:22.200
<v Speaker 2>use abstract classes and inheritance. The guide mentions some idioms

284
00:13:22.200 --> 00:13:23.559
<v Speaker 2>and points to more reading if you want to go

285
00:13:23.600 --> 00:13:25.120
<v Speaker 2>deeper into implementation strategies.
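
NOTE
One possible Python rendering of the four ASDL definitions discussed above, using
dataclasses. This is a sketch, not the guide's code: the field names follow the ASDL
where it names them (name, body), and the others (value, exp, function_definition) are
assumptions made for illustration.
```python
from dataclasses import dataclass
# exp = Constant(int)
@dataclass
class Constant:
    value: int
# statement = Return(exp)
@dataclass
class Return:
    exp: Constant
# function_definition = Function(identifier name, statement body)
@dataclass
class Function:
    name: str
    body: Return
# program = Program(function_definition)
@dataclass
class Program:
    function_definition: Function
# The tree for a main function that returns the constant 2.
tree = Program(Function(name="main", body=Return(Constant(2))))
```
With only one constructor per ASDL type, plain classes are enough. Once a type gains
several constructors, a shared base class or a typing.Union alias is a common next step.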

286
00:13:25.159 --> 00:13:28.919
<v Speaker 1>Okay, but the ASDL defines the structure, but it doesn't

287
00:13:28.960 --> 00:13:31.879
<v Speaker 1>tell the parser which tokens in what order make up

288
00:13:32.000 --> 00:13:34.879
<v Speaker 1>say, a function definition, right? It doesn't mention the int

289
00:13:34.960 --> 00:13:37.120
<v Speaker 1>keyword or the parentheses or braces.

290
00:13:37.360 --> 00:13:39.519
<v Speaker 2>That is a crucial distinction. You're absolutely right. The AST

291
00:13:39.679 --> 00:13:43.120
<v Speaker 2>is abstract. It leaves out the syntactic sugar like semicolons and braces.

292
00:13:43.519 --> 00:13:46.480
<v Speaker 2>The parser needs a concrete map of the token sequences.

293
00:13:45.960 --> 00:13:47.519
<v Speaker 1>Which is where the formal grammar comes

294
00:13:47.360 --> 00:13:50.759
<v Speaker 2>in. Exactly, using a notation called Backus-Naur form, or

295
00:13:50.840 --> 00:13:52.279
<v Speaker 2>BNF. BNF.

296
00:13:52.559 --> 00:13:56.279
<v Speaker 1>Okay, what's the BNF for this simple C? It mirrors

297
00:13:55.919 --> 00:14:00.879
<v Speaker 2>the ASDL pretty closely. A program is a function. A function is "int", an identifier,

298
00:14:00.919 --> 00:14:04.679
<v Speaker 2>"(", "void", ")", "{", a statement, "}". A statement is "return", an exp, ";". An exp is an int. And then it clarifies the

299
00:14:04.759 --> 00:14:09.360
<v Speaker 2>terminals: identifier is an identifier token, and int is a constant

300
00:14:09.039 --> 00:14:11.440
<v Speaker 1>token. Okay, so things in angle brackets, like this,

301
00:14:11.519 --> 00:14:14.960
<v Speaker 1>are nonterminals. They correspond to our AST node types.

302
00:14:15.080 --> 00:14:17.200
<v Speaker 2>Yes, grammatical categories.

303
00:14:16.639 --> 00:14:19.720
<v Speaker 1>And things in quotes, like this, are terminals, the actual

304
00:14:19.799 --> 00:14:21.039
<v Speaker 1>tokens the lexer gives

305
00:14:20.919 --> 00:14:23.679
<v Speaker 2>us. Exactly, the literal tokens we expect to see. The

306
00:14:23.720 --> 00:14:27.480
<v Speaker 2>BNF spells out the exact sequence: an int token,

307
00:14:27.759 --> 00:14:31.840
<v Speaker 2>then an identifier token, then the parentheses and braces around a statement. Right,

308
00:14:31.960 --> 00:14:32.240
<v Speaker 2>and the

309
00:14:32.240 --> 00:14:35.200
<v Speaker 1>question-mark definitions are just clarifying what kind of token

310
00:14:35.240 --> 00:14:38.240
<v Speaker 1>identifier and int refer to. So the BNF is the

311
00:14:38.279 --> 00:14:41.879
<v Speaker 1>parser's rulebook for matching token sequences to build the AST

312
00:14:42.080 --> 00:14:44.159
<v Speaker 1>nodes defined by the ASDL.

313
00:14:43.840 --> 00:14:46.399
<v Speaker 2>You've got it. Perfect summary. The guide also shows how

314
00:14:46.440 --> 00:14:49.519
<v Speaker 2>you'd extend BNF, like adding an if statement rule:

315
00:14:49.799 --> 00:14:53.120
<v Speaker 2>"if", "(", an exp, ")", a statement, then "else" and a statement in square brackets. The brackets mean the else part

316
00:14:53.159 --> 00:14:54.639
<v Speaker 2>is optional. Neat. Okay.

317
00:14:54.639 --> 00:14:57.600
<v Speaker 1>So we have tokens, the ASDL defining the target AST

318
00:14:58.360 --> 00:15:01.360
<v Speaker 1>and the BNF grammar as a rule book. How does

319
00:15:01.360 --> 00:15:04.279
<v Speaker 1>the parser actually do the parsing? What's the technique?

320
00:15:04.519 --> 00:15:08.000
<v Speaker 2>The guide introduces a common technique called recursive descent parsing.

321
00:15:08.120 --> 00:15:10.240
<v Speaker 1>Recursive descent. Sounds intriguing.

322
00:15:10.480 --> 00:15:13.759
<v Speaker 2>The basic idea is simple. For each nonterminal symbol

323
00:15:14.080 --> 00:15:18.639
<v Speaker 2>in the BNF grammar, like program function statement, you write

324
00:15:18.799 --> 00:15:20.360
<v Speaker 2>a corresponding parsing function.

325
00:15:20.440 --> 00:15:22.240
<v Speaker 1>Okay, a function for each rule.

326
00:15:22.120 --> 00:15:25.200
<v Speaker 2>Pretty much, and these functions often call each other, mirroring

327
00:15:25.240 --> 00:15:27.679
<v Speaker 2>the structure of the grammar. That's the recursive part.

328
00:15:27.799 --> 00:15:31.320
<v Speaker 1>Ah okay, So how would parse statement work based on

329
00:15:31.360 --> 00:15:32.279
<v Speaker 1>our simple grammar?

330
00:15:32.440 --> 00:15:35.759
<v Speaker 2>Well, the rule is: statement is "return" exp ";". So the

331
00:15:35.799 --> 00:15:38.720
<v Speaker 2>parse_statement function would first look for a return token. Okay,

332
00:15:38.879 --> 00:15:41.799
<v Speaker 2>if it finds one, it consumes it. Then it needs

333
00:15:41.799 --> 00:15:45.840
<v Speaker 2>to parse an exp, so it would call another function, maybe parse_exp.

334
00:15:45.399 --> 00:15:47.840
<v Speaker 1>Which would handle parsing the integer constant in our

335
00:15:47.840 --> 00:15:51.639
<v Speaker 2>case. Right, parse_exp would return the Constant AST node, then parse_statement

336
00:15:51.639 --> 00:15:54.360
<v Speaker 2>looks for the final semicolon token, consumes that,

337
00:15:55.000 --> 00:15:58.039
<v Speaker 2>and if everything worked, it bundles up the Constant node

338
00:15:58.120 --> 00:16:01.919
<v Speaker 2>returned by parse_exp inside a new Return AST node and

339
00:16:01.960 --> 00:16:03.320
<v Speaker 2>returns that. Got it.

340
00:16:03.480 --> 00:16:06.320
<v Speaker 1>The guide showed some pseudocode with an expect helper function.

341
00:16:06.799 --> 00:16:09.480
<v Speaker 2>Yeah, expect is useful. It basically means: check if the

342
00:16:09.519 --> 00:16:12.240
<v Speaker 2>next token is x, consume it if yes, raise an

343
00:16:12.320 --> 00:16:12.879
<v Speaker 2>error if no.

344
00:16:13.279 --> 00:16:16.000
<v Speaker 1>And these functions consume tokens as they go, So if

345
00:16:16.039 --> 00:16:19.240
<v Speaker 1>parse_program finishes and there are still tokens left over...

346
00:16:19.200 --> 00:16:21.919
<v Speaker 2>That usually means there's extra stuff that doesn't fit the grammar.

347
00:16:22.000 --> 00:16:23.399
<v Speaker 2>A syntax error. Makes sense.
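
NOTE
A minimal Python sketch of the recursive descent steps described above, under stated assumptions rather than the guide's own pseudocode.
It assumes tokens are plain strings held in a list. Constant, Return and parse_return_only are hypothetical names for this example;
parse_return_only only stands in for parse_program to show the leftover-token check.
```python
from dataclasses import dataclass
# Hypothetical AST node classes named after the ASDL nodes mentioned in the episode.
@dataclass
class Constant:
    value: int
@dataclass
class Return:
    exp: Constant
def expect(expected, tokens):
    # Consume the next token if it matches; otherwise raise a syntax error.
    actual = tokens.pop(0) if tokens else None
    if actual != expected:
        raise SyntaxError(f"expected {expected!r} but found {actual!r}")
def parse_exp(tokens):
    # At this stage an exp is a single integer constant.
    token = tokens.pop(0) if tokens else None
    if token is None or not token.isdigit():
        raise SyntaxError(f"expected an integer constant but found {token!r}")
    return Constant(int(token))
def parse_statement(tokens):
    # statement is: "return" exp ";"
    expect("return", tokens)
    exp = parse_exp(tokens)
    expect(";", tokens)
    return Return(exp)
def parse_return_only(tokens):
    # Hypothetical stand-in for parse_program: parse one statement, then
    # treat any leftover tokens as a syntax error.
    statement = parse_statement(tokens)
    if tokens:
        raise SyntaxError(f"unexpected token {tokens[0]!r} after the statement")
    return statement
```
For example, parse_return_only(["return", "2", ";"]) returns Return(exp=Constant(value=2)), while a trailing extra token raises SyntaxError.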

348
00:16:23.559 --> 00:16:26.399
<v Speaker 1>The guide mentioned predictive parsers and backtracking briefly too.

349
00:16:26.679 --> 00:16:29.639
<v Speaker 2>Yeah. For more complex grammars where a rule might have

350
00:16:29.759 --> 00:16:33.840
<v Speaker 2>multiple options like if versus return for statement, the parser

351
00:16:33.960 --> 00:16:36.039
<v Speaker 2>might need to peek ahead at the next token to

352
00:16:36.080 --> 00:16:40.159
<v Speaker 2>decide which path to take (predictive), or try one path

353
00:16:40.200 --> 00:16:41.399
<v Speaker 2>and backtrack if it fails.
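
NOTE
A small Python sketch of the predictive idea just mentioned, assuming a statement grammar with the two options discussed (return and if with an optional else).
The tuple-based nodes and the peek helper are illustrative assumptions, not the guide's code; tokens are again plain strings in a list.
```python
def peek(tokens):
    # Look at the next token without consuming it (None at end of input).
    return tokens[0] if tokens else None
def expect(expected, tokens):
    actual = tokens.pop(0) if tokens else None
    if actual != expected:
        raise SyntaxError(f"expected {expected!r} but found {actual!r}")
def parse_exp(tokens):
    token = tokens.pop(0) if tokens else None
    if token is None or not token.isdigit():
        raise SyntaxError(f"expected an integer constant but found {token!r}")
    return ("Constant", int(token))
def parse_statement(tokens):
    # Predictive step: one token of lookahead selects the production.
    next_token = peek(tokens)
    if next_token == "return":
        expect("return", tokens)
        exp = parse_exp(tokens)
        expect(";", tokens)
        return ("Return", exp)
    if next_token == "if":
        expect("if", tokens)
        expect("(", tokens)
        condition = parse_exp(tokens)
        expect(")", tokens)
        then_branch = parse_statement(tokens)
        else_branch = None
        # The bracketed else part is optional, so peek again before committing.
        if peek(tokens) == "else":
            expect("else", tokens)
            else_branch = parse_statement(tokens)
        return ("If", condition, then_branch, else_branch)
    raise SyntaxError(f"expected a statement but found {next_token!r}")
```
Because the first token alone decides the path, this version never needs to backtrack.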

354
00:16:41.960 --> 00:16:45.159
<v Speaker 1>But for our simple start, direct recursive descent works well.

355
00:16:45.240 --> 00:16:47.200
<v Speaker 2>And testing the parser? Same tool.

356
00:16:47.080 --> 00:16:50.440
<v Speaker 1>Yep: test_compiler /path/to/your_compiler --chapter 2 --stage parse.

357
00:16:50.840 --> 00:16:53.919
<v Speaker 1>It checks against the invalid_parse and valid tests. Again,

358
00:16:54.000 --> 00:16:56.120
<v Speaker 1>writing your own tests to check the structure of the

359
00:16:56.159 --> 00:17:00.200
<v Speaker 1>output AST is super helpful for debugging. And the implementation

360
00:17:00.279 --> 00:17:03.480
<v Speaker 1>tips were: write a pretty printer for the AST, which definitely

361
00:17:03.519 --> 00:17:05.799
<v Speaker 1>helps visualize the tree, and give good error messages.

362
00:17:05.880 --> 00:17:10.200
<v Speaker 2>Crucial. "Expected ... but found return on line five, column ten" is

363
00:17:10.359 --> 00:17:12.119
<v Speaker 2>way better than just "syntax error."

364
00:17:12.240 --> 00:17:16.240
<v Speaker 1>Absolutely. Okay, so source, then lexer, then tokens,

365
00:17:16.279 --> 00:17:20.119
<v Speaker 1>then parser, then C AST. We have the tree. What's next?

366
00:17:20.440 --> 00:17:24.400
<v Speaker 2>Now we hit code generation. This pass takes that C-language AST,

367
00:17:24.279 --> 00:17:25.680
<v Speaker 1>the one the parser has built.

368
00:17:25.519 --> 00:17:30.039
<v Speaker 2>Exactly, and transforms it into our target, x64

369
00:17:30.039 --> 00:17:33.640
<v Speaker 2>assembly instructions, but again not as text yet. We represent

370
00:17:33.680 --> 00:17:35.880
<v Speaker 2>the assembly program as another internal data

371
00:17:35.680 --> 00:17:39.559
<v Speaker 1>structure first. Another AST, an assembly AST? Precisely.

372
00:17:39.759 --> 00:17:41.559
<v Speaker 2>The guide calls it that to keep things clear. It

373
00:17:41.599 --> 00:17:43.319
<v Speaker 2>has its own ASDL definition too.

374
00:17:43.480 --> 00:17:45.839
<v Speaker 1>Okay, what does the assembly ASDL look like?

375
00:17:45.920 --> 00:17:49.039
<v Speaker 2>It's also quite simple for now. program = Program(function_definition). function_definition =

376
00:17:49.160 --> 00:17:54.599
<v Speaker 2>Function(identifier name, instruction* instructions). instruction = Mov(operand src, operand

377
00:17:54.680 --> 00:17:57.960
<v Speaker 2>dst) | Ret. operand = Imm(int) | Register.

378
00:17:58.119 --> 00:18:02.440
<v Speaker 1>Okay, interesting parallels. Program has a function definition. A function

379
00:18:02.480 --> 00:18:04.599
<v Speaker 1>has a name, but instead of a C statement body,

380
00:18:04.640 --> 00:18:06.799
<v Speaker 1>it has instruction*, a list of instructions.

381
00:18:07.119 --> 00:18:09.720
<v Speaker 2>The asterisk means a list or sequence.

382
00:18:09.440 --> 00:18:13.039
<v Speaker 1>And the instruction types are Mov or Ret, and their

383
00:18:13.079 --> 00:18:16.039
<v Speaker 1>operands can be an immediate (a constant) or a register.

384
00:18:16.400 --> 00:18:18.480
<v Speaker 2>That's it for now. And initially, the only register we

385
00:18:18.519 --> 00:18:21.799
<v Speaker 2>care about is %eax, for the return value. The

386
00:18:21.799 --> 00:18:25.720
<v Speaker 2>code generator walks the C AST and for each node it

387
00:18:25.799 --> 00:18:28.880
<v Speaker 2>figures out the equivalent assembly instructions and builds up this

388
00:18:28.960 --> 00:18:30.000
<v Speaker 2>assembly AST.

389
00:18:30.359 --> 00:18:33.599
<v Speaker 1>The guide had a table mapping C AST nodes to assembly

390
00:18:33.640 --> 00:18:37.839
<v Speaker 1>AST constructs, like Return(exp) in C becomes a Mov(exp, Register),

391
00:18:38.559 --> 00:18:40.200
<v Speaker 1>then a Ret in assembly.

392
00:18:39.920 --> 00:18:43.319
<v Speaker 2>Right, and Constant(int) in the C AST becomes Imm(int) in

393
00:18:43.359 --> 00:18:44.359
<v Speaker 2>the assembly AST.

394
00:18:44.559 --> 00:18:46.759
<v Speaker 1>So it's a translation step building a new tree that

395
00:18:46.799 --> 00:18:48.960
<v Speaker 1>represents the assembly code needed. Exactly.

396
00:18:48.960 --> 00:18:51.400
<v Speaker 2>And you can see how one C statement, return, maps

397
00:18:51.440 --> 00:18:54.759
<v Speaker 2>to two assembly instructions, movl and ret. That becomes more

398
00:18:54.799 --> 00:18:56.039
<v Speaker 2>common as things get complex.
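
NOTE
A hedged Python sketch of the code generation pass as described above: it walks a C AST and builds an assembly AST, without producing any text.
The dataclass spellings (including AsmFunction and AsmProgram) and the gen_* function names are assumptions made for this example; only the node mapping comes from the discussion.
```python
from dataclasses import dataclass
from typing import List, Union
# C AST nodes (hypothetical Python spellings of the ASDL from the episode).
@dataclass
class Constant:
    value: int
@dataclass
class Return:
    exp: Constant
@dataclass
class Function:
    name: str
    body: Return
@dataclass
class Program:
    function_definition: Function
# Assembly AST nodes.
@dataclass
class Imm:
    value: int
@dataclass
class Register:
    pass
@dataclass
class Mov:
    src: Union[Imm, Register]
    dst: Union[Imm, Register]
@dataclass
class Ret:
    pass
@dataclass
class AsmFunction:
    name: str
    instructions: List[Union[Mov, Ret]]
@dataclass
class AsmProgram:
    function_definition: AsmFunction
def gen_operand(exp):
    # Constant(int) becomes Imm(int).
    return Imm(exp.value)
def gen_instructions(statement):
    # Return(exp) becomes Mov(exp, Register) followed by Ret.
    return [Mov(gen_operand(statement.exp), Register()), Ret()]
def gen_function(function):
    return AsmFunction(function.name, gen_instructions(function.body))
def gen_program(program):
    return AsmProgram(gen_function(program.function_definition))
```
Here one Return statement yields a two-element instruction list, which is the one-to-many mapping the hosts point out.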

399
00:18:56.200 --> 00:18:59.319
<v Speaker 1>Okay, assembly AST constructed in memory. The final step of

400
00:18:59.319 --> 00:19:01.119
<v Speaker 1>the initial four passes: code emission.

401
00:19:01.160 --> 00:19:03.599
<v Speaker 2>Take that assembly AST we just built and write it

402
00:19:03.640 --> 00:19:04.720
<v Speaker 2>out to the .s text file.

403
00:19:04.839 --> 00:19:06.279
<v Speaker 1>Finally, the text file.

404
00:19:06.400 --> 00:19:09.440
<v Speaker 2>Finally, the text file. And since the assembly AST structure

405
00:19:09.480 --> 00:19:13.400
<v Speaker 2>closely matches actual assembly syntax, this pass is usually quite straightforward.

406
00:19:13.519 --> 00:19:15.880
<v Speaker 2>You just traverse the assembly AST and output the text

407
00:19:15.960 --> 00:19:16.640
<v Speaker 2>for each node.

408
00:19:16.880 --> 00:19:20.319
<v Speaker 1>Another table in the guide showed the formatting: Function(name,

409
00:19:20.599 --> 00:19:24.920
<v Speaker 1>instructions) becomes .globl name, then name followed by a colon, then the text

410
00:19:24.960 --> 00:19:26.160
<v Speaker 1>for each instruction, each on

411
00:19:26.119 --> 00:19:31.039
<v Speaker 2>a new line. Yeah, Mov(src, dst) becomes movl src, dst;

412
00:19:31.440 --> 00:19:35.720
<v Speaker 2>Ret becomes ret; Register becomes %eax; Imm(int) becomes $int.

413
00:19:35.960 --> 00:19:40.559
<v Speaker 1>Just translating the assembly AST nodes into their standard text representation.

414
00:19:40.079 --> 00:19:41.319
<v Speaker 2>Pretty much a direct translation.

415
00:19:41.400 --> 00:19:44.839
<v Speaker 1>Yeah, and remembering that macOS needs the underscore prefix

416
00:19:45.000 --> 00:19:46.839
<v Speaker 1>on the global name, like _main.

417
00:19:46.759 --> 00:19:49.119
<v Speaker 2>Right, that platform detail needs to be handled by the emitter.
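
NOTE
A minimal Python sketch of the code emission pass under the formatting rules just listed, including the macOS underscore prefix.
The class names, the emit_* function names and the platform parameter are assumptions for illustration; the platform check relies on sys.platform being "darwin" on macOS.
```python
import sys
from dataclasses import dataclass
# Assembly AST nodes (hypothetical Python spellings of the assembly ASDL).
@dataclass
class Imm:
    value: int
@dataclass
class Register:
    pass
@dataclass
class Mov:
    src: object
    dst: object
@dataclass
class Ret:
    pass
@dataclass
class AsmFunction:
    name: str
    instructions: list
def emit_operand(operand):
    # Register becomes %eax; Imm(int) becomes $int.
    if isinstance(operand, Register):
        return "%eax"
    if isinstance(operand, Imm):
        return f"${operand.value}"
    raise ValueError(f"unknown operand: {operand!r}")
def emit_instruction(instruction):
    # Mov(src, dst) becomes "movl src, dst"; Ret becomes "ret".
    if isinstance(instruction, Mov):
        return f"    movl {emit_operand(instruction.src)}, {emit_operand(instruction.dst)}"
    if isinstance(instruction, Ret):
        return "    ret"
    raise ValueError(f"unknown instruction: {instruction!r}")
def emit_function(function, platform=sys.platform):
    # macOS prefixes global symbol names with an underscore.
    name = "_" + function.name if platform == "darwin" else function.name
    lines = [f"    .globl {name}", f"{name}:"]
    lines += [emit_instruction(instruction) for instruction in function.instructions]
    return "\n".join(lines) + "\n"
```
Passing platform explicitly, as in emit_function(fn, platform="darwin"), makes the platform-specific naming easy to test on any machine.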

418
00:19:49.240 --> 00:19:51.920
<v Speaker 1>Okay. And then to test the whole thing, lexer, parser,

419
00:19:51.960 --> 00:19:56.559
<v Speaker 1>codegen, code emitter, we run test_compiler without the --stage flag.

420
00:19:55.920 --> 00:19:59.079
<v Speaker 2>Exactly: test_compiler /path/to/your_compiler --chapter 2. That

421
00:19:59.160 --> 00:20:02.319
<v Speaker 2>command will, one, run your compiler on the test .c

422
00:20:02.519 --> 00:20:07.400
<v Speaker 2>files to generate .s files. Two, use the system's GCC

423
00:20:07.640 --> 00:20:13.119
<v Speaker 2>or Clang to assemble and link those .s files into executables. Three,

424
00:20:13.400 --> 00:20:17.920
<v Speaker 2>run those executables. Four, compare their exit codes to the

425
00:20:17.960 --> 00:20:20.839
<v Speaker 2>exit codes produced by compiling the original C files directly

426
00:20:20.839 --> 00:20:21.559
<v Speaker 2>with GCC.

427
00:20:21.960 --> 00:20:25.119
<v Speaker 1>So it verifies the end to end behavior. Does our

428
00:20:25.119 --> 00:20:27.839
<v Speaker 1>compiler produce assembly that results in a program doing the

429
00:20:27.880 --> 00:20:30.640
<v Speaker 1>same thing, at least in terms of exit code, as GCC?

430
00:20:30.920 --> 00:20:33.240
<v Speaker 2>That's the goal for this stage. It's the final check

431
00:20:33.279 --> 00:20:36.119
<v Speaker 2>that all four passes are working together correctly for these

432
00:20:36.160 --> 00:20:37.000
<v Speaker 2>simple programs.

433
00:20:37.039 --> 00:20:39.880
<v Speaker 1>Wow. Okay, so in this deep dive, we've really laid

434
00:20:39.920 --> 00:20:43.079
<v Speaker 1>out the blueprint for a compiler's first steps. Yeah. Lexing code

435
00:20:43.079 --> 00:20:47.359
<v Speaker 1>into tokens, parsing tokens into that crucial abstract syntax tree,

436
00:20:47.440 --> 00:20:51.200
<v Speaker 1>generating an intermediate assembly representation from that tree, and finally

437
00:20:51.279 --> 00:20:54.519
<v Speaker 1>emitting that assembly into a text file. It's fascinating to

438
00:20:54.519 --> 00:20:57.519
<v Speaker 1>see how these stages transform the code step by step.

439
00:20:57.359 --> 00:20:59.920
<v Speaker 2>Absolutely. And like we said, while it seems like a

440
00:21:00.240 --> 00:21:03.400
<v Speaker 2>lot for just returning a number, this structured multi-pass approach

441
00:21:03.480 --> 00:21:05.839
<v Speaker 2>is really the key. It's the foundation we need to

442
00:21:05.920 --> 00:21:11.200
<v Speaker 2>start handling more complex C features like operators, variables, control

443
00:21:11.279 --> 00:21:13.200
<v Speaker 2>flow in our next deep dives.

444
00:21:13.319 --> 00:21:15.799
<v Speaker 1>It definitely gives you a much deeper appreciation for what's

445
00:21:15.799 --> 00:21:20.240
<v Speaker 1>happening when you just type gcc my_program.c. Okay, thinking

446
00:21:20.240 --> 00:21:23.519
<v Speaker 1>about these basic steps, how do they scale? How does

447
00:21:23.559 --> 00:21:26.759
<v Speaker 1>this foundation handle the sheer complexity of, say, the Linux

448
00:21:26.799 --> 00:21:29.759
<v Speaker 1>kernel source code? And what are the really tricky C

449
00:21:29.799 --> 00:21:33.000
<v Speaker 1>features that will challenge this simple pipeline later on? Definitely

450
00:21:33.079 --> 00:21:33.759
<v Speaker 1>something to chew on.

451
00:21:33.920 --> 00:21:36.000
<v Speaker 2>Indeed, plenty more complexity ahead.

452
00:21:36.039 --> 00:21:37.559
<v Speaker 1>Thanks for joining us for this deep dive.
