WEBVTT

1
00:00:00.080 --> 00:00:02.040
<v Speaker 1>So you've probably done this right, type to line of code,

2
00:00:02.120 --> 00:00:05.160
<v Speaker 1>hit compile dot, and then you know, just watch your

3
00:00:05.200 --> 00:00:07.440
<v Speaker 1>program run. It feels kind of like magic, doesn't it

4
00:00:07.480 --> 00:00:10.279
<v Speaker 1>one moment? It's if by five the next poof the

5
00:00:10.279 --> 00:00:13.359
<v Speaker 1>computer's doing it. Today, we're really going to pull back

6
00:00:13.400 --> 00:00:16.359
<v Speaker 1>the curtain on that magic. We're doing a deep dive

7
00:00:16.519 --> 00:00:20.079
<v Speaker 1>into compiler construction, and we're using Bill Campbell's introduction to

8
00:00:20.120 --> 00:00:23.839
<v Speaker 1>compiler construction in a Java world as our map.

9
00:00:24.359 --> 00:00:26.640
<v Speaker 2>Our mission really is, right, it's a translation, but a

10
00:00:26.760 --> 00:00:32.840
<v Speaker 2>very sophisticated one. Think of it like translating a complex novel,

11
00:00:32.920 --> 00:00:36.600
<v Speaker 2>your high level language into a super precise, step by

12
00:00:36.600 --> 00:00:39.359
<v Speaker 2>step instruction manual for a worker, which is the computer

13
00:00:40.000 --> 00:00:42.600
<v Speaker 2>and the absolute key. The non negotiable rule is it

14
00:00:42.640 --> 00:00:45.560
<v Speaker 2>has to be semantics preserves scientic's preserving right, meaning the

15
00:00:45.600 --> 00:00:49.679
<v Speaker 2>translated program it must behave exactly like the original, every detail,

16
00:00:49.799 --> 00:00:52.200
<v Speaker 2>every calculation perfectly mirrors.

17
00:00:52.320 --> 00:00:55.679
<v Speaker 1>Okay, got it, But then why compile it all? Why

18
00:00:55.719 --> 00:00:58.960
<v Speaker 1>not just interpret everything directly? Seems simpler.

19
00:00:59.000 --> 00:01:02.119
<v Speaker 2>That's a really good question. There are two main reasons. Usually,

20
00:01:02.520 --> 00:01:06.560
<v Speaker 2>first performance, native machine code just runs way way faster,

21
00:01:06.840 --> 00:01:09.879
<v Speaker 2>and interpreter it has to parse and analyze every statement

22
00:01:09.920 --> 00:01:11.040
<v Speaker 2>every single time it runs.

23
00:01:11.280 --> 00:01:13.599
<v Speaker 1>Okay, so compiling does that work up front.

24
00:01:13.439 --> 00:01:16.400
<v Speaker 2>Exactly once, then it just runs. The second reason, well,

25
00:01:16.439 --> 00:01:20.159
<v Speaker 2>it's intellectual property or secrecy. Compiled machine code is much

26
00:01:20.200 --> 00:01:22.920
<v Speaker 2>harder to figure out to reverse engineer back into readable

27
00:01:22.959 --> 00:01:24.840
<v Speaker 2>source code. It offers protection.

28
00:01:25.200 --> 00:01:25.840
<v Speaker 1>Right, makes sense.

29
00:01:25.959 --> 00:01:28.719
<v Speaker 2>Now. Interpretation definitely still has its uses, you know, for

30
00:01:28.799 --> 00:01:31.359
<v Speaker 2>simple stuff scripts you don't run often, like Unix shell

31
00:01:31.400 --> 00:01:35.200
<v Speaker 2>scripts maybe, or some dynamic languages like Lisp where you

32
00:01:35.239 --> 00:01:39.519
<v Speaker 2>need names available at runtime. Compilation overhead isn't always worth it.

33
00:01:39.840 --> 00:01:42.359
<v Speaker 1>So when we talk about that native machine code, we're

34
00:01:42.400 --> 00:01:45.319
<v Speaker 1>really getting down to the metal, right, Things like memps

35
00:01:45.439 --> 00:01:48.599
<v Speaker 1>or Intel I three eighty six. Are they very different?

36
00:01:48.799 --> 00:01:52.760
<v Speaker 2>Yeah, they represent different philosophies. You have CISC complex instruction

37
00:01:52.799 --> 00:01:56.040
<v Speaker 2>set computers like Intel's I three eighty six and RISC

38
00:01:56.480 --> 00:02:00.680
<v Speaker 2>reduced instruction set computers like MIMPRSS. He tries to do

39
00:02:00.719 --> 00:02:04.519
<v Speaker 2>more in one instruction. RIS uses simplier steps. But the

40
00:02:04.599 --> 00:02:07.879
<v Speaker 2>key thing for the compiler is understanding these instructions and

41
00:02:07.920 --> 00:02:11.639
<v Speaker 2>how to use the CPU's super fast internal memory spots.

42
00:02:11.639 --> 00:02:14.120
<v Speaker 1>The register registers. Okay, those are limited.

43
00:02:13.800 --> 00:02:17.319
<v Speaker 2>Right, very limited? Maybe thirty two in a typical RSC machine,

44
00:02:17.319 --> 00:02:20.439
<v Speaker 2>even fewer, like eight general purpose ones in older CISE,

45
00:02:20.680 --> 00:02:23.159
<v Speaker 2>and they're way faster than main memory. So a huge

46
00:02:23.240 --> 00:02:25.800
<v Speaker 2>job for the compiler is figuring out how to keep

47
00:02:25.840 --> 00:02:29.080
<v Speaker 2>your variables and intermediate results right there in those fast

48
00:02:29.120 --> 00:02:30.759
<v Speaker 2>registers as much as possible.

49
00:02:31.000 --> 00:02:35.800
<v Speaker 1>So it's mapping names, generating the code sequence, and catching errors.

50
00:02:35.560 --> 00:02:40.360
<v Speaker 2>Absolutely, mapping names to memory or registers, generating that linear

51
00:02:40.400 --> 00:02:43.639
<v Speaker 2>machine code, and finding errors early on. Those are the

52
00:02:43.639 --> 00:02:44.280
<v Speaker 2>core tasks.

53
00:02:44.360 --> 00:02:46.400
<v Speaker 1>It sounds like it's not just one big program doing

54
00:02:46.479 --> 00:02:48.879
<v Speaker 1>all this, more like an assembly line.

55
00:02:48.960 --> 00:02:52.039
<v Speaker 2>That's a great analogy. Yes, compilers are usually broken down

56
00:02:52.039 --> 00:02:54.680
<v Speaker 2>into phases. Broadly, you've got a front end and a

57
00:02:54.680 --> 00:02:57.560
<v Speaker 2>back end. The front end deals with the analysis. It

58
00:02:57.599 --> 00:03:00.719
<v Speaker 2>takes your source code, understands its meaning, and this part

59
00:03:00.759 --> 00:03:03.159
<v Speaker 2>is source language dependent. It cares if it's Java or

60
00:03:03.199 --> 00:03:06.240
<v Speaker 2>C or whatever. Its final output is something called an

61
00:03:06.240 --> 00:03:10.400
<v Speaker 2>intermediate representation or IR. Think of it as a generic

62
00:03:10.520 --> 00:03:11.479
<v Speaker 2>version of your program.

63
00:03:11.560 --> 00:03:13.800
<v Speaker 1>Okay, And what happens inside that front end? What are

64
00:03:13.800 --> 00:03:14.280
<v Speaker 1>the steps?

65
00:03:14.680 --> 00:03:18.680
<v Speaker 2>Several subphases. First up is the scanner or lexical analyzer.

66
00:03:18.719 --> 00:03:21.199
<v Speaker 2>It's like breaking a sentence into words. It reads the

67
00:03:21.240 --> 00:03:23.680
<v Speaker 2>stream of characters and spits out a stream of tokens

68
00:03:23.719 --> 00:03:28.879
<v Speaker 2>identifiers like my var, numbers like forty two, keywords like if, operators,

69
00:03:29.039 --> 00:03:33.319
<v Speaker 2>then comments ah good point. Comments are scanned, recognized, and

70
00:03:33.360 --> 00:03:36.680
<v Speaker 2>then just thrown away. They mean nothing to the subsequent phases.

71
00:03:37.000 --> 00:03:38.639
<v Speaker 1>Right. So, after tokens.

72
00:03:38.319 --> 00:03:41.879
<v Speaker 2>Comes the parser or syntax analyzer. It takes the token

73
00:03:41.919 --> 00:03:46.159
<v Speaker 2>stream and builds an abstract syntax tree the AST. This

74
00:03:46.280 --> 00:03:49.520
<v Speaker 2>tree represents the grammatical structure of your program. It figures

75
00:03:49.520 --> 00:03:51.479
<v Speaker 2>out how the tokens relate to each other based on

76
00:03:51.520 --> 00:03:52.879
<v Speaker 2>the language's grammar.

77
00:03:52.599 --> 00:03:55.479
<v Speaker 1>Rules, like diagramming a sentence exactly like that.

78
00:03:55.879 --> 00:03:59.719
<v Speaker 2>Then the final analysis step is the semantic analyzer. This

79
00:03:59.800 --> 00:04:02.080
<v Speaker 2>is where type checking happens. It makes sure you're not

80
00:04:02.400 --> 00:04:04.960
<v Speaker 2>trying to add a number to a customer object Bickley.

81
00:04:05.280 --> 00:04:08.039
<v Speaker 2>It forces type rules. It also figures out where to

82
00:04:08.080 --> 00:04:11.199
<v Speaker 2>store variables, maybe on the stack frame, and can even

83
00:04:11.319 --> 00:04:14.039
<v Speaker 2>rewrite bits of the AST to make things clearer for

84
00:04:14.120 --> 00:04:14.879
<v Speaker 2>later stages.

85
00:04:15.240 --> 00:04:18.920
<v Speaker 1>Okay, so the front end understands the code's meaning and structure.

86
00:04:19.199 --> 00:04:21.680
<v Speaker 1>What's next is it straight to machine code?

87
00:04:21.839 --> 00:04:24.639
<v Speaker 2>Often there's a step in between, sometimes called the middle end.

88
00:04:24.720 --> 00:04:27.759
<v Speaker 2>It's kind of optional but very common. Its whole job

89
00:04:27.839 --> 00:04:31.879
<v Speaker 2>is optimization. It takes that intermediate representation, the IR, and

90
00:04:31.959 --> 00:04:35.120
<v Speaker 2>tries to make it better, faster, smaller. How does it

91
00:04:35.199 --> 00:04:36.439
<v Speaker 2>do that various ways.

92
00:04:36.480 --> 00:04:36.959
<v Speaker 1>It might.

93
00:04:39.959 --> 00:04:42.519
<v Speaker 2>Loops that don't change loop invariants and pull them out

94
00:04:42.519 --> 00:04:46.319
<v Speaker 2>so they only run once, or replace an expensive operation

95
00:04:46.439 --> 00:04:50.240
<v Speaker 2>like multiplication with a cheaper one like addition, if possible.

96
00:04:50.279 --> 00:04:52.399
<v Speaker 2>It's called strength reduction. It polishes the.

97
00:04:52.360 --> 00:04:55.160
<v Speaker 1>IR optimized IR, then comes the back end.

98
00:04:55.199 --> 00:04:57.920
<v Speaker 2>Then comes the back end the synthesis phase. This takes

99
00:04:57.959 --> 00:05:01.879
<v Speaker 2>the R and generates the action target machine code. And

100
00:05:02.000 --> 00:05:05.319
<v Speaker 2>this part is target language dependent. It needs to know

101
00:05:05.360 --> 00:05:09.360
<v Speaker 2>if it's generating code for Intel or AR or MIIPS.

102
00:05:09.519 --> 00:05:11.240
<v Speaker 1>It doesn't have subphases.

103
00:05:10.680 --> 00:05:14.439
<v Speaker 2>Too, it does. There's code generation itself, picking the actual

104
00:05:14.480 --> 00:05:18.680
<v Speaker 2>machine instructions. Then often a peephole optimizer. This looks at

105
00:05:18.680 --> 00:05:22.120
<v Speaker 2>a small window of instructions just generated and looks for

106
00:05:22.199 --> 00:05:25.399
<v Speaker 2>obvious local inefficiencies, like maybe a jump that immediately jumps

107
00:05:25.399 --> 00:05:29.519
<v Speaker 2>somewhere else, tiing up locally exactly. And finally the object phase,

108
00:05:29.560 --> 00:05:32.920
<v Speaker 2>which handles linking different compiled pieces together into the final

109
00:05:32.920 --> 00:05:34.000
<v Speaker 2>executable program.

110
00:05:34.199 --> 00:05:36.800
<v Speaker 1>Wow, that's quite a journey breaking it down like that.

111
00:05:37.120 --> 00:05:39.079
<v Speaker 1>What's the big win for compiler builders?

112
00:05:39.319 --> 00:05:43.360
<v Speaker 2>The modularity is huge. The biggest win is probably reusability

113
00:05:43.399 --> 00:05:46.399
<v Speaker 2>thanks to that intermediate representation. If you have a well

114
00:05:46.399 --> 00:05:49.240
<v Speaker 2>designed IR, it acts like a universal adapter. You can

115
00:05:49.279 --> 00:05:52.120
<v Speaker 2>write one front end for Java and plug it into

116
00:05:52.160 --> 00:05:56.199
<v Speaker 2>different back ends for Intel Era, whatever, or write one

117
00:05:56.240 --> 00:05:58.839
<v Speaker 2>back end for Intel and plug in front ends for

118
00:05:58.959 --> 00:06:02.560
<v Speaker 2>Java C plus plus Python. The source material gives a

119
00:06:02.600 --> 00:06:05.279
<v Speaker 2>great example. Once you have a front end for Java

120
00:06:05.319 --> 00:06:07.439
<v Speaker 2>and a back end for the Intel Core duo, you

121
00:06:07.519 --> 00:06:09.360
<v Speaker 2>only need to write a new C front end to

122
00:06:09.360 --> 00:06:12.399
<v Speaker 2>get a C compiler at a Spark back end, and

123
00:06:12.439 --> 00:06:14.680
<v Speaker 2>you get two more compilers, four for the price of two.

124
00:06:15.040 --> 00:06:16.199
<v Speaker 2>It's incredibly efficient.

125
00:06:16.279 --> 00:06:19.519
<v Speaker 1>That makes a lot of sense. Now, Java specifically, it

126
00:06:19.560 --> 00:06:21.399
<v Speaker 1>has a bit of a unique approach, doesn't it. It's

127
00:06:21.439 --> 00:06:24.160
<v Speaker 1>not quite that direct to native code path we just described.

128
00:06:24.240 --> 00:06:27.199
<v Speaker 2>That's absolutely right. Java takes a fascinating detour. When you

129
00:06:27.240 --> 00:06:30.720
<v Speaker 2>compile Java source code using Javac dot Burke, you don't

130
00:06:30.720 --> 00:06:33.600
<v Speaker 2>get native machine code. You get a dot class file

131
00:06:33.639 --> 00:06:37.360
<v Speaker 2>which contains bytecode. This bitcode isn't for your actual CPU,

132
00:06:37.519 --> 00:06:40.920
<v Speaker 2>it's for the Java Virtual Machine, the JVM. The JVM

133
00:06:40.959 --> 00:06:43.000
<v Speaker 2>itself starts out as an interpreter.

134
00:06:42.600 --> 00:06:44.879
<v Speaker 1>For this bike code an interpreter. But I thought we

135
00:06:44.920 --> 00:06:46.920
<v Speaker 1>said compilation was faster. Ah.

136
00:06:46.920 --> 00:06:50.120
<v Speaker 2>But here's the clever bit. The JVM uses just in

137
00:06:50.199 --> 00:06:54.680
<v Speaker 2>time JIT compilation and techniques like hotspot compilation. As your

138
00:06:54.759 --> 00:06:58.439
<v Speaker 2>Java program runs, the JVM watches itself. It identifies the

139
00:06:58.480 --> 00:07:01.920
<v Speaker 2>hotspots methods or loops that are executed frequently, and then

140
00:07:02.120 --> 00:07:06.040
<v Speaker 2>on the fly during execution, it compiles these specific hotspots

141
00:07:06.079 --> 00:07:08.560
<v Speaker 2>down to native machine code for the actual hardware you're

142
00:07:08.639 --> 00:07:09.000
<v Speaker 2>running on.

143
00:07:09.160 --> 00:07:11.240
<v Speaker 1>WHOA, So it compiles while the program.

144
00:07:11.040 --> 00:07:14.040
<v Speaker 2>Is running exactly it caches this native code. The next

145
00:07:14.040 --> 00:07:16.399
<v Speaker 2>time that hotspot is hit, it runs the super fast

146
00:07:16.439 --> 00:07:19.480
<v Speaker 2>native version. It's dynamic, and often this can lead to

147
00:07:19.519 --> 00:07:23.160
<v Speaker 2>performance that rivals or even beats statically compiled code in

148
00:07:23.240 --> 00:07:27.600
<v Speaker 2>some benchmarks. Because the JIT compiler has runtime information, so

149
00:07:27.759 --> 00:07:31.759
<v Speaker 2>JVM bytecode essentially acts as another layer of intermediate representation.

150
00:07:32.040 --> 00:07:35.519
<v Speaker 1>That's incredibly dynamic. How do the big players, the celebrity

151
00:07:35.519 --> 00:07:38.639
<v Speaker 1>compilers like Oracles, manage that complexity.

152
00:07:38.879 --> 00:07:42.399
<v Speaker 2>Well, let's look at Oracle's Java hotspot compiler. It's kind

153
00:07:42.399 --> 00:07:45.519
<v Speaker 2>of the standard. It actually has two main JT compilers,

154
00:07:46.120 --> 00:07:48.680
<v Speaker 2>a client one for fast startup and a server one

155
00:07:48.720 --> 00:07:52.439
<v Speaker 2>for heavy optimization. The server compiler does amazing things like

156
00:07:52.560 --> 00:07:56.720
<v Speaker 2>using advanced IR sophisticated register allocation, But one of the

157
00:07:56.720 --> 00:08:01.160
<v Speaker 2>coolest things is dynamic de optimization optimization. It undoes its

158
00:08:01.160 --> 00:08:04.839
<v Speaker 2>own work precisely. Hotspot can be really aggressive with optimizations,

159
00:08:04.879 --> 00:08:08.360
<v Speaker 2>like inlining methods. It makes assumptions, but what if later

160
00:08:08.519 --> 00:08:11.439
<v Speaker 2>the program loads a new class that invalidates an assumption,

161
00:08:11.879 --> 00:08:15.360
<v Speaker 2>maybe overrides a method it inlined. Hotspot can detect this,

162
00:08:15.680 --> 00:08:18.519
<v Speaker 2>throw away the now incorrect native code for that section,

163
00:08:18.680 --> 00:08:21.360
<v Speaker 2>and fall back to interpreting it. Then it might reoptimize

164
00:08:21.439 --> 00:08:22.439
<v Speaker 2>later with the new information.

165
00:08:22.639 --> 00:08:25.000
<v Speaker 1>That's wild. It lets it be optimistic because it has

166
00:08:25.000 --> 00:08:26.800
<v Speaker 1>an escape patch exactly.

167
00:08:26.959 --> 00:08:32.399
<v Speaker 2>It can be aggressively optimistic. Another trick is onstack replacement OSR.

168
00:08:33.279 --> 00:08:36.080
<v Speaker 2>Imagine a method is stuck in a long loop running

169
00:08:36.080 --> 00:08:39.840
<v Speaker 2>in interpreted mode. Hotspot can compile an optimized version of

170
00:08:39.840 --> 00:08:43.120
<v Speaker 2>that loop on the side and then incredibly swap the

171
00:08:43.159 --> 00:08:46.120
<v Speaker 2>currently executing interpreted cun and its state with the new

172
00:08:46.200 --> 00:08:49.919
<v Speaker 2>compiled version right in the middle of the loop. Execution continues,

173
00:08:50.159 --> 00:08:51.039
<v Speaker 2>but now it's fast.

174
00:08:51.279 --> 00:08:54.240
<v Speaker 1>Mind boggling. What about other compilers like the one used

175
00:08:54.279 --> 00:08:55.320
<v Speaker 1>in Eclipse.

176
00:08:54.960 --> 00:08:59.000
<v Speaker 2>AH the Eclipse compiler for Java BCJ it's claim to

177
00:08:59.039 --> 00:09:02.240
<v Speaker 2>fame as being an in compiler. After you build your

178
00:09:02.279 --> 00:09:05.159
<v Speaker 2>project once, if you change just one file, ECJ is

179
00:09:05.159 --> 00:09:07.720
<v Speaker 2>smart enough to only recompile that file and any others

180
00:09:07.720 --> 00:09:08.799
<v Speaker 2>that directly depend on it.

181
00:09:08.840 --> 00:09:10.919
<v Speaker 1>Well, that must be way faster for big projects.

182
00:09:11.039 --> 00:09:14.279
<v Speaker 2>Usually faster developers love it. And here's a really surprising

183
00:09:14.320 --> 00:09:16.919
<v Speaker 2>thing the source mentions about ECJ. It actually lets you

184
00:09:17.000 --> 00:09:19.639
<v Speaker 2>run an even debug code that contains errors.

185
00:09:19.679 --> 00:09:21.080
<v Speaker 1>Wait, what how can that work?

186
00:09:21.240 --> 00:09:23.159
<v Speaker 2>If you have an error and a method, but that

187
00:09:23.200 --> 00:09:25.919
<v Speaker 2>method is never actually called during your test run, the

188
00:09:25.960 --> 00:09:29.279
<v Speaker 2>program runs fine. The error only matters if you actually

189
00:09:29.320 --> 00:09:32.320
<v Speaker 2>try to execute that specific broken part. If you do,

190
00:09:32.519 --> 00:09:35.440
<v Speaker 2>you get an exception. But otherwise it works great for

191
00:09:35.519 --> 00:09:36.159
<v Speaker 2>rapid development.

192
00:09:36.279 --> 00:09:40.480
<v Speaker 1>That's surprisingly pragmatic. So we have dynamic incremental. Any other

193
00:09:40.519 --> 00:09:41.919
<v Speaker 1>major approaches.

194
00:09:41.799 --> 00:09:45.039
<v Speaker 2>Well, there's the g and U Java compiler GCJ. Its

195
00:09:45.039 --> 00:09:48.519
<v Speaker 2>focus was different, more on static compilation. It aim to

196
00:09:48.519 --> 00:09:51.320
<v Speaker 2>compile Java directly to native machine code ahead of time,

197
00:09:51.360 --> 00:09:55.000
<v Speaker 2>like a traditional plus compiler. This gives you fast startup

198
00:09:55.039 --> 00:09:57.519
<v Speaker 2>and potentially lower memory use, which can be good for

199
00:09:57.559 --> 00:10:01.000
<v Speaker 2>embedded systems, but you lose some of the dynamic advantages.

200
00:10:01.600 --> 00:10:05.320
<v Speaker 1>And we should mention Microsoft's cshard compiler for dot Net too.

201
00:10:05.639 --> 00:10:08.279
<v Speaker 1>It's similar to Java in concept to compile c chap

202
00:10:08.360 --> 00:10:12.600
<v Speaker 1>into Common Intermediate Language CIL, which is like JVM bytecode.

203
00:10:12.919 --> 00:10:16.480
<v Speaker 1>The CIL is then run by the common language runtime CLR,

204
00:10:16.759 --> 00:10:18.799
<v Speaker 1>which uses JT compilation.

205
00:10:18.600 --> 00:10:22.320
<v Speaker 2>So a similar virtual machine and JIT approach very similar.

206
00:10:22.360 --> 00:10:25.159
<v Speaker 1>Dot Net has its own bytcode format CIL and its

207
00:10:25.159 --> 00:10:28.279
<v Speaker 1>own virtual machine CLR doing the JT magic. This is

208
00:10:28.320 --> 00:10:31.639
<v Speaker 1>all fascinatingly complex, but the source talks about a hands

209
00:10:31.639 --> 00:10:34.440
<v Speaker 1>on way to learn this using J How does that

210
00:10:34.480 --> 00:10:35.399
<v Speaker 1>make it more concrete?

211
00:10:35.639 --> 00:10:38.559
<v Speaker 2>J is brilliant for teaching. It's a subset of Java,

212
00:10:38.600 --> 00:10:42.360
<v Speaker 2>maybe half the syntax, but still a proper object oriented language.

213
00:10:42.480 --> 00:10:45.159
<v Speaker 2>The compiler for it describing the book is over thirty

214
00:10:45.200 --> 00:10:49.120
<v Speaker 2>thousand lines. So by working with JJ, adding features, fixing bugs,

215
00:10:49.399 --> 00:10:52.919
<v Speaker 2>you're not just learning compiler theory, you're practicing real software engineering,

216
00:10:53.279 --> 00:10:56.080
<v Speaker 2>and honestly, it makes you a better Java programmer too.

217
00:10:56.200 --> 00:10:58.679
<v Speaker 1>Okay, let's make it practical. Suppose we wanted to add

218
00:10:58.679 --> 00:11:02.440
<v Speaker 1>a simple division operator a t J. How would that

219
00:11:02.519 --> 00:11:04.320
<v Speaker 1>touch the different phases we talked about?

220
00:11:04.519 --> 00:11:08.679
<v Speaker 2>Great example. Okay, first, lexical analysis. The scanner needs to

221
00:11:08.720 --> 00:11:11.159
<v Speaker 2>recognize as a new token let's call it div, so

222
00:11:11.200 --> 00:11:13.320
<v Speaker 2>when it sees in the source code, it outputs DV

223
00:11:13.480 --> 00:11:15.159
<v Speaker 2>instead of say an error.

224
00:11:15.200 --> 00:11:16.840
<v Speaker 1>Step one, recognize the symbol.

225
00:11:16.919 --> 00:11:20.519
<v Speaker 2>Step two syntax analysis. The parser needs a new rule.

226
00:11:20.759 --> 00:11:23.279
<v Speaker 2>We'd define a new node for our abstracts and tax tree,

227
00:11:23.320 --> 00:11:26.000
<v Speaker 2>maybe JADA write up. The parser's grammar rules would be

228
00:11:26.080 --> 00:11:29.840
<v Speaker 2>updated to note that an expression can be expression div expression,

229
00:11:30.159 --> 00:11:32.799
<v Speaker 2>and it would build that jadavide up node in the AST.

230
00:11:33.240 --> 00:11:36.039
<v Speaker 1>So the tree now represents division structurally exactly.

231
00:11:36.240 --> 00:11:40.200
<v Speaker 2>Step three semantic analysis. This is type checking. We'd add

232
00:11:40.279 --> 00:11:43.879
<v Speaker 2>rules here. J probably only allows in division, so the

233
00:11:43.960 --> 00:11:47.240
<v Speaker 2>analyzer checks are the left and right operands of jadavide

234
00:11:47.279 --> 00:11:51.080
<v Speaker 2>up both integers if not throw a type error. It

235
00:11:51.120 --> 00:11:55.000
<v Speaker 2>also handles story delocation, but for a simple operator typechecking

236
00:11:55.080 --> 00:11:56.080
<v Speaker 2>is key makes sense.

237
00:11:56.440 --> 00:11:59.879
<v Speaker 1>Can't divide an integer by a Hello world precisely.

238
00:12:00.600 --> 00:12:06.159
<v Speaker 2>Finally, Step four code generation. Assuming it past semantic analysis,

239
00:12:06.159 --> 00:12:09.519
<v Speaker 2>we generate the JVM bytecode. Would use the provided tools

240
00:12:09.639 --> 00:12:12.519
<v Speaker 2>like slimmett and the books example to output the right

241
00:12:12.600 --> 00:12:16.080
<v Speaker 2>JVM instructions, probably something like loading the two operands onto

242
00:12:16.120 --> 00:12:18.759
<v Speaker 2>the JVM stack, then issuing the ID of instruction for

243
00:12:18.799 --> 00:12:19.559
<v Speaker 2>integer division.

244
00:12:19.679 --> 00:12:22.799
<v Speaker 1>Wow. Okay, so even adding one operator forces you to

245
00:12:22.840 --> 00:12:26.120
<v Speaker 1>modify the scanner, the parser, the semantic analyzer, and.

246
00:12:26.080 --> 00:12:28.919
<v Speaker 2>The code generator every single phase. It really illustrates how

247
00:12:28.919 --> 00:12:31.960
<v Speaker 2>interconnected it all is. And the j approach emphasizes writing

248
00:12:32.000 --> 00:12:35.080
<v Speaker 2>tests first. Using extreme programming, you'd write tests that should

249
00:12:35.080 --> 00:12:37.440
<v Speaker 2>compile with division and tests that should fail with specific

250
00:12:37.559 --> 00:12:39.480
<v Speaker 2>errors before you even implement the operator.

251
00:12:39.600 --> 00:12:41.639
<v Speaker 1>Test driven compiler development makes sense.

252
00:12:41.679 --> 00:12:43.000
<v Speaker 2>It ensures robustness.

253
00:12:43.279 --> 00:12:45.919
<v Speaker 1>So after all that, we might have JVM bytcode, but

254
00:12:46.000 --> 00:12:49.240
<v Speaker 1>sometimes you need ROSS by native code. The source even

255
00:12:49.240 --> 00:12:52.679
<v Speaker 1>talks about a JVM to my pace translator. What's that stage.

256
00:12:52.399 --> 00:12:54.840
<v Speaker 2>About, right, that's pushing it all the way to the hardware.

257
00:12:55.519 --> 00:12:58.720
<v Speaker 2>In that scenario, the JVM bytcode we just generated that

258
00:12:58.799 --> 00:13:01.840
<v Speaker 2>becomes a new sourceling which or rather a new IR.

259
00:13:02.879 --> 00:13:06.879
<v Speaker 2>Thead JVM compiler acts as the front end, and this

260
00:13:07.000 --> 00:13:11.000
<v Speaker 2>JVM to SPIM translator SPIM simulates a mi IPS is

261
00:13:11.039 --> 00:13:14.240
<v Speaker 2>the new back end. Its compilation layered on compilation.

262
00:13:14.000 --> 00:13:15.399
<v Speaker 1>Buying the BI code itself.

263
00:13:15.120 --> 00:13:18.559
<v Speaker 2>Exactly, and this back end needs to understand the target

264
00:13:18.600 --> 00:13:21.720
<v Speaker 2>machine mi IPS in this case its memory layout, where

265
00:13:21.720 --> 00:13:24.320
<v Speaker 2>code goes, where static data goes, the heap, the stack,

266
00:13:24.639 --> 00:13:27.279
<v Speaker 2>and its registers. Those thirty two on ips registers, knowing

267
00:13:27.279 --> 00:13:30.679
<v Speaker 2>which ones are for arguments mount fewer dollars, return values

268
00:13:30.759 --> 00:13:33.200
<v Speaker 2>V dollar V dollars, the return address, which ones the

269
00:13:33.200 --> 00:13:35.440
<v Speaker 2>called function must save Salmi's dollars, and which ones the

270
00:13:35.440 --> 00:13:38.879
<v Speaker 2>caller must save if needed. What's conventions to follow absolutely

271
00:13:39.279 --> 00:13:42.919
<v Speaker 2>and this back end often uses its own intermediate representations

272
00:13:42.960 --> 00:13:46.840
<v Speaker 2>to manage the complexity. It might first convert JVM bycode

273
00:13:46.879 --> 00:13:50.679
<v Speaker 2>into a high level ir HIR. This often uses static

274
00:13:50.759 --> 00:13:55.159
<v Speaker 2>single assignment SSA form. It's a form where every variable

275
00:13:55.200 --> 00:13:58.440
<v Speaker 2>is assigned of value exactly once. If a variable could

276
00:13:58.440 --> 00:14:01.240
<v Speaker 2>get its value from different places, like after an IFLSE,

277
00:14:01.559 --> 00:14:04.720
<v Speaker 2>special five functions are introduced to represent that merge. It

278
00:14:04.759 --> 00:14:08.879
<v Speaker 2>makes many optimizations easier to implement. Then this HIR is

279
00:14:08.919 --> 00:14:12.159
<v Speaker 2>lowered to a low level ir LR, the five functions

280
00:14:12.159 --> 00:14:15.559
<v Speaker 2>are gone, replaced by actual move instructions. And crucially, the

281
00:14:15.720 --> 00:14:19.279
<v Speaker 2>LAR often uses an unlimited number of virtual registers. It's

282
00:14:19.360 --> 00:14:21.679
<v Speaker 2>much closer to the final machine code, but without worrying

283
00:14:21.679 --> 00:14:23.159
<v Speaker 2>about the limited physical registers.

284
00:14:23.200 --> 00:14:27.000
<v Speaker 1>Just yet, unlimited virtual registers. But the chip only has say,

285
00:14:27.120 --> 00:14:29.279
<v Speaker 1>thirty two physical ones, how does they get resolved?

286
00:14:29.559 --> 00:14:32.639
<v Speaker 2>That's the magic of register allocation. It's one of the

287
00:14:32.639 --> 00:14:35.960
<v Speaker 2>most critical and complex parts of a modern back end.

288
00:14:36.600 --> 00:14:39.720
<v Speaker 2>The goal is to map those unlimited virtual registers onto

289
00:14:39.720 --> 00:14:43.799
<v Speaker 2>the scarce physical registers. When you have more virtual registers

290
00:14:43.879 --> 00:14:47.679
<v Speaker 2>live holding values still needed than physical registers available, you

291
00:14:47.759 --> 00:14:50.559
<v Speaker 2>have to spill some virtual registers. That means storing their

292
00:14:50.600 --> 00:14:53.360
<v Speaker 2>values out to memory, usually the stack, and reloading them

293
00:14:53.399 --> 00:14:54.240
<v Speaker 2>later when needed.

294
00:14:54.519 --> 00:14:55.960
<v Speaker 1>Ah, and spilling is slow.

295
00:14:56.200 --> 00:14:59.679
<v Speaker 2>Spilling is slow, so a good register allocator tries really

296
00:14:59.720 --> 00:15:02.720
<v Speaker 2>hard to minimize spills. Techniques like linear scan or graph

297
00:15:02.759 --> 00:15:05.759
<v Speaker 2>coloring are used. It's a huge area of compiler research.

298
00:15:05.879 --> 00:15:08.440
<v Speaker 1>Okay, and you mentioned optimizations happen in the middle end,

299
00:15:08.519 --> 00:15:11.679
<v Speaker 1>but surely there are more optimizations happening here in the

300
00:15:11.720 --> 00:15:12.320
<v Speaker 1>back end too.

301
00:15:12.360 --> 00:15:15.799
<v Speaker 2>Oh. Definitely, optimizations happen at multiple stages, often on these

302
00:15:15.840 --> 00:15:21.279
<v Speaker 2>intermediate representations like HIR things like inlining replacing a function

303
00:15:21.399 --> 00:15:24.840
<v Speaker 2>call with the function's actual code to avoid call return overhead.

304
00:15:25.000 --> 00:15:28.960
<v Speaker 2>Very common. Constant folding and propagation. If you have xplicus

305
00:15:29.000 --> 00:15:32.200
<v Speaker 2>three y quills x plus four, the compiler can figure

306
00:15:32.200 --> 00:15:34.360
<v Speaker 2>out why is always seven and just use seven.

307
00:15:34.480 --> 00:15:35.960
<v Speaker 1>Simple but effective.

308
00:15:36.000 --> 00:15:41.600
<v Speaker 2>Very common sub expression elimination CSE, finding identical calculations and

309
00:15:41.639 --> 00:15:45.200
<v Speaker 2>doing them only once, storing the result. Think about calculating

310
00:15:45.200 --> 00:15:49.360
<v Speaker 2>an array elements address inside nested loops. CSE can save

311
00:15:49.399 --> 00:15:52.799
<v Speaker 2>a lot of redundant math lifting loop invariant code. We

312
00:15:52.840 --> 00:15:55.600
<v Speaker 2>mentioned this before. If a calculation inside a loop doesn't

313
00:15:55.639 --> 00:15:58.360
<v Speaker 2>depend on the loop iteration, pull it out, compute at

314
00:15:58.399 --> 00:16:02.519
<v Speaker 2>once before the loop starts. Array bounds check elimination Java,

315
00:16:02.559 --> 00:16:05.360
<v Speaker 2>for instance, checks if your array index is valid. But

316
00:16:05.399 --> 00:16:07.559
<v Speaker 2>if the compiler can prove the index will always be

317
00:16:07.600 --> 00:16:11.200
<v Speaker 2>in bounds, maybe for iokal to array dot linksh one,

318
00:16:11.480 --> 00:16:14.360
<v Speaker 2>it can remove that check entirely, saving potentially millions of

319
00:16:14.399 --> 00:16:15.240
<v Speaker 2>checks in a big loop.

320
00:16:15.399 --> 00:16:17.399
<v Speaker 1>That sounds like it can make a massive difference.

321
00:16:17.519 --> 00:16:20.279
<v Speaker 2>It really does. The source quote's an expert Kathleen Nubbs

322
00:16:20.279 --> 00:16:22.720
<v Speaker 2>saying it's often best to just try all the optimizations,

323
00:16:22.759 --> 00:16:25.840
<v Speaker 2>even imperfectly. The combined effect is usually huge.

324
00:16:25.960 --> 00:16:32.320
<v Speaker 1>Wow. What a journey from typing code through tokens, trees bycode, irs, optimizations,

325
00:16:32.360 --> 00:16:36.759
<v Speaker 1>register allocation, finally down to the bare metal instructions. It's incredible,

326
00:16:36.919 --> 00:16:37.519
<v Speaker 1>it really is.

327
00:16:38.320 --> 00:16:41.000
<v Speaker 2>This deep dive shows that behind every app you use,

328
00:16:41.080 --> 00:16:45.720
<v Speaker 2>every website, there's this incredibly intricate dance of translation analysis

329
00:16:45.720 --> 00:16:51.600
<v Speaker 2>and refinement going on a compiler isn't just translating. It's optimizing, checking, restructuring,

330
00:16:52.440 --> 00:16:56.039
<v Speaker 2>constantly working to bridge the gap between how we think

331
00:16:56.159 --> 00:16:57.360
<v Speaker 2>and how the machine works.

332
00:16:57.559 --> 00:16:59.320
<v Speaker 1>And it definitely drives home that even if you don't

333
00:16:59.320 --> 00:17:03.639
<v Speaker 1>build compile, just knowing this process exists, understanding the steps

334
00:17:04.000 --> 00:17:06.920
<v Speaker 1>makes you much more informed programmer. Right, you appreciate what's

335
00:17:06.920 --> 00:17:07.880
<v Speaker 1>happening under the hood.

336
00:17:07.920 --> 00:17:11.519
<v Speaker 2>Absolutely, you understand why some code might be faster than

337
00:17:11.519 --> 00:17:15.359
<v Speaker 2>other code, or why certain language features might have performance implications.

338
00:17:15.759 --> 00:17:18.759
<v Speaker 2>It demystifies things, and maybe a final thought to leave with,

339
00:17:19.559 --> 00:17:23.799
<v Speaker 2>think about how these compiler techniques keep evolving JIT hotspot,

340
00:17:23.920 --> 00:17:28.599
<v Speaker 2>dynamic deoptimization, incremental builds. It reflects this constant push and

341
00:17:28.640 --> 00:17:32.240
<v Speaker 2>pull in computing between rows, speed, the flexibility programmers need,

342
00:17:32.359 --> 00:17:35.559
<v Speaker 2>and managing the sheer complexity of modern software and hardware.

343
00:17:36.200 --> 00:17:38.279
<v Speaker 2>What's the next piece of compiler magic going to.

344
00:17:38.200 --> 00:17:41.240
<v Speaker 1>Be a fascinating question? Indeed, well, thank you for joining

345
00:17:41.319 --> 00:17:43.680
<v Speaker 1>us on this really deep dive into the compiler's world.

346
00:17:44.000 --> 00:17:48.240
<v Speaker 1>We hope you listening start seeing these princibles everywhere in

347
00:17:48.279 --> 00:17:49.400
<v Speaker 1>the technology you use.
