WEBVTT 1 00:00:00.080 --> 00:00:02.040 So you've probably done this right, type to line of code, 2 00:00:02.120 --> 00:00:05.160 hit compile dot, and then you know, just watch your 3 00:00:05.200 --> 00:00:07.440 program run. It feels kind of like magic, doesn't it 4 00:00:07.480 --> 00:00:10.279 one moment? It's if by five the next poof the 5 00:00:10.279 --> 00:00:13.359 computer's doing it. Today, we're really going to pull back 6 00:00:13.400 --> 00:00:16.359 the curtain on that magic. We're doing a deep dive 7 00:00:16.519 --> 00:00:20.079 into compiler construction, and we're using Bill Campbell's introduction to 8 00:00:20.120 --> 00:00:23.839 compiler construction in a Java world as our map. 9 00:00:24.359 --> 00:00:26.640 Our mission really is, right, it's a translation, but a 10 00:00:26.760 --> 00:00:32.840 very sophisticated one. Think of it like translating a complex novel, 11 00:00:32.920 --> 00:00:36.600 your high level language into a super precise, step by 12 00:00:36.600 --> 00:00:39.359 step instruction manual for a worker, which is the computer 13 00:00:40.000 --> 00:00:42.600 and the absolute key. The non negotiable rule is it 14 00:00:42.640 --> 00:00:45.560 has to be semantics preserves scientic's preserving right, meaning the 15 00:00:45.600 --> 00:00:49.679 translated program it must behave exactly like the original, every detail, 16 00:00:49.799 --> 00:00:52.200 every calculation perfectly mirrors. 17 00:00:52.320 --> 00:00:55.679 Okay, got it, But then why compile it all? Why 18 00:00:55.719 --> 00:00:58.960 not just interpret everything directly? Seems simpler. 19 00:00:59.000 --> 00:01:02.119 That's a really good question. There are two main reasons. Usually, 20 00:01:02.520 --> 00:01:06.560 first performance, native machine code just runs way way faster, 21 00:01:06.840 --> 00:01:09.879 and interpreter it has to parse and analyze every statement 22 00:01:09.920 --> 00:01:11.040 every single time it runs. 23 00:01:11.280 --> 00:01:13.599 Okay, so compiling does that work up front. 24 00:01:13.439 --> 00:01:16.400 Exactly once, then it just runs. The second reason, well, 25 00:01:16.439 --> 00:01:20.159 it's intellectual property or secrecy. Compiled machine code is much 26 00:01:20.200 --> 00:01:22.920 harder to figure out to reverse engineer back into readable 27 00:01:22.959 --> 00:01:24.840 source code. It offers protection. 28 00:01:25.200 --> 00:01:25.840 Right, makes sense. 29 00:01:25.959 --> 00:01:28.719 Now. Interpretation definitely still has its uses, you know, for 30 00:01:28.799 --> 00:01:31.359 simple stuff scripts you don't run often, like Unix shell 31 00:01:31.400 --> 00:01:35.200 scripts maybe, or some dynamic languages like Lisp where you 32 00:01:35.239 --> 00:01:39.519 need names available at runtime. Compilation overhead isn't always worth it. 33 00:01:39.840 --> 00:01:42.359 So when we talk about that native machine code, we're 34 00:01:42.400 --> 00:01:45.319 really getting down to the metal, right, Things like memps 35 00:01:45.439 --> 00:01:48.599 or Intel I three eighty six. Are they very different? 36 00:01:48.799 --> 00:01:52.760 Yeah, they represent different philosophies. You have CISC complex instruction 37 00:01:52.799 --> 00:01:56.040 set computers like Intel's I three eighty six and RISC 38 00:01:56.480 --> 00:02:00.680 reduced instruction set computers like MIMPRSS. He tries to do 39 00:02:00.719 --> 00:02:04.519 more in one instruction. RIS uses simplier steps. But the 40 00:02:04.599 --> 00:02:07.879 key thing for the compiler is understanding these instructions and 41 00:02:07.920 --> 00:02:11.639 how to use the CPU's super fast internal memory spots. 42 00:02:11.639 --> 00:02:14.120 The register registers. Okay, those are limited. 43 00:02:13.800 --> 00:02:17.319 Right, very limited? Maybe thirty two in a typical RSC machine, 44 00:02:17.319 --> 00:02:20.439 even fewer, like eight general purpose ones in older CISE, 45 00:02:20.680 --> 00:02:23.159 and they're way faster than main memory. So a huge 46 00:02:23.240 --> 00:02:25.800 job for the compiler is figuring out how to keep 47 00:02:25.840 --> 00:02:29.080 your variables and intermediate results right there in those fast 48 00:02:29.120 --> 00:02:30.759 registers as much as possible. 49 00:02:31.000 --> 00:02:35.800 So it's mapping names, generating the code sequence, and catching errors. 50 00:02:35.560 --> 00:02:40.360 Absolutely, mapping names to memory or registers, generating that linear 51 00:02:40.400 --> 00:02:43.639 machine code, and finding errors early on. Those are the 52 00:02:43.639 --> 00:02:44.280 core tasks. 53 00:02:44.360 --> 00:02:46.400 It sounds like it's not just one big program doing 54 00:02:46.479 --> 00:02:48.879 all this, more like an assembly line. 55 00:02:48.960 --> 00:02:52.039 That's a great analogy. Yes, compilers are usually broken down 56 00:02:52.039 --> 00:02:54.680 into phases. Broadly, you've got a front end and a 57 00:02:54.680 --> 00:02:57.560 back end. The front end deals with the analysis. It 58 00:02:57.599 --> 00:03:00.719 takes your source code, understands its meaning, and this part 59 00:03:00.759 --> 00:03:03.159 is source language dependent. It cares if it's Java or 60 00:03:03.199 --> 00:03:06.240 C or whatever. Its final output is something called an 61 00:03:06.240 --> 00:03:10.400 intermediate representation or IR. Think of it as a generic 62 00:03:10.520 --> 00:03:11.479 version of your program. 63 00:03:11.560 --> 00:03:13.800 Okay, And what happens inside that front end? What are 64 00:03:13.800 --> 00:03:14.280 the steps? 65 00:03:14.680 --> 00:03:18.680 Several subphases. First up is the scanner or lexical analyzer. 66 00:03:18.719 --> 00:03:21.199 It's like breaking a sentence into words. It reads the 67 00:03:21.240 --> 00:03:23.680 stream of characters and spits out a stream of tokens 68 00:03:23.719 --> 00:03:28.879 identifiers like my var, numbers like forty two, keywords like if, operators, 69 00:03:29.039 --> 00:03:33.319 then comments ah good point. Comments are scanned, recognized, and 70 00:03:33.360 --> 00:03:36.680 then just thrown away. They mean nothing to the subsequent phases. 71 00:03:37.000 --> 00:03:38.639 Right. So, after tokens. 72 00:03:38.319 --> 00:03:41.879 Comes the parser or syntax analyzer. It takes the token 73 00:03:41.919 --> 00:03:46.159 stream and builds an abstract syntax tree the AST. This 74 00:03:46.280 --> 00:03:49.520 tree represents the grammatical structure of your program. It figures 75 00:03:49.520 --> 00:03:51.479 out how the tokens relate to each other based on 76 00:03:51.520 --> 00:03:52.879 the language's grammar. 77 00:03:52.599 --> 00:03:55.479 Rules, like diagramming a sentence exactly like that. 78 00:03:55.879 --> 00:03:59.719 Then the final analysis step is the semantic analyzer. This 79 00:03:59.800 --> 00:04:02.080 is where type checking happens. It makes sure you're not 80 00:04:02.400 --> 00:04:04.960 trying to add a number to a customer object Bickley. 81 00:04:05.280 --> 00:04:08.039 It forces type rules. It also figures out where to 82 00:04:08.080 --> 00:04:11.199 store variables, maybe on the stack frame, and can even 83 00:04:11.319 --> 00:04:14.039 rewrite bits of the AST to make things clearer for 84 00:04:14.120 --> 00:04:14.879 later stages. 85 00:04:15.240 --> 00:04:18.920 Okay, so the front end understands the code's meaning and structure. 86 00:04:19.199 --> 00:04:21.680 What's next is it straight to machine code? 87 00:04:21.839 --> 00:04:24.639 Often there's a step in between, sometimes called the middle end. 88 00:04:24.720 --> 00:04:27.759 It's kind of optional but very common. Its whole job 89 00:04:27.839 --> 00:04:31.879 is optimization. It takes that intermediate representation, the IR, and 90 00:04:31.959 --> 00:04:35.120 tries to make it better, faster, smaller. How does it 91 00:04:35.199 --> 00:04:36.439 do that various ways. 92 00:04:36.480 --> 00:04:36.959 It might. 93 00:04:39.959 --> 00:04:42.519 Loops that don't change loop invariants and pull them out 94 00:04:42.519 --> 00:04:46.319 so they only run once, or replace an expensive operation 95 00:04:46.439 --> 00:04:50.240 like multiplication with a cheaper one like addition, if possible. 96 00:04:50.279 --> 00:04:52.399 It's called strength reduction. It polishes the. 97 00:04:52.360 --> 00:04:55.160 IR optimized IR, then comes the back end. 98 00:04:55.199 --> 00:04:57.920 Then comes the back end the synthesis phase. This takes 99 00:04:57.959 --> 00:05:01.879 the R and generates the action target machine code. And 100 00:05:02.000 --> 00:05:05.319 this part is target language dependent. It needs to know 101 00:05:05.360 --> 00:05:09.360 if it's generating code for Intel or AR or MIIPS. 102 00:05:09.519 --> 00:05:11.240 It doesn't have subphases. 103 00:05:10.680 --> 00:05:14.439 Too, it does. There's code generation itself, picking the actual 104 00:05:14.480 --> 00:05:18.680 machine instructions. Then often a peephole optimizer. This looks at 105 00:05:18.680 --> 00:05:22.120 a small window of instructions just generated and looks for 106 00:05:22.199 --> 00:05:25.399 obvious local inefficiencies, like maybe a jump that immediately jumps 107 00:05:25.399 --> 00:05:29.519 somewhere else, tiing up locally exactly. And finally the object phase, 108 00:05:29.560 --> 00:05:32.920 which handles linking different compiled pieces together into the final 109 00:05:32.920 --> 00:05:34.000 executable program. 110 00:05:34.199 --> 00:05:36.800 Wow, that's quite a journey breaking it down like that. 111 00:05:37.120 --> 00:05:39.079 What's the big win for compiler builders? 112 00:05:39.319 --> 00:05:43.360 The modularity is huge. The biggest win is probably reusability 113 00:05:43.399 --> 00:05:46.399 thanks to that intermediate representation. If you have a well 114 00:05:46.399 --> 00:05:49.240 designed IR, it acts like a universal adapter. You can 115 00:05:49.279 --> 00:05:52.120 write one front end for Java and plug it into 116 00:05:52.160 --> 00:05:56.199 different back ends for Intel Era, whatever, or write one 117 00:05:56.240 --> 00:05:58.839 back end for Intel and plug in front ends for 118 00:05:58.959 --> 00:06:02.560 Java C plus plus Python. The source material gives a 119 00:06:02.600 --> 00:06:05.279 great example. Once you have a front end for Java 120 00:06:05.319 --> 00:06:07.439 and a back end for the Intel Core duo, you 121 00:06:07.519 --> 00:06:09.360 only need to write a new C front end to 122 00:06:09.360 --> 00:06:12.399 get a C compiler at a Spark back end, and 123 00:06:12.439 --> 00:06:14.680 you get two more compilers, four for the price of two. 124 00:06:15.040 --> 00:06:16.199 It's incredibly efficient. 125 00:06:16.279 --> 00:06:19.519 That makes a lot of sense. Now, Java specifically, it 126 00:06:19.560 --> 00:06:21.399 has a bit of a unique approach, doesn't it. It's 127 00:06:21.439 --> 00:06:24.160 not quite that direct to native code path we just described. 128 00:06:24.240 --> 00:06:27.199 That's absolutely right. Java takes a fascinating detour. When you 129 00:06:27.240 --> 00:06:30.720 compile Java source code using Javac dot Burke, you don't 130 00:06:30.720 --> 00:06:33.600 get native machine code. You get a dot class file 131 00:06:33.639 --> 00:06:37.360 which contains bytecode. This bitcode isn't for your actual CPU, 132 00:06:37.519 --> 00:06:40.920 it's for the Java Virtual Machine, the JVM. The JVM 133 00:06:40.959 --> 00:06:43.000 itself starts out as an interpreter. 134 00:06:42.600 --> 00:06:44.879 For this bike code an interpreter. But I thought we 135 00:06:44.920 --> 00:06:46.920 said compilation was faster. Ah. 136 00:06:46.920 --> 00:06:50.120 But here's the clever bit. The JVM uses just in 137 00:06:50.199 --> 00:06:54.680 time JIT compilation and techniques like hotspot compilation. As your 138 00:06:54.759 --> 00:06:58.439 Java program runs, the JVM watches itself. It identifies the 139 00:06:58.480 --> 00:07:01.920 hotspots methods or loops that are executed frequently, and then 140 00:07:02.120 --> 00:07:06.040 on the fly during execution, it compiles these specific hotspots 141 00:07:06.079 --> 00:07:08.560 down to native machine code for the actual hardware you're 142 00:07:08.639 --> 00:07:09.000 running on. 143 00:07:09.160 --> 00:07:11.240 WHOA, So it compiles while the program. 144 00:07:11.040 --> 00:07:14.040 Is running exactly it caches this native code. The next 145 00:07:14.040 --> 00:07:16.399 time that hotspot is hit, it runs the super fast 146 00:07:16.439 --> 00:07:19.480 native version. It's dynamic, and often this can lead to 147 00:07:19.519 --> 00:07:23.160 performance that rivals or even beats statically compiled code in 148 00:07:23.240 --> 00:07:27.600 some benchmarks. Because the JIT compiler has runtime information, so 149 00:07:27.759 --> 00:07:31.759 JVM bytecode essentially acts as another layer of intermediate representation. 150 00:07:32.040 --> 00:07:35.519 That's incredibly dynamic. How do the big players, the celebrity 151 00:07:35.519 --> 00:07:38.639 compilers like Oracles, manage that complexity. 152 00:07:38.879 --> 00:07:42.399 Well, let's look at Oracle's Java hotspot compiler. It's kind 153 00:07:42.399 --> 00:07:45.519 of the standard. It actually has two main JT compilers, 154 00:07:46.120 --> 00:07:48.680 a client one for fast startup and a server one 155 00:07:48.720 --> 00:07:52.439 for heavy optimization. The server compiler does amazing things like 156 00:07:52.560 --> 00:07:56.720 using advanced IR sophisticated register allocation, But one of the 157 00:07:56.720 --> 00:08:01.160 coolest things is dynamic de optimization optimization. It undoes its 158 00:08:01.160 --> 00:08:04.839 own work precisely. Hotspot can be really aggressive with optimizations, 159 00:08:04.879 --> 00:08:08.360 like inlining methods. It makes assumptions, but what if later 160 00:08:08.519 --> 00:08:11.439 the program loads a new class that invalidates an assumption, 161 00:08:11.879 --> 00:08:15.360 maybe overrides a method it inlined. Hotspot can detect this, 162 00:08:15.680 --> 00:08:18.519 throw away the now incorrect native code for that section, 163 00:08:18.680 --> 00:08:21.360 and fall back to interpreting it. Then it might reoptimize 164 00:08:21.439 --> 00:08:22.439 later with the new information. 165 00:08:22.639 --> 00:08:25.000 That's wild. It lets it be optimistic because it has 166 00:08:25.000 --> 00:08:26.800 an escape patch exactly. 167 00:08:26.959 --> 00:08:32.399 It can be aggressively optimistic. Another trick is onstack replacement OSR. 168 00:08:33.279 --> 00:08:36.080 Imagine a method is stuck in a long loop running 169 00:08:36.080 --> 00:08:39.840 in interpreted mode. Hotspot can compile an optimized version of 170 00:08:39.840 --> 00:08:43.120 that loop on the side and then incredibly swap the 171 00:08:43.159 --> 00:08:46.120 currently executing interpreted cun and its state with the new 172 00:08:46.200 --> 00:08:49.919 compiled version right in the middle of the loop. Execution continues, 173 00:08:50.159 --> 00:08:51.039 but now it's fast. 174 00:08:51.279 --> 00:08:54.240 Mind boggling. What about other compilers like the one used 175 00:08:54.279 --> 00:08:55.320 in Eclipse. 176 00:08:54.960 --> 00:08:59.000 AH the Eclipse compiler for Java BCJ it's claim to 177 00:08:59.039 --> 00:09:02.240 fame as being an in compiler. After you build your 178 00:09:02.279 --> 00:09:05.159 project once, if you change just one file, ECJ is 179 00:09:05.159 --> 00:09:07.720 smart enough to only recompile that file and any others 180 00:09:07.720 --> 00:09:08.799 that directly depend on it. 181 00:09:08.840 --> 00:09:10.919 Well, that must be way faster for big projects. 182 00:09:11.039 --> 00:09:14.279 Usually faster developers love it. And here's a really surprising 183 00:09:14.320 --> 00:09:16.919 thing the source mentions about ECJ. It actually lets you 184 00:09:17.000 --> 00:09:19.639 run an even debug code that contains errors. 185 00:09:19.679 --> 00:09:21.080 Wait, what how can that work? 186 00:09:21.240 --> 00:09:23.159 If you have an error and a method, but that 187 00:09:23.200 --> 00:09:25.919 method is never actually called during your test run, the 188 00:09:25.960 --> 00:09:29.279 program runs fine. The error only matters if you actually 189 00:09:29.320 --> 00:09:32.320 try to execute that specific broken part. If you do, 190 00:09:32.519 --> 00:09:35.440 you get an exception. But otherwise it works great for 191 00:09:35.519 --> 00:09:36.159 rapid development. 192 00:09:36.279 --> 00:09:40.480 That's surprisingly pragmatic. So we have dynamic incremental. Any other 193 00:09:40.519 --> 00:09:41.919 major approaches. 194 00:09:41.799 --> 00:09:45.039 Well, there's the g and U Java compiler GCJ. Its 195 00:09:45.039 --> 00:09:48.519 focus was different, more on static compilation. It aim to 196 00:09:48.519 --> 00:09:51.320 compile Java directly to native machine code ahead of time, 197 00:09:51.360 --> 00:09:55.000 like a traditional plus compiler. This gives you fast startup 198 00:09:55.039 --> 00:09:57.519 and potentially lower memory use, which can be good for 199 00:09:57.559 --> 00:10:01.000 embedded systems, but you lose some of the dynamic advantages. 200 00:10:01.600 --> 00:10:05.320 And we should mention Microsoft's cshard compiler for dot Net too. 201 00:10:05.639 --> 00:10:08.279 It's similar to Java in concept to compile c chap 202 00:10:08.360 --> 00:10:12.600 into Common Intermediate Language CIL, which is like JVM bytecode. 203 00:10:12.919 --> 00:10:16.480 The CIL is then run by the common language runtime CLR, 204 00:10:16.759 --> 00:10:18.799 which uses JT compilation. 205 00:10:18.600 --> 00:10:22.320 So a similar virtual machine and JIT approach very similar. 206 00:10:22.360 --> 00:10:25.159 Dot Net has its own bytcode format CIL and its 207 00:10:25.159 --> 00:10:28.279 own virtual machine CLR doing the JT magic. This is 208 00:10:28.320 --> 00:10:31.639 all fascinatingly complex, but the source talks about a hands 209 00:10:31.639 --> 00:10:34.440 on way to learn this using J How does that 210 00:10:34.480 --> 00:10:35.399 make it more concrete? 211 00:10:35.639 --> 00:10:38.559 J is brilliant for teaching. It's a subset of Java, 212 00:10:38.600 --> 00:10:42.360 maybe half the syntax, but still a proper object oriented language. 213 00:10:42.480 --> 00:10:45.159 The compiler for it describing the book is over thirty 214 00:10:45.200 --> 00:10:49.120 thousand lines. So by working with JJ, adding features, fixing bugs, 215 00:10:49.399 --> 00:10:52.919 you're not just learning compiler theory, you're practicing real software engineering, 216 00:10:53.279 --> 00:10:56.080 and honestly, it makes you a better Java programmer too. 217 00:10:56.200 --> 00:10:58.679 Okay, let's make it practical. Suppose we wanted to add 218 00:10:58.679 --> 00:11:02.440 a simple division operator a t J. How would that 219 00:11:02.519 --> 00:11:04.320 touch the different phases we talked about? 220 00:11:04.519 --> 00:11:08.679 Great example. Okay, first, lexical analysis. The scanner needs to 221 00:11:08.720 --> 00:11:11.159 recognize as a new token let's call it div, so 222 00:11:11.200 --> 00:11:13.320 when it sees in the source code, it outputs DV 223 00:11:13.480 --> 00:11:15.159 instead of say an error. 224 00:11:15.200 --> 00:11:16.840 Step one, recognize the symbol. 225 00:11:16.919 --> 00:11:20.519 Step two syntax analysis. The parser needs a new rule. 226 00:11:20.759 --> 00:11:23.279 We'd define a new node for our abstracts and tax tree, 227 00:11:23.320 --> 00:11:26.000 maybe JADA write up. The parser's grammar rules would be 228 00:11:26.080 --> 00:11:29.840 updated to note that an expression can be expression div expression, 229 00:11:30.159 --> 00:11:32.799 and it would build that jadavide up node in the AST. 230 00:11:33.240 --> 00:11:36.039 So the tree now represents division structurally exactly. 231 00:11:36.240 --> 00:11:40.200 Step three semantic analysis. This is type checking. We'd add 232 00:11:40.279 --> 00:11:43.879 rules here. J probably only allows in division, so the 233 00:11:43.960 --> 00:11:47.240 analyzer checks are the left and right operands of jadavide 234 00:11:47.279 --> 00:11:51.080 up both integers if not throw a type error. It 235 00:11:51.120 --> 00:11:55.000 also handles story delocation, but for a simple operator typechecking 236 00:11:55.080 --> 00:11:56.080 is key makes sense. 237 00:11:56.440 --> 00:11:59.879 Can't divide an integer by a Hello world precisely. 238 00:12:00.600 --> 00:12:06.159 Finally, Step four code generation. Assuming it past semantic analysis, 239 00:12:06.159 --> 00:12:09.519 we generate the JVM bytecode. Would use the provided tools 240 00:12:09.639 --> 00:12:12.519 like slimmett and the books example to output the right 241 00:12:12.600 --> 00:12:16.080 JVM instructions, probably something like loading the two operands onto 242 00:12:16.120 --> 00:12:18.759 the JVM stack, then issuing the ID of instruction for 243 00:12:18.799 --> 00:12:19.559 integer division. 244 00:12:19.679 --> 00:12:22.799 Wow. Okay, so even adding one operator forces you to 245 00:12:22.840 --> 00:12:26.120 modify the scanner, the parser, the semantic analyzer, and. 246 00:12:26.080 --> 00:12:28.919 The code generator every single phase. It really illustrates how 247 00:12:28.919 --> 00:12:31.960 interconnected it all is. And the j approach emphasizes writing 248 00:12:32.000 --> 00:12:35.080 tests first. Using extreme programming, you'd write tests that should 249 00:12:35.080 --> 00:12:37.440 compile with division and tests that should fail with specific 250 00:12:37.559 --> 00:12:39.480 errors before you even implement the operator. 251 00:12:39.600 --> 00:12:41.639 Test driven compiler development makes sense. 252 00:12:41.679 --> 00:12:43.000 It ensures robustness. 253 00:12:43.279 --> 00:12:45.919 So after all that, we might have JVM bytcode, but 254 00:12:46.000 --> 00:12:49.240 sometimes you need ROSS by native code. The source even 255 00:12:49.240 --> 00:12:52.679 talks about a JVM to my pace translator. What's that stage. 256 00:12:52.399 --> 00:12:54.840 About, right, that's pushing it all the way to the hardware. 257 00:12:55.519 --> 00:12:58.720 In that scenario, the JVM bytcode we just generated that 258 00:12:58.799 --> 00:13:01.840 becomes a new sourceling which or rather a new IR. 259 00:13:02.879 --> 00:13:06.879 Thead JVM compiler acts as the front end, and this 260 00:13:07.000 --> 00:13:11.000 JVM to SPIM translator SPIM simulates a mi IPS is 261 00:13:11.039 --> 00:13:14.240 the new back end. Its compilation layered on compilation. 262 00:13:14.000 --> 00:13:15.399 Buying the BI code itself. 263 00:13:15.120 --> 00:13:18.559 Exactly, and this back end needs to understand the target 264 00:13:18.600 --> 00:13:21.720 machine mi IPS in this case its memory layout, where 265 00:13:21.720 --> 00:13:24.320 code goes, where static data goes, the heap, the stack, 266 00:13:24.639 --> 00:13:27.279 and its registers. Those thirty two on ips registers, knowing 267 00:13:27.279 --> 00:13:30.679 which ones are for arguments mount fewer dollars, return values 268 00:13:30.759 --> 00:13:33.200 V dollar V dollars, the return address, which ones the 269 00:13:33.200 --> 00:13:35.440 called function must save Salmi's dollars, and which ones the 270 00:13:35.440 --> 00:13:38.879 caller must save if needed. What's conventions to follow absolutely 271 00:13:39.279 --> 00:13:42.919 and this back end often uses its own intermediate representations 272 00:13:42.960 --> 00:13:46.840 to manage the complexity. It might first convert JVM bycode 273 00:13:46.879 --> 00:13:50.679 into a high level ir HIR. This often uses static 274 00:13:50.759 --> 00:13:55.159 single assignment SSA form. It's a form where every variable 275 00:13:55.200 --> 00:13:58.440 is assigned of value exactly once. If a variable could 276 00:13:58.440 --> 00:14:01.240 get its value from different places, like after an IFLSE, 277 00:14:01.559 --> 00:14:04.720 special five functions are introduced to represent that merge. It 278 00:14:04.759 --> 00:14:08.879 makes many optimizations easier to implement. Then this HIR is 279 00:14:08.919 --> 00:14:12.159 lowered to a low level ir LR, the five functions 280 00:14:12.159 --> 00:14:15.559 are gone, replaced by actual move instructions. And crucially, the 281 00:14:15.720 --> 00:14:19.279 LAR often uses an unlimited number of virtual registers. It's 282 00:14:19.360 --> 00:14:21.679 much closer to the final machine code, but without worrying 283 00:14:21.679 --> 00:14:23.159 about the limited physical registers. 284 00:14:23.200 --> 00:14:27.000 Just yet, unlimited virtual registers. But the chip only has say, 285 00:14:27.120 --> 00:14:29.279 thirty two physical ones, how does they get resolved? 286 00:14:29.559 --> 00:14:32.639 That's the magic of register allocation. It's one of the 287 00:14:32.639 --> 00:14:35.960 most critical and complex parts of a modern back end. 288 00:14:36.600 --> 00:14:39.720 The goal is to map those unlimited virtual registers onto 289 00:14:39.720 --> 00:14:43.799 the scarce physical registers. When you have more virtual registers 290 00:14:43.879 --> 00:14:47.679 live holding values still needed than physical registers available, you 291 00:14:47.759 --> 00:14:50.559 have to spill some virtual registers. That means storing their 292 00:14:50.600 --> 00:14:53.360 values out to memory, usually the stack, and reloading them 293 00:14:53.399 --> 00:14:54.240 later when needed. 294 00:14:54.519 --> 00:14:55.960 Ah, and spilling is slow. 295 00:14:56.200 --> 00:14:59.679 Spilling is slow, so a good register allocator tries really 296 00:14:59.720 --> 00:15:02.720 hard to minimize spills. Techniques like linear scan or graph 297 00:15:02.759 --> 00:15:05.759 coloring are used. It's a huge area of compiler research. 298 00:15:05.879 --> 00:15:08.440 Okay, and you mentioned optimizations happen in the middle end, 299 00:15:08.519 --> 00:15:11.679 but surely there are more optimizations happening here in the 300 00:15:11.720 --> 00:15:12.320 back end too. 301 00:15:12.360 --> 00:15:15.799 Oh. Definitely, optimizations happen at multiple stages, often on these 302 00:15:15.840 --> 00:15:21.279 intermediate representations like HIR things like inlining replacing a function 303 00:15:21.399 --> 00:15:24.840 call with the function's actual code to avoid call return overhead. 304 00:15:25.000 --> 00:15:28.960 Very common. Constant folding and propagation. If you have xplicus 305 00:15:29.000 --> 00:15:32.200 three y quills x plus four, the compiler can figure 306 00:15:32.200 --> 00:15:34.360 out why is always seven and just use seven. 307 00:15:34.480 --> 00:15:35.960 Simple but effective. 308 00:15:36.000 --> 00:15:41.600 Very common sub expression elimination CSE, finding identical calculations and 309 00:15:41.639 --> 00:15:45.200 doing them only once, storing the result. Think about calculating 310 00:15:45.200 --> 00:15:49.360 an array elements address inside nested loops. CSE can save 311 00:15:49.399 --> 00:15:52.799 a lot of redundant math lifting loop invariant code. We 312 00:15:52.840 --> 00:15:55.600 mentioned this before. If a calculation inside a loop doesn't 313 00:15:55.639 --> 00:15:58.360 depend on the loop iteration, pull it out, compute at 314 00:15:58.399 --> 00:16:02.519 once before the loop starts. Array bounds check elimination Java, 315 00:16:02.559 --> 00:16:05.360 for instance, checks if your array index is valid. But 316 00:16:05.399 --> 00:16:07.559 if the compiler can prove the index will always be 317 00:16:07.600 --> 00:16:11.200 in bounds, maybe for iokal to array dot linksh one, 318 00:16:11.480 --> 00:16:14.360 it can remove that check entirely, saving potentially millions of 319 00:16:14.399 --> 00:16:15.240 checks in a big loop. 320 00:16:15.399 --> 00:16:17.399 That sounds like it can make a massive difference. 321 00:16:17.519 --> 00:16:20.279 It really does. The source quote's an expert Kathleen Nubbs 322 00:16:20.279 --> 00:16:22.720 saying it's often best to just try all the optimizations, 323 00:16:22.759 --> 00:16:25.840 even imperfectly. The combined effect is usually huge. 324 00:16:25.960 --> 00:16:32.320 Wow. What a journey from typing code through tokens, trees bycode, irs, optimizations, 325 00:16:32.360 --> 00:16:36.759 register allocation, finally down to the bare metal instructions. It's incredible, 326 00:16:36.919 --> 00:16:37.519 it really is. 327 00:16:38.320 --> 00:16:41.000 This deep dive shows that behind every app you use, 328 00:16:41.080 --> 00:16:45.720 every website, there's this incredibly intricate dance of translation analysis 329 00:16:45.720 --> 00:16:51.600 and refinement going on a compiler isn't just translating. It's optimizing, checking, restructuring, 330 00:16:52.440 --> 00:16:56.039 constantly working to bridge the gap between how we think 331 00:16:56.159 --> 00:16:57.360 and how the machine works. 332 00:16:57.559 --> 00:16:59.320 And it definitely drives home that even if you don't 333 00:16:59.320 --> 00:17:03.639 build compile, just knowing this process exists, understanding the steps 334 00:17:04.000 --> 00:17:06.920 makes you much more informed programmer. Right, you appreciate what's 335 00:17:06.920 --> 00:17:07.880 happening under the hood. 336 00:17:07.920 --> 00:17:11.519 Absolutely, you understand why some code might be faster than 337 00:17:11.519 --> 00:17:15.359 other code, or why certain language features might have performance implications. 338 00:17:15.759 --> 00:17:18.759 It demystifies things, and maybe a final thought to leave with, 339 00:17:19.559 --> 00:17:23.799 think about how these compiler techniques keep evolving JIT hotspot, 340 00:17:23.920 --> 00:17:28.599 dynamic deoptimization, incremental builds. It reflects this constant push and 341 00:17:28.640 --> 00:17:32.240 pull in computing between rows, speed, the flexibility programmers need, 342 00:17:32.359 --> 00:17:35.559 and managing the sheer complexity of modern software and hardware. 343 00:17:36.200 --> 00:17:38.279 What's the next piece of compiler magic going to. 344 00:17:38.200 --> 00:17:41.240 Be a fascinating question? Indeed, well, thank you for joining 345 00:17:41.319 --> 00:17:43.680 us on this really deep dive into the compiler's world. 346 00:17:44.000 --> 00:17:48.240 We hope you listening start seeing these princibles everywhere in 347 00:17:48.279 --> 00:17:49.400 the technology you use.