WEBVTT 1 00:00:00.120 --> 00:00:04.639 Okay, let's unpack LLVM. It's incredibly powerful, right, underpins so 2 00:00:04.719 --> 00:00:07.440 much stuff, Apple's Xcode, game engines, you name it. 3 00:00:07.519 --> 00:00:08.880 Absolutely, it's everywhere. 4 00:00:09.279 --> 00:00:12.119 But you know, for developers already deep in the ecosystem 5 00:00:12.240 --> 00:00:15.439 or maybe looking to get deeper into compiler engineering, tackling 6 00:00:15.560 --> 00:00:19.079 the docs, well, it can feel less like learning and 7 00:00:19.199 --> 00:00:21.399 more like drinking from a fire hose. 8 00:00:21.559 --> 00:00:24.440 That's a perfect description. It's famously scattered, right. 9 00:00:24.800 --> 00:00:28.600 So our source today, this book, LLVM Techniques, Tips, and 10 00:00:28.640 --> 00:00:32.320 Best Practices, it promises to sort of rein that beast in. 11 00:00:32.520 --> 00:00:34.479 So what's our big mission for you today? What are 12 00:00:34.520 --> 00:00:35.679 we trying to achieve here? 13 00:00:35.920 --> 00:00:39.119 Well, our mission is really to give you this streamlined, 14 00:00:39.880 --> 00:00:44.280 comprehensive overview to cut through all that documentation sprawl. We're 15 00:00:44.280 --> 00:00:47.759 going to unearth some key techniques, maybe some surprising insights, 16 00:00:48.039 --> 00:00:51.560 to help you build, test, optimize, and importantly debug your 17 00:00:51.719 --> 00:00:55.320 LLVM projects much more efficiently. Fewer headaches. 18 00:00:55.640 --> 00:00:56.880 Fewer headaches are always good. 19 00:00:57.079 --> 00:01:00.880 Definitely. Think of this deep dive as like your shortcut, 20 00:01:01.079 --> 00:01:05.319 making LLVM development faster, more reliable, and, well, more insightful. 21 00:01:05.680 --> 00:01:08.439 We're aiming straight at those common pain points, those really 22 00:01:08.519 --> 00:01:13.560 long build times, complex testing, opaque debugging, that kind of thing. 
23 00:01:13.640 --> 00:01:15.599 Yeah, long build times. That's often the first mountain 24 00:01:15.640 --> 00:01:18.959 you hit, isn't it? Especially with a huge project like LLVM, 25 00:01:19.480 --> 00:01:22.000 the defaults can take hours, huge productivity killer. 26 00:01:22.079 --> 00:01:23.599 Oh absolutely, it's a major drag. 27 00:01:24.040 --> 00:01:25.840 So what's the secret? How do we cut down this 28 00:01:25.920 --> 00:01:26.920 build-time beast? 29 00:01:27.159 --> 00:01:31.239 Okay, so the book immediately points to replacing slower, older tools. 30 00:01:31.359 --> 00:01:33.920 Makes sense, right? Right. Take build systems: Ninja. It just 31 00:01:33.959 --> 00:01:36.959 runs significantly faster than, say, GNU Make on 32 00:01:37.159 --> 00:01:39.920 big code bases. LLVM is a perfect example. 33 00:01:40.200 --> 00:01:42.519 Why is Ninja faster? What's the magic there? 34 00:01:42.640 --> 00:01:45.799 The key is its build.ninja script. It's almost 35 00:01:45.799 --> 00:01:48.640 like assembly language for builds. It gets generated by higher 36 00:01:48.680 --> 00:01:51.599 level systems like CMake, and because it's so low level, 37 00:01:51.640 --> 00:01:54.400 it allows for loads of optimizations under the hood. Plus 38 00:01:54.400 --> 00:01:56.799 it handles dependencies much better. You just tell CMake 39 00:01:56.879 --> 00:01:58.719 -G Ninja, as simple as that. 40 00:01:58.760 --> 00:02:01.680 So just one command line flag can make a big difference. 41 00:02:01.719 --> 00:02:04.239 We're basically upgrading the engine. But it's not just the 42 00:02:04.239 --> 00:02:06.959 build system, is it? The linker is often a problem too. 43 00:02:07.040 --> 00:02:11.400 Precisely. The default linker, BFD, it's mature, sure, but it 44 00:02:11.479 --> 00:02:14.000 wasn't really built for modern speed or memory needs. 45 00:02:14.039 --> 00:02:14.960 How bad can it get? 
46 00:02:15.039 --> 00:02:18.159 It can chew up, get this, up to twenty gigabytes 47 00:02:18.199 --> 00:02:20.120 of memory building LLVM. 48 00:02:20.360 --> 00:02:22.319 Wow. Okay, that's a bottleneck. 49 00:02:21.919 --> 00:02:25.039 Definitely a performance hurdle. But thankfully there are much better 50 00:02:25.080 --> 00:02:28.520 alternatives: GNU gold, Google developed that one, and 51 00:02:28.759 --> 00:02:33.280 LLVM's own linker, LLD. LLD? Yeah. LLD is often even faster, 52 00:02:33.360 --> 00:02:36.159 and it's got experimental parallel linking too, which is pretty cool. 53 00:02:36.400 --> 00:02:40.400 And again, easy CMake flags: -DLLVM_USE_LINKER=gold 54 00:02:41.039 --> 00:02:44.639 or -DLLVM_USE_LINKER=lld. Okay, the speed up from 55 00:02:44.680 --> 00:02:47.639 just those two changes, Ninja and a faster linker, it's substantial, 56 00:02:47.919 --> 00:02:48.680 really noticeable. 57 00:02:48.759 --> 00:02:51.319 That's huge, just swapping out a couple of underlying tools. 58 00:02:51.439 --> 00:02:53.759 But we can also tweak CMake itself, right, fine 59 00:02:53.759 --> 00:02:54.840 tune the build arguments. 60 00:02:54.879 --> 00:02:58.520 Absolutely. Tweaking CMake arguments is crucial for efficiency, like 61 00:02:58.960 --> 00:03:01.360 choosing the right build type. RelWithDebInfo 62 00:03:01.479 --> 00:03:03.240 is often the sweet spot. Why that one? Well, 63 00:03:03.280 --> 00:03:05.319 it gives you optimized code so it's fast, but it 64 00:03:05.400 --> 00:03:07.759 keeps the debug information so you get a great balance 65 00:03:07.800 --> 00:03:12.240 between space, speed, and, you know, being able to actually debug. 66 00:03:12.280 --> 00:03:12.840 It makes sense. 67 00:03:13.000 --> 00:03:15.240 You generally want to avoid a full debug build unless 68 00:03:15.240 --> 00:03:18.879 you absolutely have to. It just creates so much unnecessary 69 00:03:18.919 --> 00:03:21.599 storage waste, huge binaries. And targets?
70 00:03:21.879 --> 00:03:26.039 LLVM supports, what, nearly two dozen hardware targets? Most people 71 00:03:26.039 --> 00:03:28.039 don't need all of those, really. Exactly. 72 00:03:28.080 --> 00:03:30.479 That's another massive time sink if you build them all. 73 00:03:30.800 --> 00:03:32.520 You can save a ton of time by just building 74 00:03:32.520 --> 00:03:36.479 the ones you actually need. Use -DLLVM_TARGETS_TO_BUILD. 75 00:03:36.120 --> 00:03:37.639 How does that look? 76 00:03:37.840 --> 00:03:41.199 Something like -DLLVM_TARGETS_TO_BUILD="X86;AArch64". 77 00:03:41.319 --> 00:03:43.879 Just list the ones you care about. 78 00:03:43.960 --> 00:03:45.680 And there's a catch with shells you mentioned. 79 00:03:45.680 --> 00:03:48.960 Ah yeah, good point. In some shells, like Bash, you 80 00:03:49.000 --> 00:03:51.240 have to remember the double quotes around the list, otherwise 81 00:03:51.319 --> 00:03:53.759 the command gets cut off part way through. Little gotcha. 82 00:03:53.919 --> 00:03:56.199 Good tip. What else? Shared libraries? 83 00:03:56.400 --> 00:04:01.159 Yes, another great strategy, especially during development. Build LLVM components 84 00:04:01.199 --> 00:04:05.280 as shared libraries. 85 00:04:05.439 --> 00:04:07.639 Use -DBUILD_SHARED_LIBS=ON. 86 00:04:07.360 --> 00:04:10.479 Why is that better? Because LLVM is so modular, 87 00:04:10.840 --> 00:04:14.280 building shared libraries saves a significant amount of storage space 88 00:04:14.319 --> 00:04:16.560 and really speeds up the linking part of the build 89 00:04:16.600 --> 00:04:19.920 process compared to static libraries. Much faster iteration. 90 00:04:20.240 --> 00:04:23.480 Okay, and what about llvm-tblgen? That one comes 91 00:04:23.519 --> 00:04:24.600 up a lot as being slow. 92 00:04:24.759 --> 00:04:27.399 It does. It can really impact build times. But there's 93 00:04:27.439 --> 00:04:30.439 a trick. 
You can build an optimized version of just 94 00:04:30.600 --> 00:04:33.680 llvm-tblgen itself even if the rest of your build 95 00:04:33.720 --> 00:04:38.399 is in debug mode. Use -DLLVM_OPTIMIZED_TABLEGEN=ON. 96 00:04:38.399 --> 00:04:41.959 Nice. So optimizing the tool that helps build the 97 00:04:41.879 --> 00:04:44.000 tools. Exactly, it shaves off more time. 98 00:04:44.160 --> 00:04:48.040 So it really feels like we're swapping out a rusty 99 00:04:48.079 --> 00:04:50.600 old tractor for a souped-up racing machine just by 100 00:04:50.720 --> 00:04:53.800 changing a few settings and tools. Speaking of alternatives, the 101 00:04:53.839 --> 00:04:56.560 source mentioned another build system, GN. It 102 00:04:56.560 --> 00:04:59.959 does. Generate Ninja, or GN, used a lot by Google projects 103 00:05:00.000 --> 00:05:00.800 like Chromium. 104 00:05:00.879 --> 00:05:01.839 What's its advantage? 105 00:05:01.920 --> 00:05:05.879 It's known for really fast configuration time and reliable argument management. 106 00:05:06.160 --> 00:05:08.800 The book says it's especially useful if your development makes 107 00:05:08.920 --> 00:05:11.560 changes to build files, or if you're constantly trying out 108 00:05:11.560 --> 00:05:15.079 different build options. Much quicker reconfiguration. 109 00:05:15.240 --> 00:05:17.759 So good for rapid iteration on the build itself. Exactly. 110 00:05:17.959 --> 00:05:20.639 It's more of an alternative for those specific scenarios. Maybe 111 00:05:20.639 --> 00:05:23.040 not a full replacement for everyone, but very handy when 112 00:05:23.040 --> 00:05:24.279 you're tweaking build files a lot. 113 00:05:24.319 --> 00:05:26.720 Okay, it makes sense. So once you've got your compiler 114 00:05:26.759 --> 00:05:30.439 built fast, the next big hurdle is reliability, testing. How 115 00:05:30.439 --> 00:05:32.720 do you make sure it's actually, you know, correct? 
116 00:05:33.079 --> 00:05:37.079 Yeah, testing is critical, and LLVM provides its own framework 117 00:05:37.120 --> 00:05:43.040 for this, LLVM LIT, as in LLVM Integrated Tester. The book 118 00:05:43.079 --> 00:05:46.839 calls it an easy to use yet general framework, and importantly, 119 00:05:47.000 --> 00:05:50.319 while it started for LLVM's own tests, it's actually a 120 00:05:50.360 --> 00:05:53.680 generic testing framework. You can use it outside LLVM for 121 00:05:53.759 --> 00:05:54.639 other projects too. 122 00:05:54.879 --> 00:05:58.800 Very versatile. And inside LIT there's this utility, FileCheck, 123 00:05:59.360 --> 00:06:02.560 that sounds key for compiler testing. What's special about it? 124 00:06:02.800 --> 00:06:05.560 FileCheck is really powerful. It does advanced pattern checking 125 00:06:05.560 --> 00:06:08.040 on output files. It goes way beyond just diffing text, 126 00:06:08.399 --> 00:06:11.839 so you embed directives right in your test files. CHECK 127 00:06:11.920 --> 00:06:15.560 is basic regex matching, simple enough, but then you get 128 00:06:15.560 --> 00:06:18.759 directives like CHECK-NEXT, that makes sure a pattern 129 00:06:18.839 --> 00:06:21.399 is found on the very next line after the previous match. 130 00:06:21.800 --> 00:06:24.040 Super useful for checking sequential 131 00:06:23.519 --> 00:06:25.839 output. Ah, right, controlling the order. 132 00:06:25.839 --> 00:06:30.079 Exactly. And CHECK-SAME, that matches patterns that must be 133 00:06:30.079 --> 00:06:33.759 on the exact same line. Brilliant for avoiding really long, 134 00:06:34.000 --> 00:06:36.639 messy check lines when you need multiple things on one 135 00:06:36.680 --> 00:06:38.759 line. Keeps tests readable. 136 00:06:38.839 --> 00:06:42.000 Yeah, I can see that. Verbose IR needs concise checks. 137 00:06:42.160 --> 00:06:44.879 What if you want to ensure something isn't there or 138 00:06:44.879 --> 00:06:45.959 if the order doesn't matter?
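The matching rules just described for CHECK, CHECK-NEXT, and CHECK-SAME can be sketched as a toy model. This is not FileCheck itself: it assumes plain substring matching instead of regexes, and the names `Kind`, `Directive`, and `runChecks` are hypothetical, made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// The three directive kinds discussed above.
enum class Kind { Check, CheckNext, CheckSame };

struct Directive {
  Kind kind;
  std::string pattern;  // plain substring in this sketch, not a regex
};

// Returns true if every directive matches `lines`, in order.
bool runChecks(const std::vector<std::string> &lines,
               const std::vector<Directive> &directives) {
  bool haveMatch = false;  // has any directive matched yet?
  std::size_t line = 0;    // line index of the previous match
  std::size_t col = 0;     // column just past the previous match

  for (const Directive &d : directives) {
    assert(!d.pattern.empty() && "empty patterns are not supported here");
    if (d.kind == Kind::Check) {
      // Scan forward, starting right after the previous match.
      bool found = false;
      for (std::size_t i = line; i < lines.size(); ++i) {
        std::size_t from = (i == line) ? col : 0;
        std::size_t pos = lines[i].find(d.pattern, from);
        if (pos != std::string::npos) {
          line = i;
          col = pos + d.pattern.size();
          found = true;
          break;
        }
      }
      if (!found) return false;
    } else {
      // CHECK-NEXT and CHECK-SAME only make sense after an earlier match.
      if (!haveMatch) return false;
      // CHECK-NEXT looks at exactly the following line; CHECK-SAME stays put.
      std::size_t target = (d.kind == Kind::CheckNext) ? line + 1 : line;
      if (target >= lines.size()) return false;
      std::size_t from = (d.kind == Kind::CheckSame) ? col : 0;
      std::size_t pos = lines[target].find(d.pattern, from);
      if (pos == std::string::npos) return false;
      line = target;
      col = pos + d.pattern.size();
    }
    haveMatch = true;
  }
  return true;
}
```

Feeding it the lines of some tool output plus a list of directives returns whether the whole sequence holds. A CHECK followed by a CHECK-NEXT fails as soon as the second pattern sits two lines below the first, which is the ordering guarantee mentioned in the conversation.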
139 00:06:46.040 --> 00:06:50.680 Good questions. For negative checks, there's CHECK-NOT. It asserts 140 00:06:50.680 --> 00:06:53.240 a pattern does not exist. Really handy for saying, okay, 141 00:06:53.240 --> 00:06:55.800 I expect Y, but I definitely don't want to see X. 142 00:06:56.079 --> 00:06:58.759 Makes sense, asserting the absence of something. 143 00:06:58.560 --> 00:07:01.399 And for when the order might change, maybe due to optimizations 144 00:07:01.399 --> 00:07:05.000 shuffling code around, you use CHECK-DAG. That stands for 145 00:07:05.040 --> 00:07:07.800 directed acyclic graph, but here it means it allows 146 00:07:07.800 --> 00:07:12.920 matching text in arbitrary order. Super flexible for testing nondeterministic output. 147 00:07:13.160 --> 00:07:16.120 Wow, CHECK-DAG. That's really flexible. It seems like you 148 00:07:16.120 --> 00:07:18.839 can test the intent behind the code changes, not just 149 00:07:18.879 --> 00:07:20.160 the literal output strings. 150 00:07:20.279 --> 00:07:22.600 That's exactly the point. It's about semantic checking, not just 151 00:07:22.600 --> 00:07:23.439 textual matching. 152 00:07:23.720 --> 00:07:27.720 So, speaking of describing intent and structure, compilers deal with 153 00:07:27.879 --> 00:07:33.639 incredibly complex structured data: instruction sets, optimization rules. How does 154 00:07:33.920 --> 00:07:37.759 LLVM handle describing that efficiently? Is that TableGen? You've 155 00:07:37.839 --> 00:07:38.279 nailed it. 156 00:07:38.360 --> 00:07:41.519 TableGen is the answer there. It's a domain-specific language, 157 00:07:41.519 --> 00:07:45.959 a DSL. DSL? Yeah. It originally started within LLVM for 158 00:07:46.040 --> 00:07:49.079 describing things like processor instruction sets, the ISA, and 159 00:07:49.120 --> 00:07:52.639 other hardware details, but its use has just exploded. 160 00:07:52.879 --> 00:07:54.480 How so? What else is it used for? 161 00:07:54.759 --> 00:07:59.319 Oh, 
Everything. Managing Clang's command line options, defining complex optimization 162 00:07:59.439 --> 00:08:02.839 rules like the InstCombine peephole optimizations. The book says 163 00:08:02.839 --> 00:08:06.079 it's basically for any tasks that involve non-trivial static 164 00:08:06.120 --> 00:08:07.759 and structural data. 165 00:08:07.360 --> 00:08:10.120 So much broader than just hardware now. It's a general 166 00:08:10.160 --> 00:08:12.240 tool for this kind of static data. Can you give 167 00:08:12.279 --> 00:08:14.199 us a quick feel for the syntax? How does it work? 168 00:08:14.399 --> 00:08:17.319 Sure. At its core, you define a class. Think of 169 00:08:17.360 --> 00:08:20.000 it like a C++ struct. It defines a layout, 170 00:08:20.199 --> 00:08:22.720 fields and types. Then you use def to create an 171 00:08:22.720 --> 00:08:24.879 instance of that class, called a record. 172 00:08:24.839 --> 00:08:27.199 Like creating an object from a class blueprint? Exactly. 173 00:08:27.759 --> 00:08:31.279 And you can override specific fields in that instance using 174 00:08:31.279 --> 00:08:32.080 the let keyword. 175 00:08:32.279 --> 00:08:35.320 Okay, and what about these bang operators I've heard about, 176 00:08:35.799 --> 00:08:39.120 !add, !mul? Ah, yes. 177 00:08:39.559 --> 00:08:41.919 Those aren't run time functions. They're more like macros that 178 00:08:41.960 --> 00:08:44.840 get evaluated during build time. Build time? Yeah, so you 179 00:08:44.879 --> 00:08:48.480 can do simple computations right in the TableGen file itself. 180 00:08:48.759 --> 00:08:51.840 The example given is !mul(kilogram, 1000) to 181 00:08:51.879 --> 00:08:55.360 maybe convert units. It happens when TableGen runs, not 182 00:08:55.480 --> 00:08:56.639 when the compiler runs 183 00:08:56.519 --> 00:08:59.600 later. Clever, build time computation. And if you have lots 184 00:08:59.639 --> 00:09:01.200 of similar records, is there a shortcut?
185 00:09:01.320 --> 00:09:03.519 Yes, that's where multiclass comes in. It's a way 186 00:09:03.559 --> 00:09:06.320 to define multiple records at once by factoring out common 187 00:09:06.360 --> 00:09:09.559 parameters. Like a template? Sort of, yeah. The book uses 188 00:09:09.600 --> 00:09:12.480 an auto part and car example. You define a multiclass 189 00:09:12.480 --> 00:09:15.240 for parts, then use defm to instantiate multiple cars, and 190 00:09:15.279 --> 00:09:18.639 it automatically generates all the individual part records, like car one 191 00:09:18.679 --> 00:09:21.679 fuel tank, car two engine, et cetera, from one definition. 192 00:09:21.879 --> 00:09:25.759 Very concise. Nice, less boilerplate. Right. And complex relationships? 193 00:09:25.840 --> 00:09:29.200 Yeah, graphs. For that, TableGen has a specific dag data 194 00:09:29.240 --> 00:09:33.320 type that lets you define directed acyclic graph instances explicitly, 195 00:09:33.840 --> 00:09:37.799 super important for things like instruction selection patterns or optimization 196 00:09:37.919 --> 00:09:40.679 rules where you have dependencies. You can even use tags 197 00:09:40.759 --> 00:09:42.559 with a dollar sign to give parts of the 198 00:09:42.639 --> 00:09:43.960 DAG logical names. 199 00:09:43.720 --> 00:09:46.879 A DAG type built in. Yeah, that's powerful. And the 200 00:09:46.960 --> 00:09:51.039 source uses this amazing analogy to make it concrete, right? 201 00:09:51.200 --> 00:09:52.759 A donut recipe. It does. 202 00:09:52.799 --> 00:09:56.000 It's a brilliant example. The book uses a delicious doughnut 203 00:09:56.080 --> 00:09:58.000 recipe to show TableGen's power. 204 00:09:58.159 --> 00:10:00.639 How does that work? A doughnut recipe in a compiler book? 205 00:10:00.720 --> 00:10:04.240 It defines unit classes like gram_unit, tbsp_unit, then IngredientBase 206 00:10:04.279 --> 00:10:07.919 records, and finally step records. 
These step records form 207 00:10:07.960 --> 00:10:11.320 a DAG representing the cooking actions, mix this, add that, 208 00:10:11.480 --> 00:10:14.679 complete with ingredients and amounts. Wow. It's a perfect analogy 209 00:10:14.679 --> 00:10:17.879 because it takes this abstract idea of describing structured data 210 00:10:17.919 --> 00:10:20.440 and makes it totally tangible. You immediately see how it 211 00:10:20.480 --> 00:10:23.879 parallels describing instruction patterns or optimization steps. 212 00:10:23.720 --> 00:10:27.360 A donut recipe in compiler engineering. That's definitely an aha moment, 213 00:10:27.519 --> 00:10:30.919 makes total sense. So, okay, you've described your donut recipe 214 00:10:30.960 --> 00:10:34.039 or your instruction set in TableGen. How do you 215 00:10:34.080 --> 00:10:36.480 actually use that data, like print the recipe? 216 00:10:36.600 --> 00:10:39.320 Right. For that, you need a custom TableGen backend. 217 00:10:39.600 --> 00:10:43.759 Now, important distinction: this isn't an LLVM backend like 218 00:10:43.799 --> 00:10:45.120 for generating machine code. 219 00:10:45.240 --> 00:10:46.799 Different kind of backend? Totally different. 220 00:10:47.080 --> 00:10:49.320 A TableGen backend is a piece of code, 221 00:10:49.519 --> 00:10:54.480 usually C++, that converts, or transpiles, TableGen 222 00:10:54.519 --> 00:10:57.519 files into arbitrary textual 223 00:10:57.039 --> 00:10:59.639 content. Arbitrary, so anything? Pretty much. 
224 00:10:59.720 --> 00:11:02.759 It could generate a C++ header file, documentation, 225 00:11:03.039 --> 00:11:05.080 or in the donut example, just plain text for the 226 00:11:05.080 --> 00:11:09.519 recipe. You use C++ APIs provided by TableGen, 227 00:11:09.639 --> 00:11:12.440 like RecordKeeper::getAllDerivedDefinitions to get 228 00:11:12.440 --> 00:11:15.440 all the defined steps and Record::getValueAsString 229 00:11:15.559 --> 00:11:18.720 to pull out specific values like ingredient names or amounts. 230 00:11:18.960 --> 00:11:21.759 So you turn the TableGen structure into usable code or data. 231 00:11:21.799 --> 00:11:25.080 Exactly, you transform the structured description into whatever format you 232 00:11:25.120 --> 00:11:25.919 need downstream. 233 00:11:26.120 --> 00:11:29.440 That ability to generate code is huge. Yeah, it feels 234 00:11:29.519 --> 00:11:33.240 like that opens the door to extending Clang itself, maybe 235 00:11:33.240 --> 00:11:35.320 injecting custom logic into the frontend. 236 00:11:35.399 --> 00:11:38.759 It absolutely does. The frontend is a prime place 237 00:11:38.799 --> 00:11:42.879 for customization. Think about the preprocessor, the very first stage, 238 00:11:42.919 --> 00:11:46.919 handling macros, includes. You can customize that? Oh yeah, you 239 00:11:46.960 --> 00:11:50.919 can write custom PragmaHandler extensions, so you can invent 240 00:11:51.000 --> 00:11:55.000 your own #pragma directives. The book shows an example, 241 00:11:55.080 --> 00:11:56.919 #pragma macro_arg_guard. 242 00:11:57.240 --> 00:11:57.919 What would that do? 243 00:11:58.279 --> 00:12:01.559 Well, when the preprocessor sees your pragma, your handler code runs. 244 00:12:01.960 --> 00:12:04.759 It can parse the pragma arguments and even register something 245 00:12:04.759 --> 00:12:08.279 called PPCallbacks. 
Callbacks? Yeah, PPCallbacks let you hook 246 00:12:08.279 --> 00:12:12.000 into various preprocessor events, so you can insert custom logic 247 00:12:12.039 --> 00:12:15.840 whenever a preprocessor event happens. In the example, a MacroGuardValidator 248 00:12:15.919 --> 00:12:19.120 uses the MacroDefined callback to automatically check if 249 00:12:19.240 --> 00:12:22.080 arguments in certain macros are properly wrapped in parentheses. 250 00:12:22.120 --> 00:12:24.320 Wow, that's fine-grained control, right at the start. What 251 00:12:24.360 --> 00:12:26.960 about the driver, the thing that orchestrates GCC or Clang? 252 00:12:27.039 --> 00:12:28.200 Can you customize that too? 253 00:12:28.639 --> 00:12:32.039 You can. The driver is basically the dispatcher, right, it 254 00:12:32.080 --> 00:12:35.879 passes flags and manages the different compilation phases. And guess 255 00:12:35.879 --> 00:12:38.840 what Clang uses to define its driver flags? 256 00:12:39.200 --> 00:12:40.639 Let me guess, TableGen. 257 00:12:40.759 --> 00:12:44.919 You got it, TableGen again. You can declare custom flags, 258 00:12:45.080 --> 00:12:49.080 even paired flags like -fflag and -fno-flag, using 259 00:12:49.120 --> 00:12:51.799 things like the BooleanFFlag multiclass in TableGen. 260 00:12:52.200 --> 00:12:54.919 So you define your flag in TableGen and Clang 261 00:12:55.039 --> 00:12:56.320 understands it? Exactly. 262 00:12:56.559 --> 00:12:59.399 The source gives an example of a custom -fuse-simple-log 263 00:12:59.440 --> 00:13:02.799 flag. Defining this in TableGen allows the driver to 264 00:13:02.840 --> 00:13:05.600 recognize it, and then your custom logic can make it 265 00:13:05.679 --> 00:13:09.279 implicitly include a specific header, simplelog.h, and maybe 266 00:13:09.279 --> 00:13:12.320 define macros to control log levels, all driven by that 267 00:13:12.360 --> 00:13:12.960 one flag. 
268 00:13:13.000 --> 00:13:16.159 That's really neat, centralized control via a custom flag. But can 269 00:13:16.200 --> 00:13:19.879 you go even deeper, like fundamentally change how Clang interacts 270 00:13:19.879 --> 00:13:22.519 with the system's tools, make it output something totally different? 271 00:13:22.600 --> 00:13:26.440 You absolutely can, using custom toolchains. The toolchain normally adapts 272 00:13:26.440 --> 00:13:29.480 Clang for different platforms, like different OSes or architectures, but 273 00:13:29.519 --> 00:13:31.919 you can make it do completely custom things. Like what? 274 00:13:32.200 --> 00:13:35.039 The book has this fantastic, almost wild example called the 275 00:13:35.159 --> 00:13:38.879 Zipline toolchain. It's a demo, obviously, but it shows the 276 00:13:38.919 --> 00:13:39.840 power. Zipline? 277 00:13:39.840 --> 00:13:40.320 What does it do? 278 00:13:40.679 --> 00:13:43.440 Instead of normal compilation, it uses Clang, but then it 279 00:13:44.480 --> 00:13:47.799 encodes the generated assembly code using Base64 during 280 00:13:47.799 --> 00:13:49.159 the assembling phase. 281 00:13:49.200 --> 00:13:51.639 Base64? Why? Just to show it can. 282 00:13:51.799 --> 00:13:55.279 And then during the linking phase it packages those Base64 283 00:13:55.320 --> 00:13:57.840 files into a ZIP archive. 284 00:13:58.039 --> 00:14:00.480 Okay, that is wild. How does it do that? 285 00:14:00.720 --> 00:14:04.120 Through the toolchain definition. You override methods like AddClangSystemIncludeArgs, 286 00:14:04.120 --> 00:14:07.559 which can add custom include paths. buildAssembler 287 00:14:07.600 --> 00:14:10.279 gets overridden to call openssl base64 288 00:14:10.360 --> 00:14:13.440 instead of the normal assembler, and buildLinker gets overridden 289 00:14:13.480 --> 00:14:15.879 to call zip or tar instead of the linker. 290 00:14:16.080 --> 00:14:20.120 Wow. 
So you're completely replacing standard build steps with custom commands. 291 00:14:20.200 --> 00:14:23.399 Exactly. It perfectly illustrates how deeply you can customize the 292 00:14:23.519 --> 00:14:24.679 entire pipeline if you need to. 293 00:14:24.759 --> 00:14:26.960 So if you thought compilers were a black box, definitely 294 00:14:27.000 --> 00:14:30.240 think again. We're not just peeking inside. We're fundamentally changing 295 00:14:30.320 --> 00:14:32.720 how they work, how they talk to the OS. That 296 00:14:32.799 --> 00:14:36.840 level of control, it must open up amazing possibilities for 297 00:14:36.879 --> 00:14:38.639 optimization and analysis, right? 298 00:14:38.960 --> 00:14:43.399 Absolutely, that's where the real power of LLVM shines. Sophisticated 299 00:14:43.440 --> 00:14:48.399 optimizations need deep program understanding, and this happens primarily in LLVM 300 00:14:48.120 --> 00:14:50.600 IR. Right, the intermediate representation. 301 00:14:50.200 --> 00:14:54.039 Exactly, it's the target independent intermediate representation. It's the core 302 00:14:54.120 --> 00:14:59.320 of the entire LLVM framework, where most analysis and transformation happens. 303 00:14:59.399 --> 00:15:02.519 And the mechanism for doing these transformations is passes, right? 304 00:15:02.600 --> 00:15:05.720 What's a pass, and what's this new pass manager deal? 305 00:15:06.120 --> 00:15:10.240 Think of an LLVM pass as a module, a basic 306 00:15:10.360 --> 00:15:13.919 unit that performs certain actions against LLVM IR, like one 307 00:15:14.000 --> 00:15:15.799 step on a factory assembly 308 00:15:15.399 --> 00:15:17.159 line. Okay, a modular step. Right. 309 00:15:17.000 --> 00:15:20.080 And the new pass manager is a significant redesign compared 310 00:15:20.120 --> 00:15:23.279 to the older system. 
The book highlights it runs faster 311 00:15:23.399 --> 00:15:26.080 and generates results with better quality, partly due to a 312 00:15:26.120 --> 00:15:27.039 cleaner interface. 313 00:15:27.159 --> 00:15:29.200 Can you give an example of a simple pass? 314 00:15:28.960 --> 00:15:31.960 Sure. The source shows a StrictOpt pass. Its goal 315 00:15:32.039 --> 00:15:35.000 is simple: add the noalias attribute to function arguments that 316 00:15:35.039 --> 00:15:36.480 are pointers. noalias? 317 00:15:36.519 --> 00:15:37.720 What does that tell the compiler? 318 00:15:38.039 --> 00:15:42.200 It's a powerful hint. It guarantees that pointer doesn't alias, 319 00:15:42.360 --> 00:15:44.960 meaning it doesn't point to the same memory location as 320 00:15:45.000 --> 00:15:48.080 any other pointer accessible in that scope. This lets the 321 00:15:48.080 --> 00:15:52.080 optimizer be much more aggressive, assuming less potential overlap, which 322 00:15:52.120 --> 00:15:54.039 can unlock significant speed ups. 323 00:15:54.240 --> 00:15:56.639 How does the pass know what other passes have done? 324 00:15:56.720 --> 00:15:59.080 Ah, that's key to the new manager. When you write 325 00:15:59.080 --> 00:16:02.000 a pass, you have to declare what analyses it preserves. 326 00:16:02.360 --> 00:16:06.279 You use PreservedAnalyses, so if your pass adds noalias, 327 00:16:06.639 --> 00:16:10.600 it might invalidate alias analysis results. You tell the manager, 328 00:16:10.879 --> 00:16:13.720 maybe, AAManager results are no longer valid. 329 00:16:13.919 --> 00:16:16.320 So you explicitly state what your pass 330 00:16:16.159 --> 00:16:18.879 breaks? Well, rather what it doesn't break. By default, it 331 00:16:18.919 --> 00:16:22.840 assumes you break everything; you specify what's preserved. This avoids 332 00:16:22.919 --> 00:16:26.120 costly recomputation of analyses that are still perfectly valid. 
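The "say what you preserve, everything else gets recomputed" idea can be modeled in a few lines. This is only a toy sketch, not LLVM's PreservedAnalyses or AnalysisManager API: `ToyAnalysisCache`, `PreservedSet`, and the analysis names used with it are hypothetical, and each analysis result is reduced to a single int for brevity.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <utility>

// Toy stand-in for what a pass reports after it runs: the names of the
// analyses whose cached results are still valid. Empty means "none".
using PreservedSet = std::set<std::string>;

class ToyAnalysisCache {
public:
  // Register how to compute an analysis; `compute` runs on a cache miss.
  void registerAnalysis(const std::string &name, std::function<int()> compute) {
    computeFns_[name] = std::move(compute);
  }

  // Return the cached result, computing (and counting) it only if needed.
  int getResult(const std::string &name) {
    auto cached = results_.find(name);
    if (cached != results_.end())
      return cached->second;
    auto fn = computeFns_.find(name);
    assert(fn != computeFns_.end() && "analysis was never registered");
    ++recomputations_;
    int value = fn->second();
    results_[name] = value;
    return value;
  }

  // After a transformation runs, drop every cached result that the pass
  // did not list as preserved.
  void invalidate(const PreservedSet &preserved) {
    for (auto it = results_.begin(); it != results_.end();) {
      if (preserved.count(it->first))
        ++it;
      else
        it = results_.erase(it);
    }
  }

  int recomputations() const { return recomputations_; }

private:
  std::map<std::string, std::function<int()>> computeFns_;
  std::map<std::string, int> results_;
  int recomputations_ = 0;
};
```

If a pass reports that it preserved only a "dominators" entry, the next request for "dominators" is served from the cache, while a request for an "alias" entry triggers exactly one recomputation. That is the saving the conversation is pointing at.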
It's 333 00:16:26.159 --> 00:16:28.600 like a librarian keeping track. Much more efficient. 334 00:16:28.759 --> 00:16:31.720 Makes sense. So passes are the workers, but they need 335 00:16:31.799 --> 00:16:35.000 information to do complex jobs, they need a brain, right? 336 00:16:35.200 --> 00:16:36.559 Is that the analysis manager? 337 00:16:36.600 --> 00:16:40.360 Precisely, you nailed it. Modern compiler optimizations can be complex. 338 00:16:40.559 --> 00:16:43.600 They require lots of information, and often getting that information 339 00:16:43.720 --> 00:16:45.000 is expensive to evaluate. 340 00:16:45.159 --> 00:16:46.720 So the analysis manager helps with that. 341 00:16:46.919 --> 00:16:50.759 Yes, it handles all tasks related to program analysis. It 342 00:16:50.840 --> 00:16:54.960 runs the analysis passes, and crucially caches their results so 343 00:16:54.960 --> 00:16:57.279 they don't have to be rerun constantly. 344 00:16:57.720 --> 00:16:59.440 Can you give an example of an analysis 345 00:16:59.440 --> 00:17:02.879 it might manage? The source mentions a HaltAnalyzer project. 346 00:17:03.279 --> 00:17:05.839 Its goal is to find code that's unreachable because a 347 00:17:05.880 --> 00:17:08.440 special function like my_halt gets called earlier. 348 00:17:08.519 --> 00:17:10.000 Okay, dead code detection. 349 00:17:10.119 --> 00:17:13.319 Sort of, yeah. And to do this it relies on 350 00:17:13.319 --> 00:17:18.079 one of the fundamental analyses LLVM provides, the dominator tree, 351 00:17:18.160 --> 00:17:21.359 or DT. Dominator tree? How does that help find unreachable 352 00:17:21.359 --> 00:17:22.319 code after my_halt? 353 00:17:22.640 --> 00:17:25.720 Okay, so the dominator tree tells you control flow relationships. 354 00:17:25.920 --> 00:17:29.640 If basic block A dominates basic block B, it means 355 00:17:29.680 --> 00:17:32.400 every possible path to B must go through A first. 356 00:17:32.799 --> 00:17:33.000 Ah. 
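The dominance definition above can be made concrete with a small standalone sketch. It does not use LLVM's DominatorTree. It swaps in the classic iterative data-flow computation of dominator sets on a toy control flow graph, and assumes block 0 is the entry and every block is reachable from it. The names `Cfg`, `computeDominators`, `dominates`, and `blocksAfterHalt` are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// preds[b] lists the predecessor blocks of block b; block 0 is the entry.
using Cfg = std::vector<std::vector<int>>;

// Iterative data-flow computation of dominator sets:
//   Dom(entry) = {entry}
//   Dom(b)     = {b} plus the intersection of Dom(p) over all predecessors p
// Assumes every block is reachable from the entry block.
std::vector<std::set<int>> computeDominators(const Cfg &preds) {
  const int n = static_cast<int>(preds.size());
  assert(n > 0 && "the CFG needs an entry block");

  std::set<int> all;
  for (int b = 0; b < n; ++b) all.insert(b);

  // Start from "everything dominates everything" and shrink to a fixpoint.
  std::vector<std::set<int>> dom(n, all);
  dom[0] = {0};

  bool changed = true;
  while (changed) {
    changed = false;
    for (int b = 1; b < n; ++b) {
      std::set<int> next = all;
      for (int p : preds[b]) {
        std::set<int> kept;
        for (int d : next)
          if (dom[p].count(d)) kept.insert(d);
        next = std::move(kept);
      }
      next.insert(b);
      if (next != dom[b]) {
        dom[b] = std::move(next);
        changed = true;
      }
    }
  }
  return dom;
}

// True if every path from the entry to block b passes through block a.
bool dominates(const std::vector<std::set<int>> &dom, int a, int b) {
  return dom[b].count(a) != 0;
}

// Blocks other than haltBlock that haltBlock dominates. If haltBlock always
// stops execution, these are the blocks that can never run.
std::vector<int> blocksAfterHalt(const Cfg &preds, int haltBlock) {
  const std::vector<std::set<int>> dom = computeDominators(preds);
  std::vector<int> result;
  for (int b = 0; b < static_cast<int>(preds.size()); ++b)
    if (b != haltBlock && dominates(dom, haltBlock, b)) result.push_back(b);
  return result;
}
```

On a diamond-shaped graph where block 1 holds the halting call and only one side of the diamond runs through it, the blocks reached solely through block 1 are reported, while the join block, still reachable from the other side, is not.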
357 00:17:33.079 --> 00:17:35.400 I see. So if the block containing my_halt dominates 358 00:17:35.440 --> 00:17:38.599 another block and my_halt stops execution, then that dominated 359 00:17:38.640 --> 00:17:42.720 block is definitely unreachable. Dominator tree analysis computes this tree structure, 360 00:17:42.880 --> 00:17:44.799 and HaltAnalyzer just needs to query it. 361 00:17:45.039 --> 00:17:48.359 That's really clever, leveraging fundamental graph analysis. 362 00:17:48.119 --> 00:17:52.599 Exactly, and understanding these core analyses like dominator trees is 363 00:17:52.640 --> 00:17:56.160 what lets you build much smarter, much more effective custom 364 00:17:56.200 --> 00:18:01.039 optimization or analysis passes. You're building on solid theoretical foundations. 365 00:18:01.079 --> 00:18:04.880 That's a powerful concept. But okay, even with great optimizations, 366 00:18:04.920 --> 00:18:08.160 things go wrong. You need to debug, diagnose issues, check 367 00:18:08.240 --> 00:18:11.640 run time behavior. What tools does LLVM offer there? 368 00:18:11.799 --> 00:18:16.000 Right, optimization isn't everything. LLVM has some essential support utilities. 369 00:18:16.200 --> 00:18:19.519 For debugging your passes themselves, there's LLVM_DEBUG. 370 00:18:19.599 --> 00:18:20.240 How does that work? 371 00:18:20.359 --> 00:18:24.599 You sprinkle LLVM_DEBUG calls that write to dbgs() in your pass code. 372 00:18:24.839 --> 00:18:27.920 These messages only get printed if you run the optimizer 373 00:18:27.960 --> 00:18:30.880 tool with the -debug or -debug-only=your-pass-name 374 00:18:30.960 --> 00:18:34.279 flag. Keeps your production builds clean, but gives you detailed 375 00:18:34.319 --> 00:18:35.240 logs when you need them. 376 00:18:35.480 --> 00:18:38.359 Nice, conditional logging. What about tracking numbers, like how many 377 00:18:38.359 --> 00:18:39.759 times an optimization fired? 378 00:18:40.000 --> 00:18:42.680 For that, you use the STATISTIC macro. 
You just declare 379 00:18:42.720 --> 00:18:45.599 STATISTIC with a counter name and a description, and then increment the counter in 380 00:18:45.640 --> 00:18:49.640 your code. LLVM automatically collects these, organizes them, and can 381 00:18:49.680 --> 00:18:51.160 print them out, even in formats like 382 00:18:51.200 --> 00:18:54.559 JSON. JSON output? That's useful for automation. 383 00:18:54.400 --> 00:18:58.960 Very useful. Turns ad hoc counting into structured data for analysis. 384 00:18:59.279 --> 00:19:01.400 Lets you see if your pass is actually doing what 385 00:19:01.440 --> 00:19:04.240 you thought or hitting unexpected bottlenecks. 386 00:19:04.359 --> 00:19:08.720 Okay, what if an optimization tries to do something but fails, 387 00:19:09.200 --> 00:19:11.880 like it wants to vectorize a loop but can't, how 388 00:19:11.920 --> 00:19:12.960 do you find out why? 389 00:19:13.559 --> 00:19:17.519 That's exactly what optimization remarks are for. You use OptimizationRemarkEmitter 390 00:19:17.640 --> 00:19:20.640 in your pass to report why something happened 391 00:19:20.839 --> 00:19:21.599 or didn't 392 00:19:21.319 --> 00:19:23.960 happen. So notes from the optimizer? Pretty much. 393 00:19:24.119 --> 00:19:28.279 The book uses the example of loop invariant code motion, LICM. 394 00:19:28.640 --> 00:19:31.039 If it fails to hoist an instruction out of a loop, 395 00:19:31.079 --> 00:19:33.799 it can emit a remark saying why, maybe there's a 396 00:19:33.799 --> 00:19:35.559 potential side effect it couldn't ignore. 397 00:19:35.720 --> 00:19:37.519 You can see these remarks? Yes. 398 00:19:37.440 --> 00:19:40.240 And even better, there's a tool, opt-viewer.py, that 399 00:19:40.279 --> 00:19:43.079 takes these remarks and generates a webpage. It highlights the 400 00:19:43.160 --> 00:19:45.640 relevant source code lines and shows the remarks right next 401 00:19:45.640 --> 00:19:48.200