WEBVTT

1
00:00:00.120 --> 00:00:04.639
<v Speaker 1>Okay, let's unpack LLVM. It's incredibly powerful, right, underpins so

2
00:00:04.719 --> 00:00:07.440
<v Speaker 1>much stuff: Apple's Xcode, game engines, you name it.

3
00:00:07.519 --> 00:00:08.880
<v Speaker 2>Absolutely, it's everywhere.

4
00:00:09.279 --> 00:00:12.119
<v Speaker 1>But you know, for developers already deep in the ecosystem

5
00:00:12.240 --> 00:00:15.439
<v Speaker 1>or maybe looking to get deeper into compiler engineering, tackling

6
00:00:15.560 --> 00:00:19.079
<v Speaker 1>the docs, well, it can feel less like learning and

7
00:00:19.199 --> 00:00:21.399
<v Speaker 1>more like drinking from a fire hose.

8
00:00:21.559 --> 00:00:24.440
<v Speaker 2>That's a perfect description. It's famously scattered, right.

9
00:00:24.800 --> 00:00:28.600
<v Speaker 1>So our source today, this book, LLVM Techniques, Tips, and

10
00:00:28.640 --> 00:00:32.320
<v Speaker 1>Best Practices, promises to sort of rein that beast in.

11
00:00:32.520 --> 00:00:34.479
<v Speaker 1>So what's our big mission for you today? What are

12
00:00:34.520 --> 00:00:35.679
<v Speaker 1>we trying to achieve here?

13
00:00:35.920 --> 00:00:39.119
<v Speaker 2>Well, our mission is really to give you this streamlined,

14
00:00:39.880 --> 00:00:44.280
<v Speaker 2>comprehensive overview to cut through all that documentation sprawl. We're

15
00:00:44.280 --> 00:00:47.759
<v Speaker 2>going to unearth some key techniques, maybe some surprising insights,

16
00:00:48.039 --> 00:00:51.560
<v Speaker 2>to help you build, test, optimize, and, importantly, debug your

17
00:00:51.719 --> 00:00:55.320
<v Speaker 2>LLVM projects much more efficiently. Fewer headaches.

18
00:00:55.640 --> 00:00:56.880
<v Speaker 1>Fewer headaches are always good.

19
00:00:57.079 --> 00:01:00.880
<v Speaker 2>Definitely. Think of this deep dive as, like, your shortcut,

20
00:01:01.079 --> 00:01:05.319
<v Speaker 2>making LLVM development faster, more reliable, and, well, more insightful.

21
00:01:05.680 --> 00:01:08.439
<v Speaker 2>We're aiming straight at those common pain points, those really

22
00:01:08.519 --> 00:01:13.560
<v Speaker 2>long build times, complex testing, opaque debugging, that kind of thing.

23
00:01:13.640 --> 00:01:15.599
<v Speaker 1>Yeah, long build times. That's often the first mountain

24
00:01:15.640 --> 00:01:18.959
<v Speaker 1>you hit, isn't it? Especially with a huge project like LLVM,

25
00:01:19.480 --> 00:01:22.000
<v Speaker 1>default builds can take hours. Huge productivity killer.

26
00:01:22.079 --> 00:01:23.599
<v Speaker 2>Oh absolutely, it's a major drag.

27
00:01:24.040 --> 00:01:25.840
<v Speaker 1>So what's the secret? How do we tame this

28
00:01:25.920 --> 00:01:26.920
<v Speaker 1>build-time beast?

29
00:01:27.159 --> 00:01:31.239
<v Speaker 2>Okay, so the book immediately points to replacing slower, older tools.

30
00:01:31.359 --> 00:01:33.920
<v Speaker 2>Makes sense, right? Take build systems. Ninja just

31
00:01:33.959 --> 00:01:36.959
<v Speaker 2>runs significantly faster than, say, GNU Make on

32
00:01:37.159 --> 00:01:39.920
<v Speaker 2>big code bases. LLVM is a perfect example.

33
00:01:40.200 --> 00:01:42.519
<v Speaker 1>Why is Ninja faster? What's the magic there?

34
00:01:42.640 --> 00:01:45.799
<v Speaker 2>The key is its build.ninja script. It's almost

35
00:01:45.799 --> 00:01:48.640
<v Speaker 2>like assembly language for builds. It gets generated by higher

36
00:01:48.680 --> 00:01:51.599
<v Speaker 2>level systems like CMake, and because it's so low level,

37
00:01:51.640 --> 00:01:54.400
<v Speaker 2>it allows for loads of optimizations under the hood. Plus

38
00:01:54.400 --> 00:01:56.799
<v Speaker 2>it handles dependencies much better. You just tell CMake

39
00:01:56.879 --> 00:01:58.719
<v Speaker 2>-G Ninja. As simple as that.

40
00:01:58.760 --> 00:02:01.680
<v Speaker 1>So just one command line flag can make a big difference.

41
00:02:01.719 --> 00:02:04.239
<v Speaker 1>We're basically upgrading the engine. But it's not just the

42
00:02:04.239 --> 00:02:06.959
<v Speaker 1>build system, is it? The linker is often a problem too.

43
00:02:07.040 --> 00:02:11.400
<v Speaker 2>Precisely. The default linker, BFD. It's mature, sure, but it

44
00:02:11.479 --> 00:02:14.000
<v Speaker 2>wasn't really built for modern speed or memory needs.

45
00:02:14.039 --> 00:02:14.960
<v Speaker 1>How bad can it get?

46
00:02:15.039 --> 00:02:18.159
<v Speaker 2>It can eat up to twenty gigabytes

47
00:02:18.199 --> 00:02:20.120
<v Speaker 2>of memory building LLVM.

48
00:02:20.360 --> 00:02:22.319
<v Speaker 1>Wow. Okay, that's a bottleneck.

49
00:02:21.919 --> 00:02:25.039
<v Speaker 2>Definitely a performance hurdle. But thankfully there are much better

50
00:02:25.080 --> 00:02:28.520
<v Speaker 2>alternatives: GNU gold, which Google developed, and

51
00:02:28.759 --> 00:02:33.280
<v Speaker 2>LLVM's own linker, LLD. Yeah, LLD is often even faster,

52
00:02:33.360 --> 00:02:36.159
<v Speaker 2>and it's got experimental parallel linking too, which is pretty cool.

53
00:02:36.400 --> 00:02:40.400
<v Speaker 2>And again, easy CMake flags: -DLLVM_USE_LINKER=gold

54
00:02:41.039 --> 00:02:44.639
<v Speaker 2>or -DLLVM_USE_LINKER=lld. Okay, the speedup from

55
00:02:44.680 --> 00:02:47.639
<v Speaker 2>just those two changes Ninja and a faster linker, it's substantial,

56
00:02:47.919 --> 00:02:48.680
<v Speaker 2>really noticeable.

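NOTE
The Ninja-plus-faster-linker setup above can be sketched as a small Python helper that assembles the CMake configure command. This is a minimal sketch, not the book's script: the source path `llvm-project/llvm` is a hypothetical placeholder, and it assumes Ninja plus LLD (or gold) are already installed. `-G Ninja` and `-DLLVM_USE_LINKER` are the CMake options named in the conversation.
```python
import shlex
def configure_cmd(source_dir, linker="lld"):
    """Return a CMake configure command using Ninja and a faster linker (sketch)."""
    if linker not in ("lld", "gold"):
        raise ValueError("expected 'lld' or 'gold'")
    return ["cmake", "-G", "Ninja", f"-DLLVM_USE_LINKER={linker}", source_dir]
# Hypothetical checkout location; adjust to your own tree.
print(shlex.join(configure_cmd("llvm-project/llvm")))
```
Passing the returned list to `subprocess.run` would execute the configure step; `shlex.join` only renders it as a shell command for reading.
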
57
00:02:48.759 --> 00:02:51.319
<v Speaker 1>That's huge just swapping out a couple of underlying tools.

58
00:02:51.439 --> 00:02:53.759
<v Speaker 1>But we can also tweak CMake itself, right?

59
00:02:53.759 --> 00:02:54.840
<v Speaker 1>Fine-tune the build arguments.

60
00:02:54.879 --> 00:02:58.520
<v Speaker 2>Absolutely. Tweaking CMake arguments is crucial for efficiency, like

61
00:02:58.960 --> 00:03:01.360
<v Speaker 2>choosing the right build type. RelWithDebInfo

62
00:03:01.479 --> 00:03:03.240
<v Speaker 2>is often the sweet spot. Why that one? Well,

63
00:03:03.280 --> 00:03:05.319
<v Speaker 2>it gives you optimized code so it's fast, but it

64
00:03:05.400 --> 00:03:07.759
<v Speaker 2>keeps the debug information so you get a great balance

65
00:03:07.800 --> 00:03:12.240
<v Speaker 2>between space, speed, and, you know, being able to actually debug.

66
00:03:12.280 --> 00:03:12.840
<v Speaker 1>It makes sense.

67
00:03:13.000 --> 00:03:15.240
<v Speaker 2>You generally want to avoid a full debug build unless

68
00:03:15.240 --> 00:03:18.879
<v Speaker 2>you absolutely have to. It just creates so much unnecessary

69
00:03:18.919 --> 00:03:21.599
<v Speaker 2>storage waste, huge binaries. And then there are targets.

70
00:03:21.879 --> 00:03:26.039
<v Speaker 1>LLVM supports, what, nearly two dozen hardware targets? Most people

71
00:03:26.039 --> 00:03:28.039
<v Speaker 1>don't need all of those, really.

72
00:03:28.080 --> 00:03:30.479
<v Speaker 2>Exactly. That's another massive time sink if you build them all.

73
00:03:30.800 --> 00:03:32.520
<v Speaker 2>You can save a ton of time by just building

74
00:03:32.520 --> 00:03:36.479
<v Speaker 2>the ones you actually need. Use -DLLVM_TARGETS_TO_BUILD.

75
00:03:36.120 --> 00:03:37.639
<v Speaker 1>How does that look?

76
00:03:37.840 --> 00:03:41.199
<v Speaker 2>Something like -DLLVM_TARGETS_TO_BUILD="X86;AArch64".

77
00:03:41.319 --> 00:03:43.879
<v Speaker 2>Just list the ones you care about.

78
00:03:43.960 --> 00:03:45.680
<v Speaker 1>And there's a catch with shells you mentioned.

79
00:03:45.680 --> 00:03:48.960
<v Speaker 2>Ah yeah, good point. In shells like Bash, you

80
00:03:49.000 --> 00:03:51.240
<v Speaker 2>have to remember the double quotes around the list, otherwise

81
00:03:51.319 --> 00:03:53.759
<v Speaker 2>the command gets cut off partway through. Little gotcha.

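NOTE
Why the quotes matter can be shown with a toy model of how a POSIX shell treats an unquoted `;` as a command separator. The splitter below is a hypothetical illustration, not real shell parsing (it ignores backslash escapes, `&&`, and so on); the `-DLLVM_TARGETS_TO_BUILD` value is the one from the conversation.
```python
def split_commands(line):
    """Split a command line at unquoted ';' the way a shell separates commands (toy model)."""
    commands, current, quote = [], "", None
    for ch in line:
        if quote:  # inside quotes, ';' is ordinary text
            if ch == quote:
                quote = None
            current += ch
        elif ch in "\"'":
            quote = ch
            current += ch
        elif ch == ";":
            commands.append(current.strip())
            current = ""
        else:
            current += ch
    commands.append(current.strip())
    return commands
print(split_commands("cmake -DLLVM_TARGETS_TO_BUILD=X86;AArch64 ../llvm"))
print(split_commands('cmake "-DLLVM_TARGETS_TO_BUILD=X86;AArch64" ../llvm'))
```
The unquoted form splits into two commands, so CMake only sees X86 and the shell then tries to run AArch64 as a separate command; the quoted form stays one command.
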
82
00:03:53.919 --> 00:03:56.199
<v Speaker 1>Good tip. What else? Shared libraries?

83
00:03:56.400 --> 00:04:01.159
<v Speaker 2>Yes, another great strategy, especially during development. Build LLVM components

84
00:04:01.199 --> 00:04:05.280
<v Speaker 2>as shared libraries.

85
00:04:05.439 --> 00:04:07.639
<v Speaker 1>Using -DBUILD_SHARED_LIBS=ON?

86
00:04:07.360 --> 00:04:10.479
<v Speaker 2>Right. Why is that better? Because LLVM is so modular,

87
00:04:10.840 --> 00:04:14.280
<v Speaker 2>building shared libraries saves a significant amount of storage space

88
00:04:14.319 --> 00:04:16.560
<v Speaker 2>and really speeds up the linking part of the build

89
00:04:16.600 --> 00:04:19.920
<v Speaker 2>process compared to static libraries. Much faster iteration.

90
00:04:20.240 --> 00:04:23.480
<v Speaker 1>Okay, and what about llvm-tblgen? That one comes

91
00:04:23.519 --> 00:04:24.600
<v Speaker 1>up a lot as being slow.

92
00:04:24.759 --> 00:04:27.399
<v Speaker 2>It does. It can really impact build times. But there's

93
00:04:27.439 --> 00:04:30.439
<v Speaker 2>a trick. You can build an optimized version of just

94
00:04:30.600 --> 00:04:33.680
<v Speaker 2>llvm-tblgen itself, even if the rest of your build

95
00:04:33.720 --> 00:04:38.399
<v Speaker 2>is in debug mode. Use -DLLVM_OPTIMIZED_TABLEGEN=ON.

96
00:04:38.399 --> 00:04:41.959
<v Speaker 1>Nice. So optimizing the tool that helps build the

97
00:04:41.879 --> 00:04:44.000
<v Speaker 2>Tools, exactly. It shaves off more time.

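NOTE
The development-oriented settings from this stretch of the conversation (RelWithDebInfo, a trimmed target list, shared libraries, an optimized llvm-tblgen) can be kept as one table of CMake cache variables. A minimal Python sketch: the variable names are the LLVM/CMake options mentioned above, while the `cmake_defines` helper is hypothetical.
```python
def cmake_defines(cache):
    """Render CMake cache variables as -D flags (sketch)."""
    flags = []
    for name, value in cache.items():
        if isinstance(value, bool):
            value = "ON" if value else "OFF"  # CMake spells booleans ON/OFF
        elif isinstance(value, (list, tuple)):
            value = ";".join(value)  # CMake list syntax
        flags.append(f"-D{name}={value}")
    return flags
dev_build = {
    "CMAKE_BUILD_TYPE": "RelWithDebInfo",
    "LLVM_TARGETS_TO_BUILD": ["X86", "AArch64"],
    "BUILD_SHARED_LIBS": True,
    "LLVM_OPTIMIZED_TABLEGEN": True,
}
print(cmake_defines(dev_build))
```
Handing these flags to `subprocess.run` as separate list elements sidesteps the quoting gotcha, because no shell ever sees the `;`.
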
98
00:04:44.160 --> 00:04:48.040
<v Speaker 1>So it really feels like we're swapping out a rusty

99
00:04:48.079 --> 00:04:50.600
<v Speaker 1>old tractor for a souped-up racing machine just by

100
00:04:50.720 --> 00:04:53.800
<v Speaker 1>changing a few settings and tools. Speaking of alternatives, the

101
00:04:53.839 --> 00:04:56.560
<v Speaker 1>source mentioned another build system, GN.

102
00:04:56.560 --> 00:04:59.959
<v Speaker 2>It does. GN, short for Generate Ninja, is used a lot by Google projects,

103
00:05:00.000 --> 00:05:00.800
<v Speaker 2>like Chromium.

104
00:05:00.879 --> 00:05:01.839
<v Speaker 1>What's its advantage?

105
00:05:01.920 --> 00:05:05.879
<v Speaker 2>It's known for really fast configuration time and reliable argument management.

106
00:05:06.160 --> 00:05:08.800
<v Speaker 2>The book says it's especially useful if your developments make

107
00:05:08.920 --> 00:05:11.560
<v Speaker 2>changes to build files, or if you're constantly trying out

108
00:05:11.560 --> 00:05:15.079
<v Speaker 2>different build options. Much quicker reconfiguration.

109
00:05:15.240 --> 00:05:17.759
<v Speaker 1>So, good for rapid iteration on the build itself?

110
00:05:17.959 --> 00:05:20.639
<v Speaker 2>Exactly. It's more of an alternative for those specific scenarios. Maybe

111
00:05:20.639 --> 00:05:23.040
<v Speaker 2>not a full replacement for everyone, but very handy when

112
00:05:23.040 --> 00:05:24.279
<v Speaker 2>you're tweaking build files a lot.

113
00:05:24.319 --> 00:05:26.720
<v Speaker 1>Okay, it makes sense. So once you've got your compiler

114
00:05:26.759 --> 00:05:30.439
<v Speaker 1>built fast, the next big hurdle is reliability testing. How

115
00:05:30.439 --> 00:05:32.720
<v Speaker 1>do you make sure it's actually, you know, correct?

116
00:05:33.079 --> 00:05:37.079
<v Speaker 2>Yeah, testing is critical, and LLVM provides its own framework

117
00:05:37.120 --> 00:05:43.040
<v Speaker 2>for this: LLVM LIT, the LLVM Integrated Tester. The book

118
00:05:43.079 --> 00:05:46.839
<v Speaker 2>calls it an easy-to-use yet general framework, and importantly,

119
00:05:47.000 --> 00:05:50.319
<v Speaker 2>while it started for LLVM's own tests, it's actually a

120
00:05:50.360 --> 00:05:53.680
<v Speaker 2>generic testing framework. You can use it outside LLVM for

121
00:05:53.759 --> 00:05:54.639
<v Speaker 2>other projects too.

122
00:05:54.879 --> 00:05:58.800
<v Speaker 1>Very versatile. And alongside LIT there's this utility, FileCheck,

123
00:05:59.360 --> 00:06:02.560
<v Speaker 1>that sounds key for compiler testing. What's special about it?

124
00:06:02.800 --> 00:06:05.560
<v Speaker 2>FileCheck is really powerful. It does advanced pattern checking

125
00:06:05.560 --> 00:06:08.040
<v Speaker 2>on output files. It goes way beyond just diffing text,

126
00:06:08.399 --> 00:06:11.839
<v Speaker 2>You embed directives right in your test files. CHECK

127
00:06:11.920 --> 00:06:15.560
<v Speaker 2>is basic pattern matching, simple enough, but then you get

128
00:06:15.560 --> 00:06:18.759
<v Speaker 2>directives like CHECK-NEXT. That makes sure a pattern

129
00:06:18.839 --> 00:06:21.399
<v Speaker 2>is found on the very next line after the previous match.

130
00:06:21.800 --> 00:06:24.040
<v Speaker 2>Super useful for checking sequential output.

131
00:06:23.519 --> 00:06:25.839
<v Speaker 1>Ah, right, controlling the order.

132
00:06:25.839 --> 00:06:30.079
<v Speaker 2>Exactly. And CHECK-SAME matches patterns that must be

133
00:06:30.079 --> 00:06:33.759
<v Speaker 2>on the exact same line. Brilliant for avoiding really long

134
00:06:34.000 --> 00:06:36.639
<v Speaker 2>messy check lines when you need multiple things on one

135
00:06:36.680 --> 00:06:38.759
<v Speaker 2>line. Keeps tests readable.

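NOTE
The ordering rules of CHECK, CHECK-NEXT, and CHECK-SAME can be pinned down with a toy Python model. This is a simplified sketch, not FileCheck itself: patterns are plain substrings (real FileCheck also supports {{regex}} blocks and variables), there are no diagnostics, and the function name and IR snippet are illustrative.
```python
def toy_filecheck(output, directives):
    """Toy model of CHECK / CHECK-NEXT / CHECK-SAME using plain substrings."""
    lines = output.splitlines()
    line, col = 0, 0  # position just past the previous match
    for kind, pattern in directives:
        if kind == "CHECK":
            # Scan forward from the previous match, across lines if needed.
            while line < len(lines):
                idx = lines[line].find(pattern, col)
                if idx != -1:
                    break
                line, col = line + 1, 0
            else:
                return False
        elif kind in ("CHECK-NEXT", "CHECK-SAME"):
            if kind == "CHECK-NEXT":
                line, col = line + 1, 0  # must match on the very next line
            if line >= len(lines):
                return False
            idx = lines[line].find(pattern, col)  # CHECK-SAME: later on the same line
            if idx == -1:
                return False
        else:
            raise ValueError(f"unsupported directive: {kind}")
        col = idx + len(pattern)
    return True
ir = "define i32 @f(ptr %a, ptr %b) {\n  %x = load i32, ptr %a\n  ret i32 %x\n}"
print(toy_filecheck(ir, [("CHECK", "define i32 @f("), ("CHECK-SAME", "ptr %b"),
                         ("CHECK-NEXT", "load i32"), ("CHECK-NEXT", "ret i32")]))
```
Because the search position only moves forward, a CHECK-SAME for `ptr %a` placed after a match on `ptr %b` fails, which is how the directives pin down order.
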
136
00:06:38.839 --> 00:06:42.000
<v Speaker 1>Yeah, I can see that. Verbose IR needs concise checks.

137
00:06:42.160 --> 00:06:44.879
<v Speaker 1>What if you want to ensure something isn't there or

138
00:06:44.879 --> 00:06:45.959
<v Speaker 1>if the order doesn't matter?

139
00:06:46.040 --> 00:06:50.680
<v Speaker 2>Good questions. For negative checks, there's CHECK-NOT. It asserts

140
00:06:50.680 --> 00:06:53.240
<v Speaker 2>a pattern does not exist. Really handy for saying, okay,

141
00:06:53.240 --> 00:06:55.800
<v Speaker 2>I expect Y, but I definitely don't want to see X.

142
00:06:56.079 --> 00:06:58.759
<v Speaker 1>Makes sense asserting the absence of something.

143
00:06:58.560 --> 00:07:01.399
<v Speaker 2>And for when the order might change, maybe due to optimizations

144
00:07:01.399 --> 00:07:05.000
<v Speaker 2>shuffling code around, you use CHECK-DAG. That stands for

145
00:07:05.040 --> 00:07:07.800
<v Speaker 2>directed acyclic graph, but here it means it allows

146
00:07:07.800 --> 00:07:12.920
<v Speaker 2>matching text in arbitrary order. Super flexible for output whose order isn't fixed.

147
00:07:13.160 --> 00:07:16.120
<v Speaker 1>Wow, CHECK-DAG. That's really flexible. It seems like you

148
00:07:16.120 --> 00:07:18.839
<v Speaker 1>can test the intent behind the code changes, not just

149
00:07:18.879 --> 00:07:20.160
<v Speaker 1>the literal output strings.

150
00:07:20.279 --> 00:07:22.600
<v Speaker 2>That's exactly the point. It's about semantic checking, not just

151
00:07:22.600 --> 00:07:23.439
<v Speaker 2>textual matching.

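NOTE
CHECK-DAG and CHECK-NOT can be modeled the same way. A simplified Python sketch, not FileCheck's implementation: it treats the whole output as one region, requires each DAG pattern to claim a different line (real FileCheck only forbids overlapping matches), picks matches greedily, and uses plain substrings; the function name and IR lines are hypothetical.
```python
def toy_dag_and_not(lines, dag_patterns, not_patterns):
    """Every DAG pattern must match some unused line, in any order; no NOT pattern may appear."""
    used = set()
    for pattern in dag_patterns:
        hit = next((i for i, text in enumerate(lines) if i not in used and pattern in text), None)
        if hit is None:
            return False
        used.add(hit)  # greedy: the first free matching line is claimed
    return not any(p in text for p in not_patterns for text in lines)
optimized = ["%1 = add i32 %b, 1", "%0 = add i32 %a, 1", "ret i32 %0"]
# The two adds may come out in either order, but a mul must never appear.
print(toy_dag_and_not(optimized, ["add i32 %a", "add i32 %b"], ["mul"]))
```
Real FileCheck scopes a CHECK-NOT to the text between the surrounding positive matches, so a pattern can be forbidden in one place and allowed elsewhere; this sketch skips that detail.
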
152
00:07:23.720 --> 00:07:27.720
<v Speaker 1>So, speaking of describing intent and structure, compilers deal with

153
00:07:27.879 --> 00:07:33.639
<v Speaker 1>incredibly complex structured data: instruction sets, optimization rules. How does

154
00:07:33.920 --> 00:07:37.759
<v Speaker 1>LLVM handle describing that efficiently? Is that TableGen?

155
00:07:37.839 --> 00:07:38.279
<v Speaker 2>You've nailed it.

156
00:07:38.360 --> 00:07:41.519
<v Speaker 2>TableGen is the answer there. It's a domain-specific language,

157
00:07:41.519 --> 00:07:45.959
<v Speaker 2>a DSL. Yeah. It originally started within LLVM for

158
00:07:46.040 --> 00:07:49.079
<v Speaker 2>describing things like processor instruction sets, the ISA, and

159
00:07:49.120 --> 00:07:52.639
<v Speaker 2>other hardware details, but its use has just exploded.

160
00:07:52.879 --> 00:07:54.480
<v Speaker 1>How so? What else is it used for?

161
00:07:54.759 --> 00:07:59.319
<v Speaker 2>Oh, everything: managing Clang's command-line options, defining complex optimization

162
00:07:59.439 --> 00:08:02.839
<v Speaker 2>rules like the InstCombine peephole optimizations. The book says

163
00:08:02.839 --> 00:08:06.079
<v Speaker 2>it's basically for any task that involves non-trivial static

164
00:08:06.120 --> 00:08:07.759
<v Speaker 2>and structural data.

165
00:08:07.360 --> 00:08:10.120
<v Speaker 1>So much broader than just hardware. Now it's a general

166
00:08:10.160 --> 00:08:12.240
<v Speaker 1>tool for this kind of static data. Can you give

167
00:08:12.279 --> 00:08:14.199
<v Speaker 1>us a quick feel for the syntax? How does it work?

168
00:08:14.399 --> 00:08:17.319
<v Speaker 2>Sure. At its core, you define a class. Think of

169
00:08:17.360 --> 00:08:20.000
<v Speaker 2>it like a C++ struct. It defines a layout,

170
00:08:20.199 --> 00:08:22.720
<v Speaker 2>fields and types. Then you use def to create an

171
00:08:22.720 --> 00:08:24.879
<v Speaker 2>instance of that class called a record.

172
00:08:24.839 --> 00:08:27.199
<v Speaker 1>Like creating an object from a class blueprint?

173
00:08:27.759 --> 00:08:31.279
<v Speaker 2>Exactly. And you can override specific fields in that instance using

174
00:08:31.279 --> 00:08:32.080
<v Speaker 2>the let keyword.

175
00:08:32.279 --> 00:08:35.320
<v Speaker 1>Okay, and what about these bang operators I've heard about,

176
00:08:35.799 --> 00:08:39.120
<v Speaker 1>!add, !mul?

177
00:08:39.559 --> 00:08:41.919
<v Speaker 2>Ah, yes. Those aren't run-time functions. They're more like macros that

178
00:08:41.960 --> 00:08:44.840
<v Speaker 2>get evaluated at build time. Yeah, so you

179
00:08:44.879 --> 00:08:48.480
<v Speaker 2>can do simple computations right in the TableGen file itself.

180
00:08:48.759 --> 00:08:51.840
<v Speaker 2>The example given is !mul with a kilogram value and 1000, to

181
00:08:51.879 --> 00:08:55.360
<v Speaker 2>maybe convert units. It happens when TableGen runs, not

182
00:08:55.480 --> 00:08:56.639
<v Speaker 2>when the compiler runs.

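NOTE
As a rough Python analogy for class, def, let, and a bang operator: a frozen dataclass plays the role of a TableGen class (a fixed layout of typed fields with defaults), each instance is a record, `dataclasses.replace` stands in for overriding a field with let, and a product computed once when the record is built mimics !mul being folded when TableGen runs. All names are hypothetical and this is not TableGen syntax.
```python
from dataclasses import dataclass, replace
@dataclass(frozen=True)
class Ingredient:
    """Plays the role of a TableGen class: named, typed fields, some with defaults."""
    name: str
    grams: int = 0
salt = Ingredient("salt")  # a record that keeps the class default
extra_salt = replace(salt, grams=5)  # like `let grams = 5;` overriding one field
KILOGRAM = 1
flour = Ingredient("flour", grams=KILOGRAM * 1000)  # computed once, like !mul at TableGen time
print(salt, extra_salt, flour)
```
The analogy breaks down in one place: TableGen records are all resolved before any backend reads them, whereas Python builds objects line by line.
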
183
00:08:56.519 --> 00:08:59.600
<v Speaker 1>Clever, build-time computation. What if you have lots

184
00:08:59.639 --> 00:09:01.200
<v Speaker 1>of similar records, is there a shortcut?

185
00:09:01.320 --> 00:09:03.519
<v Speaker 2>Yes, that's where multiclass comes in. It's a way

186
00:09:03.559 --> 00:09:06.320
<v Speaker 2>to define multiple records at once by factoring out common

187
00:09:06.360 --> 00:09:09.559
<v Speaker 2>parameters, like a template, sort of. Yeah. The book uses

188
00:09:09.600 --> 00:09:12.480
<v Speaker 2>an auto part and car example. You define a multiclass

189
00:09:12.480 --> 00:09:15.240
<v Speaker 2>for parts, then use defm to instantiate multiple cars, and

190
00:09:15.279 --> 00:09:18.639
<v Speaker 2>it automatically generates all the individual part records, like car 1's

191
00:09:18.679 --> 00:09:21.679
<v Speaker 2>fuel tank, car 2's engine, et cetera, from one definition.

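NOTE
The multiclass-plus-defm idea, one template stamped out once per prefix, looks roughly like this in Python. A hypothetical sketch: the part list and the prefix_part naming are illustrative, not the book's exact records or TableGen's exact name-concatenation rules.
```python
def multiclass(parts):
    """Return a 'defm' function that stamps out one record per part for a given prefix."""
    def defm(prefix):
        return {f"{prefix}_{part}": {"car": prefix, "part": part} for part in parts}
    return defm
AutoParts = multiclass(["fuel_tank", "engine", "wheels"])
records = {}
for car in ("car1", "car2"):
    records.update(AutoParts(car))  # like one defm line per car
print(sorted(records))
```
Two instantiations of a three-part template yield six records, which is the boilerplate saving the conversation describes.
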
192
00:09:21.879 --> 00:09:25.759
<v Speaker 1>Very concise. Nice, less boilerplate. Right, and complex relationships?

193
00:09:25.840 --> 00:09:29.200
<v Speaker 2>Yeah, graphs. For that, TableGen has a specific DAG data

194
00:09:29.240 --> 00:09:33.320
<v Speaker 2>type that lets you define directed acyclic graph instances explicitly,

195
00:09:33.840 --> 00:09:37.799
<v Speaker 2>super important for things like instruction selection patterns or optimization

196
00:09:37.919 --> 00:09:40.679
<v Speaker 2>rules where you have dependencies. You can even use tags

197
00:09:40.759 --> 00:09:42.559
<v Speaker 2>using the dollar-sign syntax, to give parts of the

198
00:09:42.639 --> 00:09:43.960
<v Speaker 2>DAG logical names.

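NOTE
A TableGen DAG value is an operator applied to arguments, where each argument can carry a $name tag, as in (add (mul GPR:$a, GPR:$b), GPR:$c). The Python sketch below models that shape; the class and the example pattern are hypothetical illustrations, not LLVM's own data structures.
```python
from dataclasses import dataclass
@dataclass
class Dag:
    """An operator plus (value, name) arguments; a value may itself be a nested Dag."""
    operator: str
    args: list
    def names(self):
        """Collect every $name tag, depth first, in argument order."""
        found = []
        for value, name in self.args:
            if isinstance(value, Dag):
                found.extend(value.names())
            if name is not None:
                found.append(name)
        return found
# Models (add (mul GPR:$a, GPR:$b), GPR:$c), a multiply-add shape a selector might match.
pattern = Dag("add", [(Dag("mul", [("GPR", "a"), ("GPR", "b")]), None), ("GPR", "c")])
print(pattern.names())
```
The names are what let a pattern refer to the same operand on both sides of a rewrite rule.
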
199
00:09:43.720 --> 00:09:46.879
<v Speaker 1>A DAG type built in. Yeah, that's powerful, and the

200
00:09:46.960 --> 00:09:51.039
<v Speaker 1>source uses this amazing analogy to make it concrete, right?

201
00:09:51.200 --> 00:09:52.759
<v Speaker 1>A doughnut recipe.

202
00:09:52.799 --> 00:09:56.000
<v Speaker 2>It does. It's a brilliant example. The book uses a delicious doughnut

203
00:09:56.080 --> 00:09:58.000
<v Speaker 2>recipe to show TableGen's power.

204
00:09:58.159 --> 00:10:00.639
<v Speaker 1>How does that work? A doughnut recipe in a compiler book?

205
00:10:00.720 --> 00:10:04.240
<v Speaker 2>It defines unit classes like GramUnit, then ingredient

206
00:10:04.279 --> 00:10:07.919
<v Speaker 2>base records, and finally step records. These step records form

207
00:10:07.960 --> 00:10:11.320
<v Speaker 2>a DAG representing the cooking actions, make this, add that,

208
00:10:11.480 --> 00:10:14.679
<v Speaker 2>complete with ingredients and amounts. It's a perfect analogy

209
00:10:14.679 --> 00:10:17.879
<v Speaker 2>because it takes this abstract idea of describing structured data

210
00:10:17.919 --> 00:10:20.440
<v Speaker 2>and makes it totally tangible. You immediately see how it

211
00:10:20.480 --> 00:10:23.879
<v Speaker 2>parallels describing instruction patterns or optimization steps.

212
00:10:23.720 --> 00:10:27.360
<v Speaker 1>A doughnut recipe in compiler engineering. That's definitely an aha moment,

213
00:10:27.519 --> 00:10:30.919
<v Speaker 1>makes total sense. So, okay, you've described your doughnut recipe

214
00:10:30.960 --> 00:10:34.039
<v Speaker 1>or your instruction set in TableGen. How do you

215
00:10:34.080 --> 00:10:36.480
<v Speaker 1>actually use that data, like printing the recipe?

216
00:10:36.600 --> 00:10:39.320
<v Speaker 2>Right. For that, you need a custom TableGen backend.

217
00:10:39.600 --> 00:10:43.759
<v Speaker 2>Now, an important distinction: this isn't an LLVM backend like

218
00:10:43.799 --> 00:10:45.120
<v Speaker 2>for generating machine code.

219
00:10:45.240 --> 00:10:46.799
<v Speaker 1>Different kind of backend?

220
00:10:47.080 --> 00:10:49.320
<v Speaker 2>Totally different. A TableGen backend is a piece of code,

221
00:10:49.519 --> 00:10:54.480
<v Speaker 2>usually C++, that converts, or transpiles, TableGen

222
00:10:54.519 --> 00:10:57.519
<v Speaker 2>files into arbitrary textual content.

223
00:10:57.039 --> 00:10:59.639
<v Speaker 1>Arbitrary? So, anything?

224
00:10:59.720 --> 00:11:02.759
<v Speaker 2>Pretty much. It could generate a C++ header file, documentation,

225
00:11:03.039 --> 00:11:05.080
<v Speaker 2>or, in the doughnut example, just plain text for the

226
00:11:05.080 --> 00:11:09.519
<v Speaker 2>recipe. You use C++ APIs provided by TableGen,

227
00:11:09.639 --> 00:11:12.440
<v Speaker 2>like RecordKeeper::getAllDerivedDefinitions to get

228
00:11:12.440 --> 00:11:15.440
<v Speaker 2>all the defined steps, and Record::getValueAsString

229
00:11:15.559 --> 00:11:18.720
<v Speaker 2>to pull out specific values like ingredient names or amounts.

230
00:11:18.960 --> 00:11:21.759
<v Speaker 1>So you turn the TableGen structure into usable code or data.

231
00:11:21.799 --> 00:11:25.080
<v Speaker 2>Exactly, you transform the structured description into whatever format you

232
00:11:25.120 --> 00:11:25.919
<v Speaker 2>need downstream.

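NOTE
What a TableGen backend does for the doughnut recipe can be sketched in Python. The Record and RecordKeeper classes below are hypothetical stand-ins whose two methods mimic the roles of the C++ RecordKeeper::getAllDerivedDefinitions and Record::getValueAsString; the field names (action, what) and the records are invented for illustration, and records come back in list order as a simplification.
```python
class Record:
    def __init__(self, name, superclasses, fields):
        self.name, self.superclasses, self.fields = name, set(superclasses), fields
    def get_value_as_string(self, field):
        """Mimics Record::getValueAsString: the field must exist and be a string."""
        value = self.fields[field]
        if not isinstance(value, str):
            raise TypeError(f"{self.name}.{field} is not a string")
        return value
class RecordKeeper:
    def __init__(self, records):
        self.records = records
    def get_all_derived_definitions(self, class_name):
        """Mimics RecordKeeper::getAllDerivedDefinitions: every record deriving from class_name."""
        return [r for r in self.records if class_name in r.superclasses]
def emit_recipe(keeper):
    """A toy backend: turn Step records into numbered plain-text instructions."""
    steps = keeper.get_all_derived_definitions("Step")
    return "\n".join(
        f"{n}. {s.get_value_as_string('action')} {s.get_value_as_string('what')}"
        for n, s in enumerate(steps, start=1)
    )
keeper = RecordKeeper([
    Record("flour", ["Ingredient"], {"name": "flour"}),
    Record("step1", ["Step"], {"action": "Mix", "what": "flour and sugar."}),
    Record("step2", ["Step"], {"action": "Fry", "what": "the dough."}),
])
print(emit_recipe(keeper))
```
The backend never parses TableGen syntax itself; it only queries already-built records and writes text, which is why the same records can feed many different backends.
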
233
00:11:26.120 --> 00:11:29.440
<v Speaker 1>That ability to generate code is huge. Yeah, it feels

234
00:11:29.519 --> 00:11:33.240
<v Speaker 1>like that opens the door to extending Clang itself, maybe

235
00:11:33.240 --> 00:11:35.320
<v Speaker 1>injecting custom logic into the frontend.

236
00:11:35.399 --> 00:11:38.759
<v Speaker 2>It absolutely does. The frontend is a prime place

237
00:11:38.799 --> 00:11:42.879
<v Speaker 2>for customization. Think about the preprocessor, the very first stage

238
00:11:42.919 --> 00:11:46.919
<v Speaker 2>handling macros and includes. Can you customize that? Oh yeah, you

239
00:11:46.960 --> 00:11:50.919
<v Speaker 2>can write custom pragma handler extensions, so you can invent

240
00:11:51.000 --> 00:11:55.000
<v Speaker 2>your own #pragma directives. The book shows an example:

241
00:11:55.080 --> 00:11:56.919
<v Speaker 2>#pragma macro_arg_guard.

242
00:11:57.240 --> 00:11:57.919
<v Speaker 1>What would that do?

243
00:11:58.279 --> 00:12:01.559
<v Speaker 2>Well, when the preprocessor sees your pragma, your handler code runs.

244
00:12:01.960 --> 00:12:04.759
<v Speaker 2>It can parse the Pragma arguments and even register something

245
00:12:04.759 --> 00:12:08.279
<v Speaker 2>called PPCallbacks. Callbacks, yeah. PPCallbacks let you hook

246
00:12:08.279 --> 00:12:12.000
<v Speaker 2>into various preprocessor events, so you can insert custom logic

247
00:12:12.039 --> 00:12:15.840
<v Speaker 2>whenever a preprocessor event happens. In the example, a macro guard

248
00:12:15.919 --> 00:12:19.120
<v Speaker 2>validator uses the MacroDefined callback to automatically check if

249
00:12:19.240 --> 00:12:22.080
<v Speaker 2>arguments in certain macros are properly wrapped in parentheses.

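NOTE
The parenthesization rule the macro guard validator enforces can be expressed as a plain Python function over a macro body. This is a hypothetical sketch of the check only: a real implementation would run inside a PPCallbacks::MacroDefined override and walk the tokens of Clang's MacroInfo rather than re-tokenizing a string with a regex.
```python
import re
def unguarded_params(params, body):
    """Return macro parameters used in `body` without being wrapped as `(param)`."""
    tokens = re.findall(r"[A-Za-z_]\w*|\S", body)  # identifiers, else single characters
    bad = []
    for i, tok in enumerate(tokens):
        if tok in params and tok not in bad:
            before = tokens[i - 1] if i > 0 else None
            after = tokens[i + 1] if i + 1 < len(tokens) else None
            if not (before == "(" and after == ")"):
                bad.append(tok)
    return bad
# #define SQUARE(x) x * x leaves x unguarded: SQUARE(1 + 2) expands to 1 + 2 * 1 + 2
print(unguarded_params(["x"], "x * x"))
print(unguarded_params(["x"], "(x) * (x)"))
```
Reporting each offending parameter once keeps the warning readable even when a parameter appears many times in the body.
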
250
00:12:22.120 --> 00:12:24.320
<v Speaker 1>Wow, that's fine-grained control, right at the start. What

251
00:12:24.360 --> 00:12:26.960
<v Speaker 1>about the driver, the thing that orchestrates GCC or Clang?

252
00:12:27.039 --> 00:12:28.200
<v Speaker 1>Can you customize that too?

253
00:12:28.639 --> 00:12:32.039
<v Speaker 2>You can. The driver is basically the dispatcher, right? It

254
00:12:32.080 --> 00:12:35.879
<v Speaker 2>parses flags and manages the different compilation phases. And guess

255
00:12:35.879 --> 00:12:38.840
<v Speaker 2>what Clang uses to define its driver flags.

256
00:12:39.200 --> 00:12:40.639
<v Speaker 1>Let me guess: TableGen.

257
00:12:40.759 --> 00:12:44.919
<v Speaker 2>You got it, TableGen again. You can declare custom flags,

258
00:12:45.080 --> 00:12:49.080
<v Speaker 2>even paired flags like -fsomething and -fno-something, using

259
00:12:49.120 --> 00:12:51.799
<v Speaker 2>things like the BooleanFFlag multiclass in TableGen.

260
00:12:52.200 --> 00:12:54.919
<v Speaker 1>So you define your flag in TableGen and Clang

261
00:12:55.039 --> 00:12:56.320
<v Speaker 1>understands it?

262
00:12:56.559 --> 00:12:59.399
<v Speaker 2>Exactly. The source gives an example of a custom -fuse-simple-log

263
00:12:59.440 --> 00:13:02.799
<v Speaker 2>flag. Defining this in TableGen allows the driver to

264
00:13:02.840 --> 00:13:05.600
<v Speaker 2>recognize it, and then your custom logic can make it

265
00:13:05.679 --> 00:13:09.279
<v Speaker 2>implicitly include a specific header, simple_log.h, and maybe

266
00:13:09.279 --> 00:13:12.320
<v Speaker 2>define macros to control log levels, all driven by that

267
00:13:12.360 --> 00:13:12.960
<v Speaker 2>one flag.

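NOTE
Two conventions behind the -fuse-simple-log example can be sketched in Python: for a paired -f/-fno- flag the last occurrence on the command line wins, and the driver then turns that decision into frontend arguments. The -fno-use-simple-log spelling of the negative form and the `-include simple_log.h` translation below are assumptions for illustration, not the book's exact code.
```python
def resolve_simple_log(argv):
    """Return extra frontend args implied by the (hypothetical) simple-log flag pair."""
    enabled = False
    for arg in argv:  # last occurrence wins, as with Clang's paired -f/-fno- flags
        if arg == "-fuse-simple-log":
            enabled = True
        elif arg == "-fno-use-simple-log":
            enabled = False
    # Assumed translation: implicitly include the logging header.
    return ["-include", "simple_log.h"] if enabled else []
print(resolve_simple_log(["-O2", "-fuse-simple-log"]))
print(resolve_simple_log(["-fuse-simple-log", "-fno-use-simple-log"]))
```
`-include` is the standard Clang and GCC option for implicitly including a header before the main source file.
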
268
00:13:13.000 --> 00:13:16.159
<v Speaker 1>That's really neat: centralized control via a custom flag. But can

269
00:13:16.200 --> 00:13:19.879
<v Speaker 1>you go even deeper, like fundamentally change how Clang interacts

270
00:13:19.879 --> 00:13:22.519
<v Speaker 1>with the system's tools, make it output something totally different?

271
00:13:22.600 --> 00:13:26.440
<v Speaker 2>You absolutely can, using custom toolchains. The toolchain normally adapts

272
00:13:26.440 --> 00:13:29.480
<v Speaker 2>Clang for different platforms, like different OSes or architectures, but

273
00:13:29.519 --> 00:13:31.919
<v Speaker 2>you can make it do completely custom things. Like what?

274
00:13:32.200 --> 00:13:35.039
<v Speaker 2>The book has this fantastic, almost wild example called the

275
00:13:35.159 --> 00:13:38.879
<v Speaker 2>Zipline toolchain. It's a demo, obviously, but it shows the

276
00:13:38.919 --> 00:13:39.840
<v Speaker 2>power.

277
00:13:39.840 --> 00:13:40.320
<v Speaker 1>Zipline? What does it do?

278
00:13:40.679 --> 00:13:43.440
<v Speaker 2>Instead of normal compilation, it uses Clang, but then it

279
00:13:44.480 --> 00:13:47.799
<v Speaker 2>encodes the generated assembly code using base64 during

280
00:13:47.799 --> 00:13:49.159
<v Speaker 2>the assembling phase.

281
00:13:49.200 --> 00:13:51.639
<v Speaker 1>Base64? Why? Just to show it can?

282
00:13:51.799 --> 00:13:55.279
<v Speaker 2>And then, during the linking phase, it packages those base64

283
00:13:55.320 --> 00:13:57.840
<v Speaker 2>files into a ZIP archive.

284
00:13:58.039 --> 00:14:00.480
<v Speaker 1>Okay, that is wild. How does it do that?

285
00:14:00.720 --> 00:14:04.120
<v Speaker 2>Through the toolchain definition, you override methods. AddClangSystemIncludeArgs

286
00:14:04.120 --> 00:14:07.559
<v Speaker 2>can add custom include paths. buildAssembler

287
00:14:07.600 --> 00:14:10.279
<v Speaker 2>gets overridden to call openssl base64

288
00:14:10.360 --> 00:14:13.440
<v Speaker 2>instead of the normal assembler, and buildLinker gets overridden

289
00:14:13.480 --> 00:14:15.879
<v Speaker 2>to call zip or tar instead of the linker.

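NOTE
The Zipline data flow can be imitated in a few lines of Python. This sketch swaps the external openssl base64 and zip commands for Python's base64 and zipfile modules, and the function and file names are hypothetical; it shows only the pipeline (assembly text, then base64 text, then a ZIP archive), not Clang's ToolChain API.
```python
import base64
import io
import zipfile
def zipline_assemble(asm_text):
    """Stand-in for the overridden assembler step: base64-encode assembly text."""
    return base64.b64encode(asm_text.encode()).decode("ascii")
def zipline_link(objects):
    """Stand-in for the overridden linker step: pack the encoded files into a ZIP archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in objects.items():
            zf.writestr(name, data)
    return buf.getvalue()
objs = {"main.o": zipline_assemble("main:\n  ret\n")}
archive = zipline_link(objs)
with zipfile.ZipFile(io.BytesIO(archive)) as zf:
    print(base64.b64decode(zf.read("main.o")).decode())  # round-trips to the original assembly
```
In the real toolchain each step is an external command the driver schedules; the point is that the driver only needs some tool per phase, whatever that tool actually does.
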
290
00:14:16.080 --> 00:14:20.120
<v Speaker 1>Wow. So you're completely replacing standard build steps with custom commands.

291
00:14:20.200 --> 00:14:23.399
<v Speaker 2>Exactly. It perfectly illustrates how deeply you can customize the

292
00:14:23.519 --> 00:14:24.679
<v Speaker 2>entire pipeline if you need to.

293
00:14:24.759 --> 00:14:26.960
<v Speaker 1>So if you thought compilers were a black box, definitely

294
00:14:27.000 --> 00:14:30.240
<v Speaker 1>think again. We're not just peeking inside. We're fundamentally changing

295
00:14:30.320 --> 00:14:32.720
<v Speaker 1>how they work, how they talk to the OS. That

296
00:14:32.799 --> 00:14:36.840
<v Speaker 1>level of control must open up amazing possibilities for

297
00:14:36.879 --> 00:14:38.639
<v Speaker 1>optimization and analysis, right?

298
00:14:38.960 --> 00:14:43.399
<v Speaker 2>Absolutely, that's where the real power of LLVM shines. Sophisticated

299
00:14:43.440 --> 00:14:48.399
<v Speaker 2>optimizations need deep program understanding, and this happens primarily in LLVM IR.

300
00:14:48.120 --> 00:14:50.600
<v Speaker 1>Right, the intermediate representation.

301
00:14:50.200 --> 00:14:54.039
<v Speaker 2>Exactly. It's the target-independent intermediate representation. It's the core

302
00:14:54.120 --> 00:14:59.320
<v Speaker 2>of the entire LLVM framework where most analysis and transformation happens.

303
00:14:59.399 --> 00:15:02.519
<v Speaker 1>And the mechanism for doing these transformations is passes, right?

304
00:15:02.600 --> 00:15:05.720
<v Speaker 1>What's a pass, and what's the deal with this new pass manager?

305
00:15:06.120 --> 00:15:10.240
<v Speaker 2>Think of an LLVM pass as a module, a basic

306
00:15:10.360 --> 00:15:13.919
<v Speaker 2>unit that performs certain actions against LLVM IR, like one

307
00:15:14.000 --> 00:15:15.799
<v Speaker 2>step on a factory assembly line.

308
00:15:15.399 --> 00:15:17.159
<v Speaker 1>Okay, a modular step.

309
00:15:17.000 --> 00:15:20.080
<v Speaker 2>Right. And the new pass manager is a significant redesign compared

310
00:15:20.120 --> 00:15:23.279
<v Speaker 2>to the older system. The book highlights it runs faster

311
00:15:23.399 --> 00:15:26.080
<v Speaker 2>and generates results with better quality, partly due to a

312
00:15:26.120 --> 00:15:27.039
<v Speaker 2>cleaner interface.

313
00:15:27.159 --> 00:15:29.200
<v Speaker 1>Can you give an example of a simple pass?

314
00:15:28.960 --> 00:15:31.960
<v Speaker 2>Sure. The source shows a StrictOpt pass. Its goal

315
00:15:32.039 --> 00:15:35.000
<v Speaker 2>is simple: add the noalias attribute to function arguments that

316
00:15:35.039 --> 00:15:36.480
<v Speaker 2>are pointers.

317
00:15:36.519 --> 00:15:37.720
<v Speaker 1>noalias? What does that tell the compiler?

318
00:15:38.039 --> 00:15:42.200
<v Speaker 2>It's a powerful hint. It asserts that the pointer doesn't alias,

319
00:15:42.360 --> 00:15:44.960
<v Speaker 2>meaning it doesn't point to the same memory location as

320
00:15:45.000 --> 00:15:48.080
<v Speaker 2>any other pointer accessible in that scope. This lets the

321
00:15:48.080 --> 00:15:52.080
<v Speaker 2>optimizer be much more aggressive, assuming less potential overlap, which

322
00:15:52.120 --> 00:15:54.039
<v Speaker 2>can unlock significant speedups.

323
00:15:54.240 --> 00:15:56.639
<v Speaker 1>How does the pass know what other passes have done?

324
00:15:56.720 --> 00:15:59.080
<v Speaker 2>Ah, that's key to the new manager. When you write

325
00:15:59.080 --> 00:16:02.000
<v Speaker 2>a pass, you have to declare which analyses it preserves.

326
00:16:02.360 --> 00:16:06.279
<v Speaker 2>You use PreservedAnalyses. So if your pass adds noalias,

327
00:16:06.639 --> 00:16:10.600
<v Speaker 2>it might invalidate alias analysis results. You tell the manager,

328
00:16:10.879 --> 00:16:13.720
<v Speaker 2>for example, that AAManager results are no longer valid.

329
00:16:13.919 --> 00:16:16.320
<v Speaker 1>So you explicitly state what your pass.

330
00:16:16.159 --> 00:16:18.879
<v Speaker 2>breaks? Well, rather what it doesn't break. By default, it

331
00:16:18.919 --> 00:16:22.840
<v Speaker 2>assumes you break everything; you specify what's preserved. This avoids

332
00:16:22.919 --> 00:16:26.120
<v Speaker 2>costly recomputation of analyses that are still perfectly valid. It's

333
00:16:26.159 --> 00:16:28.600
<v Speaker 2>like a librarian keeping track. Much more efficient.
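
NOTE A minimal Python sketch of the PreservedAnalyses contract described above. It is not LLVM's C++ API. A pass reports which cached analyses are still valid, the default keeps nothing, and every result not listed is dropped. The class, the helper, and the analysis keys are hypothetical. In real C++ a pass would return PreservedAnalyses::none(), PreservedAnalyses::all(), or a set built with calls such as PA.preserveSet<CFGAnalyses>().
```python
from dataclasses import dataclass, field
@dataclass
class Preserved:
    """Toy stand-in for the idea behind LLVM's PreservedAnalyses."""
    names: set = field(default_factory=set)  # analyses explicitly kept
    keep_all: bool = False                    # True models PreservedAnalyses::all()
    @staticmethod
    def all():
        return Preserved(keep_all=True)
    @staticmethod
    def none():
        # The conservative default: nothing survives unless listed.
        return Preserved()
    def preserve(self, name):
        self.names.add(name)
        return self
def apply_pass_result(cache, preserved):
    """Drop every cached analysis result the pass did not declare preserved."""
    if preserved.keep_all:
        return dict(cache)
    return {k: v for k, v in cache.items() if k in preserved.names}
# Hypothetical: a StrictOpt-like pass only adds attributes, so the CFG-based
# analyses stay valid, but aliasing facts change and AAManager must be dropped.
cache = {"DominatorTree": "dt", "AAManager": "aa", "LoopInfo": "li"}
after = apply_pass_result(cache, Preserved.none().preserve("DominatorTree").preserve("LoopInfo"))
```
Only AAManager is recomputed the next time someone asks for it. The CFG-derived results are reused as they are.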

334
00:16:28.759 --> 00:16:31.720
<v Speaker 1>Makes sense. So passes are the workers, but they need

335
00:16:31.799 --> 00:16:35.000
<v Speaker 1>information to do complex jobs. They need a brain, right?

336
00:16:35.200 --> 00:16:36.559
<v Speaker 1>Is that the analysis manager?

337
00:16:36.600 --> 00:16:40.360
<v Speaker 2>Precisely, you nailed it. Modern compiler optimizations can be complex.

338
00:16:40.559 --> 00:16:43.600
<v Speaker 2>They require lots of information, and often that information

339
00:16:43.720 --> 00:16:45.000
<v Speaker 2>is expensive to compute.

340
00:16:45.159 --> 00:16:46.720
<v Speaker 1>So the analysis manager helps with that.

341
00:16:46.919 --> 00:16:50.759
<v Speaker 2>Yes, it handles all tasks related to program analysis. It

342
00:16:50.840 --> 00:16:54.960
<v Speaker 2>runs the analysis passes, and crucially caches their results so

343
00:16:54.960 --> 00:16:57.279
<v Speaker 2>they don't have to be rerun constantly.
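
NOTE A minimal Python sketch of the caching behavior described above. It assumes results are keyed by analysis name and IR unit, and it is not LLVM's implementation. In LLVM, a pass would instead call something like AM.getResult<DominatorTreeAnalysis>(F) on a FunctionAnalysisManager. The NumBlocks analysis and the string-encoded function are hypothetical.
```python
class AnalysisManager:
    """Toy analysis manager: runs each (analysis, IR unit) pair once, then serves it from a cache."""
    def __init__(self):
        self.analyses = {}  # analysis name: function(unit) returning a result
        self.cache = {}     # (analysis name, unit): cached result
        self.runs = 0       # how many times an analysis actually executed
    def register(self, name, fn):
        self.analyses[name] = fn
    def get_result(self, name, unit):
        key = (name, unit)
        if key not in self.cache:
            self.runs += 1
            self.cache[key] = self.analyses[name](unit)
        return self.cache[key]
    def invalidate(self, unit, preserved=frozenset()):
        # Forget this unit's results unless the pass declared them preserved.
        self.cache = {k: v for k, v in self.cache.items()
                      if k[1] != unit or k[0] in preserved}
am = AnalysisManager()
# Hypothetical analysis; the "function" is modeled as a string of block names.
am.register("NumBlocks", lambda fn: len(fn.split(";")))
am.get_result("NumBlocks", "entry;loop;exit")
am.get_result("NumBlocks", "entry;loop;exit")  # cache hit: no second run
```
The second query is free. After an invalidation that does not preserve NumBlocks, the next query runs the analysis again.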

344
00:16:57.720 --> 00:16:59.440
<v Speaker 1>Can you give an example of an analysis?

345
00:16:59.440 --> 00:17:02.879
<v Speaker 2>One it might manage: the source mentions a HaltAnalyzer project.

346
00:17:03.279 --> 00:17:05.839
<v Speaker 2>Its goal is to find code that's unreachable because a

347
00:17:05.880 --> 00:17:08.440
<v Speaker 2>special function like my_halt gets called earlier.

348
00:17:08.519 --> 00:17:10.000
<v Speaker 1>Okay, dead code detection.

349
00:17:10.119 --> 00:17:13.319
<v Speaker 2>Sort of, yeah. And to do this, it relies on

350
00:17:13.319 --> 00:17:18.079
<v Speaker 2>one of the fundamental analyses LLVM provides: the dominator tree.

351
00:17:18.160 --> 00:17:21.359
<v Speaker 1>Or DT. Dominator tree? How does that help find unreachable

352
00:17:21.359 --> 00:17:22.319
<v Speaker 1>code after my_halt?

353
00:17:22.640 --> 00:17:25.720
<v Speaker 2>Okay. So the dominator tree tells you about control-flow relationships.

354
00:17:25.920 --> 00:17:29.640
<v Speaker 2>If basic block A dominates basic block B, it means

355
00:17:29.680 --> 00:17:32.400
<v Speaker 2>every possible path to B must go through A first.

356
00:17:32.799 --> 00:17:33.000
<v Speaker 1>Ah.

357
00:17:33.079 --> 00:17:35.400
<v Speaker 2>I see. So if the block containing my_halt dominates

358
00:17:35.440 --> 00:17:38.599
<v Speaker 2>another block and my_halt stops execution, then that dominated

359
00:17:38.640 --> 00:17:42.720
<v Speaker 2>block is definitely unreachable. DominatorTreeAnalysis computes this tree structure,

360
00:17:42.880 --> 00:17:44.799
<v Speaker 2>and HaltAnalyzer just needs to query it.
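
NOTE A minimal Python sketch of the HaltAnalyzer idea under stated assumptions. The CFG is a dict from block name to successor names, and my_halt never returns. Dominators come from the classic iterative data-flow equations, not the faster construction algorithm LLVM's DominatorTree uses. Every block strictly dominated by a block that calls my_halt is reported as unreachable. Block names are hypothetical.
```python
def dominators(cfg, entry):
    """Iterative data-flow: dom(n) = {n} | intersection of dom(p) over predecessors p."""
    nodes = set(cfg)
    preds = {n: set() for n in nodes}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].add(n)
    dom = {n: set(nodes) for n in nodes}  # start from "everything dominates n"
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            ps = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*ps) if ps else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom
def unreachable_after_halt(cfg, entry, halt_blocks):
    """Blocks strictly dominated by a block that calls my_halt (assumed never to return)."""
    dom = dominators(cfg, entry)
    return {b for b in cfg for h in halt_blocks if h in dom[b] and b != h}
cfg = {"entry": ["then", "else"], "then": ["after"], "else": ["exit"],
       "after": ["exit"], "exit": []}
dead = unreachable_after_halt(cfg, "entry", {"then"})  # "then" calls my_halt
```
Here "after" is dead because every path to it passes through "then". The "exit" block is kept because the path through "else" bypasses the halting block.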

361
00:17:45.039 --> 00:17:48.359
<v Speaker 1>That's really clever. Leveraging fundamental graph analysis.

362
00:17:48.119 --> 00:17:52.599
<v Speaker 2>Exactly. And understanding core analyses like dominator trees is

363
00:17:52.640 --> 00:17:56.160
<v Speaker 2>what lets you build much smarter, much more effective custom

364
00:17:56.200 --> 00:18:01.039
<v Speaker 2>optimization or analysis passes. You're building on solid theoretical foundations.

365
00:18:01.079 --> 00:18:04.880
<v Speaker 1>That's a powerful concept. But okay, even with great optimizations,

366
00:18:04.920 --> 00:18:08.160
<v Speaker 1>things go wrong. You need to debug, diagnose issues, check

367
00:18:08.240 --> 00:18:11.640
<v Speaker 1>runtime behavior. What tools does LLVM offer there?

368
00:18:11.799 --> 00:18:16.000
<v Speaker 2>Right, optimization isn't everything. LLVM has some essential support utilities

369
00:18:16.200 --> 00:18:19.519
<v Speaker 2>for debugging your passes themselves. There's LLVM_DEBUG.

370
00:18:19.599 --> 00:18:20.240
<v Speaker 1>How does that work?

371
00:18:20.359 --> 00:18:24.599
<v Speaker 2>You sprinkle LLVM_DEBUG calls that write to dbgs() in your pass code.

372
00:18:24.839 --> 00:18:27.920
<v Speaker 2>These messages only get printed if you run the optimizer

373
00:18:27.960 --> 00:18:30.880
<v Speaker 2>tool with the -debug flag, or -debug-only plus your pass name. That

374
00:18:30.960 --> 00:18:34.279
<v Speaker 2>keeps your production builds clean but gives you detailed

375
00:18:34.319 --> 00:18:35.240
<v Speaker 2>logs when you need them.

376
00:18:35.480 --> 00:18:38.359
<v Speaker 1>Nice, conditional logging. What about tracking numbers, like how many

377
00:18:38.359 --> 00:18:39.759
<v Speaker 1>times an optimization fired?

378
00:18:40.000 --> 00:18:42.680
<v Speaker 2>For that, you use the STATISTIC macro. You just declare

379
00:18:42.720 --> 00:18:45.599
<v Speaker 2>STATISTIC(CounterName, "Description") and then increment CounterName in

380
00:18:45.640 --> 00:18:49.640
<v Speaker 2>your code. LLVM automatically collects and organizes these, and can

381
00:18:49.680 --> 00:18:51.160
<v Speaker 2>print them out, even in formats like

382
00:18:51.200 --> 00:18:54.559
<v Speaker 1>JSON. JSON output? That's useful for automation.

383
00:18:54.400 --> 00:18:58.960
<v Speaker 2>Very useful. It turns ad hoc counting into structured data for analysis.

384
00:18:59.279 --> 00:19:01.400
<v Speaker 2>It lets you see if your pass is actually doing what

385
00:19:01.440 --> 00:19:04.240
<v Speaker 2>you thought or hitting unexpected bottlenecks.
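
NOTE A minimal Python model of the STATISTIC workflow. It is not LLVM's implementation. In C++ you would define DEBUG_TYPE, declare STATISTIC(NumNoAlias, "Number of noalias attributes added"), bump it with ++NumNoAlias, and print totals with opt's -stats or -stats-json flags. The "debug_type.name" JSON key layout and the strict-opt debug type are assumptions of this sketch.
```python
import json
class Statistic:
    """Toy counter modeled loosely on LLVM's STATISTIC(Name, "Desc") macro."""
    registry = []  # every declared counter, in declaration order
    def __init__(self, debug_type, name, desc):
        self.debug_type, self.name, self.desc, self.value = debug_type, name, desc, 0
        Statistic.registry.append(self)
    def __iadd__(self, n):
        self.value += n
        return self
def stats_json():
    # Report only counters that fired, keyed as "<debug type>.<name>" (assumed layout).
    return json.dumps({f"{s.debug_type}.{s.name}": s.value
                       for s in Statistic.registry if s.value}, sort_keys=True)
NumNoAlias = Statistic("strict-opt", "NumNoAlias", "Number of noalias attributes added")
for arg_is_pointer in [True, False, True]:  # hypothetical function with three arguments
    if arg_is_pointer:
        NumNoAlias += 1
```
The structured output is what makes the counts easy to diff between two compiler builds or to feed into a script.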

386
00:19:04.359 --> 00:19:08.720
<v Speaker 1>Okay, what if an optimization tries to do something but fails,

387
00:19:09.200 --> 00:19:11.880
<v Speaker 1>like it wants to vectorize a loop but can't? How

388
00:19:11.920 --> 00:19:12.960
<v Speaker 1>do you find out why?

389
00:19:13.559 --> 00:19:17.519
<v Speaker 2>That's exactly what optimization remarks are for. You use

390
00:19:17.640 --> 00:19:20.640
<v Speaker 2>OptimizationRemarkEmitter in your pass to report why something happened

391
00:19:20.839 --> 00:19:21.599
<v Speaker 2>or didn't

392
00:19:21.319 --> 00:19:23.960
<v Speaker 1>happen. So, notes from the optimizer? Pretty much.

393
00:19:24.119 --> 00:19:28.279
<v Speaker 2>The book uses the example of loop-invariant code motion, LICM.

394
00:19:28.640 --> 00:19:31.039
<v Speaker 2>If it fails to hoist an instruction out of a loop,

395
00:19:31.079 --> 00:19:33.799
<v Speaker 2>it can emit a remark saying why; maybe there's a

396
00:19:33.799 --> 00:19:35.559
<v Speaker 2>potential side effect it couldn't ignore.

397
00:19:35.720 --> 00:19:37.519
<v Speaker 1>You can see these remarks? Yes.

398
00:19:37.440 --> 00:19:40.240
<v Speaker 2>And even better, there's a tool, opt-viewer.py, that

399
00:19:40.279 --> 00:19:43.079
<v Speaker 2>takes these remarks and generates a webpage. It highlights the

400
00:19:43.160 --> 00:19:45.640
<v Speaker 2>relevant source code lines and shows the remarks right next

401
00:19:45.640 --> 00:19:48.200
<v Speaker 2>to them. It's like a visual debugger for your optimizations.
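
NOTE A toy Python version of the LICM decision described above, showing where remarks come from. Each loop instruction is hoisted only if it has no side effects and all its operands are already loop-invariant. Every decision is recorded as a (kind, line, message) tuple, standing in for OptimizationRemark and OptimizationRemarkMissed. Real LICM also consults alias analysis and checks whether an instruction may trap. The instruction names and source lines here are hypothetical.
```python
from dataclasses import dataclass
@dataclass
class Inst:
    name: str               # value this instruction defines
    operands: tuple         # values it reads
    has_side_effects: bool  # e.g. a call that may write memory
    line: int               # source line a remark would point at
def licm_with_remarks(loop_body, defined_outside):
    """Hoist invariant, side-effect-free instructions; record a remark for every decision."""
    invariant = set(defined_outside)
    hoisted, remarks = [], []
    for inst in loop_body:
        if inst.has_side_effects:
            remarks.append(("missed", inst.line, f"{inst.name}: may have side effects"))
        elif all(op in invariant for op in inst.operands):
            invariant.add(inst.name)  # later instructions may now use it as invariant
            hoisted.append(inst.name)
            remarks.append(("passed", inst.line, f"{inst.name}: hoisted out of loop"))
        else:
            remarks.append(("missed", inst.line, f"{inst.name}: operand varies per iteration"))
    return hoisted, remarks
body = [Inst("t", ("a", "b"), False, 3), Inst("u", ("t", "i"), False, 4),
        Inst("call", ("t",), True, 5)]
hoisted, remarks = licm_with_remarks(body, {"a", "b"})
```
With Clang, real remarks can be written to YAML files via -fsave-optimization-record. Those files are the input that opt-viewer.py renders into annotated HTML.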

402
00:19:48.240 --> 00:19:51.359
<v Speaker 1>Wow. From pinpointing buffer overflows, which we'll get to, to

403
00:19:51.640 --> 00:19:55.440
<v Speaker 1>visualizing optimization decisions, LLVM really does feel like it

404
00:19:55.440 --> 00:19:57.200
<v Speaker 1>gives you X ray vision into your code.

405
00:19:57.240 --> 00:19:59.279
<v Speaker 2>It's a good way to put it. And beyond these

406
00:19:59.400 --> 00:20:01.920
<v Speaker 2>utilities there's the whole world of instrumentation.

407
00:20:02.119 --> 00:20:05.240
<v Speaker 1>Instrumentation? You mean adding code to collect data at runtime?

408
00:20:05.160 --> 00:20:08.119
<v Speaker 2>Exactly, inserting some probes into the code we are compiling

409
00:20:08.119 --> 00:20:12.480
<v Speaker 2>in order to collect runtime information. Two massive areas here

410
00:20:12.559 --> 00:20:16.799
<v Speaker 2>are sanitizers and profile-guided optimization, or PGO.

411
00:20:17.119 --> 00:20:19.960
<v Speaker 1>Sanitizers sound like they're about fixing problems or finding them.

412
00:20:19.960 --> 00:20:21.200
<v Speaker 1>Give us a dramatic example.

413
00:20:21.279 --> 00:20:26.240
<v Speaker 2>Okay, AddressSanitizer. Classic example: buffer_overflow.c. Normally

414
00:20:26.319 --> 00:20:28.960
<v Speaker 2>it might just crash cryptically or, worse, seem to work

415
00:20:29.000 --> 00:20:32.440
<v Speaker 2>but corrupt memory. Right, hard to debug. But compile it

416
00:20:32.480 --> 00:20:35.519
<v Speaker 2>with clang -fsanitize=address. Now when you run it

417
00:20:35.519 --> 00:20:38.839
<v Speaker 2>and hit the overflow, ASan doesn't just crash. It prints

418
00:20:38.880 --> 00:20:41.680
<v Speaker 2>a detailed report that points out the problematic area with

419
00:20:41.759 --> 00:20:44.799
<v Speaker 2>high accuracy. Right away it tells you the exact line, the

420
00:20:44.920 --> 00:20:46.160
<v Speaker 2>variable, everything.

421
00:20:46.279 --> 00:20:47.000
<v Speaker 1>How does it do that?

422
00:20:47.200 --> 00:20:50.079
<v Speaker 2>It works by having the compiler insert a boundary check

423
00:20:50.160 --> 00:20:53.720
<v Speaker 2>into the array index access in the LLVM IR. It adds

424
00:20:53.799 --> 00:20:56.799
<v Speaker 2>runtime guards. I remember this one bug. We spent days

425
00:20:56.839 --> 00:21:01.279
<v Speaker 2>stepping through GDB; ASan found it instantly. Total

426
00:21:01.319 --> 00:21:01.960
<v Speaker 2>game changer.
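
NOTE A minimal Python model of how AddressSanitizer's inserted checks decide whether an access is legal. In practice ASan does not compare indices against array lengths. It surrounds objects with poisoned redzones and, before each load or store, consults a shadow byte that covers 8 application bytes. The flat address space and the 10-byte buffer at address 16 are assumptions for illustration.
```python
GRANULE = 8  # ASan maps every 8 application bytes to one shadow byte
class ShadowMemory:
    """Toy model of ASan's shadow encoding for a small flat address space."""
    def __init__(self, size):
        # Shadow 0: all 8 bytes addressable; 1..7: only the first k are;
        # negative: poisoned (redzone). Everything starts poisoned here.
        self.shadow = [-1] * ((size + GRANULE - 1) // GRANULE)
    def unpoison(self, addr, nbytes):
        # Assumes addr is granule-aligned, as ASan's allocator guarantees.
        full, rest = divmod(nbytes, GRANULE)
        base = addr // GRANULE
        for g in range(full):
            self.shadow[base + g] = 0
        if rest:
            self.shadow[base + full] = rest
    def check(self, addr, size):
        """Models the check ASan emits before an access of `size` bytes (size <= 8)."""
        k = self.shadow[addr // GRANULE]
        if k == 0:
            return True   # whole granule addressable
        if k < 0:
            return False  # redzone
        return (addr % GRANULE) + size - 1 < k
mem = ShadowMemory(64)
mem.unpoison(16, 10)  # e.g. char buf[10] at address 16, redzones around it
```
An access one byte past the buffer lands in a partially addressable granule and fails the check. In the real tool, that failure calls into the compiler-rt runtime, which prints the report.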

427
00:21:02.240 --> 00:21:05.599
<v Speaker 1>That's incredible. Are there other sanitizers? Custom ones?

428
00:21:05.720 --> 00:21:08.559
<v Speaker 2>Yes. Many built-in ones, and the source describes building

429
00:21:08.559 --> 00:21:12.480
<v Speaker 2>a custom one: Loop Counter Sanitizer, or LPCSan.

430
00:21:12.559 --> 00:21:13.200
<v Speaker 1>What does that do?

431
00:21:13.400 --> 00:21:16.480
<v Speaker 2>It's designed to collect the exact trip count of every

432
00:21:16.559 --> 00:21:19.200
<v Speaker 2>loop in a module. It does this by using an

433
00:21:19.440 --> 00:21:23.519
<v Speaker 2>LLVM pass to insert function calls like lpcsan_set_loop_start

434
00:21:23.519 --> 00:21:26.279
<v Speaker 2>and lpcsan_at_loop_end into the IR around loops.

435
00:21:26.880 --> 00:21:31.039
<v Speaker 2>These functions, part of the compiler-rt runtime library, then

436
00:21:31.119 --> 00:21:33.359
<v Speaker 2>record the counts when the instrumented code runs.
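
NOTE A hypothetical Python sketch of what an LPCSan-style runtime could record. The hook names loop_start, loop_iteration, and loop_end are invented here and are not the book's actual runtime interface. The instrumented_sum function shows the shape of code after a pass has inserted calls before the loop, inside its body, and after it.
```python
from collections import defaultdict
class LoopCounterRuntime:
    """Hypothetical runtime: collects the trip count of every completed loop execution."""
    def __init__(self):
        self.current = {}                     # loop id: iterations in the active run
        self.trip_counts = defaultdict(list)  # loop id: trip count of each completed run
    def loop_start(self, loop_id):
        self.current[loop_id] = 0
    def loop_iteration(self, loop_id):
        self.current[loop_id] += 1
    def loop_end(self, loop_id):
        self.trip_counts[loop_id].append(self.current.pop(loop_id))
rt = LoopCounterRuntime()
def instrumented_sum(n):
    """What a summing loop looks like after the pass inserted the hook calls."""
    total = 0
    rt.loop_start("sum.loop")
    for i in range(n):
        rt.loop_iteration("sum.loop")
        total += i
    rt.loop_end("sum.loop")
    return total
instrumented_sum(3)
instrumented_sum(5)
```
Keying counters by loop id alone assumes a loop is not re-entered recursively before it finishes. A real runtime would need to handle that case.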

437
00:21:33.559 --> 00:21:36.839
<v Speaker 1>So you can build your own runtime analysis tools using the same infrastructure.

438
00:21:37.200 --> 00:21:39.039
<v Speaker 1>Very cool. And the other big area?

439
00:21:38.880 --> 00:21:42.359
<v Speaker 2>That was PGO, profile-guided optimization. The idea here is to

440
00:21:42.440 --> 00:21:45.640
<v Speaker 2>use runtime information not just for debugging, but to enable

441
00:21:45.680 --> 00:21:47.960
<v Speaker 2>more aggressive compiler optimizations.

442
00:21:48.240 --> 00:21:50.279
<v Speaker 1>How does runtime info help optimize?

443
00:21:50.519 --> 00:21:52.920
<v Speaker 2>It tells the compiler how the code is actually used,

444
00:21:52.960 --> 00:21:55.319
<v Speaker 2>which paths are hot, which are cold, which branches are

445
00:21:55.319 --> 00:21:58.319
<v Speaker 2>almost always taken. This lets it make smarter decisions about

446
00:21:58.319 --> 00:22:01.440
<v Speaker 2>things like function inlining, block layout, register allocation,

447
00:22:02.119 --> 00:22:05.480
<v Speaker 2>optimizations that are risky without knowing typical behavior.

448
00:22:05.160 --> 00:22:07.440
<v Speaker 1>So it learns from real usage. How do you get

449
00:22:07.440 --> 00:22:08.519
<v Speaker 1>that usage data?

450
00:22:08.559 --> 00:22:12.559
<v Speaker 2>Two main ways. Instrumentation-based PGO adds counters directly into

451
00:22:12.559 --> 00:22:15.559
<v Speaker 2>the code. It's very precise, but adds overhead during the

452
00:22:15.599 --> 00:22:20.079
<v Speaker 2>profiling run. Sampling-based PGO uses external tools like Linux

453
00:22:20.119 --> 00:22:24.680
<v Speaker 2>perf to statistically sample the program counter during execution. Lower

454
00:22:24.759 --> 00:22:26.880
<v Speaker 2>overhead but less precise data.

455
00:22:26.960 --> 00:22:29.559
<v Speaker 1>Okay, so you gather the data, then what?

456
00:22:29.920 --> 00:22:32.880
<v Speaker 2>First, you compile with a flag like clang -fprofile-generate

457
00:22:32.920 --> 00:22:36.359
<v Speaker 2>for instrumentation. This creates a profile data file when

458
00:22:36.359 --> 00:22:39.680
<v Speaker 2>you run the program. Then you can use llvm-profdata

459
00:22:39.759 --> 00:22:42.839
<v Speaker 2>to merge and maybe inspect this data. It

460
00:22:42.880 --> 00:22:45.359
<v Speaker 2>can show you things like the execution frequency of all

461
00:22:45.400 --> 00:22:46.680
<v Speaker 2>the enclosing basic

462
00:22:46.440 --> 00:22:48.640
<v Speaker 1>blocks. To see the hotspots? Exactly.

463
00:22:49.279 --> 00:22:51.960
<v Speaker 2>Finally, you recompile your program, but this time with clang

464
00:22:52.000 --> 00:22:55.160
<v Speaker 2>-fprofile-use, feeding that profile data back into the

465
00:22:55.079 --> 00:22:56.880
<v Speaker 1>compiler. And the compiler uses it? Yes.

466
00:22:57.000 --> 00:23:00.599
<v Speaker 2>Yeah, the LLVM IR actually gets annotated with metadata. You'll see

467
00:23:00.599 --> 00:23:03.240
<v Speaker 2>things like !prof !71 or

468
00:23:03.279 --> 00:23:06.039
<v Speaker 2>!prof !72 attached to instructions or branches.

469
00:23:06.319 --> 00:23:09.720
<v Speaker 2>These represent the collected frequencies and probabilities directly guiding the

470
00:23:09.759 --> 00:23:13.599
<v Speaker 2>optimization passes. The compiler literally learns from the profile.
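
NOTE The !prof attachments point at metadata nodes. For branches, these typically look like !{!"branch_weights", i32 A, i32 B}. This is a minimal Python sketch, assuming that textual form, that turns the weights into per-successor probabilities by dividing each weight by their sum. The weights 990 and 10 are made up.
```python
import re
from fractions import Fraction
def branch_probabilities(metadata_line):
    """Turn a branch_weights node into per-successor probabilities (weight / total)."""
    m = re.search(r'!\{!"branch_weights",\s*(.*)\}', metadata_line)
    if m is None:
        raise ValueError("not a branch_weights node")
    weights = [int(w) for w in re.findall(r"i32\s+(\d+)", m.group(1))]
    total = sum(weights)
    return [Fraction(w, total) for w in weights]
# Hypothetical profile: the branch went one way 990 times and the other way 10 times.
probs = branch_probabilities('!71 = !{!"branch_weights", i32 990, i32 10}')
```
LLVM's BranchProbabilityInfo consumes these weights during optimization. This plain ratio shows the basic idea, not its exact arithmetic.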

471
00:23:13.519 --> 00:23:18.559
<v Speaker 1>That closes the loop nicely: the compiler learns from runtime. Okay, wow,

472
00:23:18.839 --> 00:23:20.359
<v Speaker 1>we have covered a ton of ground today.

473
00:23:20.400 --> 00:23:20.960
<v Speaker 2>We really have.

474
00:23:21.279 --> 00:23:24.640
<v Speaker 1>From, you know, speeding up those massive builds, mastering testing

475
00:23:24.680 --> 00:23:28.000
<v Speaker 1>with lit and FileCheck's crazy pattern matching,

476
00:23:27.799 --> 00:23:31.400
<v Speaker 2>to crafting custom DSLs using TableGen, even for donut

477
00:23:31.440 --> 00:23:32.400
<v Speaker 2>recipes, right.

478
00:23:32.359 --> 00:23:35.720
<v Speaker 1>And extending Clang's front end, building whole custom toolchains,

479
00:23:36.000 --> 00:23:40.759
<v Speaker 1>then diving deep into LLVM IR, the new pass manager, analyses

480
00:23:40.839 --> 00:23:42.079
<v Speaker 1>like dominator trees, and

481
00:23:42.039 --> 00:23:45.880
<v Speaker 2>finally, using runtime techniques like sanitizers for correctness and PGO

482
00:23:46.000 --> 00:23:48.960
<v Speaker 2>for performance, plus all those handy debug utilities.

483
00:23:49.119 --> 00:23:52.440
<v Speaker 1>It's been a whirlwind tour. But the key takeaway, I

484
00:23:52.480 --> 00:23:55.039
<v Speaker 1>think, is that this deep dive, using a guide like

485
00:23:55.039 --> 00:23:57.920
<v Speaker 1>the source book, really offers you a shortcut, doesn't it?

486
00:23:58.000 --> 00:24:01.000
<v Speaker 1>A way to get a handle on LLVM's huge, huge capabilities.

487
00:24:01.000 --> 00:24:05.119
<v Speaker 2>Absolutely. It turns these complex compiler engineering problems into challenges

488
00:24:05.160 --> 00:24:07.440
<v Speaker 2>you can actually tackle. It gives you that mental framework,

489
00:24:07.480 --> 00:24:11.599
<v Speaker 2>the specific tools, the techniques to really engage with LLVM effectively.

490
00:24:12.039 --> 00:24:14.960
<v Speaker 1>We've seen how LLVM lets you dissect code, rebuild it,

491
00:24:15.200 --> 00:24:18.119
<v Speaker 1>get incredible control, even let the compiler learn from how

492
00:24:18.160 --> 00:24:21.559
<v Speaker 1>programs run in the real world with PGO. It's quite powerful,

493
00:24:21.680 --> 00:24:24.799
<v Speaker 1>which leads to a final thought. If the compilation process

494
00:24:24.839 --> 00:24:28.480
<v Speaker 1>itself can learn and adapt based on actual usage, what

495
00:24:28.640 --> 00:24:31.759
<v Speaker 1>fundamental assumptions that we currently make about software design and

496
00:24:31.799 --> 00:24:35.160
<v Speaker 1>development might we need to reconsider next? Something to chew on.

497
00:24:35.519 --> 00:24:37.720
<v Speaker 2>Definitely something to think about. And if you want to

498
00:24:37.759 --> 00:24:40.920
<v Speaker 2>dive deeper, the LLVM community is very active. Check out

499
00:24:40.960 --> 00:24:43.759
<v Speaker 2>the mailing lists at lists.llvm.org, or

500
00:24:43.799 --> 00:24:47.039
<v Speaker 2>the Discourse forums at llvm.discourse.group, or

501
00:24:47.000 --> 00:24:52.279
<v Speaker 1>even attend the LLVM Developers' Meeting, llvm.org/devmtg. Lots

502
00:24:52.279 --> 00:24:53.759
<v Speaker 1>of ways to keep learning and engage.

503
00:24:53.839 --> 00:24:55.759
<v Speaker 2>The resources out there, once you know where to look,

504
00:24:55.799 --> 00:24:56.640
<v Speaker 2>are invaluable.
