WEBVTT

1
00:00:00.160 --> 00:00:03.080
<v Speaker 1>Welcome to the deep dive. Today, we're jumping into the

2
00:00:03.120 --> 00:00:07.519
<v Speaker 1>Linux kernel, specifically how it handles hardware. If you've ever

3
00:00:07.559 --> 00:00:10.599
<v Speaker 1>wanted to write a device driver, or you know, just

4
00:00:10.679 --> 00:00:13.800
<v Speaker 1>understand how your keyboard actually talks to your OS, this

5
00:00:13.919 --> 00:00:16.679
<v Speaker 1>is for you. We're building the conceptual map.

6
00:00:16.760 --> 00:00:20.039
<v Speaker 2>Exactly. Our mission today is really about the architecture. We've looked

7
00:00:20.039 --> 00:00:24.600
<v Speaker 2>at some pretty comprehensive guides on kernel and driver development,

8
00:00:25.039 --> 00:00:27.480
<v Speaker 2>and we want to quickly lay out that core blueprint.

9
00:00:27.640 --> 00:00:29.320
<v Speaker 2>You know, what do you have to know? What's different?

10
00:00:29.359 --> 00:00:31.359
<v Speaker 2>How does the kernel see the world of devices?

11
00:00:31.519 --> 00:00:35.600
<v Speaker 1>Okay, so starting basic: the device driver. It's fundamentally a

12
00:00:35.600 --> 00:00:37.200
<v Speaker 1>translator, right? Sitting in the middle.

13
00:00:37.240 --> 00:00:40.159
<v Speaker 2>It's the crucial link. Yeah, it's that specialized code connecting

14
00:00:40.200 --> 00:00:43.479
<v Speaker 2>your apps, your user space stuff, to the actual physical

15
00:00:43.479 --> 00:00:45.560
<v Speaker 2>hardware, and it all goes through the kernel. You could

16
00:00:45.560 --> 00:00:48.200
<v Speaker 2>almost say it's the only bit of software allowed to

17
00:00:48.240 --> 00:00:50.359
<v Speaker 2>like directly touch the hardware ports.

18
00:00:50.560 --> 00:00:53.039
<v Speaker 1>Got it. Okay, so let's start where most drivers start,

19
00:00:53.479 --> 00:00:55.920
<v Speaker 1>the kernel module. This is how Linux stays flexible, right?

20
00:00:56.000 --> 00:00:57.560
<v Speaker 1>Lets you add stuff without a full reboot.

21
00:00:57.479 --> 00:01:01.799
<v Speaker 2>Precisely. Modules let you extend the kernel dynamically

22
00:01:02.159 --> 00:01:07.239
<v Speaker 2>at runtime, and our sources definitely emphasize this isn't

23
00:01:07.280 --> 00:01:10.959
<v Speaker 2>for absolute beginners. You need solid C skills and to

24
00:01:10.959 --> 00:01:13.280
<v Speaker 2>be comfortable on the Linux command line before you really

25
00:01:13.280 --> 00:01:14.640
<v Speaker 2>dive into writing kernel code.

26
00:01:14.799 --> 00:01:18.040
<v Speaker 1>Right. So if we're building one, we need the basic structure:

27
00:01:18.480 --> 00:01:21.120
<v Speaker 1>an entry point, and ideally an exit point.

28
00:01:21.200 --> 00:01:25.200
<v Speaker 2>That's the core. You use module_init(), which declares

29
00:01:25.239 --> 00:01:28.000
<v Speaker 2>the function that runs when your module gets loaded, you

30
00:01:28.000 --> 00:01:31.680
<v Speaker 2>know, with insmod or modprobe. And then module_exit() declares

31
00:01:31.719 --> 00:01:34.079
<v Speaker 2>the cleanup function for when you unload it with rmmod.

32
00:01:34.200 --> 00:01:37.000
<v Speaker 1>Okay, now here's where it gets interesting compared to normal programming:

33
00:01:37.400 --> 00:01:41.640
<v Speaker 1>memory optimization. The kernel uses these macros, __init and __exit.

34
00:01:42.159 --> 00:01:43.439
<v Speaker 1>Why are they so important?

35
00:01:43.519 --> 00:01:45.719
<v Speaker 2>Ah, yeah, they're not just comments. They're instructions for the

36
00:01:45.760 --> 00:01:49.200
<v Speaker 2>compiler and linker. Really important. When you mark your init

37
00:01:49.280 --> 00:01:52.000
<v Speaker 2>function with __init, you're telling the linker: put this

38
00:01:52.159 --> 00:01:53.840
<v Speaker 2>code in a special memory section.

39
00:01:54.000 --> 00:01:55.480
<v Speaker 1>And what's special about that section?

40
00:01:55.879 --> 00:01:58.799
<v Speaker 2>Well, if your module is built into the kernel, statically compiled,

41
00:01:59.200 --> 00:02:02.319
<v Speaker 2>the kernel actually frees that memory once the init function

42
00:02:02.439 --> 00:02:04.280
<v Speaker 2>is done, because, I mean, it's never going to call

43
00:02:04.319 --> 00:02:06.400
<v Speaker 2>that init function again until the next reboot, right?

44
00:02:06.680 --> 00:02:09.120
<v Speaker 2>So why keep the code around? It's a smart optimization.

45
00:02:09.319 --> 00:02:10.080
<v Speaker 2>Use it, lose it.

46
00:02:10.360 --> 00:02:13.520
<v Speaker 1>That is clever, proactively cleaning up its own startup code.

47
00:02:13.879 --> 00:02:17.719
<v Speaker 1>And __exit, is that the flip side?

48
00:02:17.840 --> 00:02:21.400
<v Speaker 2>Pretty much, yeah. __exit tells the compiler to just leave out the exit

49
00:02:21.439 --> 00:02:24.319
<v Speaker 2>function's code entirely if the module is built in, because

50
00:02:24.319 --> 00:02:26.240
<v Speaker 2>if it's built in, you can't unload it anyway. So it

51
00:02:26.639 --> 00:02:28.400
<v Speaker 2>saves space and removes code you'd never run.

52
00:02:28.479 --> 00:02:33.919
<v Speaker 1>Okay, moving to metadata. Every module needs info like MODULE_AUTHOR,

53
00:02:34.439 --> 00:02:37.639
<v Speaker 1>but the big one seems to be MODULE_LICENSE. Why

54
00:02:37.719 --> 00:02:39.879
<v Speaker 1>is that one so sensitive?

55
00:02:40.000 --> 00:02:43.120
<v Speaker 2>It really is. It's about the GPL and the kernel's ecosystem.

56
00:02:43.479 --> 00:02:46.719
<v Speaker 2>You must declare a GPL-compatible license using MODULE_LICENSE,

57
00:02:47.280 --> 00:02:49.840
<v Speaker 2>especially if you want your module to use certain kernel functions,

58
00:02:50.039 --> 00:02:53.319
<v Speaker 2>ones that are specifically exported only for GPL modules using

59
00:02:53.360 --> 00:02:54.520
<v Speaker 2>EXPORT_SYMBOL_GPL.

60
00:02:54.599 --> 00:02:57.360
<v Speaker 1>And if you don't, or if you use a proprietary license?

61
00:02:56.759 --> 00:02:59.479
<v Speaker 2>Then boom, your kernel gets marked as tainted. The kernel

62
00:02:59.520 --> 00:03:02.400
<v Speaker 2>sets a flag. It basically says: warning, non-open or

63
00:03:02.520 --> 00:03:05.400
<v Speaker 2>untrusted code has been loaded. If you then get a

64
00:03:05.400 --> 00:03:09.599
<v Speaker 2>crash, a kernel panic, well, good luck getting help from

65
00:03:09.639 --> 00:03:12.800
<v Speaker 2>the community. They'll likely see the taint flag and you know,

66
00:03:12.919 --> 00:03:15.199
<v Speaker 2>say they can't support it because proprietary code might be

67
00:03:15.240 --> 00:03:17.120
<v Speaker 2>the cause. It's a serious line.

68
00:03:17.240 --> 00:03:19.960
<v Speaker 1>Okay, that definitely sets the stage. Now let's make that

69
00:03:20.000 --> 00:03:24.759
<v Speaker 1>mental shift. We're not writing userspace programs anymore. We're inside

70
00:03:24.759 --> 00:03:27.759
<v Speaker 1>the kernel. The rules change, the safety net's gone.

71
00:03:27.960 --> 00:03:31.400
<v Speaker 2>Completely gone. In user space, if something goes terribly wrong,

72
00:03:31.479 --> 00:03:34.639
<v Speaker 2>your program crashes, the OS cleans up, fine. In the kernel,

73
00:03:34.919 --> 00:03:38.599
<v Speaker 2>no way. If you allocate memory, grab an I/O port, whatever,

74
00:03:39.199 --> 00:03:41.680
<v Speaker 2>you must clean it up yourself before your function returns.

75
00:03:41.719 --> 00:03:44.759
<v Speaker 2>Fail to do that and you risk leaks, instability, maybe

76
00:03:44.800 --> 00:03:45.840
<v Speaker 2>even a full system crash.

77
00:03:45.919 --> 00:03:48.560
<v Speaker 1>So error handling is totally different. No more just returning

78
00:03:48.680 --> 00:03:50.360
<v Speaker 1>zero for success and one for failure.

79
00:03:50.560 --> 00:03:53.599
<v Speaker 2>Right. The standard is really strict. Kernel functions that interact

80
00:03:53.639 --> 00:03:57.680
<v Speaker 2>with system calls must return errors as negative values,

81
00:03:57.840 --> 00:04:00.719
<v Speaker 2>like return -ERRCODE. So you'll see -EIO

82
00:04:00.879 --> 00:04:03.199
<v Speaker 2>for an I/O error, or -ENOMEM if you couldn't

83
00:04:03.199 --> 00:04:05.759
<v Speaker 2>get memory. It's a clear convention, so errors get passed

84
00:04:05.840 --> 00:04:06.960
<v Speaker 2>up correctly.

85
00:04:06.919 --> 00:04:09.280
<v Speaker 1>And to manage that cleanup, especially if you have, say, five

86
00:04:09.319 --> 00:04:12.560
<v Speaker 1>steps that work and then the sixth one fails, the kernel

87
00:04:12.560 --> 00:04:17.480
<v Speaker 1>style actually recommends using goto. That sounds controversial.

88
00:04:17.560 --> 00:04:19.759
<v Speaker 2>It does. It goes against a lot of standard C teaching, but

89
00:04:19.879 --> 00:04:23.040
<v Speaker 2>in the kernel it's pragmatic. You use goto strictly

90
00:04:23.079 --> 00:04:26.399
<v Speaker 2>for error cleanup paths. You set up labels like err_free_buffer,

91
00:04:26.439 --> 00:04:29.360
<v Speaker 2>err_release_lock, maybe out. When an error happens, you

92
00:04:29.399 --> 00:04:31.920
<v Speaker 2>goto the appropriate label. This jumps you straight into

93
00:04:31.920 --> 00:04:34.959
<v Speaker 2>the sequence of cleanup steps needed, executed in reverse order

94
00:04:34.959 --> 00:04:35.519
<v Speaker 2>of allocation.

95
00:04:35.639 --> 00:04:37.959
<v Speaker 1>But doesn't that make the code harder to follow, all

96
00:04:38.000 --> 00:04:38.639
<v Speaker 1>those jumps?

97
00:04:39.000 --> 00:04:41.160
<v Speaker 2>You'd think so, but it actually tends to make it cleaner

98
00:04:41.160 --> 00:04:45.560
<v Speaker 2>in this context. The alternative is deeply nested if-else blocks,

99
00:04:45.800 --> 00:04:48.360
<v Speaker 2>which get really hard to read and are super easy

100
00:04:48.399 --> 00:04:51.839
<v Speaker 2>to mess up, like forgetting a cleanup step inside one branch.

101
00:04:52.560 --> 00:04:55.879
<v Speaker 2>The goto approach enforces a clear, linear cleanup path.

102
00:04:56.240 --> 00:04:59.040
<v Speaker 2>It's considered the safest and most readable pattern for kernel

103
00:04:59.120 --> 00:05:00.959
<v Speaker 2>error handling. Strange but true.

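NOTE
Editor's sketch, not part of the recording: a user-space C analogue of the goto cleanup pattern the speakers describe. It is a minimal illustration under stated assumptions, not kernel code. The names dev_ctx, setup_device, teardown_device and the labels are hypothetical, malloc/calloc/free stand in for kernel allocators, and the fail_step parameter exists only to simulate a failing step.
```c
#include <errno.h>
#include <stdlib.h>
/* Hypothetical resource bundle; in a driver these might be buffers, locks, IRQs. */
struct dev_ctx {
    char *tx_buf;
    char *rx_buf;
    int  *stats;
};
/*
 * Acquire three resources in order. fail_step (1..3) forces that step to
 * fail so the unwind path can be exercised; 0 means no forced failure.
 * Returns 0 on success or a negative errno value, following the kernel
 * convention mentioned earlier in the episode.
 */
static int setup_device(struct dev_ctx *ctx, int fail_step)
{
    int ret;
    ctx->tx_buf = (fail_step == 1) ? NULL : malloc(64);
    if (!ctx->tx_buf) {
        ret = -ENOMEM;
        goto out;
    }
    ctx->rx_buf = (fail_step == 2) ? NULL : malloc(64);
    if (!ctx->rx_buf) {
        ret = -ENOMEM;
        goto err_free_tx;
    }
    ctx->stats = (fail_step == 3) ? NULL : calloc(4, sizeof(int));
    if (!ctx->stats) {
        ret = -ENOMEM;
        goto err_free_rx;
    }
    return 0;
    /* Labels are ordered in reverse of acquisition and fall through. */
err_free_rx:
    free(ctx->rx_buf);
    ctx->rx_buf = NULL;
err_free_tx:
    free(ctx->tx_buf);
    ctx->tx_buf = NULL;
out:
    return ret;
}
/* Release everything after a successful setup_device(). */
static void teardown_device(struct dev_ctx *ctx)
{
    free(ctx->stats);
    free(ctx->rx_buf);
    free(ctx->tx_buf);
    ctx->stats = NULL;
    ctx->rx_buf = NULL;
    ctx->tx_buf = NULL;
}
```
Each failure jumps to the label that undoes only what has already been acquired, so adding a fourth resource means one new label rather than another level of nesting.
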
104
00:05:01.040 --> 00:05:03.240
<v Speaker 1>Okay, I can see the logic there, given the stakes.

105
00:05:03.720 --> 00:05:06.120
<v Speaker 1>What about functions that need to return a pointer but

106
00:05:06.199 --> 00:05:08.959
<v Speaker 1>might also fail. You can't return both a pointer and

107
00:05:09.079 --> 00:05:10.800
<v Speaker 1>a negative error code.

108
00:05:10.720 --> 00:05:13.560
<v Speaker 2>Ah, classic C problem. The kernel has neat macros for this:

109
00:05:13.720 --> 00:05:17.120
<v Speaker 2>ERR_PTR, IS_ERR, and PTR_ERR. If your function fails

110
00:05:17.120 --> 00:05:20.720
<v Speaker 2>and needs to return, say, -EINVAL, invalid argument, instead of

111
00:05:20.759 --> 00:05:24.639
<v Speaker 2>a pointer, it uses ERR_PTR(-EINVAL). This converts

112
00:05:24.639 --> 00:05:27.120
<v Speaker 2>the error code into a special pointer value. The code

113
00:05:27.120 --> 00:05:29.759
<v Speaker 2>calling that function then checks the returned pointer using IS_ERR.

114
00:05:30.120 --> 00:05:32.040
<v Speaker 2>If it returns true, it means it's an error pointer.

115
00:05:32.360 --> 00:05:34.319
<v Speaker 2>Then if you need the actual error code back, you

116
00:05:34.439 --> 00:05:37.399
<v Speaker 2>use PTR_ERR on that pointer and it gives you -EINVAL.

117
00:05:37.560 --> 00:05:39.720
<v Speaker 2>It avoids ambiguity with returning a NULL pointer.

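NOTE
Editor's sketch, not part of the recording: simplified user-space stand-ins for the ERR_PTR, IS_ERR and PTR_ERR helpers discussed above, to show how an error code can travel inside a pointer. The real helpers live in the kernel's linux/err.h; the versions here are an approximation for illustration, and make_buffer is a hypothetical function invented for the example.
```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
/* Largest errno value that is encoded into the top of the address range. */
#define MAX_ERRNO 4095
/* Encode a negative errno as a pointer value. */
static inline void *ERR_PTR(long error)
{
    return (void *)(intptr_t)error;
}
/* Recover the negative errno from an error pointer. */
static inline long PTR_ERR(const void *ptr)
{
    return (long)(intptr_t)ptr;
}
/* True when ptr falls in the last MAX_ERRNO addresses, i.e. encodes an error. */
static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)(-MAX_ERRNO);
}
/* Hypothetical allocator: rejects a zero size, otherwise returns a buffer. */
static void *make_buffer(size_t size)
{
    void *buf;
    if (size == 0)
        return ERR_PTR(-EINVAL);
    buf = malloc(size);
    if (!buf)
        return ERR_PTR(-ENOMEM);
    return buf;
}
```
A caller checks IS_ERR(p) first and only then calls PTR_ERR(p). Note that NULL is not an error pointer under this scheme, which is what keeps "no object" and "failed with code X" distinguishable.
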
118
00:05:39.800 --> 00:05:43.600
<v Speaker 1>That's elegant. Okay, last bit on the programming shift: logging.

119
00:05:43.639 --> 00:05:46.240
<v Speaker 1>We're supposed to move beyond just using printk, right?

120
00:05:46.399 --> 00:05:49.720
<v Speaker 2>Yeah. While printk is still the underlying engine, the recommendation

121
00:05:49.839 --> 00:05:53.120
<v Speaker 2>is strongly towards using specific helper functions. Don't just use

122
00:05:53.120 --> 00:05:57.360
<v Speaker 2>printk with log levels directly. Use things like pr_err or

123
00:05:57.399 --> 00:06:02.120
<v Speaker 2>pr_info for general module messages. But, and this is important,

124
00:06:02.399 --> 00:06:05.319
<v Speaker 2>if you're in a device driver, use the dev_ versions,

125
00:06:05.639 --> 00:06:07.879
<v Speaker 2>dev_err or dev_info, which take a struct device *dev.

126
00:06:08.120 --> 00:06:11.319
<v Speaker 1>And why the dev_-prefixed ones specifically for drivers?

127
00:06:11.199 --> 00:06:15.040
<v Speaker 2>Because they automatically include context about the specific device the

128
00:06:15.040 --> 00:06:17.720
<v Speaker 2>message is coming from: its name, its position in the system.

129
00:06:18.000 --> 00:06:21.160
<v Speaker 2>Makes debugging way easier when you have multiple identical devices.

130
00:06:21.240 --> 00:06:24.199
<v Speaker 2>There are even netdev_ versions for network drivers. They all tie

131
00:06:24.199 --> 00:06:26.000
<v Speaker 2>the message to a specific kernel object.

132
00:06:26.079 --> 00:06:28.000
<v Speaker 1>Got it. And if we want our module's messages to

133
00:06:28.000 --> 00:06:30.839
<v Speaker 1>stand out in the flood of kernel logs, how do

134
00:06:30.879 --> 00:06:31.759
<v Speaker 1>we add a prefix?

135
00:06:32.240 --> 00:06:36.199
<v Speaker 2>Simple. You use the pr_fmt macro. You put #define

136
00:06:36.240 --> 00:06:39.480
<v Speaker 2>pr_fmt(fmt) KBUILD_MODNAME ": " fmt at the top

137
00:06:39.519 --> 00:06:42.160
<v Speaker 2>of your source file. KBUILD_MODNAME gets set

138
00:06:42.199 --> 00:06:44.800
<v Speaker 2>automatically during the build to your module's name. So now

139
00:06:44.839 --> 00:06:48.319
<v Speaker 2>every pr_info, pr_err, etc. automatically prints like "my_driver:

140
00:06:48.800 --> 00:06:50.800
<v Speaker 2>error occurred". Super helpful.

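NOTE
Editor's sketch, not part of the recording: a user-space imitation of the pr_fmt prefixing trick described above. It relies only on C string-literal concatenation. MODNAME is a stand-in for KBUILD_MODNAME (which the kernel build system defines), and pr_info_buf is a hypothetical macro that formats into a buffer instead of the kernel log; neither is a real kernel API.
```c
#include <stdio.h>
/* Stand-in for KBUILD_MODNAME, which the kernel build would supply. */
#define MODNAME "my_driver"
/* Same shape as the kernel idiom: the prefix is glued onto every format string. */
#define pr_fmt(fmt) MODNAME ": " fmt
/*
 * Hypothetical user-space logger: formats into a caller-supplied buffer.
 * ##__VA_ARGS__ is a GNU extension (accepted by GCC and Clang) that drops
 * the trailing comma when no variadic arguments are passed.
 */
#define pr_info_buf(buf, len, fmt, ...) \
    snprintf((buf), (len), pr_fmt(fmt), ##__VA_ARGS__)
```
Because pr_fmt is applied inside the logging macro, the prefix is defined once per source file and every call site picks it up without repeating the module name.
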
141
00:06:50.839 --> 00:06:53.319
<v Speaker 1>Okay, we've got the rules of engagement down. Now, the

142
00:06:53.360 --> 00:06:58.560
<v Speaker 1>big danger zone: concurrency, especially on SMP systems, symmetric multiprocessing,

143
00:06:58.839 --> 00:07:01.600
<v Speaker 1>where multiple CPUs can access the same memory, the

144
00:07:01.639 --> 00:07:05.759
<v Speaker 1>same hardware simultaneously. Locks are essential.

145
00:07:05.920 --> 00:07:09.399
<v Speaker 2>This is absolutely where things get tricky, even for experienced devs.

146
00:07:09.839 --> 00:07:12.560
<v Speaker 2>The key is understanding the context. Are you in code

147
00:07:12.560 --> 00:07:16.519
<v Speaker 2>that can sleep, user context, or code that absolutely cannot sleep,

148
00:07:16.680 --> 00:07:18.560
<v Speaker 2>like an interrupt handler, atomic context?

149
00:07:18.639 --> 00:07:20.959
<v Speaker 1>Let's start with the fast ones. Spinlocks, meant for

150
00:07:21.040 --> 00:07:22.680
<v Speaker 1>short atomic operations.

151
00:07:22.439 --> 00:07:25.639
<v Speaker 2>Exactly, very short critical sections. When a CPU grabs a

152
00:07:25.680 --> 00:07:28.839
<v Speaker 2>standard spinlock using spin_lock(), it disables preemption on

153
00:07:28.839 --> 00:07:31.560
<v Speaker 2>that specific CPU, so other tasks on the same core

154
00:07:31.600 --> 00:07:34.439
<v Speaker 2>won't interrupt it. But, and this is the critical part,

155
00:07:34.519 --> 00:07:37.560
<v Speaker 2>a standard spinlock does not stop hardware interrupts from

156
00:07:37.600 --> 00:07:38.560
<v Speaker 2>firing on that CPU.

157
00:07:38.879 --> 00:07:42.519
<v Speaker 1>Ah, and that's the classic deadlock scenario waiting to happen.

158
00:07:42.560 --> 00:07:45.519
<v Speaker 2>Precisely. Imagine task A on CPU 0 calls spin_lock() to

159
00:07:45.560 --> 00:07:49.759
<v Speaker 2>protect some data. Then bam, a hardware interrupt fires on CPU 0.

160
00:07:50.079 --> 00:07:53.519
<v Speaker 2>The CPU jumps to the interrupt handler, the IRQ. Now, if

161
00:07:53.560 --> 00:07:56.000
<v Speaker 2>that IRQ handler needs the same data and tries to

162
00:07:56.040 --> 00:07:59.920
<v Speaker 2>acquire the same spinlock, the IRQ spins forever, waiting for

163
00:08:00.120 --> 00:08:02.480
<v Speaker 2>task A to release the lock. But task A is

164
00:08:02.480 --> 00:08:04.879
<v Speaker 2>preempted by the IRQ and can't run to release it.

165
00:08:04.600 --> 00:08:09.360
<v Speaker 1>Ouch, catastrophic. So the rule is, if data protected

166
00:08:09.360 --> 00:08:11.519
<v Speaker 1>by a spin lock might also be touched by an

167
00:08:11.519 --> 00:08:13.120
<v Speaker 1>interrupt handler, you need something more.

168
00:08:13.319 --> 00:08:16.360
<v Speaker 2>Yes, unless you are absolutely one hundred percent certain that

169
00:08:16.439 --> 00:08:18.720
<v Speaker 2>no interrupt handler will ever try to access that data

170
00:08:18.800 --> 00:08:22.360
<v Speaker 2>or acquire that lock, you must use the irqsave variants.

171
00:08:22.600 --> 00:08:24.560
<v Speaker 2>Those are functions like spin_lock_irqsave() and

172
00:08:24.560 --> 00:08:27.600
<v Speaker 2>spin_unlock_irqrestore(). They do two things: disable preemption and disable

173
00:08:27.639 --> 00:08:30.480
<v Speaker 2>hardware interrupts on the local CPU. This makes the critical

174
00:08:30.480 --> 00:08:33.039
<v Speaker 2>section truly atomic on that core. It's a vital lesson.

175
00:08:33.080 --> 00:08:36.240
<v Speaker 1>Okay, so spinlocks are for short atomic contexts, possibly

176
00:08:36.360 --> 00:08:39.399
<v Speaker 1>needing IRQ disabling. How do mutexes differ?

177
00:08:39.600 --> 00:08:43.360
<v Speaker 2>Mutexes are conceptually simpler but have different rules. They're built

178
00:08:43.480 --> 00:08:46.799
<v Speaker 2>using spinlocks underneath, but their behavior is different. If

179
00:08:46.840 --> 00:08:49.720
<v Speaker 2>task B tries to acquire a mutex held by task A,

180
00:08:50.279 --> 00:08:53.360
<v Speaker 2>instead of spinning, task B is put to sleep. The

181
00:08:53.440 --> 00:08:55.360
<v Speaker 2>kernel puts it on a wait queue and lets the

182
00:08:55.360 --> 00:08:56.639
<v Speaker 2>CPU schedule some other task.

183
00:08:56.559 --> 00:09:00.399
<v Speaker 1>So they yield the CPU. Much better for

184
00:09:00.559 --> 00:09:02.399
<v Speaker 1>potentially longer critical sections, then?

185
00:09:02.440 --> 00:09:05.919
<v Speaker 2>Yes, much better for contention and longer holds. But,

186
00:09:06.399 --> 00:09:09.039
<v Speaker 2>and this is a huge but, because they involve sleeping,

187
00:09:09.320 --> 00:09:12.360
<v Speaker 2>you can only use mutexes in contexts where sleeping is allowed.

188
00:09:12.639 --> 00:09:15.480
<v Speaker 2>That means primarily user context, like when handling a system call.

189
00:09:15.799 --> 00:09:18.000
<v Speaker 2>You can never acquire a mutex from an interrupt handler

190
00:09:18.200 --> 00:09:21.360
<v Speaker 2>or any other atomic context, because those contexts absolutely cannot

191
00:09:21.360 --> 00:09:25.200
<v Speaker 2>sleep. Using a mutex there? Boom, system panic.

192
00:09:25.240 --> 00:09:28.519
<v Speaker 1>Got it. Spinlocks for atomic, mutexes for sleeping contexts. Let's briefly

193
00:09:28.519 --> 00:09:30.639
<v Speaker 1>touch on time. We hear about jiffies, but things are

194
00:09:30.639 --> 00:09:31.480
<v Speaker 1>more abstract now.

195
00:09:31.600 --> 00:09:33.639
<v Speaker 2>Yeah, jiffies was the old way. It's the kernel

196
00:09:33.639 --> 00:09:36.399
<v Speaker 2>counter that increments at a certain frequency defined by the

197
00:09:36.720 --> 00:09:40.600
<v Speaker 2>HZ value. It gave fairly low resolution timing. The

198
00:09:40.639 --> 00:09:43.639
<v Speaker 2>modern kernel abstracts this. It relies on two types of

199
00:09:43.639 --> 00:09:44.960
<v Speaker 2>hardware components for time.

200
00:09:45.279 --> 00:09:47.759
<v Speaker 1>The things that track time and the things that schedule

201
00:09:47.799 --> 00:09:48.519
<v Speaker 1>events in time.

202
00:09:48.679 --> 00:09:52.559
<v Speaker 2>You got it. First, clock source devices. These provide a

203
00:09:52.600 --> 00:09:56.639
<v Speaker 2>high-resolution, always-increasing monotonic counter. Think of it as

204
00:09:56.639 --> 00:10:00.679
<v Speaker 2>the master clock for accurate timestamps. Second, clock event

205
00:10:00.759 --> 00:10:04.799
<v Speaker 2>devices. These are the programmable timers. They can be told:

206
00:10:05.039 --> 00:10:09.639
<v Speaker 2>fire an interrupt exactly N nanoseconds from now. They're crucial

207
00:10:09.720 --> 00:10:12.759
<v Speaker 2>for things like high-resolution timers, hrtimers, much more

208
00:10:12.840 --> 00:10:13.840
<v Speaker 2>precise than jiffies.

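NOTE
Editor's sketch, not part of the recording: a small arithmetic illustration of why a tick counter driven by HZ gives coarse timing. The function names ticks_to_ms and tick_resolution_us are hypothetical and written for user space; the kernel's own conversion helper is jiffies_to_msecs(), and the tick rate there is fixed at build time rather than passed as a parameter.
```c
#include <stdint.h>
/* Convert a tick count to milliseconds for a given tick rate in Hz. */
static uint64_t ticks_to_ms(uint64_t ticks, uint32_t tick_hz)
{
    return ticks * 1000u / tick_hz;
}
/* Smallest interval a single tick can represent, in microseconds. */
static uint32_t tick_resolution_us(uint32_t tick_hz)
{
    return 1000000u / tick_hz;
}
```
With a tick rate of 250 Hz, one tick is 4000 microseconds, so anything scheduled in ticks cannot be placed more finely than 4 ms. That gap is what nanosecond-granular clock event devices and hrtimers close.
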
209
00:10:13.879 --> 00:10:17.200
<v Speaker 1>Okay, and before we leave this area, tasklets. They were

210
00:10:17.200 --> 00:10:19.559
<v Speaker 1>for deferring work from interrupts, right? The bottom half.

211
00:10:19.679 --> 00:10:22.159
<v Speaker 2>Yes, they were a common mechanism. Yeah, but the sources

212
00:10:22.200 --> 00:10:24.840
<v Speaker 2>we looked at had a very, very strong warning. Tasklets

213
00:10:24.919 --> 00:10:28.200
<v Speaker 2>are being actively deprecated. They're slated for removal. You

214
00:10:28.240 --> 00:10:30.440
<v Speaker 2>should not use them in any new code. They really

215
00:10:30.480 --> 00:10:32.799
<v Speaker 2>only exist now for historical and pedagogic reasons.

216
00:10:32.840 --> 00:10:35.320
<v Speaker 1>Okay, loud and clear. So for deferring work now, the

217
00:10:35.360 --> 00:10:36.480
<v Speaker 1>answer is workqueues?

218
00:10:36.720 --> 00:10:40.159
<v Speaker 2>Absolutely, use workqueues. They provide a much more flexible and

219
00:10:40.279 --> 00:10:43.039
<v Speaker 2>robust way to queue work to be executed by dedicated

220
00:10:43.080 --> 00:10:45.879
<v Speaker 2>kernel threads. Those threads can sleep, so they can handle

221
00:10:45.960 --> 00:10:47.840
<v Speaker 2>much longer, more complex tasks safely.

222
00:10:48.159 --> 00:10:51.679
<v Speaker 1>Right. Let's zoom out now to the big picture, how

223
00:10:51.679 --> 00:10:55.240
<v Speaker 1>the kernel organizes all this hardware: the Linux Device Model, LDM.

224
00:10:55.480 --> 00:10:57.840
<v Speaker 1>This is how physical stuff turns into those directories we

225
00:10:57.879 --> 00:10:58.480
<v Speaker 1>see in sysfs.

226
00:10:58.840 --> 00:11:02.120
<v Speaker 2>LDM is the framework, the glue. It creates that hierarchy:

227
00:11:02.559 --> 00:11:06.600
<v Speaker 2>buses contain devices, drivers attach to devices. And it uses

228
00:11:06.679 --> 00:11:09.720
<v Speaker 2>three core low-level structures. At the very bottom, the

229
00:11:09.759 --> 00:11:12.960
<v Speaker 2>most fundamental piece is the kobject. A kobject isn't a

230
00:11:12.960 --> 00:11:16.000
<v Speaker 2>device itself or a driver. It represents the connection, or

231
00:11:16.039 --> 00:11:19.519
<v Speaker 2>an object that can appear in sysfs. Basically, every single

232
00:11:19.519 --> 00:11:23.399
<v Speaker 2>directory you navigate in the /sys virtual file system corresponds

233
00:11:23.440 --> 00:11:26.159
<v Speaker 2>to a kobject in kernel memory. It's the basic building

234
00:11:26.200 --> 00:11:27.279
<v Speaker 2>block of that hierarchy.

235
00:11:27.399 --> 00:11:30.360
<v Speaker 1>Ah, okay. So kobject is the foundation for creating that

236
00:11:30.399 --> 00:11:34.399
<v Speaker 1>topology and exposing attributes out to user space through sysfs.

237
00:11:34.519 --> 00:11:35.480
<v Speaker 1>That clicks.

238
00:11:35.559 --> 00:11:38.559
<v Speaker 2>It's fundamental. These kobjects are then grouped into ksets, which

239
00:11:38.600 --> 00:11:41.720
<v Speaker 2>often correspond to directories containing multiple objects, and they have

240
00:11:41.759 --> 00:11:46.000
<v Speaker 2>associated kobj_types, which define behavior like attribute handling. Together

241
00:11:46.039 --> 00:11:46.879
<v Speaker 2>they build the whole structure.

242
00:11:46.679 --> 00:11:49.480
<v Speaker 1>And within the structure we find different kinds of buses.

243
00:11:49.759 --> 00:11:54.720
<v Speaker 2>Broadly, yeah, you have discoverable buses. Think PCI, USB. You

244
00:11:54.759 --> 00:11:57.039
<v Speaker 2>plug something in, and the bus itself can figure out what

245
00:11:57.080 --> 00:11:59.919
<v Speaker 2>it is and tell the kernel. The hardware handles the enumeration.

246
00:12:00.360 --> 00:12:04.080
<v Speaker 2>Then you have non-discoverable buses. The platform bus is a

247
00:12:04.120 --> 00:12:07.639
<v Speaker 2>common example, especially in embedded systems, where peripherals are just

248
00:12:07.679 --> 00:12:11.200
<v Speaker 2>hardwired to the CPU. The bus itself doesn't announce devices.

249
00:12:11.679 --> 00:12:13.399
<v Speaker 2>For these, the kernel needs to be told that a

250
00:12:13.440 --> 00:12:17.039
<v Speaker 2>device exists at a certain address or uses certain resources.

251
00:12:17.279 --> 00:12:19.799
<v Speaker 2>This information needs to be provided somehow.

252
00:12:19.440 --> 00:12:22.200
<v Speaker 1>Which brings us neatly to the modern way of providing

253
00:12:22.200 --> 00:12:25.759
<v Speaker 1>that information, the device tree. This replaced the old hard

254
00:12:25.759 --> 00:12:27.360
<v Speaker 1>coded board files.

255
00:12:27.720 --> 00:12:31.200
<v Speaker 2>Exactly. Device tree, or DT, is a huge step for portability.

256
00:12:31.799 --> 00:12:34.159
<v Speaker 2>Instead of C code describing the hardware layout for one

257
00:12:34.200 --> 00:12:37.600
<v Speaker 2>specific board, you use a textual format, .dts source files.

258
00:12:37.919 --> 00:12:41.360
<v Speaker 2>These files describe the hardware using nodes, representing devices and buses,

259
00:12:41.559 --> 00:12:45.039
<v Speaker 2>and properties, like memory addresses and interrupt numbers. This .dts

260
00:12:45.039 --> 00:12:47.960
<v Speaker 2>file is compiled into a binary blob, the .dtb

261
00:12:48.200 --> 00:12:51.039
<v Speaker 2>or FDT, flattened device tree. The bootloader loads the

262
00:12:51.120 --> 00:12:53.399
<v Speaker 2>.dtb into memory, and the kernel parses it at boot

263
00:12:53.399 --> 00:12:55.879
<v Speaker 2>time to learn about the non-discoverable hardware present.

264
00:12:56.039 --> 00:13:00.399
<v Speaker 1>What about adding hardware after boot, or changing configurations dynamically?

265
00:13:00.720 --> 00:13:04.039
<v Speaker 2>That's where DT overlays come in. These are smaller, partial

266
00:13:04.120 --> 00:13:07.799
<v Speaker 2>.dtbo, device tree blob overlay, files. They act like patches.

267
00:13:08.039 --> 00:13:10.600
<v Speaker 2>You can load a .dtbo at runtime, usually via

268
00:13:10.679 --> 00:13:14.120
<v Speaker 2>configfs, and it modifies the live in-memory device tree,

269
00:13:14.360 --> 00:13:17.519
<v Speaker 2>maybe enabling an interface, changing a pin configuration, adding a

270
00:13:17.559 --> 00:13:19.480
<v Speaker 2>new I2C device. It's very powerful.

271
00:13:19.639 --> 00:13:21.639
<v Speaker 1>So when my driver starts up and it needs to

272
00:13:21.679 --> 00:13:24.240
<v Speaker 1>know its memory address or its interrupt number, how does

273
00:13:24.279 --> 00:13:26.360
<v Speaker 1>it get that from the device tree, reliably?

274
00:13:26.519 --> 00:13:29.879
<v Speaker 2>It uses specific API calls to query the tree. Crucially,

275
00:13:29.960 --> 00:13:33.240
<v Speaker 2>the device tree uses named properties, so instead of asking

276
00:13:33.279 --> 00:13:35.720
<v Speaker 2>for the first interrupt, your driver asks for the interrupt

277
00:13:35.759 --> 00:13:39.320
<v Speaker 2>named "foo" or similar. Functions like platform_get_resource_byname()

278
00:13:39.399 --> 00:13:41.759
<v Speaker 2>or of_property_read_u32() let the driver

279
00:13:41.840 --> 00:13:44.600
<v Speaker 2>request resources by name as defined in the .dts.

280
00:13:44.960 --> 00:13:47.679
<v Speaker 2>This decouples the driver from the exact physical wiring order.

281
00:13:47.840 --> 00:13:51.039
<v Speaker 1>Makes sense. Okay, last piece: let's connect this architecture to

282
00:13:51.080 --> 00:13:54.600
<v Speaker 1>a real-world subsystem. The example given was Industrial I/O,

283
00:13:54.879 --> 00:13:56.279
<v Speaker 1>IIO, right?

284
00:13:56.279 --> 00:13:59.919
<v Speaker 2>IIO is a great example. It's a whole kernel subsystem specifically designed

285
00:13:59.919 --> 00:14:04.399
<v Speaker 2>for sensors: analog-to-digital converters, ADCs, digital-to-analog

286
00:14:04.399 --> 00:14:07.559
<v Speaker 2>converters, DACs, basically data acquisition hardware.

287
00:14:07.799 --> 00:14:10.759
<v Speaker 1>It manages all the complexity of reading raw sensor values,

288
00:14:10.799 --> 00:14:12.879
<v Speaker 1>scaling them, handling triggers.

289
00:14:12.440 --> 00:14:16.159
<v Speaker 2>Exactly. But how does it expose that data to user space?

290
00:14:16.600 --> 00:14:18.960
<v Speaker 2>It doesn't require custom applications for every sensor.

291
00:14:19.039 --> 00:14:21.639
<v Speaker 1>It uses the device model and sysfs.

292
00:14:22.080 --> 00:14:25.480
<v Speaker 2>Precisely. The IIO framework models sensors and their data points as channels.

293
00:14:25.759 --> 00:14:29.080
<v Speaker 2>Each channel, like the X-axis acceleration or temperature reading,

294
00:14:29.519 --> 00:14:33.039
<v Speaker 2>gets exposed as attributes within sysfs. So reading the raw

295
00:14:33.120 --> 00:14:35.600
<v Speaker 2>value from an accelerometer might be as simple as reading

296
00:14:35.600 --> 00:14:39.519
<v Speaker 2>a file like /sys/bus/iio/devices/iio:device0/in_accel_x_raw.

297
00:14:40.519 --> 00:14:44.120
<v Speaker 2>User space just interacts with these standard file interfaces, underpinned

298
00:14:44.120 --> 00:14:47.480
<v Speaker 2>by LDM and the kobject structure representing that channel attribute.

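NOTE
Editor's sketch, not part of the recording: what "just reading a file" can look like from user space in C. The helper read_attr_long is a hypothetical name. It reads one integer from any sysfs-style attribute file and reports failure as a negative errno, mirroring the kernel convention from earlier in the episode. Whether a given IIO attribute path exists depends entirely on the hardware and driver present, so no specific path is assumed in the code.
```c
#include <errno.h>
#include <stdio.h>
/*
 * Read one integer from a sysfs-style attribute file.
 * Returns 0 and stores the value in *out, or a negative errno on failure:
 * the fopen() error if the file cannot be opened, -EIO if it holds no number.
 */
static int read_attr_long(const char *path, long *out)
{
    FILE *f = fopen(path, "r");
    int ret = 0;
    if (!f)
        return -errno;
    if (fscanf(f, "%ld", out) != 1)
        ret = -EIO;
    fclose(f);
    return ret;
}
```
On a machine with an IIO accelerometer, a caller would pass the attribute path for the channel it wants, for example a raw X-axis file under /sys/bus/iio/devices, and get the driver's current reading back as plain text converted to a long.
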
299
00:14:47.519 --> 00:14:50.120
<v Speaker 1>Wow. Okay, we've covered a lot, from those tiny __init

300
00:14:50.240 --> 00:14:52.919
<v Speaker 1>macros affecting memory all the way up to these complex

301
00:14:52.919 --> 00:14:55.399
<v Speaker 1>frameworks like IIO built on the core architecture.

302
00:14:55.559 --> 00:14:58.720
<v Speaker 2>Yeah, we hit the big points. The mindset shift for

303
00:14:58.799 --> 00:15:03.159
<v Speaker 2>kernel programming: error cleanup with goto, logging. Then

304
00:15:03.200 --> 00:15:06.679
<v Speaker 2>the concurrency minefield: spinlocks versus mutexes, atomic versus

305
00:15:06.720 --> 00:15:08.440
<v Speaker 2>sleeping contexts.

306
00:15:08.000 --> 00:15:11.159
<v Speaker 1>And then the overall structure. How the Linux device model

307
00:15:11.240 --> 00:15:14.240
<v Speaker 1>uses kobjects to build the hierarchy, how the device tree

308
00:15:14.240 --> 00:15:18.399
<v Speaker 1>describes hardware, and how subsystems like IIO plug into that.

309
00:15:18.879 --> 00:15:20.759
<v Speaker 2>So if there's one final thought to leave you with,

310
00:15:20.840 --> 00:15:24.960
<v Speaker 2>it's this: remember that kobject. It seems abstract, almost trivial,

311
00:15:25.399 --> 00:15:28.919
<v Speaker 2>but everything in the device hierarchy, from the simplest LED

312
00:15:29.080 --> 00:15:33.120
<v Speaker 2>driver to the most complex network interface or IIO sensor framework,

313
00:15:33.519 --> 00:15:36.440
<v Speaker 2>ultimately relies on kobjects to establish its existence and its

314
00:15:36.440 --> 00:15:40.000
<v Speaker 2>attributes within the kernel's world, making it visible and controllable

315
00:15:40.039 --> 00:15:43.879
<v Speaker 2>through sysfs. That tiny structure is the bedrock of Linux's

316
00:15:44.039 --> 00:15:45.480
<v Speaker 2>entire device organization.

317
00:15:45.759 --> 00:15:49.120
<v Speaker 1>Amazing how that one fundamental piece underpins so much complexity.

318
00:15:49.200 --> 00:15:51.480
<v Speaker 1>That's a great takeaway. Thank you for walking us through this.

319
00:15:51.600 --> 00:15:53.000
<v Speaker 2>My pleasure. It's fascinating stuff.

320
00:15:53.080 --> 00:15:55.080
<v Speaker 1>It really is. We'll catch you next time for another

321
00:15:55.120 --> 00:15:55.600
<v Speaker 1>deep dive.
