WEBVTT

1
00:00:00.120 --> 00:00:02.799
<v Speaker 1>Welcome to the deep dive. We've got some really interesting

2
00:00:02.799 --> 00:00:06.320
<v Speaker 1>material you sent over on concurrency in modern

3
00:00:06.360 --> 00:00:09.240
<v Speaker 1>C++. Looks like it's mostly drawing from the book

4
00:00:09.400 --> 00:00:11.480
<v Speaker 1>Concurrency with Modern C++.

5
00:00:11.560 --> 00:00:15.839
<v Speaker 2>That's right. And our goal today is, well, to unpack

6
00:00:15.880 --> 00:00:19.600
<v Speaker 2>the core ideas, maybe find some of those aha moments

7
00:00:19.679 --> 00:00:19.960
<v Speaker 2>for you.

8
00:00:20.239 --> 00:00:23.359
<v Speaker 1>Yeah, make this whole complex topic a bit more accessible

9
00:00:23.600 --> 00:00:25.760
<v Speaker 1>without drowning in the jargon.

10
00:00:25.920 --> 00:00:27.839
<v Speaker 2>Exactly. And you know, the book itself kind of hints at

11
00:00:27.879 --> 00:00:30.719
<v Speaker 2>the challenge. It mentions how the C++ memory

12
00:00:30.760 --> 00:00:33.159
<v Speaker 2>model often runs counter to our intuition.

13
00:00:33.640 --> 00:00:37.000
<v Speaker 1>Oh, interesting. So that's our mission then: to navigate that

14
00:00:37.039 --> 00:00:40.079
<v Speaker 1>complexity and pull out the essentials you need for writing

15
00:00:40.320 --> 00:00:42.399
<v Speaker 1>you know, solid concurrent C++ code.

16
00:00:41.920 --> 00:00:45.920
<v Speaker 2>Precisely. Efficient, dependable code. That's the end goal.

17
00:00:46.039 --> 00:00:49.159
<v Speaker 1>Okay, let's get started then, right at the foundation: the

18
00:00:49.200 --> 00:00:51.799
<v Speaker 1>memory model. In simple terms, what is it we're

19
00:00:51.840 --> 00:00:53.200
<v Speaker 1>trying to wrap our heads around here?

20
00:00:53.280 --> 00:00:56.240
<v Speaker 2>Well, think of the memory model as like the official

21
00:00:56.320 --> 00:00:59.560
<v Speaker 2>rule book. It dictates how different threads in your program

22
00:00:59.679 --> 00:01:01.679
<v Speaker 2>see and interact with the computer's memory.

23
00:01:01.759 --> 00:01:03.719
<v Speaker 1>Okay, rules for memory interaction.

24
00:01:03.439 --> 00:01:07.079
<v Speaker 2>And from a concurrency angle, two basic questions pop up. First,

25
00:01:07.920 --> 00:01:11.519
<v Speaker 2>what counts as a single place in memory, a memory location?

26
00:01:11.719 --> 00:01:14.040
<v Speaker 1>Right, is it a byte? An integer?

27
00:01:14.159 --> 00:01:16.519
<v Speaker 2>According to the source, it's either a basic scalar

28
00:01:16.519 --> 00:01:20.159
<v Speaker 2>type, like your ints, floats, pointers, enums, or, if you

29
00:01:20.200 --> 00:01:24.120
<v Speaker 2>have bit-fields, it's the largest contiguous sequence of

30
00:01:24.159 --> 00:01:24.719
<v Speaker 2>adjacent bit-fields.

31
00:01:25.120 --> 00:01:28.040
<v Speaker 1>Got it. Scalar types or contiguous bit-fields. What was the

32
00:01:28.079 --> 00:01:28.680
<v Speaker 1>second question?

33
00:01:28.959 --> 00:01:32.760
<v Speaker 2>Ah, the big one. What happens when multiple threads try

34
00:01:32.760 --> 00:01:35.400
<v Speaker 2>to access that same memory location at around the same time?

35
00:01:35.200 --> 00:01:37.480
<v Speaker 1>Okay, and I sense danger here.

36
00:01:37.599 --> 00:01:40.680
<v Speaker 2>You got it. That leads us straight to data races. Okay,

37
00:01:40.680 --> 00:01:44.000
<v Speaker 2>imagine two threads hitting the same shared variable. It's mutable,

38
00:01:44.159 --> 00:01:45.959
<v Speaker 2>and at least one of those threads is trying to

39
00:01:46.000 --> 00:01:46.519
<v Speaker 2>write to it.

40
00:01:46.760 --> 00:01:47.640
<v Speaker 1>That's a data race?

41
00:01:47.719 --> 00:01:51.319
<v Speaker 2>That's the data race, and the result: undefined behavior. Ah,

42
00:01:51.480 --> 00:01:56.000
<v Speaker 2>the dreaded UB. Exactly, the Wild West. Your program might crash,

43
00:01:56.040 --> 00:01:59.120
<v Speaker 2>spit out garbage, or maybe even seem to work fine

44
00:01:59.159 --> 00:02:01.760
<v Speaker 2>for a while, then just fail later completely out of

45
00:02:01.760 --> 00:02:02.120
<v Speaker 2>the blue.

46
00:02:02.280 --> 00:02:04.920
<v Speaker 1>So that's why we need things like mutexes and locks. Yeah,

47
00:02:04.959 --> 00:02:07.439
<v Speaker 1>to coordinate who gets access when.

48
00:02:07.680 --> 00:02:11.280
<v Speaker 2>Precisely. They're the traffic cops for shared data access. Essential tools.

49
00:02:11.560 --> 00:02:15.560
<v Speaker 1>The book uses thread-safe singleton initialization as a classic example.

50
00:02:16.240 --> 00:02:19.199
<v Speaker 1>Why is that such a good illustration? It seems simple, right,

51
00:02:19.400 --> 00:02:20.159
<v Speaker 1>just one instance.

52
00:02:20.360 --> 00:02:23.120
<v Speaker 2>Well, it seems simple in a single thread, but imagine

53
00:02:23.199 --> 00:02:27.159
<v Speaker 2>multiple threads all deciding, hey, I need the singleton at

54
00:02:27.159 --> 00:02:28.280
<v Speaker 2>the exact same time.

55
00:02:28.400 --> 00:02:30.360
<v Speaker 1>Ah. So if they all check and see it doesn't

56
00:02:30.400 --> 00:02:31.240
<v Speaker 1>exist yet, they...

57
00:02:31.199 --> 00:02:33.240
<v Speaker 2>...might all try to create it, and suddenly you've got

58
00:02:33.360 --> 00:02:36.080
<v Speaker 2>multiple singletons, which completely breaks the whole idea.

59
00:02:36.280 --> 00:02:39.400
<v Speaker 1>Right. So thread-safe techniques make sure only one thread

60
00:02:39.639 --> 00:02:42.599
<v Speaker 1>actually does the creation, even if many try.

61
00:02:42.479 --> 00:02:44.879
<v Speaker 2>Exactly. They ensure it's created exactly once.

62
00:02:45.280 --> 00:02:48.759
<v Speaker 1>Now, for digging deeper, the book mentions a tool called CppMem.

63
00:02:49.000 --> 00:02:49.719
<v Speaker 1>What's that about?

64
00:02:49.879 --> 00:02:53.879
<v Speaker 2>Oh, CppMem is fantastic for this. It's like a sandbox

65
00:02:54.080 --> 00:02:57.080
<v Speaker 2>or a simulator for the C++ memory model.

66
00:02:57.199 --> 00:02:57.479
<v Speaker 1>Okay.

67
00:02:57.719 --> 00:03:00.439
<v Speaker 2>You feed it small snippets of concurrent code, and it

68
00:03:00.439 --> 00:03:03.039
<v Speaker 2>shows you all the possible ways the operations from different

69
00:03:03.039 --> 00:03:07.840
<v Speaker 2>threads could interleave. It visualizes the impact of different memory orderings.

70
00:03:07.439 --> 00:03:09.479
<v Speaker 1>So you can actually see how things might go wrong

71
00:03:09.680 --> 00:03:12.360
<v Speaker 1>or why a certain ordering works.

72
00:03:12.840 --> 00:03:15.719
<v Speaker 2>Precisely. It helps build that intuition for how the memory model behaves, which,

73
00:03:16.080 --> 00:03:17.639
<v Speaker 2>as we said, isn't always obvious.

74
00:03:17.919 --> 00:03:21.240
<v Speaker 1>Really valuable tool. Okay, memory model basics covered, let's talk

75
00:03:21.280 --> 00:03:24.000
<v Speaker 1>about the threads themselves. We've had std::thread for

76
00:03:24.000 --> 00:03:27.840
<v Speaker 1>a while. C++20 added std::jthread.

77
00:03:28.199 --> 00:03:29.400
<v Speaker 1>What's the leap forward there?

78
00:03:29.520 --> 00:03:32.800
<v Speaker 2>The big difference really is resource management safety.

79
00:03:32.919 --> 00:03:34.520
<v Speaker 1>How so?

80
00:03:34.280 --> 00:03:37.919
<v Speaker 2>With std::thread, you, the programmer, must remember to either join

81
00:03:38.000 --> 00:03:40.599
<v Speaker 2>the thread (wait for it to finish) or detach it

82
00:03:40.639 --> 00:03:41.680
<v Speaker 2>to run independently.

83
00:03:41.840 --> 00:03:43.479
<v Speaker 1>And if you forget?

84
00:03:43.319 --> 00:03:45.919
<v Speaker 2>If the std::thread object gets destroyed before you do either, your

85
00:03:45.960 --> 00:03:48.000
<v Speaker 2>program terminates. It's a common mistake.

86
00:03:48.240 --> 00:03:51.599
<v Speaker 1>Ouch. Okay, so how does std::jthread fix that?

87
00:03:51.800 --> 00:03:55.919
<v Speaker 2>It's RAII-based: resource acquisition is initialization. When a

88
00:03:55.960 --> 00:03:58.280
<v Speaker 2>std::jthread object goes out of scope, its

89
00:03:58.319 --> 00:04:00.240
<v Speaker 2>destructor automatically calls join.

90
00:04:00.360 --> 00:04:03.240
<v Speaker 1>No more forgetting. Nice. That sounds much safer.

91
00:04:03.560 --> 00:04:06.080
<v Speaker 2>It is. Plus, std::jthread has built-in support for

92
00:04:06.199 --> 00:04:09.719
<v Speaker 2>cooperative interruption, a clean way to ask a thread to stop.

93
00:04:09.960 --> 00:04:12.759
<v Speaker 1>Okay, cooperative interruption. We'll probably circle back to that. Now,

94
00:04:12.759 --> 00:04:15.879
<v Speaker 1>the book mentioned something tricky with std::shared_ptr.

95
00:04:16.240 --> 00:04:17.439
<v Speaker 1>I thought they were thread-safe.

96
00:04:17.480 --> 00:04:21.000
<v Speaker 2>They help with memory management in threads, yes. They prevent

97
00:04:21.120 --> 00:04:25.040
<v Speaker 2>leaks by managing the object's lifetime automatically, but the shared

98
00:04:25.040 --> 00:04:28.000
<v Speaker 2>pointer itself isn't fully thread-safe for all operations.

99
00:04:28.079 --> 00:04:28.680
<v Speaker 1>What's the catch?

100
00:04:28.759 --> 00:04:31.560
<v Speaker 2>It's the shared pointer object itself. The reference counter is updated atomically, but if you have multiple threads

101
00:04:31.839 --> 00:04:34.680
<v Speaker 2>all trying to, say, assign a new shared pointer to

102
00:04:34.720 --> 00:04:37.519
<v Speaker 2>the same shared pointer variable, especially if it was passed

103
00:04:37.519 --> 00:04:40.160
<v Speaker 2>by reference, we could corrupt the pointer? Exactly. You can

104
00:04:40.199 --> 00:04:43.040
<v Speaker 2>get a data race on that shared pointer instance. The book

105
00:04:43.040 --> 00:04:45.839
<v Speaker 2>shows an example where this happens when threads modify a

106
00:04:45.920 --> 00:04:50.240
<v Speaker 2>shared std::shared_ptr passed by reference. The object being pointed

107
00:04:50.279 --> 00:04:53.480
<v Speaker 2>to might be fine, but the pointer's bookkeeping gets messed up.

108
00:04:53.639 --> 00:04:56.480
<v Speaker 1>So if I need multiple threads to safely update which

109
00:04:56.480 --> 00:04:59.279
<v Speaker 1>object a shared pointer points to, what's the solution?

110
00:05:00.000 --> 00:05:03.079
<v Speaker 2>The book suggests using std::atomic_store for that

111
00:05:03.120 --> 00:05:06.000
<v Speaker 2>specific case to make the update atomic, but it also

112
00:05:06.040 --> 00:05:07.480
<v Speaker 2>points out that this is kind of a workaround.

113
00:05:07.240 --> 00:05:09.120
<v Speaker 1>What's the real fix, then?

114
00:05:09.720 --> 00:05:13.279
<v Speaker 2>Ideally we'd use atomic smart pointers, like the std::atomic

115
00:05:13.439 --> 00:05:16.639
<v Speaker 2>specialization for std::shared_ptr, which C++20 introduced.

116
00:05:17.040 --> 00:05:20.000
<v Speaker 2>That handles the atomicity of the pointer operations themselves.

117
00:05:20.319 --> 00:05:23.399
<v Speaker 1>Okay, that makes sense. The book also mentioned

118
00:05:23.399 --> 00:05:24.079
<v Speaker 1>std::atomic_ref. What's that for?

119
00:05:24.279 --> 00:05:27.639
<v Speaker 2>Ah, std::atomic_ref. That's pretty neat. It

120
00:05:27.720 --> 00:05:30.959
<v Speaker 2>lets you perform atomic operations on an existing object that

121
00:05:31.199 --> 00:05:34.480
<v Speaker 2>wasn't originally declared std::atomic.

122
00:05:34.399 --> 00:05:37.240
<v Speaker 1>So you can temporarily treat a regular variable as atomic?

123
00:05:37.319 --> 00:05:39.439
<v Speaker 2>Sort of, yeah. You create an atomic_ref to it, and then

124
00:05:39.480 --> 00:05:43.279
<v Speaker 2>you can use atomic operations like fetch_add or

125
00:05:43.279 --> 00:05:47.079
<v Speaker 2>compare_exchange_strong directly on that underlying variable through the reference. The

126
00:05:47.160 --> 00:05:50.800
<v Speaker 2>example showed incrementing a counter inside some big object without

127
00:05:50.839 --> 00:05:53.800
<v Speaker 2>needing locks or making the whole object atomic.

128
00:05:53.600 --> 00:05:56.399
<v Speaker 1>Interesting. So careful management is key. This leads us nicely

129
00:05:56.439 --> 00:06:02.040
<v Speaker 1>into memory ordering: sequential consistency, acquire-release, relaxed. These sound

130
00:06:02.079 --> 00:06:03.279
<v Speaker 1>like different levels of rules.

131
00:06:03.360 --> 00:06:06.839
<v Speaker 2>They are. They're different contracts, different guarantees about how memory

132
00:06:06.879 --> 00:06:09.000
<v Speaker 2>operations become visible across threads.

133
00:06:09.160 --> 00:06:13.959
<v Speaker 1>Let's start with the strictest: sequential consistency, memory_order_seq_cst.

134
00:06:13.800 --> 00:06:17.079
<v Speaker 2>Right. That's the default for atomics, and it's the

135
00:06:17.120 --> 00:06:20.959
<v Speaker 2>easiest to reason about. It basically guarantees two things. One,

136
00:06:21.560 --> 00:06:24.800
<v Speaker 2>all threads agree on a single global order of all

137
00:06:24.839 --> 00:06:30.240
<v Speaker 2>sequentially consistent operations, and two, the operations within any single

138
00:06:30.319 --> 00:06:33.240
<v Speaker 2>thread happen in the order you wrote them in your code.

139
00:06:33.120 --> 00:06:35.199
<v Speaker 1>Like one single timeline for everything.

140
00:06:35.360 --> 00:06:39.120
<v Speaker 2>Exactly. Simple model, but it can sometimes have performance costs

141
00:06:39.199 --> 00:06:41.519
<v Speaker 2>because the hardware has to work harder to maintain that

142
00:06:41.560 --> 00:06:42.240
<v Speaker 2>global order.

143
00:06:42.399 --> 00:06:45.600
<v Speaker 1>Okay, what about acquire-release semantics, then? Sounds like it

144
00:06:45.680 --> 00:06:46.720
<v Speaker 1>loosens things up a bit.

145
00:06:46.879 --> 00:06:50.920
<v Speaker 2>It does. Acquire-release (memory_order_acquire, memory_order_release,

146
00:06:51.000 --> 00:06:55.199
<v Speaker 2>memory_order_acq_rel, and also consume, though that's trickier) focuses

147
00:06:55.240 --> 00:06:58.720
<v Speaker 2>on synchronization between operations on the same atomic variable.

148
00:06:58.360 --> 00:06:59.759
<v Speaker 1>Same variable, okay.

149
00:07:00.120 --> 00:07:03.839
<v Speaker 2>A release operation, typically a write, ensures that all memory writes

150
00:07:03.920 --> 00:07:07.040
<v Speaker 2>that happen before it in the same thread become visible

151
00:07:07.079 --> 00:07:10.519
<v Speaker 2>to other threads that later perform an acquire operation, usually

152
00:07:10.680 --> 00:07:12.920
<v Speaker 2>a read, on that same atomic variable.

153
00:07:13.120 --> 00:07:16.319
<v Speaker 1>So the release makes prior writes visible and the acquire

154
00:07:16.399 --> 00:07:17.720
<v Speaker 1>sees them.

155
00:07:18.199 --> 00:07:22.839
<v Speaker 2>Precisely. This creates what's called a synchronizes-with relationship. It's fundamental.

156
00:07:23.199 --> 00:07:26.240
<v Speaker 2>The book points out this is how mutexes, thread joins,

157
00:07:26.360 --> 00:07:29.800
<v Speaker 2>condition variables, all the higher-level stuff actually works under

158
00:07:29.800 --> 00:07:33.839
<v Speaker 2>the hood. A lock release synchronizes with a subsequent lock acquire.

159
00:07:33.439 --> 00:07:37.519
<v Speaker 1>That synchronizes-with relationship sounds important. It establishes

160
00:07:37.639 --> 00:07:38.680
<v Speaker 1>order across threads.

161
00:07:38.839 --> 00:07:43.040
<v Speaker 2>Yes, it establishes a happens-before relationship. If action A

162
00:07:43.279 --> 00:07:47.399
<v Speaker 2>synchronizes with action B, then A happens before B. This

163
00:07:47.480 --> 00:07:49.600
<v Speaker 2>guarantees visibility of memory changes.
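
NOTE
Editor's sketch, not taken from the book: a minimal illustration of the release/acquire pairing described above.
The names payload, ready and run_handoff are hypothetical. The release store publishes the plain write to payload,
and the acquire load that observes true makes that write visible to the consumer.
```cpp
#include <atomic>
#include <cassert>
#include <thread>
// Hypothetical example: hands one int from a producer thread to a consumer thread.
int run_handoff() {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};  // the atomic both threads synchronize on
    int seen = -1;
    std::thread producer([&] {
        payload = 42;                                  // ordinary write
        ready.store(true, std::memory_order_release);  // publishes the write above
    });
    std::thread consumer([&] {
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();  // spin until the released value is observed
        }
        seen = payload;  // safe: the release store synchronizes with this acquire load
    });
    producer.join();
    consumer.join();
    assert(seen == 42);
    return seen;
}
```
Calling run_handoff() is expected to return 42 on every run, because the read of payload happens after the write.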

164
00:07:49.920 --> 00:07:52.680
<v Speaker 1>Got it. Now, what about the most lenient one,

165
00:07:52.800 --> 00:07:55.560
<v Speaker 1>memory_order_relaxed? What guarantees do we lose there?

166
00:07:55.720 --> 00:07:59.240
<v Speaker 2>With relaxed ordering, you only get the bare minimum: the

167
00:07:59.240 --> 00:08:02.360
<v Speaker 2>operation itself is atomic. It happens indivisibly, and there's a

168
00:08:02.399 --> 00:08:06.480
<v Speaker 2>single modification order for that specific atomic variable. All threads

169
00:08:06.519 --> 00:08:09.040
<v Speaker 2>will agree on the sequence of values written to that one variable.

170
00:08:08.920 --> 00:08:11.879
<v Speaker 1>But no guarantees about other memory operations?

171
00:08:11.920 --> 00:08:15.839
<v Speaker 2>Exactly. Relaxed operations don't create synchronizes-with relationships. They don't guarantee

172
00:08:15.839 --> 00:08:18.680
<v Speaker 2>anything about the visibility or ordering of other reads and writes,

173
00:08:18.879 --> 00:08:21.079
<v Speaker 2>even to the same variable by different threads or to

174
00:08:21.120 --> 00:08:21.959
<v Speaker 2>different variables.

175
00:08:22.240 --> 00:08:26.480
<v Speaker 1>So potentially faster, but much harder to reason about?

176
00:08:26.800 --> 00:08:29.720
<v Speaker 2>Much harder. The book shows using fetch_add with relaxed ordering for a

177
00:08:29.800 --> 00:08:32.360
<v Speaker 2>simple counter, which is a common use case, but it

178
00:08:32.399 --> 00:08:35.039
<v Speaker 2>also warns that you can still get data races on

179
00:08:35.120 --> 00:08:38.360
<v Speaker 2>non-atomic variables even if you're reading related atomics with

180
00:08:38.600 --> 00:08:42.799
<v Speaker 2>relaxed order, because there's no happens-before relationship established. It's

181
00:08:42.799 --> 00:08:43.519
<v Speaker 2>subtle stuff.
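
NOTE
Editor's sketch, not taken from the book: the relaxed counter use case mentioned above, under the assumption that
only the final total matters. The function name count_events and its parameters are hypothetical.
```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>
// Hypothetical helper: counts events from several threads with relaxed increments.
long count_events(int num_threads, int increments_per_thread) {
    std::atomic<long> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                // Atomic, but imposes no ordering on any other memory access.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) w.join();  // join() orders all increments before the final read
    return counter.load(std::memory_order_relaxed);
}
```
No increment is lost because each fetch_add is indivisible. The relaxed order would not be enough if other,
non-atomic data had to be published along with the counter.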

182
00:08:43.559 --> 00:08:46.440
<v Speaker 1>And where do memory fences, std::atomic_thread_fence, fit in?

183
00:08:46.240 --> 00:08:50.360
<v Speaker 2>Fences act like barriers. They enforce ordering constraints

184
00:08:50.360 --> 00:08:53.519
<v Speaker 2>between operations before the fence and operations after the fence,

185
00:08:53.600 --> 00:08:57.440
<v Speaker 2>even across different variables or relaxed atomics. A release fence

186
00:08:57.480 --> 00:09:00.480
<v Speaker 2>makes prior writes visible to threads that later execute an

187
00:09:00.519 --> 00:09:05.000
<v Speaker 2>acquire fence. It's another way to establish that synchronizes-with relationship,

188
00:09:05.360 --> 00:09:08.399
<v Speaker 2>with the ordering attached to the fences, although an atomic variable, even a relaxed one, still has to carry the signal.
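
NOTE
Editor's sketch, not taken from the book: one way the fence-based handoff could look. The names payload, ready and
run_fence_handoff are hypothetical. The atomic flag is only accessed with relaxed order, and the two fences supply
the release and acquire ordering around it.
```cpp
#include <atomic>
#include <cassert>
#include <thread>
// Hypothetical example: same handoff idea as a release store and acquire load, expressed with fences.
int run_fence_handoff() {
    int payload = 0;
    std::atomic<bool> ready{false};
    int seen = -1;
    std::thread producer([&] {
        payload = 7;
        std::atomic_thread_fence(std::memory_order_release);  // orders the write above before the store below
        ready.store(true, std::memory_order_relaxed);
    });
    std::thread consumer([&] {
        while (!ready.load(std::memory_order_relaxed)) {
            std::this_thread::yield();
        }
        std::atomic_thread_fence(std::memory_order_acquire);  // orders the load above before the read below
        seen = payload;
    });
    producer.join();
    consumer.join();
    return seen;
}
```
The relaxed flag still carries the signal between the threads. The fences are what make the write to payload visible.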

189
00:09:08.480 --> 00:09:10.799
<v Speaker 1>Okay, that's a lot to digest on ordering. Let's shift

190
00:09:10.840 --> 00:09:12.960
<v Speaker 1>to actually using threads. How do we launch them? What

191
00:09:13.000 --> 00:09:13.720
<v Speaker 1>are the options?

192
00:09:13.840 --> 00:09:17.519
<v Speaker 2>The main way is std::thread. Its constructor can

193
00:09:17.559 --> 00:09:20.799
<v Speaker 2>take basically any callable thing. Callable thing? Yeah, like a

194
00:09:20.840 --> 00:09:24.840
<v Speaker 2>regular function pointer, or a function object, you know, an

195
00:09:24.840 --> 00:09:29.080
<v Speaker 2>object where you've overloaded the function call operator, or, very commonly,

196
00:09:29.159 --> 00:09:30.000
<v Speaker 2>a lambda function.

197
00:09:30.159 --> 00:09:31.600
<v Speaker 1>Right. Lambdas are handy there.

198
00:09:31.480 --> 00:09:34.279
<v Speaker 2>Super handy. You just pass the function or lambda you

199
00:09:34.279 --> 00:09:36.360
<v Speaker 2>want to run in the new thread, followed by any

200
00:09:36.480 --> 00:09:39.679
<v Speaker 2>arguments it needs. The book shows a simple Hello from

201
00:09:39.720 --> 00:09:41.639
<v Speaker 2>thread using a lambda.

202
00:09:41.399 --> 00:09:44.440
<v Speaker 1>And once it's running, we have to decide what happens

203
00:09:44.440 --> 00:09:47.559
<v Speaker 1>when it finishes. Join or detach.

204
00:09:47.279 --> 00:09:50.159
<v Speaker 2>Exactly. You have to make a choice before the

205
00:09:50.279 --> 00:09:54.519
<v Speaker 2>std::thread object itself is destroyed. Join means the current

206
00:09:54.559 --> 00:09:57.919
<v Speaker 2>thread waits right there until the launched thread completes.

207
00:09:58.240 --> 00:10:00.440
<v Speaker 1>Useful if you need its result or need to know

208
00:10:00.480 --> 00:10:02.080
<v Speaker 1>it's done before cleaning up resources.

209
00:10:02.120 --> 00:10:05.759
<v Speaker 2>Precisely. Detach, on the other hand, lets the thread run

210
00:10:05.799 --> 00:10:11.000
<v Speaker 2>completely independently in the background. The original thread continues immediately.

211
00:10:11.080 --> 00:10:14.039
<v Speaker 1>But that sounds risky. What if the detached thread needs

212
00:10:14.159 --> 00:10:15.879
<v Speaker 1>data that the original thread owns.

213
00:10:16.039 --> 00:10:18.919
<v Speaker 2>That's the big danger. If the original thread finishes and

214
00:10:18.960 --> 00:10:21.440
<v Speaker 2>its data goes out of scope, but the detached thread

215
00:10:21.480 --> 00:10:24.919
<v Speaker 2>is still running and tries to access that data, Boom,

216
00:10:25.000 --> 00:10:29.240
<v Speaker 2>undefined behavior again. So the book advises joining, usually? Strongly

217
00:10:29.279 --> 00:10:32.840
<v Speaker 2>advises joining, especially if the thread interacts with data whose

218
00:10:32.879 --> 00:10:35.759
<v Speaker 2>lifetime is tied to the scope where the thread was created.

219
00:10:35.919 --> 00:10:39.279
<v Speaker 2>Detaching requires very careful management of lifetimes.
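
NOTE
Editor's sketch, not taken from the book: launching a std::thread with a lambda plus one argument and then joining
it. The function greet_from_thread and the variable names are hypothetical. The thread writes to a local variable
of the launching function, which is exactly the lifetime situation where joining is the safe choice.
```cpp
#include <cassert>
#include <string>
#include <thread>
// Hypothetical function: runs a lambda in a new thread and waits for it to finish.
std::string greet_from_thread(const std::string& name) {
    std::string message;  // owned by this scope, so the thread must finish before we return
    std::thread worker([&message](const std::string& who) {
        message = "Hello from thread, " + who;
    }, name);  // arguments after the callable are copied and passed to it
    worker.join();  // without join() or detach(), destroying worker would call std::terminate
    return message;
}
```
Replacing join() with detach() here would leave the thread free to write to message after it has been destroyed.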

220
00:10:39.559 --> 00:10:43.720
<v Speaker 1>Makes sense. What about std::thread::hardware_concurrency?

221
00:10:43.879 --> 00:10:46.159
<v Speaker 2>It gives you a hint, basically,

222
00:10:46.679 --> 00:10:49.919
<v Speaker 2>an estimate of how many threads the hardware can genuinely

223
00:10:49.960 --> 00:10:53.559
<v Speaker 2>run in parallel, often related to the number of CPU

224
00:10:53.600 --> 00:10:54.879
<v Speaker 2>cores or hyperthreads.

225
00:10:55.120 --> 00:10:57.240
<v Speaker 1>A hint, not a rule?

226
00:10:57.480 --> 00:11:01.559
<v Speaker 2>Definitely just a hint. The optimal number of threads depends heavily on the specific task, I/O, contention,

227
00:11:01.720 --> 00:11:05.120
<v Speaker 2>et cetera. Using exactly this number isn't always best. The

228
00:11:05.120 --> 00:11:08.080
<v Speaker 2>book mentions it's just a starting point. And native_handle?

229
00:11:08.279 --> 00:11:10.799
<v Speaker 2>That's an escape hatch. It gives you direct access to

230
00:11:10.840 --> 00:11:13.679
<v Speaker 2>the underlying operating system's thread handle, like a pthread_t on

231
00:11:13.759 --> 00:11:16.120
<v Speaker 2>Linux or a HANDLE on Windows, if you need to

232
00:11:16.159 --> 00:11:18.960
<v Speaker 2>do something platform-specific that the C++ standard

233
00:11:18.960 --> 00:11:21.159
<v Speaker 2>library doesn't cover. Use with caution, though.

234
00:11:21.279 --> 00:11:23.919
<v Speaker 1>Okay, got it. Let's move on to the tools we

235
00:11:24.000 --> 00:11:27.559
<v Speaker 1>use with threads: synchronization primitives, starting with the most basic,

236
00:11:27.840 --> 00:11:29.000
<v Speaker 1>std::mutex.

237
00:11:29.320 --> 00:11:33.759
<v Speaker 2>Right, the mutex. Its core job is mutual exclusion: protecting

238
00:11:33.799 --> 00:11:34.440
<v Speaker 2>shared data.

239
00:11:34.519 --> 00:11:35.200
<v Speaker 1>How does it do that?

240
00:11:35.480 --> 00:11:37.840
<v Speaker 2>Think of it as a lock guarding a piece of data.

241
00:11:38.399 --> 00:11:40.600
<v Speaker 2>Before a thread can touch that data, it has to

242
00:11:40.639 --> 00:11:43.759
<v Speaker 2>lock the mutex. If another thread already holds the lock,

243
00:11:43.879 --> 00:11:47.240
<v Speaker 2>the requesting thread waits. Once the owner is done, it must unlock

244
00:11:47.279 --> 00:11:50.200
<v Speaker 2>the mutex, allowing another waiting thread to proceed.

245
00:11:50.200 --> 00:11:52.639
<v Speaker 1>So only one thread gets access at a time. Prevents

246
00:11:52.759 --> 00:11:55.039
<v Speaker 1>data races on that protected data.

247
00:11:55.399 --> 00:11:58.399
<v Speaker 2>Exactly. Mutexes are your go-to for protecting shared mutable state,

248
00:11:58.639 --> 00:11:59.639
<v Speaker 2>your first line of defense.
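
NOTE
Editor's sketch, not taken from the book: a std::mutex guarding a shared counter. The names sum_with_mutex, total
and total_mutex are hypothetical. std::lock_guard locks in its constructor and unlocks in its destructor, so the
unlock cannot be forgotten.
```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>
// Hypothetical example: total is shared mutable state protected by total_mutex.
int sum_with_mutex(int num_threads, int adds_per_thread) {
    int total = 0;
    std::mutex total_mutex;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < adds_per_thread; ++i) {
                std::lock_guard<std::mutex> guard(total_mutex);  // lock now, unlock at end of scope
                ++total;                                         // only one thread at a time gets here
            }
        });
    }
    for (auto& w : workers) w.join();
    return total;
}
```
Without the lock_guard line, the unsynchronized ++total from several threads would be a data race.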

249
00:12:00.240 --> 00:12:04.279
<v Speaker 1>But the book warns about deadlocks. How do mutexes lead

250
00:12:04.320 --> 00:12:04.559
<v Speaker 1>to that?

251
00:12:05.080 --> 00:12:09.000
<v Speaker 2>Ah, the classic deadlock scenario. Imagine thread one locks mutex A,

252
00:12:09.360 --> 00:12:13.360
<v Speaker 2>then tries to lock mutex B. Simultaneously, thread two locks

253
00:12:13.440 --> 00:12:15.440
<v Speaker 2>mutex B, then tries to lock mutex A.

254
00:12:15.840 --> 00:12:18.240
<v Speaker 1>Oh. Thread one has A and wants B. Thread two

255
00:12:18.279 --> 00:12:19.200
<v Speaker 1>has B and wants A.

256
00:12:19.480 --> 00:12:22.440
<v Speaker 2>And they're stuck. Neither can proceed because it's waiting for

257
00:12:22.480 --> 00:12:24.600
<v Speaker 2>the resource the other one holds. That's a deadlock. They

258
00:12:24.679 --> 00:12:25.960
<v Speaker 2>wait forever. Nasty.

259
00:12:26.360 --> 00:12:28.440
<v Speaker 1>How do we avoid that when we need multiple locks?

260
00:12:28.679 --> 00:12:31.879
<v Speaker 2>The standard solution is std::lock. You pass it

261
00:12:31.919 --> 00:12:34.240
<v Speaker 2>all the mutexes you need to acquire. It uses a

262
00:12:34.279 --> 00:12:37.360
<v Speaker 2>deadlock avoidance algorithm internally to try and lock all of them atomically.

263
00:12:37.320 --> 00:12:40.279
<v Speaker 1>Atomically, meaning it gets all of them or none

264
00:12:40.320 --> 00:12:40.639
<v Speaker 1>of them.

265
00:12:40.879 --> 00:12:43.480
<v Speaker 2>Essentially, yes. It guarantees it won't end up in a

266
00:12:43.519 --> 00:12:46.200
<v Speaker 2>state where it holds some locks while blocking waiting for

267
00:12:46.279 --> 00:12:49.039
<v Speaker 2>others in a way that contributes to deadlock. If it

268
00:12:49.039 --> 00:12:51.559
<v Speaker 2>can't get all locks, it'll release any it acquired and

269
00:12:51.600 --> 00:12:54.600
<v Speaker 2>try again, or perhaps throw an exception, depending on the context.
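
NOTE
Editor's sketch, not taken from the book: std::lock acquiring two mutexes that two threads request in opposite
order, which is the classic deadlock setup described above. The Account type and the functions transfer and
run_transfers are hypothetical.
```cpp
#include <cassert>
#include <mutex>
#include <thread>
struct Account {  // hypothetical type for illustration
    std::mutex m;
    int balance;
    explicit Account(int b) : balance(b) {}
};
void transfer(Account& from, Account& to, int amount) {
    std::lock(from.m, to.m);  // acquires both mutexes without risking deadlock
    std::lock_guard<std::mutex> g1(from.m, std::adopt_lock);  // adopt ownership so both unlock on exit
    std::lock_guard<std::mutex> g2(to.m, std::adopt_lock);
    from.balance -= amount;
    to.balance += amount;
}
int run_transfers() {
    Account a(100), b(100);
    std::thread t1([&] { for (int i = 0; i < 1000; ++i) transfer(a, b, 1); });
    std::thread t2([&] { for (int i = 0; i < 1000; ++i) transfer(b, a, 1); });  // opposite lock order
    t1.join();
    t2.join();
    return a.balance + b.balance;  // money is only moved, so the sum is unchanged
}
```
In C++17 and later, std::scoped_lock can replace the std::lock plus adopt_lock combination with a single line.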

270
00:12:54.799 --> 00:12:58.799
<v Speaker 1>Okay, so std::lock for multiple mutexes. Yeah, good tip.

271
00:12:59.159 --> 00:13:02.600
<v Speaker 1>We mentioned thread-safe initialization earlier. Besides a simple lock,

272
00:13:02.639 --> 00:13:04.120
<v Speaker 1>what other techniques does the book cover?

273
00:13:04.320 --> 00:13:06.679
<v Speaker 2>Several good ones. If something can be constexpr,

274
00:13:06.759 --> 00:13:09.960
<v Speaker 2>its value is fixed at compile time, so that's inherently thread-safe.

275
00:13:09.840 --> 00:13:11.799
<v Speaker 1>Right, no runtime race possible.

276
00:13:11.840 --> 00:13:15.759
<v Speaker 2>Well, then there's std::call_once with std::once_flag.

277
00:13:16.120 --> 00:13:18.279
<v Speaker 2>You pass it a flag and a function like your

278
00:13:18.320 --> 00:13:22.279
<v Speaker 2>initialization function. The standard guarantees that function will be executed

279
00:13:22.320 --> 00:13:25.000
<v Speaker 2>exactly once by the first thread that calls it, even

280
00:13:25.039 --> 00:13:27.919
<v Speaker 2>if many threads call it concurrently. Other threads will wait

281
00:13:28.000 --> 00:13:29.200
<v Speaker 2>until the first one is done.

282
00:13:29.240 --> 00:13:30.440
<v Speaker 1>Okay, that sounds robust.

283
00:13:30.960 --> 00:13:34.639
<v Speaker 2>Very. Another common C++ idiom, especially since

284
00:13:34.759 --> 00:13:38.879
<v Speaker 2>C++11, is the Meyers singleton: using a static

285
00:13:39.000 --> 00:13:40.559
<v Speaker 2>variable inside a function.

286
00:13:40.600 --> 00:13:43.279
<v Speaker 1>Like static MySingleton instance; return instance;

287
00:13:43.000 --> 00:13:47.799
<v Speaker 2>Exactly that. The language guarantees that the initialization of

288
00:13:47.840 --> 00:13:51.679
<v Speaker 2>that static local variable is thread safe. The compiler and

289
00:13:51.799 --> 00:13:55.759
<v Speaker 2>runtime handle the locking implicitly. It's often the simplest and

290
00:13:55.799 --> 00:13:56.480
<v Speaker 2>preferred way.
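
NOTE
Editor's sketch, not taken from the book: std::call_once and a Meyers singleton side by side. The names
init_calls, init_flag, init_resource, Config and run_once_from_many_threads are hypothetical.
```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>
std::atomic<int> init_calls{0};  // counts how often the initialization actually ran
std::once_flag init_flag;
void init_resource() { ++init_calls; }
struct Config {  // hypothetical singleton type
    static Config& instance() {
        static Config cfg;  // initialized exactly once, thread-safely, since C++11
        return cfg;
    }
    int value = 5;
private:
    Config() = default;
};
int run_once_from_many_threads(int num_threads) {
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([] {
            std::call_once(init_flag, init_resource);  // only one caller runs init_resource
            (void)Config::instance();                  // every caller sees the same object
        });
    }
    for (auto& w : workers) w.join();
    return init_calls.load();
}
```
However many threads race to the call, init_resource is expected to run a single time.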

291
00:13:56.519 --> 00:13:59.080
<v Speaker 1>Nice, simple is good. Any others?

292
00:13:59.320 --> 00:14:02.559
<v Speaker 2>Well, simplest of all, if your program structure allows it, is to

293
00:14:02.720 --> 00:14:05.720
<v Speaker 2>just initialize the shared resource in your main thread before

294
00:14:05.840 --> 00:14:10.120
<v Speaker 2>you create any other threads. No concurrency during initialization means

295
00:14:10.120 --> 00:14:10.679
<v Speaker 2>no problem.

296
00:14:11.159 --> 00:14:14.559
<v Speaker 1>Fair enough. What about signaling between threads, like, hey, the

297
00:14:14.639 --> 00:14:18.159
<v Speaker 1>data you're waiting for is ready? That's

298
00:14:18.320 --> 00:14:19.399
<v Speaker 1>std::condition_variable?

299
00:14:19.440 --> 00:14:23.519
<v Speaker 2>Precisely. Condition variables let threads wait efficiently until some condition becomes true.

300
00:14:23.720 --> 00:14:25.360
<v Speaker 1>How do they work? Do they need a mutex?

301
00:14:25.559 --> 00:14:28.159
<v Speaker 2>Yes, they always work together with a mutex. A waiting

302
00:14:28.200 --> 00:14:31.200
<v Speaker 2>thread must first lock the mutex protecting the shared state

303
00:14:31.320 --> 00:14:34.960
<v Speaker 2>and the condition. Then it calls wait on the condition variable,

304
00:14:35.039 --> 00:14:38.200
<v Speaker 2>and wait does what? It atomically releases the mutex and

305
00:14:38.240 --> 00:14:42.080
<v Speaker 2>puts the thread to sleep. It waits until another thread notifies it.

306
00:14:42.120 --> 00:14:43.840
<v Speaker 1>Notifies it how?

307
00:14:43.639 --> 00:14:48.720
<v Speaker 2>By calling notify_one or notify_all on the same condition variable. When

308
00:14:48.759 --> 00:14:52.639
<v Speaker 2>the waiting thread wakes up, it automatically reacquires the mutex

309
00:14:52.799 --> 00:14:54.120
<v Speaker 2>before wait returns.

310
00:14:54.159 --> 00:14:56.840
<v Speaker 1>Okay, it wakes up, gets the lock back. Then it

311
00:14:56.879 --> 00:14:58.159
<v Speaker 1>can check the condition.

312
00:14:58.240 --> 00:15:01.360
<v Speaker 2>Exactly. And this is crucial. It must check the condition again

313
00:15:01.600 --> 00:15:02.399
<v Speaker 2>after waking up.

314
00:15:02.519 --> 00:15:05.039
<v Speaker 1>Why? Didn't it get notified because the condition is true?

315
00:15:05.240 --> 00:15:08.080
<v Speaker 2>Not necessarily. You can get spurious wakeups, where the thread

316
00:15:08.080 --> 00:15:10.519
<v Speaker 2>wakes up even though no notification happened, or the condition

317
00:15:10.679 --> 00:15:14.360
<v Speaker 2>changed back. That's why wait functions usually take a predicate,

318
00:15:14.480 --> 00:15:17.639
<v Speaker 2>a lambda or function that checks the actual condition. The

319
00:15:17.679 --> 00:15:20.679
<v Speaker 2>wait will only return if the predicate is true or

320
00:15:20.720 --> 00:15:21.480
<v Speaker 2>if interrupted.

321
00:15:21.720 --> 00:15:25.480
<v Speaker 1>Ah, so the predicate handles spurious wakeups. Never wait without

322
00:15:25.519 --> 00:15:25.960
<v Speaker 1>one.

323
00:15:26.080 --> 00:15:28.000
<v Speaker 2>That's the rule. Always wait with a predicate.
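
NOTE
Editor's sketch, not taken from the book: a condition variable wait with a predicate. The names wait_for_data,
data_ready, data and received are hypothetical. The predicate is checked before sleeping and after every wakeup,
which covers both spurious wakeups and a notification that arrives before the wait starts.
```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
// Hypothetical example: a producer signals a consumer that data is ready.
int wait_for_data() {
    std::mutex m;
    std::condition_variable cv;
    bool data_ready = false;  // the actual condition, protected by m
    int data = 0;
    int received = -1;
    std::thread consumer([&] {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [&] { return data_ready; });  // releases m while sleeping, re-locks before returning
        received = data;                            // m is held here, so this read is safe
    });
    std::thread producer([&] {
        {
            std::lock_guard<std::mutex> lock(m);
            data = 99;
            data_ready = true;  // change the condition under the same mutex
        }
        cv.notify_one();  // wake the waiting thread after releasing the lock
    });
    producer.join();
    consumer.join();
    return received;
}
```
The wait needs std::unique_lock rather than std::lock_guard because it has to unlock and re-lock the mutex.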

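NOTE
Editor's sketch, not from the book: one way the wait-with-a-predicate rule above
might look in code. Mailbox, send, receive and run_mailbox_demo are hypothetical
names chosen for illustration; only the std:: calls are standard library API.
```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
// Hypothetical shared state: a flag and a payload guarded by one mutex.
struct Mailbox {
    std::mutex m;
    std::condition_variable cv;
    bool ready = false;
    int payload = 0;
};
// Consumer: blocks until `ready` is true, then reads the payload.
int receive(Mailbox& box) {
    std::unique_lock<std::mutex> lock(box.m);
    // wait releases the mutex while sleeping and re-checks the predicate after
    // every wakeup, so a spurious wakeup simply goes back to sleep.
    box.cv.wait(lock, [&box] { return box.ready; });
    assert(lock.owns_lock());  // the mutex is held again when wait returns
    return box.payload;
}
// Producer: changes the condition under the mutex, then notifies.
void send(Mailbox& box, int value) {
    {
        std::lock_guard<std::mutex> lock(box.m);
        box.payload = value;
        box.ready = true;
    }
    box.cv.notify_one();
}
// Runs one producer and one consumer; returns what the consumer received.
int run_mailbox_demo() {
    Mailbox box;
    int received = 0;
    std::thread consumer([&] { received = receive(box); });
    std::thread producer([&] { send(box, 42); });
    producer.join();
    consumer.join();
    return received;
}
```
Because the predicate is checked before sleeping, the consumer also copes with a
notification that was sent before it started waiting.
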
324
00:15:28.120 --> 00:15:32.519
<v Speaker 1>Now, C++20 brought cooperative interruption:

325
00:15:32.600 --> 00:15:36.679
<v Speaker 1>std::stop_source, std::stop_token. How does that fit in? Especially

326
00:15:36.679 --> 00:15:38.799
<v Speaker 1>with jthread and std::condition_variable_any?

327
00:15:38.639 --> 00:15:40.600
<v Speaker 2>Right, this is a much better way to

328
00:15:40.679 --> 00:15:43.759
<v Speaker 2>ask threads to stop than, say, just setting a Boolean flag.

329
00:15:43.759 --> 00:15:44.600
<v Speaker 2>It's more integrated.

330
00:15:44.639 --> 00:15:45.279
<v Speaker 1>How does it work?

331
00:15:45.519 --> 00:15:49.840
<v Speaker 2>You create a std::stop_source. This object can

332
00:15:49.879 --> 00:15:53.519
<v Speaker 2>request that associated operations stop. From the stop source, you

333
00:15:53.559 --> 00:15:56.679
<v Speaker 2>get std::stop_token objects. You pass these tokens to

334
00:15:56.759 --> 00:15:58.360
<v Speaker 2>the threads or operations.

335
00:15:57.840 --> 00:16:00.000
<v Speaker 1>you might want to interrupt. And the thread checks

336
00:16:00.120 --> 00:16:00.519
<v Speaker 1>the token?

337
00:16:00.720 --> 00:16:04.679
<v Speaker 2>Yes, a thread can periodically call stop_requested on its token.

338
00:16:05.200 --> 00:16:08.399
<v Speaker 2>Or even better, many blocking functions, like the wait functions

339
00:16:08.399 --> 00:16:11.159
<v Speaker 2>on std::condition_variable_any, and the ones in

340
00:16:11.240 --> 00:16:15.000
<v Speaker 2>jthread implicitly, can accept a stop token. They'll automatically

341
00:16:15.000 --> 00:16:17.279
<v Speaker 2>wake up if a stop is requested on that token.

342
00:16:17.399 --> 00:16:19.120
<v Speaker 1>So jthread uses this automatically?

343
00:16:19.240 --> 00:16:21.720
<v Speaker 2>std::jthread has a stop source built in. If you

344
00:16:21.759 --> 00:16:23.559
<v Speaker 2>create a jthread with a function that takes a

345
00:16:23.600 --> 00:16:26.399
<v Speaker 2>stop token as its first argument, the jthread's destructor

346
00:16:26.440 --> 00:16:30.480
<v Speaker 2>will automatically request stop before joining. It makes graceful shutdown

347
00:16:30.320 --> 00:16:33.679
<v Speaker 1>much easier. And std::stop_callback? That lets

348
00:16:33.519 --> 00:16:36.279
<v Speaker 2>you register a function that gets called immediately when stop

349
00:16:36.320 --> 00:16:38.960
<v Speaker 2>is requested on a given token, useful for things like

350
00:16:39.039 --> 00:16:41.840
<v Speaker 2>quickly closing a socket or canceling an I/O operation.

351
00:16:42.000 --> 00:16:45.080
<v Speaker 1>Okay, a much cleaner stop mechanism. What about

352
00:16:45.120 --> 00:16:47.600
<v Speaker 1>std::counting_semaphore, also C++20? How's that different

353
00:16:47.639 --> 00:16:48.279
<v Speaker 1>from a mutex?

354
00:16:48.559 --> 00:16:51.840
<v Speaker 2>A mutex is about exclusive access: only one thread in

355
00:16:51.919 --> 00:16:55.799
<v Speaker 2>at a time. A semaphore maintains a counter representing available

356
00:16:55.840 --> 00:16:57.080
<v Speaker 2>resources or permits.

357
00:16:57.080 --> 00:16:57.799
<v Speaker 1>How does that work?

358
00:16:58.080 --> 00:17:01.720
<v Speaker 2>A thread calls acquire to take a permit, decrementing

359
00:17:01.759 --> 00:17:04.640
<v Speaker 2>the counter. If the counter is zero, the thread blocks.

360
00:17:05.279 --> 00:17:08.799
<v Speaker 2>A thread calls release to return a permit, incrementing the counter,

361
00:17:09.119 --> 00:17:11.000
<v Speaker 2>potentially waking up a blocked thread.

362
00:17:11.160 --> 00:17:13.240
<v Speaker 1>Can different threads acquire and release?

363
00:17:13.720 --> 00:17:17.359
<v Speaker 2>Yes, that's a key difference from mutexes, which are usually

364
00:17:17.400 --> 00:17:20.680
<v Speaker 2>locked and unlocked by the same thread. Semaphores are great

365
00:17:20.680 --> 00:17:23.400
<v Speaker 2>for controlling access to a pool of n resources or

366
00:17:23.440 --> 00:17:26.839
<v Speaker 2>for producer consumer scenarios where one thread signals another about

367
00:17:26.839 --> 00:17:28.960
<v Speaker 2>available work. They're thread agnostic.

368
00:17:29.359 --> 00:17:33.160
<v Speaker 1>Interesting. Lastly, for basic sync, C++20 also

369
00:17:33.200 --> 00:17:36.519
<v Speaker 1>gave us std::barrier and std::latch. Yeah,

370
00:17:36.559 --> 00:17:38.480
<v Speaker 1>coordinating multiple threads. Exactly.

371
00:17:38.559 --> 00:17:40.759
<v Speaker 2>Both are for synchronizing a group of threads at a

372
00:17:40.799 --> 00:17:41.640
<v Speaker 2>specific point.

373
00:17:41.759 --> 00:17:43.799
<v Speaker 1>What's the difference, latch versus barrier?

374
00:17:43.720 --> 00:17:46.359
<v Speaker 2>A std::latch is basically a one-shot countdown.

375
00:17:46.400 --> 00:17:48.559
<v Speaker 2>You initialize it with a count; threads call count_down. When

376
00:17:48.559 --> 00:17:51.079
<v Speaker 2>the count reaches zero, any threads waiting on the latch

377
00:17:51.240 --> 00:17:54.200
<v Speaker 2>using wait are unblocked. After that, the latch is done.

378
00:17:54.240 --> 00:17:55.839
<v Speaker 2>It can't be reset.

379
00:17:55.839 --> 00:17:58.039
<v Speaker 1>One-time use. And a barrier?

380
00:17:58.160 --> 00:18:01.640
<v Speaker 2>A std::barrier is reusable. You initialize it with

381
00:18:01.680 --> 00:18:04.720
<v Speaker 2>the number of threads in the group. Each thread calls

382
00:18:04.880 --> 00:18:07.839
<v Speaker 2>arrive_and_wait. When all threads have arrived, they are

383
00:18:07.880 --> 00:18:12.519
<v Speaker 2>all unblocked simultaneously. Crucially, the barrier resets, ready for the

384
00:18:12.559 --> 00:18:15.920
<v Speaker 2>next synchronization phase. You can even run a completion function

385
00:18:16.000 --> 00:18:18.519
<v Speaker 2>when all threads arrive but before they're unblocked.

386
00:18:18.799 --> 00:18:22.359
<v Speaker 1>So latch for a single sync point, barrier for repeated

387
00:18:22.400 --> 00:18:23.440
<v Speaker 1>phases of computation.

388
00:18:24.000 --> 00:18:25.680
<v Speaker 2>That's a good way to think about it. The book

389
00:18:25.720 --> 00:18:28.759
<v Speaker 2>shows an example of barriers being used across different stages

390
00:18:28.799 --> 00:18:30.599
<v Speaker 2>where the number of workers might even change.

391
00:18:30.680 --> 00:18:33.559
<v Speaker 1>Okay, let's move up a level to tasks and futures.

392
00:18:33.640 --> 00:18:36.680
<v Speaker 1>std::async sounds really convenient for running stuff in

393
00:18:36.759 --> 00:18:37.279
<v Speaker 1>the background.

394
00:18:37.400 --> 00:18:40.079
<v Speaker 2>It is. It's a high level way to say, run

395
00:18:40.119 --> 00:18:43.119
<v Speaker 2>this function, possibly on another thread, and give me back

396
00:18:43.160 --> 00:18:44.920
<v Speaker 2>something I can use to get the result later.

397
00:18:45.079 --> 00:18:47.039
<v Speaker 1>That something is the future? Exactly.

398
00:18:47.160 --> 00:18:50.799
<v Speaker 2>std::async returns a std::future object, and it

399
00:18:50.839 --> 00:18:54.119
<v Speaker 2>handles the thread management, often using an internal

400
00:18:53.759 --> 00:18:56.799
<v Speaker 1>thread pool. You mentioned possibly on another thread, right?

401
00:18:57.160 --> 00:19:01.160
<v Speaker 2>std::async takes an optional launch policy. The default,

402
00:19:01.240 --> 00:19:04.200
<v Speaker 2>std::launch::async | std::launch::deferred,

403
00:19:04.480 --> 00:19:07.519
<v Speaker 2>usually runs it on a new thread: eager evaluation. But

404
00:19:07.559 --> 00:19:10.440
<v Speaker 2>you can specify std::launch::async to guarantee

405
00:19:10.480 --> 00:19:13.720
<v Speaker 2>a new thread, or std::launch::deferred to

406
00:19:13.759 --> 00:19:14.119
<v Speaker 2>make it

407
00:19:14.160 --> 00:19:16.759
<v Speaker 1>lazy. Lazy evaluation, meaning?

408
00:19:16.640 --> 00:19:19.519
<v Speaker 2>Meaning the function only runs when you actually call get

409
00:19:19.799 --> 00:19:22.279
<v Speaker 2>or wait on the future. It runs synchronously in the

410
00:19:22.279 --> 00:19:23.319
<v Speaker 2>thread that calls get.

411
00:19:23.400 --> 00:19:26.440
<v Speaker 1>Interesting trade-off. What about std::packaged_task? How

412
00:19:26.440 --> 00:19:26.880
<v Speaker 1>does that fit in?

413
00:19:26.920 --> 00:19:29.799
<v Speaker 2>std::packaged_task gives you more control. It

414
00:19:29.839 --> 00:19:32.119
<v Speaker 2>bundles up a function or callable with a promise, the

415
00:19:32.119 --> 00:19:34.079
<v Speaker 2>thing that will eventually hold the result. Okay. It gives

416
00:19:34.079 --> 00:19:36.519
<v Speaker 2>you back a std::future associated with that promise.

417
00:19:36.680 --> 00:19:40.279
<v Speaker 2>But crucially, the task doesn't run yet. You are responsible

418
00:19:40.359 --> 00:19:43.720
<v Speaker 2>for invoking the packaged_task object itself, maybe passing it

419
00:19:43.759 --> 00:19:45.599
<v Speaker 2>to a thread you manage, or putting it in your

420
00:19:45.640 --> 00:19:46.880
<v Speaker 2>own queue for a thread pool.

421
00:19:47.519 --> 00:19:50.440
<v Speaker 1>So async is fire-and-forget with a future back;

422
00:19:50.759 --> 00:19:53.160
<v Speaker 1>packaged_task is prepare-and-run-later.

423
00:19:53.359 --> 00:19:55.799
<v Speaker 2>That's a good way to put it. packaged_task decouples

424
00:19:55.839 --> 00:19:57.599
<v Speaker 2>defining the work from executing it.

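NOTE
Editor's sketch, not from the book: the three ways of launching work discussed
above, side by side. square and run_task_demo are hypothetical names.
```cpp
#include <cassert>
#include <future>
#include <thread>
#include <utility>
int square(int x) { return x * x; }
// Returns the sum of three results obtained through async (eager),
// async (deferred) and a hand-run packaged_task.
int run_task_demo() {
    // Eager: std::launch::async runs square as if on a new thread.
    std::future<int> eager = std::async(std::launch::async, square, 2);
    // Lazy: nothing runs until get() or wait(); then it runs in the caller.
    std::thread::id ran_on;
    std::future<int> lazy = std::async(std::launch::deferred, [&ran_on] {
        ran_on = std::this_thread::get_id();
        return square(3);
    });
    // packaged_task: take the future first, then decide where the task runs.
    std::packaged_task<int(int)> task(square);
    std::future<int> packaged = task.get_future();
    std::thread runner(std::move(task), 4);
    int total = eager.get() + lazy.get() + packaged.get();
    runner.join();
    // The deferred task executed on the thread that called get().
    assert(ran_on == std::this_thread::get_id());
    return total;  // 4 + 9 + 16
}
```
With packaged_task the caller owns the thread, so the same task object could
just as well be pushed onto a queue and run by a pool later.
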
425
00:19:57.680 --> 00:20:00.759
<v Speaker 1>And std::future itself? Yeah. Just a placeholder for

426
00:20:00.799 --> 00:20:02.039
<v Speaker 1>the result? Pretty much.

427
00:20:02.640 --> 00:20:05.599
<v Speaker 2>It represents a result that will eventually be available from

428
00:20:05.720 --> 00:20:10.240
<v Speaker 2>some asynchronous operation. You call get on it to retrieve

429
00:20:10.279 --> 00:20:13.079
<v Speaker 2>the value. Does get block? Yes. If the result isn't

430
00:20:13.119 --> 00:20:17.359
<v Speaker 2>ready yet, get blocks the calling thread until it is. Also, importantly,

431
00:20:17.680 --> 00:20:21.160
<v Speaker 2>you can typically only call get once on a regular

432
00:20:21.240 --> 00:20:22.160
<v Speaker 2>std::future.

433
00:20:22.480 --> 00:20:25.799
<v Speaker 1>The result is moved out only once. What if multiple

434
00:20:25.799 --> 00:20:26.839
<v Speaker 1>threads need the result?

435
00:20:27.039 --> 00:20:30.039
<v Speaker 2>Ah, that's where std::shared_future comes in. You

436
00:20:30.039 --> 00:20:32.640
<v Speaker 2>can create a shared_future from a std::future,

437
00:20:32.680 --> 00:20:36.039
<v Speaker 2>which consumes the original future. Copies of the shared_future

438
00:20:36.039 --> 00:20:38.240
<v Speaker 2>can then be given to multiple threads, and they can

439
00:20:38.279 --> 00:20:40.480
<v Speaker 2>all call get to retrieve a copy of the result

440
00:20:40.480 --> 00:20:41.200
<v Speaker 2>once it's ready.

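NOTE
Editor's sketch, not from the book: several threads reading one result through
copies of a shared_future. sum_seen_by_readers is a hypothetical name, and a
std::promise stands in for whatever asynchronous operation produces the value.
```cpp
#include <cassert>
#include <future>
#include <thread>
#include <vector>
// Three readers each call get() on their own copy of the shared_future.
// Returns the sum of the values they observed.
int sum_seen_by_readers() {
    std::promise<int> source;
    std::future<int> single = source.get_future();
    std::shared_future<int> result = single.share();
    assert(!single.valid());  // share() consumed the original future
    std::vector<int> seen(3, 0);
    std::vector<std::thread> readers;
    for (int i = 0; i < 3; ++i) {
        // Each lambda holds its own copy; get() blocks until the value is set.
        readers.emplace_back([result, &seen, i] { seen[i] = result.get(); });
    }
    source.set_value(7);
    for (auto& t : readers) t.join();
    return seen[0] + seen[1] + seen[2];
}
```
Each reader works on its own shared_future copy, which is the usage the
standard library intends for access from multiple threads.
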
441
00:20:41.279 --> 00:20:44.519
<v Speaker 1>Okay, so future for single result retrieval, shared_future for

442
00:20:44.640 --> 00:20:49.000
<v Speaker 1>multiple. Correct. How do futures compare to condition variables for synchronization?

443
00:20:49.359 --> 00:20:50.680
<v Speaker 1>The book mentioned a comparison.

444
00:20:51.119 --> 00:20:54.359
<v Speaker 2>They serve different purposes, mostly. Condition variables are more general

445
00:20:54.359 --> 00:20:58.640
<v Speaker 2>purpose for complex waiting logic, maybe involving multiple conditions or

446
00:20:58.680 --> 00:21:02.359
<v Speaker 2>repeated signaling. Futures are primarily designed for getting a single

447
00:21:02.400 --> 00:21:03.920
<v Speaker 2>result back from a one off task.

448
00:21:04.319 --> 00:21:06.880
<v Speaker 1>So futures are simpler for the get-a-result case.

449
00:21:07.279 --> 00:21:10.839
<v Speaker 2>Often, yes. They bundle the data transmission, the result or

450
00:21:10.880 --> 00:21:14.880
<v Speaker 2>exception, with the synchronization. With condition variables, you manage the

451
00:21:14.880 --> 00:21:17.839
<v Speaker 2>shared data and locking separately, which can be more error

452
00:21:17.880 --> 00:21:21.160
<v Speaker 2>prone if not done carefully. Tasks and futures are often less

453
00:21:21.160 --> 00:21:23.240
<v Speaker 2>susceptible to issues like lost wakeups.

454
00:21:23.279 --> 00:21:25.240
<v Speaker 1>Right, that makes sense. Let's switch gears to the

455
00:21:25.240 --> 00:21:28.599
<v Speaker 1>parallel algorithms in the STL. Since C++17,

456
00:21:28.799 --> 00:21:30.160
<v Speaker 1>many of them can run in parallel.

457
00:21:30.319 --> 00:21:34.640
<v Speaker 2>Yes, a huge addition. Many standard algorithms, like for_each, transform,

458
00:21:34.720 --> 00:21:37.640
<v Speaker 2>reduce, sort, et cetera, now have overloads that take an

459
00:21:37.759 --> 00:21:39.759
<v Speaker 2>execution policy as the first argument.

460
00:21:39.960 --> 00:21:41.680
<v Speaker 1>Execution policy, what are the options?

461
00:21:41.799 --> 00:21:45.000
<v Speaker 2>The main ones are std::execution::seq for

462
00:21:45.079 --> 00:21:49.359
<v Speaker 2>sequential execution, the old default; std::execution::par

463
00:21:49.400 --> 00:21:53.599
<v Speaker 2>for parallel execution on multiple threads; and

464
00:21:53.720 --> 00:21:57.279
<v Speaker 2>std::execution::par_unseq for parallel and potentially vectorized execution.

465
00:21:57.480 --> 00:21:59.400
<v Speaker 1>Vectorized, like SIMD?

466
00:21:59.359 --> 00:22:02.319
<v Speaker 2>Exactly. par_unseq gives the implementation the most freedom. It can

467
00:22:02.400 --> 00:22:04.759
<v Speaker 2>run chunks in parallel, and within each thread, it can

468
00:22:04.799 --> 00:22:08.279
<v Speaker 2>reorder or interleave operations on different elements, often to take

469
00:22:08.279 --> 00:22:11.279
<v Speaker 2>advantage of SIMD instructions if the hardware supports

470
00:22:10.920 --> 00:22:14.160
<v Speaker 1>it. So potentially the fastest, but maybe harder for the

471
00:22:14.160 --> 00:22:16.880
<v Speaker 1>programmer to reason about if side effects are involved.

472
00:22:16.920 --> 00:22:20.880
<v Speaker 2>Precisely. par_unseq demands more care regarding thread safety and lack

473
00:22:20.920 --> 00:22:24.839
<v Speaker 2>of dependencies between element operations. The book shows for_each

474
00:22:24.880 --> 00:22:27.920
<v Speaker 2>with par_unseq using an atomic counter, which is safe.

475
00:22:28.000 --> 00:22:32.160
<v Speaker 1>What about algorithms like std::reduce or

476
00:22:32.160 --> 00:22:35.640
<v Speaker 1>std::transform_reduce? Any special rules for parallelizing

477
00:22:35.039 --> 00:22:38.440
<v Speaker 2>them? Yes, a very important one. The operation you provide,

478
00:22:38.680 --> 00:22:42.160
<v Speaker 2>addition for reduce or multiplication and addition for transform_reduce,

479
00:22:42.519 --> 00:22:45.960
<v Speaker 2>must be associative and commutative for the parallel versions to

480
00:22:46.039 --> 00:22:48.519
<v Speaker 2>guarantee the same result as the sequential one.

481
00:22:48.599 --> 00:22:51.799
<v Speaker 1>Associative and commutative, like addition: (a + b) + c

482
00:22:52.000 --> 00:22:54.160
<v Speaker 1>equals a + (b + c), and a + b equals b

483
00:22:54.279 --> 00:22:55.759
<v Speaker 1>+ a. Exactly.

484
00:22:56.200 --> 00:22:59.079
<v Speaker 2>If your operation doesn't have those properties, the result might

485
00:22:59.160 --> 00:23:03.079
<v Speaker 2>differ depending on how the parallel execution chunks and combines

486
00:23:03.119 --> 00:23:06.960
<v Speaker 2>the data. Floating-point addition, strictly speaking, isn't associative, which

487
00:23:06.960 --> 00:23:08.759
<v Speaker 2>can sometimes cause tiny differences.

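NOTE
Editor's sketch, not from the book: why grouping matters for floating-point
sums, and why it does not for integers. The function names and the sample
values are hypothetical; the calls assume IEEE 754 doubles.
```cpp
#include <cassert>
#include <numeric>
#include <vector>
// Two groupings of the same three-term sum.
double left_to_right(double a, double b, double c) { return (a + b) + c; }
double right_to_left(double a, double b, double c) { return a + (b + c); }
// std::reduce is free to group and reorder the terms. Integer addition is
// associative and commutative, so the answer is the same for any grouping
// (as long as it does not overflow).
long long reduce_ints(const std::vector<long long>& v) {
    return std::reduce(v.begin(), v.end(), 0LL);
}
// With a = 1e20, b = -1e20, c = 1.0:
//   (a + b) + c  is  0.0 + 1.0     = 1.0
//   a + (b + c)  is  1e20 + -1e20  = 0.0, because -1e20 + 1.0 rounds to -1e20
```
A parallel reduce over doubles may therefore differ slightly from a sequential
left-to-right sum, depending on how the range happens to be chunked.
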
488
00:23:08.960 --> 00:23:13.319
<v Speaker 1>Good point. Are these policies just hints or guarantees of parallelism? They're

489
00:23:13.160 --> 00:23:17.240
<v Speaker 2>more like permission slips or strong hints.

490
00:23:17.359 --> 00:23:21.960
<v Speaker 2>std::execution::par allows parallel execution, but the library implementation might

491
00:23:22.000 --> 00:23:25.000
<v Speaker 2>decide to run it sequentially if it thinks that's faster, especially

492
00:23:25.039 --> 00:23:28.039
<v Speaker 2>for very small ranges, it's not a strict guarantee of

493
00:23:28.160 --> 00:23:29.240
<v Speaker 2>N threads being used.

494
00:23:29.559 --> 00:23:32.319
<v Speaker 1>Does the book give any performance numbers? Is this speed

495
00:23:32.400 --> 00:23:32.799
<v Speaker 1>up real?

496
00:23:33.119 --> 00:23:36.240
<v Speaker 2>Yes, it shows a test case calculating tangents. The par

497
00:23:36.440 --> 00:23:40.480
<v Speaker 2>version on their quad-core machine was significantly faster than seq,

498
00:23:40.559 --> 00:23:43.240
<v Speaker 2>close to a 4x speedup. The par_unseq version

499
00:23:43.359 --> 00:23:45.599
<v Speaker 2>was similar to par in that specific test, but your

500
00:23:45.640 --> 00:23:49.759
<v Speaker 2>mileage may vary. Absolutely. Performance depends heavily on the hardware,

501
00:23:49.839 --> 00:23:52.839
<v Speaker 2>the compiler, the specific algorithm, the data size, and the

502
00:23:52.880 --> 00:23:56.160
<v Speaker 2>operation being performed. Always benchmark your own code.

503
00:23:56.160 --> 00:23:59.519
<v Speaker 1>Sound advice. Okay, let's tackle a really modern feature:

504
00:23:59.680 --> 00:24:02.880
<v Speaker 1>C++20 coroutines. What's the fundamental difference from a

505
00:24:02.920 --> 00:24:03.640
<v Speaker 1>regular function.

506
00:24:03.880 --> 00:24:08.480
<v Speaker 2>The key idea is that they are resumable functions. Stackless, specifically.

507
00:24:07.960 --> 00:24:09.680
<v Speaker 1>Resumable, meaning they can pause and

508
00:24:09.640 --> 00:24:13.160
<v Speaker 2>continue later? Exactly. A regular function runs from start to

509
00:24:13.200 --> 00:24:16.400
<v Speaker 2>finish in one go. A coroutine can execute a bit,

510
00:24:16.759 --> 00:24:20.559
<v Speaker 2>then co_await some operation or co_yield a value, which suspends

511
00:24:20.599 --> 00:24:24.640
<v Speaker 2>its execution. Later, something can resume the coroutine and

512
00:24:24.680 --> 00:24:27.200
<v Speaker 2>it picks up right where it left off, with all

513
00:24:27.240 --> 00:24:28.880
<v Speaker 2>its local variables intact.

514
00:24:29.079 --> 00:24:31.799
<v Speaker 1>So the state is saved somewhere, not just on the stack.

515
00:24:32.240 --> 00:24:36.920
<v Speaker 2>Right. The state (local variables, suspension point) is typically allocated

516
00:24:36.960 --> 00:24:39.480
<v Speaker 2>on the heap or in a coroutine frame managed by

517
00:24:39.480 --> 00:24:42.119
<v Speaker 2>the compiler, not just the traditional call stack. That's the

518
00:24:42.160 --> 00:24:43.039
<v Speaker 2>stackless part.

519
00:24:43.200 --> 00:24:45.319
<v Speaker 1>What makes a function become a coroutine? Is there

520
00:24:45.319 --> 00:24:46.279
<v Speaker 1>a special keyword?

521
00:24:46.519 --> 00:24:49.119
<v Speaker 2>It becomes a coroutine if its body uses any of

522
00:24:49.160 --> 00:24:52.880
<v Speaker 2>the three coroutine keywords: co_return to return a value

523
00:24:52.880 --> 00:24:56.920
<v Speaker 2>and finish; co_await to suspend and wait for something;

524
00:24:57.319 --> 00:25:00.880
<v Speaker 2>or co_yield to produce a value in a generator-like sequence.

525
00:25:01.279 --> 00:25:03.960
<v Speaker 2>Even a range-based for loop using co_await makes

526
00:25:04.000 --> 00:25:04.720
<v Speaker 2>it a coroutine.

527
00:25:04.799 --> 00:25:08.759
<v Speaker 1>Okay, co_return, co_await, co_yield. The book mentions handles,

528
00:25:08.839 --> 00:25:12.240
<v Speaker 1>suspend points, awaitables. Sounds like the machinery behind it. It is.

529
00:25:12.319 --> 00:25:13.799
<v Speaker 2>Let's break it down quickly. Go for it.

530
00:25:13.839 --> 00:25:15.279
<v Speaker 1>Coroutine handle? That's

531
00:25:15.119 --> 00:25:17.799
<v Speaker 2>your remote control for the coroutine. An object you

532
00:25:17.799 --> 00:25:20.400
<v Speaker 2>can use to resume it, destroy its state, or check

533
00:25:20.400 --> 00:25:21.079
<v Speaker 2>if it's done.

534
00:25:21.200 --> 00:25:23.319
<v Speaker 1>Initial and final suspend points?

535
00:25:23.119 --> 00:25:26.839
<v Speaker 2>Every coroutine has a promise object associated with it. This

536
00:25:26.920 --> 00:25:31.079
<v Speaker 2>promise defines initial_suspend and final_suspend. These return special

537
00:25:31.160 --> 00:25:35.000
<v Speaker 2>awaitable objects, like std::suspend_always or

538
00:25:35.039 --> 00:25:39.400
<v Speaker 2>std::suspend_never, that control whether the coroutine suspends immediately when called,

539
00:25:39.599 --> 00:25:41.960
<v Speaker 2>and whether it suspends when it finishes via co_return

540
00:25:42.079 --> 00:25:43.079
<v Speaker 2>or falling off the end.

541
00:25:43.279 --> 00:25:46.119
<v Speaker 1>So you can have a coroutine start suspended or run until

542
00:25:45.839 --> 00:25:48.319
<v Speaker 2>the first co_await? Exactly. And you can control if

543
00:25:48.319 --> 00:25:51.079
<v Speaker 2>it cleans itself up automatically when done, or waits to

544
00:25:51.119 --> 00:25:52.119
<v Speaker 2>be destroyed

545
00:25:51.759 --> 00:25:55.200
<v Speaker 1>via its handle. And awaitables and awaiters? That's for co_await, right?

546
00:25:55.319 --> 00:25:58.599
<v Speaker 2>When you co_await something, that something has to be an awaitable.

547
00:25:58.960 --> 00:26:02.519
<v Speaker 2>The compiler calls three key methods on the corresponding

548
00:26:02.599 --> 00:26:07.039
<v Speaker 2>awaiter object, often the awaitable itself. await_ready checks

549
00:26:07.039 --> 00:26:10.519
<v Speaker 2>if suspension is even needed. If not, it continues. If

550
00:26:10.519 --> 00:26:14.440
<v Speaker 2>suspension is needed, await_suspend is called, which suspends

551
00:26:14.440 --> 00:26:17.440
<v Speaker 2>the coroutine and can schedule it for resumption later. When

552
00:26:17.480 --> 00:26:20.319
<v Speaker 2>it's time to resume, await_resume is called and

553
00:26:20.359 --> 00:26:23.640
<v Speaker 2>its return value becomes the result of the co_await expression.

554
00:26:23.720 --> 00:26:26.240
<v Speaker 1>Okay, that's the core mechanism. The book had examples. One

555
00:26:26.279 --> 00:26:29.200
<v Speaker 1>preparing a job, another using an event coroutine.

556
00:26:29.359 --> 00:26:32.039
<v Speaker 2>Yeah, the job example was basic, showing the structure even

557
00:26:32.079 --> 00:26:35.200
<v Speaker 2>if it didn't suspend much initially. The event example was

558
00:26:35.240 --> 00:26:36.720
<v Speaker 2>more interesting for synchronization.

559
00:26:36.920 --> 00:26:37.759
<v Speaker 1>How did the event work?

560
00:26:38.079 --> 00:26:41.039
<v Speaker 2>It was a coroutine helper. You could co_await an event

561
00:26:41.119 --> 00:26:45.359
<v Speaker 2>object; that coroutine would suspend. Elsewhere, code could call notify

562
00:26:45.599 --> 00:26:48.359
<v Speaker 2>on the event, which would resume the waiting coroutine.

563
00:26:48.799 --> 00:26:52.000
<v Speaker 2>It's a way to build synchronization primitives using the coroutine

564
00:26:52.079 --> 00:26:53.359
<v Speaker 2>machinery itself.

565
00:26:53.720 --> 00:26:56.680
<v Speaker 1>Like a condition variable, but maybe fitting more naturally into

566
00:26:56.680 --> 00:26:57.559
<v Speaker 1>async code flow.

567
00:26:57.880 --> 00:27:00.480
<v Speaker 2>Kind of. Yeah, it shows how coroutines can help manage

568
00:27:00.519 --> 00:27:01.480
<v Speaker 2>asynchronous waiting.

569
00:27:01.799 --> 00:27:04.759
<v Speaker 1>Let's look at the case studies. The book compared summing

570
00:27:04.839 --> 00:27:08.200
<v Speaker 1>numbers in different ways. What was the fastest concurrent approach?

571
00:27:08.680 --> 00:27:13.160
<v Speaker 2>Right. They compared single-threaded, a mutex-protected shared sum, an atomic shared

572
00:27:13.200 --> 00:27:17.119
<v Speaker 2>sum with different orderings, and finally a local sum approach.

573
00:27:17.480 --> 00:27:21.119
<v Speaker 2>And the winner was? By far, the best concurrent performance

574
00:27:21.160 --> 00:27:23.960
<v Speaker 2>came from having each thread calculate a sum for its

575
00:27:24.000 --> 00:27:27.480
<v Speaker 2>own portion of the data into a local non shared variable.

576
00:27:28.119 --> 00:27:31.720
<v Speaker 2>Then only at the very end, each thread atomically adds

577
00:27:31.759 --> 00:27:34.240
<v Speaker 2>its local sum to the final shared result

578
00:27:34.000 --> 00:27:38.480
<v Speaker 1>variable. So minimize the shared operations, do most work locally. Exactly.

579
00:27:38.640 --> 00:27:41.160
<v Speaker 2>Contention on the mutex or even the atomic variable in

580
00:27:41.200 --> 00:27:45.119
<v Speaker 2>the other approaches really killed performance. Locking or atomic ops

581
00:27:45.160 --> 00:27:47.920
<v Speaker 2>on every single addition was very slow compared to the

582
00:27:47.920 --> 00:27:48.839
<v Speaker 2>local accumulation.

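NOTE
Editor's sketch, not from the book's listing: the local-sum approach described
above. parallel_sum and its parameters are hypothetical; thread_count is
assumed to be greater than zero.
```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>
// Each thread sums its own slice into a local variable and touches the shared
// atomic exactly once, at the very end.
long long parallel_sum(const std::vector<int>& data, std::size_t thread_count) {
    std::atomic<long long> total{0};
    std::vector<std::thread> threads;
    const std::size_t chunk = (data.size() + thread_count - 1) / thread_count;
    for (std::size_t t = 0; t < thread_count; ++t) {
        const std::size_t begin = std::min(t * chunk, data.size());
        const std::size_t end = std::min(begin + chunk, data.size());
        threads.emplace_back([&data, &total, begin, end] {
            const auto first = data.begin() + static_cast<std::ptrdiff_t>(begin);
            const auto last = data.begin() + static_cast<std::ptrdiff_t>(end);
            long long local = std::accumulate(first, last, 0LL);  // no sharing here
            total.fetch_add(local, std::memory_order_relaxed);    // one shared op per thread
        });
    }
    for (auto& th : threads) th.join();  // join makes every fetch_add visible
    return total.load();
}
```
Relaxed ordering is enough for the final add in this sketch because the joins
already order each thread's work before the load.
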
583
00:27:49.200 --> 00:27:52.079
<v Speaker 1>Makes sense. The dining philosophers problem also came up. What

584
00:27:52.200 --> 00:27:54.519
<v Speaker 1>classic concurrency issues does that highlight?

585
00:27:54.799 --> 00:27:58.200
<v Speaker 2>Oh, dining philosophers is the poster child for deadlock. It

586
00:27:58.240 --> 00:28:02.799
<v Speaker 2>perfectly illustrates how multiple actors (philosophers) competing for multiple shared

587
00:28:02.839 --> 00:28:06.720
<v Speaker 2>resources (forks) can easily get into a state where none

588
00:28:06.759 --> 00:28:09.759
<v Speaker 2>can proceed because they're all waiting for a resource held by

589
00:28:09.680 --> 00:28:12.039
<v Speaker 1>another. The circular wait. Exactly.

590
00:28:12.519 --> 00:28:15.720
<v Speaker 2>The book uses it to show how flawed synchronization attempts

591
00:28:15.759 --> 00:28:18.720
<v Speaker 2>can lead to deadlock or maybe livelock, where they're busy

592
00:28:18.799 --> 00:28:22.200
<v Speaker 2>trying but making no progress, and it shows solutions like

593
00:28:22.400 --> 00:28:26.119
<v Speaker 2>establishing a strict ordering for acquiring the resources: always pick

594
00:28:26.200 --> 00:28:29.000
<v Speaker 2>up the lower-numbered fork first, for instance, to break

595
00:28:29.039 --> 00:28:30.359
<v Speaker 2>that circular dependency.

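NOTE
Editor's sketch, not the book's listing: resource ordering applied to five
philosophers. dine, seats and the other names are hypothetical.
```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>
// Five philosophers each eat `rounds` times. Returns the total number of meals.
int dine(int rounds) {
    constexpr int seats = 5;
    std::array<std::mutex, seats> forks;
    std::array<int, seats> meals{};  // each philosopher writes only its own slot
    std::vector<std::thread> philosophers;
    for (int p = 0; p < seats; ++p) {
        philosophers.emplace_back([&forks, &meals, p, rounds] {
            const int left = p;
            const int right = (p + 1) % seats;
            // Resource ordering: always lock the lower-numbered fork first, so
            // a cycle of philosophers waiting on each other cannot form.
            const int first = std::min(left, right);
            const int second = std::max(left, right);
            for (int r = 0; r < rounds; ++r) {
                std::lock_guard<std::mutex> a(forks[first]);
                std::lock_guard<std::mutex> b(forks[second]);
                ++meals[p];
            }
        });
    }
    for (auto& t : philosophers) t.join();
    int total = 0;
    for (int m : meals) total += m;
    return total;
}
```
Only the last philosopher ends up reaching for the right-hand fork first, and
that single asymmetry is what breaks the circular wait.
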
596
00:28:30.599 --> 00:28:34.440
<v Speaker 1>So resource ordering is a key deadlock prevention technique.

597
00:28:34.559 --> 00:28:36.559
<v Speaker 2>One of the most common and effective

598
00:28:36.079 --> 00:28:41.559
<v Speaker 1>ones. Singleton initialization again: lock-based, double-checked locking, Meyers singleton.

599
00:28:41.759 --> 00:28:42.559
<v Speaker 1>What's the verdict?

600
00:28:42.759 --> 00:28:46.799
<v Speaker 2>Lock-based is simple but potentially slow under contention.

601
00:28:46.880 --> 00:28:50.279
<v Speaker 2>Double-checked locking tries to optimize by checking first before locking,

602
00:28:50.599 --> 00:28:53.400
<v Speaker 2>but it's notoriously hard to get right in

603
00:28:53.440 --> 00:28:56.839
<v Speaker 2>C++ without hitting subtle memory ordering bugs. Avoid it unless

604
00:28:56.839 --> 00:28:57.839
<v Speaker 2>you really know what you're doing.

605
00:28:58.119 --> 00:29:01.680
<v Speaker 1>So Meyers singleton, the static local variable?

606
00:29:01.440 --> 00:29:03.559
<v Speaker 2>That's generally the way to go in modern

607
00:29:03.559 --> 00:29:06.960
<v Speaker 2>C++: static T instance; return instance. Since C++11,

608
00:29:07.079 --> 00:29:11.079
<v Speaker 2>the language guarantees this is thread-safe and efficient. Simple, correct,

609
00:29:11.240 --> 00:29:12.200
<v Speaker 2>usually fast enough.

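NOTE
Editor's sketch, not from the book: a Meyers singleton with a counter that
shows the constructor runs once even when several threads race for it.
Registry and race_for_instance are hypothetical names.
```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>
class Registry {
public:
    static Registry& instance() {
        // Function-local static: initialized exactly once, and that
        // initialization is thread-safe since C++11.
        static Registry the_instance;
        return the_instance;
    }
    Registry(const Registry&) = delete;
    Registry& operator=(const Registry&) = delete;
    static int constructions() { return construction_count.load(); }
private:
    Registry() { ++construction_count; }
    static inline std::atomic<int> construction_count{0};
};
// Eight threads race to obtain the instance. Returns how many times the
// constructor ran.
int race_for_instance() {
    std::vector<std::thread> threads;
    for (int i = 0; i < 8; ++i) {
        threads.emplace_back([] { Registry::instance(); });
    }
    for (auto& t : threads) t.join();
    return Registry::constructions();
}
```
No explicit mutex or once-flag appears in the class; the compiler-generated
guard around the static local does that work.
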
610
00:29:12.559 --> 00:29:17.200
<v Speaker 1>Good takeaway. Yeah, prefer the Meyers singleton. CppMem was used again

611
00:29:17.240 --> 00:29:19.319
<v Speaker 1>to look at memory ordering and data races.

612
00:29:19.559 --> 00:29:24.599
<v Speaker 2>Yes, analyzing small examples. It visually reinforces how, without proper

613
00:29:24.640 --> 00:29:29.279
<v Speaker 2>synchronization like mutexes or acquire-release semantics, writes in one

614
00:29:29.359 --> 00:29:32.160
<v Speaker 2>thread are simply not guaranteed to be visible to reads

615
00:29:32.200 --> 00:29:35.039
<v Speaker 2>in another thread if they access the same non-atomic

616
00:29:35.160 --> 00:29:39.000
<v Speaker 2>memory location. It makes the abstract memory ordering rules much

617
00:29:39.039 --> 00:29:39.759
<v Speaker 2>more concrete.

618
00:29:40.039 --> 00:29:42.200
<v Speaker 1>Seeing is believing, basically. Pretty much.

619
00:29:42.279 --> 00:29:45.240
<v Speaker 2>It helps you spot potential data races you might otherwise miss.

620
00:29:45.319 --> 00:29:49.319
<v Speaker 1>There was also a comparison: condition variables versus atomic flags

621
00:29:49.319 --> 00:29:51.079
<v Speaker 1>for synchronization between two threads.

622
00:29:51.319 --> 00:29:54.640
<v Speaker 2>What was faster? In the specific tests shown, which involved

623
00:29:54.680 --> 00:29:58.559
<v Speaker 2>repeated ping-pong synchronization, using atomic flags like

624
00:29:58.559 --> 00:30:01.440
<v Speaker 2>std::atomic_flag or std::atomic_bool for the

625
00:30:01.480 --> 00:30:04.640
<v Speaker 2>signaling was found to be faster than using condition variables

626
00:30:04.640 --> 00:30:05.279
<v Speaker 2>and mutexes.

627
00:30:05.319 --> 00:30:07.559
<v Speaker 1>Why would that be? It's likely due to overhead.

628
00:30:08.119 --> 00:30:12.079
<v Speaker 2>Atomic operations, especially on flags, can often be implemented very

629
00:30:12.079 --> 00:30:16.319
<v Speaker 2>efficiently by the processor, sometimes without involving the operating system kernel.

630
00:30:16.839 --> 00:30:20.759
<v Speaker 2>Condition variables and mutexes usually involve system calls for blocking

631
00:30:20.799 --> 00:30:23.440
<v Speaker 2>and waking threads, which adds more overhead.

632
00:30:24.119 --> 00:30:27.519
<v Speaker 1>So for simple signaling, atomics might be quicker, but condition

633
00:30:27.640 --> 00:30:28.680
<v Speaker 1>variables are more general.

634
00:30:28.759 --> 00:30:29.839
<v Speaker 2>That's a reasonable summary.

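NOTE
Editor's sketch, not the book's benchmark: ping-pong signaling between two
threads using a single atomic flag. ping_pong and the spin-with-yield loop are
hypothetical choices; no timing is measured here.
```cpp
#include <atomic>
#include <cassert>
#include <thread>
// Two threads alternate `rounds` times each. Returns the number of turns taken.
int ping_pong(int rounds) {
    std::atomic<bool> ping_turn{true};
    int exchanges = 0;  // plain int: the turn-taking protocol orders all accesses
    std::thread ping([&] {
        for (int i = 0; i < rounds; ++i) {
            // Spin until it is ping's turn; acquire pairs with pong's release.
            while (!ping_turn.load(std::memory_order_acquire)) std::this_thread::yield();
            ++exchanges;
            ping_turn.store(false, std::memory_order_release);  // hand over to pong
        }
    });
    std::thread pong([&] {
        for (int i = 0; i < rounds; ++i) {
            while (ping_turn.load(std::memory_order_acquire)) std::this_thread::yield();
            ++exchanges;
            ping_turn.store(true, std::memory_order_release);   // hand back to ping
        }
    });
    ping.join();
    pong.join();
    return exchanges;
}
```
The waiting thread spins instead of sleeping, which avoids blocking system calls
but burns CPU time; that is the trade-off against a condition variable.
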
635
00:30:29.920 --> 00:30:33.319
<v Speaker 1>Yeah, and the last case study bit a coroutine returning

636
00:30:33.319 --> 00:30:33.799
<v Speaker 1>a future.

637
00:30:33.960 --> 00:30:36.680
<v Speaker 2>Yes, that just showed the nice integration. You can write

638
00:30:36.680 --> 00:30:40.599
<v Speaker 2>a function as a coroutine using coweight for internal ACYNC operations,

639
00:30:41.000 --> 00:30:44.240
<v Speaker 2>and then use coreturn to provide the final value. The

640
00:30:44.279 --> 00:30:47.079
<v Speaker 2>compiler automatically hooks this up, so the corotine returns an

641
00:30:47.240 --> 00:30:51.119
<v Speaker 2>std dot future or similar awaitable type that completes with

642
00:30:51.160 --> 00:30:51.960
<v Speaker 2>the core return.

643
00:30:51.720 --> 00:30:54.400
<v Speaker 1>To value, seamlessly bridging the two models exactly.

644
00:30:54.680 --> 00:30:58.079
<v Speaker 2>Lets you use the coroutine syntax internally while still interacting with

645
00:30:58.160 --> 00:30:59.720
<v Speaker 2>other code expecting futures.

646
00:31:00.000 --> 00:31:02.559
<v Speaker 1>Okay, let's gaze into the crystal ball: C plus plus

647
00:31:02.599 --> 00:31:06.160
<v Speaker 1>twenty three and beyond. Executors are presented as a big deal.

648
00:31:06.440 --> 00:31:07.559
<v Speaker 1>What's the core concept?

649
00:31:07.920 --> 00:31:11.759
<v Speaker 2>Executors are intended to be a fundamental abstraction for how, where,

650
00:31:11.799 --> 00:31:15.640
<v Speaker 2>and when work gets done. They define the execution context.

651
00:31:15.880 --> 00:31:21.400
<v Speaker 1>Execution context, like which thread pool, or run on the GPU, or

652
00:31:21.319 --> 00:31:24.039
<v Speaker 2>inline? Potentially all of the above. The idea is to

653
00:31:24.079 --> 00:31:26.880
<v Speaker 2>separate the what, the function or task you want to run,

654
00:31:27.160 --> 00:31:29.960
<v Speaker 2>from the how or when, which is defined by the executor.

655
00:31:30.200 --> 00:31:32.400
<v Speaker 2>You'd submit your task to an executor, and the executor

656
00:31:32.440 --> 00:31:33.279
<v Speaker 2>decides how to run it.

657
00:31:33.519 --> 00:31:37.279
<v Speaker 1>So it's a unified way to handle thread pools, inline execution,

658
00:31:37.440 --> 00:31:38.440
<v Speaker 1>maybe event loops.

659
00:31:38.720 --> 00:31:42.079
<v Speaker 2>That's the goal, a standard, composable way to represent and

660
00:31:42.160 --> 00:31:46.160
<v Speaker 2>manage different execution strategies. The book sees them as foundational

661
00:31:46.240 --> 00:31:48.839
<v Speaker 2>for future concurrency libraries, networking, etc.

662
00:31:49.480 --> 00:31:51.640
<v Speaker 1>What kind of properties can these executors have?

663
00:31:51.839 --> 00:31:55.759
<v Speaker 2>The proposals discuss properties like directionality: is it fire and

664
00:31:55.799 --> 00:31:58.279
<v Speaker 2>forget (one-way)? Does it return a future (two-way)?

665
00:31:58.440 --> 00:32:01.559
<v Speaker 2>Does it support continuations (then)? Then cardinality: does it run

666
00:32:01.599 --> 00:32:05.559
<v Speaker 2>one task (single) or many (bulk)? Blocking behavior: does submitting

667
00:32:05.640 --> 00:32:08.720
<v Speaker 2>work potentially block the caller (possibly, always, never)?

668
00:32:09.000 --> 00:32:12.759
<v Speaker 1>And you could potentially query or acquire executors with certain properties.

669
00:32:12.920 --> 00:32:16.960
<v Speaker 2>That's the idea, using mechanisms like execution::require or

670
00:32:17.000 --> 00:32:20.119
<v Speaker 2>prefer to tailor the execution context to your needs.

671
00:32:20.440 --> 00:32:23.440
<v Speaker 1>How might this integrate with things like std::async

672
00:32:23.839 --> 00:32:25.160
<v Speaker 1>or parallel algorithms?

673
00:32:25.400 --> 00:32:27.400
<v Speaker 2>The vision is you'd be able to pass an executor

674
00:32:27.440 --> 00:32:30.799
<v Speaker 2>object to std::async or to the parallel algorithms

675
00:32:30.839 --> 00:32:33.319
<v Speaker 2>to control where and how they run, instead of them

676
00:32:33.400 --> 00:32:36.640
<v Speaker 2>always using some default mechanism like a hidden global thread pool.

677
00:32:37.079 --> 00:32:38.720
<v Speaker 2>More control, more flexibility.

678
00:32:38.960 --> 00:32:41.119
<v Speaker 1>What are the main design goals for executors?

679
00:32:41.720 --> 00:32:45.000
<v Speaker 2>Usability both for library writers building on them and for

680
00:32:45.039 --> 00:32:48.559
<v Speaker 2>application developers using them, composability, so you can layer and

681
00:32:48.599 --> 00:32:53.279
<v Speaker 2>combine executors, and minimality, keeping the core concepts lean and extensible.

682
00:32:53.920 --> 00:32:56.720
<v Speaker 1>You mentioned single versus bulk cardinality. What's the difference in

683
00:32:56.759 --> 00:32:57.640
<v Speaker 1>how they execute?

684
00:32:57.799 --> 00:33:00.680
<v Speaker 2>Single cardinality functions (one-way execute, two-way execute,

685
00:33:00.720 --> 00:33:03.519
<v Speaker 2>then execute) take one callable and run it once. Bulk

686
00:33:03.519 --> 00:33:06.240
<v Speaker 2>functions take a callable and a shape, like a count, and

687
00:33:06.319 --> 00:33:09.319
<v Speaker 2>run the callable multiple times, possibly in parallel, passing an

688
00:33:09.319 --> 00:33:12.799
<v Speaker 2>index or other info to each invocation. Useful for parallel

689
00:33:12.799 --> 00:33:13.880
<v Speaker 2>for style operations.

690
00:33:14.319 --> 00:33:18.599
<v Speaker 1>Got it. The book mentioned some ongoing concerns, like when_all

691
00:33:18.640 --> 00:33:21.240
<v Speaker 1>and when_any return types and blocking future destructors.

692
00:33:21.359 --> 00:33:24.799
<v Speaker 2>Yeah, those are known complexities. Combining futures with when_all

693
00:33:24.799 --> 00:33:28.359
<v Speaker 2>and when_any can lead to complicated return types, and the fact

694
00:33:28.400 --> 00:33:31.319
<v Speaker 2>that std::async can return a future that blocks

695
00:33:31.319 --> 00:33:34.079
<v Speaker 2>in its destructor if you don't get the result is

696
00:33:34.119 --> 00:33:38.279
<v Speaker 2>problematic as it can accidentally serialize your code. There are

697
00:33:38.400 --> 00:33:40.960
<v Speaker 2>active proposals trying to refine these areas.

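NOTE
A small sketch of the blocking future destructor mentioned above, under stated assumptions.
The function names discarded_futures_ms and kept_futures_ms and the 50 ms sleep are hypothetical, chosen only to make the effect measurable.
It relies on the documented behavior that a future obtained from std::async with std::launch::async waits for its task when destroyed.
```cpp
#include <chrono>
#include <future>
#include <thread>
// Each future dies at the end of its own scope, so the two tasks run one after another.
long long discarded_futures_ms() {
    using namespace std::chrono;
    auto nap = [] { std::this_thread::sleep_for(milliseconds(50)); };
    auto start = steady_clock::now();
    { auto first = std::async(std::launch::async, nap); }   // destructor blocks until the task ends
    { auto second = std::async(std::launch::async, nap); }  // starts only after the first finished
    return duration_cast<milliseconds>(steady_clock::now() - start).count();
}
// Same tasks, but both futures stay alive, so the tasks are free to overlap.
long long kept_futures_ms() {
    using namespace std::chrono;
    auto nap = [] { std::this_thread::sleep_for(milliseconds(50)); };
    auto start = steady_clock::now();
    auto first = std::async(std::launch::async, nap);
    auto second = std::async(std::launch::async, nap);
    first.get();
    second.get();
    return duration_cast<milliseconds>(steady_clock::now() - start).count();
}
```
The first function takes at least 100 ms because the work was accidentally serialized; the second typically finishes in roughly half that, depending on the scheduler.
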
698
00:33:41.519 --> 00:33:44.480
<v Speaker 1>What about synchronized and atomic blocks in C plus plus

699
00:33:44.519 --> 00:33:46.960
<v Speaker 1>twenty three? They sound related but different.

700
00:33:47.279 --> 00:33:50.200
<v Speaker 2>They are. Both aim for atomic execution of a code block.

701
00:33:50.599 --> 00:33:53.319
<v Speaker 2>Synchronized blocks are more relaxed. They act like the block

702
00:33:53.359 --> 00:33:56.480
<v Speaker 2>is guarded by a single global mutex, providing a total order.

703
00:33:56.799 --> 00:34:00.319
<v Speaker 2>They can contain things like I/O. Atomic blocks (atomic_noexcept,

704
00:34:00.359 --> 00:34:03.799
<v Speaker 2>atomic_commit, atomic_cancel) are for true transactions. They

705
00:34:03.799 --> 00:34:06.960
<v Speaker 2>have stricter rules about what they can contain (no non-transaction-safe

706
00:34:07.000 --> 00:34:10.840
<v Speaker 2>operations) and explicit handling of exceptions (commit or cancel

707
00:34:10.920 --> 00:34:11.480
<v Speaker 2>the transaction).

708
00:34:11.639 --> 00:34:15.000
<v Speaker 1>So atomic blocks are closer to database transactions?

709
00:34:14.440 --> 00:34:17.920
<v Speaker 2>Conceptually, yes, aiming for that kind of atomicity, though without

710
00:34:17.920 --> 00:34:19.039
<v Speaker 2>the durability aspect.

711
00:34:19.119 --> 00:34:21.239
<v Speaker 1>Usually. And task blocks, a fork-join model?

712
00:34:21.559 --> 00:34:26.039
<v Speaker 2>Right. Task blocks provide structured parallelism. You define a block and launch

713
00:34:26.079 --> 00:34:28.960
<v Speaker 2>subtasks within it (the fork). The thread that started the block

714
00:34:29.000 --> 00:34:31.159
<v Speaker 2>automatically waits at the end of the block until all

715
00:34:31.239 --> 00:34:35.480
<v Speaker 2>launched subtasks are complete (the join). This makes managing parallel task

716
00:34:35.559 --> 00:34:37.079
<v Speaker 2>dependencies much simpler.

717
00:34:37.239 --> 00:34:40.599
<v Speaker 1>Okay, And lastly, for C plus plus twenty three, the

718
00:34:40.719 --> 00:34:43.800
<v Speaker 1>Data Parallel Vector Library, SIMD.

719
00:34:44.280 --> 00:34:47.719
<v Speaker 2>This aims to standardize SIMD programming in C plus plus,

720
00:34:48.159 --> 00:34:51.559
<v Speaker 2>providing standard vector types that map to hardware SIMD registers

721
00:34:51.920 --> 00:34:55.519
<v Speaker 2>and operations on them. Features like masked operations (apply an

722
00:34:55.519 --> 00:34:58.239
<v Speaker 2>operation only where a condition is true) and traits to

723
00:34:58.320 --> 00:35:01.199
<v Speaker 2>query vector properties like size are part of it, making

724
00:35:01.239 --> 00:35:02.880
<v Speaker 2>SIMD more portable and accessible.

725
00:35:03.079 --> 00:35:06.320
<v Speaker 1>Lots of interesting stuff potentially coming. Let's switch to synchronization patterns,

726
00:35:06.760 --> 00:35:09.760
<v Speaker 1>architectural patterns, design patterns, idioms. What's the difference?

727
00:35:09.880 --> 00:35:14.159
<v Speaker 2>Think levels of abstraction. Architectural patterns like reactor or proactor define

728
00:35:14.159 --> 00:35:17.559
<v Speaker 2>the high level structure of a concurrent system. Design patterns

729
00:35:17.599 --> 00:35:21.960
<v Speaker 2>like active object or monitor object describe common interaction solutions between components.

730
00:35:22.199 --> 00:35:26.800
<v Speaker 2>Idioms are lower level, C plus plus specific techniques like scoped locking.

731
00:35:26.599 --> 00:35:27.880
<v Speaker 1>And using patterns helps how?

732
00:35:27.960 --> 00:35:31.639
<v Speaker 2>Gives you a shared vocabulary, makes designs clearer, lets you

733
00:35:31.760 --> 00:35:35.760
<v Speaker 2>reuse proven solutions instead of reinventing the wheel. They build

734
00:35:35.760 --> 00:35:39.840
<v Speaker 2>on best practices but are more specific named solutions to

735
00:35:39.880 --> 00:35:40.840
<v Speaker 2>recurring problems.

736
00:35:41.000 --> 00:35:43.639
<v Speaker 1>What patterns help with managing shared data?

737
00:35:43.760 --> 00:35:47.960
<v Speaker 2>The book mentions things like copying the value (avoids sharing

738
00:35:48.039 --> 00:35:52.199
<v Speaker 2>mutable state altogether, good for value types), thread-specific storage

739
00:35:52.400 --> 00:35:56.119
<v Speaker 2>(each thread gets its own copy), and using futures (share the

740
00:35:56.159 --> 00:35:57.599
<v Speaker 2>result once it's ready).

741
00:35:57.440 --> 00:36:01.639
<v Speaker 1>And patterns for handling mutation safely? Those include scoped locking,

742
00:36:02.000 --> 00:36:04.400
<v Speaker 1>using std::lock_guard or

743
00:36:04.119 --> 00:36:08.559
<v Speaker 2>std::unique_lock for RAII mutex management; strategized locking, using

744
00:36:08.599 --> 00:36:11.920
<v Speaker 2>templates or polymorphism to vary the locking strategy; thread-safe

745
00:36:11.960 --> 00:36:16.039
<v Speaker 2>interface, designing the class itself to handle internal synchronization; and guarded

746
00:36:16.079 --> 00:36:19.880
<v Speaker 2>suspension, using condition variables to wait for preconditions. And, as always,

747
00:36:19.880 --> 00:36:22.639
<v Speaker 2>the book warns about lifetimes when passing references to threads.

748
00:36:22.719 --> 00:36:25.920
<v Speaker 1>Right. The architectural patterns: active object, monitor object, half sync half

749
00:36:26.119 --> 00:36:27.840
<v Speaker 1>async, reactor, proactor.

750
00:36:28.119 --> 00:36:30.239
<v Speaker 3>Can we get a super quick idea of each? Okay,

751
00:36:30.320 --> 00:36:35.079
<v Speaker 3>quick fire. Active object decouples method call from execution, uses

752
00:36:35.119 --> 00:36:39.599
<v Speaker 3>an internal thread and message queue. Monitor object synchronizes access

753
00:36:39.599 --> 00:36:42.599
<v Speaker 3>to an object's methods, usually one lock for the whole object.

754
00:36:43.239 --> 00:36:47.239
<v Speaker 2>Half sync half async separates async tasks (e.g., I/O in

755
00:36:47.280 --> 00:36:51.440
<v Speaker 2>a thread pool) from sync processing (e.g., the main logic thread),

756
00:36:51.880 --> 00:36:55.760
<v Speaker 2>often using a queue. Reactor: a single thread waits for events,

757
00:36:56.000 --> 00:36:59.960
<v Speaker 2>synchronously dispatches to handlers. Proactor waits for completion of

758
00:37:00.039 --> 00:37:03.920
<v Speaker 2>async operations, then calls handlers; leverages async OS features.

759
00:37:04.000 --> 00:37:07.239
<v Speaker 1>Got it. The book uses Boost.Asio for a reactor example?

760
00:37:06.920 --> 00:37:09.320
<v Speaker 2>Yes, showing how that library implements the event loop and

761
00:37:09.360 --> 00:37:11.559
<v Speaker 2>handler dispatch typical of the reactor pattern.

762
00:37:11.719 --> 00:37:14.280
<v Speaker 1>Moving on to best practices, what are the absolute top

763
00:37:14.079 --> 00:37:17.800
<v Speaker 2>ones? Number one, far and away: minimize shared mutable state.

764
00:37:17.880 --> 00:37:21.679
<v Speaker 2>If data isn't shared or isn't mutable, concurrency gets vastly simpler.

765
00:37:21.760 --> 00:37:23.239
<v Speaker 1>Avoid the problem if you can.

766
00:37:23.320 --> 00:37:26.800
<v Speaker 2>Exactly. If you must have shared mutable state, ensure proper

767
00:37:26.840 --> 00:37:31.480
<v Speaker 2>synchronization: mutexes, atomics, et cetera. Minimize waiting time; Amdahl's

768
00:37:31.559 --> 00:37:34.280
<v Speaker 2>law limits speedup based on sequential parts. Use

769
00:37:34.400 --> 00:37:40.039
<v Speaker 2>immutability (const, constexpr) where possible. Use RAII for locks (std::lock_guard).

770
00:37:40.320 --> 00:37:44.199
<v Speaker 2>Don't use condition variables without predicates. Prefer higher level tools

771
00:37:44.239 --> 00:37:48.719
<v Speaker 2>(std::async, parallel algorithms) over raw threads when appropriate,

772
00:37:48.920 --> 00:37:50.320
<v Speaker 2>and understand the memory model.

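NOTE
A minimal sketch of two of the practices just listed: RAII locks and a condition variable that is always waited on with a predicate.
The function name wait_for_data and the value 42 are assumptions for illustration, not taken from the book.
```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
// Hypothetical helper: a producer publishes a value, the caller waits for it.
int wait_for_data() {
    std::mutex m;
    std::condition_variable cv;
    bool ready = false;  // the predicate state, always accessed under the mutex
    int data = 0;
    std::thread producer([&] {
        {
            std::lock_guard<std::mutex> lock(m);  // RAII: released at the end of this scope
            data = 42;
            ready = true;
        }
        cv.notify_one();
    });
    std::unique_lock<std::mutex> lock(m);
    // The predicate guards against spurious wakeups and against a notify that
    // already happened before this thread started waiting.
    cv.wait(lock, [&] { return ready; });
    int result = data;
    lock.unlock();
    producer.join();
    return result;
}
```
Without the predicate, a notification sent before the wait begins would be lost and the waiter could block forever.
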
773
00:37:50.519 --> 00:37:53.760
<v Speaker 1>Understand the memory model. It seems full circle on that one. Okay,

774
00:37:54.079 --> 00:37:59.239
<v Speaker 1>Concurrent data structures: stacks, queues. What are the challenges?

775
00:37:59.400 --> 00:38:03.119
<v Speaker 2>The main challenge is maintaining the data structure's internal consistency,

776
00:38:03.159 --> 00:38:07.079
<v Speaker 2>its invariants, when multiple threads are operating on it simultaneously.

777
00:38:07.400 --> 00:38:09.800
<v Speaker 2>A simple example is a stack. What if one thread

778
00:38:09.880 --> 00:38:11.760
<v Speaker 2>tries to pop while another is reading the top. You

779
00:38:11.800 --> 00:38:13.840
<v Speaker 2>might get inconsistent results or errors.

780
00:38:13.880 --> 00:38:15.239
<v Speaker 1>How do you fix that? Locks?

781
00:38:15.480 --> 00:38:18.039
<v Speaker 2>Locks are the first step. Coarse-grained locking (one big

782
00:38:18.079 --> 00:38:21.199
<v Speaker 2>lock for the whole structure) is simpler, but limits concurrency.

783
00:38:21.559 --> 00:38:25.360
<v Speaker 2>Fine-grained locking (multiple locks for different parts) allows more parallelism,

784
00:38:25.400 --> 00:38:26.760
<v Speaker 2>but is much harder to get right.

785
00:38:27.119 --> 00:38:29.960
<v Speaker 1>The book mentioned changing the interface sometimes.

786
00:38:29.559 --> 00:38:33.639
<v Speaker 2>Yes, like instead of separate top and pop on a stack,

787
00:38:33.760 --> 00:38:37.360
<v Speaker 2>provide a single atomic top-and-pop operation. This avoids the

788
00:38:37.440 --> 00:38:39.880
<v Speaker 2>race condition between checking the top and removing it.

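NOTE
A minimal sketch of the combined top-and-pop interface described above, using coarse-grained locking.
The class name LockedStack and the method name top_and_pop are hypothetical; returning std::optional (C++17) for the empty case is one possible design, not necessarily the book's.
```cpp
#include <mutex>
#include <optional>
#include <stack>
#include <utility>
// Hypothetical coarse-grained stack: one mutex guards every operation.
template <typename T>
class LockedStack {
public:
    void push(T value) {
        std::lock_guard<std::mutex> lock(mutex_);
        items_.push(std::move(value));
    }
    // Reads and removes the top element inside a single critical section,
    // so no other thread can slip in between the "top" and the "pop".
    std::optional<T> top_and_pop() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty()) return std::nullopt;
        T value = std::move(items_.top());
        items_.pop();
        return value;
    }
private:
    std::mutex mutex_;
    std::stack<T> items_;
};
```
With separate top() and pop() calls, two threads could both read the same top element and then pop two different ones; the merged call rules that out.
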
789
00:38:40.119 --> 00:38:41.679
<v Speaker 1>What about lock free structures?

790
00:38:41.719 --> 00:38:45.480
<v Speaker 2>That's the next level, avoiding locks entirely using atomic

791
00:38:45.519 --> 00:38:49.360
<v Speaker 2>operations like compare and swap. This can offer better performance

792
00:38:49.400 --> 00:38:53.199
<v Speaker 2>and avoids deadlock, but it's extremely complex. You run into

793
00:38:53.199 --> 00:38:57.639
<v Speaker 2>issues like the ABA problem, and memory reclamation becomes very tricky.

794
00:38:57.800 --> 00:39:01.199
<v Speaker 2>The book mentions hazard pointers as one solution. ABA is a problem

795
00:39:01.280 --> 00:39:04.960
<v Speaker 2>where a thread reads value A from a location, then computation happens, then

796
00:39:04.960 --> 00:39:07.840
<v Speaker 2>it reads A again, but in the meantime another thread

797
00:39:07.960 --> 00:39:10.519
<v Speaker 2>changed it to B and then back to A. Your

798
00:39:10.519 --> 00:39:14.320
<v Speaker 2>compare and swap might succeed, thinking nothing changed, but the underlying

799
00:39:14.400 --> 00:39:16.880
<v Speaker 2>state is different. Needs careful handling.

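NOTE
A minimal sketch of a compare-and-swap loop, limited to a lock-free push.
The names PushOnlyStack, drain_count, and concurrent_push_count are hypothetical. A concurrent pop is left out on purpose,
because that is where the ABA problem and memory reclamation described above come in. This is not the book's lock-free stack.
```cpp
#include <atomic>
#include <thread>
#include <vector>
class PushOnlyStack {
    struct Node { int value; Node* next; };
    std::atomic<Node*> head_{nullptr};
public:
    void push(int value) {
        Node* node = new Node{value, head_.load(std::memory_order_relaxed)};
        // If another thread changed head_ in the meantime, compare_exchange_weak
        // fails and stores the current head into node->next, then the loop retries.
        while (!head_.compare_exchange_weak(node->next, node,
                                            std::memory_order_release,
                                            std::memory_order_relaxed)) {}
    }
    // Not thread-safe: call only after every pushing thread has been joined.
    int drain_count() {
        int count = 0;
        Node* node = head_.exchange(nullptr, std::memory_order_acquire);
        while (node) {
            Node* next = node->next;
            delete node;
            node = next;
            ++count;
        }
        return count;
    }
};
// Hypothetical driver: several threads push concurrently, then the nodes are counted.
int concurrent_push_count(int threads, int per_thread) {
    PushOnlyStack stack;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t)
        workers.emplace_back([&stack, per_thread] {
            for (int i = 0; i < per_thread; ++i) stack.push(i);
        });
    for (auto& w : workers) w.join();
    return stack.drain_count();
}
```
No push is lost even under contention, so concurrent_push_count(4, 1000) returns 4000.
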
800
00:39:17.119 --> 00:39:20.199
<v Speaker 1>Wow, concurrent data structures sound like a deep topic on their

801
00:39:20.119 --> 00:39:22.599
<v Speaker 2>own. They really are. The book gives a taste, including

802
00:39:22.599 --> 00:39:25.480
<v Speaker 2>a lock free stack using C plus plus twenty atomic

803
00:39:25.559 --> 00:39:26.280
<v Speaker 2>smart pointers.

804
00:39:26.360 --> 00:39:29.320
<v Speaker 1>What about the time library, chrono? How does that relate?

805
00:39:29.519 --> 00:39:34.199
<v Speaker 2>Chrono is essential for measuring performance, setting timeouts, managing timed waits.

806
00:39:34.519 --> 00:39:38.239
<v Speaker 2>It provides clocks (system_clock for wall time, steady_clock

807
00:39:38.280 --> 00:39:42.960
<v Speaker 2>for intervals), time points (specific moments), and durations (intervals). Very

808
00:39:42.960 --> 00:39:44.519
<v Speaker 2>flexible for representing time

809
00:39:44.360 --> 00:39:49.519
<v Speaker 1>accurately. And atomic operations, transactional memory: any final points there?

810
00:39:50.119 --> 00:39:53.599
<v Speaker 2>The book mentions atomics should ideally be address-free (atomic

811
00:39:53.679 --> 00:39:57.360
<v Speaker 2>even across processes sharing memory), and it touches on the ACID

812
00:39:57.440 --> 00:40:03.199
<v Speaker 2>properties (atomicity, consistency, isolation, durability) for transactions, noting that C

813
00:40:03.440 --> 00:40:07.079
<v Speaker 2>plus plus transactional memory proposals focus mainly on A, C, and I,

814
00:40:07.519 --> 00:40:10.519
<v Speaker 2>with durability being less of a focus than in databases.

815
00:40:10.559 --> 00:40:12.800
<v Speaker 1>And finally, a glossary. That sounds useful.

816
00:40:13.079 --> 00:40:17.280
<v Speaker 2>Very. Concurrency has a lot of specific terminology: acquire, release, data race, deadlock,

817
00:40:17.320 --> 00:40:21.320
<v Speaker 2>sequential consistency, et cetera. The glossary helps nail down those definitions.

818
00:40:21.400 --> 00:40:23.639
<v Speaker 1>Wow, Okay, we have definitely covered a lot of ground

819
00:40:23.679 --> 00:40:26.559
<v Speaker 1>there based on the material. Real deep dive into modern

820
00:40:26.599 --> 00:40:27.639
<v Speaker 1>C plus plus

821
00:40:27.400 --> 00:40:31.880
<v Speaker 2>concurrency. Absolutely. From the memory model's tricky foundations right up

822
00:40:31.880 --> 00:40:35.280
<v Speaker 2>to coroutines, parallel algorithms, and a peek at what's coming

823
00:40:35.320 --> 00:40:39.199
<v Speaker 2>with executors. C plus plus provides a pretty powerful set

824
00:40:39.239 --> 00:40:39.840
<v Speaker 2>of tools.

825
00:40:39.960 --> 00:40:43.280
<v Speaker 1>So for you listening in, I think the big takeaway

826
00:40:43.360 --> 00:40:46.880
<v Speaker 1>is that concurrency really forces a shift in thinking compared

827
00:40:46.880 --> 00:40:52.519
<v Speaker 1>to sequential code. Timing, interaction, ordering: it all becomes critical. Right.

828
00:40:52.920 --> 00:40:55.760
<v Speaker 2>It unlocks performance potential, but also opens up whole new

829
00:40:55.800 --> 00:40:58.840
<v Speaker 2>categories of bugs like data races and deadlocks if you're

830
00:40:58.880 --> 00:40:59.400
<v Speaker 2>not careful.

831
00:41:00.320 --> 00:41:02.920
<v Speaker 1>Vigilance is needed, and this is an evolving field, right.

832
00:41:03.000 --> 00:41:06.639
<v Speaker 2>Definitely, the C plus plus standard keeps adding features, best

833
00:41:06.639 --> 00:41:09.719
<v Speaker 2>practices emerge, So we'd encourage you to maybe pick an

834
00:41:09.719 --> 00:41:12.599
<v Speaker 2>area that caught your interest today and dig deeper. Try

835
00:41:12.599 --> 00:41:15.599
<v Speaker 2>out the parallel algorithms, maybe write a small program using

836
00:41:15.679 --> 00:41:19.239
<v Speaker 2>std::jthread. When C plus plus twenty three features become available,

837
00:41:19.360 --> 00:41:21.480
<v Speaker 2>experiment with executors.

838
00:41:21.000 --> 00:41:23.880
<v Speaker 1>Or even, if you're feeling brave, try implementing a simple

839
00:41:23.960 --> 00:41:27.199
<v Speaker 1>lock-free structure just to appreciate the complexity. Exactly.

840
00:41:27.440 --> 00:41:30.039
<v Speaker 2>The more you work with it, the better your intuition becomes.

841
00:41:30.320 --> 00:41:33.079
<v Speaker 1>Ultimately, understanding these concepts puts you in a much

842
00:41:33.119 --> 00:41:36.599
<v Speaker 1>stronger position to build modern software. Being able to reason

843
00:41:36.599 --> 00:41:41.480
<v Speaker 1>about concurrency is just... it's becoming a non-negotiable skill

844
00:41:41.480 --> 00:41:42.559
<v Speaker 1>in our multi core world.

845
00:41:42.719 --> 00:41:43.599
<v Speaker 2>Couldn't agree more.

846
00:41:44.400 --> 00:41:46.000
<v Speaker 1>Thanks for joining us on this deep dive.
