WEBVTT 1 00:00:00.120 --> 00:00:02.799 Welcome to the deep dive. We've got some really interesting 2 00:00:02.799 --> 00:00:06.320 material you sent over on concurrency in modern C++. 3 00:00:06.360 --> 00:00:09.240 Looks like it's mostly drawing from the book 4 00:00:09.400 --> 00:00:11.480 Concurrency with Modern C++. 5 00:00:11.560 --> 00:00:15.839 That's right. And our goal today is, well, to unpack 6 00:00:15.880 --> 00:00:19.600 the core ideas, maybe find some of those aha moments 7 00:00:19.679 --> 00:00:19.960 for you. 8 00:00:20.239 --> 00:00:23.359 Yeah, make this whole complex topic a bit more accessible 9 00:00:23.600 --> 00:00:25.760 without drowning in the jargon. Exactly. 10 00:00:25.920 --> 00:00:27.839 And you know, the book itself kind of hints at 11 00:00:27.879 --> 00:00:30.719 the challenge. It mentions how the C++ memory 12 00:00:30.760 --> 00:00:33.159 model often runs counter to our intuition. 13 00:00:33.640 --> 00:00:37.000 Oh, interesting. So that's our mission then, to navigate that 14 00:00:37.039 --> 00:00:40.079 complexity and pull out the essentials you need for writing, 15 00:00:40.320 --> 00:00:42.399 you know, solid concurrent 16 00:00:41.920 --> 00:00:45.920 C++ code. Precisely. Efficient, dependable code. That's the end. 17 00:00:46.039 --> 00:00:49.159 Okay, let's get started then, right at the foundation: the 18 00:00:49.200 --> 00:00:51.799 memory model. Yeah. In simple terms, what is it we're 19 00:00:51.840 --> 00:00:53.200 trying to wrap our heads around here? 20 00:00:53.280 --> 00:00:56.240 Well, think of the memory model as like the official 21 00:00:56.320 --> 00:00:59.560 rule book. It dictates how different threads in your program 22 00:00:59.679 --> 00:01:01.679 see and interact with the computer's memory. 23 00:01:01.759 --> 00:01:03.719 Okay, rules for memory interaction. 24 00:01:03.439 --> 00:01:07.079 And from a concurrency angle, two basic questions pop up.
First, 25 00:01:07.920 --> 00:01:11.519 what counts as a single place in memory, a memory location? 26 00:01:11.719 --> 00:01:14.040 Right, is it a byte? An integer? 27 00:01:14.159 --> 00:01:16.519 According to the source, yeah, it's either a basic scalar 28 00:01:16.519 --> 00:01:20.159 type like your ints, floats, pointers, enums, or if you 29 00:01:20.200 --> 00:01:24.120 have bit-fields, it's the largest sort of contiguous sequence of 30 00:01:24.159 --> 00:01:24.719 those bits. 31 00:01:25.120 --> 00:01:28.040 Got it. Scalar types or contiguous bit-fields. What was the 32 00:01:28.079 --> 00:01:28.680 second question? 33 00:01:28.959 --> 00:01:32.760 Ah, the big one. What happens when multiple threads try 34 00:01:32.760 --> 00:01:35.400 to access that same memory location around 35 00:01:35.200 --> 00:01:37.480 the same time? Okay, and I sense danger here. 36 00:01:37.599 --> 00:01:40.680 You got it. That leads us straight to data races. Okay, 37 00:01:40.680 --> 00:01:44.000 imagine two threads hitting the same shared variable. It's mutable, 38 00:01:44.159 --> 00:01:45.959 and at least one of those threads is trying to 39 00:01:46.000 --> 00:01:46.519 write to it. 40 00:01:46.760 --> 00:01:47.640 That's a data race. 41 00:01:47.719 --> 00:01:51.319 That's the data race, and the result: undefined behavior. Ah, 42 00:01:51.480 --> 00:01:56.000 the dreaded UB. Exactly, the Wild West. Your program might crash, 43 00:01:56.040 --> 00:01:59.120 spit out garbage, or maybe even seem to work fine 44 00:01:59.159 --> 00:02:01.760 for a while, then just fail later, completely out of 45 00:02:01.760 --> 00:02:02.120 the blue. 46 00:02:02.280 --> 00:02:04.920 So that's why we need things like mutexes and locks. Yeah, 47 00:02:04.959 --> 00:02:07.439 to coordinate who gets access when. Precisely. 48 00:02:07.680 --> 00:02:11.280 They're the traffic cops for shared data access. Essential tools.
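As a minimal sketch of the data race described above and the mutex that removes it: two threads increment one shared counter, and a std::lock_guard serializes each increment. The function name count_with_mutex and its parameter are hypothetical, chosen only for this illustration.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical helper: two threads each add `per_thread` to one shared counter.
long count_with_mutex(long per_thread) {
    long counter = 0;          // shared, mutable state
    std::mutex counter_mutex;  // guards every access to `counter`

    auto work = [&] {
        for (long i = 0; i < per_thread; ++i) {
            // Without this lock, the two threads would race on `counter`
            // and the behavior would be undefined.
            std::lock_guard<std::mutex> guard(counter_mutex);
            ++counter;
        }
    };

    std::thread t1(work);
    std::thread t2(work);
    t1.join();
    t2.join();
    return counter;  // safe to read: both writers have been joined
}
```

Calling count_with_mutex(10000) should return 20000 on every run; removing the lock_guard line turns the same code into the data race the speakers describe.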
49 00:02:11.560 --> 00:02:15.560 The book uses thread-safe singleton initialization as a classic example. 50 00:02:16.240 --> 00:02:19.199 Why is that such a good illustration? It seems simple, right, 51 00:02:19.400 --> 00:02:20.159 just one instance. 52 00:02:20.360 --> 00:02:23.120 Well, it seems simple in a single thread, but imagine 53 00:02:23.199 --> 00:02:27.159 multiple threads all deciding, hey, I need the singleton, at 54 00:02:27.159 --> 00:02:28.280 the exact same time. 55 00:02:28.400 --> 00:02:30.360 Ah. So if they all check and see it doesn't 56 00:02:30.400 --> 00:02:31.240 exist yet, they 57 00:02:31.199 --> 00:02:33.240 might all try to create it, and suddenly you've got 58 00:02:33.360 --> 00:02:36.080 multiple singletons, which completely breaks the whole idea. 59 00:02:36.280 --> 00:02:39.400 Right. So thread-safe techniques make sure only one thread 60 00:02:39.639 --> 00:02:42.599 actually does the creation, even if many try. 61 00:02:42.479 --> 00:02:44.879 Exactly, ensuring it's created exactly once. 62 00:02:45.280 --> 00:02:48.759 Now, for digging deeper, the book mentions a tool called CppMem. 63 00:02:49.000 --> 00:02:49.719 What's that about? 64 00:02:49.879 --> 00:02:53.879 Oh, CppMem is fantastic for this. It's like a sandbox 65 00:02:54.080 --> 00:02:57.080 or a simulator for the C++ memory model. 66 00:02:57.199 --> 00:02:57.479 Okay. 67 00:02:57.719 --> 00:03:00.439 You feed it small snippets of concurrent code, and it 68 00:03:00.439 --> 00:03:03.039 shows you all the possible ways the operations from different 69 00:03:03.039 --> 00:03:07.840 threads could interleave. It visualizes the impact of different memory orderings. 70 00:03:07.439 --> 00:03:09.479 So you can actually see how things might go wrong 71 00:03:09.680 --> 00:03:12.360 or why a certain ordering works. Precisely.
72 00:03:12.840 --> 00:03:15.719 It helps build that intuition for how the memory model behaves, which, 73 00:03:16.080 --> 00:03:17.639 as we said, isn't always obvious. 74 00:03:17.919 --> 00:03:21.240 Really valuable tool. Okay, memory model basics covered, let's talk 75 00:03:21.280 --> 00:03:24.000 about the threads themselves. We've had std::thread for 76 00:03:24.000 --> 00:03:27.840 a while. C++20 added std::jthread. 77 00:03:28.199 --> 00:03:29.400 What's the leap forward there? 78 00:03:29.520 --> 00:03:32.800 The big difference really is resource management safety. 79 00:03:32.919 --> 00:03:34.520 So with 80 00:03:34.280 --> 00:03:37.919 std::thread, you, the programmer, must remember to either join 81 00:03:38.000 --> 00:03:40.599 the thread, wait for it to finish, or detach it 82 00:03:40.639 --> 00:03:41.680 to run independently. 83 00:03:41.840 --> 00:03:43.479 And if you forget? If the 84 00:03:43.319 --> 00:03:45.919 std::thread object gets destroyed before you do either, your 85 00:03:45.960 --> 00:03:48.000 program terminates. It's a common mistake. 86 00:03:48.240 --> 00:03:51.599 Ouch. Okay, so how does std::jthread fix that? 87 00:03:51.800 --> 00:03:55.919 It's RAII-based, resource acquisition is initialization. When an 88 00:03:55.960 --> 00:03:58.280 std::jthread object goes out of scope, its 89 00:03:58.319 --> 00:04:00.240 destructor automatically calls join. 90 00:04:00.360 --> 00:04:03.240 No more forgetting. Nice. That sounds much safer. It is. 91 00:04:03.560 --> 00:04:06.080 Plus, std::jthread has built-in support for 92 00:04:06.199 --> 00:04:09.719 cooperative interruption, a clean way to ask a thread to stop. 93 00:04:09.960 --> 00:04:12.759 Okay, cooperative interruption. We'll probably circle back to that. Now, 94 00:04:12.759 --> 00:04:15.879 the book mentioned something tricky with std::shared_ptr. 95 00:04:16.240 --> 00:04:17.439 I thought they were thread safe.
96 00:04:17.480 --> 00:04:21.000 They help with memory management in threads, yes. They prevent 97 00:04:21.120 --> 00:04:25.040 leaks by managing the object's lifetime automatically, but the shared 98 00:04:25.040 --> 00:04:28.000 pointer itself isn't fully thread safe for all operations. 99 00:04:28.079 --> 00:04:28.680 What's the catch? 100 00:04:28.759 --> 00:04:31.560 It's not the reference counter; that's updated atomically. But if you have multiple threads 101 00:04:31.839 --> 00:04:34.680 all trying to, say, assign a new shared pointer to 102 00:04:34.720 --> 00:04:37.519 the same shared pointer variable, especially if it was passed 103 00:04:37.519 --> 00:04:40.160 by reference... We could corrupt it? Exactly. You can 104 00:04:40.199 --> 00:04:43.040 get a data race on that shared pointer object. The book 105 00:04:43.040 --> 00:04:45.839 shows an example where this happens when threads modify a 106 00:04:45.920 --> 00:04:50.240 shared shared_ptr passed by reference. The object being pointed 107 00:04:50.279 --> 00:04:53.480 to might be fine, but the pointer's bookkeeping gets messed up. 108 00:04:53.639 --> 00:04:56.480 So if I need multiple threads to safely update which 109 00:04:56.480 --> 00:04:59.279 object a shared pointer points to, what's the solution? 110 00:05:00.000 --> 00:05:03.079 The book suggests using std::atomic_store for that 111 00:05:03.120 --> 00:05:06.000 specific case to make the update atomic, but it also 112 00:05:06.040 --> 00:05:07.480 points out that this is kind 113 00:05:07.240 --> 00:05:09.120 of a workaround. What's the real fix, then? 114 00:05:09.720 --> 00:05:13.279 Ideally we'd use atomic smart pointers like 115 00:05:13.439 --> 00:05:16.639 std::atomic<std::shared_ptr<T>>, which C++20 introduced. 116 00:05:17.040 --> 00:05:20.000 That handles the atomicity of the pointer operations themselves. 117 00:05:20.319 --> 00:05:23.399 Okay, that makes sense.
The book also mentioned 118 00:05:23.399 --> 00:05:24.079 std::atomic_ref. 119 00:05:24.279 --> 00:05:27.639 What's that for? Ah, atomic_ref. That's pretty neat. It 120 00:05:27.720 --> 00:05:30.959 lets you perform atomic operations on an existing object that 121 00:05:31.199 --> 00:05:34.480 wasn't originally declared std::atomic. So you 122 00:05:34.399 --> 00:05:37.240 can temporarily treat a regular variable as atomic? Sort of. 123 00:05:37.319 --> 00:05:39.439 Yeah, you create an atomic_ref to it, and then 124 00:05:39.480 --> 00:05:43.279 you can use atomic operations like fetch_add or compare_exchange_strong 125 00:05:43.279 --> 00:05:47.079 directly on that underlying variable through the reference. The 126 00:05:47.160 --> 00:05:50.800 example showed incrementing a counter inside some big object without 127 00:05:50.839 --> 00:05:53.800 needing locks or making the whole object atomic. 128 00:05:53.600 --> 00:05:56.399 Interesting, so careful management is key. This leads us nicely 129 00:05:56.439 --> 00:06:02.040 into memory ordering: sequential consistency, acquire-release, relaxed. These sound 130 00:06:02.079 --> 00:06:03.279 like different levels of rules. 131 00:06:03.360 --> 00:06:06.839 They are. They're different contracts, different guarantees about how memory 132 00:06:06.879 --> 00:06:09.000 operations become visible across threads. 133 00:06:09.160 --> 00:06:13.959 Let's start with the strictest: sequential consistency, 134 00:06:13.800 --> 00:06:17.079 memory_order_seq_cst. Right. That's the default for atomics, and it's the 135 00:06:17.120 --> 00:06:20.959 easiest to reason about. It basically guarantees two things. One, 136 00:06:21.560 --> 00:06:24.800 all threads agree on a single global order of all 137 00:06:24.839 --> 00:06:30.240 sequentially consistent operations, and two, the operations within any single 138 00:06:30.319 --> 00:06:33.240 thread happen in the order you wrote them in your code.
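One way to see the two sequential-consistency guarantees listed above is the classic store/load test, sketched here with hypothetical names (store_load_once, r1, r2). Each thread stores to one atomic and then loads the other, using the default memory_order_seq_cst. Because all four operations fit into one global order that respects each thread's program order, at least one thread must observe the other's store.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>

// Hypothetical helper: runs the store/load test once and returns what each
// thread read from the other thread's variable.
std::pair<int, int> store_load_once() {
    std::atomic<int> x{0};
    std::atomic<int> y{0};
    int r1 = -1;
    int r2 = -1;

    // store() and load() default to std::memory_order_seq_cst.
    std::thread a([&] { x.store(1); r1 = y.load(); });
    std::thread b([&] { y.store(1); r2 = x.load(); });
    a.join();
    b.join();

    // Possible results: (0,1), (1,0), (1,1). With seq_cst, (0,0) cannot occur.
    return {r1, r2};
}
```

With weaker orderings on the four operations, the (0,0) outcome would become permitted, which is the kind of counterintuitive result the speakers allude to.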
139 00:06:33.120 --> 00:06:35.199 Like one single timeline for everything. 140 00:06:35.360 --> 00:06:39.120 Exactly. Simple model, but it can sometimes have performance costs 141 00:06:39.199 --> 00:06:41.519 because the hardware has to work harder to maintain that 142 00:06:41.560 --> 00:06:42.240 global order. 143 00:06:42.399 --> 00:06:45.600 Okay, what about acquire-release semantics, then? Sounds like it 144 00:06:45.680 --> 00:06:46.720 loosens things up a bit. 145 00:06:46.879 --> 00:06:50.920 It does. Acquire-release, that's memory_order_acquire, memory_order_release, 146 00:06:51.000 --> 00:06:55.199 memory_order_acq_rel, and also consume, though that's trickier, focuses 147 00:06:55.240 --> 00:06:58.720 on synchronization between operations on the same atomic 148 00:06:58.360 --> 00:06:59.759 variable. Same variable, okay. 149 00:07:00.120 --> 00:07:03.839 A release operation, typically a write, ensures that all memory writes 150 00:07:03.920 --> 00:07:07.040 that happen before it in the same thread become visible 151 00:07:07.079 --> 00:07:10.519 to other threads that later perform an acquire operation, usually 152 00:07:10.680 --> 00:07:12.920 a read, on that same atomic variable. 153 00:07:13.120 --> 00:07:16.319 So the release makes prior writes visible and the acquire 154 00:07:16.399 --> 00:07:17.720 sees them. Precisely. 155 00:07:18.199 --> 00:07:22.839 This creates what's called a synchronizes-with relationship. It's fundamental. 156 00:07:23.199 --> 00:07:26.240 The book points out this is how mutexes, thread joins, 157 00:07:26.360 --> 00:07:29.800 condition variables, all the higher-level stuff actually works under 158 00:07:29.800 --> 00:07:33.839 the hood. A lock release synchronizes with a subsequent 159 00:07:33.439 --> 00:07:37.519 lock acquire. That synchronizes-with, it sounds important. It establishes 160 00:07:37.639 --> 00:07:38.680 order across threads. 161 00:07:38.839 --> 00:07:43.040 Yes, it establishes a happens-before relationship.
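The release/acquire pairing described above can be sketched as a small message-passing example. The names publish_and_read, payload, and ready are hypothetical. The producer writes ordinary data and then does a release store on a flag; the consumer spins on an acquire load of the same flag and only then reads the data.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: passes one int from producer to consumer through a
// release store and an acquire load on the same atomic flag.
int publish_and_read() {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};  // the atomic both threads synchronize on
    int seen = 0;

    std::thread producer([&] {
        payload = 42;                                  // ordinary write
        ready.store(true, std::memory_order_release);  // publishes the write above
    });

    std::thread consumer([&] {
        // The acquire load that reads `true` synchronizes with the release store.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;  // no data race: the write to payload happens-before this read
    });

    producer.join();
    consumer.join();
    return seen;
}
```

If both operations on ready used memory_order_relaxed instead, the read of payload would no longer be ordered after the write and the program would contain a data race.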
If action A 162 00:07:43.279 --> 00:07:47.399 synchronizes with action B, then A happens before B. This 163 00:07:47.480 --> 00:07:49.600 guarantees visibility of memory changes. 164 00:07:49.920 --> 00:07:52.680 Got it. Now, what about the most lenient one, 165 00:07:52.800 --> 00:07:55.560 memory_order_relaxed? What guarantees do we lose there? 166 00:07:55.720 --> 00:07:59.240 With relaxed ordering, you only get the bare minimum: the 167 00:07:59.240 --> 00:08:02.360 operation itself is atomic. It happens indivisibly, and there's a 168 00:08:02.399 --> 00:08:06.480 single modification order for that specific atomic variable. All threads 169 00:08:06.519 --> 00:08:09.040 will agree on the sequence of values written to that one 170 00:08:08.920 --> 00:08:11.879 variable. But no guarantees about other memory operations? Exactly. 171 00:08:11.920 --> 00:08:15.839 Relaxed operations don't create synchronizes-with relationships. They don't guarantee 172 00:08:15.839 --> 00:08:18.680 anything about the visibility or ordering of reads and writes 173 00:08:18.879 --> 00:08:21.079 to other variables, whether those are atomic 174 00:08:21.120 --> 00:08:21.959 or not. 175 00:08:22.240 --> 00:08:26.480 So potentially faster, but much harder to reason about. Much harder. 176 00:08:26.800 --> 00:08:29.720 The book shows using fetch_add with relaxed ordering for a 177 00:08:29.800 --> 00:08:32.360 simple counter, which is a common use case, but it 178 00:08:32.399 --> 00:08:35.039 also warns that you can still get data races on 179 00:08:35.120 --> 00:08:38.360 non-atomic variables even if you're reading related atomics with 180 00:08:38.600 --> 00:08:42.799 relaxed order, because there's no happens-before relationship established. It's 181 00:08:42.799 --> 00:08:43.519 subtle stuff. 182 00:08:43.559 --> 00:08:46.440 And where do memory fences, atomic_thread_fence, 183 00:08:46.240 --> 00:08:50.360 fit in? Fences act like barriers.
They enforce ordering constraints 184 00:08:50.360 --> 00:08:53.519 between operations before the fence and operations after the fence, 185 00:08:53.600 --> 00:08:57.440 even across different variables or relaxed atomics. A release fence 186 00:08:57.480 --> 00:09:00.480 makes prior writes visible to threads that later execute an 187 00:09:00.519 --> 00:09:05.000 acquire fence. It's another way to establish that synchronizes-with relationship, 188 00:09:05.360 --> 00:09:08.399 though you still need an atomic store and load in between to connect the two fences. 189 00:09:08.480 --> 00:09:10.799 Okay, that's a lot to digest on ordering. Let's shift 190 00:09:10.840 --> 00:09:12.960 to actually using threads. How do we launch them? What 191 00:09:13.000 --> 00:09:13.720 are the options? 192 00:09:13.840 --> 00:09:17.519 The main way is std::thread. Its constructor can 193 00:09:17.559 --> 00:09:20.799 take basically any callable thing. Callable thing? Yeah, like a 194 00:09:20.840 --> 00:09:24.840 regular function pointer, or a function object, you know, an 195 00:09:24.840 --> 00:09:29.080 object where you've overloaded the function call operator, or very commonly 196 00:09:29.159 --> 00:09:30.000 a lambda function. 197 00:09:30.159 --> 00:09:31.600 Right. Lambdas are handy there. 198 00:09:31.480 --> 00:09:34.279 Super handy. You just pass the function or lambda you 199 00:09:34.279 --> 00:09:36.360 want to run in the new thread, followed by any 200 00:09:36.480 --> 00:09:39.679 arguments it needs. The book shows a simple hello from 201 00:09:39.720 --> 00:09:41.639 thread using a lambda. 202 00:09:41.399 --> 00:09:44.440 And once it's running, we have to decide what happens 203 00:09:44.440 --> 00:09:47.559 when it finishes. Join or detach. 204 00:09:47.279 --> 00:09:50.159 Exactly. You have to make a choice before the 205 00:09:50.279 --> 00:09:54.519 std::thread object itself is destroyed.
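The three kinds of callable mentioned above (function, function object, lambda) can be sketched side by side. This is not the book's example; the names greet_function, GreetFunctor, and launch_three_ways are hypothetical. Each thread writes to its own string, so no locking is needed, and every thread is joined before its std::thread object is destroyed.

```cpp
#include <array>
#include <cassert>
#include <functional>
#include <string>
#include <thread>

// A plain function.
void greet_function(std::string& out) { out = "hello from function"; }

// A function object: a type with an overloaded function call operator.
struct GreetFunctor {
    void operator()(std::string& out) const { out = "hello from functor"; }
};

// Hypothetical helper: launches one thread per kind of callable.
std::array<std::string, 3> launch_three_ways() {
    std::array<std::string, 3> out;

    // std::thread copies its arguments by default, so references need std::ref.
    std::thread t1(greet_function, std::ref(out[0]));
    std::thread t2(GreetFunctor{}, std::ref(out[1]));
    std::thread t3([&out] { out[2] = "hello from lambda"; });

    // Each thread must be joined (or detached) before its std::thread dies.
    t1.join();
    t2.join();
    t3.join();
    return out;
}
```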
Join means the current 206 00:09:54.559 --> 00:09:57.919 thread waits right there until the launched thread completes. 207 00:09:58.240 --> 00:10:00.440 Useful if you need its result or need to know 208 00:10:00.480 --> 00:10:02.080 it's done before cleaning up resources. 209 00:10:02.120 --> 00:10:05.759 Precisely. Detach, on the other hand, lets the thread run 210 00:10:05.799 --> 00:10:11.000 completely independently in the background. The original thread continues immediately. 211 00:10:11.080 --> 00:10:14.039 But that sounds risky. What if the detached thread needs 212 00:10:14.159 --> 00:10:15.879 data that the original thread owns? 213 00:10:16.039 --> 00:10:18.919 That's the big danger. If the original thread finishes and 214 00:10:18.960 --> 00:10:21.440 its data goes out of scope, but the detached thread 215 00:10:21.480 --> 00:10:24.919 is still running and tries to access that data, boom, 216 00:10:25.000 --> 00:10:29.240 undefined behavior again. So the book advises joining, usually? Strongly 217 00:10:29.279 --> 00:10:32.840 advises joining, especially if the thread interacts with data whose 218 00:10:32.879 --> 00:10:35.759 lifetime is tied to the scope where the thread was created. 219 00:10:35.919 --> 00:10:39.279 Detaching requires very careful management of lifetimes. 220 00:10:39.559 --> 00:10:43.720 Makes sense. What about std::thread::hardware_concurrency? 221 00:10:43.879 --> 00:10:46.159 What does it tell us? It gives you a hint, basically, 222 00:10:46.679 --> 00:10:49.919 an estimate of how many threads the hardware can genuinely 223 00:10:49.960 --> 00:10:53.559 run in parallel, often related to the number of CPU 224 00:10:53.600 --> 00:10:54.879 cores or hyperthreads. 225 00:10:55.120 --> 00:10:57.240 A hint, not a rule? Definitely just a hint. 226 00:10:57.480 --> 00:11:01.559 The optimal number of threads depends heavily on the specific task, I/O, contention, 227 00:11:01.720 --> 00:11:05.120 et cetera.
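Treating std::thread::hardware_concurrency as a hint, as described above, might look like the following sketch. The helper name pick_worker_count and its fallback parameter are hypothetical. The standard allows the call to return 0 when the value cannot be determined, so a fallback is useful.

```cpp
#include <cassert>
#include <thread>

// Hypothetical helper: picks a starting worker count from the hardware hint.
unsigned pick_worker_count(unsigned fallback = 2) {
    // May return 0 if the value is not computable or not well defined.
    unsigned hint = std::thread::hardware_concurrency();
    return hint == 0 ? fallback : hint;
}
```

The result is only a starting point; an I/O-bound workload may want more threads than this, and a heavily contended one may want fewer.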
Using exactly this number isn't always best. The 228 00:11:05.120 --> 00:11:08.080 book mentions it's just a starting point. And native_handle? 229 00:11:08.279 --> 00:11:10.799 That's an escape hatch. It gives you direct access to 230 00:11:10.840 --> 00:11:13.679 the underlying operating system's thread handle, like a pthread_t on 231 00:11:13.759 --> 00:11:16.120 Linux or a HANDLE on Windows, if you need to 232 00:11:16.159 --> 00:11:18.960 do something platform specific that the C++ standard 233 00:11:18.960 --> 00:11:21.159 library doesn't cover. Use with caution, though. 234 00:11:21.279 --> 00:11:23.919 Okay, got it. Let's move on to the tools we 235 00:11:24.000 --> 00:11:27.559 use with threads: synchronization primitives, starting with the most basic, 236 00:11:27.840 --> 00:11:29.000 std::mutex. 237 00:11:29.320 --> 00:11:33.759 Right, the mutex. Its core job is mutual exclusion, protecting 238 00:11:33.799 --> 00:11:34.440 shared data. 239 00:11:34.519 --> 00:11:35.200 How does it do that? 240 00:11:35.480 --> 00:11:37.840 Think of it as a lock guarding a piece of data. 241 00:11:38.399 --> 00:11:40.600 Before a thread can touch that data, it has to 242 00:11:40.639 --> 00:11:43.759 lock the mutex. If another thread already holds the lock, 243 00:11:43.879 --> 00:11:47.240 that thread waits. Once the holder is done, it must unlock 244 00:11:47.279 --> 00:11:50.200 the mutex, allowing another waiting thread to proceed. 245 00:11:50.200 --> 00:11:52.639 So only one thread gets access at a time. Prevents 246 00:11:52.759 --> 00:11:55.039 data races on that protected data. Exactly. 247 00:11:55.399 --> 00:11:58.399 Mutexes are your go-to for protecting shared mutable state, 248 00:11:58.639 --> 00:11:59.639 first line of defense. 249 00:12:00.240 --> 00:12:04.279 But the book warns about deadlocks. How do mutexes lead 250 00:12:04.320 --> 00:12:04.559 to that? 251 00:12:05.080 --> 00:12:09.000 Ah, the classic deadlock scenario.
Imagine thread one locks mutex A, 252 00:12:09.360 --> 00:12:13.360 then tries to lock mutex B. Simultaneously, thread two locks 253 00:12:13.440 --> 00:12:15.440 mutex B, then tries to lock mutex A. 254 00:12:15.840 --> 00:12:18.240 Oh. Thread one has A and wants B. Thread two 255 00:12:18.279 --> 00:12:19.200 has B and wants A. 256 00:12:19.480 --> 00:12:22.440 And they're stuck. Neither can proceed because it's waiting for 257 00:12:22.480 --> 00:12:24.600 the resource the other one holds. That's a deadlock. They 258 00:12:24.679 --> 00:12:25.960 wait forever. Nasty. 259 00:12:26.360 --> 00:12:28.440 How do we avoid that when we need multiple locks? 260 00:12:28.679 --> 00:12:31.879 The standard solution is std::lock. You pass it 261 00:12:31.919 --> 00:12:34.240 all the mutexes you need to acquire. It uses a 262 00:12:34.279 --> 00:12:37.360 deadlock avoidance algorithm internally to try and lock all of them 263 00:12:37.320 --> 00:12:40.279 atomically. Atomically, meaning it gets all of them or none 264 00:12:40.320 --> 00:12:40.639 of them? 265 00:12:40.879 --> 00:12:43.480 Essentially, yes. It guarantees it won't end up in a 266 00:12:43.519 --> 00:12:46.200 state where it holds some locks while blocking waiting for 267 00:12:46.279 --> 00:12:49.039 others in a way that contributes to deadlock. If it 268 00:12:49.039 --> 00:12:51.559 can't get all locks, it'll release any it acquired and 269 00:12:51.600 --> 00:12:54.600 try again, or perhaps throw an exception, depending on the context. 270 00:12:54.799 --> 00:12:58.799 Okay, so std::lock for multiple mutexes. Yeah, good tip. 271 00:12:59.159 --> 00:13:02.600 We mentioned thread-safe initialization earlier. Besides a simple lock, 272 00:13:02.639 --> 00:13:04.120 what other techniques does the book cover? 273 00:13:04.320 --> 00:13:06.679 Several good ones. If something can be constexpr, 274 00:13:06.759 --> 00:13:09.960 its value is fixed at compile time, so that's inherently thread
275 00:13:09.840 --> 00:13:11.799 safe. Right, no runtime race possible. 276 00:13:11.840 --> 00:13:15.759 Well, then there's std::call_once with the std::once_flag. 277 00:13:16.120 --> 00:13:18.279 You pass it a flag and a function, like your 278 00:13:18.320 --> 00:13:22.279 initialization function. The standard guarantees that function will be executed 279 00:13:22.320 --> 00:13:25.000 exactly once by the first thread that calls it, even 280 00:13:25.039 --> 00:13:27.919 if many threads call it concurrently. Other threads will wait 281 00:13:28.000 --> 00:13:29.200 until the first one is done. 282 00:13:29.240 --> 00:13:30.440 Okay, that sounds robust. 283 00:13:30.960 --> 00:13:34.639 Very. Another common C++ idiom, especially since 284 00:13:34.759 --> 00:13:38.879 C++11, is the Meyers singleton, using a static 285 00:13:39.000 --> 00:13:40.559 variable inside a function. 286 00:13:40.600 --> 00:13:43.279 Like static MySingleton instance, return 287 00:13:43.000 --> 00:13:47.799 instance? Exactly that. The language guarantees that the initialization of 288 00:13:47.840 --> 00:13:51.679 that static local variable is thread safe. The compiler and 289 00:13:51.799 --> 00:13:55.759 runtime handle the locking implicitly. It's often the simplest and 290 00:13:55.799 --> 00:13:56.480 preferred way. 291 00:13:56.519 --> 00:13:59.080 Nice, simple is good. Any others? 292 00:13:59.320 --> 00:14:02.559 Well, simplest of all, if your program structure allows it, is 293 00:14:02.720 --> 00:14:05.720 to just initialize the shared resource in your main thread before 294 00:14:05.840 --> 00:14:10.120 you create any other threads. No concurrency during initialization means 295 00:14:10.120 --> 00:14:10.679 no problem. 296 00:14:11.159 --> 00:14:14.559 Fair enough. What about signaling between threads, like, hey, the 297 00:14:14.639 --> 00:14:18.159 data you're waiting for is ready? That's 298 00:14:18.320 --> 00:14:19.399 std::condition_variable? Precisely.
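Two of the initialization techniques described above, std::call_once and the Meyers singleton, can be sketched together. The Config class, its constructed() counter, and the run_call_once helper are all hypothetical and exist only to make the "exactly once" guarantee observable.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical Meyers singleton that counts how often it is constructed.
class Config {
public:
    static Config& instance() {
        static Config cfg;  // since C++11, initialized exactly once, thread-safely
        return cfg;
    }
    static int constructed() { return construct_count().load(); }

private:
    Config() { ++construct_count(); }
    static std::atomic<int>& construct_count() {
        static std::atomic<int> n{0};
        return n;
    }
};

// Hypothetical helper: many threads race to run the same one-time setup.
int run_call_once(int thread_count) {
    std::once_flag flag;
    std::atomic<int> init_runs{0};
    std::vector<std::thread> threads;
    for (int i = 0; i < thread_count; ++i) {
        threads.emplace_back([&] {
            std::call_once(flag, [&] { ++init_runs; });  // body runs once in total
            Config::instance();                           // same instance for everyone
        });
    }
    for (auto& t : threads) t.join();
    return init_runs.load();
}
```

Both counters should end at 1 no matter how many threads take part.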
299 00:14:19.440 --> 00:14:23.519 Condition variables let threads wait efficiently until some condition becomes true. 300 00:14:23.720 --> 00:14:25.360 How do they work? Do they need a mutex? 301 00:14:25.559 --> 00:14:28.159 Yes, they always work together with a mutex. A waiting 302 00:14:28.200 --> 00:14:31.200 thread must first lock the mutex protecting the shared state 303 00:14:31.320 --> 00:14:34.960 and the condition. Then it calls wait on the condition variable. 304 00:14:35.039 --> 00:14:38.200 And wait does what? It atomically releases the mutex and 305 00:14:38.240 --> 00:14:42.080 puts the thread to sleep. It waits until another thread notifies 306 00:14:42.120 --> 00:14:43.840 it. Notifies it how? By calling 307 00:14:43.639 --> 00:14:48.720 notify_one or notify_all on the same condition variable. When 308 00:14:48.759 --> 00:14:52.639 the waiting thread wakes up, it automatically reacquires the mutex 309 00:14:52.799 --> 00:14:54.120 before wait returns. 310 00:14:54.159 --> 00:14:56.840 Okay, it wakes up, gets the lock back. Then it 311 00:14:56.879 --> 00:14:58.159 can check the condition. Exactly. 312 00:14:58.240 --> 00:15:01.360 And this is crucial: it must check the condition again 313 00:15:01.600 --> 00:15:02.399 after waking up. 314 00:15:02.519 --> 00:15:05.039 Why? Didn't it get notified because the condition is true? 315 00:15:05.240 --> 00:15:08.080 Not necessarily. You can get spurious wakeups, where the thread 316 00:15:08.080 --> 00:15:10.519 wakes up even though no notification happened, or the condition 317 00:15:10.679 --> 00:15:14.360 changed back. That's why wait functions usually take a predicate, 318 00:15:14.480 --> 00:15:17.639 a lambda or function that checks the actual condition. The 319 00:15:17.679 --> 00:15:20.679 wait will only return if the predicate is true or 320 00:15:20.720 --> 00:15:21.480 if interrupted. 321 00:15:21.720 --> 00:15:25.480 Ah, so the predicate handles spurious wakeups.
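The wait-with-a-predicate pattern described above can be sketched as follows. The helper name wait_for_data and the variables ready and data are hypothetical. The consumer locks the mutex and waits with a predicate; the producer sets the data under the same mutex and then notifies.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical helper: hands one value from producer to consumer.
int wait_for_data() {
    std::mutex m;
    std::condition_variable cv;
    bool ready = false;  // the condition, protected by m
    int data = 0;        // the shared state, protected by m
    int received = -1;

    std::thread consumer([&] {
        std::unique_lock<std::mutex> lock(m);
        // wait() releases the mutex while sleeping and reacquires it before
        // returning. The predicate is rechecked on every wakeup, so spurious
        // wakeups and early notifications are both handled.
        cv.wait(lock, [&] { return ready; });
        received = data;
    });

    std::thread producer([&] {
        {
            std::lock_guard<std::mutex> lock(m);
            data = 7;
            ready = true;
        }
        cv.notify_one();  // notify after releasing the lock
    });

    producer.join();
    consumer.join();
    return received;
}
```

The predicate also covers the case where the producer finishes before the consumer starts waiting: the consumer finds ready already true and never blocks.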
Never wait without 322 00:15:25.519 --> 00:15:25.960 one. 323 00:15:26.080 --> 00:15:28.000 That's the rule. Always wait with a predicate. 324 00:15:28.120 --> 00:15:32.519 Now, C++20 brought cooperative interruption: 325 00:15:32.600 --> 00:15:36.679 std::stop_source, std::stop_token. How does that fit in, especially 326 00:15:36.679 --> 00:15:38.799 with jthread and 327 00:15:38.639 --> 00:15:40.600 condition_variable_any? Right, this is a much better way to 328 00:15:40.679 --> 00:15:43.759 ask threads to stop than, say, just setting a boolean flag. 329 00:15:43.759 --> 00:15:44.600 It's more integrated. 330 00:15:44.639 --> 00:15:45.279 How does it work? 331 00:15:45.519 --> 00:15:49.840 You create a std::stop_source. This object can 332 00:15:49.879 --> 00:15:53.519 request that associated operations stop. From the stop source, you 333 00:15:53.559 --> 00:15:56.679 get std::stop_tokens. You pass these tokens to 334 00:15:56.759 --> 00:15:58.360 the threads or operations 335 00:15:57.840 --> 00:16:00.000 you might want to interrupt. And the thread checks 336 00:16:00.120 --> 00:16:00.519 the token? 337 00:16:00.720 --> 00:16:04.679 Yes, a thread can periodically call stop_requested on its token. 338 00:16:05.200 --> 00:16:08.399 Or even better, blocking functions like the wait functions 339 00:16:08.399 --> 00:16:11.159 on std::condition_variable_any, and the ones in 340 00:16:11.240 --> 00:16:15.000 jthread implicitly, can accept a stop token. They'll automatically 341 00:16:15.000 --> 00:16:17.279 wake up if a stop is requested on that token. 342 00:16:17.399 --> 00:16:19.120 So jthread uses this automatically? 343 00:16:19.240 --> 00:16:21.720 jthread has a stop source built in.
If you 344 00:16:21.759 --> 00:16:23.559 create a jthread with a function that takes a 345 00:16:23.600 --> 00:16:26.399 stop_token as its first argument, jthread passes in its own token, and the jthread's destructor 346 00:16:26.440 --> 00:16:30.480 will automatically request stop before joining. It makes graceful shutdown 347 00:16:30.320 --> 00:16:33.679 much easier. And std::stop_callback? That lets 348 00:16:33.519 --> 00:16:36.279 you register a function that gets called immediately when stop 349 00:16:36.320 --> 00:16:38.960 is requested on a given token, useful for things like 350 00:16:39.039 --> 00:16:41.840 quickly closing a socket or canceling an I/O operation. 351 00:16:42.000 --> 00:16:45.080 Okay, a much cleaner stop mechanism. What about 352 00:16:45.120 --> 00:16:47.600 std::counting_semaphore, also C++20? How's that different 353 00:16:47.639 --> 00:16:48.279 from a mutex? 354 00:16:48.559 --> 00:16:51.840 A mutex is about exclusive access, only one thread in 355 00:16:51.919 --> 00:16:55.799 at a time. A semaphore maintains a counter representing available 356 00:16:55.840 --> 00:16:57.080 resources or permits. 357 00:16:57.080 --> 00:16:57.799 How does that work? 358 00:16:58.080 --> 00:17:01.720 A thread calls acquire to take a permit, decrementing 359 00:17:01.759 --> 00:17:04.640 the counter. If the counter is zero, the thread blocks. 360 00:17:05.279 --> 00:17:08.799 A thread calls release to return a permit, incrementing the counter, 361 00:17:09.119 --> 00:17:11.000 potentially waking up a blocked thread. 362 00:17:11.160 --> 00:17:13.240 Can different threads acquire and release?
363 00:17:13.720 --> 00:17:17.359 Yes, that's a key difference from mutexes, which have to be 364 00:17:17.400 --> 00:17:20.680 unlocked by the same thread that locked them. Semaphores are great 365 00:17:20.680 --> 00:17:23.400 for controlling access to a pool of N resources or 366 00:17:23.440 --> 00:17:26.839 for producer-consumer scenarios where one thread signals another about 367 00:17:26.839 --> 00:17:28.960 available work. They're thread agnostic. 368 00:17:29.359 --> 00:17:33.160 Interesting. Lastly, for basic sync, C++20 also 369 00:17:33.200 --> 00:17:36.519 gave us std::barrier and std::latch. Yeah, 370 00:17:36.559 --> 00:17:38.480 coordinating multiple threads. Exactly. 371 00:17:38.559 --> 00:17:40.759 Both are for synchronizing a group of threads at a 372 00:17:40.799 --> 00:17:41.640 specific point. 373 00:17:41.759 --> 00:17:43.799 What's the difference, latch versus barrier? 374 00:17:43.720 --> 00:17:46.359 A std::latch is basically a one-shot countdown. 375 00:17:46.400 --> 00:17:48.559 You initialize it with a count. Threads call count_down. When 376 00:17:48.559 --> 00:17:51.079 the count reaches zero, any threads waiting on the latch 377 00:17:51.240 --> 00:17:54.200 using wait are unblocked. After that, the latch is done. 378 00:17:54.240 --> 00:17:55.839 It can't be reset. 379 00:17:55.839 --> 00:17:58.039 One-time use. And a barrier? 380 00:17:58.160 --> 00:18:01.640 A std::barrier is reusable. You initialize it with 381 00:18:01.680 --> 00:18:04.720 the number of threads in the group. Each thread calls 382 00:18:04.880 --> 00:18:07.839 arrive_and_wait. When all threads have arrived, they are 383 00:18:07.880 --> 00:18:12.519 all unblocked simultaneously. Crucially, the barrier resets, ready for the 384 00:18:12.559 --> 00:18:15.920 next synchronization phase. You can even run a completion function 385 00:18:16.000 --> 00:18:18.519 when all threads arrive but before they're unblocked.
386 00:18:18.799 --> 00:18:22.359 So latch for a single sync point, barrier for repeated 387 00:18:22.400 --> 00:18:23.440 phases of computation. 388 00:18:24.000 --> 00:18:25.680 That's a good way to think about it. The book 389 00:18:25.720 --> 00:18:28.759 shows an example of barriers being used across different stages 390 00:18:28.799 --> 00:18:30.599 where the number of workers might even change. 391 00:18:30.680 --> 00:18:33.559 Okay, let's move up a level to tasks and futures. 392 00:18:33.640 --> 00:18:36.680 std::async sounds really convenient for running stuff in 393 00:18:36.759 --> 00:18:37.279 the background. 394 00:18:37.400 --> 00:18:40.079 It is. It's a high-level way to say, run 395 00:18:40.119 --> 00:18:43.119 this function, possibly on another thread, and give me back 396 00:18:43.160 --> 00:18:44.920 something I can use to get the result later. 397 00:18:45.079 --> 00:18:47.039 That something is the future? Exactly. 398 00:18:47.160 --> 00:18:50.799 std::async returns a std::future object, and it 399 00:18:50.839 --> 00:18:54.119 handles the thread management, in some implementations using an internal 400 00:18:53.759 --> 00:18:56.799 thread pool. You mentioned possibly on another thread. Right. 401 00:18:57.160 --> 00:19:01.160