WEBVTT 1 00:00:00.120 --> 00:00:05.960 Imagine looking at a high-definition, totally photorealistic image of 2 00:00:06.000 --> 00:00:09.080 a human face. You know, you can see every single pore, 3 00:00:09.199 --> 00:00:12.839 the individual eyelashes, like the exact glint of light reflecting 4 00:00:12.880 --> 00:00:13.359 in their eyes. 5 00:00:13.439 --> 00:00:15.480 Yeah, and then you realize that person has never actually 6 00:00:15.240 --> 00:00:18.359 existed, right, anywhere. I mean, a machine didn't just scrape 7 00:00:18.399 --> 00:00:20.920 the internet and, you know, find their picture. It imagined 8 00:00:20.920 --> 00:00:22.879 them into existence from pure 9 00:00:22.640 --> 00:00:25.600 mathematics, which is just wild to think about. 10 00:00:25.519 --> 00:00:29.320 It really is. Today we are opening up an incredibly 11 00:00:29.399 --> 00:00:33.200 comprehensive text for this deep dive. It's called Introduction to 12 00:00:33.320 --> 00:00:37.200 Deep Learning Business Applications for Developers by Armando Vieira and 13 00:00:37.280 --> 00:00:39.600 Bernardete Ribeiro. And our mission for you today is 14 00:00:39.640 --> 00:00:42.200 to really get past the buzzwords. 15 00:00:41.759 --> 00:00:45.000 Exactly, because we hear AI and neural networks everywhere right 16 00:00:45.000 --> 00:00:48.359 now. Right, but we want to genuinely demystify what is 17 00:00:48.399 --> 00:00:50.799 happening under the hood. Like, how did we go from 18 00:00:50.840 --> 00:00:54.560 machines that literally couldn't solve a child's basic logic puzzle 19 00:00:54.880 --> 00:00:59.039 to algorithms that can compose music, drive cars, and synthesize 20 00:00:59.119 --> 00:01:00.039 entirely new strands of DNA? 21 00:01:00.840 --> 00:01:03.840 And to understand that leap, we really have to look 22 00:01:03.840 --> 00:01:06.439 at the foundational inspiration for all of this, which is 23 00:01:06.439 --> 00:01:10.239 the human brain itself.
Okay. Because the architectures we're looking at, 24 00:01:10.480 --> 00:01:13.959 they're designed to build hierarchical, abstract concepts to make sense 25 00:01:13.959 --> 00:01:14.480 of the world. 26 00:01:14.640 --> 00:01:16.680 So it's not just plugging numbers into a formula. 27 00:01:17.079 --> 00:01:19.879 No, not at all. It is much less like programming 28 00:01:19.959 --> 00:01:23.480 a traditional calculator to execute a strict set of rules, 29 00:01:23.560 --> 00:01:26.640 and, well, much more like watching a toddler learn about 30 00:01:26.680 --> 00:01:29.120 their environment through unstructured observation. 31 00:01:29.400 --> 00:01:32.879 Okay, let's unpack this, because before we talk about machines 32 00:01:32.920 --> 00:01:35.599 imagining faces, we kind of have to talk about why 33 00:01:35.640 --> 00:01:39.040 this technology failed so spectacularly 34 00:01:38.200 --> 00:01:40.719 in the past. Right, the Dark Ages of AI. 35 00:01:41.000 --> 00:01:44.840 Yeah, exactly. Yeah, there was this multi-decade freeze known 36 00:01:44.879 --> 00:01:47.400 as the AI Winter, and it actually started with a 37 00:01:47.519 --> 00:01:49.799 very simple logic problem, didn't it? 38 00:01:49.799 --> 00:01:52.840 It did. So the origins of artificial neural networks go 39 00:01:52.959 --> 00:01:56.000 way back to the nineteen fifties. A researcher named Frank 40 00:01:56.079 --> 00:01:58.719 Rosenblatt invented what he called the perceptron. 41 00:01:58.920 --> 00:02:01.560 The perceptron. It sounds very retro sci-fi. 42 00:02:01.760 --> 00:02:04.840 It does, and it was essentially a single layer of 43 00:02:04.920 --> 00:02:08.800 artificial neurons, like a basic decision-making unit, and the 44 00:02:08.879 --> 00:02:11.479 hype at the time was massive.
I mean, the media 45 00:02:11.520 --> 00:02:14.199 thought human-level AI was just around the corner. But 46 00:02:14.280 --> 00:02:17.599 it wasn't, not even close, because in nineteen sixty nine 47 00:02:17.719 --> 00:02:22.120 Marvin Minsky and Seymour Papert published this completely crushing critique. 48 00:02:22.319 --> 00:02:26.120 They proved that these simple perceptrons were mathematically incapable of 49 00:02:26.159 --> 00:02:27.599 solving nonlinear problems. 50 00:02:27.879 --> 00:02:30.599 Right, and the famous example from the text is the 51 00:02:30.759 --> 00:02:34.000 XOR problem, the exclusive or. Yes, exactly. 52 00:02:34.080 --> 00:02:37.240 It's an incredibly basic logical operation. Basically, if you have 53 00:02:37.319 --> 00:02:39.879 two inputs, the output is true if one and only 54 00:02:39.919 --> 00:02:41.199 one of the inputs is true. 55 00:02:41.280 --> 00:02:43.680 Like, a child could grasp this intuitively. Totally. 56 00:02:44.039 --> 00:02:46.039 But if you try to draw a single straight line 57 00:02:46.039 --> 00:02:48.240 on a graph to separate the true results from the 58 00:02:48.240 --> 00:02:50.919 false results in an XOR problem, you just can't 59 00:02:50.759 --> 00:02:52.840 do it. Because it's nonlinear, right. 60 00:02:52.919 --> 00:02:55.439 And since a single-layer perceptron could only draw one 61 00:02:55.479 --> 00:02:59.400 straight line, it completely failed. That mathematical proof was so 62 00:02:59.479 --> 00:03:03.960 devastating that, well, funding completely dried up. People basically abandoned 63 00:03:04.000 --> 00:03:06.039 the dream of neural networks for decades. 64 00:03:06.400 --> 00:03:09.159 But the text highlights the year two thousand and six 65 00:03:09.280 --> 00:03:13.599 as the major thaw to this AI winter, right, driven 66 00:03:13.639 --> 00:03:16.199 by Geoffrey Hinton and his work on deep belief networks 67 00:03:16.280 --> 00:03:19.159 or DBNs and restricted Boltzmann machines.
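A note on the XOR discussion above: the claim that no single straight line separates the XOR outputs can be checked by brute force. The following is a minimal sketch, not anything from the book. The names `perceptron`, `find_line`, and `GRID`, and the choice of search grid, are assumptions made for illustration. It tries many weight settings for one threshold unit and reports whether any of them reproduces each gate.

```python
from itertools import product

# Truth tables for two-input gates, in the order of INPUTS below.
INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
GATES = {
    "AND": [0, 0, 0, 1],
    "OR": [0, 1, 1, 1],
    "XOR": [0, 1, 1, 0],
}

def perceptron(w1, w2, b, x1, x2):
    """Single threshold unit: outputs 1 when the weighted sum is positive."""
    return 1 if w1 * x1 + w2 * x2 + b > 0 else 0

def find_line(targets, grid):
    """Return the first (w1, w2, b) on the grid that matches targets, else None."""
    for w1, w2, b in product(grid, repeat=3):
        outputs = [perceptron(w1, w2, b, x1, x2) for x1, x2 in INPUTS]
        if outputs == targets:
            return (w1, w2, b)
    return None

# Illustrative search grid: -2.0 to 2.0 in steps of 0.25 (an arbitrary choice).
GRID = [i / 4 for i in range(-8, 9)]

for name, targets in GATES.items():
    found = find_line(targets, GRID)
    print(name, "separable" if found is not None else "no line found")
```

The grid search only demonstrates the failure for the weights it tries. The general impossibility follows from the four inequalities XOR would require, which contradict each other for any choice of weights.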
68 00:03:19.280 --> 00:03:21.400 Yeah, Hinton was a massive turning point. 69 00:03:21.520 --> 00:03:23.280 Oh wait, here's where I want to push back on 70 00:03:23.319 --> 00:03:26.800 the timeline a bit, because the underlying math to fix 71 00:03:26.879 --> 00:03:31.280 that single-layer problem, specifically adding multiple layers and using 72 00:03:31.280 --> 00:03:34.319 an algorithm called backpropagation, that was popularized in the 73 00:03:34.400 --> 00:03:35.199 nineteen eighties. 74 00:03:35.439 --> 00:03:37.120 That's true, the math was there. 75 00:03:36.960 --> 00:03:39.560 And backpropagation is where the network makes a guess, 76 00:03:39.960 --> 00:03:43.000 calculates how wrong its guess was, and then sends a 77 00:03:43.000 --> 00:03:46.680 correction signal backward through its layers to adjust the mathematical weights. 78 00:03:46.800 --> 00:03:47.800 Right, it updates itself. 79 00:03:47.840 --> 00:03:49.879 So if we had that in the eighties, why did 80 00:03:49.879 --> 00:03:52.000 it take until two thousand and six to actually make 81 00:03:52.039 --> 00:03:54.759 these networks deep? Like, why couldn't they just stack a 82 00:03:54.759 --> 00:03:58.039 bunch of layers together back then and let backpropagation 83 00:03:58.080 --> 00:03:58.599 do its thing? 84 00:03:58.719 --> 00:04:02.000 Well, what's fascinating here is the physical limitation of how 85 00:04:02.039 --> 00:04:05.199 that error signal travels. When you tried to stack many 86 00:04:05.240 --> 00:04:08.639 layers to make the network genuinely deep, you ran straight 87 00:04:08.680 --> 00:04:10.199 into the vanishing gradient problem. 88 00:04:10.400 --> 00:04:14.360 Oh, I picture this like a massive game of telephone 89 00:04:14.439 --> 00:04:16.519 played across a packed football stadium. 90 00:04:16.720 --> 00:04:19.240 Yes, that is a perfect way to visualize it.
Because 91 00:04:19.279 --> 00:04:23.560 backpropagation relies on gradients, which are essentially, you know, mathematical 92 00:04:23.600 --> 00:04:26.399 slopes that tell the network how to adjust its weights. 93 00:04:26.639 --> 00:04:28.480 Okay, the instructions for changing. Right. 94 00:04:28.959 --> 00:04:31.959 But as that correction signal passes backward through each layer, 95 00:04:32.199 --> 00:04:35.519 it gets multiplied by numbers smaller than one, so it shrinks. 96 00:04:35.759 --> 00:04:38.120 If you multiply zero point one by zero point one, 97 00:04:38.160 --> 00:04:41.160 you get zero point zero one. Exactly. 98 00:04:41.279 --> 00:04:44.680 Do that ten times and the number becomes microscopically small. 99 00:04:45.160 --> 00:04:48.360 By the time that signal reaches the earliest bottom layers 100 00:04:48.399 --> 00:04:51.920 of the network, like the foundation, the gradient has basically vanished. 101 00:04:52.000 --> 00:04:53.600 It's practically zero. Right, zero. 102 00:04:53.680 --> 00:04:56.439 Yeah, so the lower layers never get the message to update, 103 00:04:56.600 --> 00:04:59.720 which means the network never learns the fundamental building blocks 104 00:04:59.720 --> 00:05:00.279 of the 105 00:05:00.120 --> 00:05:02.839 data. Like trying to memorize a textbook without knowing the 106 00:05:02.839 --> 00:05:04.120 alphabet. Precisely. 107 00:05:04.759 --> 00:05:07.920 So, to fix the telephone game, Hinton had to completely 108 00:05:08.040 --> 00:05:11.720 change how those earliest layers learned. He introduced something called 109 00:05:11.759 --> 00:05:14.000 contrastive divergence. And from 110 00:05:13.839 --> 00:05:16.959 what I understand, this basically allowed the network to learn 111 00:05:17.560 --> 00:05:21.240 without needing a human to provide a perfectly labeled answer 112 00:05:21.319 --> 00:05:22.920 key right away. Exactly.
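A note on the shrinking-signal arithmetic above: the "multiply zero point one by zero point one" example can be run directly. This is a minimal sketch that assumes every layer scales the backward signal by the same factor, which real networks do not do exactly. The function name `backpropagated_signal` is made up for illustration.

```python
def backpropagated_signal(per_layer_factor, num_layers, initial=1.0):
    """Track the size of an error signal as it is scaled once per layer."""
    signal = initial
    history = []
    for _ in range(num_layers):
        signal *= per_layer_factor  # each layer multiplies by a number below one
        history.append(signal)
    return history

history = backpropagated_signal(per_layer_factor=0.1, num_layers=10)
for depth, size in enumerate(history, start=1):
    print(f"after {depth:2d} layers: {size:.1e}")
```

After two layers the signal is roughly 0.01, and after ten it is roughly one ten-billionth of its starting size, which is the sense in which the gradient "vanishes" for the earliest layers.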
113 00:05:23.319 --> 00:05:27.360 He realized you can't train a massive deep network all 114 00:05:27.399 --> 00:05:30.959 at once from the top down, so with contrastive divergence, 115 00:05:31.360 --> 00:05:34.439 he trained the network layer by layer from the bottom up. 116 00:05:34.360 --> 00:05:35.639 Using unlabeled data. 117 00:05:36.079 --> 00:05:38.439 Right. He allowed the first layer to just look at 118 00:05:38.439 --> 00:05:41.199 the raw input and try to reconstruct it, learning the 119 00:05:41.199 --> 00:05:44.959 statistical properties of the data autonomously. Oh wow. Yeah. And 120 00:05:45.000 --> 00:05:47.560 once that first layer figured out the basic patterns, its 121 00:05:47.600 --> 00:05:50.120 output became the input for the second layer, and so on. 122 00:05:51.000 --> 00:05:53.519 By the time you apply backpropagation to fine-tune the 123 00:05:53.519 --> 00:05:57.079 whole system, the network already has a massive 124 00:05:56.720 --> 00:05:59.519 head start, because it organically built a rough model of the 125 00:05:59.519 --> 00:06:02.879 world for us. Exactly. Which really brings up a massive 126 00:06:02.920 --> 00:06:06.000 paradigm shift in how we handle data. The text frames 127 00:06:06.079 --> 00:06:08.720 this as escaping the curse of dimensionality. 128 00:06:08.839 --> 00:06:11.720 The curse of dimensionality. Yes, it's a huge 129 00:06:11.480 --> 00:06:14.839 concept. Right, and the source material uses the Iris flower 130 00:06:14.920 --> 00:06:18.240 data set to explain this. So, say you want a 131 00:06:18.279 --> 00:06:23.199 classic traditional machine learning algorithm like naive Bayes to categorize 132 00:06:23.199 --> 00:06:25.920 three different types of iris flowers, you really only need 133 00:06:25.959 --> 00:06:27.639 four measurements, right? Yeah. 134 00:06:27.600 --> 00:06:29.279 Just the length and width of the petal and the 135 00:06:29.360 --> 00:06:30.199 length and width of.
136 00:06:30.079 --> 00:06:35.439 The sepal. Four dimensions. A traditional algorithm handles that flawlessly. 137 00:06:35.160 --> 00:06:38.480 Because it is a beautifully simple, low-dimensional space. 138 00:06:38.680 --> 00:06:41.199 But what happens when you feed a computer an image? 139 00:06:41.319 --> 00:06:46.000 Even a tiny, practically unusable, one-thousand-pixel image has 140 00:06:46.120 --> 00:06:49.839 ten to the power of one thousand possible combinations. 141 00:06:49.279 --> 00:06:51.240 Right, and to give you a sense of scale, that 142 00:06:51.360 --> 00:06:54.279 number is vastly larger than the total number of atoms 143 00:06:54.319 --> 00:06:55.600 in the observable 144 00:06:55.160 --> 00:06:57.199 universe. Which is just mind-blowing. It is. 145 00:06:57.399 --> 00:07:00.319 And when a traditional algorithm looks at a terrifying high 146 00:07:00.360 --> 00:07:03.800 dimensional space like an image, the math just breaks down. 147 00:07:04.199 --> 00:07:07.519 The data points become so sparse and far apart that 148 00:07:07.560 --> 00:07:09.800 the algorithm can't find any meaningful patterns. 149 00:07:10.120 --> 00:07:13.240 So the old way of solving this was feature engineering. 150 00:07:13.639 --> 00:07:17.160 Like, I compare traditional machine learning to a chef who 151 00:07:17.199 --> 00:07:22.079 needs every single ingredient chopped, measured perfectly, and laid out 152 00:07:22.079 --> 00:07:24.439 in little bowls before they can even start cooking. 153 00:07:24.680 --> 00:07:26.959 Right, a human programmer had to step in and pre- 154 00:07:27.079 --> 00:07:30.120 digest the data. They had to explicitly tell the computer, 155 00:07:30.800 --> 00:07:33.319 look at the distance between these two specific pixels. 156 00:07:33.519 --> 00:07:36.680 But deep learning throws feature engineering entirely in the trash.
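A note on the scale comparison above: the "ten to the power of one thousand" figure can be reproduced with Python's arbitrary-precision integers. This sketch rests on two assumptions that are not stated in the transcript: each pixel takes one of ten values, and the observable universe holds roughly 10^80 atoms, a commonly quoted rough estimate.

```python
LEVELS_PER_PIXEL = 10          # assumption: ten possible values per pixel
NUM_PIXELS = 1000              # the tiny image from the discussion
ATOMS_IN_UNIVERSE = 10 ** 80   # assumption: commonly quoted rough estimate

# Every pixel independently picks a level, so the counts multiply.
possible_images = LEVELS_PER_PIXEL ** NUM_PIXELS

# The same binning applied to the four Iris measurements, for contrast.
iris_cells = LEVELS_PER_PIXEL ** 4

print("iris cells:", iris_cells)
print("digits in image count:", len(str(possible_images)))
print("more images than atoms:", possible_images > ATOMS_IN_UNIVERSE)
```

With four measurements there are only ten thousand cells to cover, so a modest data set can populate them. With a thousand pixels almost every cell is empty, which is the sparsity the speakers describe.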
157 00:07:36.839 --> 00:07:40.879 By contrast, deep learning is like throwing whole raw ingredients 158 00:07:40.920 --> 00:07:44.000 into a magical pot that just figures out the recipe itself. 159 00:07:44.079 --> 00:07:47.800 I love that analogy. The network completely removes the need 160 00:07:47.920 --> 00:07:51.639 for human-led feature engineering, and the engine driving that 161 00:07:51.680 --> 00:07:56.399 magical pot is an optimization algorithm called stochastic gradient descent. 162 00:07:56.519 --> 00:07:59.600 Okay, so how does that algorithm actually guide the recipe? 163 00:07:59.720 --> 00:08:00.560 How does it work? 164 00:08:00.639 --> 00:08:03.920 Well, imagine a blindfolded hiker standing somewhere on a massive, 165 00:08:04.000 --> 00:08:04.879 jagged mountain range. 166 00:08:04.959 --> 00:08:06.800 Okay, blindfolded hiker. Right, and 167 00:08:06.800 --> 00:08:09.240 their goal is to get to the lowest possible point 168 00:08:09.240 --> 00:08:12.399 in the valley, which represents the lowest possible error in 169 00:08:12.439 --> 00:08:13.600 the network's predictions. 170 00:08:13.680 --> 00:08:15.560 But they can't see the map. Exactly. 171 00:08:15.639 --> 00:08:18.000 They can't see the whole map, so they just feel 172 00:08:18.040 --> 00:08:21.240 the ground directly beneath their feet, figure out which direction 173 00:08:21.519 --> 00:08:24.759 is the steepest slope downward, and they take a single step. 174 00:08:24.839 --> 00:08:25.600 Oh, I see. 175 00:08:25.720 --> 00:08:29.560 Stochastic gradient descent does this mathematically. It samples a small 176 00:08:29.560 --> 00:08:33.039 batch of data, calculates the slope of the error, and 177 00:08:33.120 --> 00:08:35.519 tweaks the weights in the network to step downhill. 178 00:08:35.720 --> 00:08:36.399 Wow. 179 00:08:37.559 --> 00:08:41.200 Yeah, it maps that terrifying universe of a thousand pixels 180 00:08:41.519 --> 00:08:45.840 into a continuous, low-dimensional manifold.
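A note on the blindfolded-hiker description above: the sample, slope, step loop can be sketched in a few lines. This is a toy illustration, not the book's code. It fits a straight line rather than a neural network, and the function name `sgd_fit_line`, the learning rate, the batch size, and the step count are all assumptions chosen so the example converges.

```python
import random

def sgd_fit_line(points, lr=0.05, batch_size=4, steps=2000, seed=0):
    """Fit y = w * x + b by mini-batch stochastic gradient descent."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0  # the hiker's starting position
    for _ in range(steps):
        # Sample a small batch instead of looking at the whole data set.
        batch = rng.sample(points, batch_size)
        # Slope of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / batch_size
        grad_b = sum(2 * (w * x + b - y) for x, y in batch) / batch_size
        # Take one small step downhill.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy data generated from y = 2x + 1 with no noise.
points = [(i / 10, 2 * (i / 10) + 1) for i in range(-10, 11)]
w, b = sgd_fit_line(points)
print(f"w = {w:.3f}, b = {b:.3f}")
```

Each step only sees four points, so the hiker never gets the whole map, yet the repeated small steps still settle near the slope and intercept that generated the data.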
It essentially folds and 181 00:08:45.919 --> 00:08:49.559 compresses the data into a simpler geometry where the solutions 182 00:08:49.559 --> 00:08:50.159 are easy to 183 00:08:50.120 --> 00:08:53.519 find. So the machine discovers the features entirely on its 184 00:08:53.519 --> 00:08:56.600 own. Exactly. But once researchers realized they could build these 185 00:08:56.639 --> 00:08:59.840 autonomous systems, they figured out that a network designed to 186 00:09:00.000 --> 00:09:03.720 process photographs needs a very different architecture than a network 187 00:09:03.759 --> 00:09:05.960 designed to process, say, spoken language. 188 00:09:06.039 --> 00:09:08.240 Right, you need different pots for different recipes. 189 00:09:07.879 --> 00:09:11.080 Which leads to the different flavors of neural networks. Reading 190 00:09:11.080 --> 00:09:13.159 this part of the text honestly feels like wandering through 191 00:09:13.159 --> 00:09:14.159 an architecture zoo. 192 00:09:14.440 --> 00:09:17.919 It really is a highly specialized zoo, because once the 193 00:09:18.000 --> 00:09:21.200 vanishing gradient problem was solved, the whole field just exploded, 194 00:09:21.519 --> 00:09:25.879 and the undisputed kings of image processing are convolutional neural 195 00:09:25.919 --> 00:09:29.840 networks, or CNNs. Okay, CNNs. Yeah. And they achieve this 196 00:09:30.039 --> 00:09:34.200 by directly mimicking the low-level stages of a primate's 197 00:09:34.320 --> 00:09:35.240 visual cortex. 198 00:09:35.480 --> 00:09:38.480 Wait, really? So to avoid looking at a trillion pixels 199 00:09:38.480 --> 00:09:41.600 all at once and getting completely overwhelmed, I'm assuming it 200 00:09:41.639 --> 00:09:44.799 breaks the image down into smaller chunks, kind of like scanning 201 00:09:44.799 --> 00:09:47.799 a document. Exactly that.
It uses a mechanism called a 202 00:09:47.799 --> 00:09:51.440 filter or a kernel, which basically slides across the image, 203 00:09:51.679 --> 00:09:54.120 and it also utilizes something called max pooling. 204 00:09:54.320 --> 00:09:55.679 Max pooling, what does that do? 205 00:09:56.000 --> 00:09:58.960 It takes a small grid of pixels, finds the single most 206 00:09:59.000 --> 00:10:01.879 prominent feature in that grid, and just discards the rest. 207 00:10:02.000 --> 00:10:03.600 So it shrinks the image. Right, it 208 00:10:03.519 --> 00:10:06.399 shrinks it down while keeping the most critical information. And 209 00:10:06.440 --> 00:10:10.200 the absolute brilliance of this is that it creates translation invariance. 210 00:10:10.639 --> 00:10:13.120 Oh, meaning that if I pull out my smartphone camera, 211 00:10:13.559 --> 00:10:16.960 the software draws a yellow focus box around my friend's face, 212 00:10:17.000 --> 00:10:19.320 whether she is standing dead center in the frame or, 213 00:10:19.360 --> 00:10:20.960 you know, off to the bottom left corner. 214 00:10:21.200 --> 00:10:24.600 Precisely, the CNN doesn't have to relearn what a face 215 00:10:24.679 --> 00:10:28.080 looks like for every single coordinate on the screen. It 216 00:10:28.120 --> 00:10:30.960 recognizes the pattern regardless of its spatial location. 217 00:10:31.200 --> 00:10:33.000 That's incredibly efficient. It is. 218 00:10:33.320 --> 00:10:36.879 But that reliance on space is exactly why a standard 219 00:10:36.919 --> 00:10:40.159 neural net or a CNN fails when you introduce the 220 00:10:40.159 --> 00:10:40.759 concept of 221 00:10:40.720 --> 00:10:43.360 time. Ah, time, like text or video. 222 00:10:43.519 --> 00:10:46.360 Right, if you are analyzing a video frame by frame 223 00:10:46.679 --> 00:10:49.200 or trying to translate a spoken sentence, the order of 224 00:10:49.240 --> 00:10:52.480 the data matters immensely.
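A note on the max pooling description above: "take a small grid, keep the most prominent value, discard the rest" maps directly onto a short function. This is a minimal sketch using plain nested lists. The name `max_pool_2x2` and the choice of non-overlapping 2x2 blocks are assumptions, since the transcript does not specify a window size.

```python
def max_pool_2x2(image):
    """Shrink a 2D grid by keeping the largest value in each 2x2 block."""
    pooled = []
    for r in range(0, len(image) - 1, 2):
        row = []
        for c in range(0, len(image[0]) - 1, 2):
            block = (image[r][c], image[r][c + 1],
                     image[r + 1][c], image[r + 1][c + 1])
            row.append(max(block))  # keep the strongest response, drop the rest
        pooled.append(row)
    return pooled

# The same bright feature, placed at two nearby positions.
feature_top_left = [
    [9, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
feature_shifted = [
    [0, 0, 0, 0],
    [0, 9, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

print(max_pool_2x2(feature_top_left))
print(max_pool_2x2(feature_shifted))
```

Both inputs pool to the same 2x2 grid, which shows the tolerance to small shifts. In a full CNN, recognizing a face anywhere in the frame also depends on the same filter being slid across every position, so pooling on its own is only part of the story.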
Traditional neural nets try to project 225 00:10:52.480 --> 00:10:56.039 time as space, which completely fails to capture context. 226 00:10:55.559 --> 00:10:58.279 Which is where recurrent neural networks, or RNNs, come in, 227 00:10:58.399 --> 00:11:01.559 and specifically, the text highlights a massive architectural leap 228 00:11:01.559 --> 00:11:05.039 in nineteen ninety seven by researchers Hochreiter and Schmidhuber. 229 00:11:05.200 --> 00:11:09.039 Yes, they introduced long short-term memory units, or LSTMs. 230 00:11:09.360 --> 00:11:12.519 Here's where it gets really interesting, because the text points 231 00:11:12.519 --> 00:11:16.159 out an incredible irony here. To make a machine possess 232 00:11:16.159 --> 00:11:19.919 a better memory for long sequences of data, scientists actually 233 00:11:19.960 --> 00:11:23.279 had to build in a specific mathematical mechanism to force 234 00:11:23.320 --> 00:11:24.039 it to forget. 235 00:11:24.320 --> 00:11:25.080 The forget gate. 236 00:11:25.200 --> 00:11:28.720 Yeah, the forget gate. Why is the ability to selectively 237 00:11:28.799 --> 00:11:31.240 forget so crucial to artificial memory? 238 00:11:31.600 --> 00:11:34.960 Well, think about the maze analogy from the text. Imagine 239 00:11:35.000 --> 00:11:38.679 a robot trying to navigate a complex maze where every 240 00:11:38.720 --> 00:11:41.159 single T-junction looks completely identical. 241 00:11:41.320 --> 00:11:42.679 Okay, that sounds like a nightmare. 242 00:11:43.000 --> 00:11:45.480 It is. And if the robot only looks at its 243 00:11:45.480 --> 00:11:48.840 present state, it will wander in circles forever. This is 244 00:11:48.840 --> 00:11:50.919 what we call a non-Markovian task. 245 00:11:50.799 --> 00:11:53.000 A task where the present moment doesn't give you enough 246 00:11:53.000 --> 00:11:56.320 information to solve the puzzle. You need your history, right.
247 00:11:56.600 --> 00:11:59.600 But if the network simply tries to remember every single 248 00:11:59.639 --> 00:12:03.559 micro-movement it ever made, the mathematical gradients will explode 249 00:12:03.600 --> 00:12:05.960 or vanish again, completely paralyzing the system. 250 00:12:06.120 --> 00:12:09.159 Oh, the vanishing gradient strikes again. Exactly. 251 00:12:08.759 --> 00:12:11.440 So the forget gate in an LSTM allows the network 252 00:12:11.440 --> 00:12:15.879 to evaluate its internal state and intentionally discard useless information. 253 00:12:16.360 --> 00:12:18.840 It basically says, the color of the wall five turns 254 00:12:18.879 --> 00:12:21.399 ago doesn't matter, dump it. But the fact that I 255 00:12:21.399 --> 00:12:23.360 took three left turns does matter, 256 00:12:23.720 --> 00:12:26.200 keep it. Which is exactly what is happening when you, 257 00:12:26.279 --> 00:12:28.799 like, type a text message on your phone. The predictive 258 00:12:28.799 --> 00:12:31.240 text keyboard isn't just looking at the last word you typed. 259 00:12:31.519 --> 00:12:34.279 An LSTM is remembering the context from the beginning of 260 00:12:34.320 --> 00:12:38.120 your sentence, selectively forgetting the filler words, and predicting the 261 00:12:38.159 --> 00:12:39.759 most logical next word. 262 00:12:40.200 --> 00:12:43.279 Exactly. The machine is holding onto the crucial sequence while 263 00:12:43.320 --> 00:12:45.879 ignoring the irrelevant static of the journey. 264 00:12:46.120 --> 00:12:48.799 So we have architectures that can see like primates, and 265 00:12:48.960 --> 00:12:52.720 architectures that can navigate time and memory, but analyzing data 266 00:12:52.759 --> 00:12:55.159 is one thing. The text takes a wild turn when 267 00:12:55.200 --> 00:12:58.039 it explores what happens when these architectures are turned inside 268 00:12:58.080 --> 00:12:59.480 out to actually create data.
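A note on the forget gate described above: in the usual LSTM formulation the cell state is updated as "forget gate times old memory, plus input gate times new candidate". The sketch below shows that one update for a single scalar memory. It is heavily simplified. In a real LSTM the gate values come from learned weights applied to the current input and the previous hidden state, whereas here the gate logits are passed in by hand, and the function names are made up for illustration.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def update_cell_state(prev_cell, forget_logit, input_logit, candidate):
    """One scalar LSTM-style memory update.

    forget_logit and input_logit are supplied by hand here (an assumption);
    a trained network would compute them from its inputs.
    """
    forget_gate = sigmoid(forget_logit)  # near 0 means dump it, near 1 means keep it
    input_gate = sigmoid(input_logit)    # how much new information to write
    return forget_gate * prev_cell + input_gate * math.tanh(candidate)

memory = 3.0  # for example, "I took three left turns"

kept = update_cell_state(memory, forget_logit=6.0, input_logit=-6.0, candidate=0.0)
dumped = update_cell_state(memory, forget_logit=-6.0, input_logit=-6.0, candidate=0.0)

print(f"gate open:   {kept:.3f}")
print(f"gate closed: {dumped:.3f}")
```

With the gate open the stored value survives almost untouched, and with it closed the value is nearly wiped, which is the keep-it versus dump-it decision from the maze example.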
269 00:12:59.600 --> 00:13:03.600 Yes, the shift from discriminative models, which just categorize things, 270 00:13:03.679 --> 00:13:07.360 to generative models. And the standout breakthrough here was in 271 00:13:07.399 --> 00:13:10.679 twenty fourteen by Ian Goodfellow, who introduced 272 00:13:10.279 --> 00:13:14.960 GANs, generative adversarial networks. The adversarial part is definitely my 273 00:13:15.000 --> 00:13:16.200 favorite concept in the book. 274 00:13:16.240 --> 00:13:17.519 It's a really elegant solution. 275 00:13:17.720 --> 00:13:19.720 I liken a GAN to an art forger and 276 00:13:19.759 --> 00:13:22.919 a detective locked in a room together. The forger, which 277 00:13:22.960 --> 00:13:26.600 is the generator network, paints a fake masterpiece and slips 278 00:13:26.600 --> 00:13:29.879 it under the door. The detective, the discriminator network, looks 279 00:13:29.919 --> 00:13:32.799 at it, spots the flaws, and tells the forger exactly 280 00:13:32.799 --> 00:13:33.600 how they messed up. 281 00:13:33.679 --> 00:13:35.120 Right, they play a game against each 282 00:13:35.000 --> 00:13:38.960 other. So the forger tries again and again. Over millions 283 00:13:38.960 --> 00:13:44.159 of iterations, the forger's technique becomes so flawlessly mathematical that 284 00:13:44.240 --> 00:13:47.279 the detective literally can no longer tell the difference between 285 00:13:47.279 --> 00:13:51.559 the generated fake and reality. Your digital forger is suddenly 286 00:13:51.600 --> 00:13:52.720 painting like Da Vinci. 287 00:13:53.159 --> 00:13:55.639 Pitting two neural networks against each other in a zero 288 00:13:55.679 --> 00:13:59.480 sum game is just a brilliant mechanic, and the applications 289 00:13:59.519 --> 00:14:03.240 highlighted in the text, I mean, they're staggering.
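A note on the forger-and-detective game above: the score the two networks fight over can be written down without training anything. This sketch assumes the standard GAN objective, where the detective tries to maximize the average of log D(real) plus log(1 - D(fake)) and the forger tries to minimize it. The function name `discriminator_value` and the hand-picked scores are illustrative assumptions, not outputs of a real model.

```python
import math

def discriminator_value(real_scores, fake_scores):
    """Average log D(real) plus average log(1 - D(fake)).

    Each score is the detective's estimated probability that an item is real.
    """
    real_term = sum(math.log(s) for s in real_scores) / len(real_scores)
    fake_term = sum(math.log(1.0 - s) for s in fake_scores) / len(fake_scores)
    return real_term + fake_term

# Early in training: the detective is confident and correct.
sharp_detective = discriminator_value(real_scores=[0.99, 0.98],
                                      fake_scores=[0.01, 0.02])

# Late in training: fakes are indistinguishable, so every guess is a coin flip.
fooled_detective = discriminator_value(real_scores=[0.5, 0.5],
                                       fake_scores=[0.5, 0.5])

print(f"sharp detective:  {sharp_detective:.3f}")
print(f"fooled detective: {fooled_detective:.3f}")
```

A detective reduced to coin flips scores minus log 4, about -1.386, which corresponds to the point in the story where the fake can no longer be told apart from reality.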
Like what? Well, 290 00:14:03.360 --> 00:14:06.080 GANs and deep autoencoders are used to generate those 291 00:14:06.399 --> 00:14:09.519 photorealistic human faces we talked about at the start. They 292 00:14:09.519 --> 00:14:13.399 are automatically colorizing black and white manga comics. They're even 293 00:14:13.440 --> 00:14:17.559 being used to generate functional synthetic DNA sequences for medical research. 294 00:14:17.960 --> 00:14:21.279 Synthetic DNA, that is unreal. It is. And 295 00:14:21.159 --> 00:14:24.559 if we connect this to the bigger picture, generative models 296 00:14:24.639 --> 00:14:28.679 represent a profound evolution in how we view machine intelligence. 297 00:14:28.879 --> 00:14:33.799 How so? When a network can generate realistic, entirely new data, 298 00:14:34.080 --> 00:14:38.440 it proves that it genuinely understands the underlying latent structure 299 00:14:38.519 --> 00:14:41.840 of our world. It forms a rich internal imagery. 300 00:14:42.000 --> 00:14:43.879 So it's not just checking boxes anymore. 301 00:14:43.919 --> 00:14:47.720 Exactly. It isn't just classifying pixels. It is empowered to reason, 302 00:14:48.039 --> 00:14:52.360 to explore infinite variations, and to make complex decisions without 303 00:14:52.360 --> 00:14:55.600 a human explicitly hard-coding a strict loss function. 304 00:14:55.399 --> 00:14:58.279 Which brings us crashing into the real-world implications of 305 00:14:58.320 --> 00:15:02.120 this technology. These architectures aren't just academic thought experiments anymore. 306 00:15:02.120 --> 00:15:05.799 They are actively driving a massive, multi-billion-dollar shift 307 00:15:05.840 --> 00:15:06.759 in global business. 308 00:15:06.759 --> 00:15:09.480 Oh, absolutely. The business reality is moving faster than most 309 00:15:09.480 --> 00:15:10.320 people realize.
310 00:15:10.360 --> 00:15:13.399 The text cites a very stark twenty thirteen study by 311 00:15:13.399 --> 00:15:16.360 Frey and Osborne warning that forty seven percent of US 312 00:15:16.440 --> 00:15:18.879 jobs are at risk of automation. I mean, they say 313 00:15:18.919 --> 00:15:21.639 this is impacting society at three hundred times the scale 314 00:15:21.679 --> 00:15:22.840 of the Industrial Revolution. 315 00:15:23.159 --> 00:15:26.600 And we are seeing the deployments right now. Like, Waymo 316 00:15:26.799 --> 00:15:30.440 is taking human operators out of vehicles, trusting these networks 317 00:15:30.440 --> 00:15:33.840 to process visual data in real time at highway 318 00:15:33.559 --> 00:15:35.320 speeds. And playing games too, right? 319 00:15:35.440 --> 00:15:38.919 Yeah, an AI system called Libratus beat the top human 320 00:15:38.960 --> 00:15:41.840 poker players in the world at Texas hold 'em, and 321 00:15:41.879 --> 00:15:44.360 that is a game of bluffing and hidden information, not 322 00:15:44.399 --> 00:15:48.840 just pure math. Plus, deep learning is fundamentally overhauling automated 323 00:15:48.879 --> 00:15:53.240 medical imagery, spotting tumors in radiology scans that humans might miss. 324 00:15:53.600 --> 00:15:56.399 But fueling all of this requires an astonishing amount of 325 00:15:56.399 --> 00:16:00.039 computing power. The text refers to this era as the 326 00:16:00.080 --> 00:16:03.440 cloud wars, because let's be real, you absolutely cannot run 327 00:16:03.519 --> 00:16:07.480 a multi-layer generative adversarial network on your standard office laptop. 328 00:16:07.600 --> 00:16:10.480 No, you really can't. The hardware bottleneck is immense. Training 329 00:16:10.519 --> 00:16:13.759 these models requires monstrous data sets and specialized computing power, 330 00:16:13.919 --> 00:16:18.960 specifically GPUs and FPGAs. FPGAs? Field-programmable gate arrays.
These 331 00:16:18.960 --> 00:16:22.159 are chips that can perform thousands of calculations simultaneously. But 332 00:16:22.200 --> 00:16:25.080 because most companies cannot afford to build their own supercomputers, 333 00:16:25.240 --> 00:16:27.600 deep learning is rapidly shifting to an AI as a 334 00:16:27.639 --> 00:16:28.639 service model. 335 00:16:28.519 --> 00:16:31.080 So everyone is just renting server space. Exactly. 336 00:16:31.360 --> 00:16:35.840 You have tech titans like Amazon Web Services, Google Cloud, IBM Watson, 337 00:16:35.879 --> 00:16:39.279 and Microsoft Azure locked in a brutal fight for dominance 338 00:16:39.519 --> 00:16:41.840 over this cloud infrastructure. 339 00:16:41.320 --> 00:16:45.440 And their strategy to win that war is fascinating. They 340 00:16:45.480 --> 00:16:49.240 are taking their most valuable cutting-edge software frameworks, like 341 00:16:49.320 --> 00:16:55.039 Google's TensorFlow or the Keras library, and they're completely open 342 00:16:55.080 --> 00:16:58.360 sourcing them. They are just giving the blueprints away for free. 343 00:16:58.440 --> 00:17:01.120 Well, it is the ultimate trap. By giving away the 344 00:17:01.200 --> 00:17:04.599 software framework for free, developers learn to build their neural 345 00:17:04.640 --> 00:17:09.039 networks using Google-specific code or Amazon-specific code. 346 00:17:09.160 --> 00:17:11.400 Right, they get used to the ecosystem. 347 00:17:10.839 --> 00:17:13.519 And once the developer is locked into that software ecosystem, 348 00:17:13.720 --> 00:17:16.640 they eventually realize they need massive hardware to actually run 349 00:17:16.680 --> 00:17:19.079 the models. And who do they rent that hardware 350 00:17:19.119 --> 00:17:21.759 from? The very same cloud provider who gave them the 351 00:17:21.799 --> 00:17:22.839 software. Exactly. 352 00:17:22.839 --> 00:17:26.759 It guarantees massive recurring cloud computing revenue.
353 00:17:26.920 --> 00:17:29.559 So what does this all mean for us? We have 354 00:17:29.720 --> 00:17:33.799 massive corporate monopolies fighting over compute power, algorithms that can 355 00:17:33.839 --> 00:17:38.440 imagine synthetic DNA, and autonomous systems driving our cars. The 356 00:17:38.519 --> 00:17:42.839 text highlights one primary glaring vulnerability to all of this, 357 00:17:43.599 --> 00:17:44.960 the black box dilemma. 358 00:17:45.160 --> 00:17:46.359 Yes, the black box. 359 00:17:46.440 --> 00:17:50.000 If an AI diagnoses you with a terminal illness, or 360 00:17:50.039 --> 00:17:53.880 a self-driving car swerves into a barrier, how dangerous 361 00:17:53.920 --> 00:17:56.480 is it that we can't easily interpret how the neural 362 00:17:56.519 --> 00:17:58.000 network arrived at its decision? 363 00:17:58.240 --> 00:18:00.880 The black box nature of deep learning is arguably its 364 00:18:00.880 --> 00:18:04.519 most dangerous flaw. Because the network learns by mapping incredibly 365 00:18:04.559 --> 00:18:08.519 high-dimensional spaces and extracting millions of its own latent features 366 00:18:08.519 --> 00:18:11.519 in that metaphorical magical pot we talked about, you can't 367 00:18:11.559 --> 00:18:13.160 just pop the hood and read a clean line of 368 00:18:13.160 --> 00:18:14.920 code that says if X, then Y. 369 00:18:15.200 --> 00:18:16.319 It's completely opaque. 370 00:18:16.359 --> 00:18:20.319 It completely lacks the transparency of a traditional decision tree. Furthermore, 371 00:18:20.319 --> 00:18:22.680 these networks suffer from the knowledge persistence problem. 372 00:18:22.799 --> 00:18:23.400 What does that mean? 373 00:18:23.680 --> 00:18:26.000 Well, if you train a network to diagnose lung X 374 00:18:26.119 --> 00:18:28.519 rays and then ask it to look at a bone fracture, 375 00:18:28.640 --> 00:18:30.880 it often forgets everything. It has to be trained entirely 376 00:18:30.920 --> 00:18:31.519 from scratch.
377 00:18:31.799 --> 00:18:34.799 It can't easily transfer its worldview the way a human 378 00:18:34.839 --> 00:18:35.519 doctor could. 379 00:18:35.680 --> 00:18:39.400 Right, it's very sensitive to its initialization. But the text 380 00:18:39.480 --> 00:18:44.160 notes that researchers aren't just accepting the black box. They 381 00:18:44.160 --> 00:18:46.799 are building tools to shine a light inside it. 382 00:18:46.960 --> 00:18:48.039 Like what kind of tools? 383 00:18:48.079 --> 00:18:52.039 There is a massive push for interpretability. Methods like PatternNet 384 00:18:52.119 --> 00:18:55.440 are being developed specifically to trace the decision pathway backward. 385 00:18:56.319 --> 00:19:00.000 So if the network identifies a tumor, PatternNet tries to 386 00:19:00.240 --> 00:19:04.319 highlight exactly which specific pixels in the original image activated 387 00:19:04.319 --> 00:19:06.319 the neurons that led to that conclusion. 388 00:19:06.440 --> 00:19:09.400 Oh, so it reverse engineers the network's logic. Exactly. 389 00:19:09.720 --> 00:19:13.359 We're also seeing significant breakthroughs in transfer learning, where a 390 00:19:13.400 --> 00:19:16.519 model trained on one massive data set can freeze its 391 00:19:16.559 --> 00:19:20.160 foundational layers, the layers that recognize basic edges and shapes, 392 00:19:20.160 --> 00:19:23.160 and carry that knowledge over to a smaller, entirely different 393 00:19:23.200 --> 00:19:23.599 data set. 394 00:19:23.640 --> 00:19:26.039 That sounds promising. It is, but 395 00:19:26.000 --> 00:19:29.720 this raises an important question. As we develop techniques like 396 00:19:29.799 --> 00:19:34.279 OpenAI's evolution strategies to train networks without relying entirely 397 00:19:34.319 --> 00:19:37.559 on traditional gradients, we are desperately trying to align the 398 00:19:37.599 --> 00:19:41.240 machine's reasoning with human reasoning.
We want them to explain 399 00:19:41.279 --> 00:19:45.400 themselves to us, but their fundamental architecture, the way they 400 00:19:45.440 --> 00:19:50.599 perceive thousands of dimensions simultaneously, it remains profoundly alien. 401 00:19:50.799 --> 00:19:53.640