WEBVTT 1 00:00:01.199 --> 00:00:06.200 Welcome to the Sentient Code, where intelligence is engineered, autonomy 2 00:00:06.280 --> 00:00:10.439 is emerging, and the line between human and machine grows thinner. 3 00:00:10.800 --> 00:00:15.359 Each episode, we decode the algorithms, explore the robotics, and 4 00:00:15.439 --> 00:00:19.000 examine the ideas shaping the future of artificial minds. 5 00:00:23.879 --> 00:00:25.079 Welcome back to the Deep Dive. 6 00:00:25.440 --> 00:00:25.800 You know, 7 00:00:27.280 --> 00:00:30.719 we spend so much time obsessing over the software side 8 00:00:30.719 --> 00:00:31.120 of AI. 9 00:00:31.239 --> 00:00:33.560 Oh, we absolutely do. It is all anyone talks about, 10 00:00:33.640 --> 00:00:34.359 right? We look 11 00:00:34.240 --> 00:00:37.640 at the chatbots and the image generators, the flashy demos, 12 00:00:37.679 --> 00:00:40.320 the code. It is always, look what this new model 13 00:00:40.359 --> 00:00:46.399 can do. But today I want to physically unplug all 14 00:00:46.439 --> 00:00:46.640 of that. 15 00:00:46.840 --> 00:00:48.200 If we are going down to the basement. 16 00:00:48.280 --> 00:00:50.200 We are going to the basement. We are stepping away 17 00:00:50.200 --> 00:00:52.439 from the cloud and we're going to talk about the 18 00:00:52.520 --> 00:00:56.200 actual physical machine that makes the cloud exist. We are 19 00:00:56.240 --> 00:00:57.240 talking about the iron. 20 00:00:57.399 --> 00:00:59.600 The iron. You know, it is funny you use that 21 00:00:59.719 --> 00:01:03.159 term because for the first, really the first fifty years 22 00:01:03.200 --> 00:01:06.879 of computing history, software people looked down on hardware people. 23 00:01:06.920 --> 00:01:08.239 It was just plumbing to them. 24 00:01:08.400 --> 00:01:11.079 Exactly. Hardware was a commodity. It was dirt. It was 25 00:01:11.200 --> 00:01:13.480 just the physical thing you ran your brilliant code on.
26 00:01:13.879 --> 00:01:16.400 And if your code was running slow, you didn't rewrite it. 27 00:01:16.480 --> 00:01:18.560 You just waited two years for Intel to make a 28 00:01:18.599 --> 00:01:21.280 faster chip, and your problem was magically solved. 29 00:01:21.519 --> 00:01:24.280 Moore's law was basically a free lunch for developers. 30 00:01:24.560 --> 00:01:27.319 It was a free lunch. But that free lunch is over. 31 00:01:27.879 --> 00:01:30.719 The script is completely flipped. Now we have moved from 32 00:01:30.760 --> 00:01:34.680 this era of code dominance to an era of compute dominance. 33 00:01:34.840 --> 00:01:38.040 Because in the AI world today, the smartest algorithm doesn't 34 00:01:38.040 --> 00:01:39.200 necessarily win anymore. 35 00:01:39.439 --> 00:01:42.280 No, it doesn't. The team with the biggest, most specialized 36 00:01:42.280 --> 00:01:44.439 pile of silicon wins, period. 37 00:01:44.560 --> 00:01:46.840 And we are calling this the compute race. I think 38 00:01:46.959 --> 00:01:49.359 what is so surprising to people tuning into this is 39 00:01:49.400 --> 00:01:52.120 that this isn't just Apple versus Microsoft anymore. This is 40 00:01:52.519 --> 00:01:56.159 arguably the central geopolitical conflict of the twenty twenties. 41 00:01:56.400 --> 00:02:00.079 It is the absolute bottleneck of the modern world. You 42 00:02:00.120 --> 00:02:03.359 look at the supply chain for advanced AI chips, it 43 00:02:03.439 --> 00:02:06.560 is terrifyingly concentrated. Right? We are talking about a single 44 00:02:06.599 --> 00:02:10.000 company in the Netherlands that makes the manufacturing machines, a 45 00:02:10.039 --> 00:02:13.120 single island, Taiwan, that actually manufactures the chips, and a 46 00:02:13.159 --> 00:02:15.400 handful of US companies that design them. And if 47 00:02:15.280 --> 00:02:17.639 any single link in that chain breaks, the 48 00:02:17.599 --> 00:02:20.080 AI revolution doesn't just slow down, it stops entirely.
49 00:02:20.439 --> 00:02:23.439 So today we are going to trace that chain. We 50 00:02:23.479 --> 00:02:26.439 are going to figure out how a piece of hardware 51 00:02:26.520 --> 00:02:29.680 designed to let teenagers play Call of Duty somehow became 52 00:02:29.719 --> 00:02:31.360 the brain of modern civilization. 53 00:02:31.759 --> 00:02:33.120 It is an incredible pivot. And 54 00:02:33.080 --> 00:02:35.560 we are going to look at why Google panic-built 55 00:02:35.599 --> 00:02:38.680 their own secret chip factory, and we will get into 56 00:02:38.680 --> 00:02:41.479 the physical limits too, because apparently we are literally running 57 00:02:41.479 --> 00:02:42.599 out of atoms to work with. 58 00:02:42.879 --> 00:02:44.840 It is a story about physics, it is a story 59 00:02:44.879 --> 00:02:48.280 about economics, and ultimately it is a story about war. 60 00:02:49.080 --> 00:02:51.599 Let us start with the origin story, because this is 61 00:02:51.639 --> 00:02:54.000 the part that always feels like a massive accident of 62 00:02:54.080 --> 00:02:56.719 history to me. If you look at the trillion dollar 63 00:02:56.759 --> 00:02:59.919 club today, Nvidia is sitting right there at the top. 64 00:03:00.599 --> 00:03:03.039 But twenty years ago they were not trying to build 65 00:03:03.159 --> 00:03:04.639 artificial intelligence. 66 00:03:04.199 --> 00:03:07.240 Not even close. If you walked into Nvidia headquarters in, 67 00:03:07.319 --> 00:03:10.120 say, nineteen ninety nine or two thousand and five, they 68 00:03:10.120 --> 00:03:13.680 were obsessed with one single thing: polygons. Video games, specifically 69 00:03:13.759 --> 00:03:17.280 rendering 3D graphics. They wanted to make better explosions, 70 00:03:17.280 --> 00:03:20.719 realistic textures, dynamic lighting, shadows. And to understand why that 71 00:03:20.759 --> 00:03:22.560 matters for AI, we really have to get a little 72 00:03:22.639 --> 00:03:23.240 technical here.
73 00:03:23.280 --> 00:03:25.759 Okay, let's break it down. We need to distinguish between 74 00:03:25.800 --> 00:03:29.319 the chip in your standard laptop, which is the CPU, 75 00:03:29.840 --> 00:03:32.879 and the chip in a gaming card, the GPU. Because 76 00:03:32.919 --> 00:03:35.080 I think most people hear the word processor and they 77 00:03:35.120 --> 00:03:38.199 just picture a little black square. What is the actual 78 00:03:38.319 --> 00:03:40.639 architectural difference inside that square? 79 00:03:40.840 --> 00:03:45.319 Okay, let's unpack this. Imagine a CPU, the central processing unit. 80 00:03:45.400 --> 00:03:48.039 This is your Intel Core i9 or your AMD Ryzen. 81 00:03:48.719 --> 00:03:52.639 A CPU is like a team of incredibly smart mathematicians. 82 00:03:53.159 --> 00:03:55.879 Let us say a tight-knit team of twelve geniuses. 83 00:03:56.280 --> 00:03:58.240 Small team, very high IQ. 84 00:03:58.159 --> 00:04:00.960 Extremely high IQ, and highly versatile. If you give 85 00:04:01.000 --> 00:04:04.000 one of these CPU cores a really complex problem, something 86 00:04:04.080 --> 00:04:07.479 like run this operating system, then open Excel, then calculate 87 00:04:07.520 --> 00:04:10.479 this complex formula, then check for incoming email, it can 88 00:04:10.520 --> 00:04:12.599 handle that context switching perfectly. 89 00:04:12.199 --> 00:04:14.479 Because it is designed for serial processing. 90 00:04:14.080 --> 00:04:16.800 Exactly. Step A, then step B, then step C. It 91 00:04:16.839 --> 00:04:19.519 has massive amounts of complex logic built in just to 92 00:04:19.560 --> 00:04:22.199 handle branching paths, like if this happens, then do that. 93 00:04:22.279 --> 00:04:24.920 So a CPU is optimized for logic and sequence. 94 00:04:24.879 --> 00:04:28.199 Correct. Now look at a GPU, the graphics processing unit. 95 00:04:28.639 --> 00:04:31.519 A GPU is not a team of twelve geniuses.
It 96 00:04:31.600 --> 00:04:35.160 is a stadium filled with ten thousand average high school students. 97 00:04:35.240 --> 00:04:36.920 Okay, I like where this analogy is going. 98 00:04:37.079 --> 00:04:40.399 Individually, those students aren't that smart. They cannot run a 99 00:04:40.439 --> 00:04:44.160 modern operating system. They will completely freeze up if you 100 00:04:44.199 --> 00:04:46.639 give them complex branching logic chains. 101 00:04:46.759 --> 00:04:48.360 But they have numbers on their side. 102 00:04:48.480 --> 00:04:51.040 Exactly. If you give them a task that is simple 103 00:04:51.079 --> 00:04:54.120 and repetitive, like take these two numbers and add them together, 104 00:04:54.240 --> 00:04:55.959 and you tell all ten thousand of them to do 105 00:04:56.000 --> 00:04:57.120 it at the exact same 106 00:04:56.959 --> 00:04:59.160 time, they will completely obliterate the CPU. 107 00:04:59.279 --> 00:05:00.560 They will leave it in the dust. 108 00:05:00.720 --> 00:05:03.800 And this is the core concept of parallelism. 109 00:05:03.120 --> 00:05:07.360 Specifically data parallelism. And this is exactly where video games 110 00:05:07.399 --> 00:05:10.279 come into the picture. Think about your computer screen right now. 111 00:05:10.519 --> 00:05:13.120 It is a grid of pixels. A standard monitor is 112 00:05:13.240 --> 00:05:16.480 nineteen twenty by ten eighty, which is roughly two million pixels. 113 00:05:16.959 --> 00:05:19.439 To render just one single frame of a video game, 114 00:05:19.759 --> 00:05:22.720 you need to calculate the exact color for every single 115 00:05:22.720 --> 00:05:25.800 one of those two million pixels based on the virtual lighting, 116 00:05:26.079 --> 00:05:28.600 the texture of the wall, the geometry of the character.
117 00:05:28.839 --> 00:05:31.079 And the crucial part here is that the color of 118 00:05:31.120 --> 00:05:34.079 the pixel in the top left corner generally does not 119 00:05:34.160 --> 00:05:36.120 depend on the color of the pixel in the bottom 120 00:05:36.199 --> 00:05:36.720 right corner. 121 00:05:36.759 --> 00:05:40.439 Precisely, they are mathematically independent. You don't need to calculate 122 00:05:40.439 --> 00:05:42.959 pixel one and then wait to calculate pixel two and 123 00:05:42.959 --> 00:05:45.319 then pixel three. You can calculate all two million of 124 00:05:45.319 --> 00:05:48.800 them simultaneously. Computer scientists actually have a great term for this. 125 00:05:49.160 --> 00:05:51.839 They call it an embarrassingly parallel problem. 126 00:05:51.959 --> 00:05:54.519 I love that term so much. It is so parallel 127 00:05:54.560 --> 00:05:57.120 it is actually embarrassing not to do it all at once. 128 00:05:57.040 --> 00:06:00.439 And that is exactly why the GPU was invented. It actively 129 00:06:00.439 --> 00:06:05.279 sacrifices individual core speed and complex logic in exchange for 130 00:06:05.480 --> 00:06:10.519 raw, massive parallelism. It uses thousands of tiny, relatively dumb 131 00:06:10.560 --> 00:06:12.920 cores instead of a few really smart ones. 132 00:06:13.000 --> 00:06:15.480 Okay, so we have this chip that was fundamentally designed 133 00:06:15.519 --> 00:06:18.079 to run games like Doom and Quake. How do we 134 00:06:18.120 --> 00:06:20.839 make the jump from rendering a virtual shotgun to running 135 00:06:20.920 --> 00:06:21.600 ChatGPT? 136 00:06:21.959 --> 00:06:24.519 This is where we hit the great convergence. Around the 137 00:06:24.600 --> 00:06:28.240 late two thousands, AI researchers were hitting a massive wall.
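Editor's sketch: the "embarrassingly parallel" pixel idea from the conversation above can be made concrete in a few lines of Python. This is only an illustration under stated assumptions: `shade` is a hypothetical stand-in for a real shader, and a thread pool stands in for GPU cores (in CPython it demonstrates order-independence, not a real speedup).

```python
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT = 1920, 1080  # the monitor resolution quoted in the episode


def shade(pixel_index):
    """Hypothetical shader: the colour depends only on this pixel's own position."""
    x, y = pixel_index % WIDTH, pixel_index // WIDTH
    return (x * 255 // (WIDTH - 1), y * 255 // (HEIGHT - 1), 128)


def render_serial(indices):
    # "Step A, then step B": one pixel after another.
    return [shade(i) for i in indices]


def render_parallel(indices, workers=8):
    # No pixel reads another pixel's result, so the work can be handed
    # to any number of workers in any order and the answer is identical.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(shade, indices))


total_pixels = WIDTH * HEIGHT      # 2,073,600: the "roughly two million"
scanline = range(WIDTH)            # one row of the frame
same = render_serial(scanline) == render_parallel(scanline)
```

Because `shade` never looks at a neighbouring pixel, `same` is true regardless of how the work is split, which is the property a GPU exploits with thousands of cores.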
138 00:06:28.720 --> 00:06:31.839 They had these theoretical ideas about neural networks, which are 139 00:06:31.839 --> 00:06:36.519 basically mathematical structures inspired by the biological human brain, but 140 00:06:36.680 --> 00:06:40.160 actually training them was agonizingly slow because they were trying 141 00:06:40.160 --> 00:06:43.240 to run them on those CPUs. The twelve geniuses. Right, 142 00:06:43.319 --> 00:06:45.759 and the geniuses were just getting bogged down because a 143 00:06:45.800 --> 00:06:49.199 neural network, at its very core, is just a massive 144 00:06:49.319 --> 00:06:52.959 grid of numbers. In math we call them matrices. To 145 00:06:53.079 --> 00:06:55.600 train an AI, you have to multiply these giant grids 146 00:06:55.600 --> 00:06:58.720 of numbers together, adjust the results slightly, and then do 147 00:06:58.800 --> 00:07:01.560 it again billions of times. Literally billions of times. 148 00:07:01.600 --> 00:07:03.800 And I'm guessing matrix multiplication is... 149 00:07:03.800 --> 00:07:08.079 It is embarrassingly parallel. Multiplying a massive matrix is really 150 00:07:08.120 --> 00:07:12.240 just performing the exact same simple multiplication operation on thousands 151 00:07:12.279 --> 00:07:15.480 of numbers at the exact same time. It turns out 152 00:07:15.800 --> 00:07:18.639 the math required to simulate a photon of light bouncing 153 00:07:18.680 --> 00:07:20.720 off a 3D wall in a video game is 154 00:07:20.759 --> 00:07:23.319 almost identical to the math required to simulate a virtual 155 00:07:23.360 --> 00:07:25.680 neuron firing in an artificial brain. 156 00:07:25.800 --> 00:07:27.800 That is just such a wild coincidence to me. 157 00:07:28.120 --> 00:07:30.600 It is the happy accident that gave us the entire 158 00:07:30.639 --> 00:07:31.319 modern world. 159 00:07:31.480 --> 00:07:34.800 So when did the industry actually realize this?
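Editor's sketch: to see why matrix multiplication parallelises so well, it helps to write it out. In the minimal pure-Python version below (function names are illustrative, not from the episode), every output cell is its own dot product and never reads another output cell, so in principle each one could be given to a separate core.

```python
def matmul(a, b):
    """Multiply matrix a (rows x inner) by matrix b (inner x cols)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    assert all(len(row) == inner for row in a), "inner dimensions must match"
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # out[i][j] depends only on row i of a and column j of b.
            # No other out[...] value is read, so all rows * cols cells
            # are independent pieces of work.
            out[i][j] = sum(a[i][k] * b[k][j] for k in range(inner))
    return out


def multiply_add_count(rows, inner, cols):
    """Number of multiply-add steps in the naive algorithm above."""
    return rows * inner * cols


product = matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])  # [[19, 22], [43, 50]]
steps = multiply_add_count(1024, 1024, 1024)          # 1,073,741,824
```

A single 1024 by 1024 multiply already needs about a billion multiply-adds, and training repeats operations like this many times over, which is why running them one after another on a few CPU cores was so slow.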
Was there 160 00:07:34.800 --> 00:07:37.199 a specific moment where the light bulb suddenly went on 161 00:07:37.240 --> 00:07:38.759 for everyone? There was 162 00:07:38.759 --> 00:07:41.160 a big bang moment. It was twenty twelve. A competition 163 00:07:41.240 --> 00:07:42.040 called ImageNet. 164 00:07:41.839 --> 00:07:45.040 Set the scene for us. What exactly was ImageNet? 165 00:07:44.839 --> 00:07:48.000 ImageNet was basically the Olympics of computer vision. You 166 00:07:48.040 --> 00:07:51.639 had this massive data set of millions of images, pictures 167 00:07:51.680 --> 00:07:56.120 of cats, dogs, airplanes, strawberries, and researchers had to write 168 00:07:56.519 --> 00:07:59.600 software that could look at the pixels and identify what 169 00:07:59.639 --> 00:08:00.319 was actually in 170 00:08:00.240 --> 00:08:03.600 the picture. Which is incredibly hard for a computer. Very hard. 171 00:08:03.759 --> 00:08:06.279 For years, the best teams in the world, mostly using 172 00:08:06.319 --> 00:08:10.319 traditional hand-coded logic techniques, were stuck getting error rates 173 00:08:10.360 --> 00:08:11.399 around twenty six percent. 174 00:08:11.560 --> 00:08:13.759 That is not great. That is missing one out of 175 00:08:13.759 --> 00:08:14.720 every four pictures. 176 00:08:14.759 --> 00:08:17.040 It was the best we had at the time. Progress 177 00:08:17.120 --> 00:08:21.279 was agonizingly slow. Most people thought true human level computer 178 00:08:21.360 --> 00:08:25.199 vision was decades away. But then in twenty twelve, this 179 00:08:25.319 --> 00:08:28.920 small team from the University of Toronto shows up. Alex Krizhevsky, 180 00:08:29.399 --> 00:08:31.959 Ilya Sutskever, and Geoffrey Hinton. And 181 00:08:31.920 --> 00:08:34.840 those names, I mean, Ilya Sutskever and Geoffrey Hinton. 182 00:08:35.080 --> 00:08:39.120 These are the absolute titans of AI today, but back 183 00:08:39.159 --> 00:08:40.600 then they were kind of the outsiders.
184 00:08:40.679 --> 00:08:43.360 Right, they were the crazy ones. Neural networks were widely 185 00:08:43.399 --> 00:08:46.679 considered a dead end by most serious computer scientists, but 186 00:08:46.799 --> 00:08:50.080 this team entered a neural network they called AlexNet, 187 00:08:50.519 --> 00:08:53.360 and it didn't just win the competition, it utterly destroyed 188 00:08:53.360 --> 00:08:55.639 the field. They dropped the error rate from twenty six 189 00:08:55.639 --> 00:08:57.440 percent down to fifteen percent in 190 00:08:57.360 --> 00:09:00.759 a single year. That is unprecedented for that competition. 191 00:09:00.840 --> 00:09:03.879 In one year. It was a mathematical massacre. The entire 192 00:09:04.000 --> 00:09:07.080 conference room went completely silent when they presented. But here's 193 00:09:07.080 --> 00:09:09.720 the specific detail that matters for our story today. To 194 00:09:09.799 --> 00:09:13.360 train AlexNet, they didn't use a massive government supercomputer. 195 00:09:13.919 --> 00:09:15.879 They didn't use a giant server cluster. 196 00:09:16.039 --> 00:09:16.840 What did they use? 197 00:09:16.919 --> 00:09:20.399 They literally went to a consumer electronics store and bought 198 00:09:20.440 --> 00:09:24.039 two Nvidia GTX five eighty graphics cards. 199 00:09:23.960 --> 00:09:26.720 Two gamer cards, the exact kind of thing you would 200 00:09:26.759 --> 00:09:29.480 put in a desktop PC to play Skyrim. 201 00:09:29.240 --> 00:09:32.519 Exactly. Two cards that cost maybe five hundred dollars each 202 00:09:32.559 --> 00:09:35.279 at the time. They shoved them into a standard PC. 203 00:09:35.840 --> 00:09:38.039 They wrote some custom code to move the math off 204 00:09:38.080 --> 00:09:41.080 the CPU and onto the GPU, and they suddenly realized 205 00:09:41.120 --> 00:09:43.879 they could train their model in a matter of days instead
206 00:09:43.559 --> 00:09:47.039 of months. And that is the true aha moment, because 207 00:09:47.080 --> 00:09:49.639 if you can iterate in days, you can actually learn 208 00:09:49.720 --> 00:09:52.480 and adapt. If an experiment takes six months, you are 209 00:09:52.559 --> 00:09:53.200 just stuck. 210 00:09:53.480 --> 00:09:57.039 Exactly. Speed is intelligence in this field. If you can 211 00:09:57.080 --> 00:09:59.159 run one hundred experiments in the time it takes your 212 00:09:59.240 --> 00:10:03.639 rival to run one, you get smarter incredibly fast. Honestly, 213 00:10:03.720 --> 00:10:06.080 Nvidia's stock price chart should basically have a little 214 00:10:06.120 --> 00:10:08.679 bronze statue of AlexNet next to it. But this 215 00:10:08.720 --> 00:10:10.159 is the Deep Dive nuance we need to hit. 216 00:10:10.240 --> 00:10:12.080 It wasn't just the physical hardware that made 217 00:10:11.919 --> 00:10:14.559 this possible. Right, because you cannot just plug a video 218 00:10:14.600 --> 00:10:16.840 card into a motherboard and tell it to learn English. 219 00:10:16.879 --> 00:10:19.639 It inherently speaks graphics, it doesn't speak math. 220 00:10:19.600 --> 00:10:22.960 Correct. And this is where Nvidia's CEO Jensen Huang 221 00:10:23.080 --> 00:10:27.480 showed just incredible, almost prophetic foresight. Years before AlexNet, 222 00:10:27.519 --> 00:10:29.320 way back in two thousand and six, Nvidia released 223 00:10:29.320 --> 00:10:32.279 a software platform called CUDA. C-U-D-A. 224 00:10:33.039 --> 00:10:36.120 I see this acronym constantly when reading about this space. 225 00:10:36.720 --> 00:10:41.240 It is usually described as Nvidia's massive moat. What is 226 00:10:41.279 --> 00:10:42.879 it actually doing under the hood? 227 00:10:43.000 --> 00:10:45.279 Well, before CUDA existed, if you wanted to use a 228 00:10:45.320 --> 00:10:48.200 GPU for scientific math, you basically had to hack it.
229 00:10:48.480 --> 00:10:50.879 You literally had to trick the graphics card into thinking 230 00:10:50.919 --> 00:10:53.879 your math problem was actually a texture or a shadow 231 00:10:53.919 --> 00:10:56.960 on a polygon. It was an incredibly painful process. 232 00:10:57.039 --> 00:11:00.000 Please render this massive spreadsheet as an explosion. 233 00:11:00.200 --> 00:11:04.080 Basically, yes, it was a total nightmare for researchers. CUDA 234 00:11:04.240 --> 00:11:06.879 changed all of that. It was a software layer that 235 00:11:06.960 --> 00:11:10.639 let normal programmers write standard code, like C plus plus, 236 00:11:10.919 --> 00:11:14.279 that ran directly on the GPU. It exposed the raw 237 00:11:14.360 --> 00:11:17.960 mathematical power of the chip without all the annoying graphics baggage. 238 00:11:18.120 --> 00:11:21.600 So Nvidia essentially built the translation layer before anyone even 239 00:11:21.639 --> 00:11:23.519 really knew what language they wanted to speak. 240 00:11:23.600 --> 00:11:26.120 Jensen Huang practically bet the entire company on it, 241 00:11:26.279 --> 00:11:28.679 and Wall Street absolutely hated it at the time. Investors 242 00:11:28.720 --> 00:11:30.960 were furious. They said, why are you spending billions of 243 00:11:30.960 --> 00:11:33.039 dollars on R and D for a feature that only 244 00:11:33.080 --> 00:11:35.200 a few academic weirdos in universities use? 245 00:11:35.480 --> 00:11:39.000 And then a few years later those weirdos invented modern AI. 246 00:11:39.480 --> 00:11:43.759 And because all those weirdos learned to code specifically in CUDA, 247 00:11:44.519 --> 00:11:48.159 the entire foundation of modern AI was built on top 248 00:11:48.200 --> 00:11:51.639 of Nvidia's proprietary software. Now, if you are a hardware 249 00:11:51.679 --> 00:11:53.679 startup today and you want to build a brand new 250 00:11:53.720 --> 00:11:57.240 chip to beat Nvidia, you have a massive, massive problem.
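Editor's sketch: real CUDA code is written in C++ and runs on the GPU, so the snippet below is only a plain-Python emulation of the programming model the hosts describe. In CUDA, a "kernel" function is run by many threads, and each thread works out which element it owns from built-in indices (`blockIdx`, `threadIdx`, `blockDim`). The `launch` helper and the function names here are hypothetical; the loops stand in for hardware that would run the threads simultaneously.

```python
def vector_add_kernel(block_idx, thread_idx, block_dim, a, b, out):
    """Body executed by every emulated thread: add one pair of numbers."""
    i = block_idx * block_dim + thread_idx   # this thread's global index
    if i < len(out):                         # guard: last block may overhang
        out[i] = a[i] + b[i]


def launch(kernel, grid_dim, block_dim, *args):
    """Hypothetical launcher: runs sequentially what a GPU runs in parallel."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, thread_idx, block_dim, *args)


a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [10.0] * 5
out = [0.0] * 5

block_dim = 4                                         # threads per block
grid_dim = (len(out) + block_dim - 1) // block_dim    # enough blocks to cover out
launch(vector_add_kernel, grid_dim, block_dim, a, b, out)
```

The point of the model is that the programmer writes ordinary arithmetic for one element, with no textures or polygons involved, and the platform takes care of running that same body across thousands of cores.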
251 00:11:56.879 --> 00:11:59.679 Because nobody knows how to program your new chip. Exactly. 252 00:12:00.080 --> 00:12:02.559 All the libraries, all the developer tools, all the research, 253 00:12:02.639 --> 00:12:07.600 they all natively speak CUDA. It is the classic ecosystem 254 00:12:07.720 --> 00:12:10.399 lock-in. It is like Windows in the nineties or 255 00:12:10.440 --> 00:12:13.399 the iPhone app store today. It is incredibly difficult to 256 00:12:13.399 --> 00:12:14.120 break that habit. 257 00:12:14.399 --> 00:12:16.639 Let us fast forward to today then, because we are 258 00:12:16.720 --> 00:12:19.360 obviously not using five hundred dollar GTX five eighties anymore. 259 00:12:19.399 --> 00:12:21.440 We're using the H one hundred. This is the chip 260 00:12:21.480 --> 00:12:23.759 that companies are fighting over, the one Mark Zuckerberg is 261 00:12:23.759 --> 00:12:26.000 supposedly buying three hundred and fifty thousand 262 00:12:25.679 --> 00:12:28.080 of. The H one hundred is a monster. It 263 00:12:28.159 --> 00:12:29.799 is a true marvel of human engineering. 264 00:12:29.840 --> 00:12:32.039 Give me the physical stats. What are we actually looking at here? 265 00:12:32.159 --> 00:12:36.120 It is a slab of silicon that has eighty billion 266 00:12:36.399 --> 00:12:39.759 individual transistors carved into it using a four nanometer 267 00:12:39.960 --> 00:12:43.759 manufacturing process. Just wrap your head around that: eighty billion 268 00:12:43.799 --> 00:12:47.080 on one chip. But honestly, the raw transistor count isn't 269 00:12:47.080 --> 00:12:50.159 even the most impressive part. It is how highly specialized 270 00:12:50.159 --> 00:12:53.320 the architecture has become. Specialized in what way? Remember how 271 00:12:53.360 --> 00:12:56.639 the old GPUs were fairly general purpose for graphics?
The 272 00:12:56.799 --> 00:12:59.720 H one hundred is designed specifically from the ground up 273 00:12:59.799 --> 00:13:02.440 for the math of transformers, which is the T 274 00:13:02.799 --> 00:13:06.639 in ChatGPT. It has specific hardware units inside it 275 00:13:06.679 --> 00:13:07.639 called tensor 276 00:13:07.279 --> 00:13:08.559 cores. Tensor cores. 277 00:13:08.759 --> 00:13:11.080 Think of them as dedicated calculator services that do nothing 278 00:13:11.120 --> 00:13:14.320 but matrix multiplication. They cannot render graphics, they cannot run 279 00:13:14.360 --> 00:13:16.799 an operating system. They just do that one specific math 280 00:13:16.840 --> 00:13:20.360 operation incredibly fast. The H one hundred can perform roughly 281 00:13:20.399 --> 00:13:23.840 four thousand trillion floating point operations per second if you 282 00:13:23.919 --> 00:13:25.080 use the right precision levels. 283 00:13:25.279 --> 00:13:28.720 Four thousand trillion operations per second. That is unfathomable. 284 00:13:28.799 --> 00:13:31.440 But here's the crazy part. Raw compute speed is actually 285 00:13:31.480 --> 00:13:34.080 the easy part of chip design now. The real bottleneck, 286 00:13:34.080 --> 00:13:36.360 the thing that actually keeps chip architects up at night, 287 00:13:36.440 --> 00:13:36.960 is memory. 288 00:13:37.240 --> 00:13:39.919 This is the memory wall concept I keep reading about, right? 289 00:13:40.200 --> 00:13:42.480 It simply does not matter if your processor brain 290 00:13:42.519 --> 00:13:45.000 can think a billion thoughts a second if you cannot 291 00:13:45.000 --> 00:13:47.600 get the data into the brain fast enough. A super 292 00:13:47.639 --> 00:13:50.440 fast chip with slow memory is like a Ferrari with 293 00:13:50.519 --> 00:13:53.759 a clogged fuel line. It just stalls out.
So the 294 00:13:53.960 --> 00:13:56.600 H one hundred uses a brand new type of memory 295 00:13:56.639 --> 00:13:59.399 called HBM, high bandwidth memory. 296 00:13:59.559 --> 00:14:01.879 How does that solve the fuel line problem? 297 00:14:01.960 --> 00:14:05.679 It is stacked vertically. They literally build skyscrapers of memory 298 00:14:05.759 --> 00:14:09.480 chips directly on top of the processor itself to physically 299 00:14:09.480 --> 00:14:12.039 shorten the distance the electrical signals have to travel. 300 00:14:12.200 --> 00:14:14.919 So they are building 3D towers of memory right 301 00:14:15.000 --> 00:14:18.000 next to the logic cores just to save the fractions 302 00:14:18.000 --> 00:14:20.240 of a nanosecond it takes for the signal to travel 303 00:14:20.240 --> 00:14:21.679 across a standard motherboard. 304 00:14:21.799 --> 00:14:24.200 We are actively fighting the speed of light at this point. 305 00:14:24.840 --> 00:14:27.559 The H one hundred has a memory bandwidth of over 306 00:14:27.639 --> 00:14:31.080 three point three terabytes per second. To put that in perspective, 307 00:14:31.440 --> 00:14:34.279 that is like downloading thousands of full 4K movies 308 00:14:34.320 --> 00:14:36.720 in a single second. It is an absolute fire hose 309 00:14:36.759 --> 00:14:37.159 of data. 310 00:14:37.200 --> 00:14:39.799 And they use something called NVLink to string them together, right? 311 00:14:40.200 --> 00:14:45.080 Yes, NVLink is their proprietary interconnect, because one H one 312 00:14:45.159 --> 00:14:48.440 hundred isn't enough. You need thousands of them to function 313 00:14:48.519 --> 00:14:52.600 as one giant unified brain. NVLink is the nervous system 314 00:14:52.639 --> 00:14:54.519 that lets them talk to each other fast enough to 315 00:14:54.559 --> 00:14:55.559 stay synchronized.
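Editor's sketch: the "memory wall" can be put into numbers using the two figures quoted in the conversation (roughly four thousand trillion operations per second and about 3.3 terabytes per second). The calculation below is a simplified roofline-style estimate, not a benchmark; the two-byte number size and the byte counts per workload are assumptions chosen for illustration.

```python
PEAK_FLOPS = 4.0e15      # ~4,000 trillion operations/s, as quoted in the episode
BANDWIDTH = 3.3e12       # ~3.3 terabytes/s of memory bandwidth, as quoted
BYTES_PER_NUMBER = 2     # assumption: 16-bit values


def attainable_flops(flops_per_byte):
    """Simplified roofline: limited by compute or by memory, whichever is lower."""
    return min(PEAK_FLOPS, BANDWIDTH * flops_per_byte)


# Operations the chip must perform per byte fetched to keep its math units busy.
break_even = PEAK_FLOPS / BANDWIDTH                       # about 1,212

# Element-wise add: read two numbers, write one, for a single operation.
add_intensity = 1 / (3 * BYTES_PER_NUMBER)                # 1 op per 6 bytes
add_rate = attainable_flops(add_intensity)                # about 0.55e12, memory-bound

# n x n matrix multiply: 2*n^3 operations over roughly 3*n^2 numbers moved.
n = 8192
matmul_intensity = (2 * n**3) / (3 * n**2 * BYTES_PER_NUMBER)   # n / 3
matmul_rate = attainable_flops(matmul_intensity)          # hits PEAK_FLOPS
```

Under these assumptions, a simple element-wise add can use well under a thousandth of the chip's peak because it is waiting on memory, while a large matrix multiply reuses each fetched number enough times to keep the math units fed. That gap is the Ferrari-with-a-clogged-fuel-line problem in arithmetic form.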
316 00:14:55.799 --> 00:14:58.960 And yet despite all of that insane power and the 317 00:14:59.039 --> 00:15:02.159 CUDA moat, Nvidia is not the only player in 318 00:15:02.200 --> 00:15:05.159 town anymore. Which brings us to this sleeping giant that 319 00:15:05.240 --> 00:15:06.919 suddenly woke up: Google. 320 00:15:07.399 --> 00:15:10.080 This is truly one of my favorite corporate history stories, 321 00:15:10.360 --> 00:15:12.679 because while everyone else in the world was just blindly 322 00:15:12.720 --> 00:15:16.000 buying Nvidia chips, Google looked at their internal usage 323 00:15:16.080 --> 00:15:17.960 data and absolutely freaked out. 324 00:15:18.000 --> 00:15:20.240 This was back around twenty thirteen, right? Twenty thirteen. 325 00:15:20.320 --> 00:15:22.200 Yeah, yeah. Google engineers did a back of the 326 00:15:22.200 --> 00:15:25.440 napkin calculation that terrified them. They looked at the rapid 327 00:15:25.519 --> 00:15:28.399 rise of voice search on Android phones and they realized 328 00:15:28.440 --> 00:15:31.840 that if every single Android user used voice search for 329 00:15:32.080 --> 00:15:33.559 just three minutes a day. 330 00:15:33.639 --> 00:15:36.320 Just three minutes, that is like two quick searches. 331 00:15:35.960 --> 00:15:39.080 Exactly, almost nothing. But they calculated that those three minutes 332 00:15:39.080 --> 00:15:41.840 would require so much compute power to process the speech 333 00:15:41.879 --> 00:15:46.120 recognition that it would completely double Google's entire global data center 334 00:15:45.919 --> 00:15:48.639 footprint. Doubled their entire footprint. 335 00:15:48.720 --> 00:15:51.080 They would have had to build twice as many data 336 00:15:51.080 --> 00:15:53.559 centers as they had built in their entire corporate history 337 00:15:54.039 --> 00:15:57.519 just to support three minutes of voice search.
They realized 338 00:15:57.639 --> 00:16:01.039 instantly that if they relied on buying standard Intel CPUs 339 00:16:01.039 --> 00:16:05.000 and Nvidia GPUs, they would literally go bankrupt. The economics 340 00:16:05.080 --> 00:16:07.159 just flat out did not work at that scale. 341 00:16:07.240 --> 00:16:10.440 So, in classic Google fashion, they just decided, we will 342 00:16:10.440 --> 00:16:11.600 build our own hardware. 343 00:16:12.080 --> 00:16:15.639 They launched a highly secret internal project to build the TPU, 344 00:16:16.080 --> 00:16:20.840 the tensor processing unit, and their design philosophy was incredibly 345 00:16:20.960 --> 00:16:25.279 radical compared to Nvidia. Because Nvidia sells GPUs to everyone, right? 346 00:16:25.480 --> 00:16:28.559 They have to be good at gaming, crypto mining, self-driving cars, 347 00:16:28.799 --> 00:16:32.440 scientific simulations. Google said, we do not care about gaming, 348 00:16:32.480 --> 00:16:34.840 we do not care about graphics at all. We want 349 00:16:34.879 --> 00:16:38.000 a chip that does deep learning and absolutely nothing else. 350 00:16:38.039 --> 00:16:39.600 So they stripped the sports car all the way down 351 00:16:39.600 --> 00:16:42.559 to the chassis. No AC, no radio, just a massive engine. 352 00:16:42.639 --> 00:16:44.639 Even the engine itself is totally different. They used a 353 00:16:44.639 --> 00:16:46.919 specific architecture called a systolic array. 354 00:16:47.039 --> 00:16:49.159 Systolic, like blood pressure? Like a 355 00:16:49.120 --> 00:16:52.039 heartbeat? Exactly like a heartbeat. In a normal CPU 356 00:16:52.080 --> 00:16:54.919 or GPU, the chip acts kind of like a library. 357 00:16:55.200 --> 00:16:56.679 You go to the shelf to get a book, which 358 00:16:56.720 --> 00:16:59.440 is your data.
You bring it to the desk, the processor, 359 00:16:59.519 --> 00:17:01.320 you read it, and then you walk all the way 360 00:17:01.360 --> 00:17:03.720 back to put it on the shelf. That walking back 361 00:17:03.720 --> 00:17:06.720 and forth, accessing the memory, takes a massive amount of 362 00:17:06.799 --> 00:17:07.880 energy and time. 363 00:17:07.880 --> 00:17:10.640 And as we just established, energy and memory are the 364 00:17:10.720 --> 00:17:11.880 ultimate enemies 365 00:17:11.599 --> 00:17:14.720 here. Right. So in a systolic array, you do not 366 00:17:14.880 --> 00:17:17.240 put the book back on the shelf. You process it, 367 00:17:17.359 --> 00:17:19.519 and then you immediately hand it to the person sitting 368 00:17:19.599 --> 00:17:22.680 right next to you. The data physically flows through the 369 00:17:22.720 --> 00:17:26.160 grid of the chip in a wave. One calculation finishes 370 00:17:26.440 --> 00:17:29.359 and simply pushes the result directly to the next math unit. 371 00:17:29.640 --> 00:17:32.359 It heavily mimics a continuous flow of blood through a 372 00:17:32.359 --> 00:17:33.440 circulatory system. 373 00:17:33.559 --> 00:17:35.680 So the data just enters one side of the chip, 374 00:17:35.839 --> 00:17:38.960 flows through this massive grid of math units getting multiplied, 375 00:17:39.240 --> 00:17:41.559 and just pops out the other side as a finished result. 376 00:17:41.759 --> 00:17:46.200 Exactly. It drastically reduced the need to constantly access external memory, 377 00:17:46.759 --> 00:17:50.480 and the result was staggering. That first internal TPU they 378 00:17:50.559 --> 00:17:54.240 deployed was roughly fifteen to thirty times more efficient per watt 379 00:17:54.480 --> 00:17:57.200 than anything else available on the commercial market at the time. 380 00:17:57.359 --> 00:18:01.319 That is an insane leap in efficiency. And Google didn't stop there. 381 00:18:01.359 --> 00:18:02.839 They kept iterating on it.
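Editor's sketch: the "hand the book to your neighbour" idea can be simulated in plain Python. The version below is a generic textbook systolic array in which each cell keeps a running sum while values from one matrix stream in from the left and values from the other stream in from the top. It is an illustration of the concept only, written under that assumption, and makes no claim about how Google's TPU arranges its own array.

```python
def systolic_matmul(a, b):
    """Multiply a (n x m) by b (m x p) on a simulated n x p grid of cells."""
    n, m, p = len(a), len(b), len(b[0])
    acc = [[0] * p for _ in range(n)]     # running sum held inside each cell
    a_reg = [[0] * p for _ in range(n)]   # value each cell passes to the right
    b_reg = [[0] * p for _ in range(n)]   # value each cell passes downward
    cycles = n + m + p - 2                # beats needed for the wave to drain
    for t in range(cycles):
        # Walk from the far corner back so each cell still sees its
        # neighbour's value from the previous beat.
        for i in reversed(range(n)):
            for j in reversed(range(p)):
                if j == 0:   # left edge: feed row i of a, delayed by i beats
                    k = t - i
                    a_in = a[i][k] if 0 <= k < m else 0
                else:        # interior: take the value from the left neighbour
                    a_in = a_reg[i][j - 1]
                if i == 0:   # top edge: feed column j of b, delayed by j beats
                    k = t - j
                    b_in = b[k][j] if 0 <= k < m else 0
                else:        # interior: take the value from the cell above
                    b_in = b_reg[i - 1][j]
                a_reg[i][j], b_reg[i][j] = a_in, b_in
                acc[i][j] += a_in * b_in  # multiply, accumulate, pass it on
    return acc


result = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])  # [[19, 22], [43, 50]]
```

Notice that input values are read from the source matrices only at the left and top edges. Every interior cell gets its operands from a neighbour rather than from memory, which is the saving the hosts are describing.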
382 00:18:02.920 --> 00:18:05.400 Oh yeah. Version two came out in twenty seventeen, and 383 00:18:05.440 --> 00:18:07.599 that was a huge deal because the first one could 384 00:18:07.599 --> 00:18:10.240 only run models, it couldn't train them. V two added 385 00:18:10.319 --> 00:18:13.200 full training capabilities, and today we are on V four 386 00:18:13.279 --> 00:18:16.319 and V five. They are entirely liquid cooled now and 387 00:18:16.359 --> 00:18:18.880 they deploy them in massive clusters they call pods. 388 00:18:19.240 --> 00:18:21.200 Thousands of chips all wired together. 389 00:18:21.039 --> 00:18:23.480 Right, and this brings up a massive advantage Google has 390 00:18:23.519 --> 00:18:27.880 over almost everyone else. It is their interconnect system called ICI. 391 00:18:28.200 --> 00:18:30.680 How is that different from Nvidia's NVLink? 392 00:18:31.119 --> 00:18:34.559 Because Google completely controls their own data centers, they can 393 00:18:34.640 --> 00:18:37.480 wire these TPUs directly to each other in what is 394 00:18:37.480 --> 00:18:40.200 called a torus topology. Think of it like a giant 395 00:18:40.200 --> 00:18:43.599 3D donut shape. They use direct optical links between 396 00:18:43.599 --> 00:18:45.319 the chips. They don't have to route the data through 397 00:18:45.319 --> 00:18:49.519 standard bulky networking switches. It makes those thousands of TPUs 398 00:18:49.759 --> 00:18:52.240 act flawlessly as one single brain. 399 00:18:52.599 --> 00:18:54.720 And this is why today when you look at Google, 400 00:18:54.759 --> 00:18:58.079 they don't really buy Nvidia chips for their core 401 00:18:58.440 --> 00:19:01.799