WEBVTT

1
00:00:01.199 --> 00:00:06.200
<v Speaker 1>Welcome to the Sentient Code, where intelligence is engineered, autonomy

2
00:00:06.280 --> 00:00:10.439
<v Speaker 1>is emerging, and the line between human and machine grows thinner.

3
00:00:10.800 --> 00:00:15.359
<v Speaker 1>Each episode, we decode the algorithms, explore the robotics, and

4
00:00:15.439 --> 00:00:19.000
<v Speaker 1>examine the ideas shaping the future of artificial minds.

5
00:00:23.879 --> 00:00:25.079
<v Speaker 2>Welcome back to the Deep Dive.

6
00:00:25.440 --> 00:00:25.800
<v Speaker 3>You know.

7
00:00:27.280 --> 00:00:30.719
<v Speaker 2>We spend so much time obsessing over the software side

8
00:00:30.719 --> 00:00:31.120
<v Speaker 2>of AI.

9
00:00:31.239 --> 00:00:33.560
<v Speaker 3>Oh, we absolutely do. It is all anyone talks about,

10
00:00:33.640 --> 00:00:34.359
<v Speaker 3>right?

11
00:00:34.240 --> 00:00:37.640
<v Speaker 2>We look at the chatbots and the image generators, the flashy demos,

12
00:00:37.679 --> 00:00:40.320
<v Speaker 2>the code. It is always, "Look what this new model

13
00:00:40.359 --> 00:00:46.399
<v Speaker 2>can do." But today I want to physically unplug all

14
00:00:46.439 --> 00:00:46.640
<v Speaker 2>of that.

15
00:00:46.840 --> 00:00:48.200
<v Speaker 3>We are going down to the basement.

16
00:00:48.280 --> 00:00:50.200
<v Speaker 2>We are going to the basement. We are stepping away

17
00:00:50.200 --> 00:00:52.439
<v Speaker 2>from the cloud and we're going to talk about the

18
00:00:52.520 --> 00:00:56.200
<v Speaker 2>actual physical machine that makes the cloud exist. We are

19
00:00:56.240 --> 00:00:57.240
<v Speaker 2>talking about the iron.

20
00:00:57.399 --> 00:00:59.600
<v Speaker 3>The iron. You know, it is funny you use that

21
00:00:59.719 --> 00:01:03.159
<v Speaker 3>term, because for the first, really the first fifty years

22
00:01:03.200 --> 00:01:06.879
<v Speaker 3>of computing history, software people looked down on hardware people.

23
00:01:06.920 --> 00:01:08.239
<v Speaker 2>It was just plumbing to them.

24
00:01:08.400 --> 00:01:11.079
<v Speaker 3>Exactly. Hardware was a commodity. It was dirt. It was

25
00:01:11.200 --> 00:01:13.480
<v Speaker 3>just the physical thing you ran your brilliant code on.

26
00:01:13.879 --> 00:01:16.400
<v Speaker 3>And if your code was running slow, you didn't rewrite it.

27
00:01:16.480 --> 00:01:18.560
<v Speaker 3>You just waited two years for Intel to make a

28
00:01:18.599 --> 00:01:21.280
<v Speaker 3>faster chip, and your problem was magically solved.

29
00:01:21.519 --> 00:01:24.280
<v Speaker 2>Moore's law was basically a free lunch for developers.

30
00:01:24.560 --> 00:01:27.319
<v Speaker 3>It was a free lunch. But that free lunch is over.

31
00:01:27.879 --> 00:01:30.719
<v Speaker 3>The script is completely flipped. Now we have moved from

32
00:01:30.760 --> 00:01:34.680
<v Speaker 3>this era of code dominance to an era of compute dominance.

33
00:01:34.840 --> 00:01:38.040
<v Speaker 2>Because in the AI world today, the smartest algorithm doesn't

34
00:01:38.040 --> 00:01:39.200
<v Speaker 2>necessarily win anymore.

35
00:01:39.439 --> 00:01:42.280
<v Speaker 3>No, it doesn't. The team with the biggest, most specialized

36
00:01:42.280 --> 00:01:44.439
<v Speaker 3>pile of silicon wins, period.

37
00:01:44.560 --> 00:01:46.840
<v Speaker 2>And we are calling this the compute race. I think

38
00:01:46.959 --> 00:01:49.359
<v Speaker 2>what is so surprising to people tuning into this is

39
00:01:49.400 --> 00:01:52.120
<v Speaker 2>that this isn't just Apple versus Microsoft anymore. This is

40
00:01:52.519 --> 00:01:56.159
<v Speaker 2>arguably the central geopolitical conflict of the twenty twenties.

41
00:01:56.400 --> 00:02:00.079
<v Speaker 3>It is the absolute bottleneck of the modern world. You

42
00:02:00.120 --> 00:02:03.359
<v Speaker 3>look at the supply chain for advanced AI chips, it

43
00:02:03.439 --> 00:02:06.560
<v Speaker 3>is terrifyingly concentrated, right? We are talking about a single

44
00:02:06.599 --> 00:02:10.000
<v Speaker 3>company in the Netherlands that makes the manufacturing machines, a

45
00:02:10.039 --> 00:02:13.120
<v Speaker 3>single island, Taiwan, that actually manufactures the chips, and a

46
00:02:13.159 --> 00:02:15.400
<v Speaker 3>handful of US companies that design them.

47
00:02:15.280 --> 00:02:17.639
<v Speaker 2>And if any single link in that chain breaks,

48
00:02:17.599 --> 00:02:20.080
<v Speaker 3>the AI revolution doesn't just slow down, it stops entirely.

49
00:02:20.439 --> 00:02:23.439
<v Speaker 2>So today we are going to trace that chain. We

50
00:02:23.479 --> 00:02:26.439
<v Speaker 2>are going to figure out how a piece of hardware

51
00:02:26.520 --> 00:02:29.680
<v Speaker 2>designed to let teenagers play Call of Duty somehow became

52
00:02:29.719 --> 00:02:31.360
<v Speaker 2>the brain of modern civilization.

53
00:02:31.759 --> 00:02:33.120
<v Speaker 3>It is an incredible pivot.

54
00:02:33.080 --> 00:02:35.560
<v Speaker 2>And we are going to look at why Google panic-built

55
00:02:35.599 --> 00:02:38.680
<v Speaker 2>their own secret chip factory, and we will get into

56
00:02:38.680 --> 00:02:41.479
<v Speaker 2>the physical limits too, because apparently we are literally running

57
00:02:41.479 --> 00:02:42.599
<v Speaker 2>out of atoms to work with.

58
00:02:42.879 --> 00:02:44.840
<v Speaker 3>It is a story about physics, it is a story

59
00:02:44.879 --> 00:02:48.280
<v Speaker 3>about economics, and ultimately it is a story about war.

60
00:02:49.080 --> 00:02:51.599
<v Speaker 2>Let us start with the origin story, because this is

61
00:02:51.639 --> 00:02:54.000
<v Speaker 2>the part that always feels like a massive accident of

62
00:02:54.080 --> 00:02:56.719
<v Speaker 2>history to me. If you look at the trillion dollar

63
00:02:56.759 --> 00:02:59.919
<v Speaker 2>club today, Nvidia is sitting right there at the top,

64
00:03:00.599 --> 00:03:03.039
<v Speaker 2>but twenty years ago they were not trying to build

65
00:03:03.159 --> 00:03:04.639
<v Speaker 2>artificial intelligence.

66
00:03:04.199 --> 00:03:07.240
<v Speaker 3>Not even close. If you walked into Nvidia headquarters in

67
00:03:07.319 --> 00:03:10.120
<v Speaker 3>say, nineteen ninety-nine or two thousand and five, they

68
00:03:10.120 --> 00:03:13.680
<v Speaker 3>were obsessed with one single thing: polygons. Video games, specifically

69
00:03:13.759 --> 00:03:17.280
<v Speaker 3>rendering 3D graphics. They wanted to make better explosions,

70
00:03:17.280 --> 00:03:20.719
<v Speaker 3>realistic textures, dynamic lighting, shadows. And to understand why that

71
00:03:20.759 --> 00:03:22.560
<v Speaker 3>matters for AI, we really have to get a little

72
00:03:22.639 --> 00:03:23.240
<v Speaker 3>technical here.

73
00:03:23.280 --> 00:03:25.759
<v Speaker 2>Okay, let's break it down. We need to distinguish between

74
00:03:25.800 --> 00:03:29.319
<v Speaker 2>the chip in your standard laptop, which is the CPU,

75
00:03:29.840 --> 00:03:32.879
<v Speaker 2>and the chip in a gaming card, the GPU. Because

76
00:03:32.919 --> 00:03:35.080
<v Speaker 2>I think most people hear the word processor and they

77
00:03:35.120 --> 00:03:38.199
<v Speaker 2>just picture a little black square. What is the actual

78
00:03:38.319 --> 00:03:40.639
<v Speaker 2>architectural difference inside that square?

79
00:03:40.840 --> 00:03:45.319
<v Speaker 3>Okay, let's unpack this. Imagine a CPU, the central processing unit.

80
00:03:45.400 --> 00:03:48.039
<v Speaker 3>This is your Intel Core i9 or your AMD Ryzen.

81
00:03:48.719 --> 00:03:52.639
<v Speaker 3>A CPU is like a team of incredibly smart mathematicians.

82
00:03:53.159 --> 00:03:55.879
<v Speaker 3>Let us say a tight-knit team of twelve geniuses.

83
00:03:56.280 --> 00:03:58.240
<v Speaker 2>Small team, very high IQ.

84
00:03:58.159 --> 00:04:00.960
<v Speaker 3>Extremely high IQ, and highly versatile. If you give

85
00:04:01.000 --> 00:04:04.000
<v Speaker 3>one of these CPU cores a really complex problem, something

86
00:04:04.080 --> 00:04:07.479
<v Speaker 3>like run this operating system, then open Excel, then calculate

87
00:04:07.520 --> 00:04:10.479
<v Speaker 3>this complex formula, then check for incoming email, it can

88
00:04:10.520 --> 00:04:12.599
<v Speaker 3>handle that context switching perfectly.

89
00:04:12.199 --> 00:04:14.479
<v Speaker 2>Because it is designed for serial processing.

90
00:04:14.080 --> 00:04:16.800
<v Speaker 3>Exactly. Step A, then step B, then step C. It

91
00:04:16.839 --> 00:04:19.519
<v Speaker 3>has massive amounts of complex logic built in just to

92
00:04:19.560 --> 00:04:22.199
<v Speaker 3>handle branching paths like if this happens, then do that.

93
00:04:22.279 --> 00:04:24.920
<v Speaker 2>So a CPU is optimized for logic and sequence.

94
00:04:24.879 --> 00:04:28.199
<v Speaker 3>Correct. Now look at a GPU, the graphics processing unit.

95
00:04:28.639 --> 00:04:31.519
<v Speaker 3>A GPU is not a team of twelve geniuses. It

96
00:04:31.600 --> 00:04:35.160
<v Speaker 3>is a stadium filled with ten thousand average high school students.

97
00:04:35.240 --> 00:04:36.920
<v Speaker 2>Okay, I like where this analogy is going.

98
00:04:37.079 --> 00:04:40.399
<v Speaker 3>Individually, those students aren't that smart. They cannot run a

99
00:04:40.439 --> 00:04:44.160
<v Speaker 3>modern operating system. They will completely freeze up if you

100
00:04:44.199 --> 00:04:46.639
<v Speaker 3>give them complex branching logic chains.

101
00:04:46.759 --> 00:04:48.360
<v Speaker 2>But they have numbers on their side.

102
00:04:48.480 --> 00:04:51.040
<v Speaker 3>Exactly. If you give them a task that is simple

103
00:04:51.079 --> 00:04:54.120
<v Speaker 3>and repetitive, like take these two numbers and add them together,

104
00:04:54.240 --> 00:04:55.959
<v Speaker 3>and you tell all ten thousand of them to do

105
00:04:56.000 --> 00:04:57.120
<v Speaker 3>it at the exact same time,

106
00:04:56.959 --> 00:04:59.160
<v Speaker 2>they will completely obliterate the CPU.

107
00:04:59.279 --> 00:05:00.560
<v Speaker 3>They will leave it in the dust.

108
00:05:00.720 --> 00:05:03.800
<v Speaker 2>And this is the core concept of parallelism.

109
00:05:03.120 --> 00:05:07.360
<v Speaker 3>Specifically data parallelism. And this is exactly where video games

110
00:05:07.399 --> 00:05:10.279
<v Speaker 3>come into the picture. Think about your computer screen right now.

111
00:05:10.519 --> 00:05:13.120
<v Speaker 3>It is a grid of pixels. A standard monitor is

112
00:05:13.240 --> 00:05:16.480
<v Speaker 3>nineteen twenty by ten eighty, which is roughly two million pixels.

113
00:05:16.959 --> 00:05:19.439
<v Speaker 3>To render just one single frame of a video game,

114
00:05:19.759 --> 00:05:22.720
<v Speaker 3>you need to calculate the exact color for every single

115
00:05:22.720 --> 00:05:25.800
<v Speaker 3>one of those two million pixels based on the virtual lighting,

116
00:05:26.079 --> 00:05:28.600
<v Speaker 3>the texture of the wall, the geometry of the character.

117
00:05:28.839 --> 00:05:31.079
<v Speaker 2>And the crucial part here is that the color of

118
00:05:31.120 --> 00:05:34.079
<v Speaker 2>the pixel in the top left corner generally does not

119
00:05:34.160 --> 00:05:36.120
<v Speaker 2>depend on the color of the pixel in the bottom

120
00:05:36.199 --> 00:05:36.720
<v Speaker 2>right corner.

121
00:05:36.759 --> 00:05:40.439
<v Speaker 3>Precisely, they are mathematically independent. You don't need to calculate

122
00:05:40.439 --> 00:05:42.959
<v Speaker 3>pixel one and then wait to calculate pixel two and

123
00:05:42.959 --> 00:05:45.319
<v Speaker 3>then pixel three. You can calculate all two million of

124
00:05:45.319 --> 00:05:48.800
<v Speaker 3>them simultaneously. Computer scientists actually have a great term for this.

125
00:05:49.160 --> 00:05:51.839
<v Speaker 3>They call it an embarrassingly parallel problem.

126
00:05:51.959 --> 00:05:54.519
<v Speaker 2>I love that term so much. It is so parallel

127
00:05:54.560 --> 00:05:57.120
<v Speaker 2>it is actually embarrassing not to do it all at once.

128
00:05:57.040 --> 00:06:00.439
<v Speaker 3>And that is exactly why the GPU was invented. It actively

129
00:06:00.439 --> 00:06:05.279
<v Speaker 3>sacrifices individual core speed and complex logic in exchange for

130
00:06:05.480 --> 00:06:10.519
<v Speaker 3>raw massive parallelism. It uses thousands of tiny, relatively dumb

131
00:06:10.560 --> 00:06:12.920
<v Speaker 3>cores instead of a few really smart ones.

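NOTE
Editor's sketch, not part of the spoken audio: a minimal Python illustration of the
pixel-independence point made above. The shade() function is a hypothetical stand-in
for a real shader, and only a few rows are rendered to keep the example quick.
```python
WIDTH, HEIGHT = 1920, 1080  # the "nineteen twenty by ten eighty" monitor from the episode
def shade(x, y):
    """Hypothetical stand-in for a real shader: the color depends only on (x, y)."""
    return (x * 31 + y * 17) % 256
def render_rows(rows):
    """Shade the given rows; each pixel is computed without reading any other pixel."""
    return {y: [shade(x, y) for x in range(WIDTH)] for y in rows}
total_pixels = WIDTH * HEIGHT
# Render a few rows top to bottom, then the same rows in reverse order.
rows = [0, 1, 2, 3]
forward = render_rows(rows)
backward = render_rows(reversed(rows))
print(total_pixels)          # 2073600, i.e. roughly two million
print(forward == backward)   # True: order does not matter, so the work can be split freely
```
Because no pixel reads another pixel's result, the rows can be computed in any order,
or all at once, which is the property a GPU's many small cores exploit.
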
132
00:06:13.000 --> 00:06:15.480
<v Speaker 2>Okay, so we have this chip that was fundamentally designed

133
00:06:15.519 --> 00:06:18.079
<v Speaker 2>to run games like Doom and Quake. How do we

134
00:06:18.120 --> 00:06:20.839
<v Speaker 2>make the jump from rendering a virtual shotgun to running

135
00:06:20.920 --> 00:06:21.600
<v Speaker 2>ChatGPT?

136
00:06:21.959 --> 00:06:24.519
<v Speaker 3>This is where we hit the great convergence. Around the

137
00:06:24.600 --> 00:06:28.240
<v Speaker 3>late two thousands, AI researchers were hitting a massive wall.

138
00:06:28.720 --> 00:06:31.839
<v Speaker 3>They had these theoretical ideas about neural networks, which are

139
00:06:31.839 --> 00:06:36.519
<v Speaker 3>basically mathematical structures inspired by the biological human brain, but

140
00:06:36.680 --> 00:06:40.160
<v Speaker 3>actually training them was agonizingly slow because they were trying

141
00:06:40.160 --> 00:06:43.240
<v Speaker 3>to run them on those CPUs, the twelve geniuses, right?

142
00:06:43.319 --> 00:06:45.759
<v Speaker 3>And the geniuses were just getting bogged down because a

143
00:06:45.800 --> 00:06:49.199
<v Speaker 3>neural network, at its very core is just a massive

144
00:06:49.319 --> 00:06:52.959
<v Speaker 3>grid of numbers. In math, we call them matrices. To

145
00:06:53.079 --> 00:06:55.600
<v Speaker 3>train an AI, you have to multiply these giant grids

146
00:06:55.600 --> 00:06:58.720
<v Speaker 3>of numbers together, adjust the results slightly, and then do

147
00:06:58.800 --> 00:07:01.560
<v Speaker 3>it again billions of times, literally billions of times.

148
00:07:01.600 --> 00:07:03.800
<v Speaker 2>And I'm guessing matrix multiplication is...

149
00:07:03.800 --> 00:07:08.079
<v Speaker 3>It is embarrassingly parallel. Multiplying a massive matrix is really

150
00:07:08.120 --> 00:07:12.240
<v Speaker 3>just performing the exact same simple multiplication operation on thousands

151
00:07:12.279 --> 00:07:15.480
<v Speaker 3>of numbers at the exact same time. It turns out

152
00:07:15.800 --> 00:07:18.639
<v Speaker 3>the math required to simulate a photon of light bouncing

153
00:07:18.680 --> 00:07:20.720
<v Speaker 3>off a 3D wall in a video game is

154
00:07:20.759 --> 00:07:23.319
<v Speaker 3>almost identical to the math required to simulate a virtual

155
00:07:23.360 --> 00:07:25.680
<v Speaker 3>neuron firing in an artificial brain.

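NOTE
Editor's sketch, not part of the spoken audio: a tiny pure-Python matrix multiply,
to make the "embarrassingly parallel" claim concrete. The input and weight values
are made up for illustration; real training code would use a GPU library instead.
```python
def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m) using plain nested lists."""
    n, k, m = len(a), len(b), len(b[0])
    # Every output cell (i, j) is its own dot product and reads only row i of a and
    # column j of b, so all n * m cells could be handed to separate GPU threads.
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)] for i in range(n)]
inputs = [[1.0, 2.0], [3.0, 4.0]]     # two samples, two features (made-up numbers)
weights = [[0.5, -1.0], [0.25, 2.0]]  # one tiny "layer" of weights (made-up numbers)
outputs = matmul(inputs, weights)
print(outputs)  # [[1.0, 3.0], [2.5, 5.0]]
```
Here there are only four output cells; in a large neural network layer there are
millions, each just as independent of the others.
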
156
00:07:25.800 --> 00:07:27.800
<v Speaker 2>That is just such a wild coincidence to me.

157
00:07:28.120 --> 00:07:30.600
<v Speaker 3>It is the happy accident that gave us the entire

158
00:07:30.639 --> 00:07:31.319
<v Speaker 3>modern world.

159
00:07:31.480 --> 00:07:34.800
<v Speaker 2>So when did the industry actually realize this? Was there

160
00:07:34.800 --> 00:07:37.199
<v Speaker 2>a specific moment where the light bulb suddenly went on

161
00:07:37.240 --> 00:07:38.759
<v Speaker 2>for everyone?

162
00:07:38.759 --> 00:07:41.160
<v Speaker 3>There was a big bang moment. It was twenty twelve, a competition

163
00:07:41.240 --> 00:07:42.040
<v Speaker 3>called ImageNet.

164
00:07:41.839 --> 00:07:45.040
<v Speaker 2>Set the scene for us. What exactly was ImageNet?

165
00:07:44.839 --> 00:07:48.000
<v Speaker 3>ImageNet was basically the Olympics of computer vision. You

166
00:07:48.040 --> 00:07:51.639
<v Speaker 3>had this massive dataset of millions of images, pictures

167
00:07:51.680 --> 00:07:56.120
<v Speaker 3>of cats, dogs, airplanes, strawberries, and researchers had to write

168
00:07:56.519 --> 00:07:59.600
<v Speaker 3>software that could look at the pixels and identify what

169
00:07:59.639 --> 00:08:00.319
<v Speaker 3>was actually in the picture.

170
00:08:00.240 --> 00:08:03.600
<v Speaker 2>Which is incredibly hard for a computer.

171
00:08:03.759 --> 00:08:06.279
<v Speaker 3>Very hard. For years, the best teams in the world, mostly using

172
00:08:06.319 --> 00:08:10.319
<v Speaker 3>traditional hand-coded logic techniques, were stuck getting error rates

173
00:08:10.360 --> 00:08:11.399
<v Speaker 3>around twenty six percent.

174
00:08:11.560 --> 00:08:13.759
<v Speaker 2>That is not great. That is missing one out of

175
00:08:13.759 --> 00:08:14.720
<v Speaker 2>every four pictures.

176
00:08:14.759 --> 00:08:17.040
<v Speaker 3>It was the best we had at the time. Progress

177
00:08:17.120 --> 00:08:21.279
<v Speaker 3>was agonizingly slow. Most people thought true human level computer

178
00:08:21.360 --> 00:08:25.199
<v Speaker 3>vision was decades away. But then in twenty twelve, this

179
00:08:25.319 --> 00:08:28.920
<v Speaker 3>small team from the University of Toronto shows up: Alex Krizhevsky,

180
00:08:29.399 --> 00:08:31.959
<v Speaker 3>Ilya Sutskever, and Geoffrey Hinton.

181
00:08:31.920 --> 00:08:34.840
<v Speaker 2>And those names, I mean, Ilya Sutskever and Geoffrey Hinton.

182
00:08:35.080 --> 00:08:39.120
<v Speaker 2>These are the absolute titans of AI today, but back

183
00:08:39.159 --> 00:08:40.600
<v Speaker 2>then they were kind of the outsiders.

184
00:08:40.679 --> 00:08:43.360
<v Speaker 3>Right, they were the crazy ones. Neural networks were widely

185
00:08:43.399 --> 00:08:46.679
<v Speaker 3>considered a dead end by most serious computer scientists, but

186
00:08:46.799 --> 00:08:50.080
<v Speaker 3>this team entered a neural network they called AlexNet,

187
00:08:50.519 --> 00:08:53.360
<v Speaker 3>and it didn't just win the competition, it utterly destroyed

188
00:08:53.360 --> 00:08:55.639
<v Speaker 3>the field. They dropped the error rate from twenty-six

189
00:08:55.639 --> 00:08:57.440
<v Speaker 3>percent down to fifteen percent.

190
00:08:57.360 --> 00:09:00.759
<v Speaker 2>In a single year. That is unprecedented for that competition.

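NOTE
Editor's sketch, not part of the spoken audio: a quick check of how large that drop
is in relative terms, using only the two error rates quoted in the episode.
```python
old_error, new_error = 0.26, 0.15   # error rates as quoted by the hosts
relative_drop = (old_error - new_error) / old_error
print(f"{relative_drop:.0%}")       # 42%
```
In other words, by the hosts' figures, AlexNet removed a bit over two fifths of the
remaining mistakes in one step.
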
191
00:09:00.840 --> 00:09:03.879
<v Speaker 3>In one year. It was a mathematical massacre. The entire

192
00:09:04.000 --> 00:09:07.080
<v Speaker 3>conference room went completely silent when they presented. But here's

193
00:09:07.080 --> 00:09:09.720
<v Speaker 3>the specific detail that matters for our story today. To

194
00:09:09.799 --> 00:09:13.360
<v Speaker 3>train AlexNet, they didn't use a massive government supercomputer.

195
00:09:13.919 --> 00:09:15.879
<v Speaker 3>They didn't use a giant server cluster.

196
00:09:16.039 --> 00:09:16.840
<v Speaker 2>What did they use?

197
00:09:16.919 --> 00:09:20.399
<v Speaker 3>They literally went to a consumer electronics store and bought

198
00:09:20.440 --> 00:09:24.039
<v Speaker 3>two Nvidia GTX 580 graphics cards.

199
00:09:23.960 --> 00:09:26.720
<v Speaker 2>Two gamer cards, the exact kind of thing you would

200
00:09:26.759 --> 00:09:29.480
<v Speaker 2>put in a desktop PC to play Skyrim.

201
00:09:29.240 --> 00:09:32.519
<v Speaker 3>Exactly. Two cards that cost maybe five hundred dollars each

202
00:09:32.559 --> 00:09:35.279
<v Speaker 3>at the time. They shoved them into a standard PC.

203
00:09:35.840 --> 00:09:38.039
<v Speaker 3>They wrote some custom code to move the math off

204
00:09:38.080 --> 00:09:41.080
<v Speaker 3>the CPU and onto the GPU, and they suddenly realized

205
00:09:41.120 --> 00:09:43.879
<v Speaker 3>they could train their model in a matter of days instead of months.

206
00:09:43.559 --> 00:09:47.039
<v Speaker 2>And that is the true aha moment, because

207
00:09:47.080 --> 00:09:49.639
<v Speaker 2>if you can iterate in days, you can actually learn

208
00:09:49.720 --> 00:09:52.480
<v Speaker 2>and adapt. If an experiment takes six months, you are

209
00:09:52.559 --> 00:09:53.200
<v Speaker 2>just stuck.

210
00:09:53.480 --> 00:09:57.039
<v Speaker 3>Exactly. Speed is intelligence in this field. If you can

211
00:09:57.080 --> 00:09:59.159
<v Speaker 3>run one hundred experiments in the time it takes your

212
00:09:59.240 --> 00:10:03.639
<v Speaker 3>rival to run one, you get smarter incredibly fast. Honestly,

213
00:10:03.720 --> 00:10:06.080
<v Speaker 3>Nvidia's stock price chart should basically have a little

214
00:10:06.120 --> 00:10:08.679
<v Speaker 3>bronze statue of AlexNet next to it. But this

215
00:10:08.720 --> 00:10:10.159
<v Speaker 3>is the deep-dive nuance we need to hit.

216
00:10:10.240 --> 00:10:12.080
<v Speaker 3>It wasn't just the physical hardware that made this possible.

217
00:10:11.919 --> 00:10:14.559
<v Speaker 2>Right, because you cannot just plug a video

218
00:10:14.600 --> 00:10:16.840
<v Speaker 2>card into a motherboard and tell it to learn English.

219
00:10:16.879 --> 00:10:19.639
<v Speaker 2>It inherently speaks graphics. It doesn't speak math.

220
00:10:19.600 --> 00:10:22.960
<v Speaker 3>Correct. And this is where Nvidia's CEO, Jensen Huang,

221
00:10:23.080 --> 00:10:27.480
<v Speaker 3>showed just incredible, almost prophetic foresight. Years before AlexNet,

222
00:10:27.519 --> 00:10:29.320
<v Speaker 3>way back in two thousand and six, Nvidia released

223
00:10:29.320 --> 00:10:32.279
<v Speaker 3>a software platform called CUDA, C-U-D-A.

224
00:10:33.039 --> 00:10:36.120
<v Speaker 2>I see this acronym constantly when reading about this space.

225
00:10:36.720 --> 00:10:41.240
<v Speaker 2>It is usually described as Nvidia's massive moat. What is

226
00:10:41.279 --> 00:10:42.879
<v Speaker 2>it actually doing under the hood?

227
00:10:43.000 --> 00:10:45.279
<v Speaker 3>Well, before CUDA existed, if you wanted to use a

228
00:10:45.320 --> 00:10:48.200
<v Speaker 3>GPU for scientific math, you basically had to hack it.

229
00:10:48.480 --> 00:10:50.879
<v Speaker 3>You literally had to trick the graphics card into thinking

230
00:10:50.919 --> 00:10:53.879
<v Speaker 3>your math problem was actually a texture or a shadow

231
00:10:53.919 --> 00:10:56.960
<v Speaker 3>on a polygon. It was an incredibly painful process.

232
00:10:57.039 --> 00:11:00.000
<v Speaker 2>Please render this massive spreadsheet as an explosion.

233
00:11:00.200 --> 00:11:04.080
<v Speaker 3>Basically, yes, it was a total nightmare for researchers. CUDA

234
00:11:04.240 --> 00:11:06.879
<v Speaker 3>changed all of that. It was a software layer that

235
00:11:06.960 --> 00:11:10.639
<v Speaker 3>let normal programmers write standard code, like C++,

236
00:11:10.919 --> 00:11:14.279
<v Speaker 3>that ran directly on the GPU. It exposed the raw

237
00:11:14.360 --> 00:11:17.960
<v Speaker 3>mathematical power of the chip without all the annoying graphics baggage.

238
00:11:18.120 --> 00:11:21.600
<v Speaker 2>So Nvidia essentially built the translation layer before anyone even

239
00:11:21.639 --> 00:11:23.519
<v Speaker 2>really knew what language they wanted to speak.

240
00:11:23.600 --> 00:11:26.120
<v Speaker 3>Jensen Huang practically bet the entire company on it,

241
00:11:26.279 --> 00:11:28.679
<v Speaker 3>and Wall Street absolutely hated it at the time. Investors

242
00:11:28.720 --> 00:11:30.960
<v Speaker 3>were furious. They said, why are you spending billions of

243
00:11:30.960 --> 00:11:33.039
<v Speaker 3>dollars on R and D for a feature that only

244
00:11:33.080 --> 00:11:35.200
<v Speaker 3>a few academic weirdos in universities use?

245
00:11:35.480 --> 00:11:39.000
<v Speaker 2>And then a few years later those weirdos invented modern AI.

246
00:11:39.480 --> 00:11:43.759
<v Speaker 3>And because all those weirdos learned to code specifically in CUDA,

247
00:11:44.519 --> 00:11:48.159
<v Speaker 3>the entire foundation of modern AI was built on top

248
00:11:48.200 --> 00:11:51.639
<v Speaker 3>of Nvidia's proprietary software. Now, if you are a hardware

249
00:11:51.679 --> 00:11:53.679
<v Speaker 3>startup today and you want to build a brand new

250
00:11:53.720 --> 00:11:57.240
<v Speaker 3>chip to beat Nvidia, you have a massive, massive problem.

251
00:11:56.879 --> 00:11:59.679
<v Speaker 2>Because nobody knows how to program your new chip.

252
00:12:00.080 --> 00:12:02.559
<v Speaker 3>Exactly. All the libraries, all the developer tools, all the research,

253
00:12:02.639 --> 00:12:07.600
<v Speaker 3>they all natively speak CUDA. It is the classic ecosystem

254
00:12:07.720 --> 00:12:10.399
<v Speaker 3>lock-in. It is like Windows in the nineties or

255
00:12:10.440 --> 00:12:13.399
<v Speaker 3>the iPhone app store today. It is incredibly difficult to

256
00:12:13.399 --> 00:12:14.120
<v Speaker 3>break that habit.

257
00:12:14.399 --> 00:12:16.639
<v Speaker 2>Let us fast forward to today then, because we are

258
00:12:16.720 --> 00:12:19.360
<v Speaker 2>obviously not using five-hundred-dollar GTX 580s anymore.

259
00:12:19.399 --> 00:12:21.440
<v Speaker 2>We're using the H100. This is the chip

260
00:12:21.480 --> 00:12:23.759
<v Speaker 2>that companies are fighting over, the one Mark Zuckerberg is

261
00:12:23.759 --> 00:12:26.000
<v Speaker 2>supposedly buying three hundred and fifty thousand of them.

262
00:12:25.679 --> 00:12:28.080
<v Speaker 3>The H100 is a monster. It

263
00:12:28.159 --> 00:12:29.799
<v Speaker 3>is a true marvel of human engineering.

264
00:12:29.840 --> 00:12:32.039
<v Speaker 2>Give me the physical stats. What are we actually looking at here?

265
00:12:32.159 --> 00:12:36.120
<v Speaker 3>It is a slab of silicon that has eighty billion

266
00:12:36.399 --> 00:12:39.759
<v Speaker 3>individual transistors carved into it using a four-nanometer

267
00:12:39.960 --> 00:12:43.759
<v Speaker 3>manufacturing process. Just wrap your head around that: eighty billion

268
00:12:43.799 --> 00:12:47.080
<v Speaker 3>on one chip. But honestly, the raw transistor count isn't

269
00:12:47.080 --> 00:12:50.159
<v Speaker 3>even the most impressive part. It is how highly specialized

270
00:12:50.159 --> 00:12:53.320
<v Speaker 3>the architecture has become. Specialized in what way? Remember how

271
00:12:53.360 --> 00:12:56.639
<v Speaker 3>the old GPUs were fairly general-purpose for graphics? The

272
00:12:56.799 --> 00:12:59.720
<v Speaker 3>H100 is designed specifically from the ground up

273
00:12:59.799 --> 00:13:02.440
<v Speaker 3>for the math of transformers, which is the T

274
00:13:02.799 --> 00:13:06.639
<v Speaker 3>in ChatGPT. It has specific hardware units inside it

275
00:13:06.679 --> 00:13:07.639
<v Speaker 3>called tensor cores.

276
00:13:07.279 --> 00:13:08.559
<v Speaker 2>Tensor cores.

277
00:13:08.759 --> 00:13:11.080
<v Speaker 3>Think of them as dedicated calculator services that do nothing

278
00:13:11.120 --> 00:13:14.320
<v Speaker 3>but matrix multiplication. They cannot render graphics, they cannot run

279
00:13:14.360 --> 00:13:16.799
<v Speaker 3>an operating system. They just do that one specific math

280
00:13:16.840 --> 00:13:20.360
<v Speaker 3>operation incredibly fast. The H100 can perform roughly

281
00:13:20.399 --> 00:13:23.840
<v Speaker 3>four thousand trillion floating-point operations per second if you

282
00:13:23.919 --> 00:13:25.080
<v Speaker 3>use the right precision levels.

283
00:13:25.279 --> 00:13:28.720
<v Speaker 2>Four thousand trillion operations per second. That is unfathomable.

284
00:13:28.799 --> 00:13:31.440
<v Speaker 3>But here's the crazy part. Raw compute speed is actually

285
00:13:31.480 --> 00:13:34.080
<v Speaker 3>the easy part of chip design. Now, the real bottleneck,

286
00:13:34.080 --> 00:13:36.360
<v Speaker 3>the thing that actually keeps chip architects up at night,

287
00:13:36.440 --> 00:13:36.960
<v Speaker 3>is memory.

288
00:13:37.240 --> 00:13:39.919
<v Speaker 2>This is the memory wall concept I keep reading about, right?

289
00:13:40.200 --> 00:13:42.480
<v Speaker 3>It simply does not matter if your processor brain

290
00:13:42.519 --> 00:13:45.000
<v Speaker 3>can think a billion thoughts a second if you cannot

291
00:13:45.000 --> 00:13:47.600
<v Speaker 3>get the data into the brain fast enough. A super

292
00:13:47.639 --> 00:13:50.440
<v Speaker 3>fast chip with slow memory is like a Ferrari with

293
00:13:50.519 --> 00:13:53.759
<v Speaker 3>a clogged fuel line. It just stalls out. So the

294
00:13:53.960 --> 00:13:56.600
<v Speaker 3>H100 uses a brand new type of memory

295
00:13:56.639 --> 00:13:59.399
<v Speaker 3>called HBM, high-bandwidth memory.

296
00:13:59.559 --> 00:14:01.879
<v Speaker 2>How does that solve the fuel line problem?

297
00:14:01.960 --> 00:14:05.679
<v Speaker 3>It is stacked vertically. They literally build skyscrapers of memory

298
00:14:05.759 --> 00:14:09.480
<v Speaker 3>chips directly on top of the processor itself to physically

299
00:14:09.480 --> 00:14:12.039
<v Speaker 3>shorten the distance the electrical signals have to travel.

300
00:14:12.200 --> 00:14:14.919
<v Speaker 2>So they are building 3D towers of memory right

301
00:14:15.000 --> 00:14:18.000
<v Speaker 2>next to the logic cores just to save the fractions

302
00:14:18.000 --> 00:14:20.240
<v Speaker 2>of a nanosecond it takes for the signal to travel

303
00:14:20.240 --> 00:14:21.679
<v Speaker 2>across a standard motherboard.

304
00:14:21.799 --> 00:14:24.200
<v Speaker 3>We are actively fighting the speed of light at this point.

305
00:14:24.840 --> 00:14:27.559
<v Speaker 3>The H100 has a memory bandwidth of over

306
00:14:27.639 --> 00:14:31.080
<v Speaker 3>three point three terabytes per second. To put that in perspective,

307
00:14:31.440 --> 00:14:34.279
<v Speaker 3>that is like downloading thousands of full 4K movies

308
00:14:34.320 --> 00:14:36.720
<v Speaker 3>in a single second. It is an absolute fire hose

309
00:14:36.759 --> 00:14:37.159
<v Speaker 3>of data.

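NOTE
Editor's sketch, not part of the spoken audio: a back-of-the-envelope look at the
memory wall using only the two figures quoted by the hosts. Both constants are the
episode's round numbers, not measured values, and the result is only a rough ratio.
```python
PEAK_FLOPS = 4000e12     # "roughly four thousand trillion" operations per second, as quoted
MEM_BANDWIDTH = 3.3e12   # "over three point three terabytes per second", as quoted
# Operations the chip must perform per byte fetched to avoid waiting on memory.
flops_per_byte = PEAK_FLOPS / MEM_BANDWIDTH
print(round(flops_per_byte))  # 1212
```
By these numbers the chip has to do on the order of a thousand operations on every
byte it loads, or the compute units sit idle waiting for data.
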
310
00:14:37.200 --> 00:14:39.799
<v Speaker 2>And they use something called NVLink to string them together, right?

311
00:14:40.200 --> 00:14:45.080
<v Speaker 3>Yes, NVLink is their proprietary interconnect, because one H100

312
00:14:45.159 --> 00:14:48.440
<v Speaker 3>isn't enough. You need thousands of them to function

313
00:14:48.519 --> 00:14:52.600
<v Speaker 3>as one giant unified brain. NVLink is the nervous system

314
00:14:52.639 --> 00:14:54.519
<v Speaker 3>that lets them talk to each other fast enough to

315
00:14:54.559 --> 00:14:55.559
<v Speaker 3>stay synchronized.

316
00:14:55.799 --> 00:14:58.960
<v Speaker 2>And yet despite all of that insane power and the

317
00:14:59.039 --> 00:15:02.159
<v Speaker 2>CUDA moat, Nvidia is not the only player in

318
00:15:02.200 --> 00:15:05.159
<v Speaker 2>town anymore. Which brings us to this sleeping giant that

319
00:15:05.240 --> 00:15:06.919
<v Speaker 2>suddenly woke up: Google.

320
00:15:07.399 --> 00:15:10.080
<v Speaker 3>This is truly one of my favorite corporate history stories

321
00:15:10.360 --> 00:15:12.679
<v Speaker 3>because while everyone else in the world was just blindly

322
00:15:12.720 --> 00:15:16.000
<v Speaker 3>buying Nvidia chips, Google looked at their internal usage

323
00:15:16.080 --> 00:15:17.960
<v Speaker 3>data and absolutely freaked out.

324
00:15:18.000 --> 00:15:20.240
<v Speaker 2>This was back around twenty thirteen, right? Twenty thirteen.

325
00:15:20.320 --> 00:15:22.200
<v Speaker 3>Yeah, yeah. Google engineers did a back-of-the-napkin

326
00:15:22.200 --> 00:15:25.440
<v Speaker 3>calculation that terrified them. They looked at the rapid

327
00:15:25.519 --> 00:15:28.399
<v Speaker 3>rise of voice search on Android phones and they realized

328
00:15:28.440 --> 00:15:31.840
<v Speaker 3>that if every single Android user used voice search for

329
00:15:32.080 --> 00:15:33.559
<v Speaker 3>just three minutes a day.

330
00:15:33.639 --> 00:15:36.320
<v Speaker 2>Just three minutes, that is like two quick searches.

331
00:15:35.960 --> 00:15:39.080
<v Speaker 3>Exactly, almost nothing. But they calculated that those three minutes

332
00:15:39.080 --> 00:15:41.840
<v Speaker 3>would require so much compute power to process the speech

333
00:15:41.879 --> 00:15:46.120
<v Speaker 3>recognition that it would completely double Google's entire global data center footprint.

334
00:15:45.919 --> 00:15:48.639
<v Speaker 2>Doubled their entire footprint?

335
00:15:48.720 --> 00:15:51.080
<v Speaker 3>They would have had to build twice as many data

336
00:15:51.080 --> 00:15:53.559
<v Speaker 3>centers as they had built in their entire corporate history

337
00:15:54.039 --> 00:15:57.519
<v Speaker 3>just to support three minutes of voice search. They realized

338
00:15:57.639 --> 00:16:01.039
<v Speaker 3>instantly that if they relied on buying standard Intel CPUs

339
00:16:01.039 --> 00:16:05.000
<v Speaker 3>and Nvidia GPUs, they would literally go bankrupt. The economics

340
00:16:05.080 --> 00:16:07.159
<v Speaker 3>just flat out did not work at that scale.

341
00:16:07.240 --> 00:16:10.440
<v Speaker 2>So, in classic Google fashion, they just decided, we will

342
00:16:10.440 --> 00:16:11.600
<v Speaker 2>build our own hardware.

343
00:16:12.080 --> 00:16:15.639
<v Speaker 3>They launched a highly secret internal project to build the TPU,

344
00:16:16.080 --> 00:16:20.840
<v Speaker 3>the tensor processing unit, and their design philosophy was incredibly

345
00:16:20.960 --> 00:16:25.279
<v Speaker 3>radical compared to Nvidia. Because Nvidia sells GPUs to everyone, right,

346
00:16:25.480 --> 00:16:28.559
<v Speaker 3>they have to be good at gaming, crypto mining, self-driving cars,

347
00:16:28.799 --> 00:16:32.440
<v Speaker 3>scientific simulations. Google said, we do not care about gaming,

348
00:16:32.480 --> 00:16:34.840
<v Speaker 3>We do not care about graphics at all. We want

349
00:16:34.879 --> 00:16:38.000
<v Speaker 3>a chip that does deep learning and absolutely nothing else.

350
00:16:38.039 --> 00:16:39.600
<v Speaker 2>So they stripped the sports car all the way down

351
00:16:39.600 --> 00:16:42.559
<v Speaker 2>to the chassis. No AC, no radio, just a massive engine.

352
00:16:42.639 --> 00:16:44.639
<v Speaker 3>Even the engine itself is totally different. They used a

353
00:16:44.639 --> 00:16:46.919
<v Speaker 3>specific architecture called a systolic array.

354
00:16:47.039 --> 00:16:49.159
<v Speaker 2>Systolic, like blood pressure? Like a heartbeat?

355
00:16:49.120 --> 00:16:52.039
<v Speaker 3>Exactly like a heartbeat. In a normal CPU

356
00:16:52.080 --> 00:16:54.919
<v Speaker 3>or GPU, the chip acts kind of like a library.

357
00:16:55.200 --> 00:16:56.679
<v Speaker 3>You go to the shelf to get a book which

358
00:16:56.720 --> 00:16:59.440
<v Speaker 3>is your data. You bring it to the desk, the processor,

359
00:16:59.519 --> 00:17:01.320
<v Speaker 3>you read it, and then you walk all the way

360
00:17:01.360 --> 00:17:03.720
<v Speaker 3>back to put it on the shelf. That walking back

361
00:17:03.720 --> 00:17:06.720
<v Speaker 3>and forth accessing the memory takes a massive amount of

362
00:17:06.799 --> 00:17:07.880
<v Speaker 3>energy and time.

363
00:17:07.880 --> 00:17:10.640
<v Speaker 2>And as we just established, energy and memory are the

364
00:17:10.720 --> 00:17:11.880
<v Speaker 2>ultimate enemies here.

365
00:17:11.599 --> 00:17:14.720
<v Speaker 3>Right. So in a systolic array, you do not

366
00:17:14.880 --> 00:17:17.240
<v Speaker 3>put the book back on the shelf. You process it,

367
00:17:17.359 --> 00:17:19.519
<v Speaker 3>and then you immediately hand it to the person sitting

368
00:17:19.599 --> 00:17:22.680
<v Speaker 3>right next to you. That data physically flows through the

369
00:17:22.720 --> 00:17:26.160
<v Speaker 3>grid of the chip in a wave. One calculation finishes

370
00:17:26.440 --> 00:17:29.359
<v Speaker 3>and simply pushes the result directly to the next math unit.

371
00:17:29.640 --> 00:17:32.359
<v Speaker 3>It heavily mimics a continuous flow of blood through a

372
00:17:32.359 --> 00:17:33.440
<v Speaker 3>circulatory system.

373
00:17:33.559 --> 00:17:35.680
<v Speaker 2>So the data just enters one side of the chip,

374
00:17:35.839 --> 00:17:38.960
<v Speaker 2>flows through this massive grid of math units getting multiplied,

375
00:17:39.240 --> 00:17:41.559
<v Speaker 2>and just pops out the other side as a finished result.

376
00:17:41.759 --> 00:17:46.200
<v Speaker 3>Exactly. It drastically reduced the need to constantly access external memory,

377
00:17:46.759 --> 00:17:50.480
<v Speaker 3>and the result was staggering. That first internal TPU they

378
00:17:50.559 --> 00:17:54.240
<v Speaker 3>deployed was roughly fifteen to thirty times more efficient per watt

379
00:17:54.480 --> 00:17:57.200
<v Speaker 3>than anything else available on the commercial market at the time.

380
00:17:57.359 --> 00:18:01.319
<v Speaker 2>That is an insane leap in efficiency. And Google didn't stop there.

381
00:18:01.359 --> 00:18:02.839
<v Speaker 2>They kept iterating on it.

382
00:18:02.920 --> 00:18:05.400
<v Speaker 3>Oh yeah. Version two came out in twenty seventeen, and

383
00:18:05.440 --> 00:18:07.599
<v Speaker 3>that was a huge deal because the first one could

384
00:18:07.599 --> 00:18:10.240
<v Speaker 3>only run models, they couldn't train them. V two added

385
00:18:10.319 --> 00:18:13.200
<v Speaker 3>full training capabilities, and today we are on V four

386
00:18:13.279 --> 00:18:16.319
<v Speaker 3>and V five. They are entirely liquid cooled now and

387
00:18:16.359 --> 00:18:18.880
<v Speaker 3>they deploy them in massive clusters they call pods.

388
00:18:19.240 --> 00:18:21.200
<v Speaker 2>Thousands of chips all wired together.

389
00:18:21.039 --> 00:18:23.480
<v Speaker 3>Right, and this brings up a massive advantage Google has

390
00:18:23.519 --> 00:18:27.880
<v Speaker 3>over almost everyone else. It is their interconnect system called ICI.

391
00:18:28.200 --> 00:18:30.680
<v Speaker 2>How is that different from Nvidia's NVLink?

392
00:18:31.119 --> 00:18:34.559
<v Speaker 3>Because Google completely controls their own data centers, they can

393
00:18:34.640 --> 00:18:37.480
<v Speaker 3>wire these TPUs directly to each other in what is

394
00:18:37.480 --> 00:18:40.200
<v Speaker 3>called a torus topology. Think of it like a giant

395
00:18:40.200 --> 00:18:43.599
<v Speaker 3>three D donut shape. They use direct optical links between

396
00:18:43.599 --> 00:18:45.319
<v Speaker 3>the chips. They don't have to route the data through

397
00:18:45.319 --> 00:18:49.519
<v Speaker 3>standard bulky networking switches. It makes those thousands of TPUs

398
00:18:49.759 --> 00:18:52.240
<v Speaker 3>act flawlessly as one single brain.

399
00:18:52.599 --> 00:18:54.720
<v Speaker 2>And this is why today when you look at Google,

400
00:18:54.759 --> 00:18:58.079
<v Speaker 2>they don't really buy Nvidia chips for their core

401
00:18:58.440 --> 00:19:01.799
<v Speaker 2>internal AI training. They use their own hardware.

402
00:19:01.960 --> 00:19:04.440
<v Speaker 3>They use TPUs for almost everything. Gemini, which is their

403
00:19:04.559 --> 00:19:08.559
<v Speaker 3>massive competitor to GPT-4, was trained entirely on TPUs.

404
00:19:08.640 --> 00:19:10.759
<v Speaker 3>It gives Google this incredible vertical integration.

405
00:19:10.839 --> 00:19:11.720
<v Speaker 2>They own the whole stack.

406
00:19:11.839 --> 00:19:14.240
<v Speaker 3>They own the chip design, the physical server rack, the

407
00:19:14.240 --> 00:19:18.319
<v Speaker 3>custom cooling systems, the software framework, which is TensorFlow or JAX,

408
00:19:18.440 --> 00:19:21.400
<v Speaker 3>and the final AI model itself. It is essentially the

409
00:19:21.440 --> 00:19:25.079
<v Speaker 3>Apple iPhone strategy, but applied to a warehouse sized data center.

410
00:19:25.200 --> 00:19:27.440
<v Speaker 3>They completely control their own destiny.

411
00:19:27.160 --> 00:19:29.200
<v Speaker 2>Which is an amazing position to be in. So we

412
00:19:29.279 --> 00:19:33.000
<v Speaker 2>have the reigning commercial champion in Nvidia. We have the independent,

413
00:19:33.200 --> 00:19:37.359
<v Speaker 2>vertically integrated superpower Google. But looking at the market right now,

414
00:19:37.400 --> 00:19:40.359
<v Speaker 2>it feels like the floodgates have totally opened. Every major

415
00:19:40.400 --> 00:19:43.480
<v Speaker 2>tech company is suddenly announcing their own custom chip.

416
00:19:43.799 --> 00:19:46.880
<v Speaker 3>It is the great me-too wave of silicon, and

417
00:19:46.920 --> 00:19:50.519
<v Speaker 3>it is driven by very simple, very ruthless economics. If

418
00:19:50.559 --> 00:19:55.400
<v Speaker 3>you are Amazon AWS or Microsoft Azure, you are currently

419
00:19:55.400 --> 00:19:59.119
<v Speaker 3>spending tens of billions of dollars a year buying chips

420
00:19:59.119 --> 00:20:02.319
<v Speaker 3>from Nvidia. That just vaporizes your profit margins.

421
00:20:02.440 --> 00:20:04.920
<v Speaker 2>And worse than that, you are entirely dependent on a

422
00:20:04.960 --> 00:20:08.839
<v Speaker 2>single supplier who literally cannot manufacture the chips fast enough

423
00:20:08.839 --> 00:20:10.440
<v Speaker 2>to meet your needs.

424
00:20:10.480 --> 00:20:13.440
<v Speaker 3>Exactly. So let's look at the broader landscape. You have Amazon AWS,

425
00:20:13.480 --> 00:20:16.640
<v Speaker 3>who took a very smart, bifurcated approach. They split the

426
00:20:16.640 --> 00:20:19.400
<v Speaker 3>AI problem in half. They built a chip called Trainium

427
00:20:19.480 --> 00:20:23.000
<v Speaker 3>specifically for training models, and a separate chip called Inferentia

428
00:20:23.119 --> 00:20:23.759
<v Speaker 3>for running them.

429
00:20:23.839 --> 00:20:26.160
<v Speaker 2>We actually really need to pause here and define this clearly,

430
00:20:26.200 --> 00:20:29.039
<v Speaker 2>because the difference between training and inference comes up constantly

431
00:20:29.039 --> 00:20:31.599
<v Speaker 2>in this space. What is the actual practical difference?

432
00:20:31.799 --> 00:20:33.880
<v Speaker 3>The best way to think about it is think of

433
00:20:34.039 --> 00:20:36.079
<v Speaker 3>training as graduate school.

434
00:20:36.200 --> 00:20:40.400
<v Speaker 2>Okay, graduate school, years of intense work, massive amounts of coffee,

435
00:20:40.440 --> 00:20:42.119
<v Speaker 2>incredibly expensive.

436
00:20:42.400 --> 00:20:45.440
<v Speaker 3>Exactly. Training is the phase where the AI is actively learning

437
00:20:45.480 --> 00:20:49.759
<v Speaker 3>from scratch. You are feeding it essentially the entire written

438
00:20:49.759 --> 00:20:53.359
<v Speaker 3>text of the Internet. It takes months of continuous run time.

439
00:20:53.799 --> 00:20:58.240
<v Speaker 3>You need massive, incredibly expensive compute clusters with thousands of

440
00:20:58.279 --> 00:21:02.200
<v Speaker 3>GPUs working in perfect unison. You need high-precision math

441
00:21:02.200 --> 00:21:04.880
<v Speaker 3>so the model can learn tiny nuances. This is the

442
00:21:04.920 --> 00:21:08.240
<v Speaker 3>graduate school phase, and this is where Nvidia absolutely dominates.

443
00:21:08.279 --> 00:21:11.039
<v Speaker 2>But eventually the model finishes its exams, it graduates.

444
00:21:11.079 --> 00:21:13.359
<v Speaker 3>It graduates, and it has to go get a job.

445
00:21:13.519 --> 00:21:16.440
<v Speaker 3>That job is inference, okay. Inference is what happens when

446
00:21:16.480 --> 00:21:19.000
<v Speaker 3>you open an app, ask ChatGPT a question, and

447
00:21:19.039 --> 00:21:21.799
<v Speaker 3>it types out an answer. The model is no longer learning.

448
00:21:21.839 --> 00:21:24.519
<v Speaker 3>Its weights are frozen. It is simply applying the knowledge

449
00:21:24.519 --> 00:21:27.039
<v Speaker 3>it had already gained in grad school to a new prompt.

450
00:21:26.680 --> 00:21:29.440
<v Speaker 2>And that happens in real time, instantly, right?

451
00:21:29.400 --> 00:21:32.079
<v Speaker 3>It happens in milliseconds, and it happens billions of times

452
00:21:32.079 --> 00:21:34.920
<v Speaker 3>a day across the world. And the hardware needs for

453
00:21:34.960 --> 00:21:39.279
<v Speaker 3>that day job are completely different than grad school. For inference,

454
00:21:39.519 --> 00:21:42.599
<v Speaker 3>you don't need massive precision. You care about latency, how

455
00:21:42.640 --> 00:21:45.319
<v Speaker 3>fast can I serve this answer to the user? And

456
00:21:45.359 --> 00:21:48.119
<v Speaker 3>you care deeply about costs and power consumption.

457
00:21:47.799 --> 00:21:51.359
<v Speaker 2>Because you only train the model once maybe twice a year,

458
00:21:52.240 --> 00:21:55.640
<v Speaker 2>but you run inference on it constantly every single second

459
00:21:55.640 --> 00:21:56.079
<v Speaker 2>of every day.

460
00:21:56.000 --> 00:21:59.759
<v Speaker 3>Precisely. As AI applications explode and get integrated into

461
00:21:59.799 --> 00:22:02.799
<v Speaker 3>every piece of software, the overall market for inference chips

462
00:22:02.839 --> 00:22:05.839
<v Speaker 3>is actually projected to grow much faster than the market

463
00:22:05.839 --> 00:22:09.440
<v Speaker 3>for training chips. That is exactly why Amazon built Inferentia.

464
00:22:09.960 --> 00:22:12.880
<v Speaker 3>It is designed to be a cheap, highly efficient chip just

465
00:22:12.920 --> 00:22:15.920
<v Speaker 3>for that day job workload. Microsoft is doing the exact

466
00:22:15.960 --> 00:22:18.160
<v Speaker 3>same thing with their Maia accelerator for Azure.

467
00:22:18.519 --> 00:22:21.799
<v Speaker 2>But what about Meta? Because Facebook and Instagram they aren't

468
00:22:21.839 --> 00:22:25.079
<v Speaker 2>selling cloud server space to startups like Amazon and Microsoft do.

469
00:22:25.480 --> 00:22:27.880
<v Speaker 2>Why are they spending billions to design their own chip?

470
00:22:28.119 --> 00:22:32.680
<v Speaker 3>Meta is a fascinating outlier here. Their core AI problem

471
00:22:32.839 --> 00:22:37.039
<v Speaker 3>is fundamentally different from OpenAI or Google. They aren't

472
00:22:37.079 --> 00:22:42.200
<v Speaker 3>primarily building conversational text chatbots. Their entire trillion dollar business

473
00:22:42.559 --> 00:22:44.319
<v Speaker 3>depends on recommendation engines.

474
00:22:44.599 --> 00:22:48.200
<v Speaker 2>Right, the algorithm deciding exactly which reel or ad to

475
00:22:48.200 --> 00:22:51.240
<v Speaker 2>show me next, so I do not close the app.

476
00:22:51.160 --> 00:22:54.720
<v Speaker 3>Exactly. And computationally speaking, a recommendation engine is a very weird

477
00:22:54.759 --> 00:22:59.000
<v Speaker 3>mathematical problem. It relies on something called embedding tables. These

478
00:22:59.039 --> 00:23:03.079
<v Speaker 3>are just astronomically massive databases that map out user preferences

479
00:23:03.079 --> 00:23:04.119
<v Speaker 3>and content features.

480
00:23:04.400 --> 00:23:06.519
<v Speaker 2>So how does a chip process that differently?

481
00:23:06.759 --> 00:23:09.319
<v Speaker 3>Well, when you are generating text with a language model,

482
00:23:09.480 --> 00:23:12.079
<v Speaker 3>the math is very dense and predictable. But when you

483
00:23:12.079 --> 00:23:15.039
<v Speaker 3>are pulling from embedding tables for an ad recommendation, the

484
00:23:15.079 --> 00:23:19.119
<v Speaker 3>memory access pattern is random, sparse, and chaotic. You're jumping

485
00:23:19.119 --> 00:23:21.359
<v Speaker 3>all over the place pulling bits of user history.

486
00:23:21.440 --> 00:23:24.279
<v Speaker 2>So a standard Nvidia GPU just isn't efficient for

487
00:23:24.319 --> 00:23:26.200
<v Speaker 2>that kind of chaotic memory access.

488
00:23:26.599 --> 00:23:30.559
<v Speaker 3>It is massive overkill in some compute areas and horribly

489
00:23:30.599 --> 00:23:33.839
<v Speaker 3>inefficient in memory access for others. So Meta designed their

490
00:23:33.880 --> 00:23:38.000
<v Speaker 3>own chip, the MTIA, the Meta Training and Inference Accelerator.

491
00:23:38.519 --> 00:23:42.000
<v Speaker 3>It is tuned specifically to handle the chaotic memory demands

492
00:23:42.279 --> 00:23:45.880
<v Speaker 3>of massive embedding tables. It really shows that the industry

493
00:23:45.960 --> 00:23:49.000
<v Speaker 3>is moving rapidly away from this idea of one chip

494
00:23:49.039 --> 00:23:51.680
<v Speaker 3>fits all and moving toward the right custom chip for

495
00:23:51.720 --> 00:23:53.160
<v Speaker 3>the specific math problem.

496
00:23:53.240 --> 00:23:55.160
<v Speaker 2>And Apple is doing this too, right with the Neural

497
00:23:55.200 --> 00:23:58.279
<v Speaker 2>Engine on iPhones, but their goal is keeping the AI

498
00:23:58.400 --> 00:24:01.359
<v Speaker 2>on the physical phone for privacy rather than sending it

499
00:24:01.359 --> 00:24:02.599
<v Speaker 2>to a cloud data center.

500
00:24:02.799 --> 00:24:05.960
<v Speaker 3>Exactly, on-device inference. Everyone is carving out their own

501
00:24:05.960 --> 00:24:06.880
<v Speaker 3>specialized niche.

502
00:24:06.920 --> 00:24:09.480
<v Speaker 2>Okay, we cannot talk about chip design without talking about

503
00:24:09.480 --> 00:24:12.599
<v Speaker 2>the rebels in the room, the startup landscape, because honestly,

504
00:24:12.640 --> 00:24:15.039
<v Speaker 2>it takes a certain level of sheer insanity to try

505
00:24:15.039 --> 00:24:17.519
<v Speaker 2>to start a hardware company from scratch to compete against

506
00:24:17.559 --> 00:24:20.799
<v Speaker 2>a giant like Nvidia, but people are actually doing it.

507
00:24:20.799 --> 00:24:23.839
<v Speaker 3>It is notoriously the hardest game of Silicon Valley. But

508
00:24:23.880 --> 00:24:26.440
<v Speaker 3>there are two startups right now that really highlight the

509
00:24:26.559 --> 00:24:30.759
<v Speaker 3>extreme physical limits we are pushing in chip architecture. Cerebras

510
00:24:30.799 --> 00:24:32.000
<v Speaker 3>and Groq.

511
00:24:32.000 --> 00:24:35.519
<v Speaker 2>Let us start with Cerebras Systems. These are the wafer-scale guys.

512
00:24:35.599 --> 00:24:35.920
<v Speaker 2>So, to

513
00:24:35.920 --> 00:24:38.200
<v Speaker 3>understand what Cerebras did, you have to look at how

514
00:24:38.279 --> 00:24:42.519
<v Speaker 3>chips are normally made. Normally, a factory takes a silicon wafer,

515
00:24:42.920 --> 00:24:46.240
<v Speaker 3>which is basically a shiny disc of silicon roughly the

516
00:24:46.279 --> 00:24:49.920
<v Speaker 3>size of a dinner plate. They print hundreds of identical

517
00:24:50.000 --> 00:24:52.680
<v Speaker 3>small chips onto that plate, and then they slice the

518
00:24:52.720 --> 00:24:54.160
<v Speaker 3>plate up into little squares.

519
00:24:54.599 --> 00:24:57.039
<v Speaker 2>And then you take those little individual squares, put them

520
00:24:57.039 --> 00:24:59.799
<v Speaker 2>in protective plastic packages and wire them all back together

521
00:24:59.799 --> 00:25:01.279
<v Speaker 2>on a big green motherboard.

522
00:25:01.440 --> 00:25:04.319
<v Speaker 3>Right, But that wiring them back together part, that is

523
00:25:04.359 --> 00:25:08.680
<v Speaker 3>the ultimate bottleneck. Moving data across copper wires between different

524
00:25:08.759 --> 00:25:11.960
<v Speaker 3>chips is painfully slow and burns a ton of energy.

525
00:25:12.440 --> 00:25:16.119
<v Speaker 3>So the founders of Cerebras just asked a seemingly crazy question,

526
00:25:16.720 --> 00:25:18.200
<v Speaker 3>why are we cutting the wafer at all?

527
00:25:18.319 --> 00:25:21.079
<v Speaker 2>They just use the entire dinner plate as one single chip.

528
00:25:21.200 --> 00:25:24.039
<v Speaker 3>The whole plate. The Cerebras Wafer Scale Engine is a

529
00:25:24.039 --> 00:25:26.720
<v Speaker 3>single chip roughly the size of an iPad. It contains

530
00:25:26.759 --> 00:25:29.359
<v Speaker 3>four trillion transistors.

531
00:25:29.039 --> 00:25:33.200
<v Speaker 2>Four trillion on one piece of silicon. Visually, it is

532
00:25:33.279 --> 00:25:36.559
<v Speaker 2>just such a cool concept. But practically, I mean, manufacturing

533
00:25:36.559 --> 00:25:39.319
<v Speaker 2>at the atomic level is not perfect. Usually, if a

534
00:25:39.400 --> 00:25:42.039
<v Speaker 2>tiny speck of dust ruins one chip on a wafer,

535
00:25:42.160 --> 00:25:44.240
<v Speaker 2>you just throw that one small square away and keep

536
00:25:44.279 --> 00:25:47.400
<v Speaker 2>the other three hundred. If the entire wafer is the chip,

537
00:25:47.640 --> 00:25:51.519
<v Speaker 2>doesn't one single manufacturing defect ruin the whole multi million

538
00:25:51.559 --> 00:25:52.119
<v Speaker 2>dollar plate?

539
00:25:52.519 --> 00:25:55.119
<v Speaker 3>You just nailed the exact reason no one ever successfully

540
00:25:55.160 --> 00:25:58.720
<v Speaker 3>did this before. It's called the yield problem. Cerebras had

541
00:25:58.720 --> 00:26:02.279
<v Speaker 3>to invent a completely novel networking architecture to route around

542
00:26:02.319 --> 00:26:05.359
<v Speaker 3>the physically broken parts on the fly. If a microscopic

543
00:26:05.400 --> 00:26:08.480
<v Speaker 3>section of the wafer has a manufacturing defect, the internal

544
00:26:08.519 --> 00:26:10.920
<v Speaker 3>software simply ignores it and routes the data to the

545
00:26:10.920 --> 00:26:11.640
<v Speaker 3>healthy neighbors.

546
00:26:11.759 --> 00:26:14.000
<v Speaker 2>It is like having a biological brain with a few

547
00:26:14.039 --> 00:26:17.200
<v Speaker 2>dead neurons. The overall network just adapts and routes around

548
00:26:17.240 --> 00:26:18.559
<v Speaker 2>the damage.

549
00:26:18.920 --> 00:26:22.319
<v Speaker 3>Exactly. And the massive benefit of doing this is unprecedented bandwidth.

550
00:26:22.880 --> 00:26:25.400
<v Speaker 3>Because everything, all the memory, and all the compute cores

551
00:26:25.440 --> 00:26:28.640
<v Speaker 3>is physically located on the exact same piece of unbroken silicon,

552
00:26:29.279 --> 00:26:32.839
<v Speaker 3>communication is instantaneous. You never have to wait for data

553
00:26:32.880 --> 00:26:36.480
<v Speaker 3>to travel across a slow external wire. It is an absolute

554
00:26:36.480 --> 00:26:38.279
<v Speaker 3>beast for training massive models.

555
00:26:38.640 --> 00:26:40.279
<v Speaker 2>And then on the other end of the spectrum there

556
00:26:40.400 --> 00:26:43.480
<v Speaker 2>is Groq, spelled with a Q. I have seen their

557
00:26:43.519 --> 00:26:46.279
<v Speaker 2>inference demos online where it just prints out paragraphs of

558
00:26:46.319 --> 00:26:49.440
<v Speaker 2>text instantly. It genuinely feels faster than human thought.

559
00:26:49.839 --> 00:26:53.319
<v Speaker 3>Groq took a completely different, almost philosophical approach. They looked

560
00:26:53.359 --> 00:26:56.720
<v Speaker 3>at the modern GPU and said, there is way too

561
00:26:56.799 --> 00:27:00.400
<v Speaker 3>much chaotic management going on inside this chip. The standard

562
00:27:00.480 --> 00:27:03.680
<v Speaker 3>GPU dedicates a massive amount of physical hardware and energy

563
00:27:04.279 --> 00:27:07.480
<v Speaker 3>just to scheduling, dynamically deciding which core should do which

564
00:27:07.519 --> 00:27:11.039
<v Speaker 3>math problem next. Groq stripped all of that dynamic management

565
00:27:11.079 --> 00:27:11.759
<v Speaker 3>out completely.

566
00:27:11.799 --> 00:27:13.920
<v Speaker 2>They made it strictly deterministic.

567
00:27:13.480 --> 00:27:18.480
<v Speaker 3>Yes, deterministic architecture. In their system, called an LPU or

568
00:27:18.599 --> 00:27:22.599
<v Speaker 3>language processing unit, the software compiler maps out exactly what

569
00:27:22.759 --> 00:27:25.839
<v Speaker 3>every single transistor will do at every single clock cycle

570
00:27:26.039 --> 00:27:28.119
<v Speaker 3>before the program even starts running.

571
00:27:28.000 --> 00:27:31.279
<v Speaker 2>Like a perfectly choreographed dance routine where everyone knows their

572
00:27:31.319 --> 00:27:33.960
<v Speaker 2>steps in advance, so you don't need a director shouting

573
00:27:34.079 --> 00:27:36.079
<v Speaker 2>orders in real time.

574
00:27:36.119 --> 00:27:40.079
<v Speaker 3>Exactly. There is zero hesitation. And crucially, they do not use

575
00:27:40.119 --> 00:27:44.000
<v Speaker 3>the massive slow external memory like HBM that Nvidia uses.

576
00:27:44.319 --> 00:27:48.200
<v Speaker 3>They exclusively use something called SRAM. What is SRAM? Static RAM.

577
00:27:48.519 --> 00:27:50.880
<v Speaker 3>It is a type of incredibly fast memory that lives

578
00:27:50.880 --> 00:27:53.359
<v Speaker 3>directly on the processor die itself right next to the

579
00:27:53.359 --> 00:27:56.519
<v Speaker 3>logic gates. It is vastly more expensive to manufacture, and

580
00:27:56.559 --> 00:27:58.799
<v Speaker 3>you physically cannot fit very much of it on a chip,

581
00:27:58.920 --> 00:28:01.440
<v Speaker 3>but it completely eliminates the delay of fetching data.

582
00:28:01.880 --> 00:28:04.920
<v Speaker 3>That is exactly why Groq is so blindingly fast at

583
00:28:04.960 --> 00:28:08.480
<v Speaker 3>generating text tokens. It is perfectly optimized for the inference

584
00:28:08.599 --> 00:28:10.839
<v Speaker 3>day job, where raw speed is everything.

585
00:28:10.920 --> 00:28:13.839
<v Speaker 2>But again, all these incredible hardware startups face the exact

586
00:28:13.880 --> 00:28:17.000
<v Speaker 2>same invisible wall we talked about earlier: software.

587
00:28:16.720 --> 00:28:20.839
<v Speaker 3>The CUDA ecosystem. You can literally build the fastest, most

588
00:28:20.839 --> 00:28:24.240
<v Speaker 3>beautiful chip in human history, but if a researcher's standard

589
00:28:24.319 --> 00:28:27.440
<v Speaker 3>PyTorch code does not run on it effortlessly out of

590
00:28:27.480 --> 00:28:30.279
<v Speaker 3>the box, nobody's going to buy it. Nvidia has

591
00:28:30.279 --> 00:28:34.000
<v Speaker 3>a fifteen year head start on software momentum. Breaking that

592
00:28:34.039 --> 00:28:37.759
<v Speaker 3>psychological and technical lock-in is arguably harder than breaking

593
00:28:37.759 --> 00:28:38.599
<v Speaker 3>the laws of physics.

594
00:28:39.359 --> 00:28:41.720
<v Speaker 2>Speaking of breaking things, let us zoom out to the

595
00:28:41.720 --> 00:28:44.880
<v Speaker 2>global map, because up until now we've just been talking

596
00:28:44.880 --> 00:28:48.400
<v Speaker 2>about corporate rivalries. But this isn't just about companies anymore.

597
00:28:48.640 --> 00:28:49.480
<v Speaker 2>It is about countries.

598
00:28:49.240 --> 00:28:53.039
<v Speaker 3>Geopolitics. This is where the story gets genuinely scary.

599
00:28:53.359 --> 00:28:55.279
<v Speaker 2>You mentioned at the very beginning that the supply chain

600
00:28:55.319 --> 00:28:58.200
<v Speaker 2>is highly concentrated. Walk us through just how fragile this

601
00:28:58.319 --> 00:28:59.200
<v Speaker 2>map actually is.

602
00:28:59.599 --> 00:29:02.640
<v Speaker 3>Imagine if all the oil in the world, every single drop,

603
00:29:02.759 --> 00:29:06.000
<v Speaker 3>was pumped from just three buildings. That is the modern

604
00:29:06.039 --> 00:29:09.079
<v Speaker 3>semiconductor industry. Let us start with the machine that actually

605
00:29:09.079 --> 00:29:13.240
<v Speaker 3>prints the chips: ASML, headquartered in the Netherlands. They are

606
00:29:13.279 --> 00:29:17.440
<v Speaker 3>the sole manufacturer of extreme ultraviolet or EUV lithography machines.

607
00:29:17.920 --> 00:29:21.720
<v Speaker 3>To even begin to understand how insanely complex this machine is,

608
00:29:22.000 --> 00:29:25.079
<v Speaker 3>these machines generate the EUV light needed to print atomic

609
00:29:25.160 --> 00:29:29.200
<v Speaker 3>level circuits by dropping a microscopic droplet of molten tin

610
00:29:29.599 --> 00:29:33.160
<v Speaker 3>inside a vacuum chamber. Molten tin falling through a vacuum? Yes,

611
00:29:33.319 --> 00:29:36.319
<v Speaker 3>and as it falls, they hit that microscopic droplet with

612
00:29:36.359 --> 00:29:39.599
<v Speaker 3>a high powered laser. The impact flattens the droplet into

613
00:29:39.640 --> 00:29:42.799
<v Speaker 3>a pancake shape, and then a microsecond later they hit

614
00:29:42.839 --> 00:29:46.720
<v Speaker 3>it again with a second, infinitely more powerful laser. This

615
00:29:46.839 --> 00:29:50.400
<v Speaker 3>instantly vaporizes the tin into a plasma, which emits a

616
00:29:50.480 --> 00:29:55.039
<v Speaker 3>very specific thirteen point five nanometer wavelength of extreme ultraviolet light.

617
00:29:54.960 --> 00:29:57.200
<v Speaker 2>That sounds like a weapon from Star Trek. And how

618
00:29:57.240 --> 00:30:00.119
<v Speaker 2>often is it doing this laser plasma explosion?

619
00:30:00.000 --> 00:30:01.440
<v Speaker 3>Fifty thousand times a single second.

620
00:30:01.559 --> 00:30:03.000
<v Speaker 2>That is just incomprehensible.

621
00:30:03.039 --> 00:30:05.720
<v Speaker 3>It does that continuously fifty thousand times a second to

622
00:30:05.799 --> 00:30:09.000
<v Speaker 3>generate enough light to etch physical features onto silicon that

623
00:30:09.079 --> 00:30:11.640
<v Speaker 3>are literally the size of a few strands of human DNA.

624
00:30:11.759 --> 00:30:14.160
<v Speaker 2>And you are telling me only one single company on

625
00:30:14.240 --> 00:30:16.599
<v Speaker 2>planet Earth knows how to build this machine.

626
00:30:16.279 --> 00:30:20.359
<v Speaker 3>Only one ASML. If their primary factory in the Netherlands

627
00:30:20.440 --> 00:30:23.319
<v Speaker 3>experiences a major flood or a fire, Moore's law

628
00:30:23.400 --> 00:30:26.319
<v Speaker 3>simply ends. Period. Nobody else can make them.

629
00:30:26.359 --> 00:30:28.160
<v Speaker 2>And then once you have that two hundred million dollar

630
00:30:28.279 --> 00:30:32.240
<v Speaker 2>ASML machine, the chips themselves are mostly manufactured in Taiwan.

631
00:30:31.960 --> 00:30:36.960
<v Speaker 3>By TSMC, the Taiwan Semiconductor Manufacturing Company. They manufacture upwards

632
00:30:36.960 --> 00:30:39.880
<v Speaker 3>of ninety percent of the world's most advanced logic chips,

633
00:30:40.319 --> 00:30:43.480
<v Speaker 3>all of Apple's chips, all of Nvidia's advanced chips. They

634
00:30:43.519 --> 00:30:47.039
<v Speaker 3>all come out of TSMC fabs in Taiwan. This creates

635
00:30:47.039 --> 00:30:51.000
<v Speaker 3>a massive, glaring geopolitical vulnerability for the rest of the world.

636
00:30:51.240 --> 00:30:53.680
<v Speaker 3>If China were to blockade the island of Taiwan, or

637
00:30:53.720 --> 00:30:56.559
<v Speaker 3>if there was just a catastrophic earthquake there, the entire

638
00:30:56.640 --> 00:30:59.720
<v Speaker 3>global economy would lose its primary computing engine overnight.

639
00:31:00.160 --> 00:31:03.519
<v Speaker 2>And this sheer panic over that vulnerability is exactly why

640
00:31:03.559 --> 00:31:05.400
<v Speaker 2>the United States initiated the chip war.

641
00:31:05.720 --> 00:31:08.759
<v Speaker 3>Exactly. The US government looked closely at this supply chain

642
00:31:08.799 --> 00:31:12.480
<v Speaker 3>map and realized that AI is fundamentally a dual use technology.

643
00:31:13.000 --> 00:31:15.039
<v Speaker 3>The exact same H one hundred chip that runs a

644
00:31:15.039 --> 00:31:18.279
<v Speaker 3>friendly customer service chatbot can easily be used to model

645
00:31:18.279 --> 00:31:22.559
<v Speaker 3>the aerodynamics of hypersonic nuclear missiles or orchestrate massive cyber

646
00:31:22.599 --> 00:31:24.440
<v Speaker 3>warfare campaigns at a global scale.

647
00:31:24.519 --> 00:31:27.799
<v Speaker 2>So the US government stepped in with heavy export controls.

648
00:31:27.960 --> 00:31:30.920
<v Speaker 3>Starting heavily in twenty twenty two, the US Department of

649
00:31:30.960 --> 00:31:35.039
<v Speaker 3>Commerce outright banned the sale of Nvidia's absolute top tier

650
00:31:35.119 --> 00:31:37.400
<v Speaker 3>frontier chips, the A one hundred and H one hundred,

651
00:31:37.599 --> 00:31:42.319
<v Speaker 3>to any entity inside China. The explicit geopolitical goal was

652
00:31:42.359 --> 00:31:46.240
<v Speaker 3>to freeze China's AI progress in place, keeping them permanently

653
00:31:46.240 --> 00:31:48.359
<v Speaker 3>a generation or two behind American labs.

654
00:31:48.440 --> 00:31:51.400
<v Speaker 2>But Nvidia is a publicly traded company. They obviously did

655
00:31:51.400 --> 00:31:54.200
<v Speaker 2>not want to lose out on the massive Chinese tech market.

656
00:31:54.359 --> 00:31:57.240
<v Speaker 3>No. China represents billions of dollars in revenue for them,

657
00:31:57.279 --> 00:31:59.519
<v Speaker 3>so Nvidia engineers went back to the drawing board and

658
00:31:59.599 --> 00:32:01.559
<v Speaker 3>quickly designed new chips, the A eight hundred and the

659
00:32:01.559 --> 00:32:05.119
<v Speaker 3>H eight hundred. These were specific, slightly modified versions of

660
00:32:05.119 --> 00:32:08.440
<v Speaker 3>their flagship chips, designed specifically to comply with the exact

661
00:32:08.599 --> 00:32:10.240
<v Speaker 3>letter of the new US law.

662
00:32:10.640 --> 00:32:12.559
<v Speaker 2>How do they manage to cripple it enough to make

663
00:32:12.559 --> 00:32:14.960
<v Speaker 2>it legal? Did they just turn down the clock speed

664
00:32:15.000 --> 00:32:16.200
<v Speaker 2>and make the math slower?

665
00:32:16.400 --> 00:32:19.039
<v Speaker 3>No, and that is what was so incredibly clever about it.

666
00:32:19.119 --> 00:32:22.000
<v Speaker 3>They kept the raw compute speed exactly the same. The

667
00:32:22.160 --> 00:32:24.720
<v Speaker 3>H eight hundred could crunch matrix math just as fast

668
00:32:24.759 --> 00:32:26.640
<v Speaker 3>as the H one hundred. But what they did was

669
00:32:26.680 --> 00:32:30.119
<v Speaker 3>cut the interconnect speed, the NVLink communication speed, completely in half.

670
00:32:30.079 --> 00:32:32.559
<v Speaker 2>The wire speed between the chips.

671
00:32:32.839 --> 00:32:36.920
<v Speaker 3>Exactly. Why? Because if you cannot connect thousands of chips together

672
00:32:36.960 --> 00:32:41.279
<v Speaker 3>fast enough, you physically cannot build a cohesive supercomputer. You

673
00:32:41.319 --> 00:32:45.839
<v Speaker 3>cannot train a massive trillion parameter frontier model like GPT four.

674
00:32:46.200 --> 00:32:48.839
<v Speaker 3>If the chips can't talk to each other rapidly, the

675
00:32:48.920 --> 00:32:52.119
<v Speaker 3>data just bottlenecks. You can still do basic inference, you

676
00:32:52.119 --> 00:32:55.799
<v Speaker 3>can run smaller AI models locally, but you cannot create

677
00:32:55.839 --> 00:32:58.359
<v Speaker 3>the next generation of frontier AI.

678
00:32:58.279 --> 00:33:02.400
<v Speaker 2>That is such a hyper-specific surgical restriction, just snipping the

679
00:33:02.519 --> 00:33:03.720
<v Speaker 2>communication cables.

680
00:33:03.799 --> 00:33:07.000
<v Speaker 3>Essentially, it was brilliant engineering. But then the US government

681
00:33:07.039 --> 00:33:09.359
<v Speaker 3>looked at the H eight hundred, realized it was still

682
00:33:09.359 --> 00:33:11.799
<v Speaker 3>too powerful, and they abruptly tightened the rules again the

683
00:33:11.799 --> 00:33:15.519
<v Speaker 3>following year to ban even those workaround chips. It has

684
00:33:15.519 --> 00:33:19.319
<v Speaker 3>become this intense, high stakes game of regulatory cat and mouse.

685
00:33:19.440 --> 00:33:21.160
<v Speaker 2>And what is China doing in response to all this?

686
00:33:21.240 --> 00:33:23.039
<v Speaker 2>They surely aren't just throwing their hands up and giving

687
00:33:23.079 --> 00:33:24.519
<v Speaker 2>up on AI.

688
00:33:24.240 --> 00:33:27.960
<v Speaker 3>Not at all. And this brings up the massive unintended

689
00:33:27.960 --> 00:33:31.960
<v Speaker 3>consequence argument that policy experts are debating right now. By

690
00:33:32.000 --> 00:33:35.680
<v Speaker 3>aggressively cutting China off from the best Western hardware, we

691
00:33:35.839 --> 00:33:39.039
<v Speaker 3>basically forced them to heavily subsidize and build their own

692
00:33:39.079 --> 00:33:44.000
<v Speaker 3>completely independent supply chain. Huawei, despite massive sanctions, has released

693
00:33:44.000 --> 00:33:46.200
<v Speaker 3>an AI chip called the Ascend nine ten B.

694
00:33:47.039 --> 00:33:50.039
<v Speaker 2>Is it actually as good as an Nvidia H one hundred.

695
00:33:50.559 --> 00:33:53.920
<v Speaker 3>No, it is generally considered slower and less efficient. But

696
00:33:54.039 --> 00:33:56.359
<v Speaker 3>is it good enough to train large language models? Yes,

697
00:33:56.440 --> 00:33:59.519
<v Speaker 3>it absolutely is. And SMIC, which is China's state-backed

698
00:33:59.559 --> 00:34:02.839
<v Speaker 3>semiconductor manufacturer, is figuring out brilliant ways to

699
00:34:02.920 --> 00:34:08.039
<v Speaker 3>use older non EUV machines to painstakingly manufacture advanced seven

700
00:34:08.159 --> 00:34:09.039
<v Speaker 3>nanometer chips.

701
00:34:09.239 --> 00:34:11.599
<v Speaker 2>So by trying to completely starve them of chips, we

702
00:34:11.679 --> 00:34:14.760
<v Speaker 2>might have accidentally accelerated the exact thing we were terrified of:

703
00:34:14.880 --> 00:34:18.760
<v Speaker 2>full Chinese independence and self-sufficiency in cutting-edge silicon.

704
00:34:18.519 --> 00:34:21.280
<v Speaker 3>It is a very real possibility. We've pushed an economic superpower

705
00:34:21.360 --> 00:34:24.079
<v Speaker 3>into a desperate corner and they are actively engineering their

706
00:34:24.119 --> 00:34:24.639
<v Speaker 3>way out of it.

707
00:34:24.880 --> 00:34:27.880
<v Speaker 2>Meanwhile, the US is desperately trying to bring manufacturing back

708
00:34:27.920 --> 00:34:30.840
<v Speaker 2>onto domestic soil with the CHIPS Act.

709
00:34:30.639 --> 00:34:33.320
<v Speaker 3>Which is a massive piece of industrial policy. The US government

710
00:34:33.320 --> 00:34:36.920
<v Speaker 3>is spending over fifty two billion dollars to subsidize companies

711
00:34:37.119 --> 00:34:40.280
<v Speaker 3>like Intel to build fabs in Ohio and TSMC to

712
00:34:40.280 --> 00:34:41.480
<v Speaker 3>build fabs in Arizona.

713
00:34:41.880 --> 00:34:44.039
<v Speaker 2>But fifty two billion, I mean earlier you said one

714
00:34:44.119 --> 00:34:46.960
<v Speaker 2>single modern fab costs twenty billion dollars to build.

715
00:34:47.000 --> 00:34:49.719
<v Speaker 3>Exactly. The money goes fast, and it's not just about

716
00:34:49.719 --> 00:34:53.119
<v Speaker 3>the money. It is a severe talent deficit. We haven't

717
00:34:53.119 --> 00:34:55.960
<v Speaker 3>built leading edge silicon factories at scale in the US

718
00:34:56.000 --> 00:34:59.000
<v Speaker 3>for decades. We simply do not have the thousands of

719
00:34:59.000 --> 00:35:02.519
<v Speaker 3>specialized PhDs or the massive workforce of highly trained

720
00:35:02.559 --> 00:35:05.840
<v Speaker 3>clean room technicians required. You can pour concrete in a year,

721
00:35:06.360 --> 00:35:09.679
<v Speaker 3>but it takes a generation to rebuild that specialized human capacity.

722
00:35:10.199 --> 00:35:13.239
<v Speaker 2>Okay, we have covered the geopolitical wall, but there are

723
00:35:13.280 --> 00:35:16.039
<v Speaker 2>two other massive walls we are currently slamming into at

724
00:35:16.079 --> 00:35:18.280
<v Speaker 2>full speed: physics and energy.

725
00:35:18.440 --> 00:35:21.559
<v Speaker 3>Let us tackle physics first. Simply put, we are running

726
00:35:21.559 --> 00:35:22.199
<v Speaker 3>out of atoms.

727
00:35:22.320 --> 00:35:23.840
<v Speaker 2>The famous death of Moore's law.

728
00:35:24.119 --> 00:35:27.400
<v Speaker 3>Right. Moore's Law, for decades depended on our ability to

729
00:35:27.480 --> 00:35:30.440
<v Speaker 3>just keep physically shrinking transistors so we could pack more

730
00:35:30.440 --> 00:35:33.280
<v Speaker 3>of them onto the same sized chip. But we are

731
00:35:33.320 --> 00:35:36.960
<v Speaker 3>currently manufacturing at the three nanometer and two nanometer scale.

732
00:35:37.519 --> 00:35:41.199
<v Speaker 3>To give you perspective, a single silicon atom is roughly

733
00:35:41.320 --> 00:35:43.039
<v Speaker 3>point two nanometers wide.

734
00:35:42.559 --> 00:35:45.880
<v Speaker 2>So the physical wires inside these new chips are

735
00:35:45.880 --> 00:35:48.239
<v Speaker 2>what, ten or fifteen atoms across?

736
00:35:48.320 --> 00:35:51.360
<v Speaker 3>Exactly. We're dealing with structures that are literally counted in dozens

737
00:35:51.360 --> 00:35:53.800
<v Speaker 3>of atoms. And when you get down to that microscopic

738
00:35:53.880 --> 00:35:58.320
<v Speaker 3>quantum scale, classical physics breaks down. Quantum mechanics completely takes over.

739
00:35:58.599 --> 00:36:02.119
<v Speaker 3>Electrons stop behaving like predictable solid particles and they start

740
00:36:02.159 --> 00:36:03.079
<v Speaker 3>behaving like waves.

741
00:36:03.159 --> 00:36:05.719
<v Speaker 2>And what does an electron wave do inside a transistor?

742
00:36:06.000 --> 00:36:08.679
<v Speaker 3>It ignores the walls. It does something called quantum tunneling.

743
00:36:08.760 --> 00:36:11.800
<v Speaker 3>The electron literally teleports through the physical barrier that is

744
00:36:11.840 --> 00:36:14.920
<v Speaker 3>supposed to hold it back. You get massive electrical leakage,

745
00:36:15.079 --> 00:36:19.360
<v Speaker 3>you get uncontrollable heat. You physically, legally cannot shrink the

746
00:36:19.400 --> 00:36:22.360
<v Speaker 3>silicon gate much further because the laws of the universe

747
00:36:22.400 --> 00:36:23.039
<v Speaker 3>won't let you.

748
00:36:23.119 --> 00:36:25.079
<v Speaker 2>So how on earth do we keep making computers faster

749
00:36:25.159 --> 00:36:27.800
<v Speaker 2>every year if we cannot make the microscopic parts any smaller?

750
00:36:27.599 --> 00:36:30.480
<v Speaker 3>We completely change how we package them. This is

751
00:36:30.480 --> 00:36:32.360
<v Speaker 3>called the chiplet revolution.

752
00:36:32.599 --> 00:36:35.519
<v Speaker 2>Chiplets. So instead of putting one giant, monolithic chip, you use

753
00:36:35.599 --> 00:36:37.079
<v Speaker 2>lots of little pieces.

754
00:36:36.760 --> 00:36:39.800
<v Speaker 3>Exactly. Go back to that yield problem we talked about

755
00:36:39.800 --> 00:36:43.039
<v Speaker 3>with Cerebras. If you try to print a giant, complex

756
00:36:43.159 --> 00:36:46.159
<v Speaker 3>chip the size of a cracker, the mathematical odds of

757
00:36:46.199 --> 00:36:49.880
<v Speaker 3>a microscopic dust particle or a defect landing somewhere on

758
00:36:49.920 --> 00:36:54.159
<v Speaker 3>that large surface are extremely high. Maybe only forty percent

759
00:36:54.199 --> 00:36:56.800
<v Speaker 3>of the chips on your wafer actually work. That makes

760
00:36:56.800 --> 00:36:58.880
<v Speaker 3>them astronomically expensive to produce.

761
00:36:59.079 --> 00:37:02.800
<v Speaker 2>But if you intentionally print very tiny, simple chips, the

762
00:37:02.840 --> 00:37:03.239
<v Speaker 2>odds of.

763
00:37:03.199 --> 00:37:05.519
<v Speaker 3>A defect landing on a tiny footprint are very low.

764
00:37:05.599 --> 00:37:07.880
<v Speaker 3>You might get a ninety or ninety five percent yield.

765
00:37:08.039 --> 00:37:12.079
<v Speaker 3>So now companies like AMD and Intel are pivoting entirely.

766
00:37:12.519 --> 00:37:16.639
<v Speaker 3>They are printing small, specialized functional tiles, a compute tile,

767
00:37:17.159 --> 00:37:19.599
<v Speaker 3>a separate memory tile, an input-output tile, and they

768
00:37:19.599 --> 00:37:20.920
<v Speaker 3>are stitching them closely together after the fact.

769
00:37:20.960 --> 00:37:23.679
<v Speaker 2>It is exactly like building with Legos.

770
00:37:23.960 --> 00:37:27.280
<v Speaker 3>It is just like Legos, but they use incredibly advanced

771
00:37:27.320 --> 00:37:31.000
<v Speaker 3>packaging techniques, sometimes stacking them in three D so that

772
00:37:31.079 --> 00:37:34.840
<v Speaker 3>electrically to the software they act exactly like one single

773
00:37:34.920 --> 00:37:39.239
<v Speaker 3>unified chip. This modular chiplet design is realistically the only

774
00:37:39.320 --> 00:37:42.159
<v Speaker 3>way we can keep performance scaling up now that traditional

775
00:37:42.239 --> 00:37:44.000
<v Speaker 3>transistor shrinking is effectively dead.

776
00:37:44.159 --> 00:37:46.519
<v Speaker 2>And then there is the energy wall. Honestly, this one

777
00:37:46.559 --> 00:37:48.800
<v Speaker 2>feels the most tangible and immediate to me.

778
00:37:49.000 --> 00:37:51.440
<v Speaker 3>It is the most immediate constraint on the entire AI

779
00:37:51.480 --> 00:37:55.320
<v Speaker 3>industry right now. The power draw statistics are genuinely frightening.

780
00:37:55.840 --> 00:37:58.599
<v Speaker 3>A single standard server rack full of Nvidia H one

781
00:37:58.639 --> 00:38:02.320
<v Speaker 3>hundreds can draw forty to fifty kilowatts of continuous power.

782
00:38:01.800 --> 00:38:04.039
<v Speaker 2>Just to ground that, compare that to a standard

783
00:38:04.079 --> 00:38:04.880
<v Speaker 2>suburban house.

784
00:38:05.159 --> 00:38:07.920
<v Speaker 3>An average American home might draw one to two kilowatts

785
00:38:07.920 --> 00:38:11.360
<v Speaker 3>on average, So one single metal rack of AI servers

786
00:38:11.400 --> 00:38:14.239
<v Speaker 3>is using the power of an entire neighborhood. A full

787
00:38:14.280 --> 00:38:16.360
<v Speaker 3>AI data center is drawing the power of a medium

788
00:38:16.360 --> 00:38:16.960
<v Speaker 3>sized city.

789
00:38:17.280 --> 00:38:19.559
<v Speaker 2>I have heard these wild rumors that big tech companies

790
00:38:19.599 --> 00:38:22.920
<v Speaker 2>are literally looking into buying nuclear power plants just for AI.

791
00:38:23.159 --> 00:38:26.119
<v Speaker 3>I can assure you it is not a rumor. Microsoft

792
00:38:26.239 --> 00:38:30.719
<v Speaker 3>is actively hiring directors of nuclear strategy right now. Amazon

793
00:38:30.920 --> 00:38:34.000
<v Speaker 3>just bought a data center campus in Pennsylvania that is

794
00:38:34.039 --> 00:38:37.920
<v Speaker 3>physically plugged directly into an existing nuclear power plant. They're

795
00:38:37.960 --> 00:38:43.360
<v Speaker 3>actively lobbying for SMRs, small modular reactors, because our aging

796
00:38:43.440 --> 00:38:48.039
<v Speaker 3>national power grids simply cannot support the projected AI energy demand.

797
00:38:48.199 --> 00:38:52.000
<v Speaker 3>We're talking about future training runs that will require gigawatt-scale

798
00:38:52.000 --> 00:38:52.719
<v Speaker 3>power.

799
00:38:52.840 --> 00:38:55.920
<v Speaker 2>Gigawatts. We are going to have to construct dedicated nuclear

800
00:38:55.920 --> 00:38:58.119
<v Speaker 2>power plants just to train GPT six.

801
00:38:58.199 --> 00:39:00.639
<v Speaker 3>Literally, yes. If we do not fundamentally improve

802
00:39:00.679 --> 00:39:03.800
<v Speaker 3>the energetic efficiency of these chips, the entire AI revolution

803
00:39:03.880 --> 00:39:06.079
<v Speaker 3>will just stall out because we will physically trip the

804
00:39:06.119 --> 00:39:07.320
<v Speaker 3>breakers of the electric grid.

805
00:39:07.440 --> 00:39:11.159
<v Speaker 2>That puts an unimaginable premium on completely new ideas. If

806
00:39:11.159 --> 00:39:14.239
<v Speaker 2>traditional silicon is hitting a quantum wall and energy consumption

807
00:39:14.320 --> 00:39:17.280
<v Speaker 2>is hitting an absolute ceiling, we clearly need new physics.

808
00:39:17.480 --> 00:39:20.679
<v Speaker 2>What does the sci fi future of computing actually look like?

809
00:39:21.000 --> 00:39:24.119
<v Speaker 3>There are three major alternative computing paths being researched right

810
00:39:24.159 --> 00:39:27.639
<v Speaker 3>now that get me incredibly excited. The first is photonic computing.

811
00:39:27.159 --> 00:39:31.639
<v Speaker 2>Computing with actual light instead of electricity, right?

812
00:39:31.840 --> 00:39:35.800
<v Speaker 3>Right. Right now, to do math, we forcefully push electrons through solid copper wires.

813
00:39:36.239 --> 00:39:41.440
<v Speaker 3>That physical friction generates heat; it creates electrical resistance. But photons,

814
00:39:41.480 --> 00:39:45.119
<v Speaker 3>actual particles of light, have zero mass and experience zero

815
00:39:45.199 --> 00:39:49.159
<v Speaker 3>electrical resistance. Innovative startups like Lightmatter are currently building

816
00:39:49.199 --> 00:39:52.480
<v Speaker 3>chips that use microscopic devices called interferometers.

817
00:39:52.840 --> 00:39:55.400
<v Speaker 2>How does an interferometer actually calculate anything?

818
00:39:55.639 --> 00:39:57.760
<v Speaker 3>You take a single laser beam of light and you

819
00:39:57.800 --> 00:40:01.440
<v Speaker 3>split it down two different tiny optic channels. By precisely

820
00:40:01.440 --> 00:40:04.960
<v Speaker 3>controlling the phase of those light waves, basically shifting how

821
00:40:05.000 --> 00:40:06.719
<v Speaker 3>the peaks and valleys of the waves line up when

822
00:40:06.760 --> 00:40:11.000
<v Speaker 3>they recombine, you can actually perform complex matrix multiplication natively

823
00:40:11.159 --> 00:40:14.800
<v Speaker 3>in the light itself. You are doing advanced AI math with

824
00:40:14.920 --> 00:40:17.639
<v Speaker 3>pure light beams moving at the literal speed of light

825
00:40:17.800 --> 00:40:19.360
<v Speaker 3>while generating almost zero heat.

826
00:40:19.440 --> 00:40:21.639
<v Speaker 2>That sounds like actual magic. If it is that fast

827
00:40:21.679 --> 00:40:23.599
<v Speaker 2>and cold, why aren't we ripping out our GPUs and

828
00:40:23.639 --> 00:40:25.159
<v Speaker 2>doing it right now?

829
00:40:25.159 --> 00:40:29.280
<v Speaker 3>Because controlling microscopic light waves precisely on a tiny silicon

830
00:40:29.400 --> 00:40:33.360
<v Speaker 3>chip is outrageously difficult, and the big catch is you still

831
00:40:33.400 --> 00:40:36.039
<v Speaker 3>have to convert the light signals back into slow electricity

832
00:40:36.079 --> 00:40:38.760
<v Speaker 3>anytime you want to store the data in standard memory.

833
00:40:39.039 --> 00:40:42.000
<v Speaker 3>But as a coprocessor just for doing the heavy matrix math,

834
00:40:42.039 --> 00:40:44.039
<v Speaker 3>it is a massive potential game changer.

835
00:40:44.199 --> 00:40:46.320
<v Speaker 2>Okay, so light is path number one. What is path

836
00:40:46.400 --> 00:40:51.400
<v Speaker 2>number two? Neuromorphic computing. Neuro, meaning modeling it after the

837
00:40:51.440 --> 00:40:52.440
<v Speaker 2>biological brain.

838
00:40:52.719 --> 00:40:55.760
<v Speaker 3>The human brain is, without a doubt, the most efficient

839
00:40:55.840 --> 00:40:58.320
<v Speaker 3>computer in the known universe. Your brain runs on about

840
00:40:58.360 --> 00:41:01.880
<v Speaker 3>twenty watts of power, basically an incandescent light bulb, and with

841
00:41:01.920 --> 00:41:05.559
<v Speaker 3>those twenty watts it manages complex continuous learning, real time

842
00:41:05.639 --> 00:41:10.840
<v Speaker 3>stereoscopic vision, complex language processing, and fine motor control all simultaneously.

843
00:41:10.920 --> 00:41:13.159
<v Speaker 2>And meanwhile, an AI trying to do just one of

844
00:41:13.199 --> 00:41:17.360
<v Speaker 2>those things needs its own personal nuclear reactor. What exactly

845
00:41:17.440 --> 00:41:19.239
<v Speaker 2>is the biological brain doing so differently?

846
00:41:19.440 --> 00:41:22.480
<v Speaker 3>It uses an event based architecture. In a normal digital

847
00:41:22.480 --> 00:41:25.280
<v Speaker 3>computer chip, there is a global clock that ticks billions

848
00:41:25.280 --> 00:41:28.679
<v Speaker 3>of times a second, and with every single tick, every

849
00:41:28.719 --> 00:41:31.840
<v Speaker 3>single transistor on the chip gets flooded with power, even

850
00:41:31.840 --> 00:41:34.760
<v Speaker 3>if it has absolutely zero useful work to do. In

851
00:41:34.760 --> 00:41:38.760
<v Speaker 3>the biological brain, neurons stay completely dark and only consume

852
00:41:38.880 --> 00:41:41.639
<v Speaker 3>energy when they actively need to fire. We call these spikes.

853
00:41:41.840 --> 00:41:44.599
<v Speaker 2>So the brain is just incredibly lazy in a highly

854
00:41:44.639 --> 00:41:45.719
<v Speaker 2>optimized good way.

855
00:41:45.800 --> 00:41:50.239
<v Speaker 3>Efficiently lazy, yes. Neuromorphic chips like Intel's Loihi project or

856
00:41:50.480 --> 00:41:54.920
<v Speaker 3>IBM's NorthPole are attempting to physically mimic this spiking neural

857
00:41:54.960 --> 00:41:58.559
<v Speaker 3>network architecture in silicon. If the incoming data isn't actively

858
00:41:58.639 --> 00:42:01.400
<v Speaker 3>changing, the relevant parts of the chip simply shut down and

859
00:42:01.440 --> 00:42:03.840
<v Speaker 3>consume zero power. It is wildly efficient.

860
00:42:03.920 --> 00:42:05.559
<v Speaker 2>And what is the third alternative path?

861
00:42:05.639 --> 00:42:06.440
<v Speaker 3>Analog computing?

862
00:42:06.519 --> 00:42:09.599
<v Speaker 2>Analog? Like going backwards to the nineteen fifties?

863
00:42:09.719 --> 00:42:13.079
<v Speaker 3>Sort of, yes. Modern computing is purely digital. It uses strict ones

864
00:42:13.079 --> 00:42:16.559
<v Speaker 3>and zeros. It is perfectly precise, but it takes a

865
00:42:16.639 --> 00:42:20.280
<v Speaker 3>massive amount of physical transistors just to represent a long

866
00:42:20.360 --> 00:42:24.280
<v Speaker 3>decimal number like zero point three four five two. In

867
00:42:24.320 --> 00:42:28.159
<v Speaker 3>an analog system, you can represent that exact same complex

868
00:42:28.280 --> 00:42:31.960
<v Speaker 3>number as a single continuous voltage level, just a fluid

869
00:42:31.960 --> 00:42:32.920
<v Speaker 3>electrical wave.

870
00:42:32.960 --> 00:42:35.760
<v Speaker 2>So you can actually do the math natively inside the

871
00:42:35.800 --> 00:42:38.920
<v Speaker 2>continuous wave itself, rather than breaking it down into digital bits.

872
00:42:39.079 --> 00:42:43.559
<v Speaker 3>Exactly, you can perform massive matrix multiplications instantly simply by

873
00:42:43.559 --> 00:42:46.719
<v Speaker 3>passing different electrical currents through a grid of physical resistors.

874
00:42:47.159 --> 00:42:50.000
<v Speaker 3>It is incredibly fast and saves a massive amount of energy.

875
00:42:50.320 --> 00:42:53.599
<v Speaker 3>The main downside, however, is noise. Analog systems are affected

876
00:42:53.599 --> 00:42:57.760
<v Speaker 3>by temperature variations and slight manufacturing defects. They aren't perfectly precise.

877
00:42:57.840 --> 00:42:59.039
<v Speaker 3>The math gets a little fuzzy.

878
00:42:59.159 --> 00:43:02.400
<v Speaker 2>But with AI neural networks, isn't a little fuzzy actually okay?

879
00:43:02.480 --> 00:43:04.559
<v Speaker 2>They are basically probability engines anyway.

880
00:43:04.840 --> 00:43:08.239
<v Speaker 3>That is the exact billion dollar bet these analog startups

881
00:43:08.239 --> 00:43:11.559
<v Speaker 3>are making. A neural network doesn't always need mathematically perfect

882
00:43:11.599 --> 00:43:13.760
<v Speaker 3>precision to tell you it's looking at a picture of

883
00:43:13.760 --> 00:43:16.880
<v Speaker 3>a cat. It just needs to be close enough. If

884
00:43:16.880 --> 00:43:19.360
<v Speaker 3>we can accept just a tiny bit of analog noise

885
00:43:19.599 --> 00:43:22.639
<v Speaker 3>in exchange for a one thousand x gain in energy efficiency,

886
00:43:22.960 --> 00:43:26.639
<v Speaker 3>the entire industry will gladly make that trade. And ultimately

887
00:43:26.719 --> 00:43:28.840
<v Speaker 3>that ties into the final evolution of all this, which

888
00:43:28.880 --> 00:43:31.480
<v Speaker 3>is deep software-hardware co-design.

889
00:43:31.480 --> 00:43:33.800
<v Speaker 2>Where the code and the silicon are actually designed together

890
00:43:33.840 --> 00:43:34.519
<v Speaker 2>from day one.

891
00:43:34.800 --> 00:43:37.960
<v Speaker 3>Exactly, we are moving away from writing general software for

892
00:43:38.039 --> 00:43:42.320
<v Speaker 3>general chips. Future architectures will feature algorithms totally, explicitly

893
00:43:42.440 --> 00:43:46.280
<v Speaker 3>co-designed alongside the custom physical hardware, squeezing absolutely every

894
00:43:46.360 --> 00:43:48.280
<v Speaker 3>last drop of performance out of the silicon.

895
00:43:48.440 --> 00:43:51.880
<v Speaker 2>Wow. So, just stepping back for a second, we have

896
00:43:52.000 --> 00:43:55.119
<v Speaker 2>gone from two kids buying a five hundred dollar GTX

897
00:43:55.199 --> 00:43:57.360
<v Speaker 2>five eighty at a Best Buy to play video games,

898
00:43:57.559 --> 00:44:00.119
<v Speaker 2>all the way to massive H one hundred data centers.

899
00:44:00.639 --> 00:44:03.760
<v Speaker 2>We've gone through Google's secret TPU panic, through an escalating

900
00:44:03.760 --> 00:44:06.679
<v Speaker 2>global trade war over tiny EUV machines, and now we

901
00:44:06.719 --> 00:44:09.119
<v Speaker 2>are talking about computers running on actual light beams and

902
00:44:09.159 --> 00:44:11.960
<v Speaker 2>analog waves. It has been quite a journey.

903
00:44:11.800 --> 00:44:14.280
<v Speaker 3>It really has been, and I think the major synthesis here, the

904
00:44:14.280 --> 00:44:18.039
<v Speaker 3>big takeaway, is crucial. We started this entire conversation by

905
00:44:18.079 --> 00:44:21.239
<v Speaker 3>observing that software code used to be the undisputed king

906
00:44:21.320 --> 00:44:26.199
<v Speaker 3>of tech, but today raw access to this unbelievable level

907
00:44:26.239 --> 00:44:29.119
<v Speaker 3>of compute is all that really matters. And unfortunately that

908
00:44:29.199 --> 00:44:31.280
<v Speaker 3>access is heavily centralizing.

909
00:44:30.960 --> 00:44:33.519
<v Speaker 2>Right, because you and I cannot just decide to build

910
00:44:33.559 --> 00:44:36.280
<v Speaker 2>a ten thousand chip H one hundred cluster in our

911
00:44:36.280 --> 00:44:40.320
<v Speaker 2>garage this weekend. This kind of power is exclusively concentrating

912
00:44:40.360 --> 00:44:43.079
<v Speaker 2>in the hands of the massive hyperscalers and a select

913
00:44:43.159 --> 00:44:44.679
<v Speaker 2>few sovereign nations.

914
00:44:44.719 --> 00:44:47.000
<v Speaker 3>Exactly. The barrier to entry has moved from needing a

915
00:44:47.039 --> 00:44:50.199
<v Speaker 3>clever idea and a laptop to needing billions of dollars

916
00:44:50.199 --> 00:44:52.639
<v Speaker 3>in capital and your own dedicated power plant.

917
00:44:52.840 --> 00:44:54.920
<v Speaker 2>So, as we wrap up, what does this all mean

918
00:44:54.960 --> 00:44:57.119
<v Speaker 2>for the listener? We always like to leave everyone with

919
00:44:57.159 --> 00:44:59.360
<v Speaker 2>a provocative thought to chew on after the deep dive.

920
00:45:00.119 --> 00:45:03.280
<v Speaker 3>I would say this: the compute race that we've been discussing is

921
00:45:03.360 --> 00:45:07.119
<v Speaker 3>not just a standard market competition between a few tech giants.

922
00:45:07.559 --> 00:45:10.679
<v Speaker 3>It is quite literally a race to harness the fundamental

923
00:45:10.760 --> 00:45:14.639
<v Speaker 3>laws of physics. We are currently pushing atomic structures and

924
00:45:14.679 --> 00:45:18.679
<v Speaker 3>global energy grids to their absolute breaking point, all in

925
00:45:18.719 --> 00:45:20.880
<v Speaker 3>an attempt to build synthetic minds.

926
00:45:20.760 --> 00:45:23.599
<v Speaker 2>And whoever gets there first, whoever wins that race, gets

927
00:45:23.639 --> 00:45:24.440
<v Speaker 2>all the spoils.

928
00:45:24.519 --> 00:45:28.440
<v Speaker 3>Because whoever builds the fastest, most scalable, and most efficient

929
00:45:28.480 --> 00:45:32.599
<v Speaker 3>hardware doesn't just win in a temporary commercial sector. They

930
00:45:32.599 --> 00:45:35.920
<v Speaker 3>gain the power to permanently decide what artificial intelligence actually

931
00:45:35.920 --> 00:45:38.880
<v Speaker 3>looks like for the rest of human history. They will

932
00:45:38.920 --> 00:45:41.840
<v Speaker 3>fully control the fundamental infrastructure of thought for the twenty

933
00:45:41.880 --> 00:45:44.639
<v Speaker 3>first century. That is the true stakes of what is

934
00:45:44.679 --> 00:45:48.880
<v Speaker 3>being painstakingly etched onto those tiny silicon wafers in Taiwan.

935
00:45:48.599 --> 00:45:51.440
<v Speaker 2>The infrastructure of thought. That is a wonderfully heavy concept

936
00:45:51.440 --> 00:45:53.039
<v Speaker 2>to end on. Thank you so much for breaking all

937
00:45:53.039 --> 00:45:55.000
<v Speaker 2>this down. We will be back next time with another

938
00:45:55.079 --> 00:45:56.920
<v Speaker 2>deep dive. Thanks for listening.

939
00:45:56.960 --> 00:45:57.440
<v Speaker 3>Thanks for having me.
