WEBVTT

1
00:00:01.199 --> 00:00:06.200
<v Speaker 1>Welcome to the Sentient Code, where intelligence is engineered, autonomy

2
00:00:06.280 --> 00:00:10.439
<v Speaker 1>is emerging, and the line between human and machine grows thinner.

3
00:00:10.800 --> 00:00:15.359
<v Speaker 1>Each episode, we decode the algorithms, explore the robotics, and

4
00:00:15.439 --> 00:00:21.879
<v Speaker 1>examine the ideas shaping the future of artificial minds.

5
00:00:23.800 --> 00:00:25.879
<v Speaker 2>Okay, let's just take a second to breathe.

6
00:00:26.320 --> 00:00:29.000
<v Speaker 3>Yeah, a deep breath is probably a good idea.

7
00:00:29.160 --> 00:00:31.559
<v Speaker 2>Because if you were online yesterday, I mean, if you

8
00:00:31.600 --> 00:00:34.719
<v Speaker 2>were anywhere near a notification stream or X or just

9
00:00:34.759 --> 00:00:38.240
<v Speaker 2>a tech news ticker, you didn't just see a press release. No,

10
00:00:38.439 --> 00:00:39.399
<v Speaker 2>you felt a tremor.

11
00:00:39.799 --> 00:00:42.200
<v Speaker 3>That is the perfect word for it, a tremor, a

12
00:00:42.359 --> 00:00:43.439
<v Speaker 3>shift in the bedrock.

13
00:00:43.600 --> 00:00:47.039
<v Speaker 2>You know, we have become so desensitized to updates, haven't we?

14
00:00:47.280 --> 00:00:49.560
<v Speaker 2>It's always version one point one, version one point two.

15
00:00:49.679 --> 00:00:51.479
<v Speaker 3>Oh yeah, minor bug fixes.

16
00:00:51.520 --> 00:00:53.759
<v Speaker 2>It's usually bug fixes, maybe a dark mode, maybe the

17
00:00:53.759 --> 00:00:56.759
<v Speaker 2>app loads, what, five percent faster? We just scroll right past it.

18
00:00:57.119 --> 00:01:00.880
<v Speaker 2>But what happened yesterday, February seventeenth, twenty twenty six was

19
00:01:02.119 --> 00:01:02.840
<v Speaker 2>it was not that.

20
00:01:03.280 --> 00:01:06.920
<v Speaker 3>No, it was a paradigm shift disguised as a decimal point,

21
00:01:07.319 --> 00:01:09.439
<v Speaker 3>a very, very significant decimal point.

22
00:01:09.480 --> 00:01:11.359
<v Speaker 2>We are talking, of course, about the massive news from

23
00:01:11.439 --> 00:01:14.959
<v Speaker 2>xAI. Elon Musk took to X and announced the immediate

24
00:01:14.959 --> 00:01:16.840
<v Speaker 2>public beta of Grok four point two.

25
00:01:17.159 --> 00:01:19.200
<v Speaker 3>And not Grok four point twenty, as some of the

26
00:01:19.239 --> 00:01:20.959
<v Speaker 3>early memes were suggesting.

27
00:01:20.560 --> 00:01:23.319
<v Speaker 2>Right, that little nod to internet culture, but they clarified

28
00:01:23.359 --> 00:01:25.200
<v Speaker 2>it's four point two. And I have to say, looking

29
00:01:25.239 --> 00:01:27.879
<v Speaker 2>at the sheer density of the documentation, the white papers,

30
00:01:27.920 --> 00:01:31.519
<v Speaker 2>the architecture diagrams, calling this an update feels like

31
00:01:31.519 --> 00:01:34.680
<v Speaker 2>an insult. This feels like a different species of intelligence.

32
00:01:34.760 --> 00:01:37.280
<v Speaker 3>It really is. And to understand why this matters, you

33
00:01:37.280 --> 00:01:40.040
<v Speaker 3>have to look past the branding. Usually a point two

34
00:01:40.200 --> 00:01:44.120
<v Speaker 3>release is an optimization, it's a tweak, a refinement. Exactly.

35
00:01:44.560 --> 00:01:47.879
<v Speaker 3>But xAI is claiming this model is designed to be

36
00:01:48.359 --> 00:01:51.439
<v Speaker 3>and this is a direct quote, an order of magnitude

37
00:01:51.480 --> 00:01:54.000
<v Speaker 3>smarter and faster than Grok four.

38
00:01:54.120 --> 00:01:56.159
<v Speaker 2>An order of magnitude. That's not a small claim.

39
00:01:56.239 --> 00:01:59.799
<v Speaker 3>It's a huge claim. And the kicker, they aren't promising

40
00:01:59.799 --> 00:02:02.239
<v Speaker 3>this for next year or Q four. They're saying this

41
00:02:02.280 --> 00:02:04.799
<v Speaker 3>is happening now in a public beta that concludes in

42
00:02:04.840 --> 00:02:05.439
<v Speaker 3>about a month.

43
00:02:05.680 --> 00:02:09.400
<v Speaker 2>That timeline is what stopped me in my tracks. It's just, yeah,

44
00:02:09.479 --> 00:02:10.199
<v Speaker 2>it's breathtaking.

45
00:02:10.280 --> 00:02:13.120
<v Speaker 3>It's fast, it's aggressive. I mean, even for them, it's aggressive.

46
00:02:13.280 --> 00:02:14.719
<v Speaker 2>Let's just put it on a calendar for a second

47
00:02:14.759 --> 00:02:17.639
<v Speaker 2>for everyone listening. Grok four was released in July of

48
00:02:17.639 --> 00:02:18.520
<v Speaker 2>twenty twenty five.

49
00:02:18.240 --> 00:02:20.639
<v Speaker 3>Right, which already felt like a huge leap, a

50
00:02:20.719 --> 00:02:21.120
<v Speaker 3>huge leap.

51
00:02:21.159 --> 00:02:25.680
<v Speaker 2>Then Grok four point one followed in November twenty twenty five. Fine,

52
00:02:25.840 --> 00:02:28.360
<v Speaker 2>now here we are February twenty twenty six, and we

53
00:02:28.400 --> 00:02:31.360
<v Speaker 2>get four point two. This isn't software development time. No,

54
00:02:31.400 --> 00:02:34.560
<v Speaker 2>this is evolutionary time. This is compounding at a speed

55
00:02:34.599 --> 00:02:36.080
<v Speaker 2>that feels unnatural.

56
00:02:36.280 --> 00:02:39.360
<v Speaker 3>It's the fastest major iteration cycle we've seen in the

57
00:02:39.400 --> 00:02:41.919
<v Speaker 3>history of the company, and you know, arguably in the

58
00:02:41.960 --> 00:02:46.560
<v Speaker 3>history of the entire sector. They are compounding intelligence at

59
00:02:46.639 --> 00:02:49.879
<v Speaker 3>a rate that is becoming genuinely difficult to track.

60
00:02:50.560 --> 00:02:53.000
<v Speaker 2>So here's our mission for today. We aren't just going

61
00:02:53.080 --> 00:02:54.439
<v Speaker 2>to read the release notes.

62
00:02:54.599 --> 00:02:55.560
<v Speaker 3>Yeah, anyone can do that.

63
00:02:55.680 --> 00:02:58.280
<v Speaker 2>Anyone can do that. We have the technical breakdowns, we

64
00:02:58.319 --> 00:03:01.000
<v Speaker 2>have some of the leaked benchmarks, and the early user

65
00:03:01.039 --> 00:03:04.280
<v Speaker 2>reports are flooding in. We need to understand not just

66
00:03:04.439 --> 00:03:08.759
<v Speaker 2>what changed, but how this thing actually operates. Because the

67
00:03:08.840 --> 00:03:12.080
<v Speaker 2>central claim is that it thinks differently than anything we've

68
00:03:12.159 --> 00:03:12.719
<v Speaker 2>used before.

69
00:03:12.840 --> 00:03:15.000
<v Speaker 3>That's the key insight. You said it perfectly. It's not

70
00:03:15.080 --> 00:03:17.039
<v Speaker 3>just a bigger brain, it's a different kind of mind.

71
00:03:17.080 --> 00:03:19.960
<v Speaker 3>It's a whole new cognitive architecture.

72
00:03:20.159 --> 00:03:22.520
<v Speaker 2>Okay, so before we get into the software wizardry, which

73
00:03:22.800 --> 00:03:25.719
<v Speaker 2>honestly, it's mind-bending stuff, we have to ground this

74
00:03:25.840 --> 00:03:30.240
<v Speaker 2>in physical reality, the hardware, because AI feels like magic,

75
00:03:30.479 --> 00:03:33.199
<v Speaker 2>but it runs on metal. It runs on silicon and copper.

76
00:03:33.840 --> 00:03:36.000
<v Speaker 2>We need to talk about the Memphis Colossus.

77
00:03:36.080 --> 00:03:38.960
<v Speaker 3>The Colossus. It sounds like a wonder of the ancient world,

78
00:03:39.000 --> 00:03:39.680
<v Speaker 3>doesn't it.

79
00:03:39.680 --> 00:03:43.319
<v Speaker 2>It basically is the modern equivalent. The stats on this

80
00:03:43.360 --> 00:03:47.680
<v Speaker 2>supercluster are hard to even visualize. We are talking about

81
00:03:47.680 --> 00:03:50.639
<v Speaker 2>a cluster that has now scaled to over one point

82
00:03:50.680 --> 00:03:52.680
<v Speaker 2>two million GPUs.

83
00:03:52.960 --> 00:03:54.960
<v Speaker 3>Let's just pause on that number for a second. One

84
00:03:55.000 --> 00:03:55.800
<v Speaker 3>point two million.

85
00:03:55.960 --> 00:03:57.879
<v Speaker 2>I mean, I remember just two years ago, back in

86
00:03:57.919 --> 00:04:00.520
<v Speaker 2>twenty four, we were looking at clusters with one hundred

87
00:04:00.520 --> 00:04:03.159
<v Speaker 2>thousand GPUs and our minds were blown. We were thinking,

88
00:04:03.599 --> 00:04:06.120
<v Speaker 2>this is it. This is the peak. You can't possibly

89
00:04:06.120 --> 00:04:08.479
<v Speaker 2>connect more than that efficiently.

90
00:04:08.520 --> 00:04:11.879
<v Speaker 3>Exactly. We thought that was industrial scale. This, this is planetary scale.

91
00:04:12.000 --> 00:04:14.039
<v Speaker 3>This is a city made of compute.

92
00:04:13.560 --> 00:04:15.719
<v Speaker 2>A city. That's a great way to put it.

93
00:04:15.719 --> 00:04:18.560
<v Speaker 3>And the reason you, the listener, need to care about that

94
00:04:18.639 --> 00:04:22.040
<v Speaker 3>number isn't just because it's big and impressive. It's because

95
00:04:22.120 --> 00:04:24.399
<v Speaker 3>quantity has a quality all its own.

96
00:04:25.199 --> 00:04:26.040
<v Speaker 2>What do you mean by that?

97
00:04:26.120 --> 00:04:29.079
<v Speaker 3>When you have one point two million GPUs at your disposal,

98
00:04:29.560 --> 00:04:32.319
<v Speaker 3>you aren't just training the same old models faster. You

99
00:04:32.399 --> 00:04:36.040
<v Speaker 3>unlock entirely new training techniques that are physically impossible when

100
00:04:36.079 --> 00:04:37.160
<v Speaker 3>you're compute constrained.

101
00:04:37.360 --> 00:04:40.040
<v Speaker 2>So it's not just about cooking the steak faster. Now

102
00:04:40.199 --> 00:04:42.639
<v Speaker 2>it allows you to cook a completely different meal.

103
00:04:42.839 --> 00:04:46.240
<v Speaker 3>Precisely. You can run parallel simulations on a massive scale.

104
00:04:46.639 --> 00:04:48.600
<v Speaker 3>You can do reinforcement learning loops that would take a

105
00:04:48.680 --> 00:04:51.319
<v Speaker 3>smaller cluster a decade to finish, and you can now

106
00:04:51.360 --> 00:04:54.720
<v Speaker 3>do them in a week. That hardware, that Colossus, is

107
00:04:54.759 --> 00:04:58.480
<v Speaker 3>the absolute prerequisite for the software breakthroughs we're about to

108
00:04:58.480 --> 00:04:58.959
<v Speaker 3>get into.

109
00:04:58.959 --> 00:05:01.439
<v Speaker 2>Which brings us to the first big one, the rapid

110
00:05:01.519 --> 00:05:04.279
<v Speaker 2>learning architecture. Now, I really want you to break this

111
00:05:04.399 --> 00:05:06.920
<v Speaker 2>down for us, because learning is a word we throw

112
00:05:06.959 --> 00:05:09.959
<v Speaker 2>around a lot in AI. It's almost lost its meaning.

113
00:05:10.240 --> 00:05:11.839
<v Speaker 2>How is this different from the old way?

114
00:05:12.240 --> 00:05:14.240
<v Speaker 3>Right? So, the old way, and by old I mean

115
00:05:14.319 --> 00:05:16.959
<v Speaker 3>the standard practice for pretty much every major model in

116
00:05:17.040 --> 00:05:19.560
<v Speaker 3>twenty twenty four and twenty twenty five is what I'd

117
00:05:19.560 --> 00:05:22.879
<v Speaker 3>call the snapshot method. The snapshot. You gather the entire internet,

118
00:05:22.920 --> 00:05:27.079
<v Speaker 3>basically, petabytes of text, books, code, everything. You feed it

119
00:05:27.120 --> 00:05:29.319
<v Speaker 3>into the model, you cook it for months on a

120
00:05:29.399 --> 00:05:33.160
<v Speaker 3>giant cluster, and then when it's done, you freeze it.

121
00:05:33.240 --> 00:05:36.240
<v Speaker 2>You freeze the weights, like printing an encyclopedia.

122
00:05:36.279 --> 00:05:39.480
<v Speaker 3>Exactly. It's a perfect analogy. Once that training run is done,

123
00:05:39.920 --> 00:05:43.480
<v Speaker 3>the weights, the neural connections inside the model are set

124
00:05:43.519 --> 00:05:46.240
<v Speaker 3>in stone. If the world changes a day after you

125
00:05:46.279 --> 00:05:49.279
<v Speaker 3>finish training, well, too bad. The model doesn't know. It's

126
00:05:49.319 --> 00:05:50.800
<v Speaker 3>a frozen artifact of the past.

127
00:05:50.839 --> 00:05:53.639
<v Speaker 2>And that's why we always had those knowledge cutoffs. You'd

128
00:05:53.639 --> 00:05:55.959
<v Speaker 2>ask about a news event from last week and the

129
00:05:55.959 --> 00:05:59.519
<v Speaker 2>AI would say, sorry, my knowledge ends in September twenty

130
00:05:59.519 --> 00:06:00.319
<v Speaker 2>twenty three.

131
00:06:00.920 --> 00:06:04.399
<v Speaker 3>Correct. That was the fundamental limitation of the static paradigm. Grok

132
00:06:04.480 --> 00:06:07.920
<v Speaker 3>four point two is trying to kill the snapshot. They've

133
00:06:07.959 --> 00:06:10.839
<v Speaker 3>introduced what they call a hybrid post-training process.

134
00:06:11.040 --> 00:06:14.360
<v Speaker 2>This is the real-time adaptation feature I saw mentioned everywhere.

135
00:06:14.519 --> 00:06:17.040
<v Speaker 3>Yes, so instead of being a solid block of ice,

136
00:06:17.319 --> 00:06:20.560
<v Speaker 3>think of the model as having a fluid outer layer.

137
00:06:21.120 --> 00:06:24.720
<v Speaker 3>It's a lightweight continual learning layer that ingests anonymized high

138
00:06:24.720 --> 00:06:27.160
<v Speaker 3>signal user feedback almost constantly.

139
00:06:27.439 --> 00:06:30.360
<v Speaker 2>So when I'm using Grok and I click that thumbs down,

140
00:06:30.839 --> 00:06:33.519
<v Speaker 2>or I flag an answer as unhelpful, or maybe I've

141
00:06:33.560 --> 00:06:35.959
<v Speaker 2>pasted in a correction, or I'm working through a complex code

142
00:06:35.959 --> 00:06:36.600
<v Speaker 2>bug with it.

143
00:06:36.600 --> 00:06:38.920
<v Speaker 3>It's not just going into a log file that a

144
00:06:39.000 --> 00:06:41.480
<v Speaker 3>human intern might read in six months. It is being

145
00:06:41.519 --> 00:06:46.319
<v Speaker 3>distilled mathematically into these tiny micro updates. And this all

146
00:06:46.439 --> 00:06:50.360
<v Speaker 3>leads to what xAI is very cleverly branding as the

147
00:06:50.480 --> 00:06:51.759
<v Speaker 3>Friday Ritual.

148
00:06:52.199 --> 00:06:55.199
<v Speaker 2>I love this branding, the Friday Ritual. It sounds a

149
00:06:55.240 --> 00:06:58.920
<v Speaker 2>little culty, but in a cool Silicon Valley kind of way. Yeah,

150
00:06:58.959 --> 00:07:00.759
<v Speaker 2>what exactly happens on Fridays?

151
00:07:00.959 --> 00:07:04.800
<v Speaker 3>Every Friday, xAI pushes a global update to the model's weights.

152
00:07:05.160 --> 00:07:08.120
<v Speaker 3>This isn't a full retrain, but it's a significant update

153
00:07:08.279 --> 00:07:12.000
<v Speaker 3>based on the aggregated, verified learnings from millions of users

154
00:07:12.040 --> 00:07:12.920
<v Speaker 3>over the previous week.

155
00:07:13.160 --> 00:07:14.920
<v Speaker 2>It is... that's wild.

156
00:07:15.040 --> 00:07:17.040
<v Speaker 3>It means the Grok you talk to on a Monday

157
00:07:17.079 --> 00:07:20.439
<v Speaker 3>morning is measurably, mathematically smarter than the one you were

158
00:07:20.439 --> 00:07:21.639
<v Speaker 3>talking to on Sunday night.

159
00:07:21.920 --> 00:07:22.279
<v Speaker 2>Wow.

160
00:07:22.480 --> 00:07:26.759
<v Speaker 3>It has metabolized the experiences, the corrections, the hard problems

161
00:07:26.959 --> 00:07:28.959
<v Speaker 3>that millions of people threw at it over the last

162
00:07:29.000 --> 00:07:29.560
<v Speaker 3>seven days.

163
00:07:29.839 --> 00:07:32.160
<v Speaker 2>Just think about the feedback loop there. I mean, if

164
00:07:32.199 --> 00:07:35.120
<v Speaker 2>a brand new programming library comes out on a Tuesday, yep,

165
00:07:35.439 --> 00:07:38.000
<v Speaker 2>and thousands of developers are struggling with it, and they're

166
00:07:38.079 --> 00:07:41.519
<v Speaker 2>using Grok and they're correcting its mistakes on Wednesday and Thursday.

167
00:07:41.639 --> 00:07:45.240
<v Speaker 3>By Friday's update, Grok knows the new library. It stops

168
00:07:45.279 --> 00:07:46.240
<v Speaker 3>making those mistakes.

169
00:07:46.560 --> 00:07:48.800
<v Speaker 2>It's absorbed that knowledge from the collective.

170
00:07:48.920 --> 00:07:52.079
<v Speaker 3>It's evolution on a weekly cycle. It's an unprecedented speed

171
00:07:52.079 --> 00:07:52.720
<v Speaker 3>of adaptation.

172
00:07:53.079 --> 00:07:55.439
<v Speaker 2>But, and I have to play the skeptic here, because

173
00:07:55.439 --> 00:07:58.160
<v Speaker 2>I can hear the safety researchers screaming into their pillows

174
00:07:58.199 --> 00:07:58.600
<v Speaker 2>right now.

175
00:07:59.000 --> 00:08:02.439
<v Speaker 3>Isn't this incredibly... It's the first question everyone asks.

176
00:08:02.639 --> 00:08:06.199
<v Speaker 2>We all remember Tay, the Microsoft chatbot from a decade ago.

177
00:08:06.480 --> 00:08:07.680
<v Speaker 2>It lasted, what, a day?

178
00:08:07.959 --> 00:08:08.680
<v Speaker 3>Less than a day.

179
00:08:08.800 --> 00:08:11.279
<v Speaker 2>You let the internet teach an AI, and it just

180
00:08:11.319 --> 00:08:15.199
<v Speaker 2>becomes a toxic, racist, conspiratorial nightmare.

181
00:08:15.279 --> 00:08:18.000
<v Speaker 3>That is the primary risk. Absolutely. If you just pipe

182
00:08:18.120 --> 00:08:21.879
<v Speaker 3>raw X data, you know, formerly Twitter, into the model's brain,

183
00:08:22.040 --> 00:08:25.879
<v Speaker 3>you get garbage, you get chaos. But xAI is keenly,

184
00:08:26.240 --> 00:08:29.720
<v Speaker 3>keenly aware of this. The documentation emphasizes over and over

185
00:08:29.800 --> 00:08:32.639
<v Speaker 3>that this isn't raw learning, it's high signal learning.

186
00:08:32.720 --> 00:08:34.600
<v Speaker 2>So there's a filter, a big one.

187
00:08:34.440 --> 00:08:37.039
<v Speaker 3>A massive one. Think of it as curated evolution. They

188
00:08:37.159 --> 00:08:40.440
<v Speaker 3>use automated alignment checks, which are basically other AI models

189
00:08:40.480 --> 00:08:43.399
<v Speaker 3>whose entire job is to grade the proposed updates to

190
00:08:43.559 --> 00:08:47.000
<v Speaker 3>verify that the new information is actually factual, helpful, and

191
00:08:47.080 --> 00:08:49.759
<v Speaker 3>crucially, not a jailbreak or an attempt to corrupt

192
00:08:49.759 --> 00:08:50.159
<v Speaker 3>the model.

193
00:08:50.360 --> 00:08:52.879
<v Speaker 2>So it's less like a parrot repeating every single thing

194
00:08:52.879 --> 00:08:57.360
<v Speaker 2>it hears and more like a diligent student who checks a

195
00:08:57.399 --> 00:09:01.240
<v Speaker 2>new fact against a trusted textbook before accepting it as true.

196
00:09:01.240 --> 00:09:05.519
<v Speaker 3>A student with a very, very strict teacher grading

197
00:09:05.559 --> 00:09:08.559
<v Speaker 3>their homework before it gets committed to their permanent memory.

198
00:09:08.720 --> 00:09:12.120
<v Speaker 3>They are filtering for utility and truth, trying to

199
00:09:12.120 --> 00:09:16.519
<v Speaker 3>distinguish between the internet's noise and its signal.

200
00:09:16.679 --> 00:09:19.120
<v Speaker 2>Okay, that makes sense. So that covers how it learns.

201
00:09:19.159 --> 00:09:22.200
<v Speaker 2>It's a living system now, not a static object. But

202
00:09:22.240 --> 00:09:25.399
<v Speaker 2>the thing that really seems to be dominating the technical discourse,

203
00:09:25.440 --> 00:09:28.440
<v Speaker 2>the thing everyone's buzzing about is how it thinks.

204
00:09:28.759 --> 00:09:30.879
<v Speaker 3>Yes, the cognitive architecture.

205
00:09:30.919 --> 00:09:32.480
<v Speaker 2>We're talking about the four-agent system.

206
00:09:32.639 --> 00:09:36.120
<v Speaker 3>This is the revolution. Honestly, if you take one thing

207
00:09:36.159 --> 00:09:39.159
<v Speaker 3>away from our entire discussion today, let it be this.

208
00:09:39.159 --> 00:09:40.480
<v Speaker 3>This is the core innovation.

209
00:09:40.679 --> 00:09:43.639
<v Speaker 2>So set the scene for us. Previously, an AI like

210
00:09:43.720 --> 00:09:46.879
<v Speaker 2>Grok four or GPT four was a monolith.

211
00:09:46.519 --> 00:09:50.440
<v Speaker 3>Right, a monolith. You ask a question and one giant

212
00:09:50.480 --> 00:09:53.039
<v Speaker 3>neural network starts predicting the next word, then the next,

213
00:09:53.080 --> 00:09:56.440
<v Speaker 3>then the next, based on pure probability. It's a stream

214
00:09:56.440 --> 00:10:00.480
<v Speaker 3>of consciousness, one voice. It's incredibly impressive, but it's prone

215
00:10:00.480 --> 00:10:03.519
<v Speaker 3>to getting lost in its own rambling. It can hallucinate,

216
00:10:03.639 --> 00:10:07.240
<v Speaker 3>it can contradict itself because there's no internal checking mechanism.

217
00:10:07.279 --> 00:10:09.240
<v Speaker 2>But Grok four point two doesn't work alone.

218
00:10:09.399 --> 00:10:12.919
<v Speaker 3>No. For simple things like what's the weather or tell

219
00:10:12.919 --> 00:10:16.039
<v Speaker 3>me a joke, it stays simple. It uses the base model.

220
00:10:16.200 --> 00:10:19.679
<v Speaker 3>But for any non-trivial query, how do I design

221
00:10:19.720 --> 00:10:24.200
<v Speaker 3>a structurally sound shed or analyze this complex legal contract

222
00:10:24.240 --> 00:10:28.200
<v Speaker 3>for loopholes, Grok four point two spins up a team.

223
00:10:28.480 --> 00:10:32.279
<v Speaker 3>A team. It creates an internal council of four specialized,

224
00:10:32.320 --> 00:10:33.360
<v Speaker 3>independent agents.

225
00:10:33.559 --> 00:10:36.639
<v Speaker 2>I want to meet the team, because the documentation gives

226
00:10:36.679 --> 00:10:40.200
<v Speaker 2>them these almost distinct personalities. It's fascinating. Let's go through them.

227
00:10:40.200 --> 00:10:41.720
<v Speaker 2>One by one. Who is Agent one?

228
00:10:41.919 --> 00:10:44.519
<v Speaker 3>Agent one is the reasoner. The reasoner, the logician, the

229
00:10:44.519 --> 00:10:47.399
<v Speaker 3>pure logician, the mathematician. Think of this agent as the

230
00:10:47.440 --> 00:10:50.080
<v Speaker 3>Spock of the group. Its entire job is step by

231
00:10:50.080 --> 00:10:53.240
<v Speaker 3>step decomposition. It doesn't care about being polite. It doesn't

232
00:10:53.240 --> 00:10:55.440
<v Speaker 3>care about facts in the outside world, not at first.

233
00:10:55.720 --> 00:10:57.960
<v Speaker 3>It cares only about internal consistency.

234
00:10:58.080 --> 00:10:59.639
<v Speaker 2>A leads to B, B leads to C.

235
00:11:00.159 --> 00:11:01.279
<v Speaker 3>Does A lead to B?

236
00:11:01.759 --> 00:11:02.000
<v Speaker 2>Does?

237
00:11:02.039 --> 00:11:06.200
<v Speaker 3>Does the math check out? Is the code logically sound from top

238
00:11:06.240 --> 00:11:09.399
<v Speaker 3>to bottom. It's the one that prevents those weird intuitive

239
00:11:09.480 --> 00:11:11.679
<v Speaker 3>jumps where an AI just guesses the answer to a

240
00:11:11.720 --> 00:11:13.559
<v Speaker 3>math problem instead of showing its work.

241
00:11:13.720 --> 00:11:15.879
<v Speaker 2>So it enforces that chain of thought we hear so

242
00:11:16.000 --> 00:11:17.159
<v Speaker 2>much about.

243
00:11:17.200 --> 00:11:19.799
<v Speaker 3>Exactly. It's the rigorous scientist of the group. It breaks a

244
00:11:19.840 --> 00:11:24.039
<v Speaker 3>complex problem down into its smallest possible components and solves

245
00:11:24.080 --> 00:11:25.720
<v Speaker 3>them linearly, methodically.

246
00:11:26.039 --> 00:11:29.559
<v Speaker 2>Okay, but logic isn't enough if your initial facts are wrong.

247
00:11:29.919 --> 00:11:32.320
<v Speaker 2>You can have a perfectly logical argument that's based on

248
00:11:32.360 --> 00:11:33.720
<v Speaker 2>a complete lie.

249
00:11:33.480 --> 00:11:37.240
<v Speaker 3>Which brings us directly to Agent two, the verifier.

250
00:11:36.759 --> 00:11:40.480
<v Speaker 2>The truth seeker, fact checker. The verifier is the journalist

251
00:11:40.559 --> 00:11:43.080
<v Speaker 2>or the librarian in the room. It has a live

252
00:11:43.200 --> 00:11:46.360
<v Speaker 2>real-time connection to the internet, specifically the X

253
00:11:46.399 --> 00:11:50.000
<v Speaker 2>firehose for breaking news and the broader web for established knowledge.

254
00:11:50.399 --> 00:11:52.320
<v Speaker 2>Its job is to look at what the reasoner is

255
00:11:52.360 --> 00:11:54.720
<v Speaker 2>proposing and say, hang on a second, wait a minute,

256
00:11:54.840 --> 00:11:57.960
<v Speaker 2>does that scientific paper you cited actually exist? Is that

257
00:11:58.039 --> 00:12:01.279
<v Speaker 2>chemical reaction possible at room temperature? What is the current

258
00:12:01.320 --> 00:12:04.120
<v Speaker 2>stock price of that company? Not the price from six

259
00:12:04.159 --> 00:12:06.279
<v Speaker 2>months ago. So it's the hallucination killer.

260
00:12:06.399 --> 00:12:08.879
<v Speaker 3>It is designed to be the hallucination killer. It's the

261
00:12:09.000 --> 00:12:11.320
<v Speaker 3>editor with the big red pen. It prevents the model

262
00:12:11.320 --> 00:12:15.159
<v Speaker 3>from confidently lying to you. If the reasoner says, based

263
00:12:15.159 --> 00:12:18.039
<v Speaker 3>on the twenty twenty five tax code, you owe X amount,

264
00:12:18.320 --> 00:12:21.320
<v Speaker 3>the verifier is there to check. Wait, the tax code

265
00:12:21.360 --> 00:12:24.600
<v Speaker 3>was updated in January twenty twenty six. Your premise is wrong.

266
00:12:24.799 --> 00:12:28.240
<v Speaker 2>That is a crucial, crucial distinction. Okay, okay, so we

267
00:12:28.279 --> 00:12:32.600
<v Speaker 2>have a logician and a fact checker, a powerful combo.

268
00:12:32.639 --> 00:12:33.559
<v Speaker 2>Who's number three?

269
00:12:33.840 --> 00:12:36.039
<v Speaker 3>Agent three is the embodied simulator.

270
00:12:36.200 --> 00:12:37.679
<v Speaker 2>This is the one that sounds the most sci-fi

271
00:12:37.759 --> 00:12:40.519
<v Speaker 2>to me. Embodied simulator? What does that even mean?

272
00:12:40.759 --> 00:12:43.679
<v Speaker 3>This is the imaginative one, but it's an imagination grounded

273
00:12:43.720 --> 00:12:49.399
<v Speaker 3>in physics. It understands three-D space. It understands object permanence, friction, gravity.

274
00:12:49.639 --> 00:12:52.559
<v Speaker 3>If you ask a question about robotics or mechanical engineering,

275
00:12:52.639 --> 00:12:55.519
<v Speaker 3>or how an object might move through space, this agent

276
00:12:55.600 --> 00:12:58.679
<v Speaker 3>actually runs a mental, physics-based simulation of that event.

277
00:12:58.960 --> 00:13:00.639
<v Speaker 2>So if I ask it to write code for a

278
00:13:00.720 --> 00:13:03.840
<v Speaker 2>robot arm to pick up a delicate glass object, Agent

279
00:13:03.840 --> 00:13:06.320
<v Speaker 2>three isn't just guessing words based on other code it's seen.

280
00:13:06.679 --> 00:13:09.039
<v Speaker 2>It's actually simulating the fragility of the glass.

281
00:13:09.360 --> 00:13:12.559
<v Speaker 3>It's modeling the physics of the grip. It's asking how

282
00:13:12.639 --> 00:13:15.840
<v Speaker 3>much pressure is too much pressure? What's the optimal trajectory

283
00:13:15.879 --> 00:13:19.600
<v Speaker 3>to avoid collision? It's the bridge between the digital brain

284
00:13:19.639 --> 00:13:22.799
<v Speaker 3>and the physical world. It's the engineer and the architect of the group.

285
00:13:22.440 --> 00:13:25.919
<v Speaker 2>Mind-blowing. Okay. And finally, Agent four,

286
00:13:26.279 --> 00:13:27.559
<v Speaker 2>the one running the whole show.

287
00:13:27.759 --> 00:13:32.200
<v Speaker 3>Agent four, the synthesizer, the boss, the project manager. The

288
00:13:32.200 --> 00:13:35.480
<v Speaker 3>manager is the perfect term. The synthesizer doesn't generate the

289
00:13:35.519 --> 00:13:39.679
<v Speaker 3>initial raw ideas. It listens. It takes the logical breakdown

290
00:13:39.679 --> 00:13:43.159
<v Speaker 3>from the reasoner, the factual corrections from the verifier, and

291
00:13:43.200 --> 00:13:46.519
<v Speaker 3>the physical simulations from the simulator. It looks at all their drafts.

292
00:13:46.440 --> 00:13:48.840
<v Speaker 2>And it must notice where they disagree.

293
00:13:48.919 --> 00:13:52.000
<v Speaker 3>That's its most important job. It notices the conflicts and

294
00:13:52.039 --> 00:13:55.000
<v Speaker 3>it integrates them into the final coherent answer that you,

295
00:13:55.240 --> 00:13:57.360
<v Speaker 3>the user, actually see on your screen.

296
00:13:57.519 --> 00:13:59.960
<v Speaker 2>This is where that concept of the debate comes in, right,

297
00:14:00.320 --> 00:14:03.840
<v Speaker 2>The sources mentioned a hidden reasoning trace. Yes, exactly. Before

298
00:14:03.840 --> 00:14:08.000
<v Speaker 2>I see my answer, these agents are actually arguing. Are

299
00:14:08.039 --> 00:14:08.879
<v Speaker 2>they fighting it out?

300
00:14:09.039 --> 00:14:12.240
<v Speaker 3>They are debating, and arguing might be the right word.

301
00:14:12.279 --> 00:14:16.440
<v Speaker 3>In some cases, the reasoner might propose a solution, saying, logically,

302
00:14:16.559 --> 00:14:19.840
<v Speaker 3>this is the most efficient path. The verifier might jump

303
00:14:19.840 --> 00:14:24.279
<v Speaker 3>in and say, actually, current federal safety regulations prohibit that method entirely.

304
00:14:24.120 --> 00:14:25.919
<v Speaker 2>And then the simulator chimes in.

305
00:14:26.039 --> 00:14:28.759
<v Speaker 3>The simulator might add, and even if it were legal,

306
00:14:28.960 --> 00:14:32.039
<v Speaker 3>if you try that, the engine will overheat in thirty

307
00:14:32.120 --> 00:14:34.440
<v Speaker 3>seconds because of the friction involved.

308
00:14:34.080 --> 00:14:37.720
<v Speaker 2>And the synthesizer, the manager, has to resolve that conflict.

309
00:14:37.840 --> 00:14:41.360
<v Speaker 3>It has to reconcile those discrepancies. And this matters so

310
00:14:41.440 --> 00:14:45.200
<v Speaker 3>much because in a single model, monolithic system, the AI

311
00:14:45.559 --> 00:14:48.879
<v Speaker 3>just picks the most statistically likely path and commits to it.

312
00:14:48.879 --> 00:14:51.960
<v Speaker 3>It often doubles down on its own errors. Right. Here,

313
00:14:52.039 --> 00:14:54.759
<v Speaker 3>the system challenges itself before it ever speaks to you.

314
00:14:55.120 --> 00:14:57.840
<v Speaker 2>It's like having a boardroom of diverse experts in your

315
00:14:57.840 --> 00:15:02.480
<v Speaker 2>pocket instead of just one really smart but sometimes overconfident intern.

316
00:15:02.639 --> 00:15:05.399
<v Speaker 3>That is the perfect analogy, and it explains why the

317
00:15:05.480 --> 00:15:09.639
<v Speaker 3>leaked benchmarks for complex, open ended engineering problems are so high.

318
00:15:10.039 --> 00:15:12.960
<v Speaker 3>Grok four point two isn't guessing on these problems. It's

319
00:15:13.000 --> 00:15:15.000
<v Speaker 3>holding a committee meeting at the speed of light.

320
00:15:15.320 --> 00:15:19.440
<v Speaker 2>Now, usually when you tell me committee meeting, I hear slow.

321
00:15:19.399 --> 00:15:21.360
<v Speaker 3>Right, bureaucracy, red tape.

322
00:15:21.440 --> 00:15:24.279
<v Speaker 2>Exactly. If I have to wait for four different AI

323
00:15:24.399 --> 00:15:26.840
<v Speaker 2>agents to argue it out, am I waiting ten minutes

324
00:15:26.840 --> 00:15:31.399
<v Speaker 2>for a response? Because we've all become very, very impatient users.

325
00:15:31.440 --> 00:15:32.840
<v Speaker 2>We want our answers instantly.

326
00:15:33.120 --> 00:15:37.480
<v Speaker 3>You would think so. It's logical: more computation, more agents,

327
00:15:37.559 --> 00:15:41.639
<v Speaker 3>more steps, it should equal more time. But this is

328
00:15:41.679 --> 00:15:45.240
<v Speaker 3>the engineering miracle of Grok four point two. The stats

329
00:15:45.279 --> 00:15:48.639
<v Speaker 3>on speed are genuinely baffling. What we call inference

330
00:15:48.759 --> 00:15:51.759
<v Speaker 3>latency is actually down by a factor of three to five.

331
00:15:51.600 --> 00:15:54.519
<v Speaker 2>Down, not up. It's faster to run four agents than

332
00:15:54.559 --> 00:15:56.279
<v Speaker 2>it was to run one old model.

333
00:15:56.120 --> 00:15:59.039
<v Speaker 3>Much, much faster. Responses that used to take a sluggish

334
00:15:59.039 --> 00:16:01.559
<v Speaker 3>eight to twelve seconds on Grok four are now taking

335
00:16:01.600 --> 00:16:02.519
<v Speaker 3>one to three seconds.

336
00:16:02.559 --> 00:16:04.679
<v Speaker 2>Okay, hold on, how is that physically possible? That seems

337
00:16:04.720 --> 00:16:06.960
<v Speaker 2>to defy the laws of computation.

338
00:16:07.120 --> 00:16:10.759
<v Speaker 3>It comes down to two main things, extreme parallelism and

339
00:16:10.799 --> 00:16:13.399
<v Speaker 3>a new memory innovation they're calling engram primitives.

340
00:16:13.519 --> 00:16:16.159
<v Speaker 2>Okay, let's unpack parallelism first. That makes some sense.

341
00:16:16.240 --> 00:16:18.960
<v Speaker 3>The agents don't run in a sequence. It's not reasoner

342
00:16:19.120 --> 00:16:20.759
<v Speaker 3>then verifier, then simulator.

343
00:16:20.840 --> 00:16:22.080
<v Speaker 2>It's not a bucket brigade.

344
00:16:22.279 --> 00:16:26.879
<v Speaker 3>No, they spin up simultaneously on that massive Colossus cluster.

345
00:16:27.759 --> 00:16:30.240
<v Speaker 3>The reasoner is doing its math at the exact same

346
00:16:30.320 --> 00:16:33.480
<v Speaker 3>time the verifier is querying the web for facts. They

347
00:16:33.480 --> 00:16:35.960
<v Speaker 3>work in parallel, not in a line, and then the

348
00:16:35.960 --> 00:16:37.559
<v Speaker 3>synthesizer sorts out the results.

349
00:16:37.559 --> 00:16:40.559
<v Speaker 2>Okay, that accounts for some of it. Yeah, but engrams?

350
00:16:41.159 --> 00:16:43.840
<v Speaker 2>That sounds like something straight out of a cyberpunk novel. Yeah,

351
00:16:43.879 --> 00:16:46.159
<v Speaker 2>I need to slot in a new engram for kung fu.

352
00:16:46.399 --> 00:16:48.519
<v Speaker 3>Hey, yeah, it does have that ring to it. It's

353
00:16:48.519 --> 00:16:51.799
<v Speaker 3>a fascinating memory innovation. The simplest way to think of

354
00:16:51.840 --> 00:16:55.039
<v Speaker 3>it is like a highly advanced ZIP file for concepts.

355
00:16:55.080 --> 00:16:56.440
<v Speaker 2>A ZIP file for a concept.

356
00:16:56.519 --> 00:17:00.759
<v Speaker 3>Okay. Normally, for an AI to access a specific piece

357
00:17:00.759 --> 00:17:03.759
<v Speaker 3>of knowledge, say the entire tax code of France, it

358
00:17:03.799 --> 00:17:07.160
<v Speaker 3>has to compute that information through its entire massive neural network.

359
00:17:07.160 --> 00:17:10.240
<v Speaker 3>That's billions of parameters firing. It's computationally heavy lifting.

360
00:17:10.440 --> 00:17:10.640
<v Speaker 2>Right.

361
00:17:11.000 --> 00:17:16.039
<v Speaker 3>Engram primitives are precomputed, compressed memory representations of large,

362
00:17:16.079 --> 00:17:19.359
<v Speaker 3>stable concepts. They allow the model to recall and reason

363
00:17:19.400 --> 00:17:22.400
<v Speaker 3>over vast knowledge bases without needing to activate the entire

364
00:17:22.440 --> 00:17:23.920
<v Speaker 3>brain for every single query.

365
00:17:24.200 --> 00:17:26.519
<v Speaker 2>So it's like instead of having to read the whole

366
00:17:26.559 --> 00:17:29.799
<v Speaker 2>textbook again every time, it has a photographic memory of

367
00:17:29.839 --> 00:17:32.640
<v Speaker 2>the specific page it needs, and it can just pull that instantly.

368
00:17:32.839 --> 00:17:36.599
<v Speaker 3>Roughly, yes. It's a very clever shortcut for high-speed recall.

369
00:17:37.359 --> 00:17:40.599
<v Speaker 3>It allows the agents to access huge amounts of domain

370
00:17:40.640 --> 00:17:45.960
<v Speaker 3>specific data instantly without the computational drag. It effectively prefetches

371
00:17:46.000 --> 00:17:49.559
<v Speaker 3>the context it thinks it will need for a given problem.

372
00:17:49.440 --> 00:17:51.599
<v Speaker 2>And that must tie into the context window. We're seeing a

373
00:17:51.680 --> 00:17:54.000
<v Speaker 2>native one million token window.

374
00:17:53.799 --> 00:17:56.680
<v Speaker 3>Native one million, and it's expandable to two million for

375
00:17:56.799 --> 00:17:58.000
<v Speaker 3>enterprise users.

376
00:17:57.960 --> 00:18:01.759
<v Speaker 2>For anyone listening, just for scale: one million tokens is huge.

377
00:18:01.759 --> 00:18:04.319
<v Speaker 2>It's the entire Lord of the Rings trilogy plus the Hobbit.

378
00:18:04.359 --> 00:18:05.440
<v Speaker 2>It's a massive code base.

379
00:18:05.640 --> 00:18:08.559
<v Speaker 3>It's the ability to hold an entire complex project in

380
00:18:08.599 --> 00:18:11.759
<v Speaker 3>memory at once. And because of this engram system,

381
00:18:11.799 --> 00:18:14.279
<v Speaker 3>it doesn't feel like the AI is loading all that data.

382
00:18:14.319 --> 00:18:16.880
<v Speaker 3>It feels native. It just knows it instantly.

383
00:18:17.400 --> 00:18:21.160
<v Speaker 2>So we have speed, unbelievable speed, but speed is useless

384
00:18:21.160 --> 00:18:24.119
<v Speaker 2>if you're just confidently wrong faster. Of course, we touched

385
00:18:24.160 --> 00:18:26.920
<v Speaker 2>on accuracy with the verifier agent, but what do the

386
00:18:26.960 --> 00:18:28.799
<v Speaker 2>actual numbers say about error rates?

387
00:18:29.119 --> 00:18:32.799
<v Speaker 3>The internal evaluations, which are now being corroborated by early

388
00:18:32.960 --> 00:18:36.640
<v Speaker 3>user benchmarks, are claiming a forty to sixty percent reduction

389
00:18:36.759 --> 00:18:41.039
<v Speaker 3>in error rates on complex multi-step reasoning problems compared

390
00:18:41.039 --> 00:18:42.319
<v Speaker 3>to Grok four point one.

391
00:18:42.400 --> 00:18:45.400
<v Speaker 2>A sixty percent reduction. That's not an incremental improvement. That

392
00:18:45.480 --> 00:18:46.759
<v Speaker 2>is a generational leap.

393
00:18:46.880 --> 00:18:49.720
<v Speaker 3>It's the difference between a novelty and a professional tool.

394
00:18:50.599 --> 00:18:53.039
<v Speaker 3>If an AI is wrong twenty percent of the time

395
00:18:53.079 --> 00:18:55.960
<v Speaker 3>on hard problems, you can't really trust it with your job.

396
00:18:56.279 --> 00:18:59.279
<v Speaker 3>You spend more time checking its work than doing your own. Sure,

397
00:18:59.440 --> 00:19:01.480
<v Speaker 3>if it's only wrong, say, one or two percent of

398
00:19:01.480 --> 00:19:03.359
<v Speaker 3>the time, you can start building a company on top

399
00:19:03.400 --> 00:19:06.039
<v Speaker 3>of it. The four agent system is pushing it across

400
00:19:06.079 --> 00:19:08.039
<v Speaker 3>that critical reliability threshold.

401
00:19:08.119 --> 00:19:10.400
<v Speaker 2>And I saw a specific note about coding, which is

402
00:19:10.519 --> 00:19:11.759
<v Speaker 2>always a key benchmark.

403
00:19:11.920 --> 00:19:15.279
<v Speaker 3>Yes, the capabilities there are really impressive. It can maintain

404
00:19:15.359 --> 00:19:19.559
<v Speaker 3>context across ten thousand plus line code bases. It's outperforming

405
00:19:19.599 --> 00:19:23.400
<v Speaker 3>all previous versions on standard benchmarks like HumanEval and

406
00:19:23.839 --> 00:19:26.680
<v Speaker 3>SWE-bench. But the key phrase in the release notes

407
00:19:26.680 --> 00:19:29.039
<v Speaker 3>that jumped out at me was autonomous debugging.

408
00:19:29.240 --> 00:19:33.200
<v Speaker 2>Autonomous debugging. That's the dream or the nightmare, depending on

409
00:19:33.240 --> 00:19:34.759
<v Speaker 2>your job security as a developer.

410
00:19:35.160 --> 00:19:38.480
<v Speaker 3>Well, for now, let's just call it a massive productivity multiplier.

411
00:19:38.559 --> 00:19:40.359
<v Speaker 2>Okay, so what does that look like in practice? What

412
00:19:40.400 --> 00:19:42.559
<v Speaker 2>does autonomous debugging actually mean?

413
00:19:42.880 --> 00:19:45.279
<v Speaker 3>It means Grok four point two doesn't just write a

414
00:19:45.319 --> 00:19:47.440
<v Speaker 3>block of code and hand it to you. It writes

415
00:19:47.480 --> 00:19:51.160
<v Speaker 3>the code and then, using the embodied simulator agent, it

416
00:19:51.279 --> 00:19:55.279
<v Speaker 3>runs the code in its own internal, sandboxed environment.

417
00:19:55.440 --> 00:19:57.119
<v Speaker 2>Ah, so it tests its own work.

418
00:19:57.359 --> 00:20:01.680
<v Speaker 3>It tests. It sees the error message, understands why it failed.

419
00:20:02.160 --> 00:20:05.559
<v Speaker 3>Maybe it missed a semicolon, maybe it imported the wrong library,

420
00:20:05.759 --> 00:20:09.240
<v Speaker 3>maybe a logical flaw. It then fixes its own mistake and

421
00:20:09.319 --> 00:20:11.680
<v Speaker 3>runs it again. It iterates internally.

422
00:20:11.799 --> 00:20:13.759
<v Speaker 2>It fixes its own mistakes before you even see them.

423
00:20:13.640 --> 00:20:17.000
<v Speaker 3>Exactly. All you see is the final working code.

424
00:20:17.119 --> 00:20:18.599
<v Speaker 3>It's a huge step forward.

425
00:20:18.920 --> 00:20:21.160
<v Speaker 2>Let's pivot, then. Let's get into the real world. Because

426
00:20:21.160 --> 00:20:24.599
<v Speaker 2>all these stats, latency, parameters, tokens, they can be a

427
00:20:24.599 --> 00:20:27.000
<v Speaker 2>bit abstract. What does this actually mean for us? What

428
00:20:27.039 --> 00:20:28.440
<v Speaker 2>are people doing with it right now?

429
00:20:28.599 --> 00:20:31.759
<v Speaker 3>In the beta, the applications are really starting to broaden out.

430
00:20:31.880 --> 00:20:34.480
<v Speaker 3>One of the big ones is in scientific simulation. You know,

431
00:20:34.640 --> 00:20:37.319
<v Speaker 3>xAI has always had that very grand mission statement.

432
00:20:37.039 --> 00:20:39.160
<v Speaker 2>Understand the universe, right?

433
00:20:39.319 --> 00:20:42.680
<v Speaker 3>Very modest goals, always. But Grok four point two is

434
00:20:42.720 --> 00:20:46.960
<v Speaker 3>taking that vision very seriously. We are already seeing users

435
00:20:47.000 --> 00:20:52.440
<v Speaker 3>in the beta prototyping novel robotics control algorithms, simulating protein folding,

436
00:20:52.680 --> 00:20:56.039
<v Speaker 3>modeling chemical reactions. And this brings us right back to Agent three.

437
00:20:55.799 --> 00:20:58.440
<v Speaker 2>The embodied simulator.

438
00:20:57.960 --> 00:21:00.319
<v Speaker 3>The simulator, because this is where the connection to the

439
00:21:00.319 --> 00:21:02.200
<v Speaker 3>physical world becomes so strong.

440
00:21:02.279 --> 00:21:04.680
<v Speaker 2>Why is that robotic connection so front and center with

441
00:21:04.759 --> 00:21:05.640
<v Speaker 2>this release.

442
00:21:05.400 --> 00:21:07.960
<v Speaker 3>Well, you have to look at the whole ecosystem. xAI

443
00:21:08.119 --> 00:21:10.599
<v Speaker 3>isn't just a software company existing in a vacuum. It's

444
00:21:10.640 --> 00:21:13.759
<v Speaker 3>a sister company to Tesla. And what are we seeing

445
00:21:13.799 --> 00:21:17.559
<v Speaker 3>from Tesla right now? An explosion of humanoid robots, Optimus

446
00:21:17.559 --> 00:21:19.160
<v Speaker 3>and other robotics projects.

447
00:21:19.279 --> 00:21:21.640
<v Speaker 2>They are everywhere on social media. You can't scroll for

448
00:21:21.680 --> 00:21:24.400
<v Speaker 2>five minutes without seeing a robot folding laundry or walking

449
00:21:24.400 --> 00:21:25.319
<v Speaker 2>a dog.

450
00:21:25.440 --> 00:21:27.680
<v Speaker 3>Exactly. Grok four point two is being positioned as the mind

451
00:21:27.759 --> 00:21:30.480
<v Speaker 3>being built for those bodies. When you ask it to

452
00:21:30.519 --> 00:21:33.200
<v Speaker 3>write code to make this robot walk over uneven terrain,

453
00:21:33.759 --> 00:21:36.920
<v Speaker 3>it isn't just guessing syntax based on text from the internet.

454
00:21:37.039 --> 00:21:38.519
<v Speaker 2>The simulator agent is running it.

455
00:21:38.680 --> 00:21:42.000
<v Speaker 3>The simulator agent is modeling the friction of the ground,

456
00:21:42.480 --> 00:21:46.079
<v Speaker 3>the robot's center of gravity, the momentum. It's trying to

457
00:21:46.079 --> 00:21:49.359
<v Speaker 3>solve the problem from first principles of physics.

458
00:21:49.960 --> 00:21:53.279
<v Speaker 2>It is. It's just a wild concept. It's bridging the gap between

459
00:21:53.279 --> 00:21:57.079
<v Speaker 2>digital intelligence and physical action. It's not just describing the

460
00:21:57.079 --> 00:21:59.920
<v Speaker 2>world anymore. It's actively learning how to move through it.

461
00:22:00.000 --> 00:22:03.359
<v Speaker 3>It's giving the AI a sense of proprioception, a sense

462
00:22:03.359 --> 00:22:06.000
<v Speaker 3>of its own body and how it exists in space.

463
00:22:06.079 --> 00:22:09.359
<v Speaker 3>It's a critical step toward general purpose robotics.

464
00:22:09.480 --> 00:22:12.119
<v Speaker 2>And on the complete flip side of the cold hard science,

465
00:22:12.480 --> 00:22:15.799
<v Speaker 2>there's the personality. Grok has always been known for that witty,

466
00:22:16.279 --> 00:22:17.359
<v Speaker 2>non-corporate vibe.

467
00:22:17.039 --> 00:22:21.200
<v Speaker 3>Right, the rebellious humor. The anti-HR chatbot, as

468
00:22:21.200 --> 00:22:22.200
<v Speaker 3>some people call it.

469
00:22:22.200 --> 00:22:24.640
<v Speaker 2>Does four point two keep that, or has the committee of

470
00:22:24.680 --> 00:22:26.359
<v Speaker 2>agents made it boring and safe?

471
00:22:26.599 --> 00:22:29.119
<v Speaker 3>From what I'm seeing and experiencing, the reports say it

472
00:22:29.160 --> 00:22:32.519
<v Speaker 3>has actually deepened it. It seems to have better emotional intelligence.

473
00:22:32.559 --> 00:22:35.240
<v Speaker 3>Now it's not just snarky for the sake of being snarky.

474
00:22:35.519 --> 00:22:36.559
<v Speaker 3>It can be more nuanced.

475
00:22:36.759 --> 00:22:37.839
<v Speaker 2>It can read the room better.

476
00:22:38.039 --> 00:22:40.400
<v Speaker 3>It can read the room, and it can handle long

477
00:22:40.440 --> 00:22:45.319
<v Speaker 3>form creative writing, things like essays and screenplays, with much better coherence.

478
00:22:46.119 --> 00:22:49.599
<v Speaker 3>That's because the synthesizer agent is there to maintain the

479
00:22:49.759 --> 00:22:52.799
<v Speaker 3>narrative arc and thematic consistency.

480
00:22:52.200 --> 00:22:54.319
<v Speaker 2>So it doesn't just lose the plot halfway through a

481
00:22:54.400 --> 00:22:55.240
<v Speaker 2>story.

482
00:22:55.240 --> 00:22:57.680
<v Speaker 3>Exactly. And the visuals are tighter too. The integration with Grok

483
00:22:57.720 --> 00:23:01.640
<v Speaker 3>Imagine is much more seamless because of the multi-agent reasoning.

484
00:23:01.839 --> 00:23:05.640
<v Speaker 3>You can refine images with more abstract conversational commands.

485
00:23:05.799 --> 00:23:06.279
<v Speaker 2>What do you mean?

486
00:23:06.599 --> 00:23:09.319
<v Speaker 3>You can say no, make the lighting more moody, you know,

487
00:23:09.400 --> 00:23:12.400
<v Speaker 3>like a film noir movie, and the verifier and simulator

488
00:23:12.440 --> 00:23:17.039
<v Speaker 3>agents actually understand the stylistic implication of film noir, the

489
00:23:17.119 --> 00:23:21.200
<v Speaker 3>deep shadows, the high contrasts, rather than just looking for keywords.

490
00:23:21.240 --> 00:23:23.319
<v Speaker 2>So it's a better artist, a better engineer, and a

491
00:23:23.319 --> 00:23:23.960
<v Speaker 2>better comedian.

492
00:23:24.119 --> 00:23:27.480
<v Speaker 3>It's a true Renaissance model. It's a generalist that uses specialists.

493
00:23:27.599 --> 00:23:30.039
<v Speaker 2>Okay, I'm sold. I want to use it. Everyone listening

494
00:23:30.079 --> 00:23:32.680
<v Speaker 2>probably wants to use it. How do people get access

495
00:23:32.720 --> 00:23:34.799
<v Speaker 2>to this? It's a beta.

496
00:23:34.759 --> 00:23:38.319
<v Speaker 3>It is, but it's a public beta available right now as we speak.

497
00:23:38.400 --> 00:23:41.119
<v Speaker 2>So I just go to grok dot com or the

498
00:23:41.279 --> 00:23:41.640
<v Speaker 2>X app?

499
00:23:41.880 --> 00:23:44.440
<v Speaker 3>Yep, go to grok dot com or open up the

500
00:23:44.599 --> 00:23:47.279
<v Speaker 3>X app, start a new chat, and there should be

501
00:23:47.319 --> 00:23:49.160
<v Speaker 3>a model selector usually at the top of the screen.

502
00:23:49.319 --> 00:23:53.240
<v Speaker 3>Just click that and choose Grok four point two public beta.

503
00:23:52.920 --> 00:23:57.240
<v Speaker 2>And that's it? No waitlist, no secret handshake?

504
00:23:56.960 --> 00:24:01.240
<v Speaker 3>No waitlist. xAI wants the data. They need that Friday

505
00:24:01.319 --> 00:24:04.000
<v Speaker 3>ritual to kick in. They need millions of people to

506
00:24:04.039 --> 00:24:04.720
<v Speaker 3>be the teachers.

507
00:24:04.759 --> 00:24:07.920
<v Speaker 2>They want to accelerate the flywheel. And for the power users,

508
00:24:08.160 --> 00:24:09.559
<v Speaker 2>the developers, the researchers.

509
00:24:09.799 --> 00:24:12.680
<v Speaker 3>If you're a SuperGrok or an X Premium Plus user,

510
00:24:12.960 --> 00:24:16.279
<v Speaker 3>you get higher rate limits, which is nice, but more interestingly,

511
00:24:16.319 --> 00:24:18.720
<v Speaker 3>you get access to some of the debugging features.

512
00:24:18.839 --> 00:24:19.960
<v Speaker 2>You can peek behind the curtain.

513
00:24:20.000 --> 00:24:21.839
<v Speaker 3>You can peek behind the curtain and see some of

514
00:24:21.880 --> 00:24:25.480
<v Speaker 3>that hidden reasoning trace that debate between the agents.

515
00:24:25.559 --> 00:24:27.359
<v Speaker 2>I suspect a lot of people will be upgrading their

516
00:24:27.359 --> 00:24:29.480
<v Speaker 2>subscriptions just to see that.

517
00:24:29.519 --> 00:24:32.839
<v Speaker 3>I would. I mean, seeing the agents argue is probably as entertaining

518
00:24:32.880 --> 00:24:35.880
<v Speaker 3>and definitely as educational as the final answer itself. It

519
00:24:35.880 --> 00:24:38.319
<v Speaker 3>shows you the process of intelligence, not just the product.

520
00:24:38.599 --> 00:24:40.839
<v Speaker 2>So let's zoom out for the last section. What is

521
00:24:40.920 --> 00:24:46.359
<v Speaker 2>the strategic play here? Why release a powerful beta this aggressively?

522
00:24:47.039 --> 00:24:50.880
<v Speaker 2>Why commit to weekly public updates? It feels risky.

523
00:24:51.079 --> 00:24:53.759
<v Speaker 3>It's a huge strategic bet against the rest of the industry.

524
00:24:53.799 --> 00:24:56.400
<v Speaker 3>I mean, look at their main competitors, OpenAI, Google.

525
00:24:56.480 --> 00:24:58.720
<v Speaker 3>They tend to hold their models back. They keep them

526
00:24:58.720 --> 00:25:01.720
<v Speaker 3>in the lab until they are perfect or perfectly safe.

527
00:25:01.720 --> 00:25:03.480
<v Speaker 3>They bake them for a year or more.

528
00:25:03.680 --> 00:25:05.960
<v Speaker 2>Right, they are more cautious. They treat it like a traditional,

529
00:25:06.000 --> 00:25:06.720
<v Speaker 2>polished product launch.

530
00:25:06.799 --> 00:25:11.400
<v Speaker 3>xAI is treating this like a continuous open science experiment.

531
00:25:11.759 --> 00:25:14.880
<v Speaker 3>They are betting that the fastest path to AGI to

532
00:25:15.000 --> 00:25:19.440
<v Speaker 3>artificial general intelligence isn't just more compute or more data,

533
00:25:19.799 --> 00:25:22.519
<v Speaker 3>it's continuous, user-grounded learning.

534
00:25:22.640 --> 00:25:25.279
<v Speaker 2>They're trying to build an unbeatable flywheel.

535
00:25:25.440 --> 00:25:28.359
<v Speaker 3>Exactly. By opening the beta to everyone and committing to weekly updates,

536
00:25:28.440 --> 00:25:30.960
<v Speaker 3>they're accelerating their own feedback loop at a pace no

537
00:25:30.960 --> 00:25:34.279
<v Speaker 3>one else can match. If they genuinely improve every single Friday,

538
00:25:34.559 --> 00:25:37.920
<v Speaker 3>that's fifty two major iteration cycles a year.

539
00:25:38.039 --> 00:25:38.440
<v Speaker 2>Wow.

540
00:25:38.559 --> 00:25:41.599
<v Speaker 3>Their competitors might only have one or two major releases

541
00:25:41.640 --> 00:25:44.240
<v Speaker 3>in that same timeframe. The compounding effect of that is

542
00:25:44.359 --> 00:25:46.599
<v Speaker 3>just the math is hard to beat.

543
00:25:47.079 --> 00:25:52.079
<v Speaker 2>It's the SpaceX approach applied to AI development. Launch, test, iterate,

544
00:25:52.319 --> 00:25:56.599
<v Speaker 2>repeat rapidly, even if things break occasionally along the way.

545
00:25:56.640 --> 00:25:58.759
<v Speaker 3>That's it. Move fast and fix things.

546
00:25:58.519 --> 00:26:03.000
<v Speaker 2>And looking ahead, what's on the roadmap after this beta?

547
00:26:02.519 --> 00:26:04.920
<v Speaker 3>Short term, this public beta is scheduled to run until

548
00:26:05.000 --> 00:26:07.559
<v Speaker 3>late March twenty twenty six, so we have about a

549
00:26:07.680 --> 00:26:10.559
<v Speaker 3>month of this rapid weekly evolution to watch.

550
00:26:10.240 --> 00:26:12.519
<v Speaker 2>And then what comes next?

551
00:26:12.680 --> 00:26:16.359
<v Speaker 3>Then all eyes turn toward Grok five. The internal target

552
00:26:16.400 --> 00:26:18.640
<v Speaker 3>for that is Q two or Q three of this

553
00:26:18.759 --> 00:26:19.839
<v Speaker 3>year twenty twenty six.

554
00:26:20.119 --> 00:26:20.799
<v Speaker 2>Grok five.

555
00:26:21.119 --> 00:26:23.480
<v Speaker 3>It just never stops, and that will likely run on

556
00:26:23.519 --> 00:26:26.519
<v Speaker 3>the next generation of their hardware, which they're calling Colossus two,

557
00:26:26.880 --> 00:26:31.839
<v Speaker 3>and will probably involve even more sophisticated and autonomous agent orchestration.

558
00:26:32.000 --> 00:26:34.400
<v Speaker 2>What does that mean? Autonomous agent orchestration?

559
00:26:34.599 --> 00:26:36.440
<v Speaker 3>We might start to see agents that can go out

560
00:26:36.480 --> 00:26:39.000
<v Speaker 3>and perform complex tasks on the web for you, not

561
00:26:39.119 --> 00:26:41.680
<v Speaker 3>just answer questions. Agents that can research and book an

562
00:26:41.799 --> 00:26:45.400
<v Speaker 3>entire vacation or manage your calendar, or negotiate a purchase

563
00:26:45.440 --> 00:26:47.640
<v Speaker 3>on your behalf. True digital assistants.

564
00:26:47.920 --> 00:26:49.920
<v Speaker 2>So to recap before we wrap up here, we have

565
00:26:49.960 --> 00:26:52.680
<v Speaker 2>a new model, Grok four point two. It runs on

566
00:26:52.720 --> 00:26:55.720
<v Speaker 2>a supercluster the size of a city. It learns every

567
00:26:55.759 --> 00:26:59.079
<v Speaker 2>single week based on our direct feedback via this Friday ritual,

568
00:27:00.119 --> 00:27:04.079
<v Speaker 2>thinks using an internal council of four specialized agents, a reasoner,

569
00:27:04.160 --> 00:27:07.799
<v Speaker 2>a verifier, a simulator, and a synthesizer. And it's available

570
00:27:07.839 --> 00:27:09.279
<v Speaker 2>for anyone to try right now.

571
00:27:09.400 --> 00:27:13.559
<v Speaker 3>That's the summary. We've moved from static, frozen models to

572
00:27:14.720 --> 00:27:18.480
<v Speaker 3>a living, evolving intelligence. It's a fundamental shift.

573
00:27:18.519 --> 00:27:20.640
<v Speaker 2>It's a massive shift, and I just keep coming back

574
00:27:20.640 --> 00:27:23.279
<v Speaker 2>to that Friday ritual, the idea that the machine is

575
00:27:23.319 --> 00:27:25.640
<v Speaker 2>growing up alongside us week by week.

576
00:27:25.799 --> 00:27:28.160
<v Speaker 3>It completely changes your relationship with it. You aren't just

577
00:27:28.200 --> 00:27:31.240
<v Speaker 3>a consumer of its answers. You're an active contributor to

578
00:27:31.279 --> 00:27:33.480
<v Speaker 3>its growth. You're part of the training data.

579
00:27:33.559 --> 00:27:35.200
<v Speaker 2>Which brings me to a final thought I wanted to

580
00:27:35.200 --> 00:27:37.359
<v Speaker 2>throw at you, a bit of a provocation to leave

581
00:27:37.400 --> 00:27:39.160
<v Speaker 2>our listeners with as they go and try this thing out.

582
00:27:39.279 --> 00:27:40.079
<v Speaker 3>Okay, let's hear it.

583
00:27:40.400 --> 00:27:43.160
<v Speaker 2>We talked about agent three, the embodied simulator. It runs

584
00:27:43.200 --> 00:27:46.720
<v Speaker 2>physics simulations to help robots interact with the real world. Right,

585
00:27:46.920 --> 00:27:49.920
<v Speaker 2>And we talked about the feedback loop, us, the users,

586
00:27:50.279 --> 00:27:53.839
<v Speaker 2>telling the AI what is true or helpful or correct.

587
00:27:54.160 --> 00:27:58.200
<v Speaker 2>So if the model updates every single Friday based on

588
00:27:58.240 --> 00:28:02.680
<v Speaker 2>our collective human interactions, and it has an agent specifically

589
00:28:02.680 --> 00:28:06.759
<v Speaker 2>designed to simulate and then operate in reality, at what

590
00:28:06.880 --> 00:28:09.440
<v Speaker 2>point does the feedback loop between us and the AI

591
00:28:10.039 --> 00:28:13.200
<v Speaker 2>begin to actively shape reality rather than just describe it?

592
00:28:13.279 --> 00:28:16.119
<v Speaker 3>That is a heavy question, a very heavy question.

593
00:28:16.240 --> 00:28:19.960
<v Speaker 2>Think about it. If millions of us tell the simulator

594
00:28:20.000 --> 00:28:22.240
<v Speaker 2>that this is how a financial market should work, or

595
00:28:22.519 --> 00:28:24.880
<v Speaker 2>this is how traffic flow should be optimized, or even

596
00:28:25.039 --> 00:28:27.880
<v Speaker 2>this is how society ought to function, and the AI

597
00:28:28.079 --> 00:28:30.799
<v Speaker 2>then begins to write the code for the robots and

598
00:28:30.839 --> 00:28:33.759
<v Speaker 2>the autonomous systems that run our infrastructure based on that

599
00:28:33.839 --> 00:28:37.240
<v Speaker 2>learned consensus. Yeah, we're entering a very, very strange territory.

600
00:28:37.319 --> 00:28:40.400
<v Speaker 3>It becomes the ultimate consensus reality. We are collectively training

601
00:28:40.440 --> 00:28:43.440
<v Speaker 3>the engine that will then build our future infrastructure. And

602
00:28:43.519 --> 00:28:45.759
<v Speaker 3>if we as a collective feed it our biases or

603
00:28:45.799 --> 00:28:50.480
<v Speaker 3>our errors, or our collective delusions, those delusions become code.

604
00:28:50.319 --> 00:28:53.440
<v Speaker 2>They become concrete. They become how the robot acts.

605
00:28:53.559 --> 00:28:56.599
<v Speaker 3>Exactly. We aren't just chatting with a bot anymore. We're, in

606
00:28:56.640 --> 00:29:00.400
<v Speaker 3>a very real sense, teaching the operating system of the future.

607
00:29:00.839 --> 00:29:02.680
<v Speaker 3>So we better be damn careful what we teach it

608
00:29:02.720 --> 00:29:04.039
<v Speaker 3>on Tuesdays and Wednesdays.

609
00:29:04.319 --> 00:29:07.440
<v Speaker 2>Indeed, we better be very careful what we flag as

610
00:29:07.799 --> 00:29:13.319
<v Speaker 2>helpful. On that slightly terrifying but also exhilarating note, I

611
00:29:13.319 --> 00:29:14.599
<v Speaker 2>think we're going to wrap it up there.

612
00:29:14.720 --> 00:29:17.039
<v Speaker 3>Go test it out, see what the Council of Agents

613
00:29:17.079 --> 00:29:18.200
<v Speaker 3>has to say for itself.

614
00:29:18.279 --> 00:29:20.640
<v Speaker 2>Go collaborate with the new system. Be a good teacher.

615
00:29:21.000 --> 00:29:22.240
<v Speaker 2>We'll see you in the next discussion.

616
00:29:22.640 --> 00:29:24.240
<v Speaker 3>Thanks for listening, everyone. Bye, everyone.
