WEBVTT 1 00:00:01.199 --> 00:00:06.200 Welcome to the Sentient Code, where intelligence is engineered, autonomy 2 00:00:06.280 --> 00:00:10.439 is emerging, and the line between human and machine grows thinner. 3 00:00:10.800 --> 00:00:15.359 Each episode, we decode the algorithms, explore the robotics, and 4 00:00:15.439 --> 00:00:21.879 examine the ideas shaping the future of artificial minds. 5 00:00:23.800 --> 00:00:25.879 Okay, let's just take a second to breathe. 6 00:00:26.320 --> 00:00:29.000 Yeah, a deep breath is probably a good idea. 7 00:00:29.160 --> 00:00:31.559 Because if you were online yesterday, I mean, if you 8 00:00:31.600 --> 00:00:34.719 were anywhere near a notification stream or X or just 9 00:00:34.759 --> 00:00:38.240 a tech news ticker, you didn't just see a press release. No, 10 00:00:38.439 --> 00:00:39.399 you felt a tremor. 11 00:00:39.799 --> 00:00:42.200 That is the perfect word for it, a tremor, a 12 00:00:42.359 --> 00:00:43.439 shift in the bedrock. 13 00:00:43.600 --> 00:00:47.039 You know, we have become so desensitized to updates, haven't we? 14 00:00:47.280 --> 00:00:49.560 It's always version one point one, version one point two. 15 00:00:49.679 --> 00:00:51.479 Oh yeah, minor bug fixes. 16 00:00:51.520 --> 00:00:53.759 It's usually bug fixes, maybe a dark mode, maybe the 17 00:00:53.759 --> 00:00:56.759 app loads, what, five percent faster? We just scroll right past it. 18 00:00:57.119 --> 00:01:00.880 But what happened yesterday, February seventeenth, twenty twenty six, was, 19 00:01:02.119 --> 00:01:02.840 it was not that. 20 00:01:03.280 --> 00:01:06.920 No, it was a paradigm shift disguised as a decimal point, 21 00:01:07.319 --> 00:01:09.439 a very, very significant decimal point. 22 00:01:09.480 --> 00:01:11.359 We are talking, of course, about the massive news from 23 00:01:11.439 --> 00:01:14.959 xAI. Elon Musk took to X and announced the immediate 24 00:01:14.959 --> 00:01:16.840 public beta of Grok four point two.
25 00:01:17.159 --> 00:01:19.200 And not Grok four point twenty, as some of the 26 00:01:19.239 --> 00:01:20.959 early memes were suggesting. 27 00:01:20.560 --> 00:01:23.319 Right, that little nod to internet culture, but they clarified 28 00:01:23.359 --> 00:01:25.200 it's four point two. And I have to say, looking 29 00:01:25.239 --> 00:01:27.879 at the sheer density of the documentation, the white papers, 30 00:01:27.920 --> 00:01:31.519 the architecture diagrams, so calling this an update feels like 31 00:01:31.519 --> 00:01:34.680 an insult. This feels like a different species of intelligence. 32 00:01:34.760 --> 00:01:37.280 It really is. And to understand why this matters, you 33 00:01:37.280 --> 00:01:40.040 have to look past the branding. Usually a point two 34 00:01:40.200 --> 00:01:44.120 release is an optimization, it's a tweak, a refinement. Exactly. 35 00:01:44.560 --> 00:01:47.879 But xAI is claiming this model is designed to be, 36 00:01:48.359 --> 00:01:51.439 and this is a direct quote, an order of magnitude 37 00:01:51.480 --> 00:01:54.000 smarter and faster than Grok four. 38 00:01:54.120 --> 00:01:56.159 An order of magnitude. That's not a small claim. 39 00:01:56.239 --> 00:01:59.799 It's a huge claim. And the kicker, they aren't promising 40 00:01:59.799 --> 00:02:02.239 this for next year or Q four. They're saying this 41 00:02:02.280 --> 00:02:04.799 is happening now in a public beta that concludes in 42 00:02:04.840 --> 00:02:05.439 about a month. 43 00:02:05.680 --> 00:02:09.400 That timeline is what stopped me in my tracks. It's just, yeah, 44 00:02:09.479 --> 00:02:10.199 it's breathtaking. 45 00:02:10.280 --> 00:02:13.120 It's fast, it's aggressive. I mean, even for them, it's aggressive. 46 00:02:13.280 --> 00:02:14.719 Let's just put it on a calendar for a second 47 00:02:14.759 --> 00:02:17.639 for everyone listening. Grok four was released in July of 48 00:02:17.639 --> 00:02:18.520 twenty twenty.
49 00:02:18.240 --> 00:02:20.639 Five, right, which already felt like a huge leap, a 50 00:02:20.719 --> 00:02:21.120 huge leap. 51 00:02:21.159 --> 00:02:25.680 Then Grok four point one followed in November twenty twenty five. Fine. 52 00:02:25.840 --> 00:02:28.360 Now here we are, February twenty twenty six, and we 53 00:02:28.400 --> 00:02:31.360 get four point two. This isn't software development time. No, 54 00:02:31.400 --> 00:02:34.560 this is evolutionary time. This is compounding at a speed 55 00:02:34.599 --> 00:02:36.080 that feels unnatural. 56 00:02:36.280 --> 00:02:39.360 It's the fastest major iteration cycle we've seen in the 57 00:02:39.400 --> 00:02:41.919 history of the company, and you know, arguably in the 58 00:02:41.960 --> 00:02:46.560 history of the entire sector. They are compounding intelligence at 59 00:02:46.639 --> 00:02:49.879 a rate that is becoming genuinely difficult to track. 60 00:02:50.560 --> 00:02:53.000 So here's our mission for today. We aren't just going 61 00:02:53.080 --> 00:02:54.439 to read the release notes. 62 00:02:54.599 --> 00:02:55.560 Yeah, anyone can do that. 63 00:02:55.680 --> 00:02:58.280 Anyone can do that. We have the technical breakdowns, we 64 00:02:58.319 --> 00:03:01.000 have some of the leaked benchmarks, and the early user 65 00:03:01.039 --> 00:03:04.280 reports are flooding in. We need to understand not just 66 00:03:04.439 --> 00:03:08.759 what changed, but how this thing actually operates. Because the 67 00:03:08.840 --> 00:03:12.080 central claim is that it thinks differently than anything we've 68 00:03:12.159 --> 00:03:12.719 used before. 69 00:03:12.840 --> 00:03:15.000 That's the key insight. You said it perfectly. It's not 70 00:03:15.080 --> 00:03:17.039 just a bigger brain, it's a different kind of mind. 71 00:03:17.080 --> 00:03:19.960 It's a whole new cognitive architecture.
72 00:03:20.159 --> 00:03:22.520 Okay, So before we get into the software wizardry, which 73 00:03:22.800 --> 00:03:25.719 honestly it's mind bending stuff, we have to ground this 74 00:03:25.840 --> 00:03:30.240 in physical reality the hardware, because AI feels like magic, 75 00:03:30.479 --> 00:03:33.199 but it runs on metal. It runs on silicon and copper. 76 00:03:33.840 --> 00:03:36.000 We need to talk about the Memphis Colossus. 77 00:03:36.080 --> 00:03:38.960 The Colossus. It sounds like a wonder of the ancient world, 78 00:03:39.000 --> 00:03:39.680 doesn't it. 79 00:03:39.680 --> 00:03:43.319 It basically is the modern equivalent. The stats on this 80 00:03:43.360 --> 00:03:47.680 supercluster are hard to even visualize. We are talking about 81 00:03:47.680 --> 00:03:50.639 a cluster that has now scaled to over one point 82 00:03:50.680 --> 00:03:52.680 two million GPUs. 83 00:03:52.960 --> 00:03:54.960 Let's just pause on that number for a second. One 84 00:03:55.000 --> 00:03:55.800 point two million. 85 00:03:55.960 --> 00:03:57.879 I mean, I remember just two years ago, back in 86 00:03:57.919 --> 00:04:00.520 twenty four, we were looking at clusters with one hundred 87 00:04:00.520 --> 00:04:03.159 thousand GPUs and our minds were blown. We were thinking, 88 00:04:03.599 --> 00:04:06.120 this is it. This is the peak. You can't possibly 89 00:04:06.120 --> 00:04:08.479 connect more than that efficiently exactly. 90 00:04:08.520 --> 00:04:11.879 We thought that was industrial scale. This this is planetary scale. 91 00:04:12.000 --> 00:04:14.039 This is a city made of compute. 92 00:04:13.560 --> 00:04:15.719 A city. That's a great way to put it, and. 93 00:04:15.719 --> 00:04:18.560 The reason you, the listener, need to care about that 94 00:04:18.639 --> 00:04:22.040 number isn't just because it's big and impressive. It's because 95 00:04:22.120 --> 00:04:24.399 quantity has a quality all its own. 
96 00:04:25.199 --> 00:04:26.040 What do you mean by that? 97 00:04:26.120 --> 00:04:29.079 When you have one point two million GPUs at your disposal, 98 00:04:29.560 --> 00:04:32.319 you aren't just training the same old models faster. You 99 00:04:32.399 --> 00:04:36.040 unlock entirely new training techniques that are physically impossible when 100 00:04:36.079 --> 00:04:37.160 you're compute constrained. 101 00:04:37.360 --> 00:04:40.040 So it's not just about cooking the steak faster. No, 102 00:04:40.199 --> 00:04:42.639 it allows you to cook a completely different meal. 103 00:04:42.839 --> 00:04:46.240 Precisely. You can run parallel simulations on a massive scale. 104 00:04:46.639 --> 00:04:48.600 You can do reinforcement learning loops that would take a 105 00:04:48.680 --> 00:04:51.319 smaller cluster a decade to finish, and you can now 106 00:04:51.360 --> 00:04:54.720 do them in a week. That hardware, that Colossus, is 107 00:04:54.759 --> 00:04:58.480 the absolute prerequisite for the software breakthroughs we're about to 108 00:04:58.480 --> 00:04:58.959 get into. 109 00:04:58.959 --> 00:05:01.439 Which brings us to the first big one, the rapid 110 00:05:01.519 --> 00:05:04.279 learning architecture. Now, I really want you to break this 111 00:05:04.399 --> 00:05:06.920 down for us, because learning is a word we throw 112 00:05:06.959 --> 00:05:09.959 around a lot in AI. It's almost lost its meaning. 113 00:05:10.240 --> 00:05:11.839 How is this different from the old way? 114 00:05:12.240 --> 00:05:14.240 Right. So, the old way, and by old I mean 115 00:05:14.319 --> 00:05:16.959 the standard practice for pretty much every major model in 116 00:05:17.040 --> 00:05:19.560 twenty twenty four and twenty twenty five, is what I'd 117 00:05:19.560 --> 00:05:22.879 call the snapshot method.
The snapshot. You gather the entire internet, 118 00:05:22.920 --> 00:05:27.079 basically, petabytes of text, books, code, everything. You feed it 119 00:05:27.120 --> 00:05:29.319 into the model, you cook it for months on a 120 00:05:29.399 --> 00:05:33.160 giant cluster, and then when it's done, you freeze it. 121 00:05:33.240 --> 00:05:36.240 You freeze the weights, like printing an encyclopedia. Exactly. 122 00:05:36.279 --> 00:05:39.480 It's a perfect analogy. Once that training run is done, 123 00:05:39.920 --> 00:05:43.480 the weights, the neural connections inside the model, are set 124 00:05:43.519 --> 00:05:46.240 in stone. If the world changes a day after you 125 00:05:46.279 --> 00:05:49.279 finish training, well, too bad. The model doesn't know. It's 126 00:05:49.319 --> 00:05:50.800 a frozen artifact of the past. 127 00:05:50.839 --> 00:05:53.639 And that's why we always had those knowledge cutoffs. You'd 128 00:05:53.639 --> 00:05:55.959 ask about a news event from last week and the 129 00:05:55.959 --> 00:05:59.519 AI would say, sorry, my knowledge ends in September twenty 130 00:05:59.519 --> 00:06:00.319 twenty three. Correct. 131 00:06:00.920 --> 00:06:04.399 That was the fundamental limitation of the static paradigm. Grok 132 00:06:04.480 --> 00:06:07.920 four point two is trying to kill the snapshot. They've 133 00:06:07.959 --> 00:06:10.839 introduced what they call a hybrid post-training process. 134 00:06:11.040 --> 00:06:14.360 This is the real-time adaptation feature I saw mentioned everywhere. 135 00:06:14.519 --> 00:06:17.040 Yes, so instead of being a solid block of ice, 136 00:06:17.319 --> 00:06:20.560 think of the model as having a fluid outer layer. 137 00:06:21.120 --> 00:06:24.720 It's a lightweight continual learning layer that ingests anonymized, high 138 00:06:24.720 --> 00:06:27.160 signal user feedback almost constantly.
139 00:06:27.439 --> 00:06:30.360 So when I'm using Grok and I click that thumbs down, 140 00:06:30.839 --> 00:06:33.519 or I flag an answer as unhelpful, or maybe I 141 00:06:33.560 --> 00:06:35.959 paste in a correction, or I'm working through a complex code 142 00:06:35.959 --> 00:06:36.600 bug with it. 143 00:06:36.600 --> 00:06:38.920 It's not just going into a log file that a 144 00:06:39.000 --> 00:06:41.480 human intern might read in six months. It is being 145 00:06:41.519 --> 00:06:46.319 distilled mathematically into these tiny micro updates. And this all 146 00:06:46.439 --> 00:06:50.360 leads to what xAI is very cleverly branding as the 147 00:06:50.480 --> 00:06:51.759 Friday Ritual. 148 00:06:52.199 --> 00:06:55.199 I love this branding, the Friday Ritual. It sounds a 149 00:06:55.240 --> 00:06:58.920 little culty, but in a cool Silicon Valley kind of way. Yeah, 150 00:06:58.959 --> 00:07:00.759 what exactly happens on Fridays? 151 00:07:00.959 --> 00:07:04.800 Every Friday, xAI pushes a global update to the model's weights. 152 00:07:05.160 --> 00:07:08.120 This isn't a full retrain, but it's a significant update 153 00:07:08.279 --> 00:07:12.000 based on the aggregated, verified learnings from millions of users 154 00:07:12.040 --> 00:07:12.920 over the previous week. 155 00:07:13.160 --> 00:07:14.920 It is, that's wild. 156 00:07:15.040 --> 00:07:17.040 It means the Grok you talk to on a Monday 157 00:07:17.079 --> 00:07:20.439 morning is measurably, mathematically smarter than the one you were 158 00:07:20.439 --> 00:07:21.639 talking to on Sunday night. 159 00:07:21.920 --> 00:07:22.279 Wow. 160 00:07:22.480 --> 00:07:26.759 It has metabolized the experiences, the corrections, the hard problems 161 00:07:26.959 --> 00:07:28.959 that millions of people threw at it over the last 162 00:07:29.000 --> 00:07:29.560 seven days. 163 00:07:29.839 --> 00:07:32.160 Just think about the feedback loop there.
I mean, if 164 00:07:32.199 --> 00:07:35.120 a brand new programming library comes out on a Tuesday. Yep, 165 00:07:35.439 --> 00:07:38.000 and thousands of developers are struggling with it, and they're 166 00:07:38.079 --> 00:07:41.519 using Grok and they're correcting its mistakes on Wednesday and Thursday. 167 00:07:41.639 --> 00:07:45.240 By Friday's update, Grok knows the new library. It stops 168 00:07:45.279 --> 00:07:46.240 making those mistakes. 169 00:07:46.560 --> 00:07:48.800 It's absorbed that knowledge from the collective. 170 00:07:48.920 --> 00:07:52.079 It's evolution on a weekly cycle. It's an unprecedented speed 171 00:07:52.079 --> 00:07:52.720 of adaptation. 172 00:07:53.079 --> 00:07:55.439 But, and I have to play the skeptic here because 173 00:07:55.439 --> 00:07:58.160 I can hear the safety researchers screaming into their pillows 174 00:07:58.199 --> 00:07:58.600 right now. 175 00:07:59.000 --> 00:08:02.439 Isn't this incredibly... It's the first question everyone asks. 176 00:08:02.639 --> 00:08:06.199 We all remember Tay, the Microsoft chatbot from a decade ago. 177 00:08:06.480 --> 00:08:07.680 It lasted, what, a day? 178 00:08:07.959 --> 00:08:08.680 Less than a day. 179 00:08:08.800 --> 00:08:11.279 You let the internet teach an AI, and it just 180 00:08:11.319 --> 00:08:15.199 becomes a toxic, racist, conspiratorial nightmare. 181 00:08:15.279 --> 00:08:18.000 That is the primary risk. Absolutely. If you just pipe 182 00:08:18.120 --> 00:08:21.879 raw X data, you know, formerly Twitter, into the model's brain, 183 00:08:22.040 --> 00:08:25.879 you get garbage, you get chaos. But xAI is keenly, 184 00:08:26.240 --> 00:08:29.720 keenly aware of this. The documentation emphasizes over and over 185 00:08:29.800 --> 00:08:32.639 that this isn't raw learning, it's high-signal learning. 186 00:08:32.720 --> 00:08:34.600 So there's a filter, a big one. 187 00:08:34.440 --> 00:08:37.039 A massive one. Think of it as curated evolution.
They 188 00:08:37.159 --> 00:08:40.440 use automated alignment checks, which are basically other AI models 189 00:08:40.480 --> 00:08:43.399 whose entire job is to grade the proposed updates to 190 00:08:43.559 --> 00:08:47.000 verify that the new information is actually factual, helpful, and 191 00:08:47.080 --> 00:08:49.759 crucially not a jailbreak or an attempt to corrupt 192 00:08:49.759 --> 00:08:50.159 the model. 193 00:08:50.360 --> 00:08:52.879 So it's less like a parrot repeating every single thing 194 00:08:52.879 --> 00:08:57.360 it hears and more like a diligent student who checks a 195 00:08:57.399 --> 00:09:01.240 new fact against a trusted textbook before accepting it as 196 00:09:01.240 --> 00:09:05.519 true. A student with a very, very strict teacher grading 197 00:09:05.559 --> 00:09:08.559 their homework before it gets committed to their permanent memory. 198 00:09:08.720 --> 00:09:12.120 They are filtering for utility and truth, trying to 199 00:09:12.120 --> 00:09:16.519 distinguish between the internet's noise and its signal. 200 00:09:16.679 --> 00:09:19.120 Okay, that makes sense, so that covers how it learns. 201 00:09:19.159 --> 00:09:22.200 It's a living system now, not a static object. But 202 00:09:22.240 --> 00:09:25.399 the thing that really seems to be dominating the technical discourse, 203 00:09:25.440 --> 00:09:28.440 the thing everyone's buzzing about, is how it thinks. 204 00:09:28.759 --> 00:09:30.879 Yes, the cognitive architecture. 205 00:09:30.919 --> 00:09:32.480 We're talking about the four agent system. 206 00:09:32.639 --> 00:09:36.120 This is the revolution. Honestly, if you take one thing 207 00:09:36.159 --> 00:09:39.159 away from our entire discussion today, let it be this. 208 00:09:39.159 --> 00:09:40.480 This is the core innovation. 209 00:09:40.679 --> 00:09:43.639 So set the scene for us. Previously, an AI like 210 00:09:43.720 --> 00:09:46.879 Grok four or GPT four was a monolith.
211 00:09:46.519 --> 00:09:50.440 Right, a monolith. You ask a question and one giant 212 00:09:50.480 --> 00:09:53.039 neural network starts predicting the next word, then the next, 213 00:09:53.080 --> 00:09:56.440 then the next, based on pure probability. It's a stream 214 00:09:56.440 --> 00:10:00.480 of consciousness, one voice. It's incredibly impressive, but it's prone 215 00:10:00.480 --> 00:10:03.519 to getting lost in its own rambling. It can hallucinate, 216 00:10:03.639 --> 00:10:07.240 it can contradict itself because there's no internal checking mechanism. 217 00:10:07.279 --> 00:10:09.240 But Grok four point two doesn't work alone. 218 00:10:09.399 --> 00:10:12.919 No. For simple things like what's the weather or tell 219 00:10:12.919 --> 00:10:16.039 me a joke, it stays simple. It uses the base model. 220 00:10:16.200 --> 00:10:19.679 But for any non-trivial query, how do I design 221 00:10:19.720 --> 00:10:24.200 a structurally sound shed, or analyze this complex legal contract 222 00:10:24.240 --> 00:10:28.200 for loopholes, Grok four point two spins up a team. 223 00:10:28.480 --> 00:10:32.279 A team. It creates an internal council of four specialized, 224 00:10:32.320 --> 00:10:33.360 independent agents. 225 00:10:33.559 --> 00:10:36.639 I want to meet the team because the documentation gives 226 00:10:36.679 --> 00:10:40.200 them these almost distinct personalities. It's fascinating. Let's go through them 227 00:10:40.200 --> 00:10:41.720 one by one. Who is Agent one? 228 00:10:41.919 --> 00:10:44.519 Agent one is the reasoner, the reasoner, the logician, the 229 00:10:44.519 --> 00:10:47.399 pure logician, the mathematician. Think of this agent as the 230 00:10:47.440 --> 00:10:50.080 Spock of the group. Its entire job is step by 231 00:10:50.080 --> 00:10:53.240 step decomposition. It doesn't care about being polite. It doesn't 232 00:10:53.240 --> 00:10:55.440 care about facts in the outside world, not at first.
233 00:10:55.720 --> 00:10:57.960 It cares only about internal consistency. 234 00:10:58.080 --> 00:10:59.639 A leads to B, B leads to C. 235 00:11:00.159 --> 00:11:01.279 Does A lead to B? 236 00:11:01.759 --> 00:11:02.000 Does 237 00:11:02.039 --> 00:11:06.200 the math check out? Is the code logically sound from top 238 00:11:06.240 --> 00:11:09.399 to bottom? It's the one that prevents those weird intuitive 239 00:11:09.480 --> 00:11:11.679 jumps where an AI just guesses the answer to a 240 00:11:11.720 --> 00:11:13.559 math problem instead of showing its work. 241 00:11:13.720 --> 00:11:15.879 So it enforces that chain of thought we hear so 242 00:11:16.000 --> 00:11:17.159 much about. Exactly. 243 00:11:17.200 --> 00:11:19.799 It's the rigorous scientist of the group. It breaks a 244 00:11:19.840 --> 00:11:24.039 complex problem down into its smallest possible components and solves 245 00:11:24.080 --> 00:11:25.720 them linearly, methodically. 246 00:11:26.039 --> 00:11:29.559 Okay, but logic isn't enough if your initial facts are wrong. 247 00:11:29.919 --> 00:11:32.320 You can have a perfectly logical argument that's based on 248 00:11:32.360 --> 00:11:33.720 a complete lie. Which 249 00:11:33.480 --> 00:11:37.240 brings us directly to Agent two, the verifier. 250 00:11:36.759 --> 00:11:40.480 The truth seeker, fact checker. The verifier is the journalist 251 00:11:40.559 --> 00:11:43.080 or the librarian in the room. It has a live, 252 00:11:43.200 --> 00:11:46.360 real-time connection to the Internet, specifically the X 253 00:11:46.399 --> 00:11:50.000 firehose for breaking news and the broader web for established knowledge. 254 00:11:50.399 --> 00:11:52.320 Its job is to look at what the reasoner is 255 00:11:52.360 --> 00:11:54.720 proposing and say, hang on a second, wait a minute, 256 00:11:54.840 --> 00:11:57.960 does that scientific paper you cited actually exist? Is that 257 00:11:58.039 --> 00:12:01.279 chemical reaction possible at room temperature?
What is the current 258 00:12:01.320 --> 00:12:04.120 stock price of that company? Not the price from six 259 00:12:04.159 --> 00:12:06.279 months ago. So it's the hallucination killer. 260 00:12:06.399 --> 00:12:08.879 It is designed to be the hallucination killer. It's the 261 00:12:09.000 --> 00:12:11.320 editor with the big red pen. It prevents the model 262 00:12:11.320 --> 00:12:15.159 from confidently lying to you. If the reasoner says, based 263 00:12:15.159 --> 00:12:18.039 on the twenty twenty five tax code, you owe X amount, 264 00:12:18.320 --> 00:12:21.320 the verifier is there to check. Wait, the tax code 265 00:12:21.360 --> 00:12:24.600 was updated in January twenty twenty six. Your premise is wrong. 266 00:12:24.799 --> 00:12:28.240 That is a crucial, crucial distinction. Okay, okay, so we 267 00:12:28.279 --> 00:12:32.600 have a logician and a fact checker, a powerful combo. 268 00:12:32.639 --> 00:12:33.559 Who's number three? 269 00:12:33.840 --> 00:12:36.039 Agent three is the embodied simulator. 270 00:12:36.200 --> 00:12:37.679 This is the one that sounds the most sci-fi 271 00:12:37.759 --> 00:12:40.519 to me. Embodied simulator? What does that even mean? 272 00:12:40.759 --> 00:12:43.679 This is the imaginative one, but it's an imagination grounded 273 00:12:43.720 --> 00:12:49.399 in physics. It understands three D space. It understands object permanence, friction, gravity. 274 00:12:49.639 --> 00:12:52.559 If you ask a question about robotics or mechanical engineering, 275 00:12:52.639 --> 00:12:55.519 or how an object might move through space, this agent 276 00:12:55.600 --> 00:12:58.679 actually runs a mental, physics-based simulation of that event. 277 00:12:58.960 --> 00:13:00.639 So if I ask it to write code for a 278 00:13:00.720 --> 00:13:03.840 robot arm to pick up a delicate glass object, Agent 279 00:13:03.840 --> 00:13:06.320 three isn't just guessing words based on other code it's seen.
280 00:13:06.679 --> 00:13:09.039 It's actually simulating the fragility of the glass. 281 00:13:09.360 --> 00:13:12.559 It's modeling the physics of the grip. It's asking how 282 00:13:12.639 --> 00:13:15.840 much pressure is too much pressure? What's the optimal trajectory 283 00:13:15.879 --> 00:13:19.600 to avoid collision? It's the bridge between the digital brain 284 00:13:19.639 --> 00:13:22.799 and the physical world. It's the engineer and the architect 285 00:13:22.440 --> 00:13:25.919 of the group. Mind blowing. Okay. And finally, Agent four, 286 00:13:26.279 --> 00:13:27.559 the one running the whole show. 287 00:13:27.759 --> 00:13:32.200 Agent four, the synthesizer, the boss, the project manager. The 288 00:13:32.200 --> 00:13:35.480 manager is the perfect term. The synthesizer doesn't generate the 289 00:13:35.519 --> 00:13:39.679 initial raw ideas. It listens. It takes the logical breakdown 290 00:13:39.679 --> 00:13:43.159 from the reasoner, the factual corrections from the verifier, and 291 00:13:43.200 --> 00:13:46.519 the physical simulations from the simulator. It looks at all their 292 00:13:46.440 --> 00:13:48.840 drafts and it must notice where they disagree. 293 00:13:48.919 --> 00:13:52.000 That's its most important job. It notices the conflicts and 294 00:13:52.039 --> 00:13:55.000 it integrates them into the final coherent answer that you, 295 00:13:55.240 --> 00:13:57.360 the user, actually see on your screen. 296 00:13:57.519 --> 00:13:59.960 This is where that concept of the debate comes in, right? 297 00:14:00.320 --> 00:14:03.840 The sources mentioned a hidden reasoning trace. Yes, exactly. Before 298 00:14:03.840 --> 00:14:08.000 I see my answer, these agents are actually arguing. Are 299 00:14:08.039 --> 00:14:08.879 they fighting it out? 300 00:14:09.039 --> 00:14:12.240 They are debating, and arguing might be the right word.
301 00:14:12.279 --> 00:14:16.440 In some cases, the reasoner might propose a solution, saying, logically, 302 00:14:16.559 --> 00:14:19.840 this is the most efficient path. The verifier might jump 303 00:14:19.840 --> 00:14:24.279 in and say, actually, current federal safety regulations prohibit that method. 304 00:14:24.120 --> 00:14:25.919 Entirely, and then the simulator chimes in. 305 00:14:26.039 --> 00:14:28.759 The simulator might add and even if it were legal, 306 00:14:28.960 --> 00:14:32.039 if you try that, the engine will overheat in thirty 307 00:14:32.120 --> 00:14:34.440 seconds because of the friction involved. 308 00:14:34.080 --> 00:14:37.720 And the synthesizer. The manager has to resolve that conflict. 309 00:14:37.840 --> 00:14:41.360 It has to reconcile those discrepancies. And this matters so 310 00:14:41.440 --> 00:14:45.200 much because in a single model, monolithic system, the AI 311 00:14:45.559 --> 00:14:48.879 just picks the most statistically likely path and commits to it. 312 00:14:48.879 --> 00:14:51.960 It often doubles down on its own errors. Right here, 313 00:14:52.039 --> 00:14:54.759 the system challenges itself before it ever speaks to you. 314 00:14:55.120 --> 00:14:57.840 It's like having a boardroom of diverse experts in your 315 00:14:57.840 --> 00:15:02.480 pocket instead of just one really smart but sometimes overconfident intern. 316 00:15:02.639 --> 00:15:05.399 That is the perfect analogy, and it explains why the 317 00:15:05.480 --> 00:15:09.639 leaked benchmarks for complex, open ended engineering problems are so high. 318 00:15:10.039 --> 00:15:12.960 Grock four point two isn't guessing on these problems. It's 319 00:15:13.000 --> 00:15:15.000 holding a committee meeting at the speed of light. 320 00:15:15.320 --> 00:15:19.440 Now, usually when you tell me committee meeting, I hear slow. 321 00:15:19.399 --> 00:15:21.360 Right bureaucracy, red tape. 
322 00:15:21.440 --> 00:15:24.279 Exactly. If I have to wait for four different AI 323 00:15:24.399 --> 00:15:26.840 agents to argue it out, am I waiting ten minutes 324 00:15:26.840 --> 00:15:31.399 for a response? Because we've all become very, very impatient users. 325 00:15:31.440 --> 00:15:32.840 We want our answers instantly. 326 00:15:33.120 --> 00:15:37.480 You would think so. It's logical: more computation, more agents, 327 00:15:37.559 --> 00:15:41.639 more steps, it should equal more time. But this is 328 00:15:41.679 --> 00:15:45.240 the engineering miracle of Grok four point two. The stats 329 00:15:45.279 --> 00:15:48.639 on speed are genuinely baffling. What we call inference 330 00:15:48.759 --> 00:15:51.759 latency is actually down by a factor of three to five. 331 00:15:51.600 --> 00:15:54.519 Down, not up. It's faster to run four agents than 332 00:15:54.559 --> 00:15:56.279 it was to run one old model. 333 00:15:56.120 --> 00:15:59.039 Much, much faster. Responses that used to take a sluggish 334 00:15:59.039 --> 00:16:01.559 eight to twelve seconds on Grok four are now taking 335 00:16:01.600 --> 00:16:02.519 one to three seconds. 336 00:16:02.559 --> 00:16:04.679 Okay, hold on, how is that physically possible? That seems 337 00:16:04.720 --> 00:16:06.960 to defy the laws of computation. 338 00:16:07.120 --> 00:16:10.759 It comes down to two main things: extreme parallelism and 339 00:16:10.799 --> 00:16:13.399 a new memory innovation they're calling Engram primitives. 340 00:16:13.519 --> 00:16:16.159 Okay, let's unpack parallelism first. That makes some sense. 341 00:16:16.240 --> 00:16:18.960 The agents don't run in a sequence. It's not reasoner, 342 00:16:19.120 --> 00:16:20.759 then verifier, then simulator. 343 00:16:20.840 --> 00:16:22.080 It's not a bucket brigade. 344 00:16:22.279 --> 00:16:26.879 No, they spin up simultaneously on that massive Colossus cluster.
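The parallel fan-out described here can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not xAI's implementation: the agent names come from the episode, while the delays, function names, and return strings are hypothetical placeholders. It shows why running agents concurrently makes wall-clock time track the slowest agent instead of the sum of all of them.

```python
import asyncio
import time

# Hypothetical per-agent delays in seconds; placeholders, not measured figures.
AGENT_DELAYS = {"reasoner": 0.05, "verifier": 0.10, "simulator": 0.15}


async def run_agent(name: str, delay: float) -> str:
    """Stand-in for one agent's work: wait, then return a draft."""
    await asyncio.sleep(delay)
    return f"{name} draft"


async def answer_in_parallel() -> tuple[list[str], float]:
    """Start every agent at once and collect the drafts for a synthesizer step."""
    start = time.perf_counter()
    drafts = await asyncio.gather(
        *(run_agent(name, delay) for name, delay in AGENT_DELAYS.items())
    )
    elapsed = time.perf_counter() - start
    return list(drafts), elapsed


def sequential_estimate() -> float:
    """Time the same work would take if the agents ran one after another."""
    return sum(AGENT_DELAYS.values())


if __name__ == "__main__":
    drafts, elapsed = asyncio.run(answer_in_parallel())
    print(drafts)
    print(f"parallel: {elapsed:.2f}s, sequential estimate: {sequential_estimate():.2f}s")
```

With these placeholder delays, the concurrent run takes roughly as long as the slowest agent, against 0.30 seconds if the three ran back to back.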
345 00:16:27.759 --> 00:16:30.240 The reasoner is doing its math at the exact same 346 00:16:30.320 --> 00:16:33.480 time the verifier is querying the web for facts. They 347 00:16:33.480 --> 00:16:35.960 work in parallel, not in a line, and then the 348 00:16:35.960 --> 00:16:37.559 synthesizer sorts out the results. 349 00:16:37.559 --> 00:16:40.559 Okay, that accounts for some of it. Yeah, but engrams? 350 00:16:41.159 --> 00:16:43.840 That sounds like something straight out of a cyberpunk novel. Yeah, 351 00:16:43.879 --> 00:16:46.159 I need to slot in a new engram for kung fu. 352 00:16:46.399 --> 00:16:48.519 Heh, yeah, it does have that ring to it. It's 353 00:16:48.519 --> 00:16:51.799 a fascinating memory innovation. The simplest way to think of 354 00:16:51.840 --> 00:16:55.039 it is like a highly advanced ZIP file for concepts. 355 00:16:55.080 --> 00:16:56.440 A ZIP file for a concept. 356 00:16:56.519 --> 00:17:00.759 Okay. Normally, for an AI to access a specific piece 357 00:17:00.759 --> 00:17:03.759 of knowledge, say the entire tax code of France, it 358 00:17:03.799 --> 00:17:07.160 has to compute that information through its entire massive neural network. 359 00:17:07.160 --> 00:17:10.240 That's billions of parameters firing. It's computationally heavy lifting. 360 00:17:10.440 --> 00:17:10.640 Right. 361 00:17:11.000 --> 00:17:16.039 Engram primitives are precomputed, compressed memory representations of large, 362 00:17:16.079 --> 00:17:19.359 stable concepts. They allow the model to recall and reason 363 00:17:19.400 --> 00:17:22.400 over vast knowledge bases without needing to activate the entire 364 00:17:22.440 --> 00:17:23.920 brain for every single query. 365 00:17:24.200 --> 00:17:26.519 So it's like, instead of having to read the whole 366 00:17:26.559 --> 00:17:29.799 textbook again every time, it has a photographic memory of 367 00:17:29.839 --> 00:17:32.640 the specific page it needs. It can just pull that instantly.
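The "compute once, recall instantly" idea in this exchange resembles ordinary result caching, which can be sketched as below. This is only an analogy under stated assumptions: it uses Python's standard functools.lru_cache, not the memory mechanism the hosts attribute to the model, and the function names and the returned string are hypothetical.

```python
from functools import lru_cache

# Counts how often the slow path runs, so the effect of caching is visible.
stats = {"slow_calls": 0}


def compute_from_scratch(concept: str) -> str:
    """Stand-in for an expensive full computation (hypothetical)."""
    stats["slow_calls"] += 1
    return f"compressed summary of {concept}"


@lru_cache(maxsize=None)
def recall(concept: str) -> str:
    """Return the stored result if present; compute it only on first use."""
    return compute_from_scratch(concept)


if __name__ == "__main__":
    recall("tax code of France")
    recall("tax code of France")  # served from the cache, no recomputation
    print(stats["slow_calls"])
```

The second lookup of the same concept skips the slow path entirely, which is the trade the hosts describe: memory spent up front in exchange for less computation per query.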
368 00:17:32.839 --> 00:17:36.599 Roughly, yes. It's a very clever shortcut for high-speed recall. 369 00:17:37.359 --> 00:17:40.599 It allows the agents to access huge amounts of domain 370 00:17:40.640 --> 00:17:45.960 specific data instantly without the computational drag. It effectively prefetches 371 00:17:46.000 --> 00:17:49.559 the context it thinks it will need for a given problem. 372 00:17:49.440 --> 00:17:51.599 And that must tie into the context window. We're seeing a 373 00:17:51.680 --> 00:17:54.000 native one million token window. 374 00:17:53.799 --> 00:17:56.680 Native one million, and it's expandable to two million for 375 00:17:56.799 --> 00:17:58.000 enterprise users. 376 00:17:57.960 --> 00:18:01.759 For anyone listening, just for scale, one million tokens is huge. 377 00:18:01.759 --> 00:18:04.319 It's the entire Lord of the Rings trilogy plus The Hobbit. 378 00:18:04.359 --> 00:18:05.440 It's a massive code base. 379 00:18:05.640 --> 00:18:08.559 It's the ability to hold an entire complex project in 380 00:18:08.599 --> 00:18:11.759 memory at once. And because of this Engram system, 381 00:18:11.799 --> 00:18:14.279 it doesn't feel like the AI is loading all that data. 382 00:18:14.319 --> 00:18:16.880 It feels native. It just knows it instantly. 383 00:18:17.400 --> 00:18:21.160 So we have speed, unbelievable speed, but speed is useless 384 00:18:21.160 --> 00:18:24.119 if you're just confidently wrong faster. Of course, we touched 385 00:18:24.160 --> 00:18:26.920 on accuracy with the verifier agent, but what do the 386 00:18:26.960 --> 00:18:28.799 actual numbers say about error rates? 387 00:18:29.119 --> 00:18:32.799 The internal evaluations, which are now being corroborated by early 388 00:18:32.960 --> 00:18:36.640 user benchmarks, are claiming a forty to sixty percent reduction 389 00:18:36.759 --> 00:18:41.039 in error rates on complex multi-step reasoning problems compared 390 00:18:41.039 --> 00:18:42.319 to Grok four point one.
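To make the scale of a "forty to sixty percent reduction" concrete, here is a short worked calculation in Python. It treats the figure as a relative reduction applied to a baseline error rate; the 20 percent baseline is a hypothetical input chosen for illustration, not a published measurement, and the function name is made up for this sketch.

```python
import math


def error_rate_after_reduction(baseline: float, relative_reduction: float) -> float:
    """Apply a relative reduction to a baseline error rate (both as fractions)."""
    if not 0.0 <= baseline <= 1.0 or not 0.0 <= relative_reduction <= 1.0:
        raise ValueError("inputs must be fractions between 0 and 1")
    return baseline * (1.0 - relative_reduction)


if __name__ == "__main__":
    baseline = 0.20  # hypothetical baseline, not a published figure
    for reduction in (0.40, 0.60):
        new_rate = error_rate_after_reduction(baseline, reduction)
        print(f"{reduction:.0%} reduction: {baseline:.0%} -> {new_rate:.0%}")
```

Under that assumption, a 20 percent error rate falls to 12 percent with a forty percent reduction and to 8 percent with a sixty percent reduction.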
391 00:18:42.400 --> 00:18:45.400 A sixty percent reduction. That's not an incremental improvement. That 392 00:18:45.480 --> 00:18:46.759 is a generational leap. 393 00:18:46.880 --> 00:18:49.720 It's the difference between a novelty and a professional tool. 394 00:18:50.599 --> 00:18:53.039 If an AI is wrong twenty percent of the time 395 00:18:53.079 --> 00:18:55.960 on hard problems, you can't really trust it with your job. 396 00:18:56.279 --> 00:18:59.279 You spend more time checking its work than doing your own. Sure, 397 00:18:59.440 --> 00:19:01.480 if it's only wrong, say one or two percent of 398 00:19:01.480 --> 00:19:03.359 the time, you can start building a company on top 399 00:19:03.400 --> 00:19:06.039 of it. The four agent system is pushing it across 400 00:19:06.079 --> 00:19:08.039 that critical reliability threshold. 401 00:19:08.119 --> 00:19:10.400