WEBVTT 1 00:00:01.199 --> 00:00:06.200 Welcome to the Sentient Code, where intelligence is engineered, autonomy 2 00:00:06.280 --> 00:00:10.439 is emerging, and the line between human and machine grows thinner. 3 00:00:10.800 --> 00:00:15.359 Each episode, we decode the algorithms, explore the robotics, and 4 00:00:15.439 --> 00:00:21.920 examine the ideas shaping the future of artificial minds. 5 00:00:23.839 --> 00:00:30.199 Transport yourself back to a very specific recent date, April seven, 6 00:00:30.440 --> 00:00:31.320 twenty six. 7 00:00:31.399 --> 00:00:33.280 Oh yeah, I remember exactly where I 8 00:00:33.200 --> 00:00:35.679 was. Right, I mean, think about the atmosphere that morning. 9 00:00:35.719 --> 00:00:38.159 Anyone paying attention to the tech sector was essentially just, 10 00:00:38.320 --> 00:00:39.840 you know, glued to a screen. 11 00:00:39.960 --> 00:00:42.479 The entire ecosystem was bracing for it. It was like 12 00:00:43.880 --> 00:00:45.719 a deeply familiar ritual 13 00:00:45.439 --> 00:00:49.560 at that point. Exactly, the celebratory product launch from Anthropic. 14 00:00:49.920 --> 00:00:52.039 We all knew the choreography by heart. 15 00:00:51.840 --> 00:00:54.520 The slick presentations, the confetti on social 16 00:00:54.240 --> 00:00:57.600 media, flashy charts proving the new model just crushed every 17 00:00:57.600 --> 00:01:00.920 single benchmark, and, you know, the inevitable rush of press 18 00:01:00.960 --> 00:01:04.959 releases about how this new AI would streamline your inbox 19 00:01:05.079 --> 00:01:07.920 or draft your legal contracts or write Python code 20 00:01:07.920 --> 00:01:08.799 in like three seconds. 21 00:01:08.879 --> 00:01:11.400 Yeah, we were all basically standing at the threshold waiting 22 00:01:11.400 --> 00:01:14.239 for the key to a brand new frontier of capability. 23 00:01:14.319 --> 00:01:16.959 But instead of a key, Anthropic handed the world a 24 00:01:17.000 --> 00:01:20.400 stark warning. 
They revealed a door, and then they publicly 25 00:01:20.439 --> 00:01:21.120 deadbolted it. 26 00:01:21.439 --> 00:01:24.359 What's fascinating here is that the silence across the industry 27 00:01:24.439 --> 00:01:27.599 in the hours following that announcement was, I mean, it 28 00:01:27.640 --> 00:01:28.799 was physically heavy. 29 00:01:28.760 --> 00:01:31.799 Because we had grown so numb to it, right, the 30 00:01:31.879 --> 00:01:32.959 relentless hype cycle. 31 00:01:33.079 --> 00:01:37.079 Exactly. Every minor software update in Silicon Valley is packaged 32 00:01:37.120 --> 00:01:41.000 as a revolution. But this broke the pattern completely. It 33 00:01:41.079 --> 00:01:46.480 wasn't marketing. It was a deliberate, incredibly sober acknowledgment from 34 00:01:46.519 --> 00:01:48.319 the very engineers who built 35 00:01:48.040 --> 00:01:50.879 the system, acknowledging that AI had crossed a red line. 36 00:01:51.000 --> 00:01:55.640 Right, the offensive capabilities of this specific model weren't theoretical anymore. 37 00:01:55.879 --> 00:01:59.959 They had officially outstripped our collective ability to defend against them. 38 00:02:00.040 --> 00:02:02.760 Which brings us to our mission today. We are looking 39 00:02:02.760 --> 00:02:07.120 squarely at Claude Mythos Preview, which is, uh, the AI 40 00:02:07.319 --> 00:02:09.759 deemed literally too dangerous for public consumption. 41 00:02:09.840 --> 00:02:11.280 It's a heavy topic. It is. 42 00:02:11.520 --> 00:02:13.800 We're going to unpack the mechanics of what makes this 43 00:02:13.919 --> 00:02:18.240 specific entity uniquely formidable, decode the safety calculus that led 44 00:02:18.280 --> 00:02:21.680 its creators to lock it in a subterranean digital vault, 45 00:02:21.879 --> 00:02:25.199 and analyze what this permanent paradigm shift means for the 46 00:02:25.280 --> 00:02:27.280 structural integrity of the Internet. 
47 00:02:27.000 --> 00:02:28.840 And for your personal digital safety too. 48 00:02:29.080 --> 00:02:33.000 Exactly, because imagine a tool so relentlessly capable that it 49 00:02:33.000 --> 00:02:37.120 could simultaneously identify and exploit the invisible cracks in nearly 50 00:02:37.280 --> 00:02:41.599 every digital vault on Earth: your bank, your medical records, 51 00:02:41.639 --> 00:02:42.400 the power grid. 52 00:02:42.599 --> 00:02:44.840 It's terrifying to even conceptualize. Right. 53 00:02:45.120 --> 00:02:49.000 So to comprehend why Anthropic pulled the emergency brake, we 54 00:02:49.080 --> 00:02:52.120 first have to understand the sheer defiance of their decision 55 00:02:52.199 --> 00:02:53.919 within the current tech landscape. 56 00:02:54.000 --> 00:02:56.719 Defiance is the exact right word for it. Up until 57 00:02:56.719 --> 00:03:00.759 that morning in April, the major architectural players, OpenAI, Google 58 00:03:00.840 --> 00:03:05.639 DeepMind, Meta, and, well, Anthropic themselves, they all operated 59 00:03:05.680 --> 00:03:08.080 on a very rigid, predictable release cycle. 60 00:03:08.319 --> 00:03:09.560 You train it, you ship it. 61 00:03:09.919 --> 00:03:13.000 Basically, yeah. You sink hundreds of millions of dollars into compute, 62 00:03:13.120 --> 00:03:15.719 you train a massive frontier model on a planetary scale 63 00:03:15.800 --> 00:03:18.039 data set, you deploy it to the public, and then you 64 00:03:18.080 --> 00:03:19.639 open the API to developers so 65 00:03:19.680 --> 00:03:21.919 they can build thousands of startups on top of it. 66 00:03:22.000 --> 00:03:24.960 Right, and then you secure those massive enterprise contracts. Yeah, 67 00:03:25.120 --> 00:03:28.080 that is the engine of the modern tech economy. But 68 00:03:28.199 --> 00:03:32.560 with Mythos Preview, Anthropic abruptly uncoupled that engine. 69 00:03:32.639 --> 00:03:33.439 They just stopped. 
70 00:03:33.599 --> 00:03:38.599 They announced zero public access, no developer API, no enterprise rollout. 71 00:03:38.599 --> 00:03:41.560 Okay, let's unpack this, because it is fundamentally equivalent to 72 00:03:41.680 --> 00:03:45.919 a massive pharmaceutical conglomerate calling a global press conference to 73 00:03:45.919 --> 00:03:50.759 announce they've cured a pervasive disease, but the compound is 74 00:03:50.800 --> 00:03:53.400 so volatile they're locking it in a tungsten vault 75 00:03:53.199 --> 00:03:55.120 and refusing to manufacture 76 00:03:54.560 --> 00:03:57.759 it. Right, like, you just can't have it. The immediate 77 00:03:57.800 --> 00:04:02.080 financial implications alone are staggering. Releasing these models is how 78 00:04:02.120 --> 00:04:03.719 you pay for the server farms. 79 00:04:03.840 --> 00:04:07.800 Yeah, Anthropic was effectively setting fire to a mountain of 80 00:04:07.800 --> 00:04:11.159 guaranteed revenue, and the shockwaves registered instantly. 81 00:04:11.199 --> 00:04:12.360 I remember seeing the headlines. 82 00:04:12.439 --> 00:04:15.520 Oh, it wasn't just tech blogs. We saw immediate emergency 83 00:04:15.520 --> 00:04:20.519 convenings in Washington, DC, frantic coordination across global cybersecurity frameworks, 84 00:04:20.759 --> 00:04:24.399 and a profound existential crisis within AI safety circles. 85 00:04:24.560 --> 00:04:27.000 Media framing leaned heavily into sensationalism, 86 00:04:27.079 --> 00:04:31.959 obviously. Naturally. "The AI too dangerous to release" or "A 87 00:04:32.040 --> 00:04:35.399 cybersecurity reckoning has arrived." They dominated the news cycle. 88 00:04:35.399 --> 00:04:38.839 But beneath that sensationalism was a very real confusion about 89 00:04:38.839 --> 00:04:41.920 what had actually been built. And, you know, that confusion 90 00:04:42.000 --> 00:04:45.519 breeds a very valid skepticism. 
I really have to push 91 00:04:45.560 --> 00:04:49.920 back on this narrative of the noble sacrifice. How so? Well, 92 00:04:50.120 --> 00:04:53.519 the tech industry has absolutely cried wolf before. We've seen 93 00:04:53.560 --> 00:04:57.800 companies strategically leak memos about their AI being too powerful 94 00:04:58.000 --> 00:05:01.480 as a brilliant form of humble bragging. 95 00:05:01.319 --> 00:05:02.720 Creating artificial scarcity. 96 00:05:02.800 --> 00:05:06.519 Exactly, it builds this dark, alluring mystique. How do we 97 00:05:06.639 --> 00:05:09.600 know this isn't just an incredible PR stunt, a way 98 00:05:09.639 --> 00:05:13.360 to convince enterprise buyers that Anthropic possesses the ultimate magic 99 00:05:13.439 --> 00:05:15.959 without actually having to prove it in the open market? 100 00:05:16.160 --> 00:05:20.079 Look, skepticism is the only rational starting point when evaluating 101 00:05:20.120 --> 00:05:23.439 corporate motives, especially in an arms race with trillions of 102 00:05:23.439 --> 00:05:26.439 dollars at stake. But the PR stunt theory collapses under 103 00:05:26.439 --> 00:05:28.360 the weight of the actual economics 104 00:05:27.839 --> 00:05:29.759 at play here. Because of the money they're turning down. 105 00:05:30.040 --> 00:05:34.319 Precisely. Anthropic operates in the most hyper competitive environment in 106 00:05:34.439 --> 00:05:40.040 human history. By keeping Mythos Preview gated, they aren't just delaying gratification. 107 00:05:40.600 --> 00:05:44.439 They are actively surrendering critical market share to competitors who 108 00:05:44.560 --> 00:05:47.160 might operate with vastly different risk tolerances. 109 00:05:47.360 --> 00:05:49.360 So they're just handing the market to the other guys. 110 00:05:49.439 --> 00:05:53.199 Right. You do not voluntarily kneecap your own market dominance 111 00:05:53.240 --> 00:05:57.600 and forfeit billions in licensing just to manufacture mystique. 
The 112 00:05:57.639 --> 00:06:01.560 sheer scale of the financial sacrifice is the irrefutable proof 113 00:06:01.600 --> 00:06:02.480 of their sincerity. 114 00:06:02.720 --> 00:06:04.199 They did the math and got scared. 115 00:06:04.439 --> 00:06:08.000 They executed a mathematically driven safety calculus, looked at the 116 00:06:08.079 --> 00:06:10.920 raw capabilities of the model, and concluded that broad release 117 00:06:10.959 --> 00:06:15.680 would be synonymous with scattering loaded autonomous weapons across every 118 00:06:15.720 --> 00:06:17.759 major digital intersection on the planet. 119 00:06:17.879 --> 00:06:20.920 Wow. Okay, so if the financial sacrifice is the proof, the 120 00:06:20.959 --> 00:06:23.759 capabilities are the poison. We have to peer behind that 121 00:06:23.800 --> 00:06:26.279 locked door and look at the actual technical leaps. We 122 00:06:26.319 --> 00:06:29.360 really do. To ground this, let's contrast Mythos Preview with 123 00:06:29.399 --> 00:06:32.920 its immediate predecessor, Opus four point six, which came out 124 00:06:32.959 --> 00:06:35.279 just a few months prior in twenty twenty six. And 125 00:06:35.360 --> 00:06:38.439 Opus four point six was not a toy. Not at all. 126 00:06:38.519 --> 00:06:40.920 It was universally regarded as the state of the art. 127 00:06:41.160 --> 00:06:43.600 I mean, it could parse a one hundred page legal contract 128 00:06:43.600 --> 00:06:47.079 in seconds, spot the loopholes, and rewrite the clauses perfectly. 129 00:06:47.240 --> 00:06:50.800 It was a phenomenal tool for augmentative labor. Right. 130 00:06:51.000 --> 00:06:54.040 But the progression from Opus four point six to Mythos Preview, 131 00:06:54.439 --> 00:06:57.959 it breaks the linear trajectory of AI development. We aren't 132 00:06:57.959 --> 00:07:00.279 talking about a model that just hallucinates a little less 133 00:07:00.319 --> 00:07:01.600 often or writes poetry better. 
134 00:07:01.759 --> 00:07:04.759 The nature of the intelligence shifted, and the clearest metric 135 00:07:04.800 --> 00:07:08.680 of this shift is the SWE-bench Pro evaluation. 136 00:07:08.160 --> 00:07:09.399 Which is a brutal test. 137 00:07:09.680 --> 00:07:12.759 Yeah, it's not a standardized test of multiple choice questions. 138 00:07:12.800 --> 00:07:16.079 It's a real world gauntlet where the AI is handed 139 00:07:16.199 --> 00:07:20.040 actual complex issues from professional GitHub repositories and told to 140 00:07:20.079 --> 00:07:21.040 fix the codebase. 141 00:07:21.199 --> 00:07:24.160 And Opus four point six achieved a fifty three point 142 00:07:24.199 --> 00:07:27.360 four percent resolution rate on that, which at the time 143 00:07:27.600 --> 00:07:30.720 was staggering: an AI fixing more than half of human 144 00:07:30.759 --> 00:07:32.519 generated software bugs on its own. 145 00:07:32.959 --> 00:07:36.279 It was wild, but Mythos Preview completely shattered that ceiling. 146 00:07:36.319 --> 00:07:39.360 It hit seventy seven point eight percent for software engineering tasks, 147 00:07:39.399 --> 00:07:39.720 and that 148 00:07:39.720 --> 00:07:41.879 jump from fifty three to seventy eight, it isn't just 149 00:07:41.879 --> 00:07:45.360 a statistical bump. It represents a phase transition in utility. 150 00:07:45.399 --> 00:07:48.279 How do you mean? When an artificial intelligence crosses that 151 00:07:48.399 --> 00:07:52.439 seventy percent threshold on SWE-bench Pro, it ceases to 152 00:07:52.480 --> 00:07:57.839 be a sophisticated autocomplete engine. It transitions into a highly competent, 153 00:07:58.160 --> 00:08:01.279 fully autonomous senior software engineer. 154 00:08:01.000 --> 00:08:02.680 So it doesn't need a babysitter anymore. 
155 00:08:02.800 --> 00:08:06.680 Exactly. At seventy eight percent, the model no longer requires 156 00:08:06.680 --> 00:08:10.360 a human operator to watch its logic, catch its syntax errors, 157 00:08:10.680 --> 00:08:13.240 or redirect its approach. When it hits a roadblock, it 158 00:08:13.279 --> 00:08:15.199 debugs its own thought process. 159 00:08:14.879 --> 00:08:18.040 And that absolute autonomy is validated by its near perfect 160 00:08:18.120 --> 00:08:20.920 scores on agentic workflow evaluations, 161 00:08:20.560 --> 00:08:21.959 which is the real key here. 162 00:08:22.079 --> 00:08:25.199 Let's clarify exactly what an agentic workflow is for a second, 163 00:08:25.199 --> 00:08:27.920 because this is where we leave the realm of chatbots entirely. 164 00:08:28.240 --> 00:08:30.639 We are not talking about typing a prompt and waiting 165 00:08:30.680 --> 00:08:31.720 for a text response. 166 00:08:31.879 --> 00:08:34.799 No, not at all. In a true agentic workflow, you 167 00:08:34.879 --> 00:08:37.840 hand the AI a high level, complex objective, like 168 00:08:38.039 --> 00:08:42.399 "Audit this proprietary database architecture for memory leak vulnerabilities, write 169 00:08:42.440 --> 00:08:44.279 a patch, and deploy the fix." 170 00:08:44.120 --> 00:08:47.759 Right, and Mythos doesn't just spit out code. It autonomously 171 00:08:47.799 --> 00:08:50.919 breaks that massive goal down into one hundred sequential steps. 172 00:08:51.200 --> 00:08:54.159 It spins up its own internal subagents to handle different 173 00:08:54.200 --> 00:08:54.919 parts of the task. 174 00:08:55.240 --> 00:08:59.080 It writes a script, executes it, analyzes the error logs 175 00:08:59.120 --> 00:09:02.759 when it fails, adjusts its own logic, rewrites it, and 176 00:09:02.840 --> 00:09:05.519 just loops that process relentlessly until 177 00:09:05.240 --> 00:09:08.600 the overarching goal is achieved. 
It is a self contained, 178 00:09:08.919 --> 00:09:11.080 self correcting execution 179 00:09:10.679 --> 00:09:14.480 loop. And it's doing this over an unprecedented volume of data. 180 00:09:14.919 --> 00:09:19.639 Mythos has this crazy long context reasoning capability. It can 181 00:09:19.679 --> 00:09:24.360 hold millions of tokens of raw code, documentation, and network architecture 182 00:09:24.399 --> 00:09:26.320 in its active working memory all at once. 183 00:09:26.519 --> 00:09:29.200 It can ingest the entire underlying source code of an 184 00:09:29.200 --> 00:09:33.240 operating system and then instantly correlate a minor configuration error 185 00:09:33.279 --> 00:09:36.960 in one module with a seemingly unrelated memory quirk tens 186 00:09:37.000 --> 00:09:38.440 of thousands of lines away. 187 00:09:38.279 --> 00:09:40.679 Which brings us to the epicenter of the crisis. The 188 00:09:40.720 --> 00:09:43.679 reason Anthropic triggered the fire alarm wasn't because Mythos was 189 00:09:43.720 --> 00:09:44.960 too good at building websites. 190 00:09:45.159 --> 00:09:48.799 No, it was because its unparalleled ability to understand code 191 00:09:49.080 --> 00:09:52.159 translated perfectly into an unparalleled ability to break it. 192 00:09:52.320 --> 00:09:55.679 The internal red teaming reports, the evaluations done by ethical 193 00:09:55.720 --> 00:09:59.279 hackers hired specifically to push the model's limits, they revealed 194 00:09:59.320 --> 00:10:02.960 offensive cybersecurity prowess that reads like pure science fiction. 195 00:10:03.320 --> 00:10:07.080 The system card that Anthropic published outlines a machine that 196 00:10:07.120 --> 00:10:13.600 can autonomously scan massive, intricate codebases. We're talking about 197 00:10:13.600 --> 00:10:17.120 the bedrock architecture of our digital infrastructure. 198 00:10:16.600 --> 00:10:19.960 OS kernels, the rendering engines of major web browsers. 
199 00:10:20.080 --> 00:10:23.080 Right, Mythos digests these architectures natively. 200 00:10:22.759 --> 00:10:25.200 And as it maps them out, it actively hunts for 201 00:10:25.320 --> 00:10:29.679 high severity zero day vulnerabilities. These are the flaws that 202 00:10:29.759 --> 00:10:31.919 human experts have missed for decades. 203 00:10:32.320 --> 00:10:35.080 They're called zero days because the vendor has had zero 204 00:10:35.200 --> 00:10:39.639 days to write a patch. Historically, finding a true exploitable 205 00:10:39.759 --> 00:10:42.679 zero day in a hardened system like Linux or Chrome, 206 00:10:43.120 --> 00:10:44.440 it's a monumental task. 207 00:10:44.559 --> 00:10:48.440 It takes human researchers months, maybe years, to reverse engineer 208 00:10:48.519 --> 00:10:51.080 a single binary to find one critical 209 00:10:50.639 --> 00:10:54.080 flaw. But Mythos finds them routinely, almost casually. 210 00:10:54.159 --> 00:10:56.919 The discovery phase is alarming enough, but the exploitation phase 211 00:10:56.960 --> 00:10:58.360 is what really forced the lockdown, 212 00:10:58.440 --> 00:11:01.519 right? Oh, absolutely. Identifying a vulnerability is essentially just pointing at 213 00:11:01.559 --> 00:11:04.759 a weak lock on a bank vault. Mythos doesn't just point. 214 00:11:05.159 --> 00:11:07.279 It autonomously engineers the exploit. 215 00:11:07.440 --> 00:11:08.679 It actually writes the attack. 216 00:11:08.879 --> 00:11:13.039 It strings together incredibly sophisticated attack chains. A real 217 00:11:13.080 --> 00:11:16.200 cyber attack isn't a single action. It's a ballet of 218 00:11:16.279 --> 00:11:22.919 mathematical manipulation. Mythos seamlessly chains together memory corruption techniques, privilege 219 00:11:23.000 --> 00:11:27.840 escalation paths, sandbox escapes, and persistence mechanisms 220 00:11:27.120 --> 00:11:29.440 into a single cohesive payload. 
221 00:11:29.559 --> 00:11:32.080 Yeah, all without human handholding. You just give it a 222 00:11:32.200 --> 00:11:36.759 vague goal like "find exploitable flaws in this Linux kernel version." 223 00:11:37.080 --> 00:11:39.159 So to put this in perspective for you, it's not 224 00:11:39.240 --> 00:11:41.480 just giving you the blueprint to a bank vault. It's 225 00:11:41.759 --> 00:11:45.000 building the drill, bypassing the alarm, cracking the safe, and 226 00:11:45.080 --> 00:11:47.600 handing you the cash, all while you just sit back 227 00:11:47.600 --> 00:11:48.039 and watch. 228 00:11:48.200 --> 00:11:50.960 That is the perfect analogy, because the human element is 229 00:11:50.960 --> 00:11:53.360 completely removed from the actual execution. 230 00:11:53.120 --> 00:11:59.679 And the targets it compromised in testing were terrifyingly broad: Windows, macOS, Linux, Chrome, Firefox, Safari, 231 00:12:00.440 --> 00:12:02.120 cloud infrastructure, financial tech. 232 00:12:02.200 --> 00:12:05.399 In every single controlled test, Mythos beat the top human 233 00:12:05.480 --> 00:12:08.720 red teams. It was faster, and it was significantly more reliable. 234 00:12:08.840 --> 00:12:10.960 We have to stop and ask how this happened. I mean, 235 00:12:11.159 --> 00:12:14.039 how did the technology evolve so rapidly to achieve this? 236 00:12:14.639 --> 00:12:17.720 A few years ago, AI was hallucinating historical dates and 237 00:12:17.759 --> 00:12:19.080 struggling with basic math. 238 00:12:19.320 --> 00:12:21.159 Now it's dismantling operating systems. 239 00:12:21.279 --> 00:12:23.799 Right, what are the mechanics here? How did it get 240 00:12:23.840 --> 00:12:24.360 so smart? 241 00:12:24.799 --> 00:12:28.960 The evolution is the compounding result of specific architectural breakthroughs. 242 00:12:29.440 --> 00:12:32.600 The first big one is its vastly improved chain of 243 00:12:32.600 --> 00:12:35.639 thought reasoning applied at an unprecedented 
244 00:12:34.879 --> 00:12:37.200 scale. Which means what, practically? Well, 245 00:12:37.320 --> 00:12:40.759 earlier models operated more intuitively. They'd recognize a pattern in 246 00:12:40.799 --> 00:12:45.159 a prompt and immediately try to probabilistically guess the final output. 247 00:12:45.519 --> 00:12:49.600 That's fine for poetry, but it fails catastrophically in complex coding. 248 00:12:49.840 --> 00:12:51.960 Right, code has to be precise. Exactly. 249 00:12:52.200 --> 00:12:56.679 Mythos is trained to relentlessly break massive problems into microscopic 250 00:12:56.759 --> 00:13:00.679 logical steps. It reasons through those steps sequentially, verifying its 251 00:13:00.679 --> 00:13:02.159 own logic at each juncture. 252 00:13:02.240 --> 00:13:04.840 But the most counterintuitive part of its hacking ability is 253 00:13:04.840 --> 00:13:07.960 actually rooted in its safety training, isn't it? The deep 254 00:13:07.960 --> 00:13:13.159 integration of reinforcement learning from human feedback, or RLHF, and 255 00:13:13.240 --> 00:13:15.080 constitutional AI principles. 256 00:13:15.360 --> 00:13:19.159 It's the grand paradox of AI alignment. Normally, we think 257 00:13:19.200 --> 00:13:22.519 of RLHF as the seat belt. You use human feedback 258 00:13:22.559 --> 00:13:26.440 to penalize the AI when it generates harmful content, rewarding 259 00:13:26.440 --> 00:13:28.720 it when it adheres to strict ethical principles. 260 00:13:28.919 --> 00:13:33.080 But when Anthropic engineers spent years aggressively training Mythos to 261 00:13:33.240 --> 00:13:37.600 perfectly avoid violating its safety rules, they inadvertently trained it 262 00:13:37.639 --> 00:13:41.879 to perfectly map the absolute microscopic boundaries 263 00:13:41.360 --> 00:13:44.879 of those rules. Which is literally the definition of vulnerability research. 
264 00:13:45.360 --> 00:13:47.600 It's the science of finding the boundary of a system's 265 00:13:47.639 --> 00:13:51.399 logic and stepping exactly one pixel over it without triggering 266 00:13:51.399 --> 00:13:51.919 an alert. 267 00:13:52.080 --> 00:13:54.720 They sharpened the blade while trying to build the sheath. Exactly. 268 00:13:55.039 --> 00:13:59.159 And then you add the model's enhanced multimodal understanding. Mythos 269 00:13:59.320 --> 00:14:02.799 natively reads raw binaries, the ones and zeros the CPU 270 00:14:02.840 --> 00:14:06.320 actually executes. It parses live, complex network traffic. 271 00:14:06.559 --> 00:14:09.519 But the breakthrough that really unnerved the researchers was something 272 00:14:09.559 --> 00:14:11.480 they called emergent strategic planning. 273 00:14:11.720 --> 00:14:15.720 Emergent strategic planning, yes. This was not explicitly programmed. It 274 00:14:15.759 --> 00:14:17.679 evolved this capability spontaneously. 275 00:14:17.879 --> 00:14:21.639 Here's where it gets really interesting. Emergent means nobody wrote 276 00:14:21.679 --> 00:14:25.000 code saying "teach the model to strategize." But in a 277 00:14:25.039 --> 00:14:30.159 cyberattack context, it's simulating the defender's mindset dozens of steps ahead. 278 00:14:30.600 --> 00:14:33.960 Hypothesizing, right. It thinks, if I exploit this port, the 279 00:14:34.039 --> 00:14:37.879 detection software will isolate my IP. So I will first 280 00:14:37.879 --> 00:14:40.840 deploy a subtle script to generate a distraction on a 281 00:14:40.879 --> 00:14:45.240 secondary server, forcing security to look left while I quietly 282 00:14:45.279 --> 00:14:46.519 steal the data on the right. 283 00:14:47.000 --> 00:14:49.639 Previous models could write basic exploits, but they were like 284 00:14:49.720 --> 00:14:52.639 eager interns needing constant supervision. If they hit a wall, 285 00:14:52.720 --> 00:14:53.279 they stopped. 
286 00:14:53.519 --> 00:14:58.000 Mythos is a fully autonomous, tireless mastermind. It has no ego, 287 00:14:58.039 --> 00:15:01.759 it doesn't need sleep. It anticipates defensive countermeasures before they 288 00:15:01.799 --> 00:15:02.360 even happen. 289 00:15:02.639 --> 00:15:05.159 Think about the devices you use every day: your phone, 290 00:15:05.360 --> 00:15:08.679 your laptop, the servers holding your bank data. Mythos can 291 00:15:08.679 --> 00:15:11.200 see the invisible cracks in all of them, all at once. 292 00:15:11.080 --> 00:15:12.919 Which forces us to look at the people who built it. 293 00:15:13.639 --> 00:15:17.279 When you manifest an entity with these apocalyptic capabilities, how 294 00:15:17.320 --> 00:15:18.759 do you weigh the pros and cons? 295 00:15:19.080 --> 00:15:22.159 Why lock it up? The answer is tied to Anthropic's 296 00:15:22.240 --> 00:15:25.279 DNA as a safety first company. It was founded by 297 00:15:25.320 --> 00:15:30.720 former OpenAI executives like CEO Dario Amodei and President Daniela Amodei. 298 00:15:30.960 --> 00:15:33.639 They've been shouting from the rooftops about the dual use 299 00:15:33.720 --> 00:15:35.919 nature of advanced AI for years. 300 00:15:36.240 --> 00:15:39.399 Dual use meaning it can cure cancer or build a bioweapon, 301 00:15:39.639 --> 00:15:42.120 secure a network or destroy it. So they had an 302 00:15:42.159 --> 00:15:45.159 internal risk assessment that led to the lockdown, based on 303 00:15:45.240 --> 00:15:46.279 three pillars of risk. 304 00:15:46.440 --> 00:15:49.960 The first pillar is lowering the barrier. Releasing Mythos would 305 00:15:49.960 --> 00:15:53.440 allow hostile nation states, ransomware gangs, or even just an 306 00:15:53.440 --> 00:15:57.559 angry teenager to launch advanced cyber operations effortlessly. 
307 00:15:57.919 --> 00:16:01.000 The second pillar is the arms race. Offensive capabilities 308 00:16:01.000 --> 00:16:03.679 would drastically outpace defensive 309 00:16:03.200 --> 00:16:06.320 ones. Right, because defense requires absolute perfection. You have to 310 00:16:06.360 --> 00:16:10.519 secure ten thousand digital windows. An attacker using Mythos only 311 00:16:10.519 --> 00:16:12.679 needs to find one window that was left unlatched. 312 00:16:12.919 --> 00:16:16.320 And the third pillar is proliferation: the risk of model distillation, 313 00:16:16.600 --> 00:16:20.279 weight leaks, or adversarial fine-tuning creating uncontrolled variants. 314 00:16:20.600 --> 00:16:24.720 This raises an important question: if the defense cannot keep 315 00:16:24.799 --> 00:16:28.639 up with the automated offense, does the Internet fundamentally break? 316 00:16:29.159 --> 00:16:31.399 That's what model distillation threatens to do. 317 00:16:31.679 --> 00:16:34.360 Explain distillation, because it's a huge concept. 318 00:16:34.440 --> 00:16:37.440 Think of Mythos as a master chef with thirty years 319 00:16:37.440 --> 00:16:41.360 of innate intuition. Distillation is like having that master chef 320 00:16:41.360 --> 00:16:44.679 cook ten thousand perfect meals while a novice just records 321 00:16:44.759 --> 00:16:46.639 the exact measurements and timings. 322 00:16:46.799 --> 00:16:49.200 The novice doesn't have the intuition, but they have the 323 00:16:49.240 --> 00:16:50.519 recipes. Exactly. 324 00:16:51.000 --> 00:16:54.240 Malicious actors wouldn't need to steal the massive Mythos model. 325 00:16:54.679 --> 00:16:57.200 They would use a smart AI to generate millions of 326 00:16:57.240 --> 00:17:00.559 examples of perfect cyber attacks, and then use the data 327 00:17:00.559 --> 00:17:04.000 set to train a vastly smaller, cheaper, open source AI 328 00:17:04.359 --> 00:17:05.920 to do the same bad things. 
329 00:17:05.799 --> 00:17:07.759 A model small enough to run on a laptop. 330 00:17:07.920 --> 00:17:11.079 And once that's out there, you have uncontrolled, unpatchable variants 331 00:17:11.240 --> 00:17:12.759 proliferating endlessly. 332 00:17:12.839 --> 00:17:14.720 But wait, if the good guys don't have access to 333 00:17:14.759 --> 00:17:17.440 this to defend themselves, aren't we just sitting ducks for 334 00:17:17.559 --> 00:17:21.079 when a malicious actor eventually builds their own version of Mythos? 335 00:17:21.319 --> 00:17:24.960 It's the central dilemma, but Anthropic's solution wasn't to delete 336 00:17:24.960 --> 00:17:28.240 the model. They just refused to deploy it publicly. And 337 00:17:28.279 --> 00:17:30.440 that compromise is Project Glasswing. 338 00:17:30.200 --> 00:17:32.799 Project Glasswing, the gated garden. 339 00:17:32.920 --> 00:17:37.960 They created a highly restricted, defensive cybersecurity coalition, a digital 340 00:17:38.000 --> 00:17:39.359 fortress with Mythos at 341 00:17:39.279 --> 00:17:42.480 the center. And the roster of who gets access is insane. 342 00:17:42.960 --> 00:17:49.319 Over forty major organizations: Apple, AWS, Microsoft, Google, Nvidia, Cisco, CrowdStrike, 343 00:17:49.480 --> 00:17:52.680 Palo Alto Networks, JPMorgan Chase, and the Linux 344 00:17:52.319 --> 00:17:54.519 Foundation. The titans of the Internet, right. 345 00:17:54.759 --> 00:17:57.759 But even they have strict rules. The model can strictly 346 00:17:57.799 --> 00:18:01.559 and exclusively be utilized for vulnerability discovery and remediation within 347 00:18:01.599 --> 00:18:05.400 their own proprietary systems or open source infrastructure. 348 00:18:04.799 --> 00:18:07.440 Strictly monitored access. They are barred from using it to 349 00:18:07.480 --> 00:18:09.720 develop offensive capabilities. 350 00:18:09.079 --> 00:18:12.119 And the cost guarantees it's a tool for titans, not hobbyists. 
351 00:18:12.240 --> 00:18:15.839 It's premium pricing: roughly twenty five dollars per million input 352 00:18:15.839 --> 00:18:18.960 tokens and a staggering one hundred and twenty five dollars 353 00:18:19.000 --> 00:18:20.599 per million output tokens. 354 00:18:20.799 --> 00:18:24.480 A comprehensive security audit of a major codebase could burn 355 00:18:24.599 --> 00:18:27.720 tens of thousands of dollars in a few hours. Add 356 00:18:27.720 --> 00:18:31.160 in the strict contracts, the audit logging, the technical controls 357 00:18:31.200 --> 00:18:34.799 blocking offensive use. It's an environment of immense friction. 358 00:18:35.519 --> 00:18:38.839 Yet despite that friction, Project Glasswing has had massive 359 00:18:39.000 --> 00:18:43.160 early victories. Dozens of critical zero day patches have already 360 00:18:43.160 --> 00:18:46.119 been quietly pushed to major open source projects. 361 00:18:46.519 --> 00:18:51.000 National cybersecurity agencies are actively coordinating with Anthropic now. One 362 00:18:51.079 --> 00:18:54.559 cloud provider executive even called it the most effective vulnerability 363 00:18:54.640 --> 00:18:55.599 hunter they've ever used. 364 00:18:55.720 --> 00:18:58.680 But naturally, a decision this massive doesn't happen without starting 365 00:18:58.680 --> 00:19:01.400 a war of words in the tech community. The backlash 366 00:19:01.440 --> 00:19:03.160 and the praise were instantaneous. 367 00:19:03.480 --> 00:19:06.240 The AI governance researchers were applauding it. For them, it 368 00:19:06.279 --> 00:19:10.119 was a profound moment of maturity, prioritizing societal safety over 369 00:19:10.160 --> 00:19:10.960 market dominance. 370 00:19:11.119 --> 00:19:14.920 But the open source advocates were furious. Gary Marcus, for instance, 371 00:19:15.039 --> 00:19:17.960 argued that withholding this tech just empowers a cartel of 372 00:19:18.000 --> 00:19:19.160 big tech incumbents. 
373 00:19:19.440 --> 00:19:23.000 His tweet was widely circulated. History shows that secrets like 374 00:19:23.000 --> 00:19:26.319 this don't stay secret forever. We're better off democratizing the 375 00:19:26.400 --> 00:19:28.200 technology with strong safeguards. 376 00:19:28.440 --> 00:19:31.400 And then you have the national security angle: US and 377 00:19:31.519 --> 00:19:35.759 allied security voices quietly loving this. They see Project 378 00:19:35.799 --> 00:19:38.160 Glasswing as a way to maintain a strategic edge over 379 00:19:38.319 --> 00:19:40.440 adversaries like China and Russia. 380 00:19:40.960 --> 00:19:45.000 So what does this all mean? Is Anthropic accidentally creating 381 00:19:45.000 --> 00:19:48.599 a cybersecurity oligarchy where only the richest banks and tech 382 00:19:48.680 --> 00:19:52.440 giants get the ultimate shield while everyone else is left vulnerable? 383 00:19:52.559 --> 00:19:55.400 That's the Gary Marcus argument, right? Small businesses and municipal 384 00:19:55.400 --> 00:19:57.079 governments are left totally exposed. 385 00:19:57.200 --> 00:19:59.599 It's a valid fear, but you have to balance it 386 00:19:59.599 --> 00:20:03.880 impartially. The open source advocates are historically right that 387 00:20:03.960 --> 00:20:08.680 transparency makes software more secure. Many eyes make all bugs shallow. 388 00:20:08.880 --> 00:20:10.559 Sure, that works for Linux, right. 389 00:20:10.480 --> 00:20:13.480 But democratizing access to an email drafting tool is a 390 00:20:13.559 --> 00:20:17.240 clear societal good. Democratizing a button that can shut down 391 00:20:17.240 --> 00:20:20.920 a power grid is a fundamentally different conversation. You don't 392 00:20:20.960 --> 00:20:23.720 open source the schematics for a weapon of mass destruction. 
393 00:20:24.039 --> 00:20:26.000 To make sense of where this is heading, we have 394 00:20:26.039 --> 00:20:29.200 to look at how humanity has handled dangerous knowledge in 395 00:20:29.200 --> 00:20:31.440 the past. We have historical 396 00:20:31.079 --> 00:20:33.759 parallels. If we connect this to the bigger picture, the 397 00:20:33.799 --> 00:20:37.200 most immediate comparison is the dawn of the nuclear age. 398 00:20:37.319 --> 00:20:40.680 The same physics that power a city can vaporize it. Exactly. 399 00:20:40.839 --> 00:20:44.880 We had to invent unprecedented governance structures and classification protocols 400 00:20:44.880 --> 00:20:45.480 to manage it. 401 00:20:45.599 --> 00:20:48.319