WEBVTT 1 00:00:01.199 --> 00:00:06.200 Welcome to The Sentient Code, where intelligence is engineered, autonomy 2 00:00:06.280 --> 00:00:10.439 is emerging, and the line between human and machine grows thinner. 3 00:00:10.800 --> 00:00:15.359 Each episode, we decode the algorithms, explore the robotics, and 4 00:00:15.439 --> 00:00:19.000 examine the ideas shaping the future of artificial minds. 5 00:00:23.879 --> 00:00:26.679 Picture this. You open your laptop, you fire up your browser, 6 00:00:26.719 --> 00:00:29.320 and you pull up your preferred AI interface. Right. And 7 00:00:29.359 --> 00:00:32.399 there it is, the little text box staring back at you, 8 00:00:32.520 --> 00:00:36.439 just waiting. Exactly, the blinking cursor. It is waiting for you, 9 00:00:36.880 --> 00:00:40.079 waiting for you to set the context, to engineer the 10 00:00:40.119 --> 00:00:43.079 perfect prompt, to guide it step by step through a 11 00:00:43.000 --> 00:00:47.000 task. And it is phenomenally capable. But until you hit enter, 12 00:00:47.200 --> 00:00:49.079 it's essentially frozen in time. 13 00:00:49.240 --> 00:00:52.280 Yeah, it's a stateless entity. Completely. But now what if 14 00:00:52.320 --> 00:00:54.600 you never had to type that initial prompt again? What 15 00:00:54.640 --> 00:00:58.000 if the AI was already running, maintaining state and making 16 00:00:58.000 --> 00:01:00.679 decisions in the background while you were fast asleep? 17 00:01:00.799 --> 00:01:03.840 I mean, it completely flips the dynamic. We are conditioned 18 00:01:03.880 --> 00:01:08.040 to think of these models as highly advanced conversational calculators. 19 00:01:08.079 --> 00:01:10.400 You know, input, output, and done. Exactly.
20 00:01:10.599 --> 00:01:12.439 You provide an input, you get an output, and then 21 00:01:12.480 --> 00:01:16.319 the system just goes dormant. But shifting from a reactive 22 00:01:16.359 --> 00:01:20.239 paradigm to a proactive, stateful one, it changes the entire 23 00:01:20.319 --> 00:01:22.760 foundation of human-computer interaction. 24 00:01:23.000 --> 00:01:24.719 It's a huge leap, it really is. 25 00:01:25.079 --> 00:01:28.319 It moves the AI from a tool you wield to 26 00:01:28.640 --> 00:01:30.120 a colleague you manage. 27 00:01:30.439 --> 00:01:33.280 And that is the massive shift we are unpacking today. 28 00:01:33.840 --> 00:01:38.280 We're looking into Anthropic's highly anticipated project that's currently sitting 29 00:01:38.319 --> 00:01:41.680 in internal testing as of early April twenty twenty 30 00:01:41.400 --> 00:01:43.680 six. Right, code-named Conway. 31 00:01:43.400 --> 00:01:45.879 Or Claude Conway. Yeah, yeah. And the mission here is 32 00:01:45.879 --> 00:01:49.120 to understand how we are moving from passive chat interfaces 33 00:01:49.159 --> 00:01:53.680 to always-on, persistent digital coworkers. 34 00:01:53.319 --> 00:01:55.680 Which is a fundamental redesign of the technology. 35 00:01:55.719 --> 00:01:59.640 It is. Because honestly, giving software the autonomy to act 36 00:01:59.680 --> 00:02:04.159 without my constant supervision, well, it sounds incredibly powerful, but 37 00:02:04.239 --> 00:02:06.560 also like a fantastic way to rack up a massive 38 00:02:06.599 --> 00:02:09.599 AWS bill. Oh, definitely. Or, you know, trigger a 39 00:02:09.599 --> 00:02:12.319 PR disaster if it just goes completely off the rails. 40 00:02:12.479 --> 00:02:15.560 Both are incredibly valid concerns, and I think they speak 41 00:02:15.599 --> 00:02:18.400 to why this transition is so complex.
To really grasp 42 00:02:18.439 --> 00:02:22.199 why Conway is generating such intense internal buzz at Anthropic, 43 00:02:22.599 --> 00:02:25.719 we have to look at the boundaries of our current architecture. 44 00:02:25.120 --> 00:02:27.039 The limitations of what we have right now. 45 00:02:26.919 --> 00:02:30.599 Exactly. Right now, even with powerful models, the session is ephemeral. 46 00:02:30.800 --> 00:02:33.960 When you close the browser tab, the context window just drops. 47 00:02:34.080 --> 00:02:34.919 It forgets everything. 48 00:02:35.319 --> 00:02:35.479 Right. 49 00:02:35.680 --> 00:02:38.840 The system doesn't remember what you discussed yesterday unless you 50 00:02:38.960 --> 00:02:43.159 manually feed that data back into a new session. So 51 00:02:43.280 --> 00:02:47.280 the bottleneck isn't the intelligence or the reasoning capabilities of 52 00:02:47.319 --> 00:02:51.360 the model. The bottleneck is amnesia. Amnesia, exactly. It requires 53 00:02:51.520 --> 00:02:55.120 constant human initiation just to maintain momentum. 54 00:02:55.439 --> 00:02:57.919 Okay, let's unpack this, because I think the best way 55 00:02:57.960 --> 00:03:01.919 to visualize the difference is to look at organizational structures. Okay, 56 00:03:01.919 --> 00:03:04.960 I like that. Right now, interacting with an LLM is 57 00:03:05.039 --> 00:03:09.199 like having access to the world's most brilliant reference librarian. Sure. 58 00:03:10.199 --> 00:03:11.960 But to get any value out of them, you have 59 00:03:12.000 --> 00:03:15.159 to physically walk up to the reference desk, articulate a 60 00:03:15.240 --> 00:03:18.560 highly specific query, wait for them to fetch the materials, 61 00:03:18.680 --> 00:03:20.400 and then you have to synthesize it yourself. 62 00:03:20.479 --> 00:03:22.240 And if you want to follow up, you're walking right 63 00:03:22.240 --> 00:03:22.680 back to the 64 00:03:22.639 --> 00:03:26.400 desk. Exactly. Conway,
On the other hand, sounds like hiring 65 00:03:26.400 --> 00:03:28.840 a dedicated chief of staff. Yes, someone who has their 66 00:03:28.879 --> 00:03:32.960 own desk, who knows your ongoing priorities, and who initiates 67 00:03:32.960 --> 00:03:35.639 the research before you even realize you need it. 68 00:03:36.000 --> 00:03:40.400 That maps perfectly onto the architectural shift here. Conway operates 69 00:03:40.520 --> 00:03:42.639 essentially as an AI operating system. 70 00:03:42.719 --> 00:03:44.039 An operating system, right. 71 00:03:44.159 --> 00:03:46.800 It's built around the Claude 4 family of models. But 72 00:03:46.879 --> 00:03:49.120 it is an environment rather than just a chat 73 00:03:49.159 --> 00:03:53.039 interface. The engine driving this autonomous chief of staff 74 00:03:53.080 --> 00:03:55.360 relies on an incredible scale of memory. 75 00:03:55.479 --> 00:03:56.400 How big are we talking? 76 00:03:56.599 --> 00:03:59.439 We are looking at a context window capable of handling 77 00:03:59.479 --> 00:04:02.120 upwards of one million tokens. 78 00:04:01.840 --> 00:04:04.000 Wait, a million? Just to put that into perspective for 79 00:04:04.039 --> 00:04:07.759 a second, a million tokens is roughly equivalent to holding 80 00:04:07.759 --> 00:04:10.120 the entire Harry Potter series plus the Lord of the 81 00:04:10.199 --> 00:04:13.319 Rings trilogy in active working memory simultaneously, right? 82 00:04:13.400 --> 00:04:16.639 Roughly, yes. It's massive, and that massive context is what 83 00:04:16.839 --> 00:04:20.399 enables this long-horizon reasoning. Okay. But the key mechanism 84 00:04:20.439 --> 00:04:23.040 here isn't just holding a lot of text. It's how 85 00:04:23.120 --> 00:04:27.399 Conway handles persistence without constantly retraining its neural network weights. 86 00:04:27.560 --> 00:04:30.800 Because retraining constantly would be computationally
87 00:04:30.160 --> 00:04:33.120 impossible. Exactly. It would cost a fortune and take way 88 00:04:33.160 --> 00:04:37.759 too long. So instead, Conway uses that massive context for 89 00:04:38.079 --> 00:04:42.680 advanced in-context learning. It maintains a continuous running 90 00:04:42.399 --> 00:04:44.240 log. Like an internal scratch pad? 91 00:04:44.319 --> 00:04:48.000 Yes, exactly like a scratch pad, across hours, days, or 92 00:04:48.040 --> 00:04:51.319 even weeks. Before it goes into a dormant state, it 93 00:04:51.480 --> 00:04:54.480 summarizes its current state and writes it to its memory. 94 00:04:54.600 --> 00:04:56.759 Oh wow. And when it wakes up, it reads its 95 00:04:56.800 --> 00:05:01.040 own journal. It remembers the actions it took yesterday, and crucially, 96 00:05:01.519 --> 00:05:02.959 it learns from the outcomes. 97 00:05:03.759 --> 00:05:06.759 So if an API call failed on Tuesday, its 98 00:05:06.720 --> 00:05:09.800 state vector reflects that, and it will automatically attempt an 99 00:05:09.839 --> 00:05:12.000 alternative routing on Wednesday. 100 00:05:11.800 --> 00:05:15.279 Which is brilliant, but it naturally brings up a huge 101 00:05:15.319 --> 00:05:18.360 logistical hurdle, right? Because if this system is acting as 102 00:05:18.399 --> 00:05:20.279 my chief of staff, running twenty-four seven in 103 00:05:20.319 --> 00:05:23.279 the background, it needs to interact with the digital world. 104 00:05:23.279 --> 00:05:25.399 And my immediate reaction is, wait, if I am not 105 00:05:25.480 --> 00:05:28.120 hitting enter to trigger a prompt, how does it know 106 00:05:28.160 --> 00:05:28.879 when to wake up? 107 00:05:29.000 --> 00:05:29.879 That's the big question. 108 00:05:30.040 --> 00:05:32.959 Yeah, an AI just randomly executing tasks in the background 109 00:05:33.040 --> 00:05:34.199 sounds like pure chaos.
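The journal mechanism described in this stretch of the conversation (summarize state before going dormant, read it back on wake-up, avoid what failed last time) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: nothing here comes from Anthropic, and the names `Journal`, `record`, `failed_actions`, and `choose_route`, as well as the JSON file layout, are hypothetical.

```python
import json
from dataclasses import dataclass, field, asdict
from pathlib import Path


@dataclass
class Journal:
    """Hypothetical persistent scratch pad that an agent reads on wake-up."""
    entries: list = field(default_factory=list)

    def record(self, day: str, action: str, outcome: str) -> None:
        # Append one summarized step before the agent goes dormant.
        self.entries.append({"day": day, "action": action, "outcome": outcome})

    def failed_actions(self) -> set:
        # Every action that has a recorded failure.
        return {e["action"] for e in self.entries if e["outcome"] == "failed"}

    def save(self, path: Path) -> None:
        path.write_text(json.dumps(asdict(self)))

    @classmethod
    def load(cls, path: Path) -> "Journal":
        # An agent with no history starts from an empty journal.
        if not path.exists():
            return cls()
        return cls(**json.loads(path.read_text()))


def choose_route(journal: Journal, routes: list) -> str:
    """Pick the first route with no recorded failure; fall back to the first route."""
    failed = journal.failed_actions()
    for route in routes:
        if route not in failed:
            return route
    return routes[0]
```

With a journal that recorded `primary_api` as failed on Tuesday, saved to disk and reloaded, `choose_route(journal, ["primary_api", "backup_api"])` returns `"backup_api"`, which is the Tuesday-to-Wednesday behavior the hosts describe.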
110 00:05:34.360 --> 00:05:38.720 What's fascinating here is how they have architected the environmental awareness. 111 00:05:39.240 --> 00:05:42.639 It doesn't rely on a constant, expensive polling loop, 112 00:05:42.439 --> 00:05:44.879 where the AI is awake twenty-four seven asking, 113 00:05:44.920 --> 00:05:46.720 should I do something now? Should I do something now? 114 00:05:46.879 --> 00:05:50.839 Right. That would be incredibly inefficient. Instead, it operates entirely 115 00:05:51.120 --> 00:05:53.800 on event-driven triggers. Okay. Think of it like 116 00:05:53.920 --> 00:05:58.120 setting up tripwires across your digital ecosystem. Conway stays in 117 00:05:58.160 --> 00:06:02.319 a highly efficient, dormant state of passive monitoring until an 118 00:06:02.319 --> 00:06:05.879 external event wakes it up and hands it an objective. 119 00:06:06.079 --> 00:06:10.079 So a tripwire would be something like a VIP client 120 00:06:10.120 --> 00:06:12.839 sending an email with the word urgent, or maybe a 121 00:06:12.920 --> 00:06:14.480 database flag getting flipped. 122 00:06:14.560 --> 00:06:16.240 Yes, exactly. Does it 123 00:06:16.120 --> 00:06:18.319 hook directly into those systems to listen for that? 124 00:06:18.439 --> 00:06:21.199 It integrates deeply into your workflow. It could be a 125 00:06:21.399 --> 00:06:24.920 new pull request opening on GitHub, a sudden calendar alteration, 126 00:06:25.279 --> 00:06:28.279 or a massive spike in user traffic on your server. 127 00:06:28.439 --> 00:06:28.720 Got it. 128 00:06:28.879 --> 00:06:32.360 When that specific programmatic condition is met, the system receives 129 00:06:32.360 --> 00:06:36.519 a payload, Conway activates, reads the new information against its 130 00:06:36.519 --> 00:06:40.240 persistent journal, and then executes the predefined action strategy.
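The "tripwire" idea, where nothing runs until an event arrives, is the usual contrast between event-driven dispatch and a polling loop. A minimal sketch, assuming a hypothetical `TriggerRegistry` class and made-up event names such as `pull_request.opened` (neither is taken from any Conway documentation):

```python
from typing import Callable, Dict, List

Handler = Callable[[dict], str]


class TriggerRegistry:
    """Hypothetical tripwire table: event type -> handlers that run only when fired."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = {}

    def on(self, event_type: str, handler: Handler) -> None:
        # Register a handler; no work happens until dispatch() is called.
        self._handlers.setdefault(event_type, []).append(handler)

    def dispatch(self, event_type: str, payload: dict) -> List[str]:
        # Unknown event types match no handler, so the agent stays dormant.
        return [handler(payload) for handler in self._handlers.get(event_type, [])]


registry = TriggerRegistry()
registry.on("pull_request.opened", lambda p: f"review PR #{p['number']}")
registry.on("inventory.updated", lambda p: f"check stock for {p['item']}")
```

Calling `registry.dispatch("pull_request.opened", {"number": 7})` yields `["review PR #7"]`, while an unregistered event type yields an empty list. The cost of idling is zero here because no loop runs between events, which is the efficiency argument made in the conversation.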
131 00:06:40.319 --> 00:06:42.040 But hold on, if we connect this to the bigger 132 00:06:42.040 --> 00:06:44.759 picture of enterprise security, that actually sounds terrifying. 133 00:06:44.920 --> 00:06:46.959 I'm sure IT departments are sweating. 134 00:06:46.800 --> 00:06:49.160 Because if Conway is just waiting for a signal from 135 00:06:49.160 --> 00:06:53.199 the Internet to wake up and start executing complex autonomous tasks, 136 00:06:53.600 --> 00:06:56.639 couldn't a bad actor just spoof an email or fake 137 00:06:56.680 --> 00:07:00.120 a server request? They could effectively hijack my AI 138 00:07:00.279 --> 00:07:02.600 coworker by sending it a malicious trigger. 139 00:07:02.680 --> 00:07:06.519 You're hitting on the core vulnerability of any event-driven architecture, 140 00:07:06.519 --> 00:07:10.439 and it's exactly what Anthropic had to engineer around. Conway 141 00:07:10.439 --> 00:07:14.199 relies on highly secure webhooks to listen for these triggers. Okay. 142 00:07:14.199 --> 00:07:19.959 And it enforces strict cryptographic signature verification. Specifically, it utilizes 143 00:07:20.199 --> 00:07:23.000 X-Hub-Signature-256 headers for 144 00:07:23.199 --> 00:07:24.720 all incoming payloads. 145 00:07:24.800 --> 00:07:27.199 Okay, X-Hub-Signature-256. So it's acting 146 00:07:27.199 --> 00:07:28.879 like a cryptographic bouncer at the door. 147 00:07:29.040 --> 00:07:30.000 That's a good way to look at it. 148 00:07:30.040 --> 00:07:32.240 And I'm assuming this isn't just checking a basic password. 149 00:07:32.319 --> 00:07:35.800 Far from it. It's an unbreakable mathematical seal. When a 150 00:07:35.839 --> 00:07:39.079 signal comes in.
Let's say your inventory database sends a 151 00:07:39.079 --> 00:07:42.720 webhook saying stock for item A is zero. Right. That 152 00:07:42.759 --> 00:07:46.040 payload is hashed by the sender using a complex algorithm 153 00:07:46.040 --> 00:07:49.079 and a secret key that only your server and Conway share. 154 00:07:49.160 --> 00:07:50.920 Okay, so they both have the key, right? 155 00:07:51.680 --> 00:07:55.680 The sender attaches that hash to the message. When Conway 156 00:07:55.720 --> 00:07:59.360 receives the payload, it performs the exact same mathematical hashing 157 00:07:59.399 --> 00:08:00.240 process on 158 00:08:00.199 --> 00:08:02.360 the data. And if they don't match? If 159 00:08:02.160 --> 00:08:04.680 the resulting hash doesn't perfectly match the one attached to 160 00:08:04.720 --> 00:08:07.759 the message, Conway drops the request immediately. 161 00:08:07.920 --> 00:08:09.040 It doesn't even read it. 162 00:08:09.040 --> 00:08:11.319 It won't even wake up the language model to read 163 00:08:11.319 --> 00:08:15.560 the prompt. It ensures the agent only ever responds to authenticated, 164 00:08:15.759 --> 00:08:17.079 untampered sources. 165 00:08:17.480 --> 00:08:21.600 Okay, so the cryptographic bouncer lets the trusted signal through. 166 00:08:21.959 --> 00:08:24.639 Conway wakes up, it reads its journal, and now it 167 00:08:24.680 --> 00:08:26.839 has to actually do the work. Yes. And from what 168 00:08:26.879 --> 00:08:30.319 I understand, it doesn't just quietly use APIs behind the scenes. 169 00:08:30.600 --> 00:08:34.159 It can actually browse the visual web. It can. Which, 170 00:08:34.320 --> 00:08:37.480 as someone who currently uses RPA automations to run my life, 171 00:08:38.039 --> 00:08:40.480 I swear, if a vendor changes a single pixel on 172 00:08:40.519 --> 00:08:44.039 their website or renames a CSS class, my entire automated 173 00:08:44.080 --> 00:08:46.320 workflow shatters into a million pieces.
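The signature check the hosts walk through matches how the `X-Hub-Signature-256` header works for GitHub webhooks: the sender computes an HMAC-SHA256 over the raw request body with a shared secret and sends it as `sha256=<hex digest>`, and the receiver recomputes and compares. A minimal sketch with Python's standard library follows. Whether Conway uses exactly this header format is an assumption drawn from the conversation, and the function names `sign` and `verify` are illustrative only.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Build a header value in the 'sha256=<hex>' form used by X-Hub-Signature-256."""
    return "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, header: str) -> bool:
    """Recompute the HMAC over the raw body and compare it in constant time."""
    expected = sign(secret, body)
    # Compare as bytes so a non-ASCII header is rejected instead of raising.
    return hmac.compare_digest(expected.encode("utf-8"), header.encode("utf-8"))
```

A receiver would call `verify` on the raw bytes before parsing anything and drop the request on `False`, which corresponds to "it won't even wake up the language model." The guarantee holds only while the shared secret stays private, and `hmac.compare_digest` is used so the comparison does not leak timing information.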
174 00:08:46.360 --> 00:08:48.799 And that fragility is exactly what Conway is designed to 175 00:08:48.799 --> 00:08:53.159 bypass. Conway features native browser automation, but it doesn't rely 176 00:08:53.240 --> 00:08:56.279 on brittle DOM scraping, where it just blindly looks for 177 00:08:56.320 --> 00:08:59.360 a specific line of code on a page. Instead, it 178 00:08:59.480 --> 00:09:02.639 uses compute to visually parse the layout of a site, 179 00:09:02.720 --> 00:09:05.039 much like a human does. Oh wow. So if a 180 00:09:05.039 --> 00:09:08.480 competitor radically updates their pricing page, throwing off all your 181 00:09:08.519 --> 00:09:12.960 static web scrapers, Conway can navigate to the new URL, visually 182 00:09:13.000 --> 00:09:16.399 identify the pricing tables regardless of the underlying code changes, 183 00:09:16.879 --> 00:09:19.960 extract the new data, compare it to your internal metrics, 184 00:09:20.279 --> 00:09:22.240 and draft a strategic response. 185 00:09:22.480 --> 00:09:25.799 So it's literally acting like a human analyst clicking around 186 00:09:25.799 --> 00:09:28.440 on Chrome, adapting to visual changes on the fly. 187 00:09:28.679 --> 00:09:32.879 Yes, and to exponentially scale that capability, developers can build 188 00:09:32.960 --> 00:09:35.120 custom extensions specifically for 189 00:09:35.159 --> 00:09:38.519 Conway. Extensions, like browser extensions? Similar concept. 190 00:09:38.519 --> 00:09:41.960 These are packaged in a proprietary format, dot CNW dot 191 00:09:41.919 --> 00:09:45.000 zip. CNW dot zip. Okay, so we are looking at 192 00:09:45.039 --> 00:09:48.360 an ecosystem purpose-built for an AI agent rather than a 193 00:09:48.440 --> 00:09:49.120 human user. 194 00:09:49.440 --> 00:09:52.639 That is the intended architecture. Just as browser extensions give 195 00:09:52.679 --> 00:09:56.639 you custom UI tools or block ads,
These dot CNW 196 00:09:56.720 --> 00:10:00.799 dot zip files allow enterprise developers to build native 197 00:10:00.799 --> 00:10:03.399 integrations into Conway's ecosystem. So 198 00:10:03.399 --> 00:10:05.480 I could build one for my specific workflow? 199 00:10:05.600 --> 00:10:10.200 Exactly. You could install an extension that grants Conway highly specific, 200 00:10:10.440 --> 00:10:15.919 authenticated access to your proprietary HR software or maybe your 201 00:10:16.279 --> 00:10:19.720 AWS backend. It creates a standardized way to give 202 00:10:19.759 --> 00:10:22.840 the AI new skills without having to rebuild the entire 203 00:10:22.879 --> 00:10:23.759 agent from scratch. 204 00:10:23.879 --> 00:10:26.879 Here's where it gets really interesting, because Anthropic didn't just 205 00:10:26.879 --> 00:10:30.360 wake up one morning and decide to build a persistent, browser-controlling, 206 00:10:30.639 --> 00:10:33.759 cryptographically secure AI out of thin air. 207 00:10:33.919 --> 00:10:35.759 No, this has been a long time coming. Right. 208 00:10:36.000 --> 00:10:38.120 You can trace the development of this over the last year. 209 00:10:38.399 --> 00:10:41.159 Building an ecosystem where an AI can safely use custom 210 00:10:41.200 --> 00:10:44.120 extensions means the AI first had to learn how to 211 00:10:44.159 --> 00:10:46.759 interact with computer systems at a base level. Yes. We 212 00:10:46.799 --> 00:10:49.000 saw this start with Claude Code, which was very terminal 213 00:10:49.039 --> 00:10:52.960 based, manipulating files for developers. The major pivot point was 214 00:10:53.000 --> 00:10:55.519 the transition to Claude Cowork earlier this year. 215 00:10:55.639 --> 00:10:59.120 The January twenty twenty six research preview of Claude Cowork 216 00:10:59.279 --> 00:11:02.840 was an absolutely vital stepping stone.
Yeah, it was their 217 00:11:02.879 --> 00:11:07.559 first real foray into an agentic environment designed for general 218 00:11:07.600 --> 00:11:10.120 knowledge workers rather than just software engineering. 219 00:11:10.399 --> 00:11:12.759 Cowork was impressive. You could give it a high-level 220 00:11:12.799 --> 00:11:16.080 goal like, take these five raw data exports, clean them up, 221 00:11:16.279 --> 00:11:18.879 and build me a quarterly review presentation. 222 00:11:18.519 --> 00:11:19.159 And it would do it. 223 00:11:19.200 --> 00:11:21.639 It would navigate your files and build the deck. Yeah. 224 00:11:21.679 --> 00:11:25.600 But it still felt constrained. It felt like handing a 225 00:11:25.720 --> 00:11:29.320 project to a brilliant intern who legally doesn't have the 226 00:11:29.360 --> 00:11:31.000 authority to sign the checks. 227 00:11:31.159 --> 00:11:33.039 That's a great analogy. 228 00:11:32.600 --> 00:11:34.360 Like, it could do the prep work, but it couldn't 229 00:11:34.399 --> 00:11:35.399 finalize anything. 230 00:11:35.600 --> 00:11:40.000 But that limitation was by design. Cowork was intensely goal-driven, 231 00:11:40.039 --> 00:11:43.600 but the architecture heavily enforced human-in-the-loop oversight. 232 00:11:43.960 --> 00:11:44.200 Right. 233 00:11:44.320 --> 00:11:47.200 The system could do the heavy lifting of data synthesis, 234 00:11:47.679 --> 00:11:51.720 but the consequential actions, sending the final emails, committing code 235 00:11:51.720 --> 00:11:55.840 to a production environment, executing a financial transfer, those remained 236 00:11:55.919 --> 00:11:57.559 gated behind user approval. 237 00:11:57.679 --> 00:11:58.879 So you were still the bottleneck. 238 00:11:59.159 --> 00:12:02.360 You were the supervisor, acting as the final security checkpoint. 239 00:12:02.399 --> 00:12:06.879 But Conway drops that requirement. It upgrades the intern to 240 00:12:07.159 --> 00:12:09.000 full corporate signing authority.
241 00:12:09.240 --> 00:12:12.320 It does, which is a massive leap in trust. Conway 242 00:12:12.360 --> 00:12:16.120 takes the baseline capabilities of Cowork and embeds them into 243 00:12:16.159 --> 00:12:20.279 this continuous, persistent loop we've been discussing. Right. But to 244 00:12:20.360 --> 00:12:23.279 do that safely, to give it that signing authority, 245 00:12:23.360 --> 00:12:27.759 Anthropic had to build an internal regulatory system. They utilize 246 00:12:27.759 --> 00:12:30.320 what is known as managed agents infrastructure. 247 00:12:30.440 --> 00:12:33.320 Okay, wait, so it's not just one massive AI brain 248 00:12:33.840 --> 00:12:36.799 handling the execution and the oversight simultaneously? 249 00:12:36.840 --> 00:12:39.960 No. Relying on a single model to police itself during 250 00:12:40.000 --> 00:12:44.799 a complex, multi-day task is incredibly risky. Managed agents 251 00:12:44.840 --> 00:12:47.440 infrastructure involves deploying supervisor agents. 252 00:12:47.559 --> 00:12:47.679 Okay. 253 00:12:47.879 --> 00:12:50.919 These are specialized, highly efficient models whose sole function is 254 00:12:50.960 --> 00:12:52.840 to audit the primary worker agent. 255 00:12:52.720 --> 00:12:54.279 Like an internal affairs department. 256 00:12:54.039 --> 00:12:57.759 Exactly. As Conway executes a task, say researching competitors and 257 00:12:57.840 --> 00:13:01.919 updating your database, the supervisor agent runs parallel inference, just 258 00:13:01.960 --> 00:13:06.440 watching it. It constantly evaluates Conway's state vector and scratchpad 259 00:13:06.720 --> 00:13:09.279 to ensure the worker isn't caught in an infinite loop, 260 00:13:09.759 --> 00:13:13.279 that it isn't hallucinating data, and that it isn't violating 261 00:13:13.320 --> 00:13:14.679 its core system 262 00:13:14.320 --> 00:13:16.399 constraints. And what happens if it does?
263 00:13:16.519 --> 00:13:19.320 If the supervisor detects an anomaly, it can issue a 264 00:13:19.399 --> 00:13:23.159 system-level halt command to the worker agent, and Conway 265 00:13:23.320 --> 00:13:26.159 orchestrates this entire hierarchy autonomously. 266 00:13:26.480 --> 00:13:29.720 Okay, so we have the architectural foundation, we have the 267 00:13:30.039 --> 00:13:34.120 million-token memory acting as a persistent journal, the cryptographic 268 00:13:34.120 --> 00:13:38.000 webhooks waking it up, the visual browser control adapting to changes, 269 00:13:38.440 --> 00:13:41.279 and the supervisor agents acting as an internal audit team. 270 00:13:41.399 --> 00:13:42.360 That's the full package. 271 00:13:42.440 --> 00:13:44.759 Let's bring this down to earth, right to the listener's desktop. 272 00:13:44.960 --> 00:13:47.679 What does this actually look like in practice? If I 273 00:13:47.720 --> 00:13:50.960 deploy Conway as my digital chief of staff, how does 274 00:13:51.000 --> 00:13:53.159 that fundamentally change my Tuesday workflow? 275 00:13:53.360 --> 00:13:56.919 Let's apply it to a real-world business intelligence scenario. Okay, perfect. 276 00:13:57.120 --> 00:14:01.120 Imagine your company's global supply chain data syncs every night at midnight. 277 00:14:01.679 --> 00:14:04.600 In a traditional setup, a human analyst logs in at 278 00:14:04.720 --> 00:14:08.679 nine a.m., spots a strange anomaly in the European numbers, 279 00:14:09.039 --> 00:14:12.879 spends three hours cross-referencing shipping logs, and finally presents 280 00:14:12.879 --> 00:14:14.440 a preliminary report after lunch. 281 00:14:14.559 --> 00:14:17.039 Right, half a day is gone just identifying the scope 282 00:14:17.039 --> 00:14:18.279 of the problem. Exactly. 283 00:14:18.639 --> 00:14:20.320 But with an event-driven 284 00:14:20.039 --> 00:14:22.440 setup, the midnight database sync is the trigger.
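The supervisor arrangement discussed just before this scenario (a separate auditor that reads the worker's scratchpad and can halt it) can be illustrated with a deliberately simple rule-based check. This is a sketch under loud assumptions: a real supervisor as described would be another model, not a loop over strings, and the names `Step` and `audit` and the `max_repeats` threshold are hypothetical. It covers loop detection and constraint violations only; detecting hallucinated data is out of scope for a toy like this.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    action: str
    allowed: bool = True  # whether the action falls within the worker's constraints


def audit(scratchpad: List[Step], max_repeats: int = 3) -> Optional[str]:
    """Return a halt reason if the worker looks stuck or out of bounds, else None."""
    repeats = 1
    for i, step in enumerate(scratchpad):
        if not step.allowed:
            return f"constraint violation at step {i}: {step.action}"
        if i > 0 and step.action == scratchpad[i - 1].action:
            # Same action as the previous step: extend the current run.
            repeats += 1
            if repeats >= max_repeats:
                return f"possible loop: '{step.action}' repeated {repeats} times"
        else:
            repeats = 1
    return None
```

A `None` result means the worker continues; any string is the reason a halt command would carry. Keeping the auditor separate from the worker mirrors the point made in the conversation that a single model policing itself over a multi-day task is risky.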
285 00:14:22.679 --> 00:14:26.279 Yes. At twelve oh one a.m., Conway wakes up and 286 00:14:26.360 --> 00:14:29.000 reads the new data. Its supervisor agents ensure it 287 00:14:29.080 --> 00:14:32.279 stays on task. It detects a fifteen percent drop in 288 00:14:32.320 --> 00:14:34.120 European fulfillment speeds. 289 00:14:33.759 --> 00:14:35.639 And because it has browser control, 290 00:14:35.440 --> 00:14:39.639 it autonomously opens its native browser, scans regional European news outlets, 291 00:14:39.919 --> 00:14:43.320 identifies a localized wildcat strike at a major shipping port, 292 00:14:43.480 --> 00:14:47.240 cross-references that with competitor inventory levels, and calculates the 293 00:14:47.240 --> 00:14:49.080 projected impact on your Q3 margin. 294 00:14:49.279 --> 00:14:50.000 While I'm sleeping. 295 00:14:50.159 --> 00:14:53.039 By three a.m., it has synthesized the root cause, 296 00:14:53.320 --> 00:14:57.039 drafted a comprehensive mitigation strategy, and pushed a Slack message 297 00:14:57.039 --> 00:14:59.639 to the executive channel. When you wake up and pour 298 00:14:59.679 --> 00:15:03.080 your coffee, the crisis hasn't just been identified; the strategic 299 00:15:03.120 --> 00:15:04.480 analysis has already finished. 300 00:15:04.840 --> 00:15:09.279 That level of leverage is unbelievable. It completely eclipses static 301 00:15:09.320 --> 00:15:12.960 automation platforms. We are moving from if X happens, trigger 302 00:15:13.000 --> 00:15:17.279 Y, to if X happens, figure out why, understand the context, 303 00:15:17.600 --> 00:15:19.559 and execute the best possible solution. 304 00:15:19.840 --> 00:15:22.759 It replaces rigid logic with dynamic judgment. And 305 00:15:22.720 --> 00:15:26.600 that dynamic judgment is the key to scaling complex operations, 306 00:15:27.399 --> 00:15:31.360 but it is also the source of the most significant risk. 307 00:15:31.799 --> 00:15:33.279 Yes, it is. Right, I have
308 00:15:33.240 --> 00:15:35.799 to play devil's advocate here, because listening to this, I 309 00:15:35.840 --> 00:15:38.159 can't help but think of The Sorcerer's Apprentice. 310 00:15:38.240 --> 00:15:39.440 Oh, that's a good comparison. 311 00:15:39.519 --> 00:15:41.960 You know, Mickey Mouse enchants the broom to carry 312 00:15:41.960 --> 00:15:45.320 the water, falls asleep, and wakes up drowning because the 313 00:15:45.360 --> 00:15:48.240 automated worker lacked the contextual judgment to know when the 314 00:15:48.320 --> 00:15:51.200 job was actually done. When you grant an AI system 315 00:15:51.360 --> 00:15:54.799 unsupervised autonomy over days or weeks, what happens when a 316 00:15:54.840 --> 00:15:58.000 micro error occurs on day one? Doesn't the probability of 317 00:15:58.000 --> 00:16:00.960 failure approach one hundred percent over a long enough timeline? 318 00:16:01.000 --> 00:16:03.919 This raises an incredibly important question, and frankly, it is 319 00:16:03.960 --> 00:16:07.759 the primary reason Conway remains in internal testing. The reality 320 00:16:07.840 --> 00:16:11.720 check on autonomous agents is severe. The first critical vulnerability 321 00:16:11.759 --> 00:16:15.480 is exactly what you're pointing to: reliability and the mathematics 322 00:16:15.480 --> 00:16:17.879 of compounding hallucinations. 323 00:16:17.200 --> 00:16:19.720 Because if it's acting on its own journal entries, a 324 00:16:19.759 --> 00:16:22.399 hallucination becomes a false memory that it treats as fact. 325 00:16:22.759 --> 00:16:26.919 Yes. In a long-horizon execution, an AI might make 326 00:16:27.039 --> 00:16:30.840 a minor incorrect assumption during hour two of a seventy 327 00:16:30.879 --> 00:16:33.759 two hour workflow. Let's say an agent has a ninety 328 00:16:33.840 --> 00:16:36.679 nine percent success rate per individual reasoning step. 329 00:16:36.960 --> 00:16:39.000 That sounds excellent. It does.
330 00:16:38.799 --> 00:16:41.879 But over a sequence of one hundred autonomous steps, that 331 00:16:41.960 --> 00:16:45.519 one percent error rate compounds, resulting in roughly a sixty 332 00:16:45.559 --> 00:16:48.720 three percent chance of task failure. Wow. By hour forty, 333 00:16:48.759 --> 00:16:52.840 that tiny initial assumption has completely derailed the workflow. And 334 00:16:52.919 --> 00:16:57.039 Anthropic's internal research highlights a fascinating psychological hazard here: the 335 00:16:57.080 --> 00:16:58.120 autonomy paradox. 336 00:16:58.200 --> 00:17:00.399 Let me guess. As the system proves it can handle 337 00:17:00.440 --> 00:17:02.480 the work, humans completely 338 00:17:02.080 --> 00:17:05.839 check out. Precisely the issue. As Conway demonstrates competence, users 339 00:17:05.880 --> 00:17:08.839 grant it more independence and check the audit logs less frequently. 340 00:17:09.240 --> 00:17:11.480 It is very similar to the self-driving car problem. 341 00:17:11.640 --> 00:17:15.000 Exactly. You trust the autopilot so implicitly that you stop 342 00:17:15.079 --> 00:17:17.319 watching the road, which is exactly the moment you need 343 00:17:17.359 --> 00:17:17.839 to intervene. 344 00:17:18.000 --> 00:17:18.119 Right. 345 00:17:18.519 --> 00:17:22.240 The data shows that even with supervisor agents, edge-case 346 00:17:22.279 --> 00:17:26.240 interruptions where the system requires human clarification are still common. 347 00:17:26.920 --> 00:17:30.440 If the human has grown complacent, the system either stalls 348 00:17:30.480 --> 00:17:35.160 indefinitely or, worse, confidently hallucinates 349 00:17:34.359 --> 00:17:37.640 a path forward. Which leads to the second massive reality check. 350 00:17:38.079 --> 00:17:41.759 Privacy and control. A huge issue. Because if Conway is 351 00:17:41.759 --> 00:17:43.599 going to act as a chief of staff and draft 352 00:17:43.599 --> 00:17:46.680 that supply chain report at three a.m.
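The compounding figure is easy to check. Assuming each step succeeds independently with probability 0.99 (independence is a simplifying assumption, not something stated in the episode), the chance that all 100 steps succeed is 0.99 to the power 100, about 36.6 percent, so the chance that at least one step fails is about 63.4 percent. The helper name `failure_probability` below is illustrative.

```python
def failure_probability(per_step_success: float, steps: int) -> float:
    """Chance that at least one of `steps` independent steps fails."""
    return 1.0 - per_step_success ** steps


# 100 steps at 99% per-step reliability.
p_fail = failure_probability(0.99, 100)
p_all_ok = 1.0 - p_fail
print(f"all steps succeed: {p_all_ok:.3f}, at least one fails: {p_fail:.3f}")
```

The same function shows why longer horizons are harder: at 500 steps with the same per-step reliability, the failure probability is above 99 percent.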
It 353 00:17:46.759 --> 00:17:49.680 needs access to a terrifying amount of data. It needs 354 00:17:49.720 --> 00:17:52.799 everything. Deep access to my local files, my Gmail, my 355 00:17:52.839 --> 00:17:56.960 company's Google Drive, our internal Slack channels, my calendar, basically 356 00:17:56.960 --> 00:17:58.279 my entire digital brain. 357 00:17:58.400 --> 00:18:01.839 And giving an autonomous agent that level of lateral access 358 00:18:01.880 --> 00:18:07.519 demands intense encryption and airtight compartmentalization. Yeah. Enterprise IT departments 359 00:18:07.519 --> 00:18:11.000 are going to require mathematical certainty that Conway won't accidentally 360 00:18:11.079 --> 00:18:13.960 email a draft of upcoming layoffs to the entire company 361 00:18:14.160 --> 00:18:16.720 while trying to, you know, optimize 362 00:18:16.240 --> 00:18:17.279 your HR folders. Right. 363 00:18:17.279 --> 00:18:18.119 That would be a nightmare. 364 00:18:18.240 --> 00:18:20.880 The audit trails for these actions have to be flawless 365 00:18:20.920 --> 00:18:22.279 and instantly reviewable. 366 00:18:22.640 --> 00:18:25.519 And if you combine that deep access with the dot 367 00:18:25.559 --> 00:18:29.200 CNW dot zip extensions we talked about earlier, the security 368 00:18:29.240 --> 00:18:32.920 implications are wild. We're opening up a massive new attack 369 00:18:32.680 --> 00:18:37.079 surface. Without question. Even with X-Hub-Signature cryptographic signing 370 00:18:37.160 --> 00:18:40.599 on the triggers, anytime you allow third-party extensions to 371 00:18:40.640 --> 00:18:44.279 dictate internal actions, you introduce severe risk. 372 00:18:44.079 --> 00:18:46.240 Because someone else wrote that code.
373 00:18:46.480 --> 00:18:50.359 A poorly coded or intentionally malicious dot CNW dot zip 374 00:18:50.400 --> 00:18:54.319 extension could act as a Trojan horse, granting an attacker 375 00:18:54.599 --> 00:18:59.119 backdoor access to the agent's memory or its authenticated API keys. Wow. 376 00:19:00.039 --> 00:19:04.200 Enterprise-grade governance, strict extension sandboxing, and continuous monitoring are 377 00:19:04.240 --> 00:19:07.319 going to be absolute prerequisites before a system like Conway sees 378 00:19:07.400 --> 00:19:08.559 wide commercial deployment. 379 00:19:08.799 --> 00:19:11.200 So what does this all mean for us? Looking at 380 00:19:11.240 --> 00:19:14.880 the trajectory from stateless prompts to this stateful, persistent architecture, 381 00:19:15.039 --> 00:19:16.839 it is clear we are standing on the edge of 382 00:19:16.880 --> 00:19:19.640 a completely new era. We really are. The age of 383 00:19:19.680 --> 00:19:23.519 the passive interface, the blinking cursor waiting patiently for our 384 00:19:23.559 --> 00:19:28.039 instructions, is officially fading. We are transitioning into an era 385 00:19:28.160 --> 00:19:30.359 where AI isn't just a tool we pick up and 386 00:19:30.400 --> 00:19:31.200 put down. 387 00:19:31.079 --> 00:19:32.960 Right, it's becoming a continuous presence. 388 00:19:33.039 --> 00:19:37.240 We're entering the age of proactive, persistent companions, systems that 389 00:19:37.279 --> 00:19:40.519 act like peers, that manage their own workflows, and that 390 00:19:40.640 --> 00:19:43.599 keep the lights on long after we have clocked out. 391 00:19:43.839 --> 00:19:48.039 It fundamentally redefines the concept of digital leverage. You are 392 00:19:48.079 --> 00:19:51.000 no longer just augmenting your personal typing speed or your 393 00:19:51.000 --> 00:19:55.319 individual research capacity. You are essentially managing an artificial workforce 394 00:19:55.359 --> 00:19:59.960 that can scale indefinitely. The technical mitigations, the supervisor agents,
The technical mitigations, the supervisor age, 395 00:20:00.400 --> 00:20:04.160 the cryptographic security, the persistent context loops that they're all 396 00:20:04.200 --> 00:20:05.359 falling into place. 397 00:20:05.240 --> 00:20:09.680 But the societal and organizational impacts are entirely uncharted completely. Yeah, 398 00:20:09.720 --> 00:20:12.799 the technology is one thing, but how our legal and 399 00:20:12.880 --> 00:20:16.039 social structures adapt to it is a completely different. 400 00:20:15.720 --> 00:20:18.720 Puzzle, and that leaves us with a critical unresolved tension 401 00:20:18.759 --> 00:20:22.960