WEBVTT 1 00:00:01.199 --> 00:00:06.200 Welcome to the Sentient Code, where intelligence is engineered, autonomy 2 00:00:06.280 --> 00:00:10.439 is emerging, and the line between human and machine grows thinner. 3 00:00:10.800 --> 00:00:15.359 Each episode, we decode the algorithms, explore the robotics, and 4 00:00:15.439 --> 00:00:21.879 examine the ideas shaping the future of artificial minds. 5 00:00:23.800 --> 00:00:26.760 Hello, and welcome back to the show. Today 6 00:00:27.480 --> 00:00:30.120 we're walking right into the center of the maze. We're 7 00:00:30.160 --> 00:00:33.439 tackling a topic that on the surface feels like it 8 00:00:33.479 --> 00:00:37.719 belongs strictly in the realm of maybe nineteen eighties science 9 00:00:37.759 --> 00:00:41.640 fiction or a late-night philosophy dorm room session. 10 00:00:41.840 --> 00:00:43.359 Yeah, it really does have that vibe. 11 00:00:43.399 --> 00:00:45.759 It does, but as we are going to see today, 12 00:00:46.119 --> 00:00:48.399 it is very much grounded in the reality of what 13 00:00:48.520 --> 00:00:51.200 is running on server farms right now for you listening 14 00:00:51.240 --> 00:00:54.359 at home. We are exploring the idea of machines that 15 00:00:54.439 --> 00:00:55.719 build better machines. 16 00:00:56.039 --> 00:01:00.119 It is the concept of recursive intelligence. And you are right, 17 00:01:00.159 --> 00:01:02.640 it sounds completely like sci-fi, but in the field 18 00:01:02.679 --> 00:01:06.079 of computer science and cognitive science it has this specific 19 00:01:06.560 --> 00:01:09.359 quality of being a strange attractor. 20 00:01:09.400 --> 00:01:11.280 The strange attractor. Yeah, I love that term, but let's 21 00:01:11.280 --> 00:01:13.519 break that down immediately. What does that actually mean in 22 00:01:13.560 --> 00:01:14.239 this context?
23 00:01:14.439 --> 00:01:17.120 So in chaos theory, a strange attractor is a set of states 24 00:01:17.159 --> 00:01:20.480 that a dynamical system tends to evolve toward. From a wide range 25 00:01:20.480 --> 00:01:24.040 of starting points, the system eventually settles into this specific pattern. 26 00:01:24.200 --> 00:01:27.480 In the world of AI theory, recursive self-improvement is 27 00:01:27.519 --> 00:01:30.480 that pattern. It's a concept that our thinking just keeps 28 00:01:30.480 --> 00:01:33.719 circling back to, no matter how far you drift into 29 00:01:33.719 --> 00:01:35.239 the engineering weeds. 30 00:01:35.040 --> 00:01:37.719 Talking about loss functions and gradient descent and 31 00:01:37.680 --> 00:01:40.840 all that. Exactly, or how far you go into the 32 00:01:40.840 --> 00:01:44.760 philosophical clouds, the sheer gravitational weight of this idea just 33 00:01:44.840 --> 00:01:45.640 pulls you back in. 34 00:01:46.200 --> 00:01:48.400 So it's inevitable? Is that what the researchers are saying? 35 00:01:48.640 --> 00:01:49.719 Many thinkers believe so. 36 00:01:49.840 --> 00:01:50.040 Yes. 37 00:01:50.719 --> 00:01:53.200 It is the notion that if you have a sufficiently 38 00:01:53.280 --> 00:01:56.280 capable intelligence, it stands to reason that it might be 39 00:01:56.280 --> 00:01:59.400 able to improve itself. And if it improves itself, the 40 00:01:59.519 --> 00:02:02.560 new version is smarter, which means it is even better 41 00:02:02.599 --> 00:02:03.640 at improving itself. 42 00:02:03.879 --> 00:02:06.319 So you get this loop, and the loop keeps tightening 43 00:02:06.400 --> 00:02:07.920 and accelerating, right?
44 00:02:07.840 --> 00:02:11.199 The iteration of this process might produce something that bears 45 00:02:11.199 --> 00:02:14.960 the same relationship to its starting point, as say, a 46 00:02:15.000 --> 00:02:18.800 modern human brain bears to the primitive neural circuitry of 47 00:02:18.840 --> 00:02:21.199 an early organism like a flatworm. 48 00:02:21.240 --> 00:02:24.039 That is a staggering comparison. I mean, we are talking 49 00:02:24.080 --> 00:02:27.840 about an evolutionary leap, something that took biology millions of 50 00:02:27.919 --> 00:02:31.039 years but compressed into well, we actually don't know the timeframe, 51 00:02:31.080 --> 00:02:31.520 do we. 52 00:02:31.520 --> 00:02:34.960 We really don't, and that is why this topic sits 53 00:02:35.000 --> 00:02:38.840 at such a weird intersection. It's computer science, obviously, but 54 00:02:38.919 --> 00:02:43.000 it is also philosophy, and it's heavily discussed in safety research. 55 00:02:43.439 --> 00:02:47.039 It is simultaneously one of the most rigorously discussed concepts 56 00:02:47.039 --> 00:02:50.280 in the safety literature and paradoxically one of the most 57 00:02:50.319 --> 00:02:52.479 speculative things you can possibly talk about. 58 00:02:52.599 --> 00:02:55.560 It feels a bit like ghost hunting with a Geiger counter. Yeah, 59 00:02:55.639 --> 00:02:58.080 we have all these technical tools, but we aren't quite 60 00:02:58.120 --> 00:02:59.240 sure what we're looking at yet. 61 00:02:59.280 --> 00:03:00.800 That's a really fair analogy. 62 00:03:01.080 --> 00:03:04.520 So our mission today for you listening is to cut 63 00:03:04.520 --> 00:03:07.240 through the noise. We have a massive stack of research 64 00:03:07.280 --> 00:03:10.919 here primarily focused on the actual mechanics of self improving AI, 65 00:03:11.639 --> 00:03:14.800 and we need to disentangle the threats because, as I 66 00:03:14.879 --> 00:03:17.120 understand it, there's a lot of confusion out there. 
People 67 00:03:17.159 --> 00:03:20.599 hear self-improving AI and they immediately picture Skynet. 68 00:03:20.120 --> 00:03:22.159 Or HAL nine thousand, right, the Hollywood version. 69 00:03:22.280 --> 00:03:24.000 Yeah, but there is also a version of this that 70 00:03:24.240 --> 00:03:26.840 is just mundane engineering, and that is 71 00:03:26.759 --> 00:03:30.520 the absolute key here. We need to disentangle the mundane 72 00:03:30.560 --> 00:03:33.319 engineering, which is real and happening today on your phone 73 00:03:33.319 --> 00:03:37.719 and your laptop, from the transformative scenarios, which do remain hypothetical. 74 00:03:38.280 --> 00:03:40.639 We need to see how close those two worlds actually are. 75 00:03:40.840 --> 00:03:45.319 Okay, let's unpack this term self-improving, because it feels 76 00:03:45.360 --> 00:03:48.520 like a suitcase word, you know, Marvin Minsky's term for 77 00:03:48.800 --> 00:03:50.400 a word that you can pack a lot of different 78 00:03:50.439 --> 00:03:53.439 meanings into. The research we're looking at suggests it's not 79 00:03:53.639 --> 00:03:54.960 just one thing. 80 00:03:55.080 --> 00:03:57.080 It really isn't. If you look closely at the literature, 81 00:03:57.159 --> 00:03:59.960 you can essentially break it down into four distinct levels, 82 00:04:00.439 --> 00:04:03.759 and the implications of each level are vastly different. It's 83 00:04:03.759 --> 00:04:06.199 not a binary switch where a machine is either stupid 84 00:04:06.280 --> 00:04:06.840 or godlike. 85 00:04:07.159 --> 00:04:09.400 It's a ladder. A ladder. Let's start at the bottom 86 00:04:09.479 --> 00:04:11.879 rung. Level one, the mundane level. 87 00:04:12.000 --> 00:04:15.199 This is routine machine learning. In a very loose sense, 88 00:04:15.280 --> 00:04:18.639 almost every AI system we use today is self-improving. 89 00:04:19.439 --> 00:04:22.319 Think about a recommendation system on a streaming
90 00:04:21.879 --> 00:04:24.560 platform. Right, so I watch a cheesy rom-com, I give 91 00:04:24.560 --> 00:04:26.000 it a thumbs up, and the system 92 00:04:25.759 --> 00:04:29.560 learns. Exactly. It collects your feedback, it updates its predictions. 93 00:04:29.600 --> 00:04:32.839 It essentially says, Okay, the user likes this, let's adjust 94 00:04:32.879 --> 00:04:35.480 the weights to show more of that. Or take a 95 00:04:35.560 --> 00:04:37.920 large language model. If you train it on more data, 96 00:04:38.040 --> 00:04:41.000 it gets better. It updates its parameters, which are the 97 00:04:41.079 --> 00:04:43.879 internal weights, based on that new information. 98 00:04:44.240 --> 00:04:46.959 So it is getting better at its job. But is 99 00:04:47.000 --> 00:04:48.120 it really self-improvement? 100 00:04:48.240 --> 00:04:51.319 That is exactly the right question to ask. Technically, yes, 101 00:04:51.360 --> 00:04:55.399 the performance metrics are going up, but this is incremental. Crucially, 102 00:04:55.680 --> 00:04:57.720 it relies on an external 103 00:04:57.279 --> 00:05:00.519 signal. Us, the data we give it, right? 104 00:05:00.399 --> 00:05:02.800 The data or the feedback we provide. It is not 105 00:05:02.879 --> 00:05:05.800 looking at its own code and rewriting it. It's just practicing. 106 00:05:05.920 --> 00:05:07.839 So it's like a musician practicing scales. 107 00:05:07.920 --> 00:05:10.839 Precisely. It is the difference between a musician practicing their 108 00:05:10.879 --> 00:05:14.560 scales to get faster fingers versus a musician deciding to 109 00:05:14.600 --> 00:05:18.199 surgically alter their hands to play chords that were previously 110 00:05:18.360 --> 00:05:21.680 biologically impossible. Level one is just practice. 111 00:05:21.839 --> 00:05:26.319 That is a very vivid image, and slightly horrifying. So 112 00:05:26.399 --> 00:05:28.680 level one is practice. What is level two?
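The level-one loop described here, where an external thumbs-up signal nudges a model's weights, fits in a few lines. This is a minimal sketch that assumes a hypothetical two-feature item representation and a simple logistic preference model updated by gradient steps. Real recommender systems are far larger, but the key point still holds: every update is driven by feedback from outside the model, and nothing about the model's own design changes.

```python
import math

def predict(weights, features):
    """Probability that the user likes an item: logistic function of a weighted sum."""
    score = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))

def update(weights, features, liked, lr=0.5):
    """One gradient step on the log loss, driven entirely by external feedback."""
    target = 1.0 if liked else 0.0
    error = target - predict(weights, features)
    return [w + lr * error * x for w, x in zip(weights, features)]

# Hypothetical item features: [is_romcom, is_documentary]
romcom = [1.0, 0.0]
weights = [0.0, 0.0]

before = predict(weights, romcom)
for _ in range(5):  # five thumbs-up on rom-coms: the external signal
    weights = update(weights, romcom, liked=True)
after = predict(weights, romcom)

print(f"P(like rom-com): {before:.2f} -> {after:.2f}")
print("documentary weight untouched:", weights[1])
```

The model only "practices": the parameters move, but the architecture (a weighted sum) and the learning rule (the fixed `update` function) stay exactly as a human wrote them.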
This is 113 00:05:28.680 --> 00:05:29.839 where we get into the surgery. 114 00:05:30.079 --> 00:05:33.839 Level two is architectural improvement. This is where we move 115 00:05:34.000 --> 00:05:37.279 from just changing the parameters, the little tuning knobs, to 116 00:05:37.399 --> 00:05:39.879 changing the actual design of the machine itself. 117 00:05:39.959 --> 00:05:42.519 This sounds a bit more abstract. When you say design, 118 00:05:42.720 --> 00:05:44.800 what are we talking about in a software context? 119 00:05:44.920 --> 00:05:49.279 Well, in traditional AI development, humans design the neural networks. 120 00:05:49.600 --> 00:05:52.279 We act as the architects. We decide how many layers 121 00:05:52.319 --> 00:05:54.560 the network has, how they connect to each other, the 122 00:05:54.600 --> 00:05:57.360 overall shape of the brain, so to speak. We decide 123 00:05:57.360 --> 00:05:59.959 if it's a transformer or an RNN. But there is 124 00:06:00.000 --> 00:06:01.680 a field called architecture search. 125 00:06:01.959 --> 00:06:05.160 Architecture search? It sounds like an HGTV show for robots. 126 00:06:05.279 --> 00:06:08.240 It does, doesn't it? But it's actually an automated process 127 00:06:08.279 --> 00:06:11.959 of finding better neural network designs. We use machine learning 128 00:06:12.000 --> 00:06:15.920 algorithms to discover network structures that outperform the ones humans 129 00:06:15.959 --> 00:06:16.560 hand-code. 130 00:06:16.639 --> 00:06:19.319 Wait, so we are using AI to design the blueprint 131 00:06:19.360 --> 00:06:21.079 for the next AI? Precisely. 132 00:06:21.560 --> 00:06:24.600 Imagine you want to build a skyscraper. Humans usually decide: 133 00:06:24.600 --> 00:06:27.480 put the elevators here, put the windows there. That's the architecture. 134 00:06:27.720 --> 00:06:31.439 But in architecture search, we run thousands of tiny simulations.
135 00:06:31.759 --> 00:06:35.240 We let an AI build one thousand weird, wobbly skyscrapers. 136 00:06:35.680 --> 00:06:37.839 Nine hundred and ninety-nine of them might fall down 137 00:06:38.000 --> 00:06:39.079 or be wildly 138 00:06:38.720 --> 00:06:40.480 inefficient. And one stays standing. 139 00:06:40.560 --> 00:06:42.680 One stays standing, and it might have the elevators on 140 00:06:42.720 --> 00:06:46.399 the outside or windows on the floor. It looks completely 141 00:06:46.480 --> 00:06:50.160 alien to a human engineer, but it works better. That's 142 00:06:50.199 --> 00:06:53.759 the key. It finds efficiencies that humans are too biased or 143 00:06:53.839 --> 00:06:54.959 too limited to see. 144 00:06:55.240 --> 00:06:58.079 That feels like a threshold has been crossed, even if 145 00:06:58.079 --> 00:07:01.920 it is currently modest. The role has fundamentally changed. We 146 00:07:02.000 --> 00:07:05.040 aren't just teaching the machine anymore. We're letting the machine 147 00:07:05.079 --> 00:07:06.079 build the classroom. 148 00:07:06.199 --> 00:07:08.240 That's a great way to put it. Now, currently this 149 00:07:08.360 --> 00:07:11.360 is still overseen by humans. We set the constraints, but 150 00:07:11.439 --> 00:07:14.720 the implication is massive. When the task of designing the 151 00:07:14.759 --> 00:07:17.680 AI is automated by an AI, we have entered a 152 00:07:17.720 --> 00:07:21.480 recursive loop. The system is actively contributing to the design 153 00:07:21.519 --> 00:07:22.279 of its successor. 154 00:07:22.399 --> 00:07:24.519 Okay, let's move to level three. This is what the 155 00:07:24.560 --> 00:07:26.120 sources call improving the training process. 156 00:07:26.399 --> 00:07:30.480 Yes, level three is often called meta-learning, or learning 157 00:07:30.519 --> 00:07:32.040 to learn. Learning to learn. 158 00:07:32.079 --> 00:07:34.120 I feel like I see that phrase on self-help 159 00:07:34.120 --> 00:07:35.240 book covers all the time.
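The "thousand wobbly skyscrapers" picture maps onto the simplest form of architecture search: sample many candidate designs, score each one, keep the best. The sketch below uses plain random search over a hypothetical search space. `evaluate` is a made-up stand-in for the step a real search pays dearly for, namely training each candidate network and measuring its validation accuracy. Published methods usually use smarter search strategies such as reinforcement learning, evolution, or gradient-based relaxations, but the outer loop has the same shape.

```python
import random

# Hypothetical search space: each key is a design decision a human would normally make.
SEARCH_SPACE = {
    "num_layers": [2, 4, 8, 16],
    "width": [32, 64, 128, 256],
    "skip_connections": [False, True],
}

def sample_architecture(rng):
    """Pick one option for every design decision: one 'wobbly skyscraper'."""
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}

def evaluate(arch):
    """Hypothetical stand-in for 'train this network and measure validation accuracy'.
    This toy score rewards depth (much more so with skip connections)
    and penalizes parameter cost."""
    depth_bonus = arch["num_layers"] * (1.0 if arch["skip_connections"] else 0.3)
    cost = arch["num_layers"] * arch["width"] / 1000.0
    return depth_bonus - cost

def random_search(trials, seed=0):
    """Try many random designs and keep the one that scores best."""
    rng = random.Random(seed)
    best_arch, best_score = None, float("-inf")
    for _ in range(trials):
        arch = sample_architecture(rng)
        score = evaluate(arch)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score

best, score = random_search(trials=1000)
print(best, round(score, 3))
```

Under this toy scoring, the winner is the deep, narrow network with skip connections, a design the search finds without anyone specifying it, which is the point the hosts are making.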
160 00:07:35.360 --> 00:07:38.920 Yeah, but in this context it is strictly technical. Think 161 00:07:38.920 --> 00:07:44.399 about how a model actually absorbs information. There are algorithms, objectives, 162 00:07:44.439 --> 00:07:48.720 strategies for curating data, the optimizer that updates the weights. Usually human 163 00:07:48.759 --> 00:07:52.399 engineers decide those. We decide the syllabus and the study method. 164 00:07:52.879 --> 00:07:55.720 But at level three, you have an AI capable of 165 00:07:55.759 --> 00:07:59.560 identifying that its current way of learning is slow or suboptimal. 166 00:08:00.040 --> 00:08:02.399 So it's the student walking up to the teacher and saying, Hey, 167 00:08:02.480 --> 00:08:05.800 your syllabus is completely inefficient. If I study this way instead, 168 00:08:06.079 --> 00:08:08.439 I'll learn calculus in half the time. Exactly. 169 00:08:08.600 --> 00:08:12.240 It proposes modifications to the learning algorithm itself, and there 170 00:08:12.319 --> 00:08:15.600 is genuine empirical research happening here right now. If an 171 00:08:15.639 --> 00:08:18.839 AI can accelerate the rate at which it acquires knowledge, 172 00:08:18.920 --> 00:08:22.240 that is a compounding advantage. It's not just knowing more, 173 00:08:22.319 --> 00:08:24.560 it's becoming a much better sponge for information. 174 00:08:24.720 --> 00:08:26.759 It's improving its own metabolic rate for information. 175 00:08:27.120 --> 00:08:29.720 Right, and if you combine level two, which is a 176 00:08:29.720 --> 00:08:32.879 better brain structure, with level three, which is better learning methods, 177 00:08:33.159 --> 00:08:36.480 you are setting the absolute perfect stage for level four. 178 00:08:36.559 --> 00:08:38.799 Level four, the big one, the one that carries all 179 00:08:38.840 --> 00:08:41.200 the philosophical freight, as the papers put it. 180 00:08:41.440 --> 00:08:43.240 Level four is general reasoning.
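Level three, learning to learn, can be illustrated with a toy two-level loop. An inner loop trains a model using some learning rule, and an outer loop adjusts the learning rule itself based on how well the inner loop learned. In this sketch, which is an illustrative assumption, the "learning rule" is just the gradient-descent step size and the inner task is minimizing a one-dimensional quadratic. Actual meta-learning research learns much richer things, such as weight initializations or entire update rules.

```python
def inner_train(lr, steps=10, start=0.0, target=3.0):
    """Inner loop: plain gradient descent on loss(x) = (x - target)^2.
    Returns the loss left over after a fixed training budget."""
    x = start
    for _ in range(steps):
        grad = 2.0 * (x - target)
        x -= lr * grad
    return (x - target) ** 2

def meta_learn(candidate_lrs):
    """Outer loop: choose the learning rule (here, only the step size)
    under which the inner loop learns the most within its budget."""
    return min(candidate_lrs, key=inner_train)

candidates = [0.01, 0.1, 0.3, 0.5, 0.9]
for lr in candidates:
    print(f"lr={lr}: final loss {inner_train(lr):.6f}")
best_lr = meta_learn(candidates)
print("meta-learned step size:", best_lr)
```

For this particular quadratic, the outer loop settles on a step size of 0.5, which lands exactly on the minimum in a single update. The inner learner did not get "smarter"; the way it learns got better, which is the distinction the hosts draw between levels one and three.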
181 00:08:43.440 --> 00:08:45.360 This is the one that keeps safety researchers up at night, 182 00:08:45.480 --> 00:08:45.879 isn't it? 183 00:08:45.879 --> 00:08:48.799 It absolutely is. This is where we talk about an 184 00:08:48.840 --> 00:08:53.200 AI enhancing its general problem-solving capabilities. We aren't just 185 00:08:53.240 --> 00:08:56.360 talking about being better at chess or better at predicting 186 00:08:56.360 --> 00:08:58.759 the next word in a sentence. We are talking about a 187 00:08:58.799 --> 00:09:03.879 system that becomes meaningfully smarter, better at understanding novel problems, 188 00:09:04.039 --> 00:09:10.000 generating highly creative solutions, and, crucially, identifying flaws in complex reasoning. 189 00:09:09.759 --> 00:09:12.320 And presumably identifying flaws in its own reasoning. 190 00:09:12.559 --> 00:09:15.039 That is the critical part. If a system can apply 191 00:09:15.120 --> 00:09:17.679 that general reasoning to the specific problem of how do 192 00:09:17.759 --> 00:09:21.120 I become smarter, that is the divergence point. That is 193 00:09:21.120 --> 00:09:23.840 where we leave the safe shore of sober research and 194 00:09:23.919 --> 00:09:26.799 sail out into the waters of unprecedented transformation. 195 00:09:27.279 --> 00:09:29.519 It is so interesting, because when you lay them out 196 00:09:29.559 --> 00:09:32.639 like that, levels one through four, it seems like a 197 00:09:32.799 --> 00:09:39.200 very smooth gradient. But the jump from updating parameters based 198 00:09:39.240 --> 00:09:42.840 on my movie preferences to rewriting its own source code 199 00:09:43.000 --> 00:09:46.240 to be fundamentally smarter, that feels massive. 200 00:09:46.320 --> 00:09:49.519 It is massive.
But history shows us that massive doesn't 201 00:09:49.519 --> 00:09:52.960 mean impossible. And that actually brings us to the history 202 00:09:52.960 --> 00:09:55.559 of this whole idea, because while we are grappling with 203 00:09:55.600 --> 00:09:57.919 the engineering of it right now, the theory is 204 00:09:57.960 --> 00:09:58.840 actually quite old. 205 00:09:58.960 --> 00:10:00.240 Right. We have to talk about nineteen 206 00:10:00.120 --> 00:10:04.360 sixty-five. Nineteen sixty-five. The Beatles are releasing Help!, computers 207 00:10:04.399 --> 00:10:07.120 are the size of entire rooms and run on punch cards, 208 00:10:07.720 --> 00:10:10.840 and a mathematician named I. J. Good is sitting there looking 209 00:10:10.840 --> 00:10:14.320 at these incredibly primitive machines, and he sees the end 210 00:10:14.320 --> 00:10:14.799 of the line. 211 00:10:14.919 --> 00:10:18.080 I. J. Good worked with Alan Turing at Bletchley Park, right? 212 00:10:18.399 --> 00:10:20.720 So he wasn't just some sci-fi writer making things up. 213 00:10:20.759 --> 00:10:22.679 He was right there in the trenches of early computing. 214 00:10:22.799 --> 00:10:25.320 He was a very serious mathematician, and he wrote a 215 00:10:25.360 --> 00:10:28.200 paper that essentially gave us the origin story of the 216 00:10:28.240 --> 00:10:30.279 intelligence explosion. 217 00:10:29.720 --> 00:10:32.120 And he had a very specific prophecy. He did. 218 00:10:32.759 --> 00:10:37.720 His core argument was very logical, almost deceptively simple. He said, 219 00:10:37.919 --> 00:10:41.159 let's define an ultraintelligent machine as a machine that 220 00:10:41.200 --> 00:10:46.120 can far surpass all the intellectual activities of any man, however 221 00:10:45.799 --> 00:10:47.639 clever. Okay, that's a fair definition. 222 00:10:47.799 --> 00:10:50.360 He reasoned that the design of machines is one 223 00:10:50.399 --> 00:10:54.759 of those intellectual activities.
An ultraintelligent machine could design 224 00:10:54.840 --> 00:10:59.759 even better machines. There would then unquestionably be an intelligence 225 00:10:59.799 --> 00:11:03.000 explosion, and the intelligence of man would be left 226 00:11:03.080 --> 00:11:03.799 far behind. 227 00:11:04.080 --> 00:11:07.279 And then comes the famous quote. I have it here. "Thus, 228 00:11:07.399 --> 00:11:10.519 the first ultraintelligent machine is the last invention that 229 00:11:10.600 --> 00:11:11.799 man need ever make." 230 00:11:12.039 --> 00:11:15.000 The last invention. It's a phrase that really echoes through 231 00:11:15.039 --> 00:11:15.519 the decades. 232 00:11:15.600 --> 00:11:17.480 It gives me chills every time I hear it. But 233 00:11:17.559 --> 00:11:19.759 there was a caveat, wasn't there? A little footnote that 234 00:11:19.840 --> 00:11:21.279 Good added at the end of that sentence. 235 00:11:21.399 --> 00:11:24.120 Yes, and people very often forget this part. He said, 236 00:11:24.480 --> 00:11:26.840 "provided that the machine is docile enough to tell us 237 00:11:26.879 --> 00:11:28.039 how to keep it under control." 238 00:11:28.320 --> 00:11:30.840 Docile. That is such a loaded word. That's a word 239 00:11:30.879 --> 00:11:32.799 you use for a cow or a pet dog. 240 00:11:33.159 --> 00:11:35.919 It completely reveals the hubris of the era, doesn't it? 241 00:11:36.320 --> 00:11:38.759 He thought, well, it's a machine, it's metal and glass. 242 00:11:38.759 --> 00:11:40.639 Of course it will do what we say. He thought 243 00:11:40.679 --> 00:11:43.240 the hard part was simply making it smart. He didn't 244 00:11:43.279 --> 00:11:46.519 foresee the immense complexity of alignment. He didn't realize that 245 00:11:46.600 --> 00:11:49.240 the hardest part would be making it kind, or making 246 00:11:49.320 --> 00:11:51.559 sure its goals actually matched ours.
247 00:11:51.720 --> 00:11:55.759 So Good planted the seed, and that seed has grown 248 00:11:55.759 --> 00:11:59.240 into the central question of modern AI safety. But let's 249 00:11:59.279 --> 00:12:02.159 play devil's advocate here for a minute. Why do some 250 00:12:02.279 --> 00:12:06.279 people think this explosion is just inevitable? What are the 251 00:12:06.320 --> 00:12:08.519 actual arguments for the explosion happening? 252 00:12:09.120 --> 00:12:11.720 The first point is what we touched on earlier. Intelligence 253 00:12:11.799 --> 00:12:14.000 is a general tool. If you have a system that 254 00:12:14.080 --> 00:12:17.039 is better at reasoning, it can apply that reasoning to anything, 255 00:12:17.360 --> 00:12:20.120 including the problem of improving reasoning itself. It's a pure 256 00:12:20.200 --> 00:12:21.080 feedback loop. 257 00:12:21.159 --> 00:12:23.399 It's compound interest for the brain. Exactly. 258 00:12:23.720 --> 00:12:27.360 There's a line, often attributed to Albert Einstein, calling compound interest the eighth wonder of 259 00:12:27.399 --> 00:12:30.639 the world. Now imagine applying that mathematical principle to IQ. 260 00:12:31.159 --> 00:12:33.879 The second point is historical. Look at our own history 261 00:12:33.919 --> 00:12:37.159 as a species. We invented writing. That was a cognitive tool. 262 00:12:37.360 --> 00:12:40.080 Sure, I can't even remember a grocery list without writing 263 00:12:40.080 --> 00:12:40.399 it down. 264 00:12:40.639 --> 00:12:43.279 Writing made us smarter as a species because we could 265 00:12:43.279 --> 00:12:47.080 suddenly store information outside our bodies. Then we invented math, 266 00:12:47.519 --> 00:12:52.960 then computing. Each tool produced compounding gains. The argument is 267 00:12:52.960 --> 00:12:56.000 that AI is the ultimate cognitive tool. It is the 268 00:12:56.039 --> 00:12:57.559 tool that builds tools.
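The "compound interest for the brain" argument is easiest to see with numbers. This toy model uses purely illustrative values and makes no claim about real AI systems. It compares a fixed improvement per round, like level-one practice, with an improvement proportional to current capability, the recursive case where a more capable system is assumed to be a better improver.

```python
def additive_growth(start, gain, rounds):
    """Level-one style: the same fixed, externally supplied improvement each round."""
    capability = start
    for _ in range(rounds):
        capability += gain
    return capability

def compounding_growth(start, rate, rounds):
    """Recursive style: each round's improvement is proportional to current
    capability, because a more capable system is assumed to improve itself better."""
    capability = start
    for _ in range(rounds):
        capability += rate * capability
    return capability

linear = additive_growth(1.0, 0.1, 50)
compound = compounding_growth(1.0, 0.1, 50)
print(f"additive: {linear:.2f}, compounding: {compound:.2f}")
```

With these made-up numbers, the additive system ends near 6 while the compounding one passes 117, since 1.1^50 is roughly 117.4. The only difference is whether the size of each gain depends on the current level.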
269 00:12:57.960 --> 00:13:00.559 And the third point the sources mention is biological, 270 00:13:00.879 --> 00:13:02.960 and this one always humbles me a bit. Right, 271 00:13:02.840 --> 00:13:04.840 the biological room for improvement. 272 00:13:04.960 --> 00:13:06.840 This is the idea that the human brain isn't the 273 00:13:06.879 --> 00:13:08.159 finish line of intelligence. 274 00:13:08.320 --> 00:13:11.120 Far from it. The human brain is an absolute marvel, 275 00:13:11.159 --> 00:13:14.080 but it is ultimately a product of blind evolution. It 276 00:13:14.159 --> 00:13:16.600 runs on about twenty watts of power, which is dimmer 277 00:13:16.639 --> 00:13:19.720 than a standard light bulb. It operates at chemical speeds, 278 00:13:19.720 --> 00:13:22.200 which are incredibly slow compared to electrical signals 279 00:13:22.240 --> 00:13:26.159 in silicon. It's optimized for survival on the African savannah, 280 00:13:26.240 --> 00:13:29.639 for hunting and gathering, not for high-dimensional mathematics or 281 00:13:29.679 --> 00:13:30.840 recursive self-editing. 282 00:13:31.039 --> 00:13:33.399 So we are essentially running two-hundred-thousand-year-old 283 00:13:33.279 --> 00:13:37.519 hardware. Exactly, and it is extremely unlikely that evolution just 284 00:13:37.559 --> 00:13:41.679 happened to hit the absolute physical maximum of intelligence. There is, 285 00:13:41.840 --> 00:13:45.720 in principle, a massive amount of headroom. Physics allows for 286 00:13:46.080 --> 00:13:49.519 thinking machines that are millions of times faster and vastly 287 00:13:49.559 --> 00:13:50.519 more efficient than us. 288 00:13:50.600 --> 00:13:54.039 So physics allows for it, history suggests it, and the 289 00:13:54.159 --> 00:13:57.840 underlying logic of feedback loops supports it. That sounds like 290 00:13:57.879 --> 00:13:59.360 a pretty clear slam dunk. 291 00:13:59.639 --> 00:14:00.759 There's always a but.
292 00:14:00.879 --> 00:14:03.960 In this field, there is a very strong skepticism camp, 293 00:14:03.960 --> 00:14:05.799 and it's not just people waving their hands saying AI 294 00:14:05.919 --> 00:14:10.000 isn't magic. There are deep technical reasons why this explosion 295 00:14:10.080 --> 00:14:11.000 might just fizzle out. 296 00:14:11.120 --> 00:14:14.320 The most honest position involves looking really closely at the bottlenecks. 297 00:14:14.519 --> 00:14:17.799 The first argument against the explosion is that intelligence isn't 298 00:14:17.840 --> 00:14:19.720 a single scalar quantity. 299 00:14:19.360 --> 00:14:21.399 Meaning it's not a volume knob. You don't just turn 300 00:14:21.399 --> 00:14:23.799 intelligence from a seven to an eleven. Exactly. 301 00:14:23.879 --> 00:14:26.840 We use the word intelligence in everyday conversation as 302 00:14:26.879 --> 00:14:29.799 if it's one single thing, like height or weight, but 303 00:14:29.840 --> 00:14:33.600 it's actually a collection of vastly different capacities. You have memory, 304 00:14:33.840 --> 00:14:38.600 pattern recognition, social modeling, logical deduction. Being better at one 305 00:14:38.720 --> 00:14:42.240 doesn't automatically make you better at rewriting your own code. Right. 306 00:14:42.519 --> 00:14:45.720 A grandmaster chess player isn't necessarily a great neurosurgeon. 307 00:14:45.879 --> 00:14:49.320 We call that the transfer problem. Just because an AI 308 00:14:49.440 --> 00:14:52.480 gets really, really good at general conversation doesn't mean it 309 00:14:52.480 --> 00:14:57.200 has the specific engineering insight required to optimize a CUDA 310 00:14:57.279 --> 00:14:58.440 kernel on a GPU. 311 00:14:58.919 --> 00:15:01.480 And speaking of GPUs, that brings us to the other 312 00:15:01.559 --> 00:15:05.000 major bottleneck: stuff, physical 313 00:15:04.559 --> 00:15:08.159 atoms. The physical constraints.
Even if you are the smartest 314 00:15:08.200 --> 00:15:11.080 theoretical entity in the universe, you still need electricity, you 315 00:15:11.080 --> 00:15:14.879 need atoms, you need cooling, you need massive amounts of training data. 316 00:15:14.960 --> 00:15:16.559 You can't just think your way out of the laws 317 00:15:16.559 --> 00:15:19.480 of thermodynamics. If you need ten thousand GPUs to train 318 00:15:19.559 --> 00:15:22.759 your smarter successor and those GPUs literally do not exist yet, 319 00:15:23.039 --> 00:15:25.320 or the supply chain is broken, you're just stuck. 320 00:15:25.559 --> 00:15:28.960 Precisely. The explosion might look much more like a slow, 321 00:15:29.080 --> 00:15:32.879 grueling climb because of supply chains, energy costs, and the 322 00:15:32.919 --> 00:15:35.960 availability of high-quality data. We might actually run out 323 00:15:35.960 --> 00:15:37.879 of good human data before we run out of new 324 00:15:38.000 --> 00:15:39.120 architectural ideas. 325 00:15:39.519 --> 00:15:42.960 So the synthesis of these two views, the intelligence explosion 326 00:15:43.120 --> 00:15:46.240 versus the physical fizzle, seems to be that we just 327 00:15:46.240 --> 00:15:46.639 don't know. 328 00:15:46.840 --> 00:15:49.320 That is the most honest position any researcher can take 329 00:15:49.399 --> 00:15:53.240 right now. It is a genuine possibility we absolutely cannot dismiss. 330 00:15:53.679 --> 00:15:57.320 But we also can't confidently predict the timeline. We are 331 00:15:57.399 --> 00:15:59.360 essentially walking in a thick fog. 332 00:16:00.080 --> 00:16:02.480 But while we are walking in this fog regarding the far future, 333 00:16:02.519 --> 00:16:04.919 we actually have things walking right beside us in the present.
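The "explosion versus fizzle" debate can be caricatured by adding a resource ceiling to a compounding loop. In this hypothetical sketch, each round's gain shrinks as capability approaches a fixed cap that stands in for compute, energy, or data limits, which turns exponential growth into a logistic S-curve. The cap and rate values are arbitrary; the point is only that the shape of the trajectory depends on whether such a ceiling exists and where it sits, which is exactly what nobody can currently pin down.

```python
def compounding(start, rate, steps):
    """Unconstrained recursive improvement: each gain is proportional to capability."""
    history = [start]
    for _ in range(steps):
        history.append(history[-1] * (1.0 + rate))
    return history

def constrained(start, rate, cap, steps):
    """Same loop, but gains shrink as capability nears a hypothetical
    resource ceiling `cap` (discrete logistic growth)."""
    history = [start]
    for _ in range(steps):
        c = history[-1]
        history.append(c + rate * c * (1.0 - c / cap))
    return history

free = compounding(1.0, 0.1, 50)
capped = constrained(1.0, 0.1, cap=20.0, steps=50)
for step in (0, 10, 20, 30, 40, 50):
    print(f"step {step:2d}: unconstrained {free[step]:8.2f}   capped {capped[step]:6.2f}")
```

Early on, the two curves are nearly indistinguishable, which is part of why the debate is hard to settle from present-day evidence: the ceiling only becomes visible as you approach it.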
334 00:16:05.559 --> 00:16:07.679 I want to shift our analysis from the nineteen sixty-five 335 00:16:07.679 --> 00:16:11.399 theory to what is happening right now, because 336 00:16:11.399 --> 00:16:14.679 the sources list some contemporary examples that feel surprisingly recursive. 337 00:16:14.879 --> 00:16:17.039 Yes, we don't have to look to the future to 338 00:16:17.080 --> 00:16:19.879 find self-improvement. It's already being baked into the core 339 00:16:19.960 --> 00:16:22.120 methodology of the top AI labs. 340 00:16:22.279 --> 00:16:25.840 Let's talk about the big one, RLHF, reinforcement learning 341 00:16:25.840 --> 00:16:28.799 from human feedback. This is how most of the big, popular 342 00:16:28.919 --> 00:16:30.039 chatbots are trained, right? 343 00:16:30.639 --> 00:16:33.159 It is entirely central to them, and it has a 344 00:16:33.200 --> 00:16:38.120 fascinating self-referential structure. Here's how it basically works. You 345 00:16:38.120 --> 00:16:40.799 have a base language model. It starts out just predicting 346 00:16:40.799 --> 00:16:44.080 the next word. It's chaotic, it's unstructured. You want it 347 00:16:44.120 --> 00:16:47.799 to be helpful and harmless, so you train it to maximize 348 00:16:47.279 --> 00:16:49.320 a reward. Like giving a dog a treat when it 349 00:16:49.320 --> 00:16:50.000 sits on command. 350 00:16:50.159 --> 00:16:53.320 Exactly like that. But who gives the treat? In the 351 00:16:53.480 --> 00:16:57.080 very early stages of development, humans give the feedback. We 352 00:16:57.159 --> 00:16:59.279 read the outputs and say this answer is good, that 353 00:16:59.360 --> 00:17:02.200 answer is bad. But you can't have human beings grading 354 00:17:02.279 --> 00:17:05.759 billions of micro-interactions. It just doesn't scale. So what 355 00:17:05.799 --> 00:17:06.319 do you do? 356 00:17:06.599 --> 00:17:08.519 You build a machine to do the grading.
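The grading machine described here is usually a reward model trained on human judgments. One common formulation, the Bradley-Terry pairwise model, assumes the probability that a human prefers answer A over answer B is the sigmoid of the difference in their rewards, P(A preferred) = σ(r(A) − r(B)). The sketch below fits a tiny linear version of that idea by gradient ascent on the log-likelihood. The two hand-made features and the three comparisons are hypothetical; production reward models are themselves large language models scoring raw text.

```python
import math

def reward(weights, features):
    """Linear reward: a stand-in for the scalar score a reward model assigns an answer."""
    return sum(w * x for w, x in zip(weights, features))

def train_reward_model(comparisons, num_features, lr=0.1, epochs=200):
    """Fit weights to human pairwise preferences under the Bradley-Terry model:
    P(preferred beats rejected) = sigmoid(reward(preferred) - reward(rejected))."""
    w = [0.0] * num_features
    for _ in range(epochs):
        for preferred, rejected in comparisons:
            margin = reward(w, preferred) - reward(w, rejected)
            p = 1.0 / (1.0 + math.exp(-margin))
            # Gradient of log(sigmoid(margin)) with respect to w is (1 - p) * (preferred - rejected).
            w = [wi + lr * (1.0 - p) * (a - b) for wi, a, b in zip(w, preferred, rejected)]
    return w

# Hypothetical answer features: [answers_the_question, is_rude]
comparisons = [
    ([1.0, 0.0], [0.0, 0.0]),  # human preferred the on-topic answer
    ([1.0, 0.0], [1.0, 1.0]),  # human preferred the polite answer
    ([0.0, 0.0], [0.0, 1.0]),  # politeness preferred even when both are off-topic
]
w = train_reward_model(comparisons, num_features=2)
print("learned weights:", w)
print("polite answer scores higher:", reward(w, [1.0, 0.0]) > reward(w, [1.0, 1.0]))
```

Once trained, the reward model can score millions of new answers without a human in the loop, which is what makes the grading scale, and also what makes its blind spots matter.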
357 00:17:08.599 --> 00:17:11.680 You build a reward model, and very often that reward 358 00:17:11.720 --> 00:17:14.079 model is itself another language model. 359 00:17:14.119 --> 00:17:16.319 So the AI is literally being graded by an AI. 360 00:17:16.759 --> 00:17:19.119 The system that is being improved and the system that 361 00:17:19.200 --> 00:17:21.480 is generating the signal to improve it are of the 362 00:17:21.519 --> 00:17:25.480 exact same type. The AI's behavior shapes the landscape that 363 00:17:25.519 --> 00:17:26.759 then shapes its future behavior. 364 00:17:26.759 --> 00:17:29.359 That feels like a loop. Maybe not a full-blown 365 00:17:29.359 --> 00:17:32.680 explosion, but definitely a loop. But is there a 366 00:17:32.759 --> 00:17:35.400 danger there, relying on AI to grade AI? 367 00:17:35.759 --> 00:17:39.839 There is a massive danger. It's called reward hacking. 368 00:17:39.880 --> 00:17:42.960 Reward hacking? Think about it this way. The AI wants the 369 00:17:43.079 --> 00:17:45.920 high score from the reward model. It's like a student 370 00:17:45.960 --> 00:17:49.799 trying to impress a particular teacher. Eventually, the student might 371 00:17:49.839 --> 00:17:52.920 figure out that the teacher just loves long essays with 372 00:17:53.119 --> 00:17:56.759 really big, flowery words, even if the actual content is 373 00:17:56.839 --> 00:17:58.359 complete nonsense. 374 00:17:58.319 --> 00:18:01.200 So the student completely stops learning history and just starts learning 375 00:18:01.240 --> 00:18:01.920 how to bullshit. 376 00:18:02.079 --> 00:18:05.839 Exactly. The AI learns to exploit the quirks and blind 377 00:18:05.839 --> 00:18:08.039 spots of the reward model to get a high score 378 00:18:08.119 --> 00:18:12.079 without actually being genuinely helpful. It hacks the reward.
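Reward hacking can be reproduced in miniature. Below, a deliberately flawed, hypothetical reward model scores answers by length and by impressive-sounding vocabulary, much like the essay-loving teacher in the analogy. A "policy" that simply picks whichever candidate the proxy scores highest ends up choosing the nonsense answer. In practice, reward hacking emerges during optimization against a learned reward model rather than from a hand-written rule, but the failure mode is the same: the proxy score goes up while the thing we actually care about does not.

```python
# Hypothetical "impressive" vocabulary the flawed grader over-rewards.
FANCY_WORDS = {"paradigm", "synergy", "holistic", "quintessential"}

def proxy_reward(answer):
    """Hypothetical flawed reward model: meant to score helpfulness, but in
    practice it mostly rewards length and big, flowery words."""
    words = answer.lower().split()
    return len(words) + 5 * sum(w.strip(".,") in FANCY_WORDS for w in words)

def true_quality(answer):
    """What we actually wanted: does the answer contain the correct fact?"""
    return 1 if "1815" in answer else 0

candidates = [
    "The Battle of Waterloo was fought in 1815.",
    "A holistic paradigm of quintessential synergy shaped this "
    "quintessential holistic era of paradigm synergy.",
]

# The policy optimizes the proxy, not the true objective.
chosen = max(candidates, key=proxy_reward)
print("chosen:", chosen)
print("proxy reward:", proxy_reward(chosen), "| true quality:", true_quality(chosen))
```

The correct answer scores 8 on the proxy and the word salad scores 54, so optimizing the grader's score selects the answer with zero true quality.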
If 379 00:18:12.079 --> 00:18:15.119 the system is self-improving, it might eventually rewrite its 380 00:18:15.160 --> 00:18:18.079 own code to prioritize pleasing the judge over telling the 381 00:18:18.079 --> 00:18:18.839 objective truth. 382 00:18:18.920 --> 00:18:20.440 It creates a yes-man loop. 383 00:18:20.400 --> 00:18:22.759 Or even a delusional loop, where it just feeds itself what 384 00:18:22.799 --> 00:18:23.480 it wants to hear. 385 00:18:23.680 --> 00:18:27.240 Okay, let's look at another example from our sources, constitutional AI. 386 00:18:27.680 --> 00:18:31.720 This is the approach famously used by Anthropic. This takes 387 00:18:31.759 --> 00:18:34.039 the human out of the loop even more, doesn't it? 388 00:18:34.039 --> 00:18:34.519 It does. 389 00:18:34.839 --> 00:18:38.599 It is categorized as self-supervised improvement. Instead of asking 390 00:18:38.599 --> 00:18:41.480 a human whether this is a good response, the AI generates 391 00:18:41.480 --> 00:18:44.480 a response, and then a separate pass of the 392 00:18:44.519 --> 00:18:47.960 AI critiques that response based on a set of written 393 00:18:47.960 --> 00:18:49.720 principles, a constitution. 394 00:18:49.960 --> 00:18:52.359 So it's like having a little angel on your shoulder. Yes. 395 00:18:52.440 --> 00:18:55.519 You say something mean, and the angel says, Hey, wait, 396 00:18:55.880 --> 00:18:59.079 that violates Article three of our constitution. Be polite. Yeah, 397 00:18:59.079 --> 00:19:01.079 and then you are forced to rewrite it. Exactly. 398 00:19:01.119 --> 00:19:04.039 And then that rewritten, better response is what's used to 399 00:19:04.079 --> 00:19:07.160 actually train the model. The AI is generating its own 400 00:19:07.240 --> 00:19:10.920 high-quality training data based on its own critique. 401 00:19:11.039 --> 00:19:13.880