WEBVTT 1 00:00:03.399 --> 00:00:07.719 Welcome to Bedtime Astronomy. Explore the wonders of the cosmos 2 00:00:07.759 --> 00:00:12.279 with our soothing Bedtime Astronomy podcast. Each episode offers a 3 00:00:12.359 --> 00:00:16.320 gentle journey through the stars, planets, and beyond, perfect for 4 00:00:16.399 --> 00:00:20.239 unwinding after a long day. Let's travel through the mysteries 5 00:00:20.239 --> 00:00:22.440 of the universe as you drift off into a peaceful 6 00:00:22.480 --> 00:00:26.760 slumber under the night sky. 7 00:00:27.039 --> 00:00:30.879 Welcome. We're diving into something pretty big today, a kind 8 00:00:30.879 --> 00:00:34.920 of crisis, almost, in modern astronomy, though maybe "crisis of 9 00:00:34.920 --> 00:00:36.320 success" is a better way to put it. 10 00:00:36.439 --> 00:00:38.600 Yeah, that's a good way to frame it. Our telescopes 11 00:00:38.640 --> 00:00:41.560 are just getting incredibly 12 00:00:40.880 --> 00:00:44.840 good. So good that the night sky isn't this peaceful 13 00:00:44.960 --> 00:00:50.159 backdrop anymore. It's more like a nonstop digital alarm, alerts 14 00:00:50.240 --> 00:00:51.960 firing constantly. 15 00:00:51.479 --> 00:00:54.719 Millions of them every single night, telling astronomers, "Hey, look here, 16 00:00:55.000 --> 00:00:55.880 something changed." 17 00:00:56.039 --> 00:00:59.679 And somewhere in that constant, overwhelming flood of data, that's 18 00:00:59.719 --> 00:01:03.920 where the really amazing stuff is hiding. Exploding stars, black holes 19 00:01:04.000 --> 00:01:07.079 ripping things apart, maybe something totally new we haven't even 20 00:01:06.959 --> 00:01:11.760 imagined. Exactly. These incredibly rare, bright signals, but they're buried, 21 00:01:12.359 --> 00:01:14.159 just lost in an ocean of noise.
22 00:01:14.799 --> 00:01:17.680 So today we're looking at a really revolutionary approach, not 23 00:01:17.760 --> 00:01:19.879 just to filter out the noise, but to actually, like, 24 00:01:20.079 --> 00:01:22.159 partner with the system generating it. That's right. 25 00:01:22.200 --> 00:01:25.120 We're looking at work from a collaboration: University of Oxford, 26 00:01:25.159 --> 00:01:29.599 Google Cloud, Radboud University. And they found, well, essentially 27 00:01:29.680 --> 00:01:30.879 a shortcut. 28 00:01:30.480 --> 00:01:33.760 A shortcut to dealing with this data tsunami using AI, 29 00:01:34.400 --> 00:01:38.400 and the core finding is, honestly, it's pretty surprising, even 30 00:01:38.400 --> 00:01:41.159 for AI development, which moves so fast. 31 00:01:40.959 --> 00:01:44.799 It really is. They took a general-purpose large language model, 32 00:01:44.920 --> 00:01:48.159 Gemini, one not specifically built for astronomy 33 00:01:47.680 --> 00:01:49.519 at all. Right, a generalist. 34 00:01:49.120 --> 00:01:52.719 And with minimal training, like incredibly minimal, turned it into 35 00:01:52.799 --> 00:01:54.680 an expert astronomical classifier. 36 00:01:54.799 --> 00:01:58.079 And the accuracy was, what, around ninety-three percent, which 37 00:01:58.120 --> 00:01:58.719 is good. 38 00:01:58.640 --> 00:01:59.560 Obviously very good. 39 00:01:59.640 --> 00:02:01.799 Yeah, but that's not even the main story. I'd say 40 00:02:02.079 --> 00:02:04.359 the real game changer is transparency. 41 00:02:04.480 --> 00:02:07.920 Ah, okay, so not just what it decided, but why. Precisely. 42 00:02:08.000 --> 00:02:11.280 Traditional AI, especially in science, often works like a black box. 43 00:02:11.639 --> 00:02:14.120 You get an answer, maybe a confidence score, but no 44 00:02:14.240 --> 00:02:15.120 clue how it got 45 00:02:15.000 --> 00:02:17.639 there. Which is a huge problem for scientists.
Right, you 46 00:02:17.680 --> 00:02:21.159 can't just blindly trust an output for, say, a once- 47 00:02:21.199 --> 00:02:22.919 in-a-lifetime event. Exactly. 48 00:02:23.280 --> 00:02:27.240 But this LLM, it provided a clear, plain-English explanation 49 00:02:27.319 --> 00:02:30.560 for every single decision. It basically said, here's my conclusion, 50 00:02:30.599 --> 00:02:32.439 and here's why I think that, based 51 00:02:32.240 --> 00:02:36.159 on the images. That fundamentally tackles that black box problem. 52 00:02:36.560 --> 00:02:38.840 It moves us from just using a tool to actually 53 00:02:38.879 --> 00:02:41.800 collaborating with something that explains its reasoning. 54 00:02:41.879 --> 00:02:45.039 Yeah, it's a shift from a specialized, opaque program to 55 00:02:45.759 --> 00:02:49.439 a generalist intelligence that we can actually talk to and understand. 56 00:02:49.479 --> 00:02:52.879 Okay, let's really unpack the scale of this data problem first, 57 00:02:53.280 --> 00:02:56.520 because you need to grasp just how massive it is 58 00:02:56.719 --> 00:02:59.680 to see why this AI approach wasn't just nice to have, 59 00:03:00.120 --> 00:03:04.000 it was becoming essential. So paint the picture for us. 60 00:03:04.319 --> 00:03:07.000 What's the day-to-day, or night-to-night, reality 61 00:03:07.280 --> 00:03:09.879 for an astronomer dealing with these transient surveys? 62 00:03:10.120 --> 00:03:13.080 We have these incredible telescope networks now, things like the 63 00:03:13.199 --> 00:03:18.000 Zwicky Transient Facility, ATLAS, and MeerLICHT. They're designed specifically 64 00:03:18.000 --> 00:03:20.439 for this. They stare at huge patches of the sky 65 00:03:20.719 --> 00:03:21.159 over and 66 00:03:21.120 --> 00:03:23.000 over, looking for anything that changes. 67 00:03:22.759 --> 00:03:24.599 Right, flares up, dims, moves. 68 00:03:24.439 --> 00:03:26.759 Exactly, anything transient.
And every time they take a new 69 00:03:26.800 --> 00:03:28.520 image and compare it to an older one of the 70 00:03:28.560 --> 00:03:31.800 same spot, if there's a difference, bang, an alert gets generated. 71 00:03:31.840 --> 00:03:34.319 And we said millions of these? Yeah, easily. We're 72 00:03:34.360 --> 00:03:38.199 talking hundreds of thousands to sometimes over a million alerts 73 00:03:38.479 --> 00:03:41.159 every single night. 74 00:03:41.400 --> 00:03:44.280 That's mind-boggling. So if you're the astronomer on duty, 75 00:03:45.159 --> 00:03:46.039 what do you even do? 76 00:03:46.080 --> 00:03:50.759 You panic. No, you face this immediate, huge problem. Even 77 00:03:50.800 --> 00:03:52.960 if you could somehow look at one alert every five 78 00:03:53.000 --> 00:03:55.520 seconds, 24/7, you wouldn't even make a dent. 79 00:03:55.759 --> 00:03:59.400 You couldn't possibly verify them all manually. 80 00:03:58.960 --> 00:04:02.680 Not even close. So you're forced into this instant triage. 81 00:04:02.919 --> 00:04:05.919 You have to rely on automated systems just to filter 82 00:04:05.960 --> 00:04:08.360 the incoming stream down to something manageable. 83 00:04:08.439 --> 00:04:10.039 And what are they hoping to find in all that? 84 00:04:10.080 --> 00:04:12.680 What are those really valuable signals hidden in the noise? 85 00:04:12.960 --> 00:04:13.159 Ah, 86 00:04:13.240 --> 00:04:17.199 the cosmic treasures. We're talking about things like supernovae, exploding stars, 87 00:04:17.439 --> 00:04:20.720 especially Type Ia supernovae. They're like standard candles, crucial for 88 00:04:20.759 --> 00:04:22.199 measuring the expansion of the universe. 89 00:04:22.480 --> 00:04:25.800 Okay, so fundamental cosmology relies on finding these. Absolutely. 90 00:04:25.879 --> 00:04:28.920 Then there are tidal disruption events, TDEs.
That's when a 91 00:04:28.920 --> 00:04:31.439 star gets too close to a supermassive black hole and 92 00:04:31.519 --> 00:04:35.240 gets, well, shredded, spaghettified. It causes a huge flare 93 00:04:35.000 --> 00:04:37.759 of light. Sounds spectacular, and probably quite rare. 94 00:04:37.759 --> 00:04:40.759 Very rare, and very important for understanding black hole physics. 95 00:04:41.199 --> 00:04:44.800 We're also looking for fast-moving objects like asteroids, especially 96 00:04:44.839 --> 00:04:46.560 near-Earth asteroids, for obvious 97 00:04:46.199 --> 00:04:47.839 reasons. Right, planetary defense. 98 00:04:47.959 --> 00:04:53.439 And then brief, energetic stuff: stellar flares, maybe the afterglows 99 00:04:53.439 --> 00:04:56.920 of gamma-ray bursts, things that need immediate follow-up, 100 00:04:57.040 --> 00:04:59.959 sometimes within minutes, before they fade completely. 101 00:05:00.720 --> 00:05:03.959 So high-stakes, time-critical science. That's the gold. What 102 00:05:04.079 --> 00:05:07.519 about the junk? What makes up most of those million alerts? 103 00:05:07.560 --> 00:05:12.079 Oh, the noise. It's vast and incredibly varied. A huge 104 00:05:12.160 --> 00:05:15.319 chunk is just stuff that's not astrophysics at all. Satellite 105 00:05:15.360 --> 00:05:18.199 trails are a massive problem now, especially with all the 106 00:05:18.240 --> 00:05:19.480 new constellations going up. 107 00:05:19.560 --> 00:05:21.000 They just streak across the image. 108 00:05:21.079 --> 00:05:23.839 Yeah, during the exposure. It looks like a transient source appeared 109 00:05:23.839 --> 00:05:27.680 and moved.
Very annoying. Then you get instrumental artifacts: weird 110 00:05:27.720 --> 00:05:31.040 reflections inside the telescope, electronic glitches, dead pixels on the 111 00:05:31.040 --> 00:05:37.639 camera or just behaving imperfectly. And cosmic rays: high-energy 112 00:05:37.680 --> 00:05:41.360 particles zipping through space hit the detector chip and create 113 00:05:41.399 --> 00:05:44.720 a little flash that looks exactly like a faint star popping 114 00:05:44.759 --> 00:05:45.959 into existence for a second. 115 00:05:46.000 --> 00:05:48.399 So without a really good filter, you're mostly looking at 116 00:05:48.439 --> 00:05:50.800 satellite photobombs and camera glitches. 117 00:05:50.879 --> 00:05:52.959 Pretty much. It's like trying to find a diamond ring 118 00:05:53.000 --> 00:05:56.399 in a city landfill at night with a flickering flashlight. 119 00:05:57.079 --> 00:06:00.399 The sheer volume of bogus signals is overwhelming. 120 00:06:00.360 --> 00:06:04.040 And this already difficult situation is about to get exponentially worse. 121 00:06:04.160 --> 00:06:06.920 You mentioned the Vera C. Rubin Observatory. 122 00:06:06.360 --> 00:06:09.600 Ah, Rubin. Yeah, that's the big one coming online soon. 123 00:06:09.639 --> 00:06:12.839 It's going to survey the entire southern sky every few nights, 124 00:06:12.920 --> 00:06:17.079 deeper than ever before. The data volume is just staggering. 125 00:06:17.079 --> 00:06:17.959 How much are we talking? 126 00:06:18.079 --> 00:06:21.480 The estimate is around twenty terabytes of data every single night. 127 00:06:21.800 --> 00:06:23.839 Terabytes. Okay, that's not just a fire hose. That's like 128 00:06:23.879 --> 00:06:25.920 trying to drink from Niagara Falls. Exactly. 129 00:06:26.279 --> 00:06:30.079 Forget manual verification. It's impossible. It fundamentally changes the job.
130 00:06:30.800 --> 00:06:37.720 Without incredibly sophisticated, trustworthy automation, astronomers become data janitors, not discoverers. 131 00:06:37.240 --> 00:06:39.480 Which is where the traditional machine learning models came in, 132 00:06:39.600 --> 00:06:41.600 right, to try and handle this. But they had that 133 00:06:41.720 --> 00:06:43.160 black box problem we mentioned. 134 00:06:43.199 --> 00:06:45.519 They did, and they are good at filtering, don't get 135 00:06:45.560 --> 00:06:49.959 me wrong. Specialized models, usually convolutional neural networks, can be 136 00:06:50.040 --> 00:06:53.680 trained to recognize patterns. This looks like a supernova, this 137 00:06:53.720 --> 00:06:55.199 looks like a satellite trail. 138 00:06:55.199 --> 00:06:57.519 But the why is missing. Completely. 139 00:06:57.800 --> 00:07:01.279 The model learns all these internal parameters and biases to 140 00:07:01.360 --> 00:07:04.879 make the decision, but how it uses them, it's opaque. 141 00:07:04.920 --> 00:07:09.800 It spits out "real transient, ninety-eight percent confidence," and as 142 00:07:09.720 --> 00:07:12.839 a scientist you just have to take its word for it. 143 00:07:13.040 --> 00:07:16.680 Pretty much, or spend precious telescope time verifying things that 144 00:07:16.759 --> 00:07:20.399 might be bogus or, worse, ignore something real because the 145 00:07:20.439 --> 00:07:23.879 model made a mistake you can't diagnose. You can't build robust 146 00:07:23.959 --> 00:07:25.879 science on blind trust. 147 00:07:25.800 --> 00:07:29.800 Especially when hunting for unique, maybe paradigm-shifting events. You 148 00:07:29.879 --> 00:07:32.199 need to know why the system thinks something is interesting. 149 00:07:32.560 --> 00:07:36.079 That's the core dilemma. The volume demands automation, but the 150 00:07:36.120 --> 00:07:39.439 science demands transparency. You're stuck between a rock and a 151 00:07:39.439 --> 00:07:39.959 hard place.
152 00:07:40.199 --> 00:07:44.000 Okay, so this Oxford, Google, Radboud team set out 153 00:07:44.040 --> 00:07:47.920 to break that deadlock. Their goal wasn't just accuracy. It 154 00:07:48.000 --> 00:07:50.720 was accuracy plus explanation. Exactly. 155 00:07:50.839 --> 00:07:54.480 The big question was, could a general-purpose AI, one 156 00:07:54.519 --> 00:07:58.040 designed to understand both text and images, not only match 157 00:07:58.120 --> 00:08:01.399 the specialists in classification, but also explain itself in a 158 00:08:01.399 --> 00:08:03.120 way scientists could trust and use? 159 00:08:03.519 --> 00:08:06.360 And the key was this few-shot learning approach, the 160 00:08:06.439 --> 00:08:09.560 minimal input part. You said just fifteen examples. Just 161 00:08:09.560 --> 00:08:13.680 fifteen for each of the three different surveys they tested: ATLAS, 162 00:08:13.879 --> 00:08:18.199 MeerLICHT, and Pan-STARRS. Fifteen examples of real transients, 163 00:08:18.439 --> 00:08:20.240 fifteen of bogus artifacts. 164 00:08:20.360 --> 00:08:22.160 Okay, I have to stop you there, because that sounds 165 00:08:22.199 --> 00:08:25.839 almost unbelievable. Fifteen. We usually hear about training AI on 166 00:08:25.959 --> 00:08:29.439 millions of images, needing massive data sets and weeks of computation. 167 00:08:30.240 --> 00:08:33.080 How can fifteen examples possibly be enough for such a 168 00:08:33.080 --> 00:08:37.679 complex visual task, especially across different telescopes with different characteristics? 169 00:08:37.759 --> 00:08:40.120 That's the crucial point, and it really highlights the power 170 00:08:40.159 --> 00:08:43.519 of these large, pre-trained foundation models like Gemini. Doctor 171 00:08:43.559 --> 00:08:46.200 Fiorenzo Stoppa, one of the researchers, pointed this out. It 172 00:08:46.279 --> 00:08:48.000 wasn't just the fifteen image 173 00:08:47.639 --> 00:08:50.159 examples. Okay, there was more to it.
174 00:08:50.159 --> 00:08:53.320 It was the combination of those few examples plus clear, 175 00:08:53.440 --> 00:08:57.679 simple text instructions. Think about it. A standard neural network 176 00:08:57.679 --> 00:09:02.679 starts from scratch. You have to teach it everything about shapes, light, noise, context. 177 00:09:02.799 --> 00:09:03.799 Right, it's a blank slate. 178 00:09:04.080 --> 00:09:06.519 But a large language model like Gemini has already been 179 00:09:06.559 --> 00:09:10.559 trained on vast amounts of text and images from the Internet. 180 00:09:11.519 --> 00:09:14.879 It already has a general understanding of the world, of patterns, 181 00:09:14.879 --> 00:09:19.080 of relationships, even of basic physics concepts, implicitly. 182 00:09:18.879 --> 00:09:21.759 So it's not starting from zero. It already has a foundation. 183 00:09:22.200 --> 00:09:24.840 Exactly. You're not teaching it "what is a dot of light?" 184 00:09:25.200 --> 00:09:30.039 You're basically saying, hey, you incredibly smart, generally knowledgeable AI, 185 00:09:30.559 --> 00:09:34.240 in this specific context of astronomical images, this kind of 186 00:09:34.240 --> 00:09:36.480 pattern is what we call real, and this kind of 187 00:09:36.519 --> 00:09:39.519 streak or blob is bogus. Here are fifteen examples of 188 00:09:39.519 --> 00:09:40.440 each to get you started. 189 00:09:40.600 --> 00:09:42.960 So you're leveraging its existing knowledge and just giving it 190 00:09:43.039 --> 00:09:44.639 specific rules for this game. 191 00:09:44.759 --> 00:09:49.200 Precisely. Those simple instructions and a handful of examples provide 192 00:09:49.240 --> 00:09:53.120 the specialized context it needs. It bypasses potentially years of 193 00:09:53.120 --> 00:09:55.960 training required for a specialized model built from the ground up. 194 00:09:56.080 --> 00:10:00.559 That's a powerful concept: leveraging general intelligence for specific tasks.
195 00:10:01.159 --> 00:10:02.960 Let's talk about the kind of data it looked at. 196 00:10:03.120 --> 00:10:06.440 It wasn't just one picture per alert, was it? It was 197 00:10:06.519 --> 00:10:07.720 a set of three. 198 00:10:07.399 --> 00:10:11.080 Correct, a triplet of images, all linked. This is pretty 199 00:10:11.080 --> 00:10:14.440 standard in transient surveys, and it's key to isolating the 200 00:10:14.559 --> 00:10:17.000 change. For every potential event, 201 00:10:17.039 --> 00:10:18.759 the LLM got... Okay, what's the first one? 202 00:10:19.039 --> 00:10:22.360 First, the new image. That's the latest picture taken of 203 00:10:22.360 --> 00:10:24.799 that patch of sky. If something new appeared, it's in 204 00:10:24.840 --> 00:10:28.240 this image, along with all the background stars, galaxies, 205 00:10:27.759 --> 00:10:29.879 noise, everything. Standard observation. Yep. 206 00:10:30.519 --> 00:10:33.720 Second, the reference image. This is usually a much deeper image, 207 00:10:33.720 --> 00:10:37.360 maybe stacked from many previous observations of the exact same spot. 208 00:10:37.679 --> 00:10:41.360 It shows what's supposed to be there permanently, the unchanging background. Like 209 00:10:41.320 --> 00:10:43.279 a baseline map of that area. Exactly. 210 00:10:43.320 --> 00:10:45.200 And then the third, and arguably the most 211 00:10:45.000 --> 00:10:46.480 important one, the difference image. 212 00:10:46.480 --> 00:10:49.840 That's the one. They literally subtract the reference image from 213 00:10:49.840 --> 00:10:52.840 the new image, pixel by pixel. If nothing changed, the 214 00:10:52.879 --> 00:10:54.720 result is just black noise. 215 00:10:54.759 --> 00:10:58.240 Basically, all the constant stars and galaxies cancel out. 216 00:10:58.320 --> 00:11:01.799 Right. But if a new star appeared, it shows up
If something that was 218 00:11:04.919 --> 00:11:08.240 there disappeared or dimmed, it shows up as a dark spot. 219 00:11:08.320 --> 00:11:11.000 Negative signal, though usually we look for the positive ones. 220 00:11:11.200 --> 00:11:14.279 So this difference image highlights only the change. It's like 221 00:11:14.360 --> 00:11:18.440 a cosmic spot, the difference puzzle result isolating the transient 222 00:11:18.480 --> 00:11:19.240 event itself. 223 00:11:19.360 --> 00:11:21.679 That's a perfect analogy. It removes all the clutter and 224 00:11:21.720 --> 00:11:25.919 focuses the AI's attention squarely on the potential discovery, the 225 00:11:25.960 --> 00:11:27.080 thing that wasn't there before. 226 00:11:27.159 --> 00:11:29.600 Okay, so it gets this triplet. But you mentioned it 227 00:11:29.639 --> 00:11:34.320 worked across different surveys Pan Stars, Mirrorlict, at Lass, and 228 00:11:34.360 --> 00:11:39.159 the source material notes these have different pixel scales, even 229 00:11:39.200 --> 00:11:40.840 though the image stamps were the same size. 230 00:11:40.919 --> 00:11:44.200 Yes, and this is really important for understanding the AI's flexibility. 231 00:11:44.799 --> 00:11:46.919 All the image cutouts given to the AI were one 232 00:11:46.960 --> 00:11:50.000 hundred by one hundred pixels, but how much sky those 233 00:11:50.159 --> 00:11:52.679 hundred pistols represented was different for each. 234 00:11:52.519 --> 00:11:54.679 Telescope, meaning the same object would look. 235 00:11:54.519 --> 00:11:58.759 Different, potentially very different. Pan Stars has high resolution about 236 00:11:58.799 --> 00:12:02.399 point twenty five arc secondsixel. A tiny point source like 237 00:12:02.440 --> 00:12:05.200 a distant supernova might look like a sharp little dot 238 00:12:05.279 --> 00:12:07.159 spread over say five or six pixels. 239 00:12:07.200 --> 00:12:07.799 Oh my crisp. 
240 00:12:08.080 --> 00:12:10.639 But then you look at ATLAS, which has a much 241 00:12:10.679 --> 00:12:13.399 wider field of view, lower resolution, about one point eighty- 242 00:12:13.399 --> 00:12:17.360 six arcseconds per pixel. That same supernova might appear 243 00:12:17.399 --> 00:12:19.919 as just a slightly fuzzy blob contained within maybe one 244 00:12:20.000 --> 00:12:21.519 or two pixels. 245 00:12:21.240 --> 00:12:23.879 So much less detail, almost smeared 246 00:12:23.519 --> 00:12:27.679 out. Exactly. And MeerLICHT is somewhere in between. The LLM, 247 00:12:28.039 --> 00:12:31.120 using just those fifteen examples per survey and the text prompts, 248 00:12:31.279 --> 00:12:33.279 had to learn that the sharp five-pixel dot in 249 00:12:33.360 --> 00:12:37.000 Pan-STARRS data and the fuzzy one-pixel blob in ATLAS 250 00:12:37.080 --> 00:12:40.120 data could actually be the same type of astrophysical event. 251 00:12:40.240 --> 00:12:43.320 Wow. So it had to generalize across different instrument signatures, 252 00:12:43.360 --> 00:12:47.240 different noise properties, different resolutions, based on minimal input. 253 00:12:47.399 --> 00:12:49.799 It had to understand the underlying concept of a point 254 00:12:49.799 --> 00:12:52.559 source or a streak, regardless of how it was visually 255 00:12:52.600 --> 00:12:56.279 rendered by the specific telescope. That's way beyond simple pattern matching. Yeah, 256 00:12:56.320 --> 00:12:59.080 it suggests a deeper, more conceptual understanding. Which 257 00:12:58.919 --> 00:13:01.240 is exactly what you need to move beyond the brittle 258 00:13:01.320 --> 00:13:05.840 nature of older specialized models. Okay, this brings us to 259 00:13:05.879 --> 00:13:08.440 the outputs, the transparency piece. This is where it gets 260 00:13:08.519 --> 00:13:13.759 really interesting, moving beyond just real or bogus. What exactly 261 00:13:13.759 --> 00:13:16.639 did the LLM provide for each alert it analyzed?
There 262 00:13:16.639 --> 00:13:18.080 were three key things, right? 263 00:13:18.279 --> 00:13:21.039 This is the package that enables the collaboration. First, yeah, 264 00:13:21.080 --> 00:13:24.559 you get the basic real/bogus classification: is it astrophysical 265 00:13:24.720 --> 00:13:27.799 or is it an artifact? The fundamental filter. Standard stuff. 266 00:13:27.840 --> 00:13:32.480 Needed that. Needed that. Second, the breakthrough: the concise text explanation, 267 00:13:32.919 --> 00:13:36.399 a short paragraph describing why it made that classification, pointing 268 00:13:36.440 --> 00:13:39.600 out the key features in the triplet of images. Justification. Justification. 269 00:13:39.639 --> 00:13:41.679 This is where the black box opens up. And third, 270 00:13:41.879 --> 00:13:45.799 an interest score, basically a rating, say one to ten, indicating 271 00:13:45.799 --> 00:13:49.080 how interesting or unusual this real event might be. Should 272 00:13:49.080 --> 00:13:52.519 astronomers drop everything, or is it likely just another common 273 00:13:52.600 --> 00:13:53.679 type of variable star? 274 00:13:53.960 --> 00:13:58.960 So prioritization built right in. That text explanation, though, that 275 00:13:59.080 --> 00:14:01.399 seems like the core innovation for building trust. Can you 276 00:14:01.399 --> 00:14:03.679 give an example, like, what would it actually say for 277 00:14:03.759 --> 00:14:04.919 a potential supernova? 278 00:14:05.000 --> 00:14:08.240 Sure. Instead of just "real, ninety-five percent," it might 279 00:14:08.279 --> 00:14:14.039 output something like: classification, real; interest score, eight out of ten; explanation: 280 00:14:14.759 --> 00:14:17.600 signal is clearly visible as a distinct point source in 281 00:14:17.639 --> 00:14:21.919 the difference image, indicating a new object. Morphology is stellar, 282 00:14:22.240 --> 00:14:25.200 not streaked like a satellite.
Object is offset from 283 00:14:25.240 --> 00:14:28.799 the galaxy core in the reference image. No obvious artifacts 284 00:14:28.840 --> 00:14:32.679 like diffraction spikes or cosmic rays nearby. Brightness increase is 285 00:14:32.720 --> 00:14:35.159 consistent with expectations for a young supernova. 286 00:14:35.279 --> 00:14:38.360 Okay, that's completely different. It's reasoning like an astronomer would. 287 00:14:38.360 --> 00:14:41.000 It's ticking off the checklist: point source, check; not a 288 00:14:41.039 --> 00:14:44.440 satellite, check; not an artifact, check; looks like a supernova, 289 00:14:44.559 --> 00:14:45.320 check. Exactly. 290 00:14:45.720 --> 00:14:48.840 It's articulating its thought process using the language and logic 291 00:14:48.879 --> 00:14:51.399 of the field. And this wasn't just a theoretical benefit. 292 00:14:51.440 --> 00:14:52.559 They tested this. 293 00:14:52.639 --> 00:14:55.559 They had a panel of twelve actual astronomers, experts in 294 00:14:55.600 --> 00:14:59.799 transient science, review a bunch of these AI-generated explanations. 295 00:15:00.000 --> 00:15:03.799 The consensus was that the explanations were highly coherent and useful, 296 00:15:04.360 --> 00:15:08.120 meaning they made sense scientifically, and they provided actionable information 297 00:15:08.159 --> 00:15:10.759 that the astronomers could actually use to evaluate the alert. 298 00:15:10.919 --> 00:15:14.440 Okay, that's strong validation from the human experts. But then 299 00:15:14.440 --> 00:15:19.440 there's this other layer, the AI evaluating itself. It assigned 300 00:15:19.480 --> 00:15:23.559 its own coherence score to its explanations. How does that work? 301 00:15:23.679 --> 00:15:26.320 Isn't that a bit circular, like asking the suspect to 302 00:15:26.440 --> 00:15:28.240 judge the quality of their own alibi? 303 00:15:28.440 --> 00:15:31.559 Huh, that's a fair question. It sounds a bit like that.
304 00:15:32.200 --> 00:15:35.000 But the coherence score is different from a simple confidence 305 00:15:35.000 --> 00:15:38.279 score on the classification itself. It's not rating if it's right. 306 00:15:38.559 --> 00:15:41.600 It's rating the quality and consistency of its own explanation. 307 00:15:41.759 --> 00:15:45.600 How so? It's assessing: did I manage to construct a logical, 308 00:15:45.639 --> 00:15:48.519 step-by-step argument connecting the visual features I saw 309 00:15:48.600 --> 00:15:51.720 to the final classification? Or was my reasoning a bit messy? 310 00:15:52.080 --> 00:15:54.519 Did I contradict myself? Did I have to ignore some 311 00:15:54.600 --> 00:15:57.960 awkward feature? If the AI detects features that pull it 312 00:15:58.000 --> 00:16:00.840 in different directions, or if the evidence isn't clean, it 313 00:16:00.879 --> 00:16:03.919 struggles to write a smooth, coherent explanation. 314 00:16:03.799 --> 00:16:06.799 And that struggle is reflected in a lower coherence score. 315 00:16:07.240 --> 00:16:11.799 Exactly. And here's the crucial finding. The team discovered a 316 00:16:11.840 --> 00:16:17.480 strong correlation: explanations with low coherence scores were much, much 317 00:16:17.559 --> 00:16:20.840 more likely to belong to incorrect classifications. 318 00:16:21.000 --> 00:16:23.720 Ah, I see. So the AI is basically flagging its 319 00:16:23.720 --> 00:16:27.240 own uncertainty, not by saying "I'm only sixty percent sure," 320 00:16:27.480 --> 00:16:30.639 but by saying, "My reasoning for this conclusion feels a 321 00:16:30.679 --> 00:16:32.000 bit weak or convoluted." 322 00:16:32.360 --> 00:16:36.720 Precisely. It's signaling its own internal cognitive friction. It's like 323 00:16:36.759 --> 00:16:39.639 it's saying, look, I'm calling this real, but honestly, the 324 00:16:39.679 --> 00:16:42.240 explanation I came up with isn't entirely convincing even to me.
325 00:16:42.519 --> 00:16:43.919 Maybe you should double-check this one. 326 00:16:43.960 --> 00:16:47.120 That's incredibly useful. It moves away from silent failures. The 327 00:16:47.159 --> 00:16:50.399 system itself helps you identify where the potential problems are. 328 00:16:50.519 --> 00:16:53.159 It's the foundation for a truly reliable human in the 329 00:16:53.200 --> 00:16:56.440 loop system. Astronomers are still overwhelmed. They can't check everything. 330 00:16:56.600 --> 00:16:59.279 But now the AI doesn't just give them the most 331 00:16:59.320 --> 00:17:01.759 likely real events. It gives them the "most likely real" 332 00:17:02.039 --> 00:17:04.960 and the "I classified this, but I'm not entirely sure my 333 00:17:05.039 --> 00:17:06.079 reasoning holds up" events. 334 00:17:06.279 --> 00:17:09.119 So it directs human attention to the most scientifically valuable 335 00:17:09.160 --> 00:17:11.319 and the most potentially problematic cases. 336 00:17:11.599 --> 00:17:16.039 Smart. Extremely smart. And it had an immediate practical benefit. 337 00:17:16.599 --> 00:17:19.079 The team used this feedback. They looked at the low 338 00:17:19.079 --> 00:17:22.759 coherence failures, understood why the AI was getting confused, and 339 00:17:22.880 --> 00:17:26.079 used that insight to slightly tweak or refine the initial 340 00:17:26.119 --> 00:17:28.119 fifteen examples and prompts. 341 00:17:27.799 --> 00:17:30.440 A quick iteration based on the AI's own self-doubt. 342 00:17:30.640 --> 00:17:33.839 Yeah, and just doing that boosted the performance on one 343 00:17:33.839 --> 00:17:36.079 of the data sets from that initial ninety-three point 344 00:17:36.119 --> 00:17:39.519 four percent accuracy up to about ninety-six point seven percent.
345 00:17:39.599 --> 00:17:42.799 Wow, a significant jump, not by throwing massive new data 346 00:17:42.799 --> 00:17:45.400 sets at it, but by listening to its uncertainty and 347 00:17:45.400 --> 00:17:47.319 giving it slightly better guidance. 348 00:17:47.000 --> 00:17:50.640 Exactly. Smart, targeted refinement enabled by transparency. 349 00:17:50.720 --> 00:17:54.000 Okay, so better accuracy through this feedback loop is one 350 00:17:54.119 --> 00:17:57.599 clear win, but the implications feel much broader. You mentioned 351 00:17:57.640 --> 00:18:01.519 democratization earlier. How does this approach change who can participate 352 00:18:01.559 --> 00:18:02.519 in this kind of science? 353 00:18:02.720 --> 00:18:04.920 That was a major point made by Turan Bulmus, one 354 00:18:04.960 --> 00:18:07.920 of the co-lead authors. Because the method relies on 355 00:18:07.920 --> 00:18:10.200 such a small number of examples, just fifteen, and plain 356 00:18:10.279 --> 00:18:12.799 language instructions, you suddenly don't need to be a deep 357 00:18:12.839 --> 00:18:16.200 learning expert or have access to huge computational resources to 358 00:18:16.279 --> 00:18:17.039 use it effectively. 359 00:18:17.119 --> 00:18:19.960 The barrier to entry drops significantly. Massively. 360 00:18:20.559 --> 00:18:23.839 Imagine you're an astronomer who discovers a new, weird type 361 00:18:23.880 --> 00:18:27.400 of variable star. With the old methods, you'd maybe need 362 00:18:27.519 --> 00:18:31.880 years collaborating with AI engineers, gathering thousands of examples, training 363 00:18:31.920 --> 00:18:33.279 a specialized 364 00:18:32.680 --> 00:18:34.279 model. A huge undertaking. 365 00:18:34.440 --> 00:18:38.400 Yeah, but with this approach, you find fifteen good examples 366 00:18:38.400 --> 00:18:41.160 of your new weird star.
Write a clear description of 367 00:18:41.200 --> 00:18:44.200 what makes it unique, and you can potentially deploy this 368 00:18:44.279 --> 00:18:47.440 general-purpose LLM to start searching through survey data for 369 00:18:47.559 --> 00:18:49.079 more candidates almost immediately. 370 00:18:49.200 --> 00:18:53.000 So it empowers individual researchers or smaller teams who have 371 00:18:53.359 --> 00:18:57.000 deep astronomical expertise but maybe not deep AI expertise. 372 00:18:57.119 --> 00:19:00.880 Precisely. It shifts the bottleneck from AI programming skill back 373 00:19:00.880 --> 00:19:04.599 to scientific insight and curation ability. If you understand the 374 00:19:04.599 --> 00:19:07.000 science and can provide good examples, you can leverage this 375 00:19:07.119 --> 00:19:07.839 powerful tool. 376 00:19:08.160 --> 00:19:10.599 And this wasn't just the view of the researchers involved, 377 00:19:10.680 --> 00:19:13.799 right? Established figures in the field also saw the potential. 378 00:19:13.920 --> 00:19:16.640 Oh, absolutely. Professor Stephen Smartt, who's a big name in 379 00:19:16.640 --> 00:19:20.119 transient astronomy, has been working on this exact classification problem for 380 00:19:20.160 --> 00:19:22.799 over a decade, building those complex, specialized neural 381 00:19:22.680 --> 00:19:25.039 networks. So someone deeply invested in the 382 00:19:24.960 --> 00:19:28.839 old way. Very much so. He described the LLM's accuracy, 383 00:19:29.160 --> 00:19:33.240 achieved with just those fifteen examples, as remarkable, and he 384 00:19:33.319 --> 00:19:37.279 explicitly called this approach a potential total game changer. 385 00:19:37.440 --> 00:19:40.680 That's a powerful endorsement. When someone who spent years mastering 386 00:19:40.720 --> 00:19:43.079 the complex route sees a shortcut work this 387 00:19:43.079 --> 00:19:46.799 well, it tells you something fundamental is shifting.
The era 388 00:19:46.960 --> 00:19:51.640 of needing highly specialized, custom-built AI for every single 389 00:19:51.720 --> 00:19:56.400 scientific imaging task might be evolving. Generalist models with the 390 00:19:56.480 --> 00:19:59.319 right guidance are proving incredibly capable. 391 00:20:00.000 --> 00:20:03.079 With this transparent classification as the foundation, what's the next step? 392 00:20:03.079 --> 00:20:06.119 The paper talks about building agentic assistants. What does that 393 00:20:06.200 --> 00:20:06.559 look like? 394 00:20:06.759 --> 00:20:09.720 That's the really exciting future vision. It's moving beyond just 395 00:20:10.039 --> 00:20:13.680 labeling images to creating autonomous systems that actively participate in 396 00:20:13.720 --> 00:20:14.720 the scientific process. 397 00:20:14.839 --> 00:20:17.720 So the AI does more than just classify. Much more. 398 00:20:17.880 --> 00:20:21.279 Imagine an AI agent. It gets the image triplet, classifies 399 00:20:21.279 --> 00:20:24.559 it real, generates the explanation, "looks like a TDE flare," 400 00:20:25.000 --> 00:20:28.440 checks its own coherence, "high confidence in this reasoning." But 401 00:20:28.640 --> 00:20:29.519