WEBVTT 1 00:00:01.080 --> 00:00:03.799 How'd you like to listen to .NET Rocks with no ads? 2 00:00:04.440 --> 00:00:04.799 Easy. 3 00:00:05.360 --> 00:00:08.560 Become a patron for just five dollars a month. You 4 00:00:08.599 --> 00:00:11.320 get access to a private RSS feed where all the 5 00:00:11.359 --> 00:00:14.599 shows have no ads. Twenty dollars a month, we'll get 6 00:00:14.599 --> 00:00:18.440 you that and a special .NET Rocks patron mug. Sign 7 00:00:18.519 --> 00:00:34.640 up now at patreon.dotnetrocks.com. Hey, 8 00:00:34.880 --> 00:00:39.039 guess what, it's .NET Rocks episode nineteen forty four. I'm 9 00:00:39.119 --> 00:00:40.399 Carl Franklin. 10 00:00:39.960 --> 00:00:44.359 And I'm Richard Campbell. Nineteen forty four. Richard, I'm looking forward 11 00:00:44.359 --> 00:00:47.079 to the end of World War Two. Yeah, it's the 12 00:00:47.079 --> 00:00:50.679 beginning of the end. So nineteen forty four, the Allies 13 00:00:50.719 --> 00:00:55.200 launched D-Day, the largest amphibious invasion in history, landing 14 00:00:55.200 --> 00:00:57.920 troops on the beaches of Normandy, France on June sixth, 15 00:00:58.679 --> 00:01:02.039 marking a turning point. In August, the Allied forces liberated 16 00:01:02.119 --> 00:01:08.280 Paris from Nazi occupation. You're welcome. In December, here's an 17 00:01:08.280 --> 00:01:10.879 anecdote to go with D-Day if you like. Yeah. 18 00:01:10.959 --> 00:01:14.560 In concern for the soldiers in D-Day, they mass 19 00:01:14.599 --> 00:01:17.000 produced penicillin for the very first time. There were two 20 00:01:17.000 --> 00:01:19.439 and a half million doses of penicillin made for the 21 00:01:19.519 --> 00:01:22.439 D-Day invasion. That is so awesome. So post World 22 00:01:22.480 --> 00:01:27.400 War Two, the reason we have antibiotics was that preparation. Yeah. 
23 00:01:27.439 --> 00:01:30.400 In December, the Battle of the Bulge, the Germans launched 24 00:01:30.439 --> 00:01:34.519 a major counteroffensive in the Ardennes region of Belgium. 25 00:01:34.519 --> 00:01:37.120 Did I say that right? Arden, in the Ardennes? Yeah? 26 00:01:37.200 --> 00:01:38.040 Yep, the Ardennes. 27 00:01:38.120 --> 00:01:42.560 But the Allied forces eventually repelled the attack, and in Rome, 28 00:01:42.760 --> 00:01:45.840 three hundred and thirty five Italians were killed in the 29 00:01:46.280 --> 00:01:49.599 Here's another thing I had to pronounce correctly in high school. 30 00:01:50.079 --> 00:01:56.480 Ardeatine? Ardeatine, Ardeatine, all right? 31 00:01:56.560 --> 00:01:56.640 Right. 32 00:01:56.799 --> 00:01:59.359 Ardeatine. We're going with that. A-R-D-E-A-T- 33 00:01:59.599 --> 00:02:03.079 I-N-E massacre, including seventy five Jews and over 34 00:02:03.120 --> 00:02:07.280 two hundred members of the Italian resistance, from various groups. 35 00:02:07.680 --> 00:02:08.960 So yeah, it's sort 36 00:02:08.800 --> 00:02:12.000 of the beginning of the end, the unwinding and leading 37 00:02:12.120 --> 00:02:17.039 up to the following year, nineteen forty five, which ended it. 38 00:02:17.159 --> 00:02:19.919 Right. Yeah. It's also the year that the first plutonium 39 00:02:19.919 --> 00:02:22.759 was ever made at the Hanford site in Washington, which will 40 00:02:22.759 --> 00:02:27.639 eventually lead to the bomb at Nagasaki. Yeah. And the Harvard 41 00:02:27.800 --> 00:02:31.960 Mark one, built by IBM based on a design 42 00:02:32.000 --> 00:02:35.479 from a professor at Harvard: thirty five hundred relays and a 43 00:02:35.560 --> 00:02:39.560 fifty foot long camshaft, because computers were different back then. Yeah, 44 00:02:39.639 --> 00:02:42.800 they were, and famously, because it's a relay-based computer. 45 00:02:42.919 --> 00:02:47.120 The next version of this, which they call, cleverly, the Mark two, 
yeah, 46 00:02:47.159 --> 00:02:50.120 will have a moth get trapped in one of the relays, 47 00:02:50.400 --> 00:02:54.039 which Grace Hopper will find and remove and call the bug, 48 00:02:54.080 --> 00:02:56.080 and that will be the first bug, first bug in 49 00:02:56.080 --> 00:02:58.560 the machine. Yeah. I don't use a lot of relays 50 00:02:58.560 --> 00:02:59.479 in computers anymore. 51 00:02:59.639 --> 00:03:03.439 Yeah. And before we get started with Dr. Burchell, I 52 00:03:03.479 --> 00:03:07.240 wanted to just have you comment on the amazing recovery 53 00:03:07.360 --> 00:03:10.560 of the astronauts from the space station that happened this 54 00:03:10.759 --> 00:03:11.360 past week. 55 00:03:11.599 --> 00:03:13.599 Really not that amazing. It was so perfectly, you know, 56 00:03:13.639 --> 00:03:17.159 it was an unexpected thing. But Butch and Suni are both 57 00:03:17.319 --> 00:03:22.840 very experienced astronauts. When there were concerns about Starliner, they 58 00:03:22.960 --> 00:03:25.840 sent up the next crew, the next crew 59 00:03:25.919 --> 00:03:28.439 on a Crew Dragon, with only two passengers, so they 60 00:03:28.439 --> 00:03:30.520 had the two additional seats for them to come back 61 00:03:30.639 --> 00:03:34.879 at any time. Yeah. But since they had two extremely 62 00:03:35.000 --> 00:03:40.080 qualified astronauts already up, why pay to send them back 63 00:03:40.120 --> 00:03:42.280 down when you can put them to work? And in fact, 64 00:03:42.280 --> 00:03:45.360 they put Suni in charge of the mission. She took 65 00:03:45.439 --> 00:03:49.000 over as mission commander for the station for the duration. 66 00:03:48.800 --> 00:03:50.759 And she and Butch were happy to stay there. They 67 00:03:50.759 --> 00:03:52.280 were like, no, we don't want to come home. 68 00:03:52.439 --> 00:03:54.400 Come on. Totally. They were never going to get to 69 00:03:54.400 --> 00:03:58.159 fly again. 
Those are retired astronauts, right? Yeah, so they 70 00:03:58.199 --> 00:03:59.919 got a great gig. Now it's going to take them 71 00:04:00.039 --> 00:04:03.080 more than a year to recover, which is also normal 72 00:04:03.360 --> 00:04:05.039 for a six month stay, and they had a nine 73 00:04:05.039 --> 00:04:09.039 month stay. Mark Kelly did a year, and you can 74 00:04:09.080 --> 00:04:11.120 read his book on this. Like, recovery is not a 75 00:04:11.159 --> 00:04:14.120 trivial thing. Yeah, I was watching him being interviewed. You know, 76 00:04:14.120 --> 00:04:17.240 you haven't walked on your feet for nine months, your vestibular 77 00:04:17.279 --> 00:04:20.160 system's messed up, your eyes have been bent out of shape. Like, 78 00:04:20.199 --> 00:04:23.639 it's not a small problem, right, to recover from this. 79 00:04:23.759 --> 00:04:27.079 Yeah, I watched it on the news when it 80 00:04:27.160 --> 00:04:29.319 was happening. It's just still amazing to see that 81 00:04:29.519 --> 00:04:31.759 Falcon booster land. 82 00:04:31.600 --> 00:04:33.040 Land on its tail perfectly. 83 00:04:33.480 --> 00:04:36.439 It always is just going to be amazing to me. 84 00:04:36.639 --> 00:04:39.279 Yeah, no, it's a miracle. The crazier thing is, 85 00:04:39.519 --> 00:04:43.120 it really is, that Starship booster being caught out of 86 00:04:43.160 --> 00:04:47.399 the air. It's literally a twenty story, two hundred ton 87 00:04:47.600 --> 00:04:50.800 building that flies, yeah, and they catch it out of 88 00:04:50.839 --> 00:04:52.959 the air. So yeah, we are in an amazing time. So 89 00:04:52.959 --> 00:04:55.800 the space industry has been fundamentally changed by this, right? 90 00:04:56.120 --> 00:04:59.120 The cost of flight is so much lower. It's hard 91 00:04:59.160 --> 00:05:01.160 to even get your head around what's actually going on 92 00:05:01.240 --> 00:05:03.600 up there right now. So it's very cool with the proliferation. 
93 00:05:04.240 --> 00:05:06.560 That's a very good experience for me. This week 94 00:05:06.639 --> 00:05:08.399 I felt very good about it, all right. 95 00:05:08.480 --> 00:05:10.879 So yeah, so that's a cue for me to roll 96 00:05:10.920 --> 00:05:12.399 the music for Better Know a Framework. 97 00:05:12.480 --> 00:05:21.800 So that's awesome. All right, man, what do you got? 98 00:05:21.800 --> 00:05:25.199 Our good buddy Simon Cropp, the genius. Simon Cropp, 99 00:05:25.319 --> 00:05:28.360 the G. This guy is just, he's so brilliant. He's 100 00:05:28.360 --> 00:05:31.839 brilliant and he comes up with solutions for things that 101 00:05:31.879 --> 00:05:33.120 you didn't even know you needed. Yeah. 102 00:05:33.240 --> 00:05:36.680 But this one is called Cymbal. It's a NuGet 103 00:05:36.720 --> 00:05:41.120 package and it's an MSBuild task that enables bundling 104 00:05:41.199 --> 00:05:43.879 .NET symbols for references with a deployed app. 105 00:05:44.040 --> 00:05:44.319 Nice. 106 00:05:44.480 --> 00:05:50.040 The goal being to enable line numbers for exceptions in production. 107 00:05:50.519 --> 00:05:52.399 Oh okay, that's interesting. 108 00:05:52.240 --> 00:05:55.079 Yeah, because I guess you don't get that. Yeah, yeah, 109 00:05:55.120 --> 00:05:58.120 and this is what it does. So if 110 00:05:58.160 --> 00:06:01.959 you're in production and you have an exception and, yeah, I 111 00:06:01.959 --> 00:06:06.839 guess you log it, you're gonna see line numbers, all right. Yeah. 112 00:06:06.680 --> 00:06:09.759 That's cool. You got to know he had that problem, right? Like, yeah, 113 00:06:09.879 --> 00:06:11.680 this is clearly a guy who built the thing to 114 00:06:11.720 --> 00:06:14.040 fix a thing that he had, and now we all 115 00:06:14.040 --> 00:06:14.639 get to benefit. 
116 00:06:14.800 --> 00:06:17.920 Another alternative, I guess, is just deploying the debug symbols 117 00:06:17.959 --> 00:06:20.560 with it, and now you're slowing things down in production. 118 00:06:20.600 --> 00:06:23.800 So yeah, it's a lot more weight than just, yeah, 119 00:06:24.000 --> 00:06:25.160 you know, use this library. 120 00:06:25.360 --> 00:06:29.519 So thank you, Simon. And SimonCropp slash Cymbal on 121 00:06:29.480 --> 00:06:31.360 GitHub. Continues to be awesome. 122 00:06:31.480 --> 00:06:34.759 C-Y-M-B-A-L. Yeah, the musical thing, the musical thing, 123 00:06:34.800 --> 00:06:35.079 all right. 124 00:06:35.079 --> 00:06:37.040 Who's talking to us, Richard? Grabbed a comment off of 125 00:06:37.040 --> 00:06:38.759 show eighteen thirty five, the one we did with 126 00:06:38.800 --> 00:06:41.040 our friend Mads Torgersen talking about the next C Sharp, 127 00:06:41.040 --> 00:06:43.399 because we've got a great comment, LLM related. This is 128 00:06:43.439 --> 00:06:46.959 from Murray, who said: Mads mentioned making sure language features 129 00:06:47.000 --> 00:06:49.560 work with the tooling, such as ordering in LINQ syntax. 130 00:06:50.000 --> 00:06:52.600 Increasingly, with Copilot and other LLMs, this is part of 131 00:06:52.639 --> 00:06:56.319 the tooling. Yes. True. Obviously this is a year ago, 132 00:06:56.360 --> 00:07:00.839 this comment, so you know, so much change has happened. It's challenging. 133 00:07:01.680 --> 00:07:04.160 So given a piece of code using a new C 134 00:07:04.319 --> 00:07:06.800 Sharp language feature, which is what Mads was talking about, 135 00:07:06.920 --> 00:07:09.399 have you tried asking ChatGPT or Copilot or some 136 00:07:09.560 --> 00:07:13.199 other LLM to describe how that code works? If it 137 00:07:13.240 --> 00:07:16.680 gets it right, does it mean it's intuitive? 
Is an 138 00:07:16.800 --> 00:07:19.519 LLM's intuition, and at least you put that in quotes, 139 00:07:19.519 --> 00:07:22.319 because there is no intuition in software, a 140 00:07:22.439 --> 00:07:25.639 good approximation for the one that human programmers have, or 141 00:07:25.639 --> 00:07:28.720 a bad approximation? And if programmers are using Copilot, 142 00:07:28.759 --> 00:07:32.639 does it matter about the human's intuition or the LLM's? Let's 143 00:07:32.639 --> 00:07:35.759 complicate this fact with next year's LLM, that would be now, 144 00:07:36.560 --> 00:07:40.600 which will probably be profoundly different. Yes. So, having said 145 00:07:40.600 --> 00:07:42.240 all that, it's probably best to just aim for the 146 00:07:42.319 --> 00:07:45.600 human and let the LLM catch up. Yeah, no intuition 147 00:07:45.720 --> 00:07:48.879 in software. The reality is, of course you would expect 148 00:07:48.920 --> 00:07:51.480 it to not understand a new language feature. There has 149 00:07:51.519 --> 00:07:54.120 to be some time for that language feature to be 150 00:07:54.160 --> 00:07:57.600 documented properly. The good news being, they keep regenerating 151 00:07:57.639 --> 00:08:01.360 these LLMs on a regular basis, and Microsoft builds these 152 00:08:01.399 --> 00:08:06.800 features in public view on GitHub, so even before it ships, 153 00:08:07.000 --> 00:08:11.720 it's likely in the knowledge base that is the LLM. Yeah, curiously, 154 00:08:12.000 --> 00:08:14.160 you know, in my last trip to Microsoft, talking to folks 155 00:08:14.199 --> 00:08:18.720 about what they're using, they've been using Claude Sonnet three seven. 156 00:08:18.879 --> 00:08:22.319 That's their favorite for working in .NET, which, isn't 157 00:08:22.319 --> 00:08:27.480 that funny? Fascinating. But you know, that's where it's at. 
158 00:08:27.759 --> 00:08:30.680 So Murray, you're right, let's focus on the human understanding 159 00:08:30.720 --> 00:08:32.960 the language the most, because the software is only going 160 00:08:33.000 --> 00:08:35.519 to generate what it's got in its model, and it's 161 00:08:35.600 --> 00:08:38.120 up to you to assess it, although admittedly the compiler 162 00:08:38.200 --> 00:08:41.120 has a say also, yes. And a copy of Music 163 00:08:41.120 --> 00:08:42.679 to Code By is on its way to you. If you'd like 164 00:08:42.679 --> 00:08:44.200 a copy of Music to Code By, write a comment 165 00:08:44.240 --> 00:08:46.679 on the website at dotnetrocks.com or on the Facebooks. 166 00:08:46.679 --> 00:08:48.320 We publish every show there, and if you comment there 167 00:08:48.320 --> 00:08:50.120 and we read it on the show, we'll send you a copy of 168 00:08:50.159 --> 00:08:50.720 Music to Code By. 169 00:08:50.840 --> 00:08:52.399 And if you don't want to wait for that, or 170 00:08:52.480 --> 00:08:55.000 you have other ideas and you just want to buy 171 00:08:55.159 --> 00:08:57.279 Music to Code By, you can go to musictocodeby 172 00:08:57.320 --> 00:09:01.519 dot net, and track twenty two is new-ish, and 173 00:09:01.639 --> 00:09:04.519 you can get the entire collection in MP3, FLAC, or 174 00:09:04.600 --> 00:09:09.480 WAV for a very good deal. It's a very good price. 175 00:09:09.919 --> 00:09:13.879 So happy coding. All right, well, let's bring on Dr. Burchell. 176 00:09:14.080 --> 00:09:18.639 Dr. Jodie Burchell is the developer advocate in data science 177 00:09:18.679 --> 00:09:22.200 at JetBrains and was previously a lead data scientist 178 00:09:22.200 --> 00:09:25.799 at Verve Group Europe. She completed a PhD in clinical 179 00:09:25.840 --> 00:09:30.320 psychology and a postdoc in biostatistics before leaving academia for 180 00:09:30.360 --> 00:09:34.039 a data science career. 
She has worked for seven years 181 00:09:34.039 --> 00:09:37.600 as a data scientist in both Australia and Germany, developing 182 00:09:37.639 --> 00:09:42.840 a range of products including recommendation systems, analysis platforms, search 183 00:09:42.879 --> 00:09:47.200 engine improvements and audience profiling. She's held a broad range 184 00:09:47.200 --> 00:09:51.159 of responsibilities in her career, doing everything from data analytics 185 00:09:51.159 --> 00:09:55.559 to maintaining machine learning solutions in production. She's a longtime 186 00:09:55.600 --> 00:10:01.159 content creator in data science across conference and user group presentations, books, webinars, 187 00:10:01.200 --> 00:10:04.320 and posts on both her own and JetBrains blogs. 188 00:10:04.639 --> 00:10:06.279 In other words, a slacker. 189 00:10:09.320 --> 00:10:11.200 It occurs to me, Jodie, that you and I hang 190 00:10:11.240 --> 00:10:13.320 out several times a year at various conferences, but I 191 00:10:13.320 --> 00:10:15.320 don't know that Carl's had time with you since we 192 00:10:15.399 --> 00:10:18.080 did that show at Techorama. Techorama was the last time 193 00:10:18.120 --> 00:10:20.320 I saw you, now a couple of years ago. 194 00:10:20.519 --> 00:10:25.240 Yeah, yeah, exactly. So it's been a long time, actually. Yeah. 195 00:10:25.360 --> 00:10:27.200 Things have changed. You're at JetBrains now. 196 00:10:27.240 --> 00:10:32.399 I have certainly, I think, changed a lot. Yeah, yes, yeah, yeah, 197 00:10:32.440 --> 00:10:34.480 I was at JetBrains when we first met as well, 198 00:10:34.519 --> 00:10:38.399 but I think I had only been there just over 199 00:10:38.440 --> 00:10:41.440 a year, and so I was still, like, I don't know, 200 00:10:41.559 --> 00:10:43.360 a little bit more shy, I think, a little bit 201 00:10:43.440 --> 00:10:44.279 less opinionated. 202 00:10:45.240 --> 00:10:47.480 You've been hanging around with the troublemakers for a while 
203 00:10:47.320 --> 00:10:49.039 now. Yeah, you talking about you? 204 00:10:49.519 --> 00:10:49.879 Yeah. 205 00:10:49.879 --> 00:10:55.639 Actually, well, and we're going to be hanging out in 206 00:10:55.679 --> 00:10:58.840 my hometown of Melbourne next month. 207 00:10:59.120 --> 00:11:03.039 Yeah, we're excited about that. Yeah, NDC. Yes. And 208 00:11:03.159 --> 00:11:06.159 of course I've got family in New Zealand, so I've 209 00:11:06.159 --> 00:11:08.039 got to do a little time in Sydney to see 210 00:11:08.039 --> 00:11:09.759 some folks there, and then I'll be in Melbourne for 211 00:11:09.799 --> 00:11:12.159 the show with you, and then a week on the 212 00:11:12.200 --> 00:11:15.559 farm hanging with the cows and the cousins. And the 213 00:11:15.600 --> 00:11:20.039 sheep? And the sheep. No sheep. The sheep, what, sheep's 214 00:11:20.080 --> 00:11:23.399 a South Island thing. No sheep on the farm. No, no, 215 00:11:23.440 --> 00:11:26.720 it's a dairy farm. Dairy farm. Yeah. And 216 00:11:26.759 --> 00:11:29.840 by the way, cows are awesome. Sheep are dumb, dumb, 217 00:11:29.879 --> 00:11:35.320 dumb, dumb, holy cow, dumb. But they're tasty. I like how 218 00:11:35.480 --> 00:11:39.000 Jodie says they're cute, I say they're tasty. Tasty, that's where 219 00:11:39.120 --> 00:11:41.320 my mind is at. But the cows, the cows 220 00:11:41.320 --> 00:11:44.039 are smart enough that if they're actually having distress, you know, 221 00:11:44.080 --> 00:11:47.200 in birthing or anything, they will come for help. Wow. Right, 222 00:11:47.360 --> 00:11:50.399 like, they're bright, and they follow the, they 223 00:11:50.399 --> 00:11:52.120 follow the gates and the paths where you want them 224 00:11:52.120 --> 00:11:53.279 to go. But it doesn't mean they don't know how 225 00:11:53.279 --> 00:11:55.039 to open them themselves if they really wanted to. I've 226 00:11:55.039 --> 00:11:57.440 seen them do it. Yeah, damn, they're just playing along. 
227 00:11:57.559 --> 00:12:01.000 Cows are great, they really are. And LLMs are great. 228 00:12:01.159 --> 00:12:06.279 Right, in the right settings. Yeah, they are great. 229 00:12:06.679 --> 00:12:09.639 Yes. But even that show we did in twenty three, 230 00:12:09.759 --> 00:12:11.519 you know, you were the grown-up in the room there. 231 00:12:11.879 --> 00:12:15.120 You just tried, like: listen, there are limits. 232 00:12:15.360 --> 00:12:18.039 We were so hype-ish in twenty three, not that it's 233 00:12:18.080 --> 00:12:21.200 all calm and rational in twenty five, but it's so 234 00:12:21.240 --> 00:12:24.039 funny actually, because I remember this was the 235 00:12:24.080 --> 00:12:26.519 first talk I did on LLMs, so that one at 236 00:12:26.559 --> 00:12:28.799 Techorama actually was the first one I ever did. 237 00:12:29.159 --> 00:12:29.840 No free lunch. 238 00:12:30.000 --> 00:12:32.440 Yeah, yeah, yeah, yeah. And I was actually 239 00:12:32.519 --> 00:12:35.399 really scared of getting up and giving my opinion, like 240 00:12:35.440 --> 00:12:39.480 being a contrarian. Obviously, I'm feeling so vindicated right now. 241 00:12:39.600 --> 00:12:41.960 But it's great, isn't it? 242 00:12:41.960 --> 00:12:45.679 It's great being right, but I will say, like, 243 00:12:45.799 --> 00:12:48.080 the hype has died slower than I thought it would. 244 00:12:48.120 --> 00:12:50.960 So I think DeepSeek finally has spelled the beginning 245 00:12:50.960 --> 00:12:51.240 of the 246 00:12:51.279 --> 00:12:55.039 end. But not the end of the business, but the 247 00:12:56.000 --> 00:12:57.080 end of the hype cycle. 248 00:12:57.440 --> 00:12:58.480 The end of the hype cycle. 249 00:12:58.600 --> 00:13:01.200 Okay, I appreciate that. The approach 
250 00:13:00.960 --> 00:13:05.519 to how we're going to be, I guess, manufacturing these models, 251 00:13:06.240 --> 00:13:10.440 deploying these models, and thinking about these models fundamentally changed 252 00:13:10.440 --> 00:13:13.879 with DeepSeek. So, um, it sort of showed that 253 00:13:14.039 --> 00:13:17.159 this hyperinvestment in data centers, which was kicking off with 254 00:13:17.200 --> 00:13:20.799 the Stargate project in the US. To explain context to 255 00:13:20.799 --> 00:13:21.919 anyone in the audience who doesn't know 256 00:13:21.919 --> 00:13:24.120 it, five hundred billion dollars. 257 00:13:23.799 --> 00:13:27.600 An intended five hundred billion dollar investment between OpenAI, 258 00:13:27.919 --> 00:13:32.039 the US government, and I think Microsoft was involved. So I 259 00:13:31.960 --> 00:13:33.240 think Microsoft pulled out of it. 260 00:13:33.240 --> 00:13:37.080 It was Oracle. Oh, okay, gotcha. Yeah, yeah, that 261 00:13:37.440 --> 00:13:38.320 just got announced. 262 00:13:39.440 --> 00:13:41.279 Yeah, there was a little political game here. It 263 00:13:41.320 --> 00:13:43.399 was also right around that time they sort of announced this: Hey, 264 00:13:44.080 --> 00:13:45.759 you know, I know we had this deal with OpenAI 265 00:13:45.759 --> 00:13:47.720 where everything's going to run on Azure, but we're 266 00:13:47.799 --> 00:13:50.240 ready to let that go. I think it was because 267 00:13:50.279 --> 00:13:52.759 of Stargate. Yeah, you know, there was sort of 268 00:13:52.799 --> 00:13:55.000 this pressure on Microsoft: you have to keep growing, growing, growing, 269 00:13:55.039 --> 00:13:57.039 and they're like, this is getting irrational. So if you 270 00:13:57.080 --> 00:13:59.159 want to go play with someone else, you knock yourself out. 271 00:13:59.240 --> 00:14:02.039 So, bringing it back to DeepSeek for a minute. 
From 272 00:14:02.080 --> 00:14:05.440 what I understand, you know, OpenAI and all 273 00:14:05.480 --> 00:14:08.039 these other models are looking at that and learning from 274 00:14:08.080 --> 00:14:10.200 it and figuring out how to make their own models 275 00:14:10.240 --> 00:14:16.600 more efficient. And at one point I heard that the 276 00:14:17.480 --> 00:14:20.360 Chinese model is, you know, hey, let's spend a lot 277 00:14:20.440 --> 00:14:24.240 less money on these things so that they're less expensive. 278 00:14:24.360 --> 00:14:26.200 We don't have to use as many processors and all 279 00:14:26.240 --> 00:14:29.960 that stuff. And I think I heard that, you know, 280 00:14:30.039 --> 00:14:34.519 the response from the American companies was, oh no, we're 281 00:14:34.600 --> 00:14:36.639 just going to make it ten times more, one hundred 282 00:14:36.639 --> 00:14:40.759 times more powerful, you know, so a different kind of 283 00:14:40.840 --> 00:14:44.240 mindset. But that was originally. Now I think that 284 00:14:44.360 --> 00:14:52.879 there's more of a desire to get smaller LLMs, right? Yeah, 285 00:14:53.080 --> 00:14:54.120 that are more specialized. 286 00:14:54.360 --> 00:14:59.039 The nuance of the story is that basically we've 287 00:14:59.120 --> 00:15:01.960 known that there are ways to make neural nets more efficient, 288 00:15:02.120 --> 00:15:05.399 right? Like, there are ways of making the models smaller, 289 00:15:05.799 --> 00:15:09.039 or after you've trained them, actually trimming them down and 290 00:15:10.480 --> 00:15:13.039 getting the same performance or almost the same performance for 291 00:15:13.200 --> 00:15:16.919 a much smaller number of parameters. 
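Editor's sketch (not part of the recording): the "trimming them down" idea described above is often done by magnitude pruning, where the weights closest to zero are removed after training. The toy below is only an illustration under loud assumptions: the function names (`prune_by_magnitude`, `linear`, `count_nonzero`) are made up for this note, the layer is a single bias-free weighted sum, and the weights are deliberately constructed so that most of them are tiny. Real models are not built this way, and nothing here reflects how any particular lab prunes its models.

```python
import random

def prune_by_magnitude(weights, keep_fraction):
    """Keep only the largest-magnitude weights; set the rest to zero."""
    magnitudes = sorted((abs(w) for row in weights for w in row), reverse=True)
    keep = max(1, int(len(magnitudes) * keep_fraction))
    threshold = magnitudes[keep - 1]
    return [[w if abs(w) >= threshold else 0.0 for w in row] for row in weights]

def linear(weights, x):
    """One layer without bias: each output is a weighted sum of the inputs."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def count_nonzero(weights):
    return sum(1 for row in weights for w in row if w != 0.0)

# Assumption for illustration: 4 outputs x 16 inputs, where each row has
# two "important" weights and fourteen near-zero ones.
rng = random.Random(0)
weights = []
for _ in range(4):
    row = [rng.uniform(-0.01, 0.01) for _ in range(16)]
    for j in rng.sample(range(16), 2):
        row[j] = rng.choice([-1.0, 1.0]) * rng.uniform(1.0, 2.0)
    weights.append(row)

x = [rng.uniform(-1.0, 1.0) for _ in range(16)]
pruned = prune_by_magnitude(weights, keep_fraction=0.125)

dense_out = linear(weights, x)
pruned_out = linear(pruned, x)
max_diff = max(abs(a - b) for a, b in zip(dense_out, pruned_out))

print(count_nonzero(weights), "->", count_nonzero(pruned), "weights kept")
print("largest change in any output:", max_diff)
```

In this rigged example an eighth of the weights carries almost all of the signal, so the pruned layer gives nearly the same outputs. How much a real network can be trimmed before quality drops is an empirical question that this sketch does not answer.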
We've also known for quite 292 00:15:16.919 --> 00:15:18.960 a long time, and this is true with any machine 293 00:15:19.039 --> 00:15:22.279 learning model, that the higher the quality of data, the, 294 00:15:22.879 --> 00:15:24.960 you know, the better the model can perform for a much 295 00:15:25.000 --> 00:15:27.440 smaller number of parameters. So this was proven last year 296 00:15:27.480 --> 00:15:29.919 with the Falcon, last year or the year before, with 297 00:15:30.000 --> 00:15:32.919 the Falcon models. They were sort of the first big 298 00:15:33.000 --> 00:15:35.360 open source ones that were trained on higher quality data 299 00:15:35.440 --> 00:15:38.240 sets and got a lot more performance for fewer parameters. 300 00:15:38.759 --> 00:15:42.200 But the most reliable way to get better performance was 301 00:15:42.320 --> 00:15:48.759 to scale, and I think what happened, the story I've 302 00:15:48.799 --> 00:15:52.279 heard, is that in China they just couldn't get access 303 00:15:52.360 --> 00:15:57.720 to the same size of GPUs because of sanctions. Not 304 00:15:57.840 --> 00:16:02.000 sanctions. Basically they weren't being sold in China, and so 305 00:16:02.320 --> 00:16:05.120 they had to make do with older and much less 306 00:16:05.200 --> 00:16:07.440 efficient processors, and they had to do all these tricks 307 00:16:07.519 --> 00:16:11.159 to basically share the training across a bunch of smaller machines. 308 00:16:11.799 --> 00:16:16.919 So this meant that they just couldn't create absolutely massive models. 309 00:16:17.679 --> 00:16:20.759 And essentially this meant that, yeah, they were forced to 310 00:16:20.799 --> 00:16:24.240 create a smaller model. But you know, the thing is, 311 00:16:24.360 --> 00:16:28.519 the quality of AI researchers and AI engineers that 312 00:16:28.600 --> 00:16:32.440 are being employed at companies like OpenAI and Anthropic 313 00:16:32.559 --> 00:16:35.480 and companies like this. 
I'm sure that they knew it 314 00:16:35.600 --> 00:16:38.279 was possible. It was just, as I understood it, a 315 00:16:38.360 --> 00:16:42.759 less reliable path to performance. And you know, the American 316 00:16:42.799 --> 00:16:45.519 companies, they had the money and they had the 317 00:16:45.720 --> 00:16:48.679 servers to train it, so why not go big? 318 00:16:49.000 --> 00:16:53.120 And they understand that race, right? Like, they understand build bigger, 319 00:16:53.279 --> 00:16:56.200 keep going. Like, it's a very American approach to things. Yes, 320 00:16:56.399 --> 00:16:59.559 you can always tune later, right? Do your land grab now. But 321 00:16:59.600 --> 00:17:03.799 also there's a difference between having one huge model 322 00:17:03.919 --> 00:17:08.640 like, you know, ChatGPT that knows everything, has bazillions 323 00:17:08.680 --> 00:17:11.799 of nodes or whatever it is, and then can, you know, 324 00:17:12.079 --> 00:17:16.359 can cross reference things, right, and connect the dots 325 00:17:17.119 --> 00:17:20.519 very much in ways that humans do, but even 326 00:17:20.640 --> 00:17:26.160 more broadly. Whereas if you have smaller, less expensive models 327 00:17:26.240 --> 00:17:31.480 that are just, uh, LLMs that are trained on specific data, right, 328 00:17:32.079 --> 00:17:35.880 you'll probably get more accurate things out of them 329 00:17:36.079 --> 00:17:41.279 for that particular set, you know, that particular context maybe, 330 00:17:41.880 --> 00:17:45.000 and then be able to have many of those 331 00:17:45.359 --> 00:17:48.880 that have different expertise, but it won't necessarily be able 332 00:17:48.960 --> 00:17:51.160 to connect the dots 333 00:17:51.319 --> 00:17:53.799 like a large, huge model can. 334 00:17:53.960 --> 00:17:56.640 Right. This can actually lead into a further discussion about 335 00:17:56.720 --> 00:18:01.039 measurement if we want. 
But basically, looking at the current 336 00:18:01.079 --> 00:18:05.400 benchmarks that they're using to assess performance of LLMs, DeepSeek 337 00:18:05.559 --> 00:18:08.880 and smaller models coming out of China are actually rivaling 338 00:18:09.000 --> 00:18:12.880 the performance of larger models. So basically the understanding seems 339 00:18:12.920 --> 00:18:15.160 to be that a lot of the parameters that 340 00:18:15.279 --> 00:18:19.680 these big models have are not actually being used every 341 00:18:19.799 --> 00:18:23.720 single time you try to do, like, inference for a 342 00:18:23.799 --> 00:18:26.559 particular task. It's only a subset of the parameters. So 343 00:18:26.880 --> 00:18:29.559 the way to think about parameters is, think about neural 344 00:18:29.640 --> 00:18:32.400 nets as, like, you have inputs and then you have 345 00:18:32.480 --> 00:18:35.880 a bunch of neurons that are connected by what are 346 00:18:35.920 --> 00:18:39.079 called weights. They're basically multipliers, and you can kind of 347 00:18:39.160 --> 00:18:43.400 think about inference as a path that you take through 348 00:18:43.559 --> 00:18:46.519 the neural net, where, like, you know, the whole thing's 349 00:18:46.559 --> 00:18:49.640 going to be used, but only certain weights will actually 350 00:18:49.839 --> 00:18:54.480 have an impact for particular types of tasks. And it 351 00:18:54.559 --> 00:18:57.480 sort of seems that what's happened with scaling down these 352 00:18:57.559 --> 00:19:01.039 models is that because they learned on so much data, 353 00:19:01.359 --> 00:19:03.119 and so much of the data seems to have not 354 00:19:03.240 --> 00:19:06.960 been high quality, that really, like, a lot of 355 00:19:07.079 --> 00:19:10.799 the parameters were not really being used in the majority 356 00:19:10.880 --> 00:19:13.519 of cases, they were just dead weight. I see. 
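Editor's sketch (not part of the recording): one way to picture "weights as multipliers" and "a path through the neural net" is a tiny two-layer network with a ReLU between the layers. For a given input, hidden units whose value is clipped to zero contribute nothing, so the weights leaving them have no effect on that particular output. Everything below is hypothetical: the weight values are hand-picked, and the helper names (`forward`, `active_units`, `mask_columns`) exist only for this note. It is one possible reading of the explanation above, not a description of how any production LLM routes its computation.

```python
def forward(w1, w2, x):
    """Two-layer network: weighted sums, ReLU, then weighted sums again."""
    pre = [sum(w * xi for w, xi in zip(row, x)) for row in w1]
    hidden = [max(0.0, p) for p in pre]  # ReLU: negative values become 0
    out = [sum(w * h for w, h in zip(row, hidden)) for row in w2]
    return hidden, out

def active_units(hidden):
    """Indices of hidden units that actually fired for this input."""
    return [i for i, h in enumerate(hidden) if h > 0.0]

def mask_columns(w2, keep):
    """Zero every second-layer weight that leaves a hidden unit not in `keep`."""
    return [[w if j in keep else 0.0 for j, w in enumerate(row)] for row in w2]

# Hand-picked weights, purely for illustration.
w1 = [[1.0, 0.0, 0.0],
      [0.0, 1.0, 0.0],
      [-1.0, 0.0, 0.0],
      [0.0, 0.0, -1.0]]
w2 = [[0.5, -1.0, 2.0, 3.0],
      [1.5, 0.5, -2.0, 1.0]]

input_a = [2.0, 1.0, 4.0]
input_b = [-1.0, 1.0, -2.0]

hidden_a, out_a = forward(w1, w2, input_a)
hidden_b, out_b = forward(w1, w2, input_b)
active_a = active_units(hidden_a)
active_b = active_units(hidden_b)

# Removing the weights that leave inactive units changes nothing for input_a.
_, masked_out_a = forward(w1, mask_columns(w2, active_a), input_a)

print("input_a uses hidden units", active_a, "output", out_a)
print("input_b uses hidden units", active_b, "output", out_b)
print("input_a output with unused weights removed:", masked_out_a)
```

The two inputs light up different hidden units, which is the sense in which different tasks can lean on different subsets of the weights. Weights that are rarely on any such path are candidates for the "dead weight" mentioned above.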
357 00:19:14.359 --> 00:19:17.319 And so if you wanted to translate parameters and 358 00:19:17.400 --> 00:19:21.119 neurons to language, we're talking about the probability of the 359 00:19:21.240 --> 00:19:24.720 next word, exactly, right, that it spits out. Yeah, and 360 00:19:24.799 --> 00:19:29.759 what you're saying is that they're only choosing from parameters 361 00:19:29.839 --> 00:19:30.640 with higher weights. 362 00:19:31.240 --> 00:19:34.319 Yeah, it's like... Or words 363 00:19:34.079 --> 00:19:34.799 with higher weights. 364 00:19:35.279 --> 00:19:37.480 Yeah. So basically the way it works is, like, you 365 00:19:37.559 --> 00:19:40.799 think about the last layer of the neural net as 366 00:19:40.920 --> 00:19:44.440 basically, like, all the words in the vocabulary. So it's 367 00:19:44.440 --> 00:19:48.279 obviously really, really huge, and so the whole neural net 368 00:19:48.440 --> 00:19:51.960 is trying to predict the probability of which of 369 00:19:52.079 --> 00:19:54.880 these words is the most likely to come next. So 370 00:19:55.240 --> 00:19:58.400 it's basically saying that for a particular input only a 371 00:19:58.480 --> 00:20:01.480 subset of, you know, the paths that go through 372 00:20:01.519 --> 00:20:03.720 the neural net are actually going to give good information 373 00:20:04.359 --> 00:20:08.240 about what the next word is. And so, yeah, it's 374 00:20:09.440 --> 00:20:13.279 also, like, it's kind of fascinating because the models 375 00:20:13.319 --> 00:20:17.039 are such black boxes. No one fully understands how the 376 00:20:17.519 --> 00:20:20.599 decisions are being made. I'm putting decisions in air quotes. 377 00:20:20.640 --> 00:20:25.200 I want to make this clear, because interpretability is hot, 378 00:20:25.279 --> 00:20:28.000 but this is actually, interpretability is becoming a really hot 379 00:20:28.079 --> 00:20:31.279 area in twenty twenty five. So actually understanding how LLMs 
So actually understanding how llms 380 00:20:31.319 --> 00:20:34.000 come to the conclusions they come to, or sorry, how 381 00:20:34.079 --> 00:20:36.720 the predictions being made. Let's put it in more clinical terms, 382 00:20:37.200 --> 00:20:40.200 and that's going to help firstly make the models more efficient, 383 00:20:40.279 --> 00:20:43.119 but also demystify a lot of the assumptions we make 384 00:20:43.319 --> 00:20:46.480 about the predictions they make. Like we look at the prediction, 385 00:20:46.599 --> 00:20:49.519 we're like, oh, it's solving problems because if a person 386 00:20:49.599 --> 00:20:52.160 did that, it would be showing problem solving. Or the 387 00:20:52.240 --> 00:20:54.839 model's more intelligent because if a person did that, it 388 00:20:54.880 --> 00:20:58.000 would be showing more intelligence, but that's just us projecting. 389 00:20:58.119 --> 00:21:01.839 Sure, yeah, anther of morphisation. Now you know, I'm maybe 390 00:21:01.839 --> 00:21:03.880 I'm thinking about this the wrong way, but you know, 391 00:21:03.960 --> 00:21:05.720 as soon as you say that, I'm like, hey, there's like, 392 00:21:05.839 --> 00:21:08.559 what six hundred thousand words in the Oxford Dictionary that's 393 00:21:08.599 --> 00:21:11.039 just English and most people use fifteen hundred of them. 394 00:21:11.920 --> 00:21:14.599 So oh yeah, yeah, yeah. You know here you've built 395 00:21:14.640 --> 00:21:18.279 this model that has this huge potential range of comprehension 396 00:21:18.720 --> 00:21:21.759 and you're using a tiny subsect of it depending on 397 00:21:21.839 --> 00:21:24.400 what you're doing. Especially when we're coming at this from 398 00:21:24.440 --> 00:21:27.200 the copilot part of you was like, I'm working on code. 399 00:21:27.440 --> 00:21:31.200 Yeah, every symbol in the language is a is a 400 00:21:31.279 --> 00:21:32.440 word essentially right. 401 00:21:33.839 --> 00:21:37.920