1
00:00:01,080 --> 00:00:03,799
Speaker 1: How'd you like to listen to dot NetRocks with no ads?

2
00:00:04,440 --> 00:00:04,799
Speaker 2: Easy?

3
00:00:05,360 --> 00:00:08,560
Speaker 1: Become a patron for just five dollars a month. You

4
00:00:08,599 --> 00:00:11,320
get access to a private RSS feed where all the

5
00:00:11,359 --> 00:00:14,599
shows have no ads. Twenty dollars a month will get

6
00:00:14,599 --> 00:00:18,440
you that and a special dot NetRocks patron mug. Sign

7
00:00:18,519 --> 00:00:34,640
up now at patreon dot dot NetRocks dot com. Hey

8
00:00:34,880 --> 00:00:39,039
guess what it's dot NetRocks episode nineteen forty four. I'm

9
00:00:39,119 --> 00:00:40,399
Carl Franklin.

10
00:00:39,960 --> 00:00:44,359
Speaker 2: And I'm Richard Campbell. Nineteen forty four. Richard, I'm looking forward

11
00:00:44,359 --> 00:00:47,079
to the end of World War Two. Yeah, it's the

12
00:00:47,079 --> 00:00:50,679
beginning of the end. So nineteen forty four, the Allies

13
00:00:50,719 --> 00:00:55,200
launched D Day, the largest amphibious invasion in history, landing

14
00:00:55,200 --> 00:00:57,920
troops on the beaches of Normandy, France on June sixth,

15
00:00:58,679 --> 00:01:02,039
marking a turning point. In August, the Allied forces liberated

16
00:01:02,119 --> 00:01:08,280
Paris from Nazi occupation. You're welcome. In December, here's an

17
00:01:08,280 --> 00:01:10,879
anecdote to go with D Day if you like. Yeah.

18
00:01:10,959 --> 00:01:14,560
Out of concern for the soldiers on D Day, they mass

19
00:01:14,599 --> 00:01:17,000
produced penicillin for the very first time. There were two

20
00:01:17,000 --> 00:01:19,439
and a half million doses of penicillin made for the

21
00:01:19,519 --> 00:01:22,439
D Day invasion. That is so awesome. So post World

22
00:01:22,480 --> 00:01:27,400
War two, the reason we have antibiotics was that preparation. Yeah.

23
00:01:27,439 --> 00:01:30,400
Speaker 1: In December, the Battle of the Bulge, the Germans launched

24
00:01:30,439 --> 00:01:34,519
a major counter offensive in the Ardennes region of Belgium.

25
00:01:34,519 --> 00:01:37,120
Speaker 2: Did I say that right? Ardennes? In the Ardennes? Yeah?

26
00:01:37,200 --> 00:01:38,040
Yep, the Ardennes.

27
00:01:38,120 --> 00:01:42,560
Speaker 1: But the Allied forces eventually repelled the attack and in Rome,

28
00:01:42,760 --> 00:01:45,840
three hundred and thirty five Italians were killed in the

29
00:01:46,280 --> 00:01:49,599
Here's another thing I never learned to pronounce correctly in high school.

30
00:01:50,079 --> 00:01:56,480
A R D E A T I N E. Ardeatine, all right?

31
00:01:56,560 --> 00:01:56,640
Speaker 3: Right?

32
00:01:56,799 --> 00:01:59,359
Speaker 1: Ardeatine. We're going with that. The A R D E A T

33
00:01:59,599 --> 00:02:03,079
I N E massacre, including seventy five Jews and over

34
00:02:03,120 --> 00:02:07,280
two hundred members of the Italian resistance, from various groups.

35
00:02:07,680 --> 00:02:08,960
Speaker 2: So yeah, it's sort

36
00:02:08,800 --> 00:02:12,000
Speaker 1: Of the beginning of the end, the unwinding and leading

37
00:02:12,120 --> 00:02:17,039
up to the following year, nineteen forty five, which ended it.

38
00:02:17,159 --> 00:02:19,919
Speaker 2: Right. Yeah. It's also the year that the first plutonium

39
00:02:19,919 --> 00:02:22,759
was ever made, at the Hanford site in Washington, which will

40
00:02:22,759 --> 00:02:27,639
eventually lead to the bomb at Nagasaki. Yeah. And the Harvard

41
00:02:27,800 --> 00:02:31,960
Mark One was built by IBM based on a design

42
00:02:32,000 --> 00:02:35,479
from a professor at Harvard: thirty five hundred relays and a

43
00:02:35,560 --> 00:02:39,560
fifty foot long camshaft because computers were different back then. Yeah,

44
00:02:39,639 --> 00:02:42,800
they were. And famously, because it's a relay-based computer,

45
00:02:42,919 --> 00:02:47,120
the next version of this, they call, cleverly, the Mark Two, yeah,

46
00:02:47,159 --> 00:02:50,120
will have a moth get trapped in one of the relays,

47
00:02:50,400 --> 00:02:54,039
which Grace Hopper will find and remove and call the bug,

48
00:02:54,080 --> 00:02:56,080
and that will be the first bug, first bug in

49
00:02:56,080 --> 00:02:58,560
the machine. Yeah. We don't use a lot of relays

50
00:02:58,560 --> 00:02:59,479
in computers anymore.

51
00:02:59,639 --> 00:03:03,439
Speaker 1: Yeah. And before we get started with doctor Burchell, I

52
00:03:03,479 --> 00:03:07,240
wanted to just have you comment on the amazing recovery

53
00:03:07,360 --> 00:03:10,560
of the astronauts on the space station that happened this

54
00:03:10,759 --> 00:03:11,360
past week.

55
00:03:11,599 --> 00:03:13,599
Speaker 2: Really not that amazing. It went so perfectly, you know.

56
00:03:13,639 --> 00:03:17,159
It wasn't unexpected. Butch and Suni are both

57
00:03:17,319 --> 00:03:22,840
very experienced astronauts. When there were concerns about Starliner, they

58
00:03:22,960 --> 00:03:25,840
sent up the next crew

59
00:03:25,919 --> 00:03:28,439
on a Crew Dragon with only two passengers, so they

60
00:03:28,439 --> 00:03:30,520
had the two additional seats for them to come back

61
00:03:30,639 --> 00:03:34,879
at any time. Yeah. But since they had two extremely

62
00:03:35,000 --> 00:03:40,080
qualified astronauts already up, why pay to send them back

63
00:03:40,120 --> 00:03:42,280
down when you can put them to work and in fact,

64
00:03:42,280 --> 00:03:45,360
they put Suni in charge of the mission. She took

65
00:03:45,439 --> 00:03:49,000
over as mission commander for the station for the duration.

66
00:03:48,800 --> 00:03:50,759
Speaker 1: And she and Butch were happy to stay there. They

67
00:03:50,759 --> 00:03:52,280
were like, no, we don't want to come home.

68
00:03:52,439 --> 00:03:54,400
Speaker 2: Come on. Totally. They were never going to get to

69
00:03:54,400 --> 00:03:58,159
fly again. Those are retired astronauts, right, Yeah, so they

70
00:03:58,199 --> 00:03:59,919
got a great gig. Now that's going to take them

71
00:04:00,039 --> 00:04:03,080
more than a year to recover, which is also normal

72
00:04:03,360 --> 00:04:05,039
for a six month stay, and they had a nine

73
00:04:05,039 --> 00:04:09,039
month stay. Scott Kelly did a year, and you can

74
00:04:09,080 --> 00:04:11,120
read his book on this, Like, recovery is not a

75
00:04:11,159 --> 00:04:14,120
trivial thing. Yeah, I was watching him being interviewed. You know,

76
00:04:14,120 --> 00:04:17,240
you haven't walked on your feet in nine months, your vestibular

77
00:04:17,279 --> 00:04:20,160
system's messed up, your eyes have been bent out of shape. Like,

78
00:04:20,199 --> 00:04:23,639
it's not a small problem, right to recover from this.

79
00:04:23,759 --> 00:04:27,079
Speaker 1: Yeah, I watched them being interviewed on the news when it

80
00:04:27,160 --> 00:04:29,319
was happening. It's just still amazing to see that

81
00:04:29,519 --> 00:04:31,759
Falcon booster land.

82
00:04:31,600 --> 00:04:33,040
Speaker 2: Land on its tail perfectly.

83
00:04:33,480 --> 00:04:36,439
Speaker 1: Always it always is just going to be amazing to me.

84
00:04:36,639 --> 00:04:39,279
Speaker 2: Yeah, no, it's a miracle, it really is. The crazier thing is

85
00:04:39,519 --> 00:04:43,120
that Starship booster being caught out of

86
00:04:43,160 --> 00:04:47,399
the air. It's literally a twenty story, two hundred ton

87
00:04:47,600 --> 00:04:50,800
building that flies, yeah, and they catch it out of

88
00:04:50,839 --> 00:04:52,959
the air. So yeah, we are in amazing times. So

89
00:04:52,959 --> 00:04:55,800
the space industry has been fundamentally changed by this, right.

90
00:04:56,120 --> 00:04:59,120
The cost of flight is so much lower. It's hard

91
00:04:59,160 --> 00:05:01,160
to even get your head around what's actually going on

92
00:05:01,240 --> 00:05:03,600
up there right now. So it's very cool with the proliferation.

93
00:05:04,240 --> 00:05:06,560
That was a very good experience for me this week. I

94
00:05:06,639 --> 00:05:08,399
felt very good about it, all right.

95
00:05:08,480 --> 00:05:10,879
Speaker 1: So yeah, so that's a cue for me to roll

96
00:05:10,920 --> 00:05:12,399
the music for Better Know a Framework.

97
00:05:12,480 --> 00:05:21,800
Speaker 2: So that's awesome. All right, man, what do you got?

98
00:05:21,800 --> 00:05:25,199
Our good buddy, Simon Cropp, the genius. Simon Cropp

99
00:05:25,319 --> 00:05:28,360
the genius. This guy is just, he's so brilliant. He's

100
00:05:28,360 --> 00:05:31,839
brilliant and he comes up with solutions for things that

101
00:05:31,879 --> 00:05:33,120
you didn't even know you needed. Yeah.

102
00:05:33,240 --> 00:05:36,680
Speaker 1: But this one is called Cymbal. It's a new NuGet

103
00:05:36,720 --> 00:05:41,120
package, and it's an MSBuild task that enables bundling

104
00:05:41,199 --> 00:05:43,879
dot net symbols for references with a deployed app.

105
00:05:44,040 --> 00:05:44,319
Speaker 2: Nice.

106
00:05:44,480 --> 00:05:50,040
Speaker 1: The goal being to enable line numbers for exceptions in production.

107
00:05:50,519 --> 00:05:52,399
Speaker 2: Oh okay, that's interesting.

108
00:05:52,240 --> 00:05:55,079
Speaker 1: Yeah, because I guess you don't get that. Yeah, yeah,

109
00:05:55,120 --> 00:05:58,120
and this is what it does. So if

110
00:05:58,160 --> 00:06:01,959
you're in production you have an exception and yeah, I

111
00:06:01,959 --> 00:06:06,839
guess you log it, you're gonna see line numbers, all right, Yeah.

112
00:06:06,680 --> 00:06:09,759
Speaker 2: That's cool. You got to know he had that problem, right, like, yeah,

113
00:06:09,879 --> 00:06:11,680
this is clearly a guy who built the thing to

114
00:06:11,720 --> 00:06:14,040
fix a thing that he had, and now we all

115
00:06:14,040 --> 00:06:14,639
get to benefit.

116
00:06:14,800 --> 00:06:17,920
Speaker 1: Another alternative, I guess is just deploying the debug symbols

117
00:06:17,959 --> 00:06:20,560
with it, and now you're slowing things down in production.

118
00:06:20,600 --> 00:06:23,800
Speaker 2: So yeah, it's a lot more weight than just yeah,

119
00:06:24,000 --> 00:06:25,160
you know, use this library.
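
A quick aside for anyone who wants to try it: since Cymbal ships as an ordinary NuGet package, installing it should be the usual dotnet add package Cymbal one-liner; any extra MSBuild settings are worth checking against the SimonCropp slash Cymbal readme on GitHub, since the show doesn't go into setup details.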

120
00:06:25,360 --> 00:06:29,519
Speaker 1: So thank you, Simon. And SimonCropp slash Cymbal on

121
00:06:29,480 --> 00:06:31,360
Speaker 2: GitHub continues to be awesome.

122
00:06:31,480 --> 00:06:34,759
Speaker 1: C Y M B A L. Yeah, the musical thing, the musical thing,

123
00:06:34,800 --> 00:06:35,079
all right?

124
00:06:35,079 --> 00:06:37,040
Speaker 2: Who's talking to us? Richard grabbed a comment off of

125
00:06:37,040 --> 00:06:38,759
show eighteen thirty five, the one we did with

126
00:06:38,800 --> 00:06:41,040
our friend Mads Torgersen talking about the next C sharp

127
00:06:41,040 --> 00:06:43,399
because we've got a great comment LLM related. This is

128
00:06:43,439 --> 00:06:46,959
from Murray, who said Mads mentioned making sure language features

129
00:06:47,000 --> 00:06:49,560
work with the tooling, such as ordering and LINQ syntax.

130
00:06:50,000 --> 00:06:52,600
Increasingly with Copilot and other LLMs, this is part of

131
00:06:52,639 --> 00:06:56,319
the tooling. Yes, true. Obviously this comment is from a year ago,

132
00:06:56,360 --> 00:07:00,839
so you know, so much change has happened. It's challenging.

133
00:07:01,680 --> 00:07:04,160
So given a piece of code using a new C

134
00:07:04,319 --> 00:07:06,800
Sharp language feature, which is what Mads was talking about,

135
00:07:06,920 --> 00:07:09,399
have you tried asking ChatGPT or Copilot or whatever

136
00:07:09,560 --> 00:07:13,199
LLM to describe how that code works? If it

137
00:07:13,240 --> 00:07:16,680
gets it right, does it mean it's intuitive? Is an

138
00:07:16,800 --> 00:07:19,519
LLM's intuition, and at least you put that in quotes,

139
00:07:19,519 --> 00:07:22,319
because there is no intuition in software, a

140
00:07:22,439 --> 00:07:25,639
good approximation for the one that human programmers have, or

141
00:07:25,639 --> 00:07:28,720
a bad approximation? And if programmers are using Copilot, it

142
00:07:28,759 --> 00:07:32,639
doesn't matter about the human's intuition or the LLM's. Let's

143
00:07:32,639 --> 00:07:35,759
complicate this fact with next year's LLM, that would be now,

144
00:07:36,560 --> 00:07:40,600
which will probably be profoundly different. Yes, so, having said

145
00:07:40,600 --> 00:07:42,240
all that, it's probably best to just aim for the

146
00:07:42,319 --> 00:07:45,600
human and let the LLM catch up. Yeah, no intuition

147
00:07:45,720 --> 00:07:48,879
in software. The reality is, of course you would expect

148
00:07:48,920 --> 00:07:51,480
it to not understand a new language feature. There has

149
00:07:51,519 --> 00:07:54,120
to be some time for that language feature to be

150
00:07:54,160 --> 00:07:57,600
documented properly. The good news being as they keep regenerating

151
00:07:57,639 --> 00:08:01,360
these LLMs on a regular basis, and Microsoft builds these

152
00:08:01,399 --> 00:08:06,800
features in public view on GitHub even before it ships,

153
00:08:07,000 --> 00:08:11,720
it's likely in the knowledge base that is the LLM. Yeah, curiously,

154
00:08:12,000 --> 00:08:14,160
you know, in my last trip to Microsoft talking to folks,

155
00:08:14,199 --> 00:08:18,720
so what they're using, they've been using Claude Sonnet three seven.

156
00:08:18,879 --> 00:08:22,319
That's their favorite for working in dot net, which isn't

157
00:08:22,319 --> 00:08:27,480
that funny? Fascinating, But you know that's where it's at.

158
00:08:27,759 --> 00:08:30,680
So Murray, you're right, let's focus on the human understanding

159
00:08:30,720 --> 00:08:32,960
the language the most, because the software is only going

160
00:08:33,000 --> 00:08:35,519
to generate what it's got in its model, and it's

161
00:08:35,600 --> 00:08:38,120
up to you to assess it, although admittedly the compiler

162
00:08:38,200 --> 00:08:41,120
has a say also. Yes, and a copy of Music to

163
00:08:41,120 --> 00:08:42,679
Code By is on its way to you. If you'd like

164
00:08:42,679 --> 00:08:44,200
a copy of Music to Code By, write a comment

165
00:08:44,240 --> 00:08:46,679
on the website at dot NetRocks dot com, or on the Facebooks.

166
00:08:46,679 --> 00:08:48,320
We publish every show there, and if you comment there

167
00:08:48,320 --> 00:08:50,120
and we read it on the show, we'll send you a copy of

168
00:08:50,159 --> 00:08:50,720
Music to Code By.

169
00:08:50,840 --> 00:08:52,399
Speaker 1: And if you don't want to wait for that, or

170
00:08:52,480 --> 00:08:55,000
you have other ideas and you just want to buy

171
00:08:55,159 --> 00:08:57,279
Music to Code By, you can go to musictocodeby

172
00:08:57,320 --> 00:09:01,519
dot net. And track twenty two is newish, and

173
00:09:01,639 --> 00:09:04,519
you can get the entire collection in MP three, FLAC, or

174
00:09:04,600 --> 00:09:09,480
WAV for a very good deal. It's a very good price,

175
00:09:09,919 --> 00:09:13,879
So happy coding. All right, well, let's bring on doctor Burchell.

176
00:09:14,080 --> 00:09:18,639
Doctor Jodie Burchell is the developer advocate in data science

177
00:09:18,679 --> 00:09:22,200
at JetBrains and was previously a lead data scientist

178
00:09:22,200 --> 00:09:25,799
at Verve Group Europe. She completed a PhD in clinical

179
00:09:25,840 --> 00:09:30,320
psychology and a postdoc in biostatistics before leaving academia for

180
00:09:30,360 --> 00:09:34,039
a data science career. She has worked for seven years

181
00:09:34,039 --> 00:09:37,600
as a data scientist in both Australia and Germany, developing

182
00:09:37,639 --> 00:09:42,840
a range of products including recommendation systems, analysis platforms, search

183
00:09:42,879 --> 00:09:47,200
engine improvements and audience profiling. She's held a broad range

184
00:09:47,200 --> 00:09:51,159
of responsibilities in her career, doing everything from data analytics

185
00:09:51,159 --> 00:09:55,559
to maintaining machine learning solutions in production. She's a longtime

186
00:09:55,600 --> 00:10:01,159
content creator in data science across conference and user group presentations, books, webinars,

187
00:10:01,200 --> 00:10:04,320
and posts on both her own and JetBrains blogs.

188
00:10:04,639 --> 00:10:06,279
In other words, a slacker.

189
00:10:09,320 --> 00:10:11,200
Speaker 2: It occurs to me, Jodie, that you and I hang

190
00:10:11,240 --> 00:10:13,320
out several times a year at various conferences. But I

191
00:10:13,320 --> 00:10:15,320
don't know that Carl's had time with you since we

192
00:10:15,399 --> 00:10:18,080
did that show at Techorama. Techorama was the last time

193
00:10:18,120 --> 00:10:20,320
I saw you, a couple of years ago now.

194
00:10:20,519 --> 00:10:25,240
Speaker 3: Yeah, yeah, exactly, So it's been a long time actually, Yeah.

195
00:10:25,360 --> 00:10:27,200
Speaker 2: Things have changed. You're at JetBrains now.

196
00:10:27,240 --> 00:10:32,399
Speaker 3: I have, certainly I think changed a lot. Yeah, yes, yeah, yeah,

197
00:10:32,440 --> 00:10:34,480
I was at JetBrains when we first met as well,

198
00:10:34,519 --> 00:10:38,399
but I think I had only been there just over

199
00:10:38,440 --> 00:10:41,440
a year and so I was still like, I don't know,

200
00:10:41,559 --> 00:10:43,360
a little bit more shy, I think, a little bit

201
00:10:43,440 --> 00:10:44,279
less opinionated.

202
00:10:45,240 --> 00:10:47,480
Speaker 2: You've been hanging around with the troublemakers for a while.

203
00:10:47,320 --> 00:10:49,039
Speaker 3: Now, yeah, you talking about you?

204
00:10:49,519 --> 00:10:49,879
Speaker 2: Yeah?

205
00:10:49,879 --> 00:10:55,639
Speaker 3: Actually, well, and we're going to be hanging out in

206
00:10:55,679 --> 00:10:58,840
my hometown of Melbourne next month.

207
00:10:59,120 --> 00:11:03,039
Speaker 2: Yeah, we're excited about that, yeah, NDC. Yes, so, And

208
00:11:03,159 --> 00:11:06,159
of course I've got family in New Zealand, so I've

209
00:11:06,159 --> 00:11:08,039
got to do a little time in Sydney to see

210
00:11:08,039 --> 00:11:09,759
some folks there, and then I'll be in Melbourne for

211
00:11:09,799 --> 00:11:12,159
the show with you, and then a week on the

212
00:11:12,200 --> 00:11:15,559
farm hanging with the cows and the cousins and the

213
00:11:15,600 --> 00:11:20,039
sheep. And the sheep? No sheep. The sheep? What, sheep's

214
00:11:20,080 --> 00:11:23,399
the South Island thing? No sheep on the farm. No, no,

215
00:11:23,440 --> 00:11:26,720
it's a dairy farm. A dairy farm. Yeah. And

216
00:11:26,759 --> 00:11:29,840
by the way, cows are awesome. Sheep are dumb, dumb,

217
00:11:29,879 --> 00:11:35,320
dumb dumb, holy cow dumb. But they're tasty. Like how

218
00:11:35,480 --> 00:11:39,000
Jodie says they're cute. I say they're tasty. Tasty. Where

219
00:11:39,120 --> 00:11:41,320
my mind is at, that's in the cow. The cows

220
00:11:41,320 --> 00:11:44,039
are smart enough that if they're actually having distress, you know,

221
00:11:44,080 --> 00:11:47,200
in birthing or anything, they will come for help. Wow. Right,

222
00:11:47,360 --> 00:11:50,399
Like they're bright and they and they follow the they

223
00:11:50,399 --> 00:11:52,120
follow the gates of the packs where you want them

224
00:11:52,120 --> 00:11:53,279
to go. But it doesn't mean they don't know how

225
00:11:53,279 --> 00:11:55,039
to open them themselves if they really wanted to. I've

226
00:11:55,039 --> 00:11:57,440
seen them do it. Yeah, damn, they're just playing along.

227
00:11:57,559 --> 00:12:01,000
Cows are great, they really are. And LLMs are great.

228
00:12:01,159 --> 00:12:06,279
Speaker 3: Right, in the right settings. Yeah, they are great.

229
00:12:06,679 --> 00:12:09,639
Speaker 2: Yes. But even at that show we did in twenty three,

230
00:12:09,759 --> 00:12:11,519
you know you were the grown up in the room there,

231
00:12:11,879 --> 00:12:15,120
saying, like, listen, there were limits, like that.

232
00:12:15,360 --> 00:12:18,039
We were so hypeish in twenty three, not that it's

233
00:12:18,080 --> 00:12:21,200
all calm and rational in twenty five, but it's so.

234
00:12:21,240 --> 00:12:24,039
Speaker 3: Funny actually, because I remember I was this was the

235
00:12:24,080 --> 00:12:26,519
first talk I did on LLMs, so that one at

236
00:12:26,559 --> 00:12:28,799
Techorama actually was the first one I ever did.

237
00:12:29,159 --> 00:12:29,840
Speaker 2: No free lunch.

238
00:12:30,000 --> 00:12:32,440
Speaker 3: Yeah yeah, yeah yeah, And I was I was actually

239
00:12:32,519 --> 00:12:35,399
really scared of getting up and giving my opinion, like

240
00:12:35,440 --> 00:12:39,480
being a contrarian. Obviously, I'm feeling so vindicated right now.

241
00:12:39,600 --> 00:12:41,960
Speaker 2: But it's right, isn't it.

242
00:12:41,960 --> 00:12:45,679
Speaker 3: It's great being right, but it's I will say, like

243
00:12:45,799 --> 00:12:48,080
the hype has died slower than I thought it would.

244
00:12:48,120 --> 00:12:50,960
So I think DeepSeek finally has spelled the beginning

245
00:12:50,960 --> 00:12:51,240
of the.

246
00:12:51,279 --> 00:12:55,039
Speaker 2: End, but not the end of the business, but the

247
00:12:56,000 --> 00:12:57,080
end of the hype cycle.

248
00:12:57,440 --> 00:12:58,480
Speaker 3: The end of the hype cycle.

249
00:12:58,600 --> 00:13:01,200
Speaker 2: Okay, I appreciate that.

250
00:13:00,960 --> 00:13:05,519
Speaker 3: The approach to how we're going to be, I guess, manufacturing these models,

251
00:13:06,240 --> 00:13:10,440
deploying these models, and thinking about these models fundamentally changed

252
00:13:10,440 --> 00:13:13,879
with DeepSeek. So it sort of showed that

253
00:13:14,039 --> 00:13:17,159
this hyperinvestment in data centers, which was kicking off with

254
00:13:17,200 --> 00:13:20,799
the Stargate project in the US. To explain context to

255
00:13:20,799 --> 00:13:21,919
anyone in the audience who doesn't know.

256
00:13:21,919 --> 00:13:24,120
Speaker 2: It, five hundred billion dollars.

257
00:13:23,799 --> 00:13:27,600
Speaker 3: An intended five hundred billion dollar investment between OpenAI,

258
00:13:27,919 --> 00:13:32,039
the US government, and I think Microsoft was involved so I.

259
00:13:31,960 --> 00:13:33,240
Speaker 2: Think Microsoft pulled out of it.

260
00:13:33,240 --> 00:13:37,080
Speaker 3: It was Oracle. Oh, okay, got you. Yeah, yeah, that

261
00:13:37,440 --> 00:13:38,320
just got announced.

262
00:13:39,440 --> 00:13:41,279
Speaker 2: Yeah, there was a little political game here that

263
00:13:41,320 --> 00:13:43,399
was also going around the town. They sort of announced this, Hey,

264
00:13:44,080 --> 00:13:45,759
you know, I know we had this deal with Open

265
00:13:45,759 --> 00:13:47,720
AI where they're going to run on Azure, but we're

266
00:13:47,799 --> 00:13:50,240
ready to let that go. I think it was because

267
00:13:50,279 --> 00:13:52,759
of Stargate that. Yeah, you know, there was sort of

268
00:13:52,799 --> 00:13:55,000
this pressure on Microsoft. You have to keep growing, growing, growing,

269
00:13:55,039 --> 00:13:57,039
and they're like, this is getting irrational. So if you

270
00:13:57,080 --> 00:13:59,159
want to go play with someone else, you knock yourself out.

271
00:13:59,240 --> 00:14:02,039
So bringing it back to DeepSeek for a minute. From

272
00:14:02,080 --> 00:14:05,440
what I understand, you know, OpenAI and all

273
00:14:05,480 --> 00:14:08,039
these other models are looking at that and learning from

274
00:14:08,080 --> 00:14:10,200
it and figuring out how to make their own models

275
00:14:10,240 --> 00:14:16,600
more efficient. And at one point I heard that the

276
00:14:17,480 --> 00:14:20,360
Chinese model is, you know, hey, let's spend a lot

277
00:14:20,440 --> 00:14:24,240
less money on these things so that they're less expensive.

278
00:14:24,360 --> 00:14:26,200
We don't have to use as many processors and all

279
00:14:26,240 --> 00:14:29,960
that stuff. And I think I heard that, you know,

280
00:14:30,039 --> 00:14:34,519
the response from the American companies was, oh no, we're

281
00:14:34,600 --> 00:14:36,639
just going to make it ten times more one hundred

282
00:14:36,639 --> 00:14:40,759
times more powerful, you know, so a different kind of

283
00:14:40,840 --> 00:14:44,240
mindset. But that was originally. Now I think that

284
00:14:44,360 --> 00:14:52,879
there's more of a desire to get smaller LLMs, right, yeah,

285
00:14:53,080 --> 00:14:54,120
that are more specialized.

286
00:14:54,360 --> 00:14:59,039
Speaker 3: The nuance of the story is that basically we've

287
00:14:59,120 --> 00:15:01,960
known that there are ways to make neural nets more efficient,

288
00:15:02,120 --> 00:15:05,399
right like, there are ways of making the models smaller,

289
00:15:05,799 --> 00:15:09,039
or after you've trained them, actually trimming them down and

290
00:15:10,480 --> 00:15:13,039
getting the same performance or almost the same performance for

291
00:15:13,200 --> 00:15:16,919
a much smaller number of parameters. We've also known for quite

292
00:15:16,919 --> 00:15:18,960
a long time, and this is true with any machine

293
00:15:19,039 --> 00:15:22,279
learning model, that the higher the quality of the data,

294
00:15:22,879 --> 00:15:24,960
you know, the better the model can perform for a much

295
00:15:25,000 --> 00:15:27,440
smaller number of parameters. So this was proven last year

296
00:15:27,480 --> 00:15:29,919
with the Falcon last year or the year before with

297
00:15:30,000 --> 00:15:32,919
the Falcon models, they were sort of the first big

298
00:15:33,000 --> 00:15:35,360
open source ones that were trained on higher quality data

299
00:15:35,440 --> 00:15:38,240
sets and got a lot more performance for less parameters.
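
The "trimming them down" idea is easy to see in miniature. Here's a toy magnitude-pruning sketch in Python with NumPy, using a random matrix as a stand-in for a trained layer rather than any real model:

    import numpy as np

    def magnitude_prune(weights, sparsity):
        """Zero out the smallest-magnitude fraction of the weights."""
        flat = np.abs(weights).ravel()
        k = int(flat.size * sparsity)
        if k == 0:
            return weights.copy()
        threshold = np.partition(flat, k)[k]  # k-th smallest magnitude
        return np.where(np.abs(weights) >= threshold, weights, 0.0)

    rng = np.random.default_rng(0)
    W = rng.normal(size=(256, 256))  # stand-in for one trained layer
    x = rng.normal(size=256)
    W_pruned = magnitude_prune(W, 0.7)
    # Even with 70 percent of the weights zeroed, the pruned layer's
    # output usually stays highly correlated with the original's.
    print(np.corrcoef(W @ x, W_pruned @ x)[0, 1])

Real pruning and distillation pipelines are much more involved, but that's the core intuition behind getting similar performance from fewer parameters.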

300
00:15:38,759 --> 00:15:42,200
But the most reliable way to get better performance was

301
00:15:42,320 --> 00:15:48,759
to scale. And I think what happened, the story I've

302
00:15:48,799 --> 00:15:52,279
heard is that in China they just couldn't get access

303
00:15:52,360 --> 00:15:57,720
to the same size of GPUs because of sanctions. Not

304
00:15:57,840 --> 00:16:02,000
sanctions exactly, basically they weren't being sold in China, and so

305
00:16:02,320 --> 00:16:05,120
they had to make do with older and much less

306
00:16:05,200 --> 00:16:07,440
efficient processors, and they had to do all these tricks

307
00:16:07,519 --> 00:16:11,159
to basically share the training across a bunch of smaller machines.

308
00:16:11,799 --> 00:16:16,919
So this meant that they just couldn't create absolutely massive models.

309
00:16:17,679 --> 00:16:20,759
And essentially this meant that, yeah, they were forced to

310
00:16:20,799 --> 00:16:24,240
create a smaller model. But you know, the thing is

311
00:16:24,360 --> 00:16:28,519
is the quality of AI researchers and AI engineers that

312
00:16:28,600 --> 00:16:32,440
are being employed at companies like OpenAI and Anthropic

313
00:16:32,559 --> 00:16:35,480
and companies like this. I'm sure that they knew it

314
00:16:35,600 --> 00:16:38,279
was possible. It was just as I understood it, a

315
00:16:38,360 --> 00:16:42,759
less reliable path to performance. And you know, the American

316
00:16:42,799 --> 00:16:45,519
companies had they had the money and they had the

317
00:16:45,720 --> 00:16:48,679
servers to train it, so why not go big?

318
00:16:49,000 --> 00:16:53,120
Speaker 2: And they understand that race right like they understand build bigger,

319
00:16:53,279 --> 00:16:56,200
keep going like it's a very American approach to things. Yes,

320
00:16:56,399 --> 00:16:59,559
you can always tune later, right, do your land grab now, but.

321
00:16:59,600 --> 00:17:03,799
Speaker 1: Also that there's a difference between having one huge model

322
00:17:03,919 --> 00:17:08,640
like, you know, ChatGPT, that knows everything and has bazillions

323
00:17:08,680 --> 00:17:11,799
of nodes or whatever it is, and then can you know,

324
00:17:12,079 --> 00:17:16,359
can cross reference things, right, and connect the dots

325
00:17:17,119 --> 00:17:20,519
very much in ways that humans do, but in even

326
00:17:20,640 --> 00:17:26,160
more broadly, Whereas if you have smaller, less expensive models

327
00:17:26,240 --> 00:17:31,480
that are just LLMs that are trained on specific data, right,

328
00:17:32,079 --> 00:17:35,880
you'll probably get more accurate things out of them

329
00:17:36,079 --> 00:17:41,279
for that particular set you know, that particular context maybe

330
00:17:41,880 --> 00:17:45,000
and then be able to have many of those with

331
00:17:45,359 --> 00:17:48,880
that have different expertise, but you won't necessarily be able

332
00:17:48,960 --> 00:17:51,160
to connect the dots

333
00:17:51,319 --> 00:17:53,799
like a large, huge model can.

334
00:17:53,960 --> 00:17:56,640
Speaker 3: Right. This can actually lead into a further discussion about

335
00:17:56,720 --> 00:18:01,039
measurement if we want. But basically, looking at the current

336
00:18:01,079 --> 00:18:05,400
benchmarks that they're using to assess performance of LLMs, DeepSeek

337
00:18:05,559 --> 00:18:08,880
and smaller models coming out of China are actually rivaling

338
00:18:09,000 --> 00:18:12,880
the performance of larger models. So basically the understanding seems

339
00:18:12,920 --> 00:18:15,160
to be is that a lot of the parameters that

340
00:18:15,279 --> 00:18:19,680
these big models have are not actually being used every

341
00:18:19,799 --> 00:18:23,720
single time you try to do like inference for a

342
00:18:23,799 --> 00:18:26,559
particular task. It's only a subset of the parameters. So

343
00:18:26,880 --> 00:18:29,559
the way to think about parameters is think about neural

344
00:18:29,640 --> 00:18:32,400
nets as like you have inputs and then you have

345
00:18:32,480 --> 00:18:35,880
a bunch of neurons that are connected by what are

346
00:18:35,920 --> 00:18:39,079
called weights. They're basically multipliers, and you can kind of

347
00:18:39,160 --> 00:18:43,400
think about inference as a path that you take through

348
00:18:43,559 --> 00:18:46,519
the neural net, where like, you know, the whole thing's

349
00:18:46,559 --> 00:18:49,640
going to be used, but only certain weights will actually

350
00:18:49,839 --> 00:18:54,480
have an impact for particular types of tasks. And it

351
00:18:54,559 --> 00:18:57,480
sort of seems that what's happened with scaling down these

352
00:18:57,559 --> 00:19:01,039
models is that because they learned on so much data,

353
00:19:01,359 --> 00:19:03,119
and so much of the data seems to have not

354
00:19:03,240 --> 00:19:06,960
been high quality, that they really, like a lot of

355
00:19:07,079 --> 00:19:10,799
the parameters were not really being used in the majority

356
00:19:10,880 --> 00:19:13,519
of cases, they were just dead weight. I see.

357
00:19:14,359 --> 00:19:17,319
Speaker 1: And so, if you wanted to translate parameters and

358
00:19:17,400 --> 00:19:21,119
neurons to language, we're talking about the probability of the

359
00:19:21,240 --> 00:19:24,720
next word that it spits out. Exactly right. Yeah, and

360
00:19:24,799 --> 00:19:29,759
what you're saying is that they're only choosing from parameters

361
00:19:29,839 --> 00:19:30,640
with higher weights.

362
00:19:31,240 --> 00:19:34,319
Speaker 3: Yeah, it's like, or words

363
00:19:34,079 --> 00:19:34,799
Speaker 2: With higher weights.

364
00:19:35,279 --> 00:19:37,480
Speaker 3: Yeah. So basically the way it works is, like you

365
00:19:37,559 --> 00:19:40,799
think about the last layer of the neural net is

366
00:19:40,920 --> 00:19:44,440
basically like all the words in the vocabulary. So it's

367
00:19:44,440 --> 00:19:48,279
obviously really really huge, and so the whole neural net

368
00:19:48,440 --> 00:19:51,960
is trying to predict to the probability of which of

369
00:19:52,079 --> 00:19:54,880
these words is the most likely to come next. So

370
00:19:55,240 --> 00:19:58,400
it's basically saying that for a particular input, only a

371
00:19:58,480 --> 00:20:01,480
subset of that, you know, the paths that go through

372
00:20:01,519 --> 00:20:03,720
the neural net are actually going to give good information

373
00:20:04,359 --> 00:20:08,240
about what the next word is. And so yeah,

374
00:20:09,440 --> 00:20:13,279
it's also like it's kind of fascinating because the models

375
00:20:13,319 --> 00:20:17,039
are such black boxes. No one fully understands how the

376
00:20:17,519 --> 00:20:20,599
decisions are being made. I'm putting decisions in air quotes.

377
00:20:20,640 --> 00:20:25,200
I want to make this clear.

378
00:20:25,279 --> 00:20:28,000
But interpretability is actually becoming a really hot

379
00:20:28,079 --> 00:20:31,279
area in twenty twenty five. So actually understanding how LLMs

380
00:20:31,319 --> 00:20:34,000
come to the conclusions they come to, or sorry, how

381
00:20:34,079 --> 00:20:36,720
the predictions are being made, to put it in more clinical terms,

382
00:20:37,200 --> 00:20:40,200
and that's going to help firstly make the models more efficient,

383
00:20:40,279 --> 00:20:43,119
but also demystify a lot of the assumptions we make

384
00:20:43,319 --> 00:20:46,480
about the predictions they make. Like we look at the prediction,

385
00:20:46,599 --> 00:20:49,519
we're like, oh, it's solving problems because if a person

386
00:20:49,599 --> 00:20:52,160
did that, it would be showing problem solving. Or the

387
00:20:52,240 --> 00:20:54,839
model's more intelligent because if a person did that, it

388
00:20:54,880 --> 00:20:58,000
would be showing more intelligence, but that's just us projecting.
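
To make the next-word mechanics concrete: the final layer produces a score for every token in the vocabulary, a softmax turns those scores into probabilities, and decoding picks from that distribution. A toy Python sketch with a made-up four-word vocabulary and invented logits, not any real model's numbers:

    import numpy as np

    vocab = ["relay", "garden", "code", "lamp"]
    logits = np.array([3.1, 0.4, 1.7, 0.2])  # invented final-layer scores

    def softmax(z):
        z = z - z.max()  # subtract the max for numerical stability
        e = np.exp(z)
        return e / e.sum()

    probs = softmax(logits)
    for word, p in zip(vocab, probs):
        print(f"{word}: {p:.2f}")

    # Greedy decoding takes the single highest-probability word;
    # samplers instead draw from (a truncated form of) this distribution.
    print("next word:", vocab[int(np.argmax(probs))])

That distribution over the vocabulary is the whole sense in which the model "chooses" anything, which is why reading intelligence into it is projection.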

389
00:20:58,119 --> 00:21:01,839
Speaker 2: Sure, yeah, anthropomorphization. Now you know, maybe

390
00:21:01,839 --> 00:21:03,880
I'm thinking about this the wrong way, but you know,

391
00:21:03,960 --> 00:21:05,720
as soon as you say that, I'm like, hey, there's like,

392
00:21:05,839 --> 00:21:08,559
what six hundred thousand words in the Oxford Dictionary that's

393
00:21:08,599 --> 00:21:11,039
just English and most people use fifteen hundred of them.

394
00:21:11,920 --> 00:21:14,599
So oh yeah, yeah, yeah. You know here you've built

395
00:21:14,640 --> 00:21:18,279
this model that has this huge potential range of comprehension

396
00:21:18,720 --> 00:21:21,759
and you're using a tiny subset of it depending on

397
00:21:21,839 --> 00:21:24,400
what you're doing. Especially when we're coming at this from

398
00:21:24,440 --> 00:21:27,200
the Copilot point of view, it's like, I'm working on code.

399
00:21:27,440 --> 00:21:31,200
Speaker 1: Yeah, every symbol in the language is a

400
00:21:31,279 --> 00:21:32,440
word essentially right.

401
00:21:33,839 --> 00:21:37,920
Speaker 2: So, but you also talked about performance. And my immediate

402
00:21:38,160 --> 00:21:41,000
reaction was, so, what do we mean when we say performance?

403
00:21:42,400 --> 00:21:42,559
Speaker 3: Yes?

404
00:21:42,880 --> 00:21:45,440
Speaker 2: Is that speed? Is that a speed measurement or is

405
00:21:45,519 --> 00:21:47,079
that an accuracy measurement?

406
00:21:47,440 --> 00:21:51,200
Speaker 3: Yeah? So to kind of put this in context, I

407
00:21:51,240 --> 00:21:55,759
gave a keynote at NDC Porto about all the hairy

408
00:21:55,880 --> 00:21:59,200
things that go along with assessing LLMs. So I didn't

409
00:21:59,200 --> 00:22:02,480
get into speed. We can come back to that if

410
00:22:02,519 --> 00:22:05,039
we get time. But it's more about like how do

411
00:22:05,119 --> 00:22:08,920
people judge if these models are good? And last time

412
00:22:09,000 --> 00:22:11,480
we talked and you gave the episode this name, we

413
00:22:11,559 --> 00:22:13,759
talked about the concept of there's no free lunch in

414
00:22:13,880 --> 00:22:17,880
machine learning, and what this means is there is no

415
00:22:19,000 --> 00:22:22,599
there's no one model that will be best for every

416
00:22:22,759 --> 00:22:24,359
possible task you can do.

417
00:22:24,960 --> 00:22:25,119
Speaker 2: Right.

418
00:22:25,359 --> 00:22:27,839
Speaker 3: But what we've seen with the way people talk about

419
00:22:28,000 --> 00:22:32,680
LLMs is they're advertised exactly like this. Like it's like, oh,

420
00:22:33,519 --> 00:22:35,960
OpenAI just came out with the one model, and

421
00:22:36,079 --> 00:22:39,359
it is the best model on the market, right, right,

422
00:22:39,640 --> 00:22:43,519
And even if we're not, let's put like engineering considerations aside,

423
00:22:43,559 --> 00:22:45,519
let's talk about like, let's put cost aside, let's put

424
00:22:45,559 --> 00:22:47,519
speed aside. That's still not going to be true.

425
00:22:47,839 --> 00:22:49,920
Speaker 1: It's like who's the best guitar player in the world?

426
00:22:51,000 --> 00:22:52,799
Speaker 3: Yes, how do you measure this?

427
00:22:53,079 --> 00:22:55,880
Speaker 2: That's an impossible question to answer. Well, I think when they

428
00:22:55,920 --> 00:22:57,720
were saying best at that time, we were talking the largest

429
00:22:57,799 --> 00:22:59,559
number of parameters, weren't they?

430
00:23:00,400 --> 00:23:03,759
Speaker 3: Well, what they're talking about is there's this suite of

431
00:23:04,240 --> 00:23:09,319
benchmarks that are designed to assess LLM performance. And we

432
00:23:09,440 --> 00:23:12,480
talked about this last time. But LLMs were originally designed

433
00:23:12,559 --> 00:23:17,680
to be natural language processing task generalists. So they're good

434
00:23:17,720 --> 00:23:20,599
at doing a range of natural language tasks, often without

435
00:23:20,720 --> 00:23:22,480
further training out of the box, so they can do

436
00:23:22,559 --> 00:23:29,680
things like classification, summarization, they can do translation, things like this.

437
00:23:30,680 --> 00:23:35,920
So generally, when these models were first designed, they were

438
00:23:36,200 --> 00:23:39,319
benchmarked against how well they could do these natural language tasks,

439
00:23:39,359 --> 00:23:42,640
like specific things like question answering, translation, blah blah blah.

440
00:23:43,759 --> 00:23:48,279
But as as the capabilities of the models have grown,

441
00:23:48,559 --> 00:23:51,759
or maybe they seem to have grown, we don't know.

442
00:23:52,680 --> 00:23:55,079
What we started doing is getting them to do things

443
00:23:55,319 --> 00:23:58,920
like grade school math problems, or we've gotten them to

444
00:23:59,000 --> 00:24:02,920
do suites of questions that are designed to assess problem

445
00:24:03,000 --> 00:24:06,680
solving or blah blah blah. And then what we do

446
00:24:07,119 --> 00:24:10,240
is we collate a bunch of these gold standard measures

447
00:24:10,279 --> 00:24:12,880
together and we combine them in such a way, and

448
00:24:12,960 --> 00:24:15,480
we create leader boards and we rank these models and

449
00:24:15,559 --> 00:24:18,160
we say, oh, Okay, this model is the best because

450
00:24:18,160 --> 00:24:20,640
it did the best at the MMLU, which is like

451
00:24:20,759 --> 00:24:25,680
a reasoning benchmark, or this one's the best because it

452
00:24:25,799 --> 00:24:28,880
did the best at like a collated collection of all

453
00:24:28,960 --> 00:24:32,240
of these benchmarks. So it's doing well on reasoning, and

454
00:24:32,359 --> 00:24:34,680
it's doing well on problem solving, and it's doing well

455
00:24:34,720 --> 00:24:38,440
on math, and it's doing well on coding. But this

456
00:24:38,640 --> 00:24:42,519
is the thing, like, firstly, a lot of these measures

457
00:24:43,000 --> 00:24:46,839
have been found to have serious problems. Then they've been

458
00:24:46,880 --> 00:24:49,279
found to really not measure what they claim

459
00:24:49,319 --> 00:24:53,799
to measure in a variety of ways. And the second is, Okay,

460
00:24:54,079 --> 00:24:58,200
I am an application developer. I want to design an

461
00:24:58,240 --> 00:25:00,960
application that uses an LLM. Say I want to make

462
00:25:01,359 --> 00:25:06,240
a chatbot that can help people plan their holidays. What

463
00:25:06,440 --> 00:25:09,599
does it matter to me that an LLM is

464
00:25:09,799 --> 00:25:15,160
really good at solving science problems, grade school math problems,

465
00:25:16,079 --> 00:25:18,480
Like is that going to be good for my application?

466
00:25:18,960 --> 00:25:22,720
Speaker 2: So, got a calculator? Do you have, like, okay, got

467
00:25:22,799 --> 00:25:24,119
coverage and.

468
00:25:24,160 --> 00:25:26,960
Speaker 3: It's probably going to do the math wrong anywhere because

469
00:25:27,200 --> 00:25:33,119
they're not symbolically simulating. Exactly, they do that. But

470
00:25:33,160 --> 00:25:36,240
Speaker 2: Then you also have it return the response in the

471
00:25:36,319 --> 00:25:37,119
form of a limerick.

472
00:25:39,240 --> 00:25:41,240
Speaker 3: That's fantastic. It's what our customers needed.

473
00:25:41,680 --> 00:25:41,960
Speaker 2: That's it.

474
00:25:44,680 --> 00:25:47,039
Speaker 3: So yeah, this is part of the problem. The way

475
00:25:47,079 --> 00:25:49,680
we talk about, the way we talk about LLMs

476
00:25:50,200 --> 00:25:53,000
is we talk about them like they are a thing

477
00:25:53,279 --> 00:25:57,519
independent of machine learning, but they are absolutely not. And

478
00:25:58,640 --> 00:26:00,279
part of the problem with that is it means that

479
00:26:00,319 --> 00:26:02,000
the way that we use them is we tend to

480
00:26:02,079 --> 00:26:06,839
trust their outputs too much, and we also tend to

481
00:26:08,000 --> 00:26:11,759
you know, not have scrutiny about like whether a model

482
00:26:11,839 --> 00:26:14,519
is the best fit for our use case, so we

483
00:26:14,599 --> 00:26:19,000
don't design assessments to see like is this actually doing

484
00:26:19,079 --> 00:26:21,799
what it's supposed to do, which we would absolutely do

485
00:26:21,920 --> 00:26:23,160
with traditional machine learning.

486
00:26:23,359 --> 00:26:27,960
Speaker 1: I have had the experience of using LLMs in, you know,

487
00:26:28,079 --> 00:26:32,640
both ChatGPT and Copilot to help with coding things,

488
00:26:32,880 --> 00:26:36,799
and I found a situation where I asked it to

489
00:26:37,559 --> 00:26:41,359
do something, you know, to write something, and instead of

490
00:26:41,880 --> 00:26:44,640
pointing to something in the framework that already did that

491
00:26:45,559 --> 00:26:47,920
and say why don't you just use this, it just

492
00:26:48,000 --> 00:26:52,359
went ahead creating the thing, you know, reinventing the wheel.

493
00:26:53,240 --> 00:26:55,640
And then you know, an hour later, I've got something

494
00:26:55,680 --> 00:26:58,160
that works. But I'm like, hey, there's something in the

495
00:26:58,200 --> 00:26:59,519
framework that works just like this.

496
00:27:00,319 --> 00:27:00,519
Speaker 2: Yeah.

497
00:27:02,039 --> 00:27:05,119
Speaker 1: So it's that's why you need a human in the

498
00:27:05,319 --> 00:27:06,000
in the equation.

499
00:27:06,440 --> 00:27:08,920
Speaker 2: Although, although, was there a prompt there to say, is

500
00:27:09,039 --> 00:27:10,920
there a class that does X?

501
00:27:11,279 --> 00:27:13,880
Speaker 1: Well, that would have been yeah, that's the human error

502
00:27:13,960 --> 00:27:16,880
that because that should be the first question. It's like

503
00:27:16,960 --> 00:27:19,079
when somebody says, you know, I have an idea for

504
00:27:19,160 --> 00:27:22,559
an app, and my first question is, well, first of all,

505
00:27:23,119 --> 00:27:24,720
I don't I'm not going to write it for you

506
00:27:24,839 --> 00:27:25,599
unless you pay.

507
00:27:25,559 --> 00:27:28,640
Speaker 2: Me. And second of all, does it already exist? And

508
00:27:28,799 --> 00:27:31,400
the answer is usually yeah, if it's really that good

509
00:27:31,440 --> 00:27:36,039
of an idea, somebody else has done it. Okay, well,

510
00:27:36,039 --> 00:27:37,480
why don't we take a break now. I want to

511
00:27:37,519 --> 00:27:39,720
dig into some of these evaluation strategies.

512
00:27:39,960 --> 00:27:43,119
Speaker 1: All right, we'll be right back after these very important messages.

513
00:27:43,119 --> 00:27:46,200
Stay tuned. Do you have a complex dot net monolith

514
00:27:46,240 --> 00:27:50,119
you'd like to refactor to a microservices architecture? The micro

515
00:27:50,240 --> 00:27:53,920
Service Extractor for dot Net tool visualizes your app and

516
00:27:54,119 --> 00:27:58,079
helps progressively extract code into micro services. Learn more at

517
00:27:58,160 --> 00:28:02,519
aws dot Amazon dot com, slash modernize.

518
00:28:05,279 --> 00:28:07,200
Speaker 2: And we're back. It's dot Net Rocks. I'm Richard Campbell, that's

519
00:28:07,240 --> 00:28:12,319
Carl Franklin, talking to doctor Jodie Burchell. Hi. And if

520
00:28:12,759 --> 00:28:16,519
you don't enjoy those those ads and you'd like an alternative,

521
00:28:16,599 --> 00:28:19,279
we do have a Patreon that provides an ad free feed.

522
00:28:19,880 --> 00:28:22,240
Go to patreon dot com. Check it out: patreon

523
00:28:22,279 --> 00:28:25,200
dot dot NetRocks dot com. Yeah, so I found the

524
00:28:25,279 --> 00:28:28,680
DeepEval site that talks about MMLU. But nearest I

525
00:28:28,720 --> 00:28:31,039
can tell this is just a set of questions in

526
00:28:31,119 --> 00:28:32,279
different topic areas.

527
00:28:32,920 --> 00:28:41,799
Speaker 3: Yes, yes, so let's talk about benchmarks. So there

528
00:28:42,160 --> 00:28:47,240
is a very famous leaderboard called the Hugging Face

529
00:28:48,400 --> 00:28:52,680
Open LLM Leaderboard, something like that. Okay. So Hugging

530
00:28:52,759 --> 00:28:56,799
Face is a company. They're based in France and basically

531
00:28:57,119 --> 00:29:00,480
what they do in their open source branch is provide

532
00:29:00,599 --> 00:29:03,720
access to all of the major open source what are

533
00:29:03,720 --> 00:29:07,920
called foundational models, so big LLMs that are open

534
00:29:08,000 --> 00:29:14,960
source, computer vision models, those that can generate audio, you know,

535
00:29:15,079 --> 00:29:19,119
do transcription, all these sort of things. And so Hugging

536
00:29:19,200 --> 00:29:23,160
Face take the open source models. They run these models

537
00:29:23,200 --> 00:29:25,440
against a suite of benchmarks and then they collate them.
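
Mechanically, scoring a multiple-choice benchmark like the MMLU is simple, which is part of why these suites are so popular. A minimal Python sketch, with a hypothetical ask_model stand-in (real harnesses typically pick the choice whose tokens the model assigns the highest probability):

    from dataclasses import dataclass

    @dataclass
    class Question:
        prompt: str
        choices: list  # e.g. four answer options
        answer: int    # index of the correct option

    def ask_model(prompt, choices):
        """Hypothetical stand-in for a real LLM call."""
        return 0  # placeholder choice

    def benchmark_accuracy(questions):
        correct = sum(ask_model(q.prompt, q.choices) == q.answer
                      for q in questions)
        return correct / len(questions)

The resulting accuracy number is only as meaningful as the questions behind it, which is exactly the problem with the items described next.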

538
00:29:26,480 --> 00:29:29,880
And they used to have a

539
00:29:30,000 --> 00:29:33,640
leaderboard up until June last year. This was the first

540
00:29:33,759 --> 00:29:40,039
version, and it included scales like HellaSwag and the MMLU.

541
00:29:41,359 --> 00:29:43,480
So it got retired for a couple of reasons. But

542
00:29:43,799 --> 00:29:46,319
one of the reasons it got retired is people started

543
00:29:46,359 --> 00:29:51,359
going through the questions and MMLU was bad. It had

544
00:29:51,400 --> 00:29:55,079
a few questions that literally were like I think one

545
00:29:55,119 --> 00:29:59,400
of them was something like the continuity of the theory.

546
00:29:59,640 --> 00:30:02,039
That's the full question. And then it was a

547
00:30:02,119 --> 00:30:04,880
bunch of multiple choice answers that were just lists of

548
00:30:05,000 --> 00:30:09,240
numbers. Like, that was the question. And think about, wow,

549
00:30:09,400 --> 00:30:11,799
the gold standard is a human, so humans are meant to

550
00:30:11,799 --> 00:30:13,799
be able to answer this, And then you rank how

551
00:30:13,880 --> 00:30:16,039
well the LLM goes, and I'm

552
00:30:15,960 --> 00:30:17,559
Speaker 2: Like, nobody can answer that.

553
00:30:17,720 --> 00:30:18,440
Speaker 3: What does this mean?

554
00:30:18,920 --> 00:30:19,759
Speaker 2: What does it even mean?

555
00:30:20,200 --> 00:30:23,039
Speaker 3: What does this mean? But my favorite, my favorite, my

556
00:30:23,079 --> 00:30:25,759
favorite was HellaSwag. So apparently HellaSwag I think

557
00:30:25,920 --> 00:30:28,720
was made using Mechanical Turk, so they got people to

558
00:30:28,799 --> 00:30:32,759
generate the questions and then validate them. But clearly whoever

559
00:30:32,839 --> 00:30:36,119
picked up this task was like not particularly invested, Like,

560
00:30:36,559 --> 00:30:38,200
you know, they're not getting paid a lot of money,

561
00:30:38,400 --> 00:30:42,119
they probably didn't care, right, And I have actually an

562
00:30:42,319 --> 00:30:48,440
article with some of my favorite absolutely bizarre HellaSwag questions. Okay,

563
00:30:48,680 --> 00:30:51,839
now keep in mind I am reading this out as

564
00:30:51,920 --> 00:30:54,880
it's written. Okay, So we have a question, and we

565
00:30:55,000 --> 00:30:57,799
have a bunch of multiple choice answers, and what the

566
00:30:57,960 --> 00:31:01,720
LLM is supposed to do is complete the scenario. So

567
00:31:01,799 --> 00:31:05,119
it's meant to pick the option that has the most

568
00:31:05,640 --> 00:31:10,559
you know, fitting scenario end. Okay, so I've got one

569
00:31:10,599 --> 00:31:15,640
for you. Man is in roofed gym weightlifting. Woman is

570
00:31:15,720 --> 00:31:19,279
walking behind the man watching the man. Woman is: A,

571
00:31:20,200 --> 00:31:24,559
tightening balls on stand on front of weight bar. B,

572
00:31:25,240 --> 00:31:29,279
lifting eights while he man sits to watch her. C,

573
00:31:29,920 --> 00:31:34,160
doing mediocrity spinning on the floor. D, lift the weight

574
00:31:34,319 --> 00:31:34,799
lift man.

575
00:31:37,119 --> 00:31:38,279
Speaker 2: That doesn't make any sense.

576
00:31:39,119 --> 00:31:42,240
Speaker 3: It doesn't and probably around a third of the questions

577
00:31:42,279 --> 00:31:44,440
in HellaSwag were this garbage.

578
00:31:44,480 --> 00:31:47,279
Speaker 1: I just want to know what mediocrity spins are. I

579
00:31:47,759 --> 00:31:50,680
want to do that, and I just don't know because

580
00:31:50,680 --> 00:31:51,119
I don't.

581
00:31:51,000 --> 00:31:51,640
Speaker 2: Know the definition.

582
00:31:52,279 --> 00:31:54,359
Speaker 3: That's every time I turn around and knock something off

583
00:31:54,400 --> 00:32:00,880
a shelf with my clumsy hair. That was twenty twenty.

584
00:32:01,000 --> 00:32:02,920
That was mediocol for the child.

585
00:32:04,480 --> 00:32:07,839
Speaker 2: Yeah, there's been some times. So I mean this just

586
00:32:07,920 --> 00:32:11,359
seems lazy then. Like, well, let's back up. Is

587
00:32:11,519 --> 00:32:14,519
asking questions of an LLM actually a good way to

588
00:32:14,599 --> 00:32:15,799
measure its effectiveness?

589
00:32:16,200 --> 00:32:21,799
Speaker 3: Mm, yes and no. So you can create well defined

590
00:32:22,079 --> 00:32:24,599
problem suites if you have a good idea of what

591
00:32:24,720 --> 00:32:27,480
you're assessing. So this is basic measurement theory, right,

592
00:32:27,480 --> 00:32:32,880
It's like we learned this in psychology. It's tricky with

593
00:32:33,119 --> 00:32:37,839
LLMs, because we have a tendency to extrapolate too much.

594
00:32:38,279 --> 00:32:42,039
We try to project what their performance would mean if

595
00:32:42,079 --> 00:32:44,400
a human did that, and we can't do it because

596
00:32:44,599 --> 00:32:49,079
LLMs do not have what's called fluid intelligence or general intelligence, right.

597
00:32:49,200 --> 00:32:52,839
They have what you could essentially call crystallized intelligence, which

598
00:32:52,880 --> 00:32:55,440
is that they have a bunch of little templates of

599
00:32:56,200 --> 00:33:00,160
how things work based on scenarios they've seen before. They

600
00:33:00,160 --> 00:33:03,960
can pattern match questions they see against this. So you've

601
00:33:03,960 --> 00:33:06,839
got to be really careful about how far you deviate

602
00:33:06,920 --> 00:33:10,680
from them doing pattern matching to them showing intelligence, right,

603
00:33:11,400 --> 00:33:13,880
But it is possible. Let's say you want to assess

604
00:33:14,039 --> 00:33:18,160
how well they do specific tasks, like they can answer

605
00:33:18,279 --> 00:33:21,240
questions about history or whatever. That's fine. I think that's

606
00:33:21,279 --> 00:33:26,640
fine to assess. It gets tricky because there are two

607
00:33:27,000 --> 00:33:31,079
main problems with using questions other than the one I've

608
00:33:31,119 --> 00:33:34,200
just said. The first is is that the answer type

609
00:33:34,359 --> 00:33:39,680
that an LLM is presented with actually impacts its performance.

610
00:33:39,920 --> 00:33:44,599
So most of these measurements use multiple choice questions, and

611
00:33:44,640 --> 00:33:46,720
the reason that they do that is because it's much

612
00:33:46,839 --> 00:33:50,839
easier to score because they're essentially ways of seeing, you know,

613
00:33:50,880 --> 00:33:53,000
the probabilities of words I was talking about. You can

614
00:33:53,079 --> 00:33:56,680
quite accurately tell what's the highest probability sequence that it

615
00:33:56,720 --> 00:33:59,119
would have ended up predicting based on, you know, the

616
00:33:59,200 --> 00:34:02,880
ones that is present with, So you know, it's much

617
00:34:03,000 --> 00:34:06,200
much easier to work this out. But you can also

618
00:34:06,279 --> 00:34:10,119
get them to generate answers, and generating free form answers

619
00:34:10,239 --> 00:34:12,960
is really hard to assess unless you're getting humans to

620
00:34:13,039 --> 00:34:15,519
actually compare it to a gold standard because the statistical

621
00:34:15,559 --> 00:34:20,880
ways we have of comparing two sequences are imperfect. So

622
00:34:22,280 --> 00:34:24,880
most of the time people will use these multiple choice

623
00:34:24,880 --> 00:34:28,199
answer keys. But the problem is that LLMs seem

624
00:34:28,280 --> 00:34:31,440
to do a lot better when they're presented with multiple

625
00:34:31,519 --> 00:34:35,159
choice answers compared to free form answers. Sure, and the

626
00:34:35,199 --> 00:34:36,920
reason it seems to be is because it's a lot

627
00:34:36,960 --> 00:34:40,239
easier to just pattern match to something they've already memorized

628
00:34:40,360 --> 00:34:45,159
as opposed to having to generalize a bit more. And

629
00:34:45,239 --> 00:34:51,039
then the second big problem is that LLMs are ridiculously

630
00:34:51,239 --> 00:34:54,159
sensitive to the format of the prompt template you use.

631
00:34:54,199 --> 00:34:57,159
We've already talked about this, like did you tell them

632
00:34:57,239 --> 00:35:02,199
to use a framework that already exists? But it's so

633
00:35:02,519 --> 00:35:07,079
much more subtle than that. So using a different placement

634
00:35:07,159 --> 00:35:11,840
of punctuation, using different spacing, this can impact the performance

635
00:35:11,920 --> 00:35:17,480
of LLMs on tasks by like thirty, fifty, seventy percent. Wow. Yes, yeah,

636
00:35:17,719 --> 00:35:21,840
and like why it seems to be again pattern matching.

637
00:35:22,159 --> 00:35:27,440
So if that like particular formatting is closer to something

638
00:35:27,480 --> 00:35:30,800
that it's seen already in training, it's more likely going

639
00:35:30,840 --> 00:35:31,800
to be able to get it right.
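
That sensitivity is easy to probe yourself: hold the question fixed and vary only the surface formatting. A Python sketch, with a hypothetical ask_model placeholder standing in for a real LLM call:

    templates = [
        "Question: {q}\nAnswer:",
        "Question: {q}\n\nAnswer: ",
        "Q: {q} A:",
        "question - {q} answer -",
    ]
    question = "What year did the Battle of the Bulge begin?"

    def ask_model(prompt):
        """Hypothetical stand-in; swap in a real LLM call."""
        return "1944"

    for t in templates:
        prompt = t.format(q=question)
        # Only punctuation and spacing differ between these prompts,
        # yet real models can get some variants right and others wrong.
        print(repr(prompt), "->", ask_model(prompt))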

640
00:35:32,199 --> 00:35:34,000
Speaker 2: So all I got to do is ee cummings my

641
00:35:34,159 --> 00:35:35,079
prompt and it just.

642
00:35:40,079 --> 00:35:45,000
Speaker 1: Exactly. It's just, Richard invents a new verb.

643
00:35:45,079 --> 00:35:47,760
Speaker 2: Yeah. But you know an interesting point, like anytime you

644
00:35:47,840 --> 00:35:52,800
want to remind a person that this software is not intelligent,

645
00:35:53,039 --> 00:35:55,880
it's to recognize that this is pattern matching. The

646
00:35:55,960 --> 00:35:59,000
fact that, as a human, I can hand you

647
00:35:59,280 --> 00:36:03,199
an all lowercase, no punctuation statement or a perfectly

648
00:36:03,400 --> 00:36:08,119
punctuated statement, and you'll see it as exactly the same,

649
00:36:08,320 --> 00:36:11,320
just one lazier than the other. But the software treats

650
00:36:11,320 --> 00:36:12,079
it completely differently.

651
00:36:12,280 --> 00:36:15,599
Speaker 1: Exactly. Do you guys know the story of the

652
00:36:15,960 --> 00:36:20,800
moment that Bill Gates went nuts over chat GPT and

653
00:36:21,400 --> 00:36:24,599
began to trust it, and his mind was blown over it?

654
00:36:25,559 --> 00:36:28,360
So Richard, I sent you a link in the

655
00:36:28,480 --> 00:36:31,000
chat you can post it there. This is the story

656
00:36:31,039 --> 00:36:34,039
and I heard about this story on the radio.

657
00:36:34,880 --> 00:36:38,400
So the story is from this CNBC dot com thing. Bill

658
00:36:38,480 --> 00:36:43,000
Gates watched chat GPT ace an AP bio exam and went

659
00:36:43,079 --> 00:36:47,960
into quote a state of shock. And this was August eleventh,

660
00:36:48,239 --> 00:36:51,960
twenty twenty three. So but what you don't know, and

661
00:36:52,239 --> 00:36:55,079
I don't even know if they say it in this article,

662
00:36:55,119 --> 00:36:57,960
I don't think they do. But a couple of months

663
00:36:58,000 --> 00:37:04,320
before is when Sam Altman actually showed Bill chat GPT

664
00:37:05,559 --> 00:37:07,960
and he added a couple of things and he said,

665
00:37:08,039 --> 00:37:10,639
you know what, you know, it would be a great test,

666
00:37:11,280 --> 00:37:16,760
Sam, is if we could give it the AP bio test.

667
00:37:17,920 --> 00:37:22,000
And it aced it. Then Sam goes home and

668
00:37:22,119 --> 00:37:25,880
two months later brings it back and it aces the exam.

669
00:37:26,039 --> 00:37:27,760
So what do you think happened in those two months?

670
00:37:27,960 --> 00:37:28,719
Speaker 3: What a mystery.

671
00:37:29,199 --> 00:37:31,599
Speaker 2: It's so strange, I can't imagine. I'd also point out

672
00:37:31,639 --> 00:37:34,239
that an AP exam is largely multiple choice.

673
00:37:34,400 --> 00:37:37,320
Speaker 1: There you go, Well, I mean I thought it was

674
00:37:37,639 --> 00:37:40,400
I thought they were essay questions. There were five questions,

675
00:37:40,480 --> 00:37:43,679
but I didn't read that part, but I

676
00:37:43,800 --> 00:37:49,760
heard that they were five essay questions. Anyway, they

677
00:37:49,880 --> 00:37:52,760
did not say that in the article, but that

678
00:37:52,840 --> 00:37:53,800
apparently happened.

679
00:37:53,880 --> 00:37:54,960
Speaker 2: It all depends on what you.

680
00:37:54,960 --> 00:37:58,039
Speaker 3: Train it on, right, Yes, and this is actually a

681
00:37:58,599 --> 00:38:03,199
bigger issue. So this is an issue that's called data leakage, again,

682
00:38:03,280 --> 00:38:06,280
well known problem in machine learning. It's when your model

683
00:38:06,440 --> 00:38:09,360
gets access to the test set during training, so it

684
00:38:09,440 --> 00:38:13,920
can basically learn the answers. And well, the implication from

685
00:38:14,480 --> 00:38:16,800
Carl is that this may not have been an accident

686
00:38:16,920 --> 00:38:20,920
this time. But you know, we don't have a clear

687
00:38:21,000 --> 00:38:23,079
idea of what's in the training data for a lot

688
00:38:23,159 --> 00:38:25,719
of these models. Even open source models now are being

689
00:38:25,840 --> 00:38:28,880
super cagey about what's in their training data. So they

690
00:38:28,880 --> 00:38:33,840
say it's a competitive advantage. But we know from experiments

691
00:38:33,920 --> 00:38:37,760
people have done that even benchmarks have ended up at

692
00:38:37,880 --> 00:38:41,599
least partially leaking into the data. So we know that

693
00:38:41,840 --> 00:38:44,639
a lot of these companies will optimize for benchmarks. They'll

694
00:38:44,679 --> 00:38:46,599
keep training the models until or not keep training them,

695
00:38:46,639 --> 00:38:49,079
but they'll keep tuning them until they do well on benchmarks.

696
00:38:49,159 --> 00:38:54,440
But even accidentally, because they're just scraping the open internet,

697
00:38:54,800 --> 00:38:57,920
sure they've accidentally shoved in a bunch of these questions.

698
00:38:57,679 --> 00:38:59,519
Speaker 2: Which is probably where the benchmarks came from in the

699
00:38:59,559 --> 00:39:02,880
first place. Anyway, so exactly, eventually you're going to meet

700
00:39:02,960 --> 00:39:05,480
up with the data. It doesn't seem surprising at all.
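
One crude way researchers probe for this kind of leakage is an n-gram overlap check: if a long word sequence from a benchmark item appears verbatim in a sample of the training corpus, the item is flagged as likely contaminated. A minimal sketch:

```python
def ngrams(text: str, n: int = 8) -> set[str]:
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def looks_contaminated(benchmark_item: str, corpus_sample: str,
                       n: int = 8) -> bool:
    # An exact eight-word overlap is very unlikely by chance, so any
    # shared n-gram is treated as a contamination flag.
    return bool(ngrams(benchmark_item, n) & ngrams(corpus_sample, n))

corpus_sample = ("as every textbook says, the mitochondrion is the "
                 "powerhouse of the cell and produces most of its ATP")
item = "Complete the sentence: the mitochondrion is the powerhouse of the cell"
print(looks_contaminated(item, corpus_sample))  # True -> likely leaked
```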

701
00:39:06,519 --> 00:39:09,960
Speaker 3: So yeah, the modern suite of like benchmarks that started

702
00:39:09,960 --> 00:39:14,679
being created last year, they started making them private partly

703
00:39:14,719 --> 00:39:18,000
to mitigate this. But it doesn't mean that you know,

704
00:39:18,199 --> 00:39:21,440
you as a consumer, you're a lay user of an

705
00:39:21,599 --> 00:39:24,079
LLM. Maybe not a lay user. You might

706
00:39:24,119 --> 00:39:26,519
be a bit more technically advanced, but none of us

707
00:39:26,559 --> 00:39:29,719
here are AI researchers, right, right, and so we

708
00:39:30,199 --> 00:39:38,360
might not be far from that uninformed consumer. Let's

709
00:39:38,360 --> 00:39:38,920
put it that way.

710
00:39:40,480 --> 00:39:43,360
Speaker 1: But you know, I just found it. I'm sorry to interrupt,

711
00:39:43,360 --> 00:39:46,519
I just found it. The story is that, you know,

712
00:39:46,679 --> 00:39:49,960
Gates issued what he believed to be a rather difficult

713
00:39:50,119 --> 00:39:54,079
challenge to Sam Altman: bring chat GPT back to me

714
00:39:54,360 --> 00:39:57,639
once it could exhibit advanced human level competency by achieving

715
00:39:57,679 --> 00:40:00,599
the highest possible score on the AP bio exam.

716
00:40:01,559 --> 00:40:06,719
And so two months later, oh magic, OpenAI's

717
00:40:06,760 --> 00:40:10,400
developers came back and Gates watched it get the top score of

718
00:40:10,480 --> 00:40:13,320
five on the test. So yeah, so there it is,

719
00:40:13,440 --> 00:40:17,639
right in black and white that actually happened. And as

720
00:40:17,679 --> 00:40:20,800
I was hearing this, I was like, you idiot, why

721
00:40:20,880 --> 00:40:24,000
didn't you just say give it to me when it

722
00:40:24,079 --> 00:40:28,119
can answer a test question and don't tell them what test? Yeah,

723
00:40:28,440 --> 00:40:31,960
you know a test question and then just do it.

724
00:40:32,320 --> 00:40:32,639
Speaker 2: Try it.

725
00:40:33,400 --> 00:40:34,079
Speaker 3: Yeah, I don't know.

726
00:40:34,599 --> 00:40:36,480
Speaker 2: Far be it from me to call Bill Gates an idiot.

727
00:40:38,039 --> 00:40:40,159
Did I actually do that? But you know, there's this

728
00:40:40,320 --> 00:40:43,320
confirmation bias situation you can put yourself into. Yeah.

729
00:40:43,719 --> 00:40:46,440
Speaker 3: And this is the thing too, Like I don't blame

730
00:40:46,920 --> 00:40:50,760
people for feeling enchanted by the models, like there is

731
00:40:50,840 --> 00:40:53,559
something so human feeling about them because they're echoing back

732
00:40:54,159 --> 00:40:59,559
our humanity. Yeah, but you need you always need to

733
00:40:59,599 --> 00:41:02,199
be cautious, and like we were trying to do at

734
00:41:02,199 --> 00:41:05,280
the beginning, like you see, we slip into anthropomorphizing the

735
00:41:05,320 --> 00:41:06,559
models even though we know better.

736
00:41:06,519 --> 00:41:07,679
Speaker 2: For sure, because it's easy.

737
00:41:07,760 --> 00:41:14,320
Speaker 3: It is easy. But really, like even with the latest

738
00:41:14,360 --> 00:41:17,960
benchmarks trying to assess AGI, this one called ARC-AGI

739
00:41:18,159 --> 00:41:22,639
that OpenAI's o3 actually did very well on, got

740
00:41:22,760 --> 00:41:30,239
seventy percent late last year. This is still just pattern matching,

741
00:41:30,559 --> 00:41:34,880
but pattern matching in a more organized way. It's basically

742
00:41:34,960 --> 00:41:37,280
the model has more of an ability to sort of

743
00:41:37,480 --> 00:41:41,239
sort through which patterns might be the best to apply.

744
00:41:41,400 --> 00:41:45,320
But again, we're just talking about a more systematic application

745
00:41:46,159 --> 00:41:49,760
of crystallized intelligence. We're not talking about generalizability yet.

746
00:41:50,320 --> 00:41:52,800
Speaker 2: Yeah, And I mean the more I read, the less

747
00:41:52,800 --> 00:41:54,840
I'm concerned about the AGI side of the equation. It

748
00:41:54,960 --> 00:41:56,960
seems more and more like a marketing term to hire

749
00:41:57,000 --> 00:41:58,280
more people to work at open AI.

750
00:41:59,119 --> 00:42:02,960
Speaker 1: Yeah, it's a fluid term that keeps changing. The

751
00:42:03,079 --> 00:42:05,119
definition keeps changing.

752
00:42:05,360 --> 00:42:07,320
Speaker 3: But how do you assess AGI? Like I don't know

753
00:42:07,360 --> 00:42:09,760
if we talked about this last time, because I had

754
00:42:09,800 --> 00:42:12,400
that in my first talk, But you know, how can

755
00:42:12,440 --> 00:42:16,559
you even assess the gap between what a model knows

756
00:42:17,440 --> 00:42:21,440
and you know, a task, so like the difficulty that

757
00:42:21,760 --> 00:42:23,840
a model would have doing that task based on what

758
00:42:23,920 --> 00:42:27,400
it already knows, and then standardize that across a bunch

759
00:42:27,440 --> 00:42:30,679
of different models that have potentially been exposed to very

760
00:42:30,840 --> 00:42:34,119
different tasks and knowledge. Sure, it feels it

761
00:42:34,199 --> 00:42:35,880
feels like such a difficult.

762
00:42:35,519 --> 00:42:39,719
Speaker 2: Challenge and it's way too broad, and ultimately I feel

763
00:42:39,760 --> 00:42:42,920
like it's a distraction from the fact that we're just

764
00:42:42,960 --> 00:42:47,119
trying to be engineers using a useful tool. And

765
00:42:47,360 --> 00:42:49,320
I mean I led off this conversation talking about the

766
00:42:49,360 --> 00:42:52,159
fact that I always ask folks like which one are

767
00:42:52,159 --> 00:42:54,159
you using right now? What are you enamored of? And

768
00:42:54,199 --> 00:42:55,360
the fact that you know, I had sort of a

769
00:42:55,440 --> 00:42:59,119
universal everybody likes Claude right now, it's like, why,

770
00:42:59,519 --> 00:43:03,039
what is your innate benchmark that made you

771
00:43:03,199 --> 00:43:05,119
switch to this or is it just a social pressure

772
00:43:05,199 --> 00:43:08,159
thing, vibes, man, because that smart person was using Claude. Now

773
00:43:08,199 --> 00:43:10,519
I'll use Claude and then there'll be some nice confirmation

774
00:43:10,679 --> 00:43:12,599
bias there. Well, yeah, no, it seems to be doing

775
00:43:12,679 --> 00:43:15,480
the thing. Is it actually better than the chat GPT information?

776
00:43:15,920 --> 00:43:19,960
I don't know. How would you measure that? So we're

777
00:43:20,000 --> 00:43:22,639
in this loop and I don't feel like I can

778
00:43:23,159 --> 00:43:25,599
get a version, a new version of anything from any

779
00:43:25,639 --> 00:43:27,800
of these folks, a new OpenAI, a

780
00:43:27,920 --> 00:43:30,519
new Claude, or any of these, and say, okay, is

781
00:43:30,559 --> 00:43:32,360
it worth switching? Yeah? I mean I know they want

782
00:43:32,400 --> 00:43:35,559
me to. I know it's invariably more expensive, but is

783
00:43:35,599 --> 00:43:36,519
it better?

784
00:43:37,119 --> 00:43:41,519
Speaker 3: Yeah. Look, I have a prediction. Here we go.

785
00:43:42,280 --> 00:43:43,639
I'm going to do a prediction why not?

786
00:43:43,880 --> 00:43:45,480
Speaker 2: Why not? Why not? Why not?

787
00:43:46,079 --> 00:43:48,559
Speaker 3: I'm going to say probably in a year's time, the

788
00:43:48,679 --> 00:43:52,559
landscape of providers is going to look quite different. Oh, definitely,

789
00:43:52,639 --> 00:43:57,199
And it's because the advantages of using smaller models

790
00:43:57,480 --> 00:44:01,159
just drastically outweigh using bigger ones. They're cheaper, they're more

791
00:44:01,280 --> 00:44:05,440
environmentally friendly, they can be more specialized,

792
00:44:05,480 --> 00:44:07,880
like it's easier to tune them so that you can

793
00:44:08,320 --> 00:44:12,199
focus them on specific tasks. And yeah, ultimately they're just

794
00:44:12,679 --> 00:44:14,840
you know, it's easy to control what happens to your data.

795
00:44:15,239 --> 00:44:17,920
Speaker 2: So right, that's a big one.

796
00:44:18,320 --> 00:44:19,159
Speaker 3: That is a big one.

797
00:44:19,400 --> 00:44:22,719
Speaker 1: It's a big one, especially with something like DeepSeek.

798
00:44:22,800 --> 00:44:24,400
You know, the only way I'm going to run that

799
00:44:24,800 --> 00:44:27,239
is on my own network not connected to the internet.

800
00:44:27,440 --> 00:44:30,960
Speaker 2: Well, they do offer a local version, Yeah,

801
00:44:31,039 --> 00:44:33,480
they do, right, which Nvidia has been benchmarking with

802
00:44:33,599 --> 00:44:36,159
a fair bit I noticed, like, which I thought was cool,

803
00:44:36,360 --> 00:44:38,719
like smart thing to do, not just to because there's

804
00:44:38,719 --> 00:44:41,599
lots of folks saying, no, don't use the Chinese LLM.

805
00:44:43,119 --> 00:44:45,239
But yeah, the fact that Nvidia just did.

806
00:44:45,119 --> 00:44:47,280
Speaker 1: That's the same reason they're saying don't use TikTok, right,

807
00:44:47,400 --> 00:44:48,880
so they don't trust it more or less?

808
00:44:49,719 --> 00:44:51,760
Speaker 2: What could happen I don't.

809
00:44:51,679 --> 00:44:54,559
Speaker 3: Yeah, I don't use TikTok just because I'm deeply uncool,

810
00:44:54,880 --> 00:44:56,360
Like it makes with you.

811
00:44:58,000 --> 00:44:58,920
Speaker 2: I'm so with you.

812
00:44:59,679 --> 00:45:02,119
Speaker 3: I actually had to make a TikTok. I was at

813
00:45:02,159 --> 00:45:05,039
a workshop for my job like two months ago, so

814
00:45:05,159 --> 00:45:07,000
you know, I'm an advocate. So they're like, hey, let's

815
00:45:07,039 --> 00:45:09,880
teach these old people to make tiktoks nice. And I

816
00:45:09,960 --> 00:45:15,039
made this TikTok with Michelle you know, Michelle Richard, Michelle Frost,

817
00:45:15,519 --> 00:45:18,239
so yeah, yeah, yeah, she just started with JetBrains.

818
00:45:18,679 --> 00:45:20,559
And so we made this TikTok with another of our

819
00:45:20,639 --> 00:45:23,239
colleagues and it involved Wilbur, her dog, and it was

820
00:45:23,360 --> 00:45:26,480
just like it was so bad. And then they're like

821
00:45:26,679 --> 00:45:29,519
it's awesome, like you should go on TikTok right now,

822
00:45:29,599 --> 00:45:32,960
and I'm like no, I'm deeply ashamed, like.

823
00:45:34,519 --> 00:45:35,239
Speaker 2: I should see.

824
00:45:35,320 --> 00:45:37,440
Speaker 3: This was bad.

825
00:45:39,000 --> 00:45:41,199
Speaker 1: I do have to confess that I have a TikTok

826
00:45:41,199 --> 00:45:44,800
account Carlotphoenix dot com. I have not used it yet

827
00:45:44,880 --> 00:45:47,920
for anything more than hey, I'm here, and I certainly

828
00:45:48,039 --> 00:45:51,079
don't scroll TikTok. I have so many better things to

829
00:45:51,199 --> 00:45:56,199
do than to scroll inane, insane, crazy music videos of

830
00:45:56,239 --> 00:45:58,719
people doing stupid things, and dogs and.

831
00:45:58,760 --> 00:46:01,639
Speaker 2: Cats also. You know what's interesting about TikTok

832
00:46:02,440 --> 00:46:05,760
is you're not really picking the content, they're picking the content. Yes, yeah,

833
00:46:06,199 --> 00:46:09,320
they are watching your loiter time, so it's your behavior

834
00:46:09,360 --> 00:46:12,599
that's only selecting the content. But you know, it is

835
00:46:12,679 --> 00:46:15,440
a different mechanism there where you can't really curate a

836
00:46:15,559 --> 00:46:18,239
list or build a social graph. That's not up to you.

837
00:46:19,079 --> 00:46:23,840
And I find that interesting, right, like we

838
00:46:23,960 --> 00:46:27,119
literally are handing over our attention to something else that's

839
00:46:27,400 --> 00:46:27,840
driving it.

840
00:46:28,079 --> 00:46:30,239
Speaker 3: Yeah, I do have to admit I'm so curious about

841
00:46:30,239 --> 00:46:31,119
their recommender algorithm.

842
00:46:32,119 --> 00:46:35,840
Speaker 2: Yeah, well, as a technologist, right anything, like well, because

843
00:46:35,880 --> 00:46:38,480
that's the thing that they're all upset about. This is

844
00:46:38,519 --> 00:46:40,400
our secret sauce, and we'll be keeping it to ourselves,

845
00:46:40,400 --> 00:46:43,360
thanks very much. Oh, it's well, here's the thing. Is

846
00:46:43,400 --> 00:46:45,320
there a way when you see something that you don't

847
00:46:45,440 --> 00:46:47,039
like to say, get rid of this, I don't want

848
00:46:47,039 --> 00:46:49,920
to see this again? No, because it's too late to

849
00:46:50,039 --> 00:46:53,519
scroll past it. You've already scrolled. The problem is that

850
00:46:53,599 --> 00:46:55,400
before you found out you didn't like it, you watched it.

851
00:46:57,880 --> 00:47:00,360
You know, it's the old, I'm trying to improve

852
00:47:00,400 --> 00:47:02,679
the quality of my diet by eating everything and deciding

853
00:47:02,719 --> 00:47:09,480
what I like. Yeah, there is no nutritional label on

854
00:47:09,599 --> 00:47:10,400
any of these things.

855
00:47:13,039 --> 00:47:14,760
Speaker 3: Sorry, it's called democracy.

856
00:47:15,519 --> 00:47:17,960
Speaker 2: Okay, so that's what you want to call it. The

857
00:47:18,039 --> 00:47:19,760
only person who doesn't get a vote is the viewer.

858
00:47:26,000 --> 00:47:27,639
I'm not cynical at all. I don't know what you're

859
00:47:27,639 --> 00:47:28,159
talking about.

860
00:47:28,320 --> 00:47:31,719
Speaker 1: You remind me of David Mitchell, the British comedian

861
00:47:32,119 --> 00:47:33,800
who just every once in a while will just go

862
00:47:33,920 --> 00:47:34,679
off on a rant.

863
00:47:36,039 --> 00:47:46,079
Speaker 2: Just start him and he'll keep on going. I'm fine, Everything's fine, Okay. Yeah,

864
00:47:46,159 --> 00:47:48,800
we're still at this core issue of how do I

865
00:47:48,880 --> 00:47:52,800
select an LLM for my app? I mean, part of

866
00:47:52,840 --> 00:47:55,800
it is the running context, or I can

867
00:47:55,880 --> 00:47:58,159
go down the cost side and can go down the

868
00:47:58,360 --> 00:48:00,679
does it, you know, integrate well with my... Do I

869
00:48:00,679 --> 00:48:02,400
need cloud access? Or can I run local?

870
00:48:02,480 --> 00:48:02,519
Speaker 3: Like?

871
00:48:02,559 --> 00:48:05,159
Speaker 1: There's all those decisions. We need an LLM to answer

872
00:48:05,239 --> 00:48:05,719
this question.

873
00:48:06,159 --> 00:48:06,679
Speaker 2: I don't think.

874
00:48:09,199 --> 00:48:11,840
Speaker 3: I have something even better. I've got a blog post, hey,

875
00:48:12,199 --> 00:48:15,199
all right? So I will share this with you

876
00:48:15,320 --> 00:48:18,000
so you can share it with the audience. But I

877
00:48:18,079 --> 00:48:19,760
came across when I was writing my talk, I came

878
00:48:19,800 --> 00:48:24,440
across this absolutely phenomenal blog post. So the guy's an AI

879
00:48:24,519 --> 00:48:28,159
engineer called Hamel Husain. So this guy works in an

880
00:48:28,159 --> 00:48:33,440
AI consultancy. So, exciting job these days. He goes out and

881
00:48:33,639 --> 00:48:38,079
he needs to basically build applications for companies that use AI.

882
00:48:39,159 --> 00:48:42,159
And one of the jobs that he talks about is

883
00:48:42,519 --> 00:48:45,519
he and his company were hired to build a chatbot

884
00:48:45,719 --> 00:48:48,840
for real estate agents. So basically, they wanted the real

885
00:48:48,920 --> 00:48:50,760
estate agents to be able to type in natural language,

886
00:48:50,840 --> 00:48:55,599
like give me the contact details for everyone in this area,

887
00:48:55,880 --> 00:48:59,559
whatever you know, and then the LLM would generate a

888
00:48:59,719 --> 00:49:02,599
query to a CRM, something like that, and return the information.

889
00:49:03,800 --> 00:49:06,519
So when they first started building the app, they like,

890
00:49:06,679 --> 00:49:10,280
they picked a good LLM based on the leaderboard, a good one,

891
00:49:10,840 --> 00:49:13,920
and then they wrote the initial prompt templates and then

892
00:49:14,639 --> 00:49:17,239
you know, everything looked good, and then things started not

893
00:49:17,360 --> 00:49:19,480
working on the edge cases, so they made the prompt

894
00:49:19,480 --> 00:49:21,400
a bit more elaborate, and then the prompt started getting

895
00:49:21,400 --> 00:49:24,519
really unwieldy, and then they realized the only evaluation metric

896
00:49:24,559 --> 00:49:27,159
they had was vibes and they were like, really, this

897
00:49:28,079 --> 00:49:32,440
is a mess. So he set out in this really

898
00:49:32,559 --> 00:49:35,400
interesting way how they actually went back to ground zero

899
00:49:35,599 --> 00:49:39,280
and they started again, and he said, like, basically, we

900
00:49:39,400 --> 00:49:43,960
realized we needed a tiered assessment. So he said, like,

901
00:49:44,239 --> 00:49:47,559
the first tier of assessment is unit tests, Like it

902
00:49:47,639 --> 00:49:50,800
seems really obvious, right, But he's like, the thing is

903
00:49:50,920 --> 00:49:53,800
that, because it's nondeterministic, you're not going to have

904
00:49:54,079 --> 00:49:56,440
one hundred percent pass rate on your unit tests. So

905
00:49:56,599 --> 00:50:00,199
you need to determine what error rate you're happy with,

906
00:50:00,440 --> 00:50:02,320
and that's going to require a bit of experimentation.
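
A minimal sketch of that first tier, with a hypothetical `run_pipeline` standing in for the query-to-CRM system described above: run the same check many times and assert that the pass rate clears the threshold you settled on, rather than demanding a deterministic one hundred percent.

```python
import re

# Hypothetical stand-in for the real query -> LLM -> CRM pipeline;
# an actual test would call the live system.
def run_pipeline(query: str) -> str:
    return "Jane Smith: +1 555 0100"

def contains_phone_number(text: str) -> bool:
    return re.search(r"\+?\d[\d\s().-]{5,}\d", text) is not None

def test_phone_lookup(trials: int = 100, required: float = 0.95) -> None:
    passes = sum(
        contains_phone_number(
            run_pipeline("Return the phone number of Jane Smith"))
        for _ in range(trials)
    )
    # Nondeterministic system, statistical assertion: enough runs
    # must succeed, not every single one.
    assert passes / trials >= required, f"pass rate {passes / trials:.0%}"

test_phone_lookup()
print("pass rate within tolerance")
```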

907
00:50:03,079 --> 00:50:04,960
Speaker 2: But you also have to accept a level of error rate,

908
00:50:05,119 --> 00:50:07,320
like you're not getting all green, exactly.

909
00:50:07,039 --> 00:50:09,280
Speaker 3: Exactly, so it might be like you just need ninety

910
00:50:09,320 --> 00:50:11,559
five or ninety nine percent or whatever to pass whatever

911
00:50:11,639 --> 00:50:15,159
looks realistic. But you know, an example of unit tests

912
00:50:15,280 --> 00:50:18,360
is let's say the query from the user was return

913
00:50:18,400 --> 00:50:21,000
me the phone number of you know, Jane Smith or

914
00:50:21,920 --> 00:50:25,400
you know someone like that, and then basically what you're

915
00:50:25,400 --> 00:50:27,719
going to expect from the CRM is a phone number,

916
00:50:28,159 --> 00:50:30,199
So you can write a unit test for that. You know,

917
00:50:31,159 --> 00:50:34,639
it's basic engineering. And then he said, you know, you

918
00:50:34,760 --> 00:50:39,440
can create a suite of manual evaluations, so you basically

919
00:50:39,519 --> 00:50:43,239
look at the traces of how the LLM is interacting with

920
00:50:43,320 --> 00:50:45,320
the users and the rest of the system, and you

921
00:50:45,760 --> 00:50:48,840
manually evaluate that. And you don't have to keep doing

922
00:50:48,880 --> 00:50:51,280
that forever because then you can use a new method

923
00:50:51,440 --> 00:50:54,320
called LLM as a judge, where you get another

924
00:50:54,519 --> 00:50:59,079
LLM to also do the same assessments, and try to

925
00:50:59,239 --> 00:51:02,440
get them to converge. And once you have a relatively

926
00:51:02,519 --> 00:51:05,800
strong sense that the LLM is giving similar assessments to

927
00:51:05,920 --> 00:51:08,519
your human, you know, you need to check

928
00:51:08,519 --> 00:51:09,840
in on it from time to time to see if

929
00:51:09,840 --> 00:51:13,000
it's okay. But that you know, takes over that part

930
00:51:13,039 --> 00:51:15,599
of the assessment, and then you know, you can go

931
00:51:15,719 --> 00:51:18,719
up to your normal kind of higher level assessments like

932
00:51:19,199 --> 00:51:22,159
A/B testing. You know, it's really just a

933
00:51:22,239 --> 00:51:25,719
normal engineering system, and you can create a feedback loop

934
00:51:25,760 --> 00:51:28,920
where you can you know, refine your prompts or fine

935
00:51:28,920 --> 00:51:32,559
tune models, or use different models that maybe are smaller

936
00:51:32,760 --> 00:51:35,639
or cheaper and see whether you can get the same

937
00:51:35,679 --> 00:51:38,440
sort of performance. So you know, obviously you're going to

938
00:51:38,559 --> 00:51:40,920
need to just pick a model to start with. You

939
00:51:41,039 --> 00:51:42,880
might be able to get a sense of whether it's

940
00:51:42,960 --> 00:51:46,559
good for chatbot applications in this language, you know, do

941
00:51:46,639 --> 00:51:51,079
your research on that. But this really shows me, like

942
00:51:51,239 --> 00:51:54,840
it's just it's so obvious, right, like this is how

943
00:51:54,960 --> 00:51:56,000
we do monitoring.
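
A minimal sketch of that LLM-as-a-judge handoff: grade the same traces with your human labels and with the judge, and track the agreement rate; only let the judge take over once it converges near your human, then keep spot-checking. The `judge` function here is a hypothetical stand-in for a call to a second model with a grading rubric.

```python
# Hypothetical stand-in: a real judge is a second LLM prompted with
# the trace plus a rubric, returning pass/fail.
def judge(trace: str) -> bool:
    return "phone number" in trace.lower()

def agreement_rate(traces: list[str], human_labels: list[bool]) -> float:
    judged = [judge(t) for t in traces]
    return sum(j == h for j, h in zip(judged, human_labels)) / len(traces)

traces = [
    "User asked for Jane Smith's number; bot returned a phone number.",
    "User asked for Jane Smith's number; bot returned an email address.",
]
human_labels = [True, False]
print(f"judge/human agreement: {agreement_rate(traces, human_labels):.0%}")
```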

944
00:51:56,400 --> 00:52:00,239
Speaker 2: Yeah, and it's, I'm sorry, this looks too adult to me.

945
00:52:00,440 --> 00:52:03,159
Speaker 3: Well, I know, it looks like a lot of hard work.

946
00:52:03,719 --> 00:52:06,280
Speaker 2: Literally, you actually have to work at building a

947
00:52:06,360 --> 00:52:10,559
decent testing framework specific to your use case. I

948
00:52:10,719 --> 00:52:15,239
know, I wanted a happy button. Jody can

949
00:52:15,320 --> 00:52:17,639
be sad, I want a button. Yeah.

950
00:52:18,000 --> 00:52:22,480
Speaker 1: We recorded a show with Spencer Schneidenbach, which is actually

951
00:52:22,639 --> 00:52:23,320
next week's show.

952
00:52:23,400 --> 00:52:24,559
Speaker 2: We recorded it a couple.

953
00:52:24,440 --> 00:52:27,480
Speaker 1: Of days ago, so we have the benefit of future

954
00:52:27,599 --> 00:52:30,559
looking here, and we talked about some of these things

955
00:52:30,639 --> 00:52:34,920
with him, and uh, you know that just the comment

956
00:52:35,039 --> 00:52:37,000
came up, and I think it was either me or Richard.

957
00:52:37,000 --> 00:52:39,639
I can't remember who, but you know, we used

958
00:52:39,639 --> 00:52:42,400
to be programmers where, you know,

959
00:52:42,519 --> 00:52:45,320
we have a bug, we fix it. Now the program

960
00:52:45,480 --> 00:52:48,760
is one hundred percent accurate. And now I mean it's

961
00:52:48,800 --> 00:52:53,039
even more like we're psychologists now instead

962
00:52:53,079 --> 00:52:56,880
of scientists. You know, we make some suggestions, we examine

963
00:52:56,880 --> 00:52:58,840
the output, you know, we think about it a little

964
00:52:58,880 --> 00:53:01,440
bit and it doesn't seem quite right. We ask some

965
00:53:01,519 --> 00:53:03,320
more questions, examine the behavior.

966
00:53:03,880 --> 00:53:04,039
Speaker 2: You know.

967
00:53:04,159 --> 00:53:07,199
Speaker 1: It's like, if these things are going into our software,

968
00:53:08,760 --> 00:53:12,320
I have a little trepidation about that just because of

969
00:53:12,480 --> 00:53:16,000
the inaccuracies. Even if it's even if something is ninety

970
00:53:16,079 --> 00:53:20,320
nine percent accurate. That's that's a bug. That's a one

971
00:53:20,400 --> 00:53:21,119
percent bug.

972
00:53:21,480 --> 00:53:23,679
Speaker 2: Yeah, and one you probably can't pin down.

973
00:53:23,519 --> 00:53:25,039
Speaker 1: And one you probably can't fix.

974
00:53:25,199 --> 00:53:29,800
Speaker 2: Use probabilistic tools, get probabilistic results. Yeah, exactly.

975
00:53:32,360 --> 00:53:35,599
Speaker 3: Look, it's funny because I'm probably so much more comfortable

976
00:53:35,639 --> 00:53:37,639
with this than any of you, because I'm like, hey,

977
00:53:37,760 --> 00:53:39,199
this is just how stuff works.

978
00:53:39,440 --> 00:53:41,519
Speaker 2: That was how machine learning always worked. When we talked

979
00:53:41,559 --> 00:53:43,280
to you in twenty three. You've been doing this for years,

980
00:53:43,320 --> 00:53:46,000
and it's like you do the testing and there is

981
00:53:46,119 --> 00:53:48,400
no one hundred percent exactly. Yeah, you get in the

982
00:53:48,480 --> 00:53:50,199
mid nineties. You should feel good.

983
00:53:50,440 --> 00:53:55,280
Speaker 3: Yeah, yeah, well, sometimes I'm suspicious, it depends, sometimes it's too quick.

984
00:53:59,320 --> 00:54:02,800
But yeah, I think it's an uncomfortable new reality. And

985
00:54:03,519 --> 00:54:06,519
you know, it's something I've observed for years when you know,

986
00:54:07,400 --> 00:54:10,000
you bring engineers into the world of machine learning, and

987
00:54:10,679 --> 00:54:13,719
it is a deeply uncomfortable thing not knowing that something

988
00:54:13,960 --> 00:54:18,239
is one hundred percent deterministic. I think the main problem

989
00:54:18,400 --> 00:54:21,599
is, it's one thing to have a system

990
00:54:21,800 --> 00:54:26,360
that otherwise works totally in a deterministic fashion. So let's

991
00:54:26,360 --> 00:54:28,960
say you've got some sort of system that say, takes

992
00:54:29,000 --> 00:54:33,039
in queries or takes in numbers from a user, let's say,

993
00:54:33,079 --> 00:54:37,480
like nutrition numbers for a piece of food or something,

994
00:54:38,480 --> 00:54:40,519
and then you have a machine learning model that generates

995
00:54:40,559 --> 00:54:43,239
a prediction that may be within a certain band of correct.

996
00:54:44,440 --> 00:54:47,000
It's more difficult when you're talking about an LLM being

997
00:54:47,119 --> 00:54:49,880
an actor as part of that system and generating pieces

998
00:54:49,880 --> 00:54:53,480
of code that will then run that system and then

999
00:54:54,679 --> 00:54:57,360
generating error in that way is actually quite consequential.

1000
00:54:57,559 --> 00:55:00,800
Speaker 2: I just like that phrase certain band of correct.

1001
00:55:03,199 --> 00:55:06,400
Speaker 3: We call it, we call it a confidence interval. Actually,

1002
00:55:09,280 --> 00:55:11,280
how confident am I that this is correct?
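
For the curious, that band can be made precise. Given some number of passes out of some number of trials, a Wilson score interval puts error bars on the observed pass rate, which is a more honest number to report than the raw percentage. A minimal sketch:

```python
import math

def wilson_interval(passes: int, trials: int,
                    z: float = 1.96) -> tuple[float, float]:
    # 95% Wilson score interval for a binomial proportion.
    p = passes / trials
    denom = 1 + z**2 / trials
    centre = (p + z**2 / (2 * trials)) / denom
    half = (z / denom) * math.sqrt(
        p * (1 - p) / trials + z**2 / (4 * trials**2))
    return centre - half, centre + half

# 96 passes out of 100 sounds like "96% accurate", but the honest
# claim is the whole band:
low, high = wilson_interval(96, 100)
print(f"observed 96%, 95% CI roughly {low:.1%} to {high:.1%}")
```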

1003
00:55:13,440 --> 00:55:16,199
Speaker 2: But you know, I think as a developer, when you're

1004
00:55:16,239 --> 00:55:18,679
talking to leadership that want you to use these tools,

1005
00:55:18,719 --> 00:55:20,360
I think they're going to provide an advantage. Just like

1006
00:55:20,440 --> 00:55:24,000
part of the education is these are nondeterministic models and

1007
00:55:24,000 --> 00:55:25,920
there will always be a certain level of uncertainty, and

1008
00:55:25,960 --> 00:55:28,159
if you're not good with that, we don't get to

1009
00:55:28,239 --> 00:55:28,840
use these tools.

1010
00:55:28,960 --> 00:55:29,159
Speaker 1: Yeah.

1011
00:55:29,280 --> 00:55:30,639
Speaker 2: Yeah, yeah, that's right.

1012
00:55:31,239 --> 00:55:34,440
Speaker 1: So you know, I guess the first analysis you should

1013
00:55:34,480 --> 00:55:38,400
do in your business is what level of uncertainty are

1014
00:55:38,440 --> 00:55:39,199
we comfortable with?

1015
00:55:39,519 --> 00:55:42,079
Speaker 2: Can we tolerate? You know, what is the benchmark that

1016
00:55:42,119 --> 00:55:44,400
we're shooting for? Well? And then the other side of

1017
00:55:44,440 --> 00:55:47,880
this is the consequences of the uncertainty of the mistake,

1018
00:55:48,039 --> 00:55:50,719
like what can happen, right? Are people gonna die? You know,

1019
00:55:50,800 --> 00:55:52,559
I know, I see this over and over again, where

1020
00:55:52,719 --> 00:55:55,719
like the first case of an LLM in an organization

1021
00:55:55,840 --> 00:55:58,800
is with an HR system. So it's totally internal. And

1022
00:55:58,960 --> 00:56:01,440
part of that is because the consequence of not being correct

1023
00:56:01,599 --> 00:56:04,360
is minor. You know, yes, you're going to make somebody

1024
00:56:04,400 --> 00:56:06,280
angry if you tell them they have more vacation days

1025
00:56:06,280 --> 00:56:09,079
than they do, but you probably haven't cost a company

1026
00:56:09,079 --> 00:56:09,719
a lot of money.

1027
00:56:09,920 --> 00:56:12,920
Speaker 1: Well, and also there's a human there to make sure

1028
00:56:13,000 --> 00:56:16,960
that the you know, accurate information gets given to the person.

1029
00:56:17,840 --> 00:56:20,280
Speaker 2: I like your optimism, but yes, I would hope you

1030
00:56:20,320 --> 00:56:23,440
would hope. I would hope. But you know, you get the point, like,

1031
00:56:24,119 --> 00:56:26,920
there is a bunch of ways to manage this uncertainty.

1032
00:56:27,079 --> 00:56:29,440
So there's going to be a new corporate title

1033
00:56:29,920 --> 00:56:35,480
job and it's going to be a nondeterministic compensator. Oh,

1034
00:56:36,440 --> 00:56:38,119
I think that's from I think I think that's from

1035
00:56:38,159 --> 00:56:44,239
Back to the Future. You're thinking CUO, Chief Certainty Officer.

1036
00:56:44,119 --> 00:56:47,559
Speaker 1: Chief Uncertainty, that's good. I think you're thinking of uncoupling

1037
00:56:47,639 --> 00:56:49,119
the Heisenberg compensators.

1038
00:56:49,360 --> 00:56:51,599
Speaker 2: There you go, that's Star Trek, Star Trek.

1039
00:56:51,760 --> 00:56:56,440
Speaker 3: Yeah, bouncing all over the place, such geeks.

1040
00:56:58,159 --> 00:57:01,400
Speaker 2: I found the blog post from Hamel and I'll include

1041
00:57:01,400 --> 00:57:03,880
it in the show notes, from Hamel Husain, the man

1042
00:57:04,079 --> 00:57:07,760
who wrote Your AI Product Needs Evals. And it's exactly

1043
00:57:07,840 --> 00:57:11,519
the way you describe it. Building unit tests, doing model evaluation,

1044
00:57:11,760 --> 00:57:15,239
doing A/B testing. This seems like a real concrete

1045
00:57:15,480 --> 00:57:19,559
approach to just how do we at least be able

1046
00:57:19,639 --> 00:57:21,519
to look people in the eye and say we've done

1047
00:57:21,599 --> 00:57:25,079
our best to test this and have some certainty around it.

1048
00:57:25,840 --> 00:57:28,119
Speaker 3: Yep. And well, I think what I like about it

1049
00:57:28,280 --> 00:57:34,599
is it's not unfamiliar territory for engineers. This is exactly

1050
00:57:34,679 --> 00:57:38,000
what you've all been doing for decades now, Like this

1051
00:57:38,159 --> 00:57:40,719
is just monitoring well, or at.

1052
00:57:40,719 --> 00:57:43,199
Speaker 2: Least should have been doing. This is looking like the

1053
00:57:43,360 --> 00:57:44,639
testing we do on software.

1054
00:57:44,960 --> 00:57:46,880
Speaker 3: But this is the thing. It's not the fault of

1055
00:57:46,960 --> 00:57:49,679
the on-the-ground developer, because the way these models

1056
00:57:49,719 --> 00:57:52,360
are sold is that, no, they're magic, like they are

1057
00:57:52,440 --> 00:57:55,519
different to everything else. They are certainly not. They are

1058
00:57:56,280 --> 00:57:59,360
the same as any other machine learning model, except slightly

1059
00:57:59,440 --> 00:58:04,360
more of a problem, because you're probably involving them in critical parts

1060
00:58:04,400 --> 00:58:05,280
of generating code.

1061
00:58:05,599 --> 00:58:08,880
Speaker 2: Just be careful and measure, please be careful. Yeah. So,

1062
00:58:09,199 --> 00:58:13,760
doctor Burchell, what's next for you? What's in your inbox?

1063
00:58:14,400 --> 00:58:17,239
Speaker 3: As I said, I'm heading down to Melbourne in a

1064
00:58:17,320 --> 00:58:19,440
month for NDC. I'm going to be giving this talk

1065
00:58:19,519 --> 00:58:21,679
actually the one I gave at Porto, and I'm going

1066
00:58:21,760 --> 00:58:24,559
to be giving one that I gave in Oslo just

1067
00:58:24,599 --> 00:58:28,360
about the psychology of LLMs. If you will not be

1068
00:58:28,480 --> 00:58:30,920
with me in Australia, you can also watch that on

1069
00:58:31,079 --> 00:58:35,639
YouTube on the NDC channel. At the moment I'm laying

1070
00:58:35,719 --> 00:58:38,400
kind of low, I'm actually going for my German citizenship,

1071
00:58:38,599 --> 00:58:42,159
so nice. Yeah, I gotta do my citizenship test in June.

1072
00:58:42,360 --> 00:58:45,119
Just did my language test a couple of weeks ago, and

1073
00:58:45,440 --> 00:58:48,039
like everything in Germany, it takes months, so I may

1074
00:58:48,119 --> 00:58:49,599
be able to apply by the end of the year.

1075
00:58:49,800 --> 00:58:51,880
Speaker 2: So you got to ratchet up your complaining too.

1076
00:58:52,760 --> 00:58:56,320
Speaker 3: I laughed, actually so hard, because a friend of mine

1077
00:58:56,360 --> 00:58:58,960
did her exam and one of her writing tests was

1078
00:58:59,000 --> 00:59:00,599
to write a letter of complaint.

1079
00:59:05,599 --> 00:59:08,880
Speaker 1: Well it started because before you came on, Richard, Jody says,

1080
00:59:08,920 --> 00:59:10,880
how you doing, and I said, I can't complain, but

1081
00:59:10,960 --> 00:59:14,599
I do anyway. She says, very German.

1082
00:59:17,079 --> 00:59:20,320
Speaker 2: Awesome, all right, thanks Jody, really appreciate it. Yeah, thank you.

1083
00:59:20,760 --> 00:59:22,559
What a great conversation, all.

1084
00:59:22,480 --> 00:59:24,440
Speaker 3: Right, Always always a pleasure, Okay, and.

1085
00:59:24,440 --> 00:59:27,079
Speaker 1: We'll talk to you next time on dot net rocks.

1086
00:59:47,280 --> 00:59:49,840
Dot net Rocks is brought to you by Franklin's Net

1087
00:59:50,119 --> 00:59:54,039
and produced by PWOP Studios, a full service audio, video

1088
00:59:54,159 --> 00:59:58,159
and post production facility located physically in New London, Connecticut,

1089
00:59:58,480 --> 01:00:02,639
and of course in the cloud online at PWOP dot com.

1090
01:00:03,480 --> 01:00:05,519
Visit our website at d O T N E t

1091
01:00:05,840 --> 01:00:09,840
R O c k S dot com for RSS feeds, downloads,

1092
01:00:10,000 --> 01:00:13,679
mobile apps, comments, and access to the full archives going

1093
01:00:13,719 --> 01:00:17,119
back to show number one, recorded in September two thousand

1094
01:00:17,119 --> 01:00:19,760
and two. And make sure you check out our sponsors.

1095
01:00:19,960 --> 01:00:22,760
They keep us in business. Now, go write some code,

1096
01:00:23,320 --> 01:00:24,079
See you next time.

1097
01:00:25,000 --> 01:00:28,920
Speaker 2: You got Jack Middle Vans and

