1
00:00:12,160 --> 00:00:20,760
Speaker 1: Hey man, it's not ned. You're not why not? You know,

2
00:00:20,839 --> 00:00:23,280
it's just the way I feel this morning.

3
00:00:23,359 --> 00:00:25,120
Speaker 2: Yeah, you've been had a little too much weekend.

4
00:00:25,160 --> 00:00:29,760
Speaker 1: I think I'm Carl Franklin. That's Richard Campbell, you know. Okay,

5
00:00:29,879 --> 00:00:33,600
I won't do that again until marijuana is completely legal

6
00:00:33,640 --> 00:00:36,159
in all fifty states of America. Oh, it's going to

7
00:00:36,200 --> 00:00:42,520
be a while and then I'll hey man, how you doing, Richard?

8
00:00:42,840 --> 00:00:43,320
I'm good.

9
00:00:43,799 --> 00:00:45,880
Speaker 2: I did have a bit of a crazy weekend. You know,

10
00:00:46,119 --> 00:00:48,759
between my birthday, my wife's birthday, and and many of

11
00:00:48,759 --> 00:00:51,679
our friends are all like this last two weeks of July,

12
00:00:51,880 --> 00:00:54,560
so we bought them all up to the coast and

13
00:00:54,560 --> 00:00:57,399
it's yeah, I've been a week in a debauchery, honestly.

14
00:00:57,600 --> 00:00:59,359
Speaker 1: Yeah, your nose does look a little red.

15
00:00:59,520 --> 00:01:02,000
Speaker 2: Actually, I'm a little pinked up, a little pinked up

16
00:01:02,039 --> 00:01:05,319
today doing some damage. But you know, yeah, there's a

17
00:01:05,359 --> 00:01:07,239
couple of buddies up here that are literally known for

18
00:01:07,239 --> 00:01:09,079
forty years. Like that's when I'm wow, get old.

19
00:01:09,400 --> 00:01:10,280
Speaker 1: That's so cool though.

20
00:01:10,400 --> 00:01:13,400
Speaker 2: Yeah, it's something. Yeah, how about you. You're staying out

21
00:01:13,400 --> 00:01:15,799
of trouble. You're playing I see, I see your calendar.

22
00:01:15,840 --> 00:01:17,239
You're playing summer time.

23
00:01:17,359 --> 00:01:20,640
Speaker 1: Yeah. Yeah, the band is currently just on a tear

24
00:01:20,799 --> 00:01:21,200
right now.

25
00:01:21,439 --> 00:01:23,879
Speaker 2: That's awesome, dude, And uh.

26
00:01:23,439 --> 00:01:27,680
Speaker 1: We're getting our original bass player back in September. August

27
00:01:27,760 --> 00:01:31,200
thirty first is our current bass player's last date. Oh,

28
00:01:31,280 --> 00:01:34,120
I'll tell you a little band story. So it was

29
00:01:34,200 --> 00:01:39,280
during COVID that Kevin, our original bass player, basically stopped

30
00:01:39,280 --> 00:01:42,680
coming to rehearsals because of COVID and and he had

31
00:01:42,680 --> 00:01:46,079
a good reason too. He was immune deficient and it

32
00:01:46,120 --> 00:01:50,239
turns out he got leukemia. He had and his immune

33
00:01:50,359 --> 00:01:53,519
system was compromised. Because of all that stuff, he had

34
00:01:53,519 --> 00:01:56,560
to stay home and he was like, guys, I'm I'm out.

35
00:01:56,599 --> 00:01:59,439
I can't do this and you know, my wife and

36
00:01:59,519 --> 00:02:03,920
child all that. Yeah. So then he gradually got over it,

37
00:02:03,959 --> 00:02:06,359
and I kept calling him once in a while and said, hey, man,

38
00:02:06,400 --> 00:02:08,759
how you doing. He's like, yeah, I'm okay, but I'm

39
00:02:08,800 --> 00:02:10,759
still not. You know, I tried playing out a couple

40
00:02:10,719 --> 00:02:14,080
of times with you know, somebody else, and I'm still not.

41
00:02:14,800 --> 00:02:19,800
And then finally he's like, okay, I'm retiring. I'm in oh,

42
00:02:19,840 --> 00:02:23,400
but not till September. And it turns out that I

43
00:02:23,439 --> 00:02:28,000
called him because our current bass player Chris basically met

44
00:02:28,000 --> 00:02:32,639
a girl, sold his house, and moved three hours away

45
00:02:33,000 --> 00:02:33,719
and married her.

46
00:02:33,879 --> 00:02:36,039
Speaker 2: Hopefully not necessarily an order, but okay.

47
00:02:35,840 --> 00:02:38,919
Speaker 1: Yeah, yeah, yeah. But he said, don't worry, guys, I'm

48
00:02:39,000 --> 00:02:41,560
still in the band. And we're like, oh, yeah, right,

49
00:02:41,759 --> 00:02:47,039
you know so. But props to Chris three hours. Yeah,

50
00:02:47,639 --> 00:02:49,479
he didn't come to rehearsals so that, you know, we

51
00:02:49,520 --> 00:02:52,960
couldn't really learn any new stuff, but we could send

52
00:02:53,039 --> 00:02:54,759
him a song and say, hey, learn this, and he

53
00:02:54,800 --> 00:02:56,599
was pretty good about it. And he never had a

54
00:02:56,599 --> 00:02:59,159
problem on the gig. But he showed up at every

55
00:02:59,240 --> 00:03:03,400
gig three hours away. It's a lot of driving. But

56
00:03:03,520 --> 00:03:05,759
I was going to say that the whole purpose of

57
00:03:05,800 --> 00:03:08,639
this was that it comes September. See, Kevin knows a

58
00:03:08,639 --> 00:03:11,599
lot of our original tunes that Chris doesn't know, and

59
00:03:11,680 --> 00:03:13,879
a lot more than the Steely Dan songs which were

60
00:03:13,879 --> 00:03:16,599
known for. So we're going to be playing a lot

61
00:03:16,639 --> 00:03:20,639
more original tunes, hopefully in bigger venues. Yeah, more Franklin Brothers.

62
00:03:20,759 --> 00:03:23,199
It's cool. Anyway, I've taken up enough time with my

63
00:03:23,400 --> 00:03:27,840
stupid stories of music and bands. Let's play role. Let's

64
00:03:27,919 --> 00:03:37,840
roll the music for better no framework, awesome, man, what

65
00:03:37,919 --> 00:03:40,439
do you got. I can't believe I've never talked about

66
00:03:40,479 --> 00:03:46,240
this particular project before. It's curated by the dot Net Foundation.

67
00:03:47,080 --> 00:03:51,000
It's Fluent validation. Okay, it's a validation library for dot

68
00:03:51,000 --> 00:03:53,680
Net that uses a fluent interface. You know what that is? This?

69
00:03:53,759 --> 00:03:56,879
Do this? Do that? Mm hmm and lambda expressions for

70
00:03:56,960 --> 00:04:02,080
building strongly typed validation rules, right, okay, So you can

71
00:04:02,120 --> 00:04:05,000
create these rules and then you can obviously use them,

72
00:04:05,240 --> 00:04:07,840
and it's very popular.

73
00:04:08,080 --> 00:04:10,599
Speaker 2: Well, and it's a way of building abstract validation rules

74
00:04:10,639 --> 00:04:13,840
you might actually reuse or ither than keep recoding it exactly.

75
00:04:14,159 --> 00:04:16,480
Speaker 1: Yeah. Yeah, And you know, if you decide to do

76
00:04:16,560 --> 00:04:18,879
it on the fly, where do you start that information?

77
00:04:19,639 --> 00:04:21,480
Where does it go? Does it go in a database?

78
00:04:22,480 --> 00:04:24,399
You know? Does it change? How does it change? Who

79
00:04:24,480 --> 00:04:29,040
changes it? Like, there's so many, so many things. But yeah,

80
00:04:29,560 --> 00:04:34,399
eight point nine thousand stars. Yeah, I would learn it,

81
00:04:34,439 --> 00:04:37,439
love it. Man. That's clearly a great tool. It's cool,

82
00:04:37,480 --> 00:04:39,199
and I'm going to start using it. I can't believe

83
00:04:39,199 --> 00:04:43,040
that I haven't even known about this existence before. So

84
00:04:43,120 --> 00:04:44,480
that's what I got. Who's talking to us?

85
00:04:44,600 --> 00:04:46,800
Speaker 2: Richard grabbed a comment off of the show eighteen ninety one,

86
00:04:46,800 --> 00:04:48,759
which you did back in March of twenty four with

87
00:04:48,920 --> 00:04:53,040
her friend Anthony LRB. Who did they he was he'd

88
00:04:53,040 --> 00:04:56,319
written that open source library for API observability.

89
00:04:56,680 --> 00:04:57,480
Speaker 1: Yeah right.

90
00:04:57,639 --> 00:04:59,639
Speaker 2: We had a great conversation with her. An API tool

91
00:04:59,680 --> 00:05:01,879
kit I was his two line. I know we're going

92
00:05:01,879 --> 00:05:04,399
to talk about APIs a bunch today. And our friend

93
00:05:04,439 --> 00:05:07,600
Matt Lacy actually commented on the show. He said, hey,

94
00:05:07,720 --> 00:05:10,279
ten plus years ago, I built something for API. Because

95
00:05:10,319 --> 00:05:12,480
this is your checking on the client end. I assume

96
00:05:12,519 --> 00:05:14,480
there was a potential business in it, but I couldn't

97
00:05:14,480 --> 00:05:16,759
work out how, you know, because building good software and

98
00:05:16,839 --> 00:05:20,040
selling software two different jobs, right, Like that's a different thing.

99
00:05:21,160 --> 00:05:23,240
Great to see API toolkitd in existence now I'm making

100
00:05:23,279 --> 00:05:26,319
APIs more reliable because that was our whole conversation, right

101
00:05:26,519 --> 00:05:29,759
was you know you changed some you changed an API

102
00:05:29,920 --> 00:05:33,000
and somebody with a dependency on it suddenly goes down. Yeah,

103
00:05:33,279 --> 00:05:35,720
you make people sad in a big hurry, and so

104
00:05:36,079 --> 00:05:38,800
API toolkit was all about doing that validation. Hey, Matt,

105
00:05:38,920 --> 00:05:40,720
thanks so much for being a long time listener. And

106
00:05:41,000 --> 00:05:42,360
I don't know if you have a copy of Music

107
00:05:42,439 --> 00:05:44,879
Code Buy already, but you do now if you'd like

108
00:05:44,879 --> 00:05:46,399
a copy of Music code By, I write a comment

109
00:05:46,439 --> 00:05:48,240
on the website at don at Rocks dot com or

110
00:05:48,279 --> 00:05:50,519
on the facebooks. We publish every show there, and if

111
00:05:50,519 --> 00:05:52,000
you comment there and every reading the show, we'll send

112
00:05:52,040 --> 00:05:54,120
you a copy of music code By and Music to code.

113
00:05:53,959 --> 00:05:56,040
Speaker 1: By for those who don't know, is something you can

114
00:05:56,120 --> 00:05:59,920
listen to while you're coding, or to calm a restless dog,

115
00:06:00,519 --> 00:06:02,360
or to put your children to sleep at night.

116
00:06:03,399 --> 00:06:07,399
Speaker 2: Here I thought that was the nuclear weapons geek out

117
00:06:07,439 --> 00:06:08,639
for putting children to sleep.

118
00:06:08,839 --> 00:06:11,879
Speaker 1: Oh come on, now, that was amazing, and it always

119
00:06:12,040 --> 00:06:14,720
is every year, but I was that particular one.

120
00:06:14,759 --> 00:06:17,439
Speaker 2: Somebody told me it's like what I was depressed. So

121
00:06:17,519 --> 00:06:19,360
the key, I'm a little more level in that.

122
00:06:19,480 --> 00:06:19,680
Speaker 1: Yeah.

123
00:06:19,680 --> 00:06:21,600
Speaker 2: And I did almost all of the talking too, So

124
00:06:21,839 --> 00:06:23,639
apparently it's pretty good knocking kids out.

125
00:06:23,879 --> 00:06:27,319
Speaker 1: Well, there you go. Hopefully we'll have something more positive

126
00:06:27,439 --> 00:06:27,759
to say.

127
00:06:27,800 --> 00:06:29,680
Speaker 2: This is only positive about nuclear weapons.

128
00:06:29,759 --> 00:06:34,839
Speaker 1: Go on, Well you know, okay, Well let's get into it.

129
00:06:34,959 --> 00:06:37,560
We're gonna introduce our guest right now. I'm going to

130
00:06:37,639 --> 00:06:42,240
introduce our guest. Andrea Kamenev is our guest, and from

131
00:06:42,319 --> 00:06:46,120
twenty sixteen to twenty twenty four, Andrea worked at Microsoft

132
00:06:46,240 --> 00:06:49,360
in various architect roles in Europe helping customers to bring

133
00:06:49,439 --> 00:06:53,199
their applications to Azure. Now he works as a product

134
00:06:53,279 --> 00:06:57,319
manager at Azure Api Management. All that do that we

135
00:06:57,439 --> 00:06:59,959
were kind of just talking about, but in the Azure way.

136
00:07:00,759 --> 00:07:04,079
Welcome Andre, thanks for hearing me, Thanks for being here.

137
00:07:04,879 --> 00:07:08,040
We first started like the quick start teams, those folks

138
00:07:08,120 --> 00:07:12,199
that helped onboard people into the cloud. Did that digital transformation.

139
00:07:11,759 --> 00:07:14,279
Speaker 3: Thing, Uh, I mean the the service by itself.

140
00:07:14,600 --> 00:07:17,600
Speaker 1: Yeah, your earlier role before you joined the product team.

141
00:07:17,839 --> 00:07:19,959
Speaker 3: Yeah, yeah, So I was a part of what was

142
00:07:20,040 --> 00:07:23,480
called here at Microsoft Global Black Built Team. Okay, so

143
00:07:23,600 --> 00:07:27,480
it's like a bunch of cloudsision architects who help local

144
00:07:27,600 --> 00:07:32,279
teams like field engineers and customer social architects in the

145
00:07:32,360 --> 00:07:36,199
MEA region to build stuff with Azure. So our team

146
00:07:36,240 --> 00:07:38,560
was mostly focusing on Kubernatis related stuff. So I was

147
00:07:38,600 --> 00:07:41,759
working a lot with customers on bringing workloads to Azurecubneti

148
00:07:41,839 --> 00:07:43,720
service asually had to open shift and so on.

149
00:07:43,959 --> 00:07:46,560
Speaker 2: So yeah, deep, So the move over to API management

150
00:07:46,600 --> 00:07:49,319
makes sense because that's a lynchpin problem when you expose

151
00:07:49,360 --> 00:07:52,439
stuff on the cloud that was typically just on prem before.

152
00:07:53,439 --> 00:07:53,600
Speaker 1: Yeah.

153
00:07:53,639 --> 00:07:56,519
Speaker 3: Absolutely, Yeah, we've seen a lot of customers who interested

154
00:07:56,519 --> 00:07:59,120
in APIM like and even even back then when I

155
00:07:59,240 --> 00:08:01,759
was not a part of a PAM team, like, okay,

156
00:08:01,839 --> 00:08:03,560
I have a bunch of APIs and my cuminator is

157
00:08:03,560 --> 00:08:05,800
how do they expose them securely? Like what what can

158
00:08:05,839 --> 00:08:06,319
you go for me?

159
00:08:06,439 --> 00:08:06,920
Speaker 1: Microsoft?

160
00:08:07,360 --> 00:08:08,920
Speaker 3: And then I think back back then there was a

161
00:08:09,000 --> 00:08:11,680
self hosted gate when it is it is still out there.

162
00:08:12,560 --> 00:08:13,279
It was a solution.

163
00:08:13,560 --> 00:08:16,759
Speaker 1: So yeah, so what are you working on these days? Yeah?

164
00:08:16,800 --> 00:08:21,040
Speaker 3: So these days, I guess there's a lot of interest

165
00:08:21,199 --> 00:08:24,399
in gen ai in large language models. Chugupt is all

166
00:08:24,439 --> 00:08:28,800
over the place. So goodness, right now, we in I

167
00:08:28,879 --> 00:08:31,000
believe in May, Yeah, in May we released the gen

168
00:08:31,040 --> 00:08:34,840
Ai Gatewakypical just Nature pay Management to help customers build

169
00:08:35,559 --> 00:08:37,480
like intelligent applications with lll ms.

170
00:08:37,519 --> 00:08:40,679
Speaker 1: So yeah, that's I have a great idea. Let's give

171
00:08:40,759 --> 00:08:43,519
our AI all of our API keys and let them

172
00:08:43,559 --> 00:08:44,639
do whatever they want to do with it.

173
00:08:45,600 --> 00:08:48,799
Speaker 3: Yeah, that's that's actually a better approach. That's that's what

174
00:08:49,120 --> 00:08:50,840
That's one of the things that were actually trying to

175
00:08:50,919 --> 00:08:52,320
solve for customers.

176
00:08:52,159 --> 00:08:54,679
Speaker 1: Right, I know. Yeah, it just sounds crazy, doesn't it,

177
00:08:54,960 --> 00:08:58,480
given all the y the AI hiccups, and things that

178
00:08:58,639 --> 00:09:01,840
people don't really trust them. But I mean, so let's

179
00:09:02,039 --> 00:09:05,440
let's bust that myth. You know, why would we use

180
00:09:05,679 --> 00:09:10,200
AI to make our API calls, manage our APIs do

181
00:09:10,279 --> 00:09:14,279
all those things that normally trusted folks to. Yeah.

182
00:09:14,360 --> 00:09:17,039
Speaker 3: So here I think, like from APM side, we have

183
00:09:17,279 --> 00:09:20,960
two different stories. Like, first of all, we use gen

184
00:09:20,960 --> 00:09:25,000
AI ourselves to help customers write the policies that we

185
00:09:25,120 --> 00:09:29,679
have an API M. And the second thing is we

186
00:09:29,799 --> 00:09:34,000
have customers for building intelligent applications using Judge PT models,

187
00:09:34,200 --> 00:09:37,279
using other models that are out there in az REAI studio,

188
00:09:38,200 --> 00:09:40,600
and they have challenges because if you think about as

189
00:09:40,679 --> 00:09:43,519
open EI for example, it is still an API, you

190
00:09:43,639 --> 00:09:46,759
still have the same challenges when it comes to managing

191
00:09:46,840 --> 00:09:51,039
and securing access to APIs. So we built but they

192
00:09:51,120 --> 00:09:54,159
have specific kind of a number of challenges which are

193
00:09:54,279 --> 00:09:57,399
kind of a specific to l ms. So this is

194
00:09:57,480 --> 00:09:59,960
kind of the second part where we help customers to secure,

195
00:10:00,279 --> 00:10:03,320
manage and scale Open the Eye deployments for the applications

196
00:10:03,360 --> 00:10:07,120
with that's what we call JENNYI Gateway and HPA management.

197
00:10:07,200 --> 00:10:09,120
Speaker 1: So what are some of the ways in which AI

198
00:10:09,240 --> 00:10:15,480
can be used with APIs. Obviously creating the management stuff

199
00:10:15,519 --> 00:10:22,000
around it. But would you necessarily trust an AI with

200
00:10:22,240 --> 00:10:25,480
your API keys and say, you know, here are the

201
00:10:25,600 --> 00:10:28,879
rules and times under which you would make these API calls.

202
00:10:29,000 --> 00:10:32,159
And I'm trying to put wrap my head around that.

203
00:10:32,679 --> 00:10:38,159
Speaker 3: Yeah, I think yeahs as always, it depends, right, So Yeah,

204
00:10:39,639 --> 00:10:43,840
if you have like specific like access controls in place,

205
00:10:44,440 --> 00:10:46,799
why not Like if you, for example, that apimor you

206
00:10:46,840 --> 00:10:51,159
can you can provide the specific keys for the LLM

207
00:10:51,519 --> 00:10:54,440
to like enhance the experience for those who use those

208
00:10:54,559 --> 00:10:57,320
l ms you can and all that. Yeah, then why

209
00:10:57,360 --> 00:11:00,000
not Like you're not given the access to like full API,

210
00:11:00,159 --> 00:11:03,759
You're just giving access to a subset of operations.

211
00:11:03,679 --> 00:11:05,519
Speaker 1: Sure, which are for example, wedn't.

212
00:11:05,240 --> 00:11:08,519
Speaker 3: Leave, or they just have access to specific data that

213
00:11:08,720 --> 00:11:11,559
you're not really they're not, which is not really trick.

214
00:11:11,679 --> 00:11:14,159
So yeah, that's definitely why I will use it.

215
00:11:14,200 --> 00:11:18,200
Speaker 1: I'd be definitely okay with gets, but posts and puts.

216
00:11:18,600 --> 00:11:21,879
Speaker 3: Yeah, I don't know, yeahuse.

217
00:11:21,080 --> 00:11:23,039
Speaker 2: Otherwise you'd have to make these rules yourself, right, I

218
00:11:23,080 --> 00:11:24,960
mean that that's the point here is that you got

219
00:11:24,960 --> 00:11:26,840
a machine learning model essentially that's figuring out what the

220
00:11:26,879 --> 00:11:28,399
optimal rules are for utilization.

221
00:11:28,679 --> 00:11:30,039
Speaker 3: Yeah, so there is a lot. There is one thing

222
00:11:30,120 --> 00:11:32,639
so as I mentioned, like first thing we've built like

223
00:11:32,639 --> 00:11:35,360
an a PM. For example, we're helping customers with with

224
00:11:35,639 --> 00:11:41,000
lllms to configure a PM. So I guess you're kind

225
00:11:41,039 --> 00:11:44,120
of familiar with AM. We have this XML policies which

226
00:11:44,240 --> 00:11:46,360
can be pretty long documents.

227
00:11:46,840 --> 00:11:49,919
Speaker 2: Yeah, so typically that you're doing the thing, you're trying

228
00:11:49,960 --> 00:11:52,519
to say no one user can do more than this

229
00:11:52,720 --> 00:11:55,759
many or if it's growing, you know, massively limited so

230
00:11:55,919 --> 00:11:58,679
you don't knock other people off and make sure they're

231
00:11:58,720 --> 00:12:01,399
using the right accounts like it's it's just forwarding.

232
00:12:01,799 --> 00:12:03,000
Speaker 1: Yeah, a lot of stuff in there.

233
00:12:03,039 --> 00:12:04,720
Speaker 2: You know where stuff is, and you know what failover

234
00:12:04,799 --> 00:12:07,480
modes look like, like yeah, AP. We've done a few

235
00:12:07,519 --> 00:12:10,879
shows on API management now and it's like pretty powerful stuff.

236
00:12:11,159 --> 00:12:14,600
You know, you're gonna pub put an API in a

237
00:12:14,679 --> 00:12:18,759
public like you're paying when everybody somebody calls that, so

238
00:12:19,000 --> 00:12:20,759
you kind of want to put governance around that. But

239
00:12:20,919 --> 00:12:22,960
write and all those rules like when you really dig

240
00:12:23,039 --> 00:12:25,320
into it, it's complicated. So that was my first thought

241
00:12:25,320 --> 00:12:27,120
when I thought about, Yeah, what do I want Generator

242
00:12:27,159 --> 00:12:29,200
I to do. It's like, look at what's actually going

243
00:12:29,279 --> 00:12:30,720
on and write me better rules.

244
00:12:30,600 --> 00:12:32,720
Speaker 3: Yeah, exactly. And that's kind of the two kind of

245
00:12:32,759 --> 00:12:35,879
two use cases that we focused on with Copilot. Like, first,

246
00:12:36,000 --> 00:12:38,960
we we decided that, oh, we we know that writing

247
00:12:39,000 --> 00:12:41,720
policies is hard. We have like fifty sixty different policses

248
00:12:41,799 --> 00:12:45,320
nippets to do like Validay, Jotan retripolicies like write limits

249
00:12:45,320 --> 00:12:48,279
and stuff like that. So we decided like, let's have

250
00:12:48,399 --> 00:12:50,519
a let's have a way for customers to express and

251
00:12:50,799 --> 00:12:52,840
plant English, like, for example, I want to have a

252
00:12:52,919 --> 00:12:55,919
policy to write limit this API for you know, five

253
00:12:56,000 --> 00:12:59,360
for quest per second, and then Copilot will just explain

254
00:12:59,440 --> 00:13:01,320
on sorry not explain, but generate a policy for you

255
00:13:01,360 --> 00:13:03,559
and then just copy and paste this into the XM

256
00:13:03,679 --> 00:13:07,080
leditor and that's it. And another one is, as I

257
00:13:07,120 --> 00:13:10,679
mentioned that the second scenario, policies can become pretty long,

258
00:13:10,799 --> 00:13:12,960
like two hundred three hundred, and sometimes you don't even

259
00:13:13,000 --> 00:13:14,200
understand what's going on there.

260
00:13:14,679 --> 00:13:20,759
Speaker 2: In XML, in XML exactly. To another theme on the

261
00:13:20,840 --> 00:13:24,559
show lately is we hate XML. Yeah, there's a use

262
00:13:24,639 --> 00:13:27,679
for AI right there. Hey, translate this XML to.

263
00:13:27,720 --> 00:13:30,759
Speaker 3: Me in English exactly, And that's actually what we do

264
00:13:30,879 --> 00:13:33,559
with the second scenario. So yeah, you can just select

265
00:13:33,879 --> 00:13:37,360
XML whole thing or just a policy snippet and then

266
00:13:37,480 --> 00:13:39,960
you ask it to explain it to you and the

267
00:13:40,039 --> 00:13:44,399
fund stuff. It's only explaining just like oh, this policy does,

268
00:13:44,519 --> 00:13:47,240
this policy is that, but it also understands the context,

269
00:13:47,320 --> 00:13:49,679
like if you have too different variables, you have context,

270
00:13:49,840 --> 00:13:54,159
you have policy expressions some logic in there. It also

271
00:13:54,200 --> 00:13:57,120
will explain that, for example, and all you are doing

272
00:13:57,200 --> 00:14:00,360
validate job policy and you have this admin claim that

273
00:14:00,440 --> 00:14:03,120
you're checking. If you're checking the saddening claim, if it exists,

274
00:14:03,240 --> 00:14:06,240
then you are allowed to do this operation. So yeah,

275
00:14:06,240 --> 00:14:08,919
it's it's pretty it's pretty good in explaining policies because

276
00:14:09,440 --> 00:14:12,600
we also we are not using like the plane model,

277
00:14:13,000 --> 00:14:16,759
but we're also like using this it's called retrieval augmented

278
00:14:17,120 --> 00:14:20,360
generation pattern where we also have like policy snippets that

279
00:14:20,440 --> 00:14:23,639
are stored in a storage and this model can also

280
00:14:23,759 --> 00:14:27,240
use this policy snippets that we provided to better like

281
00:14:27,399 --> 00:14:30,320
respond with correct policies with better explanations.

282
00:14:30,360 --> 00:14:32,200
Speaker 2: And so yeah, so the same way I would actually

283
00:14:32,240 --> 00:14:34,600
write policies is I go cut and paste from well

284
00:14:34,639 --> 00:14:37,759
written policies exactly. Yeah, you've trained a model on well

285
00:14:37,799 --> 00:14:39,840
written policy so that it has a good chance of

286
00:14:40,360 --> 00:14:41,840
expressing better ones.

287
00:14:41,679 --> 00:14:42,720
Speaker 1: For a customer.

288
00:14:42,919 --> 00:14:48,159
Speaker 2: Yeah, exactly, Okay, I mean I could see a few

289
00:14:48,159 --> 00:14:50,200
different things going on here at once because you and

290
00:14:50,240 --> 00:14:52,480
I'll include a link to this blog post here. You're

291
00:14:52,519 --> 00:15:00,559
also talking about using the the API APIM to manage

292
00:15:00,720 --> 00:15:05,039
utilization of the open Ai service because that stuff gets expensive,

293
00:15:05,279 --> 00:15:07,759
like exactly, Yeah, those tokens run away on you and

294
00:15:07,840 --> 00:15:09,000
like you're having a bad day.

295
00:15:09,360 --> 00:15:12,080
Speaker 3: Yeah, yeah, that's that's that's actually an interesting use case

296
00:15:12,120 --> 00:15:14,360
because as I mentioned, we have customers who are trying out,

297
00:15:14,399 --> 00:15:19,360
they're building pocs, they're building small applications, and there's azual

298
00:15:19,360 --> 00:15:21,600
open Eye service and it makes it really easy for

299
00:15:21,720 --> 00:15:24,639
you to start. You just deploy open endpoint, you select

300
00:15:24,720 --> 00:15:26,799
for example, you have you want to have GPT four

301
00:15:26,879 --> 00:15:29,399
model and the end you're good to go, like you can.

302
00:15:29,840 --> 00:15:32,679
That's just an API. You get your ap I key,

303
00:15:32,759 --> 00:15:34,879
you import dais the care of your choice to your application,

304
00:15:35,080 --> 00:15:37,759
and and that's it. You're sending prompts, you're saving completions.

305
00:15:37,840 --> 00:15:43,480
Everything's fine, but then customers realize that okay token comes exactly, Yeah,

306
00:15:44,840 --> 00:15:47,399
there are tokens, and tokens is like something which is

307
00:15:47,679 --> 00:15:50,480
super important in edge open and in general and l

308
00:15:50,720 --> 00:15:53,799
MS you spend tokens for prompts, you spend tokens for completions,

309
00:15:54,519 --> 00:15:58,799
and even when you do play open I instance, there

310
00:15:58,919 --> 00:16:01,600
is a quota associate to your model which is expressed

311
00:16:01,639 --> 00:16:04,840
and TPM which is tokens per minute. Right, and then

312
00:16:04,879 --> 00:16:08,840
after all of these experimentations customers, they started to realize that, Okay,

313
00:16:08,919 --> 00:16:11,360
now we need to wait to manage this because okay,

314
00:16:11,440 --> 00:16:13,840
we've built our first POC. We have one team who

315
00:16:13,919 --> 00:16:16,799
developed this kind of a private preview app which is

316
00:16:16,840 --> 00:16:18,679
not full in production right now. But now we have

317
00:16:18,879 --> 00:16:22,039
ten different departments, ten different teams who also want to

318
00:16:22,080 --> 00:16:24,840
get access to this model, And now, how can I

319
00:16:24,919 --> 00:16:28,559
manage that? How can I limit the consumption per team,

320
00:16:28,639 --> 00:16:31,120
per department, per developer, How can I.

321
00:16:31,200 --> 00:16:34,320
Speaker 2: Make sure signed costs out like my sessonment had is

322
00:16:34,440 --> 00:16:37,200
firmly on right now, It's like there's nothing better in

323
00:16:37,240 --> 00:16:39,720
this world. And being able to build out resources to

324
00:16:39,759 --> 00:16:41,120
the individual teams for what they do.

325
00:16:42,679 --> 00:16:44,639
Speaker 3: Yeah, and that's and that's a huge issue, Like you

326
00:16:44,720 --> 00:16:46,759
need to figure out how many tokens were consumed by

327
00:16:46,759 --> 00:16:49,519
a specific team, sure what kind of model they used,

328
00:16:50,720 --> 00:16:53,759
And then like okay, at the beginning, you have one endpoint.

329
00:16:53,960 --> 00:16:56,159
But what if you want to have multiple endpoints because

330
00:16:56,519 --> 00:17:00,480
like you're going production, you want to scale. How do

331
00:17:00,559 --> 00:17:03,759
you all balance how do you like create circuit breaker

332
00:17:03,840 --> 00:17:06,200
rules to make sure that for example, okay, wile our

333
00:17:06,279 --> 00:17:10,039
first instance is throat out responses with four twenty nine,

334
00:17:10,319 --> 00:17:13,240
how can I fail over to a different endpoint?

335
00:17:13,480 --> 00:17:13,599
Speaker 1: Right?

336
00:17:13,720 --> 00:17:16,920
Speaker 3: Yeah, so, yeah, there are a lot of challenges. Now

337
00:17:17,200 --> 00:17:21,319
you mentioned the given access API keys. Distributing API keys

338
00:17:21,359 --> 00:17:23,720
to all of these teams also doesn't sound like a

339
00:17:23,759 --> 00:17:27,960
good idea. So that's why we've built like a lot

340
00:17:28,000 --> 00:17:30,359
of stuff that is in this blog post for Jenny

341
00:17:30,400 --> 00:17:33,920
I announcement. We wanted to solve these challenges for customers

342
00:17:34,359 --> 00:17:36,920
who are kind of scaling and trying to like productize

343
00:17:37,039 --> 00:17:41,559
their their investment into as open THEI specifically, but also

344
00:17:42,400 --> 00:17:45,119
for other models like elms and stuff.

345
00:17:45,279 --> 00:17:45,480
Speaker 1: Yeah.

346
00:17:46,359 --> 00:17:49,200
Speaker 2: Certainly, one of the experiences I've dealt with with companies

347
00:17:49,240 --> 00:17:53,160
building a software into the cloud, even when they you know,

348
00:17:53,200 --> 00:17:55,799
they've got authentication and they're building back to the customer,

349
00:17:56,480 --> 00:17:59,039
the customer makes a mistake with the API and racks

350
00:17:59,119 --> 00:18:02,640
up a couple a million transactions that were test transactions,

351
00:18:02,759 --> 00:18:04,839
Like they're not making money on the back end. Then

352
00:18:04,920 --> 00:18:07,559
you're sending them this ugly bill and they, you know,

353
00:18:07,759 --> 00:18:11,359
want help. In the meantime, you've also gotten an ugly bill,

354
00:18:12,160 --> 00:18:13,880
you know, because you ran it on the back end.

355
00:18:13,920 --> 00:18:16,400
So this is this whole game of like who's.

356
00:18:16,200 --> 00:18:17,200
Speaker 1: Holding the bag here?

357
00:18:17,720 --> 00:18:19,400
Speaker 2: You know, you don't want to punish your customer for

358
00:18:19,480 --> 00:18:23,039
making a mistake. If you do, you may lose them

359
00:18:23,039 --> 00:18:25,960
as a customer. You're not necessarily going to get remediated,

360
00:18:26,119 --> 00:18:28,519
you know, back to Azure too. But although I've certainly

361
00:18:28,559 --> 00:18:30,519
had that experience where I've done stupid stuff in Azure

362
00:18:30,519 --> 00:18:32,200
and called them like I'm really sorry I did this,

363
00:18:32,400 --> 00:18:34,039
or like yep, fine, I'll wipe it.

364
00:18:34,319 --> 00:18:38,200
Speaker 1: Oh you're the guy. Yeah, we've been waiting for your call.

365
00:18:39,440 --> 00:18:39,920
What was that?

366
00:18:40,519 --> 00:18:47,799
Speaker 2: But the business reality of this consumption model is you

367
00:18:47,920 --> 00:18:50,759
don't always get paid for the stuff that you used,

368
00:18:51,279 --> 00:18:55,039
right or and or are willing to like that's this.

369
00:18:55,400 --> 00:18:58,440
All of these mechanisms to me speak to let's catch

370
00:18:58,519 --> 00:19:00,480
why didn't you notice? Why didn't you catch it before

371
00:19:00,519 --> 00:19:03,920
it ran away? You know, after the first million tokens?

372
00:19:04,200 --> 00:19:07,119
Why didn't you stop me? And these are the tools,

373
00:19:07,279 --> 00:19:10,799
right like, this is how this stops from being worse exactly.

374
00:19:11,000 --> 00:19:11,240
Speaker 1: Yeah.

375
00:19:11,519 --> 00:19:14,119
Speaker 3: Yeah, So we were trying to make sure that customers

376
00:19:14,200 --> 00:19:17,640
have the right tools to have the proper governance in place.

377
00:19:18,880 --> 00:19:20,960
So one of the things that you mentioned like tokens,

378
00:19:21,480 --> 00:19:24,839
So we introduced the So we already had like rate

379
00:19:24,880 --> 00:19:29,079
limiting policy that works for requests like you can say,

380
00:19:29,319 --> 00:19:31,880
as I mentioned previously, like five requests per second for example,

381
00:19:32,400 --> 00:19:34,440
and now we need we had to build something for

382
00:19:34,599 --> 00:19:37,119
tokens which is aware of these tokens, which is kind

383
00:19:37,119 --> 00:19:40,720
of the main currency of open the eye, as I mentioned. So, yeah,

384
00:19:40,720 --> 00:19:43,359
we introduced the stoken limit policy. It works pretty similarly

385
00:19:43,440 --> 00:19:46,480
to rate limit policy. You can say that, okay, we

386
00:19:46,599 --> 00:19:49,680
have this application, we have this department, we have this team.

387
00:19:50,440 --> 00:19:55,759
Now we assign let's say that one thousand tokens permitted

388
00:19:55,799 --> 00:19:58,920
to this application to make sure that do not consume more.

389
00:20:00,519 --> 00:20:02,920
And yeah, and that that prot works pretty well. And

390
00:20:03,039 --> 00:20:07,240
if you want to be extra careful, you also want

391
00:20:07,319 --> 00:20:10,519
to you also can configure the policy to estimate the

392
00:20:11,200 --> 00:20:13,759
uh the tokens which are in the prompt So whenever

393
00:20:13,839 --> 00:20:15,680
there is a request coming with a prompt and you

394
00:20:16,039 --> 00:20:19,039
calculate the number of prompts, the number of tokens which

395
00:20:19,079 --> 00:20:21,240
is used in the prompt and then if we on

396
00:20:21,319 --> 00:20:24,160
APM side understand that it already exceeds the limit, we

397
00:20:24,200 --> 00:20:26,160
will not send this to the to the back end.

398
00:20:26,559 --> 00:20:28,440
Speaker 2: Right, so you will consume in the first place, you're

399
00:20:28,440 --> 00:20:31,880
already pressing against the limit. Yes, how do you bubble

400
00:20:32,079 --> 00:20:33,319
up that you've hit a limit?

401
00:20:34,519 --> 00:20:34,559
Speaker 1: Like?

402
00:20:34,720 --> 00:20:36,799
Speaker 2: What does that look like for the customer? What does

403
00:20:36,799 --> 00:20:38,160
it look like for the operator?

404
00:20:38,960 --> 00:20:41,960
Speaker 3: Yeah, so there is a pattern with great limited. For example,

405
00:20:42,559 --> 00:20:45,960
you typically it's four twenty nine returned, retry with retry

406
00:20:46,000 --> 00:20:48,720
after header with a specific like number of seconds.

407
00:20:49,559 --> 00:20:55,000
Speaker 1: That's a message that says sorry yeah version yeah, Canadian

408
00:20:55,079 --> 00:20:55,599
version yeah.

409
00:20:56,119 --> 00:20:58,519
Speaker 3: And that that's what we've built forty for this token

410
00:20:58,599 --> 00:21:00,839
limit policy as well. So whenever the limit has hit

411
00:21:01,039 --> 00:21:05,799
four twenty nine, retry after a specific number of seconds

412
00:21:05,880 --> 00:21:07,519
or minutes, depending on how you can figure it.

413
00:21:07,599 --> 00:21:10,440
Speaker 1: Right, if you're being rate limited. Yeah, I use the

414
00:21:11,480 --> 00:21:14,880
Google YouTube API, and I'm working on a new publisher

415
00:21:15,119 --> 00:21:18,279
and it's going to be publishing to YouTube, just like

416
00:21:18,359 --> 00:21:24,160
we talked about earlier. And it's weird. I work for

417
00:21:24,240 --> 00:21:26,000
a couple hours on this in the morning, and I

418
00:21:26,160 --> 00:21:29,720
make several requests and then I get the you know,

419
00:21:29,960 --> 00:21:32,519
quota exceeded, and I'm going to look at my quota

420
00:21:32,559 --> 00:21:35,079
and it's like ten thousand API calls. I'm like, I

421
00:21:35,160 --> 00:21:39,240
need to make ten thousand API calls. So it's just

422
00:21:39,359 --> 00:21:43,160
an anecdote, but yeah, I'm looking at the response for that,

423
00:21:43,680 --> 00:21:46,599
you know when I try to authenticate myself and it'll

424
00:21:46,640 --> 00:21:50,799
say nope, quote exceeded. Sorry. Yeah.

425
00:21:50,839 --> 00:21:53,759
Speaker 3: And we also trying to make sure that it is

426
00:21:53,880 --> 00:21:58,279
fully transparent for developers because there is a huge ecosystem

427
00:21:58,319 --> 00:22:01,359
of different tools for open the I and other llms

428
00:22:01,440 --> 00:22:04,279
like as open the I, s decay, lung chain, prompt

429
00:22:04,319 --> 00:22:06,720
flow like there are a lot of different tools and

430
00:22:07,200 --> 00:22:11,400
typically typically developers they start with the direct access to

431
00:22:11,480 --> 00:22:14,400
open THEI because as the case, they expect a specific

432
00:22:14,599 --> 00:22:16,359
like ur L and the open the eye side, they

433
00:22:16,400 --> 00:22:19,039
expect the apike and so on. So on our side,

434
00:22:19,079 --> 00:22:22,039
we wanted to make sure that this experience is the

435
00:22:22,119 --> 00:22:24,799
same for developer. So which means that if we put

436
00:22:24,839 --> 00:22:28,279
API m behind or sorry between open EI and the developer,

437
00:22:28,799 --> 00:22:31,440
they will never notice that something changed. So for us,

438
00:22:31,480 --> 00:22:33,519
it was super important to make sure that the developer

439
00:22:33,559 --> 00:22:37,559
experience is still the same. That's why yes, yes, so

440
00:22:37,599 --> 00:22:39,400
that's why we return for twenty nine because that's what

441
00:22:39,559 --> 00:22:41,960
open EI does. We are trying to follow the same

442
00:22:42,000 --> 00:22:46,279
structure to make sure that everything works as it worked before, isn't.

443
00:22:46,279 --> 00:22:49,839
Speaker 1: One of the things that ap I M does is

444
00:22:50,359 --> 00:22:53,319
you can if you have a process for the developer

445
00:22:53,400 --> 00:22:58,880
that includes several API calls, maybe two different services or

446
00:22:58,960 --> 00:23:04,440
different you can make one sort of master API that

447
00:23:04,640 --> 00:23:07,599
then makes calls and proxies out on your behalf to

448
00:23:07,720 --> 00:23:11,039
these other ones and comes with a single result. I've

449
00:23:11,160 --> 00:23:14,799
used that feature of API M. There's just so much stuff,

450
00:23:14,839 --> 00:23:17,119
and when I got into it, there's just so much

451
00:23:17,160 --> 00:23:20,079
stuff in there. We could probably spend two hours just

452
00:23:20,119 --> 00:23:23,000
talking about all the features of API M. But you

453
00:23:23,759 --> 00:23:26,440
mentioned that you put out a Microsoft put out a

454
00:23:26,440 --> 00:23:30,079
white paper about, you know, some of these new features.

455
00:23:30,640 --> 00:23:34,000
I guess can we get a link to that and

456
00:23:35,079 --> 00:23:37,799
what what are some of the other amazing things that

457
00:23:37,880 --> 00:23:40,559
we might not know about that are in that now.

458
00:23:40,640 --> 00:23:42,880
Speaker 3: There's a lot of innovation happening in APAM, so Jenny

459
00:23:42,920 --> 00:23:45,640
I gateway is definitely one of those things that I mentioned.

460
00:23:46,039 --> 00:23:49,079
We're also currently working on the enhancing the for example,

461
00:23:49,160 --> 00:23:51,240
the workspaces feature that we have an APIM to make

462
00:23:51,279 --> 00:23:54,279
sure that each team has its soul in workspace with

463
00:23:54,440 --> 00:23:58,880
isolation like control plan isolation, data play isolation, and so on. Recently,

464
00:23:58,920 --> 00:24:02,400
we also released a couple of new SKUs for APIM

465
00:24:02,480 --> 00:24:05,799
which are way faster to provision, they work better, they

466
00:24:05,880 --> 00:24:09,839
work in a new architecture under the hood. There is

467
00:24:09,920 --> 00:24:13,039
a slightly different price in model. But yeah, that's we

468
00:24:13,920 --> 00:24:17,799
have a lot of stuff going on there. To your

469
00:24:17,880 --> 00:24:22,519
point for the as you mentioned that it's really hard

470
00:24:22,559 --> 00:24:24,480
to understand what's going on in a PM, like a

471
00:24:24,559 --> 00:24:28,119
lot of policies and stuff like that. With Jenny I

472
00:24:28,200 --> 00:24:31,279
gateway that we were discussing, that's also one of the

473
00:24:31,359 --> 00:24:34,400
challenges that we wanted to address, like, Okay, we have

474
00:24:34,519 --> 00:24:38,480
this intelligent application developers. They use JGPT, they know how

475
00:24:38,559 --> 00:24:41,480
to use that, but they're not familiar with APM, and

476
00:24:41,559 --> 00:24:43,720
now we're asking them to write a bunch of aximal

477
00:24:43,759 --> 00:24:46,799
policies to limit to have the token limit, to have

478
00:24:46,920 --> 00:24:49,960
the authorization in place, load balancing in place, like metrics

479
00:24:50,240 --> 00:24:53,559
for token consumption in place, and so on. So we

480
00:24:53,720 --> 00:24:56,559
wanted to address it, and we also kind of we

481
00:24:56,680 --> 00:24:58,799
thought that it would be nice to have an easy

482
00:24:58,839 --> 00:25:02,279
experience of for those developers and apim to import exist

483
00:25:02,400 --> 00:25:05,240
natural open AAPIs. So we now have this kind of

484
00:25:05,480 --> 00:25:07,759
UI portal experience where you can just say, okay, I

485
00:25:08,480 --> 00:25:12,079
was using this open the endpoint, let's configure that one.

486
00:25:12,160 --> 00:25:14,759
And also I want to have token limit off I

487
00:25:14,759 --> 00:25:18,200
don't know, two thousand GPM, and we can configure everything

488
00:25:18,279 --> 00:25:20,799
for them, so they don't really need to care about

489
00:25:20,839 --> 00:25:23,200
the eximal policies. They don't really need to look into those.

490
00:25:23,240 --> 00:25:24,759
Of course, if you need to change something later on

491
00:25:24,960 --> 00:25:26,799
or you need like most of his scated policies, of

492
00:25:26,839 --> 00:25:29,880
course I need to learn something, but at least to

493
00:25:30,000 --> 00:25:34,440
getting started experiences is like super opimal.

494
00:25:34,319 --> 00:25:38,160
Speaker 2: Well Microsoft, Yeah, I like the copilot ASPD here of Also,

495
00:25:38,599 --> 00:25:40,400
I know I wrote this a month ago, but I

496
00:25:40,440 --> 00:25:42,880
don't know what it says anymore. Like PARTSES for me,

497
00:25:43,480 --> 00:25:46,319
like again with my admin head on, it's like often

498
00:25:46,359 --> 00:25:48,400
I have a service level agreement I'm making with certain

499
00:25:48,440 --> 00:25:52,000
customers that's written in legal ease and I'm trying to

500
00:25:52,079 --> 00:25:55,079
translate it into haven't helped Me XML, But the idea

501
00:25:55,119 --> 00:25:58,400
that I have an intermediary tool that would then take

502
00:25:58,440 --> 00:26:00,200
them at legally to try and make the XML for me,

503
00:26:00,240 --> 00:26:01,880
and then after it's done. I could ask for it

504
00:26:02,039 --> 00:26:03,960
back and say, like, how close have I gotten here?

505
00:26:04,519 --> 00:26:06,960
I actually hit the rules that we've agreed to in

506
00:26:07,039 --> 00:26:10,920
the SLA. That translation that layer has always been a

507
00:26:11,000 --> 00:26:14,240
challenging part of it? Has this always been about the money?

508
00:26:15,279 --> 00:26:18,000
Like that's the main thing that's happening here is you

509
00:26:18,079 --> 00:26:19,920
don't want to run it, you know, I presume you'll

510
00:26:19,920 --> 00:26:22,400
always sell us more cloud, you know by the transaction.

511
00:26:22,720 --> 00:26:25,680
If you just keep requesting calls, that's fine. It's just

512
00:26:26,160 --> 00:26:27,720
then one day you're going to have to pay for

513
00:26:27,839 --> 00:26:31,279
it and it's not what you intended. So is that

514
00:26:31,359 --> 00:26:33,319
the important part in API management? Like, I'm not worried

515
00:26:33,319 --> 00:26:34,799
about tipping over the cloud, am I?

516
00:26:36,079 --> 00:26:38,400
Speaker 3: Well, I guess it depends on your shower, of course.

517
00:26:38,440 --> 00:26:40,359
But yeah, that's one of the one of the things

518
00:26:40,359 --> 00:26:43,079
that you can put into APIM, Like whatever control you need,

519
00:26:44,079 --> 00:26:46,839
you can you can build it with the kind of

520
00:26:46,920 --> 00:26:49,920
a pretty powerful police engine that we have in APM.

521
00:26:50,839 --> 00:26:51,279
Speaker 1: That's cool.

522
00:26:51,559 --> 00:26:54,680
Speaker 2: I appreciate that, And gentlemen, I needed to take a

523
00:26:54,720 --> 00:26:59,200
break for one moment for these very important messages, and

524
00:26:59,440 --> 00:27:01,720
we're back. It's don at Rock's I'mateurd Campbell, that's Carl

525
00:27:01,759 --> 00:27:04,640
Franklin yoh Yo Yo talking to our friend Andre a

526
00:27:04,680 --> 00:27:08,440
bit about these improvements to API M which we all

527
00:27:08,480 --> 00:27:10,759
should be using. If we're gonna expose an API through

528
00:27:10,839 --> 00:27:14,079
the cloud to the world, don't leave it naked, give

529
00:27:14,119 --> 00:27:17,759
it some armor, and this tool helps. These gen AI

530
00:27:17,920 --> 00:27:20,599
tools help us to configure it correctly, operate it well,

531
00:27:21,000 --> 00:27:23,920
but then also deal with the additional complexities when it

532
00:27:23,960 --> 00:27:28,000
comes to the as you open AI, APIs with limit

533
00:27:28,519 --> 00:27:34,319
issuing tokens for software to utilize open ai and put

534
00:27:34,400 --> 00:27:36,079
limits in place for all of those good things.

535
00:27:36,480 --> 00:27:37,799
Speaker 1: Have I summarized that correctly?

536
00:27:37,839 --> 00:27:38,119
Speaker 2: Andre?

537
00:27:38,559 --> 00:27:40,720
Speaker 1: Yeah? I think so. Yeah, I think I'm starting to

538
00:27:40,799 --> 00:27:43,920
understand what you doing here. Man. I'm pretty excited. Richard

539
00:27:44,079 --> 00:27:45,079
is the human AI.

540
00:27:46,119 --> 00:27:51,440
Speaker 2: I don't know that's true. It's yeah, real, definitely created.

541
00:27:51,519 --> 00:27:53,799
Like you said, a very important phrase is sticking with

542
00:27:53,920 --> 00:27:55,400
me now, which is tokens or currency?

543
00:27:55,759 --> 00:27:56,440
Speaker 1: Yeah? Absolutely.

544
00:27:56,480 --> 00:27:58,680
Speaker 3: You can think about it as your main currency, your

545
00:27:58,720 --> 00:28:02,480
main resource you have with all of these models, and

546
00:28:02,599 --> 00:28:04,480
that's also what you're paying for, and.

547
00:28:04,519 --> 00:28:07,319
Speaker 1: It's what what you pay that's what you pay for exactly.

548
00:28:07,440 --> 00:28:09,440
Speaker 2: And so of course it's a currency because it does

549
00:28:09,559 --> 00:28:13,640
ultimately translate into FIA currency. Of whatever form you're using,

550
00:28:13,759 --> 00:28:15,279
you're going to you're going to pay for that stuff,

551
00:28:15,960 --> 00:28:17,880
and then you get it that pays your models and

552
00:28:17,960 --> 00:28:19,799
all you have all that choices when you have these

553
00:28:19,799 --> 00:28:21,960
controls over top of Can we talk a little about

554
00:28:21,960 --> 00:28:25,680
the semantic casing policies. That sounds like a way to

555
00:28:25,880 --> 00:28:29,519
save money and potentially improve performance. That's interesting.

556
00:28:30,079 --> 00:28:32,960
Speaker 3: Yes, yeah, that's that's actually very interest simple see and

557
00:28:33,000 --> 00:28:36,240
every interesting implementation from all side. So yeah, as as

558
00:28:36,279 --> 00:28:40,680
you mentioned, so first of all, we solve the latency

559
00:28:40,720 --> 00:28:44,759
problem just with regular cushion that already exists in APAM

560
00:28:44,839 --> 00:28:47,160
for a while, you can cash request, you can cush

561
00:28:47,200 --> 00:28:50,839
responses for specific requests, but with with all items is

562
00:28:50,839 --> 00:28:53,640
a little bit different because your prompts can be different,

563
00:28:53,720 --> 00:28:56,519
but they're semantically similar, right, That's what we do with

564
00:28:56,599 --> 00:28:59,559
semantic cash, And so there is a an open opening.

565
00:28:59,559 --> 00:29:04,240
I provide and embedding models. Embedding model which generates vectors

566
00:29:04,319 --> 00:29:06,559
which represent the kind of you can think about it

567
00:29:06,599 --> 00:29:08,960
as a kind of semantic minion of a specific prompt

568
00:29:09,039 --> 00:29:13,519
war specific like stream, and then we generated for a

569
00:29:13,559 --> 00:29:15,839
specific prompt and then if we realize that there is

570
00:29:15,880 --> 00:29:19,400
a semantically similar prompt coming in, we will check the

571
00:29:19,480 --> 00:29:21,720
cash and we will retrieve the response from the cash

572
00:29:21,759 --> 00:29:23,880
instead of hitting the open the endpoints. So first of all,

573
00:29:23,920 --> 00:29:26,079
as I mentioned, were solving the latency problems or the

574
00:29:26,680 --> 00:29:29,400
response is getting to the client faster, but we all

575
00:29:29,480 --> 00:29:32,640
sort of saving on the token consumption because this prompt

576
00:29:32,759 --> 00:29:36,799
will never go to help on the A endpoint while

577
00:29:36,880 --> 00:29:40,240
we have the response cached. In our case, we're using

578
00:29:40,279 --> 00:29:43,599
reddis for vector search, so that's where story is responses.

579
00:29:43,680 --> 00:29:46,799
So yeah, if you're saying hi or saying hello afterwards,

580
00:29:47,079 --> 00:29:49,599
they're semantically similar where we just returned.

581
00:29:50,200 --> 00:29:54,279
Speaker 2: I immediately go to a scenario like imagine an incident

582
00:29:54,319 --> 00:29:56,759
that's happened that has caused a lot of flights to

583
00:29:56,839 --> 00:29:57,920
be canceled.

584
00:29:58,039 --> 00:30:00,960
Speaker 1: That would never happen, Richard, Come on, you a real example.

585
00:30:01,279 --> 00:30:04,839
Speaker 2: Folks are trying to find out if their flights canceled,

586
00:30:05,000 --> 00:30:07,519
So you're going to get many requests from different sources

587
00:30:07,559 --> 00:30:09,960
that are essentially the same thing. Is this flight canceled?

588
00:30:10,319 --> 00:30:12,119
You really only need to want to fetch that once.

589
00:30:12,240 --> 00:30:14,799
Now it's sitting in the cash, and you very quickly respond, yes,

590
00:30:15,160 --> 00:30:16,480
all flights are canceled.

591
00:30:17,599 --> 00:30:18,400
Speaker 1: But you know.

592
00:30:19,880 --> 00:30:21,400
Speaker 2: What I like about a cashing model like that is

593
00:30:21,440 --> 00:30:24,480
that it will evolve over time, you know, you imagine

594
00:30:24,519 --> 00:30:27,920
other scenarios whence those flights are gone, there's other flights

595
00:30:28,079 --> 00:30:30,519
like but you're often only going to need to make

596
00:30:30,599 --> 00:30:33,559
that actual request back to the engine once and use

597
00:30:33,599 --> 00:30:37,000
it over and over again. So a good caching opportunity

598
00:30:37,000 --> 00:30:39,519
when you're going to have multiple people more or less

599
00:30:40,000 --> 00:30:43,000
making the same requests but in many different ways of phrasing.

600
00:30:43,160 --> 00:30:45,079
Speaker 1: And also a way to bust the cash once the

601
00:30:45,119 --> 00:30:47,039
flights are back to normal.

602
00:30:46,839 --> 00:30:49,759
Speaker 2: Yeah, rather than do code it yourself where you have

603
00:30:49,839 --> 00:30:53,160
to it's cashing is not hard. Expiring is hard, yeah,

604
00:30:54,519 --> 00:30:56,160
inspiring's always hard.

605
00:30:56,279 --> 00:31:01,960
Speaker 1: So wait a minute, what why is it? Oh? How

606
00:31:02,000 --> 00:31:02,480
many times?

607
00:31:03,119 --> 00:31:05,880
Speaker 2: Although maybe and again i'm reading here this is an

608
00:31:05,920 --> 00:31:07,759
early version. This is your first sort of go with this.

609
00:31:08,079 --> 00:31:10,880
Speaker 3: Yes, yeah, yeah, yeah, well that's that's the nearly preview

610
00:31:11,000 --> 00:31:14,519
version for now. We're still like so there there are

611
00:31:14,519 --> 00:31:16,640
a lot of customer use cases for that. So as

612
00:31:16,720 --> 00:31:19,799
you mentioned, uh that that was a good example. Uh,

613
00:31:20,960 --> 00:31:25,680
but then we also have so basically like whenever whenever

614
00:31:25,960 --> 00:31:28,440
the company builds some sort of a chat service for

615
00:31:28,799 --> 00:31:32,920
answering questions, then you always have frequently asked questions.

616
00:31:33,279 --> 00:31:34,759
Speaker 2: And that's where you're Hey, you're going to build a

617
00:31:34,799 --> 00:31:38,640
factable inevitably, but rather than you define it, let utilization

618
00:31:38,880 --> 00:31:40,640
define it with a cash exactly.

619
00:31:40,920 --> 00:31:41,160
Speaker 1: Yeah.

620
00:31:41,759 --> 00:31:43,880
Speaker 3: Yeah, and that's where you you have a lot of

621
00:31:44,839 --> 00:31:47,559
token saved just with the semantic cash and policy.

622
00:31:47,640 --> 00:31:47,839
Speaker 1: Yeah.

623
00:31:47,880 --> 00:31:51,039
Speaker 3: Also also for internal knowledge base, that's also important. Like

624
00:31:52,000 --> 00:31:54,759
we have a bunch of for example, support engineers sitting

625
00:31:54,759 --> 00:31:59,680
in this in the call center and sometimes problems are similar. Yeah,

626
00:31:59,799 --> 00:32:01,680
it's and you're just doing the search through the Chad

627
00:32:01,720 --> 00:32:04,799
jubt and yeah, your your responsors are turning from cash

628
00:32:04,880 --> 00:32:07,480
and you're not hitting the opening endpoint.

629
00:32:07,759 --> 00:32:07,960
Speaker 1: Yeah.

630
00:32:08,279 --> 00:32:13,000
Speaker 2: I was recently reading about folks that aren't securing these

631
00:32:13,160 --> 00:32:16,759
kinds of services properly, and people discover them and just

632
00:32:17,000 --> 00:32:20,640
use them as their free version of chat ept, basically

633
00:32:20,720 --> 00:32:24,359
leaving that that vendor holding the bag for the token costs.

634
00:32:25,319 --> 00:32:28,079
Speaker 1: It's a great idea, Richard. Yeah, nice, glad. I never

635
00:32:28,640 --> 00:32:33,839
I can't believe in everything, but this is what I'm thinking.

636
00:32:33,880 --> 00:32:36,160
Speaker 2: It's like, I'm not even talking about the you know,

637
00:32:36,960 --> 00:32:40,319
the proper utilizations and run away API calls and so far,

638
00:32:40,400 --> 00:32:44,079
but genuine nefarious use that somebody's like, oh, look, you've

639
00:32:44,119 --> 00:32:46,839
exposed chat to me and I can use it for anything,

640
00:32:47,319 --> 00:32:49,039
So I'm not even gonna worry about your product. I'm

641
00:32:49,039 --> 00:32:52,319
just going to exploit your token availability to run the

642
00:32:52,400 --> 00:32:55,519
queries I want to run, and you know you get

643
00:32:55,559 --> 00:32:56,839
to eat it. Congratulations.

644
00:32:57,519 --> 00:32:59,720
Speaker 3: Yeah, that's that's why. First of all, it's important to

645
00:32:59,799 --> 00:33:02,759
have something like APIM where you have API keys which

646
00:33:02,799 --> 00:33:07,359
are on APM side represents specific color or application. But

647
00:33:07,559 --> 00:33:11,640
also there are certain tools in a measure OPENINGI itself

648
00:33:11,839 --> 00:33:14,960
where you can say that there's a specific filter on

649
00:33:15,039 --> 00:33:17,319
the content that this model is supposed to respond to,

650
00:33:17,839 --> 00:33:20,160
for example, if you're asking it, if you're training it.

651
00:33:20,480 --> 00:33:24,160
In our case, we're trained to respond about APIM policies

652
00:33:24,599 --> 00:33:27,799
if someone asks about the weather right now or something else,

653
00:33:27,960 --> 00:33:31,720
or summarizing a document which is which doesn't have anything

654
00:33:31,759 --> 00:33:33,599
to do with APIM, and we will just respond sorry,

655
00:33:33,640 --> 00:33:35,559
I cannot do that. I'm not trained to do that.

656
00:33:37,279 --> 00:33:38,960
Speaker 1: My job. Go find your own chatbot.

657
00:33:39,079 --> 00:33:43,400
Speaker 2: Yeah, and that documents particularly evil because that needs a

658
00:33:43,400 --> 00:33:45,519
lot of tokens. When you shove a document up to

659
00:33:45,640 --> 00:33:49,119
summarizes formul like absolutely as a token intensive and an

660
00:33:49,200 --> 00:33:53,880
easy mistake to make if you haven't boxed that interface properly.

661
00:33:54,400 --> 00:33:57,240
Speaker 1: Talking about some of the some more the new awesome features.

662
00:33:57,680 --> 00:33:59,559
Is there anything that we haven't talked about yet that

663
00:33:59,680 --> 00:34:02,759
customers have asked for that you've implemented in this next version.

664
00:34:03,039 --> 00:34:05,880
Speaker 3: Yeah, there is an interesting I wouldn't say that's specific feature,

665
00:34:05,960 --> 00:34:07,880
but that's kind of a challenge that we saw in

666
00:34:08,039 --> 00:34:12,840
a PM. So we supported Service cent Events technology for

667
00:34:12,920 --> 00:34:15,679
a while in APM, but we had some certain problems

668
00:34:15,719 --> 00:34:18,320
with that because that's essentially streaming. So when you when

669
00:34:18,360 --> 00:34:21,679
you send the request to judge a BT, typically what

670
00:34:21,840 --> 00:34:24,440
you will see and experience that you're used to most

671
00:34:24,559 --> 00:34:27,599
likely is that it will be it will be responding

672
00:34:27,679 --> 00:34:30,039
in chunk of text. It's not just it's not sending,

673
00:34:30,159 --> 00:34:33,320
like you, the full response, it's just responding it in

674
00:34:33,400 --> 00:34:37,960
streaming fashion. And it turns out the customers want to

675
00:34:38,039 --> 00:34:42,199
use streaming because that's what users are used to. They

676
00:34:42,719 --> 00:34:45,639
want to see the same experience in their chat experiences

677
00:34:45,679 --> 00:34:47,639
as well, like in their propilots and so on whatever

678
00:34:47,639 --> 00:34:52,119
applications they build. But there is a certain problem with

679
00:34:52,239 --> 00:34:56,079
that because whenever you introduce some sort of buffering, then

680
00:34:56,159 --> 00:34:59,800
the streaming experience breaks, which which is the case for

681
00:34:59,840 --> 00:35:02,079
you PM right now. Because whenever you have a log

682
00:35:02,159 --> 00:35:04,880
in policy or a monitoring policy, or you have a retripolicy,

683
00:35:04,960 --> 00:35:07,360
so whenever you do a buffer and a response or request,

684
00:35:08,000 --> 00:35:11,079
the streaming breaks. So we had certain challenges to make

685
00:35:11,119 --> 00:35:14,199
sure that talking limit and the talking metric policies they

686
00:35:14,320 --> 00:35:18,559
work with streaming scenarios as well. So that's kind of challenging,

687
00:35:18,719 --> 00:35:20,239
I would say, and that's kind of one of the

688
00:35:20,639 --> 00:35:23,960
things that customers requests to add support for.

689
00:35:24,599 --> 00:35:26,480
Speaker 2: Yeah, for sure, there's more features still to come down

690
00:35:26,559 --> 00:35:28,920
the pipe, you know, like there's a lot we could

691
00:35:29,000 --> 00:35:29,800
be doing in here.

692
00:35:30,519 --> 00:35:31,840
Speaker 1: Yeah, over time.

693
00:35:32,000 --> 00:35:34,599
Speaker 2: It's although honestly, when we started this conversation, like I

694
00:35:34,639 --> 00:35:37,199
think you guys already done too many things Like sting

695
00:35:37,239 --> 00:35:40,280
is all as out is challenging, and I know there's

696
00:35:40,280 --> 00:35:41,320
still more that could be done.

697
00:35:42,039 --> 00:35:43,920
Speaker 3: No, there are certainly a lot of scenarios like as

698
00:35:44,159 --> 00:35:46,039
as you mentioned, one of these scenarios is kind of

699
00:35:46,119 --> 00:35:48,280
content safety. Just to make sure that we do not

700
00:35:48,400 --> 00:35:52,119
respond on specific I don't know, if there is a

701
00:35:52,199 --> 00:35:55,199
specific question and a prompt, we should not respond to

702
00:35:55,280 --> 00:35:56,440
this prompt.

703
00:35:56,719 --> 00:36:00,519
Speaker 2: Which doesn't sound like an API responsibility. You are at

704
00:36:00,519 --> 00:36:04,519
the gateway point where doing content filtering. This is a

705
00:36:04,559 --> 00:36:09,079
logical opportunity to hit that. Yeah, that's definitely a different area. Yeah,

706
00:36:09,119 --> 00:36:10,920
and actually that's something that you can do today. Like

707
00:36:11,079 --> 00:36:13,679
we get access to the request, you can look at

708
00:36:13,679 --> 00:36:15,480
the headers, you can look at the body, and then

709
00:36:15,519 --> 00:36:17,719
you can write whatever regular expression you want to deny

710
00:36:17,760 --> 00:36:23,239
the request. But that's to my point that policies are hard,

711
00:36:23,360 --> 00:36:25,199
especially for those who are not used to if I am.

712
00:36:25,599 --> 00:36:28,239
We just want to make sure that it's easy, easy

713
00:36:28,320 --> 00:36:32,440
to use, and easy to configure. So yeah, that's something

714
00:36:32,519 --> 00:36:37,800
that we're looking at. Adding like content safety concerns are real,

715
00:36:39,400 --> 00:36:42,079
there might be like PII data, there might be some

716
00:36:42,559 --> 00:36:45,239
confidential data in the request or response. You want to

717
00:36:45,280 --> 00:36:46,079
filter this out.

718
00:36:46,639 --> 00:36:50,199
Speaker 3: And Gateway seems like a natural place to do this

719
00:36:50,320 --> 00:36:52,280
kind of stuff because that's the kind of single point

720
00:36:52,280 --> 00:36:53,840
where you see all the requests and responses.

721
00:36:53,880 --> 00:36:56,239
Speaker 2: Because see asuary AI studio has a whole mechanism for

722
00:36:56,320 --> 00:36:58,639
content controls and so forth, you kind of want to

723
00:36:58,679 --> 00:37:01,480
pick the policies you've built were there and then push

724
00:37:01,599 --> 00:37:04,079
them in a hook to the API side.

725
00:37:04,119 --> 00:37:06,679
Speaker 1: It's say, here's our saying, I only want to write

726
00:37:06,679 --> 00:37:07,119
one set.

727
00:37:07,000 --> 00:37:08,800
Speaker 2: Of policies, but I want to be able to catch

728
00:37:08,840 --> 00:37:10,679
them into different places where it would matter.

729
00:37:10,920 --> 00:37:12,599
Speaker 3: Yeah, there is also a big piece of kind of

730
00:37:12,599 --> 00:37:15,639
a governance and kind of best practices within an organization.

731
00:37:15,719 --> 00:37:17,920
For example, you can have multiple model deployments and they

732
00:37:18,000 --> 00:37:23,440
have different content safety configurations. With APIM, you're just having

733
00:37:23,519 --> 00:37:26,960
kind of this platform engineering side of GENNAI. Let's say

734
00:37:27,639 --> 00:37:29,599
where you can say that, oh, these are our rules

735
00:37:29,639 --> 00:37:31,719
and all of the models that are deployed they should

736
00:37:31,719 --> 00:37:34,719
be behind APIM. And then in APIM you can figure

737
00:37:34,760 --> 00:37:37,920
all of the rules that you have in your organization

738
00:37:38,039 --> 00:37:42,480
to comply with the basically policies whatever you have an organization.

739
00:37:42,639 --> 00:37:46,880
So in that case, you're basically shifting the control to

740
00:37:47,000 --> 00:37:49,800
APIM instead of configuring stuff on the models level.

741
00:37:50,000 --> 00:37:52,400
Speaker 2: Yeah no, And you could see that associated with particular

742
00:37:52,440 --> 00:37:55,119
authentication accounts too. So it's like, hey, I provide a

743
00:37:55,239 --> 00:37:58,880
service for medical and so some pictures are going to

744
00:37:58,920 --> 00:38:00,840
be the kind that you wouldn't know normally want to

745
00:38:00,960 --> 00:38:04,199
show anywhere. But that's the business here, so it needs

746
00:38:04,199 --> 00:38:05,000
a different rule set.

747
00:38:06,960 --> 00:38:07,159
Speaker 1: Yeah.

748
00:38:07,360 --> 00:38:10,679
Speaker 2: Interesting, interesting array of problems here, Like you guys are

749
00:38:10,719 --> 00:38:11,199
up against it.

750
00:38:11,239 --> 00:38:11,840
Speaker 1: I appreciate this.

751
00:38:12,599 --> 00:38:18,159
Speaker 2: Uh, you've got an AI gateway samples on GitHub. Should

752
00:38:18,159 --> 00:38:19,920
I include a link to that? That looks pretty cool

753
00:38:20,519 --> 00:38:21,320
and super current.

754
00:38:21,559 --> 00:38:24,920
Speaker 3: Yeah, yeah, that's that's an amazing repole that was built

755
00:38:24,960 --> 00:38:27,960
one of the by one of the gbb's that we

756
00:38:28,079 --> 00:38:35,000
work with. So that's basically a set of labs that

757
00:38:35,159 --> 00:38:40,480
you can try with with API M. So typically probably

758
00:38:40,599 --> 00:38:44,679
know that the typical like space for AI engineer is

759
00:38:44,800 --> 00:38:48,960
a Python notebook. Yeah, and that's something that we wanted

760
00:38:49,039 --> 00:38:52,079
to implement in those labs. So there's a bunch of

761
00:38:52,440 --> 00:38:55,360
there's a bunch of Python notebooks, and then there is

762
00:38:55,440 --> 00:38:58,280
a code. Usually there is a code that is calling

763
00:38:58,440 --> 00:39:00,519
open the E through a p I M with the

764
00:39:01,000 --> 00:39:03,440
Azure opening I is decate, so it's pretty natural for

765
00:39:03,599 --> 00:39:07,159
you engineers. And then we demonstrate kind of a different

766
00:39:07,280 --> 00:39:10,400
token limits policy emy token metric policy. Then d a

767
00:39:10,400 --> 00:39:14,360
lot of additional stuff like low balance in and sending

768
00:39:14,519 --> 00:39:19,400
the augmenting the response with the RAC pattern and so on.

769
00:39:19,920 --> 00:39:24,960
Speaker 2: Yeah, so it'll seem familiar pretty quickly, dude. Yeah, you

770
00:39:25,039 --> 00:39:27,159
know there's different people coming in from different angles. Right,

771
00:39:27,280 --> 00:39:30,360
You've got your service builder on the back end, once

772
00:39:30,480 --> 00:39:32,639
controls and throttles and logging and that kind of thing.

773
00:39:33,119 --> 00:39:37,880
You've got your ll M folks who you know, want

774
00:39:37,920 --> 00:39:41,880
to automate the flow and control of tokens. You've got

775
00:39:42,039 --> 00:39:45,280
administrators trying to keep things up and make sure buildings

776
00:39:45,320 --> 00:39:49,159
go into right places. Anybody involved in cost control, which

777
00:39:49,239 --> 00:39:52,119
is lots of folks. Like my experience talking developers when

778
00:39:52,159 --> 00:39:53,880
they're starting to experiment in ll MS is they want

779
00:39:53,880 --> 00:39:56,559
the ladies and greatest of everything. But the price, you know,

780
00:39:56,719 --> 00:39:58,960
may the technology may or may not be needed, and

781
00:39:59,079 --> 00:40:01,760
the price tag is huge for the latest versions compared

782
00:40:01,800 --> 00:40:04,840
to Hey, would this have worked with GPT three point

783
00:40:04,920 --> 00:40:09,679
five lass like, because it's a tenth the price, Like,

784
00:40:10,079 --> 00:40:14,119
it makes a difference. Yeah, if you don't concern about

785
00:40:14,119 --> 00:40:16,000
any of that, you just don't. Nope, give me four zero,

786
00:40:16,079 --> 00:40:16,679
I want it all.

787
00:40:17,000 --> 00:40:17,159
Speaker 1: Yeah.

788
00:40:17,239 --> 00:40:19,719
Speaker 3: What's interesting with the alms, that's actually the opposite. Usually

789
00:40:20,440 --> 00:40:23,559
usually like four always cheaper there than four or than

790
00:40:23,719 --> 00:40:27,679
three five. Oh really Yeah, that's because they're more kind

791
00:40:27,679 --> 00:40:30,920
of optimized, so they say that they consume more less resources,

792
00:40:30,960 --> 00:40:31,960
so they're more optimized.

793
00:40:32,039 --> 00:40:33,400
Speaker 2: That's why it's it's cheaper.

794
00:40:33,760 --> 00:40:34,079
Speaker 1: Interesting.

795
00:40:34,719 --> 00:40:37,320
Speaker 3: So yeah, but that's that's actually a good point that

796
00:40:37,599 --> 00:40:41,639
we we basically distinguish we have internally we think about

797
00:40:41,719 --> 00:40:43,920
two personas. We have a I engineer who's kind of

798
00:40:43,920 --> 00:40:46,719
building the application, who's using all of the s DKs

799
00:40:46,760 --> 00:40:49,039
they want latest and greatest, and then we have a

800
00:40:49,119 --> 00:40:51,320
I platform engineer who is kind of providing access to

801
00:40:51,360 --> 00:40:54,599
those models and he here she cares about the token

802
00:40:54,679 --> 00:40:57,760
consumption like cross charge and low balance and all this

803
00:40:57,920 --> 00:41:01,960
kind of stuff. And I engineer they also they always

804
00:41:02,239 --> 00:41:04,400
want something new, and that's also kind of one of

805
00:41:04,400 --> 00:41:07,639
the challenges for us because the space is evolve when

806
00:41:07,800 --> 00:41:10,360
like super fuss, like we are just trying to keep

807
00:41:10,440 --> 00:41:13,119
up with with different models. For example, for all I

808
00:41:13,320 --> 00:41:16,000
was recently announced. We're just working on adding the support

809
00:41:16,119 --> 00:41:20,440
for this model right now because it's multimodel. It supports images, audio,

810
00:41:20,599 --> 00:41:23,039
not on the text like for GBT four or in

811
00:41:23,119 --> 00:41:26,639
JBT three five. But then we're working on this right now,

812
00:41:26,719 --> 00:41:30,920
and recently they announced GPT four All Mini. So it's

813
00:41:31,360 --> 00:41:33,280
it's really like it's really hard to keep up with

814
00:41:33,400 --> 00:41:35,400
the with the industry and like a lot of open

815
00:41:35,440 --> 00:41:40,360
source projects building the gateways, building the capabilities to document

816
00:41:40,480 --> 00:41:44,599
the lllms. So yeah, it's it's a fascinating place.

817
00:41:45,159 --> 00:41:49,760
Speaker 2: Yeah, job security for you, it sounds like just trying

818
00:41:49,800 --> 00:41:51,440
to keep up, right, But I think that's part of

819
00:41:51,480 --> 00:41:54,280
the strength for the customers using this is to go, oh,

820
00:41:54,400 --> 00:41:57,480
new model arrived, Okay, well it's in APIM so we're okay,

821
00:41:57,599 --> 00:42:01,000
we can add that connection that there. But certainly as

822
00:42:01,039 --> 00:42:03,480
they switch over to multimodel, like I suspect your inputs

823
00:42:03,519 --> 00:42:07,039
are different. It's not just a blob of text going

824
00:42:07,159 --> 00:42:08,320
and work could be almost anything.

825
00:42:08,519 --> 00:42:11,559
Speaker 3: Yeah, that's actually also an interesting problem because like sometimes

826
00:42:11,639 --> 00:42:14,559
people think that, oh, okay, I have this GPT four

827
00:42:15,039 --> 00:42:17,400
and what if GPT four is not available, I will

828
00:42:17,480 --> 00:42:19,920
go to I don't know, some different model for example

829
00:42:20,039 --> 00:42:26,119
mistroll large. But in reality you have the engineers will

830
00:42:26,159 --> 00:42:28,239
work a lot on the prompts and make sure that

831
00:42:28,400 --> 00:42:31,920
these prompts work with a specific model, right, And typically

832
00:42:31,920 --> 00:42:34,119
if you switch to the underland model, most likely the

833
00:42:34,199 --> 00:42:37,840
result will be not that not what you expected, right, right? Oh,

834
00:42:37,960 --> 00:42:40,679
it's important to test it against multiple models.

835
00:42:41,199 --> 00:42:44,679
Speaker 2: Well yeah, I said, this is such a moving space. Heck,

836
00:42:44,800 --> 00:42:47,079
let's face it, you can fire the same prompt at

837
00:42:47,079 --> 00:42:48,559
the same model several.

838
00:42:48,360 --> 00:42:49,559
Speaker 1: Times and get the results.

839
00:42:49,639 --> 00:42:53,039
Speaker 2: Yeah, right, Absolutely, We're not living in a land at

840
00:42:53,079 --> 00:42:54,360
consistency right now.

841
00:42:55,039 --> 00:42:57,960
Speaker 1: Brian McKay and I used to do this show called

842
00:42:58,000 --> 00:43:01,599
the ai Bot Show, and it was a YouTube thing

843
00:43:01,679 --> 00:43:04,360
and he would have something to show and he would

844
00:43:04,400 --> 00:43:07,800
be practicing it the night before we recorded, and we recorded,

845
00:43:08,079 --> 00:43:12,159
the prompts would be completely different, probably because it learned

846
00:43:12,679 --> 00:43:17,519
overnight modified. Yeah, something that he could jail break today

847
00:43:18,119 --> 00:43:20,639
tomorrow is impossible. Crazy.

848
00:43:20,840 --> 00:43:24,039
Speaker 3: Yeah, that's why prompt engineering is like it's super important.

849
00:43:24,280 --> 00:43:27,599
Like whatever you build, prompt engineering is always going to

850
00:43:27,639 --> 00:43:30,679
be very important. And the thing is that, like whenever

851
00:43:30,679 --> 00:43:33,360
you're tested, it's not deterministic, Like you cannot say that, Okay,

852
00:43:33,440 --> 00:43:35,719
it works as you mentioned, it works today, but it

853
00:43:35,840 --> 00:43:37,159
might not work work tomorrow.

854
00:43:37,320 --> 00:43:40,039
Speaker 1: And as developers, that really messes with our head because

855
00:43:40,079 --> 00:43:42,320
we're used to absolute results. Yeah.

856
00:43:42,719 --> 00:43:45,960
Speaker 2: I think we're also used to building on an existing

857
00:43:46,159 --> 00:43:49,239
data set, where so far they're pretty much tearing down

858
00:43:49,280 --> 00:43:51,760
his models and rebuilding them over and over and over again.

859
00:43:51,880 --> 00:43:56,000
So you can't expect that what worked before works again.

860
00:43:56,239 --> 00:44:00,840
That's just not the thing, because we don't revised models.

861
00:44:00,840 --> 00:44:05,079
We replace models for better or worse. I was describing

862
00:44:05,239 --> 00:44:09,440
paredolia on a walk this weekend that paradolia is the

863
00:44:09,599 --> 00:44:13,000
tendency for humans to see faces in things. Right, You

864
00:44:13,119 --> 00:44:15,199
look at a bowling ball and it's like, that's got

865
00:44:15,239 --> 00:44:16,519
a face, or the front of a car, it's got

866
00:44:16,599 --> 00:44:19,199
a face, right, And how that's an evolved trade of

867
00:44:19,280 --> 00:44:22,880
humans because if you detected the face first in the trees,

868
00:44:23,320 --> 00:44:25,440
you were the one running before the other people were running,

869
00:44:25,440 --> 00:44:29,400
so you probably lived. And the downside of courses, when

870
00:44:29,440 --> 00:44:31,760
you see faces that aren't there is almost is very low.

871
00:44:31,800 --> 00:44:34,239
It's not a big deal. Right, So we're talking about

872
00:44:34,280 --> 00:44:36,599
model bility. I'm like, so, imagine I'd take a shotgun

873
00:44:37,000 --> 00:44:39,000
and I shoot at a target and then I say,

874
00:44:39,239 --> 00:44:42,679
do you like this face right now? If you say no,

875
00:44:42,920 --> 00:44:45,079
I want a better face, I don't take the same target.

876
00:44:45,239 --> 00:44:47,079
I did a new target and I shoot it again.

877
00:44:48,360 --> 00:44:51,360
And that's you know, the nature of constantly rebuilding models

878
00:44:51,480 --> 00:44:54,239
is that you typically don't get the same results again.

879
00:44:54,360 --> 00:44:54,559
Speaker 1: Yeah.

880
00:44:54,719 --> 00:44:56,239
Speaker 2: I'm sorry that was a very long winded way of

881
00:44:56,280 --> 00:44:57,960
going about that. But I like saying paradolia.

882
00:44:58,000 --> 00:45:00,239
Speaker 1: But I love that you, you know, introduced that word

883
00:45:00,320 --> 00:45:06,000
that I've already forgotten something. But for me, that's like

884
00:45:06,119 --> 00:45:09,119
staring up the clouds. You know, our block ink blot

885
00:45:09,159 --> 00:45:10,400
tests or shack tests.

886
00:45:10,519 --> 00:45:14,440
Speaker 2: Yeah, yeah, humans see things that aren't there because it

887
00:45:15,400 --> 00:45:17,480
used to be useful at least Now I don't know,

888
00:45:17,960 --> 00:45:21,519
it's creating its own set of complexities. In a question, well,

889
00:45:21,639 --> 00:45:24,159
how many versions out are you planned? Andre, Like, I

890
00:45:24,199 --> 00:45:27,880
could see lots of demand from different folks for various features.

891
00:45:28,159 --> 00:45:32,519
We talked about the whole content management thing. But you

892
00:45:32,599 --> 00:45:34,159
know what comes next for you?

893
00:45:35,400 --> 00:45:38,840
Speaker 3: Yeah, So we started with as open AI being kind

894
00:45:38,880 --> 00:45:41,000
of the one that is easy to use an Azure

895
00:45:41,119 --> 00:45:45,039
and have the most popular one, but we also want

896
00:45:45,079 --> 00:45:47,440
to extend to other models because there is definitely demand

897
00:45:47,639 --> 00:45:51,719
to use other models like Lama, Mistrol here, hug and

898
00:45:51,800 --> 00:45:54,760
Face and others. So we're looking at how to expand

899
00:45:55,119 --> 00:45:57,880
our genera k to akpabilities to support more models, to

900
00:45:57,960 --> 00:46:00,119
make sure that customers can use multiple models and the

901
00:46:00,159 --> 00:46:04,760
same in the same APM instance, without like the need

902
00:46:04,920 --> 00:46:08,360
to customize policies right crazy post expressions.

903
00:46:07,840 --> 00:46:08,159
Speaker 1: And so on.

904
00:46:10,320 --> 00:46:12,880
Speaker 3: Then U there is a there's a huge demand like

905
00:46:13,079 --> 00:46:15,960
on logging in monitoring side, as I mentioned, we we

906
00:46:16,159 --> 00:46:19,760
started with the talken tracking, but it turns out there

907
00:46:19,800 --> 00:46:23,039
are certain phases of the intelligent applications development where you

908
00:46:23,119 --> 00:46:25,639
actually want to collect all of the proms and completions

909
00:46:25,679 --> 00:46:30,440
to make sure that your model behaves correctly. And in general,

910
00:46:30,519 --> 00:46:33,239
logan is pretty easy with API M if you are

911
00:46:33,280 --> 00:46:35,400
not using streaming, if you're not using the SECE events,

912
00:46:35,440 --> 00:46:38,880
because again I mentioned there's a buffering problem, so that's

913
00:46:38,920 --> 00:46:41,320
something that we're looking at how we can solve this

914
00:46:41,480 --> 00:46:44,639
in the future. And also kind of in general, like

915
00:46:44,719 --> 00:46:49,639
focus on security traffic management like prompt manipulation policies, like

916
00:46:50,119 --> 00:46:52,440
let's say that this example that you share that I

917
00:46:52,639 --> 00:46:56,119
just found some copilot that I can use now for

918
00:46:56,239 --> 00:47:00,719
my personal personal Again, but what if I have some

919
00:47:00,840 --> 00:47:04,519
policy that says that, oh, for whatever context which is

920
00:47:04,559 --> 00:47:07,360
presented in the prompt, I will rewrite it so that

921
00:47:07,480 --> 00:47:09,840
I know that my application works with it perfectly well.

922
00:47:10,280 --> 00:47:12,039
So in that case, whatever you send us a problem

923
00:47:12,079 --> 00:47:14,119
that will be rewritten on APIM side, and you will

924
00:47:14,159 --> 00:47:17,079
not get the response that you wanted from this copile

925
00:47:17,199 --> 00:47:19,239
that you just found out in the Internet.

926
00:47:20,079 --> 00:47:21,719
Speaker 1: So it just occurred to me at the end of

927
00:47:21,719 --> 00:47:23,760
the show here that I should ask this question long ago.

928
00:47:23,800 --> 00:47:26,960
But is it possible to write two policies that contradict

929
00:47:27,039 --> 00:47:29,800
each other? And what happens if that's possible?

930
00:47:31,320 --> 00:47:33,039
Speaker 3: I believe technically you can do that. But at the

931
00:47:33,039 --> 00:47:36,320
same time, we have the policies are executed from top

932
00:47:36,400 --> 00:47:38,480
to bottom, so whatever you have at the bottom will

933
00:47:39,400 --> 00:47:41,519
be enforced, right, So it's.

934
00:47:41,360 --> 00:47:47,000
Speaker 1: The order of execution, yep, which is pretty common. Yeah, yeah, yeah,

935
00:47:47,039 --> 00:47:50,159
it is. It can lead to some confusion. Now, it'd

936
00:47:50,159 --> 00:47:53,480
be nice to have some something when you're creating those

937
00:47:53,559 --> 00:47:55,559
policies to say, hey, you know, by the way, this

938
00:47:55,719 --> 00:47:58,119
contradicts this policy, you might want to take a look

939
00:47:58,119 --> 00:47:58,639
at that. Yeah.

940
00:47:58,639 --> 00:48:01,760
Speaker 3: We definitely validate policies, so we have the policy ENGINET validates.

941
00:48:01,800 --> 00:48:05,320
If there's something like which doesn't make sense, there will

942
00:48:05,320 --> 00:48:07,199
be a validation error, so you will not be able

943
00:48:07,280 --> 00:48:08,119
to save the policy.

944
00:48:08,519 --> 00:48:08,719
Speaker 1: Yeah.

945
00:48:09,159 --> 00:48:10,440
Speaker 3: But yeah, if that's something.

946
00:48:10,400 --> 00:48:13,239
Speaker 1: Allow Carl to access this API, don't allow Carl to

947
00:48:13,280 --> 00:48:13,679
access to.

948
00:48:13,679 --> 00:48:17,239
Speaker 3: This ABA yeah, yeah exactly, and stuff like that.

949
00:48:17,960 --> 00:48:18,159
Speaker 1: Cool.

950
00:48:18,280 --> 00:48:20,920
Speaker 3: So yeah, and in general, like for the future of JENNI,

951
00:48:21,079 --> 00:48:24,800
so we are good at like security, traffic management, just

952
00:48:24,960 --> 00:48:27,280
kind of general ease of operations and APM. So that's

953
00:48:27,280 --> 00:48:29,440
what we are focusing on to make sure that customers

954
00:48:29,519 --> 00:48:33,280
have all of these secure access control to those models,

955
00:48:33,400 --> 00:48:35,719
like all of the policies and governments in place. But

956
00:48:35,760 --> 00:48:37,760
at the same time, we want to make it easier

957
00:48:37,800 --> 00:48:40,400
to build intelligent applications. So whatever we build, we are

958
00:48:40,440 --> 00:48:44,119
trying to give those AI engineers like an easy to

959
00:48:44,199 --> 00:48:46,480
use interface if they're not familiar, to make sure that

960
00:48:46,559 --> 00:48:50,079
it's easy for them to set up, configure and basically

961
00:48:51,239 --> 00:48:55,000
get all the benefits of a APMs GENEI gateway when

962
00:48:55,000 --> 00:48:56,039
they're building applications.

963
00:48:56,239 --> 00:48:58,320
Speaker 1: Great, well, it sounds like the end of the show,

964
00:48:58,400 --> 00:49:00,480
Andre Kamanov, thank you for being of this. Is there

965
00:49:00,519 --> 00:49:02,440
anything that we missed that you wanted to mention or

966
00:49:02,440 --> 00:49:04,519
a shout out or call it action or anything.

967
00:49:04,800 --> 00:49:10,719
Speaker 3: I would say, just make sure to check the AZRA

968
00:49:10,800 --> 00:49:14,039
updates when we release new stuff, and we'll get a

969
00:49:14,079 --> 00:49:17,119
technique community blocks where we publish all of the latest

970
00:49:17,159 --> 00:49:21,000
and greatest in in a PM. And yeah, all right,

971
00:49:21,079 --> 00:49:22,239
that's that's it.

972
00:49:22,719 --> 00:49:24,199
Speaker 1: Awesome. Well, it's been great talking to you.

973
00:49:24,320 --> 00:49:25,000
Speaker 3: Thanks for having me.

974
00:49:25,679 --> 00:49:27,760
Speaker 1: It was great talking to good to Thank you very much.

975
00:49:28,360 --> 00:49:31,199
All right, we'll talk to you next time I'm done.

976
00:49:52,239 --> 00:49:54,760
Dot net Rocks is brought to you by Franklin's Net

977
00:49:55,079 --> 00:49:59,000
and produced by Pop Studios, a full service audio, video

978
00:49:59,079 --> 00:50:03,119
and post production facility located physically in New London, Connecticut.

979
00:50:03,440 --> 00:50:07,599
And of course in the cloud online at pwop dot com.

980
00:50:08,440 --> 00:50:10,480
Visit our website at d O T N E t

981
00:50:10,800 --> 00:50:14,760
R O c k S dot com for RSS feeds, downloads,

982
00:50:14,960 --> 00:50:18,599
mobile apps, comments, and access to the full archives going

983
00:50:18,679 --> 00:50:22,079
back to show number one, recorded in September two thousand

984
00:50:22,079 --> 00:50:24,719
and two. And make sure you check out our sponsors.

985
00:50:24,920 --> 00:50:27,679
They keep us in business. Now, go write some code.

986
00:50:28,280 --> 00:50:33,880
See you next time. You got J middle Vans and

