WEBVTT

1
00:00:12.160 --> 00:00:20.760
<v Speaker 1>Hey man, it's not ned. You're not why not? You know,

2
00:00:20.839 --> 00:00:23.280
<v Speaker 1>it's just the way I feel this morning.

3
00:00:23.359 --> 00:00:25.120
<v Speaker 2>Yeah, you've had a little too much weekend.

4
00:00:25.160 --> 00:00:29.760
<v Speaker 1>I think I'm Carl Franklin. That's Richard Campbell, you know. Okay,

5
00:00:29.879 --> 00:00:33.600
<v Speaker 1>I won't do that again until marijuana is completely legal

6
00:00:33.640 --> 00:00:36.159
<v Speaker 1>in all fifty states of America. Oh, it's going to

7
00:00:36.200 --> 00:00:42.520
<v Speaker 1>be a while and then I'll hey man, how you doing, Richard?

8
00:00:42.840 --> 00:00:43.320
<v Speaker 1>I'm good.

9
00:00:43.799 --> 00:00:45.880
<v Speaker 2>I did have a bit of a crazy weekend. You know,

10
00:00:46.119 --> 00:00:48.759
<v Speaker 2>between my birthday, my wife's birthday, and many of

11
00:00:48.759 --> 00:00:51.679
<v Speaker 2>our friends are all like this last two weeks of July,

12
00:00:51.880 --> 00:00:54.560
<v Speaker 2>so we brought them all up to the coast and

13
00:00:54.560 --> 00:00:57.399
<v Speaker 2>it's, yeah, it's been a week of debauchery, honestly.

14
00:00:57.600 --> 00:00:59.359
<v Speaker 1>Yeah, your nose does look a little red.

15
00:00:59.520 --> 00:01:02.000
<v Speaker 2>Actually, I'm a little pinked up, a little pinked up

16
00:01:02.039 --> 00:01:05.319
<v Speaker 2>today doing some damage. But you know, yeah, there's a

17
00:01:05.359 --> 00:01:07.239
<v Speaker 2>couple of buddies up here that I've literally known for

18
00:01:07.239 --> 00:01:09.079
<v Speaker 2>forty years. Like, that's when I'm like, wow, getting old.

19
00:01:09.400 --> 00:01:10.280
<v Speaker 1>That's so cool though.

20
00:01:10.400 --> 00:01:13.400
<v Speaker 2>Yeah, it's something. Yeah, how about you? You staying out

21
00:01:13.400 --> 00:01:15.799
<v Speaker 2>of trouble? You're playing, I see. I see your calendar.

22
00:01:15.840 --> 00:01:17.239
<v Speaker 2>You're playing summer time.

23
00:01:17.359 --> 00:01:20.640
<v Speaker 1>Yeah. Yeah, the band is currently just on a tear

24
00:01:20.799 --> 00:01:21.200
<v Speaker 1>right now.

25
00:01:21.439 --> 00:01:23.879
<v Speaker 2>That's awesome, dude, And uh.

26
00:01:23.439 --> 00:01:27.680
<v Speaker 1>We're getting our original bass player back in September. August

27
00:01:27.760 --> 00:01:31.200
<v Speaker 1>thirty first is our current bass player's last date. Oh,

28
00:01:31.280 --> 00:01:34.120
<v Speaker 1>I'll tell you a little band story. So it was

29
00:01:34.200 --> 00:01:39.280
<v Speaker 1>during COVID that Kevin, our original bass player, basically stopped

30
00:01:39.280 --> 00:01:42.680
<v Speaker 1>coming to rehearsals because of COVID, and he had

31
00:01:42.680 --> 00:01:46.079
<v Speaker 1>a good reason too. He was immune deficient and it

32
00:01:46.120 --> 00:01:50.239
<v Speaker 1>turns out he got leukemia. He had and his immune

33
00:01:50.359 --> 00:01:53.519
<v Speaker 1>system was compromised. Because of all that stuff, he had

34
00:01:53.519 --> 00:01:56.560
<v Speaker 1>to stay home and he was like, guys, I'm out.

35
00:01:56.599 --> 00:01:59.439
<v Speaker 1>I can't do this and you know, my wife and

36
00:01:59.519 --> 00:02:03.920
<v Speaker 1>child all that. Yeah. So then he gradually got over it,

37
00:02:03.959 --> 00:02:06.359
<v Speaker 1>and I kept calling him once in a while and said, hey, man,

38
00:02:06.400 --> 00:02:08.759
<v Speaker 1>how you doing. He's like, yeah, I'm okay, but I'm

39
00:02:08.800 --> 00:02:10.759
<v Speaker 1>still not. You know, I tried playing out a couple

40
00:02:10.719 --> 00:02:14.080
<v Speaker 1>of times with you know, somebody else, and I'm still not.

41
00:02:14.800 --> 00:02:19.800
<v Speaker 1>And then finally he's like, okay, I'm retiring. I'm in. Oh,

42
00:02:19.840 --> 00:02:23.400
<v Speaker 1>but not till September. And it turns out that I

43
00:02:23.439 --> 00:02:28.000
<v Speaker 1>called him because our current bass player Chris basically met

44
00:02:28.000 --> 00:02:32.639
<v Speaker 1>a girl, sold his house, and moved three hours away

45
00:02:33.000 --> 00:02:33.719
<v Speaker 1>and married her.

46
00:02:33.879 --> 00:02:36.039
<v Speaker 2>Hopefully not necessarily in that order, but okay.

47
00:02:35.840 --> 00:02:38.919
<v Speaker 1>Yeah, yeah, yeah. But he said, don't worry, guys, I'm

48
00:02:39.000 --> 00:02:41.560
<v Speaker 1>still in the band. And we're like, oh, yeah, right,

49
00:02:41.759 --> 00:02:47.039
<v Speaker 1>you know. So, but props to Chris. Three hours. Yeah,

50
00:02:47.639 --> 00:02:49.479
<v Speaker 1>he didn't come to rehearsals so that, you know, we

51
00:02:49.520 --> 00:02:52.960
<v Speaker 1>couldn't really learn any new stuff, but we could send

52
00:02:53.039 --> 00:02:54.759
<v Speaker 1>him a song and say, hey, learn this, and he

53
00:02:54.800 --> 00:02:56.599
<v Speaker 1>was pretty good about it. And he never had a

54
00:02:56.599 --> 00:02:59.159
<v Speaker 1>problem on the gig. But he showed up at every

55
00:02:59.240 --> 00:03:03.400
<v Speaker 1>gig three hours away. It's a lot of driving. But

56
00:03:03.520 --> 00:03:05.759
<v Speaker 1>I was going to say that the whole purpose of

57
00:03:05.800 --> 00:03:08.639
<v Speaker 1>this was that, come September, see, Kevin knows a

58
00:03:08.639 --> 00:03:11.599
<v Speaker 1>lot of our original tunes that Chris doesn't know, and

59
00:03:11.680 --> 00:03:13.879
<v Speaker 1>a lot more than the Steely Dan songs which we're

60
00:03:13.879 --> 00:03:16.599
<v Speaker 1>known for. So we're going to be playing a lot

61
00:03:16.639 --> 00:03:20.639
<v Speaker 1>more original tunes, hopefully in bigger venues. Yeah, more Franklin Brothers.

62
00:03:20.759 --> 00:03:23.199
<v Speaker 1>It's cool. Anyway, I've taken up enough time with my

63
00:03:23.400 --> 00:03:27.840
<v Speaker 1>stupid stories of music and bands. Let's play, roll, let's

64
00:03:27.919 --> 00:03:37.840
<v Speaker 1>roll the music for Better Know a Framework. Awesome. Man, what

65
00:03:37.919 --> 00:03:40.439
<v Speaker 1>do you got? I can't believe I've never talked about

66
00:03:40.479 --> 00:03:46.240
<v Speaker 1>this particular project before. It's curated by the .NET Foundation.

67
00:03:47.080 --> 00:03:51.000
<v Speaker 1>It's FluentValidation. Okay, it's a validation library for

68
00:03:51.000 --> 00:03:53.680
<v Speaker 1>.NET that uses a fluent interface. You know what that is? This,

69
00:03:53.759 --> 00:03:56.879
<v Speaker 1>do this, do that. Mm hmm. And lambda expressions for

70
00:03:56.960 --> 00:04:02.080
<v Speaker 1>building strongly typed validation rules. Right, okay. So you can

71
00:04:02.120 --> 00:04:05.000
<v Speaker 1>create these rules and then you can obviously use them,

72
00:04:05.240 --> 00:04:07.840
<v Speaker 1>and it's very popular.
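
NOTE A minimal sketch of the fluent-interface idea described above, written in Python rather than C#.
This is not FluentValidation's API. Validator, Rule, rule_for, must, not_empty and Customer are
hypothetical names chosen for illustration: each rule method returns the rule itself so calls
can be chained, and the checks are plain lambdas over a typed object.
```python
from dataclasses import dataclass
from typing import Any, Callable, Generic, TypeVar
T = TypeVar("T")
class Rule(Generic[T]):
    """One property plus the chained checks applied to it."""
    def __init__(self, name: str, getter: Callable[[T], Any]):
        self.name, self.getter, self.checks = name, getter, []
    def must(self, predicate: Callable[[Any], bool], message: str) -> "Rule[T]":
        self.checks.append((predicate, message))
        return self  # returning self is what makes the interface fluent
    def not_empty(self) -> "Rule[T]":
        return self.must(lambda value: bool(value), "must not be empty")
class Validator(Generic[T]):
    """A reusable set of rules for one type of object."""
    def __init__(self):
        self.rules = []
    def rule_for(self, name: str, getter: Callable[[T], Any]) -> Rule[T]:
        rule = Rule(name, getter)
        self.rules.append(rule)
        return rule
    def validate(self, obj: T) -> list[str]:
        # Collect a message for every check that fails, in rule order.
        return [f"{r.name} {msg}" for r in self.rules for pred, msg in r.checks if not pred(r.getter(obj))]
@dataclass
class Customer:  # hypothetical model type
    name: str
    age: int
customer_rules = Validator()
customer_rules.rule_for("name", lambda c: c.name).not_empty()
customer_rules.rule_for("age", lambda c: c.age).must(lambda a: a >= 18, "must be at least 18")
```
Calling customer_rules.validate(some_customer) returns a list of failure messages, empty when every rule passes.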

73
00:04:08.080 --> 00:04:10.599
<v Speaker 2>Well, and it's a way of building abstract validation rules

74
00:04:10.639 --> 00:04:13.840
<v Speaker 2>you might actually reuse rather than keep recoding it. Exactly.

75
00:04:14.159 --> 00:04:16.480
<v Speaker 1>Yeah. Yeah, And you know, if you decide to do

76
00:04:16.560 --> 00:04:18.879
<v Speaker 1>it on the fly, where do you store that information?

77
00:04:19.639 --> 00:04:21.480
<v Speaker 1>Where does it go? Does it go in a database?

78
00:04:22.480 --> 00:04:24.399
<v Speaker 1>You know? Does it change? How does it change? Who

79
00:04:24.480 --> 00:04:29.040
<v Speaker 1>changes it? Like, there's so many, so many things. But yeah,

80
00:04:29.560 --> 00:04:34.399
<v Speaker 1>eight point nine thousand stars. Yeah, I would learn it,

81
00:04:34.439 --> 00:04:37.439
<v Speaker 1>love it. Man. That's clearly a great tool. It's cool,

82
00:04:37.480 --> 00:04:39.199
<v Speaker 1>and I'm going to start using it. I can't believe

83
00:04:39.199 --> 00:04:43.040
<v Speaker 1>that I haven't even known about its existence before. So

84
00:04:43.120 --> 00:04:44.480
<v Speaker 1>that's what I got. Who's talking to us, Richard?

85
00:04:44.600 --> 00:04:46.800
<v Speaker 2>Grabbed a comment off of show eighteen ninety one,

86
00:04:46.800 --> 00:04:48.759
<v Speaker 2>which we did back in March of twenty four with

87
00:04:48.920 --> 00:04:53.040
<v Speaker 2>our friend Anthony Alaribe, who, he'd

88
00:04:53.040 --> 00:04:56.319
<v Speaker 2>written that open source library for API observability.

89
00:04:56.680 --> 00:04:57.480
<v Speaker 1>Yeah, right.

90
00:04:57.639 --> 00:04:59.639
<v Speaker 2>We had a great conversation with him, and APIToolkit

91
00:04:59.680 --> 00:05:01.879
<v Speaker 2>was his tool. I know we're going

92
00:05:01.879 --> 00:05:04.399
<v Speaker 2>to talk about APIs a bunch today. And our friend

93
00:05:04.439 --> 00:05:07.600
<v Speaker 2>Matt Lacey actually commented on the show. He said, hey,

94
00:05:07.720 --> 00:05:10.279
<v Speaker 2>ten plus years ago, I built something for API. Because

95
00:05:10.319 --> 00:05:12.480
<v Speaker 2>this is your checking on the client end. I assumed

96
00:05:12.519 --> 00:05:14.480
<v Speaker 2>there was a potential business in it, but I couldn't

97
00:05:14.480 --> 00:05:16.759
<v Speaker 2>work out how, you know, because building good software and

98
00:05:16.839 --> 00:05:20.040
<v Speaker 2>selling software are two different jobs, right? Like, that's a different thing.

99
00:05:21.160 --> 00:05:23.240
<v Speaker 2>Great to see APIToolkit in existence now and making

100
00:05:23.279 --> 00:05:26.319
<v Speaker 2>APIs more reliable. Because that was our whole conversation, right,

101
00:05:26.519 --> 00:05:29.759
<v Speaker 2>was, you know, you change an API

102
00:05:29.920 --> 00:05:33.000
<v Speaker 2>and somebody with a dependency on it suddenly goes down. Yeah,

103
00:05:33.279 --> 00:05:35.720
<v Speaker 2>you make people sad in a big hurry, and so

104
00:05:36.079 --> 00:05:38.800
<v Speaker 2>APIToolkit was all about doing that validation. Hey, Matt,

105
00:05:38.920 --> 00:05:40.720
<v Speaker 2>thanks so much for being a long time listener. And

106
00:05:41.000 --> 00:05:42.360
<v Speaker 2>I don't know if you have a copy of Music

107
00:05:42.439 --> 00:05:44.879
<v Speaker 2>to Code By already, but you do now. If you'd like

108
00:05:44.879 --> 00:05:46.399
<v Speaker 2>a copy of Music to Code By, write a comment

109
00:05:46.439 --> 00:05:48.240
<v Speaker 2>on the website at dotnetrocks.com or

110
00:05:48.279 --> 00:05:50.519
<v Speaker 2>on the facebooks. We publish every show there, and if

111
00:05:50.519 --> 00:05:52.000
<v Speaker 2>you comment there and we read it on the show, we'll send

112
00:05:52.040 --> 00:05:54.120
<v Speaker 2>you a copy of Music to Code By. And Music to Code

113
00:05:53.959 --> 00:05:56.040
<v Speaker 1>By, for those who don't know, is something you can

114
00:05:56.120 --> 00:05:59.920
<v Speaker 1>listen to while you're coding, or to calm a restless dog,

115
00:06:00.519 --> 00:06:02.360
<v Speaker 1>or to put your children to sleep at night.

116
00:06:03.399 --> 00:06:07.399
<v Speaker 2>Here I thought that was the nuclear weapons geek out

117
00:06:07.439 --> 00:06:08.639
<v Speaker 2>for putting children to sleep.

118
00:06:08.839 --> 00:06:11.879
<v Speaker 1>Oh come on, now, that was amazing, and it always

119
00:06:12.040 --> 00:06:14.720
<v Speaker 1>is every year, but I was that particular one.

120
00:06:14.759 --> 00:06:17.439
<v Speaker 2>Somebody told me it's like what I was depressed. So

121
00:06:17.519 --> 00:06:19.360
<v Speaker 2>the key, I'm a little more level in that.

122
00:06:19.480 --> 00:06:19.680
<v Speaker 1>Yeah.

123
00:06:19.680 --> 00:06:21.600
<v Speaker 2>And I did almost all of the talking too, So

124
00:06:21.839 --> 00:06:23.639
<v Speaker 2>apparently it's pretty good at knocking kids out.

125
00:06:23.879 --> 00:06:27.319
<v Speaker 1>Well, there you go. Hopefully we'll have something more positive

126
00:06:27.439 --> 00:06:27.759
<v Speaker 1>to say.

127
00:06:27.800 --> 00:06:29.680
<v Speaker 2>This is only positive about nuclear weapons.

128
00:06:29.759 --> 00:06:34.839
<v Speaker 1>Go on, Well you know, okay, Well let's get into it.

129
00:06:34.959 --> 00:06:37.560
<v Speaker 1>We're gonna introduce our guest right now. I'm going to

130
00:06:37.639 --> 00:06:42.240
<v Speaker 1>introduce our guest. Andrei Kamenev is our guest, and from

131
00:06:42.319 --> 00:06:46.120
<v Speaker 1>twenty sixteen to twenty twenty four, Andrei worked at Microsoft

132
00:06:46.240 --> 00:06:49.360
<v Speaker 1>in various architect roles in Europe helping customers to bring

133
00:06:49.439 --> 00:06:53.199
<v Speaker 1>their applications to Azure. Now he works as a product

134
00:06:53.279 --> 00:06:57.319
<v Speaker 1>manager at Azure API Management. All that stuff that we

135
00:06:57.439 --> 00:06:59.959
<v Speaker 1>were kind of just talking about, but in the Azure way.

136
00:07:00.759 --> 00:07:04.079
<v Speaker 1>Welcome, Andrei. Thanks for having me. Thanks for being here.

137
00:07:04.879 --> 00:07:08.040
<v Speaker 1>You first started in, like, the quick start teams, those folks

138
00:07:08.120 --> 00:07:12.199
<v Speaker 1>that helped onboard people into the cloud. Did that digital transformation.

139
00:07:11.759 --> 00:07:14.279
<v Speaker 3>thing. Uh, I mean, the service by itself?

140
00:07:14.600 --> 00:07:17.600
<v Speaker 1>Yeah, your earlier role before you joined the product team.

141
00:07:17.839 --> 00:07:19.959
<v Speaker 3>Yeah, yeah, So I was a part of what was

142
00:07:20.040 --> 00:07:23.480
<v Speaker 3>called here at Microsoft the Global Black Belt team. Okay, so

143
00:07:23.600 --> 00:07:27.480
<v Speaker 3>it's like a bunch of cloud solution architects who help local

144
00:07:27.600 --> 00:07:32.279
<v Speaker 3>teams, like field engineers and customer solution architects, in the

145
00:07:32.360 --> 00:07:36.199
<v Speaker 3>EMEA region to build stuff with Azure. So our team

146
00:07:36.240 --> 00:07:38.560
<v Speaker 3>was mostly focusing on Kubernetes-related stuff. So I was

147
00:07:38.600 --> 00:07:41.759
<v Speaker 3>working a lot with customers on bringing workloads to Azure Kubernetes

148
00:07:41.839 --> 00:07:43.720
<v Speaker 3>Service, Azure Red Hat OpenShift, and so on.

149
00:07:43.959 --> 00:07:46.560
<v Speaker 2>So yeah, deep. So the move over to API Management

150
00:07:46.600 --> 00:07:49.319
<v Speaker 2>makes sense because that's a lynchpin problem when you expose

151
00:07:49.360 --> 00:07:52.439
<v Speaker 2>stuff on the cloud that was typically just on prem before.

152
00:07:53.439 --> 00:07:53.600
<v Speaker 1>Yeah.

153
00:07:53.639 --> 00:07:56.519
<v Speaker 3>Absolutely. Yeah, we've seen a lot of customers who are interested

154
00:07:56.519 --> 00:07:59.120
<v Speaker 3>in APIM, like, even back then when I

155
00:07:59.240 --> 00:08:01.759
<v Speaker 3>was not a part of the APIM team. Like, okay,

156
00:08:01.839 --> 00:08:03.560
<v Speaker 3>I have a bunch of APIs in my Kubernetes,

157
00:08:03.560 --> 00:08:05.800
<v Speaker 3>how do I expose them securely? Like, what can

158
00:08:05.839 --> 00:08:06.319
<v Speaker 3>you do for me,

159
00:08:06.439 --> 00:08:06.920
<v Speaker 1>Microsoft?

160
00:08:07.360 --> 00:08:08.920
<v Speaker 3>And then I think back then there was a

161
00:08:09.000 --> 00:08:11.680
<v Speaker 3>self-hosted gateway, and it is still out there.

162
00:08:12.560 --> 00:08:13.279
<v Speaker 3>It was a solution.

163
00:08:13.560 --> 00:08:16.759
<v Speaker 1>So yeah, so what are you working on these days? Yeah?

164
00:08:16.800 --> 00:08:21.040
<v Speaker 3>So these days, I guess there's a lot of interest

165
00:08:21.199 --> 00:08:24.399
<v Speaker 3>in gen AI, in large language models. ChatGPT is all

166
00:08:24.439 --> 00:08:28.800
<v Speaker 3>over the place. So, goodness. Right now, we, in, I

167
00:08:28.879 --> 00:08:31.000
<v Speaker 3>believe in May, yeah, in May we released the GenAI

168
00:08:31.040 --> 00:08:34.840
<v Speaker 3>gateway capabilities in Azure API Management to help customers build

169
00:08:35.559 --> 00:08:37.480
<v Speaker 3>like, intelligent applications with LLMs.

170
00:08:37.519 --> 00:08:40.679
<v Speaker 1>So yeah, that's I have a great idea. Let's give

171
00:08:40.759 --> 00:08:43.519
<v Speaker 1>our AI all of our API keys and let them

172
00:08:43.559 --> 00:08:44.639
<v Speaker 1>do whatever they want to do with it.

173
00:08:45.600 --> 00:08:48.799
<v Speaker 3>Yeah, that's actually a better approach. That's what,

174
00:08:49.120 --> 00:08:50.840
<v Speaker 3>that's one of the things that we're actually trying to

175
00:08:50.919 --> 00:08:52.320
<v Speaker 3>solve for customers.

176
00:08:52.159 --> 00:08:54.679
<v Speaker 1>Right, I know. Yeah, it just sounds crazy, doesn't it,

177
00:08:54.960 --> 00:08:58.480
<v Speaker 1>given all the AI hiccups, and that

178
00:08:58.639 --> 00:09:01.840
<v Speaker 1>people don't really trust them. But I mean, so let's

179
00:09:02.039 --> 00:09:05.440
<v Speaker 1>let's bust that myth. You know, why would we use

180
00:09:05.679 --> 00:09:10.200
<v Speaker 1>AI to make our API calls, manage our APIs do

181
00:09:10.279 --> 00:09:14.279
<v Speaker 1>all those things that we normally trust folks to do? Yeah.

182
00:09:14.360 --> 00:09:17.039
<v Speaker 3>So here I think, like, from the APIM side, we have

183
00:09:17.279 --> 00:09:20.960
<v Speaker 3>two different stories. Like, first of all, we use gen

184
00:09:20.960 --> 00:09:25.000
<v Speaker 3>AI ourselves to help customers write the policies that we

185
00:09:25.120 --> 00:09:29.679
<v Speaker 3>have in APIM. And the second thing is we

186
00:09:29.799 --> 00:09:34.000
<v Speaker 3>have customers who are building intelligent applications using ChatGPT models,

187
00:09:34.200 --> 00:09:37.279
<v Speaker 3>using other models that are out there in Azure AI Studio,

188
00:09:38.200 --> 00:09:40.600
<v Speaker 3>and they have challenges, because if you think about Azure

189
00:09:40.679 --> 00:09:43.519
<v Speaker 3>OpenAI, for example, it is still an API. You

190
00:09:43.639 --> 00:09:46.759
<v Speaker 3>still have the same challenges when it comes to managing

191
00:09:46.840 --> 00:09:51.039
<v Speaker 3>and securing access to APIs. So we built but they

192
00:09:51.120 --> 00:09:54.159
<v Speaker 3>have specific kind of a number of challenges which are

193
00:09:54.279 --> 00:09:57.399
<v Speaker 3>kind of specific to LLMs. So this is

194
00:09:57.480 --> 00:09:59.960
<v Speaker 3>kind of the second part where we help customers to secure,

195
00:10:00.279 --> 00:10:03.320
<v Speaker 3>manage, and scale Azure OpenAI deployments for their applications

196
00:10:03.360 --> 00:10:07.120
<v Speaker 3>with what we call the GenAI gateway in Azure API Management.

197
00:10:07.200 --> 00:10:09.120
<v Speaker 1>So what are some of the ways in which AI

198
00:10:09.240 --> 00:10:15.480
<v Speaker 1>can be used with APIs. Obviously creating the management stuff

199
00:10:15.519 --> 00:10:22.000
<v Speaker 1>around it. But would you necessarily trust an AI with

200
00:10:22.240 --> 00:10:25.480
<v Speaker 1>your API keys and say, you know, here are the

201
00:10:25.600 --> 00:10:28.879
<v Speaker 1>rules and times under which you would make these API calls.

202
00:10:29.000 --> 00:10:32.159
<v Speaker 1>And I'm trying to wrap my head around that.

203
00:10:32.679 --> 00:10:38.159
<v Speaker 3>Yeah, I think, yeah, as always, it depends, right? So yeah,

204
00:10:39.639 --> 00:10:43.840
<v Speaker 3>if you have like specific like access controls in place,

205
00:10:44.440 --> 00:10:46.799
<v Speaker 3>why not? Like, if you, for example, with APIM, you

206
00:10:46.840 --> 00:10:51.159
<v Speaker 3>can provide the specific keys for the LLM

207
00:10:51.519 --> 00:10:54.440
<v Speaker 3>to, like, enhance the experience for those who use those

208
00:10:54.559 --> 00:10:57.320
<v Speaker 3>LLMs and all that. Yeah, then why

209
00:10:57.360 --> 00:11:00.000
<v Speaker 3>not? Like, you're not giving access to, like, the full API.

210
00:11:00.159 --> 00:11:03.759
<v Speaker 3>You're just giving access to a subset of operations.

211
00:11:03.679 --> 00:11:05.519
<v Speaker 1>Sure. Which are, for example, read-only,

212
00:11:05.240 --> 00:11:08.519
<v Speaker 3>or they just have access to specific data that

213
00:11:08.720 --> 00:11:11.559
<v Speaker 3>is not really critical.

214
00:11:11.679 --> 00:11:14.159
<v Speaker 3>So yeah, that's definitely where I would use it.

215
00:11:14.200 --> 00:11:18.200
<v Speaker 1>I'd be definitely okay with GETs, but POSTs and PUTs?

216
00:11:18.600 --> 00:11:21.879
<v Speaker 3>Yeah, I don't know, yeah.

217
00:11:21.080 --> 00:11:23.039
<v Speaker 2>Because otherwise you'd have to make these rules yourself, right? I

218
00:11:23.080 --> 00:11:24.960
<v Speaker 2>mean, that's the point here, is that you've got

219
00:11:24.960 --> 00:11:26.840
<v Speaker 2>a machine learning model essentially that's figuring out what the

220
00:11:26.879 --> 00:11:28.399
<v Speaker 2>optimal rules are for utilization.

221
00:11:28.679 --> 00:11:30.039
<v Speaker 3>Yeah, so there is a lot. There is one thing

222
00:11:30.120 --> 00:11:32.639
<v Speaker 3>so as I mentioned, like, the first thing we've built, like,

223
00:11:32.639 --> 00:11:35.360
<v Speaker 3>in APIM, for example, we're helping customers with

224
00:11:35.639 --> 00:11:41.000
<v Speaker 3>LLMs to configure APIM. So I guess you're kind

225
00:11:41.039 --> 00:11:44.120
<v Speaker 3>of familiar with APIM. We have these XML policies, which

226
00:11:44.240 --> 00:11:46.360
<v Speaker 3>can be pretty long documents.

227
00:11:46.840 --> 00:11:49.919
<v Speaker 2>Yeah, so typically that you're doing the thing, you're trying

228
00:11:49.960 --> 00:11:52.519
<v Speaker 2>to say no one user can do more than this

229
00:11:52.720 --> 00:11:55.759
<v Speaker 2>many, or if it's growing, you know, massively, limit it so

230
00:11:55.919 --> 00:11:58.679
<v Speaker 2>you don't knock other people off, and make sure they're

231
00:11:58.720 --> 00:12:01.399
<v Speaker 2>using the right accounts. Like, it's just forwarding.

232
00:12:01.799 --> 00:12:03.000
<v Speaker 1>Yeah, a lot of stuff in there.

233
00:12:03.039 --> 00:12:04.720
<v Speaker 2>You know where stuff is, and you know what failover

234
00:12:04.799 --> 00:12:07.480
<v Speaker 2>modes look like. Like, yeah, we've done a few

235
00:12:07.519 --> 00:12:10.879
<v Speaker 2>shows on API management now and it's like pretty powerful stuff.

236
00:12:11.159 --> 00:12:14.600
<v Speaker 2>You know, you're gonna put an API in

237
00:12:14.679 --> 00:12:18.759
<v Speaker 2>public, like, you're paying whenever somebody calls that, so

238
00:12:19.000 --> 00:12:20.759
<v Speaker 2>you kind of want to put governance around that. But

239
00:12:20.919 --> 00:12:22.960
<v Speaker 2>writing all those rules, like, when you really dig

240
00:12:23.039 --> 00:12:25.320
<v Speaker 2>into it, it's complicated. So that was my first thought

241
00:12:25.320 --> 00:12:27.120
<v Speaker 2>when I thought about, yeah, what do I want generative

242
00:12:27.159 --> 00:12:29.200
<v Speaker 2>AI to do? It's like, look at what's actually going

243
00:12:29.279 --> 00:12:30.720
<v Speaker 2>on and write me better rules.

244
00:12:30.600 --> 00:12:32.720
<v Speaker 3>Yeah, exactly. And that's kind of the two kind of

245
00:12:32.759 --> 00:12:35.879
<v Speaker 3>two use cases that we focused on with Copilot. Like, first,

246
00:12:36.000 --> 00:12:38.960
<v Speaker 3>we decided that, oh, we know that writing

247
00:12:39.000 --> 00:12:41.720
<v Speaker 3>policies is hard. We have like fifty, sixty different policy

248
00:12:41.799 --> 00:12:45.320
<v Speaker 3>snippets to do, like, validate JWT and retry policies, like rate limits

249
00:12:45.320 --> 00:12:48.279
<v Speaker 3>and stuff like that. So we decided, like, let's have

250
00:12:48.399 --> 00:12:50.519
<v Speaker 3>a way for customers to express in

251
00:12:50.799 --> 00:12:52.840
<v Speaker 3>plain English, like, for example, I want to have a

252
00:12:52.919 --> 00:12:55.919
<v Speaker 3>policy to rate limit this API to, you know, five

253
00:12:56.000 --> 00:12:59.360
<v Speaker 3>requests per second, and then Copilot will just explain,

254
00:12:59.440 --> 00:13:01.320
<v Speaker 3>sorry, not explain, but generate a policy for you,

255
00:13:01.360 --> 00:13:03.559
<v Speaker 3>and then you just copy and paste this into the XML

256
00:13:03.679 --> 00:13:07.080
<v Speaker 3>editor and that's it. And another one is, as I

257
00:13:07.120 --> 00:13:10.679
<v Speaker 3>mentioned that the second scenario, policies can become pretty long,

258
00:13:10.799 --> 00:13:12.960
<v Speaker 3>like two hundred, three hundred lines, and sometimes you don't even

259
00:13:13.000 --> 00:13:14.200
<v Speaker 3>understand what's going on there.
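
NOTE A minimal sketch of what a limit such as the five requests per second mentioned a moment ago
means at a gateway, written in Python for illustration. It is not APIM policy syntax and not
Copilot output. RateLimiter, allow and the subscription key are hypothetical names, and the
current time is passed in explicitly so the sliding-window behavior is easy to follow.
```python
from collections import deque
class RateLimiter:
    """Allow at most `calls` requests per `period` seconds for each key."""
    def __init__(self, calls: int, period: float):
        self.calls, self.period = calls, period
        self.history: dict[str, deque] = {}
    def allow(self, key: str, now: float) -> bool:
        window = self.history.setdefault(key, deque())
        while window and now - window[0] >= self.period:
            window.popleft()  # forget requests that have left the window
        if len(window) >= self.calls:
            return False  # a gateway would typically answer 429 Too Many Requests here
        window.append(now)
        return True
limiter = RateLimiter(calls=5, period=1.0)
```
Each caller key gets its own window, so one busy subscription does not use up another's allowance.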

260
00:13:14.679 --> 00:13:20.759
<v Speaker 2>In XML. In XML, exactly. Another theme on the

261
00:13:20.840 --> 00:13:24.559
<v Speaker 2>show lately is we hate XML. Yeah, there's a use

262
00:13:24.639 --> 00:13:27.679
<v Speaker 2>for AI right there. Hey, translate this XML for

263
00:13:27.720 --> 00:13:30.759
<v Speaker 3>me into English. Exactly. And that's actually what we do

264
00:13:30.879 --> 00:13:33.559
<v Speaker 3>with the second scenario. So yeah, you can just select

265
00:13:33.879 --> 00:13:37.360
<v Speaker 3>the whole XML thing or just a policy snippet, and then

266
00:13:37.480 --> 00:13:39.960
<v Speaker 3>you ask it to explain it to you. And the

267
00:13:40.039 --> 00:13:44.399
<v Speaker 3>fun stuff: it's not only explaining, just like, oh, this policy does

268
00:13:44.519 --> 00:13:47.240
<v Speaker 3>this, that policy does that, but it also understands the context.

269
00:13:47.320 --> 00:13:49.679
<v Speaker 3>Like, if you have different variables, you have context,

270
00:13:49.840 --> 00:13:54.159
<v Speaker 3>you have policy expressions, some logic in there, it also

271
00:13:54.200 --> 00:13:57.120
<v Speaker 3>will explain that. For example, oh, you are doing

272
00:13:57.200 --> 00:14:00.360
<v Speaker 3>a validate JWT policy and you have this admin claim that

273
00:14:00.440 --> 00:14:03.120
<v Speaker 3>you're checking. You're checking this admin claim; if it exists,

274
00:14:03.240 --> 00:14:06.240
<v Speaker 3>then you are allowed to do this operation. So yeah,

275
00:14:06.240 --> 00:14:08.919
<v Speaker 3>it's pretty good at explaining policies, because

276
00:14:09.440 --> 00:14:12.600
<v Speaker 3>we are not using, like, the plain model,

277
00:14:13.000 --> 00:14:16.759
<v Speaker 3>but we're also using this, it's called the retrieval-augmented

278
00:14:17.120 --> 00:14:20.360
<v Speaker 3>generation pattern, where we also have, like, policy snippets that

279
00:14:20.440 --> 00:14:23.639
<v Speaker 3>are stored in storage, and this model can also

280
00:14:23.759 --> 00:14:27.240
<v Speaker 3>use these policy snippets that we provided to better, like,

281
00:14:27.399 --> 00:14:30.320
<v Speaker 3>respond with correct policies with better explanations.
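
NOTE A toy sketch of the retrieval-augmented generation pattern described above, in Python.
How the real Copilot stores or ranks snippets is not stated in the episode, so everything here
is an assumption: the snippet store is a small dict of plain-text descriptions (not real policy
XML), ranking is simple word overlap, and tokenize, retrieve and build_prompt are hypothetical
names. The point is only the shape: look up relevant snippets first, then hand them to the model
together with the question.
```python
def tokenize(text: str) -> set[str]:
    return set(text.lower().replace("-", " ").split())
def retrieve(question: str, snippets: dict[str, str], k: int = 1) -> list[str]:
    """Return the names of the k snippets sharing the most words with the question."""
    q = tokenize(question)
    ranked = sorted(snippets, key=lambda name: len(q & tokenize(snippets[name])), reverse=True)
    return ranked[:k]
def build_prompt(question: str, snippets: dict[str, str]) -> str:
    # The retrieved text is prepended so the model can ground its answer in it.
    context = "\n".join(snippets[name] for name in retrieve(question, snippets))
    return f"Use these reference snippets:\n{context}\n\nQuestion: {question}"
# Hypothetical snippet store with plain-text stand-ins for stored policy snippets.
snippets = {
    "rate-limit": "limit the rate of calls per second for a subscription",
    "validate-jwt": "validate a jwt token and check required claims",
}
```
A production system would more likely rank by embedding similarity than by word overlap; the overlap score just keeps the sketch self-contained.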

282
00:14:30.360 --> 00:14:32.200
<v Speaker 2>And so yeah, so the same way I would actually

283
00:14:32.240 --> 00:14:34.600
<v Speaker 2>write policies is I go cut and paste from well

284
00:14:34.639 --> 00:14:37.759
<v Speaker 2>written policies. Exactly. Yeah, you've trained a model on well

285
00:14:37.799 --> 00:14:39.840
<v Speaker 2>written policies so that it has a good chance of

286
00:14:40.360 --> 00:14:41.840
<v Speaker 2>expressing better ones.

287
00:14:41.679 --> 00:14:42.720
<v Speaker 1>For a customer.

288
00:14:42.919 --> 00:14:48.159
<v Speaker 2>Yeah, exactly, Okay, I mean I could see a few

289
00:14:48.159 --> 00:14:50.200
<v Speaker 2>different things going on here at once because you and

290
00:14:50.240 --> 00:14:52.480
<v Speaker 2>I'll include a link to this blog post here. You're

291
00:14:52.519 --> 00:15:00.559
<v Speaker 2>also talking about using APIM to manage

292
00:15:00.720 --> 00:15:05.039
<v Speaker 2>utilization of the OpenAI service, because that stuff gets expensive,

293
00:15:05.279 --> 00:15:07.759
<v Speaker 2>like exactly, Yeah, those tokens run away on you and

294
00:15:07.840 --> 00:15:09.000
<v Speaker 2>like you're having a bad day.

295
00:15:09.360 --> 00:15:12.080
<v Speaker 3>Yeah, yeah, that's actually an interesting use case

296
00:15:12.120 --> 00:15:14.360
<v Speaker 3>because as I mentioned, we have customers who are trying out,

297
00:15:14.399 --> 00:15:19.360
<v Speaker 3>they're building POCs, they're building small applications, and there's Azure

298
00:15:19.360 --> 00:15:21.600
<v Speaker 3>OpenAI Service, and it makes it really easy for

299
00:15:21.720 --> 00:15:24.639
<v Speaker 3>you to start. You just deploy an OpenAI endpoint, you select,

300
00:15:24.720 --> 00:15:26.799
<v Speaker 3>for example, you want to have a GPT-4

301
00:15:26.879 --> 00:15:29.399
<v Speaker 3>model, and then you're good to go. Like, you can...

302
00:15:29.840 --> 00:15:32.679
<v Speaker 3>That's just an API. You get your API key,

303
00:15:32.759 --> 00:15:34.879
<v Speaker 3>you import the SDK of your choice into your application,

304
00:15:35.080 --> 00:15:37.759
<v Speaker 3>and that's it. You're sending prompts, you're receiving completions.

305
00:15:37.840 --> 00:15:43.480
<v Speaker 3>Everything's fine, but then customers realize that okay token comes exactly, Yeah,

306
00:15:44.840 --> 00:15:47.399
<v Speaker 3>there are tokens, and tokens are, like, something which is

307
00:15:47.679 --> 00:15:50.480
<v Speaker 3>super important in Azure OpenAI and in general in

308
00:15:50.720 --> 00:15:53.799
<v Speaker 3>LLMs. You spend tokens for prompts, you spend tokens for completions,

309
00:15:54.519 --> 00:15:58.799
<v Speaker 3>and even when you deploy an Azure OpenAI instance, there

310
00:15:58.919 --> 00:16:01.600
<v Speaker 3>is a quota associated with your model, which is expressed

311
00:16:01.639 --> 00:16:04.840
<v Speaker 3>in TPM, which is tokens per minute. Right, and then

312
00:16:04.879 --> 00:16:08.840
<v Speaker 3>after all of these experimentations, customers started to realize that, okay,

313
00:16:08.919 --> 00:16:11.360
<v Speaker 3>now we need a way to manage this, because, okay,

314
00:16:11.440 --> 00:16:13.840
<v Speaker 3>we've built our first POC. We have one team who

315
00:16:13.919 --> 00:16:16.799
<v Speaker 3>developed this kind of a private preview app which is

316
00:16:16.840 --> 00:16:18.679
<v Speaker 3>not fully in production right now. But now we have

317
00:16:18.879 --> 00:16:22.039
<v Speaker 3>ten different departments, ten different teams who also want to

318
00:16:22.080 --> 00:16:24.840
<v Speaker 3>get access to this model, And now, how can I

319
00:16:24.919 --> 00:16:28.559
<v Speaker 3>manage that? How can I limit the consumption per team,

320
00:16:28.639 --> 00:16:31.120
<v Speaker 3>per department, per developer, How can I.

321
00:16:31.200 --> 00:16:34.320
<v Speaker 2>Make sure, assign costs out. Like, my sysadmin hat is

322
00:16:34.440 --> 00:16:37.200
<v Speaker 2>firmly on right now. It's like, there's nothing better in

323
00:16:37.240 --> 00:16:39.720
<v Speaker 2>this world than being able to bill out resources to

324
00:16:39.759 --> 00:16:41.120
<v Speaker 2>the individual teams for what they do.
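
NOTE A minimal sketch of the per-team token budgeting and bill-back idea discussed here, in Python.
It is an illustration only, not how APIM implements its token limits. TokenQuota, try_consume,
the team names and the 1000 tokens-per-minute figure are hypothetical. It uses fixed one-minute
windows to enforce a TPM limit per team and keeps a running total per team for chargeback.
```python
from collections import defaultdict
class TokenQuota:
    """Track tokens used per team in fixed one-minute windows against a TPM limit."""
    def __init__(self, tokens_per_minute: int):
        self.tpm = tokens_per_minute
        self.used = defaultdict(int)    # (team, minute index) mapped to tokens spent in that minute
        self.totals = defaultdict(int)  # team mapped to tokens spent overall, for chargeback
    def try_consume(self, team: str, tokens: int, now_seconds: float) -> bool:
        window = (team, int(now_seconds // 60))
        if self.used[window] + tokens > self.tpm:
            return False  # over quota for this minute, so the caller should back off
        self.used[window] += tokens
        self.totals[team] += tokens
        return True
quota = TokenQuota(tokens_per_minute=1000)
```
Because usage is keyed by team, the same totals that enforce the limit can feed a per-team cost report.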

325
00:16:42.679 --> 00:16:44.639
<v Speaker 3>Yeah, and that's and that's a huge issue, Like you

326
00:16:44.720 --> 00:16:46.759
<v Speaker 3>need to figure out how many tokens were consumed by

327
00:16:46.759 --> 00:16:49.519
<v Speaker 3>a specific team, sure what kind of model they used,

328
00:16:50.720 --> 00:16:53.759
<v Speaker 3>And then like okay, at the beginning, you have one endpoint.

329
00:16:53.960 --> 00:16:56.159
<v Speaker 3>But what if you want to have multiple endpoints because

330
00:16:56.519 --> 00:17:00.480
<v Speaker 3>like you're going production, you want to scale. How do

331
00:17:00.559 --> 00:17:03.759
<v Speaker 3>you load balance? How do you, like, create circuit breaker

332
00:17:03.840 --> 00:17:06.200
<v Speaker 3>rules to make sure that, for example, okay, while our

333
00:17:06.279 --> 00:17:10.039
<v Speaker 3>first instance is throttled and responds with 429,

334
00:17:10.319 --> 00:17:13.240
<v Speaker 3>how can I fail over to a different endpoint?

335
00:17:13.480 --> 00:17:13.599
<v Speaker 1>Right?

336
00:17:13.720 --> 00:17:16.920
<v Speaker 3>Yeah, so, yeah, there are a lot of challenges. Now

337
00:17:17.200 --> 00:17:21.319
<v Speaker 3>you mentioned giving access with API keys. Distributing API keys

338
00:17:21.359 --> 00:17:23.720
<v Speaker 3>to all of these teams also doesn't sound like a

339
00:17:23.759 --> 00:17:27.960
<v Speaker 3>good idea. So that's why we've built like a lot

340
00:17:28.000 --> 00:17:30.359
<v Speaker 3>of stuff that is in this blog post for the GenAI

341
00:17:30.400 --> 00:17:33.920
<v Speaker 3>announcement. We wanted to solve these challenges for customers

342
00:17:34.359 --> 00:17:36.920
<v Speaker 3>who are kind of scaling and trying to like productize

343
00:17:37.039 --> 00:17:41.559
<v Speaker 3>their investment into Azure OpenAI specifically, but also

344
00:17:42.400 --> 00:17:45.119
<v Speaker 3>for other models, like LLMs and stuff.

345
00:17:45.279 --> 00:17:45.480
<v Speaker 1>Yeah.

346
00:17:46.359 --> 00:17:49.200
<v Speaker 2>Certainly, one of the experiences I've dealt with with companies

347
00:17:49.240 --> 00:17:53.160
<v Speaker 2>building a software into the cloud, even when they you know,

348
00:17:53.200 --> 00:17:55.799
<v Speaker 2>they've got authentication and they're billing back to the customer,

349
00:17:56.480 --> 00:17:59.039
<v Speaker 2>the customer makes a mistake with the API and racks

350
00:17:59.119 --> 00:18:02.640
<v Speaker 2>up a couple of million transactions that were test transactions,

351
00:18:02.759 --> 00:18:04.839
<v Speaker 2>Like they're not making money on the back end. Then

352
00:18:04.920 --> 00:18:07.559
<v Speaker 2>you're sending them this ugly bill and they, you know,

353
00:18:07.759 --> 00:18:11.359
<v Speaker 2>want help. In the meantime, you've also gotten an ugly bill,

354
00:18:12.160 --> 00:18:13.880
<v Speaker 2>you know, because you ran it on the back end.

355
00:18:13.920 --> 00:18:16.400
<v Speaker 2>So this is this whole game of like who's.

356
00:18:16.200 --> 00:18:17.200
<v Speaker 1>Holding the bag here?

357
00:18:17.720 --> 00:18:19.400
<v Speaker 2>You know, you don't want to punish your customer for

358
00:18:19.480 --> 00:18:23.039
<v Speaker 2>making a mistake. If you do, you may lose them

359
00:18:23.039 --> 00:18:25.960
<v Speaker 2>as a customer. You're not necessarily going to get remediated,

360
00:18:26.119 --> 00:18:28.519
<v Speaker 2>you know, back to Azure too. But although I've certainly

361
00:18:28.559 --> 00:18:30.519
<v Speaker 2>had that experience where I've done stupid stuff in Azure

362
00:18:30.519 --> 00:18:32.200
<v Speaker 2>and called them like I'm really sorry I did this,

363
00:18:32.400 --> 00:18:34.039
<v Speaker 2>or like yep, fine, I'll wipe it.

364
00:18:34.319 --> 00:18:38.200
<v Speaker 1>Oh you're the guy. Yeah, we've been waiting for your call.

365
00:18:39.440 --> 00:18:39.920
<v Speaker 1>What was that?

366
00:18:40.519 --> 00:18:47.799
<v Speaker 2>But the business reality of this consumption model is you

367
00:18:47.920 --> 00:18:50.759
<v Speaker 2>don't always get paid for the stuff that you used,

368
00:18:51.279 --> 00:18:55.039
<v Speaker 2>right or and or are willing to like that's this.

369
00:18:55.400 --> 00:18:58.440
<v Speaker 2>All of these mechanisms to me speak to let's catch

370
00:18:58.519 --> 00:19:00.480
<v Speaker 2>why didn't you notice? Why didn't you catch it before

371
00:19:00.519 --> 00:19:03.920
<v Speaker 2>it ran away? You know, after the first million tokens?

372
00:19:04.200 --> 00:19:07.119
<v Speaker 2>Why didn't you stop me? And these are the tools,

373
00:19:07.279 --> 00:19:10.799
<v Speaker 2>right like, this is how this stops from being worse exactly.

374
00:19:11.000 --> 00:19:11.240
<v Speaker 1>Yeah.

375
00:19:11.519 --> 00:19:14.119
<v Speaker 3>Yeah, So we were trying to make sure that customers

376
00:19:14.200 --> 00:19:17.640
<v Speaker 3>have the right tools to have the proper governance in place.

377
00:19:18.880 --> 00:19:20.960
<v Speaker 3>So one of the things that you mentioned like tokens,

378
00:19:21.480 --> 00:19:24.839
<v Speaker 3>So we introduced the So we already had like rate

379
00:19:24.880 --> 00:19:29.079
<v Speaker 3>limiting policy that works for requests like you can say,

380
00:19:29.319 --> 00:19:31.880
<v Speaker 3>as I mentioned previously, like five requests per second for example,

381
00:19:32.400 --> 00:19:34.440
<v Speaker 3>and now we need we had to build something for

382
00:19:34.599 --> 00:19:37.119
<v Speaker 3>tokens which is aware of these tokens, which is kind

383
00:19:37.119 --> 00:19:40.720
<v Speaker 3>of the main currency of OpenAI, as I mentioned. So, yeah,

384
00:19:40.720 --> 00:19:43.359
<v Speaker 3>we introduced the token limit policy. It works pretty similarly

385
00:19:43.440 --> 00:19:46.480
<v Speaker 3>to rate limit policy. You can say that, okay, we

386
00:19:46.599 --> 00:19:49.680
<v Speaker 3>have this application, we have this department, we have this team.

387
00:19:50.440 --> 00:19:55.759
<v Speaker 3>Now we assign, let's say, one thousand tokens per minute

388
00:19:55.799 --> 00:19:58.920
<v Speaker 3>to this application to make sure that they do not consume more.

389
00:20:00.519 --> 00:20:02.920
<v Speaker 3>And yeah, and that part works pretty well. And

390
00:20:03.039 --> 00:20:07.240
<v Speaker 3>if you want to be extra careful, you also want

391
00:20:07.319 --> 00:20:10.519
<v Speaker 3>to you also can configure the policy to estimate the

392
00:20:11.200 --> 00:20:13.759
<v Speaker 3>uh the tokens which are in the prompt So whenever

393
00:20:13.839 --> 00:20:15.680
<v Speaker 3>there is a request coming with a prompt and you

394
00:20:16.039 --> 00:20:19.039
<v Speaker 3>calculate the number of prompts, the number of tokens which

395
00:20:19.079 --> 00:20:21.240
<v Speaker 3>is used in the prompt and then if we on

396
00:20:21.319 --> 00:20:24.160
<v Speaker 3>APIM side understand that it already exceeds the limit, we

397
00:20:24.200 --> 00:20:26.160
<v Speaker 3>will not send this to the to the back end.

398
00:20:26.559 --> 00:20:28.440
<v Speaker 2>Right, so you won't consume in the first place, you're

399
00:20:28.440 --> 00:20:31.880
<v Speaker 2>already pressing against the limit. Yes, how do you bubble

400
00:20:32.079 --> 00:20:33.319
<v Speaker 2>up that you've hit a limit?

401
00:20:34.519 --> 00:20:34.559
<v Speaker 1>Like?

402
00:20:34.720 --> 00:20:36.799
<v Speaker 2>What does that look like for the customer? What does

403
00:20:36.799 --> 00:20:38.160
<v Speaker 2>it look like for the operator?

404
00:20:38.960 --> 00:20:41.960
<v Speaker 3>Yeah, so there is a pattern with rate limiting. For example,

405
00:20:42.559 --> 00:20:45.960
<v Speaker 3>typically it's a 429 returned, with a

406
00:20:46.000 --> 00:20:48.720
<v Speaker 3>Retry-After header with a specific, like, number of seconds.

407
00:20:49.559 --> 00:20:55.000
<v Speaker 1>That's a message that says sorry yeah version yeah, Canadian

408
00:20:55.079 --> 00:20:55.599
<v Speaker 1>version yeah.

409
00:20:56.119 --> 00:20:58.519
<v Speaker 3>And that's what we've built for this token

410
00:20:58.599 --> 00:21:00.839
<v Speaker 3>limit policy as well. So whenever the limit is hit,

411
00:21:01.039 --> 00:21:05.799
<v Speaker 3>429, Retry-After with a specific number of seconds

412
00:21:05.880 --> 00:21:07.519
<v Speaker 3>or minutes, depending on how you configure it.

413
00:21:07.599 --> 00:21:10.440
<v Speaker 1>Right, if you're being rate limited. Yeah, I use the

414
00:21:11.480 --> 00:21:14.880
<v Speaker 1>Google YouTube API, and I'm working on a new publisher

415
00:21:15.119 --> 00:21:18.279
<v Speaker 1>and it's going to be publishing to YouTube, just like

416
00:21:18.359 --> 00:21:24.160
<v Speaker 1>we talked about earlier. And it's weird. I work for

417
00:21:24.240 --> 00:21:26.000
<v Speaker 1>a couple hours on this in the morning, and I

418
00:21:26.160 --> 00:21:29.720
<v Speaker 1>make several requests and then I get the you know,

419
00:21:29.960 --> 00:21:32.519
<v Speaker 1>quota exceeded, and I'm going to look at my quota

420
00:21:32.559 --> 00:21:35.079
<v Speaker 1>and it's like ten thousand API calls. I'm like, I

421
00:21:35.160 --> 00:21:39.240
<v Speaker 1>need to make ten thousand API calls. So it's just

422
00:21:39.359 --> 00:21:43.160
<v Speaker 1>an anecdote, but yeah, I'm looking at the response for that,

423
00:21:43.680 --> 00:21:46.599
<v Speaker 1>you know when I try to authenticate myself and it'll

424
00:21:46.640 --> 00:21:50.799
<v Speaker 1>say nope, quota exceeded. Sorry. Yeah.

425
00:21:50.839 --> 00:21:53.759
<v Speaker 3>And we're also trying to make sure that it is

426
00:21:53.880 --> 00:21:58.279
<v Speaker 3>fully transparent for developers because there is a huge ecosystem

427
00:21:58.319 --> 00:22:01.359
<v Speaker 3>of different tools for OpenAI and other LLMs,

428
00:22:01.440 --> 00:22:04.279
<v Speaker 3>like the Azure OpenAI SDK, LangChain, prompt

429
00:22:04.319 --> 00:22:06.720
<v Speaker 3>flow like there are a lot of different tools and

430
00:22:07.200 --> 00:22:11.400
<v Speaker 3>typically developers, they start with direct access to

431
00:22:11.480 --> 00:22:14.400
<v Speaker 3>OpenAI, because SDKs, they expect a specific,

432
00:22:14.599 --> 00:22:16.359
<v Speaker 3>like, URL on the OpenAI side, they

433
00:22:16.400 --> 00:22:19.039
<v Speaker 3>expect the API key and so on. So on our side,

434
00:22:19.079 --> 00:22:22.039
<v Speaker 3>we wanted to make sure that this experience is the

435
00:22:22.119 --> 00:22:24.799
<v Speaker 3>same for developer. So which means that if we put

436
00:22:24.839 --> 00:22:28.279
<v Speaker 3>APIM behind, or sorry, between OpenAI and the developer,

437
00:22:28.799 --> 00:22:31.440
<v Speaker 3>they will never notice that something changed. So for us,

438
00:22:31.480 --> 00:22:33.519
<v Speaker 3>it was super important to make sure that the developer

439
00:22:33.559 --> 00:22:37.559
<v Speaker 3>experience is still the same. That's why yes, yes, so

440
00:22:37.599 --> 00:22:39.400
<v Speaker 3>that's why we return 429, because that's what

441
00:22:39.559 --> 00:22:41.960
<v Speaker 3>OpenAI does. We are trying to follow the same

442
00:22:42.000 --> 00:22:46.279
<v Speaker 3>structure to make sure that everything works as it worked before.

443
00:22:46.279 --> 00:22:49.839
<v Speaker 1>Isn't one of the things that APIM does is

444
00:22:50.359 --> 00:22:53.319
<v Speaker 1>you can if you have a process for the developer

445
00:22:53.400 --> 00:22:58.880
<v Speaker 1>that includes several API calls, maybe two different services or

446
00:22:58.960 --> 00:23:04.440
<v Speaker 1>different you can make one sort of master API that

447
00:23:04.640 --> 00:23:07.599
<v Speaker 1>then makes calls and proxies out on your behalf to

448
00:23:07.720 --> 00:23:11.039
<v Speaker 1>these other ones and comes with a single result. I've

449
00:23:11.160 --> 00:23:14.799
<v Speaker 1>used that feature of APIM. There's just so much stuff,

450
00:23:14.839 --> 00:23:17.119
<v Speaker 1>and when I got into it, there's just so much

451
00:23:17.160 --> 00:23:20.079
<v Speaker 1>stuff in there. We could probably spend two hours just

452
00:23:20.119 --> 00:23:23.000
<v Speaker 1>talking about all the features of APIM. But you

453
00:23:23.759 --> 00:23:26.440
<v Speaker 1>mentioned that you put out a Microsoft put out a

454
00:23:26.440 --> 00:23:30.079
<v Speaker 1>white paper about, you know, some of these new features.

455
00:23:30.640 --> 00:23:34.000
<v Speaker 1>I guess can we get a link to that and

456
00:23:35.079 --> 00:23:37.799
<v Speaker 1>what what are some of the other amazing things that

457
00:23:37.880 --> 00:23:40.559
<v Speaker 1>we might not know about that are in that now.

458
00:23:40.640 --> 00:23:42.880
<v Speaker 3>There's a lot of innovation happening in APIM, so the GenAI

459
00:23:42.920 --> 00:23:45.640
<v Speaker 3>gateway is definitely one of those things that I mentioned.

460
00:23:46.039 --> 00:23:49.079
<v Speaker 3>We're also currently working on enhancing, for example,

461
00:23:49.160 --> 00:23:51.240
<v Speaker 3>the workspaces feature that we have in APIM to make

462
00:23:51.279 --> 00:23:54.279
<v Speaker 3>sure that each team has its own workspace with

463
00:23:54.440 --> 00:23:58.880
<v Speaker 3>isolation, like control plane isolation, data plane isolation, and so on. Recently,

464
00:23:58.920 --> 00:24:02.400
<v Speaker 3>we also released a couple of new SKUs for APIM

465
00:24:02.480 --> 00:24:05.799
<v Speaker 3>which are way faster to provision, they work better, they

466
00:24:05.880 --> 00:24:09.839
<v Speaker 3>work in a new architecture under the hood. There is

467
00:24:09.920 --> 00:24:13.039
<v Speaker 3>a slightly different pricing model. But yeah, that's we

468
00:24:13.920 --> 00:24:17.799
<v Speaker 3>have a lot of stuff going on there. To your

469
00:24:17.880 --> 00:24:22.519
<v Speaker 3>point for the as you mentioned that it's really hard

470
00:24:22.559 --> 00:24:24.480
<v Speaker 3>to understand what's going on in APIM, like a

471
00:24:24.559 --> 00:24:28.119
<v Speaker 3>lot of policies and stuff like that. With the GenAI

472
00:24:28.200 --> 00:24:31.279
<v Speaker 3>gateway that we were discussing, that's also one of the

473
00:24:31.359 --> 00:24:34.400
<v Speaker 3>challenges that we wanted to address, like, Okay, we have

474
00:24:34.519 --> 00:24:38.480
<v Speaker 3>these intelligent application developers. They use ChatGPT, they know how

475
00:24:38.559 --> 00:24:41.480
<v Speaker 3>to use that, but they're not familiar with APIM, and

476
00:24:41.559 --> 00:24:43.720
<v Speaker 3>now we're asking them to write a bunch of XML

477
00:24:43.759 --> 00:24:46.799
<v Speaker 3>policies to limit to have the token limit, to have

478
00:24:46.920 --> 00:24:49.960
<v Speaker 3>the authorization in place, load balancing in place, like metrics

479
00:24:50.240 --> 00:24:53.559
<v Speaker 3>for token consumption in place, and so on. So we

480
00:24:53.720 --> 00:24:56.559
<v Speaker 3>wanted to address it, and we also kind of we

481
00:24:56.680 --> 00:24:58.799
<v Speaker 3>thought that it would be nice to have an easy

482
00:24:58.839 --> 00:25:02.279
<v Speaker 3>experience for those developers in APIM to import existing

483
00:25:02.400 --> 00:25:05.240
<v Speaker 3>Azure OpenAI APIs. So we now have this kind of

484
00:25:05.480 --> 00:25:07.759
<v Speaker 3>UI portal experience where you can just say, okay, I

485
00:25:08.480 --> 00:25:12.079
<v Speaker 3>was using this OpenAI endpoint, let's configure that one.

486
00:25:12.160 --> 00:25:14.759
<v Speaker 3>And also I want to have a token limit of, I

487
00:25:14.759 --> 00:25:18.200
<v Speaker 3>don't know, two thousand TPM, and we can configure everything

488
00:25:18.279 --> 00:25:20.799
<v Speaker 3>for them, so they don't really need to care about

489
00:25:20.839 --> 00:25:23.200
<v Speaker 3>the XML policies. They don't really need to look into those.

490
00:25:23.240 --> 00:25:24.759
<v Speaker 3>Of course, if you need to change something later on

491
00:25:24.960 --> 00:25:26.799
<v Speaker 3>or you need, like, more sophisticated policies, of

492
00:25:26.839 --> 00:25:29.880
<v Speaker 3>course you need to learn something, but at least the

493
00:25:30.000 --> 00:25:34.440
<v Speaker 3>getting-started experience is, like, super optimal.

494
00:25:34.319 --> 00:25:38.160
<v Speaker 2>Well, Microsoft, yeah, I like the Copilot aspect here too. Also,

495
00:25:38.599 --> 00:25:40.400
<v Speaker 2>I know I wrote this a month ago, but I

496
00:25:40.440 --> 00:25:42.880
<v Speaker 2>don't know what it says anymore. Like, parse this for me.

497
00:25:43.480 --> 00:25:46.319
<v Speaker 2>Like, again, with my admin hat on, it's like often

498
00:25:46.359 --> 00:25:48.400
<v Speaker 2>I have a service level agreement I'm making with certain

499
00:25:48.440 --> 00:25:52.000
<v Speaker 2>customers that's written in legalese and I'm trying to

500
00:25:52.079 --> 00:25:55.079
<v Speaker 2>translate it into, heaven help me, XML. But the idea

501
00:25:55.119 --> 00:25:58.400
<v Speaker 2>that I have an intermediary tool that would then take

502
00:25:58.440 --> 00:26:00.200
<v Speaker 2>that legalese to try and make the XML for me,

503
00:26:00.240 --> 00:26:01.880
<v Speaker 2>and then after it's done. I could ask for it

504
00:26:02.039 --> 00:26:03.960
<v Speaker 2>back and say, like, how close have I gotten here?

505
00:26:04.519 --> 00:26:06.960
<v Speaker 2>Have I actually hit the rules that we've agreed to in

506
00:26:07.039 --> 00:26:10.920
<v Speaker 2>the SLA. That translation that layer has always been a

507
00:26:11.000 --> 00:26:14.240
<v Speaker 2>challenging part of it. Has this always been about the money?

508
00:26:15.279 --> 00:26:18.000
<v Speaker 2>Like that's the main thing that's happening here is you

509
00:26:18.079 --> 00:26:19.920
<v Speaker 2>don't want to run it, you know, I presume you'll

510
00:26:19.920 --> 00:26:22.400
<v Speaker 2>always sell us more cloud, you know by the transaction.

511
00:26:22.720 --> 00:26:25.680
<v Speaker 2>If you just keep requesting calls, that's fine. It's just

512
00:26:26.160 --> 00:26:27.720
<v Speaker 2>then one day you're going to have to pay for

513
00:26:27.839 --> 00:26:31.279
<v Speaker 2>it and it's not what you intended. So is that

514
00:26:31.359 --> 00:26:33.319
<v Speaker 2>the important part in API management? Like, I'm not worried

515
00:26:33.319 --> 00:26:34.799
<v Speaker 2>about tipping over the cloud, am I?

516
00:26:36.079 --> 00:26:38.400
<v Speaker 3>Well, I guess it depends on your scenario, of course.

517
00:26:38.440 --> 00:26:40.359
<v Speaker 3>But yeah, that's one of the one of the things

518
00:26:40.359 --> 00:26:43.079
<v Speaker 3>that you can put into APIM, Like whatever control you need,

519
00:26:44.079 --> 00:26:46.839
<v Speaker 3>you can you can build it with the kind of

520
00:26:46.920 --> 00:26:49.920
<v Speaker 3>a pretty powerful policy engine that we have in APIM.

521
00:26:50.839 --> 00:26:51.279
<v Speaker 1>That's cool.

522
00:26:51.559 --> 00:26:54.680
<v Speaker 2>I appreciate that, And gentlemen, I needed to take a

523
00:26:54.720 --> 00:26:59.200
<v Speaker 2>break for one moment for these very important messages, and

524
00:26:59.440 --> 00:27:01.720
<v Speaker 2>we're back. It's .NET Rocks! I'm Richard Campbell, that's Carl

525
00:27:01.759 --> 00:27:04.640
<v Speaker 2>Franklin. Yo, yo, yo. Talking to our friend Andre a

526
00:27:04.680 --> 00:27:08.440
<v Speaker 2>bit about these improvements to APIM, which we all

527
00:27:08.480 --> 00:27:10.759
<v Speaker 2>should be using. If we're gonna expose an API through

528
00:27:10.839 --> 00:27:14.079
<v Speaker 2>the cloud to the world, don't leave it naked, give

529
00:27:14.119 --> 00:27:17.759
<v Speaker 2>it some armor, and this tool helps. These gen AI

530
00:27:17.920 --> 00:27:20.599
<v Speaker 2>tools help us to configure it correctly, operate it well,

531
00:27:21.000 --> 00:27:23.920
<v Speaker 2>but then also deal with the additional complexities when it

532
00:27:23.960 --> 00:27:28.000
<v Speaker 2>comes to the Azure OpenAI APIs, with limits,

533
00:27:28.519 --> 00:27:34.319
<v Speaker 2>issuing tokens for software to utilize OpenAI and put

534
00:27:34.400 --> 00:27:36.079
<v Speaker 2>limits in place for all of those good things.

535
00:27:36.480 --> 00:27:37.799
<v Speaker 1>Have I summarized that correctly?

536
00:27:37.839 --> 00:27:38.119
<v Speaker 2>Andre?

537
00:27:38.559 --> 00:27:40.720
<v Speaker 1>Yeah? I think so. Yeah, I think I'm starting to

538
00:27:40.799 --> 00:27:43.920
<v Speaker 1>understand what you're doing here. Man, I'm pretty excited. Richard

539
00:27:44.079 --> 00:27:45.079
<v Speaker 1>is the human AI.

540
00:27:46.119 --> 00:27:51.440
<v Speaker 2>I don't know that's true. It's yeah, real, definitely created.

541
00:27:51.519 --> 00:27:53.799
<v Speaker 2>Like you said, a very important phrase is sticking with

542
00:27:53.920 --> 00:27:55.400
<v Speaker 2>me now, which is tokens are currency.

543
00:27:55.759 --> 00:27:56.440
<v Speaker 1>Yeah? Absolutely.

544
00:27:56.480 --> 00:27:58.680
<v Speaker 3>You can think about it as your main currency, your

545
00:27:58.720 --> 00:28:02.480
<v Speaker 3>main resource you have with all of these models, and

546
00:28:02.599 --> 00:28:04.480
<v Speaker 3>that's also what you're paying for, and.

547
00:28:04.519 --> 00:28:07.319
<v Speaker 1>It's what what you pay that's what you pay for exactly.

548
00:28:07.440 --> 00:28:09.440
<v Speaker 2>And so of course it's a currency because it does

549
00:28:09.559 --> 00:28:13.640
<v Speaker 2>ultimately translate into fiat currency of whatever form you're using,

550
00:28:13.759 --> 00:28:15.279
<v Speaker 2>you're going to you're going to pay for that stuff,

551
00:28:15.960 --> 00:28:17.880
<v Speaker 2>and then you get it that pays your models and

552
00:28:17.960 --> 00:28:19.799
<v Speaker 2>all you have all that choices when you have these

553
00:28:19.799 --> 00:28:21.960
<v Speaker 2>controls over top of it. Can we talk a little about

554
00:28:21.960 --> 00:28:25.680
<v Speaker 2>the semantic caching policies? That sounds like a way to

555
00:28:25.880 --> 00:28:29.519
<v Speaker 2>save money and potentially improve performance. That's interesting.

556
00:28:30.079 --> 00:28:32.960
<v Speaker 3>Yes, yeah, that's actually a very interesting policy and

557
00:28:33.000 --> 00:28:36.240
<v Speaker 3>a very interesting implementation from our side. So yeah, as

558
00:28:36.279 --> 00:28:40.680
<v Speaker 3>you mentioned, so first of all, we solve the latency

559
00:28:40.720 --> 00:28:44.759
<v Speaker 3>problem just with regular caching that has already existed in APIM

560
00:28:44.839 --> 00:28:47.160
<v Speaker 3>for a while. You can cache requests, you can cache

561
00:28:47.200 --> 00:28:50.839
<v Speaker 3>responses for specific requests, but with LLMs it's

562
00:28:50.839 --> 00:28:53.640
<v Speaker 3>a little bit different because your prompts can be different,

563
00:28:53.720 --> 00:28:56.519
<v Speaker 3>but they're semantically similar, right, That's what we do with

564
00:28:56.599 --> 00:28:59.559
<v Speaker 3>semantic caching. And so OpenAI

565
00:28:59.559 --> 00:29:04.240
<v Speaker 3>provides embedding models, an embedding model which generates vectors

566
00:29:04.319 --> 00:29:06.559
<v Speaker 3>which represent the kind of you can think about it

567
00:29:06.599 --> 00:29:08.960
<v Speaker 3>as a kind of semantic meaning of a specific prompt

568
00:29:09.039 --> 00:29:13.519
<v Speaker 3>or a specific, like, string, and then we generate it for a

569
00:29:13.559 --> 00:29:15.839
<v Speaker 3>specific prompt and then if we realize that there is

570
00:29:15.880 --> 00:29:19.400
<v Speaker 3>a semantically similar prompt coming in, we will check the

571
00:29:19.480 --> 00:29:21.720
<v Speaker 3>cache and we will retrieve the response from the cache

572
00:29:21.759 --> 00:29:23.880
<v Speaker 3>instead of hitting the OpenAI endpoints. So first of all,

573
00:29:23.920 --> 00:29:26.079
<v Speaker 3>as I mentioned, we're solving the latency problem, so the

574
00:29:26.680 --> 00:29:29.400
<v Speaker 3>response is getting to the client faster, but we're also

575
00:29:29.480 --> 00:29:32.640
<v Speaker 3>saving on the token consumption because this prompt

576
00:29:32.759 --> 00:29:36.799
<v Speaker 3>will never go to the OpenAI endpoint while

577
00:29:36.880 --> 00:29:40.240
<v Speaker 3>we have the response cached. In our case, we're using

578
00:29:40.279 --> 00:29:43.599
<v Speaker 3>Redis for vector search, so that's where it stores responses.

579
00:29:43.680 --> 00:29:46.799
<v Speaker 3>So yeah, if you're saying hi or saying hello afterwards,

580
00:29:47.079 --> 00:29:49.599
<v Speaker 3>they're semantically similar, so we just return.

581
00:29:50.200 --> 00:29:54.279
<v Speaker 2>I immediately go to a scenario like imagine an incident

582
00:29:54.319 --> 00:29:56.759
<v Speaker 2>that's happened that has caused a lot of flights to

583
00:29:56.839 --> 00:29:57.920
<v Speaker 2>be canceled.

584
00:29:58.039 --> 00:30:00.960
<v Speaker 1>That would never happen, Richard. Come on, use a real example.

585
00:30:01.279 --> 00:30:04.839
<v Speaker 2>Folks are trying to find out if their flight's canceled,

586
00:30:05.000 --> 00:30:07.519
<v Speaker 2>So you're going to get many requests from different sources

587
00:30:07.559 --> 00:30:09.960
<v Speaker 2>that are essentially the same thing. Is this flight canceled?

588
00:30:10.319 --> 00:30:12.119
<v Speaker 2>You really only need to want to fetch that once.

589
00:30:12.240 --> 00:30:14.799
<v Speaker 2>Now it's sitting in the cache, and you very quickly respond, yes,

590
00:30:15.160 --> 00:30:16.480
<v Speaker 2>all flights are canceled.

591
00:30:17.599 --> 00:30:18.400
<v Speaker 1>But you know.

592
00:30:19.880 --> 00:30:21.400
<v Speaker 2>What I like about a caching model like that is

593
00:30:21.440 --> 00:30:24.480
<v Speaker 2>that it will evolve over time, you know, you imagine

594
00:30:24.519 --> 00:30:27.920
<v Speaker 2>other scenarios once those flights are gone, there's other flights

595
00:30:28.079 --> 00:30:30.519
<v Speaker 2>like but you're often only going to need to make

596
00:30:30.599 --> 00:30:33.559
<v Speaker 2>that actual request back to the engine once and use

597
00:30:33.599 --> 00:30:37.000
<v Speaker 2>it over and over again. So a good caching opportunity

598
00:30:37.000 --> 00:30:39.519
<v Speaker 2>when you're going to have multiple people more or less

599
00:30:40.000 --> 00:30:43.000
<v Speaker 2>making the same requests but in many different ways of phrasing.

600
00:30:43.160 --> 00:30:45.079
<v Speaker 1>And also a way to bust the cache once the

601
00:30:45.119 --> 00:30:47.039
<v Speaker 1>flights are back to normal.

602
00:30:46.839 --> 00:30:49.759
<v Speaker 2>Yeah, rather than do code it yourself where you have

603
00:30:49.839 --> 00:30:53.160
<v Speaker 2>to... Caching is not hard. Expiring is hard, yeah,

604
00:30:54.519 --> 00:30:56.160
<v Speaker 2>expiring's always hard.

605
00:30:56.279 --> 00:31:01.960
<v Speaker 1>So wait a minute, what why is it? Oh? How

606
00:31:02.000 --> 00:31:02.480
<v Speaker 1>many times?

607
00:31:03.119 --> 00:31:05.880
<v Speaker 2>Although maybe and again i'm reading here this is an

608
00:31:05.920 --> 00:31:07.759
<v Speaker 2>early version. This is your first sort of go with this.

609
00:31:08.079 --> 00:31:10.880
<v Speaker 3>Yes, yeah, yeah, yeah, well, that's an early preview

610
00:31:11.000 --> 00:31:14.519
<v Speaker 3>version for now. We're still like so there there are

611
00:31:14.519 --> 00:31:16.640
<v Speaker 3>a lot of customer use cases for that. So as

612
00:31:16.720 --> 00:31:19.799
<v Speaker 3>you mentioned, uh that that was a good example. Uh,

613
00:31:20.960 --> 00:31:25.680
<v Speaker 3>but then we also have so basically like whenever whenever

614
00:31:25.960 --> 00:31:28.440
<v Speaker 3>the company builds some sort of a chat service for

615
00:31:28.799 --> 00:31:32.920
<v Speaker 3>answering questions, then you always have frequently asked questions.

616
00:31:33.279 --> 00:31:34.759
<v Speaker 2>And that's where you're Hey, you're going to build a

617
00:31:34.799 --> 00:31:38.640
<v Speaker 2>FAQ table inevitably, but rather than you define it, let utilization

618
00:31:38.880 --> 00:31:40.640
<v Speaker 2>define it with a cache. Exactly.

619
00:31:40.920 --> 00:31:41.160
<v Speaker 1>Yeah.

620
00:31:41.759 --> 00:31:43.880
<v Speaker 3>Yeah, and that's where you you have a lot of

621
00:31:44.839 --> 00:31:47.559
<v Speaker 3>tokens saved just with the semantic caching policy.

622
00:31:47.640 --> 00:31:47.839
<v Speaker 1>Yeah.

623
00:31:47.880 --> 00:31:51.039
<v Speaker 3>Also also for internal knowledge base, that's also important. Like

624
00:31:52.000 --> 00:31:54.759
<v Speaker 3>we have a bunch of for example, support engineers sitting

625
00:31:54.759 --> 00:31:59.680
<v Speaker 3>in this in the call center and sometimes problems are similar. Yeah,

626
00:31:59.799 --> 00:32:01.680
<v Speaker 3>and you're just doing the search through

627
00:32:01.720 --> 00:32:04.799
<v Speaker 3>ChatGPT, and yeah, your responses are returning from cache

628
00:32:04.880 --> 00:32:07.480
<v Speaker 3>and you're not hitting the OpenAI endpoint.

629
00:32:07.759 --> 00:32:07.960
<v Speaker 1>Yeah.

630
00:32:08.279 --> 00:32:13.000
<v Speaker 2>I was recently reading about folks that aren't securing these

631
00:32:13.160 --> 00:32:16.759
<v Speaker 2>kinds of services properly, and people discover them and just

632
00:32:17.000 --> 00:32:20.640
<v Speaker 2>use them as their free version of ChatGPT, basically

633
00:32:20.720 --> 00:32:24.359
<v Speaker 2>leaving that that vendor holding the bag for the token costs.

634
00:32:25.319 --> 00:32:28.079
<v Speaker 1>It's a great idea, Richard. Yeah, nice, glad. I never

635
00:32:28.640 --> 00:32:33.839
<v Speaker 1>I can't believe in everything, but this is what I'm thinking.

636
00:32:33.880 --> 00:32:36.160
<v Speaker 2>It's like, I'm not even talking about the you know,

637
00:32:36.960 --> 00:32:40.319
<v Speaker 2>the proper utilizations and runaway API calls and so forth,

638
00:32:40.400 --> 00:32:44.079
<v Speaker 2>but genuine nefarious use that somebody's like, oh, look, you've

639
00:32:44.119 --> 00:32:46.839
<v Speaker 2>exposed chat to me and I can use it for anything,

640
00:32:47.319 --> 00:32:49.039
<v Speaker 2>So I'm not even gonna worry about your product. I'm

641
00:32:49.039 --> 00:32:52.319
<v Speaker 2>just going to exploit your token availability to run the

642
00:32:52.400 --> 00:32:55.519
<v Speaker 2>queries I want to run, and you know you get

643
00:32:55.559 --> 00:32:56.839
<v Speaker 2>to eat it. Congratulations.

644
00:32:57.519 --> 00:32:59.720
<v Speaker 3>Yeah, that's that's why. First of all, it's important to

645
00:32:59.799 --> 00:33:02.759
<v Speaker 3>have something like APIM where you have API keys which

646
00:33:02.799 --> 00:33:07.359
<v Speaker 3>are on APM side represents specific color or application. But

647
00:33:07.559 --> 00:33:11.640
<v Speaker 3>also there are certain tools in Azure OpenAI itself

648
00:33:11.839 --> 00:33:14.960
<v Speaker 3>where you can say that there's a specific filter on

649
00:33:15.039 --> 00:33:17.319
<v Speaker 3>the content that this model is supposed to respond to,

650
00:33:17.839 --> 00:33:20.160
<v Speaker 3>for example, if you're asking it, if you're training it.

651
00:33:20.480 --> 00:33:24.160
<v Speaker 3>In our case, we're trained to respond about APIM policies

652
00:33:24.599 --> 00:33:27.799
<v Speaker 3>if someone asks about the weather right now or something else,

653
00:33:27.960 --> 00:33:31.720
<v Speaker 3>or summarizing a document which doesn't have anything

654
00:33:31.759 --> 00:33:33.599
<v Speaker 3>to do with APIM, and we will just respond sorry,

655
00:33:33.640 --> 00:33:35.559
<v Speaker 3>I cannot do that. I'm not trained to do that.

656
00:33:37.279 --> 00:33:38.960
<v Speaker 1>Not my job. Go find your own chatbot.

657
00:33:39.079 --> 00:33:43.400
<v Speaker 2>Yeah, and that document one's particularly evil because that needs a

658
00:33:43.400 --> 00:33:45.519
<v Speaker 2>lot of tokens. When you shove a document up to

659
00:33:45.640 --> 00:33:49.119
<v Speaker 2>summarize, that's absolutely token intensive and an

660
00:33:49.200 --> 00:33:53.880
<v Speaker 2>easy mistake to make if you haven't boxed that interface properly.

661
00:33:54.400 --> 00:33:57.240
<v Speaker 1>Talking about some more of the new awesome features.

662
00:33:57.680 --> 00:33:59.559
<v Speaker 1>Is there anything that we haven't talked about yet that

663
00:33:59.680 --> 00:34:02.759
<v Speaker 1>customers have asked for that you've implemented in this next version.

664
00:34:03.039 --> 00:34:05.880
<v Speaker 3>Yeah, there is an interesting, I wouldn't say that's a specific feature,

665
00:34:05.960 --> 00:34:07.880
<v Speaker 3>but that's kind of a challenge that we saw in

666
00:34:08.039 --> 00:34:12.840
<v Speaker 3>APIM. So we supported Server-Sent Events technology for

667
00:34:12.920 --> 00:34:15.679
<v Speaker 3>a while in APIM, but we had certain problems

668
00:34:15.719 --> 00:34:18.320
<v Speaker 3>with that because that's essentially streaming. So when

669
00:34:18.360 --> 00:34:21.679
<v Speaker 3>you send a request to ChatGPT, typically what

670
00:34:21.840 --> 00:34:24.440
<v Speaker 3>you will see and experience that you're used to most

671
00:34:24.559 --> 00:34:27.599
<v Speaker 3>likely is that it will be it will be responding

672
00:34:27.679 --> 00:34:30.039
<v Speaker 3>in chunks of text. It's not sending,

673
00:34:30.159 --> 00:34:33.320
<v Speaker 3>like, the full response to you; it's just responding in

674
00:34:33.400 --> 00:34:37.960
<v Speaker 3>streaming fashion. And it turns out the customers want to

675
00:34:38.039 --> 00:34:42.199
<v Speaker 3>use streaming because that's what users are used to. They

676
00:34:42.719 --> 00:34:45.639
<v Speaker 3>want to see the same experience in their chat experiences

677
00:34:45.679 --> 00:34:47.639
<v Speaker 3>as well, like in their copilots and so on, whatever

678
00:34:47.639 --> 00:34:52.119
<v Speaker 3>applications they build. But there is a certain problem with

679
00:34:52.239 --> 00:34:56.079
<v Speaker 3>that because whenever you introduce some sort of buffering, then

680
00:34:56.159 --> 00:34:59.800
<v Speaker 3>the streaming experience breaks, which which is the case for

681
00:34:59.840 --> 00:35:02.079
<v Speaker 3>APIM right now. Because whenever you have a logging

682
00:35:02.159 --> 00:35:04.880
<v Speaker 3>policy or a monitoring policy, or you have a retry policy,

683
00:35:04.960 --> 00:35:07.360
<v Speaker 3>so whenever you buffer a response or request,

684
00:35:08.000 --> 00:35:11.079
<v Speaker 3>the streaming breaks. So we had certain challenges to make

685
00:35:11.119 --> 00:35:14.199
<v Speaker 3>sure that the token limit and the token metric policies

686
00:35:14.320 --> 00:35:18.559
<v Speaker 3>work with streaming scenarios as well. So that's kind of challenging,

687
00:35:18.719 --> 00:35:20.239
<v Speaker 3>I would say, and that's kind of one of the

688
00:35:20.639 --> 00:35:23.960
<v Speaker 3>things that customers requested we add support for.
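
NOTE
A minimal sketch of the streaming point above, assuming a gateway that forwards response chunks as they arrive and only keeps a running count. The name meter_stream and the whitespace-based token estimate are hypothetical, chosen for illustration; this is not how APIM implements its token policies.
```python
from typing import Iterable, Iterator, List
def meter_stream(chunks: Iterable[str], counts: List[int]) -> Iterator[str]:
    """Yield each chunk immediately while tallying a rough token count."""
    total = 0
    for chunk in chunks:
        # Assumption: whitespace-separated words stand in for real tokens.
        total += len(chunk.split())
        yield chunk  # forwarded right away, never held back
    counts.append(total)  # the tally is only final once the stream ends
usage: List[int] = []
forwarded = list(meter_stream(["Hello ", "from the ", "gateway"], usage))
```
Because each chunk is yielded before the next one is read, the client still sees text arrive incrementally; only the counter is kept, not the response body.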

689
00:35:24.599 --> 00:35:26.480
<v Speaker 2>Yeah, for sure, there's more features still to come down

690
00:35:26.559 --> 00:35:28.920
<v Speaker 2>the pipe, you know, like there's a lot we could

691
00:35:29.000 --> 00:35:29.800
<v Speaker 2>be doing in here.

692
00:35:30.519 --> 00:35:31.840
<v Speaker 1>Yeah, over time.

693
00:35:32.000 --> 00:35:34.599
<v Speaker 2>It's although honestly, when we started this conversation, like I

694
00:35:34.639 --> 00:35:37.199
<v Speaker 2>think you guys have already done so many things. Like, sorting

695
00:35:37.239 --> 00:35:40.280
<v Speaker 2>this all out is challenging, and I know there's

696
00:35:40.280 --> 00:35:41.320
<v Speaker 2>still more that could be done.

697
00:35:42.039 --> 00:35:43.920
<v Speaker 3>No, there are certainly a lot of scenarios like as

698
00:35:44.159 --> 00:35:46.039
<v Speaker 3>you mentioned, one of these scenarios is kind of

699
00:35:46.119 --> 00:35:48.280
<v Speaker 3>content safety. Just to make sure that we do not

700
00:35:48.400 --> 00:35:52.119
<v Speaker 3>respond on specific I don't know, if there is a

701
00:35:52.199 --> 00:35:55.199
<v Speaker 3>specific question in a prompt, we should not respond to

702
00:35:55.280 --> 00:35:56.440
<v Speaker 3>this prompt.

703
00:35:56.719 --> 00:36:00.519
<v Speaker 2>Which doesn't sound like an API responsibility. You are at

704
00:36:00.519 --> 00:36:04.519
<v Speaker 2>the gateway point where doing content filtering. This is a

705
00:36:04.559 --> 00:36:09.079
<v Speaker 2>logical opportunity to hit that. Yeah, that's definitely a different area. Yeah,

706
00:36:09.119 --> 00:36:10.920
<v Speaker 2>and actually that's something that you can do today. Like

707
00:36:11.079 --> 00:36:13.679
<v Speaker 2>we get access to the request, you can look at

708
00:36:13.679 --> 00:36:15.480
<v Speaker 2>the headers, you can look at the body, and then

709
00:36:15.519 --> 00:36:17.719
<v Speaker 2>you can write whatever regular expression you want to deny

710
00:36:17.760 --> 00:36:23.239
<v Speaker 2>the request. But that's to my point that policies are hard,

711
00:36:23.360 --> 00:36:25.199
<v Speaker 2>especially for those who are not used to APIM.

712
00:36:25.599 --> 00:36:28.239
<v Speaker 2>We just want to make sure that it's easy, easy

713
00:36:28.320 --> 00:36:32.440
<v Speaker 2>to use, and easy to configure. So yeah, that's something

714
00:36:32.519 --> 00:36:37.800
<v Speaker 2>that we're looking at adding. Like, content safety concerns are real,

715
00:36:39.400 --> 00:36:42.079
<v Speaker 2>there might be like PII data, there might be some

716
00:36:42.559 --> 00:36:45.239
<v Speaker 2>confidential data in the request or response. You want to

717
00:36:45.280 --> 00:36:46.079
<v Speaker 2>filter this out.
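
NOTE
A minimal sketch of the "look at the headers and body, then deny with a regular expression" idea described above. It is plain Python rather than an APIM policy, and the check_request name, the status codes, and the SSN-like pattern are all assumptions made for illustration.
```python
import re
from typing import Dict, Tuple
# Hypothetical pattern: a US-SSN-like number, standing in for "PII".
BLOCKED = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
def check_request(headers: Dict[str, str], body: str) -> Tuple[int, str]:
    """Return (status, reason); 403 when the body matches the blocked pattern."""
    # headers are accepted for completeness but not inspected in this sketch
    if BLOCKED.search(body):
        return 403, "request body matched a blocked pattern"
    return 200, "ok"
allowed = check_request({"content-type": "application/json"}, "How do I set a rate limit?")
denied = check_request({"content-type": "application/json"}, "My SSN is 123-45-6789")
```
A single regular expression is easy to write but easy to get wrong, which fits the point made in the conversation that hand-written policies are hard and a built-in content safety option would be simpler to configure.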

718
00:36:46.639 --> 00:36:50.199
<v Speaker 3>And Gateway seems like a natural place to do this

719
00:36:50.320 --> 00:36:52.280
<v Speaker 3>kind of stuff because that's the kind of single point

720
00:36:52.280 --> 00:36:53.840
<v Speaker 3>where you see all the requests and responses.

721
00:36:53.880 --> 00:36:56.239
<v Speaker 2>Because, see, Azure AI Studio has a whole mechanism for

722
00:36:56.320 --> 00:36:58.639
<v Speaker 2>content controls and so forth, you kind of want to

723
00:36:58.679 --> 00:37:01.480
<v Speaker 2>pick the policies you've built over there and then push

724
00:37:01.599 --> 00:37:04.079
<v Speaker 2>them in a hook to the API side.

725
00:37:04.119 --> 00:37:06.679
<v Speaker 1>It's say, here's our saying, I only want to write

726
00:37:06.679 --> 00:37:07.119
<v Speaker 1>one set.

727
00:37:07.000 --> 00:37:08.800
<v Speaker 2>Of policies, but I want to be able to catch

728
00:37:08.840 --> 00:37:10.679
<v Speaker 2>them into different places where it would matter.

729
00:37:10.920 --> 00:37:12.599
<v Speaker 3>Yeah, there is also a big piece of kind of

730
00:37:12.599 --> 00:37:15.639
<v Speaker 3>a governance and kind of best practices within an organization.

731
00:37:15.719 --> 00:37:17.920
<v Speaker 3>For example, you can have multiple model deployments and they

732
00:37:18.000 --> 00:37:23.440
<v Speaker 3>have different content safety configurations. With APIM, you're just having

733
00:37:23.519 --> 00:37:26.960
<v Speaker 3>kind of this platform engineering side of GenAI, let's say,

734
00:37:27.639 --> 00:37:29.599
<v Speaker 3>where you can say that, oh, these are our rules

735
00:37:29.639 --> 00:37:31.719
<v Speaker 3>and all of the models that are deployed they should

736
00:37:31.719 --> 00:37:34.719
<v Speaker 3>be behind APIM. And then in APIM you configure

737
00:37:34.760 --> 00:37:37.920
<v Speaker 3>all of the rules that you have in your organization

738
00:37:38.039 --> 00:37:42.480
<v Speaker 3>to comply with, basically, whatever policies you have in the organization.

739
00:37:42.639 --> 00:37:46.880
<v Speaker 3>So in that case, you're basically shifting the control to

740
00:37:47.000 --> 00:37:49.800
<v Speaker 3>APIM instead of configuring stuff on the models level.

741
00:37:50.000 --> 00:37:52.400
<v Speaker 2>Yeah no, And you could see that associated with particular

742
00:37:52.440 --> 00:37:55.119
<v Speaker 2>authentication accounts too. So it's like, hey, I provide a

743
00:37:55.239 --> 00:37:58.880
<v Speaker 2>service for medical and so some pictures are going to

744
00:37:58.920 --> 00:38:00.840
<v Speaker 2>be the kind that you wouldn't normally want to

745
00:38:00.960 --> 00:38:04.199
<v Speaker 2>show anywhere. But that's the business here, so it needs

746
00:38:04.199 --> 00:38:05.000
<v Speaker 2>a different rule set.

747
00:38:06.960 --> 00:38:07.159
<v Speaker 1>Yeah.

748
00:38:07.360 --> 00:38:10.679
<v Speaker 2>Interesting, interesting array of problems here, Like you guys are

749
00:38:10.719 --> 00:38:11.199
<v Speaker 2>up against it.

750
00:38:11.239 --> 00:38:11.840
<v Speaker 1>I appreciate this.

751
00:38:12.599 --> 00:38:18.159
<v Speaker 2>Uh, you've got an AI gateway samples on GitHub. Should

752
00:38:18.159 --> 00:38:19.920
<v Speaker 2>I include a link to that? That looks pretty cool

753
00:38:20.519 --> 00:38:21.320
<v Speaker 2>and super current.

754
00:38:21.559 --> 00:38:24.920
<v Speaker 3>Yeah, yeah, that's an amazing repo that was built

755
00:38:24.960 --> 00:38:27.960
<v Speaker 3>by one of the GBBs that we

756
00:38:28.079 --> 00:38:35.000
<v Speaker 3>work with. So that's basically a set of labs that

757
00:38:35.159 --> 00:38:40.480
<v Speaker 3>you can try with APIM. So you probably

758
00:38:40.599 --> 00:38:44.679
<v Speaker 3>know that the typical, like, space for an AI engineer is

759
00:38:44.800 --> 00:38:48.960
<v Speaker 3>a Python notebook. Yeah, and that's something that we wanted

760
00:38:49.039 --> 00:38:52.079
<v Speaker 3>to implement in those labs. So there's a bunch of

761
00:38:52.440 --> 00:38:55.360
<v Speaker 3>there's a bunch of Python notebooks, and then there is

762
00:38:55.440 --> 00:38:58.280
<v Speaker 3>a code. Usually there is a code that is calling

763
00:38:58.440 --> 00:39:00.519
<v Speaker 3>OpenAI through APIM with the

764
00:39:01.000 --> 00:39:03.440
<v Speaker 3>Azure OpenAI SDK, so it's pretty natural for

765
00:39:03.599 --> 00:39:07.159
<v Speaker 3>AI engineers. And then we demonstrate kind of a different

766
00:39:07.280 --> 00:39:10.400
<v Speaker 3>token limit policy, the emit token metric policy, and a

767
00:39:10.400 --> 00:39:14.360
<v Speaker 3>lot of additional stuff like load balancing and

768
00:39:14.519 --> 00:39:19.400
<v Speaker 3>augmenting the response with the RAG pattern and so on.

769
00:39:19.920 --> 00:39:24.960
<v Speaker 2>Yeah, so it'll seem familiar pretty quickly, dude. Yeah, you

770
00:39:25.039 --> 00:39:27.159
<v Speaker 2>know there's different people coming in from different angles. Right,

771
00:39:27.280 --> 00:39:30.360
<v Speaker 2>You've got your service builder on the back end, who wants

772
00:39:30.480 --> 00:39:32.639
<v Speaker 2>controls and throttles and logging and that kind of thing.

773
00:39:33.119 --> 00:39:37.880
<v Speaker 2>You've got your LLM folks who, you know, want

774
00:39:37.920 --> 00:39:41.880
<v Speaker 2>to automate the flow and control of tokens. You've got

775
00:39:42.039 --> 00:39:45.280
<v Speaker 2>administrators trying to keep things up and make sure billings

776
00:39:45.320 --> 00:39:49.159
<v Speaker 2>go to the right places. Anybody involved in cost control, which

777
00:39:49.239 --> 00:39:52.119
<v Speaker 2>is lots of folks. Like my experience talking to developers when

778
00:39:52.159 --> 00:39:53.880
<v Speaker 2>they're starting to experiment with LLMs is they want

779
00:39:53.880 --> 00:39:56.559
<v Speaker 2>the latest and greatest of everything. But the price, you know,

780
00:39:56.719 --> 00:39:58.960
<v Speaker 2>may the technology may or may not be needed, and

781
00:39:59.079 --> 00:40:01.760
<v Speaker 2>the price tag is huge for the latest versions compared

782
00:40:01.800 --> 00:40:04.840
<v Speaker 2>to Hey, would this have worked with GPT three point

783
00:40:04.920 --> 00:40:09.679
<v Speaker 2>five? Like, because it's a tenth the price. Like,

784
00:40:10.079 --> 00:40:14.119
<v Speaker 2>it makes a difference. Yeah, if you're not concerned about

785
00:40:14.119 --> 00:40:16.000
<v Speaker 2>any of that, you just don't. Nope, give me 4o,

786
00:40:16.079 --> 00:40:16.679
<v Speaker 2>I want it all.

787
00:40:17.000 --> 00:40:17.159
<v Speaker 1>Yeah.

788
00:40:17.239 --> 00:40:19.719
<v Speaker 3>What's interesting with the LLMs, it's actually the opposite. Usually

789
00:40:20.440 --> 00:40:23.559
<v Speaker 3>usually, like, 4o is cheaper than 4 or than

790
00:40:23.719 --> 00:40:27.679
<v Speaker 3>3.5. Oh really? Yeah, that's because they're more kind

791
00:40:27.679 --> 00:40:30.920
<v Speaker 3>of optimized, so they say that they consume less resources,

792
00:40:30.960 --> 00:40:31.960
<v Speaker 3>so they're more optimized.

793
00:40:32.039 --> 00:40:33.400
<v Speaker 2>That's why it's cheaper.

794
00:40:33.760 --> 00:40:34.079
<v Speaker 1>Interesting.

795
00:40:34.719 --> 00:40:37.320
<v Speaker 3>So yeah, but that's actually a good point that

796
00:40:37.599 --> 00:40:41.639
<v Speaker 3>we basically distinguish, internally we think about

797
00:40:41.719 --> 00:40:43.920
<v Speaker 3>two personas. We have an AI engineer who's kind of

798
00:40:43.920 --> 00:40:46.719
<v Speaker 3>building the application, who's using all of the SDKs

799
00:40:46.760 --> 00:40:49.039
<v Speaker 3>they want the latest and greatest, and then we have an

800
00:40:49.119 --> 00:40:51.320
<v Speaker 3>AI platform engineer who is kind of providing access to

801
00:40:51.360 --> 00:40:54.599
<v Speaker 3>those models, and he or she cares about the token

802
00:40:54.679 --> 00:40:57.760
<v Speaker 3>consumption, like cross-charging and load balancing and all this

803
00:40:57.920 --> 00:41:01.960
<v Speaker 3>kind of stuff. And AI engineers, they always

804
00:41:02.239 --> 00:41:04.400
<v Speaker 3>want something new, and that's also kind of one of

805
00:41:04.400 --> 00:41:07.639
<v Speaker 3>the challenges for us because the space is evolving

806
00:41:07.800 --> 00:41:10.360
<v Speaker 3>like super fast, like we are just trying to keep

807
00:41:10.440 --> 00:41:13.119
<v Speaker 3>up with different models. For example, 4o

808
00:41:13.320 --> 00:41:16.000
<v Speaker 3>was recently announced. We're just working on adding the support

809
00:41:16.119 --> 00:41:20.440
<v Speaker 3>for this model right now because it's multimodal. It supports images, audio,

810
00:41:20.599 --> 00:41:23.039
<v Speaker 3>not only text like for GPT-4 or in

811
00:41:23.119 --> 00:41:26.639
<v Speaker 3>GPT-3.5. But then we're working on this right now,

812
00:41:26.719 --> 00:41:30.920
<v Speaker 3>and recently they announced GPT-4o mini. So it's

813
00:41:31.360 --> 00:41:33.280
<v Speaker 3>really, like, it's really hard to keep up with

814
00:41:33.400 --> 00:41:35.400
<v Speaker 3>the industry, and, like, a lot of open

815
00:41:35.440 --> 00:41:40.360
<v Speaker 3>source projects building the gateways, building the capabilities to document

816
00:41:40.480 --> 00:41:44.599
<v Speaker 3>the LLMs. So yeah, it's a fascinating place.

817
00:41:45.159 --> 00:41:49.760
<v Speaker 2>Yeah, job security for you, it sounds like just trying

818
00:41:49.800 --> 00:41:51.440
<v Speaker 2>to keep up, right, But I think that's part of

819
00:41:51.480 --> 00:41:54.280
<v Speaker 2>the strength for the customers using this is to go, oh,

820
00:41:54.400 --> 00:41:57.480
<v Speaker 2>new model arrived, Okay, well it's in APIM so we're okay,

821
00:41:57.599 --> 00:42:01.000
<v Speaker 2>we can add that connection that there. But certainly as

822
00:42:01.039 --> 00:42:03.480
<v Speaker 2>they switch over to multimodal, like I suspect your inputs

823
00:42:03.519 --> 00:42:07.039
<v Speaker 2>are different. It's not just a blob of text going

824
00:42:07.159 --> 00:42:08.320
<v Speaker 2>in; it could be almost anything.

825
00:42:08.519 --> 00:42:11.559
<v Speaker 3>Yeah, that's actually also an interesting problem because like sometimes

826
00:42:11.639 --> 00:42:14.559
<v Speaker 3>people think that, oh, okay, I have this GPT four

827
00:42:15.039 --> 00:42:17.400
<v Speaker 3>and what if GPT four is not available, I will

828
00:42:17.480 --> 00:42:19.920
<v Speaker 3>go to I don't know, some different model for example

829
00:42:20.039 --> 00:42:26.119
<v Speaker 3>Mistral Large. But in reality, the engineers will

830
00:42:26.159 --> 00:42:28.239
<v Speaker 3>work a lot on the prompts and make sure that

831
00:42:28.400 --> 00:42:31.920
<v Speaker 3>these prompts work with a specific model, right, And typically

832
00:42:31.920 --> 00:42:34.119
<v Speaker 3>if you switch the underlying model, most likely the

833
00:42:34.199 --> 00:42:37.840
<v Speaker 3>result will not be what you expected, right? Right. So

834
00:42:37.960 --> 00:42:40.679
<v Speaker 3>it's important to test it against multiple models.

835
00:42:41.199 --> 00:42:44.679
<v Speaker 2>Well yeah, I said, this is such a moving space. Heck,

836
00:42:44.800 --> 00:42:47.079
<v Speaker 2>let's face it, you can fire the same prompt at

837
00:42:47.079 --> 00:42:48.559
<v Speaker 2>the same model several.

838
00:42:48.360 --> 00:42:49.559
<v Speaker 1>times and get different results.

839
00:42:49.639 --> 00:42:53.039
<v Speaker 2>Yeah, right, absolutely. We're not living in a land of

840
00:42:53.079 --> 00:42:54.360
<v Speaker 2>consistency right now.

841
00:42:55.039 --> 00:42:57.960
<v Speaker 1>Brian McKay and I used to do this show called

842
00:42:58.000 --> 00:43:01.599
<v Speaker 1>the AI Bot Show, and it was a YouTube thing

843
00:43:01.679 --> 00:43:04.360
<v Speaker 1>and he would have something to show and he would

844
00:43:04.400 --> 00:43:07.800
<v Speaker 1>be practicing it the night before we recorded, and when we recorded,

845
00:43:08.079 --> 00:43:12.159
<v Speaker 1>the prompts would be completely different, probably because it learned

846
00:43:12.679 --> 00:43:17.519
<v Speaker 1>overnight, modified. Yeah, something that he could jailbreak today

847
00:43:18.119 --> 00:43:20.639
<v Speaker 1>tomorrow is impossible. Crazy.

848
00:43:20.840 --> 00:43:24.039
<v Speaker 3>Yeah, that's why prompt engineering is like it's super important.

849
00:43:24.280 --> 00:43:27.599
<v Speaker 3>Like whatever you build, prompt engineering is always going to

850
00:43:27.639 --> 00:43:30.679
<v Speaker 3>be very important. And the thing is that, like whenever

851
00:43:30.679 --> 00:43:33.360
<v Speaker 3>you test it, it's not deterministic. Like you cannot say that, okay,

852
00:43:33.440 --> 00:43:35.719
<v Speaker 3>it works as you mentioned, it works today, but it

853
00:43:35.840 --> 00:43:37.159
<v Speaker 3>might not work tomorrow.

854
00:43:37.320 --> 00:43:40.039
<v Speaker 1>And as developers, that really messes with our head because

855
00:43:40.079 --> 00:43:42.320
<v Speaker 1>we're used to absolute results. Yeah.

856
00:43:42.719 --> 00:43:45.960
<v Speaker 2>I think we're also used to building on an existing

857
00:43:46.159 --> 00:43:49.239
<v Speaker 2>data set, where so far they're pretty much tearing down

858
00:43:49.280 --> 00:43:51.760
<v Speaker 2>these models and rebuilding them over and over and over again.

859
00:43:51.880 --> 00:43:56.000
<v Speaker 2>So you can't expect that what worked before works again.

860
00:43:56.239 --> 00:44:00.840
<v Speaker 2>That's just not the thing, because we don't revise models.

861
00:44:00.840 --> 00:44:05.079
<v Speaker 2>We replace models for better or worse. I was describing

862
00:44:05.239 --> 00:44:09.440
<v Speaker 2>pareidolia on a walk this weekend, that pareidolia is the

863
00:44:09.599 --> 00:44:13.000
<v Speaker 2>tendency for humans to see faces in things. Right, You

864
00:44:13.119 --> 00:44:15.199
<v Speaker 2>look at a bowling ball and it's like, that's got

865
00:44:15.239 --> 00:44:16.519
<v Speaker 2>a face, or the front of a car, it's got

866
00:44:16.599 --> 00:44:19.199
<v Speaker 2>a face, right? And how that's an evolved trait of

867
00:44:19.280 --> 00:44:22.880
<v Speaker 2>humans because if you detected the face first in the trees,

868
00:44:23.320 --> 00:44:25.440
<v Speaker 2>you were the one running before the other people were running,

869
00:44:25.440 --> 00:44:29.400
<v Speaker 2>so you probably lived. And the downside, of course, when

870
00:44:29.440 --> 00:44:31.760
<v Speaker 2>you see faces that aren't there, is very low.

871
00:44:31.800 --> 00:44:34.239
<v Speaker 2>It's not a big deal. Right, So we're talking about

872
00:44:34.280 --> 00:44:36.599
<v Speaker 2>model building. I'm like, so, imagine I take a shotgun

873
00:44:37.000 --> 00:44:39.000
<v Speaker 2>and I shoot at a target and then I say,

874
00:44:39.239 --> 00:44:42.679
<v Speaker 2>do you like this face right now? If you say no,

875
00:44:42.920 --> 00:44:45.079
<v Speaker 2>I want a better face, I don't take the same target.

876
00:44:45.239 --> 00:44:47.079
<v Speaker 2>I get a new target and I shoot it again.

877
00:44:48.360 --> 00:44:51.360
<v Speaker 2>And that's you know, the nature of constantly rebuilding models

878
00:44:51.480 --> 00:44:54.239
<v Speaker 2>is that you typically don't get the same results again.

879
00:44:54.360 --> 00:44:54.559
<v Speaker 1>Yeah.

880
00:44:54.719 --> 00:44:56.239
<v Speaker 2>I'm sorry that was a very long winded way of

881
00:44:56.280 --> 00:44:57.960
<v Speaker 2>going about that. But I like saying pareidolia.

882
00:44:58.000 --> 00:45:00.239
<v Speaker 1>But I love that you, you know, introduced that word

883
00:45:00.320 --> 00:45:06.000
<v Speaker 1>that I've already forgotten something. But for me, that's like

884
00:45:06.119 --> 00:45:09.119
<v Speaker 1>staring up at the clouds. You know, or inkblot

885
00:45:09.159 --> 00:45:10.400
<v Speaker 1>tests, Rorschach tests.

886
00:45:10.519 --> 00:45:14.440
<v Speaker 2>Yeah, yeah, humans see things that aren't there because it

887
00:45:15.400 --> 00:45:17.480
<v Speaker 2>used to be useful at least Now I don't know,

888
00:45:17.960 --> 00:45:21.519
<v Speaker 2>it's creating its own set of complexities. In a question, well,

889
00:45:21.639 --> 00:45:24.159
<v Speaker 2>how many versions out have you planned, Andre? Like, I

890
00:45:24.199 --> 00:45:27.880
<v Speaker 2>could see lots of demand from different folks for various features.

891
00:45:28.159 --> 00:45:32.519
<v Speaker 2>We talked about the whole content management thing. But you

892
00:45:32.599 --> 00:45:34.159
<v Speaker 2>know what comes next for you?

893
00:45:35.400 --> 00:45:38.840
<v Speaker 3>Yeah, so we started with Azure OpenAI being kind

894
00:45:38.880 --> 00:45:41.000
<v Speaker 3>of the one that is easy to use in Azure

895
00:45:41.119 --> 00:45:45.039
<v Speaker 3>and the most popular one, but we also want

896
00:45:45.079 --> 00:45:47.440
<v Speaker 3>to extend to other models because there is definitely demand

897
00:45:47.639 --> 00:45:51.719
<v Speaker 3>to use other models like Llama, Mistral, Cohere, Hugging

898
00:45:51.800 --> 00:45:54.760
<v Speaker 3>Face and others. So we're looking at how to expand

899
00:45:55.119 --> 00:45:57.880
<v Speaker 3>our GenAI gateway capabilities to support more models, to

900
00:45:57.960 --> 00:46:00.119
<v Speaker 3>make sure that customers can use multiple models in the

901
00:46:00.159 --> 00:46:04.760
<v Speaker 3>same APIM instance, without, like, the need

902
00:46:04.920 --> 00:46:08.360
<v Speaker 3>to customize policies, write crazy policy expressions,

903
00:46:07.840 --> 00:46:08.159
<v Speaker 1>And so on.

904
00:46:10.320 --> 00:46:12.880
<v Speaker 3>Then there's a huge demand, like,

905
00:46:13.079 --> 00:46:15.960
<v Speaker 3>on the logging and monitoring side. As I mentioned, we

906
00:46:16.159 --> 00:46:19.760
<v Speaker 3>started with the token tracking, but it turns out there

907
00:46:19.800 --> 00:46:23.039
<v Speaker 3>are certain phases of the intelligent applications development where you

908
00:46:23.119 --> 00:46:25.639
<v Speaker 3>actually want to collect all of the prompts and completions

909
00:46:25.679 --> 00:46:30.440
<v Speaker 3>to make sure that your model behaves correctly. And in general,

910
00:46:30.519 --> 00:46:33.239
<v Speaker 3>logging is pretty easy with APIM if you are

911
00:46:33.280 --> 00:46:35.400
<v Speaker 3>not using streaming, if you're not using the SSE events,

912
00:46:35.440 --> 00:46:38.880
<v Speaker 3>because again I mentioned there's a buffering problem, so that's

913
00:46:38.920 --> 00:46:41.320
<v Speaker 3>something that we're looking at how we can solve this

914
00:46:41.480 --> 00:46:44.639
<v Speaker 3>in the future. And also kind of in general, like

915
00:46:44.719 --> 00:46:49.639
<v Speaker 3>focus on security, traffic management, like prompt manipulation policies, like

916
00:46:50.119 --> 00:46:52.440
<v Speaker 3>let's say that this example that you shared, that I

917
00:46:52.639 --> 00:46:56.119
<v Speaker 3>just found some copilot that I can use now for

918
00:46:56.239 --> 00:47:00.719
<v Speaker 3>my personal gain. But what if I have some

919
00:47:00.840 --> 00:47:04.519
<v Speaker 3>policy that says that, oh, for whatever context which is

920
00:47:04.559 --> 00:47:07.360
<v Speaker 3>presented in the prompt, I will rewrite it so that

921
00:47:07.480 --> 00:47:09.840
<v Speaker 3>I know that my application works with it perfectly well.

922
00:47:10.280 --> 00:47:12.039
<v Speaker 3>So in that case, whatever you send as a prompt,

923
00:47:12.079 --> 00:47:14.119
<v Speaker 3>that will be rewritten on APIM side, and you will

924
00:47:14.159 --> 00:47:17.079
<v Speaker 3>not get the response that you wanted from this copilot

925
00:47:17.199 --> 00:47:19.239
<v Speaker 3>that you just found on the Internet.

926
00:47:20.079 --> 00:47:21.719
<v Speaker 1>So it just occurred to me at the end of

927
00:47:21.719 --> 00:47:23.760
<v Speaker 1>the show here that I should have asked this question long ago.

928
00:47:23.800 --> 00:47:26.960
<v Speaker 1>But is it possible to write two policies that contradict

929
00:47:27.039 --> 00:47:29.800
<v Speaker 1>each other? And what happens if that's possible?

930
00:47:31.320 --> 00:47:33.039
<v Speaker 3>I believe technically you can do that. But at the

931
00:47:33.039 --> 00:47:36.320
<v Speaker 3>same time, we have the policies are executed from top

932
00:47:36.400 --> 00:47:38.480
<v Speaker 3>to bottom, so whatever you have at the bottom will

933
00:47:39.400 --> 00:47:41.519
<v Speaker 3>be enforced, right, So it's.

934
00:47:41.360 --> 00:47:47.000
<v Speaker 1>The order of execution, yep, which is pretty common. Yeah, yeah, yeah,

935
00:47:47.039 --> 00:47:50.159
<v Speaker 1>it is. It can lead to some confusion. Now, it'd

936
00:47:50.159 --> 00:47:53.480
<v Speaker 1>be nice to have some something when you're creating those

937
00:47:53.559 --> 00:47:55.559
<v Speaker 1>policies to say, hey, you know, by the way, this

938
00:47:55.719 --> 00:47:58.119
<v Speaker 1>contradicts this policy, you might want to take a look

939
00:47:58.119 --> 00:47:58.639
<v Speaker 1>at that. Yeah.

940
00:47:58.639 --> 00:48:01.760
<v Speaker 3>We definitely validate policies, so we have the policy engine that validates.

941
00:48:01.800 --> 00:48:05.320
<v Speaker 3>If there's something like which doesn't make sense, there will

942
00:48:05.320 --> 00:48:07.199
<v Speaker 3>be a validation error, so you will not be able

943
00:48:07.280 --> 00:48:08.119
<v Speaker 3>to save the policy.

944
00:48:08.519 --> 00:48:08.719
<v Speaker 1>Yeah.

945
00:48:09.159 --> 00:48:10.440
<v Speaker 3>But yeah, if that's something.

946
00:48:10.400 --> 00:48:13.239
<v Speaker 1>Allow Carl to access this API, don't allow Carl to

947
00:48:13.280 --> 00:48:13.679
<v Speaker 1>access this API.

948
00:48:13.679 --> 00:48:17.239
<v Speaker 3>Yeah, yeah, exactly, and stuff like that.
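
NOTE
A toy model of the ordering behaviour described above, where contradictory policies are run top to bottom and whatever sits at the bottom is enforced. This is an illustration of the speakers' description, not APIM's policy engine; the Policy type, the evaluate function, and the deny-by-default choice are assumptions.
```python
from typing import Callable, List, Optional
# A policy returns True/False to decide, or None to abstain (hypothetical shape).
Policy = Callable[[str], Optional[bool]]
def evaluate(policies: List[Policy], caller: str) -> bool:
    """Run every policy top to bottom; the last one that decides wins."""
    decision = False  # assumption: deny when no policy decides
    for policy in policies:
        verdict = policy(caller)
        if verdict is not None:
            decision = verdict
    return decision
allow_carl: Policy = lambda caller: True if caller == "carl" else None
deny_carl: Policy = lambda caller: False if caller == "carl" else None
```
With these two contradictory rules, evaluate([allow_carl, deny_carl], "carl") denies the call and evaluate([deny_carl, allow_carl], "carl") allows it, which is the kind of silent conflict a validation warning could flag.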

949
00:48:17.960 --> 00:48:18.159
<v Speaker 1>Cool.

950
00:48:18.280 --> 00:48:20.920
<v Speaker 3>So yeah, and in general, like for the future of GenAI,

951
00:48:21.079 --> 00:48:24.800
<v Speaker 3>so we are good at like security, traffic management, just

952
00:48:24.960 --> 00:48:27.280
<v Speaker 3>kind of general ease of operations in APIM. So that's

953
00:48:27.280 --> 00:48:29.440
<v Speaker 3>what we are focusing on to make sure that customers

954
00:48:29.519 --> 00:48:33.280
<v Speaker 3>have all of these secure access control to those models,

955
00:48:33.400 --> 00:48:35.719
<v Speaker 3>like all of the policies and governance in place. But

956
00:48:35.760 --> 00:48:37.760
<v Speaker 3>at the same time, we want to make it easier

957
00:48:37.800 --> 00:48:40.400
<v Speaker 3>to build intelligent applications. So whatever we build, we are

958
00:48:40.440 --> 00:48:44.119
<v Speaker 3>trying to give those AI engineers like an easy to

959
00:48:44.199 --> 00:48:46.480
<v Speaker 3>use interface if they're not familiar, to make sure that

960
00:48:46.559 --> 00:48:50.079
<v Speaker 3>it's easy for them to set up, configure and basically

961
00:48:51.239 --> 00:48:55.000
<v Speaker 3>get all the benefits of APIM's GenAI gateway when

962
00:48:55.000 --> 00:48:56.039
<v Speaker 3>they're building applications.

963
00:48:56.239 --> 00:48:58.320
<v Speaker 1>Great, well, it sounds like the end of the show,

964
00:48:58.400 --> 00:49:00.480
<v Speaker 1>Andre Kamanov, thank you for being of this. Is there

965
00:49:00.519 --> 00:49:02.440
<v Speaker 1>anything that we missed that you wanted to mention or

966
00:49:02.440 --> 00:49:04.519
<v Speaker 1>a shout out or call to action or anything.

967
00:49:04.800 --> 00:49:10.719
<v Speaker 3>I would say, just make sure to check the Azure

968
00:49:10.800 --> 00:49:14.039
<v Speaker 3>updates when we release new stuff, and we've got the

969
00:49:14.079 --> 00:49:17.119
<v Speaker 3>Tech Community blogs where we publish all of the latest

970
00:49:17.159 --> 00:49:21.000
<v Speaker 3>and greatest in APIM. And yeah, all right,

971
00:49:21.079 --> 00:49:22.239
<v Speaker 3>that's that's it.

972
00:49:22.719 --> 00:49:24.199
<v Speaker 1>Awesome. Well, it's been great talking to you.

973
00:49:24.320 --> 00:49:25.000
<v Speaker 3>Thanks for having me.

974
00:49:25.679 --> 00:49:27.760
<v Speaker 1>It was great talking to good to Thank you very much.

975
00:49:28.360 --> 00:49:31.199
<v Speaker 1>All right, we'll talk to you next time I'm done.

976
00:49:52.239 --> 00:49:54.760
<v Speaker 1>.NET Rocks is brought to you by Franklins.Net

977
00:49:55.079 --> 00:49:59.000
<v Speaker 1>and produced by PWOP Studios, a full service audio, video

978
00:49:59.079 --> 00:50:03.119
<v Speaker 1>and post production facility located physically in New London, Connecticut.

979
00:50:03.440 --> 00:50:07.599
<v Speaker 1>And of course in the cloud online at pwop dot com.

980
00:50:08.440 --> 00:50:10.480
<v Speaker 1>Visit our website at d O T N E t

981
00:50:10.800 --> 00:50:14.760
<v Speaker 1>R O c k S dot com for RSS feeds, downloads,

982
00:50:14.960 --> 00:50:18.599
<v Speaker 1>mobile apps, comments, and access to the full archives going

983
00:50:18.679 --> 00:50:22.079
<v Speaker 1>back to show number one, recorded in September two thousand

984
00:50:22.079 --> 00:50:24.719
<v Speaker 1>and two. And make sure you check out our sponsors.

985
00:50:24.920 --> 00:50:27.679
<v Speaker 1>They keep us in business. Now, go write some code.

986
00:50:28.280 --> 00:50:33.880
<v Speaker 1>See you next time. You got J middle Vans and
