WEBVTT

1
00:00:00.040 --> 00:00:02.319
<v Speaker 1>Okay, let's untack this. What if you could take the

2
00:00:02.439 --> 00:00:07.200
<v Speaker 1>raw power behind tools like chat GPT and mold it

3
00:00:07.240 --> 00:00:10.160
<v Speaker 1>to your exact needs. We're talking about going beyond just

4
00:00:10.240 --> 00:00:12.759
<v Speaker 1>you know, chatting with an AI to actually building intelligent

5
00:00:12.759 --> 00:00:15.359
<v Speaker 1>applications and specialized assistants.

6
00:00:14.800 --> 00:00:18.120
<v Speaker 2>Exactly Today, our deep dive is all about the open

7
00:00:18.160 --> 00:00:22.800
<v Speaker 2>Ai API, focusing on how it empowers well anyone really

8
00:00:23.120 --> 00:00:26.160
<v Speaker 2>to create custom AI solutions. Our main source for this

9
00:00:26.320 --> 00:00:30.320
<v Speaker 2>is Henry Habib's Open Ai API Cookbook, which Packed Publishing

10
00:00:30.359 --> 00:00:31.879
<v Speaker 2>put out in March twenty twenty four.

11
00:00:32.000 --> 00:00:34.280
<v Speaker 1>And Henry Habib he knows his stuff over a decade

12
00:00:34.359 --> 00:00:37.439
<v Speaker 1>and AI and productivity right, and he's a big believer

13
00:00:37.520 --> 00:00:40.679
<v Speaker 1>in this citizen developer idea that you don't need to

14
00:00:40.679 --> 00:00:43.520
<v Speaker 1>be like a hardcore coder to build amazing things. He's

15
00:00:43.520 --> 00:00:45.920
<v Speaker 1>also the guy behind the Intelligent Worker newsletter.

16
00:00:45.560 --> 00:00:48.359
<v Speaker 2>That's right, and Sam Mackay, the CEO of Enterprise DNA.

17
00:00:48.439 --> 00:00:50.799
<v Speaker 2>He actually calls the book an essential guide for knowledge

18
00:00:50.799 --> 00:00:53.840
<v Speaker 2>workers eager to harness the power of open AI and

19
00:00:53.920 --> 00:00:58.280
<v Speaker 2>chat GPT to build intelligent applications and solutions. High praise.

20
00:00:58.560 --> 00:01:01.560
<v Speaker 1>So your mission for this, a deep Dive listener, is simple,

21
00:01:01.799 --> 00:01:05.560
<v Speaker 1>get a shortcut really understand how to use the open AIAPI.

22
00:01:05.879 --> 00:01:09.079
<v Speaker 1>We're focusing on the practical stuff, those real aha moments.

23
00:01:09.840 --> 00:01:13.040
<v Speaker 1>You've heard of AI chat GPT. They're everywhere, constantly talked about.

24
00:01:13.239 --> 00:01:15.760
<v Speaker 1>But what's cool here is how actionable they are. We're

25
00:01:15.760 --> 00:01:17.920
<v Speaker 1>going to show you how to turn ideas into reality.

26
00:01:18.280 --> 00:01:21.319
<v Speaker 1>Let's start with the basics. Why the API matters. It's

27
00:01:21.359 --> 00:01:24.000
<v Speaker 1>more than just the chat box you see online. I mean,

28
00:01:24.079 --> 00:01:27.159
<v Speaker 1>chat GPT's growth was just insane, wasn't it. One hundred

29
00:01:27.200 --> 00:01:30.599
<v Speaker 1>million users in two months. That's faster than TikTok, which

30
00:01:30.640 --> 00:01:34.120
<v Speaker 1>took what nine months. It really brought natural language processing

31
00:01:34.239 --> 00:01:36.000
<v Speaker 1>NLPE to the masses.

32
00:01:35.560 --> 00:01:39.319
<v Speaker 2>Absolutely, and the API it takes that democratization way further.

33
00:01:39.400 --> 00:01:42.400
<v Speaker 2>It's a genuine paradigm shift. It means anyone can generate

34
00:01:42.439 --> 00:01:45.480
<v Speaker 2>really human like text from simple prompts. You don't need

35
00:01:45.519 --> 00:01:47.400
<v Speaker 2>a PhD in machine learning anymore. It's not just for

36
00:01:47.439 --> 00:01:49.959
<v Speaker 2>the big players like Typeface or Jasper Ai building on

37
00:01:50.000 --> 00:01:52.359
<v Speaker 2>top of it. It's for you integrating that power into

38
00:01:52.359 --> 00:01:53.079
<v Speaker 2>your own stuff.

39
00:01:53.280 --> 00:01:56.159
<v Speaker 1>And the open II Playground is kind of the perfect

40
00:01:56.200 --> 00:01:58.760
<v Speaker 1>place to start messing around. Yeah, like a sandbox. It's

41
00:01:58.760 --> 00:02:01.799
<v Speaker 1>got three main parts the system message, the chat log,

42
00:02:01.879 --> 00:02:04.319
<v Speaker 1>and then the parameters. The system message is where you

43
00:02:04.400 --> 00:02:08.159
<v Speaker 1>tell the AI who it should be, like you are

44
00:02:08.240 --> 00:02:11.840
<v Speaker 1>an assistant that creates marketing slogans, simple as that shapes

45
00:02:11.840 --> 00:02:13.199
<v Speaker 1>its whole persona right.

46
00:02:13.039 --> 00:02:16.080
<v Speaker 2>And it's fascinating because the model isn't understanding like we

47
00:02:16.120 --> 00:02:18.520
<v Speaker 2>do no thoughts, no feelings. Think of it more like

48
00:02:19.719 --> 00:02:23.879
<v Speaker 2>super advanced autocomplete. It predicts the next word based on

49
00:02:24.000 --> 00:02:26.759
<v Speaker 2>patterns from tons of text data. So you put examples

50
00:02:26.800 --> 00:02:29.280
<v Speaker 2>in the chat log, say you give it company makes

51
00:02:29.319 --> 00:02:31.840
<v Speaker 2>ice cream, and then that apply sham the ice cream

52
00:02:31.840 --> 00:02:34.960
<v Speaker 2>that never melts. You're guiding those predictions. You're kind of

53
00:02:34.960 --> 00:02:37.560
<v Speaker 2>training it right there to follow patterns like starting with

54
00:02:37.599 --> 00:02:39.599
<v Speaker 2>sham and ending with an exclamation.

55
00:02:39.120 --> 00:02:43.159
<v Speaker 1>Mark that makes sense, guiding the probabilities. Okay, so once

56
00:02:43.159 --> 00:02:45.520
<v Speaker 1>you've got your prompts working well in the playground, you

57
00:02:45.560 --> 00:02:48.560
<v Speaker 1>move on to making real API requests, maybe using something

58
00:02:48.599 --> 00:02:51.400
<v Speaker 1>like postmam. And this is where it gets really powerful

59
00:02:51.439 --> 00:02:53.840
<v Speaker 1>because you're not just watching it work, you're controlling it

60
00:02:53.879 --> 00:02:57.039
<v Speaker 1>with code programmatically. And for an API request, there are

61
00:02:57.080 --> 00:02:59.240
<v Speaker 1>like four main things you need. Right First is the

62
00:02:59.360 --> 00:03:02.159
<v Speaker 1>endpoint that's URL, the address you're sending the request to

63
00:03:02.280 --> 00:03:05.599
<v Speaker 1>like https, dot API, dot openI, dot com, forward slash

64
00:03:05.719 --> 00:03:07.599
<v Speaker 1>v one chat completions exactly.

65
00:03:08.039 --> 00:03:12.439
<v Speaker 2>Then there's the header. Think of this as containing important metadata.

66
00:03:12.560 --> 00:03:15.719
<v Speaker 2>It tells open Ai what you're sending, usually content type

67
00:03:15.719 --> 00:03:18.400
<v Speaker 2>dot application JSON because Jason is just a standard way

68
00:03:18.439 --> 00:03:21.639
<v Speaker 2>for systems to swap structured data. And critically, it says

69
00:03:21.680 --> 00:03:25.039
<v Speaker 2>who you are with your authorization bearer, your API key.

70
00:03:25.199 --> 00:03:27.000
<v Speaker 2>That's your secret handshake with open Ai.

71
00:03:27.360 --> 00:03:29.680
<v Speaker 1>Okay, So header is like the envelope details, and the

72
00:03:29.719 --> 00:03:31.159
<v Speaker 1>body is what's inside the envelope.

73
00:03:31.159 --> 00:03:33.719
<v Speaker 2>Correct. A body is a Jason object. It holds the

74
00:03:33.759 --> 00:03:36.000
<v Speaker 2>specifics like which model you want to use, and the

75
00:03:36.039 --> 00:03:39.639
<v Speaker 2>messages that's your system message and chat log content. And

76
00:03:39.719 --> 00:03:42.199
<v Speaker 2>finally you get the response back from open Ai. That's

77
00:03:42.240 --> 00:03:46.240
<v Speaker 2>also Jason containing the AI's output. It's choices and usage

78
00:03:46.319 --> 00:03:47.960
<v Speaker 2>data like how many tokens you used?

79
00:03:48.080 --> 00:03:50.919
<v Speaker 1>Cool? But okay, let's break out of just text. The

80
00:03:50.960 --> 00:03:53.759
<v Speaker 1>open Ai API can do more than just words, can't it?

81
00:03:54.039 --> 00:03:55.319
<v Speaker 1>Multimodal stuff? Oh?

82
00:03:55.360 --> 00:03:59.199
<v Speaker 2>Absolutely, Beyond text. You've got image generation with Dally. The

83
00:03:59.280 --> 00:04:03.000
<v Speaker 2>newer versions Dally two and three use this technique called diffusion.

84
00:04:03.039 --> 00:04:05.039
<v Speaker 2>You can kind of picture it like starting with TV

85
00:04:05.120 --> 00:04:07.560
<v Speaker 2>static and slowly clearing it up until an image appears.

86
00:04:07.560 --> 00:04:10.719
<v Speaker 2>It's pretty neat. But the key with images, unlike text maybe,

87
00:04:10.800 --> 00:04:13.000
<v Speaker 2>is you have to be super specific in your prompts.

88
00:04:13.240 --> 00:04:15.759
<v Speaker 2>Just saying a dog gets you, well, a random dog,

89
00:04:16.040 --> 00:04:18.839
<v Speaker 2>but a brown, furry, medium sized CORKI doog on a

90
00:04:18.839 --> 00:04:22.199
<v Speaker 2>green grass field profile view that gets you much closer

91
00:04:22.199 --> 00:04:24.600
<v Speaker 2>to what you actually want. It raises an interesting point.

92
00:04:24.639 --> 00:04:27.959
<v Speaker 2>Text generation can infer context sometimes, but image generation it

93
00:04:28.000 --> 00:04:32.240
<v Speaker 2>needs precise descriptive language. Ambiguity is your enemy here.

94
00:04:32.319 --> 00:04:34.439
<v Speaker 1>Good point need to be crystal clear. And it does

95
00:04:34.480 --> 00:04:35.800
<v Speaker 1>audio too. Transcripts.

96
00:04:35.920 --> 00:04:38.240
<v Speaker 2>Yeah, the audio endpoint uses the Whisper model for that.

97
00:04:38.600 --> 00:04:40.000
<v Speaker 2>It transcribes audio files.

98
00:04:40.079 --> 00:04:43.240
<v Speaker 1>Ah and technically for file uploads you need to use

99
00:04:43.279 --> 00:04:45.040
<v Speaker 1>form data instead of JSON in.

100
00:04:45.000 --> 00:04:48.199
<v Speaker 2>The request right exactly. Jason is great for text data,

101
00:04:48.240 --> 00:04:51.000
<v Speaker 2>but form data is built for sending files, kind of

102
00:04:51.000 --> 00:04:53.360
<v Speaker 2>like attaching something to an email. It handles lots of

103
00:04:53.399 --> 00:04:57.000
<v Speaker 2>formats dot MP three, dot MP four, dot MPEG, dotwave,

104
00:04:57.120 --> 00:04:59.680
<v Speaker 2>dot web, dot WebM quite a few.

105
00:04:59.519 --> 00:05:02.279
<v Speaker 1>So you could transcribe a meeting maybe easily.

106
00:05:02.560 --> 00:05:04.839
<v Speaker 2>And the real magic starts when you chain these things together.

107
00:05:05.160 --> 00:05:08.439
<v Speaker 2>Imagine a voice assistant. Voice comes in whisper transcribes it,

108
00:05:08.680 --> 00:05:12.000
<v Speaker 2>chat Api figures out a response, maybe Dali even generates

109
00:05:12.079 --> 00:05:12.759
<v Speaker 2>relevant image.

110
00:05:12.800 --> 00:05:15.959
<v Speaker 1>Okay, that's starting to sound really powerful. Now, let's talk

111
00:05:15.959 --> 00:05:18.839
<v Speaker 1>about fine tuning the dials and knobs as you called

112
00:05:18.879 --> 00:05:19.399
<v Speaker 1>them in the book.

113
00:05:19.480 --> 00:05:23.000
<v Speaker 2>The parameters, right, The parameters let you control the AI's behavior,

114
00:05:23.319 --> 00:05:25.480
<v Speaker 2>and the model parameter is probably the biggest one. Usually

115
00:05:25.519 --> 00:05:28.279
<v Speaker 2>you're choosing between GPT three point five and GPT four.

116
00:05:28.480 --> 00:05:31.040
<v Speaker 2>GPT three point five has what one hundred and seventy

117
00:05:31.079 --> 00:05:34.680
<v Speaker 2>five billion parameters. GPT four is estimated to be way larger,

118
00:05:34.720 --> 00:05:37.480
<v Speaker 2>maybe over one hundred trillion parameters across a bunch of

119
00:05:37.519 --> 00:05:40.560
<v Speaker 2>models working together. More parameters generally means the model is

120
00:05:40.600 --> 00:05:44.199
<v Speaker 2>better at capturing subtle patterns and understanding complex instructions. So

121
00:05:44.279 --> 00:05:46.879
<v Speaker 2>GPT four tends to be more reliable, better with nuance.

122
00:05:47.160 --> 00:05:51.399
<v Speaker 2>It actually scores higher on things like standardized tests, EP calculus.

123
00:05:50.839 --> 00:05:53.040
<v Speaker 1>The lsat Wow, and you can see that difference in

124
00:05:53.040 --> 00:05:55.120
<v Speaker 1>the outputs. Can't you like that example in the book

125
00:05:55.279 --> 00:05:58.279
<v Speaker 1>asking for a sentence about Mars with six five letter words,

126
00:05:58.560 --> 00:06:01.399
<v Speaker 1>GPT three point five up the word count right, It

127
00:06:01.439 --> 00:06:04.319
<v Speaker 1>gives our Mars strip felt vast, new, cold.

128
00:06:04.000 --> 00:06:06.360
<v Speaker 2>Hard, grand grand isn't five letters.

129
00:06:06.120 --> 00:06:09.000
<v Speaker 1>Exactly If GPT four gets it, Mars Red World, Brave Crew,

130
00:06:09.079 --> 00:06:12.000
<v Speaker 1>Deep Space finds life. Perfect for the cigarette question how

131
00:06:12.000 --> 00:06:14.600
<v Speaker 1>many chemicals? How many harmful? How many cause cancer? Just

132
00:06:14.680 --> 00:06:17.639
<v Speaker 1>the numbers. GPT three point five gives you a paragraph.

133
00:06:17.839 --> 00:06:22.160
<v Speaker 1>GPT four just answers two hundred and fifty sixty concise

134
00:06:22.319 --> 00:06:25.759
<v Speaker 1>even logic puzzles. GPT four tends to reason more accurately

135
00:06:25.800 --> 00:06:28.079
<v Speaker 1>than three point five, and GPT four has a bigger

136
00:06:28.160 --> 00:06:30.680
<v Speaker 1>memory too. The context win much bigger.

137
00:06:30.759 --> 00:06:33.800
<v Speaker 2>Like GPT four thirty two k can handle around thirty

138
00:06:33.800 --> 00:06:36.959
<v Speaker 2>two thousand tokens maybe twenty four thousand words. GPT three

139
00:06:37.000 --> 00:06:39.480
<v Speaker 2>point five max is out around four thousand tokens about

140
00:06:39.480 --> 00:06:43.160
<v Speaker 2>three thousand words. Big difference if you're feeding it long documents.

141
00:06:43.160 --> 00:06:45.199
<v Speaker 1>Okay, but there's a catch, isn't there cost?

142
00:06:45.519 --> 00:06:49.040
<v Speaker 2>Huge catch? GPT four can be twenty to forty times

143
00:06:49.079 --> 00:06:52.680
<v Speaker 2>more expensive per token than GPT three point five. It's significant.

144
00:06:52.879 --> 00:06:55.319
<v Speaker 2>So the practical advice for you is always start with

145
00:06:55.319 --> 00:06:57.959
<v Speaker 2>GPT three point five. If it does the job great,

146
00:06:58.040 --> 00:07:00.600
<v Speaker 2>you save a lot of money. Only upgraded GPT four

147
00:07:00.639 --> 00:07:03.439
<v Speaker 2>if you absolutely need that extra reasoning power or the

148
00:07:03.560 --> 00:07:04.680
<v Speaker 2>larger context window.

149
00:07:04.839 --> 00:07:06.959
<v Speaker 1>That's a massive cost difference. Why is it so much

150
00:07:07.000 --> 00:07:08.360
<v Speaker 1>more just the size.

151
00:07:08.000 --> 00:07:11.560
<v Speaker 2>Primarily, Yeah, it's a much bigger, more complex model. Just

152
00:07:11.560 --> 00:07:14.240
<v Speaker 2>takes way more computing power to run each request. Think

153
00:07:14.319 --> 00:07:15.879
<v Speaker 2>supercomputer versus calculator.

154
00:07:15.959 --> 00:07:18.839
<v Speaker 1>Gotcha? Okay, another parameter dot N that controls how many

155
00:07:18.839 --> 00:07:20.000
<v Speaker 1>answers you get back right.

156
00:07:20.240 --> 00:07:22.360
<v Speaker 2>N sets the number of responses can be any whole

157
00:07:22.399 --> 00:07:25.160
<v Speaker 2>number for chat, but max ten for images. Super useful

158
00:07:25.160 --> 00:07:28.319
<v Speaker 2>for brainstormings, logans, getting different options, or for checking consistency,

159
00:07:28.360 --> 00:07:29.639
<v Speaker 2>maybe ab testing outputs.

160
00:07:29.720 --> 00:07:31.920
<v Speaker 1>And the interesting thing you mentioned is the cost isn't

161
00:07:31.959 --> 00:07:34.399
<v Speaker 1>linear like N three isn't three times the price?

162
00:07:34.480 --> 00:07:37.399
<v Speaker 2>No, it's often much less, maybe sixty percent more, not

163
00:07:37.439 --> 00:07:41.079
<v Speaker 2>two hundred percent more, which tells you something cool. The

164
00:07:41.120 --> 00:07:44.399
<v Speaker 2>AI isn't just running the request three times separately. It's

165
00:07:44.560 --> 00:07:48.439
<v Speaker 2>likely batching the computation somehow finding efficiencies. It's an optimization

166
00:07:48.560 --> 00:07:49.319
<v Speaker 2>hint clever.

167
00:07:49.600 --> 00:07:52.480
<v Speaker 1>Okay, what about temperature? That one sounds a bit abstract.

168
00:07:52.839 --> 00:07:53.920
<v Speaker 1>Controls creativity.

169
00:07:54.000 --> 00:07:57.279
<v Speaker 2>Yeah, temperature basically controls the randomness or let's say, creativity

170
00:07:57.279 --> 00:07:59.399
<v Speaker 2>of the output. It goes from point zero to two

171
00:07:59.439 --> 00:08:01.879
<v Speaker 2>point zero th of it, like tuning a radio. Low

172
00:08:01.879 --> 00:08:04.560
<v Speaker 2>temperature maybe twoint zero too point eight is like a sharp,

173
00:08:04.759 --> 00:08:09.680
<v Speaker 2>clear signal, very focused, consistent factual responses. Good for things

174
00:08:09.680 --> 00:08:13.360
<v Speaker 2>like code generation data analysis where you want deterministic.

175
00:08:12.759 --> 00:08:16.800
<v Speaker 1>Output and higher temperature more static, more like an eclectic

176
00:08:16.839 --> 00:08:17.319
<v Speaker 1>mix station.

177
00:08:17.439 --> 00:08:19.920
<v Speaker 2>Yeah, higher temps, say one point two to two point zero,

178
00:08:20.120 --> 00:08:22.279
<v Speaker 2>make the AI take more risks with word choices. It

179
00:08:22.319 --> 00:08:24.519
<v Speaker 2>flattens the probability curve for the next word, so you

180
00:08:24.519 --> 00:08:28.000
<v Speaker 2>get more diverse, unexpected sometimes more creative results. Great for brainstorming,

181
00:08:28.040 --> 00:08:29.920
<v Speaker 2>writing stories, generating slogans.

182
00:08:30.240 --> 00:08:32.600
<v Speaker 1>So for general use like a chatbot, maybe somewhere in

183
00:08:32.600 --> 00:08:35.159
<v Speaker 1>the middle point eight to one point two exactly.

184
00:08:35.320 --> 00:08:37.320
<v Speaker 2>Balance is making sense with being interesting.

185
00:08:37.759 --> 00:08:39.879
<v Speaker 1>So the advice is start around one point zero and

186
00:08:40.000 --> 00:08:42.879
<v Speaker 1>tweak it by like zero point two increments.

187
00:08:43.200 --> 00:08:45.840
<v Speaker 2>That's a good practical approach. Yeah, see what works for

188
00:08:45.879 --> 00:08:46.679
<v Speaker 2>your specific need.

189
00:08:46.799 --> 00:08:51.840
<v Speaker 1>Okay, makes sense. Now let's shift gears to building real applications.

190
00:08:52.320 --> 00:08:54.759
<v Speaker 1>Usually you don't just have your app talk directly to

191
00:08:54.799 --> 00:08:57.559
<v Speaker 1>open AI, right, there's often a back end layer in between.

192
00:08:57.840 --> 00:09:01.000
<v Speaker 2>That's right. The typical flow is from tend what the

193
00:09:01.080 --> 00:09:03.320
<v Speaker 2>user sees talks to your back end, and your back

194
00:09:03.399 --> 00:09:06.679
<v Speaker 2>end talks to the open AIAPI. This back end layer

195
00:09:06.759 --> 00:09:10.879
<v Speaker 2>is crucial. First security, it keeps your precious API key

196
00:09:11.000 --> 00:09:15.519
<v Speaker 2>safe hidden from the user's browser. Second control, you can

197
00:09:15.519 --> 00:09:18.120
<v Speaker 2>process the input before it goes to open AI or

198
00:09:18.159 --> 00:09:20.559
<v Speaker 2>clean up the output after it comes back. Plus it

199
00:09:20.600 --> 00:09:23.480
<v Speaker 2>lets you integrate other services, hand logins, all that stuff.

200
00:09:23.519 --> 00:09:26.120
<v Speaker 1>And for that back end, serverless options like Google Cloud

201
00:09:26.159 --> 00:09:27.720
<v Speaker 1>functions are pretty popular.

202
00:09:28.039 --> 00:09:30.679
<v Speaker 2>Very popular, yeah, because you don't have to manage servers.

203
00:09:30.679 --> 00:09:34.320
<v Speaker 2>It just scales automatically. You write your code, upload it,

204
00:09:34.679 --> 00:09:37.519
<v Speaker 2>and Google handles the rest. You set up an HTTP

205
00:09:37.679 --> 00:09:39.519
<v Speaker 2>trigger so it could be called like a web address.

206
00:09:39.919 --> 00:09:43.759
<v Speaker 2>Allow unauthenticated calls maybe for testing, but be careful in

207
00:09:43.759 --> 00:09:46.120
<v Speaker 2>production and define your entry point function.

208
00:09:46.440 --> 00:09:49.240
<v Speaker 1>And then for the front end the user interface. You

209
00:09:49.279 --> 00:09:52.480
<v Speaker 1>can use no code tools like Bubble, so anyone can

210
00:09:52.519 --> 00:09:53.960
<v Speaker 1>build the app part exactly.

211
00:09:54.159 --> 00:09:57.480
<v Speaker 2>Bubble lets you visually design your web app and connect

212
00:09:57.480 --> 00:10:00.759
<v Speaker 2>buttons and inputs directly to your back end cloud function.

213
00:10:01.000 --> 00:10:02.519
<v Speaker 2>It's incredibly empowering.

214
00:10:02.840 --> 00:10:05.240
<v Speaker 1>Let's walk through an example, like that email reply wrapper

215
00:10:05.279 --> 00:10:07.240
<v Speaker 1>from the book. You could do it in chat GPT, sure,

216
00:10:07.279 --> 00:10:10.159
<v Speaker 1>but building it yourself really teaches you the whole process.

217
00:10:10.440 --> 00:10:12.799
<v Speaker 1>So you start in the playground testing proms, get the

218
00:10:12.799 --> 00:10:15.240
<v Speaker 1>Python code, then you put that logic into a Google

219
00:10:15.240 --> 00:10:17.519
<v Speaker 1>Cloud function that's your back end. It takes the email

220
00:10:17.559 --> 00:10:20.360
<v Speaker 1>text as input, adds your API key. Secretly, you'd tell

221
00:10:20.399 --> 00:10:23.200
<v Speaker 1>it to use say GPT four, maybe a higher temperature

222
00:10:23.240 --> 00:10:25.960
<v Speaker 1>like one point four for creator replies, set N three

223
00:10:26.039 --> 00:10:28.519
<v Speaker 1>to get three options, maybe limit topens to five to one.

224
00:10:28.480 --> 00:10:30.960
<v Speaker 2>Right, and then you'd use Postman to test that cloud

225
00:10:31.000 --> 00:10:34.320
<v Speaker 2>function directly, make sure it actually returns three email replies

226
00:10:34.360 --> 00:10:36.840
<v Speaker 2>in the format you expect. Once that's working, you jump

227
00:10:36.840 --> 00:10:39.879
<v Speaker 2>into Bubble. You build the input box for the original email,

228
00:10:40.120 --> 00:10:42.759
<v Speaker 2>a button to generate replies, and maybe three textboxes to

229
00:10:42.759 --> 00:10:46.480
<v Speaker 2>display choice one, choice two, choice three. Use bubbles API

230
00:10:46.559 --> 00:10:49.279
<v Speaker 2>connector to link the button press to your cloud function

231
00:10:49.639 --> 00:10:53.720
<v Speaker 2>URL and display the return choices. And really understanding this

232
00:10:53.759 --> 00:10:58.279
<v Speaker 2>whole playground, Cloud function, Postman, Bubble. That's the fundamental pattern.

233
00:10:58.639 --> 00:11:01.360
<v Speaker 2>Master this and you can pretty much any intelligent app.

234
00:11:01.720 --> 00:11:04.080
<v Speaker 1>That's a great point. It's the core loop. What's a

235
00:11:04.120 --> 00:11:06.799
<v Speaker 1>common sticking point when people first try this? Getting the

236
00:11:06.879 --> 00:11:08.799
<v Speaker 1>data flow right often.

237
00:11:08.639 --> 00:11:11.759
<v Speaker 2>Yeah, getting the JSON right in the requests and responses,

238
00:11:12.159 --> 00:11:15.799
<v Speaker 2>making sure API keys are correct and secure, little syntax things.

239
00:11:15.879 --> 00:11:18.440
<v Speaker 2>Postwind really helps debug that before you even touch the frontend.

240
00:11:18.559 --> 00:11:21.200
<v Speaker 1>Okay, so that's a solid foundation. But let's get to

241
00:11:21.240 --> 00:11:23.279
<v Speaker 1>something really cool, something you can't just do in the

242
00:11:23.320 --> 00:11:27.960
<v Speaker 1>standard chat GPT interface easily. The multimodal travel itinerary app.

243
00:11:28.080 --> 00:11:29.399
<v Speaker 1>That sounds awesome.

244
00:11:30.039 --> 00:11:33.600
<v Speaker 2>It really shows the power of orchestrating multiple API calls.

245
00:11:33.919 --> 00:11:38.399
<v Speaker 2>The idea user toxicity gets back a detailed one day

246
00:11:38.399 --> 00:11:42.720
<v Speaker 2>plan morning, afternoon, evening activities and three AI generated images

247
00:11:42.759 --> 00:11:43.799
<v Speaker 2>matching those activities.

248
00:11:43.919 --> 00:11:46.600
<v Speaker 1>Wow, okay, how does that work behind the scenes in

249
00:11:46.639 --> 00:11:47.399
<v Speaker 1>the cloud function.

250
00:11:47.840 --> 00:11:51.039
<v Speaker 2>So first, because this involves multiple calls, including image generation,

251
00:11:51.120 --> 00:11:53.399
<v Speaker 2>which can be slow, you need to increase the cloud

252
00:11:53.399 --> 00:11:56.960
<v Speaker 2>function's timeout limit maybe to three hundred seconds five minutes,

253
00:11:57.240 --> 00:11:57.879
<v Speaker 2>just to be safe.

254
00:11:57.960 --> 00:11:59.000
<v Speaker 1>Good practical tip.

255
00:11:59.279 --> 00:12:03.120
<v Speaker 2>Then one uber one uses the chat api GPT four. Specifically,

256
00:12:03.159 --> 00:12:05.360
<v Speaker 2>it takes the city name. Crucially, you give it a

257
00:12:05.399 --> 00:12:08.559
<v Speaker 2>detailed chat log with examples what the book calls fu

258
00:12:08.559 --> 00:12:11.720
<v Speaker 2>shot prompting. You showed examples for Rome, Lisbon, et cetera.

259
00:12:11.840 --> 00:12:15.080
<v Speaker 2>Format it exactly how you want warning activity, afternoon activity,

260
00:12:15.120 --> 00:12:18.039
<v Speaker 2>evening activity. This force is GPT four to follow that

261
00:12:18.080 --> 00:12:21.120
<v Speaker 2>structure precisely. It stores the resulting itinerary text.

262
00:12:21.279 --> 00:12:24.080
<v Speaker 1>Got it. So the structure comes from good prompting and examples.

263
00:12:24.080 --> 00:12:25.480
<v Speaker 1>How do the images get generated?

264
00:12:25.679 --> 00:12:28.879
<v Speaker 2>That's call number two, also chat API, but this time

265
00:12:29.000 --> 00:12:31.720
<v Speaker 2>using GPT three point five Turbo one one oh six.

266
00:12:32.320 --> 00:12:34.639
<v Speaker 2>Its only job is to take the itinerary text from

267
00:12:34.639 --> 00:12:39.399
<v Speaker 2>call one and create three short descriptive prompts suitable for DELI.

268
00:12:40.080 --> 00:12:43.679
<v Speaker 2>Like if the itinerary mentioned the Colisseum, Vatican and Trevy Fountain,

269
00:12:44.039 --> 00:12:48.080
<v Speaker 2>it might output Colisseum and Rome, Vatican City Interior, Trevy

270
00:12:48.080 --> 00:12:51.120
<v Speaker 2>Fountain at night. Just the prompts separated by a pipe symbol.

271
00:12:51.279 --> 00:12:53.759
<v Speaker 1>Ah. And you use GPT three point five here because

272
00:12:53.759 --> 00:12:55.759
<v Speaker 1>it's cheaper and the task is simple. It doesn't need

273
00:12:55.840 --> 00:12:57.799
<v Speaker 1>GPT four's nuance exactly.

274
00:12:58.159 --> 00:13:00.960
<v Speaker 2>The user never sees this intermediate p output, only the

275
00:13:00.960 --> 00:13:03.759
<v Speaker 2>final images, so three point five is perfectly adequate and

276
00:13:03.840 --> 00:13:05.799
<v Speaker 2>much more cost effective for this specific step.

277
00:13:06.000 --> 00:13:08.679
<v Speaker 1>Smart resource use nice optimization. Okay, So now you have

278
00:13:08.720 --> 00:13:10.759
<v Speaker 1>the itinerary text and three image prompts.

279
00:13:10.919 --> 00:13:13.720
<v Speaker 2>Right, So call number three hits the images API using

280
00:13:13.759 --> 00:13:16.840
<v Speaker 2>DELI THII. Your code loops through the three prompts from

281
00:13:16.840 --> 00:13:19.120
<v Speaker 2>call too, making a separate API call for each one

282
00:13:19.159 --> 00:13:21.320
<v Speaker 2>to generated image. It collects the URLs of the three

283
00:13:21.360 --> 00:13:24.840
<v Speaker 2>generated images image rolls Finally, the cloud function bundles everything

284
00:13:24.919 --> 00:13:27.919
<v Speaker 2>up and returns a single Jason response containing the itinerary

285
00:13:27.960 --> 00:13:31.000
<v Speaker 2>text and the URLs for morning image, afternoon image, and

286
00:13:31.120 --> 00:13:31.919
<v Speaker 2>evening image.

287
00:13:31.960 --> 00:13:34.480
<v Speaker 1>And then in bubble you just connect those pieces input

288
00:13:34.519 --> 00:13:37.519
<v Speaker 1>for city button, a big text area for the itinerary,

289
00:13:37.519 --> 00:13:40.679
<v Speaker 1>and three image elements. You map the JSON fields from

290
00:13:40.679 --> 00:13:43.919
<v Speaker 1>the cloud function response directly to those elements. That's really slick,

291
00:13:44.120 --> 00:13:46.519
<v Speaker 1>combining text and custom images on the fly like that.

292
00:13:47.360 --> 00:13:52.320
<v Speaker 2>Very cool. Okay, let's switch tracks slightly. Building knowledge assistance

293
00:13:52.879 --> 00:13:56.039
<v Speaker 2>this is huge. Standard chat GPT is great, but its

294
00:13:56.120 --> 00:13:58.120
<v Speaker 2>knowledge is kind of frozen in time right, and it

295
00:13:58.120 --> 00:14:00.840
<v Speaker 2>can sometimes just make stuff up hallocin. You can't easily

296
00:14:00.840 --> 00:14:03.399
<v Speaker 2>to only use this specific document precisely.

297
00:14:03.639 --> 00:14:06.080
<v Speaker 1>That's where building your own assistant comes in, using the

298
00:14:06.120 --> 00:14:10.240
<v Speaker 1>API combined with your specific trusted knowledge source. A basic

299
00:14:10.279 --> 00:14:13.000
<v Speaker 1>way to do this covered in the book is PDF analysis.

300
00:14:13.279 --> 00:14:15.879
<v Speaker 1>Your app takes a PDF link and a question. The

301
00:14:15.919 --> 00:14:19.240
<v Speaker 1>cloud function fetches the pdf, uses a library like pipdf

302
00:14:19.279 --> 00:14:21.320
<v Speaker 1>two to scrabe all the text out of it. Then

303
00:14:21.399 --> 00:14:23.600
<v Speaker 1>it stuffs that entire text into the prompt along with

304
00:14:23.600 --> 00:14:25.960
<v Speaker 1>the user's question, and sends it off to GPT four

305
00:14:26.039 --> 00:14:26.919
<v Speaker 1>so it just crams the.

306
00:14:26.919 --> 00:14:30.200
<v Speaker 2>Whole PDF into the context window every single time. Yeah, coefficient,

307
00:14:30.360 --> 00:14:33.759
<v Speaker 2>it can be. It works, but yeah, limitations. It only

308
00:14:33.759 --> 00:14:36.879
<v Speaker 2>gets text, no images from the pdf. It struggles with

309
00:14:37.000 --> 00:14:40.600
<v Speaker 2>really huge documents, and the biggest issue is that context

310
00:14:40.639 --> 00:14:44.279
<v Speaker 2>window limit. If your PDF has more words then the

311
00:14:44.320 --> 00:14:47.879
<v Speaker 2>model can handle like those three thousand words for GPP

312
00:14:48.039 --> 00:14:50.440
<v Speaker 2>three point five or twenty four thousand for GPT four

313
00:14:50.519 --> 00:14:53.120
<v Speaker 2>thirty two. K. It just won't work properly, right.

314
00:14:53.519 --> 00:14:55.759
<v Speaker 1>But there's a better way now, isn't there with the

315
00:14:56.000 --> 00:14:57.840
<v Speaker 1>newer assistance API.

316
00:14:58.039 --> 00:15:01.240
<v Speaker 2>Oh yeah, the assistants APIs specifically with its built in

317
00:15:01.399 --> 00:15:03.759
<v Speaker 2>knowledge retrieval tool, is a total.

318
00:15:03.519 --> 00:15:05.799
<v Speaker 1>Game changer for this What makes it so different.

319
00:15:05.480 --> 00:15:09.559
<v Speaker 2>It's incredibly smart. When you upload your documents, PDFs, word docs, etc.

320
00:15:10.039 --> 00:15:13.879
<v Speaker 2>To an assistant with retrieval enabled, open AI automatically handles

321
00:15:13.919 --> 00:15:17.240
<v Speaker 2>the hard parts. It breaks the documents into manageable chunks,

322
00:15:17.399 --> 00:15:20.720
<v Speaker 2>creates embeddings for each chunk, those unique numerical fingerprints we

323
00:15:20.759 --> 00:15:23.759
<v Speaker 2>talked about, and stores them efficiently. Then when you ask

324
00:15:23.759 --> 00:15:26.440
<v Speaker 2>a question, it uses vector search to instantly find only

325
00:15:26.480 --> 00:15:29.320
<v Speaker 2>the most relevant chunks of texts from your documents related

326
00:15:29.320 --> 00:15:29.879
<v Speaker 2>to your question.

327
00:15:30.039 --> 00:15:31.919
<v Speaker 1>So It doesn't read the whole document every time, It

328
00:15:32.000 --> 00:15:34.159
<v Speaker 1>just finds the relevant paragraphs.

329
00:15:33.759 --> 00:15:38.759
<v Speaker 2>Exactly, which means there's effectively no context window limit for

330
00:15:38.840 --> 00:15:42.639
<v Speaker 2>your knowledge base. You can upload massive files or hundreds

331
00:15:42.639 --> 00:15:46.559
<v Speaker 2>of documents and the assistant intelligently retrieves only the necessary

332
00:15:46.559 --> 00:15:49.720
<v Speaker 2>snippets to answer the question. Incredibly efficient.

333
00:15:49.879 --> 00:15:51.960
<v Speaker 1>That sounds amazing. How do you set that up? Still?

334
00:15:51.960 --> 00:15:54.120
<v Speaker 2>Start in the playground, yep, The playground is great for

335
00:15:54.159 --> 00:15:57.159
<v Speaker 2>creating the assistant itself. You give it a name US

336
00:15:57.240 --> 00:16:01.120
<v Speaker 2>Constitution Expert Instructions answer questions based only on the provided

337
00:16:01.159 --> 00:16:04.919
<v Speaker 2>constitution document. Choose a model like GPT four to eleven

338
00:16:05.000 --> 00:16:07.399
<v Speaker 2>oh six Preview, which is good for this. Then the

339
00:16:07.440 --> 00:16:11.000
<v Speaker 2>crucial step you toggle on the retrieval tool and then

340
00:16:11.039 --> 00:16:13.279
<v Speaker 2>you upload your knowledge file like a PDF of the

341
00:16:13.360 --> 00:16:17.039
<v Speaker 2>US Constitution. Once it's created, you grab the unique assistant ID.

342
00:16:17.279 --> 00:16:21.080
<v Speaker 1>Okay, assistant created, knowledge uploaded. Then the cloud function code

343
00:16:21.279 --> 00:16:22.360
<v Speaker 1>uses this assistant ID.

344
00:16:22.720 --> 00:16:25.879
<v Speaker 2>Correct. The Python code for your cloud function becomes a

345
00:16:25.879 --> 00:16:29.960
<v Speaker 2>bit different using the assistants API. First, you create a thread.

346
00:16:30.440 --> 00:16:34.039
<v Speaker 2>Think of thread as a single conversation session. Then you

347
00:16:34.039 --> 00:16:37.799
<v Speaker 2>add the user's question as a message to that thread. Next,

348
00:16:38.000 --> 00:16:40.720
<v Speaker 2>you tell the assistant to run on that thread, providing

349
00:16:40.720 --> 00:16:43.480
<v Speaker 2>the assistant ID and the thread ID. Now here's a

350
00:16:43.519 --> 00:16:45.799
<v Speaker 2>key detail for the book's code. You need to wait

351
00:16:45.840 --> 00:16:49.399
<v Speaker 2>a bit. The assistant needs time to process, search the

352
00:16:49.480 --> 00:16:52.720
<v Speaker 2>knowledge and formulate the answer, so you might add a

353
00:16:52.799 --> 00:16:57.480
<v Speaker 2>time dot sleep or similar pause. After the pause, you

354
00:16:57.559 --> 00:16:59.759
<v Speaker 2>retrieve the list of messages from the thread and the

355
00:17:00.000 --> 00:17:01.799
<v Speaker 2>assistem's answer will be the newest message.

356
00:17:01.840 --> 00:17:04.920
<v Speaker 1>Okay, that pause is important. And the bubble front end

357
00:17:04.920 --> 00:17:07.720
<v Speaker 1>for this probably simpler.

358
00:17:07.480 --> 00:17:09.680
<v Speaker 2>Much simpler for this use case. Yeah, just an input

359
00:17:09.680 --> 00:17:12.319
<v Speaker 2>boxer the user's question, a button and a text box

360
00:17:12.359 --> 00:17:14.359
<v Speaker 2>to display the answer returned by the cloud function.

361
00:17:14.480 --> 00:17:16.400
<v Speaker 1>And the result is you can ask specific questions like

362
00:17:16.480 --> 00:17:19.200
<v Speaker 1>how many senators are there or what's the age requirement

363
00:17:19.240 --> 00:17:22.039
<v Speaker 1>for a senator and it pulls the answer directly from

364
00:17:22.079 --> 00:17:25.000
<v Speaker 1>that constitution pdf you uploaded exactly.

365
00:17:25.039 --> 00:17:28.720
<v Speaker 2>It grounds the AI in your specific source material. It's

366
00:17:28.799 --> 00:17:34.359
<v Speaker 2>incredibly powerful for legal teams, medical info, company knowledge bases,

367
00:17:34.920 --> 00:17:38.759
<v Speaker 2>educational tools, anywhere you need reliable answers from a defined

368
00:17:38.799 --> 00:17:39.640
<v Speaker 2>set of information.

369
00:17:39.920 --> 00:17:43.319
<v Speaker 1>Wow, we've covered a lot, from just understanding the API

370
00:17:43.400 --> 00:17:46.759
<v Speaker 1>basics to playing in the playground, making direct calls, adding

371
00:17:46.799 --> 00:17:50.599
<v Speaker 1>images and audio. Then building actual apps with back ends

372
00:17:50.680 --> 00:17:55.000
<v Speaker 1>and frontends, optimizing costs, and finally creating these powerful knowledgeable

373
00:17:55.000 --> 00:17:58.279
<v Speaker 1>assistance tied to specific documents. You've really gone from just

374
00:17:58.400 --> 00:18:01.039
<v Speaker 1>using chat GPT to understand how to build with its

375
00:18:01.119 --> 00:18:04.799
<v Speaker 1>underlying power. You're equipped now to actually create things.

376
00:18:05.039 --> 00:18:06.960
<v Speaker 2>Yeah, and it brings to mind something Paul Siegel, a

377
00:18:07.000 --> 00:18:10.240
<v Speaker 2>tech entrepreneur, wrote in the forward to Henry's book, You said, Essentially,

378
00:18:10.480 --> 00:18:12.920
<v Speaker 2>I strongly encourage you to use this knowledge to create

379
00:18:12.960 --> 00:18:16.319
<v Speaker 2>your next successful app or business, or simply to enrich

380
00:18:16.359 --> 00:18:19.200
<v Speaker 2>your thinking about how to innovate. Dream on it, then

381
00:18:19.440 --> 00:18:21.839
<v Speaker 2>fashion your dreams into a reality with the tools you've

382
00:18:21.839 --> 00:18:23.880
<v Speaker 2>gained here. I think that sums it up nicely.

383
00:18:24.160 --> 00:18:26.720
<v Speaker 1>Great final thoughts, So the message is clear, don't just

384
00:18:26.799 --> 00:18:29.319
<v Speaker 1>use AI, build with it, Go experiment, see what you

385
00:18:29.319 --> 00:18:29.839
<v Speaker 1>can create.
