WEBVTT 1 00:00:00.040 --> 00:00:02.319 Okay, let's untack this. What if you could take the 2 00:00:02.439 --> 00:00:07.200 raw power behind tools like chat GPT and mold it 3 00:00:07.240 --> 00:00:10.160 to your exact needs. We're talking about going beyond just 4 00:00:10.240 --> 00:00:12.759 you know, chatting with an AI to actually building intelligent 5 00:00:12.759 --> 00:00:15.359 applications and specialized assistants. 6 00:00:14.800 --> 00:00:18.120 Exactly Today, our deep dive is all about the open 7 00:00:18.160 --> 00:00:22.800 Ai API, focusing on how it empowers well anyone really 8 00:00:23.120 --> 00:00:26.160 to create custom AI solutions. Our main source for this 9 00:00:26.320 --> 00:00:30.320 is Henry Habib's Open Ai API Cookbook, which Packed Publishing 10 00:00:30.359 --> 00:00:31.879 put out in March twenty twenty four. 11 00:00:32.000 --> 00:00:34.280 And Henry Habib he knows his stuff over a decade 12 00:00:34.359 --> 00:00:37.439 and AI and productivity right, and he's a big believer 13 00:00:37.520 --> 00:00:40.679 in this citizen developer idea that you don't need to 14 00:00:40.679 --> 00:00:43.520 be like a hardcore coder to build amazing things. He's 15 00:00:43.520 --> 00:00:45.920 also the guy behind the Intelligent Worker newsletter. 16 00:00:45.560 --> 00:00:48.359 That's right, and Sam Mackay, the CEO of Enterprise DNA. 17 00:00:48.439 --> 00:00:50.799 He actually calls the book an essential guide for knowledge 18 00:00:50.799 --> 00:00:53.840 workers eager to harness the power of open AI and 19 00:00:53.920 --> 00:00:58.280 chat GPT to build intelligent applications and solutions. High praise. 20 00:00:58.560 --> 00:01:01.560 So your mission for this, a deep Dive listener, is simple, 21 00:01:01.799 --> 00:01:05.560 get a shortcut really understand how to use the open AIAPI. 22 00:01:05.879 --> 00:01:09.079 We're focusing on the practical stuff, those real aha moments. 23 00:01:09.840 --> 00:01:13.040 You've heard of AI chat GPT. They're everywhere, constantly talked about. 24 00:01:13.239 --> 00:01:15.760 But what's cool here is how actionable they are. We're 25 00:01:15.760 --> 00:01:17.920 going to show you how to turn ideas into reality. 26 00:01:18.280 --> 00:01:21.319 Let's start with the basics. Why the API matters. It's 27 00:01:21.359 --> 00:01:24.000 more than just the chat box you see online. I mean, 28 00:01:24.079 --> 00:01:27.159 chat GPT's growth was just insane, wasn't it. One hundred 29 00:01:27.200 --> 00:01:30.599 million users in two months. That's faster than TikTok, which 30 00:01:30.640 --> 00:01:34.120 took what nine months. It really brought natural language processing 31 00:01:34.239 --> 00:01:36.000 NLPE to the masses. 32 00:01:35.560 --> 00:01:39.319 Absolutely, and the API it takes that democratization way further. 33 00:01:39.400 --> 00:01:42.400 It's a genuine paradigm shift. It means anyone can generate 34 00:01:42.439 --> 00:01:45.480 really human like text from simple prompts. You don't need 35 00:01:45.519 --> 00:01:47.400 a PhD in machine learning anymore. It's not just for 36 00:01:47.439 --> 00:01:49.959 the big players like Typeface or Jasper Ai building on 37 00:01:50.000 --> 00:01:52.359 top of it. It's for you integrating that power into 38 00:01:52.359 --> 00:01:53.079 your own stuff. 39 00:01:53.280 --> 00:01:56.159 And the open II Playground is kind of the perfect 40 00:01:56.200 --> 00:01:58.760 place to start messing around. Yeah, like a sandbox. It's 41 00:01:58.760 --> 00:02:01.799 got three main parts the system message, the chat log, 42 00:02:01.879 --> 00:02:04.319 and then the parameters. The system message is where you 43 00:02:04.400 --> 00:02:08.159 tell the AI who it should be, like you are 44 00:02:08.240 --> 00:02:11.840 an assistant that creates marketing slogans, simple as that shapes 45 00:02:11.840 --> 00:02:13.199 its whole persona right. 46 00:02:13.039 --> 00:02:16.080 And it's fascinating because the model isn't understanding like we 47 00:02:16.120 --> 00:02:18.520 do no thoughts, no feelings. Think of it more like 48 00:02:19.719 --> 00:02:23.879 super advanced autocomplete. It predicts the next word based on 49 00:02:24.000 --> 00:02:26.759 patterns from tons of text data. So you put examples 50 00:02:26.800 --> 00:02:29.280 in the chat log, say you give it company makes 51 00:02:29.319 --> 00:02:31.840 ice cream, and then that apply sham the ice cream 52 00:02:31.840 --> 00:02:34.960 that never melts. You're guiding those predictions. You're kind of 53 00:02:34.960 --> 00:02:37.560 training it right there to follow patterns like starting with 54 00:02:37.599 --> 00:02:39.599 sham and ending with an exclamation. 55 00:02:39.120 --> 00:02:43.159 Mark that makes sense, guiding the probabilities. Okay, so once 56 00:02:43.159 --> 00:02:45.520 you've got your prompts working well in the playground, you 57 00:02:45.560 --> 00:02:48.560 move on to making real API requests, maybe using something 58 00:02:48.599 --> 00:02:51.400 like postmam. And this is where it gets really powerful 59 00:02:51.439 --> 00:02:53.840 because you're not just watching it work, you're controlling it 60 00:02:53.879 --> 00:02:57.039 with code programmatically. And for an API request, there are 61 00:02:57.080 --> 00:02:59.240 like four main things you need. Right First is the 62 00:02:59.360 --> 00:03:02.159 endpoint that's URL, the address you're sending the request to 63 00:03:02.280 --> 00:03:05.599 like https, dot API, dot openI, dot com, forward slash 64 00:03:05.719 --> 00:03:07.599 v one chat completions exactly. 65 00:03:08.039 --> 00:03:12.439 Then there's the header. Think of this as containing important metadata. 66 00:03:12.560 --> 00:03:15.719 It tells open Ai what you're sending, usually content type 67 00:03:15.719 --> 00:03:18.400 dot application JSON because Jason is just a standard way 68 00:03:18.439 --> 00:03:21.639 for systems to swap structured data. And critically, it says 69 00:03:21.680 --> 00:03:25.039 who you are with your authorization bearer, your API key. 70 00:03:25.199 --> 00:03:27.000 That's your secret handshake with open Ai. 71 00:03:27.360 --> 00:03:29.680 Okay, So header is like the envelope details, and the 72 00:03:29.719 --> 00:03:31.159 body is what's inside the envelope. 73 00:03:31.159 --> 00:03:33.719 Correct. A body is a Jason object. It holds the 74 00:03:33.759 --> 00:03:36.000 specifics like which model you want to use, and the 75 00:03:36.039 --> 00:03:39.639 messages that's your system message and chat log content. And 76 00:03:39.719 --> 00:03:42.199 finally you get the response back from open Ai. That's 77 00:03:42.240 --> 00:03:46.240 also Jason containing the AI's output. It's choices and usage 78 00:03:46.319 --> 00:03:47.960 data like how many tokens you used? 79 00:03:48.080 --> 00:03:50.919 Cool? But okay, let's break out of just text. The 80 00:03:50.960 --> 00:03:53.759 open Ai API can do more than just words, can't it? 81 00:03:54.039 --> 00:03:55.319 Multimodal stuff? Oh? 82 00:03:55.360 --> 00:03:59.199 Absolutely, Beyond text. You've got image generation with Dally. The 83 00:03:59.280 --> 00:04:03.000 newer versions Dally two and three use this technique called diffusion. 84 00:04:03.039 --> 00:04:05.039 You can kind of picture it like starting with TV 85 00:04:05.120 --> 00:04:07.560 static and slowly clearing it up until an image appears. 86 00:04:07.560 --> 00:04:10.719 It's pretty neat. But the key with images, unlike text maybe, 87 00:04:10.800 --> 00:04:13.000 is you have to be super specific in your prompts. 88 00:04:13.240 --> 00:04:15.759 Just saying a dog gets you, well, a random dog, 89 00:04:16.040 --> 00:04:18.839 but a brown, furry, medium sized CORKI doog on a 90 00:04:18.839 --> 00:04:22.199 green grass field profile view that gets you much closer 91 00:04:22.199 --> 00:04:24.600 to what you actually want. It raises an interesting point. 92 00:04:24.639 --> 00:04:27.959 Text generation can infer context sometimes, but image generation it 93 00:04:28.000 --> 00:04:32.240 needs precise descriptive language. Ambiguity is your enemy here. 94 00:04:32.319 --> 00:04:34.439 Good point need to be crystal clear. And it does 95 00:04:34.480 --> 00:04:35.800 audio too. Transcripts. 96 00:04:35.920 --> 00:04:38.240 Yeah, the audio endpoint uses the Whisper model for that. 97 00:04:38.600 --> 00:04:40.000 It transcribes audio files. 98 00:04:40.079 --> 00:04:43.240 Ah and technically for file uploads you need to use 99 00:04:43.279 --> 00:04:45.040 form data instead of JSON in. 100 00:04:45.000 --> 00:04:48.199 The request right exactly. Jason is great for text data, 101 00:04:48.240 --> 00:04:51.000 but form data is built for sending files, kind of 102 00:04:51.000 --> 00:04:53.360 like attaching something to an email. It handles lots of 103 00:04:53.399 --> 00:04:57.000 formats dot MP three, dot MP four, dot MPEG, dotwave, 104 00:04:57.120 --> 00:04:59.680 dot web, dot WebM quite a few. 105 00:04:59.519 --> 00:05:02.279 So you could transcribe a meeting maybe easily. 106 00:05:02.560 --> 00:05:04.839 And the real magic starts when you chain these things together. 107 00:05:05.160 --> 00:05:08.439 Imagine a voice assistant. Voice comes in whisper transcribes it, 108 00:05:08.680 --> 00:05:12.000 chat Api figures out a response, maybe Dali even generates 109 00:05:12.079 --> 00:05:12.759 relevant image. 110 00:05:12.800 --> 00:05:15.959 Okay, that's starting to sound really powerful. Now, let's talk 111 00:05:15.959 --> 00:05:18.839 about fine tuning the dials and knobs as you called 112 00:05:18.879 --> 00:05:19.399 them in the book. 113 00:05:19.480 --> 00:05:23.000 The parameters, right, The parameters let you control the AI's behavior, 114 00:05:23.319 --> 00:05:25.480 and the model parameter is probably the biggest one. Usually 115 00:05:25.519 --> 00:05:28.279 you're choosing between GPT three point five and GPT four. 116 00:05:28.480 --> 00:05:31.040 GPT three point five has what one hundred and seventy 117 00:05:31.079 --> 00:05:34.680 five billion parameters. GPT four is estimated to be way larger, 118 00:05:34.720 --> 00:05:37.480 maybe over one hundred trillion parameters across a bunch of 119 00:05:37.519 --> 00:05:40.560 models working together. More parameters generally means the model is 120 00:05:40.600 --> 00:05:44.199 better at capturing subtle patterns and understanding complex instructions. So 121 00:05:44.279 --> 00:05:46.879 GPT four tends to be more reliable, better with nuance. 122 00:05:47.160 --> 00:05:51.399 It actually scores higher on things like standardized tests, EP calculus. 123 00:05:50.839 --> 00:05:53.040 The lsat Wow, and you can see that difference in 124 00:05:53.040 --> 00:05:55.120 the outputs. Can't you like that example in the book 125 00:05:55.279 --> 00:05:58.279 asking for a sentence about Mars with six five letter words, 126 00:05:58.560 --> 00:06:01.399 GPT three point five up the word count right, It 127 00:06:01.439 --> 00:06:04.319 gives our Mars strip felt vast, new, cold. 128 00:06:04.000 --> 00:06:06.360 Hard, grand grand isn't five letters. 129 00:06:06.120 --> 00:06:09.000 Exactly If GPT four gets it, Mars Red World, Brave Crew, 130 00:06:09.079 --> 00:06:12.000 Deep Space finds life. Perfect for the cigarette question how 131 00:06:12.000 --> 00:06:14.600 many chemicals? How many harmful? How many cause cancer? Just 132 00:06:14.680 --> 00:06:17.639 the numbers. GPT three point five gives you a paragraph. 133 00:06:17.839 --> 00:06:22.160 GPT four just answers two hundred and fifty sixty concise 134 00:06:22.319 --> 00:06:25.759 even logic puzzles. GPT four tends to reason more accurately 135 00:06:25.800 --> 00:06:28.079 than three point five, and GPT four has a bigger 136 00:06:28.160 --> 00:06:30.680 memory too. The context win much bigger. 137 00:06:30.759 --> 00:06:33.800 Like GPT four thirty two k can handle around thirty 138 00:06:33.800 --> 00:06:36.959 two thousand tokens maybe twenty four thousand words. GPT three 139 00:06:37.000 --> 00:06:39.480 point five max is out around four thousand tokens about 140 00:06:39.480 --> 00:06:43.160 three thousand words. Big difference if you're feeding it long documents. 141 00:06:43.160 --> 00:06:45.199 Okay, but there's a catch, isn't there cost? 142 00:06:45.519 --> 00:06:49.040 Huge catch? GPT four can be twenty to forty times 143 00:06:49.079 --> 00:06:52.680 more expensive per token than GPT three point five. It's significant. 144 00:06:52.879 --> 00:06:55.319 So the practical advice for you is always start with 145 00:06:55.319 --> 00:06:57.959 GPT three point five. If it does the job great, 146 00:06:58.040 --> 00:07:00.600 you save a lot of money. Only upgraded GPT four 147 00:07:00.639 --> 00:07:03.439 if you absolutely need that extra reasoning power or the 148 00:07:03.560 --> 00:07:04.680 larger context window. 149 00:07:04.839 --> 00:07:06.959 That's a massive cost difference. Why is it so much 150 00:07:07.000 --> 00:07:08.360 more just the size. 151 00:07:08.000 --> 00:07:11.560 Primarily, Yeah, it's a much bigger, more complex model. Just 152 00:07:11.560 --> 00:07:14.240 takes way more computing power to run each request. Think 153 00:07:14.319 --> 00:07:15.879 supercomputer versus calculator. 154 00:07:15.959 --> 00:07:18.839 Gotcha? Okay, another parameter dot N that controls how many 155 00:07:18.839 --> 00:07:20.000 answers you get back right. 156 00:07:20.240 --> 00:07:22.360 N sets the number of responses can be any whole 157 00:07:22.399 --> 00:07:25.160 number for chat, but max ten for images. Super useful 158 00:07:25.160 --> 00:07:28.319 for brainstormings, logans, getting different options, or for checking consistency, 159 00:07:28.360 --> 00:07:29.639 maybe ab testing outputs. 160 00:07:29.720 --> 00:07:31.920 And the interesting thing you mentioned is the cost isn't 161 00:07:31.959 --> 00:07:34.399 linear like N three isn't three times the price? 162 00:07:34.480 --> 00:07:37.399 No, it's often much less, maybe sixty percent more, not 163 00:07:37.439 --> 00:07:41.079 two hundred percent more, which tells you something cool. The 164 00:07:41.120 --> 00:07:44.399 AI isn't just running the request three times separately. It's 165 00:07:44.560 --> 00:07:48.439 likely batching the computation somehow finding efficiencies. It's an optimization 166 00:07:48.560 --> 00:07:49.319 hint clever. 167 00:07:49.600 --> 00:07:52.480 Okay, what about temperature? That one sounds a bit abstract. 168 00:07:52.839 --> 00:07:53.920 Controls creativity. 169 00:07:54.000 --> 00:07:57.279 Yeah, temperature basically controls the randomness or let's say, creativity 170 00:07:57.279 --> 00:07:59.399 of the output. It goes from point zero to two 171 00:07:59.439 --> 00:08:01.879 point zero th of it, like tuning a radio. Low 172 00:08:01.879 --> 00:08:04.560 temperature maybe twoint zero too point eight is like a sharp, 173 00:08:04.759 --> 00:08:09.680 clear signal, very focused, consistent factual responses. Good for things 174 00:08:09.680 --> 00:08:13.360 like code generation data analysis where you want deterministic. 175 00:08:12.759 --> 00:08:16.800 Output and higher temperature more static, more like an eclectic 176 00:08:16.839 --> 00:08:17.319 mix station. 177 00:08:17.439 --> 00:08:19.920 Yeah, higher temps, say one point two to two point zero, 178 00:08:20.120 --> 00:08:22.279 make the AI take more risks with word choices. It 179 00:08:22.319 --> 00:08:24.519 flattens the probability curve for the next word, so you 180 00:08:24.519 --> 00:08:28.000 get more diverse, unexpected sometimes more creative results. Great for brainstorming, 181 00:08:28.040 --> 00:08:29.920 writing stories, generating slogans. 182 00:08:30.240 --> 00:08:32.600 So for general use like a chatbot, maybe somewhere in 183 00:08:32.600 --> 00:08:35.159 the middle point eight to one point two exactly. 184 00:08:35.320 --> 00:08:37.320 Balance is making sense with being interesting. 185 00:08:37.759 --> 00:08:39.879 So the advice is start around one point zero and 186 00:08:40.000 --> 00:08:42.879 tweak it by like zero point two increments. 187 00:08:43.200 --> 00:08:45.840 That's a good practical approach. Yeah, see what works for 188 00:08:45.879 --> 00:08:46.679 your specific need. 189 00:08:46.799 --> 00:08:51.840 Okay, makes sense. Now let's shift gears to building real applications. 190 00:08:52.320 --> 00:08:54.759 Usually you don't just have your app talk directly to 191 00:08:54.799 --> 00:08:57.559 open AI, right, there's often a back end layer in between. 192 00:08:57.840 --> 00:09:01.000 That's right. The typical flow is from tend what the 193 00:09:01.080 --> 00:09:03.320 user sees talks to your back end, and your back 194 00:09:03.399 --> 00:09:06.679 end talks to the open AIAPI. This back end layer 195 00:09:06.759 --> 00:09:10.879 is crucial. First security, it keeps your precious API key 196 00:09:11.000 --> 00:09:15.519 safe hidden from the user's browser. Second control, you can 197 00:09:15.519 --> 00:09:18.120 process the input before it goes to open AI or 198 00:09:18.159 --> 00:09:20.559 clean up the output after it comes back. Plus it 199 00:09:20.600 --> 00:09:23.480 lets you integrate other services, hand logins, all that stuff. 200 00:09:23.519 --> 00:09:26.120 And for that back end, serverless options like Google Cloud 201 00:09:26.159 --> 00:09:27.720 functions are pretty popular. 202 00:09:28.039 --> 00:09:30.679 Very popular, yeah, because you don't have to manage servers. 203 00:09:30.679 --> 00:09:34.320 It just scales automatically. You write your code, upload it, 204 00:09:34.679 --> 00:09:37.519 and Google handles the rest. You set up an HTTP 205 00:09:37.679 --> 00:09:39.519 trigger so it could be called like a web address. 206 00:09:39.919 --> 00:09:43.759 Allow unauthenticated calls maybe for testing, but be careful in 207 00:09:43.759 --> 00:09:46.120 production and define your entry point function. 208 00:09:46.440 --> 00:09:49.240 And then for the front end the user interface. You 209 00:09:49.279 --> 00:09:52.480 can use no code tools like Bubble, so anyone can 210 00:09:52.519 --> 00:09:53.960 build the app part exactly. 211 00:09:54.159 --> 00:09:57.480 Bubble lets you visually design your web app and connect 212 00:09:57.480 --> 00:10:00.759 buttons and inputs directly to your back end cloud function. 213 00:10:01.000 --> 00:10:02.519 It's incredibly empowering. 214 00:10:02.840 --> 00:10:05.240 Let's walk through an example, like that email reply wrapper 215 00:10:05.279 --> 00:10:07.240 from the book. You could do it in chat GPT, sure, 216 00:10:07.279 --> 00:10:10.159 but building it yourself really teaches you the whole process. 217 00:10:10.440 --> 00:10:12.799 So you start in the playground testing proms, get the 218 00:10:12.799 --> 00:10:15.240 Python code, then you put that logic into a Google 219 00:10:15.240 --> 00:10:17.519 Cloud function that's your back end. It takes the email 220 00:10:17.559 --> 00:10:20.360 text as input, adds your API key. Secretly, you'd tell 221 00:10:20.399 --> 00:10:23.200 it to use say GPT four, maybe a higher temperature 222 00:10:23.240 --> 00:10:25.960 like one point four for creator replies, set N three 223 00:10:26.039 --> 00:10:28.519 to get three options, maybe limit topens to five to one. 224 00:10:28.480 --> 00:10:30.960 Right, and then you'd use Postman to test that cloud 225 00:10:31.000 --> 00:10:34.320 function directly, make sure it actually returns three email replies 226 00:10:34.360 --> 00:10:36.840 in the format you expect. Once that's working, you jump 227 00:10:36.840 --> 00:10:39.879 into Bubble. You build the input box for the original email, 228 00:10:40.120 --> 00:10:42.759 a button to generate replies, and maybe three textboxes to 229 00:10:42.759 --> 00:10:46.480 display choice one, choice two, choice three. Use bubbles API 230 00:10:46.559 --> 00:10:49.279 connector to link the button press to your cloud function 231 00:10:49.639 --> 00:10:53.720 URL and display the return choices. And really understanding this 232 00:10:53.759 --> 00:10:58.279 whole playground, Cloud function, Postman, Bubble. That's the fundamental pattern. 233 00:10:58.639 --> 00:11:01.360 Master this and you can pretty much any intelligent app. 234 00:11:01.720 --> 00:11:04.080 That's a great point. It's the core loop. What's a 235 00:11:04.120 --> 00:11:06.799 common sticking point when people first try this? Getting the 236 00:11:06.879 --> 00:11:08.799 data flow right often. 237 00:11:08.639 --> 00:11:11.759 Yeah, getting the JSON right in the requests and responses, 238 00:11:12.159 --> 00:11:15.799 making sure API keys are correct and secure, little syntax things. 239 00:11:15.879 --> 00:11:18.440 Postwind really helps debug that before you even touch the frontend. 240 00:11:18.559 --> 00:11:21.200 Okay, so that's a solid foundation. But let's get to 241 00:11:21.240 --> 00:11:23.279 something really cool, something you can't just do in the 242 00:11:23.320 --> 00:11:27.960 standard chat GPT interface easily. The multimodal travel itinerary app. 243 00:11:28.080 --> 00:11:29.399 That sounds awesome. 244 00:11:30.039 --> 00:11:33.600 It really shows the power of orchestrating multiple API calls. 245 00:11:33.919 --> 00:11:38.399 The idea user toxicity gets back a detailed one day 246 00:11:38.399 --> 00:11:42.720 plan morning, afternoon, evening activities and three AI generated images 247 00:11:42.759 --> 00:11:43.799 matching those activities. 248 00:11:43.919 --> 00:11:46.600 Wow, okay, how does that work behind the scenes in 249 00:11:46.639 --> 00:11:47.399 the cloud function. 250 00:11:47.840 --> 00:11:51.039 So first, because this involves multiple calls, including image generation, 251 00:11:51.120 --> 00:11:53.399 which can be slow, you need to increase the cloud 252 00:11:53.399 --> 00:11:56.960 function's timeout limit maybe to three hundred seconds five minutes, 253 00:11:57.240 --> 00:11:57.879 just to be safe. 254 00:11:57.960 --> 00:11:59.000 Good practical tip. 255 00:11:59.279 --> 00:12:03.120 Then one uber one uses the chat api GPT four. Specifically, 256 00:12:03.159 --> 00:12:05.360 it takes the city name. Crucially, you give it a 257 00:12:05.399 --> 00:12:08.559 detailed chat log with examples what the book calls fu 258 00:12:08.559 --> 00:12:11.720 shot prompting. You showed examples for Rome, Lisbon, et cetera. 259 00:12:11.840 --> 00:12:15.080 Format it exactly how you want warning activity, afternoon activity, 260 00:12:15.120 --> 00:12:18.039 evening activity. This force is GPT four to follow that 261 00:12:18.080 --> 00:12:21.120 structure precisely. It stores the resulting itinerary text. 262 00:12:21.279 --> 00:12:24.080 Got it. So the structure comes from good prompting and examples. 263 00:12:24.080 --> 00:12:25.480 How do the images get generated? 264 00:12:25.679 --> 00:12:28.879 That's call number two, also chat API, but this time 265 00:12:29.000 --> 00:12:31.720 using GPT three point five Turbo one one oh six. 266 00:12:32.320 --> 00:12:34.639 Its only job is to take the itinerary text from 267 00:12:34.639 --> 00:12:39.399 call one and create three short descriptive prompts suitable for DELI. 268 00:12:40.080 --> 00:12:43.679 Like if the itinerary mentioned the Colisseum, Vatican and Trevy Fountain, 269 00:12:44.039 --> 00:12:48.080 it might output Colisseum and Rome, Vatican City Interior, Trevy 270 00:12:48.080 --> 00:12:51.120 Fountain at night. Just the prompts separated by a pipe symbol. 271 00:12:51.279 --> 00:12:53.759 Ah. And you use GPT three point five here because 272 00:12:53.759 --> 00:12:55.759 it's cheaper and the task is simple. It doesn't need 273 00:12:55.840 --> 00:12:57.799 GPT four's nuance exactly. 274 00:12:58.159 --> 00:13:00.960 The user never sees this intermediate p output, only the 275 00:13:00.960 --> 00:13:03.759 final images, so three point five is perfectly adequate and 276 00:13:03.840 --> 00:13:05.799 much more cost effective for this specific step. 277 00:13:06.000 --> 00:13:08.679 Smart resource use nice optimization. Okay, So now you have 278 00:13:08.720 --> 00:13:10.759 the itinerary text and three image prompts. 279 00:13:10.919 --> 00:13:13.720 Right, So call number three hits the images API using 280 00:13:13.759 --> 00:13:16.840 DELI THII. Your code loops through the three prompts from 281 00:13:16.840 --> 00:13:19.120 call too, making a separate API call for each one 282 00:13:19.159 --> 00:13:21.320 to generated image. It collects the URLs of the three 283 00:13:21.360 --> 00:13:24.840 generated images image rolls Finally, the cloud function bundles everything 284 00:13:24.919 --> 00:13:27.919 up and returns a single Jason response containing the itinerary 285 00:13:27.960 --> 00:13:31.000 text and the URLs for morning image, afternoon image, and 286 00:13:31.120 --> 00:13:31.919 evening image. 287 00:13:31.960 --> 00:13:34.480 And then in bubble you just connect those pieces input 288 00:13:34.519 --> 00:13:37.519 for city button, a big text area for the itinerary, 289 00:13:37.519 --> 00:13:40.679 and three image elements. You map the JSON fields from 290 00:13:40.679 --> 00:13:43.919 the cloud function response directly to those elements. That's really slick, 291 00:13:44.120 --> 00:13:46.519 combining text and custom images on the fly like that. 292 00:13:47.360 --> 00:13:52.320 Very cool. Okay, let's switch tracks slightly. Building knowledge assistance 293 00:13:52.879 --> 00:13:56.039 this is huge. Standard chat GPT is great, but its 294 00:13:56.120 --> 00:13:58.120 knowledge is kind of frozen in time right, and it 295 00:13:58.120 --> 00:14:00.840 can sometimes just make stuff up hallocin. You can't easily 296 00:14:00.840 --> 00:14:03.399 to only use this specific document precisely. 297 00:14:03.639 --> 00:14:06.080 That's where building your own assistant comes in, using the 298 00:14:06.120 --> 00:14:10.240 API combined with your specific trusted knowledge source. A basic 299 00:14:10.279 --> 00:14:13.000 way to do this covered in the book is PDF analysis. 300 00:14:13.279 --> 00:14:15.879 Your app takes a PDF link and a question. The 301 00:14:15.919 --> 00:14:19.240 cloud function fetches the pdf, uses a library like pipdf 302 00:14:19.279 --> 00:14:21.320 two to scrabe all the text out of it. Then 303 00:14:21.399 --> 00:14:23.600 it stuffs that entire text into the prompt along with 304 00:14:23.600 --> 00:14:25.960 the user's question, and sends it off to GPT four 305 00:14:26.039 --> 00:14:26.919 so it just crams the. 306 00:14:26.919 --> 00:14:30.200 Whole PDF into the context window every single time. Yeah, coefficient, 307 00:14:30.360 --> 00:14:33.759 it can be. It works, but yeah, limitations. It only 308 00:14:33.759 --> 00:14:36.879 gets text, no images from the pdf. It struggles with 309 00:14:37.000 --> 00:14:40.600 really huge documents, and the biggest issue is that context 310 00:14:40.639 --> 00:14:44.279 window limit. If your PDF has more words then the 311 00:14:44.320 --> 00:14:47.879 model can handle like those three thousand words for GPP 312 00:14:48.039 --> 00:14:50.440 three point five or twenty four thousand for GPT four 313 00:14:50.519 --> 00:14:53.120 thirty two. K. It just won't work properly, right. 314 00:14:53.519 --> 00:14:55.759 But there's a better way now, isn't there with the 315 00:14:56.000 --> 00:14:57.840 newer assistance API. 316 00:14:58.039 --> 00:15:01.240 Oh yeah, the assistants APIs specifically with its built in 317 00:15:01.399 --> 00:15:03.759 knowledge retrieval tool, is a total. 318 00:15:03.519 --> 00:15:05.799 Game changer for this What makes it so different. 319 00:15:05.480 --> 00:15:09.559 It's incredibly smart. When you upload your documents, PDFs, word docs, etc. 320 00:15:10.039 --> 00:15:13.879 To an assistant with retrieval enabled, open AI automatically handles 321 00:15:13.919 --> 00:15:17.240 the hard parts. It breaks the documents into manageable chunks, 322 00:15:17.399 --> 00:15:20.720 creates embeddings for each chunk, those unique numerical fingerprints we 323 00:15:20.759 --> 00:15:23.759 talked about, and stores them efficiently. Then when you ask 324 00:15:23.759 --> 00:15:26.440 a question, it uses vector search to instantly find only 325 00:15:26.480 --> 00:15:29.320 the most relevant chunks of texts from your documents related 326 00:15:29.320 --> 00:15:29.879 to your question. 327 00:15:30.039 --> 00:15:31.919 So It doesn't read the whole document every time, It 328 00:15:32.000 --> 00:15:34.159 just finds the relevant paragraphs. 329 00:15:33.759 --> 00:15:38.759 Exactly, which means there's effectively no context window limit for 330 00:15:38.840 --> 00:15:42.639 your knowledge base. You can upload massive files or hundreds 331 00:15:42.639 --> 00:15:46.559 of documents and the assistant intelligently retrieves only the necessary 332 00:15:46.559 --> 00:15:49.720 snippets to answer the question. Incredibly efficient. 333 00:15:49.879 --> 00:15:51.960 That sounds amazing. How do you set that up? Still? 334 00:15:51.960 --> 00:15:54.120 Start in the playground, yep, The playground is great for 335 00:15:54.159 --> 00:15:57.159 creating the assistant itself. You give it a name US 336 00:15:57.240 --> 00:16:01.120 Constitution Expert Instructions answer questions based only on the provided 337 00:16:01.159 --> 00:16:04.919 constitution document. Choose a model like GPT four to eleven 338 00:16:05.000 --> 00:16:07.399 oh six Preview, which is good for this. Then the 339 00:16:07.440 --> 00:16:11.000 crucial step you toggle on the retrieval tool and then 340 00:16:11.039 --> 00:16:13.279 you upload your knowledge file like a PDF of the 341 00:16:13.360 --> 00:16:17.039 US Constitution. Once it's created, you grab the unique assistant ID. 342 00:16:17.279 --> 00:16:21.080 Okay, assistant created, knowledge uploaded. Then the cloud function code 343 00:16:21.279 --> 00:16:22.360 uses this assistant ID. 344 00:16:22.720 --> 00:16:25.879 Correct. The Python code for your cloud function becomes a 345 00:16:25.879 --> 00:16:29.960 bit different using the assistants API. First, you create a thread. 346 00:16:30.440 --> 00:16:34.039 Think of thread as a single conversation session. Then you 347 00:16:34.039 --> 00:16:37.799 add the user's question as a message to that thread. Next, 348 00:16:38.000 --> 00:16:40.720 you tell the assistant to run on that thread, providing 349 00:16:40.720 --> 00:16:43.480 the assistant ID and the thread ID. Now here's a 350 00:16:43.519 --> 00:16:45.799 key detail for the book's code. You need to wait 351 00:16:45.840 --> 00:16:49.399 a bit. The assistant needs time to process, search the 352 00:16:49.480 --> 00:16:52.720 knowledge and formulate the answer, so you might add a 353 00:16:52.799 --> 00:16:57.480 time dot sleep or similar pause. After the pause, you 354 00:16:57.559 --> 00:16:59.759 retrieve the list of messages from the thread and the 355 00:17:00.000 --> 00:17:01.799 assistem's answer will be the newest message. 356 00:17:01.840 --> 00:17:04.920 Okay, that pause is important. And the bubble front end 357 00:17:04.920 --> 00:17:07.720 for this probably simpler. 358 00:17:07.480 --> 00:17:09.680 Much simpler for this use case. Yeah, just an input 359 00:17:09.680 --> 00:17:12.319 boxer the user's question, a button and a text box 360 00:17:12.359 --> 00:17:14.359 to display the answer returned by the cloud function. 361 00:17:14.480 --> 00:17:16.400 And the result is you can ask specific questions like 362 00:17:16.480 --> 00:17:19.200 how many senators are there or what's the age requirement 363 00:17:19.240 --> 00:17:22.039 for a senator and it pulls the answer directly from 364 00:17:22.079 --> 00:17:25.000 that constitution pdf you uploaded exactly. 365 00:17:25.039 --> 00:17:28.720 It grounds the AI in your specific source material. It's 366 00:17:28.799 --> 00:17:34.359 incredibly powerful for legal teams, medical info, company knowledge bases, 367 00:17:34.920 --> 00:17:38.759 educational tools, anywhere you need reliable answers from a defined 368 00:17:38.799 --> 00:17:39.640 set of information. 369 00:17:39.920 --> 00:17:43.319 Wow, we've covered a lot, from just understanding the API 370 00:17:43.400 --> 00:17:46.759 basics to playing in the playground, making direct calls, adding 371 00:17:46.799 --> 00:17:50.599 images and audio. Then building actual apps with back ends 372 00:17:50.680 --> 00:17:55.000 and frontends, optimizing costs, and finally creating these powerful knowledgeable 373 00:17:55.000 --> 00:17:58.279 assistance tied to specific documents. You've really gone from just 374 00:17:58.400 --> 00:18:01.039 using chat GPT to understand how to build with its 375 00:18:01.119 --> 00:18:04.799 underlying power. You're equipped now to actually create things. 376 00:18:05.039 --> 00:18:06.960 Yeah, and it brings to mind something Paul Siegel, a 377 00:18:07.000 --> 00:18:10.240 tech entrepreneur, wrote in the forward to Henry's book, You said, Essentially, 378 00:18:10.480 --> 00:18:12.920 I strongly encourage you to use this knowledge to create 379 00:18:12.960 --> 00:18:16.319 your next successful app or business, or simply to enrich 380 00:18:16.359 --> 00:18:19.200 your thinking about how to innovate. Dream on it, then 381 00:18:19.440 --> 00:18:21.839 fashion your dreams into a reality with the tools you've 382 00:18:21.839 --> 00:18:23.880 gained here. I think that sums it up nicely. 383 00:18:24.160 --> 00:18:26.720 Great final thoughts, So the message is clear, don't just 384 00:18:26.799 --> 00:18:29.319 use AI, build with it, Go experiment, see what you 385 00:18:29.319 --> 00:18:29.839 can create.