WEBVTT 1 00:00:00.120 --> 00:00:03.720 Welcome to the deep dive. We take a whole stack 2 00:00:03.799 --> 00:00:06.919 of information, articles, research, our notes, and really try to 3 00:00:06.919 --> 00:00:08.359 pull out the key insights for you. 4 00:00:08.480 --> 00:00:10.839 Right, the goal is always to cut through that complexity, get 5 00:00:10.679 --> 00:00:12.919 to the useful stuff. Exactly, and help you unlock the 6 00:00:12.960 --> 00:00:17.079 power of these, well, cutting-edge tools. Today we're diving 7 00:00:17.160 --> 00:00:20.960 deep into prompt engineering for generative AI. 8 00:00:21.160 --> 00:00:23.280 It's a huge topic right now. It really 9 00:00:23.079 --> 00:00:26.359 is, and we're working from a fantastic resource today, the 10 00:00:26.399 --> 00:00:30.039 book Prompt Engineering for Generative AI by James Phoenix and 11 00:00:30.120 --> 00:00:33.640 Mike Taylor. It's been called a lighthouse in this sort 12 00:00:33.640 --> 00:00:35.280 of vast ocean of AI. 13 00:00:35.479 --> 00:00:37.159 That's a good way to put it. Yeah, this deep 14 00:00:37.200 --> 00:00:39.280 dive is really giving you a shortcut, a way to 15 00:00:39.399 --> 00:00:44.759 understand how to get reliable, high-quality results from AI models. 16 00:00:44.439 --> 00:00:46.880 Whether it's text or images, right? Exactly. 17 00:00:46.439 --> 00:00:49.159 Text or images. It's about, as the book says, kind of 18 00:00:49.200 --> 00:00:53.640 future-proofing your inputs for reliable AI outputs at scale. 19 00:00:53.799 --> 00:00:56.600 And that's so important because, wow, the pace of change 20 00:00:56.640 --> 00:01:01.039 with generative AI is just, it's breakneck speed. It's actually 21 00:01:01.039 --> 00:01:01.759 hard to keep up 22 00:01:01.719 --> 00:01:03.960 sometimes. It really is, every week something new. 23 00:01:04.280 --> 00:01:07.079 So let's start at the beginning. What is prompt engineering? 24 00:01:07.319 --> 00:01:08.680 Why does it matter so much?
25 00:01:08.760 --> 00:01:14.120 Okay, so at its core, prompt engineering is basically the 26 00:01:14.280 --> 00:01:18.159 art and, well, the science of crafting the right inputs, 27 00:01:18.439 --> 00:01:20.959 the prompts you give the AI to get the outputs 28 00:01:20.959 --> 00:01:24.760 you actually want. The instructions, essentially. Exactly, clear instructions. And 29 00:01:24.799 --> 00:01:27.640 it matters a lot because what you put in fundamentally 30 00:01:27.719 --> 00:01:30.680 changes the probability of every single word or pixel the 31 00:01:30.719 --> 00:01:32.000 AI generates 32 00:01:31.560 --> 00:01:34.120 next. Ah, probability. 33 00:01:34.200 --> 00:01:37.079 Yeah. Plus, you know, models like OpenAI's, they charge 34 00:01:37.079 --> 00:01:39.840 based on tokens used. Tokens are kind of like pieces 35 00:01:39.840 --> 00:01:40.760 of words. 36 00:01:40.599 --> 00:01:43.879 So the length and quality of your prompt directly impacts 37 00:01:43.959 --> 00:01:44.799 costs. Directly. 38 00:01:45.040 --> 00:01:48.239 So optimizing prompts isn't just about quality, it's crucial for 39 00:01:48.359 --> 00:01:53.000 cost and reliability too. Getting it right saves money and headaches. 40 00:01:53.079 --> 00:01:56.000 Okay, that makes total sense, like briefing someone properly before 41 00:01:56.040 --> 00:01:58.480 they start a task. So let's dive in. What are 42 00:01:58.519 --> 00:02:01.400 those foundational principles, the ones that work no matter which 43 00:02:01.439 --> 00:02:02.480 AI model you're using? 44 00:02:02.560 --> 00:02:05.159 Right, let's focus on three core principles. These really hold 45 00:02:05.239 --> 00:02:08.159 up over time. First one, and maybe the most common 46 00:02:08.240 --> 00:02:10.719 pitfall people run into, is you need to give direction. 47 00:02:11.000 --> 00:02:14.840 Be specific. Be specific. Brief the AI on exactly what 48 00:02:14.919 --> 00:02:17.759 you want it to do.
So instead of just saying "brainstorm 49 00:02:17.759 --> 00:02:19.879 product names for a shoe," which is 50 00:02:19.840 --> 00:02:21.599 okay, a bit vague, right? 51 00:02:21.520 --> 00:02:24.759 Vague. You'd get much better results adding context, like "brainstorm 52 00:02:24.759 --> 00:02:27.400 product names for a shoe that fits any foot size 53 00:02:27.719 --> 00:02:30.439 in the style of Steve Jobs," or, you know, like 54 00:02:30.560 --> 00:02:31.800 Elon Musk would 55 00:02:31.479 --> 00:02:34.439 name it. Ah, okay. Adding that constraint really narrows it 56 00:02:34.479 --> 00:02:36.240 down for the AI. Precisely. 57 00:02:36.400 --> 00:02:39.199 And the sources we looked at, they really emphasize that 58 00:02:39.319 --> 00:02:42.439 too little direction is the number one problem. That's why 59 00:02:42.479 --> 00:02:45.319 AI sometimes seems to, well, misunderstand you. 60 00:02:45.560 --> 00:02:48.680 Okay, so clear direction first. Once you've got that, what's 61 00:02:48.719 --> 00:02:51.120 the next key thing for getting predictable results? 62 00:02:51.560 --> 00:02:56.000 Next up is specify format. This is huge. AI models 63 00:02:56.039 --> 00:03:00.960 are incredible universal translators, not just between, say, French and English, 64 00:03:01.439 --> 00:03:05.199 but between data structures. Think JSON to YAML, or 65 00:03:05.240 --> 00:03:07.360 even just natural language to Python code. 66 00:03:07.400 --> 00:03:07.759 Wow. 67 00:03:07.879 --> 00:03:09.919 So it's really important to tell the AI what format 68 00:03:09.960 --> 00:03:11.919 you want the answer in. If you don't, especially if 69 00:03:11.919 --> 00:03:13.400 you're building software that relies on 70 00:03:13.360 --> 00:03:15.400 this. Yeah, I can see that. You
71 00:03:15.400 --> 00:03:17.919 might sometimes get a numbered list when you expected comma 72 00:03:17.960 --> 00:03:20.319 separated values or something like that, and that could just 73 00:03:20.360 --> 00:03:21.520 break your whole process. 74 00:03:21.719 --> 00:03:25.560 So specifying format prevents those kinds of errors. Can you 75 00:03:25.599 --> 00:03:26.879 ask for complex stuff? 76 00:03:27.159 --> 00:03:31.800 Absolutely. You can ask for really complex formats, like Mermaid 77 00:03:31.840 --> 00:03:35.800 syntax for generating flow diagrams. It's surprisingly capable. 78 00:03:35.879 --> 00:03:40.800 That's powerful, especially for developers. Okay, so direction, format. What's 79 00:03:40.840 --> 00:03:41.520 the third pillar? 80 00:03:41.800 --> 00:03:46.199 The third one is provide examples. Sometimes, honestly, it's just 81 00:03:46.360 --> 00:03:49.199 easier to show the AI what you like instead of 82 00:03:49.240 --> 00:03:50.960 trying to describe it perfectly. 83 00:03:50.560 --> 00:03:52.639 Like show, don't just tell. Exactly. 84 00:03:53.120 --> 00:03:55.879 This works really well if you're maybe not an expert 85 00:03:55.879 --> 00:03:58.560 in the specific domain yourself. Let's say you want product 86 00:03:58.639 --> 00:04:02.039 names, but in a very particular kind of quirky style. Okay, 87 00:04:02.039 --> 00:04:04.280 instead of trying to describe quirky, you just give examples 88 00:04:04.319 --> 00:04:07.400 like iBarFridge, iFridgeBeer, iTime. The AI sees 89 00:04:07.439 --> 00:04:09.639 that pattern immediately. Ah, 90 00:04:09.360 --> 00:04:11.520 I see, it learns the style from the examples. How 91 00:04:11.560 --> 00:04:12.680 many examples work best? 92 00:04:12.879 --> 00:04:15.840 Usually just adding one to three examples almost always helps. 93 00:04:15.840 --> 00:04:19.040 It gives the AI a much clearer target. You just 94 00:04:19.079 --> 00:04:21.040 need to be mindful of the token limits.
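The three principles discussed up to this point (give direction, specify format, provide examples) can be combined in one small prompt builder. The following is a minimal sketch, not code from the book: the function names `build_prompt` and `rough_token_estimate`, the "Product description / Product names" layout, and the example product descriptions are all assumptions made for illustration, and the four-characters-per-token figure is only a rough rule of thumb for English text.

```python
def build_prompt(direction, output_format, examples, new_input):
    """Combine direction, a format instruction, and few-shot examples into one prompt."""
    parts = [direction, f"Return the answer as {output_format}.", ""]
    # Each example shows the model an input and the kind of output we like.
    for example_input, example_output in examples:
        parts.append(f"Product description: {example_input}")
        parts.append(f"Product names: {example_output}")
        parts.append("")
    # The new case ends with an open slot for the model to complete.
    parts.append(f"Product description: {new_input}")
    parts.append("Product names:")
    return "\n".join(parts)


def rough_token_estimate(text):
    """Very rough heuristic: about 4 characters per token (an assumption)."""
    return len(text) // 4


# Hypothetical examples; the descriptions are invented for illustration.
examples = [
    ("A refrigerator that dispenses beer", "iBarFridge, iFridgeBeer"),
    ("A watch that tells the time", "iTime"),
]
prompt = build_prompt(
    direction="Brainstorm product names in the style of Steve Jobs.",
    output_format="a comma-separated list",
    examples=examples,
    new_input="A shoe that fits any foot size",
)
print(prompt)
print("Approximate tokens:", rough_token_estimate(prompt))
```

Keeping the examples in a list makes it easy to add or drop one and watch how the approximate token count changes.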
95 00:04:20.759 --> 00:04:22.680 Right, the character limits for the prompt. 96 00:04:22.480 --> 00:04:25.600 Yeah, like Midjourney, the image generator, takes about six 97 00:04:25.639 --> 00:04:29.160 thousand characters. Free ChatGPT is more like thirty two thousand, 98 00:04:29.519 --> 00:04:32.040 so you usually have space for a few good examples 99 00:04:32.040 --> 00:04:33.000 without any trouble. 100 00:04:33.319 --> 00:04:38.079 Okay, so give direction, specify format, provide examples. Those are 101 00:04:38.120 --> 00:04:41.199 the fundamentals. But let's level up for people wanting to 102 00:04:41.319 --> 00:04:44.319 use AI professionally. How do you get it to do 103 00:04:44.480 --> 00:04:49.360 more complex things, like generating structured data, transforming text, maybe 104 00:04:49.399 --> 00:04:50.680 even checking its own work? 105 00:04:50.800 --> 00:04:52.720 Absolutely, this is where it moves from, you know, just 106 00:04:52.839 --> 00:04:57.560 experimenting to really building things. Let's start with generating structured outputs. 107 00:04:57.560 --> 00:04:59.800 This goes way beyond simple lists. Okay. You can get 108 00:04:59.800 --> 00:05:03.120 the AI to generate really complex nested data structures. The 109 00:05:03.120 --> 00:05:07.560 book mentions things like hierarchical lists, JSON, YAML. Like 110 00:05:07.480 --> 00:05:10.240 creating a database-ready structure? Exactly. 111 00:05:10.480 --> 00:05:14.839 Imagine generating a detailed article outline perfectly formatted as a 112 00:05:14.920 --> 00:05:19.120 JSON payload, or taking a user's casual request and turning 113 00:05:19.120 --> 00:05:24.000 it into a structured YAML shopping list. The precision is incredible. 114 00:05:24.079 --> 00:05:27.120 Wow, does it always get it right, like perfectly valid 115 00:05:27.160 --> 00:05:28.000 JSON every time? 116 00:05:28.120 --> 00:05:31.800 Not always.
Language models can sometimes add extra conversational text 117 00:05:32.079 --> 00:05:35.680 or maybe generate slightly invalid JSON or YAML. But there 118 00:05:35.680 --> 00:05:38.319 are smart ways, strategies, to handle those kinds of edge 119 00:05:38.319 --> 00:05:39.360 cases in your code. 120 00:05:39.519 --> 00:05:41.319 Okay, good to know. And beyond JSON and YAML? 121 00:05:41.720 --> 00:05:44.480 Yeah, it can even generate things like mock CSV data, 122 00:05:44.920 --> 00:05:48.040 you know, a list of fake names, ages, grades, whatever 123 00:05:48.040 --> 00:05:50.720 you need, ready to use in spreadsheets or other tools. 124 00:05:50.959 --> 00:05:53.240 It's like having an instant data engineer. 125 00:05:53.040 --> 00:05:55.839 That is genuinely powerful for automation. Okay, so that's generating 126 00:05:55.839 --> 00:05:59.800 structured stuff, but what about working with text that already exists? Transforming it, 127 00:06:00.120 --> 00:06:02.160 simplifying it, analyzing it? Right, 128 00:06:02.120 --> 00:06:05.399 huge area. There are several really cool techniques here. One that's 129 00:06:05.439 --> 00:06:08.920 super popular and useful is "explain it like I'm five." 130 00:06:09.079 --> 00:06:13.240 ELI5? Yeah, exactly, ELI5. It's not just a gimmick. 131 00:06:13.720 --> 00:06:18.319 It's a seriously powerful way to take dense technical documents, 132 00:06:18.759 --> 00:06:22.720 think medical abstracts or complex legal text, and boil them 133 00:06:22.759 --> 00:06:27.480 down into language anyone can grasp. It really helps democratize information. 134 00:06:27.839 --> 00:06:28.759 That's fantastic. 135 00:06:28.879 --> 00:06:32.560 What else? Then there's universal translation. We mentioned language to language, 136 00:06:32.720 --> 00:06:36.279 but LLMs can also translate between coding languages, like Python 137 00:06:36.319 --> 00:06:39.040 to JavaScript or vice versa. They act as this amazing 138 00:06:39.040 --> 00:06:39.759 bridge.
139 00:06:39.639 --> 00:06:43.560 Across communication gaps? Okay, but what if the AI doesn't 140 00:06:43.560 --> 00:06:46.240 have enough information to give a good answer? Can it, 141 00:06:46.399 --> 00:06:47.920 like, ask for more detail? 142 00:06:48.120 --> 00:06:50.600 Yes, absolutely, you can teach it to ask for context. 143 00:06:51.079 --> 00:06:53.680 LLMs can function as sort of simple agents with some 144 00:06:53.800 --> 00:06:56.560 reasoning ability. You can actually prompt them to recognize when 145 00:06:56.560 --> 00:06:59.319 they lack info and then ask you clarifying questions. 146 00:06:59.360 --> 00:07:00.959 Oh, interesting. So it becomes more of a 147 00:07:00.920 --> 00:07:03.639 dialogue. Exactly. Like if you ask, "Should I use MongoDB 148 00:07:03.680 --> 00:07:07.079 or PostgreSQL?", a well-prompted GPT-4 might come 149 00:07:07.120 --> 00:07:09.040 back with, "Okay, to answer that, I need to know: 150 00:07:09.079 --> 00:07:11.399 what's your data structure like? What are your scalability needs? 151 00:07:11.480 --> 00:07:13.199 Do you need ACID compliance?" And so on. 152 00:07:13.439 --> 00:07:15.360 So it guides you to give it the info it 153 00:07:15.399 --> 00:07:16.959 needs for a better answer. 154 00:07:17.319 --> 00:07:20.759 It's smart. Very smart, turns it into an active problem solver. 155 00:07:21.360 --> 00:07:24.879 Another really neat one is text style unbundling. Unbundling? 156 00:07:25.120 --> 00:07:25.480 What's that? 157 00:07:25.839 --> 00:07:28.319 It means you can get the AI to analyze a 158 00:07:28.319 --> 00:07:33.399 piece of text and extract its specific stylistic features: the tone, sentence length, 159 00:07:33.600 --> 00:07:36.040 vocabulary choices, even the structure. 160 00:07:36.160 --> 00:07:38.399 Okay, and then what? Then you can 161 00:07:38.399 --> 00:07:42.040 use those extracted features
as a kind of style guide 162 00:07:42.199 --> 00:07:45.639 to generate new content that matches that original voice perfectly. 163 00:07:46.319 --> 00:07:49.240 Super useful for businesses wanting consistent brand messages. 164 00:07:49.319 --> 00:07:53.560 Ah, I see, maintaining a consistent voice across different pieces 165 00:07:53.560 --> 00:07:57.040 of content. Crucial for branding. Totally. Now, what about just 166 00:07:57.079 --> 00:08:00.519 dealing with huge amounts of text, like reading massive reports 167 00:08:00.600 --> 00:08:01.560 or research papers? 168 00:08:01.639 --> 00:08:05.439 That's where summarization and chunking come in. AI summarization is 169 00:08:05.519 --> 00:08:09.399 amazing for distilling information, but for really long documents you 170 00:08:09.480 --> 00:08:11.720 hit those context limits we talked about. Right, 171 00:08:11.600 --> 00:08:14.079 the AI can only remember so much text at once. 172 00:08:14.240 --> 00:08:19.439 Exactly, so chunking, just breaking the text into smaller, manageable pieces, 173 00:08:19.839 --> 00:08:23.680 is essential. It lets you process long documents, even ones 174 00:08:23.720 --> 00:08:26.480 covering multiple topics, without overwhelming the AI. 175 00:08:26.879 --> 00:08:28.800 How do you decide where to split the text? 176 00:08:28.959 --> 00:08:32.000 There are different ways. You can split by sentence, paragraph, 177 00:08:32.159 --> 00:08:35.120 sometimes by complexity, or just by length, or you can 178 00:08:35.120 --> 00:08:37.919 get really precise and split by the actual token count 179 00:08:38.279 --> 00:08:42.279 using specific tools, especially for models like OpenAI's. That ensures 180 00:08:42.320 --> 00:08:43.759 each chunk fits perfectly. 181 00:08:43.919 --> 00:08:46.679 Okay, smart ways to handle large inputs. But now we've 182 00:08:46.679 --> 00:08:49.159 generated all this output, how do we know if our 183 00:08:49.200 --> 00:08:51.720 prompts are actually any good?
How do we evaluate the 184 00:08:51.799 --> 00:08:54.039 quality rigorously? Great question. 185 00:08:54.480 --> 00:08:57.919 Evaluating prompt quality is key if you're serious. You can 186 00:08:57.960 --> 00:09:00.600 start simple, like with a thumbs up, thumbs down rating 187 00:09:00.639 --> 00:09:02.039 system. Adds a bit of rigor. 188 00:09:02.120 --> 00:09:03.600 Okay, basic feedback. 189 00:09:03.279 --> 00:09:06.000 But you can get much more sophisticated. Automated evaluation is 190 00:09:06.039 --> 00:09:08.879 totally possible. For instance, you could use a powerful model 191 00:09:08.919 --> 00:09:12.000 like GPT-4 to actually grade the responses from a 192 00:09:12.039 --> 00:09:13.840 less powerful model. AI 193 00:09:13.919 --> 00:09:17.440 evaluating AI, interesting. Using the best model to check the others. 194 00:09:17.759 --> 00:09:20.799 Yeah, and the book talks about proper A/B testing methods, 195 00:09:21.159 --> 00:09:23.600 often using tools like Jupyter notebooks. You can do things 196 00:09:23.639 --> 00:09:26.559 like shuffle the responses, so the human rater is blind to 197 00:09:26.600 --> 00:09:30.159 which prompt variation produced which output, avoiding bias. 198 00:09:30.320 --> 00:09:33.120 Proper scientific method, basically. Exactly. 199 00:09:33.159 --> 00:09:36.360 You can even compare prompt variations using metrics like embedding 200 00:09:36.399 --> 00:09:40.840 distance. That measures how semantically similar an AI's answer is 201 00:09:41.120 --> 00:09:43.720 to a known ground truth or perfect 202 00:09:43.399 --> 00:09:47.080 answer. So measuring how close it is in meaning. Right. 203 00:09:47.120 --> 00:09:50.879 The whole point is to iterate faster, more scientifically, and 204 00:09:51.000 --> 00:09:54.600 reduce the need for tons of slow, expensive manual review.
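Two of the evaluation ideas just described, blind shuffling for human raters and embedding distance against a reference answer, can be sketched in a few lines. This is an illustration under stated assumptions rather than the book's code: the helper names `blind_shuffle` and `cosine_distance` are made up here, and the short hand-written vectors stand in for real embeddings, which would come from an embedding model.

```python
import math
import random


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def blind_shuffle(responses, seed=None):
    """Shuffle (variant_label, text) pairs.

    Returns the texts to show the rater and a hidden key that maps each
    position back to the prompt variant that produced it.
    """
    rng = random.Random(seed)
    order = list(range(len(responses)))
    rng.shuffle(order)
    texts = [responses[i][1] for i in order]
    hidden_key = [responses[i][0] for i in order]
    return texts, hidden_key


# Hypothetical outputs from two prompt variants.
responses = [("prompt_a", "OneFit, UniShoe"), ("prompt_b", "The Shoe That Fits")]
texts, hidden_key = blind_shuffle(responses, seed=7)
print(texts)  # the rater sees only these, without the variant labels

# Toy 3-dimensional "embeddings" (real ones have hundreds of dimensions).
ground_truth = [0.9, 0.1, 0.0]
answer_a = [0.8, 0.2, 0.1]
answer_b = [0.1, 0.2, 0.9]
print(cosine_distance(ground_truth, answer_a) < cosine_distance(ground_truth, answer_b))
```

Ratings are recorded against positions, and the hidden key is consulted only afterwards, which is what keeps the rater blind to the variant.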
205 00:09:54.720 --> 00:09:57.399 It's incredible how fast this field is moving, not just 206 00:09:57.440 --> 00:10:00.799 the prompting techniques but the underlying AI models themselves, and 207 00:10:00.840 --> 00:10:03.200 the frameworks built on top of them. Feels like warp 208 00:10:03.200 --> 00:10:03.919 speed sometimes. 209 00:10:04.039 --> 00:10:07.519 Oh, absolutely, the pace of innovation is just staggering. If 210 00:10:07.519 --> 00:10:10.200 we take a brief history of text generation models, the 211 00:10:10.200 --> 00:10:13.799 big leap was the transformer architecture back around twenty seventeen. 212 00:10:13.440 --> 00:10:14.720 Right, that changed everything. 213 00:10:15.039 --> 00:10:17.919 It really did. It allowed models to connect words across long 214 00:10:17.960 --> 00:10:22.240 distances in text, boosting comprehension and efficiency. Then you had 215 00:10:22.240 --> 00:10:25.639 OpenAI's GPT series, GPT-2, GPT-3, 3.5 216 00:10:25.639 --> 00:10:29.240 Turbo, ChatGPT, now GPT-4, really pushing things 217 00:10:29.240 --> 00:10:30.039 into the public eye. 218 00:10:30.120 --> 00:10:33.279 GPT-3.5 Turbo and ChatGPT made it accessible. 219 00:10:33.360 --> 00:10:37.279 Yeah, 3.5 Turbo, especially with Microsoft's investment, brought 220 00:10:37.279 --> 00:10:41.120 better efficiency and lower costs, made LLMs practical for more people. 221 00:10:41.279 --> 00:10:45.240 And ChatGPT, fine-tuned for conversation, just exploded. Fastest 222 00:10:45.240 --> 00:10:48.799 growing app ever, apparently. And GPT-4? GPT-4, released 223 00:10:48.799 --> 00:10:51.759 in twenty twenty three, was another step change, excelling at 224 00:10:51.799 --> 00:10:55.080 complex stuff, scoring in the ninetieth percentile on the bar exam. 225 00:10:55.440 --> 00:10:58.919 It showed AI tackling really high-level analytical tasks. 226 00:10:59.200 --> 00:11:01.399 That's the sort of closed-source, big-company side.
What 227 00:11:01.440 --> 00:11:03.200 about the open source world? That seems to be moving 228 00:11:03.279 --> 00:11:04.200 just as fast. 229 00:11:03.960 --> 00:11:07.279 Totally. Meta's Llama series, Llama, Llama 2, Llama 3, takes 230 00:11:07.320 --> 00:11:09.559 a different path by being open source. That builds a 231 00:11:09.559 --> 00:11:10.720 whole community 232 00:11:10.279 --> 00:11:13.480 around it. Democratizing it, in a way. Exactly, and it 233 00:11:13.480 --> 00:11:16.519 allows for cool optimizations like quantization and LoRA. 234 00:11:17.080 --> 00:11:20.159 Those are techniques to basically shrink or specialize these huge 235 00:11:20.159 --> 00:11:22.840 models so you can run them on, like, a good home computer 236 00:11:22.879 --> 00:11:25.960 GPU. Makes them more accessible. Any other big open source 237 00:11:25.960 --> 00:11:26.799 players? Yeah, 238 00:11:26.720 --> 00:11:30.639 Mistral 7B from the French startup Mistral AI is 239 00:11:30.679 --> 00:11:33.279 getting a lot of buzz too, another really powerful open 240 00:11:33.279 --> 00:11:36.720 source option. So right now, GPT-4 probably leads on 241 00:11:36.879 --> 00:11:40.120 raw capability in many areas, but open source models like Llama 242 00:11:40.159 --> 00:11:43.080 and Mistral are super exciting, especially if you want to 243 00:11:43.120 --> 00:11:45.080 fine-tune a model for a very specific job. 244 00:11:45.320 --> 00:11:48.200 Okay, so we have these powerful models, open and closed source, 245 00:11:48.399 --> 00:11:51.960 but how do developers actually build applications with them, connect 246 00:11:52.000 --> 00:11:54.679 them to data, make them do things? Is there a 247 00:11:54.759 --> 00:11:56.240 standard toolkit? That's 248 00:11:56.080 --> 00:11:59.279 where frameworks like LangChain come in. It's become hugely popular.
249 00:11:59.399 --> 00:12:03.000 It's an open source framework, Python and TypeScript, designed specifically 250 00:12:03.399 --> 00:12:04.919 for building LLM applications. 251 00:12:04.960 --> 00:12:05.879 Oh, what's its main goal? 252 00:12:06.000 --> 00:12:10.679 Two core ideas: enhancing data awareness, connecting LLMs to external 253 00:12:10.759 --> 00:12:14.600 data they weren't trained on, and agency, giving LLMs the 254 00:12:14.639 --> 00:12:16.919 ability to take actions and influence their environment. 255 00:12:17.279 --> 00:12:21.080 Okay, data awareness and agency. How does it achieve that? 256 00:12:21.559 --> 00:12:25.399 Through modular building blocks. Things like Model I/O for interacting 257 00:12:25.399 --> 00:12:30.000 with different models, retrieval for fetching data, chains for sequencing operations, 258 00:12:30.200 --> 00:12:33.639 agents for decision making and tool use, memory for remembering 259 00:12:33.679 --> 00:12:36.759 past interactions, and callbacks for running code at certain points. 260 00:12:36.840 --> 00:12:39.600 Sounds comprehensive. Does it work with different AI providers? 261 00:12:39.759 --> 00:12:44.600 Yeah, it supports models from Anthropic, Google's Vertex AI, OpenAI, and others. 262 00:12:44.759 --> 00:12:48.159 Plus it handles practical stuff like streaming, getting words back 263 00:12:48.200 --> 00:12:51.159 one by one like ChatGPT does, and batching for 264 00:12:51.240 --> 00:12:52.919 running multiple requests in parallel. 265 00:12:53.240 --> 00:12:56.960 What about getting structured data out of the LLM's responses reliably? 266 00:12:57.120 --> 00:12:59.519 That's where LangChain's output parsers are key, especially the 267 00:12:59.559 --> 00:13:02.519 ones that use Pydantic, which is great for defining JSON structures. 268 00:13:02.840 --> 00:13:06.159 They help reliably turn the AI's natural language answer into 269 00:13:06.279 --> 00:13:09.480 clean structured data.
It essentially lets you build a flexible 270 00:13:09.519 --> 00:13:11.159 API on top of the LLM. 271 00:13:11.240 --> 00:13:14.360 And what about OpenAI's specific way for models to 272 00:13:14.399 --> 00:13:16.679 interact with external systems? Is that different? 273 00:13:17.080 --> 00:13:20.759 You're probably thinking of OpenAI function calling. It's their 274 00:13:20.840 --> 00:13:25.200 method for letting LLMs intelligently decide to call external functions. 275 00:13:25.720 --> 00:13:27.039 How does that work exactly? 276 00:13:27.240 --> 00:13:30.519 The LLM analyzes the conversation, figures out it needs to do 277 00:13:30.559 --> 00:13:34.159 something specific, like check the weather. It then outputs a 278 00:13:34.159 --> 00:13:37.519 structured JSON object saying call a check weather function with 279 00:13:37.600 --> 00:13:41.440 location London. Your system runs that function, gets the weather data, 280 00:13:41.720 --> 00:13:44.720 feeds it back into the conversation, and the LLM can 281 00:13:44.759 --> 00:13:46.320 then summarize it for the user. 282 00:13:46.519 --> 00:13:48.440 So it tells your code what function to run and 283 00:13:48.519 --> 00:13:49.360 with what arguments. 284 00:13:49.600 --> 00:13:52.960 Very neat. Very neat, very powerful for integrations. And for 285 00:13:53.039 --> 00:13:56.519 fine-tuning the output on specific tasks, especially new ones? 286 00:13:56.799 --> 00:13:59.919 That brings us back to few-shot learning. Remember providing 287 00:14:00.080 --> 00:14:01.120 examples in the prompt? 288 00:14:01.200 --> 00:14:03.480 Yeah, like the iBarFridge example. Exactly. 289 00:14:03.799 --> 00:14:07.360 While zero-shot relies just on the model's training, few- 290 00:14:07.360 --> 00:14:09.919 shot gives it those crucial examples right in the prompt. 291 00:14:10.679 --> 00:14:14.279 It helps optimize the model's behavior for exactly what you want.
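The function-calling round trip described a little earlier (the model emits a structured JSON call, your code runs the named function, and the result goes back into the conversation) can be sketched without any API at all. Everything here is a hypothetical stand-in: `check_weather` returns canned data, `model_output` is a hand-written string playing the role of the model's reply, and the `{"name": ..., "arguments": {...}}` shape is a simplification assumed for illustration, not the exact payload of any provider.

```python
import json


def check_weather(location):
    """Hypothetical stand-in for a real weather lookup; returns canned data."""
    return {"location": location, "forecast": "light rain", "temp_c": 12}


# Registry of functions the model is allowed to request by name.
AVAILABLE_FUNCTIONS = {"check_weather": check_weather}


def dispatch(model_output):
    """Parse the model's JSON call, run the function, and return the result as JSON."""
    call = json.loads(model_output)
    function = AVAILABLE_FUNCTIONS[call["name"]]  # KeyError if the name is unknown
    result = function(**call["arguments"])
    return json.dumps(result)


# Simulated model reply asking for the weather in London.
model_output = '{"name": "check_weather", "arguments": {"location": "London"}}'
function_result = dispatch(model_output)
print(function_result)  # this string would be fed back to the model to summarize
```

Looking functions up in an explicit registry, rather than evaluating whatever name the model returns, keeps the model limited to the calls you have chosen to expose.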
292 00:14:14.919 --> 00:14:17.240 It's like giving the AI a mini tutorial for the 293 00:14:17.320 --> 00:14:18.159 specific task. 294 00:14:18.480 --> 00:14:22.000 Does it still matter with models that have huge context windows now? 295 00:14:22.200 --> 00:14:25.440 Yeah, it often still helps, even with large context windows. 296 00:14:25.679 --> 00:14:27.840 A few good examples can guide the model to the 297 00:14:27.919 --> 00:14:31.720 right answer faster and more reliably, which can actually save 298 00:14:31.759 --> 00:14:34.720 you on API costs because you use fewer tokens overall 299 00:14:34.960 --> 00:14:36.080 to get the desired results. 300 00:14:36.120 --> 00:14:39.320 Okay, this is all incredibly powerful, but it raises a 301 00:14:39.320 --> 00:14:42.200 big question. How do we get these AI models to 302 00:14:42.279 --> 00:14:45.480 work securely and effectively with our data, our company knowledge, 303 00:14:45.519 --> 00:14:48.200 our specific documents? And how do we make them remember 304 00:14:48.320 --> 00:14:51.080 previous conversations? That seems vital for real 305 00:14:50.840 --> 00:14:54.360 world use. Absolutely vital. This is where connecting LLMs to 306 00:14:54.440 --> 00:14:58.440 your data and managing memory really unlocks their practical potential. 307 00:14:58.679 --> 00:15:01.440 Let's talk data connection and vector databases. Okay, so your 308 00:15:01.519 --> 00:15:04.000 organization's data, it comes in all shapes and sizes, right? 309 00:15:04.080 --> 00:15:08.159 Unstructured stuff like Google Docs, web pages, code, and structured 310 00:15:08.159 --> 00:15:11.200 stuff in SQL and NoSQL databases.
To let the AI 311 00:15:11.320 --> 00:15:14.799 query that unstructured data, the process usually involves loading it 312 00:15:14.840 --> 00:15:18.759 into what LangChain calls documents, then chunking them, breaking 313 00:15:18.799 --> 00:15:21.519 them into smaller pieces, and then storing these pieces in 314 00:15:21.559 --> 00:15:24.080 a special database called a vector database. 315 00:15:24.240 --> 00:15:26.080 Vector database, okay. What makes it special? 316 00:15:26.320 --> 00:15:30.480 It stores data based on meaning, using embeddings. Embeddings are 317 00:15:30.919 --> 00:15:36.120 numerical representations, vectors, of text. Models like OpenAI's 318 00:15:36.200 --> 00:15:40.039 text-embedding-ada-002 or open source ones from 319 00:15:40.120 --> 00:15:42.559 Hugging Face turn text into these 320 00:15:42.519 --> 00:15:46.000 vectors. So numbers that represent the meaning? Exactly. 321 00:15:45.840 --> 00:15:48.360 Text with similar meanings ends up closer together in this 322 00:15:48.399 --> 00:15:51.159 high-dimensional mathematical space. Think of it like a map 323 00:15:51.200 --> 00:15:51.879 of concepts. 324 00:15:52.159 --> 00:15:54.039 Is creating these embeddings expensive? 325 00:15:54.279 --> 00:15:57.799 Actually, OpenAI's are pretty cheap. The source mentioned embedding 326 00:15:57.840 --> 00:16:00.519 the entire King James Bible would cost something like a 327 00:16:00.559 --> 00:16:03.399 dollar sixty cents, and there are good open source options too. 328 00:16:03.399 --> 00:16:07.000 Okay, affordable. So these embeddings go into the vector database? 329 00:16:07.120 --> 00:16:10.639 Right.
Vector databases like FAISS, which is open source, or 330 00:16:10.679 --> 00:16:13.600 hosted ones like Pinecone or Chroma are built to 331 00:16:13.720 --> 00:16:17.120 store these vectors and search them based on semantic similarity, 332 00:16:17.200 --> 00:16:20.440 finding the vectors, and thus the original text chunks, that 333 00:16:20.480 --> 00:16:23.159 are closest in meaning to your query vector. 334 00:16:23.240 --> 00:16:25.799 And this whole process helps prevent the AI from just 335 00:16:25.919 --> 00:16:28.559 making things up, right? The hallucinations? Precisely. 336 00:16:28.960 --> 00:16:33.159 That leads us to retrieval augmented generation, or RAG. This 337 00:16:33.279 --> 00:16:36.320 is the key technique for fighting hallucinations and also getting 338 00:16:36.360 --> 00:16:38.159 around those context length limits. 339 00:16:38.279 --> 00:16:40.519 How does RAG work in practice? 340 00:16:40.639 --> 00:16:43.960 It's pretty elegant. A user asks a question. Your system 341 00:16:44.039 --> 00:16:47.679 first converts that question into an embedding vector. Then it 342 00:16:47.759 --> 00:16:51.000 searches your vector database for the text chunks whose embeddings 343 00:16:51.000 --> 00:16:52.120 are most similar. 344 00:16:52.159 --> 00:16:54.240 Finds the relevant bits of your own data. 345 00:16:54.360 --> 00:16:57.960 Exactly. It retrieves those relevant chunks and literally inserts them 346 00:16:58.000 --> 00:17:02.840 into the prompt you send to the LLM, providing explicit context. Then, crucially, 347 00:17:03.120 --> 00:17:06.319 you instruct the LLM to answer the user's question based 348 00:17:06.319 --> 00:17:07.960 only on the provided context. 349 00:17:08.079 --> 00:17:10.759 Ah. So you're forcing it to use your verified information, 350 00:17:11.160 --> 00:17:13.039 not just its general training data.
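The retrieval flow just described (embed the question, find the closest chunks, paste them into the prompt, and tell the model to answer only from that context) can be sketched end to end with the standard library. This is a toy under loud assumptions: `embed` uses plain word counts as a stand-in for a real embedding model, an in-memory list stands in for a vector database, and the function names and sample chunks are invented for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    query_vector = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine_similarity(query_vector, embed(c)), reverse=True)
    return ranked[:k]


def build_rag_prompt(question, chunks, k=2):
    """Insert the retrieved chunks into the prompt as explicit context."""
    context = "\n".join(f"- {chunk}" for chunk in retrieve(question, chunks, k))
    return (
        "Answer the question based only on the provided context.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical company knowledge, already split into chunks.
chunks = [
    "Refund policy: returns accepted within 30 days.",
    "Office hours: closed on public holidays.",
    "Shipping takes 3 to 5 business days.",
]
question = "What is the refund policy?"
rag_prompt = build_rag_prompt(question, chunks, k=1)
print(rag_prompt)
```

Word-count vectors only match on shared vocabulary, so this toy misses paraphrases that a learned embedding would catch; the ranking and prompt-assembly steps are the part that carries over.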
351 00:17:13.079 --> 00:17:16.240 Precisely, it lets you dynamically pull in specific, up to 352 00:17:16.319 --> 00:17:20.640 date knowledge, maybe chat history, specific PDF sections, product specs, 353 00:17:20.680 --> 00:17:24.359 ensuring the AI's answer is informed, relevant, and grounded in fact. 354 00:17:24.640 --> 00:17:28.119 That's huge for accuracy. Okay, so RAG gives it factual knowledge. 355 00:17:28.160 --> 00:17:30.599 What about memory, making it remember past parts of a 356 00:17:30.599 --> 00:17:32.640 conversation or user preferences over time? 357 00:17:32.799 --> 00:17:36.359 That's memory in LLMs, and it's crucial for making interactions 358 00:17:36.359 --> 00:17:40.519 feel natural and personalized. We can think about two types. First, 359 00:17:40.880 --> 00:17:42.319 short-term memory, 360 00:17:42.519 --> 00:17:46.200 STM. Like working memory? Kind of, yeah. It lets 361 00:17:46.000 --> 00:17:49.240 the LLM remember what was said earlier within the same interaction. 362 00:17:49.799 --> 00:17:53.319 Think of a support chatbot remembering your initial query when 363 00:17:53.359 --> 00:17:55.319 you ask a follow-up question minutes later. 364 00:17:56.039 --> 00:17:58.640 LangChain makes adding STM pretty straightforward. 365 00:17:58.720 --> 00:18:02.079 Okay, remembers the current chat. What about remembering things across 366 00:18:02.119 --> 00:18:04.680 different sessions, days or weeks later? 367 00:18:04.880 --> 00:18:08.279 That's long-term memory, LTM, and this is usually achieved 368 00:18:08.319 --> 00:18:11.319 by storing summaries of past conversations or key pieces of 369 00:18:11.319 --> 00:18:14.599 information in a vector database. When the user starts a 370 00:18:14.599 --> 00:18:18.839 new session, you retrieve relevant past interactions or preferences using 371 00:18:18.839 --> 00:18:22.039 similarity search and add that as context to the prompt. 372 00:18:22.160 --> 00:18:22.519 So it
373 00:18:22.440 --> 00:18:25.119 can remember my book preferences from last month when I 374 00:18:25.160 --> 00:18:26.920 ask for a new recommendation? Exactly 375 00:18:26.960 --> 00:18:31.200 that. It allows for truly personalized, context-aware interactions over time. 376 00:18:31.519 --> 00:18:34.160 This is where it starts to feel really intelligent, capable 377 00:18:34.200 --> 00:18:38.000 of complex tasks, which brings us to AI agents. What 378 00:18:38.079 --> 00:18:40.519 if the AI could not just think or retrieve info, 379 00:18:40.799 --> 00:18:44.039 but actually do things, take actions? Exactly. 380 00:18:44.200 --> 00:18:47.880 We're now in the realm of agent-based architecture. The 381 00:18:47.920 --> 00:18:52.880 AI acts, perceives, makes decisions to achieve goals. A key 382 00:18:52.960 --> 00:18:56.440 technique enabling this is chain of thought, CoT, reasoning. 383 00:18:56.519 --> 00:18:59.200 We touched on that, making the AI think step by 384 00:18:59.240 --> 00:18:59.960 step. Right, 385 00:19:00.000 --> 00:19:02.079 instead of just asking for, say, a marketing plan, 386 00:19:02.480 --> 00:19:04.960 you prompt the AI to first think through the steps. 387 00:19:05.160 --> 00:19:08.920 First, consider the target audience. Second, analyze the budget constraints. 388 00:19:09.400 --> 00:19:12.640 Third, research competitor products. Then outline the 389 00:19:12.559 --> 00:19:14.519 plan. Breaking down the problem. Yeah. 390 00:19:14.559 --> 00:19:17.319 It forces a more structured reasoning process, leading to much 391 00:19:17.359 --> 00:19:20.279 more relevant and well-thought-out responses than just asking 392 00:19:20.279 --> 00:19:22.680 for the final answer directly. It's like making it show 393 00:19:22.720 --> 00:19:23.119 its work. 394 00:19:23.319 --> 00:19:26.480 Okay, so CoT improves reasoning. How does that connect to 395 00:19:26.559 --> 00:19:27.960 taking actual actions?
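The marketing-plan example above shows the shape of a chain-of-thought prompt: list the reasoning steps, then ask for the final answer. A minimal sketch of that template follows; the function name `chain_of_thought_prompt` and the exact wording of the instructions are assumptions for illustration, not a prescribed format.

```python
def chain_of_thought_prompt(task, steps):
    """Wrap a task in explicit, numbered reasoning steps before the final answer."""
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    return (
        f"Task: {task}\n"
        "Before giving the final answer, think through these steps in order:\n"
        f"{numbered}\n"
        "Then write the final answer under the heading 'Plan:'."
    )


cot_prompt = chain_of_thought_prompt(
    task="Create a marketing plan for a new product.",
    steps=[
        "Consider the target audience.",
        "Analyze the budget constraints.",
        "Research competitor products.",
    ],
)
print(cot_prompt)
```

Asking for the answer under a fixed heading also makes it easier to separate the reasoning from the final plan when parsing the response.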
396 00:19:28.279 --> 00:19:31.200 That leads directly to the reason and act, ReAct, framework. 397 00:19:31.599 --> 00:19:34.440 This explicitly combines that chain of thought reasoning with the 398 00:19:34.480 --> 00:19:36.440 ability to take actions using tools. 399 00:19:36.720 --> 00:19:39.799 Okay, reason and act. How does that loop work? 400 00:19:40.119 --> 00:19:45.160 It's a cycle. One, thought: the LLM internally reasons about 401 00:19:45.160 --> 00:19:48.599