WEBVTT

1
00:00:00.120 --> 00:00:03.720
<v Speaker 1>Welcome to the deep dive. We take a whole stack

2
00:00:03.799 --> 00:00:06.919
<v Speaker 1>of information: articles, research, our notes, and really try to

3
00:00:06.919 --> 00:00:08.359
<v Speaker 1>pull out the key insights for you.

4
00:00:08.480 --> 00:00:10.839
<v Speaker 2>Right, the goal is always to cut through that complexity, get

5
00:00:10.679 --> 00:00:12.919
<v Speaker 1>to the useful stuff. Exactly, and help you unlock the

6
00:00:12.960 --> 00:00:17.079
<v Speaker 1>power of these, well, cutting-edge tools. Today we're diving

7
00:00:17.160 --> 00:00:20.960
<v Speaker 1>deep into prompt engineering for generative AI.

8
00:00:21.160 --> 00:00:23.280
<v Speaker 2>It's a huge topic right now. It really

9
00:00:23.079 --> 00:00:26.359
<v Speaker 1>is, and we're working from a fantastic resource today: the

10
00:00:26.399 --> 00:00:30.039
<v Speaker 1>book Prompt Engineering for Generative AI by James Phoenix and

11
00:00:30.120 --> 00:00:33.640
<v Speaker 1>Mike Taylor. It's been called a lighthouse in this sort

12
00:00:33.640 --> 00:00:35.280
<v Speaker 1>of vast ocean of AI.

13
00:00:35.479 --> 00:00:37.159
<v Speaker 2>That's a good way to put it. Yeah, this deep

14
00:00:37.200 --> 00:00:39.280
<v Speaker 2>dive is really giving you a shortcut, a way to

15
00:00:39.399 --> 00:00:44.759
<v Speaker 2>understand how to get reliable, high quality results from AI models.

16
00:00:44.439 --> 00:00:46.880
<v Speaker 1>Whether it's text or images, right? Exactly.

17
00:00:46.439 --> 00:00:49.159
<v Speaker 2>Text or images. It's about, as the book says, kind of

18
00:00:49.200 --> 00:00:53.640
<v Speaker 2>future-proofing your inputs for reliable AI outputs at scale.

19
00:00:53.799 --> 00:00:56.600
<v Speaker 1>And that's so important because, wow, the pace of change

20
00:00:56.640 --> 00:01:01.039
<v Speaker 1>with generative AI is just, it's breakneck speed. It's actually

21
00:01:01.039 --> 00:01:01.759
<v Speaker 1>hard to keep up.

22
00:01:01.719 --> 00:01:03.960
<v Speaker 2>Sometimes. It really is. Every week, something new.

23
00:01:04.280 --> 00:01:07.079
<v Speaker 1>So let's start at the beginning. What is prompt engineering,

24
00:01:07.319 --> 00:01:08.680
<v Speaker 1>and why does it matter so much?

25
00:01:08.760 --> 00:01:14.120
<v Speaker 2>Okay, so, at its core, prompt engineering is basically the

26
00:01:14.280 --> 00:01:18.159
<v Speaker 2>art and, well, the science of crafting the right inputs,

27
00:01:18.439 --> 00:01:20.959
<v Speaker 2>the prompts you give the AI to get the outputs

28
00:01:20.959 --> 00:01:24.760
<v Speaker 2>you actually want. The instructions, essentially. Exactly, clear instructions. And

29
00:01:24.799 --> 00:01:27.640
<v Speaker 2>it matters a lot because what you put in fundamentally

30
00:01:27.719 --> 00:01:30.680
<v Speaker 2>changes the probability of every single word or pixel the

31
00:01:30.719 --> 00:01:32.000
<v Speaker 2>AI generates

32
00:01:31.560 --> 00:01:34.120
<v Speaker 1>next. Ah, probability.

33
00:01:34.200 --> 00:01:37.079
<v Speaker 2>Yeah. Plus, you know, models like OpenAI's charge

34
00:01:37.079 --> 00:01:39.840
<v Speaker 2>based on the tokens used. Tokens are kind of like pieces

35
00:01:39.840 --> 00:01:40.760
<v Speaker 2>of words.

36
00:01:40.599 --> 00:01:43.879
<v Speaker 1>So the length and quality of your prompt directly impacts

37
00:01:43.959 --> 00:01:44.799
<v Speaker 1>costs? Directly.

38
00:01:45.040 --> 00:01:48.239
<v Speaker 2>So optimizing prompts isn't just about quality, it's crucial for

39
00:01:48.359 --> 00:01:53.000
<v Speaker 2>cost and reliability, too. Getting it right saves money and headaches.

40
00:01:53.079 --> 00:01:56.000
<v Speaker 1>Okay, that makes total sense, like briefing someone properly before

41
00:01:56.040 --> 00:01:58.480
<v Speaker 1>they start a task. So let's dive in. What are

42
00:01:58.519 --> 00:02:01.400
<v Speaker 1>those foundational principles, the ones that work no matter which

43
00:02:01.439 --> 00:02:02.480
<v Speaker 1>AI model you're using?

44
00:02:02.560 --> 00:02:05.159
<v Speaker 2>Right. Let's focus on three core principles; these really hold

45
00:02:05.239 --> 00:02:08.159
<v Speaker 2>up over time. The first one, and maybe the most common

46
00:02:08.240 --> 00:02:10.719
<v Speaker 2>pitfall people run into, is you need to give direction,

47
00:02:11.000 --> 00:02:14.840
<v Speaker 2>be specific. Be specific: brief the AI on exactly what

48
00:02:14.919 --> 00:02:17.759
<v Speaker 2>you want it to do. So instead of just saying brainstorm

49
00:02:17.759 --> 00:02:19.879
<v Speaker 2>product names for a shoe, which is

50
00:02:19.840 --> 00:02:21.599
<v Speaker 1>okay, a bit vague, right?

51
00:02:21.520 --> 00:02:24.759
<v Speaker 2>Vague. You'd get much better results adding context, like brainstorm

52
00:02:24.759 --> 00:02:27.400
<v Speaker 2>product names for a shoe that fits any foot size

53
00:02:27.719 --> 00:02:30.439
<v Speaker 2>in the style of Steve Jobs, or, you know, like

54
00:02:30.560 --> 00:02:31.800
<v Speaker 2>Elon Musk would

55
00:02:31.479 --> 00:02:34.439
<v Speaker 1>name it. Ah, okay. Adding that constraint really narrows it

56
00:02:34.479 --> 00:02:36.240
<v Speaker 1>down for the AI. Precisely.

57
00:02:36.400 --> 00:02:39.199
<v Speaker 2>And the sources we looked at really emphasize that

58
00:02:39.319 --> 00:02:42.439
<v Speaker 2>too little direction is the number one problem. That's why

59
00:02:42.479 --> 00:02:45.319
<v Speaker 2>AI sometimes seems to, well, misunderstand you.

60
00:02:45.560 --> 00:02:48.680
<v Speaker 1>Okay, So clear direction first. Once you've got that, what's

61
00:02:48.719 --> 00:02:51.120
<v Speaker 1>the next key thing for getting predictable results?

62
00:02:51.560 --> 00:02:56.000
<v Speaker 2>Next up is specify format. This is huge. AI models

63
00:02:56.039 --> 00:03:00.960
<v Speaker 2>are incredible universal translators, not just between, say, French and English,

64
00:03:01.439 --> 00:03:05.199
<v Speaker 2>but between data structures. Think JSON to YAML or

65
00:03:05.240 --> 00:03:07.360
<v Speaker 2>even just natural language to Python code.

66
00:03:07.400 --> 00:03:07.759
<v Speaker 1>Wow.

67
00:03:07.879 --> 00:03:09.919
<v Speaker 2>So it's really important to tell the AI what format

68
00:03:09.960 --> 00:03:11.919
<v Speaker 2>you want the answer in. If you don't, especially if

69
00:03:11.919 --> 00:03:13.400
<v Speaker 2>you're building software that relies on

70
00:03:13.360 --> 00:03:15.400
<v Speaker 1>this. Yeah, I can see that. You

71
00:03:15.400 --> 00:03:17.919
<v Speaker 2>might sometimes get a numbered list when you expected comma

72
00:03:17.960 --> 00:03:20.319
<v Speaker 2>separated values or something like that, and that could just

73
00:03:20.360 --> 00:03:21.520
<v Speaker 2>break your whole process.

74
00:03:21.719 --> 00:03:25.560
<v Speaker 1>So specifying format prevents those kinds of errors. Can you

75
00:03:25.599 --> 00:03:26.879
<v Speaker 1>ask for complex stuff?

76
00:03:27.159 --> 00:03:31.800
<v Speaker 2>Absolutely. You can ask for really complex formats like Mermaid

77
00:03:31.840 --> 00:03:35.800
<v Speaker 2>syntax for generating flow diagrams. It's surprisingly capable.

78
00:03:35.879 --> 00:03:40.800
<v Speaker 1>That's powerful, especially for developers. Okay, so direction, format. What's

79
00:03:40.840 --> 00:03:41.520
<v Speaker 1>the third pillar?

80
00:03:41.800 --> 00:03:46.199
<v Speaker 2>The third one is: provide examples. Sometimes, honestly, it's just

81
00:03:46.360 --> 00:03:49.199
<v Speaker 2>easier to show the AI what you'd like instead of

82
00:03:49.240 --> 00:03:50.960
<v Speaker 2>trying to describe it perfectly.

83
00:03:50.560 --> 00:03:52.639
<v Speaker 1>Like show, don't just tell? Exactly.

84
00:03:53.120 --> 00:03:55.879
<v Speaker 2>This works really well if you're maybe not an expert

85
00:03:55.879 --> 00:03:58.560
<v Speaker 2>in the specific domain yourself. Let's say you want product

86
00:03:58.639 --> 00:04:02.039
<v Speaker 2>names but in a very particular kind of quirky style. Okay,

87
00:04:02.039 --> 00:04:04.280
<v Speaker 2>instead of trying to describe quirky, you just give examples

88
00:04:04.319 --> 00:04:07.400
<v Speaker 2>like iBar, iFridge, iVerge, iBeer, iTime. The AI sees

89
00:04:07.439 --> 00:04:09.639
<v Speaker 2>that pattern immediately.

90
00:04:09.360 --> 00:04:11.520
<v Speaker 1>Ah, I see, it learns the style from the examples. How

91
00:04:11.560 --> 00:04:12.680
<v Speaker 1>many examples work best?

92
00:04:12.879 --> 00:04:15.840
<v Speaker 2>Usually just adding one to three examples almost always helps.

93
00:04:15.840 --> 00:04:19.040
<v Speaker 2>It gives the AI a much clearer target. You just

94
00:04:19.079 --> 00:04:21.040
<v Speaker 2>need to be mindful of the token limits.
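
NOTE
One way to apply the "provide examples" advice in code: a minimal Python sketch that assembles a few-shot prompt and checks it against a length budget. The helper name, the Input/Output layout, the sample product names, and the 6,000-character default (borrowed from the Midjourney figure mentioned here) are illustrative assumptions, not the book's code.
```python
def build_few_shot_prompt(
    instruction: str,
    examples: list[tuple[str, str]],
    new_input: str,
    max_examples: int = 3,
    max_chars: int = 6000,
) -> str:
    """Assemble an instruction, a few input/output examples, and the new input."""
    parts = [instruction]
    # Keep only the first few examples; one to three is usually enough
    for example_input, example_output in examples[:max_examples]:
        parts.append(f"Input: {example_input}\nOutput: {example_output}")
    # End with the new input and an open "Output:" for the model to complete
    parts.append(f"Input: {new_input}\nOutput:")
    prompt = "\n\n".join(parts)
    # Guard against overrunning the prompt limit (a character budget here)
    if len(prompt) > max_chars:
        raise ValueError(f"prompt is {len(prompt)} characters; limit is {max_chars}")
    return prompt
prompt = build_few_shot_prompt(
    "Brainstorm product names in the same style as these examples.",
    [("portable coffee grinder", "GrindGo"), ("smart water bottle", "SipSense")],
    "a shoe that fits any foot size",
)
print(prompt)
```
Ending on an open "Output:" nudges the model to continue the pattern rather than comment on it; for text models, a token budget would replace the character check.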

95
00:04:20.759 --> 00:04:22.680
<v Speaker 1>Right, the character limits for the prompt.

96
00:04:22.480 --> 00:04:25.600
<v Speaker 2>Yeah, like Midjourney, the image generator, takes about six

97
00:04:25.639 --> 00:04:29.160
<v Speaker 2>thousand characters; free ChatGPT is more like thirty-two thousand,

98
00:04:29.519 --> 00:04:32.040
<v Speaker 2>so you usually have space for a few good examples

99
00:04:32.040 --> 00:04:33.000
<v Speaker 2>without any trouble.

100
00:04:33.319 --> 00:04:38.079
<v Speaker 1>Okay, so give direction, specify format, provide examples. Those are

101
00:04:38.120 --> 00:04:41.199
<v Speaker 1>the fundamentals. But let's level up for people wanting to

102
00:04:41.319 --> 00:04:44.319
<v Speaker 1>use AI professionally. How do you get it to do

103
00:04:44.480 --> 00:04:49.360
<v Speaker 1>more complex things like generating structured data, transforming text, maybe

104
00:04:49.399 --> 00:04:50.680
<v Speaker 1>even checking its own work.

105
00:04:50.800 --> 00:04:52.720
<v Speaker 2>Absolutely. This is where it moves from, you know, just

106
00:04:52.839 --> 00:04:57.560
<v Speaker 2>experimenting to really building things. Let's start with generating structured outputs.

107
00:04:57.560 --> 00:04:59.800
<v Speaker 2>This goes way beyond simple lists. Okay, you can get

108
00:04:59.800 --> 00:05:03.120
<v Speaker 2>the AI to generate really complex nested data structures. The

109
00:05:03.120 --> 00:05:07.560
<v Speaker 2>book mentions things like hierarchical lists, JSON, YAML.

110
00:05:07.480 --> 00:05:10.240
<v Speaker 1>Like creating a database-ready structure? Exactly.

111
00:05:10.480 --> 00:05:14.839
<v Speaker 2>Imagine generating a detailed article outline perfectly formatted as a

112
00:05:14.920 --> 00:05:19.120
<v Speaker 2>JSON payload, or taking a user's casual request and turning

113
00:05:19.120 --> 00:05:24.000
<v Speaker 2>it into a structured YAML shopping list. The precision is incredible.

114
00:05:24.079 --> 00:05:27.120
<v Speaker 1>Wow. Does it always get it right, like perfectly valid

115
00:05:27.160 --> 00:05:28.000
<v Speaker 1>JSON every time?

116
00:05:28.120 --> 00:05:31.800
<v Speaker 2>Not always. Language models can sometimes add extra conversational text

117
00:05:32.079 --> 00:05:35.680
<v Speaker 2>or maybe generate slightly invalid JSON or YAML. But there

118
00:05:35.680 --> 00:05:38.319
<v Speaker 2>are smart strategies to handle those kinds of edge

119
00:05:38.319 --> 00:05:39.360
<v Speaker 2>cases in your code.
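
NOTE
A minimal Python sketch of one such strategy, assuming the reply holds a single JSON value wrapped in chatty text: scan for the first `{` or `[` that starts a parseable value, using the standard library's `json.JSONDecoder.raw_decode`. The function name is hypothetical; YAML would need a separate parser such as PyYAML.
```python
import json
def extract_json(reply: str):
    """Return the first JSON object or array embedded in an LLM reply."""
    decoder = json.JSONDecoder()
    for start, char in enumerate(reply):
        if char not in "{[":
            continue
        try:
            # raw_decode parses one JSON value starting at `start` and ignores trailing text
            value, _end = decoder.raw_decode(reply, start)
            return value
        except json.JSONDecodeError:
            continue  # not valid JSON at this position; keep scanning
    raise ValueError("no valid JSON object or array found in reply")
reply = 'Sure! Here is the outline:\n{"title": "Intro", "sections": ["Why", "How"]}\nHope that helps!'
print(extract_json(reply)["sections"])
```
It returns the first value that parses, so a stray bracketed token in the chatter (say `[1]`) could be picked up first; a common complement is to re-prompt the model when nothing valid is found.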

120
00:05:39.519 --> 00:05:41.319
<v Speaker 1>Okay, good to know. And beyond JSON and YAML?

121
00:05:41.720 --> 00:05:44.480
<v Speaker 2>Yeah, it can even generate things like mock CSV data,

122
00:05:44.920 --> 00:05:48.040
<v Speaker 2>you know, a list of fake names, ages, grades, whatever

123
00:05:48.040 --> 00:05:50.720
<v Speaker 2>you need ready to use in spreadsheets or other tools.

124
00:05:50.959 --> 00:05:53.240
<v Speaker 2>It's like having an instant data engineer.

125
00:05:53.040 --> 00:05:55.839
<v Speaker 1>That is genuinely powerful for automation. Okay, so that's generating

126
00:05:55.839 --> 00:05:59.800
<v Speaker 1>structured stuff, but what about working with text that already exists: transforming it,

127
00:06:00.120 --> 00:06:02.160
<v Speaker 1>simplifying it, analyzing it? Right.

128
00:06:02.120 --> 00:06:05.399
<v Speaker 2>Huge area. There are several really cool techniques here. One that's

129
00:06:05.439 --> 00:06:08.920
<v Speaker 2>super popular and useful is explain it like I'm five

130
00:06:09.079 --> 00:06:13.240
<v Speaker 2>ELI five. Yeah, exactly, ELI five. It's not just a gimmick.

131
00:06:13.720 --> 00:06:18.319
<v Speaker 2>It's a seriously powerful way to take dense technical documents

132
00:06:18.759 --> 00:06:22.720
<v Speaker 2>(think medical abstracts or complex legal text) and boil them

133
00:06:22.759 --> 00:06:27.480
<v Speaker 2>down into language anyone can grasp. It really helps democratize information.

134
00:06:27.839 --> 00:06:28.759
<v Speaker 1>That's fantastic. What else?

135
00:06:28.879 --> 00:06:32.560
<v Speaker 2>Then there's universal translation. We mentioned language to language,

136
00:06:32.720 --> 00:06:36.279
<v Speaker 2>but LLMs can also translate between programming languages, like Python

137
00:06:36.319 --> 00:06:39.040
<v Speaker 2>to JavaScript or vice versa. They act as this amazing

138
00:06:39.040 --> 00:06:39.759
<v Speaker 2>bridge.

139
00:06:39.639 --> 00:06:43.560
<v Speaker 1>Bridging communication gaps. Okay, but what if the AI doesn't

140
00:06:43.560 --> 00:06:46.240
<v Speaker 1>have enough information to give a good answer? Can it

141
00:06:46.399 --> 00:06:47.920
<v Speaker 1>like, ask for more detail?

142
00:06:48.120 --> 00:06:50.600
<v Speaker 2>Yes, absolutely. You can teach it to ask for context.

143
00:06:51.079 --> 00:06:53.680
<v Speaker 2>LLMs can function as sort of simple agents with some

144
00:06:53.800 --> 00:06:56.560
<v Speaker 2>reasoning ability. You can actually prompt them to recognize when

145
00:06:56.560 --> 00:06:59.319
<v Speaker 2>they lack info and then ask you clarifying questions.

146
00:06:59.360 --> 00:07:00.959
<v Speaker 1>Oh, interesting. So it becomes more of a

147
00:07:00.920 --> 00:07:03.639
<v Speaker 2>dialogue? Exactly. Like, if you ask, should I use

148
00:07:03.680 --> 00:07:07.079
<v Speaker 2>MongoDB or PostgreSQL? A well-prompted GPT four might come

149
00:07:07.120 --> 00:07:09.040
<v Speaker 2>back with: okay, to answer that, I need to know

150
00:07:09.079 --> 00:07:11.399
<v Speaker 2>what's your data structure like? What are your scalability needs?

151
00:07:11.480 --> 00:07:13.199
<v Speaker 2>Do you need ACID compliance? And so on.

152
00:07:13.439 --> 00:07:15.360
<v Speaker 1>So it guides you to give it the info it

153
00:07:15.399 --> 00:07:16.959
<v Speaker 1>needs for a better answer.

154
00:07:17.319 --> 00:07:20.759
<v Speaker 2>It's smart. Very smart. It turns it into an active problem solver.

155
00:07:21.360 --> 00:07:24.879
<v Speaker 2>Another really neat one is text style unbundling.

156
00:07:25.120 --> 00:07:25.480
<v Speaker 1>Unbundling? What's that?

157
00:07:25.839 --> 00:07:28.319
<v Speaker 2>It means you can get the AI to analyze a

158
00:07:28.319 --> 00:07:33.399
<v Speaker 2>piece of text and extract its specific stylistic features: the tone, sentence length,

159
00:07:33.600 --> 00:07:36.040
<v Speaker 2>vocabulary choices, even the structure.

160
00:07:36.160 --> 00:07:38.399
<v Speaker 1>Okay, and then what? Then you can

161
00:07:38.399 --> 00:07:42.040
<v Speaker 2>use those extracted features as a kind of style guide

162
00:07:42.199 --> 00:07:45.639
<v Speaker 2>to generate new content that matches that original voice perfectly.

163
00:07:46.319 --> 00:07:49.240
<v Speaker 2>Super useful for businesses wanting consistent brand messages.

164
00:07:49.319 --> 00:07:53.560
<v Speaker 1>Ah, I see, maintaining a consistent voice across different pieces

165
00:07:53.560 --> 00:07:57.040
<v Speaker 1>of content. Crucial for branding. Totally. Now, what about just

166
00:07:57.079 --> 00:08:00.519
<v Speaker 1>dealing with huge amounts of text, like reading massive reports

167
00:08:00.600 --> 00:08:01.560
<v Speaker 1>or research papers.

168
00:08:01.639 --> 00:08:05.439
<v Speaker 2>That's where summarization and chunking come in. AI summarization is

169
00:08:05.519 --> 00:08:09.399
<v Speaker 2>amazing for distilling information, but for really long documents you

170
00:08:09.480 --> 00:08:11.720
<v Speaker 2>hit those context limits we talked about.

171
00:08:11.600 --> 00:08:14.079
<v Speaker 1>Right. The AI can only remember so much text at once.

172
00:08:14.240 --> 00:08:19.439
<v Speaker 2>Exactly, so chunking, just breaking the text into smaller, manageable pieces,

173
00:08:19.839 --> 00:08:23.680
<v Speaker 2>is essential. It lets you process long documents, even ones

174
00:08:23.720 --> 00:08:26.480
<v Speaker 2>covering multiple topics, without overwhelming the AI.

175
00:08:26.879 --> 00:08:28.800
<v Speaker 1>How do you decide where to split the text?

176
00:08:28.959 --> 00:08:32.000
<v Speaker 2>There are different ways. You can split by sentence, paragraph,

177
00:08:32.159 --> 00:08:35.120
<v Speaker 2>sometimes by complexity, or just by length, or you can

178
00:08:35.120 --> 00:08:37.919
<v Speaker 2>get really precise and split by the actual token count

179
00:08:38.279 --> 00:08:42.279
<v Speaker 2>using specific tools, especially for models like OpenAI's. That ensures

180
00:08:42.320 --> 00:08:43.759
<v Speaker 2>each chunk fits perfectly.
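
NOTE
A minimal Python sketch of token-budgeted chunking: whole sentences are packed greedily until the next one would exceed the budget, and an over-long sentence is split word by word. As an assumption, the counter here just counts whitespace-separated words, standing in for a real tokenizer (for OpenAI models, the tiktoken library gives exact counts); the function names are hypothetical.
```python
import re
from typing import Callable
def whitespace_tokens(text: str) -> int:
    # Stand-in token counter: whitespace-separated words, not model tokens
    return len(text.split())
def chunk_text(text: str, max_tokens: int, count: Callable[[str], int] = whitespace_tokens) -> list[str]:
    """Pack whole sentences into chunks of at most max_tokens (as measured by `count`)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if count(candidate) <= max_tokens:
            current = candidate  # the sentence still fits in the current chunk
            continue
        if current:
            chunks.append(current)
        # Start a new chunk; a sentence longer than the budget is split word by word
        current = ""
        for word in sentence.split():
            candidate = f"{current} {word}".strip()
            if current and count(candidate) > max_tokens:
                chunks.append(current)
                current = word
            else:
                current = candidate
    if current:
        chunks.append(current)
    return chunks
doc = "One two three. Four five. Six seven eight nine ten eleven."
print(chunk_text(doc, max_tokens=5))
```
Passing a real tokenizer's counting function as `count` keeps the same packing logic; a single word longer than the budget still ends up as its own chunk.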

181
00:08:43.919 --> 00:08:46.679
<v Speaker 1>Okay, smart ways to handle large inputs. But now we've

182
00:08:46.679 --> 00:08:49.159
<v Speaker 1>generated all this output, how do we know if our

183
00:08:49.200 --> 00:08:51.720
<v Speaker 1>prompts are actually any good? How do we evaluate the

184
00:08:51.799 --> 00:08:54.039
<v Speaker 1>quality rigorously? Great question.

185
00:08:54.480 --> 00:08:57.919
<v Speaker 2>Evaluating prompt quality is key if you're serious. You can

186
00:08:57.960 --> 00:09:00.600
<v Speaker 2>start simple, like with a thumbs-up/thumbs-down rating

187
00:09:00.639 --> 00:09:02.039
<v Speaker 2>system. Adds a bit of rigor.

188
00:09:02.120 --> 00:09:03.600
<v Speaker 1>Okay, basic feedback.

189
00:09:03.279 --> 00:09:06.000
<v Speaker 2>But you can get much more sophisticated. Automated evaluation is

190
00:09:06.039 --> 00:09:08.879
<v Speaker 2>totally possible. For instance, you could use a powerful model

191
00:09:08.919 --> 00:09:12.000
<v Speaker 2>like GPT four to actually grade the responses from a

192
00:09:12.039 --> 00:09:13.840
<v Speaker 2>less powerful model.

193
00:09:13.919 --> 00:09:17.440
<v Speaker 1>AI evaluating AI. Interesting, using the best model to check the others.

194
00:09:17.759 --> 00:09:20.799
<v Speaker 2>Yeah, and the book talks about proper A/B testing methods.

195
00:09:21.159 --> 00:09:23.600
<v Speaker 2>Often using tools like Jupyter notebooks, you can do things

196
00:09:23.639 --> 00:09:26.559
<v Speaker 2>like shuffle responses, so the human rater is blind to

197
00:09:26.600 --> 00:09:30.159
<v Speaker 2>which prompt variation produced which output, avoiding bias.
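
NOTE
A minimal Python sketch of that blinding step, assuming two prompt variants (A and B) were each run on the same test inputs: each pair of outputs is shuffled with a seeded `random.Random`, the rater sees only the two texts, and a hidden key records which prompt filled which slot. Function names are hypothetical; the book's own notebook may differ.
```python
import random
def blind_pairs(outputs_a: list[str], outputs_b: list[str], seed: int = 0):
    """Shuffle each (A, B) output pair; return what the rater sees and a hidden key."""
    if len(outputs_a) != len(outputs_b):
        raise ValueError("both prompt variants must cover the same test cases")
    rng = random.Random(seed)  # seeded so the key can be reproduced later
    shown, key = [], []
    for out_a, out_b in zip(outputs_a, outputs_b):
        pair = [("A", out_a), ("B", out_b)]
        rng.shuffle(pair)
        shown.append((pair[0][1], pair[1][1]))  # texts only, no labels
        key.append((pair[0][0], pair[1][0]))    # which prompt filled each slot
    return shown, key
def tally(preferred_slots: list[int], key) -> dict[str, int]:
    """Un-blind the rater's picks (slot 0 or 1 per case) into wins per prompt."""
    wins = {"A": 0, "B": 0}
    for slot, labels in zip(preferred_slots, key):
        wins[labels[slot]] += 1
    return wins
shown, key = blind_pairs(["a1", "a2", "a3"], ["b1", "b2", "b3"], seed=42)
for i, (first, second) in enumerate(shown, start=1):
    print(f"Case {i}: option 1 = {first!r}, option 2 = {second!r}")
```
The key is only consulted after rating, in `tally`, so the rater's choices can't be influenced by knowing which prompt is which.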

198
00:09:30.320 --> 00:09:33.120
<v Speaker 1>Proper scientific method, basically. Exactly.

199
00:09:33.159 --> 00:09:36.360
<v Speaker 2>You can even compare prompt variations using metrics like embedding

200
00:09:36.399 --> 00:09:40.840
<v Speaker 2>distance that measures how semantically similar an AI's answer is

201
00:09:41.120 --> 00:09:43.720
<v Speaker 2>to a known ground truth or perfect answer.

202
00:09:43.399 --> 00:09:47.080
<v Speaker 1>So, measuring how close it is in meaning? Right.

203
00:09:47.120 --> 00:09:50.879
<v Speaker 2>The whole point is to iterate faster, more scientifically and

204
00:09:51.000 --> 00:09:54.600
<v Speaker 2>reduce the need for tons of slow, expensive manual review.
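
NOTE
A minimal Python sketch of ranking prompt variants by embedding distance, using cosine distance (1 minus cosine similarity) between each answer's embedding and the ground-truth answer's embedding. The three-number vectors are made-up placeholders; real ones would come from an embedding model and have many more dimensions. Names are hypothetical.
```python
import math
def cosine_distance(u: list[float], v: list[float]) -> float:
    """1 - cosine similarity: 0 means same direction (same meaning); larger means further apart."""
    dot = sum(x * y for x, y in zip(u, v))
    norms = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / norms
def rank_by_closeness(answers: dict[str, list[float]], ground_truth: list[float]) -> list[tuple[str, float]]:
    """Order prompt variants by how close their answer embedding is to the reference answer."""
    scored = [(name, cosine_distance(vec, ground_truth)) for name, vec in answers.items()]
    return sorted(scored, key=lambda pair: pair[1])
# Placeholder embeddings, one per prompt variant's answer
answers = {"prompt_v1": [0.2, 0.9, 0.1], "prompt_v2": [0.8, 0.3, 0.1]}
reference = [0.9, 0.2, 0.1]
print(rank_by_closeness(answers, reference))
```
A zero vector would divide by zero here, so a sturdier version would guard for it.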

205
00:09:54.720 --> 00:09:57.399
<v Speaker 1>It's incredible how fast this field is moving, not just

206
00:09:57.440 --> 00:10:00.799
<v Speaker 1>the prompting techniques but the underlying AI models themselves, and

207
00:10:00.840 --> 00:10:03.200
<v Speaker 1>the frameworks built on top of them. It feels like warp

208
00:10:03.200 --> 00:10:03.919
<v Speaker 1>speed sometimes.

209
00:10:04.039 --> 00:10:07.519
<v Speaker 2>Oh, absolutely, the pace of innovation is just staggering. If

210
00:10:07.519 --> 00:10:10.200
<v Speaker 2>we take a brief history of text generation models, the

211
00:10:10.200 --> 00:10:13.799
<v Speaker 2>big leap was the transformer architecture back around twenty seventeen.

212
00:10:13.440 --> 00:10:14.720
<v Speaker 1>Right, that changed everything.

213
00:10:15.039 --> 00:10:17.919
<v Speaker 2>It really did. It allowed models to connect words across long

214
00:10:17.960 --> 00:10:22.240
<v Speaker 2>distances in text, boosting comprehension and efficiency. Then you had

215
00:10:22.240 --> 00:10:25.639
<v Speaker 2>OpenAI's GPT series: GPT two, GPT three, three point

216
00:10:25.639 --> 00:10:29.240
<v Speaker 2>five Turbo, ChatGPT, and now GPT four, really pushing things

217
00:10:29.240 --> 00:10:30.039
<v Speaker 2>into the public eye.

218
00:10:30.120 --> 00:10:33.279
<v Speaker 1>GPT three point five Turbo and ChatGPT made it accessible.

219
00:10:33.360 --> 00:10:37.279
<v Speaker 2>Yeah, three point five Turbo, especially with Microsoft's investment, brought

220
00:10:37.279 --> 00:10:41.120
<v Speaker 2>better efficiency and lower costs, and made LLMs practical for more people.

221
00:10:41.279 --> 00:10:45.240
<v Speaker 2>And ChatGPT, fine-tuned for conversation, just exploded: the fastest

222
00:10:45.240 --> 00:10:48.799
<v Speaker 2>growing app ever, apparently. And GPT four, released

223
00:10:48.799 --> 00:10:51.759
<v Speaker 2>in twenty twenty three, was another step change, excelling at

224
00:10:51.799 --> 00:10:55.080
<v Speaker 2>complex stuff, scoring in the ninetieth percentile on the bar exam.

225
00:10:55.440 --> 00:10:58.919
<v Speaker 2>It showed AI tackling really high level analytical tasks.

226
00:10:59.200 --> 00:11:01.399
<v Speaker 1>That's the sort of closed-source, big-company side. What

227
00:11:01.440 --> 00:11:03.200
<v Speaker 1>about the open source world that seems to be moving

228
00:11:03.279 --> 00:11:04.200
<v Speaker 1>just as fast?

229
00:11:03.960 --> 00:11:07.279
<v Speaker 2>Totally. Meta's Llama series, Llama, Llama two, Llama three, takes

230
00:11:07.320 --> 00:11:09.559
<v Speaker 2>a different path by being open source. That builds a

231
00:11:09.559 --> 00:11:10.720
<v Speaker 2>whole community around it.

232
00:11:10.279 --> 00:11:13.480
<v Speaker 1>Democratizing it, in a way? Exactly, and it

233
00:11:13.480 --> 00:11:16.519
<v Speaker 1>allows for cool optimizations like quantization and LoRA.

234
00:11:17.080 --> 00:11:20.159
<v Speaker 2>Those are techniques to basically shrink or specialize these huge

235
00:11:20.159 --> 00:11:22.840
<v Speaker 2>models so you can run them on, like, a good home computer

236
00:11:22.879 --> 00:11:25.960
<v Speaker 1>GPU. That makes them more accessible. Any other big open-source

237
00:11:25.960 --> 00:11:26.799
<v Speaker 1>players? Yeah.

238
00:11:26.720 --> 00:11:30.639
<v Speaker 2>Mistral seven B, from the French startup Mistral AI, is

239
00:11:30.679 --> 00:11:33.279
<v Speaker 2>getting a lot of buzz too, another really powerful open

240
00:11:33.279 --> 00:11:36.720
<v Speaker 2>source option. So right now, GPT four probably leads on

241
00:11:36.879 --> 00:11:40.120
<v Speaker 2>raw capability in many areas, but open-source models like Llama

242
00:11:40.159 --> 00:11:43.080
<v Speaker 2>and Mistral are super exciting, especially if you want to

243
00:11:43.120 --> 00:11:45.080
<v Speaker 2>fine-tune a model for a very specific job.

244
00:11:45.320 --> 00:11:48.200
<v Speaker 1>Okay, so we have these powerful models open and closed source,

245
00:11:48.399 --> 00:11:51.960
<v Speaker 1>but how do developers actually build applications with them, connect

246
00:11:52.000 --> 00:11:54.679
<v Speaker 1>them to data, make them do things. Is there a

247
00:11:54.759 --> 00:11:56.240
<v Speaker 1>standard toolkit?

248
00:11:56.080 --> 00:11:59.279
<v Speaker 2>That's where frameworks like LangChain come in. It's become hugely popular.

249
00:11:59.399 --> 00:12:03.000
<v Speaker 2>It's an open-source framework, in Python and TypeScript, designed specifically

250
00:12:03.399 --> 00:12:04.919
<v Speaker 2>for building LLM applications.

251
00:12:04.960 --> 00:12:05.879
<v Speaker 1>Oh, what's its main goal?

252
00:12:06.000 --> 00:12:10.679
<v Speaker 2>Two core ideas: enhancing data awareness, connecting LLMs to external

253
00:12:10.759 --> 00:12:14.600
<v Speaker 2>data they weren't trained on, and agency, giving LLMs the

254
00:12:14.639 --> 00:12:16.919
<v Speaker 2>ability to take actions and influence their environment.

255
00:12:17.279 --> 00:12:21.080
<v Speaker 1>Okay, data awareness and agency. How does it achieve that?

256
00:12:21.559 --> 00:12:25.399
<v Speaker 2>Through modular building blocks: things like Model I/O for interacting

257
00:12:25.399 --> 00:12:30.000
<v Speaker 2>with different models, retrieval for fetching data, chains for sequencing operations,

258
00:12:30.200 --> 00:12:33.639
<v Speaker 2>agents for decision-making and tool use, memory for remembering

259
00:12:33.679 --> 00:12:36.759
<v Speaker 2>past interactions, and callbacks for running code at certain points.

260
00:12:36.840 --> 00:12:39.600
<v Speaker 1>Sounds comprehensive. Does it work with different AI providers?

261
00:12:39.759 --> 00:12:44.600
<v Speaker 2>Yeah. It supports models from Anthropic, Google's Vertex AI, OpenAI, and others.

262
00:12:44.759 --> 00:12:48.159
<v Speaker 2>Plus, it handles practical stuff like streaming, getting words back

263
00:12:48.200 --> 00:12:51.159
<v Speaker 2>one by one like chat GPT does, and batching for

264
00:12:51.240 --> 00:12:52.919
<v Speaker 2>running multiple requests in parallel.

265
00:12:53.240 --> 00:12:56.960
<v Speaker 1>What about getting structured data out of the LLM's responses reliably?

266
00:12:57.120 --> 00:12:59.519
<v Speaker 2>That's where LangChain's output parsers are key, especially the

267
00:12:59.559 --> 00:13:02.519
<v Speaker 2>ones that use Pydantic, which is great for defining JSON structures.

268
00:13:02.840 --> 00:13:06.159
<v Speaker 2>They help reliably turn the AI's natural language answer into

269
00:13:06.279 --> 00:13:09.480
<v Speaker 2>clean structured data. It essentially lets you build a flexible

270
00:13:09.519 --> 00:13:11.159
<v Speaker 2>API on top of the LLM.

271
00:13:11.240 --> 00:13:14.360
<v Speaker 1>And what about OpenAI's specific way for models to

272
00:13:14.399 --> 00:13:16.679
<v Speaker 1>interact with external systems. Is that different?

273
00:13:17.080 --> 00:13:20.759
<v Speaker 2>You're probably thinking of OpenAI function calling. It's their

274
00:13:20.840 --> 00:13:25.200
<v Speaker 2>method for letting LLMs intelligently decide to call external functions.

275
00:13:25.720 --> 00:13:27.039
<v Speaker 1>How does that work, exactly?

276
00:13:27.240 --> 00:13:30.519
<v Speaker 2>The LLM analyzes the conversation and figures out it needs to do

277
00:13:30.559 --> 00:13:34.159
<v Speaker 2>something specific, like check the weather. It then outputs a

278
00:13:34.159 --> 00:13:37.519
<v Speaker 2>structured JSON object saying: call a check weather function with

279
00:13:37.600 --> 00:13:41.440
<v Speaker 2>location: London. Your system runs that function, gets the weather data,

280
00:13:41.720 --> 00:13:44.720
<v Speaker 2>feeds it back into the conversation, and the LLM can

281
00:13:44.759 --> 00:13:46.320
<v Speaker 2>then summarize it for the user.
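
NOTE
A minimal Python sketch of the application side of that loop, assuming the model's request has already been pulled out of the API response as a function name plus a JSON-encoded arguments string (roughly how OpenAI represents a function call). `check_weather` is a hypothetical stub returning canned values; no real weather service or model API is called.
```python
import json
def check_weather(location: str) -> dict:
    # Hypothetical stub: a real version would query a weather service
    return {"location": location, "forecast": "light rain", "temp_c": 14}
# Registry of functions the model is allowed to request
TOOLS = {"check_weather": check_weather}
def run_tool_call(tool_call: dict) -> str:
    """Run the function the model asked for; return its result as JSON to feed back to the model."""
    name = tool_call["name"]
    if name not in TOOLS:
        raise KeyError(f"model requested unknown function {name!r}")
    args = json.loads(tool_call["arguments"])  # arguments arrive as a JSON-encoded string
    result = TOOLS[name](**args)
    return json.dumps(result)
# Example request, shaped like the one described above
call = {"name": "check_weather", "arguments": '{"location": "London"}'}
print(run_tool_call(call))
```
Keeping an explicit registry means the model can only trigger functions you chose to expose, and the returned JSON string is what gets appended to the conversation for the model to summarize.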

282
00:13:46.519 --> 00:13:48.440
<v Speaker 1>So it tells your code what function to run and

283
00:13:48.519 --> 00:13:49.360
<v Speaker 1>with what arguments.

284
00:13:49.600 --> 00:13:52.960
<v Speaker 2>Very neat. Very powerful for integrations.

285
00:13:53.039 --> 00:13:56.519
<v Speaker 1>And for fine-tuning the output on specific tasks, especially new ones?

286
00:13:56.799 --> 00:13:59.919
<v Speaker 2>That brings us back to few-shot learning. Remember providing

287
00:14:00.080 --> 00:14:01.120
<v Speaker 2>examples in the prompt.

288
00:14:01.200 --> 00:14:03.480
<v Speaker 1>Yeah, like the iBar, iFridge example. Exactly.

289
00:14:03.799 --> 00:14:07.360
<v Speaker 2>While zero shot relies just on the model's training, few

290
00:14:07.360 --> 00:14:09.919
<v Speaker 2>shot gives it those crucial examples right in the prompt.

291
00:14:10.679 --> 00:14:14.279
<v Speaker 2>It helps optimize the model's behavior for exactly what you want.

292
00:14:14.919 --> 00:14:17.240
<v Speaker 2>It's like giving the AI a mini tutorial for the

293
00:14:17.320 --> 00:14:18.159
<v Speaker 2>specific task.

294
00:14:18.480 --> 00:14:22.000
<v Speaker 1>Does it still matter now, with models that have huge context windows?

295
00:14:22.200 --> 00:14:25.440
<v Speaker 2>Yeah, it often still helps, even with large context windows.

296
00:14:25.679 --> 00:14:27.840
<v Speaker 2>A few good examples can guide the model to the

297
00:14:27.919 --> 00:14:31.720
<v Speaker 2>right answer faster and more reliably, which can actually save

298
00:14:31.759 --> 00:14:34.720
<v Speaker 2>you on API costs because you use fewer tokens overall

299
00:14:34.960 --> 00:14:36.080
<v Speaker 2>to get the desired results.

300
00:14:36.120 --> 00:14:39.320
<v Speaker 1>Okay, this is all incredibly powerful, but it raises a

301
00:14:39.320 --> 00:14:42.200
<v Speaker 1>big question. How do we get these AI models to

302
00:14:42.279 --> 00:14:45.480
<v Speaker 1>work securely and effectively with our data, our company knowledge,

303
00:14:45.519 --> 00:14:48.200
<v Speaker 1>our specific documents, and how do we make them remember

304
00:14:48.320 --> 00:14:51.080
<v Speaker 1>previous conversations? That seems vital for real-world use.

305
00:14:50.840 --> 00:14:54.360
<v Speaker 2>Absolutely vital. This is where connecting LLMs to

306
00:14:54.440 --> 00:14:58.440
<v Speaker 2>your data and managing memory really unlocks their practical potential.

307
00:14:58.679 --> 00:15:01.440
<v Speaker 2>Let's talk data connection and vector databases. Okay, so your

308
00:15:01.519 --> 00:15:04.000
<v Speaker 2>organization's data. It comes in all shapes and sizes, right,

309
00:15:04.080 --> 00:15:08.159
<v Speaker 2>unstructured stuff like Google Docs, web pages, code, and structured

310
00:15:08.159 --> 00:15:11.200
<v Speaker 2>stuff in SQL or NoSQL databases. To let the AI

311
00:15:11.320 --> 00:15:14.799
<v Speaker 2>query that unstructured data, the process usually involves loading it

312
00:15:14.840 --> 00:15:18.759
<v Speaker 2>into what LangChain calls documents, then chunking them, breaking

313
00:15:18.799 --> 00:15:21.519
<v Speaker 2>them into smaller pieces, and then storing these pieces in

314
00:15:21.559 --> 00:15:24.080
<v Speaker 2>a special database called a vector database.

315
00:15:24.240 --> 00:15:26.080
<v Speaker 1>Vector database, okay. What makes it special?

316
00:15:26.320 --> 00:15:30.480
<v Speaker 2>It stores data based on meaning using embeddings. Embeddings are

317
00:15:30.919 --> 00:15:36.120
<v Speaker 2>numerical representations, vectors, of text. Models like OpenAI's

318
00:15:36.200 --> 00:15:40.039
<v Speaker 2>text-embedding-ada-002, or open-source ones from

319
00:15:40.120 --> 00:15:42.559
<v Speaker 2>Hugging Face, turn text into these vectors.

320
00:15:42.519 --> 00:15:46.000
<v Speaker 1>So, numbers that represent the meaning? Exactly.

321
00:15:45.840 --> 00:15:48.360
<v Speaker 2>Text with similar meanings end up closer together in this

322
00:15:48.399 --> 00:15:51.159
<v Speaker 2>high-dimensional mathematical space. Think of it like a map

323
00:15:51.200 --> 00:15:51.879
<v Speaker 2>of concepts.

324
00:15:52.159 --> 00:15:54.039
<v Speaker 1>Is creating these embeddings expensive.

325
00:15:54.279 --> 00:15:57.799
<v Speaker 2>Actually, OpenAI's are pretty cheap. The source mentioned embedding

326
00:15:57.840 --> 00:16:00.519
<v Speaker 2>the entire King James Bible would cost something like a

327
00:16:00.559 --> 00:16:03.399
<v Speaker 2>dollar sixty cents, and there are good open source options too.

328
00:16:03.399 --> 00:16:07.000
<v Speaker 1>Okay, affordable. So these embeddings go into the vector database?

329
00:16:07.120 --> 00:16:10.639
<v Speaker 2>Right. Vector databases like FAISS, which is open source, or

330
00:16:10.679 --> 00:16:13.600
<v Speaker 2>hosted ones like Pinecone or Chroma, are built to

331
00:16:13.720 --> 00:16:17.120
<v Speaker 2>store these vectors and search them based on semantic similarity,

332
00:16:17.200 --> 00:16:20.440
<v Speaker 2>finding the vectors and thus the original text chunks that

333
00:16:20.480 --> 00:16:23.159
<v Speaker 2>are closest in meaning to your query vector.
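
NOTE
A minimal sketch of the similarity search just described. It uses tiny hand-made three-dimensional vectors instead of real model embeddings, and a brute-force scan instead of a vector database index such as FAISS. The function names and the data are illustrative only. They are not any library's API.
```python
import math
def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means they point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
def top_k(query: list[float], store: dict[str, list[float]], k: int = 2) -> list[str]:
    """Brute-force nearest-neighbour search: score every stored chunk, keep the best k."""
    ranked = sorted(store, key=lambda chunk: cosine_similarity(query, store[chunk]), reverse=True)
    return ranked[:k]
# Hypothetical toy "embeddings"; real models return hundreds or thousands of dimensions.
store = {
    "Our refund window is 30 days.": [0.9, 0.1, 0.0],
    "Refunds go back to the original card.": [0.8, 0.2, 0.1],
    "The office is closed on public holidays.": [0.0, 0.1, 0.9],
}
query = [0.85, 0.15, 0.05]  # pretend embedding of "How do refunds work?"
print(top_k(query, store))
```
The two refund chunks rank highest because their vectors point in nearly the same direction as the query. Real vector databases usually trade this exact scan for approximate indexes that scale to millions of vectors.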

334
00:16:23.240 --> 00:16:25.799
<v Speaker 1>And this whole process helps prevent the AI from just

335
00:16:25.919 --> 00:16:28.559
<v Speaker 1>making things up, right? The hallucinations.

336
00:16:28.960 --> 00:16:33.159
<v Speaker 2>Precisely. That leads us to retrieval augmented generation, or RAG. This

337
00:16:33.279 --> 00:16:36.320
<v Speaker 2>is the key technique for fighting hallucinations and also getting

338
00:16:36.360 --> 00:16:38.159
<v Speaker 2>around those context length limits.

339
00:16:38.279 --> 00:16:40.519
<v Speaker 1>How does RAG work in practice?

340
00:16:40.639 --> 00:16:43.960
<v Speaker 2>It's pretty elegant. A user asks a question, your system

341
00:16:44.039 --> 00:16:47.679
<v Speaker 2>first converts that question into an embedding vector. Then it

342
00:16:47.759 --> 00:16:51.000
<v Speaker 2>searches your vector database for the text chunks whose embeddings

343
00:16:51.000 --> 00:16:52.120
<v Speaker 2>are most similar.

344
00:16:52.159 --> 00:16:54.240
<v Speaker 1>Finds the relevant bits of your own data.

345
00:16:54.360 --> 00:16:57.960
<v Speaker 2>Exactly. It retrieves those relevant chunks and literally inserts them

346
00:16:58.000 --> 00:17:02.840
<v Speaker 2>into the prompt you send to the LLM, providing explicit context. Then, crucially,

347
00:17:03.120 --> 00:17:06.319
<v Speaker 2>you instruct the LLM to answer the user's question based

348
00:17:06.319 --> 00:17:07.960
<v Speaker 2>only on the provided context.
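
NOTE
A rough sketch of the RAG flow just described: retrieve the most relevant chunks, insert them into the prompt, and tell the model to answer only from them. To stay self-contained, it scores chunks by word overlap (Jaccard) instead of embedding similarity. The product facts and function names are hypothetical.
```python
import re
def tokenize(text: str) -> set[str]:
    """Lowercase word set: a crude stand-in for an embedding model (assumption)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))
def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by Jaccard word overlap with the question and keep the top k."""
    q = tokenize(question)
    def score(chunk: str) -> float:
        c = tokenize(chunk)
        return len(q & c) / len(q | c) if q | c else 0.0
    return sorted(chunks, key=score, reverse=True)[:k]
def build_rag_prompt(question: str, chunks: list[str], k: int = 2) -> str:
    """Insert the retrieved chunks into the prompt and restrict the answer to them."""
    context = "\n".join(f"- {c}" for c in retrieve(question, chunks, k))
    return (
        "Answer the question using ONLY the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
chunks = [
    "The X200 battery lasts 12 hours.",
    "The X200 weighs 1.2 kg.",
    "Our head office is in Leeds.",
]
print(build_rag_prompt("How long does the X200 battery last?", chunks))
```
The irrelevant head-office chunk never reaches the prompt. The "say you don't know" instruction gives the model an alternative to guessing when the context lacks the answer.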

349
00:17:08.079 --> 00:17:10.759
<v Speaker 1>Ah. So you're forcing it to use your verified information,

350
00:17:11.160 --> 00:17:13.039
<v Speaker 1>not just its general training data.

351
00:17:13.079 --> 00:17:16.240
<v Speaker 2>Precisely. It lets you dynamically pull in specific,

352
00:17:16.319 --> 00:17:20.640
<v Speaker 2>up-to-date knowledge, maybe chat history, specific PDF sections, product specs,

353
00:17:20.680 --> 00:17:24.359
<v Speaker 2>ensuring the AI's answer is informed, relevant, and grounded in fact.

354
00:17:24.640 --> 00:17:28.119
<v Speaker 1>That's huge for accuracy. Okay, so RAG gives it factual knowledge.

355
00:17:28.160 --> 00:17:30.599
<v Speaker 1>What about memory, making it remember past parts of a

356
00:17:30.599 --> 00:17:32.640
<v Speaker 1>conversation or user preferences over time.

357
00:17:32.799 --> 00:17:36.359
<v Speaker 2>That's memory in LLMs, and it's crucial for making interactions

358
00:17:36.359 --> 00:17:40.519
<v Speaker 2>feel natural and personalized. We can think about two types. First,

359
00:17:40.880 --> 00:17:42.319
<v Speaker 2>short term memory.

360
00:17:42.519 --> 00:17:46.200
<v Speaker 1>STM? Like working memory?

361
00:17:46.000 --> 00:17:49.240
<v Speaker 2>Kind of. Yeah, it lets the LLM remember what was said earlier within the same interaction.

362
00:17:49.799 --> 00:17:53.319
<v Speaker 2>Think of a support chatbot remembering your initial query when

363
00:17:53.359 --> 00:17:55.319
<v Speaker 2>you ask a follow up question minutes later.
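
NOTE
A plain-Python sketch of short-term memory as a rolling buffer of recent turns, prepended to each new prompt. This is not LangChain's own memory class. The class name, the turn limit, and the sample order are hypothetical.
```python
from collections import deque
class ConversationBuffer:
    """Hypothetical short-term memory: keeps only the most recent turns of one session."""
    def __init__(self, max_turns: int = 6):
        self.turns = deque(maxlen=max_turns)  # the oldest turns drop off automatically
    def add(self, role: str, text: str) -> None:
        self.turns.append((role, text))
    def as_prompt(self, new_message: str) -> str:
        """Prepend the remembered turns so the model can resolve follow-up questions."""
        lines = [f"{role}: {text}" for role, text in self.turns]
        lines += [f"user: {new_message}", "assistant:"]
        return "\n".join(lines)
memory = ConversationBuffer(max_turns=4)
memory.add("user", "My order 123 hasn't arrived.")
memory.add("assistant", "Sorry about that. I'm checking order 123 now.")
print(memory.as_prompt("Any update on it?"))
```
Because the earlier turns travel with the new message, the model can work out that "it" means order 123. The turn cap keeps the prompt inside the context window.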

364
00:17:56.039 --> 00:17:58.640
<v Speaker 2>LangChain makes adding STM pretty straightforward.

365
00:17:58.720 --> 00:18:02.079
<v Speaker 1>Okay, so it remembers the current chat. What about remembering things across

366
00:18:02.119 --> 00:18:04.680
<v Speaker 1>different sessions, days or weeks later?

367
00:18:04.880 --> 00:18:08.279
<v Speaker 2>That's long-term memory, LTM, and this is usually achieved

368
00:18:08.319 --> 00:18:11.319
<v Speaker 2>by storing summaries of past conversations or key pieces of

369
00:18:11.319 --> 00:18:14.599
<v Speaker 2>information in a vector database. When the user starts a

370
00:18:14.599 --> 00:18:18.839
<v Speaker 2>new session, you retrieve relevant past interactions or preferences using

371
00:18:18.839 --> 00:18:22.039
<v Speaker 2>similarity search and add that as context to the prompt.
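
NOTE
A sketch of long-term memory as just described: save per-user summaries, then in a later session retrieve the most relevant ones and add them to the prompt. Word overlap stands in for vector similarity search. The class, the user, and the summaries are hypothetical.
```python
import re
def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))
class LongTermMemory:
    """Hypothetical cross-session store of per-user summaries.
    Word overlap stands in for vector similarity search here (assumption)."""
    def __init__(self):
        self.records: list[tuple[str, str]] = []  # (user_id, summary)
    def save(self, user_id: str, summary: str) -> None:
        self.records.append((user_id, summary))
    def recall(self, user_id: str, query: str, k: int = 1) -> list[str]:
        """Return this user's k summaries that share the most words with the query."""
        q = words(query)
        mine = [s for uid, s in self.records if uid == user_id]
        return sorted(mine, key=lambda s: len(q & words(s)), reverse=True)[:k]
ltm = LongTermMemory()
ltm.save("alice", "Alice enjoys science fiction books, especially space opera.")
ltm.save("alice", "Alice asked about vegetarian recipes.")
# Weeks later, in a new session:
question = "Can you recommend some new books?"
prompt = f"Known preferences: {ltm.recall('alice', question)}\nUser: {question}"
print(prompt)
```
The book-preference summary is recalled and the recipe one is not. Filtering by user ID keeps one person's history out of another's prompts.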

372
00:18:22.160 --> 00:18:22.519
<v Speaker 1>So it

373
00:18:22.440 --> 00:18:25.119
<v Speaker 1>can remember my book preferences from last month when I

374
00:18:25.160 --> 00:18:26.920
<v Speaker 1>ask for a new recommendation?

375
00:18:26.960 --> 00:18:31.200
<v Speaker 2>Exactly that. It allows for truly personalized, context-aware interactions over time.

376
00:18:31.519 --> 00:18:34.160
<v Speaker 1>This is where it starts to feel really intelligent, capable

377
00:18:34.200 --> 00:18:38.000
<v Speaker 1>of complex tasks, which brings us to AI agents. What

378
00:18:38.079 --> 00:18:40.519
<v Speaker 1>if the AI could not just think or retrieve info,

379
00:18:40.799 --> 00:18:44.039
<v Speaker 1>but actually do things, take actions?

380
00:18:44.200 --> 00:18:47.880
<v Speaker 2>Exactly. We're now in the realm of agent-based architecture. The

381
00:18:47.920 --> 00:18:52.880
<v Speaker 2>AI acts, perceives, and makes decisions to achieve goals. A key

382
00:18:52.960 --> 00:18:56.440
<v Speaker 2>technique enabling this is chain-of-thought, or CoT, reasoning.

383
00:18:56.519 --> 00:18:59.200
<v Speaker 1>We touched on that, making the AI think step by

384
00:18:59.240 --> 00:18:59.960
<v Speaker 1>step, right?

385
00:19:00.000 --> 00:19:02.079
<v Speaker 2>Instead of just asking for, say, a marketing plan,

386
00:19:02.480 --> 00:19:04.960
<v Speaker 2>you prompt the AI to first think through the steps:

387
00:19:05.160 --> 00:19:08.920
<v Speaker 2>first, consider the target audience; second, analyze the budget constraints;

388
00:19:09.400 --> 00:19:12.640
<v Speaker 2>third, research competitor products; then outline the plan.

389
00:19:12.559 --> 00:19:14.519
<v Speaker 1>Breaking down the problem.

390
00:19:14.559 --> 00:19:17.319
<v Speaker 2>Yeah. It forces a more structured reasoning process, leading to much

391
00:19:17.359 --> 00:19:20.279
<v Speaker 2>more relevant and well thought out responses than just asking

392
00:19:20.279 --> 00:19:22.680
<v Speaker 2>for the final answer directly. It's like making it show

393
00:19:22.720 --> 00:19:23.119
<v Speaker 2>its work.
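
NOTE
A small sketch of how the step-by-step marketing-plan prompt just described might be assembled. The product and the exact instruction wording are hypothetical. The point is that the steps are spelled out before the final answer is requested.
```python
def chain_of_thought_prompt(task: str, steps: list[str]) -> str:
    """Ask the model to reason through numbered steps before giving its final answer."""
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    return (
        f"Task: {task}\n"
        "Before answering, work through these steps one at a time and show your reasoning:\n"
        f"{numbered}\n"
        "Then give the final result after the line 'FINAL ANSWER:'."
    )
prompt = chain_of_thought_prompt(
    "Write a marketing plan for a new running shoe.",
    [
        "Consider the target audience.",
        "Analyze the budget constraints.",
        "Research competitor products.",
        "Outline the plan.",
    ],
)
print(prompt)
```
The fixed 'FINAL ANSWER:' marker is a design choice. It makes it easy to separate the shown work from the answer when parsing the response.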

394
00:19:23.319 --> 00:19:26.480
<v Speaker 1>Okay, so CoT improves reasoning. How does that connect to

395
00:19:26.559 --> 00:19:27.960
<v Speaker 1>taking actual actions?

396
00:19:28.279 --> 00:19:31.200
<v Speaker 2>That leads directly to the Reason and Act, or ReAct, framework.

397
00:19:31.599 --> 00:19:34.440
<v Speaker 2>This explicitly combines that chain-of-thought reasoning with the

398
00:19:34.480 --> 00:19:36.440
<v Speaker 2>ability to take actions using tools.

399
00:19:36.720 --> 00:19:39.799
<v Speaker 1>Okay, reason and act. How does that loop work?

400
00:19:40.119 --> 00:19:45.160
<v Speaker 2>It's a cycle. One, thought: the LLM internally reasons about

401
00:19:45.160 --> 00:19:48.599
<v Speaker 2>what it needs to do next. Two, action: it decides

402
00:19:48.599 --> 00:19:50.920
<v Speaker 2>which tool it needs to use, like a search engine

403
00:19:51.000 --> 00:19:53.799
<v Speaker 2>or a calculator, and formulates the input for that tool.

404
00:19:54.480 --> 00:19:59.319
<v Speaker 2>Three, observation: it receives the result from the tool.

405
00:19:59.319 --> 00:20:02.920
<v Speaker 2>This thought-action-observation cycle continues until it reaches the final answer or

406
00:20:02.960 --> 00:20:03.920
<v Speaker 2>completes its task.
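
NOTE
A minimal sketch of the thought-action-observation loop just described. A scripted function stands in for the real LLM so the loop runs offline, and the two tools (a calculator and a tiny lookup table) are hypothetical. Each tool carries a name and a description, following the book's tip that clear descriptions help the model pick the right tool.
```python
import operator
import re
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}
def calculator(expr: str) -> str:
    """Evaluate one binary operation such as '330 * 2' (no eval, on purpose)."""
    m = re.fullmatch(r"\s*(-?\d+(?:\.\d+)?)\s*([-+*/])\s*(-?\d+(?:\.\d+)?)\s*", expr)
    if m is None:
        return "error: unsupported expression"
    result = OPS[m.group(2)](float(m.group(1)), float(m.group(3)))
    return str(int(result)) if result.is_integer() else str(result)
def lookup(key: str) -> str:
    """Look up a fact in a tiny hard-coded table (stand-in for a search tool)."""
    return {"eiffel tower height m": "330"}.get(key.strip().lower(), "unknown")
# Clear names and descriptions help the model pick the right tool.
TOOLS = {
    "calculator": ("Evaluates a single arithmetic operation, e.g. '3 * 7'.", calculator),
    "lookup": ("Looks up a fact by a short key, e.g. 'eiffel tower height m'.", lookup),
}
def describe_tools() -> str:
    return "\n".join(f"{name}: {desc}" for name, (desc, _) in TOOLS.items())
def scripted_llm(transcript: str) -> str:
    """Stand-in for a real model call: picks the next Thought/Action from what it has observed."""
    if "Observation: 330" not in transcript:
        return "Thought: I need the tower's height.\nAction: lookup[eiffel tower height m]"
    if "Observation: 660" not in transcript:
        return "Thought: Now I double it.\nAction: calculator[330 * 2]"
    return "Thought: I have everything I need.\nFinal Answer: 660 metres"
def react(question: str, llm, max_steps: int = 5) -> str:
    """Thought, Action, Observation loop until a final answer or the step limit."""
    transcript = f"Tools:\n{describe_tools()}\nQuestion: {question}"
    for _ in range(max_steps):
        step = llm(transcript)
        transcript += "\n" + step
        final = re.search(r"Final Answer:\s*(.*)", step)
        if final:
            return final.group(1)
        action = re.search(r"Action:\s*(\w+)\[(.*)\]", step)
        if action is None or action.group(1) not in TOOLS:
            break
        _, tool = TOOLS[action.group(1)]
        transcript += f"\nObservation: {tool(action.group(2))}"
    return "No answer within the step limit."
print(react("What is twice the height of the Eiffel Tower in metres?", scripted_llm))
```
The step limit guards against a model that never reaches a final answer. Tool results are appended to the transcript, so the next thought can build on them.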

407
00:20:04.039 --> 00:20:05.920
<v Speaker 1>So it can decide, 'I need to search the web

408
00:20:05.960 --> 00:20:08.680
<v Speaker 1>for this,' run the search tool, see the results, then

409
00:20:08.720 --> 00:20:10.680
<v Speaker 1>think about the next step based on those results.

410
00:20:10.720 --> 00:20:13.640
<v Speaker 2>Precisely. It allows the AI to interact dynamically with its

411
00:20:13.720 --> 00:20:16.240
<v Speaker 2>environment to gather information or perform tasks.

412
00:20:16.400 --> 00:20:18.519
<v Speaker 1>What kind of tools can these agents use? Are they

413
00:20:18.519 --> 00:20:19.160
<v Speaker 1>predefined?

414
00:20:19.319 --> 00:20:21.920
<v Speaker 2>Yes, they're predefined functions or APIs you make available

415
00:20:21.960 --> 00:20:24.680
<v Speaker 2>to the agent. Examples are things like a simple calculator,

416
00:20:24.839 --> 00:20:27.279
<v Speaker 2>a Google Search interface, tools to interact with your file

417
00:20:27.359 --> 00:20:31.279
<v Speaker 2>system to read and write files, tools to make HTTP requests, like

418
00:20:31.319 --> 00:20:34.039
<v Speaker 2>interacting with web APIs, or even things like Twilio to

419
00:20:34.079 --> 00:20:35.240
<v Speaker 2>send SMS messages.

420
00:20:35.519 --> 00:20:38.119
<v Speaker 1>So you equip the agent with the capabilities it needs.

421
00:20:38.119 --> 00:20:40.880
<v Speaker 2>Exactly. And a key tip from the book: give your

422
00:20:41.000 --> 00:20:45.359
<v Speaker 2>tools really clear descriptive names and descriptions. It significantly helps

423
00:20:45.359 --> 00:20:48.279
<v Speaker 2>the LLM choose the right tool for the job. LangChain

424
00:20:48.359 --> 00:20:52.000
<v Speaker 2>also offers prebuilt agent toolkits for common scenarios,

425
00:20:52.440 --> 00:20:55.400
<v Speaker 2>like a CSV agent that can query data in spreadsheets

426
00:20:55.680 --> 00:20:58.640
<v Speaker 2>or an SQL agent for databases. It saves you from building everything

427
00:20:58.680 --> 00:21:00.000
<v Speaker 2>from scratch.

428
00:21:00.319 --> 00:21:03.079
<v Speaker 1>Makes sense. Are there even more advanced agent designs out there, going

429
00:21:03.079 --> 00:21:04.279
<v Speaker 1>beyond this ReAct loop?

430
00:21:04.480 --> 00:21:08.079
<v Speaker 2>Oh, yeah, we're seeing advanced agent architectures emerge. One example

431
00:21:08.119 --> 00:21:10.839
<v Speaker 2>is BabyAGI. It's less a single loop and more

432
00:21:11.000 --> 00:21:12.440
<v Speaker 2>a system of interacting agents.

433
00:21:12.680 --> 00:21:14.240
<v Speaker 1>How does BabyAGI work?

434
00:21:14.599 --> 00:21:17.319
<v Speaker 2>It has a continuous cycle. It pulls a task from

435
00:21:17.319 --> 00:21:20.279
<v Speaker 2>a list and uses an LLM, with context often from a

436
00:21:20.359 --> 00:21:23.279
<v Speaker 2>vector DB like Chroma or Weaviate, to execute it. It saves

437
00:21:23.279 --> 00:21:26.400
<v Speaker 2>the result. Then a separate task-creation agent figures out

438
00:21:26.440 --> 00:21:29.079
<v Speaker 2>new follow-up tasks based on the outcome, and another

439
00:21:29.160 --> 00:21:32.720
<v Speaker 2>prioritization agent reorders the task list. It's a self-perpetuating

440
00:21:32.759 --> 00:21:33.720
<v Speaker 2>goal-seeking system.
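
NOTE
A sketch of the BabyAGI-style cycle just described: pull a task, execute it, store the result, create follow-up tasks, then reprioritize. The three "agents" are deterministic stubs standing in for LLM calls, and the task names are hypothetical. This is not BabyAGI's actual code.
```python
from collections import deque
def run_task_loop(objective, first_task, execute, create_tasks, prioritize, max_iterations=5):
    """BabyAGI-style cycle: pull a task, execute it, store the result,
    create follow-up tasks, then reprioritize the queue."""
    tasks = deque([first_task])
    results = []  # a real setup would store these in a vector DB (e.g. Chroma) for context
    for _ in range(max_iterations):
        if not tasks:
            break
        task = tasks.popleft()
        result = execute(objective, task, results)
        results.append((task, result))
        tasks.extend(create_tasks(objective, task, result, list(tasks)))
        tasks = deque(prioritize(objective, list(tasks)))
    return results
# Deterministic stubs standing in for the three LLM-backed agents (hypothetical).
def execute(objective, task, results):
    return f"done: {task}"
def create_tasks(objective, task, result, pending):
    follow_ups = {"Research topic": ["Write outline", "Find sources"], "Write outline": ["Draft post"]}
    return [t for t in follow_ups.get(task, []) if t not in pending]
def prioritize(objective, pending):
    return sorted(pending)  # a real prioritization agent would ask the LLM to rank these
log = run_task_loop("Publish a blog post", "Research topic", execute, create_tasks, prioritize)
print([task for task, _ in log])
```
The iteration cap matters in practice, because a task-creation agent can keep generating work indefinitely.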

441
00:21:33.880 --> 00:21:37.160
<v Speaker 1>Wow, like a little automated project manager.

442
00:21:37.160 --> 00:21:39.319
<v Speaker 2>Kind of. Another fascinating one is Tree of Thoughts, or ToT.

443
00:21:39.640 --> 00:21:42.000
<v Speaker 2>This moves beyond simple, linear, step-by-step reasoning.

444
00:21:41.799 --> 00:21:43.640
<v Speaker 1>Tree of Thoughts. How's that different?

445
00:21:43.920 --> 00:21:48.359
<v Speaker 2>It lets the LLM explore multiple potential reasoning paths simultaneously.

446
00:21:48.720 --> 00:21:51.559
<v Speaker 2>Like branches on a tree. It can evaluate different paths,

447
00:21:51.759 --> 00:21:55.720
<v Speaker 2>backtrack if one isn't working, and explore alternatives, much more

448
00:21:56.079 --> 00:21:58.359
<v Speaker 2>like human brainstorming or strategic thinking.

449
00:21:58.519 --> 00:22:00.440
<v Speaker 1>Does it actually improve performance?

450
00:22:00.599 --> 00:22:04.680
<v Speaker 2>Significantly, on certain types of problems. The example given is

451
00:22:04.680 --> 00:22:07.880
<v Speaker 2>the Game of twenty-four puzzle. Standard GPT-4 with

452
00:22:08.079 --> 00:22:10.799
<v Speaker 2>just chain-of-thought got it right maybe four percent

453
00:22:10.799 --> 00:22:14.279
<v Speaker 2>of the time. With Tree of Thoughts exploring different calculation paths,

454
00:22:14.359 --> 00:22:16.240
<v Speaker 2>it jumped to seventy four percent success.
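
NOTE
To make the branching-and-backtracking idea concrete, here is a sketch of a tree search over the Game of twenty-four. In Tree of Thoughts proper, an LLM proposes and scores each intermediate step. This sketch swaps in exhaustive enumeration so it runs without a model, and keeps only the tree structure and the backtracking. It is not the method used in the experiment quoted above.
```python
from fractions import Fraction
from itertools import combinations
from typing import Optional
def solve_24(numbers: list[int]) -> Optional[str]:
    """Depth-first tree search: each 'thought' combines two numbers with one operation.
    Dead ends are abandoned and the search backtracks to try another branch."""
    def search(state: list[tuple[Fraction, str]]) -> Optional[str]:
        if len(state) == 1:
            value, expr = state[0]
            return expr if value == 24 else None  # leaf: is this path a solution?
        for i, j in combinations(range(len(state)), 2):
            (a, ea), (b, eb) = state[i], state[j]
            rest = [state[k] for k in range(len(state)) if k not in (i, j)]
            branches = [(a + b, f"({ea}+{eb})"), (a - b, f"({ea}-{eb})"),
                        (b - a, f"({eb}-{ea})"), (a * b, f"({ea}*{eb})")]
            if b != 0:
                branches.append((a / b, f"({ea}/{eb})"))
            if a != 0:
                branches.append((b / a, f"({eb}/{ea})"))
            for value, expr in branches:
                found = search(rest + [(value, expr)])
                if found:
                    return found
        return None  # every branch from this node failed: backtrack
    return search([(Fraction(n), str(n)) for n in numbers])
print(solve_24([4, 9, 10, 13]))
```
Exact fractions avoid floating-point misses on paths that pass through division. A linear chain of thought, by contrast, commits to one path and cannot recover from an early bad step.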

455
00:22:16.480 --> 00:22:20.079
<v Speaker 1>That's a massive difference. Shows the power of exploring multiple options.

456
00:22:20.119 --> 00:22:24.039
<v Speaker 2>A huge difference. It really pushes the boundaries of AI problem-solving.

457
00:22:24.200 --> 00:22:26.359
<v Speaker 1>Okay, switching gears a bit, but still on this creative

458
00:22:26.359 --> 00:22:29.279
<v Speaker 1>potential: image generation. For a lot of people, that was

459
00:22:29.319 --> 00:22:32.359
<v Speaker 1>the wow moment for AI. It feels like magic, but

460
00:22:32.440 --> 00:22:35.599
<v Speaker 1>how do we actually guide that process effectively with prompts?

461
00:22:35.680 --> 00:22:38.079
<v Speaker 2>It definitely feels like magic, but yeah, prompt engineering is

462
00:22:38.079 --> 00:22:40.799
<v Speaker 2>your magic wand here. The technology behind most of it

463
00:22:40.880 --> 00:22:41.759
<v Speaker 2>is diffusion models.

464
00:22:41.839 --> 00:22:43.759
<v Speaker 1>Diffusion models, right?

465
00:22:43.680 --> 00:22:45.920
<v Speaker 2>Introduced back in twenty fifteen, but they really took off recently. They

466
00:22:45.920 --> 00:22:49.400
<v Speaker 2>produce amazing images from text descriptions. You know the big names.

467
00:22:50.519 --> 00:22:52.960
<v Speaker 2>DALL-E two came out in twenty twenty-two, Midjourney hit

468
00:22:53.000 --> 00:22:55.680
<v Speaker 2>the scene in July twenty twenty-two, and then open source

469
00:22:55.680 --> 00:22:59.039
<v Speaker 2>Stable Diffusion landed in August twenty twenty-two, and DALL-E three

470
00:22:59.240 --> 00:23:00.720
<v Speaker 2>is now baked into ChatGPT.

471
00:23:01.000 --> 00:23:02.319
<v Speaker 1>How do they actually work?

472
00:23:02.519 --> 00:23:05.279
<v Speaker 2>In simple terms, they're trained on billions of image-caption pairs.

473
00:23:05.359 --> 00:23:08.400
<v Speaker 2>They learn the connection between words and visuals. The process

474
00:23:08.440 --> 00:23:12.079
<v Speaker 2>starts with random noise, like TV static, and they gradually

475
00:23:12.119 --> 00:23:15.240
<v Speaker 2>denoise it step by step, guiding it towards an image

476
00:23:15.279 --> 00:23:18.839
<v Speaker 2>that matches your text prompt. They navigate this huge latent space,

477
00:23:19.319 --> 00:23:21.279
<v Speaker 2>think of it as a vast map of all possible

478
00:23:21.319 --> 00:23:23.480
<v Speaker 2>images, to find the spot that matches your description.

479
00:23:23.960 --> 00:23:26.519
<v Speaker 1>Fascinating. What are the sort of vibes or strengths of

480
00:23:26.519 --> 00:23:30.200
<v Speaker 1>those main models? DALL-E, Midjourney, Stable Diffusion?

481
00:23:29.759 --> 00:23:32.480
<v Speaker 2>Well, DALL-E got famous for its artistic flexibility, though early

482
00:23:32.559 --> 00:23:36.200
<v Speaker 2>versions sometimes struggled with, like, hands and eyes. Midjourney

483
00:23:36.240 --> 00:23:39.119
<v Speaker 2>built a huge following, especially on Discord, known for its

484
00:23:39.200 --> 00:23:43.640
<v Speaker 2>distinct aesthetic, often fantasy and sci-fi, very polished photorealistic styles.

485
00:23:44.400 --> 00:23:47.279
<v Speaker 2>Stable Diffusion really shook things up by being open source.

486
00:23:47.319 --> 00:23:49.759
<v Speaker 2>You can run it yourself, right on your own computer,

487
00:23:49.799 --> 00:23:52.720
<v Speaker 2>if you have a decent graphics card. That open nature

488
00:23:52.799 --> 00:23:56.359
<v Speaker 2>led to super-fast development, tons of community add-ons

489
00:23:56.400 --> 00:23:59.440
<v Speaker 2>and advanced features like ControlNet. It's generally seen as

490
00:23:59.480 --> 00:24:01.640
<v Speaker 2>the most flexible and extendable one.

491
00:24:02.039 --> 00:24:06.039
<v Speaker 1>So how do our basic prompting principles, direction, format, examples,

492
00:24:06.079 --> 00:24:07.359
<v Speaker 1>apply to making images?

493
00:24:07.599 --> 00:24:12.279
<v Speaker 2>They apply surprisingly well. First, format modifiers. Just like specifying

494
00:24:12.359 --> 00:24:15.880
<v Speaker 2>JSON for text, you specify the visual style: an oil

495
00:24:15.920 --> 00:24:19.400
<v Speaker 2>painting of a business meeting versus an ancient Egyptian hieroglyph

496
00:24:19.480 --> 00:24:22.279
<v Speaker 2>of a business meeting. It changes the whole look completely.

497
00:24:22.440 --> 00:24:25.240
<v Speaker 2>Just be aware that sometimes the style brings baggage from the

498
00:24:25.279 --> 00:24:28.880
<v Speaker 2>training data, like oil paintings often appearing with digital frames

499
00:24:28.880 --> 00:24:30.359
<v Speaker 2>around them unless you negate that.

500
00:24:30.640 --> 00:24:33.359
<v Speaker 1>Oh, okay. What about specific artists?

501
00:24:32.960 --> 00:24:35.400
<v Speaker 2>Yep, art style modifiers. You can ask for the style

502
00:24:35.440 --> 00:24:39.960
<v Speaker 2>of Van Gogh, Dalí, Picasso, specific art movements. Midjourney

503
00:24:40.000 --> 00:24:42.480
<v Speaker 2>even has a /describe command where you upload an image

504
00:24:42.640 --> 00:24:45.359
<v Speaker 2>and it suggests prompts that might create something similar. Great

505
00:24:45.400 --> 00:24:46.319
<v Speaker 2>for learning.

506
00:24:46.720 --> 00:24:48.920
<v Speaker 1>Cool. Are there quick tricks to just make the images look

507
00:24:49.359 --> 00:24:50.680
<v Speaker 1>better, higher quality?

508
00:24:51.119 --> 00:24:55.759
<v Speaker 2>Yeah, quality boosters: simple words like 4K, highly detailed,

509
00:24:56.000 --> 00:24:59.640
<v Speaker 2>masterpiece, trending on ArtStation. Adding these often bumps up

510
00:24:59.640 --> 00:25:02.880
<v Speaker 2>the quality without drastically changing the content or style.

511
00:25:03.279 --> 00:25:06.920
<v Speaker 1>Easy wins. And what about telling it what not to include?

512
00:25:07.119 --> 00:25:11.000
<v Speaker 2>Crucial. That's negative prompts. You specify what you don't want,

513
00:25:11.559 --> 00:25:13.640
<v Speaker 2>like, for that oil painting, you might add no frame,

514
00:25:13.799 --> 00:25:17.200
<v Speaker 2>border, signature, or if you want a realistic Homer Simpson,

515
00:25:17.480 --> 00:25:21.559
<v Speaker 2>you'd add no cartoon, animation. It helps untangle concepts the

516
00:25:21.599 --> 00:25:25.720
<v Speaker 2>AI might merge and fixes common glitches like mangled hands.

517
00:25:26.000 --> 00:25:28.559
<v Speaker 1>Very useful. Can you emphasize certain parts of the prompt?

518
00:25:28.720 --> 00:25:32.319
<v Speaker 2>Yes, using weighted terms. Different models have different syntax, but

519
00:25:32.440 --> 00:25:35.319
<v Speaker 2>often you use parentheses, maybe with a number like

520
00:25:35.400 --> 00:25:38.079
<v Speaker 2>one point five, to boost its importance, or square brackets

521
00:25:38.079 --> 00:25:41.359
<v Speaker 2>in some systems to de-emphasize something. It gives you finer control.
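
NOTE
A sketch of how weighted-term syntax could be parsed. It assumes the convention used by the AUTOMATIC1111 Stable Diffusion web UI, as we understand it: (term:1.5) sets an explicit weight, while plain (term) raises the weight and [term] lowers it by a factor of 1.1. Other tools use different syntax, and nesting is deliberately not handled. The function is hypothetical and only shows how a prompt string maps to per-term weights.
```python
import re
TOKEN = re.compile(r"\(([^():]+):([\d.]+)\)|\(([^()]+)\)|\[([^\[\]]+)\]|([^()\[\]]+)")
def parse_weights(prompt: str) -> list[tuple[str, float]]:
    """Split a prompt into (text, weight) pairs using a simplified, non-nested syntax."""
    pairs = []
    for explicit, weight, boosted, reduced, plain in TOKEN.findall(prompt):
        if explicit:
            pairs.append((explicit.strip(), float(weight)))
        elif boosted:
            pairs.append((boosted.strip(), 1.1))
        elif reduced:
            pairs.append((reduced.strip(), round(1 / 1.1, 3)))
        elif plain.strip(" ,"):
            pairs.append((plain.strip(" ,"), 1.0))
    return pairs
print(parse_weights("portrait of a pirate, (red coat:1.5), [hat]"))
```
In a real pipeline, these weights scale how strongly each term's text encoding influences generation.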

522
00:25:41.400 --> 00:25:43.319
<v Speaker 1>Okay, what if you have an image you like and

523
00:25:43.359 --> 00:25:44.640
<v Speaker 1>want something similar.

524
00:25:44.559 --> 00:25:48.160
<v Speaker 2>That's prompting with an image, often called img2img.

525
00:25:48.599 --> 00:25:51.279
<v Speaker 2>You provide a starting image as guidance, and the AI tries

526
00:25:51.319 --> 00:25:54.240
<v Speaker 2>to capture its vibe, composition, or style. It's like a

527
00:25:54.240 --> 00:25:55.720
<v Speaker 2>one-shot visual example for the AI.

528
00:25:55.680 --> 00:25:59.400
<v Speaker 1>Neat. What about editing images the AI creates

529
00:25:59.480 --> 00:26:03.000
<v Speaker 1>or keeping characters consistent across multiple images? That seems hard.

530
00:26:03.680 --> 00:26:06.839
<v Speaker 2>It is tricky, but there are tools. Inpainting lets

531
00:26:06.880 --> 00:26:09.720
<v Speaker 2>you mask off a specific area of an image and

532
00:26:09.759 --> 00:26:12.799
<v Speaker 2>then prompt the AI to regenerate just that area, like

533
00:26:12.920 --> 00:26:15.119
<v Speaker 2>changing someone's shirt color or adding an object.

534
00:26:14.880 --> 00:26:17.799
<v Speaker 1>So, targeted changes.

535
00:26:17.680 --> 00:26:20.880
<v Speaker 2>Exactly. And outpainting does the opposite. It extends the image beyond

536
00:26:20.880 --> 00:26:24.680
<v Speaker 2>its original borders, generating new content that fits. People often

537
00:26:24.759 --> 00:26:27.240
<v Speaker 2>use a combination of inpainting and outpainting to try

538
00:26:27.279 --> 00:26:31.119
<v Speaker 2>and maintain consistent characters across a series of images, tweaking

539
00:26:31.119 --> 00:26:32.799
<v Speaker 2>faces or outfits as needed.

540
00:26:32.799 --> 00:26:36.319
<v Speaker 1>Still sounds a bit manual. Are there more advanced ways

541
00:26:36.359 --> 00:26:38.279
<v Speaker 1>to control the composition or pose?

542
00:26:38.480 --> 00:26:41.440
<v Speaker 2>Oh, yes. This is where ControlNet comes in, especially

543
00:26:41.440 --> 00:26:43.960
<v Speaker 2>for Stable Diffusion. It's a game changer. It lets you

544
00:26:44.000 --> 00:26:47.400
<v Speaker 2>provide an input image purely for structural guidance, things like

545
00:26:47.440 --> 00:26:51.440
<v Speaker 2>Canny edge maps, depth maps, human pose skeletons from OpenPose,

546
00:26:51.519 --> 00:26:53.640
<v Speaker 2>segmentation maps, even just rough scribbles.

547
00:26:53.519 --> 00:26:56.240
<v Speaker 1>So you could sketch a layout and ControlNet

548
00:26:56.279 --> 00:26:59.000
<v Speaker 1>makes the AI follow that structure?

549
00:26:59.599 --> 00:27:03.039
<v Speaker 2>Precisely. It gives artists incredible control over the final composition while

550
00:27:03.079 --> 00:27:06.880
<v Speaker 2>letting the AI handle the rendering and details. It bridges

551
00:27:06.920 --> 00:27:11.119
<v Speaker 2>the gap between human intent and AI generation. Another helpful

552
00:27:11.160 --> 00:27:14.720
<v Speaker 2>tool is the Segment Anything Model, SAM, from Meta AI.

553
00:27:15.279 --> 00:27:19.160
<v Speaker 2>It's amazing at precisely identifying and masking objects or people

554
00:27:19.200 --> 00:27:22.279
<v Speaker 2>in an image, which is super useful for targeted inpainting.

555
00:27:22.480 --> 00:27:25.799
<v Speaker 1>Wow, that's real control. What about teaching the AI about

556
00:27:25.839 --> 00:27:29.440
<v Speaker 1>my specific product, or my face, or my company style,

557
00:27:29.960 --> 00:27:31.240
<v Speaker 1>stuff it wasn't trained on?

558
00:27:31.480 --> 00:27:34.440
<v Speaker 2>That's personalization, and the main technique there is DreamBooth

559
00:27:34.480 --> 00:27:37.920
<v Speaker 2>fine-tuning. You teach the diffusion model a new concept

560
00:27:38.119 --> 00:27:40.119
<v Speaker 2>by showing it just a few images of that thing,

561
00:27:40.200 --> 00:27:42.839
<v Speaker 2>your product, your pet, whatever. It creates a new custom

562
00:27:42.880 --> 00:27:45.079
<v Speaker 2>model file that understands that specific concept.

563
00:27:45.119 --> 00:27:47.119
<v Speaker 1>So I can generate images of my dog in

564
00:27:47.160 --> 00:27:48.000
<v Speaker 1>Van Gogh's style?

565
00:27:47.880 --> 00:27:50.079
<v Speaker 2>Exactly, that kind of thing. And tying this all together,

566
00:27:50.119 --> 00:27:52.039
<v Speaker 2>there's meta prompting for images.

567
00:27:51.799 --> 00:27:53.720
<v Speaker 1>Using one AI to write the prompt for another?

568
00:27:54.039 --> 00:27:57.079
<v Speaker 2>Yeah, you could ask ChatGPT, 'Write me a detailed

569
00:27:57.119 --> 00:28:00.000
<v Speaker 2>Midjourney prompt for a photorealistic image of a

570
00:28:00.039 --> 00:28:03.960
<v Speaker 2>futuristic cityscape at sunset.' It often crafts a much better,

571
00:28:04.079 --> 00:28:07.279
<v Speaker 2>more detailed prompt than a non-expert might write themselves,

572
00:28:07.839 --> 00:28:09.880
<v Speaker 2>which divides the labor effectively.

573
00:28:09.400 --> 00:28:12.720
<v Speaker 1>Clever. And what was that last really intriguing concept? Meme something?

574
00:28:13.519 --> 00:28:17.319
<v Speaker 2>Meme unbundling and mapping. This is more advanced conceptual stuff.

575
00:28:17.839 --> 00:28:20.119
<v Speaker 2>Instead of just copying an artist, like 'in the style

576
00:28:20.160 --> 00:28:23.160
<v Speaker 2>of Van Gogh,' you try to decompose that style into

577
00:28:23.160 --> 00:28:26.720
<v Speaker 2>its core components, the memes, meaning the recurring visual elements:

578
00:28:26.799 --> 00:28:29.519
<v Speaker 2>color palettes, brushstroke types, compositional tricks.

579
00:28:29.599 --> 00:28:32.119
<v Speaker 1>Break down the style into its ingredients.

580
00:28:32.160 --> 00:28:34.880
<v Speaker 2>Exactly. Then you can remix those ingredients, maybe combining elements from

581
00:28:34.880 --> 00:28:38.680
<v Speaker 2>different styles to create something new and original. Meme mapping

582
00:28:38.799 --> 00:28:42.960
<v Speaker 2>is about the community aspect: sharing analysis, learning from successful prompts,

583
00:28:42.960 --> 00:28:46.960
<v Speaker 2>figuring out together what makes certain styles visually appealing. It's

584
00:28:46.960 --> 00:28:50.400
<v Speaker 2>about deconstructing and reconstructing visual language.

585
00:28:50.359 --> 00:28:54.400
<v Speaker 1>Fascinating. Okay, you've walked us through an incredible range of techniques,

586
00:28:54.440 --> 00:28:58.319
<v Speaker 1>from basic text prompts to complex agents and highly controlled

587
00:28:58.359 --> 00:29:01.960
<v Speaker 1>image generation. How does this all come together? Can you

588
00:29:01.960 --> 00:29:04.359
<v Speaker 1>give us a practical example of building something real with

589
00:29:04.440 --> 00:29:05.119
<v Speaker 1>these techniques?

590
00:29:05.359 --> 00:29:09.119
<v Speaker 2>Sure. Let's imagine building an end-to-end AI blog

591
00:29:09.119 --> 00:29:12.440
<v Speaker 2>writing system. This would integrate many things we've discussed.

592
00:29:12.559 --> 00:29:15.000
<v Speaker 1>Okay, an automated blog writer. How would it start?

593
00:29:15.200 --> 00:29:19.200
<v Speaker 2>First, topic research. It could use a tool, maybe integrated

594
00:29:19.279 --> 00:29:22.720
<v Speaker 2>via LangChain, like Google search results, to scrape the top

595
00:29:22.799 --> 00:29:26.039
<v Speaker 2>Google hits for the chosen topic, and process that info to

596
00:29:26.039 --> 00:29:27.920
<v Speaker 2>get a baseline understanding.

597
00:29:27.720 --> 00:29:30.599
<v Speaker 1>So grounded in actual search data. Smart.

598
00:29:31.039 --> 00:29:34.160
<v Speaker 2>Then to make the content unique, it could simulate an

599
00:29:34.160 --> 00:29:37.519
<v Speaker 2>expert interview. You'd use role prompting to have one LLM

600
00:29:37.599 --> 00:29:40.319
<v Speaker 2>act as an expert on the topic while another LLM interviews

601
00:29:40.319 --> 00:29:43.160
<v Speaker 2>it, generating unique insights and quotes you wouldn't find just

602
00:29:43.200 --> 00:29:44.160
<v Speaker 2>by scraping the web.

603
00:29:44.200 --> 00:29:46.759
<v Speaker 1>Adding original perspective. Nice. What's next?

604
00:29:46.839 --> 00:29:50.119
<v Speaker 2>Outline generation. The system would prompt an LLM to create

605
00:29:50.119 --> 00:29:54.000
<v Speaker 2>a detailed, structured outline, maybe in JSON or nested-list format,

606
00:29:54.279 --> 00:29:55.720
<v Speaker 2>based on the research and the interview.

607
00:29:55.839 --> 00:29:58.759
<v Speaker 1>Okay, structure first, then the writing.

608
00:29:58.599 --> 00:30:02.000
<v Speaker 2>Right, text generation. It would go section by section through

609
00:30:02.000 --> 00:30:05.440
<v Speaker 2>the outline, feeding the LLM the relevant research chunks and

610
00:30:05.559 --> 00:30:09.400
<v Speaker 2>interview snippets as context for each part, with strict instructions

611
00:30:09.440 --> 00:30:12.960
<v Speaker 2>not to plagiarize. This relies heavily on good chunking and

612
00:30:13.039 --> 00:30:14.920
<v Speaker 2>contextual prompting.

613
00:30:15.079 --> 00:30:16.720
<v Speaker 1>Makes sense. What about images for the blog post?

614
00:30:16.480 --> 00:30:20.160
<v Speaker 2>Image generation. For each post, it could generate a

615
00:30:20.200 --> 00:30:23.240
<v Speaker 2>custom image. This could be a two step process using

616
00:30:23.279 --> 00:30:27.480
<v Speaker 2>meta prompting. First, have an LLM generate a really good

617
00:30:27.799 --> 00:30:30.720
<v Speaker 2>descriptive image prompt based on the section's content.

618
00:30:30.960 --> 00:30:33.519
<v Speaker 1>The AI writes the image prompt.

619
00:30:33.519 --> 00:30:35.680
<v Speaker 2>Exactly. Then feed that prompt to an image model like Stable

620
00:30:35.720 --> 00:30:39.119
<v Speaker 2>Diffusion XL. You could even specify a consistent style like

621
00:30:39.319 --> 00:30:42.960
<v Speaker 2>Corporate Memphis for all images across the blog. Fully automated

622
00:30:43.000 --> 00:30:43.839
<v Speaker 2>visuals.

623
00:30:44.200 --> 00:30:46.000
<v Speaker 1>Wow. Any optimization after the content is written?

624
00:30:46.000 --> 00:30:49.960
<v Speaker 2>Definitely. Title optimization: use an LLM to generate or

625
00:30:50.000 --> 00:30:53.240
<v Speaker 2>refine the title for better SEO and click-through rates, and

626
00:30:53.319 --> 00:30:54.880
<v Speaker 2>crucially, rewriting for style.

627
00:30:54.880 --> 00:30:58.119
<v Speaker 1>Matching a specific brand voice?

628
00:30:58.839 --> 00:31:02.359
<v Speaker 2>Precisely. The system could take the generated draft and rewrite it to

629
00:31:02.440 --> 00:31:06.759
<v Speaker 2>match a defined style, like 'informative and analytical with practical

630
00:31:07.000 --> 00:31:11.079
<v Speaker 2>actionable advice.' This could be tricky, often needing a powerful

631
00:31:11.119 --> 00:31:14.920
<v Speaker 2>model like GPT-4 and careful prompt tuning, maybe A/B

632
00:31:15.079 --> 00:31:16.400
<v Speaker 2>testing against human examples.

633
00:31:16.440 --> 00:31:18.079
<v Speaker 1>Seems like that style part could be the hardest.

634
00:31:18.000 --> 00:31:20.720
<v Speaker 2>It often is. And finally, for getting it out there

635
00:31:20.759 --> 00:31:24.279
<v Speaker 2>quickly: a user interface. You wouldn't need a complex web app

636
00:31:24.319 --> 00:31:27.240
<v Speaker 2>right away. You could build a simple prototype using Python

637
00:31:27.319 --> 00:31:30.559
<v Speaker 2>libraries like Gradio or Streamlit, just to get it working

638
00:31:30.680 --> 00:31:31.960
<v Speaker 2>and gather early feedback.

639
00:31:32.279 --> 00:31:35.119
<v Speaker 1>So a full workflow, from research to style, text and

640
00:31:35.160 --> 00:31:38.599
<v Speaker 1>images all orchestrated using these AI techniques. That's impressive.

641
00:31:38.759 --> 00:31:43.039
<v Speaker 2>It really shows how these components, agents, prompting techniques,

642
00:31:43.079 --> 00:31:46.720
<v Speaker 2>structured data generation, can stack together to build something powerful.

643
00:31:46.880 --> 00:31:51.359
<v Speaker 1>Wow. That was an incredible journey. Seriously, from the absolute

644
00:31:51.400 --> 00:31:53.440
<v Speaker 1>basics of a good prompt all the way to building

645
00:31:53.480 --> 00:31:56.359
<v Speaker 1>automated systems and creating custom art. You've given us a

646
00:31:56.400 --> 00:31:59.079
<v Speaker 1>proper deep dive into what's actually possible right now.

647
00:31:59.160 --> 00:32:01.160
<v Speaker 2>It's amazing to see how it all connects, isn't it?

648
00:32:01.559 --> 00:32:06.400
<v Speaker 2>Specifying formats, using vector databases for memory, agents taking actions,

649
00:32:06.480 --> 00:32:10.200
<v Speaker 2>fine-tuning models. They're all pieces of this bigger puzzle

650
00:32:10.680 --> 00:32:15.880
<v Speaker 2>for creating reliable and, frankly, intelligent AI applications. Things that

651
00:32:16.000 --> 00:32:18.039
<v Speaker 2>felt like science fiction just a couple of years back.

652
00:32:18.200 --> 00:32:20.799
<v Speaker 1>So for everyone listening, what's the big takeaway? What does

653
00:32:20.839 --> 00:32:23.400
<v Speaker 1>this mean for you? I think it means that interacting

654
00:32:23.400 --> 00:32:26.240
<v Speaker 1>with AI isn't just about typing a quick query anymore.

655
00:32:26.240 --> 00:32:29.319
<v Speaker 1>It's about realizing you have the controls.

656
00:32:29.519 --> 00:32:32.839
<v Speaker 2>Exactly. You have the power to direct these models, to refine

657
00:32:32.880 --> 00:32:35.880
<v Speaker 2>their output, even to teach them new things, to get

658
00:32:35.920 --> 00:32:40.119
<v Speaker 2>incredibly specific, high-quality results that are personalized to your needs.

659
00:32:40.359 --> 00:32:42.240
<v Speaker 1>You move from being just a user to being more

660
00:32:42.359 --> 00:32:44.839
<v Speaker 1>like a director or a collaborator with the AI.

661
00:32:45.039 --> 00:32:47.240
<v Speaker 2>That's a great way to put it. Yeah, the challenge

662
00:32:47.240 --> 00:32:50.319
<v Speaker 2>now really is to take these ideas and experiment. Think

663
00:32:50.319 --> 00:32:53.000
<v Speaker 2>about your own work, your own creative projects. How could

664
00:32:53.079 --> 00:32:56.720
<v Speaker 2>combining some of these techniques, maybe RAG with your company's data,

665
00:32:57.160 --> 00:32:59.759
<v Speaker 2>or an agent to automate a tedious task, or fine-tuned

666
00:32:59.759 --> 00:33:02.920
<v Speaker 2>image generation, solve your unique problems?

667
00:33:03.119 --> 00:33:05.960
<v Speaker 1>Where could you apply this power to transform how you

668
00:33:06.079 --> 00:33:08.279
<v Speaker 1>work or unlock something totally new?

669
00:33:08.759 --> 00:33:10.519
<v Speaker 2>That's the question to ponder.

670
00:33:10.680 --> 00:33:13.400
<v Speaker 1>That really is a great thought to leave everyone with.

671
00:33:14.039 --> 00:33:17.160
<v Speaker 1>This deep dive has given us a fantastic practical toolkit.

672
00:33:17.200 --> 00:33:18.920
<v Speaker 1>Thanks so much for walking us through it.

673
00:33:19.039 --> 00:33:20.839
<v Speaker 2>My pleasure. It's an exciting field.

674
00:33:20.599 --> 00:33:22.720
<v Speaker 1>And thanks to you for joining us on the deep dive.

675
00:33:22.799 --> 00:33:25.240
<v Speaker 1>We genuinely hope this empowers you to go out and

676
00:33:25.279 --> 00:33:27.440
<v Speaker 1>become a master of your own AI craft.
