WEBVTT

1
00:00:00.160 --> 00:00:02.480
<v Speaker 1>Welcome to the Deep Dive, the show that navigates the

2
00:00:02.560 --> 00:00:06.559
<v Speaker 1>labyrinth of information, distilling the essence of what truly matters.

3
00:00:07.240 --> 00:00:11.199
<v Speaker 1>I vividly remember the first time I interacted with GPT three.

4
00:00:11.480 --> 00:00:14.560
<v Speaker 1>Oh yeah, it felt like an almost magical experience. For

5
00:00:14.640 --> 00:00:18.160
<v Speaker 1>the first time, it genuinely seemed like the computer understood

6
00:00:18.160 --> 00:00:22.800
<v Speaker 1>my complex inputs and could react appropriately, you know, solving

7
00:00:22.879 --> 00:00:26.480
<v Speaker 1>diverse tasks from text analysis to coding, just based on

8
00:00:26.519 --> 00:00:29.839
<v Speaker 1>my instructions. It was a complete game changer, especially compared

9
00:00:29.879 --> 00:00:34.039
<v Speaker 1>to the well the prior neural networks that always needed specialized,

10
00:00:34.079 --> 00:00:35.719
<v Speaker 1>hand labeled training data.

11
00:00:35.759 --> 00:00:38.640
<v Speaker 2>It truly redefined what we thought was possible with AI.

12
00:00:38.799 --> 00:00:43.399
<v Speaker 1>The leap was just undeniable, absolutely, and today our Deep

13
00:00:43.439 --> 00:00:45.920
<v Speaker 1>Dive is all about harnessing that power. We're going to

14
00:00:45.960 --> 00:00:48.799
<v Speaker 1>explore how these incredible language models can be used specifically

15
00:00:48.880 --> 00:00:51.039
<v Speaker 1>for data analysis, helping you make the most of your

16
00:00:51.079 --> 00:00:54.200
<v Speaker 1>data sets. Right, They've evolved so rapidly, moving from just

17
00:00:54.439 --> 00:00:59.119
<v Speaker 1>processing text to understanding multimodal inputs that's images, audio, video,

18
00:00:59.200 --> 00:00:59.880
<v Speaker 1>and of course tech.

19
00:01:00.079 --> 00:01:01.920
<v Speaker 2>Yeah, the multi modality is huge.

20
00:01:02.079 --> 00:01:05.439
<v Speaker 1>This expansion makes them an invaluable tool across pretty much

21
00:01:05.439 --> 00:01:06.840
<v Speaker 1>every facet of data science.

22
00:01:07.120 --> 00:01:09.680
<v Speaker 2>Our mission in this deep dive really is to show

23
00:01:09.680 --> 00:01:13.799
<v Speaker 2>you how llms can act as expert guides to your data,

24
00:01:13.920 --> 00:01:17.280
<v Speaker 2>offering a genuine shortcut to being well informed. A shortcut

25
00:01:17.319 --> 00:01:19.719
<v Speaker 2>I like that will delve into how they extract the

26
00:01:19.760 --> 00:01:24.040
<v Speaker 2>most important nuggets of knowledge and insight from various sources

27
00:01:24.400 --> 00:01:28.319
<v Speaker 2>and empower you to build complex analysis pipelines with just

28
00:01:28.359 --> 00:01:31.400
<v Speaker 2>a few lines of Python code, all driven by natural

29
00:01:31.480 --> 00:01:32.359
<v Speaker 2>language instructions.

30
00:01:32.359 --> 00:01:34.840
<v Speaker 1>Okay, let's unpack the core of this magic. Then it

31
00:01:34.879 --> 00:01:38.959
<v Speaker 1>all begins with what GPT actually stands for, generative pre

32
00:01:39.040 --> 00:01:44.359
<v Speaker 1>trained transformer. Generative is key here, meaning these models don't

33
00:01:44.400 --> 00:01:47.719
<v Speaker 1>just classify or recognize things. They create new content, whether

34
00:01:47.760 --> 00:01:49.680
<v Speaker 1>it's text, code or even images.

35
00:01:49.959 --> 00:01:52.680
<v Speaker 2>And the pre trained aspect means they've learned from truly

36
00:01:52.719 --> 00:01:55.959
<v Speaker 2>immense amounts of data, vast swaths of the Internet, books,

37
00:01:56.319 --> 00:01:59.480
<v Speaker 2>and more, enabling them to understand languid broadly, not just

38
00:01:59.560 --> 00:02:03.640
<v Speaker 2>you know, specific narrow task. This generic understanding allows them

39
00:02:03.680 --> 00:02:06.599
<v Speaker 2>to then adapt to specialized tasks much more efficiently. And

40
00:02:06.640 --> 00:02:10.680
<v Speaker 2>the transformer part, ah, that's the underlying neural network architecture,

41
00:02:11.719 --> 00:02:13.919
<v Speaker 2>the brilliant design that makes all this possible.

42
00:02:14.400 --> 00:02:17.240
<v Speaker 1>So how does this fundamental design let them tackle such

43
00:02:17.280 --> 00:02:18.919
<v Speaker 1>a wide array of problems.

44
00:02:19.120 --> 00:02:22.240
<v Speaker 2>What's truly fascinating is how this design allows lms to

45
00:02:22.240 --> 00:02:26.520
<v Speaker 2>be universal task solvers. Unlike earlier models built for one

46
00:02:26.560 --> 00:02:31.479
<v Speaker 2>specific purpose, llms are designed intended to serve as universal

47
00:02:31.520 --> 00:02:35.479
<v Speaker 2>task solvers that can, in principle, solve any task the

48
00:02:35.560 --> 00:02:36.280
<v Speaker 2>user desires.

49
00:02:36.800 --> 00:02:38.439
<v Speaker 1>Any task wow.

50
00:02:38.199 --> 00:02:41.360
<v Speaker 2>Well within reason. The way you communicate with them is

51
00:02:41.360 --> 00:02:44.319
<v Speaker 2>through prompting. Think of a prompt as your direct instruction

52
00:02:44.400 --> 00:02:46.280
<v Speaker 2>to the model. So the input you give it, and

53
00:02:46.319 --> 00:02:49.400
<v Speaker 2>it can be multimodal, combining text with images or other

54
00:02:49.479 --> 00:02:53.560
<v Speaker 2>data types. A really effective plompt needs a clear task description,

55
00:02:53.960 --> 00:02:56.240
<v Speaker 2>all the relevant context like are we talking about reviewing

56
00:02:56.280 --> 00:02:58.080
<v Speaker 2>laptops or lawnmowers for example? Right?

57
00:02:58.120 --> 00:02:59.520
<v Speaker 1>Context matter Context is.

58
00:02:59.520 --> 00:03:02.639
<v Speaker 2>Critical, and crucially, it can optionally include a few examples

59
00:03:02.680 --> 00:03:03.479
<v Speaker 2>to guide the model.

60
00:03:03.800 --> 00:03:06.639
<v Speaker 1>So if the prompt is the key, how much handholding

61
00:03:06.680 --> 00:03:09.159
<v Speaker 1>do we actually need to give the model? Does it

62
00:03:09.240 --> 00:03:11.280
<v Speaker 1>learn from a few examples or can it just get

63
00:03:11.280 --> 00:03:12.560
<v Speaker 1>it from the description alone?

64
00:03:12.719 --> 00:03:15.960
<v Speaker 2>Yeah, that brings us to fu shot learning versus zero

65
00:03:16.039 --> 00:03:18.879
<v Speaker 2>shot learning. FU shot learning is when you provide those

66
00:03:18.919 --> 00:03:21.759
<v Speaker 2>few examples directly in your prompt to show the model

67
00:03:21.840 --> 00:03:25.360
<v Speaker 2>exactly what you expect showing your work exactly. It's like

68
00:03:25.400 --> 00:03:28.280
<v Speaker 2>showing someone a couple of solved puzzles so they understand

69
00:03:28.280 --> 00:03:31.960
<v Speaker 2>the pattern. Zero shot learning, on the other hand, means

70
00:03:32.080 --> 00:03:35.960
<v Speaker 2>you're relying solely on your task description with no examples provided,

71
00:03:36.080 --> 00:03:39.120
<v Speaker 2>and that works. It's impressive how often llms can still

72
00:03:39.120 --> 00:03:43.280
<v Speaker 2>perform effectively even with zero shot prompting. It really depends

73
00:03:43.280 --> 00:03:45.479
<v Speaker 2>on the task complexity and the model itself.

74
00:03:46.000 --> 00:03:48.879
<v Speaker 1>And it's important to distinguish between the types of data

75
00:03:49.000 --> 00:03:51.719
<v Speaker 1>lllms work with, right structured versus unstructured.

76
00:03:51.759 --> 00:03:55.080
<v Speaker 2>Absolutely, we have structured data that's your tables grabs, anything

77
00:03:55.120 --> 00:03:59.039
<v Speaker 2>with a fixed format that specialized tools can process very efficiently.

78
00:03:59.360 --> 00:04:02.439
<v Speaker 2>For this primarily act as an intelligent.

79
00:04:01.960 --> 00:04:04.560
<v Speaker 1>Interface, got it, like a translator kind of.

80
00:04:04.680 --> 00:04:08.840
<v Speaker 2>Then there's unstructured data text, images, audio video, where llms

81
00:04:08.879 --> 00:04:12.080
<v Speaker 2>operate directly on the raw content. A critical point for

82
00:04:12.080 --> 00:04:14.800
<v Speaker 2>anyone using these models, and something that often surprises people,

83
00:04:15.159 --> 00:04:18.920
<v Speaker 2>is that interacting with language models incurs monetary fees. AH

84
00:04:19.079 --> 00:04:22.079
<v Speaker 2>the cost yes proportional to the amount of data process

85
00:04:22.720 --> 00:04:27.199
<v Speaker 2>and using larger language models, well that's often more expensive significantly,

86
00:04:27.199 --> 00:04:28.000
<v Speaker 2>So sometimes how.

87
00:04:28.000 --> 00:04:28.879
<v Speaker 1>Do they measure that cost?

88
00:04:29.079 --> 00:04:32.279
<v Speaker 2>These costs are calculated in tokens. Think of tokens as

89
00:04:32.319 --> 00:04:36.120
<v Speaker 2>the smallest, meaningful lego bricks of language. So if I

90
00:04:36.160 --> 00:04:39.079
<v Speaker 2>say Hello World, that might be just a few tokens.

91
00:04:39.199 --> 00:04:41.360
<v Speaker 2>It's roughly four characters a text, give or take.

92
00:04:41.399 --> 00:04:43.399
<v Speaker 1>That's a good way to put it. So for many

93
00:04:43.399 --> 00:04:46.600
<v Speaker 1>of us are first let's say, dance with an LLM

94
00:04:46.800 --> 00:04:50.519
<v Speaker 1>was likely through the chat GPT web interface. Ugly, most

95
00:04:50.519 --> 00:04:52.800
<v Speaker 1>of you have probably already dabbled there, accessing it at

96
00:04:52.879 --> 00:04:55.759
<v Speaker 1>chat dot OpenEye dot com. It's a great sandbox for

97
00:04:55.839 --> 00:05:00.279
<v Speaker 1>quick text processing or even exploring its data analysis capabilities.

98
00:05:00.079 --> 00:05:03.199
<v Speaker 2>And in that web interface you can perform some genuinely

99
00:05:03.279 --> 00:05:09.000
<v Speaker 2>practical tasks. For text processing, classification is straightforward determining the

100
00:05:09.040 --> 00:05:11.959
<v Speaker 2>sentiment of a movie review or sorting a product review,

101
00:05:12.120 --> 00:05:13.959
<v Speaker 2>like for a I don't know a banana book a

102
00:05:13.959 --> 00:05:17.079
<v Speaker 2>banana book into its correct category. You can even hint

103
00:05:17.079 --> 00:05:20.519
<v Speaker 2>it your desired output format simply by saying answer concisely.

104
00:05:20.800 --> 00:05:21.720
<v Speaker 1>Nice.

105
00:05:21.759 --> 00:05:25.439
<v Speaker 2>For information extraction, it's brilliant at pulling structured data from

106
00:05:25.480 --> 00:05:29.720
<v Speaker 2>freeform text, like gathering a name, GPA, and degree from

107
00:05:29.720 --> 00:05:31.120
<v Speaker 2>a stack of applicant emails.

108
00:05:31.240 --> 00:05:33.839
<v Speaker 1>And what's truly impressive is how it handles tables right

109
00:05:34.120 --> 00:05:34.920
<v Speaker 1>right there in the chat.

110
00:05:35.079 --> 00:05:37.639
<v Speaker 2>It really is. You can upload a dot csv file,

111
00:05:37.680 --> 00:05:41.959
<v Speaker 2>for instance, review stable dot csv. Chat SHPT doesn't just

112
00:05:42.000 --> 00:05:45.240
<v Speaker 2>display it. If you've enabled to write features, it generates

113
00:05:45.240 --> 00:05:48.120
<v Speaker 2>an executes Python code behind the scenes to analyze that

114
00:05:48.199 --> 00:05:49.199
<v Speaker 2>data hikon code.

115
00:05:49.319 --> 00:05:50.079
<v Speaker 1>Really Yeah.

116
00:05:50.519 --> 00:05:52.199
<v Speaker 2>You can even peak at the code by clicking the

117
00:05:52.199 --> 00:05:57.000
<v Speaker 2>show analysis button. This demonstrates lms acting as intelligent orchestrators

118
00:05:57.000 --> 00:05:58.160
<v Speaker 2>for external tools.

119
00:05:58.360 --> 00:05:58.639
<v Speaker 1>Wow.

120
00:05:58.800 --> 00:06:02.240
<v Speaker 2>They also excel as translation, converting natural language questions into

121
00:06:02.279 --> 00:06:04.639
<v Speaker 2>formal query languages like sql.

122
00:06:04.319 --> 00:06:05.519
<v Speaker 1>Ah SQL generation.

123
00:06:05.920 --> 00:06:09.600
<v Speaker 2>That's useful, very You can then execute that SEQL on

124
00:06:09.639 --> 00:06:13.079
<v Speaker 2>your own platform, say and squally database in a Google

125
00:06:13.120 --> 00:06:18.439
<v Speaker 2>collab notebook. It's fantastic for writing complex multiline queries, and

126
00:06:18.480 --> 00:06:21.000
<v Speaker 2>that handy copy code button makes it so easy to

127
00:06:21.040 --> 00:06:22.279
<v Speaker 2>grab the generated sequel.

128
00:06:22.399 --> 00:06:26.680
<v Speaker 1>That sounds incredibly powerful, but it also raises an important question.

129
00:06:26.879 --> 00:06:29.879
<v Speaker 1>Can we truly trust everything? A LLLM tells us.

130
00:06:29.800 --> 00:06:31.920
<v Speaker 2>That's a crucial point, and it's one of the biggest challenges.

131
00:06:32.439 --> 00:06:37.600
<v Speaker 2>The term hallucinations refers to situations where lms will invent

132
00:06:37.680 --> 00:06:40.920
<v Speaker 2>new content in the absence of information, invent things. Yes,

133
00:06:41.120 --> 00:06:43.680
<v Speaker 2>and the truly profound insight here isn't just that they

134
00:06:43.680 --> 00:06:46.839
<v Speaker 2>invent things, but that they do so with such convincing

135
00:06:46.879 --> 00:06:49.360
<v Speaker 2>confidence it sounds completely plausible.

136
00:06:49.439 --> 00:06:51.199
<v Speaker 1>Oh that's dangerous, it can be.

137
00:06:51.360 --> 00:06:54.920
<v Speaker 2>This fundamentally shifts our perspective, and LLLLM doesn't know in

138
00:06:54.959 --> 00:06:57.759
<v Speaker 2>the human sense, it generates plausibly, Yeah, forcing us to

139
00:06:57.800 --> 00:07:00.879
<v Speaker 2>rethink how we trust automated information. So it's essential to

140
00:07:00.879 --> 00:07:03.959
<v Speaker 2>always verify the output, Always double check before relying on it.

141
00:07:04.319 --> 00:07:06.000
<v Speaker 2>Use alternative sources for corroboration.

142
00:07:06.199 --> 00:07:07.160
<v Speaker 1>Okay, always verify.

143
00:07:07.279 --> 00:07:07.560
<v Speaker 2>Got it.

144
00:07:07.879 --> 00:07:10.319
<v Speaker 1>So, while the web interface is great for chatting and

145
00:07:10.399 --> 00:07:14.040
<v Speaker 1>quick tasks, it's not really designed for building robust, complex

146
00:07:14.319 --> 00:07:15.759
<v Speaker 1>data processing pipelines.

147
00:07:16.000 --> 00:07:16.519
<v Speaker 2>Not really.

148
00:07:16.600 --> 00:07:18.959
<v Speaker 1>No, for that, we need to go deeper into the code.

149
00:07:18.959 --> 00:07:21.680
<v Speaker 1>This is where the open ai Python library comes into play.

150
00:07:21.800 --> 00:07:25.680
<v Speaker 2>Exactly. The Python library allows you to directly invoke llms

151
00:07:25.920 --> 00:07:28.720
<v Speaker 2>as a subfunction within your own code, giving you much

152
00:07:28.800 --> 00:07:30.000
<v Speaker 2>more programmatic control.

153
00:07:30.079 --> 00:07:31.240
<v Speaker 1>How do you get started with that?

154
00:07:31.600 --> 00:07:33.879
<v Speaker 2>To get set up, you'll need Python three point nine

155
00:07:34.040 --> 00:07:37.360
<v Speaker 2>or later, and then simply install the opene library using

156
00:07:37.399 --> 00:07:41.600
<v Speaker 2>pip standard stuff. Okay, critically, you'll need an API key

157
00:07:41.680 --> 00:07:44.639
<v Speaker 2>from open Ai, and it is highly recommended to store

158
00:07:44.680 --> 00:07:47.000
<v Speaker 2>this securely as an environment variable.

159
00:07:46.720 --> 00:07:48.000
<v Speaker 1>Right, don't just paste it in your.

160
00:07:48.000 --> 00:07:51.079
<v Speaker 2>Code, absolutely not. Never ever share your code if it

161
00:07:51.120 --> 00:07:54.120
<v Speaker 2>contains your open AI access key directly, as others could

162
00:07:54.160 --> 00:07:56.439
<v Speaker 2>use it to encour charges on your account. Very important.

163
00:07:56.519 --> 00:07:57.519
<v Speaker 1>Okay, key secured?

164
00:07:57.720 --> 00:08:01.319
<v Speaker 2>Then what when using check completetion in Python, you can

165
00:08:01.360 --> 00:08:04.439
<v Speaker 2>struct a list of messages. Each message has a role

166
00:08:04.680 --> 00:08:07.920
<v Speaker 2>user for your input, assistant for the model's reply or

167
00:08:08.000 --> 00:08:11.040
<v Speaker 2>system for instructions about the model's persona or behavior.

168
00:08:11.399 --> 00:08:14.279
<v Speaker 1>System user assistant okay.

169
00:08:14.120 --> 00:08:17.759
<v Speaker 2>And then the actual content of the message the client

170
00:08:17.839 --> 00:08:21.639
<v Speaker 2>dot chat, dot completions, dot create function handles setting this off.

171
00:08:22.120 --> 00:08:26.560
<v Speaker 2>Remember token usage, specifically, the total tokens attribute in the

172
00:08:26.680 --> 00:08:30.079
<v Speaker 2>response you get back directly impacts cost right back to

173
00:08:30.120 --> 00:08:32.799
<v Speaker 2>the tokens, and tokens generated by the model are often

174
00:08:32.840 --> 00:08:35.480
<v Speaker 2>more expensive than the tokens you send as input. Keep

175
00:08:35.519 --> 00:08:36.000
<v Speaker 2>that in mind.

176
00:08:36.120 --> 00:08:38.679
<v Speaker 1>Ah, good tip. Now that we know how to talk

177
00:08:38.679 --> 00:08:41.960
<v Speaker 1>to these models through code, the next logical step is

178
00:08:42.559 --> 00:08:44.360
<v Speaker 1>how do we steer them, how do we make sure

179
00:08:44.360 --> 00:08:46.679
<v Speaker 1>they behave exactly as we want, and crucially, how do

180
00:08:46.720 --> 00:08:47.960
<v Speaker 1>we manage those costs.

181
00:08:48.360 --> 00:08:51.840
<v Speaker 2>That's where customizing model behavior and optimizing for costing quality

182
00:08:51.879 --> 00:08:55.159
<v Speaker 2>comes in. It's really about controlling the generation process. Oh so,

183
00:08:55.559 --> 00:08:58.960
<v Speaker 2>for example, to control output length and therefore fees, you

184
00:08:59.000 --> 00:09:02.000
<v Speaker 2>can set max token to specify a maximum response length.

185
00:09:02.279 --> 00:09:04.600
<v Speaker 1>Pretty straightforward, can't limit the output makes sense.

186
00:09:04.919 --> 00:09:08.159
<v Speaker 2>You can also use obsequences specific text patterns like maybe

187
00:09:08.279 --> 00:09:11.759
<v Speaker 2>endo response or even something narrative like and they lived

188
00:09:11.759 --> 00:09:15.200
<v Speaker 2>happily ever after to tell the model exactly when to

189
00:09:15.200 --> 00:09:18.919
<v Speaker 2>stop generating. This can be very useful for getting structured.

190
00:09:18.559 --> 00:09:22.000
<v Speaker 1>Outputs ah nat trick, and for controlling the actual words

191
00:09:22.000 --> 00:09:25.039
<v Speaker 1>it chooses. How do we guide that for output generation?

192
00:09:25.399 --> 00:09:29.120
<v Speaker 2>Presence penalty and frequency penalty are your levers for controlling repetitiveness.

193
00:09:29.919 --> 00:09:33.480
<v Speaker 2>Positive values discourage the model from repeating tokens it's already

194
00:09:33.559 --> 00:09:36.080
<v Speaker 2>used or that are present in the prompt helps keep

195
00:09:36.080 --> 00:09:36.559
<v Speaker 2>things fresh.

196
00:09:36.639 --> 00:09:37.360
<v Speaker 1>That's repetition.

197
00:09:37.679 --> 00:09:41.519
<v Speaker 2>Good for truly surgical precision, like forcing a model to

198
00:09:41.639 --> 00:09:46.080
<v Speaker 2>use specific words, say positive or negative in a sentiment task.

199
00:09:46.600 --> 00:09:47.720
<v Speaker 2>There's legit bias.

200
00:09:48.000 --> 00:09:49.759
<v Speaker 1>Legit bias sounds complex.

201
00:09:49.840 --> 00:09:52.759
<v Speaker 2>It's a more advanced lever. It lets you explicitly increase

202
00:09:52.840 --> 00:09:56.080
<v Speaker 2>or decrease the likelihood of specific tokens appearing. You need

203
00:09:56.120 --> 00:09:58.919
<v Speaker 2>to find the token IDs using a token aser tool first.

204
00:09:59.200 --> 00:10:02.320
<v Speaker 2>It's powerful, but typically for very niche use cases. You

205
00:10:02.320 --> 00:10:03.440
<v Speaker 2>wouldn't use it every day.

206
00:10:03.440 --> 00:10:07.840
<v Speaker 1>Okay, And what about controlling how creative or let's say

207
00:10:07.919 --> 00:10:12.120
<v Speaker 1>random the model gets. Sometimes you want predictable, sometimes more exploratory.

208
00:10:12.360 --> 00:10:16.360
<v Speaker 2>That's where randomization parameters are key. Temperature, typically set between

209
00:10:16.440 --> 00:10:20.840
<v Speaker 2>zero and two, directly controls randomness. Higher values like maybe

210
00:10:20.919 --> 00:10:23.120
<v Speaker 2>point eight or one point zero lead to more diverse

211
00:10:23.159 --> 00:10:27.000
<v Speaker 2>and sometimes more created outputs. Lower values closer to zero

212
00:10:27.360 --> 00:10:29.480
<v Speaker 2>make it more deterministic and focused.

213
00:10:29.480 --> 00:10:31.559
<v Speaker 1>So zero for facts, higher for fiction.

214
00:10:32.080 --> 00:10:34.960
<v Speaker 2>Sort of kind of yeah. TOP is an alternative approach

215
00:10:35.000 --> 00:10:38.840
<v Speaker 2>that achieves a similar goal. It reduces randomization by focusing

216
00:10:38.879 --> 00:10:41.600
<v Speaker 2>only on the highest probability tokens that add up to

217
00:10:41.600 --> 00:10:45.399
<v Speaker 2>a certain cumulative probability. It's just a different way to tune.

218
00:10:45.200 --> 00:10:47.440
<v Speaker 1>The randomness, temperature or TOP. Okay.

219
00:10:47.720 --> 00:10:50.120
<v Speaker 2>And if you want multiple options for a single plumpt,

220
00:10:50.360 --> 00:10:52.840
<v Speaker 2>you can use the N parameter to generate several replies

221
00:10:52.879 --> 00:10:55.240
<v Speaker 2>at once. Gives you more choices to pick from.

222
00:10:55.720 --> 00:10:58.919
<v Speaker 1>This raises an important question with all these settings, how

223
00:10:59.000 --> 00:11:02.039
<v Speaker 1>do we get the best perform ormans while managing costs effectively?

224
00:11:02.080 --> 00:11:03.360
<v Speaker 1>It sounds like a balancing act.

225
00:11:03.480 --> 00:11:07.679
<v Speaker 2>It absolutely is, and that's where strategic optimization becomes crucial first,

226
00:11:08.159 --> 00:11:12.200
<v Speaker 2>model selection. Do not always default to the largest, most

227
00:11:12.240 --> 00:11:13.600
<v Speaker 2>expensive available model.

228
00:11:13.879 --> 00:11:15.399
<v Speaker 1>Bigger isn't always better.

229
00:11:15.440 --> 00:11:18.759
<v Speaker 2>Not necessarily and certainly not always cost effective. For many

230
00:11:18.799 --> 00:11:22.679
<v Speaker 2>simpler tasks, a smaller, cheaper model like GPT three point

231
00:11:22.679 --> 00:11:26.639
<v Speaker 2>five turbo might perform perfectly well. GPT four, for instance,

232
00:11:26.919 --> 00:11:29.639
<v Speaker 2>can be over one hundred times more expensive per token in.

233
00:11:29.600 --> 00:11:31.799
<v Speaker 1>Some cases, wow, a hundred times.

234
00:11:31.879 --> 00:11:35.879
<v Speaker 2>Yeah, it's smart to check benchmarks like Stanford's ALM evaluation

235
00:11:36.440 --> 00:11:39.759
<v Speaker 2>and definitely experiment with different models for your specific task

236
00:11:40.080 --> 00:11:42.960
<v Speaker 2>to find that sweet spot between cost and quality.

237
00:11:43.200 --> 00:11:46.120
<v Speaker 1>So model choice is clearly a big one. What else

238
00:11:46.159 --> 00:11:48.600
<v Speaker 1>can we do besides tweaking temperature and penalties?

239
00:11:48.960 --> 00:11:52.480
<v Speaker 2>Prompt engineering is absolutely vital. I can't stress this enough.

240
00:11:52.600 --> 00:11:55.240
<v Speaker 2>The design of your prompt can have a significant effect on.

241
00:11:55.200 --> 00:11:57.840
<v Speaker 1>Performance, really just the way you ask.

242
00:11:58.000 --> 00:12:01.600
<v Speaker 2>Yes, it's a really counterintuitive insights sometimes, but the biggest

243
00:12:01.639 --> 00:12:03.679
<v Speaker 2>leap in performance might not come from a bigger model

244
00:12:03.759 --> 00:12:07.320
<v Speaker 2>or more training data, but simply from better instructions, like

245
00:12:07.360 --> 00:12:10.440
<v Speaker 2>a skilled artisan responding to a perfectly precise brief. You know.

246
00:12:10.720 --> 00:12:13.399
<v Speaker 2>That's the magic of fu shot learning, which we mentioned earlier,

247
00:12:13.840 --> 00:12:17.240
<v Speaker 2>including samples of correctly solved tasks directly in the prompt

248
00:12:17.440 --> 00:12:21.360
<v Speaker 2>can dramatically improve quality. It often allows cheaper models to

249
00:12:21.399 --> 00:12:25.000
<v Speaker 2>perform comparably to much more expensive ones just because the

250
00:12:25.039 --> 00:12:26.399
<v Speaker 2>task is clearer.

251
00:12:26.080 --> 00:12:28.320
<v Speaker 1>So invest time in the prompt itself.

252
00:12:28.600 --> 00:12:31.840
<v Speaker 2>Definitely. You can even find ready made prompt templates on

253
00:12:31.919 --> 00:12:35.240
<v Speaker 2>platforms like prompt Base, though crafting your own specific to

254
00:12:35.320 --> 00:12:36.639
<v Speaker 2>your need is usually best.

255
00:12:36.759 --> 00:12:38.799
<v Speaker 1>And what about fine tuning? That sounds like a big

256
00:12:38.840 --> 00:12:40.559
<v Speaker 1>step like retraining the model?

257
00:12:40.759 --> 00:12:43.799
<v Speaker 2>It is kind of. Fine tuning allows you to specialize

258
00:12:43.840 --> 00:12:47.159
<v Speaker 2>base models to the specific tasks you care most about.

259
00:12:47.720 --> 00:12:50.240
<v Speaker 2>You take an existing model like GBT three point five

260
00:12:50.279 --> 00:12:53.440
<v Speaker 2>Turbo and you continue its training, but with a relatively

261
00:12:53.440 --> 00:12:56.799
<v Speaker 2>small amount of your own task specific data, typically fifty

262
00:12:56.879 --> 00:12:58.440
<v Speaker 2>to maybe a few thousand examples.

263
00:12:58.480 --> 00:13:00.840
<v Speaker 1>Fifty examples. That doesn't sound like much compared to the

264
00:13:00.840 --> 00:13:01.919
<v Speaker 1>pre training data.

265
00:13:02.039 --> 00:13:05.639
<v Speaker 2>It's not, but it's focused. The model already understands language,

266
00:13:05.639 --> 00:13:07.519
<v Speaker 2>you're just nudging it to be really good at your

267
00:13:07.559 --> 00:13:08.320
<v Speaker 2>specific thing.

268
00:13:08.440 --> 00:13:11.279
<v Speaker 1>What are the upsides and downsides of that kind of specialization?

269
00:13:11.399 --> 00:13:12.200
<v Speaker 1>Seems powerful?

270
00:13:12.440 --> 00:13:17.279
<v Speaker 2>The advantages include potentially significantly improved accuracy for your specific

271
00:13:17.440 --> 00:13:21.000
<v Speaker 2>use case, and you might get away with shorter, simpler prompts.

272
00:13:21.000 --> 00:13:23.759
<v Speaker 2>Because the task is sort of baked into the specialized

273
00:13:23.799 --> 00:13:24.679
<v Speaker 2>model now and.

274
00:13:24.600 --> 00:13:27.399
<v Speaker 1>The downsides cost I assume yes.

275
00:13:27.559 --> 00:13:31.200
<v Speaker 2>There are upfront monetary fees for the training process itself,

276
00:13:31.919 --> 00:13:36.000
<v Speaker 2>and importantly it usually increases the cost per token for

277
00:13:36.080 --> 00:13:39.840
<v Speaker 2>the fine tuned model's ongoing usage compared to the base model.

278
00:13:40.000 --> 00:13:41.960
<v Speaker 1>Ah, so it costs more to run afterwards.

279
00:13:42.000 --> 00:13:44.639
<v Speaker 2>Often Yes, the training data also needs to be in

280
00:13:44.639 --> 00:13:49.720
<v Speaker 2>a specific JSM lines format, basically representing successful interactions as

281
00:13:49.759 --> 00:13:53.679
<v Speaker 2>little conversations with user and assistant roles. It's a powerful tool,

282
00:13:53.879 --> 00:13:57.200
<v Speaker 2>but when you typically explore once you've exhausted prompt engineering options.

283
00:13:57.279 --> 00:14:00.559
<v Speaker 1>Okay, that makes sense. Let's unpack this further than beyond

284
00:14:00.720 --> 00:14:04.480
<v Speaker 1>just text, lms are fundamentally transforming how we interact with

285
00:14:04.759 --> 00:14:06.440
<v Speaker 1>all sorts of data, right, not just words on a.

286
00:14:06.440 --> 00:14:11.200
<v Speaker 2>Paid absolutely for text analysis. Classification remains a natural application,

287
00:14:11.440 --> 00:14:15.879
<v Speaker 2>like we said categorizing movie reviews or support tickets. Information extraction,

288
00:14:16.200 --> 00:14:19.240
<v Speaker 2>where you pull structured data like compiling a table of

289
00:14:19.519 --> 00:14:23.559
<v Speaker 2>applicant attributes from free form emails, is another really strong suit.

290
00:14:23.679 --> 00:14:27.360
<v Speaker 1>Yeah, pulling structured data from messy texts is huge and.

291
00:14:27.279 --> 00:14:32.600
<v Speaker 2>For clustering which groups semantically similar text documents, llms leverage

292
00:14:32.639 --> 00:14:34.039
<v Speaker 2>something called embeddings.

293
00:14:34.200 --> 00:14:36.320
<v Speaker 1>Embeddings heard that term, what is it exactly?

294
00:14:36.360 --> 00:14:39.039
<v Speaker 2>Think of embeddings like assigning every piece of text a

295
00:14:39.159 --> 00:14:43.279
<v Speaker 2>unique invisible address in a vast high dimensional space, like

296
00:14:43.360 --> 00:14:46.240
<v Speaker 2>a point on a complex map. Okay, the closer to

297
00:14:46.600 --> 00:14:49.279
<v Speaker 2>addresses or points are on this map, the more similarly

298
00:14:49.320 --> 00:14:51.879
<v Speaker 2>the meaning of the texts. This allows the computer to

299
00:14:52.000 --> 00:14:55.840
<v Speaker 2>understand semantic similarity without actually reading in our human sense.

300
00:14:55.879 --> 00:14:57.279
<v Speaker 2>It's purely mathematical, so.

301
00:14:57.279 --> 00:14:58.480
<v Speaker 1>It turns meaning into.

302
00:14:58.360 --> 00:15:02.279
<v Speaker 2>Coordinates basically, yes, and this makes tasks like clustering emails

303
00:15:02.279 --> 00:15:06.120
<v Speaker 2>incredibly efficient, separating them from, say, poems. It also powers

304
00:15:06.159 --> 00:15:09.759
<v Speaker 2>things like semantic search and retrieval systems find me documents

305
00:15:09.799 --> 00:15:10.320
<v Speaker 2>like this one.

306
00:15:10.440 --> 00:15:13.279
<v Speaker 1>So it's about turning complex information into something that computer

307
00:15:13.360 --> 00:15:17.080
<v Speaker 1>can intelligently compare and measure. That's genuinely mind.

308
00:15:16.919 --> 00:15:22.320
<v Speaker 2>Boggling precisely, and for structured data analysis think relational databases

309
00:15:22.399 --> 00:15:27.440
<v Speaker 2>or graph databases. Lms truly act as a universal interface interface.

310
00:15:27.480 --> 00:15:30.879
<v Speaker 2>How they translate your natural language questions directly into formal

311
00:15:30.960 --> 00:15:34.960
<v Speaker 2>query languages like SQL for tables or cipher for graphs.

312
00:15:35.679 --> 00:15:38.320
<v Speaker 2>This contrast sharply with the traditional need for someone to

313
00:15:38.399 --> 00:15:42.399
<v Speaker 2>manually write those precise, often complex queries.

314
00:15:42.080 --> 00:15:44.399
<v Speaker 1>So I can just ask my database questions in English.

315
00:15:44.480 --> 00:15:46.919
<v Speaker 2>That's the goal. We often use external tools with the

316
00:15:47.039 --> 00:15:50.279
<v Speaker 2>LM for this because of efficiency, cost, and the sheer

317
00:15:50.399 --> 00:15:53.240
<v Speaker 2>volume of large data sets that would exceed in LM's

318
00:15:53.399 --> 00:15:56.279
<v Speaker 2>input limits. The LM acts as the translator.

319
00:15:56.320 --> 00:15:57.519
<v Speaker 1>How does that work? In practice?

320
00:15:57.600 --> 00:16:00.840
<v Speaker 2>You can build a natural language query interface for tabular data,

321
00:16:00.840 --> 00:16:04.039
<v Speaker 2>for example, by first having the LM automatically extract the

322
00:16:04.120 --> 00:16:07.519
<v Speaker 2>database structure, maybe by querrying the quite master table in

323
00:16:07.639 --> 00:16:09.000
<v Speaker 2>squite ah.

324
00:16:09.080 --> 00:16:12.080
<v Speaker 1>It figures out the tables and columns itself exactly.

325
00:16:12.519 --> 00:16:16.000
<v Speaker 2>Then it translates your natural language questions into SEQL queries

326
00:16:16.039 --> 00:16:19.399
<v Speaker 2>based on that structure, and finally, your application executes those

327
00:16:19.480 --> 00:16:21.399
<v Speaker 2>queries against the actual database.

328
00:16:21.600 --> 00:16:24.879
<v Speaker 1>That kind of automation sounds amazing, but with that level

329
00:16:24.879 --> 00:16:30.840
<v Speaker 1>of power accessing databases directly, there must be a significant caution, right.

330
00:16:30.840 --> 00:16:34.120
<v Speaker 2>Yes, a big one, a huge one actually. Do not

331
00:16:34.279 --> 00:16:37.639
<v Speaker 2>blindly trust your language model to generate accurate queries. They

332
00:16:37.639 --> 00:16:40.559
<v Speaker 2>can make mistakes. What kind of mistakes They might misunderstand

333
00:16:40.600 --> 00:16:44.639
<v Speaker 2>the question, misinterpret the schema, or generate SQEL that's inefficient

334
00:16:45.080 --> 00:16:48.480
<v Speaker 2>or just plain wrong, or worse, potentially destructive if you've

335
00:16:48.519 --> 00:16:49.519
<v Speaker 2>given it right access.

336
00:16:49.639 --> 00:16:49.960
<v Speaker 1>Yikes.

337
00:16:50.320 --> 00:16:53.639
<v Speaker 2>So always always keep a backup of important data before

338
00:16:53.759 --> 00:16:57.799
<v Speaker 2>enabling data access via language models, and ideally have checks

339
00:16:57.799 --> 00:17:01.159
<v Speaker 2>in place, maybe even human review for sensitive querities. It's

340
00:17:01.200 --> 00:17:03.519
<v Speaker 2>power that absolutely needs human oversight.

341
00:17:03.600 --> 00:17:06.480
<v Speaker 1>Okay, creceed with caution on database access. Got it. And

342
00:17:06.559 --> 00:17:09.200
<v Speaker 1>it's not just text and tables anymore. Llms are now

343
00:17:09.240 --> 00:17:11.759
<v Speaker 1>analyzing images and videos too. How does that work?

344
00:17:12.000 --> 00:17:15.359
<v Speaker 2>It's truly incredible. Models like GPT four to H are

345
00:17:15.480 --> 00:17:19.160
<v Speaker 2>natively multimodal. This means they were trained from the ground

346
00:17:19.240 --> 00:17:21.839
<v Speaker 2>up on different types of data, not just text, so

347
00:17:21.880 --> 00:17:24.839
<v Speaker 2>they can see in the sense, you could ask free

348
00:17:24.880 --> 00:17:29.279
<v Speaker 2>form natural language questions directly about images. For example, detect

349
00:17:29.480 --> 00:17:32.599
<v Speaker 2>golden persian cats in this picture, and you provide the

350
00:17:32.640 --> 00:17:36.200
<v Speaker 2>image along with the text wow. Your prompts combine text

351
00:17:36.240 --> 00:17:40.279
<v Speaker 2>instructions with image ll components pointing to images online. You

352
00:17:40.279 --> 00:17:44.519
<v Speaker 2>could even include multiple images in one prompt for comparative analysis,

353
00:17:44.559 --> 00:17:46.640
<v Speaker 2>like what's different between these two photos?

354
00:17:46.640 --> 00:17:49.400
<v Speaker 1>And cost? Is analyzing images expensive?

355
00:17:49.680 --> 00:17:52.240
<v Speaker 2>The cost is generally proportional to the resolution of the

356
00:17:52.279 --> 00:17:56.920
<v Speaker 2>images you submit. High resolution, more detail, potentially more tokens used.

357
00:17:56.920 --> 00:18:00.200
<v Speaker 1>Okay, what about say, tagging people in photos.

358
00:18:00.160 --> 00:18:02.440
<v Speaker 2>Do that too. You could provide a reference picture of

359
00:18:02.480 --> 00:18:04.880
<v Speaker 2>a person alongside the pictures you want to tag, using

360
00:18:04.960 --> 00:18:08.559
<v Speaker 2>multimodal prompts with two or more images and text instructions

361
00:18:08.640 --> 00:18:10.799
<v Speaker 2>like is the person in the first image present in

362
00:18:10.799 --> 00:18:11.559
<v Speaker 2>the second image?

363
00:18:11.599 --> 00:18:14.039
<v Speaker 1>What if my images and videos aren't online if they're

364
00:18:14.079 --> 00:18:15.720
<v Speaker 1>stored locally on my computer?

365
00:18:15.920 --> 00:18:19.319
<v Speaker 2>Good question. For local images or video frames, you need

366
00:18:19.359 --> 00:18:23.480
<v Speaker 2>to encode them first. Common formats like PNG jpeg up

367
00:18:23.559 --> 00:18:26.039
<v Speaker 2>to about twenty milibuni in size need to be converted

368
00:18:26.079 --> 00:18:28.759
<v Speaker 2>into a text format called Base sixty four and then

369
00:18:28.920 --> 00:18:30.079
<v Speaker 2>encoded as UTF eight.

370
00:18:30.359 --> 00:18:32.119
<v Speaker 1>Encode them as text yes.

371
00:18:32.119 --> 00:18:34.839
<v Speaker 2>Essentially turning the image data into a long string of

372
00:18:34.920 --> 00:18:38.000
<v Speaker 2>characters that can be sent in the API request along

373
00:18:38.039 --> 00:18:42.160
<v Speaker 2>with your text prompt. Libraries like OpenCV are commonly used

374
00:18:42.200 --> 00:18:45.599
<v Speaker 2>to extract individual frames from videos, maybe just the first

375
00:18:45.680 --> 00:18:48.160
<v Speaker 2>ten frames, to get a sense of the videos content and.

376
00:18:48.119 --> 00:18:49.440
<v Speaker 1>What would you do with those frames?

377
00:18:49.559 --> 00:18:52.759
<v Speaker 2>You could use those sampled frames, along with text instructions

378
00:18:52.799 --> 00:18:56.240
<v Speaker 2>to say, generate a concise video title, like provide the

379
00:18:56.240 --> 00:18:58.759
<v Speaker 2>frames and ask generate a short title for a video

380
00:18:58.880 --> 00:19:01.799
<v Speaker 2>showing these scenes. It might come back with traffic conditions

381
00:19:01.799 --> 00:19:03.039
<v Speaker 2>on I five during rush hour.

382
00:19:03.200 --> 00:19:06.799
<v Speaker 1>That's remarkable versatility taking us from text to tables, to

383
00:19:06.920 --> 00:19:10.359
<v Speaker 1>images and video frames. And finally, what about audio data?

384
00:19:10.400 --> 00:19:11.039
<v Speaker 1>Can they listen?

385
00:19:11.279 --> 00:19:14.039
<v Speaker 2>Be sure, can or at least process the data for

386
00:19:14.119 --> 00:19:17.319
<v Speaker 2>audio data analysis. Open AI's Whisper model is a real

387
00:19:17.400 --> 00:19:21.319
<v Speaker 2>game changer. Yeah, it's a transformer model like GPT, but

388
00:19:21.440 --> 00:19:25.319
<v Speaker 2>train specifically on over six hundred and eighty thousand hours

389
00:19:25.640 --> 00:19:30.160
<v Speaker 2>of multilingual audio data. It's excellent for transcription, converting audio

390
00:19:30.200 --> 00:19:34.039
<v Speaker 2>recordings into written text, typically English text output, though it

391
00:19:34.160 --> 00:19:35.960
<v Speaker 2>understands many languages.

392
00:19:35.640 --> 00:19:37.920
<v Speaker 1>So speech to text. What formats?

393
00:19:37.799 --> 00:19:41.279
<v Speaker 2>It supports common formats like MP three, WAV and others,

394
00:19:41.599 --> 00:19:44.119
<v Speaker 2>usually with a file size limit around twenty five milibit

395
00:19:44.119 --> 00:19:45.000
<v Speaker 2>for the standard API.

396
00:19:45.079 --> 00:19:45.920
<v Speaker 1>What can you build with that?

397
00:19:46.160 --> 00:19:49.960
<v Speaker 2>You could build a full voice query interface. Imagine record

398
00:19:50.000 --> 00:19:53.200
<v Speaker 2>a spoken question using a library like sound device on

399
00:19:53.240 --> 00:19:56.759
<v Speaker 2>your computer, OK, transcribe that audio to text using whisper,

400
00:19:57.440 --> 00:20:00.839
<v Speaker 2>translate that text into a SQL query GBT four H,

401
00:20:01.480 --> 00:20:04.000
<v Speaker 2>execute the query against your database, get the result, and

402
00:20:04.039 --> 00:20:06.799
<v Speaker 2>then present the answer back as speech using text to

403
00:20:06.839 --> 00:20:07.599
<v Speaker 2>speech generation.

404
00:20:07.799 --> 00:20:11.119
<v Speaker 1>Whoa a full voice assistant for your data exactly?

405
00:20:11.559 --> 00:20:14.640
<v Speaker 2>OpenAI also has a tts IE model for that text

406
00:20:14.640 --> 00:20:17.440
<v Speaker 2>to speech part You can select from various voices, give

407
00:20:17.440 --> 00:20:19.920
<v Speaker 2>it the text answer, and it generates the audio. Pricing

408
00:20:20.000 --> 00:20:22.160
<v Speaker 2>for TTS is usually based on the number of characters

409
00:20:22.200 --> 00:20:22.680
<v Speaker 2>you convert.

410
00:20:22.720 --> 00:20:24.200
<v Speaker 1>That's amazing. What about translation?

411
00:20:24.519 --> 00:20:29.440
<v Speaker 2>This whole pipeline also enables simultaneous translation. Effectively, spoken input

412
00:20:29.440 --> 00:20:32.799
<v Speaker 2>in one language gets transcribed by whisper, translated to text

413
00:20:32.880 --> 00:20:35.720
<v Speaker 2>in the second language by GPT, and then spoken aloud

414
00:20:35.759 --> 00:20:38.039
<v Speaker 2>in that target language using the TTS model.

415
00:20:38.119 --> 00:20:40.720
<v Speaker 1>Wow, the pieces are all there. Okay, so we've covered

416
00:20:40.759 --> 00:20:42.920
<v Speaker 1>the basics with open AI. But now for the part

417
00:20:42.960 --> 00:20:45.920
<v Speaker 1>that truly blew my mind when I was researching this.

418
00:20:46.039 --> 00:20:49.759
<v Speaker 1>The world of LMS extends far beyond just open AI.

419
00:20:49.880 --> 00:20:53.559
<v Speaker 2>Oh absolutely, It's a rapidly growing ecosystem that are prominent

420
00:20:53.599 --> 00:20:57.839
<v Speaker 2>GPT alternatives, each with unique philosophies and strengths. Like COO, Well,

421
00:20:57.960 --> 00:21:02.119
<v Speaker 2>there's anthropic. With their claud models. They emphasize a constitutional

422
00:21:02.119 --> 00:21:05.720
<v Speaker 2>AI approach, trying to build models that are inherently helpful

423
00:21:05.759 --> 00:21:07.400
<v Speaker 2>and harmless through their training process.

424
00:21:07.480 --> 00:21:09.039
<v Speaker 1>Constitutional AI. Interesting.

425
00:21:09.200 --> 00:21:11.960
<v Speaker 2>Then you have cohere. Their command R plus model, for instance,

426
00:21:12.000 --> 00:21:16.559
<v Speaker 2>focus is heavily on grounding to avoid hallucinations, yet linking

427
00:21:16.640 --> 00:21:20.079
<v Speaker 2>the model's answers back to real data sources. They alpha

428
00:21:20.160 --> 00:21:23.839
<v Speaker 2>use techniques like RAG which stands for retrieval augmented Generation,

429
00:21:24.359 --> 00:21:27.200
<v Speaker 2>and provide connectors that allow the model to perform web

430
00:21:27.240 --> 00:21:32.119
<v Speaker 2>searches or query databases to find factual information before generating an.

431
00:21:32.000 --> 00:21:34.960
<v Speaker 1>Answer, So fact checking itself before answering.

432
00:21:35.039 --> 00:21:37.799
<v Speaker 2>That's the idea. And of course Google they played a

433
00:21:37.839 --> 00:21:42.279
<v Speaker 2>foundational role by inventing the transformer architecture itself back in

434
00:21:42.319 --> 00:21:47.400
<v Speaker 2>twenty seventeen. Their Gemini models offer very powerful multimodal capabilities

435
00:21:47.440 --> 00:21:47.839
<v Speaker 2>as well.

436
00:21:48.000 --> 00:21:50.200
<v Speaker 1>Right, Google's a huge player, and for those who prefer

437
00:21:50.359 --> 00:21:54.359
<v Speaker 1>more control, maybe running models locally on their own machines.

438
00:21:53.960 --> 00:21:56.799
<v Speaker 2>That's where hugging Face really shines. They are central to

439
00:21:56.839 --> 00:21:58.400
<v Speaker 2>the open source AI community.

440
00:21:58.519 --> 00:22:00.480
<v Speaker 1>Hugging Face like the emoji.

441
00:22:00.640 --> 00:22:03.640
<v Speaker 2>Exactly like the emoji, they provide a vast platform and

442
00:22:03.680 --> 00:22:07.519
<v Speaker 2>ecosystem for open source models. This allows users to download

443
00:22:07.519 --> 00:22:10.039
<v Speaker 2>and run models on their own local infrastructure.

444
00:22:10.160 --> 00:22:11.720
<v Speaker 1>What's the benefit of that It can.

445
00:22:11.680 --> 00:22:14.319
<v Speaker 2>Be significantly cheaper in the long run, especially for high

446
00:22:14.400 --> 00:22:17.559
<v Speaker 2>volume use as you're not paying per token to an API,

447
00:22:18.359 --> 00:22:22.400
<v Speaker 2>And crucially, it's ideal for sensitive data that you absolutely

448
00:22:22.440 --> 00:22:25.519
<v Speaker 2>cannot send to a third party API for privacy or

449
00:22:25.559 --> 00:22:26.839
<v Speaker 2>security reasons makes sense.

450
00:22:26.960 --> 00:22:28.759
<v Speaker 1>How many models are we talking about?

451
00:22:28.839 --> 00:22:32.200
<v Speaker 2>The hugging face hub has? I think over a million models,

452
00:22:32.400 --> 00:22:35.839
<v Speaker 2>data sets and related resources. Now, it's enormous. You can

453
00:22:35.880 --> 00:22:40.559
<v Speaker 2>filter models by task, text classification, image generation, visual question answering,

454
00:22:40.599 --> 00:22:43.119
<v Speaker 2>you name it. It's a real treasure trove for finding

455
00:22:43.200 --> 00:22:44.480
<v Speaker 2>or building custom solutions.

456
00:22:44.680 --> 00:22:48.559
<v Speaker 1>A million models. Wow? Okay. So for those truly complex,

457
00:22:48.640 --> 00:22:51.640
<v Speaker 1>multi step data analysis pipelines we talked about earlier, maybe

458
00:22:51.720 --> 00:22:56.200
<v Speaker 1>combining database queries with web searches and tech summarization. Yeah,

459
00:22:56.200 --> 00:22:59.079
<v Speaker 1>we're talking about next level tools, right frameworks exactly.

460
00:22:59.279 --> 00:23:01.759
<v Speaker 2>For those kinds of sophisticated workflows, you'll often turn to

461
00:23:01.839 --> 00:23:04.759
<v Speaker 2>software frameworks like lang chain and lama index. They help

462
00:23:04.799 --> 00:23:05.839
<v Speaker 2>manage the complexity.

463
00:23:06.000 --> 00:23:07.680
<v Speaker 1>Lang chain, what's the core idea there?

464
00:23:07.960 --> 00:23:11.920
<v Speaker 2>Lang chain helps you compose complex applications through chains. Think

465
00:23:11.960 --> 00:23:16.319
<v Speaker 2>of chains as ways to sequence operations, integrating LMAM calls

466
00:23:16.359 --> 00:23:19.160
<v Speaker 2>with other standard Python functions or external.

467
00:23:18.759 --> 00:23:22.000
<v Speaker 1>APIs like linking steps together precisely.

468
00:23:22.519 --> 00:23:25.640
<v Speaker 2>Key components include things like chat prompt template for creating

469
00:23:25.720 --> 00:23:29.839
<v Speaker 2>reusable prompt structures, chat open ai or similar for making

470
00:23:29.880 --> 00:23:33.240
<v Speaker 2>the actual API calls to the LLM, and strout pot

471
00:23:33.240 --> 00:23:35.400
<v Speaker 2>parser for neatly getting the text result out.

472
00:23:35.559 --> 00:23:36.599
<v Speaker 1>Okay, building blocks.

473
00:23:36.920 --> 00:23:39.799
<v Speaker 2>But the really powerful concept in lang chain is often agents.

474
00:23:40.119 --> 00:23:41.960
<v Speaker 1>Agents like secret agents.

475
00:23:42.119 --> 00:23:44.559
<v Speaker 2>Yeah, sort of think of an agent as putting the

476
00:23:44.680 --> 00:23:48.039
<v Speaker 2>LLM in the driver's seat. You give it a complex

477
00:23:48.119 --> 00:23:49.839
<v Speaker 2>task and a sit of tools that can use.

478
00:23:50.039 --> 00:23:50.400
<v Speaker 1>Tools.

479
00:23:50.440 --> 00:23:54.359
<v Speaker 2>Being tools are basically just regular functions, Python functions, API calls, whatever,

480
00:23:54.680 --> 00:23:57.279
<v Speaker 2>but they have a natural language description telling the agent

481
00:23:57.400 --> 00:23:59.720
<v Speaker 2>what the tool does. The agent can then look at

482
00:23:59.759 --> 00:24:02.680
<v Speaker 2>the tag, break it down and decide Okay, for this part,

483
00:24:02.759 --> 00:24:05.319
<v Speaker 2>I need to use the sql query tool. For that part,

484
00:24:05.359 --> 00:24:07.160
<v Speaker 2>I'll use the web search tool. Then I'll use the

485
00:24:07.160 --> 00:24:10.160
<v Speaker 2>summarizer tool. It figures out the plan and wish tools

486
00:24:10.160 --> 00:24:11.200
<v Speaker 2>to invoke in what order.

487
00:24:11.440 --> 00:24:15.359
<v Speaker 1>So the LLM itself orchestrates the workflow using the tools

488
00:24:15.400 --> 00:24:16.640
<v Speaker 1>you give it exactly.

489
00:24:16.920 --> 00:24:19.519
<v Speaker 2>For instance, you could build a data analysis agent that

490
00:24:19.599 --> 00:24:23.000
<v Speaker 2>combines several relational database tools one to list tables, one

491
00:24:23.039 --> 00:24:26.319
<v Speaker 2>to get the schema, one to check sql syntax, one

492
00:24:26.359 --> 00:24:28.799
<v Speaker 2>to actually run the query, maybe alongside a web search

493
00:24:28.880 --> 00:24:32.240
<v Speaker 2>tool for pulling in external context. You define these custom

494
00:24:32.279 --> 00:24:35.559
<v Speaker 2>tools quite easily, often using a simple a tool decorator.

495
00:24:35.119 --> 00:24:39.480
<v Speaker 1>In Python that sounds incredibly powerful. Giving the LM agency

496
00:24:39.640 --> 00:24:43.480
<v Speaker 1>to solve problems and LAMA index how does that fit in?

497
00:24:43.759 --> 00:24:47.720
<v Speaker 2>Lamma index is another extremely popular framework, and it particularly

498
00:24:47.720 --> 00:24:50.799
<v Speaker 2>excels when you're dealing with large collections of your own data,

499
00:24:51.079 --> 00:24:54.920
<v Speaker 2>especially documents of various types of PDFs, powerpoints, text files, etc.

500
00:24:55.400 --> 00:24:57.400
<v Speaker 1>So more focused on query and your own stuff.

501
00:24:57.480 --> 00:25:00.240
<v Speaker 2>Yes, its main strength is in how it indexes this data.

502
00:25:00.279 --> 00:25:03.599
<v Speaker 2>It pre processes your documents to create efficient search structures,

503
00:25:03.599 --> 00:25:06.319
<v Speaker 2>often using those embedding vectors we talked about earlier. This

504
00:25:06.359 --> 00:25:08.960
<v Speaker 2>allows it to quickly identify the most relevant subsets of

505
00:25:09.000 --> 00:25:10.240
<v Speaker 2>your data for a given query.

506
00:25:10.400 --> 00:25:14.160
<v Speaker 1>Ah, So it finds the right needle in the haystack first, exactly.

507
00:25:14.200 --> 00:25:17.039
<v Speaker 2>It retrieves the most relevant chunks of information and then

508
00:25:17.079 --> 00:25:20.400
<v Speaker 2>passes only that context, along with your question to a

509
00:25:20.599 --> 00:25:24.079
<v Speaker 2>powerful LLM like GPT four to synthesize the final answer.

510
00:25:24.559 --> 00:25:27.440
<v Speaker 2>This is much more efficient and effective than trying to

511
00:25:27.480 --> 00:25:31.240
<v Speaker 2>stuff entire documents into an LM prompt, which often isn't

512
00:25:31.240 --> 00:25:32.799
<v Speaker 2>even possible due to length limits.

513
00:25:32.880 --> 00:25:36.240
<v Speaker 1>So lemmy index is about retrieval, augmented generation, finding the

514
00:25:36.279 --> 00:25:37.319
<v Speaker 1>relevant bits first.

515
00:25:37.359 --> 00:25:40.359
<v Speaker 2>Precisely, you could build a simple question answering system over

516
00:25:40.400 --> 00:25:44.079
<v Speaker 2>a whole directory of diverse company reports, PDFs, powerpoints, maybe

517
00:25:44.119 --> 00:25:47.759
<v Speaker 2>web pages, and Lemmy index handles fetching the right pieces

518
00:25:47.759 --> 00:25:49.920
<v Speaker 2>of info before the ll generates the answer.

519
00:25:50.200 --> 00:25:52.839
<v Speaker 1>So how do they compare lang Chain and lemnx. Are

520
00:25:52.880 --> 00:25:55.759
<v Speaker 1>they competitors or complementary? Do you use one or the other?

521
00:25:55.880 --> 00:25:58.720
<v Speaker 2>That's a great question. Lane Chain is perhaps a more

522
00:25:58.799 --> 00:26:02.200
<v Speaker 2>general framework for build all sorts of LLM powered applications,

523
00:26:02.480 --> 00:26:05.319
<v Speaker 2>with a strong focus on agents and chains orchestrating tools.

524
00:26:06.000 --> 00:26:09.319
<v Speaker 2>Lemmy Index is more specialized, really focus on that interaction

525
00:26:09.400 --> 00:26:13.240
<v Speaker 2>pattern between lms and large external data sets via indexing

526
00:26:13.279 --> 00:26:13.839
<v Speaker 2>and retrieval.

527
00:26:14.000 --> 00:26:15.759
<v Speaker 1>So different focuses, right.

528
00:26:15.759 --> 00:26:18.640
<v Speaker 2>You'll often see them used together. Actually, you might use

529
00:26:18.720 --> 00:26:21.519
<v Speaker 2>law index to build a robust retrieval tool and then

530
00:26:21.559 --> 00:26:24.400
<v Speaker 2>incorporate that tool into a Lange chain agent, or you

531
00:26:24.440 --> 00:26:26.480
<v Speaker 2>might use one over the other, depending on whether your

532
00:26:26.519 --> 00:26:31.200
<v Speaker 2>main challenge is complex orchestration, lang chain or querying large

533
00:26:31.240 --> 00:26:35.640
<v Speaker 2>knowledge bases bamby index. Both frameworks are also relatively young

534
00:26:35.720 --> 00:26:39.079
<v Speaker 2>and evolving incredibly quickly, so the lines can blur. It's

535
00:26:39.119 --> 00:26:40.640
<v Speaker 2>really exciting space to watch.

536
00:26:40.759 --> 00:26:43.119
<v Speaker 1>So what does this all mean for you, the listener?

537
00:26:43.319 --> 00:26:47.359
<v Speaker 1>We've unpacked how llms are not just for generating text

538
00:26:47.440 --> 00:26:51.079
<v Speaker 1>or chatting, but are becoming these incredibly powerful engines for

539
00:26:51.119 --> 00:26:54.319
<v Speaker 1>a data analysis across pretty much every format imaginable.

540
00:26:54.400 --> 00:26:56.119
<v Speaker 2>Yeah, it's gone way beyond chatbots.

541
00:26:56.279 --> 00:26:59.880
<v Speaker 1>We've covered their incredible versatility, some practical techniques for optimizing

542
00:27:00.039 --> 00:27:03.680
<v Speaker 1>cost and performance, and explore these advanced frameworks like lang

543
00:27:03.759 --> 00:27:08.200
<v Speaker 1>chain and lem index that let you build truly sophisticated applications.

544
00:27:08.799 --> 00:27:11.279
<v Speaker 1>We're really just scratching the surface of what these frameworks

545
00:27:11.279 --> 00:27:14.079
<v Speaker 1>can do, but the key takeaway is they unlock true

546
00:27:14.200 --> 00:27:16.200
<v Speaker 1>programmatic power for llms.

547
00:27:16.559 --> 00:27:20.599
<v Speaker 2>And remember, knowledge is really most valuable when it's understood

548
00:27:20.599 --> 00:27:23.400
<v Speaker 2>and applied. We encourage you to think about the practical

549
00:27:23.400 --> 00:27:27.119
<v Speaker 2>applications of this knowledge in your own fields, your own work,

550
00:27:27.519 --> 00:27:31.000
<v Speaker 2>whether that's automating the analysis of customer reviews, building voice

551
00:27:31.000 --> 00:27:34.519
<v Speaker 2>interfaces for your internal tools, or extracting new insights from

552
00:27:34.559 --> 00:27:40.319
<v Speaker 2>complex multimodal data sources. The possibilities are just vast. Now,

553
00:27:41.039 --> 00:27:43.920
<v Speaker 2>this really raises an important question, maybe the most important one.

554
00:27:44.559 --> 00:27:48.200
<v Speaker 2>How will you start experimenting with these capabilities to solve

555
00:27:48.240 --> 00:27:52.000
<v Speaker 2>your unique data challenges? What problem could you tackle now

556
00:27:52.079 --> 00:27:53.079
<v Speaker 2>that you couldn't before.

557
00:27:53.680 --> 00:27:56.680
<v Speaker 1>That's the real question. And while llms definitely offer an

558
00:27:56.680 --> 00:28:00.720
<v Speaker 1>almost magical experience. Sometimes, maybe the true power lies in

559
00:28:00.759 --> 00:28:03.640
<v Speaker 1>our human ability to understand how they work warts and all,

560
00:28:04.240 --> 00:28:08.039
<v Speaker 1>to manage their limitations, like those convincing hallucinations, and to

561
00:28:08.079 --> 00:28:11.160
<v Speaker 1>critically design the prompts, the workflows, the frameworks that turn

562
00:28:11.240 --> 00:28:14.240
<v Speaker 1>all that raw data into meaningful, verified insights.

563
00:28:14.359 --> 00:28:15.519
<v Speaker 2>Verification is key.

564
00:28:15.640 --> 00:28:19.880
<v Speaker 1>The continuous journey of learning and applying these evolving tools,

565
00:28:19.920 --> 00:28:22.920
<v Speaker 1>figuring out how to use them responsibly and effectively. That

566
00:28:23.039 --> 00:28:25.960
<v Speaker 1>really feels like the most exciting frontier in data right now.
