WEBVTT

1
00:00:00.160 --> 00:00:02.240
<v Speaker 1>Welcome to the Deep Dive. We're the show that gets

2
00:00:02.279 --> 00:00:04.719
<v Speaker 1>you quickly and thoroughly well informed on the topics that

3
00:00:04.759 --> 00:00:09.679
<v Speaker 1>really matter. And today we are plunging headfirst into something that, honestly,

4
00:00:09.720 --> 00:00:11.480
<v Speaker 1>it still feels a bit like science fiction, but it's

5
00:00:11.560 --> 00:00:16.359
<v Speaker 1>very much real now, the world of artificial intelligence, specifically

6
00:00:16.440 --> 00:00:20.039
<v Speaker 1>large language models or LMS. I mean, think about it.

7
00:00:20.039 --> 00:00:24.719
<v Speaker 1>I remember when chat GPT just exploded onto the scene exactly.

8
00:00:24.800 --> 00:00:27.160
<v Speaker 1>It wasn't just some tech news item. It went global,

9
00:00:27.239 --> 00:00:29.760
<v Speaker 1>pulling what over one hundred million users in just two months.

10
00:00:30.079 --> 00:00:33.840
<v Speaker 1>It really felt like these things could just conjure up text, answers,

11
00:00:34.119 --> 00:00:35.439
<v Speaker 1>anything right out of thin air.

12
00:00:35.560 --> 00:00:37.600
<v Speaker 2>It did feel a bit magical, didnety totally.

13
00:00:38.159 --> 00:00:40.560
<v Speaker 1>So our mission with this Deep Dive is really to

14
00:00:40.560 --> 00:00:44.359
<v Speaker 1>give you a shortcut into learning liang chain, building AI

15
00:00:44.520 --> 00:00:47.600
<v Speaker 1>and LM applications. We want to go beyond the buzzwords,

16
00:00:47.600 --> 00:00:50.200
<v Speaker 1>you know, get into the core concepts, the actual practical

17
00:00:50.240 --> 00:00:53.960
<v Speaker 1>strategies for building powerful AI apps with this thing called

18
00:00:54.039 --> 00:00:56.600
<v Speaker 1>lang chain. Our goal isn't just to tell you what's

19
00:00:56.600 --> 00:00:59.280
<v Speaker 1>happening in AI, but really show you why it matters,

20
00:00:59.479 --> 00:01:02.920
<v Speaker 1>point out the aha moments, and crucially equip you the

21
00:01:03.000 --> 00:01:06.319
<v Speaker 1>listener with the knowledge to maybe even apply it yourself.

22
00:01:06.480 --> 00:01:10.280
<v Speaker 2>And to guide us through this pretty intricate, in let's

23
00:01:10.280 --> 00:01:13.400
<v Speaker 2>be honest, rapidly evolving landscape. We're drawing directly from a

24
00:01:13.439 --> 00:01:17.560
<v Speaker 2>really fantastic source, the book Learning lang Chain by myo

25
00:01:17.640 --> 00:01:20.719
<v Speaker 2>Ocean and Nunocampos. And these aren't just you know, academics

26
00:01:20.719 --> 00:01:23.000
<v Speaker 2>writing about the fields from Afar. Mayo was actually an

27
00:01:23.280 --> 00:01:25.799
<v Speaker 2>early developer, an advocate for the lang Chain open source

28
00:01:25.840 --> 00:01:28.879
<v Speaker 2>library itself, a real pioneer in that whole chat with

29
00:01:29.000 --> 00:01:32.000
<v Speaker 2>data movement, and Nun is a founding software engineer at

30
00:01:32.079 --> 00:01:35.239
<v Speaker 2>lang Chain. So this book, it isn't just theory, it's

31
00:01:35.280 --> 00:01:39.799
<v Speaker 2>packed with super clear explanations, actionable techniques. Industry experts are

32
00:01:39.799 --> 00:01:43.760
<v Speaker 2>calling it the go to resource for building production ready

33
00:01:43.840 --> 00:01:45.680
<v Speaker 2>generative AI and agents.

34
00:01:45.959 --> 00:01:48.680
<v Speaker 1>That's fantastic, a really solid foundation. Then, So the big

35
00:01:48.760 --> 00:01:53.239
<v Speaker 1>question we're tackling today is this, how can developers, maybe

36
00:01:53.280 --> 00:01:56.840
<v Speaker 1>even those who don't have a deep machine learning background,

37
00:01:57.079 --> 00:02:00.200
<v Speaker 1>how can they harness this incredible power of lllms to

38
00:02:00.239 --> 00:02:04.879
<v Speaker 1>build genuinely production ready generative AI applications and these intelligent agents. Right,

39
00:02:05.120 --> 00:02:07.879
<v Speaker 1>we're going to unpack the essential tools, the patterns, the

40
00:02:07.920 --> 00:02:11.840
<v Speaker 1>thinking that transforms these powerful models from cool tech demos

41
00:02:11.879 --> 00:02:16.080
<v Speaker 1>into practical, usable solutions. So, okay, let's start right at

42
00:02:16.120 --> 00:02:18.919
<v Speaker 1>the beginning. These lms, they seem almost magical. How exactly

43
00:02:18.919 --> 00:02:21.319
<v Speaker 1>do they know the answers they give and what precisely

44
00:02:21.400 --> 00:02:22.719
<v Speaker 1>is a token in their world?

45
00:02:22.960 --> 00:02:24.039
<v Speaker 3>Yeah, good place to start.

46
00:02:24.400 --> 00:02:27.919
<v Speaker 2>So at their heart, large language models are generative models

47
00:02:28.000 --> 00:02:32.159
<v Speaker 2>specifically built for text. They're trained on just vast amounts

48
00:02:32.199 --> 00:02:36.599
<v Speaker 2>of text, think everything publicly available, books, articles, forums, code,

49
00:02:37.080 --> 00:02:41.159
<v Speaker 2>even cleaned up video transcripts, an immense data set. Their

50
00:02:41.159 --> 00:02:43.599
<v Speaker 2>core function isn't really magic, though it looks like it.

51
00:02:43.599 --> 00:02:48.080
<v Speaker 2>It's incredibly sophisticated prediction. They basically predict the most probable

52
00:02:48.319 --> 00:02:51.879
<v Speaker 2>next word or token in a sequence based on all

53
00:02:51.879 --> 00:02:53.800
<v Speaker 2>the patterns they've learned. So if you feed it the

54
00:02:53.800 --> 00:02:56.759
<v Speaker 2>capital of England is, it's learned from countless examples that

55
00:02:56.800 --> 00:02:58.439
<v Speaker 2>the highest probability next word is.

56
00:02:58.439 --> 00:03:02.000
<v Speaker 1>London, ok N, matching on a massive scale. But what's

57
00:03:02.039 --> 00:03:03.879
<v Speaker 1>a token? Is it just a word?

58
00:03:04.280 --> 00:03:04.840
<v Speaker 3>Not always?

59
00:03:05.039 --> 00:03:08.439
<v Speaker 2>Yeah, a token is the fundamental unit the LM processes.

60
00:03:08.840 --> 00:03:12.039
<v Speaker 2>Often it's a word, but sometimes longer or less common

61
00:03:12.080 --> 00:03:15.400
<v Speaker 2>words get broken down, like dearest might become two tokens.

62
00:03:15.520 --> 00:03:19.240
<v Speaker 2>Done and arrest on average. You know, for common English text,

63
00:03:19.240 --> 00:03:22.280
<v Speaker 2>one token is roughly four characters. And the driving engine

64
00:03:22.319 --> 00:03:25.240
<v Speaker 2>behind all this predictive power is something called the transformer

65
00:03:25.280 --> 00:03:26.479
<v Speaker 2>neural network architecture.

66
00:03:26.560 --> 00:03:27.800
<v Speaker 1>Right, the transformer architecture.

67
00:03:27.879 --> 00:03:29.960
<v Speaker 2>Heard a lot about that, Yeah, it's key. Think of

68
00:03:29.960 --> 00:03:33.039
<v Speaker 2>it as being really good at understanding context. It relates

69
00:03:33.080 --> 00:03:35.599
<v Speaker 2>every word in a sentence to every other word, building

70
00:03:35.599 --> 00:03:38.840
<v Speaker 2>this rich understanding of meaning and relationships. That's how they

71
00:03:38.879 --> 00:03:42.520
<v Speaker 2>handle complex grammar and nuance, not just simple word prediction.

72
00:03:43.159 --> 00:03:46.000
<v Speaker 1>And it was that understanding, or perhaps a limitation of

73
00:03:46.000 --> 00:03:48.319
<v Speaker 1>that understanding, that led to lang chain, Right, I read

74
00:03:48.360 --> 00:03:51.400
<v Speaker 1>Harrison Chase started the open source library back in October

75
00:03:51.439 --> 00:03:54.680
<v Speaker 1>twenty twenty two. What was the key realization.

76
00:03:54.280 --> 00:03:56.919
<v Speaker 2>Exactly, the real breakthrough, The thing that sparked lang chain

77
00:03:56.960 --> 00:04:00.000
<v Speaker 2>was this insight in LM, as brilliant as it is

78
00:04:00.080 --> 00:04:03.840
<v Speaker 2>with language, could totally fumble basic arithmetic, Like ask it

79
00:04:03.879 --> 00:04:06.439
<v Speaker 2>to calculate one two hundred and thirty four module one

80
00:04:06.439 --> 00:04:08.759
<v Speaker 2>twenty three on its own. It might just guess or

81
00:04:08.800 --> 00:04:09.319
<v Speaker 2>get it wrong.

82
00:04:09.400 --> 00:04:12.400
<v Speaker 1>Huh weird. Right, It can write poetry but not do

83
00:04:12.479 --> 00:04:13.039
<v Speaker 1>simple math.

84
00:04:13.120 --> 00:04:14.319
<v Speaker 3>It is kind of paradoxical.

85
00:04:14.520 --> 00:04:18.079
<v Speaker 2>Yeah, And that raised this crucial question, how do you

86
00:04:18.120 --> 00:04:21.959
<v Speaker 2>give this powerful language model capabilities it just doesn't have intrinsically.

87
00:04:22.759 --> 00:04:25.959
<v Speaker 2>Harrison Chase's pivotal realization was, and this is key, the

88
00:04:26.000 --> 00:04:30.079
<v Speaker 2>most interesting LLM applications needed to use lms together with

89
00:04:30.279 --> 00:04:34.959
<v Speaker 2>other sources of computation or knowledge. Laying Chain was essentially

90
00:04:34.959 --> 00:04:37.959
<v Speaker 2>built to provide the building blocks, the interfaces, and the

91
00:04:38.000 --> 00:04:42.160
<v Speaker 2>tooling to reliably combine llms with other things, like giving

92
00:04:42.199 --> 00:04:44.800
<v Speaker 2>it the ability to call out to a calculator when

93
00:04:44.839 --> 00:04:45.920
<v Speaker 2>it sees a math problem.

94
00:04:46.000 --> 00:04:48.439
<v Speaker 1>That makes so much sense giving it tools. So, if

95
00:04:48.439 --> 00:04:51.879
<v Speaker 1>the LLM isn't a calculator or a database itself, how

96
00:04:51.879 --> 00:04:53.759
<v Speaker 1>do we even talk to it effectively? How do we

97
00:04:53.839 --> 00:04:55.120
<v Speaker 1>guide it to do what we need?

98
00:04:55.240 --> 00:04:57.720
<v Speaker 2>Yeah, that's all about prompting. The prompt is basically the

99
00:04:57.759 --> 00:05:00.959
<v Speaker 2>instructions and input you provide to the model, and crucially,

100
00:05:01.519 --> 00:05:04.759
<v Speaker 2>how you phrase that prompt significantly influences the model's output.

101
00:05:06.079 --> 00:05:08.920
<v Speaker 2>There's also this fascinating control called temperature. You can think

102
00:05:08.959 --> 00:05:11.439
<v Speaker 2>of it like a creativity dial. Lower temperature makes the

103
00:05:11.480 --> 00:05:16.600
<v Speaker 2>output more focused, more deterministic, predictable. Higher temperature it lets

104
00:05:16.639 --> 00:05:19.839
<v Speaker 2>the model take more risks, get more creative, maybe even

105
00:05:19.879 --> 00:05:22.240
<v Speaker 2>a bit random, useful for different tasks.

106
00:05:22.439 --> 00:05:26.399
<v Speaker 1>Okay, so prompting is key, temperature, controls, creativity. What are

107
00:05:26.399 --> 00:05:27.920
<v Speaker 1>the main ways we prompt these things?

108
00:05:28.439 --> 00:05:31.439
<v Speaker 2>There are several core techniques, each kind of addressing a

109
00:05:31.439 --> 00:05:35.720
<v Speaker 2>different need. The absolute simplest is zero shot prompting. Just

110
00:05:35.759 --> 00:05:39.000
<v Speaker 2>give it a direct instruction like that example earlier, how

111
00:05:39.000 --> 00:05:41.199
<v Speaker 2>Old was the thirtiest president of the United States when

112
00:05:41.240 --> 00:05:44.839
<v Speaker 2>his wife's mother died. It's straightforward for basic questions, but

113
00:05:44.959 --> 00:05:47.519
<v Speaker 2>you know it can often lead to inaccuracies or just

114
00:05:47.560 --> 00:05:49.399
<v Speaker 2>making things up at the info is and baked into

115
00:05:49.399 --> 00:05:50.240
<v Speaker 2>his training data.

116
00:05:50.319 --> 00:05:53.000
<v Speaker 1>Hallucinations, right, the dreaded hallucinations joys lely.

117
00:05:53.319 --> 00:05:55.879
<v Speaker 2>Then you've got chain of thought or co T. This

118
00:05:55.920 --> 00:05:58.680
<v Speaker 2>is where you literally tell the model think step by step.

119
00:06:00.079 --> 00:06:03.959
<v Speaker 2>It's intriguing because this often dramatically improves performance on reasoning tasks.

120
00:06:04.240 --> 00:06:07.079
<v Speaker 2>It forces the LM to sort of show its work,

121
00:06:07.319 --> 00:06:09.639
<v Speaker 2>break down the problem like we learned in school math

122
00:06:09.800 --> 00:06:12.959
<v Speaker 2>pretty much. But here's a funny quirk. The book notes

123
00:06:12.959 --> 00:06:16.680
<v Speaker 2>that sometimes for tasks where humans also tend to overthink

124
00:06:16.720 --> 00:06:20.639
<v Speaker 2>and make mistakes, COKE can actually make the LLLM perform worse.

125
00:06:21.240 --> 00:06:23.879
<v Speaker 2>A good reminder they aren't just scaled up human brains.

126
00:06:24.199 --> 00:06:25.680
<v Speaker 1>Huh interesting?

127
00:06:25.800 --> 00:06:26.959
<v Speaker 3>What else next?

128
00:06:27.040 --> 00:06:27.240
<v Speaker 1>Up?

129
00:06:27.360 --> 00:06:30.040
<v Speaker 2>And this is fundamental for making lllms useful with your

130
00:06:30.120 --> 00:06:35.879
<v Speaker 2>data is retrieval augmented generation or OURG. This means providing

131
00:06:36.000 --> 00:06:39.439
<v Speaker 2>relevant pieces of text also known as context, directly within

132
00:06:39.480 --> 00:06:41.759
<v Speaker 2>the prompt. So if you want the LLLM to know

133
00:06:41.800 --> 00:06:45.319
<v Speaker 2>about your company's latest internal reporter today's news, you use

134
00:06:45.360 --> 00:06:49.199
<v Speaker 2>OURG to feed at that specific information alongside the question.

135
00:06:49.000 --> 00:06:51.319
<v Speaker 1>Ah okay, so ori is how you give it knowledge

136
00:06:51.319 --> 00:06:52.879
<v Speaker 1>it wasn't trained on precisely.

137
00:06:53.160 --> 00:06:56.040
<v Speaker 2>And then for making lllms do things, there's tool calling.

138
00:06:56.240 --> 00:06:58.199
<v Speaker 2>This lets you give the LM a list of external

139
00:06:58.199 --> 00:07:00.959
<v Speaker 2>functions or calculator. Example, maybe a search engine, API, a

140
00:07:01.000 --> 00:07:03.600
<v Speaker 2>weather service, whatever. You train it to recognize when it

141
00:07:03.600 --> 00:07:05.600
<v Speaker 2>needs a tool and to signal it's intent.

142
00:07:05.480 --> 00:07:07.160
<v Speaker 1>To use it, so it can decide I need to

143
00:07:07.160 --> 00:07:09.639
<v Speaker 1>search for this or I need to calculate that exactly.

144
00:07:10.120 --> 00:07:14.560
<v Speaker 2>And often the most powerful applications combine these techniques. Maybe

145
00:07:14.639 --> 00:07:16.920
<v Speaker 2>use chain of thought to plan or ready to fetch

146
00:07:16.959 --> 00:07:20.000
<v Speaker 2>relevant data, and then tool calling to perform a specific

147
00:07:20.040 --> 00:07:23.480
<v Speaker 2>action or calculation based on that data. Oh and one

148
00:07:23.480 --> 00:07:26.279
<v Speaker 2>more few shot prompting. This is where you give the

149
00:07:26.399 --> 00:07:28.879
<v Speaker 2>LM just a small number of examples, like here's a question,

150
00:07:28.959 --> 00:07:31.279
<v Speaker 2>here's the right kind of answer. It helps it learn

151
00:07:31.360 --> 00:07:34.720
<v Speaker 2>new tasks or formats on the fly without full retraining,

152
00:07:34.759 --> 00:07:36.360
<v Speaker 2>like showing it a few examples to get the hang

153
00:07:36.399 --> 00:07:36.959
<v Speaker 2>of something new.

154
00:07:37.079 --> 00:07:39.759
<v Speaker 1>Wow, Okay, that's a whole toolkit for interacting with them,

155
00:07:40.000 --> 00:07:41.759
<v Speaker 1>and lang chain helps manage all of this.

156
00:07:41.959 --> 00:07:43.000
<v Speaker 3>Yeah, that's the beauty of it.

157
00:07:43.079 --> 00:07:46.160
<v Speaker 2>Lang chain was one of the earliest open source libraries

158
00:07:46.199 --> 00:07:49.800
<v Speaker 2>to provide these core LM and prompting building blocks, and

159
00:07:49.879 --> 00:07:52.720
<v Speaker 2>it's taken off massively. The community is huge, over seventy

160
00:07:52.759 --> 00:07:55.360
<v Speaker 2>two thousand members, twenty eight million downloads a month.

161
00:07:55.439 --> 00:07:56.120
<v Speaker 3>It's staggering.

162
00:07:56.519 --> 00:07:59.319
<v Speaker 2>What lang chain does is offer these simple abstractions for

163
00:07:59.439 --> 00:08:02.639
<v Speaker 2>all those tech niques we just discussed, zero shot solo

164
00:08:02.720 --> 00:08:07.079
<v Speaker 2>t rrag tool calling fushot plus. It integrates seamlessly with

165
00:08:07.120 --> 00:08:11.279
<v Speaker 2>all the major LLLM providers open Ai, Anthropic, Google, and

166
00:08:11.360 --> 00:08:15.240
<v Speaker 2>popular open source models like Lama. This common interface is

167
00:08:15.279 --> 00:08:17.600
<v Speaker 2>a really big deal. It means you can easily experiment,

168
00:08:17.920 --> 00:08:21.519
<v Speaker 2>swap out different llms, and crucially avoid being locked into

169
00:08:21.560 --> 00:08:22.399
<v Speaker 2>a single provider.

170
00:08:22.720 --> 00:08:24.199
<v Speaker 3>That gives you a huge flexibility.

171
00:08:24.319 --> 00:08:27.480
<v Speaker 1>That flexibility sounds key Okay, so this brings us to

172
00:08:27.560 --> 00:08:30.560
<v Speaker 1>a really crucial challenge, especially for us the builders. If

173
00:08:30.560 --> 00:08:33.759
<v Speaker 1>these alans are brilliant, but they fundamentally can't know everything.

174
00:08:33.759 --> 00:08:36.639
<v Speaker 1>They weren't trained on my company's latest financials or yesterday's news,

175
00:08:36.919 --> 00:08:38.679
<v Speaker 1>how do we stop them from just making stuff up,

176
00:08:38.720 --> 00:08:41.000
<v Speaker 1>from hallucinating when we ask about that information.

177
00:08:41.440 --> 00:08:42.960
<v Speaker 3>You've nailed the core problem.

178
00:08:43.600 --> 00:08:46.960
<v Speaker 2>Just relying on the LM's pre train knowledge often isn't

179
00:08:47.039 --> 00:08:50.799
<v Speaker 2>enough for real world apps, precisely because, like you said,

180
00:08:50.919 --> 00:08:53.600
<v Speaker 2>they lack private data stuff not on the public Internet,

181
00:08:53.799 --> 00:08:56.159
<v Speaker 2>and they lack knowledge of current events because of their

182
00:08:56.200 --> 00:08:57.240
<v Speaker 2>knowledge cutoff date.

183
00:08:57.879 --> 00:08:58.519
<v Speaker 3>When they don't have.

184
00:08:58.519 --> 00:09:02.519
<v Speaker 2>The information they need, they tend to hallucinate, generating plausible

185
00:09:02.519 --> 00:09:06.639
<v Speaker 2>sounding but incorrect or even totally fabricated answers.

186
00:09:06.360 --> 00:09:08.519
<v Speaker 1>Which can be dangerous in a real application.

187
00:09:08.720 --> 00:09:12.120
<v Speaker 2>Absolutely, and that's exactly where retrieval augmented generation are AG

188
00:09:12.480 --> 00:09:16.440
<v Speaker 2>becomes essential. It's basically your defense mechanism against hallucination by

189
00:09:16.440 --> 00:09:19.039
<v Speaker 2>providing the necessary context RAG.

190
00:09:19.279 --> 00:09:21.799
<v Speaker 1>So walk us through how it actually works. How does

191
00:09:21.840 --> 00:09:25.080
<v Speaker 1>it ground the LM in specific maybe private or very

192
00:09:25.120 --> 00:09:26.080
<v Speaker 1>current information.

193
00:09:26.559 --> 00:09:30.200
<v Speaker 2>Okay, So, our RAG is specifically designed to enhance the

194
00:09:30.240 --> 00:09:35.080
<v Speaker 2>accuracy of outputs generated by llms by providing context from

195
00:09:35.200 --> 00:09:39.440
<v Speaker 2>external sources. Meta AI actually coined the term, and their

196
00:09:39.480 --> 00:09:42.679
<v Speaker 2>research found that RAG makes models more factual and specific.

197
00:09:43.080 --> 00:09:46.000
<v Speaker 2>The whole process generally involves four key steps for getting

198
00:09:46.000 --> 00:09:50.399
<v Speaker 2>your documents ready, sometimes called ingestion or indexing. First, you

199
00:09:50.480 --> 00:09:53.480
<v Speaker 2>extract the text from whatever documents you have. Lang chain

200
00:09:53.559 --> 00:09:56.519
<v Speaker 2>has helpers for this, like text loater for plain text files,

201
00:09:56.600 --> 00:09:59.639
<v Speaker 2>or pd sloader for PDFs, and many others. Simple enough,

202
00:10:00.120 --> 00:10:02.799
<v Speaker 2>the text out step one. Step two, you split that

203
00:10:02.879 --> 00:10:06.440
<v Speaker 2>text into manageable chunks. This is really important because, as

204
00:10:06.440 --> 00:10:09.080
<v Speaker 2>we mentioned, LM's have a context window, a limit on

205
00:10:09.120 --> 00:10:11.000
<v Speaker 2>how much text they can look at in one go.

206
00:10:11.360 --> 00:10:13.559
<v Speaker 2>Can't just feeding up five hundred page document, right, It's

207
00:10:13.559 --> 00:10:16.399
<v Speaker 2>too big Exactly so, tools like lang teen's where cursive

208
00:10:16.440 --> 00:10:19.960
<v Speaker 2>character text splitter, cleverly break the text down. It tries

209
00:10:20.000 --> 00:10:23.120
<v Speaker 2>to split along natural boundaries first like paragraphs and sentences,

210
00:10:23.159 --> 00:10:25.840
<v Speaker 2>then words. To keep things coherent, you can configure the

211
00:10:25.919 --> 00:10:28.600
<v Speaker 2>chunk size and also add some chunk overlap, meaning consecutive

212
00:10:28.679 --> 00:10:30.120
<v Speaker 2>chunks share a bit of text.

213
00:10:30.440 --> 00:10:34.000
<v Speaker 1>Ah overlap helps maintain context across the breaks.

214
00:10:34.240 --> 00:10:37.120
<v Speaker 2>Precisely, It's like making sure the end of one chapter

215
00:10:37.159 --> 00:10:39.639
<v Speaker 2>flows smoothly into the start of the next third step.

216
00:10:40.200 --> 00:10:44.799
<v Speaker 2>You convert these text chunks into numbers, specifically into embeddings.

217
00:10:44.960 --> 00:10:47.639
<v Speaker 1>Embeddings. Okay, this sounds like where the magic happens.

218
00:10:47.720 --> 00:10:48.399
<v Speaker 3>It kind of is.

219
00:10:48.919 --> 00:10:51.519
<v Speaker 2>Think of an embedding as a long list of numbers

220
00:10:51.559 --> 00:10:54.759
<v Speaker 2>a vector that represents the meaning of that text chunk.

221
00:10:55.159 --> 00:10:58.360
<v Speaker 2>Now it's a lossy representation. You can't perfectly reconstruct the

222
00:10:58.360 --> 00:11:01.080
<v Speaker 2>original words just from the numbers, like you can't get

223
00:11:01.120 --> 00:11:02.639
<v Speaker 2>perfect ced quality back.

224
00:11:02.519 --> 00:11:03.279
<v Speaker 3>From an MP three.

225
00:11:03.399 --> 00:11:05.320
<v Speaker 1>But it captures the essence exactly.

226
00:11:05.399 --> 00:11:08.200
<v Speaker 2>It captures the semantic essence, and this allows for math

227
00:11:08.279 --> 00:11:11.120
<v Speaker 2>on words. This is a huge lead from older systems

228
00:11:11.120 --> 00:11:15.080
<v Speaker 2>that just did keyword searching LM based embeddings or semantic

229
00:11:15.120 --> 00:11:16.600
<v Speaker 2>embeddings understand meaning.

230
00:11:16.799 --> 00:11:19.879
<v Speaker 1>Okay, this is fascinating. How do we teach a computer

231
00:11:19.960 --> 00:11:24.720
<v Speaker 1>the difference between say, lion, pet and dog. I get

232
00:11:24.759 --> 00:11:27.440
<v Speaker 1>the related but how does the computer quantify that? If

233
00:11:27.440 --> 00:11:31.519
<v Speaker 1>we connect this to the bigger picture, this cosign similarity

234
00:11:31.600 --> 00:11:36.960
<v Speaker 1>idea quantifying how close pet and dog are numerically versus lion.

235
00:11:37.559 --> 00:11:40.879
<v Speaker 1>That seems powerful, But how does that number crunching actually

236
00:11:40.960 --> 00:11:42.039
<v Speaker 1>enable better search?

237
00:11:42.279 --> 00:11:44.399
<v Speaker 2>That's a fantastic question. Really gets to the heart of

238
00:11:44.440 --> 00:11:48.919
<v Speaker 2>semantic search. Imagine all these words, or rather the concepts

239
00:11:48.919 --> 00:11:53.159
<v Speaker 2>they represent, existing is points in some vast high dimensional space.

240
00:11:53.759 --> 00:11:56.480
<v Speaker 2>The embedding vectors for pet and dog would literally be

241
00:11:56.559 --> 00:11:58.919
<v Speaker 2>mapped closer together in this space. Then either would be

242
00:11:58.960 --> 00:12:02.519
<v Speaker 2>to lion because they meanings are more related. Cosine similarity

243
00:12:02.600 --> 00:12:04.679
<v Speaker 2>is just the mathematical tool we use to measure the

244
00:12:04.720 --> 00:12:07.399
<v Speaker 2>angle or the closeness between these vectors. It gives a

245
00:12:07.440 --> 00:12:11.360
<v Speaker 2>score usually between natus, one opposite meaning and one identical meaning.

246
00:12:11.720 --> 00:12:14.679
<v Speaker 2>So pet and dog would have a cosine similarity score

247
00:12:14.799 --> 00:12:16.559
<v Speaker 2>much closer to one than pet and lion.

248
00:12:16.759 --> 00:12:21.279
<v Speaker 1>Ah. Okay, so similarity means closer in this meaning space exactly.

249
00:12:21.759 --> 00:12:24.679
<v Speaker 2>And this ability to turn text into embeddings that capture

250
00:12:24.679 --> 00:12:28.559
<v Speaker 2>deep meaning lets us search based on concepts, not just keywords.

251
00:12:28.840 --> 00:12:31.519
<v Speaker 2>You could search for happy house animal and the system

252
00:12:31.519 --> 00:12:34.960
<v Speaker 2>could find documents talking about joyful puppies or content cats.

253
00:12:35.240 --> 00:12:38.399
<v Speaker 2>Even if the exact words happy house or animal aren't there,

254
00:12:38.559 --> 00:12:40.720
<v Speaker 2>it understands the underlying meaning is similar.

255
00:12:40.799 --> 00:12:44.759
<v Speaker 1>That's incredibly powerful search. Okay, so we've extracted, split, and

256
00:12:44.840 --> 00:12:47.039
<v Speaker 1>embedded the text. What's the final step.

257
00:12:47.320 --> 00:12:49.320
<v Speaker 2>The fourth step is to store these embeddings in a

258
00:12:49.399 --> 00:12:53.440
<v Speaker 2>vector store. Think of this as a specialized database designed

259
00:12:53.480 --> 00:12:57.440
<v Speaker 2>to store these numerical vectors and perform those complex similarity

260
00:12:57.440 --> 00:13:02.000
<v Speaker 2>calculations like cosine similarity really efficiently and quickly. There are

261
00:13:02.039 --> 00:13:05.039
<v Speaker 2>lots of options, open source ones like pg vector, an

262
00:13:05.080 --> 00:13:08.679
<v Speaker 2>extension for post cresscool, dedicated databases like wev eight or

263
00:13:08.720 --> 00:13:12.519
<v Speaker 2>pine Cone, or cloud services. When you, the user ask

264
00:13:12.600 --> 00:13:16.000
<v Speaker 2>a question, your question is also converted into an embedding vector.

265
00:13:16.639 --> 00:13:19.639
<v Speaker 2>The vector store then rapidly finds the stored embeddings and

266
00:13:19.679 --> 00:13:23.559
<v Speaker 2>their corresponding text chunks that are most similar mathematically to

267
00:13:23.600 --> 00:13:26.919
<v Speaker 2>your query embedding. Those relevant chunks are then retrieved and

268
00:13:27.000 --> 00:13:29.320
<v Speaker 2>passed to the LLM along with your original.

269
00:13:29.080 --> 00:13:31.879
<v Speaker 1>Question, giving it the specific context it needs to answer

270
00:13:31.919 --> 00:13:32.879
<v Speaker 1>accurate precisely.

271
00:13:32.919 --> 00:13:36.039
<v Speaker 2>And lane Chain also provides tools like its indexing API

272
00:13:36.080 --> 00:13:38.559
<v Speaker 2>and record manager to help keep this vector store up

273
00:13:38.600 --> 00:13:41.600
<v Speaker 2>to date. As your source documents change, you can efficiently

274
00:13:41.639 --> 00:13:45.000
<v Speaker 2>track those changes, add new embeddings, remove old ones, and

275
00:13:45.039 --> 00:13:49.240
<v Speaker 2>avoid costly reprocessing of unchanged documents, keeping the knowledge current.

276
00:13:49.519 --> 00:13:52.720
<v Speaker 1>Okay, that makes sense. We've got the basic arget pipeline down,

277
00:13:53.399 --> 00:13:57.559
<v Speaker 1>index the data, retrieve relevant chunks, give them to the LLM.

278
00:13:57.960 --> 00:14:02.000
<v Speaker 1>But I imagine building something truly production ready involves more nuance.

279
00:14:02.639 --> 00:14:05.240
<v Speaker 1>What are the common challenges and how do we refine

280
00:14:05.279 --> 00:14:08.480
<v Speaker 1>that search for knowledge to be even more accurate and robust?

281
00:14:08.799 --> 00:14:11.679
<v Speaker 2>Yeah, moving from a basic a RAGI demo to production

282
00:14:12.279 --> 00:14:15.799
<v Speaker 2>definitely introduces complexity. Users ask questions in all sorts of ways,

283
00:14:15.879 --> 00:14:19.399
<v Speaker 2>sometimes ambiguously. Your data might live in multiple different places,

284
00:14:19.639 --> 00:14:22.080
<v Speaker 2>and you often need to translate that natural language question

285
00:14:22.120 --> 00:14:25.000
<v Speaker 2>into something more structured for retrieval. So we need more

286
00:14:25.039 --> 00:14:25.879
<v Speaker 2>advanced strategies.

287
00:14:26.000 --> 00:14:26.639
<v Speaker 1>Okay, what are they?

288
00:14:26.759 --> 00:14:30.039
<v Speaker 2>The book highlights three main categories of strategy. The first

289
00:14:30.080 --> 00:14:33.480
<v Speaker 2>is query transformation. The idea here is to modify the

290
00:14:33.559 --> 00:14:36.840
<v Speaker 2>user's input before you even search to improve the chances

291
00:14:36.840 --> 00:14:38.200
<v Speaker 2>of finding the best documents.

292
00:14:38.360 --> 00:14:40.759
<v Speaker 1>Ah, like cleaning up the question first exactly.

293
00:14:40.799 --> 00:14:44.240
<v Speaker 2>One technique is rewrite retrieve read. Here, you actually use

294
00:14:44.279 --> 00:14:47.360
<v Speaker 2>another LM call first just to rewrite the user's potentially

295
00:14:47.480 --> 00:14:51.600
<v Speaker 2>vague or conversational query into a clearer, more focused search query.

296
00:14:51.919 --> 00:14:54.399
<v Speaker 2>Then you use that rewritten query for the retrieval step.

297
00:14:54.720 --> 00:14:58.919
<v Speaker 1>Smart like having an assistant clarify your question before searching.

298
00:14:58.960 --> 00:15:00.480
<v Speaker 1>Does it add much delay?

299
00:15:00.879 --> 00:15:02.720
<v Speaker 2>Yeah, it's a little bit of latency. Yeah, because it's

300
00:15:02.720 --> 00:15:06.039
<v Speaker 2>an extra LM call, but often the improvement in retrieval

301
00:15:06.120 --> 00:15:10.440
<v Speaker 2>quality is worth it. Another transformation technique is multi query retrieval.

302
00:15:11.320 --> 00:15:14.120
<v Speaker 2>Instead of just one query, you have the LMM generate

303
00:15:14.559 --> 00:15:18.480
<v Speaker 2>multiple versions of the given user question, maybe from slightly

304
00:15:18.480 --> 00:15:20.240
<v Speaker 2>different angles or using different keywords.

305
00:15:20.320 --> 00:15:21.600
<v Speaker 1>Oh interesting, why do that?

306
00:15:21.879 --> 00:15:24.519
<v Speaker 2>It's great for complex questions that might need information from

307
00:15:24.639 --> 00:15:28.159
<v Speaker 2>multiple perspectives. You run retrievals for all those generated queries

308
00:15:28.200 --> 00:15:31.960
<v Speaker 2>in parallel, then combine the unique documents found. It casts

309
00:15:31.960 --> 00:15:34.720
<v Speaker 2>a wider net, reducing the chance you miss something important.

310
00:15:35.200 --> 00:15:39.120
<v Speaker 2>Building on that is RAG fusion. It starts like multiquery,

311
00:15:39.399 --> 00:15:42.679
<v Speaker 2>generating multiple queries and retrieving results for each, but then

312
00:15:42.679 --> 00:15:45.759
<v Speaker 2>it has a crucial final re ranking step using something

313
00:15:45.759 --> 00:15:47.360
<v Speaker 2>called the reciprocal rank of fusion.

314
00:15:47.519 --> 00:15:48.559
<v Speaker 3>RF algorithm.

315
00:15:49.519 --> 00:15:50.320
<v Speaker 1>Sounds technical.

316
00:15:50.559 --> 00:15:52.919
<v Speaker 2>It's a clever way to combine the rankings from all

317
00:15:52.960 --> 00:15:57.279
<v Speaker 2>the different searches. Documents that consistently rank highly across multiple

318
00:15:57.360 --> 00:16:00.720
<v Speaker 2>queries get boosted to the very top. Really effective at

319
00:16:00.720 --> 00:16:03.559
<v Speaker 2>finding the most relevant stuff while also broadening discovery.

320
00:16:03.639 --> 00:16:07.159
<v Speaker 1>Okay, so RF aggregates the wisdom of multiple searches. Cool

321
00:16:07.480 --> 00:16:09.039
<v Speaker 1>any other transformation tricks?

322
00:16:09.360 --> 00:16:13.519
<v Speaker 2>One more interesting one is hypothetical document embeddings or Heidi

323
00:16:14.200 --> 00:16:17.519
<v Speaker 2>this kind of counterintuitive. Instead of searching with the user's query,

324
00:16:17.679 --> 00:16:21.399
<v Speaker 2>you first have an LM create a hypothetical document that

325
00:16:21.480 --> 00:16:23.039
<v Speaker 2>would be a perfect answer to the query.

326
00:16:23.120 --> 00:16:24.679
<v Speaker 1>Wait, it makes up an answer first.

327
00:16:24.759 --> 00:16:25.159
<v Speaker 3>Yeah.

328
00:16:25.360 --> 00:16:28.519
<v Speaker 2>The intuition is that this generated ideal answer, even though

329
00:16:28.519 --> 00:16:32.480
<v Speaker 2>it's hypothetical, is often semantically more similar to the actual

330
00:16:32.559 --> 00:16:36.919
<v Speaker 2>relevant documents than the original maybe short or ambiguous user query.

331
00:16:37.639 --> 00:16:40.960
<v Speaker 2>So you embed this hypothetical answer and use that embedding

332
00:16:41.320 --> 00:16:42.440
<v Speaker 2>for the similarity search.

333
00:16:42.799 --> 00:16:45.799
<v Speaker 1>Huh, that's sliver. Using an ideal answer is a better

334
00:16:45.799 --> 00:16:49.360
<v Speaker 1>search query? Okay, so that's query transformation. What's the second strategy?

335
00:16:49.600 --> 00:16:53.159
<v Speaker 2>The second strategy is query routing. This tackles the problem

336
00:16:53.200 --> 00:16:56.960
<v Speaker 2>you mentioned earlier. What if your data lives in different places.

337
00:16:57.200 --> 00:17:00.399
<v Speaker 2>Maybe you have Python docks in one vector store and

338
00:17:00.519 --> 00:17:01.919
<v Speaker 2>JavaScript docs in another.

339
00:17:02.240 --> 00:17:04.240
<v Speaker 1>Right, how do you send the query to the right place?

340
00:17:04.319 --> 00:17:07.960
<v Speaker 2>That's exactly what quer routing does, forward a user's query

341
00:17:08.000 --> 00:17:11.000
<v Speaker 2>to the relevant data source. There are a couple of

342
00:17:11.000 --> 00:17:14.880
<v Speaker 2>ways logical routing uses an LLM to make the decision.

343
00:17:15.680 --> 00:17:18.680
<v Speaker 2>You give the LM descriptions of your available data sources,

344
00:17:18.720 --> 00:17:23.000
<v Speaker 2>like this index contains technical documentation for Python and Based

345
00:17:23.000 --> 00:17:25.640
<v Speaker 2>on the user's query, the LM picks which of the

346
00:17:25.680 --> 00:17:29.480
<v Speaker 2>available indexes to use. Lang chain helps ensure the LM

347
00:17:29.519 --> 00:17:32.720
<v Speaker 2>outputs its choice in a structured way your application can understand.

348
00:17:32.920 --> 00:17:35.799
<v Speaker 1>So the LLM acts like a switchboard operator kind of.

349
00:17:35.880 --> 00:17:36.200
<v Speaker 3>Yeah.

350
00:17:36.559 --> 00:17:38.359
<v Speaker 2>Alternatively, there's semantic routing.

351
00:17:38.839 --> 00:17:39.079
<v Speaker 3>Here.

352
00:17:39.240 --> 00:17:42.319
<v Speaker 2>You embed the descriptions of your data sources themselves. Then

353
00:17:42.359 --> 00:17:45.279
<v Speaker 2>you compare the user's query embedding to these description embeddings.

354
00:17:45.599 --> 00:17:48.839
<v Speaker 2>The closest match indicates the most relevant data source. This

355
00:17:48.920 --> 00:17:51.440
<v Speaker 2>is more dynamic, doesn't require an LLLM call for every

356
00:17:51.519 --> 00:17:52.279
<v Speaker 2>routing decision.

357
00:17:52.480 --> 00:17:55.759
<v Speaker 1>Okay, route it logically or semantically makes sense. What's the

358
00:17:55.799 --> 00:17:57.079
<v Speaker 1>third major strategy?

359
00:17:57.319 --> 00:18:00.680
<v Speaker 2>The third is query construction. This is about transforming a

360
00:18:00.759 --> 00:18:04.119
<v Speaker 2>natural language query into the query language of the database

361
00:18:04.200 --> 00:18:07.279
<v Speaker 2>or data source you were interacting with. It goes beyond

362
00:18:07.359 --> 00:18:10.359
<v Speaker 2>just finding similar text chunks. Oh so, well, maybe you

363
00:18:10.400 --> 00:18:14.759
<v Speaker 2>need to combine semantic search with traditional database filters. Text

364
00:18:14.759 --> 00:18:17.599
<v Speaker 2>to metadata filter is a technique where the LLM extracts

365
00:18:17.640 --> 00:18:21.640
<v Speaker 2>structured information like a date, a category, a price range

366
00:18:21.880 --> 00:18:24.079
<v Speaker 2>directly from the user's natural language query.

367
00:18:24.279 --> 00:18:26.480
<v Speaker 1>Ah so, if I ask for sci fi movies from

368
00:18:26.480 --> 00:18:29.680
<v Speaker 1>the eighties, it pulls out sci fi for semantic search

369
00:18:29.759 --> 00:18:32.599
<v Speaker 1>and nineteen eighties as a metadata filter exactly.

370
00:18:32.640 --> 00:18:35.079
<v Speaker 2>It lets you combine the power of semantic understanding with

371
00:18:35.079 --> 00:18:38.279
<v Speaker 2>the precision of structured filters. Another big one here is

372
00:18:38.359 --> 00:18:41.640
<v Speaker 2>text to seql. This involves having the LM translate a

373
00:18:41.720 --> 00:18:45.039
<v Speaker 2>natural language question like what were our total sales in

374
00:18:45.160 --> 00:18:48.640
<v Speaker 2>Q three directly into an executable SQL query to run

375
00:18:48.680 --> 00:18:50.519
<v Speaker 2>against a traditional relational database.

376
00:18:50.680 --> 00:18:53.319
<v Speaker 1>Wow, that's powerful. How do you make that reliable? SQL?

377
00:18:53.319 --> 00:18:53.920
<v Speaker 1>Can be tricky?

378
00:18:54.160 --> 00:18:57.480
<v Speaker 2>It requires careful setup. You usually need to provide the

379
00:18:57.599 --> 00:19:00.559
<v Speaker 2>LLM with a description of the database scheme like the

380
00:19:00.920 --> 00:19:04.640
<v Speaker 2>create table statements, maybe some example rows, and often some

381
00:19:04.799 --> 00:19:08.319
<v Speaker 2>few shot examples of natural language questions paired with their

382
00:19:08.359 --> 00:19:10.240
<v Speaker 2>correct SQL queries.

383
00:19:09.839 --> 00:19:10.279
<v Speaker 3>To guide it.

384
00:19:10.599 --> 00:19:14.359
<v Speaker 1>Got it? So, text to sqel translates language to database code.

385
00:19:14.759 --> 00:19:17.240
<v Speaker 2>If we connect this text to SQL capability to the

386
00:19:17.279 --> 00:19:21.559
<v Speaker 2>bigger picture. Though, while it's incredibly powerful, letting an LEBLEM

387
00:19:21.680 --> 00:19:25.720
<v Speaker 2>generate SQL queries directly from potentially untrusted user input is

388
00:19:25.720 --> 00:19:27.519
<v Speaker 2>one of the riskiest things you can do in a

389
00:19:27.640 --> 00:19:28.960
<v Speaker 2>production application. Ah.

390
00:19:29.000 --> 00:19:31.279
<v Speaker 1>Security implications, Yeah, I can see that.

391
00:19:31.359 --> 00:19:34.960
<v Speaker 2>Absolutely. This raise is a really important question around safety.

392
00:19:35.200 --> 00:19:38.880
<v Speaker 2>You must implement critical security measures, things like ensuring the

393
00:19:38.960 --> 00:19:42.480
<v Speaker 2>database connection has read only permissions, strictly limiting access to

394
00:19:42.519 --> 00:19:45.799
<v Speaker 2>only the necessary tables, maybe even views, and definitely adding

395
00:19:45.839 --> 00:19:49.440
<v Speaker 2>query timeouts to prevent denial of service attacks or runaway queries.

396
00:19:49.880 --> 00:19:53.720
<v Speaker 2>It's a capability that demands extreme caution and robust safeguards.

397
00:19:54.079 --> 00:19:57.440
<v Speaker 1>Absolutely crucial point. Okay, this is getting really interesting, especially

398
00:19:57.440 --> 00:20:00.400
<v Speaker 1>for building interactive apps. How do we tackle the fact

399
00:20:00.440 --> 00:20:03.920
<v Speaker 1>that lllms are inherently forgetful. How do we give them

400
00:20:04.039 --> 00:20:07.160
<v Speaker 1>memory to actually hold a conversation, especially as things get

401
00:20:07.200 --> 00:20:07.880
<v Speaker 1>more complex.

402
00:20:08.319 --> 00:20:11.759
<v Speaker 2>Yeah, you've hit on a fundamental aspect. Llms are stateless.

403
00:20:12.160 --> 00:20:14.240
<v Speaker 2>Every time you interact with them. It's like a fresh start.

404
00:20:14.279 --> 00:20:17.799
<v Speaker 2>There's no memory of the prior prompt or model response

405
00:20:17.799 --> 00:20:21.319
<v Speaker 2>built in. It's like talking to someone who forgets everything

406
00:20:21.359 --> 00:20:22.599
<v Speaker 2>you just said every.

407
00:20:22.400 --> 00:20:25.039
<v Speaker 1>Single turn, which isn't great for a chatbot.

408
00:20:24.920 --> 00:20:25.480
<v Speaker 3>Not at all.

409
00:20:26.240 --> 00:20:29.240
<v Speaker 2>The simplest way to build memory is just to literally

410
00:20:29.279 --> 00:20:32.400
<v Speaker 2>store the history of the chatter, all the user messages

411
00:20:32.400 --> 00:20:35.400
<v Speaker 2>and assistant responses as a list, and then include that

412
00:20:35.559 --> 00:20:37.359
<v Speaker 2>entire list in the prompt for the next turn.

413
00:20:37.400 --> 00:20:40.000
<v Speaker 1>Okay, just stuff the whole conversation history back in.

414
00:20:40.400 --> 00:20:42.920
<v Speaker 2>Basically, Yeah, yeah, but you can imagine at scale. That

415
00:20:43.000 --> 00:20:46.279
<v Speaker 2>gets tricky. How do you update that history reliably? How

416
00:20:46.279 --> 00:20:49.680
<v Speaker 2>do you manage state? When you have multiple things happening,

417
00:20:49.720 --> 00:20:53.119
<v Speaker 2>maybe multiple actors or steps in your application? It gets

418
00:20:53.160 --> 00:20:56.880
<v Speaker 2>complicated fast. And that's precisely where langgraf enters the picture.

419
00:20:57.000 --> 00:21:00.920
<v Speaker 2>Lang graph acts as the coordination layer these more complex,

420
00:21:01.079 --> 00:21:04.880
<v Speaker 2>multi step, potentially multi actor applications. It's what allows them

421
00:21:04.880 --> 00:21:07.559
<v Speaker 2>to remember state and coordinate actions over time.

422
00:21:08.000 --> 00:21:09.640
<v Speaker 1>Lang grap Okay, how does it work? Is it like

423
00:21:09.680 --> 00:21:10.480
<v Speaker 1>a state machine?

424
00:21:10.599 --> 00:21:12.599
<v Speaker 2>You think of it kind of like designing a flow

425
00:21:12.680 --> 00:21:16.839
<v Speaker 2>chart for your AI app. It is three core components. First,

426
00:21:16.920 --> 00:21:20.240
<v Speaker 2>there's the state. This is the shared data that evolves

427
00:21:20.240 --> 00:21:22.839
<v Speaker 2>over the course of the application run. It can include

428
00:21:22.880 --> 00:21:27.480
<v Speaker 2>the chat history, intermediate results, anything the application needs to remember. Second,

429
00:21:27.519 --> 00:21:30.559
<v Speaker 2>you have nodes. These are the individual steps or functions

430
00:21:30.559 --> 00:21:32.880
<v Speaker 2>in your flow chart. A node might be a call

431
00:21:32.960 --> 00:21:36.160
<v Speaker 2>to an LM, a call to a tool like our

432
00:21:36.240 --> 00:21:38.759
<v Speaker 2>calculator or search engine, or just some regular Python code

433
00:21:38.799 --> 00:21:41.759
<v Speaker 2>that processes the state. And Third, you have edges. These

434
00:21:41.799 --> 00:21:44.880
<v Speaker 2>are the connections between the nodes, determining the flow of execution.

435
00:21:45.359 --> 00:21:47.559
<v Speaker 2>Edges can be fixed like always go from no day

436
00:21:47.599 --> 00:21:48.920
<v Speaker 2>to node B, or they.

437
00:21:48.799 --> 00:21:51.880
<v Speaker 1>Can be conditional conditional edges, meaning.

438
00:21:51.920 --> 00:21:54.960
<v Speaker 2>Meaning the next step depends on the current state. Often,

439
00:21:55.119 --> 00:21:58.000
<v Speaker 2>an LM in one node might decide which node to

440
00:21:58.039 --> 00:22:01.119
<v Speaker 2>go to next, making the flow dynamic. A huge benefit

441
00:22:01.160 --> 00:22:04.319
<v Speaker 2>of lang graph is its built in persistence. It uses

442
00:22:04.359 --> 00:22:06.319
<v Speaker 2>something called a checkpointer. You can think of it like

443
00:22:06.359 --> 00:22:07.160
<v Speaker 2>an auto save.

444
00:22:07.000 --> 00:22:10.039
<v Speaker 1>Function AH, so it saves the state automatically exactly.

445
00:22:10.480 --> 00:22:13.759
<v Speaker 2>It saves the current state after each step. This means

446
00:22:13.759 --> 00:22:17.359
<v Speaker 2>that every invocation after the first doesn't start from blank slate.

447
00:22:18.160 --> 00:22:20.680
<v Speaker 2>If the app crashes or the user comes back later,

448
00:22:21.119 --> 00:22:22.759
<v Speaker 2>it can pick up right where it left off from

449
00:22:22.799 --> 00:22:25.799
<v Speaker 2>the last save state. Really important for long running or

450
00:22:25.799 --> 00:22:26.799
<v Speaker 2>stateful interactions.

451
00:22:27.079 --> 00:22:31.119
<v Speaker 1>That's huge for usability. What about that growing chat history though,

452
00:22:31.240 --> 00:22:33.799
<v Speaker 1>Does lang graph help manage that so you don't overload

453
00:22:33.799 --> 00:22:34.400
<v Speaker 1>the LLM?

454
00:22:34.599 --> 00:22:35.599
<v Speaker 3>Yes? Absolutely.

455
00:22:36.200 --> 00:22:39.480
<v Speaker 2>While lang graph manages the overall state persistence, you still

456
00:22:39.559 --> 00:22:42.599
<v Speaker 2>use lane chain's utilities within your nodes to manage the

457
00:22:42.640 --> 00:22:45.319
<v Speaker 2>specific chat history part of the state before passing.

458
00:22:45.079 --> 00:22:46.160
<v Speaker 3>It to an LLM.

459
00:22:46.480 --> 00:22:49.640
<v Speaker 2>You can indeligently filter messages, maybe keeping only summaries or

460
00:22:49.720 --> 00:22:52.640
<v Speaker 2>key turns, trim the history based on the number of

461
00:22:52.680 --> 00:22:55.519
<v Speaker 2>messages or total tokens, or even merge older parts of

462
00:22:55.519 --> 00:22:58.519
<v Speaker 2>the conversation into a concise summary, all designed to keep

463
00:22:58.519 --> 00:23:00.480
<v Speaker 2>the context relevant without ex eating.

464
00:23:00.240 --> 00:23:01.319
<v Speaker 3>Those M limits.

465
00:23:01.480 --> 00:23:04.319
<v Speaker 1>Okay, so lang graph orchestrates the flow in memory, and

466
00:23:04.440 --> 00:23:07.680
<v Speaker 1>lang chain helps manage the conversation content. That makes sense.

467
00:23:08.160 --> 00:23:10.279
<v Speaker 1>So we've gone from just calling an LLLM once to

468
00:23:10.759 --> 00:23:14.240
<v Speaker 1>chains and now to the state ful lang graph applications.

469
00:23:14.960 --> 00:23:17.359
<v Speaker 1>How should we think about the different levels of complexity

470
00:23:17.519 --> 00:23:20.160
<v Speaker 1>and I guess intelligence in these systems.

471
00:23:20.359 --> 00:23:21.880
<v Speaker 2>That's a great way to frame it. We can think

472
00:23:21.880 --> 00:23:26.640
<v Speaker 2>about a progression of cognitive architectures moving towards more sophisticated applications,

473
00:23:27.079 --> 00:23:30.039
<v Speaker 2>and as we move up this ladder, we constantly grapple

474
00:23:30.079 --> 00:23:33.839
<v Speaker 2>with that fundamental trade off we mentioned agency, the lmm's

475
00:23:33.880 --> 00:23:37.680
<v Speaker 2>capacity to act autonomously, and reliability the degree to which

476
00:23:37.720 --> 00:23:42.240
<v Speaker 2>we can trust its outputs. More autonomy often means less predictability.

477
00:23:41.680 --> 00:23:43.880
<v Speaker 1>Right the agency versus reliability balance.

478
00:23:43.920 --> 00:23:45.160
<v Speaker 3>So progression looks something like this.

479
00:23:45.240 --> 00:23:47.680
<v Speaker 2>At the base, you have a simple LM call, one input,

480
00:23:47.720 --> 00:23:51.519
<v Speaker 2>one output, like asking get to summarize text. Simple, usually

481
00:23:51.599 --> 00:23:54.680
<v Speaker 2>reliable for that specific task. Next level up is a chain.

482
00:23:54.920 --> 00:23:58.440
<v Speaker 2>This involves multiple LELLM calls or calls to tools executed

483
00:23:58.480 --> 00:24:02.359
<v Speaker 2>in a pre defined fixed sequence. Example, step one LLLLM

484
00:24:02.400 --> 00:24:05.519
<v Speaker 2>generates a SEQL query. Step two different M explains that

485
00:24:05.599 --> 00:24:09.079
<v Speaker 2>query in plain English. The sequence never changes.

486
00:24:09.000 --> 00:24:11.640
<v Speaker 1>Okay, fixed steps like an assembly.

487
00:24:11.240 --> 00:24:14.160
<v Speaker 2>Line pretty much. Then it gets more dynamic with the router. Here,

488
00:24:14.319 --> 00:24:17.039
<v Speaker 2>an LOM decides the sequence of steps. At runtime, it

489
00:24:17.160 --> 00:24:20.279
<v Speaker 2>chooses which pre defined path to take based on the input.

490
00:24:20.440 --> 00:24:23.119
<v Speaker 2>Like an earlier example, if it's a medical question, rode

491
00:24:23.119 --> 00:24:26.799
<v Speaker 2>to the medical index, if insurance route to the FAQ index.

492
00:24:27.240 --> 00:24:29.880
<v Speaker 2>The LOLLM makes a choice, but the possible paths are

493
00:24:29.880 --> 00:24:31.319
<v Speaker 2>still pre defined.

494
00:24:30.880 --> 00:24:33.759
<v Speaker 1>So it adds a decision point. And beyond writers, we

495
00:24:33.839 --> 00:24:37.119
<v Speaker 1>finally get to agents. That seems to be the buzzword

496
00:24:37.160 --> 00:24:38.279
<v Speaker 1>everyone's excited about.

497
00:24:38.440 --> 00:24:42.799
<v Speaker 2>Exactly an agent is quite simply something that acts. What

498
00:24:42.920 --> 00:24:46.079
<v Speaker 2>makes agent architectures unique and powerful is that they use

499
00:24:46.119 --> 00:24:49.799
<v Speaker 2>an LLLM driven loop for control. The LM isn't just

500
00:24:49.880 --> 00:24:52.920
<v Speaker 2>executing pre defined steps. It's deciding what to do next

501
00:24:53.039 --> 00:24:55.559
<v Speaker 2>based on the result of its previous actions, and critically,

502
00:24:55.640 --> 00:24:58.440
<v Speaker 2>it decides when to stop. The most common pattern here

503
00:24:58.480 --> 00:25:00.920
<v Speaker 2>is the plan do loop, often called the react to

504
00:25:01.000 --> 00:25:04.119
<v Speaker 2>architecture reasoning plus acting plan do loop.

505
00:25:04.200 --> 00:25:05.599
<v Speaker 1>How does that work in practice?

506
00:25:05.799 --> 00:25:09.400
<v Speaker 2>Imagine an agent needs to answer a complex question that requires, say,

507
00:25:09.640 --> 00:25:14.039
<v Speaker 2>searching the web and then doing a calculation. Step one,

508
00:25:14.160 --> 00:25:19.039
<v Speaker 2>Them gets the question, observes. Step two, it thinks, for reasons, okay,

509
00:25:19.119 --> 00:25:21.839
<v Speaker 2>answer this, I first need to search for X. Step three,

510
00:25:22.000 --> 00:25:24.799
<v Speaker 2>it plans an action, call the search tool with query X.

511
00:25:25.079 --> 00:25:28.799
<v Speaker 2>Step four, the system executes, the search tool does step five.

512
00:25:28.880 --> 00:25:32.440
<v Speaker 2>Them gets the search results back, observes again. Step six,

513
00:25:32.599 --> 00:25:35.279
<v Speaker 2>It thinks again, okay, based on these results, now we.

514
00:25:35.200 --> 00:25:37.319
<v Speaker 3>Need to calculate why using the calculator tool.

515
00:25:37.559 --> 00:25:40.680
<v Speaker 2>Step seven, it plans the next action, call the calculator

516
00:25:40.720 --> 00:25:43.599
<v Speaker 2>with inputs A and B. Step eight, the system executes

517
00:25:43.599 --> 00:25:46.559
<v Speaker 2>the calculator does. Step nine. The LOM gets the calculation

518
00:25:46.599 --> 00:25:47.519
<v Speaker 2>result observes.

519
00:25:47.759 --> 00:25:48.240
<v Speaker 3>Step ten.

520
00:25:48.359 --> 00:25:50.440
<v Speaker 2>It thinks, one last time, okay, now I have all

521
00:25:50.480 --> 00:25:52.920
<v Speaker 2>the pieces I can formulate the final answer. It decides

522
00:25:52.960 --> 00:25:55.640
<v Speaker 2>the loop is finished. Step eleven, it outputs the final

523
00:25:55.680 --> 00:25:57.920
<v Speaker 2>answer to the user. See how the LOM is driving

524
00:25:57.960 --> 00:26:00.240
<v Speaker 2>the whole process, deciding which tool to use one and

525
00:26:00.319 --> 00:26:03.519
<v Speaker 2>ultimately deciding when it's done. That iterative self directed loop

526
00:26:03.559 --> 00:26:05.680
<v Speaker 2>is the core of an agent. Plank graph is perfect

527
00:26:05.720 --> 00:26:08.759
<v Speaker 2>for implementing these loops using its nodes for the LLM

528
00:26:08.880 --> 00:26:12.000
<v Speaker 2>calls and tool executions and conditional edges to route the

529
00:26:12.000 --> 00:26:14.160
<v Speaker 2>flow based on the LM's decisions.

530
00:26:14.359 --> 00:26:18.119
<v Speaker 1>Wow, okay, that really clarifies it. The LLM is in

531
00:26:18.160 --> 00:26:21.319
<v Speaker 1>the driver's seat, choosing actions and deciding when the job

532
00:26:21.400 --> 00:26:24.480
<v Speaker 1>is done. That's a big step up in autonomy. Can

533
00:26:24.519 --> 00:26:26.200
<v Speaker 1>we make these agents even smarter?

534
00:26:26.519 --> 00:26:29.839
<v Speaker 2>Definitely? There are enhancements. For instance, you might design the

535
00:26:29.880 --> 00:26:33.240
<v Speaker 2>agent to always call a tool first, maybe always starting

536
00:26:33.240 --> 00:26:35.640
<v Speaker 2>with a search to ensure its reasoning is grounded in

537
00:26:35.680 --> 00:26:39.400
<v Speaker 2>current information before it even tries to answer. Another challenge

538
00:26:39.400 --> 00:26:42.480
<v Speaker 2>arise is when you have many possible tools, how does

539
00:26:42.519 --> 00:26:45.160
<v Speaker 2>the agent pick the right one? You can actually use

540
00:26:45.359 --> 00:26:48.839
<v Speaker 2>our rag on the tool descriptions. Store descriptions of all

541
00:26:48.880 --> 00:26:51.000
<v Speaker 2>your tools in a vector store, and when the agent

542
00:26:51.039 --> 00:26:53.240
<v Speaker 2>decides it needs a tool, it first does a semantic

543
00:26:53.279 --> 00:26:55.960
<v Speaker 2>search over the tool descriptions to find the most relevant

544
00:26:56.000 --> 00:26:56.480
<v Speaker 2>one for the.

545
00:26:56.400 --> 00:26:58.920
<v Speaker 1>Task at hand, using our rig to help the agent

546
00:26:59.000 --> 00:27:02.599
<v Speaker 1>choose its own tools. That's pretty meta. This sounds incredibly powerful,

547
00:27:02.640 --> 00:27:06.079
<v Speaker 1>giving models the ability to genuinely tackle multi step problems.

548
00:27:06.400 --> 00:27:07.480
<v Speaker 1>Can they go even further?

549
00:27:07.640 --> 00:27:07.759
<v Speaker 3>Like?

550
00:27:08.039 --> 00:27:11.640
<v Speaker 1>Can agents learn or work together or maybe even critique

551
00:27:11.680 --> 00:27:12.319
<v Speaker 1>their own work?

552
00:27:12.559 --> 00:27:14.359
<v Speaker 3>Yes? Absolutely, to all of those.

553
00:27:14.839 --> 00:27:19.039
<v Speaker 2>One really powerful extension is reflection or self critique. This

554
00:27:19.119 --> 00:27:21.799
<v Speaker 2>involves setting up a kind of loop, often using multiple

555
00:27:21.960 --> 00:27:25.319
<v Speaker 2>LLM calls that mimics how humans create and refine things.

556
00:27:25.640 --> 00:27:27.720
<v Speaker 2>You might have a creator prompt that generates a first

557
00:27:27.759 --> 00:27:30.400
<v Speaker 2>draft of something, say in an essay. Then you have a

558
00:27:30.440 --> 00:27:33.599
<v Speaker 2>separate revisor prompt with the LLLM critiques that draft based

559
00:27:33.599 --> 00:27:38.160
<v Speaker 2>on specific criteria like clarity, tone, fascial accuracy. Finally, the

560
00:27:38.200 --> 00:27:42.240
<v Speaker 2>original lam or another one revises the draft based on that.

561
00:27:42.200 --> 00:27:44.759
<v Speaker 1>Critique, so it acts as its own editor exactly.

562
00:27:44.960 --> 00:27:47.599
<v Speaker 2>It allows the LM to refine its output, maybe catch

563
00:27:47.680 --> 00:27:50.039
<v Speaker 2>errors or improved style. You can eat something of the

564
00:27:50.079 --> 00:27:53.599
<v Speaker 2>revisor LLM to adopt a different persona while critiquing, like

565
00:27:53.839 --> 00:27:56.079
<v Speaker 2>asking it to review the essay from the perspective of

566
00:27:56.119 --> 00:28:01.000
<v Speaker 2>a skeptical historian. This iterative refinement can significately boost quality.

567
00:28:01.160 --> 00:28:05.200
<v Speaker 1>That's amazing. What about teamwork? Can you have multiple agents collaborating?

568
00:28:05.519 --> 00:28:08.279
<v Speaker 2>Yes, for really complex problems that might be too much

569
00:28:08.319 --> 00:28:11.519
<v Speaker 2>for a single agent, maybe requiring too many different tools

570
00:28:11.599 --> 00:28:14.960
<v Speaker 2>or too much context, you can use multi agent architectures.

571
00:28:15.400 --> 00:28:18.559
<v Speaker 2>You literally build teams of LVM agents that work together.

572
00:28:19.079 --> 00:28:21.039
<v Speaker 2>There are different ways to coordinate these teams, but a

573
00:28:21.079 --> 00:28:24.559
<v Speaker 2>practical approach highlighted in the book is the supervisor architecture.

574
00:28:25.160 --> 00:28:29.240
<v Speaker 2>Here you have a central supervisor agent often an element self,

575
00:28:29.519 --> 00:28:32.559
<v Speaker 2>whose job is to manage the workflow based on the

576
00:28:32.599 --> 00:28:35.960
<v Speaker 2>overall goal and the current state. The supervisor decides which

577
00:28:35.960 --> 00:28:39.839
<v Speaker 2>agent or agents should be called next. It wrote tasks

578
00:28:39.880 --> 00:28:43.160
<v Speaker 2>to specialize subagents. Maybe one agent is good at research,

579
00:28:43.200 --> 00:28:46.920
<v Speaker 2>another at writing code, another at summarizing. Their progress and

580
00:28:47.000 --> 00:28:49.559
<v Speaker 2>results are often shared in a central place, like a

581
00:28:49.559 --> 00:28:52.079
<v Speaker 2>list of messages in the lang graft state, allowing them

582
00:28:52.079 --> 00:28:55.240
<v Speaker 2>to build on each other's work. It enables true collaborative

583
00:28:55.240 --> 00:28:57.119
<v Speaker 2>problem solving among AI agents.

584
00:28:57.480 --> 00:28:59.799
<v Speaker 1>So what does this all mean? We're talking about building

585
00:28:59.799 --> 00:29:03.680
<v Speaker 1>digital teams that can think, plan, act, reflect on their actions,

586
00:29:03.720 --> 00:29:06.279
<v Speaker 1>and even critique their own work. It feels like this

587
00:29:06.440 --> 00:29:09.000
<v Speaker 1>genuinely expands the kinds of problems we can even attempt

588
00:29:09.039 --> 00:29:09.880
<v Speaker 1>to solve with AI.

589
00:29:10.240 --> 00:29:12.440
<v Speaker 2>It really does, and it brings us right back to

590
00:29:12.440 --> 00:29:15.359
<v Speaker 2>the fundamental tension we keep mentioning the trade off between

591
00:29:15.599 --> 00:29:20.920
<v Speaker 2>agency and reliability. As these agents become more autonomous, controlling

592
00:29:20.920 --> 00:29:24.119
<v Speaker 2>them and trusting their output becomes even more critical. If

593
00:29:24.119 --> 00:29:26.440
<v Speaker 2>we zoom out again, and connect this to the bigger picture.

594
00:29:27.000 --> 00:29:29.519
<v Speaker 2>You can visualize this trade off as a kind of

595
00:29:29.599 --> 00:29:33.519
<v Speaker 2>frontier on a graph. You want to push that frontier outwards,

596
00:29:33.799 --> 00:29:36.480
<v Speaker 2>achieve more agency for the same level of reliability, or

597
00:29:36.480 --> 00:29:39.480
<v Speaker 2>achieve higher reliability for the same level of agency. Pushing

598
00:29:39.480 --> 00:29:43.000
<v Speaker 2>this frontier is key to building production ready applications people

599
00:29:43.000 --> 00:29:43.880
<v Speaker 2>can actually depend on.

600
00:29:44.119 --> 00:29:47.599
<v Speaker 1>Right, trust is paramount, So specifically, how do we improve

601
00:29:47.680 --> 00:29:51.359
<v Speaker 1>the actual user experience and maybe more importantly, the reliability

602
00:29:51.400 --> 00:29:55.039
<v Speaker 1>and control in these powerful, sometimes complex, agentic systems.

603
00:29:55.160 --> 00:29:58.240
<v Speaker 2>Great question. There are several vital techniques discussed in the book.

604
00:29:58.279 --> 00:30:02.039
<v Speaker 2>First off, managing latency p reception with streaming an intermediate output.

605
00:30:02.519 --> 00:30:05.559
<v Speaker 2>LLM calls, especially in complex agent loops, can take time.

606
00:30:05.960 --> 00:30:08.400
<v Speaker 2>A few seconds of waiting can feel long to a user,

607
00:30:08.920 --> 00:30:13.440
<v Speaker 2>so communicating progress makes that higher latency more palatable. This

608
00:30:13.480 --> 00:30:17.400
<v Speaker 2>includes dreaming the llm's final output token by token so

609
00:30:17.480 --> 00:30:19.880
<v Speaker 2>the text appears gradually like someone typing.

610
00:30:19.720 --> 00:30:21.480
<v Speaker 1>Makes it feel more responsive exactly.

611
00:30:21.720 --> 00:30:24.960
<v Speaker 2>It also includes showing intermediate steps maybe messages like okay,

612
00:30:25.000 --> 00:30:28.440
<v Speaker 2>searching for X or now calculating Y, so the user

613
00:30:28.480 --> 00:30:32.039
<v Speaker 2>sees the agent as working. Second, and absolutely critical for

614
00:30:32.119 --> 00:30:36.119
<v Speaker 2>reliability is ensuring structured output. You often need the LLM

615
00:30:36.200 --> 00:30:39.880
<v Speaker 2>to return information in a specific predictable format like Jason,

616
00:30:40.319 --> 00:30:43.480
<v Speaker 2>not just freeform text. Linke dams with structure output method

617
00:30:43.519 --> 00:30:46.359
<v Speaker 2>is designed for this. It helps reduce variants and ensures

618
00:30:46.400 --> 00:30:49.680
<v Speaker 2>downstream systems can reliably parse and use the llm's output.

619
00:30:50.200 --> 00:30:52.680
<v Speaker 2>Using a low temperature setting often helps here too.

620
00:30:52.759 --> 00:30:55.119
<v Speaker 1>So you get predictable data structures back, not just a

621
00:30:55.240 --> 00:30:56.640
<v Speaker 1>chatty response precisely.

622
00:30:57.119 --> 00:31:00.960
<v Speaker 2>Third, especially for high agency applications, you need human in

623
00:31:01.000 --> 00:31:05.359
<v Speaker 2>the loop controls. These give essential oversight to the end user,

624
00:31:05.839 --> 00:31:12.519
<v Speaker 2>allowing intervention and correction. One control is interrupt The user

625
00:31:12.559 --> 00:31:15.519
<v Speaker 2>should be able to manually stop an ongoing agent process

626
00:31:15.559 --> 00:31:18.599
<v Speaker 2>at any time. Ideally the state is saved, so they

627
00:31:18.599 --> 00:31:21.519
<v Speaker 2>can choose to resume later, restart, or just abandon it.

628
00:31:21.519 --> 00:31:24.079
<v Speaker 2>If the agent's going off track. A panic button basically

629
00:31:24.319 --> 00:31:28.680
<v Speaker 2>kind of another is authorized. The application pauses before performing

630
00:31:28.680 --> 00:31:32.759
<v Speaker 2>a potentially critical or irreversible action, maybe sending an email,

631
00:31:32.960 --> 00:31:35.759
<v Speaker 2>making a purchase, modifying a file, and ask the user

632
00:31:35.799 --> 00:31:38.319
<v Speaker 2>for explicit confirmation is it okay to do this?

633
00:31:38.559 --> 00:31:40.240
<v Speaker 1>Essential for safety absolutely.

634
00:31:40.519 --> 00:31:42.640
<v Speaker 2>And then there's the ability to fork and replay history.

635
00:31:43.039 --> 00:31:45.400
<v Speaker 2>This allows the user to effectively go back in time

636
00:31:45.440 --> 00:31:47.799
<v Speaker 2>to an earlier point in the conversation or workflow state,

637
00:31:48.000 --> 00:31:49.920
<v Speaker 2>and then start a new branch from there, trying a

638
00:31:49.920 --> 00:31:53.880
<v Speaker 2>different approach. It's fantastic for experimentation, debugging, and recovering from

639
00:31:54.000 --> 00:31:55.799
<v Speaker 2>errors without starting over completely.

640
00:31:56.039 --> 00:31:59.880
<v Speaker 1>Those controls sound invaluable for making these powerful systems usable

641
00:32:00.119 --> 00:32:03.000
<v Speaker 1>end safe in the real world. Okay, what about a

642
00:32:03.039 --> 00:32:08.000
<v Speaker 1>common scenario. Llms can be a bit slow. What happens

643
00:32:08.039 --> 00:32:10.240
<v Speaker 1>if a user sends a new message while the agent

644
00:32:10.319 --> 00:32:13.079
<v Speaker 1>is still thinking about the previous one. How does systems

645
00:32:13.119 --> 00:32:14.359
<v Speaker 1>handle that concurrency?

646
00:32:14.839 --> 00:32:19.960
<v Speaker 2>That's the challenge of multitasking lllms. Handling concurrent inputs lms

647
00:32:19.960 --> 00:32:24.119
<v Speaker 2>are often quite slow compared to traditional software responses. Users

648
00:32:24.160 --> 00:32:26.559
<v Speaker 2>will send follow up messages or new requests before the

649
00:32:26.599 --> 00:32:29.519
<v Speaker 2>first one is finished. There are different strategies. The simplest

650
00:32:29.559 --> 00:32:32.400
<v Speaker 2>is just to refuse concurrent input maybe disable the input

651
00:32:32.400 --> 00:32:33.480
<v Speaker 2>box while processing.

652
00:32:33.680 --> 00:32:35.960
<v Speaker 1>Not very user friendly, Yeah, frustrating.

653
00:32:36.119 --> 00:32:36.279
<v Speaker 3>Right.

654
00:32:36.400 --> 00:32:38.920
<v Speaker 2>You can handle each input independently in a new thread,

655
00:32:39.039 --> 00:32:42.319
<v Speaker 2>but that might lose conversational context. You could queue inputs,

656
00:32:42.440 --> 00:32:45.799
<v Speaker 2>processing them one after another, or you can interrupt the

657
00:32:45.839 --> 00:32:48.880
<v Speaker 2>current run to prioritize the new input, either abandoning the

658
00:32:48.880 --> 00:32:51.480
<v Speaker 2>old one or trying to save its partial state. This

659
00:32:51.559 --> 00:32:54.680
<v Speaker 2>raises an interesting question. If an LLM is mid computation

660
00:32:54.920 --> 00:32:58.039
<v Speaker 2>and you send another message, how should it respond intelligently?

661
00:32:58.640 --> 00:33:00.880
<v Speaker 2>The book mentions in advanced sategy called.

662
00:33:00.680 --> 00:33:01.559
<v Speaker 3>Fork and merge.

663
00:33:01.720 --> 00:33:02.759
<v Speaker 1>Fork and merge.

664
00:33:02.920 --> 00:33:06.079
<v Speaker 2>Yeah, The idea is the system temporarily forks the agent's

665
00:33:06.119 --> 00:33:09.279
<v Speaker 2>current state. When new input arrives, it processes the new

666
00:33:09.279 --> 00:33:13.319
<v Speaker 2>input in parallel, perhaps starting from that fork state. Then

667
00:33:13.599 --> 00:33:17.279
<v Speaker 2>somehow it intelligently merges the results or final states from

668
00:33:17.279 --> 00:33:20.640
<v Speaker 2>both the original computation and the new parallel one. It's

669
00:33:20.640 --> 00:33:24.599
<v Speaker 2>complex to implement correctly, requiring careful steat management, but it

670
00:33:24.599 --> 00:33:28.240
<v Speaker 2>could allow for very fluid interruption tolerant interactions.

671
00:33:28.400 --> 00:33:32.599
<v Speaker 1>Wow. Okay, that sounds complex but powerful for smooth interaction.

672
00:33:32.880 --> 00:33:33.359
<v Speaker 3>Yeah.

673
00:33:33.440 --> 00:33:36.200
<v Speaker 1>So we've designed the core logic, we've added memory with

674
00:33:36.319 --> 00:33:39.920
<v Speaker 1>lang graph, We've considered reliability and UX controls. Now, how

675
00:33:39.920 --> 00:33:41.839
<v Speaker 1>do we actually get this thing out of our development

676
00:33:41.920 --> 00:33:45.599
<v Speaker 1>environment and deployed for external users, making sure it's stable

677
00:33:45.640 --> 00:33:46.519
<v Speaker 1>and can handle real.

678
00:33:46.400 --> 00:33:49.319
<v Speaker 2>Traffic Right the production leak? This is where things get serious.

679
00:33:49.839 --> 00:33:52.920
<v Speaker 2>The book highlights lang graph platform as a solution here.

680
00:33:53.160 --> 00:33:56.039
<v Speaker 2>It's essentially a managed service for deploying and hosting lang

681
00:33:56.079 --> 00:33:58.759
<v Speaker 2>graph agents at scale. Its goal is to handle the

682
00:33:58.799 --> 00:34:04.119
<v Speaker 2>operational headachesction. It provides things like horizontally scaling task queues

683
00:34:04.559 --> 00:34:07.920
<v Speaker 2>and servers to handle many concurrent users, plus a robust

684
00:34:08.000 --> 00:34:12.880
<v Speaker 2>postgress checkpointer for efficiently storing potentially large states and conversation threads.

685
00:34:13.480 --> 00:34:16.320
<v Speaker 2>The aim is fault tolerant scalability, so.

686
00:34:16.320 --> 00:34:19.119
<v Speaker 1>It takes care of the scaling and reliability infrastructure.

687
00:34:19.280 --> 00:34:20.079
<v Speaker 3>That's the idea.

688
00:34:20.480 --> 00:34:23.159
<v Speaker 2>Now, before you deploy there, you'll need a few prerequisite

689
00:34:23.199 --> 00:34:27.920
<v Speaker 2>setup your apikeys from your LLM provider like OpenAI, a

690
00:34:27.960 --> 00:34:31.440
<v Speaker 2>configured vector store if you're using RWI. The book mentioned

691
00:34:31.519 --> 00:34:34.599
<v Speaker 2>Superbase with its pg vector extension is a good option,

692
00:34:35.159 --> 00:34:38.000
<v Speaker 2>and you'll need a langsmith account because lang graft platform

693
00:34:38.039 --> 00:34:40.920
<v Speaker 2>is tightly integrated with langsmith for monitoring and debugging.

694
00:34:41.159 --> 00:34:42.239
<v Speaker 1>Langsmith. What's that?

695
00:34:42.480 --> 00:34:45.400
<v Speaker 2>Langsmith is another part of the lang sching ecosystem focus

696
00:34:45.440 --> 00:34:50.719
<v Speaker 2>specifically on observability, debugging, testing and monitoring for LLLM applications.

697
00:34:50.840 --> 00:34:52.519
<v Speaker 2>It's pretty crucial for the whole life cycle.

698
00:34:52.719 --> 00:34:55.119
<v Speaker 1>Okay, got it, So you set up prerequisites, then how

699
00:34:55.119 --> 00:34:57.079
<v Speaker 1>do you deploy to lang graph platform.

700
00:34:57.440 --> 00:35:00.559
<v Speaker 2>The process is designed to be fairly straightforward. You typically

701
00:35:00.599 --> 00:35:03.639
<v Speaker 2>define your lang graph application structure in a configuration file,

702
00:35:04.039 --> 00:35:07.039
<v Speaker 2>often the langgraph dot json. You can test it locally

703
00:35:07.119 --> 00:35:09.960
<v Speaker 2>using the lang graph command line interface the CLI with

704
00:35:10.079 --> 00:35:13.519
<v Speaker 2>a command like lang grafh dev, and then deployment to

705
00:35:13.519 --> 00:35:16.599
<v Speaker 2>the managed platform is often done via one click submissions

706
00:35:16.880 --> 00:35:18.679
<v Speaker 2>directly from the langsmith user interface.

707
00:35:18.800 --> 00:35:22.960
<v Speaker 1>Okay, seems streamlined, but deployment isn't the end, right. We

708
00:35:23.079 --> 00:35:25.440
<v Speaker 1>keep coming back to the fact that llms are non

709
00:35:25.480 --> 00:35:29.159
<v Speaker 1>deterministic and prone to hallucination. How do we build and

710
00:35:29.199 --> 00:35:33.199
<v Speaker 1>maintain trust after launch? How does continuous improvement work in

711
00:35:33.239 --> 00:35:34.119
<v Speaker 1>this AI world?

712
00:35:34.639 --> 00:35:37.320
<v Speaker 2>This is absolutely critical. Deployment is just the beginning of

713
00:35:37.320 --> 00:35:40.840
<v Speaker 2>the journey. You need that continuous improvement cycle. Design, test

714
00:35:40.920 --> 00:35:43.840
<v Speaker 2>data is deployed on monitor and fixed JAD redesign. Even

715
00:35:43.880 --> 00:35:46.360
<v Speaker 2>in the design stage, you can build in defensiveness like

716
00:35:46.400 --> 00:35:48.679
<v Speaker 2>that self corrective our gag idea we touched on earlier.

717
00:35:49.039 --> 00:35:51.320
<v Speaker 2>You can have an LLLM within your agent's flow whose

718
00:35:51.400 --> 00:35:54.719
<v Speaker 2>job is to grade retrieval relevance. Did we find good

719
00:35:54.760 --> 00:35:58.159
<v Speaker 2>documents and check the answer for hallucinations before showing it

720
00:35:58.360 --> 00:35:58.920
<v Speaker 2>to the user.

721
00:35:59.159 --> 00:36:01.639
<v Speaker 1>The LM double checks itself yeah.

722
00:36:01.280 --> 00:36:03.760
<v Speaker 2>And if it decides the retrieval was poor or the

723
00:36:03.800 --> 00:36:07.000
<v Speaker 2>answer looks suspicious, it could trigger a fallback. Maybe try

724
00:36:07.039 --> 00:36:10.119
<v Speaker 2>a web search instead, or ask the user for clarification.

725
00:36:10.639 --> 00:36:14.239
<v Speaker 2>Building self correction right into the design. Then comes pre

726
00:36:14.280 --> 00:36:18.280
<v Speaker 2>production testing. This is all about measuring accuracy, latency, cost,

727
00:36:18.719 --> 00:36:21.559
<v Speaker 2>whatever metrics matter to you before you expose the app

728
00:36:21.599 --> 00:36:24.159
<v Speaker 2>to real users. For this, you need good data sets.

729
00:36:24.280 --> 00:36:25.840
<v Speaker 1>Where do those data sets come from?

730
00:36:26.000 --> 00:36:30.079
<v Speaker 2>Several sources. You can have manually curated examples humans carefully

731
00:36:30.079 --> 00:36:33.559
<v Speaker 2>writing good questions and ideal answers. You can use application

732
00:36:33.639 --> 00:36:36.599
<v Speaker 2>logs from early internal testing or beta users, or you

733
00:36:36.599 --> 00:36:40.000
<v Speaker 2>can even generate synthetic data using other lolms to create

734
00:36:40.000 --> 00:36:43.800
<v Speaker 2>diverse examples of inputs and outputs. Langsmith actually has tools

735
00:36:43.800 --> 00:36:47.119
<v Speaker 2>specifically for creating and managing these test data sets. Once

736
00:36:47.119 --> 00:36:50.320
<v Speaker 2>you have data, you need evaluation criteria. You compare your

737
00:36:50.360 --> 00:36:52.599
<v Speaker 2>app's output against some ground truth references.

738
00:36:52.760 --> 00:36:55.400
<v Speaker 3>How do you evaluate? You can use human evaluators people.

739
00:36:55.280 --> 00:37:00.320
<v Speaker 2>Giving quolitative feedback on nuance, tone correctness, very valuable, but

740
00:37:00.440 --> 00:37:04.320
<v Speaker 2>slow and expensive. You can use heuristic evaluators basically simple

741
00:37:04.320 --> 00:37:06.440
<v Speaker 2>hard code of checks like does the output contain the

742
00:37:06.480 --> 00:37:09.320
<v Speaker 2>specific keyword? Is a below a certain length? Quick, but

743
00:37:09.440 --> 00:37:13.840
<v Speaker 2>limited increasingly popular lmms is a judge evaluators. Here, you

744
00:37:13.960 --> 00:37:16.519
<v Speaker 2>use another LM, give it a real quick and ask

745
00:37:16.559 --> 00:37:19.400
<v Speaker 2>it to score or critique your application's output based.

746
00:37:19.119 --> 00:37:19.800
<v Speaker 3>On that rubric.

747
00:37:19.960 --> 00:37:23.400
<v Speaker 1>Using an LM to judge another LLLM exactly.

748
00:37:23.239 --> 00:37:26.960
<v Speaker 2>And what's really fascinating here, Lanksmith has this clever feedback loop.

749
00:37:27.440 --> 00:37:30.079
<v Speaker 2>If a human corrects or disagrees with the LM as

750
00:37:30.119 --> 00:37:34.679
<v Speaker 2>a judge's assessment, lank Smith captures that correction and automatically

751
00:37:34.760 --> 00:37:37.719
<v Speaker 2>turns it into a fu shot example that gets added

752
00:37:37.760 --> 00:37:40.400
<v Speaker 2>back into the judge's prompt for future evaluations.

753
00:37:40.760 --> 00:37:44.280
<v Speaker 1>Wow, so the judging LLM actually learns from human corrections

754
00:37:44.280 --> 00:37:44.800
<v Speaker 1>over time.

755
00:37:44.960 --> 00:37:49.440
<v Speaker 2>Precisely, it helps the automated evaluation along better with human preferences,

756
00:37:49.760 --> 00:37:53.239
<v Speaker 2>reducing the need for constant manual prompt tweaking for the judge.

757
00:37:53.599 --> 00:37:56.559
<v Speaker 2>It's a really smart self improving mechanism. You also need

758
00:37:56.639 --> 00:38:00.320
<v Speaker 2>rigorous regression testing. As you update your code or even

759
00:38:00.360 --> 00:38:03.960
<v Speaker 2>the underlying LLLM models change model drift, you need to

760
00:38:04.000 --> 00:38:07.239
<v Speaker 2>constantly rerun your tests to prevent regression and sure you

761
00:38:07.239 --> 00:38:11.039
<v Speaker 2>haven't accidentally made things worse. Langsmith's comparison view is designed

762
00:38:11.039 --> 00:38:14.079
<v Speaker 2>to help spot these performance changes over time, and for

763
00:38:14.159 --> 00:38:17.679
<v Speaker 2>complex agents, evaluation needs to happen on multiple levels the

764
00:38:17.679 --> 00:38:21.679
<v Speaker 2>final response, but also individuals single step decisions like did

765
00:38:21.679 --> 00:38:24.480
<v Speaker 2>it pick the right tool, and even the entire trajectory,

766
00:38:24.519 --> 00:38:25.719
<v Speaker 2>the sequence of actions it took.

767
00:38:25.960 --> 00:38:27.679
<v Speaker 1>Okay, that's a lot of testing before launch.

768
00:38:27.679 --> 00:38:30.800
<v Speaker 2>What about after production? Monitoring is crucial for catching bugs

769
00:38:30.800 --> 00:38:33.840
<v Speaker 2>and weird edge cases that only emerge with real users

770
00:38:33.840 --> 00:38:36.719
<v Speaker 2>and real world data. Lang Smith is key here again

771
00:38:37.119 --> 00:38:40.639
<v Speaker 2>providing tracing to track exactly what happened inside your agent

772
00:38:40.679 --> 00:38:43.800
<v Speaker 2>when it error occurred or user gave bad feedback. You

773
00:38:43.840 --> 00:38:47.119
<v Speaker 2>need mechanisms for collecting feedback and production, maybe simple thumbs

774
00:38:47.199 --> 00:38:50.159
<v Speaker 2>updown buttons, annotation ques where users or internal teams can

775
00:38:50.159 --> 00:38:53.000
<v Speaker 2>flag issues, or even running those LMM as a judge

776
00:38:53.000 --> 00:38:57.320
<v Speaker 2>evaluators on live traffic samples. You can also implement classification

777
00:38:57.400 --> 00:39:00.880
<v Speaker 2>and tagging on inputs and outputs, checking for things like toxicity,

778
00:39:01.079 --> 00:39:05.480
<v Speaker 2>personally identifiable information, or even trying to detect prompt injection attacks.

779
00:39:05.920 --> 00:39:09.400
<v Speaker 2>These act as safety guardrails and a really important practical tip.

780
00:39:09.719 --> 00:39:12.360
<v Speaker 2>Release your app in phases. Start with a small group

781
00:39:12.400 --> 00:39:16.159
<v Speaker 2>of Beata users, gather feedback, fix issues, then gradually expand

782
00:39:16.159 --> 00:39:18.480
<v Speaker 2>the rollout. Don't just flip the switch for everyone on

783
00:39:18.559 --> 00:39:19.079
<v Speaker 2>day one.

784
00:39:19.320 --> 00:39:26.360
<v Speaker 1>So what this all really means, this continuous cycle of design, testing, evaluation, monitoring, fixing.

785
00:39:27.239 --> 00:39:29.639
<v Speaker 1>It's not just about squashing bugs, is it. It feels

786
00:39:29.639 --> 00:39:33.559
<v Speaker 1>like it's about systematically building confidence and trust in these

787
00:39:33.639 --> 00:39:37.800
<v Speaker 1>incredibly powerful but inherently probabilistic systems. It's about making them

788
00:39:37.840 --> 00:39:41.719
<v Speaker 1>reliable enough for the real world. Okay, thinking bigger picture, now,

789
00:39:41.800 --> 00:39:46.199
<v Speaker 1>let's unpack this idea. Llms are amazing because they're so intuitive, right,

790
00:39:46.320 --> 00:39:48.840
<v Speaker 1>They often understand what we mean, even with typos or

791
00:39:48.840 --> 00:39:53.119
<v Speaker 1>slightly vague questions. That's fantastic. But that same flexibility means

792
00:39:53.159 --> 00:39:56.079
<v Speaker 1>their output isn't always perfectly predictable. It can be slightly off,

793
00:39:56.320 --> 00:39:59.519
<v Speaker 1>and that challenges our traditional software interfaces, which we usually

794
00:39:59.519 --> 00:40:04.079
<v Speaker 1>build expecting very precise, deterministic results. How is this fundamental

795
00:40:04.119 --> 00:40:06.639
<v Speaker 1>difference going to change the way we actually interact with software.

796
00:40:06.679 --> 00:40:08.920
<v Speaker 3>That's a really profound question. You're right.

797
00:40:08.960 --> 00:40:13.639
<v Speaker 2>Traditionally UIs think Microsoft Word, Figma spreadsheets. They have fixed

798
00:40:13.639 --> 00:40:18.320
<v Speaker 2>tool pallettes, predictable menus canvases where actions have precise, repeatable

799
00:40:18.320 --> 00:40:23.639
<v Speaker 2>outcomes because the underlying logic is deterministic LLLM powered applications

800
00:40:23.679 --> 00:40:27.440
<v Speaker 2>are just different. They are more forgiving of messy input,

801
00:40:27.599 --> 00:40:30.920
<v Speaker 2>which is great, but their output does have that inherent variability.

802
00:40:31.360 --> 00:40:34.360
<v Speaker 2>This mismatch is pushing us to think about new interaction patterns.

803
00:40:34.679 --> 00:40:38.440
<v Speaker 2>New UIs designed for this LLM native world. The book

804
00:40:38.440 --> 00:40:43.159
<v Speaker 2>outlines three really interesting emerging patterns. The first and probably

805
00:40:43.159 --> 00:40:46.360
<v Speaker 2>the easiest lift to integrate into existing apps is the

806
00:40:46.360 --> 00:40:49.920
<v Speaker 2>interactive chatbot. Think of this as an AI sidekick living

807
00:40:49.920 --> 00:40:52.800
<v Speaker 2>within your application, like get up copilot, chat within your

808
00:40:52.800 --> 00:40:55.599
<v Speaker 2>code editor, or a similar chat interface within a design

809
00:40:55.679 --> 00:40:59.079
<v Speaker 2>tool or document editor. This chatbi can see the main

810
00:40:59.119 --> 00:41:03.719
<v Speaker 2>application content, the code, the design, the document and interact.

811
00:41:03.280 --> 00:41:04.920
<v Speaker 1>With it so you can talk to it about the

812
00:41:04.920 --> 00:41:05.800
<v Speaker 1>thing you're working.

813
00:41:05.559 --> 00:41:09.119
<v Speaker 2>On, exactly, explain this code, suggest a different layout. Summarize

814
00:41:09.119 --> 00:41:13.480
<v Speaker 2>this section. It's conversational collaboration. The key components are a

815
00:41:13.480 --> 00:41:18.480
<v Speaker 2>good dialogue tune chat model obviously, conversation history, streaming outputs

816
00:41:18.480 --> 00:41:21.239
<v Speaker 2>so it feels responsive, tool calling so the chatbot can

817
00:41:21.280 --> 00:41:25.199
<v Speaker 2>actually invoke application functions, change the font size, refactor this code,

818
00:41:25.360 --> 00:41:28.000
<v Speaker 2>and probably human in the loop controls for safety.

819
00:41:28.119 --> 00:41:31.079
<v Speaker 1>Okay, the AI sidekick makes sense. What's the next pattern?

820
00:41:31.119 --> 00:41:34.639
<v Speaker 2>The second pattern pushes the collaboration idea further collaborative editing

821
00:41:34.840 --> 00:41:38.400
<v Speaker 2>with lms. Here the LLM agent isn't just a sidekick

822
00:41:38.440 --> 00:41:41.800
<v Speaker 2>you talked to. It becomes one of those users contributing

823
00:41:41.840 --> 00:41:46.320
<v Speaker 2>to this shared document or shared state, right alongside human collaborators.

824
00:41:46.400 --> 00:41:49.039
<v Speaker 2>Think Google docs, but one of the cursors belongs to

825
00:41:49.039 --> 00:41:49.519
<v Speaker 2>an AI.

826
00:41:49.719 --> 00:41:52.320
<v Speaker 1>Whoa in AI is a real time teammate.

827
00:41:52.679 --> 00:41:56.559
<v Speaker 2>Potentially, Yes, what's fascinating here is how this could work.

828
00:41:57.199 --> 00:42:00.719
<v Speaker 2>Maybe the LM acts as an asynchronous drafter, pharing sections

829
00:42:00.719 --> 00:42:03.760
<v Speaker 2>for you overnight, or maybe it's an always on copilot

830
00:42:04.000 --> 00:42:08.199
<v Speaker 2>subjecting improvements or cleaning up formatting in real time. If

831
00:42:08.199 --> 00:42:11.000
<v Speaker 2>we connect this to the bigger picture, it raises really

832
00:42:11.039 --> 00:42:14.239
<v Speaker 2>important questions about how we design systems Where human edits

833
00:42:14.239 --> 00:42:17.920
<v Speaker 2>and AI edits merge seamlessly, how do you handle conflicts

834
00:42:18.039 --> 00:42:19.400
<v Speaker 2>whose changes take precedence?

835
00:42:19.480 --> 00:42:22.400
<v Speaker 1>Yeah, the merging in conflict resolution sounds tricky.

836
00:42:22.519 --> 00:42:22.960
<v Speaker 3>Definitely.

837
00:42:23.440 --> 00:42:27.239
<v Speaker 2>Key components here involve managing that shared state carefully, maybe

838
00:42:27.280 --> 00:42:31.280
<v Speaker 2>using sophisticated techniques like conflict free replicated data types crdts,

839
00:42:31.639 --> 00:42:34.960
<v Speaker 2>or just robust merging logic. You need task managers, ways

840
00:42:35.000 --> 00:42:37.639
<v Speaker 2>to handle concurrency, and definitely a good under.

841
00:42:37.519 --> 00:42:41.280
<v Speaker 1>Redos stack an AI teammate directly editing alongside you. That's

842
00:42:41.320 --> 00:42:44.239
<v Speaker 1>a huge paradigm shift. What's the third emerging pattern?

843
00:42:44.480 --> 00:42:48.480
<v Speaker 2>The third and perhaps the most futuristic feeling is ambient computing.

844
00:42:49.159 --> 00:42:52.119
<v Speaker 2>This is where the LLM is continuously doing some kind

845
00:42:52.159 --> 00:42:54.840
<v Speaker 2>of work in the background while you, the user, are

846
00:42:54.880 --> 00:42:56.199
<v Speaker 2>presumably doing something else.

847
00:42:56.360 --> 00:42:56.719
<v Speaker 3>Entirely.

848
00:42:57.440 --> 00:43:01.480
<v Speaker 2>It's not waiting for your explicit command, proactively working on your.

849
00:43:01.320 --> 00:43:05.039
<v Speaker 1>Behalf as silent, always on assistant kind of Think about

850
00:43:05.079 --> 00:43:06.960
<v Speaker 1>how LLM reasoning could transform this.

851
00:43:07.519 --> 00:43:11.000
<v Speaker 2>Old ambient computing often required setting up lots of manual rules.

852
00:43:11.280 --> 00:43:14.920
<v Speaker 2>If I get an email from X, notify me tedious.

853
00:43:15.440 --> 00:43:19.440
<v Speaker 2>The fascinating question now is can elms use their understanding

854
00:43:19.440 --> 00:43:23.360
<v Speaker 2>and reasoning to proactively identify what's genuinely interesting or important

855
00:43:23.360 --> 00:43:26.079
<v Speaker 2>to you without needing endless configuration.

856
00:43:25.639 --> 00:43:28.400
<v Speaker 1>So it learns what matters to me and surfaces just that.

857
00:43:28.400 --> 00:43:33.880
<v Speaker 2>That's the potential. Key components here would include triggers detecting

858
00:43:33.920 --> 00:43:38.320
<v Speaker 2>new information like emails, news, calendar, updates, long term memory

859
00:43:38.400 --> 00:43:42.360
<v Speaker 2>to build context about you and your priorities, reflection, the

860
00:43:42.400 --> 00:43:45.119
<v Speaker 2>agent actively learning and updating its internal model of what

861
00:43:45.159 --> 00:43:50.000
<v Speaker 2>you find interesting, and crucially summarized output. It doesn't bombard you.

862
00:43:50.159 --> 00:43:54.360
<v Speaker 2>It intelligently summarizes its findings and surfaces only the noteworthy stuff.

863
00:43:54.559 --> 00:43:56.760
<v Speaker 2>It needs a task manager too, to keep track of

864
00:43:56.760 --> 00:43:57.960
<v Speaker 2>his background processes.

865
00:43:58.559 --> 00:44:01.119
<v Speaker 1>So what does this all really mean for how we'll

866
00:44:01.159 --> 00:44:05.679
<v Speaker 1>interact with tech? Could lms truly become these quiet, proactive assistants,

867
00:44:05.719 --> 00:44:09.360
<v Speaker 1>maybe to drafting replies, summarizing reports, alerting us to opportunities

868
00:44:09.719 --> 00:44:12.880
<v Speaker 1>all happening in the background without constant manual setup. That

869
00:44:12.880 --> 00:44:15.760
<v Speaker 1>feels like a fundamentally different way to experience software.

870
00:44:16.000 --> 00:44:18.920
<v Speaker 2>It really does point towards a potential future where software

871
00:44:19.000 --> 00:44:22.320
<v Speaker 2>is less a collection of static tools we actively wield

872
00:44:22.679 --> 00:44:25.639
<v Speaker 2>and more of an intelligent environment that it anticipates and assists.

873
00:44:26.199 --> 00:44:29.000
<v Speaker 1>We have covered an incredible amount of ground today, haven't we.

874
00:44:29.079 --> 00:44:30.760
<v Speaker 1>It feels like a whirlwind.

875
00:44:30.280 --> 00:44:31.239
<v Speaker 3>Tourth It really does.

876
00:44:31.480 --> 00:44:34.920
<v Speaker 1>From the absolute fundamentals of llms, how they predict text,

877
00:44:35.239 --> 00:44:37.480
<v Speaker 1>what tokens are, and how prompting is our way to

878
00:44:37.519 --> 00:44:42.000
<v Speaker 1>steer them, then diving deep into retrieval augmented generation our RAG,

879
00:44:42.400 --> 00:44:45.639
<v Speaker 1>that crucial technique for grounding them in real world specific

880
00:44:45.719 --> 00:44:47.480
<v Speaker 1>data to fight hallucinations.

881
00:44:47.760 --> 00:44:51.199
<v Speaker 2>Yeah, and then moving into land graph enabling these complex

882
00:44:51.400 --> 00:44:56.199
<v Speaker 2>multi step agents that can actually remember conversations, plan sequences

883
00:44:56.239 --> 00:44:59.920
<v Speaker 2>of actions using tools, and even reflect and critique their own.

884
00:44:59.760 --> 00:45:03.159
<v Speaker 1>Out puts exactly. And then tackling the really practical side

885
00:45:03.599 --> 00:45:07.760
<v Speaker 1>making these things production ready, ensuring reliability with structured output,

886
00:45:08.159 --> 00:45:11.119
<v Speaker 1>giving users control with human in the loop features, and

887
00:45:11.199 --> 00:45:15.960
<v Speaker 1>mastering that whole deployment, testing, monitoring, and continuous improvement cycle.

888
00:45:16.039 --> 00:45:18.119
<v Speaker 2>It's a lot, It is a lot, but it's clear

889
00:45:18.159 --> 00:45:21.159
<v Speaker 2>that llms, combined with powerful frameworks like line chain and

890
00:45:21.239 --> 00:45:25.599
<v Speaker 2>lang graph are genuinely giving us well thing building superpowers.

891
00:45:25.599 --> 00:45:26.920
<v Speaker 3>As the book puts it, they're.

892
00:45:26.760 --> 00:45:30.599
<v Speaker 2>Making previously hard things easy and previously impossible things possible.

893
00:45:30.960 --> 00:45:33.639
<v Speaker 2>This deep dive hopefully has equipped you, the listener, with

894
00:45:33.679 --> 00:45:36.679
<v Speaker 2>the core knowledge to not just watch this revolution unfold,

895
00:45:36.880 --> 00:45:38.840
<v Speaker 2>but to potentially participate.

896
00:45:38.320 --> 00:45:41.320
<v Speaker 1>In it absolutely. And as we wrap up, maybe a

897
00:45:41.400 --> 00:45:45.039
<v Speaker 1>final thought to chew on. As these llms become more capable,

898
00:45:45.199 --> 00:45:48.360
<v Speaker 1>more edgentic, and more deeply integrated into our digital lives.

899
00:45:48.800 --> 00:45:52.400
<v Speaker 1>Imagine that world where software isn't just a static toolbox anymore.

900
00:45:52.760 --> 00:45:57.679
<v Speaker 1>Imagine it as a dynamic, intelligent collaborator, always learning, always adapting.

901
00:45:58.079 --> 00:46:02.159
<v Speaker 1>How will that fundamental shift I packed our creativity, our productivity,

902
00:46:02.320 --> 00:46:04.480
<v Speaker 1>even our very understanding of what it means to be

903
00:46:04.559 --> 00:46:08.039
<v Speaker 1>well informed or in control in an age of potentially

904
00:46:08.159 --> 00:46:10.119
<v Speaker 1>ambient AI constantly working around us.

905
00:46:10.199 --> 00:46:11.800
<v Speaker 2>That's a fascinating future to contemplate.

906
00:46:11.840 --> 00:46:14.360
<v Speaker 1>It really is. We definitely encourage you to continue your

907
00:46:14.360 --> 00:46:17.840
<v Speaker 1>own deep dive. Explore the open source Langschaine library, check

908
00:46:17.880 --> 00:46:20.760
<v Speaker 1>out lang graph, maybe look into langsmith. There's so much happening,

909
00:46:21.039 --> 00:46:23.000
<v Speaker 1>and the joy of discovery in this field right now

910
00:46:23.079 --> 00:46:24.199
<v Speaker 1>is truly immense.
