WEBVTT

1
00:00:00.120 --> 00:00:02.160
<v Speaker 1>Welcome to the Deep Dive, the show where we try

2
00:00:02.200 --> 00:00:06.000
<v Speaker 1>to cut through the noise and get you truly well informed.

3
00:00:06.040 --> 00:00:07.240
<v Speaker 2>Glad to be here.

4
00:00:07.799 --> 00:00:11.279
<v Speaker 1>So today we're plunging into a topic we've all encountered,

5
00:00:11.320 --> 00:00:15.599
<v Speaker 1>and, let's be honest, sometimes really, really disliked: the chatbot.

6
00:00:15.759 --> 00:00:17.879
<v Speaker 1>Oh yeah. You know the ones: they just don't understand

7
00:00:17.879 --> 00:00:20.120
<v Speaker 1>a single word you say, or they send you in

8
00:00:20.120 --> 00:00:23.480
<v Speaker 1>these endless circles, or you know, make you desperately mash

9
00:00:23.600 --> 00:00:25.160
<v Speaker 1>that "speak to a human" button.

10
00:00:25.440 --> 00:00:27.679
<v Speaker 2>It's such a universal pain point, isn't it? And it

11
00:00:27.760 --> 00:00:31.519
<v Speaker 2>really spotlights a critical challenge. Yeah, how do we build

12
00:00:31.519 --> 00:00:35.920
<v Speaker 2>AI that actually understands us? Yeah, and, you know, helps?

13
00:00:35.719 --> 00:00:38.719
<v Speaker 1>That's the core question, exactly. So for this Deep Dive, we're

14
00:00:38.799 --> 00:00:44.119
<v Speaker 1>unpacking the secrets behind creating genuinely, well, delightful AI interactions.

15
00:00:44.200 --> 00:00:47.119
<v Speaker 2>Hopefully. We're pulling insights from some really interesting new research,

16
00:00:47.399 --> 00:00:52.079
<v Speaker 2>"Effective Conversational AI: Chatbots That Work" by Eniko Rozsa, Andrew Freed,

17
00:00:52.119 --> 00:00:54.640
<v Speaker 2>and Cari Jacobs. It just came out in 2025.

18
00:00:54.920 --> 00:00:57.280
<v Speaker 1>Yeah, and our mission today is basically to reveal why

19
00:00:57.320 --> 00:01:00.799
<v Speaker 1>some bots succeed where others just spectacularly fail, and also

20
00:01:00.880 --> 00:01:04.400
<v Speaker 1>how the newest advancements in AI are truly changing the

21
00:01:04.439 --> 00:01:05.200
<v Speaker 1>game for the better.

22
00:01:05.560 --> 00:01:10.120
<v Speaker 2>Think of this as your shortcut maybe to understanding how

23
00:01:10.159 --> 00:01:14.319
<v Speaker 2>to build or even just identify a truly effective conversational AI.

24
00:01:14.680 --> 00:01:16.920
<v Speaker 1>Okay, let's get into it then.

25
00:01:16.680 --> 00:01:19.319
<v Speaker 2>So let's maybe start with a clear definition for you. Conversational AI.

26
00:01:19.760 --> 00:01:24.439
<v Speaker 2>It's essentially a set of technologies designed to mimic human

27
00:01:24.480 --> 00:01:27.959
<v Speaker 2>interaction or sometimes even replace it using natural language.

28
00:01:28.040 --> 00:01:28.280
<v Speaker 1>Right.

29
00:01:28.400 --> 00:01:32.640
<v Speaker 2>It goes by lots of names: chatbots, virtual agents, AI assistants,

30
00:01:32.719 --> 00:01:34.159
<v Speaker 2>sometimes even digital employees.

31
00:01:33.799 --> 00:01:36.079
<v Speaker 1>Digital employees, huh? Okay.

32
00:01:36.040 --> 00:01:39.439
<v Speaker 2>And you mostly see it used for automating customer service,

33
00:01:40.480 --> 00:01:44.599
<v Speaker 2>powering voice assistants like Alexa or Siri, and even sometimes

34
00:01:44.640 --> 00:01:47.280
<v Speaker 2>pre-screening interactions before they actually go to a human.

35
00:01:47.680 --> 00:01:50.159
<v Speaker 1>So it's way more than just that little chat window

36
00:01:50.200 --> 00:01:52.120
<v Speaker 1>that pops up on a website. It's kind of everywhere.

37
00:01:52.319 --> 00:01:54.359
<v Speaker 1>It really is. And the book breaks these down into

38
00:01:54.439 --> 00:01:56.599
<v Speaker 1>I think three main functional categories. Is that right?

39
00:01:56.640 --> 00:01:59.359
<v Speaker 2>Precisely, yeah. First, you have your

40
00:01:59.400 --> 00:02:02.959
<v Speaker 2>question-answering bots. People often call them FAQ bots. They're designed to

41
00:02:02.959 --> 00:02:08.439
<v Speaker 2>give direct responses to pretty simple factual questions, like "When

42
00:02:08.479 --> 00:02:13.240
<v Speaker 2>are you open?" or "Where are you located?" No follow-up needed, really.

43
00:02:13.400 --> 00:02:14.719
<v Speaker 2>They just spit out the information.

44
00:02:14.919 --> 00:02:17.000
<v Speaker 1>So these are the quick-hit ones. Get in, get

45
00:02:17.000 --> 00:02:19.520
<v Speaker 1>the answer, get out. Is there, like, a common mistake

46
00:02:19.560 --> 00:02:22.000
<v Speaker 1>people make when they're designing just these simple bots?

47
00:02:22.360 --> 00:02:25.719
<v Speaker 2>Well, I think the main pitfall is underestimating the sheer

48
00:02:25.800 --> 00:02:28.680
<v Speaker 2>variety of ways users might ask the same simple question.

49
00:02:28.960 --> 00:02:32.199
<v Speaker 2>You know, that mismatch leads straight to misunderstanding.

50
00:02:32.360 --> 00:02:34.039
<v Speaker 1>Ah, right, makes sense.

51
00:02:34.159 --> 00:02:38.199
<v Speaker 2>Then you have the process-oriented or transactional solutions. Now,

52
00:02:38.280 --> 00:02:41.400
<v Speaker 2>these are designed to guide users through a series of

53
00:02:41.439 --> 00:02:44.360
<v Speaker 2>steps to actually achieve a specific goal.

54
00:02:44.479 --> 00:02:47.719
<v Speaker 1>Like booking an appointment or checking an account balance, maybe.

55
00:02:47.479 --> 00:02:50.680
<v Speaker 2>Exactly, checking an account balance, booking something. They might collect

56
00:02:50.680 --> 00:02:53.159
<v Speaker 2>info for someone else to handle later, or sometimes they

57
00:02:53.159 --> 00:02:55.240
<v Speaker 2>can even execute the transaction right then and there.

58
00:02:55.400 --> 00:02:57.879
<v Speaker 1>Okay. And the last category, what's that?

59
00:02:57.879 --> 00:03:00.879
<v Speaker 2>That's the routing agent. Its whole job, basically, is to

60
00:03:00.919 --> 00:03:03.039
<v Speaker 2>figure out where to send you next. So like a

61
00:03:03.159 --> 00:03:07.599
<v Speaker 2>dispatcher? Kind of, yeah. Either to another, more specialized bot, or

62
00:03:07.759 --> 00:03:10.199
<v Speaker 2>you know, when it's necessary, hand you off to a

63
00:03:10.280 --> 00:03:10.840
<v Speaker 2>human agent.

64
00:03:10.960 --> 00:03:11.280
<v Speaker 1>Okay.

65
00:03:11.319 --> 00:03:13.759
<v Speaker 2>And what's fascinating here, and the book points this out,

66
00:03:14.000 --> 00:03:17.520
<v Speaker 2>is that many real-world AI solutions are actually a

67
00:03:17.560 --> 00:03:21.120
<v Speaker 2>clever mix of all three. Oh, interesting. Like how? Well,

68
00:03:21.120 --> 00:03:24.680
<v Speaker 2>think of a retail banking chatbot. It might answer FAQs

69
00:03:25.240 --> 00:03:28.599
<v Speaker 2>about bank hours, right, but it could also guide you

70
00:03:28.639 --> 00:03:31.919
<v Speaker 2>through opening a new account, that's transactional, and then route

71
00:03:31.960 --> 00:03:35.479
<v Speaker 2>you to a human specialist for something complex like fraud reporting.

72
00:03:35.680 --> 00:03:38.240
<v Speaker 1>Right, right. It blends the functions. That really paints a

73
00:03:38.280 --> 00:03:40.639
<v Speaker 1>clear picture of how versatile these things can be.

74
00:03:40.759 --> 00:03:41.840
<v Speaker 2>Yeah, when they work well.

75
00:03:43.159 --> 00:03:47.439
<v Speaker 1>So given this sort of intricate blend of categories, how

76
00:03:47.479 --> 00:03:50.360
<v Speaker 1>does this sophisticated dance actually happen behind the scenes? The

77
00:03:50.360 --> 00:03:52.879
<v Speaker 1>book describes this fascinating three-step process.

78
00:03:53.039 --> 00:03:55.000
<v Speaker 2>It is quite elegant when you break it down. But

79
00:03:55.120 --> 00:03:58.080
<v Speaker 2>here's the real insight. I think the brilliance of a

80
00:03:58.120 --> 00:04:02.280
<v Speaker 2>truly effective bot it lies in the seamless execution of

81
00:04:02.319 --> 00:04:05.439
<v Speaker 2>these three fundamental steps. If any one of them falters,

82
00:04:05.680 --> 00:04:07.599
<v Speaker 2>the whole experience just kind of collapses.

83
00:04:07.639 --> 00:04:08.840
<v Speaker 1>Okay, so what's step one?

84
00:04:09.120 --> 00:04:11.919
<v Speaker 2>Step one: the bot has to figure out what the

85
00:04:12.039 --> 00:04:16.279
<v Speaker 2>user actually wants. This is done using natural language understanding

86
00:04:16.560 --> 00:04:17.199
<v Speaker 2>or NLU.

87
00:04:17.439 --> 00:04:18.519
<v Speaker 1>NLU, got it.

88
00:04:18.720 --> 00:04:22.439
<v Speaker 2>Often this uses a machine learning text classifier. Think of

89
00:04:22.480 --> 00:04:25.160
<v Speaker 2>it like an AI that learns to categorize text, maybe

90
00:04:25.160 --> 00:04:30.959
<v Speaker 2>like sorting your emails into "urgent" or "promotion." It uses

91
00:04:31.000 --> 00:04:32.560
<v Speaker 2>that to figure out the user's intent.

92
00:04:33.000 --> 00:04:35.920
<v Speaker 1>Okay, so intent. When I type something, the first challenge

93
00:04:35.959 --> 00:04:38.120
<v Speaker 1>is the bot figuring out what I'm actually trying to do,

94
00:04:38.240 --> 00:04:42.439
<v Speaker 1>like distinguishing between me wanting to reset my password versus say,

95
00:04:42.519 --> 00:04:43.480
<v Speaker 1>find a store.

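NOTE
A minimal sketch of step one as described above, assuming a toy bag-of-words
classifier with a fallback intent. The intent names, training phrases, and the
0.3 threshold are hypothetical illustrations, not taken from the book or from
any particular NLU product.
```python
# Toy intent classifier: illustrative only; intents and examples are made up.
from collections import Counter
import math
TRAINING = {
    "reset_password": ["i forgot my password", "reset my password please", "cannot log in to my account"],
    "find_store": ["where is the nearest store", "find a store near me", "store locations"],
}
def vectorize(text):
    # Bag-of-words term counts for a lowercased utterance.
    return Counter(text.lower().split())
def cosine(a, b):
    # Cosine similarity between two word-count vectors (0.0 if either is empty).
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0
# One centroid (summed word counts) per intent.
CENTROIDS = {name: sum((vectorize(e) for e in examples), Counter()) for name, examples in TRAINING.items()}
def classify(utterance, threshold=0.3):
    # Return the best-matching intent, or "fallback" when nothing is close enough.
    vec = vectorize(utterance)
    name, score = max(((n, cosine(vec, c)) for n, c in CENTROIDS.items()), key=lambda pair: pair[1])
    return name if score >= threshold else "fallback"
```
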
96
00:04:43.680 --> 00:04:46.319
<v Speaker 2>Exactly that, you nailed it. That's the intent. Step two.

97
00:04:46.680 --> 00:04:49.040
<v Speaker 2>Once it thinks it knows the intent, the bot needs

98
00:04:49.040 --> 00:04:51.839
<v Speaker 2>to gather any extra information it needs to actually fulfill

99
00:04:51.920 --> 00:04:55.720
<v Speaker 2>that want. Okay. So a dialogue engine will ask clarifying questions,

100
00:04:56.160 --> 00:04:59.199
<v Speaker 2>and it might use something called orchestration layers to interact

101
00:04:59.199 --> 00:05:01.120
<v Speaker 2>with other systems through APIs.

102
00:05:01.199 --> 00:05:04.160
<v Speaker 1>APIs, right, like ways for computer systems to talk to

103
00:05:04.199 --> 00:05:04.519
<v Speaker 1>each other.

104
00:05:04.600 --> 00:05:07.319
<v Speaker 2>Precisely, it's the bot's way of securely talking to other

105
00:05:07.399 --> 00:05:10.279
<v Speaker 2>databases or services to pull the specific details it needs,

106
00:05:10.439 --> 00:05:12.000
<v Speaker 2>like your account info or whatever.

107
00:05:12.079 --> 00:05:15.040
<v Speaker 1>Okay, intent figured out, info gathered? What's step three?

108
00:05:15.360 --> 00:05:18.959
<v Speaker 2>Step three: give the user what they want. Simple as

109
00:05:19.000 --> 00:05:24.160
<v Speaker 2>that, ideally. Whether that's fulfilling their request directly, providing the information,

110
00:05:24.639 --> 00:05:26.319
<v Speaker 2>or connecting them to a human agent.

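NOTE
The three steps above can be sketched roughly as follows. This is an
illustration under stated assumptions: the intent names, the required-slot
table, the keyword rule standing in for NLU, and the canned replies are all
hypothetical, and a real bot would call backend APIs where noted.
```python
# Hypothetical three-step turn handler; intent names, slots, and replies are invented.
REQUIRED_SLOTS = {"check_balance": ["account_id"], "store_hours": []}
def detect_intent(utterance):
    # Step 1 stand-in: a keyword rule where a real bot would call its NLU classifier.
    text = utterance.lower()
    if "balance" in text:
        return "check_balance"
    if "open" in text or "hours" in text:
        return "store_hours"
    return "fallback"
def handle_turn(utterance, slots):
    intent = detect_intent(utterance)
    if intent == "fallback":
        # Nothing matched: hand the user off rather than guess.
        return "Connecting you to a human agent."
    # Step 2: ask a clarifying question for the first missing piece of information.
    for slot in REQUIRED_SLOTS[intent]:
        if slot not in slots:
            return f"What is your {slot.replace('_', ' ')}?"
    # Step 3: fulfill the request (a real bot would call a backend API here).
    if intent == "check_balance":
        return f"Fetching the balance for account {slots['account_id']}."
    return "We are open 9am to 5pm on weekdays."
```
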
111
00:05:26.439 --> 00:05:27.560
<v Speaker 1>And throughout this whole thing.

112
00:05:27.680 --> 00:05:31.000
<v Speaker 2>The critical takeaway, and the book really emphasizes this, is

113
00:05:31.040 --> 00:05:36.560
<v Speaker 2>that it must be quick, easy, and, crucially, follow ethical guidelines.

114
00:05:36.079 --> 00:05:38.240
<v Speaker 1>Ethical guidelines, like what specifically?

115
00:05:38.240 --> 00:05:43.040
<v Speaker 2>Like handling sensitive information securely, and a big one: never, ever

116
00:05:43.160 --> 00:05:46.399
<v Speaker 2>pretending the AI is actually a human. Transparency is key.

117
00:05:46.800 --> 00:05:49.319
<v Speaker 1>Okay, it sounds so logical laid out like that. Yet,

118
00:05:49.439 --> 00:05:51.439
<v Speaker 1>as we said, for so many of us, the actual

119
00:05:51.439 --> 00:05:55.279
<v Speaker 1>experience with conversational AI causes so much pain. Yeah, the

120
00:05:55.319 --> 00:05:57.959
<v Speaker 1>book points out those classic frustrations, right? The bot didn't

121
00:05:58.000 --> 00:06:00.199
<v Speaker 1>understand the thing I said, or you get that robot

122
00:06:00.360 --> 00:06:04.319
<v Speaker 1>voice initiating some totally confusing dialogue, or you just immediately

123
00:06:04.399 --> 00:06:05.800
<v Speaker 1>hit the button to talk to a person.

124
00:06:05.920 --> 00:06:06.680
<v Speaker 2>We've all been there.

125
00:06:06.759 --> 00:06:11.000
<v Speaker 1>It really begs the question, what exactly causes this weak understanding?

126
00:06:11.040 --> 00:06:12.399
<v Speaker 1>Why are they so often bad?

127
00:06:12.639 --> 00:06:15.759
<v Speaker 2>Well, weak understanding shows up in several frustrating ways, right?

128
00:06:16.639 --> 00:06:19.800
<v Speaker 2>The chatbot gives you the wrong answers, or it uses

129
00:06:19.839 --> 00:06:23.399
<v Speaker 2>that fallback intent way too much, you know, the "Sorry,

130
00:06:23.439 --> 00:06:26.519
<v Speaker 2>I'm not sure what you're asking" message. Yes. Or you

131
00:06:26.560 --> 00:06:32.240
<v Speaker 2>see frequent escalations to human agents, declining user engagement over time,

132
00:06:32.759 --> 00:06:35.560
<v Speaker 2>people just giving up and leaving, increasing abandonment rates.

133
00:06:36.000 --> 00:06:39.399
<v Speaker 1>So if users are constantly being asked to rephrase or

134
00:06:39.639 --> 00:06:43.519
<v Speaker 1>the bot just gives totally irrelevant responses, that's a dead giveaway.

135
00:06:43.199 --> 00:06:45.800
<v Speaker 2>Absolutely, a clear sign the understanding just isn't there.

136
00:06:45.639 --> 00:06:48.680
<v Speaker 1>So what's behind it? Is it just, like, bad

137
00:06:48.759 --> 00:06:50.959
<v Speaker 1>luck or is there something fundamentally limited?

138
00:06:51.439 --> 00:06:54.319
<v Speaker 2>No, not usually bad luck. The book identifies a few

139
00:06:54.399 --> 00:06:57.519
<v Speaker 2>really common culprits, and the insight here is that these

140
00:06:57.519 --> 00:07:00.720
<v Speaker 2>are often design failures or sometimes maintenance failures, things that

141
00:07:00.759 --> 00:07:03.839
<v Speaker 2>could have been prevented. Okay, like what? Well, one is

142
00:07:03.920 --> 00:07:07.800
<v Speaker 2>manufactured training data, so examples that don't truly reflect how

143
00:07:07.839 --> 00:07:09.480
<v Speaker 2>real users actually speak or type.

144
00:07:09.639 --> 00:07:12.199
<v Speaker 1>Right, if you train it on perfect grammar, but people

145
00:07:12.319 --> 00:07:14.839
<v Speaker 1>use slang or type fragments.

146
00:07:14.680 --> 00:07:17.600
<v Speaker 2>Exactly, the NLU's going to fail. Another big one is

147
00:07:17.720 --> 00:07:22.040
<v Speaker 2>insufficient scope or gaps in topic coverage. Basically, the bot

148
00:07:22.120 --> 00:07:25.199
<v Speaker 2>just doesn't know enough about the things users are asking about.

149
00:07:25.079 --> 00:07:28.639
<v Speaker 1>Like that MediWorld pharma bot example, during the vaccine rollout.

150
00:07:28.319 --> 00:07:32.319
<v Speaker 2>Perfect example, yeah. Initially it could handle general COVID-19

151
00:07:32.399 --> 00:07:35.519
<v Speaker 2>questions fine, but when people started asking about you know,

152
00:07:35.639 --> 00:07:39.959
<v Speaker 2>vaccine eligibility or booking appointments, the bot was totally stumped.

153
00:07:39.560 --> 00:07:42.439
<v Speaker 1>Because it hadn't been updated. The world changed faster than the bot.

154
00:07:42.360 --> 00:07:45.879
<v Speaker 2>Exactly, which highlights another cause: new information that the

155
00:07:45.879 --> 00:07:48.279
<v Speaker 2>bot hasn't been taught. And the fourth one, which can

156
00:07:48.319 --> 00:07:50.959
<v Speaker 2>often be the trickiest to sort out, is a lack

157
00:07:51.000 --> 00:07:53.759
<v Speaker 2>of vetting or proper gatekeeping around changes.

158
00:07:54.319 --> 00:07:55.720
<v Speaker 1>What do you mean by that? Like too many cooks

159
00:07:55.720 --> 00:07:56.240
<v Speaker 1>in the kitchen?

160
00:07:56.879 --> 00:08:00.160
<v Speaker 2>Kind of. Untested changes or updates made by teams who

161
00:08:00.160 --> 00:08:04.000
<v Speaker 2>aren't familiar with the whole system can accidentally introduce duplication

162
00:08:04.480 --> 00:08:07.600
<v Speaker 2>or create conflicts between different intents or mess up the

163
00:08:07.600 --> 00:08:08.920
<v Speaker 2>balance of the training data.

164
00:08:08.959 --> 00:08:09.519
<v Speaker 1>Wow.

165
00:08:09.879 --> 00:08:12.240
<v Speaker 2>Yeah. The book mentions a client where they saw their

166
00:08:12.319 --> 00:08:15.639
<v Speaker 2>classifier's accuracy just plummet from around eighty percent down to

167
00:08:15.680 --> 00:08:17.240
<v Speaker 2>like fifty-five percent over time.

168
00:08:17.519 --> 00:08:21.160
<v Speaker 1>Fifty-five percent? That's barely better than guessing for some things, right?

169
00:08:21.160 --> 00:08:23.879
<v Speaker 2>And it was because of all these unvetted changes piling up.

170
00:08:24.319 --> 00:08:27.480
<v Speaker 2>The insight here is that building an effective chatbot isn't

171
00:08:27.560 --> 00:08:31.120
<v Speaker 2>a one-and-done thing. It needs really diligent processes

172
00:08:31.480 --> 00:08:35.000
<v Speaker 2>to stop that kind of entropy from creeping in.

173
00:08:35.039 --> 00:08:38.039
<v Speaker 1>That's a huge drop. So how do we actually measure

174
00:08:38.080 --> 00:08:41.279
<v Speaker 1>this understanding for traditional AI, to stop that kind of

175
00:08:41.360 --> 00:08:42.519
<v Speaker 1>decline from happening?

176
00:08:42.559 --> 00:08:45.879
<v Speaker 2>Well, for traditional classification-based AI, we rely on a

177
00:08:45.919 --> 00:08:49.440
<v Speaker 2>few core metrics: accuracy, precision, and recall.

178
00:08:49.559 --> 00:08:52.440
<v Speaker 1>Okay, break those down for us. Accuracy seems straightforward.

179
00:08:52.639 --> 00:08:55.919
<v Speaker 2>Accuracy is, yeah, basically the overall percentage of correct predictions

180
00:08:55.960 --> 00:08:59.240
<v Speaker 2>the bot makes. Simple enough. Recall is the bot's ability

181
00:08:59.240 --> 00:09:02.039
<v Speaker 2>to identify the correct intent. Think of it as catching

182
00:09:02.080 --> 00:09:04.039
<v Speaker 2>all the relevant questions for a specific topic.

183
00:09:04.159 --> 00:09:06.360
<v Speaker 1>So if recall is low, it means.

184
00:09:06.240 --> 00:09:08.559
<v Speaker 2>The bot is missing a lot of relevant questions. Like

185
00:09:08.639 --> 00:09:11.559
<v Speaker 2>the example in the book, if a hashtag login issue

186
00:09:11.559 --> 00:09:14.120
<v Speaker 2>intent had a really low recall, maybe zero point four

187
00:09:14.159 --> 00:09:16.240
<v Speaker 2>to four, it means the bot missed more than half

188
00:09:16.279 --> 00:09:18.480
<v Speaker 2>the questions that were actually about login problems.

189
00:09:18.639 --> 00:09:20.320
<v Speaker 1>Ouch. Okay, and precision?

190
00:09:20.559 --> 00:09:23.679
<v Speaker 2>Precision, on the other hand, is the bot's ability to

191
00:09:23.759 --> 00:09:28.039
<v Speaker 2>avoid giving a wrong intent. So if precision is low,

192
00:09:28.480 --> 00:09:32.279
<v Speaker 2>your bot might be confidently misunderstanding users.

193
00:09:31.519 --> 00:09:33.399
<v Speaker 1>Which might be even worse.

194
00:09:33.840 --> 00:09:36.399
<v Speaker 2>It can be, yeah, more frustrating than the bot just

195
00:09:36.440 --> 00:09:39.639
<v Speaker 2>saying "I don't know." So the real insight here is

196
00:09:39.759 --> 00:09:43.080
<v Speaker 2>how critical it is to balance both precision and recall.

197
00:09:44.120 --> 00:09:46.879
<v Speaker 2>Sometimes improving one can actually hurt the other, so you

198
00:09:46.919 --> 00:09:47.759
<v Speaker 2>need to watch both.

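NOTE
A small worked example of the three metrics just discussed, computed per intent
from (gold, predicted) label pairs. The sample data and intent names are
invented for illustration. Recall is TP / (TP + FN) and precision is
TP / (TP + FP), using the standard definitions rather than anything specific
to the book.
```python
# Per-intent precision and recall from (gold, predicted) pairs; the sample data is invented.
def precision_recall(pairs, intent):
    # True positives, false positives, and false negatives for one intent.
    tp = sum(1 for gold, pred in pairs if gold == intent and pred == intent)
    fp = sum(1 for gold, pred in pairs if gold != intent and pred == intent)
    fn = sum(1 for gold, pred in pairs if gold == intent and pred != intent)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
def accuracy(pairs):
    # Share of all utterances whose predicted intent matches the gold intent.
    return sum(1 for gold, pred in pairs if gold == pred) / len(pairs)
SAMPLE = [
    ("login_issue", "login_issue"),
    ("login_issue", "fallback"),
    ("login_issue", "billing"),
    ("billing", "billing"),
    ("billing", "login_issue"),
]
```
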
199
00:09:47.840 --> 00:09:50.120
<v Speaker 1>That makes sense. It's a balancing act. So we're talking

200
00:09:50.279 --> 00:09:53.720
<v Speaker 1>rigorous measurement, But how do we actually test this in

201
00:09:53.759 --> 00:09:57.080
<v Speaker 1>a way that reflects the real world? You mentioned k-fold

202
00:09:57.120 --> 00:09:58.759
<v Speaker 1>cross-validation or blind testing.

203
00:09:58.879 --> 00:10:02.000
<v Speaker 2>Yeah, those are standard methods, and AI-generated data can

204
00:10:02.039 --> 00:10:05.360
<v Speaker 2>be useful for blind testing, especially when you're just starting out.

205
00:10:05.559 --> 00:10:08.200
<v Speaker 2>But what's really fascinating, and the book highlights this, is

206
00:10:08.200 --> 00:10:11.759
<v Speaker 2>that the most reliable, least biased testing data, it comes

207
00:10:11.759 --> 00:10:13.279
<v Speaker 2>from representative production logs.

208
00:10:12.639 --> 00:10:16.200
<v Speaker 1>Meaning the actual conversations people have had with

209
00:10:16.240 --> 00:10:17.200
<v Speaker 1>the bot?

210
00:10:17.360 --> 00:10:20.840
<v Speaker 2>Exactly. These logs show what users actually ask and precisely how

211
00:10:20.840 --> 00:10:23.120
<v Speaker 2>they phrase it. It gives you the truest measure of

212
00:10:23.159 --> 00:10:24.919
<v Speaker 2>how the bot performs in the wild.

213
00:10:25.000 --> 00:10:27.159
<v Speaker 1>But that sounds like it requires a lot of work

214
00:10:27.200 --> 00:10:28.679
<v Speaker 1>to go through and label correctly.

215
00:10:29.159 --> 00:10:33.120
<v Speaker 2>It often does. It frequently requires careful, sometimes even manual

216
00:10:33.159 --> 00:10:37.600
<v Speaker 2>annotation by humans to identify what the "golden," or correct,

217
00:10:37.639 --> 00:10:40.960
<v Speaker 2>intent should have been for each user message. But the

218
00:10:40.960 --> 00:10:45.720
<v Speaker 2>insight is clear. Real user data is the gold standard for testing.

219
00:10:46.080 --> 00:10:49.320
<v Speaker 1>It sounds like incredibly diligent work making sure the bot's

220
00:10:49.360 --> 00:10:52.720
<v Speaker 1>brain is truly learning the right lessons from real interactions.

221
00:10:52.919 --> 00:10:53.200
<v Speaker 2>It is.

222
00:10:53.360 --> 00:10:55.360
<v Speaker 1>Yeah, just as we're learning how to really fine-tune

223
00:10:55.399 --> 00:10:58.840
<v Speaker 1>these traditional systems, there's been this monumental shift in AI

224
00:10:58.960 --> 00:11:01.480
<v Speaker 1>that's just completely changing the rules of the game

225
00:11:01.480 --> 00:11:04.000
<v Speaker 1>for chatbots. Oh, absolutely. Which brings us to the real

226
00:11:04.000 --> 00:11:08.799
<v Speaker 1>game changer: generative AI. How is this revolutionizing the very

227
00:11:08.879 --> 00:11:10.759
<v Speaker 1>nature of conversational interaction?

228
00:11:11.080 --> 00:11:13.440
<v Speaker 2>Right. Generative AI, it's kind of a blanket term, really,

229
00:11:13.480 --> 00:11:16.600
<v Speaker 2>for AI that's powered by these foundation models. So specifically,

230
00:11:16.600 --> 00:11:19.639
<v Speaker 2>we're usually talking about large language models, or LLMs.

231
00:11:19.960 --> 00:11:22.159
<v Speaker 1>LLMs. We hear that term everywhere now.

232
00:11:22.200 --> 00:11:25.679
<v Speaker 2>Yeah, think of them as these incredibly vast machine

233
00:11:25.720 --> 00:11:28.960
<v Speaker 2>learning models. They've been trained on, well, basically all the

234
00:11:29.000 --> 00:11:32.080
<v Speaker 2>Internet's text, or huge chunks of it anyway.

235
00:11:32.039 --> 00:11:34.519
<v Speaker 1>Okay, and how do they work, fundamentally?

236
00:11:34.639 --> 00:11:38.039
<v Speaker 2>Their core function essentially is to predict the next word

237
00:11:38.120 --> 00:11:41.159
<v Speaker 2>in a sequence, and because they're trained on so much text,

238
00:11:41.399 --> 00:11:44.159
<v Speaker 2>they get incredibly good at it, good enough to generate

239
00:11:44.240 --> 00:11:48.039
<v Speaker 2>everything from coherent paragraphs to you know, entire pages of

240
00:11:48.080 --> 00:11:49.799
<v Speaker 2>text that sounds remarkably human.

241
00:11:50.039 --> 00:11:54.559
<v Speaker 1>Wow. Okay, so how do these incredibly powerful LLMs help

242
00:11:54.720 --> 00:11:58.240
<v Speaker 1>solve those common chatbot pain points we just talked about,

243
00:11:58.559 --> 00:12:00.240
<v Speaker 1>the ones that make us want to, you know, throw

244
00:12:00.240 --> 00:12:00.720
<v Speaker 1>our phones?

245
00:12:01.039 --> 00:12:04.480
<v Speaker 2>They offer potential solutions across the board, really. For that

246
00:12:04.559 --> 00:12:08.840
<v Speaker 2>weak understanding problem, LLMs can help train much stronger traditional

247
00:12:08.840 --> 00:12:11.840
<v Speaker 2>intents, or, and this is a big one, they can

248
00:12:11.879 --> 00:12:16.240
<v Speaker 2>even entirely replace traditional intent recognition using something called retrieval-augmented

249
00:12:16.240 --> 00:12:19.279
<v Speaker 2>generation, or RAG.

250
00:12:19.480 --> 00:12:20.840
<v Speaker 1>Okay, we'll definitely need to dive into that.

251
00:12:20.679 --> 00:12:23.039
<v Speaker 2>Yeah, we will. But the point is, LLMs

252
00:12:23.080 --> 00:12:25.240
<v Speaker 2>are just far more adaptive to nuance and all the

253
00:12:25.360 --> 00:12:26.919
<v Speaker 2>varied ways people phrase things.

254
00:12:27.039 --> 00:12:29.799
<v Speaker 1>And what about the complexity issue, bots getting too confusing?

255
00:12:30.039 --> 00:12:33.799
<v Speaker 2>They can help there too. LLMs can assist in writing simpler,

256
00:12:33.919 --> 00:12:36.919
<v Speaker 2>clearer dialogue for the bot, or they can even be

257
00:12:37.039 --> 00:12:40.759
<v Speaker 2>used to test dialogue flows for unexpected complexity before you

258
00:12:40.799 --> 00:12:41.799
<v Speaker 2>deploy them.

259
00:12:42.159 --> 00:12:45.000
<v Speaker 1>Okay, and the immediate opt-outs, people just giving up

260
00:12:45.120 --> 00:12:45.559
<v Speaker 1>right away?

261
00:12:45.919 --> 00:12:49.200
<v Speaker 2>Generative AI can help write much more engaging, maybe even

262
00:12:49.200 --> 00:12:53.440
<v Speaker 2>more empathetic prose for the bot's messages. Setting a better

263
00:12:53.480 --> 00:12:56.039
<v Speaker 2>tone right from the start can make a huge difference

264
00:12:56.039 --> 00:12:59.000
<v Speaker 2>in making the user feel heard and willing to continue.

265
00:12:59.080 --> 00:13:01.559
<v Speaker 1>So they're not just for the end user experience, but

266
00:13:01.600 --> 00:13:04.759
<v Speaker 1>they're also tools for the people actually building the bots.

267
00:13:04.799 --> 00:13:08.159
<v Speaker 1>I saw a table in the source about key applications.

268
00:13:08.519 --> 00:13:12.440
<v Speaker 2>Exactly. LLMs have both consumer-facing applications, like generating answers using

269
00:13:12.679 --> 00:13:16.720
<v Speaker 2>RAG, which we mentioned, or maybe summarizing long conversation transcripts

270
00:13:16.759 --> 00:13:19.559
<v Speaker 2>for human agents who take over a call. That's useful. Yeah,

271
00:13:19.679 --> 00:13:22.360
<v Speaker 2>hugely. And then they have powerful build-assistant tasks. They

272
00:13:22.399 --> 00:13:25.440
<v Speaker 2>can help copy edit or even write dialogue flows from scratch,

273
00:13:25.759 --> 00:13:28.080
<v Speaker 2>or they can augment training data for the human builders,

274
00:13:28.440 --> 00:13:30.240
<v Speaker 2>which is just a massive time saver.

275
00:13:30.679 --> 00:13:32.879
<v Speaker 1>But with all that power, especially if they're trained on

276
00:13:33.240 --> 00:13:37.759
<v Speaker 1>all the Internet's text, there must be some pretty significant dangers,

277
00:13:37.840 --> 00:13:39.720
<v Speaker 1>some pitfalls we need to watch out for.

278
00:13:39.759 --> 00:13:42.559
<v Speaker 2>Oh, absolutely, that's a critical point. The Internet, as we all

279
00:13:42.559 --> 00:13:46.879
<v Speaker 2>know, is full of bias, hateful speech, misinformation, you

280
00:13:47.000 --> 00:13:51.039
<v Speaker 2>name it, and LLMs can unfortunately learn from all of that.

281
00:13:51.840 --> 00:13:56.600
<v Speaker 2>So guardrails are absolutely crucial. Non-negotiable, really.

282
00:13:56.639 --> 00:13:58.200
<v Speaker 1>What kind of guardrails are we talking about?

283
00:13:58.320 --> 00:14:02.799
<v Speaker 2>Things like content filters, but also process guardrails, like a

284
00:14:02.840 --> 00:14:06.240
<v Speaker 2>beforehand review process. That means the LLM might assist a

285
00:14:06.320 --> 00:14:09.559
<v Speaker 2>human, maybe drafting a response, but the human always has

286
00:14:09.600 --> 00:14:12.759
<v Speaker 2>the final say and is ultimately responsible for the output.

287
00:14:12.879 --> 00:14:14.600
<v Speaker 1>Human in the loop.

288
00:14:14.720 --> 00:14:18.840
<v Speaker 2>Exactly, yeah. And perhaps most importantly, grounding the LLM's output in

289
00:14:18.919 --> 00:14:23.279
<v Speaker 2>your own company's verified, accurate documents through RAG. That stops

290
00:14:23.320 --> 00:14:24.720
<v Speaker 2>it from just pulling answers from the wild web.

291
00:14:24.600 --> 00:14:28.840
<v Speaker 1>Like that unforgettable Canadian airline chatbot example

292
00:14:28.879 --> 00:14:29.559
<v Speaker 1>that went viral.

293
00:14:29.799 --> 00:14:34.639
<v Speaker 2>Precisely. That case is legendary now. Their chatbot offered a

294
00:14:34.679 --> 00:14:38.679
<v Speaker 2>bereavement discount that didn't actually exist, based on some information

295
00:14:38.759 --> 00:14:41.120
<v Speaker 2>it hallucinated or pulled incorrectly.

296
00:14:40.720 --> 00:14:42.679
<v Speaker 1>And the airline tried to argue the bot was separate.

297
00:14:42.840 --> 00:14:45.320
<v Speaker 2>They tried to argue it was a separate legal entity, if

298
00:14:45.360 --> 00:14:48.679
<v Speaker 2>you can believe it. The court strongly disagreed and made

299
00:14:48.720 --> 00:14:52.399
<v Speaker 2>them honor the discount. Wow. It really underscores a critical insight.

300
00:14:52.919 --> 00:14:56.120
<v Speaker 2>Companies are responsible for what their bots say, which highlights

301
00:14:56.159 --> 00:15:01.519
<v Speaker 2>the absolute necessity of these guardrails, especially like RAG, to

302
00:15:01.720 --> 00:15:05.960
<v Speaker 2>ensure accuracy and frankly avoid legal nightmares.

303
00:15:06.000 --> 00:15:08.639
<v Speaker 1>Hmm, that's a very expensive lesson, and it really drives

304
00:15:08.679 --> 00:15:12.000
<v Speaker 1>home the need for proper implementation. So let's talk more

305
00:15:12.000 --> 00:15:15.200
<v Speaker 1>about this RAG, retrieval-augmented generation. You said it's a

306
00:15:15.240 --> 00:15:19.000
<v Speaker 1>big part of solving the weak understanding problem, especially for

307
00:15:19.080 --> 00:15:21.440
<v Speaker 1>those less common, more specific queries.

308
00:15:21.480 --> 00:15:25.080
<v Speaker 2>It absolutely is. Traditional intent-based systems, they really struggle

309
00:15:25.080 --> 00:15:26.720
<v Speaker 2>with what's called the long tail problem.

310
00:15:26.919 --> 00:15:27.399
<v Speaker 1>Long tail?

311
00:15:27.480 --> 00:15:30.399
<v Speaker 2>Yeah, think about it. Imagine trying to write a specific

312
00:15:30.759 --> 00:15:34.679
<v Speaker 2>rule or train an intent for every single possible question

313
00:15:34.799 --> 00:15:37.960
<v Speaker 2>someone could ask about your products or services. It's an

314
00:15:37.960 --> 00:15:41.200
<v Speaker 2>impossible task, right? There's a long tail of very specific,

315
00:15:41.480 --> 00:15:42.679
<v Speaker 2>infrequent questions.

316
00:15:42.799 --> 00:15:46.039
<v Speaker 1>Right, you can cover the common stuff, but not everything.

317
00:15:46.480 --> 00:15:49.440
<v Speaker 2>Exactly. So when users ask questions that deviate from those

318
00:15:49.519 --> 00:15:53.159
<v Speaker 2>predefined intents you did train, or questions that are simply

319
00:15:53.279 --> 00:15:57.080
<v Speaker 2>too uncommon to have specific training data for, the traditional

320
00:15:57.159 --> 00:15:59.600
<v Speaker 2>bot just breaks down. It throws up its hands because

321
00:15:59.639 --> 00:16:00.799
<v Speaker 2>it has no rule to follow.

322
00:16:01.200 --> 00:16:05.000
<v Speaker 1>Okay, so RAG is the answer to that long tail problem.

323
00:16:05.399 --> 00:16:07.840
<v Speaker 1>How does it handle it differently, say, compared to just

324
00:16:07.919 --> 00:16:09.679
<v Speaker 1>adding a search function to the chatbot.

325
00:16:09.840 --> 00:16:12.240
<v Speaker 2>That's a great comparison. Let's think about traditional search within

326
00:16:12.279 --> 00:16:14.519
<v Speaker 2>a chatbot first. It would work kind of like

327
00:16:14.559 --> 00:16:18.919
<v Speaker 2>the pharma bot example, before RAG. It finds relevant documents or passages,

328
00:16:18.960 --> 00:16:22.360
<v Speaker 2>maybe on ibuprofen and blood pressure. Okay, the benefits are clear.

329
00:16:22.879 --> 00:16:25.720
<v Speaker 2>You get a breadth of information, it's relatively easy to maintain,

330
00:16:25.879 --> 00:16:28.759
<v Speaker 2>just add or edit your documents, and search technology is

331
00:16:28.799 --> 00:16:32.559
<v Speaker 2>well established. But the downsides, the drawbacks are significant for

332
00:16:32.639 --> 00:16:36.360
<v Speaker 2>the user experience. It often just returns links or maybe

333
00:16:36.399 --> 00:16:40.559
<v Speaker 2>short snippets of text. It forces the user to click through, read,

334
00:16:40.639 --> 00:16:43.480
<v Speaker 2>and piece together the answer themselves, which is really frustrating.

335
00:16:43.519 --> 00:16:45.799
<v Speaker 1>You ask the bot for an answer, not homework.

336
00:16:45.559 --> 00:16:49.240
<v Speaker 2>Exactly, and it's particularly bad for voice interactions. You can't

337
00:16:49.279 --> 00:16:51.559
<v Speaker 2>exactly click a link when you're talking to a voice assistant.

338
00:16:51.639 --> 00:16:54.480
<v Speaker 1>Good point. So how does RAG improve on that?

339
00:16:54.799 --> 00:16:59.039
<v Speaker 2>This is where retrieval-augmented generation really shines. It offers

340
00:16:59.080 --> 00:17:03.600
<v Speaker 2>a truly powerful leap forward. RAG combines that search-based

341
00:17:03.639 --> 00:17:07.039
<v Speaker 2>retrieval step with the power of generative models, the LLMs.

342
00:17:07.160 --> 00:17:09.279
<v Speaker 1>Okay, so it searches and generates. Precisely.

343
00:17:09.920 --> 00:17:12.160
<v Speaker 2>The insight here is that it first retrieves the most

344
00:17:12.200 --> 00:17:15.720
<v Speaker 2>relevant passages from your own verified knowledge base, your documents,

345
00:17:15.720 --> 00:17:18.640
<v Speaker 2>your website content, whatever you feed it, and then the

346
00:17:18.799 --> 00:17:23.279
<v Speaker 2>LLM takes those retrieved passages and synthesizes them into a cohesive,

347
00:17:23.480 --> 00:17:26.279
<v Speaker 2>contextually aware answer in natural language.

348
00:17:26.359 --> 00:17:28.559
<v Speaker 1>Ah, so instead of just giving me links about ibuprofen

349
00:17:28.599 --> 00:17:31.240
<v Speaker 1>and blood pressure, the pharma bot with RAG would

350
00:17:31.240 --> 00:17:34.200
<v Speaker 1>actually read those relevant bits and then write me a clear,

351
00:17:34.359 --> 00:17:35.480
<v Speaker 1>single summary answer.

352
00:17:35.880 --> 00:17:40.400
<v Speaker 2>Exactly, and crucially, that answer is grounded in that verified

353
00:17:40.400 --> 00:17:41.799
<v Speaker 2>source information you provided.

354
00:17:41.960 --> 00:17:44.559
<v Speaker 1>Grounded. That seems like the key word. It really is.

355
00:17:45.160 --> 00:17:48.359
<v Speaker 2>It means the answer is based on your accurate, up

356
00:17:48.359 --> 00:17:51.279
<v Speaker 2>to date data, not just the LLM's general knowledge

357
00:17:51.359 --> 00:17:55.039
<v Speaker 2>scraped from the internet years ago. This dramatically expands the

358
00:17:55.079 --> 00:17:58.839
<v Speaker 2>bot's versatility. It can answer way more questions far more accurately,

359
00:17:59.319 --> 00:18:03.279
<v Speaker 2>and it significantly reduces those "bot doesn't understand" and "too

360
00:18:03.319 --> 00:18:04.720
<v Speaker 2>much complexity" pain points.

361
00:18:04.799 --> 00:18:07.559
<v Speaker 1>Okay, that sounds amazing. How is it actually implemented behind

362
00:18:07.559 --> 00:18:10.119
<v Speaker 1>the scenes? It sounds potentially quite complex.

363
00:18:10.440 --> 00:18:14.160
<v Speaker 2>It involves a few pretty fascinating steps. First, your large

364
00:18:14.160 --> 00:18:18.039
<v Speaker 2>documents, think manuals, website pages, knowledge base articles, are broken

365
00:18:18.079 --> 00:18:21.519
<v Speaker 2>down or chunked into smaller manageable pieces, maybe paragraphs or

366
00:18:21.599 --> 00:18:24.960
<v Speaker 2>logical sections. Chunking, got it. Then an AI model called

367
00:18:24.960 --> 00:18:29.599
<v Speaker 2>an embedding model converts these chunks into numerical representations. We

368
00:18:29.680 --> 00:18:31.680
<v Speaker 2>call these embeddings.

369
00:18:31.160 --> 00:18:33.920
<v Speaker 1>Numerical representations like coordinates on a map.

370
00:18:34.160 --> 00:18:37.279
<v Speaker 2>Kind of, yeah. It's like creating a unique numeric fingerprint

371
00:18:37.480 --> 00:18:40.119
<v Speaker 2>for the meaning of each piece of text. Texts with

372
00:18:40.240 --> 00:18:43.839
<v Speaker 2>similar meanings end up with similar fingerprints or closer coordinates

373
00:18:43.839 --> 00:18:48.079
<v Speaker 2>in this high-dimensional space. Whoa. These embeddings, these fingerprints,

374
00:18:48.279 --> 00:18:51.000
<v Speaker 2>are then stored in a special kind of database called

375
00:18:51.000 --> 00:18:54.160
<v Speaker 2>a vector database. Think of it as a super fast,

376
00:18:54.240 --> 00:18:57.839
<v Speaker 2>intelligent library index that doesn't just look for keywords, but

377
00:18:57.920 --> 00:19:00.240
<v Speaker 2>for semantic similarity, for similar meaning.

378
00:19:00.599 --> 00:19:03.839
<v Speaker 1>Okay, so you've indexed all your chunked documents by their meaning.

379
00:19:04.000 --> 00:19:05.640
<v Speaker 1>What happens when I ask a question?

380
00:19:06.079 --> 00:19:08.839
<v Speaker 2>Right. At runtime, when you ask something, your question is

381
00:19:08.880 --> 00:19:12.000
<v Speaker 2>also turned into an embedding using the same model. The

382
00:19:12.039 --> 00:19:15.079
<v Speaker 2>system then searches the vector database to find the chunks

383
00:19:15.079 --> 00:19:19.039
<v Speaker 2>whose embeddings are closest, meaning most semantically similar to your

384
00:19:19.119 --> 00:19:20.400
<v Speaker 2>question's embedding. So it

385
00:19:20.359 --> 00:19:22.400
<v Speaker 1>finds the most relevant paragraphs based

386
00:19:22.200 --> 00:19:25.880
<v Speaker 2>on meaning. Exactly, and then those retrieved passages are fed

387
00:19:25.880 --> 00:19:29.400
<v Speaker 2>to the LLM along with your original question with instructions

388
00:19:29.440 --> 00:19:33.440
<v Speaker 2>like "answer the user's question based only on this provided information."

389
00:19:34.160 --> 00:19:37.160
<v Speaker 2>The LLM then synthesizes the final grounded answer.

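NOTE
Editor's sketch of the indexing and retrieval steps described in the cues above, under stated assumptions. It is a toy, not the book's implementation: the bag-of-words embed function stands in for a real embedding model, the in-memory list stands in for a vector database, the sample documents are made up, and every name here (chunk, embed, cosine, retrieve, build_prompt) is hypothetical.
```python
import math
import re
from collections import Counter
def chunk(document, max_words=40):
    """Split a document into pieces of at most max_words words."""
    words = document.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]
def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))
def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0
def retrieve(question, index, top_k=2):
    """Return the top_k chunks whose embeddings are closest to the question's."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]
def build_prompt(question, passages):
    """Assemble the grounded prompt that would be sent to an LLM."""
    context = "\n".join("- " + p for p in passages)
    return ("Answer the user's question based only on this provided information.\n"
            + context + "\nQuestion: " + question)
# Made-up sample knowledge base (an assumption for illustration only).
docs = [
    "Ibuprofen can raise blood pressure in some people, so ask a pharmacist first.",
    "Our store is open from nine to five on weekdays.",
]
# Index time: chunk every document and store each chunk with its embedding.
index = [(c, embed(c)) for d in docs for c in chunk(d)]
# Run time: embed the question, fetch the closest chunk, build the prompt.
passages = retrieve("Does ibuprofen affect blood pressure?", index, top_k=1)
prompt = build_prompt("Does ibuprofen affect blood pressure?", passages)
print(prompt)
```
In a real system the prompt would then go to an LLM; swapping embed for a learned embedding model is what lets retrieval match on meaning rather than shared words.
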
390
00:19:37.559 --> 00:19:43.920
<v Speaker 1>Wow, that's a lot of intricate steps: chunking, embeddings, vector databases, retrieval, synthesis.

391
00:19:44.400 --> 00:19:46.240
<v Speaker 1>So for you, the listener, who might be thinking, Okay,

392
00:19:46.319 --> 00:19:49.160
<v Speaker 1>if the LLMs are so powerful, why not just ask

393
00:19:49.200 --> 00:19:52.160
<v Speaker 1>the LLM directly? Why bother with all this RAG stuff?

394
00:19:52.400 --> 00:19:54.960
<v Speaker 2>That's a really crucial question, and the reason is simple:

395
00:19:55.720 --> 00:19:59.359
<v Speaker 2>control and reliability. LLMs used on their own

396
00:19:59.359 --> 00:20:01.720
<v Speaker 1>can hallucinate. Hallucinate? Make things up?

397
00:20:01.680 --> 00:20:06.359
<v Speaker 2>Yeah, literally make up facts, or provide outdated information because their

398
00:20:06.400 --> 00:20:09.640
<v Speaker 2>training data isn't perfectly current. Remember they're trained on a

399
00:20:09.680 --> 00:20:13.519
<v Speaker 2>massive general data set, but they don't inherently know the specific,

400
00:20:13.680 --> 00:20:16.480
<v Speaker 2>up to the minute details of your company's policies or

401
00:20:16.519 --> 00:20:19.480
<v Speaker 2>product features, and you have much less direct control over

402
00:20:19.519 --> 00:20:20.519
<v Speaker 2>the answers they generate.

403
00:20:20.640 --> 00:20:22.079
<v Speaker 1>Okay, So RAG fixes that.

404
00:20:22.599 --> 00:20:26.359
<v Speaker 2>RAG directly addresses this. By grounding the LLM's answers in

405
00:20:26.400 --> 00:20:30.039
<v Speaker 2>your specific, verified, up-to-date documents, you ensure accuracy

406
00:20:30.079 --> 00:20:33.559
<v Speaker 2>and reliability. You maintain control over the knowledge source. It's

407
00:20:33.599 --> 00:20:37.960
<v Speaker 2>about combining that amazing generative power of the LLM with controlled, accurate,

408
00:20:38.119 --> 00:20:39.200
<v Speaker 2>trustworthy knowledge.

409
00:20:39.359 --> 00:20:42.119
<v Speaker 1>That makes perfect sense. It's like giving the LLM guardrails

410
00:20:42.160 --> 00:20:43.279
<v Speaker 1>made of your own information.

411
00:20:43.680 --> 00:20:44.640
<v Speaker 2>That's a great way to put it.

412
00:20:44.720 --> 00:20:47.319
<v Speaker 1>Okay, So you've built this amazing bot. Maybe it uses

413
00:20:47.359 --> 00:20:50.519
<v Speaker 1>RAG, maybe it has finely tuned intents, it's got guardrails.

414
00:20:51.440 --> 00:20:53.519
<v Speaker 1>You're done, right? Set it and forget it.

415
00:20:53.640 --> 00:20:56.880
<v Speaker 2>Oh, if only. Wouldn't that be nice? No, conversational AI

416
00:20:57.000 --> 00:20:59.640
<v Speaker 2>is definitely not static. It can't be. Why not? Which

417
00:20:59.720 --> 00:21:03.599
<v Speaker 2>leads us straight to the critical insight of continuous improvement.

418
00:21:04.119 --> 00:21:08.559
<v Speaker 2>Think about it. User needs change constantly, business rules and

419
00:21:08.559 --> 00:21:13.920
<v Speaker 2>policies evolve, new technologies like generative AI itself emerge, and

420
00:21:14.160 --> 00:21:16.799
<v Speaker 2>better AI models become available. All the time.

421
00:21:16.960 --> 00:21:18.759
<v Speaker 1>So if you don't keep up, the bot just gets

422
00:21:18.799 --> 00:21:19.519
<v Speaker 1>worse over time.

423
00:21:19.720 --> 00:21:23.880
<v Speaker 2>Absolutely, performance inevitably degrades if you don't actively maintain and

424
00:21:23.920 --> 00:21:26.680
<v Speaker 2>improve it. The book calls it fighting entropy in a

425
00:21:26.720 --> 00:21:29.799
<v Speaker 2>constantly changing environment. You have to keep investing just to

426
00:21:29.839 --> 00:21:31.559
<v Speaker 2>stay level, let alone get better.

427
00:21:31.680 --> 00:21:35.000
<v Speaker 1>Okay, So what does this essential continuous improvement cycle look

428
00:21:35.119 --> 00:21:36.920
<v Speaker 1>like in practice? Is there a process?

429
00:21:37.039 --> 00:21:40.200
<v Speaker 2>Yes, it's an iterative process, a constant loop of refinement. Really,

430
00:21:40.599 --> 00:21:43.720
<v Speaker 2>you first measure the baseline performance of your system. Get

431
00:21:43.759 --> 00:21:44.359
<v Speaker 2>your starting

432
00:21:44.160 --> 00:21:45.880
<v Speaker 1>point. Okay, establish the baseline.

433
00:21:45.960 --> 00:21:49.400
<v Speaker 2>Then you identify a problem, what's not working well? And

434
00:21:49.480 --> 00:21:53.000
<v Speaker 2>ideally you connect that problem directly to a business metric.

435
00:21:53.079 --> 00:21:56.400
<v Speaker 2>Not just the bot is confused, but maybe too many

436
00:21:56.400 --> 00:21:59.920
<v Speaker 2>calls about X are transferring to an agent, or customer satisfaction

437
00:22:00.079 --> 00:22:02.240
<v Speaker 2>scores are declining for Y reason.

438
00:22:02.480 --> 00:22:04.279
<v Speaker 1>Make it concrete and business relevant.

439
00:22:04.359 --> 00:22:07.599
<v Speaker 2>Got it. Exactly. Next, you devise a solution. What's your

440
00:22:07.640 --> 00:22:10.759
<v Speaker 2>plan to fix it? Then you develop and deliver those changes.

441
00:22:11.240 --> 00:22:15.039
<v Speaker 2>And crucially, the book advises making small, iterative changes

442
00:22:15.359 --> 00:22:18.759
<v Speaker 2>rather than huge, big bang updates. Right, small changes. Because

443
00:22:18.799 --> 00:22:21.559
<v Speaker 2>they have a smaller blast zone. If something goes wrong,

444
00:22:21.920 --> 00:22:24.519
<v Speaker 2>it's easier to roll back a small change that causes

445
00:22:24.559 --> 00:22:28.359
<v Speaker 2>problems than a massive overhaul. Makes sense, safer. And the

446
00:22:28.440 --> 00:22:32.279
<v Speaker 2>last step? Finally, you monitor and evaluate. Did the changes

447
00:22:32.319 --> 00:22:35.160
<v Speaker 2>actually deliver the improvements you expected? Did that metric you

448
00:22:35.160 --> 00:22:37.599
<v Speaker 2>were tracking actually move in the right direction? Then you

449
00:22:37.640 --> 00:22:43.079
<v Speaker 2>start the loop again: measure, identify, solve, deliver, monitor.

450
00:22:42.720 --> 00:22:45.319
<v Speaker 1>And it's all about connecting this technical work back to

451
00:22:45.599 --> 00:22:48.200
<v Speaker 1>actual business value, isn't it? You mentioned metrics. It can't

452
00:22:48.240 --> 00:22:49.680
<v Speaker 1>just be tech jargon for its own

453
00:22:49.559 --> 00:22:53.000
<v Speaker 2>sake. Absolutely not. You have to speak the stakeholder's language.

454
00:22:53.440 --> 00:22:55.759
<v Speaker 2>That means focusing on things like cost reduction.

455
00:22:56.160 --> 00:22:59.359
<v Speaker 1>How does a better bot reduce costs? Several

456
00:22:59.039 --> 00:23:03.119
<v Speaker 2>ways. Through containment, that's completing calls or interactions without any

457
00:23:03.160 --> 00:23:08.519
<v Speaker 2>human involvement, by reducing average handle time or AHT for

458
00:23:08.559 --> 00:23:11.279
<v Speaker 2>the human agents who do get involved because maybe the

459
00:23:11.319 --> 00:23:16.039
<v Speaker 2>bot gathered info better, or by reducing human touches which

460
00:23:16.119 --> 00:23:18.759
<v Speaker 2>means fewer calls getting routed to the wrong place and

461
00:23:18.839 --> 00:23:19.920
<v Speaker 2>needing another transfer.

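NOTE
Editor's sketch of how the two cost metrics just described, containment and average handle time (AHT), could be computed from a conversation log. The record fields (escalated_to_human, agent_seconds) and the sample log are hypothetical, chosen only for illustration.
```python
def containment_rate(conversations):
    """Share of conversations completed without any human involvement."""
    if not conversations:
        return 0.0
    contained = sum(1 for c in conversations if not c["escalated_to_human"])
    return contained / len(conversations)
def average_handle_time(conversations):
    """Mean agent handle time in seconds, over escalated conversations only."""
    times = [c["agent_seconds"] for c in conversations if c["escalated_to_human"]]
    return sum(times) / len(times) if times else 0.0
# Made-up log: two contained conversations, two that reached a human agent.
log = [
    {"escalated_to_human": False, "agent_seconds": 0},
    {"escalated_to_human": True, "agent_seconds": 300},
    {"escalated_to_human": False, "agent_seconds": 0},
    {"escalated_to_human": True, "agent_seconds": 180},
]
print(containment_rate(log))
print(average_handle_time(log))
```
Tracking both numbers before and after each change gives the baseline and the monitoring step of the improvement loop something concrete to compare.
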
462
00:23:20.079 --> 00:23:21.720
<v Speaker 1>Okay, cost reduction? What else?

463
00:23:21.880 --> 00:23:24.559
<v Speaker 2>The other big one is customer satisfaction, or CSAT.

464
00:23:24.720 --> 00:23:27.279
<v Speaker 2>You measure that with things like Net Promoter Score, or

465
00:23:27.400 --> 00:23:31.720
<v Speaker 2>NPS, surveys, or by looking at metrics like time to resolution.

466
00:23:31.880 --> 00:23:34.960
<v Speaker 2>How quickly did the customer get their issue solved? Or

467
00:23:34.960 --> 00:23:37.839
<v Speaker 2>even reduced customer churn. Are fewer customers

468
00:23:37.440 --> 00:23:39.759
<v Speaker 1>leaving you? That example in the book about the medical

469
00:23:39.839 --> 00:23:43.039
<v Speaker 1>insurer, though, that was fascinating. They improved the accuracy of

470
00:23:43.079 --> 00:23:45.359
<v Speaker 1>their claim denial reason intent, right?

471
00:23:45.200 --> 00:23:48.559
<v Speaker 2>And it worked. It increased containment, more people got the

472
00:23:48.599 --> 00:23:51.839
<v Speaker 2>answer from the bot, but their NPS scores actually dropped

473
00:23:52.480 --> 00:23:56.039
<v Speaker 2>because the unhappy callers who previously would have just escalated

474
00:23:56.039 --> 00:23:58.680
<v Speaker 2>to complain to a human were now self serving with

475
00:23:58.759 --> 00:24:01.559
<v Speaker 2>the bot and then taking the post call survey and

476
00:24:01.680 --> 00:24:02.960
<v Speaker 2>expressing their displeasure.

477
00:24:03.160 --> 00:24:06.759
<v Speaker 1>Wow, so fixing one metric hurt another. What a nuanced problem.

478
00:24:06.880 --> 00:24:09.799
<v Speaker 2>It's a perfect illustration that business goals can sometimes actually

479
00:24:09.839 --> 00:24:13.359
<v Speaker 2>contradict each other. The real insight here is that you

480
00:24:13.440 --> 00:24:17.319
<v Speaker 2>need to deeply understand those nuances and potential trade offs.

481
00:24:17.599 --> 00:24:21.920
<v Speaker 2>Solving one problem might just uncover or even create another

482
00:24:21.680 --> 00:24:24.440
<v Speaker 1>one. Which brings us beautifully to the user journey itself.

483
00:24:24.720 --> 00:24:27.680
<v Speaker 1>Streamlining it, reducing complexity, and trying to stop those

484
00:24:27.759 --> 00:24:30.759
<v Speaker 1>dreaded opt outs. The book talks quite a bit about

485
00:24:30.759 --> 00:24:31.839
<v Speaker 1>the pain of complexity.

486
00:24:32.000 --> 00:24:36.079
<v Speaker 2>Yeah, complex conversations are just intimidating and confusing for users.

487
00:24:36.240 --> 00:24:38.880
<v Speaker 2>They lead directly to friction, people having to retry things

488
00:24:39.079 --> 00:24:41.240
<v Speaker 2>and ultimately abandonment, giving up.

489
00:24:41.319 --> 00:24:44.319
<v Speaker 1>Like that insurance company's voice system for checking claim status.

490
00:24:44.359 --> 00:24:46.319
<v Speaker 2>Oh, that was a rough one. It had only a

491
00:24:46.319 --> 00:24:49.960
<v Speaker 2>forty percent success rate. Forty percent? Why so low? Because

492
00:24:50.000 --> 00:24:54.279
<v Speaker 2>it required users to verbally input five separate pieces of

493
00:24:54.359 --> 00:24:57.880
<v Speaker 2>numeric information before it would even start searching for the claim.

494
00:24:58.160 --> 00:25:02.880
<v Speaker 2>Think about that: provider ID, member ID, claim date, claim number.

495
00:25:02.960 --> 00:25:05.359
<v Speaker 1>That sounds exhausting just listening to it. Who has all

496
00:25:05.359 --> 00:25:05.839
<v Speaker 1>that ready?

497
00:25:05.960 --> 00:25:09.119
<v Speaker 2>Exactly. It was grueling. And the insight here is that

498
00:25:09.240 --> 00:25:12.440
<v Speaker 2>every single extra step, every additional piece of information you

499
00:25:12.519 --> 00:25:15.960
<v Speaker 2>require from the user, creates another potential point where they

500
00:25:15.960 --> 00:25:16.599
<v Speaker 2>just drop off.

501
00:25:16.680 --> 00:25:18.559
<v Speaker 1>So how did they fix it? What was the impact?

502
00:25:19.160 --> 00:25:22.359
<v Speaker 2>They simplified it by using the caller ID to potentially

503
00:25:22.440 --> 00:25:25.480
<v Speaker 2>identify the member automatically and then only asking for I

504
00:25:25.480 --> 00:25:29.640
<v Speaker 2>think four fields instead of five. They drastically improved the

505
00:25:29.680 --> 00:25:33.240
<v Speaker 2>success rate. That relatively small simplification made a huge difference

506
00:25:33.279 --> 00:25:36.119
<v Speaker 2>to the user experience and therefore the business outcome.

507
00:25:36.279 --> 00:25:40.319
<v Speaker 1>So how can builders spot these overly complex dialogue flows

508
00:25:40.359 --> 00:25:43.480
<v Speaker 1>in their own bots? Are there specific warning signs to

509
00:25:43.519 --> 00:25:46.240
<v Speaker 1>look for? Anti-patterns, the book called them.

510
00:25:46.319 --> 00:25:49.319
<v Speaker 2>Yeah, the book lists several good ones. Asking for information

511
00:25:49.440 --> 00:25:52.680
<v Speaker 2>users are unlikely to have handy right then, having really

512
00:25:52.759 --> 00:25:56.160
<v Speaker 2>rigid input requirements like it only accepts dates in one

513
00:25:56.200 --> 00:25:57.960
<v Speaker 2>specific format, no flexibility.

514
00:25:58.079 --> 00:25:59.000
<v Speaker 1>Yeah, that's annoying.

515
00:25:59.119 --> 00:26:02.440
<v Speaker 2>Asking ambiguous questions where the user isn't sure what's being asked,

516
00:26:02.480 --> 00:26:05.400
<v Speaker 2>Treating all users exactly the same, regardless of their context

517
00:26:05.519 --> 00:26:09.599
<v Speaker 2>or history, Presenting too many options at once, choice overload,

518
00:26:10.200 --> 00:26:13.799
<v Speaker 2>asking for information in a weird, disjointed order that doesn't

519
00:26:13.799 --> 00:26:17.880
<v Speaker 2>feel natural, and a big one for voice delivering channel

520
00:26:17.960 --> 00:26:21.240
<v Speaker 2>unsuited information, like trying to read out a really long

521
00:26:21.400 --> 00:26:25.759
<v Speaker 2>complex URL over the phone. Clear signs of unnecessary complexity.

522
00:26:25.839 --> 00:26:29.279
<v Speaker 1>Okay, so once you spot these complexity traps, how do

523
00:26:29.359 --> 00:26:31.759
<v Speaker 1>you actively simplify the journey for the user?

524
00:26:31.920 --> 00:26:34.920
<v Speaker 2>Well, A key strategy and a really powerful insight here

525
00:26:35.160 --> 00:26:39.559
<v Speaker 2>is using contextual information to personalize the experience. How so? Based

526
00:26:39.559 --> 00:26:42.599
<v Speaker 2>on things you might already know about the user, their location,

527
00:26:43.000 --> 00:26:45.359
<v Speaker 2>maybe their time zone, the type of device they're using,

528
00:26:45.519 --> 00:26:49.880
<v Speaker 2>their stated preferences, maybe even their past behavior or previous interactions

529
00:26:49.359 --> 00:26:53.400
<v Speaker 1>with you. Like that banking chatbot example, Max giving generic advice.

530
00:26:53.160 --> 00:26:57.039
<v Speaker 2>Exactly, Max giving generic credit card advice to Emma, who's

531
00:26:57.079 --> 00:27:01.839
<v Speaker 2>already a customer. That's just frustrating and unhelpful. But if

532
00:27:01.880 --> 00:27:05.440
<v Speaker 2>Max knew Emma's transaction history, maybe her current credit cards,

533
00:27:05.799 --> 00:27:10.079
<v Speaker 2>it could give much more tailored, relevant recommendations. That saves

534
00:27:10.079 --> 00:27:13.200
<v Speaker 2>her time, saves her effort, and makes the interaction feel

535
00:27:13.240 --> 00:27:14.160
<v Speaker 2>actually valuable.

536
00:27:14.279 --> 00:27:17.079
<v Speaker 1>That's a great application of using what you know. What

537
00:27:17.200 --> 00:27:18.640
<v Speaker 1>else helps simplify things?

538
00:27:18.880 --> 00:27:21.559
<v Speaker 2>Slot filling is another really powerful technique.

539
00:27:21.599 --> 00:27:23.839
<v Speaker 1>Slot filling? Like filling in blanks?

540
00:27:23.640 --> 00:27:26.720
<v Speaker 2>Kind of. It allows the bot to capture multiple pieces

541
00:27:26.759 --> 00:27:30.240
<v Speaker 2>of information from a single user utterance, letting it skip

542
00:27:30.319 --> 00:27:31.880
<v Speaker 2>unnecessary follow up questions.

543
00:27:31.960 --> 00:27:33.519
<v Speaker 1>Oh, I see. Like if I say...

544
00:27:33.440 --> 00:27:34.960
<v Speaker 2>Like if you say, "I'd like to make a reservation

545
00:27:35.039 --> 00:27:37.599
<v Speaker 2>for two people this Saturday at eight pm," a good

546
00:27:37.640 --> 00:27:40.599
<v Speaker 2>bot can pull out the party size, two, the day,

547
00:27:40.880 --> 00:27:44.400
<v Speaker 2>Saturday, and the time, 8 pm, all from that one sentence.

548
00:27:44.680 --> 00:27:47.319
<v Speaker 2>It doesn't need to ask how many people, then what day,

549
00:27:47.480 --> 00:27:48.160
<v Speaker 2>then what time.

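NOTE
Editor's sketch of slot filling on the reservation example from the cues above. It is a regex toy under stated assumptions, not how any particular bot platform does it; real systems typically use trained entity extractors. The slot names and helper functions (fill_slots, missing_slots, to_int) are hypothetical.
```python
import re
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6, "seven": 7, "eight": 8}
DAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]
def to_int(token):
    """Convert a digit string or a small number word to an int."""
    return int(token) if token.isdigit() else NUMBER_WORDS[token]
def fill_slots(utterance):
    """Extract party_size, day, and time from one utterance; missing slots stay None."""
    text = utterance.lower()
    number = r"(\d+|" + "|".join(NUMBER_WORDS) + r")"
    slots = {"party_size": None, "day": None, "time": None}
    party = re.search(r"\bfor " + number + r" (?:people|persons|guests)\b", text)
    if party:
        slots["party_size"] = to_int(party.group(1))
    day = re.search(r"\b(" + "|".join(DAYS) + r")\b", text)
    if day:
        slots["day"] = day.group(1).capitalize()
    time = re.search(r"\bat " + number + r" ?(am|pm)\b", text)
    if time:
        slots["time"] = str(to_int(time.group(1))) + " " + time.group(2)
    return slots
def missing_slots(slots):
    """Names of the slots the bot still has to ask about."""
    return [name for name, value in slots.items() if value is None]
full = fill_slots("I'd like to make a reservation for two people this Saturday at eight pm")
partial = fill_slots("I'd like a table on Friday")
print(full, missing_slots(full))
print(partial, missing_slots(partial))
```
The bot only asks follow-up questions for whatever missing_slots returns, so a complete first utterance skips them all.
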
550
00:27:48.359 --> 00:27:51.480
<v Speaker 1>Right, much smoother. The insight is really about efficiency, then:

551
00:27:51.839 --> 00:27:53.440
<v Speaker 1>move the conversation forward

552
00:27:53.200 --> 00:27:57.400
<v Speaker 2>faster. Exactly, get to the point without tedious back and forth.

553
00:27:57.119 --> 00:28:00.319
<v Speaker 1>And allowing for flexibility too, I imagine. Users don't always

554
00:28:00.319 --> 00:28:02.559
<v Speaker 1>say exactly what the bot expects them to say.

555
00:28:02.400 --> 00:28:05.839
<v Speaker 2>Oh, never. So you have to design for multiple correct responses,

556
00:28:06.319 --> 00:28:10.079
<v Speaker 2>especially with things like yes/no or choice confusion.

557
00:28:09.720 --> 00:28:13.200
<v Speaker 1>Like the example where the bot asks, "Text message or phone

558
00:28:12.920 --> 00:28:16.079
<v Speaker 2>call?" Right, and the user just says "yes." A rigid

559
00:28:16.079 --> 00:28:18.920
<v Speaker 2>bot might just say "Sorry, I didn't understand," but a

560
00:28:18.960 --> 00:28:23.079
<v Speaker 2>flexible bot could perhaps infer that yes means text message

561
00:28:23.119 --> 00:28:25.799
<v Speaker 2>if that's the first option or the more common preference,

562
00:28:26.119 --> 00:28:29.359
<v Speaker 2>or at least ask a better clarifying question. The bot

563
00:28:29.400 --> 00:28:31.640
<v Speaker 2>needs to try hard to meet the user where they are,

564
00:28:32.279 --> 00:28:33.960
<v Speaker 2>not force them into a rigid script.

565
00:28:34.279 --> 00:28:37.279
<v Speaker 1>Okay, so we've talked complexity, but why else do users

566
00:28:37.400 --> 00:28:41.640
<v Speaker 1>just bail on chatbots, give up and leave? Besides getting confused?

567
00:28:41.680 --> 00:28:44.480
<v Speaker 2>Well, there are several reasons, and the insight here often

568
00:28:44.519 --> 00:28:48.119
<v Speaker 2>comes down to trust and managing expectations. Sometimes it's just

569
00:28:48.160 --> 00:28:52.359
<v Speaker 2>prior poor experiences with other bots. They come in already skeptical.

570
00:28:52.160 --> 00:28:53.640
<v Speaker 1>Fair enough. Scar tissue.

571
00:28:53.720 --> 00:28:58.039
<v Speaker 2>Yeah. Other big reasons include the bot clearly not understanding

572
00:28:58.119 --> 00:29:01.559
<v Speaker 2>them or asking them to rephrase too many times, feeling

573
00:29:01.640 --> 00:29:03.440
<v Speaker 2>like they're stuck in a loop, or just not making

574
00:29:03.480 --> 00:29:07.279
<v Speaker 2>progress towards their actual goal, or sometimes they simply don't

575
00:29:07.319 --> 00:29:09.440
<v Speaker 2>like the answer the bot gives them. Or they had

576
00:29:09.440 --> 00:29:12.119
<v Speaker 2>different expectations for what the bot could even do.

577
00:29:12.720 --> 00:29:15.640
<v Speaker 1>Okay, so knowing why they leave, what are the best

578
00:29:15.680 --> 00:29:19.599
<v Speaker 1>strategies to actually reduce those frustrating opt outs? How do

579
00:29:19.680 --> 00:29:20.680
<v Speaker 1>we keep them engaged?

580
00:29:20.880 --> 00:29:24.839
<v Speaker 2>The book suggests several key things. First off, great

581
00:29:24.599 --> 00:29:26.599
<v Speaker 1>greetings. The very beginning of the chat?

582
00:29:26.720 --> 00:29:29.599
<v Speaker 2>Yeah, sets the tone. A good greeting should affirm the

583
00:29:29.599 --> 00:29:33.559
<v Speaker 2>bot's purpose, maybe briefly preview the journey ahead, set realistic

584
00:29:33.599 --> 00:29:37.799
<v Speaker 2>expectations for completion time, and sometimes even incentivize the user.

585
00:29:37.799 --> 00:29:40.759
<v Speaker 1>Like that utility company example where the bot started by

586
00:29:40.759 --> 00:29:43.079
<v Speaker 1>saying something like, "This process should only take a few

587
00:29:43.119 --> 00:29:44.119
<v Speaker 1>minutes." Exactly.

588
00:29:44.400 --> 00:29:48.119
<v Speaker 2>That simple sentence manages expectations and gives the user a

589
00:29:48.160 --> 00:29:51.440
<v Speaker 2>reason to stick with it. It's a small thing, but powerful.

590
00:29:51.599 --> 00:29:53.079
<v Speaker 1>Okay, great greetings. What else?

591
00:29:53.359 --> 00:29:56.759
<v Speaker 2>Second, try hard to understand, which goes back to what

592
00:29:56.799 --> 00:30:01.920
<v Speaker 2>we discussed: continuous investment in good, relevant training data. Using

593
00:30:01.960 --> 00:30:05.599
<v Speaker 2>conversational search or RAG for those broader domains so you

594
00:30:05.640 --> 00:30:07.200
<v Speaker 2>can handle more queries. Makes sense.

595
00:30:07.400 --> 00:30:08.519
<v Speaker 1>Be better at understanding.

596
00:30:08.720 --> 00:30:13.119
<v Speaker 2>Third, try hard to be understood. Use really clear, simple wording,

597
00:30:13.519 --> 00:30:16.599
<v Speaker 2>allow for those flexible responses we talked about, and employ

598
00:30:16.759 --> 00:30:21.480
<v Speaker 2>graceful error handling. Instead of just saying "retry," maybe try

599
00:30:21.519 --> 00:30:24.480
<v Speaker 2>to disambiguate, offer a couple of options based on what

600
00:30:24.519 --> 00:30:25.359
<v Speaker 2>you thought they meant.

601
00:30:25.440 --> 00:30:26.759
<v Speaker 1>Okay, handle errors better.

602
00:30:27.559 --> 00:30:32.680
<v Speaker 2>And fourth? Fourth, implement smart opt-out retention flows. If

603
00:30:32.720 --> 00:30:36.000
<v Speaker 2>a user does ask for a human, don't just immediately transfer.

604
00:30:36.559 --> 00:30:40.039
<v Speaker 2>Try to discover their true underlying goal first. Assess if

605
00:30:40.079 --> 00:30:42.119
<v Speaker 2>the bot actually could help with that goal. Maybe try

606
00:30:42.119 --> 00:30:44.279
<v Speaker 2>to convince the use of the bot can handle it effectively,

607
00:30:45.000 --> 00:30:47.839
<v Speaker 2>or if not, at least route them intelligently to the

608
00:30:47.880 --> 00:30:51.319
<v Speaker 2>next best action, maybe another specialized virtual agent, or then, finally,

609
00:30:51.400 --> 00:30:52.480
<v Speaker 2>the right human team.

610
00:30:53.000 --> 00:30:55.599
<v Speaker 1>So try to salvage the interaction if possible, or at

611
00:30:55.640 --> 00:30:56.880
<v Speaker 1>least make the handoff smooth.

612
00:30:56.920 --> 00:30:58.200
<v Speaker 2>Exactly, don't just dump them.

613
00:30:58.359 --> 00:31:03.119
<v Speaker 1>And generative AI can actually help with crafting those messages themselves, right,

614
00:31:03.440 --> 00:31:07.000
<v Speaker 1>making the bot sound less robotic and more helpful.

615
00:31:07.119 --> 00:31:11.319
<v Speaker 2>Yes, absolutely, that's another powerful application. LLMs can be great

616
00:31:11.359 --> 00:31:15.880
<v Speaker 2>at rewriting potentially rude or unhelpful system error messages, transforming

617
00:31:15.880 --> 00:31:18.960
<v Speaker 2>something blunt like "you didn't provide a thirteen digit number"

618
00:31:19.160 --> 00:31:21.119
<v Speaker 2>into something much kinder and more instructive.

619
00:31:21.200 --> 00:31:22.759
<v Speaker 1>Yes, soften the edges. Exactly.

620
00:31:22.839 --> 00:31:25.559
<v Speaker 2>Yeah, and they can also help craft those greeting messages

621
00:31:25.599 --> 00:31:29.079
<v Speaker 2>to be more helpful, more welcoming, and less like talking

622
00:31:29.119 --> 00:31:32.519
<v Speaker 2>to a machine, making that initial interaction much less likely

623
00:31:32.559 --> 00:31:33.880
<v Speaker 2>to cause an immediate opt out.

624
00:31:33.960 --> 00:31:36.480
<v Speaker 1>Okay, this has been fantastic. Finally, let's touch on the

625
00:31:36.559 --> 00:31:39.400
<v Speaker 1>human AI partnership. It seems like it's not just about

626
00:31:39.440 --> 00:31:44.160
<v Speaker 1>building bots for humans, but increasingly about how AI, especially LLMs,

627
00:31:44.480 --> 00:31:48.559
<v Speaker 1>can augment the human builders themselves. How do they become collaborators.

628
00:31:48.799 --> 00:31:51.839
<v Speaker 2>That's a really exciting area. LLMs are becoming incredible partners

629
00:31:51.880 --> 00:31:54.440
<v Speaker 2>in the actual development process. They can help solve the

630
00:31:54.440 --> 00:31:56.640
<v Speaker 2>cold start problem, you know, when you're building a brand

631
00:31:56.640 --> 00:32:00.039
<v Speaker 2>new bot and have literally no real user data to

632
00:32:00.079 --> 00:32:01.160
<v Speaker 2>start training with.

633
00:32:01.319 --> 00:32:02.160
<v Speaker 1>Right, where do you begin?

634
00:32:02.720 --> 00:32:07.519
<v Speaker 2>LLMs can generate realistic-sounding initial training data, or they

635
00:32:07.559 --> 00:32:10.400
<v Speaker 2>can help expand your existing data by filling gaps for

636
00:32:10.880 --> 00:32:15.400
<v Speaker 2>maybe rare but really important intents like fraud reporting, where

637
00:32:15.440 --> 00:32:18.359
<v Speaker 2>you might not have enough real world examples to train effectively.

638
00:32:19.000 --> 00:32:22.160
<v Speaker 2>The insight here is that LLMs don't replace human creativity

639
00:32:22.240 --> 00:32:24.960
<v Speaker 2>or oversight, but they can seriously supercharge the process.

640
00:32:24.920 --> 00:32:28.440
<v Speaker 1>So they can essentially generate data for both training

641
00:32:28.480 --> 00:32:30.960
<v Speaker 1>and testing. That sounds like it could save a huge

642
00:32:30.960 --> 00:32:32.119
<v Speaker 1>amount of manual effort.

643
00:32:32.200 --> 00:32:36.200
<v Speaker 2>Oh, absolutely. They can generate lists of synonyms for specific terms,

644
00:32:36.240 --> 00:32:39.559
<v Speaker 2>different nouns for credentials maybe, or various verbs for lost

645
00:32:39.640 --> 00:32:43.400
<v Speaker 2>or misremembered. But they can also generate entire utterances with

646
00:32:43.559 --> 00:32:47.519
<v Speaker 2>lots of varied grammatical structures, statements, questions, even fragments or commands,

647
00:32:47.920 --> 00:32:48.839
<v Speaker 2>just like real users.

648
00:32:49.039 --> 00:32:51.559
<v Speaker 1>So instead of just "forgot password," it might generate "I

649
00:32:51.599 --> 00:32:55.039
<v Speaker 1>can't remember my account information," or "my account is locked,"

650
00:32:55.160 --> 00:32:56.519
<v Speaker 1>or "help logging in."

651
00:32:56.960 --> 00:33:00.839
<v Speaker 2>Precisely, this provides much more robust training data and also

652
00:33:00.920 --> 00:33:04.559
<v Speaker 2>more realistic testing data. It helps reduce the inherent bias

653
00:33:04.599 --> 00:33:07.160
<v Speaker 2>that might come from just a few humans writing examples,

654
00:33:07.480 --> 00:33:11.160
<v Speaker 2>and it can significantly improve the classifier's accuracy beyond what

655
00:33:11.359 --> 00:33:14.799
<v Speaker 2>manual efforts alone could likely achieve. It's about getting a

656
00:33:14.839 --> 00:33:16.880
<v Speaker 2>truly comprehensive and diverse data set.

657
00:33:17.119 --> 00:33:20.599
<v Speaker 1>And what about AI-assisted process flows? That sounds almost

658
00:33:20.599 --> 00:33:23.640
<v Speaker 1>like the AI is designing the conversation itself in a way.

659
00:33:23.720 --> 00:33:27.039
<v Speaker 2>Yeah, it's about rapid prototyping and also hardening the system.

660
00:33:27.759 --> 00:33:31.079
<v Speaker 2>LLMs can actually suggest or even design entire process flows

661
00:33:31.079 --> 00:33:33.240
<v Speaker 2>from scratch. Like, you could ask it to design a

662
00:33:33.279 --> 00:33:36.160
<v Speaker 2>flow for handling a medical insurance claim status check, and

663
00:33:36.200 --> 00:33:39.079
<v Speaker 2>it might propose the steps, the questions to ask and

664
00:33:39.119 --> 00:33:42.200
<v Speaker 2>even justify its design choices. Wow. They can also help

665
00:33:42.240 --> 00:33:45.400
<v Speaker 2>execute dialogue flows at runtime in some newer architectures, acting

666
00:33:45.400 --> 00:33:49.039
<v Speaker 2>as the chatbot's brain, dynamically figuring out what questions to

667
00:33:49.039 --> 00:33:51.720
<v Speaker 2>ask next, to collect the information needed for, say,

668
00:33:51.799 --> 00:33:55.480
<v Speaker 2>an API call. Okay, and testing? And critically, for testing,

669
00:33:55.759 --> 00:33:59.119
<v Speaker 2>they can simulate user inputs. They can generate all sorts

670
00:33:59.119 --> 00:34:02.200
<v Speaker 2>of weird, unexpected, or edge-case inputs that human

671
00:34:02.240 --> 00:34:05.559
<v Speaker 2>testers might never even think of. This really helps to

672
00:34:05.720 --> 00:34:09.719
<v Speaker 2>harden the chatbot against unexpected interactions, making it far more

673
00:34:09.800 --> 00:34:11.360
<v Speaker 2>robust when it meets real users.

674
00:34:11.559 --> 00:34:15.440
<v Speaker 1>That's a massive timesaver and quality booster for developers. Okay,

675
00:34:15.559 --> 00:34:20.039
<v Speaker 1>one last piece, those inevitable handoffs to human agents. What

676
00:34:20.119 --> 00:34:24.280
<v Speaker 1>about summarization? It must be incredibly frustrating for a human agent

677
00:34:24.360 --> 00:34:27.119
<v Speaker 1>to have to read through a long, rambling chat log

678
00:34:27.159 --> 00:34:28.880
<v Speaker 1>when they take over. It really is.

679
00:34:29.320 --> 00:34:31.920
<v Speaker 2>It wastes time and forces the customer to wait or

680
00:34:31.960 --> 00:34:35.159
<v Speaker 2>repeat themselves. And the insight here is all about efficiency

681
00:34:35.159 --> 00:34:38.639
<v Speaker 2>and improving the customer experience during that transfer. Agents don't

682
00:34:38.639 --> 00:34:41.440
<v Speaker 2>need the entire transcript. They need a brief, targeted summary

683
00:34:41.440 --> 00:34:42.480
<v Speaker 2>of what happened and what the

684
00:34:42.519 --> 00:34:45.199
<v Speaker 1>User needs, so llms can create that summary.

685
00:34:45.519 --> 00:34:49.280
<v Speaker 2>Yes, yeah, LLMs are excellent at summarization. They can take

686
00:34:49.280 --> 00:34:52.440
<v Speaker 2>a full chat transcript and condense it into concise prose

687
00:34:52.920 --> 00:34:56.159
<v Speaker 2>or, even better, sometimes they can extract key structured details

688
00:34:56.159 --> 00:34:59.920
<v Speaker 2>like the user's ID, maybe a claim number, the specific issue,

689
00:35:00.039 --> 00:35:02.559
<v Speaker 2>even a sentiment analysis score, and present that clearly to

690
00:35:02.639 --> 00:35:03.039
<v Speaker 2>the agent.

691
00:35:03.159 --> 00:35:04.760
<v Speaker 1>That sounds incredibly useful.

692
00:35:04.920 --> 00:35:07.800
<v Speaker 2>It ensures a much smoother handoff. It saves the human

693
00:35:07.840 --> 00:35:10.920
<v Speaker 2>agent valuable time, letting them get straight to the issue,

694
00:35:11.119 --> 00:35:14.559
<v Speaker 2>and it dramatically improves customer satisfaction because the customer doesn't

695
00:35:14.559 --> 00:35:17.280
<v Speaker 2>have to frustratingly repeat everything they just told the bot.

696
00:35:17.519 --> 00:35:20.559
<v Speaker 1>What a deep dive this has been. We've really explored

697
00:35:20.599 --> 00:35:24.639
<v Speaker 1>how conversational AI works, haven't we, From those basic intent

698
00:35:24.760 --> 00:35:28.400
<v Speaker 1>based bots all the way to the truly transformative power

699
00:35:28.480 --> 00:35:30.719
<v Speaker 1>of generative AI and things like RAG.

700
00:35:30.880 --> 00:35:34.320
<v Speaker 2>Yeah, and we've seen how thoughtful design, that constant cycle

701
00:35:34.360 --> 00:35:38.920
<v Speaker 2>of continuous improvement, and this emerging strategic partnership between human

702
00:35:38.960 --> 00:35:42.039
<v Speaker 2>builders and the LLMs themselves, how all that can overcome

703
00:35:42.079 --> 00:35:44.079
<v Speaker 2>those common frustrations we started with.

704
00:35:44.079 --> 00:35:48.199
<v Speaker 1>Right, streamlining complex processes and ultimately creating virtual assistants that

705
00:35:48.280 --> 00:35:51.639
<v Speaker 1>feel genuinely understanding and actually helpful.

706
00:35:51.920 --> 00:35:55.360
<v Speaker 2>And the real aha moment here for me at least,

707
00:35:55.599 --> 00:35:57.880
<v Speaker 2>is that the ultimate goal isn't just about building a

708
00:35:57.920 --> 00:36:02.559
<v Speaker 2>smarter machine, although they're getting incredibly intelligent, incredibly fast.

709
00:36:02.760 --> 00:36:04.280
<v Speaker 1>It's more than that?

710
00:36:04.320 --> 00:36:09.159
<v Speaker 2>I think so. It's really about designing an intuitive, maybe even empathetic,

711
00:36:09.480 --> 00:36:13.119
<v Speaker 2>and definitely valuable experience for you, the user. It's about

712
00:36:13.199 --> 00:36:16.760
<v Speaker 2>leveraging all this amazing technology to meet people where they are,

713
00:36:17.159 --> 00:36:20.119
<v Speaker 2>truly understand their unique needs in that moment, and deliver

714
00:36:20.199 --> 00:36:21.480
<v Speaker 2>real value.

715
00:36:21.119 --> 00:36:24.599
<v Speaker 1>Whether that value comes from a quick answer, or helping

716
00:36:24.639 --> 00:36:27.840
<v Speaker 1>with a complex transaction, or even just making that handoff

717
00:36:27.880 --> 00:36:30.440
<v Speaker 1>to a human completely seamless.

718
00:36:29.960 --> 00:36:33.559
<v Speaker 2>Exactly making the interaction feel effective and respectful of the

719
00:36:33.679 --> 00:36:34.320
<v Speaker 2>user's time.

720
00:36:34.559 --> 00:36:36.559
<v Speaker 1>So here's a final thought to leave you with. As

721
00:36:36.599 --> 00:36:40.599
<v Speaker 1>conversational AI continues to evolve at this, well, breakneck speed,

722
00:36:41.199 --> 00:36:44.559
<v Speaker 1>what new ethical considerations, what new design challenges will become

723
00:36:44.559 --> 00:36:47.920
<v Speaker 1>paramount, especially as that line between human and AI interaction

724
00:36:48.000 --> 00:36:51.599
<v Speaker 1>gets increasingly blurred? How will our own expectations of what

725
00:36:51.800 --> 00:36:55.239
<v Speaker 1>understanding even means continue to shift? That's a big question,

726
00:36:55.480 --> 00:36:58.440
<v Speaker 1>definitely something for all of us to mull over as

727
00:36:58.480 --> 00:37:01.320
<v Speaker 1>these systems become more and more deeply ingrained in our

728
00:37:01.400 --> 00:37:02.079
<v Speaker 1>daily lives.
