WEBVTT 1 00:00:00.120 --> 00:00:02.160 Welcome to the Deep Dive, the show where we try 2 00:00:02.200 --> 00:00:06.000 to cut through the noise and get you truly well informed. 3 00:00:06.040 --> 00:00:07.240 Fast glad to be here. 4 00:00:07.799 --> 00:00:11.279 So today we're plunging into a topic we've all encountered, 5 00:00:11.320 --> 00:00:15.599 and let's be honest, sometimes really, really disliked: the chatbot. 6 00:00:15.759 --> 00:00:17.879 Oh yeah, you know the ones, they just don't understand 7 00:00:17.879 --> 00:00:20.120 a single word you say, or they send you in 8 00:00:20.120 --> 00:00:23.480 these endless circles, or, you know, make you desperately mash 9 00:00:23.600 --> 00:00:25.160 that speak to a human button. 10 00:00:25.440 --> 00:00:27.679 It's such a universal pain point, isn't it? And it 11 00:00:27.760 --> 00:00:31.519 really spotlights a critical challenge. Yeah, how do we build 12 00:00:31.519 --> 00:00:35.920 AI that actually understands us? Yeah, and, you know, helps? That's 13 00:00:35.719 --> 00:00:38.719 the core question, exactly. So for this Deep Dive, we're 14 00:00:38.799 --> 00:00:44.119 unpacking the secrets behind creating genuinely, well, delightful AI interactions. 15 00:00:44.200 --> 00:00:47.119 Hopefully. We're pulling insights from some really interesting new research, 16 00:00:47.399 --> 00:00:52.079 Effective Conversational AI: Chatbots That Work, by Eniko Rozsa, Andrew Freed, 17 00:00:52.119 --> 00:00:54.640 and Cari Jacobs. It just came out in 2025. 18 00:00:54.920 --> 00:00:57.280 Yeah, and our mission today is basically to reveal why 19 00:00:57.320 --> 00:01:00.799 some bots succeed where others just spectacularly fail, and also 20 00:01:00.880 --> 00:01:04.400 how the newest advancements in AI are truly changing the 21 00:01:04.439 --> 00:01:05.200 game for the better. 
22 00:01:05.560 --> 00:01:10.120 Think of this as your shortcut, maybe, to understanding how 23 00:01:10.159 --> 00:01:14.319 to build or even just identify a truly effective conversational AI. 24 00:01:14.680 --> 00:01:16.920 Okay, let's get into it then. So let's 25 00:01:16.680 --> 00:01:19.319 maybe start with a clear definition for you. Conversational AI: 26 00:01:19.760 --> 00:01:24.439 it's essentially a set of technologies designed to mimic human 27 00:01:24.480 --> 00:01:27.959 interaction, or sometimes even replace it, using natural language. 28 00:01:28.040 --> 00:01:28.280 Right. 29 00:01:28.400 --> 00:01:32.640 It goes by lots of names: chatbots, virtual agents, AI assistants, 30 00:01:32.719 --> 00:01:34.159 sometimes even digital 31 00:01:33.799 --> 00:01:36.079 employees. Digital employees, huh, okay. 32 00:01:36.040 --> 00:01:39.439 And you mostly see it used for automating customer service, 33 00:01:40.480 --> 00:01:44.599 powering voice assistants like Alexa or Siri, and even sometimes 34 00:01:44.640 --> 00:01:47.280 pre-screening interactions before they actually go to a human. 35 00:01:47.680 --> 00:01:50.159 So it's way more than just that little chat window 36 00:01:50.200 --> 00:01:52.120 that pops up on a website. It's kind of everywhere. 37 00:01:52.319 --> 00:01:54.359 It really is. And the book breaks these down into, 38 00:01:54.439 --> 00:01:56.599 I think, three main functional categories. 39 00:01:56.640 --> 00:01:59.359 Is that right? Precisely, yeah. First, you have your question 40 00:01:59.400 --> 00:02:02.959 answering bots. People often call them FAQ bots. They're designed to 41 00:02:02.959 --> 00:02:08.439 give direct responses to pretty simple factual questions, like when 42 00:02:08.479 --> 00:02:13.240 are you open or where are you located? No follow-up needed, really. 43 00:02:13.400 --> 00:02:14.719 They just spit out the information. 44 00:02:14.919 --> 00:02:17.000 So these are the quick hit ones. 
Get in, get 45 00:02:17.000 --> 00:02:19.520 the answer, get out. Is there like a common mistake 46 00:02:19.560 --> 00:02:22.000 people make when they're designing just these simple bots? 47 00:02:22.360 --> 00:02:25.719 Well, I think the main pitfall is underestimating the sheer 48 00:02:25.800 --> 00:02:28.680 variety of ways users might ask the same simple question. 49 00:02:28.960 --> 00:02:32.199 You know, that mismatch leads straight to misunderstanding. 50 00:02:32.360 --> 00:02:34.039 Ah, right, makes sense. 51 00:02:34.159 --> 00:02:38.199 Then you have the process-oriented or transactional solutions. Now, 52 00:02:38.280 --> 00:02:41.400 these are designed to guide users through a series of 53 00:02:41.439 --> 00:02:44.360 steps to actually achieve a specific goal. 54 00:02:44.479 --> 00:02:47.719 Like booking an appointment or checking an account balance, maybe. 55 00:02:47.479 --> 00:02:50.680 Exactly, checking an account balance, booking something. They might collect 56 00:02:50.680 --> 00:02:53.159 info for someone else to handle later, or sometimes they 57 00:02:53.159 --> 00:02:55.240 can even execute the transaction right then and there. 58 00:02:55.400 --> 00:02:57.879 Okay. And the last category, what's that? 59 00:02:57.879 --> 00:03:00.879 That's the routing agent. Its whole job, basically, is to 60 00:03:00.919 --> 00:03:03.039 figure out where to send you next. So like a 61 00:03:03.159 --> 00:03:07.599 dispatcher? Kinda, yeah. Either to another, more specialized bot, or, 62 00:03:07.759 --> 00:03:10.199 you know, when it's necessary, hand you off to a 63 00:03:10.280 --> 00:03:10.840 human agent. 64 00:03:10.960 --> 00:03:11.280 Okay. 65 00:03:11.319 --> 00:03:13.759 And what's fascinating here, and the book points this out, 66 00:03:14.000 --> 00:03:17.520 is that many real-world AI solutions are actually a 67 00:03:17.560 --> 00:03:21.120 clever mix of all three. 
Oh, interesting. Like how? Well, 68 00:03:21.120 --> 00:03:24.680 think of a retail banking chatbot. It might answer FAQs 69 00:03:25.240 --> 00:03:28.599 about bank hours, right, but it could also guide you 70 00:03:28.639 --> 00:03:31.919 through opening a new account, that's transactional, and then route 71 00:03:31.960 --> 00:03:35.479 you to a human specialist for something complex like fraud reporting. 72 00:03:35.680 --> 00:03:38.240 Right, right. It blends the functions. That really paints a 73 00:03:38.280 --> 00:03:40.639 clear picture of how versatile these things can be. 74 00:03:40.759 --> 00:03:41.840 Yeah, when they work well. 75 00:03:43.159 --> 00:03:47.439 So given this sort of intricate blend of categories, how 76 00:03:47.479 --> 00:03:50.360 does this sophisticated dance actually happen behind the scenes? The 77 00:03:50.360 --> 00:03:52.879 book describes this fascinating three-step process. 78 00:03:53.039 --> 00:03:55.000 It is quite elegant when you break it down. But 79 00:03:55.120 --> 00:03:58.080 here's the real insight, I think. The brilliance of a 80 00:03:58.120 --> 00:04:02.280 truly effective bot, it lies in the seamless execution of 81 00:04:02.319 --> 00:04:05.439 these three fundamental steps. If any one of them falters, 82 00:04:05.680 --> 00:04:07.599 the whole experience just kind of collapses. 83 00:04:07.639 --> 00:04:08.840 Okay, so what's step one? 84 00:04:09.120 --> 00:04:11.919 Step one, the bot has to figure out what the 85 00:04:12.039 --> 00:04:16.279 user actually wants. This is done using natural language understanding, 86 00:04:16.560 --> 00:04:17.199 or NLU. 87 00:04:17.439 --> 00:04:18.519 NLU, got it. 88 00:04:18.720 --> 00:04:22.439 Often this uses a machine learning text classifier. Think of 89 00:04:22.480 --> 00:04:25.160 it like an AI that learns to categorize text, maybe 90 00:04:25.160 --> 00:04:30.959 like sorting your emails into urgent or promotion. 
It uses 91 00:04:31.000 --> 00:04:32.560 that to figure out the user's intent. 92 00:04:33.000 --> 00:04:35.920 Okay, so intent. When I type something, the first challenge 93 00:04:35.959 --> 00:04:38.120 is the bot figuring out what I'm actually trying to do, 94 00:04:38.240 --> 00:04:42.439 like distinguishing between me wanting to reset my password versus, say, 95 00:04:42.519 --> 00:04:43.480 find a store. 96 00:04:43.680 --> 00:04:46.319 Exactly that, you nailed it, that's the intent. Step two. 97 00:04:46.680 --> 00:04:49.040 Once it thinks it knows the intent, the bot needs 98 00:04:49.040 --> 00:04:51.839 to gather any extra information it needs to actually fulfill 99 00:04:51.920 --> 00:04:55.720 that want. Okay. So a dialogue engine will ask clarifying questions, 100 00:04:56.160 --> 00:04:59.199 and it might use something called orchestration layers to interact 101 00:04:59.199 --> 00:05:01.120 with other systems through APIs. 102 00:05:01.199 --> 00:05:04.160 APIs, right, like ways for computer systems to talk to 103 00:05:04.199 --> 00:05:04.519 each other. 104 00:05:04.600 --> 00:05:07.319 Precisely, it's the bot's way of securely talking to other 105 00:05:07.399 --> 00:05:10.279 databases or services to pull the specific details it needs, 106 00:05:10.439 --> 00:05:12.000 like your account info or whatever. 107 00:05:12.079 --> 00:05:15.040 Okay, intent figured out, info gathered. What's step three? 108 00:05:15.360 --> 00:05:18.959 Step three, give the user what they want. Simple as 109 00:05:19.000 --> 00:05:24.160 that, ideally. Whether that's fulfilling their request directly, providing the information, 110 00:05:24.639 --> 00:05:26.319 or connecting them to a human agent. 111 00:05:26.439 --> 00:05:27.560 And throughout this whole thing? 112 00:05:27.680 --> 00:05:31.000 The critical takeaway, and the book really emphasizes this, is 113 00:05:31.040 --> 00:05:36.560 it must be quick, easy, and crucially follow ethical guidelines. 
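The three steps the hosts describe (work out the intent, gather missing details, then fulfill or hand off) can be sketched in a few lines of Python. This is a toy illustration only, not the book's implementation: the intent names, example utterances, slot names, and the word-overlap scoring are all assumptions made up for the sketch, and a production bot would use a trained text classifier rather than word overlap.

```python
# Hypothetical training utterances per intent (illustrative only).
TRAINING = {
    "reset_password": ["i forgot my password", "reset my password please"],
    "find_store": ["where is the nearest store", "find a store near me"],
}
# Hypothetical details each intent needs before it can be fulfilled.
REQUIRED_SLOTS = {"reset_password": ["email"], "find_store": ["zip_code"]}
FALLBACK = "fallback"


def tokens(text):
    """Lowercase the text and split it into a set of words."""
    return set(text.lower().replace("?", " ").replace(",", " ").split())


def classify(utterance, threshold=0.3):
    """Step 1: choose the intent whose examples overlap most with the utterance."""
    words = tokens(utterance)
    best_intent, best_score = FALLBACK, 0.0
    for intent, examples in TRAINING.items():
        for example in examples:
            example_words = tokens(example)
            # Jaccard overlap: shared words divided by all distinct words.
            score = len(words & example_words) / len(words | example_words)
            if score > best_score:
                best_intent, best_score = intent, score
    # Below the threshold the bot admits it did not understand.
    return best_intent if best_score >= threshold else FALLBACK


def respond(utterance, known=None):
    """Steps 2 and 3: ask for missing details, then fulfill or hand off."""
    known = known or {}
    intent = classify(utterance)
    if intent == FALLBACK:
        return "Sorry, I'm not sure what you're asking. I can connect you to a human agent."
    missing = [slot for slot in REQUIRED_SLOTS[intent] if slot not in known]
    if missing:
        return f"To help with that, I need your {missing[0].replace('_', ' ')}."
    return f"Handled intent '{intent}' using {known}."


print(respond("I need to reset my password"))
print(respond("Where is a store?", {"zip_code": "12345"}))
```

The fallback branch is where the "speak to a human" handoff lives. The threshold is the knob that trades confident wrong answers against too many "I'm not sure" replies.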
114 00:05:36.079 --> 00:05:38.240 Ethical guidelines, like what specifically? Like 115 00:05:38.240 --> 00:05:43.040 handling sensitive information securely, and a big one, never ever 116 00:05:43.160 --> 00:05:46.399 pretending the AI is actually a human. Transparency is key. 117 00:05:46.800 --> 00:05:49.319 Okay, it sounds so logical laid out like that. Yet, 118 00:05:49.439 --> 00:05:51.439 as we said, for so many of us, the actual 119 00:05:51.439 --> 00:05:55.279 experience with conversational AI causes so much pain. Yeah. The 120 00:05:55.319 --> 00:05:57.959 book points out those classic frustrations, right? The bot didn't 121 00:05:58.000 --> 00:06:00.199 understand the thing I said, or you get that robot 122 00:06:00.360 --> 00:06:04.319 voice initiating some totally confusing dialogue, or you just immediately 123 00:06:04.399 --> 00:06:05.800 hit the button to talk to a person. 124 00:06:05.920 --> 00:06:06.680 We've all been there. 125 00:06:06.759 --> 00:06:11.000 It really begs the question, what exactly causes this weak understanding? 126 00:06:11.040 --> 00:06:12.399 Why are they so often bad? 127 00:06:12.639 --> 00:06:15.759 Well, weak understanding shows up in several frustrating ways, right? 128 00:06:16.639 --> 00:06:19.800 The chatbot gives you the wrong answers, or it uses 129 00:06:19.839 --> 00:06:23.399 that fallback intent way too much, you know, the sorry, 130 00:06:23.439 --> 00:06:26.519 I'm not sure what you're asking message. Yes. Or you 131 00:06:26.560 --> 00:06:32.240 see frequent escalations to human agents, declining user engagement over time, 132 00:06:32.759 --> 00:06:35.560 people just giving up and leaving, increasing abandonment rates. 133 00:06:36.000 --> 00:06:39.399 So if users are constantly being asked to rephrase, or 134 00:06:39.639 --> 00:06:43.519 the bot just gives totally irrelevant responses, that's a dead giveaway. 135 00:06:43.199 --> 00:06:45.800 Absolutely, clear sign the understanding just isn't 
136 00:06:45.639 --> 00:06:48.680 There. So what's behind it? Is it just, like, bad 137 00:06:48.759 --> 00:06:50.959 luck, or is there something fundamentally limited? 138 00:06:51.439 --> 00:06:54.319 No, not usually bad luck. The book identifies a few 139 00:06:54.399 --> 00:06:57.519 really common culprits, and the insight here is that these 140 00:06:57.519 --> 00:07:00.720 are often design failures or sometimes maintenance failures, things that 141 00:07:00.759 --> 00:07:03.839 could have been prevented. Okay, like what? Well, one is 142 00:07:03.920 --> 00:07:07.800 manufactured training data, so examples that don't truly reflect how 143 00:07:07.839 --> 00:07:09.480 real users actually speak or type. 144 00:07:09.639 --> 00:07:12.199 Right, if you train it on perfect grammar, but people 145 00:07:12.319 --> 00:07:14.839 use slang or type fragments. 146 00:07:14.680 --> 00:07:17.600 Exactly, the NLU's going to fail. Another big one is 147 00:07:17.720 --> 00:07:22.040 insufficient scope or gaps in topic coverage. Basically, the bot 148 00:07:22.120 --> 00:07:25.199 just doesn't know enough about the things users are asking about. Like 149 00:07:25.079 --> 00:07:28.639 that Meti World pharma bot example, during the vaccine rollout. 150 00:07:28.319 --> 00:07:32.319 Perfect example, yeah. Initially it could handle general COVID-19 151 00:07:32.399 --> 00:07:35.519 questions fine, but when people started asking about, you know, 152 00:07:35.639 --> 00:07:39.959 vaccine eligibility or booking appointments, the bot was totally stumped. 153 00:07:39.560 --> 00:07:42.439 Because it hadn't been updated. The world changed faster than the 154 00:07:42.360 --> 00:07:45.879 bot. Exactly, which highlights another cause: new information that the 155 00:07:45.879 --> 00:07:48.279 bot hasn't been taught. 
And the fourth one, which can 156 00:07:48.319 --> 00:07:50.959 often be the trickiest to sort out, is a lack 157 00:07:51.000 --> 00:07:53.759 of vetting or proper gatekeeping around changes. 158 00:07:54.319 --> 00:07:55.720 What do you mean by that? Like too many cooks 159 00:07:55.720 --> 00:07:56.240 in the kitchen? 160 00:07:56.879 --> 00:08:00.160 Kind of. Untested changes or updates made by teams who 161 00:08:00.160 --> 00:08:04.000 aren't familiar with the whole system can accidentally introduce duplication, 162 00:08:04.480 --> 00:08:07.600 or create conflicts between different intents, or mess up the 163 00:08:07.600 --> 00:08:08.920 balance of the training data. 164 00:08:08.959 --> 00:08:09.519 Wow. 165 00:08:09.879 --> 00:08:12.240 Yeah. The book mentions a client where they saw their 166 00:08:12.319 --> 00:08:15.639 classifier's accuracy just plummet from around eighty percent down to 167 00:08:15.680 --> 00:08:17.240 like fifty-five percent over time. 168 00:08:17.519 --> 00:08:21.160 Fifty-five percent, that's barely better than guessing for some things, right? And 169 00:08:21.160 --> 00:08:23.879 it was because of all these unvetted changes piling up. 170 00:08:24.319 --> 00:08:27.480 The insight here is that building an effective chatbot isn't 171 00:08:27.560 --> 00:08:31.120 a one-and-done thing. It needs really diligent processes 172 00:08:31.480 --> 00:08:35.000 to stop that kind of entropy from creeping in. 173 00:08:35.039 --> 00:08:38.039 That's a huge drop. So how do we actually measure 174 00:08:38.080 --> 00:08:41.279 this understanding for traditional AI, to stop that kind of 175 00:08:41.360 --> 00:08:42.519 decline from happening? 176 00:08:42.559 --> 00:08:45.879 Well, for traditional classification-based AI, we rely on a 177 00:08:45.919 --> 00:08:49.440 few core metrics: accuracy, precision, and recall. 178 00:08:49.559 --> 00:08:52.440 Okay, break those down for us. Accuracy seems straightforward. 
179 00:08:52.639 --> 00:08:55.919 Accuracy is, yeah, basically the overall percentage of correct predictions 180 00:08:55.960 --> 00:08:59.240 the bot makes. Simple enough. Recall is the bot's ability 181 00:08:59.240 --> 00:09:02.039 to identify the correct intent. Think of it as catching 182 00:09:02.080 --> 00:09:04.039 all the relevant questions for a specific topic. 183 00:09:04.159 --> 00:09:06.360 So if recall is low, it means 184 00:09:06.240 --> 00:09:08.559 the bot is missing a lot of relevant questions. Like 185 00:09:08.639 --> 00:09:11.559 the example in the book, if a hashtag login issue 186 00:09:11.559 --> 00:09:14.120 intent had a really low recall, maybe zero point four 187 00:09:14.159 --> 00:09:16.240 four, it means the bot missed more than half 188 00:09:16.279 --> 00:09:18.480 the questions that were actually about login problems. 189 00:09:18.639 --> 00:09:20.320 Ouch, okay, and precision? 190 00:09:20.559 --> 00:09:23.679 Precision, on the other hand, is the bot's ability to 191 00:09:23.759 --> 00:09:28.039 avoid giving a wrong intent. So if precision is low, 192 00:09:28.480 --> 00:09:32.279 your bot might be confidently misunderstanding 193 00:09:31.519 --> 00:09:33.399 users, which might be even worse. 194 00:09:33.840 --> 00:09:36.399 It can be, yeah, more frustrating than the bot just 195 00:09:36.440 --> 00:09:39.639 saying I don't know. So the real insight here is 196 00:09:39.759 --> 00:09:43.080 how critical it is to balance both precision and recall. 197 00:09:44.120 --> 00:09:46.879 Sometimes improving one can actually hurt the other, so you 198 00:09:46.919 --> 00:09:47.759 need to watch both. 199 00:09:47.840 --> 00:09:50.120 That makes sense. It's a balancing act. So we're talking 200 00:09:50.279 --> 00:09:53.720 rigorous measurement. But how do we actually test this in 201 00:09:53.759 --> 00:09:57.080 a way that reflects the real world? You mentioned k-fold 202 00:09:57.120 --> 00:09:58.759 cross-validation or blind testing. 
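As a rough illustration of the three metrics discussed above, the sketch below computes accuracy across all intents and precision and recall for one intent. The labels are invented for the example and do not come from the book. "Golden" means the human-annotated correct intent, and "predicted" is what the classifier returned.

```python
def accuracy(golden, predicted):
    """Share of all messages where the predicted intent matches the golden one."""
    correct = sum(g == p for g, p in zip(golden, predicted))
    return correct / len(golden)


def precision_recall(golden, predicted, intent):
    """Precision and recall for a single intent."""
    pairs = list(zip(golden, predicted))
    true_pos = sum(g == intent and p == intent for g, p in pairs)
    false_pos = sum(g != intent and p == intent for g, p in pairs)
    false_neg = sum(g == intent and p != intent for g, p in pairs)
    # Precision: of everything labeled with this intent, how much was right?
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    # Recall: of everything that truly was this intent, how much was caught?
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Invented test set: five login questions and three store questions.
golden = ["login_issue"] * 5 + ["find_store"] * 3
predicted = ["login_issue", "login_issue", "fallback", "find_store", "fallback",
             "find_store", "find_store", "login_issue"]

print(accuracy(golden, predicted))
print(precision_recall(golden, predicted, "login_issue"))
```

Here the bot catches only two of the five login questions, so recall for `login_issue` is 0.4, the same kind of "missed more than half" situation described in the episode. Its precision is 2/3, because one store question was wrongly labeled as a login issue.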
203 00:09:58.879 --> 00:10:02.000 Yeah, those are standard methods, and AI-generated data can 204 00:10:02.039 --> 00:10:05.360 be useful for blind testing, especially when you're just starting out. 205 00:10:05.559 --> 00:10:08.200 But what's really fascinating, and the book highlights this, is 206 00:10:08.200 --> 00:10:11.759 that the most reliable, least biased testing data, it comes 207 00:10:11.759 --> 00:10:13.279 from representative 208 00:10:12.639 --> 00:10:16.200 production logs. Meaning the actual conversations people have had with 209 00:10:16.240 --> 00:10:17.200 the bot? Exactly. 210 00:10:17.360 --> 00:10:20.840 These logs show what users actually ask and precisely how 211 00:10:20.840 --> 00:10:23.120 they phrase it. It gives you the truest measure of 212 00:10:23.159 --> 00:10:24.919 how the bot performs in the wild. 213 00:10:25.000 --> 00:10:27.159 But that sounds like it requires a lot of work 214 00:10:27.200 --> 00:10:28.679 to go through and label correctly. 215 00:10:29.159 --> 00:10:33.120 It often does. It frequently requires careful, sometimes even manual, 216 00:10:33.159 --> 00:10:37.600 annotation by humans to identify what the golden, or correct, 217 00:10:37.639 --> 00:10:40.960 intent should have been for each user message. But the 218 00:10:40.960 --> 00:10:45.720 insight is clear. Real user data is the gold standard for testing. 219 00:10:46.080 --> 00:10:49.320 It sounds like incredibly diligent work, making sure the bot's 220 00:10:49.360 --> 00:10:52.720 brain is truly learning the right lessons from real interactions. 221 00:10:52.919 --> 00:10:53.200 It is. 
222 00:10:53.360 --> 00:10:55.360 Yeah, just as we're learning how to really fine-tune 223 00:10:55.399 --> 00:10:58.840 these traditional systems, there's been this monumental shift in AI 224 00:10:58.960 --> 00:11:01.480 that's just completely changing the rules of the game 225 00:11:01.480 --> 00:11:04.000 for chatbots. Oh, absolutely. Which brings us to the real 226 00:11:04.000 --> 00:11:08.799 game changer, generative AI. How is this revolutionizing the very 227 00:11:08.879 --> 00:11:10.759 nature of conversational interaction? 228 00:11:11.080 --> 00:11:13.440 Right, generative AI. It's kind of a blanket term, really, 229 00:11:13.480 --> 00:11:16.600 for AI that's powered by these foundation models. So specifically, 230 00:11:16.600 --> 00:11:19.639 we're usually talking about large language models, or LLMs. 231 00:11:19.960 --> 00:11:22.159 LLMs. We hear that term everywhere 232 00:11:22.200 --> 00:11:25.679 now. Yeah, think of them as these incredibly vast machine 233 00:11:25.720 --> 00:11:28.960 learning models. They've been trained on, well, basically all the 234 00:11:29.000 --> 00:11:32.080 Internet's text, or huge chunks of it anyway. 235 00:11:32.039 --> 00:11:34.519 Okay, and how do they work, fundamentally? 236 00:11:34.639 --> 00:11:38.039 Their core function, essentially, is to predict the next word 237 00:11:38.120 --> 00:11:41.159 in a sequence, and because they're trained on so much text, 238 00:11:41.399 --> 00:11:44.159 they get incredibly good at it, good enough to generate 239 00:11:44.240 --> 00:11:48.039 everything from coherent paragraphs to, you know, entire pages of 240 00:11:48.080 --> 00:11:49.799 text that sound remarkably human. 241 00:11:50.039 --> 00:11:54.559 Wow. 
Okay, so how do these incredibly powerful LLMs help 242 00:11:54.720 --> 00:11:58.240 solve those common chatbot pain points we just talked about, 243 00:11:58.559 --> 00:12:00.240 the ones that make us want to, you know, throw 244 00:12:00.240 --> 00:12:00.720 our phones? 245 00:12:01.039 --> 00:12:04.480 They offer potential solutions across the board, really. For that 246 00:12:04.559 --> 00:12:08.840 weak understanding problem, LLMs can help train much stronger traditional 247 00:12:08.840 --> 00:12:11.840 intents, or, and this is a big one, they can 248 00:12:11.879 --> 00:12:16.240 even entirely replace traditional intent recognition using something called retrieval 249 00:12:16.240 --> 00:12:19.279 augmented generation, or RAG. 250 00:12:19.480 --> 00:12:20.840 Okay, we'll definitely need to dive 251 00:12:20.679 --> 00:12:23.039 into that. Yeah, we will. But the point is, LLMs 252 00:12:23.080 --> 00:12:25.240 are just far more adaptive to nuance in all the 253 00:12:25.360 --> 00:12:26.919 varied ways people phrase things. 254 00:12:27.039 --> 00:12:29.799 And what about the complexity issue, bots getting too confusing? 255 00:12:30.039 --> 00:12:33.799 They can help there too. LLMs can assist in writing simpler, 256 00:12:33.919 --> 00:12:36.919 clearer dialogue for the bot, or they can even be 257 00:12:37.039 --> 00:12:40.759 used to test dialogue flows for unexpected complexity before you 258 00:12:40.799 --> 00:12:41.799 deploy them. 259 00:12:42.159 --> 00:12:45.000 Okay, and the immediate opt-outs, people just giving up 260 00:12:45.120 --> 00:12:45.559 right away? 261 00:12:45.919 --> 00:12:49.200 Generative AI can help write much more engaging, maybe even 262 00:12:49.200 --> 00:12:53.440 more empathetic, prose for the bot's messages. Setting a better 263 00:12:53.480 --> 00:12:56.039 tone right from the start can make a huge difference 264 00:12:56.039 --> 00:12:59.000 in making the user feel heard and willing to continue. 
265 00:12:59.080 --> 00:13:01.559 So they're not just for the end user experience, but 266 00:13:01.600 --> 00:13:04.759 they're also tools for the people actually building the bots. 267 00:13:04.799 --> 00:13:08.159 I saw a table in the source about key applications. Exactly. 268 00:13:08.519 --> 00:13:12.440 LLMs have both consumer-facing applications, like generating answers using 269 00:13:12.679 --> 00:13:16.720 RAG, which we mentioned, or maybe summarizing long conversation transcripts 270 00:13:16.759 --> 00:13:19.559 for human agents who take over a call. It's useful. Yeah, 271 00:13:19.679 --> 00:13:22.360 hugely. And then they have powerful build-assistant tasks. They 272 00:13:22.399 --> 00:13:25.440 can help copyedit or even write dialogue flows from scratch, 273 00:13:25.759 --> 00:13:28.080 or they can augment training data for the human builders, 274 00:13:28.440 --> 00:13:30.240 which is just a massive time saver. 275 00:13:30.679 --> 00:13:32.879 But with all that power, especially if they're trained on 276 00:13:33.240 --> 00:13:37.759 all the Internet's text, there must be some pretty significant dangers, 277 00:13:37.840 --> 00:13:39.720 some pitfalls we need to watch out for. Oh, 278 00:13:39.759 --> 00:13:42.559 absolutely, that's a critical point. The Internet, as we all 279 00:13:42.559 --> 00:13:46.879 know, is, it's full of bias, hateful speech, misinformation, you 280 00:13:47.000 --> 00:13:51.039 name it, and LLMs can unfortunately learn from all of that. 281 00:13:51.840 --> 00:13:56.600 So guardrails are absolutely crucial, non-negotiable, 282 00:13:56.639 --> 00:13:58.200 really. What kind of guardrails are we talking about? 
283 00:13:58.320 --> 00:14:02.799 Things like content filters, but also process guardrails, like a 284 00:14:02.840 --> 00:14:06.240 beforehand review process. That means the LLM might assist a 285 00:14:06.320 --> 00:14:09.559 human, maybe drafting a response, but the human always has 286 00:14:09.600 --> 00:14:12.759 the final say and is ultimately responsible for the output. 287 00:14:12.879 --> 00:14:14.600 Human in the loop. Exactly. 288 00:14:14.720 --> 00:14:18.840 Yeah, and perhaps most importantly, grounding the LLM's output in 289 00:14:18.919 --> 00:14:23.279 your own company's verified, accurate documents through RAG. That stops 290 00:14:23.320 --> 00:14:24.720 it from just pulling answers from 291 00:14:24.600 --> 00:14:28.840 the wild web. Like that unforgettable Canadian airline chatbot example 292 00:14:28.879 --> 00:14:29.559 that went viral. 293 00:14:29.799 --> 00:14:34.639 Precisely, that case is legendary now. Their chatbot offered a 294 00:14:34.679 --> 00:14:38.679 bereavement discount that didn't actually exist, based on some information 295 00:14:38.759 --> 00:14:41.120 it hallucinated or pulled incorrectly. 296 00:14:40.720 --> 00:14:42.679 And the airline tried to argue the bot was separate. 297 00:14:42.840 --> 00:14:45.320 They tried to argue it was a separate legal entity, if 298 00:14:45.360 --> 00:14:48.679 you can believe it. The court strongly disagreed and made 299 00:14:48.720 --> 00:14:52.399 them honor the discount. Wow. It really underscores a critical insight. 300 00:14:52.919 --> 00:14:56.120 Companies are responsible for what their bots say, which highlights 301 00:14:56.159 --> 00:15:01.519 the absolute necessity of these guardrails, especially ones like RAG, to 302 00:15:01.720 --> 00:15:05.960 ensure accuracy and, frankly, avoid legal nightmares. 303 00:15:06.000 --> 00:15:08.639 Ah, that's a very expensive lesson, and it really drives 304 00:15:08.679 --> 00:15:12.000 home the need for proper implementation. 
So let's talk more 305 00:15:12.000 --> 00:15:15.200 about this RAG, retrieval-augmented generation. You said it's a 306 00:15:15.240 --> 00:15:19.000 big part of solving the weak understanding problem, especially for 307 00:15:19.080 --> 00:15:21.440 those less common, more specific queries. 308 00:15:21.480 --> 00:15:25.080 It absolutely is. Traditional intent-based systems, they really struggle 309 00:15:25.080 --> 00:15:26.720 with what's called the long tail problem. 310 00:15:26.919 --> 00:15:27.399 Long tail? 311 00:15:27.480 --> 00:15:30.399 Yeah, think about it. Imagine trying to write a specific 312 00:15:30.759 --> 00:15:34.679 rule or train an intent for every single possible question 313 00:15:34.799 --> 00:15:37.960 someone could ask about your products or services. It's an 314 00:15:37.960 --> 00:15:41.200 impossible task, right? There's a long tail of very specific, 315 00:15:41.480 --> 00:15:42.679 infrequent questions. 316 00:15:42.799 --> 00:15:46.039 Right, you can cover the common stuff, but not everything. Exactly. 317 00:15:46.480 --> 00:15:49.440 So when users ask questions that deviate from those pre- 318 00:15:49.519 --> 00:15:53.159 defined intents you did train, or questions that are simply 319 00:15:53.279 --> 00:15:57.080 too uncommon to have specific training data for, the traditional 320 00:15:57.159 --> 00:15:59.600 bot just breaks down. It throws up its hands because 321 00:15:59.639 --> 00:16:00.799 it has no rule to follow. 322 00:16:01.200 --> 00:16:05.000 Okay, so RAG is the answer to that long tail problem. 323 00:16:05.399 --> 00:16:07.840 How does it handle it differently, say, compared to just 324 00:16:07.919 --> 00:16:09.679 adding a search function to the chatbot? 325 00:16:09.840 --> 00:16:12.240 That's a great comparison. Let's think about traditional search within 326 00:16:12.279 --> 00:16:14.519 a chatbot first. It would work kind of like 327 00:16:14.559 --> 00:16:18.919 the pharma bot example before RAG. 
It finds relevant documents or passages, 328 00:16:18.960 --> 00:16:22.360 maybe on ibuprofen and blood pressure. Okay, the benefits are clear. 329 00:16:22.879 --> 00:16:25.720 You get a breadth of information, it's relatively easy to maintain, 330 00:16:25.879 --> 00:16:28.759 just add or edit your documents, and search technology is 331 00:16:28.799 --> 00:16:32.559 well established. But the downsides, the drawbacks, are significant for 332 00:16:32.639 --> 00:16:36.360 the user experience. It often just returns links or maybe 333 00:16:36.399 --> 00:16:40.559 short snippets of text. It forces the user to click through, read, 334 00:16:40.639 --> 00:16:43.480 and piece together the answer themselves, which is really frustrating. 335 00:16:43.519 --> 00:16:45.799 You ask the bot for an answer, not homework. 336 00:16:45.559 --> 00:16:49.240 Exactly, and it's particularly bad for voice interactions. You can't 337 00:16:49.279 --> 00:16:51.559 exactly click a link when you're talking to a voice assistant. 338 00:16:51.639 --> 00:16:54.480 Good point. So how does RAG improve on that? 339 00:16:54.799 --> 00:16:59.039 This is where retrieval-augmented generation really shines. It offers 340 00:16:59.080 --> 00:17:03.600 a truly powerful leap forward. RAG combines that search-based 341 00:17:03.639 --> 00:17:07.039 retrieval step with the power of generative models, the LLMs. 342 00:17:07.160 --> 00:17:09.279 Okay, so it searches and generates. Precisely. 343 00:17:09.920 --> 00:17:12.160 The insight here is that it first retrieves the most 344 00:17:12.200 --> 00:17:15.720 relevant passages from your own verified knowledge base, your documents, 345 00:17:15.720 --> 00:17:18.640 your website content, whatever you feed it, and then the 346 00:17:18.799 --> 00:17:23.279 LLM takes those retrieved passages and synthesizes them into a cohesive, 347 00:17:23.480 --> 00:17:26.279 contextually aware answer in natural language. 
348 00:17:26.359 --> 00:17:28.559 Ah, so instead of just giving me links about ibuprofen 349 00:17:28.599 --> 00:17:31.240 and blood pressure, the pharma bot with RAG would 350 00:17:31.240 --> 00:17:34.200 actually read those relevant bits and then write me a clear, 351 00:17:34.359 --> 00:17:35.480 single summary answer. 352 00:17:35.880 --> 00:17:40.400 Exactly, and crucially, that answer is grounded in that verified 353 00:17:40.400 --> 00:17:41.799 source information you provided. 354 00:17:41.960 --> 00:17:44.559 Grounded. That seems like the key word. It really is. 355 00:17:45.160 --> 00:17:48.359 It means the answer is based on your accurate, up 356 00:17:48.359 --> 00:17:51.279 to date data, not just the LLM's general knowledge 357 00:17:51.359 --> 00:17:55.039 scraped from the internet years ago. This dramatically expands the 358 00:17:55.079 --> 00:17:58.839 bot's versatility. It can answer way more questions far more accurately, 359 00:17:59.319 --> 00:18:03.279 and it significantly reduces those bot doesn't understand and too 360 00:18:03.319 --> 00:18:04.720 much complexity pain points. 361 00:18:04.799 --> 00:18:07.559 Okay, that sounds amazing. How is it actually implemented behind 362 00:18:07.559 --> 00:18:10.119 the scenes? It sounds potentially quite complex. 363 00:18:10.440 --> 00:18:14.160 It involves a few pretty fascinating steps. First, your large 364 00:18:14.160 --> 00:18:18.039 documents, think manuals, website pages, knowledge base articles, are broken 365 00:18:18.079 --> 00:18:21.519 down, or chunked, into smaller manageable pieces, maybe paragraphs or 366 00:18:21.599 --> 00:18:24.960 logical sections. Chunking, got it. Then an AI model called 367 00:18:24.960 --> 00:18:29.599 an embedding model converts these chunks into numerical representations. We 368 00:18:29.680 --> 00:18:31.680 call these embeddings. 369 00:18:31.160 --> 00:18:33.920 Numerical representations, like coordinates on a map? 
370 00:18:34.160 --> 00:18:37.279 Kind of, yeah. It's like creating a unique numeric fingerprint 371 00:18:37.480 --> 00:18:40.119 for the meaning of each piece of text. Texts with 372 00:18:40.240 --> 00:18:43.839 similar meanings end up with similar fingerprints, or closer coordinates 373 00:18:43.839 --> 00:18:48.079 in this high-dimensional space. Whoa. These embeddings, these fingerprints, 374 00:18:48.279 --> 00:18:51.000 are then stored in a special kind of database called 375 00:18:51.000 --> 00:18:54.160 a vector database. Think of it as a super fast, 376 00:18:54.240 --> 00:18:57.839 intelligent library index that doesn't just look for keywords, but 377 00:18:57.920 --> 00:19:00.240 for semantic similarity, for similar meaning. 378 00:19:00.599 --> 00:19:03.839 Okay, so you've indexed all your chunked documents by their meaning. 379 00:19:04.000 --> 00:19:05.640 What happens when I ask a question? 380 00:19:06.079 --> 00:19:08.839 Right, at runtime, when you ask something, your question is 381 00:19:08.880 --> 00:19:12.000 also turned into an embedding using the same model. The 382 00:19:12.039 --> 00:19:15.079 system then searches the vector database to find the chunks 383 00:19:15.079 --> 00:19:19.039 whose embeddings are closest, meaning most semantically similar, to your 384 00:19:19.119 --> 00:19:20.400 question's embedding. So it 385 00:19:20.359 --> 00:19:22.400 finds the most relevant paragraphs based 386 00:19:22.200 --> 00:19:25.880 on meaning. Exactly, and then those retrieved passages are fed 387 00:19:25.880 --> 00:19:29.400 to the LLM along with your original question, with instructions 388 00:19:29.440 --> 00:19:33.440 like answer the user's question based only on this provided information. 389 00:19:34.160 --> 00:19:37.160 The LLM then synthesizes the final grounded answer. 390 00:19:37.559 --> 00:19:43.920 Wow, that's a lot of intricate steps: chunking, embeddings, vector databases, retrieval, synthesis. 
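The retrieval half of that pipeline (chunk, embed, index, search, then build a grounded prompt) can be sketched in plain Python. Everything here is a simplified stand-in chosen for illustration: `embed` uses raw word counts instead of a real embedding model, the "vector database" is just an in-memory list, the sample document is invented, and no LLM is actually called. The sketch only builds the prompt that would be sent to one.

```python
import math
from collections import Counter


def chunk(document):
    """Split a document into paragraph-sized chunks."""
    return [part.strip() for part in document.split("\n\n") if part.strip()]


def embed(text):
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(word.strip(".,?!").lower() for word in text.split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents):
    """Stand-in for a vector database: a list of (chunk text, embedding) pairs."""
    return [(text, embed(text)) for doc in documents for text in chunk(doc)]


def retrieve(index, question, k=2):
    """Return the k chunks whose embeddings are closest to the question's."""
    query = embed(question)
    ranked = sorted(index, key=lambda item: cosine(query, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question, passages):
    """Combine the retrieved passages and the question into a grounded prompt."""
    context = "\n".join(f"- {passage}" for passage in passages)
    return ("Answer the user's question based only on this provided information.\n"
            f"{context}\nQuestion: {question}")


# Invented knowledge-base text for the example.
DOCUMENT = ("Refunds are accepted within 30 days with a receipt.\n\n"
            "Stores are open from 9am to 5pm on weekdays.\n\n"
            "Gift cards never expire and can be used online.")

index = build_index([DOCUMENT])
question = "How many days do I have for a refund?"
print(build_prompt(question, retrieve(index, question, k=1)))
```

Note that the toy embedding matches "days" but not "refund" against "Refunds". That gap between surface words and meaning is exactly what a real embedding model is meant to close.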
391 00:19:44.400 --> 00:19:46.240 So for you, the listener, who might be thinking, okay, 392 00:19:46.319 --> 00:19:49.160 if the LLMs are so powerful, why not just ask 393 00:19:49.200 --> 00:19:52.160 the LLM directly, why bother with all this RAG stuff? 394 00:19:52.400 --> 00:19:54.960 That's a really crucial question, and the reason is simple: 395 00:19:55.720 --> 00:19:59.359 control and reliability. LLMs used on their own 396 00:19:59.359 --> 00:20:01.720 can hallucinate. Hallucinate, make things up? 397 00:20:01.680 --> 00:20:06.359 Yeah, literally make up facts, or provide outdated information because their 398 00:20:06.400 --> 00:20:09.640 training data isn't perfectly current. Remember, they're trained on a 399 00:20:09.680 --> 00:20:13.519 massive general data set, but they don't inherently know the specific, 400 00:20:13.680 --> 00:20:16.480 up-to-the-minute details of your company's policies or 401 00:20:16.519 --> 00:20:19.480