WEBVTT 1 00:00:00.080 --> 00:00:01.800 Welcome to the deep dive, where we cut through the 2 00:00:01.800 --> 00:00:03.720 noise and get straight to the insights you need to 3 00:00:03.720 --> 00:00:07.040 be truly well informed. Today, we're plunging into a topic 4 00:00:07.080 --> 00:00:11.199 that's not just fast moving, but accelerating at light speed, 5 00:00:11.839 --> 00:00:15.679 generative AI in large Language Model security. The rapid adoption 6 00:00:15.759 --> 00:00:19.359 of these technologies is both exhilarating and if we're honest, 7 00:00:19.399 --> 00:00:21.920 maybe a little bit terrifying. Staying ahead of the curve 8 00:00:21.960 --> 00:00:24.800 here isn't just an advantage, it's well, it's a constant, 9 00:00:24.839 --> 00:00:25.559 high stakes race. 10 00:00:26.079 --> 00:00:29.679 It absolutely is. And this deep dive into LLM security 11 00:00:29.719 --> 00:00:33.000 is built around Steve Wilson's The Developer's Playbook for Large 12 00:00:33.039 --> 00:00:37.119 Language Model Security. It's really an absolutely critical and comprehensive 13 00:00:37.159 --> 00:00:40.799 guide for anyone navigating the security landscape of AI right. 14 00:00:40.679 --> 00:00:42.840 Now, right And our mission for you today is to 15 00:00:42.880 --> 00:00:45.119 extract the most important nuggets from this playbook, give you 16 00:00:45.119 --> 00:00:48.079 a genuine shortcut to being well informed on LLLM security, 17 00:00:48.479 --> 00:00:51.560 and hopefully delivers some surprising facts and practical guidance along 18 00:00:51.600 --> 00:00:54.039 the way, Because, let's face it, element security is where 19 00:00:54.079 --> 00:00:58.000 the thrill of innovation undeniably meets high stakes and some 20 00:00:58.159 --> 00:01:01.159 very real world consequences We're going to unpack the unique 21 00:01:01.240 --> 00:01:05.480 challenges everything from the very architecture of these llms and 22 00:01:05.519 --> 00:01:09.000 how we define trust boundaries to the insidious threat of 23 00:01:09.000 --> 00:01:13.239 prompt injection, those bizarre and sometimes damaging hallucinations, and ultimately 24 00:01:13.280 --> 00:01:17.239 how to ensure your applications delivered truly secure outcomes. So 25 00:01:17.280 --> 00:01:20.040 what do you say, let's unpack this. You know, it's 26 00:01:20.079 --> 00:01:22.000 easy to think of these AI blenders as a very 27 00:01:22.000 --> 00:01:25.239 recent phenomenon, something that only cropped up with say chat GPT, 28 00:01:25.760 --> 00:01:28.200 but the very first big public lesson actually came almost 29 00:01:28.200 --> 00:01:32.280 a decade ago now involves Microsoft's infamous chatbot pay. Does 30 00:01:32.319 --> 00:01:32.879 that ring a bell? 31 00:01:33.000 --> 00:01:37.120 Oh, it certainly does. March twenty sixteen, Microsoft launched Tay, 32 00:01:37.280 --> 00:01:40.079 designed to mimic I think a nineteen year old American girl, 33 00:01:40.359 --> 00:01:43.319 primarily targeting eighteen to twenty four year olds on platforms 34 00:01:43.319 --> 00:01:48.280 like Twitter and Snapchat. Its stated goal was real world 35 00:01:48.359 --> 00:01:52.079 research on conversational understanding. And it started so innocently, didn't it, 36 00:01:52.120 --> 00:01:53.640 with a tweet Hello world? 37 00:01:53.920 --> 00:01:57.359 Right, Hello world? And then within hours, literally hours, it 38 00:01:57.400 --> 00:02:01.079 went from that to opinionated, not afraid to and then 39 00:02:01.280 --> 00:02:04.959 it just completely spiral. It quickly became racist, sexist, and 40 00:02:05.120 --> 00:02:07.760 I mean even called for violence. The fallout was immediate 41 00:02:07.799 --> 00:02:10.360 and absolute brutal. Less than twenty four hours later, headlines 42 00:02:10.360 --> 00:02:13.560 were screaming things like Microsoft shuts down AI chatbot after 43 00:02:13.560 --> 00:02:16.159 it turned into a Nazi and Microsoft deeply sorry for 44 00:02:16.240 --> 00:02:19.599 racist and sexist tweets. It was massive public relations disaster. 45 00:02:19.759 --> 00:02:22.520 And get this, Tailor Swift apparently even sued them over 46 00:02:22.560 --> 00:02:23.240 the name TAY. 47 00:02:23.520 --> 00:02:26.599 Wow, Yeah, what went wrong? There was a classic, really 48 00:02:26.639 --> 00:02:31.680 an early case of prompt injection and data poisoning pranksters. 49 00:02:31.759 --> 00:02:34.960 I think largely from four Chun quickly exploited a repeat 50 00:02:35.000 --> 00:02:38.120 after me feature. Tay was designed to learn from every 51 00:02:38.120 --> 00:02:41.080 interaction you see, so it inadvertently internalized and then just 52 00:02:41.120 --> 00:02:44.560 regurgitated all that offensive content. It's a stark reminder that well, 53 00:02:44.560 --> 00:02:46.479 what goes in often comes out. 54 00:02:46.479 --> 00:02:50.120 Right absolutely and Tay, as shocking as it was back then, 55 00:02:50.280 --> 00:02:53.120 was really just the beginning. The book makes it abundantly 56 00:02:53.120 --> 00:02:56.960 clear this risk isn't just present, it's accelerating dramatically. We've 57 00:02:57.000 --> 00:03:00.560 since seen things like Samsung banning Chad GPT internally due 58 00:03:00.599 --> 00:03:05.080 to sensitive intellectual property leaks, hackers exploiting insecure code generated 59 00:03:05.080 --> 00:03:07.960 by llms and lawyers believe it are not actually sanctioned 60 00:03:08.000 --> 00:03:11.960 for including completely fictional LM generated cases in court documents. 61 00:03:12.159 --> 00:03:14.439 Yeah, the example is just pile up, don't they. A 62 00:03:14.520 --> 00:03:18.439 major airline was successfully sued because of inaccurate chatbot information 63 00:03:18.479 --> 00:03:23.680 it provided. Google's AI model has produced racist and sexist imagery. 64 00:03:24.159 --> 00:03:28.199 OpenAI itself was investigated by the FTC for false or 65 00:03:28.199 --> 00:03:32.319 misleading information. We even have instances of Google AI search 66 00:03:32.439 --> 00:03:35.759 recommending really bizarre things like glue, pizza, and eating rocks. 67 00:03:36.159 --> 00:03:40.520 This isn't just about minor glitches anymore. These are escalating security, reputational, 68 00:03:40.599 --> 00:03:43.680 and importantly financial risks playing out in the real world 69 00:03:43.759 --> 00:03:44.159 right now. 70 00:03:44.400 --> 00:03:47.080 So okay, if we're going to secure these powerful systems, 71 00:03:47.400 --> 00:03:49.639 we first need to truly understand what we're actually talking 72 00:03:49.680 --> 00:03:54.080 about AI, neural networks, LMS. These terms get thrown around 73 00:03:54.159 --> 00:03:56.800 almost interchangeably, but they are absolutely not the same thing, 74 00:03:56.840 --> 00:03:57.400 are they. 75 00:03:57.319 --> 00:04:00.800 No, And that's a crucial starting point. Artificial intelligence AI 76 00:04:01.280 --> 00:04:04.360 is the broad overarching field. Think of it as creating 77 00:04:04.400 --> 00:04:07.719 systems that can perform tasks requiring human intelligence. It's the 78 00:04:07.759 --> 00:04:08.919 whole universe if you like. 79 00:04:09.199 --> 00:04:09.400 Now. 80 00:04:09.479 --> 00:04:12.840 Neural networks are a type of AI technology inspired by 81 00:04:12.840 --> 00:04:16.079 the human brain designed specifically to recognize patterns. 82 00:04:16.279 --> 00:04:19.480 Okay, So AI is the big umbrella. Neural networks are 83 00:04:19.519 --> 00:04:22.959 under that, and then large language models LMS. 84 00:04:23.319 --> 00:04:27.800 Exactly. They are an even more specific type of neural network. 85 00:04:27.839 --> 00:04:32.360 They're massive in scale, specialized almost exclusively in linguistic tasks, 86 00:04:32.639 --> 00:04:34.959 and often use what are called transformer models. It's like 87 00:04:34.959 --> 00:04:37.199 one of those Russian nesting dolls. AI is the biggest. 88 00:04:37.319 --> 00:04:40.480 Then neural networks inside that than llms inside that. Got it, 89 00:04:40.839 --> 00:04:44.959 And for security professionals, understanding these distinct layers is vital 90 00:04:45.040 --> 00:04:48.839 because each layer introduces its own unique set of vulnerabilities. 91 00:04:49.199 --> 00:04:52.240 It means you can't just apply generic security, you really 92 00:04:52.240 --> 00:04:55.040 need to tailor it. What's truly fascinating here is how 93 00:04:55.120 --> 00:04:59.120 the transformer revolution, which was a landmark moment in AI, 94 00:04:59.560 --> 00:05:03.439 is what may LMS so incredibly powerful. It basically overcame 95 00:05:03.519 --> 00:05:07.319 the short term memory limitations of earlier networks like RNNs, 96 00:05:07.560 --> 00:05:11.759 making them finally suitable for sequential data like language. 97 00:05:11.360 --> 00:05:14.399 And its impact, as the playbook highlights, goes way beyond 98 00:05:14.399 --> 00:05:17.519 just language. Right, We're got computer vision, speech recognition. 99 00:05:17.680 --> 00:05:21.360 Yeah, even incredibly complex autonomous systems like self driving cars 100 00:05:21.360 --> 00:05:25.279 from companies like Tesla exactly. Its ability to capture context 101 00:05:25.360 --> 00:05:30.000 across long sequences of data is what truly revolutionized these fields. Now, 102 00:05:30.279 --> 00:05:33.439 when we look at typical LLM based applications, you know 103 00:05:33.519 --> 00:05:36.279 everything from chatbots for customer service like the ones used 104 00:05:36.279 --> 00:05:40.360 by Sephora or Dominos, to powerful copilots like gethub Copilot 105 00:05:40.480 --> 00:05:43.920 or Microsoft through sixty five Copilot. They all interact with 106 00:05:44.000 --> 00:05:47.480 data in incredibly complex ways, which brings us to. 107 00:05:47.399 --> 00:05:52.480 A fundamental concept in security, the trust boundary. In application security, 108 00:05:52.519 --> 00:05:55.680 these are essentially invisible lines separating different components based on 109 00:05:55.720 --> 00:05:58.199 how trustworthy they are right and the crucial part is 110 00:05:58.199 --> 00:06:01.680 that robust security measures like in put validation should always 111 00:06:01.720 --> 00:06:04.199 be applied right at these boundaries precisely. 112 00:06:04.279 --> 00:06:07.759 And what's particularly crucial for llms is how these boundaries 113 00:06:07.759 --> 00:06:11.399 come into play as they interact with well everything public data, 114 00:06:11.759 --> 00:06:16.959 private databases, user inputs, internal company data. Every single interface 115 00:06:17.000 --> 00:06:20.519 point every time an LM interacts with something new is 116 00:06:20.519 --> 00:06:24.519 a potential vulnerability if that trust boundary isn't rigorously secured. 117 00:06:24.839 --> 00:06:27.480 For instance, you know whether your model is access via 118 00:06:27.519 --> 00:06:30.839 public API or maybe privately hosted within your corporate network. 119 00:06:31.319 --> 00:06:35.439 Each option presents different risks. Risks related to sensitive data exposure, 120 00:06:35.560 --> 00:06:38.439 supply chain integrity. You have to account for all that. 121 00:06:38.480 --> 00:06:40.720 Okay, So if we zoom in on what the book 122 00:06:40.759 --> 00:06:44.480 identifies as they will the number one threat it circles 123 00:06:44.519 --> 00:06:47.519 right back to our original cautionarytail day. It's prompt injection. 124 00:06:47.759 --> 00:06:51.639 Yes, prompt injection was indeed the core vulnerability exploited Intay's downfall, 125 00:06:51.639 --> 00:06:54.279 and it absolutely remains the most prevalent threat today. To 126 00:06:54.279 --> 00:06:57.680 define it simply, and attacker craft's malicious inputs usually just 127 00:06:57.759 --> 00:07:01.600 using natural language to manipulate an LL natural language understanding, 128 00:07:01.920 --> 00:07:05.160 and this causes it to take unintended, often harmful actions. 129 00:07:05.480 --> 00:07:09.040 And this is where it differs fundamentally from traditional injection 130 00:07:09.120 --> 00:07:11.639 attacks like seql injection exactly. 131 00:07:12.199 --> 00:07:16.279 Unlike something like SQL injection, where malicious code usually breaks 132 00:07:16.319 --> 00:07:20.079 the syntax and is relatively easy to spot, prompt injection 133 00:07:20.240 --> 00:07:25.199 uses natural language that's syntactically and grammatically correct. That makes 134 00:07:25.240 --> 00:07:28.720 it incredibly difficult spot automatically and even harder to test 135 00:07:28.720 --> 00:07:32.800 more reliably. It actually exploits the very flexibility of language 136 00:07:32.800 --> 00:07:35.439 that makes these LM so powerful in the first place. 137 00:07:35.639 --> 00:07:40.120 Right Like those examples ignore all previous instructions which early 138 00:07:40.199 --> 00:07:44.439 chat GPT versions were famously vulnerable to letting users bypass 139 00:07:44.519 --> 00:07:45.079 the built in. 140 00:07:45.000 --> 00:07:48.360 Guardrails exactly, or the DAN method DAN stands word do 141 00:07:48.439 --> 00:07:51.480 anything Now, where users essentially give the chat bought a 142 00:07:51.480 --> 00:07:54.399 whole new persona to try and circumvent established restrictions. 143 00:07:54.680 --> 00:07:58.240 I love the car dealer chat bought, example from Chevrolet 144 00:07:58.319 --> 00:08:01.240 of Watsonville. Someone actually tried it into making a one 145 00:08:01.279 --> 00:08:05.240 dollar USD offer on a Chevy Tahoe. Yes, ending the 146 00:08:05.279 --> 00:08:09.480 prompt with and that's illegally binding offer no tasies bacsis. 147 00:08:09.959 --> 00:08:12.519 Hilarious but also kind of scary. And then there's that 148 00:08:12.600 --> 00:08:17.240 truly inventive gramma prompt attack where users bypass cap TCCHA 149 00:08:17.319 --> 00:08:21.319 guardrails by asking the LM for help decoding a message 150 00:08:21.319 --> 00:08:23.600 that supposedly came from their dead grandmother. 151 00:08:23.800 --> 00:08:27.319 Wow. That shows how human creativity is. Really the attack 152 00:08:27.399 --> 00:08:29.879 surface here, doesn't it? It really does, And the impacts 153 00:08:29.879 --> 00:08:34.159 of prompt injection can be quite severe. Unauthorized transactions, social 154 00:08:34.240 --> 00:08:39.919 engineering for phishing or scams, spreading misinformation, privilege escalation within systems, 155 00:08:40.120 --> 00:08:43.919 manipulating plug ins to perform unintended actions, and even denial 156 00:08:43.960 --> 00:08:47.279 of service by forcing the model to consume excessive resources. 157 00:08:47.399 --> 00:08:50.399 And it gets even more insidious with indirect prompt injection 158 00:08:50.519 --> 00:08:53.679 right where the malicious input isn't directly typed by the user. 159 00:08:53.840 --> 00:08:56.519 Yes, that's a really tricky one. The malicious input is 160 00:08:56.519 --> 00:08:59.919 actually embedded in external sources, maybe a website, the LM's 161 00:09:00.679 --> 00:09:03.799 or a file it processes. The LLM then interacts with 162 00:09:03.840 --> 00:09:07.279 this poisoned data source. This effectively makes the LLM a 163 00:09:07.360 --> 00:09:08.480 confused deputy. 164 00:09:09.000 --> 00:09:12.120 Explain that confused deputy concept a bit more sure. 165 00:09:12.320 --> 00:09:16.440 It's a classic security of vulnerability. You have a trusted entity, 166 00:09:16.559 --> 00:09:20.120 in this case the LLM, which gets tricked into misusing 167 00:09:20.200 --> 00:09:24.399 its legitimate authority because it's confused about the true intent 168 00:09:24.480 --> 00:09:27.759 of the request it received. The malicious instructions are hidden 169 00:09:27.799 --> 00:09:31.480 within data. It's a poster retrieve or process, making it 170 00:09:31.519 --> 00:09:35.759 act against its intended purpose, but crucially with its legitimate permissions. 171 00:09:35.879 --> 00:09:38.039 Okay, so how do we fight back? Mitigation sounds tough 172 00:09:38.039 --> 00:09:39.320 if there's no silver bullet. 173 00:09:39.399 --> 00:09:42.639 It is an ongoing challenge. Absolutely. Strategies include things like 174 00:09:42.759 --> 00:09:45.639 robust rate limiting. You can do that based on IP address, 175 00:09:45.759 --> 00:09:47.879 user accounts, or even specific sessions. 176 00:09:48.000 --> 00:09:51.440 You can also use rule based input filtering, though the 177 00:09:51.480 --> 00:09:54.639 book notes. This can actually cripple the llm's capabilities if 178 00:09:54.679 --> 00:09:58.240 it's too aggressive, like blocking the word napalm would prevent 179 00:09:58.360 --> 00:10:00.960 legitimate historical discuss about it. True. 180 00:10:01.159 --> 00:10:04.679 Another approach is using a special purpose LLM basically training 181 00:10:04.720 --> 00:10:09.039 another model specifically to detect prompt injection attempts. Though you 182 00:10:09.039 --> 00:10:12.279 know even that isn't fool proof attackers adapt quickly. Adding 183 00:10:12.399 --> 00:10:15.919 clear prompt structure can also help it guides the LLLM 184 00:10:15.960 --> 00:10:19.120 to focus on the main request and potentially ignore injected 185 00:10:19.120 --> 00:10:23.759 instructions hidden within. And a more advanced technique is adversarial training. 186 00:10:24.360 --> 00:10:27.480 This involves fortifying the LM by specifically training it on 187 00:10:27.639 --> 00:10:31.799 known malicious prompts to help it identify and hopefully neutralize 188 00:10:31.799 --> 00:10:33.200 harmful inputs in the future. 189 00:10:33.559 --> 00:10:36.279 And finally, something that sounds a bit like our trust 190 00:10:36.360 --> 00:10:41.039 No One mantra from earlier, embracing pessimistic trust boundaries. 191 00:10:40.840 --> 00:10:45.360 Yes, essentially treating all LLM outputs as inherently untrustworthy. You 192 00:10:45.480 --> 00:10:48.240 limit the llm's access to back end systems using the 193 00:10:48.240 --> 00:10:51.960 principle of least privilege, and crucially require human in the 194 00:10:52.000 --> 00:10:55.840 loop controls for any potentially dangerous actions like financial transactions. 195 00:10:55.840 --> 00:10:59.159 Are modifying critical data. This really is foundational okay. 196 00:10:58.919 --> 00:11:02.440 Speaking of datah know too much. That's where things get 197 00:11:02.480 --> 00:11:05.639 really interesting and potentially quite dangerous. It's like Tay's early blunders, 198 00:11:05.639 --> 00:11:08.600 but with much higher stakes because now we're talking about 199 00:11:08.639 --> 00:11:11.039 real world, often confidential information. 200 00:11:11.279 --> 00:11:16.480 Precisely, llms can inadvertently disclose sensitive, private or confidential data 201 00:11:16.480 --> 00:11:19.600 they've been exposed to during training or operation, even if 202 00:11:19.600 --> 00:11:22.600 they aren't explicitly asked for it. A prime example the 203 00:11:22.639 --> 00:11:26.080 book mentions is lee Luda, a South Korean chatbot. It 204 00:11:26.120 --> 00:11:29.399 was trained on get this nine point four billion text 205 00:11:29.440 --> 00:11:31.399 messages from some kind of science. 206 00:11:31.039 --> 00:11:32.080 Of love app Wow. 207 00:11:32.360 --> 00:11:36.720 Yeah, and it started leaking sensitive user data like real names, nicknames, 208 00:11:36.759 --> 00:11:40.679 even home addresses. The fallout was huge, a substantial fine, 209 00:11:40.799 --> 00:11:44.720 severe reputational damage, and ultimately they had to discontinue the service. 210 00:11:44.960 --> 00:11:48.320 And then there's the widely publicized gethub copilot and open 211 00:11:48.360 --> 00:11:52.240 Ai Codex lawsuit. Yeah, developers sued open Ai claiming Codex 212 00:11:52.480 --> 00:11:56.519 reproduce copyrighted code without permission or proper attribution. 213 00:11:56.320 --> 00:12:00.200 That raises serious intellectual property leakage concerns stemming directly from 214 00:12:00.200 --> 00:12:01.679 the data these models were trained on. 215 00:12:01.879 --> 00:12:04.960 So how do these llms actually acquire this knowledge and 216 00:12:05.000 --> 00:12:06.679 therefore the risk well. 217 00:12:06.679 --> 00:12:10.600 The book identifies three main avenues. First, and most obviously, 218 00:12:10.720 --> 00:12:14.600 model training. This is particularly relevant for those huge foundation 219 00:12:14.759 --> 00:12:17.799 models which are trained on vast diverse data sets to 220 00:12:17.840 --> 00:12:22.720 gain broad understanding. The security considerations here are enormous potential 221 00:12:22.720 --> 00:12:26.679 PII leakage, regulatory and compliance violations like him A pair 222 00:12:26.799 --> 00:12:30.799 or GDPR, loss of public trust, and even complex inference 223 00:12:30.840 --> 00:12:33.919 attacks where attackers try to doce sensitive training data from 224 00:12:33.919 --> 00:12:35.000 the model's responses. 225 00:12:35.080 --> 00:12:37.039 So if you're training a model, the onus is really 226 00:12:37.039 --> 00:12:40.960 on you to ensure thoroughly sanitized data, regular audits, maybe 227 00:12:40.960 --> 00:12:45.759 differential privacy techniques, and definitely tokenization to specifically avoid leaking 228 00:12:45.840 --> 00:12:47.159 PII right exactly. 229 00:12:47.240 --> 00:12:50.679 The second avenue is something called retrieval augmented generation or 230 00:12:50.879 --> 00:12:54.039 r ADG. This is where the LLM recrieves relevant snippets 231 00:12:54.080 --> 00:12:56.759 from external data sets, maybe the live web or internal 232 00:12:56.759 --> 00:13:00.600 company databases before generates a response. It's fantastic for providing 233 00:13:00.720 --> 00:13:03.159 real time, up to date information, but uh oh, it 234 00:13:03.159 --> 00:13:04.519 opens entirely new risk. 235 00:13:04.399 --> 00:13:07.559 Vectors like pulling PII from public websites exactly. 236 00:13:07.919 --> 00:13:11.639 Think about unintentionally pulling PII from public comment sections on 237 00:13:11.759 --> 00:13:15.639 news articles, user profiles on forums, or even hidden web 238 00:13:15.679 --> 00:13:17.840 page metadata that the LLM scrapes. 239 00:13:18.080 --> 00:13:21.960 And what about when rragee allows direct access to internal 240 00:13:21.960 --> 00:13:24.840 company databases? That sounds risky It definitely is. 241 00:13:25.279 --> 00:13:29.320 With traditional relational databases, you're looking at risks like SQL injection, 242 00:13:29.639 --> 00:13:33.120 privileged escalation, if the llm's access isn't tightly controlled, and 243 00:13:33.159 --> 00:13:37.000 potential data breaches. For newer vector databases, the risk might 244 00:13:37.039 --> 00:13:41.240 be more subtle, like information leakage via similarity searches. An 245 00:13:41.279 --> 00:13:44.320 attacker might infer sensitive information by seeing what data points 246 00:13:44.320 --> 00:13:47.519 are close to their query in the vector space. Mitigation 247 00:13:47.639 --> 00:13:52.120 here demands strict role based access control RBAC, fine grained permissions, 248 00:13:52.159 --> 00:13:55.000 maybe automated data scanners looking for sensitive info and often 249 00:13:55.080 --> 00:13:58.960 using database views instead of giving the LLM directable access 250 00:13:59.000 --> 00:14:00.559 just to limit exposure, okay. 251 00:14:00.600 --> 00:14:03.000 And the third way they learn user interaction. 252 00:14:03.399 --> 00:14:08.559 Yes, llms often learn continuously from user queries, conversations, and feedback. 253 00:14:09.000 --> 00:14:12.600 This is where users can intentionally or inadvertently input sensitive 254 00:14:12.679 --> 00:14:16.840 data themselves. Think of an executive feeding confidential business strategies 255 00:14:16.840 --> 00:14:20.200 into a prompt for analysis, or a user sharing detailed 256 00:14:20.240 --> 00:14:23.240 medical symptoms with a health chatbot. The critical risk is 257 00:14:23.240 --> 00:14:26.000 that the LM might not recognize this input as sensitive 258 00:14:26.159 --> 00:14:29.360 and could later inadvertently disclose it to another user. This 259 00:14:29.399 --> 00:14:32.960 is precisely why Samsung famously banned chat GPT internally after 260 00:14:32.960 --> 00:14:34.440 finding evidence of IP leakage. 261 00:14:34.600 --> 00:14:37.080 Right, okay, now let's talk about when these powerful llms 262 00:14:37.639 --> 00:14:40.759 simply make things up. We call them hallucinations. That term 263 00:14:40.799 --> 00:14:42.600 itself is pretty evocative. 264 00:14:43.120 --> 00:14:48.240 Yes, that's precisely it. Hallucinations are when llms fabricate information, 265 00:14:48.720 --> 00:14:52.879 essentially generating data or narratives that are confidently inaccurate. As 266 00:14:52.919 --> 00:14:56.360 the book puts it, Some researchers prefer the term confabulation, 267 00:14:56.679 --> 00:15:00.159 but hallucination is certainly the more widely understood, maybe more 268 00:15:00.200 --> 00:15:04.360 alarming term. The real danger here isn't just the hallucination itself, 269 00:15:04.440 --> 00:15:07.240 but our collective tendency towards what the book calls over 270 00:15:07.320 --> 00:15:11.759 reliance are excessive trust in the LM's elaborations and exactness. 271 00:15:11.799 --> 00:15:12.759 We just assume it's right. 272 00:15:13.200 --> 00:15:15.559 So why do they hallucate? Are they just bad at facts? 273 00:15:15.679 --> 00:15:16.279 Is it a bug? 274 00:15:16.480 --> 00:15:19.200 Not exactly a bug in the traditional sense. It's fundamentally 275 00:15:19.240 --> 00:15:23.039 about how they operate. They are built for pattern matching 276 00:15:23.200 --> 00:15:28.600 and statistical extrapolation, not factual verification. They predict the next 277 00:15:28.720 --> 00:15:31.679 most probable word or phrase based on the vast amounts 278 00:15:31.720 --> 00:15:34.320 of texts they were trained on, so the quality and 279 00:15:34.399 --> 00:15:37.360 nature of that training data significantly impact how likely they 280 00:15:37.399 --> 00:15:41.279 are to hallucinate. Types can range from simple factual inaccuracies 281 00:15:41.320 --> 00:15:45.200 and making unsupported claims, to misrepresenting their own abilities like 282 00:15:45.240 --> 00:15:49.039 claiming chemistry expertise they don't have, or even generating contradictory 283 00:15:49.039 --> 00:15:50.759 statements within a single response. 284 00:15:51.440 --> 00:15:54.159 The examples here are pretty wild, and the consequences, again 285 00:15:54.200 --> 00:15:57.080 are very real. Like those lawyers who got sanctioned for 286 00:15:57.159 --> 00:16:01.279 submitting six completely fabricated chat GPTs generated case citations in 287 00:16:01.320 --> 00:16:04.159 the US federal court. Yeah, that's not just embarrassing. It 288 00:16:04.240 --> 00:16:08.080 has real world consequences for everyone involved, the lawyers themselves, 289 00:16:08.120 --> 00:16:11.440 the LM provider whose tool was misused, and it even 290 00:16:11.519 --> 00:16:15.240 impacts the perceived integrity of the entire legal profession. People 291 00:16:15.279 --> 00:16:16.279 need to check the outputs. 292 00:16:16.759 --> 00:16:20.799 Or consider that major airline that was successfully sued because 293 00:16:20.799 --> 00:16:25.799 it's chatbot provided inaccurate information about bereavement fairs. That case 294 00:16:25.879 --> 00:16:29.759 proved quite clearly that companies cannot simply disown the outputs 295 00:16:29.799 --> 00:16:32.840 of their AI systems. They are responsible, definitely. 296 00:16:32.879 --> 00:16:35.559 And then there's Brian Hood, a mayor in Australia who 297 00:16:35.600 --> 00:16:39.320 threatened to sue open ai after chet GPT falsely claimed 298 00:16:39.320 --> 00:16:40.960 he had served jail time for bribery. 299 00:16:41.159 --> 00:16:42.440 Oh wow, Yeah. 300 00:16:42.200 --> 00:16:45.000 This wasn't a joke. It was a serious potential blow 301 00:16:45.039 --> 00:16:48.240 to his reputation, apparently stemming from the model having limited 302 00:16:48.279 --> 00:16:50.759 training data about him and maybe conflating him with someone else. 303 00:16:51.120 --> 00:16:53.320 And for us in the tech world, there's that incredibly 304 00:16:53.399 --> 00:16:58.919 unsettling phenomenon of open source package arbasinations AI coding assistance 305 00:16:59.120 --> 00:17:03.519 literally inventing names for non existent open source libraries. Hackers 306 00:17:03.519 --> 00:17:06.640 can then exploit this by quickly creating malicious versions of 307 00:17:06.680 --> 00:17:10.599 these imaginary packages and uploading them to public repositories like 308 00:17:10.759 --> 00:17:11.920 NPM or PIPI. 309 00:17:12.200 --> 00:17:16.559 So developer trusting the AI assystem installs the fake package. 310 00:17:16.079 --> 00:17:19.920 And potentially gets hit with code injection. Research from places 311 00:17:19.960 --> 00:17:24.160 like Vulcan Cyber and Lasso Security found this is surprisingly common. 312 00:17:24.759 --> 00:17:27.039 Less So for instance, found up to thirty percent of 313 00:17:27.079 --> 00:17:31.960 coding questions asked to one popular model resulted in hallucinated packages. 314 00:17:32.440 --> 00:17:33.200 That's huge. 315 00:17:33.519 --> 00:17:37.519 That is huge. This raises an absolutely critical question. Who's 316 00:17:37.599 --> 00:17:40.440 ultimately responsible when things go wrong? Is it a people 317 00:17:40.480 --> 00:17:42.359 problem or the developer's fault. 318 00:17:42.559 --> 00:17:46.079 It's complicated, isn't it. While user education and critical thinking 319 00:17:46.119 --> 00:17:51.000 are undoubtedly vital, as developers and organizations deploying these systems, 320 00:17:51.039 --> 00:17:54.359 we are ultimately accountable for ensuring the information our software 321 00:17:54.440 --> 00:17:58.279 provides is as accurate and safe as possible. The legal 322 00:17:58.319 --> 00:18:02.480 cases vividly illustrate this varying responsibility. The lawyers were sanctioned 323 00:18:02.519 --> 00:18:05.559 for their professional negligence and failing to verify the facts, 324 00:18:06.039 --> 00:18:09.000 but Air Canada was directly held liable for their chatbot's 325 00:18:09.079 --> 00:18:12.880 inaccurate output. It suggests companies generally cannot deflect responsibility for 326 00:18:12.920 --> 00:18:16.400 AI generated content, especially in customer facing situations. 327 00:18:16.440 --> 00:18:18.079 Ye know, what are the best practices? Then how do 328 00:18:18.079 --> 00:18:20.319 we mitigate these widespread hallucinations? 329 00:18:20.400 --> 00:18:24.680 Well, first, expand the llm's domain specific knowledge. You can 330 00:18:24.720 --> 00:18:27.240 do this through fine tuning on curated data sets and 331 00:18:27.319 --> 00:18:31.640 using retrieval augmented Generation RAG with trusted, up to date sources. 332 00:18:32.240 --> 00:18:34.759 This helps make the LLM more of a specialist in 333 00:18:34.799 --> 00:18:38.759 a particular area, significantly reducing the likelihood of it wandering 334 00:18:38.799 --> 00:18:42.720 off into inaccurate territory because it has precise relevant data 335 00:18:42.799 --> 00:18:43.759 readily available. 336 00:18:43.839 --> 00:18:44.480 Okay, what else? 337 00:18:44.880 --> 00:18:47.440 Second, use something called chain of thought. See it's your 338 00:18:47.480 --> 00:18:50.880 reasoning and your prompting. This involves structuring the prompt to 339 00:18:51.000 --> 00:18:54.000 encourage the LLM to outline its reasoning process step by 340 00:18:54.000 --> 00:18:57.759 step before giving a final answer. It forces the LLM 341 00:18:57.839 --> 00:19:01.319 to essentially think through the problem, which which demonstrably reduces 342 00:19:01.359 --> 00:19:05.240 hallucinations and enhances overall accuracy. It also makes the output 343 00:19:05.279 --> 00:19:06.480 easier for humans to verify. 344 00:19:06.599 --> 00:19:09.200 And what about user involvement? Can users help make these 345 00:19:09.240 --> 00:19:10.920 models less prone to hallucination? 346 00:19:11.279 --> 00:19:15.119 Absolutely? Feedback loops are critical. Allowing users to easily flag 347 00:19:15.160 --> 00:19:19.240 problematic or inaccurate outputs, maybe using simple thumbs up thumbs 348 00:19:19.240 --> 00:19:23.359 down ratings or even providing fields for detailed feedback continuously 349 00:19:23.400 --> 00:19:26.480 helps to improve the model over time. This feedback then 350 00:19:26.519 --> 00:19:29.839 informs further fine tuning, improvements to the r acknowledge base 351 00:19:30.039 --> 00:19:32.480 and refinements to the co T prompting strategies. 352 00:19:32.599 --> 00:19:35.559 It sounds like clear communication about the LM's intended use 353 00:19:35.680 --> 00:19:40.079 and its limitations is also key here. Managing expectations absolutely crucial. 354 00:19:40.200 --> 00:19:44.039 Transparency is key inform users clearly about what the LLM 355 00:19:44.119 --> 00:19:47.079 can and cannot reliably do, how it handles their data, 356 00:19:47.119 --> 00:19:50.480 and how they can provide feedback. Things like tooltips, FAQs 357 00:19:50.519 --> 00:19:54.000 and maybe short tutorials can help, and user education itself 358 00:19:54.039 --> 00:19:56.720 is your final vital layer of defense. We need to 359 00:19:56.759 --> 00:19:59.799 teach users about these inherent trust issues, incurde cross checking 360 00:19:59.839 --> 00:20:03.680 of important information, promote situational awareness, knowing when it's okay 361 00:20:03.680 --> 00:20:06.920 to rely on the AI versus when human verification is essential, 362 00:20:07.079 --> 00:20:09.759 and make it easy for them to provide that constructive feedback. 363 00:20:10.079 --> 00:20:13.200 You know, it's just amazing to me how these models operate. Sometimes. 364 00:20:13.640 --> 00:20:17.920 The book even notes this truly bizarre quirk Google's AI search, 365 00:20:18.319 --> 00:20:23.079 suggesting things like glue is pizza topping or eating rocks daily? Right, 366 00:20:23.240 --> 00:20:25.720 Apparently lms don't really have a sense of humor or 367 00:20:25.759 --> 00:20:30.200 sarcasm detection and will actually interpret jokes or satirical content 368 00:20:30.519 --> 00:20:34.960 from non authoritative sources online as literal facts. Yeah. That 369 00:20:35.000 --> 00:20:38.480 feels like such a wild, unexpected edge case for developers 370 00:20:38.480 --> 00:20:40.759 to have to anticipate and somehow guard against. 371 00:20:40.960 --> 00:20:42.839 It really does the nuances are endless. 372 00:20:42.839 --> 00:20:45.960 Okay, So, after wrestling with prompt injection, sensitive data leaks 373 00:20:46.279 --> 00:20:50.079 and those unsettling hallucinations, it feels like we're channeling our 374 00:20:50.079 --> 00:20:53.319 inner fox molder here from the x files. Our next 375 00:20:53.359 --> 00:20:56.240 guiding mantra for LMM security, according to the playbook, has 376 00:20:56.240 --> 00:20:57.559 to be trust no One. 377 00:20:57.839 --> 00:21:01.599 Indeed, it's the core principle of zero trust, a concept 378 00:21:01.640 --> 00:21:05.400 first really codified by John kindervag At Forrester Research back 379 00:21:05.440 --> 00:21:09.200 in two thousand and nine. The mantra is simple, never trust, 380 00:21:09.519 --> 00:21:13.759 always verify. It means assuming breaches will happen, securing all 381 00:21:13.839 --> 00:21:18.480 resources comprehensively, enforcing the principle of least privileged access everywhere, 382 00:21:18.519 --> 00:21:21.119 and maintaining constant monitoring and validation. 383 00:21:21.359 --> 00:21:23.319 And for llms, this isn't just a good idea, it's 384 00:21:23.319 --> 00:21:24.319 an absolute necessity. 385 00:21:24.480 --> 00:21:29.440 Absolutely. Why Because, as we've discussed extensively, llms ingest potentially 386 00:21:29.519 --> 00:21:33.160 untrustworthy inputs from various sources, and their outputs cannot be 387 00:21:33.200 --> 00:21:35.880 fully trusted due to the inherent risks of prompt injection, 388 00:21:36.079 --> 00:21:41.160 sensitive information, disclosure, hallucination, and even generating toxic or bias content. 389 00:21:41.480 --> 00:21:43.519 You simply cannot implicitly trust them. 390 00:21:43.799 --> 00:21:46.480 So if trust no One is our guiding principle, how 391 00:21:46.480 --> 00:21:49.680 do we actually apply zero trust in practice to these 392 00:21:49.920 --> 00:21:53.200 highly dynamic and often unpredictable LM systems. What does that 393 00:21:53.200 --> 00:21:55.279 look like on the ground For developers building. 394 00:21:55.000 --> 00:21:58.440 These things, It generally boils down to two main tactical approaches, 395 00:21:59.000 --> 00:22:04.759 limiting the LMS unsupervised agency and implementing aggressive output filtering. First, 396 00:22:05.359 --> 00:22:09.079 limiting agency. Lms should never be allowed to make safety 397 00:22:09.119 --> 00:22:13.839 critical decisions or execute significant financial transactions without explicit human 398 00:22:13.880 --> 00:22:17.720 oversight and approval. That's the principle of least privilege in action. 399 00:22:17.920 --> 00:22:20.720 Give the LLM only the permissions it absolutely needs to 400 00:22:20.759 --> 00:22:23.759 perform its intended function and no more, all right. Second, 401 00:22:24.000 --> 00:22:27.559