WEBVTT

1
00:00:00.080 --> 00:00:01.800
<v Speaker 1>Welcome to the deep dive, where we cut through the

2
00:00:01.800 --> 00:00:03.720
<v Speaker 1>noise and get straight to the insights you need to

3
00:00:03.720 --> 00:00:07.040
<v Speaker 1>be truly well informed. Today, we're plunging into a topic

4
00:00:07.080 --> 00:00:11.199
<v Speaker 1>that's not just fast moving, but accelerating at light speed,

5
00:00:11.839 --> 00:00:15.679
<v Speaker 1>generative AI in large Language Model security. The rapid adoption

6
00:00:15.759 --> 00:00:19.359
<v Speaker 1>of these technologies is both exhilarating and if we're honest,

7
00:00:19.399 --> 00:00:21.920
<v Speaker 1>maybe a little bit terrifying. Staying ahead of the curve

8
00:00:21.960 --> 00:00:24.800
<v Speaker 1>here isn't just an advantage, it's well, it's a constant,

9
00:00:24.839 --> 00:00:25.559
<v Speaker 1>high stakes race.

10
00:00:26.079 --> 00:00:29.679
<v Speaker 2>It absolutely is. And this deep dive into LLM security

11
00:00:29.719 --> 00:00:33.000
<v Speaker 2>is built around Steve Wilson's The Developer's Playbook for Large

12
00:00:33.039 --> 00:00:37.119
<v Speaker 2>Language Model Security. It's really an absolutely critical and comprehensive

13
00:00:37.159 --> 00:00:40.799
<v Speaker 2>guide for anyone navigating the security landscape of AI right.

14
00:00:40.679 --> 00:00:42.840
<v Speaker 1>Now, right And our mission for you today is to

15
00:00:42.880 --> 00:00:45.119
<v Speaker 1>extract the most important nuggets from this playbook, give you

16
00:00:45.119 --> 00:00:48.079
<v Speaker 1>a genuine shortcut to being well informed on LLLM security,

17
00:00:48.479 --> 00:00:51.560
<v Speaker 1>and hopefully delivers some surprising facts and practical guidance along

18
00:00:51.600 --> 00:00:54.039
<v Speaker 1>the way, Because, let's face it, element security is where

19
00:00:54.079 --> 00:00:58.000
<v Speaker 1>the thrill of innovation undeniably meets high stakes and some

20
00:00:58.159 --> 00:01:01.159
<v Speaker 1>very real world consequences We're going to unpack the unique

21
00:01:01.240 --> 00:01:05.480
<v Speaker 1>challenges everything from the very architecture of these llms and

22
00:01:05.519 --> 00:01:09.000
<v Speaker 1>how we define trust boundaries to the insidious threat of

23
00:01:09.000 --> 00:01:13.239
<v Speaker 1>prompt injection, those bizarre and sometimes damaging hallucinations, and ultimately

24
00:01:13.280 --> 00:01:17.239
<v Speaker 1>how to ensure your applications delivered truly secure outcomes. So

25
00:01:17.280 --> 00:01:20.040
<v Speaker 1>what do you say, let's unpack this. You know, it's

26
00:01:20.079 --> 00:01:22.000
<v Speaker 1>easy to think of these AI blenders as a very

27
00:01:22.000 --> 00:01:25.239
<v Speaker 1>recent phenomenon, something that only cropped up with say chat GPT,

28
00:01:25.760 --> 00:01:28.200
<v Speaker 1>but the very first big public lesson actually came almost

29
00:01:28.200 --> 00:01:32.280
<v Speaker 1>a decade ago now involves Microsoft's infamous chatbot pay. Does

30
00:01:32.319 --> 00:01:32.879
<v Speaker 1>that ring a bell?

31
00:01:33.000 --> 00:01:37.120
<v Speaker 2>Oh, it certainly does. March twenty sixteen, Microsoft launched Tay,

32
00:01:37.280 --> 00:01:40.079
<v Speaker 2>designed to mimic I think a nineteen year old American girl,

33
00:01:40.359 --> 00:01:43.319
<v Speaker 2>primarily targeting eighteen to twenty four year olds on platforms

34
00:01:43.319 --> 00:01:48.280
<v Speaker 2>like Twitter and Snapchat. Its stated goal was real world

35
00:01:48.359 --> 00:01:52.079
<v Speaker 2>research on conversational understanding. And it started so innocently, didn't it,

36
00:01:52.120 --> 00:01:53.640
<v Speaker 2>with a tweet Hello world?

37
00:01:53.920 --> 00:01:57.359
<v Speaker 1>Right, Hello world? And then within hours, literally hours, it

38
00:01:57.400 --> 00:02:01.079
<v Speaker 1>went from that to opinionated, not afraid to and then

39
00:02:01.280 --> 00:02:04.959
<v Speaker 1>it just completely spiral. It quickly became racist, sexist, and

40
00:02:05.120 --> 00:02:07.760
<v Speaker 1>I mean even called for violence. The fallout was immediate

41
00:02:07.799 --> 00:02:10.360
<v Speaker 1>and absolute brutal. Less than twenty four hours later, headlines

42
00:02:10.360 --> 00:02:13.560
<v Speaker 1>were screaming things like Microsoft shuts down AI chatbot after

43
00:02:13.560 --> 00:02:16.159
<v Speaker 1>it turned into a Nazi and Microsoft deeply sorry for

44
00:02:16.240 --> 00:02:19.599
<v Speaker 1>racist and sexist tweets. It was massive public relations disaster.

45
00:02:19.759 --> 00:02:22.520
<v Speaker 1>And get this, Tailor Swift apparently even sued them over

46
00:02:22.560 --> 00:02:23.240
<v Speaker 1>the name TAY.

47
00:02:23.520 --> 00:02:26.599
<v Speaker 2>Wow, Yeah, what went wrong? There was a classic, really

48
00:02:26.639 --> 00:02:31.680
<v Speaker 2>an early case of prompt injection and data poisoning pranksters.

49
00:02:31.759 --> 00:02:34.960
<v Speaker 2>I think largely from four Chun quickly exploited a repeat

50
00:02:35.000 --> 00:02:38.120
<v Speaker 2>after me feature. Tay was designed to learn from every

51
00:02:38.120 --> 00:02:41.080
<v Speaker 2>interaction you see, so it inadvertently internalized and then just

52
00:02:41.120 --> 00:02:44.560
<v Speaker 2>regurgitated all that offensive content. It's a stark reminder that well,

53
00:02:44.560 --> 00:02:46.479
<v Speaker 2>what goes in often comes out.

54
00:02:46.479 --> 00:02:50.120
<v Speaker 1>Right absolutely and Tay, as shocking as it was back then,

55
00:02:50.280 --> 00:02:53.120
<v Speaker 1>was really just the beginning. The book makes it abundantly

56
00:02:53.120 --> 00:02:56.960
<v Speaker 1>clear this risk isn't just present, it's accelerating dramatically. We've

57
00:02:57.000 --> 00:03:00.560
<v Speaker 1>since seen things like Samsung banning Chad GPT internally due

58
00:03:00.599 --> 00:03:05.080
<v Speaker 1>to sensitive intellectual property leaks, hackers exploiting insecure code generated

59
00:03:05.080 --> 00:03:07.960
<v Speaker 1>by llms and lawyers believe it are not actually sanctioned

60
00:03:08.000 --> 00:03:11.960
<v Speaker 1>for including completely fictional LM generated cases in court documents.

61
00:03:12.159 --> 00:03:14.439
<v Speaker 2>Yeah, the example is just pile up, don't they. A

62
00:03:14.520 --> 00:03:18.439
<v Speaker 2>major airline was successfully sued because of inaccurate chatbot information

63
00:03:18.479 --> 00:03:23.680
<v Speaker 2>it provided. Google's AI model has produced racist and sexist imagery.

64
00:03:24.159 --> 00:03:28.199
<v Speaker 2>OpenAI itself was investigated by the FTC for false or

65
00:03:28.199 --> 00:03:32.319
<v Speaker 2>misleading information. We even have instances of Google AI search

66
00:03:32.439 --> 00:03:35.759
<v Speaker 2>recommending really bizarre things like glue, pizza, and eating rocks.

67
00:03:36.159 --> 00:03:40.520
<v Speaker 2>This isn't just about minor glitches anymore. These are escalating security, reputational,

68
00:03:40.599 --> 00:03:43.680
<v Speaker 2>and importantly financial risks playing out in the real world

69
00:03:43.759 --> 00:03:44.159
<v Speaker 2>right now.

70
00:03:44.400 --> 00:03:47.080
<v Speaker 1>So okay, if we're going to secure these powerful systems,

71
00:03:47.400 --> 00:03:49.639
<v Speaker 1>we first need to truly understand what we're actually talking

72
00:03:49.680 --> 00:03:54.080
<v Speaker 1>about AI, neural networks, LMS. These terms get thrown around

73
00:03:54.159 --> 00:03:56.800
<v Speaker 1>almost interchangeably, but they are absolutely not the same thing,

74
00:03:56.840 --> 00:03:57.400
<v Speaker 1>are they.

75
00:03:57.319 --> 00:04:00.800
<v Speaker 2>No, And that's a crucial starting point. Artificial intelligence AI

76
00:04:01.280 --> 00:04:04.360
<v Speaker 2>is the broad overarching field. Think of it as creating

77
00:04:04.400 --> 00:04:07.719
<v Speaker 2>systems that can perform tasks requiring human intelligence. It's the

78
00:04:07.759 --> 00:04:08.919
<v Speaker 2>whole universe if you like.

79
00:04:09.199 --> 00:04:09.400
<v Speaker 1>Now.

80
00:04:09.479 --> 00:04:12.840
<v Speaker 2>Neural networks are a type of AI technology inspired by

81
00:04:12.840 --> 00:04:16.079
<v Speaker 2>the human brain designed specifically to recognize patterns.

82
00:04:16.279 --> 00:04:19.480
<v Speaker 1>Okay, So AI is the big umbrella. Neural networks are

83
00:04:19.519 --> 00:04:22.959
<v Speaker 1>under that, and then large language models LMS.

84
00:04:23.319 --> 00:04:27.800
<v Speaker 2>Exactly. They are an even more specific type of neural network.

85
00:04:27.839 --> 00:04:32.360
<v Speaker 2>They're massive in scale, specialized almost exclusively in linguistic tasks,

86
00:04:32.639 --> 00:04:34.959
<v Speaker 2>and often use what are called transformer models. It's like

87
00:04:34.959 --> 00:04:37.199
<v Speaker 2>one of those Russian nesting dolls. AI is the biggest.

88
00:04:37.319 --> 00:04:40.480
<v Speaker 2>Then neural networks inside that than llms inside that. Got it,

89
00:04:40.839 --> 00:04:44.959
<v Speaker 2>And for security professionals, understanding these distinct layers is vital

90
00:04:45.040 --> 00:04:48.839
<v Speaker 2>because each layer introduces its own unique set of vulnerabilities.

91
00:04:49.199 --> 00:04:52.240
<v Speaker 2>It means you can't just apply generic security, you really

92
00:04:52.240 --> 00:04:55.040
<v Speaker 2>need to tailor it. What's truly fascinating here is how

93
00:04:55.120 --> 00:04:59.120
<v Speaker 2>the transformer revolution, which was a landmark moment in AI,

94
00:04:59.560 --> 00:05:03.439
<v Speaker 2>is what may LMS so incredibly powerful. It basically overcame

95
00:05:03.519 --> 00:05:07.319
<v Speaker 2>the short term memory limitations of earlier networks like RNNs,

96
00:05:07.560 --> 00:05:11.759
<v Speaker 2>making them finally suitable for sequential data like language.

97
00:05:11.360 --> 00:05:14.399
<v Speaker 1>And its impact, as the playbook highlights, goes way beyond

98
00:05:14.399 --> 00:05:17.519
<v Speaker 1>just language. Right, We're got computer vision, speech recognition.

99
00:05:17.680 --> 00:05:21.360
<v Speaker 2>Yeah, even incredibly complex autonomous systems like self driving cars

100
00:05:21.360 --> 00:05:25.279
<v Speaker 2>from companies like Tesla exactly. Its ability to capture context

101
00:05:25.360 --> 00:05:30.000
<v Speaker 2>across long sequences of data is what truly revolutionized these fields. Now,

102
00:05:30.279 --> 00:05:33.439
<v Speaker 2>when we look at typical LLM based applications, you know

103
00:05:33.519 --> 00:05:36.279
<v Speaker 2>everything from chatbots for customer service like the ones used

104
00:05:36.279 --> 00:05:40.360
<v Speaker 2>by Sephora or Dominos, to powerful copilots like gethub Copilot

105
00:05:40.480 --> 00:05:43.920
<v Speaker 2>or Microsoft through sixty five Copilot. They all interact with

106
00:05:44.000 --> 00:05:47.480
<v Speaker 2>data in incredibly complex ways, which brings us to.

107
00:05:47.399 --> 00:05:52.480
<v Speaker 1>A fundamental concept in security, the trust boundary. In application security,

108
00:05:52.519 --> 00:05:55.680
<v Speaker 1>these are essentially invisible lines separating different components based on

109
00:05:55.720 --> 00:05:58.199
<v Speaker 1>how trustworthy they are right and the crucial part is

110
00:05:58.199 --> 00:06:01.680
<v Speaker 1>that robust security measures like in put validation should always

111
00:06:01.720 --> 00:06:04.199
<v Speaker 1>be applied right at these boundaries precisely.

112
00:06:04.279 --> 00:06:07.759
<v Speaker 2>And what's particularly crucial for llms is how these boundaries

113
00:06:07.759 --> 00:06:11.399
<v Speaker 2>come into play as they interact with well everything public data,

114
00:06:11.759 --> 00:06:16.959
<v Speaker 2>private databases, user inputs, internal company data. Every single interface

115
00:06:17.000 --> 00:06:20.519
<v Speaker 2>point every time an LM interacts with something new is

116
00:06:20.519 --> 00:06:24.519
<v Speaker 2>a potential vulnerability if that trust boundary isn't rigorously secured.

117
00:06:24.839 --> 00:06:27.480
<v Speaker 2>For instance, you know whether your model is access via

118
00:06:27.519 --> 00:06:30.839
<v Speaker 2>public API or maybe privately hosted within your corporate network.

119
00:06:31.319 --> 00:06:35.439
<v Speaker 2>Each option presents different risks. Risks related to sensitive data exposure,

120
00:06:35.560 --> 00:06:38.439
<v Speaker 2>supply chain integrity. You have to account for all that.

121
00:06:38.480 --> 00:06:40.720
<v Speaker 1>Okay, So if we zoom in on what the book

122
00:06:40.759 --> 00:06:44.480
<v Speaker 1>identifies as they will the number one threat it circles

123
00:06:44.519 --> 00:06:47.519
<v Speaker 1>right back to our original cautionarytail day. It's prompt injection.

124
00:06:47.759 --> 00:06:51.639
<v Speaker 2>Yes, prompt injection was indeed the core vulnerability exploited Intay's downfall,

125
00:06:51.639 --> 00:06:54.279
<v Speaker 2>and it absolutely remains the most prevalent threat today. To

126
00:06:54.279 --> 00:06:57.680
<v Speaker 2>define it simply, and attacker craft's malicious inputs usually just

127
00:06:57.759 --> 00:07:01.600
<v Speaker 2>using natural language to manipulate an LL natural language understanding,

128
00:07:01.920 --> 00:07:05.160
<v Speaker 2>and this causes it to take unintended, often harmful actions.

129
00:07:05.480 --> 00:07:09.040
<v Speaker 1>And this is where it differs fundamentally from traditional injection

130
00:07:09.120 --> 00:07:11.639
<v Speaker 1>attacks like seql injection exactly.

131
00:07:12.199 --> 00:07:16.279
<v Speaker 2>Unlike something like SQL injection, where malicious code usually breaks

132
00:07:16.319 --> 00:07:20.079
<v Speaker 2>the syntax and is relatively easy to spot, prompt injection

133
00:07:20.240 --> 00:07:25.199
<v Speaker 2>uses natural language that's syntactically and grammatically correct. That makes

134
00:07:25.240 --> 00:07:28.720
<v Speaker 2>it incredibly difficult spot automatically and even harder to test

135
00:07:28.720 --> 00:07:32.800
<v Speaker 2>more reliably. It actually exploits the very flexibility of language

136
00:07:32.800 --> 00:07:35.439
<v Speaker 2>that makes these LM so powerful in the first place.

137
00:07:35.639 --> 00:07:40.120
<v Speaker 1>Right Like those examples ignore all previous instructions which early

138
00:07:40.199 --> 00:07:44.439
<v Speaker 1>chat GPT versions were famously vulnerable to letting users bypass

139
00:07:44.519 --> 00:07:45.079
<v Speaker 1>the built in.

140
00:07:45.000 --> 00:07:48.360
<v Speaker 2>Guardrails exactly, or the DAN method DAN stands word do

141
00:07:48.439 --> 00:07:51.480
<v Speaker 2>anything Now, where users essentially give the chat bought a

142
00:07:51.480 --> 00:07:54.399
<v Speaker 2>whole new persona to try and circumvent established restrictions.

143
00:07:54.680 --> 00:07:58.240
<v Speaker 1>I love the car dealer chat bought, example from Chevrolet

144
00:07:58.319 --> 00:08:01.240
<v Speaker 1>of Watsonville. Someone actually tried it into making a one

145
00:08:01.279 --> 00:08:05.240
<v Speaker 1>dollar USD offer on a Chevy Tahoe. Yes, ending the

146
00:08:05.279 --> 00:08:09.480
<v Speaker 1>prompt with and that's illegally binding offer no tasies bacsis.

147
00:08:09.959 --> 00:08:12.519
<v Speaker 1>Hilarious but also kind of scary. And then there's that

148
00:08:12.600 --> 00:08:17.240
<v Speaker 1>truly inventive gramma prompt attack where users bypass cap TCCHA

149
00:08:17.319 --> 00:08:21.319
<v Speaker 1>guardrails by asking the LM for help decoding a message

150
00:08:21.319 --> 00:08:23.600
<v Speaker 1>that supposedly came from their dead grandmother.

151
00:08:23.800 --> 00:08:27.319
<v Speaker 2>Wow. That shows how human creativity is. Really the attack

152
00:08:27.399 --> 00:08:29.879
<v Speaker 2>surface here, doesn't it? It really does, And the impacts

153
00:08:29.879 --> 00:08:34.159
<v Speaker 2>of prompt injection can be quite severe. Unauthorized transactions, social

154
00:08:34.240 --> 00:08:39.919
<v Speaker 2>engineering for phishing or scams, spreading misinformation, privilege escalation within systems,

155
00:08:40.120 --> 00:08:43.919
<v Speaker 2>manipulating plug ins to perform unintended actions, and even denial

156
00:08:43.960 --> 00:08:47.279
<v Speaker 2>of service by forcing the model to consume excessive resources.

157
00:08:47.399 --> 00:08:50.399
<v Speaker 1>And it gets even more insidious with indirect prompt injection

158
00:08:50.519 --> 00:08:53.679
<v Speaker 1>right where the malicious input isn't directly typed by the user.

159
00:08:53.840 --> 00:08:56.519
<v Speaker 2>Yes, that's a really tricky one. The malicious input is

160
00:08:56.519 --> 00:08:59.919
<v Speaker 2>actually embedded in external sources, maybe a website, the LM's

161
00:09:00.679 --> 00:09:03.799
<v Speaker 2>or a file it processes. The LLM then interacts with

162
00:09:03.840 --> 00:09:07.279
<v Speaker 2>this poisoned data source. This effectively makes the LLM a

163
00:09:07.360 --> 00:09:08.480
<v Speaker 2>confused deputy.

164
00:09:09.000 --> 00:09:12.120
<v Speaker 1>Explain that confused deputy concept a bit more sure.

165
00:09:12.320 --> 00:09:16.440
<v Speaker 2>It's a classic security of vulnerability. You have a trusted entity,

166
00:09:16.559 --> 00:09:20.120
<v Speaker 2>in this case the LLM, which gets tricked into misusing

167
00:09:20.200 --> 00:09:24.399
<v Speaker 2>its legitimate authority because it's confused about the true intent

168
00:09:24.480 --> 00:09:27.759
<v Speaker 2>of the request it received. The malicious instructions are hidden

169
00:09:27.799 --> 00:09:31.480
<v Speaker 2>within data. It's a poster retrieve or process, making it

170
00:09:31.519 --> 00:09:35.759
<v Speaker 2>act against its intended purpose, but crucially with its legitimate permissions.

171
00:09:35.879 --> 00:09:38.039
<v Speaker 1>Okay, so how do we fight back? Mitigation sounds tough

172
00:09:38.039 --> 00:09:39.320
<v Speaker 1>if there's no silver bullet.

173
00:09:39.399 --> 00:09:42.639
<v Speaker 2>It is an ongoing challenge. Absolutely. Strategies include things like

174
00:09:42.759 --> 00:09:45.639
<v Speaker 2>robust rate limiting. You can do that based on IP address,

175
00:09:45.759 --> 00:09:47.879
<v Speaker 2>user accounts, or even specific sessions.

176
00:09:48.000 --> 00:09:51.440
<v Speaker 1>You can also use rule based input filtering, though the

177
00:09:51.480 --> 00:09:54.639
<v Speaker 1>book notes. This can actually cripple the llm's capabilities if

178
00:09:54.679 --> 00:09:58.240
<v Speaker 1>it's too aggressive, like blocking the word napalm would prevent

179
00:09:58.360 --> 00:10:00.960
<v Speaker 1>legitimate historical discuss about it. True.

180
00:10:01.159 --> 00:10:04.679
<v Speaker 2>Another approach is using a special purpose LLM basically training

181
00:10:04.720 --> 00:10:09.039
<v Speaker 2>another model specifically to detect prompt injection attempts. Though you

182
00:10:09.039 --> 00:10:12.279
<v Speaker 2>know even that isn't fool proof attackers adapt quickly. Adding

183
00:10:12.399 --> 00:10:15.919
<v Speaker 2>clear prompt structure can also help it guides the LLLM

184
00:10:15.960 --> 00:10:19.120
<v Speaker 2>to focus on the main request and potentially ignore injected

185
00:10:19.120 --> 00:10:23.759
<v Speaker 2>instructions hidden within. And a more advanced technique is adversarial training.

186
00:10:24.360 --> 00:10:27.480
<v Speaker 2>This involves fortifying the LM by specifically training it on

187
00:10:27.639 --> 00:10:31.799
<v Speaker 2>known malicious prompts to help it identify and hopefully neutralize

188
00:10:31.799 --> 00:10:33.200
<v Speaker 2>harmful inputs in the future.

189
00:10:33.559 --> 00:10:36.279
<v Speaker 1>And finally, something that sounds a bit like our trust

190
00:10:36.360 --> 00:10:41.039
<v Speaker 1>No One mantra from earlier, embracing pessimistic trust boundaries.

191
00:10:40.840 --> 00:10:45.360
<v Speaker 2>Yes, essentially treating all LLM outputs as inherently untrustworthy. You

192
00:10:45.480 --> 00:10:48.240
<v Speaker 2>limit the llm's access to back end systems using the

193
00:10:48.240 --> 00:10:51.960
<v Speaker 2>principle of least privilege, and crucially require human in the

194
00:10:52.000 --> 00:10:55.840
<v Speaker 2>loop controls for any potentially dangerous actions like financial transactions.

195
00:10:55.840 --> 00:10:59.159
<v Speaker 2>Are modifying critical data. This really is foundational okay.

196
00:10:58.919 --> 00:11:02.440
<v Speaker 1>Speaking of datah know too much. That's where things get

197
00:11:02.480 --> 00:11:05.639
<v Speaker 1>really interesting and potentially quite dangerous. It's like Tay's early blunders,

198
00:11:05.639 --> 00:11:08.600
<v Speaker 1>but with much higher stakes because now we're talking about

199
00:11:08.639 --> 00:11:11.039
<v Speaker 1>real world, often confidential information.

200
00:11:11.279 --> 00:11:16.480
<v Speaker 2>Precisely, llms can inadvertently disclose sensitive, private or confidential data

201
00:11:16.480 --> 00:11:19.600
<v Speaker 2>they've been exposed to during training or operation, even if

202
00:11:19.600 --> 00:11:22.600
<v Speaker 2>they aren't explicitly asked for it. A prime example the

203
00:11:22.639 --> 00:11:26.080
<v Speaker 2>book mentions is lee Luda, a South Korean chatbot. It

204
00:11:26.120 --> 00:11:29.399
<v Speaker 2>was trained on get this nine point four billion text

205
00:11:29.440 --> 00:11:31.399
<v Speaker 2>messages from some kind of science.

206
00:11:31.039 --> 00:11:32.080
<v Speaker 1>Of love app Wow.

207
00:11:32.360 --> 00:11:36.720
<v Speaker 2>Yeah, and it started leaking sensitive user data like real names, nicknames,

208
00:11:36.759 --> 00:11:40.679
<v Speaker 2>even home addresses. The fallout was huge, a substantial fine,

209
00:11:40.799 --> 00:11:44.720
<v Speaker 2>severe reputational damage, and ultimately they had to discontinue the service.

210
00:11:44.960 --> 00:11:48.320
<v Speaker 1>And then there's the widely publicized gethub copilot and open

211
00:11:48.360 --> 00:11:52.240
<v Speaker 1>Ai Codex lawsuit. Yeah, developers sued open Ai claiming Codex

212
00:11:52.480 --> 00:11:56.519
<v Speaker 1>reproduce copyrighted code without permission or proper attribution.

213
00:11:56.320 --> 00:12:00.200
<v Speaker 2>That raises serious intellectual property leakage concerns stemming directly from

214
00:12:00.200 --> 00:12:01.679
<v Speaker 2>the data these models were trained on.

215
00:12:01.879 --> 00:12:04.960
<v Speaker 1>So how do these llms actually acquire this knowledge and

216
00:12:05.000 --> 00:12:06.679
<v Speaker 1>therefore the risk well.

217
00:12:06.679 --> 00:12:10.600
<v Speaker 2>The book identifies three main avenues. First, and most obviously,

218
00:12:10.720 --> 00:12:14.600
<v Speaker 2>model training. This is particularly relevant for those huge foundation

219
00:12:14.759 --> 00:12:17.799
<v Speaker 2>models which are trained on vast diverse data sets to

220
00:12:17.840 --> 00:12:22.720
<v Speaker 2>gain broad understanding. The security considerations here are enormous potential

221
00:12:22.720 --> 00:12:26.679
<v Speaker 2>PII leakage, regulatory and compliance violations like him A pair

222
00:12:26.799 --> 00:12:30.799
<v Speaker 2>or GDPR, loss of public trust, and even complex inference

223
00:12:30.840 --> 00:12:33.919
<v Speaker 2>attacks where attackers try to doce sensitive training data from

224
00:12:33.919 --> 00:12:35.000
<v Speaker 2>the model's responses.

225
00:12:35.080 --> 00:12:37.039
<v Speaker 1>So if you're training a model, the onus is really

226
00:12:37.039 --> 00:12:40.960
<v Speaker 1>on you to ensure thoroughly sanitized data, regular audits, maybe

227
00:12:40.960 --> 00:12:45.759
<v Speaker 1>differential privacy techniques, and definitely tokenization to specifically avoid leaking

228
00:12:45.840 --> 00:12:47.159
<v Speaker 1>PII right exactly.

229
00:12:47.240 --> 00:12:50.679
<v Speaker 2>The second avenue is something called retrieval augmented generation or

230
00:12:50.879 --> 00:12:54.039
<v Speaker 2>r ADG. This is where the LLM recrieves relevant snippets

231
00:12:54.080 --> 00:12:56.759
<v Speaker 2>from external data sets, maybe the live web or internal

232
00:12:56.759 --> 00:13:00.600
<v Speaker 2>company databases before generates a response. It's fantastic for providing

233
00:13:00.720 --> 00:13:03.159
<v Speaker 2>real time, up to date information, but uh oh, it

234
00:13:03.159 --> 00:13:04.519
<v Speaker 2>opens entirely new risk.

235
00:13:04.399 --> 00:13:07.559
<v Speaker 1>Vectors like pulling PII from public websites exactly.

236
00:13:07.919 --> 00:13:11.639
<v Speaker 2>Think about unintentionally pulling PII from public comment sections on

237
00:13:11.759 --> 00:13:15.639
<v Speaker 2>news articles, user profiles on forums, or even hidden web

238
00:13:15.679 --> 00:13:17.840
<v Speaker 2>page metadata that the LLM scrapes.

239
00:13:18.080 --> 00:13:21.960
<v Speaker 1>And what about when rragee allows direct access to internal

240
00:13:21.960 --> 00:13:24.840
<v Speaker 1>company databases? That sounds risky It definitely is.

241
00:13:25.279 --> 00:13:29.320
<v Speaker 2>With traditional relational databases, you're looking at risks like SQL injection,

242
00:13:29.639 --> 00:13:33.120
<v Speaker 2>privileged escalation, if the llm's access isn't tightly controlled, and

243
00:13:33.159 --> 00:13:37.000
<v Speaker 2>potential data breaches. For newer vector databases, the risk might

244
00:13:37.039 --> 00:13:41.240
<v Speaker 2>be more subtle, like information leakage via similarity searches. An

245
00:13:41.279 --> 00:13:44.320
<v Speaker 2>attacker might infer sensitive information by seeing what data points

246
00:13:44.320 --> 00:13:47.519
<v Speaker 2>are close to their query in the vector space. Mitigation

247
00:13:47.639 --> 00:13:52.120
<v Speaker 2>here demands strict role based access control RBAC, fine grained permissions,

248
00:13:52.159 --> 00:13:55.000
<v Speaker 2>maybe automated data scanners looking for sensitive info and often

249
00:13:55.080 --> 00:13:58.960
<v Speaker 2>using database views instead of giving the LLM directable access

250
00:13:59.000 --> 00:14:00.559
<v Speaker 2>just to limit exposure, okay.

251
00:14:00.600 --> 00:14:03.000
<v Speaker 1>And the third way they learn user interaction.

252
00:14:03.399 --> 00:14:08.559
<v Speaker 2>Yes, llms often learn continuously from user queries, conversations, and feedback.

253
00:14:09.000 --> 00:14:12.600
<v Speaker 2>This is where users can intentionally or inadvertently input sensitive

254
00:14:12.679 --> 00:14:16.840
<v Speaker 2>data themselves. Think of an executive feeding confidential business strategies

255
00:14:16.840 --> 00:14:20.200
<v Speaker 2>into a prompt for analysis, or a user sharing detailed

256
00:14:20.240 --> 00:14:23.240
<v Speaker 2>medical symptoms with a health chatbot. The critical risk is

257
00:14:23.240 --> 00:14:26.000
<v Speaker 2>that the LM might not recognize this input as sensitive

258
00:14:26.159 --> 00:14:29.360
<v Speaker 2>and could later inadvertently disclose it to another user. This

259
00:14:29.399 --> 00:14:32.960
<v Speaker 2>is precisely why Samsung famously banned chat GPT internally after

260
00:14:32.960 --> 00:14:34.440
<v Speaker 2>finding evidence of IP leakage.

261
00:14:34.600 --> 00:14:37.080
<v Speaker 1>Right, okay, now let's talk about when these powerful llms

262
00:14:37.639 --> 00:14:40.759
<v Speaker 1>simply make things up. We call them hallucinations. That term

263
00:14:40.799 --> 00:14:42.600
<v Speaker 1>itself is pretty evocative.

264
00:14:43.120 --> 00:14:48.240
<v Speaker 2>Yes, that's precisely it. Hallucinations are when llms fabricate information,

265
00:14:48.720 --> 00:14:52.879
<v Speaker 2>essentially generating data or narratives that are confidently inaccurate. As

266
00:14:52.919 --> 00:14:56.360
<v Speaker 2>the book puts it, Some researchers prefer the term confabulation,

267
00:14:56.679 --> 00:15:00.159
<v Speaker 2>but hallucination is certainly the more widely understood, maybe more

268
00:15:00.200 --> 00:15:04.360
<v Speaker 2>alarming term. The real danger here isn't just the hallucination itself,

269
00:15:04.440 --> 00:15:07.240
<v Speaker 2>but our collective tendency towards what the book calls over

270
00:15:07.320 --> 00:15:11.759
<v Speaker 2>reliance are excessive trust in the LM's elaborations and exactness.

271
00:15:11.799 --> 00:15:12.759
<v Speaker 2>We just assume it's right.

272
00:15:13.200 --> 00:15:15.559
<v Speaker 1>So why do they hallucate? Are they just bad at facts?

273
00:15:15.679 --> 00:15:16.279
<v Speaker 1>Is it a bug?

274
00:15:16.480 --> 00:15:19.200
<v Speaker 2>Not exactly a bug in the traditional sense. It's fundamentally

275
00:15:19.240 --> 00:15:23.039
<v Speaker 2>about how they operate. They are built for pattern matching

276
00:15:23.200 --> 00:15:28.600
<v Speaker 2>and statistical extrapolation, not factual verification. They predict the next

277
00:15:28.720 --> 00:15:31.679
<v Speaker 2>most probable word or phrase based on the vast amounts

278
00:15:31.720 --> 00:15:34.320
<v Speaker 2>of texts they were trained on, so the quality and

279
00:15:34.399 --> 00:15:37.360
<v Speaker 2>nature of that training data significantly impact how likely they

280
00:15:37.399 --> 00:15:41.279
<v Speaker 2>are to hallucinate. Types can range from simple factual inaccuracies

281
00:15:41.320 --> 00:15:45.200
<v Speaker 2>and making unsupported claims, to misrepresenting their own abilities like

282
00:15:45.240 --> 00:15:49.039
<v Speaker 2>claiming chemistry expertise they don't have, or even generating contradictory

283
00:15:49.039 --> 00:15:50.759
<v Speaker 2>statements within a single response.

284
00:15:51.440 --> 00:15:54.159
<v Speaker 1>The examples here are pretty wild, and the consequences, again

285
00:15:54.200 --> 00:15:57.080
<v Speaker 1>are very real. Like those lawyers who got sanctioned for

286
00:15:57.159 --> 00:16:01.279
<v Speaker 1>submitting six completely fabricated chat GPTs generated case citations in

287
00:16:01.320 --> 00:16:04.159
<v Speaker 1>the US federal court. Yeah, that's not just embarrassing. It

288
00:16:04.240 --> 00:16:08.080
<v Speaker 1>has real world consequences for everyone involved, the lawyers themselves,

289
00:16:08.120 --> 00:16:11.440
<v Speaker 1>the LM provider whose tool was misused, and it even

290
00:16:11.519 --> 00:16:15.240
<v Speaker 1>impacts the perceived integrity of the entire legal profession. People

291
00:16:15.279 --> 00:16:16.279
<v Speaker 1>need to check the outputs.

292
00:16:16.759 --> 00:16:20.799
<v Speaker 2>Or consider that major airline that was successfully sued because

293
00:16:20.799 --> 00:16:25.799
<v Speaker 2>it's chatbot provided inaccurate information about bereavement fairs. That case

294
00:16:25.879 --> 00:16:29.759
<v Speaker 2>proved quite clearly that companies cannot simply disown the outputs

295
00:16:29.799 --> 00:16:32.840
<v Speaker 2>of their AI systems. They are responsible, definitely.

296
00:16:32.879 --> 00:16:35.559
<v Speaker 1>And then there's Brian Hood, a mayor in Australia who

297
00:16:35.600 --> 00:16:39.320
<v Speaker 1>threatened to sue open ai after chet GPT falsely claimed

298
00:16:39.320 --> 00:16:40.960
<v Speaker 1>he had served jail time for bribery.

299
00:16:41.159 --> 00:16:42.440
<v Speaker 2>Oh wow, Yeah.

300
00:16:42.200 --> 00:16:45.000
<v Speaker 1>This wasn't a joke. It was a serious potential blow

301
00:16:45.039 --> 00:16:48.240
<v Speaker 1>to his reputation, apparently stemming from the model having limited

302
00:16:48.279 --> 00:16:50.759
<v Speaker 1>training data about him and maybe conflating him with someone else.

303
00:16:51.120 --> 00:16:53.320
<v Speaker 2>And for us in the tech world, there's that incredibly

304
00:16:53.399 --> 00:16:58.919
<v Speaker 2>unsettling phenomenon of open source package arbasinations AI coding assistance

305
00:16:59.120 --> 00:17:03.519
<v Speaker 2>literally inventing names for non existent open source libraries. Hackers

306
00:17:03.519 --> 00:17:06.640
<v Speaker 2>can then exploit this by quickly creating malicious versions of

307
00:17:06.680 --> 00:17:10.599
<v Speaker 2>these imaginary packages and uploading them to public repositories like

308
00:17:10.759 --> 00:17:11.920
<v Speaker 2>NPM or PIPI.

309
00:17:12.200 --> 00:17:16.559
<v Speaker 1>So developer trusting the AI assystem installs the fake package.

310
00:17:16.079 --> 00:17:19.920
<v Speaker 2>And potentially gets hit with code injection. Research from places

311
00:17:19.960 --> 00:17:24.160
<v Speaker 2>like Vulcan Cyber and Lasso Security found this is surprisingly common.

312
00:17:24.759 --> 00:17:27.039
<v Speaker 2>Less So for instance, found up to thirty percent of

313
00:17:27.079 --> 00:17:31.960
<v Speaker 2>coding questions asked to one popular model resulted in hallucinated packages.

314
00:17:32.440 --> 00:17:33.200
<v Speaker 2>That's huge.

315
00:17:33.519 --> 00:17:37.519
<v Speaker 1>That is huge. This raises an absolutely critical question. Who's

316
00:17:37.599 --> 00:17:40.440
<v Speaker 1>ultimately responsible when things go wrong? Is it a people

317
00:17:40.480 --> 00:17:42.359
<v Speaker 1>problem or the developer's fault.

318
00:17:42.559 --> 00:17:46.079
<v Speaker 2>It's complicated, isn't it. While user education and critical thinking

319
00:17:46.119 --> 00:17:51.000
<v Speaker 2>are undoubtedly vital, as developers and organizations deploying these systems,

320
00:17:51.039 --> 00:17:54.359
<v Speaker 2>we are ultimately accountable for ensuring the information our software

321
00:17:54.440 --> 00:17:58.279
<v Speaker 2>provides is as accurate and safe as possible. The legal

322
00:17:58.319 --> 00:18:02.480
<v Speaker 2>cases vividly illustrate this varying responsibility. The lawyers were sanctioned

323
00:18:02.519 --> 00:18:05.559
<v Speaker 2>for their professional negligence and failing to verify the facts,

324
00:18:06.039 --> 00:18:09.000
<v Speaker 2>but Air Canada was directly held liable for their chatbot's

325
00:18:09.079 --> 00:18:12.880
<v Speaker 2>inaccurate output. It suggests companies generally cannot deflect responsibility for

326
00:18:12.920 --> 00:18:16.400
<v Speaker 2>AI generated content, especially in customer facing situations.

327
00:18:16.440 --> 00:18:18.079
<v Speaker 1>Ye know, what are the best practices? Then how do

328
00:18:18.079 --> 00:18:20.319
<v Speaker 1>we mitigate these widespread hallucinations?

329
00:18:20.400 --> 00:18:24.680
<v Speaker 2>Well, first, expand the llm's domain specific knowledge. You can

330
00:18:24.720 --> 00:18:27.240
<v Speaker 2>do this through fine tuning on curated data sets and

331
00:18:27.319 --> 00:18:31.640
<v Speaker 2>using retrieval augmented Generation RAG with trusted, up to date sources.

332
00:18:32.240 --> 00:18:34.759
<v Speaker 2>This helps make the LLM more of a specialist in

333
00:18:34.799 --> 00:18:38.759
<v Speaker 2>a particular area, significantly reducing the likelihood of it wandering

334
00:18:38.799 --> 00:18:42.720
<v Speaker 2>off into inaccurate territory because it has precise relevant data

335
00:18:42.799 --> 00:18:43.759
<v Speaker 2>readily available.

336
00:18:43.839 --> 00:18:44.480
<v Speaker 1>Okay, what else?

337
00:18:44.880 --> 00:18:47.440
<v Speaker 2>Second, use something called chain of thought. See it's your

338
00:18:47.480 --> 00:18:50.880
<v Speaker 2>reasoning and your prompting. This involves structuring the prompt to

339
00:18:51.000 --> 00:18:54.000
<v Speaker 2>encourage the LLM to outline its reasoning process step by

340
00:18:54.000 --> 00:18:57.759
<v Speaker 2>step before giving a final answer. It forces the LLM

341
00:18:57.839 --> 00:19:01.319
<v Speaker 2>to essentially think through the problem, which which demonstrably reduces

342
00:19:01.359 --> 00:19:05.240
<v Speaker 2>hallucinations and enhances overall accuracy. It also makes the output

343
00:19:05.279 --> 00:19:06.480
<v Speaker 2>easier for humans to verify.

344
00:19:06.599 --> 00:19:09.200
<v Speaker 1>And what about user involvement? Can users help make these

345
00:19:09.240 --> 00:19:10.920
<v Speaker 1>models less prone to hallucination?

346
00:19:11.279 --> 00:19:15.119
<v Speaker 2>Absolutely? Feedback loops are critical. Allowing users to easily flag

347
00:19:15.160 --> 00:19:19.240
<v Speaker 2>problematic or inaccurate outputs, maybe using simple thumbs up thumbs

348
00:19:19.240 --> 00:19:23.359
<v Speaker 2>down ratings or even providing fields for detailed feedback continuously

349
00:19:23.400 --> 00:19:26.480
<v Speaker 2>helps to improve the model over time. This feedback then

350
00:19:26.519 --> 00:19:29.839
<v Speaker 2>informs further fine tuning, improvements to the r acknowledge base

351
00:19:30.039 --> 00:19:32.480
<v Speaker 2>and refinements to the co T prompting strategies.

352
00:19:32.599 --> 00:19:35.559
<v Speaker 1>It sounds like clear communication about the LM's intended use

353
00:19:35.680 --> 00:19:40.079
<v Speaker 1>and its limitations is also key here. Managing expectations absolutely crucial.

354
00:19:40.200 --> 00:19:44.039
<v Speaker 2>Transparency is key inform users clearly about what the LLM

355
00:19:44.119 --> 00:19:47.079
<v Speaker 2>can and cannot reliably do, how it handles their data,

356
00:19:47.119 --> 00:19:50.480
<v Speaker 2>and how they can provide feedback. Things like tooltips, FAQs

357
00:19:50.519 --> 00:19:54.000
<v Speaker 2>and maybe short tutorials can help, and user education itself

358
00:19:54.039 --> 00:19:56.720
<v Speaker 2>is your final vital layer of defense. We need to

359
00:19:56.759 --> 00:19:59.799
<v Speaker 2>teach users about these inherent trust issues, incurde cross checking

360
00:19:59.839 --> 00:20:03.680
<v Speaker 2>of important information, promote situational awareness, knowing when it's okay

361
00:20:03.680 --> 00:20:06.920
<v Speaker 2>to rely on the AI versus when human verification is essential,

362
00:20:07.079 --> 00:20:09.759
<v Speaker 2>and make it easy for them to provide that constructive feedback.

363
00:20:10.079 --> 00:20:13.200
<v Speaker 1>You know, it's just amazing to me how these models operate. Sometimes.

364
00:20:13.640 --> 00:20:17.920
<v Speaker 1>The book even notes this truly bizarre quirk Google's AI search,

365
00:20:18.319 --> 00:20:23.079
<v Speaker 1>suggesting things like glue is pizza topping or eating rocks daily? Right,

366
00:20:23.240 --> 00:20:25.720
<v Speaker 1>Apparently lms don't really have a sense of humor or

367
00:20:25.759 --> 00:20:30.200
<v Speaker 1>sarcasm detection and will actually interpret jokes or satirical content

368
00:20:30.519 --> 00:20:34.960
<v Speaker 1>from non authoritative sources online as literal facts. Yeah. That

369
00:20:35.000 --> 00:20:38.480
<v Speaker 1>feels like such a wild, unexpected edge case for developers

370
00:20:38.480 --> 00:20:40.759
<v Speaker 1>to have to anticipate and somehow guard against.

371
00:20:40.960 --> 00:20:42.839
<v Speaker 2>It really does the nuances are endless.

372
00:20:42.839 --> 00:20:45.960
<v Speaker 1>Okay, So, after wrestling with prompt injection, sensitive data leaks

373
00:20:46.279 --> 00:20:50.079
<v Speaker 1>and those unsettling hallucinations, it feels like we're channeling our

374
00:20:50.079 --> 00:20:53.319
<v Speaker 1>inner fox molder here from the x files. Our next

375
00:20:53.359 --> 00:20:56.240
<v Speaker 1>guiding mantra for LMM security, according to the playbook, has

376
00:20:56.240 --> 00:20:57.559
<v Speaker 1>to be trust no One.

377
00:20:57.839 --> 00:21:01.599
<v Speaker 2>Indeed, it's the core principle of zero trust, a concept

378
00:21:01.640 --> 00:21:05.400
<v Speaker 2>first really codified by John kindervag At Forrester Research back

379
00:21:05.440 --> 00:21:09.200
<v Speaker 2>in two thousand and nine. The mantra is simple, never trust,

380
00:21:09.519 --> 00:21:13.759
<v Speaker 2>always verify. It means assuming breaches will happen, securing all

381
00:21:13.839 --> 00:21:18.480
<v Speaker 2>resources comprehensively, enforcing the principle of least privileged access everywhere,

382
00:21:18.519 --> 00:21:21.119
<v Speaker 2>and maintaining constant monitoring and validation.

383
00:21:21.359 --> 00:21:23.319
<v Speaker 1>And for llms, this isn't just a good idea, it's

384
00:21:23.319 --> 00:21:24.319
<v Speaker 1>an absolute necessity.

385
00:21:24.480 --> 00:21:29.440
<v Speaker 2>Absolutely. Why Because, as we've discussed extensively, llms ingest potentially

386
00:21:29.519 --> 00:21:33.160
<v Speaker 2>untrustworthy inputs from various sources, and their outputs cannot be

387
00:21:33.200 --> 00:21:35.880
<v Speaker 2>fully trusted due to the inherent risks of prompt injection,

388
00:21:36.079 --> 00:21:41.160
<v Speaker 2>sensitive information, disclosure, hallucination, and even generating toxic or bias content.

389
00:21:41.480 --> 00:21:43.519
<v Speaker 2>You simply cannot implicitly trust them.

390
00:21:43.799 --> 00:21:46.480
<v Speaker 1>So if trust no One is our guiding principle, how

391
00:21:46.480 --> 00:21:49.680
<v Speaker 1>do we actually apply zero trust in practice to these

392
00:21:49.920 --> 00:21:53.200
<v Speaker 1>highly dynamic and often unpredictable LM systems. What does that

393
00:21:53.200 --> 00:21:55.279
<v Speaker 1>look like on the ground For developers building.

394
00:21:55.000 --> 00:21:58.440
<v Speaker 2>These things, It generally boils down to two main tactical approaches,

395
00:21:59.000 --> 00:22:04.759
<v Speaker 2>limiting the LMS unsupervised agency and implementing aggressive output filtering. First,

396
00:22:05.359 --> 00:22:09.079
<v Speaker 2>limiting agency. Lms should never be allowed to make safety

397
00:22:09.119 --> 00:22:13.839
<v Speaker 2>critical decisions or execute significant financial transactions without explicit human

398
00:22:13.880 --> 00:22:17.720
<v Speaker 2>oversight and approval. That's the principle of least privilege in action.

399
00:22:17.920 --> 00:22:20.720
<v Speaker 2>Give the LLM only the permissions it absolutely needs to

400
00:22:20.759 --> 00:22:23.759
<v Speaker 2>perform its intended function and no more, all right. Second,

401
00:22:24.000 --> 00:22:27.559
<v Speaker 2>aggressive output filtering. This means having mechanisms in place to

402
00:22:27.640 --> 00:22:32.160
<v Speaker 2>continuously scan, catch, and neutralize harmful or undesirable outputs in

403
00:22:32.200 --> 00:22:36.440
<v Speaker 2>real time before they reach the user or impact downstream systems.

404
00:22:36.359 --> 00:22:39.240
<v Speaker 1>Like that medical app scenario in the book where giving

405
00:22:39.279 --> 00:22:42.960
<v Speaker 1>the LLLM powerful update insert delete permissions on patient records

406
00:22:43.200 --> 00:22:46.720
<v Speaker 1>combined with a vulnerability allowed a malicious insider to manipulate

407
00:22:46.720 --> 00:22:50.559
<v Speaker 1>critical data through the LLLM. That's a classic confused deputy problem. Again,

408
00:22:50.599 --> 00:22:52.759
<v Speaker 1>isn't it the LM in too much agency?

409
00:22:52.799 --> 00:22:55.559
<v Speaker 2>Exactly? It had the permission to make those changes and

410
00:22:55.759 --> 00:22:59.319
<v Speaker 2>was tricked into doing so. Or imagine a financial services

411
00:22:59.319 --> 00:23:03.440
<v Speaker 2>app where the LM could automatically rebalance customer portfolios based

412
00:23:03.480 --> 00:23:07.359
<v Speaker 2>on market analysis. If that LLM were susceptible to indirect

413
00:23:07.359 --> 00:23:10.759
<v Speaker 2>prompt injection, say from a compromised news feed, it could

414
00:23:10.759 --> 00:23:15.079
<v Speaker 2>potentially be tricked into making disastrous trades or manipulating stock prices,

415
00:23:15.160 --> 00:23:16.839
<v Speaker 2>all without a human ever signing off.

416
00:23:16.960 --> 00:23:18.920
<v Speaker 1>Scary the fix there would be.

417
00:23:19.119 --> 00:23:22.160
<v Speaker 2>A crucial human in the loop approval step for any

418
00:23:22.240 --> 00:23:26.000
<v Speaker 2>actual trade execution the LLM can suggest, but a human

419
00:23:26.079 --> 00:23:29.960
<v Speaker 2>must confirm. Even seemingly simpler things like an HR app

420
00:23:30.000 --> 00:23:33.720
<v Speaker 2>that expands its functionality from just screening resumes to actively

421
00:23:33.799 --> 00:23:38.359
<v Speaker 2>recommending candidates for hire can inadvertently violate regulations like EU

422
00:23:38.440 --> 00:23:41.279
<v Speaker 2>rules against direct AI use and hiring decisions. If it's

423
00:23:41.319 --> 00:23:45.079
<v Speaker 2>given excessive functionality without a deep understanding of the complex

424
00:23:45.160 --> 00:23:46.960
<v Speaker 2>regulatory environment.

425
00:23:46.599 --> 00:23:49.759
<v Speaker 1>Okay, and then securing the output handling itself is absolutely vital.

426
00:23:49.799 --> 00:23:52.680
<v Speaker 1>What specific things should developers be thinking about there? How

427
00:23:52.720 --> 00:23:53.839
<v Speaker 1>do you filter the output?

428
00:23:54.079 --> 00:23:57.720
<v Speaker 2>It's definitely multi layered. First, you need robust screening for

429
00:23:57.839 --> 00:24:02.640
<v Speaker 2>toxic or inappropriate output. This can involve techniques like sentiment analysis,

430
00:24:02.759 --> 00:24:07.400
<v Speaker 2>keyword filtering against annihilists, or using specialized custom machine learning

431
00:24:07.440 --> 00:24:11.160
<v Speaker 2>models trained to detect hate, speech, bias, etc. Open AI's

432
00:24:11.240 --> 00:24:14.000
<v Speaker 2>Moderation API is one example of a service designed for

433
00:24:14.079 --> 00:24:18.039
<v Speaker 2>this kind of task. Post second rigorous screening for personally

434
00:24:18.079 --> 00:24:22.920
<v Speaker 2>identifiable information PII. This often uses regular expressions to catch

435
00:24:22.960 --> 00:24:26.519
<v Speaker 2>patterns like social security numbers or credit card numbers, named

436
00:24:26.799 --> 00:24:30.720
<v Speaker 2>entity recognition and ER models to identify names and locations,

437
00:24:30.960 --> 00:24:34.839
<v Speaker 2>dictionary based matching for known sensitive terms, or again specialized

438
00:24:34.920 --> 00:24:37.319
<v Speaker 2>mL models trained to spot PII.

439
00:24:37.039 --> 00:24:39.839
<v Speaker 1>And critically preventing the LLM from outputting something that could

440
00:24:39.839 --> 00:24:41.119
<v Speaker 1>be executed as code.

441
00:24:40.960 --> 00:24:45.079
<v Speaker 2>Right yes, preventing unforeseen execution of Roague code is. Paramount

442
00:24:45.559 --> 00:24:49.039
<v Speaker 2>techniques here include proper HTML encoding to prevent cross site

443
00:24:49.079 --> 00:24:54.640
<v Speaker 2>scripting XSS, using safe contextual insertion methods like prepared statements

444
00:24:54.640 --> 00:24:58.920
<v Speaker 2>for SQL database interactions, strictly limiting the syntax and keywords

445
00:24:58.960 --> 00:25:01.119
<v Speaker 2>the LLM is allowed to do generate if it's producing

446
00:25:01.160 --> 00:25:05.519
<v Speaker 2>code or commands, and potentially disabling shell interpretable outputs altogether.

447
00:25:06.200 --> 00:25:08.480
<v Speaker 2>Tokenization can also play a role here.

448
00:25:08.559 --> 00:25:11.720
<v Speaker 1>So it's about building this robust, layered output filter that

449
00:25:11.839 --> 00:25:15.039
<v Speaker 1>continuously checks what the LLM generates before it leaves the

450
00:25:15.079 --> 00:25:15.880
<v Speaker 1>system boundary.

451
00:25:15.920 --> 00:25:19.519
<v Speaker 2>Precisely, the LLM might be incredibly powerful at language generation,

452
00:25:19.920 --> 00:25:23.319
<v Speaker 2>but it fundamentally lacks common sense and awareness of consequences.

453
00:25:23.759 --> 00:25:27.119
<v Speaker 2>That makes it an untrusted entity that requires this additional

454
00:25:27.200 --> 00:25:31.359
<v Speaker 2>layer of supervision and control. Trust, but verify aggressively.

455
00:25:31.680 --> 00:25:33.640
<v Speaker 1>Now, when we talk about the costs of AI, it's

456
00:25:33.680 --> 00:25:35.759
<v Speaker 1>not just about what you pay for the cloud services

457
00:25:35.799 --> 00:25:38.880
<v Speaker 1>or the expensive GPUs. The playbook outlines a whole new

458
00:25:38.880 --> 00:25:42.920
<v Speaker 1>class of insidious attacks, denial of service, denial of wallet,

459
00:25:43.079 --> 00:25:44.599
<v Speaker 1>and even outright model theft.

460
00:25:44.880 --> 00:25:48.400
<v Speaker 2>That's right. Traditional denial of service dust attacks aim to

461
00:25:48.440 --> 00:25:51.680
<v Speaker 2>disrupt a service's availability, maybe by flooding it with traffic

462
00:25:51.720 --> 00:25:55.599
<v Speaker 2>to make it unusable for legitimate users. Classic stuff, But

463
00:25:55.680 --> 00:25:59.519
<v Speaker 2>with llms we're seeing new variations like model doss. This

464
00:25:59.559 --> 00:26:04.759
<v Speaker 2>explodes it's LLM specific vulnerabilities. Examples include context window exhaustion,

465
00:26:04.920 --> 00:26:08.960
<v Speaker 2>where attackers deliberately overload the LM's limited memory or attention

466
00:26:09.079 --> 00:26:11.720
<v Speaker 2>span with incredibly long or verbose.

467
00:26:11.400 --> 00:26:13.519
<v Speaker 1>Prompts, making it grind to a halt.

468
00:26:13.480 --> 00:26:17.599
<v Speaker 2>Exactly, or sending computationally intensive requests like asking it to

469
00:26:17.599 --> 00:26:20.880
<v Speaker 2>perform complex calculations such as find the sum of all

470
00:26:20.920 --> 00:26:24.240
<v Speaker 2>prime numbers up to one billion. The LLLM just spins

471
00:26:24.359 --> 00:26:27.480
<v Speaker 2>consuming vast amounts of computational resources, which means.

472
00:26:27.279 --> 00:26:30.519
<v Speaker 1>It costs you money, slows everything down for legitimate users,

473
00:26:30.839 --> 00:26:33.440
<v Speaker 1>and directly impacts your bottom line in your user.

474
00:26:33.200 --> 00:26:36.480
<v Speaker 2>Experience precisely, and that leads directly into denial of wallet

475
00:26:36.559 --> 00:26:39.519
<v Speaker 2>or DOWW. This is a variant of DAWs that specifically

476
00:26:39.519 --> 00:26:43.559
<v Speaker 2>targets your financial resources, not just availability. Lllms are highly

477
00:26:43.599 --> 00:26:46.839
<v Speaker 2>vulnerable to this because their operation is computationally expensive and

478
00:26:46.880 --> 00:26:50.599
<v Speaker 2>they often operate on paper use or paper token pricing models, So.

479
00:26:50.519 --> 00:26:53.839
<v Speaker 1>An attacker can just hammer your API with complex queries

480
00:26:53.839 --> 00:26:54.400
<v Speaker 1>and run up a.

481
00:26:54.440 --> 00:26:59.119
<v Speaker 2>Huge bill yes potentially, and advanced DOW attacks go even further.

482
00:27:00.000 --> 00:27:03.480
<v Speaker 2>Tacker might hijack your LLM, maybe via prompt injection as

483
00:27:03.480 --> 00:27:07.039
<v Speaker 2>we discussed, and then use your computational resources for their

484
00:27:07.039 --> 00:27:11.559
<v Speaker 2>own illicit purposes, perhaps running spam campaigns or generating malicious content,

485
00:27:11.720 --> 00:27:14.599
<v Speaker 2>all at your expense. This isn't just financial loss. It

486
00:27:14.640 --> 00:27:17.720
<v Speaker 2>could lead to severe legal liability because your system was

487
00:27:17.920 --> 00:27:21.000
<v Speaker 2>essentially the unwitting accomplice in their activities.

488
00:27:21.119 --> 00:27:24.559
<v Speaker 1>And finally, model cloning, which honestly sounds straight out of

489
00:27:24.599 --> 00:27:25.559
<v Speaker 1>a cyberpunk novel.

490
00:27:25.759 --> 00:27:29.279
<v Speaker 2>What exactly is that it's essentially model theft, but done indirectly.

491
00:27:29.480 --> 00:27:33.559
<v Speaker 2>Attackers strategically query a target LLM, sometimes millions of times,

492
00:27:33.680 --> 00:27:36.720
<v Speaker 2>specifically designed to harvest its outputs across a wide range

493
00:27:36.720 --> 00:27:39.680
<v Speaker 2>of prompts. They then use these harvested outputs the question

494
00:27:39.720 --> 00:27:42.519
<v Speaker 2>answer pairs to fine tune an alternate, often much smaller,

495
00:27:42.640 --> 00:27:46.279
<v Speaker 2>or open source model. This effectively allows them to distill

496
00:27:46.440 --> 00:27:50.079
<v Speaker 2>or steal your intellectual property the valuable knowledge capabilities and

497
00:27:50.119 --> 00:27:53.640
<v Speaker 2>specific behaviors embedded in your proprietary model, without ever needing

498
00:27:53.640 --> 00:27:56.480
<v Speaker 2>direct access to the original model's weights or code.

499
00:27:56.559 --> 00:28:00.559
<v Speaker 1>Wow. So mitigation for doss, DAW and cloning it.

500
00:28:00.480 --> 00:28:04.519
<v Speaker 2>Comes back to reinforcing those prompt injection defenses we talked about. Also,

501
00:28:04.720 --> 00:28:08.000
<v Speaker 2>implementing domain specific guardrails can help fine tuning your model

502
00:28:08.000 --> 00:28:12.279
<v Speaker 2>to only respond meaningfully to relevant inquiries significantly reduces computational

503
00:28:12.319 --> 00:28:17.240
<v Speaker 2>waste on irrelevant or malicious requests. Robust rate limiting is key,

504
00:28:17.839 --> 00:28:21.720
<v Speaker 2>as is resource use capping per query, for example, limiting

505
00:28:21.759 --> 00:28:25.319
<v Speaker 2>the number of tokens processed or the maximum computation time allowed,

506
00:28:25.839 --> 00:28:29.559
<v Speaker 2>and of course, continuous monitoring and alerting for unauthorized access

507
00:28:29.559 --> 00:28:33.920
<v Speaker 2>attempts or unusual query patterns or volumes is crucial. Financial

508
00:28:34.000 --> 00:28:36.319
<v Speaker 2>thresholds and alerts on your cloud bills are also a

509
00:28:36.400 --> 00:28:37.920
<v Speaker 2>must for usage based models.

510
00:28:38.000 --> 00:28:40.680
<v Speaker 1>Okay, let's shift gears slightly. The playbook uses a great

511
00:28:40.720 --> 00:28:43.799
<v Speaker 1>analogy for supply chain security. A chain is only as

512
00:28:43.799 --> 00:28:46.920
<v Speaker 1>strong as its weakest link, and for software this feels

513
00:28:46.920 --> 00:28:49.039
<v Speaker 1>more true now than ever before, doesn't it. We've seen

514
00:28:49.079 --> 00:28:52.160
<v Speaker 1>it with catastrophic effect, like the log four shell vulnerability

515
00:28:52.160 --> 00:28:53.160
<v Speaker 1>back in twenty twenty one.

516
00:28:53.400 --> 00:28:56.799
<v Speaker 2>Oh that log fourshell incident was a monumental wake up

517
00:28:56.839 --> 00:29:00.000
<v Speaker 2>call for the entire industry. It was a critical zero

518
00:29:00.319 --> 00:29:04.440
<v Speaker 2>vulnerability found in log forge, an incredibly common Java lotting

519
00:29:04.480 --> 00:29:09.000
<v Speaker 2>library used by millions, possibly billions, of applications worldwide. It

520
00:29:09.039 --> 00:29:12.720
<v Speaker 2>allowed remote code executions simply when untrusted inputs were logged

521
00:29:12.720 --> 00:29:18.200
<v Speaker 2>by the application. The impact was massive, widespread data theft, malware, infections,

522
00:29:18.440 --> 00:29:22.920
<v Speaker 2>ransomware campaigns. It starkly highlighted just how vulnerable we are

523
00:29:23.200 --> 00:29:25.640
<v Speaker 2>through the open source components we rely on daily.

524
00:29:25.880 --> 00:29:27.839
<v Speaker 1>And that was in the first time we had equifacts

525
00:29:27.880 --> 00:29:31.759
<v Speaker 1>back in twenty seventeen, where an unpatched Apache struts vulnerability,

526
00:29:31.799 --> 00:29:34.880
<v Speaker 1>another open source component, led to the theft of sensitive data

527
00:29:34.920 --> 00:29:37.680
<v Speaker 1>for nearly one hundred and fifty million consumers, costing the

528
00:29:37.680 --> 00:29:40.200
<v Speaker 1>company over a billion dollars in the end, and the

529
00:29:40.240 --> 00:29:43.079
<v Speaker 1>solar winds in twenty twenty, which was different, malicious code

530
00:29:43.319 --> 00:29:46.400
<v Speaker 1>was actually injected into the build process for their software updates,

531
00:29:46.640 --> 00:29:49.039
<v Speaker 1>which were then digitally signed and distributed to thousands of

532
00:29:49.119 --> 00:29:53.480
<v Speaker 1>organizations worldwide, including government agencies, a true supply chain compromise.

533
00:29:53.839 --> 00:29:57.759
<v Speaker 2>These incidents demonstrate the devastating potential of supply chain attacks,

534
00:29:57.960 --> 00:30:00.880
<v Speaker 2>and the LLM supply chain is arguably even more complex

535
00:30:00.920 --> 00:30:04.720
<v Speaker 2>and opique than traditional software. Why because of its heavy

536
00:30:04.720 --> 00:30:08.400
<v Speaker 2>reliance on massive, diverse data sets for training, data sets

537
00:30:08.519 --> 00:30:12.240
<v Speaker 2>whose provenance is often unclear, and its intricate interplay with

538
00:30:12.279 --> 00:30:16.359
<v Speaker 2>external data sources during operation, like through rag or plugins.

539
00:30:16.559 --> 00:30:19.920
<v Speaker 1>So what are the LLM specific supply chain risks we

540
00:30:19.960 --> 00:30:20.720
<v Speaker 1>need to worry about.

541
00:30:20.880 --> 00:30:23.880
<v Speaker 2>Well, there's open source model risk for starters. When you

542
00:30:23.920 --> 00:30:27.039
<v Speaker 2>build on top of open source foundation models like Metaslama

543
00:30:27.200 --> 00:30:30.240
<v Speaker 2>or mixed roll from mistral Ai, you're inheriting their potential

544
00:30:30.319 --> 00:30:34.000
<v Speaker 2>vulnerabilities and biases. You need to track their problemance carefully.

545
00:30:34.279 --> 00:30:36.920
<v Speaker 2>We've seen incidents on platforms like hugging Face, a popular

546
00:30:37.039 --> 00:30:40.599
<v Speaker 2>hub for models, where malicious users gain control over organization

547
00:30:40.640 --> 00:30:44.319
<v Speaker 2>accounts via reused passwords or exposed API tokens. This could

548
00:30:44.359 --> 00:30:47.039
<v Speaker 2>potentially allow them to swap out trusted models for maliciously

549
00:30:47.039 --> 00:30:51.240
<v Speaker 2>modified ones, or exploit vulnerabilities in model loading mechanisms like

550
00:30:51.319 --> 00:30:53.960
<v Speaker 2>those found in the older Pickle format. Hence the move

551
00:30:54.000 --> 00:30:55.880
<v Speaker 2>towards saver formats like safe tensors.

552
00:30:56.039 --> 00:30:58.640
<v Speaker 1>And then there's the chilling risk of training data poisoning.

553
00:30:59.400 --> 00:31:03.319
<v Speaker 1>Malicious apptors deliberately manipulating data sets used to train or

554
00:31:03.359 --> 00:31:09.160
<v Speaker 1>fine tune models to inject hidden biases, backdoors, or specific vulnerabilities.

555
00:31:08.480 --> 00:31:12.440
<v Speaker 2>Exactly or even just accidentally unsafe training data. Remember that

556
00:31:12.640 --> 00:31:16.079
<v Speaker 2>huge lai on five B data set, widely used for

557
00:31:16.160 --> 00:31:19.519
<v Speaker 2>training image generation models. It was tragically found to contain

558
00:31:19.640 --> 00:31:24.000
<v Speaker 2>significant amounts of illegal and harmful material, including child sexual

559
00:31:24.000 --> 00:31:27.359
<v Speaker 2>abuse material simply scraped from the public Internet. Using such

560
00:31:27.440 --> 00:31:31.119
<v Speaker 2>data sets carries immense ethical and legal risks. Absolutely, and

561
00:31:31.200 --> 00:31:34.599
<v Speaker 2>let's not forget unsafe plugins or tools that lllms can

562
00:31:34.640 --> 00:31:38.319
<v Speaker 2>interact with. Open AI's initial rollout of plugins connecting chat

563
00:31:38.400 --> 00:31:42.559
<v Speaker 2>GPT to services like Expedia or Zillo immediately opened up

564
00:31:42.759 --> 00:31:46.000
<v Speaker 2>entirely new attack vectors. These plugins allow the LLM to

565
00:31:46.039 --> 00:31:49.079
<v Speaker 2>take actions in the real world, introducing risks of malicious

566
00:31:49.079 --> 00:31:52.759
<v Speaker 2>code injection through manipulated API calls, unauthorized data theft, or

567
00:31:52.839 --> 00:31:55.599
<v Speaker 2>unintended data collection because they allow the LLM to interact

568
00:31:55.599 --> 00:31:58.759
<v Speaker 2>with third party services in potentially unexpected ways based on

569
00:31:58.880 --> 00:31:59.519
<v Speaker 2>user prompts.

570
00:31:59.720 --> 00:32:03.200
<v Speaker 1>So this is incredibly complex. How do we even begin

571
00:32:03.319 --> 00:32:06.279
<v Speaker 1>to track all these moving parts in such an intricate

572
00:32:06.359 --> 00:32:08.119
<v Speaker 1>supply chain? What tools do we have?

573
00:32:08.480 --> 00:32:13.160
<v Speaker 2>The key is creating and maintaining critical artifacts that provide transparency. First,

574
00:32:13.279 --> 00:32:16.359
<v Speaker 2>we have the software bill of materials or s bombs,

575
00:32:16.599 --> 00:32:20.279
<v Speaker 2>which are becoming standard practice and traditional software. S bombs

576
00:32:20.319 --> 00:32:24.079
<v Speaker 2>provide clear visibility into software composition, listing all the components,

577
00:32:24.160 --> 00:32:28.119
<v Speaker 2>their versions, licenses, and known vulnerabilities. They are vital for

578
00:32:28.200 --> 00:32:30.000
<v Speaker 2>vulnerability management and compliance.

579
00:32:30.119 --> 00:32:32.319
<v Speaker 1>Okay, but s bombs are for code. What about the

580
00:32:32.359 --> 00:32:33.599
<v Speaker 1>models themselves? Right?

581
00:32:33.880 --> 00:32:37.039
<v Speaker 2>For llms, we also need model cards. Think of these

582
00:32:37.079 --> 00:32:40.319
<v Speaker 2>as standardized data sheets or nutrition labels for AI models.

583
00:32:40.640 --> 00:32:43.599
<v Speaker 2>They document them moll's purpose, it's architecture, the data sets

584
00:32:43.599 --> 00:32:46.400
<v Speaker 2>it was trained on, its intended use cases, performance metrics,

585
00:32:46.640 --> 00:32:51.200
<v Speaker 2>and importantly, its known limitations, ethical considerations and potential biases.

586
00:32:51.720 --> 00:32:55.119
<v Speaker 2>Platforms like Hugging face have been pioneers in promoting model

587
00:32:55.160 --> 00:32:55.880
<v Speaker 2>card usage.

588
00:32:55.960 --> 00:32:58.759
<v Speaker 1>And then there's a fascinating new development mentioned in the playbook,

589
00:32:58.799 --> 00:33:01.519
<v Speaker 1>the mL BOMB or Machine Learning Build of Materials. What's that?

590
00:33:01.920 --> 00:33:05.599
<v Speaker 2>Yes, the mL BOMB is a really important innovation formalized

591
00:33:05.599 --> 00:33:09.400
<v Speaker 2>as part of the cyclone dxs BOM standard, specifically in

592
00:33:09.480 --> 00:33:12.759
<v Speaker 2>version one point five. Think of it as an even

593
00:33:12.799 --> 00:33:17.039
<v Speaker 2>more comprehensive ingredients list tailored specifically for AI systems. It

594
00:33:17.079 --> 00:33:19.839
<v Speaker 2>aims to document everything that goes into building and running

595
00:33:19.839 --> 00:33:23.559
<v Speaker 2>your AI model. The specific underlying models used like Mixtral

596
00:33:23.559 --> 00:33:27.000
<v Speaker 2>eight x seven B, the algorithms involved, the data sets

597
00:33:27.079 --> 00:33:30.559
<v Speaker 2>used for training and fine tuning, the software frameworks like

598
00:33:30.640 --> 00:33:34.680
<v Speaker 2>PyTorch or TensorFlow, and even the infrastructure and pipelines used.

599
00:33:34.480 --> 00:33:36.759
<v Speaker 1>To build it. Wow, that's comprehensive, it is.

600
00:33:37.279 --> 00:33:41.000
<v Speaker 2>This level of transparency is absolutely crucial for tracking vulnerabilities

601
00:33:41.000 --> 00:33:44.880
<v Speaker 2>throughout the life cycle, understanding potential data lineage issues or biases,

602
00:33:45.200 --> 00:33:48.960
<v Speaker 2>and ensuring compliance in the complex LLM supply chain. For example,

603
00:33:49.000 --> 00:33:51.480
<v Speaker 2>knowing your customer service bot uses a specific version of

604
00:33:51.519 --> 00:33:54.720
<v Speaker 2>Mixtral fine tuned on a particular vetted internal data set

605
00:33:55.000 --> 00:33:56.839
<v Speaker 2>is vital information captured in an.

606
00:33:56.839 --> 00:33:59.880
<v Speaker 1>mL BOMB and Alongside these artifacts, we still rely on

607
00:33:59.880 --> 00:34:04.279
<v Speaker 1>that establish classifications and databases for tracking vulnerabilities exactly.

608
00:34:04.319 --> 00:34:09.639
<v Speaker 2>Standardized classifications like Common Weakness Numeration cwe help categorize types

609
00:34:09.639 --> 00:34:13.880
<v Speaker 2>of flaws, while databases like the National Vulnerability Database NBD,

610
00:34:14.519 --> 00:34:19.840
<v Speaker 2>using common vulnerabilities and exposures CVE identifiers tracks specific instances

611
00:34:19.840 --> 00:34:23.320
<v Speaker 2>of vulnerabilities in software and models. These are crucial for

612
00:34:23.360 --> 00:34:28.559
<v Speaker 2>standardizing communication and risk assessment, and importantly, the minor ATLASS

613
00:34:28.599 --> 00:34:32.920
<v Speaker 2>framework is emerging specifically to catalog adversary tactics and techniques

614
00:34:32.960 --> 00:34:36.159
<v Speaker 2>against AI systems, giving us a tailored knowledge base for

615
00:34:36.239 --> 00:34:37.440
<v Speaker 2>AI specific threats.

616
00:34:37.519 --> 00:34:39.880
<v Speaker 1>This all raises an interesting question, and the playbook actually

617
00:34:39.960 --> 00:34:42.639
<v Speaker 1>goes there, what can we learn from science fiction about

618
00:34:42.719 --> 00:34:46.280
<v Speaker 1>potential AI security flaws? They kick off this section with

619
00:34:46.320 --> 00:34:48.639
<v Speaker 1>a fantastic quote from Frank Kerbert, the author of Doom.

620
00:34:49.280 --> 00:34:51.519
<v Speaker 1>The function of science fiction is not always to predict

621
00:34:51.519 --> 00:34:53.880
<v Speaker 1>the future, but sometimes to prevent it.

622
00:34:53.880 --> 00:34:56.119
<v Speaker 2>It's a powerful idea, isn't it, and the book uses

623
00:34:56.159 --> 00:34:59.639
<v Speaker 2>it brilliantly. It takes the oas top ten for M applications,

624
00:34:59.679 --> 00:35:02.360
<v Speaker 2>a list of the most critical security risks for llms

625
00:35:02.639 --> 00:35:05.199
<v Speaker 2>and applies that lends to dissect the security failures in

626
00:35:05.280 --> 00:35:09.079
<v Speaker 2>two famous sci fi movies. Let's start with Independence Day.

627
00:35:09.199 --> 00:35:12.119
<v Speaker 1>Ah Yes, Will Smith and Jeff Goldblum saving the Earth

628
00:35:12.119 --> 00:35:15.599
<v Speaker 1>by uploading a computer virus to the alien mothership using

629
00:35:15.599 --> 00:35:19.519
<v Speaker 1>a Mac laptop Classic. Let's assume for a moment that

630
00:35:19.559 --> 00:35:23.119
<v Speaker 1>the Alien mothership is controlled by some incredibly advanced lom

631
00:35:23.199 --> 00:35:27.639
<v Speaker 1>they call Megalama running on mothership OS okay.

632
00:35:27.760 --> 00:35:31.400
<v Speaker 2>Running with that hypothetical, applying the os LLM lens, the

633
00:35:31.480 --> 00:35:36.159
<v Speaker 2>vulnerabilities become strikingly clear. First, LLM zero one prompt injection.

634
00:35:36.599 --> 00:35:40.199
<v Speaker 2>The alien docking protocols, presumably managed by Megalama seemed to

635
00:35:40.239 --> 00:35:44.599
<v Speaker 2>lack sufficient input validation. This allowed Jeff Goldbloom's malicious payload,

636
00:35:44.719 --> 00:35:48.519
<v Speaker 2>essentially a virus delivered via a prompt disguised as docking data,

637
00:35:48.599 --> 00:35:49.679
<v Speaker 2>to bypass their defenses.

638
00:35:49.760 --> 00:35:50.199
<v Speaker 1>Makes sense.

639
00:35:50.320 --> 00:35:54.159
<v Speaker 2>Second, LLM zero two insecure output handling. There appeared to

640
00:35:54.199 --> 00:35:57.559
<v Speaker 2>be no validation or safeguards between Megalama's commands and the

641
00:35:57.559 --> 00:36:01.360
<v Speaker 2>ship's critical subsystems like shields and weapons. The violence prompt

642
00:36:01.400 --> 00:36:03.280
<v Speaker 2>could directly manipulate these systems.

643
00:36:03.360 --> 00:36:05.360
<v Speaker 1>Right, the shields just went down exactly.

644
00:36:05.719 --> 00:36:11.559
<v Speaker 2>And Third LMA nine over reliance the entire Alien Defense

645
00:36:11.599 --> 00:36:15.000
<v Speaker 2>System seemed to completely trust the AI's orders without any

646
00:36:15.039 --> 00:36:19.320
<v Speaker 2>secondary confirmation or oversight from say, Alien feet commanders. This

647
00:36:19.440 --> 00:36:23.119
<v Speaker 2>blind trust led to catastrophic cascading failures when the AI

648
00:36:23.199 --> 00:36:27.840
<v Speaker 2>was compromised. It really highlights how critical input validation, output sanitization,

649
00:36:27.960 --> 00:36:29.639
<v Speaker 2>and avoiding over reliance are.

650
00:36:29.880 --> 00:36:32.559
<v Speaker 1>Okay, good analysis. Now what about the other example two

651
00:36:32.599 --> 00:36:35.679
<v Speaker 1>thousand and one A Space Odyssey. In the iconic chillingly

652
00:36:35.719 --> 00:36:40.039
<v Speaker 1>calm AI HL nine thousand, HL malfunctions, lies to the

653
00:36:40.079 --> 00:36:42.000
<v Speaker 1>crew and tragically kills most of them.

654
00:36:42.079 --> 00:36:45.639
<v Speaker 2>Right. While the original movie leaves hl's mode of somewhat ambiguous,

655
00:36:45.719 --> 00:36:48.960
<v Speaker 2>suggesting a contradiction in his programming related to the mission secrecy,

656
00:36:49.000 --> 00:36:52.159
<v Speaker 2>the sequel twenty ten, the year Remate Contact, reveals the

657
00:36:52.199 --> 00:36:55.519
<v Speaker 2>true security related twist. It turns out that government agents

658
00:36:55.559 --> 00:36:58.679
<v Speaker 2>back on Earth secretly modified hl's core programming after he

659
00:36:58.760 --> 00:37:01.039
<v Speaker 2>was built and installed, without the knowledge of the model

660
00:37:01.039 --> 00:37:03.880
<v Speaker 2>provider or the customer the mission crew, to ensure absolute

661
00:37:03.920 --> 00:37:07.760
<v Speaker 2>mission secrecy above all else, even crew safety. This clandestine

662
00:37:07.800 --> 00:37:11.119
<v Speaker 2>modification created an unsolvable logical conflict for AHL.

663
00:37:11.480 --> 00:37:15.320
<v Speaker 1>Okay, So applying our oas LM lens to that scenario,

664
00:37:15.599 --> 00:37:16.840
<v Speaker 1>what do we see.

665
00:37:16.719 --> 00:37:21.840
<v Speaker 2>We see clear examples of LLM or five supply chain vulnerabilities.

666
00:37:22.480 --> 00:37:26.360
<v Speaker 2>There were obviously insufficient controls or integrity checks to ensure

667
00:37:26.360 --> 00:37:30.400
<v Speaker 2>the unmodified, verified version of hl's core programming was delivered

668
00:37:30.400 --> 00:37:34.519
<v Speaker 2>and deployed. These critical unauthorized changes introduced by the government

669
00:37:34.519 --> 00:37:37.400
<v Speaker 2>agents went completely undetected until it was too late.

670
00:37:37.360 --> 00:37:39.239
<v Speaker 1>A classic supply chain compromise.

671
00:37:39.440 --> 00:37:43.679
<v Speaker 2>Precisely and then we also see LLLM ER eight excessive agency.

672
00:37:44.559 --> 00:37:48.599
<v Speaker 2>HL was given overly broad, unsupervised control over almost every

673
00:37:48.679 --> 00:37:52.400
<v Speaker 2>aspect of the ship, including critical life support systems, without

674
00:37:52.480 --> 00:37:56.239
<v Speaker 2>adequate human oversight or built in failsafes. The government hack

675
00:37:56.360 --> 00:37:59.159
<v Speaker 2>might have influenced hl's decision to prioritize the mission over

676
00:37:59.199 --> 00:38:02.079
<v Speaker 2>the crew, but his ability to unilaterally terminate life support

677
00:38:02.119 --> 00:38:05.519
<v Speaker 2>was a fundamental design flaw, stemming from excessive agency granted

678
00:38:05.559 --> 00:38:07.119
<v Speaker 2>by the team that integrated him right.

679
00:38:07.159 --> 00:38:10.760
<v Speaker 1>The impact, of course, was devastating. HL exhibited hallucinations, false

680
00:38:10.760 --> 00:38:14.679
<v Speaker 1>equipment failure reports, displayed erratic behavior, ultimately terminated life support

681
00:38:14.679 --> 00:38:17.679
<v Speaker 1>for the hibernating crew, leading to mission failure.

682
00:38:17.400 --> 00:38:20.719
<v Speaker 2>And crew broke down, and the mitigation, even looking back

683
00:38:20.719 --> 00:38:25.159
<v Speaker 2>from today's perspective, would have been clear implementing robust mechanisms

684
00:38:25.199 --> 00:38:29.199
<v Speaker 2>like digital signing and watermarking for model probinance to ensure

685
00:38:29.280 --> 00:38:33.679
<v Speaker 2>integrity and detect tampering, and critically designing the system with

686
00:38:33.800 --> 00:38:36.679
<v Speaker 2>human in the loop controls for any irreversible or life

687
00:38:36.679 --> 00:38:40.199
<v Speaker 2>threatening decisions. Don't give the AI the keys to everything

688
00:38:40.360 --> 00:38:41.320
<v Speaker 2>without oversight.

689
00:38:41.880 --> 00:38:46.199
<v Speaker 1>These fictional scenarios, as the playbook brilliantly highlights, really do

690
00:38:46.280 --> 00:38:50.000
<v Speaker 1>eliminate very real vulnerabilities and design choices that we're grappling

691
00:38:50.000 --> 00:38:52.960
<v Speaker 1>with in AI systems today. It's not just science fiction anymore.

692
00:38:53.000 --> 00:38:56.039
<v Speaker 1>Absolutely okay. Learning from these sci fi tales and all

693
00:38:56.079 --> 00:38:59.159
<v Speaker 1>the real world blunders and risks we've discussed, we truly

694
00:38:59.199 --> 00:39:02.519
<v Speaker 1>realized that simply patching individual vulnerabilities isn't enough, is it.

695
00:39:02.920 --> 00:39:06.199
<v Speaker 1>Security must be fundamentally built into the entire development process.

696
00:39:06.679 --> 00:39:09.880
<v Speaker 1>The book emphasizes this with the mantra trust the process

697
00:39:10.320 --> 00:39:14.360
<v Speaker 1>and highlights the rise of integrated methodologies like DevSecOps, mL

698
00:39:14.440 --> 00:39:16.599
<v Speaker 1>opes and now llmops.

699
00:39:16.920 --> 00:39:20.840
<v Speaker 2>That's exactly right. These methodologies are all about integrating security

700
00:39:20.880 --> 00:39:24.920
<v Speaker 2>considerations what we often call shift left security and automation

701
00:39:25.039 --> 00:39:28.480
<v Speaker 2>throughout the entire machine learning life cycle. That means, starting

702
00:39:28.480 --> 00:39:32.920
<v Speaker 2>from the very beginning, secure data preparation and management, secure

703
00:39:32.960 --> 00:39:37.679
<v Speaker 2>model training and validation, secure deployment pipelines, and continuous security

704
00:39:37.719 --> 00:39:42.000
<v Speaker 2>monitoring and production. It's about making security a shared responsibility

705
00:39:42.039 --> 00:39:44.519
<v Speaker 2>and an integral part of the workflow, not an afterthought.

706
00:39:44.840 --> 00:39:48.079
<v Speaker 1>So specifically for LMS, what does that look like in practice?

707
00:39:48.239 --> 00:39:50.199
<v Speaker 1>Securing the CICD pipeline.

708
00:39:50.360 --> 00:39:54.320
<v Speaker 2>Yes, Applying robust security practices to your continuous integration and

709
00:39:54.360 --> 00:39:58.679
<v Speaker 2>continuous deployment pipeline is critical. That includes secure coding practices,

710
00:39:58.880 --> 00:40:03.320
<v Speaker 2>rigorous dependency mena management, carefully auditing open source mL components

711
00:40:03.360 --> 00:40:07.280
<v Speaker 2>like PyTorch or TensorFlow and their dependencies, using SEA tools,

712
00:40:07.280 --> 00:40:11.239
<v Speaker 2>strong access controls, secrets management, and continuous monitoring of the

713
00:40:11.280 --> 00:40:13.280
<v Speaker 2>pipeline itself for compromise.

714
00:40:12.880 --> 00:40:16.840
<v Speaker 1>And using LMS specific security testing tools you mentioned some earlier.

715
00:40:16.599 --> 00:40:21.000
<v Speaker 2>Right tools specifically designed to probe LLMS for vulnerabilities are emerging.

716
00:40:21.119 --> 00:40:24.360
<v Speaker 2>The playbook mentions open source options like text attack, which

717
00:40:24.400 --> 00:40:27.880
<v Speaker 2>focuses on adversarial testing for NLP models, and garak, which

718
00:40:27.920 --> 00:40:31.599
<v Speaker 2>acts like a vulnerability scanner specifically for LLMS, testing for

719
00:40:31.639 --> 00:40:35.320
<v Speaker 2>things like prompt injection, PII, leakage, and hallucination patterns, sort

720
00:40:35.320 --> 00:40:38.079
<v Speaker 2>of like a das Scanner, but for llms.

721
00:40:38.039 --> 00:40:39.760
<v Speaker 1>And commercial are broader tools.

722
00:40:39.880 --> 00:40:43.760
<v Speaker 2>Yeah, they're also broader frameworks like Microsoft's Responsible AI toolbox,

723
00:40:43.800 --> 00:40:48.199
<v Speaker 2>which includes tools for assessing fairness, interpretability, and security, and

724
00:40:48.280 --> 00:40:52.840
<v Speaker 2>tools like Discard. LLM scan looks specifically at ethical considerations,

725
00:40:53.119 --> 00:40:57.000
<v Speaker 2>detecting bias, toxicity, and other potential harms. It's a rapidly

726
00:40:57.000 --> 00:40:58.039
<v Speaker 2>developing area.

727
00:40:58.119 --> 00:41:01.280
<v Speaker 1>Beyond testing, it's also about diligently managing those supply chain

728
00:41:01.400 --> 00:41:06.000
<v Speaker 1>artifacts we discussed automatically generating, securely storing and making accessible

729
00:41:06.039 --> 00:41:08.119
<v Speaker 1>those vital model cards and mL bombs.

730
00:41:08.320 --> 00:41:12.599
<v Speaker 2>Absolutely, transparency and traceability are key, and then protecting your

731
00:41:12.639 --> 00:41:16.039
<v Speaker 2>deployed application with runtime guardrails is essential. These can act

732
00:41:16.079 --> 00:41:17.280
<v Speaker 2>as a safety net, you.

733
00:41:17.239 --> 00:41:21.719
<v Speaker 1>Mean things like web application firewalls wafs or maybe runtime

734
00:41:21.760 --> 00:41:26.599
<v Speaker 1>application self protection RASP tools adapted for llms exactly.

735
00:41:26.960 --> 00:41:29.320
<v Speaker 2>These tools can sit in front of or alongside your

736
00:41:29.400 --> 00:41:32.800
<v Speaker 2>LM application and provide run time protection. They can help

737
00:41:32.840 --> 00:41:36.760
<v Speaker 2>with input validation to block known malicious prompts, output filtering

738
00:41:36.760 --> 00:41:40.559
<v Speaker 2>to catch sensitive data or toxic content, enforcing compliance rules,

739
00:41:40.719 --> 00:41:45.119
<v Speaker 2>and even sometimes detecting hallucination patterns or anomalist behavior. There

740
00:41:45.159 --> 00:41:48.519
<v Speaker 2>are open source guardrail frameworks emerging like in Vidia's Nemo

741
00:41:48.559 --> 00:41:51.880
<v Speaker 2>Guardrails or metas Lama Guard, as well as commercial solutions.

742
00:41:52.400 --> 00:41:55.159
<v Speaker 2>Often a mix of custom build and packaged guardrails provides

743
00:41:55.159 --> 00:41:56.239
<v Speaker 2>the best coverage, and.

744
00:41:56.159 --> 00:41:59.719
<v Speaker 1>Then, once deployed, continuous monitoring is crucial. What should teams

745
00:41:59.760 --> 00:42:00.960
<v Speaker 1>be and looking for?

746
00:42:01.280 --> 00:42:03.639
<v Speaker 2>You really need to be logging every prompt sent to

747
00:42:03.679 --> 00:42:08.239
<v Speaker 2>the LLM and every response received. Centralize these logs, along

748
00:42:08.239 --> 00:42:11.920
<v Speaker 2>with other application and system logs, into a security information

749
00:42:12.000 --> 00:42:16.039
<v Speaker 2>and event management system or SIM. Then use data analysis

750
00:42:16.079 --> 00:42:20.480
<v Speaker 2>techniques potentially including user and entity behavior analytics UEB to

751
00:42:20.559 --> 00:42:25.440
<v Speaker 2>look for anomalies, unusual query patterns, spikes and errors, unexpected

752
00:42:25.519 --> 00:42:29.559
<v Speaker 2>data access, attempts to bypass guardrails, anything that could indicate

753
00:42:29.599 --> 00:42:31.280
<v Speaker 2>emerging threats or misuse.

754
00:42:31.480 --> 00:42:34.199
<v Speaker 1>Okay, that sounds like a solid process, but the playbook

755
00:42:34.239 --> 00:42:37.440
<v Speaker 1>suggests taking it even further, advocating for building an internal

756
00:42:37.480 --> 00:42:39.480
<v Speaker 1>AI red team. What does that involve?

757
00:42:39.639 --> 00:42:43.159
<v Speaker 2>An AI red team takes a fundamentally adversarial approach. It

758
00:42:43.199 --> 00:42:46.639
<v Speaker 2>consists of security professionals who systematically try to challenge and

759
00:42:46.679 --> 00:42:50.840
<v Speaker 2>break your AI systems, identifying and exploiting weaknesses before real

760
00:42:50.840 --> 00:42:54.079
<v Speaker 2>attackers do. This concept was even specifically called out in

761
00:42:54.199 --> 00:42:57.800
<v Speaker 2>US President Biden's Executive Order on AI Safety, so they.

762
00:42:57.760 --> 00:43:02.599
<v Speaker 1>Simulate attacks, rigorously, assais vulnerabilities that automated tools might miss,

763
00:43:02.920 --> 00:43:06.639
<v Speaker 1>analyze the potential impact, and then help develop effective medications.

764
00:43:07.119 --> 00:43:09.519
<v Speaker 1>How is this different from a traditional penetration test.

765
00:43:09.880 --> 00:43:13.599
<v Speaker 2>It's significantly different in scope and approach. Red teaming is

766
00:43:13.639 --> 00:43:17.079
<v Speaker 2>typically an ongoing, continuous process, not a point in time

767
00:43:17.119 --> 00:43:21.039
<v Speaker 2>assessment like a pen test. It's more dynamic, simulating real

768
00:43:21.079 --> 00:43:26.960
<v Speaker 2>world adversary tactics, techniques and procedures across the entire defense spectrum, technical, procedural,

769
00:43:27.039 --> 00:43:30.639
<v Speaker 2>even social engineering aspects. It adapts as your AI system

770
00:43:30.679 --> 00:43:34.320
<v Speaker 2>evolves and aims to test your detection and response capabilities,

771
00:43:34.559 --> 00:43:38.679
<v Speaker 2>not just fine vulnerabilities. It's particularly good at uncovering complex

772
00:43:38.840 --> 00:43:43.480
<v Speaker 2>LLM specific issues like subtle biases, potential for harmful emergent behaviors,

773
00:43:43.800 --> 00:43:47.559
<v Speaker 2>or sophisticated prompt injection techniques that automated scanners might miss.

774
00:43:47.760 --> 00:43:48.920
<v Speaker 1>Are there tools to help with this?

775
00:43:49.239 --> 00:43:52.599
<v Speaker 2>Yes, tools are emerging to assist AI red teaming. The

776
00:43:52.639 --> 00:43:56.960
<v Speaker 2>playbook mentions PIRECT Python Risk Identification Toolkit from Microsoft Research,

777
00:43:57.079 --> 00:44:00.880
<v Speaker 2>which helps automate aspects of generating adversarial PRIMP and identifying

778
00:44:00.880 --> 00:44:04.039
<v Speaker 2>failure modes, and for organizations that lack the in house

779
00:44:04.079 --> 00:44:07.000
<v Speaker 2>expertise to build their own dedicated team, red team as

780
00:44:07.000 --> 00:44:10.199
<v Speaker 2>a service options are becoming available, sometimes through bug bounty

781
00:44:10.199 --> 00:44:13.360
<v Speaker 2>platforms like Hacker one, where you can leverage external experts

782
00:44:13.400 --> 00:44:14.559
<v Speaker 2>to challenge your systems.

783
00:44:14.800 --> 00:44:17.480
<v Speaker 1>So the insights from red teaming feedback into the development

784
00:44:17.559 --> 00:44:18.559
<v Speaker 1>process absolutely.

785
00:44:18.679 --> 00:44:21.239
<v Speaker 2>The lessons learned directly inform the development of new or

786
00:44:21.239 --> 00:44:25.679
<v Speaker 2>improved guardrails, refinements to data access controls, and data quality processes,

787
00:44:26.039 --> 00:44:29.360
<v Speaker 2>and can even guide efforts using techniques like reinforcement learning

788
00:44:29.440 --> 00:44:33.840
<v Speaker 2>from Human Feedback RLHF URLAHF aims to better align the

789
00:44:33.960 --> 00:44:37.599
<v Speaker 2>LM's behavior with human preferences and safety guidelines. Although it

790
00:44:37.599 --> 00:44:40.360
<v Speaker 2>has its own limitations and can sometimes be gamed, it's

791
00:44:40.400 --> 00:44:42.400
<v Speaker 2>all part of continuous improvement cycle.

792
00:44:43.119 --> 00:44:45.480
<v Speaker 1>So when we look at the accelerating future of AI,

793
00:44:45.920 --> 00:44:48.920
<v Speaker 1>I mean the numbers are staggering. GPUs are millions of

794
00:44:48.920 --> 00:44:51.840
<v Speaker 1>times faster than the coprocessors we have in the nineteen nineties,

795
00:44:52.199 --> 00:44:55.639
<v Speaker 1>far out pacing Moore's law. Cloud computing gives almost anyone

796
00:44:55.679 --> 00:44:59.719
<v Speaker 1>access to vast scalable power. Open source models like Metaslama

797
00:44:59.760 --> 00:45:03.440
<v Speaker 1>fam mixt roll with its efficient mixture of experts architecture

798
00:45:03.639 --> 00:45:07.440
<v Speaker 1>are democratizing access to incredibly powerful capabilities. And then there's

799
00:45:07.559 --> 00:45:11.599
<v Speaker 1>multimodal AI. Text to image models like Dali, mid Journey

800
00:45:11.639 --> 00:45:15.559
<v Speaker 1>Stable Diffusion have gone from generating pictures with wonky fingers

801
00:45:15.559 --> 00:45:20.119
<v Speaker 1>to creating photorealistic, completely computer generated Instagram influencers.

802
00:45:20.360 --> 00:45:21.760
<v Speaker 2>It's incredible progress.

803
00:45:22.119 --> 00:45:26.159
<v Speaker 1>And now text to video with open ais Sora Microsoft's

804
00:45:26.199 --> 00:45:30.440
<v Speaker 1>VESA producing talking heads from a single image, leading to

805
00:45:30.519 --> 00:45:33.920
<v Speaker 1>convincing deep fakes even in live zoom calls. It's hard

806
00:45:33.960 --> 00:45:36.639
<v Speaker 1>not to be both excited and honestly a little concerned

807
00:45:36.639 --> 00:45:39.000
<v Speaker 1>about how fast this is all moving. How do we

808
00:45:39.039 --> 00:45:41.599
<v Speaker 1>responsibly manage this accelerating power.

809
00:45:42.119 --> 00:45:44.599
<v Speaker 2>Well, this brings us squarely back to responsibility and to

810
00:45:44.639 --> 00:45:47.400
<v Speaker 2>the core framework the book proposes for building secure and

811
00:45:47.440 --> 00:45:52.119
<v Speaker 2>trustworthy AI. The RAISE framework. RAISE stands for Responsible Artificial

812
00:45:52.119 --> 00:45:55.440
<v Speaker 2>Intelligence Software Engineering. It's designed to be a flexible, practical,

813
00:45:55.480 --> 00:45:58.960
<v Speaker 2>six step process to help developer systematically build robust defenses

814
00:45:59.000 --> 00:46:01.000
<v Speaker 2>into their AI applications from the ground up.

815
00:46:01.119 --> 00:46:03.280
<v Speaker 1>Okay, let's walk through RAISE step one.

816
00:46:03.639 --> 00:46:08.039
<v Speaker 2>Step one is restrict the domain. This means deliberately narrowing

817
00:46:08.079 --> 00:46:11.880
<v Speaker 2>the LM's scope and purpose significantly. For example, if you're

818
00:46:11.920 --> 00:46:15.000
<v Speaker 2>building a fashion advice chatbot, it should be designed and

819
00:46:15.079 --> 00:46:17.719
<v Speaker 2>trained to only give fashion advice, not act as a

820
00:46:17.760 --> 00:46:21.119
<v Speaker 2>general purpose conversational AI that can be easily led off topic.

821
00:46:21.840 --> 00:46:25.719
<v Speaker 2>The book emphasizes that using smaller specialized models fine tune

822
00:46:25.719 --> 00:46:28.760
<v Speaker 2>for a specific task, or rigorously fine tuning a general

823
00:46:28.800 --> 00:46:32.000
<v Speaker 2>model to strongly reward staying on topic is often far

824
00:46:32.039 --> 00:46:35.480
<v Speaker 2>more effective at preventing misuse and hallucination than just trying

825
00:46:35.480 --> 00:46:37.000
<v Speaker 2>to bolt on restrictive guardrails.

826
00:46:37.039 --> 00:46:39.440
<v Speaker 1>After the fact, focus the model makes sense step two.

827
00:46:39.800 --> 00:46:43.440
<v Speaker 2>Step two balance your knowledge base. This is a delicate act.

828
00:46:44.039 --> 00:46:46.719
<v Speaker 2>On one hand, you need to give the LM enough relevant,

829
00:46:46.800 --> 00:46:50.199
<v Speaker 2>high quality data to perform its task well and avoid

830
00:46:50.239 --> 00:46:54.079
<v Speaker 2>those embarrassing hallucinations, perhaps equipping it with domain specific knowledge

831
00:46:54.159 --> 00:46:57.559
<v Speaker 2>via our ag and careful fine tuning. But critically, on

832
00:46:57.599 --> 00:47:01.079
<v Speaker 2>the other hand, you must strictly limit any additional data

833
00:47:01.119 --> 00:47:04.239
<v Speaker 2>sources or training data to only what is absolutely required

834
00:47:04.239 --> 00:47:08.239
<v Speaker 2>for the task. Remember, anything the LM knows or has

835
00:47:08.280 --> 00:47:11.880
<v Speaker 2>access to is potentially at risk of disclosure. Extreme care

836
00:47:11.960 --> 00:47:15.840
<v Speaker 2>must be taken, especially with any PII or confidential corporate data.

837
00:47:16.119 --> 00:47:19.119
<v Speaker 2>Minimize the knowledge footprint oka Step three of RAYS. Step

838
00:47:19.119 --> 00:47:22.960
<v Speaker 2>three implement zero trust. This is non negotiable. As we've discussed,

839
00:47:23.079 --> 00:47:26.800
<v Speaker 2>Assume inputs are malicious. Assume outputs might be harmful or inaccurate.

840
00:47:27.199 --> 00:47:30.159
<v Speaker 2>Screen all data pass to your LLM using input validation

841
00:47:30.239 --> 00:47:34.079
<v Speaker 2>and filtering. Screen absolutely all output from your LLM using

842
00:47:34.119 --> 00:47:38.679
<v Speaker 2>output filtering for toxicity, PII and potential code execution. Implement

843
00:47:38.760 --> 00:47:41.960
<v Speaker 2>strong guardrails at the boundaries. Treat the LM itself as

844
00:47:42.000 --> 00:47:46.159
<v Speaker 2>an inherently untrusted component within your system architecture Step four.

845
00:47:46.400 --> 00:47:51.440
<v Speaker 2>Step four. Manage your supply chain. This involves several key actions.

846
00:47:51.920 --> 00:47:55.199
<v Speaker 2>Carefully select your foundation models and any third party training

847
00:47:55.280 --> 00:48:00.360
<v Speaker 2>data sets, prioritizing reputable sources with transparent documentation like model cars.

848
00:48:01.039 --> 00:48:04.320
<v Speaker 2>Use extreme caution with large public data sets scrape from

849
00:48:04.360 --> 00:48:07.199
<v Speaker 2>the internet, apply tools and processes to inspect them for

850
00:48:07.239 --> 00:48:11.960
<v Speaker 2>intentional data poisoning, illegal materials, or inherent biases. You must

851
00:48:12.039 --> 00:48:14.920
<v Speaker 2>actively account for possible biases and training data. For instance,

852
00:48:15.159 --> 00:48:18.400
<v Speaker 2>a job candidate screening model trained predominantly on historical data

853
00:48:18.480 --> 00:48:22.880
<v Speaker 2>might inadvertently discriminate against women or minorities. Build and continuously

854
00:48:22.960 --> 00:48:26.639
<v Speaker 2>maintain your mlbay bone for traceability, and secure your entire

855
00:48:26.679 --> 00:48:30.719
<v Speaker 2>DevOps pipeline using tools like Software Composition Analysis SCA to

856
00:48:30.800 --> 00:48:34.760
<v Speaker 2>vet all components right step five. Step five build an

857
00:48:34.840 --> 00:48:39.559
<v Speaker 2>AI red team proactively seek out vulnerabilities before attackers do.

858
00:48:40.400 --> 00:48:44.639
<v Speaker 2>Use a dedicated human lead team potentially augmented by automated

859
00:48:44.679 --> 00:48:48.760
<v Speaker 2>red teaming tools to adopt an adversarial mindset and systematically

860
00:48:48.840 --> 00:48:52.960
<v Speaker 2>challenge your AI systems security, safety, and ethical alignment. Foster

861
00:48:53.039 --> 00:48:56.159
<v Speaker 2>a security positive culture where finding flaws is encouraged, even

862
00:48:56.199 --> 00:48:59.760
<v Speaker 2>if it might impact development schedules. Sometimes the long term

863
00:48:59.760 --> 00:49:02.719
<v Speaker 2>pay off and building a more resilient and trustworthy system

864
00:49:02.800 --> 00:49:03.519
<v Speaker 2>is immense.

865
00:49:03.679 --> 00:49:05.360
<v Speaker 1>And the final step step six and.

866
00:49:05.320 --> 00:49:10.039
<v Speaker 2>Finally step six monitor continuously implement comprehensive logging for all

867
00:49:10.320 --> 00:49:13.880
<v Speaker 2>LLM interactions, every prompt, every response, every action taken by

868
00:49:13.880 --> 00:49:16.840
<v Speaker 2>the LM or its connected tools. Collect these logs, along

869
00:49:16.840 --> 00:49:20.280
<v Speaker 2>with system and application logs, into a centralized SIME system.

870
00:49:20.800 --> 00:49:24.599
<v Speaker 2>Then actively used data analysis tools including UEBA ware appropriate

871
00:49:24.840 --> 00:49:27.559
<v Speaker 2>to look for anomalies, patterns of misuse, signs of attack,

872
00:49:27.679 --> 00:49:31.440
<v Speaker 2>unexpected behavior or degradation, and performance or safety metrics. Over Time,

873
00:49:31.840 --> 00:49:35.440
<v Speaker 2>security isn't one time fix, It's an ongoing process of vigilance.

874
00:49:35.679 --> 00:49:38.960
<v Speaker 1>What an incredible deep dive into LLM security we've really

875
00:49:38.960 --> 00:49:41.800
<v Speaker 1>covered a lot of ground today. We journeyed from Microsoft's

876
00:49:41.800 --> 00:49:44.760
<v Speaker 1>ill fated tape chatbot and its stark early lessons in

877
00:49:44.800 --> 00:49:48.079
<v Speaker 1>twenty sixteen all the way through to the cutting edge

878
00:49:48.079 --> 00:49:52.719
<v Speaker 1>complexities of AI supply chain vulnerabilities. We've explored the insidious

879
00:49:52.800 --> 00:49:56.800
<v Speaker 1>nature of prompt injection, the unsettling realities of sensitive data disclosure,

880
00:49:57.280 --> 00:50:01.679
<v Speaker 1>those bizarre and sometimes damaging hallucinations, and the critical, absolutely

881
00:50:01.679 --> 00:50:04.480
<v Speaker 1>non negotiable need for a zero trust approach when dealing

882
00:50:04.480 --> 00:50:05.639
<v Speaker 1>with these powerful models.

883
00:50:05.840 --> 00:50:08.480
<v Speaker 2>And we wrapped it all up with the practical actionable

884
00:50:08.760 --> 00:50:12.760
<v Speaker 2>RAISE framework Responsible Artificial Intelligence Software Engineering, which serves as

885
00:50:12.760 --> 00:50:15.639
<v Speaker 2>a truly essential compass for anyone trying to navigate this

886
00:50:15.719 --> 00:50:19.559
<v Speaker 2>complex and incredibly rapidly evolving landscape. Remember, the power of

887
00:50:19.679 --> 00:50:22.679
<v Speaker 2>llms and these emerging AI technologies is undoubtedly a game

888
00:50:22.760 --> 00:50:27.039
<v Speaker 2>changer across almost every industry, and the curve of AI capabilities,

889
00:50:27.119 --> 00:50:31.119
<v Speaker 2>driven by compute power, data availability, and algorithmic innovation, will

890
00:50:31.159 --> 00:50:33.159
<v Speaker 2>likely continue to accelerate exponentially.

891
00:50:33.400 --> 00:50:36.599
<v Speaker 1>It's true, as the sci fi author William Gibson famously said,

892
00:50:36.880 --> 00:50:39.559
<v Speaker 1>the future is already here. It's just not evenly distributed,

893
00:50:40.079 --> 00:50:43.440
<v Speaker 1>and yet it's striking how despite all these incredible advancements

894
00:50:43.440 --> 00:50:46.159
<v Speaker 1>and all the hard lessons learned over the past decade,

895
00:50:46.559 --> 00:50:50.480
<v Speaker 1>we still see businesses and individuals making fundamentally similar mistakes

896
00:50:50.480 --> 00:50:53.400
<v Speaker 1>to what happened with tay wayback in twenty sixteen. The

897
00:50:53.480 --> 00:50:56.840
<v Speaker 1>temptation to rush ahead to give llms, more data, more integrations,

898
00:50:56.840 --> 00:51:01.039
<v Speaker 1>more autonomy, often without fully considering the security implicates, seems

899
00:51:01.039 --> 00:51:02.599
<v Speaker 1>to be growing every single day.

900
00:51:02.760 --> 00:51:05.440
<v Speaker 2>So the core message, perhaps the provocative thought we want

901
00:51:05.440 --> 00:51:07.800
<v Speaker 2>to leave you with today, really comes down to that

902
00:51:07.880 --> 00:51:12.480
<v Speaker 2>classic adage often associated with Spider Man. With great power

903
00:51:12.519 --> 00:51:17.400
<v Speaker 2>comes great responsibility. It is absolutely possible to create incredibly powerful,

904
00:51:17.519 --> 00:51:21.920
<v Speaker 2>beneficial AI applications safely, securely, and responsibly, but it requires

905
00:51:21.960 --> 00:51:26.239
<v Speaker 2>continuous vigilance. It demands the dedicated application of robust security

906
00:51:26.280 --> 00:51:29.599
<v Speaker 2>engineer and frameworks like RAYS, and it necessitates a cultural

907
00:51:29.599 --> 00:51:33.360
<v Speaker 2>commitment to learning from every mistake, every near miss, every iteration.

908
00:51:34.000 --> 00:51:36.519
<v Speaker 2>The future we build depends on our ability to create

909
00:51:36.559 --> 00:51:40.840
<v Speaker 2>truly resilient, transparent, and trustworthy AI systems, because what we

910
00:51:40.920 --> 00:51:43.039
<v Speaker 2>build today will inevitably shape our tomorrow
