WEBVTT

1
00:00:00.080 --> 00:00:01.679
<v Speaker 1>Welcome to the Deep Dive, the show that digs

2
00:00:01.720 --> 00:00:03.919
<v Speaker 1>through stacks of sources to give you the key takeaways,

3
00:00:04.120 --> 00:00:07.480
<v Speaker 1>making sure you're well informed. And today, wow, we are

4
00:00:07.519 --> 00:00:11.599
<v Speaker 1>plunging into a pretty intense digital battlefield. The stakes are

5
00:00:11.599 --> 00:00:14.919
<v Speaker 1>incredibly high. We're talking about malware, you know, that nasty

6
00:00:14.960 --> 00:00:18.359
<v Speaker 1>software designed purely to disrupt, damage, steal. And the scale,

7
00:00:18.399 --> 00:00:21.280
<v Speaker 1>the sheer scale of this problem, it's just staggering. Get this.

8
00:00:22.120 --> 00:00:25.800
<v Speaker 1>Every single day, something like three hundred and fifty thousand

9
00:00:25.839 --> 00:00:29.640
<v Speaker 1>new instances of malicious software are detected. Just think

10
00:00:29.679 --> 00:00:32.119
<v Speaker 1>about that number for a second. And back in twenty eighteen,

11
00:00:32.200 --> 00:00:34.960
<v Speaker 1>over six hundred and sixty nine million new variants were

12
00:00:35.000 --> 00:00:37.439
<v Speaker 1>spotted in that year alone. This isn't just annoying pop ups.

13
00:00:37.479 --> 00:00:40.320
<v Speaker 1>It's a huge financial hit. Businesses were spending on average

14
00:00:40.359 --> 00:00:42.600
<v Speaker 1>two point four million US dollars back in twenty eighteen

15
00:00:42.640 --> 00:00:45.759
<v Speaker 1>to twenty nineteen just fighting malware and web attacks. So our

16
00:00:45.799 --> 00:00:47.759
<v Speaker 1>mission for this deep dive is to really get into

17
00:00:47.799 --> 00:00:51.399
<v Speaker 1>how cutting edge artificial intelligence, specifically deep learning, is being

18
00:00:51.520 --> 00:00:54.280
<v Speaker 1>used as, well, a crucial line of defense. We want

19
00:00:54.359 --> 00:00:58.159
<v Speaker 1>to explore how these intelligent systems are learning, adapting, maybe

20
00:00:58.200 --> 00:01:00.640
<v Speaker 1>even predicting threats that the old ways just can't catch.

21
00:01:00.640 --> 00:01:03.280
<v Speaker 1>It's not just about spotting the known bad guys anymore, right?

22
00:01:03.399 --> 00:01:06.439
<v Speaker 1>It's about anticipating the unknown, the brand new stuff.

23
00:01:06.640 --> 00:01:10.959
<v Speaker 2>That's precisely it. You know, for years cybersecurity really leaned

24
00:01:11.000 --> 00:01:14.439
<v Speaker 2>heavily on what's called signature based detection. You could think

25
00:01:14.480 --> 00:01:18.120
<v Speaker 2>of it like having a huge photo album of known criminals.

26
00:01:18.280 --> 00:01:21.560
<v Speaker 2>It's great for recognizing malware we've already seen and fingerprinted,

27
00:01:21.959 --> 00:01:25.519
<v Speaker 2>very efficient for that. But its big weakness, its Achilles' heel,

28
00:01:25.719 --> 00:01:28.879
<v Speaker 2>really, is the zero-day attack.

29
00:01:28.599 --> 00:01:30.680
<v Speaker 1>Ah, the infamous zero days.

30
00:01:31.040 --> 00:01:34.560
<v Speaker 2>Exactly. These are completely new malware variants never seen before. They

31
00:01:34.560 --> 00:01:36.760
<v Speaker 2>don't have a signature, no photo in the album to match.

32
00:01:37.239 --> 00:01:39.719
<v Speaker 2>And that's exactly where AI and deep learning are stepping in.

33
00:01:40.040 --> 00:01:43.120
<v Speaker 2>They use much more sophisticated methods like looking at dynamic

34
00:01:43.159 --> 00:01:46.200
<v Speaker 2>behavior to spot malicious intent, even if the code itself

35
00:01:46.239 --> 00:01:46.760
<v Speaker 2>is brand new.

36
00:01:46.840 --> 00:01:49.560
<v Speaker 1>Okay, let's unpack that a bit, starting with, like, the

37
00:01:49.640 --> 00:01:52.680
<v Speaker 1>raw materials. How do we even study malware? I gather

38
00:01:52.760 --> 00:01:55.760
<v Speaker 1>there are two main ways, static and dynamic analysis.

39
00:01:55.799 --> 00:01:56.159
<v Speaker 3>That's right.

40
00:01:56.200 --> 00:01:59.560
<v Speaker 2>Static analysis is, well, like examining a suspicious package without

41
00:01:59.560 --> 00:02:02.280
<v Speaker 2>actually opening it. You're looking at the code itself without

42
00:02:02.359 --> 00:02:05.480
<v Speaker 2>running it, things like library calls it might make, text

43
00:02:05.519 --> 00:02:09.639
<v Speaker 2>strings inside it, byte sequences, maybe the sequence of API calls

44
00:02:09.639 --> 00:02:14.319
<v Speaker 2>it seems designed to make. Signature-based detection mostly

45
00:02:14.439 --> 00:02:17.520
<v Speaker 2>uses this static data, but as we said, it totally

46
00:02:17.599 --> 00:02:20.400
<v Speaker 2>misses new malware because there's no existing signature. Right, no

47
00:02:20.479 --> 00:02:24.159
<v Speaker 2>mugshot. Exactly. And then you have dynamic analysis. This is

48
00:02:24.159 --> 00:02:28.039
<v Speaker 2>where you actually detonate the malware, so to speak.

49
00:02:28.120 --> 00:02:30.560
<v Speaker 1>You run it? Sounds risky.

50
00:02:30.599 --> 00:02:32.520
<v Speaker 2>Well, you run it or emulate it in a very

51
00:02:32.520 --> 00:02:36.280
<v Speaker 2>controlled environment, a sandbox usually, and you watch what it does.

52
00:02:36.439 --> 00:02:38.759
<v Speaker 2>So you track the actual API calls it makes, how

53
00:02:38.759 --> 00:02:41.280
<v Speaker 2>it interacts with the system, maybe even low level hardware

54
00:02:41.319 --> 00:02:45.080
<v Speaker 2>events. For unknown malware, seeing its behavior, what it actually

55
00:02:45.159 --> 00:02:48.639
<v Speaker 2>does, is absolutely critical. It's not just about its blueprint,

56
00:02:48.319 --> 00:02:50.520
<v Speaker 1>but its actions. Makes sense. And I heard some people

57
00:02:50.560 --> 00:02:53.080
<v Speaker 1>are even combining them, like a hybrid approach.

58
00:02:53.240 --> 00:02:56.639
<v Speaker 2>Yes, absolutely. Hybrid analysis tries to get the best of

59
00:02:56.680 --> 00:02:59.400
<v Speaker 2>both worlds, looking at both the static structure and the

60
00:02:59.479 --> 00:03:01.759
<v Speaker 2>dynamic behavior to build a more complete picture.

61
00:03:02.159 --> 00:03:03.639
<v Speaker 3>Things like mal DNA try to do this.

62
00:03:04.039 --> 00:03:06.439
<v Speaker 1>So you mentioned API calls and other things you look for.

63
00:03:06.599 --> 00:03:09.520
<v Speaker 1>These are the features, right? The specific clues.

64
00:03:09.759 --> 00:03:13.080
<v Speaker 2>Precisely. Features are the specific characteristics we extract. And API call

65
00:03:13.120 --> 00:03:17.080
<v Speaker 2>sequences are incredibly valuable. Why? Because they directly show what

66
00:03:17.120 --> 00:03:20.479
<v Speaker 2>a program is trying to do: interact with files, connect

67
00:03:20.520 --> 00:03:24.280
<v Speaker 2>to the network, modify the system. API calls reveal that.

68
00:03:24.039 --> 00:03:26.560
<v Speaker 1>Ah, okay.

69
00:03:26.280 --> 00:03:29.240
<v Speaker 2>And the key insight here is that the order of these calls often

70
00:03:29.319 --> 00:03:33.840
<v Speaker 2>screams malicious intent. Think about it: opening a file, encrypting it,

71
00:03:33.919 --> 00:03:36.960
<v Speaker 2>then deleting the original. That sequence tells a very different

72
00:03:36.960 --> 00:03:38.879
<v Speaker 2>story than just opening and reading a file.

73
00:03:38.960 --> 00:03:41.360
<v Speaker 1>Yeah, it definitely sounds like ransomware.

74
00:03:41.919 --> 00:03:45.120
<v Speaker 2>Exactly. So researchers use techniques like n-grams, which is just

75
00:03:45.199 --> 00:03:47.680
<v Speaker 2>a fancy way of saying they look at short ordered

76
00:03:47.680 --> 00:03:50.919
<v Speaker 2>sequences of calls, like pairs or triplets, to capture this

77
00:03:51.039 --> 00:03:54.919
<v Speaker 2>vital order information. Opcode sequences are another important feature too.

78
00:03:55.159 --> 00:03:58.360
<v Speaker 2>Those are the really low-level machine instructions, giving insight

79
00:03:58.400 --> 00:03:59.879
<v Speaker 2>into the program's core functions.
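
NOTE
Editor's sketch, not part of the spoken audio: one way the n-gram idea described above could look in Python. The function name api_ngrams and the call names in the trace are hypothetical, chosen only for illustration; real traces would come from a sandbox log.
```python
from collections import Counter
from typing import List
def api_ngrams(calls: List[str], n: int = 3) -> Counter:
    """Count ordered n-grams (tuples of n consecutive calls) in one trace."""
    # A trace shorter than n simply yields an empty Counter.
    return Counter(tuple(calls[i:i + n]) for i in range(len(calls) - n + 1))
# Hypothetical trace resembling the open, encrypt, delete pattern mentioned above.
trace = ["OpenFile", "ReadFile", "EncryptData", "WriteFile", "DeleteFile"]
grams = api_ngrams(trace, n=3)
for gram, count in grams.items():
    print(gram, count)
```
Because each n-gram is an ordered tuple, the same calls in a different order produce different features, which is the order information the speakers say matters.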

80
00:04:00.439 --> 00:04:02.919
<v Speaker 1>So how do analysts actually get this data? What tools

81
00:04:02.919 --> 00:04:03.560
<v Speaker 1>are they using?

82
00:04:03.800 --> 00:04:06.639
<v Speaker 2>Ah, there's a whole toolkit. For static analysis, you have

83
00:04:06.680 --> 00:04:10.520
<v Speaker 2>disassemblers and debuggers like IDA Pro or OllyDbg. They let

84
00:04:10.520 --> 00:04:13.719
<v Speaker 2>you peek inside the compiled code, see the assembly instructions,

85
00:04:13.800 --> 00:04:16.480
<v Speaker 2>extract opcodes, potential API calls.

86
00:04:16.439 --> 00:04:19.040
<v Speaker 1>And for the dynamic side, the sandbox stuff?

87
00:04:19.199 --> 00:04:21.920
<v Speaker 2>Right. Tools like API Monitor are used to track those API

88
00:04:22.000 --> 00:04:24.480
<v Speaker 2>calls live, but you usually need to run the malware

89
00:04:24.560 --> 00:04:27.839
<v Speaker 2>inside a virtual machine or sandbox to contain it. Buster

90
00:04:27.959 --> 00:04:32.839
<v Speaker 2>Sandbox Analyzer (BSA) and similar tools like CWSandbox are

91
00:04:32.879 --> 00:04:35.560
<v Speaker 2>designed for exactly that. They run the malware safely and

92
00:04:35.680 --> 00:04:39.639
<v Speaker 2>log everything it does: file changes, network connections, API calls.

93
00:04:39.959 --> 00:04:43.879
<v Speaker 2>There are even more advanced tools like Ether, which use hardware virtualization.

94
00:04:44.040 --> 00:04:46.519
<v Speaker 2>They kind of sit outside the operating system the malware

95
00:04:46.560 --> 00:04:48.600
<v Speaker 2>is running in, making them much harder for the malware

96
00:04:48.639 --> 00:04:49.160
<v Speaker 2>to detect.

97
00:04:49.160 --> 00:04:52.199
<v Speaker 1>Okay, this is fascinating. So you've got all this raw data,

98
00:04:52.240 --> 00:04:56.360
<v Speaker 1>API sequences, opcodes, behaviors. Now how do you actually

99
00:04:56.399 --> 00:04:59.120
<v Speaker 1>feed this into an AI? How does the machine see

100
00:04:59.560 --> 00:04:59.959
<v Speaker 1>the malware?

101
00:05:00.399 --> 00:05:01.000
<v Speaker 3>Well, this is

102
00:05:00.920 --> 00:05:03.160
<v Speaker 2>where some really creative approaches come in. One of the

103
00:05:03.160 --> 00:05:05.839
<v Speaker 2>most surprising ones is malware visualization.

104
00:05:06.000 --> 00:05:08.439
<v Speaker 1>Visualization? You mean like charts and graphs?

105
00:05:08.439 --> 00:05:12.759
<v Speaker 2>No, literally turning the malware code, the binary file itself,

106
00:05:13.079 --> 00:05:15.720
<v Speaker 2>into an image, usually a grayscale image.

107
00:05:15.759 --> 00:05:18.600
<v Speaker 1>Wait, what? Turning code into a picture? How does that

108
00:05:18.639 --> 00:05:19.160
<v Speaker 1>even work? Or why?

109
00:05:19.439 --> 00:05:22.560
<v Speaker 2>It sounds bizarre, I know, but researchers found

110
00:05:22.560 --> 00:05:25.120
<v Speaker 2>that malware samples from the same family, even if they

111
00:05:25.120 --> 00:05:28.600
<v Speaker 2>look different in code, often end up having similar textures

112
00:05:28.639 --> 00:05:32.759
<v Speaker 2>and structural patterns when you represent their binary data as pixels in an image.

113
00:05:32.279 --> 00:05:34.240
<v Speaker 1>Like a visual fingerprint?

114
00:05:34.480 --> 00:05:37.600
<v Speaker 2>Kind of, yeah. Kindred attributes, as some call it. And

115
00:05:37.639 --> 00:05:40.800
<v Speaker 2>the brilliant part is this lets us use incredibly powerful

116
00:05:40.879 --> 00:05:44.120
<v Speaker 2>deep learning models that were originally designed for image recognition.

117
00:05:44.560 --> 00:05:48.240
<v Speaker 1>You mean, like the AI that recognizes cats in photos?

118
00:05:47.800 --> 00:05:52.079
<v Speaker 2>Exactly. Convolutional neural networks, or CNNs. They're designed to find

119
00:05:52.120 --> 00:05:57.000
<v Speaker 2>patterns in images, edges, textures, shapes, increasingly complex features. So

120
00:05:57.160 --> 00:05:59.759
<v Speaker 2>by turning malware into an image, we can train a

121
00:05:59.800 --> 00:06:02.879
<v Speaker 2>CNN to spot the visual hallmarks of malicious code,

122
00:06:03.079 --> 00:06:06.199
<v Speaker 2>even if it has no obvious image component itself. It's

123
00:06:06.199 --> 00:06:07.279
<v Speaker 2>surprisingly effective.
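
NOTE
Editor's sketch, not part of the spoken audio: a minimal illustration of laying a binary's bytes out as a grayscale pixel grid. The function name bytes_to_grayscale, the default width of 256, and the zero-padding of the last row are assumptions made here for illustration; the speakers do not specify a layout.
```python
from typing import List
def bytes_to_grayscale(data: bytes, width: int = 256) -> List[List[int]]:
    """Lay raw bytes out row by row; each byte (0-255) is one pixel intensity."""
    if width <= 0:
        raise ValueError("width must be positive")
    rows = [list(data[i:i + width]) for i in range(0, len(data), width)]
    if rows and len(rows[-1]) < width:
        # Assumed convention: pad the final partial row with black pixels.
        rows[-1] += [0] * (width - len(rows[-1]))
    return rows
# Ten stand-in bytes instead of a real sample, with a tiny width for readability.
image = bytes_to_grayscale(bytes(range(10)), width=4)
for row in image:
    print(row)
```
A grid like this could then be resized and passed to an image classifier; that step is left out here because it depends on the model and library chosen.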

124
00:06:07.399 --> 00:06:10.360
<v Speaker 1>Wow. Okay, that's pretty cool. So CNNs for the image approach.

125
00:06:10.519 --> 00:06:12.439
<v Speaker 1>What other AI tools are in the box?

126
00:06:12.720 --> 00:06:16.240
<v Speaker 2>Well, for data that's sequential where the order is crucial,

127
00:06:16.360 --> 00:06:19.279
<v Speaker 2>like those API call sequences or op code sequences we

128
00:06:19.360 --> 00:06:23.199
<v Speaker 2>talked about, we use different architectures. Recurrent neural networks, or

129
00:06:23.319 --> 00:06:27.759
<v Speaker 2>RNNs, are designed specifically for sequential data. Okay. And within

130
00:06:27.920 --> 00:06:32.000
<v Speaker 2>RNNs, variants like LSTMs, long short-term memory networks, are

131
00:06:32.079 --> 00:06:36.120
<v Speaker 2>really powerful. They have mechanisms to remember information over longer sequences,

132
00:06:36.199 --> 00:06:39.160
<v Speaker 2>which is perfect for tracking complex behaviors that unfold over time.

133
00:06:39.040 --> 00:06:40.959
<v Speaker 1>So they can connect an early action with a later one?

134
00:06:40.920 --> 00:06:42.160
<v Speaker 3>Precisely.

135
00:06:42.560 --> 00:06:46.480
<v Speaker 2>LSTMs are actually quite successful commercially. Another popular variation is

136
00:06:46.519 --> 00:06:49.480
<v Speaker 2>the GRU, or gated recurrent unit, which is a bit

137
00:06:49.519 --> 00:06:52.920
<v Speaker 2>simpler than LSTM but often performs just as well. Both

138
00:06:53.079 --> 00:06:57.160
<v Speaker 2>LSTMs and GRUs have shown really significant improvements in detecting malware,

139
00:06:57.439 --> 00:07:01.319
<v Speaker 2>even things like spotting cybersecurity events based on say, patterns

140
00:07:01.319 --> 00:07:03.000
<v Speaker 2>in social media messages over time.

141
00:07:03.279 --> 00:07:04.879
<v Speaker 1>Interesting. Any other architectures?

142
00:07:05.079 --> 00:07:09.639
<v Speaker 2>Definitely. There are residual networks, or ResNets. Their key innovation

143
00:07:09.839 --> 00:07:13.839
<v Speaker 2>is allowing the network to learn identity mappings, basically letting

144
00:07:13.879 --> 00:07:17.279
<v Speaker 2>the signal skip layers if needed. This helps train much

145
00:07:17.439 --> 00:07:21.000
<v Speaker 2>deeper networks without running into problems like vanishing gradients where

146
00:07:21.000 --> 00:07:22.600
<v Speaker 2>the signal gets too weak to train the

147
00:07:22.600 --> 00:07:23.639
<v Speaker 3>early layers effectively.

148
00:07:23.959 --> 00:07:26.319
<v Speaker 2>It's kind of inspired by how neurons connect in the brain.

149
00:07:26.600 --> 00:07:29.519
<v Speaker 1>Deeper networks mean potentially learning more complex patterns, I guess?

150
00:07:29.560 --> 00:07:30.920
<v Speaker 3>That's the idea.

151
00:07:31.360 --> 00:07:34.399
<v Speaker 2>And then there are GANs, generative adversarial networks. These are fascinating.

152
00:07:35.040 --> 00:07:38.279
<v Speaker 1>Adversarial? Sounds intense.

153
00:07:38.519 --> 00:07:41.279
<v Speaker 2>It is, in a way. You have two networks competing.

154
00:07:41.839 --> 00:07:45.439
<v Speaker 2>A generator tries to create fake data like fake malware samples,

155
00:07:45.920 --> 00:07:49.079
<v Speaker 2>and a discriminator tries to tell the generator's fakes apart

156
00:07:49.120 --> 00:07:49.680
<v Speaker 2>from real data.

157
00:07:49.560 --> 00:07:51.240
<v Speaker 1>Like a game of cat and mouse.

158
00:07:51.319 --> 00:07:54.639
<v Speaker 2>Exactly, a minimax game. The generator gets better at

159
00:07:54.680 --> 00:07:58.480
<v Speaker 2>fooling the discriminator, and the discriminator gets better at spotting fakes.

160
00:07:59.040 --> 00:08:01.959
<v Speaker 2>The really exciting part about GANs is their potential for

161
00:08:02.040 --> 00:08:05.439
<v Speaker 2>things like zero day malware detection, because the generator might

162
00:08:05.439 --> 00:08:08.959
<v Speaker 2>create novel malicious patterns. Or we can even use them

163
00:08:08.959 --> 00:08:11.680
<v Speaker 2>in the lab to generate challenging new threats to test

164
00:08:11.720 --> 00:08:14.839
<v Speaker 2>our defenses before similar things appear in the wild. It's

165
00:08:14.839 --> 00:08:15.720
<v Speaker 2>like a digital sparring partner.

166
00:08:15.439 --> 00:08:19.120
<v Speaker 1>Proactive defense. I like that. What about understanding

167
00:08:19.199 --> 00:08:22.360
<v Speaker 1>the words of malware, like opcodes or API calls?

168
00:08:22.480 --> 00:08:22.800
<v Speaker 3>Ah,

169
00:08:22.879 --> 00:08:25.439
<v Speaker 2>yes, that's where word embedding techniques come in, like Word2Vec,

170
00:08:25.519 --> 00:08:28.680
<v Speaker 2>or even approaches based on hidden Markov models

171
00:08:28.720 --> 00:08:31.800
<v Speaker 2>like HMM2Vec. The core idea is similar to

172
00:08:31.839 --> 00:08:35.159
<v Speaker 2>how language models understand words and sentences. You treat opcodes

173
00:08:35.240 --> 00:08:38.519
<v Speaker 2>or API calls as words. These techniques learn to

174
00:08:38.559 --> 00:08:42.480
<v Speaker 2>represent these words as numerical vectors in a high-dimensional space.

175
00:08:42.000 --> 00:08:44.159
<v Speaker 1>Vectors, like points on a map?

176
00:08:44.480 --> 00:08:47.600
<v Speaker 2>Sort of, yes. And the key is that words used

177
00:08:47.639 --> 00:08:50.799
<v Speaker 2>in similar contexts, like API calls that often appear together

178
00:08:50.799 --> 00:08:54.919
<v Speaker 2>in malicious sequences, end up closer together in this vector space.

179
00:08:55.559 --> 00:08:58.559
<v Speaker 2>Word2Vec, for example, trained with just a shallow

180
00:08:58.600 --> 00:09:02.799
<v Speaker 2>neural network, can capture really meaningful relationships. It learns the

181
00:09:02.879 --> 00:09:05.360
<v Speaker 2>meaning or function of an opcode from how it's

182
00:09:05.440 --> 00:09:06.879
<v Speaker 2>used alongside others.

183
00:09:06.799 --> 00:09:09.039
<v Speaker 1>So it groups similar functions together automatically?

184
00:09:09.200 --> 00:09:13.440
<v Speaker 2>Essentially, yes, it captures semantic relationships. There are others too, briefly,

185
00:09:13.639 --> 00:09:17.360
<v Speaker 2>like extreme learning machines, or ELMs. These are super fast

186
00:09:17.679 --> 00:09:21.200
<v Speaker 2>because they don't use the typical backpropagation training method, solving

187
00:09:21.240 --> 00:09:22.519
<v Speaker 2>linear equations instead.

188
00:09:22.639 --> 00:09:25.960
<v Speaker 1>Wow, okay, so it's a really diverse AI toolkit. CNNs

189
00:09:25.960 --> 00:09:30.759
<v Speaker 1>for images, RNNs for sequences, GANs for generating challenges, embeddings

190
00:09:30.799 --> 00:09:31.519
<v Speaker 1>for meaning.

191
00:09:31.399 --> 00:09:34.840
<v Speaker 2>Exactly, they're not just generic algorithms, they're specific tools honed

192
00:09:34.879 --> 00:09:37.639
<v Speaker 2>for different facets of the malware problem. Each has its

193
00:09:37.639 --> 00:09:39.440
<v Speaker 2>strengths depending on the data and the goal.

194
00:09:39.559 --> 00:09:42.080
<v Speaker 1>Right, it's like having different kinds of sensors and analyzers.

195
00:09:42.240 --> 00:09:44.440
<v Speaker 1>So let's talk about where this is actually being deployed.

196
00:09:44.559 --> 00:09:47.440
<v Speaker 1>Where are these AI techniques making a real difference on

197
00:09:47.480 --> 00:09:48.159
<v Speaker 1>the front lines?

198
00:09:48.360 --> 00:09:53.039
<v Speaker 2>Good question. A huge area is Android malware detection. Think

199
00:09:53.080 --> 00:09:56.159
<v Speaker 2>about it, billions of smartphones out there. It's a massive target.

200
00:09:56.240 --> 00:09:58.440
<v Speaker 1>Yeah, my phone feels like my life sometimes.

201
00:09:59.159 --> 00:10:02.600
<v Speaker 2>Right. So AI systems analyze Android apps using static, dynamic or

202
00:10:02.679 --> 00:10:05.639
<v Speaker 2>hybrid methods. They look for suspicious API calls an app

203
00:10:05.679 --> 00:10:10.440
<v Speaker 2>shouldn't need, like ptrace for debugging other processes, or

204
00:10:10.559 --> 00:10:15.879
<v Speaker 2>mkdir to create directories unexpectedly, or connect for unusual network activity.

205
00:10:16.320 --> 00:10:19.159
<v Speaker 2>They also flag risky permission requests. Does that simple game

206
00:10:19.200 --> 00:10:22.919
<v Speaker 2>really need SEND_SMS permission, or READ_CONTACTS, or

207
00:10:22.919 --> 00:10:25.679
<v Speaker 2>SYSTEM_ALERT_WINDOW to draw over other apps? AI learns the

208
00:10:25.720 --> 00:10:27.759
<v Speaker 2>patterns of legitimate apps versus malware.

209
00:10:27.879 --> 00:10:29.960
<v Speaker 1>That makes sense. What about newer areas? I keep hearing

210
00:10:30.000 --> 00:10:31.759
<v Speaker 1>about smart cars and potential hacking.

211
00:10:31.879 --> 00:10:35.320
<v Speaker 2>That's a critical emerging frontier. Connected vehicle security, part of

212
00:10:35.320 --> 00:10:39.559
<v Speaker 2>intelligent transportation systems, or ITS. Modern cars are basically computers

213
00:10:39.559 --> 00:10:43.720
<v Speaker 2>on wheels, packed with sensors, embedded devices, communicating wirelessly: V2V,

214
00:10:43.799 --> 00:10:46.720
<v Speaker 2>vehicle to vehicle, and V2I, vehicle to infrastructure.

215
00:10:46.679 --> 00:10:49.279
<v Speaker 1>Which means more attack surfaces.

216
00:10:49.039 --> 00:10:52.480
<v Speaker 2>Exactly, and the risks are serious. Denial of service, DoS,

217
00:10:52.600 --> 00:10:57.399
<v Speaker 2>or distributed denial of service, DDoS, attacks could cripple communication.

218
00:10:57.840 --> 00:11:02.000
<v Speaker 2>Imagine jamming traffic safety messages or preventing cars from coordinating

219
00:11:02.039 --> 00:11:02.840
<v Speaker 2>at intersections.

220
00:11:02.919 --> 00:11:05.000
<v Speaker 1>That sounds potentially catastrophic.

221
00:11:05.200 --> 00:11:06.120
<v Speaker 3>It could be.

222
00:11:06.200 --> 00:11:09.639
<v Speaker 2>So AI is being developed to monitor the complex network traffic

223
00:11:09.720 --> 00:11:13.320
<v Speaker 2>in and around vehicles, looking for anomalies, communication patterns that

224
00:11:13.399 --> 00:11:17.480
<v Speaker 2>indicate jamming, spoofing, or attempts to compromise vehicle systems.

225
00:11:17.639 --> 00:11:21.480
<v Speaker 1>Okay, cars, phones. What about the cloud? So much runs

226
00:11:21.519 --> 00:11:21.919
<v Speaker 1>there now?

227
00:11:22.000 --> 00:11:25.960
<v Speaker 2>Absolutely, cloud infrastructure protection is vital. A major threat is

228
00:11:26.000 --> 00:11:30.759
<v Speaker 2>malware injection into virtual machines, VMs. Because cloud platforms often

229
00:11:30.799 --> 00:11:34.919
<v Speaker 2>automatically provision lots of similar VMs, if one type gets compromised,

230
00:11:35.240 --> 00:11:38.960
<v Speaker 2>malware can potentially spread very easily to others configured the same way.

231
00:11:38.720 --> 00:11:41.200
<v Speaker 1>Like an infection spreading through identical twins.

232
00:11:41.440 --> 00:11:46.200
<v Speaker 2>A good analogy. AI techniques, sometimes even simpler machine learning

233
00:11:46.279 --> 00:11:49.639
<v Speaker 2>like k-nearest neighbors or local outlier factor, can monitor the

234
00:11:49.679 --> 00:11:53.960
<v Speaker 2>hypervisor, the software managing the VMs. They look at performance metrics:

235
00:11:54.159 --> 00:11:58.720
<v Speaker 2>CPU load, memory usage, network I/O. Anomalies in these patterns

236
00:11:58.799 --> 00:12:01.080
<v Speaker 2>can indicate a VM has been compromised and is doing

237
00:12:01.120 --> 00:12:02.480
<v Speaker 2>something malicious.

238
00:12:02.080 --> 00:12:03.720
<v Speaker 1>Like a fever chart for the VM.

239
00:12:03.840 --> 00:12:06.559
<v Speaker 2>Kind of, yeah, though it can be less effective against

240
00:12:06.600 --> 00:12:09.120
<v Speaker 2>low and slow malware that tries very hard to hide

241
00:12:09.159 --> 00:12:12.159
<v Speaker 2>its activity and not cause obvious performance spikes.
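
NOTE
Editor's sketch, not part of the spoken audio: a toy k-nearest-neighbors anomaly score over VM performance metrics, in the spirit of the hypervisor monitoring described above. The function name knn_score, the three metrics, and every reading below are made up for illustration; a real system would normalize each metric first and choose an alert threshold empirically.
```python
import math
from typing import List, Sequence
def knn_score(point: Sequence[float], baseline: List[Sequence[float]], k: int = 3) -> float:
    """Mean Euclidean distance from point to its k nearest baseline samples."""
    if not baseline:
        raise ValueError("baseline must contain at least one sample")
    dists = sorted(math.dist(point, b) for b in baseline)
    return sum(dists[:k]) / min(k, len(dists))
# Hypothetical (cpu %, memory %, network MB/s) readings from a healthy VM.
baseline = [(20, 40, 1.0), (22, 41, 1.2), (19, 39, 0.9), (21, 42, 1.1)]
normal = knn_score((20, 41, 1.0), baseline)
suspect = knn_score((95, 80, 50.0), baseline)
print(round(normal, 2), round(suspect, 2))
```
A reading far from every baseline sample gets a large score. Malware that keeps its metrics near the baseline would score low, which matches the low-and-slow caveat the speaker raises.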

242
00:12:11.919 --> 00:12:15.320
<v Speaker 1>Right, stealthy attacks. What about just general network defense, like

243
00:12:15.360 --> 00:12:17.000
<v Speaker 1>intrusion detection systems?

244
00:12:17.080 --> 00:12:20.879
<v Speaker 2>Yes, IDSs are a classic battleground where AI is making inroads.

245
00:12:21.279 --> 00:12:24.159
<v Speaker 2>Instead of just relying on known attack signatures, AI can

246
00:12:24.159 --> 00:12:28.000
<v Speaker 2>perform anomaly detection on system event logs. Think database logs,

247
00:12:28.120 --> 00:12:32.000
<v Speaker 2>operating system logs. AI models, particularly autoencoders, can learn

248
00:12:32.000 --> 00:12:34.519
<v Speaker 2>what normal activity looks like for a specific user or system.

249
00:12:34.559 --> 00:12:36.960
<v Speaker 1>Establishing a baseline.

250
00:12:37.639 --> 00:12:41.720
<v Speaker 2>Exactly. Then any significant deviation from that learned normality gets flagged

251
00:12:41.759 --> 00:12:44.879
<v Speaker 2>as suspicious. It might be an attacker trying to escalate

252
00:12:44.919 --> 00:12:49.159
<v Speaker 2>privileges or moving laterally through the network. Some systems even

253
00:12:49.240 --> 00:12:52.679
<v Speaker 2>use hybrid approaches, maybe combining deep learning like autoencoders

254
00:12:52.679 --> 00:12:56.279
<v Speaker 2>for complex dependent data with traditional machine learning like support

255
00:12:56.320 --> 00:13:01.080
<v Speaker 2>vector machines for simpler independent data like timestamps.

256
00:13:00.159 --> 00:13:03.519
<v Speaker 1>Different angles. And what about something seemingly simpler, like spam?

257
00:13:03.639 --> 00:13:07.120
<v Speaker 2>Ah, but spam gets clever too. Image spam is a

258
00:13:07.120 --> 00:13:11.440
<v Speaker 2>big one. Spammers embed their malicious messages or links inside images,

259
00:13:11.759 --> 00:13:14.440
<v Speaker 2>specifically to bypass text based filters.

260
00:13:14.679 --> 00:13:17.320
<v Speaker 1>Oh right, so the filter doesn't see the text.

261
00:13:17.559 --> 00:13:21.720
<v Speaker 2>Correct. But AI, especially CNNs again, often combined with transfer learning

262
00:13:21.759 --> 00:13:25.039
<v Speaker 2>models like VGG19, which are pre-trained on millions

263
00:13:25.080 --> 00:13:28.200
<v Speaker 2>of images, can fight back effectively. They don't just read text.

264
00:13:28.279 --> 00:13:31.720
<v Speaker 2>They analyze the image itself: its metadata like height, width,

265
00:13:32.120 --> 00:13:36.320
<v Speaker 2>color statistics, mean color skewness, texture patterns, even shapes detected

266
00:13:36.399 --> 00:13:39.840
<v Speaker 2>using edge filters. They learn the visual characteristics of spam images.

267
00:13:39.559 --> 00:13:43.200
<v Speaker 1>So the AI sees the spamminess in the image itself.

268
00:13:43.200 --> 00:13:43.720
<v Speaker 1>That's clever.

269
00:13:44.120 --> 00:13:47.559
<v Speaker 2>It shows how AI can tackle threats designed to evade

270
00:13:47.600 --> 00:13:48.440
<v Speaker 2>older methods.

271
00:13:48.519 --> 00:13:50.879
<v Speaker 1>It really does feel like a constant arms race, though.

272
00:13:51.240 --> 00:13:54.799
<v Speaker 1>As our AI gets better at spotting malware,

273
00:13:54.320 --> 00:13:58.039
<v Speaker 2>the attackers start using AI themselves to create better malware.

274
00:13:58.120 --> 00:13:59.720
<v Speaker 3>It's an unavoidable cycle.

275
00:13:59.440 --> 00:14:02.600
<v Speaker 1>Which leads to this concept I've read about, adversarial examples.

276
00:14:02.879 --> 00:14:03.759
<v Speaker 1>Sounds ominous.

277
00:14:03.840 --> 00:14:08.720
<v Speaker 2>It's a major challenge. Adversarial examples, or AEs, are inputs,

278
00:14:08.759 --> 00:14:10.960
<v Speaker 2>could be an image, could be a data file, could

279
00:14:10.960 --> 00:14:14.799
<v Speaker 2>be a software binary that are intentionally but very slightly modified.

280
00:14:15.200 --> 00:14:18.639
<v Speaker 2>The modification is often tiny, maybe even imperceptible to a human,

281
00:14:19.159 --> 00:14:22.879
<v Speaker 2>but it's specifically crafted to fool an AI classification model.

282
00:14:22.320 --> 00:14:25.279
<v Speaker 4>To make the AI misjudge it? Exactly. In the

283
00:14:25.320 --> 00:14:29.159
<v Speaker 4>malware context, an attacker could take a genuinely malicious file, tweak

284
00:14:29.200 --> 00:14:31.200
<v Speaker 4>it just a little bit, maybe adding some junk code,

285
00:14:31.279 --> 00:14:33.720
<v Speaker 4>changing a few bytes, so that our AI detector now

286
00:14:33.799 --> 00:14:35.399
<v Speaker 4>classifies it as benign.

287
00:14:35.120 --> 00:14:36.840
<v Speaker 1>But it still does the bad stuff?

288
00:14:37.200 --> 00:14:42.440
<v Speaker 2>Crucially, yes, it preserves its original malicious functionality while wearing

289
00:14:42.480 --> 00:14:47.559
<v Speaker 2>this AI fooling camouflage. It highlights that even powerful AI

290
00:14:47.600 --> 00:14:50.960
<v Speaker 2>models can have these exploitable blind spots. There are even

291
00:14:51.000 --> 00:14:54.480
<v Speaker 2>techniques to create universal perturbations that can fool a model

292
00:14:54.679 --> 00:14:56.240
<v Speaker 2>across many different inputs.

293
00:14:56.360 --> 00:14:59.559
<v Speaker 1>That's worrying. So the malware itself is also evolving, partly

294
00:14:59.600 --> 00:15:01.559
<v Speaker 1>in response to our defenses?

295
00:15:01.080 --> 00:15:03.679
<v Speaker 2>Constantly. And machine learning is actually being used to track

296
00:15:03.720 --> 00:15:08.559
<v Speaker 2>this evolution. Researchers analyze malware families over time, perhaps looking

297
00:15:08.559 --> 00:15:12.240
<v Speaker 2>at op code sequences within specific time windows. They use techniques,

298
00:15:12.279 --> 00:15:15.240
<v Speaker 2>maybe even simpler ones like linear SVMs, to detect points

299
00:15:15.240 --> 00:15:19.039
<v Speaker 2>where a malware family significantly changed its characteristics.

300
00:15:18.440 --> 00:15:21.519
<v Speaker 1>Like finding evolutionary branches in the malware family tree.

301
00:15:21.559 --> 00:15:25.919
<v Speaker 2>Precisely. Understanding how threats adapt helps us anticipate future shifts

302
00:15:25.919 --> 00:15:27.279
<v Speaker 2>in their tactics or structure.

303
00:15:27.399 --> 00:15:30.320
<v Speaker 1>There must be practical challenges in just studying all this malware,

304
00:15:30.440 --> 00:15:31.559
<v Speaker 1>especially older stuff or live threats.

305
00:15:31.559 --> 00:15:35.960
<v Speaker 2>Oh, absolutely. Handling live malware is inherently risky,

306
00:15:36.279 --> 00:15:39.039
<v Speaker 2>and for older samples, the infrastructure they relied on, especially

307
00:15:39.080 --> 00:15:41.720
<v Speaker 2>their command and control server C two servers, is often

308
00:15:41.799 --> 00:15:42.960
<v Speaker 2>long gone, so you.

309
00:15:42.960 --> 00:15:45.200
<v Speaker 1>Can't see their full behavior, not easily.

310
00:15:45.720 --> 00:15:48.879
<v Speaker 2>That's where C two server emulators become really useful. These

311
00:15:48.879 --> 00:15:52.240
<v Speaker 2>are tools researchers build to mimic the original C2 server.

312
00:15:52.840 --> 00:15:55.919
<v Speaker 2>This allows them to run the malware, even historical samples,

313
00:15:56.200 --> 00:15:59.320
<v Speaker 2>in an isolated lab network and observe its full range

314
00:15:59.320 --> 00:16:02.039
<v Speaker 2>of capabilities. Because the malware thinks it's talking to its

315
00:16:02.039 --> 00:16:05.799
<v Speaker 2>real controller, you can extract features and understand its entire life cycle.

316
00:16:05.919 --> 00:16:07.720
<v Speaker 1>You trick the malware into showing its hand.

317
00:16:08.519 --> 00:16:08.960
<v Speaker 3>Essentially.

318
00:16:09.080 --> 00:16:12.000
<v Speaker 2>Yes. Sometimes you might even need to slightly patch the

319
00:16:12.039 --> 00:16:15.519
<v Speaker 2>malware itself, maybe to bypass some anti-analysis checks it has,

320
00:16:16.000 --> 00:16:18.679
<v Speaker 2>or if, say, an encryption key needed for its C2

321
00:16:18.759 --> 00:16:21.159
<v Speaker 2>communication was lost to time, like with some old

322
00:16:21.159 --> 00:16:22.279
<v Speaker 2>CryptoLocker variants.

323
00:16:22.360 --> 00:16:25.159
<v Speaker 1>It's a complex process. Now, with all this focus on AI,

324
00:16:25.320 --> 00:16:28.799
<v Speaker 1>this AI mania, almost, are there downsides, things we need

325
00:16:28.879 --> 00:16:29.799
<v Speaker 1>to be cautious about?

326
00:16:30.080 --> 00:16:31.200
<v Speaker 3>That's a very important point.

327
00:16:31.440 --> 00:16:35.639
<v Speaker 2>Yes, while AI is powerful, we need perspective. Machine learning

328
00:16:35.679 --> 00:16:39.240
<v Speaker 2>is data driven, but it's not magic. Humans still make

329
00:16:39.320 --> 00:16:43.440
<v Speaker 2>crucial decisions, things like choosing the right model architecture, setting

330
00:16:43.480 --> 00:16:46.320
<v Speaker 2>parameters like the number of hidden states in an HMM,

331
00:16:46.559 --> 00:16:49.840
<v Speaker 2>selecting the kernel function for an SVM. These aren't automatic.

332
00:16:50.080 --> 00:16:53.720
<v Speaker 2>They require human expertise and significantly impact performance.

333
00:16:53.840 --> 00:16:56.159
<v Speaker 1>Right. The human element is still key in setting it up.

334
00:16:56.159 --> 00:16:59.600
<v Speaker 2>Definitely, and there are practical constraints. More data is

335
00:16:59.639 --> 00:17:03.039
<v Speaker 2>often better, but it needs more computing power, more storage,

336
00:17:03.279 --> 00:17:07.079
<v Speaker 2>longer training times. That's a real bottleneck. Plus, some highly

337
00:17:07.119 --> 00:17:08.359
<v Speaker 2>tuned models can become

338
00:17:08.319 --> 00:17:10.440
<v Speaker 3>very specific to the data set they were trained on.

339
00:17:10.720 --> 00:17:13.519
<v Speaker 2>They might not generalize well to new, slightly different data,

340
00:17:13.559 --> 00:17:16.880
<v Speaker 2>which is a constant issue with evolving malware. There's a

341
00:17:16.960 --> 00:17:20.440
<v Speaker 2>real need for more robust, more generic deep learning approaches.

342
00:17:20.559 --> 00:17:22.319
<v Speaker 1>Adaptability is crucial.

343
00:17:22.279 --> 00:17:25.160
<v Speaker 2>And another big challenge, maybe less technical, but just as important,

344
00:17:25.599 --> 00:17:28.880
<v Speaker 2>is the lack of a unified standard for malware taxonomy.

345
00:17:29.519 --> 00:17:32.720
<v Speaker 2>Different antivirus vendors often label the same threat differently,

346
00:17:33.000 --> 00:17:37.000
<v Speaker 2>even with tools like VirusTotal that aggregate results. Correlating

347
00:17:37.039 --> 00:17:41.319
<v Speaker 2>threats globally and building truly comprehensive data sets is harder

348
00:17:41.319 --> 00:17:43.519
<v Speaker 2>than it should be because we don't always speak the

349
00:17:43.559 --> 00:17:45.240
<v Speaker 2>same language when naming things.

350
00:17:46.319 --> 00:17:48.079
<v Speaker 1>That makes collaborative defense tricky.

351
00:17:48.440 --> 00:17:53.119
<v Speaker 2>It does, and one final sort of intriguing point. Researchers

352
00:17:53.119 --> 00:17:56.079
<v Speaker 2>have found that different methods for selecting the most important features,

353
00:17:56.079 --> 00:17:59.160
<v Speaker 2>like those API calls or opcodes, can sometimes pick

354
00:17:59.279 --> 00:18:01.119
<v Speaker 2>vastly different sets of features.

355
00:18:00.880 --> 00:18:01.720
<v Speaker 1>But they still work?

356
00:18:01.960 --> 00:18:05.759
<v Speaker 2>But they still end up achieving similar classification accuracy, which

357
00:18:05.799 --> 00:18:09.119
<v Speaker 2>raises a fascinating question. Are these methods truly finding the

358
00:18:09.200 --> 00:18:12.640
<v Speaker 2>single best set of features or are there potentially multiple

359
00:18:12.680 --> 00:18:15.400
<v Speaker 2>different sets of features that are almost equally good at

360
00:18:15.480 --> 00:18:18.440
<v Speaker 2>identifying malware? It makes you wonder about what the AI

361
00:18:18.559 --> 00:18:19.279
<v Speaker 2>is really learning.

362
00:18:19.440 --> 00:18:22.000
<v Speaker 1>That is interesting. It suggests maybe there isn't one perfect

363
00:18:22.079 --> 00:18:23.160
<v Speaker 1>way to see the malware.

364
00:18:23.400 --> 00:18:25.680
<v Speaker 2>Okay, we have definitely covered a lot of ground in

365
00:18:25.720 --> 00:18:28.960
<v Speaker 2>this deep dive. We've seen how AI and deep learning

366
00:18:29.000 --> 00:18:35.079
<v Speaker 2>are genuinely transforming the fight against malware. From visualizing code

367
00:18:35.119 --> 00:18:37.559
<v Speaker 2>as images, which is still kind of blowing my mind, yeah,

368
00:18:37.720 --> 00:18:42.160
<v Speaker 2>to understanding behavior through sequences and protecting everything from our

369
00:18:42.240 --> 00:18:45.519
<v Speaker 2>phones and cars to the cloud. It's clearly a super dynamic,

370
00:18:45.599 --> 00:18:46.960
<v Speaker 2>constantly evolving field.

371
00:18:47.119 --> 00:18:49.839
<v Speaker 1>It absolutely is, and I think the key takeaway is

372
00:18:50.599 --> 00:18:55.079
<v Speaker 1>the sheer complexity of this ongoing cybersecurity arms race. AI

373
00:18:55.160 --> 00:18:58.440
<v Speaker 1>gives us incredibly powerful new tools, yes, but the ingenuity

374
00:18:58.480 --> 00:19:02.440
<v Speaker 1>of attackers means it's never solved. Critical thinking, human oversight, asking

375
00:19:02.440 --> 00:19:05.960
<v Speaker 1>the right questions, understanding the limitations of the AI, these

376
00:19:06.000 --> 00:19:09.720
<v Speaker 1>remain completely indispensable. It's very much a human machine partnership,

377
00:19:09.839 --> 00:19:13.880
<v Speaker 1>absolutely a partnership against an ever adapting adversary. So maybe

378
00:19:13.880 --> 00:19:16.519
<v Speaker 1>the thought to leave you, our listener, with is this:

379
00:19:17.559 --> 00:19:20.680
<v Speaker 1>As AI gets better and better at spotting the hidden patterns,

380
00:19:20.720 --> 00:19:23.839
<v Speaker 1>the secret signatures of malicious code, what new forms of

381
00:19:23.880 --> 00:19:27.440
<v Speaker 1>digital camouflage will the attackers invent next? And will our

382
00:19:27.440 --> 00:19:30.920
<v Speaker 1>intelligent defenses always find the optimal way to adapt or

383
00:19:31.039 --> 00:19:33.400
<v Speaker 1>just one of many good-enough ways? Constantly pushing the

384
00:19:33.480 --> 00:19:36.200
<v Speaker 1>very boundaries of what these intelligent systems can even perceive

385
00:19:36.599 --> 00:19:37.920
<v Speaker 1>is definitely something to think about.
