WEBVTT

1
00:00:00.120 --> 00:00:04.360
<v Speaker 1>Imagine this. You're watching a crime drama, right, and the

2
00:00:04.400 --> 00:00:07.719
<v Speaker 1>detective they're dusting for fingerprints. Classic stuff.

3
00:00:07.879 --> 00:00:10.720
<v Speaker 2>Yeah, you see it all the time, but honestly.

4
00:00:10.480 --> 00:00:14.279
<v Speaker 1>In the real world today that's almost quaint, like a

5
00:00:14.359 --> 00:00:18.000
<v Speaker 1>rotary phone. Today, the fingerprints, they're digital and they're.

6
00:00:17.800 --> 00:00:19.480
<v Speaker 2>Everywhere, absolutely everywhere.

7
00:00:19.519 --> 00:00:22.160
<v Speaker 1>We're talking the smartphone in your pocket, the sat nav

8
00:00:22.199 --> 00:00:25.399
<v Speaker 1>in your car, even your home CCTV that's running twenty

9
00:00:25.399 --> 00:00:29.519
<v Speaker 1>four seven. These digital traces, they used to be

10
00:00:29.640 --> 00:00:33.520
<v Speaker 1>just for specific cybercrimes, but now now they're found in

11
00:00:33.600 --> 00:00:36.960
<v Speaker 1>almost every case. It's not just evidence anymore. It's really

12
00:00:37.000 --> 00:00:38.079
<v Speaker 1>an explosion of it.

13
00:00:38.079 --> 00:00:41.439
<v Speaker 2>It truly is. The sheer amount and well the variety

14
00:00:41.479 --> 00:00:44.439
<v Speaker 2>of digital devices mean pretty much any incident, you know,

15
00:00:44.520 --> 00:00:46.119
<v Speaker 2>from a minor theft all the way up to a

16
00:00:46.159 --> 00:00:49.640
<v Speaker 2>major criminal investigation, it leaves a digital footprint, something that

17
00:00:49.679 --> 00:00:52.560
<v Speaker 2>wasn't even really conceivable a few decades back.

18
00:00:52.600 --> 00:00:55.479
<v Speaker 1>Exactly. And for you listening in, this deep dive, it's your

19
00:00:55.479 --> 00:00:58.880
<v Speaker 1>shortcut to getting properly well informed about this hidden world.

20
00:00:59.200 --> 00:01:00.679
<v Speaker 1>We're not just going to scratch the surface of

21
00:01:00.679 --> 00:01:03.520
<v Speaker 1>what information is found. No, definitely not. Our mission here

22
00:01:03.960 --> 00:01:06.560
<v Speaker 1>is to pull back the curtain on how it's stored,

23
00:01:06.599 --> 00:01:11.439
<v Speaker 1>how it gets retrieved, and maybe most importantly, why understanding

24
00:01:11.480 --> 00:01:15.760
<v Speaker 1>those hidden mechanics, well, why it fundamentally reshapes an investigation

25
00:01:16.400 --> 00:01:18.400
<v Speaker 1>and maybe even our idea of digital truth.

26
00:01:18.519 --> 00:01:20.799
<v Speaker 2>Yeah, and our deep dive today it's built from sources

27
00:01:20.799 --> 00:01:23.400
<v Speaker 2>that really focus on file system forensics. So we're going

28
00:01:23.480 --> 00:01:26.079
<v Speaker 2>to give you a detailed look at how data is

29
00:01:26.239 --> 00:01:29.480
<v Speaker 2>organized right down at its most fundamental level. We'll look

30
00:01:29.480 --> 00:01:32.359
<v Speaker 2>at the essential tools investigators use to uncover it and

31
00:01:33.400 --> 00:01:36.959
<v Speaker 2>the fascinating, sometimes pretty complex challenges that lie ahead in

32
00:01:37.000 --> 00:01:39.239
<v Speaker 2>this field because it's evolving so fast.

33
00:01:39.280 --> 00:01:43.519
<v Speaker 1>Okay, so we've painted this picture: digital evidence exploding everywhere.

34
00:01:43.840 --> 00:01:46.640
<v Speaker 1>But here's where it starts to get tricky. The very

35
00:01:46.640 --> 00:01:49.560
<v Speaker 1>basic rules for collecting this evidence they've been shifting under

36
00:01:49.560 --> 00:01:52.280
<v Speaker 1>our feet. Let's dig into what we're calling the shifting

37
00:01:52.359 --> 00:01:55.519
<v Speaker 1>sands of digital evidence, because what worked maybe ten years

38
00:01:55.560 --> 00:01:57.959
<v Speaker 1>ago could actually compromise an entire case now.

39
00:01:58.000 --> 00:01:58.640
<v Speaker 2>That's right.

40
00:01:58.840 --> 00:02:01.879
<v Speaker 1>For years, the standard advice for seizing a computer in

41
00:02:01.920 --> 00:02:05.719
<v Speaker 1>an investigation was just dead simple, pull the plug, just

42
00:02:05.799 --> 00:02:08.439
<v Speaker 1>cut the power immediately. Yep, that was the gold standard.

43
00:02:08.560 --> 00:02:13.479
<v Speaker 2>But today doing that can be a huge mistake. Why

44
00:02:13.599 --> 00:02:16.960
<v Speaker 2>is that old rule suddenly so, well, dangerous?

45
00:02:17.439 --> 00:02:20.560
<v Speaker 1>What's fascinating, I think, is that modern operating systems and hardware,

46
00:02:20.759 --> 00:02:24.240
<v Speaker 1>they've introduced features that directly conflict with that old advice.

47
00:02:24.439 --> 00:02:28.599
<v Speaker 1>Think of it like this. For encrypted storage, right, the

48
00:02:28.639 --> 00:02:32.560
<v Speaker 1>decryption keys often only live in the computer's volatile memory.

49
00:02:32.960 --> 00:02:36.479
<v Speaker 1>It's RAM, basically, its short-term digital brain. Okay. So

50
00:02:36.520 --> 00:02:39.840
<v Speaker 1>you pull the plug, those keys just vanish instantly, and

51
00:02:39.919 --> 00:02:43.520
<v Speaker 1>all that data becomes a locked box, you know, permanently inaccessible.

52
00:02:43.560 --> 00:02:46.080
<v Speaker 1>It's like trying to open a safe after the combination's

53
00:02:46.120 --> 00:02:48.159
<v Speaker 1>just been wiped from the manager's memory.

54
00:02:47.919 --> 00:02:51.479
<v Speaker 2>Right, gone forever. And people aren't just saving things locally anymore,

55
00:02:51.520 --> 00:02:55.800
<v Speaker 2>are they? They're using remote storage, cloud services, email, social

56
00:02:55.520 --> 00:02:56.919
<v Speaker 1>media, constantly connected.

57
00:02:57.000 --> 00:02:59.280
<v Speaker 2>So if you yank the power, all those live connections,

58
00:02:59.319 --> 00:03:03.199
<v Speaker 2>all that access to potentially crucial data, it's instantly.

59
00:03:02.759 --> 00:03:05.919
<v Speaker 1>Lost, exactly. And this is why live data forensics or

60
00:03:06.039 --> 00:03:09.599
<v Speaker 1>LDF has become so critical. It lets analysts capture that

61
00:03:09.639 --> 00:03:12.560
<v Speaker 1>live data from a running computer system before it disappears.

62
00:03:12.919 --> 00:03:16.240
<v Speaker 1>But hang on, that sounds like it completely contradicts one

63
00:03:16.280 --> 00:03:19.479
<v Speaker 1>of the most fundamental principles of forensics, doesn't it.

64
00:03:19.280 --> 00:03:22.599
<v Speaker 2>It absolutely does, And this raises a really critical point.

65
00:03:23.120 --> 00:03:26.639
<v Speaker 2>The first principle in digital forensics. It's often called ACPO

66
00:03:26.719 --> 00:03:29.960
<v Speaker 2>principle one. It states that no action taken by law

67
00:03:30.000 --> 00:03:33.800
<v Speaker 2>enforcement agencies should change data which may subsequently be relied

68
00:03:33.879 --> 00:03:38.120
<v Speaker 2>upon in court. Okay, LDF, well, it inherently breaks this principle.

69
00:03:38.159 --> 00:03:40.159
<v Speaker 2>I mean, even just moving a mouse on a running

70
00:03:40.159 --> 00:03:43.960
<v Speaker 2>system leaves digital traces. So the tension there, it's very real.

71
00:03:44.360 --> 00:03:47.120
<v Speaker 1>So, Okay, if you're altering the data, how on earth

72
00:03:47.159 --> 00:03:49.560
<v Speaker 1>can it be admissible in court? Doesn't that just open

73
00:03:49.599 --> 00:03:52.360
<v Speaker 1>the evidence up to immediate challenge, Say you changed it?

74
00:03:52.439 --> 00:03:55.560
<v Speaker 2>Well, not necessarily. The principles themselves, they're actually still fit

75
00:03:55.599 --> 00:03:58.719
<v Speaker 2>for purpose, but they kind of adapt. So while LDF

76
00:03:58.759 --> 00:04:01.080
<v Speaker 2>does alter data, it can be admissible in court when

77
00:04:01.120 --> 00:04:05.319
<v Speaker 2>you combine it with ACPO principle two, which emphasizes investigator competence,

78
00:04:05.360 --> 00:04:08.360
<v Speaker 2>and principle three, which demands a really thorough audit trail.

79
00:04:08.719 --> 00:04:11.319
<v Speaker 1>Ah okay, the paperwork.

80
00:04:11.080 --> 00:04:14.719
<v Speaker 2>Essentially, yeah, it means every single step taken, every command run,

81
00:04:14.800 --> 00:04:18.720
<v Speaker 2>every change made, it has to be meticulously documented. And

82
00:04:18.759 --> 00:04:21.680
<v Speaker 2>it's that meticulous record keeping that allows the court to

83
00:04:21.800 --> 00:04:25.199
<v Speaker 2>understand and hopefully trust, the context of the altered data.

84
00:04:25.560 --> 00:04:28.920
<v Speaker 2>The integrity of the whole investigation now relies on understanding

85
00:04:29.040 --> 00:04:33.120
<v Speaker 2>that fundamental shift in collection and then just rigorously documenting

86
00:04:33.240 --> 00:04:34.240
<v Speaker 2>every single step.

87
00:04:34.480 --> 00:04:38.040
<v Speaker 1>Right. Speaking of integrity, let's talk about maybe an unsung

88
00:04:38.120 --> 00:04:42.439
<v Speaker 1>hero of digital forensics. Linux. It's often called an open

89
00:04:42.439 --> 00:04:46.000
<v Speaker 1>source powerhouse in this field. But what does open source

90
00:04:46.040 --> 00:04:47.959
<v Speaker 1>actually mean? Is it just about software you don't have

91
00:04:48.000 --> 00:04:48.399
<v Speaker 1>to pay for.

92
00:04:49.079 --> 00:04:52.040
<v Speaker 2>That's a common misconception. To really get open source, it

93
00:04:52.040 --> 00:04:55.120
<v Speaker 2>helps to contrast it with closed source software. So imagine

94
00:04:55.160 --> 00:04:57.720
<v Speaker 2>I write a simple Hello World program in C.

95
00:04:57.480 --> 00:04:58.000
<v Speaker 1>Right, okay.

96
00:04:58.279 --> 00:05:01.079
<v Speaker 2>With closed source, I'd give you the compiled executable file.

97
00:05:01.160 --> 00:05:03.240
<v Speaker 2>You can run it, sure, but you can't see the

98
00:05:03.319 --> 00:05:05.839
<v Speaker 2>underlying code. You can't change it. It's like a mystery

99
00:05:05.879 --> 00:05:08.319
<v Speaker 2>box that just, well, does its thing.

100
00:05:08.360 --> 00:05:10.240
<v Speaker 1>Right. You just trust it works.

101
00:05:10.439 --> 00:05:12.720
<v Speaker 2>Exactly. With open source, though, I give you the actual C

102
00:05:13.000 --> 00:05:15.759
<v Speaker 2>program file, the source code. You can read it, you

103
00:05:15.759 --> 00:05:17.639
<v Speaker 2>can understand exactly what it does, and you can even

104
00:05:17.639 --> 00:05:21.120
<v Speaker 2>modify it if you've got the skills. Richard Stallman, the

105
00:05:21.160 --> 00:05:24.000
<v Speaker 2>founder of the Free Software Foundation. He famously said that

106
00:05:24.120 --> 00:05:27.800
<v Speaker 2>free in open source means free as in free speech,

107
00:05:28.480 --> 00:05:30.240
<v Speaker 2>not free as in free beer.

108
00:05:30.600 --> 00:05:31.920
<v Speaker 1>Ah. That's a great distinction.

109
00:05:32.199 --> 00:05:35.079
<v Speaker 2>It is. So while it's often free of cost because

110
00:05:35.079 --> 00:05:37.800
<v Speaker 2>of the licensing, the core idea is really the freedom

111
00:05:37.839 --> 00:05:40.360
<v Speaker 2>to examine, use, and modify the code.

112
00:05:40.639 --> 00:05:45.000
<v Speaker 1>That distinction is key. So why is this open source model,

113
00:05:45.040 --> 00:05:49.240
<v Speaker 1>particularly Linux, such an advantage for forensics. It seems a

114
00:05:49.240 --> 00:05:52.439
<v Speaker 1>bit counterintuitive that something anyone can tinker with would be

115
00:05:52.519 --> 00:05:53.279
<v Speaker 1>more trustworthy, maybe.

116
00:05:53.319 --> 00:05:56.279
<v Speaker 2>Well, it's precisely because anyone can modify it, or

117
00:05:56.279 --> 00:05:58.839
<v Speaker 2>at least examine it, that it's often seen as more trustworthy.

118
00:05:59.120 --> 00:06:01.800
<v Speaker 2>There are a few big reasons. First, there's community power.

119
00:06:02.519 --> 00:06:07.040
<v Speaker 2>Open source projects often have these huge communities of users, developers, testers,

120
00:06:07.079 --> 00:06:10.199
<v Speaker 2>all working together. This collective effort often leads to new

121
00:06:10.240 --> 00:06:14.680
<v Speaker 2>features being introduced faster and crucially, issues being resolved much

122
00:06:14.759 --> 00:06:17.160
<v Speaker 2>quicker than with small proprietary teams.

123
00:06:17.319 --> 00:06:20.560
<v Speaker 1>So it's not just about speed then, does that community

124
00:06:20.639 --> 00:06:25.800
<v Speaker 1>scrutiny also directly help with the trustworthiness and accuracy of

125
00:06:25.800 --> 00:06:30.199
<v Speaker 1>the tools, which must be absolutely paramount when potentially lives

126
00:06:30.240 --> 00:06:30.920
<v Speaker 1>depend on them.

127
00:06:31.040 --> 00:06:33.639
<v Speaker 2>That's exactly it. More eyes on the code, more brains

128
00:06:33.759 --> 00:06:38.360
<v Speaker 2>solving problems. This leads directly to greater trust and correctness.

129
00:06:38.800 --> 00:06:42.680
<v Speaker 2>With closed source software, you're essentially relying on the developers

130
00:06:42.720 --> 00:06:46.680
<v Speaker 2>having got everything right, and we've all seen the infamous

131
00:06:46.720 --> 00:06:50.920
<v Speaker 2>blue screen of death in Windows, for example. With open source,

132
00:06:50.959 --> 00:06:53.720
<v Speaker 2>the community can review and fix the code at any point.

133
00:06:54.240 --> 00:06:57.240
<v Speaker 2>That provides a lot more confidence in the tool's accuracy, which,

134
00:06:57.279 --> 00:06:59.680
<v Speaker 2>as you say, is vital when people's lives might depend

135
00:06:59.680 --> 00:07:03.000
<v Speaker 2>on the investigation's outcome. Makes sense. And yes, cost effectiveness

136
00:07:03.040 --> 00:07:07.000
<v Speaker 2>is a major advantage too. Because of copyleft licensing requirements,

137
00:07:07.079 --> 00:07:09.959
<v Speaker 2>it's actually quite difficult to sell open source software directly,

138
00:07:10.040 --> 00:07:12.839
<v Speaker 2>so it's often free of cost. Now, companies can still

139
00:07:12.839 --> 00:07:16.560
<v Speaker 2>offer services like training or customization around these products, but

140
00:07:16.680 --> 00:07:20.120
<v Speaker 2>the core software itself is usually freely available. And lastly,

141
00:07:20.199 --> 00:07:24.480
<v Speaker 2>specifically for forensics, Linux offers great support for many file

142
00:07:24.519 --> 00:07:27.879
<v Speaker 2>systems by default, often much more than Windows or

143
00:07:27.920 --> 00:07:31.000
<v Speaker 2>macOS natively support, which makes it a really ideal forensic

144
00:07:31.040 --> 00:07:32.480
<v Speaker 2>workstation right out of the box.

145
00:07:33.120 --> 00:07:35.240
<v Speaker 1>So when we talk about Linux as an operating system,

146
00:07:35.279 --> 00:07:37.720
<v Speaker 1>what are its main parts? You hear about the kernel,

147
00:07:38.000 --> 00:07:40.839
<v Speaker 1>but what else makes it a functioning OS?

148
00:07:41.160 --> 00:07:44.319
<v Speaker 2>Yeah, good question. At its very heart is the kernel,

149
00:07:44.519 --> 00:07:46.879
<v Speaker 2>which was created by Linus Torvalds. That's the bit that

150
00:07:46.920 --> 00:07:50.600
<v Speaker 2>directly controls the hardware and manages the software. Then layered

151
00:07:50.600 --> 00:07:53.079
<v Speaker 2>on top of that are the GNU utilities. These are

152
00:07:53.079 --> 00:07:56.519
<v Speaker 2>standard programs that let users control files, run programs, that

153
00:07:56.560 --> 00:07:59.519
<v Speaker 2>sort of thing. It's really the combination of the Linux

154
00:07:59.600 --> 00:08:03.040
<v Speaker 2>kernel and these GNU utilities that forms the functional operating

155
00:08:03.079 --> 00:08:06.439
<v Speaker 2>system we commonly just call Linux. Beyond that core, you've

156
00:08:06.439 --> 00:08:10.240
<v Speaker 2>got graphical desktop environments, the visual interface most people see,

157
00:08:10.480 --> 00:08:12.839
<v Speaker 2>and of course all the application software the end users

158
00:08:12.839 --> 00:08:15.839
<v Speaker 2>are most familiar with, including those powerful forensic tools we've

159
00:08:15.879 --> 00:08:16.399
<v Speaker 2>been mentioning.

160
00:08:16.600 --> 00:08:20.639
<v Speaker 1>Okay, let's get practical then. What are some basic, really

161
00:08:20.759 --> 00:08:24.920
<v Speaker 1>fundamental forensic commands in Linux that investigators are using day

162
00:08:24.920 --> 00:08:27.800
<v Speaker 1>to day. These must be like the digital equivalent of

163
00:08:27.839 --> 00:08:29.439
<v Speaker 1>a magnifying glass and dusting powder.

164
00:08:29.600 --> 00:08:32.720
<v Speaker 2>Definitely, one of the most fundamental is hashing. This is

165
00:08:32.840 --> 00:08:35.559
<v Speaker 2>absolutely crucial for ensuring data integrity.

166
00:08:35.759 --> 00:08:37.320
<v Speaker 1>Okay, how does that work?

167
00:08:37.559 --> 00:08:41.399
<v Speaker 2>Well, hashing algorithms like, say, MD5 or the SHA family.

168
00:08:41.799 --> 00:08:45.360
<v Speaker 2>They create a unique digital fingerprint for any piece of data.

169
00:08:45.840 --> 00:08:48.519
<v Speaker 2>If even a single bit is changed in a file,

170
00:08:48.879 --> 00:08:52.279
<v Speaker 2>its hash value will change dramatically. It's like if you

171
00:08:52.399 --> 00:08:55.399
<v Speaker 2>change just one letter in the entire collected works of Shakespeare,

172
00:08:55.480 --> 00:08:59.159
<v Speaker 2>the hash would completely change instantly, confirming even the tiniest alteration.

173
00:08:59.559 --> 00:09:01.960
<v Speaker 1>Wow. So if someone sends you a file and you

174
00:09:02.000 --> 00:09:04.879
<v Speaker 1>calculate its hash, you can instantly verify it hasn't been

175
00:09:04.919 --> 00:09:08.080
<v Speaker 1>tampered with since they calculated their hash. Yeah, that's powerful.

176
00:09:08.320 --> 00:09:12.039
<v Speaker 2>It is very powerful. Now. While some smaller hashes like

177
00:09:12.159 --> 00:09:16.559
<v Speaker 2>CRC32, maybe, can sometimes experience hash collisions, that's

178
00:09:16.600 --> 00:09:20.440
<v Speaker 2>where different inputs accidentally produce the same hash, using larger

179
00:09:20.480 --> 00:09:23.799
<v Speaker 2>outputs like SHA-512, or maybe using multiple different

180
00:09:23.840 --> 00:09:28.000
<v Speaker 2>algorithms together, that greatly reduces that probability to almost zero.
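
NOTE
Editor's sketch, not part of the audio: a minimal Python illustration of the
hashing idea just described, assuming the standard hashlib module. The helper
name file_digests and the sample bytes are hypothetical; real casework would
hash the evidence file or disk image itself, often with several algorithms.
```python
import hashlib
#
def file_digests(data: bytes) -> dict[str, str]:
    """Return hex digests of the same bytes under several algorithms."""
    names = ("md5", "sha256", "sha512")
    return {name: hashlib.new(name, data).hexdigest() for name in names}
#
original = b"To be, or not to be"
altered = b"To be, or not to bf"  # a single character changed
before = file_digests(original)
after = file_digests(altered)
# Every algorithm reports a different digest for the altered copy.
for name in before:
    print(name, "changed:", before[name] != after[name])
```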

181
00:09:28.200 --> 00:09:29.240
<v Speaker 1>Got it? What else?

182
00:09:29.480 --> 00:09:33.279
<v Speaker 2>Another really useful tool is hex viewers like xxd. This

183
00:09:33.399 --> 00:09:37.360
<v Speaker 2>lets analysts examine raw binary data, byte by byte, things

184
00:09:37.399 --> 00:09:40.399
<v Speaker 2>like a disk's partition table for example. It's like looking

185
00:09:40.399 --> 00:09:43.080
<v Speaker 2>at the absolute purest form of the computer's language, the

186
00:09:43.120 --> 00:09:47.440
<v Speaker 2>ones and zeros represented compactly. This often requires root access, though,

187
00:09:47.519 --> 00:09:48.480
<v Speaker 2>using the sudo command.
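
NOTE
Editor's sketch, not part of the audio: a small Python hex viewer showing what
byte-by-byte inspection looks like. It only imitates the general layout of a
hex dump (offset, hex bytes, printable text) and is not a reimplementation of
any real tool; the function name hex_dump is hypothetical.
```python
def hex_dump(data: bytes, width: int = 16) -> list[str]:
    """Format bytes as lines of: offset, hex values, printable characters."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable bytes are shown as dots in the text column.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}: {hex_part:<{width * 3 - 1}}  {text_part}")
    return lines
#
for line in hex_dump(b"MBR\x00\x01\x02 partition data"):
    print(line)
```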

188
00:09:48.679 --> 00:09:50.879
<v Speaker 1>Okay, and then there's strings I've heard that's a really

189
00:09:50.919 --> 00:09:53.399
<v Speaker 1>really powerful one for investigators. What makes it so special?

190
00:09:53.559 --> 00:09:54.159
<v Speaker 2>It really is.

191
00:09:54.279 --> 00:09:58.279
<v Speaker 2>The strings command is deceptively simple but incredibly useful. It

192
00:09:58.320 --> 00:10:00.960
<v Speaker 2>displays all the printable ASCII characters it finds within

193
00:10:01.039 --> 00:10:03.600
<v Speaker 2>any file, even a binary file.

194
00:10:03.399 --> 00:10:06.120
<v Speaker 1>So even in, like, an image file or a program?

195
00:10:06.240 --> 00:10:09.759
<v Speaker 2>Exactly. Even if a file is an image or an executable program,

196
00:10:10.080 --> 00:10:12.679
<v Speaker 2>strings can pull out any plain text that happens to

197
00:10:12.679 --> 00:10:14.840
<v Speaker 2>be embedded within it. And when you combine it with

198
00:10:14.960 --> 00:10:18.960
<v Speaker 2>egrep for text searching, it becomes a very powerful forensic

199
00:10:19.000 --> 00:10:23.799
<v Speaker 2>tool for quickly finding keywords or phrases within potentially massive

200
00:10:23.799 --> 00:10:26.919
<v Speaker 2>amounts of raw data. That sounds incredibly useful. And the

201
00:10:26.960 --> 00:10:30.840
<v Speaker 2>-t option is particularly useful. It displays the byte offset,

202
00:10:30.840 --> 00:10:33.879
<v Speaker 2>basically the address where the text is found. This lets

203
00:10:33.879 --> 00:10:37.120
<v Speaker 2>an investigator navigate directly to that specific spot within the

204
00:10:37.159 --> 00:10:39.360
<v Speaker 2>file or the disk image using other tools.
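
NOTE
Editor's sketch, not part of the audio: a rough Python approximation of the
idea behind extracting printable text with byte offsets and then filtering
for a keyword. The function name find_strings, the four-character minimum,
and the sample blob are assumptions for illustration only.
```python
import re
#
def find_strings(data: bytes, min_len: int = 4) -> list[tuple[int, str]]:
    """Return (byte offset, text) for each run of printable ASCII characters."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(data)]
#
blob = b"\x00\x01secret\xff\x00ab\x00password=hunter2\x00"
hits = find_strings(blob)
# Keyword search over the extracted text, keeping the offset for navigation.
matches = [(offset, text) for offset, text in hits if "password" in text]
print(hits)
print(matches)
```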

205
00:10:39.440 --> 00:10:41.960
<v Speaker 1>So it's not just about finding a keyword, it's about

206
00:10:42.279 --> 00:10:45.240
<v Speaker 1>the context, seeing what's around it. I imagine that's crucial

207
00:10:45.279 --> 00:10:47.720
<v Speaker 1>when you're dealing with huge amounts of data where a

208
00:10:47.799 --> 00:10:51.799
<v Speaker 1>word might appear harmlessly in one place, but maybe sinisterly

209
00:10:51.840 --> 00:10:54.279
<v Speaker 1>in another, all hidden away in binary code.

210
00:10:54.320 --> 00:10:56.879
<v Speaker 2>Precisely, it's not just finding the needle in the haystack.

211
00:10:56.960 --> 00:10:59.720
<v Speaker 2>It's like finding a specific pollen grain on that needle,

212
00:10:59.759 --> 00:11:03.759
<v Speaker 2>and its precise digital address. It's truly granular work.

213
00:11:04.039 --> 00:11:07.000
<v Speaker 1>That's incredible. Okay, that's a great segue into understanding the

214
00:11:07.039 --> 00:11:10.919
<v Speaker 1>hidden language, how computers speak in ones and zeros. I mean,

215
00:11:10.919 --> 00:11:12.200
<v Speaker 1>at the end of the day, it's all just zeros

216
00:11:12.200 --> 00:11:14.080
<v Speaker 1>and ones, right, But how do they turn that into

217
00:11:14.120 --> 00:11:16.639
<v Speaker 1>something we actually understand, like text or numbers.

218
00:11:17.000 --> 00:11:19.559
<v Speaker 2>It is all zeros and ones. But how those zeros

219
00:11:19.600 --> 00:11:23.399
<v Speaker 2>and ones are interpreted? That's the key. Computers, as you say,

220
00:11:23.480 --> 00:11:27.200
<v Speaker 2>use the binary number system base two. Humans we generally

221
00:11:27.279 --> 00:11:31.080
<v Speaker 2>use decimal base ten, but in computing you'll very often

222
00:11:31.200 --> 00:11:36.000
<v Speaker 2>encounter hexadecimal or hex, which is base sixteen. It uses

223
00:11:36.039 --> 00:11:38.799
<v Speaker 2>the digits zero through nine and then the letters A

224
00:11:39.000 --> 00:11:42.759
<v Speaker 2>through F to represent the values ten to fifteen. Hexadecimal

225
00:11:42.840 --> 00:11:45.960
<v Speaker 2>is simply a much more compact way to represent binary

226
00:11:46.039 --> 00:11:47.120
<v Speaker 2>data for human eyes.

227
00:11:47.320 --> 00:11:49.240
<v Speaker 1>Okay, so it's like a shorthand exactly.

228
00:11:49.320 --> 00:11:51.639
<v Speaker 2>For example, the binary number 1011, that's one zero,

229
00:11:51.679 --> 00:11:54.360
<v Speaker 2>one one is equivalent to eleven in our decimal system.

230
00:11:54.759 --> 00:11:57.159
<v Speaker 2>In hex, it would be b. It just makes reading

231
00:11:57.200 --> 00:11:59.000
<v Speaker 2>long strings of binary data much easier.
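
NOTE
Editor's sketch, not part of the audio: the binary, decimal, and hex example
above worked through with Python built-ins. The variable names are arbitrary.
```python
value = int("1011", 2)            # parse the binary digits one zero one one
as_hex = format(value, "x")       # the same value in base sixteen
as_binary = format(value, "04b")  # back to four binary digits
print(value, as_hex, as_binary)
# One byte (eight bits) always fits in exactly two hex digits.
full_byte = int("11111111", 2)
print(full_byte, format(full_byte, "02x"))
```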

232
00:11:59.039 --> 00:12:02.240
<v Speaker 1>Okay, that makes sense. Then there's text. My computer understands

233
00:12:02.240 --> 00:12:03.960
<v Speaker 1>what I type, But how does it turn the letter

234
00:12:04.000 --> 00:12:05.960
<v Speaker 1>a into numbers and then back again.

235
00:12:06.279 --> 00:12:09.840
<v Speaker 2>Ah, that's where character encodings come in. They basically assign

236
00:12:09.919 --> 00:12:13.679
<v Speaker 2>a unique numerical code to each character. Letter's numbers, symbols

237
00:12:13.919 --> 00:12:17.759
<v Speaker 2>everything like a secret codebook kind of Older encodings like

238
00:12:17.879 --> 00:12:21.120
<v Speaker 2>ASE and ISO eight eight five nine, they were quite limited.

239
00:12:21.519 --> 00:12:24.360
<v Speaker 2>They worked well for English, but struggled with special characters

240
00:12:24.399 --> 00:12:27.720
<v Speaker 2>like maybe the A character in Spanish or other European languages.

241
00:12:28.000 --> 00:12:30.799
<v Speaker 2>They simply didn't have enough codes assigned for every symbol

242
00:12:30.919 --> 00:12:31.799
<v Speaker 2>used across.

243
00:12:31.559 --> 00:12:34.840
<v Speaker 1>The world, Which is where Unicode and UTF eight step in,

244
00:12:34.879 --> 00:12:35.799
<v Speaker 1>I guess correct.

245
00:12:36.159 --> 00:12:39.720
<v Speaker 2>Unicode is this huge standard that supports a vast range

246
00:12:39.759 --> 00:12:42.840
<v Speaker 2>of characters from pretty much all the world's writing systems.

247
00:12:43.159 --> 00:12:47.320
<v Speaker 2>It covers virtually every character you could imagine, and UTF

248
00:12:47.320 --> 00:12:50.879
<v Speaker 2>eight is a specific encoding method for Unicode. It's a

249
00:12:51.039 --> 00:12:55.320
<v Speaker 2>variable width encoding, which cleverly solves the storage inefficiency you'd

250
00:12:55.320 --> 00:12:57.919
<v Speaker 2>get if every single character took up say four bites.

251
00:12:58.639 --> 00:13:01.679
<v Speaker 2>UTF eight uses anywhere from one to four bytes per character,

252
00:13:01.879 --> 00:13:05.039
<v Speaker 2>so it adapts exactly. This makes it the de facto

253
00:13:05.159 --> 00:13:08.279
<v Speaker 2>standard for web pageing coding and much more. What's really

254
00:13:08.279 --> 00:13:11.720
<v Speaker 2>clever about UTF eight is that standard English ACI characters

255
00:13:11.759 --> 00:13:15.360
<v Speaker 2>ABC one, two three, they are represented identically to their

256
00:13:15.360 --> 00:13:18.399
<v Speaker 2>original ACI form, taking up just one byte, but more

257
00:13:18.440 --> 00:13:22.919
<v Speaker 2>complex characters like emojis or characters from other alphabets, they

258
00:13:23.000 --> 00:13:25.720
<v Speaker 2>might take two, three, or four bytes. So it's incredibly

259
00:13:25.720 --> 00:13:29.279
<v Speaker 2>efficient for common text, but flexible enough for global communication.
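
NOTE
Editor's sketch, not part of the audio: a quick Python check of the
variable-width behaviour described above. The four sample characters are the
editor's choice, written as escapes so the file stays plain ASCII.
```python
samples = {
    "A": "A",                  # plain ASCII letter
    "n-tilde": "\u00f1",       # Latin small letter n with tilde
    "euro": "\u20ac",          # euro sign
    "emoji": "\U0001F600",     # grinning face emoji
}
# Number of bytes each character occupies once encoded as UTF-8.
widths = {name: len(ch.encode("utf-8")) for name, ch in samples.items()}
print(widths)
# ASCII text produces identical bytes under both encodings.
same_bytes = "ABC123".encode("utf-8") == "ABC123".encode("ascii")
print(same_bytes)
```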

260
00:13:29.720 --> 00:13:32.440
<v Speaker 1>That's smart. And what about time? How do computers keep

261
00:13:32.480 --> 00:13:35.279
<v Speaker 1>track of that down to the you know, the millisecond

262
00:13:35.320 --> 00:13:38.639
<v Speaker 1>or even nanosecond, which must be crucial for an investigation

263
00:13:38.679 --> 00:13:39.440
<v Speaker 1>timeline.

264
00:13:39.519 --> 00:13:43.600
<v Speaker 2>Time representation in computing is another fascinating area. Many systems,

265
00:13:43.679 --> 00:13:46.919
<v Speaker 2>especially Unix-like systems like Linux and macOS, use

266
00:13:46.919 --> 00:13:48.320
<v Speaker 2>what's called Unix time.

267
00:13:48.240 --> 00:13:48.960
<v Speaker 1>Right, I've heard of that.

268
00:13:49.039 --> 00:13:51.159
<v Speaker 2>It's measured as a number of seconds that have elapsed

269
00:13:51.200 --> 00:13:55.480
<v Speaker 2>since midnight UTC on January first, nineteen seventy. That specific

270
00:13:55.559 --> 00:13:57.080
<v Speaker 2>moment is known as the epoch.

271
00:13:57.399 --> 00:13:59.799
<v Speaker 1>Okay, so just a big counter of seconds. But what

272
00:14:00.159 --> 00:14:03.519
<v Speaker 1>if two things happen really fast, like within the same second,

273
00:14:03.679 --> 00:14:06.279
<v Speaker 1>would they have the exact same time stamp? That could

274
00:14:06.279 --> 00:14:08.919
<v Speaker 1>be a real problem for investigators trying to figure out

275
00:14:08.960 --> 00:14:13.360
<v Speaker 1>the exact order of events, especially if automated processes are involved.

276
00:14:13.480 --> 00:14:16.399
<v Speaker 2>It absolutely could be, and it was a limitation. While

277
00:14:16.440 --> 00:14:18.799
<v Speaker 2>maybe not an issue for things happening at human speed,

278
00:14:19.279 --> 00:14:23.080
<v Speaker 2>automated processes can access or modify many files within a

279
00:14:23.159 --> 00:14:27.279
<v Speaker 2>single second. This meant older filesystems like, say, ext2,

280
00:14:27.320 --> 00:14:31.600
<v Speaker 2>which only had second-level granularity, couldn't always

281
00:14:31.639 --> 00:14:35.120
<v Speaker 2>definitively say which event happened first if they occurred in

282
00:14:35.159 --> 00:14:35.840
<v Speaker 2>the same second.

283
00:14:35.919 --> 00:14:36.799
<v Speaker 1>So how do they fix that?

284
00:14:37.360 --> 00:14:40.559
<v Speaker 2>Well, most modern implementations of UNIX time, like you find

285
00:14:40.559 --> 00:14:43.639
<v Speaker 2>in the ext4 filesystem, for example, they now include

286
00:14:43.639 --> 00:14:47.879
<v Speaker 2>a nanosecond subcomponent. Nanosecond? Yeah, billionths of a second. This

287
00:14:48.039 --> 00:14:52.360
<v Speaker 2>significantly improves the granularity, allowing for incredibly precise ordering of

288
00:14:52.399 --> 00:14:55.919
<v Speaker 2>events and file creation timestamps that can be absolutely crucial

289
00:14:55.960 --> 00:14:57.960
<v Speaker 2>for building an accurate forensic timeline.

290
00:14:58.159 --> 00:15:02.440
<v Speaker 1>That's a huge leap in precision. Okay, one more weird

291
00:15:02.519 --> 00:15:05.279
<v Speaker 1>term before we move on. Endianness. That sounds like

292
00:15:05.279 --> 00:15:07.559
<v Speaker 1>something out of Gulliver's Travels or I don't know, a

293
00:15:07.600 --> 00:15:09.639
<v Speaker 1>really obscure technical debate. What's that about?

294
00:15:09.879 --> 00:15:12.799
<v Speaker 2>Huh? It does sound a bit strange, doesn't it, But

295
00:15:12.879 --> 00:15:16.879
<v Speaker 2>it's actually crucial for correctly interpreting raw hex data off

296
00:15:16.919 --> 00:15:20.519
<v Speaker 2>a disk. Endianness is just about the order in which

297
00:15:20.559 --> 00:15:24.679
<v Speaker 2>computers store or read multi-byte numbers. Okay, how so?

298
00:15:25.120 --> 00:15:27.279
<v Speaker 2>Imagine writing down a date. Do you write month, day,

299
00:15:27.399 --> 00:15:29.120
<v Speaker 2>year like in the US, or day, month, year like

300
00:15:29.159 --> 00:15:31.639
<v Speaker 2>in Europe. It's the same information, right, just a different order.

301
00:15:31.679 --> 00:15:32.120
<v Speaker 1>Gotcha.

302
00:15:32.159 --> 00:15:35.200
<v Speaker 2>Computers have a similar choice for numbers. Big-endian is

303
00:15:35.240 --> 00:15:37.919
<v Speaker 2>like writing one twenty three. The most significant byte, the

304
00:15:37.919 --> 00:15:40.759
<v Speaker 2>hundreds part, comes first. Little-endian, which is more

305
00:15:40.759 --> 00:15:43.120
<v Speaker 2>common on PCs, is like writing three twenty one. The

306
00:15:43.200 --> 00:15:45.759
<v Speaker 2>least significant byte, the ones part, comes first.

307
00:15:45.799 --> 00:15:47.519
<v Speaker 1>So why does that matter for forensics?

308
00:15:47.759 --> 00:15:50.399
<v Speaker 2>Because if you pull raw data off a disk, maybe

309
00:15:50.440 --> 00:15:53.200
<v Speaker 2>from a critical system boot file or a timestamp field,

310
00:15:53.240 --> 00:15:55.519
<v Speaker 2>and you don't know the reading order, the endianness of

311
00:15:55.519 --> 00:15:58.960
<v Speaker 2>the system that wrote it, you'll completely misinterpret what those

312
00:15:59.000 --> 00:16:02.120
<v Speaker 2>bytes actually represent. It could be the difference between seeing

313
00:16:02.159 --> 00:16:05.559
<v Speaker 2>December fifth and May twelfth in a critical timestamp, just

314
00:16:05.639 --> 00:16:07.639
<v Speaker 2>based on reading the bytes in the wrong order.
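
NOTE
A minimal Python sketch of the byte-order point above. The four sample bytes
are made up for illustration; only the assumed byte order changes between the
two readings, yet the resulting numbers are completely different.
```python
import struct
# Four raw bytes as they might appear in a hex viewer (made-up sample).
raw = bytes.fromhex("78563412")
# Same bytes, two readings: only the assumed byte order differs.
as_little = int.from_bytes(raw, byteorder="little")
as_big = int.from_bytes(raw, byteorder="big")
print(hex(as_little))  # 0x12345678
print(hex(as_big))     # 0x78563412
# struct spells the same choice as "<" (little-endian) and ">" (big-endian).
assert struct.unpack("<I", raw)[0] == as_little
assert struct.unpack(">I", raw)[0] == as_big
```
The same choice has to be made for any multi-byte field read from a raw image,
such as a size, an offset, or a timestamp.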

315
00:16:07.840 --> 00:16:11.919
<v Speaker 1>Okay, crucial detail, then. With that secret language sort of decoded,

316
00:16:12.240 --> 00:16:14.440
<v Speaker 1>let's zoom out a bit from the individual bits and

317
00:16:14.480 --> 00:16:17.480
<v Speaker 1>bytes to the actual landscape where all this data lives.

318
00:16:18.440 --> 00:16:22.799
<v Speaker 1>We're moving to the disks, partitions, and file system fundamentals,

319
00:16:23.279 --> 00:16:27.600
<v Speaker 1>the very architecture of digital storage. Starting with how computers

320
00:16:27.679 --> 00:16:31.320
<v Speaker 1>organize information physically, what are the different types of storage

321
00:16:31.360 --> 00:16:33.840
<v Speaker 1>and which ones are most relevant for forensics?

322
00:16:34.080 --> 00:16:39.159
<v Speaker 2>Right. Computer storage is usually classified into a few tiers: primary, secondary, tertiary,

323
00:16:39.200 --> 00:16:43.519
<v Speaker 2>and offline. Primary storage is typically RAM, random access memory,

324
00:16:43.720 --> 00:16:45.639
<v Speaker 2>and the key thing about RAM is that it's volatile,

325
00:16:45.840 --> 00:16:48.519
<v Speaker 2>meaning all the information stored in it is lost as

326
00:16:48.559 --> 00:16:51.440
<v Speaker 2>soon as the power is removed. This is exactly why

327
00:16:51.559 --> 00:16:55.039
<v Speaker 2>live data forensics (LDF) is so critical. As we've discussed,

328
00:16:55.679 --> 00:16:58.279
<v Speaker 2>RAM holds so many ephemeral details about what was just

329
00:16:58.279 --> 00:17:03.039
<v Speaker 2>happening on the system: open documents, running processes, network connections,

330
00:17:03.080 --> 00:17:04.960
<v Speaker 2>those encryption keys we mentioned.

331
00:17:05.000 --> 00:17:06.759
<v Speaker 1>Stuff you lose if you just pull the plug.

332
00:17:06.559 --> 00:17:10.759
<v Speaker 2>Precisely. Then you have secondary storage. This includes

333
00:17:10.799 --> 00:17:15.000
<v Speaker 2>your traditional hard disk drives (HDDs) and the now very

334
00:17:15.000 --> 00:17:18.640
<v Speaker 2>common solid-state drives (SSDs). This is where most of

335
00:17:18.680 --> 00:17:21.160
<v Speaker 2>our persistent data resides, the stuff that stays when the

336
00:17:21.200 --> 00:17:22.160
<v Speaker 2>power is off.

337
00:17:22.000 --> 00:17:24.720
<v Speaker 1>And SSDs, which are everywhere now because they're so fast

338
00:17:24.720 --> 00:17:28.640
<v Speaker 1>and efficient, they bring their own unique forensic headaches, don't they?

339
00:17:29.000 --> 00:17:31.240
<v Speaker 1>I've heard they can be kind of a forensic investigator's

340
00:17:31.319 --> 00:17:34.200
<v Speaker 1>nightmare compared to the old spinning hard drives.

341
00:17:34.359 --> 00:17:38.920
<v Speaker 2>They absolutely do pose some unique SSD-specific challenges because

342
00:17:38.960 --> 00:17:42.880
<v Speaker 2>of how they work, fundamentally differently from HDDs. See, unlike HDDs,

343
00:17:42.960 --> 00:17:45.920
<v Speaker 2>the flash memory components inside an SSD can only be

344
00:17:45.920 --> 00:17:48.279
<v Speaker 2>written to a limited number of times before they wear out,

345
00:17:48.519 --> 00:17:52.799
<v Speaker 2>so to extend the drive's lifespan, the SSD controller employs

346
00:17:52.839 --> 00:17:57.599
<v Speaker 2>techniques like wear leveling. This involves intelligently moving data around

347
00:17:57.599 --> 00:18:00.680
<v Speaker 2>independently of the operating system just to make sure all

348
00:18:00.680 --> 00:18:02.920
<v Speaker 2>the memory cells get written to roughly the same amount.

349
00:18:03.039 --> 00:18:05.160
<v Speaker 1>So the controller is shuffling data behind the scenes.

350
00:18:05.200 --> 00:18:10.119
<v Speaker 2>Exactly, which makes predicting the precise physical location of

351
00:18:10.160 --> 00:18:13.240
<v Speaker 2>a specific piece of data for forensic analysis much harder.

352
00:18:13.559 --> 00:18:16.039
<v Speaker 2>You can't just assume data stays put in one physical

353
00:18:16.039 --> 00:18:18.440
<v Speaker 2>spot like you mostly could on an HDD.

354
00:18:18.240 --> 00:18:20.799
<v Speaker 1>So data might not be where the operating system thinks

355
00:18:20.799 --> 00:18:23.519
<v Speaker 1>it is. That sounds like a constant game of digital

356
00:18:23.559 --> 00:18:24.240
<v Speaker 1>hide and seek.

357
00:18:24.400 --> 00:18:27.640
<v Speaker 2>It can be, and it gets worse. Even more critically,

358
00:18:27.799 --> 00:18:31.519
<v Speaker 2>when the operating system marks data as unallocated, which happens

359
00:18:31.519 --> 00:18:34.680
<v Speaker 2>when you delete a file. Modern SSDs use a function

360
00:18:34.759 --> 00:18:38.519
<v Speaker 2>called TRIM. The OS basically tells the SSD controller, hey,

361
00:18:38.599 --> 00:18:39.799
<v Speaker 2>we don't need the data in these blocks anymore.

362
00:18:39.720 --> 00:18:41.160
<v Speaker 1>And the controller?

363
00:18:41.319 --> 00:18:44.119
<v Speaker 2>The controller can then internally mark those blocks for erasure,

364
00:18:44.359 --> 00:18:47.240
<v Speaker 2>often almost immediately, as part of its garbage collection routines.

365
00:18:47.880 --> 00:18:50.640
<v Speaker 2>This means that deleted files are much less likely to

366
00:18:50.680 --> 00:18:54.359
<v Speaker 2>be present and recoverable on SSDs than on traditional HDDs,

367
00:18:54.519 --> 00:18:56.680
<v Speaker 2>where the data just sat there until overwritten.

368
00:18:57.079 --> 00:18:59.599
<v Speaker 1>Wow, so deleting really means deleting much more often on

369
00:18:59.599 --> 00:19:00.759
<v Speaker 1>an SSD.

370
00:19:00.920 --> 00:19:05.000
<v Speaker 2>Often, yes. And here's the real kicker for forensics. These SSD

371
00:19:05.119 --> 00:19:08.839
<v Speaker 2>controllers run their garbage collection routines in the background while

372
00:19:08.880 --> 00:19:12.240
<v Speaker 2>the drive is powered on. This means that potentially data

373
00:19:12.319 --> 00:19:14.920
<v Speaker 2>is changing on the device in question, even if it's

374
00:19:14.920 --> 00:19:17.440
<v Speaker 2>just sitting there, plugged in as evidence, doing nothing from the OS perspective.

375
00:19:17.440 --> 00:19:20.559
<v Speaker 1>Whoa. So the evidence is potentially

376
00:19:20.640 --> 00:19:21.799
<v Speaker 1>altering itself?

377
00:19:21.559 --> 00:19:24.799
<v Speaker 2>Exactly, which, as you can imagine, directly breaks the principle

378
00:19:24.839 --> 00:19:28.720
<v Speaker 2>of not altering evidence. It's a fundamental conflict that investigators

379
00:19:28.759 --> 00:19:31.480
<v Speaker 2>have to be acutely aware of when dealing with SSDs.

380
00:19:31.839 --> 00:19:35.680
<v Speaker 1>That's a huge, huge challenge to the core ideas of forensics. Okay,

381
00:19:35.720 --> 00:19:40.039
<v Speaker 1>so physical disks, whether HDD or SSD, they're often divided

382
00:19:40.079 --> 00:19:43.240
<v Speaker 1>into partitions. Why do we do that? What's the purpose

383
00:19:43.240 --> 00:19:44.440
<v Speaker 1>of these logical divisions?

384
00:19:44.839 --> 00:19:49.279
<v Speaker 2>Partitions are essentially logical divisions of a single physical disk.

385
00:19:49.799 --> 00:19:52.079
<v Speaker 2>They allow that one physical disk to be split into

386
00:19:52.160 --> 00:19:55.960
<v Speaker 2>multiple logical areas. Each of these areas can then contain

387
00:19:56.079 --> 00:19:59.400
<v Speaker 2>a different file system or even a different operating system.

388
00:19:59.720 --> 00:20:02.599
<v Speaker 2>For example, you might have one partition for Windows and another

389
00:20:02.599 --> 00:20:04.680
<v Speaker 2>for Linux on the same drive, or maybe a separate

390
00:20:04.720 --> 00:20:06.319
<v Speaker 2>partition just for your user data.

391
00:20:06.559 --> 00:20:09.000
<v Speaker 1>Okay, and there are different ways these partitions are laid

392
00:20:09.000 --> 00:20:12.200
<v Speaker 1>out on the disk, right? Like MBR versus GPT. What's

393
00:20:12.200 --> 00:20:14.599
<v Speaker 1>the practical difference there for someone investigating a system.

394
00:20:14.720 --> 00:20:20.440
<v Speaker 2>Yes, MBR (Master Boot Record) and GPT (GUID Partition Table) are

395
00:20:20.519 --> 00:20:24.039
<v Speaker 2>the two main schemes used to define how partitions are

396
00:20:24.119 --> 00:20:27.880
<v Speaker 2>organized on a disk. MBR is the older standard. GPT

397
00:20:28.160 --> 00:20:31.319
<v Speaker 2>is more modern and allows for, well, far more partitions

398
00:20:31.319 --> 00:20:33.759
<v Speaker 2>on a single disk, and also provides greater space for

399
00:20:33.799 --> 00:20:37.200
<v Speaker 2>storing partition information compared to the very limited space in the MBR.

400
00:20:36.799 --> 00:20:39.160
<v Speaker 1>So more robust?

401
00:20:39.440 --> 00:20:43.599
<v Speaker 2>Generally, yes. For an investigator, knowing which scheme is used

402
00:20:43.799 --> 00:20:45.640
<v Speaker 2>tells you where to look on the disk for critical

403
00:20:45.680 --> 00:20:48.680
<v Speaker 2>boot information and how that disk is logically structured overall.
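
NOTE
A rough Python sketch of telling the two schemes apart on a raw image. It
assumes 512-byte sectors, the 0x55AA boot signature at offset 510 of sector 0,
and the ASCII signature "EFI PART" at the start of the GPT header in the next
sector. The function name and the tiny synthetic images are hypothetical.
```python
SECTOR = 512  # assumed logical sector size
def partition_scheme(image: bytes) -> str:
    """Guess the partitioning scheme from the first two sectors of a raw image."""
    sector0 = image[:SECTOR]
    if len(sector0) < SECTOR or sector0[510:512] != b"\x55\xaa":
        return "unknown"
    # A GPT header starts with the ASCII signature "EFI PART" at LBA 1.
    if image[SECTOR:SECTOR + 8] == b"EFI PART":
        return "GPT"
    return "MBR"
#
# Build two tiny synthetic images for demonstration only.
mbr_only = bytearray(2 * SECTOR)
mbr_only[510:512] = b"\x55\xaa"
gpt = bytearray(mbr_only)
gpt[450] = 0xEE  # type byte of the first entry in a protective MBR
gpt[SECTOR:SECTOR + 8] = b"EFI PART"
print(partition_scheme(bytes(mbr_only)))  # MBR
print(partition_scheme(bytes(gpt)))       # GPT
```
A real tool would go on to parse the partition entries themselves; this only
shows where an examiner would start looking.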

404
00:20:48.839 --> 00:20:51.640
<v Speaker 1>Got it. Let's zoom in again now to the core

405
00:20:51.720 --> 00:20:55.200
<v Speaker 1>filesystem concepts. These are the real nuts and bolts of

406
00:20:55.240 --> 00:20:58.240
<v Speaker 1>how files actually live on a disk, and understanding them

407
00:20:58.240 --> 00:21:02.359
<v Speaker 1>can reveal hidden data. What's a cluster or block, and

408
00:21:02.440 --> 00:21:04.559
<v Speaker 1>why does that basic unit matter so much?

409
00:21:05.000 --> 00:21:08.079
<v Speaker 2>Right. A cluster, sometimes called a block, is the basic,

410
00:21:08.400 --> 00:21:12.279
<v Speaker 2>smallest allocatable unit of storage within a file system. Think

411
00:21:12.279 --> 00:21:14.319
<v Speaker 2>of it like building blocks. Even if you have a

412
00:21:14.359 --> 00:21:17.039
<v Speaker 2>tiny file that's only say, one byte in size, the

413
00:21:17.039 --> 00:21:20.000
<v Speaker 2>filesystem has to allocate an entire cluster to store it.

414
00:21:20.160 --> 00:21:23.559
<v Speaker 2>That cluster might be, for instance, 4,096 bytes.

415
00:21:23.640 --> 00:21:26.839
<v Speaker 1>So one byte of data takes up 4,096 bytes of space?

416
00:21:26.799 --> 00:21:30.119
<v Speaker 2>Exactly. The remaining 4,095 bytes in that cluster,

417
00:21:30.240 --> 00:21:32.599
<v Speaker 2>the space between the end of the actual file data

418
00:21:32.640 --> 00:21:35.359
<v Speaker 2>and the end of the cluster. That's known as slack space.

419
00:21:35.519 --> 00:21:37.000
<v Speaker 1>And that's interesting because?

420
00:21:36.680 --> 00:21:40.240
<v Speaker 2>Because this slack space isn't necessarily empty. It can contain

421
00:21:40.359 --> 00:21:43.079
<v Speaker 2>data from previous files that happened to occupy this cluster

422
00:21:43.119 --> 00:21:45.680
<v Speaker 2>before the current file was written there. It might hold

423
00:21:45.680 --> 00:21:49.599
<v Speaker 2>fragments of old documents, emails, chat logs, residual evidence that was

424
00:21:49.640 --> 00:21:52.519
<v Speaker 2>never fully overwritten. It can be a real gold mine

425
00:21:52.519 --> 00:21:53.359
<v Speaker 2>for investigators.
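
NOTE
The slack-space arithmetic above can be sketched in a few lines of Python. The
4,096-byte cluster size is just the example figure from the conversation, and
the helper name is hypothetical. The zero-length case is an assumption, since
filesystems differ in how they handle empty files.
```python
def slack_bytes(file_size: int, cluster_size: int = 4096) -> int:
    """Bytes left unused in the last cluster allocated to a file."""
    if file_size == 0:
        return 0  # assumption: an empty file gets no cluster at all
    remainder = file_size % cluster_size
    return 0 if remainder == 0 else cluster_size - remainder
#
print(slack_bytes(1))      # 4095
print(slack_bytes(4096))   # 0
print(slack_bytes(10000))  # 2288
```
Whatever old data sits in those leftover bytes is what an examiner would be
carving through.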

426
00:21:53.599 --> 00:21:58.319
<v Speaker 1>Wow, okay, and unallocated space. That sounds like just empty space,

427
00:21:58.359 --> 00:22:00.640
<v Speaker 1>But I have a feeling it's not always truly empty either.

428
00:22:00.839 --> 00:22:04.079
<v Speaker 2>You're right, it often isn't. Unallocated space is simply disk

429
00:22:04.160 --> 00:22:06.920
<v Speaker 2>space that isn't currently assigned to any active file by

430
00:22:06.920 --> 00:22:11.400
<v Speaker 2>the filesystem. But crucially, when a file is deleted, especially

431
00:22:11.440 --> 00:22:15.759
<v Speaker 2>on older HDDs, less so on TRIM-enabled SSDs, its

432
00:22:15.799 --> 00:22:19.319
<v Speaker 2>content often isn't wiped immediately. The space is just marked

433
00:22:19.359 --> 00:22:23.559
<v Speaker 2>as available. The actual data might still be physically present

434
00:22:23.559 --> 00:22:26.839
<v Speaker 2>in that unallocated space, just waiting to be overwritten by

435
00:22:26.839 --> 00:22:28.440
<v Speaker 2>new data eventually.

436
00:22:28.039 --> 00:22:30.000
<v Speaker 1>Which is why you need a full copy.

437
00:22:30.519 --> 00:22:33.799
<v Speaker 2>Exactly. This is precisely why forensic investigators always aim to create

438
00:22:34.079 --> 00:22:37.119
<v Speaker 2>a bit-by-bit image of the device. This ensures

439
00:22:37.160 --> 00:22:39.319
<v Speaker 2>that all the unallocated space is captured and can be

440
00:22:39.359 --> 00:22:42.200
<v Speaker 2>analyzed later. If you just copied the active files, you'd

441
00:22:42.200 --> 00:22:45.720
<v Speaker 2>miss all that potential evidence of deleted but still recoverable files.

442
00:22:45.839 --> 00:22:48.880
<v Speaker 1>Makes sense. What about file fragmentation? Does that just make

443
00:22:48.920 --> 00:22:51.920
<v Speaker 1>files slow to load, or does it complicate forensics too?

444
00:22:52.279 --> 00:22:55.799
<v Speaker 2>File fragmentation happens when there isn't a single large enough

445
00:22:55.799 --> 00:22:58.720
<v Speaker 2>contiguous area on the disk to store an entire file

446
00:22:58.839 --> 00:23:01.359
<v Speaker 2>when it's first written or when it grows, so the

447
00:23:01.359 --> 00:23:04.319
<v Speaker 2>file system has to split the file into multiple pieces

448
00:23:04.440 --> 00:23:07.599
<v Speaker 2>or fragments, and store them in different physical locations on

449
00:23:07.640 --> 00:23:11.359
<v Speaker 2>the disk. Okay. While it can definitely affect performance, for forensics

450
00:23:11.640 --> 00:23:14.720
<v Speaker 2>it means that to recover or analyze that file, you

451
00:23:14.799 --> 00:23:18.599
<v Speaker 2>first have to find and correctly reassemble all those scattered fragments.

452
00:23:18.759 --> 00:23:21.920
<v Speaker 2>It's like putting together a jigsaw puzzle, sometimes with missing

453
00:23:22.000 --> 00:23:24.240
<v Speaker 2>pieces or pieces from different puzzles mixed in.

454
00:23:24.519 --> 00:23:27.720
<v Speaker 1>It adds complexity, right. And something called copy-on-write

455
00:23:27.799 --> 00:23:31.000
<v Speaker 1>or COW. That sounds like maybe a way to preserve

456
00:23:31.079 --> 00:23:32.680
<v Speaker 1>data rather than lose it.

457
00:23:32.680 --> 00:23:35.119
<v Speaker 2>It is, in a way. Copy-on-write, COW, is

458
00:23:35.160 --> 00:23:38.400
<v Speaker 2>a strategy used in many modern file systems like APFS

459
00:23:38.400 --> 00:23:41.440
<v Speaker 2>and others. When a resource like a file block is

460
00:23:41.480 --> 00:23:44.880
<v Speaker 2>about to be modified, instead of overwriting the original data directly,

461
00:23:45.240 --> 00:23:47.759
<v Speaker 2>the filesystem first makes a copy of the original block

462
00:23:47.799 --> 00:23:49.680
<v Speaker 2>and then writes the changes to the new block.

463
00:23:49.799 --> 00:23:52.440
<v Speaker 1>Ah, so the old version sticks around for a while.

464
00:23:52.559 --> 00:23:55.680
<v Speaker 2>Potentially yes, it means the original data might still be

465
00:23:55.720 --> 00:23:58.799
<v Speaker 2>present somewhere on the disk, offering the potential to discover

466
00:23:59.039 --> 00:24:03.720
<v Speaker 2>earlier versions of artifacts or files. This is also fundamental

467
00:24:03.720 --> 00:24:07.079
<v Speaker 2>to how filesystem snapshots work. They preserve a view of

468
00:24:07.119 --> 00:24:10.119
<v Speaker 2>the filesystem at a specific point in time by referencing

469
00:24:10.160 --> 00:24:14.480
<v Speaker 2>these older, unmodified blocks. These snapshots are great for backups

470
00:24:14.559 --> 00:24:18.240
<v Speaker 2>but also incredibly valuable for forensic analysis to see what

471
00:24:18.279 --> 00:24:20.119
<v Speaker 2>the system looked like at a previous state.

472
00:24:20.599 --> 00:24:23.920
<v Speaker 1>Very cool. Finally, in this section, what's RAID? I usually

473
00:24:23.920 --> 00:24:27.000
<v Speaker 1>hear that mentioned in the context of big server storage

474
00:24:27.079 --> 00:24:27.960
<v Speaker 1>or maybe backups.

475
00:24:28.079 --> 00:24:31.319
<v Speaker 2>RAID stands for redundant array of independent disks. It's

476
00:24:31.359 --> 00:24:34.440
<v Speaker 2>a technology that combines multiple physical disk drives into a

477
00:24:34.480 --> 00:24:37.440
<v Speaker 2>single logical unit that the operating system sees as one

478
00:24:37.680 --> 00:24:38.359
<v Speaker 2>big disk.

479
00:24:38.480 --> 00:24:38.960
<v Speaker 1>Why do that?

480
00:24:39.319 --> 00:24:42.000
<v Speaker 2>It can be done for various reasons. Sometimes it's just

481
00:24:42.000 --> 00:24:46.200
<v Speaker 2>to create one large, consolidated filesystem from several smaller disks.

482
00:24:46.240 --> 00:24:49.759
<v Speaker 2>That's RAID zero, which focuses on performance or size but

483
00:24:49.839 --> 00:24:53.839
<v Speaker 2>offers no redundancy. Or more commonly, it's done for redundancy

484
00:24:53.880 --> 00:24:57.839
<v Speaker 2>and fault tolerance. For instance, RAID one mirrors data exactly

485
00:24:57.839 --> 00:25:01.160
<v Speaker 2>across two or more drives. If one fails, the data

486
00:25:01.200 --> 00:25:04.400
<v Speaker 2>is safe on the other. RAID five uses parity information

487
00:25:04.519 --> 00:25:07.720
<v Speaker 2>striped across drives, allowing for a single drive to fail

488
00:25:07.759 --> 00:25:08.759
<v Speaker 2>without any data loss.

489
00:25:08.559 --> 00:25:10.319
<v Speaker 1>And for forensics?

490
00:25:10.359 --> 00:25:13.799
<v Speaker 2>For forensics, analyzing a RAID array means understanding how the

491
00:25:13.880 --> 00:25:17.160
<v Speaker 2>data is striped or mirrored across those multiple physical disks.

492
00:25:17.480 --> 00:25:20.079
<v Speaker 2>You often need to image all the member disks and

493
00:25:20.119 --> 00:25:23.799
<v Speaker 2>then use specialized software to reconstruct the original logical volume

494
00:25:23.839 --> 00:25:26.359
<v Speaker 2>before you can even start analyzing the filesystem on top

495
00:25:26.400 --> 00:25:29.000
<v Speaker 2>of it. It adds another layer of complexity.
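
NOTE
A toy Python sketch of the parity idea behind RAID five. The block contents and
the helper name are made up, and real arrays rotate parity across the member
disks and use much larger stripes. It only shows why one missing member can be
rebuilt from the survivors.
```python
from functools import reduce
def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))
#
# One stripe across three data disks (made-up contents) plus its parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)
#
# Pretend the second disk is lost: XOR the survivors with the parity block.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt)  # b'BBBB'
```
Reconstruction software does the same thing at scale, once it has worked out
the stripe size and the order of the member disks.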

496
00:25:29.039 --> 00:25:31.559
<v Speaker 1>Fascinating. Okay, let's move on and take a quick tour

497
00:25:31.599 --> 00:25:34.119
<v Speaker 1>of common file systems. These are like the different languages

498
00:25:34.200 --> 00:25:37.079
<v Speaker 1>or organizational schemes that operating systems use to manage all

499
00:25:37.119 --> 00:25:40.400
<v Speaker 1>these bits, bytes, clusters and partitions we've been talking about.

500
00:25:40.480 --> 00:25:43.039
<v Speaker 1>We'll start with the old, reliable FAT and its newer

501
00:25:43.079 --> 00:25:44.640
<v Speaker 1>cousin, exFAT.

502
00:25:44.839 --> 00:25:50.039
<v Speaker 2>Sure. The FAT (File Allocation Table) filesystem is, well, very old

503
00:25:50.119 --> 00:25:53.279
<v Speaker 2>and relatively simple in its structure. It basically consists of

504
00:25:53.319 --> 00:25:56.880
<v Speaker 2>only three main components, the boot sector, the file allocation

505
00:25:56.920 --> 00:26:00.880
<v Speaker 2>table itself which tracks cluster usage, and the directory entries.

506
00:26:01.200 --> 00:26:04.200
<v Speaker 2>Because of its simplicity and wide compatibility, you still find

507
00:26:04.240 --> 00:26:07.720
<v Speaker 2>it very commonly on removable media like USB drives and

508
00:26:07.799 --> 00:26:09.119
<v Speaker 2>older SD cards.

509
00:26:09.079 --> 00:26:11.759
<v Speaker 1>But it had limitations, especially with file size.

510
00:26:11.799 --> 00:26:15.480
<v Speaker 2>Big time. FAT32, the most common version, famously

511
00:26:15.519 --> 00:26:18.039
<v Speaker 2>had a four gigabyte limit for single files, which is

512
00:26:18.359 --> 00:26:21.200
<v Speaker 2>tiny by today's standards. That's where exFAT comes in.

513
00:26:21.720 --> 00:26:25.079
<v Speaker 2>It's a newer file system, also from Microsoft, designed specifically

514
00:26:25.079 --> 00:26:28.400
<v Speaker 2>for larger removable media, like modern high-capacity SD cards

515
00:26:28.400 --> 00:26:29.319
<v Speaker 2>and USB drives.

516
00:26:29.400 --> 00:26:30.359
<v Speaker 1>What's the key difference?

517
00:26:30.480 --> 00:26:33.319
<v Speaker 2>A key improvement is its support for much larger files.

518
00:26:33.640 --> 00:26:36.039
<v Speaker 2>exFAT can handle files up to, theoretically, one hundred and

519
00:26:36.079 --> 00:26:39.400
<v Speaker 2>twenty-eight petabytes (PB), which is enormous. It achieves

520
00:26:39.400 --> 00:26:41.920
<v Speaker 2>this partly by using eight-byte values to store file

521
00:26:41.960 --> 00:26:44.880
<v Speaker 2>sizes compared to FAT32's four-byte values, so

522
00:26:44.880 --> 00:26:47.279
<v Speaker 2>it's much better suited for things like large video files

523
00:26:47.279 --> 00:26:48.039
<v Speaker 2>on flash drives.
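
NOTE
A quick Python check of why a four-byte size field caps files at about four
gigabytes, and how much headroom an eight-byte field gives. This is only the
raw field arithmetic. The 128 PB figure quoted above is lower than the
eight-byte ceiling, so it presumably reflects other limits not modeled here.
```python
# Largest value a 4-byte (32-bit) unsigned size field can hold.
max_4_byte_field = 2**32 - 1
print(max_4_byte_field)           # 4294967295
print(max_4_byte_field / 2**30)   # just under 4.0 (GiB)
# An 8-byte (64-bit) field has a far higher ceiling.
max_8_byte_field = 2**64 - 1
print(max_8_byte_field // 2**50)  # 16383 (PiB), i.e. just under 16 EiB
```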

524
00:26:48.240 --> 00:26:51.839
<v Speaker 1>Okay, then there's the standard for Windows for a long

525
00:26:51.880 --> 00:26:55.319
<v Speaker 1>time now, NTFS. What are its defining features?

526
00:26:55.359 --> 00:26:57.920
<v Speaker 1>What makes it such a robust system? And what kind

527
00:26:57.920 --> 00:27:00.720
<v Speaker 1>of hidden details might it hold for an investigation?

528
00:27:00.960 --> 00:27:05.160
<v Speaker 2>Right. NTFS (New Technology File System) has been the default for Windows

529
00:27:05.200 --> 00:27:07.799
<v Speaker 2>for decades now, and it's a much more complex and

530
00:27:07.920 --> 00:27:12.119
<v Speaker 2>robust filesystem than FAT. One key feature is journaling.

531
00:27:12.559 --> 00:27:13.119
<v Speaker 1>What does that do?

532
00:27:13.400 --> 00:27:17.599
<v Speaker 2>Journaling means the filesystem records pending changes to its metadata

533
00:27:17.640 --> 00:27:21.160
<v Speaker 2>in a log or journal before actually committing those changes

534
00:27:21.160 --> 00:27:24.960
<v Speaker 2>to the main filesystem structures. This makes NTFS much more

535
00:27:25.000 --> 00:27:28.359
<v Speaker 2>fault tolerant. If the system crashes mid operation, it can

536
00:27:28.440 --> 00:27:31.079
<v Speaker 2>use the journal to recover and ensure the filesystem structure

537
00:27:31.119 --> 00:27:33.920
<v Speaker 2>remains consistent, reducing the risk of data corruption.

538
00:27:34.079 --> 00:27:35.440
<v Speaker 1>That sounds important.

539
00:27:35.680 --> 00:27:39.480
<v Speaker 2>Very. NTFS also supports something called alternate data streams, or ADS.

540
00:27:39.759 --> 00:27:42.640
<v Speaker 2>This allows multiple separate streams of data to be associated

541
00:27:42.680 --> 00:27:43.519
<v Speaker 2>with a single file name.

542
00:27:43.279 --> 00:27:45.519
<v Speaker 1>Like hidden attachments?

543
00:27:45.279 --> 00:27:48.039
<v Speaker 2>Kind of, yeah. The main file data is in one stream,

544
00:27:48.119 --> 00:27:51.400
<v Speaker 2>but you can attach other hidden streams. These aren't always

545
00:27:51.480 --> 00:27:54.720
<v Speaker 2>visible through standard tools like Windows Explorer, so they have

546
00:27:54.799 --> 00:27:58.640
<v Speaker 2>sometimes been used to hide information, maybe malware components or

547
00:27:58.640 --> 00:28:02.400
<v Speaker 2>other data. Investigators definitely need to check for ADS.

548
00:28:02.119 --> 00:28:04.519
<v Speaker 1>So it's like a hidden digital tag or a secret

549
00:28:04.559 --> 00:28:08.480
<v Speaker 1>compartment within a file. Have investigators found surprising ways these

550
00:28:08.519 --> 00:28:11.839
<v Speaker 1>ADS are used, maybe to trace a file's origin?

551
00:28:11.880 --> 00:28:14.559
<v Speaker 2>Absolutely. For example, a common ADS you might encounter, maybe without

552
00:28:14.599 --> 00:28:18.880
<v Speaker 2>realizing it, is Zone.Identifier. Windows often automatically

553
00:28:18.880 --> 00:28:22.240
<v Speaker 2>attaches this stream to files downloaded from the Internet, and

554
00:28:22.279 --> 00:28:25.759
<v Speaker 2>it records information like the original URL it came from.

555
00:28:26.279 --> 00:28:29.480
<v Speaker 2>That can be crucial evidence for tracing malware or suspicious

556
00:28:29.519 --> 00:28:30.880
<v Speaker 2>documents back to their source.
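
NOTE
A small Python sketch of reading the kind of text a Zone.Identifier stream can
hold. The sample content and URLs are made up. The [ZoneTransfer] section with
ZoneId, ReferrerUrl and HostUrl keys is the commonly seen layout, but which
keys are present varies, so treat the field names as assumptions.
```python
import configparser
# Made-up sample of what a Zone.Identifier stream can contain.
sample = """[ZoneTransfer]
ZoneId=3
ReferrerUrl=https://example.com/downloads/
HostUrl=https://example.com/downloads/report.pdf
"""
parser = configparser.ConfigParser(interpolation=None)
parser.optionxform = str  # keep key names case-sensitive
parser.read_string(sample)
zone = parser["ZoneTransfer"]
print(zone["ZoneId"])   # 3 (commonly the Internet zone)
print(zone["HostUrl"])  # https://example.com/downloads/report.pdf
```
On a live Windows system with NTFS, the stream can typically be opened by
appending :Zone.Identifier to the file's path. On a forensic image it would be
extracted with whatever tool is being used to parse the MFT.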

557
00:28:31.279 --> 00:28:33.559
<v Speaker 1>Interesting. What else is key in NTFS?

558
00:28:33.759 --> 00:28:37.880
<v Speaker 2>Well, the heart of NTFS is the Master File Table, or MFT.

559
00:28:38.039 --> 00:28:40.799
<v Speaker 2>Think of it as a highly detailed database or library

560
00:28:40.839 --> 00:28:43.720
<v Speaker 2>catalog for every single file and folder on the volume.

561
00:28:43.880 --> 00:28:45.839
<v Speaker 2>Each file has an entry in the MFT.

562
00:28:45.720 --> 00:28:47.599
<v Speaker 1>And what information is in that entry?

563
00:28:47.599 --> 00:28:52.440
<v Speaker 2>Lots of metadata. It stores file attributes, including things like object IDs.

564
00:28:52.680 --> 00:28:56.759
<v Speaker 2>These are unique identifiers (UUIDs) that can act like digital fingerprints,

565
00:28:57.039 --> 00:28:59.720
<v Speaker 2>potentially linking a file back to the specific computer it

566
00:28:59.759 --> 00:29:02.279
<v Speaker 2>was originally created on, even if the file itself

567
00:29:02.319 --> 00:29:05.400
<v Speaker 2>has been copied around. It includes creation time stamps and

568
00:29:05.480 --> 00:29:07.759
<v Speaker 2>even the MAC address of the network card of

569
00:29:07.799 --> 00:29:11.519
<v Speaker 2>the creating machine sometimes. Wow. And the MFT also holds

570
00:29:11.599 --> 00:29:15.279
<v Speaker 2>security descriptor attributes. These contain information like the owner and

571
00:29:15.319 --> 00:29:20.440
<v Speaker 2>group security identifiers (SIDs) and the access control lists (ACLs),

572
00:29:20.440 --> 00:29:23.160
<v Speaker 2>which define exactly who has permission to read, write, or

573
00:29:23.200 --> 00:29:26.559
<v Speaker 2>execute the file, all crucial details for an investigation.

574
00:29:27.000 --> 00:29:30.079
<v Speaker 1>Okay, moving across to the Linux world, now we have

575
00:29:30.200 --> 00:29:33.039
<v Speaker 1>the ext family of filesystems. How has that evolved over

576
00:29:33.079 --> 00:29:35.799
<v Speaker 1>the years, and what does the latest version, ext4,

577
00:29:36.000 --> 00:29:37.599
<v Speaker 1>offer investigators?

578
00:29:37.960 --> 00:29:41.240
<v Speaker 2>Right, the ext family is native to Linux. It started

579
00:29:41.240 --> 00:29:43.880
<v Speaker 2>with ext2, which was simpler and stored file metadata

580
00:29:43.880 --> 00:29:47.119
<v Speaker 2>in structures called inodes. Each file or directory has an

581
00:29:47.119 --> 00:29:50.480
<v Speaker 2>inode containing information like permissions, time stamps, and pointers to

582
00:29:50.519 --> 00:29:51.799
<v Speaker 2>the actual data blocks.

583
00:29:51.839 --> 00:29:53.799
<v Speaker 1>But ext2 lacked something important, right?

584
00:29:54.039 --> 00:29:57.480
<v Speaker 2>Yes, ext2 lacked journaling, making it vulnerable to corruption

585
00:29:57.519 --> 00:30:00.279
<v Speaker 2>if the system crashed during writes. So ext3 was

586
00:30:00.319 --> 00:30:04.039
<v Speaker 2>developed and its main addition was journaling, providing that resilience

587
00:30:04.039 --> 00:30:08.759
<v Speaker 2>similar to NTFS. ext3 also introduced HTree directory indexing,

588
00:30:08.799 --> 00:30:13.000
<v Speaker 2>which was a significant performance improvement, allowing directories to efficiently

589
00:30:13.039 --> 00:30:16.799
<v Speaker 2>handle millions of files, overcoming a major limitation of

590
00:30:16.799 --> 00:30:18.160
<v Speaker 2>ext2 for large file systems.

591
00:30:18.359 --> 00:30:20.519
<v Speaker 1>And ext4 is the modern standard now, right? What

592
00:30:20.599 --> 00:30:22.839
<v Speaker 1>really makes it stand out? Especially for forensics?

593
00:30:23.039 --> 00:30:26.599
<v Speaker 2>Yes, ext4 is the default for many Linux distributions today

594
00:30:26.839 --> 00:30:30.519
<v Speaker 2>and brought several important innovations. One major

595
00:30:30.599 --> 00:30:34.319
<v Speaker 2>change was the introduction of extents. Instead of using individual

596
00:30:34.359 --> 00:30:37.680
<v Speaker 2>block pointers for large files, extents define a starting

597
00:30:37.720 --> 00:30:40.640
<v Speaker 2>block and a length, making storage much more efficient for

598
00:30:40.759 --> 00:30:43.480
<v Speaker 2>large contiguous files and simplifying recovery.

599
00:30:42.920 --> 00:30:45.039
<v Speaker 1>Any other key features?

600
00:30:44.839 --> 00:30:48.319
<v Speaker 2>Yeah. Another one is inline storage. For very small files,

601
00:30:48.359 --> 00:30:51.079
<v Speaker 2>ext4 can actually store the file's data directly within the

602
00:30:51.079 --> 00:30:54.440
<v Speaker 2>inode structure itself, saving space and eliminating the need for

603
00:30:54.440 --> 00:30:58.279
<v Speaker 2>separate data blocks entirely. Oh, interesting. And crucially for investigators,

604
00:30:58.640 --> 00:31:03.240
<v Speaker 2>ext4 dramatically improved timestamp granularity down to nanosecond precision.

605
00:31:03.839 --> 00:31:06.920
<v Speaker 2>It also officially added support for a file creation timestamp

606
00:31:07.000 --> 00:31:09.759
<v Speaker 2>sometimes called crtime or btime, which was often

607
00:31:09.799 --> 00:31:13.559
<v Speaker 2>missing or unreliable in older ext versions. This allows for

608
00:31:13.640 --> 00:31:17.799
<v Speaker 2>incredibly detailed timelines of filesystem events down to billionths of

609
00:31:17.839 --> 00:31:18.680
<v Speaker 2>a second.
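
NOTE
A hedged Python sketch of decoding a nanosecond-precision ext4 timestamp. It
assumes the commonly documented layout: a 32-bit seconds field plus a 32-bit
"extra" field whose low two bits extend the seconds range and whose high 30
bits hold nanoseconds. The function name and field values are made up, and
pre-1970 (negative) seconds values are not handled.
```python
from datetime import datetime, timezone
def decode_ext4_time(seconds_32: int, extra_32: int):
    """Combine a 32-bit seconds field and its 32-bit 'extra' field."""
    epoch_bits = extra_32 & 0x3   # low 2 bits extend the seconds range
    nanoseconds = extra_32 >> 2   # high 30 bits hold nanoseconds
    seconds = seconds_32 + (epoch_bits << 32)
    return seconds, nanoseconds
#
# Made-up field values for illustration.
secs, nanos = decode_ext4_time(1_700_000_000, 123_456_789 << 2)
stamp = datetime.fromtimestamp(secs, tz=timezone.utc)
print(f"{stamp:%Y-%m-%d %H:%M:%S}.{nanos:09d} UTC")
# 2023-11-14 22:13:20.123456789 UTC
```
Python's datetime only carries microseconds, which is why the nanosecond part
is kept as a separate integer and appended when formatting.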

610
00:31:18.559 --> 00:31:22.079
<v Speaker 1>Nanoseconds again. Okay, finally in our tour, Apple's next

611
00:31:22.119 --> 00:31:27.039
<v Speaker 1>generation system, APFS. How does that compare? Especially given Apple's

612
00:31:27.039 --> 00:31:29.359
<v Speaker 1>big focus on security and user privacy.

613
00:31:29.480 --> 00:31:33.359
<v Speaker 2>APFS (Apple File System) was introduced relatively recently to replace their

614
00:31:33.400 --> 00:31:37.440
<v Speaker 2>older HFS+ filesystem on macOS, iOS, and other Apple devices.

615
00:31:37.640 --> 00:31:41.000
<v Speaker 2>A big architectural change was moving to 64-bit inodes.

616
00:31:41.440 --> 00:31:44.480
<v Speaker 2>This vastly increases the theoretical number of files possible on

617
00:31:44.519 --> 00:31:47.279
<v Speaker 2>a volume compared to HFS+, essential for handling

618
00:31:47.359 --> 00:31:50.480
<v Speaker 2>the massive amounts of data on modern devices.

619
00:31:49.880 --> 00:31:51.200
<v Speaker 1>Makes sense. What about features?

620
00:31:51.400 --> 00:31:54.599
<v Speaker 2>Well, APFS was designed with modern hardware like SSDs in mind.

621
00:31:54.720 --> 00:31:57.720
<v Speaker 2>It features robust built in encryption, which can be applied

622
00:31:57.759 --> 00:32:00.440
<v Speaker 2>at the whole disk level or even per file. This

623
00:32:00.480 --> 00:32:03.359
<v Speaker 2>obviously makes forensic analysis much more challenging if you don't

624
00:32:03.359 --> 00:32:04.680
<v Speaker 2>have the decryption keys.

625
00:32:04.680 --> 00:32:05.759
<v Speaker 1>A big hurdle, definitely.

626
00:32:05.799 --> 00:32:11.359
<v Speaker 2>However, APFS also has really robust snapshot creation capabilities built in.

627
00:32:11.799 --> 00:32:14.160
<v Speaker 2>These are used by Time Machine for backups, but they

628
00:32:14.200 --> 00:32:17.160
<v Speaker 2>also mean that older versions of the filesystem state might

629
00:32:17.240 --> 00:32:21.160
<v Speaker 2>be easily accessible, which can be very useful for forensic analysis,

630
00:32:21.359 --> 00:32:23.920
<v Speaker 2>allowing you to roll back and see previous file versions

631
00:32:23.960 --> 00:32:24.680
<v Speaker 2>or system states.

632
00:32:24.359 --> 00:32:26.359
<v Speaker 1>Interesting. Anything else unique?

633
00:32:26.440 --> 00:32:31.119
<v Speaker 2>One other notable feature is space sharing. In APFS, multiple

634
00:32:31.200 --> 00:32:34.799
<v Speaker 2>logical volumes can exist within a single physical container and

635
00:32:34.839 --> 00:32:38.960
<v Speaker 2>share the underlying free space. This is flexible for users,

636
00:32:39.000 --> 00:32:42.440
<v Speaker 2>but can make tracking exact data allocation and free space

637
00:32:42.480 --> 00:32:46.039
<v Speaker 2>a bit more complex for forensic analysis compared to traditional

638
00:32:46.039 --> 00:32:47.039
<v Speaker 2>fixed partitions.

639
00:32:47.359 --> 00:32:49.799
<v Speaker 1>Wow, it's a lot of complex hidden infrastructure under the hood.

640
00:32:50.240 --> 00:32:52.920
<v Speaker 1>So how do investigators actually get to all this data,

641
00:32:52.960 --> 00:32:57.160
<v Speaker 1>these bits, bytes, inodes, MFT records, without altering the

642
00:32:57.200 --> 00:33:00.440
<v Speaker 1>original evidence or causing more problems like those SSD changes.

643
00:33:01.000 --> 00:33:05.799
<v Speaker 1>What's in the investigator's toolkit for acquiring and analyzing digital evidence?

644
00:33:06.119 --> 00:33:10.039
<v Speaker 2>Right. The first and absolutely critical step is forensically sound acquisition.

645
00:33:10.880 --> 00:33:13.480
<v Speaker 2>The goal here is to create a perfect copy of

646
00:33:13.519 --> 00:33:17.440
<v Speaker 2>the original storage device without changing the original in any way.

647
00:33:17.680 --> 00:33:18.680
<v Speaker 1>How do they guarantee that?

648
00:33:18.960 --> 00:33:22.359
<v Speaker 2>Primarily through the use of write blockers. These can be

649
00:33:22.400 --> 00:33:25.759
<v Speaker 2>specialized hardware devices that sit between the investigator's computer and

650
00:33:25.799 --> 00:33:29.400
<v Speaker 2>the evidence drive, physically preventing any write commands from reaching

651
00:33:29.400 --> 00:33:32.839
<v Speaker 2>the evidence, or they can be software based, like specific

652
00:33:32.920 --> 00:33:35.400
<v Speaker 2>settings in Linux that mount a device in read-only mode

653
00:33:35.440 --> 00:33:38.000
<v Speaker 2>at a very low level. The idea is to put

654
00:33:38.000 --> 00:33:41.880
<v Speaker 2>that fragile artifact, the original drive, in a protective glass case,

655
00:33:41.960 --> 00:33:44.960
<v Speaker 2>metaphorically speaking, before you even begin to study it.

656
00:33:45.039 --> 00:33:48.680
<v Speaker 1>Okay. Essential step and a core Linux command for making

657
00:33:48.680 --> 00:33:52.119
<v Speaker 1>that copy is dd, I hear. That's incredibly powerful, but

658
00:33:52.160 --> 00:33:54.000
<v Speaker 1>maybe a little dangerous if you're not careful.

659
00:33:54.400 --> 00:33:57.920
<v Speaker 2>Yes. The dd command is a classic, very powerful Linux

660
00:33:57.960 --> 00:34:01.440
<v Speaker 2>tool for creating raw images, which are exact bit-by-bit

661
00:34:01.519 --> 00:34:04.799
<v Speaker 2>copies of a device or partition. It's sometimes nicknamed

662
00:34:04.880 --> 00:34:07.359
<v Speaker 2>"data destroyer" if you mix up the input and output.

663
00:34:07.519 --> 00:34:11.159
<v Speaker 2>Ouch. Yeah, you have to be careful, but used correctly,

664
00:34:11.199 --> 00:34:14.960
<v Speaker 2>it copies everything, every sector, including all the unallocated space

665
00:34:15.000 --> 00:34:17.199
<v Speaker 2>and slack space we talked about earlier, which is vital.

666
00:34:17.599 --> 00:34:19.960
<v Speaker 2>You can also use options like count and skip to

667
00:34:20.000 --> 00:34:23.400
<v Speaker 2>copy only specific parts, allowing you to extract exact data

668
00:34:23.400 --> 00:34:25.639
<v Speaker 2>that is desired. For example, you could use it to

669
00:34:25.679 --> 00:34:27.960
<v Speaker 2>copy just the first five hundred and twelve bytes to

670
00:34:28.000 --> 00:34:30.800
<v Speaker 2>get the master boot record, or just the blocks belonging

671
00:34:30.800 --> 00:34:35.320
<v Speaker 2>to a specific partition table. Precise. Very. Although, for efficiency,

672
00:34:35.719 --> 00:34:39.239
<v Speaker 2>error handling and adding metadata about the acquisition process, many

673
00:34:39.280 --> 00:34:43.119
<v Speaker 2>forensic investigators now prefer specialized image formats like the Expert

674
00:34:43.119 --> 00:34:46.599
<v Speaker 2>Witness Format, EWF, often created by tools other than just dd.

675
00:34:47.320 --> 00:34:50.639
<v Speaker 2>These formats can compress data, handle read errors better, and

676
00:34:50.719 --> 00:34:53.239
<v Speaker 2>store case information alongside the image data.
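
NOTE
Editorial sketch, not part of the spoken discussion. The selective copy described above (roughly what `dd if=image.dd of=mbr.bin bs=512 count=1` does) can be illustrated in Python. This is only a sketch under stated assumptions: the helper names `read_sectors` and `has_boot_signature` are hypothetical, the "image" is synthetic in-memory data, and a real acquisition would still go through a write blocker and a proper imaging tool.
```python
import io
SECTOR_SIZE = 512  # classic sector size; the MBR occupies the first sector
def read_sectors(image, skip: int = 0, count: int = 1, bs: int = SECTOR_SIZE) -> bytes:
    """Return `count` blocks of `bs` bytes, starting `skip` blocks into a binary stream."""
    image.seek(skip * bs)
    return image.read(count * bs)
def has_boot_signature(sector: bytes) -> bool:
    """A valid MBR sector ends with the two bytes 0x55 0xAA."""
    return len(sector) == SECTOR_SIZE and sector[510:512] == b"\x55\xaa"
# Demo on a synthetic in-memory "image" (assumption: no real evidence is touched).
fake_image = io.BytesIO(bytes(510) + b"\x55\xaa" + b"\xff" * 1024)
mbr = read_sectors(fake_image, skip=0, count=1)
print(len(mbr), has_boot_signature(mbr))  # prints: 512 True
```
The `skip` and `count` parameters mirror the dd options mentioned above; opening a real image file with `open(path, "rb")` instead of `io.BytesIO` keeps the access read-only at the application level, though that is no substitute for a write blocker.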

677
00:34:53.320 --> 00:34:56.599
<v Speaker 1>Okay, so once they have that pristine, forensically sound image,

678
00:34:56.639 --> 00:34:59.360
<v Speaker 1>that bit for bit copy, how do they start actually

679
00:34:59.360 --> 00:35:02.880
<v Speaker 1>picking it apart, finding those hidden files, deleted remnants, or

680
00:35:02.920 --> 00:35:04.519
<v Speaker 1>specific pieces of metadata.

681
00:35:04.599 --> 00:35:07.559
<v Speaker 2>That's where analysis tools come in. A very widely used suite,

682
00:35:07.679 --> 00:35:10.280
<v Speaker 2>especially in the open source world, is The Sleuth Kit,

683
00:35:10.719 --> 00:35:12.400
<v Speaker 2>often abbreviated as TSK.

684
00:35:12.639 --> 00:35:13.400
<v Speaker 1>Okay, what does that do?

685
00:35:13.679 --> 00:35:16.599
<v Speaker 2>TSK is actually a collection of command line tools designed

686
00:35:16.639 --> 00:35:21.079
<v Speaker 2>specifically for filesystem forensics. For example, a tool called fsstat

687
00:35:21.119 --> 00:35:24.880
<v Speaker 2>can analyze the image and determine the filesystem type NTFS,

688
00:35:25.159 --> 00:35:29.480
<v Speaker 2>ext4, FAT, et cetera, and provide overall information about

689
00:35:29.480 --> 00:35:32.199
<v Speaker 2>the volume, like block size and total blocks. So like

690
00:35:32.239 --> 00:35:35.559
<v Speaker 2>an initial assessment? Exactly. Then there's fls, which is used

691
00:35:35.559 --> 00:35:39.440
<v Speaker 2>to list files and directories within the filesystem image. Crucially,

692
00:35:39.679 --> 00:35:43.159
<v Speaker 2>fls can often list deleted files that still have metadata entries,

693
00:35:43.480 --> 00:35:46.320
<v Speaker 2>and it can even list the low level filesystem structures

694
00:35:46.320 --> 00:35:50.679
<v Speaker 2>themselves, like inodes in ext or CNIDs in HFS+,

695
00:35:50.840 --> 00:35:53.360
<v Speaker 2>say, showing you where things used to be or how they're organized.

696
00:35:53.239 --> 00:35:55.360
<v Speaker 1>And getting the actual data?

697
00:35:55.400 --> 00:35:59.679
<v Speaker 2>For that, you'd use tools like istat to recover specific file metadata: timestamps, permissions,

698
00:35:59.719 --> 00:36:03.119
<v Speaker 2>size, block pointers, by referencing its inode number or

699
00:36:03.239 --> 00:36:06.400
<v Speaker 2>MFT entry ID, and then icat is used to recover

700
00:36:06.440 --> 00:36:09.760
<v Speaker 2>the actual file content itself, essentially concatenating together the data

701
00:36:09.760 --> 00:36:12.320
<v Speaker 2>blocks associated with that specific file or structure.

702
00:36:12.480 --> 00:36:15.119
<v Speaker 1>So TSK can also help put together timelines and maybe

703
00:36:15.159 --> 00:36:17.360
<v Speaker 1>even recover deleted files more automatically.

704
00:36:17.639 --> 00:36:21.360
<v Speaker 2>Indeed, TSK includes tools like mactime, which can parse

705
00:36:21.559 --> 00:36:25.719
<v Speaker 2>various timestamps collected from across the filesystem metadata, like modified,

706
00:36:25.760 --> 00:36:29.400
<v Speaker 2>access, and change times from inodes or MFT entries, and generate

707
00:36:29.480 --> 00:36:33.679
<v Speaker 2>incredibly detailed timelines of activity. There's also blkls for

708
00:36:33.760 --> 00:36:37.039
<v Speaker 2>extracting blocks from the unallocated space, making it easier to

709
00:36:37.119 --> 00:36:41.039
<v Speaker 2>search that space for remnants of deleted data, and tsk_recover

710
00:36:41.039 --> 00:36:43.639
<v Speaker 2>attempts to automate the recovery of deleted files by

711
00:36:43.639 --> 00:36:46.760
<v Speaker 2>finding their orphaned metadata and associated data blocks.
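
NOTE
Editorial sketch, not part of the spoken discussion. The timeline idea described above can be illustrated conceptually in Python: take each file's modified, accessed, and changed times and merge them into one chronologically sorted list of events. This is not how mactime is implemented and does not reproduce its input or output format; the `FileTimes` record, the `build_timeline` helper, and the sample paths and timestamps are all hypothetical.
```python
from datetime import datetime, timezone
from typing import NamedTuple
class FileTimes(NamedTuple):
    path: str
    modified: int  # seconds since the Unix epoch
    accessed: int
    changed: int
def build_timeline(records):
    """Flatten per-file timestamps into (time, kind, path) events, oldest first."""
    events = []
    for rec in records:
        events.append((rec.modified, "m", rec.path))
        events.append((rec.accessed, "a", rec.path))
        events.append((rec.changed, "c", rec.path))
    return sorted(events)
# Made-up sample metadata, standing in for values read from inodes or MFT entries.
records = [
    FileTimes("/home/user/report.doc", 1700000300, 1700000900, 1700000300),
    FileTimes("/tmp/tool.bin", 1700000100, 1700000100, 1700000600),
]
timeline = build_timeline(records)
for when, kind, path in timeline:
    print(datetime.fromtimestamp(when, tz=timezone.utc).isoformat(), kind, path)
```
Sorting on the timestamp first is what interleaves activity from different files into a single sequence, which is the property that makes a timeline useful to an investigator.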

712
00:36:47.000 --> 00:36:49.719
<v Speaker 1>What if the metadata is completely gone, though, the filesystem

713
00:36:49.800 --> 00:36:53.239
<v Speaker 1>structures are corrupted, but maybe the raw data for say

714
00:36:53.360 --> 00:36:56.880
<v Speaker 1>a picture, is still sitting out there in unallocated space?

715
00:36:56.920 --> 00:36:58.920
<v Speaker 2>Ah, that's where a technique called data carving comes in.

716
00:36:58.960 --> 00:37:02.119
<v Speaker 1>That sounds like digital archaeology, like trying to reconstruct

717
00:37:02.119 --> 00:37:04.320
<v Speaker 1>a shattered vase from its unique edge patterns.

718
00:37:04.400 --> 00:37:07.519
<v Speaker 2>That's a great analogy. Data carving works by bypassing the

719
00:37:07.519 --> 00:37:11.760
<v Speaker 2>file system entirely and instead searching for known file signatures

720
00:37:11.880 --> 00:37:15.119
<v Speaker 2>directly within the raw data stream of the image. These

721
00:37:15.119 --> 00:37:18.079
<v Speaker 2>signatures are unique sequences of bytes that typically mark the

722
00:37:18.079 --> 00:37:22.119
<v Speaker 2>beginning header and sometimes the end footer of specific file types.

723
00:37:22.519 --> 00:37:25.719
<v Speaker 2>For example, JPEG image files almost always start with the

724
00:37:25.760 --> 00:37:28.639
<v Speaker 2>hex bytes 0xFFD8 and end with

725
00:37:28.880 --> 00:37:30.599
<v Speaker 2>0xFFD9.

726
00:37:30.599 --> 00:37:32.559
<v Speaker 1>So you just scan the whole image for those patterns?

727
00:37:32.639 --> 00:37:36.639
<v Speaker 2>Essentially, yes. Carving tools scan the raw data looking for

728
00:37:36.679 --> 00:37:39.360
<v Speaker 2>these known headers and footers, and then extract the data

729
00:37:39.400 --> 00:37:41.800
<v Speaker 2>in between as a potential file. It's like looking for

730
00:37:41.840 --> 00:37:44.960
<v Speaker 2>specific patterns of color and texture to identify pieces of

731
00:37:44.960 --> 00:37:47.719
<v Speaker 2>that shattered vase. However, it's important to know that data

732
00:37:47.719 --> 00:37:51.199
<v Speaker 2>carving is not fully reliable. Files can be fragmented, meaning

733
00:37:51.239 --> 00:37:53.960
<v Speaker 2>the header and footer might be separated by unrelated data,

734
00:37:54.239 --> 00:37:57.039
<v Speaker 2>or the footer might be missing entirely. Also, those byte

735
00:37:57.039 --> 00:38:00.559
<v Speaker 2>patterns might occasionally appear randomly within other unrelated data, leading

736
00:38:00.559 --> 00:38:04.360
<v Speaker 2>to false positives. So carved files always need careful validation.
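
NOTE
Editorial sketch, not part of the spoken discussion. The header and footer scan described above can be illustrated with a few lines of Python. This is a deliberately naive sketch, not how any particular carving tool works: the function name `carve_jpegs` and the `max_size` cap are hypothetical, it assumes each file is stored contiguously, and it will produce exactly the false positives and fragmentation failures mentioned above.
```python
JPEG_HEADER = b"\xff\xd8"  # start-of-image marker, as quoted in the discussion
JPEG_FOOTER = b"\xff\xd9"  # end-of-image marker
def carve_jpegs(raw: bytes, max_size: int = 10 * 1024 * 1024):
    """Return (offset, data) pairs for header-to-footer spans found in raw bytes."""
    carved = []
    start = raw.find(JPEG_HEADER)
    while start != -1:
        end = raw.find(JPEG_FOOTER, start + len(JPEG_HEADER))
        if end == -1:
            break  # no footer anywhere after this header
        end += len(JPEG_FOOTER)
        if end - start <= max_size:
            carved.append((start, raw[start:end]))
            start = raw.find(JPEG_HEADER, end)
        else:
            # Span is implausibly large; try the next header instead.
            start = raw.find(JPEG_HEADER, start + len(JPEG_HEADER))
    return carved
# Synthetic "unallocated space": one complete candidate, then a header with no footer.
blob = b"\x00" * 16 + b"\xff\xd8" + b"fake image body" + b"\xff\xd9" + b"\x00" * 8 + b"\xff\xd8" + b"no footer here"
results = carve_jpegs(blob)
print([(offset, len(data)) for offset, data in results])  # prints: [(16, 19)]
```
Each carved candidate would still need validation, for example by trying to decode it as an image, because a two-byte pattern can easily occur by chance inside unrelated data.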

737
00:38:04.719 --> 00:38:08.440
<v Speaker 1>Makes sense. Okay, looking ahead now, with all this technology

738
00:38:08.519 --> 00:38:14.400
<v Speaker 1>evolving so incredibly rapidly, SSDs, new file systems, encryption, what

739
00:38:14.480 --> 00:38:16.639
<v Speaker 1>are some of the biggest things on the horizon? Future

740
00:38:16.679 --> 00:38:19.760
<v Speaker 1>challenges in digital forensics: what keeps investigators up at night?

741
00:38:20.199 --> 00:38:22.960
<v Speaker 2>Well, probably the single biggest challenge is simply the data

742
00:38:23.000 --> 00:38:26.159
<v Speaker 2>volume problem. We've seen this vast increase in the quantity

743
00:38:26.159 --> 00:38:30.400
<v Speaker 2>of digital evidence: phones with terabytes of storage, cloud accounts,

744
00:38:30.519 --> 00:38:35.159
<v Speaker 2>IoT devices. It's everywhere. Right. This creates a kind of

745
00:38:35.239 --> 00:38:39.360
<v Speaker 2>vicious cycle because the resources available for analysis, skilled human analysts,

746
00:38:39.400 --> 00:38:42.840
<v Speaker 2>processing power, storage for forensic images, haven't kept pace.

747
00:38:43.199 --> 00:38:46.880
<v Speaker 2>More data but limited resources to handle it efficiently means

748
00:38:46.920 --> 00:38:49.559
<v Speaker 2>backlogs grow and investigations can slow down.

749
00:38:49.599 --> 00:38:51.920
<v Speaker 1>And new file systems keep popping up too, right? Yeah,

750
00:38:51.960 --> 00:38:53.519
<v Speaker 1>constantly changing the rules of the game.

751
00:38:53.719 --> 00:38:57.880
<v Speaker 2>Absolutely. New file systems like APFS when it first appeared,

752
00:38:58.119 --> 00:39:02.199
<v Speaker 2>and even ext4 historically, often require significant reverse

753
00:39:02.239 --> 00:39:06.920
<v Speaker 2>engineering by forensic researchers, especially if official documentation is scarce

754
00:39:07.000 --> 00:39:09.719
<v Speaker 2>or nonexistent. This takes a huge amount of time

755
00:39:09.800 --> 00:39:12.960
<v Speaker 2>and specialized skill, and there's always a risk of misinterpretation,

756
00:39:13.400 --> 00:39:16.199
<v Speaker 2>which has obvious implications if the findings are presented in court.

757
00:39:16.320 --> 00:39:19.559
<v Speaker 1>And you mentioned live data forensics earlier, LDF. That comes

758
00:39:19.599 --> 00:39:21.480
<v Speaker 1>with its own set of ongoing challenges too.

759
00:39:21.679 --> 00:39:25.920
<v Speaker 2>Yes, definitely. Live data forensics is a constant topic.

760
00:39:26.079 --> 00:39:29.000
<v Speaker 2>We already talked about how it inherently breaks ACPO Principle

761
00:39:29.039 --> 00:39:32.199
<v Speaker 2>one by altering data, making it impossible to realize that

762
00:39:32.239 --> 00:39:34.079
<v Speaker 2>principle fully. But there are other risks too.

763
00:39:34.000 --> 00:39:38.400
<v Speaker 1>Such as, like, the system crashing maybe, or

764
00:39:38.440 --> 00:39:39.800
<v Speaker 1>that crucial audit trail we talked about?

765
00:39:39.599 --> 00:39:43.360
<v Speaker 2>Exactly. System crashes are a real possibility,

766
00:39:43.440 --> 00:39:46.880
<v Speaker 2>especially when performing intrusive actions like acquiring the contents of

767
00:39:46.960 --> 00:39:51.079
<v Speaker 2>RAM directly, which can sometimes destabilize a running system, and

768
00:39:51.199 --> 00:39:56.639
<v Speaker 2>maybe more fundamentally, LDF is inherently non-repeatable. Unlike dead-box forensics,

769
00:39:56.639 --> 00:39:59.559
<v Speaker 2>where you can rerun analysis commands on a static image

770
00:39:59.599 --> 00:40:03.159
<v Speaker 2>multiple times and get the same result, LDF changes the

771
00:40:03.199 --> 00:40:07.079
<v Speaker 2>live system with every action. You can't perfectly recreate the

772
00:40:07.119 --> 00:40:08.280
<v Speaker 2>state later.

773
00:40:08.159 --> 00:40:10.199
<v Speaker 1>So the documentation becomes even more critical.

774
00:40:10.360 --> 00:40:14.960
<v Speaker 2>Absolutely paramount. That documentation, the ACPO audit trail, becomes the

775
00:40:15.000 --> 00:40:18.119
<v Speaker 2>only way to verify the process, to show what was done,

776
00:40:18.199 --> 00:40:21.000
<v Speaker 2>when and why, because you can't simply rerun the experiment.

777
00:40:21.119 --> 00:40:23.440
<v Speaker 1>Okay, And then there's the elephant in the room for

778
00:40:23.519 --> 00:40:28.920
<v Speaker 1>any digital investigation today: encryption. It's fantastic for user privacy, obviously,

779
00:40:29.360 --> 00:40:32.079
<v Speaker 1>but it can be a massive hurdle, sometimes a complete

780
00:40:32.119 --> 00:40:33.840
<v Speaker 1>dead end for an investigation.

781
00:40:34.239 --> 00:40:37.480
<v Speaker 2>Encryption really is the classic double edged sword. It protects

782
00:40:37.519 --> 00:40:41.119
<v Speaker 2>legitimate users' privacy and security, which is essential, but it

783
00:40:41.199 --> 00:40:44.719
<v Speaker 2>equally protects criminal communications and data from investigators if they

784
00:40:44.719 --> 00:40:45.519
<v Speaker 2>can't get the keys.

785
00:40:46.000 --> 00:40:47.559
<v Speaker 1>Are there any proposed solutions?

786
00:40:47.719 --> 00:40:51.760
<v Speaker 2>Well, ideas like key escrow systems have been proposed, where

787
00:40:52.000 --> 00:40:55.039
<v Speaker 2>decryption keys would perhaps be held by a trusted third

788
00:40:55.079 --> 00:40:59.760
<v Speaker 2>party under specific legal conditions, but these face huge technical

789
00:40:59.800 --> 00:41:03.519
<v Speaker 2>and ethical challenges. For one, criminals could simply choose to

790
00:41:03.639 --> 00:41:06.400
<v Speaker 2>use other encryption software or methods that aren't part of

791
00:41:06.440 --> 00:41:10.920
<v Speaker 2>the escrow system. And second, there are massive privacy concerns about

792
00:41:10.920 --> 00:41:15.159
<v Speaker 2>governments or other entities having potential access to everyone's encrypted data.

793
00:41:15.199 --> 00:41:16.599
<v Speaker 2>It's a really difficult balance.

794
00:41:16.960 --> 00:41:18.639
<v Speaker 1>So we're kind of back to square one on that.

795
00:41:18.840 --> 00:41:21.159
<v Speaker 2>In many ways, yes. It remains a

796
00:41:21.159 --> 00:41:23.159
<v Speaker 2>major challenge. And finally, sort of tying a lot of

797
00:41:23.159 --> 00:41:27.320
<v Speaker 2>this together, there's a significant ongoing need for better standardization

798
00:41:27.400 --> 00:41:31.360
<v Speaker 2>and tool testing in the field, meaning developing standardized corpora,

799
00:41:31.400 --> 00:41:35.360
<v Speaker 2>basically large, well-defined data sets of known digital evidence

800
00:41:35.360 --> 00:41:39.159
<v Speaker 2>containing specific artifacts. These could then be used to rigorously

801
00:41:39.239 --> 00:41:43.000
<v Speaker 2>test and validate the accuracy and reliability of different forensic

802
00:41:43.039 --> 00:41:47.119
<v Speaker 2>tools and techniques. Having such standard test sets would greatly

803
00:41:47.159 --> 00:41:50.079
<v Speaker 2>increase their acceptance in the courts and build more confidence

804
00:41:50.079 --> 00:41:53.760
<v Speaker 2>in forensic findings overall because results could be consistently replicated

805
00:41:53.800 --> 00:41:56.320
<v Speaker 2>and verified across different tools and labs.

806
00:41:56.519 --> 00:41:59.480
<v Speaker 1>Wow, what a journey that was. We've really delved deep

807
00:41:59.519 --> 00:42:01.840
<v Speaker 1>into the hidden structures of file systems, haven't we?

808
00:42:02.039 --> 00:42:05.000
<v Speaker 1>From the absolute basics of bits and bytes, through the

809
00:42:05.960 --> 00:42:09.639
<v Speaker 1>intricate designs of different operating system formats like NTFS and

810
00:42:09.760 --> 00:42:12.719
<v Speaker 1>ext4 and APFS, and then explored the cutting-edge

811
00:42:12.760 --> 00:42:16.000
<v Speaker 1>techniques investigators use to try and uncover digital truths from

812
00:42:16.039 --> 00:42:18.440
<v Speaker 1>all that complexity. It's a world that is just constantly,

813
00:42:18.480 --> 00:42:19.440
<v Speaker 1>constantly changing.

814
00:42:19.639 --> 00:42:22.960
<v Speaker 2>It really is. The evolution is relentless. Every new app,

815
00:42:23.039 --> 00:42:26.320
<v Speaker 2>every new device, every new way we interact digitally, it

816
00:42:26.360 --> 00:42:30.559
<v Speaker 2>creates new challenges, but also potentially new opportunities for digital forensics.

817
00:42:30.639 --> 00:42:33.719
<v Speaker 2>And as that digital world keeps expanding, so does this

818
00:42:33.800 --> 00:42:38.079
<v Speaker 2>invisible landscape of information hidden beneath our everyday interactions. It's

819
00:42:38.199 --> 00:42:41.800
<v Speaker 2>honestly a constant race just to keep up, to understand

820
00:42:41.840 --> 00:42:44.039
<v Speaker 2>the new hidden languages that are always emerging.

821
00:42:44.679 --> 00:42:47.159
<v Speaker 1>So here's something to think about as we wrap up.

822
00:42:47.679 --> 00:42:51.000
<v Speaker 1>Given how deeply digital evidence now intertwines with almost every

823
00:42:51.039 --> 00:42:54.920
<v Speaker 1>aspect of our lives, how might our growing understanding of

824
00:42:54.960 --> 00:42:57.639
<v Speaker 1>these invisible file systems, the stuff we talked about today,

825
00:42:58.039 --> 00:43:00.599
<v Speaker 1>how might that reshape the very definition of truth in

826
00:43:00.639 --> 00:43:05.400
<v Speaker 1>a modern investigation? And maybe more intriguingly, what new hidden languages,

827
00:43:05.440 --> 00:43:07.840
<v Speaker 1>what new forms of digital evidence might emerge next that

828
00:43:07.920 --> 00:43:11.239
<v Speaker 1>will challenge even our current forensic capabilities? Think about that

829
00:43:11.280 --> 00:43:13.719
<v Speaker 1>for a moment. This knowledge isn't just for the tech

830
00:43:13.760 --> 00:43:17.400
<v Speaker 1>experts or the investigators. It's really about understanding the very

831
00:43:17.400 --> 00:43:20.800
<v Speaker 1>fabric of our digital existence and how those unseen layers

832
00:43:20.800 --> 00:43:24.360
<v Speaker 1>can reveal, or sometimes conceal, the most profound stories of

833
00:43:24.400 --> 00:43:24.840
<v Speaker 1>our time.
