Imagine this. You're watching a crime drama, and the detective is dusting for fingerprints. Classic stuff.
Yeah, you see it all the time, but honestly...
In the real world today, that's almost quaint, like a rotary phone. Today the fingerprints are digital, and they're...
Everywhere. Absolutely everywhere. We're talking about the smartphone in your pocket, the sat nav in your car, even your home CCTV that's running twenty-four seven. These digital traces used to matter only for specific cybercrimes, but now they turn up in almost every case. It's not just evidence anymore; it's an explosion of it.
It truly is. The sheer amount and variety of digital devices means pretty much any incident, from a minor theft all the way up to a major criminal investigation, leaves a digital footprint, something that wasn't even conceivable a few decades back.
Exactly. And for you listening, this deep dive is your shortcut to getting properly informed about this hidden world. We're not just going to scratch the surface of what information is found...
No, definitely not.
Our mission here is to pull back the curtain on how it's stored, how it gets retrieved, and, maybe most importantly, why understanding those hidden mechanics fundamentally reshapes an investigation, and maybe even our idea of digital truth.
Yeah. Our deep dive today is built from sources that focus on file system forensics. So we're going to give you a detailed look at how data is organized right down at its most fundamental level, the essential tools investigators use to uncover it, and the fascinating, sometimes pretty complex challenges that lie ahead in a field that's evolving so fast.
Okay, so we've painted the picture: digital evidence exploding everywhere. But here's where it starts to get tricky. The very basic rules for collecting this evidence have been shifting under our feet. Let's dig into what we're calling the shifting sands of digital evidence, because what worked maybe ten years ago could actually compromise an entire case now.
That's right. For years, the standard advice for seizing a computer in an investigation was dead simple: pull the plug. Cut the power immediately.
Yep, that was the gold standard. But today, doing that can be a huge mistake.
Why is that old rule suddenly so, well, dangerous?
What's fascinating, I think, is that modern operating systems and hardware have introduced features that directly conflict with that old advice. Think of it like this: with encrypted storage, the decryption keys often live only in the computer's volatile memory, its RAM, basically its short-term digital brain.
Okay.
So you pull the plug, those keys vanish instantly, and all that data becomes a locked box, permanently inaccessible. It's like trying to open a safe after the combination has been wiped from the manager's memory.
Right, gone forever. And people aren't just saving things locally anymore, are they? They're using remote storage, cloud services, email, social...
Media, constantly connected.
So if you yank the power, all those live connections, all that access to potentially crucial data, it's instantly...
Lost, exactly. And this is why live data forensics, or LDF, has become so critical. It lets analysts capture live data from a running computer system before it disappears.
But hang on, that sounds like it completely contradicts one of the most fundamental principles of forensics, doesn't it?
It absolutely does, and this raises a really critical point.
The first principle in digital forensics, often called ACPO principle one, states that no action taken by law enforcement agencies should change data which may subsequently be relied upon in court.
Okay.
LDF, well, inherently breaks that principle. Even just moving a mouse on a running system leaves digital traces. So the tension there is very real.
So if you're altering the data, how on earth can it be admissible in court? Doesn't that open the evidence up to an immediate challenge: "You changed it"?
Well, not necessarily. The principles themselves are still fit for purpose; they just adapt. So while LDF does alter data, its results can be admissible in court when combined with ACPO principle two, which emphasizes investigator competence, and principle three, which demands a really thorough audit trail.
Ah, okay. The paperwork.
Essentially, yeah. Every single step taken, every command run, every change made has to be meticulously documented. It's that meticulous record keeping that allows the court to understand, and hopefully trust, the context of the altered data.
The integrity of the whole investigation now relies on understanding that fundamental shift in collection and then rigorously documenting every single step.
Right. Speaking of integrity, let's talk about an unsung hero of digital forensics: Linux. It's often called an open source powerhouse in this field. But what does open source actually mean? Is it just about software you don't have to pay for?
That's a common misconception. To really get open source, it helps to contrast it with closed source software. So imagine I write a simple Hello World program in C.
Right, okay.
With closed source, I'd give you the compiled executable file. You can run it, sure, but you can't see the underlying code, and you can't change it. It's like a mystery box that just does its thing.
Right. You just trust that it works.
Exactly. With open source, though, I give you the actual C program file, the source code. You can read it, understand exactly what it does, and even modify it if you've got the skills. Richard Stallman, the founder of the Free Software Foundation, famously said that the "free" in free software means free as in free speech, not free as in free beer.
Ah. That's a great distinction.
It is. So while it's often free of cost because of the licensing, the core idea is really the freedom to examine, use, and modify the code.
That distinction is key. So why is this open source model, particularly Linux, such an advantage for forensics? It seems a bit counterintuitive that something anyone can tinker with would be more trustworthy, maybe.
Well, it's precisely because anyone can examine it, or even modify it, that it's often seen as more trustworthy. There are a few big reasons. First, there's community power. Open source projects often have huge communities of users, developers, and testers all working together. That collective effort often means new features arrive faster and, crucially, issues get resolved much more quickly than with small proprietary teams.
So it's not just about speed, then? Does that community scrutiny also directly help with the trustworthiness and accuracy of the tools? That must be absolutely paramount when lives potentially depend on them.
That's exactly it. More eyes on the code, more brains solving problems. That leads directly to greater trust and correctness.
With closed source software, you're essentially relying on the developers having got everything right, and we've all seen the infamous blue screen of death in Windows, for example. With open source, the community can review and fix the code at any point. That provides a lot more confidence in a tool's accuracy, which, as you say, is vital when people's lives might depend on an investigation's outcome.
Makes sense.
And yes, cost effectiveness is a major advantage too. Because copyleft licenses let anyone who receives the software pass it on, source code included, it's quite difficult to sell open source software directly, so it's often free of cost. Companies can still offer services like training or customization around these products, but the core software itself is usually freely available. And lastly, specifically for forensics, Linux supports many file systems by default, often far more than Windows or macOS support natively, which makes it an ideal forensic workstation right out of the box.
So when we talk about Linux as an operating system, what are its main parts? You hear about the kernel, but what else makes it a functioning OS?
Yeah, good question.
At its very heart is the kernel, created by Linus Torvalds. That's the bit that directly controls the hardware and manages the software. Layered on top of that are the GNU utilities, standard programs that let users manage files, run programs, that sort of thing. It's really the combination of the Linux kernel and these GNU utilities that forms the functional operating system we commonly just call Linux. Beyond that core, you've got graphical desktop environments, the visual interface most people see, and of course all the application software end users are most familiar with, including the powerful forensic tools we've been mentioning.
Okay, let's get practical, then. What are some really fundamental forensic commands in Linux that investigators use day to day? These must be the digital equivalent of a magnifying glass and dusting powder.
Definitely. One of the most fundamental is hashing. It's absolutely crucial for ensuring data integrity.
Okay, how does that work?
Well, hashing algorithms like, say, MD5 or the SHA family create an effectively unique digital fingerprint for any piece of data. If even a single bit in a file changes, its hash value changes dramatically.
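The fingerprint property described here can be tried out with Python's standard `hashlib` module. This is a minimal sketch, not a forensic tool: the sample evidence bytes and the helper names `digests` and `file_digest` are hypothetical. It computes MD5, SHA-256, and SHA-512 digests, flips a single bit, and then verifies a file by recomputing its hash and comparing it with the value the sender supplied.

```python
import hashlib
import os
import tempfile


def digests(data: bytes) -> dict[str, str]:
    """Return hex digests of `data` for several algorithms (hypothetical helper)."""
    return {name: hashlib.new(name, data).hexdigest() for name in ("md5", "sha256", "sha512")}


def file_digest(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks (1 MiB by default) so large images need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# Hypothetical evidence bytes, plus a copy with one bit flipped ("E" 0x45 becomes "D" 0x44).
original = b"Exhibit 42: meeting moved to Thursday"
tampered = bytes([original[0] ^ 0x01]) + original[1:]

before, after = digests(original), digests(tampered)
for name in before:
    # Every digest changes completely, not just in one position.
    print(f"{name:6} {before[name][:16]}...  vs  {after[name][:16]}...")

# Verifying a received file: recompute its hash and compare with the sender's value.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(original)
sender_sha256 = hashlib.sha256(original).hexdigest()
print("file matches sender's hash:", file_digest(tmp.name) == sender_sha256)
os.remove(tmp.name)
```

Reading in chunks is a design choice for large disk images; the result is identical to hashing the whole file in one call.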
It's like changing just one letter in the entire collected works of Shakespeare: the hash would change completely, instantly revealing even the tiniest alteration.
Wow. So if someone sends you a file along with its hash, you can calculate the hash yourself and instantly verify it hasn't been tampered with since they calculated theirs. That's powerful.
It is very powerful. Now, some smaller hashes, like CRC32, can experience hash collisions, which is where different inputs accidentally produce the same hash. But using larger outputs like SHA-512, or using several different algorithms together, reduces that probability to almost zero.
Got it. What else?
Another really useful tool is a hex viewer like xxd. It lets analysts examine raw binary data byte by byte, things like a disk's partition table, for example. It's like looking at the purest form of the computer's language, the ones and zeros represented compactly. Reading a raw disk like that often requires root access, though, using the sudo command.
Okay, and then there's strings. I've heard that's a really powerful one for investigators. What makes it so special?
It really is. The strings command is deceptively simple but incredibly useful.
It displays every run of printable ASCII characters it finds in any file, even a binary file. So even...
In, like, an image file or a program?
Exactly. Even if a file is an image or an executable program, strings can pull out any plain text that happens to be embedded within it. And when you combine it with egrep for text searching, it becomes a very powerful forensic tool for quickly finding keywords or phrases in potentially massive amounts of raw data.
That sounds incredibly useful.
And the -t option is particularly useful. It displays the byte offset, basically the address where the text was found, which lets an investigator navigate directly to that specific spot in the file or disk image using other tools.
So it's not just about finding a keyword, it's about the context, seeing what's around it. I imagine that's crucial when you're dealing with huge amounts of data where a word might appear harmlessly in one place but sinisterly in another, all hidden away in binary code.
Precisely. It's not just finding the needle in the haystack; it's like finding a specific pollen grain on that needle, along with its precise digital address. It's truly granular work.
That's incredible.
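A rough Python approximation of the workflow just described: extract printable ASCII runs with their byte offsets (similar in spirit to `strings -t d`), filter them with a regular expression (the job egrep does), then print an xxd-like hex view around a hit. The four-character minimum mirrors the GNU strings default; the sample blob and the function names `find_strings` and `hexdump` are hypothetical, and this is a sketch rather than a replacement for the real tools.

```python
import re


def find_strings(blob: bytes, min_len: int = 4):
    """Yield (byte_offset, text) for each run of >= min_len printable ASCII bytes."""
    # 0x20-0x7E is the printable ASCII range.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(blob):
        yield match.start(), match.group().decode("ascii")


def hexdump(blob: bytes, start: int, length: int = 32, width: int = 16) -> str:
    """Render blob[start:start+length] as offset, hex, and ASCII columns (xxd-like)."""
    end = min(start + length, len(blob))
    rows = []
    for row in range(start, end, width):
        chunk = blob[row:min(row + width, end)]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        text = "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in chunk)
        rows.append(f"{row:08x}: {hex_part:<{width * 3 - 1}}  {text}")
    return "\n".join(rows)


# Hypothetical blob: binary noise with a little text embedded in it.
blob = b"\x00\x01\xffJFIF\x00\x10" + b"\x8a" * 20 + b"meet at the docks 23:00\x00\x00\xde\xad\xbe\xef"

keyword = re.compile(r"docks|bank", re.IGNORECASE)  # the role egrep plays
for offset, text in find_strings(blob):
    if keyword.search(text):
        print(f"{offset:>8} {text}")              # decimal offset, like strings -t d
        print(hexdump(blob, max(offset - 8, 0)))  # context around the hit
```

On a real system the rough equivalent would be `strings -t d image.dd | egrep -i 'docks|bank'` followed by `xxd -s OFFSET -l 32 image.dd`, where `image.dd` and `OFFSET` are placeholders.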
Okay, that's a great segue into understanding the hidden language: how computers speak in ones and zeros. At the end of the day, it's all just zeros and ones, right? But how do they turn that into something we actually understand, like text or numbers?
It is all zeros and ones, but how those zeros and ones are interpreted is the key. Computers, as you say, use the binary number system, base two. We humans generally use decimal, base ten. But in computing you'll very often encounter hexadecimal, or hex, which is base sixteen. It uses the digits zero through nine and then the letters A through F to represent the values ten to fifteen. Each hex digit stands for exactly four binary digits, so hexadecimal is simply a much more compact way to represent binary data for human eyes.
Okay, so it's like a shorthand.
Exactly. For example, the binary number 1011, that's one, zero, one, one, is equivalent to eleven in our decimal system. In hex it would be B. Hex just makes long strings of binary data much easier to read.
Okay, that makes sense. Then there's text. My computer understands what I type, but how does it turn the letter A into numbers and then back again?
Ah, that's where character encodings come in.
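The binary, decimal, and hex example above maps directly onto Python's built-in conversions. The helper `bin_to_hex_by_nibbles` is hypothetical and exists only to show why hex is compact: every group of four bits becomes exactly one hex digit.

```python
# The example from the conversation: binary 1011 is decimal 11 and hex B.
value = int("1011", 2)
print(value)               # 11
print(hex(value))          # 0xb
print(format(value, "X"))  # B

# Each hex digit stands for exactly four bits, so one byte is always two hex digits.
byte = 0b1010_1111
print(f"{byte:08b} = {byte:02X} = {byte}")  # 10101111 = AF = 175


def bin_to_hex_by_nibbles(bits: str) -> str:
    """Convert a binary string to hex by reading it four bits (one nibble) at a time."""
    bits = bits.zfill((len(bits) + 3) // 4 * 4)  # left-pad to a multiple of 4 bits
    return "".join("0123456789ABCDEF"[int(bits[i:i + 4], 2)] for i in range(0, len(bits), 4))


print(bin_to_hex_by_nibbles("1011"))       # B
print(bin_to_hex_by_nibbles("100000000"))  # 100
```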
They basically assign a unique numerical code to each character: letters, numbers, symbols, everything.
Like a secret codebook.
Kind of. Older encodings like ASCII and ISO 8859 were quite limited. They worked well for English but struggled with special characters, like the accented letters in Spanish or other European languages. ASCII has only 128 codes, and each ISO 8859 variant covers just one group of languages, so there simply weren't enough codes for every symbol used across...
The world. Which is where Unicode and UTF-8 step in, I guess?
Correct. Unicode is a huge standard that supports a vast range of characters from pretty much all the world's writing systems, virtually every character you could imagine. UTF-8 is a specific encoding method for Unicode. It's a variable-width encoding, which cleverly avoids the storage inefficiency you'd get if every single character took up, say, four bytes. UTF-8 uses anywhere from one to four bytes per character.
So it adapts.
Exactly. That's made it the de facto standard for web page encoding and much more.
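A quick way to see the variable-width behavior is to encode a few characters and count the bytes. This sketch uses Python's built-in codecs, with ISO 8859-1 (Python's `latin-1` codec) standing in for the older single-byte encodings; the sample characters and the helper `fits` are illustrative choices.

```python
samples = ["A", "é", "ñ", "€", "中", "😀"]

for ch in samples:
    encoded = ch.encode("utf-8")
    # Code point, number of UTF-8 bytes, and the bytes themselves in hex.
    print(f"U+{ord(ch):04X}: {len(encoded)} byte(s) -> {encoded.hex(' ')}")


def fits(text: str, encoding: str) -> bool:
    """Return True if every character of `text` has a code in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Plain ASCII text is byte-for-byte identical in UTF-8.
print("ABC123".encode("utf-8") == "ABC123".encode("ascii"))  # True
# The older single-byte codebooks run out of room quickly.
print(fits("é", "ascii"), fits("é", "latin-1"), fits("€", "latin-1"))  # False True False
```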
What's really clever about UTF-8 is that standard English ASCII characters, A, B, C, 1, 2, 3, are represented identically to their original ASCII form, taking up just one byte, while more complex characters, like emojis or characters from other alphabets, take two, three, or four bytes. So it's very efficient for common text but flexible enough for global communication.
That's smart. And what about time? How do computers keep track of it down to the millisecond, or even the nanosecond? That must be crucial for an investigation's timeline.
Time representation in computing is another fascinating area. Many systems, especially Unix-like systems such as Linux and macOS, use what's called Unix time.
Right, I've heard of that.
It's measured as the number of seconds that have elapsed since midnight UTC on January 1st, 1970. That specific moment is known as the epoch.
Okay, so just a big counter of seconds. But what if two things happen really fast, within the same second? Would they have the exact same timestamp? That could be a real problem for investigators trying to figure out the exact order of events, especially if automated processes are involved.
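Unix time is simple enough to convert with Python's standard `datetime` module. The sketch below turns epoch seconds into UTC dates and then illustrates the granularity problem just raised: two hypothetical events 300 microseconds apart become indistinguishable once they are truncated to whole seconds.

```python
from datetime import datetime, timezone


def from_unix(seconds: float) -> datetime:
    """Convert seconds since the epoch (1970-01-01 00:00:00 UTC) to an aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


print(from_unix(0))              # 1970-01-01 00:00:00+00:00, the epoch itself
print(from_unix(1_000_000_000))  # 2001-09-09 01:46:40+00:00

# Two hypothetical events 300 microseconds apart, in nanoseconds since the epoch.
event_a_ns = 1_700_000_000_123_456_789
event_b_ns = 1_700_000_000_123_756_789

# A file system that keeps only whole seconds stores the same value for both.
same_second = event_a_ns // 1_000_000_000 == event_b_ns // 1_000_000_000
print("indistinguishable at one-second granularity:", same_second)  # True
```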
It absolutely could be, and it was a limitation. It's maybe not an issue for things happening at human speed, but automated processes can access or modify many files within a single second. That meant older file systems, like, say, ext2, which only had one-second granularity, couldn't always say definitively which event happened first if two occurred in the same second.
So how did they fix that?
Well, most modern implementations of Unix time, like the one in the ext4 file system, now include a nanosecond subcomponent.
Nanoseconds?
Yeah, billionths of a second. That significantly improves the granularity, allowing incredibly precise ordering of events and file creation timestamps, which can be absolutely crucial for building an accurate forensic timeline.
That's a huge leap in precision. Okay, one more weird term before we move on: endianness. That sounds like something out of Gulliver's Travels or, I don't know, a really obscure technical debate. What's that about?
Ha. It does sound a bit strange, doesn't it? But it's actually crucial for correctly interpreting raw hex data off a disk. Endianness is just about the order in which computers store or read the bytes of multi-byte numbers.
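For the ext4 nanosecond subcomponent mentioned above, the Linux kernel's ext4 documentation describes each timestamp as the classic 32-bit seconds field plus a 32-bit "extra" field (present when the inode is large enough): the low 2 bits extend the seconds range and the upper 30 bits hold nanoseconds, both stored little-endian. The decoder below is a minimal sketch of that layout using made-up values; it does not read a real inode, and the function name is hypothetical.

```python
import struct

EPOCH_MASK = 0x3  # low 2 bits of the extra field extend the seconds counter


def decode_ext4_time(seconds_raw: bytes, extra_raw: bytes) -> tuple[int, int]:
    """Decode (seconds, nanoseconds) from an ext4 seconds field and its extra field."""
    (base,) = struct.unpack("<i", seconds_raw)  # signed 32-bit, little-endian on disk
    (extra,) = struct.unpack("<I", extra_raw)   # unsigned 32-bit, little-endian on disk
    seconds = base + ((extra & EPOCH_MASK) << 32)
    nanoseconds = extra >> 2                    # upper 30 bits
    return seconds, nanoseconds


# Two hypothetical modifications in the same second, 400 nanoseconds apart.
same_second = struct.pack("<i", 1_700_000_000)
first = decode_ext4_time(same_second, struct.pack("<I", 123_456_789 << 2))
second = decode_ext4_time(same_second, struct.pack("<I", 123_457_189 << 2))
print(first, second)  # (1700000000, 123456789) (1700000000, 123457189)
print("order recovered:", first < second)  # tuples compare seconds, then nanoseconds
```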
Okay, how so?
Imagine writing down a date. Do you write month, day, year, like in the US, or day, month, year, like in Europe? It's the same information, just in a different order.
Gotcha.
Computers have a similar choice for numbers. Big-endian is like writing one hundred and twenty-three as 1, 2, 3: the most significant part, the hundreds, comes first. Little-endian, which is what most PCs use, is like writing it as 3, 2, 1: the least significant part, the ones, comes first. In a computer, those parts are bytes rather than digits.
So why does that matter for forensics?
Because if you pull raw data off a disk, maybe from a critical system boot file or a timestamp field, and you don't know the byte order, the endianness, of the system that wrote it, you'll completely misinterpret what those bytes actually report. It's like the date example, where 12/05 could be December fifth or May twelfth: read a timestamp's bytes in the wrong order and you get a completely different, and wrong, value.
Okay, crucial detail, then. With that secret language sort of decoded, let's zoom out from the individual bits and bytes to the actual landscape where all this data lives: disks, partitions, and file system fundamentals, the very architecture of digital storage.
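Here is the byte-order problem in a few lines of Python, using `int.to_bytes` and the standard `struct` module. The four timestamp bytes are a made-up sample, assumed to come from a field written little-endian; reading them in the wrong order moves the date by decades rather than just swapping day and month.

```python
import struct
from datetime import datetime, timezone

# The same 32-bit number laid out in both byte orders.
n = 0x01020304
print(n.to_bytes(4, "big").hex(" "))     # 01 02 03 04  (most significant byte first)
print(n.to_bytes(4, "little").hex(" "))  # 04 03 02 01  (least significant byte first)

# Four hypothetical bytes carved from a timestamp field written little-endian.
raw = bytes.fromhex("00f15365")
(as_little,) = struct.unpack("<I", raw)  # the correct reading for this field
(as_big,) = struct.unpack(">I", raw)     # the same bytes, read in the wrong order

for label, value in (("little-endian", as_little), ("big-endian", as_big)):
    when = datetime.fromtimestamp(value, tz=timezone.utc)
    print(f"{label:13} {value:>10} -> {when:%Y-%m-%d %H:%M:%S} UTC")
```

The little-endian reading lands in November 2023; the big-endian reading of the very same bytes lands in July 1970.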
Starting with how computers organize information physically: what are the different types of storage, and which ones are most relevant for forensics?
Right. Computer storage is usually classified into a few tiers: primary, secondary, tertiary, and offline. Primary storage is typically RAM, random access memory, and the key thing about RAM is that it's volatile, meaning all the information stored in it is lost as soon as the power is removed. That's exactly why live data forensics is so critical, as we've discussed. RAM holds so many ephemeral details about what was just happening on the system: open documents, running processes, network connections, those encryption keys we mentioned.
Stuff you lose if you just pull...
The plug, precisely. Then you have secondary storage. This includes traditional hard disk drives, HDDs, and the now very common solid state drives, SSDs. This is where most of our persistent data resides, the stuff that stays when the power is off.
And SSDs, which are everywhere now because they're so fast and efficient, bring their own unique forensic headaches, don't they? I've heard they can be a forensic investigator's nightmare compared to the old spinning hard drives.
They absolutely do pose some unique challenges, because SSDs work fundamentally differently from HDDs. Unlike an HDD, the flash memory inside an SSD can only be written a limited number of times before it wears out. So to extend the drive's lifespan, the SSD controller employs techniques like wear leveling, which involves moving data around independently of the operating system to make sure all the memory cells get written roughly the same amount.
So the controller is shuffling data behind the...
Scenes, exactly. That makes predicting the precise physical location of a specific piece of data much harder for forensic analysis. You can't just assume data stays put in one physical spot, like you mostly could on an HDD.
So data might not be where the operating system thinks it is. That sounds like a constant game of digital hide and seek.
It can be, and it gets worse. Even more critically, when the operating system marks data as unallocated, which is what happens when you delete a file, modern SSDs use a function called TRIM. The OS basically tells the SSD controller: hey, we don't need the data in these...
Blocks anymore. And the controller?
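To make the wear-leveling idea concrete, here is a deliberately simplified toy model in Python. Real SSD controllers use proprietary flash translation layers that are far more complex; the class name `ToyWearLeveler` and its least-worn-block policy are assumptions for illustration only. What it shows is that rewriting the same logical block keeps landing on different physical blocks, while older copies stay physically present until that space is reused.

```python
class ToyWearLeveler:
    """Toy model of SSD wear leveling (hypothetical; real flash translation layers differ)."""

    def __init__(self, physical_blocks: int) -> None:
        self.wear = [0] * physical_blocks      # writes per physical block, a stand-in for wear
        self.cells = [None] * physical_blocks  # what is physically stored in each block
        self.mapping = {}                      # logical block -> physical block

    def write(self, logical: int, data: bytes) -> int:
        """Write to the least-worn block not currently mapped; return its physical index."""
        in_use = set(self.mapping.values())
        candidates = [p for p in range(len(self.wear)) if p not in in_use]
        target = min(candidates, key=lambda p: (self.wear[p], p))
        self.cells[target] = data
        self.wear[target] += 1
        # Remapping leaves the previous copy physically present but no longer referenced.
        self.mapping[logical] = target
        return target


ftl = ToyWearLeveler(physical_blocks=4)
placements = [ftl.write(0, f"version {i}".encode()) for i in range(1, 6)]
print("logical block 0 was written to physical blocks:", placements)  # [0, 1, 2, 3, 0]
print("wear per physical block:", ftl.wear)                           # [2, 1, 1, 1]
stale = [data for p, data in enumerate(ftl.cells) if p != ftl.mapping[0]]
print("stale copies still in flash:", stale)
```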
The controller can then internally mark those blocks for erasure, often almost immediately, as part of its garbage collection routines. That means deleted files are much less likely to be present and recoverable on SSDs than on traditional HDDs, where the data just sat there until it was overwritten.
Wow. So deleting really means deleting, much more often, on an SSD.
Often, yes. And here's the real kicker for forensics: SSD controllers run their garbage collection routines in the background whenever the drive is powered on. So data on the device in question can potentially be changing even while it's just sitting there, plugged in as evidence, doing nothing from the OS's perspective.
Whoa. So the evidence is potentially altering itself.
Exactly, which, as you can imagine, directly breaks the principle of not altering evidence. It's a fundamental conflict that investigators have to be acutely aware of when dealing with SSDs.
That's a huge challenge to the core ideas of forensics. Okay, so physical disks, whether HDD or SSD, are often divided into partitions. Why do we do that? What's the purpose of these logical divisions?
Partitions are essentially logical divisions of a single physical disk.
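A second toy model, again a simplification with hypothetical names, contrasts what happens to a deleted file's blocks on an HDD-like device, where the data stays until overwritten, with an SSD-like device that receives TRIM hints and later runs garbage collection. Erased blocks are assumed to read back as zeros; real drives may or may not behave that way.

```python
class ToyDrive:
    """Toy drive: blocks of bytes, optionally honoring TRIM (hypothetical model)."""

    def __init__(self, blocks: int, supports_trim: bool) -> None:
        self.blocks = [bytes(4) for _ in range(blocks)]
        self.supports_trim = supports_trim
        self.pending_erase = set()

    def write(self, index: int, data: bytes) -> None:
        self.blocks[index] = data

    def delete(self, indexes: list[int]) -> None:
        # The file system only marks these blocks unallocated; the data is untouched.
        # An SSD additionally receives a TRIM hint naming the blocks.
        if self.supports_trim:
            self.pending_erase.update(indexes)

    def garbage_collect(self) -> None:
        # On a real SSD this runs in the background whenever the drive has power.
        for index in self.pending_erase:
            self.blocks[index] = bytes(4)  # assume erased blocks read back as zeros
        self.pending_erase.clear()

    def read(self, index: int) -> bytes:
        return self.blocks[index]


hdd = ToyDrive(blocks=8, supports_trim=False)
ssd = ToyDrive(blocks=8, supports_trim=True)
for drive in (hdd, ssd):
    drive.write(3, b"EVID")  # a hypothetical file occupying block 3
    drive.delete([3])
    drive.garbage_collect()

print("HDD block 3 after delete:", hdd.read(3))  # b'EVID'
print("SSD block 3 after delete:", ssd.read(3))  # b'\x00\x00\x00\x00'
```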
They allow that one physical disk to be split into multiple logical areas, and each of those areas can contain a different file system or even a different operating system. For example, you might have one partition for Windows and another for Linux on the same drive, or a separate partition just for your user data.
Okay, and there are different ways these partitions are laid out on the disk, right? Like MBR versus GPT. What's the practical difference there for someone investigating a system?
Yes. MBR, the Master Boot Record, and GPT, the GUID Partition Table, are the two main schemes used to define how partitions are organized on a disk. MBR is the older standard. GPT is more modern and allows far more partitions on a single disk, and it also provides much more space for storing partition information compared to the very limited...
Space in the MBR. So, more robust.
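The classic MBR layout is small enough to parse by hand, and it puts the earlier endianness point into practice: the partition table starts at byte 446 of the first sector, holds four 16-byte entries, and ends with the signature bytes 0x55 0xAA at offsets 510 and 511; each entry's starting LBA and sector count are 32-bit little-endian integers. The sketch below builds a synthetic sector instead of reading a real disk (on Linux that would typically mean reading the first 512 bytes of a device such as /dev/sda as root). The function name is hypothetical, and GPT parsing is not covered.

```python
import struct

SECTOR_SIZE = 512
TABLE_OFFSET = 446                  # partition table starts after the boot code
ENTRY = struct.Struct("<B3sB3sII")  # status, CHS start, type, CHS end, start LBA, sector count


def parse_mbr(sector: bytes) -> list[dict]:
    """Return the used entries of a classic MBR partition table (hypothetical helper)."""
    if len(sector) < SECTOR_SIZE or sector[510:512] != b"\x55\xaa":
        raise ValueError("no 0x55AA boot signature; not an MBR sector")
    partitions = []
    for slot in range(4):
        status, _chs_start, ptype, _chs_end, start_lba, sectors = ENTRY.unpack_from(
            sector, TABLE_OFFSET + slot * ENTRY.size)
        if ptype == 0x00:  # type 0 marks an unused slot
            continue
        partitions.append({
            "slot": slot,
            "bootable": status == 0x80,
            "type": f"0x{ptype:02x}",
            "start_lba": start_lba,
            "sectors": sectors,
            "start_byte": start_lba * SECTOR_SIZE,
        })
    return partitions


# Synthetic sector: one bootable Linux (type 0x83) partition starting at LBA 2048.
sector = bytearray(SECTOR_SIZE)
ENTRY.pack_into(sector, TABLE_OFFSET, 0x80, bytes(3), 0x83, bytes(3), 2048, 409_600)
sector[510:512] = b"\x55\xaa"

for partition in parse_mbr(bytes(sector)):
    print(partition)
```

The start LBA of 2048 sits on disk as the little-endian bytes 00 08 00 00 at offset 454. On a GPT disk, sector 0 normally holds a "protective MBR" whose single entry has type 0xee, the cue to read the GPT header at LBA 1 instead.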