WEBVTT 1 00:00:00.160 --> 00:00:05.160 Welcome to the deep dive. Today we're undertaking a strategic 2 00:00:05.200 --> 00:00:11.039 analysis, really, of the modern IT engine. Yeah, if you're building, managing, 3 00:00:11.160 --> 00:00:15.519 or maybe migrating business applications, it's highly likely that Linux 4 00:00:15.560 --> 00:00:18.640 virtual machines, maybe containers, are running the show somewhere. 5 00:00:18.640 --> 00:00:19.440 Oh, absolutely. 6 00:00:19.559 --> 00:00:22.160 And the move to the cloud, well, it's a strategic 7 00:00:22.239 --> 00:00:26.440 mandate now for almost every organization, isn't it? But mastering 8 00:00:26.480 --> 00:00:30.839 that environment, making it efficient, resilient, secure, that takes more 9 00:00:30.879 --> 00:00:32.159 than just spinning up a few 10 00:00:32.000 --> 00:00:35.280 servers. It absolutely does. Running your workloads in the public 11 00:00:35.280 --> 00:00:39.759 cloud is a fundamentally different paradigm than running a physical 12 00:00:39.840 --> 00:00:40.439 data center. 13 00:00:40.520 --> 00:00:42.960 It just is a different way of thinking completely. 14 00:00:43.240 --> 00:00:47.439 The promise is massive, of course: agility, true elasticity, pay 15 00:00:47.479 --> 00:00:50.880 for use economics, all the good stuff. But without a 16 00:00:50.880 --> 00:00:53.840 strategic playbook, you can end up just throwing good money 17 00:00:53.840 --> 00:00:57.359 after bad, maybe creating these complex architectures that just fail 18 00:00:57.439 --> 00:00:57.960 under load. 19 00:00:58.039 --> 00:01:00.840 Yeah, we've all seen that happen. We've been diving into 20 00:01:00.960 --> 00:01:05.120 a pretty comprehensive guide that outlines five essential principles for, 21 00:01:05.280 --> 00:01:08.599 well, deploying and managing Linux in the cloud.
Our mission 22 00:01:08.640 --> 00:01:12.239 today is to distill those five strategic pillars for you. 23 00:01:12.799 --> 00:01:16.040 Think of this as your shortcut to understanding the planning, 24 00:01:16.400 --> 00:01:21.760 the architecture, monitoring, and the governance you need to succeed 25 00:01:21.879 --> 00:01:24.959 in these really complex, dynamic cloud environments. 26 00:01:25.120 --> 00:01:27.719 Okay, so before we even hit principle one, we really 27 00:01:27.719 --> 00:01:31.159 have to establish the foundational architecture, because your choice here, 28 00:01:31.519 --> 00:01:35.159 it dictates pretty much everything that follows. Cloud services are 29 00:01:35.159 --> 00:01:38.439 generally broken down into, let's say, three main categories based 30 00:01:38.480 --> 00:01:39.599 on who's responsible for 31 00:01:39.560 --> 00:01:43.079 what, which dictates how much strategic headache you keep. Basically, yeah. 32 00:01:43.120 --> 00:01:44.680 So walk us through that spectrum. 33 00:01:44.719 --> 00:01:47.560 Okay, on one end, you've got IaaS. That's infrastructure as 34 00:01:47.599 --> 00:01:49.920 a service. This is probably the most common starting point 35 00:01:49.920 --> 00:01:55.359 for many folks. Right, the provider gives you the bare infrastructure: servers, storage, networking. 36 00:01:55.680 --> 00:01:59.439 But you, the customer, you're responsible for managing the operating system, 37 00:01:59.560 --> 00:02:04.000 the patching, the applications, basically everything above the hypervisor. 38 00:02:04.120 --> 00:02:06.200 So IaaS gives you that freedom, you know, run whatever 39 00:02:06.239 --> 00:02:09.400 Linux distro you want, but you keep the strategic headache 40 00:02:09.400 --> 00:02:12.360 of managing potentially hundreds of different OS layers. 41 00:02:12.680 --> 00:02:17.560 Precisely. Then we move along to PaaS, platform as a service.
Now, 42 00:02:17.680 --> 00:02:21.400 this is often strategically superior, especially for new application development. 43 00:02:21.560 --> 00:02:25.240 How so? Well, here the provider handles the OS, the networking, 44 00:02:25.280 --> 00:02:28.599 the databases, all that plumbing. You, the developer, you just 45 00:02:28.680 --> 00:02:31.560 focus purely on your code. It's essentially a ready to 46 00:02:31.639 --> 00:02:32.680 use delivery environment. 47 00:02:32.759 --> 00:02:33.120 Gotcha. 48 00:02:33.639 --> 00:02:37.560 And finally there's SaaS, software as a service. This is 49 00:02:37.599 --> 00:02:41.879 where you outsource, well, pretty much everything: infrastructure, software, updates. 50 00:02:42.319 --> 00:02:46.080 The consumer has pretty limited control. Think of using something 51 00:02:46.120 --> 00:02:49.960 like a CRM service or, you know, Adobe Creative Cloud 52 00:02:50.199 --> 00:02:51.039 running on Azure. 53 00:02:51.280 --> 00:02:55.240 Okay, so knowing where you sit on that IaaS, PaaS, 54 00:02:55.280 --> 00:02:58.199 SaaS spectrum is crucial, and that brings us neatly to 55 00:02:58.240 --> 00:02:59.919 our first principle. I think it does. 56 00:03:00.159 --> 00:03:04.560 Principle one: understand which Linux VMs are adaptable to the cloud. 57 00:03:04.800 --> 00:03:07.199 And the source material really stresses this. The very first 58 00:03:07.240 --> 00:03:10.319 step must be a cloud readiness assessment. Okay, you can't 59 00:03:10.360 --> 00:03:12.479 just assume everything you're currently running on premise will work 60 00:03:12.479 --> 00:03:15.759 efficiently or even cost effectively in a virtualized cloud environment. 61 00:03:16.080 --> 00:03:18.719 But, you know, IaaS is often seen as the 62 00:03:18.759 --> 00:03:22.680 path of least resistance, just lift and shift. Why is 63 00:03:22.719 --> 00:03:24.960 a formal assessment so critical?
64 00:03:25.120 --> 00:03:30.159 Because failing to assess properly often leads to massive overspending 65 00:03:30.240 --> 00:03:33.520 later on. It just does. The assessment forces you to 66 00:03:33.719 --> 00:03:38.120 really analyze your existing workload patterns, your database requirements, and 67 00:03:38.159 --> 00:03:42.080 that's the data that must guide your IaaS versus PaaS decision. 68 00:03:42.560 --> 00:03:45.520 If your application can be refactored, maybe modernized a bit, 69 00:03:45.719 --> 00:03:48.840 you could save huge amounts of money and operational effort 70 00:03:49.120 --> 00:03:50.800 by moving it to PaaS instead. 71 00:03:51.000 --> 00:03:53.439 And this changes the team structure too, doesn't it? Yeah. 72 00:03:53.520 --> 00:03:58.199 You mentioned needing fewer traditional sysadmins focused on physical kit, exactly, 73 00:03:58.360 --> 00:04:00.800 and more DevOps architects focused on automation and that kind 74 00:04:00.800 --> 00:04:01.479 of thing. Exactly. 75 00:04:01.479 --> 00:04:04.560 And this leads to that fundamental migration fork in the road. 76 00:04:04.639 --> 00:04:07.039 You can lift and shift, just move your existing stack 77 00:04:07.120 --> 00:04:08.360 without fundamental changes. 78 00:04:08.400 --> 00:04:09.360 Quick and dirty. 79 00:04:09.360 --> 00:04:12.240 Quick, maybe saves you three months of planning upfront, but 80 00:04:12.319 --> 00:04:15.039 you likely pay, I don't know, maybe forty percent more 81 00:04:15.080 --> 00:04:17.480 in the long run, because those traditional VMs don't really 82 00:04:17.560 --> 00:04:20.439 utilize cloud native features like auto scaling very well. 83 00:04:20.639 --> 00:04:23.800 So the strategic path, maybe a bit riskier upfront, is 84 00:04:24.360 --> 00:04:25.680 architect before migration. 85 00:04:26.040 --> 00:04:29.439 That's the path to long term benefit.
Yes, you modernize 86 00:04:29.480 --> 00:04:32.560 the application, you upgrade it to use cloud APIs for 87 00:04:32.680 --> 00:04:37.759 things like scaling, resilience. And to execute this effectively, you 88 00:04:37.879 --> 00:04:42.160 need modern operations, what we often call immutable infrastructure. We're 89 00:04:42.160 --> 00:04:47.279 talking continuous integration, continuous deployment, CI/CD pipelines. 90 00:04:46.920 --> 00:04:48.680 Full DevOps toolchain, right? 91 00:04:48.560 --> 00:04:52.319 Enabled by tools like Jenkins, Terraform. Maybe running on Azure 92 00:04:52.519 --> 00:04:56.600 Virtual Machine Scale Sets, VMSS, or the equivalent in other 93 00:04:56.439 --> 00:04:59.160 clouds. And VMSS is key there, isn't it, because that's 94 00:04:59.199 --> 00:05:02.040 the mechanism allowing those Linux VMs to just instantly 95 00:05:02.120 --> 00:05:05.519 multiply when demand spikes, giving you that true cloud elasticity 96 00:05:05.519 --> 00:05:07.560 you talked about. Exactly. Okay, so once we've done the 97 00:05:07.560 --> 00:05:10.759 strategic planning, decided how we're building it, the next logical 98 00:05:10.800 --> 00:05:14.120 step is ensuring that build is, well, rock solid, which 99 00:05:14.199 --> 00:05:16.480 leads us directly to principle two, availability. 100 00:05:16.600 --> 00:05:20.399 Principle two: define your workload's required availability. And this is 101 00:05:20.399 --> 00:05:23.160 really where the cloud offers built in resilience that traditional 102 00:05:23.240 --> 00:05:27.079 data centers often struggle to match cost effectively. Well, providers 103 00:05:27.120 --> 00:05:32.160 offer this through geographically isolated regions and, within those regions, 104 00:05:32.240 --> 00:05:36.759 availability zones or AZs. Think of AZs as physically separate 105 00:05:36.839 --> 00:05:38.879 data centers within a region.
106 00:05:38.959 --> 00:05:41.639 Okay, and within those zones, we need to talk about 107 00:05:41.680 --> 00:05:46.639 logical constructs like availability sets. Particularly in Azure, they're designed 108 00:05:46.680 --> 00:05:50.120 to spread risk across the physical hardware. Right. But this 109 00:05:50.160 --> 00:05:52.000 is where it gets a little abstract for some. Yeah, 110 00:05:52.000 --> 00:05:55.360 how should we think about fault domains and update domains? 111 00:05:55.560 --> 00:05:58.600 Okay, let's use a simple analogy. Think of availability sets 112 00:05:58.600 --> 00:06:01.360 as a promise from the provider that your critical VMs 113 00:06:01.439 --> 00:06:03.720 aren't all sitting on the same power strip or the 114 00:06:03.720 --> 00:06:05.000 same network switch, 115 00:06:04.800 --> 00:06:07.319 essentially. Right, not all eggs in one basket. Exactly. 116 00:06:07.360 --> 00:06:10.079 Fault domains are groups of resources that share a common 117 00:06:10.120 --> 00:06:13.680 power source and network switch, so if that physical rack 118 00:06:13.800 --> 00:06:16.759 goes down, everything in that fault domain potentially fails together. 119 00:06:17.040 --> 00:06:20.040 So it's like having your application VMs distributed across, say, 120 00:06:20.560 --> 00:06:23.639 two entirely separate server racks in the same data center 121 00:06:23.680 --> 00:06:24.439 building. Precisely. 122 00:06:24.480 --> 00:06:26.879 And then update domains are groups of resources that the 123 00:06:26.879 --> 00:06:30.279 cloud provider patches and updates together during planned maintenance. You 124 00:06:30.319 --> 00:06:34.720 want your critical application components distributed across multiple update domains 125 00:06:34.800 --> 00:06:37.680 so a single routine maintenance event doesn't take down your 126 00:06:37.800 --> 00:06:41.600 entire service.
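The fault domain and update domain idea described here can be illustrated with a minimal Python sketch. The round-robin placement, the domain counts (2 fault domains, 5 update domains), and the helper names place_vms and survivors are assumptions made for illustration; this is not any provider's actual placement algorithm.

```python
def place_vms(vm_names, fault_domains=2, update_domains=5):
    """Spread VMs round-robin across fault and update domains (illustrative only)."""
    placement = {}
    for index, name in enumerate(vm_names):
        placement[name] = {
            "fault_domain": index % fault_domains,    # shared rack, power, switch
            "update_domain": index % update_domains,  # patched together
        }
    return placement


def survivors(placement, domain_kind, failed_domain):
    """Return the VMs still running when one domain of the given kind is down."""
    return [
        name
        for name, slot in placement.items()
        if slot[domain_kind] != failed_domain
    ]


vms = [f"web-{i}" for i in range(6)]
placement = place_vms(vms)

# A rack failure in fault domain 0 leaves the VMs in fault domain 1 running.
print(survivors(placement, "fault_domain", 0))   # ['web-1', 'web-3', 'web-5']

# Planned maintenance on update domain 2 takes down only the VM assigned to it.
print(survivors(placement, "update_domain", 2))  # ['web-0', 'web-1', 'web-3', 'web-4', 'web-5']
```

In both cases part of the fleet keeps serving traffic, which is the point of spreading VMs across both kinds of domain.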
It's basically your insurance policy against both unexpected 127 00:06:41.600 --> 00:06:43.480 physical failure and planned maintenance 128 00:06:43.519 --> 00:06:47.199 windows. Makes sense. Beyond just protecting against failure, though, we 129 00:06:47.279 --> 00:06:50.439 need to handle incoming demand, spread the load. That's where 130 00:06:50.480 --> 00:06:51.360 load balancing comes in. 131 00:06:51.480 --> 00:06:55.279 Oh, absolutely essential for both availability and scaling. And you 132 00:06:55.319 --> 00:06:58.639 need to distinguish between network load balancers, which operate at 133 00:06:58.720 --> 00:07:01.560 layer four, routing traffic based on IP address and port, 134 00:07:02.319 --> 00:07:05.360 and application load balancers, which work at layer seven, looking 135 00:07:05.399 --> 00:07:10.279 at application headers like HTTP requests. Strategically, you often want 136 00:07:10.319 --> 00:07:13.240 the layer seven balancers because they can often incorporate a 137 00:07:13.399 --> 00:07:18.160 web application firewall, or WAF. 138 00:07:16.800 --> 00:07:18.360 Adding a security layer right there. 139 00:07:18.560 --> 00:07:20.879 Exactly. It adds a layer of defense right at the 140 00:07:20.879 --> 00:07:23.879 front door, filtering out known web exploits before they even 141 00:07:23.920 --> 00:07:25.079 reach your Linux VMs. 142 00:07:25.319 --> 00:07:29.079 Good point. Now, you mentioned resilience earlier. If durability is 143 00:07:29.160 --> 00:07:32.240 kind of the default in cloud storage, where do customers 144 00:07:32.240 --> 00:07:34.759 often slip up with storage redundancy? 145 00:07:34.439 --> 00:07:37.079 Well, they often fail by relying only on the baseline, 146 00:07:37.120 --> 00:07:41.199 the default. Cloud storage usually defaults to incredible durability within 147 00:07:41.240 --> 00:07:44.920 a single data center. Think eleven nines, 99.999999999 148 00:07:45.040 --> 00:07:50.480 percent durability.
That's what's called locally redundant 149 00:07:50.519 --> 00:07:55.160 storage, LRS. Which sounds amazing. It is, for hardware failure within 150 00:07:55.199 --> 00:07:58.079 that data center. But eleven nines means absolutely nothing if 151 00:07:58.120 --> 00:08:01.199 a regional natural disaster, like a flood or a major 152 00:08:01.279 --> 00:08:03.920 power outage, takes out the entire physical site. 153 00:08:04.240 --> 00:08:07.040 Right, LRS won't save you from a regional catastrophe. That's 154 00:08:07.079 --> 00:08:08.759 where you need the geographical separation. 155 00:08:08.959 --> 00:08:12.839 Yes, precisely. For maximum data safety against a major regional event, 156 00:08:13.040 --> 00:08:17.160 the source strongly recommends using geo-redundant storage, GRS, or zone 157 00:08:17.160 --> 00:08:21.519 redundant storage, ZRS. These replicate your data across multiple geographically 158 00:08:21.519 --> 00:08:24.720 separated zones or even regions. Okay. It's a critical and 159 00:08:24.800 --> 00:08:28.120 usually relatively cheap insurance policy against that kind of catastrophic 160 00:08:28.160 --> 00:08:31.000 regional failure. Don't skip it for important data. 161 00:08:31.120 --> 00:08:34.879 Okay. So we've planned the migration, we've built resilient infrastructure 162 00:08:34.960 --> 00:08:38.000 using AZs and redundancy. Now we need eyes on the 163 00:08:38.039 --> 00:08:42.159 whole operation. Right. That brings us to principle three: monitor 164 00:08:42.279 --> 00:08:46.679 your applications running on Linux across the entire stack. You 165 00:08:46.720 --> 00:08:49.960 mentioned earlier this paradigm shift away from just monitoring server health. 166 00:08:50.759 --> 00:08:53.320 Why is the cloud provider's involvement so crucial here? 167 00:08:54.039 --> 00:08:57.879 Because the cloud provider is already monitoring the underlying infrastructure health, 168 00:08:58.279 --> 00:09:02.320 the physical host machine, the hypervisor layer.
Your job 169 00:09:02.519 --> 00:09:06.960 as the customer shifts almost entirely towards application performance monitoring, 170 00:09:07.000 --> 00:09:09.519 APM, and focusing on the end user experience. 171 00:09:09.799 --> 00:09:12.200 So less about CPU on the box, more about how 172 00:09:12.240 --> 00:09:13.679 quickly the web page loads for 173 00:09:13.600 --> 00:09:16.360 the user. Exactly. And think about serverless functions like 174 00:09:16.399 --> 00:09:19.480 Azure Functions or AWS Lambda. You don't even have a 175 00:09:19.519 --> 00:09:22.200 server to monitor in the traditional sense. You're just monitoring 176 00:09:22.279 --> 00:09:24.360 the execution of these little chunks of code. 177 00:09:25.559 --> 00:09:28.600 But this must create an incredibly fragmented view, mustn't it? 178 00:09:28.960 --> 00:09:31.399 Especially if you're in a complex hybrid setup or using 179 00:09:31.480 --> 00:09:32.279 multiple clouds. 180 00:09:32.720 --> 00:09:36.080 Oh, it creates enormous challenges. You often lack that unified 181 00:09:36.159 --> 00:09:40.240 visibility across all your resources. You're dealing with different cloud 182 00:09:40.279 --> 00:09:44.039 specific tools, Azure Monitor here, AWS CloudWatch there, 183 00:09:44.399 --> 00:09:48.399 maybe something else on prem, and everything is dynamically scaling 184 00:09:48.480 --> 00:09:52.080 up and down. If a VM instance only lives for, say, 185 00:09:52.279 --> 00:09:55.840 thirty minutes during a peak and then disappears, how do 186 00:09:55.879 --> 00:09:59.639 you effectively track its performance history or troubleshoot what happened? 187 00:10:00.200 --> 00:10:03.519 Good question. Let's drill down a bit for the Linux administrators listening. 188 00:10:03.720 --> 00:10:06.759 What specific metrics become even more crucial to watch in 189 00:10:06.799 --> 00:10:09.039 the cloud context compared to on premise? 190 00:10:09.240 --> 00:10:12.960 Okay, we need deep insight.
So when looking at CPU usage, 191 00:10:13.000 --> 00:10:16.200 it's vital to distinguish between user time, that's your application running, 192 00:10:16.240 --> 00:10:19.360 and privileged time or system time, which is the kernel 193 00:10:19.440 --> 00:10:20.039 doing work. 194 00:10:20.240 --> 00:10:22.279 Why is that distinction so important now? 195 00:10:22.559 --> 00:10:25.559 Because if you see consistently high privileged time, it often 196 00:10:25.600 --> 00:10:29.519 indicates poor performance caused by the underlying hypervisor or maybe 197 00:10:29.519 --> 00:10:33.639 noisy neighbors on the physical host. That's potentially the provider's problem, 198 00:10:33.679 --> 00:10:36.799 not your application code. Knowing that difference helps you open 199 00:10:36.840 --> 00:10:38.159 the right kind of support ticket. 200 00:10:38.480 --> 00:10:41.639 Ah, that's a great example of how monitoring helps navigate 201 00:10:41.639 --> 00:10:43.720 that shared responsibility model. What else? 202 00:10:44.039 --> 00:10:48.200 Absolutely. For disk, you absolutely must track input output operations 203 00:10:48.240 --> 00:10:52.799 per second, IOPS, especially with Linux file systems. Hitting IOPS 204 00:10:52.840 --> 00:10:56.399 limits is a common bottleneck. If your IOPS are spiking, 205 00:10:56.840 --> 00:10:59.279 you probably need to scale up your storage tier, maybe 206 00:10:59.320 --> 00:11:02.120 get faster disks, not just make the VM bigger. 207 00:11:02.240 --> 00:11:04.480 Got it. Disk speed, not just size. Right. 208 00:11:04.840 --> 00:11:07.600 And critically, for memory utilization, you need to track paging 209 00:11:07.639 --> 00:11:11.480 events or swap activity.
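One way to track swap activity on a Linux VM is to sample the pswpin and pswpout counters that the kernel exposes in /proc/vmstat and turn the difference into a rate. The sketch below is a minimal illustration, assuming two snapshots of that file taken a known number of seconds apart; the sample values and the helper names parse_vmstat and swap_pages_per_second are made up for the example.

```python
def parse_vmstat(text):
    """Parse '/proc/vmstat'-style text ('name value' per line) into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_pages_per_second(before, after, interval_seconds):
    """Pages swapped in plus pages swapped out per second between two samples."""
    swapped_in = after["pswpin"] - before["pswpin"]
    swapped_out = after["pswpout"] - before["pswpout"]
    return (swapped_in + swapped_out) / interval_seconds


# Hypothetical samples taken 60 seconds apart (values invented for illustration).
sample_before = "pswpin 1000\npswpout 4000\n"
sample_after = "pswpin 1600\npswpout 5200\n"

rate = swap_pages_per_second(
    parse_vmstat(sample_before), parse_vmstat(sample_after), interval_seconds=60
)
print(rate)  # 30.0 pages per second
```

On a real system the two text samples would come from reading /proc/vmstat twice. What counts as an excessive rate depends on the workload, so any alert threshold is a judgment call rather than a fixed number.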
Excessive paging, where the VM is 210 00:11:11.559 --> 00:11:13.919 constantly swapping memory out to disk because it doesn't have 211 00:11:14.000 --> 00:11:17.519 enough RAM, is probably the clearest, most unambiguous sign that 212 00:11:17.600 --> 00:11:20.720 performance is tanking and you need more memory capacity for 213 00:11:20.759 --> 00:11:21.399 that workload. 214 00:11:21.559 --> 00:11:24.480 Okay, clear indicators there. So the native vendor tools like 215 00:11:24.559 --> 00:11:27.440 Azure Monitor or CloudWatch, they give you data on 216 00:11:27.519 --> 00:11:30.600 individual servers or services, but they don't necessarily give you 217 00:11:30.639 --> 00:11:34.720 that unified, enterprise wide dashboard view, especially if you've got 218 00:11:34.720 --> 00:11:37.639 that multi cloud or hybrid reality. Exactly. 219 00:11:37.639 --> 00:11:40.360 They're great for their own ecosystems, but they don't naturally 220 00:11:40.399 --> 00:11:43.519 talk to each other or integrate with your on prem 221 00:11:43.559 --> 00:11:48.279 tools. To gain that truly comprehensive, uniform view across everything, 222 00:11:48.600 --> 00:11:52.559 the sources highly recommend integrating third party monitoring tools. Think 223 00:11:53.000 --> 00:11:56.200 Datadog, Dynatrace, New Relic, tools like that. 224 00:11:56.320 --> 00:11:57.559 And what do they bring to the table? 225 00:11:57.840 --> 00:12:01.440 They specialize in collecting and correlating metrics from every layer 226 00:12:01.480 --> 00:12:04.039 of the architecture, from the database queries up to the 227 00:12:04.279 --> 00:12:07.840 load balancer response times, maybe even front end user experience, 228 00:12:08.080 --> 00:12:11.200 regardless of which cloud vendor or which data center things 229 00:12:11.200 --> 00:12:14.679 are sitting on.
That unified visibility is really the difference 230 00:12:14.679 --> 00:12:18.360 between proactive management and constantly just reacting to outages after 231 00:12:18.399 --> 00:12:18.879 they happen. 232 00:12:19.080 --> 00:12:22.039 Okay, that makes sense. Let's shift gears now to defensive 233 00:12:22.039 --> 00:12:26.120 protection with principle four: ensure your Linux VMs are secure 234 00:12:26.159 --> 00:12:29.519 and backed up. Now, you mentioned shared responsibility earlier, and 235 00:12:29.600 --> 00:12:31.879 you said, if you take only one concept away from 236 00:12:31.879 --> 00:12:34.000 this whole deep dive, it should be the shared security 237 00:12:34.039 --> 00:12:36.200 responsibility model. Let's really nail this 238 00:12:36.200 --> 00:12:39.240 down. We have to. This is probably the single most 239 00:12:39.279 --> 00:12:43.399 misunderstood concept in cloud, and where customers frankly fail all 240 00:12:43.440 --> 00:12:45.840 the time. Let's clearly define the line in the sand. 241 00:12:46.399 --> 00:12:49.519 The cloud provider is responsible for security of the cloud. 242 00:12:49.720 --> 00:12:51.320 Okay, of the cloud, meaning the 243 00:12:51.320 --> 00:12:54.080 physical security of the data centers, the security of the 244 00:12:54.120 --> 00:12:58.120 global network infrastructure, the security of their managed services like 245 00:12:58.200 --> 00:13:01.519 the hypervisor or the storage fabric. They secure the building 246 00:13:01.519 --> 00:13:03.120 and its core systems. 247 00:13:03.039 --> 00:13:06.679 Right. And the cloud customer is responsible for security 248 00:13:06.200 --> 00:13:08.679 in the cloud. Exactly.
Security in the cloud. This means 249 00:13:08.720 --> 00:13:11.320 your customer data, the security of the operating systems you 250 00:13:11.399 --> 00:13:15.879 choose to run, like Linux, patching those OSes, configuring firewalls, 251 00:13:15.919 --> 00:13:21.039 managing your application security, identity and access management, IAM, and 252 00:13:21.080 --> 00:13:25.159 crucially, encryption. You secure everything you put inside the building. 253 00:13:25.399 --> 00:13:27.600 And you specifically called out encryption there. 254 00:13:27.679 --> 00:13:32.039 Why? Because customers often forget that last part. Encryption of 255 00:13:32.120 --> 00:13:35.080 data at rest on your VMs or in your databases 256 00:13:35.120 --> 00:13:38.519 is almost always the customer's job by default. The provider 257 00:13:38.600 --> 00:13:40.759 gives you the tools, but you have to turn them 258 00:13:40.759 --> 00:13:41.879 on and manage the keys. 259 00:13:42.200 --> 00:13:44.879 Okay. So to fulfill your side of the bargain, you 260 00:13:44.960 --> 00:13:47.679 need to leverage the tools the provider gives you, like 261 00:13:47.960 --> 00:13:53.480 network security groups, cloud firewalls, strong IAM controls, using things 262 00:13:53.519 --> 00:13:57.240 like virtual private clouds, or VPCs, or VNets for 263 00:13:57.399 --> 00:13:58.720 logical network isolation. 264 00:13:59.039 --> 00:14:02.080 Absolutely, those are your primary tools for securing things in 265 00:14:02.120 --> 00:14:02.559 the cloud. 266 00:14:03.000 --> 00:14:07.679 Now, let's connect security with disaster recovery, or DR. When 267 00:14:07.720 --> 00:14:10.600 we're planning for DR, we always talk about RTO and RPO. 268 00:14:10.639 --> 00:14:11.759 Can you quickly define those? 269 00:14:11.879 --> 00:14:15.320 Sure. RTO, that's the recovery time objective. It's the maximum 270 00:14:15.360 --> 00:14:18.679 acceptable time allowed to restore your service after a disaster hits.
271 00:14:19.000 --> 00:14:20.559 How fast you need to be back online. 272 00:14:20.759 --> 00:14:21.080 Okay. 273 00:14:21.200 --> 00:14:25.639 And RPO, the recovery point objective. That's the maximum acceptable 274 00:14:25.639 --> 00:14:28.519 amount of data loss, usually measured in time. Like, can 275 00:14:28.519 --> 00:14:30.840 you afford to lose the last hour of data or 276 00:14:30.879 --> 00:14:32.000 only the last five minutes? 277 00:14:32.120 --> 00:14:35.000 Right. I remember, you know, ten, fifteen years ago, running 278 00:14:35.000 --> 00:14:38.159 a DR drill was this massive annual event. It cost 279 00:14:38.200 --> 00:14:41.559 a fortune because you were essentially paying for idle hot 280 00:14:41.639 --> 00:14:46.360 standby physical infrastructure sitting in a dedicated secondary site, just waiting 281 00:14:46.080 --> 00:14:49.759 for disaster. Exactly. Millions sometimes, just for that insurance. And 282 00:14:49.679 --> 00:14:53.279 the cloud changes the entire financial stress test of that situation, 283 00:14:53.360 --> 00:14:53.840 doesn't it? 284 00:14:53.840 --> 00:14:57.440 It dramatically shifts that RTO cost trade off curve, really. 285 00:14:57.919 --> 00:15:01.799 Because cloud elasticity allows you to quickly provision compute resources 286 00:15:01.840 --> 00:15:04.919 only when the recovery is actually needed, not paying for 287 00:15:04.960 --> 00:15:07.879 them to sit idle twenty four seven, you can 288 00:15:07.960 --> 00:15:11.759 often achieve a much faster recovery time, a shorter RTO, 289 00:15:12.080 --> 00:15:15.759 at a significantly reduced infrastructure cost compared to those traditional 290 00:15:15.799 --> 00:15:20.240 dedicated DR sites. Basically, you can often afford faster recovery 291 00:15:20.240 --> 00:15:22.879 metrics because the hardware effectively sits powered off in the 292 00:15:22.879 --> 00:15:25.360 cloud until you declare disaster and need to spin it up.
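The RTO and RPO definitions given here can be turned into a simple check. This is a minimal Python sketch under stated assumptions: worst-case data loss is taken to be one full backup interval, downtime is taken to be the estimated restore time, and the RecoveryPlan class, the meets_objectives helper, and all the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RecoveryPlan:
    backup_interval_minutes: float  # time between backups or replication points
    restore_minutes: float          # estimated time to bring the service back


def meets_objectives(plan, rto_minutes, rpo_minutes):
    """Compare a plan against its targets.

    Assumes worst-case data loss equals one backup interval (compared to RPO)
    and downtime equals the restore time (compared to RTO).
    """
    return {
        "rpo_met": plan.backup_interval_minutes <= rpo_minutes,
        "rto_met": plan.restore_minutes <= rto_minutes,
    }


# Hypothetical plans: a nightly backup versus near-continuous replication.
nightly = RecoveryPlan(backup_interval_minutes=24 * 60, restore_minutes=240)
replicated = RecoveryPlan(backup_interval_minutes=5, restore_minutes=30)

# Targets: back online within an hour, lose at most an hour of data.
print(meets_objectives(nightly, rto_minutes=60, rpo_minutes=60))
# {'rpo_met': False, 'rto_met': False}
print(meets_objectives(replicated, rto_minutes=60, rpo_minutes=60))
# {'rpo_met': True, 'rto_met': True}
```

The sketch leaves out cost on purpose. The point made in the discussion is that in the cloud the faster plan no longer requires paying for idle standby hardware.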
293 00:15:25.679 --> 00:15:29.080 So modernizing backup and DR in the cloud means maybe 294 00:15:29.120 --> 00:15:33.120 outsourcing the whole backup process via managed services like Azure 295 00:15:33.120 --> 00:15:37.720 Backup or AWS Backup, and using built in replication tools, 296 00:15:37.720 --> 00:15:40.480 maybe like Azure Site Recovery or CloudEndure, which can 297 00:15:40.519 --> 00:15:43.960 effectively eliminate the need for that second expensive physical data 298 00:15:44.000 --> 00:15:45.720 center altogether for many workloads. 299 00:15:45.799 --> 00:15:47.320 That's exactly the modern approach, 300 00:15:47.399 --> 00:15:50.919 yes. Okay, that brings us to our final and arguably 301 00:15:51.000 --> 00:15:58.399 most strategic capstone, principle five. Governance often sounds like boring paperwork, 302 00:15:58.559 --> 00:16:00.799 but you suggested it's actually one of the most complex 303 00:16:00.840 --> 00:16:02.960 parts of moving to and operating in the cloud. 304 00:16:03.000 --> 00:16:07.320 Why is that? Because the cloud, by its nature, abstracts 305 00:16:07.320 --> 00:16:11.000 location and control in ways that introduce massive new complexities, 306 00:16:11.360 --> 00:16:15.559 especially around legal issues, compliance, and data disclosure regulations. The 307 00:16:15.600 --> 00:16:19.320 source specifically highlights things like data sovereignty laws. 308 00:16:19.080 --> 00:16:22.679 Ah, right, the rules that say data belonging to citizens 309 00:16:22.720 --> 00:16:25.600 of a certain country must physically remain stored within that 310 00:16:25.559 --> 00:16:29.120 country's borders. Exactly.
So if a cloud provider has regions 311 00:16:29.159 --> 00:16:32.360 all over the world, you, as the architect or administrator, 312 00:16:32.399 --> 00:16:35.080 have the responsibility to ensure that the data for, say, 313 00:16:35.399 --> 00:16:38.759 your German customers is provisioned only in an EU region 314 00:16:38.840 --> 00:16:42.559 like Germany or Frankfurt and isn't accidentally replicated or backed 315 00:16:42.639 --> 00:16:45.240 up to a US region, for instance. That requires careful 316 00:16:45.240 --> 00:16:46.200 governance policies. 317 00:16:46.559 --> 00:16:50.559 And beyond just the legal complexity, there's often a customer concern, 318 00:16:50.720 --> 00:16:54.360 isn't there, about trusting the provider with sensitive data, given 319 00:16:54.399 --> 00:16:57.519 the shared nature of the resources and maybe having less 320 00:16:57.559 --> 00:17:00.960 direct visibility compared to their old on premise environments? 321 00:17:01.080 --> 00:17:04.119 That's a huge factor. You're relying on the provider's security 322 00:17:04.200 --> 00:17:08.200 for the underlying layers. You're on shared hardware. It requires 323 00:17:08.200 --> 00:17:10.279 a different level of trust and verification. 324 00:17:10.960 --> 00:17:15.039 So how do you, as the customer, maintain strategic control 325 00:17:15.279 --> 00:17:17.960 and ensure compliance and manage that trust? 326 00:17:18.000 --> 00:17:22.240 Through rigorous governance mechanisms provided by the cloud platform. We're 327 00:17:22.240 --> 00:17:26.119 talking about strict role based access control, RBAC, making sure 328 00:17:26.160 --> 00:17:28.799 people only have the minimum permissions they need. We're talking 329 00:17:28.799 --> 00:17:33.039 about network security groups and policies, and crucially, using hierarchical 330 00:17:33.039 --> 00:17:34.000 account provisioning. 331 00:17:34.160 --> 00:17:34.920 What do you mean by that?
332 00:17:35.079 --> 00:17:39.759 Dividing your potentially sprawling cloud resources into logical containers: separate departments, 333 00:17:39.759 --> 00:17:43.279 different projects, distinct subscriptions or accounts. This allows you to 334 00:17:43.359 --> 00:17:47.079 ring fence costs, apply specific security policies only where needed, 335 00:17:47.359 --> 00:17:51.000 and manage access at scale. It's fundamental to staying organized 336 00:17:51.000 --> 00:17:51.519 and secure. 337 00:17:51.759 --> 00:17:54.559 Okay. And finally, when it comes to trusting the global 338 00:17:54.599 --> 00:17:58.799 cloud vendor, you can't exactly send your own auditors to 339 00:17:58.839 --> 00:18:03.359 physically inspect their massive, highly secured data centers around the world. 340 00:18:03.680 --> 00:18:07.240 So how is that confidence, that trust, actually established? 341 00:18:07.440 --> 00:18:11.000 It's established through what the source calls delegated trust. Since 342 00:18:11.079 --> 00:18:14.559 direct physical auditing by every customer is completely infeasible and 343 00:18:14.599 --> 00:18:17.720 frankly a security risk in itself, trust is established by 344 00:18:17.759 --> 00:18:22.240 relying on independent, recognized third party audits and certifications. So 345 00:18:22.200 --> 00:18:24.240 you look for their badges, essentially. 346 00:18:24.000 --> 00:18:27.279 Kind of, yeah. You rely on standardized reports like SOC 347 00:18:27.319 --> 00:18:29.640 1 or SOC 2, which attest to financial and 348 00:18:29.720 --> 00:18:33.799 operational controls. You look for industry specific certifications like ISO 349 00:18:33.839 --> 00:18:36.359 27006 or 27002 350 00:18:36.359 --> 00:18:40.480 for security management, or maybe HIPAA for healthcare data, 351 00:18:40.880 --> 00:18:44.799 or PCI DSS for payment card data. These aren't just acronyms 352 00:18:44.799 --> 00:18:47.680 on a web page.
They represent formal attestations by accredited 353 00:18:47.680 --> 00:18:50.960 auditors that the cloud provider has implemented the required security standards, 354 00:18:51.000 --> 00:18:55.599 management controls, and operational procedures at that foundational infrastructure level. 355 00:18:56.119 --> 00:18:58.839 You delegate the auditing trust to these recognized bodies. 356 00:18:59.119 --> 00:19:02.559 Okay. So wrapping it up, the five strategic principles for 357 00:19:02.680 --> 00:19:06.519 mastering Linux in the cloud: start with cloud readiness and planning, 358 00:19:07.079 --> 00:19:11.640 build for availability and resilience, implement unified monitoring across the stack, 359 00:19:12.039 --> 00:19:16.079 ensure security and disaster recovery through that shared model, and finally, 360 00:19:16.240 --> 00:19:20.319 overlay rigorous governance. It really does feel like a roadmap 361 00:19:20.640 --> 00:19:23.279 to avoiding both technical and financial headaches. 362 00:19:23.519 --> 00:19:27.440 It absolutely is. And notice the underlying theme: the shift 363 00:19:27.480 --> 00:19:31.480 is profound. The cloud vendor takes responsibility for securing the 364 00:19:31.480 --> 00:19:36.000 cloud infrastructure, but the customer retains full responsibility for securing 365 00:19:36.000 --> 00:19:38.680 their application, their data, their identities, their access in 366 00:19:38.680 --> 00:19:41.279 the cloud. Which fundamentally changes the game. 367 00:19:41.440 --> 00:19:44.160 It fundamentally means the long term role of the traditional 368 00:19:44.200 --> 00:19:47.319 system administrator is changing dramatically. They have to evolve. 369 00:19:47.440 --> 00:19:51.400 They need to become strategists, right? Embracing development practices, cloud 370 00:19:51.519 --> 00:19:55.759 architecture principles, and maybe most importantly, understanding and implementing these 371 00:19:55.759 --> 00:19:58.920 governance strategies.
It's really no longer just about racking and 372 00:19:58.920 --> 00:20:00.200 stacking physical machines. 373 00:20:00.480 --> 00:20:05.599 Not at all. It's about strategic automation, compliance, security posture management, 374 00:20:05.839 --> 00:20:06.799 cost optimization. 375 00:20:07.319 --> 00:20:09.119 So here's a final thought to leave our listeners 376 00:20:09.160 --> 00:20:12.680 with, building on that. If the sysadmin's role is 377 00:20:12.759 --> 00:20:18.160 shifting towards governance, towards managing identity and access, yeah, it 378 00:20:18.240 --> 00:20:21.960 raises a crucial question for you listening: are you and 379 00:20:22.039 --> 00:20:26.240 your organization actually structured to effectively audit your own identity 380 00:20:26.240 --> 00:20:30.640 and access management, IAM, policies within the cloud? Or 381 00:20:30.720 --> 00:20:33.319 is that maybe the biggest blind spot you've accidentally created 382 00:20:33.400 --> 00:20:34.960 or outsourced without realizing it? 383 00:20:35.119 --> 00:20:37.480 Hmm, that's a good one. Definitely something to mull over: 384 00:20:37.599 --> 00:20:38.880 who's watching the watchers, 385 00:20:39.160 --> 00:20:42.039 essentially. Exactly. Something to think about until our next deep dive. 386 00:20:42.279 --> 00:20:44.799 Thanks for breaking down these principles today. My pleasure. It 387 00:20:44.799 --> 00:20:45.559 was a great discussion.