WEBVTT

1
00:00:00.160 --> 00:00:05.160
<v Speaker 1>Welcome to the deep dive. Today we're undertaking a strategic

2
00:00:05.200 --> 00:00:11.039
<v Speaker 1>analysis really of the modern IT engine. Yeah, if you're building, managing,

3
00:00:11.160 --> 00:00:15.519
<v Speaker 1>or maybe migrating business applications, it's highly likely that Linux

4
00:00:15.560 --> 00:00:18.640
<v Speaker 1>virtual machines, and maybe containers, are running the show somewhere.

5
00:00:18.640 --> 00:00:19.440
<v Speaker 2>Oh, absolutely.

6
00:00:19.559 --> 00:00:22.160
<v Speaker 1>And the move to the cloud, well, it's a strategic

7
00:00:22.239 --> 00:00:26.440
<v Speaker 1>mandate now for almost every organization, isn't it? But mastering

8
00:00:26.480 --> 00:00:30.839
<v Speaker 1>that environment, making it efficient, resilient, secure, that takes more

9
00:00:30.879 --> 00:00:32.159
<v Speaker 1>than just spinning up a few servers.

10
00:00:32.000 --> 00:00:35.280
<v Speaker 2>It absolutely does. Running your workloads in the public

11
00:00:35.280 --> 00:00:39.759
<v Speaker 2>cloud is a fundamentally different paradigm than running a physical

12
00:00:39.840 --> 00:00:40.439
<v Speaker 2>data center.

13
00:00:40.520 --> 00:00:42.960
<v Speaker 1>It just is a different way of thinking completely.

14
00:00:43.240 --> 00:00:47.439
<v Speaker 2>The promise is massive, of course: agility, true elasticity, pay

15
00:00:47.479 --> 00:00:50.880
<v Speaker 2>for use economics, all the good stuff. But without a

16
00:00:50.880 --> 00:00:53.840
<v Speaker 2>strategic playbook, you can end up just throwing good money

17
00:00:53.840 --> 00:00:57.359
<v Speaker 2>after bad, maybe creating these complex architectures that just fail

18
00:00:57.439 --> 00:00:57.960
<v Speaker 2>under load.

19
00:00:58.039 --> 00:01:00.840
<v Speaker 1>Yeah, we've all seen that happen. We've been diving into

20
00:01:00.960 --> 00:01:05.120
<v Speaker 1>a pretty comprehensive guide that outlines five essential principles for

21
00:01:05.280 --> 00:01:08.599
<v Speaker 1>well, deploying and managing Linux in the cloud. Our mission

22
00:01:08.640 --> 00:01:12.239
<v Speaker 1>today is to distill those five strategic pillars for you.

23
00:01:12.799 --> 00:01:16.040
<v Speaker 1>Think of this as your shortcut to understanding the planning,

24
00:01:16.400 --> 00:01:21.760
<v Speaker 1>the architecture, monitoring, and the governance you need to succeed

25
00:01:21.879 --> 00:01:24.959
<v Speaker 1>in these really complex, dynamic cloud environments.

26
00:01:25.120 --> 00:01:27.719
<v Speaker 2>Okay, so before we even hit principle one, we really

27
00:01:27.719 --> 00:01:31.159
<v Speaker 2>have to establish the foundational architecture, because your choice here

28
00:01:31.519 --> 00:01:35.159
<v Speaker 2>it dictates pretty much everything that follows. Cloud services are

29
00:01:35.159 --> 00:01:38.439
<v Speaker 2>generally broken down into, let's say, three main categories based

30
00:01:38.480 --> 00:01:39.599
<v Speaker 2>on who's responsible for what.

31
00:01:39.560 --> 00:01:43.079
<v Speaker 1>Which dictates how much strategic headache you keep, basically. Yeah.

32
00:01:43.120 --> 00:01:44.680
<v Speaker 1>So walk us through that spectrum.

33
00:01:44.719 --> 00:01:47.560
<v Speaker 2>Okay, on one end, you've got IaaS. That's infrastructure as

34
00:01:47.599 --> 00:01:49.920
<v Speaker 2>a service. This is probably the most common starting point

35
00:01:49.920 --> 00:01:55.359
<v Speaker 2>for many folks. The provider gives you the bare infrastructure: servers, storage, networking,

36
00:01:55.680 --> 00:01:59.439
<v Speaker 2>but you, the customer, are responsible for managing the operating system,

37
00:01:59.560 --> 00:02:04.000
<v Speaker 2>the patches, the applications, basically everything above the hypervisor.

38
00:02:04.120 --> 00:02:06.200
<v Speaker 1>So IaaS gives you that freedom, you know, to run whatever

39
00:02:06.239 --> 00:02:09.400
<v Speaker 1>Linux distro you want, but you keep the strategic headache

40
00:02:09.400 --> 00:02:12.360
<v Speaker 1>of managing potentially hundreds of different OS layers.

41
00:02:12.680 --> 00:02:17.560
<v Speaker 2>Precisely. Then we move along to PaaS, platform as a service. Now,

42
00:02:17.680 --> 00:02:21.400
<v Speaker 2>this is often strategically superior, especially for new application development.

43
00:02:21.560 --> 00:02:25.240
<v Speaker 2>Well, here the provider handles the OS, the networking,

44
00:02:25.280 --> 00:02:28.599
<v Speaker 2>the databases, all that plumbing. You, the developer, just

45
00:02:28.680 --> 00:02:31.560
<v Speaker 2>focus purely on your code. It's essentially a ready to

46
00:02:31.639 --> 00:02:32.680
<v Speaker 2>use delivery environment.

47
00:02:32.759 --> 00:02:33.120
<v Speaker 1>Gotcha.

48
00:02:33.639 --> 00:02:37.560
<v Speaker 2>And finally there's SaaS, software as a service. This is

49
00:02:37.599 --> 00:02:41.879
<v Speaker 2>where you outsource, well, pretty much everything: infrastructure, software, updates.

50
00:02:42.319 --> 00:02:46.080
<v Speaker 2>The consumer has pretty limited control. Think of using something

51
00:02:46.120 --> 00:02:49.960
<v Speaker 2>like a CRM service or, you know, Adobe Creative Cloud

52
00:02:50.199 --> 00:02:51.039
<v Speaker 2>running on Azure.

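As a rough illustration of the spectrum just described, here is a minimal Python sketch that records, for each service model, which layers the customer manages. The layer names and the single cut-off point per model are simplifying assumptions drawn from this discussion; real provider responsibility matrices are more granular and vary by service.

```python
# Simplified stack, bottom to top (layer names are illustrative assumptions).
LAYERS = [
    "physical servers, storage and networking",
    "hypervisor",
    "operating system and patches",
    "databases and platform plumbing",
    "application code",
    "data",
]

# First layer the customer is responsible for under each model (simplified).
CUSTOMER_STARTS_AT = {
    "IaaS": "operating system and patches",
    "PaaS": "application code",
    "SaaS": "data",
}

def owner(model, layer):
    """Return 'customer' or 'provider' for a given service model and layer."""
    cutoff = LAYERS.index(CUSTOMER_STARTS_AT[model])
    return "customer" if LAYERS.index(layer) >= cutoff else "provider"

def customer_layers(model):
    """List every layer the customer manages under the given model."""
    return [layer for layer in LAYERS if owner(model, layer) == "customer"]
```

For example, customer_layers("IaaS") returns the operating system and everything above it, while customer_layers("SaaS") returns only the data layer.
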
53
00:02:51.280 --> 00:02:55.240
<v Speaker 1>Okay, so knowing where you sit on that IaaS, PaaS,

54
00:02:55.280 --> 00:02:58.199
<v Speaker 1>SaaS spectrum is crucial, and that brings us neatly to

55
00:02:58.240 --> 00:02:59.919
<v Speaker 1>our first principle. I think it does.

56
00:03:00.159 --> 00:03:04.560
<v Speaker 2>Principle one: understand which Linux VMs are adaptable to the cloud,

57
00:03:04.800 --> 00:03:07.199
<v Speaker 2>and the source material really stresses this. The very first

58
00:03:07.240 --> 00:03:10.319
<v Speaker 2>step must be a cloud readiness assessment. You can't

59
00:03:10.360 --> 00:03:12.479
<v Speaker 2>just assume everything you're currently running on-premises will work

60
00:03:12.479 --> 00:03:15.759
<v Speaker 2>efficiently or even cost-effectively in a virtualized cloud environment.

61
00:03:16.080 --> 00:03:18.719
<v Speaker 1>But, you know, IaaS is often seen as the

62
00:03:18.759 --> 00:03:22.680
<v Speaker 1>path of least resistance, just lift and shift. Why is

63
00:03:22.719 --> 00:03:24.960
<v Speaker 1>a formal assessment so critical?

64
00:03:25.120 --> 00:03:30.159
<v Speaker 2>Because failing to assess properly often leads to massive overspending

65
00:03:30.240 --> 00:03:33.520
<v Speaker 2>later on. It just does. The assessment forces you to

66
00:03:33.719 --> 00:03:38.120
<v Speaker 2>really analyze your existing workload patterns, your database requirements, and

67
00:03:38.159 --> 00:03:42.080
<v Speaker 2>that's the data that must guide your IaaS versus PaaS decision.

68
00:03:42.560 --> 00:03:45.520
<v Speaker 2>If your application can be refactored, maybe modernized a bit,

69
00:03:45.719 --> 00:03:48.840
<v Speaker 2>you could save huge amounts of money and operational effort

70
00:03:49.120 --> 00:03:50.800
<v Speaker 2>by moving it to PaaS instead.

71
00:03:51.000 --> 00:03:53.439
<v Speaker 1>And this changes the team structure too, doesn't it?

72
00:03:53.520 --> 00:03:58.199
<v Speaker 1>You mentioned needing fewer traditional sysadmins focused on physical kit

73
00:03:58.360 --> 00:04:00.800
<v Speaker 1>and more DevOps architects focused on automation and that kind

74
00:04:00.800 --> 00:04:01.479
<v Speaker 1>of thing.

75
00:04:01.479 --> 00:04:04.560
<v Speaker 2>Exactly. And this leads to that fundamental migration fork in the road.

76
00:04:04.639 --> 00:04:07.039
<v Speaker 2>You can lift and shift, just move your existing stack

77
00:04:07.120 --> 00:04:08.360
<v Speaker 2>without fundamental changes.

78
00:04:08.400 --> 00:04:09.360
<v Speaker 1>Quick and dirty.

79
00:04:09.360 --> 00:04:12.240
<v Speaker 2>Quick, maybe saves you three months of planning upfront, but

80
00:04:12.319 --> 00:04:15.039
<v Speaker 2>you likely pay, I don't know, maybe forty percent more

81
00:04:15.080 --> 00:04:17.480
<v Speaker 2>in the long run because those traditional VMs don't really

82
00:04:17.560 --> 00:04:20.439
<v Speaker 2>utilize cloud-native features like autoscaling very well.

83
00:04:20.639 --> 00:04:23.800
<v Speaker 1>So the strategic path, maybe a bit riskier upfront, is

84
00:04:24.360 --> 00:04:25.680
<v Speaker 1>architect before migration.

85
00:04:26.040 --> 00:04:29.439
<v Speaker 2>That's the path to long-term benefit. Yes, you modernize

86
00:04:29.480 --> 00:04:32.560
<v Speaker 2>the application, you upgrade it to use cloud APIs for

87
00:04:32.680 --> 00:04:37.759
<v Speaker 2>things like scaling and resilience. And to execute this effectively, you

88
00:04:37.879 --> 00:04:42.160
<v Speaker 2>need modern operations, what we often call immutable infrastructure. We're

89
00:04:42.160 --> 00:04:47.279
<v Speaker 2>talking continuous integration, continuous deployment, a CI/CD pipeline.

90
00:04:46.920 --> 00:04:48.680
<v Speaker 1>The whole DevOps toolchain, right.

91
00:04:48.560 --> 00:04:52.319
<v Speaker 2>Enabled by tools like Jenkins and Terraform, maybe running on Azure

92
00:04:52.519 --> 00:04:56.600
<v Speaker 2>Virtual Machine Scale Sets, VMSS, or the equivalent in other clouds.

93
00:04:56.439 --> 00:04:59.160
<v Speaker 1>And VMSS is key there, isn't it, because that's

94
00:04:59.199 --> 00:05:02.040
<v Speaker 1>the mechanism allowing those Linux VMs to automatically

95
00:05:02.120 --> 00:05:05.519
<v Speaker 1>multiply when demand spikes, giving you that true cloud elasticity

96
00:05:05.519 --> 00:05:07.560
<v Speaker 1>you talked about. Okay, so once we've done the

97
00:05:07.560 --> 00:05:10.759
<v Speaker 1>strategic planning, decided how we're building it, the next logical

98
00:05:10.800 --> 00:05:14.120
<v Speaker 1>step is ensuring that build is, well, rock solid, which

99
00:05:14.199 --> 00:05:16.480
<v Speaker 1>leads us directly to principle two: availability.

100
00:05:16.600 --> 00:05:20.399
<v Speaker 2>Principle two: define your workload's required availability. And this is

101
00:05:20.399 --> 00:05:23.160
<v Speaker 2>really where the cloud offers built in resilience that traditional

102
00:05:23.240 --> 00:05:27.079
<v Speaker 2>data centers often struggle to match cost-effectively. Providers

103
00:05:27.120 --> 00:05:32.160
<v Speaker 2>offer this through geographically isolated regions and, within those regions,

104
00:05:32.240 --> 00:05:36.759
<v Speaker 2>availability zones, or AZs. Think of AZs as physically separate

105
00:05:36.839 --> 00:05:38.879
<v Speaker 2>data centers within a region.

106
00:05:38.959 --> 00:05:41.639
<v Speaker 1>Okay, and within a single data center, we need to talk about

107
00:05:41.680 --> 00:05:46.639
<v Speaker 1>logical constructs like availability sets, particularly in Azure. They're designed

108
00:05:46.680 --> 00:05:50.120
<v Speaker 1>to spread risk across the physical hardware. But this

109
00:05:50.160 --> 00:05:52.000
<v Speaker 1>is where it gets a little abstract for some.

110
00:05:52.000 --> 00:05:55.360
<v Speaker 1>how should we think about fault domains and update domains?

111
00:05:55.560 --> 00:05:58.600
<v Speaker 2>Okay, let's use a simple analogy. Think of availability sets

112
00:05:58.600 --> 00:06:01.360
<v Speaker 2>as a promise from the provider that your critical VMs

113
00:06:01.439 --> 00:06:03.720
<v Speaker 2>aren't all sitting on the same power strip or the

114
00:06:03.720 --> 00:06:05.000
<v Speaker 2>same network switch, essentially.

115
00:06:04.800 --> 00:06:07.319
<v Speaker 1>Right, not all eggs in one basket.

116
00:06:07.360 --> 00:06:10.079
<v Speaker 2>Exactly. Fault domains are groups of resources that share a common

117
00:06:10.120 --> 00:06:13.680
<v Speaker 2>power source and network switch, so if that physical rack

118
00:06:13.800 --> 00:06:16.759
<v Speaker 2>goes down, everything in that fault domain potentially fails together.

119
00:06:17.040 --> 00:06:20.040
<v Speaker 1>So it's like having your application VMs distributed across, say,

120
00:06:20.560 --> 00:06:23.639
<v Speaker 1>two entirely separate server racks in the same data center

121
00:06:23.680 --> 00:06:24.439
<v Speaker 1>building.

122
00:06:24.480 --> 00:06:26.879
<v Speaker 2>Precisely. And then update domains are groups of resources that the

123
00:06:26.879 --> 00:06:30.279
<v Speaker 2>cloud provider patches and updates together during planned maintenance. You

124
00:06:30.319 --> 00:06:34.720
<v Speaker 2>want your critical application components distributed across multiple update domains

125
00:06:34.800 --> 00:06:37.680
<v Speaker 2>so a single routine maintenance event doesn't take down your

126
00:06:37.800 --> 00:06:41.600
<v Speaker 2>entire service. It's basically your insurance policy against both unexpected

127
00:06:41.600 --> 00:06:43.480
<v Speaker 2>physical failure and planned maintenance.

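The fault-domain and update-domain idea can be sketched in a few lines of Python. This assumes an availability set with 2 fault domains and 5 update domains and a simple round-robin placement; the platform makes the actual assignment, so the counts, VM names and helpers here are illustrative assumptions. The check captures the point being made: as long as VMs land in at least two fault domains and at least two update domains, no single rack failure or maintenance batch takes every VM offline at once.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Placement:
    vm: str
    fault_domain: int   # shares a power source and network switch
    update_domain: int  # patched together during planned maintenance

def place_round_robin(vms, fault_domains=2, update_domains=5):
    """Spread VMs across fault and update domains in round-robin order (illustrative)."""
    return [
        Placement(vm, i % fault_domains, i % update_domains)
        for i, vm in enumerate(vms)
    ]

def survives_any_single_outage(placements):
    """True if losing any one fault domain, or any one update domain, leaves a VM running."""
    fault_domains = {p.fault_domain for p in placements}
    update_domains = {p.update_domain for p in placements}
    return len(fault_domains) >= 2 and len(update_domains) >= 2
```

With three VMs, place_round_robin(["web-0", "web-1", "web-2"]) puts them in fault domains 0, 1, 0 and update domains 0, 1, 2, so the check passes; a single VM can never pass it.
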
128
00:06:43.519 --> 00:06:47.199
<v Speaker 1>That makes sense. Beyond just protecting against failure, though, we

129
00:06:47.279 --> 00:06:50.439
<v Speaker 1>need to handle incoming demand, spread the load. That's where

130
00:06:50.480 --> 00:06:51.360
<v Speaker 1>load balancing comes in.

131
00:06:51.480 --> 00:06:55.279
<v Speaker 2>Oh, absolutely essential for both availability and scaling. And you

132
00:06:55.319 --> 00:06:58.639
<v Speaker 2>need to distinguish between network load balancers, which operate at

133
00:06:58.720 --> 00:07:01.560
<v Speaker 2>layer four, routing traffic based on IP address and port,

134
00:07:02.319 --> 00:07:05.360
<v Speaker 2>and application load balancers, which work at layer seven, looking

135
00:07:05.399 --> 00:07:10.279
<v Speaker 2>at application data like HTTP request headers. Strategically, you often want

136
00:07:10.319 --> 00:07:13.240
<v Speaker 2>the layer seven balancers because they can often incorporate a

137
00:07:13.399 --> 00:07:18.160
<v Speaker 2>Web Application Firewall, or WAF.

138
00:07:16.800 --> 00:07:18.360
<v Speaker 1>Adding a security layer right there.

139
00:07:18.560 --> 00:07:20.879
<v Speaker 2>Exactly. It adds a layer of defense right at the

140
00:07:20.879 --> 00:07:23.879
<v Speaker 2>front door, filtering out known web exploits before they even

141
00:07:23.920 --> 00:07:25.079
<v Speaker 2>reach your Linux VMs.

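To make the layer four versus layer seven distinction concrete, here is a toy Python model; it is not how any provider's load balancer is implemented. The backend addresses, the path-based pools and the two blocked patterns standing in for WAF rules are all hypothetical. The layer four function only sees the connection tuple, while the layer seven function reads the HTTP request, so it can both route by path and reject an obviously malicious request before it reaches a VM.

```python
import hashlib

# Hypothetical backend VM addresses and path-based pools.
BACKENDS = ["10.0.1.4", "10.0.1.5", "10.0.1.6"]
ROUTES = {"/api/": ["10.0.2.4"], "/": BACKENDS}
# Two toy patterns standing in for WAF rules; real WAF rule sets are far larger.
BLOCKED_PATTERNS = ("../", "<script")

def _pick(pool, key):
    """Deterministically choose a pool member by hashing a key."""
    digest = hashlib.sha256(key.encode()).digest()
    return pool[int.from_bytes(digest[:8], "big") % len(pool)]

def route_layer4(src_ip, src_port, dst_port):
    """Layer 4: only the connection tuple is visible; the payload is never inspected."""
    return _pick(BACKENDS, f"{src_ip}:{src_port}:{dst_port}")

def route_layer7(path, headers):
    """Layer 7: read the HTTP request, apply toy WAF checks, then route by path prefix."""
    request_text = (path + " " + " ".join(headers.values())).lower()
    if any(pattern in request_text for pattern in BLOCKED_PATTERNS):
        return None  # rejected at the front door, never reaches a VM
    for prefix in sorted(ROUTES, key=len, reverse=True):  # longest prefix first
        if path.startswith(prefix):
            return _pick(ROUTES[prefix], headers.get("Host", "") + path)
    return None
```

A request for /api/orders always lands on the API pool, while a path containing ../ is dropped at layer seven; the layer four function would have forwarded both without looking.
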
142
00:07:25.319 --> 00:07:29.079
<v Speaker 1>Good point. Now, you mentioned resilience earlier. If durability is

143
00:07:29.160 --> 00:07:32.240
<v Speaker 1>kind of the default in cloud storage, where do customers

144
00:07:32.240 --> 00:07:34.759
<v Speaker 1>often slip up with storage redundancy?

145
00:07:34.439 --> 00:07:37.079
<v Speaker 2>Well, they often fail by relying only on the baseline.

146
00:07:37.120 --> 00:07:41.199
<v Speaker 2>Default cloud storage usually offers incredible durability within

147
00:07:41.240 --> 00:07:44.920
<v Speaker 2>a single data center. Think eleven nines, that's 99.999999999

148
00:07:45.040 --> 00:07:50.480
<v Speaker 2>percent durability. That's what's called locally redundant

149
00:07:50.519 --> 00:07:55.160
<v Speaker 2>storage, LRS, which sounds amazing, and it is, for hardware failure within

150
00:07:55.199 --> 00:07:58.079
<v Speaker 2>that data center. But eleven nines means absolutely nothing if

151
00:07:58.120 --> 00:08:01.199
<v Speaker 2>a regional natural disaster like a flood or a major

152
00:08:01.279 --> 00:08:03.920
<v Speaker 2>power outage, takes out the entire physical site.

153
00:08:04.240 --> 00:08:07.040
<v Speaker 1>Right, LRS won't save you from a regional catastrophe. That's

154
00:08:07.079 --> 00:08:08.759
<v Speaker 1>where you need the geographical separation.

155
00:08:08.959 --> 00:08:12.839
<v Speaker 2>Yes, precisely. For maximum data safety against a major site or regional event,

156
00:08:13.040 --> 00:08:17.160
<v Speaker 2>the source strongly recommends using geo-redundant storage, GRS, or zone-

157
00:08:17.160 --> 00:08:21.519
<v Speaker 2>redundant storage, ZRS. These replicate your data across multiple geographically

158
00:08:21.519 --> 00:08:24.720
<v Speaker 2>separated zones or even regions. It's a critical and

159
00:08:24.800 --> 00:08:28.120
<v Speaker 2>usually relatively cheap insurance policy against that kind of catastrophic

160
00:08:28.160 --> 00:08:31.000
<v Speaker 2>regional failure. Don't skip it for important data.

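A simplified way to reason about the LRS, ZRS and GRS choice is to ask which failure scope each option still keeps a copy of the data outside of. The sketch below encodes that as an ordered scale. It is a modelling assumption that ignores details such as GRS replicating to the secondary region asynchronously, which means the most recent writes can be lost in a regional failover.

```python
from enum import IntEnum

class FailureScope(IntEnum):
    """Ordered from narrowest to widest blast radius."""
    HARDWARE = 1     # a disk, server or rack inside one data center
    DATA_CENTER = 2  # an entire facility or availability zone
    REGION = 3       # a regional disaster affecting every zone

# Widest failure each redundancy option keeps a copy outside of (simplified).
TOLERATES = {
    "LRS": FailureScope.HARDWARE,     # all copies inside a single data center
    "ZRS": FailureScope.DATA_CENTER,  # copies spread across availability zones in one region
    "GRS": FailureScope.REGION,       # copies also replicated to a secondary region
}

def data_survives(option, event):
    """True if some copy of the data sits outside the failed scope."""
    return TOLERATES[option] >= event
```

Here data_survives("LRS", FailureScope.REGION) is False no matter how many nines LRS advertises, which is exactly the trap with relying on the baseline.
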
161
00:08:31.120 --> 00:08:34.879
<v Speaker 1>Okay. So we've planned the migration, we've built resilient infrastructure

162
00:08:34.960 --> 00:08:38.000
<v Speaker 1>using AZs and redundancy. Now we need eyes on the

163
00:08:38.039 --> 00:08:42.159
<v Speaker 1>whole operation, right? That brings us to principle three: monitor

164
00:08:42.279 --> 00:08:46.679
<v Speaker 1>your applications running on Linux across the entire stack. You

165
00:08:46.720 --> 00:08:49.960
<v Speaker 1>mentioned earlier this paradigm shift away from just monitoring server health.

166
00:08:50.759 --> 00:08:53.320
<v Speaker 1>Why is the cloud provider's involvement so crucial here?

167
00:08:54.039 --> 00:08:57.879
<v Speaker 2>Because the cloud provider is already monitoring the underlying infrastructure health,

168
00:08:58.279 --> 00:09:02.320
<v Speaker 2>the physical host machine, the hypervisor layer. Your job

169
00:09:02.519 --> 00:09:06.960
<v Speaker 2>as the customer shifts almost entirely toward application performance monitoring,

170
00:09:07.000 --> 00:09:09.519
<v Speaker 2>APM, and toward the end-user experience.

171
00:09:09.799 --> 00:09:12.200
<v Speaker 1>So less about CPU on the box, more about how

172
00:09:12.240 --> 00:09:13.679
<v Speaker 1>quickly the web page loads for the user.

173
00:09:13.600 --> 00:09:16.360
<v Speaker 2>Exactly. And think about serverless functions like

174
00:09:16.399 --> 00:09:19.480
<v Speaker 2>Azure Functions or AWS Lambda. You don't even have a

175
00:09:19.519 --> 00:09:22.200
<v Speaker 2>server to monitor in the traditional sense. You're just monitoring

176
00:09:22.279 --> 00:09:24.360
<v Speaker 2>the execution of these little chunks of code.

177
00:09:25.559 --> 00:09:28.600
<v Speaker 1>But this must create an incredibly fragmented view, mustn't it?

178
00:09:28.960 --> 00:09:31.399
<v Speaker 1>Especially if you're in a complex hybrid setup or using

179
00:09:31.480 --> 00:09:32.279
<v Speaker 1>multiple clouds.

180
00:09:32.720 --> 00:09:36.080
<v Speaker 2>Oh, it creates enormous challenges. You often lack that unified

181
00:09:36.159 --> 00:09:40.240
<v Speaker 2>visibility across all your resources. You're dealing with different cloud

182
00:09:40.279 --> 00:09:44.039
<v Speaker 2>specific tools, Azure Monitor here, AWS CloudWatch there,

183
00:09:44.399 --> 00:09:48.399
<v Speaker 2>maybe something else on-prem, and everything is dynamically scaling

184
00:09:48.480 --> 00:09:52.080
<v Speaker 2>up and down. If a VM instance only lives for, say,

185
00:09:52.279 --> 00:09:55.840
<v Speaker 2>thirty minutes during a peak, and then disappears, how do

186
00:09:55.879 --> 00:09:59.639
<v Speaker 2>you effectively track its performance history or troubleshoot what happened?

187
00:10:00.200 --> 00:10:03.519
<v Speaker 1>Good question. Let's drill down a bit for the Linux administrators listening.

188
00:10:03.720 --> 00:10:06.759
<v Speaker 1>What specific metrics become even more crucial to watch in

189
00:10:06.799 --> 00:10:09.039
<v Speaker 1>the cloud context compared to on-premises?

190
00:10:09.240 --> 00:10:12.960
<v Speaker 2>Okay, we need deep insight. So when looking at CPU usage,

191
00:10:13.000 --> 00:10:16.200
<v Speaker 2>it's vital to distinguish between user time, that's your application running,

192
00:10:16.240 --> 00:10:19.360
<v Speaker 2>and privileged time or system time, which is the kernel

193
00:10:19.440 --> 00:10:20.039
<v Speaker 2>doing work.

194
00:10:20.240 --> 00:10:22.279
<v Speaker 1>Why is that distinction so important now?

195
00:10:22.559 --> 00:10:25.559
<v Speaker 2>Because if you see consistently high privileged time, it often

196
00:10:25.600 --> 00:10:29.519
<v Speaker 2>indicates poor performance caused by the underlying hypervisor or maybe

197
00:10:29.519 --> 00:10:33.639
<v Speaker 2>noisy neighbors on the physical host. That's potentially the provider's problem,

198
00:10:33.679 --> 00:10:36.799
<v Speaker 2>not your application code. Knowing that difference helps you open

199
00:10:36.840 --> 00:10:38.159
<v Speaker 2>the right kind of support ticket.

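On a Linux VM, the raw numbers behind user and system time live in the aggregate cpu line of /proc/stat, as cumulative clock ticks since boot. The sketch below turns two samples into percentages. One refinement worth knowing: hypervisor contention from noisy neighbors usually shows up most directly in the steal column (time the virtual CPU wanted to run while the host was serving someone else), so the sketch reports that too. Grouping irq and softirq with system time is a choice made here, not a standard.

```python
CPU_FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]

def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into cumulative tick counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    # guest and guest_nice are already counted inside user and nice, so they are skipped.
    return dict(zip(CPU_FIELDS, (int(v) for v in fields[1:9])))

def cpu_breakdown(before, after):
    """Percent of elapsed CPU time in user, system (incl. irq/softirq) and steal."""
    delta = {name: after[name] - before[name] for name in before}
    total = sum(delta.values())
    if total <= 0:
        return {"user": 0.0, "system": 0.0, "steal": 0.0}
    return {
        "user": 100.0 * (delta["user"] + delta["nice"]) / total,
        "system": 100.0 * (delta["system"] + delta["irq"] + delta["softirq"]) / total,
        "steal": 100.0 * delta["steal"] / total,
    }

def read_cpu_sample(path="/proc/stat"):
    """Read the first line of /proc/stat, which holds the all-CPU totals."""
    with open(path) as f:
        return parse_cpu_line(f.readline())
```

On a live VM, call read_cpu_sample() twice, a few seconds apart, and pass both results to cpu_breakdown(); a steal share that stays high is the kind of evidence that supports a ticket with the provider.
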
200
00:10:38.480 --> 00:10:41.639
<v Speaker 1>Ah, that's a great example of how monitoring helps navigate

201
00:10:41.639 --> 00:10:43.720
<v Speaker 1>that shared responsibility model. What else?

202
00:10:44.039 --> 00:10:48.200
<v Speaker 2>Absolutely. For disk, you must track input/output operations

203
00:10:48.240 --> 00:10:52.799
<v Speaker 2>per second, IOPS, especially with Linux file systems. Hitting IOPS

204
00:10:52.840 --> 00:10:56.399
<v Speaker 2>limits is a common bottleneck. If your IOPS are spiking,

205
00:10:56.840 --> 00:10:59.279
<v Speaker 2>you probably need to scale up your storage tier, maybe

206
00:10:59.320 --> 00:11:02.120
<v Speaker 2>get faster disks, not just make the VM bigger.

207
00:11:02.240 --> 00:11:04.480
<v Speaker 1>Got it. Disk speed, not just size, right?

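Measured IOPS can be derived from /proc/diskstats, where the fourth and eighth fields of each line count completed reads and writes since boot. This Python sketch compares two samples and flags when the rate approaches a provisioned limit. The 90 percent threshold and any limit you pass in are assumptions, since the real cap depends on the disk type and, in many clouds, on the VM size as well.

```python
def parse_diskstats(text):
    """Map device name to (reads_completed, writes_completed) from /proc/diskstats content."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue
        # fields: major, minor, device name, reads completed, reads merged,
        # sectors read, ms reading, writes completed, ...
        counters[fields[2]] = (int(fields[3]), int(fields[7]))
    return counters

def measured_iops(before, after, device, interval_seconds):
    """Completed read plus write operations per second between two samples."""
    reads_before, writes_before = before[device]
    reads_after, writes_after = after[device]
    return ((reads_after - reads_before) + (writes_after - writes_before)) / interval_seconds

def near_provisioned_limit(iops, provisioned_iops, threshold=0.9):
    """Flag sustained IOPS at or above threshold * the disk's provisioned cap."""
    return iops >= threshold * provisioned_iops

def read_diskstats(path="/proc/diskstats"):
    with open(path) as f:
        return parse_diskstats(f.read())
```

If a disk provisioned for, say, 500 IOPS keeps reporting close to 500, adding vCPUs will not help; a faster storage tier will.
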
208
00:11:04.840 --> 00:11:07.600
<v Speaker 2>And critically, for memory utilization, you need to track paging

209
00:11:07.639 --> 00:11:11.480
<v Speaker 2>events or swap activity. Excessive paging, where the VM is

211
00:11:11.559 --> 00:11:13.919
<v Speaker 2>constantly swapping memory out to disk because it doesn't have

211
00:11:14.000 --> 00:11:17.519
<v Speaker 2>enough RAM is probably the clearest, most unambiguous sign that

212
00:11:17.600 --> 00:11:20.720
<v Speaker 2>performance is tanking and you need more memory capacity for

213
00:11:20.759 --> 00:11:21.399
<v Speaker 2>that workload.

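Swap activity can be read from the pswpin and pswpout counters in /proc/vmstat, which count pages swapped in and out since boot (the same activity vmstat summarizes in its si and so columns). This sketch turns two samples into a pages-per-second rate. The alert threshold is a placeholder assumption to tune per workload: occasional swapping is normal, and it is sustained activity that signals a memory shortfall.

```python
def parse_vmstat(text):
    """Parse /proc/vmstat content ('name value' per line) into a dict of ints."""
    values = {}
    for line in text.splitlines():
        name, _, value = line.partition(" ")
        if value.strip():
            values[name] = int(value)
    return values

def swap_pages_per_second(before, after, interval_seconds):
    """Pages swapped in plus pages swapped out per second between two samples."""
    swapped = (after["pswpin"] - before["pswpin"]) + (after["pswpout"] - before["pswpout"])
    return swapped / interval_seconds

def needs_more_memory(rate, threshold_pages_per_second=50.0):
    """Hypothetical rule: sustained swapping above the threshold suggests too little RAM."""
    return rate > threshold_pages_per_second
```

Sample /proc/vmstat twice over a known interval and feed both snapshots in; a rate that stays above the threshold across several intervals is the signal to move to a size with more RAM.
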
214
00:11:21.559 --> 00:11:24.480
<v Speaker 1>Okay, clear indicators there. So the native vendor tools like

215
00:11:24.559 --> 00:11:27.440
<v Speaker 1>Azure Monitor or CloudWatch, they give you data on

216
00:11:27.519 --> 00:11:30.600
<v Speaker 1>individual servers or services, but they don't necessarily give you

217
00:11:30.639 --> 00:11:34.720
<v Speaker 1>that unified enterprise-wide dashboard view, especially if you've got

218
00:11:34.720 --> 00:11:37.639
<v Speaker 1>that multi-cloud or hybrid reality.

219
00:11:37.639 --> 00:11:40.360
<v Speaker 2>Exactly. They're great for their own ecosystems, but they don't naturally

220
00:11:40.399 --> 00:11:43.519
<v Speaker 2>talk to each other or integrate with your on-prem

221
00:11:43.559 --> 00:11:48.279
<v Speaker 2>tools. To gain that truly comprehensive, uniform view across everything,

222
00:11:48.600 --> 00:11:52.559
<v Speaker 2>the sources highly recommend integrating third-party monitoring tools. Think

223
00:11:53.000 --> 00:11:56.200
<v Speaker 2>Datadog, Dynatrace, New Relic, tools like that.

224
00:11:56.320 --> 00:11:57.559
<v Speaker 1>And what do they bring to the table.

225
00:11:57.840 --> 00:12:01.440
<v Speaker 2>They specialize in collecting and correlating metrics from every layer

226
00:12:01.480 --> 00:12:04.039
<v Speaker 2>of the architecture, from the database queries up to the

227
00:12:04.279 --> 00:12:07.840
<v Speaker 2>load balancer response times, maybe even front-end user experience,

228
00:12:08.080 --> 00:12:11.200
<v Speaker 2>regardless of which cloud vendor or which data center things

229
00:12:11.200 --> 00:12:14.679
<v Speaker 2>are sitting on. That unified visibility is really the difference

230
00:12:14.679 --> 00:12:18.360
<v Speaker 2>between proactive management and constantly just reacting to outages after

231
00:12:18.399 --> 00:12:18.879
<v Speaker 2>they happen.

232
00:12:19.080 --> 00:12:22.039
<v Speaker 1>Okay, that makes sense. Let's shift gears now to defensive

233
00:12:22.039 --> 00:12:26.120
<v Speaker 1>protection with principle four: ensure your Linux VMs are secure

234
00:12:26.159 --> 00:12:29.519
<v Speaker 1>and backed up. Now, you mentioned shared responsibility earlier, and

235
00:12:29.600 --> 00:12:31.879
<v Speaker 1>you said, if you take only one concept away from

236
00:12:31.879 --> 00:12:34.000
<v Speaker 1>this whole deep dive, it should be the shared security

237
00:12:34.039 --> 00:12:36.200
<v Speaker 1>responsibility model. Let's really nail this down.

238
00:12:36.200 --> 00:12:39.240
<v Speaker 2>We have to. This is probably the single most

239
00:12:39.279 --> 00:12:43.399
<v Speaker 2>misunderstood concept in cloud and where customers frankly fail all

240
00:12:43.440 --> 00:12:45.840
<v Speaker 2>the time. Let's clearly define the line in the sand.

241
00:12:46.399 --> 00:12:49.519
<v Speaker 2>The cloud provider is responsible for security of the cloud.

242
00:12:49.720 --> 00:12:51.320
<v Speaker 1>Okay, of the cloud meaning?

243
00:12:51.320 --> 00:12:54.080
<v Speaker 2>The physical security of the data centers, the security of the

244
00:12:54.120 --> 00:12:58.120
<v Speaker 2>global network infrastructure, the security of their managed services like

245
00:12:58.200 --> 00:13:01.519
<v Speaker 2>the hypervisor or the storage fabric. They secure the building

246
00:13:01.519 --> 00:13:03.120
<v Speaker 2>and its core systems.

247
00:13:03.039 --> 00:13:06.679
<v Speaker 1>Right. And the cloud customer is responsible for security in the cloud.

248
00:13:06.200 --> 00:13:08.679
<v Speaker 2>Exactly. Security in the cloud. This means

249
00:13:08.720 --> 00:13:11.320
<v Speaker 2>your customer data, the security of the operating systems you

250
00:13:11.399 --> 00:13:15.879
<v Speaker 2>choose to run, like Linux, patching those OSes, configuring firewalls,

251
00:13:15.919 --> 00:13:21.039
<v Speaker 2>managing your application security, identity and access management, IAM, and

252
00:13:21.080 --> 00:13:25.159
<v Speaker 2>crucially, encryption. You secure everything you put inside the building.

253
00:13:25.399 --> 00:13:27.600
<v Speaker 1>And you specifically called out encryption there. Why?

254
00:13:27.679 --> 00:13:32.039
<v Speaker 2>Because customers often forget that last part. Encryption of

255
00:13:32.120 --> 00:13:35.080
<v Speaker 2>data at rest on your VMS or in your databases

256
00:13:35.120 --> 00:13:38.519
<v Speaker 2>is, by default, almost always the customer's job. The provider

257
00:13:38.600 --> 00:13:40.759
<v Speaker 2>gives you the tools, but you have to turn them

258
00:13:40.759 --> 00:13:41.879
<v Speaker 2>on and manage the keys.

259
00:13:42.200 --> 00:13:44.879
<v Speaker 1>Okay, so to fulfill your side of the bargain, you

260
00:13:44.960 --> 00:13:47.679
<v Speaker 1>need to leverage the tools the provider gives you, like

261
00:13:47.960 --> 00:13:53.480
<v Speaker 1>network security groups, cloud firewalls, strong IAM controls, using things

262
00:13:53.519 --> 00:13:57.240
<v Speaker 1>like virtual private clouds, VPCs, or VNets for

263
00:13:57.399 --> 00:13:58.720
<v Speaker 1>logical network isolation.

264
00:13:59.039 --> 00:14:02.080
<v Speaker 2>Absolutely, those are your primary tools for securing things in

265
00:14:02.120 --> 00:14:02.559
<v Speaker 2>the cloud.

266
00:14:03.000 --> 00:14:07.679
<v Speaker 1>Now, let's connect security with disaster recovery, or DR. When

267
00:14:07.720 --> 00:14:10.600
<v Speaker 1>we're planning for DR, we always talk about RTO and RPO.

268
00:14:10.639 --> 00:14:11.759
<v Speaker 1>Can you quickly define those?

269
00:14:11.879 --> 00:14:15.320
<v Speaker 2>Sure. RTO, that's the recovery time objective. It's the maximum

270
00:14:15.360 --> 00:14:18.679
<v Speaker 2>acceptable time allowed to restore your service after a disaster hits.

271
00:14:19.000 --> 00:14:20.559
<v Speaker 2>How fast you need to be back online.

272
00:14:20.759 --> 00:14:21.080
<v Speaker 1>Okay.

273
00:14:21.200 --> 00:14:25.639
<v Speaker 2>And RPO, the recovery point objective. That's the maximum acceptable

274
00:14:25.639 --> 00:14:28.519
<v Speaker 2>amount of data loss, usually measured in time. Like, can

275
00:14:28.519 --> 00:14:30.840
<v Speaker 2>you afford to lose the last hour of data or

276
00:14:30.879 --> 00:14:32.000
<v Speaker 2>only the last five minutes?

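RTO and RPO become testable once a DR plan is written down as numbers. The sketch below is a simplified model with hypothetical inputs: worst-case data loss is taken as the interval between recoverable copies plus any replication lag, and recovery time as the time to provision compute at the recovery site plus the time to restore and start the application.

```python
from dataclasses import dataclass
from datetime import timedelta

@dataclass(frozen=True)
class DrPlan:
    copy_interval: timedelta    # how often a recoverable copy (backup or replica) is taken
    replication_lag: timedelta  # delay before that copy is usable at the recovery site
    provision_time: timedelta   # time to spin up compute at the recovery site
    restore_time: timedelta     # time to restore data and start the application

def worst_case_data_loss(plan):
    """Worst case: failure hits just before the latest copy lands, so recovery uses the previous one."""
    return plan.copy_interval + plan.replication_lag

def estimated_recovery_time(plan):
    return plan.provision_time + plan.restore_time

def meets_objectives(plan, rto, rpo):
    """True if the plan's estimated recovery time and data loss fit within RTO and RPO."""
    return estimated_recovery_time(plan) <= rto and worst_case_data_loss(plan) <= rpo
```

A plan that copies every 15 minutes with 5 minutes of lag, and needs 20 minutes to provision plus 25 to restore, meets a one-hour RTO with a 30-minute RPO but not with a 5-minute RPO.
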
277
00:14:32.120 --> 00:14:35.000
<v Speaker 1>Right. I remember, you know, ten or fifteen years ago, running

278
00:14:35.000 --> 00:14:38.159
<v Speaker 1>a DR drill was this massive annual event. It cost

279
00:14:38.200 --> 00:14:41.559
<v Speaker 1>a fortune because you were essentially paying for idle hot

280
00:14:41.639 --> 00:14:46.360
<v Speaker 1>standby physical infrastructure sitting in a dedicated secondary site just waiting for disaster.

281
00:14:46.080 --> 00:14:49.759
<v Speaker 2>Exactly. Millions, sometimes, just for that insurance.

282
00:14:49.679 --> 00:14:53.279
<v Speaker 1>And the cloud changes the entire financial stress test of that situation,

283
00:14:53.360 --> 00:14:53.840
<v Speaker 1>doesn't it?

284
00:14:53.840 --> 00:14:57.440
<v Speaker 2>It dramatically shifts that RTO-cost trade-off curve, really.

285
00:14:57.919 --> 00:15:01.799
<v Speaker 2>Because cloud elasticity allows you to quickly provision compute resources

286
00:15:01.840 --> 00:15:04.919
<v Speaker 2>only when the recovery is actually needed, not paying for

287
00:15:04.960 --> 00:15:07.879
<v Speaker 2>them to sit idle twenty-four seven, you can

288
00:15:07.960 --> 00:15:11.759
<v Speaker 2>often achieve a much faster recovery time, a shorter RTO

289
00:15:12.080 --> 00:15:15.759
<v Speaker 2>at significantly reduced infrastructure cost compared to those traditional

290
00:15:15.799 --> 00:15:20.240
<v Speaker 2>dedicated DR sites. Basically, you can often afford faster recovery

291
00:15:20.240 --> 00:15:22.879
<v Speaker 2>metrics because the hardware effectively sits powered off in the

292
00:15:22.879 --> 00:15:25.360
<v Speaker 2>cloud until you declare a disaster and need to spin it up.

293
00:15:25.679 --> 00:15:29.080
<v Speaker 1>So modernizing backup and DR in the cloud means maybe

294
00:15:29.120 --> 00:15:33.120
<v Speaker 1>outsourcing the whole backup process via managed services like Azure

295
00:15:33.120 --> 00:15:37.720
<v Speaker 1>Backup or AWS Backup, and using built-in replication tools

296
00:15:37.720 --> 00:15:40.480
<v Speaker 1>maybe like Azure Site Recovery or CloudEndure, which can

297
00:15:40.519 --> 00:15:43.960
<v Speaker 1>effectively eliminate the need for that second expensive physical data

298
00:15:44.000 --> 00:15:45.720
<v Speaker 1>center altogether for many workloads.

299
00:15:45.799 --> 00:15:47.320
<v Speaker 2>That's exactly the modern approach. Yes.

300
00:15:47.399 --> 00:15:50.919
<v Speaker 1>Okay, that brings us to our final and arguably

301
00:15:51.000 --> 00:15:58.399
<v Speaker 1>most strategic capstone, principle five. Governance often sounds like boring paperwork,

302
00:15:58.559 --> 00:16:00.799
<v Speaker 1>but you suggested it's actually one of the most complex

303
00:16:00.840 --> 00:16:02.960
<v Speaker 1>parts of moving to and operating in the cloud. Why is that?

304
00:16:03.000 --> 00:16:07.320
<v Speaker 2>Because the cloud, by its nature, abstracts

305
00:16:07.320 --> 00:16:11.000
<v Speaker 2>location and control in ways that introduce massive new complexities,

306
00:16:11.360 --> 00:16:15.559
<v Speaker 2>especially around legal issues, compliance and data disclosure regulations. The

307
00:16:15.600 --> 00:16:19.320
<v Speaker 2>source specifically highlights things like data sovereignty laws.

308
00:16:19.080 --> 00:16:22.679
<v Speaker 1>Ah, right, the rules that say data belonging to citizens

309
00:16:22.720 --> 00:16:25.600
<v Speaker 1>of a certain country must physically remain stored within that

310
00:16:25.559 --> 00:16:29.120
<v Speaker 2>country's borders. Exactly. So if a cloud provider has regions

311
00:16:29.159 --> 00:16:32.360
<v Speaker 2>all over the world, you, as the architect or administrator,

312
00:16:32.399 --> 00:16:35.080
<v Speaker 2>have the responsibility to ensure that the data for, say,

313
00:16:35.399 --> 00:16:38.759
<v Speaker 2>your German customers is provisioned only in an EU region

314
00:16:38.840 --> 00:16:42.559
<v Speaker 2>such as Frankfurt in Germany, and isn't accidentally replicated or backed

315
00:16:42.639 --> 00:16:45.240
<v Speaker 2>up to a US region, for instance. That requires careful

316
00:16:45.240 --> 00:16:46.200
<v Speaker 2>governance policies.

317
00:16:46.559 --> 00:16:50.559
<v Speaker 1>And beyond just the legal complexity, there's often a customer concern,

318
00:16:50.720 --> 00:16:54.360
<v Speaker 1>isn't there, about trusting the provider with sensitive data, given

319
00:16:54.399 --> 00:16:57.519
<v Speaker 1>the shared nature of the resources and maybe having less

320
00:16:57.559 --> 00:17:00.960
<v Speaker 1>direct visibility compared to their old on-premises environments?

321
00:17:01.080 --> 00:17:04.119
<v Speaker 2>That's a huge factor. You're relying on the provider's security

322
00:17:04.200 --> 00:17:08.200
<v Speaker 2>for the underlying layers. You're on shared hardware. It requires

323
00:17:08.200 --> 00:17:10.279
<v Speaker 2>a different level of trust and verification.

324
00:17:10.960 --> 00:17:15.039
<v Speaker 1>So how do you, as the customer, maintain strategic control

325
00:17:15.279 --> 00:17:17.960
<v Speaker 1>and ensure compliance, and manage that trust?

326
00:17:18.000 --> 00:17:22.240
<v Speaker 2>Through rigorous governance mechanisms provided by the cloud platform. We're

327
00:17:22.240 --> 00:17:26.119
<v Speaker 2>talking about strict role-based access control, or RBAC, making sure

328
00:17:26.160 --> 00:17:28.799
<v Speaker 2>people only have the minimum permissions they need. We're talking

329
00:17:28.799 --> 00:17:33.039
<v Speaker 2>about network security groups and policies, and, crucially, using hierarchical

330
00:17:33.039 --> 00:17:34.000
<v Speaker 2>account provisioning.

331
00:17:34.160 --> 00:17:34.920
<v Speaker 1>What do you mean by that?

332
00:17:35.079 --> 00:17:39.759
<v Speaker 2>Dividing your potentially sprawling cloud resources into logical containers: separate departments,

333
00:17:39.759 --> 00:17:43.279
<v Speaker 2>different projects, distinct subscriptions or accounts. This allows you to

334
00:17:43.359 --> 00:17:47.079
<v Speaker 2>ring-fence costs, apply specific security policies only where needed,

335
00:17:47.359 --> 00:17:51.000
<v Speaker 2>and manage access at scale. It's fundamental to staying organized

336
00:17:51.000 --> 00:17:51.519
<v Speaker 2>and secure.

337
00:17:51.759 --> 00:17:54.559
<v Speaker 1>Okay. And finally, when it comes to trusting the global

338
00:17:54.599 --> 00:17:58.799
<v Speaker 1>cloud vendor, you can't exactly send your own auditors to

339
00:17:58.839 --> 00:18:03.359
<v Speaker 1>physically inspect their massive, highly secured data centers around the world.

340
00:18:03.680 --> 00:18:07.240
<v Speaker 1>So how is that confidence, that trust, actually established?

341
00:18:07.440 --> 00:18:11.000
<v Speaker 2>It's established through what the source calls delegated trust. Since

342
00:18:11.079 --> 00:18:14.559
<v Speaker 2>direct physical auditing by every customer is completely infeasible and

343
00:18:14.599 --> 00:18:17.720
<v Speaker 2>frankly a security risk in itself, trust is established by

344
00:18:17.759 --> 00:18:22.240
<v Speaker 2>relying on independent, recognized third-party audits and certifications.

345
00:18:22.200 --> 00:18:24.240
<v Speaker 1>So you look for their badges, essentially?

346
00:18:24.000 --> 00:18:27.279
<v Speaker 2>Kind of, yeah. You rely on standardized reports like SOC

347
00:18:27.319 --> 00:18:29.640
<v Speaker 2>1 or SOC 2, which attest to financial and

348
00:18:29.720 --> 00:18:33.799
<v Speaker 2>operational controls. You look for industry-specific certifications like ISO

349
00:18:33.839 --> 00:18:36.359
<v Speaker 2>27001 or ISO

350
00:18:36.359 --> 00:18:40.480
<v Speaker 2>27002 for security management, or maybe HIPAA for healthcare data

351
00:18:40.880 --> 00:18:44.799
<v Speaker 2>or PCI DSS for payment card data. These aren't just acronyms

352
00:18:44.799 --> 00:18:47.680
<v Speaker 2>on a web page. They represent formal attestations by accredited

353
00:18:47.680 --> 00:18:50.960
<v Speaker 2>auditors that the cloud provider has implemented the required security standards,

354
00:18:51.000 --> 00:18:55.599
<v Speaker 2>management controls, and operational procedures at that foundational infrastructure level.

355
00:18:56.119 --> 00:18:58.839
<v Speaker 2>You delegate the auditing trust to these recognized bodies.

356
00:18:59.119 --> 00:19:02.559
<v Speaker 1>Okay, so, wrapping it up, the five strategic principles for

357
00:19:02.680 --> 00:19:06.519
<v Speaker 1>mastering Linux in the cloud: start with cloud readiness and planning,

358
00:19:07.079 --> 00:19:11.640
<v Speaker 1>build for availability and resilience, implement unified monitoring across the stack,

359
00:19:12.039 --> 00:19:16.079
<v Speaker 1>ensure security and disaster recovery through that shared model, and finally,

360
00:19:16.240 --> 00:19:20.319
<v Speaker 1>overlay rigorous governance. It really does feel like a roadmap

361
00:19:20.640 --> 00:19:23.279
<v Speaker 1>to avoiding both technical and financial headaches.

362
00:19:23.519 --> 00:19:27.440
<v Speaker 2>It absolutely is. And notice the underlying theme: the shift

363
00:19:27.480 --> 00:19:31.480
<v Speaker 2>is profound. The cloud vendor takes responsibility for securing the

364
00:19:31.480 --> 00:19:36.000
<v Speaker 2>cloud infrastructure, but the customer retains full responsibility for securing

365
00:19:36.000 --> 00:19:38.680
<v Speaker 2>their applications, their data, their identities, their access in

366
00:19:38.680 --> 00:19:41.279
<v Speaker 1>the cloud, which fundamentally changes the game.

367
00:19:41.440 --> 00:19:44.160
<v Speaker 2>It fundamentally means the long term role of the traditional

368
00:19:44.200 --> 00:19:47.319
<v Speaker 2>system administrator is changing dramatically. They have to evolve.

369
00:19:47.440 --> 00:19:51.400
<v Speaker 1>They need to become strategists, right? Embracing development practices, cloud

370
00:19:51.519 --> 00:19:55.759
<v Speaker 1>architecture principles, and maybe most importantly, understanding and implementing these

371
00:19:55.759 --> 00:19:58.920
<v Speaker 1>governance strategies. It's really no longer just about racking and

372
00:19:58.920 --> 00:20:00.200
<v Speaker 1>stacking physical machines.

373
00:20:00.480 --> 00:20:05.599
<v Speaker 2>Not at all. It's about strategic automation, compliance, security posture management,

374
00:20:05.839 --> 00:20:06.799
<v Speaker 2>cost optimization.

375
00:20:07.319 --> 00:20:09.119
<v Speaker 1>So here's a final thought to leave our listeners

376
00:19:09.160 --> 00:19:12.680
<v Speaker 1>with, building on that: if the sysadmin's role is

377
00:20:12.759 --> 00:20:18.160
<v Speaker 1>shifting towards governance, towards managing identity and access, yeah, it

378
00:20:18.240 --> 00:20:21.960
<v Speaker 1>raises a crucial question for you listening: are you and

379
00:19:22.039 --> 00:19:26.240
<v Speaker 1>your organization actually structured to effectively audit your own identity

380
00:19:26.240 --> 00:19:30.640
<v Speaker 1>and access management (IAM) policies within the cloud, or

381
00:20:30.720 --> 00:20:33.319
<v Speaker 1>is that maybe the biggest blind spot you've accidentally created

382
00:20:33.400 --> 00:20:34.960
<v Speaker 1>or outsourced without realizing it?

383
00:20:35.119 --> 00:20:37.480
<v Speaker 2>Hmm, that's a good one. Definitely something to mull over.

384
00:18:37.599 --> 00:18:38.880
<v Speaker 2>Who's watching the watchers,

385
00:18:39.160 --> 00:18:42.039
<v Speaker 1>essentially? Exactly. Something to think about until our next deep dive.

386
00:20:42.279 --> 00:20:44.799
<v Speaker 2>Thanks for breaking down these principles today. My pleasure, it's

387
00:20:44.799 --> 00:20:45.559
<v Speaker 2>been a great discussion.
