WEBVTT

1
00:00:00.160 --> 00:00:03.160
<v Speaker 1>Welcome to the deep dive. Now you gave us a

2
00:00:03.200 --> 00:00:06.599
<v Speaker 1>really interesting stack of sources, this time a blueprint really

3
00:00:06.639 --> 00:00:11.519
<v Speaker 1>for modern enterprise tech. And our mission today is, well,

4
00:00:11.640 --> 00:00:13.880
<v Speaker 1>let's cut through the jargon. We want to make you

5
00:00:14.000 --> 00:00:18.120
<v Speaker 1>instantly well-informed about cloud native application development. This isn't

6
00:00:18.160 --> 00:00:22.480
<v Speaker 1>about just dry definitions. It's about understanding the core ideas,

7
00:00:23.000 --> 00:00:27.719
<v Speaker 1>the philosophy, and the tools that really changed software

8
00:00:27.359 --> 00:00:28.800
<v Speaker 2>development, fundamentally changed it.

9
00:00:28.879 --> 00:00:31.239
<v Speaker 1>We're talking about that shift, you know, from apps that

10
00:00:31.280 --> 00:00:34.920
<v Speaker 1>took months, maybe years, to roll out, to systems

11
00:00:34.960 --> 00:00:38.399
<v Speaker 1>that demand new features in, well, days.

12
00:00:38.000 --> 00:00:41.719
<v Speaker 2>And that need for speed, for constant evolution. That's the

13
00:00:41.759 --> 00:00:44.600
<v Speaker 2>core driver here. If you look at the sources, the

14
00:00:44.640 --> 00:00:48.799
<v Speaker 2>big move is crystal clear. It's leaving behind the monolithic

15
00:00:48.920 --> 00:00:50.719
<v Speaker 2>architecture. Monoliths.

16
00:00:51.079 --> 00:00:54.119
<v Speaker 1>We all remember that latency pain, don't we? These massive

17
00:00:54.159 --> 00:00:58.479
<v Speaker 1>applications built as just one single giant unit, indivisible. They

18
00:00:58.520 --> 00:01:00.320
<v Speaker 1>were hard to scale. We know that. But what was

19
00:01:00.320 --> 00:01:03.759
<v Speaker 1>the real headache, the biggest pain point, organizationally maybe?

20
00:01:03.920 --> 00:01:07.879
<v Speaker 2>Well, yeah, scaling was tough. If one tiny part, like

21
00:01:08.319 --> 00:01:11.239
<v Speaker 2>search on an e-commerce site, got hammered with traffic, right,

22
00:01:11.760 --> 00:01:15.799
<v Speaker 2>you had to clone the entire huge application. Massive resource waste.

23
00:01:15.959 --> 00:01:21.920
<v Speaker 2>But honestly, maybe worse was deployment. A developer fixes one tiny

24
00:01:22.000 --> 00:01:26.159
<v Speaker 2>bug deep in some module. They couldn't just deploy that fix.

25
00:01:26.959 --> 00:01:29.959
<v Speaker 1>Oh no, you had to test everything.

26
00:01:29.599 --> 00:01:33.319
<v Speaker 2>Everything. Full regression testing on the whole beast. So releases

27
00:01:33.359 --> 00:01:36.879
<v Speaker 2>were slow, risky, took weeks sometimes to coordinate.

28
00:01:37.000 --> 00:01:39.640
<v Speaker 1>Okay, so microservices come in to fix that, breaking

29
00:01:39.719 --> 00:01:44.000
<v Speaker 1>the app down to the smallest logical level: loosely coupled services.

30
00:01:44.319 --> 00:01:48.000
<v Speaker 2>The key advantage is that granular control: selective scaling. Search

31
00:01:48.040 --> 00:01:50.840
<v Speaker 2>service overloaded? Fine, just scale that, not the whole thing.
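
NOTE
Selective scaling can be sketched with the proportional rule documented for the Kubernetes Horizontal Pod Autoscaler, desired = ceil(current * observed / target), ignoring its tolerance and stabilization settings. The service names, replica counts and CPU figures are hypothetical; only the overloaded service gains replicas.
```python
import math
def desired_replicas(current: int, observed: float, target: float) -> int:
    # Proportional rule: resize so the per-replica metric moves back toward the target.
    return max(1, math.ceil(current * observed / target))
# Hypothetical services: (current replicas, observed CPU %, target CPU %)
services = {
    "search": (2, 180.0, 60.0),   # hammered with traffic
    "orders": (2, 50.0, 60.0),    # normal load
    "accounts": (1, 20.0, 60.0),  # mostly idle
}
plan = {name: desired_replicas(*args) for name, args in services.items()}
print(plan)  # {'search': 6, 'orders': 2, 'accounts': 1}
```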

32
00:01:51.000 --> 00:01:54.200
<v Speaker 2>The order service, accounts, yeah, they just sit there doing

33
00:01:54.200 --> 00:01:58.560
<v Speaker 2>their job. And there's another cool benefit the sources highlight: polyglot

34
00:01:58.079 --> 00:01:59.959
<v Speaker 1>programming, meaning using different languages.

35
00:02:00.120 --> 00:02:03.000
<v Speaker 2>Right, use the best tool for the specific job. Maybe

36
00:02:03.040 --> 00:02:05.519
<v Speaker 2>Python for, I don't know, a reporting microservice, because

37
00:02:05.519 --> 00:02:09.039
<v Speaker 2>its data libraries are great, but use Java for your

38
00:02:09.080 --> 00:02:12.199
<v Speaker 2>high-performance transaction service. You get that flexibility.

39
00:02:12.400 --> 00:02:15.599
<v Speaker 1>So the tech enables breaking things up. But let's be honest,

40
00:02:15.759 --> 00:02:18.680
<v Speaker 1>this whole shift wouldn't have happened without the money side

41
00:02:18.759 --> 00:02:19.280
<v Speaker 1>changing too.

42
00:02:19.360 --> 00:02:21.719
<v Speaker 3>Right, that pay-as-you-go model. Oh, absolutely, that's

43
00:02:21.719 --> 00:02:25.639
<v Speaker 3>the economic engine that really drove cloud adoption, paying only

44
00:02:25.680 --> 00:02:29.400
<v Speaker 3>for what you actually use. And what's really interesting is

45
00:02:29.759 --> 00:02:33.840
<v Speaker 3>how it lowered the barrier to just trying things out, experimentation.

46
00:02:34.159 --> 00:02:37.439
<v Speaker 3>How so? Well, think back: if you needed a powerful

47
00:02:37.479 --> 00:02:41.560
<v Speaker 3>server just to, say, run some sample code and validate fixes. Yeah,

48
00:02:41.680 --> 00:02:44.400
<v Speaker 3>you had to buy the hardware first, a big capital expense.

49
00:02:44.919 --> 00:02:47.400
<v Speaker 3>Now you rent a virtual machine, maybe a quad-core,

50
00:02:47.840 --> 00:02:49.400
<v Speaker 3>use it for twenty minutes, shut

51
00:02:49.199 --> 00:02:51.919
<v Speaker 4>it down, and pay for twenty minutes. Exactly.

52
00:02:51.560 --> 00:02:54.280
<v Speaker 2>Just the time you use it. It completely changes the economics

53
00:02:54.280 --> 00:02:55.319
<v Speaker 2>of development and testing.

54
00:02:55.840 --> 00:02:59.639
<v Speaker 1>That ability to just grab resources when needed, it must

55
00:02:59.639 --> 00:03:02.759
<v Speaker 1>have slashed overheads. Okay, so we know why the shift happened.

56
00:03:03.639 --> 00:03:07.520
<v Speaker 1>Let's talk about where: the cloud environment itself. The sources

57
00:03:07.599 --> 00:03:10.280
<v Speaker 1>use this great layer-by-layer metaphor for the XaaS

58
00:03:10.319 --> 00:03:14.680
<v Speaker 1>categories: IaaS, PaaS, SaaS. It gets confusing. So where does my

59
00:03:14.800 --> 00:03:17.000
<v Speaker 1>responsibility end and the cloud provider's begin?

60
00:03:17.199 --> 00:03:20.080
<v Speaker 2>Right. This hierarchy is key because it defines your cost,

61
00:03:20.240 --> 00:03:23.039
<v Speaker 2>your effort, and how much control you have. Let's start

62
00:03:23.039 --> 00:03:26.439
<v Speaker 2>at the bottom: traditional on-premises. Okay. Here you manage

63
00:03:26.479 --> 00:03:29.560
<v Speaker 2>everything: the building, the power, the cooling, the servers, the network,

64
00:03:29.599 --> 00:03:33.199
<v Speaker 2>the OS, middleware, the app, the data, all of it.

65
00:03:33.800 --> 00:03:34.479
<v Speaker 2>Your problem.

66
00:03:34.560 --> 00:03:38.639
<v Speaker 1>Total control, total responsibility. So climbing one step up, IaaS,

67
00:03:38.840 --> 00:03:40.199
<v Speaker 1>infrastructure as a service.

68
00:03:40.280 --> 00:03:43.159
<v Speaker 2>What changes? With IaaS, think of something like an AWS

69
00:03:43.240 --> 00:03:47.039
<v Speaker 2>EC2 instance, a basic virtual machine. The cloud provider

70
00:03:47.080 --> 00:03:52.319
<v Speaker 2>handles the fundamental infrastructure, the servers, storage, networking, the physical stuff.

71
00:03:52.360 --> 00:03:55.120
<v Speaker 2>But I still manage... You're still responsible for the operating system,

72
00:03:55.159 --> 00:03:58.840
<v Speaker 2>any middleware, the runtime environment, your application and your data.

73
00:03:59.240 --> 00:04:00.560
<v Speaker 2>Still quite a bit to look after.

74
00:04:00.840 --> 00:04:04.080
<v Speaker 1>Okay, so PaaS, platform as a service, must take more

75
00:04:04.120 --> 00:04:04.639
<v Speaker 1>off my plate, then.

76
00:04:04.680 --> 00:04:08.879
<v Speaker 2>It does, significantly more. With PaaS, the cloud provider

77
00:04:08.960 --> 00:04:13.319
<v Speaker 2>manages the OS, the middleware, and the runtime environments. Ah, so,

78
00:04:13.439 --> 00:04:15.599
<v Speaker 2>as the user, you really only need to focus on

79
00:04:15.639 --> 00:04:19.480
<v Speaker 2>your application code and its associated data. Think Amazon

80
00:04:19.560 --> 00:04:21.439
<v Speaker 2>Elastic Beanstalk. You just upload your

81
00:04:21.360 --> 00:04:26.000
<v Speaker 1>Code and the platform handles the rest, provisioning, scaling pretty much.

82
00:04:26.079 --> 00:04:29.199
<v Speaker 2>Yeah. It abstracts away a lot of the operational burden.

83
00:04:28.959 --> 00:04:31.439
<v Speaker 1>And then SaaS, software as a service, is just

84
00:04:31.600 --> 00:04:34.439
<v Speaker 1>the finished product, like logging into Gmail or Office three

85
00:04:34.560 --> 00:04:35.079
<v Speaker 1>sixty-five.

86
00:04:35.240 --> 00:04:38.480
<v Speaker 2>Exactly. You're purely a consumer: log in, use the software.

87
00:04:38.519 --> 00:04:39.839
<v Speaker 2>The provider handles everything else.

88
00:04:39.879 --> 00:04:42.000
<v Speaker 1>But the cloud didn't stop there, did it? There's FaaS,

89
00:04:42.079 --> 00:04:43.399
<v Speaker 1>functions as a service.

90
00:04:43.199 --> 00:04:47.079
<v Speaker 2>Right, FaaS, like AWS Lambda or Azure Functions. This is

91
00:04:47.319 --> 00:04:50.160
<v Speaker 2>like the ultimate level of abstraction. The smallest unit.

92
00:04:50.240 --> 00:04:50.920
<v Speaker 1>Your smallest unit.

93
00:04:51.000 --> 00:04:53.879
<v Speaker 2>Yeah, you only manage the function itself, the actual snippet

94
00:04:53.879 --> 00:04:58.319
<v Speaker 2>of code. The cloud handles everything else: provisioning servers, scaling up,

95
00:04:58.560 --> 00:05:02.600
<v Speaker 2>scaling down, even scaling to zero when it's not being used,

96
00:05:02.639 --> 00:05:06.279
<v Speaker 2>and the billing reflects that precisely. You're charged only for

97
00:05:06.360 --> 00:05:08.959
<v Speaker 2>the exact time your code is running. If it runs

98
00:05:08.959 --> 00:05:12.519
<v Speaker 2>for, say, thirty milliseconds, you pay for thirty milliseconds. That

99
00:05:12.600 --> 00:05:16.480
<v Speaker 2>efficiency is just game changing.
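
NOTE
The billing point can be worked through with simple arithmetic. The rates below are made-up placeholders, not any provider's price list, and real FaaS pricing usually also factors in allocated memory and a per-request fee, which this sketch ignores.
```python
# Hypothetical prices, chosen only to make the arithmetic easy to follow.
PRICE_PER_MS = 0.000_000_02   # per millisecond of function run time
PRICE_PER_VM_HOUR = 0.20      # per hour of a rented VM
invocations = 100_000         # requests per month
duration_ms = 30              # each invocation runs for 30 ms
function_cost = invocations * duration_ms * PRICE_PER_MS
vm_twenty_minutes = PRICE_PER_VM_HOUR * 20 / 60   # rent, use for 20 minutes, shut down
vm_always_on = 24 * 30 * PRICE_PER_VM_HOUR        # a VM left running for a 30-day month
print(f"function: ${function_cost:.2f}, 20-min VM: ${vm_twenty_minutes:.2f}, always-on VM: ${vm_always_on:.2f}")
```
Under these assumed rates, the function is billed only for its 3,000,000 ms of actual run time, while the always-on VM is billed for every hour whether or not it does work.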

100
00:05:16.639 --> 00:05:19.439
<v Speaker 1>Okay, that clarifies the layers, the what. But here's the

101
00:05:19.519 --> 00:05:23.720
<v Speaker 1>crucial part: the mindset shift. Just lifting and shifting your

102
00:05:23.759 --> 00:05:28.079
<v Speaker 1>old monolith onto an IaaS VM, that doesn't really unlock the

103
00:05:28.079 --> 00:05:30.120
<v Speaker 1>cloud's power, does it? Not at all.

104
00:05:30.279 --> 00:05:32.439
<v Speaker 2>That's just running your old problems in a new location.

105
00:05:32.759 --> 00:05:36.040
<v Speaker 1>Right. The sources really stress this. Building true cloud native

106
00:05:36.079 --> 00:05:39.800
<v Speaker 1>apps means unlearning old habits. You have to design for, well,

107
00:05:40.120 --> 00:05:41.360
<v Speaker 4>volatility. Absolutely.

108
00:05:41.439 --> 00:05:45.439
<v Speaker 2>Cloud native design assumes failure is normal. The old monolith

109
00:05:45.519 --> 00:05:47.839
<v Speaker 2>mindset was, you know, the server's precious, keep it running

110
00:05:47.839 --> 00:05:50.680
<v Speaker 2>at all costs. In the cloud, servers are cattle, not pets.

111
00:05:50.680 --> 00:05:52.759
<v Speaker 2>They're disposable. Components will fail.

112
00:05:52.519 --> 00:05:56.199
<v Speaker 1>Which brings us right to design factor one: embrace failure. Specifically,

113
00:05:56.480 --> 00:05:57.839
<v Speaker 1>no single point of failure.

114
00:05:58.000 --> 00:06:01.160
<v Speaker 2>Correct, and the cornerstone of building resilient systems like this

115
00:06:01.439 --> 00:06:02.319
<v Speaker 2>is statelessness.

116
00:06:02.360 --> 00:06:05.279
<v Speaker 1>Okay, statelessness? What does that mean in practice? Give us

117
00:06:05.279 --> 00:06:05.879
<v Speaker 1>an analogy.

118
00:06:06.040 --> 00:06:10.120
<v Speaker 2>Okay, think about ordering food. In a stateful restaurant, only

119
00:06:10.199 --> 00:06:13.120
<v Speaker 2>the waiter who took your order knows what you ordered

120
00:06:13.199 --> 00:06:15.519
<v Speaker 2>or where you are in your meal. If that specific

121
00:06:15.519 --> 00:06:19.199
<v Speaker 2>waiter goes home, you're stuck. Your state is lost with them. Right, I

122
00:06:19.160 --> 00:06:21.279
<v Speaker 1>can see that. Annoying.

123
00:06:21.160 --> 00:06:25.120
<v Speaker 2>Very. Now, in a stateless restaurant, your order is written down,

124
00:06:25.319 --> 00:06:29.439
<v Speaker 2>maybe put into a central system. Any available waiter, any

125
00:06:29.480 --> 00:06:32.600
<v Speaker 2>server instance can look up your order details and continue

126
00:06:32.680 --> 00:06:33.839
<v Speaker 2>serving you seamlessly.

127
00:06:34.279 --> 00:06:37.879
<v Speaker 1>Ah, the system knows, not the individual server. Exactly.

128
00:06:38.079 --> 00:06:41.399
<v Speaker 2>The service itself doesn't hold onto session memory between requests.

129
00:06:41.920 --> 00:06:45.160
<v Speaker 2>The state, the data, is stored elsewhere, maybe a database

130
00:06:45.240 --> 00:06:47.959
<v Speaker 2>or cache accessible to all instances. That's what lets you

131
00:06:48.000 --> 00:06:50.680
<v Speaker 2>easily add or remove servers: horizontal scaling. You can have

132
00:06:50.720 --> 00:06:52.680
<v Speaker 2>one hundred identical interchangeable servers.
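
NOTE
A minimal sketch of the stateless restaurant, assuming a shared store that every instance can reach (in practice a database or cache; a plain dict stands in for it here). The class names and session key are hypothetical. Because no instance keeps the order in its own memory, any instance can carry on after another one disappears.
```python
from dataclasses import dataclass, field
@dataclass
class SharedStore:
    # Stand-in for an external database or cache reachable by all instances.
    data: dict = field(default_factory=dict)
class OrderService:
    """A stateless instance: it keeps no per-customer data between requests."""
    def __init__(self, name: str, store: SharedStore):
        self.name = name
        self.store = store
    def add_item(self, session_id: str, item: str) -> list:
        order = self.store.data.setdefault(session_id, [])
        order.append(item)
        return order
    def get_order(self, session_id: str) -> list:
        return self.store.data.get(session_id, [])
store = SharedStore()
waiter_a = OrderService("a", store)
waiter_b = OrderService("b", store)
waiter_a.add_item("table-7", "soup")
del waiter_a  # this instance "goes home"; the order survives in the shared store
waiter_b.add_item("table-7", "bread")
print(waiter_b.get_order("table-7"))  # ['soup', 'bread']
```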

133
00:06:52.759 --> 00:06:55.160
<v Speaker 1>That makes total sense. Okay, So failure is inevitable. The

134
00:06:55.240 --> 00:06:59.000
<v Speaker 1>system needs to not just survive it, but handle it gracefully.

135
00:06:59.279 --> 00:07:03.560
<v Speaker 2>Yes, fail fast. That's crucial, often done using the

136
00:07:03.600 --> 00:07:04.600
<v Speaker 2>circuit breaker pattern.

137
00:07:04.720 --> 00:07:07.000
<v Speaker 1>Circuit breaker, like in my house? Sort of.

138
00:07:07.399 --> 00:07:10.360
<v Speaker 2>You don't want a struggling service to just hang silent

139
00:07:10.439 --> 00:07:13.959
<v Speaker 2>failing for minutes, causing backups everywhere else. If a downstream

140
00:07:14.000 --> 00:07:17.040
<v Speaker 2>service is clearly having trouble, cut it off? Exactly. The

141
00:07:17.079 --> 00:07:20.879
<v Speaker 2>circuit breaker trips. It stops sending requests to that failing service

142
00:07:20.920 --> 00:07:23.680
<v Speaker 2>for a short period, preventing overload and giving it a

143
00:07:23.720 --> 00:07:28.000
<v Speaker 2>chance to recover or be replaced. Then importantly, the calling

144
00:07:28.040 --> 00:07:32.040
<v Speaker 2>service needs to handle that failure gracefully, meaning don't just

145
00:07:32.040 --> 00:07:35.279
<v Speaker 2>show an error page. If live search is down, maybe pull

146
00:07:35.319 --> 00:07:39.680
<v Speaker 2>results from a cache or show top-selling products, something useful,

147
00:07:39.759 --> 00:07:40.600
<v Speaker 2>not just a dead end.
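
NOTE
A minimal circuit breaker with a fallback, written from scratch for illustration; production systems usually take this from a resilience library instead. The thresholds and the search and cache functions are hypothetical. After a set number of consecutive failures the breaker opens and calls go straight to the fallback; after a cool-down it lets one trial call through.
```python
import time
class CircuitBreaker:
    def __init__(self, max_failures: int = 3, reset_after: float = 30.0):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # seconds to stay open before a trial call
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed
    def call(self, func, fallback):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                return fallback()         # open: fail fast, skip the struggling service
            self.opened_at = None         # half-open: allow one trial request
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            return fallback()             # degrade gracefully instead of erroring
        self.failures = 0                 # success closes the circuit again
        return result
calls = []
def live_search():
    calls.append("live")
    raise TimeoutError("search service timed out")  # simulated outage
def cached_results():
    return ["top seller 1", "top seller 2"]
breaker = CircuitBreaker(max_failures=2, reset_after=60.0)
results = [breaker.call(live_search, cached_results) for _ in range(5)]
print(len(calls))  # 2: after two failures the breaker stops calling live_search
```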

148
00:07:40.759 --> 00:07:44.879
<v Speaker 1>Okay, resilience is built on statelessness and failing fast. But

149
00:07:45.000 --> 00:07:48.839
<v Speaker 1>if we have potentially hundreds of these small services, managing

150
00:07:48.839 --> 00:07:52.680
<v Speaker 1>that manually sounds impossible, which leads to design factor two.

151
00:07:53.319 --> 00:07:55.959
<v Speaker 4>Automation is king. Absolutely non-negotiable.

152
00:07:56.000 --> 00:07:58.839
<v Speaker 2>With potentially hundreds of microservices, manual management is a

153
00:07:58.839 --> 00:08:02.720
<v Speaker 2>recipe for disaster. You must automate testing, deployment: that's your

154
00:08:02.800 --> 00:08:04.560
<v Speaker 2>CI/CD pipeline.

155
00:08:04.040 --> 00:08:07.439
<v Speaker 1>And monitoring, to eliminate human error? That, and just

156
00:08:07.399 --> 00:08:10.279
<v Speaker 2>To cope with the scale. This automation is also key

157
00:08:10.319 --> 00:08:15.160
<v Speaker 2>to building a self-healing system. Self-healing, like Wolverine? Huh. Well,

158
00:08:15.759 --> 00:08:18.519
<v Speaker 2>maybe not quite that fast, but the system needs to

159
00:08:18.600 --> 00:08:23.079
<v Speaker 2>automatically detect and recover from failures without needing a human

160
00:08:23.120 --> 00:08:27.240
<v Speaker 2>to step in. That could mean Kubernetes automatically restarting a

161
00:08:27.319 --> 00:08:28.959
<v Speaker 2>failed container instance.

162
00:08:28.720 --> 00:08:30.199
<v Speaker 1>Or redirecting traffic.

163
00:08:29.959 --> 00:08:33.399
<v Speaker 2>Or spinning up more instances as the load increases. The

164
00:08:33.440 --> 00:08:34.799
<v Speaker 2>system manages itself.
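
NOTE
Kubernetes controllers work by reconciliation: compare the desired state with the observed state and act on the difference. The toy loop below only illustrates that idea and is not Kubernetes code; the pod names and statuses are hypothetical.
```python
import itertools
_ids = itertools.count(1)  # hypothetical name generator for new pods
def reconcile(desired: int, pods: dict) -> dict:
    """Return a pod map with crashed pods dropped and the count matched to `desired`."""
    healthy = {name: status for name, status in pods.items() if status == "Running"}
    while len(healthy) < desired:   # replace crashed pods or scale up
        healthy[f"pod-{next(_ids)}"] = "Running"
    while len(healthy) > desired:   # scale down surplus pods
        healthy.pop(next(iter(healthy)))
    return healthy
pods = {"pod-a": "Running", "pod-b": "Crashed", "pod-c": "Running"}
pods = reconcile(3, pods)  # the crashed pod is replaced without human help
print(sorted(pods))        # ['pod-1', 'pod-a', 'pod-c']
pods = reconcile(5, pods)  # load increased, so the desired count went up
```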

165
00:08:35.159 --> 00:08:38.240
<v Speaker 1>Ideally. And for developers actually building these things, the sources

166
00:08:38.320 --> 00:08:40.799
<v Speaker 1>kept mentioning the twelve-factor app philosophy. Is that like

167
00:08:40.840 --> 00:08:41.840
<v Speaker 1>a checklist? It's more

168
00:08:41.799 --> 00:08:45.480
<v Speaker 2>a set of principles. Yeah. A widely accepted guide for building robust,

169
00:08:45.799 --> 00:08:49.039
<v Speaker 2>scalable services for the cloud. It covers things like how

170
00:08:49.039 --> 00:08:51.919
<v Speaker 2>to handle configuration, logs, dependencies.

171
00:08:52.240 --> 00:08:54.559
<v Speaker 1>What's a key benefit of following those rules?

172
00:08:54.720 --> 00:08:59.440
<v Speaker 2>Standardization and predictability. Take configuration, for example. Factor three says

173
00:08:59.440 --> 00:09:01.679
<v Speaker 2>store config in the environment, not in the code.

174
00:09:01.840 --> 00:09:02.960
<v Speaker 1>Why is that so important?

175
00:09:03.080 --> 00:09:05.639
<v Speaker 2>Because then the exact same compiled code artifact can be

176
00:09:05.679 --> 00:09:10.360
<v Speaker 2>deployed unchanged across development, test, staging, production. You just change

177
00:09:10.399 --> 00:09:14.240
<v Speaker 2>the environment variables for database connections, API keys, et cetera. It

178
00:09:14.279 --> 00:09:17.559
<v Speaker 2>makes deployments much faster and safer, eliminates a huge source

179
00:09:17.559 --> 00:09:18.080
<v Speaker 2>of errors.
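
NOTE
A minimal sketch of factor three: the code reads its settings from environment variables, so only the environment differs between staging and production. The variable names DATABASE_URL, API_KEY and DEBUG and the example values are hypothetical.
```python
import os
from dataclasses import dataclass
@dataclass(frozen=True)
class Config:
    database_url: str
    api_key: str
    debug: bool
def load_config(env=os.environ) -> Config:
    # Fail fast at startup if a required setting is missing.
    missing = [k for k in ("DATABASE_URL", "API_KEY") if k not in env]
    if missing:
        raise RuntimeError(f"missing environment variables: {missing}")
    return Config(
        database_url=env["DATABASE_URL"],
        api_key=env["API_KEY"],
        debug=env.get("DEBUG", "false").lower() == "true",
    )
# The same artifact, two environments: only the variables differ.
staging = load_config({"DATABASE_URL": "postgres://staging-db/app", "API_KEY": "test-key", "DEBUG": "true"})
production = load_config({"DATABASE_URL": "postgres://prod-db/app", "API_KEY": "live-key"})
```
Passing the mapping in explicitly keeps the function easy to test; in a deployed service it would simply read `os.environ`.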

180
00:09:18.600 --> 00:09:22.679
<v Speaker 1>Got it. Okay, so we have the design mindset: embrace failure,

181
00:09:22.799 --> 00:09:26.440
<v Speaker 1>automate everything. Now let's connect that to the toolkit. What

182
00:09:26.639 --> 00:09:29.799
<v Speaker 1>technologies actually make this happen? How do we deploy and

183
00:09:29.879 --> 00:09:30.559
<v Speaker 1>run these things?

184
00:09:30.600 --> 00:09:34.240
<v Speaker 2>Right, the implementation. It really starts with containers. We mentioned

185
00:09:34.559 --> 00:09:35.799
<v Speaker 2>VMs being heavy.

186
00:09:35.519 --> 00:09:37.320
<v Speaker 1>Because they include a full OS.

187
00:09:37.159 --> 00:09:41.039
<v Speaker 2>Right, containers are way lighter. They allow for much greater density.

188
00:09:41.120 --> 00:09:44.000
<v Speaker 1>Remind us why they're lighter again, what's the core difference?

189
00:09:44.240 --> 00:09:47.480
<v Speaker 2>They share the host operating system's kernel, so instead of

190
00:09:47.519 --> 00:09:50.919
<v Speaker 2>each app needing its own complete OS like a VM does.

191
00:09:50.879 --> 00:09:53.080
<v Speaker 1>Like separate cars, each with its own engine? Yeah.

192
00:09:52.960 --> 00:09:55.799
<v Speaker 2>Containers are more like everyone sharing the car's engine, the

193
00:09:55.840 --> 00:09:58.519
<v Speaker 2>host OS kernel, but each having their own secure

194
00:09:58.639 --> 00:10:02.080
<v Speaker 2>passenger cabin built around it. You're just packaging the application and

195
00:10:02.240 --> 00:10:04.440
<v Speaker 2>its dependencies, not the whole OS stack.

196
00:10:04.639 --> 00:10:07.039
<v Speaker 1>So you can pack way more onto the same hardware,

197
00:10:07.240 --> 00:10:08.440
<v Speaker 1>more efficient, faster.

198
00:10:08.240 --> 00:10:11.600
<v Speaker 2>to start up. Exactly, a huge boost in agility. But then

199
00:10:11.639 --> 00:10:16.360
<v Speaker 2>if you've got tens, maybe hundreds or thousands of these containers,

200
00:10:17.279 --> 00:10:20.360
<v Speaker 2>you need management, air traffic control.

201
00:10:20.080 --> 00:10:21.600
<v Speaker 1>basically. And that's Kubernetes.

202
00:10:22.000 --> 00:10:25.399
<v Speaker 2>That's Kubernetes, or K8s, as it's often called. It's

203
00:10:25.440 --> 00:10:29.159
<v Speaker 2>become the de facto standard for container orchestration. Orchestration, meaning

204
00:10:29.360 --> 00:10:33.360
<v Speaker 2>it automates the deployment, scaling, load balancing, and crucially, the

205
00:10:33.440 --> 00:10:37.720
<v Speaker 2>healing of containerized applications across a cluster of machines. If

206
00:10:37.720 --> 00:10:40.440
<v Speaker 2>a container running your service crashes.

207
00:10:40.240 --> 00:10:42.240
<v Speaker 1>K8s notices and starts a new one.

208
00:10:42.159 --> 00:10:45.519
<v Speaker 2>Yep, automatically, it handles that complexity so developers can focus

209
00:10:45.559 --> 00:10:48.080
<v Speaker 2>on code, not infrastructure babysitting.

210
00:10:48.120 --> 00:10:51.360
<v Speaker 1>Okay, containers managed by K8s. We mentioned speed earlier. That

211
00:10:51.360 --> 00:10:55.159
<v Speaker 1>comes from continuous integration and continuous delivery, CI/CD. Right.

212
00:10:55.279 --> 00:10:58.720
<v Speaker 2>CI/CD pipelines ensure that your code is constantly being built,

213
00:10:59.200 --> 00:11:03.200
<v Speaker 2>tested and made ready for deployment. This enables those frequent, small,

214
00:11:03.240 --> 00:11:06.320
<v Speaker 2>low risk releases we talked about instead of massive, scary

215
00:11:06.360 --> 00:11:07.480
<v Speaker 2>deployments once a quarter.

216
00:11:07.639 --> 00:11:10.120
<v Speaker 1>You deploy small changes, maybe multiple times a day.

217
00:11:10.399 --> 00:11:13.000
<v Speaker 2>That's the goal. But how do you make sure the

218
00:11:13.080 --> 00:11:17.679
<v Speaker 2>underlying infrastructure, the Kubernetes cluster itself, the networking, the databases

219
00:11:18.159 --> 00:11:21.039
<v Speaker 2>is set up correctly and consistently every single time?

220
00:11:21.240 --> 00:11:24.720
<v Speaker 1>Good question. Manual setup seems error prone.

221
00:11:24.480 --> 00:11:28.759
<v Speaker 2>Highly error prone. That's where infrastructure as code or IAC

222
00:11:29.120 --> 00:11:30.039
<v Speaker 2>is absolutely vital.

223
00:11:30.159 --> 00:11:32.000
<v Speaker 1>Infrastructure as code? Yeah.

224
00:11:32.279 --> 00:11:34.799
<v Speaker 2>Instead of clicking around in a cloud provider's web console

225
00:11:35.080 --> 00:11:37.120
<v Speaker 2>to set things up, which is slow and impossible to

226
00:11:37.159 --> 00:11:39.440
<v Speaker 2>replicate perfectly, you write.

227
00:11:39.240 --> 00:11:42.080
<v Speaker 1>scripts? Scripts that define the infrastructure.

228
00:11:41.480 --> 00:11:46.000
<v Speaker 2>Exactly, using tools like AWS CloudFormation, Azure ARM

229
00:11:46.080 --> 00:11:51.679
<v Speaker 2>templates or Terraform. These scripts define your servers, networks, load balancers, everything.

230
00:11:52.159 --> 00:11:54.639
<v Speaker 2>You check this code into version control just like your

231
00:11:54.639 --> 00:11:55.519
<v Speaker 2>application code.

232
00:11:55.559 --> 00:11:58.679
<v Speaker 1>Ah. So it's repeatable, auditable, and testable.

233
00:11:58.879 --> 00:12:01.960
<v Speaker 2>You can spin up an entire identical environment for development, testing,

234
00:12:02.039 --> 00:12:05.679
<v Speaker 2>or production just by running the script. It eliminates configuration

235
00:12:05.879 --> 00:12:09.080
<v Speaker 2>drift and the classic "well, it worked on my machine" problem.
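
NOTE
IaC tools such as Terraform compare the infrastructure declared in code with what actually exists and compute a plan of changes. The toy Python below only illustrates that declarative idea; it is not Terraform's algorithm or syntax, and the resource names and settings are hypothetical.
```python
def plan(desired: dict, actual: dict) -> list:
    """Return (action, name) steps that would bring `actual` in line with `desired`."""
    steps = []
    for name, spec in desired.items():
        if name not in actual:
            steps.append(("create", name))
        elif actual[name] != spec:
            steps.append(("update", name))  # configuration drift
    for name in actual:
        if name not in desired:
            steps.append(("delete", name))
    return steps
desired = {
    "vpc-main": {"cidr": "10.0.0.0/16"},
    "lb-web": {"port": 443},
    "db-orders": {"size": "small"},
}
actual = {
    "vpc-main": {"cidr": "10.0.0.0/16"},
    "lb-web": {"port": 80},        # someone changed it by hand in the console
    "vm-temp": {"size": "large"},  # left over from manual testing
}
print(plan(desired, actual))
# [('update', 'lb-web'), ('create', 'db-orders'), ('delete', 'vm-temp')]
```
Running the same plan against an environment that already matches the code yields no steps, which is what makes the setup repeatable.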

236
00:12:09.120 --> 00:12:12.279
<v Speaker 1>Okay, that consistency is key. Now we have hundreds of

237
00:12:12.279 --> 00:12:16.200
<v Speaker 1>services, automated deployment. How do we avoid the pay as

238
00:12:16.240 --> 00:12:19.320
<v Speaker 1>you go model becoming pay way too much? How do

239
00:12:19.360 --> 00:12:21.519
<v Speaker 1>we manage costs and spot problems?

240
00:12:21.639 --> 00:12:24.720
<v Speaker 2>That comes down to proactive monitoring and alerting. It's not optional,

241
00:12:24.759 --> 00:12:25.399
<v Speaker 2>it's essential.

242
00:12:25.519 --> 00:12:27.159
<v Speaker 1>Not just for finding bugs.

243
00:12:27.080 --> 00:12:30.679
<v Speaker 2>No, it's critical for cost management too. You need visibility

244
00:12:30.720 --> 00:12:34.519
<v Speaker 2>into how all these services are behaving. Are resources being underutilized?

245
00:12:34.879 --> 00:12:39.480
<v Speaker 2>That's wasted money. Are they being overutilized? That risks performance

246
00:12:39.519 --> 00:12:40.399
<v Speaker 2>issues or failures.

247
00:12:40.480 --> 00:12:43.480
<v Speaker 1>So you need centralized logging and metrics? Absolutely.

248
00:12:43.759 --> 00:12:47.360
<v Speaker 2>Tools like AWS CloudWatch or open-source stacks like

249
00:12:47.399 --> 00:12:51.559
<v Speaker 2>the ELK stack (Elasticsearch, Logstash, Kibana) are common.

250
00:12:52.039 --> 00:12:54.960
<v Speaker 2>They gather logs and metrics from all your microservices

251
00:12:55.000 --> 00:12:55.559
<v Speaker 2>into one

252
00:12:55.399 --> 00:12:57.679
<v Speaker 4>place, so you can see the whole picture, and set up

253
00:12:57.559 --> 00:13:01.399
<v Speaker 2>alerts for anomalies: errors, high latency, unusual resource consumption, so

254
00:13:01.440 --> 00:13:03.879
<v Speaker 2>you can react before it impacts users or your bill.
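
NOTE
An alert rule can be sketched as a threshold on a rolling window of samples. The window size, threshold and latency readings are hypothetical; in practice the rule would be configured in the monitoring tool rather than written in application code.
```python
from collections import deque
from statistics import mean
class LatencyAlert:
    def __init__(self, threshold_ms: float, window: int = 5):
        self.threshold_ms = threshold_ms
        self.samples = deque(maxlen=window)  # keeps only the most recent samples
    def observe(self, latency_ms: float) -> bool:
        """Record a sample; return True once a full window averages above the threshold."""
        self.samples.append(latency_ms)
        full = len(self.samples) == self.samples.maxlen
        return full and mean(self.samples) > self.threshold_ms
alert = LatencyAlert(threshold_ms=200, window=3)
readings = [120, 150, 130, 140, 450, 500]
fired = [alert.observe(r) for r in readings]
print(fired)  # [False, False, False, False, True, True]
```
Averaging over a window avoids paging someone for a single slow request while still catching a sustained spike.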

255
00:13:03.960 --> 00:13:07.360
<v Speaker 1>Makes sense. One last piece: security. Moving from one big

256
00:13:07.399 --> 00:13:11.480
<v Speaker 1>monolith to hundreds of distributed services must change the security

257
00:13:11.480 --> 00:13:12.240
<v Speaker 1>game completely.

258
00:13:12.360 --> 00:13:14.519
<v Speaker 2>It absolutely does. You can't just put a big firewall

259
00:13:14.600 --> 00:13:17.360
<v Speaker 2>around the monolith anymore. Cloud data security relies heavily

260
00:13:17.399 --> 00:13:21.320
<v Speaker 2>on two things: fine-grained access control and network segmentation.

261
00:13:21.399 --> 00:13:22.360
<v Speaker 1>Okay, break those down.

262
00:13:22.519 --> 00:13:26.960
<v Speaker 2>Role-based access control, or RBAC, IAM in AWS, is

263
00:13:26.960 --> 00:13:31.279
<v Speaker 2>about who can do what. Principle of least privilege. Users

264
00:13:31.600 --> 00:13:35.279
<v Speaker 2>or even services themselves only get the absolute minimum permissions

265
00:13:35.279 --> 00:13:39.159
<v Speaker 2>they need to function. No more generic admin keys floating around.
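
NOTE
Least privilege can be sketched as deny-by-default: a principal may perform an action only if one of its roles explicitly grants it. The role names and action strings are hypothetical; AWS IAM expresses the same idea with JSON policy documents, which this sketch does not reproduce.
```python
ROLE_PERMISSIONS = {
    "order-service": {"orders:read", "orders:write"},
    "reporting-service": {"orders:read"},  # read-only, nothing more
}
PRINCIPAL_ROLES = {
    "svc-orders": {"order-service"},
    "svc-reports": {"reporting-service"},
}
def is_allowed(principal: str, action: str) -> bool:
    # Deny by default: unknown principals and ungranted actions are refused.
    roles = PRINCIPAL_ROLES.get(principal, set())
    return any(action in ROLE_PERMISSIONS.get(role, set()) for role in roles)
print(is_allowed("svc-reports", "orders:read"))   # True
print(is_allowed("svc-reports", "orders:write"))  # False
```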

266
00:13:39.200 --> 00:13:41.320
<v Speaker 1>And network segmentation that's about.

267
00:13:41.159 --> 00:13:45.320
<v Speaker 2>controlling what can talk to what. You use virtual networks, subnets,

268
00:13:45.480 --> 00:13:50.080
<v Speaker 2>security groups, essentially internal firewalls, to isolate services. The payment

269
00:13:50.159 --> 00:13:52.240
<v Speaker 2>service should only be allowed to talk to the specific

270
00:13:52.320 --> 00:13:55.159
<v Speaker 2>database it needs and maybe the order service. It shouldn't

271
00:13:55.159 --> 00:13:56.879
<v Speaker 2>be able to reach the user profile service, for.

272
00:13:56.840 --> 00:14:00.440
<v Speaker 1>example. Containing the blast radius if something gets compromised.

273
00:14:00.279 --> 00:14:03.000
<v Speaker 2>Exactly. Zero trust principles become much more important.
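
NOTE
Segmentation can be modeled as an allowlist of permitted service-to-service connections, with everything else blocked; following the allowed edges from a compromised service then gives its blast radius. The service names and edges are hypothetical, and real enforcement happens in subnets, security groups or network policies, not in code like this.
```python
ALLOWED_EDGES = {
    ("payment-service", "payment-db"),
    ("payment-service", "order-service"),
    ("order-service", "order-db"),
}
def can_connect(source: str, target: str) -> bool:
    # Default deny: only explicitly allowed (source, target) pairs may talk.
    return (source, target) in ALLOWED_EDGES
def blast_radius(start: str) -> set:
    """Everything a compromised service could reach by hopping along allowed edges."""
    seen, frontier = set(), [start]
    while frontier:
        node = frontier.pop()
        for src, dst in ALLOWED_EDGES:
            if src == node and dst not in seen:
                seen.add(dst)
                frontier.append(dst)
    return seen
print(sorted(blast_radius("payment-service")))  # ['order-db', 'order-service', 'payment-db']
```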

274
00:14:03.159 --> 00:14:06.679
<v Speaker 1>Wow. Okay, so looking back, it's quite a journey. We

275
00:14:06.720 --> 00:14:10.240
<v Speaker 1>went from the monolithic bottleneck to these nimble microservices.

276
00:14:10.720 --> 00:14:14.240
<v Speaker 1>We navigated the XaaS models, the pay-as-you-go economics,

277
00:14:14.600 --> 00:14:20.639
<v Speaker 1>and critically adopted this design mindset focused on resilience, statelessness, automation,

278
00:14:20.960 --> 00:14:21.919
<v Speaker 1>assuming failure.

279
00:14:22.200 --> 00:14:24.679
<v Speaker 2>And I think the biggest takeaway really, the thing that

280
00:14:24.720 --> 00:14:28.159
<v Speaker 2>should guide anyone starting this journey or refining it, is

281
00:14:28.159 --> 00:14:32.000
<v Speaker 2>that proactive planning right at the design phase. That's what

282
00:14:32.080 --> 00:14:35.480
<v Speaker 2>saves you those thousands of dollars and man hours down

283
00:14:35.559 --> 00:14:36.000
<v Speaker 2>the line.

284
00:14:36.080 --> 00:14:38.360
<v Speaker 1>You can't just tack this stuff on later.

285
00:14:38.279 --> 00:14:41.759
<v Speaker 2>No, you really can't. Trying to retrofit cloud native ideas

286
00:14:41.759 --> 00:14:46.759
<v Speaker 2>onto an old monolithic design, it's usually painful and often fails.

287
00:14:47.240 --> 00:14:48.679
<v Speaker 2>You have to build it in from day

288
00:14:48.559 --> 00:14:51.080
<v Speaker 1>one. Right, which brings us nicely to our final thought

289
00:14:51.159 --> 00:14:53.960
<v Speaker 1>for you, the listener, to ponder. We talked a lot

290
00:14:53.960 --> 00:14:57.720
<v Speaker 1>about designing for failure, and the sources mentioned this powerful concept,

291
00:14:57.879 --> 00:14:59.240
<v Speaker 1>the bulkhead pattern.

292
00:14:59.360 --> 00:15:01.639
<v Speaker 4>Ah, yes, the ship analogy. Exactly.

293
00:15:01.879 --> 00:15:05.159
<v Speaker 1>A bulkhead is that watertight wall inside a ship's hull.

294
00:15:05.399 --> 00:15:08.120
<v Speaker 1>If there's a leak or a fire in one compartment.

295
00:15:07.679 --> 00:15:10.919
<v Speaker 2>The bulkhead contains it. It stops the disaster from spreading

296
00:15:10.919 --> 00:15:13.039
<v Speaker 2>and sinking the whole ship. The rest of the vessel

297
00:15:13.200 --> 00:15:14.120
<v Speaker 2>stays operational.
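
NOTE
In software, a bulkhead usually means giving each dependency its own limited pool of capacity, so one slow dependency cannot exhaust resources the others need. A minimal sketch using one bounded semaphore per dependency; the dependency names and pool sizes are hypothetical.
```python
import threading
class Bulkhead:
    def __init__(self, limits: dict):
        # One independent compartment (semaphore) per downstream dependency.
        self.pools = {name: threading.BoundedSemaphore(n) for name, n in limits.items()}
    def try_call(self, dependency: str, func, fallback):
        pool = self.pools[dependency]
        if not pool.acquire(blocking=False):  # compartment full: reject fast
            return fallback()
        try:
            return func()
        finally:
            pool.release()
bulkhead = Bulkhead({"search": 2, "payments": 2})
# Simulate two slow search calls that are still in flight and holding the search pool.
bulkhead.pools["search"].acquire()
bulkhead.pools["search"].acquire()
print(bulkhead.try_call("search", lambda: "results", lambda: "search busy"))  # search busy
print(bulkhead.try_call("payments", lambda: "paid", lambda: "payments busy"))  # paid
```
The flooded search compartment rejects new work, while payments, behind its own wall, keeps serving requests.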

298
00:15:14.720 --> 00:15:18.000
<v Speaker 1>So the question for you is: where in your business,

299
00:15:18.039 --> 00:15:21.240
<v Speaker 1>in your systems, in your organization can you deliberately apply

300
00:15:21.279 --> 00:15:22.559
<v Speaker 1>that bulkhead pattern?

301
00:15:22.440 --> 00:15:26.799
<v Speaker 2>If one part fails, a key service, a critical process,

302
00:15:27.080 --> 00:15:30.440
<v Speaker 2>maybe even a team. Have you built the walls? Have

303
00:15:30.559 --> 00:15:34.799
<v Speaker 2>you designed the isolation points, the statelessness, the network segmentation,

304
00:15:34.919 --> 00:15:39.039
<v Speaker 2>the circuit breakers, the automation to ensure that failure is contained.

305
00:15:38.919 --> 00:15:42.320
<v Speaker 1>So that the core mission, the essential operations, can continue

306
00:15:42.320 --> 00:15:45.000
<v Speaker 1>moving forward even when something inevitably breaks.

307
00:15:45.120 --> 00:15:48.279
<v Speaker 2>That's the challenge: thinking about resilience, not just in code,

308
00:15:48.320 --> 00:15:49.440
<v Speaker 2>but across the whole system.

309
00:15:49.480 --> 00:15:51.840
<v Speaker 1>Definitely something to mull over as you apply these cloud

310
00:15:51.879 --> 00:15:54.360
<v Speaker 1>native concepts. Thank you for joining us for this deep dive.
