WEBVTT

1
00:00:00.080 --> 00:00:02.200
<v Speaker 1>Welcome back to the deep dive. We're here to give

2
00:00:02.240 --> 00:00:05.400
<v Speaker 1>you that fast track, that real expertise on the industry's

3
00:00:06.200 --> 00:00:10.560
<v Speaker 1>most critical topics. That's the plan, and today we

4
00:00:10.599 --> 00:00:16.760
<v Speaker 1>are really diving deep. Our mission: mastering modern IT infrastructure monitoring. Yeah,

5
00:00:16.800 --> 00:00:20.600
<v Speaker 1>and our guide for this is some really comprehensive material

6
00:00:21.760 --> 00:00:25.719
<v Speaker 1>compiled from the Zabbix 5 IT Infrastructure Monitoring Cookbook. Right,

7
00:00:25.960 --> 00:00:26.359
<v Speaker 1>and for

8
00:00:26.320 --> 00:00:30.039
<v Speaker 2>you listening in, the goal here is simple: a complete,

9
00:00:30.120 --> 00:00:32.920
<v Speaker 2>but, you know, fast understanding of Zabbix 5

10
00:00:32.799 --> 00:00:34.920
<v Speaker 3>architecture, cutting through the noise. Exactly.

11
00:00:34.920 --> 00:00:38.960
<v Speaker 2>We're focusing on the core structure, the advanced ways it

12
00:00:38.960 --> 00:00:42.159
<v Speaker 2>grabs data, and, crucially, how to scale it. Everything you

13
00:00:42.240 --> 00:00:43.759
<v Speaker 2>really need for a robust setup.

14
00:00:43.799 --> 00:00:46.479
<v Speaker 1>And the timing on this couldn't be better, really. Zabbix

15
00:00:46.520 --> 00:00:50.039
<v Speaker 1>5.0 is an LTS release, long-term support. Companies bet

16
00:00:50.039 --> 00:00:52.520
<v Speaker 1>their infrastructure health on this for years. So getting these

17
00:00:52.560 --> 00:00:54.840
<v Speaker 1>fundamentals right, right now, is super critical.

18
00:00:54.920 --> 00:00:55.399
<v Speaker 2>Absolutely.

19
00:00:55.479 --> 00:00:57.439
<v Speaker 1>Okay, so let's unpack this a bit. What are the

20
00:00:57.520 --> 00:01:01.320
<v Speaker 1>absolute foundational bits needed to just make Zabbix run, and

21
00:01:01.479 --> 00:01:05.799
<v Speaker 1>maybe more importantly, what's that single essential metric, the one

22
00:01:05.840 --> 00:01:08.959
<v Speaker 1>thing that tells you if your serious monitoring setup can

23
00:01:09.000 --> 00:01:11.040
<v Speaker 1>actually handle the load you're throwing at it?

24
00:01:11.239 --> 00:01:15.959
<v Speaker 2>Right. Architecturally, Zabbix needs three core components, constantly talking

25
00:01:16.000 --> 00:01:18.640
<v Speaker 2>to each other. First, you've got the Zabbix server. That's

26
00:01:18.680 --> 00:01:21.640
<v Speaker 2>the central brain doing the polling, the processing.

27
00:01:21.159 --> 00:01:21.480
<v Speaker 1>Got it.

28
00:01:21.560 --> 00:01:22.760
<v Speaker 3>The engine. The engine.

29
00:01:23.000 --> 00:01:27.200
<v Speaker 2>Then the database, usually MariaDB, maybe Postgres. That's

30
00:01:27.239 --> 00:01:29.760
<v Speaker 2>the huge repository for all your time-series data. It

31
00:01:29.760 --> 00:01:30.239
<v Speaker 2>gets big.

32
00:01:30.359 --> 00:01:31.560
<v Speaker 3>The data store. Exactly.

33
00:01:31.920 --> 00:01:35.159
<v Speaker 2>And finally, the Zabbix frontend. That's the web UI you

34
00:01:35.200 --> 00:01:41.200
<v Speaker 2>interact with, served up by Apache or NGINX, typically.

35
00:01:41.079 --> 00:01:45.120
<v Speaker 1>So: engine, data store, dashboard. Makes sense. Now, before we

36
00:01:45.159 --> 00:01:47.959
<v Speaker 1>get into performance, where does a listener actually find all

37
00:01:47.959 --> 00:01:49.959
<v Speaker 1>these metrics and settings we're about to talk about?

38
00:01:50.120 --> 00:01:54.040
<v Speaker 2>Good point. That dashboard, the frontend, is structured pretty clearly.

39
00:01:54.159 --> 00:01:58.359
<v Speaker 2>When you log in, you see these main categories: Monitoring, Inventory, Reports,

40
00:01:58.400 --> 00:02:02.840
<v Speaker 2>Configuration, and Administration. All the key stuff we'll discuss today

41
00:02:02.920 --> 00:02:05.480
<v Speaker 2>fits neatly into one of those five areas. Makes

42
00:02:05.480 --> 00:02:08.879
<v Speaker 2>it easier to navigate. Now, that critical metric you asked about,

43
00:02:09.120 --> 00:02:11.000
<v Speaker 2>the one that governs scale, that's

44
00:02:11.039 --> 00:02:15.960
<v Speaker 1>NVPS. New values per second. Why is that number

45
00:02:16.360 --> 00:02:20.120
<v Speaker 1>so incredibly important in a high-volume system?

46
00:02:20.240 --> 00:02:23.439
<v Speaker 2>Think of NVPS as the ingestion rate. It's the average

47
00:02:23.520 --> 00:02:26.400
<v Speaker 2>number of new data points, new readings from your items,

48
00:02:26.439 --> 00:02:31.400
<v Speaker 2>that the Zabbix server is either receiving or requesting every single second.

49
00:02:31.280 --> 00:02:33.400
<v Speaker 3>Per second. Wow.

50
00:02:33.639 --> 00:02:36.560
<v Speaker 2>It is absolutely the single best measure of the load

51
00:02:36.599 --> 00:02:38.800
<v Speaker 2>on your server, both right now and what you can expect.
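
NOTE
A back-of-the-envelope sketch of the NVPS idea, assuming the usual rule of thumb that polled items contribute roughly (number of items) / (update interval in seconds); trapper and active-agent data come on top of this. The fleet sizes are invented for illustration. The Zabbix frontend's System information report shows a comparable figure as "Required server performance, new values per second".
```python
def estimate_nvps(groups):
    """Estimate new values per second from (item_count, interval_seconds) pairs."""
    return sum(count / interval for count, interval in groups)
# Hypothetical fleet: 200 hosts with 100 items every 60 s, plus 20 items every 300 s.
fleet = [
    (200 * 100, 60),
    (200 * 20, 300),
]
print(round(estimate_nvps(fleet), 2))
```
Doubling the host count, or halving every interval, doubles the estimate, which is why the number tracks the CPU and memory the server will need.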

52
00:02:39.319 --> 00:02:41.280
<v Speaker 1>So it predicts the future almost.

53
00:02:41.000 --> 00:02:41.400
<v Speaker 3>In a way.

54
00:02:41.520 --> 00:02:45.280
<v Speaker 2>Yes. If you see that NVPS number climbing steadily, staying high,

55
00:02:45.919 --> 00:02:48.560
<v Speaker 2>that's your definitive signal. It tells you exactly when and

56
00:02:48.599 --> 00:02:50.560
<v Speaker 2>how much you need to beef up your server's CPU

57
00:02:50.599 --> 00:02:53.560
<v Speaker 2>and memory. You ignore NVPS at your own peril, basically.

58
00:02:53.599 --> 00:02:57.680
<v Speaker 1>Okay, message received. If NVPS is the heartbeat, let's

59
00:02:57.719 --> 00:03:01.000
<v Speaker 1>talk about what we're actually feeding this engine. Monitoring isn't

60
00:03:01.039 --> 00:03:04.639
<v Speaker 1>just about pings anymore. You need custom data, sometimes asynchronous stuff.

61
00:03:04.960 --> 00:03:07.599
<v Speaker 1>How does Zabbix 5 give you that flexibility beyond just

62
00:03:07.639 --> 00:03:08.280
<v Speaker 1>basic checks?

63
00:03:08.319 --> 00:03:10.479
<v Speaker 2>Well, it starts with the agent and simple checks, like, you know,

64
00:03:10.599 --> 00:03:14.039
<v Speaker 2>is SSH port 22 open? Those are still there, sure. But

65
00:03:14.000 --> 00:03:17.479
<v Speaker 1>the big shift is to Zabbix Agent 2. It's written in

66
00:03:17.439 --> 00:03:20.159
<v Speaker 3>Golang. Ah, Go. Okay, and that

67
00:03:20.159 --> 00:03:25.000
<v Speaker 1>matters because Go natively handles concurrency and asynchronous tasks much better.

68
00:03:25.520 --> 00:03:29.879
<v Speaker 1>The result: a much lighter footprint on resources.

69
00:03:29.360 --> 00:03:32.639
<v Speaker 2>Which is huge in containerized setups or just large virtual

70
00:03:32.400 --> 00:03:34.240
<v Speaker 3>environments. Exactly, invaluable.

71
00:03:34.400 --> 00:03:37.719
<v Speaker 2>So that's the tech choice explained. But for really custom

72
00:03:37.759 --> 00:03:41.639
<v Speaker 2>bespoke data, the sources talk a lot about this trapper mechanism.

73
00:03:41.919 --> 00:03:44.560
<v Speaker 2>If Agent 2 is so advanced, why do we need

74
00:03:44.560 --> 00:03:47.560
<v Speaker 2>this separate trapper thing? Isn't that just adding complexity?

75
00:03:47.879 --> 00:03:50.879
<v Speaker 1>It's a fair question. They serve slightly different roles, though.

76
00:03:50.919 --> 00:03:53.599
<v Speaker 1>The agent typically polls on a set schedule, right? Check

77
00:03:53.639 --> 00:03:57.599
<v Speaker 1>this every sixty seconds. The trapper is for asynchronous data,

78
00:03:57.759 --> 00:04:00.319
<v Speaker 1>data that maybe comes from some external script on the

79
00:04:00.360 --> 00:04:02.680
<v Speaker 1>host and you don't know exactly when it'll finish.

80
00:04:02.840 --> 00:04:04.599
<v Speaker 2>Ah, okay, like a long-running job.

81
00:04:04.719 --> 00:04:07.240
<v Speaker 1>Precisely. So you set up a Zabbix trapper item on

82
00:04:07.280 --> 00:04:08.479
<v Speaker 1>the server side. It just waits.

83
00:04:08.719 --> 00:04:11.879
<v Speaker 2>Then on the host being monitored, you use the Zabbix

84
00:04:11.960 --> 00:04:16.439
<v Speaker 2>sender utility. That utility actively pushes the result, maybe from

85
00:04:16.480 --> 00:04:19.879
<v Speaker 2>your complex Python script or cron job, straight to the

86
00:04:19.920 --> 00:04:21.519
<v Speaker 2>server when the result is ready.
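
NOTE
A minimal sketch of what a trapper push looks like on the wire, assuming the Zabbix 4.0+ framing used by zabbix_sender: the bytes "ZBXD", a 0x01 flag byte, a 4-byte little-endian data length plus 4 reserved bytes, then a JSON "sender data" request. The host name and item key are hypothetical; they must match a host and a trapper item configured on the server. In day-to-day use you would simply run zabbix_sender -z SERVER -s HOST -k KEY -o VALUE, which talks to TCP port 10051 by default.
```python
import json
import struct
def build_sender_packet(host, key, value):
    """Frame one trapper value as a 'sender data' request (Zabbix 4.0+ header)."""
    payload = json.dumps({
        "request": "sender data",
        "data": [{"host": host, "key": key, "value": str(value)}],
    }).encode("utf-8")
    # b"ZBXD" + flag 0x01, then data length (4 bytes LE) + 4 reserved zero bytes,
    # packed together here as one unsigned 64-bit little-endian integer.
    return b"ZBXD\x01" + struct.pack("<Q", len(payload)) + payload
# Hypothetical host and trapper key, e.g. reported by a cron job when it finishes.
packet = build_sender_packet("backup-host-01", "backup.duration", 1834)
print(packet[:13])
```
Nothing is sent here on purpose; writing these bytes to a socket connected to the server (or an active proxy) on port 10051 is the remaining step.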

87
00:04:21.279 --> 00:04:23.800
<v Speaker 1>So the server isn't asking, it's just listening for that

88
00:04:23.959 --> 00:04:25.000
<v Speaker 1>specific data to arrive.

89
00:04:25.040 --> 00:04:28.720
<v Speaker 2>Exactly. Passive listening on the server, active sending from

90
00:04:28.759 --> 00:04:29.120
<v Speaker 2>the host.

91
00:04:29.319 --> 00:04:31.519
<v Speaker 1>That makes sense. It's the gateway for data that doesn't

92
00:04:31.560 --> 00:04:34.759
<v Speaker 1>fit the regular polling schedule. Now here's where it seems

93
00:04:34.800 --> 00:04:37.680
<v Speaker 1>to get really powerful, though. Getting raw data is one thing;

94
00:04:37.759 --> 00:04:40.399
<v Speaker 1>making it useful is another. Yeah, tell us about this

95
00:04:40.519 --> 00:04:41.800
<v Speaker 1>preprocessing layer.

96
00:04:42.040 --> 00:04:45.319
<v Speaker 2>Oh, preprocessing is an absolute game-changer. It's data manipulation

97
00:04:45.439 --> 00:04:48.399
<v Speaker 2>that happens before the value even lands in the database.

98
00:04:48.079 --> 00:04:51.040
<v Speaker 3>Before storage, okay.

99
00:04:50.959 --> 00:04:54.680
<v Speaker 2>Yes, and its main goal is efficiency. One way is using things

100
00:04:54.759 --> 00:04:56.120
<v Speaker 2>like dependent items.

101
00:04:55.639 --> 00:04:56.879
<v Speaker 3>Dependent items.

102
00:04:56.959 --> 00:04:59.879
<v Speaker 2>Think of it like this. You make one single request

103
00:05:00.120 --> 00:05:02.879
<v Speaker 2>to get, say, a big chunk of status info, a

104
00:05:02.879 --> 00:05:05.519
<v Speaker 2>master item check, right? Then dependent items let you

105
00:05:05.560 --> 00:05:09.279
<v Speaker 2>pull five specific metrics out of that single chunk of data,

106
00:05:09.319 --> 00:05:12.160
<v Speaker 2>all for the cost of just one network call, one

107
00:05:12.199 --> 00:05:12.920
<v Speaker 2>interval check.
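
NOTE
A simplified simulation of the master/dependent item pattern, assuming the master item returns one JSON status document; in Zabbix each dependent item would carry a JSONPath preprocessing step such as $.cpu.util. The document contents and the item keys are hypothetical. The point is that the host is queried once per interval while several values are derived from that one response.
```python
import json
# Hypothetical master item value: one status document fetched in a single check.
MASTER_VALUE = '{"cpu": {"util": 12.5}, "memory": {"used_pct": 63.1}, "disk": {"root_free_gb": 41}, "uptime": 86400}'
# Hypothetical dependent item keys, each mapped to the path a JSONPath step would follow.
DEPENDENT_ITEMS = {
    "status.cpu.util": ["cpu", "util"],                 # $.cpu.util
    "status.memory.used": ["memory", "used_pct"],       # $.memory.used_pct
    "status.disk.root.free": ["disk", "root_free_gb"],  # $.disk.root_free_gb
    "status.uptime": ["uptime"],                        # $.uptime
}
def follow(document, path):
    """Walk nested dictionaries along path and return the value found there."""
    for part in path:
        document = document[part]
    return document
parsed = json.loads(MASTER_VALUE)  # one collection, parsed once, reused for every dependent item
values = {key: follow(parsed, path) for key, path in DEPENDENT_ITEMS.items()}
print(values)
```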

108
00:05:13.079 --> 00:05:17.000
<v Speaker 1>Ah. So instead of five separate checks for CPU, RAM,

109
00:05:17.079 --> 00:05:21.040
<v Speaker 1>disk, uptime, you do one big check and peel off

110
00:05:21.040 --> 00:05:21.600
<v Speaker 1>the bits you need.

111
00:05:21.360 --> 00:05:24.480
<v Speaker 2>Precisely. It saves a massive amount of overhead

112
00:05:24.480 --> 00:05:25.480
<v Speaker 2>on the network and on the agent.

113
00:05:25.519 --> 00:05:26.279
<v Speaker 1>Okay, that's clever.

114
00:05:26.399 --> 00:05:29.079
<v Speaker 2>And often that one big status check dumps out

115
00:05:29.160 --> 00:05:33.079
<v Speaker 2>just messy, unstructured text. Think of the raw output from

116
00:05:33.120 --> 00:05:36.199
<v Speaker 2>something like ifconfig on Linux, run via system.run.

117
00:05:36.120 --> 00:05:37.160
<v Speaker 1>Yeah, tons of text.

118
00:05:37.279 --> 00:05:39.399
<v Speaker 2>This is where what you called the regex hack comes in.

119
00:05:39.759 --> 00:05:43.120
<v Speaker 2>You use preprocessing with a regular expression, a regex, to

120
00:05:43.240 --> 00:05:45.839
<v Speaker 3>scan that raw text, find the pattern,

121
00:05:45.759 --> 00:05:48.639
<v Speaker 2>and cleanly extract just the single number you care about,

122
00:05:48.680 --> 00:05:51.680
<v Speaker 2>like total RX bytes for interface ens192.

123
00:05:52.160 --> 00:05:55.079
<v Speaker 2>You turn that text chaos into a clean, usable number

124
00:05:55.160 --> 00:05:55.759
<v Speaker 2>right at the door.
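
NOTE
A sketch of the regex step, assuming a master item that returns raw ifconfig output (for example via system.run) and a Zabbix "Regular expression" preprocessing step whose output template is \1. The sample output and its counters are made up, and Python's re module stands in for the PCRE engine Zabbix uses; for a pattern this simple the two should behave the same.
```python
import re
# Hypothetical ifconfig output, as a system.run master item might return it.
RAW = """ens192: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.0.0.15  netmask 255.255.255.0  broadcast 10.0.0.255
        RX packets 1203  bytes 456789 (446.0 KiB)
        TX packets 987  bytes 123456 (120.5 KiB)
lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        RX packets 50  bytes 4000 (3.9 KiB)
"""
# Anchor on the interface name, skip lazily to its RX line, capture the byte counter.
# [\s\S] is used instead of a DOTALL flag so the pattern reads the same in PCRE.
PATTERN = r"ens192:[\s\S]*?RX packets \d+\s+bytes (\d+)"
def extract_rx_bytes(raw):
    match = re.search(PATTERN, raw)
    return int(match.group(1)) if match else None  # None stands in for a failed step
print(extract_rx_bytes(RAW))
```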

125
00:05:55.920 --> 00:05:58.600
<v Speaker 1>That's fantastic. It shifts the cleanup work away from the

126
00:05:58.600 --> 00:06:01.240
<v Speaker 1>network into the server's processing, where it's more efficient. I

127
00:06:01.240 --> 00:06:04.360
<v Speaker 1>think you mentioned another database tip related to this, something

128
00:06:04.439 --> 00:06:05.319
<v Speaker 1>about duplicates.

129
00:06:05.560 --> 00:06:08.079
<v Speaker 2>Yes, exactly. For values that don't change often, maybe a

130
00:06:08.160 --> 00:06:10.399
<v Speaker 2>server serial number or a version string, that kind of

131
00:06:10.399 --> 00:06:14.120
<v Speaker 2>thing, mostly static, right? You use the Discard unchanged

132
00:06:14.160 --> 00:06:17.639
<v Speaker 2>preprocessing step. If Zabbix gets the same value one hundred

133
00:06:17.680 --> 00:06:20.759
<v Speaker 2>times in a row, it won't write one hundred identical

134
00:06:20.879 --> 00:06:22.040
<v Speaker 2>entries into the database.

135
00:06:22.199 --> 00:06:24.839
<v Speaker 1>Ah, so it just stores the first one and ignores

136
00:06:24.879 --> 00:06:26.000
<v Speaker 1>the rest until it changes.

137
00:06:26.120 --> 00:06:26.519
<v Speaker 3>Correct.

138
00:06:26.680 --> 00:06:29.800
<v Speaker 2>This prevents just pointless bloating of your database over time.
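
NOTE
A small simulation of the Discard unchanged step, under the simplifying assumption that values arrive one at a time and are compared only with the last value kept. The version strings are invented. Zabbix also offers a "Discard unchanged with heartbeat" variant that still stores a repeat after a set time; that variant is not modelled here.
```python
def discard_unchanged(values):
    """Keep a value only if it differs from the last value that was kept."""
    kept = []
    last = object()  # sentinel: the first value is always kept
    for value in values:
        if value != last:
            kept.append(value)
            last = value
    return kept
# Invented readings: a version string polled 150 times that changes once.
readings = ["5.0.1"] * 100 + ["5.0.2"] * 50
stored = discard_unchanged(readings)
print(len(readings), len(stored))
```
Here 150 polls turn into 2 stored rows, and a value that flips back is stored again because it differs from the last kept one.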

139
00:06:30.160 --> 00:06:31.800
<v Speaker 2>Super important for long-term health.

140
00:06:31.600 --> 00:06:35.040
<v Speaker 1>Definitely. Okay, so we've collected and cleaned the data.

141
00:06:35.160 --> 00:06:37.199
<v Speaker 1>Now we need to act on it. Alerting. We've all

142
00:06:37.199 --> 00:06:41.560
<v Speaker 1>been there. The flapping alert goes critical, then OK, then

143
00:06:41.600 --> 00:06:45.079
<v Speaker 1>critical, fifty times an hour, and floods your inbox. How does

144
00:06:45.160 --> 00:06:47.839
<v Speaker 1>Zabbix help stop that alert-fatigue nightmare?

145
00:06:47.959 --> 00:06:50.519
<v Speaker 2>The cure for that horror show is the recovery expression.

146
00:06:50.839 --> 00:06:54.480
<v Speaker 2>Setting a simple trigger is easy: CPU usage above fifty percent,

147
00:06:54.800 --> 00:06:56.000
<v Speaker 2>trigger an alert.

148
00:06:55.800 --> 00:06:58.120
<v Speaker 1>Right, and the naive setup recovers as soon as it

149
00:06:58.199 --> 00:06:59.480
<v Speaker 1>hits forty nine percent.

150
00:06:59.240 --> 00:07:03.879
<v Speaker 2>Exactly, which leads to flapping. The recovery expression demands stability. You

151
00:07:03.959 --> 00:07:06.439
<v Speaker 2>define a separate condition for the alert to clear: it

152
00:07:06.480 --> 00:07:09.000
<v Speaker 2>has to move significantly away from the problem threshold.

153
00:07:09.120 --> 00:07:11.720
<v Speaker 1>So, like, trigger at fifty percent, but only recover when

154
00:07:11.720 --> 00:07:13.920
<v Speaker 1>it's back down to say forty percent.

155
00:07:13.759 --> 00:07:16.879
<v Speaker 2>Precisely. The system has to prove it's stable and

156
00:07:16.959 --> 00:07:19.600
<v Speaker 2>well clear of the danger zone before the alarm goes silent.
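
NOTE
A sketch of the problem/recovery logic with the thresholds from this example (problem above 50%, recovery below 40%). In Zabbix 5.0 syntax this would look roughly like {HOST:system.cpu.util.last()}>50 as the problem expression and {HOST:system.cpu.util.last()}<40 as the recovery expression, with "OK event generation" set to "Recovery expression"; HOST is a placeholder. The CPU samples are invented.
```python
PROBLEM_ABOVE = 50.0  # problem expression: CPU utilization > 50
RECOVER_BELOW = 40.0  # recovery expression: CPU utilization < 40
def evaluate(samples, use_recovery_expression=True):
    """Return the trigger state ('OK' or 'PROBLEM') after each sample."""
    state, states = "OK", []
    for value in samples:
        if state == "OK":
            if value > PROBLEM_ABOVE:
                state = "PROBLEM"
        elif use_recovery_expression:
            if value < RECOVER_BELOW:
                state = "OK"
        elif value <= PROBLEM_ABOVE:  # naive: recover as soon as the problem expression is false
            state = "OK"
        states.append(state)
    return states
cpu = [45, 55, 49, 52, 48, 44, 39, 41]  # invented samples hovering around the threshold
naive = evaluate(cpu, use_recovery_expression=False)
hysteresis = evaluate(cpu)
print(naive)
print(hysteresis)
```
With the same noisy samples, the naive trigger opens two problems, while the recovery expression keeps a single problem open until the value is genuinely low.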

157
00:07:19.879 --> 00:07:22.040
<v Speaker 1>That makes the alerts actually meaningful again. Nice.

158
00:07:22.199 --> 00:07:26.399
<v Speaker 2>Absolutely. And organizationally, you need structure too. Use tags

159
00:07:26.439 --> 00:07:30.000
<v Speaker 2>from day one. Add service:SSH or application:billing to

160
00:07:30.079 --> 00:07:31.839
<v Speaker 2>every related trigger so you

161
00:07:31.800 --> 00:07:33.519
<v Speaker 1>can filter and route alerts properly.

162
00:07:33.680 --> 00:07:36.879
<v Speaker 2>Yes, and customize your severity levels. Don't stick with the

163
00:07:36.920 --> 00:07:40.439
<v Speaker 2>defaults like Disaster, High, Warning. If your company uses P1,

164
00:07:40.519 --> 00:07:43.040
<v Speaker 2>P2, P3, change the names in Zabbix so

165
00:07:43.120 --> 00:07:45.800
<v Speaker 2>the alerts immediately make sense in your team's context.

166
00:07:46.279 --> 00:07:50.519
<v Speaker 1>Good practical tips. Now let's talk scaling, the actual architecture.

167
00:07:51.279 --> 00:07:54.560
<v Speaker 1>NVPS tells us the load is high. But for medium to large,

168
00:07:54.639 --> 00:08:00.160
<v Speaker 1>geographically dispersed environments, what's the component that handles spreading out

169
00:08:00.160 --> 00:08:01.000
<v Speaker 1>that monitoring work?

170
00:08:01.120 --> 00:08:04.079
<v Speaker 2>That's where Zabbix proxies come in. They are absolutely essential

171
00:08:04.120 --> 00:08:06.439
<v Speaker 2>for offloading work from the central Zabbix server.

172
00:08:06.639 --> 00:08:07.399
<v Speaker 1>How do they do that?

173
00:08:07.480 --> 00:08:10.360
<v Speaker 2>A proxy sits closer to the devices it monitors, maybe

174
00:08:10.360 --> 00:08:13.120
<v Speaker 2>in a remote data center or a specific network segment.

175
00:08:13.439 --> 00:08:15.839
<v Speaker 2>It collects data locally, holds on to it, maybe does

176
00:08:15.839 --> 00:08:16.720
<v Speaker 2>some preprocessing.

177
00:08:17.120 --> 00:08:19.319
<v Speaker 1>Preprocessing can happen on the proxy too.

178
00:08:19.519 --> 00:08:22.279
<v Speaker 2>Yes, some of it. Then it compresses that data and

179
00:08:22.360 --> 00:08:25.079
<v Speaker 2>forwards it efficiently back to the main Zabbix server. It

180
00:08:25.160 --> 00:08:27.959
<v Speaker 2>reduces the load on the central server significantly.

181
00:08:28.079 --> 00:08:33.039
<v Speaker 1>Okay. Now, when deploying these proxies, you've got firewalls, network segmentation.

182
00:08:33.360 --> 00:08:35.840
<v Speaker 1>Are there different types, with performance trade-offs?

183
00:08:35.960 --> 00:08:36.200
<v Speaker 3>Yeah.

184
00:08:36.200 --> 00:08:39.159
<v Speaker 2>There are two main modes: passive and active proxies. We

185
00:08:39.240 --> 00:08:43.840
<v Speaker 2>strongly, strongly recommend active proxies. Why active? Two big reasons. First,

186
00:08:44.440 --> 00:08:48.480
<v Speaker 2>they're generally faster because they proactively push their collected data

187
00:08:48.559 --> 00:08:50.759
<v Speaker 2>to the server whenever they have new stuff. They don't

188
00:08:50.759 --> 00:08:51.480
<v Speaker 2>wait to be asked.

189
00:08:51.559 --> 00:08:53.000
<v Speaker 1>Okay, less latency, right.

190
00:08:53.440 --> 00:08:56.720
<v Speaker 2>But second, and often more importantly, for network teams, an

191
00:08:56.759 --> 00:09:01.039
<v Speaker 2>active proxy only needs one outbound connection, initiated from

192
00:09:01.080 --> 00:09:02.120
<v Speaker 2>the proxy to the server.

193
00:09:02.320 --> 00:09:04.320
<v Speaker 3>Ah, so the proxy calls home. Exactly.

194
00:09:04.679 --> 00:09:07.399
<v Speaker 2>Compare that to passive proxies, where the central server has

195
00:09:07.440 --> 00:09:11.360
<v Speaker 2>to be able to initiate connections to potentially dozens or

196
00:09:11.480 --> 00:09:14.200
<v Speaker 2>hundreds of proxies. Firewall rule nightmare.

197
00:09:14.440 --> 00:09:19.399
<v Speaker 1>Definitely. Simplifying firewall management is a massive win in big companies. Okay,

198
00:09:19.440 --> 00:09:23.519
<v Speaker 1>sticking with operational wins, let's hit the biggest maintenance

199
00:09:23.559 --> 00:09:28.200
<v Speaker 1>headache for any monitoring system over time: the database. How

200
00:09:28.200 --> 00:09:31.159
<v Speaker 1>do we stop it from becoming this unmanageable beast?

201
00:09:31.240 --> 00:09:34.200
<v Speaker 2>Yeah, the database is always the long-term challenge. The bottleneck

202
00:09:34.279 --> 00:09:37.039
<v Speaker 2>comes from the default Zabbix process called the housekeeper.

203
00:09:37.120 --> 00:09:39.240
<v Speaker 1>Housekeeper? Sounds helpful.

204
00:09:39.120 --> 00:09:41.399
<v Speaker 2>Well, it tries to be. When data gets old, say your

205
00:09:41.399 --> 00:09:44.440
<v Speaker 2>history retention is ninety days, the housekeeper is responsible for

206
00:09:44.480 --> 00:09:45.840
<v Speaker 2>deleting data older than that.

207
00:09:46.000 --> 00:09:48.639
<v Speaker 3>Okay, seems necessary.

208
00:09:48.320 --> 00:09:52.039
<v Speaker 2>But it deletes that data row by row. Imagine millions, maybe billions of

209
00:09:52.159 --> 00:09:55.879
<v Speaker 2>rows as your database grows into terabytes. This row by

210
00:09:56.000 --> 00:10:00.720
<v Speaker 2>row deletion just consumes immense I/O and CPU. Yeah, it can bring

211
00:10:00.720 --> 00:10:04.080
<v Speaker 2>your server performance to its knees during cleanup.

212
00:10:04.120 --> 00:10:06.879
<v Speaker 1>Ouch. So relying on the built-in cleaner eventually just grinds

213
00:10:06.919 --> 00:10:09.799
<v Speaker 1>everything to a halt. What's the better way? The advanced solution?

214
00:10:10.120 --> 00:10:12.679
<v Speaker 2>You absolutely need to move to database-native methods. For

215
00:10:12.840 --> 00:10:16.399
<v Speaker 2>MySQL, that's MySQL partitioning. For PostgreSQL, it's leveraging

216
00:10:16.480 --> 00:10:19.840
<v Speaker 2>the TimescaleDB extension, which brings time-series superpowers to

217
00:10:19.799 --> 00:10:24.559
<v Speaker 1>Postgres. Partitioning or TimescaleDB: how do they avoid the

218
00:10:24.679 --> 00:10:25.799
<v Speaker 1>row by row problem?

219
00:10:25.840 --> 00:10:29.519
<v Speaker 2>They work with time-based chunks. Instead of deleting individual rows,

220
00:10:29.639 --> 00:10:32.240
<v Speaker 2>the database is structured so you can just drop an

221
00:10:32.399 --> 00:10:37.440
<v Speaker 2>entire old partition, say, all the data from March, instantly.

222
00:10:37.279 --> 00:10:39.720
<v Speaker 1>Like throwing away a whole filing cabinet drawer instead of

223
00:10:39.759 --> 00:10:41.200
<v Speaker 1>shredding each paper inside.
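
NOTE
A sketch of why partition drops are cheap to schedule: instead of deleting rows, a maintenance job only has to name the daily partitions that fall outside the retention window. It assumes MySQL range partitioning of the Zabbix history table by day and a pYYYY_MM_DD naming scheme, both illustrative choices rather than anything Zabbix sets up for you. With TimescaleDB, the comparable operation is its drop_chunks() function.
```python
from datetime import date, timedelta
def partitions_to_drop(table, existing, today, keep_days):
    """Return DROP PARTITION statements for daily partitions older than keep_days.
    Partition names follow a hypothetical pYYYY_MM_DD convention."""
    cutoff = today - timedelta(days=keep_days)
    statements = []
    for name in sorted(existing):
        day = date(int(name[1:5]), int(name[6:8]), int(name[9:11]))
        if day < cutoff:
            statements.append(f"ALTER TABLE {table} DROP PARTITION {name};")
    return statements
existing = ["p2020_03_01", "p2020_03_02", "p2020_06_01"]
for stmt in partitions_to_drop("history", existing, date(2020, 6, 2), keep_days=90):
    print(stmt)
```
Each statement removes a whole day of history in one metadata operation, which is the filing-cabinet-drawer effect described above.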

224
00:10:41.240 --> 00:10:45.679
<v Speaker 2>Exactly that analogy. It's incredibly efficient. The cleanup becomes almost instantaneous,

225
00:10:46.080 --> 00:10:48.080
<v Speaker 2>freeing up massive resources.

226
00:10:48.159 --> 00:10:50.600
<v Speaker 1>That sounds like a no brainer from a performance perspective.

227
00:10:51.039 --> 00:10:55.639
<v Speaker 1>But does switching to partitioning have any functional trade-offs

228
00:10:55.679 --> 00:10:58.000
<v Speaker 1>for the user configuring Zabbix?

229
00:10:58.120 --> 00:11:01.399
<v Speaker 2>It does, yeah. That's the key planning point. When you implement native partitioning,

230
00:11:01.480 --> 00:11:05.320
<v Speaker 2>your history and trend data retention settings often become global

231
00:11:05.399 --> 00:11:09.399
<v Speaker 2>settings. You typically lose the ability to set, say,

232
00:11:09.440 --> 00:11:12.000
<v Speaker 2>seven days of history for this item but ninety days for

233
00:11:12.039 --> 00:11:12.480
<v Speaker 2>that item.

234
00:11:12.600 --> 00:11:15.559
<v Speaker 1>Ah. So you gain huge efficiency, but lose some of

235
00:11:15.559 --> 00:11:18.279
<v Speaker 1>that fine-grained control over retention per item.

236
00:11:18.440 --> 00:11:20.240
<v Speaker 2>That's the main trade off. You need to plan your

237
00:11:20.279 --> 00:11:21.919
<v Speaker 2>retention strategy more globally.

238
00:11:22.279 --> 00:11:27.480
<v Speaker 1>Got it. Important consideration. Okay, last big area: hybrid cloud.

239
00:11:27.639 --> 00:11:30.600
<v Speaker 1>Lots of critical metrics now live in places like AWS

240
00:11:30.600 --> 00:11:34.559
<v Speaker 1>CloudWatch or Azure Monitor. How does Zabbix stay relevant?

241
00:11:34.679 --> 00:11:39.240
<v Speaker 1>How does it pull data that's locked behind these proprietary cloud APIs?

242
00:11:39.519 --> 00:11:42.720
<v Speaker 2>Zabbix uses a really powerful combo here: Zabbix agent user

243
00:11:42.759 --> 00:11:45.639
<v Speaker 2>parameters, plus the cloud provider's own CLI tools.

244
00:11:45.720 --> 00:11:49.240
<v Speaker 1>Okay, so you install the AWS CLI or the Azure CLI where?

245
00:11:49.279 --> 00:11:51.600
<v Speaker 2>Right onto the same machine where Zabbix Agent 2

246
00:11:51.720 --> 00:11:54.000
<v Speaker 2>is running. Could be an EC2 instance, could be

247
00:11:54.039 --> 00:11:56.559
<v Speaker 2>an on-prem machine that needs to query the cloud.

248
00:11:56.799 --> 00:11:59.360
<v Speaker 2>Then you configure a user parameter within the Zabbix

249
00:11:59.440 --> 00:12:02.720
<v Speaker 2>agent configuration. This user parameter basically just tells the

250
00:12:02.759 --> 00:12:06.159
<v Speaker 2>agent: run this specific AWS CLI command.
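
NOTE
A sketch of the user parameter mechanism. A flexible agent configuration line could look like UserParameter=aws.sqs.depth[*],aws sqs get-queue-attributes --queue-url "$1" --attribute-names ApproximateNumberOfMessages --query Attributes.ApproximateNumberOfMessages --output text, where aws.sqs.depth is a hypothetical key name. The Python below only mimics how the agent maps bracketed key parameters onto $1, $2 and so on before running the command; it ignores quoting rules and executes nothing. The queue URL is an example value.
```python
import re
# Hypothetical agent configuration entry: key pattern, then the command to run.
USER_PARAMETER = (
    'aws.sqs.depth[*],aws sqs get-queue-attributes --queue-url "$1" '
    "--attribute-names ApproximateNumberOfMessages "
    "--query Attributes.ApproximateNumberOfMessages --output text"
)
def expand(user_parameter, requested_key):
    """Substitute key parameters into $1..$9, roughly as the agent does for key[*] entries."""
    key_pattern, command = user_parameter.split(",", 1)
    base = key_pattern[: -len("[*]")]
    match = re.fullmatch(re.escape(base) + r"\[(.*)\]", requested_key)
    if match is None:
        raise KeyError(f"no user parameter matches {requested_key}")
    params = match.group(1).split(",")  # naive split: no support for quoted parameters
    def substitute(ref):
        index = int(ref.group(1)) - 1
        return params[index] if index < len(params) else ""
    return re.sub(r"\$([1-9])", substitute, command)
print(expand(USER_PARAMETER, "aws.sqs.depth[https://sqs.eu-west-1.amazonaws.com/123456789012/orders]"))
```
The agent would run the resulting command as its own service account, so that account needs working AWS CLI credentials with read access to the queue.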

251
00:12:06.440 --> 00:12:08.600
<v Speaker 1>The agent acts like a secure little proxy to run

252
00:12:08.639 --> 00:12:10.399
<v Speaker 1>the cloud command locally.

253
00:12:10.600 --> 00:12:13.360
<v Speaker 2>Precisely. The agent executes the command, gets the output, maybe your

254
00:12:13.399 --> 00:12:17.159
<v Speaker 2>SQS queue depth or CPU utilization from CloudWatch, and then

255
00:12:17.200 --> 00:12:20.039
<v Speaker 2>it injects that value straight into Zabbix like any other metric it collected.

256
00:12:20.080 --> 00:12:22.279
<v Speaker 1>Very flexible. Basically, if you can script it

257
00:12:22.320 --> 00:12:24.120
<v Speaker 1>on the command line, Zabbix can monitor it.

258
00:12:24.360 --> 00:12:27.120
<v Speaker 2>That's the power of it. And for containers, Agent 2

259
00:12:27.159 --> 00:12:29.279
<v Speaker 2>makes it even easier. It has native plugins for

260
00:12:29.320 --> 00:12:32.360
<v Speaker 2>things like Docker monitoring built right in, again thanks

261
00:12:32.399 --> 00:12:33.559
<v Speaker 2>to that Go architecture.

262
00:12:33.600 --> 00:12:35.960
<v Speaker 1>Okay, pulling it all together, then, what does this mean

263
00:12:36.000 --> 00:12:39.000
<v Speaker 1>for you, the listener? We've seen Zabbix 5 is, well,

264
00:12:39.000 --> 00:12:43.600
<v Speaker 1>it's clearly robust, super flexible, scalable, designed for way more

265
00:12:43.639 --> 00:12:47.440
<v Speaker 1>than just simple pings. It handles complex enterprise level stuff.

266
00:12:47.720 --> 00:12:48.080
<v Speaker 3>Yeah.

267
00:12:48.120 --> 00:12:51.720
<v Speaker 2>I think the key takeaway is that real control, real scalability,

268
00:12:51.799 --> 00:12:56.440
<v Speaker 2>requires planning up front, especially around the database. You

269
00:12:56.559 --> 00:12:59.720
<v Speaker 2>have to decide on partitioning or TimescaleDB early. Don't

270
00:12:59.720 --> 00:13:02.200
<v Speaker 2>wait till it hurts. And

271
00:13:02.440 --> 00:13:06.080
<v Speaker 2>getting good at that data extraction using preprocessing is non-negotiable

272
00:13:06.080 --> 00:13:08.200
<v Speaker 2>if you want to keep your database clean and

273
00:13:08.240 --> 00:13:09.720
<v Speaker 2>your metrics really trustworthy.

274
00:13:09.960 --> 00:13:13.159
<v Speaker 1>Fantastic summary. Now, building on that idea of platform control:

275
00:13:13.480 --> 00:13:16.120
<v Speaker 1>we talked about how Agent 2 can execute commands to

276
00:13:16.159 --> 00:13:19.600
<v Speaker 1>pull data, but what about controlling Zabbix itself? Here's

277
00:13:19.600 --> 00:13:21.799
<v Speaker 1>a final thought. Even a user who's locked down to

278
00:13:21.879 --> 00:13:23.480
<v Speaker 1>the basic Zabbix User

279
00:13:23.320 --> 00:13:27.120
<v Speaker 2>role. Right, usually the most basic, view-only type of role.

280
00:13:27.159 --> 00:13:29.840
<v Speaker 1>Exactly. They can maybe only see metrics for their own

281
00:13:29.879 --> 00:13:34.879
<v Speaker 1>couple of hosts. Even that user can potentially execute powerful

282
00:13:34.919 --> 00:13:38.840
<v Speaker 1>administrative actions, things like enabling or disabling hosts on maps

283
00:13:39.240 --> 00:13:43.240
<v Speaker 1>or scheduling system maintenance periods. They can trigger these right

284
00:13:43.279 --> 00:13:44.480
<v Speaker 1>from the Zavix front end.

285
00:13:44.519 --> 00:13:48.320
<v Speaker 2>Wait, hang on. A basic user doing admin tasks? How?

286
00:13:48.679 --> 00:13:50.960
<v Speaker 2>That sounds like a massive security hole if they don't

287
00:13:50.960 --> 00:13:52.000
<v Speaker 2>have the actual permissions.

288
00:13:52.080 --> 00:13:54.879
<v Speaker 1>It sounds like it, but it's actually the ultimate delegation

289
00:13:55.000 --> 00:13:59.279
<v Speaker 1>power of the Zabbix API. The trick is the script they

290
00:13:59.279 --> 00:14:02.360
<v Speaker 1>trigger from the frontend. It isn't running with their lowly

291
00:14:02.440 --> 00:14:05.960
<v Speaker 1>user permissions. Instead, that custom script you've set up is

292
00:14:05.960 --> 00:14:08.399
<v Speaker 1>configured behind the scenes to use the API credentials of

293
00:14:08.440 --> 00:14:11.519
<v Speaker 1>a different user, one who does have administrative privileges.

294
00:14:11.679 --> 00:14:13.559
<v Speaker 2>So the low-level user clicks a button, but the

295
00:14:13.600 --> 00:14:16.559
<v Speaker 2>action is performed via the API using a high-privileged

296
00:14:16.559 --> 00:14:18.840
<v Speaker 2>token or user configured in that script.
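
NOTE
A minimal sketch of the request such a delegated script might send, assuming the Zabbix 5.0 JSON-RPC API (maintenance.create with hostids and a one-time period, authenticated through the top-level "auth" field). The token and host ID are placeholders. A real script would keep the privileged token out of the user's reach, check that the requested host is one the user may act on, and POST this body to the frontend's api_jsonrpc.php with Content-Type: application/json-rpc.
```python
import json
import time
def maintenance_request(auth_token, host_id, hours=2, now=None):
    """Build (but do not send) a maintenance.create call for one host, starting now."""
    start = int(time.time() if now is None else now)
    duration = hours * 3600
    return {
        "jsonrpc": "2.0",
        "method": "maintenance.create",
        "params": {
            "name": f"Delegated maintenance, host {host_id}, {start}",  # names must be unique
            "active_since": start,
            "active_till": start + duration,
            "hostids": [host_id],
            # timeperiod_type 0 = one time only; period is in seconds
            "timeperiods": [{"timeperiod_type": 0, "start_date": start, "period": duration}],
        },
        # Token of the privileged API user configured in the script, not the clicking user's.
        "auth": auth_token,
        "id": 1,
    }
request = maintenance_request("PRIVILEGED-TOKEN-PLACEHOLDER", "10501", now=1_600_000_000)
print(json.dumps(request, indent=2))
```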

297
00:14:19.039 --> 00:14:23.000
<v Speaker 1>Exactly. It lets you safely delegate very specific, complex administrative

298
00:14:23.039 --> 00:14:26.080
<v Speaker 1>workflows, like putting a server into maintenance for exactly two

299
00:14:26.080 --> 00:14:29.120
<v Speaker 1>hours starting now, to users who absolutely should not have

300
00:14:29.240 --> 00:14:30.919
<v Speaker 1>general admin rights to the whole system.

301
00:14:31.200 --> 00:14:31.799
<v Speaker 3>Wow.

302
00:14:32.159 --> 00:14:37.039
<v Speaker 2>Okay, that is powerful: the API as a controlled delegation mechanism.

303
00:14:37.120 --> 00:14:41.039
<v Speaker 1>That's the raw, potent and highly customizable power hidden within

304
00:14:41.080 --> 00:14:44.440
<v Speaker 1>the Zabbix API. Food for thought, definitely. Thanks for diving

305
00:14:44.480 --> 00:14:46.440
<v Speaker 1>deep with us today. We really hope you feel better

306
00:14:46.480 --> 00:14:50.120
<v Speaker 1>equipped now to tackle your infrastructure monitoring challenges using Zabbix

307
00:14:50.159 --> 00:14:50.399
<v Speaker 1>5.
