WEBVTT 1 00:00:00.080 --> 00:00:02.200 Welcome back to the deep dive. We're here to give 2 00:00:02.240 --> 00:00:05.400 you that fast track, that real expertise on the industry's, 3 00:00:06.200 --> 00:00:10.560 well, most critical topics. That's the plan, and today we 4 00:00:10.599 --> 00:00:16.760 are really diving deep. Our mission: mastering modern IT infrastructure monitoring. Yeah, 5 00:00:16.800 --> 00:00:20.600 and our guide for this is some really comprehensive material 6 00:00:21.760 --> 00:00:25.719 compiled from the Zabbix 5 IT Infrastructure Monitoring Cookbook. Right, 7 00:00:25.960 --> 00:00:26.359 and for 8 00:00:26.320 --> 00:00:30.039 you listening in, the goal here is simple: a complete, 9 00:00:30.120 --> 00:00:32.920 but, you know, fast understanding of Zabbix 5 10 00:00:32.799 --> 00:00:34.920 architecture, cutting through the noise. Exactly. 11 00:00:34.920 --> 00:00:38.960 We're focusing on the core structure, the advanced ways it 12 00:00:38.960 --> 00:00:42.159 grabs data, and, crucially, how to scale it. Everything you 13 00:00:42.240 --> 00:00:43.759 really need for a robust setup. 14 00:00:43.799 --> 00:00:46.479 And the timing on this couldn't be better, really. Zabbix 15 00:00:46.520 --> 00:00:50.039 5 is an LTS release, long-term support. Companies bet 16 00:00:50.039 --> 00:00:52.520 their infrastructure health on this for years. So getting these 17 00:00:52.560 --> 00:00:54.840 fundamentals right, right now, is super critical. 18 00:00:54.920 --> 00:00:55.399 Absolutely. 19 00:00:55.479 --> 00:00:57.439 Okay, so let's unpack this a bit. What are the 20 00:00:57.520 --> 00:01:01.320 absolute foundational bits needed to just make Zabbix run, and 21 00:01:01.479 --> 00:01:05.799 maybe more importantly, what's that single essential metric, the one 22 00:01:05.840 --> 00:01:08.959 thing that tells you if your serious monitoring setup can 23 00:01:09.000 --> 00:01:11.040 actually handle the load you're throwing at it?
24 00:01:11.239 --> 00:01:15.959 Right. Architecturally, Zabbix needs three core components. They're constantly talking. 25 00:01:16.000 --> 00:01:18.640 Three main parts. Yeah, you've got the Zabbix server. That's 26 00:01:18.680 --> 00:01:21.640 the central brain doing the polling, the processing. 27 00:01:21.159 --> 00:01:21.480 Got it. 28 00:01:21.560 --> 00:01:22.760 The engine. The engine. 29 00:01:23.000 --> 00:01:27.200 Then the database, usually MariaDB, maybe Postgres. Well, that's 30 00:01:27.239 --> 00:01:29.760 the huge repository for all your time series data. It 31 00:01:29.760 --> 00:01:30.239 gets big. 32 00:01:30.359 --> 00:01:31.560 The data store. Exactly. 33 00:01:31.920 --> 00:01:35.159 And finally the Zabbix frontend. That's the web UI you 34 00:01:35.200 --> 00:01:41.200 interact with, served up by Apache or NGINX, typically. 35 00:01:41.079 --> 00:01:45.120 So engine, data store, dashboard. Makes sense. Now, before we 36 00:01:45.159 --> 00:01:47.959 get into performance, where does a listener actually find all 37 00:01:47.959 --> 00:01:49.959 these metrics and settings we're about to talk about? 38 00:01:50.120 --> 00:01:54.040 Good point. That dashboard, the frontend, it's structured pretty clearly. 39 00:01:54.159 --> 00:01:58.359 When you log in, you see these main categories: Monitoring, Inventory, Reports, 40 00:01:58.400 --> 00:02:02.840 Configuration, and Administration. Okay. All the key stuff we'll discuss today, 41 00:02:02.920 --> 00:02:05.480 it fits neatly into one of those five areas. Makes 42 00:02:05.480 --> 00:02:08.879 it easier to navigate. Now, that critical metric you asked about, 43 00:02:09.120 --> 00:02:11.000 the one that governs scale, that's 44 00:02:11.039 --> 00:02:15.960 NVPS. NVPS, new values per second. Why is that number 45 00:02:16.360 --> 00:02:20.120 so incredibly important in a high-volume system? 46 00:02:20.240 --> 00:02:23.439 Think of NVPS as the ingestion rate.
It's the average 47 00:02:23.520 --> 00:02:26.400 number of new data points, new readings from your items, 48 00:02:26.439 --> 00:02:31.400 that the Zabbix server is either receiving or requesting every single 49 00:02:31.280 --> 00:02:33.400 second. Per second, wow. Per second. 50 00:02:33.639 --> 00:02:36.560 It is absolutely the single best measure of the load 51 00:02:36.599 --> 00:02:38.800 on your server, both right now and what you can expect. 52 00:02:39.319 --> 00:02:41.280 So it predicts the future, almost. 53 00:02:41.000 --> 00:02:41.400 In a way, 54 00:02:41.520 --> 00:02:45.280 yes. If you see that NVPS number climbing steadily, staying high, 55 00:02:45.919 --> 00:02:48.560 that's your definitive signal. It tells you exactly when and 56 00:02:48.599 --> 00:02:50.560 how much you need to beef up your server's CPU 57 00:02:50.599 --> 00:02:53.560 and memory. Right, you ignore NVPS at your own peril, 58 00:02:53.599 --> 00:02:57.680 basically. Okay, message received. If NVPS is the heartbeat, let's 59 00:02:57.719 --> 00:03:01.000 talk about what we're actually feeding this engine. Monitoring isn't 60 00:03:01.039 --> 00:03:04.639 just about pings anymore. You need custom data, sometimes asynchronous stuff. 61 00:03:04.960 --> 00:03:07.599 How does Zabbix 5 give you that flexibility beyond just 62 00:03:07.639 --> 00:03:08.280 basic checks? 63 00:03:08.319 --> 00:03:10.479 Well, it starts with the agent. Simple checks, like, you know, 64 00:03:10.599 --> 00:03:14.039 is SSH port twenty-two open? Those are still there, sure. But 65 00:03:14.000 --> 00:03:17.479 the big shift is to Zabbix Agent 2. It's written in 66 00:03:17.439 --> 00:03:20.159 Golang. Ah, Go, okay. And that 67 00:03:20.159 --> 00:03:25.000 matters because Go natively handles concurrency and asynchronous tasks much better. 68 00:03:25.520 --> 00:03:29.879 The result: a much lighter footprint on resources.
69 00:03:29.360 --> 00:03:32.639 Which is huge in containerized setups or just large virtual 70 00:03:32.400 --> 00:03:34.240 environments. Exactly, invaluable. 71 00:03:34.400 --> 00:03:37.719 So that's the tech choice explained. But for really custom, 72 00:03:37.759 --> 00:03:41.639 bespoke data, the sources talk a lot about this trapper mechanism. 73 00:03:41.919 --> 00:03:44.560 If Agent 2 is so advanced, why do we need 74 00:03:44.560 --> 00:03:47.560 this separate trapper thing? Isn't that just adding complexity? 75 00:03:47.879 --> 00:03:50.879 It's a fair question. They serve slightly different roles, though. 76 00:03:50.919 --> 00:03:53.599 The agent typically polls on a set schedule, right? Check 77 00:03:53.639 --> 00:03:57.599 this every sixty seconds. The trapper is for asynchronous data, 78 00:03:57.759 --> 00:04:00.319 data that maybe comes from some external script on the 79 00:04:00.360 --> 00:04:02.680 host, and you don't know exactly when it'll finish. 80 00:04:02.840 --> 00:04:04.599 Ah, okay, like a long-running job. 81 00:04:04.719 --> 00:04:07.240 Precisely. So you set up a Zabbix trapper item on 82 00:04:07.280 --> 00:04:08.479 the server side. It just waits. 83 00:04:08.719 --> 00:04:11.879 Then, on the host being monitored, you use the Zabbix 84 00:04:11.960 --> 00:04:16.439 sender utility. That utility actively pushes the result, maybe from 85 00:04:16.480 --> 00:04:19.879 your complex Python script or cron job, straight to the 86 00:04:19.920 --> 00:04:21.519 server when the result is ready. 87 00:04:21.279 --> 00:04:23.800 So the server isn't asking, it's just listening for that 88 00:04:23.959 --> 00:04:25.000 specific data to 89 00:04:25.040 --> 00:04:28.720 arrive. Exactly: passive listening on the server, active sending from 90 00:04:28.759 --> 00:04:29.120 the host. 91 00:04:29.319 --> 00:04:31.519 That makes sense. It's the gateway for data that doesn't 92 00:04:31.560 --> 00:04:34.759 fit the regular polling schedule.
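Editor's sketch of the push just described, not something from the episode: a small Python helper that builds the `zabbix_sender` command line a cron job or script could run once its result is ready. The server name, host name, and item key below are hypothetical placeholders; the item key would have to match a trapper item already configured on the server.

```python
import shlex


def build_sender_command(server, host, key, value, port=10051):
    """Build the argv for zabbix_sender pushing one value to a trapper item.

    server: address of the Zabbix server or proxy (hypothetical below)
    host:   host name exactly as configured in the Zabbix frontend
    key:    key of the trapper item that is waiting for this value
    """
    return [
        "zabbix_sender",
        "-z", server,     # Zabbix server to send to
        "-p", str(port),  # trapper port, 10051 by default
        "-s", host,       # monitored host the value belongs to
        "-k", key,        # item key
        "-o", str(value), # the value itself
    ]


# Hypothetical example: report how long a backup job took, in seconds.
cmd = build_sender_command("zabbix.example.com", "app-host-01", "backup.duration", 412)
print(shlex.join(cmd))
# To actually send it: subprocess.run(cmd, check=True)
```

Building the argument list separately from running it keeps the sketch testable on a machine that has no Zabbix tooling installed.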
Now here's where it seems 93 00:04:34.800 --> 00:04:37.680 to get really powerful, though. Getting raw data is one thing, 94 00:04:37.759 --> 00:04:40.399 making it useful is another. Yeah. Tell us about this 95 00:04:40.519 --> 00:04:41.800 preprocessing layer. 96 00:04:42.040 --> 00:04:45.319 Oh, preprocessing is an absolute game changer. It's data manipulation 97 00:04:45.439 --> 00:04:48.399 that happens before the value even lands 98 00:04:48.079 --> 00:04:51.040 in the database. Before storage, okay. Yes, and 99 00:04:50.959 --> 00:04:54.680 its main goal is efficiency. One way is using things 100 00:04:54.759 --> 00:04:56.120 like dependent 101 00:04:55.639 --> 00:04:56.879 items. Dependent items? 102 00:04:56.959 --> 00:04:59.879 Think of it like this. You make one single request 103 00:05:00.120 --> 00:05:02.879 to get, say, a big chunk of status info, a 104 00:05:02.879 --> 00:05:05.519 master item check. Right. Then dependent items let you 105 00:05:05.560 --> 00:05:09.279 pull five specific metrics out of that single chunk of data, 106 00:05:09.319 --> 00:05:12.160 all for the cost of just one network call, one 107 00:05:12.199 --> 00:05:12.920 interval check. 108 00:05:13.079 --> 00:05:17.000 Ah. So instead of five separate checks for CPU, RAM, 109 00:05:17.079 --> 00:05:21.040 disk, uptime, you do one big check and peel off 110 00:05:21.040 --> 00:05:21.600 the bits 111 00:05:21.360 --> 00:05:24.480 you need. Precisely. It saves a massive amount of overhead 112 00:05:24.480 --> 00:05:25.480 on the network and the agent. 113 00:05:25.519 --> 00:05:26.279 Okay, that's clever. 114 00:05:26.399 --> 00:05:29.079 And often that one big status check, it dumps out 115 00:05:29.160 --> 00:05:33.079 just messy, unstructured text. Think of the raw output from, 116 00:05:33.120 --> 00:05:36.199 like, ifconfig on Linux, run via system.run. 117 00:05:36.120 --> 00:05:37.160 Yeah, tons of text.
118 00:05:37.279 --> 00:05:39.399 This is where what you called the regex hack comes in. 119 00:05:39.759 --> 00:05:43.120 You use preprocessing with a regular expression, regex, to 120 00:05:43.240 --> 00:05:45.839 scan that raw text, find the pattern, 121 00:05:45.759 --> 00:05:48.639 and cleanly extract just the single number you care about, 122 00:05:48.680 --> 00:05:51.680 like total RX bytes for interface ens192. 123 00:05:52.160 --> 00:05:55.079 You turn that text chaos into a clean, usable number 124 00:05:55.160 --> 00:05:55.759 right at the door. 125 00:05:55.920 --> 00:05:58.600 That's fantastic. It shifts the cleanup work away from the 126 00:05:58.600 --> 00:06:01.240 network into the server's processing, where it's more efficient. I 127 00:06:01.240 --> 00:06:04.360 think you remember another database tip related to this, something 128 00:06:04.439 --> 00:06:05.319 about duplicates. 129 00:06:05.560 --> 00:06:08.079 Yes, exactly. For values that don't change often, maybe a 130 00:06:08.160 --> 00:06:10.399 server serial number or a firmware version, that kind of 131 00:06:10.399 --> 00:06:14.120 thing. That's static, mostly. Right. You use the Discard unchanged 132 00:06:14.160 --> 00:06:17.639 preprocessing step. If Zabbix gets the same value one hundred 133 00:06:17.680 --> 00:06:20.759 times in a row, it won't write one hundred identical 134 00:06:20.879 --> 00:06:22.040 entries into the database. 135 00:06:22.199 --> 00:06:24.839 Ah, so it just stores the first one and ignores 136 00:06:24.879 --> 00:06:26.000 the rest until it changes. 137 00:06:26.120 --> 00:06:26.519 Correct. 138 00:06:26.680 --> 00:06:29.800 This prevents just pointless bloating of your database over time. 139 00:06:30.160 --> 00:06:31.800 Super important for long-term 140 00:06:31.600 --> 00:06:35.040 health. Definitely. Okay, so we've collected and cleaned the data. 141 00:06:35.160 --> 00:06:37.199 Now we need to act on it. Alerting.
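Editor's sketch of the two preprocessing ideas above, written in plain Python rather than as Zabbix configuration. It is only an illustration of the logic: the sample `ifconfig` text, the regular expression, and the function names are assumptions made for this example, and real output differs between systems.

```python
import re

# Assumed sample of ifconfig output for one interface (illustrative only).
SAMPLE = """ens192: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.0.0.5  netmask 255.255.255.0  broadcast 10.0.0.255
        RX packets 1200  bytes 987654 (964.5 KiB)
        TX packets 800  bytes 123456 (120.5 KiB)
"""

# Capture the byte counter that follows "RX packets <n>".
RX_BYTES = re.compile(r"RX packets \d+\s+bytes (\d+)")


def extract_rx_bytes(text):
    """Return the RX byte counter as an int, or None if the pattern is absent."""
    match = RX_BYTES.search(text)
    return int(match.group(1)) if match else None


def discard_unchanged(values):
    """Keep a value only when it differs from the previously kept one."""
    kept = []
    for value in values:
        if not kept or value != kept[-1]:
            kept.append(value)
    return kept


print(extract_rx_bytes(SAMPLE))  # 987654
print(discard_unchanged(["SN123", "SN123", "SN123", "SN124", "SN124"]))  # ['SN123', 'SN124']
```

In Zabbix itself both steps are configured on the item's preprocessing tab; the Python only mirrors what each step does to the data before it is stored.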
We've all 142 00:06:37.199 --> 00:06:41.560 been there. The flapping alert goes critical, then okay, then 143 00:06:41.600 --> 00:06:45.079 critical, fifty times an hour, floods your inbox. How does 144 00:06:45.160 --> 00:06:47.839 Zabbix help stop that alert fatigue nightmare? 145 00:06:47.959 --> 00:06:50.519 The cure for that horror show is the recovery expression. 146 00:06:50.839 --> 00:06:54.480 Setting a simple trigger is easy: CPU usage over fifty percent, 147 00:06:54.800 --> 00:06:56.000 trigger an alert. 148 00:06:55.800 --> 00:06:58.120 Right, and the naive setup recovers as soon as it 149 00:06:58.199 --> 00:06:59.480 hits forty-nine percent. 150 00:06:59.240 --> 00:07:03.879 Exactly, which leads to flapping. The recovery expression demands stability. You 151 00:07:03.959 --> 00:07:06.439 define a separate condition for the alert to clear: it 152 00:07:06.480 --> 00:07:09.000 has to move significantly away from the problem threshold. 153 00:07:09.120 --> 00:07:11.720 So, like, trigger at fifty percent, but only recover when 154 00:07:11.720 --> 00:07:13.920 it's back down to, say, forty percent. 155 00:07:13.759 --> 00:07:16.879 Precisely that. The system has to prove it's stable and 156 00:07:16.959 --> 00:07:19.600 well clear of the danger zone before the alarm goes silent. 157 00:07:19.879 --> 00:07:22.040 That makes the alerts actually meaningful again. 158 00:07:22.199 --> 00:07:26.399 Nice. Absolutely. And organizationally, you need structure too. Use tags 159 00:07:26.439 --> 00:07:30.000 from day one. Add service: SSH or application: billing to 160 00:07:30.079 --> 00:07:31.839 every related trigger, so you 161 00:07:31.800 --> 00:07:33.519 can filter and route alerts properly. 162 00:07:33.680 --> 00:07:36.879 Yes, and customize your severity levels. Don't stick with the 163 00:07:36.920 --> 00:07:40.439 defaults like Disaster, High, Warning.
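Editor's sketch of the recovery-expression idea from the alerting discussion above, modelled as a tiny state machine in Python. This is not Zabbix trigger syntax; the function name and the sample CPU readings are invented for illustration, and the 50 and 40 percent thresholds are the ones used in the conversation.

```python
def evaluate(samples, problem_above=50.0, recover_below=40.0):
    """Return (index, state) events for a trigger with a separate recovery threshold.

    A problem opens when a sample exceeds problem_above and only closes
    once a later sample drops below recover_below.
    """
    in_problem = False
    events = []
    for index, value in enumerate(samples):
        if not in_problem and value > problem_above:
            in_problem = True
            events.append((index, "PROBLEM"))
        elif in_problem and value < recover_below:
            in_problem = False
            events.append((index, "OK"))
    return events


# Hypothetical CPU readings hovering around the 50 percent line.
cpu = [45, 52, 49, 51, 48, 39, 44]

print(evaluate(cpu))                      # [(1, 'PROBLEM'), (5, 'OK')]
print(evaluate(cpu, recover_below=50.0))  # naive recovery: four events, i.e. flapping
```

With the gap between the two thresholds, the same readings produce one problem and one recovery instead of a string of alternating alerts.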
If your company uses P1, 164 00:07:40.519 --> 00:07:43.040 P2, P3, change the names in Zabbix so 165 00:07:43.120 --> 00:07:45.800 the alerts immediately make sense in your team's context. 166 00:07:46.279 --> 00:07:50.519 Good practical tips. Now let's talk scaling, the actual architecture. 167 00:07:51.279 --> 00:07:54.560 NVPS tells us the load is high. But for medium, large, 168 00:07:54.639 --> 00:08:00.160 geographically dispersed environments, what's the component that handles spreading out 169 00:08:00.160 --> 00:08:01.000 that monitoring work? 170 00:08:01.120 --> 00:08:04.079 That's where Zabbix proxies come in. They are absolutely essential 171 00:08:04.120 --> 00:08:06.439 for offloading work from the central Zabbix server. 172 00:08:06.639 --> 00:08:07.399 How do they do that? 173 00:08:07.480 --> 00:08:10.360 A proxy sits closer to the devices it monitors, maybe 174 00:08:10.360 --> 00:08:13.120 in a remote data center or a specific network segment. 175 00:08:13.439 --> 00:08:15.839 It collects data locally, holds on to it, maybe does 176 00:08:15.839 --> 00:08:16.720 some preprocessing. 177 00:08:17.120 --> 00:08:19.319 Preprocessing can happen on the proxy too? 178 00:08:19.519 --> 00:08:22.279 Yes, some of it. Then it compresses that data and 179 00:08:22.360 --> 00:08:25.079 forwards it efficiently back to the main Zabbix server. It 180 00:08:25.160 --> 00:08:27.959 reduces the load on the central server significantly. 181 00:08:28.079 --> 00:08:33.039 Okay. Now, when deploying these proxies, you've got firewalls, network segmentation. 182 00:08:33.360 --> 00:08:35.840 Are there different types, performance trade-offs? 183 00:08:35.960 --> 00:08:36.200 Yeah, 184 00:08:36.200 --> 00:08:39.159 there are two main modes, passive and active proxies. We 185 00:08:39.240 --> 00:08:43.840 strongly, strongly recommend active proxies. Why active? Two big reasons.
First, 186 00:08:44.440 --> 00:08:48.480 they're generally faster because they proactively push their collected data 187 00:08:48.559 --> 00:08:50.759 to the server whenever they have new stuff. They don't 188 00:08:50.759 --> 00:08:51.480 wait to be asked. 189 00:08:51.559 --> 00:08:53.000 Okay, less latency. Right. 190 00:08:53.440 --> 00:08:56.720 But second, and often more importantly for network teams, an 191 00:08:56.759 --> 00:09:01.039 active proxy only needs one outbound connection, initiated from 192 00:09:01.080 --> 00:09:02.120 the proxy to the server. 193 00:09:02.320 --> 00:09:04.320 Ah, so the proxy calls home. Exactly. 194 00:09:04.679 --> 00:09:07.399 Compare that to passive proxies, where the central server has 195 00:09:07.440 --> 00:09:11.360 to be able to initiate connections to potentially dozens or 196 00:09:11.480 --> 00:09:14.200 hundreds of proxies. Firewall rule nightmare. 197 00:09:14.440 --> 00:09:19.399 Definitely. Simplifying firewall management is a massive win in big companies. Okay, 198 00:09:19.440 --> 00:09:23.519 sticking with operational wins. Yeah. Let's hit the biggest maintenance 199 00:09:23.559 --> 00:09:28.200 headache for any monitoring system over time: the database. How 200 00:09:28.200 --> 00:09:31.159 do we stop it from becoming this unmanageable beast? Yeah, 201 00:09:31.240 --> 00:09:34.200 the database is always the challenge long term. The bottleneck 202 00:09:34.279 --> 00:09:37.039 comes from the default Zabbix process called the housekeeper. 203 00:09:37.120 --> 00:09:39.240 Housekeeper sounds helpful. Well, 204 00:09:39.120 --> 00:09:41.399 it tries to be. When data gets old, say your 205 00:09:41.399 --> 00:09:44.440 history retention is ninety days, the housekeeper is responsible for 206 00:09:44.480 --> 00:09:45.840 deleting data older than that. 207 00:09:46.000 --> 00:09:48.639 Okay, seems necessary. But it deletes 208 00:09:48.320 --> 00:09:52.039 that data row by row.
Imagine millions, maybe billions of 209 00:09:52.159 --> 00:09:55.879 rows. As your database grows into terabytes, this row by 210 00:09:56.000 --> 00:10:00.720 row deletion just consumes immense I/O and CPU. Yeah, it can bring 211 00:10:00.720 --> 00:10:04.080 your server performance to its knees during cleanup. Ouch. 212 00:10:04.120 --> 00:10:06.879 So relying on the built-in cleaner eventually just grinds 213 00:10:06.919 --> 00:10:09.799 everything to a halt. What's the better way, the advanced solution? 214 00:10:10.120 --> 00:10:12.679 You absolutely need to move to database-native methods. For 215 00:10:12.840 --> 00:10:16.399 MySQL, that's MySQL partitioning. For PostgreSQL, it's leveraging 216 00:10:16.480 --> 00:10:19.840 the TimescaleDB extension, which brings time series superpowers to 217 00:10:19.799 --> 00:10:24.559 Postgres. Partitioning or TimescaleDB. How do they avoid the 218 00:10:24.679 --> 00:10:25.799 row-by-row problem? 219 00:10:25.840 --> 00:10:29.519 They work with time-based chunks. Instead of deleting individual rows, 220 00:10:29.639 --> 00:10:32.240 the database is structured so you can just drop an 221 00:10:32.399 --> 00:10:37.440 entire old partition, say, delete all data from March, instantly. 222 00:10:37.279 --> 00:10:39.720 Like throwing away a whole filing cabinet drawer instead of 223 00:10:39.759 --> 00:10:41.200 shredding each paper inside. 224 00:10:41.240 --> 00:10:45.679 Exactly that analogy. It's incredibly efficient. The cleanup becomes almost instantaneous, 225 00:10:46.080 --> 00:10:48.080 freeing up massive resources. 226 00:10:48.159 --> 00:10:50.600 That sounds like a no-brainer from a performance perspective. 227 00:10:51.039 --> 00:10:55.639 But does switching to partitioning have any functional trade-offs 228 00:10:55.679 --> 00:10:58.000 for the user configuring Zabbix? It does. 229 00:10:58.120 --> 00:11:01.399 Yeah, that's the key planning point.
If you implement native partitioning, 230 00:11:01.480 --> 00:11:05.320 your history and trend data retention settings often become global 231 00:11:05.399 --> 00:11:09.399 database parameters. You typically lose the ability to set, say, 232 00:11:09.440 --> 00:11:12.000 seven days history for this item, but ninety days for 233 00:11:12.039 --> 00:11:12.480 that item. 234 00:11:12.600 --> 00:11:15.559 Ah. So you gain huge efficiency, but lose some of 235 00:11:15.559 --> 00:11:18.279 that fine-grained control over retention per item. 236 00:11:18.440 --> 00:11:20.240 That's the main trade-off. You need to plan your 237 00:11:20.279 --> 00:11:21.919 retention strategy more globally. 238 00:11:22.279 --> 00:11:27.480 Got it, important consideration. Okay, last big area: hybrid cloud. 239 00:11:27.639 --> 00:11:30.600 Lots of critical metrics now live in places like AWS 240 00:11:30.600 --> 00:11:34.559 CloudWatch or Azure Monitor. How does Zabbix stay relevant? 241 00:11:34.679 --> 00:11:39.240 How does it pull data that's behind these proprietary cloud CLIs? 242 00:11:39.519 --> 00:11:42.720 Zabbix uses a really powerful combo here: Zabbix agent user 243 00:11:42.759 --> 00:11:45.639 parameters plus the cloud provider's own CLI tools. 244 00:11:45.720 --> 00:11:49.240 Okay, so you install the AWS CLI or the Azure CLI where? 245 00:11:49.279 --> 00:11:51.600 Right onto the same machine where the Zabbix Agent 2 246 00:11:51.720 --> 00:11:54.000 is running. Could be an EC2 instance, could be 247 00:11:54.039 --> 00:11:56.559 an on-prem machine that needs to query the cloud. Okay. 248 00:11:56.799 --> 00:11:59.360 Then you configure a user parameter within the Zabbix 249 00:11:59.440 --> 00:12:02.720 agent configuration. This user parameter basically just tells the 250 00:12:02.759 --> 00:12:06.159 agent: run this specific AWS CLI command.
251 00:12:06.440 --> 00:12:08.600 The agent acts like a secure little proxy to run 252 00:12:08.639 --> 00:12:10.399 the cloud command locally. Precisely. 253 00:12:10.600 --> 00:12:13.360 The agent executes the command, gets the output, maybe your 254 00:12:13.399 --> 00:12:17.159 SQS queue depth or CPU utilization from CloudWatch, and then 255 00:12:17.200 --> 00:12:20.039 it injects that value straight into Zabbix, like any other metric 256 00:12:20.080 --> 00:12:22.279 it collected. Very flexible. Basically, if you can script it 257 00:12:22.320 --> 00:12:24.120 on the command line, Zabbix can monitor it. 258 00:12:24.360 --> 00:12:27.120 That's the power of it. And for containers, Agent 2 259 00:12:27.159 --> 00:12:29.279 makes it even easier. It has native plugins for 260 00:12:29.320 --> 00:12:32.360 things like Docker monitoring built right in, again thanks 261 00:12:32.399 --> 00:12:33.559 to that Go architecture. 262 00:12:33.600 --> 00:12:35.960 Okay, pulling it all together, then, what does this mean 263 00:12:36.000 --> 00:12:39.000 for you, the listener? We've seen Zabbix 5 is, well, 264 00:12:39.000 --> 00:12:43.600 it's clearly robust, super flexible, scalable, designed for way more 265 00:12:43.639 --> 00:12:47.440 than just simple pings. It handles complex, enterprise-level stuff. 266 00:12:47.720 --> 00:12:48.080 Yeah, 267 00:12:48.120 --> 00:12:51.720 I think the key takeaway is that real control, real scalability, 268 00:12:51.799 --> 00:12:56.440 it requires planning up front, especially around the database. You 269 00:12:56.559 --> 00:12:59.720 have to decide on partitioning or TimescaleDB early. Don't 270 00:12:59.720 --> 00:13:02.200 wait till it hurts. Don't wait till it hurts. And 271 00:13:02.440 --> 00:13:06.080 getting good at that data extraction using preprocessing, that's non 272 00:13:06.080 --> 00:13:08.200 negotiable if you want to keep your database clean and 273 00:13:08.240 --> 00:13:09.720 your metrics really trustworthy.
274 00:13:09.960 --> 00:13:13.159 Fantastic summary. Now, building on that idea of platform control: 275 00:13:13.480 --> 00:13:16.120 we talked about how Agent 2 can execute commands to 276 00:13:16.159 --> 00:13:19.600 pull data. Yeah. But what about controlling Zabbix itself? Here's 277 00:13:19.600 --> 00:13:21.799 a final thought. Even a user who's locked down to 278 00:13:21.879 --> 00:13:23.480 the basic Zabbix User 279 00:13:23.320 --> 00:13:27.120 role. Right, the most basic, view-only type role, usually. 280 00:13:27.159 --> 00:13:29.840 Exactly. They can maybe only see metrics for their own 281 00:13:29.879 --> 00:13:34.879 couple of hosts. Even that user can potentially execute powerful 282 00:13:34.919 --> 00:13:38.840 administrative actions, things like enabling or disabling hosts on maps 283 00:13:39.240 --> 00:13:43.240 or scheduling system maintenance periods. They can trigger these right 284 00:13:43.279 --> 00:13:44.480 from the Zabbix frontend. 285 00:13:44.519 --> 00:13:48.320 Wait, hang on. A basic user doing admin tasks? How? 286 00:13:48.679 --> 00:13:50.960 That sounds like a massive security hole if they don't 287 00:13:50.960 --> 00:13:52.000 have the actual permissions. 288 00:13:52.080 --> 00:13:54.879 It sounds like it, but it's actually the ultimate delegation 289 00:13:55.000 --> 00:13:59.279 power of the Zabbix API. The trick is the script they 290 00:13:59.279 --> 00:14:02.360 trigger from the frontend. It isn't running with their lowly 291 00:14:02.440 --> 00:14:05.960 user permissions. Instead, that custom script you've set up is 292 00:14:05.960 --> 00:14:08.399 configured behind the scenes to use the API credentials of 293 00:14:08.440 --> 00:14:11.519 a different user, one who does have administrative privileges.
294 00:14:11.679 --> 00:14:13.559 So the low-level user clicks a button, but the 295 00:14:13.600 --> 00:14:16.559 action is performed via the API using a high-privileged 296 00:14:16.559 --> 00:14:18.840 token or user configured in that script. 297 00:14:19.039 --> 00:14:23.000 Exactly. It lets you safely delegate very specific, complex administrative 298 00:14:23.039 --> 00:14:26.080 workflows, like putting a server into maintenance for exactly two 299 00:14:26.080 --> 00:14:29.120 hours starting now, to users who absolutely should not have 300 00:14:29.240 --> 00:14:30.919 general admin rights to the whole system. 301 00:14:31.200 --> 00:14:31.799 Wow. 302 00:14:32.159 --> 00:14:37.039 Okay, that is powerful: the API as a controlled delegation mechanism. 303 00:14:37.120 --> 00:14:41.039 That's the raw, potent, and highly customizable power hidden within 304 00:14:41.080 --> 00:14:44.440 the Zabbix API. Food for thought, definitely. Thanks for diving 305 00:14:44.480 --> 00:14:46.440 deep with us today. We really hope you feel better 306 00:14:46.480 --> 00:14:50.120 equipped now to tackle your infrastructure monitoring challenges using Zabbix 307 00:14:50.159 --> 00:14:50.399 5.
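Editor's sketch of the "two hours of maintenance starting now" example, added as an illustration and not taken from the episode. It only builds the JSON-RPC request body that a delegated frontend script might send to the Zabbix API. The field names follow the editor's reading of the Zabbix 5.0 `maintenance.create` method and should be checked against the API documentation for your version; the host ID and token are hypothetical placeholders.

```python
import json
import time


def maintenance_request(host_id, auth_token, hours=2, now=None):
    """Build a JSON-RPC body asking for a one-time maintenance window.

    host_id:    Zabbix host ID as a string (hypothetical value below)
    auth_token: API token of the privileged account the script runs as
    now:        Unix timestamp to start from; defaults to the current time
    """
    start = int(now if now is not None else time.time())
    duration = hours * 3600
    return {
        "jsonrpc": "2.0",
        "method": "maintenance.create",
        "params": {
            "name": f"Delegated maintenance for host {host_id}",
            "active_since": start,
            "active_till": start + duration,
            "hostids": [host_id],
            # timeperiod_type 0 is assumed to mean a one-time period
            "timeperiods": [
                {"timeperiod_type": 0, "start_date": start, "period": duration}
            ],
        },
        "auth": auth_token,  # privileged credentials, never the clicking user's
        "id": 1,
    }


request = maintenance_request("10084", "PLACEHOLDER_TOKEN", now=1600000000)
print(json.dumps(request, indent=2))
```

The delegation lives entirely in which token the script carries, so that token should be stored where the basic user cannot read it and scoped to as few hosts as possible.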