WEBVTT 1 00:00:00.120 --> 00:00:04.839 Welcome back to the deep dive. Today, we are wrestling with, uh, 2 00:00:05.160 --> 00:00:09.039 probably the defining architectural challenge of the last few years, 3 00:00:09.039 --> 00:00:09.480 maybe the 4 00:00:09.439 --> 00:00:10.919 decade. Definitely feels like it. 5 00:00:11.240 --> 00:00:15.080 How do you reliably route traffic when the ground beneath 6 00:00:15.080 --> 00:00:17.600 your feet is constantly shifting? We're talking about the shift 7 00:00:17.640 --> 00:00:20.440 from stable, predictable monoliths. 8 00:00:19.920 --> 00:00:22.239 Right, the quarterly release cycle kind of 9 00:00:22.160 --> 00:00:25.879 thing. Exactly, to this, well, sometimes chaotic world of 10 00:00:26.000 --> 00:00:28.600 microservices that scale up and down constantly. 11 00:00:28.800 --> 00:00:31.640 Yeah, it's like comparing, I don't know, a printed map 12 00:00:31.679 --> 00:00:34.119 from the nineties to Google Maps in a city where 13 00:00:34.200 --> 00:00:38.359 roads just appear and disappear every few minutes and buildings 14 00:00:38.399 --> 00:00:39.479 resize themselves. 15 00:00:39.560 --> 00:00:42.640 That's a great analogy. And those older load balancers, the 16 00:00:42.679 --> 00:00:45.159 ones built for the static map, they just can't cope. 17 00:00:45.240 --> 00:00:48.159 They're stuck in that static config mindset. They pretty much 18 00:00:48.200 --> 00:00:52.079 melt down when faced with how dynamic a modern cloud 19 00:00:52.159 --> 00:00:53.119 environment really 20 00:00:52.920 --> 00:00:56.320 is. And that operational headache, that's why we're diving deep 21 00:00:56.320 --> 00:00:59.240 into Traefik today. It's an open source API gateway, and 22 00:00:59.240 --> 00:01:02.000 it's built specifically to handle that dynamic complexity. 23 00:01:02.280 --> 00:01:05.480 Right.
The idea is to simplify deploying microservices, especially 24 00:01:05.480 --> 00:01:08.519 if you're in the Kubernetes world, which, let's face it, many 25 00:01:08.280 --> 00:01:11.000 are. So our mission today. 26 00:01:10.840 --> 00:01:14.319 Our mission is to unpack how Traefik acts as this 27 00:01:14.400 --> 00:01:17.319 crucial link. Think of it as the intelligent gateway tier. 28 00:01:18.079 --> 00:01:23.439 It connects that volatile ecosystem of services to the outside world. 29 00:01:23.599 --> 00:01:26.359 And by the end, you listening should have a pretty 30 00:01:26.359 --> 00:01:28.799 good handle on the cutting edge of network routing. 31 00:01:29.120 --> 00:01:32.079 A shortcut, maybe. Yeah, a shortcut to being well informed 32 00:01:32.120 --> 00:01:34.319 about this stuff. Resilience patterns too. 33 00:01:34.400 --> 00:01:37.120 Okay, let's start with the big picture, the monolith problem. 34 00:01:37.200 --> 00:01:39.920 We probably all remember it, right? Tight coupling, 35 00:01:39.640 --> 00:01:42.560 slow releases. Oh, the 36 00:01:42.439 --> 00:01:46.159 pain. And that really expensive all-or-nothing scaling. Need 37 00:01:46.239 --> 00:01:48.640 more horsepower for login? Scale 38 00:01:48.239 --> 00:01:51.400 the whole thing. Huge waste of resources. That worked okay, 39 00:01:51.480 --> 00:01:56.480 I guess, with the classic three-tier model: presentation, application, data. Simple 40 00:01:56.319 --> 00:01:59.120 enough, but microservices break that model completely. 41 00:01:59.280 --> 00:02:01.719 When you shatter that app into, I don't know, dozens, 42 00:02:01.840 --> 00:02:04.519 hundreds of tiny services, you need a different architecture. 43 00:02:04.560 --> 00:02:06.640 You have to evolve to the four-tier model. 44 00:02:06.439 --> 00:02:08.360 Exactly, built for distributed systems. 45 00:02:08.639 --> 00:02:11.199 And that fourth tier is where Traefik lives, right? That's 46 00:02:11.039 --> 00:02:13.319 its home turf.
Right. So the four tiers you really 47 00:02:13.360 --> 00:02:16.960 need are: first, content delivery, the UI, the client stuff. 48 00:02:17.039 --> 00:02:17.360 Okay. 49 00:02:17.560 --> 00:02:23.280 Second, the gateway tier, that's Traefik: discovery, routing, correlating requests, 50 00:02:23.400 --> 00:02:26.840 aggregating responses sometimes. All that happens here. The traffic hub, 51 00:02:27.159 --> 00:02:30.680 sort of. Yeah. Then third is the services tier, your 52 00:02:30.719 --> 00:02:35.719 actual decoupled business logic units, high cohesion, loose coupling, all 53 00:02:35.759 --> 00:02:40.240 that good stuff. And finally, the data tier: databases, message queues, 54 00:02:40.479 --> 00:02:43.759 but now ideally exclusive to the services that own that data. 55 00:02:43.800 --> 00:02:46.560 Okay, so the gateway tier is critical, it's the front door. 56 00:02:47.000 --> 00:02:50.439 What does a modern gateway like Traefik absolutely have to 57 00:02:50.439 --> 00:02:52.199 do to handle that chaos in tier three? 58 00:02:52.400 --> 00:02:54.000 Right, it's got to be more than just a simple 59 00:02:54.039 --> 00:02:57.000 port forwarder. Layer seven routing is non-negotiable. 60 00:02:56.479 --> 00:02:59.159 Layer seven meaning application layer. 61 00:02:59.039 --> 00:03:03.240 Exactly, routing based on HTTP headers, host names, paths, maybe 62 00:03:03.240 --> 00:03:05.360 even stuff in the request body, not just layer four 63 00:03:05.639 --> 00:03:08.199 like TCP or UDP ports. And it needs to speak 64 00:03:08.199 --> 00:03:11.879 different languages, essentially: HTTP/1, HTTP/2, gRPC, REST. It 65 00:03:11.919 --> 00:03:12.479 shouldn't care. 66 00:03:12.719 --> 00:03:15.680 And security, that feels like a huge piece, especially with 67 00:03:15.719 --> 00:03:18.680 all those services chattering away behind the gateway.
68 00:03:18.719 --> 00:03:23.639 Oh, massive. Absolutely, the gateway must handle TLS termination, you know, 69 00:03:23.960 --> 00:03:25.560 decrypting the incoming 70 00:03:25.280 --> 00:03:27.719 public traffic. Standard stuff, right? But 71 00:03:27.639 --> 00:03:31.159 then inside the cluster, for service-to-service chat, you 72 00:03:31.240 --> 00:03:34.319 need mutual TLS, mTLS. 73 00:03:33.759 --> 00:03:35.960 So both sides prove who they are. Precisely. 74 00:03:36.159 --> 00:03:39.080 It's not just the client showing ID. The server demands 75 00:03:39.120 --> 00:03:42.400 ID back, show me your papers too. It's essential for 76 00:03:42.479 --> 00:03:45.240 locking things down inside your perimeter. If something goes wrong, 77 00:03:45.719 --> 00:03:46.960 it limits the blast radius. 78 00:03:47.039 --> 00:03:49.759 Okay, that makes sense, which leads us right to maybe 79 00:03:49.759 --> 00:03:53.919 the killer feature, autoconfiguration. Because, like you said, hundreds of services, 80 00:03:53.960 --> 00:03:59.520 maybe thousands of instances, updating config files by hand? Impossible. Right there, it 81 00:03:59.560 --> 00:04:02.719 just doesn't scale. That's where Traefik fundamentally solves the service 82 00:04:02.759 --> 00:04:05.479 discovery problem. Instead of a human editing 83 00:04:05.240 --> 00:04:07.759 a file, which is always error prone. 84 00:04:07.479 --> 00:04:11.759 Always. Instead, Traefik talks directly to a service registry. Think Consul, 85 00:04:11.759 --> 00:04:15.599 etcd, or Kubernetes itself. These things are like near real 86 00:04:15.639 --> 00:04:18.759 time databases of where every active service instance lives on 87 00:04:18.800 --> 00:04:19.279 the network. 88 00:04:19.399 --> 00:04:21.680 Ah, so Traefik doesn't need its own map. It just 89 00:04:21.839 --> 00:04:23.959 asks the map maker constantly. 90 00:04:23.600 --> 00:04:27.800 Exactly, perfect analogy. Traefik calls these map makers providers.
It 91 00:04:27.839 --> 00:04:29.920 has first-class support baked in. It just sits there and 92 00:04:29.959 --> 00:04:33.519 watches the provider. A new service instance spins up, Traefik 93 00:04:33.560 --> 00:04:35.879 sees it. Yep. An old one dies, Traefik sees 94 00:04:36.000 --> 00:04:39.519 that too, and it automatically reconfigures its own routing 95 00:04:39.560 --> 00:04:44.399 tables, crucially without needing a restart or dropping existing connections. 96 00:04:44.600 --> 00:04:47.600 Hot reloads, zero downtime. That's the dream. 97 00:04:47.800 --> 00:04:51.959 That's critical. Dynamic configuration and hot reloads are absolutely key. 98 00:04:52.199 --> 00:04:55.000 How tricky is it if you're running, say, Docker and 99 00:04:55.079 --> 00:04:59.759 Kubernetes and maybe Consul? Can one Traefik instance watch all 100 00:04:59.800 --> 00:05:00.319 of them? 101 00:05:00.439 --> 00:05:03.600 Yeah, surprisingly easily. That's the beauty of the provider concept. 102 00:05:04.199 --> 00:05:08.360 Traefik kind of abstracts away the specific details of talking 103 00:05:08.360 --> 00:05:11.519 to Kubernetes versus talking to Consul, so you can centralize 104 00:05:11.600 --> 00:05:13.319 routing even in a mixed environment. 105 00:05:13.399 --> 00:05:15.800 So developers just deploy to whatever platform 106 00:05:15.399 --> 00:05:17.480 they use, and Traefik figures out how to find it 107 00:05:17.519 --> 00:05:21.399 and send traffic there. Developers focus on code. Traefik handles 108 00:05:21.439 --> 00:05:22.439 the routing complexity. 109 00:05:22.759 --> 00:05:26.879 Okay, so Traefik knows where everything is. Now let's talk 110 00:05:26.920 --> 00:05:30.279 about actually sending the traffic efficiently. We all know basic 111 00:05:30.360 --> 00:05:34.199 round robin, right? Just deal them out equally. Fine for 112 00:05:34.240 --> 00:05:35.000 stateless stuff.
113 00:05:35.079 --> 00:05:38.240 Yeah, simple, effective, if all your servers are identical, but 114 00:05:38.240 --> 00:05:42.079 they rarely are. So weighted round robin, WRR. How does 115 00:05:42.120 --> 00:05:42.519 that work? 116 00:05:42.680 --> 00:05:46.120 Right, WRR is about being smarter with resources. Maybe you 117 00:05:46.199 --> 00:05:49.519 have an older, cheaper server with less CPU. You don't 118 00:05:49.560 --> 00:05:51.720 want it getting the same traffic as your brand new 119 00:05:52.240 --> 00:05:56.279 beefy cloud instance. Makes sense. So WRR lets you assign weights. 120 00:05:56.600 --> 00:05:59.199 You could say, send three requests to the powerful guest 121 00:05:59.279 --> 00:06:01.399 v1 group for every one request you send to 122 00:06:01.439 --> 00:06:04.199 the older guest v2, a three-to-one ratio, 123 00:06:04.319 --> 00:06:04.839 for example. 124 00:06:04.839 --> 00:06:08.040 So it's not just load balancing, it's cost optimization too. 125 00:06:08.240 --> 00:06:12.040 Definitely. In the cloud especially, WRR helps you squeeze maximum 126 00:06:12.120 --> 00:06:15.439 value out of cheaper or older instances alongside the new ones. 127 00:06:15.639 --> 00:06:19.600 Keeps everything utilized efficiently, saves money, no resource just sitting 128 00:06:19.600 --> 00:06:20.959 idle or getting totally slammed. 129 00:06:21.000 --> 00:06:23.639 Okay, let's flip that. What about apps where the user's 130 00:06:23.720 --> 00:06:26.839 state matters, like a shopping cart stored in memory on 131 00:06:26.839 --> 00:06:29.759 one specific server instance? Round robin would break that. 132 00:06:30.079 --> 00:06:33.360 Yeah, that needs sticky sessions. If a user's second request 133 00:06:33.439 --> 00:06:36.199 hits a different server, poof, their cart is gone or 134 00:06:36.240 --> 00:06:38.680 they get logged out. Bad experience. 135 00:06:39.000 --> 00:06:41.720 So how does Traefik handle that? It uses cookies.
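The three-to-one weighting described above can be sketched in a few lines of Python. This is a minimal illustration of one common approach (smooth weighted round robin), not Traefik's own implementation; the backend names `new-instance` and `old-instance` and the function name are hypothetical.

```python
from collections import Counter


def weighted_round_robin(weights, n):
    """Return n picks using smooth weighted round robin.

    `weights` maps a backend name to an integer weight. The backend
    names used below are hypothetical, chosen only for illustration.
    """
    current = {name: 0 for name in weights}
    total = sum(weights.values())
    picks = []
    for _ in range(n):
        # Every backend earns credit proportional to its weight.
        for name, weight in weights.items():
            current[name] += weight
        # The backend with the most credit serves this request...
        chosen = max(current, key=current.get)
        # ...and pays back one full round of credit.
        current[chosen] -= total
        picks.append(chosen)
    return picks


picks = weighted_round_robin({"new-instance": 3, "old-instance": 1}, 8)
counts = Counter(picks)
print(counts)
```

Over any window of four requests, the heavier backend is picked three times and the lighter one once, and the lighter backend's turns are spread out rather than bunched at the end of the cycle.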
136 00:06:41.959 --> 00:06:44.800 Typically, when the first request hits a back end instance, 137 00:06:45.000 --> 00:06:48.560 Traefik sets a cookie in the response. For subsequent requests 138 00:06:48.560 --> 00:06:51.040 from that same user, Traefik reads the cookie and makes 139 00:06:51.079 --> 00:06:53.720 sure to send the request back to that same original instance. 140 00:06:54.240 --> 00:06:55.199 Keeps the session alive. 141 00:06:55.480 --> 00:06:59.120 Okay, sticky sessions make sense. But underlying all this balancing, 142 00:06:59.439 --> 00:07:02.560 you need health checks, right? Making sure you're not sending traffic 143 00:07:02.600 --> 00:07:03.279 to a dead 144 00:07:03.160 --> 00:07:06.560 server. Absolutely fundamental. You only want to route traffic to 145 00:07:06.720 --> 00:07:10.399 instances that are actually healthy, usually meaning they return a 146 00:07:10.439 --> 00:07:14.079 2xx or a 3xx HTTP status code. Anything 147 00:07:14.079 --> 00:07:14.920 else is an error. 148 00:07:15.079 --> 00:07:19.480 Doesn't constantly poking every instance add overhead, though? A performance tax? 149 00:07:19.639 --> 00:07:22.639 That's a fair question. It's a trade-off. Traefik does 150 00:07:22.720 --> 00:07:25.279 use active checks, where it sends a probe, and passive 151 00:07:25.319 --> 00:07:28.800 checks, watching responses. But you configure the interval. You 152 00:07:28.879 --> 00:07:32.399 tune it so you find a balance. Right, you set it so 153 00:07:32.439 --> 00:07:35.519 the monitoring overhead isn't painful, but it's frequent enough to 154 00:07:35.560 --> 00:07:38.759 pull an unhealthy instance out of the pool quickly when 155 00:07:38.800 --> 00:07:41.560 it does fail. It's crucial for the stability of things 156 00:07:41.560 --> 00:07:42.600 like round robin. 157 00:07:42.920 --> 00:07:45.839 Let's shift gears a bit to more advanced resilience patterns.
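The cookie-based stickiness and the health filtering discussed above can be combined in a small Python sketch. It is an illustration under stated assumptions, not Traefik's code: the class name, the cookie name `backend_id`, and the backend addresses are all hypothetical, and health status is updated by hand here rather than by real probes.

```python
import itertools


class StickyBalancer:
    """Round robin with cookie stickiness over healthy backends only."""

    COOKIE = "backend_id"  # hypothetical cookie name

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)  # a health checker would update this
        self._rr = itertools.cycle(self.backends)

    def _next_healthy(self):
        # Walk the rotation at most once, skipping unhealthy backends.
        for _ in range(len(self.backends)):
            candidate = next(self._rr)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends")

    def route(self, cookies):
        """Return (backend, cookies_to_set) for one request."""
        pinned = cookies.get(self.COOKIE)
        if pinned in self.healthy:
            return pinned, {}  # honor the existing session
        chosen = self._next_healthy()
        return chosen, {self.COOKIE: chosen}  # pin this client from now on


lb = StickyBalancer(["10.0.0.1:80", "10.0.0.2:80"])
first, set_cookie = lb.route({})          # no cookie yet: pick and pin
second, _ = lb.route(set_cookie)          # cookie present: same backend
lb.healthy.discard(first)                 # simulate a failed health check
third, new_cookie = lb.route(set_cookie)  # pinned backend is down: re-pin
```

The design choice worth noting is the fallback: when the pinned instance is unhealthy, the sketch re-pins the client to another instance instead of failing the request, at the cost of losing whatever in-memory state lived on the old one.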
158 00:07:46.240 --> 00:07:50.199 Traffic mirroring, sometimes called shadowing, sounds useful for testing. 159 00:07:49.879 --> 00:07:52.879 Oh, it's fantastic for canary deployments, really safe testing. The 160 00:07:52.959 --> 00:07:56.319 idea is you take your live production traffic, the real stuff, 161 00:07:56.360 --> 00:07:59.319 the real stuff, and you copy a small percentage of it, 162 00:07:59.319 --> 00:08:02.720 say ten percent, and send that copy asynchronously to a 163 00:08:02.720 --> 00:08:05.120 new test environment, maybe your guest v2 164 00:08:05.040 --> 00:08:09.199 version. Asynchronously, so the original user isn't waiting. Exactly. 165 00:08:09.040 --> 00:08:13.279 And critically, Traefik ignores the response from that mirrored request. 166 00:08:13.279 --> 00:08:15.959 It just fires it off and forgets about it. It lets 167 00:08:16.000 --> 00:08:18.800 you see how your new code behaves under real load, stability, 168 00:08:19.079 --> 00:08:22.839 resource use, without any risk to the actual user experience. 169 00:08:23.120 --> 00:08:26.040 That's clever. Okay, so we've handled load and safe testing. 170 00:08:26.360 --> 00:08:29.839 But what about when things actually fail? Not just one instance, 171 00:08:29.839 --> 00:08:33.440 but maybe a whole downstream database or API becomes slow 172 00:08:33.600 --> 00:08:37.000 or unresponsive. In a microservices world, that seems like 173 00:08:37.039 --> 00:08:38.240 it could cause chaos. 174 00:08:38.559 --> 00:08:42.799 It absolutely can. That's the dreaded cascading failure scenario. One 175 00:08:43.120 --> 00:08:45.480 slow dependency makes its callers wait, they 176 00:08:45.519 --> 00:08:47.639 run out of threads or connections. 177 00:08:47.159 --> 00:08:50.080 Exactly, and then they fail, taking down the services that 178 00:08:50.120 --> 00:08:51.799 call them. It ripples outwards.
179 00:08:52.080 --> 00:08:54.519 So how does Traefik act as a blockade to prevent 180 00:08:54.600 --> 00:08:55.279 that ripple? 181 00:08:55.480 --> 00:08:58.840 That's the job of the circuit breaker pattern. Traefik middleware 182 00:08:58.879 --> 00:09:01.919 can implement this. It watches for failures going to a 183 00:09:01.919 --> 00:09:03.399 particular back end service. 184 00:09:03.639 --> 00:09:06.759 Failures meaning errors or timeouts? 185 00:09:06.240 --> 00:09:09.919 Both, typically, yeah. If the failure rate or maybe latency 186 00:09:10.200 --> 00:09:11.200 crosses a threshold you 187 00:09:11.240 --> 00:09:13.519 define, like too many errors in the last minute. 188 00:09:13.320 --> 00:09:16.639 Right, or responses are taking too long. If that happens, 189 00:09:16.879 --> 00:09:20.080 Traefik trips the breaker. It stops sending requests to that 190 00:09:20.080 --> 00:09:21.559 struggling service altogether for 191 00:09:21.519 --> 00:09:24.440 a period and just returns an error immediately. 192 00:09:23.960 --> 00:09:26.960 Yep, usually a 503 Service Unavailable. It does 193 00:09:26.960 --> 00:09:30.600 this instantly without even trying the failing service. This protects 194 00:09:30.600 --> 00:09:33.960 the calling services from getting bogged down and saves resources 195 00:09:34.000 --> 00:09:36.519 across the system. It's like the system saying, nope, that 196 00:09:36.600 --> 00:09:38.559 door's closed for now, try again later. 197 00:09:38.480 --> 00:09:41.320 And the conditions for tripping, they can be quite sophisticated, 198 00:09:41.360 --> 00:09:44.679 I saw. Yeah, Traefik's implementation is pretty powerful. It's not 199 00:09:44.720 --> 00:09:48.639 just simple failure counts. You could use expressions like: trip 200 00:09:48.679 --> 00:09:52.360 if LatencyAtQuantileMS(50.0) > 100, meaning 201 00:09:52.600 --> 00:09:54.679 the median response time is over one hundred
202 00:09:54.399 --> 00:09:57.000 milliseconds. Or based on error ratio? Exactly. 203 00:09:57.000 --> 00:09:59.320 ResponseCodeRatio(500, 600, 0, 600) > 0.25: 204 00:09:59.320 --> 00:10:01.600 trip if more than twenty-five percent of recent 205 00:10:01.679 --> 00:10:05.039 responses were 5xx errors. Gives you fine-grained control. 206 00:10:05.200 --> 00:10:08.159 Okay, circuit breakers handle the big failures. What about those 207 00:10:08.159 --> 00:10:12.240 little annoying transient glitches, like a brief network hiccup that 208 00:10:12.360 --> 00:10:13.840 just needs a quick retry? 209 00:10:14.279 --> 00:10:17.159 Perfect use case for retry middleware. Just like hitting refresh 210 00:10:17.240 --> 00:10:19.399 in your browser when a page times out. Right, Traefik 211 00:10:19.440 --> 00:10:22.559 can be configured to automatically retry a request, maybe once 212 00:10:22.639 --> 00:10:25.159 or twice, if it fails with specific errors like a 213 00:10:25.159 --> 00:10:27.679 connection timeout or maybe a 502 Bad Gateway. 214 00:10:28.200 --> 00:10:30.559 It provides a basic level of self-healing for those 215 00:10:30.600 --> 00:10:31.799 intermittent network blips. 216 00:10:31.960 --> 00:10:35.919 Makes sense. So we've got routing, balancing, resilience. But when 217 00:10:35.960 --> 00:10:38.039 things do go wrong despite all this, we need to 218 00:10:38.039 --> 00:10:40.759 figure out why. Let's talk observability. 219 00:10:41.080 --> 00:10:45.159 Crucial. Observability isn't just knowing that something is wrong, but 220 00:10:45.279 --> 00:10:49.799 having the data to understand why. And Traefik, sitting at 221 00:10:49.799 --> 00:10:53.120 the entry point, is perfectly placed to collect that data. 222 00:10:53.279 --> 00:10:57.440 Across the three pillars, right? Logs, traces, metrics. Exactly. 223 00:10:57.559 --> 00:10:58.519 Let's start with logs.
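The circuit breaker behavior described in the preceding exchange (trip when more than 25% of recent responses are 5xx, then answer 503 immediately for a while) can be sketched as follows. This is a simplified illustration, not Traefik's middleware: the class name, the `window` and `cooldown` parameters, and the explicit `now` argument are assumptions made to keep the example deterministic.

```python
from collections import deque


class CircuitBreaker:
    """Trips when the share of 5xx responses in a sliding window exceeds a threshold."""

    def __init__(self, threshold=0.25, window=20, cooldown=10.0):
        self.threshold = threshold          # e.g. 0.25 means "more than 25% errors"
        self.recent = deque(maxlen=window)  # last N status codes seen
        self.cooldown = cooldown            # seconds to stay open after tripping
        self.opened_at = None               # None means the breaker is closed

    def error_ratio(self):
        if not self.recent:
            return 0.0
        errors = sum(1 for code in self.recent if 500 <= code < 600)
        return errors / len(self.recent)

    def call(self, backend, now):
        # While open, fail fast with 503 and never touch the backend.
        if self.opened_at is not None:
            if now - self.opened_at < self.cooldown:
                return 503
            self.opened_at = None  # cooldown elapsed: close and start fresh
            self.recent.clear()
        status = backend()
        self.recent.append(status)
        if self.error_ratio() > self.threshold:
            self.opened_at = now
        return status


responses = iter([200, 200, 200, 500, 500, 200])
calls = []


def backend():
    calls.append(1)  # count how often the backend is really hit
    return next(responses)


cb = CircuitBreaker(threshold=0.25, window=4, cooldown=10.0)
results = [cb.call(backend, now=t) for t in range(5)]
rejected = cb.call(backend, now=5)     # breaker is open: backend is skipped
recovered = cb.call(backend, now=20)   # cooldown elapsed: backend is tried again
```

With a window of four, the first 500 gives an error ratio of exactly 0.25, which does not exceed the threshold; the second 500 pushes it to 0.5 and trips the breaker. A production breaker would usually also probe cautiously during recovery instead of reopening fully at once.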
224 00:10:59.159 --> 00:11:02.120 Now, people often say application logs alone aren't enough 225 00:11:02.120 --> 00:11:05.919 in microservices. What makes Traefik's logs actually useful here? 226 00:11:06.200 --> 00:11:09.360 Well, it generates standard error logs, of course, but the 227 00:11:09.399 --> 00:11:12.320 real value is often in the access logs. The trick 228 00:11:12.480 --> 00:11:16.480 is, logging everything for every request can be really resource intensive. 229 00:11:16.600 --> 00:11:18.840 Yeah, generates huge amounts of data. 230 00:11:18.600 --> 00:11:21.360 So Traefik lets you filter them intelligently. You might say, 231 00:11:21.559 --> 00:11:24.519 only log requests that resulted in a redirect, status codes 232 00:11:24.559 --> 00:11:27.919 300 to 302, or only log requests 233 00:11:27.919 --> 00:11:31.279 that took longer than, say, five seconds to complete, using 234 00:11:31.279 --> 00:11:32.440 a minDuration filter. 235 00:11:32.600 --> 00:11:35.600 Ah, so you capture the interesting or problematic events without 236 00:11:35.679 --> 00:11:37.399 drowning in routine data. 237 00:11:37.120 --> 00:11:40.600 Precisely. Optimizes performance, gets you the diagnostic data you actually need. 238 00:11:40.679 --> 00:11:42.559 Okay, logs tell us what happened at the edge. But 239 00:11:42.759 --> 00:11:46.440 to follow a request through multiple services, we need tracing. 240 00:11:46.360 --> 00:11:50.799 Right, request tracing stitches the whole journey together. Each piece 241 00:11:50.840 --> 00:11:53.840 of work done by a service is a span. All 242 00:11:53.879 --> 00:11:57.919 the spans for one user request combine into a single trace, like 243 00:11:57.879 --> 00:11:59.840 a timeline of the request's life.
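The access-log filtering mentioned a few cues earlier (keep redirects in the 300 to 302 range, or anything slower than five seconds) can be illustrated with a short Python sketch. It assumes the two filters combine with OR, so a request is logged if either one matches; the function name, parameter names, and sample requests are hypothetical.

```python
def should_log(status, duration_s, status_ranges=((300, 302),), min_duration_s=5.0):
    """Decide whether a request belongs in the access log.

    Assumption: filters are OR-ed, so one matching filter is enough.
    """
    in_range = any(low <= status <= high for low, high in status_ranges)
    too_slow = duration_s >= min_duration_s
    return in_range or too_slow


# Hypothetical sample of (path, status, duration in seconds).
requests = [
    ("/home", 200, 0.12),     # routine: dropped
    ("/old-page", 301, 0.03), # redirect: kept
    ("/report", 200, 7.5),    # slow: kept
    ("/login", 500, 0.40),    # error outside the configured range: dropped
]

kept = [path for path, status, duration in requests if should_log(status, duration)]
print(kept)
```

Note that the 500 response is dropped under these particular settings, which shows why the status ranges need to be chosen to match what you actually want to investigate.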
244 00:11:59.600 --> 00:12:03.399 Exactly, and Traefik, being the first point of contact, can 245 00:12:03.440 --> 00:12:08.080 generate standardized trace headers, often B3 propagation headers, things 246 00:12:08.120 --> 00:12:11.000 like X-B3-TraceId. Think of them like a digital 247 00:12:10.720 --> 00:12:12.919 passport. And it passes that passport along? 248 00:12:13.240 --> 00:12:16.000 It injects those headers into the request before forwarding it 249 00:12:16.039 --> 00:12:19.000 to the first back end service. That service, if it's 250 00:12:19.039 --> 00:12:21.840 trace aware, adds its own span and passes the headers on. 251 00:12:22.399 --> 00:12:25.200 So even if the request hits five different microservices, 252 00:12:25.320 --> 00:12:25.639 you can 253 00:12:25.519 --> 00:12:27.879 see the whole chain in a system like Zipkin or 254 00:12:28.000 --> 00:12:28.960 Jaeger. Exactly. 255 00:12:29.080 --> 00:12:32.879 End-to-end visibility, invaluable for debugging distributed systems. 256 00:12:32.519 --> 00:12:34.960 And the third pillar, metrics, the numbers. Yep. 257 00:12:35.320 --> 00:12:39.679 Traefik exposes key application-level metrics, things like total request counts, 258 00:12:39.720 --> 00:12:43.559 request latencies, averages, quantiles, error rates, information about the 259 00:12:43.559 --> 00:12:45.639 back end servers. And you feed that into 260 00:12:45.679 --> 00:12:49.919 standard monitoring systems, typically Prometheus. Prometheus scrapes these metrics from 261 00:12:50.000 --> 00:12:53.399 Traefik periodically. Then you can use tools like Grafana to 262 00:12:53.720 --> 00:12:58.360 visualize trends, plan capacity, and set up automated alerts if, say, 263 00:12:58.399 --> 00:13:00.600 error rates spike or latency degrades. 264 00:13:01.039 --> 00:13:03.399 Got it. Okay, let's bring this home to the place 265 00:13:03.399 --> 00:13:07.840 where Traefik seems most popular: Kubernetes.
You mentioned earlier that 266 00:13:07.919 --> 00:13:10.919 the original Kubernetes Ingress API wasn't great. 267 00:13:11.159 --> 00:13:14.519 Yeah, it was, let's say, a bit underspecified, vague, 268 00:13:14.879 --> 00:13:18.440 which forced vendors like Traefik, NGINX, and others to 269 00:13:18.519 --> 00:13:20.200 rely heavily on custom 270 00:13:19.879 --> 00:13:23.120 annotations. Annotations being those kind of messy text strings in 271 00:13:23.159 --> 00:13:23.759 the YAML. 272 00:13:23.720 --> 00:13:27.799 Exactly, you'd have dozens of vendor-specific annotations to configure 273 00:13:27.840 --> 00:13:31.519 basic things like timeouts or retries or sticky sessions. It 274 00:13:31.559 --> 00:13:33.120 wasn't clean, wasn't standardized. 275 00:13:33.279 --> 00:13:35.240 So how did Traefik improve on that? They gave up 276 00:13:35.240 --> 00:13:36.879 on Ingress? In Traefik v2, 277 00:13:36.919 --> 00:13:41.440 they shifted strategy. They embraced Kubernetes custom resource definitions, or CRDs. 278 00:13:41.799 --> 00:13:45.679 They introduced their own resources like IngressRoute, Middleware, TLSOption. 279 00:13:45.240 --> 00:13:48.360 So instead of annotations, you define routing rules using 280 00:13:48.440 --> 00:13:51.279 these custom but still native-feeling Kubernetes objects. 281 00:13:51.559 --> 00:13:55.639 Precisely, it's a much nicer experience. As they say, configuration 282 00:13:55.720 --> 00:14:00.840 becomes structured, version-controllable Kubernetes YAML, just like your deployments and services. 283 00:14:01.240 --> 00:14:04.759 Any Kubernetes engineer can understand it. It follows familiar patterns, 284 00:14:05.159 --> 00:14:08.600 no more digging through annotation documentation for different vendors. 285 00:14:08.639 --> 00:14:10.879 That sounds like a huge improvement. And you also touched 286 00:14:10.919 --> 00:14:15.080 on TLS simplification. Getting certificates is often a real pain.
287 00:14:15.159 --> 00:14:19.600 Oh, historically it was awful. Manual requests, validation hoops, remembering 288 00:14:19.639 --> 00:14:22.799 to renew. High chance of error, high risk. 289 00:14:23.000 --> 00:14:24.840 So how does Traefik fix that? 290 00:14:25.200 --> 00:14:28.159 It integrates directly with the ACME protocol, which is the 291 00:14:28.279 --> 00:14:32.679 standard Let's Encrypt uses for automating certificate issuance for public domains. 292 00:14:32.840 --> 00:14:35.639 Let's Encrypt, the free certificate authority. Right. 293 00:14:35.639 --> 00:14:38.720 With Traefik, you basically just configure a cert resolver 294 00:14:38.840 --> 00:14:42.120 pointing to Let's Encrypt. Then when you define an ingress 295 00:14:42.200 --> 00:14:43.360 route for a public host 296 00:14:43.240 --> 00:14:44.840 name, Traefik just handles it. 297 00:14:44.840 --> 00:14:49.039 It handles the entire life cycle automatically. It requests the certificate, 298 00:14:49.320 --> 00:14:52.440 handles the domain validation challenge, often using something called the 299 00:14:52.480 --> 00:14:57.799 TLS-ALPN-01 challenge. It's quite neat. It retrieves the certificate, 300 00:14:58.000 --> 00:15:01.320 installs it, and even handles renewal before it expires. 301 00:15:01.799 --> 00:15:04.720 Wow. So the developer just defines the route, asks for 302 00:15:04.840 --> 00:15:07.960 TLS, and Traefik and Let's Encrypt do the rest. 303 00:15:08.039 --> 00:15:12.639 Pretty much. Focus on the application logic. The complicated, error 304 00:15:12.679 --> 00:15:15.279 prone task of certificate management just happens. 305 00:15:15.440 --> 00:15:18.159 So wrapping it up, Traefik's core value seems to be 306 00:15:18.200 --> 00:15:22.039 replacing that old, rigid, manual configuration world, 307 00:15:21.840 --> 00:15:24.639 which just breaks under microservice dynamism, 308 00:15:24.440 --> 00:15:28.120 with a dynamic, self-configuring system built for that reality.
309 00:15:28.159 --> 00:15:31.080 It's the traffic cop that learns the roads automatically as 310 00:15:31.120 --> 00:15:33.279 they get built or torn down. Well put. 311 00:15:33.360 --> 00:15:35.759 And there's a final thought, maybe a provocative one, tied 312 00:15:35.759 --> 00:15:39.120 to that certificate automation we just discussed. Traditionally, certificate 313 00:15:39.159 --> 00:15:42.440 management was so painful and manual, people did it infrequently, 314 00:15:42.919 --> 00:15:45.480 maybe once a year. This meant certificates were valid for 315 00:15:45.519 --> 00:15:48.679 a long time. If one got compromised somehow, an attacker 316 00:15:48.679 --> 00:15:50.039 had a year-long window. 317 00:15:50.279 --> 00:15:52.799 Right. Long-lived credentials are risky. 318 00:15:52.720 --> 00:15:56.799 Very, yeah. But because Traefik's integration with Let's Encrypt automates 319 00:15:56.840 --> 00:16:01.120 the renewal process, certificates typically only live for ninety days now, 320 00:16:01.679 --> 00:16:05.399 and the renewal's automatic, often no human touch needed. 321 00:16:05.639 --> 00:16:08.840 So it drastically shrinks the window of opportunity for an 322 00:16:08.879 --> 00:16:11.559 attacker using a compromised certificate. 323 00:16:11.120 --> 00:16:15.519 Exactly. It removes a tedious, error-prone operational task and 324 00:16:15.639 --> 00:16:19.960 significantly improves your security posture by enforcing short certificate lifetimes. 325 00:16:20.639 --> 00:16:23.799 That whole category of operational security risk just kind of 326 00:16:23.840 --> 00:16:25.440 melts away thanks to automation. 327 00:16:25.679 --> 00:16:29.240 That's a really powerful side effect of adopting modern tooling. 328 00:16:29.320 --> 00:16:31.759 A fantastic insight to end on. Thank you for taking 329 00:16:31.840 --> 00:16:33.679 us through this deep dive into Traefik. 330 00:16:33.399 --> 00:16:35.039 My pleasure. It's fascinating technology.
331 00:16:35.120 --> 00:16:37.519 And thank you, our listeners, for joining us. We'll catch 332 00:16:37.519 --> 00:16:38.519 you on the next deep dive.