WEBVTT 1 00:00:00.160 --> 00:00:03.680 Welcome to the deep dive. Today. We're diving into well, 2 00:00:03.720 --> 00:00:07.240 a really fundamental challenge in computing these days, handling that 3 00:00:07.519 --> 00:00:10.080 absolute flood of data, you know, the high speed stuff. 4 00:00:10.160 --> 00:00:14.119 Yeah, it's relentless. I think social media, IoT sensors, everywhere, 5 00:00:14.199 --> 00:00:17.280 web clicks, it just keeps coming. 6 00:00:17.760 --> 00:00:19.839 And it's not just the sheer volume, right, it's how 7 00:00:19.879 --> 00:00:22.480 bursty it is. You get these sudden peaks exactly. 8 00:00:22.519 --> 00:00:26.000 That borestiness is the real killer. Our sources. They show 9 00:00:26.039 --> 00:00:29.399 that building systems that don't just fall over when you 10 00:00:29.440 --> 00:00:32.039 get a sudden rush, like say everyone trying to check 11 00:00:32.079 --> 00:00:34.280 out a bike in a smart city right at five pm, 12 00:00:34.600 --> 00:00:37.799 that needs special thinking. Traditional approaches just won't. 13 00:00:37.600 --> 00:00:40.119 Cut it right. So our mission today is basically to 14 00:00:40.119 --> 00:00:43.159 give you a solid shortcut to understanding these kinds of architectures. 15 00:00:43.640 --> 00:00:47.079 Will use the Amazon Kinesis family, you know, data streams, 16 00:00:47.119 --> 00:00:49.799 fire hose, data analytics, video streams, kind of as our 17 00:00:49.840 --> 00:00:50.359 case study. 18 00:00:50.560 --> 00:00:52.679 Yeah. The goal is by the end you'll really get 19 00:00:52.679 --> 00:00:55.640 the difference between these tools, and more importantly, the core 20 00:00:55.719 --> 00:00:59.920 idea is behind making these systems scalable and crucially fast. 21 00:01:00.039 --> 00:01:02.280 Okay, let's kick off with the big why why is 22 00:01:02.399 --> 00:01:05.400 speed so important in data analysis. Now why the big rush? 23 00:01:05.560 --> 00:01:09.000 Well, it's become a strategic must have. Really, we've moved 24 00:01:09.000 --> 00:01:12.319 way beyond just looking backwards. Yeah, like those old batch 25 00:01:12.400 --> 00:01:14.599 jobs running overnight giving you a report. 26 00:01:14.239 --> 00:01:17.120 On yesterday, right, a snapshot of the past exactly. 27 00:01:17.640 --> 00:01:21.599 Now, it's all about immediate insights, the freshest possible data. 28 00:01:21.840 --> 00:01:25.760 Let's you make the best decisions right now, perishable insights. 29 00:01:25.760 --> 00:01:31.319 Basically, I've heard about this thing, the O day loop observe, orient, decide, act. 30 00:01:31.760 --> 00:01:34.719 It comes from military strategy. I think, how does real 31 00:01:34.760 --> 00:01:35.959 time data fit in there? 32 00:01:36.079 --> 00:01:40.359 It fits perfectly. Real time analytics fundamentally strengths that observe window. 33 00:01:40.719 --> 00:01:42.640 Something happens, you know about. 34 00:01:42.400 --> 00:01:45.280 It instantly, which means you can orient, decide, and act 35 00:01:45.359 --> 00:01:45.920 much faster. 36 00:01:46.079 --> 00:01:49.719 Precisely gives you a huge edge. Think about frog detection 37 00:01:50.000 --> 00:01:53.760 or optimizing ad placements on the fly, or in that 38 00:01:53.799 --> 00:01:58.200 smart city example, rerouting maintenance crews instantly based on real 39 00:01:58.280 --> 00:02:00.000 time bike usage patterns. 40 00:02:00.040 --> 00:02:03.000 Okay, so that's the why. Now the how? A single 41 00:02:03.000 --> 00:02:05.840 server obviously can't cope, so we use distributed systems. Lots 42 00:02:05.840 --> 00:02:08.280 of servers network together. But does not just explode the 43 00:02:08.280 --> 00:02:10.599 complexity failure modes everywhere. 44 00:02:10.759 --> 00:02:14.400 Oh absolutely, It gets complex fast, but engineers figured it 45 00:02:14.400 --> 00:02:17.919 out ways to manage it. Two main patterns really help 46 00:02:18.039 --> 00:02:20.439 tame that initial complexity. 47 00:02:19.919 --> 00:02:21.479 Beast Okay, what were they. 48 00:02:21.520 --> 00:02:26.599 First standardized interfaces? Things like APIs This really paved the 49 00:02:26.639 --> 00:02:28.479 way for micro service architectures. 50 00:02:28.719 --> 00:02:32.199 Ah, so breaking the big system down into smaller independent 51 00:02:32.240 --> 00:02:33.639 services exactly. 52 00:02:33.840 --> 00:02:36.400 It lets different teams own their piece and you know, 53 00:02:36.639 --> 00:02:39.360 iterate faster without stepping on each other's toes. 54 00:02:39.560 --> 00:02:42.439 It sounds a bit like Conway's law in action, the 55 00:02:42.479 --> 00:02:44.919 system design mirroring the ORG structure. 56 00:02:45.240 --> 00:02:47.879 That's a great way to put it. Yeah, small independent 57 00:02:47.919 --> 00:02:52.159 teams tend to build small independent services. Those standardized interfaces 58 00:02:52.159 --> 00:02:53.319 are what make it work smoothly. 59 00:02:53.439 --> 00:02:57.439 Okay, interfaces help services talk. But what about failures? If 60 00:02:57.439 --> 00:03:00.639 one micro service crashes while it's processing data, how do 61 00:03:00.680 --> 00:03:03.240 you stop it from dragging down everything else that depends 62 00:03:03.280 --> 00:03:03.560 on it? 63 00:03:03.639 --> 00:03:06.400 Right, that's the castgating failure problem. And that's where the 64 00:03:06.439 --> 00:03:10.759 second pattern comes in. Decoupling. You use an asynchronous message broker. 65 00:03:10.879 --> 00:03:14.479 A message broker like a middleman for data sort of. 66 00:03:14.599 --> 00:03:17.719 Yeah. It acts as the stable buffer, an invariant in 67 00:03:17.759 --> 00:03:21.560 the system. If a downstream service, a consumer fails, the 68 00:03:21.639 --> 00:03:25.120 broker just holds onto the incoming messages. It builds up 69 00:03:25.120 --> 00:03:26.240 a backlog. 70 00:03:25.840 --> 00:03:29.039 So the producer can keep sending data unaware of the 71 00:03:29.080 --> 00:03:30.120 downstream problem. 72 00:03:30.280 --> 00:03:33.479 Exactly, the failure is contained. Once the consumer service comes 73 00:03:33.479 --> 00:03:36.800 back online, it just starts processing the backlog from the broker. 74 00:03:37.080 --> 00:03:39.560 No data lost, no castkating crash. 75 00:03:39.680 --> 00:03:43.159 That makes sense. The broker isolates the fault. But if 76 00:03:43.199 --> 00:03:46.199 that backlog keeps growing, that sounds like trouble brewing. What 77 00:03:46.319 --> 00:03:47.360 metrics do we watch? 78 00:03:47.560 --> 00:03:51.439 The key capacity metric is transactions per second tps. That's 79 00:03:51.520 --> 00:03:54.000 usually limited by either the number of records per second 80 00:03:54.159 --> 00:03:56.560 or the total data size like megabytes per second. 81 00:03:56.599 --> 00:03:59.960 Okay, tps, but what's the danger signal the real data. 82 00:04:00.080 --> 00:04:02.919 Your signal is back pressure. That's when your producers are 83 00:04:02.960 --> 00:04:06.000 sending data faster than your consumers can process it. Input 84 00:04:06.000 --> 00:04:08.520 tps is consistently higher than output tps. 85 00:04:08.759 --> 00:04:11.080 So back to the bike sharing rush hour hits every 86 00:04:11.159 --> 00:04:13.879 bike starts pinging its location like crazy. How do you 87 00:04:13.919 --> 00:04:17.759 handle that surge, that back pressure without the system just drowning. 88 00:04:18.040 --> 00:04:21.120 You've got a few strategies ranging from let's say gentle 89 00:04:21.560 --> 00:04:25.120 to pretty aggressive. Okay, you can throttle the producers tell 90 00:04:25.160 --> 00:04:28.560 the bikes, hey, maybe only report your location every minute 91 00:04:28.600 --> 00:04:30.439 instead of every second during peak times. 92 00:04:30.639 --> 00:04:32.600 Reduce the input rate makes. 93 00:04:32.439 --> 00:04:35.000 Sense, Or you can scale the consumers. If you're in 94 00:04:35.040 --> 00:04:38.480 the cloud, you might automatically spin up more instances of 95 00:04:38.519 --> 00:04:40.680 your processing application to handle the load. 96 00:04:40.800 --> 00:04:42.000 Add more workers, right. 97 00:04:42.079 --> 00:04:44.279 You can also use bigger buffers in the message broker 98 00:04:44.279 --> 00:04:48.800 itself to just absorb temporary spikes. But the most drastic 99 00:04:48.839 --> 00:04:52.000 option is to just drop messages drop data. 100 00:04:52.279 --> 00:04:53.199 That sounds risky. 101 00:04:53.360 --> 00:04:56.120 It is. You'd only ever do that for non critical data. 102 00:04:56.160 --> 00:04:59.120 Like maybe dropping some routine sensor readings is okay if 103 00:04:59.120 --> 00:05:02.120 the systems overloaded, but you'd never drop something like a 104 00:05:02.279 --> 00:05:04.360 customer order completion message. Never. 105 00:05:04.519 --> 00:05:08.160 Okay, got it? Critical versus non critical. Let's dig into 106 00:05:08.160 --> 00:05:11.560 the stream itself. What are the absolute basic parts of 107 00:05:11.639 --> 00:05:13.800 any streaming system for main components? 108 00:05:14.079 --> 00:05:16.000 Yeah, you can break it down pretty simply. First you 109 00:05:16.000 --> 00:05:18.639 have the producers. These are the apps sending the data, 110 00:05:19.120 --> 00:05:22.439 our bike sensors, the user's mobile app checking out a bike, 111 00:05:22.480 --> 00:05:23.040 that kind of thing. 112 00:05:23.079 --> 00:05:24.639 Okay, producers send data. 113 00:05:24.680 --> 00:05:27.720 Then you have the messages or records. That's the actual 114 00:05:27.800 --> 00:05:30.839 data payload, usually small, maybe I a few kill bytes 115 00:05:31.000 --> 00:05:34.319 often capped around a megabyte. Each record has the data 116 00:05:34.360 --> 00:05:37.399 itself and a header, usually with a unique ID the 117 00:05:37.439 --> 00:05:38.920 broker assigns. 118 00:05:38.399 --> 00:05:40.160 Got it, producers messages. 119 00:05:40.319 --> 00:05:43.279 Third is the stream or the broker. That's the component 120 00:05:43.360 --> 00:05:45.600 doing the buffering, like canisis itself. 121 00:05:45.360 --> 00:05:48.000 Or Kafka the fault isolator we talked about, right, And 122 00:05:48.040 --> 00:05:51.079 finally you have the consumers, the applications that pull data 123 00:05:51.120 --> 00:05:55.959 from the stream and actually do something with it, analysis, storage, triggering, alerts. 124 00:05:55.519 --> 00:06:02.399 Whatever, producers, messages, stream consumers. Okay, now latency you mentioned 125 00:06:02.399 --> 00:06:05.279 speed is critical the source of stress. We need to 126 00:06:05.279 --> 00:06:08.199 be really precise here. It's not just latency. They're two 127 00:06:08.240 --> 00:06:09.720 specific measures. 128 00:06:09.600 --> 00:06:14.000 Yes, absolutely critical distinction. There's propagation delay that's simply the 129 00:06:14.079 --> 00:06:17.279 time it takes from the moment of producer writes a 130 00:06:17.279 --> 00:06:21.920 message to the moment a consumer reads it raw transmission speed. 131 00:06:21.759 --> 00:06:24.600 Basically okay, right to read time. What's the other one? 132 00:06:24.720 --> 00:06:28.480 The other and often more telling metric is the age 133 00:06:28.480 --> 00:06:31.800 of the message. This measures how long a message has 134 00:06:31.800 --> 00:06:34.720 actually been sitting in the stream before a consumer picked 135 00:06:34.720 --> 00:06:35.000 it up. 136 00:06:35.120 --> 00:06:38.079 Ah, So propagation delay could be low, but the message 137 00:06:38.160 --> 00:06:41.040 might still be old. If the consumers are lagging exactly. 138 00:06:41.079 --> 00:06:43.879 If that average message age starts creeping up, that's your 139 00:06:43.879 --> 00:06:46.680 big warning sign. It means your consumers can't keep up. 140 00:06:46.839 --> 00:06:49.759 The backlog is growing, and performance problems are right around 141 00:06:49.759 --> 00:06:52.319 the corner, even if the network itself is fast. 142 00:06:52.759 --> 00:06:56.839 That's a really important distinction. Now, failure modes distributed systems 143 00:06:56.920 --> 00:07:00.319 retries network glitches. It means data streams all and have 144 00:07:00.399 --> 00:07:03.560 this at least once delivery guarantee right, a message might 145 00:07:03.600 --> 00:07:04.519 show up more than once. 146 00:07:04.800 --> 00:07:09.519 That's the standard reality. Yes, because ensuring exactly once delivery 147 00:07:09.560 --> 00:07:14.279 across a distributed system is incredibly hard. Most brokers guarantee 148 00:07:14.279 --> 00:07:17.279 at least once. The producer might retry sending if it 149 00:07:17.279 --> 00:07:20.680 doesn't get confirmation the consumer by process and in crash 150 00:07:20.759 --> 00:07:23.160 before confirming. Duplicates happen, and that. 151 00:07:23.319 --> 00:07:28.279 Sounds potentially disastrous. If you're, say, processing payments or updating 152 00:07:28.279 --> 00:07:30.319 inventory accounts, double counting. 153 00:07:30.040 --> 00:07:33.199 It would be disastrous, which is why the responsibility for 154 00:07:33.240 --> 00:07:37.720 handling duplicates shifts downstream to the consumer or the final 155 00:07:37.759 --> 00:07:42.199 destination system. Those systems must be designed to be idempatant idempatent, 156 00:07:42.439 --> 00:07:46.120 meaning meaning processing the same message multiple times has the 157 00:07:46.160 --> 00:07:48.680 exact same effect as processing it just once. 158 00:07:48.800 --> 00:07:50.680 How do you achieve that two main ways. 159 00:07:50.800 --> 00:07:53.560 Either the operation itself is naturally idempatant like setting a 160 00:07:53.639 --> 00:07:55.560 value as adempatent adding to a count, or is not, 161 00:07:56.360 --> 00:07:59.079 or you use that unique ideed. It's typically in the 162 00:07:59.079 --> 00:08:02.399 message payload to explicitly check if you've already processed this 163 00:08:02.439 --> 00:08:04.839 specific message before de duplication. 164 00:08:04.959 --> 00:08:08.560 Basically okay, So the consuming application needs to be smart 165 00:08:08.560 --> 00:08:12.399 about duplicates. What about errors within a message? If a 166 00:08:12.439 --> 00:08:15.720 consumer gets a record it just can't process. 167 00:08:15.600 --> 00:08:18.959 That leads to a particularly nasty problem called the poison pill. 168 00:08:19.120 --> 00:08:21.800 Poisman pill sounds bad, It is. 169 00:08:22.160 --> 00:08:25.879 Because message streams usually guarantee the order of messages, at 170 00:08:25.959 --> 00:08:29.040 least within a certain context. Like all messages for a 171 00:08:29.079 --> 00:08:32.960 specific bike, you need to process the unlock event before 172 00:08:33.000 --> 00:08:34.559 the return event takes sense. 173 00:08:34.759 --> 00:08:35.480 Order matters. 174 00:08:35.559 --> 00:08:38.480 But if one message in that sequence causes an error 175 00:08:38.519 --> 00:08:42.159 in the consumer, maybe it's malformed bad data, and the 176 00:08:42.200 --> 00:08:46.200 consumer keeps retrying and failing on that specific message, it 177 00:08:46.200 --> 00:08:49.600 gets stuck. It gets completely stuck. That single poison pill 178 00:08:49.639 --> 00:08:52.919 record blocks the processing of all the perfectly valid records 179 00:08:52.919 --> 00:08:54.879 sitting behind it in the stream for that same bike 180 00:08:55.000 --> 00:08:58.679 or whatever the ordering context is. The whole sequence grinds 181 00:08:58.720 --> 00:09:00.879 to a halt because of one bad message. 182 00:09:00.960 --> 00:09:03.879 Wow, so a single bite of bad data could effectively 183 00:09:03.919 --> 00:09:06.799 block a whole partition of the stream, causing a latency 184 00:09:06.879 --> 00:09:09.440 for everything behind it to skyrocket exactly. 185 00:09:09.559 --> 00:09:13.519 It really highlights why robust error handling and potentially dead 186 00:09:13.559 --> 00:09:18.080 letter cues for problematic messages are so crucial in consumer design. 187 00:09:18.080 --> 00:09:20.720 Okay, that sets the stage really well. Let's move on 188 00:09:20.759 --> 00:09:23.759 to the specific tools Amazon offers with Kinesis to handle 189 00:09:23.759 --> 00:09:27.480 all this at scale. Starting with the foundation canesis data 190 00:09:27.519 --> 00:09:30.240 streams or KDS. When do you reach for this? 191 00:09:30.720 --> 00:09:34.600 KTS is your go to when you need maximum control, 192 00:09:34.919 --> 00:09:39.679 high durability, and really low latency. We're talking subsecond processing 193 00:09:39.720 --> 00:09:42.519 for your own custom applications that read directly from the stream. 194 00:09:42.600 --> 00:09:45.279 And the core unit of KDS is the shard, right, 195 00:09:45.360 --> 00:09:46.360 what does that actually do? 196 00:09:46.559 --> 00:09:49.840 The shard is the fundamental unit of capacity and parallelism 197 00:09:49.879 --> 00:09:53.159 in KDS. Each chard has fixed limits on how much 198 00:09:53.240 --> 00:09:55.879 data it can ingest typically one megabyte per second or 199 00:09:55.879 --> 00:09:58.480 one thousand records per second, whichever comes first, and also 200 00:09:58.519 --> 00:09:59.720 how much data can be read out. 201 00:10:00.000 --> 00:10:02.120 If I need more capacity, I just add more shards 202 00:10:02.320 --> 00:10:02.879 pretty much. 203 00:10:03.000 --> 00:10:05.799 Yeah, you scale the stream by scaling the number of shards. 204 00:10:05.879 --> 00:10:08.720 Okay, and if we're tracking millions of bike journeys, how 205 00:10:08.720 --> 00:10:11.960 does KDS decide which shard a specific bike's data goes 206 00:10:12.000 --> 00:10:13.279 to and keep it in order. 207 00:10:13.480 --> 00:10:17.200 That's determined by the partition key. When your producer application 208 00:10:17.320 --> 00:10:20.399 sends a record to KDS, it includes a partition key. 209 00:10:20.840 --> 00:10:22.879 This could be the bike ID for instance. 210 00:10:22.759 --> 00:10:25.360 And KTS uses that key to route. 211 00:10:25.080 --> 00:10:29.360 The data exactly. KTS hashes the partition key and uses 212 00:10:29.399 --> 00:10:31.960 the result to assign the record to a specific shard. 213 00:10:32.679 --> 00:10:35.240 All records with the same partition key always go to 214 00:10:35.279 --> 00:10:35.759 the same. 215 00:10:35.639 --> 00:10:38.559 Chard ah, and that's how it guarantees order. Within that key, 216 00:10:38.919 --> 00:10:41.399 all data for bike one twenty three lands on, say, 217 00:10:41.799 --> 00:10:44.559 shard five, in the correct sequence precisely. 218 00:10:45.159 --> 00:10:48.679 But this also highlights a potential pitfall, the hot partition 219 00:10:48.759 --> 00:10:52.039 key or hot shard. If one partition ke, like a 220 00:10:52.080 --> 00:10:55.240 super popular bike or a single central sensor, sends way 221 00:10:55.240 --> 00:10:58.000 more data than others, its shark can get overwhelmed while 222 00:10:58.000 --> 00:10:58.720 others are idle. 223 00:10:58.879 --> 00:11:01.399 So choosing a good part two key that distributes data 224 00:11:01.440 --> 00:11:04.600 evenly is critical for performance, maybe not just the bike idea. 225 00:11:04.600 --> 00:11:06.759 If usage is very uneven. 226 00:11:06.679 --> 00:11:10.200 Right, sometimes using a more random key or combining fields 227 00:11:10.440 --> 00:11:12.240 is necessary to spread the load effectively. 228 00:11:12.480 --> 00:11:14.799 Now reading the data, Let's say we have multiple apps 229 00:11:14.840 --> 00:11:17.039 wanting that bike data, one for maintenance alarts, one for 230 00:11:17.120 --> 00:11:20.279 route analysis. How do they read efficiently? You mentioned Enhanced 231 00:11:20.279 --> 00:11:21.080 fan out EFO. 232 00:11:21.440 --> 00:11:24.279 Yes, so, standard KDS consumers work on a pull model. 233 00:11:24.480 --> 00:11:27.200 They pull the shard for data, but each chard has 234 00:11:27.240 --> 00:11:30.200 a total read capacity limit about two megabytes per second 235 00:11:30.440 --> 00:11:32.000 that all standard consumers share. 236 00:11:32.200 --> 00:11:34.519 So if you have lots of consumers pulling the same shard, 237 00:11:34.559 --> 00:11:36.879 they start slowing each other down exactly. 238 00:11:36.519 --> 00:11:40.519 They compete for that shared bandwidth. Enhanced fan out EFO 239 00:11:41.200 --> 00:11:44.840 solves this with EFO. Each registered consumer gets its own 240 00:11:44.919 --> 00:11:49.480 dedicated throughput limit per shard, delivered via push model from Kinesis. 241 00:11:49.960 --> 00:11:52.639 So EFO consumers don't interfere with each other. 242 00:11:52.759 --> 00:11:55.799 Correct. It allows multiple real time applications to consume from 243 00:11:55.799 --> 00:11:58.720 the same stream at high speed independently. If you need 244 00:11:58.720 --> 00:12:01.840 scale on the consumer side, EFO is often essential. 245 00:12:02.240 --> 00:12:06.519 Okay, KTS for flexible, low latency custom apps. What about 246 00:12:06.600 --> 00:12:09.799 Kinesis Data fire Hose KDF. People often call it the 247 00:12:09.799 --> 00:12:10.440 easy loader. 248 00:12:10.720 --> 00:12:14.720 Why because it is KDF is fully managed, totally serverless 249 00:12:14.960 --> 00:12:17.080 and Its whole purpose is to make it super simple 250 00:12:17.120 --> 00:12:20.440 to capture streaming data and load it into specific destinations. 251 00:12:20.639 --> 00:12:24.279 Think S three data laks, redshift data warehouses, elastic search 252 00:12:24.320 --> 00:12:25.360 for search, that kind of thing. 253 00:12:25.399 --> 00:12:26.440 How easy is easy? 254 00:12:26.759 --> 00:12:29.440 Often zero code is needed for the delivery part. You 255 00:12:29.480 --> 00:12:31.759 can figure fire Hose in the console, point it at 256 00:12:31.799 --> 00:12:34.679 your stream source which could even be KDS, tell it 257 00:12:34.720 --> 00:12:38.840 where to send the data, and it handles the scaling, buffering, delivery, 258 00:12:38.960 --> 00:12:40.360 and retries automatically. 259 00:12:40.519 --> 00:12:42.799 Does it do anything else like transform the data? 260 00:12:43.039 --> 00:12:45.679 Yes, you can. It has built in capabilities for data 261 00:12:45.720 --> 00:12:48.639 format conversion like Jayson to park, and it can also 262 00:12:48.679 --> 00:12:53.039 invoke an AWS lambda function to perform custom inline transformations 263 00:12:53.039 --> 00:12:55.759 basic epl before the data lands and the destination. 264 00:12:56.039 --> 00:12:59.840 Okay, KTS gives us speed and flexibility. KDF gives us 265 00:13:00.039 --> 00:13:03.240 ease of use for loading data, sinks. What's the catch 266 00:13:03.320 --> 00:13:04.879 with KDF? What's the tradeoff? 267 00:13:05.080 --> 00:13:08.279 The main trade off is latency because fire Hose buffers 268 00:13:08.360 --> 00:13:11.799 data internally, often for several seconds or even minutes to 269 00:13:11.840 --> 00:13:13.919 batch it efficiently for delivery to the destination. 270 00:13:14.080 --> 00:13:16.960 Ah, It's not truly real time like KDS. 271 00:13:16.519 --> 00:13:19.559 Can be exactly. It optimizes for delivery throughput and cost 272 00:13:19.559 --> 00:13:23.480 effectiveness by batching, not for subsecond end to end latency, 273 00:13:23.919 --> 00:13:26.840 So if you need that immediate subsecond processing, KDS is 274 00:13:26.879 --> 00:13:30.039 the choice. If your goal is reliable, easy loading into 275 00:13:30.120 --> 00:13:32.639 F three or redshift and near real time is good enough, 276 00:13:32.799 --> 00:13:34.720 KDF is often way simpler. 277 00:13:34.519 --> 00:13:41.240 Makes sense speed versus simplicity. Now, analysis kinesis data analytics kDa. 278 00:13:41.600 --> 00:13:43.960 How does this fit in analyzing the stream as it flows? 279 00:13:44.159 --> 00:13:47.320 Precisely? kDa is also serverless, and it lets you run 280 00:13:47.360 --> 00:13:50.679 continuous queries or applications against your streaming data, either from 281 00:13:50.759 --> 00:13:53.320 KDS or KDF. It offers two different. 282 00:13:53.039 --> 00:13:55.120 Engines for this, Okay, what are the engine choices? 283 00:13:55.279 --> 00:13:58.879 First is a sqel engine. If you're familiar with standard SQL, 284 00:13:59.000 --> 00:14:02.879 you can use ans icql to query the stream. kDa 285 00:14:02.960 --> 00:14:06.240 presents the stream data within windows, like tumbling windows of the 286 00:14:06.279 --> 00:14:08.799 last five minutes or sliding windows, so. 287 00:14:08.879 --> 00:14:11.840 You could write a query like select count from bike 288 00:14:11.919 --> 00:14:15.480 stream where status in US, group by location district over 289 00:14:15.559 --> 00:14:16.440 a rolling time. 290 00:14:16.279 --> 00:14:19.200 Window exactly that kind of thing. It treats the incoming 291 00:14:19.240 --> 00:14:22.519 stream like a continuously updating table. You can query. Very 292 00:14:22.559 --> 00:14:26.399 powerful for many common analytics tasks and accessible if you 293 00:14:26.440 --> 00:14:27.320 know SQL. 294 00:14:27.240 --> 00:14:28.799 And the second engine you mentioned too. 295 00:14:29.039 --> 00:14:31.279 The second is based on a patche flink. This gives 296 00:14:31.279 --> 00:14:33.840 you much more power and flexibility. You typically write your 297 00:14:33.879 --> 00:14:35.519 applications in Javar Scala. 298 00:14:35.600 --> 00:14:37.960 What can flink do that the SQL engine can't. 299 00:14:38.200 --> 00:14:41.600 Well, flink apps can handle much more complex logic, maintain 300 00:14:41.679 --> 00:14:45.639 sophisticated state across events, and importantly can connect the data 301 00:14:45.639 --> 00:14:49.360 sources and sinks outside of just kinesis and aws like 302 00:14:49.399 --> 00:14:52.799 maybe an on premises KOFKA cluster or a custom database. 303 00:14:53.000 --> 00:14:56.000 And I remember reading flink is key for achieving exactly 304 00:14:56.039 --> 00:14:58.759 once processing semantics. How does it manage that? 305 00:14:59.159 --> 00:15:03.519 Yes, that's a major advantage. Flank achieves exactly once by 306 00:15:03.559 --> 00:15:08.200 carefully managing the application's internal state and periodically saving checkpoints 307 00:15:08.200 --> 00:15:11.159 of that state to durable storage typically S three or 308 00:15:11.240 --> 00:15:14.480 something like rock sysdb. If there's a failure, it can 309 00:15:14.480 --> 00:15:17.840 restore from the last successful checkpoint and resume processing without 310 00:15:17.879 --> 00:15:19.759 missing data or processing duplicates. 311 00:15:19.840 --> 00:15:23.360 So SQL for simpler windowed queries, think for complex stateful 312 00:15:23.440 --> 00:15:26.159 potentially exactly one's processing and connecting anywhere. 313 00:15:26.200 --> 00:15:26.879 That's a good summary. 314 00:15:26.960 --> 00:15:30.519 Yeah, okay, one more service kinesis video streams KBS. This 315 00:15:30.559 --> 00:15:32.559 seems more specialized. Video and audio. 316 00:15:32.720 --> 00:15:36.240 Yeah, KBS is specifically designed for handling time encoded data streams, 317 00:15:36.440 --> 00:15:39.759 primarily video and audio, but potentially other time series data too, 318 00:15:39.919 --> 00:15:41.919 like radar or liar feeds. 319 00:15:42.159 --> 00:15:44.600 How does play out in our smart city bike scenario? 320 00:15:44.960 --> 00:15:49.279 KBS actually addresses two pretty distinct use cases. There first 321 00:15:49.399 --> 00:15:53.279 is real time low latency interaction. Imagine a user wanting 322 00:15:53.320 --> 00:15:56.240 to see a live video feed of a bike stand 323 00:15:56.320 --> 00:15:58.440 on their phone to check if bikes. 324 00:15:58.159 --> 00:16:00.000 Are actually available okay, live streaming. 325 00:16:00.080 --> 00:16:04.000 KVS provides capabilities, often using Web RTC standards for that 326 00:16:04.080 --> 00:16:06.440 kind of low latency peer to peer or small group 327 00:16:06.559 --> 00:16:10.159 video streaming. It manages the signaling channels and things like 328 00:16:10.200 --> 00:16:13.840 stunt and turn servers needed to establish the connections. So 329 00:16:14.080 --> 00:16:15.799 KVS WebRTC. 330 00:16:15.360 --> 00:16:18.159 For live use got it? And the second use case. 331 00:16:18.000 --> 00:16:20.799 The second is more about storage, playback and analysis. This 332 00:16:20.840 --> 00:16:24.720 is KVS storage. You stream video from say security cameras 333 00:16:24.720 --> 00:16:28.000 that the bike stands into KVS for durable storage. 334 00:16:27.559 --> 00:16:28.919 And then what just store it? 335 00:16:29.039 --> 00:16:31.200 You can store it for compliance or later review, but 336 00:16:31.240 --> 00:16:33.639 the real power comes from integrating it with AI and 337 00:16:33.720 --> 00:16:36.480 mL services. For instance, you could feed that stored video 338 00:16:36.480 --> 00:16:38.679 stream into Amazon Recognition. 339 00:16:38.399 --> 00:16:39.759 AH for analysis. 340 00:16:39.919 --> 00:16:43.679 What like running facial recognition to detect known vandals near 341 00:16:43.720 --> 00:16:46.519 the bike stands in real time, or maybe object detection 342 00:16:46.559 --> 00:16:50.519 to count bikes automatically or spot obstructions. Recognition analyzes the 343 00:16:50.559 --> 00:16:54.279 frames ingested via KVS and can trigger alerts or other actions. 344 00:16:54.440 --> 00:16:57.399 So KVS handles both the live interaction piece and the 345 00:16:57.600 --> 00:16:59.759 jest for later analysis piece for video. 346 00:16:59.559 --> 00:17:02.039 Data correct two sides of the video coin. 347 00:17:02.200 --> 00:17:04.759 That's a really helpful tour of the whole Kinesis family. 348 00:17:04.839 --> 00:17:06.960 Let's try to quickly recap the core job of each 349 00:17:07.000 --> 00:17:07.759 one for the listener. 350 00:17:07.839 --> 00:17:13.079 Okay, KTS, that's your foundational low latency, high throughput stream 351 00:17:13.559 --> 00:17:17.480 for custom apps needing maximum flexibility, thinks subsecond speed right 352 00:17:17.599 --> 00:17:21.599 KTF KDF the easy serverlust loader, great for getting streaming 353 00:17:21.640 --> 00:17:24.759 data into S three redshift elastic search without much code, 354 00:17:25.279 --> 00:17:29.519 but sacrifices that subsecond latency for batching efficiency. kDa the 355 00:17:29.599 --> 00:17:32.799 real time Analytics engine, use it SQL interface for simpler 356 00:17:32.799 --> 00:17:36.039 windowed queries, or the Apache frink engine for complex stateful 357 00:17:36.039 --> 00:17:39.200 processing exactly one semantics and broader connectivity. 358 00:17:39.240 --> 00:17:42.079 And finally, KVS KDS. 359 00:17:41.559 --> 00:17:44.920 The specialist for video and time encoded data, handles both 360 00:17:44.960 --> 00:17:48.559 low latency web RTC streaming for live interaction and gerbil 361 00:17:48.720 --> 00:17:53.119 ingestion for storage playback, and AML analysis like recognition. 362 00:17:53.359 --> 00:17:56.640 Perfect summary. Now, before we wrap up, there's a crucial 363 00:17:56.680 --> 00:18:00.599 security point that underpins all of this distributed complexity. How 364 00:18:00.640 --> 00:18:04.160 to secure all these producers and consumers talking to kinesis, Yeah, this. 365 00:18:04.160 --> 00:18:08.440 Is super important. The absolute standard best practice is use 366 00:18:08.519 --> 00:18:11.960 IMA roles. Do not embed long term access keys or 367 00:18:12.000 --> 00:18:16.319 secrets directly into your producer or consumer applications. Why roles specifically, 368 00:18:16.559 --> 00:18:20.240 because IM roles provide temporary, short lived security credentials that 369 00:18:20.279 --> 00:18:24.759 are automatically generated and rotated by AWS. Your application running 370 00:18:24.759 --> 00:18:27.440 on EC two or Lambda or wherever assumes a role 371 00:18:27.599 --> 00:18:31.000 gets temporary permissions, does its work, and those credentials expire. 372 00:18:31.119 --> 00:18:34.440 So even if an application instance gets compromised somehow, the 373 00:18:34.480 --> 00:18:36.359 attacker doesn't get permanent keys. 374 00:18:36.480 --> 00:18:42.079 Exactly, the potential blast radius is dramatically reduced compared to static, 375 00:18:42.160 --> 00:18:46.200 long lived keys being compromised. It's all about implementing least 376 00:18:46.200 --> 00:18:51.640 privileged access using temporary role based credentials. It's fundamental to 377 00:18:51.680 --> 00:18:53.039 secure cloud architecture. 378 00:18:53.359 --> 00:18:56.440 That's a critical takeaway. Okay, let's finish with that provocative 379 00:18:56.480 --> 00:18:59.480 thought we talked about at least once delivery and the 380 00:18:59.519 --> 00:19:02.519 need for dempotency. How should listeners think about that? 381 00:19:02.920 --> 00:19:05.119 Right, we established that GETT duplicates is just a fact 382 00:19:05.160 --> 00:19:08.440 of life in most high performance, resilient streaming systems like 383 00:19:08.559 --> 00:19:10.960 those built on Kinesis data streams. You have to design 384 00:19:11.000 --> 00:19:11.240 for it. 385 00:19:11.559 --> 00:19:12.920 So the provocative thought. 386 00:19:12.680 --> 00:19:15.599 Is think about how profoundly that changes how you have 387 00:19:15.680 --> 00:19:18.559 to design your applications. Yeah, you can never assume an 388 00:19:18.559 --> 00:19:22.200 input an event, A message will arrive only once, might 389 00:19:22.279 --> 00:19:25.079 arrive twice or even more time. So that's rare. So 390 00:19:25.119 --> 00:19:28.240 how does that constraint The absolute certainty that you might 391 00:19:28.279 --> 00:19:31.680 get duplicates ripple through your entire design? How does it 392 00:19:31.680 --> 00:19:36.240 affect your database schemas, your transaction logic, your state management. 393 00:19:36.640 --> 00:19:39.920 Realizing that you must build identity consumers isn't just a detail, 394 00:19:40.039 --> 00:19:43.240 it's a fundamental shift in mindset needed to truly master 395 00:19:43.359 --> 00:19:44.480 streaming architectures. 396 00:19:44.720 --> 00:19:48.279 A great point to ponder designing for repeats, not just requests. 397 00:19:48.400 --> 00:19:49.880 Thanks for breaking all that down. 398 00:19:49.720 --> 00:19:52.839 My pleasure. It's complex, but hopefully a bit clearer now.