WEBVTT 1 00:00:00.080 --> 00:00:01.720 Okay, so you're trying to get a real grip on 2 00:00:01.760 --> 00:00:05.960 something complex, right, and you want it fast, but without 3 00:00:05.960 --> 00:00:08.240 getting totally buried in jargon and detail. 4 00:00:08.359 --> 00:00:10.279 Yeah, information overload is real. 5 00:00:10.240 --> 00:00:13.759 Exactly. So think of this as your shortcut. We're diving 6 00:00:13.800 --> 00:00:17.079 deep into real-time analytics today, trying to give you 7 00:00:17.120 --> 00:00:20.640 that core understanding, you know, without all the noise. 8 00:00:20.920 --> 00:00:23.160 And we're basing this on the book Building Real-Time 9 00:00:23.199 --> 00:00:27.920 Analytics Systems. It just came out in September 2023, first 10 00:00:27.920 --> 00:00:31.079 edition. Right, and the book's goal seems pretty practical, just 11 00:00:31.120 --> 00:00:33.759 helping you, the listener, get your job done if you're 12 00:00:33.799 --> 00:00:34.719 working in this space. 13 00:00:34.960 --> 00:00:37.240 Pretty much. It cuts through the theory to the how 14 00:00:37.280 --> 00:00:38.039 to. Now, 15 00:00:38.479 --> 00:00:43.960 think back to maybe the early 2000s. Data analytics often 16 00:00:43.960 --> 00:00:46.479 felt like something you did after everything else happened, you know, 17 00:00:46.679 --> 00:00:47.320 batch processing. 18 00:00:47.439 --> 00:00:49.799 Oh, definitely. Reports would tell you what happened yesterday or 19 00:00:49.920 --> 00:00:51.719 last week. Hindsight, basically. 20 00:00:51.759 --> 00:00:53.799 But things have really shifted, haven't they? There's this, like, 21 00:00:54.280 --> 00:00:57.920 massive appetite now for knowing things the moment they happen, 22 00:00:58.320 --> 00:00:58.920 in real time. 23 00:00:59.039 --> 00:01:02.399 Absolutely. The book uses fraud detection as a great example. Yeah, 24 00:01:02.399 --> 00:01:03.880 finding out about fraud hours later? 
25 00:01:04.200 --> 00:01:07.079 Uh-uh, too late, right? The money's probably gone. Exactly. 26 00:01:07.200 --> 00:01:10.480 The real win is spotting it now, flagging it, maybe 27 00:01:10.519 --> 00:01:14.200 blocking it instantly. That immediacy is key. It's not just 28 00:01:14.319 --> 00:01:18.200 nice to have anymore. Often it's, well, essential. 29 00:01:18.879 --> 00:01:21.280 And that brings us to this idea of streaming. It's 30 00:01:21.319 --> 00:01:24.400 not about waiting for a whole file to finish downloading 31 00:01:24.480 --> 00:01:25.120 or collecting. 32 00:01:25.280 --> 00:01:27.280 No, not at all. Think of it more like a 33 00:01:28.599 --> 00:01:32.400 continuous flow, a river of data that just keeps coming. 34 00:01:32.439 --> 00:01:33.439 It never really ends. 35 00:01:33.680 --> 00:01:35.760 And the crucial part is you can dip into that 36 00:01:35.840 --> 00:01:38.560 river and act on what you see right then and there. 37 00:01:38.599 --> 00:01:41.920 Precisely. A data stream, fundamentally, is just a series of 38 00:01:42.000 --> 00:01:45.120 data points ordered by time. Each one represents some kind 39 00:01:45.120 --> 00:01:46.760 of event or a change. 40 00:01:46.480 --> 00:01:48.760 Like what? Give us an example. Well, like 41 00:01:48.840 --> 00:01:52.599 every single purchase on an e-commerce site, or every 42 00:01:52.599 --> 00:01:55.359 reading from an IoT sensor, maybe temperature or pressure. It's 43 00:01:55.400 --> 00:01:56.680 like a constant pulse of information. 44 00:01:56.760 --> 00:01:58.760 Okay, okay. And here's a point the book really stresses, 45 00:01:58.799 --> 00:02:02.120 which I found fascinating. Events have a shelf 46 00:02:01.840 --> 00:02:04.040 life, a very short one. 47 00:02:04.079 --> 00:02:07.359 Sometimes their value can just, like, plummet super fast. Think 48 00:02:07.359 --> 00:02:10.479 about an online shopping cart someone just abandoned. Right, if 
49 00:02:10.360 --> 00:02:12.919 you can ping them with an SMS or an email, 50 00:02:13.000 --> 00:02:15.120 maybe with a little discount voucher, like, 51 00:02:15.319 --> 00:02:17.000 immediately, you might get that sale back. 52 00:02:17.080 --> 00:02:19.319 You've got a decent shot. Yeah, but wait even just 53 00:02:19.360 --> 00:02:20.840 a couple of hours and they've moved 54 00:02:20.639 --> 00:02:24.039 on, bought somewhere else, or just changed their mind. Exactly. 55 00:02:24.240 --> 00:02:28.080 The timing, that immediate reaction, makes all the difference, and 56 00:02:28.120 --> 00:02:31.560 that is the heart of real-time analytics, or RTA. 57 00:02:32.159 --> 00:02:35.479 It's all about squeezing value from those events basically as 58 00:02:35.479 --> 00:02:36.199 soon as they happen. 59 00:02:36.400 --> 00:02:39.719 The book mentions soft real time. What's that about? Does 60 00:02:39.719 --> 00:02:41.240 that mean it's not quite real time? 61 00:02:41.520 --> 00:02:43.840 Well, yeah, kind of. It just acknowledges that, you know, 62 00:02:43.919 --> 00:02:48.120 perfection is hard. There might be tiny delays, milliseconds, maybe 63 00:02:48.159 --> 00:02:52.680 seconds, because of network latency or system hiccups. It's not instantaneous, 64 00:02:52.680 --> 00:02:53.840 but it's very, very close. 65 00:02:53.960 --> 00:02:57.080 Okay, so practical real time. The big difference is compared 66 00:02:57.120 --> 00:03:00.080 to batch processing, right? Batch is 67 00:02:59.759 --> 00:03:02.639 where you collect data over time, maybe an hour, maybe 68 00:03:02.639 --> 00:03:04.879 a day, put it in a big chunk, and then 69 00:03:04.919 --> 00:03:05.479 analyze it. 70 00:03:05.560 --> 00:03:08.080 We used to set up these artificial deadlines, didn't we? 71 00:03:08.319 --> 00:03:10.840 Run the report at midnight for yesterday's data. 72 00:03:11.080 --> 00:03:14.120 Yeah, those time boundaries. 
The problem is your analysis is 73 00:03:14.120 --> 00:03:17.879 always looking backwards. You're getting insights about what was happening. 74 00:03:17.520 --> 00:03:19.599 Which might be stale news by the time you get 75 00:03:19.639 --> 00:03:20.319 it. Totally. 76 00:03:21.039 --> 00:03:23.039 RTA aims to give you a view of the present, 77 00:03:23.199 --> 00:03:25.479 so your decisions are actually relevant to now. 78 00:03:25.719 --> 00:03:28.240 So our mission for this deep dive, drawing from the 79 00:03:28.240 --> 00:03:31.360 book, is to really get into those core concepts and, 80 00:03:31.719 --> 00:03:35.479 importantly, the benefits. What do you actually gain from this? 81 00:03:35.879 --> 00:03:39.360 Let's talk benefits, then. Speed is obviously a big one. 82 00:03:39.759 --> 00:03:43.439 The book argues it's often a decisive factor. Market leaders 83 00:03:43.439 --> 00:03:44.719 tend to be faster. 84 00:03:44.680 --> 00:03:47.719 Faster at understanding, faster at reacting. 85 00:03:47.360 --> 00:03:50.360 Exactly, and RTA helps achieve that. For one thing, it 86 00:03:50.400 --> 00:03:53.960 can actually open up totally new revenue streams. How so? Well, 87 00:03:54.080 --> 00:03:56.719 think about turning your real-time data itself into a 88 00:03:56.759 --> 00:04:00.520 product, offering your end users, maybe customers, the ability to 89 00:04:00.599 --> 00:04:05.240 query data with analytical capabilities almost live. They'd likely pay 90 00:04:05.319 --> 00:04:06.280 for that kind of access. 91 00:04:06.400 --> 00:04:09.479 Ah, interesting. So the insight itself becomes a premium service. 92 00:04:09.719 --> 00:04:12.479 That makes sense. It's not just about making more money, though, 93 00:04:12.599 --> 00:04:15.080 is it? The book talks about infrastructure costs too. 94 00:04:15.439 --> 00:04:18.920 Yes, that's a really important one. Traditional batch systems often 95 00:04:18.959 --> 00:04:21.079 tie storage and compute together very tightly. 
96 00:04:21.240 --> 00:04:23.120 Meaning if your data grows, 97 00:04:23.240 --> 00:04:26.399 your costs for both storage and the processing power needed 98 00:04:26.439 --> 00:04:30.759 can just explode, often exponentially. Ouch. But with RTA, you're 99 00:04:30.879 --> 00:04:34.319 processing data more incrementally as it arrives. It sort of 100 00:04:34.319 --> 00:04:37.000 breaks that tight coupling. You don't necessarily need to store 101 00:04:37.079 --> 00:04:40.399 everything forever just to process it later in huge 102 00:04:40.120 --> 00:04:44.120 batches. So you avoid building those massive, expensive legacy systems 103 00:04:44.319 --> 00:04:45.279 just for batch jobs. 104 00:04:45.399 --> 00:04:49.600 Potentially, yes. Significant cost savings are possible there. You're handling 105 00:04:49.680 --> 00:04:52.279 smaller streams continuously. 106 00:04:51.800 --> 00:04:54.879 Like managing a steady creek instead of building dams for 107 00:04:55.079 --> 00:04:59.160 unpredictable floods. And what about us, the customers? How does 108 00:04:59.199 --> 00:05:00.680 this improve customer experience? 109 00:05:00.839 --> 00:05:04.639 Well, think about customer support. Traditionally it's reactive, right? You 110 00:05:04.639 --> 00:05:08.240 have a problem, you call or email, they investigate 111 00:05:07.720 --> 00:05:08.920 and maybe fix it eventually. 112 00:05:09.199 --> 00:05:13.279 Maybe. With RTA, companies can constantly monitor streams of data: 113 00:05:13.360 --> 00:05:17.560 usage patterns, error logs, sensor data, looking for anomalies or 114 00:05:17.560 --> 00:05:18.560 signs of trouble. 115 00:05:18.319 --> 00:05:21.000 Ah, so they can spot problems before I even notice them. 116 00:05:21.279 --> 00:05:24.920 That's the goal. 
They can potentially identify and even resolve 117 00:05:24.959 --> 00:05:29.759 issues proactively, automatically: maybe reroute traffic, restart a service, or 118 00:05:29.800 --> 00:05:32.439 even reach out to you before it becomes a major headache. 119 00:05:32.560 --> 00:05:37.519 That sounds much better, moving from reactive firefighting to proactive 120 00:05:36.920 --> 00:05:41.000 care. Exactly. It leads to much higher customer satisfaction. It 121 00:05:41.040 --> 00:05:43.160 feels like the company is actually looking out for you. 122 00:05:43.319 --> 00:05:49.600 Okay, so RTA sounds powerful but also complex. The book 123 00:05:49.680 --> 00:05:55.279 introduces this term, the real-time analytics ecosystem, or stack. 124 00:05:55.399 --> 00:05:56.680 What is that, in simple terms? 125 00:05:56.759 --> 00:06:00.319 Yeah, you'll hear ecosystem, stack, streaming stack. They basically mean 126 00:06:00.319 --> 00:06:03.800 the same thing. It's the whole collection of tools, technologies, 127 00:06:04.040 --> 00:06:06.439 and the processes you use to get from those raw, 128 00:06:06.879 --> 00:06:08.160 unending streams 129 00:06:07.759 --> 00:06:09.639 of data to actual insights you can use. 130 00:06:09.720 --> 00:06:12.959 Precisely. It's the entire pipeline, all the components working together. 131 00:06:13.160 --> 00:06:15.759 And why is understanding that whole picture important? 132 00:06:15.959 --> 00:06:19.199 Well, if you're an architect designing these systems, or a developer 133 00:06:19.240 --> 00:06:22.079 building the apps, or even an operator keeping it all running, 134 00:06:22.600 --> 00:06:24.800 you need to understand how the pieces fit together 135 00:06:24.639 --> 00:06:26.920 to make the right choices about tools and how they connect. 136 00:06:27.279 --> 00:06:30.560 Absolutely. It helps you build systems that are robust, scalable, 137 00:06:30.800 --> 00:06:33.360 and actually deliver those real-time insights effectively. 
138 00:06:33.480 --> 00:06:37.120 Okay. Now, before diving into the modern stack, the book 139 00:06:37.199 --> 00:06:41.560 briefly mentions something called the Lambda architecture. Sounds a bit, 140 00:06:41.839 --> 00:06:43.120 I don't know, dated. 141 00:06:43.000 --> 00:06:44.000 It is a bit older. Yeah. 142 00:06:44.120 --> 00:06:44.319 Yeah. 143 00:06:44.360 --> 00:06:46.720 It was kind of an early attempt to deal with 144 00:06:47.240 --> 00:06:52.439 having both real-time needs and needing accurate historical analysis 145 00:06:52.480 --> 00:06:53.959 on huge data 146 00:06:53.720 --> 00:06:56.040 sets. Trying to do both at once? Sort of. 147 00:06:56.199 --> 00:06:59.879 It had three layers: a big, slow batch layer for 148 00:07:00.120 --> 00:07:04.079 processing all the historical data accurately, a fast speed layer 149 00:07:04.319 --> 00:07:08.079 for handling the incoming real-time streams, providing quick, maybe 150 00:07:08.120 --> 00:07:10.839 slightly less perfect answers. And the third layer, a serving 151 00:07:10.920 --> 00:07:12.959 layer that would try to merge the results from both 152 00:07:12.959 --> 00:07:15.680 the batch and speed layers when you actually queried the system. 153 00:07:15.759 --> 00:07:17.839 Okay, so it tried to give you fast answers and 154 00:07:17.879 --> 00:07:20.720 eventually correct, complete answers. What was the upside? 155 00:07:20.839 --> 00:07:23.720 The main benefit was that your original raw data was 156 00:07:23.839 --> 00:07:26.480 kept safe and sound in the batch layer, so if 157 00:07:26.480 --> 00:07:28.600 you messed up your processing logic or wanted to try 158 00:07:28.600 --> 00:07:29.879 a new analysis, 159 00:07:29.360 --> 00:07:30.839 you could always go back and rerun it on the 160 00:07:30.879 --> 00:07:35.319 original data. Exactly. Data immutability was a plus. But 161 00:07:36.680 --> 00:07:39.360 I sense a 'but' coming. 
The book implies it wasn't 162 00:07:39.360 --> 00:07:41.680 the perfect solution. What were the drawbacks? 163 00:07:42.279 --> 00:07:45.079 There were quite a few, actually. First, it was complex. 164 00:07:45.480 --> 00:07:49.120 You essentially had to build and maintain two separate data pipelines, 165 00:07:49.480 --> 00:07:52.480 batch and speed. That's a lot of engineering 166 00:07:52.120 --> 00:07:54.920 effort. Double the work, potentially double the problems. Pretty much. 167 00:07:55.439 --> 00:07:58.839 Also, many early stream processors relied heavily on the JVM, 168 00:07:59.040 --> 00:08:01.519 the Java Virtual Machine, which was fine if you were 169 00:08:01.519 --> 00:08:04.199 a Java shop, but maybe less ideal otherwise. 170 00:08:04.360 --> 00:08:07.600 Vendor lock-in or skill set mismatch. 171 00:08:07.759 --> 00:08:10.519 Yeah, and maybe the biggest headache was often having to 172 00:08:10.519 --> 00:08:14.079 write and maintain the same or very similar processing logic 173 00:08:14.160 --> 00:08:15.839 in both the batch and the speed layers. 174 00:08:16.480 --> 00:08:19.560 Duplication. That sounds like a nightmare for consistency and updates. 175 00:08:19.720 --> 00:08:20.319 It really was. 176 00:08:20.639 --> 00:08:23.920 Keeping them perfectly in sync was hard, leading to potential 177 00:08:23.920 --> 00:08:26.959 inconsistencies in the final results. So yeah, lots of overhead 178 00:08:27.000 --> 00:08:27.600 and complexity. 179 00:08:27.720 --> 00:08:30.000 Okay, so Lambda served a purpose, but we've moved on. 180 00:08:30.399 --> 00:08:34.320 What does a more modern real-time analytics stack look like? 181 00:08:34.360 --> 00:08:35.679 What are the essential pieces? 182 00:08:36.080 --> 00:08:39.279 Right. The contemporary approach is generally more streamlined. It typically 183 00:08:39.320 --> 00:08:41.080 starts with event producers. 184 00:08:40.639 --> 00:08:42.679 The things generating the data in the first place. 
185 00:08:42.759 --> 00:08:46.639 Exactly. Systems that detect something happened, a state change, and 186 00:08:46.759 --> 00:08:50.159 fire off an event. Like an order management system sees 187 00:08:50.200 --> 00:08:52.919 a new order and generates an order received event. And that 188 00:08:52.919 --> 00:08:57.559 event contains the details, like order ID, customer info, items. 189 00:08:57.679 --> 00:09:00.879 All the relevant data. And a key thing here, mentioned 190 00:09:00.919 --> 00:09:03.480 in the book, is you really need to benchmark your 191 00:09:03.480 --> 00:09:07.440 producers. Make sure they can actually handle the volume and 192 00:09:07.519 --> 00:09:12.159 speed of events you expect without becoming a bottleneck. Scalability 193 00:09:12.200 --> 00:09:14.120 and latency are critical right from the start. 194 00:09:14.200 --> 00:09:16.679 Okay, makes sense. The source needs to keep up. Where 195 00:09:16.679 --> 00:09:17.879 do those events go next? 196 00:09:18.120 --> 00:09:20.639 They flow into the event streaming platform. This is like 197 00:09:20.679 --> 00:09:23.519 the central highway or message bus for all your events. 198 00:09:23.519 --> 00:09:27.159 The backbone. Yeah, exactly. Its job is to ingest potentially 199 00:09:27.279 --> 00:09:31.000 huge volumes of events, store them reliably, usually for some 200 00:09:31.039 --> 00:09:34.320 configurable period, and deliver them to whatever needs to consume them. 201 00:09:34.399 --> 00:09:36.799 Apache Kafka is probably the most well-known example here. 202 00:09:36.960 --> 00:09:39.519 Right, Kafka comes up a lot. What makes a good 203 00:09:39.720 --> 00:09:40.720 streaming platform? 204 00:09:41.000 --> 00:09:45.440 Key things are scalability: can it handle growth? Fault tolerance: 205 00:09:45.440 --> 00:09:49.000 does it lose data if a server fails? High throughput: 206 00:09:49.039 --> 00:09:53.159 can it handle a massive continuous flow? And low latency: 
207 00:09:53.519 --> 00:09:56.000 How quickly does data get through? Got it. 208 00:09:56.679 --> 00:10:01.120 So data producers feed events onto this Kafka-like highway. 209 00:10:02.200 --> 00:10:05.200 Then what? Then you typically have a stream processing platform. 210 00:10:05.279 --> 00:10:07.240 This is where the real-time analysis starts happening. 211 00:10:07.279 --> 00:10:08.759 This is where the magic happens? Well, 212 00:10:08.720 --> 00:10:10.399 some of it. This is where you take those raw 213 00:10:10.399 --> 00:10:13.639 event streams and transform them, maybe enrich them by joining 214 00:10:13.639 --> 00:10:16.960 them with other data streams or static data, filter them, 215 00:10:17.159 --> 00:10:22.000 aggregate them, run calculations. Basically turn raw data into intermediate insights. 216 00:10:22.159 --> 00:10:23.879 Can you give examples of tools here? 217 00:10:24.000 --> 00:10:26.799 Sure. Popular ones include Apache Flink, which is a 218 00:10:26.799 --> 00:10:30.799 powerful stream processing framework. There's also Apache Spark Streaming, 219 00:10:30.879 --> 00:10:34.559 which extends the Spark batch engine for streaming, and Kafka Streams, 220 00:10:34.559 --> 00:10:36.559 which is a library that lets you build stream processing 221 00:10:36.600 --> 00:10:38.440 apps directly on top of Kafka. 222 00:10:38.600 --> 00:10:41.320 What are the important features for these stream processors? 223 00:10:41.559 --> 00:10:44.639 You need things like good state management, because your analysis 224 00:10:44.679 --> 00:10:48.720 often depends on past events; windowing capabilities for doing 225 00:10:48.720 --> 00:10:51.679 calculations over specific time periods, like the last five minutes; 226 00:10:52.159 --> 00:10:55.840 fault tolerance, obviously, so processing doesn't stop if something breaks; 227 00:10:56.840 --> 00:10:59.679 and support for different data formats. Okay. 
228 00:11:00.039 --> 00:11:03.360 Processing happens, insights are generated. How do we actually use 229 00:11:03.399 --> 00:11:04.399 them or see them? 230 00:11:04.519 --> 00:11:08.120 That's the final piece, usually the serving layer. This is 231 00:11:08.159 --> 00:11:10.399 the system that stores the results of your real-time 232 00:11:10.440 --> 00:11:13.559 processing and makes them available for querying, fast. 233 00:11:13.919 --> 00:11:17.519 This is what applications or dashboards actually talk to. Exactly. 234 00:11:17.559 --> 00:11:20.559 It's the primary access point. Now, this serving layer could 235 00:11:20.600 --> 00:11:23.000 be a few different types of systems. Like what? It 236 00:11:23.039 --> 00:11:26.679 could be a fast key-value store, think MongoDB, 237 00:11:26.879 --> 00:11:29.919 maybe Elasticsearch or Redis. These are great if you 238 00:11:29.960 --> 00:11:32.639 primarily need to look up results based on a specific key, 239 00:11:32.840 --> 00:11:35.600 like getting the current status for a particular user 240 00:11:35.440 --> 00:11:37.559 ID. Quick lookups. What's the alternative? 241 00:11:37.679 --> 00:11:40.960 The alternative, especially for more complex analytics, is a real 242 00:11:41.039 --> 00:11:45.879 time OLAP database. OLAP stands for online analytical processing. 243 00:11:45.480 --> 00:11:48.399 Ah, okay, designed for analysis. Right. 244 00:11:48.799 --> 00:11:52.600 Tools like Apache Pinot, Apache Druid, Rockset, or ClickHouse 245 00:11:52.879 --> 00:11:55.879 fall into this category. They are built for slicing and 246 00:11:55.960 --> 00:12:00.240 dicing data, running aggregations, filtering across lots of dimensions, much 247 00:12:00.279 --> 00:12:02.240 more complex queries than just a key lookup. 248 00:12:02.519 --> 00:12:04.759 So if I want to see, say, sales trends by 249 00:12:04.799 --> 00:12:07.559 region and product category for the last hour, I'd want 250 00:12:07.600 --> 00:12:08.799 an OLAP database. 
251 00:12:09.120 --> 00:12:12.600 Generally, yes, that's where they shine. The crucial thing for 252 00:12:12.679 --> 00:12:16.000 any serving layer in this context is speed. You need 253 00:12:16.120 --> 00:12:19.799 really fast data ingestion. The results from the stream processor 254 00:12:19.840 --> 00:12:23.000 need to show up almost instantly, and query latency needs 255 00:12:23.039 --> 00:12:24.440 to be low, often in the 256 00:12:24.360 --> 00:12:25.759 milliseconds. Milliseconds again? 257 00:12:25.799 --> 00:12:28.200 Wow. Yeah, and it also needs to handle high concurrency, 258 00:12:28.200 --> 00:12:30.919 potentially thousands or even hundreds of thousands of queries per 259 00:12:30.919 --> 00:12:32.200 second, depending on the application. 260 00:12:32.399 --> 00:12:34.919 That's incredible scale. How do you choose between key-value 261 00:12:34.919 --> 00:12:38.600 and real-time OLAP, beyond just the query type? 262 00:12:38.639 --> 00:12:41.240 Well, query type is the main driver, but you also 263 00:12:41.279 --> 00:12:44.840 look at how data gets in. Does it support direct 264 00:12:44.919 --> 00:12:48.679 streaming ingestion from Kafka or Flink, or do you need 265 00:12:48.720 --> 00:12:51.639 an extra step? How fast is that ingestion, really? Can 266 00:12:51.679 --> 00:12:54.279 it handle your expected data volume and rate? Does it 267 00:12:54.360 --> 00:12:57.440 need complex indexing or pre-aggregation to meet your query 268 00:12:57.480 --> 00:13:00.559 speed goals? Lots to consider. Definitely. And the book makes 269 00:13:00.600 --> 00:13:03.679 a very sensible point: don't just trust the marketing hype. 270 00:13:04.120 --> 00:13:07.320 Do your own benchmarking with your own data and query patterns. 271 00:13:07.559 --> 00:13:10.120 See what actually works best for your specific needs. 272 00:13:10.279 --> 00:13:13.720 Test it yourself. Always good advice. 
Okay, so we have producers, 273 00:13:13.720 --> 00:13:17.000 the streaming platform, the processor, the serving layer. How 274 00:13:17.000 --> 00:13:19.840 do people, like actual users, see this stuff? 275 00:13:19.879 --> 00:13:22.519 Ah, the front end. Good point. If your users are 276 00:13:22.679 --> 00:13:25.720 internal, like data analysts or engineers, maybe they query the 277 00:13:25.759 --> 00:13:27.840 serving layer directly using SQL or an 278 00:13:27.759 --> 00:13:32.320 API. Okay, but for less technical users or external customers? 279 00:13:31.960 --> 00:13:34.399 Then you'll likely need a user interface, a front end, 280 00:13:34.919 --> 00:13:36.120 and you've got a few options here. 281 00:13:36.120 --> 00:13:36.600 What are they? 282 00:13:36.679 --> 00:13:39.399 You could go fully custom: build your own web application 283 00:13:39.759 --> 00:13:44.120 using standard tools like React.js, Angular, Vue.js. That gives you 284 00:13:44.159 --> 00:13:47.720 total control over the look, feel, and functionality. 285 00:13:47.080 --> 00:13:49.080 The high-effort, high-control option. What else? 286 00:13:49.399 --> 00:13:52.480 Then there are low-code frameworks. Things like Streamlit or Plotly 287 00:13:52.559 --> 00:13:55.720 Dash are popular, especially in the Python world. They let 288 00:13:55.720 --> 00:13:59.000 you build interactive dashboards and web apps with much less 289 00:13:59.000 --> 00:14:00.200 front-end coding effort. 290 00:14:00.360 --> 00:14:03.559 Faster development, maybe less customization? Generally, yes. 291 00:14:04.039 --> 00:14:10.000 And the third category is data visualization tools like Apache Superset, Redash, Grafana. 292 00:14:10.440 --> 00:14:13.559 These often provide drag-and-drop interfaces to build dashboards 293 00:14:13.559 --> 00:14:16.320 directly on top of your data sources, often with no 294 00:14:16.399 --> 00:14:17.360 coding required at all. 
295 00:14:17.480 --> 00:14:19.240 The quickest way to get a dashboard up. 296 00:14:19.159 --> 00:14:22.440 Often, yes. So how you choose depends on a few things. 297 00:14:22.480 --> 00:14:24.759 What's the front-end coding skill level of your team? 298 00:14:25.440 --> 00:14:28.399 How much time do you realistically have? And who are 299 00:14:28.399 --> 00:14:32.639 the users: internal experts, or external customers needing a polished experience? 300 00:14:32.840 --> 00:14:36.559 A spectrum of choices matching needs and resources. Makes sense. 301 00:14:36.879 --> 00:14:39.080 Now, the book also notes that sometimes the lines between 302 00:14:39.120 --> 00:14:40.639 these components get fuzzy. 303 00:14:40.960 --> 00:14:44.960 Yeah, technology evolves and tools sometimes wear multiple hats. Apache 304 00:14:44.960 --> 00:14:48.559 Pulsar is a good example. It's mainly an event streaming platform 305 00:14:48.679 --> 00:14:52.360 like Kafka, but it also has built-in capabilities called 306 00:14:52.399 --> 00:14:56.039 Pulsar Functions that let you do some lightweight stream processing 307 00:14:56.080 --> 00:14:57.720 directly within Pulsar itself. 308 00:14:57.840 --> 00:15:01.799 Ah, so the streaming platform is doing some processing tasks. Exactly. 309 00:15:01.840 --> 00:15:04.279 It blurs the line a bit between the streaming platform 310 00:15:04.519 --> 00:15:07.240 and the stream processing platform. It just shows that these 311 00:15:07.279 --> 00:15:10.960 categories aren't always rigid silos. The landscape is pretty dynamic. 312 00:15:11.080 --> 00:15:14.039 Okay, that's a great tour through the stack. So, wrapping 313 00:15:14.120 --> 00:15:17.080 up this main section, what's the big takeaway from the 314 00:15:17.120 --> 00:15:18.519 book about building these systems? 315 00:15:18.639 --> 00:15:21.919 I think the fundamental message is that embracing real-time 316 00:15:22.200 --> 00:15:26.559 analytics isn't just a technical upgrade. 
It's a strategic move 317 00:15:26.919 --> 00:15:29.240 that can give you a serious competitive 318 00:15:28.799 --> 00:15:31.679 edge by making you faster, more informed, 319 00:15:31.639 --> 00:15:35.840 and ultimately making more accurate, relevant decisions because you're acting 320 00:15:35.840 --> 00:15:38.480 on what's happening now, not what happened yesterday. 321 00:15:38.679 --> 00:15:41.960 Fantastic. And this deep dive has been really an introduction. 322 00:15:42.480 --> 00:15:45.200 We've touched on the core ideas, the benefits, those key 323 00:15:45.240 --> 00:15:48.159 building blocks of the RTA stack, all pulled from the 324 00:15:48.200 --> 00:15:51.000 insights in Building Real-Time Analytics Systems. 325 00:15:51.039 --> 00:15:53.919 Absolutely. It just scratches the surface, but hopefully gives you 326 00:15:53.960 --> 00:15:54.840 a solid foundation. 327 00:15:55.039 --> 00:15:57.799 So a final thought for you, the listener, to chew on: 328 00:15:58.480 --> 00:16:02.039 think about your own work, your own organization. Where could 329 00:16:02.080 --> 00:16:06.240 real-time analytics unlock something new, maybe a new data 330 00:16:06.279 --> 00:16:09.679 product or a way to significantly improve a process you 331 00:16:09.720 --> 00:16:10.279 already have? 332 00:16:10.600 --> 00:16:14.200 Yeah, ask yourself, what's the current shelf life of your data? 333 00:16:14.679 --> 00:16:17.440 Is its value decaying rapidly? What can you gain by 334 00:16:17.519 --> 00:16:20.200 acting on it immediately? And maybe think about that stack 335 00:16:20.200 --> 00:16:25.240 we discussed: producers, streaming, processing, serving, front end. Which piece 336 00:16:25.320 --> 00:16:28.919 might offer the biggest immediate win for your specific situation? 337 00:16:29.240 --> 00:16:32.440 Something to definitely consider. Where could that immediate insight make 338 00:16:32.440 --> 00:16:33.279 the biggest difference?