WEBVTT

1
00:00:00.160 --> 00:00:03.640
<v Speaker 1>Welcome to the deep dive, your shortcut to truly understanding

2
00:00:03.759 --> 00:00:09.519
<v Speaker 1>complex topics. Today, we're plunging into Apache Kafka. It's really

3
00:00:09.560 --> 00:00:13.880
<v Speaker 1>a foundational technology underpinning so much of the modern data world.

4
00:00:14.080 --> 00:00:16.519
<v Speaker 2>Absolutely, it's everywhere, even if you don't see it directly.

5
00:00:16.640 --> 00:00:19.399
<v Speaker 1>We've pulled together a stack of detailed sources for you,

6
00:00:19.480 --> 00:00:23.679
<v Speaker 1>particularly some really great insights from Apache Kafka in Action,

7
00:00:24.440 --> 00:00:27.239
<v Speaker 1>and our mission really is to unpack its full potential.

8
00:00:27.359 --> 00:00:29.640
<v Speaker 2>Yeah, get beyond just the buzzwords. Exactly.

9
00:00:29.679 --> 00:00:33.560
<v Speaker 1>We'll explore everything from its basic building blocks to how

10
00:00:33.600 --> 00:00:37.840
<v Speaker 1>it ensures rock-solid reliability, which is critical, boosts performance

11
00:00:37.880 --> 00:00:41.719
<v Speaker 1>to incredible levels, and fits into the most advanced enterprise systems.

12
00:00:42.039 --> 00:00:44.240
<v Speaker 1>Think of it this way. If you've ever wondered how

13
00:00:44.240 --> 00:00:47.880
<v Speaker 1>a massive online retailer processes millions of real time orders,

14
00:00:48.320 --> 00:00:54.079
<v Speaker 1>updates inventory instantly, or personalizes your shopping experience on the fly, Kafka

15
00:00:53.840 --> 00:00:55.479
<v Speaker 2>is probably in the mix somewhere. It's often that

16
00:00:55.560 --> 00:00:56.520
<v Speaker 2>silent powerhouse.

17
00:00:56.840 --> 00:01:00.719
<v Speaker 1>Get ready for some serious aha moments, because this deep

18
00:01:00.759 --> 00:01:03.560
<v Speaker 1>dive is your essential guide. We want you to not

19
00:01:03.719 --> 00:01:07.239
<v Speaker 1>just know what Kafka is, but deeply understand how it

20
00:01:07.319 --> 00:01:10.319
<v Speaker 1>works and why it's so critical for today's real time

21
00:01:10.439 --> 00:01:11.400
<v Speaker 1>data needs.

22
00:01:11.120 --> 00:01:13.680
<v Speaker 2>And hopefully, without feeling overwhelmed by the jargon.

23
00:01:13.439 --> 00:01:14.200
<v Speaker 1>Let's unpack this.

24
00:01:14.560 --> 00:01:18.560
<v Speaker 2>It's truly a system that transforms how organizations handle data.

25
00:01:19.159 --> 00:01:22.239
<v Speaker 2>It enables that shift, you know, from waiting

26
00:01:21.879 --> 00:01:24.120
<v Speaker 3>for daily reports, still a batch world,

27
00:01:24.000 --> 00:01:26.959
<v Speaker 2>right, to getting instant insights and acting on them immediately.

28
00:01:27.079 --> 00:01:31.200
<v Speaker 1>That real time capability is absolutely where the magic happens. Okay,

29
00:01:31.280 --> 00:01:34.359
<v Speaker 1>so for anyone looking to understand Kafka, where do we

30
00:01:34.480 --> 00:01:38.519
<v Speaker 1>even begin? What are its absolute foundational elements?

31
00:01:38.560 --> 00:01:41.599
<v Speaker 2>Okay, so at its core, Kafka works with messages,

32
00:01:42.159 --> 00:01:45.799
<v Speaker 2>sometimes called records. Okay. These are essentially just byte arrays.

33
00:01:45.840 --> 00:01:49.439
<v Speaker 2>Think of them like small data envelopes, and for efficiency,

34
00:01:49.560 --> 00:01:53.159
<v Speaker 2>they're often grouped into batches before being sent. Saves overhead.

35
00:01:52.799 --> 00:01:55.840
<v Speaker 1>Got it. Batches of messages. And these messages are organized

36
00:01:55.840 --> 00:01:58.519
<v Speaker 1>into topics, like categories? Exactly.

37
00:01:59.040 --> 00:02:02.040
<v Speaker 2>Think of a topic as a dedicated channel or category

38
00:02:02.079 --> 00:02:05.920
<v Speaker 2>for bundling messages of a specific business type, much like

39
00:02:06.120 --> 00:02:09.159
<v Speaker 2>tables in a database maybe, but really designed for a

40
00:02:09.199 --> 00:02:12.879
<v Speaker 2>continuous stream of events. So, for that online retailer we mentioned,

41
00:02:12.919 --> 00:02:15.879
<v Speaker 2>you might have a customer orders topic or maybe a

42
00:02:15.919 --> 00:02:17.599
<v Speaker 2>product inventory updates topic.

43
00:02:17.800 --> 00:02:20.120
<v Speaker 1>Right, So if I place an order that becomes a

44
00:02:20.120 --> 00:02:23.879
<v Speaker 1>message in the customer orders topic. Simple enough, But how

45
00:02:23.879 --> 00:02:27.560
<v Speaker 1>does Kafka handle the sheer volume, millions, maybe billions of

46
00:02:27.599 --> 00:02:30.039
<v Speaker 1>messages and ensure it can scale.

47
00:02:30.240 --> 00:02:33.000
<v Speaker 2>Ah, that's where partitions come in, and they are truly

48
00:02:33.479 --> 00:02:36.800
<v Speaker 2>like the backbone of Kafka's performance and scalability.

49
00:02:36.879 --> 00:02:38.199
<v Speaker 1>Partitions. Okay.

50
00:02:38.080 --> 00:02:41.240
<v Speaker 2>Each topic is divided into one or more partitions. This division

51
00:02:41.319 --> 00:02:44.560
<v Speaker 2>is what enables parallel processing, lots of things happening at once.

52
00:02:44.680 --> 00:02:46.439
<v Speaker 1>Makes sense, divide and conquer.

53
00:02:46.439 --> 00:02:49.879
<v Speaker 2>And to ensure high availability and fault tolerance, which is crucial, these

54
00:02:49.919 --> 00:02:53.080
<v Speaker 2>partitions are replicated across different Kafka servers. The servers,

55
00:02:53.120 --> 00:02:55.120
<v Speaker 2>we call them brokers. So if one broker

56
00:02:55.159 --> 00:02:57.800
<v Speaker 2>goes down, the data is safe and accessible on another one.

57
00:02:58.039 --> 00:02:58.599
<v Speaker 2>No panic.

58
00:02:59.080 --> 00:03:03.639
<v Speaker 1>Got it. Messages, topics, partitions on brokers. Okay, so you

59
00:03:03.719 --> 00:03:07.919
<v Speaker 1>have producers sending messages and consumers receiving them. How do

60
00:03:07.960 --> 00:03:10.639
<v Speaker 1>they interact with these partitions and brokers?

61
00:03:10.800 --> 00:03:14.400
<v Speaker 2>Good question. Producers are the applications sending the messages, your

62
00:03:14.560 --> 00:03:18.000
<v Speaker 2>order service, maybe. They send them to the designated leader

63
00:03:18.120 --> 00:03:18.919
<v Speaker 2>of a partition.

64
00:03:19.240 --> 00:03:22.599
<v Speaker 1>Leader? One broker is in charge of that partition?

65
00:03:22.560 --> 00:03:26.120
<v Speaker 2>Right, and the producer selects that partition using something called a partitioner.

66
00:03:26.560 --> 00:03:28.439
<v Speaker 2>Often it's based on a message key, which we'll get to.

67
00:03:28.560 --> 00:03:28.840
<v Speaker 1>Okay.

68
00:03:29.080 --> 00:03:32.840
<v Speaker 2>On the other side, consumers receive and process messages. They're

69
00:03:32.879 --> 00:03:35.800
<v Speaker 2>quite flexible, actually. They can read from multiple partitions, even

70
00:03:36.159 --> 00:03:37.319
<v Speaker 2>multiple topics at once.

71
00:03:37.400 --> 00:03:39.360
<v Speaker 1>And the brokers themselves, what's their main job.

72
00:03:39.680 --> 00:03:43.560
<v Speaker 2>The brokers are the Kafka servers. They manage the storage, distribution,

73
00:03:43.759 --> 00:03:47.439
<v Speaker 2>retrieval of messages, all that, and they share replicas and

74
00:03:47.520 --> 00:03:51.560
<v Speaker 2>processing tasks pretty evenly among themselves. It's a distributed system.

75
00:03:51.680 --> 00:03:53.599
<v Speaker 1>This sounds like a lot of moving parts all needing

76
00:03:53.639 --> 00:03:56.039
<v Speaker 1>to coordinate. Who is the leader? Is this broker alive?

77
00:03:56.159 --> 00:03:58.960
<v Speaker 1>How does Kafka manage that internal choreography?

78
00:03:59.240 --> 00:04:02.719
<v Speaker 2>Right. That's the role of the coordination cluster. Historically this was

79
00:04:02.759 --> 00:04:05.719
<v Speaker 2>Apache ZooKeeper, a whole separate system you had to manage.

80
00:04:05.840 --> 00:04:08.800
<v Speaker 1>I remember ZooKeeper could be complex.

81
00:04:08.879 --> 00:04:12.039
<v Speaker 2>It could. But now Kafka is increasingly moving to KRaft,

82
00:04:12.759 --> 00:04:15.919
<v Speaker 2>K-R-A-F-T. Okay, and this isn't just a name change.

83
00:04:15.960 --> 00:04:21.360
<v Speaker 2>It's a pretty significant evolution. KRaft simplifies the entire Kafka

84
00:04:21.480 --> 00:04:25.920
<v Speaker 2>architecture because it removes that external dependency on ZooKeeper. So

85
00:04:26.000 --> 00:04:30.040
<v Speaker 2>Kafka manages itself more? Exactly, it becomes self-managing for

86
00:04:30.079 --> 00:04:35.079
<v Speaker 2>these critical coordination tasks: overseeing partition assignments, handling leader elections,

87
00:04:35.519 --> 00:04:39.160
<v Speaker 2>continuously monitoring broker health. That means fewer moving parts for you

88
00:04:39.240 --> 00:04:42.439
<v Speaker 2>to manage, which is a huge operational win, especially for

89
00:04:42.600 --> 00:04:43.879
<v Speaker 2>large dynamic clusters.

90
00:04:43.920 --> 00:04:47.360
<v Speaker 1>Okay, that gives us the basic anatomy: messages and topics

91
00:04:47.360 --> 00:04:50.959
<v Speaker 1>split into partitions managed by brokers, with producers and consumers

92
00:04:51.040 --> 00:04:54.000
<v Speaker 1>all coordinated by KRaft or ZooKeeper. But here's where it

93
00:04:54.040 --> 00:04:58.000
<v Speaker 1>gets really interesting and a bit, well, mind-bending for me. Initially,

94
00:04:58.319 --> 00:05:02.120
<v Speaker 1>the sources describe Kafka's core nature as a distributed log.

95
00:05:02.360 --> 00:05:04.000
<v Speaker 2>Yes, this is fundamental.

96
00:05:04.079 --> 00:05:05.920
<v Speaker 1>Can you elaborate on that? Why is thinking of it

97
00:05:05.920 --> 00:05:07.199
<v Speaker 1>as a log so important?

98
00:05:07.600 --> 00:05:11.639
<v Speaker 2>Absolutely. What's truly fascinating here is that Kafka is fundamentally

99
00:05:11.759 --> 00:05:14.959
<v Speaker 2>a distributed log. You need to sort of forget about

100
00:05:15.000 --> 00:05:17.199
<v Speaker 2>it being just a message queue for a second. Okay,

101
00:05:17.240 --> 00:05:19.800
<v Speaker 2>think of it more like an immutable personal diary, or

102
00:05:20.120 --> 00:05:23.040
<v Speaker 2>maybe better the commit log of a database. It answers

103
00:05:23.079 --> 00:05:26.720
<v Speaker 2>the question what happened? It focuses on the history of

104
00:05:26.759 --> 00:05:30.319
<v Speaker 2>events rather than just what is, which is the current state.

105
00:05:30.480 --> 00:05:32.079
<v Speaker 1>Right, history versus snapshot.

106
00:05:32.160 --> 00:05:35.800
<v Speaker 2>Precisely. So for our online retailer, it's not just current

107
00:05:35.839 --> 00:05:38.920
<v Speaker 2>inventory is fifty shirts. It's more like a shirt was

108
00:05:38.959 --> 00:05:42.079
<v Speaker 2>sold at 10:01 a.m., then another at

109
00:05:42.120 --> 00:05:44.959
<v Speaker 2>10:02 a.m. Then we received stock at

110
00:05:45.000 --> 00:05:47.160
<v Speaker 2>10:05 a.m., the whole sequence.

111
00:05:47.480 --> 00:05:50.040
<v Speaker 1>So it's about the sequence of actions, the journey, not

112
00:05:50.160 --> 00:05:52.959
<v Speaker 1>just the final destination. What are the key properties of

113
00:05:53.000 --> 00:05:54.879
<v Speaker 1>such a log that make it so powerful?

114
00:05:55.120 --> 00:05:59.279
<v Speaker 2>Logs have distinct, crucial properties. First, order and sorting. Messages

115
00:05:59.319 --> 00:06:01.920
<v Speaker 2>are always sorted by time within a partition, oldest entry at the beginning.

116
00:06:01.839 --> 00:06:03.160
<v Speaker 1>Within a partition, got it.

117
00:06:03.399 --> 00:06:06.199
<v Speaker 2>Second, writing and reading direction. You always append new entries

118
00:06:06.199 --> 00:06:08.079
<v Speaker 2>to the end of the log, like adding to a diary,

119
00:06:08.439 --> 00:06:10.800
<v Speaker 2>and you typically read from old to new using what

120
00:06:10.920 --> 00:06:13.199
<v Speaker 2>Kafka calls offsets to track your position.

121
00:06:13.360 --> 00:06:15.879
<v Speaker 1>Offsets, like bookmarks? Kind of, yeah.

122
00:06:15.920 --> 00:06:19.800
<v Speaker 2>And crucially, immutability. Once an entry is written, you can't

123
00:06:19.800 --> 00:06:22.399
<v Speaker 2>easily change or remove it. It's like writing in permanent ink.

124
00:06:22.800 --> 00:06:27.360
<v Speaker 1>That immutability has profound implications for data integrity, I imagine.

125
00:06:27.680 --> 00:06:30.519
<v Speaker 1>And I've heard this concept of time travel mentioned with logs.

126
00:06:30.600 --> 00:06:32.439
<v Speaker 1>How does that actually work and why is it such

127
00:06:32.439 --> 00:06:33.160
<v Speaker 1>a game changer?

128
00:06:33.480 --> 00:06:37.839
<v Speaker 2>That's right, time travel. Because the log is immutable and ordered,

129
00:06:38.240 --> 00:06:40.519
<v Speaker 2>you can literally reconstruct the state of the world at

130
00:06:40.519 --> 00:06:43.959
<v Speaker 2>any point in time by simply replaying the entries from

131
00:06:43.959 --> 00:06:46.800
<v Speaker 2>the beginning of the log or from a specific offset.

132
00:06:47.360 --> 00:06:50.279
<v Speaker 2>For our online retailer, this means you could replay all

133
00:06:50.399 --> 00:06:53.800
<v Speaker 2>order placed events from last year to reconstruct exactly how

134
00:06:53.800 --> 00:06:57.040
<v Speaker 2>many items were sold during a specific promotion. Wow. Or

135
00:06:57.079 --> 00:06:59.759
<v Speaker 2>even rebuild an entire system state, if a database was

136
00:06:59.759 --> 00:07:03.959
<v Speaker 2>somehow lost, just from the Kafka log. This capability is

137
00:07:04.000 --> 00:07:08.120
<v Speaker 2>really how Kafka helps businesses transition from traditional batch oriented

138
00:07:08.160 --> 00:07:11.360
<v Speaker 2>processing, you know, waiting for those overnight reports,

139
00:07:11.079 --> 00:07:12.959
<v Speaker 1>Right, the end-of-day summaries,

140
00:07:12.800 --> 00:07:15.680
<v Speaker 2>to real-time data handling, getting instant, up-to-the-minute insights.
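
NOTE
Editor's sketch of the replay idea described above. It assumes an in-memory
list stands in for one partition, with the list index playing the role of the
offset. The names log and replay and the event kinds are illustrative only and
are not part of any Kafka API.
```python
# Hypothetical in-memory stand-in for one partition; list index = offset.
log = [
    ("stock_received", 50),  # offset 0
    ("shirt_sold", 1),       # offset 1
    ("shirt_sold", 1),       # offset 2
    ("stock_received", 10),  # offset 3
]
def replay(entries, start=0, end=None, initial=0):
    """Rebuild the stock level by reading entries[start:end] from old to new."""
    stock = initial
    for kind, quantity in entries[start:end]:
        stock += quantity if kind == "stock_received" else -quantity
    return stock
print(replay(log))         # state after the whole log: 58
print(replay(log, end=3))  # state as of offset 2: 48
```
Because entries are never modified, replaying the same range always gives the
same answer, which is what makes the time travel framing work.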

141
00:07:15.920 --> 00:07:19.600
<v Speaker 1>That's incredibly powerful, replaying history. But if a log is

142
00:07:19.639 --> 00:07:22.519
<v Speaker 1>conceptually simple, why does it need to be distributed? Why

143
00:07:22.560 --> 00:07:26.480
<v Speaker 1>not just one gigantic, superfast log on one machine? Yeah,

144
00:07:26.560 --> 00:07:30.160
<v Speaker 1>good question. And this feels like where Kafka truly transforms

145
00:07:30.199 --> 00:07:33.959
<v Speaker 1>from a concept into an industrial strength powerhouse. Must have

146
00:07:34.040 --> 00:07:37.319
<v Speaker 1>big implications for, like, data resilience. Exactly.

147
00:07:37.600 --> 00:07:41.560
<v Speaker 2>The challenge with a single log is clear: speed, scalability,

148
00:07:41.600 --> 00:07:45.920
<v Speaker 2>and resilience. A single system, a single server, it's often

149
00:07:46.000 --> 00:07:50.759
<v Speaker 2>just not reliable enough. Hardware fails, networks glitch, it's common.

150
00:07:51.160 --> 00:07:55.000
<v Speaker 2>Kafka addresses this through horizontal scaling. Instead of buying bigger,

151
00:07:55.079 --> 00:07:58.600
<v Speaker 2>more powerful servers, vertical scaling, you use more servers. When

152
00:07:58.639 --> 00:08:00.959
<v Speaker 2>your existing brokers are getting busy, you just add another

153
00:08:00.959 --> 00:08:01.639
<v Speaker 2>one to the cluster.

154
00:08:02.480 --> 00:08:04.360
<v Speaker 3>Scale out, not up. Precisely.

155
00:08:04.680 --> 00:08:07.079
<v Speaker 2>This is cheaper, much more flexible, and fits the reality

156
00:08:07.079 --> 00:08:09.480
<v Speaker 2>that individual machines aren't perfectly reliable.

157
00:08:09.519 --> 00:08:13.199
<v Speaker 1>So that's a fundamental architectural philosophy behind Kafka. Expect failures,

158
00:08:13.240 --> 00:08:14.040
<v Speaker 1>build around them.

159
00:08:14.120 --> 00:08:17.399
<v Speaker 2>It absolutely is. This approach is crucial because individual IT

160
00:08:17.639 --> 00:08:23.639
<v Speaker 2>systems are seen as inherently unreliable. Horizontal scaling also enables parallelization.

161
00:08:23.279 --> 00:08:24.839
<v Speaker 1>More work done at the same time.

162
00:08:24.759 --> 00:08:29.399
<v Speaker 2>Right, allowing Kafka to process far more messages per unit

163
00:08:29.439 --> 00:08:32.679
<v Speaker 2>of time than a single server ever could. And this

164
00:08:32.720 --> 00:08:36.960
<v Speaker 2>works efficiently because in Kafka, data for one logical entity,

165
00:08:37.000 --> 00:08:41.320
<v Speaker 2>say all events related to a specific product or maybe

166
00:08:41.360 --> 00:08:44.120
<v Speaker 2>a single customer's order history can be kept in its

167
00:08:44.159 --> 00:08:45.720
<v Speaker 2>own log partition.

168
00:08:45.759 --> 00:08:47.399
<v Speaker 1>Using that message key you mentioned earlier.

169
00:08:47.440 --> 00:08:51.080
<v Speaker 2>Exactly, using the key. This ensures correct ordering for that

170
00:08:51.159 --> 00:08:54.639
<v Speaker 2>specific entity, even if the overall order across all products

171
00:08:54.679 --> 00:08:56.279
<v Speaker 2>isn't strictly sequential globally.
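
NOTE
Editor's sketch of key-based partitioning as discussed above. Assumptions:
three partitions, and CRC32 as a stand-in hash (Kafka's Java client uses a
murmur2 hash of the key bytes instead). choose_partition and the sample events
are hypothetical names used only for illustration.
```python
import zlib
NUM_PARTITIONS = 3  # assumed partition count for the illustration
def choose_partition(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition; CRC32 is a stand-in for Kafka's real hash."""
    return zlib.crc32(key) % num_partitions
partitions = {p: [] for p in range(NUM_PARTITIONS)}
events = [
    (b"product-123", "price changed"),
    (b"product-456", "restocked"),
    (b"product-123", "sold"),
]
for key, value in events:
    partitions[choose_partition(key)].append((key, value))
# Every event for product-123 sits in one partition, in the order it was sent.
home = partitions[choose_partition(b"product-123")]
print([value for key, value in home if key == b"product-123"])
```
The same key always hashes to the same partition, so per-entity order holds
even though there is no single global order across partitions.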

172
00:08:56.360 --> 00:08:58.720
<v Speaker 1>Okay, that makes sense. Order within a context.

173
00:08:58.840 --> 00:09:01.399
<v Speaker 2>So the implication for you, as someone listening and maybe

174
00:09:01.440 --> 00:09:04.519
<v Speaker 2>trying to build robust systems, is that Kafka is designed

175
00:09:04.559 --> 00:09:07.159
<v Speaker 2>from the ground up to be highly available and resilient

176
00:09:07.559 --> 00:09:10.399
<v Speaker 2>even if parts of it fail. It distributes data and

177
00:09:10.480 --> 00:09:13.960
<v Speaker 2>work across many machines. It's built for reliability in an

178
00:09:14.039 --> 00:09:14.960
<v Speaker 2>unreliable world.

179
00:09:15.120 --> 00:09:18.120
<v Speaker 1>That's a fantastic overview of the core architecture, very clear.

180
00:09:18.399 --> 00:09:20.840
<v Speaker 1>Let's maybe talk about the messages themselves. Now. The source

181
00:09:20.879 --> 00:09:24.440
<v Speaker 1>mentions Kafka's data agnosticism. What does that actually mean for

182
00:09:24.519 --> 00:09:25.799
<v Speaker 1>what you can send through it?

183
00:09:25.799 --> 00:09:28.279
<v Speaker 2>It means Kafka doesn't really care about the content of

184
00:09:28.320 --> 00:09:32.320
<v Speaker 2>your messages. It treats all messages as raw byte arrays.

185
00:09:32.360 --> 00:09:34.039
<v Speaker 1>Just sequences of bytes? Yep.

186
00:09:34.399 --> 00:09:37.960
<v Speaker 2>This flexibility is a key design choice. It allows Kafka

187
00:09:38.039 --> 00:09:41.039
<v Speaker 2>to handle any kind of data, whether it's JSON, Avro,

188
00:09:41.240 --> 00:09:45.679
<v Speaker 2>Protobuf, plain text, whatever, regardless of its format or structure.

189
00:09:46.200 --> 00:09:48.559
<v Speaker 2>It doesn't try to interpret the data, which is actually

190
00:09:48.600 --> 00:09:50.000
<v Speaker 2>a big part of its high performance.

191
00:09:50.159 --> 00:09:51.120
<v Speaker 1>Like a postal service.

192
00:09:51.240 --> 00:09:54.480
<v Speaker 2>Exactly like a postal service that delivers any package, big

193
00:09:54.559 --> 00:09:56.600
<v Speaker 2>or small without needing to know what's inside.

194
00:09:56.799 --> 00:09:59.960
<v Speaker 1>That sounds incredibly flexible, but doesn't that mean it's completely

195
00:10:00.279 --> 00:10:03.720
<v Speaker 1>agnostic to the meaning or structure of the data. What

196
00:10:03.759 --> 00:10:07.440
<v Speaker 1>are the implications of that for say, data governance or

197
00:10:07.919 --> 00:10:09.720
<v Speaker 1>ensuring consistency down the line.

198
00:10:09.840 --> 00:10:12.960
<v Speaker 2>Ah, you've hit on a crucial point there. While Kafka

199
00:10:13.000 --> 00:10:16.639
<v Speaker 2>itself is agnostic, in practice, it is optimized for many

200
00:10:16.720 --> 00:10:18.600
<v Speaker 2>small structured messages.

201
00:10:18.159 --> 00:10:19.000
<v Speaker 1>Small and structured.

202
00:10:19.080 --> 00:10:22.120
<v Speaker 2>Okay, the default maximum message size is only one megabyte.

203
00:10:22.200 --> 00:10:24.360
<v Speaker 1>Oh, that's smaller than I might have thought.

204
00:10:24.320 --> 00:10:27.480
<v Speaker 2>It is, and while you can technically adjust this, it's

205
00:10:27.559 --> 00:10:33.440
<v Speaker 2>generally advised against. Larger messages can severely impact performance and disk space.

206
00:10:34.120 --> 00:10:36.960
<v Speaker 2>It's just not what it's designed for. Kafka is built

207
00:10:36.960 --> 00:10:40.200
<v Speaker 2>for high throughput of many small events, not for transferring

208
00:10:40.279 --> 00:10:43.399
<v Speaker 2>large files like PDFs or big video files. I mean,

209
00:10:43.440 --> 00:10:46.440
<v Speaker 2>look at LinkedIn. They famously use Kafka to process something

210
00:10:46.480 --> 00:10:49.879
<v Speaker 2>like seven trillion messages a day across roughly one hundred

211
00:10:49.919 --> 00:10:51.440
<v Speaker 2>clusters back in twenty nineteen.

212
00:10:51.480 --> 00:10:53.080
<v Speaker 1>Seven trillion a day.

213
00:10:53.360 --> 00:10:56.720
<v Speaker 2>That's an astonishing number of small messages. Really drives home

214
00:10:56.759 --> 00:10:57.720
<v Speaker 2>the point. Definitely.

215
00:10:57.919 --> 00:11:00.279
<v Speaker 1>So if most messages are small, what are the common

216
00:11:00.320 --> 00:11:03.039
<v Speaker 1>types of messages you typically see in a real world

217
00:11:03.159 --> 00:11:06.159
<v Speaker 1>Kafka system? What patterns emerge in practice?

218
00:11:06.159 --> 00:11:08.799
<v Speaker 2>Most systems use a mix of message types. You often

219
00:11:08.840 --> 00:11:12.559
<v Speaker 2>see states. States? Yeah, messages that describe the complete current

220
00:11:12.600 --> 00:11:15.679
<v Speaker 2>state of an object, like all the details for a product,

221
00:11:15.720 --> 00:11:18.720
<v Speaker 2>its current price, stock level, description, everything. And if you

222
00:11:18.720 --> 00:11:20.960
<v Speaker 2>only care about the latest state, Kafka has a feature

223
00:11:20.960 --> 00:11:24.000
<v Speaker 2>called log compaction, which uses message keys to save space

224
00:11:24.000 --> 00:11:26.840
<v Speaker 2>by keeping only the most recent version of a particular record.
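
NOTE
Editor's sketch of the effect of log compaction mentioned above, assuming a
log is a plain list of (key, value) pairs. compact is a hypothetical helper,
not a Kafka API. Real compaction runs in the background on older log segments,
so older duplicates can remain visible for a while.
```python
def compact(records):
    """Keep only the most recent value per key, preserving the survivors' order."""
    latest_offset = {}
    for offset, (key, _value) in enumerate(records):
        latest_offset[key] = offset  # later offsets overwrite earlier ones
    return [rec for offset, rec in enumerate(records) if latest_offset[rec[0]] == offset]
records = [
    ("shirt", {"price": 20}),
    ("hat", {"price": 9}),
    ("shirt", {"price": 18}),  # newer state for the same key
]
print(compact(records))  # [('hat', {'price': 9}), ('shirt', {'price': 18})]
```
This only makes sense for state-style messages, where the newest record for a
key fully replaces the older ones.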

225
00:11:27.080 --> 00:11:30.279
<v Speaker 1>Okay, so state is the full picture right now? What else?

226
00:11:30.519 --> 00:11:33.440
<v Speaker 2>Then there are deltas. These contain only the changes in state,

227
00:11:33.759 --> 00:11:36.799
<v Speaker 2>like just a stock quantity adjustment of negative five because

228
00:11:36.799 --> 00:11:40.200
<v Speaker 2>an item sold or plus ten because stock arrived.

229
00:11:40.279 --> 00:11:42.799
<v Speaker 1>Ah, just the change. That sounds way more efficient for

230
00:11:42.879 --> 00:11:43.519
<v Speaker 1>data volume.

231
00:11:43.720 --> 00:11:47.600
<v Speaker 2>It is much smaller messages, but they're less useful on

232
00:11:47.639 --> 00:11:47.960
<v Speaker 2>their own.

233
00:11:48.279 --> 00:11:50.759
<v Speaker 1>How so? What are the challenges if you only have

234
00:11:50.879 --> 00:11:55.000
<v Speaker 1>deltas and need to know, say, the product's total stock

235
00:11:55.120 --> 00:11:55.519
<v Speaker 1>right now?

236
00:11:55.559 --> 00:11:57.919
<v Speaker 2>That's a great question. If you only have deltas, you'd

237
00:11:57.919 --> 00:12:01.200
<v Speaker 2>have to process all previous deltas for that product just

238
00:12:01.240 --> 00:12:04.679
<v Speaker 2>to reconstruct the current state. That could be computationally intensive

239
00:12:04.679 --> 00:12:05.240
<v Speaker 2>for the consumer.

240
00:12:05.360 --> 00:12:07.600
<v Speaker 1>Right, you have to sum them all up. Exactly.

241
00:12:07.559 --> 00:12:10.720
<v Speaker 2>Which is why events are often preferred. They describe what happened,

242
00:12:10.879 --> 00:12:14.399
<v Speaker 2>but add context. Like, instead of just minus five stock,

243
00:12:14.480 --> 00:12:17.240
<v Speaker 2>the event might be an order-fulfilled event, which contains the

244
00:12:17.279 --> 00:12:21.200
<v Speaker 2>stock change but also the order ID, customer ID, timestamp,

245
00:12:21.960 --> 00:12:27.159
<v Speaker 2>more meaning. VAT adjustment or promotion started are other examples. Logs,

246
00:12:27.240 --> 00:12:29.600
<v Speaker 2>in fact, are really just a special kind of event stream.

247
00:12:29.639 --> 00:12:32.320
<v Speaker 1>Okay, events give context. And you mentioned one more type.

248
00:12:32.200 --> 00:12:35.600
<v Speaker 2>Yes, commands. These are used to instruct other systems to

249
00:12:35.600 --> 00:12:40.120
<v Speaker 2>perform actions, like a ship-this-order command or a process-payment command.

250
00:12:40.759 --> 00:12:43.639
<v Speaker 2>Unlike events, where the sender often doesn't care who listens,

251
00:12:43.879 --> 00:12:47.320
<v Speaker 2>commands usually require a response or a specific action from

252
00:12:47.320 --> 00:12:48.360
<v Speaker 2>the recipient system.

253
00:12:48.399 --> 00:12:53.360
<v Speaker 1>That distinction between events and commands feels important. Commands expect

254
00:12:53.360 --> 00:12:56.960
<v Speaker 1>a reaction. Now, you mentioned messages aren't just a singular

255
00:12:57.039 --> 00:13:00.639
<v Speaker 1>blob of data. They have structure. Break that down for

256
00:13:00.720 --> 00:13:03.639
<v Speaker 1>us again. What are the parts of a Kafka message?

257
00:13:03.759 --> 00:13:07.559
<v Speaker 2>Yes, a Kafka message or record technically is composed of

258
00:13:07.600 --> 00:13:12.039
<v Speaker 2>a few key elements. First, the value that's the primary payload,

259
00:13:12.080 --> 00:13:14.799
<v Speaker 2>the actual information you want to convey, like the details

260
00:13:14.799 --> 00:13:15.679
<v Speaker 2>of a customer order.

261
00:13:15.840 --> 00:13:17.600
<v Speaker 3>Usually the biggest part, the core data.

262
00:13:17.720 --> 00:13:21.039
<v Speaker 2>And there's an optional key, which is incredibly important even

263
00:13:21.039 --> 00:13:24.200
<v Speaker 2>though it's optional. So important. It's used to categorize messages,

264
00:13:24.279 --> 00:13:27.240
<v Speaker 2>and critically, messages with the same key are guaranteed by

265
00:13:27.279 --> 00:13:28.799
<v Speaker 2>Kafka to go to the same partition.

266
00:13:29.039 --> 00:13:32.000
<v Speaker 1>Ah. So that's how you ensure order for a specific entity,

267
00:13:32.200 --> 00:13:34.519
<v Speaker 1>like all updates for product 123?

268
00:13:34.720 --> 00:13:37.639
<v Speaker 2>Exactly. If you send all updates for product 123

269
00:13:37.679 --> 00:13:40.159
<v Speaker 2>with the key 123, they land in

270
00:13:40.200 --> 00:13:43.919
<v Speaker 2>the same partition, in order. It guarantees their order relative

271
00:13:43.960 --> 00:13:46.759
<v Speaker 2>to each other, at least from a single producer. The

272
00:13:46.879 --> 00:13:49.759
<v Speaker 2>key is also essential for that log compaction feature we

273
00:13:49.879 --> 00:13:53.360
<v Speaker 2>mentioned where Kafka retains only the latest message for a

274
00:13:53.360 --> 00:13:56.360
<v Speaker 2>given key, very useful for topics representing current state.

275
00:13:56.799 --> 00:13:59.480
<v Speaker 1>So the key is crucial for ordering and compaction, not

276
00:13:59.639 --> 00:14:01.919
<v Speaker 1>just as an ID. What else is in a message?

277
00:14:02.039 --> 00:14:04.799
<v Speaker 2>There are also optional custom headers. These are meant for

278
00:14:05.080 --> 00:14:09.039
<v Speaker 2>technical metadata, things like tracing IDs for distributed systems, maybe

279
00:14:09.080 --> 00:14:11.960
<v Speaker 2>security tokens, stuff like that, not really for business data.

280
00:14:12.080 --> 00:14:14.440
<v Speaker 1>Keep business data in the value? Generally, yes.

281
00:14:14.679 --> 00:14:17.440
<v Speaker 2>And finally, there's a timestamp. This records the time the

282
00:14:17.440 --> 00:14:20.080
<v Speaker 2>message was created by the producer or potentially when it

283
00:14:20.120 --> 00:14:23.840
<v Speaker 2>was appended to the broker log, depending on configuration. This

284
00:14:23.960 --> 00:14:27.799
<v Speaker 2>timestamp is vital for many real time analytics scenarios, especially

285
00:14:27.799 --> 00:14:30.480
<v Speaker 2>when you start dealing with time windows in stream processing.

286
00:14:30.759 --> 00:14:33.519
<v Speaker 1>Fascinating how much detail goes into what seems like a

287
00:14:33.559 --> 00:14:38.000
<v Speaker 1>simple message. Loads of potential there. Now, let's pivot to

288
00:14:38.039 --> 00:14:43.080
<v Speaker 1>something absolutely critical for any data system, reliability. How does

289
00:14:43.120 --> 00:14:45.840
<v Speaker 1>Kafka build trust with your data? How does it ensure

290
00:14:45.879 --> 00:14:48.799
<v Speaker 1>nothing gets lost or hopelessly out of order?

291
00:14:49.159 --> 00:14:52.799
<v Speaker 2>Right. Reliability in Kafka is built on a few core pillars. First,

292
00:14:53.080 --> 00:14:55.080
<v Speaker 2>replication, with leaders and followers.

293
00:14:55.200 --> 00:14:58.799
<v Speaker 1>We touched on this, leaders and followers for partitions. Exactly.

294
00:14:58.879 --> 00:15:02.360
<v Speaker 2>For each partition, one broker acts as the leader. It

295
00:15:02.399 --> 00:15:06.279
<v Speaker 2>handles all the incoming produce requests and outgoing consumer requests

296
00:15:06.279 --> 00:15:09.080
<v Speaker 2>for that partition. The other brokers holding replicas for that

297
00:15:09.159 --> 00:15:12.720
<v Speaker 2>partition are called followers, and they just continuously replicate or

298
00:15:12.759 --> 00:15:16.480
<v Speaker 2>copy new messages from that leader. This creates redundant copies

299
00:15:16.480 --> 00:15:18.480
<v Speaker 2>of your data across different machines.

300
00:15:18.559 --> 00:15:22.200
<v Speaker 1>That sounds incredibly robust, multiple copies, But what actually happens

301
00:15:22.200 --> 00:15:24.360
<v Speaker 1>behind the scenes when a leader fails? Let's say the

302
00:15:24.399 --> 00:15:28.159
<v Speaker 1>machine crashes. Is the switch to a follower instantaneous? Are

303
00:15:28.200 --> 00:15:31.159
<v Speaker 1>there any potential downsides or edge cases a listener should be aware of?

304
00:15:31.159 --> 00:15:35.600
<v Speaker 2>Good question. When a leader fails, Kafka automatically

305
00:15:35.639 --> 00:15:38.360
<v Speaker 2>detects this and elects a new leader from its set

306
00:15:38.399 --> 00:15:42.279
<v Speaker 2>of in sync replicas or ISRs. These are followers that

307
00:15:42.279 --> 00:15:43.639
<v Speaker 2>are caught up with the leader's log.

308
00:15:43.320 --> 00:15:46.039
<v Speaker 1>ISRs, in-sync replicas.

309
00:15:45.679 --> 00:15:50.120
<v Speaker 2>Yeah, or sometimes eligible leader replicas, ELRs, depending on the setup.

310
00:15:50.600 --> 00:15:54.159
<v Speaker 2>The goal is to ensure the topic remains accessible. Producers

311
00:15:54.159 --> 00:15:57.480
<v Speaker 2>and consumers are designed to automatically detect this change and

312
00:15:57.559 --> 00:16:00.639
<v Speaker 2>switch to the new leader, usually with minimal interruption. We're

313
00:16:00.639 --> 00:16:02.720
<v Speaker 2>talking milliseconds typically.

314
00:16:02.320 --> 00:16:04.080
<v Speaker 1>Okay, so it's fast. Any downsides?

315
00:16:04.240 --> 00:16:07.159
<v Speaker 2>The main downside is that during that brief election period,

316
00:16:07.480 --> 00:16:12.200
<v Speaker 2>that specific partition might be temporarily unavailable for writing. Reading

317
00:16:12.279 --> 00:16:15.279
<v Speaker 2>might still be possible from followers depending on config, but

318
00:16:15.399 --> 00:16:19.080
<v Speaker 2>writes need the leader. Also, once the original preferred leader

319
00:16:19.120 --> 00:16:22.000
<v Speaker 2>comes back online and catches up, Kafka often aims to

320
00:16:22.039 --> 00:16:25.360
<v Speaker 2>reinstate it as leader. This helps rebalance the leadership load

321
00:16:25.399 --> 00:16:26.840
<v Speaker 2>across the cluster over time.

322
00:16:27.360 --> 00:16:31.120
<v Speaker 1>That's excellent to know. Automatic failover is key. So how

323
00:16:31.159 --> 00:16:34.960
<v Speaker 1>do acknowledgements or ACKs play into this? How do producers

324
00:16:35.080 --> 00:16:38.519
<v Speaker 1>know their messages are safely persisted across these replicas before

325
00:16:38.559 --> 00:16:39.000
<v Speaker 1>they move on?

326
00:16:39.320 --> 00:16:43.519
<v Speaker 2>ACKs are precisely how producers control the durability guarantee and

327
00:16:43.679 --> 00:16:47.799
<v Speaker 2>ensure messages are safely persisted. There are three main strategies

328
00:16:47.919 --> 00:16:51.720
<v Speaker 2>controlled by the acks producer config. With acks=0,

329
00:16:51.799 --> 00:16:55.159
<v Speaker 2>it's basically fire and forget. Send and hope, pretty much.

330
00:16:55.559 --> 00:16:58.279
<v Speaker 2>It gives the highest performance because the producer doesn't wait

331
00:16:58.320 --> 00:17:01.399
<v Speaker 2>for any confirmation at all, but it offers the lowest

332
00:17:01.440 --> 00:17:05.799
<v Speaker 2>reliability, comparable maybe to UDP networking. You could lose messages

333
00:17:05.839 --> 00:17:07.279
<v Speaker 2>if the broker fails immediately.

334
00:17:07.400 --> 00:17:08.519
<v Speaker 1>When would you ever use that?

335
00:17:08.880 --> 00:17:12.160
<v Speaker 2>It's acceptable if some data loss is tolerable, maybe high

336
00:17:12.240 --> 00:17:15.240
<v Speaker 2>volume sensor data where only the latest reading matters and

337
00:17:15.279 --> 00:17:18.599
<v Speaker 2>losing an occasional reading isn't catastrophic, like a temperature sensor

338
00:17:18.799 --> 00:17:19.960
<v Speaker 2>in a non-critical system.

339
00:17:20.079 --> 00:17:22.759
<v Speaker 1>Okay, what about acks=1? That sounds like a middle ground.

340
00:17:22.799 --> 00:17:25.839
<v Speaker 2>It is. acks=1 means the producer gets a response,

341
00:17:26.000 --> 00:17:29.200
<v Speaker 2>an acknowledgment, as soon as the leader broker successfully receives

342
00:17:29.200 --> 00:17:32.079
<v Speaker 2>and writes the message to its local log. This offers

343
00:17:32.160 --> 00:17:34.920
<v Speaker 2>much better latency than waiting for all replicas, but data

344
00:17:34.960 --> 00:17:37.960
<v Speaker 2>loss is still possible. If the leader receives the message,

345
00:17:38.319 --> 00:17:40.920
<v Speaker 2>sends the ack back to the producer, but then

346
00:17:41.200 --> 00:17:44.720
<v Speaker 2>crashes before that message gets replicated to its followers, that

347
00:17:44.759 --> 00:17:45.599
<v Speaker 2>message is lost.

348
00:17:45.960 --> 00:17:48.920
<v Speaker 1>Ah okay, so it's confirmed by the leader, but not

349
00:17:49.039 --> 00:17:50.599
<v Speaker 1>guaranteed replicated.

350
00:17:50.200 --> 00:17:53.160
<v Speaker 2>yet. Exactly, which brings us to acks=all, or you can

351
00:17:53.160 --> 00:17:55.079
<v Speaker 2>write it as -1. This has actually been the

352
00:17:55.079 --> 00:17:56.680
<v Speaker 2>default setting since Kafka three.

353
00:17:56.519 --> 00:17:57.799
<v Speaker 1>point zero. The safest option?

354
00:17:58.200 --> 00:18:02.480
<v Speaker 2>Yes, acks=all offers the highest reliability. With this setting, the

355
00:18:02.559 --> 00:18:05.279
<v Speaker 2>leader waits until all of the current in-sync replicas, the

356
00:18:05.640 --> 00:18:09.359
<v Speaker 2>ISRs, have successfully persisted the data to their logs before

357
00:18:09.400 --> 00:18:11.799
<v Speaker 2>sending that final ACK back to the producer.

358
00:18:11.920 --> 00:18:14.240
<v Speaker 1>So you know it's on multiple machines, right.

359
00:18:14.559 --> 00:18:17.119
<v Speaker 2>This is what you definitely want for critical data like

360
00:18:17.160 --> 00:18:21.279
<v Speaker 2>those customer orders, financial transactions, anything you absolutely cannot lose.

361
00:18:21.359 --> 00:18:24.839
<v Speaker 1>So for guaranteed delivery, acks=all is the gold standard. Is

362
00:18:24.880 --> 00:18:27.240
<v Speaker 1>there a way to fine-tune exactly how many in-sync

363
00:18:27.319 --> 00:18:30.200
<v Speaker 1>replicas need to acknowledge before the leader confirms? Maybe you

364
00:18:30.200 --> 00:18:31.880
<v Speaker 1>don't need all of them, just a majority.

365
00:18:32.079 --> 00:18:35.519
<v Speaker 2>Yes, absolutely. That's where min.insync.replicas comes in.

366
00:18:35.839 --> 00:18:38.519
<v Speaker 2>It's a topic level configuration setting that works hand in

367
00:18:38.559 --> 00:18:39.359
<v Speaker 2>hand with acks=all.

368
00:18:39.559 --> 00:18:40.240
<v Speaker 1>How does it work?

369
00:18:40.319 --> 00:18:43.519
<v Speaker 2>It specifies the minimum number of ISRs, including the leader

370
00:18:43.559 --> 00:18:47.480
<v Speaker 2>itself, that must acknowledge the write before the leader confirms

371
00:18:47.519 --> 00:18:49.680
<v Speaker 2>receipt back to the producer. So if you have a

372
00:18:49.720 --> 00:18:53.000
<v Speaker 2>replication factor of three and you set min.insync

373
00:18:53.079 --> 00:18:55.799
<v Speaker 2>.replicas to two, then the write succeeds as long as

374
00:18:55.799 --> 00:18:58.880
<v Speaker 2>the leader and at least one follower confirm it. If

375
00:18:58.880 --> 00:19:01.240
<v Speaker 2>only the leader is available, well, the producer will get

376
00:19:01.240 --> 00:19:04.960
<v Speaker 2>an error and can retry, preventing potential data loss if

377
00:19:05.000 --> 00:19:07.559
<v Speaker 2>too many replicas are temporarily down or slow.
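
NOTE
Editor's illustrative sketch, not part of the recording. It models the acks and
min.insync.replicas rule described above as a plain decision function. The name
leader_acknowledges and its parameters are hypothetical; no Kafka client is used,
and the broker's real logic has more cases than this.
```python
def leader_acknowledges(acks, in_sync_replicas, min_insync_replicas=1):
    """Return True if the write counts as successful under this simplified model.
    in_sync_replicas includes the leader itself, as described in the episode."""
    if acks in (0, 1):
        # acks=0: the producer never waits; acks=1: the leader's own log write suffices.
        return True
    if acks in ("all", -1):
        # acks=all: the write is rejected when the ISR is smaller than min.insync.replicas.
        return in_sync_replicas >= min_insync_replicas
    raise ValueError(f"unsupported acks value: {acks!r}")
```
With a replication factor of three and min.insync.replicas set to two,
leader_acknowledges("all", 2, 2) is True, while leader_acknowledges("all", 1, 2)
is False, which corresponds to the producer getting an error and retrying.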

378
00:19:07.720 --> 00:19:10.640
<v Speaker 1>That gives you really fine-grained control over the durability

379
00:19:10.720 --> 00:19:14.920
<v Speaker 1>versus availability trade-off. Very useful. But what about guaranteeing

380
00:19:14.920 --> 00:19:17.960
<v Speaker 1>messages are written exactly once and in the correct order,

381
00:19:18.079 --> 00:19:21.160
<v Speaker 1>especially if a producer has to retry sending due to

382
00:19:21.200 --> 00:19:23.799
<v Speaker 1>a temporary network issue or something? That sounds like a

383
00:19:23.799 --> 00:19:25.720
<v Speaker 1>classic distributed systems headache.

384
00:19:25.759 --> 00:19:28.920
<v Speaker 2>It is a tough problem, but Kafka has solutions for that.

385
00:19:29.000 --> 00:19:31.200
<v Speaker 2>We turn to idempotence and transactions.

386
00:19:31.319 --> 00:19:34.920
<v Speaker 1>Idempotence, meaning doing something multiple times has the same effect

387
00:19:34.960 --> 00:19:35.480
<v Speaker 1>as doing

388
00:19:35.319 --> 00:19:39.039
<v Speaker 2>it once. Precisely. By setting enable.idempotence=true on

389
00:19:39.079 --> 00:19:42.319
<v Speaker 2>the producer, which is actually the default now too, alongside

390
00:19:42.359 --> 00:19:45.079
<v Speaker 2>acks=all, Kafka ensures that messages are written in the

391
00:19:45.079 --> 00:19:48.519
<v Speaker 2>correct order within a partition and are present exactly once,

392
00:19:48.880 --> 00:19:50.440
<v Speaker 2>even if the producer retries sending.

393
00:19:50.839 --> 00:19:53.000
<v Speaker 1>How does it do that without much overhead?

394
00:19:53.240 --> 00:19:55.759
<v Speaker 2>It uses sequence numbers assigned by the producer and tracked

395
00:19:55.759 --> 00:19:59.079
<v Speaker 2>by the broker. The performance loss is negligible, maybe one

396
00:19:59.119 --> 00:20:02.640
<v Speaker 2>percent or less, but the gain in data integrity is huge.

397
00:20:03.039 --> 00:20:05.960
<v Speaker 2>Imagine if an order-placed message got duplicated because of

398
00:20:05.960 --> 00:20:08.039
<v Speaker 2>a retry. Idempotence prevents that.
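
NOTE
Editor's illustrative sketch, not part of the recording. It shows how sequence
numbers assigned by the producer and tracked by the broker can make a retried
send harmless. PartitionLog and its fields are hypothetical names; the real
broker tracks sequences per producer and partition with more bookkeeping.
```python
class PartitionLog:
    """Toy broker-side log that drops retried sends by (producer_id, sequence)."""
    def __init__(self):
        self.records = []
        self.last_sequence = {}  # producer_id -> highest sequence appended so far
    def append(self, producer_id, sequence, value):
        last = self.last_sequence.get(producer_id, -1)
        if sequence <= last:
            return False  # duplicate from a retry: already in the log, skip it
        if sequence != last + 1:
            raise ValueError("out-of-order sequence number")
        self.records.append(value)
        self.last_sequence[producer_id] = sequence
        return True
```
Appending the same (producer_id, sequence) pair twice stores the record once, so
a retried order-placed message does not show up as a second order.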

399
00:20:08.519 --> 00:20:11.759
<v Speaker 1>Okay, so idempotence handles duplicates from producer retries. What

400
00:20:11.799 --> 00:20:13.440
<v Speaker 1>about transactions? When do they come in?

401
00:20:13.599 --> 00:20:18.559
<v Speaker 2>Transactions are for achieving exactly-once semantics, EOS, when you're

402
00:20:18.559 --> 00:20:22.480
<v Speaker 2>doing more complex things, especially involving multiple partitions or transferring

403
00:20:22.599 --> 00:20:27.000
<v Speaker 2>data atomically between Kafka and other external systems like databases

404
00:20:27.079 --> 00:20:28.480
<v Speaker 2>or other Kafka topics.

405
00:20:28.319 --> 00:20:30.960
<v Speaker 1>Like a multi-step process that needs to succeed or

406
00:20:31.000 --> 00:20:32.839
<v Speaker 1>fail entirely. Exactly.

407
00:20:33.240 --> 00:20:37.119
<v Speaker 2>A producer can begin a transaction, send messages to multiple partitions,

408
00:20:37.519 --> 00:20:40.559
<v Speaker 2>and then either commit the transaction, making all messages visible

409
00:20:40.559 --> 00:20:44.240
<v Speaker 2>to consumers, or abort it, discarding them all. It's atomic.

410
00:20:44.400 --> 00:20:46.799
<v Speaker 1>How do transactions affect consumers? Do they need to do

411
00:20:46.839 --> 00:20:47.680
<v Speaker 1>anything special?

412
00:20:47.880 --> 00:20:52.240
<v Speaker 2>Yes. Crucially, consumers that need transactional guarantees must set their

413
00:20:52.279 --> 00:20:55.400
<v Speaker 2>isolation.level configuration to read

414
00:20:55.200 --> 00:20:56.680
<v Speaker 3>committed. Read committed, okay.

415
00:20:56.960 --> 00:20:59.400
<v Speaker 2>This ensures they only read messages that are part of

416
00:20:59.440 --> 00:21:04.079
<v Speaker 2>successfully committed transactions, filtering out any messages from aborted transactions

417
00:21:04.160 --> 00:21:07.000
<v Speaker 2>or ongoing ones. It's actually good practice to set this

418
00:21:07.359 --> 00:21:10.359
<v Speaker 2>even if you don't use transactions initially, just to be safe.
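
NOTE
Editor's illustrative sketch, not part of the recording. It mimics what the
isolation.level setting changes for a consumer. visible_records and the tuple
layout are hypothetical. Real read_committed consumers also stop at the last
stable offset, which this toy model leaves out.
```python
def visible_records(log, isolation_level="read_uncommitted"):
    """log is a list of (transaction_id, state, value) tuples with state in
    {"committed", "aborted", "open"}; transaction_id None means non-transactional."""
    if isolation_level == "read_uncommitted":
        return [value for _, _, value in log]
    if isolation_level == "read_committed":
        return [v for txn, state, v in log if txn is None or state == "committed"]
    raise ValueError(f"unknown isolation level: {isolation_level!r}")
```
Under read_committed, records from aborted or still-open transactions are
filtered out; under read_uncommitted every record is returned.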

419
00:21:10.400 --> 00:21:14.279
<v Speaker 1>And how does Kafka manage this atomicity across potentially multiple

420
00:21:14.480 --> 00:21:15.759
<v Speaker 1>brokers and partitions.

421
00:21:15.839 --> 00:21:19.240
<v Speaker 2>It uses a variation of the classic two-phase commit protocol. Internally,

422
00:21:19.680 --> 00:21:23.240
<v Speaker 2>it involves transaction coordinators on the brokers and special control messages.

423
00:21:23.599 --> 00:21:26.240
<v Speaker 2>It's complex under the hood, but it provides that strong

424
00:21:26.279 --> 00:21:30.559
<v Speaker 2>guarantee of atomic writes across multiple partitions, ensuring data consistency

425
00:21:30.599 --> 00:21:31.880
<v Speaker 2>throughout your entire data flow.

426
00:21:32.000 --> 00:21:36.160
<v Speaker 1>That's a truly comprehensive approach to reliability from replication and

427
00:21:36.400 --> 00:21:41.880
<v Speaker 1>ACKs right through to idempotence and transactions. Very impressive. Now,

428
00:21:41.960 --> 00:21:45.319
<v Speaker 1>let's talk about speed. Kafka is famous for its performance.

429
00:21:45.359 --> 00:21:48.480
<v Speaker 1>Its throughput. How does it achieve such high speeds and

430
00:21:48.519 --> 00:21:52.279
<v Speaker 1>what are the key configurations that truly make it fly?

431
00:21:52.680 --> 00:21:55.680
<v Speaker 2>Yeah, performance is definitely one of Kafka's hallmarks. It's inherently

432
00:21:55.720 --> 00:21:58.680
<v Speaker 2>tuned for performance right out of the box. Its design

433
00:21:58.799 --> 00:22:02.119
<v Speaker 2>basically assumes that hard disks are relatively cheap these days

434
00:22:02.400 --> 00:22:05.599
<v Speaker 2>and memory is quite abundant, okay, and it heavily prioritizes

435
00:22:05.640 --> 00:22:09.039
<v Speaker 2>horizontal scaling as we discussed. But to truly make it fly,

436
00:22:09.440 --> 00:22:14.079
<v Speaker 2>optimization through careful configuration settings across all the components, producers,

437
00:22:14.079 --> 00:22:17.400
<v Speaker 2>brokers, and consumers, is vital. It's not just one magic switch.

438
00:22:17.759 --> 00:22:21.519
<v Speaker 1>And you mentioned partitions are key to performance earlier. Can

439
00:22:21.559 --> 00:22:24.720
<v Speaker 1>you elaborate on how they directly contribute to speed and throughput?

440
00:22:24.920 --> 00:22:28.359
<v Speaker 2>Absolutely, partitions are key because they enable massive parallel processing

441
00:22:28.359 --> 00:22:31.559
<v Speaker 2>and load balancing. Remember, topics are divided into partitions and

442
00:22:31.599 --> 00:22:35.039
<v Speaker 2>these partitions are then distributed across the different broker machines.

443
00:22:35.359 --> 00:22:39.279
<v Speaker 2>Producers determine which partition to send messages to, usually based

444
00:22:39.279 --> 00:22:43.160
<v Speaker 2>on the key. And on the consumer side, we use consumer

445
00:22:42.759 --> 00:22:45.240
<v Speaker 1>groups. Consumer groups? What are those exactly?

446
00:22:45.519 --> 00:22:48.160
<v Speaker 2>A consumer group is just a set of consumer instances

447
00:22:48.200 --> 00:22:52.559
<v Speaker 2>that cooperate to consume from a topic. Kafka automatically assigns

448
00:22:52.599 --> 00:22:55.960
<v Speaker 2>the partitions of a topic across the available consumers in

449
00:22:56.000 --> 00:22:58.759
<v Speaker 2>a group. So if a topic has ten partitions and

450
00:22:58.799 --> 00:23:01.599
<v Speaker 2>you have five consumers in a group, each consumer will

451
00:23:01.599 --> 00:23:03.720
<v Speaker 2>handle two partitions in parallel.
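
NOTE
Editor's illustrative sketch, not part of the recording. It spreads partitions
over the consumers of one group in turn. assign_round_robin is a hypothetical
helper and only a simplified stand-in for the assignment strategies Kafka
actually ships.
```python
def assign_round_robin(num_partitions, consumers):
    """Spread partition ids over consumers in turn and return {consumer: [partitions]}."""
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment
```
Ten partitions and five consumers give each consumer two partitions. With more
consumers than partitions, the extra consumers receive an empty list, which is
why adding consumers only helps up to the number of partitions.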

452
00:23:03.519 --> 00:23:06.720
<v Speaker 1>Ah, so the group processes the topic together in parallel

453
00:23:06.759 --> 00:23:08.559
<v Speaker 1>across partitions. Exactly.

454
00:23:08.799 --> 00:23:11.880
<v Speaker 2>This allows you to scale out your consumption by simply

455
00:23:11.920 --> 00:23:15.359
<v Speaker 2>adding more consumer instances to the group up to the

456
00:23:15.440 --> 00:23:19.519
<v Speaker 2>number of partitions. It drastically increases the overall message processing throughput.

457
00:23:19.759 --> 00:23:22.839
<v Speaker 1>I understand that more partitions can mean more potential parallelism,

458
00:23:22.960 --> 00:23:25.359
<v Speaker 1>more throughput, but it feels like there could be a

459
00:23:25.400 --> 00:23:29.000
<v Speaker 1>point of diminishing returns or even negative consequences. What are

460
00:23:29.039 --> 00:23:32.240
<v Speaker 1>the implications of having too many partitions? Is there a downside?

461
00:23:32.319 --> 00:23:36.119
<v Speaker 2>You're absolutely right to be cautious. There definitely is a downside.

462
00:23:36.599 --> 00:23:39.839
<v Speaker 2>While increasing partitions can boost throughput up to a point,

463
00:23:40.279 --> 00:23:43.640
<v Speaker 2>it introduces significant complexity and overhead if you go too far.

464
00:23:44.279 --> 00:23:47.599
<v Speaker 2>Like what? Well, each partition demands client resources, memory on

465
00:23:47.680 --> 00:23:51.240
<v Speaker 2>producers and consumers. On the brokers, each partition is a

466
00:23:51.279 --> 00:23:55.240
<v Speaker 2>log file on disk requiring file handles, memory for indexing,

467
00:23:55.559 --> 00:23:59.079
<v Speaker 2>and CPU for replication. Thousands or tens of thousands of

468
00:23:59.079 --> 00:24:03.200
<v Speaker 2>partitions can really strain broker resources. It can also lead

469
00:24:03.240 --> 00:24:07.599
<v Speaker 2>to prolonged unavailability during certain failure scenarios like leader elections,

470
00:24:08.000 --> 00:24:12.079
<v Speaker 2>especially with older ZooKeeper-managed clusters, where ZooKeeper itself

471
00:24:12.119 --> 00:24:12.799
<v Speaker 2>could become

472
00:24:12.559 --> 00:24:15.839
<v Speaker 1>a bottleneck. So finding the right number is important? Crucial.

473
00:24:15.960 --> 00:24:18.720
<v Speaker 2>Imagine our online retailer suddenly deciding to have a million

474
00:24:18.759 --> 00:24:22.920
<v Speaker 2>tiny partitions, one for every single product SKU. While

475
00:24:22.920 --> 00:24:26.079
<v Speaker 2>it might sound organized for ordering, the overhead of managing

476
00:24:26.119 --> 00:24:30.079
<v Speaker 2>all those partitions, leader elections, replication traffic, client connections, would

477
00:24:30.160 --> 00:24:33.640
<v Speaker 2>likely overwhelm the system and actually reduce overall performance and stability.

478
00:24:33.759 --> 00:24:35.559
<v Speaker 1>And can you easily change the number later?

479
00:24:35.839 --> 00:24:39.000
<v Speaker 2>That's another catch. Reducing the number of partitions for a

480
00:24:39.039 --> 00:24:43.240
<v Speaker 2>topic isn't really possible without potentially losing data or complex

481
00:24:43.279 --> 00:24:47.240
<v Speaker 2>manual steps. Increasing partitions is easier. You can do that online,

482
00:24:47.720 --> 00:24:52.000
<v Speaker 2>but increasing partitions disrupts message ordering guarantees for existing keys.

483
00:24:52.240 --> 00:24:56.440
<v Speaker 2>Why? Because the partitioner usually calculates the target partition using

484
00:24:56.440 --> 00:24:59.759
<v Speaker 2>something like hash(key) % number of partitions. If you

485
00:24:59.839 --> 00:25:02.400
<v Speaker 2>change the number of partitions, the result changes and

486
00:25:02.480 --> 00:25:05.359
<v Speaker 2>messages with the same key start going to different partitions

487
00:25:05.359 --> 00:25:08.559
<v Speaker 2>than before, breaking strict ordering for that key until all

488
00:25:08.559 --> 00:25:09.799
<v Speaker 2>the old data expires.
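
NOTE
Editor's illustrative sketch, not part of the recording. It spells out the
hash(key) % number-of-partitions rule and why a new partition count moves keys.
Both function names are hypothetical, and zlib.crc32 is only a deterministic
stand-in: Kafka's default partitioner uses a murmur2 hash.
```python
import zlib
def partition_for_hash(key_hash, num_partitions):
    """Map an already computed key hash to a partition index."""
    return key_hash % num_partitions
def partition_for(key, num_partitions):
    """Map a key (bytes) to a partition, using crc32 as a stand-in hash function."""
    return partition_for_hash(zlib.crc32(key), num_partitions)
```
A key whose hash is 17 lands on partition 7 with ten partitions (17 % 10) but on
partition 5 with twelve (17 % 12), so its newer messages no longer follow its
older ones in the same partition.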

489
00:25:09.920 --> 00:25:13.200
<v Speaker 1>Wow. Okay, so choosing the initial partition count and planning

490
00:25:13.200 --> 00:25:16.039
<v Speaker 1>for future growth is really important. Get it wrong, it's

491
00:25:16.039 --> 00:25:17.000
<v Speaker 1>hard to fix easily.

492
00:25:17.240 --> 00:25:21.880
<v Speaker 2>Exactly. Optimal balance is key, usually found through careful testing, monitoring,

493
00:25:21.960 --> 00:25:24.720
<v Speaker 2>and understanding your data access patterns. Don't just pick a

494
00:25:24.799 --> 00:25:25.680
<v Speaker 2>huge number upfront.

495
00:25:25.759 --> 00:25:28.960
<v Speaker 1>That's a very clear warning. So partitions are critical for scaling.

496
00:25:29.400 --> 00:25:32.799
<v Speaker 1>How does the producer specifically contribute to this incredible performance?

497
00:25:32.839 --> 00:25:34.480
<v Speaker 1>Beyond just sending messages?

498
00:25:34.680 --> 00:25:37.960
<v Speaker 2>Producer performance relies heavily on batching. This is super

499
00:25:37.759 --> 00:25:40.160
<v Speaker 1>important. Batching? Grouping messages?

500
00:25:40.319 --> 00:25:44.039
<v Speaker 2>Yes. Instead of sending each message individually over the network

501
00:25:44.039 --> 00:25:47.319
<v Speaker 2>as soon as it's ready, producers collect messages destined for

502
00:25:47.359 --> 00:25:50.079
<v Speaker 2>the same partition on the same broker and group them

503
00:25:50.079 --> 00:25:53.039
<v Speaker 2>into larger batches. Then they send the entire batch in

504
00:25:53.079 --> 00:25:53.400
<v Speaker 2>one go.

505
00:25:53.920 --> 00:25:55.839
<v Speaker 1>That must save a lot of network round trips.

506
00:25:56.240 --> 00:26:00.920
<v Speaker 2>Drastically. It significantly enhances performance and reduces network load

507
00:26:01.119 --> 00:26:05.119
<v Speaker 2>by sending fewer, larger chunks of data rather than many

508
00:26:05.160 --> 00:26:05.799
<v Speaker 2>small ones.

509
00:26:05.880 --> 00:26:07.119
<v Speaker 1>How do you control that batching?

510
00:26:07.400 --> 00:26:10.000
<v Speaker 2>You configure it mainly with two settings. Yeah, batch

511
00:26:10.039 --> 00:26:13.680
<v Speaker 2>.size, which sets the maximum batch size in bytes,

512
00:26:13.720 --> 00:26:16.880
<v Speaker 2>and linger.ms, which is the maximum time in milliseconds

513
00:26:17.200 --> 00:26:19.519
<v Speaker 2>the producer will wait to try and fill up a

514
00:26:19.519 --> 00:26:22.240
<v Speaker 2>batch before sending it, even if it's not full yet.

515
00:26:22.119 --> 00:26:25.160
<v Speaker 1>So a trade-off between latency and throughput? Exactly.

516
00:26:25.319 --> 00:26:27.920
<v Speaker 2>Larger linger.ms values, like 5 ms, 10 ms,

517
00:26:28.039 --> 00:26:31.680
<v Speaker 2>or even more, increase latency slightly because messages wait longer,

518
00:26:32.119 --> 00:26:34.880
<v Speaker 2>but they also increase the chance of bigger batches, leading

519
00:26:34.880 --> 00:26:38.200
<v Speaker 2>to much better throughput and efficiency. Finding the sweet spot

520
00:26:38.279 --> 00:26:40.640
<v Speaker 2>depends on your application's latency requirements.
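
NOTE
Editor's illustrative sketch, not part of the recording. It models the two
flush triggers named above, batch.size and linger.ms, for a single partition.
Batcher is a hypothetical class that takes explicit timestamps; the real
producer flushes from a background thread rather than only when a message is
added.
```python
class Batcher:
    """Toy batch for one partition: flush when batch_size bytes are buffered
    or when linger_ms has passed since the first buffered message."""
    def __init__(self, batch_size, linger_ms):
        self.batch_size = batch_size
        self.linger_ms = linger_ms
        self.buffer = []
        self.first_ts = None
    def add(self, message, now_ms):
        """Buffer message (bytes) at time now_ms; return a flushed batch or None."""
        if self.first_ts is None:
            self.first_ts = now_ms
        self.buffer.append(message)
        full = sum(len(m) for m in self.buffer) >= self.batch_size
        expired = now_ms - self.first_ts >= self.linger_ms
        if full or expired:
            batch, self.buffer, self.first_ts = self.buffer, [], None
            return batch
        return None
```
A larger linger_ms keeps messages buffered longer, which raises latency a
little but lets more messages share one network request.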

521
00:26:40.799 --> 00:26:43.160
<v Speaker 1>And what about compression? Does that happen at the producer?

522
00:26:43.440 --> 00:26:46.799
<v Speaker 2>Yes, and it's another big performance booster. The producer can

523
00:26:46.799 --> 00:26:50.799
<v Speaker 2>compress the entire batch of messages just once before sending

524
00:26:50.599 --> 00:26:53.839
<v Speaker 1>it. The whole batch, not message by message? Whole batch.

525
00:26:54.039 --> 00:26:57.599
<v Speaker 2>This is much more efficient than compressing individual messages. Common

526
00:26:57.680 --> 00:27:01.960
<v Speaker 2>compression types like Snappy, gzip, LZ4, and zstd are supported.

527
00:27:02.400 --> 00:27:05.319
<v Speaker 2>This significantly reduces the amount of data sent over the

528
00:27:05.359 --> 00:27:08.640
<v Speaker 2>network and also saves hard disk space on the brokers

529
00:27:09.000 --> 00:27:12.000
<v Speaker 2>as they store and transmit the compressed batches unchanged.
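
NOTE
Editor's illustrative sketch, not part of the recording. It compares
compressing each message on its own with compressing the whole batch once.
compressed_sizes is a hypothetical helper, and zlib (the DEFLATE algorithm
behind gzip) stands in for whichever codec compression.type selects.
```python
import zlib
def compressed_sizes(messages):
    """Return (total bytes when each message is compressed alone,
    bytes when the concatenated batch is compressed once)."""
    per_message = sum(len(zlib.compress(m)) for m in messages)
    whole_batch = len(zlib.compress(b"".join(messages)))
    return per_message, whole_batch
```
For batches of similar records, such as many small JSON events that share field
names, the whole-batch figure is typically far smaller, because the compressor
can reuse patterns across messages.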

530
00:27:12.400 --> 00:27:16.759
<v Speaker 1>Clever. Batching and compression working together. And earlier you mentioned

531
00:27:16.880 --> 00:27:19.319
<v Speaker 1>zero copy transfer as a neat trick for brokers to

532
00:27:19.359 --> 00:27:23.079
<v Speaker 1>achieve high performance. How does that actually make the brokers faster?

533
00:27:23.559 --> 00:27:25.480
<v Speaker 1>What are they doing, or rather not doing? Right.

534
00:27:25.720 --> 00:27:28.599
<v Speaker 2>Broker performance is maximized largely by keeping them as simple

535
00:27:28.640 --> 00:27:32.599
<v Speaker 2>and efficient as possible. Their primary job isn't complex computation.

536
00:27:33.200 --> 00:27:36.279
<v Speaker 2>It's really about efficiently pushing bytes from network sockets to

537
00:27:36.359 --> 00:27:40.039
<v Speaker 2>disk when receiving from producers, producing, and pushing bytes from

538
00:27:40.079 --> 00:27:42.400
<v Speaker 2>disk back to network sockets when sending to consumers,

539
00:27:42.519 --> 00:27:45.240
<v Speaker 1>consuming. Just moving data? Pretty much.

540
00:27:45.480 --> 00:27:48.480
<v Speaker 2>And this is where zero copy transfer, a feature available

541
00:27:48.480 --> 00:27:51.759
<v Speaker 2>in Linux and other Unix like operating systems, comes into play.

542
00:27:52.359 --> 00:27:55.759
<v Speaker 2>It's a fundamental reason Kafka can achieve such incredible speeds

543
00:27:55.799 --> 00:27:56.920
<v Speaker 2>on commodity hardware.

544
00:27:57.359 --> 00:27:59.359
<v Speaker 1>So what does zero copy actually avoid?

545
00:27:59.400 --> 00:28:03.480
<v Speaker 2>What copies? Imagine the traditional way. Data moves from the

546
00:28:03.559 --> 00:28:08.200
<v Speaker 2>disk into the operating system kernel's memory, the page cache, then

547
00:28:08.279 --> 00:28:12.000
<v Speaker 2>copied into the application's memory, the Kafka broker process, then

548
00:28:12.039 --> 00:28:15.039
<v Speaker 2>copied back into the kernel socket buffer memory, and finally

549
00:28:15.079 --> 00:28:18.680
<v Speaker 2>copied out to the network card. That's multiple copies in memory.

550
00:28:18.440 --> 00:28:19.720
<v Speaker 1>Sounds inefficient. It is.

551
00:28:20.000 --> 00:28:22.799
<v Speaker 2>Zero copy allows the kernel to directly transfer data from

552
00:28:22.799 --> 00:28:25.920
<v Speaker 2>the disk cache, the page cache, straight to the network socket

553
00:28:25.960 --> 00:28:30.240
<v Speaker 2>buffer without needing that intermediate copy into the application's, Kafka's,

554
00:28:30.319 --> 00:28:31.000
<v Speaker 2>memory space.

555
00:28:31.079 --> 00:28:34.359
<v Speaker 1>It cuts out the middleman, the application buffer. Exactly.

556
00:28:34.880 --> 00:28:38.720
<v Speaker 2>It avoids unnecessary data copies and CPU context switches between

557
00:28:38.799 --> 00:28:43.000
<v Speaker 2>kernel mode and user mode. This makes the broker astonishingly efficient,

558
00:28:43.319 --> 00:28:45.880
<v Speaker 2>acting more like a super fast data pipeline or router

559
00:28:46.319 --> 00:28:49.720
<v Speaker 2>than a heavy processing engine. And because Kafka's message format

560
00:28:49.759 --> 00:28:52.039
<v Speaker 2>on disk is the same as its over-the-wire format,

561
00:28:52.279 --> 00:28:53.279
<v Speaker 2>this works beautifully.
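
NOTE
Editor's illustrative sketch, not part of the recording. It shows the same idea
in Python: hand a file to the kernel and let it move the bytes to a socket.
send_file is a hypothetical wrapper around the standard library's
socket.sendfile(). Kafka itself is a JVM application and reaches the same
system call through Java's FileChannel.transferTo.
```python
import socket
def send_file(path, sock):
    """Send the file at path over a connected stream socket; return bytes sent.
    socket.sendfile() uses os.sendfile() where the platform supports it and
    falls back to ordinary send() calls otherwise."""
    with open(path, "rb") as f:
        return sock.sendfile(f)
```
When the sendfile path is taken, the file contents never pass through a buffer
in the Python process, which is the intermediate copy the episode describes
being avoided.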

562
00:28:53.440 --> 00:28:56.680
<v Speaker 1>That's a really key optimization. And you also mentioned Kafka

563
00:28:56.799 --> 00:28:59.440
<v Speaker 1>often relies on the OS for flushing data to disk

564
00:28:59.599 --> 00:29:01.720
<v Speaker 1>rather than doing it manually after every write.

565
00:29:01.839 --> 00:29:02.079
<v Speaker 4>Yes.

566
00:29:02.319 --> 00:29:06.240
<v Speaker 2>Generally Kafka avoids forcing manual fsync operations after every message

567
00:29:06.400 --> 00:29:09.480
<v Speaker 2>write for performance reasons. An fsync forces the OS to

568
00:29:09.480 --> 00:29:12.839
<v Speaker 2>physically write data from its caches to the disk hardware immediately,

569
00:29:13.079 --> 00:29:14.200
<v Speaker 2>which can be slow.

570
00:29:14.039 --> 00:29:17.920
<v Speaker 1>So it risks losing data if the OS crashes before flushing.

571
00:29:18.240 --> 00:29:21.400
<v Speaker 2>In theory, yes, for data that's only in the OS cache,

572
00:29:21.839 --> 00:29:25.519
<v Speaker 2>but Kafka relies on its replication mechanism for durability. By

573
00:29:25.559 --> 00:29:28.119
<v Speaker 2>the time a producer gets an acks=all confirmation, the

574
00:29:28.200 --> 00:29:31.680
<v Speaker 2>data is safely replicated to multiple brokers' OS caches and

575
00:29:31.839 --> 00:29:35.960
<v Speaker 2>likely heading to disk soon via background OS processes. Relying

576
00:29:35.960 --> 00:29:39.440
<v Speaker 2>on replication plus the OS's background flushing provides high throughput

577
00:29:39.480 --> 00:29:41.559
<v Speaker 2>and strong durability guarantees in practice.

578
00:29:41.680 --> 00:29:45.720
<v Speaker 1>Okay, that makes sense. Reliability through replication, speed through avoiding

579
00:29:45.759 --> 00:29:52.200
<v Speaker 1>forced syncs. So producers batch and compress, brokers use zero copy.

580
00:29:52.480 --> 00:29:55.640
<v Speaker 1>What about the consumer side? Is there a bottleneck there

581
00:29:55.759 --> 00:29:57.920
<v Speaker 1>or is Kafka just pushing data as fast as the

582
00:29:57.920 --> 00:29:58.680
<v Speaker 1>network allows?

583
00:29:59.039 --> 00:30:02.279
<v Speaker 2>Consumer performance is definitely configurable. You have settings like fetch

584
00:30:02.359 --> 00:30:05.599
<v Speaker 2>.min.bytes, which tells the broker the minimum

585
00:30:05.599 --> 00:30:07.599
<v Speaker 2>amount of data to send back in one go.

586
00:30:07.960 --> 00:30:11.359
<v Speaker 1>So the consumer isn't getting tiny responses all the time.

587
00:30:11.279 --> 00:30:14.240
<v Speaker 2>Right, and fetch.max.wait.ms, the

588
00:30:14.240 --> 00:30:16.759
<v Speaker 2>maximum time the broker will wait for that minimum amount

589
00:30:16.759 --> 00:30:19.319
<v Speaker 2>of data to accumulate before sending back whatever it has.

590
00:30:19.640 --> 00:30:22.319
<v Speaker 2>These help tune the balance between latency and throughput on

591
00:30:22.359 --> 00:30:25.160
<v Speaker 2>the consumer side, similar to the producer's linger.ms.

592
00:30:25.240 --> 00:30:26.759
<v Speaker 1>Okay, but where's the real limit?

593
00:30:26.839 --> 00:30:29.839
<v Speaker 2>Usually? Here's a crucial insight and something many people overlook

594
00:30:29.839 --> 00:30:33.920
<v Speaker 2>when troubleshooting performance. Consumer performance is typically limited by how

595
00:30:33.960 --> 00:30:38.480
<v Speaker 2>fast the consumer application processes the data, not by Kafka itself.

596
00:30:38.160 --> 00:30:40.400
<v Speaker 1>So it's my code that's slow, not Kafka?

597
00:30:40.440 --> 00:30:44.960
<v Speaker 2>Often, yes. Kafka brokers are incredibly efficient at serving data.

598
00:30:45.519 --> 00:30:48.519
<v Speaker 2>Our own performance tests and many others often reveal that

599
00:30:48.559 --> 00:30:52.359
<v Speaker 2>consumers can easily pull data much faster than typical producers

600
00:30:52.359 --> 00:30:55.480
<v Speaker 2>can even send it. The bottleneck is frequently your own

601
00:30:55.519 --> 00:30:59.920
<v Speaker 2>application logic. How quickly can your online retailer's inventory service

602
00:31:00.359 --> 00:31:04.119
<v Speaker 2>look up product details, update its database, and acknowledge the item-

603
00:31:04.160 --> 00:31:07.680
<v Speaker 2>sold event it just received from Kafka? Or sometimes it's

604
00:31:07.680 --> 00:31:11.079
<v Speaker 2>simply the network bandwidth available to the consumer, but rarely

605
00:31:11.200 --> 00:31:13.079
<v Speaker 2>is it Kafka's ability to deliver the bytes.

606
00:31:13.400 --> 00:31:15.400
<v Speaker 1>That's a really key distinction to keep in mind for

607
00:31:15.480 --> 00:31:20.240
<v Speaker 1>optimization and troubleshooting. Focus on the consumer application logic first. Okay,

608
00:31:20.279 --> 00:31:25.440
<v Speaker 1>we've covered the core mechanics, reliability, performance, fantastic foundation. Now

609
00:31:25.559 --> 00:31:28.359
<v Speaker 1>let's talk about how Kafka integrates with the wider world

610
00:31:28.359 --> 00:31:30.839
<v Speaker 1>of systems and enables that exciting realm of real time

611
00:31:30.920 --> 00:31:34.440
<v Speaker 1>data analysis. How does Kafka Connect fit into this picture?

612
00:31:34.480 --> 00:31:34.960
<v Speaker 1>What is it?

613
00:31:35.200 --> 00:31:38.079
<v Speaker 2>Kafka Connect is a really powerful framework and tool that's

614
00:31:38.079 --> 00:31:40.359
<v Speaker 2>part of the Apache Kafka project. Its purpose is to

615
00:31:40.359 --> 00:31:42.799
<v Speaker 2>make it easy to integrate Kafka with external systems.

616
00:31:42.880 --> 00:31:44.720
<v Speaker 1>External systems like what? Think

617
00:31:44.640 --> 00:31:50.079
<v Speaker 2>databases, key-value stores, search indexes, file systems, cloud storage

618
00:31:50.119 --> 00:31:54.359
<v Speaker 2>like S3, messaging queues like JMS, pretty much anything

619
00:31:54.359 --> 00:31:56.960
<v Speaker 2>you'd want to get data into Kafka from, or get

620
00:31:57.039 --> 00:31:58.960
<v Speaker 2>data out of Kafka into. Okay.

621
00:31:59.000 --> 00:32:02.240
<v Speaker 1>So it's like a universal data bridge builder for Kafka.

622
00:32:02.279 --> 00:32:04.640
<v Speaker 2>That's a great way to put it. And crucially, it

623
00:32:04.680 --> 00:32:07.279
<v Speaker 2>aims to do this without you having to write custom

624
00:32:07.319 --> 00:32:10.079
<v Speaker 2>integration code for every single system. You use pre-built

625
00:32:10.200 --> 00:32:11.319
<v Speaker 2>or community connectors.

626
00:32:11.559 --> 00:32:14.400
<v Speaker 1>So why should you, the listener, care about Kafka Connect?

627
00:32:14.480 --> 00:32:15.519
<v Speaker 1>What's the big benefit?

628
00:32:15.799 --> 00:32:18.279
<v Speaker 2>The big benefit is that it helps automate and standardize

629
00:32:18.319 --> 00:32:22.440
<v Speaker 2>data flow to and from Kafka. It massively simplifies building

630
00:32:22.440 --> 00:32:26.000
<v Speaker 2>and managing these data pipelines. For our online retailer example,

631
00:32:26.160 --> 00:32:29.119
<v Speaker 2>it means effortlessly getting customer profile updates from a CRM

632
00:32:29.160 --> 00:32:32.559
<v Speaker 2>system into a Kafka topic, or pushing processed order data

633
00:32:32.599 --> 00:32:36.160
<v Speaker 2>from Kafka into a downstream data warehouse or fulfillment system,

634
00:32:36.599 --> 00:32:39.599
<v Speaker 2>all using configuration rather than complex custom code.

635
00:32:39.799 --> 00:32:43.480
<v Speaker 1>Less code, more configuration. Sounds good. How does it work architecturally?

636
00:32:43.759 --> 00:32:46.880
<v Speaker 2>A Kafka Connect deployment runs as a cluster of workers.

637
00:32:47.279 --> 00:32:50.599
<v Speaker 2>These are just JVM processes that execute the integration tasks.

638
00:32:50.799 --> 00:32:54.519
<v Speaker 2>They handle things like offsets, configuration, and distributing the actual work.

639
00:32:54.039 --> 00:32:58.240
<v Speaker 1>Workers running the tasks. And the tasks themselves?

640
00:32:58.640 --> 00:33:01.720
<v Speaker 2>The actual integration logic is encapsulated in connectors. These are

641
00:33:01.759 --> 00:33:04.160
<v Speaker 2>plugins that you deploy to your Connect cluster. There are

642
00:33:04.160 --> 00:33:06.799
<v Speaker 2>two main types, source connectors and sink connectors.

643
00:33:06.920 --> 00:33:08.759
<v Speaker 1>Source and sink. Easy enough.

644
00:33:08.839 --> 00:33:12.400
<v Speaker 2>Source connectors import data from an external system into Kafka topics.

645
00:33:12.759 --> 00:33:16.000
<v Speaker 2>For example, a JDBC source connector can poll a database

646
00:33:16.039 --> 00:33:20.119
<v Speaker 2>table for new rows, or, more powerfully, connectors like Debezium

647
00:33:20.119 --> 00:33:25.160
<v Speaker 2>perform change data capture (CDC) by reading database transaction logs

648
00:33:25.400 --> 00:33:29.119
<v Speaker 2>and sending every single row-level change (insert, update, delete)

649
00:33:29.119 --> 00:33:30.359
<v Speaker 2>to Kafka in real time.

650
00:33:30.480 --> 00:33:34.640
<v Speaker 1>CDC is huge for real time data warehousing and replication.

651
00:33:34.279 --> 00:33:37.039
<v Speaker 2>Absolutely. And then sink connectors do the opposite. They export

652
00:33:37.160 --> 00:33:40.440
<v Speaker 2>data from Kafka topics to external systems, like writing records

653
00:33:40.440 --> 00:33:43.279
<v Speaker 2>from a Kafka topic into Elasticsearch for searching, or

654
00:33:43.440 --> 00:33:46.319
<v Speaker 2>HDFS for archiving, or maybe calling a REST API.

655
00:33:46.759 --> 00:33:50.279
<v Speaker 1>Very powerful. Can you do any, like, light transformations on

656
00:33:50.359 --> 00:33:53.039
<v Speaker 1>the messages as they pass through Connect? Maybe clean things

657
00:33:53.119 --> 00:33:55.480
<v Speaker 1>up a bit before they hit Kafka or before they

658
00:33:55.519 --> 00:33:56.839
<v Speaker 1>go to the sink system?

659
00:33:56.680 --> 00:34:00.519
<v Speaker 2>Yes, you can. Kafka Connect supports something called Single Message

660
00:34:00.519 --> 00:34:04.839
<v Speaker 2>Transformations, or SMTs. These allow you to perform simple,

661
00:34:05.079 --> 00:34:09.800
<v Speaker 2>stateless record level transformations on messages within the connect pipeline.

662
00:34:09.840 --> 00:34:11.840
<v Speaker 2>They can be applied before a message is written to

663
00:34:11.920 --> 00:34:14.920
<v Speaker 2>Kafka by a source connector, or before a message is

664
00:34:14.920 --> 00:34:16.880
<v Speaker 2>written to an external system by a sink

665
00:34:16.639 --> 00:34:19.199
<v Speaker 1>connector. What kind of transformations?

666
00:34:18.519 --> 00:34:23.320
<v Speaker 2>Common examples include things like renaming fields (ReplaceField), dropping fields

667
00:34:23.400 --> 00:34:26.559
<v Speaker 2>(ReplaceField again, or Drop), extracting a field from the

668
00:34:26.559 --> 00:34:30.039
<v Speaker 2>message value to become the message key (ValueToKey), pulling

669
00:34:30.039 --> 00:34:33.800
<v Speaker 2>out a specific field (ExtractField), or even masking sensitive

670
00:34:33.800 --> 00:34:36.320
<v Speaker 2>fields by setting them to null or a fixed value

671
00:34:36.360 --> 00:34:38.880
<v Speaker 2>(MaskField), for privacy or compliance reasons.
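
NOTE
The transformations just listed can be sketched in plain Python. This is only an illustration of stateless, record-level transforms applied one message at a time. The function names and the sample record are hypothetical and are not the Kafka Connect API, where SMTs are set up through connector configuration rather than code like this.
```python
# Hypothetical stand-ins for the ideas behind ReplaceField, ValueToKey and MaskField.
def rename_field(value, old, new):
    """Return a copy of the record value with one field renamed."""
    return {(new if k == old else k): v for k, v in value.items()}
def drop_field(value, name):
    """Return a copy of the record value without the named field."""
    return {k: v for k, v in value.items() if k != name}
def value_to_key(value, field):
    """Use one field of the value as the record key."""
    return value[field], value
def mask_field(value, name, replacement=None):
    """Replace a sensitive field with None or a fixed value."""
    return {k: (replacement if k == name else v) for k, v in value.items()}
record = {"cust_id": 42, "email": "a@example.com", "tmp": "x"}
value = rename_field(record, "cust_id", "customer_id")
value = drop_field(value, "tmp")
value = mask_field(value, "email", "***")
key, value = value_to_key(value, "customer_id")
print(key, value)  # 42 {'customer_id': 42, 'email': '***'}
```
Each function looks at a single record and keeps nothing between records, which is the sense in which these transforms are stateless.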

672
00:34:39.079 --> 00:34:41.840
<v Speaker 1>That sounds really useful for basic cleanup or shaping.

673
00:34:42.039 --> 00:34:45.119
<v Speaker 2>It is for light work, but there's an important warning here.

674
00:34:45.519 --> 00:34:49.639
<v Speaker 2>SMTs are not a fully fledged ETL (extract, transform, load) tool.

675
00:34:50.119 --> 00:34:53.880
<v Speaker 2>They are designed for simple, stateless transformations on individual messages.

676
00:34:54.599 --> 00:34:59.199
<v Speaker 2>If you need complex transformations, stateful operations, joins between different

677
00:34:59.280 --> 00:35:04.360
<v Speaker 2>data sources, or heavy computation, SMTs are not the right tool.

678
00:35:04.280 --> 00:35:05.280
<v Speaker 1>So for the heavy lifting?

679
00:35:05.599 --> 00:35:09.320
<v Speaker 2>For that you typically turn to a dedicated stream processing framework.

680
00:35:09.400 --> 00:35:12.519
<v Speaker 1>Ah, stream processing. Perfect segue. Welcome to the world of

681
00:35:12.559 --> 00:35:15.280
<v Speaker 1>stream processing. What exactly is that and why is it

682
00:35:15.320 --> 00:35:17.840
<v Speaker 1>such a natural fit, such a big deal, when talking

683
00:35:17.840 --> 00:35:18.800
<v Speaker 1>about Kafka?

684
00:35:18.920 --> 00:35:23.360
<v Speaker 2>Stream processing is essentially about processing data continuously as it arrives,

685
00:35:23.679 --> 00:35:26.559
<v Speaker 2>typically in real time or near real time, instead of

686
00:35:26.599 --> 00:35:29.320
<v Speaker 2>collecting data into batches and processing it hours later, the

687
00:35:29.440 --> 00:35:33.280
<v Speaker 2>old way. Right, you process potentially unbounded streams of data

688
00:35:33.320 --> 00:35:36.039
<v Speaker 2>events as they flow through the system, like data flowing

689
00:35:36.039 --> 00:35:39.360
<v Speaker 2>through Kafka topics. And the benefit? This enables instant analysis,

690
00:35:39.599 --> 00:35:43.360
<v Speaker 2>immediate reactions to business changes, and the creation of applications

691
00:35:43.400 --> 00:35:45.920
<v Speaker 2>that are always up to date. Think back to our

692
00:35:45.960 --> 00:35:49.840
<v Speaker 2>online retailer. Instead of knowing total sales only at the

693
00:35:49.920 --> 00:35:53.000
<v Speaker 2>end of the day from a batch report, stream processing

694
00:35:53.000 --> 00:35:56.280
<v Speaker 2>allows them to see real time sales trends as they happen.

695
00:35:56.840 --> 00:36:00.920
<v Speaker 2>They can detect potentially fraudulent transactions within seconds, or trigger

696
00:36:01.000 --> 00:36:04.760
<v Speaker 2>personalized offers based on a customer's immediate browsing behavior on

697
00:36:04.800 --> 00:36:08.440
<v Speaker 2>their website. It unlocks truly real time capabilities.

698
00:36:08.679 --> 00:36:13.119
<v Speaker 1>That makes sense: moving from batch latency to real-time responsiveness.

699
00:36:13.480 --> 00:36:16.400
<v Speaker 1>What are some common frameworks people use for this with Kafka?

700
00:36:16.440 --> 00:36:19.920
<v Speaker 2>There are several powerful frameworks out there. Kafka Streams

701
00:36:19.960 --> 00:36:22.400
<v Speaker 2>is a very popular choice because it's actually a Java

702
00:36:22.480 --> 00:36:25.440
<v Speaker 2>library that's part of the Apache Kafka project itself. It

703
00:36:25.440 --> 00:36:28.599
<v Speaker 2>makes it easy to build stream processing applications that read

704
00:36:28.639 --> 00:36:29.360
<v Speaker 2>from and write to.

705
00:36:29.400 --> 00:36:30.880
<v Speaker 1>Kafka. Tightly integrated.

706
00:36:31.320 --> 00:36:34.679
<v Speaker 2>Very. Then you have other major open source players like

707
00:36:34.760 --> 00:36:37.880
<v Speaker 2>Apache Flink, which is known for its sophisticated state management

708
00:36:38.119 --> 00:36:42.199
<v Speaker 2>and event time processing capabilities. There's also Apache Spark Streaming,

709
00:36:42.320 --> 00:36:46.159
<v Speaker 2>though it's more micro-batch oriented, historically. And in other ecosystems,

710
00:36:46.159 --> 00:36:49.800
<v Speaker 2>you might see things like Scala's Akka Streams or Python's

711
00:36:49.800 --> 00:36:50.719
<v Speaker 2>Faust library.

712
00:36:50.880 --> 00:36:54.199
<v Speaker 1>Lots of choices. Let's maybe focus on Kafka Streams, since

713
00:36:54.239 --> 00:36:57.039
<v Speaker 1>it's part of Kafka. How do you actually process these

714
00:36:57.039 --> 00:36:59.320
<v Speaker 1>streams using it? What are the building blocks?

715
00:36:59.039 --> 00:37:02.400
<v Speaker 2>In Kafka Streams, you define your processing logic as

716
00:37:02.400 --> 00:37:05.920
<v Speaker 2>a topology of processors, kind of like chaining together operations,

717
00:37:06.400 --> 00:37:09.199
<v Speaker 2>you typically start with a source processor which reads data

718
00:37:09.280 --> 00:37:12.320
<v Speaker 2>from one or more Kafka topics into a stream called

719
00:37:12.320 --> 00:37:12.920
<v Speaker 2>a KStream.

720
00:37:13.000 --> 00:37:15.320
<v Speaker 1>Okay, get the data in. Then what?

721
00:37:15.199 --> 00:37:18.360
<v Speaker 2>Then you apply various transformation or processing steps. You can filter

722
00:37:18.519 --> 00:37:21.079
<v Speaker 2>messages based on some condition, maybe only keep orders with

723
00:37:21.079 --> 00:37:23.599
<v Speaker 2>a value over one hundred dollars. You can use map

724
00:37:23.719 --> 00:37:27.039
<v Speaker 2>or mapValues to transform the messages. mapValues just

725
00:37:27.239 --> 00:37:29.320
<v Speaker 2>changes the message value, while map can change both the

726
00:37:29.400 --> 00:37:31.519
<v Speaker 2>key and the value, or even the type of

727
00:37:31.440 --> 00:37:34.360
<v Speaker 1>the message. Useful for reshaping data. Definitely.

728
00:37:34.440 --> 00:37:38.000
<v Speaker 4>You can merge two different data streams together into one,

729
00:37:38.119 --> 00:37:41.320
<v Speaker 4>or you can split a single stream into multiple downstream

730
00:37:41.360 --> 00:37:44.760
<v Speaker 4>topics based on different conditions, like routing high value orders

731
00:37:44.760 --> 00:37:46.679
<v Speaker 4>to one topic and standard orders

732
00:37:46.519 --> 00:37:48.559
<v Speaker 1>to another. Branching the flow. Exactly.
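
NOTE
A rough sketch of the filter, mapValues, and branch steps described above, written in plain Python over a small list of (key, value) records. It assumes a finite list in place of an unbounded stream, and the helper names are hypothetical; this is not the Kafka Streams API, which is a Java library.
```python
def filter_stream(records, predicate):
    """Keep only the (key, value) records that satisfy the predicate."""
    return [(k, v) for k, v in records if predicate(k, v)]
def map_values(records, fn):
    """Transform each value; keys are left untouched."""
    return [(k, fn(v)) for k, v in records]
def branch(records, predicate):
    """Split one stream into (matching, non_matching) streams."""
    yes = [(k, v) for k, v in records if predicate(k, v)]
    no = [(k, v) for k, v in records if not predicate(k, v)]
    return yes, no
orders = [("o1", 250.0), ("o2", 40.0), ("o3", 120.0), ("o4", -5.0)]
valid = filter_stream(orders, lambda k, v: v > 0)
rounded = map_values(valid, round)
high_value, standard = branch(rounded, lambda k, v: v > 100)
print(high_value)  # [('o1', 250), ('o3', 120)]
print(standard)    # [('o2', 40)]
```
Chaining the helpers mirrors the idea of a topology: each step consumes the output of the previous one, and the branch at the end routes high-value and standard orders to separate outputs.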

733
00:37:48.960 --> 00:37:52.800
<v Speaker 2>And a very common and powerful operation is aggregation, like count.

734
00:37:53.239 --> 00:37:56.599
<v Speaker 2>This counts occurrences per key. For example, continuously counting

735
00:37:56.599 --> 00:37:58.719
<v Speaker 2>how many times each product was viewed or.

736
00:37:58.679 --> 00:38:01.400
<v Speaker 1>added to a cart. That sounds like it needs to

737
00:38:01.440 --> 00:38:02.639
<v Speaker 1>remember things over time.

738
00:38:02.800 --> 00:38:07.239
<v Speaker 2>It does. Operations like count, sum, and reduce are stateful. They

739
00:38:07.239 --> 00:38:10.000
<v Speaker 2>need to maintain and update some internal state based on

740
00:38:10.039 --> 00:38:14.280
<v Speaker 2>the incoming messages. Kafka Streams manages this state reliably. Interesting.

741
00:38:14.519 --> 00:38:17.320
<v Speaker 1>I've also heard of streaming SQL being used with Kafka.

742
00:38:17.519 --> 00:38:20.519
<v Speaker 1>Is that like running familiar SQL queries but on live,

743
00:38:21.000 --> 00:38:23.840
<v Speaker 1>constantly changing data streams instead of static tables.

744
00:38:23.880 --> 00:38:27.639
<v Speaker 2>Precisely. Streaming SQL offers a higher-level, declarative way to

745
00:38:27.679 --> 00:38:31.760
<v Speaker 2>define stream processing logic using SQL-like syntax. Frameworks like

746
00:38:31.840 --> 00:38:35.639
<v Speaker 2>ksqlDB (built on Kafka Streams) or Flink SQL allow you

747
00:38:35.639 --> 00:38:39.159
<v Speaker 2>to write queries like SELECT product_id, COUNT(*) FROM clicks GROUP

748
00:38:39.239 --> 00:38:40.960
<v Speaker 2>BY product_id directly on data streams.

749
00:38:41.199 --> 00:38:44.239
<v Speaker 1>But what does that query return? A stream doesn't have

750
00:38:44.280 --> 00:38:45.880
<v Speaker 1>an end. Good point.

751
00:38:46.119 --> 00:38:49.880
<v Speaker 2>Unlike a traditional database query that runs once and returns

752
00:38:49.920 --> 00:38:53.719
<v Speaker 2>a single final result set, a streaming SQL query typically

753
00:38:53.760 --> 00:38:57.199
<v Speaker 2>runs continuously and produces a new data stream of changes.

754
00:38:57.400 --> 00:38:58.480
<v Speaker 1>A stream of changes.

755
00:38:58.639 --> 00:39:01.599
<v Speaker 2>Yeah. So for that count query, the output stream

756
00:39:01.599 --> 00:39:05.480
<v Speaker 2>would contain messages indicating the updated count for a product ID

757
00:39:05.920 --> 00:39:08.440
<v Speaker 2>every time the count changes due to a new click

758
00:39:08.440 --> 00:39:11.239
<v Speaker 2>event arriving. It continuously refines the result.
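
NOTE
To make the "stream of changes" idea concrete, here is a small Python sketch of what a continuously running grouped count could emit. It is an assumption-laden illustration and not ksqlDB or Flink SQL: the generator name and the click values are made up, and a finite list stands in for an unbounded topic.
```python
from collections import defaultdict
def continuous_count(clicks):
    """Yield (product_id, updated_count) each time a click arrives."""
    counts = defaultdict(int)
    for product_id in clicks:
        counts[product_id] += 1
        yield product_id, counts[product_id]
updates = list(continuous_count(["shoes", "hat", "shoes", "shoes"]))
print(updates)  # [('shoes', 1), ('hat', 1), ('shoes', 2), ('shoes', 3)]
```
There is no single final answer here. Every input event produces one more output record carrying the latest count for its key.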

759
00:39:11.400 --> 00:39:13.800
<v Speaker 1>So the result itself is a stream. That's a different

760
00:39:13.840 --> 00:39:14.840
<v Speaker 1>way of thinking. It is.

761
00:39:15.039 --> 00:39:18.639
<v Speaker 2>And many frameworks like Flink SQL also support a headless

762
00:39:18.639 --> 00:39:21.360
<v Speaker 2>mode where you can deploy predefined SQL queries that

763
00:39:21.400 --> 00:39:24.880
<v Speaker 2>just run continuously in the background, perhaps writing their continuously

764
00:39:24.920 --> 00:39:28.400
<v Speaker 2>updated results back to another Kafka topic or an external database.

765
00:39:28.840 --> 00:39:31.960
<v Speaker 1>You mentioned stateful operations like counting and how Kafka

766
00:39:32.000 --> 00:39:35.079
<v Speaker 1>Streams manages state. You also hear about stream state

767
00:39:35.119 --> 00:39:38.760
<v Speaker 1>and tables. How do those fit in, especially with aggregations?

768
00:39:38.320 --> 00:39:41.239
<v Speaker 2>Right, stream processing frameworks need a way to reliably store

769
00:39:41.280 --> 00:39:46.000
<v Speaker 2>the state required for operations like aggregations, counts, sums, averages,

770
00:39:46.440 --> 00:39:49.800
<v Speaker 2>or for joins between streams. In Kafka Streams, this state

771
00:39:49.880 --> 00:39:52.239
<v Speaker 2>is typically stored in local state stores on the machine

772
00:39:52.320 --> 00:39:55.519
<v Speaker 2>running the application instance. These are often backed by embedded

773
00:39:55.599 --> 00:39:59.000
<v Speaker 2>databases like RocksDB for performance. Local storage.

774
00:39:59.159 --> 00:40:03.000
<v Speaker 1>What happens if the application instance crashes? Is the state

775
00:40:02.800 --> 00:40:07.159
<v Speaker 2>lost? Ah, good question. That's where Kafka's own reliability comes in.

776
00:40:07.639 --> 00:40:10.760
<v Speaker 2>These local state stores are backed by internal changelog

777
00:40:10.800 --> 00:40:12.199
<v Speaker 2>topics in Kafka

778
00:40:11.960 --> 00:40:13.280
<v Speaker 1>itself. Changelog topics?

779
00:40:13.400 --> 00:40:16.400
<v Speaker 2>Yes, every update made to the local state store is

780
00:40:16.440 --> 00:40:19.239
<v Speaker 2>also written as a message to a compacted Kafka topic.

781
00:40:19.800 --> 00:40:23.320
<v Speaker 2>If your application instance crashes and restarts, possibly on a

782
00:40:23.320 --> 00:40:27.000
<v Speaker 2>different machine, Kafka Streams can automatically restore its local state

783
00:40:27.320 --> 00:40:30.480
<v Speaker 2>by replaying the messages from that changelog topic. It makes

784
00:40:30.519 --> 00:40:32.199
<v Speaker 2>the state fault tolerant.
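
NOTE
The recovery path described here can be sketched in a few lines of Python. This is a toy model under stated assumptions: a plain list stands in for the compacted changelog topic, a dict stands in for the local state store, and the CountStore class is hypothetical and not part of Kafka Streams.
```python
class CountStore:
    """Hypothetical local state store that mirrors writes to a changelog."""
    def __init__(self, changelog):
        self.changelog = changelog   # stands in for a compacted Kafka topic
        self.state = {}
    def increment(self, key):
        self.state[key] = self.state.get(key, 0) + 1
        self.changelog.append((key, self.state[key]))  # record every update
    @classmethod
    def restore(cls, changelog):
        """Rebuild state by replaying the changelog; later values win."""
        store = cls(changelog)
        for key, value in changelog:
            store.state[key] = value
        return store
changelog = []
store = CountStore(changelog)
for product in ["shoes", "hat", "shoes"]:
    store.increment(product)
del store  # simulate a crash: the local state is gone
recovered = CountStore.restore(changelog)
print(recovered.state)  # {'shoes': 2, 'hat': 1}
```
Because only the latest value per key matters during replay, a compacted topic that keeps the newest record for each key is enough to rebuild the same state.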

785
00:40:31.840 --> 00:40:34.079
<v Speaker 1>Clever, using Kafka to back up the state of the

786
00:40:34.119 --> 00:40:35.840
<v Speaker 1>stream processor. Exactly. So

787
00:40:36.119 --> 00:40:40.039
<v Speaker 2>those aggregations, like sum or average, use these state stores

788
00:40:40.079 --> 00:40:43.239
<v Speaker 2>to keep track of the running calculation. The result of

789
00:40:43.280 --> 00:40:46.840
<v Speaker 2>an aggregation in Kafka Streams is often represented as a

790
00:40:46.880 --> 00:40:47.719
<v Speaker 2>KTable.

791
00:40:47.559 --> 00:40:49.400
<v Speaker 1>A KTable. What's that compared to a KStream?

792
00:40:49.480 --> 00:40:51.760
<v Speaker 2>Think of a KStream as representing the raw sequence

793
00:40:51.800 --> 00:40:54.800
<v Speaker 2>of events, the history: what happened. A KTable, on

794
00:40:54.840 --> 00:40:58.719
<v Speaker 2>the other hand, represents the current state derived from that stream,

795
00:40:58.760 --> 00:41:00.960
<v Speaker 2>like an up-to-date view or materialized view: what

796
00:41:01.039 --> 00:41:03.960
<v Speaker 2>is the current value? So the output of our count

797
00:41:04.159 --> 00:41:06.519
<v Speaker 2>aggregation would be a KTable where the key is

798
00:41:06.519 --> 00:41:08.880
<v Speaker 2>the product ID and the value is its latest

799
00:41:08.599 --> 00:41:12.239
<v Speaker 1>count. Stream of events, table of current state. Got it. And

800
00:41:12.320 --> 00:41:16.400
<v Speaker 1>what about combining data from different streams or enriching a

801
00:41:16.440 --> 00:41:19.280
<v Speaker 1>stream with data from a table. How do streaming joins work?

802
00:41:19.280 --> 00:41:19.880
<v Speaker 1>In this world?

803
00:41:20.159 --> 00:41:24.840
<v Speaker 2>Joins are essential for enriching data. Stream processing frameworks support

804
00:41:24.920 --> 00:41:28.000
<v Speaker 2>various types of joins. You can do stream-table joins.

805
00:41:28.119 --> 00:41:30.760
<v Speaker 2>This is common for enrichment. Imagine you have a

806
00:41:30.840 --> 00:41:34.039
<v Speaker 2>KStream of order events and a KTable containing customer

807
00:41:34.079 --> 00:41:37.880
<v Speaker 2>profile information keyed by customer ID. You can join the

808
00:41:37.960 --> 00:41:40.400
<v Speaker 2>order stream with the customer table to add the customer's

809
00:41:40.480 --> 00:41:43.519
<v Speaker 2>name and address to each order event as it flows through.

810
00:41:44.199 --> 00:41:46.760
<v Speaker 2>The join is typically triggered when a new event arrives

811
00:41:46.760 --> 00:41:49.639
<v Speaker 2>on the stream and it looks up the corresponding key.

812
00:41:49.519 --> 00:41:50.039
<v Speaker 4>in the table.
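
NOTE
A minimal Python sketch of the stream-table enrichment just described. It assumes the table is simply a dict holding the latest profile per customer ID and the stream is a list of (key, value) order events; the function name and sample data are hypothetical, and inner-join behaviour (dropping orders with no profile) is a choice made for this example.
```python
def stream_table_join(orders, customers):
    """For each order event, look up the customer by key and enrich it."""
    for customer_id, order in orders:
        profile = customers.get(customer_id)
        if profile is not None:  # inner join: skip orders with no profile
            yield customer_id, {**order, **profile}
customers = {"c1": {"name": "Ada", "city": "London"}}  # latest value per key
orders = [("c1", {"order_id": 1, "total": 30}), ("c9", {"order_id": 2, "total": 5})]
enriched = list(stream_table_join(orders, customers))
print(enriched)
# [('c1', {'order_id': 1, 'total': 30, 'name': 'Ada', 'city': 'London'})]
```
The lookup is driven by the arrival of each stream event, which matches the description of the join being triggered from the stream side.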

813
00:41:50.239 --> 00:41:54.000
<v Speaker 1>Okay, enriching events with static-ish data. Right.

814
00:41:54.480 --> 00:41:57.440
<v Speaker 2>You can also do table-table joins, where changes in

815
00:41:57.480 --> 00:42:00.559
<v Speaker 2>either table can trigger updates to the joined result. This

816
00:42:00.679 --> 00:42:03.880
<v Speaker 2>is useful for combining two evolving data sets. And then

817
00:42:03.920 --> 00:42:07.639
<v Speaker 2>you have stream-stream joins, joining two potentially infinite streams

818
00:42:07.639 --> 00:42:10.920
<v Speaker 2>of events. This usually requires defining a time window, because

819
00:42:10.920 --> 00:42:12.880
<v Speaker 2>you need to specify how long the system should wait

820
00:42:12.920 --> 00:42:15.000
<v Speaker 2>for a matching event to arrive on the other stream

821
00:42:15.239 --> 00:42:18.800
<v Speaker 2>before giving up. For example, joining ad impressions with ad

822
00:42:18.800 --> 00:42:21.280
<v Speaker 2>clicks based on a user ID within, say, a five

823
00:42:21.280 --> 00:42:21.840
<v Speaker 2>minute window.

824
00:42:21.960 --> 00:42:26.000
<v Speaker 1>Windows become crucial for stream stream joins. And you mentioned

825
00:42:26.000 --> 00:42:29.519
<v Speaker 1>earlier that for joins to work efficiently, data often needs

826
00:42:29.559 --> 00:42:32.880
<v Speaker 1>to be co-partitioned. Can you remind us what that means? Sure.

827
00:42:33.000 --> 00:42:37.119
<v Speaker 2>Co-partitioning is a prerequisite for efficient joins and some aggregations

828
00:42:37.400 --> 00:42:41.280
<v Speaker 2>in many distributed stream processing systems, including Kafka Streams.

829
00:42:41.599 --> 00:42:44.119
<v Speaker 2>It means that records from the different topics being joined,

830
00:42:44.400 --> 00:42:47.639
<v Speaker 2>which share the same join key, must reside in partitions

831
00:42:47.679 --> 00:42:49.480
<v Speaker 2>with the same ID number across

832
00:42:49.119 --> 00:42:52.159
<v Speaker 1>those topics. Same key, same partition number, even if in

833
00:42:52.199 --> 00:42:53.400
<v Speaker 1>different topics. Exactly.

834
00:42:53.480 --> 00:42:56.039
<v Speaker 2>Think of it like this. If you have customer orders

835
00:42:56.039 --> 00:42:59.760
<v Speaker 2>in one Kafka topic (orders) and customer addresses in another topic

836
00:43:00.079 --> 00:43:04.480
<v Speaker 2>(addresses), both potentially partitioned across multiple brokers. For Kafka Streams

837
00:43:04.480 --> 00:43:07.239
<v Speaker 2>to efficiently join an order with its corresponding address based

838
00:43:07.280 --> 00:43:09.760
<v Speaker 2>on customer ID, it needs to ensure that the order for

839
00:43:09.800 --> 00:43:12.679
<v Speaker 2>customer 123 (key 123) and

840
00:43:12.719 --> 00:43:14.679
<v Speaker 2>the address for customer 123 (key 123)

841
00:43:14.719 --> 00:43:18.760
<v Speaker 2>both land in, say, partition five of their respective topics.

842
00:43:18.800 --> 00:43:22.519
<v Speaker 2>Why is that necessary? Because then the Kafka Streams task

843
00:43:22.639 --> 00:43:26.440
<v Speaker 2>responsible for processing partition five will have local access to

844
00:43:26.480 --> 00:43:29.119
<v Speaker 2>both the order and the address data for customer one

845
00:43:29.199 --> 00:43:32.800
<v Speaker 2>twenty three. It doesn't need to make slow network calls

846
00:43:32.800 --> 00:43:36.039
<v Speaker 2>to fetch data from other partitions or other instances. It

847
00:43:36.119 --> 00:43:38.639
<v Speaker 2>keeps the join operation efficient and scalable.

848
00:43:38.920 --> 00:43:41.760
<v Speaker 1>It's like ensuring related files are in the same filing

849
00:43:41.760 --> 00:43:44.679
<v Speaker 1>cabinet drawer, even across different cabinets, so you only have

850
00:43:44.719 --> 00:43:45.559
<v Speaker 1>to look in one place.

851
00:43:45.679 --> 00:43:49.639
<v Speaker 2>That's a perfect analogy. If data isn't naturally co-partitioned

852
00:43:49.639 --> 00:43:52.760
<v Speaker 2>by key when it arrives, Kafka Streams often needs to

853
00:43:52.800 --> 00:43:57.239
<v Speaker 2>perform an internal repartitioning step, which involves writing the data

854
00:43:57.280 --> 00:44:01.400
<v Speaker 2>to an intermediate correctly partitioned topic before the join can happen.

855
00:44:01.920 --> 00:44:04.400
<v Speaker 2>This adds some overhead, but ensures correctness.
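
NOTE
A small Python sketch of why co-partitioning works when both topics use the same key, the same partitioner, and the same partition count. CRC32 is used here purely as a stand-in hash for illustration; it is an assumption of this sketch and not the hash Kafka's default partitioner uses. The function name and key are also hypothetical.
```python
import zlib
def partition_for(key, num_partitions):
    """Map a key to a partition; CRC32 stands in for Kafka's real hash."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions
key = "customer-123"
# Same key, same partition count, same partitioner: the partitions line up.
orders_partition = partition_for(key, 6)
addresses_partition = partition_for(key, 6)
print(orders_partition == addresses_partition)  # True
# With different partition counts the numbers are not guaranteed to match,
# which is the situation where a repartitioning step would be needed.
print(partition_for(key, 6), partition_for(key, 8))
```
Repartitioning amounts to rewriting one side through the same function with a matching partition count, so that equal keys end up under equal partition numbers.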

856
00:44:04.599 --> 00:44:08.360
<v Speaker 1>That co-partitioning requirement makes perfect sense for distributed joins.

857
00:44:08.559 --> 00:44:11.880
<v Speaker 1>Now time itself seems like a really crucial and potentially

858
00:44:11.920 --> 00:44:15.679
<v Speaker 1>tricky concept in stream processing. You mentioned event time earlier.

859
00:44:15.719 --> 00:44:17.639
<v Speaker 1>What are the different time concepts we need to be

860
00:44:17.639 --> 00:44:19.800
<v Speaker 1>aware of and why does it matter which one you use?

861
00:44:20.199 --> 00:44:23.800
<v Speaker 2>Yes, understanding time is absolutely vital, and choosing the wrong

862
00:44:23.880 --> 00:44:27.960
<v Speaker 2>time semantic can lead to inaccurate results. There are generally

863
00:44:28.039 --> 00:44:32.239
<v Speaker 2>four key time concepts people talk about. First, event time.

864
00:44:33.039 --> 00:44:35.639
<v Speaker 2>This is when the event actually occurred in the real world,

865
00:44:35.880 --> 00:44:38.559
<v Speaker 2>like the timestamp generated by the sensor when it took

866
00:44:38.559 --> 00:44:40.960
<v Speaker 2>a reading, or the exact moment a customer clicked a

867
00:44:40.960 --> 00:44:42.400
<v Speaker 2>button on the website.

868
00:44:42.119 --> 00:44:43.039
<v Speaker 1>The time it happened.

869
00:44:43.079 --> 00:44:46.800
<v Speaker 2>Second, create time. This is the time when the producer

870
00:44:46.840 --> 00:44:50.280
<v Speaker 2>application created the Kafka message. This is often close to

871
00:44:50.320 --> 00:44:53.000
<v Speaker 2>event time, but could be later if there's a delay

872
00:44:53.000 --> 00:44:56.039
<v Speaker 2>in the producing system. This is actually the default

873
00:44:56.079 --> 00:44:59.719
<v Speaker 2>timestamp stored in Kafka messages if the producer doesn't explicitly

874
00:44:59.719 --> 00:45:03.719
<v Speaker 2>set one. Third, log append time. This is the

875
00:45:03.760 --> 00:45:06.119
<v Speaker 2>time when the Kafka broker received the message and appended

876
00:45:06.159 --> 00:45:09.000
<v Speaker 2>it to the partition's log. This timestamp is assigned by

877
00:45:09.039 --> 00:45:12.440
<v Speaker 2>the broker itself. Messages within a partition are strictly ordered

878
00:45:12.440 --> 00:45:13.000
<v Speaker 2>by log

879
00:45:12.800 --> 00:45:15.199
<v Speaker 1>append time. Okay, broker received time.

880
00:45:15.159 --> 00:45:19.280
<v Speaker 2>And finally, stream time or sometimes called processing time. This

881
00:45:19.320 --> 00:45:21.880
<v Speaker 2>is the time when the message is actually processed by

882
00:45:21.880 --> 00:45:23.400
<v Speaker 2>the stream processing application

883
00:45:23.519 --> 00:45:23.880
<v Speaker 1>instance.

884
00:45:24.599 --> 00:45:26.599
<v Speaker 2>This is usually the latest of all the time stamps

885
00:45:26.719 --> 00:45:29.679
<v Speaker 2>and can be affected by processing delays, network latency, etc.

886
00:45:30.159 --> 00:45:33.519
<v Speaker 1>So event time, create time, log append time, stream

887
00:45:33.559 --> 00:45:36.360
<v Speaker 1>processing time. Why does the choice matter so much?

888
00:45:36.760 --> 00:45:40.199
<v Speaker 2>It matters because if you want accurate results that reflect

889
00:45:40.199 --> 00:45:43.079
<v Speaker 2>the real world sequence of events, especially when dealing with

890
00:45:43.199 --> 00:45:45.280
<v Speaker 2>data that might arrive late or out of order, which

891
00:45:45.320 --> 00:45:47.760
<v Speaker 2>is common in distributed systems, you generally want to use

892
00:45:47.800 --> 00:45:48.639
<v Speaker 2>event time.

893
00:45:48.800 --> 00:45:50.960
<v Speaker 1>Even if messages arrive out of sequence.

894
00:45:51.079 --> 00:45:54.519
<v Speaker 2>Yes, processing based on event time allows the system to

895
00:45:54.599 --> 00:45:57.960
<v Speaker 2>correctly handle out of order data and produce results consistent

896
00:45:58.000 --> 00:46:01.840
<v Speaker 2>with when things actually happened. For example, if you're calculating hourly

897
00:46:01.920 --> 00:46:05.079
<v Speaker 2>sales totals, using event time ensures a sale that occurred

898
00:46:05.079 --> 00:46:07.519
<v Speaker 2>at 10:05 a.m. but arrived late

899
00:46:07.519 --> 00:46:10.280
<v Speaker 2>at 11:05 a.m. still gets counted in

900
00:46:10.320 --> 00:46:13.000
<v Speaker 2>the 10:00 to 11:00 a.m. window.

901
00:46:13.559 --> 00:46:16.079
<v Speaker 2>Using processing time would put it in the wrong hour.
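
NOTE
The late-sale example can be worked through in Python. This sketch assigns each sale to an hourly window by its event time and deliberately ignores the arrival time. The function names, the date, and the amounts are invented for illustration, and no late-data cutoff is modelled.
```python
from collections import defaultdict
from datetime import datetime, timedelta
def window_start(event_time, size):
    """Start of the tumbling window that contains event_time."""
    epoch = datetime(1970, 1, 1)
    return epoch + ((event_time - epoch) // size) * size
def hourly_totals(sales):
    """Sum amounts per hourly window, using event time and ignoring arrival time."""
    totals = defaultdict(float)
    for event_time, arrival_time, amount in sales:
        totals[window_start(event_time, timedelta(hours=1))] += amount
    return dict(totals)
sales = [
    (datetime(2024, 1, 1, 10, 5), datetime(2024, 1, 1, 11, 5), 20.0),   # arrived late
    (datetime(2024, 1, 1, 10, 40), datetime(2024, 1, 1, 10, 40), 5.0),
]
totals = hourly_totals(sales)
print(totals[datetime(2024, 1, 1, 10, 0)])  # 25.0
```
Keying the same sum by arrival_time instead would move the 20.0 sale into the 11:00 window, which is the inaccuracy being described.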

902
00:46:16.320 --> 00:46:19.480
<v Speaker 1>Ah. That makes a huge difference for accuracy. But processing

903
00:46:19.519 --> 00:46:21.360
<v Speaker 1>based on event time sounds more complex.

904
00:46:21.639 --> 00:46:25.159
<v Speaker 2>It is more complex. The stream processor needs mechanisms to

905
00:46:25.199 --> 00:46:28.760
<v Speaker 2>handle potentially late arriving data, often using concepts like

906
00:46:28.800 --> 00:46:32.119
<v Speaker 2>watermarks to track the progress of event time and decide

907
00:46:32.119 --> 00:46:34.599
<v Speaker 2>when it's safe to finalize calculations for a given time

908
00:46:34.639 --> 00:46:37.760
<v Speaker 2>window. Processing time is simpler, but less accurate for many

909
00:46:37.840 --> 00:46:38.440
<v Speaker 2>use cases.

910
00:46:38.599 --> 00:46:41.519
<v Speaker 1>Okay, that clarifies the time concepts. And building on that,

911
00:46:41.599 --> 00:46:44.119
<v Speaker 1>you mentioned time windows are often used with stream processing,

912
00:46:44.199 --> 00:46:47.599
<v Speaker 1>especially for aggregations or joins. What are some common types

913
00:46:47.639 --> 00:46:49.920
<v Speaker 1>of windows, and when would you use them?

914
00:46:49.960 --> 00:46:53.960
<v Speaker 2>Windows are fundamental for defining boundaries for calculations on unbounded streams.

915
00:46:54.039 --> 00:46:57.679
<v Speaker 2>Common types include tumbling windows. These are fixed-size,

916
00:46:57.679 --> 00:47:01.239
<v Speaker 2>non-overlapping windows. Think of them like slicing time into consecutive chunks,

917
00:47:01.400 --> 00:47:04.840
<v Speaker 2>for example, calculating total sales for each distinct hour:

918
00:47:04.920 --> 00:47:08.639
<v Speaker 2>10:00 to 11:00, 11:00 to 12:00,

919
00:47:08.639 --> 00:47:10.920
<v Speaker 2>et cetera. Great for periodic

920
00:47:10.519 --> 00:47:12.599
<v Speaker 1>reports. Fixed, separate blocks. Got it.

921
00:47:12.840 --> 00:47:16.239
<v Speaker 2>Then you have sliding windows. These also have a fixed length,

922
00:47:16.599 --> 00:47:20.000
<v Speaker 2>but they slide forward continuously by a specified slide interval,

923
00:47:20.159 --> 00:47:23.920
<v Speaker 2>meaning the windows overlap. For example, calculating the average website

924
00:47:24.000 --> 00:47:27.159
<v Speaker 2>response time over the last five minutes updated every one minute.

925
00:47:27.400 --> 00:47:30.599
<v Speaker 2>Useful for monitoring moving averages or recent trends.

926
00:47:30.559 --> 00:47:33.119
<v Speaker 1>Overlapping, continuously updated view.

927
00:47:32.719 --> 00:47:36.159
<v Speaker 2>Right. There are also hopping windows. These are similar to

928
00:47:36.199 --> 00:47:39.480
<v Speaker 2>sliding windows (fixed length, overlapping), but defined by both the

929
00:47:39.519 --> 00:47:43.599
<v Speaker 2>window size and a fixed advancement interval the hop For example,

930
00:47:43.679 --> 00:47:46.559
<v Speaker 2>calculating a daily report covering the last seven days where

931
00:47:46.559 --> 00:47:49.039
<v Speaker 2>the window is seven days long and it hops forward

932
00:47:49.039 --> 00:47:50.119
<v Speaker 2>by one day each day.
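
NOTE
The tumbling and hopping windows described above come down to simple timestamp arithmetic.
The following is a minimal sketch in plain Python, not the Kafka Streams API.
The function names and the millisecond timestamps are assumptions made for illustration.
```python
# Sketch only: window arithmetic in plain Python, not the Kafka Streams API.
def tumbling_window(ts_ms, size_ms):
    """Return the (start, end) of the single tumbling window containing ts_ms."""
    start = ts_ms - (ts_ms % size_ms)
    return (start, start + size_ms)
#
def hopping_windows(ts_ms, size_ms, advance_ms):
    """Return every (start, end) hopping window that contains ts_ms."""
    windows = []
    # Latest window start at or before ts_ms, aligned to the advance interval.
    start = ts_ms - (ts_ms % advance_ms)
    # A window [start, start + size) contains ts_ms while start > ts_ms - size.
    while start > ts_ms - size_ms and start >= 0:
        windows.append((start, start + size_ms))
        start -= advance_ms
    return sorted(windows)
#
HOUR = 60 * 60 * 1000
MINUTE = 60 * 1000
# An event at 10:42 falls into exactly one hourly tumbling window, 10:00 to 11:00.
event = 10 * HOUR + 42 * MINUTE
print(tumbling_window(event, HOUR))
# With a 5-minute window advancing every minute, the same event lands in 5 windows.
print(len(hopping_windows(event, 5 * MINUTE, MINUTE)))
```
When the advance equals the window size, the hopping case collapses to the tumbling case, which is why tumbling windows are often described as a special case of hopping windows.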

933
00:47:50.400 --> 00:47:53.239
<v Speaker 1>Okay, like a sliding window, but maybe advancing less frequently?

934
00:47:53.599 --> 00:47:53.920
<v Speaker 2>Kind of.

935
00:47:54.239 --> 00:47:57.840
<v Speaker 2>And finally, session windows. These are quite different. Their boundaries

936
00:47:57.880 --> 00:48:00.360
<v Speaker 2>aren't based on fixed time intervals, but on periods of

937
00:48:00.360 --> 00:48:02.079
<v Speaker 2>inactivity in the data stream, grouped by key.

938
00:48:02.079 --> 00:48:04.480
<v Speaker 1>Inactivity? How does that work?

939
00:48:04.760 --> 00:48:08.159
<v Speaker 2>You define a session gap duration. For a given key,

940
00:48:08.599 --> 00:48:11.639
<v Speaker 2>like a user ID, all events arriving within that gap

941
00:48:11.719 --> 00:48:14.559
<v Speaker 2>duration of each other are grouped into the same session window.

942
00:48:15.239 --> 00:48:17.679
<v Speaker 2>If no event arrives for that key for longer than

943
00:48:17.679 --> 00:48:20.639
<v Speaker 2>the gap, the session is considered closed and the next

944
00:48:20.639 --> 00:48:24.199
<v Speaker 2>event starts a new session. Perfect for tracking user sessions

945
00:48:24.199 --> 00:48:27.280
<v Speaker 2>on a website, where a session ends after, say, thirty minutes

946
00:48:27.280 --> 00:48:28.800
<v Speaker 2>of no clicks from that user.
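
NOTE
The session-gap rule just described can be sketched as a small grouping function.
This is an illustration in plain Python under simplifying assumptions: one key, timestamps in minutes, and all events available up front.
The function name and the sample click times are hypothetical.
```python
def sessionize(timestamps, gap):
    """Group one key's event timestamps into sessions split by pauses longer than `gap`."""
    sessions = []
    for ts in sorted(timestamps):
        # Extend the current session while the pause since its last event is within the gap.
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions
#
# Click times in minutes for one hypothetical user, with a 30-minute session gap.
clicks = [0, 5, 20, 60, 70, 130]
print(sessionize(clicks, gap=30))
```
The 40-minute pause after minute 20 and the 60-minute pause after minute 70 each close a session, so the six clicks form three sessions.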

947
00:48:29.000 --> 00:48:32.960
<v Speaker 1>That's really clever for activity-based analysis. Lots of windowing options.

948
00:48:33.000 --> 00:48:36.559
<v Speaker 1>So we have all these powerful stream processing capabilities with

949
00:48:36.639 --> 00:48:41.920
<v Speaker 1>Kafka Streams: transformations, state management, joins, windows. How does Kafka

950
00:48:41.960 --> 00:48:44.960
<v Speaker 1>Streams actually achieve parallelization for all this work? If I

951
00:48:45.039 --> 00:48:48.280
<v Speaker 1>run multiple instances of my streaming application, how do they coordinate?

952
00:48:48.679 --> 00:48:53.079
<v Speaker 2>Kafka Streams leverages Kafka's own partitioning model for parallelization, which

953
00:48:53.119 --> 00:48:56.440
<v Speaker 2>is really elegant. It effectively splits the processing topology, your

954
00:48:56.519 --> 00:49:00.000
<v Speaker 2>chain of KStreams and KTables, into independent units

955
00:49:00.039 --> 00:49:04.000
<v Speaker 2>called tasks. Each task is responsible for processing data from

956
00:49:04.039 --> 00:49:07.360
<v Speaker 2>one or more specific partitions of the input Kafka topics.

957
00:49:07.119 --> 00:49:10.960
<v Speaker 1>So one task per input partition, roughly?

958
00:49:10.639 --> 00:49:14.559
<v Speaker 2>Generally, yes, although a task might process partitions from multiple topics

959
00:49:14.679 --> 00:49:17.719
<v Speaker 2>if they're part of a join or merge. These tasks

960
00:49:17.760 --> 00:49:22.519
<v Speaker 2>are the smallest unit of parallelization. Kafka Streams then automatically

961
00:49:22.519 --> 00:49:25.639
<v Speaker 2>distributes these tasks as evenly as possible across all the

962
00:49:25.719 --> 00:49:28.559
<v Speaker 2>running instances of your application that share the same

963
00:49:28.679 --> 00:49:29.679
<v Speaker 2>application.id.

964
00:49:29.840 --> 00:49:33.000
<v Speaker 1>Ah, the application.id links the instances together.

965
00:49:32.719 --> 00:49:34.840
<v Speaker 2>Correct. It acts like the consumer group ID we

966
00:49:34.920 --> 00:49:39.000
<v Speaker 2>discussed earlier. If you have say, ten partitions in your

967
00:49:39.000 --> 00:49:41.880
<v Speaker 2>input topic, and you run five instances of your Kafka

968
00:49:41.960 --> 00:49:45.880
<v Speaker 2>Streams application with the same ID, Kafka Streams will assign

969
00:49:45.960 --> 00:49:49.360
<v Speaker 2>two tasks and thus two partitions to each instance to

970
00:49:49.400 --> 00:49:50.400
<v Speaker 2>process in parallel.
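
NOTE
The ten-partition, five-instance example can be checked with a toy assignment function.
This sketch assumes one task per partition and plain round-robin placement; the real Kafka Streams assignor also considers state and stickiness, so treat it as an illustration only.
The instance names are hypothetical.
```python
def assign_tasks(num_partitions, instances):
    """Spread one task per input partition across instances, round-robin."""
    assignment = {name: [] for name in instances}
    for task_id in range(num_partitions):
        owner = instances[task_id % len(instances)]
        assignment[owner].append(task_id)
    return assignment
#
instances = ["app-1", "app-2", "app-3", "app-4", "app-5"]
before = assign_tasks(10, instances)
# Simulate losing one instance: every task is still owned by a survivor.
after = assign_tasks(10, instances[:-1])
print({name: len(tasks) for name, tasks in before.items()})
print({name: len(tasks) for name, tasks in after.items()})
```
With five instances each one owns two tasks; with four survivors the split becomes three, three, two, two, and no task is left unowned.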

971
00:49:50.880 --> 00:49:54.880
<v Speaker 1>And what happens if one of those application instances fails?

972
00:49:54.760 --> 00:49:58.519
<v Speaker 2>Kafka Streams handles that automatically too, using Kafka's underlying consumer group

973
00:49:58.599 --> 00:50:01.760
<v Speaker 2>rebalancing protocol. If an instance fails or leaves the group,

974
00:50:01.880 --> 00:50:06.239
<v Speaker 2>its tasks are automatically redistributed among the remaining healthy instances. Similarly,

975
00:50:06.280 --> 00:50:08.760
<v Speaker 2>if you add a new instance, tasks will be migrated

976
00:50:08.760 --> 00:50:11.760
<v Speaker 2>to it to rebalance the load. This provides elasticity and

977
00:50:11.840 --> 00:50:14.519
<v Speaker 2>fault tolerance for your stream processing.

978
00:50:14.679 --> 00:50:17.920
<v Speaker 1>Very resilient. You mentioned repartitioning earlier for joins. Does that involve these

979
00:50:17.960 --> 00:50:18.559
<v Speaker 1>tasks too?

980
00:50:18.760 --> 00:50:22.599
<v Speaker 2>Yes. For operations like key-based joins or aggregations (group

981
00:50:22.639 --> 00:50:25.360
<v Speaker 2>by key, count, et cetera) that require data to be

982
00:50:25.400 --> 00:50:28.559
<v Speaker 2>grouped by key, Kafka Streams might need to perform that

983
00:50:28.639 --> 00:50:31.920
<v Speaker 2>repartitioning step we talked about. This is done internally by

984
00:50:31.920 --> 00:50:35.360
<v Speaker 2>writing the relevant data stream to a special intermediate Kafka

985
00:50:35.440 --> 00:50:39.320
<v Speaker 2>topic, often called a repartition topic, which is correctly partitioned

986
00:50:39.320 --> 00:50:42.719
<v Speaker 2>by the required key. Then downstream tasks read from this

987
00:50:42.800 --> 00:50:46.480
<v Speaker 2>repartition topic. This effectively shuffles the data across the tasks

988
00:50:46.519 --> 00:50:49.239
<v Speaker 2>based on the key, ensuring all messages for a specific

989
00:50:49.360 --> 00:50:51.880
<v Speaker 2>key are processed by the same task, even if they

990
00:50:51.920 --> 00:50:55.239
<v Speaker 2>originally came from different input partitions. This might split your

991
00:50:55.239 --> 00:50:58.719
<v Speaker 2>overall processing logic into what Kafka Streams calls sub-topologies,

992
00:50:59.039 --> 00:51:03.159
<v Speaker 2>connected by these internal repartition topics, optimizing data flow and correctness.

993
00:51:03.400 --> 00:51:07.199
<v Speaker 1>That's incredibly powerful and quite sophisticated under the hood, allowing

994
00:51:07.239 --> 00:51:10.679
<v Speaker 1>complex stateful processing to scale out and be fault tolerant.

995
00:51:11.320 --> 00:51:14.719
<v Speaker 1>But with all this power comes responsibility, right? We've built

996
00:51:14.719 --> 00:51:19.119
<v Speaker 1>this amazing real time data nervous system. Let's talk about

997
00:51:19.159 --> 00:51:23.599
<v Speaker 1>management and security. How do you keep Kafka healthy, compliant,

998
00:51:23.639 --> 00:51:27.039
<v Speaker 1>and secure, especially in a large production environment with many

999
00:51:27.079 --> 00:51:27.840
<v Speaker 1>teams using it?

1000
00:51:28.039 --> 00:51:31.800
<v Speaker 2>You're absolutely right. Once Kafka becomes central to your data architecture,

1001
00:51:32.000 --> 00:51:35.880
<v Speaker 2>effective governance becomes crucial. Without it, things can quickly descend

1002
00:51:35.920 --> 00:51:36.519
<v Speaker 2>into chaos.

1003
00:51:36.679 --> 00:51:38.000
<v Speaker 1>Chaos? How?

1004
00:51:38.039 --> 00:51:41.000
<v Speaker 2>Well, imagine different teams producing data to the same topic but

1005
00:51:41.119 --> 00:51:44.960
<v Speaker 2>using slightly different formats or field names. Without agreement, downstream

1006
00:51:44.960 --> 00:51:49.000
<v Speaker 2>consumers break, data becomes inconsistent, trust erodes. You risk

1007
00:51:49.039 --> 00:51:51.360
<v Speaker 2>that garbage-in, garbage-out scenario we mentioned.

1008
00:51:50.880 --> 00:51:53.599
<v Speaker 1>So defining clear rules is step one.

1009
00:51:53.920 --> 00:51:56.880
<v Speaker 2>Yes, even if you don't use a formal tool initially,

1010
00:51:57.800 --> 00:52:03.000
<v Speaker 2>data always has a schema, implicit or explicit. Documenting and

1011
00:52:03.079 --> 00:52:06.719
<v Speaker 2>agreeing on schemas between producers and consumers is paramount. For

1012
00:52:06.800 --> 00:52:10.119
<v Speaker 2>our online retailer, the team producing order events needs to

1013
00:52:10.239 --> 00:52:14.280
<v Speaker 2>agree with the teams consuming them (fulfillment, analytics, etc.) on

1014
00:52:14.440 --> 00:52:17.679
<v Speaker 2>exactly what fields are present, their types, and whether they

1015
00:52:17.679 --> 00:52:18.960
<v Speaker 2>are required or optional.

1016
00:52:19.239 --> 00:52:23.239
<v Speaker 1>Okay, agree on the schema. But schemas evolve. What about

1017
00:52:23.239 --> 00:52:25.719
<v Speaker 1>handling changes? How do you ensure a change made by

1018
00:52:25.760 --> 00:52:27.679
<v Speaker 1>one team doesn't break everyone else?

1019
00:52:27.840 --> 00:52:30.719
<v Speaker 2>That's where schema compatibility levels come into play. These are

1020
00:52:30.760 --> 00:52:33.559
<v Speaker 2>rules that define how schemas are allowed to evolve over time.

1021
00:52:33.840 --> 00:52:36.679
<v Speaker 2>They cover changes like adding or deleting fields or

1022
00:52:36.760 --> 00:52:37.360
<v Speaker 2>changing types.

1023
00:52:37.440 --> 00:52:38.480
<v Speaker 1>What are the common levels?

1024
00:52:38.519 --> 00:52:41.920
<v Speaker 2>You typically have no compatibility, or NONE: anything goes.

1025
00:52:42.199 --> 00:52:45.400
<v Speaker 2>Any change is allowed, even breaking ones like renaming a

1026
00:52:45.519 --> 00:52:50.480
<v Speaker 2>required field. Very risky. Backward compatibility: new schemas must be

1027
00:52:50.519 --> 00:52:53.440
<v Speaker 2>readable by applications using older schemas. This usually means you

1028
00:52:53.480 --> 00:52:56.440
<v Speaker 2>can add optional fields or delete existing fields, but not

1029
00:52:56.480 --> 00:53:00.400
<v Speaker 2>add required fields or rename existing ones. Consumers running older

1030
00:53:00.400 --> 00:53:02.360
<v Speaker 2>code won't break when encountering new data.

1031
00:53:02.440 --> 00:53:05.800
<v Speaker 1>Okay, new consumers can read old data. Wait, no, other

1032
00:53:05.840 --> 00:53:09.159
<v Speaker 1>way around. Old consumers can read new data. Let me rephrase.

1033
00:53:09.639 --> 00:53:13.440
<v Speaker 1>Backward means consumers using the new schema can still process

1034
00:53:13.519 --> 00:53:16.599
<v Speaker 1>data written with the old schema. You can add optional

1035
00:53:16.639 --> 00:53:18.400
<v Speaker 1>fields or delete fields.

1036
00:53:18.559 --> 00:53:22.280
<v Speaker 2>Ah, yes, let me clarify that. Backward compatibility means consumers using

1037
00:53:22.360 --> 00:53:25.360
<v Speaker 2>the new schema can process data produced with the old schema.

1038
00:53:25.719 --> 00:53:29.440
<v Speaker 2>This usually allows deleting fields or making required fields optional.

1039
00:53:29.719 --> 00:53:32.239
<v Speaker 1>Got it, new code reads old data.

1040
00:53:32.320 --> 00:53:36.000
<v Speaker 2>Then forward compatibility: consumers using an old schema can process

1041
00:53:36.079 --> 00:53:39.199
<v Speaker 2>data produced with the new schema. This usually allows adding

1042
00:53:39.239 --> 00:53:43.480
<v Speaker 2>new fields, typically optional, or making optional fields required. Old

1043
00:53:43.480 --> 00:53:45.239
<v Speaker 2>code won't break when seeing new data formats.

1044
00:53:45.239 --> 00:53:47.840
<v Speaker 1>Old code reads new data.

1045
00:53:47.880 --> 00:53:51.880
<v Speaker 2>And finally, full compatibility. This combines both backward and forward.

1046
00:53:52.119 --> 00:53:54.559
<v Speaker 2>Both new and old consumers can read both new and

1047
00:53:54.679 --> 00:53:58.039
<v Speaker 2>old data. This is the safest but often the most restrictive,

1048
00:53:58.280 --> 00:54:01.239
<v Speaker 2>typically only allowing adding or removing optional fields.
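
NOTE
The backward, forward, and full levels can be illustrated with a toy checker.
This sketch models a schema as a dict of field names with a required flag and ignores type changes; it is an assumption-laden simplification, not how any schema registry actually implements its checks.
The field names and helper functions are hypothetical.
```python
def can_read(reader, writer):
    """True if every field the reader needs is either written or optional."""
    return all(name in writer or not spec["required"] for name, spec in reader.items())
#
def compatibility(old, new):
    """Classify a schema change using the direction rules from the discussion."""
    backward = can_read(reader=new, writer=old)  # new code reads old data
    forward = can_read(reader=old, writer=new)   # old code reads new data
    if backward and forward:
        return "FULL"
    if backward:
        return "BACKWARD"
    if forward:
        return "FORWARD"
    return "NONE"
#
old = {"order_id": {"required": True}, "amount": {"required": True}}
add_optional = dict(old, coupon={"required": False})
add_required = dict(old, currency={"required": True})
drop_required = {"order_id": {"required": True}}
print(compatibility(old, add_optional))   # FULL
print(compatibility(old, add_required))   # FORWARD
print(compatibility(old, drop_required))  # BACKWARD
```
Adding an optional field is safe in both directions, adding a required field only keeps old readers working, and deleting a required field only keeps new readers working.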

1049
00:54:01.360 --> 00:54:04.920
<v Speaker 1>Choosing the right compatibility level seems critical for managing change smoothly.

1050
00:54:05.079 --> 00:54:09.360
<v Speaker 2>Absolutely, it prevents unexpected breakages as your data landscape evolves.

1051
00:54:09.119 --> 00:54:11.880
<v Speaker 1>And that leads us naturally to schema registries, I assume.

1052
00:54:12.400 --> 00:54:16.320
<v Speaker 1>Are these the tools that enforce these compatibility rules?

1053
00:54:16.639 --> 00:54:21.000
<v Speaker 2>Exactly. Schema registries, like the popular Confluent Schema Registry or alternatives

1054
00:54:21.039 --> 00:54:23.840
<v Speaker 2>like Karapace or Apicurio Registry, act as a central

1055
00:54:23.880 --> 00:54:26.920
<v Speaker 2>authority or single source of truth for all schemas used

1056
00:54:26.960 --> 00:54:28.360
<v Speaker 2>within your Kafka ecosystem.

1057
00:54:28.440 --> 00:54:28.960
<v Speaker 1>What do they do?

1058
00:54:29.199 --> 00:54:35.039
<v Speaker 2>They store schemas, like Avro, Protobuf, or JSON Schema definitions,

1059
00:54:35.760 --> 00:54:39.239
<v Speaker 2>manage different versions of those schemas, and critically, they enforce

1060
00:54:39.280 --> 00:54:42.920
<v Speaker 2>the compatibility rules you've defined for each topic, or subject

1061
00:54:43.280 --> 00:54:47.079
<v Speaker 2>in registry terms. When a producer tries to send data,

1062
00:54:47.199 --> 00:54:49.800
<v Speaker 2>it often first checks with the registry if the schema

1063
00:54:49.840 --> 00:54:52.360
<v Speaker 2>it's using is compatible with the registered versions for that

1064
00:54:52.440 --> 00:54:55.920
<v Speaker 2>topic according to the configured compatibility level. If not, the

1065
00:54:55.960 --> 00:54:59.320
<v Speaker 2>registry can reject the attempt, preventing incompatible data from ever

1066
00:54:59.480 --> 00:55:02.360
<v Speaker 2>entering Kafka. Consumers also use the registry to fetch the

1067
00:55:02.360 --> 00:55:05.239
<v Speaker 2>correct schema to deserialize incoming data.

1068
00:55:05.000 --> 00:55:07.920
<v Speaker 1>So the registry acts like a gatekeeper for schema quality

1069
00:55:07.920 --> 00:55:08.440
<v Speaker 1>and evolution.

1070
00:55:08.599 --> 00:55:11.239
<v Speaker 2>Precisely, it's a cornerstone of good Kafka governance.

1071
00:55:11.480 --> 00:55:16.039
<v Speaker 1>Okay, governance handles the data structure. Moving to security, what

1072
00:55:16.079 --> 00:55:18.719
<v Speaker 1>are the core concerns for protecting your Kafka cluster and

1073
00:55:18.760 --> 00:55:22.280
<v Speaker 1>the data flowing through it from unauthorized access or potential breaches.

1074
00:55:22.840 --> 00:55:25.039
<v Speaker 1>This must be top of mind for anyone running Kafka

1075
00:55:25.119 --> 00:55:26.199
<v Speaker 1>with sensitive data.

1076
00:55:26.320 --> 00:55:29.920
<v Speaker 2>Security is absolutely paramount, and it involves several layers. First,

1077
00:55:29.960 --> 00:55:34.639
<v Speaker 2>you need authentication. This verifies that a client (producer, consumer, broker,

1078
00:55:34.960 --> 00:55:40.400
<v Speaker 2>or tool) is who it claims to be. Common mechanisms include mutual TLS (mTLS),

1079
00:55:40.840 --> 00:55:43.880
<v Speaker 2>where both the client and server present certificates to verify

1080
00:55:43.920 --> 00:55:45.239
<v Speaker 2>each other's identity.

1081
00:55:45.000 --> 00:55:47.199
<v Speaker 1>Encrypted and authenticated connection.

1082
00:55:47.039 --> 00:55:51.079
<v Speaker 2>Right, or using SASL (Simple Authentication and Security Layer), which

1083
00:55:51.079 --> 00:55:54.920
<v Speaker 2>supports various pluggable methods like Kerberos (common in traditional enterprises),

1084
00:55:55.280 --> 00:55:59.960
<v Speaker 2>PLAIN (username and password, used with TLS), SCRAM (more secure challenge-response

1085
00:56:00.000 --> 00:56:03.679
<v Speaker 2>passwords), or even OAuth and OpenID Connect via SASL/OAUTHBEARER,

1086
00:56:03.679 --> 00:56:05.880
<v Speaker 2>for integration with modern identity providers.

1087
00:56:05.960 --> 00:56:07.400
<v Speaker 1>So confirm identity first. Then what?

1088
00:56:08.440 --> 00:56:11.280
<v Speaker 2>Once a client is authenticated, you need authorization.

1089
00:56:11.679 --> 00:56:14.360
<v Speaker 2>This defines what an authenticated client is actually allowed to do.

1090
00:56:14.480 --> 00:56:16.159
<v Speaker 2>Can it read from topic A? Can it write to

1091
00:56:16.159 --> 00:56:17.800
<v Speaker 2>topic B? Can it create new topics?

1092
00:56:18.039 --> 00:56:19.840
<v Speaker 1>Controlling permissions.

1093
00:56:20.199 --> 00:56:24.679
<v Speaker 2>Exactly. Kafka typically handles this via access control lists (ACLs). You

1094
00:56:24.760 --> 00:56:30.119
<v Speaker 2>define rules specifying which user principal has which permission (read, write, create, describe, etc.)

1095
00:56:30.639 --> 00:56:35.119
<v Speaker 2>on which resource (topic, group, cluster). The best practice here

1096
00:56:35.199 --> 00:56:38.719
<v Speaker 2>is always the principle of least privilege. Grant only the permissions

1097
00:56:38.760 --> 00:56:42.280
<v Speaker 2>absolutely necessary for a client to perform its function, nothing more.
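
NOTE
The least-privilege idea can be shown with a deny-by-default lookup over (principal, resource, operation) entries.
This is a toy model in plain Python and an assumption for illustration; real Kafka ACLs also support wildcard and prefixed resources, host restrictions, and explicit deny rules.
All principals and topic names below are hypothetical.
```python
# Hypothetical principals, topics, and ACL entries, for illustration only.
ACLS = {
    ("User:analytics", "topic:orders", "READ"),
    ("User:checkout", "topic:orders", "WRITE"),
    ("User:payments", "topic:payments", "READ"),
    ("User:payments", "topic:payments", "WRITE"),
}
#
def is_allowed(principal, resource, operation):
    """Deny by default: allow only when an exact matching ACL entry exists."""
    return (principal, resource, operation) in ACLS
#
print(is_allowed("User:analytics", "topic:orders", "READ"))     # True
print(is_allowed("User:analytics", "topic:payments", "WRITE"))  # False
```
Because nothing is granted implicitly, the analytics consumer can read orders but cannot touch the payments topic at all.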

1098
00:56:42.960 --> 00:56:46.079
<v Speaker 1>Don't give the marketing analytics consumer write access to the

1099
00:56:46.119 --> 00:56:47.360
<v Speaker 1>payment processing topic.

1100
00:56:47.440 --> 00:56:49.039
<v Speaker 2>Definitely not. ACLs prevent that.

1101
00:56:49.119 --> 00:56:53.599
<v Speaker 1>Okay, authentication and authorization control access. What about protecting the

1102
00:56:53.719 --> 00:56:56.400
<v Speaker 1>data itself, both as it moves across the network and

1103
00:56:56.480 --> 00:56:58.079
<v Speaker 1>when it's sitting on the brokers' disks?

1104
00:56:58.159 --> 00:57:01.679
<v Speaker 2>Good point. That involves encryption. For data in transit between

1105
00:57:01.719 --> 00:57:05.599
<v Speaker 2>clients and brokers, or between brokers themselves, you use transport encryption,

1106
00:57:05.719 --> 00:57:09.960
<v Speaker 2>typically TLS (Transport Layer Security, the successor to SSL). Enabling

1107
00:57:10.000 --> 00:57:13.559
<v Speaker 2>TLS encrypts all the Kafka traffic, preventing eavesdropping on the network.

1108
00:57:13.599 --> 00:57:14.559
<v Speaker 1>Is there a performance hit?

1109
00:57:14.920 --> 00:57:18.440
<v Speaker 2>Yes, TLS does introduce some CPU overhead for the encryption and

1110
00:57:18.559 --> 00:57:22.000
<v Speaker 2>decryption process, so there's a performance cost, but it's often

1111
00:57:22.000 --> 00:57:26.119
<v Speaker 2>a necessary cost for security. Kafka cleverly supports configuring multiple

1112
00:57:26.119 --> 00:57:29.159
<v Speaker 2>listeners per broker. For example, one plaintext listener on port

1113
00:57:29.239 --> 00:57:31.679
<v Speaker 2>9092 and one TLS listener on port

1114
00:57:31.719 --> 00:57:34.880
<v Speaker 2>9093. This allows for gradual migration of

1115
00:57:34.880 --> 00:57:36.679
<v Speaker 2>clients to TLS without downtime.

1116
00:57:36.760 --> 00:57:40.199
<v Speaker 1>Okay, TLS for data in motion. What about encryption at

1117
00:57:40.239 --> 00:57:42.519
<v Speaker 1>rest, when data is stored on the broker disks?

1118
00:57:42.639 --> 00:57:45.800
<v Speaker 2>Kafka itself doesn't provide built-in features for encrypting data

1119
00:57:45.840 --> 00:57:49.159
<v Speaker 2>stored within its log files on disk. For encryption at rest,

1120
00:57:49.239 --> 00:57:52.239
<v Speaker 2>you typically rely on capabilities of the underlying operating system

1121
00:57:52.360 --> 00:57:56.199
<v Speaker 2>or storage system, for instance, using disk- or filesystem-level encryption like

1122
00:57:56.280 --> 00:58:01.119
<v Speaker 2>Linux's dm-crypt/LUKS, or features provided by cloud storage volumes,

1123
00:58:01.280 --> 00:58:03.159
<v Speaker 2>like EBS encryption on AWS.

1124
00:58:03.519 --> 00:58:05.760
<v Speaker 1>So handle it at the infrastructure layer?

1125
00:58:05.119 --> 00:58:09.119
<v Speaker 2>Generally, yes. Alternatively, a pattern sometimes used is employing

1126
00:58:09.119 --> 00:58:12.239
<v Speaker 2>a secure Kafka proxy that encrypts message values before they

1127
00:58:12.239 --> 00:58:15.000
<v Speaker 2>are even produced to Kafka and decrypts them after consumption.

1128
00:58:15.559 --> 00:58:17.880
<v Speaker 2>This adds complexity, but ensures the data on the broker

1129
00:58:17.920 --> 00:58:19.719
<v Speaker 2>disk is encrypted at the application level.

1130
00:58:19.960 --> 00:58:23.679
<v Speaker 1>And what about true end to end encryption where only

1131
00:58:23.719 --> 00:58:27.320
<v Speaker 1>the original producer and final consumer can decrypt the message

1132
00:58:27.440 --> 00:58:29.719
<v Speaker 1>and even the brokers can't see the plaintext?

1133
00:58:30.000 --> 00:58:34.519
<v Speaker 2>That offers the highest level of confidentiality for critical data. However,

1134
00:58:34.920 --> 00:58:38.000
<v Speaker 2>it must be implemented entirely by the client applications themselves.

1135
00:58:38.960 --> 00:58:41.800
<v Speaker 2>The producer encrypts the message value before sending, and the

1136
00:58:41.800 --> 00:58:46.400
<v Speaker 2>consumer decrypts it after receiving. Kafka just transports the encrypted

1137
00:58:46.440 --> 00:58:50.400
<v Speaker 2>byte array. There's no widely adopted standard library or

1138
00:58:50.400 --> 00:58:53.320
<v Speaker 2>built-in Kafka feature for this, so it requires careful implementation

1139
00:58:53.400 --> 00:58:55.159
<v Speaker 2>and key management on the client side.

1140
00:58:55.199 --> 00:58:59.559
<v Speaker 1>Okay, so multiple layers: authN, authZ, TLS encryption in

1141
00:58:59.599 --> 00:59:03.760
<v Speaker 1>transit, infrastructure encryption at rest, and optional client-side

1142
00:59:03.800 --> 00:59:07.159
<v Speaker 1>end-to-end encryption. Seems comprehensive. What if you have an

1143
00:59:07.239 --> 00:59:10.639
<v Speaker 1>existing cluster that was set up without security enabled. Is

1144
00:59:10.639 --> 00:59:13.840
<v Speaker 1>it a massive disruptive project to secure it later without

1145
00:59:13.920 --> 00:59:15.280
<v Speaker 1>causing significant downtime?

1146
00:59:15.440 --> 00:59:18.599
<v Speaker 2>Fortunately, no, it's usually manageable. You can secure an unsecured

1147
00:59:18.639 --> 00:59:22.639
<v Speaker 2>cluster gradually without affecting availability significantly. The key is using

1148
00:59:22.639 --> 00:59:26.079
<v Speaker 2>those multiple listeners. You can add new secure listeners (for example,

1149
00:59:26.199 --> 00:59:31.519
<v Speaker 2>SASL_SSL) alongside the existing plaintext listeners on all brokers, usually

1150
00:59:31.559 --> 00:59:35.079
<v Speaker 2>requiring rolling restarts of the brokers one by one. Then

1151
00:59:35.199 --> 00:59:38.079
<v Speaker 2>you update the inter-broker communication protocol to use the

1152
00:59:38.079 --> 00:59:42.159
<v Speaker 2>secure listener (another rolling restart). Finally, you migrate your client

1153
00:59:42.159 --> 00:59:45.679
<v Speaker 2>applications incrementally to connect to the secure listeners and configure

1154
00:59:45.679 --> 00:59:49.559
<v Speaker 2>their authentication credentials. Once all clients are migrated, you can

1155
00:59:49.599 --> 00:59:53.119
<v Speaker 2>optionally disable the old plaintext listeners. It's a well-defined process

1156
00:59:53.159 --> 00:59:55.559
<v Speaker 2>designed to minimize disruption.

1157
00:59:55.480 --> 00:59:58.159
<v Speaker 1>So it's not an all-or-nothing, big-bang cutover. That's

1158
00:59:58.239 --> 01:00:02.400
<v Speaker 1>very reassuring for organizations looking to improve their security posture. Now,

1159
01:00:02.440 --> 01:00:05.480
<v Speaker 1>what about managing resource usage? We talked about performance, but

1160
01:00:05.519 --> 01:00:08.599
<v Speaker 1>how do you prevent a single misbehaving or poorly coded

1161
01:00:08.639 --> 01:00:13.320
<v Speaker 1>application from consuming excessive resources and potentially impacting the entire

1162
01:00:13.360 --> 01:00:14.840
<v Speaker 1>cluster and all other users?

1163
01:00:14.920 --> 01:00:18.559
<v Speaker 2>That's where resource allocations, specifically quotas, come into play.

1164
01:00:18.639 --> 01:00:19.719
<v Speaker 1>Quotas? Limits?

1165
01:00:19.960 --> 01:00:23.960
<v Speaker 2>Yes, Kafka allows administrators to set quotas on resource consumption

1166
01:00:24.039 --> 01:00:27.679
<v Speaker 2>for clients. Their primary purpose is really to protect the

1167
01:00:27.719 --> 01:00:31.360
<v Speaker 2>cluster from excessive load caused by misconfigured, buggy, or even

1168
01:00:31.400 --> 01:00:35.039
<v Speaker 2>malicious clients. It's generally not intended as a mechanism to

1169
01:00:35.119 --> 01:00:39.079
<v Speaker 2>artificially limit well-behaved clients or enforce strict service tiers,

1170
01:00:39.400 --> 01:00:40.559
<v Speaker 2>though some use it that way.

1171
01:00:40.880 --> 01:00:42.679
<v Speaker 1>What kind of resources can you limit?

1172
01:00:42.920 --> 01:00:45.840
<v Speaker 2>The main quotas control produce and consume throughput rates,

1173
01:00:46.079 --> 01:00:48.679
<v Speaker 2>measured in bytes per second. You can set a producer byte-rate

1174
01:00:48.800 --> 01:00:51.679
<v Speaker 2>quota and a consumer byte-rate quota. There's also a request

1175
01:00:51.679 --> 01:00:54.760
<v Speaker 2>percentage quota that limits the percentage of CPU time a

1176
01:00:54.800 --> 01:00:57.760
<v Speaker 2>client's requests can utilize on the broker's network and I/O threads,

1177
01:00:57.960 --> 01:00:59.800
<v Speaker 2>preventing CPU starvation.
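
NOTE
A byte-rate quota is generally enforced by slowing a client down rather than rejecting its requests.
The sketch below shows one simplified way to compute such a delay; the formula, the function name, and the one-second window are assumptions for illustration, not a statement of the broker's exact algorithm.
```python
def throttle_delay_ms(observed_bytes_per_sec, quota_bytes_per_sec, window_ms):
    """Delay that brings the average rate over the window back down to the quota."""
    if observed_bytes_per_sec <= quota_bytes_per_sec:
        return 0.0
    overshoot = observed_bytes_per_sec - quota_bytes_per_sec
    return overshoot / quota_bytes_per_sec * window_ms
#
MB = 1024 * 1024
# A client under its quota is never delayed.
print(throttle_delay_ms(5 * MB, 10 * MB, window_ms=1000))
# A runaway client at twice its quota is delayed by one full window.
print(throttle_delay_ms(20 * MB, 10 * MB, window_ms=1000))
```
The delay grows with the size of the overshoot, so a well-behaved client with a generous quota never notices it while a runaway client is held back.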

1178
01:01:00.079 --> 01:01:03.079
<v Speaker 1>Bandwidth and CPU usage limits. Can you set these limits

1179
01:01:03.079 --> 01:01:06.039
<v Speaker 1>broadly or target specific users or applications.

1180
01:01:06.280 --> 01:01:08.960
<v Speaker 2>You can define quotas at different levels. You can set

1181
01:01:08.960 --> 01:01:11.840
<v Speaker 2>them based on the client.id property configured in

1182
01:01:11.880 --> 01:01:16.119
<v Speaker 2>the producer or consumer application. However, client.id can

1183
01:01:16.159 --> 01:01:20.559
<v Speaker 2>sometimes be easily spoofed or shared across instances. A more

1184
01:01:20.639 --> 01:01:23.920
<v Speaker 2>robust approach is to set quotas at the authenticated user level,

1185
01:01:24.119 --> 01:01:27.920
<v Speaker 2>based on the principal derived from SASL or TLS authentication.

1186
01:01:28.599 --> 01:01:31.760
<v Speaker 2>User level quotas are generally considered more reliable for enforcement.

1187
01:01:32.159 --> 01:01:35.239
<v Speaker 2>You can also set default quotas per user or per

1188
01:01:35.320 --> 01:01:35.920
<v Speaker 2>client ID.

1189
01:01:36.239 --> 01:01:39.280
<v Speaker 1>What's the best practice for setting these quota values? Should

1190
01:01:39.320 --> 01:01:40.840
<v Speaker 1>you be really strict?

1191
01:01:40.960 --> 01:01:44.000
<v Speaker 2>The general best practice is actually to set quotas quite generously,

1192
01:01:44.280 --> 01:01:47.000
<v Speaker 2>well above the normal expected peak usage for a client, and

1193
01:01:47.519 --> 01:01:50.599
<v Speaker 2>monitor the actual usage closely. The idea is to use

1194
01:01:50.639 --> 01:01:52.760
<v Speaker 2>quotas as a safety net or a safeguard to catch

1195
01:01:52.840 --> 01:01:56.119
<v Speaker 2>runaway clients or denial of service attempts, rather than using

1196
01:01:56.119 --> 01:01:58.679
<v Speaker 2>them as a strict bottleneck that clients regularly bump up

1197
01:01:58.679 --> 01:02:02.440
<v Speaker 2>against during normal operation. They're like circuit breakers, not fine-grained traffic shapers.

1198
01:02:02.519 --> 01:02:05.360
<v Speaker 1>Use them as guardrails, not speed limits for

1199
01:02:05.400 --> 01:02:09.719
<v Speaker 1>everyday traffic. Makes sense. We've talked a lot about Kafka's internals,

1200
01:02:09.760 --> 01:02:13.199
<v Speaker 1>capabilities, management. What about actually running it? What are the

1201
01:02:13.199 --> 01:02:17.119
<v Speaker 1>common deployment models? How do people typically deploy and operate

1202
01:02:17.199 --> 01:02:19.239
<v Speaker 1>Kafka clusters in the real world?

1203
01:02:19.400 --> 01:02:21.760
<v Speaker 2>You have several options, each with its own set of

1204
01:02:21.760 --> 01:02:26.039
<v Speaker 2>trade-offs regarding control, cost, and operational effort. The traditional

1205
01:02:26.039 --> 01:02:28.800
<v Speaker 2>approach is running Kafka on your own hardware in your

1206
01:02:28.800 --> 01:02:31.880
<v Speaker 2>own data centers. This gives you maximum control over the

1207
01:02:31.960 --> 01:02:35.679
<v Speaker 2>environment, hardware selection, and configuration, but it also requires the

1208
01:02:35.719 --> 01:02:42.599
<v Speaker 2>most significant operational expertise for planning, provisioning, automation, monitoring, patching, upgrades, everything.

1209
01:02:42.840 --> 01:02:45.760
<v Speaker 1>High control, high responsibility.

1210
01:02:46.159 --> 01:02:49.599
<v Speaker 2>Exactly. A very common variation is running Kafka in virtualized environments

1211
01:02:50.000 --> 01:02:53.440
<v Speaker 2>like VMware or OpenStack, on top of your own hardware or

1212
01:02:53.480 --> 01:02:57.559
<v Speaker 2>private cloud infrastructure. This adds a layer of virtualization management,

1213
01:02:57.880 --> 01:03:02.320
<v Speaker 2>but follows similar principles. Key considerations here are distributing brokers

1214
01:03:02.599 --> 01:03:06.559
<v Speaker 2>evenly across physical VM hosts to avoid single points of failure,

1215
01:03:06.960 --> 01:03:11.480
<v Speaker 2>and carefully considering storage performance, preferring local SSDs over shared

1216
01:03:11.760 --> 01:03:15.599
<v Speaker 2>network-attached storage (SAN or NAS) if latency and throughput

1217
01:03:15.639 --> 01:03:19.159
<v Speaker 2>are critical, as network storage can sometimes introduce unpredictable performance.

1218
01:03:19.280 --> 01:03:22.440
<v Speaker 1>Okay, self-managed on-prem or private cloud. What about

1219
01:03:22.480 --> 01:03:25.280
<v Speaker 1>public cloud? Kubernetes seems popular these days.

1220
01:03:25.400 --> 01:03:25.559
<v Speaker 2>Right.

1221
01:03:25.639 --> 01:03:29.320
<v Speaker 2>Running Kafka on Kubernetes has become increasingly popular, especially for

1222
01:03:29.440 --> 01:03:32.679
<v Speaker 2>organizations that already have strong operational teams comfortable with managing

1223
01:03:32.719 --> 01:03:34.119
<v Speaker 2>stateful workloads on K8s.

1224
01:03:34.280 --> 01:03:37.519
<v Speaker 1>Isn't running something stateful like Kafka on Kubernetes tricky?

1225
01:03:37.880 --> 01:03:43.039
<v Speaker 2>It can be. Managing storage, networking and upgrades requires care. However,

1226
01:03:43.159 --> 01:03:46.960
<v Speaker 2>the ecosystem has matured significantly. The community standard for this

1227
01:03:47.079 --> 01:03:50.840
<v Speaker 2>is generally considered to be Strimzi (strimzi.io). It's

1228
01:03:50.880 --> 01:03:54.760
<v Speaker 2>a Kubernetes operator specifically designed for deploying and managing Kafka

1229
01:03:54.800 --> 01:03:57.960
<v Speaker 2>clusters, along with components like Kafka Connect, MirrorMaker, etc.,

1230
01:03:58.440 --> 01:04:03.079
<v Speaker 2>on Kubernetes. It automates many complex operational tasks like provisioning,

1231
01:04:03.320 --> 01:04:07.599
<v Speaker 2>configuration management, rolling upgrades, and certificate management, making it much more manageable,

1232
01:04:08.039 --> 01:04:10.840
<v Speaker 2>but it still requires a solid understanding of both Kafka

1233
01:04:10.880 --> 01:04:12.400
<v Speaker 2>and Kubernetes.

1234
01:04:12.000 --> 01:04:15.679
<v Speaker 1>So Strimzi helps bridge the gap for Kubernetes users. What

1235
01:04:15.800 --> 01:04:18.000
<v Speaker 1>about just using a fully managed service?

1236
01:04:18.239 --> 01:04:21.960
<v Speaker 2>That's the other major direction: using public cloud managed services,

1237
01:04:22.159 --> 01:04:25.159
<v Speaker 2>platforms like Confluent Cloud (from the original creators of Kafka),

1238
01:04:25.679 --> 01:04:29.760
<v Speaker 2>Amazon MSK (Managed Streaming for Apache Kafka), Aiven for Apache Kafka,

1239
01:04:30.119 --> 01:04:33.320
<v Speaker 2>Azure HDInsight Kafka, and Google Cloud Pub/Sub

1240
01:04:33.400 --> 01:04:36.920
<v Speaker 2>(though different, sometimes used as an alternative). They abstract away most,

1241
01:04:36.960 --> 01:04:39.199
<v Speaker 2>if not all, of the underlying infrastructure management.

1242
01:04:38.679 --> 01:04:40.519
<v Speaker 1>Sounds appealing. What's the catch?

1243
01:04:40.800 --> 01:04:45.519
<v Speaker 2>The appeal is obvious: faster deployment, reduced operational burden,

1244
01:04:45.559 --> 01:04:49.480
<v Speaker 2>built-in scalability and reliability features. The catch, or rather the

1245
01:04:49.519 --> 01:04:54.039
<v Speaker 2>trade-offs, are typically reduced control over fine-grained configuration, potential

1246
01:04:54.119 --> 01:04:56.960
<v Speaker 2>vendor lock-in, and cost, which can sometimes be higher

1247
01:04:57.000 --> 01:05:01.039
<v Speaker 2>than self-managing at very large scale. Even with a

1248
01:05:01.079 --> 01:05:04.840
<v Speaker 2>managed service, you still need significant in-house Kafka expertise.

1249
01:05:05.280 --> 01:05:08.840
<v Speaker 2>You need to understand Kafka concepts to design your applications correctly,

1250
01:05:08.920 --> 01:05:12.760
<v Speaker 2>choose the right service tier, understand the service's limitations (like

1251
01:05:12.880 --> 01:05:18.079
<v Speaker 2>maximum retention periods, throughput caps, partition limits), troubleshoot application-level issues,

1252
01:05:18.480 --> 01:05:22.039
<v Speaker 2>and manage costs effectively. It's not NoOps, it's different ops.

1253
01:05:22.400 --> 01:05:26.960
<v Speaker 1>Managed service simplifies infrastructure, but not application design or Kafka knowledge.

1254
01:05:27.079 --> 01:05:30.199
<v Speaker 1>Good point. Finally, regardless of how you deploy it, keeping

1255
01:05:30.199 --> 01:05:33.559
<v Speaker 1>an eye on everything once it's running seems essential. Monitoring

1256
01:05:33.559 --> 01:05:36.000
<v Speaker 1>and alerting must be crucial for a distributed system like

1257
01:05:36.039 --> 01:05:40.079
<v Speaker 1>Kafka that's likely underpinning critical applications.

1258
01:05:40.280 --> 01:05:43.440
<v Speaker 2>Absolutely crucial. While Kafka is designed to be robust and fault tolerant,

1259
01:05:43.880 --> 01:05:47.719
<v Speaker 2>it's not invulnerable. Things can still go wrong. Brokers can

1260
01:05:47.760 --> 01:05:51.440
<v Speaker 2>run out of disk space, networks can become saturated, consumers

1261
01:05:51.440 --> 01:05:54.440
<v Speaker 2>can fall behind, partitions can become under-replicated.

1262
01:05:54.760 --> 01:05:57.760
<v Speaker 1>Why is monitoring so important? What are you trying to achieve?

1263
01:05:58.280 --> 01:06:01.519
<v Speaker 2>The primary goals of monitoring are to detect problems quickly,

1264
01:06:01.840 --> 01:06:04.559
<v Speaker 2>ideally before they cause a major outage or data loss,

1265
01:06:04.960 --> 01:06:09.280
<v Speaker 2>and provide insights needed for troubleshooting and capacity planning. You

1266
01:06:09.320 --> 01:06:11.639
<v Speaker 2>want to prevent a small issue on one broker from

1267
01:06:11.639 --> 01:06:13.840
<v Speaker 2>cascading into a complete cluster failure.

1268
01:06:14.239 --> 01:06:16.760
<v Speaker 1>What are some of the absolute key metrics you should

1269
01:06:16.760 --> 01:06:18.480
<v Speaker 1>be watching, the critical vital signs?

1270
01:06:18.599 --> 01:06:21.280
<v Speaker 2>There are many metrics available via JMX, but some key

1271
01:06:21.320 --> 01:06:25.960
<v Speaker 2>ones include: Under-replicated partitions. This metric, exposed by the

1272
01:06:26.000 --> 01:06:29.199
<v Speaker 2>brokers, counts the number of partitions that currently don't

1273
01:06:29.199 --> 01:06:32.119
<v Speaker 2>have their full set of in-sync replicas. This value should

1274
01:06:32.159 --> 01:06:35.159
<v Speaker 2>ideally always be zero. If it's greater than zero for

1275
01:06:35.199 --> 01:06:38.280
<v Speaker 2>a sustained period, it indicates a problem with replication and

1276
01:06:38.360 --> 01:06:41.000
<v Speaker 2>reduced fault tolerance. That's a top priority alert.

1277
01:06:41.239 --> 01:06:43.519
<v Speaker 1>Under-replicated equals bad. Got it.

1278
01:06:43.840 --> 01:06:47.880
<v Speaker 2>Active controller count should be exactly one across the entire cluster.

1279
01:06:48.360 --> 01:06:51.159
<v Speaker 2>If it's zero or more than one, the cluster's in a bad state.

1280
01:06:51.519 --> 01:06:56.159
<v Speaker 2>Offline partitions count: similar to under-replicated, this counts partitions whose leader

1281
01:06:56.280 --> 01:06:59.920
<v Speaker 2>is offline. It should also be zero. Broker-level metrics like

1282
01:07:00.039 --> 01:07:02.800
<v Speaker 2>leader count and partition count: these should be relatively balanced

1283
01:07:02.800 --> 01:07:05.519
<v Speaker 2>across all brokers in the cluster. If one broker has

1284
01:07:05.599 --> 01:07:08.280
<v Speaker 2>vastly more leaders or partitions than others, it indicates a

1285
01:07:08.320 --> 01:07:12.360
<v Speaker 2>load imbalance. Replication lag metrics like MaxLag under

1286
01:07:12.360 --> 01:07:15.519
<v Speaker 2>kafka.server ReplicaFetcherManager show the maximum lag between

1287
01:07:15.519 --> 01:07:18.199
<v Speaker 2>a leader and its followers. High lag means followers are

1288
01:07:18.199 --> 01:07:20.679
<v Speaker 2>falling behind, increasing risks during failover.
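
NOTE
A minimal sketch of the "vital signs" check described above, assuming the JMX
values have already been scraped into plain per-broker dictionaries. The
snapshot layout, the broker names, and check_vital_signs are hypothetical;
only the metric names come from the discussion.
```python
def check_vital_signs(brokers):
    """Return human-readable problems found in per-broker metric snapshots."""
    problems = []
    # Summed over all brokers, exactly one should report itself as the active controller.
    controllers = sum(b["ActiveControllerCount"] for b in brokers.values())
    if controllers != 1:
        problems.append(f"active controllers = {controllers}, expected exactly 1")
    for name, b in sorted(brokers.items()):
        # Both of these counts should stay at zero on a healthy cluster.
        for metric in ("UnderReplicatedPartitions", "OfflinePartitionsCount"):
            if b[metric] > 0:
                problems.append(f"{name}: {metric} = {b[metric]}")
    return problems
# Hypothetical snapshot: broker-2 reports one under-replicated partition.
snapshot = {
    "broker-1": {"ActiveControllerCount": 1, "UnderReplicatedPartitions": 0, "OfflinePartitionsCount": 0},
    "broker-2": {"ActiveControllerCount": 0, "UnderReplicatedPartitions": 1, "OfflinePartitionsCount": 0},
}
print(check_vital_signs(snapshot))
```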

1289
01:07:20.880 --> 01:07:23.599
<v Speaker 1>Lag is important for consumers too, right?

1290
01:07:24.239 --> 01:07:28.239
<v Speaker 2>Definitely. Consumer lag, often calculated externally by monitoring tools by comparing

1291
01:07:28.440 --> 01:07:30.960
<v Speaker 2>the consumer group's committed offset with the latest offset in

1292
01:07:31.000 --> 01:07:33.639
<v Speaker 2>the partition, is critical. It shows how far behind a

1293
01:07:33.679 --> 01:07:37.679
<v Speaker 2>consumer group is in processing messages. High or constantly increasing

1294
01:07:37.719 --> 01:07:41.000
<v Speaker 2>lag indicates the consumers can't keep up. Also, watch basic

1295
01:07:41.039 --> 01:07:44.400
<v Speaker 2>throughput (BytesInPerSec, BytesOutPerSec, MessagesInPerSec)

1296
01:07:44.440 --> 01:07:46.679
<v Speaker 2>at the broker and topic level to understand

1297
01:07:46.719 --> 01:07:50.920
<v Speaker 2>load and detect anomalies. Network and request metrics: RequestQueueSize

1298
01:07:51.039 --> 01:07:54.280
<v Speaker 2>should be near zero. Request latencies at p99

1299
01:07:54.320 --> 01:07:56.599
<v Speaker 2>or p99.9 are useful to spot

1300
01:07:56.639 --> 01:08:00.440
<v Speaker 2>processing bottlenecks. And for consumer groups, the kafka.consumer

1301
01:08:00.480 --> 01:08:05.400
<v Speaker 2>consumer-coordinator-metrics related to rebalances (rebalance-latency-max, rebalance-total):

1302
01:08:05.840 --> 01:08:09.960
<v Speaker 2>frequent or long rebalances can indicate instability in the consumer group, for example,

1303
01:08:10.039 --> 01:08:11.440
<v Speaker 2>consumers crashing repeatedly.
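
NOTE
The consumer lag arithmetic described above, as a small sketch: lag per
partition is the latest offset in the partition minus the group's committed
offset. The function name and the sample offsets are made up for illustration;
a real monitoring tool would fetch both offset maps from the cluster.
```python
def consumer_lag(end_offsets, committed_offsets):
    """Per-partition lag: latest offset minus the group's committed offset."""
    lag = {}
    for partition, end in end_offsets.items():
        # Assumption: a partition with no commit yet is treated as committed at 0.
        committed = committed_offsets.get(partition, 0)
        lag[partition] = max(end - committed, 0)
    return lag
# Hypothetical offsets for a three-partition topic.
end_offsets = {0: 1500, 1: 1200, 2: 900}
committed = {0: 1500, 1: 1100}
per_partition = consumer_lag(end_offsets, committed)
total_lag = sum(per_partition.values())
print(per_partition, total_lag)
```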

1304
01:08:11.519 --> 01:08:13.639
<v Speaker 1>That's a good list of critical signals. Yeah, so, how

1305
01:08:13.639 --> 01:08:15.880
<v Speaker 1>do you approach alerting based on these metrics without getting

1306
01:08:15.960 --> 01:08:19.159
<v Speaker 1>drowned in constant notifications, especially in dynamic environments.

1307
01:08:19.399 --> 01:08:22.359
<v Speaker 2>That's the art of good alerting. First, not every metric

1308
01:08:22.439 --> 01:08:25.359
<v Speaker 2>needs an alert. Focus on metrics that indicate a real

1309
01:08:25.479 --> 01:08:29.720
<v Speaker 2>actionable problem impacting service health or data integrity, like

1310
01:08:29.800 --> 01:08:35.319
<v Speaker 2>under-replicated partitions, high consumer lag beyond a threshold, or brokers being down. Second,

1311
01:08:35.720 --> 01:08:39.199
<v Speaker 2>set meaningful thresholds based on your baseline performance and SLOs.

1312
01:08:39.920 --> 01:08:42.560
<v Speaker 2>Don't alert if consumer lag spikes for one minute during

1313
01:08:42.560 --> 01:08:45.479
<v Speaker 2>a brief load burst if it recovers quickly; alert if

1314
01:08:45.479 --> 01:08:48.760
<v Speaker 2>it stays high for ten minutes. Third, allow systems time

1315
01:08:48.800 --> 01:08:52.000
<v Speaker 2>for self-healing where appropriate. If you're running on Kubernetes,

1316
01:08:52.000 --> 01:08:54.119
<v Speaker 2>maybe don't page the on-call engineer immediately if a

1317
01:08:54.159 --> 01:08:58.119
<v Speaker 2>broker pod crashes. Kubernetes might restart it successfully within a minute.

1318
01:08:58.319 --> 01:09:00.760
<v Speaker 2>Alert only if the problem persists beyond the expected

1319
01:09:00.760 --> 01:09:01.720
<v Speaker 2>auto-recovery time.
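
NOTE
One way to express "alert only if it stays high" is a sustained-threshold
check, sketched below. The function name, the sample format (timestamp in
seconds, value), and the numbers are assumptions for illustration, not a
particular alerting tool's API.
```python
def should_alert(samples, threshold, sustain_seconds):
    """True only if the value has stayed above threshold for sustain_seconds."""
    if not samples:
        return False
    now = samples[-1][0]
    breach_start = None
    for ts, value in samples:  # samples are ordered oldest first
        if value > threshold:
            if breach_start is None:
                breach_start = ts
        else:
            breach_start = None  # any recovery resets the clock
    return breach_start is not None and now - breach_start >= sustain_seconds
# A one-minute spike that recovers versus ten minutes of sustained lag.
spike = [(0, 10), (60, 5000), (120, 10)]
sustained = [(t, 5000) for t in range(0, 660, 60)]
print(should_alert(spike, 1000, 600), should_alert(sustained, 1000, 600))
```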

1320
01:09:02.319 --> 01:09:04.520
<v Speaker 1>Build in some tolerance for self-healing.

1321
01:09:05.079 --> 01:09:08.760
<v Speaker 2>Right. And finally, always enrich your alerts with context. An alert

1322
01:09:08.800 --> 01:09:12.199
<v Speaker 2>message should clearly state what is wrong, where (which cluster,

1323
01:09:12.399 --> 01:09:15.880
<v Speaker 2>which topic or group), the severity, and ideally provide links to

1324
01:09:16.000 --> 01:09:20.079
<v Speaker 2>relevant dashboards for investigation or even point to specific troubleshooting

1325
01:09:20.079 --> 01:09:23.640
<v Speaker 2>steps or alert playbooks. An alert for our online retailer

1326
01:09:23.680 --> 01:09:27.039
<v Speaker 2>that just says "metric X is high" is useless. An alert

1327
01:09:27.079 --> 01:09:31.199
<v Speaker 2>saying "Critical: order-processing consumer group lag at one million messages

1328
01:09:31.199 --> 01:09:34.880
<v Speaker 2>for fifteen minutes on prod Kafka cluster; dashboard link, playbook

1329
01:09:34.880 --> 01:09:36.640
<v Speaker 2>link" is much more effective.
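
NOTE
A small sketch of an alert message that carries the context listed above:
what, where, severity, duration, and links. The format_alert helper and both
URLs are placeholders invented for this example.
```python
def format_alert(severity, what, where, duration_minutes, dashboard_url, playbook_url):
    """Build a one-line alert that says what is wrong, where, and what to open next."""
    return (
        f"[{severity.upper()}] {what} for {duration_minutes} min on {where}"
        f" | dashboard: {dashboard_url} | playbook: {playbook_url}"
    )
message = format_alert(
    severity="critical",
    what="order-processing consumer group lag at 1,000,000 messages",
    where="prod Kafka cluster",
    duration_minutes=15,
    dashboard_url="https://dashboards.example/kafka-lag",  # placeholder URL
    playbook_url="https://wiki.example/playbooks/consumer-lag",  # placeholder URL
)
print(message)
```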

1330
01:09:36.800 --> 01:09:40.920
<v Speaker 1>Context makes alerts actionable, not just noise. Excellent advice. We've

1331
01:09:40.920 --> 01:09:43.000
<v Speaker 1>covered a huge amount of ground: how Kafka works, how

1332
01:09:43.000 --> 01:09:45.520
<v Speaker 1>it performs, how it integrates, how it's managed and secured.

1333
01:09:45.760 --> 01:09:49.199
<v Speaker 1>Truly a deep dive. But as with any complex distributed system,

1334
01:09:49.279 --> 01:09:51.279
<v Speaker 1>things can still go wrong on a larger scale. Let's

1335
01:09:51.279 --> 01:09:53.880
<v Speaker 1>talk disaster management. How do you prepare for the really

1336
01:09:53.880 --> 01:09:57.119
<v Speaker 1>bad scenarios like an entire data center going offline unexpectedly.

1337
01:09:57.520 --> 01:10:01.920
<v Speaker 2>Disaster management is absolutely critical for business continuity, especially when

1338
01:10:02.000 --> 01:10:06.760
<v Speaker 2>Kafka is underpinning core operations. Kafka's own architecture provides some

1339
01:10:06.840 --> 01:10:11.600
<v Speaker 2>inherent resilience here. Its asynchronous nature, for example, helps mitigate

1340
01:10:11.640 --> 01:10:15.760
<v Speaker 2>temporary network partitions or brief compute failures. If a producer

1341
01:10:15.800 --> 01:10:18.600
<v Speaker 2>can't reach the broker, it can often buffer messages and

1342
01:10:18.640 --> 01:10:22.239
<v Speaker 2>retry later. If a consumer gets disconnected, it can usually

1343
01:10:22.319 --> 01:10:24.760
<v Speaker 2>reconnect and resume from where it left off using its

1344
01:10:24.760 --> 01:10:28.760
<v Speaker 2>committed offsets. Data isn't typically lost just because of transient

1345
01:10:28.760 --> 01:10:29.840
<v Speaker 2>connectivity issues.

1346
01:10:30.000 --> 01:10:33.640
<v Speaker 1>Okay, it handles temporary glitches well. What about more permanent

1347
01:10:33.680 --> 01:10:37.079
<v Speaker 1>compute failures like a single broker machine dying completely within

1348
01:10:37.079 --> 01:10:37.560
<v Speaker 1>a cluster.

1349
01:10:37.920 --> 01:10:41.319
<v Speaker 2>As we discussed with reliability, Kafka is explicitly designed to

1350
01:10:41.319 --> 01:10:45.800
<v Speaker 2>handle individual compute failures gracefully. Assuming you've configured adequate replication

1351
01:10:46.239 --> 01:10:49.239
<v Speaker 2>with a replication factor of, say, three and

1352
01:10:49.319 --> 01:10:52.399
<v Speaker 2>min.insync.replicas set to two, the cluster can tolerate

1353
01:10:52.439 --> 01:10:55.359
<v Speaker 2>the loss of one broker per partition without any loss

1354
01:10:55.359 --> 01:10:58.279
<v Speaker 2>of data or availability for writes (with acks=all) or reads.
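
NOTE
The arithmetic behind that claim, as a sketch: with acks=all, a partition keeps
accepting writes while at least min.insync.replicas replicas are in sync, so
it can lose (replication factor - min.insync.replicas) replicas. The helper
name is made up for illustration.
```python
def tolerable_replica_losses(replication_factor, min_insync_replicas):
    """Replicas of one partition that can be lost while acks=all writes still succeed."""
    if not 1 <= min_insync_replicas <= replication_factor:
        raise ValueError("min.insync.replicas must be between 1 and the replication factor")
    return replication_factor - min_insync_replicas
# The configuration from the discussion: replication factor 3, min.insync.replicas 2.
print(tolerable_replica_losses(3, 2))
```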

1355
01:10:58.399 --> 01:11:00.960
<v Speaker 1>Automatic leader failover kicks in.

1356
01:11:01.239 --> 01:11:04.319
<v Speaker 2>Exactly. The cluster heals itself by electing new leaders from the

1357
01:11:04.359 --> 01:11:07.520
<v Speaker 2>remaining in sync replicas. This is a core part of

1358
01:11:07.600 --> 01:11:11.239
<v Speaker 2>Kafka's high availability promise within a single cluster deployment.

1359
01:11:11.640 --> 01:11:14.239
<v Speaker 1>That's good for failures within a deployment, but what about

1360
01:11:14.239 --> 01:11:17.479
<v Speaker 1>the really big one, a full data center failure. If

1361
01:11:17.479 --> 01:11:20.640
<v Speaker 1>your entire Kafka cluster is in one DC and that

1362
01:11:20.720 --> 01:11:25.680
<v Speaker 1>DC goes dark (power loss, network outage, natural disaster), you're offline,

1363
01:11:25.760 --> 01:11:26.359
<v Speaker 1>right?

1364
01:11:26.640 --> 01:11:29.600
<v Speaker 2>You are, if the entire cluster lives in that single DC. That's

1365
01:11:29.640 --> 01:11:33.199
<v Speaker 2>a major risk for critical applications. So for true disaster

1366
01:11:33.319 --> 01:11:37.079
<v Speaker 2>recovery across data centers, you need more sophisticated strategies which

1367
01:11:37.119 --> 01:11:39.279
<v Speaker 2>significantly increase complexity and cost.

1368
01:11:39.520 --> 01:11:40.560
<v Speaker 1>What are the main approaches?

1369
01:11:40.680 --> 01:11:42.640
<v Speaker 2>One approach is to build a stretched cluster.

1370
01:11:43.000 --> 01:11:43.520
<v Speaker 1>Stretched?

1371
01:11:43.680 --> 01:11:46.600
<v Speaker 2>Yes, you deploy a single logical Kafka cluster, but the brokers

1372
01:11:46.600 --> 01:11:50.800
<v Speaker 2>are physically distributed across multiple data centers or availability zones (AZs).

1373
01:11:51.319 --> 01:11:54.079
<v Speaker 2>For example, you might place brokers in three different AZs

1374
01:11:54.199 --> 01:11:56.039
<v Speaker 2>within the same geographic region.

1375
01:11:55.880 --> 01:11:59.239
<v Speaker 1>So it operates as one cluster, just geographically spread out.

1376
01:11:59.520 --> 01:12:03.960
<v Speaker 2>Correct. The coordination mechanism (KRaft or ZooKeeper) also needs to

1377
01:12:03.960 --> 01:12:07.119
<v Speaker 2>be stretched across these locations. If one entire data center

1378
01:12:07.159 --> 01:12:11.079
<v Speaker 2>or AZ fails, the Kafka cluster can potentially remain operational as

1379
01:12:11.119 --> 01:12:14.000
<v Speaker 2>long as a majority of coordinators remain online and partitions

1380
01:12:14.039 --> 01:12:16.479
<v Speaker 2>still have a live leader in one of the surviving

1381
01:12:16.520 --> 01:12:20.359
<v Speaker 2>locations and enough ISRs remain to meet the min.insync.replicas

1382
01:12:20.399 --> 01:12:21.399
<v Speaker 2>requirement.

1383
01:12:21.960 --> 01:12:23.439
<v Speaker 1>What are the downsides of stretching?

1384
01:12:23.720 --> 01:12:27.119
<v Speaker 2>The main downside is latency. Network latency between data centers

1385
01:12:27.119 --> 01:12:30.000
<v Speaker 2>is typically much higher than within a single DC. This

1386
01:12:30.119 --> 01:12:35.359
<v Speaker 2>increased latency impacts produce request times (especially with acks=all), replication

1387
01:12:35.479 --> 01:12:39.119
<v Speaker 2>lag, and failover times. It generally requires DCs that are

1388
01:12:39.159 --> 01:12:43.319
<v Speaker 2>relatively close geographically with low latency, high bandwidth links between them,

1389
01:12:43.520 --> 01:12:46.159
<v Speaker 2>and you typically need at least three locations (DCs or

1390
01:12:46.199 --> 01:12:49.720
<v Speaker 2>AZs) to reliably maintain a quorum for the coordination cluster

1391
01:12:49.800 --> 01:12:50.920
<v Speaker 2>if one location fails.
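
NOTE
Why three locations: a majority quorum needs more than half of the
coordination nodes alive. The sketch below checks whether a layout keeps a
majority after losing any single location. The function and location names are
hypothetical.
```python
def survives_location_loss(nodes_per_location):
    """True if a majority of coordination nodes remains after losing any one location."""
    total = sum(nodes_per_location.values())
    majority = total // 2 + 1
    return all(total - lost >= majority for lost in nodes_per_location.values())
# One coordination node in each of three AZs versus three nodes split over two DCs.
three_azs = {"az-a": 1, "az-b": 1, "az-c": 1}
two_dcs = {"dc-1": 2, "dc-2": 1}
print(survives_location_loss(three_azs), survives_location_loss(two_dcs))
```
With two locations, whichever side holds the majority becomes a single point
of failure for the quorum.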

1392
01:12:51.079 --> 01:12:54.720
<v Speaker 1>So stretched clusters offer high availability, but come with latency trade-offs,

1393
01:12:54.760 --> 01:12:58.039
<v Speaker 1>and require careful network planning. What's the alternative if stretching

1394
01:12:58.079 --> 01:12:58.800
<v Speaker 1>isn't feasible?

1395
01:12:59.239 --> 01:13:03.039
<v Speaker 2>The other common approach is using multiple independent Kafka clusters

1396
01:13:03.479 --> 01:13:06.840
<v Speaker 2>and replicating data between them using tools like Kafka's own

1397
01:13:06.920 --> 01:13:10.319
<v Speaker 2>MirrorMaker, specifically MirrorMaker 2, which is built on

1398
01:13:10.359 --> 01:13:11.239
<v Speaker 2>the Connect framework.

1399
01:13:11.359 --> 01:13:15.359
<v Speaker 1>Mirroring? Copying data between clusters?

1400
01:13:15.159 --> 01:13:18.079
<v Speaker 2>Exactly. You continuously copy messages from topics in one cluster to

1401
01:13:18.119 --> 01:13:21.399
<v Speaker 2>topics in another cluster, usually located in a different data

1402
01:13:21.399 --> 01:13:25.039
<v Speaker 2>center or region. You can set up various topologies with mirroring.

1403
01:13:25.560 --> 01:13:28.600
<v Speaker 2>Like what? A common one is active-passive. You have

1404
01:13:28.640 --> 01:13:32.119
<v Speaker 2>your primary active cluster in one DC where producers send

1405
01:13:32.199 --> 01:13:36.159
<v Speaker 2>data and consumers normally read from. MirrorMaker then copies

1406
01:13:36.159 --> 01:13:39.680
<v Speaker 2>this data to a passive cluster in a separate DR

1407
01:13:40.000 --> 01:13:41.600
<v Speaker 2>(disaster recovery) data center.

1408
01:13:41.640 --> 01:13:43.279
<v Speaker 1>What's the passive cluster used for?

1409
01:13:43.439 --> 01:13:46.720
<v Speaker 2>It serves as a hot standby. If the active cluster fails,

1410
01:13:46.760 --> 01:13:50.640
<v Speaker 2>you would need to manually or via automation fail over

1411
01:13:50.720 --> 01:13:54.119
<v Speaker 2>your producers and consumers to start using the passive cluster instead.

1412
01:13:54.479 --> 01:13:57.199
<v Speaker 2>This setup is also often used for data migration between

1413
01:13:57.199 --> 01:14:00.760
<v Speaker 2>clusters or for centralizing data for analysis, with each regional cluster

1414
01:14:00.800 --> 01:14:02.760
<v Speaker 2>mirroring data to a main headquarters cluster.

1415
01:14:02.279 --> 01:14:06.039
<v Speaker 1>So failover is typically a manual step in active-passive.

1416
01:14:06.239 --> 01:14:09.239
<v Speaker 1>What about active-active? Does that avoid manual failover?

1417
01:14:09.600 --> 01:14:13.399
<v Speaker 2>An active-active setup involves having two or more independent clusters,

1418
01:14:13.720 --> 01:14:17.760
<v Speaker 2>both actively serving producer and consumer traffic and configured to

1419
01:14:17.800 --> 01:14:21.239
<v Speaker 2>mirror data to each other. For example, users in Europe

1420
01:14:21.279 --> 01:14:24.039
<v Speaker 2>connect to the EU cluster, users in the US connect

1421
01:14:24.039 --> 01:14:26.640
<v Speaker 2>to the US cluster, and MirrorMaker copies data in

1422
01:14:26.680 --> 01:14:27.800
<v Speaker 2>both directions between them.

1423
01:14:28.039 --> 01:14:31.119
<v Speaker 1>That sounds complex to manage, especially avoiding infinite loops of

1424
01:14:31.119 --> 01:14:32.920
<v Speaker 1>mirroring the same message back and forth.

1425
01:14:33.039 --> 01:14:36.680
<v Speaker 2>It is complex. MirrorMaker 2 handles the looping issue

1426
01:14:36.680 --> 01:14:40.119
<v Speaker 2>by automatically prefixing mirrored topics with the name of the

1427
01:14:40.159 --> 01:14:43.720
<v Speaker 2>source cluster. For example, data from topic A in the

1428
01:14:43.840 --> 01:14:47.000
<v Speaker 2>US cluster arrives in the EU cluster as

1429
01:14:47.119 --> 01:14:50.359
<v Speaker 2>us-cluster.topicA. This prevents it from being mirrored back again.

1430
01:14:51.159 --> 01:14:53.399
<v Speaker 2>The bigger challenge with active active is often on the

1431
01:14:53.439 --> 01:14:56.640
<v Speaker 2>application side. You need to ensure your consumer services can

1432
01:14:56.680 --> 01:15:00.800
<v Speaker 2>handle potentially processing the same logical event twice, once from

1433
01:15:00.840 --> 01:15:03.720
<v Speaker 2>the local cluster, once mirrored from the remote cluster, or

1434
01:15:03.800 --> 01:15:07.359
<v Speaker 2>implement logic to deduplicate or process only based on origin.

1435
01:15:08.000 --> 01:15:11.439
<v Speaker 2>Our online retailer would need very careful application design to avoid, say,

1436
01:15:11.960 --> 01:15:14.439
<v Speaker 2>double-charging a customer or shipping an order twice if using

1437
01:15:14.479 --> 01:15:18.279
<v Speaker 2>active-active. It provides higher availability and potentially lower latency

1438
01:15:18.279 --> 01:15:21.039
<v Speaker 2>for users connecting to their local cluster, but demands more

1439
01:15:21.039 --> 01:15:22.640
<v Speaker 2>sophisticated application logic.
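
NOTE
A rough sketch of the two ideas above: naming a mirrored topic with the source
cluster's prefix, and a consumer-side handler that processes each logical event
once. The event_id field, the class, and the in-memory set are assumptions for
illustration; a real service would need a durable deduplication store.
```python
def remote_topic(source_alias, topic):
    """Name a mirrored topic as '<source alias>.<topic>', as in the example above."""
    return f"{source_alias}.{topic}"
class DedupingHandler:
    """Processes each logical event once, keyed by a producer-assigned event ID."""
    def __init__(self):
        # Assumption: in-memory only, so state is lost on restart.
        self.seen_event_ids = set()
        self.processed = []
    def handle(self, event):
        if event["event_id"] in self.seen_event_ids:
            return False  # duplicate, e.g. the mirrored copy of a local event
        self.seen_event_ids.add(event["event_id"])
        self.processed.append(event)
        return True
handler = DedupingHandler()
order = {"event_id": "order-42-charged", "amount": 19.99}
# The same logical event arrives twice: once locally, once via the mirrored topic.
results = [handler.handle(order), handler.handle(dict(order))]
print(remote_topic("us-cluster", "topicA"), results)
```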

1440
01:15:22.920 --> 01:15:26.600
<v Speaker 1>Active-active sounds powerful, but requires careful thought about idempotency

1441
01:15:26.720 --> 01:15:30.520
<v Speaker 1>or deduplication in consumers. Any other mirroring topologies?

1442
01:15:30.840 --> 01:15:33.640
<v Speaker 2>Another useful one is hub-and-spoke. This is great

1443
01:15:33.640 --> 01:15:37.880
<v Speaker 2>for distributed organizations. Imagine a central hub Kafka cluster at

1444
01:15:37.920 --> 01:15:43.000
<v Speaker 2>headquarters and multiple smaller spoke clusters at regional offices, factories,

1445
01:15:43.239 --> 01:15:44.720
<v Speaker 2>or even retail stores.

1446
01:15:44.640 --> 01:15:46.079
<v Speaker 1>Like our supermarket chain example.

1447
01:15:46.199 --> 01:15:50.199
<v Speaker 2>Exactly. Each spoke cluster might handle local operations and then mirror

1448
01:15:50.319 --> 01:15:53.680
<v Speaker 2>relevant data like sales summaries up to the central hub

1449
01:15:53.720 --> 01:15:57.119
<v Speaker 2>for company wide aggregation and analysis. The hub might also

1450
01:15:57.199 --> 01:16:00.520
<v Speaker 2>mirror down commands or configuration updates to the spokes. This

1451
01:16:00.560 --> 01:16:03.600
<v Speaker 2>allows spokes to operate somewhat independently, even if connectivity to

1452
01:16:03.640 --> 01:16:06.399
<v Speaker 2>the hub is intermittent, and then sync up later. Think

1453
01:16:06.439 --> 01:16:09.000
<v Speaker 2>cruise ships needing to operate autonomously at sea and then

1454
01:16:09.039 --> 01:16:10.560
<v Speaker 2>sync data when they reach port.

1455
01:16:10.720 --> 01:16:13.319
<v Speaker 1>Hub and spoke seems well suited for those kinds of

1456
01:16:13.399 --> 01:16:18.039
<v Speaker 1>distributed environments. So stretched clusters or mirroring with various patterns

1457
01:16:18.079 --> 01:16:22.039
<v Speaker 1>offer ways to handle large scale disasters. Kafka is clearly

1458
01:16:22.039 --> 01:16:26.439
<v Speaker 1>incredibly versatile, powerful and resilient when used correctly. But just

1459
01:16:26.479 --> 01:16:28.960
<v Speaker 1>as important as knowing when and how to use it

1460
01:16:29.000 --> 01:16:31.680
<v Speaker 1>is knowing when not to use Kafka. It's not a

1461
01:16:31.680 --> 01:16:34.680
<v Speaker 1>silver bullet for every data problem, is it? What are

1462
01:16:34.680 --> 01:16:37.600
<v Speaker 1>some crucial limitations or scenarios where Kafka might be the

1463
01:16:37.640 --> 01:16:38.920
<v Speaker 1>wrong tool for the job.

1464
01:16:39.199 --> 01:16:41.960
<v Speaker 2>That's a vital point to cover. Kafka is amazing, but

1465
01:16:42.039 --> 01:16:45.640
<v Speaker 2>applying it inappropriately leads to frustration and poor results. There

1466
01:16:45.640 --> 01:16:49.239
<v Speaker 2>are several key anti-patterns. First, and perhaps most importantly,

1467
01:16:49.439 --> 01:16:51.560
<v Speaker 2>Kafka is not a relational database.

1468
01:16:51.840 --> 01:16:54.199
<v Speaker 1>We touched on this with the log versus state idea.

1469
01:16:54.359 --> 01:16:57.760
<v Speaker 2>Right. Kafka excels at storing the history of events, what happened.

1470
01:16:58.640 --> 01:17:01.520
<v Speaker 2>It's generally poor at representing and querying the current state

1471
01:17:01.560 --> 01:17:05.359
<v Speaker 2>of complex entities, especially if that requires complex joins across

1472
01:17:05.399 --> 01:17:09.880
<v Speaker 2>multiple normalized tables or point lookups based on arbitrary criteria

1473
01:17:10.119 --> 01:17:12.640
<v Speaker 2>like finding a user by email address when the topic

1474
01:17:12.720 --> 01:17:15.800
<v Speaker 2>is keyed by user ID. Don't try to replace your

1475
01:17:15.840 --> 01:17:20.399
<v Speaker 2>operational databases (PostgreSQL, MySQL, et cetera) with Kafka for storing

1476
01:17:20.439 --> 01:17:24.640
<v Speaker 2>the canonical queryable state of your application entities. Use databases

1477
01:17:24.640 --> 01:17:28.479
<v Speaker 2>for that. Messages flowing through Kafka should ideally be denormalized,

1478
01:17:28.760 --> 01:17:31.760
<v Speaker 2>meaning a single message contains all the relevant information needed

1479
01:17:31.760 --> 01:17:35.439
<v Speaker 2>for that event, potentially duplicating data found elsewhere, rather than

1480
01:17:35.479 --> 01:17:39.079
<v Speaker 2>requiring complex lookups or joins later. Use relational databases for

1481
01:17:39.119 --> 01:17:42.199
<v Speaker 2>your service's internal data persistence and complex querying needs.

1482
01:17:42.560 --> 01:17:46.079
<v Speaker 1>So you wouldn't use Kafka's KSQL or Streams API to

1483
01:17:46.119 --> 01:17:49.960
<v Speaker 1>directly serve a request like "find customer X's current shipping address"

1484
01:17:49.960 --> 01:17:52.840
<v Speaker 1>if that address might have changed many times. You'd query

1485
01:17:52.880 --> 01:17:55.800
<v Speaker 1>a proper database that stores the current address.

1486
01:17:56.039 --> 01:17:59.439
<v Speaker 2>Exactly. You might build that database view using data streamed from Kafka.

1487
01:17:59.760 --> 01:18:02.640
<v Speaker 2>But Kafka itself isn't the primary query engine for current

1488
01:18:02.640 --> 01:18:03.720
<v Speaker 2>state lookups in that way.

1489
01:18:03.880 --> 01:18:06.079
<v Speaker 1>Okay, not a database replacement. What else is it not?

1490
01:18:06.439 --> 01:18:10.159
<v Speaker 2>Second, Kafka is not a synchronous communication interface or a

1491
01:18:10.199 --> 01:18:14.239
<v Speaker 2>request-response system. Meaning, avoid using Kafka for interactions where

1492
01:18:14.239 --> 01:18:17.079
<v Speaker 2>a client sends a request and needs an immediate blocking

1493
01:18:17.119 --> 01:18:20.000
<v Speaker 2>response back on the same connection. Think of a web

1494
01:18:20.039 --> 01:18:23.079
<v Speaker 2>browser calling a back end API to fetch user profile

1495
01:18:23.199 --> 01:18:26.960
<v Speaker 2>data that needs a quick HTTP response. Kafka is designed

1496
01:18:26.960 --> 01:18:30.880
<v Speaker 2>for asynchronous data exchange. Producers send messages and typically don't

1497
01:18:30.920 --> 01:18:34.319
<v Speaker 2>wait for or even know about the consumers. Consumers process

1498
01:18:34.319 --> 01:18:37.319
<v Speaker 2>messages at their own pace. Trying to force synchronous

1499
01:18:37.319 --> 01:18:40.760
<v Speaker 2>request-response patterns over Kafka (e.g., producer sends to topic A,

1500
01:18:40.960 --> 01:18:43.119
<v Speaker 2>waits for a consumer to process it and send a reply to

1501
01:18:43.119 --> 01:18:46.359
<v Speaker 2>topic B, then producer reads from topic B) is usually complex, brittle,

1502
01:18:46.359 --> 01:18:49.640
<v Speaker 2>and inefficient compared to using standard RPC, REST APIs, or

1503
01:18:49.720 --> 01:18:51.800
<v Speaker 2>gRPC for synchronous needs.

1504
01:18:51.880 --> 01:18:54.960
<v Speaker 1>So if your online store's checkout page needs that instant

1505
01:18:54.960 --> 01:18:58.119
<v Speaker 1>payment accepted or payment declined feedback to show the user,

1506
01:18:58.800 --> 01:19:01.640
<v Speaker 1>you'd use a direct API call to the payment service,

1507
01:19:02.039 --> 01:19:04.600
<v Speaker 1>not send a command via Kafka and hope for a

1508
01:19:04.640 --> 01:19:05.800
<v Speaker 1>quick reply message.

1509
01:19:05.840 --> 01:19:08.439
<v Speaker 2>Precisely. Use the right tool for the job: Kafka for

1510
01:19:08.520 --> 01:19:12.880
<v Speaker 2>asynchronous event streams, APIs for synchronous interactions.

1511
01:19:12.000 --> 01:19:14.920
<v Speaker 1>Makes sense. And what about sending large files? You mentioned

1512
01:19:15.000 --> 01:19:17.359
<v Speaker 1>the 1 MB default limit earlier.

1513
01:19:17.520 --> 01:19:21.279
<v Speaker 2>Right. Third, Kafka is not a file exchange platform. It is

1514
01:19:21.359 --> 01:19:25.359
<v Speaker 2>highly optimized for processing a large volume of relatively small messages,

1515
01:19:25.520 --> 01:19:29.199
<v Speaker 2>typically under that 1 MB default and often much smaller (kilobytes).

1516
01:19:29.800 --> 01:19:33.199
<v Speaker 2>These messages are usually structured machine readable data like JSON

1517
01:19:33.319 --> 01:19:36.520
<v Speaker 2>or Avro. Trying to send large binary files like

1518
01:19:36.600 --> 01:19:41.239
<v Speaker 2>multi-megabyte PDFs, high-resolution images, or video files directly inside Kafka

1519
01:19:41.239 --> 01:19:44.800
<v Speaker 2>message values is generally a very bad idea. Why, specifically?

1520
01:19:44.880 --> 01:19:48.479
<v Speaker 2>It performs poorly. It consumes excessive broker disk space, puts

1521
01:19:48.479 --> 01:19:51.960
<v Speaker 2>heavy load on network bandwidth, increases memory pressure on clients

1522
01:19:52.000 --> 01:19:55.119
<v Speaker 2>and brokers, and can lead to processing timeouts and instability.

1523
01:19:55.520 --> 01:19:59.720
<v Speaker 2>Kafka's internal mechanisms aren't designed for efficiently handling huge blobs.

1524
01:19:59.319 --> 01:20:02.000
<v Speaker 1>I like that. So what's the alternative if you need to

1525
01:20:02.079 --> 01:20:05.199
<v Speaker 1>signal that a large file is ready for processing?

1526
01:20:05.159 --> 01:20:07.680
<v Speaker 2>The standard pattern is to store the large file in a

1527
01:20:07.720 --> 01:20:12.199
<v Speaker 2>dedicated object storage system like AWS S3, Google Cloud Storage,

1528
01:20:12.439 --> 01:20:14.960
<v Speaker 2>or an internal file server, and then send a small

1529
01:20:15.000 --> 01:20:18.720
<v Speaker 2>Kafka message containing metadata about the file, including a reference

1530
01:20:18.800 --> 01:20:21.680
<v Speaker 2>or pointer, like the S3 URL, to its location.

1531
01:20:22.239 --> 01:20:25.640
<v Speaker 2>The consumer reads the small notification message from Kafka and

1532
01:20:25.680 --> 01:20:28.279
<v Speaker 2>then uses the reference to fetch the large file directly

1533
01:20:28.359 --> 01:20:31.640
<v Speaker 2>from the storage system for processing. Keep large payloads out

1534
01:20:31.640 --> 01:20:32.520
<v Speaker 2>of Kafka itself.

1535
01:20:33.079 --> 01:20:36.520
<v Speaker 1>Use Kafka for the notification, not the file transfer. Got it.

1536
01:20:36.560 --> 01:20:38.880
<v Speaker 1>Does it always make sense to implement Kafka, even for

1537
01:20:38.920 --> 01:20:42.880
<v Speaker 1>smaller projects or simpler data needs? Is there a complexity threshold?

1538
01:20:43.119 --> 01:20:46.319
<v Speaker 2>That's a good question. Fourth, Kafka's use for small

1539
01:20:46.319 --> 01:20:50.199
<v Speaker 2>applications can sometimes be questionable. While Kafka brings immense power

1540
01:20:50.199 --> 01:20:55.680
<v Speaker 2>and scalability, it also introduces operational complexity. Setting up, managing, monitoring,

1541
01:20:55.800 --> 01:20:59.039
<v Speaker 2>securing, and upgrading a Kafka cluster, even a small one,

1542
01:20:59.319 --> 01:21:03.039
<v Speaker 2>or even using a managed service, which still requires configuration and understanding,

1543
01:21:03.399 --> 01:21:07.439
<v Speaker 2>involves overhead. For very small scale applications or simple point

1544
01:21:07.479 --> 01:21:10.920
<v Speaker 2>to point integration needs, the complexity of introducing Kafka might

1545
01:21:10.960 --> 01:21:14.640
<v Speaker 2>outweigh its benefits. Sometimes a simpler solution, like using a

1546
01:21:14.640 --> 01:21:18.880
<v Speaker 2>message queue built into your application framework, relying on database triggers,

1547
01:21:18.960 --> 01:21:21.880
<v Speaker 2>or polling, using a lightweight cloud queue service, or even

1548
01:21:21.960 --> 01:21:25.000
<v Speaker 2>just direct API calls might be perfectly sufficient and much

1549
01:21:25.039 --> 01:21:28.520
<v Speaker 2>easier to manage. Don't introduce Kafka just because it's popular;

1550
01:21:28.960 --> 01:21:31.920
<v Speaker 2>make sure the problem warrants its capabilities. Don't bring a

1551
01:21:31.920 --> 01:21:34.159
<v Speaker 2>bazooka to a knife fight, so to speak.

1552
01:21:34.000 --> 01:21:38.800
<v Speaker 1>Evaluate the trade-offs, don't over-engineer. Makes sense. And finally,

1553
01:21:39.000 --> 01:21:41.359
<v Speaker 1>the source material has this great warning which you alluded

1554
01:21:41.359 --> 01:21:45.079
<v Speaker 1>to earlier. If garbage is produced into Kafka, garbage will

1555
01:21:45.079 --> 01:21:47.399
<v Speaker 1>also come out at the consumer side. What does that

1556
01:21:47.439 --> 01:21:49.359
<v Speaker 1>truly mean in practice and why is it such an

1557
01:21:49.359 --> 01:21:50.439
<v Speaker 1>important closing thought?

1558
01:21:50.960 --> 01:21:54.640
<v Speaker 2>It means that ultimately Kafka is not a substitute for

1559
01:21:54.720 --> 01:21:58.800
<v Speaker 2>good architecture and good data practices upstream. Kafka itself provides

1560
01:21:58.840 --> 01:22:01.600
<v Speaker 2>incredible freedom and flexibility in how you produce and

1561
01:22:01.680 --> 01:22:05.920
<v Speaker 2>consume data. It faithfully and reliably transports the byte arrays

1562
01:22:05.960 --> 01:22:09.119
<v Speaker 2>you give it, but it doesn't inherently understand or validate

1563
01:22:09.159 --> 01:22:12.640
<v Speaker 2>the semantic meaning or business correctness of that data beyond

1564
01:22:12.640 --> 01:22:15.119
<v Speaker 2>basic schema validation, if you use a schema registry.

1565
01:22:15.159 --> 01:22:17.000
<v Speaker 3>It just moves the bytes.

1566
01:22:17.680 --> 01:22:20.359
<v Speaker 2>Exactly. So, if the data quality being produced into Kafka is poor,

1567
01:22:20.880 --> 01:22:25.800
<v Speaker 2>if producers send malformed messages, inconsistent values, incorrect calculations, or

1568
01:22:25.840 --> 01:22:29.800
<v Speaker 2>just plain wrong information, Kafka will diligently deliver that poor

1569
01:22:29.880 --> 01:22:33.319
<v Speaker 2>data to all interested consumers. The consumers will then either

1570
01:22:33.640 --> 01:22:36.760
<v Speaker 2>crash trying to process the garbage, make bad decisions based

1571
01:22:36.800 --> 01:22:39.880
<v Speaker 2>on the garbage, or have to implement complex, defensive logic

1572
01:22:39.920 --> 01:22:42.319
<v Speaker 2>to try and clean up the garbage. This highlights that

1573
01:22:42.359 --> 01:22:45.319
<v Speaker 2>Kafka's power lies not just in the technology itself, but

1574
01:22:45.359 --> 01:22:48.840
<v Speaker 2>crucially in the thoughtful architecture, the data modeling, the schema governance,

1575
01:22:49.119 --> 01:22:52.439
<v Speaker 2>the validation logic in producers, and the overall data quality

1576
01:22:52.479 --> 01:22:55.760
<v Speaker 2>strategy you build around it. Kafka is a powerful tool,

1577
01:22:56.279 --> 01:22:59.079
<v Speaker 2>but its effectiveness depends entirely on how well you design

1578
01:22:59.159 --> 01:23:02.000
<v Speaker 2>and manage the entire data ecosystem it enables. It won't

1579
01:23:02.039 --> 01:23:03.760
<v Speaker 2>magically fix upstream data problems.

1580
01:23:03.960 --> 01:23:07.079
<v Speaker 1>Garbage in, garbage out, faithfully delivered at scale. A very

1581
01:23:07.159 --> 01:23:11.079
<v Speaker 1>sobering and important reminder. Well, you have just completed a

1582
01:23:11.239 --> 01:23:13.880
<v Speaker 1>very deep dive into Apache Kafka. We went from its

1583
01:23:13.880 --> 01:23:19.239
<v Speaker 1>fundamental components, messages, topics, partitions, brokers, and its unique nature

1584
01:23:19.279 --> 01:23:19.560
<v Speaker 1>as a distributed log,

1585
01:23:19.479 --> 01:23:24.399
<v Speaker 2>All the way through to replication, acknowledgments, and idempotence

1586
01:23:24.479 --> 01:23:25.720
<v Speaker 2>for reliability.

1587
01:23:25.640 --> 01:23:29.920
<v Speaker 1>And how it achieves incredible performance through batching, compression, zero copy,

1588
01:23:29.960 --> 01:23:31.960
<v Speaker 1>and parallel processing with partitions.

1589
01:23:32.079 --> 01:23:35.560
<v Speaker 2>We explored its power for integrating systems using Kafka Connect

1590
01:23:35.560 --> 01:23:38.319
<v Speaker 2>and SMTs, and dove into the world of real time

1591
01:23:38.359 --> 01:23:42.880
<v Speaker 2>stream processing with Kafka Streams, covering state, tables, joins, and

1592
01:23:42.920 --> 01:23:43.720
<v Speaker 2>time concepts.

1593
01:23:43.960 --> 01:23:49.720
<v Speaker 1>And we didn't forget management and security: governance, schema registries, authentication, authorization,

1594
01:23:50.039 --> 01:23:54.760
<v Speaker 1>encryption, quotas, deployment models, and crucial monitoring and alerting.

1595
01:23:54.479 --> 01:23:58.399
<v Speaker 2>Plus how to handle disasters using stretched clusters or mirroring

1596
01:23:58.439 --> 01:24:00.920
<v Speaker 2>patterns like active-passive and hub-and-spoke.

1597
01:24:00.720 --> 01:24:04.000
<v Speaker 1>And critically, we also covered the scenarios where Kafka

1598
01:24:04.119 --> 01:24:05.960
<v Speaker 1>might not be the best fit, ensuring you have that

1599
01:24:06.079 --> 01:24:07.920
<v Speaker 1>balanced and practical understanding.

1600
01:24:08.039 --> 01:24:10.800
<v Speaker 2>It's been quite a journey through the Kafka ecosystem.

1601
01:24:11.039 --> 01:24:14.800
<v Speaker 1>Indeed, the key takeaway for you listening is hopefully clear.

1602
01:24:15.319 --> 01:24:18.640
<v Speaker 1>Kafka isn't just another messaging system off the shelf. It's

1603
01:24:18.680 --> 01:24:22.720
<v Speaker 1>a truly foundational technology that, when used strategically and thoughtfully,

1604
01:24:23.079 --> 01:24:27.600
<v Speaker 1>can genuinely transform how your organization handles data, enabling that

1605
01:24:27.720 --> 01:24:30.399
<v Speaker 1>shift towards real time operations and insights.

1606
01:24:30.520 --> 01:24:32.600
<v Speaker 2>That transformation potential is real.

1607
01:24:32.600 --> 01:24:37.119
<v Speaker 1>But as we've seen, Kafka empowers you with incredible flexibility,

1608
01:24:37.399 --> 01:24:40.000
<v Speaker 1>and this brings us back to that final crucial point.

1609
01:24:40.520 --> 01:24:43.760
<v Speaker 1>Remember this: Kafka doesn't care about the messages it transports.

1610
01:24:43.800 --> 01:24:47.520
<v Speaker 1>It's agnostic. If garbage is produced into Kafka, garbage will

1611
01:24:47.520 --> 01:24:49.119
<v Speaker 1>also come out at the consumer side.

1612
01:24:49.159 --> 01:24:51.640
<v Speaker 2>It faithfully delivers what it's given.

1613
01:24:52.119 --> 01:24:55.439
<v Speaker 1>Exactly. Its true power lies not just in the impressive technology itself,

1614
01:24:55.479 --> 01:24:59.079
<v Speaker 1>but in the thoughtful architecture, the rigorous data quality practices,

1615
01:24:59.279 --> 01:25:01.359
<v Speaker 1>and the care you bring to it.

1616
01:25:01.359 --> 01:25:03.840
<v Speaker 2>It really underscores the importance of that end to end

1617
01:25:03.920 --> 01:25:05.640
<v Speaker 2>thinking about your data pipelines.

1618
01:25:05.920 --> 01:25:09.199
<v Speaker 1>So as you reflect on your own data challenges and opportunities,

1619
01:25:09.239 --> 01:25:11.359
<v Speaker 1>what stands out to you from today's deep dive? What

1620
01:25:11.560 --> 01:25:15.840
<v Speaker 1>aspects of Kafka's architecture, its capabilities, or even its limitations

1621
01:25:15.880 --> 01:25:18.439
<v Speaker 1>might you explore further for your own needs or projects?

1622
01:25:18.840 --> 01:25:21.880
<v Speaker 1>Keep asking those questions, keep digging deeper, because in the

1623
01:25:21.920 --> 01:25:24.039
<v Speaker 1>world of data, the learning never truly stops.
