WEBVTT

1
00:00:00.000 --> 00:00:02.480
<v Speaker 1>All right, so we've got a ton of information here

2
00:00:02.520 --> 00:00:04.759
<v Speaker 1>about how to actually set up an analytics system.

3
00:00:04.839 --> 00:00:06.559
<v Speaker 2>Yeah, it seems like you're really looking to get into

4
00:00:06.759 --> 00:00:08.439
<v Speaker 2>the modern best practices for this.

5
00:00:08.880 --> 00:00:11.400
<v Speaker 1>Absolutely, and I think this is going to be really

6
00:00:11.400 --> 00:00:15.480
<v Speaker 1>fascinating because we've got some excerpts from the Analytics Setup

7
00:00:15.560 --> 00:00:17.160
<v Speaker 1>Guidebook by Holistics.

8
00:00:17.440 --> 00:00:20.839
<v Speaker 2>Yeah, and Holistics, they've got years of experience helping companies

9
00:00:21.480 --> 00:00:25.519
<v Speaker 2>build their business intelligence capabilities. Exactly. So they definitely know

10
00:00:25.559 --> 00:00:26.399
<v Speaker 2>what they're talking about.

11
00:00:26.559 --> 00:00:28.480
<v Speaker 1>Yeah, they know their stuff. So what we're going to

12
00:00:28.519 --> 00:00:31.199
<v Speaker 1>do is give you a really clear roadmap to building

13
00:00:32.200 --> 00:00:34.960
<v Speaker 1>a modern and scalable analytics system.

14
00:00:34.960 --> 00:00:38.359
<v Speaker 2>And hopefully cut through a lot of the jargon that's out there. Yeah, sure,

15
00:00:38.399 --> 00:00:41.960
<v Speaker 2>and really highlight some of the most fascinating bits from

16
00:00:42.000 --> 00:00:42.799
<v Speaker 2>this guidebook.

17
00:00:43.240 --> 00:00:46.520
<v Speaker 1>Now, they start out by emphasizing that any analytics system,

18
00:00:46.560 --> 00:00:50.280
<v Speaker 1>no matter how complex, it ultimately boils down to three

19
00:00:50.359 --> 00:00:51.399
<v Speaker 1>key actions.

20
00:00:51.560 --> 00:00:54.920
<v Speaker 2>Yeah, what they call the core trio, so loading the data,

21
00:00:55.039 --> 00:00:57.840
<v Speaker 2>transforming the data, and then actually using the data.

22
00:00:57.960 --> 00:00:59.520
<v Speaker 1>Okay, I like that, the core trio.

23
00:01:00.079 --> 00:01:04.120
<v Speaker 2>Yeah, it's a simple framework, but it's really powerful just

24
00:01:04.120 --> 00:01:06.400
<v Speaker 2>because it helps you kind of break down, you know,

25
00:01:06.439 --> 00:01:10.079
<v Speaker 2>this whole complex system into like manageable steps.

26
00:01:10.439 --> 00:01:14.239
<v Speaker 1>Yeah, for sure. So step one then is loading data.

27
00:01:15.000 --> 00:01:17.400
<v Speaker 1>Seems pretty straightforward, right, just get all your data into

28
00:01:17.439 --> 00:01:18.439
<v Speaker 1>one central location.

29
00:01:18.640 --> 00:01:21.760
<v Speaker 2>Yeah, in principle, yeah, but in reality it can quickly

30
00:01:21.799 --> 00:01:25.239
<v Speaker 2>become a logistical nightmare. I can imagine. You know, think

31
00:01:25.239 --> 00:01:27.519
<v Speaker 2>about all the different sources you might have. Like, you know,

32
00:01:27.599 --> 00:01:29.519
<v Speaker 2>you've got data in your app, you've got your CRM,

33
00:01:29.599 --> 00:01:32.799
<v Speaker 2>all your marketing platforms, and then you know, maybe even

34
00:01:32.799 --> 00:01:35.400
<v Speaker 2>those dreaded spreadsheets lurking in the shadows.

35
00:01:35.799 --> 00:01:37.000
<v Speaker 1>Yeah.

36
00:01:37.040 --> 00:01:40.480
<v Speaker 2>So consolidating all of that into a central repository it's

37
00:01:40.480 --> 00:01:42.879
<v Speaker 2>no small feat. No.

38
00:01:42.719 --> 00:01:47.040
<v Speaker 1>It's not. This makes me think of like herding cats.

39
00:01:46.920 --> 00:01:49.000
<v Speaker 2>You know what I mean? It really is. Yeah, except

40
00:01:49.000 --> 00:01:51.400
<v Speaker 2>the cats are data points from a dozen different systems.

41
00:01:51.519 --> 00:01:52.200
<v Speaker 1>Yeah, exactly.

42
00:01:52.599 --> 00:01:55.560
<v Speaker 2>So this is where this concept of a data warehouse

43
00:01:55.560 --> 00:01:59.000
<v Speaker 2>comes in. It's basically, you know, it's a massive centralized

44
00:01:59.040 --> 00:02:03.079
<v Speaker 2>storage system that's really specifically designed for housing and managing

45
00:02:03.079 --> 00:02:04.840
<v Speaker 2>all of your organization's data.

46
00:02:04.920 --> 00:02:07.840
<v Speaker 1>Okay, so that's our central hub, the data warehouse.

47
00:02:07.959 --> 00:02:08.919
<v Speaker 2>Yeah.

48
00:02:08.960 --> 00:02:12.759
<v Speaker 1>But you know, the guidebook mentions that some early stage

49
00:02:12.800 --> 00:02:15.719
<v Speaker 1>companies might try to skip this step. Is that ever

50
00:02:15.800 --> 00:02:16.400
<v Speaker 1>a good idea?

51
00:02:16.680 --> 00:02:20.719
<v Speaker 2>It's a delicate balance. You know, if you're truly early stage,

52
00:02:21.039 --> 00:02:23.000
<v Speaker 2>maybe you just have one data source, not a lot

53
00:02:23.039 --> 00:02:26.199
<v Speaker 2>of traffic, you might be able to get away with

54
00:02:26.319 --> 00:02:29.719
<v Speaker 2>querying directly from your production database for a short period

55
00:02:29.759 --> 00:02:30.080
<v Speaker 2>of time.

56
00:02:30.319 --> 00:02:31.960
<v Speaker 1>I mean, but that just seems like that would put

57
00:02:32.039 --> 00:02:35.439
<v Speaker 1>such a strain on your system, potentially even affecting your

58
00:02:35.560 --> 00:02:36.680
<v Speaker 1>users. Exactly.

59
00:02:36.759 --> 00:02:38.960
<v Speaker 2>And that's one of the big risks, right? It's like

60
00:02:39.639 --> 00:02:41.560
<v Speaker 2>it's like trying to run a marathon while carrying a

61
00:02:41.599 --> 00:02:44.039
<v Speaker 2>heavy backpack. Okay, you know, you might make it to

62
00:02:44.039 --> 00:02:47.159
<v Speaker 2>the finish line, but it's going to be slow and painful, yeah,

63
00:02:47.520 --> 00:02:49.840
<v Speaker 2>and you run the risk of tripping and injuring yourself,

64
00:02:49.879 --> 00:02:52.759
<v Speaker 2>which in this analogy, would be you know, equivalent to

65
00:02:52.919 --> 00:02:54.639
<v Speaker 2>causing downtime for your application.

66
00:02:55.000 --> 00:02:57.439
<v Speaker 1>Right, right. No, that makes perfect sense. And what about

67
00:02:57.479 --> 00:03:01.120
<v Speaker 1>companies that are using, you know, NoSQL databases like

68
00:03:01.159 --> 00:03:04.000
<v Speaker 1>MongoDB? Are those suitable for analytics?

69
00:03:04.199 --> 00:03:09.000
<v Speaker 2>Generally, no. NoSQL databases, they're great for handling specific

70
00:03:09.159 --> 00:03:12.800
<v Speaker 2>types of data and workloads, but they're not really optimized

71
00:03:12.840 --> 00:03:17.439
<v Speaker 2>for the complex queries that are essential for business intelligence. Okay,

72
00:03:17.639 --> 00:03:20.000
<v Speaker 2>so trying to perform in-depth analytics on a

73
00:03:20.080 --> 00:03:23.919
<v Speaker 2>NoSQL database, it's kind of like trying to, you know,

74
00:03:23.960 --> 00:03:26.199
<v Speaker 2>trying to write a novel on a typewriter that's designed

75
00:03:26.240 --> 00:03:27.319
<v Speaker 2>for short memos. Right.

76
00:03:27.599 --> 00:03:29.960
<v Speaker 1>Okay, so even for startups that are just starting out,

77
00:03:30.039 --> 00:03:33.479
<v Speaker 1>you're advocating for them to move towards you know, this

78
00:03:33.599 --> 00:03:35.599
<v Speaker 1>dedicated analytics setup as soon as possible.

79
00:03:35.680 --> 00:03:39.000
<v Speaker 2>Yeah. Absolutely. I mean there is this you know, dump

80
00:03:39.039 --> 00:03:42.159
<v Speaker 2>and load method which is basically just exporting data to

81
00:03:42.240 --> 00:03:44.280
<v Speaker 2>like local files, you know, to analyze it. And that

82
00:03:44.360 --> 00:03:46.360
<v Speaker 2>might work in the very very early stages, but it

83
00:03:46.439 --> 00:03:50.240
<v Speaker 2>quickly becomes unsustainable. Okay. It just lacks the structure

84
00:03:50.360 --> 00:03:53.599
<v Speaker 2>and the scalability and automation that you get with a

85
00:03:53.599 --> 00:03:54.680
<v Speaker 2>proper analytics system.

86
00:03:55.000 --> 00:03:58.879
<v Speaker 1>Okay, so we've established that this data warehouse is essential,

87
00:04:00.120 --> 00:04:02.159
<v Speaker 1>but how do we actually get all of our data

88
00:04:02.199 --> 00:04:04.800
<v Speaker 1>into it? That seems like a really big task.

89
00:04:04.560 --> 00:04:06.919
<v Speaker 2>Yeah, for sure. And that's where data consolidation comes in.

90
00:04:07.000 --> 00:04:09.919
<v Speaker 2>And this is you know, a critical part of building

91
00:04:09.919 --> 00:04:13.360
<v Speaker 2>a modern analytics stack. And the guidebook talks about a

92
00:04:13.439 --> 00:04:16.879
<v Speaker 2>key shift in thinking here, moving from this traditional ETL

93
00:04:16.920 --> 00:04:19.480
<v Speaker 2>approach to a more modern ELT approach.

94
00:04:19.560 --> 00:04:22.759
<v Speaker 1>Okay, so ETL versus ELT. What's the difference and why

95
00:04:22.879 --> 00:04:24.519
<v Speaker 1>is this shift so important? Right.

96
00:04:24.759 --> 00:04:29.279
<v Speaker 2>So, ETL stands for extract, transform, load, and that's the

97
00:04:29.279 --> 00:04:31.639
<v Speaker 2>traditional method where you extract the data from all your

98
00:04:31.720 --> 00:04:35.439
<v Speaker 2>various sources, you transform it into a usable format outside

99
00:04:35.480 --> 00:04:38.439
<v Speaker 2>of the data warehouse, and then finally you load it

100
00:04:38.480 --> 00:04:39.240
<v Speaker 2>into the warehouse.

101
00:04:39.399 --> 00:04:41.639
<v Speaker 1>Okay, so it's like before you put it into storage,

102
00:04:41.759 --> 00:04:44.319
<v Speaker 1>you're meticulously cleaning and organizing everything.

103
00:04:44.439 --> 00:04:48.120
<v Speaker 2>Yeah, exactly. But this approach, it has some significant drawbacks.

104
00:04:48.319 --> 00:04:51.240
<v Speaker 2>You know, as the volume and variety of data increases,

105
00:04:51.560 --> 00:04:54.800
<v Speaker 2>transforming everything before loading it into the warehouse can really

106
00:04:54.839 --> 00:04:56.360
<v Speaker 2>create a massive bottleneck.

107
00:04:56.399 --> 00:04:58.720
<v Speaker 1>So it's like having a single narrow doorway into our

108
00:04:58.800 --> 00:04:59.600
<v Speaker 1>data warehouse.

109
00:05:00.040 --> 00:05:02.079
<v Speaker 2>Great way to put it. Yeah, and it really slows

110
00:05:02.079 --> 00:05:07.959
<v Speaker 2>everything down. So ELT, which stands for extract, load, transform,

111
00:05:08.480 --> 00:05:12.519
<v Speaker 2>basically flips the script. You extract the data, you load

112
00:05:12.560 --> 00:05:16.040
<v Speaker 2>it into the data warehouse in its raw, unprocessed format,

113
00:05:16.360 --> 00:05:19.040
<v Speaker 2>and then you transform it within the warehouse itself.
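
NOTE
Editor's sketch, not taken from the guidebook: the extract, load, transform order described above, assuming an in-memory SQLite database stands in for a cloud data warehouse. The table names (raw_users, users) and the sample rows are made up for illustration.
```python
import sqlite3
# Hypothetical stand-in for a cloud warehouse: an in-memory SQLite database.
warehouse = sqlite3.connect(":memory:")
# Extract: rows as they might arrive from a source system (made-up sample data).
raw_rows = [("1", " Alice ", "2024-01-05"), ("2", "BOB", "2024-01-06")]
# Load: store the rows untouched in a raw table, with no cleanup yet.
warehouse.execute("CREATE TABLE raw_users (id TEXT, name TEXT, signup_date TEXT)")
warehouse.executemany("INSERT INTO raw_users VALUES (?, ?, ?)", raw_rows)
# Transform: clean the data inside the warehouse itself, using SQL.
warehouse.execute(
    "CREATE TABLE users AS "
    "SELECT CAST(id AS INTEGER) AS id, LOWER(TRIM(name)) AS name, signup_date "
    "FROM raw_users"
)
clean_users = warehouse.execute("SELECT id, name FROM users ORDER BY id").fetchall()
print(clean_users)
```
Because the rows are loaded untouched first, the cleanup rule can be changed later and rerun against raw_users without going back to the source system.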

114
00:05:19.319 --> 00:05:22.800
<v Speaker 1>So we're dumping everything into the warehouse first and then

115
00:05:22.879 --> 00:05:25.959
<v Speaker 1>organizing later. It just seems so counterintuitive.

116
00:05:26.120 --> 00:05:27.920
<v Speaker 2>Yeah, I know it seems that way, right.

117
00:05:28.000 --> 00:05:29.639
<v Speaker 1>But what are the benefits of this approach.

118
00:05:30.079 --> 00:05:34.040
<v Speaker 2>Well, there are several. First, it eliminates that bottleneck that we

119
00:05:34.079 --> 00:05:37.360
<v Speaker 2>talked about that's caused by transforming everything outside the warehouse. Okay,

120
00:05:37.560 --> 00:05:42.000
<v Speaker 2>and cloud data warehouses are incredibly powerful these days, you know,

121
00:05:42.560 --> 00:05:45.959
<v Speaker 2>so you can actually leverage that processing power for the transformation.

122
00:05:46.199 --> 00:05:49.920
<v Speaker 1>It's like using one of those, you know, industrial sized

123
00:05:50.000 --> 00:05:52.879
<v Speaker 1>vacuum cleaners versus like a tiny handheld vacuum cleaner.

124
00:05:53.000 --> 00:05:55.519
<v Speaker 2>Exactly. Yeah, it's a much more efficient use of resources.

125
00:05:55.600 --> 00:05:55.879
<v Speaker 1>Okay.

126
00:05:56.480 --> 00:06:01.360
<v Speaker 2>Secondly, ELT allows for a more agile, dump first, transform

127
00:06:01.480 --> 00:06:04.959
<v Speaker 2>later approach, so you can load all your raw data

128
00:06:05.040 --> 00:06:08.439
<v Speaker 2>into the warehouse without having to like meticulously define every

129
00:06:08.439 --> 00:06:09.600
<v Speaker 2>transformation upfront.

130
00:06:09.720 --> 00:06:13.720
<v Speaker 1>Okay, so you're saying that's particularly valuable if your company's

131
00:06:13.839 --> 00:06:15.160
<v Speaker 1>data needs are still evolving.

132
00:06:15.360 --> 00:06:18.680
<v Speaker 2>Exactly. Yeah, if you're not entirely sure what insights you're

133
00:06:18.680 --> 00:06:20.360
<v Speaker 2>looking for yet, you can just get all the data

134
00:06:20.399 --> 00:06:22.600
<v Speaker 2>in there and then worry about the transformation later.

135
00:06:22.720 --> 00:06:23.639
<v Speaker 1>Okay, that makes sense.

136
00:06:24.480 --> 00:06:29.600
<v Speaker 2>And then here's another key advantage. ELT empowers data analysts

137
00:06:29.680 --> 00:06:33.120
<v Speaker 2>to actually take ownership of the transformation process.

138
00:06:33.160 --> 00:06:36.439
<v Speaker 1>Wait, so the analysts are actually writing the transformations themselves.

139
00:06:36.480 --> 00:06:39.600
<v Speaker 1>I always thought that was like a data engineer's job.

140
00:06:39.920 --> 00:06:43.199
<v Speaker 2>Yeah, traditionally it has been, right, but with ELT, analysts

141
00:06:43.199 --> 00:06:46.079
<v Speaker 2>can actually use SQL okay, which is the language they're

142
00:06:46.079 --> 00:06:50.839
<v Speaker 2>already familiar with to define and manage those transformations within

143
00:06:50.959 --> 00:06:51.879
<v Speaker 2>the data warehouse.

144
00:06:51.920 --> 00:06:53.480
<v Speaker 1>So it frees them up. They don't have to rely

145
00:06:53.560 --> 00:06:54.639
<v Speaker 1>on the data engineers as much.

146
00:06:54.680 --> 00:06:57.639
<v Speaker 2>Exactly. It reduces their dependency on the engineers, and then

147
00:06:57.680 --> 00:07:02.000
<v Speaker 2>they can be much more agile and responsive to you know,

148
00:07:02.519 --> 00:07:03.839
<v Speaker 2>the changing business needs.

149
00:07:03.920 --> 00:07:06.879
<v Speaker 1>Yeah, I mean that must be incredibly empowering for them.

150
00:07:07.240 --> 00:07:07.639
<v Speaker 2>Yeah.

151
00:07:07.759 --> 00:07:10.360
<v Speaker 1>Now, before we move on from this section, I know

152
00:07:10.439 --> 00:07:14.240
<v Speaker 1>data lakes are a part of this modern data ecosystem,

153
00:07:14.839 --> 00:07:17.680
<v Speaker 1>but I'm not exactly sure how they fit into this

154
00:07:17.680 --> 00:07:19.079
<v Speaker 1>whole ELT picture.

155
00:07:19.399 --> 00:07:23.319
<v Speaker 2>Right. So, a data lake is basically a vast, unstructured

156
00:07:23.319 --> 00:07:27.120
<v Speaker 2>repository for all of your raw data, regardless of its format.

157
00:07:27.439 --> 00:07:30.560
<v Speaker 2>Think of it as a massive data holding tank. So

158
00:07:30.639 --> 00:07:33.160
<v Speaker 2>you can dump everything into the data lake, and then

159
00:07:33.199 --> 00:07:36.199
<v Speaker 2>you selectively pull out the specific data that you need

160
00:07:36.199 --> 00:07:38.279
<v Speaker 2>for analysis using ELT.

161
00:07:38.600 --> 00:07:41.000
<v Speaker 1>Okay, so we can dump everything into the data lake

162
00:07:41.360 --> 00:07:43.199
<v Speaker 1>and then pull out what we need when we're ready

163
00:07:43.199 --> 00:07:44.279
<v Speaker 1>to actually analyze it.

164
00:07:44.360 --> 00:07:47.439
<v Speaker 2>Exactly. Yeah, it's kind of like having a giant pantry

165
00:07:47.439 --> 00:07:49.160
<v Speaker 2>where you store all of your ingredients and then you

166
00:07:49.160 --> 00:07:51.920
<v Speaker 2>pull out the specific ones you need when you're ready

167
00:07:51.959 --> 00:07:53.240
<v Speaker 2>to cook a particular dish.

168
00:07:53.439 --> 00:07:56.439
<v Speaker 1>Okay, all right, I like that analogy. So now that

169
00:07:56.480 --> 00:07:59.040
<v Speaker 1>we've got you know, our data into the warehouse, what

170
00:07:59.079 --> 00:07:59.959
<v Speaker 1>happens next.

171
00:08:00.000 --> 00:08:02.319
<v Speaker 2>Well, now it's time for step two of the core trio,

172
00:08:03.040 --> 00:08:05.720
<v Speaker 2>which is transforming that raw data into something we can

173
00:08:05.759 --> 00:08:07.120
<v Speaker 2>actually use to gain insights.

174
00:08:07.399 --> 00:08:09.160
<v Speaker 1>So we're molding it to fit our needs.

175
00:08:09.759 --> 00:08:12.839
<v Speaker 2>Yeah, exactly, And this is where we start shaping and

176
00:08:12.959 --> 00:08:16.319
<v Speaker 2>molding the data to fit our specific business needs. And

177
00:08:16.839 --> 00:08:20.160
<v Speaker 2>you know, there are a lot of potential benefits to

178
00:08:20.240 --> 00:08:23.600
<v Speaker 2>doing good data transformations. It's not just about making the

179
00:08:23.680 --> 00:08:27.040
<v Speaker 2>data look pretty. It's about ensuring consistency, making it easier

180
00:08:27.040 --> 00:08:29.720
<v Speaker 2>to analyze, and even potentially saving money.

181
00:08:29.839 --> 00:08:33.279
<v Speaker 1>Right, Okay, so let's dive into those benefits a little

182
00:08:33.320 --> 00:08:36.080
<v Speaker 1>bit more. What are some of the key advantages of

183
00:08:37.279 --> 00:08:39.559
<v Speaker 1>really well designed data transformations.

184
00:08:40.200 --> 00:08:42.639
<v Speaker 2>Well, one of the key benefits is that it helps

185
00:08:42.720 --> 00:08:47.080
<v Speaker 2>ensure consistency across the entire organization. Okay, so by creating

186
00:08:47.120 --> 00:08:51.559
<v Speaker 2>standardized definitions and calculations, you can avoid this dreaded thing

187
00:08:51.639 --> 00:08:52.720
<v Speaker 2>called metric drift.

188
00:08:53.039 --> 00:08:54.679
<v Speaker 1>Well, metric drift, yeah, I've heard of that.

189
00:08:54.639 --> 00:08:57.279
<v Speaker 2>Which is basically where different departments end up calculating the

190
00:08:57.279 --> 00:09:01.639
<v Speaker 2>same thing in different ways, leading to confusion and inaccurate reporting.

191
00:09:01.840 --> 00:09:03.919
<v Speaker 1>Right, so we want to make sure that, like, everyone's

192
00:09:04.000 --> 00:09:05.480
<v Speaker 1>speaking the same language. Exactly.

193
00:09:05.600 --> 00:09:08.799
<v Speaker 2>Yeah, like everyone in the organization is using the same dictionary. Okay,

194
00:09:09.000 --> 00:09:11.679
<v Speaker 2>so when we talk about key metrics, we all know

195
00:09:11.720 --> 00:09:14.360
<v Speaker 2>what we're talking about, right, We're all on the same page, exactly.

196
00:09:15.360 --> 00:09:20.639
<v Speaker 2>Another advantage is reusability, okay. So by defining transformations as

197
00:09:20.679 --> 00:09:24.440
<v Speaker 2>like modular components, you can easily reuse them across different

198
00:09:24.480 --> 00:09:27.039
<v Speaker 2>analyses and reports, okay, which saves you a lot of

199
00:09:27.120 --> 00:09:29.440
<v Speaker 2>time and effort. But it also ensures that all your

200
00:09:29.440 --> 00:09:32.559
<v Speaker 2>insights are based on the same underlying logic.

201
00:09:32.320 --> 00:09:34.919
<v Speaker 1>So we're not reinventing the wheel every time we need

202
00:09:34.960 --> 00:09:37.480
<v Speaker 1>to pull a new report or something. Yeah, okay.

203
00:09:37.840 --> 00:09:40.440
<v Speaker 2>And that brings us to the third major benefit, which

204
00:09:40.480 --> 00:09:42.759
<v Speaker 2>is performance and cost effectiveness.

205
00:09:43.000 --> 00:09:43.399
<v Speaker 1>Okay.

206
00:09:43.519 --> 00:09:47.720
<v Speaker 2>You know, well designed transformations can optimize your queries, make

207
00:09:47.759 --> 00:09:51.639
<v Speaker 2>them run faster and more efficiently, which can translate into

208
00:09:51.679 --> 00:09:55.600
<v Speaker 2>significant cost savings, especially when you're dealing with large volumes

209
00:09:55.600 --> 00:09:57.440
<v Speaker 2>of data in a cloud data warehouse.

210
00:09:57.159 --> 00:10:00.000
<v Speaker 1>Right, where you're paying for compute time. Okay. Exactly.

211
00:10:00.200 --> 00:10:04.000
<v Speaker 2>So yeah, good data transformations. They're like building a solid

212
00:10:04.039 --> 00:10:06.879
<v Speaker 2>foundation for a house. You know, it might not be

213
00:10:06.919 --> 00:10:10.320
<v Speaker 2>the most glamorous part, but it's essential for stability, efficiency,

214
00:10:10.919 --> 00:10:13.399
<v Speaker 2>and long term success.

215
00:10:13.840 --> 00:10:16.320
<v Speaker 1>Okay, I like that. So it's like the foundation. So

216
00:10:16.480 --> 00:10:18.840
<v Speaker 1>what does a data transformation look like

217
00:10:18.919 --> 00:10:19.519
<v Speaker 1>in practice?

218
00:10:19.600 --> 00:10:21.519
<v Speaker 2>Yeah, So to give you a practical sense of what

219
00:10:21.559 --> 00:10:23.480
<v Speaker 2>this looks like, the guidebook actually walks us through a

220
00:10:23.519 --> 00:10:29.519
<v Speaker 2>really basic example of using SQL to transform raw booking

221
00:10:29.600 --> 00:10:34.000
<v Speaker 2>data into a daily summary. And it's really quite elegant

222
00:10:34.000 --> 00:10:34.960
<v Speaker 2>in its simplicity.
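
NOTE
Editor's sketch: the guidebook's own SQL is not quoted in this episode, so this is only an illustration of rolling raw booking rows up into a daily summary. The table name raw_bookings, its columns, and the sample rows are assumptions, and SQLite stands in for the warehouse.
```python
import sqlite3
conn = sqlite3.connect(":memory:")
# Raw bookings, one row per booking (made-up sample data).
conn.execute("CREATE TABLE raw_bookings (id INTEGER, booked_at TEXT, amount REAL)")
conn.executemany("INSERT INTO raw_bookings VALUES (?, ?, ?)", [
    (1, "2024-03-01 09:15:00", 100.0),
    (2, "2024-03-01 17:40:00", 50.0),
    (3, "2024-03-02 11:05:00", 75.0),
])
# Transform: one row per day with the booking count and total revenue.
daily_summary = conn.execute(
    "SELECT DATE(booked_at) AS booking_date, COUNT(*) AS bookings, SUM(amount) AS revenue "
    "FROM raw_bookings GROUP BY DATE(booked_at) ORDER BY booking_date"
).fetchall()
print(daily_summary)
```
In a real setup the SELECT would typically be saved as a table or view in the warehouse so that every report reads the same daily figures.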

223
00:10:35.120 --> 00:10:35.320
<v Speaker 1>Okay.

224
00:10:35.440 --> 00:10:39.480
<v Speaker 2>It really demystifies that process of data transformation and shows

225
00:10:39.519 --> 00:10:42.440
<v Speaker 2>you how with a little bit of SQL knowledge, analysts

226
00:10:42.519 --> 00:10:45.080
<v Speaker 2>can really take ownership of this crucial step.

227
00:10:45.159 --> 00:10:48.240
<v Speaker 1>Yeah, it's really accessible. Yeah. So okay, so we've loaded

228
00:10:48.279 --> 00:10:51.600
<v Speaker 1>the data, we've transformed it. What is the final step

229
00:10:51.639 --> 00:10:52.720
<v Speaker 1>in this core trio?

230
00:10:52.960 --> 00:10:55.399
<v Speaker 2>Right? So the final step, of course, is actually using

231
00:10:55.440 --> 00:10:56.159
<v Speaker 2>that data. Right.

232
00:10:56.200 --> 00:10:59.039
<v Speaker 1>It's all about using it, getting value out of it. Exactly.

233
00:10:59.159 --> 00:11:02.600
<v Speaker 2>And the guidebook refers to this as data servicing.

234
00:11:02.679 --> 00:11:04.360
<v Speaker 1>Data servicing, okay.

235
00:11:04.200 --> 00:11:06.679
<v Speaker 2>And they talk about how the role of data analysts has evolved

236
00:11:06.679 --> 00:11:08.320
<v Speaker 2>significantly in recent years.

237
00:11:08.559 --> 00:11:11.200
<v Speaker 1>You know, I was just thinking about that the other day, like,

238
00:11:11.559 --> 00:11:13.879
<v Speaker 1>how much that has changed.

239
00:11:14.200 --> 00:11:17.120
<v Speaker 2>Yeah, it's really fascinating how this role has evolved, and

240
00:11:17.919 --> 00:11:21.279
<v Speaker 2>they actually frame it as a tale of three jobs.

241
00:11:21.399 --> 00:11:22.799
<v Speaker 1>A tale of three jobs. Okay.

242
00:11:22.600 --> 00:11:26.600
<v Speaker 2>Highlighting, you know, the journey of data analysts from

243
00:11:26.919 --> 00:11:29.559
<v Speaker 2>like report monkeys to self service enablers.

244
00:11:29.639 --> 00:11:31.360
<v Speaker 1>Okay, I definitely want to hear more about this.

245
00:11:31.480 --> 00:11:34.080
<v Speaker 2>Yeah, it's a really interesting story. So in the early

246
00:11:34.159 --> 00:11:36.879
<v Speaker 2>days of BI, you know, data analysts were often relegated

247
00:11:36.960 --> 00:11:38.679
<v Speaker 2>to the role of report monkeys.

248
00:11:38.799 --> 00:11:39.120
<v Speaker 1>Okay.

249
00:11:39.360 --> 00:11:43.440
<v Speaker 2>They basically spent most of their time manually generating reports

250
00:11:43.440 --> 00:11:44.639
<v Speaker 2>for decision makers.

251
00:11:44.759 --> 00:11:45.279
<v Speaker 1>Okay.

252
00:11:45.360 --> 00:11:48.799
<v Speaker 2>It was really tedious and reactive, and they often felt

253
00:11:48.799 --> 00:11:51.320
<v Speaker 2>like they were just churning out spreadsheets okay, and not

254
00:11:51.360 --> 00:11:54.440
<v Speaker 2>really having any opportunity to actually analyze the data.

255
00:11:54.519 --> 00:11:57.519
<v Speaker 1>Right, They're just you know, putting together spreadsheets.

256
00:11:56.960 --> 00:11:59.440
<v Speaker 2>Exactly, And it was a recipe for burnout.

257
00:11:59.480 --> 00:12:01.080
<v Speaker 1>Oh, absolutely, I can imagine.

258
00:12:01.120 --> 00:12:03.720
<v Speaker 2>But then came the advent of self service BI tools

259
00:12:03.759 --> 00:12:07.960
<v Speaker 2>like Tableau, oh, Tableau, which promised to liberate analysts from

260
00:12:08.200 --> 00:12:12.240
<v Speaker 2>you know, the shackles of manual reporting and empower business

261
00:12:12.320 --> 00:12:14.840
<v Speaker 2>users to actually explore the data themselves.

262
00:12:15.240 --> 00:12:19.279
<v Speaker 1>Okay, so they're empowered. That sounds like a really good thing.

263
00:12:19.639 --> 00:12:23.440
<v Speaker 2>Freedom. It certainly seemed that way initially, but as the

264
00:12:23.519 --> 00:12:27.120
<v Speaker 2>guidebook points out, this shift to self-service BI, while

265
00:12:27.120 --> 00:12:30.000
<v Speaker 2>empowering in many ways, also introduced some new challenges.

266
00:12:30.080 --> 00:12:31.279
<v Speaker 1>Okay, so there's a but coming.

267
00:12:31.519 --> 00:12:31.799
<v Speaker 2>Yeah.

268
00:12:31.799 --> 00:12:34.679
<v Speaker 1>What were some of the downsides of this self service revolution?

269
00:12:34.919 --> 00:12:37.919
<v Speaker 2>Well, one of the biggest issues was the rise of

270
00:12:38.000 --> 00:12:38.720
<v Speaker 2>metric drift.

271
00:12:38.879 --> 00:12:39.240
<v Speaker 1>Okay.

272
00:12:39.480 --> 00:12:44.320
<v Speaker 2>With no centralized definition or governance around key metrics, you know,

273
00:12:45.000 --> 00:12:48.879
<v Speaker 2>different departments started calculating things in slightly different ways, leading

274
00:12:48.919 --> 00:12:53.559
<v Speaker 2>to you know, inconsistencies and making it difficult to really

275
00:12:53.559 --> 00:12:56.039
<v Speaker 2>get a clear picture of what was actually happening.

276
00:12:56.200 --> 00:12:59.360
<v Speaker 1>Right, so again, like everyone speaking a different language. Exactly.

277
00:12:59.399 --> 00:13:02.519
<v Speaker 2>Yeah, it's like everyone having their own version of the truth, right,

278
00:13:02.600 --> 00:13:05.360
<v Speaker 2>which doesn't sound like a recipe for good decision making.

279
00:13:05.519 --> 00:13:08.879
<v Speaker 1>Oh no, it doesn't. So is there something else? Where

280
00:13:08.879 --> 00:13:09.720
<v Speaker 1>do we go from there?

281
00:13:09.919 --> 00:13:13.159
<v Speaker 2>Right? So that's where the third stage in the evolution

282
00:13:13.279 --> 00:13:16.039
<v Speaker 2>of the data analyst comes in. And this is where

283
00:13:16.840 --> 00:13:19.960
<v Speaker 2>you know, tools like Looker and Holistics enter the scene

284
00:13:20.679 --> 00:13:23.759
<v Speaker 2>with their emphasis on data modeling layers and a more

285
00:13:23.919 --> 00:13:25.799
<v Speaker 2>balanced approach to self service.

286
00:13:26.000 --> 00:13:28.519
<v Speaker 1>Okay, so data modeling layers, I think you've mentioned this before.

287
00:13:28.559 --> 00:13:30.799
<v Speaker 1>Can you just explain what those are and why they're

288
00:13:30.799 --> 00:13:31.399
<v Speaker 1>so important?

289
00:13:31.559 --> 00:13:34.679
<v Speaker 2>Right. A data modeling layer is basically an abstraction layer

290
00:13:35.080 --> 00:13:38.200
<v Speaker 2>that sits on top of your raw data, and it

291
00:13:38.200 --> 00:13:43.120
<v Speaker 2>allows you to define business logic, calculations, relationships, and standardized

292
00:13:43.120 --> 00:13:47.159
<v Speaker 2>definitions for all your key metrics in a centralized location.
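
NOTE
Editor's sketch of the idea only, not how Looker or Holistics actually implement it: each metric is defined once in a central place and every report query is built from that shared definition. The METRICS registry, the build_query helper, and the orders table are hypothetical names, and SQLite stands in for the warehouse.
```python
import sqlite3
# Hypothetical central registry: each metric's SQL expression is defined once.
METRICS = {
    "total_revenue": "SUM(amount)",
    "order_count": "COUNT(*)",
}
def build_query(metric, group_by):
    """Compose a report query from the shared metric definition."""
    return (
        f"SELECT {group_by}, {METRICS[metric]} AS {metric} "
        f"FROM orders GROUP BY {group_by} ORDER BY {group_by}"
    )
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, channel TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    ("EU", "web", 10.0), ("EU", "store", 20.0), ("US", "web", 5.0),
])
# Two different reports reuse the same definition of total_revenue.
by_region = conn.execute(build_query("total_revenue", "region")).fetchall()
by_channel = conn.execute(build_query("total_revenue", "channel")).fetchall()
print(by_region, by_channel)
```
Changing the expression in METRICS changes it for every report at once, which is the property that guards against metric drift.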

293
00:13:47.919 --> 00:13:51.480
<v Speaker 1>So rather than just having like the business logic baked

294
00:13:51.519 --> 00:13:54.600
<v Speaker 1>into every individual report or dashboard. Exactly.

295
00:13:54.679 --> 00:13:57.639
<v Speaker 2>Yeah, you're basically creating a central repository, okay, of all

296
00:13:57.639 --> 00:14:00.440
<v Speaker 2>those definitions and calculations that can be reused across

297
00:14:00.480 --> 00:14:01.159
<v Speaker 2>the organization.

298
00:14:01.320 --> 00:14:03.279
<v Speaker 1>It's like a blueprint for your data.

299
00:14:02.960 --> 00:14:06.519
<v Speaker 2>Exactly, So everyone is working from the same set of plans.

300
00:14:06.919 --> 00:14:09.759
<v Speaker 2>And this not only helps to prevent metric drift, but

301
00:14:09.799 --> 00:14:12.480
<v Speaker 2>it also makes it much easier for business users to

302
00:14:12.559 --> 00:14:16.960
<v Speaker 2>actually explore the data independently without you know, getting lost

303
00:14:17.039 --> 00:14:18.360
<v Speaker 2>in the technical complexities.

304
00:14:18.879 --> 00:14:21.759
<v Speaker 1>Yeah, you know. And there's there's this great anecdote in

305
00:14:21.759 --> 00:14:26.679
<v Speaker 1>the book about a CEO who needed specific data and

306
00:14:26.720 --> 00:14:29.679
<v Speaker 1>how a data modeling layer empowered them to be able to

307
00:14:29.720 --> 00:14:33.039
<v Speaker 1>get that data themselves without having to go through the

308
00:14:33.159 --> 00:14:33.720
<v Speaker 1>data team.

309
00:14:33.960 --> 00:14:36.559
<v Speaker 2>Yeah, that's a great anecdote. It illustrates how this can

310
00:14:36.720 --> 00:14:39.759
<v Speaker 2>really bridge the gap, you know, yeah, between technical and

311
00:14:39.799 --> 00:14:44.080
<v Speaker 2>non technical users and make data accessible to everyone. You know,

312
00:14:44.200 --> 00:14:46.159
<v Speaker 2>so instead of having to rely on the data team

313
00:14:46.200 --> 00:14:50.480
<v Speaker 2>for every single request, business users can actually explore the

314
00:14:50.559 --> 00:14:53.720
<v Speaker 2>data themselves, okay, armed with the confidence that they're working

315
00:14:53.759 --> 00:14:55.799
<v Speaker 2>with accurate and consistent definitions.

316
00:14:55.879 --> 00:14:58.759
<v Speaker 1>Yeah, it seems like a win win. The data analysts

317
00:14:58.759 --> 00:15:01.600
<v Speaker 1>don't have to just focus on those mundane reporting tasks,

318
00:15:02.120 --> 00:15:04.200
<v Speaker 1>and business users are able to kind of answer their

319
00:15:04.200 --> 00:15:06.240
<v Speaker 1>own questions. Exactly.

320
00:15:06.399 --> 00:15:09.279
<v Speaker 2>And you know, the guidebook highlights the key benefits of

321
00:15:09.320 --> 00:15:14.720
<v Speaker 2>this approach. It's increased self service analytics, okay, more efficient

322
00:15:14.919 --> 00:15:19.559
<v Speaker 2>use of your data team's resources, and a well documented

323
00:15:19.639 --> 00:15:23.320
<v Speaker 2>and consistent layer of data knowledge that everyone can access.

324
00:15:23.360 --> 00:15:26.159
<v Speaker 1>Okay, So data modeling layers very important piece of the

325
00:15:26.200 --> 00:15:31.600
<v Speaker 1>puzzle for sure. So let's talk about data modeling itself

326
00:15:31.600 --> 00:15:33.639
<v Speaker 1>a little bit more. You sent me some really interesting

327
00:15:33.639 --> 00:15:37.279
<v Speaker 1>stuff on Kimball's dimensional data modeling, with I think is

328
00:15:37.279 --> 00:15:39.480
<v Speaker 1>a very important framework in this field. Yeah.

329
00:15:39.519 --> 00:15:43.039
<v Speaker 2>Absolutely, Kimball's dimensional data modeling. It's a classic for

330
00:15:43.080 --> 00:15:46.600
<v Speaker 2>a reason. You know, even in today's cloud first world,

331
00:15:46.960 --> 00:15:49.600
<v Speaker 2>the core concepts are incredibly relevant.

332
00:15:49.799 --> 00:15:50.120
<v Speaker 1>Okay.

333
00:15:50.320 --> 00:15:53.480
<v Speaker 2>It's like learning the fundamentals of music theory. You know.

334
00:15:53.519 --> 00:15:55.919
<v Speaker 2>You can always add your own flair later on, yeah,

335
00:15:55.960 --> 00:16:00.000
<v Speaker 2>but those basics, they're essential, right? So what are some

336
00:16:00.080 --> 00:16:01.799
<v Speaker 1>of those core concepts, those essentials?

337
00:16:01.840 --> 00:16:04.000
<v Speaker 2>Well, one of the core concepts is this idea of

338
00:16:04.039 --> 00:16:08.279
<v Speaker 2>a data model itself. So a data model is essentially

339
00:16:08.320 --> 00:16:10.840
<v Speaker 2>an abstract representation of your business data.

340
00:16:11.080 --> 00:16:11.360
<v Speaker 1>Okay.

341
00:16:11.480 --> 00:16:15.360
<v Speaker 2>It defines the entities, the attributes, the relationships that are

342
00:16:15.799 --> 00:16:19.080
<v Speaker 2>important for your analysis. It's kind of like creating a map,

343
00:16:19.159 --> 00:16:19.360
<v Speaker 2>you know.

344
00:16:19.559 --> 00:16:22.559
<v Speaker 1>Okay, So it's like our map of our data exactly. Okay,

345
00:16:22.600 --> 00:16:25.080
<v Speaker 1>So what else do we need to navigate this terrain?

346
00:16:25.320 --> 00:16:28.080
<v Speaker 2>Well, another key concept is what's called relationship mapping.

347
00:16:28.399 --> 00:16:28.679
<v Speaker 1>Okay.

348
00:16:28.720 --> 00:16:31.360
<v Speaker 2>So, just like in a relational database, you need to

349
00:16:31.399 --> 00:16:34.360
<v Speaker 2>define how different data models relate to each other. So,

350
00:16:34.440 --> 00:16:37.639
<v Speaker 2>for example, a customer model might have a relationship with

351
00:16:37.679 --> 00:16:40.720
<v Speaker 2>an orders model okay, right, which allows you to analyze

352
00:16:41.600 --> 00:16:43.039
<v Speaker 2>customer purchasing behavior.

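NOTE
Editor's sketch, not taken from the guidebook: one way to picture the customer-to-orders relationship just described. The row contents and the helper name orders_by_customer are hypothetical; a real modeling layer would declare this relationship rather than code it by hand.
```python
from collections import defaultdict
# Hypothetical rows; in a real system these would live in warehouse tables.
customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Grace"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 40.0},
    {"order_id": 11, "customer_id": 1, "amount": 15.0},
    {"order_id": 12, "customer_id": 2, "amount": 25.0},
]
def orders_by_customer(customers, orders):
    """Follow the one-to-many link from customers.customer_id to orders.customer_id."""
    grouped = defaultdict(list)
    for order in orders:
        grouped[order["customer_id"]].append(order)
    # Customers with no orders still appear, with an empty list.
    return {c["name"]: grouped.get(c["customer_id"], []) for c in customers}
purchases = orders_by_customer(customers, orders)
```
With the relationship in place, purchasing behavior per customer is a lookup such as purchases["Ada"].
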
353
00:16:43.120 --> 00:16:45.519
<v Speaker 1>So it's like connecting the dots between different data

354
00:16:45.240 --> 00:16:48.519
<v Speaker 2>points. Exactly, yeah, to reveal that bigger picture.

355
00:16:48.600 --> 00:16:48.919
<v Speaker 1>Okay.

356
00:16:49.679 --> 00:16:54.039
<v Speaker 2>Then there's what's called custom field logic, which allows you

357
00:16:54.120 --> 00:16:58.679
<v Speaker 2>to create calculated fields within your models, okay. For example, you

358
00:16:58.759 --> 00:17:02.039
<v Speaker 2>might define total revenue as the sum of all

359
00:17:02.039 --> 00:17:03.039
<v Speaker 2>sales transactions.

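NOTE
Editor's sketch, not taken from the guidebook: the "total revenue" calculated field as a small function. The field names quantity and unit_price are assumptions made for illustration.
```python
# Hypothetical sales transactions; the field names are illustrative.
sales = [
    {"transaction_id": 1, "quantity": 2, "unit_price": 9.5},
    {"transaction_id": 2, "quantity": 1, "unit_price": 20.0},
    {"transaction_id": 3, "quantity": 3, "unit_price": 5.0},
]
def total_revenue(transactions):
    """Calculated field: revenue summed over all sales transactions."""
    return sum(t["quantity"] * t["unit_price"] for t in transactions)
revenue = total_revenue(sales)
```
Defining the calculation once, in the model, means every report that uses it gets the same number.
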
360
00:17:03.120 --> 00:17:06.480
<v Speaker 1>Right, so we're going beyond the raw data; we're adding calculations

361
00:17:05.880 --> 00:17:08.799
<v Speaker 2>to it. Exactly, yeah, you're adding layers of meaning and

362
00:17:08.839 --> 00:17:13.000
<v Speaker 2>calculations to make it more relevant to your specific business needs. Right.

363
00:17:13.400 --> 00:17:17.440
<v Speaker 2>And then finally, there's this idea of models built on

364
00:17:17.519 --> 00:17:21.079
<v Speaker 2>top of other models, okay. And this allows you to

365
00:17:21.240 --> 00:17:26.160
<v Speaker 2>create increasingly sophisticated and nuanced representations of your data.

366
00:17:26.240 --> 00:17:26.599
<v Speaker 1>Okay.

367
00:17:26.839 --> 00:17:29.839
<v Speaker 2>So, for example, you could combine a customer model with

368
00:17:29.880 --> 00:17:32.799
<v Speaker 2>a product model okay, and create what's called a customer

369
00:17:32.839 --> 00:17:36.480
<v Speaker 2>segmentation model okay, right, which allows you to group customers

370
00:17:36.480 --> 00:17:37.839
<v Speaker 2>based on their purchasing patterns.

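NOTE
Editor's sketch, not taken from the guidebook: a model built on top of another model. customer_spend is a first derived model and customer_segments is built on its output; both names and the 500.0 threshold are hypothetical.
```python
# Hypothetical base model: one row per order line.
order_lines = [
    {"customer": "Ada", "product": "laptop", "amount": 1200.0},
    {"customer": "Ada", "product": "mouse", "amount": 25.0},
    {"customer": "Grace", "product": "mouse", "amount": 25.0},
]
def customer_spend(lines):
    """First derived model: total spend per customer."""
    totals = {}
    for line in lines:
        totals[line["customer"]] = totals.get(line["customer"], 0.0) + line["amount"]
    return totals
def customer_segments(spend, threshold=500.0):
    """Second model, built on the first: label customers by spend (threshold is arbitrary)."""
    return {name: ("high" if total >= threshold else "low") for name, total in spend.items()}
segments = customer_segments(customer_spend(order_lines))
```
Because the segmentation reads only the output of the spend model, a change to how spend is computed flows through without touching the segmentation logic.
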
371
00:17:37.880 --> 00:17:40.400
<v Speaker 1>So we're building like a hierarchy of models.

372
00:17:40.000 --> 00:17:44.359
<v Speaker 2>Exactly, Yeah, each one adding more you know, granularity and insight.

373
00:17:44.519 --> 00:17:47.839
<v Speaker 1>Okay. So, and this brings us to the star of

374
00:17:47.880 --> 00:17:52.079
<v Speaker 1>the show, which I think you mentioned earlier, the star schema. Yes,

375
00:17:52.680 --> 00:17:54.359
<v Speaker 1>the star schema. So tell me about that.

376
00:17:54.440 --> 00:17:57.480
<v Speaker 2>It's a classic for a reason, okay. And it's

377
00:17:57.519 --> 00:18:01.279
<v Speaker 2>basically a specific way of organizing your data for analysis

378
00:18:01.720 --> 00:18:05.119
<v Speaker 2>that makes it incredibly efficient for querying and reporting. It's

379
00:18:05.160 --> 00:18:09.119
<v Speaker 2>called a star schema because visually it kind of resembles

380
00:18:09.160 --> 00:18:11.880
<v Speaker 2>a star. You have a central fact table and that's

381
00:18:11.920 --> 00:18:14.079
<v Speaker 2>surrounded by multiple dimension tables.

382
00:18:14.240 --> 00:18:16.079
<v Speaker 1>Okay. So the fact table is the heart of the

383
00:18:16.119 --> 00:18:19.480
<v Speaker 1>star exactly, and then the dimensions are like the points

384
00:18:19.680 --> 00:18:22.519
<v Speaker 1>radiating outward, like what goes into each of those?

385
00:18:22.720 --> 00:18:26.480
<v Speaker 2>Right? So the fact table contains the core metrics you

386
00:18:26.519 --> 00:18:30.839
<v Speaker 2>want to analyze, okay. Often these are numerical values like

387
00:18:31.440 --> 00:18:36.319
<v Speaker 2>sales figures or website visits, you know, customer interactions. It's

388
00:18:36.359 --> 00:18:38.960
<v Speaker 2>like the what of your analysis okay, the what right,

389
00:18:39.240 --> 00:18:43.079
<v Speaker 2>and the dimension tables provide the context okay, and descriptive

390
00:18:43.119 --> 00:18:46.680
<v Speaker 2>information about those facts okay, you know, like the who, what, where,

391
00:18:47.000 --> 00:18:49.279
<v Speaker 2>when and why of your data.

392
00:18:49.680 --> 00:18:53.000
<v Speaker 1>Okay. So, for example, if we're looking at you know,

393
00:18:53.160 --> 00:18:58.559
<v Speaker 1>sales data, our fact table might contain like date, product ID, quantity,

394
00:18:58.599 --> 00:19:02.319
<v Speaker 1>sold, price. Exactly, yeah. And then our dimension tables would

395
00:19:02.359 --> 00:19:04.599
<v Speaker 1>be things like the customer, the product, you know, the

396
00:19:04.599 --> 00:19:07.200
<v Speaker 1>location and things like that precisely. Yeah. Yeah.

397
00:19:07.240 --> 00:19:09.920
<v Speaker 2>And by linking these tables together, you can start to

398
00:19:10.000 --> 00:19:13.079
<v Speaker 2>answer those, you know, complex questions about your data, like

399
00:19:13.480 --> 00:19:16.839
<v Speaker 2>which products are selling best in which regions? Or what

400
00:19:16.960 --> 00:19:19.720
<v Speaker 2>are the demographics of our highest spending customers.

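NOTE
Editor's sketch, not taken from the guidebook: a tiny star schema in SQLite with one fact table and two dimension tables, answering "which products are selling best in which regions". All table names, column names, and rows are hypothetical.
```python
import sqlite3
conn = sqlite3.connect(":memory:")
# Hypothetical tables: one fact table keyed to two dimension tables.
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_location (location_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE fact_sales (
    sale_date TEXT, product_id INTEGER, location_id INTEGER,
    quantity_sold INTEGER, price REAL
);
INSERT INTO dim_product VALUES (1, 'widget'), (2, 'gadget');
INSERT INTO dim_location VALUES (1, 'north'), (2, 'south');
INSERT INTO fact_sales VALUES
    ('2024-01-01', 1, 1, 3, 10.0),
    ('2024-01-01', 2, 1, 1, 50.0),
    ('2024-01-02', 1, 2, 5, 10.0);
""")
# The business question becomes a join from the fact table out to its dimensions.
rows = conn.execute("""
    SELECT l.region, p.product_name, SUM(f.quantity_sold * f.price) AS revenue
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_location AS l ON l.location_id = f.location_id
    GROUP BY l.region, p.product_name
    ORDER BY revenue DESC, l.region
""").fetchall()
```
The fact table holds the measures (quantity_sold, price) and foreign keys; the descriptive context (product name, region) lives only in the dimension tables.
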
401
00:19:19.920 --> 00:19:22.039
<v Speaker 1>So we're adding layers of detail. Exactly.

402
00:19:22.119 --> 00:19:25.480
<v Speaker 2>Yeah. And the guidebook does a great job of, you know,

403
00:19:25.599 --> 00:19:29.559
<v Speaker 2>walking us through a practical example of designing a star

404
00:19:29.640 --> 00:19:33.640
<v Speaker 2>schema using the case of modeling data from a point

405
00:19:33.680 --> 00:19:37.519
<v Speaker 2>of sale system, and they emphasize this importance of choosing

406
00:19:37.559 --> 00:19:41.640
<v Speaker 2>the right grain, you know, which refers to the level

407
00:19:41.680 --> 00:19:45.119
<v Speaker 2>of detail that's captured in that fact table. Okay, and

408
00:19:45.200 --> 00:19:50.440
<v Speaker 2>getting this right it's really crucial for efficient and accurate analysis.

409
00:19:49.920 --> 00:19:51.880
<v Speaker 1>Yeah, because if you get it wrong, it can lead

410
00:19:51.920 --> 00:19:55.559
<v Speaker 1>to all kinds of problems, right, like performance issues, data redundancy,

411
00:19:55.799 --> 00:19:58.559
<v Speaker 1>and even inaccurate results exactly.

412
00:19:58.720 --> 00:20:01.079
<v Speaker 2>Yeah. It's like, you know, trying to build a

413
00:20:01.119 --> 00:20:03.200
<v Speaker 2>house with bricks that are too big or too small.

414
00:20:03.279 --> 00:20:04.440
<v Speaker 2>It's just not going to work well.

415
00:20:04.559 --> 00:20:10.400
<v Speaker 1>Okay. So, like they use the analogy of tracking website traffic, right,

416
00:20:10.480 --> 00:20:13.400
<v Speaker 1>and if your grain is too coarse, you might only

417
00:20:13.480 --> 00:20:17.519
<v Speaker 1>be tracking visits at you know, the website level, but

418
00:20:17.599 --> 00:20:21.720
<v Speaker 1>you're missing details about like the page views right exactly.

419
00:20:21.880 --> 00:20:25.319
<v Speaker 1>But if it's too fine, you're tracking every single click

420
00:20:25.440 --> 00:20:26.160
<v Speaker 1>and mouse movement.

421
00:20:26.240 --> 00:20:30.359
<v Speaker 2>Yeah, and you end up with this massive, unwieldy data set.

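NOTE
Editor's sketch, not taken from the guidebook: the same hypothetical traffic data viewed at two grains. At page-view grain each row is one page view; at visit grain the page detail is no longer available.
```python
from collections import Counter
# Hypothetical fine-grained events: one row per page view.
page_views = [
    {"visit_id": "v1", "page": "/home"},
    {"visit_id": "v1", "page": "/pricing"},
    {"visit_id": "v2", "page": "/home"},
]
# Page-view grain: one row per page per visit, so per-page questions are answerable.
views_per_page = Counter(view["page"] for view in page_views)
# Visit grain: coarser; the page detail is gone once rows are rolled up.
visit_count = len({view["visit_id"] for view in page_views})
```
A finer grain can always be rolled up to a coarser one, but not the other way around, which is why the choice matters at design time.
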
422
00:20:30.640 --> 00:20:33.480
<v Speaker 1>Okay, So it's really important to find that that sweet

423
00:20:33.480 --> 00:20:36.519
<v Speaker 1>spot exactly, so you know, you get the information you

424
00:20:36.599 --> 00:20:40.279
<v Speaker 1>need without overwhelming yourself with data you don't need, right, now,

425
00:20:40.480 --> 00:20:42.960
<v Speaker 1>you know, I know the world of data warehousing has

426
00:20:43.039 --> 00:20:45.839
<v Speaker 1>changed a lot since Kimball first introduced this. Oh yeah,

427
00:20:45.880 --> 00:20:48.839
<v Speaker 1>I mean, especially with cloud data warehouses and their power

428
00:20:48.839 --> 00:20:49.680
<v Speaker 1>and affordability.

429
00:20:49.720 --> 00:20:53.240
<v Speaker 2>Absolutely, and the guidebook emphasizes that, you know, while Kimball's

430
00:20:53.240 --> 00:20:57.200
<v Speaker 2>principles are still really relevant, some of those specific techniques

431
00:20:57.240 --> 00:21:01.319
<v Speaker 2>that he talked about can be adapted or even bypassed

432
00:21:01.359 --> 00:21:04.079
<v Speaker 2>in the context of a modern cloud data warehouse.

433
00:21:04.160 --> 00:21:06.440
<v Speaker 1>Okay, so there's some things you can skip, yeah, exactly.

434
00:21:06.480 --> 00:21:09.200
<v Speaker 1>You know, like they gave the example of inventory management,

435
00:21:09.519 --> 00:21:12.599
<v Speaker 1>and traditionally, you know, if you wanted to track inventory

436
00:21:12.680 --> 00:21:16.960
<v Speaker 1>levels over time, you'd need these complex snapshot fact tables.

437
00:21:17.519 --> 00:21:19.480
<v Speaker 1>But with a modern data warehouse, a lot of the

438
00:21:19.480 --> 00:21:21.599
<v Speaker 1>time you can just work with the raw data exactly.

439
00:21:21.960 --> 00:21:24.839
<v Speaker 2>Yeah, and that's because of the sheer processing power and

440
00:21:24.920 --> 00:21:29.200
<v Speaker 2>storage capacity of these modern cloud data warehouses. Yeah, it

441
00:21:29.279 --> 00:21:32.559
<v Speaker 2>allows us to be much more agile and iterative in

442
00:21:32.599 --> 00:21:35.079
<v Speaker 2>our in our approach to data modeling. You know, we

443
00:21:35.119 --> 00:21:38.880
<v Speaker 2>can load the raw data first and then experiment with

444
00:21:39.039 --> 00:21:42.440
<v Speaker 2>different modeling techniques within the warehouse itself.

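NOTE
Editor's sketch, not taken from the guidebook: deriving inventory levels over time directly from raw stock movements instead of maintaining a separate snapshot fact table. The movement records are hypothetical, and the guidebook's own method may differ.
```python
from itertools import accumulate
# Hypothetical raw stock movements for one product, already sorted by date.
movements = [
    ("2024-01-01", 100),   # delivery
    ("2024-01-02", -30),   # sales
    ("2024-01-03", -20),   # sales
    ("2024-01-04", 50),    # delivery
]
# A running total of movements gives the stock level at the end of each day.
levels = list(accumulate(qty for _, qty in movements))
inventory_by_day = dict(zip((day for day, _ in movements), levels))
```
In a warehouse the same idea is usually a window function over the raw table, computed at query time.
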
445
00:21:42.519 --> 00:21:44.759
<v Speaker 1>Okay, so you don't have to, like, figure everything out upfront,

446
00:21:44.839 --> 00:21:45.200
<v Speaker 1>you can.

447
00:21:45.160 --> 00:21:47.680
<v Speaker 2>just experiment. Exactly, yeah, it's much more flexible. I

448
00:21:47.720 --> 00:21:50.200
<v Speaker 1>see, okay. And they also talked about how features like

449
00:21:50.200 --> 00:21:56.279
<v Speaker 1>table partitioning can simplify how you handle slowly changing dimensions

450
00:21:56.400 --> 00:21:57.039
<v Speaker 1>or SCDs.

451
00:21:57.279 --> 00:21:58.880
<v Speaker 2>Yeah for sure, which used.

452
00:21:58.759 --> 00:22:01.000
<v Speaker 1>to require, you know, these complex modeling

453
00:22:00.720 --> 00:22:03.839
<v Speaker 2>techniques. Right, right. So table partitioning is basically a really

454
00:22:03.880 --> 00:22:07.920
<v Speaker 2>powerful technique. It allows you to divide, you know, a

455
00:22:08.000 --> 00:22:12.279
<v Speaker 2>large table into smaller, more manageable chunks, okay, based on

456
00:22:12.359 --> 00:22:16.880
<v Speaker 2>criteria like date ranges. And this can really improve query

457
00:22:16.920 --> 00:22:19.559
<v Speaker 2>performance and make it a lot easier to manage that

458
00:22:19.680 --> 00:22:20.519
<v Speaker 2>historical data.

459
00:22:21.000 --> 00:22:23.920
<v Speaker 1>So instead of having like this one massive table for

460
00:22:24.000 --> 00:22:26.559
<v Speaker 1>all of our customer data, we might partition it by

461
00:22:26.640 --> 00:22:29.039
<v Speaker 1>year exactly. Yeah, so if we need data from a

462
00:22:29.079 --> 00:22:31.240
<v Speaker 1>specific year, it's much faster to get to.

463
00:22:31.279 --> 00:22:36.319
<v Speaker 2>Yeah, precisely. And then when it comes to SCDs, you know,

464
00:22:36.640 --> 00:22:42.000
<v Speaker 2>which track changes to dimension attributes over time, table partitioning

465
00:22:42.000 --> 00:22:44.799
<v Speaker 2>can actually be used to store different versions of a

466
00:22:44.839 --> 00:22:48.519
<v Speaker 2>dimension in separate partitions, okay, so you can easily track

467
00:22:48.559 --> 00:22:52.559
<v Speaker 2>those historical changes without having to resort to those complex

468
00:22:52.640 --> 00:22:53.640
<v Speaker 2>modeling techniques.

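NOTE
Editor's sketch, not taken from the guidebook: storing each version of a dimension in its own date partition and reading the version that was current on a given day. The dictionary stands in for a date-partitioned table; customer_as_of and all rows are hypothetical.
```python
# Hypothetical daily snapshots of a customer dimension, keyed by partition date.
dim_customer_partitions = {
    "2024-01-01": {1: {"name": "Ada", "city": "London"}},
    "2024-01-02": {1: {"name": "Ada", "city": "London"}},
    "2024-01-03": {1: {"name": "Ada", "city": "Paris"}},
}
def customer_as_of(partitions, customer_id, as_of_date):
    """Return the customer's attributes from the latest partition on or before as_of_date."""
    # ISO-formatted dates sort correctly as plain strings.
    eligible = [day for day in partitions if day <= as_of_date]
    if not eligible:
        return None
    return partitions[max(eligible)].get(customer_id)
before_move = customer_as_of(dim_customer_partitions, 1, "2024-01-02")
after_move = customer_as_of(dim_customer_partitions, 1, "2024-01-05")
```
The trade-off is storage for simplicity: every partition repeats unchanged rows, but history is a plain filter on the partition date.
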
469
00:22:54.319 --> 00:22:56.920
<v Speaker 1>So I mean, with these cloud data warehouses, it seems

470
00:22:56.960 --> 00:22:58.720
<v Speaker 1>like there's just so much more flexibility.

471
00:22:58.880 --> 00:23:01.720
<v Speaker 2>Yeah, absolutely, we're no longer limited by those constraints of

472
00:23:01.759 --> 00:23:05.359
<v Speaker 2>the old traditional on-premises data warehouses. So yeah, we

473
00:23:05.400 --> 00:23:08.400
<v Speaker 2>can experiment more freely, iterate more quickly, and adapt our

474
00:23:08.440 --> 00:23:10.359
<v Speaker 2>models as our business needs evolve.

475
00:23:10.720 --> 00:23:13.079
<v Speaker 1>Okay, And you know, they have a really great case

476
00:23:13.119 --> 00:23:16.680
<v Speaker 1>study about their own company, Holistics,

477
00:23:17.240 --> 00:23:21.920
<v Speaker 1>and they talk about how they implemented a Snowplow analytics system,

478
00:23:22.039 --> 00:23:26.000
<v Speaker 1>right yeah, and how they took this very iterative approach

479
00:23:26.480 --> 00:23:27.920
<v Speaker 1>to data modeling.

480
00:23:28.440 --> 00:23:32.119
<v Speaker 2>It's a great example of putting theory into practice.

481
00:23:32.359 --> 00:23:34.640
<v Speaker 2>You know, they didn't try to model everything up front.

482
00:23:34.960 --> 00:23:37.759
<v Speaker 2>They just started with a basic model and then iteratively

483
00:23:37.799 --> 00:23:41.359
<v Speaker 2>refined it based on how people were actually using the data.

484
00:23:41.440 --> 00:23:44.440
<v Speaker 1>So the key takeaway is, let usage guide your modeling

485
00:23:44.480 --> 00:23:48.240
<v Speaker 1>decisions exactly, you know, don't try to over engineer things

486
00:23:48.240 --> 00:23:51.039
<v Speaker 1>from the start. Start simple and see how people are

487
00:23:51.079 --> 00:23:53.680
<v Speaker 1>interacting with the data, and then adjust accordingly.

488
00:23:53.920 --> 00:23:56.720
<v Speaker 2>And they highlight some key principles that guided their process,

489
00:23:56.799 --> 00:24:00.680
<v Speaker 2>like embedding business logic in the data models themselves, okay,

490
00:24:00.759 --> 00:24:05.079
<v Speaker 2>rather than individual queries and aiming for you know, self

491
00:24:05.119 --> 00:24:08.799
<v Speaker 2>serve analytics by really empowering those business users to

492
00:24:08.920 --> 00:24:10.640
<v Speaker 2>explore the data independently.

493
00:24:10.960 --> 00:24:13.279
<v Speaker 1>So we're creating a system that's both you know, robust

494
00:24:13.279 --> 00:24:17.119
<v Speaker 1>and flexible, that ensures consistency but also allows for exploration.

495
00:24:17.319 --> 00:24:18.799
<v Speaker 2>Yeah, finding that balance.

496
00:24:19.039 --> 00:24:21.640
<v Speaker 1>Okay. And this brings us to this fascinating concept

497
00:24:21.680 --> 00:24:26.319
<v Speaker 1>of the arc of adoption, right, Yeah, which describes the

498
00:24:26.359 --> 00:24:29.880
<v Speaker 1>typical stages of data usage evolution within an organization.

499
00:24:30.240 --> 00:24:30.599
<v Speaker 2>Yeah.

500
00:24:30.640 --> 00:24:34.119
<v Speaker 1>It's a really great framework for understanding like how data

501
00:24:34.200 --> 00:24:36.200
<v Speaker 1>driven thinking takes root and

502
00:24:36.200 --> 00:24:39.720
<v Speaker 2>matures. Exactly, and understanding where your organization sits on this

503
00:24:39.920 --> 00:24:42.400
<v Speaker 2>arc can help you make much more informed decisions.

504
00:24:42.960 --> 00:24:45.680
<v Speaker 1>Okay, So tell me about this arc of adoption. I mean,

505
00:24:46.039 --> 00:24:47.920
<v Speaker 1>you know, we've already talked about the fact that, like

506
00:24:48.000 --> 00:24:51.160
<v Speaker 1>in those early stages, there's this heavy reliance on you know,

507
00:24:51.279 --> 00:24:54.319
<v Speaker 1>spreadsheets and ad hoc analysis, right, So, what are some

508
00:24:54.400 --> 00:24:57.000
<v Speaker 1>of the telltale signs that a company's in this phase

509
00:24:57.440 --> 00:24:59.359
<v Speaker 1>and what are some of the challenges they might face.

510
00:24:59.759 --> 00:25:02.400
<v Speaker 2>Well, in this initial phase, the data is often

511
00:25:02.480 --> 00:25:06.119
<v Speaker 2>scattered across various sources, and there's this heavy reliance on

512
00:25:06.319 --> 00:25:11.039
<v Speaker 2>manual processes to extract you know, basic insights. Think about

513
00:25:11.160 --> 00:25:13.960
<v Speaker 2>you know those late nights you spend cobbling together reports

514
00:25:13.960 --> 00:25:16.519
<v Speaker 2>in Excel. Oh yeah, I've been there. Trying to make

515
00:25:16.599 --> 00:25:20.759
<v Speaker 2>sense of data from different departments and systems, and you know,

516
00:25:20.839 --> 00:25:24.039
<v Speaker 2>it's functional, but it's so time consuming and prone to errors,

517
00:25:24.680 --> 00:25:27.960
<v Speaker 2>and it just doesn't scale well as your organization grows,

518
00:25:28.119 --> 00:25:28.640
<v Speaker 2>like trying

519
00:25:28.440 --> 00:25:30.400
<v Speaker 1>to build a skyscraper with hand tools. Exactly.

520
00:25:30.519 --> 00:25:32.200
<v Speaker 2>Yeah, you might be able to lay a few bricks,

521
00:25:32.200 --> 00:25:32.759
<v Speaker 2>but you're not going to

522
00:25:32.799 --> 00:25:34.000
<v Speaker 1>get very far. Right, right.

523
00:25:34.160 --> 00:25:38.400
<v Speaker 2>And the guidebook it highlights some of the key challenges

524
00:25:38.440 --> 00:25:43.559
<v Speaker 2>of this stage, like data inconsistency, lack of standardization. You know,

525
00:25:43.559 --> 00:25:46.839
<v Speaker 2>it's really difficult to collaborate, and there's a very limited

526
00:25:46.839 --> 00:25:49.880
<v Speaker 2>ability to answer those more complex business questions.

527
00:25:49.920 --> 00:25:52.079
<v Speaker 1>It's a lot of frustration from both you know, the

528
00:25:52.079 --> 00:25:55.160
<v Speaker 1>people putting together the reports and then the decision makers

529
00:25:55.160 --> 00:25:56.519
<v Speaker 1>who are trying to use that information.

530
00:25:56.720 --> 00:26:00.799
<v Speaker 2>For sure, and this is often the point where companies realize, okay,

531
00:26:00.839 --> 00:26:05.160
<v Speaker 2>we need a more robust and scalable solution, and they

532
00:26:05.200 --> 00:26:07.839
<v Speaker 2>begin to invest in BI tools, which kind of moves

533
00:26:07.880 --> 00:26:11.039
<v Speaker 2>them to that second stage of the arc of adoption, okay,

534
00:26:11.119 --> 00:26:15.599
<v Speaker 2>which is characterized by those, you know, static reports and dashboards.

535
00:26:15.839 --> 00:26:17.839
<v Speaker 2>So this is where we start to see you know,

536
00:26:17.920 --> 00:26:22.799
<v Speaker 2>those colorful charts and graphs that executives love, right right, Yeah,

537
00:26:22.839 --> 00:26:27.279
<v Speaker 2>and dashboards they provide this uh, you know, more centralized

538
00:26:27.319 --> 00:26:30.599
<v Speaker 2>and visual way to track those key metrics, so it

539
00:26:30.640 --> 00:26:35.599
<v Speaker 2>makes it easier to monitor progress, identify trends, and BI

540
00:26:35.759 --> 00:26:38.559
<v Speaker 2>tools play a much more prominent role at this stage,

541
00:26:38.680 --> 00:26:39.720
<v Speaker 2>host speaker.

542
00:26:39.440 --> 00:26:42.319
<v Speaker 1>Right, they start automating those reporting processes exactly.

543
00:26:42.440 --> 00:26:46.000
<v Speaker 2>Yeah, and they make data more accessible to a wider audience.

544
00:26:46.079 --> 00:26:48.440
<v Speaker 1>Okay, so it's like, you know, we've gone from that

545
00:26:48.559 --> 00:26:51.200
<v Speaker 1>hand drawn map to the GPS.

546
00:26:51.279 --> 00:26:52.200
<v Speaker 2>That's a great way to put it.

547
00:26:52.400 --> 00:26:54.960
<v Speaker 1>Yeah, we get a better picture of you know, where

548
00:26:54.960 --> 00:26:57.599
<v Speaker 1>we are and where we're going. Yeah, but we're still

549
00:26:57.640 --> 00:26:59.400
<v Speaker 1>kind of on a predetermined

550
00:26:58.799 --> 00:27:02.640
<v Speaker 2>route. Right. And dashboards, you know, they provide valuable insights,

551
00:27:03.200 --> 00:27:06.160
<v Speaker 2>but they don't necessarily empower users to you know, explore

552
00:27:06.200 --> 00:27:10.079
<v Speaker 2>the data freely or ask those what if questions that

553
00:27:10.119 --> 00:27:13.160
<v Speaker 2>often lead to those really groundbreaking discoveries.

554
00:27:13.200 --> 00:27:16.519
<v Speaker 1>And that brings us to, like, you know, the

555
00:27:17.160 --> 00:27:20.799
<v Speaker 1>holy grail of data maturity, right, self service analytics exactly

556
00:27:20.920 --> 00:27:23.960
<v Speaker 1>the third stage of this arc of adoption, And this

557
00:27:24.000 --> 00:27:26.039
<v Speaker 1>is where our users can you know, they have the

558
00:27:26.119 --> 00:27:31.400
<v Speaker 1>ability to access, analyze, and visualize data independently without having

559
00:27:31.480 --> 00:27:34.440
<v Speaker 1>to go to the data team for every single request exactly.

560
00:27:35.319 --> 00:27:41.000
<v Speaker 2>Self service analytics really represents this shift from a centralized,

561
00:27:41.079 --> 00:27:45.720
<v Speaker 2>IT-driven model to a much more decentralized, user-driven approach.

562
00:27:46.200 --> 00:27:48.559
<v Speaker 2>It's like, you know, having a fully interactive map, you

563
00:27:48.599 --> 00:27:52.480
<v Speaker 2>can zoom in and out, explore different routes, discover like

564
00:27:53.000 --> 00:27:56.000
<v Speaker 2>those hidden gems, you know, and even create your own

565
00:27:56.039 --> 00:27:57.039
<v Speaker 2>personalized maps.

566
00:27:57.279 --> 00:28:01.079
<v Speaker 1>So incredibly empowering for those users. Yeah, but I know

567
00:28:01.160 --> 00:28:04.359
<v Speaker 1>that the guidebook cautions that just giving people BI tools

568
00:28:04.480 --> 00:28:07.400
<v Speaker 1>isn't enough, no, not at all, to really, you know, experience

569
00:28:07.440 --> 00:28:08.640
<v Speaker 1>self service analytics.

570
00:28:08.759 --> 00:28:12.920
<v Speaker 2>Yeah, simply giving everyone access to data without the right foundation,

571
00:28:13.119 --> 00:28:15.839
<v Speaker 2>it's a recipe for disaster. Okay, you know, it's

572
00:28:15.920 --> 00:28:20.079
<v Speaker 2>like giving someone, like, a really powerful sports car, okay,

573
00:28:20.759 --> 00:28:24.680
<v Speaker 2>without any driving lessons or you know, knowledge of traffic laws.

574
00:28:24.839 --> 00:28:26.359
<v Speaker 1>Right, you know, they might be able to get it

575
00:28:26.400 --> 00:28:28.480
<v Speaker 1>moving but it's probably going to end in a crash.

576
00:28:28.640 --> 00:28:30.799
<v Speaker 2>Yeah, that's a scary thought. So what are

577
00:28:30.839 --> 00:28:32.119
<v Speaker 2>those essential ingredients?

578
00:28:32.240 --> 00:28:36.799
<v Speaker 1>Yeah, so the guidebook they highlight three key pillars data literacy,

579
00:28:37.279 --> 00:28:40.640
<v Speaker 1>data governance, and a robust data modeling layer.

580
00:28:40.799 --> 00:28:42.359
<v Speaker 2>Okay, let's break those down a little bit.

581
00:28:42.640 --> 00:28:46.519
<v Speaker 1>Yeah, so data literacy basically means that users need to

582
00:28:46.599 --> 00:28:50.480
<v Speaker 1>understand how to interpret and work with data, how to

583
00:28:50.519 --> 00:28:53.920
<v Speaker 1>ask the right questions okay, and then how to draw

584
00:28:54.279 --> 00:28:57.759
<v Speaker 1>you know, meaningful conclusions from the insights they discovered. Okay.

585
00:28:57.839 --> 00:29:00.440
<v Speaker 2>So it's not just about having access to the data; it's

586
00:29:00.519 --> 00:29:01.359
<v Speaker 2>knowing what to do with it.

587
00:29:01.559 --> 00:29:04.720
<v Speaker 1>Yeah, exactly. It's about being able to actually speak the

588
00:29:04.799 --> 00:29:12.039
<v Speaker 1>language of data. Data governance ensures that the data is accurate, consistent,

589
00:29:12.200 --> 00:29:13.079
<v Speaker 1>and trustworthy.

590
00:29:13.240 --> 00:29:13.559
<v Speaker 2>Okay.

591
00:29:13.680 --> 00:29:18.039
<v Speaker 1>It involves establishing processes for data quality management, defining those

592
00:29:18.160 --> 00:29:23.640
<v Speaker 1>clear roles and responsibilities for data stewardship, and then implementing

593
00:29:23.680 --> 00:29:27.240
<v Speaker 1>those security measures to protect sensitive information. So we're making

594
00:29:27.240 --> 00:29:29.599
<v Speaker 1>sure everyone's using the data responsibly exactly.

595
00:29:29.640 --> 00:29:32.359
<v Speaker 2>Yeah, it's about creating that framework that ensures everyone is

596
00:29:32.440 --> 00:29:34.319
<v Speaker 2>using the data responsibly and ethically.

597
00:29:35.640 --> 00:29:38.000
<v Speaker 1>Okay. And then finally, the data modeling layer, which I

598
00:29:38.000 --> 00:29:40.599
<v Speaker 1>think we've talked about a lot already, but that's the

599
00:29:40.640 --> 00:29:43.240
<v Speaker 1>foundation really that makes this all possible.

600
00:29:43.319 --> 00:29:50.640
<v Speaker 2>Yeah. Absolutely. And by defining those business logics and those calculations,

601
00:29:50.759 --> 00:29:55.440
<v Speaker 2>you know, those relationships in that centralized location, we're creating

602
00:29:55.480 --> 00:29:58.240
<v Speaker 2>this single source of truth that everyone in the organization

603
00:29:58.279 --> 00:29:59.200
<v Speaker 2>can actually trust.

604
00:30:00.799 --> 00:30:03.519
<v Speaker 1>So the modeling layer is our guide exactly.

605
00:30:03.640 --> 00:30:06.319
<v Speaker 2>Yeah. It ensures that everyone's you know, speaking the same

606
00:30:06.400 --> 00:30:08.519
<v Speaker 2>language and interpreting the data consistently.

607
00:30:08.759 --> 00:30:11.599
<v Speaker 1>Okay, So without that things could go very wrong.

608
00:30:11.759 --> 00:30:15.880
<v Speaker 2>Oh yeah, absolutely. Without a robust data modeling layer, you know,

609
00:30:15.960 --> 00:30:19.200
<v Speaker 2>self service analytics can really quickly descend into chaos.

610
00:30:19.279 --> 00:30:19.640
<v Speaker 1>Okay.

611
00:30:19.880 --> 00:30:23.119
<v Speaker 2>You know you have users potentially creating their own definitions,

612
00:30:23.160 --> 00:30:25.759
<v Speaker 2>their own calculations, okay, which can lead to all sorts

613
00:30:25.799 --> 00:30:28.880
<v Speaker 2>of inconsistencies and inaccurate reporting.

614
00:30:29.160 --> 00:30:31.279
<v Speaker 1>Okay. So you know, they make it very clear that

615
00:30:31.559 --> 00:30:34.000
<v Speaker 1>self-service analytics is a journey. It's not like

616
00:30:34.039 --> 00:30:35.279
<v Speaker 1>you just arrive there.

617
00:30:35.319 --> 00:30:39.960
<v Speaker 2>No, absolutely, yeah, it's an ongoing process of learning, adapting,

618
00:30:40.160 --> 00:30:45.200
<v Speaker 2>and refining. And as organizations become more data driven, the

619
00:30:45.319 --> 00:30:48.559
<v Speaker 2>demands on that data team they also increase.

620
00:30:48.680 --> 00:30:50.319
<v Speaker 1>Yeah we talked about that earlier.

621
00:30:50.079 --> 00:30:54.720
<v Speaker 2>Right, and the guidebook delves into the challenges of scaling

622
00:30:54.759 --> 00:30:58.319
<v Speaker 2>your BI tools and processes to match the growing data

623
00:30:58.400 --> 00:31:00.640
<v Speaker 2>needs of the organization.

624
00:31:00.880 --> 00:31:03.759
<v Speaker 1>Yeah, you know how that team can be overwhelmed by too

625
00:31:03.759 --> 00:31:09.039
<v Speaker 1>many requests for reports and dashboards, ad hoc analyses.

626
00:31:08.599 --> 00:31:12.279
<v Speaker 2>Right, Yeah, it's like that small kitchen staff trying to

627
00:31:12.599 --> 00:31:16.160
<v Speaker 2>cater a banquet for you know, hundreds of guests. It's

628
00:31:16.200 --> 00:31:17.759
<v Speaker 2>just not sustainable in the long run.

629
00:31:17.640 --> 00:31:19.599
<v Speaker 1>Right, it's not. So how do we prevent that? What

630
00:31:19.640 --> 00:31:20.720
<v Speaker 1>are some of those strategies?

631
00:31:20.960 --> 00:31:24.599
<v Speaker 2>So scalability becomes really really crucial here, and the guidebook

632
00:31:24.680 --> 00:31:28.440
<v Speaker 2>highlights several strategies for scaling your BI infrastructure, things like

633
00:31:28.640 --> 00:31:31.119
<v Speaker 2>choosing the right BI tools that can handle those increasing

634
00:31:31.200 --> 00:31:34.759
<v Speaker 2>data volumes and that user concurrency, you know, implementing a

635
00:31:34.839 --> 00:31:39.519
<v Speaker 2>really robust data governance process okay to streamline those workflows,

636
00:31:39.920 --> 00:31:43.559
<v Speaker 2>and then fostering that culture of data literacy throughout the

637
00:31:43.720 --> 00:31:44.480
<v Speaker 2>entire organization.

638
00:31:44.640 --> 00:31:46.839
<v Speaker 1>Okay, so working smarter, not harder, exactly.

639
00:31:47.160 --> 00:31:51.599
<v Speaker 2>Yeah, you know, investing in those right tools, processes, and

640
00:31:51.680 --> 00:31:55.920
<v Speaker 2>skills really helps those data teams keep pace with those

641
00:31:55.960 --> 00:31:59.359
<v Speaker 2>growing demands and ensure that the organization can continue to

642
00:31:59.400 --> 00:32:01.839
<v Speaker 2>make those really data driven decisions as it scales.

643
00:32:02.039 --> 00:32:04.880
<v Speaker 1>Okay, So technology is obviously, you know, a very important

644
00:32:04.920 --> 00:32:07.240
<v Speaker 1>part of this, but it's really the people that are

645
00:32:07.279 --> 00:32:08.359
<v Speaker 1>driving that success.

646
00:32:08.599 --> 00:32:12.559
<v Speaker 2>Yeah, absolutely, you know, technology is an essential enabler, but

647
00:32:12.759 --> 00:32:16.240
<v Speaker 2>it's ultimately the people who drive that data-driven success. So,

648
00:32:16.480 --> 00:32:20.559
<v Speaker 2>you know, fostering that culture of data literacy, empowering those

649
00:32:20.640 --> 00:32:23.920
<v Speaker 2>business users and then investing in the development of your

650
00:32:23.960 --> 00:32:27.400
<v Speaker 2>data professionals. Those are all crucial for building a truly

651
00:32:27.559 --> 00:32:28.839
<v Speaker 2>data-driven organization.

652
00:32:29.519 --> 00:32:32.640
<v Speaker 1>Okay. So the guidebook concludes by emphasizing that this world

653
00:32:32.799 --> 00:32:35.839
<v Speaker 1>of business intelligence, I mean it's constantly changing.

654
00:32:36.079 --> 00:32:38.640
<v Speaker 2>Oh yeah, absolutely, always evolving.

655
00:32:38.759 --> 00:32:43.720
<v Speaker 1>New tools, new technologies, new approaches. I mean, what

656
00:32:44.039 --> 00:32:47.440
<v Speaker 1>works today might not work tomorrow. So you know we

657
00:32:47.440 --> 00:32:51.480
<v Speaker 1>got to stay curious, experiment, yeah, definitely, and continue to

658
00:32:51.559 --> 00:32:52.319
<v Speaker 1>learn and adapt.

659
00:32:52.519 --> 00:32:55.279
<v Speaker 2>You know, it's like surfing. You need to be

660
00:32:55.400 --> 00:33:00.480
<v Speaker 2>aware of those changing tides, adjust your balance, and you know,

661
00:33:00.640 --> 00:33:03.640
<v Speaker 2>stay ahead of the curve to really avoid wiping out.

662
00:33:03.880 --> 00:33:06.680
<v Speaker 1>Okay, I like that. So what does this mean for

663
00:33:06.759 --> 00:33:07.920
<v Speaker 1>our listeners? Well, you know,

664
00:33:07.920 --> 00:33:12.000
<v Speaker 2>Building a modern analytics system, it's not just about choosing

665
00:33:12.039 --> 00:33:17.359
<v Speaker 2>those right tools, it's about really understanding those underlying principles,

666
00:33:17.839 --> 00:33:22.079
<v Speaker 2>you know, adapting them to your organization's specific needs and growth, Okay,

667
00:33:22.400 --> 00:33:26.079
<v Speaker 2>and then embracing that culture of continuous learning and improvement.

668
00:33:26.680 --> 00:33:29.519
<v Speaker 2>So this deep dive has really equipped you with

669
00:33:29.559 --> 00:33:34.359
<v Speaker 2>the knowledge to navigate this really exciting and rapidly evolving field.

670
00:33:35.119 --> 00:33:36.359
<v Speaker 2>But the journey doesn't end here.

671
00:33:36.319 --> 00:33:37.200
<v Speaker 1>No, it doesn't.

672
00:33:37.240 --> 00:33:42.000
<v Speaker 2>So keep exploring, keep experimenting and keep pushing the boundaries

673
00:33:42.000 --> 00:33:43.559
<v Speaker 2>of what's possible with data.

674
00:33:43.799 --> 00:33:45.359
<v Speaker 1>It really is, and it can help you make those

675
00:33:45.359 --> 00:33:48.400
<v Speaker 1>decisions about your BI strategy. So let's go back to

676
00:33:48.440 --> 00:33:51.079
<v Speaker 1>these stages and unpack them a little bit more. You know,

677
00:33:51.400 --> 00:33:54.039
<v Speaker 1>we know that in the early stages, companies rely heavily

678
00:33:54.119 --> 00:33:57.759
<v Speaker 1>on spreadsheets and ad hoc analysis. What are some of

679
00:33:57.759 --> 00:34:00.480
<v Speaker 1>the telltale signs that a company's in this phase and

680
00:34:00.839 --> 00:34:02.519
<v Speaker 1>what are some of the challenges they might face.

681
00:34:02.759 --> 00:34:06.240
<v Speaker 2>Yeah, so in this initial phase, data is often, like

682
00:34:06.240 --> 00:34:09.239
<v Speaker 2>you said, scattered across different sources, and there's a really

683
00:34:09.320 --> 00:34:12.880
<v Speaker 2>heavy reliance on those manual processes to extract, you know,

684
00:34:12.920 --> 00:34:15.320
<v Speaker 2>just those basic insights. Think of those late nights, like

685
00:34:15.320 --> 00:34:18.599
<v Speaker 2>you said, spent cobbling together reports in Excel, trying to

686
00:34:18.639 --> 00:34:21.559
<v Speaker 2>make sense of data from different departments or systems, and

687
00:34:21.800 --> 00:34:24.800
<v Speaker 2>you know, it's a functional approach, but it's incredibly time-consuming,

688
00:34:24.840 --> 00:34:27.960
<v Speaker 2>prone to errors, and it just doesn't scale well as

689
00:34:27.960 --> 00:34:29.119
<v Speaker 2>the organization grows.

690
00:34:29.320 --> 00:34:31.199
<v Speaker 1>It's like trying to build a skyscraper

691
00:34:30.679 --> 00:34:33.360
<v Speaker 2>with hand tools. Exactly. Yeah, you might be able to lay

692
00:34:33.360 --> 00:34:35.119
<v Speaker 2>a few bricks, but you're not going to get very far.

693
00:34:35.199 --> 00:34:38.320
<v Speaker 2>And the guidebook highlights some of those key challenges of

694
00:34:38.360 --> 00:34:43.679
<v Speaker 2>this stage, you know, data inconsistency, lack of standardization, difficulty

695
00:34:43.719 --> 00:34:47.360
<v Speaker 2>in collaborating, and a very limited ability to answer those

696
00:34:47.519 --> 00:34:49.599
<v Speaker 2>complex business questions.

697
00:34:49.800 --> 00:34:52.079
<v Speaker 1>Frustration from both you know, the people who are putting

698
00:34:52.119 --> 00:34:54.480
<v Speaker 1>together the reports and the decision makers who are trying

699
00:34:54.480 --> 00:34:56.239
<v Speaker 1>to use it. Exactly.

700
00:34:56.320 --> 00:34:58.880
<v Speaker 2>And this is often the point where companies realize, okay,

701
00:34:59.000 --> 00:35:02.760
<v Speaker 2>we need a more robust and scalable solution, so they

702
00:35:02.840 --> 00:35:05.480
<v Speaker 2>start to invest in BI tools, which kind of moves

703
00:35:05.480 --> 00:35:08.360
<v Speaker 2>them to that second stage of the arc of adoption, okay,

704
00:35:08.440 --> 00:35:11.880
<v Speaker 2>which is characterized by those you know, static reports and dashboards.

705
00:35:11.920 --> 00:35:14.199
<v Speaker 2>So this is where we start to see those you know,

706
00:35:14.360 --> 00:35:17.599
<v Speaker 2>colorful charts and graphs that executives love.

707
00:35:17.800 --> 00:35:21.519
<v Speaker 1>Right, and dashboards, they provide a more centralized and visual

708
00:35:21.559 --> 00:35:24.719
<v Speaker 1>way to track those key metrics, making it easier to

709
00:35:24.960 --> 00:35:28.679
<v Speaker 1>monitor progress, identify trends, and BI tools play a much

710
00:35:28.719 --> 00:35:32.920
<v Speaker 1>more prominent role at this stage, automating those reporting processes

711
00:35:33.159 --> 00:35:36.119
<v Speaker 1>and making data more accessible to a wider audience. So

712
00:35:36.159 --> 00:35:38.800
<v Speaker 1>we've gone from that hand drawn map to a GPS system.

713
00:35:38.840 --> 00:35:40.760
<v Speaker 1>We've got a clearer view of where we are and

714
00:35:40.760 --> 00:35:43.800
<v Speaker 1>where we're going, but we're still following that predetermined route.

715
00:35:43.880 --> 00:35:46.599
<v Speaker 2>That's a great way to put it. Yeah, dashboards, you know,

716
00:35:46.639 --> 00:35:49.960
<v Speaker 2>they provide valuable insights, but they don't necessarily empower users

717
00:35:50.000 --> 00:35:52.639
<v Speaker 2>to explore the data freely or ask those what if

718
00:35:52.760 --> 00:35:57.000
<v Speaker 2>questions that often lead to those you know, groundbreaking discoveries.

719
00:35:57.159 --> 00:35:59.639
<v Speaker 1>Right, and that brings us to the holy grail: self-

720
00:35:59.639 --> 00:36:03.199
<v Speaker 1>service analytics. Exactly. Yeah, the third stage of this arc

721
00:36:03.239 --> 00:36:05.679
<v Speaker 1>of adoption. Yeah, and this is where our users have

722
00:36:05.719 --> 00:36:10.400
<v Speaker 1>the ability to access, analyze, and visualize data independently without

723
00:36:10.400 --> 00:36:12.440
<v Speaker 1>having to go to the data team for every request.

724
00:36:12.800 --> 00:36:17.679
<v Speaker 2>Exactly. Self-service analytics represents a shift from that centralized,

725
00:36:17.679 --> 00:36:21.920
<v Speaker 2>IT-driven model to a much more decentralized, user-driven approach.

726
00:36:22.239 --> 00:36:24.559
<v Speaker 2>It's like having a fully interactive map. You can zoom

727
00:36:24.599 --> 00:36:27.920
<v Speaker 2>in and out, explore different routes, discover those hidden gems,

728
00:36:28.159 --> 00:36:30.199
<v Speaker 2>you know, and even create your own personalized maps.

729
00:36:30.239 --> 00:36:33.199
<v Speaker 1>Okay, so very empowering for the user. But I know

730
00:36:33.239 --> 00:36:36.840
<v Speaker 1>the guidebook cautions that just giving people BI tools isn't

731
00:36:36.920 --> 00:36:38.519
<v Speaker 1>enough to actually be able to do this.

732
00:36:38.880 --> 00:36:42.280
<v Speaker 2>No, not at all. Simply giving everyone access to data

733
00:36:42.360 --> 00:36:45.880
<v Speaker 2>without the right foundation is a recipe for disaster. It's

734
00:36:45.920 --> 00:36:49.320
<v Speaker 2>like giving someone a really powerful sports car, you know,

735
00:36:49.360 --> 00:36:52.519
<v Speaker 2>without any driving lessons or knowledge of traffic laws. They

736
00:36:52.599 --> 00:36:54.239
<v Speaker 2>might be able to get it moving, but it's probably

737
00:36:54.239 --> 00:36:55.320
<v Speaker 2>going to end in a crash.

738
00:36:55.360 --> 00:36:57.760
<v Speaker 1>That's a pretty scary thought. So what are those essential

739
00:36:57.880 --> 00:36:59.559
<v Speaker 1>ingredients for self-service analytics?

740
00:36:59.679 --> 00:37:03.519
<v Speaker 2>Yes, so the guidebook highlights three key pillars: data literacy,

741
00:37:03.599 --> 00:37:07.119
<v Speaker 2>data governance, and a robust data modeling layer. So data

742
00:37:07.159 --> 00:37:09.840
<v Speaker 2>literacy basically means that users need to understand how to

743
00:37:09.880 --> 00:37:12.880
<v Speaker 2>interpret and work with data, how to ask the right questions,

744
00:37:12.880 --> 00:37:15.280
<v Speaker 2>and then how to draw you know, meaningful conclusions from

745
00:37:15.280 --> 00:37:16.280
<v Speaker 2>the insights they discover.

746
00:37:16.559 --> 00:37:19.159
<v Speaker 1>So it's not just about having access to the data,

747
00:37:19.239 --> 00:37:20.559
<v Speaker 1>it's knowing what to do with it.

748
00:37:20.760 --> 00:37:23.599
<v Speaker 2>Yeah, exactly, it's about being able to speak the language

749
00:37:23.599 --> 00:37:29.639
<v Speaker 2>of data. Data governance, that ensures that the data is accurate, consistent,

750
00:37:29.719 --> 00:37:34.559
<v Speaker 2>and trustworthy. It involves establishing those processes for data quality management,

751
00:37:35.000 --> 00:37:38.559
<v Speaker 2>defining clearer roles and responsibilities for data stewardship, and then

752
00:37:38.639 --> 00:37:42.239
<v Speaker 2>implementing those security measures to protect sensitive information.

753
00:37:42.320 --> 00:37:45.280
<v Speaker 1>Making sure everyone's using the data responsibly.

754
00:37:45.599 --> 00:37:47.760
<v Speaker 2>Exactly, yeah, it's about creating a framework to make sure

755
00:37:47.760 --> 00:37:50.480
<v Speaker 2>that happens. And then finally, you know, we have that

756
00:37:50.599 --> 00:37:53.639
<v Speaker 2>data modeling layer, and this is that foundation that makes

757
00:37:53.760 --> 00:37:59.079
<v Speaker 2>self-service analytics possible. By defining that business logic, those calculations,

758
00:37:59.159 --> 00:38:03.119
<v Speaker 2>you know, those relationships in a centralized location, we're creating

759
00:38:03.159 --> 00:38:05.920
<v Speaker 2>that single source of truth that everyone in the organization

760
00:38:06.039 --> 00:38:06.639
<v Speaker 2>can trust.

761
00:38:07.000 --> 00:38:10.320
<v Speaker 1>So that modeling layer is our guide. It ensures that

762
00:38:10.360 --> 00:38:14.559
<v Speaker 1>everyone's speaking the same language, interpreting the data consistently. Exactly.

763
00:38:14.679 --> 00:38:18.519
<v Speaker 2>And without that robust data modeling layer, you know, self-

764
00:38:18.559 --> 00:38:22.000
<v Speaker 2>service analytics can really descend into chaos, where you have

765
00:38:22.159 --> 00:38:26.639
<v Speaker 2>users potentially creating their own definitions, their own calculations, which

766
00:38:26.800 --> 00:38:30.159
<v Speaker 2>leads to inconsistencies and inaccurate reporting.

767
00:38:30.400 --> 00:38:32.679
<v Speaker 1>So it sounds like this is really a journey. It's

768
00:38:32.679 --> 00:38:33.039
<v Speaker 1>not a destination.

769
00:38:32.960 --> 00:38:37.519
<v Speaker 2>Absolutely, yeah, it's an ongoing process of learning, adapting,

770
00:38:37.559 --> 00:38:41.639
<v Speaker 2>and refining. And as organizations become more data-driven, the

771
00:38:41.679 --> 00:38:44.559
<v Speaker 2>demands on the data team also increase, and the guidebook

772
00:38:44.559 --> 00:38:47.440
<v Speaker 2>really delves into the challenges of scaling your BI tools

773
00:38:47.440 --> 00:38:51.199
<v Speaker 2>and processes to match those you know, growing data needs

774
00:38:51.199 --> 00:38:51.960
<v Speaker 2>of the organization.

775
00:38:52.119 --> 00:38:54.599
<v Speaker 1>Right, so the data team can easily become overwhelmed: too

776
00:38:54.599 --> 00:38:58.519
<v Speaker 1>many requests for reports, dashboards, ad hoc analyses. How do

777
00:38:58.559 --> 00:38:59.199
<v Speaker 1>we prevent that? What are some strategies?

778
00:38:59.239 --> 00:39:02.519
<v Speaker 2>So scalability becomes really crucial here,

779
00:39:02.920 --> 00:39:06.559
<v Speaker 2>and the guidebook highlights several strategies for scaling your BI infrastructure.

780
00:39:06.639 --> 00:39:09.119
<v Speaker 2>You know, things like choosing the right BI tools that

781
00:39:09.159 --> 00:39:12.760
<v Speaker 2>can handle those increasing data volumes and that user concurrency,

782
00:39:13.119 --> 00:39:17.599
<v Speaker 2>implementing a really robust data governance process to streamline workflows,

783
00:39:17.840 --> 00:39:21.480
<v Speaker 2>and then fostering a culture of data literacy throughout the organization.

784
00:39:21.480 --> 00:39:26.400
<v Speaker 1>So working smarter, not harder, investing in the right tools, processes,

785
00:39:26.519 --> 00:39:29.719
<v Speaker 1>and skills to help that data team keep pace with those

786
00:39:29.760 --> 00:39:30.840
<v Speaker 1>growing demands. Exactly.

787
00:39:30.960 --> 00:39:34.159
<v Speaker 2>Yeah, you know, investing in those right tools, those processes,

788
00:39:34.199 --> 00:39:36.599
<v Speaker 2>and those skills really helps those data teams keep pace

789
00:39:36.639 --> 00:39:39.440
<v Speaker 2>with those growing demands and ensure that the organization can

790
00:39:39.480 --> 00:39:42.280
<v Speaker 2>continue to make data-driven decisions as it scales.

791
00:39:42.440 --> 00:39:45.400
<v Speaker 1>Okay, so technology is a really important enabler, but at

792
00:39:45.400 --> 00:39:47.599
<v Speaker 1>the end of the day, it's really the people that

793
00:39:47.599 --> 00:39:49.400
<v Speaker 1>are going to be driving data-driven success, right?

794
00:39:49.159 --> 00:39:52.960
<v Speaker 2>Yeah, absolutely, technology is an essential enabler, but

795
00:39:53.039 --> 00:39:56.440
<v Speaker 2>it's ultimately the people who drive that data-driven success.

796
00:39:56.800 --> 00:40:00.199
<v Speaker 2>So fostering that culture of data literacy, empowering

797
00:40:00.239 --> 00:40:04.079
<v Speaker 2>those business users, and investing in that development of data professionals,

798
00:40:04.519 --> 00:40:08.840
<v Speaker 2>those are all crucial for building a truly data-driven organization.

799
00:40:08.719 --> 00:40:10.880
<v Speaker 1>And the guidebook concludes by saying that this world

800
00:40:10.880 --> 00:40:15.559
<v Speaker 1>of business intelligence is constantly evolving with new tools and technologies.

801
00:40:16.079 --> 00:40:19.519
<v Speaker 1>What works today might be outdated tomorrow, so it's really

802
00:40:19.679 --> 00:40:23.599
<v Speaker 1>essential to stay curious, experiment, and continue to learn and adapt.

803
00:40:24.320 --> 00:40:26.559
<v Speaker 2>You know, it's like surfing. You need to be aware

804
00:40:26.599 --> 00:40:29.880
<v Speaker 2>of those changing tides, adjust your balance, and stay ahead

805
00:40:29.880 --> 00:40:31.360
<v Speaker 2>of the curve to avoid wiping out.

806
00:40:31.480 --> 00:40:33.400
<v Speaker 1>Yeah, I like that analogy. So what does all this

807
00:40:33.519 --> 00:40:34.800
<v Speaker 1>mean for listeners?

808
00:40:34.960 --> 00:40:38.000
<v Speaker 2>Well, building a modern analytics system, it's not just about

809
00:40:38.079 --> 00:40:42.239
<v Speaker 2>choosing the right tools, it's about really understanding those underlying principles,

810
00:40:42.280 --> 00:40:45.599
<v Speaker 2>adapting them to your organization's needs and growth, and embracing

811
00:40:45.599 --> 00:40:47.719
<v Speaker 2>that culture of continuous learning and improvement.

812
00:40:48.079 --> 00:40:50.280
<v Speaker 1>This deep dive has really given you the knowledge to

813
00:40:50.400 --> 00:40:54.159
<v Speaker 1>navigate this exciting and rapidly evolving field. But remember it

814
00:40:54.199 --> 00:40:58.119
<v Speaker 1>doesn't end here. Keep exploring, keep experimenting, and keep pushing

815
00:40:58.119 --> 00:40:59.960
<v Speaker 1>the boundaries of what's possible with data.
