WEBVTT

1
00:00:00.040 --> 00:00:03.759
<v Speaker 1>Okay, let's unpack this. You ever feel like you're just drowning in numbers?

2
00:00:03.359 --> 00:00:07.360
<v Speaker 2>Definitely. Spreadsheets everywhere, complex reports.

3
00:00:07.000 --> 00:00:10.599
<v Speaker 1>Exactly. Graphs that maybe look cool but don't actually tell

4
00:00:10.599 --> 00:00:12.560
<v Speaker 1>you much. It's so easy to get lost in all

5
00:00:12.599 --> 00:00:15.000
<v Speaker 1>that data, right? But imagine if you could just see

6
00:00:15.039 --> 00:00:18.160
<v Speaker 1>through it all, if the numbers could instantly like paint

7
00:00:18.199 --> 00:00:19.039
<v Speaker 1>a clear picture.

8
00:00:19.679 --> 00:00:22.399
<v Speaker 2>That's the idea, isn't it. That's the real power of

9
00:00:22.480 --> 00:00:27.600
<v Speaker 2>data visualization, turning that overwhelming information into something you can grasp,

10
00:00:27.879 --> 00:00:29.320
<v Speaker 2>you know, quickly and really thoroughly.

11
00:00:29.359 --> 00:00:32.479
<v Speaker 1>Yeah, moving beyond just tables and stats.

12
00:00:32.560 --> 00:00:35.399
<v Speaker 2>Exactly. We're so used to looking at tables, maybe hearing about models,

13
00:00:35.759 --> 00:00:38.439
<v Speaker 2>and those have their place for sure, but a good

14
00:00:38.560 --> 00:00:41.920
<v Speaker 2>visualization it can give you that immediate kind of gut

15
00:00:42.039 --> 00:00:45.679
<v Speaker 2>level understanding, seeing the patterns, the relationships hiding in there.

16
00:00:45.759 --> 00:00:48.719
<v Speaker 1>It's like reading a recipe versus actually seeing the finished dish.

17
00:00:48.759 --> 00:00:51.840
<v Speaker 1>Like you said. Yeah, precisely. And that brings us nicely

18
00:00:51.880 --> 00:00:55.399
<v Speaker 1>to our source for this deep dive, the book Data

19
00:00:55.479 --> 00:00:58.000
<v Speaker 1>Visualization: A Practical Introduction.

20
00:00:57.520 --> 00:00:59.960
<v Speaker 2>Ah, yes, good one.

21
00:01:00.200 --> 00:01:02.759
<v Speaker 1>And this isn't just, you know, a gallery of pretty charts.

22
00:01:02.799 --> 00:01:05.239
<v Speaker 1>It's a really practical, hands-on guide.

23
00:01:05.599 --> 00:01:10.040
<v Speaker 2>Mm-hmm. It walks you through using R, the programming language, and

24
00:01:10.159 --> 00:01:13.959
<v Speaker 2>this really flexible tool called ggplot2.

25
00:01:14.000 --> 00:01:16.200
<v Speaker 1>Right. And what I found really insightful is how it focuses

26
00:01:16.239 --> 00:01:18.640
<v Speaker 1>not just on the aesthetics, like, does it look nice?

27
00:01:18.599 --> 00:01:20.599
<v Speaker 2>Although that matters.

28
00:01:20.200 --> 00:01:25.159
<v Speaker 1>True, but more on how our brains actually process visual

29
00:01:25.200 --> 00:01:28.879
<v Speaker 1>information and designing charts that work with that process.

30
00:01:29.079 --> 00:01:32.480
<v Speaker 2>That connection is key, isn't it, between how we see

31
00:01:32.920 --> 00:01:36.120
<v Speaker 2>and what we understand. The book really emphasizes that the

32
00:01:36.200 --> 00:01:39.439
<v Speaker 2>best visualizations are the ones that kind of tap into

33
00:01:39.439 --> 00:01:43.879
<v Speaker 2>how we intuitively interpret things like size, color, position.

34
00:01:43.680 --> 00:01:46.680
<v Speaker 1>Makes sense. Like making the data speak directly through what we see. Exactly.

35
00:01:46.760 --> 00:01:48.760
<v Speaker 1>So our mission today really is to pull out the

36
00:01:48.840 --> 00:01:52.560
<v Speaker 1>key insights from this book to help you listening become

37
00:01:52.560 --> 00:01:53.480
<v Speaker 1>more data savvy.

38
00:01:53.599 --> 00:01:55.359
<v Speaker 2>Yeah, give you the tools to not just make your

39
00:01:55.359 --> 00:01:57.480
<v Speaker 2>own effective charts, but also to look at any graph

40
00:01:57.599 --> 00:02:00.400
<v Speaker 2>you see and really understand what it's telling you or

41
00:02:00.400 --> 00:02:01.519
<v Speaker 2>maybe what it isn't telling you.

42
00:02:01.719 --> 00:02:04.959
<v Speaker 1>Good point. We want to help you avoid those common

43
00:02:05.000 --> 00:02:08.639
<v Speaker 1>pitfalls and just feel more confident navigating all this data.

44
00:02:09.080 --> 00:02:12.000
<v Speaker 2>Okay, so where should we start? Maybe the big question:

45
00:02:13.360 --> 00:02:17.800
<v Speaker 2>why even bother visualizing data? Why not just stick with

46
00:02:17.879 --> 00:02:18.439
<v Speaker 2>the tables?

47
00:02:18.680 --> 00:02:21.479
<v Speaker 1>Right. The book makes a really strong case for moving

48
00:02:21.520 --> 00:02:22.759
<v Speaker 1>beyond just the numbers.

49
00:02:22.919 --> 00:02:26.960
<v Speaker 2>It does. There's a great example from Jackman back in

50
00:02:27.039 --> 00:02:27.680
<v Speaker 2>nineteen eighty.

51
00:02:27.759 --> 00:02:29.240
<v Speaker 1>Oh yeah, the voter turnout one.

52
00:02:29.360 --> 00:02:31.879
<v Speaker 2>That's the one. He was looking at voter turnout and

53
00:02:32.080 --> 00:02:36.879
<v Speaker 2>income inequality across different countries. Okay, and the initial analysis,

54
00:02:36.960 --> 00:02:40.400
<v Speaker 2>just crunching the numbers for eighteen countries suggested a pretty

55
00:02:40.439 --> 00:02:41.080
<v Speaker 2>strong link.

56
00:02:41.520 --> 00:02:42.719
<v Speaker 1>Seems straightforward enough.

57
00:02:42.719 --> 00:02:45.000
<v Speaker 2>But then he just plotted the data, a simple scatterplot,

58
00:02:45.680 --> 00:02:49.039
<v Speaker 2>and bam, it was instantly clear that whole relationship is

59
00:02:49.039 --> 00:02:51.319
<v Speaker 2>basically being driven by one single data point.

60
00:02:51.080 --> 00:02:52.159
<v Speaker 1>South Africa.

61
00:02:52.479 --> 00:02:56.439
<v Speaker 2>Wow, so one outlier was creating the entire trend. Pretty much.

62
00:02:56.639 --> 00:03:00.000
<v Speaker 1>Now, you could find that eventually with more stats, sensitivity analysis,

63
00:03:00.159 --> 00:03:00.719
<v Speaker 1>stuff like that.

64
00:03:00.800 --> 00:03:02.719
<v Speaker 2>Sure, dig deep enough. But the visual?

65
00:03:02.520 --> 00:03:04.159
<v Speaker 1>It made it obvious immediately.

66
00:03:04.400 --> 00:03:07.159
<v Speaker 2>That's a powerful demonstration right there. It really is.
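
NOTE
A minimal sketch of the outlier effect described above, in base R. The ten points are made up for illustration and are not Jackman's data; the last point stands in for the single extreme case.
```r
# Hypothetical data: nine unrelated-looking points plus one extreme case.
x <- c(1:9, 30)
y <- c(5, 4, 6, 5, 4, 6, 5, 4, 6, 20)
# Correlation with and without the last (extreme) point.
r_all <- cor(x, y)
r_without <- cor(x[-10], y[-10])
print(round(c(all_points = r_all, without_outlier = r_without), 2))
# A scatterplot shows at a glance that one point sets the slope.
plot(x, y, main = "One point driving the trend (made-up data)")
abline(lm(y ~ x), lty = 2)
```
With these made-up numbers the correlation is about 0.95 with all ten points and about 0.16 once the extreme point is dropped, which the plot makes visible without any extra statistics.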

67
00:03:07.280 --> 00:03:07.520
<v Speaker 1>Yeah.

68
00:03:07.560 --> 00:03:11.280
<v Speaker 2>And there's another great illustration in the book, inspired by Anscombe's quartet.

69
00:03:10.879 --> 00:03:12.439
<v Speaker 1>Ah, the classic.

70
00:03:12.719 --> 00:03:16.719
<v Speaker 2>Yeah. So another researcher, Vanhove, created sixteen different data

71
00:03:16.719 --> 00:03:20.960
<v Speaker 2>sets and here's the kicker. Every single one had

72
00:03:20.960 --> 00:03:25.280
<v Speaker 2>the exact same statistical correlation between x and y: r

73
00:03:25.800 --> 00:03:26.919
<v Speaker 2>equals 0.6.

74
00:03:26.680 --> 00:03:31.319
<v Speaker 1>Okay, 0.6 seems like a decent positive relationship.

75
00:03:30.759 --> 00:03:33.159
<v Speaker 2>Right, that's what the number tells you. But then you

76
00:03:33.280 --> 00:03:36.680
<v Speaker 2>visualize them. You plot those sixteen data sets, and they

77
00:03:36.680 --> 00:03:39.800
<v Speaker 2>look different, totally different. Some look like a nice

78
00:03:39.800 --> 00:03:42.840
<v Speaker 2>cloud of points like you'd expect. Others have a crazy

79
00:03:42.919 --> 00:03:45.840
<v Speaker 2>outlier point off the line. Some are clearly curved, some

80
00:03:45.879 --> 00:03:47.919
<v Speaker 2>are just like two separate groups of dots.

81
00:03:48.039 --> 00:03:51.479
<v Speaker 1>So the single number, 0.6, completely hid all that

82
00:03:51.599 --> 00:03:52.840
<v Speaker 1>variation.

83
00:03:52.879 --> 00:03:56.240
<v Speaker 2>Completely. The core insight there is just critical. Always, always

84
00:03:56.240 --> 00:03:58.919
<v Speaker 2>look at a scatterplot of your correlations. Don't just trust

85
00:03:59.000 --> 00:03:59.400
<v Speaker 2>the number.
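
NOTE
A short sketch of the same lesson using Anscombe's original quartet, which ships with base R as the anscombe data frame. This is the four-set classic, not the sixteen r = 0.6 data sets mentioned above.
```r
# anscombe is a built-in data frame with columns x1..x4 and y1..y4.
r <- c(
  set1 = cor(anscombe$x1, anscombe$y1),
  set2 = cor(anscombe$x2, anscombe$y2),
  set3 = cor(anscombe$x3, anscombe$y3),
  set4 = cor(anscombe$x4, anscombe$y4)
)
print(round(r, 3))
# Draw the four scatterplots in a 2 x 2 grid to compare their shapes.
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       xlab = paste0("x", i), ylab = paste0("y", i))
}
par(op)
```
All four correlations come out at roughly 0.816, yet the four panels look nothing alike, which is the reason to plot before trusting the coefficient.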

86
00:03:59.520 --> 00:04:01.919
<v Speaker 1>It really hammers home that point, doesn't it? A single

87
00:04:01.960 --> 00:04:05.240
<v Speaker 1>statistic can mask wildly different realities in the data. You

88
00:04:05.280 --> 00:04:07.800
<v Speaker 1>just have to see the shape, spot the weird stuff.

89
00:04:07.599 --> 00:04:08.960
<v Speaker 2>Understand the distribution.

90
00:04:09.120 --> 00:04:12.159
<v Speaker 1>Yeah, but it's also important, as the book notes, not

91
00:04:12.199 --> 00:04:14.120
<v Speaker 1>to just blindly trust the visual either.

92
00:04:14.319 --> 00:04:18.399
<v Speaker 2>Right. Absolutely, visualizations have their own what the book calls

93
00:04:18.480 --> 00:04:21.360
<v Speaker 2>rhetorical plausibility. They suggest things.

94
00:04:21.199 --> 00:04:23.560
<v Speaker 1>They frame the data in a certain way.

95
00:04:24.199 --> 00:04:25.959
<v Speaker 2>Exactly. Just because it's in a chart doesn't make it the

96
00:04:26.000 --> 00:04:28.480
<v Speaker 2>absolute truth. We still need to think critically.

97
00:04:28.759 --> 00:04:32.279
<v Speaker 1>Okay, so we're sold on why visualization is powerful, But

98
00:04:32.360 --> 00:04:34.639
<v Speaker 1>what makes one good versus bad?

99
00:04:34.959 --> 00:04:38.160
<v Speaker 2>Right. The book kind of breaks down the problems into three buckets. Okay,

100
00:04:38.439 --> 00:04:43.079
<v Speaker 2>there's issues of just plain bad taste, aesthetics. Then there

101
00:04:43.120 --> 00:04:46.879
<v Speaker 2>are substantive problems, like how the data itself is shown,

102
00:04:47.279 --> 00:04:51.240
<v Speaker 2>and finally perceptual problems, how our brains interpret the visual.

103
00:04:51.439 --> 00:04:54.639
<v Speaker 1>Let's start with bad taste. What falls into that category?

104
00:04:54.800 --> 00:04:56.839
<v Speaker 2>This is where aesthetics really come in. Things that make

105
00:04:56.879 --> 00:05:01.639
<v Speaker 2>a graph distracting, cluttered, hard to read. Inconsistent design choices.

106
00:05:01.399 --> 00:05:02.519
<v Speaker 1>Where there's too much going on?

107
00:05:02.639 --> 00:05:05.279
<v Speaker 2>Yeah, exactly. The book uses an example, figure one point

108
00:05:05.279 --> 00:05:08.399
<v Speaker 2>four, of what it calls chartjunk. Classic term, chartjunk.

109
00:05:08.560 --> 00:05:09.720
<v Speaker 1>Love it. What's an example?

110
00:05:09.959 --> 00:05:13.439
<v Speaker 2>Oh, think like bars that are hard to distinguish, labels

111
00:05:13.439 --> 00:05:17.360
<v Speaker 2>repeated everywhere pointlessly. Maybe those fake 3D effects that

112
00:05:17.399 --> 00:05:22.160
<v Speaker 2>add nothing. Oh, the worst. Or drop shadows. Yes, drop shadows,

113
00:05:22.199 --> 00:05:25.879
<v Speaker 2>pointless textures. It's just visual clutter getting in the way

114
00:05:25.879 --> 00:05:26.759
<v Speaker 2>of the actual data.

115
00:05:26.920 --> 00:05:29.519
<v Speaker 1>So the idea is keep it clean?

116
00:05:29.759 --> 00:05:33.720
<v Speaker 2>Pretty much. Less is often more. Every little line, every color should

117
00:05:33.800 --> 00:05:36.240
<v Speaker 2>be there for a reason, helping the data speak. If

118
00:05:36.279 --> 00:05:38.319
<v Speaker 2>it's not adding understanding, maybe it shouldn't be there.

119
00:05:38.600 --> 00:05:41.399
<v Speaker 1>It reminds me of Edward Tufte's work. The book mentions him, right?

120
00:05:41.319 --> 00:05:44.519
<v Speaker 2>Oh, yeah, Tufte's foundational. His concept of the data-to-ink

121
00:05:44.600 --> 00:05:45.319
<v Speaker 2>ratio.

122
00:05:45.920 --> 00:05:49.000
<v Speaker 1>Right, maximize the ink that shows data, minimize the rest.

123
00:05:48.839 --> 00:05:50.800
<v Speaker 2>Exactly, get rid of the chartjunk. He also talked

124
00:05:50.839 --> 00:05:55.480
<v Speaker 2>about graphical excellence, showing interesting data clearly, efficiently, telling the

125
00:05:55.480 --> 00:05:58.480
<v Speaker 2>truth about it, getting the most ideas across with the

126
00:05:58.560 --> 00:05:59.480
<v Speaker 2>least visual noise.

127
00:05:59.600 --> 00:06:04.759
<v Speaker 1>Makes sense, simplify, remove extra gridlines, pointless colors.

128
00:06:04.600 --> 00:06:08.040
<v Speaker 2>Usually yes, But the book throws in a really interesting

129
00:06:08.079 --> 00:06:13.560
<v Speaker 2>curveball here. Some research, Bateman, Borkin, found that sometimes

130
00:06:13.639 --> 00:06:17.560
<v Speaker 2>those more visually embellished graphs, almost like mini infographics,

131
00:06:17.759 --> 00:06:20.399
<v Speaker 2>they can actually be more memorable than the super simple,

132
00:06:20.439 --> 00:06:21.079
<v Speaker 2>clean ones.

133
00:06:21.199 --> 00:06:25.319
<v Speaker 1>Really, that's counterintuitive, more memorable even if they're harder to

134
00:06:25.360 --> 00:06:26.680
<v Speaker 1>read initially.

135
00:06:26.199 --> 00:06:29.199
<v Speaker 2>It seems so. Yeah, people might recall something visually unique

136
00:06:29.279 --> 00:06:30.920
<v Speaker 2>or novel more easily later on.

137
00:06:31.199 --> 00:06:33.360
<v Speaker 1>Huh. So there's a bit of a trade-off, maybe,

138
00:06:33.680 --> 00:06:36.160
<v Speaker 1>between immediate clarity and long-term recall.

139
00:06:35.959 --> 00:06:40.720
<v Speaker 2>Potentially. But the key is memorable doesn't automatically mean

140
00:06:41.199 --> 00:06:42.800
<v Speaker 2>easy to interpret accurately.

141
00:06:42.480 --> 00:06:46.519
<v Speaker 1>Right, which brings us to that third category of problems,

142
00:06:46.800 --> 00:06:48.759
<v Speaker 1>perceptual issues.

143
00:06:48.879 --> 00:06:51.720
<v Speaker 2>Exactly. This is where it gets really fascinating. Even a clean,

144
00:06:51.959 --> 00:06:56.879
<v Speaker 2>well designed graph can unintentionally mislead people just because of

145
00:06:56.879 --> 00:06:59.560
<v Speaker 2>how our brains work. How so? Well, the book shows

146
00:06:59.560 --> 00:07:02.879
<v Speaker 2>an example with stacked bar charts, trying to compare the

147
00:07:02.920 --> 00:07:05.800
<v Speaker 2>size of, say, the middle segment across several different bars.

148
00:07:06.439 --> 00:07:08.439
<v Speaker 2>It's surprisingly difficult for our eyes.

149
00:07:08.720 --> 00:07:11.560
<v Speaker 1>Yeah, I can picture that. Your baseline keeps changing.

150
00:07:11.800 --> 00:07:13.920
<v Speaker 2>Right. And there's another example with lines that look like they're

151
00:07:13.920 --> 00:07:17.399
<v Speaker 2>converging, getting closer, just because of the aspect ratio, the

152
00:07:17.439 --> 00:07:20.279
<v Speaker 2>shape of the plot, even if the underlying data shows

153
00:07:20.279 --> 00:07:21.279
<v Speaker 2>they're staying parallel.

154
00:07:21.519 --> 00:07:24.560
<v Speaker 1>Wow, So good taste isn't enough. You really need to

155
00:07:24.639 --> 00:07:25.639
<v Speaker 1>understand perception.

156
00:07:25.959 --> 00:07:28.639
<v Speaker 2>You absolutely do. And our perception isn't uniform, right?

157
00:07:28.759 --> 00:07:31.959
<v Speaker 2>Like how we see color. Our ability to distinguish shades

158
00:07:32.600 --> 00:07:34.000
<v Speaker 2>changes across the spectrum.

159
00:07:34.079 --> 00:07:36.000
<v Speaker 1>And it depends on lightness too, doesn't it.

160
00:07:36.279 --> 00:07:39.519
<v Speaker 2>Yeah, chroma depends on luminance. It gets complex. That's why

161
00:07:39.759 --> 00:07:43.839
<v Speaker 2>the book really pushes for using perceptually uniform color palettes.

162
00:07:44.000 --> 00:07:46.079
<v Speaker 1>Perceptually uniform. Okay, what does that mean exactly?

163
00:07:46.120 --> 00:07:49.839
<v Speaker 2>Imagine a color ramp where each step up represents

164
00:07:49.879 --> 00:07:53.439
<v Speaker 2>an equal increase in the data value. A perceptually uniform

165
00:07:53.519 --> 00:07:57.199
<v Speaker 2>palette makes those steps look equally spaced in color intensity.

166
00:07:57.360 --> 00:08:00.360
<v Speaker 1>Ah, so a non-uniform one might make some small

167
00:08:00.480 --> 00:08:03.680
<v Speaker 1>data changes look huge visually or vice versa.

168
00:08:03.920 --> 00:08:08.480
<v Speaker 2>Exactly, it avoids accidentally emphasizing or de-emphasizing parts of

169
00:08:08.519 --> 00:08:11.360
<v Speaker 2>the data just because of quirks in the color scale.

170
00:08:11.560 --> 00:08:14.199
<v Speaker 1>Okay, so the book talks about different types of these palettes?

171
00:08:14.279 --> 00:08:18.480
<v Speaker 2>Yeah, three main ones. First, sequential scales. Think low to

172
00:08:18.600 --> 00:08:22.720
<v Speaker 2>high data like income or maybe temperature if it's all positive.

173
00:08:22.560 --> 00:08:25.120
<v Speaker 1>Makes sense, like light blue to dark blue.

174
00:08:25.240 --> 00:08:28.120
<v Speaker 2>Right. Then you have diverging scales. These are for data

175
00:08:28.160 --> 00:08:33.000
<v Speaker 2>with a meaningful midpoint, like zero: temperature changes, maybe deviations

176
00:08:33.000 --> 00:08:34.200
<v Speaker 2>from an average.

177
00:08:33.919 --> 00:08:36.360
<v Speaker 1>Like that blue to red scale example figure one point.

178
00:08:36.399 --> 00:08:37.240
<v Speaker 1>Then you see that's a.

179
00:08:37.240 --> 00:08:41.120
<v Speaker 2>Classic. Zero or the midpoint is usually a neutral color

180
00:08:41.320 --> 00:08:44.120
<v Speaker 2>like white or light gray, and the extremes diverge to

181
00:08:44.159 --> 00:08:45.159
<v Speaker 2>two different hues.

182
00:08:45.240 --> 00:08:45.600
<v Speaker 1>Okay.

183
00:08:45.879 --> 00:08:50.080
<v Speaker 2>And the third type, qualitative palettes. These are for categorical data

184
00:08:50.080 --> 00:08:53.720
<v Speaker 2>where there's no inherent order. Think countries, types of products,

185
00:08:53.879 --> 00:08:54.759
<v Speaker 2>political parties.

186
00:08:54.799 --> 00:08:58.480
<v Speaker 1>So the goal there is just distinct colors?

187
00:08:58.120 --> 00:09:01.960
<v Speaker 2>Distinct, but also ideally with similar visual weight, so one category

188
00:09:02.000 --> 00:09:05.240
<v Speaker 2>doesn't just pop out unintentionally. The bottom palette that same

189
00:09:05.279 --> 00:09:07.320
<v Speaker 2>figure one point one end scene is a good example.

190
00:09:07.399 --> 00:09:10.120
<v Speaker 1>It's really about making sure the visual differences match the

191
00:09:10.200 --> 00:09:12.279
<v Speaker 1>data differences accurately.

192
00:09:12.600 --> 00:09:14.960
<v Speaker 2>Precisely. Using the wrong palette can really mess with interpretation.
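
NOTE
A small sketch of the three palette types just discussed, using base R's hcl.colors() (R 3.6.0 or later). The specific palette names are this example's choice, not ones named in the episode; they are HCL-based palettes, which are built around perceived hue, chroma, and luminance.
```r
# hcl.colors() lives in grDevices, which is attached by default.
sequential <- hcl.colors(5, palette = "Blues")    # ordered, low-to-high data
diverging <- hcl.colors(5, palette = "Blue-Red")  # data with a meaningful midpoint
qualitative <- hcl.colors(5, palette = "Dark 3")  # unordered categories
print(list(sequential = sequential,
           diverging = diverging,
           qualitative = qualitative))
```
Each call returns five hex color strings that can be passed to a plotting function; hcl.pals() lists the other palette names available.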

193
00:09:15.320 --> 00:09:18.559
<v Speaker 1>The book also mentions complexity overload trying to map too

194
00:09:18.600 --> 00:09:19.399
<v Speaker 1>many things at once.

195
00:09:19.679 --> 00:09:22.960
<v Speaker 2>Yeah, like using size and shape and color and position

196
00:09:23.360 --> 00:09:26.000
<v Speaker 2>all in one go. Unless the data has a really

197
00:09:26.039 --> 00:09:29.840
<v Speaker 2>really clear structure, it just becomes noise. Figure one point

198
00:09:29.919 --> 00:09:32.759
<v Speaker 2>nineteen shows that well. Hard to track everything.

199
00:09:32.399 --> 00:09:34.919
<v Speaker 1>Too much happening. And what about Gestalt rules?

200
00:09:35.039 --> 00:09:38.759
<v Speaker 2>Ah, yeah, that's about how our brains naturally look for patterns.

201
00:09:39.200 --> 00:09:42.360
<v Speaker 2>We group things, we connect things. We see shapes even if they aren't really there sometimes.

202
00:09:42.200 --> 00:09:45.559
<v Speaker 1>Like seeing faces in clouds?

203
00:09:45.759 --> 00:09:46.320
<v Speaker 1>Kind of like that.

204
00:09:46.440 --> 00:09:49.960
<v Speaker 2>Yeah. Figure one point one each shows seemingly random dots,

205
00:09:50.320 --> 00:09:52.559
<v Speaker 2>but you can't help trying to see clusters or lines.

206
00:09:53.360 --> 00:09:56.000
<v Speaker 2>This is powerful if you use it right in visualization design,

207
00:09:56.080 --> 00:09:58.440
<v Speaker 2>but it can also trick people into seeing patterns that

208
00:09:58.480 --> 00:09:59.519
<v Speaker 2>are just random chance.

209
00:10:00.039 --> 00:10:03.120
<v Speaker 1>So understanding perception is crucial, which leads to how we

210
00:10:03.159 --> 00:10:06.639
<v Speaker 1>actually encode data visually. The book talks about Cleveland and

211
00:10:06.720 --> 00:10:08.600
<v Speaker 1>McGill's research.

212
00:10:08.799 --> 00:10:11.519
<v Speaker 2>Foundational stuff. Figure one point two to three summarizes it. They basically

213
00:10:11.559 --> 00:10:13.639
<v Speaker 2>figured out what visual tasks we're best at perceptually.

214
00:10:13.840 --> 00:10:16.360
<v Speaker 1>Okay, what's at the top? What are we best at?

215
00:10:16.440 --> 00:10:20.720
<v Speaker 2>Judging position along a common scale? Think comparing bar heights

216
00:10:20.720 --> 00:10:23.559
<v Speaker 2>in a standard bar chart. We're really accurate at that.

217
00:10:23.200 --> 00:10:25.559
<v Speaker 1>Makes sense, they all start from zero.

218
00:10:25.960 --> 00:10:29.919
<v Speaker 2>Right. Then comes position on aligned but separate scales. Still

219
00:10:29.960 --> 00:10:34.279
<v Speaker 2>pretty good. Then judging lengths, like line segments, but only

220
00:10:34.360 --> 00:10:35.919
<v Speaker 2>if they share a common baseline.

221
00:10:36.080 --> 00:10:38.399
<v Speaker 1>Hmm. Okay, and what are we worse at?

222
00:10:38.679 --> 00:10:42.320
<v Speaker 2>Our accuracy drops off for judging lengths without a common baseline.

223
00:10:43.200 --> 00:10:46.360
<v Speaker 2>Then things like angles, which is why pie charts can

224
00:10:46.440 --> 00:10:50.360
<v Speaker 2>be problematic for comparison. And area and volume and color

225
00:10:50.399 --> 00:10:53.000
<v Speaker 2>saturation or hue are further down the list.

226
00:10:53.120 --> 00:10:56.320
<v Speaker 1>So this hierarchy should guide our choices. If you want

227
00:10:56.360 --> 00:10:59.039
<v Speaker 1>people to compare values accurately, use.

228
00:10:58.919 --> 00:11:02.080
<v Speaker 2>Position along a common scale. Bar charts are often great for that.

229
00:11:02.120 --> 00:11:04.720
<v Speaker 2>If you're showing trends, maybe line charts work well for

230
00:11:04.799 --> 00:11:07.559
<v Speaker 2>judging slope or angle, though even that's not top tier.

231
00:11:07.759 --> 00:11:10.720
<v Speaker 1>It really highlights why choosing the right chart type matters

232
00:11:10.720 --> 00:11:13.200
<v Speaker 1>so much for effective communication. It's about how easily the

233
00:11:13.279 --> 00:11:16.399
<v Speaker 1>viewer can decode the information.

234
00:11:16.320 --> 00:11:18.919
<v Speaker 2>Exactly, and the book also stresses it's not just which channel you

235
00:11:19.000 --> 00:11:22.440
<v Speaker 2>choose, like color or position, but how you implement it.

236
00:11:22.000 --> 00:11:24.960
<v Speaker 1>Like using a good sequential palette for ordered data

237
00:11:25.120 --> 00:11:27.039
<v Speaker 1>or distinct hues for categories.

238
00:11:27.120 --> 00:11:30.240
<v Speaker 2>Precisely. The details of the implementation matter hugely.

239
00:11:30.440 --> 00:11:32.480
<v Speaker 1>Okay, this is great theory, but the book is also

240
00:11:32.720 --> 00:11:36.159
<v Speaker 1>very practical, right? It dives into using R and

241
00:11:36.279 --> 00:11:36.720
<v Speaker 1>ggplot2.

242
00:11:36.960 --> 00:11:39.440
<v Speaker 2>It does. It shifts gears into how you actually make

243
00:11:39.480 --> 00:11:41.240
<v Speaker 2>these visualizations using code.

244
00:11:41.320 --> 00:11:44.159
<v Speaker 1>Now, programming can sound a bit scary. The book suggests

245
00:11:44.159 --> 00:11:47.279
<v Speaker 1>starting with something called R Markdown. Why is that helpful?

246
00:11:47.480 --> 00:11:51.279
<v Speaker 2>R Markdown is fantastic for reproducibility. It lets you combine your code,

247
00:11:51.399 --> 00:11:54.600
<v Speaker 2>your notes, and your output the plots, the tables all

248
00:11:54.639 --> 00:11:55.919
<v Speaker 2>in one document.

249
00:11:55.600 --> 00:11:57.879
<v Speaker 1>So you can see exactly how you got a result?

250
00:11:58.039 --> 00:12:01.159
<v Speaker 2>Exactly. You write in plain text, embed chunks of R code. When you

251
00:12:01.200 --> 00:12:04.000
<v Speaker 2>process the document, the code runs and the results get

252
00:12:04.000 --> 00:12:07.120
<v Speaker 2>inserted right there. It's great for keeping track, sharing work,

253
00:12:07.519 --> 00:12:09.679
<v Speaker 2>and avoiding that "how do I make this chart again?" problem.

254
00:12:09.879 --> 00:12:13.240
<v Speaker 1>That sounds incredibly useful. And R itself?

255
00:12:13.480 --> 00:12:16.360
<v Speaker 2>R is a super powerful language widely used in statistics

256
00:12:16.360 --> 00:12:19.440
<v Speaker 2>and data science, and ggplot2 is this amazing

257
00:12:19.480 --> 00:12:22.159
<v Speaker 2>package within R for visualization.

258
00:12:21.600 --> 00:12:24.320
<v Speaker 1>Built on the grammar of graphics. What's that about?

259
00:12:24.559 --> 00:12:26.759
<v Speaker 2>Think of it like a system for building graphs piece

260
00:12:26.799 --> 00:12:30.200
<v Speaker 2>by piece. You start with your data, then you define

261
00:12:30.360 --> 00:12:34.440
<v Speaker 2>aesthetic mappings, linking data variables to visual properties like x position,

262
00:12:34.840 --> 00:12:37.720
<v Speaker 2>y position, color, size.

263
00:12:37.240 --> 00:12:38.919
<v Speaker 1>Okay, mapping data to visuals.

264
00:12:39.080 --> 00:12:42.039
<v Speaker 2>Then you choose geoms, the geometric objects like points, lines,

265
00:12:42.120 --> 00:12:44.720
<v Speaker 2>bars that actually represent the data, and you layer these

266
00:12:44.720 --> 00:12:45.279
<v Speaker 2>things together.

267
00:12:45.399 --> 00:12:47.440
<v Speaker 1>So it's a structured way to think about building any

268
00:12:47.519 --> 00:12:47.759
<v Speaker 1>kind of plot.

269
00:12:47.759 --> 00:12:51.480
<v Speaker 2>Exactly. Very flexible, very powerful once you grasp the

270
00:12:51.480 --> 00:12:55.639
<v Speaker 2>core ideas. Developed by Leland Wilkinson, implemented in ggplot2

271
00:12:55.639 --> 00:12:56.919
<v Speaker 2>by Hadley Wickham.

272
00:12:57.200 --> 00:13:00.799
<v Speaker 1>And the book mentions the ecology of assistance being better now.

273
00:13:00.399 --> 00:13:03.080
<v Speaker 2>Yeah, basically meaning there's just so much help available

274
00:13:03.120 --> 00:13:07.919
<v Speaker 2>online now. Websites like Stack Overflow, R communities, tutorials, blogs.

275
00:13:08.480 --> 00:13:10.559
<v Speaker 2>It's much easier to get started and find answers when you

276
00:13:10.559 --> 00:13:11.559
<v Speaker 2>get stuck than it used to be.

277
00:13:11.480 --> 00:13:14.879
<v Speaker 1>That's encouraging. So to get started, the book says,

278
00:13:14.919 --> 00:13:17.120
<v Speaker 1>install the tidyverse, right?

279
00:13:17.399 --> 00:13:20.240
<v Speaker 2>The tidyverse is a collection of R packages including

280
00:13:20.279 --> 00:13:23.279
<v Speaker 2>ggplot2, dplyr for data manipulation, and others,

281
00:13:23.639 --> 00:13:26.600
<v Speaker 2>all designed to work together really well. You install it

282
00:13:26.600 --> 00:13:29.840
<v Speaker 2>in RStudio usually with just install.packages("tidyverse").

283
00:13:29.639 --> 00:13:32.879
<v Speaker 1>And the book suggests typing out the examples.

284
00:13:33.080 --> 00:13:35.720
<v Speaker 2>Yeah, it's good advice. Actually typing the code helps it

285
00:13:35.759 --> 00:13:37.679
<v Speaker 2>sink in much better than just copy-pasting.

286
00:13:37.799 --> 00:13:42.000
<v Speaker 1>Good tip. And reassuringly, ggplot's defaults are pretty good.

287
00:13:41.679 --> 00:13:45.919
<v Speaker 2>Generally, yes. The default settings for colors, themes, et

288
00:13:46.000 --> 00:13:49.240
<v Speaker 2>cetera are thoughtfully chosen. You can often get a decent

289
00:13:49.279 --> 00:13:52.840
<v Speaker 2>looking informative plot without much tweaking, which is great for beginners.

290
00:13:53.000 --> 00:13:56.519
<v Speaker 1>Okay, let's get into those core ggplot concepts. First,

291
00:13:56.639 --> 00:13:59.679
<v Speaker 1>aesthetic mappings using aes(). Break that down again.

292
00:13:59.799 --> 00:14:02.519
<v Speaker 2>Right. aes() is where you tell ggplot which variables

293
00:14:02.519 --> 00:14:05.840
<v Speaker 2>in your data control which visual property. So aes(x =

294
00:14:05.960 --> 00:14:10.159
<v Speaker 2>gdpPercap, y = lifeExp, color = continent) means gdpPercap controls the

295
00:14:10.360 --> 00:14:13.679
<v Speaker 2>x axis, lifeExp controls the y axis, and the

296
00:14:13.720 --> 00:14:15.279
<v Speaker 2>continent column controls the color.

297
00:14:15.399 --> 00:14:17.799
<v Speaker 1>Crucially, you're not saying which color, just what controls the

298
00:14:17.799 --> 00:14:18.639
<v Speaker 1>color.

299
00:14:18.720 --> 00:14:22.200
<v Speaker 2>Exactly. ggplot handles assigning the actual colors, positions, et cetera

300
00:14:22.519 --> 00:14:23.639
<v Speaker 2>based on the data values.
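
NOTE
A minimal sketch of the aesthetic mapping described above, assuming the ggplot2 and gapminder packages are installed. A geom_point() layer is added only so that the mapping produces something visible.
```r
library(ggplot2)
library(gapminder)  # assumed installed; provides the gapminder data frame
# aes() says which columns drive x, y, and color, not which colors to use.
p <- ggplot(data = gapminder,
            mapping = aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point()
p
```
ggplot2 picks one color per continent on its own and builds the legend from the data; the code never names a specific color.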

301
00:14:23.720 --> 00:14:25.600
<v Speaker 1>Okay, then geoms.

302
00:14:25.240 --> 00:14:29.279
<v Speaker 2>Geoms are the visual markers. geom_point() makes a scatterplot, geom_line() draws lines,

303
00:14:29.600 --> 00:14:33.320
<v Speaker 2>geom_bar() makes bar charts, geom_smooth() adds a smooth trend line.

304
00:14:33.399 --> 00:14:35.159
<v Speaker 2>You add them to your plot with a plus sign.

305
00:14:35.399 --> 00:14:38.279
<v Speaker 1>So ggplot sets up the canvas and mappings. Then you

306
00:14:38.360 --> 00:14:40.960
<v Speaker 1>add + geom_point() or + geom_bar().

307
00:14:41.080 --> 00:14:43.240
<v Speaker 2>You got it. You build plots layer by layer.
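
NOTE
A short sketch of the layer-by-layer idea, assuming ggplot2 and gapminder are installed. The choice of year against life expectancy, and the object names, are this example's own.
```r
library(ggplot2)
library(gapminder)  # assumed installed
# The base object holds the data and mappings but draws no marks yet.
base <- ggplot(data = gapminder, mapping = aes(x = year, y = lifeExp))
# The same base with two different geoms added via the plus sign.
scatter <- base + geom_point()
lines <- base + geom_line(aes(group = country))  # one line per country
scatter
lines
```
The base object can be reused, so swapping the geom changes the kind of chart without restating the data or the mappings.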

308
00:14:43.399 --> 00:14:45.679
<v Speaker 1>And the importance of tidy data?

309
00:14:45.759 --> 00:14:48.039
<v Speaker 2>Ah, yes, tidy data is a way of structuring your data

310
00:14:48.080 --> 00:14:51.600
<v Speaker 2>set that ggplot and the tidyverse really prefer. Basically,

311
00:14:52.200 --> 00:14:55.399
<v Speaker 2>each variable gets its own column, each observation gets its own row.

312
00:14:55.279 --> 00:14:58.840
<v Speaker 1>Like a long format, not wide?

313
00:14:58.919 --> 00:15:01.279
<v Speaker 2>Exactly. It might seem like a small detail, but organizing your

314
00:15:01.320 --> 00:15:04.360
<v Speaker 2>data this way makes working with ggplot much, much smoother

315
00:15:04.440 --> 00:15:05.200
<v Speaker 2>and more intuitive.
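
NOTE
A small sketch of reshaping wide data into the tidy (long) layout, assuming the tidyr package from the tidyverse is installed. The table, country labels, and values are made up for illustration.
```r
library(tidyr)  # assumed installed; part of the tidyverse
# Hypothetical wide table: one column per year (values are made up).
wide <- data.frame(
  country = c("A", "B"),
  `1997` = c(70.1, 65.3),
  `2007` = c(72.4, 68.0),
  check.names = FALSE
)
# Tidy form: one row per country-year observation, one column per variable.
long <- pivot_longer(wide, cols = c(`1997`, `2007`),
                     names_to = "year", values_to = "lifeExp")
print(long)
```
In the long version, year and lifeExp are ordinary columns, so they can be mapped directly inside aes().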

316
00:15:05.240 --> 00:15:08.080
<v Speaker 1>Got it. And this idea of inheritance of mappings?

317
00:15:08.279 --> 00:15:10.519
<v Speaker 2>That just means if you define mappings in the main

318
00:15:10.639 --> 00:15:15.039
<v Speaker 2>ggplot() call, like ggplot(data = gapminder, aes(x =

319
00:15:15.240 --> 00:15:18.000
<v Speaker 2>gdpPercap, y = lifeExp)), any geoms you add

320
00:15:18.039 --> 00:15:20.799
<v Speaker 2>later, like + geom_point() or + geom_smooth(), will

321
00:15:20.840 --> 00:15:23.279
<v Speaker 2>automatically use those x and y mappings.

322
00:15:22.960 --> 00:15:25.600
<v Speaker 1>Unless you override them specifically in the geom?

323
00:15:25.720 --> 00:15:28.240
<v Speaker 2>Right. You can give a geom its own aes() mapping if needed,

324
00:15:28.480 --> 00:15:31.080
<v Speaker 2>but inheritance saves a lot of typing for common mappings.
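
NOTE
A sketch of inherited versus per-geom mappings, assuming ggplot2 and gapminder are installed. Coloring only the points by continent is this example's choice.
```r
library(ggplot2)
library(gapminder)  # assumed installed
p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point(mapping = aes(color = continent)) +  # extra mapping for this layer only
  geom_smooth()                                   # inherits x and y, has no color mapping
p
```
Both layers reuse x and y from the ggplot() call. Because color is mapped only inside geom_point(), the points are colored by continent while the smoother stays a single line fitted to all the data.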

325
00:15:31.200 --> 00:15:33.559
<v Speaker 1>Okay, let's run through some practical plot examples from the

326
00:15:33.600 --> 00:15:37.919
<v Speaker 1>book. Basic scatterplot: life expectancy versus GDP per capita, using

327
00:15:37.919 --> 00:15:39.480
<v Speaker 1>the gapminder data. Yep.

328
00:15:39.799 --> 00:15:43.440
<v Speaker 2>That would be ggplot(data = gapminder, mapping = aes(x =

329
00:15:43.639 --> 00:15:46.360
<v Speaker 2>gdpPercap, y = lifeExp)). That sets it up. Plus

330
00:15:46.360 --> 00:15:50.080
<v Speaker 2>geom_point(). Boom, scatterplot. Simple enough. Add a smoother?

331
00:15:50.399 --> 00:15:52.840
<v Speaker 2>Just add plus geom_smooth() on the next line. ggplot

332
00:15:52.919 --> 00:15:56.279
<v Speaker 2>adds a default trend line, usually with a confidence

333
00:15:56.320 --> 00:15:56.840
<v Speaker 2>band around it.

334
00:15:56.960 --> 00:16:00.159
<v Speaker 1>Nice. Now, that GDP data is probably skewed, right? Lots of

335
00:16:00.159 --> 00:16:02.039
<v Speaker 1>lower values, a few very high ones.

336
00:16:01.919 --> 00:16:05.399
<v Speaker 2>It usually is. That makes the scatterplot bunch up on one side.

337
00:16:05.480 --> 00:16:08.519
<v Speaker 1>So transforming the scale, like a log scale for the

338
00:16:08.720 --> 00:16:09.840
<v Speaker 1>x-axis. Good idea?

339
00:16:09.960 --> 00:16:13.120
<v Speaker 2>Yes, add another layer: plus scale_x_log10(). That

340
00:16:13.200 --> 00:16:16.039
<v Speaker 2>transforms the x axis to a base ten log scale,

341
00:16:16.200 --> 00:16:17.840
<v Speaker 2>spreading the data out much better.

342
00:16:17.720 --> 00:16:20.639
<v Speaker 1>Visually. Okay, and making it look more professional? Titles, axis

343
00:16:20.720 --> 00:16:21.440
<v Speaker 1>labels? Use

344
00:16:21.360 --> 00:16:24.879
<v Speaker 2>the labs() function. Add plus labs(title = "My Plot Title",

345
00:16:25.320 --> 00:16:28.360
<v Speaker 2>x = "GDP Per Capita", y = "Life Expectancy").

346
00:16:27.919 --> 00:16:30.240
<v Speaker 1>Simple. And what if you want to format the axis

347
00:16:30.320 --> 00:16:32.639
<v Speaker 1>labels, like showing dollars on the x-axis?

348
00:16:32.759 --> 00:16:35.360
<v Speaker 2>That's where the scales package comes in handy. You modify

349
00:16:35.399 --> 00:16:38.120
<v Speaker 2>the scale function, maybe like plus scale_x_log10(

350
00:16:38.360 --> 00:16:40.759
<v Speaker 2>labels = scales::dollar). That gives you nice

351
00:16:40.759 --> 00:16:42.080
<v Speaker 2>dollar formatting. Cool.
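
NOTE A sketch that stacks the layers discussed so far: points, a smoother, a log10 x-axis with dollar labels, and titles. It assumes ggplot2, gapminder and scales are installed; the title text is the placeholder the speakers used.

NOTE
```r
library(ggplot2)
library(gapminder)
p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  geom_smooth() +
  # Log10 x-axis; scales::dollar formats the tick labels as currency.
  scale_x_log10(labels = scales::dollar) +
  labs(title = "My Plot Title", x = "GDP Per Capita", y = "Life Expectancy")
```

NOTE Each plus adds one layer or adjustment, so any line can be dropped to see what that piece contributes.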

352
00:16:42.159 --> 00:16:45.879
<v Speaker 1>Now, what about mapping categories, like coloring the points by continent?

353
00:16:46.000 --> 00:16:49.879
<v Speaker 2>You add color = continent inside the aes() function, so aes(

354
00:16:50.159 --> 00:16:53.519
<v Speaker 2>x = gdpPercap, y = lifeExp, color =

355
00:16:53.480 --> 00:16:56.120
<v Speaker 1>continent). And ggplot handles the rest? Yep.

356
00:16:56.240 --> 00:16:58.559
<v Speaker 2>It assigns a color to each continent and automatically adds

357
00:16:58.559 --> 00:17:02.200
<v Speaker 2>a legend explaining the colors. If you also have geom_smooth,

358
00:17:02.600 --> 00:17:05.640
<v Speaker 2>you'll likely get a separate smooth line for each continent

359
00:17:05.839 --> 00:17:07.079
<v Speaker 2>in its corresponding color.

360
00:17:07.279 --> 00:17:10.839
<v Speaker 1>Okay, this brings up that crucial difference: mapping versus setting.

361
00:17:11.279 --> 00:17:12.759
<v Speaker 1>Making all points purple,

362
00:17:12.839 --> 00:17:16.160
<v Speaker 2>for instance. Right, if you put color = "purple" inside the aes()

363
00:17:16.240 --> 00:17:19.599
<v Speaker 2>function, ggplot treats "purple" as a data value. It

364
00:17:19.640 --> 00:17:22.240
<v Speaker 2>gives all points the same default color and makes a

365
00:17:22.359 --> 00:17:23.640
<v Speaker 2>useless legend entry

366
00:17:23.400 --> 00:17:25.119
<v Speaker 1>for "purple". Because you mapped it to data.

367
00:17:25.359 --> 00:17:27.480
<v Speaker 2>Exactly. If you just want to set all points to be purple,

368
00:17:27.519 --> 00:17:30.720
<v Speaker 2>you put color = "purple" outside aes(), inside the geom function

369
00:17:30.759 --> 00:17:33.240
<v Speaker 2>itself, like geom_point(color = "purple").

370
00:17:33.359 --> 00:17:37.200
<v Speaker 1>No mapping, just setting a fixed visual property, no legend needed.

371
00:17:37.160 --> 00:17:41.000
<v Speaker 2>Precisely. Huge difference. Common point of confusion. Makes sense.
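
NOTE A side-by-side sketch of mapping versus setting, assuming ggplot2 and gapminder are installed. The object names are illustrative only.

NOTE
```r
library(ggplot2)
library(gapminder)
base <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
# Mapping: color varies with a variable, so ggplot adds a legend.
p_mapped <- base + geom_point(mapping = aes(color = continent))
# Common mistake: "purple" inside aes() is treated as a data value, not a color.
p_wrong <- base + geom_point(mapping = aes(color = "purple"))
# Setting: a fixed color outside aes(); no legend is drawn for it.
p_set <- base + geom_point(color = "purple")
```

NOTE Printing p_wrong and p_set one after the other shows the difference: the first has a one-entry legend and a default color, the second has purple points and no legend.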

372
00:17:41.079 --> 00:17:43.799
<v Speaker 1>Then there's faceting, splitting the plot into panels.

373
00:17:44.039 --> 00:17:47.720
<v Speaker 2>Yes, super useful. facet_wrap() lets you split by one

374
00:17:47.759 --> 00:17:51.640
<v Speaker 2>categorical variable, arranging panels in a grid. facet_grid()

375
00:17:51.880 --> 00:17:54.480
<v Speaker 2>lets you split by two variables, creating a 2D

376
00:17:54.599 --> 00:17:55.960
<v Speaker 2>grid of plots. Like

377
00:17:55.920 --> 00:17:59.160
<v Speaker 1>that age versus children example, faceted by sex and

378
00:17:59.160 --> 00:18:03.039
<v Speaker 2>race. Exactly. It lets you compare relationships across different groups really effectively.
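
NOTE A small faceting sketch. It uses the mpg data set bundled with ggplot2 as a stand-in, which is an assumption made to keep the example self-contained; the book's age-versus-children example uses different survey data.

NOTE
```r
library(ggplot2)
base <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point()
# One categorical variable: panels wrap into rows and columns.
p_wrap <- base + facet_wrap(~ class)
# Two categorical variables: rows by drv, columns by cyl.
p_grid <- base + facet_grid(drv ~ cyl)
```

NOTE The formula in facet_grid() reads rows ~ columns, so swapping the two variables transposes the panel layout.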

379
00:18:03.200 --> 00:18:06.319
<v Speaker 1>What about visualizing just one continuous variable?

380
00:18:06.640 --> 00:18:10.559
<v Speaker 2>Histograms. geom_histogram(). You map your variable to x, like

381
00:18:10.680 --> 00:18:13.720
<v Speaker 2>aes(x = area). It bins the data and shows

382
00:18:13.720 --> 00:18:16.359
<v Speaker 2>counts as bars. You might need to adjust the binwidth

383
00:18:16.440 --> 00:18:18.720
<v Speaker 2>argument to get a good view. Or density plots, geom_density().

384
00:18:18.759 --> 00:18:21.920
<v Speaker 2>Similar idea, but it gives you a smooth curve, an estimated

385
00:18:21.920 --> 00:18:25.559
<v Speaker 2>distribution. Often a nice alternative or complement to histograms.
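
NOTE A sketch of the two single-variable plots, assuming the midwest data set that ships with ggplot2 is the source of the area variable mentioned. The binwidth value is an illustrative guess, not a recommendation from the book.

NOTE
```r
library(ggplot2)
# Histogram: bins the variable and shows counts as bars.
p_hist <- ggplot(data = midwest, mapping = aes(x = area)) +
  geom_histogram(binwidth = 0.01)
# Density plot: a smoothed estimate of the same distribution.
p_dens <- ggplot(data = midwest, mapping = aes(x = area)) +
  geom_density()
```

NOTE Trying a few binwidth values is usually worthwhile, since a histogram's shape can change noticeably with the bin size.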

386
00:18:25.200 --> 00:18:27.039
<v Speaker 1>And briefly, graph tables?

387
00:18:27.599 --> 00:18:30.640
<v Speaker 2>geom_table allows embedding a small table right onto the plot.

388
00:18:30.720 --> 00:18:33.279
<v Speaker 2>Can be handy for showing summary stats alongside the visual.

389
00:18:33.319 --> 00:18:37.000
<v Speaker 1>Okay, crucial step: saving your masterpiece. How do we save plots?

390
00:18:37.440 --> 00:18:40.279
<v Speaker 2>Easiest way is ggsave(). After you display your plot,

391
00:18:40.359 --> 00:18:43.119
<v Speaker 2>you type ggsave("myplot.pdf") or ggsave(

392
00:18:43.240 --> 00:18:46.920
<v Speaker 2>"myplot.png"). It saves the last plot by default.
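
NOTE A sketch of saving a plot with ggsave(), assuming ggplot2 is installed. Writing to a temporary directory, the mpg data, and the 6 by 4 inch size are choices made for this example only.

NOTE
```r
library(ggplot2)
p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point()
# A temporary path keeps this sketch from writing into the project folder.
pdf_path <- file.path(tempdir(), "myplot.pdf")
# plot = p is explicit; if omitted, ggsave() saves the last plot displayed.
ggsave(pdf_path, plot = p, width = 6, height = 4)
```

NOTE ggsave() picks the output device from the file extension, so changing the name to end in .png writes a raster file instead of a vector PDF.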

393
00:18:46.680 --> 00:18:49.519
<v Speaker 1>PDF versus PNG. You mentioned vector versus raster?

394
00:18:49.880 --> 00:18:53.519
<v Speaker 2>Yeah. Vector formats like PDF or SVG are usually best.

395
00:18:53.720 --> 00:18:56.680
<v Speaker 2>They store the plot as lines and shapes, so you

396
00:18:56.720 --> 00:19:00.720
<v Speaker 2>can resize them infinitely without getting blurry. Good for publications.

397
00:19:00.880 --> 00:19:04.160
<v Speaker 1>Raster formats like PNG or JPEG are pixel based, so

398
00:19:04.200 --> 00:19:06.519
<v Speaker 1>they can get blocky if you enlarge them too much.

399
00:19:06.640 --> 00:19:09.240
<v Speaker 2>Right, use vector when you can, especially for line art

400
00:19:09.279 --> 00:19:10.519
<v Speaker 2>like most plots. And

401
00:19:10.480 --> 00:19:12.880
<v Speaker 1>the here package for filepaths?

402
00:19:12.440 --> 00:19:15.279
<v Speaker 2>Highly recommend it. It helps make your filepaths relative to

403
00:19:15.319 --> 00:19:18.240
<v Speaker 2>your project root directory, so your code doesn't break if

404
00:19:18.240 --> 00:19:20.440
<v Speaker 2>you move the project folder or share it. Much more

405
00:19:20.519 --> 00:19:24.079
<v Speaker 2>robust. here::here("output", "myplot.pdf"), that

406
00:19:24.160 --> 00:19:28.039
<v Speaker 2>kind of thing. Okay, Chapter five delves into refining plots.

407
00:19:28.319 --> 00:19:30.119
<v Speaker 2>Key takeaways? Main things?

408
00:19:30.640 --> 00:19:34.240
<v Speaker 1>Every aesthetic mapping has a scale, a scale_ function you

409
00:19:34.240 --> 00:19:38.359
<v Speaker 1>can adjust, like ticks and labels. Scales often have guides, legends,

410
00:19:38.359 --> 00:19:39.200
<v Speaker 1>you can tweak

411
00:19:38.880 --> 00:19:40.839
<v Speaker 2>with guides(). And cosmetic changes?

412
00:19:40.599 --> 00:19:42.799
<v Speaker 1>Often done in the geom function itself, or using the

413
00:19:42.799 --> 00:19:46.160
<v Speaker 1>big theme function for overall plot appearance, background, gridlines, fonts,

414
00:19:46.240 --> 00:19:47.279
<v Speaker 1>legend position.

415
00:19:47.160 --> 00:19:50.640
<v Speaker 2>Right, like theme(legend.position = "bottom"). And labs()

416
00:19:50.920 --> 00:19:54.359
<v Speaker 2>for all labels? Yep, labs() is your friend for axis titles,

417
00:19:54.400 --> 00:19:58.519
<v Speaker 2>legend titles, plot titles, subtitles, captions. Keep things clearly labeled.
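
NOTE A sketch of the refinements listed above: labels through labs() and legend placement through theme(). It assumes ggplot2 and gapminder are installed; all label text here is made up for illustration.

NOTE
```r
library(ggplot2)
library(gapminder)
p <- ggplot(data = gapminder,
            mapping = aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point() +
  scale_x_log10() +
  # labs() names the axes, the legend (via the color aesthetic), and the titles.
  labs(title = "Life expectancy and GDP", subtitle = "Illustrative subtitle",
       caption = "Illustrative caption", x = "GDP Per Capita",
       y = "Life Expectancy", color = "Continent") +
  # theme() controls non-data appearance, here the legend position.
  theme(legend.position = "bottom")
```

NOTE Keeping data-related choices in scales and labels, and cosmetic ones in theme(), makes it easier to restyle a plot later without touching the mappings.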

418
00:19:58.759 --> 00:20:02.640
<v Speaker 1>Moving on, Chapter six touches on visualizing models, not just

419
00:20:02.759 --> 00:20:03.240
<v Speaker 1>raw data.

420
00:20:03.359 --> 00:20:06.680
<v Speaker 2>Yeah, visualization is huge for understanding model results too. The

421
00:20:06.680 --> 00:20:09.680
<v Speaker 2>broom package is amazing here. How so? It tidies up

422
00:20:09.680 --> 00:20:13.640
<v Speaker 2>messy model output. tidy() gives you coefficients, p-values, et cetera,

423
00:20:13.880 --> 00:20:17.200
<v Speaker 2>in a nice table. augment() adds predictions and residuals back

424
00:20:17.200 --> 00:20:20.680
<v Speaker 2>to your original data. glance() gives model summary stats.

425
00:20:20.359 --> 00:20:22.799
<v Speaker 1>So you can then plot those tidy results? Exactly.

426
00:20:22.880 --> 00:20:25.759
<v Speaker 2>You can plot coefficients using packages like coefplot. You can

427
00:20:25.799 --> 00:20:28.839
<v Speaker 2>generate predictions from your model using predict, then plot those predictions,

428
00:20:28.960 --> 00:20:32.200
<v Speaker 2>maybe with confidence intervals using geom_ribbon(). Makes models much

429
00:20:32.240 --> 00:20:32.920
<v Speaker 2>less abstract.
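
NOTE A sketch of the model workflow described above, assuming ggplot2, gapminder and broom are installed. The linear model and the 100-point prediction grid are illustrative choices, not the book's own example.

NOTE
```r
library(ggplot2)
library(gapminder)
library(broom)
model <- lm(lifeExp ~ log10(gdpPercap), data = gapminder)
coefs <- tidy(model, conf.int = TRUE)  # one row per coefficient
aug <- augment(model)                  # model data plus .fitted and .resid
stats <- glance(model)                 # one-row model summary
# Predictions over a grid of GDP values, with a confidence interval.
grid <- data.frame(gdpPercap = seq(min(gapminder$gdpPercap),
                                   max(gapminder$gdpPercap),
                                   length.out = 100))
pred <- cbind(grid, predict(model, newdata = grid, interval = "confidence"))
# predict() names its columns fit, lwr and upr; the ribbon draws the interval.
p <- ggplot(data = pred, mapping = aes(x = gdpPercap, y = fit)) +
  geom_ribbon(mapping = aes(ymin = lwr, ymax = upr), alpha = 0.2) +
  geom_line() +
  scale_x_log10()
```

NOTE Because tidy(), augment() and glance() all return data frames, their output can go straight into ggplot() like any other data.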

430
00:20:33.079 --> 00:20:37.279
<v Speaker 1>Chapter seven: maps. Another visual form, but with unique issues.

431
00:20:37.759 --> 00:20:41.160
<v Speaker 2>Definitely, maps are tricky. Data is often tied to geographic

432
00:20:41.240 --> 00:20:46.039
<v Speaker 2>units of varying sizes and populations, the modifiable areal unit problem.

433
00:20:45.880 --> 00:20:48.400
<v Speaker 1>Like how election maps can look different depending on whether

434
00:20:48.400 --> 00:20:51.240
<v Speaker 1>you color states or counties or scale by population.

435
00:20:51.720 --> 00:20:55.640
<v Speaker 2>Precisely. The book shows 2016 election maps illustrating this.

436
00:20:56.400 --> 00:21:00.319
<v Speaker 2>It also stresses looking at baseline variables like population density

437
00:21:00.640 --> 00:21:04.880
<v Speaker 2>when interpreting choropleth maps, maps colored by value.

438
00:21:04.599 --> 00:21:08.200
<v Speaker 1>Right, because a large, sparsely populated area colored dark red

439
00:21:08.279 --> 00:21:11.880
<v Speaker 1>might represent fewer actual votes than a small dense area

440
00:21:11.920 --> 00:21:13.039
<v Speaker 1>colored lighter

441
00:21:12.759 --> 00:21:15.240
<v Speaker 2>red. Exactly, gotta be careful with interpretation.

442
00:21:15.440 --> 00:21:19.640
<v Speaker 1>Chapter eight gets back to color. Importance of colorblind friendliness? Crucial.

443
00:21:20.160 --> 00:21:22.720
<v Speaker 2>About eight percent of men have some form of color

444
00:21:22.799 --> 00:21:26.839
<v Speaker 2>vision deficiency. Use palettes designed to be distinguishable for everyone.

445
00:21:27.240 --> 00:21:30.079
<v Speaker 2>Packages like dichromat and colorblindr help with this in

446
00:21:30.240 --> 00:21:32.799
<v Speaker 1>R. Good point. And using manual colors when they have

447
00:21:32.880 --> 00:21:35.359
<v Speaker 1>meaning like political parties, can be useful.

448
00:21:35.079 --> 00:21:39.319
<v Speaker 2>But the book cautions against stereotypes. Always test. And themes?

449
00:21:39.480 --> 00:21:41.440
<v Speaker 1>Themes are great for changing the whole look and feel quickly.

450
00:21:41.599 --> 00:21:44.880
<v Speaker 1>ggthemes has presets like theme_economist() or theme_wsj(),

451
00:21:45.279 --> 00:21:47.920
<v Speaker 1>or you can customize endlessly with the base theme function.

452
00:21:48.119 --> 00:21:51.799
<v Speaker 2>Chapter eight also warns about some plot types. Dual y-axes?

453
00:21:52.039 --> 00:21:55.640
<v Speaker 2>Yeah, generally avoid dual y-axes if possible. Very

454
00:21:55.640 --> 00:21:59.359
<v Speaker 2>easy to mislead by manipulating the axis ranges independently. The

455
00:21:59.400 --> 00:22:02.839
<v Speaker 2>book suggests alternatives like indexing data to a common start

456
00:22:02.880 --> 00:22:04.640
<v Speaker 2>point or plotting the difference.

457
00:22:04.400 --> 00:22:05.920
<v Speaker 1>And the perennial favorite target?

458
00:22:06.319 --> 00:22:11.000
<v Speaker 2>Pie charts. Ha, yes. Generally poor for comparisons, especially if

459
00:22:11.000 --> 00:22:13.519
<v Speaker 2>there are many slices or the values are close. Bar

460
00:22:13.599 --> 00:22:16.279
<v Speaker 2>charts are almost always better for showing parts of a

461
00:22:16.319 --> 00:22:17.680
<v Speaker 2>whole or comparing amounts.

462
00:22:17.880 --> 00:22:22.359
<v Speaker 1>Okay, finally, the appendix emphasizes reproducibility and workflow. Super important.

463
00:22:22.680 --> 00:22:25.839
<v Speaker 2>Use R Markdown, keep your data tidy, write functions

464
00:22:25.839 --> 00:22:29.000
<v Speaker 2>for repetitive tasks, know how to find help using i'm

465
00:22:29.200 --> 00:22:34.279
<v Speaker 2>and package vignettes in R. It makes your analysis more reliable, understandable,

466
00:22:34.319 --> 00:22:35.039
<v Speaker 2>and repeatable.

467
00:22:35.279 --> 00:22:39.519
<v Speaker 1>So, wrapping up: data visualization is incredibly powerful for insight.

468
00:22:39.359 --> 00:22:44.519
<v Speaker 2>But it requires thought: understanding perception, design principles, honest representation. And

469
00:22:44.480 --> 00:22:47.279
<v Speaker 1>tools like R and ggplot2, as the book shows,

470
00:22:47.400 --> 00:22:49.200
<v Speaker 1>give you the practical means to do it well.

471
00:22:49.319 --> 00:22:52.119
<v Speaker 2>Absolutely. Hopefully you, the listener, feel a bit more ready

472
00:22:52.160 --> 00:22:54.559
<v Speaker 2>to dive in and visualize your own data.

473
00:22:54.640 --> 00:22:56.960
<v Speaker 1>Yeah, those aha moments when you see a pattern are

474
00:22:57.039 --> 00:22:57.839
<v Speaker 1>really rewarding.

475
00:22:58.119 --> 00:23:00.960
<v Speaker 2>Definitely, and don't be intimidated by the learning curve. The

476
00:23:01.039 --> 00:23:03.559
<v Speaker 2>resources and community support are really strong

477
00:23:03.599 --> 00:23:06.319
<v Speaker 1>now. So here's a thought to leave you with. Think

478
00:23:06.359 --> 00:23:09.799
<v Speaker 1>about the last confusing chart you saw. How could some

479
00:23:09.880 --> 00:23:13.799
<v Speaker 1>of these ideas, clarity, perception, honesty have made it better?

480
00:23:13.920 --> 00:23:16.720
<v Speaker 2>And maybe more importantly, what stories are hiding in your

481
00:23:16.799 --> 00:23:18.680
<v Speaker 2>data waiting to be visualized.

482
00:23:18.920 --> 00:23:21.880
<v Speaker 1>We really encourage checking out the book, Data Visualization: A

483
00:23:21.920 --> 00:23:26.559
<v Speaker 1>Practical Introduction, and exploring those R packages: tidyverse, ggplot2,

484
00:23:26.680 --> 00:23:30.559
<v Speaker 1>scales, here, ggthemes, broom. Lots of great tools.

485
00:23:30.559 --> 00:23:34.240
<v Speaker 2>Plus, remember: looking at data, visualizing it, it's not a

486
00:23:34.279 --> 00:23:35.880
<v Speaker 2>substitute for thinking carefully.

487
00:23:36.039 --> 00:23:38.960
<v Speaker 1>Right, but it's an absolutely essential part of asking better

488
00:23:39.039 --> 00:23:41.079
<v Speaker 1>questions and getting to those deeper insights.
