WEBVTT

1
00:00:00.080 --> 00:00:01.800
<v Speaker 1>Welcome to the deep dive. This is where we take

2
00:00:01.840 --> 00:00:06.000
<v Speaker 1>a whole stack of complex information, research notes articles and

3
00:00:06.040 --> 00:00:08.359
<v Speaker 1>really boil it down to the essentials for you. And

4
00:00:08.439 --> 00:00:11.640
<v Speaker 1>today we're plunging into a topic that's just well critical

5
00:00:11.679 --> 00:00:15.560
<v Speaker 1>for any serious developer, high performance Postgres School, specifically within

6
00:00:15.679 --> 00:00:19.120
<v Speaker 1>Ruby on Rail's applications. Our mission really is to give

7
00:00:19.120 --> 00:00:21.399
<v Speaker 1>you those key insights that will make your apps faster,

8
00:00:22.280 --> 00:00:25.120
<v Speaker 1>more reliable, definitely more resilient. What's kind of cool is

9
00:00:25.120 --> 00:00:27.399
<v Speaker 1>that our main source here is actually a beta book.

10
00:00:27.640 --> 00:00:30.120
<v Speaker 1>It's still being developed, so you're getting some really cutting edge,

11
00:00:30.760 --> 00:00:32.679
<v Speaker 1>very practical info straight from the trenches.

12
00:00:32.719 --> 00:00:35.280
<v Speaker 2>You might say, yeah, and that's exactly why this is

13
00:00:35.960 --> 00:00:39.439
<v Speaker 2>so relevant right now. Getting really good at postgreschool and rails.

14
00:00:39.880 --> 00:00:43.759
<v Speaker 2>That's not just helpful for your career, it's massively in demand,

15
00:00:44.000 --> 00:00:46.520
<v Speaker 2>right Just think about it. A Hired survey from twenty

16
00:00:46.560 --> 00:00:48.960
<v Speaker 2>twenty three it found Ruby on rails was actually the

17
00:00:48.960 --> 00:00:52.280
<v Speaker 2>most sought after skill. We're talking one point six y

18
00:00:52.320 --> 00:00:55.560
<v Speaker 2>four times more interview requests if you know your stuff, wow.

19
00:00:55.399 --> 00:00:58.039
<v Speaker 1>One point six four times Yeah.

20
00:00:57.799 --> 00:01:01.479
<v Speaker 2>And Postgres School it's consistently winning wards. It was number

21
00:01:01.520 --> 00:01:04.480
<v Speaker 2>one in the twenty twenty two stack Overflow survey for

22
00:01:04.599 --> 00:01:08.719
<v Speaker 2>most used database among pros and it's topped the dB

23
00:01:08.799 --> 00:01:12.200
<v Speaker 2>engine's ranking three times. So yeah, the time spent learning this,

24
00:01:12.920 --> 00:01:14.959
<v Speaker 2>it's it's definitely high impact knowledge.

25
00:01:15.040 --> 00:01:16.799
<v Speaker 1>Okay, all right, let's get into it. Then. Before we

26
00:01:16.840 --> 00:01:19.439
<v Speaker 1>can even you know, talk about performance tuning, we need

27
00:01:19.480 --> 00:01:21.719
<v Speaker 1>someone to actually do the tuning. We need a test bed.

28
00:01:22.400 --> 00:01:25.799
<v Speaker 1>So this source material. It introduces a fictional app called

29
00:01:25.920 --> 00:01:29.640
<v Speaker 1>ride share. What exactly is that? And I guess how

30
00:01:29.640 --> 00:01:31.000
<v Speaker 1>does it help us learn this stuff?

31
00:01:31.239 --> 00:01:31.400
<v Speaker 3>Right?

32
00:01:31.439 --> 00:01:34.719
<v Speaker 2>So ride share is it's basically designed as this simplified

33
00:01:34.760 --> 00:01:37.319
<v Speaker 2>API only web app. Think of it like a mini

34
00:01:37.439 --> 00:01:39.799
<v Speaker 2>Uber or Lyft. You know, it's got the core active

35
00:01:39.840 --> 00:01:43.560
<v Speaker 2>record models you'd expect drivers, writers, trips, trip requests.

36
00:01:43.879 --> 00:01:46.239
<v Speaker 1>That's worth thing and active record just quickly, that's the RM, right,

37
00:01:46.239 --> 00:01:48.159
<v Speaker 1>the object Relational mapper in reels exactly.

38
00:01:48.200 --> 00:01:50.760
<v Speaker 2>It's the magic layer that connects your Ruby code, your

39
00:01:50.760 --> 00:01:54.120
<v Speaker 2>classes directly to the database tables. And it really leans

40
00:01:54.120 --> 00:01:57.040
<v Speaker 2>into that rails philosophy of convention over configuration.

41
00:01:57.200 --> 00:01:59.799
<v Speaker 1>Ah right, like driver model maps to driver's table.

42
00:01:59.640 --> 00:02:02.120
<v Speaker 2>Automnate precisely, you don't have to spell it all out

43
00:02:02.239 --> 00:02:05.840
<v Speaker 2>and for managing schema changes rideshare uses the standard rail

44
00:02:05.879 --> 00:02:09.439
<v Speaker 2>stuff dbstructured dot s cool along with active record migrations

45
00:02:09.719 --> 00:02:13.360
<v Speaker 2>under the hood that uses pg dump to capture the structure.

46
00:02:13.439 --> 00:02:15.080
<v Speaker 1>You know, one thing I found really valuable in the

47
00:02:15.120 --> 00:02:18.879
<v Speaker 1>source was this strong push to set up and use

48
00:02:19.000 --> 00:02:22.199
<v Speaker 1>postgrass School locally, like really, make it your own little lab.

49
00:02:22.840 --> 00:02:24.840
<v Speaker 3>Totally. It's not just theory.

50
00:02:25.039 --> 00:02:28.599
<v Speaker 2>Having it run locally gives you complete control a safe

51
00:02:28.599 --> 00:02:31.960
<v Speaker 2>space to experiment, which, let's face it, is absolutely essential

52
00:02:32.000 --> 00:02:34.039
<v Speaker 2>when you're messing with performance settings. You don't want to

53
00:02:34.080 --> 00:02:37.039
<v Speaker 2>test this stuff live, definitely not, and getting rights you're

54
00:02:37.080 --> 00:02:39.840
<v Speaker 2>running is designed to be pretty smooth. You use Homebrew

55
00:02:40.080 --> 00:02:43.240
<v Speaker 2>urban v for your Ruby version, Butler for gems, and

56
00:02:43.280 --> 00:02:46.280
<v Speaker 2>then just the standard binrails commands dB do I create, dB,

57
00:02:46.400 --> 00:02:49.759
<v Speaker 2>DOT migrate, dB console simple stuff. Once you set up,

58
00:02:49.840 --> 00:02:53.800
<v Speaker 2>you immediately start bumping into core postgred school ideas like

59
00:02:54.240 --> 00:02:56.520
<v Speaker 2>SQL being a declarative language.

60
00:02:56.560 --> 00:02:58.120
<v Speaker 1>Okay, what does that actually mean? Declaraty?

61
00:02:58.240 --> 00:02:59.919
<v Speaker 2>It just means you tell postgres School what you want,

62
00:03:00.120 --> 00:03:02.080
<v Speaker 2>like get me all the trips from yesterday, and you

63
00:03:02.120 --> 00:03:06.280
<v Speaker 2>don't specify how to get it. Postgrescool's optimizer, its brain

64
00:03:06.919 --> 00:03:10.080
<v Speaker 2>figures out the most efficient way to execute that request.

65
00:03:10.199 --> 00:03:13.280
<v Speaker 1>Ah, so you declare the result. It handles the process

66
00:03:13.439 --> 00:03:15.520
<v Speaker 1>like ordering food, exactly.

67
00:03:15.120 --> 00:03:18.639
<v Speaker 2>Like ordering food, and the how it figures out that's

68
00:03:18.680 --> 00:03:22.520
<v Speaker 2>the query execution plan. It's like the database's internal recipe

69
00:03:22.560 --> 00:03:25.840
<v Speaker 2>for fetching your data. Plus you've got functions built in

70
00:03:25.919 --> 00:03:28.520
<v Speaker 2>ones and ones you can define yourself, which really let

71
00:03:28.560 --> 00:03:32.800
<v Speaker 2>you push complex logic down into the database itself. Seriously,

72
00:03:33.080 --> 00:03:36.240
<v Speaker 2>we can't stress this enough. Get ride share set up locally.

73
00:03:36.560 --> 00:03:39.319
<v Speaker 2>It really is the perfect lab for practicing everything we're

74
00:03:39.319 --> 00:03:39.960
<v Speaker 2>about to cover.

75
00:03:40.120 --> 00:03:43.400
<v Speaker 1>Okay, performance lab established. Ride Share is running locally, safe

76
00:03:43.400 --> 00:03:46.719
<v Speaker 1>space acquired. But to really know what's going on, we

77
00:03:46.800 --> 00:03:49.639
<v Speaker 1>need tools to look inside Postgres school, Right, how do

78
00:03:49.680 --> 00:03:50.719
<v Speaker 1>we peak under the hood.

79
00:03:51.039 --> 00:03:53.319
<v Speaker 2>Yeah, the piece grows command line tool is kind of

80
00:03:53.319 --> 00:03:55.240
<v Speaker 2>your main entry point there. There are a few meta

81
00:03:55.240 --> 00:03:57.919
<v Speaker 2>commands you'll use all the time, like a cantillybax that

82
00:03:58.039 --> 00:04:01.560
<v Speaker 2>toggles This expanded view makes query results way easier to read.

83
00:04:01.680 --> 00:04:03.120
<v Speaker 1>Oh yeah, that's super helpful.

84
00:04:03.439 --> 00:04:06.039
<v Speaker 2>And e is great. It pomps your current query and

85
00:04:06.080 --> 00:04:10.000
<v Speaker 2>your text editor life saver for complex SQL. Then there's

86
00:04:10.039 --> 00:04:13.840
<v Speaker 2>EEL to just list your databases. You can also customize

87
00:04:13.840 --> 00:04:17.680
<v Speaker 2>PSQL using a timicustl RC file add aliases, change the

88
00:04:17.720 --> 00:04:21.360
<v Speaker 2>prompt whatever makes you comfortable, and for enabling some really

89
00:04:21.360 --> 00:04:24.560
<v Speaker 2>powerful extensions like PG stat statements, which will definitely circle

90
00:04:24.600 --> 00:04:27.600
<v Speaker 2>back to you need to edit your postcrisql dot com

91
00:04:27.639 --> 00:04:30.439
<v Speaker 2>file specifically the shared preload library is setting.

92
00:04:30.639 --> 00:04:33.839
<v Speaker 1>Ah, okay, and that canfig change needs a restart.

93
00:04:33.560 --> 00:04:34.639
<v Speaker 3>Right that specific one does.

94
00:04:34.720 --> 00:04:37.600
<v Speaker 2>Yeah, Shared preload libraries needs a full postcresco restart to

95
00:04:37.600 --> 00:04:38.120
<v Speaker 2>take effect.

96
00:04:38.240 --> 00:04:41.879
<v Speaker 1>Got it? So okay, we can configure things, but how

97
00:04:41.920 --> 00:04:44.480
<v Speaker 1>do we see what the database is doing? Like right now?

98
00:04:44.560 --> 00:04:46.800
<v Speaker 1>Is it just log files or is there a more

99
00:04:46.800 --> 00:04:49.920
<v Speaker 1>direct way? I remember this one time a query just

100
00:04:50.040 --> 00:04:52.040
<v Speaker 1>ran forever almost took down the whole app. If only

101
00:04:52.040 --> 00:04:53.600
<v Speaker 1>I'd known about PG stat activity. Oh.

102
00:04:53.639 --> 00:04:55.920
<v Speaker 2>Absolutely, PG set activity is exactly that. It's your real

103
00:04:55.959 --> 00:05:00.439
<v Speaker 2>time dashboard. You see every connection, what state it's in active, idle,

104
00:05:00.720 --> 00:05:04.000
<v Speaker 2>maybe crucially idle in transaction, and you see background processes too,

105
00:05:04.000 --> 00:05:04.959
<v Speaker 2>like autovacuum, so.

106
00:05:04.959 --> 00:05:06.959
<v Speaker 1>You can spot those long running queries.

107
00:05:07.319 --> 00:05:07.600
<v Speaker 3>Yep.

108
00:05:08.120 --> 00:05:10.399
<v Speaker 2>You can see the curry text, find its process idea

109
00:05:10.439 --> 00:05:12.720
<v Speaker 2>the PID, and then if you really need to, you

110
00:05:12.759 --> 00:05:15.199
<v Speaker 2>can try to cancel it gracefully with pg cancel back

111
00:05:15.279 --> 00:05:19.360
<v Speaker 2>end or in an emergency, terminate it with p determinative

112
00:05:19.399 --> 00:05:22.040
<v Speaker 2>back end. Use that last one carefully.

113
00:05:21.680 --> 00:05:24.319
<v Speaker 1>Though right termination is a bit heavy handed.

114
00:05:24.120 --> 00:05:27.279
<v Speaker 2>It can be Now this ties into understanding pessimistic locking.

115
00:05:27.720 --> 00:05:31.160
<v Speaker 2>It's what postgrescool does by default. You'll see shared locks

116
00:05:31.399 --> 00:05:34.680
<v Speaker 2>and exclusive locks. The main takeaway you really want to

117
00:05:34.720 --> 00:05:38.079
<v Speaker 2>minimize how long you hold exclusive locks because they block

118
00:05:38.199 --> 00:05:40.639
<v Speaker 2>everyone else trying to access that same data.

119
00:05:40.720 --> 00:05:42.720
<v Speaker 1>And that can lead to deadlocks.

120
00:05:42.839 --> 00:05:43.319
<v Speaker 3>Exactly.

121
00:05:43.519 --> 00:05:47.519
<v Speaker 2>Deadlocks are the worst case scenario, two transactions waiting for

122
00:05:47.560 --> 00:05:51.199
<v Speaker 2>each other, stuck forever. Postgres will will detect and break them,

123
00:05:51.360 --> 00:05:54.360
<v Speaker 2>but it means one transaction fails. You can see livelock

124
00:05:54.439 --> 00:05:56.360
<v Speaker 2>information using the peaklocks view.

125
00:05:56.560 --> 00:05:59.439
<v Speaker 1>Okay, so lots to monitor. How about experimenting safely?

126
00:06:00.079 --> 00:06:03.040
<v Speaker 2>These source suggest using generate series to create lots of

127
00:06:03.079 --> 00:06:05.839
<v Speaker 2>fake data. You can do this in a separate experiment's database.

128
00:06:05.920 --> 00:06:08.839
<v Speaker 2>It's a great way to simulate production level load without

129
00:06:08.839 --> 00:06:10.399
<v Speaker 2>touching your real development data.

130
00:06:10.439 --> 00:06:11.000
<v Speaker 1>That's smart.

131
00:06:11.040 --> 00:06:14.399
<v Speaker 2>And here's something that often surprises people. Post grescool has

132
00:06:14.480 --> 00:06:15.800
<v Speaker 2>transactional DDL.

133
00:06:15.920 --> 00:06:18.519
<v Speaker 1>Transactional DDL data definition.

134
00:06:18.160 --> 00:06:20.519
<v Speaker 2>Language like create table, exactly.

135
00:06:20.160 --> 00:06:24.759
<v Speaker 1>Schema changes, create index, alter table, ad column. They happen

136
00:06:24.800 --> 00:06:27.920
<v Speaker 1>inside a transaction just like data changes. So you can

137
00:06:28.000 --> 00:06:31.879
<v Speaker 1>literally type begin, then create index mix on my table call,

138
00:06:32.040 --> 00:06:34.360
<v Speaker 1>then realize you made a mistake and type roll back

139
00:06:34.839 --> 00:06:37.240
<v Speaker 1>and that index just poof never happened.

140
00:06:37.319 --> 00:06:37.720
<v Speaker 3>WHOA.

141
00:06:37.759 --> 00:06:41.600
<v Speaker 2>Okay, that's huge for safety. No half applied schema changes.

142
00:06:41.720 --> 00:06:44.879
<v Speaker 1>It's an incredible safety net. It means your migrations either

143
00:06:44.920 --> 00:06:48.399
<v Speaker 1>succeed completely or fail completely, which brings us back to

144
00:06:48.439 --> 00:06:53.040
<v Speaker 1>safe experimentation. Always test schema changes in staging first, and

145
00:06:53.120 --> 00:06:56.160
<v Speaker 1>maybe even use read only database users in production for

146
00:06:56.199 --> 00:06:59.319
<v Speaker 1>certain monitoring tasks. Using roles like p grade old data

147
00:06:59.399 --> 00:07:01.920
<v Speaker 1>or pgmont just adds another layer of safety.

148
00:07:02.040 --> 00:07:05.040
<v Speaker 2>Makes sense, Okay, let's switch gears a bit. Data correctness

149
00:07:05.079 --> 00:07:08.399
<v Speaker 2>data consistency obviously super important in rails. We often reach

150
00:07:08.439 --> 00:07:11.800
<v Speaker 2>for active record validations, but the force argues pretty strongly

151
00:07:11.839 --> 00:07:14.839
<v Speaker 2>for using database level constraints too. What's the thinking there?

152
00:07:14.839 --> 00:07:19.240
<v Speaker 2>Are they redundant not redundant complementary? That's the key. Active

153
00:07:19.279 --> 00:07:23.240
<v Speaker 2>record validations are great, essential, even for catching errors early

154
00:07:23.279 --> 00:07:27.279
<v Speaker 2>at the application layer, providing good user feedback. But database

155
00:07:27.360 --> 00:07:32.720
<v Speaker 2>constraints offer stronger guarantees because they're enforced inside the database engine,

156
00:07:33.000 --> 00:07:36.959
<v Speaker 2>which is built specifically to handle high concurrency transactions and

157
00:07:37.079 --> 00:07:41.199
<v Speaker 2>maintain data isolation in ways an application layer just can't.

158
00:07:41.360 --> 00:07:44.279
<v Speaker 1>Okay, stronger guarantees. What kind of constraints are we talking

159
00:07:44.319 --> 00:07:46.959
<v Speaker 1>about beyond say, primary key or.

160
00:07:46.920 --> 00:07:48.720
<v Speaker 3>Not in all, Oh, there's a whole suite.

161
00:07:48.720 --> 00:07:51.639
<v Speaker 2>You've got unique constraints obviously, foreign key constraints to maintain

162
00:07:51.680 --> 00:07:55.839
<v Speaker 2>relationships between tables, check constraints for custom rules, and even

163
00:07:55.879 --> 00:07:57.839
<v Speaker 2>more advanced exclusion constraints.

164
00:07:58.000 --> 00:08:00.120
<v Speaker 1>Let's take unique If I want to add one, but

165
00:08:00.160 --> 00:08:02.399
<v Speaker 1>I already have duplicate data, what do I do?

166
00:08:02.800 --> 00:08:05.839
<v Speaker 2>Good question? You typically need to clean up that data first,

167
00:08:06.160 --> 00:08:09.240
<v Speaker 2>the source mentions using a common table expression a CTE

168
00:08:09.639 --> 00:08:12.240
<v Speaker 2>with the row number window function. It's a neat trick

169
00:08:12.240 --> 00:08:15.199
<v Speaker 2>to identify and then delete the duplicates before you apply

170
00:08:15.279 --> 00:08:16.480
<v Speaker 2>the unique constraint.

171
00:08:16.800 --> 00:08:21.000
<v Speaker 1>Okay, and for and keys. Rails didn't always support those natively,

172
00:08:21.120 --> 00:08:21.759
<v Speaker 1>did it right?

173
00:08:21.839 --> 00:08:24.920
<v Speaker 2>Native support landed in rails four point two Before that

174
00:08:24.959 --> 00:08:28.920
<v Speaker 2>you use gems. But yeah, they're fundamental. They ensure, for example,

175
00:08:28.920 --> 00:08:31.079
<v Speaker 2>that you can't delete a rider if they still have

176
00:08:31.120 --> 00:08:34.240
<v Speaker 2>associated trips in the database. Prevents orphaned records.

177
00:08:34.519 --> 00:08:37.799
<v Speaker 1>Got it? What about che check constraints you said, custom rules, Yeah,

178
00:08:37.840 --> 00:08:38.840
<v Speaker 1>they're really flexible.

179
00:08:39.039 --> 00:08:41.519
<v Speaker 2>Anything that evaluates the true or false, like you could

180
00:08:41.639 --> 00:08:44.159
<v Speaker 2>enforce that a trips table's completed timestam must always be

181
00:08:44.240 --> 00:08:47.240
<v Speaker 2>later than it's create debt timestap. Simple powerful rule.

182
00:08:47.399 --> 00:08:50.159
<v Speaker 1>Okay, that makes sense, But this raises a practical point.

183
00:08:50.200 --> 00:08:52.120
<v Speaker 1>How do you add a cheat check constraint like that

184
00:08:52.240 --> 00:08:55.039
<v Speaker 1>to a table that's already huge and getting hammered with traffic.

185
00:08:55.279 --> 00:08:57.600
<v Speaker 1>Wouldn't that lock it up while it checks millions of

186
00:08:57.639 --> 00:08:58.200
<v Speaker 1>old rows?

187
00:08:58.559 --> 00:09:02.519
<v Speaker 2>Exactly the problem, and there's an elegant solution. You do

188
00:09:02.559 --> 00:09:06.240
<v Speaker 2>it in two spelps using rails migrations. First you add

189
00:09:06.320 --> 00:09:12.279
<v Speaker 2>check constraint, but pass the option validated false. This tells postgraschool, okay,

190
00:09:12.399 --> 00:09:14.799
<v Speaker 2>enforce this rule for all new or updated rows from

191
00:09:14.799 --> 00:09:17.000
<v Speaker 2>now on, but don't check the old ones yet. That

192
00:09:17.039 --> 00:09:18.039
<v Speaker 2>part is super fast.

193
00:09:18.200 --> 00:09:19.360
<v Speaker 1>Ah, so it doesn't block.

194
00:09:19.679 --> 00:09:19.840
<v Speaker 3>Right.

195
00:09:20.240 --> 00:09:23.559
<v Speaker 2>Then, in a separate later migration you run validated at

196
00:09:23.639 --> 00:09:27.120
<v Speaker 2>check constraint for that same constraint. This tells postgres school, okay,

197
00:09:27.159 --> 00:09:29.879
<v Speaker 2>now go back and check all the existing rows. But

198
00:09:30.000 --> 00:09:32.480
<v Speaker 2>it does so without taking such a heavy lock. It

199
00:09:32.519 --> 00:09:33.679
<v Speaker 2>avoids that downtime.

200
00:09:34.200 --> 00:09:37.080
<v Speaker 1>Clever two steps. What about deferring constraints?

201
00:09:37.159 --> 00:09:41.000
<v Speaker 2>Yeah, deferable Initially deferred. You can apply this to unique

202
00:09:41.120 --> 00:09:45.279
<v Speaker 2>primary key, foreign key, and exclusion constraints. It means the

203
00:09:45.320 --> 00:09:48.120
<v Speaker 2>constraint check is postponed until the very end of the transaction.

204
00:09:48.600 --> 00:09:51.399
<v Speaker 2>Super useful for things like say, reordering items in a

205
00:09:51.399 --> 00:09:54.440
<v Speaker 2>list where each item needs a unique position. You might

206
00:09:54.480 --> 00:09:57.519
<v Speaker 2>temporarily have duplicate positions during the transaction while you swap

207
00:09:57.559 --> 00:09:59.360
<v Speaker 2>things around, but as long as it's fixed by the

208
00:09:59.399 --> 00:10:00.759
<v Speaker 2>time you commit, it's okay.

209
00:10:00.639 --> 00:10:04.039
<v Speaker 1>Interesting, okay. You also mentioned exclusion constraints. Those sound advanced.

210
00:10:04.240 --> 00:10:07.840
<v Speaker 2>They are powerful and less common, but solve specific problems

211
00:10:07.919 --> 00:10:11.600
<v Speaker 2>really well. They prevent overlapping data across multiple roads in

212
00:10:11.639 --> 00:10:15.240
<v Speaker 2>the same table. The classic example is preventing overlapping time

213
00:10:15.320 --> 00:10:18.799
<v Speaker 2>ranges like booking a meeting room, or, in ride SHARE's case,

214
00:10:19.000 --> 00:10:23.399
<v Speaker 2>maybe preventing overlapping vehicle reservations. They usually require an extension

215
00:10:23.480 --> 00:10:26.120
<v Speaker 2>like beat read just and often use range types like

216
00:10:26.200 --> 00:10:29.720
<v Speaker 2>TSTs range for timestamp ranges along with the overlap operator

217
00:10:29.759 --> 00:10:33.720
<v Speaker 2>at datcha okay, quick detour case in sensitive unique emails

218
00:10:33.799 --> 00:10:36.080
<v Speaker 2>common problem. How does postgress will handle that?

219
00:10:36.240 --> 00:10:37.039
<v Speaker 3>Two main ways?

220
00:10:37.080 --> 00:10:40.200
<v Speaker 2>Really You could use the site text extension, which provides

221
00:10:40.240 --> 00:10:44.120
<v Speaker 2>a case in sensitive text type, or you can use generated.

222
00:10:43.639 --> 00:10:46.360
<v Speaker 1>Columns generated columns like virtual columns sorted.

223
00:10:46.440 --> 00:10:49.440
<v Speaker 2>Yeah, you define a column that's automatically computed based on others,

224
00:10:49.639 --> 00:10:51.600
<v Speaker 2>so you could have a lower mail generated column that

225
00:10:51.639 --> 00:10:54.919
<v Speaker 2>always stores lower email. Then you put a regular unique

226
00:10:54.960 --> 00:10:58.639
<v Speaker 2>index on that generated column. Rails actually supports these now too.

227
00:10:58.720 --> 00:11:02.440
<v Speaker 1>Neat and quickly. And domains right, create.

228
00:11:02.159 --> 00:11:05.320
<v Speaker 2>Type of gas as enom lets you define a fixed

229
00:11:05.360 --> 00:11:10.200
<v Speaker 2>list of a loudstring values for a column like trip statuses, pending, active, completed.

230
00:11:10.679 --> 00:11:13.360
<v Speaker 2>Create domain lets you create a custom data type based

231
00:11:13.440 --> 00:11:16.679
<v Speaker 2>on an existing one, but add check constraints to it,

232
00:11:16.759 --> 00:11:20.480
<v Speaker 2>making reusable validation rules both useful, different trade offs.

233
00:11:20.519 --> 00:11:23.159
<v Speaker 1>Okay, this is great for ensuring data integrity, but let's

234
00:11:23.159 --> 00:11:26.320
<v Speaker 1>talk about actually changing the database schema on a busy

235
00:11:26.360 --> 00:11:30.159
<v Speaker 1>production system. That moment when you run RAILSDB dot migrate,

236
00:11:30.840 --> 00:11:31.759
<v Speaker 1>it can be terrifying.

237
00:11:31.840 --> 00:11:34.480
<v Speaker 2>Oh yeah, the dreaded migration lock exactly.

238
00:11:34.480 --> 00:11:37.559
<v Speaker 1>Some alter table operations, they take what's called an access

239
00:11:37.600 --> 00:11:40.600
<v Speaker 1>exclusive lock, right, and that just blocks everything reads rights.

240
00:11:40.639 --> 00:11:43.480
<v Speaker 1>Your app grinds to a halt. How do we avoid

241
00:11:43.519 --> 00:11:44.720
<v Speaker 1>that absolute nightmare?

242
00:11:44.879 --> 00:11:45.039
<v Speaker 3>Right?

243
00:11:45.039 --> 00:11:47.440
<v Speaker 2>That exclusive lock is the enemy on a busy system.

244
00:11:47.679 --> 00:11:50.840
<v Speaker 2>The absolute key here, Your best friend really is the

245
00:11:50.919 --> 00:11:54.639
<v Speaker 2>concurrently keyword for operations like create index or drop index.

246
00:11:54.960 --> 00:11:57.759
<v Speaker 2>Adding concurrently tells postgress will to do the work without

247
00:11:57.759 --> 00:12:01.200
<v Speaker 2>taking that heavy lock. It takes longer, more resources, but

248
00:12:01.240 --> 00:12:02.600
<v Speaker 2>your application stays online.

249
00:12:02.679 --> 00:12:03.360
<v Speaker 3>It's a life saver.

250
00:12:03.759 --> 00:12:07.440
<v Speaker 1>So create index concurrently, drop index concurrently. Are there other

251
00:12:07.480 --> 00:12:08.559
<v Speaker 1>ways to stay safe?

252
00:12:08.639 --> 00:12:09.159
<v Speaker 3>Definitely.

253
00:12:09.360 --> 00:12:12.879
<v Speaker 2>There's a fantastic Ruby gym called strong Migrations. You add

254
00:12:12.879 --> 00:12:15.960
<v Speaker 2>it to your development environment and it actively watches your migrations.

255
00:12:16.639 --> 00:12:19.519
<v Speaker 2>If it spots a potentially dangerous operation, one that would

256
00:12:19.519 --> 00:12:23.879
<v Speaker 2>take an access exclusive lock and likely cause downtime, it'll

257
00:12:23.919 --> 00:12:27.639
<v Speaker 2>either warn you, suggest a safer multi step alternative like

258
00:12:27.679 --> 00:12:30.879
<v Speaker 2>the chi check constrained example, or even prevent the migration

259
00:12:30.919 --> 00:12:32.519
<v Speaker 2>from running in production by default.

260
00:12:32.639 --> 00:12:36.039
<v Speaker 1>Oh wow, So it enforces safer practices during development exactly.

261
00:12:36.080 --> 00:12:39.080
<v Speaker 2>It catches things early beyond that. You need safeguards at

262
00:12:39.080 --> 00:12:42.679
<v Speaker 2>the database level too, especially with high concurrency. Setting a

263
00:12:42.720 --> 00:12:44.080
<v Speaker 2>lock time out is crucial.

264
00:12:44.240 --> 00:12:46.600
<v Speaker 1>Lock time out that limits how long a query waits

265
00:12:46.600 --> 00:12:47.080
<v Speaker 1>for a lock.

266
00:12:47.360 --> 00:12:50.480
<v Speaker 2>Precisely, if a query can't get the lock it needs

267
00:12:50.480 --> 00:12:55.120
<v Speaker 2>within say, fifty milliseconds, it gets canceled instead of just

268
00:12:55.120 --> 00:12:59.559
<v Speaker 2>sitting there waiting indefinitely and potentially holding up other processes. Similarly,

269
00:12:59.759 --> 00:13:02.120
<v Speaker 2>stay time out puts a cap on how long any

270
00:13:02.120 --> 00:13:05.600
<v Speaker 2>single sequel statement is allowed to run, prevents runaway queries

271
00:13:05.600 --> 00:13:09.399
<v Speaker 2>from hogging resources. The source also mentions enabling log lock

272
00:13:09.440 --> 00:13:12.639
<v Speaker 2>weights and tuning deadlock timeout for better visibility in your

273
00:13:12.679 --> 00:13:16.039
<v Speaker 2>logs when contention happens. Okay, timeouts are key. What about

274
00:13:16.080 --> 00:13:20.080
<v Speaker 2>removing columns? I've heard that can cause weird errors too. Ah, yes,

275
00:13:20.120 --> 00:13:22.840
<v Speaker 2>so the stale schema cache problem. This happens when you

276
00:13:22.960 --> 00:13:25.159
<v Speaker 2>drop a column, but some of your running Rails application

277
00:13:25.240 --> 00:13:27.639
<v Speaker 2>servers haven't picked up the schema change yet, they try

278
00:13:27.679 --> 00:13:29.600
<v Speaker 2>to query the column that no longer exists.

279
00:13:29.919 --> 00:13:30.879
<v Speaker 3>Boom error.

280
00:13:31.000 --> 00:13:32.840
<v Speaker 1>Right, So how do you remove a column safely?

281
00:13:33.200 --> 00:13:36.159
<v Speaker 2>The recommended way is using active record dot base dot

282
00:13:36.200 --> 00:13:40.200
<v Speaker 2>ignored columns. It's a multi step process. First, you add

283
00:13:40.200 --> 00:13:42.879
<v Speaker 2>the column name to ignored columns in your Rails model,

284
00:13:43.240 --> 00:13:47.120
<v Speaker 2>deploy that code. Now rail simply pretends the column doesn't exist,

285
00:13:47.440 --> 00:13:50.159
<v Speaker 2>even though it's still in the database. Then, once your

286
00:13:50.159 --> 00:13:52.360
<v Speaker 2>sure no code is using it, you create and run

287
00:13:52.399 --> 00:13:55.799
<v Speaker 2>a migration to actually remove column. Finally, you remove the

288
00:13:55.799 --> 00:13:58.679
<v Speaker 2>column name from ignored columns in a later deploy. It's

289
00:13:58.720 --> 00:14:00.320
<v Speaker 2>gradual and safe.

290
00:14:00.279 --> 00:14:04.120
<v Speaker 1>Makes sense gradual removal. Now this brings up another big one.

291
00:14:04.320 --> 00:14:06.879
<v Speaker 1>What if you add a new column and need to

292
00:14:06.919 --> 00:14:11.440
<v Speaker 1>populate it for millions of existing rows backfilling data without downtime?

293
00:14:11.519 --> 00:14:14.360
<v Speaker 2>Yeah, that's a classic challenge. Running a massive up date

294
00:14:14.440 --> 00:14:17.000
<v Speaker 2>statement is usually out of the question. Too slow, too

295
00:14:17.080 --> 00:14:20.919
<v Speaker 2>much locking. You need online backfilling strategies. One approach is

296
00:14:20.960 --> 00:14:24.440
<v Speaker 2>double writing or dual rights. You modify your application code

297
00:14:24.480 --> 00:14:26.840
<v Speaker 2>to write to both the old location if any, and

298
00:14:26.919 --> 00:14:29.679
<v Speaker 2>the new column simultaneously. You run a background job to

299
00:14:29.679 --> 00:14:32.039
<v Speaker 2>backfill a new column for old records, and once it's done,

300
00:14:32.080 --> 00:14:34.360
<v Speaker 2>you switch weeds to the new column and eventually remove

301
00:14:34.399 --> 00:14:36.200
<v Speaker 2>the dual right logic and the old column.

302
00:14:36.240 --> 00:14:38.279
<v Speaker 1>Okay, double writing any other ways.

303
00:14:38.440 --> 00:14:42.720
<v Speaker 2>Another technique involves using intermediate tables. You create a temporary table,

304
00:14:42.799 --> 00:14:45.240
<v Speaker 2>maybe just with the primary key in the new column value.

305
00:14:45.519 --> 00:14:48.080
<v Speaker 2>You populate that table, perhaps marking it U and lodgy

306
00:14:48.200 --> 00:14:51.639
<v Speaker 2>so it doesn't hit replication, maybe disabling autovacuum on it

307
00:14:51.639 --> 00:14:55.039
<v Speaker 2>temporarily for speed. Then you batch update the main table

308
00:14:55.080 --> 00:14:59.320
<v Speaker 2>from this intermediate table. And crucially, all these backfilling processes

309
00:14:59.360 --> 00:15:02.600
<v Speaker 2>need to be done in small, manageable batches, often with

310
00:15:02.679 --> 00:15:05.759
<v Speaker 2>some kind of throttling or delay between batches to avoid

311
00:15:05.879 --> 00:15:08.519
<v Speaker 2>overwhelming the database or causing replication.

312
00:15:08.159 --> 00:15:12.559
<v Speaker 1>Lack batching and throttling. Got it, Okay, let's shift focus

313
00:15:12.600 --> 00:15:16.360
<v Speaker 1>to active record itself. It's amazing. Rails convention over configuration

314
00:15:16.480 --> 00:15:20.480
<v Speaker 1>is great, but yeah, it can sometimes generate pretty inefficient

315
00:15:20.559 --> 00:15:22.240
<v Speaker 1>queries if you're not paying attention right.

316
00:15:22.399 --> 00:15:26.159
<v Speaker 2>Absolutely, the abstraction is powerful, but it can hide what's

317
00:15:26.200 --> 00:15:29.120
<v Speaker 2>actually happening. So connecting this to the bigger picture, it's

318
00:15:29.159 --> 00:15:32.080
<v Speaker 2>about making your rails app smarter in how it communicates

319
00:15:32.080 --> 00:15:32.759
<v Speaker 2>with postgress.

320
00:15:32.759 --> 00:15:35.519
<v Speaker 1>How do we even spot the bad queries easily?

321
00:15:35.840 --> 00:15:36.080
<v Speaker 3>Well?

322
00:15:36.200 --> 00:15:38.559
<v Speaker 2>One really helpful thing in Roll seven and later is

323
00:15:38.600 --> 00:15:42.320
<v Speaker 2>the improved query logs. They can automatically add context like

324
00:15:42.360 --> 00:15:45.200
<v Speaker 2>which controller an action trigger the query right into the

325
00:15:45.200 --> 00:15:48.320
<v Speaker 2>sql log output. Makes tracing a slow query back to

326
00:15:48.320 --> 00:15:50.559
<v Speaker 2>your application code much much easier.

327
00:15:50.600 --> 00:15:53.120
<v Speaker 1>That sounds useful. What's a common inefficiency pattern?

328
00:15:53.200 --> 00:15:53.360
<v Speaker 3>Oh?

329
00:15:53.360 --> 00:15:55.720
<v Speaker 2>The absolute classic is the M plus one query problem?

330
00:15:55.799 --> 00:15:56.639
<v Speaker 2>You see it everywhere?

331
00:15:56.720 --> 00:15:59.879
<v Speaker 1>Ah, Yes, fetch one thing, then loop and fetch relate

332
00:16:00.279 --> 00:16:01.120
<v Speaker 1>things one.

333
00:16:01.000 --> 00:16:04.720
<v Speaker 2>By one exactly like load one hundred blog posts and

334
00:16:04.759 --> 00:16:07.559
<v Speaker 2>then inside the loop for each post, run another query

335
00:16:07.600 --> 00:16:09.799
<v Speaker 2>to get its author. That's one hundred and one database

336
00:16:09.840 --> 00:16:13.000
<v Speaker 2>queries when it could probably be just two kills performance.

337
00:16:13.200 --> 00:16:15.440
<v Speaker 1>Right, So how do we fix N plus one?

338
00:16:15.639 --> 00:16:18.720
<v Speaker 2>The primary solution is eager loading. You tell active Record

339
00:16:18.799 --> 00:16:22.000
<v Speaker 2>upfront what associated data you'll need, use methods like dot

340
00:16:22.000 --> 00:16:26.039
<v Speaker 2>preload or dot cludes rails, then cleverly figures out how

341
00:16:26.039 --> 00:16:28.399
<v Speaker 2>to load all that data in a minimal number of queries,

342
00:16:28.480 --> 00:16:30.440
<v Speaker 2>usually just one extra query for each association.

343
00:16:30.559 --> 00:16:32.440
<v Speaker 1>Yeah preload and dot includes.

344
00:16:32.120 --> 00:16:35.080
<v Speaker 2>Got it and a quick tip if you've already eagerloaded

345
00:16:35.159 --> 00:16:38.799
<v Speaker 2>data into an array of objects, use dot size to

346
00:16:38.840 --> 00:16:41.879
<v Speaker 2>get the count dot dot count. Dot size works unloaded

347
00:16:41.960 --> 00:16:45.960
<v Speaker 2>array in memory, dot count might trigger another database query

348
00:16:46.039 --> 00:16:47.440
<v Speaker 2>unnecessarily good tip.

349
00:16:47.639 --> 00:16:49.440
<v Speaker 1>Any other ways to prevent N plus one?

350
00:16:49.720 --> 00:16:52.720
<v Speaker 2>Yeah, Rail six point one introduce strict loading. You can

351
00:16:52.840 --> 00:16:55.559
<v Speaker 2>enable it per association or globally. If you try to

352
00:16:55.600 --> 00:16:59.480
<v Speaker 2>access an association that wasn't explicitly eger loaded, it raises

353
00:16:59.480 --> 00:17:02.679
<v Speaker 2>an error instead of silently running the en plus one query.

354
00:17:02.759 --> 00:17:04.799
<v Speaker 2>It forces you to be explicit and prevents the problem

355
00:17:04.799 --> 00:17:05.319
<v Speaker 2>by default.

356
00:17:05.400 --> 00:17:07.680
<v Speaker 1>Ooh, I like that force the good behavior. What about

357
00:17:07.720 --> 00:17:09.279
<v Speaker 1>optimizing individual queries?

358
00:17:09.359 --> 00:17:11.799
<v Speaker 2>Simple things First, always use limit If you don't need

359
00:17:11.799 --> 00:17:14.119
<v Speaker 2>all possible results, don't pull back ten thousand rows if

360
00:17:14.160 --> 00:17:17.799
<v Speaker 2>you only display twenty. Also, the returning clause on insert

361
00:17:18.079 --> 00:17:21.160
<v Speaker 2>active records dot insertle method supports this. Now it lets

362
00:17:21.160 --> 00:17:23.240
<v Speaker 2>you get back the IDs or other columns of the

363
00:17:23.319 --> 00:17:26.839
<v Speaker 2>rose you just inserted without needing a separate select query afterwards.

364
00:17:27.079 --> 00:17:27.920
<v Speaker 2>Saves a round trip.

365
00:17:28.160 --> 00:17:30.759
<v Speaker 1>Nice and processing large amounts of data use.

366
00:17:30.599 --> 00:17:34.440
<v Speaker 2>Active records batching methods dot finbach or dot in batches.

367
00:17:34.839 --> 00:17:38.200
<v Speaker 2>They retrieve records in batches default one thousand, which keeps

368
00:17:38.240 --> 00:17:41.079
<v Speaker 2>memory usage low and avoids overwhelming the database or your

369
00:17:41.119 --> 00:17:45.480
<v Speaker 2>application with enormous result sets. Much more reliable for large tables.

370
00:17:45.640 --> 00:17:48.559
<v Speaker 1>Okay, batching is key. What about more complex SQL logic?

371
00:17:48.599 --> 00:17:49.799
<v Speaker 1>Does Active record help there?

372
00:17:49.880 --> 00:17:54.200
<v Speaker 2>It does increasingly so. Active record has good support for subqueries.

373
00:17:54.240 --> 00:17:57.279
<v Speaker 2>Now you can construct queries where part of the ware clause,

374
00:17:57.319 --> 00:18:01.079
<v Speaker 2>for instance, is itself another SQL query, useful for things

375
00:18:01.119 --> 00:18:03.400
<v Speaker 2>like finding drivers who have completed more trips than the

376
00:18:03.440 --> 00:18:05.279
<v Speaker 2>overall average number of trips per driver.

377
00:18:05.480 --> 00:18:10.039
<v Speaker 1>Okay, subqueries. I've also heard about CT's common table expressions.

378
00:18:10.119 --> 00:18:14.720
<v Speaker 2>Yes, CTEs using the with keyword in SQL are fantastic

379
00:18:14.759 --> 00:18:17.240
<v Speaker 2>for complex queries. They'll let you break down a big,

380
00:18:17.279 --> 00:18:20.759
<v Speaker 2>hairy query into smaller named logical steps. It makes the

381
00:18:20.759 --> 00:18:24.279
<v Speaker 2>sequel vastly more readable and maintainable. Active record has ways

382
00:18:24.279 --> 00:18:26.240
<v Speaker 2>to build queries using CTEs.

383
00:18:25.839 --> 00:18:29.000
<v Speaker 1>Too, so better organization. What about views?

384
00:18:29.559 --> 00:18:33.440
<v Speaker 2>Database views are another way to encapsulate complex sequel. You

385
00:18:33.519 --> 00:18:36.759
<v Speaker 2>define the query logic as a view directly imposed cresscol

386
00:18:37.359 --> 00:18:39.359
<v Speaker 2>and then you can query it from rails, almost like

387
00:18:39.400 --> 00:18:43.440
<v Speaker 2>a regular table. The scenic gem is very popular for

388
00:18:43.519 --> 00:18:46.200
<v Speaker 2>managing database views within your rails migrations.

389
00:18:46.319 --> 00:18:50.160
<v Speaker 1>Okay, views and materialized views. What's the difference?

390
00:18:50.240 --> 00:18:52.920
<v Speaker 2>Ah, Now, this is where it gets really interesting. Materialized

391
00:18:52.960 --> 00:18:55.559
<v Speaker 2>views take it a step further. They don't just store

392
00:18:55.599 --> 00:18:58.960
<v Speaker 2>the query definition. They actually execute the query and store

393
00:18:59.000 --> 00:19:01.160
<v Speaker 2>the results physically like a cash table.

394
00:19:01.519 --> 00:19:03.680
<v Speaker 1>So queries against them are super fast.

395
00:19:03.440 --> 00:19:07.160
<v Speaker 2>Lightened fast because the complex calculation or joint is already done.

396
00:19:07.279 --> 00:19:10.599
<v Speaker 2>They're perfect for complex reports or dashboard data that doesn't

397
00:19:10.680 --> 00:19:12.200
<v Speaker 2>need to be absolutely real time.

398
00:19:12.559 --> 00:19:13.759
<v Speaker 3>And the cool part you.

399
00:19:13.680 --> 00:19:16.920
<v Speaker 2>Can often refresh them can currently update the stored data

400
00:19:16.920 --> 00:19:20.359
<v Speaker 2>without locking readers. Provided the materialized view as a unique

401
00:19:20.359 --> 00:19:24.519
<v Speaker 2>index defined on it, zero downtime updates for your cash data.

402
00:19:24.559 --> 00:19:28.480
<v Speaker 1>Wow, concurrent refresh that's powerful. What about caching within rails itself?

403
00:19:28.720 --> 00:19:28.920
<v Speaker 3>Right.

404
00:19:29.039 --> 00:19:32.200
<v Speaker 2>Rails has layers two. The query cash is automatic within

405
00:19:32.240 --> 00:19:35.440
<v Speaker 2>a single controller action. If you run the exact same

406
00:19:35.519 --> 00:19:39.000
<v Speaker 2>sequel query twice in one request, RAILS cash is the

407
00:19:39.000 --> 00:19:41.480
<v Speaker 2>result of the first call and returns it instantly the

408
00:19:41.519 --> 00:19:44.319
<v Speaker 2>second time, avoiding a redundant database.

409
00:19:44.000 --> 00:19:45.599
<v Speaker 1>Hit okay automatic per request.

410
00:19:45.839 --> 00:19:49.680
<v Speaker 2>Then there are prepared statements postcresscool can parse a query

411
00:19:49.720 --> 00:19:53.559
<v Speaker 2>plan once and reuse it for subsequent executions with different parameters.

412
00:19:54.119 --> 00:19:57.400
<v Speaker 2>RAILS uses these under the hood. Rail seven improved how

413
00:19:57.400 --> 00:20:00.880
<v Speaker 2>it handles them, making reuse more likely by numerrating columns,

414
00:20:00.920 --> 00:20:03.279
<v Speaker 2>which helps reduce parsing overhead.

415
00:20:02.880 --> 00:20:03.759
<v Speaker 1>And counter cases.

416
00:20:03.880 --> 00:20:06.599
<v Speaker 2>Counter cases are a common pattern, not strictly a RAILS feature,

417
00:20:06.640 --> 00:20:09.839
<v Speaker 2>but supported by it. You add an integer calumn to

418
00:20:09.880 --> 00:20:12.079
<v Speaker 2>a model, say trips count on the user model. Then

419
00:20:12.119 --> 00:20:14.680
<v Speaker 2>you can figure the trip model to automatically increment or

420
00:20:14.720 --> 00:20:17.799
<v Speaker 2>decrement that counter whenever a trip is created or destroyed

421
00:20:17.839 --> 00:20:18.119
<v Speaker 2>for that.

422
00:20:18.160 --> 00:20:21.400
<v Speaker 1>User, so you get counts almost instantly without a slow query.

423
00:20:21.880 --> 00:20:24.519
<v Speaker 2>Exactly reading the count is just reading it into your column.

424
00:20:25.200 --> 00:20:27.680
<v Speaker 2>The tradeoff is slightly increased right latency, you have to

425
00:20:27.759 --> 00:20:31.160
<v Speaker 2>update the counter, and potential for slight inconsistencies if things

426
00:20:31.200 --> 00:20:34.200
<v Speaker 2>go wrong but they're often a huge performance win for

427
00:20:34.279 --> 00:20:35.559
<v Speaker 2>frequently needed counts.

428
00:20:36.160 --> 00:20:39.599
<v Speaker 1>Makes sense. What about calculations like averages or sums.

429
00:20:39.880 --> 00:20:42.480
<v Speaker 2>Do them in the database. Don't pull a million numbers

430
00:20:42.519 --> 00:20:45.720
<v Speaker 2>back to Ruby just to calculate the average. Use CQL

431
00:20:45.759 --> 00:20:50.880
<v Speaker 2>aggregate functions like avg sum countet max min. Active record

432
00:20:50.880 --> 00:20:54.039
<v Speaker 2>provides methods like dot average sum, et cetera that generate

433
00:20:54.039 --> 00:20:57.200
<v Speaker 2>the correct sequel. It drastically reduces data transfer and Ruby

434
00:20:57.240 --> 00:20:58.480
<v Speaker 2>object allocation.

435
00:20:58.319 --> 00:21:00.720
<v Speaker 1>Right less object delocation is that a big deal?

436
00:21:00.880 --> 00:21:03.599
<v Speaker 2>It can be, especially in tight loops or high throughput scenarios.

437
00:21:04.119 --> 00:21:08.359
<v Speaker 2>Every active record object RAILS creates consumes memory and CPU cycles.

438
00:21:08.839 --> 00:21:10.160
<v Speaker 3>If you only need raw.

439
00:21:10.039 --> 00:21:12.759
<v Speaker 2>Data maybe for a report or an export, Using lower

440
00:21:12.839 --> 00:21:15.640
<v Speaker 2>level methods like dot find by Soukel or even active

441
00:21:15.680 --> 00:21:19.200
<v Speaker 2>record dot base, dot connection, dot execute can be significantly faster.

442
00:21:19.680 --> 00:21:22.559
<v Speaker 2>There were dren simpler data structures, hashes or rays instead

443
00:21:22.559 --> 00:21:26.359
<v Speaker 2>of full blown active record objects, dramatically reducing that allocation overhead.

444
00:21:26.440 --> 00:21:29.519
<v Speaker 1>Okay, lots of ways to optimize within RAILS, but let's

445
00:21:29.519 --> 00:21:33.759
<v Speaker 1>go deeper into the database itself. When a query hits postgriss, well,

446
00:21:33.839 --> 00:21:36.119
<v Speaker 1>how do we see what it's really doing? And how

447
00:21:36.119 --> 00:21:38.240
<v Speaker 1>do we choose the right index to speed it up.

448
00:21:38.480 --> 00:21:42.119
<v Speaker 2>Right now we're getting into database level observability. A really

449
00:21:42.119 --> 00:21:45.799
<v Speaker 2>crucial tool here is the pgstat statements extension we mentioned earlier.

450
00:21:46.119 --> 00:21:48.519
<v Speaker 2>You enable it in postcresq al dot COF.

451
00:21:48.759 --> 00:21:49.400
<v Speaker 1>What is it track?

452
00:21:49.720 --> 00:21:53.200
<v Speaker 2>It tracks execution statistics for every normalized query run against

453
00:21:53.200 --> 00:21:56.599
<v Speaker 2>your database, total time spent, average time, how many times

454
00:21:56.640 --> 00:21:59.920
<v Speaker 2>it was called, rose returned, et cetera. It's invaluable for

455
00:22:00.079 --> 00:22:03.920
<v Speaker 2>finding your most expensive queries globally, the ones consuming the

456
00:22:03.920 --> 00:22:05.839
<v Speaker 2>most database time overall, so you can.

457
00:22:05.720 --> 00:22:08.720
<v Speaker 1>See the biggest hitters across the whole application exactly.

458
00:22:08.799 --> 00:22:12.440
<v Speaker 2>It helps you prioritize optimization efforts for a more visual approach.

459
00:22:12.559 --> 00:22:15.440
<v Speaker 2>Pg hero is a great dashboard tool. Many teams use it.

460
00:22:15.440 --> 00:22:17.319
<v Speaker 2>It gives you a web UI to see slow queries.

461
00:22:17.359 --> 00:22:20.519
<v Speaker 2>It can analyze query plans using explain, show you large

462
00:22:20.519 --> 00:22:24.920
<v Speaker 2>tables indexes, find unused indexes, and even show currently running queries.

463
00:22:24.960 --> 00:22:28.240
<v Speaker 1>Pg Hero Okay, but the core tool is explain right.

464
00:22:28.519 --> 00:22:30.640
<v Speaker 3>Explain is fundamental running.

465
00:22:30.720 --> 00:22:33.680
<v Speaker 2>Explain before a query shows you post Grescol's plan for

466
00:22:33.720 --> 00:22:37.039
<v Speaker 2>executing it. Adding Analyze actually runs the query and shows

467
00:22:37.079 --> 00:22:39.880
<v Speaker 2>you the plan along with the actual time spent and

468
00:22:40.000 --> 00:22:40.960
<v Speaker 2>rose returned at.

469
00:22:40.839 --> 00:22:44.160
<v Speaker 1>Each step, so explain analyze gives you the real picture.

470
00:22:44.319 --> 00:22:47.000
<v Speaker 2>Yes, you can also add buffers to see memory usage

471
00:22:47.119 --> 00:22:50.920
<v Speaker 2>like cash hits and format YAMEL or JSON for easier parsing.

472
00:22:51.319 --> 00:22:53.680
<v Speaker 2>It shows you things like whether it's doing a sequential

473
00:22:53.720 --> 00:22:56.440
<v Speaker 2>scan reading the whole table usually bad, or an index

474
00:22:56.480 --> 00:23:00.000
<v Speaker 2>scan or ideally an index only scan. It shows estimated costs,

475
00:23:00.440 --> 00:23:03.839
<v Speaker 2>filter conditions. It's how you diagnose a slow query.

476
00:23:03.599 --> 00:23:05.960
<v Speaker 1>Which leads to the question how do you make sure

477
00:23:06.000 --> 00:23:08.680
<v Speaker 1>Postgress uses the index you think it should or figure

478
00:23:08.720 --> 00:23:09.759
<v Speaker 1>out if you're missing one.

479
00:23:09.960 --> 00:23:13.039
<v Speaker 2>Finding missing indexes often starts with looking for those slow

480
00:23:13.119 --> 00:23:16.920
<v Speaker 2>sequential scans and explain plans or MPG stats statements. Data

481
00:23:17.079 --> 00:23:20.279
<v Speaker 2>pg Hero also has a suggested indexes feature based on

482
00:23:20.319 --> 00:23:23.559
<v Speaker 2>its analysis the rail's best practices. Jim can sometimes flag

483
00:23:23.599 --> 00:23:25.000
<v Speaker 2>missing foreign key indexes too.

484
00:23:25.200 --> 00:23:28.680
<v Speaker 1>Can you make Postgress logs slow queries automatically YEP.

485
00:23:28.960 --> 00:23:31.920
<v Speaker 2>Set log minduration statement in postcressql dot com.

486
00:23:32.000 --> 00:23:32.200
<v Speaker 1>Yeah.

487
00:23:32.400 --> 00:23:35.799
<v Speaker 2>Any query taking longer than say, five or milliseconds gets

488
00:23:35.799 --> 00:23:39.240
<v Speaker 2>logged automatically. Even better, you can use the auto explain extension,

489
00:23:39.359 --> 00:23:42.640
<v Speaker 2>which automatically logs the explain plan for slow queries so

490
00:23:42.680 --> 00:23:45.440
<v Speaker 2>you see exactly why it was slow right in your logs.

491
00:23:45.160 --> 00:23:48.799
<v Speaker 1>Auto explain Nice. Okay, let's talk index type, single column,

492
00:23:48.880 --> 00:23:50.519
<v Speaker 1>multi column what's the strategy.

493
00:23:50.720 --> 00:23:54.000
<v Speaker 2>Well, a multi column index on ABC can be used

494
00:23:54.000 --> 00:23:56.480
<v Speaker 2>for queries filtering on A or A and B or A,

495
00:23:56.599 --> 00:23:59.200
<v Speaker 2>B and C, but generally won't be used efficiently if you.

496
00:23:59.160 --> 00:24:00.359
<v Speaker 3>Only filter on or C.

497
00:24:00.640 --> 00:24:02.599
<v Speaker 2>It's left to right. You need to think about your

498
00:24:02.680 --> 00:24:06.440
<v Speaker 2>query patterns. Sometimes multiple single column indexes are better. Sometimes

499
00:24:06.480 --> 00:24:09.519
<v Speaker 2>a carefully ordered multi column indexes key. You also want

500
00:24:09.519 --> 00:24:11.960
<v Speaker 2>to avoid redundant indexes, like having an index on A

501
00:24:12.200 --> 00:24:14.440
<v Speaker 2>and another on AB. The first one might be unnecessary.

502
00:24:14.480 --> 00:24:18.119
<v Speaker 1>Okay, what about indexing on like a lowercased email.

503
00:24:18.240 --> 00:24:21.400
<v Speaker 2>That's indexes on expressions. You can literally do create unique

504
00:24:21.400 --> 00:24:25.200
<v Speaker 2>index on users lower email. The index stores the result

505
00:24:25.240 --> 00:24:27.839
<v Speaker 2>of the lower function, allowing fast case in sensitive lookups

506
00:24:27.920 --> 00:24:28.839
<v Speaker 2>or uniqueness checks.

507
00:24:29.000 --> 00:24:32.160
<v Speaker 1>Cool now, GIM indexes those sound different.

508
00:24:32.480 --> 00:24:32.680
<v Speaker 3>Yeah.

509
00:24:32.759 --> 00:24:35.640
<v Speaker 2>GN stands for generalized inverted index and this is where

510
00:24:35.640 --> 00:24:39.599
<v Speaker 2>things get really interesting. For certain data types. GN indexes

511
00:24:39.640 --> 00:24:42.960
<v Speaker 2>are optimized for indexing columns that contain multiple values think

512
00:24:43.039 --> 00:24:45.279
<v Speaker 2>arrays or especially JSMB columns.

513
00:24:45.359 --> 00:24:47.880
<v Speaker 1>So you can efficiently query inside a JSMB.

514
00:24:47.680 --> 00:24:50.519
<v Speaker 2>Column exactly with a gene index on a json B

515
00:24:50.640 --> 00:24:53.720
<v Speaker 2>column holding, say, trip details, you could very quickly find

516
00:24:53.759 --> 00:24:56.920
<v Speaker 2>all trips where metadata bags and trunk is two, or

517
00:24:56.960 --> 00:25:01.319
<v Speaker 2>where metadata by water offered is true. Without a gen index,

518
00:25:01.599 --> 00:25:04.640
<v Speaker 2>searching inside json B is usually a slow, sequential scan.

519
00:25:05.319 --> 00:25:08.839
<v Speaker 2>Gn is also used by extensions like PGTRGM for fast

520
00:25:08.960 --> 00:25:12.920
<v Speaker 2>text similarity searches, finding Henry when someone types Henrietta, for example,

521
00:25:13.240 --> 00:25:15.160
<v Speaker 2>really powerful for fuzzy searching.

522
00:25:14.839 --> 00:25:18.680
<v Speaker 1>Wow, okay, jsonb's searching and fuzzy text. What are partial indexes?

523
00:25:18.920 --> 00:25:21.440
<v Speaker 2>Partial indexes are indexes that have a wear clause in

524
00:25:21.480 --> 00:25:24.440
<v Speaker 2>their definition. This means the index only includes rows that

525
00:25:24.480 --> 00:25:25.400
<v Speaker 2>match that wear clock.

526
00:25:25.480 --> 00:25:26.799
<v Speaker 1>Why would you too that to.

527
00:25:26.680 --> 00:25:30.000
<v Speaker 2>Make the index much smaller and more efficient. A classic

528
00:25:30.039 --> 00:25:34.359
<v Speaker 2>example is indexing soft deleted records. Create index on users

529
00:25:34.680 --> 00:25:37.119
<v Speaker 2>we're deleted at is not null. If only one percent

530
00:25:37.160 --> 00:25:39.920
<v Speaker 2>of your users are deleted, this index is tiny compared

531
00:25:39.960 --> 00:25:43.319
<v Speaker 2>to indexing all users. It's perfect for queries that specifically

532
00:25:43.359 --> 00:25:44.839
<v Speaker 2>target that subset of rows.

533
00:25:45.079 --> 00:25:50.440
<v Speaker 1>Smaller index faster queries for that subset, smart and brin indexes.

534
00:25:50.079 --> 00:25:53.480
<v Speaker 2>Brin block range index. Now, connecting this to the bigger picture,

535
00:25:53.839 --> 00:25:57.240
<v Speaker 2>think about very large tables may pen only tables like

536
00:25:57.279 --> 00:26:00.480
<v Speaker 2>event logs, where data is inserted in order off correlated

537
00:26:00.519 --> 00:26:03.519
<v Speaker 2>with the timestamp. Light created app for these tables. Britt

538
00:26:03.559 --> 00:26:06.640
<v Speaker 2>indexes can offer performance similar to a standard B tree

539
00:26:06.680 --> 00:26:10.000
<v Speaker 2>index for range queries like finding events between two timestamps,

540
00:26:10.240 --> 00:26:13.079
<v Speaker 2>but they're drastically smaller. We're talking maybe forty kilobytes versus

541
00:26:13.079 --> 00:26:15.079
<v Speaker 2>two hundred megabytes for the same data range.

542
00:26:15.160 --> 00:26:19.039
<v Speaker 1>It's huge space saving tiny indexes for ordered data. Interesting

543
00:26:19.079 --> 00:26:20.759
<v Speaker 1>do indexes help with order by? Why?

544
00:26:21.000 --> 00:26:21.640
<v Speaker 3>Absolutely?

545
00:26:22.039 --> 00:26:24.359
<v Speaker 2>If you have an index on the columns you're ordering by,

546
00:26:24.680 --> 00:26:27.599
<v Speaker 2>postgrescool can often just read the data directly from the

547
00:26:27.640 --> 00:26:31.440
<v Speaker 2>index in the correct order, avoiding a costly sorting step.

548
00:26:31.920 --> 00:26:34.759
<v Speaker 2>This includes handling a null's last or nulls first efficiently

549
00:26:34.799 --> 00:26:35.559
<v Speaker 2>if the index is.

550
00:26:35.480 --> 00:26:38.599
<v Speaker 1>Defined correctly, and covering indexes covering indexes.

551
00:26:38.720 --> 00:26:42.359
<v Speaker 2>This uses the inclue gag keyword added in post crescool eleven.

552
00:26:42.519 --> 00:26:44.799
<v Speaker 2>It lets you add extra columns to an index just

553
00:26:44.839 --> 00:26:47.880
<v Speaker 2>for their data, not for searching or ordering. The benefit

554
00:26:48.319 --> 00:26:50.599
<v Speaker 2>if a query can get all columns it needs both

555
00:26:50.640 --> 00:26:53.200
<v Speaker 2>for filtering, ordering and for the seleclist directly from the

556
00:26:53.240 --> 00:26:56.839
<v Speaker 2>index itself. Postgrescool can perform an index only scan. It

557
00:26:56.880 --> 00:26:58.799
<v Speaker 2>never even has to visit the main table heap that

558
00:26:58.839 --> 00:27:01.400
<v Speaker 2>avoids a whole layer of IO and can be a massive.

559
00:27:01.039 --> 00:27:04.759
<v Speaker 1>Performance boost index only scans got it. Okay, So we've

560
00:27:04.759 --> 00:27:09.319
<v Speaker 1>optimized queries, picked good indexes, but databases need ongoing care

561
00:27:09.559 --> 00:27:13.720
<v Speaker 1>right like cars needing oil changes. The source talks about vacuum, analyze,

562
00:27:13.759 --> 00:27:15.640
<v Speaker 1>re index the VR maintenance tasks.

563
00:27:15.720 --> 00:27:18.759
<v Speaker 2>Yeah, absolutely vital, and it loops back to how Postgres

564
00:27:18.839 --> 00:27:24.200
<v Speaker 2>school handles changes that NVCC system multiversion concurrency control. When

565
00:27:24.200 --> 00:27:27.640
<v Speaker 2>you update or delete a row, postgriss doesn't immediately overwrite

566
00:27:27.759 --> 00:27:30.640
<v Speaker 2>or remove the old data. It marks the old row

567
00:27:30.759 --> 00:27:34.359
<v Speaker 2>version as invisible to new transactions, but the physical space

568
00:27:34.440 --> 00:27:37.720
<v Speaker 2>isn't reclaimed right away. These invisible rows are called dead.

569
00:27:37.480 --> 00:27:40.839
<v Speaker 1>Toples, and those dead tuples are bloat exactly.

570
00:27:41.200 --> 00:27:43.680
<v Speaker 2>They take up space in your tables and indexes, making

571
00:27:43.680 --> 00:27:45.839
<v Speaker 2>them larger than they need to be, which can slow

572
00:27:45.880 --> 00:27:48.079
<v Speaker 2>down queries. This is where vacuum comes in.

573
00:27:48.200 --> 00:27:50.480
<v Speaker 1>So vacuum cleans up the dead tupules.

574
00:27:50.559 --> 00:27:53.240
<v Speaker 2>Yes, and the really neat thing is auto vacuum, which

575
00:27:53.240 --> 00:27:57.039
<v Speaker 2>is postgres schools built in background process that does this automatically.

576
00:27:57.400 --> 00:28:00.880
<v Speaker 2>It monitors tables for changes and runs vvace to reclaim

577
00:28:00.920 --> 00:28:04.480
<v Speaker 2>space and analyze an ed SHEBSI to update statistics for

578
00:28:04.519 --> 00:28:05.960
<v Speaker 2>the query planner periodically.

579
00:28:06.039 --> 00:28:07.640
<v Speaker 1>Be said, it's often too conservative.

580
00:28:07.880 --> 00:28:11.240
<v Speaker 2>Yeah, The default auto vacuum settings are often tuned for safety,

581
00:28:11.359 --> 00:28:15.279
<v Speaker 2>not aggressiveness. On busy databases with high right rates, lots

582
00:28:15.279 --> 00:28:18.160
<v Speaker 2>of updates and deletes, the default hittings might not keep

583
00:28:18.200 --> 00:28:21.720
<v Speaker 2>up and blow can accumulate. You can, and often should,

584
00:28:22.039 --> 00:28:25.680
<v Speaker 2>tune parameters like auto vacuum vacuum cost limit how much

585
00:28:25.680 --> 00:28:29.680
<v Speaker 2>work it does per run and vacuum vacuum cost delay

586
00:28:29.839 --> 00:28:33.319
<v Speaker 2>how long it pauses per tables or globally to make

587
00:28:33.359 --> 00:28:35.640
<v Speaker 2>it run more often or more aggressively to keep bloat

588
00:28:35.720 --> 00:28:36.319
<v Speaker 2>under control.

589
00:28:36.559 --> 00:28:39.440
<v Speaker 1>Okay, tune auto vacuum. What about indexes? Do they get

590
00:28:39.440 --> 00:28:40.680
<v Speaker 1>bloated to They do?

591
00:28:41.000 --> 00:28:44.759
<v Speaker 2>And indexes can become fragmented or inefficient over time for

592
00:28:44.839 --> 00:28:48.839
<v Speaker 2>rebuilding them. The best practice now is re index concurrently.

593
00:28:48.599 --> 00:28:51.200
<v Speaker 1>Ah concurrently again, so no downtime right.

594
00:28:51.319 --> 00:28:54.279
<v Speaker 2>Introduced in POSTGRESSCULL thirteen, it rebuilds the index in the

595
00:28:54.319 --> 00:28:57.599
<v Speaker 2>background without blocking reads or rights to the table. Once

596
00:28:57.640 --> 00:28:59.960
<v Speaker 2>the new index is ready, it swaps it in atomically.

597
00:29:00.279 --> 00:29:03.160
<v Speaker 2>For older Postgres versions, the Prakic extension can do something

598
00:29:03.200 --> 00:29:05.519
<v Speaker 2>similar for both tables and indexes.

599
00:29:05.119 --> 00:29:08.240
<v Speaker 1>So mostly rely on autovacuum and renex Concurrently, do you

600
00:29:08.279 --> 00:29:09.599
<v Speaker 1>ever need manual vacuum.

601
00:29:09.599 --> 00:29:13.720
<v Speaker 2>Sometimes you might run vacuum analyze your pable manually after

602
00:29:13.759 --> 00:29:16.279
<v Speaker 2>a huge data loaded deletion, just to make sure stats

603
00:29:16.319 --> 00:29:19.960
<v Speaker 2>are up to date immediately. There's also a vacuum ver

604
00:29:20.000 --> 00:29:23.200
<v Speaker 2>both option to see what it's doing, and since Postgres

605
00:29:23.240 --> 00:29:27.200
<v Speaker 2>twelve vacuum skip blocked is useful to make sure your

606
00:29:27.200 --> 00:29:29.519
<v Speaker 2>manual vacuum doesn't get stuck waiting for a lock on

607
00:29:29.559 --> 00:29:30.000
<v Speaker 2>the table.

608
00:29:30.279 --> 00:29:31.640
<v Speaker 1>Is there a way to see the bloat?

609
00:29:32.079 --> 00:29:34.559
<v Speaker 2>Yeah, you can actually simulate it locally for learning, disable

610
00:29:34.599 --> 00:29:36.680
<v Speaker 2>auto vacuum on a test table or on a ton

611
00:29:36.720 --> 00:29:38.759
<v Speaker 2>of you P dates and then use SQL queries you

612
00:29:38.759 --> 00:29:41.200
<v Speaker 2>can find them online or tools like pg hero to

613
00:29:41.359 --> 00:29:44.079
<v Speaker 2>estimate the table and index bloat. Then you could run

614
00:29:44.160 --> 00:29:46.839
<v Speaker 2>vaco full well, which does lock the table heavily, so

615
00:29:46.880 --> 00:29:48.920
<v Speaker 2>don't run it on production just to see it reclaim

616
00:29:48.960 --> 00:29:51.240
<v Speaker 2>the space aggressively. It helps build intuition.

617
00:29:51.400 --> 00:29:54.079
<v Speaker 1>Good for understanding. You mentioned unused indexes earlier.

618
00:29:54.200 --> 00:29:57.920
<v Speaker 2>Yes, crucial maintenance indexes aren't free, They take up disc

619
00:29:57.960 --> 00:30:01.519
<v Speaker 2>space and more importantly, they slow down rights insert update

620
00:30:01.519 --> 00:30:04.400
<v Speaker 2>to lead because the index has to be updated too regularly.

621
00:30:04.400 --> 00:30:08.039
<v Speaker 2>Identifying and removing unused or redundant overlapping indexes is a

622
00:30:08.119 --> 00:30:11.599
<v Speaker 2>key optimization. Tools like pg hero are great for finding

623
00:30:11.640 --> 00:30:14.640
<v Speaker 2>these based on usage statistics tracked by postcress, so.

624
00:30:14.720 --> 00:30:18.720
<v Speaker 1>Clean up unused indexes. How can we schedule these maintenance tasks?

625
00:30:18.839 --> 00:30:21.759
<v Speaker 2>The pupg chron extension is super handy for this. It's

626
00:30:21.799 --> 00:30:24.839
<v Speaker 2>a scheduler that runs inside postgress role itself. You can

627
00:30:24.920 --> 00:30:28.039
<v Speaker 2>use it to schedule recurring SQEL commands like running a

628
00:30:28.039 --> 00:30:31.839
<v Speaker 2>manual vacuum analyze on specific busy tables every night, or

629
00:30:31.880 --> 00:30:35.559
<v Speaker 2>maybe rebuilding certain indexes periodically with index.

630
00:30:35.200 --> 00:30:39.039
<v Speaker 1>Concurrently pgcra cron for scheduling any other quick maintenance tools.

631
00:30:39.200 --> 00:30:42.759
<v Speaker 2>The rails pg extras gem provides some convenient RAKE tasks

632
00:30:42.799 --> 00:30:46.680
<v Speaker 2>for common checks right from your railvapp, things like checking

633
00:30:46.759 --> 00:30:50.920
<v Speaker 2>database cash hit rates, finding long running queries, diagnosing lock contention,

634
00:30:51.240 --> 00:30:54.720
<v Speaker 2>and identifying null indexes. Null indexes yeah indexes where a

635
00:30:54.799 --> 00:30:57.519
<v Speaker 2>very high percentage of the entries are actually NLLL. Depending

636
00:30:57.599 --> 00:30:59.799
<v Speaker 2>on your queries, these might be candidates for conversion into

637
00:30:59.799 --> 00:31:03.599
<v Speaker 2>some more efficient partial indexes using a were column is

638
00:31:03.640 --> 00:31:05.359
<v Speaker 2>not NL class got.

639
00:31:05.079 --> 00:31:09.400
<v Speaker 1>It okay, final big area scaling, more users, more traffic.

640
00:31:09.960 --> 00:31:13.200
<v Speaker 1>This often leads to problems with database connections, right hitting limits.

641
00:31:13.480 --> 00:31:16.920
<v Speaker 2>Yeah, that's a common scaling bottleneck. Postgress will has a

642
00:31:16.960 --> 00:31:20.960
<v Speaker 2>hard limit on simultaneous connections set by the max connections parameter.

643
00:31:21.400 --> 00:31:23.960
<v Speaker 2>If your application tries to open more connections than that

644
00:31:24.000 --> 00:31:28.039
<v Speaker 2>limit allows, you get the dreaded fatal Sorry too many

645
00:31:28.079 --> 00:31:29.160
<v Speaker 2>clients already error?

646
00:31:29.680 --> 00:31:32.839
<v Speaker 1>Ugh yeah, not fun. How do you deal with that?

647
00:31:33.160 --> 00:31:34.480
<v Speaker 3>Well, first you need visibility.

648
00:31:34.880 --> 00:31:37.720
<v Speaker 2>Use pgstat activity again to see how many connections are open,

649
00:31:37.720 --> 00:31:40.359
<v Speaker 2>in what state they're in. Are they active, idle or

650
00:31:40.359 --> 00:31:43.599
<v Speaker 2>worryingly idle in transaction meaning a connection is holding a

651
00:31:43.640 --> 00:31:47.440
<v Speaker 2>transaction open but isn't doing any work. Potentially holding locks.

652
00:31:47.559 --> 00:31:49.319
<v Speaker 1>Idle in transaction is bad.

653
00:31:49.319 --> 00:31:49.799
<v Speaker 3>Very bad.

654
00:31:50.079 --> 00:31:52.640
<v Speaker 2>You need to find and fix the application code causing that.

655
00:31:53.079 --> 00:31:56.039
<v Speaker 2>But the main architectural solution for handling lots of connections

656
00:31:56.039 --> 00:31:57.759
<v Speaker 2>efficiently is connection pooling.

657
00:31:57.839 --> 00:31:59.799
<v Speaker 1>Connection pooling like pg bouncer.

658
00:32:00.079 --> 00:32:00.799
<v Speaker 3>It's exactly.

659
00:32:01.240 --> 00:32:05.039
<v Speaker 2>Pg bouncer is the standard, free, open source connection pooler

660
00:32:05.039 --> 00:32:09.000
<v Speaker 2>for Postgress. What's fascinating here is how it works. It

661
00:32:09.039 --> 00:32:12.759
<v Speaker 2>sits between your rails application and your postgrescool database. Your

662
00:32:12.920 --> 00:32:15.839
<v Speaker 2>Rails apps connect to pg bouncer, which can handle thousands

663
00:32:15.880 --> 00:32:19.200
<v Speaker 2>of front end connections, and pg bouncer maintains a smaller

664
00:32:19.359 --> 00:32:23.160
<v Speaker 2>managed pool of actual connections to the back end postgresscole database,

665
00:32:23.480 --> 00:32:27.599
<v Speaker 2>reusing them efficiently. It drastically reduces the overhead of constantly

666
00:32:27.640 --> 00:32:30.319
<v Speaker 2>opening and closing connections to Postgress itself.

667
00:32:30.039 --> 00:32:32.440
<v Speaker 1>So it lets you handle way more app servers without

668
00:32:32.519 --> 00:32:34.519
<v Speaker 1>overwhelming the database's connection limits.

669
00:32:34.599 --> 00:32:38.160
<v Speaker 2>Precisely, it's essential for scaling rails apps with postgresscool. You

670
00:32:38.240 --> 00:32:41.400
<v Speaker 2>do need to configure pg bouncer's pool modes though. Transaction

671
00:32:41.480 --> 00:32:44.279
<v Speaker 2>pooling is very common and efficient, but it has a catch.

672
00:32:44.680 --> 00:32:47.400
<v Speaker 2>It doesn't work well with postgrescool session based features like

673
00:32:47.440 --> 00:32:51.160
<v Speaker 2>prepared statements, so if you use transaction pooling, you generally

674
00:32:51.240 --> 00:32:54.480
<v Speaker 2>need to disable prepared statements in your rails database dot

675
00:32:54.519 --> 00:32:55.359
<v Speaker 2>eml oh.

676
00:32:55.279 --> 00:32:58.240
<v Speaker 1>Okay disable prepared statements for transaction pooling.

677
00:32:58.279 --> 00:33:01.920
<v Speaker 2>What's the alternative session pool mode in pg bouncer. It

678
00:33:01.960 --> 00:33:05.319
<v Speaker 2>works fine with prepared statements, but it's less efficient at

679
00:33:05.400 --> 00:33:08.359
<v Speaker 2>re using connections as it pins a back end connection

680
00:33:08.400 --> 00:33:11.599
<v Speaker 2>to a client for the entire session duration. It's often

681
00:33:11.640 --> 00:33:14.319
<v Speaker 2>the fallback if you can't disable prepared statements.

682
00:33:14.440 --> 00:33:17.759
<v Speaker 1>Got it? Transaction versus session pooling, and we still need

683
00:33:17.799 --> 00:33:18.759
<v Speaker 1>those timeouts right.

684
00:33:18.920 --> 00:33:22.839
<v Speaker 2>Absolutely, statement time out, lock, time out, and especially idle

685
00:33:22.880 --> 00:33:26.960
<v Speaker 2>in transaction. Session timeout become even more critical when you

686
00:33:26.960 --> 00:33:31.279
<v Speaker 2>have a pooler involved to prevent misbehaving application instances from

687
00:33:31.279 --> 00:33:32.599
<v Speaker 2>hogging pooled connections.

688
00:33:32.720 --> 00:33:35.880
<v Speaker 1>Okay, let's dive a bit deeper into locking. We mentioned

689
00:33:35.880 --> 00:33:37.240
<v Speaker 1>pessimistic locking.

690
00:33:37.119 --> 00:33:40.960
<v Speaker 2>Right postgressholes Default locks are taken up front. Peaglocks few

691
00:33:41.000 --> 00:33:44.200
<v Speaker 2>gives you the granular details you can explicitly request locks

692
00:33:44.240 --> 00:33:47.440
<v Speaker 2>in SQL. Select it for update is common. It locks

693
00:33:47.440 --> 00:33:50.279
<v Speaker 2>the rose you select, preventing any other transaction from updating

694
00:33:50.359 --> 00:33:52.839
<v Speaker 2>or deleting them or selecting them for update until your

695
00:33:52.839 --> 00:33:54.440
<v Speaker 2>transaction commits or rolls back.

696
00:33:54.519 --> 00:33:56.160
<v Speaker 1>What if the row is already locked.

697
00:33:55.960 --> 00:33:59.240
<v Speaker 2>By default for update will wait, but you can add

698
00:33:59.279 --> 00:34:02.440
<v Speaker 2>no weight If the row is locked, the query fails immediately,

699
00:34:03.240 --> 00:34:06.559
<v Speaker 2>or you can use skiplocked. This tells postgress to just

700
00:34:06.599 --> 00:34:09.199
<v Speaker 2>ignore any locked rows and return only the ones that

701
00:34:09.239 --> 00:34:13.159
<v Speaker 2>can lock immediately. Super useful for implementing job queues where

702
00:34:13.280 --> 00:34:15.840
<v Speaker 2>multiple workers might try to grab the next available job.

703
00:34:16.079 --> 00:34:19.599
<v Speaker 1>No wait and skip locked. Okay, is there a weaker.

704
00:34:19.360 --> 00:34:23.679
<v Speaker 2>Lock yees selectaseball for share. This takes a shared lock.

705
00:34:24.119 --> 00:34:26.559
<v Speaker 2>Multiple transactions can hold a shared lock on the same

706
00:34:26.639 --> 00:34:30.360
<v Speaker 2>row simultaneously. It prevents anyone from getting an exclusive lock,

707
00:34:30.480 --> 00:34:33.079
<v Speaker 2>like for update or from delete, but it allows other

708
00:34:33.119 --> 00:34:36.119
<v Speaker 2>transactions to also read or select for share. Useful if

709
00:34:36.119 --> 00:34:37.800
<v Speaker 2>you need to read data and ensure it doesn't change

710
00:34:37.800 --> 00:34:40.159
<v Speaker 2>before your transaction finishes, but you don't mind others reading

711
00:34:40.199 --> 00:34:40.480
<v Speaker 2>it too.

712
00:34:40.559 --> 00:34:44.039
<v Speaker 1>Okay, that's pessimistic locking and postgress. What about optimistic locking

713
00:34:44.079 --> 00:34:45.800
<v Speaker 1>and RAILS, right, that's different.

714
00:34:46.360 --> 00:34:50.440
<v Speaker 2>Active record optimistic locking is an application level strategy. You

715
00:34:50.480 --> 00:34:53.400
<v Speaker 2>add a lock version integer column to your table. When

716
00:34:53.480 --> 00:34:56.480
<v Speaker 2>railsloads a record, it remembers the lock version. When you

717
00:34:56.519 --> 00:34:59.320
<v Speaker 2>go to save it, RAILS checks if the lock version

718
00:34:59.320 --> 00:35:01.320
<v Speaker 2>in the database still match is the one it loaded.

719
00:35:01.800 --> 00:35:04.599
<v Speaker 2>If it doesn't match, meeting someone else modified the record

720
00:35:04.639 --> 00:35:08.719
<v Speaker 2>in the meantime, RAILS raises an active record stale object terror.

721
00:35:08.960 --> 00:35:12.280
<v Speaker 2>Instead of overwriting the changes, it forces you to handle

722
00:35:12.280 --> 00:35:14.320
<v Speaker 2>the conflict in your application code.

723
00:35:14.119 --> 00:35:17.679
<v Speaker 1>So detect conflicts at right time rather than preventing them

724
00:35:17.679 --> 00:35:19.039
<v Speaker 1>with locks exactly.

725
00:35:19.239 --> 00:35:22.079
<v Speaker 2>And then there are advisory locks. These are completely application

726
00:35:22.199 --> 00:35:26.119
<v Speaker 2>defined locks using functions like peak advisory lock key. They

727
00:35:26.119 --> 00:35:28.719
<v Speaker 2>don't lock table rows, they just lock an arbitrary number.

728
00:35:28.760 --> 00:35:32.400
<v Speaker 2>The key useful for coordinating background jobs or ensuring only

729
00:35:32.440 --> 00:35:35.480
<v Speaker 2>one process runs a specific task at a time without

730
00:35:35.519 --> 00:35:36.719
<v Speaker 2>needing a dedicated table.

731
00:35:36.800 --> 00:35:40.280
<v Speaker 1>Okay, lots of locking options shifting again. Dealing with large

732
00:35:40.320 --> 00:35:43.400
<v Speaker 1>data sets often means pagination. What are the pitfalls?

733
00:35:43.639 --> 00:35:47.480
<v Speaker 2>The biggest pitfall is traditional limit offset pagination. It's easy

734
00:35:47.519 --> 00:35:51.000
<v Speaker 2>to implement dot limit twenty dot offset, but it gets

735
00:35:51.000 --> 00:35:54.119
<v Speaker 2>incredibly slow on large tables as the offset value grows.

736
00:35:54.199 --> 00:35:55.719
<v Speaker 1>Why is high offset slow.

737
00:35:55.840 --> 00:35:58.119
<v Speaker 2>Because postgress will still has to fetch all the rows

738
00:35:58.199 --> 00:36:01.039
<v Speaker 2>up to the offset plus limit point, sort them, and

739
00:36:01.079 --> 00:36:04.599
<v Speaker 2>then discard the offset rows. So offset five hundred thousand

740
00:36:04.639 --> 00:36:06.480
<v Speaker 2>is going to be agonizingly slow because it has to

741
00:36:06.519 --> 00:36:09.199
<v Speaker 2>process over five million rows just to find the twenty

742
00:36:09.239 --> 00:36:09.559
<v Speaker 2>you want.

743
00:36:09.760 --> 00:36:12.760
<v Speaker 1>Ouch. So what's better for large tables? Cursors?

744
00:36:13.000 --> 00:36:16.880
<v Speaker 2>Database cursors declared cursor fetch solve the offset problem by

745
00:36:16.920 --> 00:36:20.679
<v Speaker 2>maintaining state on the server. They're consistent, but they have drawbacks.

746
00:36:21.000 --> 00:36:24.440
<v Speaker 2>They keep a transaction open, which consumes resources and doesn't

747
00:36:24.480 --> 00:36:27.840
<v Speaker 2>play well with pg bouncer's transaction pooling mode. An active

748
00:36:27.880 --> 00:36:29.519
<v Speaker 2>record doesn't natively support them easily.

749
00:36:29.599 --> 00:36:32.039
<v Speaker 1>Okay, So cursors are tricky, what's the recommended way?

750
00:36:32.079 --> 00:36:36.119
<v Speaker 2>Then key set pagination sometimes called seek based pagination. We

751
00:36:36.159 --> 00:36:38.519
<v Speaker 2>connect this to the bigger picture. This is generally the

752
00:36:38.519 --> 00:36:41.920
<v Speaker 2>most efficient method for large data sets. Instead of offset,

753
00:36:42.199 --> 00:36:44.599
<v Speaker 2>you use a wear clause based on the value of

754
00:36:44.639 --> 00:36:47.039
<v Speaker 2>the last item scene on the previous page, combined with

755
00:36:47.360 --> 00:36:50.679
<v Speaker 2>order buy and limit. For example, where is a last

756
00:36:50.679 --> 00:36:54.559
<v Speaker 2>seeded order by sp limit twenty. Because you're filtering directly

757
00:36:54.639 --> 00:36:57.519
<v Speaker 2>using an index column, the database can jump almost directly

758
00:36:57.519 --> 00:36:59.880
<v Speaker 2>to the starting point for the next page, avoiding scanning

759
00:37:00.000 --> 00:37:02.760
<v Speaker 2>all the previous rows. It stays fast even on page

760
00:37:02.800 --> 00:37:03.280
<v Speaker 2>ten thousand.

761
00:37:03.400 --> 00:37:05.960
<v Speaker 1>Key set pagination use the last scene value in the

762
00:37:06.000 --> 00:37:10.400
<v Speaker 1>wear clause, got it last topic Bulk data writes. How

763
00:37:10.400 --> 00:37:11.440
<v Speaker 1>to do those efficiently?

764
00:37:11.760 --> 00:37:14.679
<v Speaker 2>Upsorts are common, combining an insert with an update if

765
00:37:14.719 --> 00:37:18.239
<v Speaker 2>the row already exists. Postgrescoll has insert insert and tree

766
00:37:18.519 --> 00:37:20.960
<v Speaker 2>on conflict do nothing if it exists, ignore the insert

767
00:37:21.199 --> 00:37:23.800
<v Speaker 2>or insert in on conflict, do update set if exists,

768
00:37:23.880 --> 00:37:27.679
<v Speaker 2>update specify columns. Rails's insertal method has an on conflict

769
00:37:27.719 --> 00:37:31.000
<v Speaker 2>option to handle this, and Postcress fifteen introduced this standard

770
00:37:31.000 --> 00:37:34.159
<v Speaker 2>SQL merge command, which is even more powerful for complex

771
00:37:34.239 --> 00:37:35.320
<v Speaker 2>absurd logic.

772
00:37:35.039 --> 00:37:37.800
<v Speaker 1>Okay, upsurts handle conflicts. What about just loading lots of

773
00:37:37.840 --> 00:37:39.480
<v Speaker 1>new data like from a file.

774
00:37:39.760 --> 00:37:43.199
<v Speaker 2>For bulk loading from files, especially csvs, the absolute fastest

775
00:37:43.239 --> 00:37:46.599
<v Speaker 2>way is the native postgress will copy command copy mytable

776
00:37:46.599 --> 00:37:50.079
<v Speaker 2>from my fil dot csv with format csv header. It

777
00:37:50.159 --> 00:37:52.800
<v Speaker 2>bypasses a lot of the normal SQL processing overhead and

778
00:37:52.880 --> 00:37:55.440
<v Speaker 2>is incredibly efficient, especially if we run into the same

779
00:37:55.440 --> 00:37:58.280
<v Speaker 2>transaction as a create table or trunkate as it can

780
00:37:58.320 --> 00:37:59.320
<v Speaker 2>minimize wall loogging.

781
00:37:59.360 --> 00:38:02.800
<v Speaker 1>Then copy is king for bulk loads. Anything else cool

782
00:38:02.840 --> 00:38:03.960
<v Speaker 1>For external.

783
00:38:03.559 --> 00:38:06.559
<v Speaker 2>Data, yeah, this is a surprisingly powerful and clever tool

784
00:38:06.719 --> 00:38:10.360
<v Speaker 2>foreign data rappers, specifically the filep and do extension file

785
00:38:10.400 --> 00:38:14.000
<v Speaker 2>EPI darol lets you define an external file like CSV

786
00:38:14.039 --> 00:38:16.679
<v Speaker 2>log file as if it were a postcrescoll table. You

787
00:38:16.719 --> 00:38:19.599
<v Speaker 2>can then run SQL queries directly against that external file

788
00:38:19.639 --> 00:38:21.159
<v Speaker 2>without importing the data seriously.

789
00:38:21.280 --> 00:38:24.440
<v Speaker 1>Query as CSV file with SQL without loading it seriously.

790
00:38:24.840 --> 00:38:28.480
<v Speaker 2>Create extension fileffew year, create server file server foreign and

791
00:38:28.599 --> 00:38:32.920
<v Speaker 2>data wrapper FILEFEEDW, create foreign table, mylogues, server filesiver options,

792
00:38:33.000 --> 00:38:36.519
<v Speaker 2>fill namepath, dollars, dot csv, format csv. Then just selects

793
00:38:36.519 --> 00:38:38.679
<v Speaker 2>from milogues where air code e cools five hundred. It's

794
00:38:38.679 --> 00:38:41.920
<v Speaker 2>amazing for analyzing log files or accessing archive data stored

795
00:38:41.960 --> 00:38:44.119
<v Speaker 2>externally a real game changer sometimes.

796
00:38:44.360 --> 00:38:48.039
<v Speaker 1>Wow, Okay, what a journey we've been on. We started

797
00:38:48.119 --> 00:38:50.559
<v Speaker 1>right at the beginning, setting up that local performance lab

798
00:38:50.599 --> 00:38:55.119
<v Speaker 1>with rideshare. Then we dove into admin basics, safe experimentation,

799
00:38:55.599 --> 00:38:58.559
<v Speaker 1>then hit data integrity with constraints which go way beyond

800
00:38:58.599 --> 00:39:02.559
<v Speaker 1>basic rills validations. We tackled those scary production migrations, learning

801
00:39:02.559 --> 00:39:05.599
<v Speaker 1>about concurrently and tools like strong migrations. Then we really

802
00:39:05.639 --> 00:39:08.920
<v Speaker 1>got into optimizing active record itself n plus one queries,

803
00:39:08.960 --> 00:39:13.039
<v Speaker 1>eager loading cts, materialized views, and finally went deep into

804
00:39:13.079 --> 00:39:18.239
<v Speaker 1>postgress internals, explain advanced indexing like gins bren, partial indexes,

805
00:39:18.360 --> 00:39:21.159
<v Speaker 1>essential maintenance with vacuum and re index and handling scaling

806
00:39:21.239 --> 00:39:24.440
<v Speaker 1>challenges like connection pooling with pg bouncer locking, and efficient

807
00:39:24.480 --> 00:39:27.760
<v Speaker 1>folk data handling with copyy and even FDWs. You really

808
00:39:27.800 --> 00:39:30.239
<v Speaker 1>have gained a shortcut here to some seriously high performance

809
00:39:30.239 --> 00:39:33.400
<v Speaker 1>postgrescol skills. That's an incredibly valuable step.

810
00:39:33.159 --> 00:39:35.679
<v Speaker 2>It really is, and that's the goal, right to equip

811
00:39:35.719 --> 00:39:40.480
<v Speaker 2>you with these practical, actionable insights, hopefully some surprising facts

812
00:39:40.519 --> 00:39:43.360
<v Speaker 2>too that can genuinely make a difference in your applications.

813
00:39:43.679 --> 00:39:46.119
<v Speaker 2>You should now have a much deeper feel for how

814
00:39:46.159 --> 00:39:49.679
<v Speaker 2>postgresco operates and how to really harness its power with rails.

815
00:39:50.280 --> 00:39:52.960
<v Speaker 2>So maybe a final thought to leave you with, Given

816
00:39:53.000 --> 00:39:55.719
<v Speaker 2>all this immense power and flexibility we've uncovered in post

817
00:39:55.760 --> 00:39:58.599
<v Speaker 2>gris skool today, just think about your own work. Is

818
00:39:58.639 --> 00:40:01.920
<v Speaker 2>there existing application bottleneck, maybe something you struggled with that

819
00:40:02.000 --> 00:40:05.159
<v Speaker 2>could be dramatically simplified or sped up by using one

820
00:40:05.199 --> 00:40:08.000
<v Speaker 2>of these database features, maybe one you hadn't even considered before.

821
00:40:08.159 --> 00:40:12.239
<v Speaker 2>Keep asking that question, keep exploring, because continuously improving the

822
00:40:12.320 --> 00:40:16.679
<v Speaker 2>operational excellence of your databases, well, it's an ongoing process,

823
00:40:16.719 --> 00:40:18.039
<v Speaker 2>but it's definitely a rewarding one,
