WEBVTT 1 00:00:00.080 --> 00:00:01.800 Welcome to the deep dive. This is where we take 2 00:00:01.840 --> 00:00:06.000 a whole stack of complex information, research notes, articles, and 3 00:00:06.040 --> 00:00:08.359 really boil it down to the essentials for you. And 4 00:00:08.439 --> 00:00:11.640 today we're plunging into a topic that's just, well, critical 5 00:00:11.679 --> 00:00:15.560 for any serious developer: high-performance PostgreSQL, specifically within 6 00:00:15.679 --> 00:00:19.120 Ruby on Rails applications. Our mission really is to give 7 00:00:19.120 --> 00:00:21.399 you those key insights that will make your apps faster, 8 00:00:22.280 --> 00:00:25.120 more reliable, definitely more resilient. What's kind of cool is 9 00:00:25.120 --> 00:00:27.399 that our main source here is actually a beta book. 10 00:00:27.640 --> 00:00:30.120 It's still being developed, so you're getting some really cutting-edge, 11 00:00:30.760 --> 00:00:32.679 very practical info straight from the trenches, 12 00:00:32.719 --> 00:00:35.280 you might say. Yeah, and that's exactly why this is 13 00:00:35.960 --> 00:00:39.439 so relevant right now. Getting really good at PostgreSQL and Rails, 14 00:00:39.880 --> 00:00:43.759 that's not just helpful for your career, it's massively in demand. 15 00:00:44.000 --> 00:00:46.520 Right. Just think about it. A Hired survey from 16 00:00:46.560 --> 00:00:48.960 2023 found that Ruby on Rails was actually the 17 00:00:48.960 --> 00:00:52.280 most sought-after skill. We're talking 1.64 18 00:00:52.320 --> 00:00:55.560 times more interview requests if you know your stuff. Wow. 19 00:00:55.399 --> 00:00:58.039 1.64 times. Yeah. 20 00:00:57.799 --> 00:01:01.479 And PostgreSQL, it's consistently winning awards.
It was number 21 00:01:01.520 --> 00:01:04.480 one in the 2022 Stack Overflow survey for 22 00:01:04.599 --> 00:01:08.719 most used database among pros, and it's topped the DB-Engines 23 00:01:08.799 --> 00:01:12.200 ranking three times. So yeah, the time spent learning this, 24 00:01:12.920 --> 00:01:14.959 it's definitely high-impact knowledge. 25 00:01:15.040 --> 00:01:16.799 Okay, all right, let's get into it then. Before we 26 00:01:16.840 --> 00:01:19.439 can even, you know, talk about performance tuning, we need 27 00:01:19.480 --> 00:01:21.719 somewhere to actually do the tuning. We need a test bed. 28 00:01:22.400 --> 00:01:25.799 So this source material, it introduces a fictional app called 29 00:01:25.920 --> 00:01:29.640 Rideshare. What exactly is that? And I guess how 30 00:01:29.640 --> 00:01:31.000 does it help us learn this stuff? 31 00:01:31.239 --> 00:01:31.400 Right. 32 00:01:31.439 --> 00:01:34.719 So Rideshare is basically designed as this simplified 33 00:01:34.760 --> 00:01:37.319 API-only web app. Think of it like a mini 34 00:01:37.439 --> 00:01:39.799 Uber or Lyft. You know, it's got the core Active 35 00:01:39.840 --> 00:01:43.560 Record models you'd expect: drivers, riders, trips, trip requests. 36 00:01:43.879 --> 00:01:46.239 That's worth noting. Active Record, just quickly, that's the ORM, right? 37 00:01:46.239 --> 00:01:48.159 The object-relational mapper in Rails? Exactly. 38 00:01:48.200 --> 00:01:50.760 It's the magic layer that connects your Ruby code, your 39 00:01:50.760 --> 00:01:54.120 classes, directly to the database tables. And it really leans 40 00:01:54.120 --> 00:01:57.040 into that Rails philosophy of convention over configuration. 41 00:01:57.200 --> 00:01:59.799 Ah, right, like the Driver model maps to the drivers table.
42 00:01:59.640 --> 00:02:02.120 Automatically. Precisely, you don't have to spell it all out. 43 00:02:02.239 --> 00:02:05.840 And for managing schema changes, Rideshare uses the standard Rails 44 00:02:05.879 --> 00:02:09.439 stuff: db/structure.sql along with Active Record migrations. 45 00:02:09.719 --> 00:02:13.360 Under the hood, that uses pg_dump to capture the structure. 46 00:02:13.439 --> 00:02:15.080 You know, one thing I found really valuable in the 47 00:02:15.120 --> 00:02:18.879 source was this strong push to set up and use 48 00:02:19.000 --> 00:02:22.199 PostgreSQL locally, like, really make it your own little lab. 49 00:02:22.840 --> 00:02:24.840 Totally. It's not just theory. 50 00:02:25.039 --> 00:02:28.599 Having it run locally gives you complete control, a safe 51 00:02:28.599 --> 00:02:31.960 space to experiment, which, let's face it, is absolutely essential 52 00:02:32.000 --> 00:02:34.039 when you're messing with performance settings. You don't want to 53 00:02:34.080 --> 00:02:37.039 test this stuff live. Definitely not. And getting Rideshare 54 00:02:37.080 --> 00:02:39.840 running is designed to be pretty smooth. You use Homebrew, 55 00:02:40.080 --> 00:02:43.240 rbenv for your Ruby version, Bundler for gems, and 56 00:02:43.280 --> 00:02:46.280 then just the standard bin/rails commands: db:create, 57 00:02:46.400 --> 00:02:49.759 db:migrate, dbconsole. Simple stuff. Once you're set up, 58 00:02:49.840 --> 00:02:53.800 you immediately start bumping into core PostgreSQL ideas, like 59 00:02:54.240 --> 00:02:56.520 SQL being a declarative language. 60 00:02:56.560 --> 00:02:58.120 Okay, what does that actually mean? Declarative? 61 00:02:58.240 --> 00:02:59.919 It just means you tell PostgreSQL what you want, 62 00:03:00.120 --> 00:03:02.080 like get me all the trips from yesterday, and you 63 00:03:02.120 --> 00:03:06.280 don't specify how to get it.
PostgreSQL's optimizer, its brain, 64 00:03:06.919 --> 00:03:10.080 figures out the most efficient way to execute that request. 65 00:03:10.199 --> 00:03:13.280 Ah, so you declare the result. It handles the process, 66 00:03:13.439 --> 00:03:15.520 like ordering food. Exactly. 67 00:03:15.120 --> 00:03:18.639 Like ordering food. And the how it figures out, that's 68 00:03:18.680 --> 00:03:22.520 the query execution plan. It's like the database's internal recipe 69 00:03:22.560 --> 00:03:25.840 for fetching your data. Plus you've got functions, built-in 70 00:03:25.919 --> 00:03:28.520 ones and ones you can define yourself, which really let 71 00:03:28.560 --> 00:03:32.800 you push complex logic down into the database itself. Seriously, 72 00:03:33.080 --> 00:03:36.240 we can't stress this enough. Get Rideshare set up locally. 73 00:03:36.560 --> 00:03:39.319 It really is the perfect lab for practicing everything we're 74 00:03:39.319 --> 00:03:39.960 about to cover. 75 00:03:40.120 --> 00:03:43.400 Okay, performance lab established. Rideshare is running locally, safe 76 00:03:43.400 --> 00:03:46.719 space acquired. But to really know what's going on, we 77 00:03:46.800 --> 00:03:49.639 need tools to look inside PostgreSQL, right? How do 78 00:03:49.680 --> 00:03:50.719 we peek under the hood? 79 00:03:51.039 --> 00:03:53.319 Yeah, the psql command-line tool is kind of 80 00:03:53.319 --> 00:03:55.240 your main entry point there. There are a few meta 81 00:03:55.240 --> 00:03:57.919 commands you'll use all the time, like \x, that 82 00:03:58.039 --> 00:04:01.560 toggles the expanded view. Makes query results way easier to read. 83 00:04:01.680 --> 00:04:03.120 Oh yeah, that's super helpful. 84 00:04:03.439 --> 00:04:06.039 And \e is great. It opens your current query in 85 00:04:06.080 --> 00:04:10.000 your text editor. Lifesaver for complex SQL. Then there's 86 00:04:10.039 --> 00:04:13.840 \l to just list your databases.
You can also customize 87 00:04:13.840 --> 00:04:17.680 psql using a .psqlrc file: add aliases, change the 88 00:04:17.720 --> 00:04:21.360 prompt, whatever makes you comfortable. And for enabling some really 89 00:04:21.360 --> 00:04:24.560 powerful extensions like pg_stat_statements, which we'll definitely circle 90 00:04:24.600 --> 00:04:27.600 back to, you need to edit your postgresql.conf 91 00:04:27.639 --> 00:04:30.439 file, specifically the shared_preload_libraries setting. 92 00:04:30.639 --> 00:04:33.839 Ah, okay, and that config change needs a restart, 93 00:04:33.560 --> 00:04:34.639 right? That specific one does. 94 00:04:34.720 --> 00:04:37.600 Yeah, shared_preload_libraries needs a full PostgreSQL restart to 95 00:04:37.600 --> 00:04:38.120 take effect. 96 00:04:38.240 --> 00:04:41.879 Got it. So okay, we can configure things, but how 97 00:04:41.920 --> 00:04:44.480 do we see what the database is doing, like, right now? 98 00:04:44.560 --> 00:04:46.800 Is it just log files, or is there a more 99 00:04:46.800 --> 00:04:49.920 direct way? I remember this one time a query just 100 00:04:50.040 --> 00:04:52.040 ran forever, almost took down the whole app. If only 101 00:04:52.040 --> 00:04:53.600 I'd known about pg_stat_activity. Oh, 102 00:04:53.639 --> 00:04:55.920 absolutely, pg_stat_activity is exactly that. It's your real-time 103 00:04:55.959 --> 00:05:00.439 dashboard. You see every connection, what state it's in: active, idle, 104 00:05:00.720 --> 00:05:04.000 maybe crucially idle in transaction. And you see background processes too, 105 00:05:04.000 --> 00:05:04.959 like autovacuum. So 106 00:05:04.959 --> 00:05:06.959 you can spot those long-running queries. 107 00:05:07.319 --> 00:05:07.600 Yep.
108 00:05:08.120 --> 00:05:10.399 You can see the query text, find its process ID, 109 00:05:10.439 --> 00:05:12.720 the PID, and then if you really need to, you 110 00:05:12.759 --> 00:05:15.199 can try to cancel it gracefully with pg_cancel_backend 111 00:05:15.279 --> 00:05:19.360 or, in an emergency, terminate it with pg_terminate_backend. 112 00:05:19.399 --> 00:05:22.040 Use that last one carefully, 113 00:05:21.680 --> 00:05:24.319 though. Right, termination is a bit heavy-handed. 114 00:05:24.120 --> 00:05:27.279 It can be. Now this ties into understanding pessimistic locking. 115 00:05:27.720 --> 00:05:31.160 It's what PostgreSQL does by default. You'll see shared locks 116 00:05:31.399 --> 00:05:34.680 and exclusive locks. The main takeaway: you really want to 117 00:05:34.720 --> 00:05:38.079 minimize how long you hold exclusive locks because they block 118 00:05:38.199 --> 00:05:40.639 everyone else trying to access that same data. 119 00:05:40.720 --> 00:05:42.720 And that can lead to deadlocks. 120 00:05:42.839 --> 00:05:43.319 Exactly. 121 00:05:43.519 --> 00:05:47.519 Deadlocks are the worst-case scenario: two transactions waiting for 122 00:05:47.560 --> 00:05:51.199 each other, stuck forever. PostgreSQL will detect and break them, 123 00:05:51.360 --> 00:05:54.360 but it means one transaction fails. You can see live lock 124 00:05:54.439 --> 00:05:56.360 information using the pg_locks view. 125 00:05:56.560 --> 00:05:59.439 Okay, so lots to monitor. How about experimenting safely? 126 00:06:00.079 --> 00:06:03.040 The source suggests using generate_series to create lots of 127 00:06:03.079 --> 00:06:05.839 fake data. You can do this in a separate experiments database. 128 00:06:05.920 --> 00:06:08.839 It's a great way to simulate production-level load without 129 00:06:08.839 --> 00:06:10.399 touching your real development data. 130 00:06:10.439 --> 00:06:11.000 That's smart.
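The monitoring and experimentation steps just described can be sketched roughly as follows. The SQL is held in Ruby heredocs so it can be pasted into psql or sent through any PostgreSQL client. The five-minute threshold, the PID 12345, and the fake_trips table are illustrative assumptions, not details from the source.

```ruby
# Sketch only: the threshold, PID, and table name below are placeholders.

# List non-idle sessions whose current query has run longer than five minutes.
LONG_RUNNING_QUERIES_SQL = <<~SQL
  SELECT pid, state, now() - query_start AS runtime, query
  FROM pg_stat_activity
  WHERE state <> 'idle'
    AND now() - query_start > interval '5 minutes'
  ORDER BY runtime DESC;
SQL

# Ask a backend to cancel its current query (the gentler option).
def cancel_sql(pid)
  "SELECT pg_cancel_backend(#{Integer(pid)});"
end

# Terminate the whole backend connection (the heavier option).
def terminate_sql(pid)
  "SELECT pg_terminate_backend(#{Integer(pid)});"
end

# Fill a scratch table with one million fake rows via generate_series.
FAKE_TRIPS_SQL = <<~SQL
  CREATE TABLE fake_trips AS
  SELECT n AS id,
         now() - n * interval '1 minute' AS created_at
  FROM generate_series(1, 1000000) AS n;
SQL

puts LONG_RUNNING_QUERIES_SQL
puts cancel_sql(12345)
puts terminate_sql(12345)
puts FAKE_TRIPS_SQL
```

Running the generate_series statement in a separate experiments database, as suggested above, keeps the fake rows away from real development data.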
131 00:06:11.040 --> 00:06:14.399 And here's something that often surprises people. PostgreSQL has 132 00:06:14.480 --> 00:06:15.800 transactional DDL. 133 00:06:15.920 --> 00:06:18.519 Transactional DDL? Data definition 134 00:06:18.160 --> 00:06:20.519 language, like CREATE TABLE? Exactly. 135 00:06:20.160 --> 00:06:24.759 Schema changes: CREATE INDEX, ALTER TABLE, ADD COLUMN. They happen 136 00:06:24.800 --> 00:06:27.920 inside a transaction just like data changes. So you can 137 00:06:28.000 --> 00:06:31.879 literally type BEGIN, then CREATE INDEX idx ON my_table (col), 138 00:06:32.040 --> 00:06:34.360 then realize you made a mistake and type ROLLBACK, 139 00:06:34.839 --> 00:06:37.240 and that index just, poof, never happened. 140 00:06:37.319 --> 00:06:37.720 Whoa. 141 00:06:37.759 --> 00:06:41.600 Okay, that's huge for safety. No half-applied schema changes. 142 00:06:41.720 --> 00:06:44.879 It's an incredible safety net. It means your migrations either 143 00:06:44.920 --> 00:06:48.399 succeed completely or fail completely, which brings us back to 144 00:06:48.439 --> 00:06:53.040 safe experimentation. Always test schema changes in staging first, and 145 00:06:53.120 --> 00:06:56.160 maybe even use read-only database users in production for 146 00:06:56.199 --> 00:06:59.319 certain monitoring tasks. Using roles like pg_read_all_data 147 00:06:59.399 --> 00:07:01.920 or pg_monitor just adds another layer of safety. 148 00:07:02.040 --> 00:07:05.040 Makes sense. Okay, let's switch gears a bit. Data correctness, 149 00:07:05.079 --> 00:07:08.399 data consistency, obviously super important. In Rails, we often reach 150 00:07:08.439 --> 00:07:11.800 for Active Record validations, but the source argues pretty strongly 151 00:07:11.839 --> 00:07:14.839 for using database-level constraints too. What's the thinking there? 152 00:07:14.839 --> 00:07:19.240 Are they redundant? Not redundant, complementary. That's the key.
Active 153 00:07:19.279 --> 00:07:23.240 Record validations are great, essential even, for catching errors early 154 00:07:23.279 --> 00:07:27.279 at the application layer, providing good user feedback. But database 155 00:07:27.360 --> 00:07:32.720 constraints offer stronger guarantees because they're enforced inside the database engine, 156 00:07:33.000 --> 00:07:36.959 which is built specifically to handle high-concurrency transactions and 157 00:07:37.079 --> 00:07:41.199 maintain data isolation in ways an application layer just can't. 158 00:07:41.360 --> 00:07:44.279 Okay, stronger guarantees. What kind of constraints are we talking 159 00:07:44.319 --> 00:07:46.959 about beyond, say, primary key or 160 00:07:46.920 --> 00:07:48.720 NOT NULL? Oh, there's a whole suite. 161 00:07:48.720 --> 00:07:51.639 You've got unique constraints, obviously, foreign key constraints to maintain 162 00:07:51.680 --> 00:07:55.839 relationships between tables, check constraints for custom rules, and even 163 00:07:55.879 --> 00:07:57.839 more advanced exclusion constraints. 164 00:07:58.000 --> 00:08:00.120 Let's take unique. If I want to add one, but 165 00:08:00.160 --> 00:08:02.399 I already have duplicate data, what do I do? 166 00:08:02.800 --> 00:08:05.839 Good question. You typically need to clean up that data first. 167 00:08:06.160 --> 00:08:09.240 The source mentions using a common table expression, a CTE, 168 00:08:09.639 --> 00:08:12.240 with the ROW_NUMBER window function. It's a neat trick 169 00:08:12.240 --> 00:08:15.199 to identify and then delete the duplicates before you apply 170 00:08:15.279 --> 00:08:16.480 the unique constraint. 171 00:08:16.800 --> 00:08:21.000 Okay, and foreign keys. Rails didn't always support those natively, 172 00:08:21.120 --> 00:08:21.759 did it? Right, 173 00:08:21.839 --> 00:08:24.920 native support landed in Rails 4.2. Before that, 174 00:08:24.959 --> 00:08:28.920 you used gems. But yeah, they're fundamental.
They ensure, for example, 175 00:08:28.920 --> 00:08:31.079 that you can't delete a rider if they still have 176 00:08:31.120 --> 00:08:34.240 associated trips in the database. Prevents orphaned records. 177 00:08:34.519 --> 00:08:37.799 Got it. What about check constraints? You said custom rules. Yeah, 178 00:08:37.840 --> 00:08:38.840 they're really flexible. 179 00:08:39.039 --> 00:08:41.519 Anything that evaluates to true or false. Like, you could 180 00:08:41.639 --> 00:08:44.159 enforce that a trips table's completed timestamp must always be 181 00:08:44.240 --> 00:08:47.240 later than its created_at timestamp. Simple, powerful rule. 182 00:08:47.399 --> 00:08:50.159 Okay, that makes sense. But this raises a practical point. 183 00:08:50.200 --> 00:08:52.120 How do you add a check constraint like that 184 00:08:52.240 --> 00:08:55.039 to a table that's already huge and getting hammered with traffic? 185 00:08:55.279 --> 00:08:57.600 Wouldn't that lock it up while it checks millions of 186 00:08:57.639 --> 00:08:58.200 old rows? 187 00:08:58.559 --> 00:09:02.519 Exactly the problem, and there's an elegant solution. You do 188 00:09:02.559 --> 00:09:06.240 it in two steps using Rails migrations. First you call 189 00:09:06.320 --> 00:09:12.279 add_check_constraint, but pass the option validate: false. This tells PostgreSQL, okay, 190 00:09:12.399 --> 00:09:14.799 enforce this rule for all new or updated rows from 191 00:09:14.799 --> 00:09:17.000 now on, but don't check the old ones yet. That 192 00:09:17.039 --> 00:09:18.039 part is super fast. 193 00:09:18.200 --> 00:09:19.360 Ah, so it doesn't block. 194 00:09:19.679 --> 00:09:19.840 Right. 195 00:09:20.240 --> 00:09:23.559 Then, in a separate, later migration you run validate_check_constraint 196 00:09:23.639 --> 00:09:27.120 for that same constraint. This tells PostgreSQL, okay, 197 00:09:27.159 --> 00:09:29.879 now go back and check all the existing rows.
But 198 00:09:30.000 --> 00:09:32.480 it does so without taking such a heavy lock. It 199 00:09:32.519 --> 00:09:33.679 avoids that downtime. 200 00:09:34.200 --> 00:09:37.080 Clever. Two steps. What about deferring constraints? 201 00:09:37.159 --> 00:09:41.000 Yeah, DEFERRABLE INITIALLY DEFERRED. You can apply this to unique, 202 00:09:41.120 --> 00:09:45.279 primary key, foreign key, and exclusion constraints. It means the 203 00:09:45.320 --> 00:09:48.120 constraint check is postponed until the very end of the transaction. 204 00:09:48.600 --> 00:09:51.399 Super useful for things like, say, reordering items in a 205 00:09:51.399 --> 00:09:54.440 list where each item needs a unique position. You might 206 00:09:54.480 --> 00:09:57.519 temporarily have duplicate positions during the transaction while you swap 207 00:09:57.559 --> 00:09:59.360 things around, but as long as it's fixed by the 208 00:09:59.399 --> 00:10:00.759 time you commit, it's okay. 209 00:10:00.639 --> 00:10:04.039 Interesting, okay. You also mentioned exclusion constraints. Those sound advanced. 210 00:10:04.240 --> 00:10:07.840 They are powerful and less common, but solve specific problems 211 00:10:07.919 --> 00:10:11.600 really well. They prevent overlapping data across multiple rows in 212 00:10:11.639 --> 00:10:15.240 the same table. The classic example is preventing overlapping time 213 00:10:15.320 --> 00:10:18.799 ranges, like booking a meeting room, or, in Rideshare's case, 214 00:10:19.000 --> 00:10:23.399 maybe preventing overlapping vehicle reservations. They usually require an extension 215 00:10:23.480 --> 00:10:26.120 like btree_gist and often use range types like 216 00:10:26.200 --> 00:10:29.720 tstzrange for timestamp ranges, along with the overlap operator, 217 00:10:29.759 --> 00:10:33.720 &&. Gotcha. Okay, quick detour: case-insensitive unique emails, 218 00:10:33.799 --> 00:10:36.080 common problem. How does PostgreSQL handle that?
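Before the answer, here is a rough sketch of the two constraint techniques covered above: the two-step check constraint and an exclusion constraint. It shows the underlying SQL in Ruby heredocs rather than a full Rails migration. The constraint names, the completed_at, starts_at, and ends_at columns, and the vehicle_reservations table are assumptions made for illustration.

```ruby
# Sketch only: table, column, and constraint names are assumed.

# Step 1: add the rule without scanning existing rows. This is roughly
# what Rails issues for add_check_constraint ..., validate: false.
ADD_CHECK_NOT_VALID_SQL = <<~SQL
  ALTER TABLE trips
    ADD CONSTRAINT trips_completed_after_created
    CHECK (completed_at > created_at) NOT VALID;
SQL

# Step 2, in a later migration: check the existing rows. This is roughly
# what Rails issues for validate_check_constraint.
VALIDATE_CHECK_SQL = <<~SQL
  ALTER TABLE trips
    VALIDATE CONSTRAINT trips_completed_after_created;
SQL

# Exclusion constraint: no two reservations for the same vehicle may
# have overlapping time ranges. btree_gist allows "=" on a plain
# integer column inside a GiST index.
EXCLUSION_SQL = <<~SQL
  CREATE EXTENSION IF NOT EXISTS btree_gist;

  ALTER TABLE vehicle_reservations
    ADD CONSTRAINT no_overlapping_reservations
    EXCLUDE USING gist (
      vehicle_id WITH =,
      tstzrange(starts_at, ends_at) WITH &&
    );
SQL

puts ADD_CHECK_NOT_VALID_SQL
puts VALIDATE_CHECK_SQL
puts EXCLUSION_SQL
```

Splitting the check constraint across two statements keeps the fast metadata change separate from the slow scan of existing rows, which is the point of the two-migration approach.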
219 00:10:36.240 --> 00:10:37.039 Two main ways, 220 00:10:37.080 --> 00:10:40.200 really. You could use the citext extension, which provides 221 00:10:40.240 --> 00:10:44.120 a case-insensitive text type, or you can use generated 222 00:10:43.639 --> 00:10:46.360 columns. Generated columns? Like virtual columns, sort of? 223 00:10:46.440 --> 00:10:49.440 Yeah, you define a column that's automatically computed based on others, 224 00:10:49.639 --> 00:10:51.600 so you could have a lower_email generated column that 225 00:10:51.639 --> 00:10:54.919 always stores lower(email). Then you put a regular unique 226 00:10:54.960 --> 00:10:58.639 index on that generated column. Rails actually supports these now too. 227 00:10:58.720 --> 00:11:02.440 Neat. And quickly, enums and domains? Right, CREATE 228 00:11:02.159 --> 00:11:05.320 TYPE ... AS ENUM lets you define a fixed 229 00:11:05.360 --> 00:11:10.200 list of allowed string values for a column, like trip statuses: pending, active, completed. 230 00:11:10.679 --> 00:11:13.360 CREATE DOMAIN lets you create a custom data type based 231 00:11:13.440 --> 00:11:16.679 on an existing one, but add check constraints to it, 232 00:11:16.759 --> 00:11:20.480 making reusable validation rules. Both useful, different trade-offs. 233 00:11:20.519 --> 00:11:23.159 Okay, this is great for ensuring data integrity, but let's 234 00:11:23.159 --> 00:11:26.320 talk about actually changing the database schema on a busy 235 00:11:26.360 --> 00:11:30.159 production system. That moment when you run rails db:migrate, 236 00:11:30.840 --> 00:11:31.759 it can be terrifying. 237 00:11:31.840 --> 00:11:34.480 Oh yeah, the dreaded migration lock. Exactly. 238 00:11:34.480 --> 00:11:37.559 Some ALTER TABLE operations, they take what's called an ACCESS 239 00:11:37.600 --> 00:11:40.600 EXCLUSIVE lock, right, and that just blocks everything, reads, writes. 240 00:11:40.639 --> 00:11:43.480 Your app grinds to a halt.
How do we avoid 241 00:11:43.519 --> 00:11:44.720 that absolute nightmare? 242 00:11:44.879 --> 00:11:45.039 Right. 243 00:11:45.039 --> 00:11:47.440 That exclusive lock is the enemy on a busy system. 244 00:11:47.679 --> 00:11:50.840 The absolute key here, your best friend really, is the 245 00:11:50.919 --> 00:11:54.639 CONCURRENTLY keyword for operations like CREATE INDEX or DROP INDEX. 246 00:11:54.960 --> 00:11:57.759 Adding CONCURRENTLY tells PostgreSQL to do the work without 247 00:11:57.759 --> 00:12:01.200 taking that heavy lock. It takes longer, more resources, but 248 00:12:01.240 --> 00:12:02.600 your application stays online. 249 00:12:02.679 --> 00:12:03.360 It's a lifesaver. 250 00:12:03.759 --> 00:12:07.440 So CREATE INDEX CONCURRENTLY, DROP INDEX CONCURRENTLY. Are there other 251 00:12:07.480 --> 00:12:08.559 ways to stay safe? 252 00:12:08.639 --> 00:12:09.159 Definitely. 253 00:12:09.360 --> 00:12:12.879 There's a fantastic Ruby gem called Strong Migrations. You add 254 00:12:12.879 --> 00:12:15.960 it to your development environment and it actively watches your migrations. 255 00:12:16.639 --> 00:12:19.519 If it spots a potentially dangerous operation, one that would 256 00:12:19.519 --> 00:12:23.879 take an ACCESS EXCLUSIVE lock and likely cause downtime, it'll 257 00:12:23.919 --> 00:12:27.639 either warn you, suggest a safer multi-step alternative like 258 00:12:27.679 --> 00:12:30.879 the check constraint example, or even prevent the migration 259 00:12:30.919 --> 00:12:32.519 from running in production by default. 260 00:12:32.639 --> 00:12:36.039 Oh wow, so it enforces safer practices during development. Exactly. 261 00:12:36.080 --> 00:12:39.080 It catches things early. Beyond that, you need safeguards at 262 00:12:39.080 --> 00:12:42.679 the database level too, especially with high concurrency. Setting a 263 00:12:42.720 --> 00:12:44.080 lock timeout is crucial.
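A minimal sketch of the statements mentioned above, again as SQL in Ruby heredocs. The index name, the trips.driver_id column, and the five-second lock timeout are assumptions chosen for illustration.

```ruby
# Sketch only: index name, column, and timeout value are assumed.

# Build the index without blocking writes to the table. This cannot run
# inside a transaction block, so in a Rails migration it is paired with
# disable_ddl_transaction! and add_index ..., algorithm: :concurrently.
CREATE_INDEX_SQL = <<~SQL
  CREATE INDEX CONCURRENTLY index_trips_on_driver_id
    ON trips (driver_id);
SQL

# Remove the index the same way.
DROP_INDEX_SQL = <<~SQL
  DROP INDEX CONCURRENTLY index_trips_on_driver_id;
SQL

# Session-level guard: give up rather than queue indefinitely for a lock.
LOCK_TIMEOUT_SQL = "SET lock_timeout = '5s';"

puts LOCK_TIMEOUT_SQL
puts CREATE_INDEX_SQL
puts DROP_INDEX_SQL
```

If a concurrent index build fails partway, PostgreSQL leaves behind an invalid index that has to be dropped before retrying, so it is worth checking for one after a failure.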
264 00:12:44.240 --> 00:12:46.600 Lock timeout? That limits how long a query waits 265 00:12:46.600 --> 00:12:47.080 for a lock? 266 00:12:47.360 --> 00:12:50.480 Precisely. If a query can't get the lock it needs 267 00:12:50.480 --> 00:12:55.120 within, say, fifty milliseconds, it gets canceled instead of just 268 00:12:55.120 --> 00:12:59.559 sitting there waiting indefinitely and potentially holding up other processes. Similarly, 269 00:12:59.759 --> 00:13:02.120 statement_timeout puts a cap on how long any 270 00:13:02.120 --> 00:13:05.600 single SQL statement is allowed to run. Prevents runaway queries 271 00:13:05.600 --> 00:13:09.399 from hogging resources. The source also mentions enabling log_lock_waits 272 00:13:09.440 --> 00:13:12.639 and tuning deadlock_timeout for better visibility in your 273 00:13:12.679 --> 00:13:16.039 logs when contention happens. Okay, timeouts are key. What about 274 00:13:16.080 --> 00:13:20.080 removing columns? I've heard that can cause weird errors too. Ah, yes, 275 00:13:20.120 --> 00:13:22.840 so the stale schema cache problem. This happens when you 276 00:13:22.960 --> 00:13:25.159 drop a column, but some of your running Rails application 277 00:13:25.240 --> 00:13:27.639 servers haven't picked up the schema change yet. They try 278 00:13:27.679 --> 00:13:29.600 to query the column that no longer exists. 279 00:13:29.919 --> 00:13:30.879 Boom, error. 280 00:13:31.000 --> 00:13:32.840 Right. So how do you remove a column safely? 281 00:13:33.200 --> 00:13:36.159 The recommended way is using ActiveRecord::Base.ignored_columns. 282 00:13:36.200 --> 00:13:40.200 It's a multi-step process. First, you add 283 00:13:40.200 --> 00:13:42.879 the column name to ignored_columns in your Rails model, 284 00:13:43.240 --> 00:13:47.120 deploy that code. Now Rails simply pretends the column doesn't exist, 285 00:13:47.440 --> 00:13:50.159 even though it's still in the database.
Then, once you're 286 00:13:50.159 --> 00:13:52.360 sure no code is using it, you create and run 287 00:13:52.399 --> 00:13:55.799 a migration that actually calls remove_column. Finally, you remove the 288 00:13:55.799 --> 00:13:58.679 column name from ignored_columns in a later deploy. It's 289 00:13:58.720 --> 00:14:00.320 gradual and safe. 290 00:14:00.279 --> 00:14:04.120 Makes sense, gradual removal. Now this brings up another big one. 291 00:14:04.320 --> 00:14:06.879 What if you add a new column and need to 292 00:14:06.919 --> 00:14:11.440 populate it for millions of existing rows? Backfilling data without downtime? 293 00:14:11.519 --> 00:14:14.360 Yeah, that's a classic challenge. Running a massive UPDATE 294 00:14:14.440 --> 00:14:17.000 statement is usually out of the question. Too slow, too 295 00:14:17.080 --> 00:14:20.919 much locking. You need online backfilling strategies. One approach is 296 00:14:20.960 --> 00:14:24.440 double writing, or dual writes. You modify your application code 297 00:14:24.480 --> 00:14:26.840 to write to both the old location, if any, and 298 00:14:26.919 --> 00:14:29.679 the new column simultaneously. You run a background job to 299 00:14:29.679 --> 00:14:32.039 backfill the new column for old records, and once it's done, 300 00:14:32.080 --> 00:14:34.360 you switch reads to the new column and eventually remove 301 00:14:34.399 --> 00:14:36.200 the dual-write logic and the old column. 302 00:14:36.240 --> 00:14:38.279 Okay, double writing. Any other ways? 303 00:14:38.440 --> 00:14:42.720 Another technique involves using intermediate tables. You create a temporary table, 304 00:14:42.799 --> 00:14:45.240 maybe just with the primary key and the new column value. 305 00:14:45.519 --> 00:14:48.080 You populate that table, perhaps marking it UNLOGGED 306 00:14:48.200 --> 00:14:51.639 so it doesn't hit replication, maybe disabling autovacuum on it 307 00:14:51.639 --> 00:14:55.039 temporarily for speed.
Then you batch update the main table 308 00:14:55.080 --> 00:14:59.320 from this intermediate table. And crucially, all these backfilling processes 309 00:14:59.360 --> 00:15:02.600 need to be done in small, manageable batches, often with 310 00:15:02.679 --> 00:15:05.759 some kind of throttling or delay between batches to avoid 311 00:15:05.879 --> 00:15:08.519 overwhelming the database or causing replication 312 00:15:08.159 --> 00:15:12.559 lag. Batching and throttling. Got it. Okay, let's shift focus 313 00:15:12.600 --> 00:15:16.360 to Active Record itself. It's amazing. Rails convention over configuration 314 00:15:16.480 --> 00:15:20.480 is great, but yeah, it can sometimes generate pretty inefficient 315 00:15:20.559 --> 00:15:22.240 queries if you're not paying attention, right? 316 00:15:22.399 --> 00:15:26.159 Absolutely. The abstraction is powerful, but it can hide what's 317 00:15:26.200 --> 00:15:29.120 actually happening. So connecting this to the bigger picture, it's 318 00:15:29.159 --> 00:15:32.080 about making your Rails app smarter in how it communicates 319 00:15:32.080 --> 00:15:32.759 with PostgreSQL. 320 00:15:32.759 --> 00:15:35.519 How do we even spot the bad queries easily? 321 00:15:35.840 --> 00:15:36.080 Well, 322 00:15:36.200 --> 00:15:38.559 one really helpful thing in Rails 7 and later is 323 00:15:38.600 --> 00:15:42.320 the improved query logs. They can automatically add context, like 324 00:15:42.360 --> 00:15:45.200 which controller and action triggered the query, right into the 325 00:15:45.200 --> 00:15:48.320 SQL log output. Makes tracing a slow query back to 326 00:15:48.320 --> 00:15:50.559 your application code much, much easier. 327 00:15:50.600 --> 00:15:53.120 That sounds useful. What's a common inefficiency pattern? 328 00:15:53.200 --> 00:15:53.360 Oh, 329 00:15:53.360 --> 00:15:55.720 the absolute classic is the N+1 query problem. 330 00:15:55.799 --> 00:15:56.639 You see it everywhere.
331 00:15:56.720 --> 00:15:59.879 Ah, yes, fetch one thing, then loop and fetch related 332 00:16:00.279 --> 00:16:01.120 things one 333 00:16:01.000 --> 00:16:04.720 by one. Exactly. Like, load one hundred blog posts and 334 00:16:04.759 --> 00:16:07.559 then inside the loop, for each post, run another query 335 00:16:07.600 --> 00:16:09.799 to get its author. That's one hundred and one database 336 00:16:09.840 --> 00:16:13.000 queries when it could probably be just two. Kills performance. 337 00:16:13.200 --> 00:16:15.440 Right. So how do we fix N+1? 338 00:16:15.639 --> 00:16:18.720 The primary solution is eager loading. You tell Active Record 339 00:16:18.799 --> 00:16:22.000 up front what associated data you'll need. Use methods like 340 00:16:22.000 --> 00:16:26.039 .preload or .includes. Rails then cleverly figures out how 341 00:16:26.039 --> 00:16:28.399 to load all that data in a minimal number of queries, 342 00:16:28.480 --> 00:16:30.440 usually just one extra query for each association. 343 00:16:30.559 --> 00:16:32.440 Yeah, .preload and .includes. 344 00:16:32.120 --> 00:16:35.080 Got it. And a quick tip: if you've already eager-loaded 345 00:16:35.159 --> 00:16:38.799 data into an array of objects, use .size to 346 00:16:38.840 --> 00:16:41.879 get the count, not .count. .size works on the loaded 347 00:16:41.960 --> 00:16:45.960 array in memory; .count might trigger another database query 348 00:16:46.039 --> 00:16:47.440 unnecessarily. Good tip. 349 00:16:47.639 --> 00:16:49.440 Any other ways to prevent N+1? 350 00:16:49.720 --> 00:16:52.720 Yeah, Rails 6.1 introduced strict loading. You can 351 00:16:52.840 --> 00:16:55.559 enable it per association or globally. If you try to 352 00:16:55.600 --> 00:16:59.480 access an association that wasn't explicitly eager loaded, it raises 353 00:16:59.480 --> 00:17:02.679 an error instead of silently running the N+1 query.
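The 101-versus-2 arithmetic above can be reproduced with a small plain-Ruby simulation. This is not Active Record: FakeDatabase and its methods are hypothetical stand-ins that simply count how many "queries" each access pattern issues. In a Rails app the eager-loaded version would be written with .includes or .preload, as mentioned above.

```ruby
# Hypothetical in-memory stand-in for a database that counts queries.
class FakeDatabase
  attr_reader :query_count

  def initialize(posts:, authors:)
    @posts = posts
    @authors = authors
    @query_count = 0
  end

  # One query: fetch every post.
  def all_posts
    @query_count += 1
    @posts
  end

  # One query per call: fetch a single author.
  def author_by_id(id)
    @query_count += 1
    @authors.find { |author| author[:id] == id }
  end

  # One query: fetch many authors at once (like WHERE id IN (...)).
  def authors_by_ids(ids)
    @query_count += 1
    @authors.select { |author| ids.include?(author[:id]) }
  end
end

def build_db
  authors = (1..10).map { |i| { id: i, name: "Author #{i}" } }
  posts = (1..100).map { |i| { id: i, author_id: (i % 10) + 1 } }
  FakeDatabase.new(posts: posts, authors: authors)
end

# N+1 pattern: one query for the posts, then one more per post.
def author_names_n_plus_one(db)
  db.all_posts.map { |post| db.author_by_id(post[:author_id])[:name] }
end

# Eager-loading pattern: one query for posts, one for all their authors.
def author_names_eager(db)
  posts = db.all_posts
  authors = db.authors_by_ids(posts.map { |post| post[:author_id] }.uniq)
  authors_by_id = authors.to_h { |author| [author[:id], author] }
  posts.map { |post| authors_by_id[post[:author_id]][:name] }
end

n_plus_one_db = build_db
eager_db = build_db
n_plus_one_names = author_names_n_plus_one(n_plus_one_db)
eager_names = author_names_eager(eager_db)

puts "N+1 queries:   #{n_plus_one_db.query_count}" # 101
puts "Eager queries: #{eager_db.query_count}"      # 2
```

Both versions return the same list of author names; only the number of round trips differs.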
354 00:17:02.759 --> 00:17:04.799 It forces you to be explicit and prevents the problem 355 00:17:04.799 --> 00:17:05.319 by default. 356 00:17:05.400 --> 00:17:07.680 Ooh, I like that. Force the good behavior. What about 357 00:17:07.720 --> 00:17:09.279 optimizing individual queries? 358 00:17:09.359 --> 00:17:11.799 Simple things first. Always use LIMIT if you don't need 359 00:17:11.799 --> 00:17:14.119 all possible results. Don't pull back ten thousand rows if 360 00:17:14.160 --> 00:17:17.799 you only display twenty. Also, the RETURNING clause on INSERT: 361 00:17:18.079 --> 00:17:21.160 Active Record's .insert_all method supports this now. It lets 362 00:17:21.160 --> 00:17:23.240 you get back the IDs or other columns of the 363 00:17:23.319 --> 00:17:26.839 rows you just inserted without needing a separate SELECT query afterwards. 364 00:17:27.079 --> 00:17:27.920 Saves a round trip. 365 00:17:28.160 --> 00:17:30.759 Nice. And processing large amounts of data? Use 366 00:17:30.599 --> 00:17:34.440 Active Record's batching methods, .find_each or .in_batches. 367 00:17:34.839 --> 00:17:38.200 They retrieve records in batches, default one thousand, which keeps 368 00:17:38.240 --> 00:17:41.079 memory usage low and avoids overwhelming the database or your 369 00:17:41.119 --> 00:17:45.480 application with enormous result sets. Much more reliable for large tables. 370 00:17:45.640 --> 00:17:48.559 Okay, batching is key. What about more complex SQL logic? 371 00:17:48.599 --> 00:17:49.799 Does Active Record help there? 372 00:17:49.880 --> 00:17:54.200 It does, increasingly so. Active Record has good support for subqueries.
373 00:17:54.240 --> 00:17:57.279 Now you can construct queries where part of the WHERE clause, 374 00:17:57.319 --> 00:18:01.079 for instance, is itself another SQL query. Useful for things 375 00:18:01.119 --> 00:18:03.400 like finding drivers who have completed more trips than the 376 00:18:03.440 --> 00:18:05.279 overall average number of trips per driver. 377 00:18:05.480 --> 00:18:10.039 Okay, subqueries. I've also heard about CTEs, common table expressions. 378 00:18:10.119 --> 00:18:14.720 Yes, CTEs, using the WITH keyword in SQL, are fantastic 379 00:18:14.759 --> 00:18:17.240 for complex queries. They let you break down a big, 380 00:18:17.279 --> 00:18:20.759 hairy query into smaller, named, logical steps. It makes the 381 00:18:20.759 --> 00:18:24.279 SQL vastly more readable and maintainable. Active Record has ways 382 00:18:24.279 --> 00:18:26.240 to build queries using CTEs 383 00:18:25.839 --> 00:18:29.000 too. So better organization. What about views? 384 00:18:29.559 --> 00:18:33.440 Database views are another way to encapsulate complex SQL. You 385 00:18:33.519 --> 00:18:36.759 define the query logic as a view directly in PostgreSQL, 386 00:18:37.359 --> 00:18:39.359 and then you can query it from Rails, almost like 387 00:18:39.400 --> 00:18:43.440 a regular table. The Scenic gem is very popular for 388 00:18:43.519 --> 00:18:46.200 managing database views within your Rails migrations. 389 00:18:46.319 --> 00:18:50.160 Okay, views and materialized views. What's the difference? 390 00:18:50.240 --> 00:18:52.920 Ah, now this is where it gets really interesting. Materialized 391 00:18:52.960 --> 00:18:55.559 views take it a step further. They don't just store 392 00:18:55.599 --> 00:18:58.960 the query definition. They actually execute the query and store 393 00:18:59.000 --> 00:19:01.160 the results physically, like a cache table. 394 00:19:01.519 --> 00:19:03.680 So queries against them are super fast.
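To make the subquery, CTE, and materialized view ideas above concrete, here is a hedged sketch that reuses the "drivers with more completed trips than average" example. The trips.driver_id and trips.completed_at columns and the above_average_drivers view name are assumptions, and the helper methods are hypothetical, written only to assemble the SQL.

```ruby
# Sketch only: column names and the view name are assumed.

# A CTE names the per-driver counts; a subquery compares each driver
# against the overall average of those counts.
ABOVE_AVERAGE_DRIVERS_QUERY = <<~SQL.chomp
  WITH trips_per_driver AS (
    SELECT driver_id, COUNT(*) AS trip_count
    FROM trips
    WHERE completed_at IS NOT NULL
    GROUP BY driver_id
  )
  SELECT driver_id, trip_count
  FROM trips_per_driver
  WHERE trip_count > (SELECT AVG(trip_count) FROM trips_per_driver)
SQL

# Hypothetical helper: store the query's results as a materialized view.
def create_materialized_view_sql(name, query)
  "CREATE MATERIALIZED VIEW #{name} AS\n#{query};"
end

# Hypothetical helper: recompute the stored results on demand.
def refresh_materialized_view_sql(name)
  "REFRESH MATERIALIZED VIEW #{name};"
end

puts create_materialized_view_sql("above_average_drivers", ABOVE_AVERAGE_DRIVERS_QUERY)
puts refresh_materialized_view_sql("above_average_drivers")
```

The stored results only change when the view is refreshed, which is why this suits data that does not need to be up to the second.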
395 00:19:03.440 --> 00:19:07.160 Lightning fast, because the complex calculation or join is already done. 396 00:19:07.279 --> 00:19:10.599 They're perfect for complex reports or dashboard data that doesn't 397 00:19:10.680 --> 00:19:12.200 need to be absolutely real time. 398 00:19:12.559 --> 00:19:13.759 And the cool part: you 399 00:19:13.680 --> 00:19:16.920 can often refresh them concurrently, update the stored data 400 00:19:16.920 --> 00:19:20.359 without locking readers, provided the materialized view has a unique 401 00:19:20.359 --> 00:19:24.519