WEBVTT 1 00:00:00.120 --> 00:00:05.519 Welcome to the deep dive. Today, we're jumping into the 2 00:00:05.559 --> 00:00:08.119 really interesting world of Python image processing. 3 00:00:08.240 --> 00:00:09.800 Yeah, it's a big topic, it is. 4 00:00:10.240 --> 00:00:12.599 And you asked us for a way to quickly get 5 00:00:12.599 --> 00:00:16.719 the main ideas, the techniques for manipulating and understanding images. 6 00:00:16.879 --> 00:00:19.239 So that's what we're doing today. 7 00:00:19.320 --> 00:00:22.679 That's the plan. We're using the Python Image Processing Cookbook 8 00:00:23.079 --> 00:00:26.839 as our guide. It's packed with practical stuff, right. 9 00:00:26.960 --> 00:00:31.519 Think of it as decoding how computers learn to see 10 00:00:32.039 --> 00:00:33.960 and even change the images we look at 11 00:00:33.799 --> 00:00:36.640 every day. Exactly. Our goal, our mission, if you like, 12 00:00:36.759 --> 00:00:39.679 is to pull out the most useful, maybe even surprising 13 00:00:40.200 --> 00:00:41.240 bits from the cookbook. 14 00:00:41.399 --> 00:00:43.840 Yeah, give you that shortcut to the core concepts without 15 00:00:43.840 --> 00:00:47.359 getting lost in all the super technical code details right away. 16 00:00:47.719 --> 00:00:50.880 We're aiming for those aha moments. Ready to dive in? 17 00:00:50.880 --> 00:00:52.880 Let's do it. A really fun place to begin is 18 00:00:52.920 --> 00:00:56.399 creating artistic effects. The cookbook shows some, well, pretty cool 19 00:00:56.399 --> 00:00:58.320 ways to take a normal photo and make it something 20 00:00:58.359 --> 00:00:59.000 else entirely. 21 00:00:59.159 --> 00:01:00.359 Okay, I like the sound of that. 22 00:01:00.439 --> 00:01:03.799 Like what? Well, one is turning photos into cartoons. Oh yeah, 23 00:01:03.840 --> 00:01:06.159 not just a simple filter, I guess. No, no, it's 24 00:01:06.159 --> 00:01:08.079 more involved, a sequence of steps. 
25 00:01:08.120 --> 00:01:10.480 Actually, all right, walk me through it. How do you 26 00:01:10.519 --> 00:01:12.280 start making a photo look like a cartoon? 27 00:01:12.560 --> 00:01:16.319 First step is something called bilateral filtering. Imagine you want 28 00:01:16.319 --> 00:01:19.120 to smooth out parts of an image, but not the 29 00:01:19.200 --> 00:01:22.680 sharp lines, the edges. Okay. Bilateral filtering does that. It 30 00:01:22.840 --> 00:01:27.239 smooths areas with similar colors, but keeps the important boundaries sharp. 31 00:01:27.959 --> 00:01:31.079 You'd use the bilateralFilter function in OpenCV-Python 32 00:01:31.159 --> 00:01:31.359 for this. 33 00:01:31.680 --> 00:01:35.680 Uh, okay, so soften the texture, keep the lines, got it. 34 00:01:35.920 --> 00:01:39.040 What's next? Then comes median blurring. This is more about 35 00:01:39.079 --> 00:01:42.760 smoothing out noise and creating those flat blocks of color 36 00:01:42.799 --> 00:01:44.319 you see in cartoons. 37 00:01:43.840 --> 00:01:46.079 Right, like simplifying the textures. Exactly. 38 00:01:46.159 --> 00:01:48.680 The function for that is medianBlur. It sort of 39 00:01:48.719 --> 00:01:51.040 averages out small imperfections. 40 00:01:50.519 --> 00:01:53.920 Makes sense. Flat colors, sharp lines, so the lines need 41 00:01:53.959 --> 00:01:55.159 to be emphasized 42 00:01:54.680 --> 00:01:57.920 somehow. You got it. That's where adaptive thresholding comes in. 43 00:01:58.840 --> 00:02:00.879 This really makes the main edges pop. Think of it 44 00:02:00.959 --> 00:02:03.560 like inking in the outlines. Okay. Even if the lighting 45 00:02:03.640 --> 00:02:07.439 isn't perfect across the image, adaptiveThreshold helps find and 46 00:02:07.560 --> 00:02:09.000 enhance those dominant edges. 47 00:02:09.400 --> 00:02:13.479 Nice. Bold outlines, flat colors. How do they merge? 48 00:02:13.759 --> 00:02:17.319 The final step uses a bitwise AND operation. 
49 00:02:18.039 --> 00:02:20.840 Imagine you have the smooth color image on one layer 50 00:02:21.120 --> 00:02:24.560 and the strong edges on another. The bitwise_and function 51 00:02:24.680 --> 00:02:27.520 basically combines them, so the color fills in up to 52 00:02:27.599 --> 00:02:30.800 those strong edges. That gives you the final cartoon look. 53 00:02:31.159 --> 00:02:33.560 That's actually really clever. It's like a recipe for mimicking 54 00:02:33.599 --> 00:02:36.000 an art style. What other artistic tricks are in there? 55 00:02:36.039 --> 00:02:39.960 There's also simulating light art or long exposure effects. You 56 00:02:40.000 --> 00:02:42.639 know those photos with light trails from cars or water 57 00:02:42.680 --> 00:02:44.080 that looks all smooth and silky. 58 00:02:44.199 --> 00:02:46.360 Oh yeah, those are cool. How's that done? 59 00:02:46.520 --> 00:02:49.599 It's surprisingly simple at its core. You just average together 60 00:02:49.719 --> 00:02:53.159 many frames from a video clip. Average them? Yeah, anything 61 00:02:53.240 --> 00:02:56.159 static in the video stays clear when you average the frames, 62 00:02:56.199 --> 00:02:58.879 but anything moving gets blurred together. That's how you get 63 00:02:58.879 --> 00:03:00.400 the light trails or the smooth water. 64 00:03:00.639 --> 00:03:03.919 Ah, right. So if you film traffic at night, the 65 00:03:03.919 --> 00:03:07.400 buildings would be sharp, but the headlights would become streaks 66 00:03:07.439 --> 00:03:10.159 across the image, like leaving the camera shutter 67 00:03:09.879 --> 00:03:14.120 open. Precisely, the digital equivalent. The cookbook mentions getting that 68 00:03:14.240 --> 00:03:16.080 silky water look this way. 69 00:03:15.960 --> 00:03:19.639 Clever, very clever. Yeah, what about drawing styles, like pencil sketches? 70 00:03:19.840 --> 00:03:22.680 Yep. The cookbook covers that too. 
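The frame-averaging idea behind the long exposure effect described above can be sketched with NumPy alone. This is a minimal illustration, not the cookbook's code: the function name `long_exposure` and the tiny synthetic clip are assumptions made for the example. In practice the frames would more likely be read from a video file, for instance with OpenCV's `cv2.VideoCapture`.

```python
import numpy as np


def long_exposure(frames):
    """Average equally sized frames into one long-exposure style image."""
    total = None
    count = 0
    for frame in frames:
        frame = np.asarray(frame, dtype=np.float64)
        # Accumulate in floating point so the running sum cannot overflow uint8.
        total = frame.copy() if total is None else total + frame
        count += 1
    if count == 0:
        raise ValueError("need at least one frame")
    return np.clip(np.rint(total / count), 0, 255).astype(np.uint8)


# Hypothetical 4-frame clip: row 0 is a static scene, row 1 holds a moving light.
clip = []
for i in range(4):
    frame = np.zeros((2, 4), dtype=np.uint8)
    frame[0, :] = 80    # static pixels look the same in every frame
    frame[1, i] = 200   # the bright dot moves one column per frame
    clip.append(frame)

result = long_exposure(clip)
print(result)
```

The static row keeps its value after averaging, while the moving dot is spread evenly along its path, which is the light-trail effect in miniature.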
It uses different kinds 71 00:03:22.680 --> 00:03:25.120 of edge detection to pull out the outlines and details, 72 00:03:25.159 --> 00:03:26.439 kind of like an artist would. 73 00:03:26.360 --> 00:03:29.159 Edge detection, finding the sharp changes in brightness, right? 74 00:03:29.280 --> 00:03:29.439 Yeah. 75 00:03:29.479 --> 00:03:31.599 The book mentioned a few ways for sketches. 76 00:03:31.719 --> 00:03:35.560 Exactly. One is using difference of Gaussians, DoG, 77 00:03:35.280 --> 00:03:37.479 or XDoG. Okay. 78 00:03:37.560 --> 00:03:39.879 Yeah. The idea is you blur the image slightly differently 79 00:03:39.960 --> 00:03:43.319 twice and then compare them. The differences highlight the edges. 80 00:03:43.879 --> 00:03:45.919 XDoG is just a variation on that, maybe for a 81 00:03:45.960 --> 00:03:47.159 more stylized look. 82 00:03:47.240 --> 00:03:49.960 So the computer compares slightly different views to find the 83 00:03:50.000 --> 00:03:54.360 important lines. Interesting. It also mentioned anisotropic diffusion. Again, we 84 00:03:54.479 --> 00:03:55.560 heard that for noise reduction. 85 00:03:55.759 --> 00:03:59.280 Yes, it's versatile. For sketching, it smooths out the image 86 00:03:59.319 --> 00:04:02.680 while keeping the key edges sharp. It simplifies things, makes 87 00:04:02.719 --> 00:04:05.479 it look more abstract, more like a sketch. The book 88 00:04:05.520 --> 00:04:08.960 even gives some parameters, like kappa twenty, niter twenty, 89 00:04:09.159 --> 00:04:10.560 as starting points. 90 00:04:10.520 --> 00:04:12.879 So it's a smart smoothing that knows what to keep. And 91 00:04:12.919 --> 00:04:15.400 the last sketch method was the dodge operation. 
92 00:04:16.040 --> 00:04:20.319 Sounds like photography. It's related. For sketching, you invert the image, 93 00:04:20.519 --> 00:04:23.040 blur the inverted version quite a bit, and then sort 94 00:04:23.040 --> 00:04:26.120 of divide the original by that blurred inversion, sometimes 95 00:04:26.120 --> 00:04:29.920 with a threshold too. It really emphasizes contrast along edges, 96 00:04:30.079 --> 00:04:32.360 giving that bright outline sketch effect. 97 00:04:32.839 --> 00:04:36.639 It's amazing how math can replicate these artistic looks. Okay, 98 00:04:36.680 --> 00:04:39.720 so moving from art, the cookbook gets into image enhancement, 99 00:04:40.160 --> 00:04:42.959 making images better, clearer, right? 100 00:04:42.800 --> 00:04:45.120 And a big part of that is denoising. We all 101 00:04:45.160 --> 00:04:48.560 hate grainy photos. Simple filters are the first thing mentioned. 102 00:04:48.839 --> 00:04:52.079 Like blurring, right? But that can kill details too. Exactly. 103 00:04:52.240 --> 00:04:55.680 Things like Gaussian or median blur reduce noise, but they 104 00:04:55.680 --> 00:04:57.040 often blur everything else 105 00:04:56.920 --> 00:04:59.199 along with it. Which brings us back to things like 106 00:04:59.279 --> 00:05:00.560 anisotropic diffusion. 107 00:05:00.680 --> 00:05:04.240 Seems useful. It really is, because it smooths while trying 108 00:05:04.240 --> 00:05:07.560 to preserve edges. It's often better at removing noise without 109 00:05:07.639 --> 00:05:09.120 making the whole image look soft. 110 00:05:09.399 --> 00:05:12.680 Okay. And then there are denoising autoencoders. Now that 111 00:05:12.720 --> 00:05:15.800 sounds like AI. It is. It's a neural network. You 112 00:05:15.879 --> 00:05:18.680 train it by feeding it noisy images and teaching it 113 00:05:18.720 --> 00:05:20.199 to output clean versions. 114 00:05:20.399 --> 00:05:21.279 How does it learn that? 115 00:05:21.519 --> 00:05:25.279 Through training, lots of examples. 
It sees a noisy input, 116 00:05:25.480 --> 00:05:27.519 makes a guess at the clean output, compares it to 117 00:05:27.560 --> 00:05:30.279 the actual clean image, and adjusts itself to get closer 118 00:05:30.319 --> 00:05:33.439 next time. It learns to recognize noise patterns and remove them. 119 00:05:33.720 --> 00:05:36.279 The book even mentions you can use color images and 120 00:05:36.319 --> 00:05:37.639 try different network types. 121 00:05:37.759 --> 00:05:40.720 Wow, so the network literally learns what noise is and 122 00:05:40.759 --> 00:05:45.759 how to subtract it. Okay, what else for enhancement? Histogram 123 00:05:45.759 --> 00:05:48.600 matching sounds like adjusting brightness and contrast. 124 00:05:48.759 --> 00:05:51.920 Kind of. A histogram shows the distribution of brightness levels. 125 00:05:52.399 --> 00:05:55.639 Histogram matching lets you take the overall tonal feel of 126 00:05:55.800 --> 00:05:58.360 one image, the template, and apply it to another image, 127 00:05:58.399 --> 00:06:03.680 the source. You calculate these things called cumulative distribution functions, CDFs, for 128 00:06:03.800 --> 00:06:07.439 both images. They summarize the brightness distribution. Then you map 129 00:06:07.480 --> 00:06:10.199 the brightness levels from the source image to the corresponding 130 00:06:10.240 --> 00:06:12.319 levels in the template based on these CDFs. 131 00:06:12.439 --> 00:06:13.399 And why would you do that? 132 00:06:13.600 --> 00:06:17.480 For creative effects, mostly. The cookbook suggests making a daytime 133 00:06:17.480 --> 00:06:20.720 photo look like night vision by matching its histogram to 134 00:06:20.920 --> 00:06:22.240 a picture taken at night. 135 00:06:22.399 --> 00:06:25.600 Huh. So you could completely change the mood by borrowing 136 00:06:25.600 --> 00:06:26.360 the tonal range. 
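The CDF mapping just described can be sketched for 8-bit grayscale images with NumPy. This is a minimal illustration under stated assumptions, not the cookbook's implementation: the name `match_histogram` and the toy arrays are made up for the example, and scikit-image's `skimage.exposure.match_histograms` offers a ready-made version of the same idea.

```python
import numpy as np


def match_histogram(source, template):
    """Remap gray levels of `source` so its CDF follows the CDF of `template`."""
    source = np.asarray(source, dtype=np.uint8)
    template = np.asarray(template, dtype=np.uint8)

    # Histograms over the 256 possible gray levels.
    src_hist = np.bincount(source.ravel(), minlength=256)
    tpl_hist = np.bincount(template.ravel(), minlength=256)

    # Cumulative distribution functions, each ending at exactly 1.0.
    src_cdf = np.cumsum(src_hist) / source.size
    tpl_cdf = np.cumsum(tpl_hist) / template.size

    # For every source level, pick the first template level whose CDF
    # reaches the source CDF value at that level.
    lookup = np.searchsorted(tpl_cdf, src_cdf, side="left")
    lookup = np.clip(lookup, 0, 255).astype(np.uint8)
    return lookup[source]


# Toy data: a harsh black/white source takes on the template's narrow, dark range.
source = np.array([[0, 0], [255, 255]], dtype=np.uint8)
template = np.array([[10, 10], [20, 20]], dtype=np.uint8)
matched = match_histogram(source, template)
print(matched)
```

For a color photo, one simple option is to apply the same mapping to each channel separately, which is an assumption of this note rather than something the discussion specifies.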
137 00:06:26.399 --> 00:06:30.399 That's powerful. Definitely. And the last enhancement technique here is 138 00:06:30.600 --> 00:06:35.319 seamless cloning, or Poisson image editing. This is about pasting 139 00:06:35.399 --> 00:06:38.199 something from one image into another really realistically. 140 00:06:38.360 --> 00:06:41.160 Ah yes, cutting and pasting without it looking fake. How 141 00:06:41.199 --> 00:06:41.759 does that work? 142 00:06:41.839 --> 00:06:43.920 The magic is in the blending. It looks at the 143 00:06:43.959 --> 00:06:46.279 gradients, the changes in color, at the boundary of the 144 00:06:46.319 --> 00:06:49.439 object you're pasting, okay, and it tries to adjust the 145 00:06:49.480 --> 00:06:53.040 pasted object so its gradients smoothly transition into the gradients 146 00:06:53.040 --> 00:06:56.600 of the background image. The cv2.seamlessClone function 147 00:06:56.759 --> 00:07:00.000 in OpenCV, maybe with the cv2.MIXED_CLONE 148 00:07:00.199 --> 00:07:04.040 option, uses some clever math, Poisson equations, to figure this out. 149 00:07:04.079 --> 00:07:05.680 So it's matching not just the colors, but the way 150 00:07:05.759 --> 00:07:07.680 light and shadow change across the boundary. 151 00:07:07.839 --> 00:07:12.199 Very cool. Exactly. Now, after enhancing images, the book moves 152 00:07:12.240 --> 00:07:16.920 into understanding their structure, starting with edge detection algorithms. We 153 00:07:17.000 --> 00:07:18.639 mentioned some for sketching, but there's more. 154 00:07:18.920 --> 00:07:21.720 Right, we talked about Canny and Marr-Hildreth. For Canny, 155 00:07:22.160 --> 00:07:25.480 the book said less blur means more detail, maybe more 156 00:07:25.519 --> 00:07:28.319 noise, and more blur gives cleaner, stronger edges. 157 00:07:28.480 --> 00:07:31.680 Correct. That blur amount, the sigma value, controls the trade-off. 158 00:07:31.720 --> 00:07:35.720 And Marr-Hildreth uses the Laplacian of Gaussian (LoG) filter. 
159 00:07:36.079 --> 00:07:39.800 It highlights rapid intensity changes. Then you find the zero 160 00:07:39.959 --> 00:07:43.480 crossings in that filtered image, which often mark the edges. 161 00:07:43.600 --> 00:07:46.680 Zero crossings, where the filtered value goes from positive to 162 00:07:46.720 --> 00:07:47.720 negative or vice versa. 163 00:07:47.800 --> 00:07:51.600 Yeah, it pinpoints those sharp transitions. And the third method 164 00:07:51.680 --> 00:07:55.639 mentioned was wavelet-based edge detection. I know wavelets from audio. 165 00:07:56.199 --> 00:07:59.920 How do they work for images? Similar idea, actually. Wavelets 166 00:08:00.040 --> 00:08:02.959 break down the image into different frequency components. Edges are 167 00:08:03.000 --> 00:08:05.959 sharp features, so they contain a lot of high frequency information. 168 00:08:06.639 --> 00:08:10.120 By looking at the wavelet coefficients, the numbers representing these frequencies, 169 00:08:10.319 --> 00:08:12.759 you can find where the high frequencies are concentrated, and 170 00:08:12.759 --> 00:08:14.639 that tells you where the edges are. It's another way 171 00:08:14.639 --> 00:08:15.040 to find 172 00:08:14.920 --> 00:08:20.439 sharpness. Analyzing the image's visual frequencies. Neat perspective. Okay. Next 173 00:08:20.519 --> 00:08:25.120 up, image restoration, fixing broken images. Exactly. 174 00:08:25.240 --> 00:08:28.920 Deblurring is a big one. The cookbook mentions Wiener filters. 175 00:08:29.120 --> 00:08:32.240 I think I've heard of those for signal processing. Yep. 176 00:08:32.720 --> 00:08:36.120 Applied to images, Wiener filters try to reverse blurring. They 177 00:08:36.200 --> 00:08:38.960 estimate the original sharp image, considering both how it was 178 00:08:38.960 --> 00:08:42.080 blurred and any noise present. There's usually a parameter to 179 00:08:42.120 --> 00:08:44.039 balance how much denoising versus deblurring. 
180 00:08:44.039 --> 00:08:47.279 You want a balancing act, right, trying to unblur without 181 00:08:47.279 --> 00:08:51.879 making noise worse. The book also mentioned constrained least squares 182 00:08:51.919 --> 00:08:58.240 filtering, CLS, with a Laplacian constraint. Sounds complicated. 183 00:08:57.759 --> 00:08:59.840 It's a bit more advanced. CLS lets you add 184 00:08:59.840 --> 00:09:03.559 assumptions about the original image. Using a Laplacian constraint basically 185 00:09:03.600 --> 00:09:06.639 tells the algorithm the original image was probably smooth, so 186 00:09:07.120 --> 00:09:09.840 try to make the deblurred result smooth too, while 187 00:09:09.879 --> 00:09:11.200 still trying to recover detail. 188 00:09:11.440 --> 00:09:14.399 Got it. Adding some prior knowledge. What about denoising with 189 00:09:14.679 --> 00:09:17.879 Markov random fields, MRFs? Sounds statistical. 190 00:09:18.000 --> 00:09:21.159 It is. MRFs model how pixels relate to their neighbors. 191 00:09:21.320 --> 00:09:24.480 The basic idea is that nearby pixels usually have similar values 192 00:09:24.480 --> 00:09:27.279 in a clean image. So the algorithm tries to find 193 00:09:27.320 --> 00:09:30.320 a denoised image where these local relationships are most likely, 194 00:09:30.600 --> 00:09:34.840 effectively smoothing out the random noise that violates those neighborhood similarities. 195 00:09:35.440 --> 00:09:39.039 The book mentions converting pixels to minus one and one first, 196 00:09:39.080 --> 00:09:41.679 which is common for some MRF methods. 197 00:09:41.399 --> 00:09:46.000 So finding the most probable clean image based on pixel statistics. Okay, 198 00:09:46.200 --> 00:09:48.159 and fixing holes, image inpainting. 199 00:09:48.279 --> 00:09:53.080 Yeah, like digital art restoration, filling in missing bits plausibly. 200 00:09:54.000 --> 00:09:56.919 Total variation inpainting is one method mentioned. 
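The Markov random field denoising idea from a moment ago, with pixels coded as -1 and +1, can be sketched using iterated conditional modes (ICM) on an Ising-style model. ICM is one simple optimizer chosen here for illustration; the discussion does not say which one the cookbook uses. The function name `icm_denoise` and the weights `beta` (neighbor agreement) and `eta` (fidelity to the noisy input) are assumptions of this sketch.

```python
import numpy as np


def icm_denoise(noisy, beta=1.0, eta=1.0, n_sweeps=5):
    """Denoise a {-1, +1} image by greedily lowering an Ising-style energy."""
    y = np.asarray(noisy, dtype=np.int64)   # observed noisy pixels
    x = y.copy()                            # current estimate of the clean image
    rows, cols = x.shape
    for _ in range(n_sweeps):
        changed = False
        for r in range(rows):
            for c in range(cols):
                # Sum of the 4-connected neighbors in the current estimate.
                nb = 0
                if r > 0:
                    nb += x[r - 1, c]
                if r < rows - 1:
                    nb += x[r + 1, c]
                if c > 0:
                    nb += x[r, c - 1]
                if c < cols - 1:
                    nb += x[r, c + 1]
                # Choose the sign that agrees best with neighbors and observation.
                new = 1 if beta * nb + eta * y[r, c] >= 0 else -1
                if new != x[r, c]:
                    x[r, c] = new
                    changed = True
        if not changed:
            break
    return x


# A flat +1 image with one flipped pixel, standing in for salt-and-pepper noise.
noisy = np.ones((5, 5), dtype=np.int64)
noisy[2, 2] = -1
clean = icm_denoise(noisy)
print(clean)
```

Raising `beta` relative to `eta` smooths more aggressively, while raising `eta` keeps the result closer to the noisy input.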
201 00:09:56.600 --> 00:09:59.600 Total variation, heard that before. How does it fill holes? 202 00:09:59.799 --> 00:10:02.720 It tries to fill the missing area by extending information 203 00:10:02.799 --> 00:10:05.080 from the surrounding pixels, but it does it in a 204 00:10:05.120 --> 00:10:07.480 way that keeps the filled area as smooth as possible, 205 00:10:07.679 --> 00:10:11.360 minimizing sharp changes or new edges within the patch. OpenCV 206 00:10:11.480 --> 00:10:12.399 has functions for this. 207 00:10:12.600 --> 00:10:16.159 Smoothly propagating the existing textures into the gap. Okay. And 208 00:10:16.399 --> 00:10:17.799 the last restoration 209 00:10:17.399 --> 00:10:21.240 technique, dictionary learning? Sounds like building a library. That's a 210 00:10:21.240 --> 00:10:24.440 good analogy. You learn a set of basic image patches, 211 00:10:24.480 --> 00:10:27.799 the dictionary, from the image itself. Then you assume that 212 00:10:27.879 --> 00:10:30.240 any noisy or missing part of the image can be 213 00:10:30.279 --> 00:10:34.279 reconstructed by combining these learned dictionary atoms or patches. So 214 00:10:34.320 --> 00:10:37.080 you find the best combination to represent and rebuild the 215 00:10:37.159 --> 00:10:37.960 damaged area. 216 00:10:38.279 --> 00:10:41.879 So it learns the image's own building blocks and uses 217 00:10:41.919 --> 00:10:44.039 them for repairs. Clever. 218 00:10:44.480 --> 00:10:49.080 Very. Okay, moving on to binary image processing, just black and white 219 00:10:48.840 --> 00:10:51.519 pixels. Still useful stuff you can do though, like the 220 00:10:51.559 --> 00:10:53.320 distance transform. What's that measure? 221 00:10:53.759 --> 00:10:56.080 For every white pixel, it calculates how far it is 222 00:10:56.159 --> 00:10:59.720 from the nearest black pixel, the background boundary. 
Pixels deep 223 00:10:59.720 --> 00:11:02.519 inside a white shape get high values. Pixels near the 224 00:11:02.639 --> 00:11:06.480 edge get low values. Good for analyzing thickness or shape. 225 00:11:06.720 --> 00:11:10.480 Makes sense, sort of a thickness map. What about the morphological 226 00:11:09.720 --> 00:11:13.000 gradient? That's mainly for highlighting the boundaries of objects in 227 00:11:13.039 --> 00:11:16.000 a binary image. You get it by subtracting an eroded 228 00:11:16.080 --> 00:11:20.000 version (shrunk) of the image from a dilated version (expanded). 229 00:11:20.360 --> 00:11:22.279 It leaves just the one pixel thick outline. 230 00:11:22.399 --> 00:11:25.120 A clean way to get just the edges. And the 231 00:11:25.200 --> 00:11:27.759 hit-or-miss transform. Catchy name. It is. 232 00:11:27.879 --> 00:11:30.840 It's for finding very specific small shapes or patterns. You 233 00:11:30.919 --> 00:11:34.840 use two little templates, one matching the foreground pattern and 234 00:11:34.919 --> 00:11:38.600 one matching the required background around it. It only triggers 235 00:11:38.600 --> 00:11:39.799 where both match perfectly. 236 00:11:40.200 --> 00:11:44.000 A very precise pattern finder for binary images. Got it. 237 00:11:44.720 --> 00:11:48.279 Last one here is morphological watershed. I know watershed for 238 00:11:48.279 --> 00:11:49.720 segmenting grayscale images. 239 00:11:49.879 --> 00:11:53.480 Yep, same principle, powerful for binary and grayscale. You treat 240 00:11:53.480 --> 00:11:56.080 the image like a 3D landscape based on intensity. 241 00:11:56.559 --> 00:11:59.279 Then you flood it from low points, the markers. Where 242 00:11:59.320 --> 00:12:03.039 the water from different basins meets, those are your segmentation boundaries. 243 00:12:03.240 --> 00:12:03.559 Okay. 
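The distance transform mentioned a little earlier can be sketched by brute force with NumPy. This is only suitable for small images and is not how the cookbook does it; `cv2.distanceTransform` and `scipy.ndimage.distance_transform_edt` are the practical choices. The function name `distance_transform` and the toy square are assumptions made for the example.

```python
import numpy as np


def distance_transform(binary):
    """Euclidean distance from each foreground pixel to the nearest background pixel."""
    binary = np.asarray(binary, dtype=bool)
    fg = np.argwhere(binary)     # coordinates of white (foreground) pixels
    bg = np.argwhere(~binary)    # coordinates of black (background) pixels
    out = np.zeros(binary.shape, dtype=np.float64)
    if len(fg) == 0 or len(bg) == 0:
        # Nothing to measure; this sketch simply returns zeros in that case.
        return out
    # All pairwise offsets between foreground and background pixels.
    diffs = fg[:, None, :] - bg[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    out[tuple(fg.T)] = dists.min(axis=1)
    return out


# A 3x3 white square on a black 5x5 background.
image = np.zeros((5, 5), dtype=np.uint8)
image[1:4, 1:4] = 1
dt = distance_transform(image)
peak = np.unravel_index(dt.argmax(), dt.shape)
print(dt)
print(peak)
```

The peak of the result sits at the center of the square, the pixel deepest inside the shape, which makes such peaks natural candidates for watershed markers.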
244 00:12:03.639 --> 00:12:06.440 The cookbook says you can place markers by finding peaks 245 00:12:06.519 --> 00:12:10.000 in the distance transform image or in low gradient areas. 246 00:12:10.240 --> 00:12:13.639 Great for separating touching objects like cells, or just finding 247 00:12:13.679 --> 00:12:15.720 distinct blobs. Flooding 248 00:12:15.279 --> 00:12:18.279 the image landscape to find the natural divides. All right, 249 00:12:18.360 --> 00:12:22.240 let's shift to image registration. Aligning images. Super 250 00:12:22.000 --> 00:12:24.679 important for comparing images taken at different times or with 251 00:12:24.720 --> 00:12:28.440 different cameras or different medical scanners. The book starts with 252 00:12:28.559 --> 00:12:30.919 medical image registration using SimpleITK. 253 00:12:31.159 --> 00:12:33.440 Yeah, like aligning a CT and an MRI scan of 254 00:12:33.480 --> 00:12:36.240 the same patient. Right, how does SimpleITK do it? 255 00:12:36.240 --> 00:12:40.200 It finds the best geometric transformation, maybe shifting, rotating, scaling, 256 00:12:40.240 --> 00:12:43.000 to line them up. It does this by optimizing a 257 00:12:43.039 --> 00:12:47.320 similarity score like Mattes mutual information, using a specific transform 258 00:12:47.320 --> 00:12:51.240 model like a similarity transform and an optimizer, maybe gradient 259 00:12:51.279 --> 00:12:54.279 descent. Read images, set up the process, run it, 260 00:12:54.639 --> 00:12:56.759 then resample one image to match the other. 261 00:12:57.240 --> 00:13:00.440 A systematic way to find the perfect overlap. Okay. Then 262 00:13:00.480 --> 00:13:02.519 there's the ECC algorithm and warping. 263 00:13:02.919 --> 00:13:07.399 ECC is enhanced correlation coefficient. It's an algorithm designed to 264 00:13:07.440 --> 00:13:10.240 figure out the geometric warp needed to align two images, 265 00:13:10.519 --> 00:13:13.759 maybe correcting for slight camera shifts. 
Once you have the warp, 266 00:13:13.879 --> 00:13:14.440 you apply it. 267 00:13:14.519 --> 00:13:17.399 Gotcha. What about faces? Aligning faces with dlib. 268 00:13:17.519 --> 00:13:20.799 Dlib is great for finding facial landmarks: eyes, nose corners, 269 00:13:20.840 --> 00:13:23.279 mouth corners, et cetera. Right, once you have those points 270 00:13:23.320 --> 00:13:26.039 on two faces, you can calculate an affine transformation 271 00:13:26.200 --> 00:13:28.639 to warp one face so its landmarks line up with 272 00:13:28.679 --> 00:13:31.879 the other. This normalizes the pose. The FaceAligner 273 00:13:31.960 --> 00:13:34.320 class in imutils helps here. Essential 274 00:13:34.080 --> 00:13:38.000 for face recognition, I bet. Okay. Robust matching and homography 275 00:13:38.039 --> 00:13:40.120 with RANSAC sounds like dealing with 276 00:13:40.120 --> 00:13:44.080 errors. Exactly. When you match features between images, say, using 277 00:13:44.159 --> 00:13:48.399 SIFT features and BRIEF descriptors, you often get bad matches, outliers. 278 00:13:49.000 --> 00:13:52.840 RANSAC, random sample consensus, helps find the true transformation, the 279 00:13:52.879 --> 00:13:55.200 homography, despite these outliers. 280 00:13:55.279 --> 00:13:55.559 Wow. 281 00:13:55.879 --> 00:13:59.679 It randomly picks small subsets of matches, calculates a homography, 282 00:13:59.759 --> 00:14:02.639 and sees how many other matches agree with it. It repeats 283 00:14:02.679 --> 00:14:05.519 this and picks the homography supported by the most matches, 284 00:14:05.720 --> 00:14:06.960 ignoring the ones that don't fit. 285 00:14:07.440 --> 00:14:12.440 Finds the consensus, ignores the noise. Smart. And image mosaicing, 286 00:14:13.120 --> 00:14:14.679 making panoramas. 287 00:14:14.120 --> 00:14:17.840 Yeah, stitching overlapping photos. 
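The RANSAC loop described above, which panorama stitching also relies on, can be sketched in NumPy. This is a simplified illustration, not the cookbook's code: the helper names (`fit_homography`, `project`, `ransac_homography`), the iteration count and the pixel threshold are all assumptions, and the homography is fitted with a plain, unnormalized direct linear transform. In OpenCV the usual route is `cv2.findHomography` with the `cv2.RANSAC` method.

```python
import numpy as np


def fit_homography(src, dst):
    """Direct linear transform: 3x3 homography mapping src to dst (4+ point pairs)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=np.float64))
    return vt[-1].reshape(3, 3)


def project(h, pts):
    """Apply homography h to an (N, 2) array of points."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = pts_h @ h.T
    return mapped[:, :2] / mapped[:, 2:3]


def ransac_homography(src, dst, n_iters=500, threshold=2.0, seed=0):
    """Return (homography, inlier_mask) supported by the most matches."""
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    rng = np.random.default_rng(seed)
    best_mask = np.zeros(len(src), dtype=bool)
    for _ in range(n_iters):
        # Four random matches are the minimum needed for a homography.
        sample = rng.choice(len(src), size=4, replace=False)
        h = fit_homography(src[sample], dst[sample])
        with np.errstate(divide="ignore", invalid="ignore"):
            err = np.linalg.norm(project(h, src) - dst, axis=1)
            mask = err < threshold
        # Keep the candidate that the largest number of matches agree with.
        if mask.sum() > best_mask.sum():
            best_mask = mask
    if best_mask.sum() < 4:
        raise ValueError("no consensus set with at least 4 matches found")
    # Refit on all agreeing matches, ignoring the outliers.
    return fit_homography(src[best_mask], dst[best_mask]), best_mask
```

The returned matrix is only defined up to scale, so dividing it by its bottom-right entry is a common way to compare two estimates.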
The usual steps are: find features 288 00:14:17.879 --> 00:14:21.320 like SIFT, match them between images, calculate the homography to 289 00:14:21.360 --> 00:14:24.840 warp them into alignment, then blend the seams. OpenCV's 290 00:14:25.039 --> 00:14:28.039 cv2.Stitcher class makes it easier. The book also 291 00:14:28.120 --> 00:14:31.519 mentions cylindrical warping for very wide panoramas to handle distortion. 292 00:14:31.759 --> 00:14:34.480 So seamless panoramas are impressive. What about face morphing? That 293 00:14:34.600 --> 00:14:35.159 sounds fun. 294 00:14:35.399 --> 00:14:39.519 It is. Creating that smooth video transition between two faces. 295 00:14:39.840 --> 00:14:43.120 You need corresponding points on both faces first. Then you 296 00:14:43.200 --> 00:14:46.559 calculate an average shape between them. Then you warp both 297 00:14:46.600 --> 00:14:50.440 original faces towards that average shape. Finally, you blend the 298 00:14:50.480 --> 00:14:55.360 warped images together over time, usually with alpha blending. Mesh warping 299 00:14:55.440 --> 00:14:57.519 is one technique for the warp itself. 300 00:14:57.159 --> 00:14:59.720 Guiding one face to become another by aligning features and 301 00:14:59.720 --> 00:15:05.600 blending. And finally, registration leads to building an image search engine. 302 00:15:05.720 --> 00:15:09.120 Content-based image retrieval. Yeah, a multi-step process. 303 00:15:09.120 --> 00:15:09.799 How does it work? 304 00:15:10.519 --> 00:15:14.200 You extract features like SIFT from every image in your database, 305 00:15:14.679 --> 00:15:19.360 create compact descriptions of those features, index them efficiently. Then 306 00:15:19.480 --> 00:15:22.360 for a query image, you extract its features and descriptions and 307 00:15:22.399 --> 00:15:25.320 search the index for images with the most similar descriptions, 308 00:15:25.600 --> 00:15:29.320 using tools like FLANN for speed and ratio testing for reliability. 
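The ratio test mentioned above can be sketched with NumPy. To stay self-contained this uses brute-force distances instead of a FLANN index, so it only illustrates the filtering step, not the speed-up. The function name `ratio_test_matches`, the 0.75 ratio and the tiny two-dimensional "descriptors" are assumptions for the example; real SIFT descriptors have 128 dimensions.

```python
import numpy as np


def ratio_test_matches(query_desc, db_desc, ratio=0.75):
    """Return (query_index, db_index) pairs whose best match clearly beats the runner-up."""
    query_desc = np.asarray(query_desc, dtype=np.float64)
    db_desc = np.asarray(db_desc, dtype=np.float64)
    if len(db_desc) < 2:
        raise ValueError("the ratio test needs at least two database descriptors")

    # Pairwise Euclidean distances, shape (n_query, n_db).
    dists = np.linalg.norm(query_desc[:, None, :] - db_desc[None, :, :], axis=2)
    order = np.argsort(dists, axis=1)

    matches = []
    for qi, (best, second) in enumerate(order[:, :2]):
        # Keep a match only if the nearest neighbor is much closer than the
        # second nearest; ambiguous matches are dropped.
        if dists[qi, best] < ratio * dists[qi, second]:
            matches.append((qi, int(best)))
    return matches


# Toy descriptors: the first query is clearly closest to database entry 0,
# the second sits exactly between entries 0 and 1 and is rejected as ambiguous.
database = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
queries = [[0.5, 0.0], [5.0, 0.0]]
good = ratio_test_matches(queries, database)
print(good)
```

In a search engine built this way, the number of surviving matches per database image is one plausible score for ranking results.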
309 00:15:29.519 --> 00:15:32.759 Creating a visual fingerprint and searching for matches. Powerful stuff. 310 00:15:32.840 --> 00:15:37.360 Definitely. Okay. Next major area: image segmentation, dividing an image 311 00:15:37.360 --> 00:15:38.399 into meaningful parts. 312 00:15:38.600 --> 00:15:42.679 Simplest way is thresholding, right? The book mentions Otsu and Ridler-Calvard. 313 00:15:42.960 --> 00:15:47.320 Yeah. Basic idea is separating foreground from background based on brightness. 314 00:15:47.919 --> 00:15:52.039 Otsu's method and Ridler-Calvard are automatic threshold finders. They 315 00:15:52.080 --> 00:15:56.080 analyze the histogram to find the best split point. The mahotas 316 00:15:56.120 --> 00:16:00.600 library has them. Otsu minimizes variance within classes. Ridler-Calvard 317 00:16:00.639 --> 00:16:01.879 is iterative. 318 00:16:01.799 --> 00:16:04.600 So they find the threshold for you. Yes. What about 319 00:16:04.639 --> 00:16:07.759 segmentation with self-organizing maps, SOMs? 320 00:16:08.120 --> 00:16:11.720 SOMs are neural networks used for clustering. You can feed 321 00:16:11.799 --> 00:16:15.399 image pixel data like color into an SOM. It learns 322 00:16:15.440 --> 00:16:18.200 to group similar pixels together on its map. Okay, so 323 00:16:18.200 --> 00:16:20.720 you can use the trained SOM to segment the image 324 00:16:20.720 --> 00:16:23.720 based on which map neuron a pixel activates, or just 325 00:16:23.759 --> 00:16:26.919 to reduce the number of colors, quantization. The book mentions 326 00:16:27.000 --> 00:16:28.519 using it on handwritten digits. 327 00:16:28.320 --> 00:16:31.159 Letting the data cluster itself. What's random walk segmentation? 328 00:16:31.440 --> 00:16:34.399 That one's interactive. You first label a few seed pixels 329 00:16:34.440 --> 00:16:37.440 for each region you want. 
Then for every unlabeled pixel, 330 00:16:37.480 --> 00:16:40.320 the algorithm figures out the probability that a random walk 331 00:16:40.360 --> 00:16:44.080 starting there would hit each seed region first. The pixel 332 00:16:44.080 --> 00:16:47.000 gets assigned to the region with the highest probability. Often 333 00:16:47.039 --> 00:16:48.200 gives really nice results. 334 00:16:48.480 --> 00:16:52.440 A guided approach using initial hints. What about segmenting skin? 335 00:16:52.960 --> 00:16:54.960 GMM, EM algorithm. 336 00:16:54.639 --> 00:16:59.960 Gaussian mixture model, GMM, and expectation maximization, EM. The idea 337 00:17:00.240 --> 00:17:03.000 is that skin colors follow a mix of Gaussian distributions. 338 00:17:03.399 --> 00:17:06.480 You train a GMM on skin and non-skin examples, then 339 00:17:06.559 --> 00:17:10.240 you use the trained model to classify pixels in new images. 340 00:17:09.880 --> 00:17:13.440 Learning the statistics of skin color. Okay, medical image segmentation 341 00:17:13.519 --> 00:17:15.319 again, U-Net and watershed, right? 342 00:17:15.799 --> 00:17:18.880 Deep learning models like U-Net are huge in medical imaging 343 00:17:18.920 --> 00:17:23.000 now, great at learning complex patterns for segmenting organs or tumors, 344 00:17:23.359 --> 00:17:27.480 and watershed via SimpleITK is still useful, especially for 345 00:17:27.559 --> 00:17:29.640 separating touching cells or structures. 346 00:17:29.720 --> 00:17:32.720 Then deep semantic segmentation, assigning a label to every 347 00:17:32.519 --> 00:17:35.359 pixel. Exactly. Using models like DeepLab V3+ 348 00:17:35.480 --> 00:17:38.319 or FCN. Not just there's a car, but these specific 349 00:17:38.400 --> 00:17:42.279 pixels are car, these are road, et cetera. Pixel-level understanding. Got 350 00:17:42.200 --> 00:17:44.680 it. And deep instance segmentation. How's that different? 351 00:17:44.720 --> 00:17:46.920 It goes one step further. 
Semantic says these are all 352 00:17:46.920 --> 00:17:49.680 car pixels. Instance says this is car #1, this 353 00:17:49.720 --> 00:17:52.640 is car #2, this is car #3, each 354 00:17:52.720 --> 00:17:53.720 with its own mask. 355 00:17:53.599 --> 00:17:57.240 Ah, distinguishing individual objects of the same class. Precisely. 356 00:17:57.640 --> 00:18:01.079 Models like Mask R-CNN do this. They build on object 357 00:18:01.160 --> 00:18:04.200 detectors like Faster R-CNN and add a branch to predict 358 00:18:04.200 --> 00:18:05.960 the mask for each detected instance. 359 00:18:06.119 --> 00:18:08.920 Car one, car two, car three, each outlined. Much more 360 00:18:08.799 --> 00:18:14.279 detail. Exactly. Okay. Next up, image classification, assigning one label 361 00:18:14.359 --> 00:18:15.240 to the whole image. 362 00:18:15.240 --> 00:18:19.119 The book starts with feature-based HOG and logistic regression. 363 00:18:19.200 --> 00:18:21.000 We saw HOG for detection. 364 00:18:20.799 --> 00:18:25.119 Yep, histogram of oriented gradients. You extract these HOG features, 365 00:18:25.160 --> 00:18:27.599 which capture edge direction info, and feed them into a 366 00:18:27.640 --> 00:18:31.799 standard classifier like logistic regression to categorize the entire image. 367 00:18:32.039 --> 00:18:37.400 Classic machine learning pipeline: extract features, train classifier, evaluate. 368 00:18:37.079 --> 00:18:40.279 Using gradients as a signature. What about texture classification? Gabor 369 00:18:40.319 --> 00:18:41.359 filter banks. 370 00:18:41.279 --> 00:18:45.160 Gabor filters are sensitive to orientation and frequency. Great for texture. 371 00:18:45.279 --> 00:18:47.240 A bank is just a set of Gabor filters with 372 00:18:47.279 --> 00:18:50.240 different parameters. 
You apply the bank, get a feature vector 373 00:18:50.279 --> 00:18:52.759 describing the texture, and compare it to feature vectors of 374 00:18:52.839 --> 00:18:53.680 known textures. 375 00:18:53.880 --> 00:18:56.720 Analyzing the image grain with special filters. Okay, and then 376 00:18:56.839 --> 00:19:00.079 the big one. Pre-trained deep learning models, transfer learning. 377 00:19:00.599 --> 00:19:04.920 Huge shortcut. Models like VGG16, MobileNetV2, ResNet, 378 00:19:05.039 --> 00:19:09.079 Inception, trained on millions of ImageNet images. They've learned 379 00:19:09.200 --> 00:19:10.519 general visual features. 380 00:19:10.599 --> 00:19:12.480 So you just use them out of the box? Pretty much. 381 00:19:12.559 --> 00:19:14.960 You feed your image in, get predictions based on the 382 00:19:15.039 --> 00:19:18.400 vast knowledge they already have. The cookbook shows classifying a 383 00:19:18.519 --> 00:19:19.880 cheetah and swans this 384 00:19:19.920 --> 00:19:24.279 way. Borrowing expertise. Cool. And training a custom classifier using 385 00:19:24.359 --> 00:19:25.519 transfer learning? Right. 386 00:19:25.960 --> 00:19:28.440 You take a pre-trained model, usually chop off its 387 00:19:28.480 --> 00:19:32.960 final classification layer, freeze the early layers, which learn general features, 388 00:19:33.279 --> 00:19:36.039 add your own new classification layers on top, and train 389 00:19:36.160 --> 00:19:38.519 only those new layers, or maybe fine-tune a bit 390 00:19:38.519 --> 00:19:40.559 more on your specific 391 00:19:40.319 --> 00:19:43.000 dataset. Ah, adapting it. Exactly. Much 392 00:19:42.839 --> 00:19:46.000 faster and needs less data than training from scratch. The 393 00:19:46.039 --> 00:19:49.279 book mentions ImageDataGenerator for augmenting your data too, 394 00:19:49.400 --> 00:19:51.680 creating variations to help the model generalize. 395 00:19:51.880 --> 00:19:55.640 Taking a generalist model and making it a specialist. Okay. 
396 00:19:56.200 --> 00:19:59.960 Classifying traffic signs is mentioned next, with challenges like imbalance 397 00:20:00.079 --> 00:20:00.759 and overfitting. 398 00:20:00.960 --> 00:20:04.680 Yeah, important for self-driving. Some signs are rare, that's imbalance. 399 00:20:04.880 --> 00:20:09.240 Models might memorize the training data, overfitting, so you use 400 00:20:09.279 --> 00:20:13.279 techniques like resampling classes or heavy data augmentation during training, 401 00:20:13.400 --> 00:20:16.319