WEBVTT 1 00:00:00.360 --> 00:00:03.080 Usually when we talk about making a diagnosis, there's this 2 00:00:03.200 --> 00:00:08.000 expectation of pure mechanical precision. 3 00:00:07.960 --> 00:00:10.400 Right, like a very comforting binary. Exactly. 4 00:00:10.599 --> 00:00:12.160 I mean, think about breaking your arm. You go to 5 00:00:12.160 --> 00:00:14.519 the hospital, they take the X-ray, and you see 6 00:00:14.519 --> 00:00:17.039 that jagged white line on the black film, and the 7 00:00:17.039 --> 00:00:19.079 doctor just points and says, you know, there it is. 8 00:00:19.480 --> 00:00:25.440 Yeah, it's incredibly visible. We have this fundamental human bias 9 00:00:25.519 --> 00:00:28.440 toward things we can see, right, things we can categorize 10 00:00:28.440 --> 00:00:32.000 and just put into neat little boxes: broken or not broken. 11 00:00:32.119 --> 00:00:34.880 But then if you zoom out and look at the 12 00:00:34.920 --> 00:00:37.240 digital world that you are interacting with right now, I 13 00:00:37.240 --> 00:00:40.520 mean the apps on your phone, the movie recommendations popping 14 00:00:40.560 --> 00:00:43.359 up on your TV, or even the software protocols keeping 15 00:00:43.399 --> 00:00:44.479 your bank account secure. 16 00:00:44.560 --> 00:00:47.840 Yeah, suddenly that X-ray machine is just entirely useless. 17 00:00:47.479 --> 00:00:51.640 Completely useless, because when you engage with modern technology, you 18 00:00:51.679 --> 00:00:55.719 are stepping inside this invisible architecture. You're completely surrounded by 19 00:00:55.759 --> 00:00:59.679 these complex decisions and predictions that are being made in 20 00:00:59.679 --> 00:01:01.359 the absolute dark. 21 00:01:01.439 --> 00:01:05.120 And the sheer volume of data flowing through that architecture is 22 00:01:05.200 --> 00:01:09.239 so massive. 
I mean, human eyes couldn't possibly find those 23 00:01:09.359 --> 00:01:12.159 jagged white lines even if they knew exactly what to 24 00:01:12.200 --> 00:01:12.519 look for. 25 00:01:12.680 --> 00:01:14.719 Right. The scale of it all just demands that we 26 00:01:15.000 --> 00:01:18.359 rely on algorithms to do the spotting for us, which 27 00:01:18.400 --> 00:01:21.599 honestly is exactly why we are diving into the material 28 00:01:21.640 --> 00:01:22.519 you sent over today. 29 00:01:22.719 --> 00:01:26.120 It's a really fascinating collection of research, it really is. 30 00:01:26.519 --> 00:01:30.239 So we're looking at excerpts from this incredibly dense but 31 00:01:30.359 --> 00:01:35.959 honestly illuminating academic compilation. It's titled Data Visualization and Knowledge 32 00:01:36.000 --> 00:01:40.120 Engineering: Spotting Data Points with Artificial Intelligence, and we are 33 00:01:40.120 --> 00:01:44.560 pulling from three distinct research chapters today, spanning software engineering, 34 00:01:44.799 --> 00:01:47.599 multimedia recommendation engines, and computer vision. 35 00:01:47.519 --> 00:01:49.879 Right, which sounds like three completely different worlds. 36 00:01:50.000 --> 00:01:52.000 Yeah, but our mission for this deep dive is to 37 00:01:52.040 --> 00:01:54.760 show you how they're connected. By the end of this conversation, 38 00:01:54.799 --> 00:01:58.319 you're going to understand the brilliant, completely silent mathematics that 39 00:01:58.400 --> 00:02:01.719 decide what you see, hear, and use every single day. 40 00:02:01.959 --> 00:02:05.599 Because what stands out immediately across all these seemingly disparate 41 00:02:05.719 --> 00:02:11.319 fields is the shared underlying logic. These systems all basically 42 00:02:11.360 --> 00:02:15.759 rely on taking an overwhelmingly chaotic environment, finding the mathematical 43 00:02:15.759 --> 00:02:18.360 neighbors or the hidden patterns within it. 
And then using 44 00:02:18.400 --> 00:02:21.120 that specific geometry to predict a future outcome. 45 00:02:21.199 --> 00:02:23.240 Okay, so let's start right at the foundation of that 46 00:02:23.360 --> 00:02:26.879 digital world, which is the code itself. Before an AI 47 00:02:27.000 --> 00:02:31.680 can, say, curate your evening entertainment or organize your vacation photos, 48 00:02:32.319 --> 00:02:35.960 the underlying software running those platforms has to actually function. 49 00:02:36.080 --> 00:02:38.360 It has to work, yeah, which brings up a really 50 00:02:38.479 --> 00:02:40.680 fascinating problem for developers. 51 00:02:40.199 --> 00:02:43.000 Right, because when a tech company has millions of lines 52 00:02:43.000 --> 00:02:47.680 of code, they obviously can't manually test every single permutation 53 00:02:47.759 --> 00:02:48.400 before launch. 54 00:02:48.560 --> 00:02:52.199 They absolutely cannot. They have to optimize their quality assurance resources. 55 00:02:52.560 --> 00:02:56.719 So historically, developers relied heavily on something called WPDP. 56 00:02:56.280 --> 00:02:58.120 Which is within-project defect prediction. 57 00:02:58.479 --> 00:03:02.280 Exactly, within-project defect prediction, and the mechanism there is, 58 00:03:02.879 --> 00:03:06.400 it's fairly intuitive. If version one point zero of your 59 00:03:06.400 --> 00:03:10.639 software crashed because of, let's say, a memory leak in 60 00:03:10.719 --> 00:03:12.759 a specific login module. 61 00:03:12.639 --> 00:03:15.240 The model just learns to aggressively check that exact same 62 00:03:15.280 --> 00:03:18.280 login module when you build version two point zero. Right. 63 00:03:18.319 --> 00:03:20.840 It scrutinizes the historical weak points. 64 00:03:20.680 --> 00:03:23.080 Which makes total sense. If you actually have a version one 65 00:03:23.080 --> 00:03:26.319 point zero, you're learning from your own past mistakes. 
But 66 00:03:26.400 --> 00:03:28.599 if you are launching a brand new piece of software. 67 00:03:28.759 --> 00:03:31.159 You have zero past data. I mean, you are flying 68 00:03:31.199 --> 00:03:32.000 completely blind. 69 00:03:32.120 --> 00:03:34.280 You are. And that is where the shift to CPDP 70 00:03:34.400 --> 00:03:36.120 comes in. That's the frontier right now. 71 00:03:36.039 --> 00:03:37.439 Cross-project defect prediction. 72 00:03:37.680 --> 00:03:40.599 Yes, so instead of relying on your own nonexistent history, 73 00:03:40.919 --> 00:03:45.000 the algorithm uses massive sets of training data from completely 74 00:03:45.039 --> 00:03:48.520 different outside software projects to find the hidden bugs in 75 00:03:48.560 --> 00:03:49.280 your new code. 76 00:03:49.400 --> 00:03:51.439 Okay, let's unpack this for a second, because the logic 77 00:03:51.479 --> 00:03:53.599 here is just wild to me. This is basically like 78 00:03:54.120 --> 00:03:56.120 trying to predict where the plumbing is going to leak 79 00:03:56.120 --> 00:03:57.879 in a brand new, half-built 80 00:03:57.599 --> 00:04:01.680 skyscraper by studying the plumbing failures of a completely different skyscraper 81 00:04:01.680 --> 00:04:03.199 across town. Exactly. 82 00:04:03.280 --> 00:04:04.680 I mean, how does that even work? 83 00:04:05.000 --> 00:04:08.800 It's actually a brilliant way to conceptualize it, your skyscraper analogy. 84 00:04:09.400 --> 00:04:13.000 You are working on the assumption that because both structures 85 00:04:13.159 --> 00:04:16.720 use, you know, pipes, water pressure, and gravity, the physical 86 00:04:16.720 --> 00:04:19.360 stress points will behave similarly, even if 87 00:04:19.279 --> 00:04:21.240 the architectural floor plans are wildly different. 88 00:04:21.360 --> 00:04:25.079 Exactly, and the source material mentions four specific ways they 89 00:04:25.120 --> 00:04:27.720 set up this cross-project training, right? 
90 00:04:27.680 --> 00:04:31.600 I have them here. It's strict, mixed, mixed with target 91 00:04:31.639 --> 00:04:33.079 class, and pairwise. 92 00:04:33.000 --> 00:04:37.120 So strict means the training data is completely blind to 93 00:04:37.120 --> 00:04:41.079 your new software. It only uses outside projects, period. Okay. 94 00:04:41.279 --> 00:04:45.439 Mixed folds in older, perhaps slightly related projects alongside the 95 00:04:45.480 --> 00:04:48.639 outside data. Now, mixed with target class is really interesting 96 00:04:48.680 --> 00:04:51.639 because it takes a tiny labeled sample from your current 97 00:04:51.720 --> 00:04:54.240 unfinished project to give the algorithm just a slight hint 98 00:04:54.319 --> 00:04:56.160 about your specific architecture. Kind of like 99 00:04:56.160 --> 00:04:58.800 showing it a rough blueprint before it checks the pipes. Right. 100 00:04:59.120 --> 00:05:01.600 And then pairwise is a strict one-to-one mapping. 101 00:05:01.759 --> 00:05:04.920 The model is trained entirely on one single outside project 102 00:05:05.040 --> 00:05:06.800 and then tested entirely on yours. 103 00:05:07.160 --> 00:05:09.680 But I'm trying to visualize what the AI is actually 104 00:05:09.680 --> 00:05:12.480 looking at here, because it's not reading the code like 105 00:05:12.680 --> 00:05:15.680 a human programmer, right? Yeah, it's not scanning for a 106 00:05:15.720 --> 00:05:16.759 missing semicolon. 107 00:05:16.959 --> 00:05:20.319 No, no, it's looking at structural metrics. The text highlights 108 00:05:20.319 --> 00:05:23.720 something called CK metrics, which measure the complexity of object 109 00:05:23.720 --> 00:05:24.720 oriented software. 110 00:05:25.000 --> 00:05:26.720 What's an example of a CK metric? 111 00:05:26.959 --> 00:05:29.920 A good example is the depth of inheritance tree. 112 00:05:30.120 --> 00:05:32.720 Depth of inheritance tree. Okay, what does that mean practically? 
113 00:05:33.160 --> 00:05:36.959 Well, imagine code like a family tree. If a piece 114 00:05:37.000 --> 00:05:41.160 of code inherits traits from, say, ten generations of parent 115 00:05:41.199 --> 00:05:44.279 code above it, it is deeply nested. 116 00:05:44.480 --> 00:05:46.639 Oh, I see, and if you change one thing at 117 00:05:46.639 --> 00:05:49.199 the very top of that ten-generation tree, it probably 118 00:05:49.199 --> 00:05:50.800 just breaks everything at the bottom. 119 00:05:50.519 --> 00:05:53.759 Exactly the point. It's incredibly fragile. Or the AI looks 120 00:05:53.759 --> 00:05:57.800 at something like weighted methods per class, which basically measures 121 00:05:57.839 --> 00:06:00.560 how many different operations a single piece of code is 122 00:06:00.600 --> 00:06:01.800 trying to juggle all at once. 123 00:06:01.959 --> 00:06:04.519 So the algorithm isn't looking for a broken line of code, 124 00:06:04.959 --> 00:06:07.199 it's scanning for structural fragility. 125 00:06:07.360 --> 00:06:12.600 Yes, mathematically, extreme complexity is basically the breeding ground for bugs. 126 00:06:12.920 --> 00:06:15.360 Okay, I have to push back here though, just putting 127 00:06:15.360 --> 00:06:18.240 myself in the shoes of the engineers. If a commercial 128 00:06:18.279 --> 00:06:24.240 software project is, say, mostly successful, wouldn't bugs be incredibly rare? 129 00:06:24.720 --> 00:06:26.279 They are, relatively speaking. 130 00:06:26.160 --> 00:06:27.959 Right. So, say ninety-nine percent of the code is 131 00:06:28.000 --> 00:06:32.680 structurally sound and only one percent is actually defective. If 132 00:06:32.720 --> 00:06:35.879 you feed an AI that data, doesn't the math just break? 133 00:06:36.399 --> 00:06:38.639 I mean, the AI could literally just look at any 134 00:06:38.639 --> 00:06:43.079 line of code, blindly guess no bug, and be mathematically correct 135 00:06:43.240 --> 00:06:44.680 ninety-nine percent of the time. 
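NOTE The "blindly guess no bug" worry raised just above can be checked with a few lines of arithmetic. This is a minimal sketch, not anything from the source chapters: the 99-to-1 label list is made up to match the numbers in the conversation, and recall is added here as an assumed second metric to show what accuracy hides.

```python
# Hypothetical data: 1 = defective module, 0 = clean module (99:1 imbalance).
labels = [1] + [0] * 99

# A "model" that blindly guesses "no bug" for every module.
predictions = [0] * len(labels)

correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / len(labels)

# Recall: share of the real bugs that the model actually caught.
true_positives = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
recall = true_positives / sum(labels)

print(f"accuracy={accuracy:.2f} recall={recall:.2f}")
# prints: accuracy=0.99 recall=0.00
```

The guesser scores 99 percent on accuracy while finding none of the bugs, which is why accuracy alone is a poor yardstick on data this lopsided.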
136 00:06:45.120 --> 00:06:48.600 You've just identified, honestly, one of the most notorious hurdles 137 00:06:48.639 --> 00:06:51.639 in machine learning. It's called the class imbalance problem. 138 00:06:51.800 --> 00:06:53.000 Class imbalance problem. 139 00:06:53.079 --> 00:06:56.839 Yeah, when one outcome is overwhelmingly common, the algorithm just 140 00:06:56.879 --> 00:07:00.879 takes the path of least mathematical resistance. It learns to ignore 141 00:07:00.920 --> 00:07:03.759 the rare anomaly, the bug, because optimizing for the ninety 142 00:07:03.839 --> 00:07:07.279 nine percent yields a fantastic accuracy score on paper. 143 00:07:07.439 --> 00:07:09.439 So how do they actually solve that? Because you can't 144 00:07:09.439 --> 00:07:11.360 just copy and paste that one rare bug one hundred 145 00:07:11.399 --> 00:07:13.160 times to balance the spreadsheet, right? That seems like it 146 00:07:13.199 --> 00:07:16.439 would just teach the AI to memorize one specific mistake. 147 00:07:16.639 --> 00:07:17.160 And you'd 148 00:07:16.959 --> 00:07:20.240 be totally right. Oversampling by just copying data does 149 00:07:20.279 --> 00:07:24.360 exactly that. The AI memorizes the duplicate, it overfits to it, 150 00:07:24.639 --> 00:07:28.040 and then becomes entirely useless at finding new types of bugs. 151 00:07:28.240 --> 00:07:29.399 Okay, so what's the fix? 152 00:07:29.560 --> 00:07:35.480 Instead, the researchers utilized a highly sophisticated statistical technique called SMOTE, 153 00:07:35.240 --> 00:07:39.160 which stands for synthetic minority oversampling technique. 154 00:07:39.240 --> 00:07:42.199 Yes, and SMOTE doesn't duplicate. What it does is calculate 155 00:07:42.240 --> 00:07:45.480 the mathematical distance between the rare bug data points in 156 00:07:45.600 --> 00:07:46.800 multidimensional space. 157 00:07:46.959 --> 00:07:50.399 Whoa, multidimensional space. Okay, slow down. 158 00:07:50.279 --> 00:07:53.680 Let's simplify it. 
Imagine a scatter plot graph with two 159 00:07:53.839 --> 00:07:56.600 real bugs plotted on it. SMOTE draws a line 160 00:07:56.639 --> 00:08:00.480 between those two points and mathematically synthesizes an entirely new 161 00:08:00.600 --> 00:08:02.480 artificial bug somewhere along that line. 162 00:08:02.480 --> 00:08:04.720 Oh wow. Wait, really? So they aren't just finding bugs. 163 00:08:04.759 --> 00:08:08.399 They're essentially cloning the DNA of a mistake. Exactly. They 164 00:08:08.399 --> 00:08:13.040 are hallucinating highly realistic structural flaws to force the AI 165 00:08:13.120 --> 00:08:14.360 to become a better detective. 166 00:08:14.680 --> 00:08:19.160 It balances the scales not with repetition, but with synthetic diversity. 167 00:08:19.439 --> 00:08:22.240 And when the researchers combine SMOTE with a gradient 168 00:08:22.279 --> 00:08:25.600 boosting algorithm called XGBoost, which, by the way, is 169 00:08:25.639 --> 00:08:29.720 exceptional at handling complex tabular data, their cross-project prediction 170 00:08:29.759 --> 00:08:32.080 accuracy reached up to eighty-eight percent. 171 00:08:32.440 --> 00:08:36.000 Eighty-eight percent. It completely flips how I thought quality 172 00:08:36.000 --> 00:08:40.240 assurance worked. It proves that algorithms can successfully predict structural 173 00:08:40.279 --> 00:08:42.600 failure just by studying the mathematical neighborhood. 174 00:08:42.840 --> 00:08:44.039 It does. And, I 175 00:08:44.000 --> 00:08:47.000 mean, if AI can synthesize fake data to fix broken code, 176 00:08:47.039 --> 00:08:49.639 it raises a much bigger question for me. Can we 177 00:08:49.639 --> 00:08:52.679 apply that exact same neighborly logic to human behavior? 
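NOTE The "draw a line between two real bugs and place a new one along it" picture from the scatter plot example can be sketched as plain interpolation. This is only an illustration of that one step, under stated assumptions: the two metric vectors are invented, `smote_like_sample` is a hypothetical helper name, and full SMOTE also chooses which neighbor to pair with from the k nearest minority points, which is skipped here because there are only two.

```python
import random

def smote_like_sample(a, b, rng):
    """Return a synthetic point on the straight line segment between a and b."""
    t = rng.random()  # position along the segment, 0 <= t < 1
    return [x + t * (y - x) for x, y in zip(a, b)]

# Two hypothetical defective modules described by (DIT, WMC) metric values.
bug_a = [6.0, 40.0]
bug_b = [8.0, 52.0]

rng = random.Random(0)  # fixed seed so the run is repeatable
synthetic = [smote_like_sample(bug_a, bug_b, rng) for _ in range(3)]
for point in synthetic:
    print([round(v, 2) for v in point])
```

Each printed point is a new, never-observed combination of metric values that still sits between the two real defects, which is the "synthetic diversity" the hosts describe.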
178 00:08:52.759 --> 00:08:55.759 Oh, absolutely, which takes us straight into the mechanics of 179 00:08:55.799 --> 00:08:59.720 recommendation systems, you know, the systems deciding what song, product, 180 00:08:59.799 --> 00:09:03.440 or movie you interact with next. Broadly speaking, the industry 181 00:09:03.480 --> 00:09:07.960 relies on two philosophies: content-based filtering and collaborative filtering. 182 00:09:08.039 --> 00:09:10.200 Content-based seems pretty intuitive to me. If I watch 183 00:09:10.200 --> 00:09:13.799 a documentary about, say, deep-sea diving, the algorithm tags 184 00:09:13.840 --> 00:09:17.440 the features, yeah, like ocean, submarines, marine biology, and then 185 00:09:17.440 --> 00:09:20.240 it just recommends another documentary with those same exact tags. 186 00:09:20.720 --> 00:09:25.519 Yeah, it's essentially property matching. The limitation, however, is that 187 00:09:25.879 --> 00:09:30.639 content-based filtering traps you in a very predictable bubble. 188 00:09:31.200 --> 00:09:33.879 It has no mechanism to surprise you with something outside 189 00:09:33.919 --> 00:09:35.080 of those literal 190 00:09:34.720 --> 00:09:36.960 tags. Right, you're just stuck in a submarine loop 191 00:09:36.799 --> 00:09:40.200 forever. Exactly. And that is why platforms pivot heavily toward 192 00:09:40.240 --> 00:09:41.320 collaborative filtering. 193 00:09:41.440 --> 00:09:44.399 And this is where the math gets really interesting. 194 00:09:44.000 --> 00:09:47.360 Because collaborative filtering doesn't actually care what the movie or 195 00:09:47.480 --> 00:09:51.120 song is about. It completely ignores the content tags. 196 00:09:51.200 --> 00:09:53.120 Wait, it ignores them entirely? 197 00:09:53.039 --> 00:09:55.679 Entirely. It only cares about the behavioral patterns of the 198 00:09:55.720 --> 00:09:58.960 people consuming it. 
It takes all of your clicks, your views, 199 00:09:58.960 --> 00:10:02.679 and ratings and plots them on this massive mathematical grid 200 00:10:02.720 --> 00:10:06.279 called a user-item matrix. Okay. Then it uses clustering 201 00:10:06.320 --> 00:10:09.360 algorithms like k-means clustering to map you into a 202 00:10:09.360 --> 00:10:13.919 specific locality of other users who share your precise behavioral footprint. 203 00:10:14.080 --> 00:10:17.080 So collaborative filtering is basically like walking into a massive, 204 00:10:17.120 --> 00:10:20.639 crowded party, finding the one total stranger who likes the 205 00:10:20.720 --> 00:10:24.320 exact same weird indie band as you, and then blindly 206 00:10:24.360 --> 00:10:26.639 trusting their movie recommendation for the rest of the night. 207 00:10:26.919 --> 00:10:29.759 That's it. But it goes even further than that. The 208 00:10:29.799 --> 00:10:33.720 AI assumes that your agreement on past choices is actually 209 00:10:33.720 --> 00:10:36.159 a mathematical vector pointing toward your next choice. 210 00:10:36.159 --> 00:10:37.120 Meaning what exactly? 211 00:10:37.240 --> 00:10:40.120 Meaning, if you and this cluster of strangers agreed on 212 00:10:40.120 --> 00:10:44.120 your last fifty interactions, the system is statistically confident you 213 00:10:44.120 --> 00:10:46.960 will enjoy the fifty-first thing they liked, even if 214 00:10:47.000 --> 00:10:49.440 it's a completely different genre that you've never even explored. 215 00:10:50.360 --> 00:10:52.799 But wait, looking at the source material, what happens when 216 00:10:52.840 --> 00:10:56.320 there is no history to match? Like, the text brings 217 00:10:56.399 --> 00:10:57.960 up the cold start problem. 218 00:10:58.159 --> 00:11:01.000 Ah, yes, the cold start. Right, because if I am 219 00:11:01.080 --> 00:11:03.480 a brand new user, my row on that user-item 220 00:11:03.480 --> 00:11:06.440 matrix is completely blank. 
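NOTE The party analogy above (find the stranger with your taste, then trust their next pick) can be sketched on a tiny user-item matrix. This is an assumption-laden illustration: the users, items, and ratings are invented, and a single nearest neighbor found by cosine similarity stands in for the k-means clustering the hosts mention.

```python
import math

# Hypothetical user-item matrix: rows are users, columns are items,
# 1 = liked, 0 = no interaction yet.
items = ["indie_album", "sci_fi_film", "cooking_show", "jazz_record"]
ratings = {
    "you":      [1, 1, 0, 0],
    "stranger": [1, 1, 0, 1],
    "other":    [0, 0, 1, 0],
}

def cosine(u, v):
    """Cosine similarity of two rows; 0.0 when either row is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def recommend(user):
    # Find the most similar other user ...
    neighbor = max((n for n in ratings if n != user),
                   key=lambda n: cosine(ratings[user], ratings[n]))
    # ... and suggest what they liked that this user has not touched.
    return [item for item, mine, theirs
            in zip(items, ratings[user], ratings[neighbor])
            if theirs and not mine]

print(recommend("you"))
# prints: ['jazz_record']
```

A brand new user would have an all-zero row, so `cosine` returns 0.0 against everyone and no neighbor is meaningfully closer than another, which is the cold start problem in miniature.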
Or if a musician uploads a 221 00:11:06.480 --> 00:11:10.000 brand new track five seconds ago, it has zero listener data. 222 00:11:10.639 --> 00:11:13.000 How does this system ever recommend it? Doesn't the math 223 00:11:13.120 --> 00:11:13.799 just break down? 224 00:11:14.200 --> 00:11:16.240 The math does indeed break down there. The user-item 225 00:11:16.279 --> 00:11:20.919 matrix becomes too sparse. It's like a giant spreadsheet where 226 00:11:21.039 --> 00:11:23.559 ninety-nine percent of the cells are just empty. You 227 00:11:23.600 --> 00:11:25.759 can't calculate a vector from nothing. So 228 00:11:25.720 --> 00:11:27.639 what's the workaround? Well, this is why the state 229 00:11:27.639 --> 00:11:30.360 of the art approach relies on hybrid models. They layer 230 00:11:30.399 --> 00:11:33.960 collaborative and content-based filtering together, and then they integrate 231 00:11:34.000 --> 00:11:37.159 context from the Internet of Things, or IoT. 232 00:11:37.080 --> 00:11:40.120 Right, they pull in real-world unstructured data, and the 233 00:11:40.159 --> 00:11:43.759 source text actually has this incredible real-world case study 234 00:11:44.000 --> 00:11:47.200 to prove how powerful this is. It's the story of 235 00:11:47.200 --> 00:11:48.240 miss Swati preside. 236 00:11:48.360 --> 00:11:51.440 Yes, it's a perfect illustration of how predictive analytics has 237 00:11:51.480 --> 00:11:54.240 evolved from just tracking what you clicked yesterday. 238 00:11:54.440 --> 00:11:55.639 So set the stage for us. 239 00:11:55.720 --> 00:11:59.039 Yeah, there was an AI engine named Missin developed by 240 00:11:59.320 --> 00:12:02.559 ic Terra Science, and its goal was to predict future talent. 241 00:12:02.720 --> 00:12:04.919 So it didn't just look at a sparse matrix of 242 00:12:05.000 --> 00:12:08.919 song ratings. It utilized natural language processing, or NLP, to 243 00:12:09.000 --> 00:12:11.039 analyze her entire digital footprint. 
244 00:12:11.320 --> 00:12:13.960 Okay, so what is the actual mechanism there? How does 245 00:12:14.000 --> 00:12:16.879 an algorithm read a digital footprint and spit out a 246 00:12:16.919 --> 00:12:17.960 prediction for stardom? 247 00:12:18.480 --> 00:12:21.759 So NLP allows the algorithm to map human language to 248 00:12:21.799 --> 00:12:24.879 mathematical weights. The Missin engine scraped the web for her 249 00:12:24.879 --> 00:12:26.879 college performances at engineering fest. 250 00:12:26.960 --> 00:12:28.480 Wow, it went that deep? It 251 00:12:28.399 --> 00:12:31.879 did, and it analyzed the semantic sentiment of the lyrics 252 00:12:31.960 --> 00:12:36.000 she was singing, basically calculating the emotional resonance of her words. 253 00:12:36.399 --> 00:12:39.799 On top of that, it tracked her social media interactions, mapping 254 00:12:39.840 --> 00:12:42.279 the velocity and the sentiment of the comments around her. 255 00:12:42.759 --> 00:12:46.360 So it's assigning mathematical values to the emotional reaction she's 256 00:12:46.399 --> 00:12:50.600 generating online and then comparing that shape to the historical 257 00:12:50.679 --> 00:12:52.480 data of artists who actually made it big. 258 00:12:52.600 --> 00:12:56.159 Exactly. It synthesized all that unstructured context and predicted that 259 00:12:56.200 --> 00:12:59.639 she would make a debut as a playback singer in Bollywood. 260 00:12:59.240 --> 00:13:01.519 Which actually happened. I mean, she ended up singing for 261 00:13:01.519 --> 00:13:05.120 a feature film. The recommendation system wasn't just reacting to 262 00:13:05.200 --> 00:13:10.000 past clicks. It was actively discovering latent human talent by 263 00:13:10.080 --> 00:13:13.000 identifying the mathematical signature of future popularity. 264 00:13:13.120 --> 00:13:16.559 It's a profound shift, really, in how we understand discovery. 265 00:13:16.879 --> 00:13:19.480 These algorithms. 
They're no longer just mirrors showing us what 266 00:13:19.519 --> 00:13:22.919 we already did. They are predictive oracles. They find the 267 00:13:22.960 --> 00:13:25.200 talent and immediately match it with the cluster of users 268 00:13:25.200 --> 00:13:27.080 who are mathematically primed to receive it. 269 00:13:27.080 --> 00:13:29.919 It's brilliant. Oh, but you know, it deals with recommending 270 00:13:30.039 --> 00:13:33.639 or finding one specific thing: one song, one artist. What 271 00:13:33.759 --> 00:13:36.679 happens when the problem isn't picking one thing but trying 272 00:13:36.720 --> 00:13:39.120 to distill thousands of things? I mean, we all have 273 00:13:39.159 --> 00:13:42.159 thousands of photos sitting on our phones right now. How 274 00:13:42.159 --> 00:13:45.399 does an AI look at a massive visual data set 275 00:13:45.559 --> 00:13:48.320 and summarize it without losing the big picture? 276 00:13:48.960 --> 00:13:52.559 You're touching on the immense challenge of image collection summarization. 277 00:13:53.440 --> 00:13:56.679 To process that kind of visual noise, the algorithm has 278 00:13:56.720 --> 00:14:02.919 to choose a summarization philosophy. This material contrasts extractive summarization 279 00:14:03.600 --> 00:14:05.240 with abstractive summarization. 280 00:14:05.399 --> 00:14:07.039 Okay, if we think about this in terms of sports, 281 00:14:07.120 --> 00:14:10.639 extractive summarization would be like the highlight reel. You're pulling 282 00:14:10.639 --> 00:14:15.440 the actual untouched video clips of the best plays. Exactly. 283 00:14:15.679 --> 00:14:18.720 And abstractive would be the sports reporter writing a brand 284 00:14:18.720 --> 00:14:20.320 new article summarizing the game. 285 00:14:20.480 --> 00:14:24.039 That's spot on. Abstractive means the AI extracts the essence 286 00:14:24.080 --> 00:14:26.600 of the data and generates something entirely new, like a 287 00:14:26.639 --> 00:14:30.399 text summary. 
But the researchers note this is highly impractical 288 00:14:30.440 --> 00:14:31.879 for personal image collections. Right. 289 00:14:31.919 --> 00:14:34.360 I don't want an AI to generate a fake composite 290 00:14:34.440 --> 00:14:36.799 image to summarize my actual family vacation. 291 00:14:37.200 --> 00:14:40.120 No, you want your actual photos. So we rely on 292 00:14:40.200 --> 00:14:41.480 extractive summarization. 293 00:14:41.799 --> 00:14:44.080 But how does a computer look at a thousand pixels 294 00:14:44.080 --> 00:14:46.879 and mathematically decide what makes a good highlight? 295 00:14:47.120 --> 00:14:51.679 Well, the text details two main mathematical approaches to extractive summarization. 296 00:14:52.440 --> 00:14:56.120 The first is the similarity-based approach. The goal here 297 00:14:56.240 --> 00:14:58.759 is to find the canonical view. And 298 00:14:58.639 --> 00:15:01.720 a canonical view is what exactly? The definitive angle? 299 00:15:01.879 --> 00:15:06.519 Yes, think of the most universally recognizable angle of the 300 00:15:06.559 --> 00:15:09.759 Eiffel Tower. To find this in your photos, the AI 301 00:15:09.919 --> 00:15:11.399 builds an eigenmodel. 302 00:15:11.480 --> 00:15:15.360 Hold on, eigenmodel sounds incredibly dense. What is that practically doing? 303 00:15:15.440 --> 00:15:17.320 Is it just, like, averaging all the colors together? 304 00:15:17.519 --> 00:15:21.200 Not just colors. It's extracting the structural skeleton of the images. 305 00:15:21.600 --> 00:15:25.480 It maps out multidimensional features, you know, edges, lighting, shapes, 306 00:15:25.679 --> 00:15:28.039 and it plots every photo in mathematical space. 307 00:15:28.080 --> 00:15:28.799 Okay, I'm following. 308 00:15:29.360 --> 00:15:33.080 Then it uses something called cosine similarity. 
This calculates the 309 00:15:33.080 --> 00:15:36.480 geometric angle between the data points. By finding the photos 310 00:15:36.480 --> 00:15:39.320 with the tightest angles to one another, it clusters similar 311 00:15:39.399 --> 00:15:42.840 images together and extracts the one photo sitting dead center 312 00:15:42.879 --> 00:15:43.559 in that cluster. 313 00:15:43.840 --> 00:15:46.159 So it looks at fifty photos of my dog at 314 00:15:46.159 --> 00:15:50.120 the beach, groups them by their structural skeleton, finds the 315 00:15:50.159 --> 00:15:53.840 mathematical dead center, and declares this is the canonical beach 316 00:15:53.919 --> 00:15:54.600 dog photo. 317 00:15:54.879 --> 00:15:56.639 That's the similarity approach. 318 00:15:56.759 --> 00:15:57.639 Yes, yeah. 319 00:15:57.679 --> 00:16:00.639 Now contrast that with the reconstruction-based approach, which actually 320 00:16:00.639 --> 00:16:03.879 treats your photo album like a data compression problem. 321 00:16:04.039 --> 00:16:05.240 Data compression, right. 322 00:16:05.279 --> 00:16:08.279 It uses a dictionary of sparse representations and relies on 323 00:16:08.360 --> 00:16:11.159 minimizing something called L2 norm error. 324 00:16:11.320 --> 00:16:14.120 Okay, L2 norm error. I need an analogy here 325 00:16:14.159 --> 00:16:16.519 to wrap my head around that. Think of L2 326 00:16:16.600 --> 00:16:18.759 norm error like freeze-drying a meal. 327 00:16:18.799 --> 00:16:20.720 Freeze-drying, okay. Yeah. 328 00:16:20.360 --> 00:16:22.240 You remove all the water, which is the bulk of 329 00:16:22.279 --> 00:16:25.039 the weight, to store it efficiently, and if you add water back 330 00:16:25.120 --> 00:16:28.480 later and the meal tastes exactly like the original, the 331 00:16:28.639 --> 00:16:31.039 error in your freeze-drying process is zero. 332 00:16:31.279 --> 00:16:33.799 That is actually a highly accurate way to look at it. 
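NOTE The similarity-based approach described a moment ago (measure the angle between photos, then pull out the one sitting dead center) can be sketched as follows. The feature vectors and photo names are invented for illustration, and picking the photo with the highest average cosine similarity to the others is one assumed reading of "dead center"; the source chapter may define the center differently.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two nonzero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Hypothetical 2-D feature vectors for three photos of the same landmark.
photos = {
    "wide_shot": [1.0, 0.0],
    "classic_angle": [1.0, 1.0],
    "close_up": [0.0, 1.0],
}

def canonical_view(cluster):
    # The "dead center" photo is the one most similar, on average, to the rest.
    def mean_similarity(name):
        others = [v for n, v in cluster.items() if n != name]
        return sum(cosine(cluster[name], v) for v in others) / len(others)
    return max(cluster, key=mean_similarity)

print(canonical_view(photos))
# prints: classic_angle
```

Here the middle vector sits at 45 degrees to both of the others, so it wins; the two extremes are at 90 degrees to each other and score lower.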
333 00:16:33.840 --> 00:16:36.360 The algorithm is freeze-drying your photo album. It asks 334 00:16:36.399 --> 00:16:39.799 a purely mathematical question: if I only keep these five 335 00:16:39.799 --> 00:16:42.000 photos out of one hundred, can I use their specific 336 00:16:42.080 --> 00:16:45.559 mathematical features to perfectly reconstruct the data of the missing 337 00:16:45.679 --> 00:16:48.919 ninety-five? The five photos become the basis set, and 338 00:16:48.960 --> 00:16:51.720 the L2 norm error is simply the mathematical difference 339 00:16:51.759 --> 00:16:56.399 between your original massive album and the algorithm's estimation. If 340 00:16:56.440 --> 00:16:59.720 the error is tiny, the summary is highly representative. 341 00:17:00.039 --> 00:17:01.879 But wait, putting myself in your shoes for a second, 342 00:17:01.879 --> 00:17:05.279 looking at my own camera roll, pure math doesn't understand sentiment. 343 00:17:05.559 --> 00:17:06.400 No, it doesn't. 344 00:17:06.519 --> 00:17:09.440 If the AI just optimizes for this L2 norm error, 345 00:17:09.920 --> 00:17:14.160 it might pick five technically perfect photos that completely miss 346 00:17:14.240 --> 00:17:17.519 the emotional point of my trip. Like, my favorite photo 347 00:17:17.599 --> 00:17:20.680 might be blurry or off-center. Isn't a highlight reel 348 00:17:20.839 --> 00:17:22.119 incredibly subjective? 349 00:17:22.240 --> 00:17:25.319 This is a crucial limitation. It really is. If you 350 00:17:25.480 --> 00:17:29.200 only use pure geometry, you get a mathematically perfect summary 351 00:17:29.279 --> 00:17:32.400 that feels totally alien to a human. And that is 352 00:17:32.440 --> 00:17:36.759 exactly why the researchers introduce task-specific summarization. 353 00:17:36.319 --> 00:17:38.559 Meaning the AI needs to know why you want the 354 00:17:38.559 --> 00:17:39.960 summary before it does the math. 
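NOTE The reconstruction question posed above (keep a few photos, rebuild the rest, measure the squared L2 error) can be tried on a toy album. This is a deliberately simplified sketch and not the chapter's method: every photo is rebuilt from the best scalar multiple of a single kept photo (a sparsity of one), the kept set is found by brute force, and the feature vectors and function names are hypothetical. It also assumes no feature vector is all zeros.

```python
from itertools import combinations

def residual_sq(x, b):
    """Squared L2 error after approximating x by the best multiple of b."""
    bb = sum(v * v for v in b)
    coef = sum(p * q for p, q in zip(x, b)) / bb
    return sum((p - coef * q) ** 2 for p, q in zip(x, b))

def reconstruction_error(album, kept):
    # Each photo is rebuilt from whichever kept photo fits it best.
    return sum(min(residual_sq(x, album[k]) for k in kept) for x in album)

def best_summary(album, size):
    # Try every subset of the given size and keep the lowest-error one.
    return min(combinations(range(len(album)), size),
               key=lambda kept: reconstruction_error(album, kept))

# Hypothetical 2-D feature vectors: three beach shots, two city shots.
album = [[1.0, 0.1], [2.0, 0.2], [0.9, 0.1], [0.1, 1.0], [0.2, 2.1]]
kept = best_summary(album, 2)
print(kept, round(reconstruction_error(album, kept), 6))
```

With two slots, the search ends up keeping one beach photo and one city photo, because dropping either group entirely leaves photos that no kept vector can rebuild and the error jumps. Brute force only works for toy sizes; it is used here to keep the idea visible.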
355 00:17:40.119 --> 00:17:43.039 Exactly, it filters the math through a layer of human intent. 356 00:17:43.200 --> 00:17:44.480 So how does it actually do that? 357 00:17:44.640 --> 00:17:48.720 The researchers build a deep learning architecture using a scorer network, 358 00:17:49.200 --> 00:17:52.240 so before it ever clusters a photo, it evaluates every 359 00:17:52.359 --> 00:17:57.480 single image based on three specific criteria: relevance, diversity, and redundancy. 360 00:17:57.799 --> 00:17:59.880 Well, diversity and redundancy make sense, you want different 361 00:18:00.079 --> 00:18:03.440 angles. You obviously don't want five identical pictures of 362 00:18:03.440 --> 00:18:07.960 the same sunset. But how does an algorithm measure subjective relevance? 363 00:18:08.400 --> 00:18:11.960 It uses a pre-trained classifier. The AI takes the 364 00:18:12.000 --> 00:18:16.480 image's mathematical properties, its feature vector, and multiplies it by 365 00:18:16.480 --> 00:18:20.000 a probability score that was generated for your specific task. 366 00:18:20.039 --> 00:18:21.000 Okay, give me an example. 367 00:18:21.119 --> 00:18:23.839 Say your task is, show me the architectural highlights of 368 00:18:23.880 --> 00:18:27.519 my trip. The classifier acts as a filter, boosting the 369 00:18:27.519 --> 00:18:30.480 mathematical weight of buildings and drastically lowering the weight of 370 00:18:30.519 --> 00:18:31.400 selfies or food. 371 00:18:31.559 --> 00:18:34.440 So it's forcing the geometry to respect the context. 372 00:18:34.079 --> 00:18:37.519 Precisely, and the text notes this ensures the summary is 373 00:18:37.920 --> 00:18:40.480 a topologically invariant representation. 374 00:18:40.799 --> 00:18:44.880 Okay, let's ELI5 that, explain like I'm five. Topologically 375 00:18:44.880 --> 00:18:48.160 invariant means what? The shape of the memory survives? 
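NOTE The relevance step described above (multiply each image's feature vector by a task probability from a pre-trained classifier) can be sketched with made-up numbers. Nothing here comes from the source chapter's actual network: the probabilities are hard-coded stand-ins for classifier output, the photo names and vectors are invented, and ranking by the length of the weighted vector is an assumption added only to make the effect visible.

```python
# Hypothetical classifier outputs: probability that each photo matches the
# task "architectural highlights". A real system would get these from a
# pre-trained classifier; here they are made-up numbers.
task_probability = {"cathedral": 0.95, "selfie": 0.10, "lunch": 0.05}

# Hypothetical feature vectors for the same photos.
features = {
    "cathedral": [0.6, 0.8],
    "selfie": [0.9, 0.4],
    "lunch": [0.3, 0.7],
}

def weight_by_relevance(features, probabilities):
    """Scale every feature vector by its task-relevance probability."""
    return {name: [probabilities[name] * v for v in vec]
            for name, vec in features.items()}

def magnitude(vec):
    """Euclidean length of a vector."""
    return sum(v * v for v in vec) ** 0.5

weighted = weight_by_relevance(features, task_probability)
ranking = sorted(weighted, key=lambda name: magnitude(weighted[name]), reverse=True)
print(ranking)
# prints: ['cathedral', 'selfie', 'lunch']
```

After weighting, the building keeps almost all of its original length while the selfie and the food photo shrink toward the origin, so any later similarity or reconstruction step is dominated by the photos that fit the task.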
376 00:18:48.559 --> 00:18:52.119 Yes, in topology, you can stretch or shrink an object, 377 00:18:52.440 --> 00:18:54.680 but as long as you don't tear it or punch 378 00:18:54.759 --> 00:18:59.039 new holes in it, its fundamental property, its invariant shape, remains. 379 00:18:59.160 --> 00:19:00.000 Ah, it's beautiful. 380 00:19:00.240 --> 00:19:02.960 By using scorer networks, the AI can shrink a ten 381 00:19:03.039 --> 00:19:06.680 thousand photo album down to ten photos, but the fundamental 382 00:19:06.759 --> 00:19:09.920 shape of your memory, tailored specifically for what you care about, 383 00:19:10.160 --> 00:19:11.559 remains perfectly intact. 384 00:19:11.799 --> 00:19:14.839 You know, it is genuinely remarkable how interconnected all these 385 00:19:14.880 --> 00:19:18.160 concepts are. We started by looking at how AI clones 386 00:19:18.200 --> 00:19:21.480 the structural DNA of a mistake to predict software failure. 387 00:19:22.240 --> 00:19:25.000 Then we moved to how it clusters our behavioral footprints 388 00:19:25.039 --> 00:19:28.640 in an N-dimensional matrix to predict cultural success, and 389 00:19:28.680 --> 00:19:32.119 we finished with how it uses sparse reconstruction and scorer 390 00:19:32.200 --> 00:19:37.480 networks to freeze-dry our visual chaos into perfect, meaningful summaries. 391 00:19:37.079 --> 00:19:39.480 And the thread binding it all together is the mathematics 392 00:19:39.480 --> 00:19:42.960 of relationships. Data doesn't exist in a vacuum. Once an 393 00:19:43.000 --> 00:19:45.640 algorithm understands how a single piece of data relates to 394 00:19:45.640 --> 00:19:48.079 the neighborhood around it, it can predict the future of 395 00:19:48.079 --> 00:19:49.720 that entire neighborhood. 396 00:19:49.480 --> 00:19:52.960 Which brings us entirely back to you, listening right now. 
Every 397 00:19:53.039 --> 00:19:55.559 time you open a streaming app, search your camera roll, 398 00:19:55.839 --> 00:19:58.960 or rely on banking protocols to securely process a transaction, 399 00:19:59.440 --> 00:20:03.960 you are relying on this invisible architecture. It's SMOTE balancing 400 00:20:04.000 --> 00:20:09.200 the scales, collaborative filtering finding your digital neighbors, sparse reconstruction, 401 00:20:09.279 --> 00:20:10.240