WEBVTT

1
00:00:00.360 --> 00:00:03.080
<v Speaker 1>Usually when we talk about making a diagnosis, there's this

2
00:00:03.200 --> 00:00:08.000
<v Speaker 1>expectation of pure mechanical precision.

3
00:00:07.960 --> 00:00:10.400
<v Speaker 2>Right, like a very comforting binary. Exactly.

4
00:00:10.599 --> 00:00:12.160
<v Speaker 1>I mean, think about breaking your arm. You go to

5
00:00:12.160 --> 00:00:14.519
<v Speaker 1>the hospital, they take the X-ray, and you see

6
00:00:14.519 --> 00:00:17.039
<v Speaker 1>that jagged white line on the black film, and the

7
00:00:17.039 --> 00:00:19.079
<v Speaker 1>doctor just points and says, you know, there it is.

8
00:00:19.480 --> 00:00:25.440
<v Speaker 2>Yeah, it's incredibly visible. We have this fundamental human bias

9
00:00:25.519 --> 00:00:28.440
<v Speaker 2>toward things we can see, right, things we can categorize

10
00:00:28.440 --> 00:00:32.000
<v Speaker 2>and just put into neat little boxes: broken or not broken.

11
00:00:32.119 --> 00:00:34.880
<v Speaker 1>But then if you zoom out and look at the

12
00:00:34.920 --> 00:00:37.240
<v Speaker 1>digital world that you are interacting with right now, I

13
00:00:37.240 --> 00:00:40.520
<v Speaker 1>mean the apps on your phone, the movie recommendations popping

14
00:00:40.560 --> 00:00:43.359
<v Speaker 1>up on your TV, or even the software protocols keeping

15
00:00:43.399 --> 00:00:44.479
<v Speaker 1>your bank account secure.

16
00:00:44.560 --> 00:00:47.840
<v Speaker 2>Yeah, suddenly that X-ray machine is just entirely useless.

17
00:00:47.479 --> 00:00:51.640
<v Speaker 1>Completely useless, because when you engage with modern technology, you

18
00:00:51.679 --> 00:00:55.719
<v Speaker 1>are stepping inside this invisible architecture. You're completely surrounded by

19
00:00:55.759 --> 00:00:59.679
<v Speaker 1>these complex decisions and predictions that are being made in

20
00:00:59.679 --> 00:01:01.359
<v Speaker 1>the absolute dark, and

21
00:01:01.439 --> 00:01:05.120
<v Speaker 2>The sheer volume of data flowing through that architecture is

22
00:01:05.200 --> 00:01:09.239
<v Speaker 2>so massive. I mean, human eyes couldn't possibly find those

23
00:01:09.359 --> 00:01:12.159
<v Speaker 2>jagged white lines even if they knew exactly what to

24
00:01:12.200 --> 00:01:12.519
<v Speaker 2>look for.

25
00:01:12.680 --> 00:01:14.719
<v Speaker 1>Right. The scale of it all just demands that we

26
00:01:15.000 --> 00:01:18.359
<v Speaker 1>rely on algorithms to do the spotting for us, which

27
00:01:18.400 --> 00:01:21.599
<v Speaker 1>honestly is exactly why we are diving into the material

28
00:01:21.640 --> 00:01:22.519
<v Speaker 1>you sent over today.

29
00:01:22.719 --> 00:01:26.120
<v Speaker 2>It's a really fascinating collection of research, it really is.

30
00:01:26.519 --> 00:01:30.239
<v Speaker 1>So we're looking at excerpts from this incredibly dense but

31
00:01:30.359 --> 00:01:35.959
<v Speaker 1>honestly illuminating academic compilation. It's titled Data Visualization and Knowledge

32
00:01:36.000 --> 00:01:40.120
<v Speaker 1>Engineering: Spotting Data Points with Artificial Intelligence, and we are

33
00:01:40.120 --> 00:01:44.560
<v Speaker 1>pulling from three distinct research chapters today, spanning software engineering,

34
00:01:44.799 --> 00:01:47.599
<v Speaker 1>multimedia recommendation engines, and computer vision.

35
00:01:47.519 --> 00:01:49.879
<v Speaker 2>Right, which sounds like three completely different worlds.

36
00:01:50.000 --> 00:01:52.000
<v Speaker 1>Yeah, but our mission for this deep dive is to

37
00:01:52.040 --> 00:01:54.760
<v Speaker 1>show you how they're connected. By the end of this conversation,

38
00:01:54.799 --> 00:01:58.319
<v Speaker 1>you're going to understand the brilliant, completely silent mathematics that

39
00:01:58.400 --> 00:02:01.719
<v Speaker 1>decide what you see, hear, and use every single day.

40
00:02:01.959 --> 00:02:05.599
<v Speaker 2>Because what stands out immediately across all these seemingly disparate

41
00:02:05.719 --> 00:02:11.319
<v Speaker 2>fields is the shared underlying logic. These systems all basically

42
00:02:11.360 --> 00:02:15.759
<v Speaker 2>rely on taking an overwhelmingly chaotic environment, finding the mathematical

43
00:02:15.759 --> 00:02:18.360
<v Speaker 2>neighbors or the hidden patterns within it, and then using

44
00:02:18.400 --> 00:02:21.120
<v Speaker 2>that specific geometry to predict a future outcome.

45
00:02:21.199 --> 00:02:23.240
<v Speaker 1>Okay, so let's start right at the foundation of that

46
00:02:23.360 --> 00:02:26.879
<v Speaker 1>digital world, which is the code itself. Before an AI

47
00:02:27.000 --> 00:02:31.680
<v Speaker 1>can, say, curate your evening entertainment or organize your vacation photos,

48
00:02:32.319 --> 00:02:35.960
<v Speaker 1>the underlying software running those platforms has to actually function.

49
00:02:36.080 --> 00:02:38.360
<v Speaker 2>It has to work, yeah, which brings up a really

50
00:02:38.479 --> 00:02:40.680
<v Speaker 2>fascinating problem for developers.

51
00:02:40.199 --> 00:02:43.000
<v Speaker 1>Right, because when a tech company has millions of lines

52
00:02:43.000 --> 00:02:47.680
<v Speaker 1>of code, they obviously can't manually test every single permutation

53
00:02:47.759 --> 00:02:48.400
<v Speaker 1>before launch.

54
00:02:48.560 --> 00:02:52.199
<v Speaker 2>They absolutely cannot. They have to optimize their quality assurance resources.

55
00:02:52.560 --> 00:02:56.719
<v Speaker 2>So historically, developers relied heavily on something called WPDP.

56
00:02:56.280 --> 00:02:58.120
<v Speaker 1>Which is within-project defect prediction.

57
00:02:58.479 --> 00:03:02.280
<v Speaker 2>Exactly, within-project defect prediction. And the mechanism there is

58
00:03:02.879 --> 00:03:06.400
<v Speaker 2>it's fairly intuitive. If version one point zero of your

59
00:03:06.400 --> 00:03:10.639
<v Speaker 2>software crashed because of, let's say, a memory leak in

60
00:03:10.719 --> 00:03:12.759
<v Speaker 2>a specific login module.

61
00:03:12.639 --> 00:03:15.240
<v Speaker 1>The model just learns to aggressively check that exact same

62
00:03:15.280 --> 00:03:18.280
<v Speaker 1>login module when you build version two point zero. Right.

63
00:03:18.319 --> 00:03:20.840
<v Speaker 2>It scrutinizes the historical weak points, which.

64
00:03:20.680 --> 00:03:23.080
<v Speaker 1>Makes total sense. If you actually have a version one

65
00:03:23.080 --> 00:03:26.319
<v Speaker 1>point zero, you're learning from your own past mistakes. But

66
00:03:26.400 --> 00:03:28.599
<v Speaker 1>if you are launching a brand new piece of software,

67
00:03:28.759 --> 00:03:31.159
<v Speaker 1>you have zero past data. I mean, you are flying

68
00:03:31.199 --> 00:03:32.000
<v Speaker 1>completely blind.

69
00:03:32.120 --> 00:03:34.280
<v Speaker 2>You are. And that is where the shift to CPDP

70
00:03:34.400 --> 00:03:36.120
<v Speaker 2>comes in. That's the frontier right now.

71
00:03:36.039 --> 00:03:37.439
<v Speaker 1>Cross-project defect prediction.

72
00:03:37.680 --> 00:03:40.599
<v Speaker 2>Yes, so instead of relying on your own nonexistent history,

73
00:03:40.919 --> 00:03:45.000
<v Speaker 2>the algorithm uses massive sets of training data from completely

74
00:03:45.039 --> 00:03:48.520
<v Speaker 2>different outside software projects to find the hidden bugs in

75
00:03:48.560 --> 00:03:49.280
<v Speaker 2>your new code.

76
00:03:49.400 --> 00:03:51.439
<v Speaker 1>Okay, let's unpack this for a second, because the logic

77
00:03:51.479 --> 00:03:53.599
<v Speaker 1>here is just wild to me. This is basically like

78
00:03:54.120 --> 00:03:56.120
<v Speaker 1>trying to predict where the plumbing is going to leak

79
00:03:56.120 --> 00:03:57.879
<v Speaker 1>in a brand new, half-built

80
00:03:57.599 --> 00:04:01.680
<v Speaker 2>skyscraper by studying the plumbing failures of a completely different skyscraper

81
00:04:01.680 --> 00:04:03.199
<v Speaker 2>across town. Exactly.

82
00:04:03.280 --> 00:04:04.680
<v Speaker 1>I mean, how does that even work?

83
00:04:05.000 --> 00:04:08.800
<v Speaker 2>It's actually a brilliant way to conceptualize it, your skyscraper analogy.

84
00:04:09.400 --> 00:04:13.000
<v Speaker 2>You are working on the assumption that because both structures

85
00:04:13.159 --> 00:04:16.720
<v Speaker 2>use, you know, pipes, water pressure, and gravity, the physical

86
00:04:16.720 --> 00:04:19.360
<v Speaker 2>stress points will behave similarly, even if.

87
00:04:19.279 --> 00:04:21.240
<v Speaker 1>The architectural floor plans are wildly different.

88
00:04:21.360 --> 00:04:25.079
<v Speaker 2>Exactly, and the source material mentions four specific ways they

89
00:04:25.120 --> 00:04:27.720
<v Speaker 2>set up this cross-project training. Right.

90
00:04:27.680 --> 00:04:31.600
<v Speaker 1>I have them here. It's strict, mixed, mixed with target

91
00:04:31.639 --> 00:04:33.079
<v Speaker 1>class, and pairwise.

92
00:04:33.000 --> 00:04:37.120
<v Speaker 2>So strict means the training data is completely blind to

93
00:04:37.120 --> 00:04:41.079
<v Speaker 2>your new software. It only uses outside projects, period. Okay.

94
00:04:41.279 --> 00:04:45.439
<v Speaker 2>Mixed folds in older, perhaps slightly related projects alongside the

95
00:04:45.480 --> 00:04:48.639
<v Speaker 2>outside data. Now, mixed with target class is really interesting

96
00:04:48.680 --> 00:04:51.639
<v Speaker 2>because it takes a tiny labeled sample from your current

97
00:04:51.720 --> 00:04:54.240
<v Speaker 2>unfinished project to give the algorithm just a slight hint

98
00:04:54.319 --> 00:04:56.160
<v Speaker 2>about your specific architecture. Kind of like

99
00:04:56.160 --> 00:04:58.800
<v Speaker 1>showing it a rough blueprint before it checks the pipes, right.

100
00:04:59.120 --> 00:05:01.600
<v Speaker 2>And then pairwise is a strict one to one mapping.

101
00:05:01.759 --> 00:05:04.920
<v Speaker 2>The model is trained entirely on one single outside project

102
00:05:05.040 --> 00:05:06.800
<v Speaker 2>and then tested entirely on yours.

103
00:05:07.160 --> 00:05:09.680
<v Speaker 1>But I'm trying to visualize what the AI is actually

104
00:05:09.680 --> 00:05:12.480
<v Speaker 1>looking at here, because it's not reading the code like

105
00:05:12.680 --> 00:05:15.680
<v Speaker 1>a human programmer, right? Yeah, it's not scanning for a

106
00:05:15.720 --> 00:05:16.759
<v Speaker 1>missing semicolon.

107
00:05:16.959 --> 00:05:20.319
<v Speaker 2>No, no, it's looking at structural metrics. The text highlights

108
00:05:20.319 --> 00:05:23.720
<v Speaker 2>something called CK metrics, which measure the complexity of

109
00:05:23.720 --> 00:05:24.720
<v Speaker 2>object-oriented software.

110
00:05:25.000 --> 00:05:26.720
<v Speaker 1>What's an example of a CK metric?

111
00:05:26.959 --> 00:05:29.920
<v Speaker 2>A good example is the depth of inheritance tree.

112
00:05:30.120 --> 00:05:32.720
<v Speaker 1>Depth of inheritance tree. Okay, what does that mean practically?

113
00:05:33.160 --> 00:05:36.959
<v Speaker 2>Well, imagine code like a family tree. If a piece

114
00:05:37.000 --> 00:05:41.160
<v Speaker 2>of code inherits traits from say, ten generations of parent

115
00:05:41.199 --> 00:05:44.279
<v Speaker 2>code above it, it is deeply nested.

116
00:05:44.480 --> 00:05:46.639
<v Speaker 1>Oh, I see. And if you change one thing at

117
00:05:46.639 --> 00:05:49.199
<v Speaker 1>the very top of that ten generation tree, it probably

118
00:05:49.199 --> 00:05:50.800
<v Speaker 1>just breaks everything at the bottom.

119
00:05:50.519 --> 00:05:53.759
<v Speaker 2>Exactly the point. It's incredibly fragile. Or the AI looks

120
00:05:53.759 --> 00:05:57.800
<v Speaker 2>at something like weighted methods per class, which basically measures

121
00:05:57.839 --> 00:06:00.560
<v Speaker 2>how many different operations a single piece of code is

122
00:06:00.600 --> 00:06:01.800
<v Speaker 2>trying to juggle all at once.

123
00:06:01.959 --> 00:06:04.519
<v Speaker 1>So the algorithm isn't looking for a broken line of code,

124
00:06:04.959 --> 00:06:07.199
<v Speaker 1>it's scanning for structural fragility.

125
00:06:07.360 --> 00:06:12.600
<v Speaker 2>Yes. Mathematically, extreme complexity is basically the breeding ground for bugs.

126
00:06:12.920 --> 00:06:15.360
<v Speaker 1>Okay, I have to push back here though, just putting

127
00:06:15.360 --> 00:06:18.240
<v Speaker 1>myself in the shoes of the engineers. If a commercial

128
00:06:18.279 --> 00:06:24.240
<v Speaker 1>software project is, say, mostly successful, wouldn't bugs be incredibly rare?

129
00:06:24.720 --> 00:06:26.279
<v Speaker 2>They are, relatively speaking.

130
00:06:26.160 --> 00:06:27.959
<v Speaker 1>Right. So, say ninety nine percent of the code is

131
00:06:28.000 --> 00:06:32.680
<v Speaker 1>structurally sound and only one percent is actually defective. If

132
00:06:32.720 --> 00:06:35.879
<v Speaker 1>you feed an AI that data, doesn't the math just break?

133
00:06:36.399 --> 00:06:38.639
<v Speaker 1>I mean, the AI could literally just look at any

134
00:06:38.639 --> 00:06:43.079
<v Speaker 1>line of code, blindly guess "no bug," and be mathematically correct

135
00:06:43.240 --> 00:06:44.680
<v Speaker 1>ninety nine percent of the time.

136
00:06:45.120 --> 00:06:48.600
<v Speaker 2>You've just identified, honestly, one of the most notorious hurdles

137
00:06:48.639 --> 00:06:51.639
<v Speaker 2>in machine learning. It's called the class imbalance problem.

138
00:06:51.800 --> 00:06:53.000
<v Speaker 1>Class imbalance problem.

139
00:06:53.079 --> 00:06:56.839
<v Speaker 2>Yeah, when one outcome is overwhelmingly common, the algorithm just

140
00:06:56.879 --> 00:07:00.879
<v Speaker 2>takes the path of least mathematical resistance. It learns to ignore

141
00:07:00.920 --> 00:07:03.759
<v Speaker 2>the rare anomaly, the bug, because optimizing for the ninety

142
00:07:03.839 --> 00:07:07.279
<v Speaker 2>nine percent yields a fantastic accuracy score on paper.

143
00:07:07.439 --> 00:07:09.439
<v Speaker 1>So how do they actually solve that? Because you can't

144
00:07:09.439 --> 00:07:11.360
<v Speaker 1>just copy and paste that one rare bug one hundred

145
00:07:11.399 --> 00:07:13.160
<v Speaker 1>times to balance the spreadsheet, right? That seems like it

146
00:07:13.199 --> 00:07:16.439
<v Speaker 1>would just teach the AI to memorize one specific mistake,

147
00:07:16.639 --> 00:07:17.160
<v Speaker 1>and you'd.

148
00:07:16.959 --> 00:07:20.240
<v Speaker 2>be totally right. Oversampling by just copying data does

149
00:07:20.279 --> 00:07:24.360
<v Speaker 2>exactly that. The AI memorizes the duplicate, it overfits to it,

150
00:07:24.639 --> 00:07:28.040
<v Speaker 2>and then becomes entirely useless at finding new types of bugs.

151
00:07:28.240 --> 00:07:29.399
<v Speaker 1>Okay, so what's the fix.

152
00:07:29.560 --> 00:07:35.480
<v Speaker 2>Instead, the researchers utilized a highly sophisticated statistical technique called SMOTE.

153
00:07:35.240 --> 00:07:39.160
<v Speaker 1>Which stands for synthetic minority over-sampling technique.

154
00:07:39.240 --> 00:07:42.199
<v Speaker 2>Yes, and SMOTE doesn't duplicate. What it does is calculate

155
00:07:42.240 --> 00:07:45.480
<v Speaker 2>the mathematical distance between the rare bug data points in

156
00:07:45.600 --> 00:07:46.800
<v Speaker 2>multidimensional space.

157
00:07:46.959 --> 00:07:50.399
<v Speaker 1>Whoa, multidimensional space. Okay, slow down.

158
00:07:50.279 --> 00:07:53.680
<v Speaker 2>Let's simplify it. Imagine a scatter plot graph with two

159
00:07:53.839 --> 00:07:56.600
<v Speaker 2>real bugs plotted on it. SMOTE draws a line

160
00:07:56.639 --> 00:08:00.480
<v Speaker 2>between those two points and mathematically synthesizes an entirely new

161
00:08:00.600 --> 00:08:02.480
<v Speaker 2>artificial bug somewhere along that line.

162
00:08:02.480 --> 00:08:04.720
<v Speaker 1>Oh wow. Wait, really, so they aren't just finding bugs.

163
00:08:04.759 --> 00:08:08.399
<v Speaker 1>They're essentially cloning the DNA of a mistake. Exactly. They

164
00:08:08.399 --> 00:08:13.040
<v Speaker 1>are hallucinating highly realistic structural flaws to force the AI

165
00:08:13.120 --> 00:08:14.360
<v Speaker 1>to become a better detective.

166
00:08:14.680 --> 00:08:19.160
<v Speaker 2>It balances the scales not with repetition, but with synthetic diversity.

167
00:08:19.439 --> 00:08:22.240
<v Speaker 2>And when the researchers combined SMOTE with a gradient

168
00:08:22.279 --> 00:08:25.600
<v Speaker 2>boosting algorithm called XGBoost, which by the way, is

169
00:08:25.639 --> 00:08:29.720
<v Speaker 2>exceptional at handling complex tabular data, their cross project prediction

170
00:08:29.759 --> 00:08:32.080
<v Speaker 2>accuracy reached up to eighty eight percent.

171
00:08:32.440 --> 00:08:36.000
<v Speaker 1>Eighty eight percent. It completely flips how I thought quality

172
00:08:36.000 --> 00:08:40.240
<v Speaker 1>assurance worked. It proves that algorithms can successfully predict structural

173
00:08:40.279 --> 00:08:42.600
<v Speaker 1>failure just by studying the mathematical neighborhood.

174
00:08:42.840 --> 00:08:44.039
<v Speaker 2>It does, and I.

175
00:08:44.000 --> 00:08:47.000
<v Speaker 1>mean, if AI can synthesize fake data to fix broken code,

176
00:08:47.039 --> 00:08:49.639
<v Speaker 1>it raises a much bigger question for me. Can we

177
00:08:49.639 --> 00:08:52.679
<v Speaker 1>apply that exact same neighborly logic to human behavior.

178
00:08:52.759 --> 00:08:55.759
<v Speaker 2>Oh, absolutely, which takes us straight into the mechanics of

179
00:08:55.799 --> 00:08:59.720
<v Speaker 2>recommendation systems, you know, the systems deciding what song, product

180
00:08:59.799 --> 00:09:03.440
<v Speaker 2>or movie you interact with next. Broadly speaking, the industry

181
00:09:03.480 --> 00:09:07.960
<v Speaker 2>relies on two philosophies: content-based filtering and collaborative filtering.

182
00:09:08.039 --> 00:09:10.200
<v Speaker 1>Content-based seems pretty intuitive to me. If I watch

183
00:09:10.200 --> 00:09:13.799
<v Speaker 1>a documentary about, say, deep sea diving, the algorithm tags

184
00:09:13.840 --> 00:09:17.440
<v Speaker 1>the features, yeah, like ocean, submarines, marine biology, and then

185
00:09:17.440 --> 00:09:20.240
<v Speaker 1>it just recommends another documentary with those same exact tags.

186
00:09:20.720 --> 00:09:25.519
<v Speaker 2>Yeah, it's essentially property matching. The limitation, however, is that

187
00:09:25.879 --> 00:09:30.639
<v Speaker 2>content-based filtering traps you in a very predictable bubble.

188
00:09:31.200 --> 00:09:33.879
<v Speaker 2>It has no mechanism to surprise you with something outside

189
00:09:33.919 --> 00:09:35.080
<v Speaker 2>of those literal

190
00:09:34.720 --> 00:09:36.960
<v Speaker 1>tags. Right, you're just stuck in a submarine loop

191
00:09:36.799 --> 00:09:40.200
<v Speaker 2>Forever, exactly. And that is why platforms pivot heavily toward

192
00:09:40.240 --> 00:09:41.320
<v Speaker 2>collaborative filtering.

193
00:09:41.440 --> 00:09:44.399
<v Speaker 1>And this is where the math gets really interesting.

194
00:09:44.000 --> 00:09:47.360
<v Speaker 2>Because collaborative filtering doesn't actually care what the movie or

195
00:09:47.480 --> 00:09:51.120
<v Speaker 2>song is about. It completely ignores the content tags.

196
00:09:51.200 --> 00:09:53.120
<v Speaker 1>Wait, it ignores them entirely.

197
00:09:53.039 --> 00:09:55.679
<v Speaker 2>Entirely. It only cares about the behavioral patterns of the

198
00:09:55.720 --> 00:09:58.960
<v Speaker 2>people consuming it. It takes all of your clicks, your views,

199
00:09:58.960 --> 00:10:02.679
<v Speaker 2>and ratings and plots them on this massive mathematical grid

200
00:10:02.720 --> 00:10:06.279
<v Speaker 2>called a user-item matrix. Okay. Then it uses clustering

201
00:10:06.320 --> 00:10:09.360
<v Speaker 2>algorithms like k-means clustering to map you into a

202
00:10:09.360 --> 00:10:13.919
<v Speaker 2>specific locality of other users who share your precise behavioral footprint.

203
00:10:14.080 --> 00:10:17.080
<v Speaker 1>So collaborative filtering is basically like walking into a massive,

204
00:10:17.120 --> 00:10:20.639
<v Speaker 1>crowded party, finding the one total stranger who likes the

205
00:10:20.720 --> 00:10:24.320
<v Speaker 1>exact same weird indie band as you, and then blindly

206
00:10:24.360 --> 00:10:26.639
<v Speaker 1>trusting their movie recommendation for the rest of the night.

207
00:10:26.919 --> 00:10:29.759
<v Speaker 2>That's it. But it goes even further than that. The

208
00:10:29.799 --> 00:10:33.720
<v Speaker 2>AI assumes that your agreement on past choices is actually

209
00:10:33.720 --> 00:10:36.159
<v Speaker 2>a mathematical vector pointing toward your next choice.

210
00:10:36.159 --> 00:10:37.120
<v Speaker 1>Meaning what exactly?

211
00:10:37.240 --> 00:10:40.120
<v Speaker 2>Meaning, if you and this cluster of strangers agreed on

212
00:10:40.120 --> 00:10:44.120
<v Speaker 2>your last fifty interactions, the system is statistically confident you

213
00:10:44.120 --> 00:10:46.960
<v Speaker 2>will enjoy the fifty first thing they liked, even if

214
00:10:47.000 --> 00:10:49.440
<v Speaker 2>it's a completely different genre that you've never even explored.

215
00:10:50.360 --> 00:10:52.799
<v Speaker 1>But wait, looking at the source material, what happens when

216
00:10:52.840 --> 00:10:56.320
<v Speaker 1>there is no history to match? Like, the text brings

217
00:10:56.399 --> 00:10:57.960
<v Speaker 1>up the cold start problem.

218
00:10:58.159 --> 00:11:01.000
<v Speaker 2>Ah, yes, the cold start. Right, because if I am

219
00:11:01.080 --> 00:11:03.480
<v Speaker 2>a brand new user, my row on that user-item

220
00:11:03.480 --> 00:11:06.440
<v Speaker 2>matrix is completely blank. Or if a musician uploads a

221
00:11:06.480 --> 00:11:10.000
<v Speaker 2>brand new track five seconds ago, it has zero listener data.

222
00:11:10.639 --> 00:11:13.000
<v Speaker 2>How does this system ever recommend it? Doesn't the math

223
00:11:13.120 --> 00:11:13.799
<v Speaker 2>just break down?

224
00:11:14.200 --> 00:11:16.240
<v Speaker 1>The math does indeed break down there. The user-item

225
00:11:16.279 --> 00:11:20.919
<v Speaker 1>matrix becomes too sparse. It's like a giant spreadsheet where

226
00:11:21.039 --> 00:11:23.559
<v Speaker 1>ninety nine percent of the cells are just empty. You

227
00:11:23.600 --> 00:11:25.759
<v Speaker 1>can't calculate a vector from nothing, So.

228
00:11:25.720 --> 00:11:27.639
<v Speaker 2>What's the workaround? Well, this is why the

229
00:11:27.639 --> 00:11:30.360
<v Speaker 2>state-of-the-art approach relies on hybrid models. They layer

230
00:11:30.399 --> 00:11:33.960
<v Speaker 2>collaborative and content-based filtering together, and then they integrate

231
00:11:34.000 --> 00:11:37.159
<v Speaker 2>context from the Internet of Things or IoT.

232
00:11:37.080 --> 00:11:40.120
<v Speaker 1>Right, they pull in real world unstructured data and the

233
00:11:40.159 --> 00:11:43.759
<v Speaker 1>source text actually has this incredible real world case study

234
00:11:44.000 --> 00:11:47.200
<v Speaker 1>to prove how powerful this is. Getting the story of

235
00:11:47.200 --> 00:11:48.240
<v Speaker 1>Miss Swati Preside.

236
00:11:48.360 --> 00:11:51.440
<v Speaker 2>Yes, it's a perfect illustration of how predictive analytics has

237
00:11:51.480 --> 00:11:54.240
<v Speaker 2>evolved from just tracking what you clicked yesterday.

238
00:11:54.440 --> 00:11:55.639
<v Speaker 1>So you set the stage for us.

239
00:11:55.720 --> 00:11:59.039
<v Speaker 2>Yeah, there was an AI engine named Missin developed by

240
00:11:59.320 --> 00:12:02.559
<v Speaker 2>ic Terra Science and its goal was to predict future talent.

241
00:12:02.720 --> 00:12:04.919
<v Speaker 2>So it didn't just look at a sparse matrix of

242
00:12:05.000 --> 00:12:08.919
<v Speaker 2>song ratings. It utilized natural language processing or NLP, to

243
00:12:09.000 --> 00:12:11.039
<v Speaker 2>analyze her entire digital footprint.

244
00:12:11.320 --> 00:12:13.960
<v Speaker 1>Okay, so what is the actual mechanism there? How does

245
00:12:14.000 --> 00:12:16.879
<v Speaker 1>an algorithm read a digital footprint and spit out a

246
00:12:16.919 --> 00:12:17.960
<v Speaker 1>prediction for stardom?

247
00:12:18.480 --> 00:12:21.759
<v Speaker 2>So NLP allows the algorithm to map human language to

248
00:12:21.799 --> 00:12:24.879
<v Speaker 2>mathematical weights. The Missin engine scraped the web for her

249
00:12:24.879 --> 00:12:26.879
<v Speaker 2>college performances at engineering fests.

250
00:12:26.960 --> 00:12:28.480
<v Speaker 1>Wow, it went that deep?

251
00:12:28.399 --> 00:12:31.879
<v Speaker 2>It did, and it analyzed the semantic sentiment of the lyrics

252
00:12:31.960 --> 00:12:36.000
<v Speaker 2>she was singing, basically calculating the emotional resonance of her words.

253
00:12:36.399 --> 00:12:39.799
<v Speaker 2>On top of that, it tracked her social media interactions, mapping

254
00:12:39.840 --> 00:12:42.279
<v Speaker 2>the velocity and the sentiment of the comments around her.

255
00:12:42.759 --> 00:12:46.360
<v Speaker 1>So it's assigning mathematical values to the emotional reaction she's

256
00:12:46.399 --> 00:12:50.600
<v Speaker 1>generating online and then comparing that shape to the historical

257
00:12:50.679 --> 00:12:52.480
<v Speaker 1>data of artists who actually made it big.

258
00:12:52.600 --> 00:12:56.159
<v Speaker 2>Exactly. It synthesized all that unstructured context and predicted that

259
00:12:56.200 --> 00:12:59.639
<v Speaker 2>she would make a debut as a playback singer in Bollywood.

260
00:12:59.240 --> 00:13:01.519
<v Speaker 1>Which actually happened. I mean, she ended up singing for

261
00:13:01.519 --> 00:13:05.120
<v Speaker 1>a feature film. The recommendation system wasn't just reacting to

262
00:13:05.200 --> 00:13:10.000
<v Speaker 1>past clicks. It was actively discovering latent human talent by

263
00:13:10.080 --> 00:13:13.000
<v Speaker 1>identifying the mathematical signature of future popularity.

264
00:13:13.120 --> 00:13:16.559
<v Speaker 2>It's a profound shift really in how we understand discovery.

265
00:13:16.879 --> 00:13:19.480
<v Speaker 2>These algorithms, they're no longer just mirrors showing us what

266
00:13:19.519 --> 00:13:22.919
<v Speaker 2>we already did. They are predictive oracles. They find the

267
00:13:22.960 --> 00:13:25.200
<v Speaker 2>talent and immediately match it with the cluster of users

268
00:13:25.200 --> 00:13:27.080
<v Speaker 2>who are mathematically primed to receive it.

269
00:13:27.080 --> 00:13:29.919
<v Speaker 1>It's brilliant. Oh, but you know it deals with recommending

270
00:13:30.039 --> 00:13:33.639
<v Speaker 1>or finding one specific thing, one song, one artist. What

271
00:13:33.759 --> 00:13:36.679
<v Speaker 1>happens when the problem isn't picking one thing but trying

272
00:13:36.720 --> 00:13:39.120
<v Speaker 1>to distill thousands of things. I mean, we all have

273
00:13:39.159 --> 00:13:42.159
<v Speaker 1>thousands of photos sitting on our phones right now. How

274
00:13:42.159 --> 00:13:45.399
<v Speaker 1>does an AI look at a massive visual data set

275
00:13:45.559 --> 00:13:48.320
<v Speaker 1>and summarize it without losing the big picture?

276
00:13:48.960 --> 00:13:52.559
<v Speaker 2>You're touching on the immense challenge of image collection summarization.

277
00:13:53.440 --> 00:13:56.679
<v Speaker 2>To process that kind of visual noise, the algorithm has

278
00:13:56.720 --> 00:14:02.919
<v Speaker 2>to choose a summarization philosophy. This material contrasts extractive summarization

279
00:14:03.600 --> 00:14:05.240
<v Speaker 2>with abstractive summarization.

280
00:14:05.399 --> 00:14:07.039
<v Speaker 1>Okay, if we think about this in terms of sports,

281
00:14:07.120 --> 00:14:10.639
<v Speaker 1>extractive summarization would be like the highlight reel. You're pulling

282
00:14:10.639 --> 00:14:15.440
<v Speaker 1>the actual untouched video clips of the best plays. Exactly,

283
00:14:15.679 --> 00:14:18.720
<v Speaker 1>and abstractive would be the sports reporter writing a brand

284
00:14:18.720 --> 00:14:20.320
<v Speaker 1>new article summarizing the game.

285
00:14:20.480 --> 00:14:24.039
<v Speaker 2>That's spot on. Abstractive means the AI extracts the essence

286
00:14:24.080 --> 00:14:26.600
<v Speaker 2>of the data and generates something entirely new, like a

287
00:14:26.639 --> 00:14:30.399
<v Speaker 2>text summary. But the researchers note this is highly impractical

288
00:14:30.440 --> 00:14:31.879
<v Speaker 2>for personal image collections. Right.

289
00:14:31.919 --> 00:14:34.360
<v Speaker 1>I don't want an AI to generate a fake composite

290
00:14:34.440 --> 00:14:36.799
<v Speaker 1>image to summarize my actual family vacation.

291
00:14:37.200 --> 00:14:40.120
<v Speaker 2>No, you want your actual photos. So we rely on

292
00:14:40.200 --> 00:14:41.480
<v Speaker 2>extractive summarization.

293
00:14:41.799 --> 00:14:44.080
<v Speaker 1>But how does a computer look at a thousand pixels

294
00:14:44.080 --> 00:14:46.879
<v Speaker 1>and mathematically decide what makes a good highlight?

295
00:14:47.120 --> 00:14:51.679
<v Speaker 2>Well, the text details two main mathematical approaches to extractive summarization.

296
00:14:52.440 --> 00:14:56.120
<v Speaker 2>The first is the similarity based approach. The goal here

297
00:14:56.240 --> 00:14:58.759
<v Speaker 2>is to find the canonical view.

298
00:14:58.639 --> 00:15:01.720
<v Speaker 1>And a canonical view is what, exactly? The definitive angle?

299
00:15:01.879 --> 00:15:06.519
<v Speaker 2>Yes, think of the most universally recognizable angle of the

300
00:15:06.559 --> 00:15:09.759
<v Speaker 2>Eiffel Tower. To find this in your photos, the AI

301
00:15:09.919 --> 00:15:11.399
<v Speaker 2>builds an eigenmodel.

302
00:15:11.480 --> 00:15:15.360
<v Speaker 1>Hold on, eigenmodel sounds incredibly dense. What is that practically doing?

303
00:15:15.440 --> 00:15:17.320
<v Speaker 1>Is it just like averaging all the colors together.

304
00:15:17.519 --> 00:15:21.200
<v Speaker 2>Not just colors. It's extracting the structural skeleton of the images.

305
00:15:21.600 --> 00:15:25.480
<v Speaker 2>It maps out multidimensional features, you know, edges, lighting, shapes,

306
00:15:25.679 --> 00:15:28.039
<v Speaker 2>and it plots every photo in mathematical space.

307
00:15:28.080 --> 00:15:28.799
<v Speaker 1>Okay, I'm following.

308
00:15:29.360 --> 00:15:33.080
<v Speaker 2>Then it uses something called cosine similarity. This calculates the

309
00:15:33.080 --> 00:15:36.480
<v Speaker 2>geometric angle between the data points. By finding the photos

310
00:15:36.480 --> 00:15:39.320
<v Speaker 2>with the tightest angles to one another. It clusters similar

311
00:15:39.399 --> 00:15:42.840
<v Speaker 2>images together and extracts the one photo sitting dead center

312
00:15:42.879 --> 00:15:43.559
<v Speaker 2>in that cluster.

313
00:15:43.840 --> 00:15:46.159
<v Speaker 1>So it looks at fifty photos of my dog at

314
00:15:46.159 --> 00:15:50.120
<v Speaker 1>the beach, groups them by their structural skeleton, finds the

315
00:15:50.159 --> 00:15:53.840
<v Speaker 1>mathematical dead center, and declares this is the canonical beach

316
00:15:53.919 --> 00:15:54.600
<v Speaker 1>dog photo.

317
00:15:54.879 --> 00:15:56.639
<v Speaker 2>That's the similarity approach.

318
00:15:56.759 --> 00:15:57.639
<v Speaker 1>Yes, yeah.

319
00:15:57.679 --> 00:16:00.639
<v Speaker 2>Now contrast that with the reconstruction-based approach, which actually

320
00:16:00.639 --> 00:16:03.879
<v Speaker 2>treats your photo album like a data compression problem.

321
00:16:04.039 --> 00:16:05.240
<v Speaker 1>Data compression, right.

322
00:16:05.279 --> 00:16:08.279
<v Speaker 2>It uses a dictionary of sparse representations and relies on

323
00:16:08.360 --> 00:16:11.159
<v Speaker 2>minimizing something called L2 norm error.

324
00:16:11.320 --> 00:16:14.120
<v Speaker 1>Okay, L2 norm error. I need an analogy here

325
00:16:14.159 --> 00:16:16.519
<v Speaker 1>to wrap my head around that. Think of L2

326
00:16:16.600 --> 00:16:18.759
<v Speaker 1>norm error like freeze-drying a meal.

327
00:16:18.799 --> 00:16:20.720
<v Speaker 2>Freeze-drying, okay, yeah.

328
00:16:20.360 --> 00:16:22.240
<v Speaker 1>You remove all the water, which is the bulk of

329
00:16:22.279 --> 00:16:25.039
<v Speaker 1>the weight, to store it efficiently, and if you add water back

330
00:16:25.120 --> 00:16:28.480
<v Speaker 1>later and the meal tastes exactly like the original, the

331
00:16:28.639 --> 00:16:31.039
<v Speaker 1>error in your freeze-drying process is zero.

332
00:16:31.279 --> 00:16:33.799
<v Speaker 2>That is actually a highly accurate way to look at it.

333
00:16:33.840 --> 00:16:36.360
<v Speaker 2>The algorithm is freeze-drying your photo album. It asks

334
00:16:36.399 --> 00:16:39.799
<v Speaker 2>a purely mathematical question: if I only keep these five

335
00:16:39.799 --> 00:16:42.000
<v Speaker 2>photos out of one hundred, can I use their specific

336
00:16:42.080 --> 00:16:45.559
<v Speaker 2>mathematical features to perfectly reconstruct the data of the missing

337
00:16:45.679 --> 00:16:48.919
<v Speaker 2>ninety-five? The five photos become the basis set, and

338
00:16:48.960 --> 00:16:51.720
<v Speaker 2>the L2 norm error is simply the mathematical difference

339
00:16:51.759 --> 00:16:56.399
<v Speaker 2>between your original massive album and the algorithm's estimation. If

340
00:16:56.440 --> 00:16:59.720
<v Speaker 2>the error is tiny, the summary is highly representative.
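
NOTE
Editor's sketch, not part of the episode: a deliberately simplified illustration of the reconstruction idea above. It assumes each photo is a feature vector and rebuilds every photo from a single kept photo scaled by its least-squares coefficient (a 1-sparse code), then reports the L2 norm of what is left over. The source text's actual dictionary method is not specified here, and the data and helper names (residual_sq, reconstruction_error) are hypothetical.
```python
import math
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))
def residual_sq(x, atom):
    """Squared L2 error after approximating x by the best scalar multiple of atom."""
    coef = dot(x, atom) / dot(atom, atom)
    return sum((xi - coef * ai) ** 2 for xi, ai in zip(x, atom))
def reconstruction_error(album, kept):
    """L2 norm of the residual when each photo is rebuilt from its best kept photo."""
    return math.sqrt(sum(min(residual_sq(x, album[k]) for k in kept) for x in album))
# Hypothetical feature vectors for four photos (made-up values).
album = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [0.0, 3.0]]
good = reconstruction_error(album, kept=[0, 2])  # one photo from each "direction"
poor = reconstruction_error(album, kept=[0, 1])  # two near-duplicates
print(good, poor)
```
Keeping one photo from each direction rebuilds the whole album with no residual, while keeping two near-duplicates leaves the other photos unexplained, so the first summary has the smaller error.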

341
00:17:00.039 --> 00:17:01.879
<v Speaker 1>But wait, putting myself in your shoes for a second,

342
00:17:01.879 --> 00:17:05.279
<v Speaker 1>looking at my own camera roll, pure math doesn't understand sentiment.

343
00:17:05.559 --> 00:17:06.400
<v Speaker 2>No, it doesn't.

344
00:17:06.519 --> 00:17:09.440
<v Speaker 1>If the AI just optimizes for this L2 norm error,

345
00:17:09.920 --> 00:17:14.160
<v Speaker 1>it might pick five technically perfect photos that completely miss

346
00:17:14.240 --> 00:17:17.519
<v Speaker 1>the emotional point of my trip. Like, my favorite photo

347
00:17:17.599 --> 00:17:20.680
<v Speaker 1>might be blurry or off-center. Isn't a highlight reel

348
00:17:20.839 --> 00:17:22.119
<v Speaker 1>incredibly subjective?

349
00:17:22.240 --> 00:17:25.319
<v Speaker 2>This is a crucial limitation. It really is. If you

350
00:17:25.480 --> 00:17:29.200
<v Speaker 2>only use pure geometry, you get a mathematically perfect summary

351
00:17:29.279 --> 00:17:32.400
<v Speaker 2>that feels totally alien to a human. And that is

352
00:17:32.440 --> 00:17:36.759
<v Speaker 2>exactly why the researchers introduce task-specific summarization.

353
00:17:36.319 --> 00:17:38.559
<v Speaker 1>Meaning the AI needs to know why you want the

354
00:17:38.559 --> 00:17:39.960
<v Speaker 1>summary before it does the math.

355
00:17:40.119 --> 00:17:43.039
<v Speaker 2>Exactly, it filters the math through a layer of human intent.

356
00:17:43.200 --> 00:17:44.480
<v Speaker 1>So how does it actually do that?

357
00:17:44.640 --> 00:17:48.720
<v Speaker 2>The researchers build a deep learning architecture using a scorer network,

358
00:17:49.200 --> 00:17:52.240
<v Speaker 2>so before it ever clusters a photo, it evaluates every

359
00:17:52.359 --> 00:17:57.480
<v Speaker 2>single image based on three specific criteria: relevance, diversity, and redundancy.

360
00:17:57.799 --> 00:17:59.880
<v Speaker 1>Well, diversity and redundancy make sense. You want a range of

361
00:18:00.079 --> 00:18:03.440
<v Speaker 1>angles. You obviously don't want five identical pictures of

362
00:18:03.440 --> 00:18:07.960
<v Speaker 1>the same sunset. But how does an algorithm measure subjective relevance?

363
00:18:08.400 --> 00:18:11.960
<v Speaker 2>It uses a pre-trained classifier. The AI takes the

364
00:18:12.000 --> 00:18:16.480
<v Speaker 2>image's mathematical properties, its feature vector, and multiplies it by

365
00:18:16.480 --> 00:18:20.000
<v Speaker 2>a probability score that was generated for your specific task.

366
00:18:20.039 --> 00:18:21.000
<v Speaker 1>Okay, give me an example.

367
00:18:21.119 --> 00:18:23.839
<v Speaker 2>Say your task is "show me the architectural highlights of

368
00:18:23.880 --> 00:18:27.519
<v Speaker 2>my trip." The classifier acts as a filter, boosting the

369
00:18:27.519 --> 00:18:30.480
<v Speaker 2>mathematical weight of buildings and drastically lowering the weight of

370
00:18:30.519 --> 00:18:31.400
<v Speaker 2>selfies or food.
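
NOTE
Editor's sketch, not part of the episode: a minimal illustration of the relevance weighting described above, assuming a pre-trained classifier has already produced a task probability for each photo. The photo names, feature vectors, probabilities, and the helper weight_by_relevance are all hypothetical, and ranking by the size of the weighted vector is an illustrative choice rather than the researchers' stated method.
```python
def weight_by_relevance(features, probability):
    """Scale a feature vector by the classifier's task probability."""
    return [probability * value for value in features]
# Hypothetical photos: (name, feature vector, probability for an "architecture" task).
photos = [
    ("cathedral", [0.9, 0.8], 0.95),
    ("selfie", [0.9, 0.8], 0.05),
    ("lunch", [0.7, 0.6], 0.10),
]
weighted = {name: weight_by_relevance(vec, p) for name, vec, p in photos}
# Rank photos by the squared length of their weighted vectors, largest first.
ranked = sorted(weighted, key=lambda name: sum(v * v for v in weighted[name]), reverse=True)
print(ranked)
```
The selfie starts with the same raw features as the cathedral, but its low task probability shrinks its vector, so it drops to the bottom of the ranking.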

371
00:18:31.559 --> 00:18:34.440
<v Speaker 1>So it's forcing the geometry to respect the context.

372
00:18:34.079 --> 00:18:37.519
<v Speaker 2>Precisely, and the text notes this ensures the summary is

373
00:18:37.920 --> 00:18:40.480
<v Speaker 2>a topologically invariant representation.

374
00:18:40.799 --> 00:18:44.880
<v Speaker 1>Okay, let's ELI5 that: explain like I'm five. Topologically

375
00:18:44.880 --> 00:18:48.160
<v Speaker 1>invariant means what? The shape of the memory survives?

376
00:18:48.559 --> 00:18:52.119
<v Speaker 2>Yes, in topology, you can stretch or shrink an object,

377
00:18:52.440 --> 00:18:54.680
<v Speaker 2>but as long as you don't tear it or punch

378
00:18:54.759 --> 00:18:59.039
<v Speaker 2>new holes in it, its fundamental property, its invariant shape, remains.

379
00:18:59.160 --> 00:19:00.000
<v Speaker 1>Ah, it's beautiful.

380
00:19:00.240 --> 00:19:02.960
<v Speaker 2>By using scorer networks, the AI can shrink a ten

381
00:19:03.039 --> 00:19:06.680
<v Speaker 2>thousand photo album down to ten photos, but the fundamental

382
00:19:06.759 --> 00:19:09.920
<v Speaker 2>shape of your memory, tailored specifically for what you care about,

383
00:19:10.160 --> 00:19:11.559
<v Speaker 2>remains perfectly intact.

384
00:19:11.799 --> 00:19:14.839
<v Speaker 1>You know, it is genuinely remarkable how interconnected all these

385
00:19:14.880 --> 00:19:18.160
<v Speaker 1>concepts are. We started by looking at how AI clones

386
00:19:18.200 --> 00:19:21.480
<v Speaker 1>the structural DNA of a mistake to predict software failure.

387
00:19:22.240 --> 00:19:25.000
<v Speaker 1>Then we moved to how it clusters our behavioral footprints

388
00:19:25.039 --> 00:19:28.640
<v Speaker 1>in an N-dimensional matrix to predict cultural success, and

389
00:19:28.680 --> 00:19:32.119
<v Speaker 1>we finished with how it uses sparse reconstruction and scorer

390
00:19:32.200 --> 00:19:37.480
<v Speaker 1>networks to freeze-dry our visual chaos into perfect, meaningful summaries.

391
00:19:37.079 --> 00:19:39.480
<v Speaker 2>And the thread binding it all together is the mathematics

392
00:19:39.480 --> 00:19:42.960
<v Speaker 2>of relationships. Data doesn't exist in a vacuum. Once an

393
00:19:43.000 --> 00:19:45.640
<v Speaker 2>algorithm understands how a single piece of data relates to

394
00:19:45.640 --> 00:19:48.079
<v Speaker 2>the neighborhood around it, it can predict the future of

395
00:19:48.079 --> 00:19:49.720
<v Speaker 2>that entire neighborhood.

396
00:19:49.480 --> 00:19:52.960
<v Speaker 1>Which brings us entirely back to you listening right now. Every

397
00:19:53.039 --> 00:19:55.559
<v Speaker 1>time you open a streaming app, search your camera roll,

398
00:19:55.839 --> 00:19:58.960
<v Speaker 1>or rely on banking protocols to securely process a transaction,

399
00:19:59.440 --> 00:20:03.960
<v Speaker 1>you are relying on this invisible architecture. SMOTE balancing

400
00:20:04.000 --> 00:20:09.200
<v Speaker 1>the scales, collaborative filtering finding your digital neighbors, sparse reconstruction

401
00:20:09.279 --> 00:20:10.240
<v Speaker 1>distilling the noise.

402
00:20:10.359 --> 00:20:11.440
<v Speaker 2>It's everywhere.

403
00:20:11.559 --> 00:20:14.240
<v Speaker 1>These algorithms are silently working in the background to save

404
00:20:14.279 --> 00:20:17.599
<v Speaker 1>you time, curate your worldview, and keep the digital plumbing

405
00:20:17.640 --> 00:20:18.319
<v Speaker 1>from collapsing.

406
00:20:18.480 --> 00:20:21.400
<v Speaker 2>It really reframes how we interact with our own data,

407
00:20:21.680 --> 00:20:25.279
<v Speaker 2>and if we connect these capabilities to the bigger picture,

408
00:20:25.319 --> 00:20:28.519
<v Speaker 2>it leaves us with something quite profound to consider. Well,

409
00:20:29.319 --> 00:20:33.160
<v Speaker 2>if algorithms can synthesize artificial bugs to predict a software crash,

410
00:20:33.559 --> 00:20:36.960
<v Speaker 2>and if NLP can mine unstructured social media texts to

411
00:20:37.000 --> 00:20:40.000
<v Speaker 2>predict the exact moment someone becomes a star, and if

412
00:20:40.039 --> 00:20:42.720
<v Speaker 2>mathematical models can find the perfect canonical view of a

413
00:20:42.759 --> 00:20:46.960
<v Speaker 2>massive photo album, what happens when these incredibly powerful systems

414
00:20:47.000 --> 00:20:49.119
<v Speaker 2>are turned toward the data of your entire life?

415
00:20:49.200 --> 00:20:49.759
<v Speaker 1>Oh, wow.

416
00:20:49.799 --> 00:20:52.079
<v Speaker 2>If a machine can distill a data set down to

417
00:20:52.160 --> 00:20:55.279
<v Speaker 2>its fundamental shape, what is the canonical view of you?

418
00:20:55.720 --> 00:20:58.920
<v Speaker 1>Oh man, that is a heavy, fascinating question to walk

419
00:20:58.920 --> 00:21:01.920
<v Speaker 1>away with. The idea of an algorithm zooming out on

420
00:21:01.960 --> 00:21:05.480
<v Speaker 1>your entire digital footprint and just picking the ten frames

421
00:21:05.559 --> 00:21:09.279
<v Speaker 1>that mathematically reconstruct your essence. I love that. Thank you

422
00:21:09.319 --> 00:21:11.880
<v Speaker 1>for handing us this incredible research today and joining us

423
00:21:11.920 --> 00:21:14.839
<v Speaker 1>as we explore the invisible architecture around us. Keep questioning

424
00:21:14.880 --> 00:21:16.920
<v Speaker 1>the algorithms and we will catch you on the next

425
00:21:16.920 --> 00:21:17.440
<v Speaker 1>deep dive.
