WEBVTT

1
00:00:00.080 --> 00:00:02.919
<v Speaker 1>Have you ever wondered what truly lies beneath the surface

2
00:00:02.919 --> 00:00:06.719
<v Speaker 1>of the Internet, you know, far beyond what traditional search

3
00:00:06.759 --> 00:00:08.320
<v Speaker 1>engines like Google can ever show you?

4
00:00:08.519 --> 00:00:11.400
<v Speaker 2>It's a massive space, really. We're talking about these vast

5
00:00:11.560 --> 00:00:13.359
<v Speaker 2>unseen digital realms.

6
00:00:13.119 --> 00:00:19.440
<v Speaker 1>Right, teeming with fascinating, sometimes alarming, and often incredibly useful information. Absolutely.

7
00:00:19.920 --> 00:00:22.120
<v Speaker 1>Welcome to the Deep Dive, the show where we cut

8
00:00:22.120 --> 00:00:26.160
<v Speaker 1>through the noise, unpack complex topics and extract those vital

9
00:00:26.239 --> 00:00:29.440
<v Speaker 1>nuggets of knowledge so you can become well informed, quickly

10
00:00:29.519 --> 00:00:30.160
<v Speaker 1>and thoroughly.

11
00:00:30.600 --> 00:00:33.200
<v Speaker 2>And our mission today is really to give you a

12
00:00:33.200 --> 00:00:37.320
<v Speaker 2>powerful understanding of the digital world's hidden layers, both

13
00:00:37.200 --> 00:00:40.200
<v Speaker 1>the Deep Web and, you know, the more notorious Dark

14
00:00:40.200 --> 00:00:41.079
<v Speaker 1>Web. Exactly.

15
00:00:41.759 --> 00:00:45.439
<v Speaker 2>And we're diving into a truly comprehensive resource today, The

16
00:00:45.600 --> 00:00:48.280
<v Speaker 2>Dark Web: Breakthroughs in Research and Practice.

17
00:00:48.479 --> 00:00:49.840
<v Speaker 1>Ah, okay.

18
00:00:49.759 --> 00:00:52.600
<v Speaker 2>Yeah, it's a compilation published by IGI Global back in twenty eighteen, and

19
00:00:52.600 --> 00:00:55.520
<v Speaker 2>it's packed with cutting edge theories and developments. It's really

20
00:00:55.560 --> 00:00:58.359
<v Speaker 2>designed to empower anyone wanting a deeper understanding of this

21
00:00:58.439 --> 00:00:59.640
<v Speaker 2>whole evolving space.

22
00:01:00.320 --> 00:01:04.560
<v Speaker 1>Sounds perfect. So this incredible resource is organized into four

23
00:01:04.719 --> 00:01:06.040
<v Speaker 1>major sections.

24
00:01:05.920 --> 00:01:10.000
<v Speaker 2>That's right: cybercrime and security, then data mining and analysis,

25
00:01:10.120 --> 00:01:12.840
<v Speaker 2>online identity, and finally web crawling.

26
00:01:13.319 --> 00:01:15.879
<v Speaker 1>Okay, so we're going to navigate these areas pulling out

27
00:01:15.879 --> 00:01:20.439
<v Speaker 1>the most surprising facts and relevant details for you, helping

28
00:01:20.480 --> 00:01:21.439
<v Speaker 1>you see the unseen.

29
00:01:21.640 --> 00:01:22.159
<v Speaker 2>Let's do it.

30
00:01:22.280 --> 00:01:26.439
<v Speaker 1>Okay, let's unpack this first section then, on cybercrime and security.

31
00:01:26.560 --> 00:01:31.239
<v Speaker 1>It highlights some truly, well, eye-opening developments in online

32
00:01:31.319 --> 00:01:32.159
<v Speaker 1>criminal activity.

33
00:01:32.280 --> 00:01:34.359
<v Speaker 2>Yeah, and what's fascinating here, I think, is how the

34
00:01:34.400 --> 00:01:38.040
<v Speaker 2>research goes beyond just describing the crimes. It really seeks

35
00:01:38.040 --> 00:01:42.359
<v Speaker 2>to understand the underlying psychological and social factors driving them.

36
00:01:42.519 --> 00:01:46.159
<v Speaker 2>Like what, for instance? Well, take the unsettling context of

37
00:01:46.239 --> 00:01:49.840
<v Speaker 2>revenge porn. The research introduces something called the dark triad

38
00:01:49.959 --> 00:01:50.959
<v Speaker 2>personality traits.

39
00:01:51.159 --> 00:01:54.640
<v Speaker 1>The dark triad? That sounds, well, pretty ominous. What exactly

40
00:01:54.680 --> 00:01:55.319
<v Speaker 1>are those traits?

41
00:01:55.359 --> 00:01:59.519
<v Speaker 2>It refers to Machiavellianism, psychopathy, and narcissism. And these aren't

42
00:01:59.519 --> 00:02:03.760
<v Speaker 2>just buzzwords, you know. They represent characteristics like callousness, egocentrism,

43
00:02:03.840 --> 00:02:07.840
<v Speaker 2>low empathy, and, well, a readiness to exploit others.

44
00:02:07.920 --> 00:02:09.199
<v Speaker 1>Okay, so break those down a bit.

45
00:02:09.280 --> 00:02:13.639
<v Speaker 2>Psychopathy specifically indicates a severe lack of empathy,

46
00:02:14.520 --> 00:02:17.000
<v Speaker 2>impulsivity, really driven by immediate

47
00:02:16.560 --> 00:02:18.400
<v Speaker 1>gratification. And Machiavellianism?

48
00:02:18.479 --> 00:02:22.560
<v Speaker 2>That's more about strategic, ruthless manipulation, planning things

49
00:02:22.319 --> 00:02:24.159
<v Speaker 1>out. Got it. And narcissism?

50
00:02:24.280 --> 00:02:31.439
<v Speaker 2>That's all about entitlement, grandiosity and ego reinforcement, needing that validation.

51
00:02:31.960 --> 00:02:33.560
<v Speaker 1>So what's the link to revenge porn?

52
00:02:34.319 --> 00:02:37.000
<v Speaker 2>Well, the significant finding here is that endorsing these dark

53
00:02:37.039 --> 00:02:40.080
<v Speaker 2>triad traits strongly predicts a greater propensity for engaging in

54
00:02:40.120 --> 00:02:44.039
<v Speaker 2>revenge porn and, disturbingly, also a greater enjoyment of tormenting

55
00:02:44.080 --> 00:02:44.960
<v Speaker 2>others online.

56
00:02:45.000 --> 00:02:47.560
<v Speaker 1>Wow, it's chilling. It is a stark look at the

57
00:02:47.599 --> 00:02:50.159
<v Speaker 1>motivations behind some of the Internet's darkest corners.

58
00:02:50.400 --> 00:02:51.039
<v Speaker 2>It really is.

59
00:02:51.199 --> 00:02:54.639
<v Speaker 1>That's a powerful insight into individual psychology. But moving on

60
00:02:54.680 --> 00:02:58.520
<v Speaker 1>to a broader, maybe more societal scale of online malice,

61
00:02:58.599 --> 00:03:01.360
<v Speaker 1>the source also delves into contemporary terror on the net.

62
00:03:01.719 --> 00:03:03.520
<v Speaker 1>This isn't just about lone actors, is it?

63
00:03:03.680 --> 00:03:06.719
<v Speaker 2>No, definitely not. It's a significant shift. We've seen extremist

64
00:03:06.759 --> 00:03:10.800
<v Speaker 2>groups have evolved from relying on singular leaders to leveraging

65
00:03:10.840 --> 00:03:14.719
<v Speaker 2>these vast decentralized networks, lots of loose, weak ties basically,

66
00:03:14.800 --> 00:03:17.240
<v Speaker 2>to spread their ideology and tactics.

67
00:03:17.680 --> 00:03:20.719
<v Speaker 1>And the Islamic State is a key example?

68
00:03:20.439 --> 00:03:24.439
<v Speaker 2>A stark example, yes. They actively manipulate a concept called

69
00:03:24.639 --> 00:03:28.599
<v Speaker 2>the constitutive other. Essentially, they prey on feelings of isolation.

70
00:03:28.840 --> 00:03:30.080
<v Speaker 1>Oh, so who do they target?

71
00:03:30.280 --> 00:03:34.759
<v Speaker 2>They specifically target younger Middle Eastern women in the Western world,

72
00:03:35.199 --> 00:03:38.240
<v Speaker 2>luring them with this promise of an Islamic state where

73
00:03:38.280 --> 00:03:41.879
<v Speaker 2>they'll supposedly feel understood and you know, part of a family.

74
00:03:42.479 --> 00:03:46.599
<v Speaker 1>So they're exploiting personal vulnerabilities for ideological recruitment, creating a

75
00:03:46.639 --> 00:03:48.879
<v Speaker 1>sense of belonging in a very dangerous way.

76
00:03:49.000 --> 00:03:53.400
<v Speaker 2>Precisely, this shared emotional state is crucial for contemporary social

77
00:03:53.439 --> 00:03:59.159
<v Speaker 2>movements and global jihad. The research also details lone wolf terrorism,

78
00:03:59.319 --> 00:04:02.639
<v Speaker 2>noting its correlation with the proliferation of powerful weapons.

79
00:04:02.319 --> 00:04:06.280
<v Speaker 1>Right, and these individuals often blend ideology with personal issues.

80
00:04:05.919 --> 00:04:09.360
<v Speaker 2>Exactly, deeply personal grievances. Think about incidents like the Orlando

81
00:04:09.439 --> 00:04:13.039
<v Speaker 2>nightclub shooting or the Fort Hood attack. As Corner noted

82
00:04:13.039 --> 00:04:16.000
<v Speaker 2>back in twenty sixteen, these are often disturbed individuals who

83
00:04:16.040 --> 00:04:19.079
<v Speaker 2>sort of layer a political facade over their personal problems.

84
00:04:19.319 --> 00:04:24.040
<v Speaker 1>And this decentralized network structure, it sounds like it makes

85
00:04:24.079 --> 00:04:26.639
<v Speaker 1>these organizations incredibly difficult to combat.

86
00:04:26.800 --> 00:04:30.800
<v Speaker 2>It absolutely changes the game. With no specific heart or

87
00:04:30.920 --> 00:04:33.160
<v Speaker 2>head that can be targeted, as the source puts it,

88
00:04:33.199 --> 00:04:37.560
<v Speaker 2>traditional counterterrorism strategies become, well, less effective.

89
00:04:37.800 --> 00:04:39.600
<v Speaker 1>How did ISIS manage their media, then?

90
00:04:40.000 --> 00:04:44.120
<v Speaker 2>They built an incredibly sophisticated media strategy. They use their

91
00:04:44.279 --> 00:04:47.839
<v Speaker 2>monthly Dabiq magazine, for instance, published via the dark web

92
00:04:47.839 --> 00:04:51.680
<v Speaker 2>for anonymity, and they varied their social media content widely,

93
00:04:51.759 --> 00:04:58.079
<v Speaker 2>everything from, you know, vicious beheading videos to seemingly innocuous kitten

94
00:04:57.800 --> 00:04:59.560
<v Speaker 1>memes. Kitten memes, really?

95
00:04:59.399 --> 00:05:02.959
<v Speaker 2>Yeah, designed to appeal to different demographics for recruitment, and

96
00:05:03.000 --> 00:05:06.800
<v Speaker 2>when faced with crackdowns like Twitter suspensions, they adapted incredibly quickly.

97
00:05:07.040 --> 00:05:10.000
<v Speaker 2>How? They shifted from user-centric dissemination, where you could

98
00:05:10.040 --> 00:05:12.959
<v Speaker 2>trace it back to one account, to a hashtag driven model.

99
00:05:13.399 --> 00:05:15.759
<v Speaker 2>This made their messages far harder to trace.

100
00:05:16.120 --> 00:05:19.519
<v Speaker 1>The dark web then becomes an essential, if hidden publishing

101
00:05:19.560 --> 00:05:20.439
<v Speaker 1>platform for them.

102
00:05:20.639 --> 00:05:24.879
<v Speaker 2>That's absolutely right. Its decentralized and anonymous networks are just

103
00:05:25.160 --> 00:05:31.240
<v Speaker 2>crucial for propaganda, especially after hacktivist wars pushed a lot

104
00:05:31.439 --> 00:05:34.800
<v Speaker 2>of extremist Islamic discourse onto these hidden platforms.

105
00:05:34.920 --> 00:05:37.839
<v Speaker 1>And it shares technology with other known dark web sites?

106
00:05:38.120 --> 00:05:41.319
<v Speaker 2>Yeah, it shares some of the same technological underpinnings as

107
00:05:41.360 --> 00:05:45.800
<v Speaker 2>places like WikiLeaks, Bitcoin, and the infamous Silk Road.

108
00:05:46.240 --> 00:05:48.399
<v Speaker 1>That's a stark picture of the threats lurking in the

109
00:05:48.439 --> 00:05:52.079
<v Speaker 1>digital shadows. Let's switch gears a bit. Maybe something many

110
00:05:52.120 --> 00:05:55.959
<v Speaker 1>of us can relate to more directly. Dysfunctional digital behaviors

111
00:05:56.000 --> 00:05:56.879
<v Speaker 1>in online learning.

112
00:05:57.040 --> 00:05:59.639
<v Speaker 2>Ah. Yes, the elephant in the online classroom.

113
00:06:00.000 --> 00:06:02.560
<v Speaker 1>The source calls it that, right, and it's apparently far more

114
00:06:02.560 --> 00:06:03.680
<v Speaker 1>prevalent than we might realize.

115
00:06:03.759 --> 00:06:06.399
<v Speaker 2>It's a very apt metaphor because it's often ignored, isn't it?

116
00:06:06.600 --> 00:06:10.800
<v Speaker 2>This elephant covers everything from cyberbullying and plagiarism to outright

117
00:06:10.879 --> 00:06:14.000
<v Speaker 2>hacking and just the constant search for shortcuts.

118
00:06:14.079 --> 00:06:16.120
<v Speaker 1>Can you give an example, like the plagiarism case?

119
00:06:16.279 --> 00:06:20.560
<v Speaker 2>Sure, the research describes a Student A who plagiarized a

120
00:06:20.680 --> 00:06:26.160
<v Speaker 2>term paper. The reasons? Well, conflicting priorities. They had a

121
00:06:26.240 --> 00:06:29.560
<v Speaker 2>demanding agency report due at the same time, but also

122
00:06:29.639 --> 00:06:30.720
<v Speaker 2>the sheer ease of

123
00:06:30.759 --> 00:06:33.000
<v Speaker 1>copy-pasting. And the distance factor?

124
00:06:32.800 --> 00:06:36.920
<v Speaker 2>Exactly, the perceived distance from the instructor. The student actually

125
00:06:37.000 --> 00:06:39.920
<v Speaker 2>rationalized it, thinking it was easy "since my instructor was

126
00:06:40.040 --> 00:06:41.319
<v Speaker 2>thousands of miles away."

127
00:06:41.439 --> 00:06:44.319
<v Speaker 1>Wow, and hacking isn't just for big corporations. It happens

128
00:06:44.319 --> 00:06:46.519
<v Speaker 1>in e learning too. That's pretty unsettling.

129
00:06:46.800 --> 00:06:51.040
<v Speaker 2>It is. The source details specific methods: students logging in

130
00:06:51.120 --> 00:06:54.519
<v Speaker 2>as an instructor to grab test answers, using spyware to

131
00:06:54.560 --> 00:06:58.000
<v Speaker 2>see others' answers during tests, or even employing sniffers to

132
00:06:58.079 --> 00:07:00.000
<v Speaker 2>decipher network packets for passwords.

133
00:07:00.399 --> 00:07:03.399
<v Speaker 1>So online learning can become less about education and more

134
00:07:03.399 --> 00:07:06.279
<v Speaker 1>about winning for some students, like a game.

135
00:07:06.519 --> 00:07:09.639
<v Speaker 2>That's the core of this gamer's agenda concept they talk about.

136
00:07:09.680 --> 00:07:12.160
<v Speaker 2>Students start treating online courses like games they need to

137
00:07:12.199 --> 00:07:14.959
<v Speaker 2>win by, quote, outsmarting

138
00:07:14.199 --> 00:07:17.000
<v Speaker 1>the system. And they rationalize it how?

139
00:07:16.519 --> 00:07:20.240
<v Speaker 2>Often through an avatar identity, believing the person cutting corners

140
00:07:20.360 --> 00:07:25.519
<v Speaker 2>isn't really them. The research lists key reasons: pressure to

141
00:07:25.600 --> 00:07:29.480
<v Speaker 2>maintain high GPAs, the feeling they probably won't get

142
00:07:29.240 --> 00:07:32.000
<v Speaker 1>caught. The ease of copy-pasting you mentioned?

143
00:07:31.800 --> 00:07:35.399
<v Speaker 2>Right, the perception that everyone does it, and unfortunately sometimes

144
00:07:35.399 --> 00:07:39.199
<v Speaker 2>a noticeable level of faculty apathy or at least perceived apathy.

145
00:07:39.519 --> 00:07:41.920
<v Speaker 1>And distance seems to play a significant role here.

146
00:07:42.079 --> 00:07:46.399
<v Speaker 2>Oh, absolutely. The research shows a clear inverse relationship. The

147
00:07:46.439 --> 00:07:49.720
<v Speaker 2>greater the distance between student and teacher, the higher the

148
00:07:49.759 --> 00:07:54.120
<v Speaker 2>tendency for cheating. Less interaction, less oversight.

149
00:07:53.920 --> 00:07:55.839
<v Speaker 1>Which leads to this problematique map.

150
00:07:55.959 --> 00:08:00.879
<v Speaker 2>Yes, exactly. It's this tangled web of interrelated problems: cheating, cyberbullying,

151
00:08:01.000 --> 00:08:04.920
<v Speaker 2>cutting corners, all linked by these overarching factors like physical

152
00:08:04.920 --> 00:08:09.839
<v Speaker 2>and psychological distance, the technology itself, cyber psychological elements, and

153
00:08:09.920 --> 00:08:12.639
<v Speaker 2>broader sociocultural influences.

154
00:08:12.040 --> 00:08:14.040
<v Speaker 1>And the end result can be quite negative.

155
00:08:14.319 --> 00:08:17.600
<v Speaker 2>Often it culminates in an adversarial teacher learner relationship, which

156
00:08:17.639 --> 00:08:19.480
<v Speaker 2>is the opposite of what education should be.

157
00:08:19.600 --> 00:08:21.759
<v Speaker 1>This really sounds like it demands a fresh approach to

158
00:08:21.800 --> 00:08:23.160
<v Speaker 1>online education policy.

159
00:08:23.439 --> 00:08:27.759
<v Speaker 2>It certainly calls for a shift. The implications include moving

160
00:08:27.839 --> 00:08:31.439
<v Speaker 2>away from just testing factual recall, lower-level cognitive stuff,

161
00:08:31.480 --> 00:08:36.679
<v Speaker 2>towards fostering metacognitive learning objectives, meaning essentially teaching students how

162
00:08:36.720 --> 00:08:40.360
<v Speaker 2>to learn and think critically rather than just memorizing facts.

163
00:08:40.879 --> 00:08:43.480
<v Speaker 2>It also means ensuring assessment doesn't actually get in the

164
00:08:43.480 --> 00:08:48.279
<v Speaker 2>way of genuine learning and promoting educational policies consistent with openness,

165
00:08:48.320 --> 00:08:53.279
<v Speaker 2>things like learner-centeredness, connectivism, really encouraging active construction of knowledge.

166
00:08:53.399 --> 00:08:56.399
<v Speaker 1>Okay, our next topic in this section asks a pretty

167
00:08:56.399 --> 00:09:00.240
<v Speaker 1>provocative question, how to become a cyber criminal? It really

168
00:09:00.240 --> 00:09:03.879
<v Speaker 1>gets into the difference between what hacking originally meant and

169
00:09:04.039 --> 00:09:05.799
<v Speaker 1>the darker path it often takes today.

170
00:09:06.000 --> 00:09:08.399
<v Speaker 2>That's such a crucial distinction. Initially, you know, hacking was

171
00:09:08.440 --> 00:09:12.519
<v Speaker 2>often seen as innovative, even constructive. Think Dennis Ritchie and

172
00:09:12.600 --> 00:09:15.879
<v Speaker 2>Ken Thompson creating UNIX, or maybe Shawn Fanning

173
00:09:15.919 --> 00:09:19.360
<v Speaker 2>creating Napster, that kind of thing. Right, creative problem solving. Exactly.

174
00:09:19.720 --> 00:09:23.039
<v Speaker 2>But then it evolved or devolved perhaps into cracking, which

175
00:09:23.080 --> 00:09:25.720
<v Speaker 2>involves serious criminal offenses.

176
00:09:25.480 --> 00:09:27.279
<v Speaker 1>And the motivation there is often financial.

177
00:09:27.519 --> 00:09:32.039
<v Speaker 2>Often, yes. The source cites examples of individuals, frequently teenagers,

178
00:09:32.440 --> 00:09:35.840
<v Speaker 2>driven by the potential for significant financial gain, like the

179
00:09:35.960 --> 00:09:40.159
<v Speaker 2>Esthost botnet, which alone generated an estimated fourteen

180
00:09:39.799 --> 00:09:44.279
<v Speaker 1>million dollars. Fourteen million? Wow. So it's about the money,

181
00:09:44.600 --> 00:09:48.080
<v Speaker 1>Often driven by a kind of economic cost benefit analysis,

182
00:09:48.720 --> 00:09:51.360
<v Speaker 1>the perception that the risk is low but the rewards

183
00:09:51.360 --> 00:09:52.039
<v Speaker 1>are high.

184
00:09:52.039 --> 00:09:55.240
<v Speaker 2>Pretty much. According to G. Becker's economic approach to crime

185
00:09:55.279 --> 00:09:58.960
<v Speaker 2>from way back in nineteen sixty eight. Individuals, especially teens,

186
00:09:59.120 --> 00:10:02.320
<v Speaker 2>might make that kind of calculation. They perceive hacking or

187
00:10:02.360 --> 00:10:06.200
<v Speaker 2>cracking as relatively riskless and highly compatible with their lifestyles,

188
00:10:06.360 --> 00:10:10.320
<v Speaker 2>particularly if they have low earnings or lack other opportunities.

189
00:10:09.600 --> 00:10:11.519
<v Speaker 1>And the media plays a role, indirectly?

190
00:10:11.600 --> 00:10:15.399
<v Speaker 2>Yes, the potential for huge profits, like botmasters earning

191
00:10:15.480 --> 00:10:19.600
<v Speaker 2>millions annually gets highlighted, acting as an observability factor, basically

192
00:10:19.639 --> 00:10:21.080
<v Speaker 2>showing us how lucrative it can be.

193
00:10:21.360 --> 00:10:24.799
<v Speaker 1>And modern technology like cloud computing makes it even easier

194
00:10:24.799 --> 00:10:26.120
<v Speaker 1>for cyber criminals to operate.

195
00:10:26.320 --> 00:10:29.600
<v Speaker 2>It drastically lowers the bar for entry. Yeah. Infrastructure as

196
00:10:29.600 --> 00:10:33.720
<v Speaker 2>a service, or IaaS, provides massive computing power on demand

197
00:10:33.720 --> 00:10:35.559
<v Speaker 2>for things like brute force password attacks.

198
00:10:35.679 --> 00:10:37.679
<v Speaker 1>So you don't need your own supercomputer. Exactly.

199
00:10:37.759 --> 00:10:40.919
<v Speaker 2>It enables cheap and large scale denial of service or

200
00:10:41.000 --> 00:10:45.120
<v Speaker 2>DoS attacks, which can cripple websites, and it simplifies spamming

201
00:10:45.240 --> 00:10:49.320
<v Speaker 2>or malware distribution via mail software as a service. Basically,

202
00:10:49.440 --> 00:10:54.039
<v Speaker 2>it provides accessible, powerful tools for illicit activities.

203
00:10:53.759 --> 00:10:57.320
<v Speaker 1>Which also means we're seeing more widespread ransomware and sextortion,

204
00:10:57.799 --> 00:11:00.639
<v Speaker 1>things that affect ordinary people and businesses. Precisely.

205
00:11:01.000 --> 00:11:03.919
<v Speaker 2>Businesses often fall victim to ransomware, where their data is

206
00:11:04.000 --> 00:11:07.519
<v Speaker 2>encrypted and held hostage. They frequently pay up, sometimes just

207
00:11:07.519 --> 00:11:10.600
<v Speaker 2>to avoid the public exposure or the catastrophic data loss.

208
00:11:10.799 --> 00:11:15.799
<v Speaker 2>And sextortion? Similarly, online sexual extortion, or sextortion, is sadly

209
00:11:15.840 --> 00:11:18.480
<v Speaker 2>on the rise, with victims often paying to avoid the

210
00:11:18.519 --> 00:11:21.639
<v Speaker 2>publicity and shame. The shift from sort of non criminal

211
00:11:21.639 --> 00:11:26.080
<v Speaker 2>hacking exploration to outright criminal activity is unfortunately facilitated by

212
00:11:26.120 --> 00:11:30.720
<v Speaker 2>hacking's compatibility with youth culture, its perceived advantages over traditional crime,

213
00:11:30.759 --> 00:11:33.480
<v Speaker 2>its ease of use, and even the existence of supportive

214
00:11:33.519 --> 00:11:34.519
<v Speaker 2>online communities.

215
00:11:34.720 --> 00:11:38.039
<v Speaker 1>Okay, let's shift gears again. Here's where it gets really interesting.

216
00:11:38.080 --> 00:11:41.399
<v Speaker 1>I think this next section explores how we can actually

217
00:11:41.480 --> 00:11:46.679
<v Speaker 1>navigate and extract valuable information from the web's hidden depths.

218
00:11:47.759 --> 00:11:50.639
<v Speaker 1>After all that talk of threats, this is about harnessing

219
00:11:50.679 --> 00:11:52.320
<v Speaker 1>its power, right?

220
00:11:52.279 --> 00:11:55.279
<v Speaker 2>And if we connect this to the bigger picture, understanding these

221
00:11:55.320 --> 00:11:58.919
<v Speaker 2>extraction techniques is crucial for harnessing the vast amount of

222
00:11:59.000 --> 00:12:02.120
<v Speaker 2>high quality data that lies beyond traditional search engine reach.

223
00:12:02.600 --> 00:12:04.480
<v Speaker 2>This is the deep web we're talking about now.

224
00:12:04.519 --> 00:12:06.480
<v Speaker 1>And just to clarify again, the deep web isn't the

225
00:12:06.480 --> 00:12:08.440
<v Speaker 1>same as the dark web, right? Correct.

226
00:12:08.519 --> 00:12:10.840
<v Speaker 2>The dark web is a smaller, hidden part of the

227
00:12:10.879 --> 00:12:14.519
<v Speaker 2>deep web that requires specific software like Tor to access.

228
00:12:14.879 --> 00:12:17.279
<v Speaker 2>The deep web itself is much larger. It's simply all

229
00:12:17.320 --> 00:12:21.000
<v Speaker 2>the information stored in searchable databases that dynamically generate results

230
00:12:21.080 --> 00:12:24.519
<v Speaker 2>when you query them. Think library catalogs, internal corporate sites,

231
00:12:24.559 --> 00:12:25.159
<v Speaker 2>that sort of thing.

232
00:12:25.360 --> 00:12:29.200
<v Speaker 1>Okay, So how do we efficiently query these massive deep

233
00:12:29.279 --> 00:12:33.399
<v Speaker 1>web databases when traditional search engines can't really handle it?

234
00:12:33.399 --> 00:12:36.600
<v Speaker 2>Well, the source proposes a rather clever solution, an optimal query

235
00:12:36.639 --> 00:12:38.799
<v Speaker 2>generation mechanism based on random ranking.

236
00:12:39.039 --> 00:12:40.960
<v Speaker 1>Okay, random ranking? How does that work?

237
00:12:41.039 --> 00:12:43.399
<v Speaker 2>Think of it like a smart librarian trying to find

238
00:12:43.440 --> 00:12:47.799
<v Speaker 2>specific information in a massive archive without pulling every single book.

239
00:12:48.279 --> 00:12:52.679
<v Speaker 2>It uses a response analyzer and a query ranker. First,

240
00:12:52.759 --> 00:12:55.559
<v Speaker 2>a form analyzer figures out the structure of the search

241
00:12:55.600 --> 00:13:00.159
<v Speaker 2>forms on these hidden databases. Then the query ranker prioritizes

242
00:13:00.360 --> 00:13:04.080
<v Speaker 2>which queries to send based on factors like past query behavior,

243
00:13:04.120 --> 00:13:07.240
<v Speaker 2>what worked before, and even the size of recent search

244
00:13:07.279 --> 00:13:10.519
<v Speaker 2>results pages. Can you give an example? Sure. In an

245
00:13:10.559 --> 00:13:14.519
<v Speaker 2>air travel database, say, this system can automatically reduce illogical

246
00:13:14.600 --> 00:13:18.080
<v Speaker 2>query combinations, like trying to find flights from Delhi to

247
00:13:18.159 --> 00:13:22.279
<v Speaker 2>Delhi, from maybe eight possibilities down to four. By using

248
00:13:22.320 --> 00:13:25.240
<v Speaker 2>external knowledge like that, it minimizes the number of queries

249
00:13:25.279 --> 00:13:27.000
<v Speaker 2>needed and avoids duplicates.

250
00:13:27.039 --> 00:13:28.480
<v Speaker 1>That saves a lot of time and resources,

251
00:13:28.519 --> 00:13:32.200
<v Speaker 2>presumably. Exactly. The goal is to exhaustively retrieve the content

252
00:13:32.279 --> 00:13:33.840
<v Speaker 2>with the minimum number of queries.

253
00:13:34.200 --> 00:13:36.879
<v Speaker 1>That's incredibly smart, and the Web itself, of course, has

254
00:13:36.879 --> 00:13:40.840
<v Speaker 1>been constantly evolving, changing how this hidden data is structured.

255
00:13:41.039 --> 00:13:42.320
<v Speaker 1>The source looked at that too.

256
00:13:42.440 --> 00:13:45.639
<v Speaker 2>It absolutely has. An analysis comparing the global web in

257
00:13:45.639 --> 00:13:48.720
<v Speaker 2>two thousand and nine to twenty fourteen shows some really significant changes

258
00:13:48.720 --> 00:13:50.159
<v Speaker 2>in web page development.

259
00:13:49.840 --> 00:13:51.039
<v Speaker 1>Such as?

260
00:13:51.240 --> 00:13:55.200
<v Speaker 2>Well, while core HTML tags like head, HTML, body, and title

261
00:13:55.279 --> 00:13:59.399
<v Speaker 2>remained consistently present almost near one hundred percent, tags used

262
00:13:59.440 --> 00:14:02.879
<v Speaker 2>for dynamic content things like meta, div, link and script,

263
00:14:03.120 --> 00:14:04.559
<v Speaker 2>their usage increased

264
00:14:04.240 --> 00:14:06.159
<v Speaker 1>dramatically. And older tags decreased?

265
00:14:06.320 --> 00:14:09.399
<v Speaker 2>Yeah, table-related tags like tr, table, tbody, td,

266
00:14:09.840 --> 00:14:12.720
<v Speaker 2>their use went down. It clearly signals a move away

267
00:14:12.720 --> 00:14:17.080
<v Speaker 2>from simple static pages towards more interactive, data driven web experiences.

268
00:14:17.159 --> 00:14:19.480
<v Speaker 1>What about content formats? Are we seeing shifts there too?

269
00:14:19.679 --> 00:14:21.919
<v Speaker 1>Images, documents?

270
00:14:21.480 --> 00:14:24.759
<v Speaker 2>Definitely. For images, JPG remained dominant, but PNG usage

271
00:14:24.799 --> 00:14:27.360
<v Speaker 2>grew massively. It jumped from just over three percent of

272
00:14:27.360 --> 00:14:29.240
<v Speaker 2>all images in two thousand and nine to nearly a

273
00:14:29.360 --> 00:14:32.159
<v Speaker 2>quarter by twenty fourteen. Wow. And the percentage of pages

274
00:14:32.240 --> 00:14:35.679
<v Speaker 2>using at least one PNG image nearly quadrupled. This reflects

275
00:14:35.679 --> 00:14:39.039
<v Speaker 2>a growing demand for richer visual content, more complex web designs.

276
00:14:39.320 --> 00:14:41.000
<v Speaker 2>GIF usage, meanwhile, pretty much

277
00:14:40.879 --> 00:14:44.159
<v Speaker 1>dropped off. And music, documents?

278
00:14:43.279 --> 00:14:47.279
<v Speaker 2>For music, MP3 just stayed overwhelmingly dominant, over ninety

279
00:14:47.320 --> 00:14:51.159
<v Speaker 2>one percent by twenty fourteen. Not much evolution there, probably

280
00:14:51.279 --> 00:14:54.960
<v Speaker 2>due to its excellent quality to size ratio. For documents,

281
00:14:55.000 --> 00:14:58.440
<v Speaker 2>PDF was the most common and grew steadily, with XML

282
00:14:58.559 --> 00:15:01.279
<v Speaker 2>also increasing. Both are favored for portability and

283
00:15:01.399 --> 00:15:02.200
<v Speaker 2>professional use.

284
00:15:02.480 --> 00:15:07.159
<v Speaker 1>Interesting about compression too, ZIP is surpassing gzip?

285
00:15:06.799 --> 00:15:10.200
<v Speaker 2>Yeah, potentially linked to the dominance of Windows operating systems,

286
00:15:10.200 --> 00:15:11.720
<v Speaker 2>the researchers suggest.

287
00:15:11.759 --> 00:15:14.440
<v Speaker 1>And how quickly are pages being updated? Does that impact

288
00:15:14.480 --> 00:15:16.559
<v Speaker 1>how we search or crawl this hidden web?

289
00:15:16.799 --> 00:15:20.039
<v Speaker 2>This is a huge trend. First, style wise, bold text

290
00:15:20.120 --> 00:15:23.080
<v Speaker 2>usage decreased, while title styles like H two and H

291
00:15:23.159 --> 00:15:26.799
<v Speaker 2>three increased, maybe a shift in presentation norms. Average URL

292
00:15:26.879 --> 00:15:29.960
<v Speaker 2>length stayed consistent, but the percentage of longer URLs grew

293
00:15:30.039 --> 00:15:32.200
<v Speaker 2>maybe indicating more complex content paths.

294
00:15:32.320 --> 00:15:33.960
<v Speaker 1>But the age of pages?

295
00:15:33.639 --> 00:15:35.519
<v Speaker 2>Right, here's the crucial part. There's a rapid trend of

296
00:15:35.600 --> 00:15:38.879
<v Speaker 2>updating content. By twenty fourteen, something like seventy percent of

297
00:15:38.919 --> 00:15:39.559
<v Speaker 2>pages were

298
00:15:39.519 --> 00:15:41.480
<v Speaker 1>less than three months old. Seventy percent?

299
00:15:41.320 --> 00:15:43.240
<v Speaker 2>And nearly two thirds were less than one month old.

300
00:15:43.320 --> 00:15:46.080
<v Speaker 2>This implies a critical need for much faster recrawling and

301
00:15:46.120 --> 00:15:49.279
<v Speaker 2>index updates by search engines. Otherwise the results you get

302
00:15:49.360 --> 00:15:51.200
<v Speaker 2>quickly become outdated.

303
00:15:50.720 --> 00:15:53.639
<v Speaker 1>So crawlers need to constantly adapt to these dynamic changes.

304
00:15:54.559 --> 00:15:57.559
<v Speaker 1>How has the underlying technology of web pages affected this?

305
00:15:58.000 --> 00:15:58.840
<v Speaker 1>Like JavaScript?

306
00:15:58.960 --> 00:16:01.519
<v Speaker 2>Precisely. The average number of links per page shot

307
00:16:01.600 --> 00:16:04.519
<v Speaker 2>up significantly from about fifty eight to over one hundred

308
00:16:04.559 --> 00:16:07.879
<v Speaker 2>and seven between two thousand and nine and twenty fourteen. Dynamic links

309
00:16:07.879 --> 00:16:12.200
<v Speaker 2>also saw substantial growth. And JavaScript? Critically, yes. JavaScript became

310
00:16:12.240 --> 00:16:15.240
<v Speaker 2>the most dominant client side technology, hitting nearly seventy six

311
00:16:15.279 --> 00:16:19.440
<v Speaker 2>percent usage by twenty fourteen. Older tech like Flash, VBScript,

312
00:16:19.519 --> 00:16:23.480
<v Speaker 2>Tcl script, they nearly disappeared. Why? The shift was driven largely by

313
00:16:23.480 --> 00:16:27.320
<v Speaker 2>the rise of Ajax, Asynchronous JavaScript and XML, and also

314
00:16:27.440 --> 00:16:31.320
<v Speaker 2>Flash's well-known security and compatibility issues. What this means

315
00:16:31.360 --> 00:16:34.840
<v Speaker 2>is that modern crawling systems must primarily focus on processing

316
00:16:34.919 --> 00:16:38.159
<v Speaker 2>JavaScript effectively to index the web properly today, and on

317
00:16:38.200 --> 00:16:41.279
<v Speaker 2>the server side, PHP interestingly remained dominant.

318
00:16:41.320 --> 00:16:43.799
<v Speaker 1>Okay, this all leads nicely into the deep web information

319
00:16:43.919 --> 00:16:47.159
<v Speaker 1>retrieval process. Can you recap the fundamental difference between the

320
00:16:47.159 --> 00:16:49.639
<v Speaker 1>deep web and the surface web and how we access

321
00:16:49.639 --> 00:16:50.320
<v Speaker 1>it efficiently.

322
00:16:50.639 --> 00:16:54.240
<v Speaker 2>Sure. The key distinction again is that deep web information

323
00:16:54.440 --> 00:16:59.039
<v Speaker 2>is dynamically generated from databases in response to specific user requests.

324
00:16:59.320 --> 00:17:02.080
<v Speaker 2>It's not like the static, pre-indexed pages of the

325
00:17:02.120 --> 00:17:02.919
<v Speaker 2>surface web.

326
00:17:02.759 --> 00:17:04.160
<v Speaker 1>Which is why Google struggles with it.

327
00:17:04.240 --> 00:17:08.880
<v Speaker 2>Exactly. So a specialized deep web crawler follows four main steps. First,

328
00:17:09.039 --> 00:17:13.880
<v Speaker 2>it analyzes the query interface, the search box basically. Second,

329
00:17:14.480 --> 00:17:18.480
<v Speaker 2>it intelligently assigns values to those query fields.

330
00:17:18.240 --> 00:17:20.400
<v Speaker 1>Like filling in the form automatically?

331
00:17:20.240 --> 00:17:23.400
<v Speaker 2>Kind of, yeah. Third, it analyzes the response it gets back

332
00:17:23.440 --> 00:17:27.079
<v Speaker 2>and navigates the results, maybe clicking through pages. And finally,

333
00:17:27.160 --> 00:17:29.759
<v Speaker 2>it ranks the relevance of the information it finds.

334
00:17:29.960 --> 00:17:31.519
<v Speaker 1>And that ranking is different too?

335
00:17:31.559 --> 00:17:34.920
<v Speaker 2>Very different, because, unlike the surface web, the deep web

336
00:17:35.079 --> 00:17:39.720
<v Speaker 2>lacks those direct hyperlinks between pages, so traditional link based

337
00:17:39.799 --> 00:17:43.759
<v Speaker 2>quality assessments like PageRank don't really apply in the same way.

338
00:17:43.839 --> 00:17:46.680
<v Speaker 1>And the source covers specific techniques and protocols for this, right?

339
00:17:46.880 --> 00:17:48.279
<v Speaker 1>Can you give us a sense of what makes these

340
00:17:48.319 --> 00:17:49.319
<v Speaker 1>crawlers so effective?

341
00:17:49.519 --> 00:17:52.720
<v Speaker 2>Yeah, it gets pretty technical, but for identifying the search interfaces,

342
00:17:52.759 --> 00:17:57.599
<v Speaker 2>they use sophisticated classification techniques like decision trees, random forest

343
00:17:58.119 --> 00:18:02.519
<v Speaker 2>IRFA are mentioned. To understand the database structures, the schemas,

344
00:18:02.839 --> 00:18:06.599
<v Speaker 2>there are mapping techniques like COMA and LSD, and for

345
00:18:06.720 --> 00:18:12.000
<v Speaker 2>smartly assigning those query values, methods like LGERM help with

346
00:18:12.720 --> 00:18:14.960
<v Speaker 2>global aggregation and local scoring.

347
00:18:15.119 --> 00:18:17.920
<v Speaker 1>And there are protocols too, like languages for talking to

348
00:18:18.039 --> 00:18:19.119
<v Speaker 1>databases?

349
00:18:19.559 --> 00:18:23.160
<v Speaker 2>Exactly. The source highlights various protocols proposed for deep web crawling,

350
00:18:23.440 --> 00:18:27.039
<v Speaker 2>things like SRU, which is XML focused; Z39.50,

351
00:18:27.039 --> 00:18:29.720
<v Speaker 2>which is a client server protocol often used

352
00:18:29.720 --> 00:18:32.200
<v Speaker 2>by libraries to search across multiple sources at once.

353
00:18:32.160 --> 00:18:34.519
<v Speaker 1>Like searching multiple university libraries from one place?

354
00:18:34.440 --> 00:18:37.920
<v Speaker 2>Precisely, that kind of thing. There's also OAI-PMH

355
00:18:38.039 --> 00:18:42.720
<v Speaker 2>for harvesting metadata; PLQL, which combines XML and information retrieval;

356
00:18:43.000 --> 00:18:45.920
<v Speaker 2>the Host List protocol; and the Sitemaps protocol. That last one uses

357
00:18:45.920 --> 00:18:48.720
<v Speaker 2>an XML file webmasters can provide to tell search engines

358
00:18:48.720 --> 00:18:52.319
<v Speaker 2>about URLs, especially useful for content generated by Ajax or

359
00:18:52.359 --> 00:18:54.079
<v Speaker 2>Flash that crawlers might otherwise miss.

360
00:18:54.279 --> 00:18:58.000
<v Speaker 1>So how do these specialized deep web search engines compare

361
00:18:58.039 --> 00:19:01.319
<v Speaker 1>to the conventional ones we use every day, Google or Bing?

362
00:19:01.240 --> 00:19:04.279
<v Speaker 2>Well, surface engines, your Googles and Bings, they cast a

363
00:19:04.359 --> 00:19:07.079
<v Speaker 2>really wide net, right, they give you very general results.

364
00:19:07.119 --> 00:19:11.000
<v Speaker 2>Lots of hits. Deep web engines, however, are far more

365
00:19:11.000 --> 00:19:15.240
<v Speaker 2>efficient for specific, often technical literature or data.

366
00:19:15.359 --> 00:19:16.960
<v Speaker 1>They're more focused.

367
00:19:17.440 --> 00:19:20.359
<v Speaker 2>Exactly. Their goal isn't just a long list of hits, but

368
00:19:20.480 --> 00:19:25.000
<v Speaker 2>providing the right list of highly relevant information: quality and

369
00:19:25.119 --> 00:19:26.920
<v Speaker 2>depth over just quantity.

370
00:19:27.039 --> 00:19:27.880
<v Speaker 1>Can you name a few?

371
00:19:27.960 --> 00:19:31.640
<v Speaker 2>Sure. The source mentions some good examples: Scirus, which specializes

372
00:19:31.680 --> 00:19:35.759
<v Speaker 2>in scientific, scholarly, technical, and medical data; DeepDyve, which

373
00:19:35.759 --> 00:19:39.519
<v Speaker 2>is actually an online rental service for scientific and technical articles.

374
00:19:39.519 --> 00:19:41.000
<v Speaker 2>You pay to access them for a period.

375
00:19:41.079 --> 00:19:41.799
<v Speaker 1>Interesting model.

376
00:19:41.880 --> 00:19:46.039
<v Speaker 2>Yeah. And Biznar, which focuses on business information and uses

377
00:19:46.079 --> 00:19:50.920
<v Speaker 2>what's called federated search technology, meaning it queries multiple authoritative

378
00:19:50.920 --> 00:19:54.920
<v Speaker 2>business databases simultaneously and combines the results for you. Very

379
00:19:54.960 --> 00:19:56.279
<v Speaker 2>powerful for business research.

380
00:19:56.480 --> 00:19:59.960
<v Speaker 1>Okay, So, with all this data floating around, both visible

381
00:20:00.240 --> 00:20:02.759
<v Speaker 1>and hidden, what does this all mean when we talk

382
00:20:02.799 --> 00:20:06.359
<v Speaker 1>about online identity? This section really makes you think about

383
00:20:06.359 --> 00:20:09.000
<v Speaker 1>how our digital selves are constructed and perceived.

384
00:20:09.079 --> 00:20:11.759
<v Speaker 2>It really does, and it raises a critical question, doesn't it?

385
00:20:12.160 --> 00:20:15.000
<v Speaker 2>In a world just awash with data, how do

386
00:20:15.039 --> 00:20:18.839
<v Speaker 2>we balance the desire for information, sometimes the need for it,

387
00:20:19.319 --> 00:20:21.279
<v Speaker 2>with our fundamental rights to privacy.

388
00:20:21.440 --> 00:20:22.839
<v Speaker 1>It's a tough balance.

389
00:20:23.160 --> 00:20:27.160
<v Speaker 2>It is. The Internet itself is nuanced. It's not inherently private or public.

390
00:20:27.279 --> 00:20:30.039
<v Speaker 2>Our privacy depends so much on our own behavior and

391
00:20:30.079 --> 00:20:32.480
<v Speaker 2>the specific services we choose to use.

392
00:20:32.440 --> 00:20:35.039
<v Speaker 1>And this leads to harmful things like doxing attacks.

393
00:20:35.279 --> 00:20:38.480
<v Speaker 2>Yes, doxing is a chilling example of that balance failing.

394
00:20:38.920 --> 00:20:42.920
<v Speaker 2>It's when someone digs up and maliciously publishes personally identifiable

395
00:20:42.960 --> 00:20:46.759
<v Speaker 2>information, your address, phone number, family details, to publicly shame,

396
00:20:46.920 --> 00:20:50.240
<v Speaker 2>commit fraud, harass, or even directly harm someone.

397
00:20:50.400 --> 00:20:51.960
<v Speaker 1>Horrible. And that's related to inference attacks?

398
00:20:51.599 --> 00:20:56.200
<v Speaker 2>Closely related, yeah. Inference attacks are about revealing unintended

399
00:20:56.200 --> 00:21:01.279
<v Speaker 2>information about individuals or even hidden dark networks. These insights

400
00:21:01.400 --> 00:21:05.079
<v Speaker 2>often come from analyzing our electronic data doubles or data

401
00:21:05.160 --> 00:21:06.359
<v Speaker 2>doppelgangers.

402
00:21:06.039 --> 00:21:07.160
<v Speaker 1>Our digital footprints, essentially?

403
00:21:07.279 --> 00:21:10.799
<v Speaker 2>Yes, profiles compiled from all the data we shed online,

404
00:21:11.119 --> 00:21:14.640
<v Speaker 2>sometimes data leaked by others about us without our knowledge.

405
00:21:14.759 --> 00:21:17.839
<v Speaker 1>What tools are used to uncover these hidden connections and

406
00:21:17.960 --> 00:21:21.240
<v Speaker 1>understand online social networks? How does that work?

407
00:21:21.279 --> 00:21:26.200
<v Speaker 2>Well, social network analysis, or SNA, and its electronic counterpart, ESNA,

408
00:21:26.279 --> 00:21:30.519
<v Speaker 2>are key here. These tools are designed specifically to capture, analyze,

409
00:21:30.599 --> 00:21:32.359
<v Speaker 2>and visualize social networks.

410
00:21:32.559 --> 00:21:33.279
<v Speaker 1>What can they show?

411
00:21:33.599 --> 00:21:37.000
<v Speaker 2>They can reveal hidden relationships between people or groups, help

412
00:21:37.079 --> 00:21:40.279
<v Speaker 2>understand network centrality, like who are the key influencers or

413
00:21:40.279 --> 00:21:43.359
<v Speaker 2>connectors and identify different types of communities online.

414
00:21:43.359 --> 00:21:44.559
<v Speaker 1>What assumptions do they make?

415
00:21:44.680 --> 00:21:49.160
<v Speaker 2>They're generally built on core sociological assumptions that people are social,

416
00:21:49.240 --> 00:21:51.680
<v Speaker 2>that we tend to prefer others like ourselves. That's homophily,

417
00:21:51.920 --> 00:21:54.720
<v Speaker 2>birds of a feather, exactly, but also that we connect

418
00:21:54.720 --> 00:21:59.240
<v Speaker 2>across different groups, that's heterophily, and that social structures often

419
00:21:59.240 --> 00:21:59.960
<v Speaker 2>have hierarchies.

420
00:22:00.400 --> 00:22:02.720
<v Speaker 1>Are there specific software tools mentioned?

421
00:22:02.799 --> 00:22:06.640
<v Speaker 2>Yes, tools like Maltego Radium are mentioned. It can apparently

422
00:22:06.680 --> 00:22:09.559
<v Speaker 2>link something simple like an email address to a surprisingly

423
00:22:09.680 --> 00:22:15.039
<v Speaker 2>wide range of other electronic info: URLs, websites visited, phone numbers,

424
00:22:15.480 --> 00:22:19.079
<v Speaker 2>even mapping out broader network structures. Wow. And NodeXL

425
00:22:19.200 --> 00:22:22.359
<v Speaker 2>is often used specifically for mapping Twitter user networks and

426
00:22:22.480 --> 00:22:24.799
<v Speaker 2>visualizing those complex social connections.

427
00:22:25.039 --> 00:22:28.000
<v Speaker 1>How dynamic are these online networks? Do they stay static

428
00:22:28.079 --> 00:22:29.279
<v Speaker 1>or are they constantly changing?

429
00:22:29.480 --> 00:22:32.920
<v Speaker 2>Oh, they're incredibly dynamic. Members are constantly cycling in and out.

430
00:22:33.200 --> 00:22:36.880
<v Speaker 2>Research shows large social networks typically have a few distinct parts.

431
00:22:37.200 --> 00:22:40.920
<v Speaker 2>There's usually a giant component, a tightly connected core group.

432
00:22:41.119 --> 00:22:43.759
<v Speaker 2>Then there's a middle region made up of smaller clusters,

433
00:22:43.960 --> 00:22:48.519
<v Speaker 2>often formed around charismatic star individuals, and finally an outer

434
00:22:48.599 --> 00:22:51.599
<v Speaker 2>periphery of isolated nodes or individuals.

435
00:22:51.599 --> 00:22:52.799
<v Speaker 1>Do these groups merge easily?

436
00:22:53.079 --> 00:22:56.359
<v Speaker 2>What's particularly fascinating, according to the research cited, is that

437
00:22:56.400 --> 00:22:59.160
<v Speaker 2>there seems to be a low likelihood of these isolated

438
00:22:59.160 --> 00:23:03.519
<v Speaker 2>communities merging with the main core. Instead, those middle region

439
00:23:03.559 --> 00:23:06.720
<v Speaker 2>clusters tend to either eventually merge with the main mass

440
00:23:06.839 --> 00:23:11.200
<v Speaker 2>or they just disappear if the central star individual stops

441
00:23:11.319 --> 00:23:12.559
<v Speaker 2>actively cultivating them.

442
00:23:12.759 --> 00:23:16.599
<v Speaker 1>Interesting. That makes you wonder about the accuracy of these analyses, though.

443
00:23:16.640 --> 00:23:18.480
<v Speaker 1>What are the limits of these tools? Can they get

444
00:23:18.480 --> 00:23:18.799
<v Speaker 1>it wrong?

445
00:23:18.880 --> 00:23:21.960
<v Speaker 2>Oh, absolutely, there are definite limits. The results can be

446
00:23:22.039 --> 00:23:25.319
<v Speaker 2>significantly affected by things like the parameters the researcher sets

447
00:23:25.359 --> 00:23:28.319
<v Speaker 2>for the data crawls, the unique quirks and limitations of

448
00:23:28.359 --> 00:23:29.359
<v Speaker 2>the tools themselves.

449
00:23:29.599 --> 00:23:31.200
<v Speaker 1>And human error, of course.

450
00:23:31.279 --> 00:23:35.960
<v Speaker 2>Incorrect processes, misinterpretation of the data, or just flawed logic

451
00:23:36.000 --> 00:23:39.279
<v Speaker 2>on the researcher's part. Plus, and this is key, the

452
00:23:39.359 --> 00:23:44.039
<v Speaker 2>validity of any electronic identity exists on a continuum. Information

453
00:23:44.119 --> 00:23:46.160
<v Speaker 2>is constantly changing online.

454
00:23:46.039 --> 00:23:48.079
<v Speaker 1>So a snapshot in time might not be accurate later?

455
00:23:48.160 --> 00:23:51.119
<v Speaker 2>Exactly. What's accurate at one moment may no longer

456
00:23:51.160 --> 00:23:53.119
<v Speaker 2>be true the next day or even the next hour.

457
00:23:53.599 --> 00:23:56.039
<v Speaker 2>It's a snapshot, definitely not a permanent record.

458
00:23:56.319 --> 00:23:59.640
<v Speaker 1>And what are the real world implications of this type

459
00:23:59.680 --> 00:24:02.519
<v Speaker 1>of data analysis? Does it affect us directly?

460
00:24:02.799 --> 00:24:06.880
<v Speaker 2>One very direct and frankly startling implication mentioned is in

461
00:24:06.920 --> 00:24:10.920
<v Speaker 2>the lending world. Apparently some lending entities now consider social

462
00:24:10.960 --> 00:24:14.559
<v Speaker 2>network connections as data points when assessing loan suitability.

463
00:24:14.880 --> 00:24:18.720
<v Speaker 1>Seriously? So who you know online could affect your loan application?

464
00:24:19.039 --> 00:24:23.000
<v Speaker 2>The suggestion is yes. If you have acquaintances online who

465
00:24:23.039 --> 00:24:26.519
<v Speaker 2>happen to have poor credit scores, that connection could potentially

466
00:24:26.599 --> 00:24:30.599
<v Speaker 2>negatively affect your own loan eligibility. It really blurs

467
00:24:30.640 --> 00:24:33.759
<v Speaker 2>the lines between your personal connections and your financial standing.

468
00:24:33.920 --> 00:24:37.359
<v Speaker 1>That's quite something. Okay, let's shift now to the fascinating

469
00:24:37.359 --> 00:24:41.079
<v Speaker 1>concept of becoming Anonymous. Many people associate this name with

470
00:24:41.119 --> 00:24:43.920
<v Speaker 1>a specific, often controversial group.

471
00:24:43.839 --> 00:24:46.559
<v Speaker 2>And that's correct, but it's really important to understand Anonymous

472
00:24:46.559 --> 00:24:49.319
<v Speaker 2>not as a formal organization or a club, or even

473
00:24:49.400 --> 00:24:52.599
<v Speaker 2>a movement with clear leaders or a fixed ideology.

474
00:24:52.720 --> 00:24:53.440
<v Speaker 1>So what is it? Then?

475
00:24:53.720 --> 00:24:56.240
<v Speaker 2>The source defines it more like people who travel a

476
00:24:56.240 --> 00:25:00.680
<v Speaker 2>short distance together, united temporarily by common goals or dislikes.

477
00:25:00.799 --> 00:25:04.920
<v Speaker 2>Its infrastructure is remarkably decentralized and incredibly adaptable.

478
00:25:05.200 --> 00:25:06.039
<v Speaker 1>How does it operate?

479
00:25:06.359 --> 00:25:11.559
<v Speaker 2>It leverages existing Internet facilities: social networks, IRC (Internet Relay

480
00:25:11.640 --> 00:25:16.000
<v Speaker 2>Chat), imageboards like 4chan, and they're known for being

481
00:25:16.119 --> 00:25:19.039
<v Speaker 2>incredibly quick to shift to new platforms if a previous

482
00:25:19.039 --> 00:25:20.680
<v Speaker 2>one gets compromised or shut down.

483
00:25:20.839 --> 00:25:23.880
<v Speaker 1>And they gained global notoriety pretty quickly, didn't they? Back around

484
00:25:23.880 --> 00:25:24.400
<v Speaker 1>twenty ten.

485
00:25:24.559 --> 00:25:28.200
<v Speaker 2>Absolutely, they really burst onto the global scene by orchestrating

486
00:25:28.279 --> 00:25:31.720
<v Speaker 2>those online uprisings in support of WikiLeaks in twenty ten,

487
00:25:32.160 --> 00:25:35.519
<v Speaker 2>and then notably they claimed to have infiltrated NATO systems in

488
00:25:35.559 --> 00:25:36.240
<v Speaker 2>twenty eleven.

489
00:25:36.519 --> 00:25:39.559
<v Speaker 1>What are their defining characteristics according to the research.

490
00:25:39.359 --> 00:25:44.160
<v Speaker 2>A constantly changing membership for one, an increasing politicization over time,

491
00:25:44.759 --> 00:25:48.279
<v Speaker 2>engagement in actions that are often illegal, and that distinct

492
00:25:48.400 --> 00:25:50.640
<v Speaker 2>fluid network structure we just talked about.

493
00:25:50.759 --> 00:25:54.359
<v Speaker 1>The iconic Guy Fawkes mask is practically synonymous with Anonymous

494
00:25:54.400 --> 00:25:58.000
<v Speaker 1>now. What's its deeper significance beyond just being a disguise?

495
00:25:58.039 --> 00:25:59.039
<v Speaker 1>Where did that come from?

496
00:25:59.160 --> 00:26:02.319
<v Speaker 2>Well, the mask was hugely popularized by the two thousand

497
00:26:02.319 --> 00:26:06.279
<v Speaker 2>and six movie V for Vendetta. Of course, symbolically it's

498
00:26:06.319 --> 00:26:09.920
<v Speaker 2>seen as challenging cultural assumptions. It represents a kind of

499
00:26:09.960 --> 00:26:14.039
<v Speaker 2>paradoxical non-identity. A non-identity? Yeah, rejecting clear-cut

500
00:26:14.160 --> 00:26:18.640
<v Speaker 2>either-or alternatives. It serves as a powerful strategy, arguably, to

501
00:26:18.720 --> 00:26:22.000
<v Speaker 2>protect the individual self in an age of pervasive surveillance.

502
00:26:22.319 --> 00:26:26.400
<v Speaker 2>There's also a carnivalesque element to it that the source discusses.

503
00:26:27.160 --> 00:26:30.440
<v Speaker 2>How so? In the sense that laughter becomes both protest

504
00:26:30.440 --> 00:26:34.880
<v Speaker 2>and acceptance. Social distinctions are temporarily suspended, and the mask

505
00:26:34.960 --> 00:26:39.240
<v Speaker 2>itself symbolizes an escape from a fixed social personality or identity.

506
00:26:39.279 --> 00:26:41.599
<v Speaker 1>That's a powerful symbol, especially when you contrast the

507
00:26:41.599 --> 00:26:45.160
<v Speaker 1>movie's character V with the actual historical Guy Fawkes.

508
00:26:45.279 --> 00:26:48.000
<v Speaker 2>Indeed, the historical Guy Fawkes, as you know, was caught,

509
00:26:48.279 --> 00:26:52.799
<v Speaker 2>tortured and his plot failed miserably. In stark contrast, V

510
00:26:52.920 --> 00:26:56.000
<v Speaker 2>in the movie successfully blows up Parliament, and crucially with

511
00:26:56.079 --> 00:26:59.799
<v Speaker 2>collective participation from the masked masses. This emphasizes that for

512
00:26:59.839 --> 00:27:05.440
<v Speaker 2>Anonymous, the idea itself, freedom, anti-authoritarianism, whatever it might

513
00:27:05.480 --> 00:27:08.160
<v Speaker 2>be at the moment, matters more than the specific individual

514
00:27:08.200 --> 00:27:08.920
<v Speaker 2>behind the mask.

515
00:27:09.319 --> 00:27:11.400
<v Speaker 1>There's a weird link to an Internet meme?

516
00:27:11.519 --> 00:27:14.359
<v Speaker 2>Yeah, it's an odd cultural footnote, but the source mentions

517
00:27:14.400 --> 00:27:17.359
<v Speaker 2>how the Epic Fail Guy meme, in a strange twist

518
00:27:17.359 --> 00:27:20.279
<v Speaker 2>of Internet culture, might have even helped set the stage

519
00:27:20.279 --> 00:27:23.960
<v Speaker 2>for Anonymous's adoption of the mask. It apparently created a

520
00:27:24.039 --> 00:27:27.440
<v Speaker 2>kind of seamless cognitive link back to the historical figure

521
00:27:27.640 --> 00:27:30.519
<v Speaker 2>who was, after all, famous for an epic fail.

522
00:27:30.759 --> 00:27:34.920
<v Speaker 1>Huh, strange connections. Okay, finally, let's turn our attention to

523
00:27:34.960 --> 00:27:38.519
<v Speaker 1>the cutting edge of web crawling itself. This section focuses

524
00:27:38.559 --> 00:27:41.680
<v Speaker 1>on optimizing how we access and retrieve all this hidden

525
00:27:41.720 --> 00:27:43.759
<v Speaker 1>information we've been talking about.

526
00:27:43.680 --> 00:27:47.839
<v Speaker 2>Right, and this really underscores how continuous innovation in crawling technologies

527
00:27:47.920 --> 00:27:50.640
<v Speaker 2>is absolutely essential. We need it to keep pace with

528
00:27:50.720 --> 00:27:53.519
<v Speaker 2>the Web's evolving structure and to effectively surface all this

529
00:27:53.640 --> 00:27:56.839
<v Speaker 2>vast hidden data. It's fundamentally about making the deep web

530
00:27:56.920 --> 00:27:58.319
<v Speaker 2>truly searchable and useful.

531
00:27:58.680 --> 00:28:01.920
<v Speaker 1>So what's the main challenge again, that traditional search engines

532
00:28:01.960 --> 00:28:04.759
<v Speaker 1>face when trying to index this deep web? Why can't

533
00:28:04.759 --> 00:28:05.559
<v Speaker 1>Google just do it?

534
00:28:05.880 --> 00:28:09.400
<v Speaker 2>The fundamental problem is that conventional search engines are primarily

535
00:28:09.400 --> 00:28:14.119
<v Speaker 2>designed to index static, linked pages. They really struggle to

536
00:28:14.200 --> 00:28:17.480
<v Speaker 2>efficiently crawl the massive deep Web because, as we've said,

537
00:28:17.880 --> 00:28:20.680
<v Speaker 2>its pages are dynamic, generated on the fly, and they

538
00:28:20.680 --> 00:28:24.559
<v Speaker 2>are often hidden behind restricted search interfaces like login

539
00:28:24.640 --> 00:28:26.079
<v Speaker 2>pages or complex forms.

540
00:28:26.160 --> 00:28:27.720
<v Speaker 1>Yeah, that makes it expensive.

541
00:28:27.400 --> 00:28:32.799
<v Speaker 2>Exactly. It significantly increases the costs associated with accessing the data,

542
00:28:32.839 --> 00:28:36.759
<v Speaker 2>storing it, and communicating it. The proposed solution discussed here

543
00:28:36.839 --> 00:28:40.319
<v Speaker 2>is the idea of vertical search engines, vertical meaning domain

544
00:28:40.359 --> 00:28:44.039
<v Speaker 2>specific engines designed to provide much better quality search results,

545
00:28:44.079 --> 00:28:47.160
<v Speaker 2>but only within a specific area of deep web content,

546
00:28:47.559 --> 00:28:51.160
<v Speaker 2>narrowing the focus to, say, academic papers or job listings

547
00:28:51.240 --> 00:28:52.160
<v Speaker 2>or flight information.

548
00:28:52.480 --> 00:28:55.000
<v Speaker 1>And the core objective of these new systems is to

549
00:28:55.079 --> 00:28:59.720
<v Speaker 1>dramatically reduce those costs while providing better, more relevant results.

550
00:29:00.119 --> 00:29:03.440
<v Speaker 2>Precisely, that's the goal. They achieve this by using techniques

551
00:29:03.480 --> 00:29:06.799
<v Speaker 2>like parallel computing, doing many things at once, distributing the

552
00:29:06.799 --> 00:29:10.680
<v Speaker 2>hardware and software load, and employing more efficient indexing techniques

553
00:29:10.799 --> 00:29:14.200
<v Speaker 2>like automatic segmentation to get high precision results.

554
00:29:14.640 --> 00:29:16.599
<v Speaker 1>Is there a way to measure the cost?

555
00:29:16.920 --> 00:29:20.079
<v Speaker 2>The source even provides a cost calculation model for querying

556
00:29:20.079 --> 00:29:23.240
<v Speaker 2>a database. It basically looks at the number of records

557
00:29:23.279 --> 00:29:26.240
<v Speaker 2>matched by a query versus the maximum number displayed per

558
00:29:26.319 --> 00:29:30.920
<v Speaker 2>result page: the cost of query q_i on database DB is the matched record count divided by k, as the formula puts it.

559
00:29:30.559 --> 00:29:32.480
<v Speaker 1>And the results were promising?

560
00:29:32.279 --> 00:29:36.839
<v Speaker 2>Yeah, experimental results have consistently demonstrated significant reductions in communication costs,

561
00:29:37.200 --> 00:29:41.400
<v Speaker 2>access costs, storage costs, and computational costs. It's really about

562
00:29:41.480 --> 00:29:44.279
<v Speaker 2>being smarter, not just bigger or faster, in how we

563
00:29:44.359 --> 00:29:46.400
<v Speaker 2>approach searching this hidden data.

564
00:29:46.440 --> 00:29:49.960
<v Speaker 1>What about designing a whole new architecture for deep web crawlers?

565
00:29:50.160 --> 00:29:52.559
<v Speaker 1>Are there efforts to fundamentally rethink how they work to

566
00:29:52.640 --> 00:29:53.960
<v Speaker 1>uncover even more data?

567
00:29:54.240 --> 00:29:57.240
<v Speaker 2>Yes, because even existing deep web crawlers still leave a

568
00:29:57.319 --> 00:30:00.400
<v Speaker 2>large volume of data undiscovered. So one chapter proposes a

569
00:30:00.440 --> 00:30:05.359
<v Speaker 2>novel architecture based on something called the QIIIEP specification.

570
00:30:05.759 --> 00:30:06.640
<v Speaker 1>What does that improve?

571
00:30:06.920 --> 00:30:08.960
<v Speaker 2>Think of it as a much smarter way for the

572
00:30:09.039 --> 00:30:12.319
<v Speaker 2>crawler to interact with web forms. It includes improvements like

573
00:30:12.680 --> 00:30:16.319
<v Speaker 2>enhanced form filling capabilities, more automatic query selections so it

574
00:30:16.359 --> 00:30:19.200
<v Speaker 2>knows what to search for, and minimizing errors when it

575
00:30:19.240 --> 00:30:21.279
<v Speaker 2>follows links or forwards pages.

576
00:30:21.319 --> 00:30:23.319
<v Speaker 1>Does it involve different parts working together?

577
00:30:23.519 --> 00:30:26.640
<v Speaker 2>It does. It lists key modules within the architecture, things

578
00:30:26.720 --> 00:30:30.480
<v Speaker 2>like a page fetcher, a page analyzer, a form ID manager,

579
00:30:30.839 --> 00:30:35.440
<v Speaker 2>the QIIIEP server itself, a form submitter, link extractor, link ranker,

580
00:30:35.519 --> 00:30:37.079
<v Speaker 2>and so on. They all work in concert.

581
00:30:37.480 --> 00:30:39.039
<v Speaker 1>Can you simplify the process?

582
00:30:39.359 --> 00:30:42.559
<v Speaker 2>Basically, it starts with initial page analysis and filtering links.

583
00:30:42.920 --> 00:30:46.599
<v Speaker 2>Then it identifies the search interfaces, uses this QIIIEP server

584
00:30:46.720 --> 00:30:50.160
<v Speaker 2>to correlate the form fields, intelligently submits the filled forms,

585
00:30:50.200 --> 00:30:53.200
<v Speaker 2>and then crawls, ranks and stores the dynamic content that

586
00:30:53.240 --> 00:30:54.000
<v Speaker 2>gets generated.

587
00:30:54.160 --> 00:30:57.480
<v Speaker 1>And how effective is this new architecture in actually finding

588
00:30:57.480 --> 00:30:59.359
<v Speaker 1>the hidden data? Does it work better?

589
00:30:59.599 --> 00:31:03.880
<v Speaker 2>It shows a significantly high harvest ratio for focused domains, for instance,

590
00:31:03.920 --> 00:31:06.759
<v Speaker 2>achieving nearly eighty percent harvest for job and book related

591
00:31:06.759 --> 00:31:10.240
<v Speaker 2>searches and over fifty percent for auto domains.

592
00:31:09.920 --> 00:31:11.920
<v Speaker 1>So much better than previous methods.

593
00:31:12.160 --> 00:31:16.400
<v Speaker 2>Yes, this indicates a substantial improvement over existing approaches. The

594
00:31:16.440 --> 00:31:20.759
<v Speaker 2>overall benefits are clear: better performance, reduced costs for deep

595
00:31:20.759 --> 00:31:24.920
<v Speaker 2>web searching using a domain specific formula, and highly effective

596
00:31:24.960 --> 00:31:28.880
<v Speaker 2>link generation. It boasts over forty percent effective links with

597
00:31:28.960 --> 00:31:32.240
<v Speaker 2>forms extraction per loop, meaning it's far more efficient at

598
00:31:32.240 --> 00:31:34.480
<v Speaker 2>pulling out that dynamic information we're looking for.

599
00:31:34.640 --> 00:31:37.079
<v Speaker 1>Okay, our final point in this deep dive touches on

600
00:31:37.200 --> 00:31:40.319
<v Speaker 1>how search engines, both surface and deep, act as a

601
00:31:40.400 --> 00:31:44.359
<v Speaker 1>kind of backbone for information extraction across so many fields

602
00:31:44.359 --> 00:31:45.720
<v Speaker 1>in our modern ICT world.

603
00:31:45.839 --> 00:31:50.599
<v Speaker 2>Absolutely. Information and communication technology, ICT, and the search engines underpinning

604
00:31:50.680 --> 00:31:53.519
<v Speaker 2>it play just a monumental role in human development across

605
00:31:53.519 --> 00:31:57.559
<v Speaker 2>incredibly vast application areas. Like what? Well, education obviously, the

606
00:31:57.559 --> 00:32:02.599
<v Speaker 2>business environment, human resources, job information searching, e-commerce, online banking,

607
00:32:02.759 --> 00:32:07.880
<v Speaker 2>database related information extraction in general, health information, e-government services. The list goes on.

608
00:32:08.400 --> 00:32:10.640
<v Speaker 1>And the deep web information is

609
00:32:10.680 --> 00:32:12.200
<v Speaker 1>particularly crucial here?

610
00:32:12.200 --> 00:32:16.359
<v Speaker 2>Often, yes, because its dynamic pages can offer that right

611
00:32:16.519 --> 00:32:20.599
<v Speaker 2>list of highly relevant specific information rather than just a

612
00:32:20.680 --> 00:32:23.720
<v Speaker 2>long list of general hits from the surface web. And

613
00:32:23.799 --> 00:32:28.920
<v Speaker 2>crucially it often provides access to incredibly authoritative primary source sites.

614
00:32:29.079 --> 00:32:31.039
<v Speaker 1>And just to recap, there are different types of web

615
00:32:31.079 --> 00:32:35.519
<v Speaker 1>crawlers tailored for different needs, reflecting the web's diverse structure.

616
00:32:35.599 --> 00:32:38.480
<v Speaker 2>Exactly. You've got the simple crawler, which is just a single process,

617
00:32:39.039 --> 00:32:43.759
<v Speaker 2>the parallel crawler using multithreading for faster downloads, focused

618
00:32:43.759 --> 00:32:47.839
<v Speaker 2>crawlers designed to zero in on specific topics or domains,

619
00:32:47.880 --> 00:32:50.559
<v Speaker 2>incremental crawlers which are smart enough to only go back

620
00:32:50.559 --> 00:32:51.400
<v Speaker 2>and update pages that

621
00:32:51.400 --> 00:32:53.440
<v Speaker 1>have actually changed, saving resources. Right.

622
00:32:53.680 --> 00:32:56.279
<v Speaker 2>And then the hidden or deep crawlers we've been focusing

623
00:32:56.359 --> 00:32:59.599
<v Speaker 2>on specifically designed for those dynamic pages and the vast

624
00:32:59.640 --> 00:33:02.200
<v Speaker 2>amounts of online data hidden behind forms.

625
00:33:02.680 --> 00:33:05.519
<v Speaker 1>Can you just highlight the key differences one last time

626
00:33:05.599 --> 00:33:07.759
<v Speaker 1>between the surface web and the deep web that these

627
00:33:07.839 --> 00:33:11.359
<v Speaker 1>various crawlers navigate, just to really solidify our understanding.

628
00:33:11.559 --> 00:33:16.119
<v Speaker 2>Certainly. So the surface web, mostly static, linked pages, generally

629
00:33:16.160 --> 00:33:20.319
<v Speaker 2>offers broad, less specialized content. It doesn't publish results through

630
00:33:20.359 --> 00:33:24.200
<v Speaker 2>direct database queries. Its content often comes from less professional

631
00:33:24.279 --> 00:33:27.079
<v Speaker 2>or structured sources, and it represents just a fraction of

632
00:33:27.119 --> 00:33:29.480
<v Speaker 2>the overall unstructured content online.

633
00:33:29.680 --> 00:33:31.000
<v Speaker 1>Okay, and the deep web.

634
00:33:31.079 --> 00:33:34.680
<v Speaker 2>The deep web, which is dynamic, holds an enormous amount of online data.

635
00:33:35.119 --> 00:33:38.200
<v Speaker 2>It requires more resources and processing power to access, but

636
00:33:38.279 --> 00:33:42.400
<v Speaker 2>in return, it offers narrower, much deeper, often higher quality content.

637
00:33:42.920 --> 00:33:46.839
<v Speaker 2>It publishes results through direct queries to databases, and frequently

638
00:33:46.839 --> 00:33:51.519
<v Speaker 2>contains highly professional, authoritative material. It's truly the hidden bulk

639
00:33:51.559 --> 00:33:52.200
<v Speaker 2>of the Internet.

640
00:33:52.200 --> 00:33:55.640
<v Speaker 1>Iceberg. And we mentioned some specific deep web search engines

641
00:33:55.720 --> 00:33:56.839
<v Speaker 1>tailored for these needs.

642
00:33:56.960 --> 00:34:01.599
<v Speaker 2>Yes, just reinforcing the point: Scirus, focusing on science, scholarly, technical,

643
00:34:01.640 --> 00:34:05.079
<v Speaker 2>and medical data; DeepDyve, the rental service for research articles;

644
00:34:05.480 --> 00:34:08.440
<v Speaker 2>and Biznar, the business-focused engine, using that federated search

645
00:34:08.480 --> 00:34:13.440
<v Speaker 2>technology to combine results from multiple authoritative business collections simultaneously.

646
00:34:13.760 --> 00:34:15.559
<v Speaker 2>Tools designed for specific deep dives.

647
00:34:15.840 --> 00:34:18.840
<v Speaker 1>What an incredible deep dive that was into the deep and

648
00:34:19.960 --> 00:34:23.000
<v Speaker 1>the dark Web. Yeah, we've truly unpacked a lot today,

649
00:34:23.000 --> 00:34:27.119
<v Speaker 1>everything from cybercrime's unsettling psychological roots, you know, that dark

650
00:34:27.159 --> 00:34:28.000
<v Speaker 1>triad stuff.

651
00:34:28.280 --> 00:34:33.719
<v Speaker 2>Yeah, and the fascinating complex social dynamics of groups like Anonymous.

652
00:34:33.159 --> 00:34:35.519
<v Speaker 1>Right, all the way to the cutting-edge technology that

653
00:34:35.559 --> 00:34:39.159
<v Speaker 1>helps us navigate and even harness these hidden digital landscapes,

654
00:34:39.679 --> 00:34:41.719
<v Speaker 1>the crawlers, the analysis.

655
00:34:41.719 --> 00:34:44.280
<v Speaker 2>We've really seen how the Internet is just this complex,

656
00:34:44.519 --> 00:34:48.719
<v Speaker 2>constantly evolving entity, hasn't it? Always blurring the lines between

657
00:34:48.880 --> 00:34:51.280
<v Speaker 2>public and private, visible and hidden?

658
00:34:51.519 --> 00:34:52.000
<v Speaker 1>Definitely.

659
00:34:52.119 --> 00:34:55.719
<v Speaker 2>And how continuous innovation in data extraction and crawling tech

660
00:34:55.840 --> 00:34:59.000
<v Speaker 2>is just absolutely essential to surface and harness its vast,

661
00:34:59.400 --> 00:35:04.199
<v Speaker 2>often hidden information for everything from academic research to business intelligence.

662
00:35:04.280 --> 00:35:07.360
<v Speaker 1>Well, hopefully this deep dive has given you, our listener,

663
00:35:07.679 --> 00:35:10.039
<v Speaker 1>a bit of a shortcut to being truly well informed

664
00:35:10.039 --> 00:35:13.599
<v Speaker 1>on this. Maybe offer those aha moments without the usual

665
00:35:13.639 --> 00:35:14.599
<v Speaker 1>information overload.

666
00:35:14.880 --> 00:35:18.039
<v Speaker 2>Yeah, you should now have a more nuanced understanding, hopefully,

667
00:35:18.119 --> 00:35:21.239
<v Speaker 2>of the powerful forces shaping our digital interactions and the

668
00:35:21.239 --> 00:35:24.079
<v Speaker 2>sheer amount of hidden data that influences so much of

669
00:35:24.079 --> 00:35:26.639
<v Speaker 2>our world, often without us even realizing it.

670
00:35:26.920 --> 00:35:29.679
<v Speaker 1>So here's a final provocative thought to leave you with

671
00:35:29.840 --> 00:35:30.840
<v Speaker 1>as you go about your day.

672
00:35:31.159 --> 00:35:35.119
<v Speaker 2>Consider this. As the Web continues its relentless evolution and

673
00:35:35.199 --> 00:35:38.119
<v Speaker 2>more and more of our lives move online, how will

674
00:35:38.159 --> 00:35:41.840
<v Speaker 2>the very definitions of public and private, visible and hidden

675
00:35:42.199 --> 00:35:44.320
<v Speaker 2>continue to shift and change?

676
00:35:44.280 --> 00:35:47.880
<v Speaker 1>Perhaps more importantly, what responsibilities do we all have as users,

677
00:35:48.000 --> 00:35:52.960
<v Speaker 1>as developers, as citizens, in consciously shaping that complex, interconnected future?
