WEBVTT

1
00:00:00.200 --> 00:00:04.360
<v Speaker 1>Right now, the average company is losing over eight hundred

2
00:00:04.360 --> 00:00:07.879
<v Speaker 1>and sixty thousand dollars a year just to security breaches.

3
00:00:08.000 --> 00:00:08.720
<v Speaker 2>Yeah, it's massive.

4
00:00:08.880 --> 00:00:11.880
<v Speaker 1>And honestly, if you add in the downtime, that is

5
00:00:11.919 --> 00:00:15.800
<v Speaker 1>another four hundred and ninety seven thousand plus, like nearly

6
00:00:15.960 --> 00:00:19.600
<v Speaker 1>five hundred and eighty six thousand purely to data loss

7
00:00:20.399 --> 00:00:22.399
<v Speaker 1>every single year per company.

8
00:00:22.480 --> 00:00:25.600
<v Speaker 2>It's staggering. And you know the scariest part of those

9
00:00:25.679 --> 00:00:29.000
<v Speaker 2>numbers is that attackers, well they aren't breaking through physical

10
00:00:29.039 --> 00:00:31.679
<v Speaker 2>steel vaults to get to that data anymore. Right, the

11
00:00:31.719 --> 00:00:35.159
<v Speaker 2>attack surface has just shifted entirely. We no longer secure

12
00:00:35.240 --> 00:00:37.240
<v Speaker 2>just the perimeter of a network. I mean we have

13
00:00:37.280 --> 00:00:40.520
<v Speaker 2>to secure the actual logic of the applications running on

14
00:00:40.560 --> 00:00:41.039
<v Speaker 2>top of it.

15
00:00:41.119 --> 00:00:44.920
<v Speaker 1>Because if you process payments, or store user profiles, or

16
00:00:46.119 --> 00:00:47.479
<v Speaker 1>hold intellectual property,

17
00:00:46.920 --> 00:00:49.560
<v Speaker 2>you're hosting a highly lucrative target right there on

18
00:00:49.600 --> 00:00:51.200
<v Speaker 2>a public-facing server. Exactly.

19
00:00:51.280 --> 00:00:53.640
<v Speaker 1>Okay, let's unpack this because our mission today for this

20
00:00:53.719 --> 00:00:57.719
<v Speaker 1>deep dive is to demystify how web application penetration testing

21
00:00:57.799 --> 00:00:58.600
<v Speaker 1>actually works.

22
00:00:58.719 --> 00:01:00.280
<v Speaker 2>Right, we really want to get into the toolkit.

23
00:01:00.640 --> 00:01:03.159
<v Speaker 1>Yeah, we are going to break down how these vulnerabilities

24
00:01:03.159 --> 00:01:06.239
<v Speaker 1>are discovered before the malicious actors find them and look

25
00:01:06.280 --> 00:01:10.400
<v Speaker 1>specifically at how custom Python tools are built to just

26
00:01:11.280 --> 00:01:13.760
<v Speaker 1>automate that entire discovery process.

27
00:01:13.519 --> 00:01:15.480
<v Speaker 2>Which is such a fascinating area.

28
00:01:15.640 --> 00:01:17.359
<v Speaker 1>It really is. And the first thing that stands out

29
00:01:17.359 --> 00:01:20.000
<v Speaker 1>to me about this methodology is that, well, we aren't

30
00:01:20.040 --> 00:01:23.400
<v Speaker 1>talking about static code analysis here. Like, this isn't just

31
00:01:24.040 --> 00:01:27.280
<v Speaker 1>running some scanner over thousands of lines of code to

32
00:01:27.400 --> 00:01:31.239
<v Speaker 1>check for typos. This is dynamic. It is a live

33
00:01:31.480 --> 00:01:34.200
<v Speaker 1>offensive exercise on a running application.

34
00:01:34.719 --> 00:01:39.359
<v Speaker 2>Yeah, and security professionals approach this live environment in, well,

35
00:01:39.400 --> 00:01:41.319
<v Speaker 2>one of two distinct ways. You have your black box

36
00:01:41.359 --> 00:01:44.319
<v Speaker 2>testing and your white box testing. So in a black

37
00:01:44.319 --> 00:01:48.040
<v Speaker 2>box scenario, you simulate an external attacker, you have zero

38
00:01:48.120 --> 00:01:50.879
<v Speaker 2>prior knowledge of the target's infrastructure.

39
00:01:50.280 --> 00:01:52.000
<v Speaker 1>So you're going in totally blind.

40
00:01:51.760 --> 00:01:53.840
<v Speaker 2>Exactly, you have to map the entire architecture from the

41
00:01:53.879 --> 00:01:56.640
<v Speaker 2>outside in. But in a white box test, the organization

42
00:01:57.040 --> 00:02:01.959
<v Speaker 2>actually provides the source code, the server configurations, and the

43
00:02:02.000 --> 00:02:04.400
<v Speaker 2>API documentation up front.

44
00:02:04.159 --> 00:02:07.439
<v Speaker 1>Which, I mean, that seems like cheating at first glance, right?

45
00:02:07.799 --> 00:02:10.240
<v Speaker 1>If you're simulating a hacker, why would you want the blueprints?

46
00:02:10.280 --> 00:02:13.120
<v Speaker 2>I get that a lot, actually, but it comes down

47
00:02:13.159 --> 00:02:16.520
<v Speaker 2>to speed and depth. Black box testing spends just a

48
00:02:16.599 --> 00:02:19.800
<v Speaker 2>massive amount of time simply figuring out what exists. Oh,

49
00:02:19.840 --> 00:02:23.080
<v Speaker 2>I see. Yeah, so by providing the blueprints, an organization

50
00:02:23.159 --> 00:02:27.439
<v Speaker 2>potentially bypasses that whole discovery phase. It forces the tester

51
00:02:27.599 --> 00:02:31.759
<v Speaker 2>to focus entirely on the deep, complex logic flaws that

52
00:02:31.879 --> 00:02:35.120
<v Speaker 2>an automated external scan might just miss completely.

53
00:02:35.199 --> 00:02:37.280
<v Speaker 1>That makes a lot of sense. But whether you start

54
00:02:37.319 --> 00:02:41.919
<v Speaker 1>blind or with the blueprints, the actual attack methodology still

55
00:02:41.960 --> 00:02:44.039
<v Speaker 1>follows like four rigid phases.

56
00:02:44.120 --> 00:02:47.840
<v Speaker 2>It does, yeah. First is reconnaissance. That is actively fingerprinting the infrastructure

57
00:02:48.039 --> 00:02:51.400
<v Speaker 2>to determine what web server, database, and frameworks are running.

58
00:02:51.439 --> 00:02:51.759
<v Speaker 1>Okay.

59
00:02:51.840 --> 00:02:55.240
<v Speaker 2>Second is mapping. That's essentially charting every single endpoint and

60
00:02:55.280 --> 00:02:56.199
<v Speaker 2>resource available.

61
00:02:56.240 --> 00:02:58.960
<v Speaker 1>And then you hit phase three, which is vulnerability discovery

62
00:02:59.120 --> 00:03:03.000
<v Speaker 1>where you actively fuzz those endpoints to find, you know, cracks in the logic.

63
00:03:02.639 --> 00:03:05.800
<v Speaker 2>Precisely. And finally, phase four is exploitation.

64
00:03:06.280 --> 00:03:08.840
<v Speaker 1>And what really caught my attention here is the ultimate

65
00:03:08.879 --> 00:03:12.319
<v Speaker 1>goal of that final phase because the objective often isn't

66
00:03:12.360 --> 00:03:14.599
<v Speaker 1>just to compromise the web server itself.

67
00:03:14.280 --> 00:03:16.479
<v Speaker 2>Right, because the web server usually sits in the DMZ,

68
00:03:16.879 --> 00:03:20.599
<v Speaker 2>a demilitarized zone. Exactly, it's intentionally isolated from the rest

69
00:03:20.599 --> 00:03:22.840
<v Speaker 2>of the company. So the true goal is to use

70
00:03:22.879 --> 00:03:25.960
<v Speaker 2>that compromised web server as a pivot point. You want

71
00:03:26.000 --> 00:03:28.479
<v Speaker 2>to jump the gap into the internal protected network.

72
00:03:28.560 --> 00:03:33.680
<v Speaker 1>Wow, so penetrating that internal barrier is like the holy

73
00:03:33.719 --> 00:03:35.240
<v Speaker 1>Grail of a penetration test.

74
00:03:35.800 --> 00:03:39.039
<v Speaker 2>It absolutely is. The DMZ is designed to be public facing,

75
00:03:39.120 --> 00:03:42.520
<v Speaker 2>you know, it expects hostile traffic, but the internal network

76
00:03:42.680 --> 00:03:44.280
<v Speaker 2>is where the crown jewels live.

77
00:03:44.280 --> 00:03:48.400
<v Speaker 1>The primary databases, the Active Directory servers, the records.

78
00:03:48.479 --> 00:03:52.199
<v Speaker 2>Yeah, proving you can bridge that gap demonstrates a critical

79
00:03:52.280 --> 00:03:54.080
<v Speaker 2>systemic failure in their architecture.

80
00:03:54.159 --> 00:03:58.639
<v Speaker 1>It's essentially like hiring a professional burglar to break into

81
00:03:58.680 --> 00:04:00.319
<v Speaker 1>your house. You aren't doing it just to see if

82
00:04:00.319 --> 00:04:01.960
<v Speaker 1>they can stand on your porch, right? You're doing it

83
00:04:02.000 --> 00:04:03.680
<v Speaker 1>to see if they can use a loose window in

84
00:04:03.719 --> 00:04:07.199
<v Speaker 1>the guest bathroom to somehow unlock the master bedroom safe.

85
00:04:07.400 --> 00:04:10.039
<v Speaker 2>That is a perfect analogy. But you know, to pick

86
00:04:10.080 --> 00:04:13.439
<v Speaker 2>those digital locks, you really have to understand the fundamental

87
00:04:13.520 --> 00:04:17.560
<v Speaker 2>language of the web: HTTP. Exactly. The entire Internet operates

88
00:04:17.560 --> 00:04:22.480
<v Speaker 2>on HTTP, and from an attacker's perspective, HTTP has a

89
00:04:22.759 --> 00:04:26.720
<v Speaker 2>massive structural vulnerability built right into its core.

90
00:04:26.600 --> 00:04:28.959
<v Speaker 1>Which is that it is completely stateless.

91
00:04:29.279 --> 00:04:32.839
<v Speaker 2>Yes, statelessness is the architectural quirk that enables almost all

92
00:04:32.920 --> 00:04:37.040
<v Speaker 2>web application manipulation. Neither the client nor the server retains

93
00:04:37.079 --> 00:04:38.800
<v Speaker 2>any memory of previous transactions.

94
00:04:38.839 --> 00:04:41.199
<v Speaker 1>Okay, wait, let me stop you there. Sure. Every single

95
00:04:41.240 --> 00:04:44.279
<v Speaker 1>time your browser sends a request to a server, the

96
00:04:44.319 --> 00:04:46.839
<v Speaker 1>server treats you as if it has never met you before.

97
00:04:47.079 --> 00:04:49.319
<v Speaker 2>That is exactly how the protocol is designed.

98
00:04:49.480 --> 00:04:51.759
<v Speaker 1>So if the server has total amnesia every time I

99
00:04:51.759 --> 00:04:54.360
<v Speaker 1>click a link, how does it remember that I'm securely

100
00:04:54.399 --> 00:04:56.879
<v Speaker 1>logged into my bank account, or like that I have

101
00:04:56.959 --> 00:04:58.279
<v Speaker 1>three items in a shopping cart?

102
00:04:58.439 --> 00:05:02.079
<v Speaker 2>That is the million dollar question. Engineers had to invent

103
00:05:02.199 --> 00:05:06.399
<v Speaker 2>a workaround to force state onto a stateless protocol, and

104
00:05:06.480 --> 00:05:10.639
<v Speaker 2>that workaround is the HTTP header, specifically the Set-Cookie

105
00:05:10.639 --> 00:05:15.199
<v Speaker 2>and Cookie headers. When you authenticate successfully, the server sends

106
00:05:15.240 --> 00:05:18.519
<v Speaker 2>back a Set-Cookie header. It's effectively handing your browser

107
00:05:19.000 --> 00:05:21.959
<v Speaker 2>a unique alphanumeric ID badge.

108
00:05:22.000 --> 00:05:23.160
<v Speaker 1>Okay, I'm with you.

109
00:05:23.319 --> 00:05:27.279
<v Speaker 2>So for every subsequent request you make, your browser attaches

110
00:05:27.319 --> 00:05:30.439
<v Speaker 2>that ID badge using the Cookie header. The server sees

111
00:05:30.439 --> 00:05:33.279
<v Speaker 2>the badge and says, ah, I recognize this session.

112
00:05:33.079 --> 00:05:36.759
<v Speaker 1>Which means the entire concept of a quote unquote secure

113
00:05:36.879 --> 00:05:40.160
<v Speaker 1>login session relies on those headers being passed back and

114
00:05:40.199 --> 00:05:42.480
<v Speaker 1>forth in plain sight. Exactly. So if I'm an attacker,

115
00:05:42.519 --> 00:05:44.560
<v Speaker 1>I mean, I don't need your password. If I can

116
00:05:44.600 --> 00:05:48.240
<v Speaker 1>intercept or predict that session cookie, I can just inject

117
00:05:48.240 --> 00:05:50.319
<v Speaker 1>it into my own headers and the server will treat

118
00:05:50.360 --> 00:05:52.079
<v Speaker 1>me exactly as if I am you.
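
NOTE A minimal sketch of the cookie round trip described above, using only
Python's standard library. The cookie name and value are hypothetical. It shows
that the Cookie header a client sends back is just the name=value pair from
Set-Cookie, which is why anyone holding that value can replay the session.
```python
from http.cookies import SimpleCookie
def cookie_header_from_set_cookie(set_cookie_value: str) -> str:
    """Turn a server's Set-Cookie value into the Cookie header a client sends back."""
    jar = SimpleCookie()
    jar.load(set_cookie_value)  # parses name=value; attributes such as Path are kept aside
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())
# Hypothetical session ID issued after a successful login.
issued = "sessionid=abc123; Path=/; HttpOnly"
replayed = cookie_header_from_set_cookie(issued)
print(replayed)
```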

119
00:05:52.319 --> 00:05:56.399
<v Speaker 2>That is the very essence of session hijacking. Attackers relentlessly

120
00:05:56.439 --> 00:05:59.759
<v Speaker 2>target headers because they are the control mechanism. Wow. Take the

121
00:05:59.839 --> 00:06:02.879
<v Speaker 2>User-Agent header, for example. Your browser sends this

122
00:06:02.959 --> 00:06:05.720
<v Speaker 2>client side header to tell the server what device you

123
00:06:05.759 --> 00:06:08.079
<v Speaker 2>are using, say a desktop running Chrome.

124
00:06:08.439 --> 00:06:10.560
<v Speaker 1>But I could intercept my own request and, I don't

125
00:06:10.560 --> 00:06:12.720
<v Speaker 1>know, change my User-Agent to say I'm an iPhone

126
00:06:12.759 --> 00:06:14.800
<v Speaker 1>six running an outdated version of Safari.

127
00:06:15.079 --> 00:06:18.040
<v Speaker 2>You absolutely could, and when you manipulate that header, the

128
00:06:18.120 --> 00:06:23.040
<v Speaker 2>server might route you away from the secure modern desktop application.

129
00:06:22.759 --> 00:06:26.439
<v Speaker 1>Oh, and instead serve me an older, deprecated mobile API

130
00:06:26.600 --> 00:06:28.240
<v Speaker 1>that the developers forgot to patch.

131
00:06:28.519 --> 00:06:33.160
<v Speaker 2>Exactly. It highlights a core tenet of penetration testing. You

132
00:06:33.199 --> 00:06:36.519
<v Speaker 2>can never trust client side data. Every single piece of

133
00:06:36.519 --> 00:06:39.759
<v Speaker 2>information sent from the browser to the server can be manipulated.

134
00:06:39.879 --> 00:06:42.199
<v Speaker 1>That brings up a massive mechanical problem, though. I mean,

135
00:06:42.319 --> 00:06:45.120
<v Speaker 1>standard browsers like Chrome or Safari, they go out of

136
00:06:45.160 --> 00:06:47.680
<v Speaker 1>their way to hide all this underlying plumbing. They do.

137
00:06:47.839 --> 00:06:50.439
<v Speaker 1>They definitely don't give you a button to manually edit

138
00:06:50.439 --> 00:06:54.000
<v Speaker 1>your HTTP headers mid-flight. So how do testers actually

139
00:06:54.040 --> 00:06:55.160
<v Speaker 1>manipulate this traffic?

140
00:06:55.279 --> 00:06:59.160
<v Speaker 2>They use an HTTP proxy. Tools like Burp Suite, ZAP,

141
00:06:59.839 --> 00:07:03.720
<v Speaker 2>or the Python-based mitmproxy. An intercepting proxy fundamentally

142
00:07:03.800 --> 00:07:06.920
<v Speaker 2>changes how you interact with a web application. How so? Well,

143
00:07:07.040 --> 00:07:09.720
<v Speaker 2>it sits locally on your machine, acting as a middleman

144
00:07:09.800 --> 00:07:12.600
<v Speaker 2>between your browser and the target server. When you click

145
00:07:12.639 --> 00:07:14.639
<v Speaker 2>a link, the request doesn't actually go.

146
00:07:14.600 --> 00:07:16.240
<v Speaker 1>To the Internet, it goes to the proxy.

147
00:07:16.519 --> 00:07:20.240
<v Speaker 2>Right. The proxy holds the request in suspension. It allows

148
00:07:20.240 --> 00:07:24.279
<v Speaker 2>you to manually rewrite the headers, manipulate the query parameters,

149
00:07:24.399 --> 00:07:27.120
<v Speaker 2>or alter the payload before finally releasing it to the destination.
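
NOTE A hedged sketch of automating the header rewrite inside a proxy, written
in the shape of a mitmproxy addon (a class with a request hook, listed in a
module-level addons variable). The class name and the User-Agent string are
hypothetical. Saved as a file, it would typically be loaded with
mitmproxy -s followed by the file name.
```python
class SpoofUserAgent:
    """Rewrite the User-Agent header on every intercepted request."""
    def __init__(self, user_agent: str):
        self.user_agent = user_agent
    def request(self, flow):
        # The proxy calls this hook once per client request, before forwarding it.
        flow.request.headers["User-Agent"] = self.user_agent
# Hypothetical User-Agent value; the proxy loads whatever is listed in `addons`.
addons = [SpoofUserAgent("Mozilla/5.0 (iPhone; CPU iPhone OS 12_0 like Mac OS X)")]
```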

150
00:07:27.279 --> 00:07:30.199
<v Speaker 1>Wait, hold on, If almost the entire Internet runs on

151
00:07:30.319 --> 00:07:33.360
<v Speaker 1>HTTPS now, which is end to end encrypted, how is

152
00:07:33.399 --> 00:07:35.959
<v Speaker 1>a proxy sitting in the middle intercepting that traffic?

153
00:07:36.199 --> 00:07:37.319
<v Speaker 2>That's the tricky part.

154
00:07:37.439 --> 00:07:41.720
<v Speaker 1>Shouldn't my browser immediately throw a massive red security warning

155
00:07:41.839 --> 00:07:44.680
<v Speaker 1>because the SSL certificate doesn't match the proxy?

156
00:07:44.759 --> 00:07:48.160
<v Speaker 2>It absolutely would, unless you compromise your own machine's trust store.

157
00:07:48.240 --> 00:07:53.360
<v Speaker 2>What? Yeah. To intercept HTTPS, tools like mitmproxy dynamically

158
00:07:53.399 --> 00:07:55.839
<v Speaker 2>generate fake SSL certificates on the fly.

159
00:07:56.199 --> 00:07:56.800
<v Speaker 1>You're kidding.

160
00:07:57.000 --> 00:07:59.600
<v Speaker 2>Nope. When you set up the proxy, you install its

161
00:07:59.600 --> 00:08:03.800
<v Speaker 2>custom root certificate authority directly into your operating system's trusted

162
00:08:03.839 --> 00:08:07.360
<v Speaker 2>certificate store. Oh wow, So when the proxy intercepts traffic

163
00:08:07.399 --> 00:08:10.279
<v Speaker 2>to your bank, it instantly signs a fake certificate for

164
00:08:10.319 --> 00:08:13.439
<v Speaker 2>that bank using the root authority your computer already trusts.

165
00:08:13.720 --> 00:08:17.040
<v Speaker 1>That is wild. So my browser sees a valid cryptographic

166
00:08:17.120 --> 00:08:21.079
<v Speaker 1>signature and establishes the secure tunnel with the proxy.

167
00:08:20.959 --> 00:08:24.920
<v Speaker 2>Completely unaware that the proxy is decrypting, reading, and

168
00:08:25.079 --> 00:08:27.800
<v Speaker 2>re-encrypting the traffic before sending it to the real server.

169
00:08:28.040 --> 00:08:31.800
<v Speaker 1>You are executing a deliberate, highly sophisticated man-in-the-middle

170
00:08:31.800 --> 00:08:35.159
<v Speaker 1>attack on your own hardware just to see the raw data.

171
00:08:35.080 --> 00:08:37.120
<v Speaker 2>Exactly, it's necessary.

172
00:08:37.240 --> 00:08:40.960
<v Speaker 1>That is brilliant. But doing that manually, holding individual requests

173
00:08:41.000 --> 00:08:44.919
<v Speaker 1>in suspension to rewrite headers, that has to be agonizingly slow.

174
00:08:45.039 --> 00:08:46.279
<v Speaker 2>Oh it is. It's tedious.

175
00:08:46.320 --> 00:08:49.360
<v Speaker 1>If you want to test thousands of endpoints, you need automation.

176
00:08:49.559 --> 00:08:51.879
<v Speaker 1>You need to write scripts in Python.

177
00:08:51.759 --> 00:08:55.759
<v Speaker 2>Right and to truly appreciate Python's capability here, consider the

178
00:08:55.799 --> 00:08:59.559
<v Speaker 2>traditional alternative, the raw old school method of interacting with

179
00:08:59.559 --> 00:09:02.120
<v Speaker 2>a server involved using telnet.

180
00:09:01.679 --> 00:09:02.879
<v Speaker 1>Oh man, Telnet.

181
00:09:02.960 --> 00:09:05.559
<v Speaker 2>Yeah, you would open a terminal connect to a server's

182
00:09:05.559 --> 00:09:08.519
<v Speaker 2>IP address on port eighty and manually type out the

183
00:09:08.639 --> 00:09:10.799
<v Speaker 2>raw HTTP syntax.

184
00:09:10.440 --> 00:09:14.600
<v Speaker 1>Literally typing out GET / HTTP/1.1.

185
00:09:14.519 --> 00:09:17.240
<v Speaker 2>Yes, followed by the host header, and then physically hitting

186
00:09:17.279 --> 00:09:19.679
<v Speaker 2>the enter key twice just to signal the end of

187
00:09:19.720 --> 00:09:20.240
<v Speaker 2>the request.
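
NOTE A minimal sketch of the Telnet-style exchange described above, done with
a plain TCP socket from Python's standard library. The host name is a
placeholder, and the Connection: close header is an added assumption so the
server ends the response. The two trailing line breaks correspond to hitting
Enter twice.
```python
import socket
def build_get_request(host: str, path: str = "/") -> bytes:
    """Build the raw text a person would type into a Telnet session."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}", "Connection: close"]
    # The empty line after the headers marks the end of the request.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")
def fetch(host: str, port: int = 80) -> bytes:
    """Send the request over a plain TCP socket and return the raw response."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_get_request(host))
        chunks = []
        while chunk := sock.recv(4096):
            chunks.append(chunk)
    return b"".join(chunks)
print(build_get_request("example.com").decode("ascii"))
```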

188
00:09:20.440 --> 00:09:22.519
<v Speaker 1>Doing that for a single page feels like driving a

189
00:09:22.519 --> 00:09:24.399
<v Speaker 1>manual transmission car with no power steering.

190
00:09:24.480 --> 00:09:25.720
<v Speaker 2>That's exactly what it feels like.

191
00:09:26.000 --> 00:09:30.159
<v Speaker 1>You feel every single mechanical grind of the protocol. And

192
00:09:30.279 --> 00:09:33.120
<v Speaker 1>early Python wasn't vastly better, was it?

193
00:09:33.240 --> 00:09:38.279
<v Speaker 2>Not really. The older urllib2 library required enormous boilerplate code.

194
00:09:38.600 --> 00:09:41.720
<v Speaker 2>You had to manually import separate modules to handle cookies,

195
00:09:41.919 --> 00:09:44.080
<v Speaker 2>build custom authentication handlers,

196
00:09:43.799 --> 00:09:45.799
<v Speaker 1>just to pull down a secure web page, right?

197
00:09:46.039 --> 00:09:50.159
<v Speaker 2>But that friction vanished with the introduction of Python's requests library.

198
00:09:50.519 --> 00:09:53.600
<v Speaker 2>It just abstracts away all the complex mechanics of the protocol.

199
00:09:53.759 --> 00:09:57.519
<v Speaker 1>So sending an authenticated request with custom headers is now

200
00:09:57.759 --> 00:09:59.679
<v Speaker 1>what a two line operation.

201
00:09:59.440 --> 00:10:03.240
<v Speaker 2>Literally two lines. You simply invoke requests.get. If

202
00:10:03.240 --> 00:10:05.559
<v Speaker 2>you want to spoof your device, you create a standard

203
00:10:05.559 --> 00:10:09.519
<v Speaker 2>Python dictionary, User-Agent colon iPhone six,

204
00:10:09.279 --> 00:10:11.600
<v Speaker 1>and just pass it directly into the function.

205
00:10:11.639 --> 00:10:15.519
<v Speaker 2>Exactly. The library automatically handles the TCP connection, the encoding, the

206
00:10:15.559 --> 00:10:18.000
<v Speaker 2>SSL negotiation, and, with a Session object, the session persistence.
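
NOTE A short sketch of the spoofed User-Agent request described above, assuming
the third-party requests package is installed. The URL and User-Agent string
are placeholders. The request is only prepared, not sent, so the headers can be
inspected offline; the commented line shows the form that actually sends it.
```python
import requests
# Hypothetical target URL and User-Agent string, for illustration only.
headers = {"User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 12_0 like Mac OS X)"}
# Prepare the request without sending it, to inspect what would go on the wire.
prepared = requests.Request("GET", "https://example.com/", headers=headers).prepare()
print(prepared.method, prepared.url)
print(prepared.headers["User-Agent"])
# The version that actually sends it:
# response = requests.get("https://example.com/", headers=headers)
```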

207
00:10:18.120 --> 00:10:20.320
<v Speaker 1>Okay, let me push back on this, though. Sure, go ahead.

208
00:10:20.519 --> 00:10:23.720
<v Speaker 1>If we are using Python to fire thousands of automated

209
00:10:23.799 --> 00:10:27.679
<v Speaker 1>customized payloads at a server in seconds, doesn't that immediately

210
00:10:27.720 --> 00:10:31.720
<v Speaker 1>trigger a modern web application firewall? It definitely can. Because

211
00:10:31.720 --> 00:10:36.000
<v Speaker 1>a WAF is designed to detect anomalous traffic spikes. If

212
00:10:36.000 --> 00:10:38.799
<v Speaker 1>a script hits a server a thousand times a second,

213
00:10:39.240 --> 00:10:42.039
<v Speaker 1>wouldn't the tester's IP just get banned instantly?

214
00:10:42.639 --> 00:10:46.759
<v Speaker 2>A poorly written script will absolutely trigger a firewall. That

215
00:10:46.879 --> 00:10:49.240
<v Speaker 2>is where custom automation becomes an art form.

216
00:10:49.480 --> 00:10:50.080
<v Speaker 1>Ah.

217
00:10:50.120 --> 00:10:52.960
<v Speaker 2>When you write your own Python tools, you build in

218
00:10:53.039 --> 00:10:56.519
<v Speaker 2>evasion mechanics. You introduce what's called jitter.

219
00:10:56.600 --> 00:11:00.960
<v Speaker 1>Jitter, like randomized time delays between each request?

220
00:11:00.960 --> 00:11:04.720
<v Speaker 2>Exactly, so the traffic pattern mimics human browsing rather than a machine gun.

221
00:11:05.320 --> 00:11:08.960
<v Speaker 2>You automatically rotate the user agent string so every request

222
00:11:09.000 --> 00:11:11.360
<v Speaker 2>looks like it's coming from a different device. Oh, that's clever,

223
00:11:11.559 --> 00:11:13.919
<v Speaker 2>And you route the traffic through a rotating pool of

224
00:11:14.000 --> 00:11:17.399
<v Speaker 2>proxy IP addresses. You aren't just automating the attack, you

225
00:11:17.440 --> 00:11:18.519
<v Speaker 2>are automating the stealth.

226
00:11:18.039 --> 00:11:20.519
<v Speaker 1>Which is exactly what you need when you

227
00:11:20.559 --> 00:11:23.039
<v Speaker 1>transition to the mapping phase, because you can't attack an

228
00:11:23.120 --> 00:11:24.720
<v Speaker 1>endpoint if you don't know it exists.
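
NOTE A minimal sketch of jitter plus User-Agent rotation, standard library
only. The delay bounds, the User-Agent pool, and the function name are all
hypothetical. It only plans the schedule; a real script would sleep for each
delay and then send the request, and proxy rotation is left out here.
```python
import itertools
import random
# Hypothetical User-Agent pool; a real engagement would use a longer, current list.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
    "Mozilla/5.0 (X11; Linux x86_64)",
]
def plan_requests(paths, min_delay=0.5, max_delay=3.0, seed=None):
    """Yield (path, headers, delay_seconds) with a random delay and a rotating User-Agent."""
    rng = random.Random(seed)
    agents = itertools.cycle(USER_AGENTS)
    for path in paths:
        delay = rng.uniform(min_delay, max_delay)  # the jitter
        yield path, {"User-Agent": next(agents)}, delay
plan = list(plan_requests(["/", "/login", "/about", "/contact"], seed=1))
for path, headers, delay in plan:
    # A real script would call time.sleep(delay) here and then send the request.
    print(f"{path:10} wait {delay:.2f}s as {headers['User-Agent']}")
```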

229
00:11:24.840 --> 00:11:27.960
<v Speaker 2>Right, and developers rarely publish a convenient list of their

230
00:11:28.039 --> 00:11:29.960
<v Speaker 2>hidden administrative portals.

231
00:11:29.759 --> 00:11:33.519
<v Speaker 1>So testers rely on brute force discovery. They use automated

232
00:11:33.559 --> 00:11:37.399
<v Speaker 1>tools like DIRB or ffuf combined with massive dictionary files

233
00:11:37.440 --> 00:11:40.679
<v Speaker 1>containing thousands of common vulnerable directory names.

234
00:11:40.279 --> 00:11:44.080
<v Speaker 2>Things like /backup, /test, or

235
00:11:44.200 --> 00:11:45.120
<v Speaker 2>/admin_v2.

236
00:11:45.399 --> 00:11:48.440
<v Speaker 1>Right. The script fires those dictionary terms at the server

237
00:11:48.639 --> 00:11:51.799
<v Speaker 1>and listens for anomalous responses. And it isn't just looking

238
00:11:51.799 --> 00:11:53.600
<v Speaker 1>for standard success codes, is it?

239
00:11:53.320 --> 00:11:56.720
<v Speaker 2>No, it monitors subtle variations. Think about it.

240
00:11:57.159 --> 00:12:01.559
<v Speaker 2>If requesting a thousand random nonexistent directories returns an

241
00:12:01.600 --> 00:12:04.759
<v Speaker 2>identical error page with a content length of exactly four

242
00:12:04.840 --> 00:12:09.279
<v Speaker 2>hundred bytes, okay, but requesting /dev_backup returns

243
00:12:09.279 --> 00:12:11.399
<v Speaker 2>an error page that is four hundred and fifteen bytes.

244
00:12:11.480 --> 00:12:14.840
<v Speaker 1>Oh, I see. That tiny discrepancy in the response size

245
00:12:14.879 --> 00:12:17.600
<v Speaker 1>tells the tester that the directory physically exists on the server,

246
00:12:17.720 --> 00:12:19.039
<v Speaker 1>even if access is forbidden.

247
00:12:19.080 --> 00:12:21.159
<v Speaker 2>Precisely, it's a dead giveaway.
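
NOTE A small sketch of the response-size comparison described above. The paths
and byte counts are made-up sample data, and the function name is hypothetical.
It treats the most common response length as the baseline error page and flags
anything that deviates from it.
```python
from collections import Counter
def find_anomalies(lengths_by_path):
    """Return paths whose response length differs from the most common length."""
    baseline, _ = Counter(lengths_by_path.values()).most_common(1)[0]
    return {path: size for path, size in lengths_by_path.items() if size != baseline}
# Hypothetical results: each path mapped to the Content-Length of its response.
observed = {"/qzx1": 400, "/nope": 400, "/asdf": 400, "/dev_backup": 415}
print(find_anomalies(observed))
```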

248
00:12:20.879 --> 00:12:24.080
<v Speaker 1>That covers the invisible hidden directories. But to map the

249
00:12:24.159 --> 00:12:27.879
<v Speaker 1>visible architecture of an application systematically, you have to scrape it.

250
00:12:27.519 --> 00:12:30.759
<v Speaker 2>Yes, and Python handles this brilliantly with a library

251
00:12:30.799 --> 00:12:31.559
<v Speaker 2>called Scrapy.

252
00:12:31.919 --> 00:12:36.080
<v Speaker 1>Instead of guessing URLs, a Scrapy spider navigates the application

253
00:12:36.200 --> 00:12:39.039
<v Speaker 1>exactly how a human would by following links.

254
00:12:38.960 --> 00:12:43.279
<v Speaker 2>Right, Scrapy shifts the focus from simply making requests to

255
00:12:43.399 --> 00:12:47.279
<v Speaker 2>deeply parsing their responses. When the spider downloads the HTML

256
00:12:47.320 --> 00:12:51.120
<v Speaker 2>of a page, it has to extract specific, meaningful data

257
00:12:51.440 --> 00:12:52.960
<v Speaker 2>from thousands of lines of markup.

258
00:12:52.679 --> 00:12:55.960
<v Speaker 1>And it achieves this by interacting with the

259
00:12:56.480 --> 00:12:58.960
<v Speaker 1>Document Object Model, or DOM.

260
00:12:59.080 --> 00:13:02.679
<v Speaker 2>Exactly. The DOM is essentially a hierarchical tree representing every element

261
00:13:02.720 --> 00:13:05.200
<v Speaker 2>on the page, and to navigate that tree.

262
00:13:05.279 --> 00:13:08.519
<v Speaker 1>You use XPath, which is essentially a coordinate system for

263
00:13:08.639 --> 00:13:11.039
<v Speaker 1>web data. Like if I want to extract a list

264
00:13:11.039 --> 00:13:13.360
<v Speaker 1>of book titles from a publisher site, I don't write

265
00:13:13.360 --> 00:13:15.039
<v Speaker 1>complex code to read the text.

266
00:13:15.080 --> 00:13:17.559
<v Speaker 2>No, you just inspect the page, find that the titles

267
00:13:17.559 --> 00:13:20.639
<v Speaker 2>are wrapped in a specific tag, and write an XPath query.

268
00:13:20.519 --> 00:13:24.080
<v Speaker 1>Something like, you know,

269
00:13:24.080 --> 00:13:26.799
<v Speaker 1>//div[@class="book-block-title"]/text().

270
00:13:26.840 --> 00:13:29.320
<v Speaker 2>Think about the mechanics of that query. The double forward

271
00:13:29.360 --> 00:13:32.840
<v Speaker 2>slash tells the script to search the entire document, regardless

272
00:13:32.840 --> 00:13:33.519
<v Speaker 2>of hierarchy.

273
00:13:33.600 --> 00:13:35.440
<v Speaker 1>It specifically hunts for a div node.

274
00:13:35.639 --> 00:13:38.600
<v Speaker 2>Right, The brackets act as an attribute filter, ensuring it

275
00:13:38.639 --> 00:13:42.720
<v Speaker 2>only selects nodes where the class exactly matches that title. Finally,

276
00:13:42.759 --> 00:13:46.559
<v Speaker 2>the text function strips away all the surrounding HTML markup.

277
00:13:46.480 --> 00:13:49.600
<v Speaker 1>And returns only the clean payload. It rips the exact data

278
00:13:49.639 --> 00:13:52.320
<v Speaker 1>you want, and cleanly exports it into a JSON file.
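
NOTE A hedged sketch of the same query using Python's standard library rather
than Scrapy. ElementTree supports only a subset of XPath, so the text() step
is replaced by reading each node's text attribute, and the sample markup is
hypothetical and deliberately well-formed.
```python
import json
import xml.etree.ElementTree as ET
# Hypothetical, well-formed markup standing in for a downloaded page.
PAGE = """<html><body>
<div class="book-block-title">Book One</div>
<div class="sidebar">Ignore me</div>
<div class="book-block-title">Book Two</div>
</body></html>"""
root = ET.fromstring(PAGE)
# .//div searches the whole tree regardless of depth, and the bracketed
# predicate keeps only nodes whose class attribute matches exactly.
nodes = root.findall(".//div[@class='book-block-title']")
titles = [node.text.strip() for node in nodes]
print(json.dumps({"titles": titles}))
```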

279
00:13:52.559 --> 00:13:56.159
<v Speaker 2>It's beautiful, but scraping a single page is useless for

280
00:13:56.200 --> 00:13:59.759
<v Speaker 2>mapping an entire site. The spider has to be recursive.

281
00:13:59.799 --> 00:14:03.320
<v Speaker 1>Right. It has to find every URL on the page, validate

282
00:14:03.360 --> 00:14:06.799
<v Speaker 1>them using complex regular expressions to filter out junk data,

283
00:14:07.000 --> 00:14:10.240
<v Speaker 1>and then launch new requests for every single link it finds.

284
00:14:10.519 --> 00:14:14.519
<v Speaker 2>And that recursion introduces a catastrophic risk if not managed correctly.

285
00:14:14.639 --> 00:14:17.759
<v Speaker 2>Why is that? Because web architecture is not a straight line.

286
00:14:18.000 --> 00:14:21.159
<v Speaker 2>It is a highly interconnected graph. The homepage links to

287
00:14:21.200 --> 00:14:24.080
<v Speaker 2>the about page, which links to the contact page, which

288
00:14:24.120 --> 00:14:26.399
<v Speaker 2>inevitably contains a link right back to the homepage.

289
00:14:26.440 --> 00:14:28.240
<v Speaker 1>Oh I see where this is going. If your spider

290
00:14:28.279 --> 00:14:31.039
<v Speaker 1>doesn't track its own path, it follows that cycle endlessly:

291
00:14:31.440 --> 00:14:34.600
<v Speaker 1>homepage, about, contact, homepage, about, contact.

292
00:14:34.639 --> 00:14:38.080
<v Speaker 2>It creates an infinite loop. Within minutes, the automated script

293
00:14:38.159 --> 00:14:41.639
<v Speaker 2>will consume all available local memory and crash your machine.

294
00:14:41.840 --> 00:14:45.320
<v Speaker 1>Or worse, it will effectively launch a denial of service

295
00:14:45.360 --> 00:14:49.039
<v Speaker 1>attack against the target server by hammering those three pages

296
00:14:49.519 --> 00:14:50.919
<v Speaker 1>thousands of times a second.

297
00:14:51.080 --> 00:14:56.000
<v Speaker 2>Exactly. To prevent this, professional crawlers maintain a tracking array.

298
00:14:56.480 --> 00:15:00.279
<v Speaker 2>It's a stateful list of every unique URL they have already processed.

299
00:15:00.159 --> 00:15:03.200
<v Speaker 1>So before the spider follows a new link, it

300
00:15:03.399 --> 00:15:05.759
<v Speaker 1>checks the array. If the URL is in the list,

301
00:15:05.799 --> 00:15:06.600
<v Speaker 1>it just drops it.
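
NOTE A minimal sketch of cycle-safe crawling over an in-memory link graph.
The site map and function name are hypothetical, and a set is used in place
of the array mentioned above because membership checks on a set are fast.
Every URL is queued at most once, so the homepage, about, contact loop ends.
```python
from collections import deque
# Hypothetical site: each page maps to the links found on it.
SITE = {
    "/": ["/about"],
    "/about": ["/contact"],
    "/contact": ["/", "/careers"],
    "/careers": [],
}
def crawl(start, links_on=SITE.get):
    """Breadth-first crawl that visits each URL once, even when links form a cycle."""
    visited = {start}
    queue = deque([start])
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)
        for link in links_on(url) or []:
            if link not in visited:  # already seen, so drop it
                visited.add(link)
                queue.append(link)
    return order
print(crawl("/"))
```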

302
00:15:06.799 --> 00:15:07.039
<v Speaker 2>Right.

303
00:15:07.399 --> 00:15:11.639
<v Speaker 1>Okay, if these automated Scrapy spiders are so insanely powerful

304
00:15:11.679 --> 00:15:15.799
<v Speaker 1>and they map massive architectures without triggering infinite loops, why

305
00:15:15.840 --> 00:15:18.360
<v Speaker 1>do we still need those local HTTP proxies we talked

306
00:15:18.360 --> 00:15:18.919
<v Speaker 1>about earlier.

307
00:15:18.960 --> 00:15:19.759
<v Speaker 2>That's a great question.

308
00:15:20.039 --> 00:15:23.159
<v Speaker 1>Why not just let Scrapy map the entire application automatically?

309
00:15:23.200 --> 00:15:27.080
<v Speaker 2>Because automated crawlers have a massive fundamental blind spot. They

310
00:15:27.120 --> 00:15:30.679
<v Speaker 2>do not execute JavaScript. A Scrapy spider pulls down the

311
00:15:30.759 --> 00:15:33.759
<v Speaker 2>raw static HTML response from the server and parses it,

312
00:15:34.480 --> 00:15:37.759
<v Speaker 2>but modern web applications are heavily dynamic.

313
00:15:37.480 --> 00:15:39.399
<v Speaker 1>Right, a lot of it is rendered on the client side now.

314
00:15:39.440 --> 00:15:43.039
<v Speaker 2>Yeah, many interfaces, buttons, and API endpoints don't

315
00:15:43.080 --> 00:15:46.279
<v Speaker 2>actually exist in the static HTML. They're generated dynamically by

316
00:15:46.360 --> 00:15:48.840
<v Speaker 2>JavaScript only after the browser loads the page.

317
00:15:49.000 --> 00:15:52.720
<v Speaker 1>So the spider reads the HTML, sees no links, and

318
00:15:52.799 --> 00:15:54.039
<v Speaker 1>assumes the page is empty.

319
00:15:54.159 --> 00:15:59.519
<v Speaker 2>Precisely, it completely misses the dynamically generated attack surface. But

320
00:15:59.600 --> 00:16:02.600
<v Speaker 2>a local proxy sitting between a real web browser and

321
00:16:02.639 --> 00:16:04.399
<v Speaker 2>the server captures everything.

322
00:16:04.600 --> 00:16:08.639
<v Speaker 1>Because the browser executes the JavaScript, generates the new requests

323
00:16:08.679 --> 00:16:10.399
<v Speaker 1>and sends them through the proxy.

324
00:16:10.559 --> 00:16:13.399
<v Speaker 2>Relying purely on an automated crawler leaves you with just

325
00:16:13.440 --> 00:16:16.759
<v Speaker 2>a fraction of the actual map. You must combine automated

326
00:16:16.799 --> 00:16:20.879
<v Speaker 2>scraping with proxied, browser-based interaction to see the full picture.
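
NOTE
Editor's sketch of the JavaScript blind spot described above. It uses
Python's standard html.parser as a stand-in for any static HTML crawler
(it is not how Scrapy itself is written), and the page markup and the
/api/transfer path are hypothetical. The parser only sees anchors that
exist in the raw HTML, so the link that a script would add at runtime
never shows up.
```python
from html.parser import HTMLParser
#
class LinkCollector(HTMLParser):
    """Collect href values from <a> tags present in the static markup."""
    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)
#
# Hypothetical page: one static link, one link injected by JavaScript.
PAGE = """<html><body>
<a href="/home">Home</a>
<div id="menu"></div>
<script>
  document.getElementById("menu").innerHTML = '<a href="/api/transfer">Transfer</a>';
</script>
</body></html>"""
#
collector = LinkCollector()
collector.feed(PAGE)
print(collector.links)  # ['/home']
```
A real browser would run the script and request /api/transfer, which is
why traffic captured through a local proxy fills in what the static
parse misses.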

327
00:16:20.960 --> 00:16:24.120
<v Speaker 1>This brings up a fundamental question about the modern security landscape.

328
00:16:24.159 --> 00:16:28.320
<v Speaker 1>Though there are massive commercial vulnerability scanners on the market,

329
00:16:28.360 --> 00:16:31.080
<v Speaker 1>oh definitely, tools that cost tens of thousands of dollars

330
00:16:31.159 --> 00:16:34.519
<v Speaker 1>and come with highly polished graphical interfaces, why would a

331
00:16:34.559 --> 00:16:39.240
<v Speaker 1>professional penetration tester spend hours writing custom Python scripts from scratch?

332
00:16:39.480 --> 00:16:43.240
<v Speaker 2>Because commercial scanners are built on generalizations. They are designed

333
00:16:43.279 --> 00:16:47.200
<v Speaker 2>to find known vulnerabilities in standard configurations. But enterprise web

334
00:16:47.240 --> 00:16:53.360
<v Speaker 2>applications are unique, complex ecosystems. They are sprawling amalgamations of

335
00:16:53.720 --> 00:16:58.600
<v Speaker 2>legacy codebases, customized frameworks, and proprietary business logic.

336
00:16:58.759 --> 00:17:01.559
<v Speaker 1>So a commercial scanner might not even understand how

337
00:17:01.559 --> 00:17:06.079
<v Speaker 1>to properly authenticate to a highly customized multi-factor login

338
00:17:06.119 --> 00:17:06.640
<v Speaker 1>sequence.

339
00:17:06.680 --> 00:17:09.039
<v Speaker 2>It gets stuck right at the front door. When you

340
00:17:09.039 --> 00:17:12.680
<v Speaker 2>write custom Python tools, whether it's a specialized brute forcer

341
00:17:12.799 --> 00:17:16.200
<v Speaker 2>or a tailored mitmproxy script, you adapt instantly to

342
00:17:16.240 --> 00:17:17.680
<v Speaker 2>the unique quirks of the target.

343
00:17:17.799 --> 00:17:22.119
<v Speaker 1>You build logic that perfectly mimics the application's required behavior, right?

344
00:17:22.319 --> 00:17:25.240
<v Speaker 2>Allowing you to bypass the non-standard hurdles that stop

345
00:17:25.279 --> 00:17:27.200
<v Speaker 2>automated commercial tools in their tracks.

346
00:17:27.640 --> 00:17:30.160
<v Speaker 1>But you know, the ability to instantly bypass hurdles with

347
00:17:30.160 --> 00:17:33.119
<v Speaker 1>custom code requires an environment where you can safely fail.

348
00:17:33.200 --> 00:17:33.359
<v Speaker 2>Oh.

349
00:17:33.400 --> 00:17:37.480
<v Speaker 1>Absolutely. You cannot point a newly drafted, untested brute forcer

350
00:17:37.599 --> 00:17:39.960
<v Speaker 1>at a live production server to see how it handles

351
00:17:39.960 --> 00:17:43.559
<v Speaker 1>thread concurrency. The methodology demands a sandbox.

352
00:17:43.920 --> 00:17:47.079
<v Speaker 2>Yeah, professionals use virtualization software like VirtualBox for this.

353
00:17:47.680 --> 00:17:52.720
<v Speaker 2>They run deliberately vulnerable applications like the simulated Scruffy Bank environment,

354
00:17:53.119 --> 00:17:57.200
<v Speaker 2>which operates on a standard stack of PHP, MySQL, and Apache.

355
00:17:57.240 --> 00:18:00.759
<v Speaker 1>It provides a local, contained ecosystem where a runaway script

356
00:18:00.839 --> 00:18:03.559
<v Speaker 1>won't destroy actual infrastructure.

357
00:18:02.880 --> 00:18:06.519
<v Speaker 2>Which underscores the most critical operational boundary in this entire field,

358
00:18:06.640 --> 00:18:10.599
<v Speaker 2>the legal one. Exactly. Executing these techniques against a live

359
00:18:10.640 --> 00:18:14.839
<v Speaker 2>target without explicit written authorization from the organization is a

360
00:18:14.960 --> 00:18:15.799
<v Speaker 2>federal crime.

361
00:18:16.119 --> 00:18:18.799
<v Speaker 1>Yeah. The difference between a security audit and a cyber

362
00:18:18.799 --> 00:18:22.000
<v Speaker 1>attack is purely a matter of legal scope and permission.

363
00:18:22.240 --> 00:18:24.960
<v Speaker 1>It is. It is the equivalent of holding a master key.

364
00:18:25.359 --> 00:18:27.640
<v Speaker 1>The fact that you understand the mechanics of the lock

365
00:18:27.759 --> 00:18:30.200
<v Speaker 1>and possess the tools to pick it does not grant

366
00:18:30.279 --> 00:18:32.240
<v Speaker 1>you the right to test it on a neighbor's front door.

367
00:18:32.359 --> 00:18:33.400
<v Speaker 2>It absolutely doesn't.

368
00:18:33.519 --> 00:18:37.920
<v Speaker 1>The power to map an entire corporate infrastructure or dynamically

369
00:18:37.960 --> 00:18:43.319
<v Speaker 1>intercept and rewrite HTTPS traffic in seconds carries absolute legal liability.

370
00:18:43.440 --> 00:18:47.440
<v Speaker 2>The methodology is an incredible responsibility. You are leveraging the

371
00:18:47.519 --> 00:18:51.359
<v Speaker 2>underlying architecture of the Internet to expose flaws before they

372
00:18:51.400 --> 00:18:52.160
<v Speaker 2>are weaponized.

373
00:18:52.400 --> 00:18:55.119
<v Speaker 1>When you look back at everything we've covered, it fundamentally

374
00:18:55.200 --> 00:18:58.160
<v Speaker 1>changes how you view a web browser. I mean, we

375
00:18:58.160 --> 00:19:00.920
<v Speaker 1>looked at the sheer financial devastation.

376
00:19:00.279 --> 00:19:02.720
<v Speaker 2>Of breaches, the huge numbers at the start.

377
00:19:02.680 --> 00:19:07.279
<v Speaker 1>Right, and we broke down the inherent vulnerability of HTTP's

378
00:19:07.319 --> 00:19:11.559
<v Speaker 1>stateless design and how manipulating headers allows you to bypass

379
00:19:11.599 --> 00:19:12.680
<v Speaker 1>identity controls.

380
00:19:12.960 --> 00:19:18.000
<v Speaker 2>We moved past the browser, intercepting encrypted traffic with local proxies.

381
00:19:17.640 --> 00:19:22.160
<v Speaker 1>And automating complex interactions using Python's requests library. We mapped

382
00:19:22.160 --> 00:19:25.359
<v Speaker 1>the invisible corners of applications with brute forcers and traversed

383
00:19:25.359 --> 00:19:27.519
<v Speaker 1>the DOM using Scrapy and XPath.

384
00:19:27.519 --> 00:19:30.680
<v Speaker 2>All while avoiding the traps of infinite loops and JavaScript

385
00:19:30.680 --> 00:19:31.759
<v Speaker 2>rendering blind spots.

386
00:19:31.839 --> 00:19:35.319
<v Speaker 1>It is a profound shift in perspective. You stop seeing

387
00:19:35.319 --> 00:19:37.799
<v Speaker 1>a website as a collection of pages and start seeing

388
00:19:37.839 --> 00:19:40.839
<v Speaker 1>it as a complex sequence of API calls and database

389
00:19:40.920 --> 00:19:42.839
<v Speaker 1>queries just waiting to be manipulated.

390
00:19:42.960 --> 00:19:44.240
<v Speaker 2>It really is a whole different world.

391
00:19:44.559 --> 00:19:47.160
<v Speaker 1>And as you think about that manipulation, there is one

392
00:19:47.640 --> 00:19:51.640
<v Speaker 1>final provocative concept to consider. We spend so much time

393
00:19:51.680 --> 00:19:53.640
<v Speaker 1>talking about technical flaws, right?

394
00:19:53.720 --> 00:19:57.640
<v Speaker 2>SQL injections, cross-site scripting, broken authentication algorithms.

395
00:19:58.079 --> 00:20:01.119
<v Speaker 1>Yeah, we hunt for broken code, but consider what happens

396
00:20:01.119 --> 00:20:04.720
<v Speaker 1>when these custom Python tools interact with the actual business

397
00:20:04.720 --> 00:20:07.240
<v Speaker 1>logic of an application. What do you mean? What if

398
00:20:07.240 --> 00:20:11.000
<v Speaker 1>the most devastating vulnerability isn't a coding error at all.

399
00:20:11.559 --> 00:20:14.880
<v Speaker 1>What if the code executes exactly as the engineers intended,

400
00:20:15.240 --> 00:20:17.400
<v Speaker 1>but the logic itself becomes the weapon.

401
00:20:17.720 --> 00:20:18.400
<v Speaker 2>Oh I see.

402
00:20:18.599 --> 00:20:21.359
<v Speaker 1>Imagine an e-commerce site where adding an item to

403
00:20:21.400 --> 00:20:25.680
<v Speaker 1>your cart triggers a perfectly legal HTTP request. Now imagine

404
00:20:25.759 --> 00:20:29.200
<v Speaker 1>using a custom Python script to send that exact legal request,

405
00:20:29.319 --> 00:20:31.720
<v Speaker 1>but adding a negative quantity to the shopping.

406
00:20:31.359 --> 00:20:35.279
<v Speaker 2>cart, mathematically forcing the server to reduce your total balance. Exactly.

407
00:20:35.440 --> 00:20:39.039
<v Speaker 1>The code didn't break. The firewall didn't trigger. The transaction

408
00:20:39.160 --> 00:20:42.559
<v Speaker 1>was completely valid. The threat wasn't a broken lock. The

409
00:20:42.599 --> 00:20:46.200
<v Speaker 1>threat was a completely flawless sequence of automated requests that

410
00:20:46.240 --> 00:20:49.599
<v Speaker 1>the developers simply never anticipated a human could make.
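
NOTE
Editor's sketch of the negative-quantity scenario described above. The
function names and the price are hypothetical, and this models only the
server-side arithmetic, not any real shop's code. The first function
does exactly what it was written to do, which is the problem; the second
shows one possible guard, assuming quantities must be whole numbers of
at least one.
```python
from decimal import Decimal
#
def naive_line_total(unit_price: Decimal, quantity: int) -> Decimal:
    """Multiply price by quantity with no check on the quantity."""
    return unit_price * quantity
#
def checked_line_total(unit_price: Decimal, quantity: int) -> Decimal:
    """Same arithmetic, but reject quantities a shopper cannot order."""
    if quantity < 1:
        raise ValueError("quantity must be at least 1")
    return unit_price * quantity
#
price = Decimal("19.99")
print(naive_line_total(price, 2))   # 39.98
print(naive_line_total(price, -3))  # -59.97, a credit instead of a charge
try:
    checked_line_total(price, -3)
except ValueError as error:
    print("rejected:", error)
```
Nothing in the naive version is broken in the usual sense, which is why
a scanner that looks for known coding flaws would not flag it.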

411
00:20:50.079 --> 00:20:52.359
<v Speaker 2>That is a terrifying thought. It really is.

412
00:20:52.880 --> 00:20:55.680
<v Speaker 1>We hope this deep dive into the source material has

413
00:20:55.720 --> 00:20:58.640
<v Speaker 1>given you a completely new lens through which to view

414
00:20:58.759 --> 00:21:01.920
<v Speaker 1>the invisible digital battleground. Thank you for joining us,

415
00:21:02.039 --> 00:21:02.920
<v Speaker 1>and keep exploring.
