WEBVTT

1
00:00:00.400 --> 00:00:04.320
<v Speaker 1>Welcome to the Occult Rejects. Tonight, we've got a

2
00:00:04.400 --> 00:00:07.559
<v Speaker 1>very special guest, Ashish, who is going to tell us

3
00:00:07.599 --> 00:00:12.759
<v Speaker 1>a lot of really deep intelligent stuff about AI and

4
00:00:12.839 --> 00:00:15.160
<v Speaker 1>all the rest of the stuff. He'll get into it,

5
00:00:15.519 --> 00:00:18.280
<v Speaker 1>but first, let's go to Nick. Nick, how are you doing?

6
00:00:18.359 --> 00:00:20.719
<v Speaker 1>What have you got coming out? Tell us about your herbal

7
00:00:21.160 --> 00:00:22.600
<v Speaker 1>video that just popped out?

8
00:00:22.879 --> 00:00:23.839
<v Speaker 2>Oh, thank you, thank you.

9
00:00:23.960 --> 00:00:26.000
<v Speaker 3>Yeah, it seems like I'm getting,

10
00:00:26.039 --> 00:00:27.640
<v Speaker 3>like you mentioned, good reviews, and we're

11
00:00:27.640 --> 00:00:28.559
<v Speaker 3>getting some good comments.

12
00:00:29.719 --> 00:00:31.160
<v Speaker 2>It's a different way of looking at it.

13
00:00:31.239 --> 00:00:33.640
<v Speaker 3>I do cover kind of the occult angle

14
00:00:33.679 --> 00:00:36.200
<v Speaker 3>or the witchcraft angle, maybe even a little bit

15
00:00:36.200 --> 00:00:37.799
<v Speaker 3>of history, like I forgot what it might have been,

16
00:00:37.799 --> 00:00:40.399
<v Speaker 3>peppermint or spearmint or something I was mentioning, like even

17
00:00:40.439 --> 00:00:43.799
<v Speaker 3>I think in Greek history they'd wear

18
00:00:43.840 --> 00:00:46.159
<v Speaker 3>it around their heads to help them study

19
00:00:46.280 --> 00:00:47.000
<v Speaker 3>or something when they

20
00:00:46.880 --> 00:00:48.600
<v Speaker 2>took tests, some random shit like that.

21
00:00:48.640 --> 00:00:50.719
<v Speaker 3>And I'm throwing all that stuff in there, and

22
00:00:50.759 --> 00:00:53.520
<v Speaker 3>then I'll go into how it actually affects your twelve

23
00:00:53.560 --> 00:00:56.079
<v Speaker 3>cranial nerves, and when you see it, like when

24
00:00:56.119 --> 00:00:58.479
<v Speaker 3>it's actually going to your, you know, your

25
00:00:58.560 --> 00:01:03.960
<v Speaker 3>vagus system and your, uh, your glands, uh, if it

26
00:01:04.040 --> 00:01:06.799
<v Speaker 3>basically affects, like, up to three at

27
00:01:06.799 --> 00:01:09.159
<v Speaker 3>a time, and that gets pretty interesting when it starts

28
00:01:09.239 --> 00:01:13.319
<v Speaker 3>doing that. You know, your trigeminal, your vagus system, and

29
00:01:13.359 --> 00:01:14.879
<v Speaker 3>maybe one other one.

30
00:01:15.159 --> 00:01:17.760
<v Speaker 2>But, uh, yeah, long story short,

31
00:01:18.599 --> 00:01:20.599
<v Speaker 3>I think it's interesting for people to see what

32
00:01:20.760 --> 00:01:24.439
<v Speaker 3>some of these things actually do when you just inhale them,

33
00:01:24.599 --> 00:01:26.480
<v Speaker 3>what kind of effects it's gonna have on your body

34
00:01:26.480 --> 00:01:29.319
<v Speaker 3>and how you're gonna feel afterwards, and some of them

35
00:01:29.319 --> 00:01:32.079
<v Speaker 3>that I even show, uh, if you decide to just

36
00:01:32.120 --> 00:01:35.239
<v Speaker 3>all of a sudden go crazy with the dosages, you'll

37
00:01:35.319 --> 00:01:38.120
<v Speaker 3>notice it starts to shut your system down. And I

38
00:01:38.120 --> 00:01:41.000
<v Speaker 3>start to wonder about that with maybe shamanistic rituals.

39
00:01:41.680 --> 00:01:45.280
<v Speaker 2>You know, that's why you were fumigating that stuff, you know.

40
00:01:45.400 --> 00:01:47.400
<v Speaker 3>So it's just interesting to see the kind of science

41
00:01:47.439 --> 00:01:49.599
<v Speaker 3>with these things because you start to wonder, did they

42
00:01:49.680 --> 00:01:52.959
<v Speaker 3>know this back then, because like the effects that you'll see,

43
00:01:53.000 --> 00:01:55.239
<v Speaker 3>like the ones in Scott Cunningham's book about

44
00:01:55.280 --> 00:01:57.599
<v Speaker 3>what these herbs are gonna do for you, you can say, well,

45
00:01:57.599 --> 00:01:59.719
<v Speaker 3>that's what it's doing to you fucking physically, just from

46
00:01:59.799 --> 00:02:03.239
<v Speaker 3>inhaling it or putting it on your skin, you know, so

47
00:02:03.280 --> 00:02:06.719
<v Speaker 3>you almost wonder, were they still getting it twisted?

48
00:02:06.840 --> 00:02:07.319
<v Speaker 2>You know what I'm saying.

49
00:02:07.319 --> 00:02:09.680
<v Speaker 3>It's still more science, unfortunately, with that stuff, and

50
00:02:09.719 --> 00:02:10.919
<v Speaker 3>somehow it got woo-wooed.

51
00:02:11.360 --> 00:02:11.560
<v Speaker 2>You know.

52
00:02:11.680 --> 00:02:14.719
<v Speaker 3>I don't know, did we lose the original,

53
00:02:14.879 --> 00:02:17.439
<v Speaker 3>I guess, occult idea behind it, something like that.

54
00:02:17.719 --> 00:02:20.159
<v Speaker 3>So check it out. It is getting good reviews. I

55
00:02:20.199 --> 00:02:22.080
<v Speaker 3>was a little surprised that it actually has as many comments

56
00:02:22.120 --> 00:02:24.840
<v Speaker 3>as it does now. So, uh yeah, it's got about

57
00:02:24.879 --> 00:02:27.800
<v Speaker 3>three parts, and one dropped today. We've got a few other

58
00:02:27.840 --> 00:02:29.800
<v Speaker 3>ones that will be coming out. I got part two

59
00:02:29.800 --> 00:02:34.080
<v Speaker 3>of the Colors coming out. I should have Johannes Kepler

60
00:02:34.560 --> 00:02:36.560
<v Speaker 3>dropping in like two weeks. I should be editing that

61
00:02:36.680 --> 00:02:38.000
<v Speaker 3>up and that will be out. And I got into

62
00:02:38.039 --> 00:02:40.280
<v Speaker 3>my park that should be dropping within the week,

63
00:02:40.319 --> 00:02:43.759
<v Speaker 3>I'm hoping. So yeah, we got a bunch of shit

64
00:02:43.840 --> 00:02:45.479
<v Speaker 3>coming out. I have a bunch of stuff edited. I

65
00:02:45.560 --> 00:02:48.159
<v Speaker 3>just haven't actually wrapped it all up yet. But uh yeah,

66
00:02:48.199 --> 00:02:50.120
<v Speaker 3>then we do have some stuff going forward. We'll be

67
00:02:50.159 --> 00:02:54.000
<v Speaker 3>covering the Stargate project next week. Again, it's gonna be

68
00:02:54.039 --> 00:02:56.280
<v Speaker 3>a big group, the whole Occult Rejects team. We'll

69
00:02:56.319 --> 00:02:58.879
<v Speaker 3>be here covering more of the Stargate or Gateway project,

70
00:02:59.120 --> 00:03:02.439
<v Speaker 3>paperwork, whatever it was. Yeah, so that should be interesting.

71
00:03:02.439 --> 00:03:03.520
<v Speaker 2>So we got a lot of stuff coming.

72
00:03:04.400 --> 00:03:08.319
<v Speaker 1>Thank you. All right, sounds awesome. Ashish, tell the

73
00:03:08.360 --> 00:03:11.400
<v Speaker 1>people about what you've been through, tell them about who

74
00:03:11.479 --> 00:03:12.360
<v Speaker 1>you are and what you do.

75
00:03:13.120 --> 00:03:16.439
<v Speaker 4>Hi. Good to be back on, guys. Thanks for having

76
00:03:16.479 --> 00:03:19.680
<v Speaker 4>me. In case you don't know me from the last

77
00:03:19.680 --> 00:03:22.960
<v Speaker 4>times I was on here, my name's Ashish.

78
00:03:23.120 --> 00:03:27.240
<v Speaker 4>My background is in artificial intelligence and in finance. I

79
00:03:27.319 --> 00:03:29.919
<v Speaker 4>spent ten years working in finance as a trader for

80
00:03:30.039 --> 00:03:33.479
<v Speaker 4>Edward Jones Investments, and part of my responsibilities were covering

81
00:03:33.520 --> 00:03:36.879
<v Speaker 4>tech at the time. After that, I transitioned full time

82
00:03:36.919 --> 00:03:40.680
<v Speaker 4>into AI and I was specifically trying to apply AI

83
00:03:40.719 --> 00:03:43.599
<v Speaker 4>to computer vision. And the presentation I want to give

84
00:03:43.639 --> 00:03:45.639
<v Speaker 4>today is related to that because AI is a hot

85
00:03:45.680 --> 00:03:48.960
<v Speaker 4>topic and I think one of the things that's happening,

86
00:03:49.080 --> 00:03:51.479
<v Speaker 4>just like every other subject we've seen, there's a massive

87
00:03:51.520 --> 00:03:55.120
<v Speaker 4>amount of distortion that's happening in AI. And what I

88
00:03:55.159 --> 00:03:57.840
<v Speaker 4>want to do is I want to help people. So

89
00:03:58.000 --> 00:04:00.319
<v Speaker 4>a lot of people think that AI is like a

90
00:04:00.400 --> 00:04:05.159
<v Speaker 4>super advanced subject that's impossible to tackle or understand unless

91
00:04:05.199 --> 00:04:09.639
<v Speaker 4>you have some kind of like background in computer science,

92
00:04:10.199 --> 00:04:13.400
<v Speaker 4>and to be honest, it's not really necessary. I hope

93
00:04:13.400 --> 00:04:17.000
<v Speaker 4>I can make this accessible for people to understand real AI.

94
00:04:17.720 --> 00:04:20.519
<v Speaker 4>And I think it's really important that people understand this

95
00:04:20.680 --> 00:04:25.360
<v Speaker 4>because there's so many profound things that are actually true

96
00:04:25.399 --> 00:04:29.759
<v Speaker 4>about AI that are not really publicly discussed. And so

97
00:04:30.079 --> 00:04:33.319
<v Speaker 4>one of the big topics that is kind of a

98
00:04:33.319 --> 00:04:35.279
<v Speaker 4>big deal with AI and a lot of people who

99
00:04:35.279 --> 00:04:38.920
<v Speaker 4>are like doomers on AI, they think that there's some

100
00:04:39.040 --> 00:04:42.600
<v Speaker 4>kind of artificial life form that's about to be created,

101
00:04:43.120 --> 00:04:46.399
<v Speaker 4>and this artificial life form will turn into the Terminator

102
00:04:46.480 --> 00:04:50.879
<v Speaker 4>or some apocalyptic scenario or something like that. Hopefully, what

103
00:04:50.959 --> 00:04:52.519
<v Speaker 4>I can do here is bring a little bit of

104
00:04:52.600 --> 00:04:57.399
<v Speaker 4>rationality to this subject without getting overly technical. But there are

105
00:04:57.439 --> 00:04:59.680
<v Speaker 4>a couple of technical concepts that I want to explain, no,

106
00:04:59.680 --> 00:05:04.600
<v Speaker 4>I'll make it. And I also want to be able

107
00:05:04.639 --> 00:05:08.079
<v Speaker 4>to show what real AI is and how you know

108
00:05:08.160 --> 00:05:11.480
<v Speaker 4>what real AI is versus what is not, because

109
00:05:11.519 --> 00:05:14.720
<v Speaker 4>there's a lot of junk AI out there. So let's

110
00:05:14.759 --> 00:05:20.600
<v Speaker 4>just kind of start with like what is AI? Like fundamentally,

111
00:05:20.800 --> 00:05:26.399
<v Speaker 4>at a kindergarten level, what is AI? AI is a

112
00:05:26.639 --> 00:05:31.439
<v Speaker 4>mathematical function, and it is primarily based on a reward

113
00:05:31.680 --> 00:05:34.759
<v Speaker 4>and cost. There's a function for reward that can be

114
00:05:34.800 --> 00:05:37.920
<v Speaker 4>basically anything you want, and then there's a function for

115
00:05:38.199 --> 00:05:41.519
<v Speaker 4>cost for that same reward, which can also basically be

116
00:05:41.600 --> 00:05:44.959
<v Speaker 4>whatever you want. And the objective in AI is to

117
00:05:45.040 --> 00:05:50.040
<v Speaker 4>make these two functions resemble real life as much as possible.

118
00:05:50.839 --> 00:05:53.680
<v Speaker 4>So this is really what AI is. It's a type

119
00:05:53.680 --> 00:05:59.720
<v Speaker 4>of pattern recognition map. And this pattern recognition map is intense.

120
00:06:00.319 --> 00:06:04.480
<v Speaker 4>It's extremely large. Some of these models can have billions

121
00:06:04.519 --> 00:06:07.079
<v Speaker 4>of weights, and I'll kind of explain a little bit

122
00:06:07.120 --> 00:06:12.160
<v Speaker 4>about that. And because of the massive size of these models,

123
00:06:12.600 --> 00:06:17.519
<v Speaker 4>they are also energy intensive. And as you may

124
00:06:17.560 --> 00:06:20.839
<v Speaker 4>know, in America, there's this huge push

125
00:06:20.920 --> 00:06:26.240
<v Speaker 4>for energy investment. Lately, we're spending trillions upon trillions of dollars,

126
00:06:26.319 --> 00:06:30.040
<v Speaker 4>multiple trillions of dollars into both our energy grid and

127
00:06:30.279 --> 00:06:33.759
<v Speaker 4>energy utilities. And then on top of that, AI as

128
00:06:33.800 --> 00:06:38.160
<v Speaker 4>well is also independently receiving trillions of dollars

129
00:06:38.199 --> 00:06:44.279
<v Speaker 4>in investment. So there are two major

130
00:06:44.680 --> 00:06:50.000
<v Speaker 4>energy-intensive aspects of AI. There is the training side,

131
00:06:50.480 --> 00:06:52.600
<v Speaker 4>which is like what you do to build the model,

132
00:06:53.439 --> 00:06:56.720
<v Speaker 4>and then there's the inference side, which is what happens

133
00:06:56.759 --> 00:06:59.800
<v Speaker 4>after the model's already done and you've deployed it. It's in people's

134
00:06:59.800 --> 00:07:02.519
<v Speaker 4>hands, and then they do a query on something like ChatGPT

135
00:07:02.560 --> 00:07:06.360
<v Speaker 4>or whatever, and then that query has a small

136
00:07:06.360 --> 00:07:09.279
<v Speaker 4>amount of energy that's required to do this thing called inference,

137
00:07:09.319 --> 00:07:12.879
<v Speaker 4>which then outputs your answer, your result. Now,

138
00:07:13.000 --> 00:07:17.319
<v Speaker 4>the training side is massive, and the inference side individually

139
00:07:17.639 --> 00:07:21.759
<v Speaker 4>is very small, But where the massive energy constraint comes

140
00:07:21.759 --> 00:07:25.399
<v Speaker 4>from is the scale. So once these developed models get

141
00:07:25.399 --> 00:07:29.759
<v Speaker 4>deployed out into the world, the energy demand from the

142
00:07:29.800 --> 00:07:33.879
<v Speaker 4>sum total of them all combined, that's where the energy

143
00:07:33.920 --> 00:07:37.360
<v Speaker 4>intensity goes through the roof. So there are two sides that are

144
00:07:37.399 --> 00:07:40.839
<v Speaker 4>heavily energy-intensive and required in order to make

145
00:07:40.879 --> 00:07:44.199
<v Speaker 4>AI work. So what you can see from this though,

146
00:07:44.360 --> 00:07:46.480
<v Speaker 4>and what you should kind of interpret in your head

147
00:07:46.519 --> 00:07:49.519
<v Speaker 4>just starting with this, is that if an AI model

148
00:07:49.639 --> 00:07:53.839
<v Speaker 4>requires that much energy in order to generate the model

149
00:07:53.920 --> 00:07:56.720
<v Speaker 4>and then be able to use it, you're probably in

150
00:07:56.759 --> 00:08:01.480
<v Speaker 4>the vacuum tube stage, because this is

151
00:08:01.959 --> 00:08:06.920
<v Speaker 4>nowhere near the efficiency of the compute that happens in

152
00:08:06.959 --> 00:08:09.160
<v Speaker 4>the human brain. And I'm going to start relating...

153
00:08:09.160 --> 00:08:12.240
<v Speaker 1>Hold on, because I think this is a

154
00:08:12.240 --> 00:08:16.000
<v Speaker 1>two-way street. It's not only just taking everything

155
00:08:16.040 --> 00:08:20.879
<v Speaker 1>that you're you know, sort of, it's not only taking

156
00:08:20.920 --> 00:08:24.720
<v Speaker 1>your questions. It's taking all of the information together. So

157
00:08:24.800 --> 00:08:28.040
<v Speaker 1>this is a massive data gathering tool as well as

158
00:08:28.040 --> 00:08:32.200
<v Speaker 1>a data, you know, output tool. Yeah, and that's the

159
00:08:32.200 --> 00:08:36.000
<v Speaker 1>main sort of correlation that's being drawn here. I think

160
00:08:36.039 --> 00:08:40.919
<v Speaker 1>they're sucking up a lot of core information, psychological information

161
00:08:40.960 --> 00:08:44.000
<v Speaker 1>about each one of us, how we interact with it.

162
00:08:44.679 --> 00:08:48.559
<v Speaker 4>Yep. And that kind of description that

163
00:08:48.600 --> 00:08:52.039
<v Speaker 4>you just gave is most relevant to the concept

164
00:08:52.080 --> 00:08:55.799
<v Speaker 4>of LLMs, which are large language models. So this

165
00:08:55.960 --> 00:08:59.840
<v Speaker 4>is like your ChatGPT, Gemini, Grok, whatever, take your

166
00:09:00.000 --> 00:09:03.720
<v Speaker 4>flavor of the month. There are so many of these things. This

167
00:09:03.840 --> 00:09:06.720
<v Speaker 4>is a good thing you mentioned that, because what

168
00:09:06.799 --> 00:09:08.720
<v Speaker 4>I want to show here, and one of the things

169
00:09:08.799 --> 00:09:12.039
<v Speaker 4>that I'm hoping to promote, is that computer vision is

170
00:09:12.080 --> 00:09:17.080
<v Speaker 4>a way better, more accurate version of AI than a

171
00:09:17.159 --> 00:09:21.159
<v Speaker 4>large language model. So what I was going to say

172
00:09:21.159 --> 00:09:25.879
<v Speaker 4>earlier was that I'm relating AI to the brain. And

173
00:09:25.919 --> 00:09:29.000
<v Speaker 4>what I was originally trying to do here and when

174
00:09:29.000 --> 00:09:33.080
<v Speaker 4>I was working on AI, was to model artificial intelligence

175
00:09:33.120 --> 00:09:35.559
<v Speaker 4>off the human brain. So you're going to hear me

176
00:09:35.720 --> 00:09:39.159
<v Speaker 4>relate to the human brain many times in this presentation.

177
00:09:39.320 --> 00:09:41.159
<v Speaker 4>I'll get to that because that's a big part of

178
00:09:41.159 --> 00:09:43.320
<v Speaker 4>what I'm doing here. So one of the things to

179
00:09:43.399 --> 00:09:46.279
<v Speaker 4>know about that is that a large language model,

180
00:09:46.320 --> 00:09:51.399
<v Speaker 4>an LLM, is a pattern recognition math built around literally

181
00:09:51.519 --> 00:09:56.120
<v Speaker 4>language text. And if you think about the human brain,

182
00:09:56.480 --> 00:09:59.159
<v Speaker 4>the language center of the brain is a very small

183
00:09:59.240 --> 00:10:02.399
<v Speaker 4>part of the brain, right over here. It's a

184
00:10:02.480 --> 00:10:05.919
<v Speaker 4>very small part actually, And I'm not saying it's not important.

185
00:10:05.960 --> 00:10:09.279
<v Speaker 4>A large language model and that technology is important, but

186
00:10:09.320 --> 00:10:11.519
<v Speaker 4>that is not what's going to get you to this

187
00:10:11.679 --> 00:10:14.720
<v Speaker 4>artificial life form. And that's controversial to say because there's

188
00:10:14.759 --> 00:10:17.000
<v Speaker 4>so many people that think that a large language model

189
00:10:17.039 --> 00:10:20.159
<v Speaker 4>will just suddenly become this artificial life form. But language

190
00:10:20.200 --> 00:10:23.399
<v Speaker 4>is not a life form. Language is a

191
00:10:23.440 --> 00:10:26.639
<v Speaker 4>form of pattern recognition math. And the easy way to understand

192
00:10:26.679 --> 00:10:29.960
<v Speaker 4>this without having any understanding of the math behind it

193
00:10:30.039 --> 00:10:33.120
<v Speaker 4>is if you walk into a room and there are

194
00:10:33.240 --> 00:10:36.879
<v Speaker 4>ten people speaking different languages that you've never heard before,

195
00:10:37.240 --> 00:10:40.200
<v Speaker 4>not languages that you don't understand but have heard before. No, no, no,

196
00:10:40.440 --> 00:10:43.799
<v Speaker 4>languages that you've never heard before, and you will not

197
00:10:43.879 --> 00:10:47.600
<v Speaker 4>be able to distinguish between all that noise one voice

198
00:10:47.639 --> 00:10:51.720
<v Speaker 4>from another. However, if one of those ten people starts

199
00:10:51.840 --> 00:10:55.440
<v Speaker 4>speaking English in that room, all of a sudden, you'll

200
00:10:55.480 --> 00:10:58.559
<v Speaker 4>be able to zone in on that wavelength and filter

201
00:10:58.679 --> 00:11:00.679
<v Speaker 4>it out, and you'll be able to hear the English

202
00:11:01.120 --> 00:11:04.240
<v Speaker 4>from the noise. This is how you know that language

203
00:11:04.279 --> 00:11:07.000
<v Speaker 4>is a pattern recognition math and you're not born with language.

204
00:11:07.399 --> 00:11:11.679
<v Speaker 4>Another thing to understand about language: two people in this

205
00:11:11.840 --> 00:11:17.360
<v Speaker 4>world can hear the same sound and interpret something totally

206
00:11:17.399 --> 00:11:19.639
<v Speaker 4>different from that sound. In other words, there can be

207
00:11:19.679 --> 00:11:22.159
<v Speaker 4>a word in English and then another word in a

208
00:11:22.159 --> 00:11:24.600
<v Speaker 4>different language that sounds exactly the same, but they mean two

209
00:11:24.679 --> 00:11:28.759
<v Speaker 4>totally different things. Two different people can understand two different

210
00:11:28.799 --> 00:11:31.399
<v Speaker 4>things from this same sound. So this is how you

211
00:11:31.519 --> 00:11:35.279
<v Speaker 4>know that language is an acquired skill. It is a

212
00:11:35.440 --> 00:11:39.039
<v Speaker 4>form of pattern recognition. And this is not anything that

213
00:11:39.080 --> 00:11:42.279
<v Speaker 4>you would call a life form. This is a talent,

214
00:11:42.559 --> 00:11:45.720
<v Speaker 4>you could say, but it isn't a life form. And

215
00:11:45.799 --> 00:11:48.200
<v Speaker 4>one of the other things is and I actually brought

216
00:11:48.240 --> 00:11:52.000
<v Speaker 4>this up just so you could see in case anyone

217
00:11:52.039 --> 00:11:56.279
<v Speaker 4>forgot from their seventh grade science class, but the definition

218
00:11:56.360 --> 00:12:01.320
<v Speaker 4>of life to this day still does not include intelligence.

219
00:12:02.399 --> 00:12:09.440
<v Speaker 4>And in my opinion, intelligence is equal to life. These definitions,

220
00:12:09.480 --> 00:12:11.639
<v Speaker 4>If you read what this is saying here on the screen,

221
00:12:11.679 --> 00:12:15.639
<v Speaker 4>the actual definition of life, this sounds like primates trying

222
00:12:15.679 --> 00:12:19.759
<v Speaker 4>to describe humans horribly. By the way, this is not

223
00:12:19.919 --> 00:12:23.279
<v Speaker 4>the definition of life. Take reproduction, for example:

224
00:12:23.639 --> 00:12:26.799
<v Speaker 4>it is part of life. It is an essential element of

225
00:12:26.840 --> 00:12:29.759
<v Speaker 4>what is considered life. If you took away reproduction from

226
00:12:29.799 --> 00:12:32.559
<v Speaker 4>everything on Earth tomorrow, does that mean there's no life

227
00:12:32.600 --> 00:12:37.799
<v Speaker 4>on Earth? That is how life works. So what I'm

228
00:12:37.840 --> 00:12:41.159
<v Speaker 4>trying to argue here is that intelligence is part of

229
00:12:41.200 --> 00:12:47.159
<v Speaker 4>the definition of life. And in case you're interested, there's

230
00:12:47.159 --> 00:12:49.039
<v Speaker 4>something else that I want to say before I get

231
00:12:49.039 --> 00:12:51.679
<v Speaker 4>into the actual formal presentation as a little setup here,

232
00:12:52.919 --> 00:12:58.120
<v Speaker 4>So if you're interested in understanding more about intelligence and perception,

233
00:12:58.360 --> 00:13:01.200
<v Speaker 4>my home base is perception. By the way, in AI, my focus

234
00:13:01.240 --> 00:13:07.759
<v Speaker 4>area was perception. If you want to understand perception, it

235
00:13:07.799 --> 00:13:10.879
<v Speaker 4>turns out, and what I'm going to demonstrate in this presentation,

236
00:13:11.000 --> 00:13:18.240
<v Speaker 4>it turns out that computer vision by accident implicitly understands neuroscience.

237
00:13:19.360 --> 00:13:21.840
<v Speaker 4>And I'll describe this in detail as I go through

238
00:13:21.840 --> 00:13:24.120
<v Speaker 4>the presentation. But if you want to learn more about

239
00:13:24.120 --> 00:13:27.000
<v Speaker 4>what I'm talking about on the neuroscience side, which I'm

240
00:13:27.000 --> 00:13:28.639
<v Speaker 4>not going to talk about too much. I'll talk mostly

241
00:13:28.639 --> 00:13:30.000
<v Speaker 4>on the AI side, but if you want to learn

242
00:13:30.039 --> 00:13:34.159
<v Speaker 4>more about the neuroscience side, I highly recommend this lecture series,

243
00:13:34.559 --> 00:13:36.799
<v Speaker 4>and this one will break it down easy enough for

244
00:13:36.840 --> 00:13:42.200
<v Speaker 4>you to understand some basic neuroscience about vision. So computer vision,

245
00:13:42.240 --> 00:13:44.559
<v Speaker 4>in my opinion, doesn't get the rep that it deserves

246
00:13:44.720 --> 00:13:49.200
<v Speaker 4>in AI. And here's something to note about all of this.

247
00:13:50.039 --> 00:13:52.000
<v Speaker 4>So as I was saying, intelligence is not part of

248
00:13:52.000 --> 00:13:55.000
<v Speaker 4>the definition of life. My dad was a neurologist and

249
00:13:55.039 --> 00:13:57.720
<v Speaker 4>a psychiatrist, my mom is a pharmacist with a bunch

250
00:13:57.759 --> 00:13:59.840
<v Speaker 4>of degrees and all this stuff. They're still disappointed that

251
00:13:59.840 --> 00:14:04.519
<v Speaker 4>I'm not a doctor. And what is interesting and

252
00:14:04.519 --> 00:14:09.320
<v Speaker 4>what I've learned over the years are two major things.

253
00:14:09.639 --> 00:14:14.240
<v Speaker 4>So number one in evolution, one of the things, one

254
00:14:14.279 --> 00:14:18.039
<v Speaker 4>of the first structures that was ever evolved was a

255
00:14:18.039 --> 00:14:24.039
<v Speaker 4>photoreceptor attached to a flagellum. A photoreceptor just says, so

256
00:14:24.120 --> 00:14:26.879
<v Speaker 4>a flagellum is a tail, and it had a photoreceptor

257
00:14:26.919 --> 00:14:29.080
<v Speaker 4>attached to it, and a photoreceptor, it just says

258
00:14:29.080 --> 00:14:31.720
<v Speaker 4>whether light exists or not, yes or no, one or zero.

259
00:14:32.480 --> 00:14:34.360
<v Speaker 4>And so you can see that this is the beginning

260
00:14:34.360 --> 00:14:38.320
<v Speaker 4>of binary map, whether light exists or does not exist.

261
00:14:39.240 --> 00:14:42.799
<v Speaker 4>So another thing that is commonly known but often forgotten

262
00:14:43.600 --> 00:14:49.320
<v Speaker 4>is that eyeballs are brain cells. So this is very

263
00:14:49.360 --> 00:14:53.759
<v Speaker 4>important to understand. In the human eye, the ratio of

264
00:14:53.919 --> 00:14:58.360
<v Speaker 4>cones, uh, there are these two types of, uh, cells.

265
00:14:58.720 --> 00:15:03.480
<v Speaker 4>There are these, uh, rod-shaped cylinders, which are gray, and

266
00:15:03.519 --> 00:15:07.639
<v Speaker 4>they determine grayscale. They determine basically the opacity of light, the

267
00:15:07.720 --> 00:15:11.519
<v Speaker 4>gradient of light, brightness of light, and grayscale. And then

268
00:15:11.600 --> 00:15:16.600
<v Speaker 4>they have these cones: RGB, red, green, blue. The

269
00:15:16.720 --> 00:15:22.240
<v Speaker 4>ratio on this is roughly twenty, six, three, one. Roughly,

270
00:15:22.320 --> 00:15:24.000
<v Speaker 4>if you're a female, you'll probably have a little bit

271
00:15:24.039 --> 00:15:27.480
<v Speaker 4>more blue, so twenty for twenty. The ratio of cones

272
00:15:27.519 --> 00:15:31.559
<v Speaker 4>in the human eye is overweighted toward gray cylinders. Ratio of

273
00:15:31.600 --> 00:15:35.039
<v Speaker 4>twenty on the gray, six on the red, three on

274
00:15:35.120 --> 00:15:37.759
<v Speaker 4>the green, and then one or a little bit higher

275
00:15:37.799 --> 00:15:41.759
<v Speaker 4>than one if you're female on the blue cones. So

276
00:15:41.840 --> 00:15:46.919
<v Speaker 4>that's interesting. And the part that I'm going

277
00:16:46.960 --> 00:16:49.519
<v Speaker 4>to skip in the neuroscience is about V one through

278
00:15:49.639 --> 00:15:53.639
<v Speaker 4>V four cortex and the visual cortex. And this is

279
00:15:53.639 --> 00:15:56.159
<v Speaker 4>outside of the you know, the purview of the presentation.

280
00:15:56.480 --> 00:15:59.240
<v Speaker 4>But if you want to go deeper down the eyeball route,

281
00:16:00.200 --> 00:16:03.559
<v Speaker 4>I recommend it after watching this presentation on the AI side from me,

282
00:16:04.000 --> 00:16:06.080
<v Speaker 4>if you want to understand what I was saying earlier

283
00:16:06.120 --> 00:16:09.679
<v Speaker 4>about how computer vision experts implicitly, whether they know it

284
00:16:09.799 --> 00:16:15.480
<v Speaker 4>or not, accidentally already understand the visual cortext just from

285
00:16:15.600 --> 00:16:20.200
<v Speaker 4>computer vision, because computer vision is doing exactly what the

286
00:16:20.480 --> 00:16:24.159
<v Speaker 4>visual cortex is doing exactly. It is literal, and I'm

287
00:16:24.159 --> 00:16:27.480
<v Speaker 4>going to show published papers that demonstrate this. So what

288
00:16:27.519 --> 00:16:30.480
<v Speaker 4>I'm saying is that computer vision in the modern day

289
00:16:31.159 --> 00:16:35.519
<v Speaker 4>is mimicking the brain, and it is published and known,

290
00:16:35.679 --> 00:16:38.039
<v Speaker 4>and computer vision experts, whether they know it or not,

291
00:16:38.639 --> 00:16:42.360
<v Speaker 4>accidentally figured out the visual cortex and understand it implicitly.

292
00:16:42.840 --> 00:16:47.200
<v Speaker 4>An example of this: in computer vision, we understand why

293
00:16:47.279 --> 00:16:51.639
<v Speaker 4>you want to use grayscale to detect motion. First, it's

294
00:16:51.759 --> 00:16:55.519
<v Speaker 4>energy efficient and that's what happens in V one and

295
00:16:55.600 --> 00:17:01.159
<v Speaker 4>that's why you have predominantly gray cylinders in your eyeball.

296
00:17:02.080 --> 00:17:05.759
<v Speaker 4>And then there are functional purposes for RGB as well,

297
00:17:06.000 --> 00:17:10.200
<v Speaker 4>and this is also discussed in computer vision. Okay, so

298
00:17:10.440 --> 00:17:13.319
<v Speaker 4>that was a good like little preamble for the setup,

299
00:17:13.359 --> 00:17:15.319
<v Speaker 4>and hopefully you're seeing that this is shit that no

300
00:17:15.359 --> 00:17:17.240
<v Speaker 4>one's ever talked about. And this is real AI.

301
00:17:17.640 --> 00:17:20.720
<v Speaker 2>Have you spent a lot of time? Go ahead,

302
00:17:22.599 --> 00:17:24.319
<v Speaker 2>I was just gonna say, this is right up your alley.

303
00:17:24.720 --> 00:17:28.440
<v Speaker 1>He's upgraded all of these symbols in occultism directly to

304
00:17:28.519 --> 00:17:32.119
<v Speaker 1>the eyeball, and the eyeball is so central to occultism

305
00:17:32.440 --> 00:17:35.720
<v Speaker 1>in general. I mean it's in the word itself. And

306
00:17:36.359 --> 00:17:38.400
<v Speaker 1>he breaks down all of these symbols and how they

307
00:17:38.480 --> 00:17:41.519
<v Speaker 1>relate to stuff, and it's just like, yeah, it's brain tissue,

308
00:17:41.799 --> 00:17:43.240
<v Speaker 1>this is what's programming.

309
00:17:43.720 --> 00:17:46.160
<v Speaker 4>I also know the connection to spirituality, and I wasn't

310
00:17:46.200 --> 00:17:48.279
<v Speaker 4>going to discuss it that much, but here's something to know.

311
00:17:48.519 --> 00:17:56.119
<v Speaker 4>In the beginning, there was light. And another thing about, yeah,

312
00:17:56.160 --> 00:17:57.960
<v Speaker 4>I know a lot about that, but that's a whole

313
00:17:58.039 --> 00:17:59.880
<v Speaker 4>wormhole and if I say something about it, we'll have

314
00:17:59.880 --> 00:18:02.559
<v Speaker 4>to go for like an hour. But in other words, yeah,

315
00:18:02.759 --> 00:18:07.759
<v Speaker 4>there's a lot of indication that

316
00:18:08.119 --> 00:18:11.880
<v Speaker 4>in religion that there was an understanding of evolution, especially

317
00:18:11.960 --> 00:18:15.119
<v Speaker 4>in Hinduism, there was an understanding of evolution. And there's

318
00:18:15.160 --> 00:18:18.319
<v Speaker 4>a lot of not just written documentation of this, but

319
00:18:18.319 --> 00:18:23.240
<v Speaker 4>there's also archaeology that demonstrates this. Uh, literally, you can

320
00:18:23.279 --> 00:18:25.680
<v Speaker 4>see it with your own eyes, thousands and thousands of

321
00:18:25.720 --> 00:18:28.680
<v Speaker 4>years old, and they're demonstrating that they understood evolution and

322
00:18:28.720 --> 00:18:29.920
<v Speaker 4>more beyond that.

323
00:18:30.279 --> 00:18:34.680
<v Speaker 2>You know, it's real quick, you know, no, go ahead,

324
00:18:34.720 --> 00:18:36.200
<v Speaker 2>let's go ahead, let's okay.

325
00:18:36.279 --> 00:18:40.160
<v Speaker 1>The zinc-calcium reaction at conception. It's a flash of light.

326
00:18:41.119 --> 00:18:43.440
<v Speaker 4>Yeah, there's a... Yeah, it's kind of like the photo

327
00:18:43.519 --> 00:18:44.240
<v Speaker 4>electric effect.

328
00:18:44.799 --> 00:18:46.680
<v Speaker 2>But real quick, is she? I think it's interesting.

329
00:18:48.039 --> 00:18:48.240
<v Speaker 5>You know.

330
00:18:48.279 --> 00:18:51.160
<v Speaker 3>There was one guy, I think you're actually friends with

331
00:18:51.200 --> 00:18:54.799
<v Speaker 3>him, Pravine. Uh, he was at a cosmic summit. I went

332
00:18:54.839 --> 00:18:56.880
<v Speaker 3>up to him and I said something to him. I

333
00:18:56.920 --> 00:18:58.359
<v Speaker 3>was trying to explain to him. I was like, I know,

334
00:18:58.400 --> 00:19:00.799
<v Speaker 3>like what you were showing on the screen, you were

335
00:19:00.799 --> 00:19:03.319
<v Speaker 3>showing with something else. I said, but have you ever wondered,

336
00:19:03.400 --> 00:19:06.400
<v Speaker 3>like if those are depictions of the inside of your eyeball?

337
00:19:07.759 --> 00:19:09.359
<v Speaker 2>And he like like lost it.

338
00:19:09.319 --> 00:19:11.880
<v Speaker 3>Like he like I looked like I literally like freaked

339
00:19:11.920 --> 00:19:13.440
<v Speaker 3>him out to where.

340
00:19:13.279 --> 00:19:15.440
<v Speaker 2>He like sat down and sat sat down to study

341
00:19:15.440 --> 00:19:16.000
<v Speaker 2>eating his food.

342
00:19:16.000 --> 00:19:18.240
<v Speaker 3>He's like, oh yeah, I was like, yo, I literally

343
00:19:18.240 --> 00:19:19.440
<v Speaker 3>just bugged this guy the fuck out.

344
00:19:19.480 --> 00:19:21.720
<v Speaker 2>But like you get it, you get it.

345
00:19:21.880 --> 00:19:25.200
<v Speaker 4>Yeah, Pravine is one of my most favorite.

346
00:19:25.000 --> 00:19:26.720
<v Speaker 2>Yeah, yeah, maybe you could maybe you could. Maybe you

347
00:19:26.759 --> 00:19:28.160
<v Speaker 2>can tell that and he'll take it a little bit

348
00:19:28.200 --> 00:19:29.119
<v Speaker 2>more serious, like.

349
00:19:29.119 --> 00:19:31.480
<v Speaker 4>Oh, I've I've talked his ear off about this stuff.

350
00:19:31.519 --> 00:19:33.039
<v Speaker 4>He's fully aware of everything.

351
00:19:33.079 --> 00:19:34.960
<v Speaker 2>Oh, I don't know. He looked like he was bugged

352
00:19:34.960 --> 00:19:36.359
<v Speaker 2>out when I said something to him, like he.

353
00:19:36.279 --> 00:19:41.319
<v Speaker 4>Looked at that time. Oh, okay. I had just

354
00:19:41.480 --> 00:19:43.200
<v Speaker 4>met Pravine at that time, so he didn't know that

355
00:19:43.319 --> 00:19:45.599
<v Speaker 4>much about me at that time. But we became good

356
00:19:45.599 --> 00:19:47.799
<v Speaker 4>friends over time. Oh. Nice, I'm actually gonna go to

357
00:19:47.880 --> 00:19:50.119
<v Speaker 4>Cambodia with him in a couple of months or a

358
00:19:50.160 --> 00:19:50.799
<v Speaker 4>few months.

359
00:19:51.240 --> 00:19:52.000
<v Speaker 2>Nice, that's awesome.

360
00:19:52.319 --> 00:19:53.559
<v Speaker 1>Be careful of those planes.

361
00:19:54.960 --> 00:19:57.960
<v Speaker 4>A lot of yeah, yeah, yoh.

362
00:19:58.000 --> 00:19:59.319
<v Speaker 2>We'll have to get you on with Day to talk

363
00:19:59.319 --> 00:20:01.079
<v Speaker 2>about the eyeball. I did like a five or six.

364
00:20:01.079 --> 00:20:03.119
<v Speaker 3>I don't know how many parts, series on the occult symbolism

365
00:20:03.160 --> 00:20:04.240
<v Speaker 3>in the eye, But if you ever want to come

366
00:20:04.240 --> 00:20:06.440
<v Speaker 3>back on and talk about that shit, we're talking about.

367
00:20:06.279 --> 00:20:07.240
<v Speaker 5>The eyeball at all. Day Man.

368
00:20:09.319 --> 00:20:10.640
<v Speaker 2>A lot of my lot of all.

369
00:20:10.599 --> 00:20:13.400
<v Speaker 3>My art, believe and I see the crossing of the arms.

370
00:20:13.400 --> 00:20:16.359
<v Speaker 3>That's that's the crossing of the optic nerve. Like a

371
00:20:16.359 --> 00:20:18.640
<v Speaker 3>lot of my art literally is depicting the inside of

372
00:20:18.640 --> 00:20:20.440
<v Speaker 3>the eyeball, the one that I have like a chick.

373
00:20:20.480 --> 00:20:21.759
<v Speaker 2>It looks like she's on a seashell.

374
00:20:22.200 --> 00:20:25.039
<v Speaker 3>That's like the hyaloid canal, Cloquet's canal, and the ora serrata, and

375
00:20:25.079 --> 00:20:28.279
<v Speaker 3>then the aqueous humor underneath. I totally... all my shit's

376
00:20:28.279 --> 00:20:30.519
<v Speaker 3>all basically depicted on an eyeball and the brain, man.

377
00:20:31.240 --> 00:20:34.240
<v Speaker 4>And don't forget about the pineal gland, right, yeah, yeah,

378
00:20:34.319 --> 00:20:36.640
<v Speaker 4>so we got all and then the third eye and everything. Yeah,

379
00:20:36.680 --> 00:20:42.160
<v Speaker 4>there's quite a lot to do with vision. And uh,

380
00:20:42.599 --> 00:20:46.400
<v Speaker 4>one of the how do I say so? One of

381
00:20:46.440 --> 00:20:48.839
<v Speaker 4>the there's another thing that I want to discuss about

382
00:20:48.839 --> 00:20:52.039
<v Speaker 4>evolution before I get into the formal presentation, and then

383
00:20:53.039 --> 00:20:55.720
<v Speaker 4>we'll go from there. But one more thing about evolution.

384
00:20:55.920 --> 00:20:58.920
<v Speaker 4>And I'm not sure how this happened, but for some

385
00:20:59.000 --> 00:21:04.240
<v Speaker 4>reason in this zeitgeist, it is commonly thought or said

386
00:21:04.799 --> 00:21:11.559
<v Speaker 4>that DNA is the cause of evolution. And this is

387
00:21:11.640 --> 00:21:14.960
<v Speaker 4>just not true. There are no scientists saying that. The

388
00:21:15.000 --> 00:21:18.160
<v Speaker 4>people who discovered DNA didn't say that. Darwin didn't say that.

389
00:21:18.519 --> 00:21:22.279
<v Speaker 4>As far as I know, there is no official source

390
00:21:22.400 --> 00:21:29.440
<v Speaker 4>anywhere that is saying that DNA causes evolution. DNA is

391
00:21:29.559 --> 00:21:34.880
<v Speaker 4>the product of evolution, and that's why they're correlated, but

392
00:21:35.000 --> 00:21:38.440
<v Speaker 4>"correlation is not causation" is understood at a kindergarten level, and

393
00:21:38.480 --> 00:21:40.680
<v Speaker 4>as far as I know, no scientist is actually saying that.

394
00:21:40.720 --> 00:21:43.799
<v Speaker 4>But for some reason in the zeitgeist, it is commonly thought

395
00:21:43.960 --> 00:21:48.599
<v Speaker 4>that DNA causes evolution. This is not true, and there

396
00:21:48.640 --> 00:21:52.279
<v Speaker 4>are it is academically understood and known. It is academically

397
00:21:52.359 --> 00:21:56.680
<v Speaker 4>accepted that the brain has known mechanisms for controlling DNA.

398
00:21:56.880 --> 00:22:00.079
<v Speaker 4>And even in your everyday life, your DNA is

399
00:22:00.119 --> 00:22:03.480
<v Speaker 4>not static. Things change; switches turn on and off

400
00:22:03.480 --> 00:22:06.279
<v Speaker 4>in your DNA all the time, and so and this

401
00:22:06.440 --> 00:22:09.880
<v Speaker 4>mechanism is controlled by the brain, and often environmental factors

402
00:22:09.920 --> 00:22:14.839
<v Speaker 4>are the reason for why this happens. So again, and

403
00:22:15.319 --> 00:22:17.920
<v Speaker 4>another way, just like if you if you're just really

404
00:22:17.920 --> 00:22:20.079
<v Speaker 4>thinking about it from a high level, you've got to

405
00:22:20.160 --> 00:22:24.279
<v Speaker 4>understand that, like, uh, there's so much infrastructure that's required

406
00:22:24.440 --> 00:22:26.880
<v Speaker 4>in order to make DNA work. In the first place,

407
00:22:27.480 --> 00:22:29.319
<v Speaker 4>you need to have the histones to zip the file,

408
00:22:29.359 --> 00:22:31.680
<v Speaker 4>because I look at DNA as being software. You need

409
00:22:31.720 --> 00:22:33.960
<v Speaker 4>to have the histones to zip the file. And

410
00:22:33.960 --> 00:22:35.680
<v Speaker 4>then you need to have a way of unraveling it.

411
00:22:35.799 --> 00:22:37.480
<v Speaker 4>You need to have a way of splitting it. You

412
00:22:37.519 --> 00:22:40.960
<v Speaker 4>need to have the transcription enzymes to transcribe the DNA.

413
00:22:41.160 --> 00:22:43.519
<v Speaker 4>You need to have all the different proteins already there

414
00:22:43.559 --> 00:22:45.680
<v Speaker 4>in order to even transcribe them in the first place.

415
00:22:46.119 --> 00:22:48.279
<v Speaker 4>So there's and and then so much more. In order

416
00:22:48.319 --> 00:22:51.079
<v Speaker 4>to make DNA work, it needs all of these other

417
00:22:51.319 --> 00:22:56.000
<v Speaker 4>things as well, and so in order of operations. But

418
00:22:56.240 --> 00:22:58.720
<v Speaker 4>how is like DNA by itself doesn't do anything. There's

419
00:22:58.720 --> 00:23:01.720
<v Speaker 4>no magic that happens with DNA by itself, And so

420
00:23:01.960 --> 00:23:04.319
<v Speaker 4>in order of operations, you would need to have all

421
00:23:04.359 --> 00:23:08.200
<v Speaker 4>of this other infrastructure available or at least functional to

422
00:23:08.279 --> 00:23:11.680
<v Speaker 4>some degree before DNA becomes useful. So in terms of

423
00:23:11.759 --> 00:23:14.039
<v Speaker 4>order of operations, DNA can't be at the front of

424
00:23:14.079 --> 00:23:17.559
<v Speaker 4>the line. This is the software. Software doesn't come before

425
00:23:17.559 --> 00:23:21.640
<v Speaker 4>the hardware. So this is kind of you know, you

426
00:23:21.680 --> 00:23:24.160
<v Speaker 4>can kind of understand this stuff at a pretty basic level.

427
00:23:24.160 --> 00:23:26.640
<v Speaker 4>This is all commonly understood and all this stuff, but

428
00:23:26.680 --> 00:23:30.000
<v Speaker 4>for some reason, there's just a different idea of how

429
00:23:30.000 --> 00:23:35.039
<v Speaker 4>it works out there. So let's get into the presentation.

430
00:23:35.319 --> 00:23:39.400
<v Speaker 4>So what I was doing at Arizona State University is

431
00:23:39.440 --> 00:23:43.920
<v Speaker 4>I was working on computer vision applied to artificial intelligence

432
00:23:44.000 --> 00:23:48.279
<v Speaker 4>in automated vehicles, and so I was trying to apply

433
00:23:48.440 --> 00:23:57.240
<v Speaker 4>camera based systems to automated vehicles. And one of the

434
00:23:57.559 --> 00:24:03.119
<v Speaker 4>issues that I ran into in automated vehicles, especially at ASU,

435
00:24:03.559 --> 00:24:08.640
<v Speaker 4>is that an overwhelming amount of investment and research in

436
00:24:08.799 --> 00:24:12.960
<v Speaker 4>R and D effort, especially in academia but also out

437
00:24:12.960 --> 00:24:15.799
<v Speaker 4>in the real business world, has been dedicated to the

438
00:24:15.839 --> 00:24:22.759
<v Speaker 4>task of making lidar part of the automated vehicle apparatus system.

439
00:24:23.279 --> 00:24:25.319
<v Speaker 4>And what I'm sure I'm going to be arguing in

440
00:24:25.359 --> 00:24:31.039
<v Speaker 4>this presentation is that lidar is completely useless. It's actually worse.

441
00:24:31.119 --> 00:24:35.440
<v Speaker 4>It makes your system worse, not better. And I want

442
00:24:35.440 --> 00:24:39.559
<v Speaker 4>to be clear upfront. Lidar is a real technology and

443
00:24:39.599 --> 00:24:42.519
<v Speaker 4>it does have real world application. It's used in space,

444
00:24:42.599 --> 00:24:46.039
<v Speaker 4>it's used to map layout. There are actual, real functional

445
00:24:46.079 --> 00:24:49.680
<v Speaker 4>purposes for lidar. But what I am going to hit

446
00:24:49.759 --> 00:24:52.319
<v Speaker 4>hard on in this presentation is that lidar makes no

447
00:24:52.480 --> 00:24:55.200
<v Speaker 4>sense at all in an automated vehicle. Doesn't matter what

448
00:24:55.240 --> 00:24:58.640
<v Speaker 4>your philosophy is, there is no reason to have lidar

449
00:24:58.759 --> 00:25:04.000
<v Speaker 4>in a vehicle, an automated vehicle. So since maybe

450
00:25:04.079 --> 00:25:06.359
<v Speaker 4>not everyone is familiar with how lidar works, just

451
00:25:06.400 --> 00:25:08.759
<v Speaker 4>to set it up, the way that lidar works

452
00:25:08.960 --> 00:25:11.680
<v Speaker 4>is so the major companies that you may have seen

453
00:25:11.720 --> 00:25:14.480
<v Speaker 4>that have lidar: Waymo is the big one in

454
00:25:14.519 --> 00:25:18.440
<v Speaker 4>the US, and Waymo is owned by Google. And then

455
00:25:18.559 --> 00:25:21.279
<v Speaker 4>in the past there were other companies that have also

456
00:25:21.359 --> 00:25:27.000
<v Speaker 4>used them. There was General Motors GM, but they ended

457
00:25:27.119 --> 00:25:29.319
<v Speaker 4>up having a problem a couple of years ago in

458
00:25:29.319 --> 00:25:31.640
<v Speaker 4>California where they ended up using a lidar-based

459
00:25:31.680 --> 00:25:35.319
<v Speaker 4>system and this car dragged a person underneath their vehicle

460
00:25:35.559 --> 00:25:38.039
<v Speaker 4>for like twenty or thirty feet, and so then they

461
00:25:38.039 --> 00:25:40.400
<v Speaker 4>had to basically end operations. They were even caught lying

462
00:25:40.559 --> 00:25:42.920
<v Speaker 4>like trying to deceive about it too when they were

463
00:25:42.920 --> 00:25:44.960
<v Speaker 4>doing it. It was all over the news. People probably

464
00:25:44.960 --> 00:25:48.960
<v Speaker 4>remember. Another company that was doing it was Uber, and

465
00:25:49.440 --> 00:25:52.599
<v Speaker 4>I live in Phoenix, Arizona. This is a heavy testing

466
00:25:52.640 --> 00:25:56.559
<v Speaker 4>ground for automated vehicles. And the reason why Arizona is

467
00:25:56.599 --> 00:25:59.119
<v Speaker 4>a heavy testing ground or Phoenix specifically is a heavy

468
00:25:59.160 --> 00:26:01.160
<v Speaker 4>testing ground for automated vehicles is because we have three

469
00:26:01.240 --> 00:26:03.720
<v Speaker 4>hundred and sixty days of sunshine and we have some

470
00:26:03.799 --> 00:26:07.200
<v Speaker 4>of the widest and best roads in the US because

471
00:26:07.200 --> 00:26:10.119
<v Speaker 4>we have massive trucking that comes through here, and so

472
00:26:10.480 --> 00:26:14.039
<v Speaker 4>this is and we have a very grid like road system,

473
00:26:14.480 --> 00:26:17.559
<v Speaker 4>very squared, grid-like road system; it's very clean and organized,

474
00:26:18.000 --> 00:26:22.119
<v Speaker 4>so we're pretty much an ideal setup for this, and

475
00:26:22.200 --> 00:26:26.960
<v Speaker 4>so Waymo eventually came here, and what they ended up

476
00:26:27.000 --> 00:26:29.279
<v Speaker 4>doing is they ended up going to ASU. Waymo did,

477
00:26:29.359 --> 00:26:31.599
<v Speaker 4>and they ended up investing a huge amount of money

478
00:26:31.680 --> 00:26:35.920
<v Speaker 4>into their engineering department. And that was very problematic for me,

479
00:26:36.200 --> 00:26:38.839
<v Speaker 4>which I didn't know about at the time, because I

480
00:26:38.960 --> 00:26:41.960
<v Speaker 4>was going in there applying with this presentation for a

481
00:26:42.000 --> 00:26:46.440
<v Speaker 4>PhD in computer vision to use it for automated vehicles.

482
00:26:46.880 --> 00:26:49.480
<v Speaker 4>And so you'll see why this gets to be a

483
00:26:49.519 --> 00:26:54.000
<v Speaker 4>big problem when you're in the hornet's nest with Waymo.

484
00:26:54.160 --> 00:26:56.240
<v Speaker 4>So before I get to that, let's kind of build

485
00:26:56.240 --> 00:26:58.480
<v Speaker 4>it all up and kind of set up what I'm

486
00:26:58.480 --> 00:27:03.400
<v Speaker 4>talking about here. My whole basis is perception. But, by

487
00:27:03.440 --> 00:27:05.440
<v Speaker 4>the way, if this math is scaring you, don't worry.

488
00:27:05.480 --> 00:27:07.759
<v Speaker 4>I'm not going to get too math-heavy. I'm breaking everything down.

489
00:27:07.880 --> 00:27:09.960
<v Speaker 4>You don't need a degree to understand any of this stuff.

490
00:27:11.240 --> 00:27:13.279
<v Speaker 4>But what is important here is that there are three

491
00:27:13.519 --> 00:27:21.880
<v Speaker 4>major elements to, uh, automated vehicles for doing the, doing

492
00:27:22.160 --> 00:27:27.000
<v Speaker 4>controlling automated vehicles. So the first element is perception, and

493
00:27:27.039 --> 00:27:30.519
<v Speaker 4>then dynamics and then control and then basically what I

494
00:27:30.559 --> 00:27:33.440
<v Speaker 4>have on this side is intro to robotics. So this

495
00:27:33.519 --> 00:27:37.960
<v Speaker 4>is this is the functional math for how you do robotics.

496
00:27:38.720 --> 00:27:41.119
<v Speaker 4>And what I have here on the left side, this

497
00:27:41.200 --> 00:27:43.720
<v Speaker 4>big old crazy looking equation. All this, all this is

498
00:27:43.759 --> 00:27:46.799
<v Speaker 4>saying is what is the least wrong answer? So based

499
00:27:46.799 --> 00:27:50.559
<v Speaker 4>off of these three equations on the right side, the

500
00:27:50.680 --> 00:27:52.640
<v Speaker 4>left side is trying to figure out what is the

501
00:27:52.720 --> 00:27:56.880
<v Speaker 4>least wrong answer based off of that, And what you'll

502
00:27:56.880 --> 00:28:02.160
<v Speaker 4>see is that perception, which is z_t. Perception is used

503
00:28:02.160 --> 00:28:05.759
<v Speaker 4>in all three equations, is fundamental to all three equations.

504
00:28:05.880 --> 00:28:11.599
<v Speaker 4>So perception is the most important task in robotics. Perception.

505
00:28:13.400 --> 00:28:19.160
<v Speaker 4>So here are some of the major brands. So remember

506
00:28:19.200 --> 00:28:22.039
<v Speaker 4>that this presentation was done in twenty twenty three, so

507
00:28:22.079 --> 00:28:24.440
<v Speaker 4>some of this is a little bit dated, but almost

508
00:28:24.480 --> 00:28:28.960
<v Speaker 4>all of it is still relevant today. This is a

509
00:28:29.000 --> 00:28:32.000
<v Speaker 4>company called Argo on the top left here. This was

510
00:28:32.079 --> 00:28:36.240
<v Speaker 4>owned by Ford. This is a now defunct company. But

511
00:28:36.519 --> 00:28:40.160
<v Speaker 4>they were also trying to force fit lidar into their cars.

512
00:28:40.839 --> 00:28:43.519
<v Speaker 4>They weren't able to do it, so they went bankrupt or

513
00:28:43.599 --> 00:28:45.480
<v Speaker 4>they closed down the company. But what I wanted to

514
00:28:45.480 --> 00:28:49.039
<v Speaker 4>point out about this is that they have these lidar

514
00:28:49.079 --> 00:28:52.480
<v Speaker 4>things on here. But notice the cameras that they

516
00:28:52.559 --> 00:28:56.480
<v Speaker 4>used are ring cameras, literal ring cameras, and so you

517
00:28:56.480 --> 00:28:58.839
<v Speaker 4>can tell that they went way out of their way

518
00:28:58.880 --> 00:29:03.240
<v Speaker 4>to do go all in on cameras. Huh. Of course,

519
00:29:03.240 --> 00:29:06.240
<v Speaker 4>Waymo's the big one. This company down here in the

520
00:29:06.279 --> 00:29:10.960
<v Speaker 4>center is this should be Wave if I recall correctly,

521
00:29:11.000 --> 00:29:12.720
<v Speaker 4>it's been a little while since I've done this. And

522
00:29:12.759 --> 00:29:14.799
<v Speaker 4>then there's a couple others. It's fine. There's a Chinese

523
00:29:14.799 --> 00:29:17.079
<v Speaker 4>company over here called XPeng, which people should know about

524
00:29:17.079 --> 00:29:24.400
<v Speaker 4>if they care about this stuff. So lidar is so

525
00:29:26.359 --> 00:29:30.720
<v Speaker 4>prevalent and overwhelmingly so. Almost every major company has been

526
00:29:30.720 --> 00:29:34.119
<v Speaker 4>trying to force fit lidar into their automated vehicle, and

527
00:29:34.200 --> 00:29:36.920
<v Speaker 4>almost every single one of them is failing. The only

528
00:29:36.960 --> 00:29:40.880
<v Speaker 4>one that is achieving scale, despite what the narratives are, the

529
00:29:40.880 --> 00:29:43.960
<v Speaker 4>only one that is achieving scale on the autonomous vehicles

530
00:29:44.000 --> 00:29:47.359
<v Speaker 4>in this world is Tesla. And so Tesla is a

531
00:29:47.400 --> 00:29:51.799
<v Speaker 4>camera based system that does not use any other perception sensor,

532
00:29:52.039 --> 00:29:57.039
<v Speaker 4>just cameras, and Waymo uses lidar and cameras. So

533
00:29:57.160 --> 00:30:00.480
<v Speaker 4>here's another thing about lidar. Even if you do

534
00:30:00.680 --> 00:30:03.759
<v Speaker 4>use a lidar system, no matter what, you still

535
00:30:03.799 --> 00:30:06.240
<v Speaker 4>need a camera based system. Doesn't matter who you are,

536
00:30:06.319 --> 00:30:08.839
<v Speaker 4>what philosophy you believe in, it doesn't matter. You

537
00:30:08.920 --> 00:30:11.839
<v Speaker 4>still need a camera based system. So I'm going to

538
00:30:11.920 --> 00:30:19.240
<v Speaker 4>discuss why that's important or why that is. This is

539
00:30:19.279 --> 00:30:21.440
<v Speaker 4>like Chinese companies. I'm going to skip some of this

540
00:30:21.519 --> 00:30:25.039
<v Speaker 4>stuff because this presentation is actually quite long, and so

541
00:30:25.079 --> 00:30:27.039
<v Speaker 4>I'm going to skip it to condense it quite a

542
00:30:27.039 --> 00:30:31.119
<v Speaker 4>lot and stick to really the most important points. So

543
00:30:32.119 --> 00:30:34.400
<v Speaker 4>one of the issues is that lidar has got this

544
00:30:34.519 --> 00:30:37.400
<v Speaker 4>spinny thing that's going around. You've probably seen it if

545
00:30:37.400 --> 00:30:41.119
<v Speaker 4>you've seen a Waymo vehicle or something like that. Zoox,

546
00:30:41.160 --> 00:30:44.720
<v Speaker 4>which I've just seen in Las Vegas, has also started to deploy,

547
00:30:44.759 --> 00:30:46.680
<v Speaker 4>and I believe it's owned by Amazon. This is a

548
00:30:46.720 --> 00:30:52.039
<v Speaker 4>relatively new addition, but again still using lidar. And what

549
00:30:52.119 --> 00:30:55.039
<v Speaker 4>a lidar system does is this little spinny thing

550
00:30:55.079 --> 00:30:58.200
<v Speaker 4>that I'm talking about. It shoots a laser beam and

551
00:30:58.240 --> 00:31:00.960
<v Speaker 4>it shoots a ton of these all over the place, and

552
00:31:01.000 --> 00:31:03.440
<v Speaker 4>then there's like you're supposed to get a reflection back.

553
00:31:03.559 --> 00:31:06.480
<v Speaker 4>This is an imperceptible beam of light, laser beam of light,

554
00:31:07.000 --> 00:31:09.559
<v Speaker 4>and then based on that time and distance, they're able

555
00:31:09.640 --> 00:31:13.400
<v Speaker 4>to get a point return of information on depth. So,

556
00:31:13.440 --> 00:31:17.160
<v Speaker 4>in other words, this laser beam's functional purpose and what

557
00:31:17.240 --> 00:31:20.559
<v Speaker 4>it's doing, is it's trying to measure depth of the

558
00:31:20.599 --> 00:31:25.799
<v Speaker 4>things around it. The actual literal laser precise depth calculation,

559
00:31:26.359 --> 00:31:30.400
<v Speaker 4>and it is laser precise, it is super precise depth calculation.

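NOTE
Sketch (an illustrative assumption, not from the talk): the time-of-flight idea described above. The pulse travels out and back, so range is half the round-trip time multiplied by the speed of light. This assumes an idealized single return and uses the vacuum speed of light; tof_range_m is a hypothetical helper, not any vendor's API.
```python
# Speed of light in vacuum, m/s (exact by SI definition); the slightly slower speed in air is ignored.
SPEED_OF_LIGHT = 299_792_458.0
def tof_range_m(round_trip_s: float) -> float:
    """Hypothetical helper: range to the reflecting surface from a pulse's round-trip time."""
    # The pulse covers the distance twice (out and back), so halve the total path.
    return SPEED_OF_LIGHT * round_trip_s / 2.0
# A return arriving about 333.56 nanoseconds after emission corresponds to roughly 50 m.
print(f"{tof_range_m(333.56e-9):.2f} m")
```
Each such measurement is one depth point; a spinning unit repeats this many times per sweep.
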
560
00:31:30.960 --> 00:31:36.000
<v Speaker 4>But this is effectively the entire in its entirety, the

561
00:31:36.079 --> 00:31:41.599
<v Speaker 4>full and complete explanation of the functional purpose of why

562
00:31:41.640 --> 00:31:44.359
<v Speaker 4>these lidar systems exist on these cars is for

563
00:31:44.480 --> 00:31:51.000
<v Speaker 4>precise measurement of depth. But lidar is like a dumber

564
00:31:51.079 --> 00:31:53.400
<v Speaker 4>version of sonar, that's the way to think of it.

565
00:31:53.920 --> 00:31:59.680
<v Speaker 4>And it creates a point cloud return kind of a

566
00:31:59.759 --> 00:32:02.440
<v Speaker 4>grid around it, a point cloud return. And I'm going

567
00:32:02.480 --> 00:32:05.319
<v Speaker 4>to show videos and pictures in all this of that.

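NOTE
Sketch (an illustrative assumption, not from the talk): how individual range returns become a point cloud. Each return is taken as a range plus the beam's azimuth and elevation in radians, converted to x, y, z with the sensor at the origin. The function and type names are hypothetical, and this sketch keeps only geometry (real sensors typically also report return intensity).
```python
import math
from typing import List, Tuple
Point = Tuple[float, float, float]
def returns_to_points(returns: List[Tuple[float, float, float]]) -> List[Point]:
    """Hypothetical helper: convert (range_m, azimuth_rad, elevation_rad) returns to (x, y, z)."""
    points: List[Point] = []
    for r, az, el in returns:
        # Spherical-to-Cartesian conversion: project onto the ground plane, then split by azimuth.
        horizontal = r * math.cos(el)
        points.append((horizontal * math.cos(az), horizontal * math.sin(az), r * math.sin(el)))
    return points
# Three returns: straight ahead, directly to the left, and slightly downward.
cloud = returns_to_points([(10.0, 0.0, 0.0), (5.0, math.pi / 2, 0.0), (20.0, 0.0, -0.05)])
for x, y, z in cloud:
    print(f"{x:6.2f} {y:6.2f} {z:6.2f}")
```
Shape recognition then has to work from millions of such bare coordinates.
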
568
00:32:06.039 --> 00:32:07.880
<v Speaker 4>But what I want to show first before I get

569
00:32:07.880 --> 00:32:10.720
<v Speaker 4>into the videos of that, is I want to show

570
00:32:11.039 --> 00:32:14.880
<v Speaker 4>why a few different cases of why lidar doesn't

571
00:32:14.920 --> 00:32:17.599
<v Speaker 4>make any sense in an automated vehicle. Why am I

572
00:32:17.640 --> 00:32:21.319
<v Speaker 4>saying that so strongly? So this what I have on

573
00:32:21.359 --> 00:32:23.519
<v Speaker 4>the screen, And by the way, every slide should have

574
00:32:23.720 --> 00:32:26.200
<v Speaker 4>a source on it, So like down here, this is

575
00:32:26.240 --> 00:32:33.680
<v Speaker 4>the actual paper, academically accepted paper, in IEEE. The

576
00:32:33.759 --> 00:32:35.759
<v Speaker 4>two major conferences I'm going to be talking about are

577
00:32:35.799 --> 00:32:40.119
<v Speaker 4>IEEE and CVPR. IEEE is like basically the

578
00:32:40.160 --> 00:32:43.319
<v Speaker 4>biggest engineering conference in the world, and CVPR is the

579
00:32:43.400 --> 00:32:46.279
<v Speaker 4>largest computer vision conference in the world. So all of

580
00:32:46.319 --> 00:32:49.880
<v Speaker 4>these papers and slides and presentations that I'm going to

581
00:32:49.880 --> 00:32:53.200
<v Speaker 4>be talking about are all from the largest possible conferences,

582
00:32:53.240 --> 00:32:57.240
<v Speaker 4>the most academically accepted conferences that there are in the world,

583
00:32:57.359 --> 00:33:03.079
<v Speaker 4>not just in America, in the world. So this paper

584
00:33:03.079 --> 00:33:04.680
<v Speaker 4>that I have on the screen is a little dated.

585
00:33:04.680 --> 00:33:07.880
<v Speaker 4>It's from twenty twenty one, but this is considered originally

586
00:33:07.920 --> 00:33:11.640
<v Speaker 4>the foundational paper on identifying what are known as VRUs.

587
00:33:12.319 --> 00:33:16.079
<v Speaker 4>VRU stands for vulnerable road user, and this is a

588
00:33:16.119 --> 00:33:19.519
<v Speaker 4>regulatory term, not an AI term. This is a regulatory

589
00:33:19.640 --> 00:33:25.519
<v Speaker 4>term and it is basically a catch all phrase for pedestrians, dogs, cats,

590
00:33:25.519 --> 00:33:29.000
<v Speaker 4>et cetera. And what this paper is showing is that

591
00:33:29.039 --> 00:33:32.640
<v Speaker 4>it at that time and it identified thirty two attributes

592
00:33:33.039 --> 00:33:35.880
<v Speaker 4>on identifying a pedestrian, which is like one of the

593
00:33:35.920 --> 00:33:40.759
<v Speaker 4>most important tasks that a computer, any automated vehicle needs

594
00:33:40.799 --> 00:33:43.839
<v Speaker 4>to do. And if you look at what these attributes

595
00:33:43.880 --> 00:33:49.880
<v Speaker 4>are that they're talking about, like a handbag, clothes and

596
00:33:50.200 --> 00:33:56.200
<v Speaker 4>things like that, the bags that they're carrying, male, female.

597
00:33:56.960 --> 00:33:59.599
<v Speaker 4>The issue with a point cloud return is that it

598
00:33:59.720 --> 00:34:03.119
<v Speaker 4>only gives you the shape. It does not give you

599
00:34:03.200 --> 00:34:07.079
<v Speaker 4>the color quality that you get from a camera. So

600
00:34:07.240 --> 00:34:10.760
<v Speaker 4>lidar is like a one-dimensional point. It's just a

601
00:34:10.800 --> 00:34:13.119
<v Speaker 4>point that tells you depth and that's it. And you

602
00:34:13.159 --> 00:34:15.920
<v Speaker 4>get a whole bunch of points, millions of points, and

603
00:34:15.960 --> 00:34:19.440
<v Speaker 4>then hopefully you can identify the shape of an object

604
00:34:19.760 --> 00:34:22.960
<v Speaker 4>with these points. That's what lidar does. Well, it

605
00:34:23.039 --> 00:34:25.599
<v Speaker 4>turns out that if this is what you're prioritizing, and

606
00:34:25.599 --> 00:34:27.800
<v Speaker 4>this is what you're trying to maximize for, which is

607
00:34:27.800 --> 00:34:29.639
<v Speaker 4>what a lot of, pretty much all, these lidar

608
00:34:29.679 --> 00:34:33.440
<v Speaker 4>companies are doing, well, you miss out on all of

609
00:34:33.480 --> 00:34:37.519
<v Speaker 4>those extra fine quality details that are actually most relevant,

610
00:34:38.000 --> 00:34:41.480
<v Speaker 4>the most important aspect is this stuff. And this is

611
00:34:41.559 --> 00:34:46.239
<v Speaker 4>academically understood. So this is in twenty twenty one, so

612
00:34:46.280 --> 00:34:48.639
<v Speaker 4>this is dated. So this has actually been improved significantly

613
00:34:48.679 --> 00:34:50.400
<v Speaker 4>since then. But this is just a cited This is

614
00:34:50.480 --> 00:34:57.000
<v Speaker 4>known for like a long time in the industry. I'm

615
00:34:57.039 --> 00:34:59.280
<v Speaker 4>gonna kind of skip this part and go to some

616
00:34:59.320 --> 00:35:04.599
<v Speaker 4>more of the meat. So right here is something called

617
00:35:04.639 --> 00:35:07.360
<v Speaker 4>this is Argoverse, which is from Argo, the company

618
00:35:07.400 --> 00:35:10.960
<v Speaker 4>that was that company that was owned by Ford that

619
00:35:11.079 --> 00:35:14.519
<v Speaker 4>was also trying to force fit lidar into their, into

620
00:35:14.559 --> 00:35:17.679
<v Speaker 4>their cars. They're now defunct, but in twenty twenty three

621
00:35:18.039 --> 00:35:22.239
<v Speaker 4>at CVPR they presented this information that I'm showing on

622
00:35:22.280 --> 00:35:25.239
<v Speaker 4>the screen. Now. One of the things that has happened

623
00:35:25.280 --> 00:35:27.559
<v Speaker 4>in academia, and I'm pretty sure, at least as far

624
00:35:27.599 --> 00:35:30.199
<v Speaker 4>as I'm aware, to this day is still true. One

625
00:35:30.239 --> 00:35:32.480
<v Speaker 4>of the things that has happened in academia is that

626
00:35:33.360 --> 00:35:38.960
<v Speaker 4>research around lidar degradation over range seems to be

627
00:35:39.039 --> 00:35:44.960
<v Speaker 4>heavily suppressed. And before Argo went defunct, they published this

628
00:35:45.079 --> 00:35:47.320
<v Speaker 4>and this is, as far as I'm aware, the only

629
00:35:48.199 --> 00:35:54.159
<v Speaker 4>major company that has ever published anything like this. What

630
00:35:54.239 --> 00:35:58.360
<v Speaker 4>this is showing on the screen here is the accuracy

631
00:35:58.360 --> 00:36:03.840
<v Speaker 4>on object identification and classification using lidar-based systems.

632
00:36:04.480 --> 00:36:07.320
<v Speaker 4>So you can see in their best case scenario they're

633
00:36:07.320 --> 00:36:11.159
<v Speaker 4>getting about fifty one percent accuracy at zero to fifty

634
00:36:11.199 --> 00:36:15.599
<v Speaker 4>meters of range, but once you go to fifty to

635
00:36:15.639 --> 00:36:18.840
<v Speaker 4>one hundred meters of range, it drops by half, and

636
00:36:18.840 --> 00:36:20.440
<v Speaker 4>then once you go one hundred to one hundred and

637
00:36:20.480 --> 00:36:23.599
<v Speaker 4>fifty meters of range, it drops by half again. So

638
00:36:23.679 --> 00:36:27.239
<v Speaker 4>what this is showing is a log-two

639
00:36:27.320 --> 00:36:31.519
<v Speaker 4>degradation in accuracy every fifty meters using a LiDAR

640
00:36:31.639 --> 00:36:35.880
<v Speaker 4>based system. And I just did the math down here.

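NOTE
A minimal sketch, assuming the halving pattern described above holds exactly: about 51% accuracy in the 0-50 m band, then half of the previous band's value in each further 50 m band. The constants and function names are illustrative assumptions, not values taken from the Argoverse publication. Halving per fixed distance step is exponential decay, so log2 of the accuracy drops by one per 50 m band.
```python
# Hypothetical halving model of accuracy vs. range band (assumed, illustrative).
BASE_ACCURACY = 0.51  # approximate best-case accuracy quoted for 0-50 m
BAND_WIDTH_M = 50     # width of each range band, in meters
def accuracy_for_band(band_index: int) -> float:
    """Accuracy in band k (k=0 is 0-50 m), assuming it halves every band."""
    return BASE_ACCURACY * 0.5 ** band_index
def band_for_distance(distance_m: float) -> int:
    """Index of the 50 m band that contains distance_m."""
    return int(distance_m // BAND_WIDTH_M)
for k in range(3):
    start, end = k * BAND_WIDTH_M, (k + 1) * BAND_WIDTH_M
    print(f"{start}-{end} m: {accuracy_for_band(k):.1%}")
```
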
641
00:36:35.960 --> 00:36:38.360
<v Speaker 4>This is some basic maths. This is about as intense

642
00:36:38.360 --> 00:36:40.639
<v Speaker 4>as the math is going to get. At sixty miles

643
00:36:40.679 --> 00:36:43.840
<v Speaker 4>per hour, that's twenty six point eight two meters per second.

644
00:36:44.519 --> 00:36:48.400
<v Speaker 4>That means fifty meters of accurate range gives you two

645
00:36:48.440 --> 00:36:52.280
<v Speaker 4>seconds of forecasting time, so you're able to see about

646
00:36:52.360 --> 00:36:55.320
<v Speaker 4>two seconds of what's ahead of you with this

647
00:36:55.400 --> 00:36:57.000
<v Speaker 4>system at sixty miles per hour.

648
00:36:56.840 --> 00:37:01.760
<v Speaker 1>This completely takes automated trucking off the board.

649
00:37:01.840 --> 00:37:04.760
<v Speaker 1>You cannot have those types of tolerances if you're going

650
00:37:04.840 --> 00:37:09.199
<v Speaker 1>to actually have a truck that's now commanded by AI

651
00:37:09.360 --> 00:37:10.000
<v Speaker 1>or whatever.

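NOTE
The speed arithmetic above can be checked with a short sketch. It assumes constant speed and ignores reaction time and braking distance; the function names are illustrative. Fifty meters at sixty miles per hour works out to about 1.86 seconds, which the speaker rounds to two seconds.
```python
# Convert speed and compute how long a given sensing range lasts at that speed.
MPS_PER_MPH = 0.44704  # exact: 1 mile = 1609.344 m and 1 hour = 3600 s
def mph_to_mps(mph: float) -> float:
    """Convert miles per hour to meters per second."""
    return mph * MPS_PER_MPH
def lookahead_seconds(range_m: float, speed_mph: float) -> float:
    """Seconds until the car reaches the edge of range_m at constant speed."""
    return range_m / mph_to_mps(speed_mph)
speed = mph_to_mps(60)            # about 26.82 m/s
t_50 = lookahead_seconds(50, 60)  # about 1.86 s
print(f"{speed:.2f} m/s; 50 m of range lasts {t_50:.2f} s")
```
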
652
00:37:10.119 --> 00:37:12.440
<v Speaker 4>No doubt about that. Yeah, that's why there isn't a

653
00:37:12.519 --> 00:37:14.599
<v Speaker 4>LiDAR-based truck, as far as I'm aware, it's

654
00:37:14.760 --> 00:37:19.800
<v Speaker 4>not one that carries weight. But there is the Tesla

655
00:37:19.880 --> 00:37:23.559
<v Speaker 4>Semi, which does exist and is on the roads. It

656
00:37:23.639 --> 00:37:27.920
<v Speaker 4>is working, and there are millions of Tesla vehicles,

657
00:37:27.960 --> 00:37:30.199
<v Speaker 4>automated vehicles on the road. In fact, I have one.

658
00:37:30.360 --> 00:37:33.400
<v Speaker 4>Oh, I should also mention this. I've been beta testing

659
00:37:33.519 --> 00:37:37.000
<v Speaker 4>the Tesla automated system since twenty eighteen. I have more

660
00:37:37.039 --> 00:37:39.800
<v Speaker 4>than one hundred thousand miles driven on their system since then.

661
00:37:39.920 --> 00:37:43.199
<v Speaker 4>Over multiple cars, and it's night and day compared to

662
00:37:43.239 --> 00:37:47.000
<v Speaker 4>anything else that you could ever see. It's, like, ridiculous

663
00:37:47.119 --> 00:37:50.920
<v Speaker 4>how much better it is. And again, Tesla has millions

664
00:37:50.960 --> 00:37:54.400
<v Speaker 4>and millions of these vehicles on the road and they're

665
00:37:54.480 --> 00:37:57.079
<v Speaker 4>all over the country doing this, and actually elsewhere

666
00:37:57.079 --> 00:38:01.480
<v Speaker 4>in the world. But Waymo only operates in geofenced spaces

667
00:38:01.519 --> 00:38:04.599
<v Speaker 4>in a few select cities. They only have two thousand

668
00:38:04.679 --> 00:38:08.800
<v Speaker 4>vehicles in their fleet. And I'm going to get into

669
00:38:08.960 --> 00:38:12.239
<v Speaker 4>more about all that, but that's worth keeping

670
00:38:12.320 --> 00:38:14.400
<v Speaker 4>in your mind for now, and it's gonna come up

671
00:38:14.480 --> 00:38:16.039
<v Speaker 4>later again why that's relevant.

672
00:38:16.599 --> 00:38:18.000
<v Speaker 3>So I think I did want to ask you though,

673
00:38:19.400 --> 00:38:21.400
<v Speaker 3>like, what do you think about, like, LiDAR kind

674
00:38:21.440 --> 00:38:23.880
<v Speaker 3>of being used on the ground to survey? Like, I mean,

675
00:38:24.639 --> 00:38:26.760
<v Speaker 3>they're even using that as a way of, like,

676
00:38:26.880 --> 00:38:29.760
<v Speaker 3>kind of like ground truthing for, like, these satellite

677
00:38:29.760 --> 00:38:31.920
<v Speaker 3>images like even on the Pyramids or other stuff.

678
00:38:33.000 --> 00:38:38.920
<v Speaker 1>That's fine, all right, yeah, think about this. So

679
00:38:38.960 --> 00:38:43.800
<v Speaker 1>they're only taking in one wavelength of light, so they

680
00:38:43.840 --> 00:38:46.440
<v Speaker 1>don't know what the fuck anything is. It's just

681
00:38:46.480 --> 00:38:50.039
<v Speaker 1>a distance. You've only got one dimension of vision.

682
00:38:50.079 --> 00:38:51.480
<v Speaker 1>It's worse than grayscale.

683
00:38:51.800 --> 00:38:54.480
<v Speaker 4>Yeah, and I'm setting up a few things here on

684
00:38:54.559 --> 00:38:57.239
<v Speaker 4>purpose before I show you the video of what the

685
00:38:57.320 --> 00:38:59.639
<v Speaker 4>LiDAR actually looks like, because I want you to

686
00:38:59.679 --> 00:39:03.920
<v Speaker 4>kind of see how many different problem scenarios there are.

687
00:39:04.360 --> 00:39:06.280
<v Speaker 4>And then when I show you the video,

688
00:39:06.599 --> 00:39:08.440
<v Speaker 4>you'll have all the context you need for the video

689
00:39:08.440 --> 00:39:11.599
<v Speaker 4>to make sense. So you're going to see that, don't worry,

690
00:39:11.639 --> 00:39:14.320
<v Speaker 4>it's coming. You're going to see how much of

691
00:39:14.320 --> 00:39:17.440
<v Speaker 4>a difference there is between LiDAR and basically vision.

692
00:39:18.360 --> 00:39:21.360
<v Speaker 4>It's like not even close. So what I was saying

693
00:39:21.360 --> 00:39:25.239
<v Speaker 4>earlier, though, was that LiDAR-based systems require geofencing,

694
00:39:25.719 --> 00:39:29.239
<v Speaker 4>meaning they have to map a specific area and

695
00:39:29.280 --> 00:39:32.079
<v Speaker 4>they can only operate within this specific area, whatever this

696
00:39:32.199 --> 00:39:35.239
<v Speaker 4>area block is, and they have to go through this

697
00:39:35.320 --> 00:39:37.519
<v Speaker 4>area over and over and over again, and then they

698
00:39:37.519 --> 00:39:41.480
<v Speaker 4>map it. It's effectively memorizing the area. And that's what

699
00:39:41.639 --> 00:39:44.320
<v Speaker 4>geofencing is, and that's what Waymo does. They

700
00:39:44.360 --> 00:39:47.679
<v Speaker 4>don't operate everywhere in Phoenix. They operate within confined regions

701
00:39:47.719 --> 00:39:50.719
<v Speaker 4>and that's for a reason. And another thing about this

702
00:39:50.960 --> 00:39:53.079
<v Speaker 4>is that that means that you're only allowed to do

703
00:39:53.199 --> 00:39:55.800
<v Speaker 4>in that system. In a LiDAR-based system, you're

704
00:39:55.880 --> 00:39:58.119
<v Speaker 4>only at this point as far as I'm aware, and

705
00:39:58.159 --> 00:40:00.880
<v Speaker 4>there's no technology that can beat this, only allowed

706
00:40:00.920 --> 00:40:06.679
<v Speaker 4>to do point to point travel, specific point predetermined

707
00:40:06.960 --> 00:40:12.159
<v Speaker 4>to specific point predetermined. And this is different from, say,

708
00:40:12.199 --> 00:40:16.480
<v Speaker 4>Tesla, which can do anywhere to anywhere travel. And

709
00:40:16.559 --> 00:40:19.119
<v Speaker 4>so that means you can start from anywhere and go to

710
00:40:19.159 --> 00:40:21.519
<v Speaker 4>anywhere, whereas a Waymo has to start in

711
00:40:21.599 --> 00:40:26.480
<v Speaker 4>predefined spots only and go to predefined spots only,

712
00:40:26.800 --> 00:40:28.840
<v Speaker 4>and these are finite. It's not

713
00:40:28.960 --> 00:40:31.840
<v Speaker 4>anywhere in that space. So even within the geofenced region,

714
00:40:32.239 --> 00:40:35.280
<v Speaker 4>they can only go from specific place to specific place,

715
00:40:35.639 --> 00:40:41.599
<v Speaker 4>not anywhere you want in there. And so one

716
00:40:41.599 --> 00:40:43.440
<v Speaker 4>of the things that has happened, and one of the

717
00:40:43.480 --> 00:40:47.800
<v Speaker 4>distortions that has happened, this has infected the regulatory space

718
00:40:47.840 --> 00:40:51.440
<v Speaker 4>at a kindergarten level, this is broken somehow, some way.

719
00:40:51.639 --> 00:40:55.320
<v Speaker 4>Waymo and other LiDAR companies have convinced everyone that

720
00:40:55.480 --> 00:41:01.119
<v Speaker 4>point to point travel is more difficult than anywhere to

721
00:41:01.159 --> 00:41:06.400
<v Speaker 4>anywhere travel, and to this day these systems are still

722
00:41:06.440 --> 00:41:09.960
<v Speaker 4>classified that way, where point to point travel, which is

723
00:41:10.000 --> 00:41:14.480
<v Speaker 4>an ADS system in regulatory terms, is somehow, some way

724
00:41:14.519 --> 00:41:19.199
<v Speaker 4>from a regulatory perspective, it is somehow more difficult than

725
00:41:19.239 --> 00:41:24.280
<v Speaker 4>anywhere to anywhere travel, which is an ADAS system. Sort

726
00:41:24.280 --> 00:41:29.920
<v Speaker 4>of. Well, I should be careful what I'm saying, ADS versus ADAS. Actually,

727
00:41:31.119 --> 00:41:34.000
<v Speaker 4>let me come back to ADS versus ADAS later. But

728
00:41:34.159 --> 00:41:37.880
<v Speaker 4>the point is that somehow they've convinced regulatory people that

729
00:41:38.920 --> 00:41:41.079
<v Speaker 4>point to point is harder than anywhere to anywhere, when

730
00:41:41.119 --> 00:41:44.119
<v Speaker 4>it's clearly the other way around. By far, it's exponentially

731
00:41:44.159 --> 00:41:47.519
<v Speaker 4>more difficult to do anywhere to anywhere travel than point

732
00:41:47.519 --> 00:41:50.840
<v Speaker 4>to point, especially when you're not confined by geofencing. One

733
00:41:50.880 --> 00:41:54.760
<v Speaker 4>of the other things is that these systems, these

734
00:41:54.760 --> 00:41:58.159
<v Speaker 4>LiDAR-based systems, are extremely expensive. Now I'm sure the

735
00:41:58.199 --> 00:41:59.840
<v Speaker 4>cost has come down over the past couple of years,

736
00:41:59.880 --> 00:42:02.800
<v Speaker 4>but in twenty twenty three, the average cost on those

737
00:42:02.800 --> 00:42:05.559
<v Speaker 4>Waymos for the LiDAR-based system, that apparatus that

738
00:42:05.599 --> 00:42:08.280
<v Speaker 4>they put on there, that cost is fifty thousand dollars

739
00:42:08.360 --> 00:42:11.400
<v Speaker 4>just for the system alone. Then you have to at

740
00:42:11.400 --> 00:42:14.679
<v Speaker 4>that time roughly maybe it's come down a little, but

741
00:42:14.800 --> 00:42:18.599
<v Speaker 4>it's still stupidly expensive. And then for whatever reason, they're

742
00:42:18.639 --> 00:42:21.519
<v Speaker 4>putting these things on Jaguars, so that's another eighty thousand.

743
00:42:22.239 --> 00:42:26.679
<v Speaker 4>So just between these two things alone, you have more

744
00:42:26.719 --> 00:42:29.320
<v Speaker 4>than one hundred and twenty thousand dollars being spent on

745
00:42:29.400 --> 00:42:31.760
<v Speaker 4>these systems, one hundred twenty to one hundred thirty thousand,

746
00:42:32.360 --> 00:42:35.039
<v Speaker 4>and that's not including a safety driver, that's not including

747
00:42:35.159 --> 00:42:37.880
<v Speaker 4>R and D, that's not including any kind of testing or anything.

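NOTE
A rough per-vehicle comparison using the approximate 2023 figures quoted in this talk: about $50,000 for the LiDAR apparatus and about $80,000 for the Jaguar, versus $30,000 to $40,000 all in for a camera-based car. The constant and function names are illustrative, and the total deliberately leaves out safety drivers, R&D, testing, and software, as the speaker notes.
```python
# Approximate 2023 figures quoted in the talk (USD); illustrative, not audited.
LIDAR_SUITE_USD = 50_000                 # LiDAR apparatus alone
BASE_VEHICLE_USD = 80_000                # Jaguar base vehicle
CAMERA_CAR_RANGE_USD = (30_000, 40_000)  # quoted all-in range for camera-based cars
def hardware_only_cost(sensor_usd: int, vehicle_usd: int) -> int:
    """Sensor plus vehicle cost; excludes safety driver, R&D, testing, software."""
    return sensor_usd + vehicle_usd
total = hardware_only_cost(LIDAR_SUITE_USD, BASE_VEHICLE_USD)
low, high = CAMERA_CAR_RANGE_USD
print(f"LiDAR hardware: ${total:,} ({total / high:.2f}x to {total / low:.2f}x a camera-based car)")
```
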
748
00:42:38.000 --> 00:42:41.719
<v Speaker 4>That's just the hardware alone with nothing else. No software

749
00:42:41.760 --> 00:42:43.920
<v Speaker 4>is even put in it, No AI has even been

750
00:42:43.960 --> 00:42:47.880
<v Speaker 4>put in there yet, So already this stuff is automatically

751
00:42:48.280 --> 00:42:53.760
<v Speaker 4>absurdly overpriced to begin with, whereas the Tesla vehicles are

752
00:42:53.760 --> 00:42:57.039
<v Speaker 4>thirty forty thousand dollars by comparison all in with everything

753
00:42:57.239 --> 00:43:07.159
<v Speaker 4>already done, and so the economics here is something that

754
00:43:07.199 --> 00:43:10.239
<v Speaker 4>people need to understand is that there is nothing like

755
00:43:10.599 --> 00:43:16.519
<v Speaker 4>the scale is not reasonably possible with a LiDAR

756
00:43:16.559 --> 00:43:20.719
<v Speaker 4>based system, because the expense on LiDAR-based systems,

757
00:43:20.800 --> 00:43:23.880
<v Speaker 4>even after the costs have come down, is still absurd.

758
00:43:24.199 --> 00:43:28.480
<v Speaker 4>By comparison, a camera costs about one hundred dollars. The

759
00:43:28.599 --> 00:43:30.760
<v Speaker 4>cameras that are on the Teslas that they've been doing

760
00:43:30.840 --> 00:43:33.119
<v Speaker 4>for years and years that do all this stuff anywhere

761
00:43:33.119 --> 00:43:37.199
<v Speaker 4>to anywhere travel, just one hundred bucks each, so the

762
00:43:37.239 --> 00:43:40.400
<v Speaker 4>cost difference is absurd. One of the other issues with

763
00:43:40.480 --> 00:43:45.960
<v Speaker 4>geofencing is that geofencing and basically memorizing the

764
00:43:46.119 --> 00:43:49.599
<v Speaker 4>area means that you have to expect that the area

765
00:43:49.719 --> 00:43:52.559
<v Speaker 4>is static, because as soon as the area changes in

766
00:43:52.599 --> 00:43:55.440
<v Speaker 4>a material way, your training data goes out the window.

767
00:43:56.280 --> 00:43:59.960
<v Speaker 4>And so construction becomes a problem, because these spaces

768
00:44:00.000 --> 00:44:03.280
<v Speaker 4>will continuously change. So as soon as construction arises,

769
00:44:03.400 --> 00:44:05.239
<v Speaker 4>all your training data for that area is gone, and

770
00:44:05.840 --> 00:44:07.679
<v Speaker 4>then you have to retrain with the construction there. And

771
00:44:07.679 --> 00:44:09.360
<v Speaker 4>then as soon as the construction is done, the whole

772
00:44:09.400 --> 00:44:11.480
<v Speaker 4>area changes again, and so you have to redo the

773
00:44:11.519 --> 00:44:14.199
<v Speaker 4>training with new data all over again. So it is

774
00:44:14.239 --> 00:44:18.320
<v Speaker 4>not easily adaptable. So one of the big problems with

775
00:44:19.800 --> 00:44:24.440
<v Speaker 4>interacting in the real world is that it's an open

776
00:44:24.760 --> 00:44:28.559
<v Speaker 4>set environment. What that means in mathematical terms, what that

777
00:44:28.639 --> 00:44:31.920
<v Speaker 4>means is that there's infinite possibilities. So if you're trying

778
00:44:31.960 --> 00:44:35.480
<v Speaker 4>to memorize an answer, which is what basically every LiDAR

779
00:44:35.480 --> 00:44:38.800
<v Speaker 4>system is trying to do, if you memorize an answer,

780
00:44:38.840 --> 00:44:43.639
<v Speaker 4>you cannot possibly memorize infinity. And the problem they're trying

781
00:44:43.639 --> 00:44:46.280
<v Speaker 4>to solve is an open set problem, which means there's

782
00:44:46.320 --> 00:44:50.960
<v Speaker 4>infinite possibilities, So you cannot memorize infinity, so you are

783
00:44:51.039 --> 00:44:53.239
<v Speaker 4>destined to fail from the beginning if your system is

784
00:44:53.280 --> 00:44:56.639
<v Speaker 4>dependent on geofencing or memorization, which is basically the same thing.

785
00:44:58.960 --> 00:45:02.840
<v Speaker 4>So just to kind of demonstrate other issues that come

786
00:45:02.920 --> 00:45:06.199
<v Speaker 4>up with LiDAR, this is an intersection in China.

787
00:45:07.360 --> 00:45:09.400
<v Speaker 4>And if you look at this intersection in China, just

788
00:45:09.440 --> 00:45:12.239
<v Speaker 4>look at the heavy complexity that's involved here, the path

789
00:45:12.239 --> 00:45:14.679
<v Speaker 4>and complexity, all the lanes, all the turning, all the

790
00:45:14.719 --> 00:45:17.800
<v Speaker 4>different conditions that would be required, and the amount of

791
00:45:17.800 --> 00:45:20.079
<v Speaker 4>traffic that you would have normally going through this. How

792
00:45:20.119 --> 00:45:23.840
<v Speaker 4>intense it would be to navigate and manage all that. Well,

793
00:45:24.079 --> 00:45:27.199
<v Speaker 4>this is yet another situation where LiDAR doesn't do

794
00:45:27.280 --> 00:45:30.800
<v Speaker 4>shit for you, nothing at all. LiDAR does not

795
00:45:31.000 --> 00:45:34.119
<v Speaker 4>help you with this problem in any way, shape or form,

796
00:45:34.280 --> 00:45:38.960
<v Speaker 4>exactly zero. It's useless, doesn't do anything. Same thing is

797
00:45:39.000 --> 00:45:40.760
<v Speaker 4>true in America. I was just showing this one because

798
00:45:40.840 --> 00:45:42.320
<v Speaker 4>one of the issues is that you want to be

799
00:45:42.320 --> 00:45:44.960
<v Speaker 4>able to generalize an answer. So if you try to

800
00:45:45.239 --> 00:45:48.960
<v Speaker 4>train on just US infrastructure, then you're pigeonholing yourself to

801
00:45:49.239 --> 00:45:51.679
<v Speaker 4>US infrastructure and you can't work anywhere else. And one

802
00:45:51.679 --> 00:45:52.840
<v Speaker 4>of the things I'm trying to get at is I'm

803
00:45:52.840 --> 00:45:55.519
<v Speaker 4>trying to model off the human brain, so we need

804
00:45:55.519 --> 00:45:59.760
<v Speaker 4>to find a generalizable answer that works anywhere regardless of

805
00:46:00.039 --> 00:46:05.559
<v Speaker 4>where you are or the situation. Here's another problem. You have

806
00:46:05.639 --> 00:46:08.599
<v Speaker 4>all kinds of different signs, all kinds of different lights,

807
00:46:08.800 --> 00:46:13.079
<v Speaker 4>and they're constantly evolving, and there's so many different versions

808
00:46:13.079 --> 00:46:16.159
<v Speaker 4>of them, and they constantly change. And this is another

809
00:46:16.239 --> 00:46:19.719
<v Speaker 4>situation where LiDAR doesn't do shit for you. Exactly zero.

810
00:46:20.000 --> 00:46:22.519
<v Speaker 4>There's nothing about LiDAR that helps you with anything here

811
00:46:22.519 --> 00:46:24.599
<v Speaker 4>on the screen, zero exactly zero.

812
00:46:27.360 --> 00:46:30.800
<v Speaker 1>So they shouldn't be on our streets. I mean, if they

813
00:46:30.800 --> 00:46:33.679
<v Speaker 1>can't read a light, what the fuck are we doing

814
00:46:33.800 --> 00:46:35.280
<v Speaker 1>letting it drive around?

815
00:46:36.079 --> 00:46:39.239
<v Speaker 4>Not a clue. I mean, I will make the argument

816
00:46:39.320 --> 00:46:42.280
<v Speaker 4>for, I will, a little bit later, give the fundamental

817
00:46:42.400 --> 00:46:44.719
<v Speaker 4>argument about why LiDAR is used, which is about

818
00:46:44.760 --> 00:46:48.440
<v Speaker 4>depth perception. This is the fundamental functional purpose that LiDAR

819
00:46:48.480 --> 00:46:51.400
<v Speaker 4>serves. But we need to get a little more

820
00:46:51.440 --> 00:46:54.239
<v Speaker 4>involved into this presentation before I really hit it. But

821
00:46:54.400 --> 00:46:59.519
<v Speaker 4>the argument for LiDAR is that it's accurate depth. And

822
00:46:59.599 --> 00:47:01.400
<v Speaker 4>I'll get into that a little later. Let me kind

823
00:47:01.400 --> 00:47:03.519
<v Speaker 4>of do all the setup for everyone, because these are

824
00:47:03.559 --> 00:47:05.639
<v Speaker 4>all going to be new concepts, so we're gonna want

825
00:47:06.039 --> 00:47:08.920
<v Speaker 4>We're going to do this right. Let's put it that way. So,

826
00:47:10.280 --> 00:47:12.840
<v Speaker 4>as I was saying before, the primary issue is that we

827
00:47:12.920 --> 00:47:15.559
<v Speaker 4>have an open set environment. If we're trying to interact

828
00:47:15.559 --> 00:47:18.880
<v Speaker 4>in the universe, in the real world, then there's going

829
00:47:18.920 --> 00:47:20.960
<v Speaker 4>to be an infinite number of things that can happen.

830
00:47:21.400 --> 00:47:24.320
<v Speaker 4>So there are three major categories of problems. There are

831
00:47:24.320 --> 00:47:27.239
<v Speaker 4>the known common problems that we always know about. That's

832
00:47:27.280 --> 00:47:31.440
<v Speaker 4>this green bubble. Then there's the long tail known cases,

833
00:47:31.519 --> 00:47:34.159
<v Speaker 4>the rare events but we know about them. Those are

834
00:47:34.239 --> 00:47:37.639
<v Speaker 4>that's the blue circle. And then there is this much

835
00:47:37.880 --> 00:47:42.320
<v Speaker 4>larger orange circle, which is the long tail unknown events,

836
00:47:42.880 --> 00:47:46.000
<v Speaker 4>the unusual events that we don't know about, which is

837
00:47:46.079 --> 00:47:50.440
<v Speaker 4>the actual larger set of problems, way larger than the

838
00:47:50.480 --> 00:47:54.199
<v Speaker 4>other two. So just to give you some examples, at

839
00:47:54.280 --> 00:47:57.559
<v Speaker 4>least once a year, in this top left photo, at

840
00:47:57.599 --> 00:47:59.880
<v Speaker 4>least once a year, there might be a dinosaur crossing

841
00:48:00.079 --> 00:48:04.800
<v Speaker 4>the road. And on the top right photo you have

842
00:48:04.880 --> 00:48:06.559
<v Speaker 4>a literal, I know it's kind of hard to see,

843
00:48:06.599 --> 00:48:09.480
<v Speaker 4>sorry, it's small, but that's a literal plane on the road.

844
00:48:10.719 --> 00:48:13.239
<v Speaker 4>And then you have a few other different issues here,

845
00:48:13.280 --> 00:48:15.960
<v Speaker 4>like on the bottom, and we'll hit on a

846
00:48:15.960 --> 00:48:17.880
<v Speaker 4>lot more of this stuff, but you can see how

847
00:48:17.880 --> 00:48:21.599
<v Speaker 4>there's a lot of unusual things that happen. In fact,

848
00:48:21.760 --> 00:48:24.199
<v Speaker 4>the number of weird things that can happen is probably

849
00:48:24.239 --> 00:48:27.039
<v Speaker 4>way higher than you'd realize. Like, even if

850
00:48:27.039 --> 00:48:30.039
<v Speaker 4>you drive a lot, you'd be surprised how many weird

851
00:48:30.079 --> 00:48:33.800
<v Speaker 4>things happen that you've probably personally never seen before. So

852
00:48:34.679 --> 00:48:38.480
<v Speaker 4>in particular, what I want to highlight here is the

853
00:48:38.519 --> 00:48:47.159
<v Speaker 4>center top video and the center bottom video. So in

854
00:48:47.239 --> 00:48:50.119
<v Speaker 4>the center top video you have a bunch of small

855
00:48:50.199 --> 00:48:53.880
<v Speaker 4>mammals all over the road. In the center bottom video,

856
00:48:54.239 --> 00:48:59.320
<v Speaker 4>you have like flying pages. Now I was mentioning earlier

857
00:48:59.639 --> 00:49:02.519
<v Speaker 4>that the way LiDAR works is that LiDAR

858
00:49:03.800 --> 00:49:09.599
<v Speaker 4>tries to do object detection and classification by shape. So

859
00:49:10.039 --> 00:49:15.840
<v Speaker 4>what really is problematic is if you're trying to classify

860
00:49:16.599 --> 00:49:20.800
<v Speaker 4>by shape, what ends up happening is if you get

861
00:49:20.840 --> 00:49:25.159
<v Speaker 4>this wrong, meaning you reverse these two scenarios. In the

862
00:49:25.159 --> 00:49:28.880
<v Speaker 4>top scenario, you may go very cautiously because of these mammals.

863
00:49:29.440 --> 00:49:31.719
<v Speaker 4>In the bottom scenario, you're on

864
00:49:31.719 --> 00:49:33.719
<v Speaker 4>the highway. In this scenario, you don't want to force

865
00:49:33.760 --> 00:49:36.440
<v Speaker 4>the brake right away. You would actually go really aggressively

866
00:49:36.440 --> 00:49:39.039
<v Speaker 4>because you would know that these are pages. However, if

867
00:49:39.039 --> 00:49:43.159
<v Speaker 4>you're classifying by shape, you might reverse these two scenarios,

868
00:49:43.719 --> 00:49:47.119
<v Speaker 4>and that's when the horrible stuff happens, and so this

869
00:49:47.159 --> 00:49:50.800
<v Speaker 4>is another example where shape can easily be fooled and

870
00:49:50.840 --> 00:49:56.159
<v Speaker 4>where the quality of color becomes very relevant

871
00:49:56.199 --> 00:49:58.440
<v Speaker 4>and important. And in fact, not long ago I think

872
00:49:58.440 --> 00:50:00.079
<v Speaker 4>it was actually when we were at the comp, for

873
00:50:00.079 --> 00:50:03.840
<v Speaker 4>instance, a Waymo ran over a cat, and so, uh, yeah,

874
00:50:03.920 --> 00:50:06.440
<v Speaker 4>so I mean that also became really I'm a big

875
00:50:06.480 --> 00:50:10.679
<v Speaker 4>cat fan, so that really hurt. And of course

876
00:50:10.719 --> 00:50:12.199
<v Speaker 4>you have a lot of unusual scenarios.

877
00:50:13.360 --> 00:50:17.239
<v Speaker 1>Top right: what's the difference between a cat and a

878
00:50:17.280 --> 00:50:20.199
<v Speaker 1>bag blowing in the wind. If a cat's at full sprint,

879
00:50:20.320 --> 00:50:23.480
<v Speaker 1>how do you tell the difference in shape from that?

880
00:50:23.599 --> 00:50:26.360
<v Speaker 1>And maybe a bag that's now in the shape of

881
00:50:26.400 --> 00:50:26.800
<v Speaker 1>a cat.

882
00:50:27.239 --> 00:50:30.800
<v Speaker 4>Yeah, you would be surprised at how much better computers

883
00:50:30.800 --> 00:50:33.480
<v Speaker 4>are at computer vision than humans when you really want

884
00:50:33.480 --> 00:50:35.719
<v Speaker 4>to do it. And one of the things that I'm not

885
00:50:35.760 --> 00:50:38.199
<v Speaker 4>going to hit on it, but maybe it's worth mentioning

886
00:50:38.199 --> 00:50:40.679
<v Speaker 4>since you asked, is that once you can do the

887
00:50:40.760 --> 00:50:44.679
<v Speaker 4>visual spectrum, why can't you do everything else? There's nothing

888
00:50:44.719 --> 00:50:48.239
<v Speaker 4>stopping you. And so that's exactly what the evolutionary path

889
00:50:48.280 --> 00:50:50.400
<v Speaker 4>will look like when you get really advanced. Is that

890
00:50:50.679 --> 00:50:53.400
<v Speaker 4>once you can do wavelength light, there's nothing stopping you. That's

891
00:50:53.400 --> 00:50:55.679
<v Speaker 4>a very tight, that's like the smallest sliver of the

892
00:50:56.159 --> 00:50:58.719
<v Speaker 4>wavelength available to you. Once you can do that, you

893
00:50:58.760 --> 00:51:01.239
<v Speaker 4>can do all the other stuff. So just add infrared

894
00:51:01.280 --> 00:51:04.119
<v Speaker 4>for example, and you can automatically see through smoke, you

895
00:51:04.119 --> 00:51:06.280
<v Speaker 4>can see through fog, you can see through you can

896
00:51:06.320 --> 00:51:08.079
<v Speaker 4>see at night. You'll be able to do all kinds

897
00:51:08.119 --> 00:51:11.440
<v Speaker 4>of stuff. So this is all and it's all vision,

898
00:51:11.679 --> 00:51:13.880
<v Speaker 4>not anything to do with LiDAR or anything like that.

899
00:51:16.920 --> 00:51:19.480
<v Speaker 4>And of course you'll have some unusual situations like in

900
00:51:19.519 --> 00:51:24.119
<v Speaker 4>the far right scenario that's a random washer rolling down

901
00:51:24.159 --> 00:51:28.280
<v Speaker 4>the road, and so that can also be problematic. Sometimes

902
00:51:28.280 --> 00:51:29.960
<v Speaker 4>you'll have trees that are in the middle of the road,

903
00:51:29.960 --> 00:51:33.119
<v Speaker 4>like in the bottom left. And then this situation I'm going

904
00:51:33.199 --> 00:51:35.039
<v Speaker 4>to bring up now and discuss a little more. But

905
00:51:35.440 --> 00:51:38.920
<v Speaker 4>second from the bottom right, this is a trash can

906
00:51:39.000 --> 00:51:42.280
<v Speaker 4>that's blowing in the wind across the road. Now, the

907
00:51:42.360 --> 00:51:46.920
<v Speaker 4>reason why in AVs for a brief period of time

908
00:51:47.039 --> 00:51:51.960
<v Speaker 4>this scenario became complicated was because there's this thing called

909
00:51:52.039 --> 00:51:56.679
<v Speaker 4>kinematic bits. Kinematic bits are just motion bits,

910
00:51:56.719 --> 00:52:01.000
<v Speaker 4>motion information, data about motion. And so people started

911
00:52:01.000 --> 00:52:04.920
<v Speaker 4>training on kinematic bits in addition to the wavelength light

912
00:52:04.960 --> 00:52:07.400
<v Speaker 4>and stuff like that. And so what they did was

913
00:52:07.440 --> 00:52:09.519
<v Speaker 4>they trained the system to be able to recognize every

914
00:52:09.559 --> 00:52:12.960
<v Speaker 4>single trash can there is, and it did it. But

915
00:52:13.079 --> 00:52:16.199
<v Speaker 4>one of the problems was that the training that they

916
00:52:16.239 --> 00:52:19.719
<v Speaker 4>did was on trash cans that didn't move, and so

917
00:52:19.800 --> 00:52:22.639
<v Speaker 4>when they deployed it, there was this trash can that

918
00:52:22.719 --> 00:52:26.599
<v Speaker 4>started moving and so it had kinematic bits, but every

919
00:52:26.639 --> 00:52:29.679
<v Speaker 4>trash can it was trained on had zero, no kinematic bits.

920
00:52:29.719 --> 00:52:32.039
<v Speaker 4>So a trash can is not supposed to move. And so

921
00:52:32.079 --> 00:52:34.159
<v Speaker 4>you can see how a detail like that will mess

922
00:52:34.239 --> 00:52:37.239
<v Speaker 4>up the system. And so one of the things that's

923
00:52:37.280 --> 00:52:42.480
<v Speaker 4>important is to understand what all the advantages and

924
00:52:42.559 --> 00:52:46.000
<v Speaker 4>the disadvantages, and kinematic bits are actually an advantage, you

925
00:52:46.119 --> 00:52:48.199
<v Speaker 4>just need to make sure you train on them. And

926
00:52:48.280 --> 00:52:50.800
<v Speaker 4>so this is earlier. This scenario that I'm talking about

927
00:52:50.880 --> 00:52:52.880
<v Speaker 4>has already been figured out. Don't worry. This is you know,

928
00:52:52.920 --> 00:52:55.039
<v Speaker 4>long ago. So people have figured this dumb shit out.

929
00:52:55.199 --> 00:52:57.480
<v Speaker 4>But we all had to go through the evolutionary process.

930
00:52:57.840 --> 00:53:00.760
<v Speaker 4>And I think in some of the ones that I've seen,

931
00:53:00.880 --> 00:53:04.559
<v Speaker 4>you know, because there's so many companies that are trying

932
00:53:04.559 --> 00:53:07.320
<v Speaker 4>to do LiDAR at this point, and I see

933
00:53:07.320 --> 00:53:09.079
<v Speaker 4>all kinds of horrible stuff, and I see a lot

934
00:53:09.079 --> 00:53:12.519
<v Speaker 4>of cheats as well, And I'll talk a little bit

935
00:53:12.519 --> 00:53:15.400
<v Speaker 4>about one of those cheats right now, which is a

936
00:53:15.400 --> 00:53:19.360
<v Speaker 4>bounding box. So in computer vision, this

937
00:53:19.480 --> 00:53:21.760
<v Speaker 4>is not a big deal. But in basically any other

938
00:53:23.760 --> 00:53:27.599
<v Speaker 4>like what do you call it, any other modality, this

939
00:53:27.760 --> 00:53:29.960
<v Speaker 4>bounding box problem I've seen is a big problem. And

940
00:53:30.000 --> 00:53:32.480
<v Speaker 4>even in twenty twenty three, I was still seeing papers

941
00:53:32.840 --> 00:53:36.519
<v Speaker 4>that were being submitted to CVPR or to IEEE that

942
00:53:37.159 --> 00:53:40.480
<v Speaker 4>were using bounding boxes. Now, that paper that I was

943
00:53:40.519 --> 00:53:43.440
<v Speaker 4>showing before about the pedestrians, the VRUs, in twenty

944
00:53:43.480 --> 00:53:45.760
<v Speaker 4>twenty one, I was saying that's the

945
00:53:45.800 --> 00:53:49.960
<v Speaker 4>foundational paper: heavily cited, well understood, academically accepted, you know,

946
00:53:50.039 --> 00:53:54.960
<v Speaker 4>big deal. That paper also demonstrated that bounding boxes are

947
00:53:55.119 --> 00:53:57.280
<v Speaker 4>a cheat. And what a bounding box is is that

948
00:53:57.440 --> 00:53:59.599
<v Speaker 4>not only do you need to be able to identify

949
00:53:59.639 --> 00:54:02.760
<v Speaker 4>what an object is, but you also need to be

950
00:54:02.840 --> 00:54:06.320
<v Speaker 4>able to position that object on the frame, be able

951
00:54:06.320 --> 00:54:09.519
<v Speaker 4>to identify that position. And what a bounding box is

952
00:54:09.719 --> 00:54:12.280
<v Speaker 4>is basically a box that goes around the object you're trying

953
00:54:12.280 --> 00:54:16.239
<v Speaker 4>to identify. And the issue with this is that a

954
00:54:16.280 --> 00:54:18.960
<v Speaker 4>bounding box is going to always be larger than the

955
00:54:19.000 --> 00:54:23.519
<v Speaker 4>actual object itself, and it's not going to be the

956
00:54:23.559 --> 00:54:25.519
<v Speaker 4>shape of the object. The bounding box is just going

957
00:54:25.599 --> 00:54:27.719
<v Speaker 4>to like for example, in this circle that you're seeing here,

958
00:54:28.000 --> 00:54:32.039
<v Speaker 4>the bounding box in this example is way larger than

959
00:54:32.079 --> 00:54:35.920
<v Speaker 4>the actual truck itself, so you can see how

960
00:54:36.039 --> 00:54:39.559
<v Speaker 4>much larger it is. So in the scenario like you

961
00:54:39.639 --> 00:54:43.679
<v Speaker 4>have in the middle here with this overturned truck, if

962
00:54:43.719 --> 00:54:46.039
<v Speaker 4>you imagine a bounding box with this kind of error

963
00:54:46.119 --> 00:54:48.679
<v Speaker 4>rate on it in terms of the size of its shape,

964
00:54:49.159 --> 00:54:51.119
<v Speaker 4>and apply it to this truck, the bounding box is

965
00:54:51.159 --> 00:54:54.440
<v Speaker 4>going to be bigger than the frame. Now, why that's

966
00:54:54.480 --> 00:54:58.480
<v Speaker 4>important is easier to see in this bottom

967
00:54:58.559 --> 00:55:02.119
<v Speaker 4>left photo. It's important because if you're using a

968
00:55:02.159 --> 00:55:05.960
<v Speaker 4>bounding box and it's larger than the object, you can

969
00:55:06.000 --> 00:55:09.440
<v Speaker 4>see that there's a bounding box on the side here. You

970
00:55:09.440 --> 00:55:12.719
<v Speaker 4>can see here this side here is covering up the road.

971
00:55:13.320 --> 00:55:17.039
<v Speaker 4>So from the AI's perspective, there is no

972
00:55:17.280 --> 00:55:20.440
<v Speaker 4>road to travel on, and so you'll see the AI

973
00:55:20.519 --> 00:55:23.599
<v Speaker 4>systems start to stutter here because it doesn't think there's

974
00:55:23.639 --> 00:55:26.519
<v Speaker 4>actually anywhere to go, any room to go, and so

975
00:55:26.599 --> 00:55:29.480
<v Speaker 4>it has no way of path projecting through a bounding

976
00:55:29.519 --> 00:55:31.800
<v Speaker 4>box because you're not supposed to interact with this object.
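
NOTE
A minimal Python sketch of the bounding box problem described above, assuming a hypothetical 8x8 occupancy grid rather than real sensor data. An object lying diagonally across the grid, like the overturned truck, occupies only the diagonal cells, but its tight axis-aligned box covers the whole grid, so a planner that treats the box as solid sees no free space at all. The helper names are illustrative only, not taken from any AV stack.
```python
import numpy as np
def bbox_from_mask(mask):
    """Tight axis-aligned box (row_min, row_max, col_min, col_max); assumes at least one True cell."""
    rows = np.any(mask, axis=1)
    cols = np.any(mask, axis=0)
    r0, r1 = np.where(rows)[0][[0, -1]]
    c0, c1 = np.where(cols)[0][[0, -1]]
    return int(r0), int(r1), int(c0), int(c1)
def box_mask(shape, box):
    """Mark every cell inside the box: what a planner sees if it treats the box as the obstacle."""
    r0, r1, c0, c1 = box
    out = np.zeros(shape, dtype=bool)
    out[r0:r1 + 1, c0:c1 + 1] = True
    return out
# Hypothetical diagonal object, e.g. a vehicle lying at an angle across the frame.
obj = np.eye(8, dtype=bool)
box = bbox_from_mask(obj)
blocked = box_mask(obj.shape, box)
print("cells the object occupies:", int(obj.sum()))
print("cells the box blocks:", int(blocked.sum()))
print("free cells wrongly blocked:", int((blocked & ~obj).sum()))
```
For this mask the object covers 8 cells while the box blocks all 64, so 56 drivable cells are marked as occupied.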

977
00:55:32.360 --> 00:55:34.719
<v Speaker 4>So this is why bounding boxes are a cheat. And

978
00:55:34.800 --> 00:55:36.480
<v Speaker 4>this is still, as far as I know, this is

979
00:55:36.519 --> 00:55:40.400
<v Speaker 4>still actually happening. Unfortunately, in computer vision, I think AI

980
00:55:40.519 --> 00:55:44.239
<v Speaker 4>specialists have figured this dumb shit out, but in everything else,

981
00:55:44.320 --> 00:55:47.239
<v Speaker 4>I think they haven't. And this is a real shame.

982
00:55:47.760 --> 00:55:50.880
<v Speaker 4>And I'll show you the solution to this problem, how

983
00:55:50.920 --> 00:55:53.159
<v Speaker 4>you solve the bounding box problem. But I just want

984
00:55:53.199 --> 00:55:55.639
<v Speaker 4>to bring it up right now that I've seen many

985
00:55:55.719 --> 00:55:59.719
<v Speaker 4>lidar-based systems use this, and this is a cheat.

986
00:55:59.760 --> 00:56:02.199
<v Speaker 4>This is a known cheat, and this is academically published

987
00:56:02.239 --> 00:56:04.760
<v Speaker 4>as a cheat, and it's unfortunate that this thing is

988
00:56:04.800 --> 00:56:14.039
<v Speaker 4>still happening. So here's the Waymo system on the

989
00:56:14.079 --> 00:56:16.320
<v Speaker 4>right here, and you can see the little spinny

990
00:56:16.360 --> 00:56:18.519
<v Speaker 4>thing that's a lidar. And like I was saying, no

991
00:56:18.559 --> 00:56:21.039
<v Speaker 4>matter what your philosophy is, you still need to have

992
00:56:21.079 --> 00:56:23.760
<v Speaker 4>cameras anyway. And what this is doing is it's kind

993
00:56:23.760 --> 00:56:27.119
<v Speaker 4>of showing the cleaning system that they're using. So this

994
00:56:27.199 --> 00:56:31.159
<v Speaker 4>whole apparatus is being put on top of Jaguars in

995
00:56:31.239 --> 00:56:34.400
<v Speaker 4>Phoenix and in LA, San Francisco, and maybe a couple

996
00:56:34.480 --> 00:56:39.000
<v Speaker 4>other cities, now Austin too, I think.

997
00:56:39.119 --> 00:56:42.039
<v Speaker 4>So there are a couple of major issues with this.

998
00:56:42.480 --> 00:56:45.079
<v Speaker 4>So number one, there's a reason why we do wind

999
00:56:45.119 --> 00:56:48.559
<v Speaker 4>tunnel testing. And so if you're putting all these

1000
00:56:48.599 --> 00:56:50.639
<v Speaker 4>lidar systems on it, and it's not just this little

1001
00:56:50.639 --> 00:56:53.280
<v Speaker 4>thing on top, it's also got lidars on the front,

1002
00:56:53.320 --> 00:56:55.280
<v Speaker 4>on the two corners on the back, and the two

1003
00:56:55.320 --> 00:56:57.840
<v Speaker 4>corners I think on the center back as well. So

1004
00:56:58.159 --> 00:57:00.679
<v Speaker 4>you're increasing the amount of mass and you're increasing the

1005
00:57:00.719 --> 00:57:04.760
<v Speaker 4>amount of surface area on the exterior of the vehicle

1006
00:57:05.000 --> 00:57:06.800
<v Speaker 4>in order to do this. And there's a reason why

1007
00:57:06.840 --> 00:57:09.039
<v Speaker 4>we do wind tunnel testing. So if you do all this,

1008
00:57:09.119 --> 00:57:11.239
<v Speaker 4>you're gonna basically invalidate all that and this is going

1009
00:57:11.320 --> 00:57:13.880
<v Speaker 4>to affect your range efficiency. So you're gonna have range

1010
00:57:13.920 --> 00:57:15.800
<v Speaker 4>problems as a result of this. And this is especially

1011
00:57:15.880 --> 00:57:18.400
<v Speaker 4>going to become a problem at speed. And so the

1012
00:57:18.480 --> 00:57:20.440
<v Speaker 4>higher speed you go, the more wind resistance you're going

1013
00:57:20.519 --> 00:57:22.719
<v Speaker 4>to get, and the more problems you're gonna have with range.
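
NOTE
A rough Python sketch of the wind-resistance point, using the standard drag equation F = 0.5 * rho * Cd * A * v^2 and the power to overcome it, P = F * v. The drag coefficients and frontal areas are made-up illustrative values, not measured Waymo or Jaguar figures. Because drag power grows with the cube of speed, the same added sensor hardware costs far more energy at highway speed than in town.
```python
def drag_force(rho, cd, area, v):
    """Aerodynamic drag in newtons: F = 0.5 * rho * Cd * A * v**2."""
    return 0.5 * rho * cd * area * v ** 2
def drag_power(rho, cd, area, v):
    """Power in watts spent against drag at constant speed: P = F * v."""
    return drag_force(rho, cd, area, v) * v
RHO = 1.2  # air density in kg/m^3, roughly sea level
base = dict(rho=RHO, cd=0.30, area=2.3)    # hypothetical stock vehicle
rigged = dict(rho=RHO, cd=0.34, area=2.5)  # hypothetical vehicle with roof and corner sensors added
for kmh in (50, 100):
    v = kmh / 3.6  # km/h to m/s
    extra = drag_power(v=v, **rigged) - drag_power(v=v, **base)
    print(f"{kmh} km/h: extra drag power ~ {extra / 1000:.2f} kW")
```
Doubling the speed multiplies the drag power, and therefore the range penalty per hour of driving, by eight.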

1014
00:57:23.599 --> 00:57:26.440
<v Speaker 4>Another issue with this, and what I was showing here

1015
00:57:26.440 --> 00:57:29.960
<v Speaker 4>with this video is the cleaning system. And if you

1016
00:57:30.199 --> 00:57:33.760
<v Speaker 4>just apply one brain cell. Another thing you can do

1017
00:57:34.599 --> 00:57:37.440
<v Speaker 4>is you can just take these cameras and put them

1018
00:57:37.440 --> 00:57:40.719
<v Speaker 4>behind the windshield and then you don't ever have to

1019
00:57:40.760 --> 00:57:42.960
<v Speaker 4>do this dumb shit ever again. And then you can

1020
00:57:43.039 --> 00:57:47.239
<v Speaker 4>use the cabin features like the windshield wiper and the

1021
00:57:47.320 --> 00:57:51.039
<v Speaker 4>defrost settings that are already there and you don't have

1022
00:57:51.079 --> 00:57:55.199
<v Speaker 4>to build anything new and it only will cost you

1023
00:57:55.239 --> 00:58:00.639
<v Speaker 4>like a couple hundred dollars and that's it. So that's

1024
00:58:00.679 --> 00:58:03.000
<v Speaker 4>the kind of dumb we're dealing with. So this is

1025
00:58:03.039 --> 00:58:05.039
<v Speaker 4>why I needed to do this setup for it before

1026
00:58:05.039 --> 00:58:07.599
<v Speaker 4>I get into the more complicated stuff, because I need

1027
00:58:07.639 --> 00:58:09.760
<v Speaker 4>a level set on what kind of level of dumb

1028
00:58:09.800 --> 00:58:15.360
<v Speaker 4>we're dealing with. So I'm gonna skip this. This is

1029
00:58:15.360 --> 00:58:17.760
<v Speaker 4>called a distribution shift. I don't want to get too technical,

1030
00:58:17.880 --> 00:58:21.880
<v Speaker 4>but this is basically saying some stuff about time and

1031
00:58:21.920 --> 00:58:25.119
<v Speaker 4>how it's relevant. I want to point this company out

1032
00:58:25.159 --> 00:58:26.840
<v Speaker 4>just briefly. I'm not going to go deep into it,

1033
00:58:26.880 --> 00:58:28.519
<v Speaker 4>but I just want to just so that people don't

1034
00:58:28.559 --> 00:58:31.880
<v Speaker 4>think I'm just promoting Tesla whatever, and that's it. This

1035
00:58:31.960 --> 00:58:35.480
<v Speaker 4>is another company out of the UK. This is the

1036
00:58:35.519 --> 00:58:39.519
<v Speaker 4>company's name is Wayve, and this is run by

1037
00:58:39.519 --> 00:58:43.360
<v Speaker 4>a guy named Alex Kendall, and they're trying to do

1038
00:58:43.400 --> 00:58:47.440
<v Speaker 4>a monocular based system. So a monocular based system is

1039
00:58:47.519 --> 00:58:50.840
<v Speaker 4>one camera and they're using like other things like HD

1040
00:58:50.960 --> 00:58:53.360
<v Speaker 4>maps and probably a couple other things. But the point

1041
00:58:53.559 --> 00:58:56.840
<v Speaker 4>is their system is one camera and that's all they're using,

1042
00:58:57.719 --> 00:59:01.400
<v Speaker 4>and this is based out of the UK, and they're

1043
00:59:01.480 --> 00:59:04.400
<v Speaker 4>trying to build it up. I don't think they're fully

1044
00:59:04.440 --> 00:59:06.320
<v Speaker 4>deployed yet, or at least not that I'm aware of.

1045
00:59:07.599 --> 00:59:11.880
<v Speaker 4>But what is important to understand here is that even

1046
00:59:12.000 --> 00:59:17.239
<v Speaker 4>with one camera, automated vehicles can be achieved. And they've

1047
00:59:17.239 --> 00:59:19.400
<v Speaker 4>done a really good job with this, Like I'm skipping

1048
00:59:19.400 --> 00:59:22.199
<v Speaker 4>a lot here, but like it's actually impressive what they're

1049
00:59:22.199 --> 00:59:26.360
<v Speaker 4>able to do with one camera and how many abstract

1050
00:59:26.400 --> 00:59:28.599
<v Speaker 4>ways that they've been able to improve on the system

1051
00:59:28.599 --> 00:59:34.039
<v Speaker 4>that are unconventional. And so the important aspect to understand

1052
00:59:34.039 --> 00:59:37.679
<v Speaker 4>here is that again, the camera's one hundred dollars and

1053
00:59:37.880 --> 00:59:41.760
<v Speaker 4>this camera based, this singular camera based system is still

1054
00:59:41.880 --> 00:59:46.719
<v Speaker 4>capable of doing automated vehicles, and Waymo's spending one hundred

1055
00:59:46.719 --> 00:59:49.280
<v Speaker 4>and twenty thousand dollars one hundred and thirty thousand dollars

1056
00:59:49.280 --> 00:59:54.920
<v Speaker 4>per vehicle to not really achieve that. So it's one

1057
00:59:54.960 --> 00:59:58.360
<v Speaker 4>thing for me to say that Waymo's messing up and stuff,

1058
00:59:58.440 --> 01:00:02.400
<v Speaker 4>but it's another thing to actually show the hardcore data

1059
01:00:02.400 --> 01:00:05.119
<v Speaker 4>on it. So I just want to bring this up

1060
01:00:05.159 --> 01:00:12.760
<v Speaker 4>real quick. Where'd it go? There it is. This

1061
01:00:12.920 --> 01:00:18.119
<v Speaker 4>is from the NHTSA dot gov website. So this is

1062
01:00:18.159 --> 01:00:22.719
<v Speaker 4>official crash reporting data. This is a Standing General Order

1063
01:00:22.800 --> 01:00:26.880
<v Speaker 4>on crash reporting. You're required to report and this is

1064
01:00:27.000 --> 01:00:28.800
<v Speaker 4>NHTSA dot gov if you can see at the top.

1065
01:00:30.360 --> 01:00:35.760
<v Speaker 4>And this is national reporting. So any automated vehicle that

1066
01:00:35.920 --> 01:00:39.360
<v Speaker 4>is operating on public roads you're required to report if

1067
01:00:39.360 --> 01:00:43.599
<v Speaker 4>it's involved in a collision. And what this is, again,

1068
01:00:43.719 --> 01:00:47.840
<v Speaker 4>national reporting. So I want to reiterate: Waymo only

1069
01:00:47.920 --> 01:00:52.960
<v Speaker 4>operates in geofenced spaces within specific cities in the country,

1070
01:00:53.000 --> 01:00:56.159
<v Speaker 4>and that's it. This data is being

1071
01:00:56.199 --> 01:00:59.840
<v Speaker 4>reported on. It was last reported on November seventeenth, twenty twenty five.

1072
01:01:00.400 --> 01:01:03.800
<v Speaker 4>This is continuously updated every couple months, every few months.

1073
01:01:03.800 --> 01:01:07.559
<v Speaker 4>I've been following it since twenty twenty three, and Waymo

1074
01:01:07.719 --> 01:01:10.639
<v Speaker 4>has about two thousand vehicles on the road. And you

1075
01:01:10.679 --> 01:01:12.880
<v Speaker 4>can see here it's kind of tiny. Here, I'll read

1076
01:01:12.880 --> 01:01:15.639
<v Speaker 4>it for you. It's one thousand four hundred and twenty

1077
01:01:15.719 --> 01:01:21.519
<v Speaker 4>six collisions. So in their fleet of two thousand vehicles

1078
01:01:21.639 --> 01:01:26.519
<v Speaker 4>across only a few different cities, they're reporting one thousand,

1079
01:01:26.639 --> 01:01:31.360
<v Speaker 4>four hundred and twenty six collisions. Now, another thing to

1080
01:01:31.360 --> 01:01:36.079
<v Speaker 4>know about this, Waymo recently expanded to a couple new cities,

1081
01:01:37.639 --> 01:01:42.559
<v Speaker 4>so that fleet of two thousand vehicles has

1082
01:01:42.760 --> 01:01:47.960
<v Speaker 4>increased recently. In May of this year, I saw they

1083
01:01:47.960 --> 01:01:51.280
<v Speaker 4>had like twelve hundred vehicles on the road, fourteen hundred vehicles,

1084
01:01:51.280 --> 01:01:55.079
<v Speaker 4>something like this, and they were reporting roughly eighty percent

1085
01:01:55.400 --> 01:01:57.840
<v Speaker 4>of their fleet has already been involved in a collision.

1086
01:01:58.719 --> 01:02:01.960
<v Speaker 4>And I've been watching this chart since twenty

1087
01:02:02.000 --> 01:02:05.960
<v Speaker 4>twenty three, and on average it's eighty percent and sometimes

1088
01:02:06.039 --> 01:02:09.599
<v Speaker 4>I've even seen more than one hundred percent. In other words,

1089
01:02:09.639 --> 01:02:14.239
<v Speaker 4>on average more than one collision per vehicle in their

1090
01:02:14.320 --> 01:02:24.280
<v Speaker 4>fleet at Waymo. Now, let's compare: this is Tesla's collisions.

1091
01:02:24.800 --> 01:02:29.880
<v Speaker 4>Tesla has I think five million vehicles on the road

1092
01:02:31.199 --> 01:02:35.719
<v Speaker 4>with the ADAS system, millions and millions of vehicles,

1093
01:02:35.760 --> 01:02:39.280
<v Speaker 4>and also around the world, not just in the United States,

1094
01:02:39.880 --> 01:02:44.119
<v Speaker 4>and they're reporting two thousand I think that says eight

1095
01:02:44.199 --> 01:02:50.320
<v Speaker 4>hundred and forty five collisions with millions of vehicles on

1096
01:02:50.400 --> 01:02:57.760
<v Speaker 4>the road, so a pretty large, massive difference in magnitude between

1097
01:02:57.760 --> 01:03:04.760
<v Speaker 4>the two. The collision rate with Waymo is pretty insane.

1098
01:03:05.039 --> 01:03:08.119
<v Speaker 4>I mean, any other vehicle company that you've ever heard

1099
01:03:08.159 --> 01:03:11.280
<v Speaker 4>of having a sixty eighty hundred percent collision rate and

1100
01:03:11.320 --> 01:03:14.840
<v Speaker 4>still operating after all these years, I've never heard of one,

1101
01:03:15.159 --> 01:03:17.800
<v Speaker 4>but Waymo seems to be able to get away with it.
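
NOTE
The per-vehicle comparison above can be reproduced in a few lines of Python. The collision counts are the figures read from the NHTSA Standing General Order data; the fleet sizes (about 2,000 for Waymo, about 5 million for Tesla) are the speaker's approximations. Like the talk, this divides by fleet size only; miles driven, operating area, and fault are not part of the ratio.
```python
def collisions_per_vehicle(collisions, fleet_size):
    """Reported collisions divided by fleet size (the per-vehicle framing used in the talk)."""
    return collisions / fleet_size
waymo = collisions_per_vehicle(1426, 2000)       # speaker's approximate fleet size
tesla = collisions_per_vehicle(2845, 5_000_000)  # speaker's approximate fleet size
print(f"Waymo: {waymo:.1%} of fleet size")
print(f"Tesla: {tesla:.3%} of fleet size")
print(f"ratio: {waymo / tesla:.0f}x")
```
With these inputs the Waymo figure is about 71% of its fleet size and the Tesla figure about 0.057%.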

1102
01:03:19.039 --> 01:03:21.199
<v Speaker 4>And so one of the other issues here.

1103
01:03:20.800 --> 01:03:23.199
<v Speaker 3>Well, one thing I did want to ask you

1104
01:03:23.239 --> 01:03:25.760
<v Speaker 3>it might be a little too conspiratorial. I was wondering

1105
01:03:25.840 --> 01:03:29.159
<v Speaker 3>if you think the lidar is maybe there to actually map the area,

1106
01:03:29.400 --> 01:03:31.559
<v Speaker 3>Like it's an excuse, that's just one of the excuses

1107
01:03:31.599 --> 01:03:31.920
<v Speaker 3>it's there.

1108
01:03:32.639 --> 01:03:34.679
<v Speaker 4>That excuse actually is made, and that

1109
01:03:34.800 --> 01:03:35.199
<v Speaker 4>is true.

1110
01:03:35.480 --> 01:03:38.000
<v Speaker 3>Like I wonder if it's actually like collecting data and

1111
01:03:38.039 --> 01:03:39.519
<v Speaker 3>you're thinking like, oh, it's on top of the car

1112
01:03:39.559 --> 01:03:40.440
<v Speaker 3>to help get it somewhere.

1113
01:03:40.760 --> 01:03:43.679
<v Speaker 4>It's true. Actually, yeah, this is actually true. So

1114
01:03:43.960 --> 01:03:46.880
<v Speaker 4>you're you're making a good point. There is actually a

1115
01:03:46.960 --> 01:03:50.880
<v Speaker 4>temporary use for lidar in automated vehicles, and this

1116
01:03:51.000 --> 01:03:53.760
<v Speaker 4>is known. One of them, and I will get

1117
01:03:53.760 --> 01:03:55.320
<v Speaker 4>into it a little bit later. But one of the

1118
01:03:55.360 --> 01:03:58.079
<v Speaker 4>major issues is calculating depth. This is pretty much one

1119
01:03:58.119 --> 01:04:00.480
<v Speaker 4>of the big problems that needed to be solved in

1120
01:04:00.559 --> 01:04:05.880
<v Speaker 4>AVs is depth calculation. And one of the things that

1121
01:04:05.880 --> 01:04:09.199
<v Speaker 4>they learned is that no matter what your system is,

1122
01:04:09.440 --> 01:04:12.119
<v Speaker 4>even if you're using a computer-vision-only or end-to-end

1123
01:04:12.199 --> 01:04:14.800
<v Speaker 4>system like Tesla, one of the things that they learned

1124
01:04:14.920 --> 01:04:17.840
<v Speaker 4>is that you can temporarily put a lidar system

1125
01:04:17.920 --> 01:04:21.079
<v Speaker 4>on the vehicle along with the vision system while you're

1126
01:04:21.119 --> 01:04:24.840
<v Speaker 4>training the model, and you can map the area to

1127
01:04:24.960 --> 01:04:28.960
<v Speaker 4>measure depth calculations around you, and you can use that

1128
01:04:29.119 --> 01:04:32.440
<v Speaker 4>to help train the vision system on how to estimate depth.
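
NOTE
One common way to set up this kind of lidar supervision, sketched in Python/NumPy under stated assumptions: the lidar points have already been projected into the camera image, pixels without a lidar return hold 0, and the camera network outputs a dense per-pixel depth map. The loss only counts pixels where lidar actually returned a point. This is a generic illustration of sparse depth supervision with hypothetical values, not Tesla's or anyone else's actual training code.
```python
import numpy as np
def masked_l1_depth_loss(pred_depth, lidar_depth):
    """Mean absolute depth error over pixels that have a lidar return (lidar_depth > 0)."""
    valid = lidar_depth > 0
    if not np.any(valid):
        return 0.0  # no lidar supervision in this frame
    return float(np.mean(np.abs(pred_depth[valid] - lidar_depth[valid])))
# Hypothetical 2x2 example, depths in metres.
pred = np.array([[10.0, 12.0], [30.0, 5.0]])   # dense depth predicted from the camera image
lidar = np.array([[11.0, 0.0], [28.0, 0.0]])   # sparse lidar depth; 0 means no return at that pixel
print(masked_l1_depth_loss(pred, lidar))
```
Only the two pixels with lidar returns contribute, giving (|10 - 11| + |30 - 28|) / 2 = 1.5; once the network is trained, it predicts depth from images alone.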

1129
01:04:33.039 --> 01:04:35.079
<v Speaker 4>But then you would still take the lidar

1130
01:04:35.119 --> 01:04:37.920
<v Speaker 4>system off after that. The only reason you would have

1131
01:04:37.920 --> 01:04:40.519
<v Speaker 4>it on is temporarily to get depth calculation to get

1132
01:04:40.519 --> 01:04:44.000
<v Speaker 4>the system kickstarted into training, but then you would remove

1133
01:04:44.000 --> 01:04:45.599
<v Speaker 4>it later on and it would be able to calculate

1134
01:04:45.599 --> 01:04:47.840
<v Speaker 4>depth off of the vision system alone. Now even this

1135
01:04:47.960 --> 01:04:50.440
<v Speaker 4>is not required anymore, but there was. This is an

1136
01:04:50.519 --> 01:04:53.760
<v Speaker 4>argument that has been made. This is sometimes done, and this

1137
01:04:53.840 --> 01:04:55.599
<v Speaker 4>is supposed to be temporary, but this is not what

1138
01:04:55.639 --> 01:04:58.920
<v Speaker 4>Waymo's doing. Waymo has said this multiple times

1139
01:04:58.920 --> 01:05:03.719
<v Speaker 4>in public, including their executives. They're all saying

1140
01:05:03.760 --> 01:05:05.920
<v Speaker 4>that lidar is here to stay for them, meaning

1141
01:05:05.960 --> 01:05:08.719
<v Speaker 4>that they're trying to make lidar a fundamental element

1142
01:05:08.800 --> 01:05:13.719
<v Speaker 4>of their solution. Or, I should say, the fundamental

1143
01:05:13.840 --> 01:05:18.480
<v Speaker 4>element of their solution, which is nonsense, complete nonsense.

1144
01:05:19.199 --> 01:05:22.519
<v Speaker 1>The vital solution for the Silicon Valley nazis.

1145
01:05:23.800 --> 01:05:31.679
<v Speaker 4>Heh, well, there's stuff there. It's important to understand

1146
01:05:31.840 --> 01:05:35.400
<v Speaker 4>that Waymo is owned by Google and there are numerous

1147
01:05:35.440 --> 01:05:38.480
<v Speaker 4>incentives for doing this. What I had started off by

1148
01:05:38.519 --> 01:05:42.920
<v Speaker 4>saying was that when Waymo came to Phoenix, they went

1149
01:05:42.960 --> 01:05:47.039
<v Speaker 4>to ASU and they started dumping enormous amounts of money

1150
01:05:47.079 --> 01:05:49.519
<v Speaker 4>into the engineering department at ASU. I didn't know that

1151
01:05:49.599 --> 01:05:51.440
<v Speaker 4>at the time, but that's what ended up getting me

1152
01:05:51.679 --> 01:05:55.599
<v Speaker 4>basically in a lot of problems. And so when I

1153
01:05:55.639 --> 01:06:01.079
<v Speaker 4>was applying to my PhD at ASU, I was using

1154
01:06:01.079 --> 01:06:04.760
<v Speaker 4>this presentation. I presented this to all the lab directors

1155
01:06:04.840 --> 01:06:08.159
<v Speaker 4>at ASU and the Perception and Robotics group, which is

1156
01:06:08.199 --> 01:06:11.800
<v Speaker 4>a five floor building filled with PhDs, and basically none

1157
01:06:11.800 --> 01:06:14.159
<v Speaker 4>of them are working on computer vision. They had one

1158
01:06:14.320 --> 01:06:16.639
<v Speaker 4>robot arm with a single camera and that was like

1159
01:06:16.719 --> 01:06:19.679
<v Speaker 4>their computer vision product. Everything else was lidar or

1160
01:06:19.719 --> 01:06:22.800
<v Speaker 4>lidar-related technologies that they were working on. And

1161
01:06:22.840 --> 01:06:25.360
<v Speaker 4>in fact I had to teach my professors all this

1162
01:06:25.400 --> 01:06:27.719
<v Speaker 4>stuff that I'm talking about. They had no clue about

1163
01:06:27.719 --> 01:06:30.239
<v Speaker 4>computer vision and even some of the basic stuff. At

1164
01:06:30.280 --> 01:06:32.400
<v Speaker 4>a certain point it became completely pointless for me to even

1165
01:06:32.440 --> 01:06:36.159
<v Speaker 4>be at ASU, because not just the professors, but every single person

1166
01:06:36.280 --> 01:06:38.639
<v Speaker 4>in the Perception and Robotics group, not a single person

1167
01:06:38.679 --> 01:06:41.880
<v Speaker 4>at that time had a clue about anything I'm saying here.

1168
01:06:42.320 --> 01:06:51.280
<v Speaker 4>And it's how do I say this? These problems are

1169
01:06:51.320 --> 01:06:54.920
<v Speaker 4>too dumb for somebody who has a PhD in AI not to

1170
01:06:55.039 --> 01:06:59.079
<v Speaker 4>know about, let alone an entire university

1171
01:06:59.159 --> 01:07:01.440
<v Speaker 4>filled with PhDs not having a single clue about

1172
01:07:01.480 --> 01:07:07.639
<v Speaker 4>any of this. That's corruption, that's not a mistake. So

1173
01:07:08.199 --> 01:07:10.320
<v Speaker 4>earlier I was saying I wanted

1174
01:07:10.360 --> 01:07:12.280
<v Speaker 4>to do this setup so you can understand some of

1175
01:07:12.320 --> 01:07:14.519
<v Speaker 4>the issues that come with this. Now I'll show you

1176
01:07:15.000 --> 01:07:18.119
<v Speaker 4>what Waymo's lidar-based system actually sees. So here's

1177
01:07:18.159 --> 01:07:22.599
<v Speaker 4>a visual demonstration of what I'm talking about. So this

1178
01:07:22.719 --> 01:07:26.599
<v Speaker 4>is a point cloud representation of what the Waymo vehicle sees.

1179
01:07:26.639 --> 01:07:30.039
<v Speaker 4>You have the literal cameras on top, and then below

1180
01:07:30.119 --> 01:07:33.079
<v Speaker 4>it is what the lidar system is seeing, creating

1181
01:07:33.159 --> 01:07:37.000
<v Speaker 4>a point cloud return representation. This is what I'm showing

1182
01:07:37.039 --> 01:07:38.519
<v Speaker 4>on the bottom is what it means to have a

1183
01:07:38.559 --> 01:07:42.280
<v Speaker 4>point cloud return representation. And like you were saying earlier,

1184
01:07:42.280 --> 01:07:45.159
<v Speaker 4>it's just one dimensional point of light. It's not even

1185
01:07:45.199 --> 01:07:48.079
<v Speaker 4>really the wavelength. It's just that one point and that's it.
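
NOTE
As a rough illustration of what a single point in a point cloud is, this Python sketch converts one lidar return, a measured range along a known beam direction (azimuth and elevation), into an x, y, z position in the sensor frame. The angles and ranges are hypothetical; a spinning sensor fires many beams per rotation, and the point cloud is simply the collection of these positions.
```python
import math
def return_to_xyz(range_m, azimuth_deg, elevation_deg):
    """Convert one return (range, azimuth, elevation) to sensor-frame x, y, z in metres."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = range_m * math.cos(el) * math.cos(az)
    y = range_m * math.cos(el) * math.sin(az)
    z = range_m * math.sin(el)
    return (x, y, z)
# Hypothetical returns: straight ahead, 90 degrees to the side, straight up.
returns = [(10.0, 0.0, 0.0), (10.0, 90.0, 0.0), (5.0, 0.0, 90.0)]
cloud = [return_to_xyz(*r) for r in returns]
for p in cloud:
    print(tuple(round(c, 3) for c in p))
```
Each point here carries only a position, with none of the color or texture a camera pixel has.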

1186
01:07:48.599 --> 01:07:51.360
<v Speaker 4>Let me play it again. So this is like a

1187
01:07:51.400 --> 01:07:54.159
<v Speaker 4>person walking across with a box. You can see the

1188
01:07:54.159 --> 01:07:59.239
<v Speaker 4>point cloud representation here again, these people moving and this stuff.

1189
01:07:59.280 --> 01:08:01.679
<v Speaker 4>You can see it kind of misses the second person,

1190
01:08:01.760 --> 01:08:06.519
<v Speaker 4>it looks like, but it's very low quality, right, Like

1191
01:08:06.639 --> 01:08:11.480
<v Speaker 4>you would never trade the top for the bottom, right though,

1192
01:08:11.559 --> 01:08:13.920
<v Speaker 4>But somehow they've convinced you that this is exactly what

1193
01:08:13.960 --> 01:08:18.159
<v Speaker 4>you should do. One more time, just to reiterate the point.

1194
01:08:18.279 --> 01:08:22.319
<v Speaker 4>This is what a point cloud representation means. Now remember

1195
01:08:22.640 --> 01:08:25.600
<v Speaker 4>this video because I'm going to show you what you

1196
01:08:25.640 --> 01:08:29.279
<v Speaker 4>can do with computer vision, and anybody that understands

1197
01:08:29.359 --> 01:08:32.239
<v Speaker 4>video games and how video games work and all that stuff,

1198
01:08:32.680 --> 01:08:34.920
<v Speaker 4>you're about to have your mind blown because it's exactly

1199
01:08:34.920 --> 01:08:37.960
<v Speaker 4>the same thing. So whatever you can do in video games,

1200
01:08:38.159 --> 01:08:40.399
<v Speaker 4>that's what you can do with computer vision, and you

1201
01:08:40.479 --> 01:08:47.560
<v Speaker 4>cannot do that dumb shit with lidar. Okay, this

1202
01:08:47.720 --> 01:08:51.760
<v Speaker 4>is in my opinion, the most important slide, and this

1203
01:08:51.920 --> 01:08:54.479
<v Speaker 4>is the slide that got me in trouble. I know

1204
01:08:54.520 --> 01:08:56.199
<v Speaker 4>there's a lot on here. I'm going to break it down.

1205
01:08:56.239 --> 01:09:01.399
<v Speaker 4>It's not that hard, but this is really worth knowing.

1206
01:09:01.439 --> 01:09:08.399
<v Speaker 4>And I want to

1207
01:09:08.399 --> 01:09:10.760
<v Speaker 4>mention that there's a lot of things that I'm not saying.

1208
01:09:11.000 --> 01:09:14.079
<v Speaker 4>Some of it is to condense, but a lot of

1209
01:09:14.079 --> 01:09:18.880
<v Speaker 4>it is also to avoid creating even more problems for myself.

1210
01:09:19.199 --> 01:09:21.119
<v Speaker 4>I'm happy to talk about things if you want to,

1211
01:09:21.520 --> 01:09:24.760
<v Speaker 4>but I'm gonna skip some of the really bad stuff

1212
01:09:25.079 --> 01:09:26.760
<v Speaker 4>that could get me in a lot of trouble. But

1213
01:09:27.039 --> 01:09:28.880
<v Speaker 4>this is this is good. This got me in enough

1214
01:09:28.880 --> 01:09:30.840
<v Speaker 4>trouble what I have here on the screen, This got

1215
01:09:30.880 --> 01:09:33.079
<v Speaker 4>me in enough trouble. But I already did this, so

1216
01:09:33.079 --> 01:09:36.439
<v Speaker 4>I'll go ahead and talk about it. So this is

1217
01:09:36.640 --> 01:09:40.520
<v Speaker 4>the state. So if you ever see the phrase SOTA,

1218
01:09:41.039 --> 01:09:45.640
<v Speaker 4>that's state of the art. SOTA, state of the art.

1219
01:09:46.600 --> 01:09:49.399
<v Speaker 4>So this is the state-of-the-art Waymo

1220
01:09:49.640 --> 01:09:53.239
<v Speaker 4>leaderboard results. On June twenty ninth, twenty twenty three WHIPS.

1221
01:09:54.079 --> 01:09:57.399
<v Speaker 4>This was presented at CVPR twenty twenty three, the major

1222
01:09:57.520 --> 01:10:01.159
<v Speaker 4>computer vision conference, by Chen Wu, who's from Waymo. I

1223
01:10:01.199 --> 01:10:03.600
<v Speaker 4>have the link here and this is the paper that

1224
01:10:03.720 --> 01:10:08.680
<v Speaker 4>is associated with that presentation on the left side. So

1225
01:10:08.760 --> 01:10:10.520
<v Speaker 4>this is the reference paper that I'm talking about on

1226
01:10:10.560 --> 01:10:12.960
<v Speaker 4>the left side. Here on the left side you have

1227
01:10:13.039 --> 01:10:19.880
<v Speaker 4>this table. This table represents what is industry standard for

1228
01:10:20.279 --> 01:10:26.960
<v Speaker 4>object classification around the world. Everyone in the world that

1229
01:10:27.119 --> 01:10:33.640
<v Speaker 4>is doing object identification and classification in AI, they all

1230
01:10:34.199 --> 01:10:41.199
<v Speaker 4>use mIoU, mean intersection over union, in this task. This

1231
01:10:41.399 --> 01:10:45.880
<v Speaker 4>is industry standard, not just in America, everywhere in the world.
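
NOTE
A minimal Python/NumPy sketch of mean intersection over union as it is usually computed for semantic segmentation: for each class, IoU is the overlap between predicted and ground-truth labels divided by their union, and mIoU averages those per-class scores. The toy labels are hypothetical, and skipping classes that appear in neither prediction nor ground truth is one convention among several; benchmarks differ on that detail.
```python
import numpy as np
def mean_iou(pred, target, num_classes):
    """Average over classes of |pred AND target| / |pred OR target|, skipping classes absent from both."""
    ious = []
    for c in range(num_classes):
        p = pred == c
        t = target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both arrays; excluded from the average
        inter = np.logical_and(p, t).sum()
        ious.append(inter / union)
    if not ious:
        return float("nan")
    return float(np.mean(ious))
# Hypothetical per-point class ids (0, 1, 2) for six points.
target = np.array([0, 0, 1, 1, 2, 2])
pred = np.array([0, 1, 1, 1, 2, 0])
print(mean_iou(pred, target, num_classes=3))
```
For these toy arrays the per-class IoUs are 1/3, 2/3 and 1/2, so the mIoU is 0.5.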

1232
01:10:46.159 --> 01:10:50.520
<v Speaker 4>This is well understood, well known. This is standard. What

1233
01:10:51.359 --> 01:10:54.560
<v Speaker 4>mIoU means is that there's a minimum number of classes. At the

1234
01:10:54.640 --> 01:10:59.359
<v Speaker 4>time it was a minimum of sixteen classes, but actually the

1235
01:10:59.479 --> 01:11:02.279
<v Speaker 4>number has increased significantly since then. I think in this case

1236
01:11:02.359 --> 01:11:04.920
<v Speaker 4>Waymo was doing twenty two, which at the time in

1237
01:11:04.960 --> 01:11:07.640
<v Speaker 4>twenty twenty three was acceptable to do twenty two classes.

1238
01:11:08.279 --> 01:11:10.680
<v Speaker 4>Since then this number has increased to even more classes.

1239
01:11:10.720 --> 01:11:14.199
<v Speaker 4>But just know at the time this is actually industry

1240
01:11:14.239 --> 01:11:17.920
<v Speaker 4>standard and accepted. You're going to have two sets of

1241
01:11:18.000 --> 01:11:22.399
<v Speaker 4>data here, validation set and the test set. The validation

1242
01:11:22.560 --> 01:11:26.840
<v Speaker 4>set is the performance of the model on the training data,

1243
01:11:26.880 --> 01:11:30.079
<v Speaker 4>and then the test set, which is the real measure

1244
01:11:30.239 --> 01:11:34.600
<v Speaker 4>of performance. The test set is on data after the

1245
01:11:34.600 --> 01:11:38.640
<v Speaker 4>model has been trained and is being tested on scenarios

1246
01:11:38.680 --> 01:11:41.800
<v Speaker 4>that it hasn't ever seen before. So on average, you

1247
01:11:41.800 --> 01:11:45.079
<v Speaker 4>would expect that the validation set would be or I mean,

1248
01:11:45.119 --> 01:11:48.560
<v Speaker 4>the test set would be lower performance than the validation set,

1249
01:11:48.600 --> 01:11:50.560
<v Speaker 4>So that's supposed to be it. So it's all these

1250
01:11:50.560 --> 01:11:53.239
<v Speaker 4>different classes. They measure the performance on each one and

1251
01:11:53.279 --> 01:11:56.239
<v Speaker 4>then they do an intersection over union on them, and

1252
01:11:56.279 --> 01:11:58.199
<v Speaker 4>then they average it, and that's what this table

1253
01:11:58.279 --> 01:12:01.039
<v Speaker 4>is saying. So what you're getting here is forty six

1254
01:12:01.039 --> 01:12:07.560
<v Speaker 4>point eighty two percent accuracy on mIoU from Waymo at

1255
01:12:07.600 --> 01:12:11.479
<v Speaker 4>this one. And this table they presented was on

1256
01:12:11.680 --> 01:12:16.680
<v Speaker 4>the last page of their paper here and it was

1257
01:12:16.720 --> 01:12:20.800
<v Speaker 4>not referenced in the paper, and this table was hidden

1258
01:12:20.880 --> 01:12:25.880
<v Speaker 4>at the very end, not talked about. Everything else on

1259
01:12:25.920 --> 01:12:30.680
<v Speaker 4>this slide is what Waymo's actually doing. This table is

1260
01:12:30.760 --> 01:12:34.760
<v Speaker 4>industry standard. Everything else is not. This is what Waymo's doing.

1261
01:12:34.880 --> 01:12:39.319
<v Speaker 4>So here we go. What you need to know is this.

1262
01:12:39.760 --> 01:12:43.560
<v Speaker 4>Look at this line that says SWFormer-3f (ours),

1263
01:12:43.680 --> 01:12:46.960
<v Speaker 4>referring to Waymo, and then also down here

1264
01:12:47.119 --> 01:12:50.319
<v Speaker 4>SWFormer-3f (ours). These two lines are the lines you

1265
01:12:50.359 --> 01:12:53.600
<v Speaker 4>need to look at. What you'll see is they're showing

1266
01:12:53.640 --> 01:12:58.399
<v Speaker 4>the performance using this metric, AP/APH, on pedestrians, for example.

1267
01:13:00.359 --> 01:13:06.119
<v Speaker 4>Whatever the hell this AP/APH metric is, it's not mIoU, and

1268
01:13:06.199 --> 01:13:08.840
<v Speaker 4>I have never seen this metric before, so I had

1269
01:13:08.840 --> 01:13:11.720
<v Speaker 4>to go find it. What the hell is this AP/APH

1270
01:13:11.800 --> 01:13:15.560
<v Speaker 4>thing they're showing? Whatever this metric is, AP/APH,

1271
01:13:15.640 --> 01:13:18.279
<v Speaker 4>whatever the hell that means, this thing is showing an

1272
01:13:18.319 --> 01:13:23.359
<v Speaker 4>eighty two point nine percent accuracy rate up here or

1273
01:13:23.399 --> 01:13:25.920
<v Speaker 4>maybe down here, like eighty two point one three on

1274
01:13:25.960 --> 01:13:30.159
<v Speaker 4>the pedestrian L1, whatever, AP and APH. So

1275
01:13:30.239 --> 01:13:32.039
<v Speaker 4>I went and looked for where the hell did this

1276
01:13:32.079 --> 01:13:35.279
<v Speaker 4>come from? And that's what's on the right. And it

1277
01:13:35.359 --> 01:13:41.920
<v Speaker 4>turns out this metric, AP/APH, was also made by

1278
01:13:41.960 --> 01:13:44.600
<v Speaker 4>Waymo in a paper from twenty nineteen, which I'm showing

1279
01:13:44.640 --> 01:13:49.119
<v Speaker 4>here on the right side. And what this effectively does

1280
01:13:49.600 --> 01:13:52.359
<v Speaker 4>this is the formula for it on the right side. Whoops.

1281
01:13:52.920 --> 01:13:56.640
<v Speaker 4>What this does is it takes this data on the left,

1282
01:13:57.039 --> 01:14:00.640
<v Speaker 4>runs it through this magic formula on the right, and

1283
01:14:00.760 --> 01:14:05.800
<v Speaker 4>outputs these results. That's literally what it does. I'm not exaggerating.

1284
01:14:05.800 --> 01:14:08.800
<v Speaker 4>That's what it does. It takes this data, runs it

1285
01:14:08.840 --> 01:14:13.399
<v Speaker 4>through that formula, manufactures this data, and as you can see,

1286
01:14:13.720 --> 01:14:16.880
<v Speaker 4>the performance that they're showing off of this metric, which

1287
01:14:16.880 --> 01:14:19.479
<v Speaker 4>is what they're using for their safety reports and what

1288
01:14:19.520 --> 01:14:22.079
<v Speaker 4>they published in their safety data in twenty twenty three

1289
01:14:22.119 --> 01:14:27.439
<v Speaker 4>and twenty twenty four. This data here is almost double

1290
01:14:27.520 --> 01:14:30.640
<v Speaker 4>the performance of the industry standard, and they totally ignored

1291
01:14:30.680 --> 01:14:34.399
<v Speaker 4>the industry standard: forty six percent versus eighty two point

1292
01:14:34.479 --> 01:14:37.479
<v Speaker 4>nine percent or whatever on a metric that they invented.

1293
01:14:38.119 --> 01:14:41.359
<v Speaker 4>So let me make it clear here. Waymo invented the

1294
01:14:41.439 --> 01:14:45.520
<v Speaker 4>metric that nobody in the world uses, and then they

1295
01:14:45.680 --> 01:14:50.039
<v Speaker 4>use that metric for their safety data, and that metric

1296
01:14:50.079 --> 01:14:53.239
<v Speaker 4>that they invented is showing a performance that is nearly

1297
01:14:53.399 --> 01:14:57.159
<v Speaker 4>double the actual performance that is known as industry standard

1298
01:14:57.479 --> 01:15:02.880
<v Speaker 4>around the world. So this is important to know. Now,

1299
01:15:03.279 --> 01:15:05.159
<v Speaker 4>one other thing that I was saying earlier in the

1300
01:15:05.199 --> 01:15:08.560
<v Speaker 4>previous slide was the degradation of lidar over range.

1301
01:15:09.399 --> 01:15:13.760
<v Speaker 4>So when you're getting these point cloud returns, one of

1302
01:15:13.760 --> 01:15:17.399
<v Speaker 4>the things that happens is that the number of points

1303
01:15:17.439 --> 01:15:21.520
<v Speaker 4>that you get from the lidar system, the farther

1304
01:15:21.720 --> 01:15:25.159
<v Speaker 4>you go out is lower. That's why the quality goes down.

1305
01:15:25.760 --> 01:15:28.279
<v Speaker 4>So when I was saying fifty meters, one hundred meters,

1306
01:15:28.279 --> 01:15:31.439
<v Speaker 4>one hundred and fifty meters, the argoverse results were showing

1307
01:15:31.560 --> 01:15:34.600
<v Speaker 4>degradation at a log two rate, halving every fifty meters.

1308
01:15:35.920 --> 01:15:39.159
<v Speaker 4>This thing that I just described is called sparsity.

1309
01:15:39.560 --> 01:15:43.399
<v Speaker 4>In AI terms, this is referred to as sparsity. How

1310
01:15:43.560 --> 01:15:48.000
<v Speaker 4>sparse are the points? You want to have higher resolution data?

1311
01:15:48.439 --> 01:15:54.039
<v Speaker 4>More points? Well, these results were showing degradation at range,

1312
01:15:54.439 --> 01:15:58.279
<v Speaker 4>and I was also stating that it seems like research

1313
01:15:58.359 --> 01:16:01.920
<v Speaker 4>on this specific topic about degradation over range seems to

1314
01:16:01.960 --> 01:16:06.279
<v Speaker 4>be heavily suppressed, because I just can't find anyone

1315
01:16:06.520 --> 01:16:10.239
<v Speaker 4>really doing any serious research on it. And it's not,

1316
01:16:10.319 --> 01:16:12.840
<v Speaker 4>in my opinion, it's impossible that nobody thought about that.

1317
01:16:13.039 --> 01:16:16.960
<v Speaker 4>For how prevalent and common, and how many businesses, major

1318
01:16:17.039 --> 01:16:21.239
<v Speaker 4>multi-billion-dollar businesses have been using lidar and failed,

1319
01:16:21.680 --> 01:16:25.680
<v Speaker 4>the graveyard of businesses that have tried to make lidar

1320
01:16:25.760 --> 01:16:30.199
<v Speaker 4>work and failed, there is no possible way that nobody

1321
01:16:30.239 --> 01:16:34.640
<v Speaker 4>thought about degradation of lidar over range. I promise you

1322
01:16:35.119 --> 01:16:38.079
<v Speaker 4>this is like from an engineering perspective, you have to

1323
01:16:38.119 --> 01:16:40.920
<v Speaker 4>be a complete idiot

1324
01:16:40.920 --> 01:16:44.199
<v Speaker 4>to not think about that. This is like even in cameras,

1325
01:16:44.239 --> 01:16:46.439
<v Speaker 4>you think about like video games, you know about like

1326
01:16:46.840 --> 01:16:49.359
<v Speaker 4>how you have lower quality pixels the further out you go

1327
01:16:49.520 --> 01:16:51.479
<v Speaker 4>in order to save memory. You have to understand

1328
01:16:51.520 --> 01:16:54.880
<v Speaker 4>this stuff. It's like at an elementary level. There's no

1329
01:16:55.039 --> 01:16:58.199
<v Speaker 4>possible way people didn't think about that. So why is

1330
01:16:58.239 --> 01:17:00.640
<v Speaker 4>that important? I want to read this thing that came

1331
01:17:00.640 --> 01:17:03.760
<v Speaker 4>from the paper. Some of the latest commercial lidars

1332
01:17:03.800 --> 01:17:06.079
<v Speaker 4>can sense up to two hundred and fifty to three

1333
01:17:06.159 --> 01:17:09.960
<v Speaker 4>hundred meters in all directions around the vehicle, leading to

1334
01:17:10.039 --> 01:17:13.479
<v Speaker 4>a large range of point clouds. Okay, so they're fully

1335
01:17:13.520 --> 01:17:16.119
<v Speaker 4>aware of how point clouds work, and they're talking

1336
01:17:16.119 --> 01:17:18.760
<v Speaker 4>about how long range lidar can go up to this range.

1337
01:17:19.000 --> 01:17:21.000
<v Speaker 4>But notice they don't tell you how sparse it is

1338
01:17:21.039 --> 01:17:25.000
<v Speaker 4>at that point. So let's go to this bottom right here,

1339
01:17:25.479 --> 01:17:29.800
<v Speaker 4>I'm going to read this. Our experiments are primarily based

1340
01:17:30.239 --> 01:17:33.600
<v Speaker 4>on the challenging Waymo Open Dataset. Okay, one thing

1341
01:17:33.680 --> 01:17:36.399
<v Speaker 4>to know about this Waymo Open Dataset. At the

1342
01:17:36.439 --> 01:17:39.359
<v Speaker 4>time that I published this, which

1343
01:17:39.399 --> 01:17:43.159
<v Speaker 4>is November twenty twenty three, at the time I was

1344
01:17:43.199 --> 01:17:47.000
<v Speaker 4>publishing this, which is four or five months after CVPR.

1345
01:17:47.800 --> 01:17:52.079
<v Speaker 4>This CVPR, they had a total of six submissions to

1346
01:17:52.239 --> 01:17:57.720
<v Speaker 4>the Waymo Open Dataset. By comparison, Nvidia,

1347
01:17:57.760 --> 01:18:03.319
<v Speaker 4>on their dataset, had more than four hundred submissions

1348
01:18:04.880 --> 01:18:08.000
<v Speaker 4>and I'm bringing that up because the next line they

1349
01:18:08.039 --> 01:18:12.279
<v Speaker 4>say is which has been adopted in many recent state

1350
01:18:12.359 --> 01:18:18.000
<v Speaker 4>of the art three D detection methods. As far as

1351
01:18:18.079 --> 01:18:21.239
<v Speaker 4>I'm aware, basically nobody in the world uses the Waymo

1352
01:18:21.359 --> 01:18:25.079
<v Speaker 4>Open Dataset other than Waymo, and even the submissions,

1353
01:18:25.159 --> 01:18:28.159
<v Speaker 4>the six submissions that I saw in November twenty twenty three,

1354
01:18:28.640 --> 01:18:31.680
<v Speaker 4>even those submissions were from nobodies. I've never heard of

1355
01:18:31.720 --> 01:18:34.680
<v Speaker 4>any of them. But on the Nvidia dataset,

1356
01:18:35.079 --> 01:18:40.039
<v Speaker 4>with more than four hundred submissions, every

1357
01:18:40.039 --> 01:18:43.800
<v Speaker 4>major university that competes in computer vision has competed there,

1358
01:18:43.840 --> 01:18:50.279
<v Speaker 4>including Nvidia and all other major computer vision participants,

1359
01:18:50.640 --> 01:18:55.199
<v Speaker 4>it's like the standard. Effectively, the dataset contains one thousand one

1360
01:18:55.680 --> 01:18:58.159
<v Speaker 4>hundred and fifty scenes, split into seven hundred and ninety

1361
01:18:58.199 --> 01:19:00.800
<v Speaker 4>eight training, two hundred and two validation, and one hundred and fifty testing. Each

1362
01:19:00.960 --> 01:19:05.079
<v Speaker 4>scene has about two hundred frames, where each frame captures

1363
01:19:05.079 --> 01:19:07.840
<v Speaker 4>the full three hundred and sixty degrees around the ego vehicle.

1364
01:19:08.279 --> 01:19:12.800
<v Speaker 4>The dataset has one long range lidar with the

1365
01:19:12.920 --> 01:19:18.720
<v Speaker 4>range capped at seventy five meters, four near range lidars,

1366
01:19:18.760 --> 01:19:24.600
<v Speaker 4>and five cameras, so it doesn't matter that the long

1367
01:19:24.680 --> 01:19:29.119
<v Speaker 4>range lidar can sense up to two hundred and fifty

1368
01:19:29.119 --> 01:19:31.960
<v Speaker 4>to three hundred meters. They're capping it at seventy five.

1369
01:19:32.720 --> 01:19:35.800
<v Speaker 4>And this helps support what I was saying earlier and

1370
01:19:35.840 --> 01:19:39.000
<v Speaker 4>what Argoverse results were showing at the same timeframe as this,

1371
01:19:39.680 --> 01:19:43.000
<v Speaker 4>that lidar degrades over range. And notice the seventy five

1372
01:19:43.039 --> 01:19:46.079
<v Speaker 4>meter cap, just a little beyond the fifty meter degradation.

1373
01:19:46.960 --> 01:19:49.319
<v Speaker 1>How much would you bet that that seventy five

1374
01:19:49.399 --> 01:19:52.079
<v Speaker 1>meter cap that they're giving it is probably a big

1375
01:19:52.119 --> 01:19:55.600
<v Speaker 1>overestimation as well?

1376
01:19:55.720 --> 01:19:59.960
<v Speaker 4>Yeah, I'm pretty sure. Yeah, you're going to see some stuff.

1377
01:20:02.560 --> 01:20:05.760
<v Speaker 4>I'm just being a little I'm not trying to be.

1378
01:20:08.760 --> 01:20:11.920
<v Speaker 1>Yeah, they're overestimating, even at that seventy five meters. They

1379
01:20:11.920 --> 01:20:13.399
<v Speaker 1>would have gone higher if they could have.

1380
01:20:15.359 --> 01:20:17.279
<v Speaker 4>Yeah, they would have gone higher. I mean, the

1381
01:20:17.319 --> 01:20:19.359
<v Speaker 4>seventy five meters at sixty miles an hour gives you

1382
01:20:19.439 --> 01:20:22.079
<v Speaker 4>less than three seconds of forecasting time, you know, So

1383
01:20:22.199 --> 01:20:29.159
<v Speaker 4>that's still not much better, you know. So let me

1384
01:20:29.279 --> 01:20:31.479
<v Speaker 4>talk a little bit. I mean, I'll skip all this

1385
01:20:31.520 --> 01:20:34.119
<v Speaker 4>technical stuff, but if anyone cares and wants to know

1386
01:20:34.159 --> 01:20:36.199
<v Speaker 4>about the architecture and how it really works for the

1387
01:20:36.239 --> 01:20:39.039
<v Speaker 4>camera based system, this is actually a really useful slide.

1388
01:20:39.039 --> 01:20:41.319
<v Speaker 4>It pretty much lays out the actual architecture if anyone

1389
01:20:41.359 --> 01:20:44.119
<v Speaker 4>actually cares. But I'm going to try and skip all

1390
01:20:44.159 --> 01:20:49.119
<v Speaker 4>the really technical stuff. I do want to just mention

1391
01:20:49.560 --> 01:20:53.039
<v Speaker 4>something from this paper, which was a big deal. This

1392
01:20:53.159 --> 01:20:55.079
<v Speaker 4>is called OneFormer. This was a big deal in

1393
01:20:55.119 --> 01:20:59.079
<v Speaker 4>computer vision and all this stuff. The element here that

1394
01:20:59.119 --> 01:21:03.279
<v Speaker 4>I'm just going to briefly mention is that, as you would expect,

1395
01:21:03.840 --> 01:21:10.920
<v Speaker 4>it turns out that vision also improves large language models' semantics,

1396
01:21:10.960 --> 01:21:13.520
<v Speaker 4>and it's actually a two way road. They kind of

1397
01:21:13.520 --> 01:21:17.640
<v Speaker 4>go hand in hand. But the point is that vision

1398
01:21:17.800 --> 01:21:21.279
<v Speaker 4>is an augment to an LLM. So another way of

1399
01:21:21.319 --> 01:21:24.399
<v Speaker 4>saying that is being able to see a red wagon

1400
01:21:25.159 --> 01:21:29.119
<v Speaker 4>allows you to be able to describe a red wagon

1401
01:21:29.199 --> 01:21:32.920
<v Speaker 4>in language better. And this OneFormer is kind of

1402
01:21:32.920 --> 01:21:38.119
<v Speaker 4>showing all that. Let me skip some of this stuff.

1403
01:21:38.680 --> 01:21:42.159
<v Speaker 4>This is technical stuff. It's kind of cool, shows some

1404
01:21:42.319 --> 01:21:45.600
<v Speaker 4>really interesting things, but I kind of want to get

1405
01:21:45.640 --> 01:21:48.760
<v Speaker 4>away from it. Segment anything for anyone in computer vision

1406
01:21:48.840 --> 01:21:52.760
<v Speaker 4>is totally worth knowing. This was made

1407
01:21:52.800 --> 01:21:56.039
<v Speaker 4>by Meta and when it came out, it was a

1408
01:21:56.039 --> 01:22:02.199
<v Speaker 4>big deal because it did object classification pretty

1409
01:22:02.279 --> 01:22:07.439
<v Speaker 4>robustly across a wide variety of topics, not just AVs,

1410
01:22:06.880 --> 01:22:11.760
<v Speaker 4>like not just useful in the AV industry, but

1411
01:22:11.840 --> 01:22:15.840
<v Speaker 4>useful across a variety of tasks, and it was a

1412
01:22:15.840 --> 01:22:20.439
<v Speaker 4>big deal. So this is where we'll get into the

1413
01:22:20.600 --> 01:22:25.399
<v Speaker 4>argument for depth and why lidars were adopted. So again,

1414
01:22:25.880 --> 01:22:32.199
<v Speaker 4>the primary and significant function that lidar provides AVs is

1415
01:22:32.239 --> 01:22:36.920
<v Speaker 4>a depth calculation. And the reason why this is completely

1416
01:22:37.039 --> 01:22:40.279
<v Speaker 4>useless is that, number one, there's no reason to have super

1417
01:22:40.319 --> 01:22:43.199
<v Speaker 4>precise depth. There's no reason to have laser precision

1418
01:22:43.319 --> 01:22:49.319
<v Speaker 4>on depth. The argument went like this, the major problem

1419
01:22:49.520 --> 01:22:56.840
<v Speaker 4>in AVs is collision, and because lidar does accurate depth perception,

1420
01:22:57.399 --> 01:23:01.279
<v Speaker 4>highly accurate, this is how you avoid collision. This was

1421
01:23:01.319 --> 01:23:04.640
<v Speaker 4>the crux and fundamental purpose of the argument for lidar.

1422
01:23:06.119 --> 01:23:08.479
<v Speaker 4>But if you think about your human eyes and how

1423
01:23:08.520 --> 01:23:11.479
<v Speaker 4>you do it in real life, you're not trying to

1424
01:23:11.600 --> 01:23:14.960
<v Speaker 4>calculate the distance between you and an object in front

1425
01:23:14.960 --> 01:23:18.600
<v Speaker 4>of you. The only thing you care about when you're

1426
01:23:18.680 --> 01:23:21.279
<v Speaker 4>out in your everyday real life is that this

1427
01:23:21.439 --> 01:23:25.880
<v Speaker 4>object is in front of that object, but behind that object.

1428
01:23:26.279 --> 01:23:30.800
<v Speaker 4>This is called relative distance. So this is how the

1429
01:23:30.840 --> 01:23:35.239
<v Speaker 4>brain does it, and this is actually the better way

1430
01:23:35.319 --> 01:23:38.319
<v Speaker 4>of doing it. It's way more efficient. So one of

1431
01:23:38.359 --> 01:23:42.760
<v Speaker 4>the reasons why we have binocular vision, two eyes, is

1432
01:23:42.760 --> 01:23:46.960
<v Speaker 4>because having two eyes, aka two cameras, is how

1433
01:23:47.159 --> 01:23:51.479
<v Speaker 4>our brain does depth perception on a relative basis. When

1434
01:23:51.520 --> 01:23:54.680
<v Speaker 4>you have two fixed points that you're using as a

1435
01:23:54.720 --> 01:24:00.199
<v Speaker 4>camera with known dimensions, you can create triangles off of this,

1436
01:24:00.279 --> 01:24:03.960
<v Speaker 4>and then everything becomes geometry, and that's it. You don't

1437
01:24:04.000 --> 01:24:06.479
<v Speaker 4>need the actual real

1438
01:24:06.840 --> 01:24:09.520
<v Speaker 4>precise numbers. They don't even matter, as long

1439
01:24:09.520 --> 01:24:12.960
<v Speaker 4>as they're relatively correct. It doesn't actually matter what the

1440
01:24:13.119 --> 01:24:15.079
<v Speaker 4>numbers are. You can even make up numbers as you go.

1441
01:24:15.439 --> 01:24:17.640
<v Speaker 4>The only thing that matters is relative depth.

1442
01:24:18.520 --> 01:24:22.359
<v Speaker 1>Now, insects, look at the eyes on insects. It's just

1443
01:24:22.840 --> 01:24:25.680
<v Speaker 1>massive eyes, just over and over again, just so that

1444
01:24:25.720 --> 01:24:28.520
<v Speaker 1>it has perfect, you know, it's got depth perception

1445
01:24:28.600 --> 01:24:32.920
<v Speaker 1>and everything else. Yeah, all, you know, wired together

1446
01:24:32.960 --> 01:24:33.600
<v Speaker 1>with those eyes.

1447
01:24:33.840 --> 01:24:37.279
<v Speaker 4>And you'll even notice that the eyes on, say, a fly,

1448
01:24:37.479 --> 01:24:41.760
<v Speaker 4>for example: they look complicated, but they're actually

1449
01:24:41.880 --> 01:24:44.359
<v Speaker 4>just repeating the same thing over and over again, one

1450
01:24:44.840 --> 01:24:48.039
<v Speaker 4>fundamental concept. And what a fly

1451
01:24:48.600 --> 01:24:52.359
<v Speaker 4>cares about the most is motion detection. And that's why

1452
01:24:52.359 --> 01:24:54.279
<v Speaker 4>the eyes bulge out and they can see behind them

1453
01:24:54.279 --> 01:24:56.840
<v Speaker 4>and all around. That's why the globe shape. And they're

1454
01:24:56.840 --> 01:25:02.880
<v Speaker 4>doing object detection, sorry, motion detection. And that's the

1455
01:25:02.880 --> 01:25:05.319
<v Speaker 4>fundamental reason why it is. And you can even get

1456
01:25:05.560 --> 01:25:09.800
<v Speaker 4>hints of evolutionary purpose evolutionary progress. Sorry, you can get

1457
01:25:09.840 --> 01:25:12.600
<v Speaker 4>hints of evolutionary progress just by looking at the eye

1458
01:25:12.600 --> 01:25:15.760
<v Speaker 4>throughout all the species. The more evolved the eye, probably

1459
01:25:15.760 --> 01:25:18.760
<v Speaker 4>the more intelligent the species. If all the eye

1460
01:25:18.800 --> 01:25:21.159
<v Speaker 4>of some animal or whatever it

1461
01:25:21.239 --> 01:25:24.960
<v Speaker 4>can do is object or motion detection, then

1462
01:25:25.000 --> 01:25:27.720
<v Speaker 4>it's probably a primitive eye that probably has gray cylinders

1463
01:25:27.800 --> 01:25:31.960
<v Speaker 4>or something equivalent, right, grayscale. But the more evolved

1464
01:25:31.960 --> 01:25:34.760
<v Speaker 4>the eye is, with more color perception and more quality,

1465
01:25:35.399 --> 01:25:38.880
<v Speaker 4>then this is probably a more intelligent species. And one

1466
01:25:38.920 --> 01:25:40.720
<v Speaker 4>of the things that I kind of I'm not going

1467
01:25:40.800 --> 01:25:42.359
<v Speaker 4>to hit too hard on this, but one of the

1468
01:25:42.399 --> 01:25:44.840
<v Speaker 4>things I want to demonstrate is that consciousness is a gradient.

1469
01:25:45.479 --> 01:25:47.520
<v Speaker 4>One of the things is that there's not a real

1470
01:25:47.880 --> 01:25:50.960
<v Speaker 4>definition of consciousness. And why is it difficult to measure?

1471
01:25:51.000 --> 01:25:54.760
<v Speaker 4>It's because consciousness is a gradient. It's not one. It's

1472
01:25:54.800 --> 01:25:57.239
<v Speaker 4>not like you just suddenly have a certain level that

1473
01:25:57.279 --> 01:25:59.000
<v Speaker 4>you hit and then all of a sudden you're conscious.

1474
01:25:59.520 --> 01:26:03.680
<v Speaker 4>Consciousness is a gradient, and it goes from simple to complex,

1475
01:26:04.079 --> 01:26:07.000
<v Speaker 4>and it can get even more complex than we even are.

1476
01:26:08.800 --> 01:26:15.239
<v Speaker 4>So I want to get to the real fun stuff.

1477
01:26:15.680 --> 01:26:20.760
<v Speaker 4>So I was just talking about depth perception. So and

1478
01:26:21.079 --> 01:26:25.880
<v Speaker 4>earlier I had mentioned that bounding boxes are a cheat. Well,

1479
01:26:26.279 --> 01:26:31.359
<v Speaker 4>this is the solution to those problems. Depth

1480
01:26:31.399 --> 01:26:36.119
<v Speaker 4>perception and bounding boxes are both solved by this. So

1481
01:26:36.359 --> 01:26:41.199
<v Speaker 4>in the AV industry, it has been completely adopted, including Waymo,

1482
01:26:41.439 --> 01:26:46.840
<v Speaker 4>every single major company or philosophy. It doesn't matter what

1483
01:26:46.920 --> 01:26:51.560
<v Speaker 4>your belief system is. Every participant in the AV industry

1484
01:26:51.800 --> 01:26:57.359
<v Speaker 4>has adopted this thing called an occupancy network. And this

1485
01:26:57.439 --> 01:27:00.800
<v Speaker 4>is one of two AI concepts that I'm going to

1486
01:27:00.800 --> 01:27:04.039
<v Speaker 4>be demonstrating. This is when earlier I was saying I

1487
01:27:04.079 --> 01:27:07.000
<v Speaker 4>was hoping to teach people what real AI is and

1488
01:27:07.039 --> 01:27:10.880
<v Speaker 4>distinguish that from fake AI. This is one of

1489
01:27:10.920 --> 01:27:13.640
<v Speaker 4>the two things you need to know in order to

1490
01:27:13.760 --> 01:27:17.199
<v Speaker 4>understand what is real AI and what is not. An

1491
01:27:17.239 --> 01:27:20.199
<v Speaker 4>occupancy network is one, and the next thing I'll talk

1492
01:27:20.199 --> 01:27:23.760
<v Speaker 4>about is something called NeRF. These two concepts you must

1493
01:27:23.800 --> 01:27:26.800
<v Speaker 4>know if you want to understand anything about AI going forward.

1494
01:27:26.840 --> 01:27:30.680
<v Speaker 4>And this is what distinguishes real AI from all the

1495
01:27:30.680 --> 01:27:35.119
<v Speaker 4>other junk. So an occupancy network, what it is is

1496
01:27:35.159 --> 01:27:39.560
<v Speaker 4>it's a grid pattern that is made around the vehicle itself,

1497
01:27:40.399 --> 01:27:43.880
<v Speaker 4>and this grid pattern is filled with these Minecraft looking

1498
01:27:43.960 --> 01:27:48.319
<v Speaker 4>boxes all around it. And these Minecraft looking boxes, these

1499
01:27:48.359 --> 01:27:53.159
<v Speaker 4>boxes are called voxels, and these voxels are

1500
01:27:53.239 --> 01:27:57.199
<v Speaker 4>heat mapped all around the vehicle. So it's basically a grid.

1501
01:27:57.479 --> 01:27:59.479
<v Speaker 4>And then you put objects on this grid and it

1502
01:27:59.520 --> 01:28:02.239
<v Speaker 4>doesn't matter the exact precise distance as long as you

1503
01:28:02.279 --> 01:28:06.239
<v Speaker 4>get it pixel correct. Whatever your pixel resolution is will

1504
01:28:06.279 --> 01:28:10.560
<v Speaker 4>be the resolution of your occupancy network and these voxels.

1505
01:28:10.760 --> 01:28:18.039
<v Speaker 4>What is really simple and powerful about an occupancy network

1506
01:28:18.079 --> 01:28:21.680
<v Speaker 4>and why it was adopted by the industry is because

1507
01:28:21.760 --> 01:28:25.600
<v Speaker 4>it gives you a simple mechanism, a zero and one

1508
01:28:25.960 --> 01:28:30.920
<v Speaker 4>binary mechanism for drive and not drive. So you can

1509
01:28:30.960 --> 01:28:34.039
<v Speaker 4>see here in blue, this blue area is where you

1510
01:28:34.079 --> 01:28:38.680
<v Speaker 4>can drive, and then everything else is no drive, don't drive.

1511
01:28:39.560 --> 01:28:42.199
<v Speaker 1>If you're gonna relate this to a sense,

1512
01:28:42.399 --> 01:28:45.600
<v Speaker 1>it would have to be touch, right, because each and

1513
01:28:45.640 --> 01:28:48.640
<v Speaker 1>every one of these things has different bumps and different layers,

1514
01:28:49.119 --> 01:28:52.279
<v Speaker 1>so that as you're reaching out, you're touching these boxes,

1515
01:28:52.399 --> 01:28:55.800
<v Speaker 1>which is a generalized form, and you're getting the feel

1516
01:28:55.920 --> 01:28:58.720
<v Speaker 1>for it how far or close it is. So it's

1517
01:28:58.760 --> 01:29:02.119
<v Speaker 1>a very tactile sense with this form right here.

1518
01:29:02.279 --> 01:29:05.960
<v Speaker 4>Good. So you're absolutely right, and so eventually I'm going

1519
01:29:06.000 --> 01:29:08.119
<v Speaker 4>to show the evolution of this.

1520
01:29:08.159 --> 01:29:10.000
<v Speaker 4>This is an old version. I know that says twenty

1521
01:29:10.039 --> 01:29:12.760
<v Speaker 4>twenty three, but this is actually an old version of

1522
01:29:12.800 --> 01:29:15.800
<v Speaker 4>Tesla's occupancy network. I think it is. Actually this is

1523
01:29:15.800 --> 01:29:18.840
<v Speaker 4>the twenty twenty version, if I recall correctly. So this

1524
01:29:18.960 --> 01:29:21.159
<v Speaker 4>is an old version, and I'm just showing this to

1525
01:29:21.199 --> 01:29:24.520
<v Speaker 4>show you the evolution of what happened in the industry with

1526
01:29:25.000 --> 01:29:29.560
<v Speaker 4>the occupancy network, because an occupancy network is fundamentally important,

1527
01:29:29.720 --> 01:29:32.079
<v Speaker 4>and you're right. You are going to get the texture

1528
01:29:32.279 --> 01:29:35.760
<v Speaker 4>quality as a result of this, and there's

1529
01:29:35.880 --> 01:29:39.600
<v Speaker 4>actually a number of other benefits. But first and foremost,

1530
01:29:40.119 --> 01:29:43.079
<v Speaker 4>the thing to understand here is that the reason why

1531
01:29:43.119 --> 01:29:46.920
<v Speaker 4>this was adopted into the AV industry, no matter your philosophy,

1532
01:29:47.000 --> 01:29:50.159
<v Speaker 4>whether you're using lidar or not, it doesn't matter, you're

1533
01:29:50.199 --> 01:29:53.479
<v Speaker 4>still going to be doing this, is because it provided

1534
01:29:53.680 --> 01:30:01.920
<v Speaker 4>an automatic implicit collision avoidance system. Through software alone,

1535
01:30:02.479 --> 01:30:08.960
<v Speaker 4>this occupancy network automatically created an implicit collision avoidance system,

1536
01:30:09.319 --> 01:30:13.520
<v Speaker 4>and that alone was what

1537
01:30:13.720 --> 01:30:18.279
<v Speaker 4>made it attractive to put into AVs. Now, there

1538
01:30:18.279 --> 01:30:20.239
<v Speaker 4>are many other benefits that I'm going to talk about

1539
01:30:20.239 --> 01:30:24.119
<v Speaker 4>that ended up happening, and this evolved even further. But

1540
01:30:24.520 --> 01:30:27.920
<v Speaker 4>it's important to understand here that, basically, think of Minecraft. You

1541
01:30:27.960 --> 01:30:30.520
<v Speaker 4>have a zero one mechanism for where you can drive

1542
01:30:30.560 --> 01:30:34.319
<v Speaker 4>and where not to drive. And in the you know,

1543
01:30:34.640 --> 01:30:37.520
<v Speaker 4>in the zero part, the part you don't drive on, these

1544
01:30:37.520 --> 01:30:40.560
<v Speaker 4>little boxes. What they've done is they've added in a

1545
01:30:40.600 --> 01:30:44.000
<v Speaker 4>bunch of little data points, so they add further data

1546
01:30:44.039 --> 01:30:47.000
<v Speaker 4>into the not drive part. And then the drive part

1547
01:30:47.039 --> 01:30:50.159
<v Speaker 4>is basically the road. That's it, just the road, and

1548
01:30:50.720 --> 01:30:56.680
<v Speaker 4>that's simple, binary and super powerful. And it turns out

1549
01:30:56.800 --> 01:31:00.720
<v Speaker 4>and this is the solution to the bounding box problem.

1550
01:31:01.039 --> 01:31:03.720
<v Speaker 4>And I'm going to get into that. So the bounding

1551
01:31:03.720 --> 01:31:06.720
<v Speaker 4>box problem I was saying was a cheat.

1552
01:31:07.239 --> 01:31:10.840
<v Speaker 4>Oh shit, I'm sorry, I need to talk about NeRFs first.

1553
01:31:12.119 --> 01:31:14.279
<v Speaker 4>Pause on the bounding box problem. I need to talk

1554
01:31:14.279 --> 01:31:16.560
<v Speaker 4>about this first. I got mixed up. So the

1555
01:31:16.840 --> 01:31:18.800
<v Speaker 4>I was saying, there are two things that we need

1556
01:31:18.840 --> 01:31:23.279
<v Speaker 4>to know in real AI. The occupancy network is one,

1557
01:31:24.000 --> 01:31:25.720
<v Speaker 4>and then the second thing you need to know. And

1558
01:31:25.760 --> 01:31:27.760
<v Speaker 4>this is as technical as I'm going to get, is

1559
01:31:28.000 --> 01:31:34.600
<v Speaker 4>NeRFs. Now, NeRF, N-E-R-F, NeRF. This thing was invented.

1560
01:31:34.680 --> 01:31:37.840
<v Speaker 4>This is a technology that was invented in twenty nineteen,

1561
01:31:38.159 --> 01:31:42.840
<v Speaker 4>I think or twenty eighteen, twenty nineteen, and this stands

1562
01:31:42.920 --> 01:31:47.199
<v Speaker 4>for neural radiance field. And now, if you play video games,

1563
01:31:47.199 --> 01:31:50.039
<v Speaker 4>you're going to have an advantage in understanding all this because

1564
01:31:50.079 --> 01:31:54.399
<v Speaker 4>a neural radiance field is very similar in concept to

1565
01:31:54.640 --> 01:31:58.720
<v Speaker 4>ray tracing in a video game. And so what a radiance

1566
01:31:58.760 --> 01:32:04.800
<v Speaker 4>field is is that it's the ray from a pixel

1567
01:32:05.119 --> 01:32:10.920
<v Speaker 4>on a frame. It's the ray vector that is pretty

1568
01:32:10.960 --> 01:32:13.800
<v Speaker 4>much identical to what ray tracing is in video games,

1569
01:32:14.119 --> 01:32:19.640
<v Speaker 4>and it's each pixel doing that ray tracing. And that's

1570
01:32:19.640 --> 01:32:22.439
<v Speaker 4>what a NERF is. That's originally what a NERF was.

1571
01:32:22.800 --> 01:32:27.520
<v Speaker 4>So NERF started out in twenty nineteen as a new technology,

1572
01:32:27.840 --> 01:32:31.199
<v Speaker 4>and what happened is this thing that's called a radiance field,

1573
01:32:31.199 --> 01:32:36.119
<v Speaker 4>this NERF thing, it evolved into a whole branch of technologies. Now,

1574
01:32:36.159 --> 01:32:40.359
<v Speaker 4>this whole branch of technologies in AI is just colloquially

1575
01:32:40.680 --> 01:32:45.600
<v Speaker 4>collectively referred to as NERF.

1576
01:32:45.760 --> 01:32:48.960
<v Speaker 4>Since then, they have made like thousands and thousands of

1577
01:32:48.960 --> 01:32:53.640
<v Speaker 4>different evolutions of this thing called NERF. Now, earlier I

1578
01:32:53.720 --> 01:32:57.039
<v Speaker 4>was showing that the occupancy network is these, like, Minecraft-looking

1579
01:32:57.079 --> 01:33:02.560
<v Speaker 4>boxes, voxels. What happened over time is that those voxels

1580
01:33:03.119 --> 01:33:08.800
<v Speaker 4>reduced in size over time and eventually became single points.

1581
01:33:09.800 --> 01:33:12.279
<v Speaker 4>And what you were able to do with that is

1582
01:33:12.319 --> 01:33:15.760
<v Speaker 4>you were able to make an occupancy network and the

1583
01:33:15.920 --> 01:33:21.359
<v Speaker 4>NERF effectively the same thing. So occupancy networks and NERFs

1584
01:33:21.720 --> 01:33:26.600
<v Speaker 4>initially started out as two independent, completely different things, and

1585
01:33:26.760 --> 01:33:32.720
<v Speaker 4>over time they came together and became basically the same thing.

1586
01:33:33.399 --> 01:33:36.560
<v Speaker 4>So another way of understanding this is that ray tracing

1587
01:33:36.600 --> 01:33:41.760
<v Speaker 4>in video games and occupancy networks in AVs are basically the

1588
01:33:41.800 --> 01:33:45.039
<v Speaker 4>same thing. And so this is like if you play

1589
01:33:45.079 --> 01:33:47.640
<v Speaker 4>video games, that should be nuts because that means that

1590
01:33:47.720 --> 01:33:51.279
<v Speaker 4>everything that is happening in vehicles and robotics is a

1591
01:33:51.359 --> 01:33:54.640
<v Speaker 4>video game. It is like literally it is that. It

1592
01:33:54.720 --> 01:34:00.439
<v Speaker 4>is not like that, it is that equal to that.

1593
01:34:00.880 --> 01:34:03.239
<v Speaker 4>So these two things became the same thing. This is

1594
01:34:03.239 --> 01:34:05.920
<v Speaker 4>why I'm saying you need to know these two things,

1595
01:34:05.960 --> 01:34:08.199
<v Speaker 4>not that you have to understand everything about it. You

1596
01:34:08.279 --> 01:34:11.479
<v Speaker 4>just need to know what this is because this is

1597
01:34:11.560 --> 01:34:14.159
<v Speaker 4>real AI and you're gonna see what happens with this.

1598
01:34:14.800 --> 01:34:17.880
<v Speaker 4>So I'll even see that word. So NERF is a

1599
01:34:18.039 --> 01:34:21.239
<v Speaker 4>new concept to people, I'm sure. So I'm just gonna

1600
01:34:21.239 --> 01:34:23.680
<v Speaker 4>show like a few examples so that people get an

1601
01:34:23.760 --> 01:34:29.000
<v Speaker 4>understanding of what it is. So, initially,

1602
01:34:29.119 --> 01:34:33.359
<v Speaker 4>NERF was slow, you know, like all new things. But

1603
01:34:33.479 --> 01:34:37.920
<v Speaker 4>then eventually there was this paper that was called Plenoxels,

1604
01:34:37.960 --> 01:34:39.680
<v Speaker 4>and they figured out how to do it way faster,

1605
01:34:39.840 --> 01:34:43.840
<v Speaker 4>especially with GPUs. And you'll see in this video here

1606
01:34:44.199 --> 01:34:46.239
<v Speaker 4>the difference. So the left one is like what it

1607
01:34:46.279 --> 01:34:48.199
<v Speaker 4>was originally trying to do. This is what NERF is

1608
01:34:48.199 --> 01:34:51.960
<v Speaker 4>originally trying to do. It's trying to render the object.

1609
01:34:52.359 --> 01:34:54.439
<v Speaker 4>But you can see that it was like slow and

1610
01:34:54.760 --> 01:34:57.359
<v Speaker 4>has some problems and has some difficulty. But then of

1611
01:34:57.359 --> 01:35:00.479
<v Speaker 4>course they basically sped this thing up like really fast.

1612
01:35:01.000 --> 01:35:03.039
<v Speaker 4>So what I'm going to eventually build up to is

1613
01:35:03.079 --> 01:35:04.800
<v Speaker 4>can you do this in real time? Yes, you can.

1614
01:35:04.960 --> 01:35:06.920
<v Speaker 4>And so I'm going to build up to: yes, you

1615
01:35:07.000 --> 01:35:11.000
<v Speaker 4>can render NERF, aka occupancy network, in real time. If

1616
01:35:11.039 --> 01:35:13.680
<v Speaker 4>you're an AI person, this matters to you because if

1617
01:35:13.680 --> 01:35:17.920
<v Speaker 4>you're an AV navigating through the world, it's one thing

1618
01:35:17.920 --> 01:35:19.960
<v Speaker 4>to be able to build a model of the environment around you,

1619
01:35:20.239 --> 01:35:22.239
<v Speaker 4>but can you do it in real time? Which is

1620
01:35:22.279 --> 01:35:24.600
<v Speaker 4>an even harder problem, right? You have to do it fast.

1621
01:35:25.199 --> 01:35:28.880
<v Speaker 4>So can you do radiance fields, aka occupancy networks, in

1622
01:35:28.920 --> 01:35:30.720
<v Speaker 4>real time? The answer is going to be yes, and

1623
01:35:30.720 --> 01:35:34.079
<v Speaker 4>I'm going to demonstrate that. These are the results, if

1624
01:35:34.319 --> 01:35:37.960
<v Speaker 4>you need to care about this. So let me show

1625
01:35:37.960 --> 01:35:40.000
<v Speaker 4>you one step of evolution. This is not the modern

1626
01:35:40.079 --> 01:35:42.119
<v Speaker 4>version of the occupancy network in the Tesla system.

1627
01:35:42.159 --> 01:35:44.560
<v Speaker 4>This is, like, what, twenty twenty-two? So you can

1628
01:35:44.640 --> 01:35:46.960
<v Speaker 4>see just in a couple of years how the evolution

1629
01:35:47.159 --> 01:35:52.760
<v Speaker 4>is lower resolution, I mean, sorry, higher resolution voxels, smaller voxels,

1630
01:35:53.159 --> 01:35:55.800
<v Speaker 4>and what you're getting here. You can see that this

1631
01:35:56.279 --> 01:35:58.319
<v Speaker 4>is still a zero-one mechanism, just like I was

1632
01:35:58.319 --> 01:36:00.720
<v Speaker 4>saying before: drive, no drive. But look at this.

1633
01:36:00.760 --> 01:36:03.359
<v Speaker 4>So these are the cameras on top, the literal cameras

1634
01:36:03.359 --> 01:36:05.800
<v Speaker 4>on top, and then this is what the car sees.

1635
01:36:06.600 --> 01:36:09.000
<v Speaker 4>So this is the evolution of the occupancy network over

1636
01:36:09.479 --> 01:36:12.319
<v Speaker 4>probably a couple of years. So a little bit higher

1637
01:36:12.359 --> 01:36:16.479
<v Speaker 4>resolution you're seeing, and you can see the car shapes

1638
01:36:16.520 --> 01:36:19.239
<v Speaker 4>are starting to become more defined. And remember how I

1639
01:36:19.279 --> 01:36:24.479
<v Speaker 4>was saying, uh, bounding boxes are a cheat. These little

1640
01:36:24.520 --> 01:36:27.159
<v Speaker 4>shapes right here, this is how you get around the

1641
01:36:27.199 --> 01:36:30.319
<v Speaker 4>bounding box problem. Once you apply them to a grid

1642
01:36:30.800 --> 01:36:33.640
<v Speaker 4>and you make the voxels as small as possible to

1643
01:36:33.720 --> 01:36:38.279
<v Speaker 4>individual points, then the task becomes getting the shapes pixel-perfect,

1644
01:36:39.239 --> 01:36:42.399
<v Speaker 4>and you have a pixel-perfect shape through an occupancy

1645
01:36:42.520 --> 01:36:46.239
<v Speaker 4>network for free. And still so many people are using

1646
01:36:46.239 --> 01:36:48.359
<v Speaker 4>bounding boxes, which is a total cheat.

1647
01:36:53.960 --> 01:36:55.279
<v Speaker 1>This reminds me of Blender.

1648
01:36:56.600 --> 01:36:59.520
<v Speaker 4>Blender? Oh, Blender, yeah, right.

1649
01:36:59.359 --> 01:37:02.880
<v Speaker 1>The 3D software. Like, you could just plug it

1650
01:37:03.239 --> 01:37:05.439
<v Speaker 1>straight into there and have a total model of the

1651
01:37:05.600 --> 01:37:09.119
<v Speaker 1>entire place, and then I guess share it between different vehicles.

1652
01:37:09.680 --> 01:37:12.159
<v Speaker 1>Isn't that what's also happening as well, Like there's a

1653
01:37:12.520 --> 01:37:14.760
<v Speaker 1>sharing between different vehicles with all these.

1654
01:37:16.520 --> 01:37:19.079
<v Speaker 4>I think that, I mean, this is that's more theoretical.

1655
01:37:19.359 --> 01:37:22.119
<v Speaker 4>How do I say it? It's not commercially available, like

1656
01:37:22.199 --> 01:37:25.119
<v Speaker 4>Teslas don't do it yet, but this is part of

1657
01:37:25.159 --> 01:37:30.359
<v Speaker 4>the plan that they're working on. Yeah, and yeah, that

1658
01:37:30.359 --> 01:37:34.640
<v Speaker 4>that you can assume that that'll be obvious. It's yeah.

1659
01:37:34.680 --> 01:37:37.079
<v Speaker 4>I would say that in terms of priorities, like what

1660
01:37:37.159 --> 01:37:41.000
<v Speaker 4>I'm talking about here, making this perfect, getting it exactly

1661
01:37:41.079 --> 01:37:43.840
<v Speaker 4>right is a way higher priority for everyone in

1662
01:37:43.880 --> 01:37:46.159
<v Speaker 4>the world, and that's what they're... This is where we're at.

1663
01:37:46.439 --> 01:37:48.760
<v Speaker 4>We're still in the vacuum tube stage of AI. Every

1664
01:37:48.800 --> 01:37:51.960
<v Speaker 4>once in a while we're like, no, no, no, we're not. And I'm

1665
01:37:52.000 --> 01:37:54.159
<v Speaker 4>trying to show some of the evolution in the progress

1666
01:37:54.479 --> 01:37:56.479
<v Speaker 4>because this is the state of the art stuff. And

1667
01:37:56.800 --> 01:38:00.560
<v Speaker 4>if you understand video games, you understand that we're still right,

1668
01:38:00.720 --> 01:38:03.520
<v Speaker 4>like this is like not even nineteen nineties graphics yet, right,

1669
01:38:04.560 --> 01:38:09.760
<v Speaker 4>So we're getting there. So, as I was saying before,

1670
01:38:09.880 --> 01:38:11.800
<v Speaker 4>you get object shapes for free, and this is the

1671
01:38:12.159 --> 01:38:14.760
<v Speaker 4>solution to the bounding box problem. And this was a

1672
01:38:14.800 --> 01:38:17.399
<v Speaker 4>major problem in the industry and still probably is for

1673
01:38:17.439 --> 01:38:19.560
<v Speaker 4>some dumb people. But this is how you solve it.

1674
01:38:19.600 --> 01:38:24.079
<v Speaker 4>And this is a really really important thing, getting object

1675
01:38:24.119 --> 01:38:28.800
<v Speaker 4>shapes for free. Like, it's up there with collision avoidance

1676
01:38:28.840 --> 01:38:33.680
<v Speaker 4>in terms of importance, one of the... And this is

1677
01:38:33.720 --> 01:38:36.199
<v Speaker 4>like, kind of important too, especially if you're in AI,

1678
01:38:36.279 --> 01:38:39.239
<v Speaker 4>you understand how significant this is. Probably most people won't

1679
01:38:39.319 --> 01:38:41.279
<v Speaker 4>understand how big of a deal this is. But this

1680
01:38:41.399 --> 01:38:43.720
<v Speaker 4>is a... One of the other things you get out

1681
01:38:43.720 --> 01:38:46.479
<v Speaker 4>of an occupancy network is that it reasons well

1682
01:38:46.840 --> 01:38:50.399
<v Speaker 4>about occluded objects. So, objects that are traveling behind other

1683
01:38:50.439 --> 01:38:55.039
<v Speaker 4>objects and are partially occluded. This system inherently and automatically just

1684
01:38:55.199 --> 01:38:57.800
<v Speaker 4>understands that because it's just a grid network around you.

1685
01:38:58.000 --> 01:39:02.039
<v Speaker 4>Very simple heat map. But again, as I was saying,

1686
01:39:02.079 --> 01:39:05.640
<v Speaker 4>the most important thing about the occupancy network for AVs

1687
01:39:05.720 --> 01:39:09.399
<v Speaker 4>is that it's a collision avoidance system. It implicitly has this. Now,

1688
01:39:09.439 --> 01:39:12.399
<v Speaker 4>that is not to say that it will automatically

1689
01:39:12.399 --> 01:39:16.640
<v Speaker 4>go on the correct path. No, it just means it

1690
01:39:16.680 --> 01:39:22.239
<v Speaker 4>will consistently choose a collision avoidant path. So, as you

1691
01:39:22.239 --> 01:39:24.359
<v Speaker 4>can see here, this car went out of control. It

1692
01:39:24.439 --> 01:39:26.640
<v Speaker 4>almost collided in the right corner, but the collision avoidance,

1693
01:39:26.680 --> 01:39:29.399
<v Speaker 4>the occupancy network, avoided it. It moved over to the left.

1694
01:39:29.439 --> 01:39:31.039
<v Speaker 4>This is clearly the wrong side of the road, but

1695
01:39:31.119 --> 01:39:36.079
<v Speaker 4>it avoided another collision. It stabilized, and that's the value. Originally,

1696
01:39:36.520 --> 01:39:39.279
<v Speaker 4>initially that was the major value, and it was sufficient

1697
01:39:39.279 --> 01:39:42.319
<v Speaker 4>to sell it into the AV industry just for this alone,

1698
01:39:42.560 --> 01:39:44.159
<v Speaker 4>because it's the software alone doing this.

1699
01:39:45.680 --> 01:39:53.359
<v Speaker 5>Close your words, looking to the darkness far the blazing

1700
01:39:53.520 --> 01:40:05.359
<v Speaker 5>start focus on it be called the don't feel let

1701
01:40:05.399 --> 01:40:06.560
<v Speaker 5>the should be done.

1702
01:40:09.399 --> 01:40:10.159
<v Speaker 4>And does

1703
01:40:12.000 --> 01:40:16.439
<v Speaker 1>You the shoes
