WEBVTT

1
00:00:00.120 --> 00:00:03.319
<v Speaker 1>Welcome to the Grok 4 release here. This is the

2
00:00:03.240 --> 00:00:05.360
<v Speaker 2>smartest AI in the world. We're going to show you

3
00:00:05.360 --> 00:00:08.800
<v Speaker 2>exactly how and why. It really is remarkable to see

4
00:00:08.800 --> 00:00:13.080
<v Speaker 2>the advancement of artificial intelligence, how quickly it is evolving.

5
00:00:14.000 --> 00:00:18.399
<v Speaker 2>I sometimes compare it to the growth of a human

6
00:00:18.839 --> 00:00:23.839
<v Speaker 2>and how fast a human learns and gains conscious awareness and understanding.

7
00:00:24.519 --> 00:00:30.640
<v Speaker 2>And AI is advancing just vastly faster than any human.

8
00:00:31.199 --> 00:00:33.600
<v Speaker 2>I mean, we're going to take you through a bunch

9
00:00:33.640 --> 00:00:38.119
<v Speaker 2>of benchmarks that Grok 4 is able to achieve

10
00:00:38.119 --> 00:00:39.000
<v Speaker 2>incredible numbers on.

11
00:00:39.520 --> 00:00:42.000
<v Speaker 1>But it's actually worth noting

12
00:00:41.759 --> 00:00:45.840
<v Speaker 2>that Grok 4, if given, like, the SAT, would

13
00:00:45.880 --> 00:00:48.920
<v Speaker 2>get perfect SATs every time, even if it's never seen

14
00:00:48.960 --> 00:00:53.039
<v Speaker 2>the questions before. And even going beyond that, to

15
00:00:53.159 --> 00:00:57.159
<v Speaker 2>say like graduate student exams like the GRE, it will

16
00:00:57.200 --> 00:01:03.840
<v Speaker 2>get near-perfect results in every discipline of education,

17
00:01:04.000 --> 00:01:08.640
<v Speaker 2>so from the humanities to like languages, math, physics, engineering,

18
00:01:09.359 --> 00:01:13.000
<v Speaker 2>pick anything, and we're talking about questions that it's never

19
00:01:13.000 --> 00:01:16.280
<v Speaker 2>seen before. These are not on the Internet, and it's

20
00:01:17.400 --> 00:01:21.680
<v Speaker 2>Grok 4 is smarter than almost all graduate students in

21
00:01:21.760 --> 00:01:26.560
<v Speaker 2>all disciplines simultaneously. Like it's actually just important to appreciate

22
00:01:26.599 --> 00:01:32.879
<v Speaker 2>that, like, that's really something. And the reasoning capabilities of

23
00:01:33.000 --> 00:01:35.599
<v Speaker 2>Grok are incredible. So there are some people out there who

24
00:01:35.760 --> 00:01:39.280
<v Speaker 2>think AI can't reason, and look, it can reason

25
00:01:39.359 --> 00:01:44.040
<v Speaker 2>at superhuman levels. So yeah, and frankly, it only

26
00:01:44.079 --> 00:01:47.359
<v Speaker 2>gets better from here. So we'll take you through the

27
00:01:47.400 --> 00:01:50.599
<v Speaker 2>Grok 4 release and show you the pace of

28
00:01:50.680 --> 00:01:53.959
<v Speaker 2>progress here. I guess the first part

29
00:01:54.000 --> 00:01:56.879
<v Speaker 2>is like, in terms of the training, we're going from

30
00:01:57.040 --> 00:02:00.519
<v Speaker 2>Grok 2 to Grok 3 to Grok 4, we essentially

31
00:02:00.519 --> 00:02:03.359
<v Speaker 2>increased the training by an order of magnitude in each case,

32
00:02:04.040 --> 00:02:08.120
<v Speaker 2>so it's 100 times more training than Grok 2,

33
00:02:08.919 --> 00:02:13.599
<v Speaker 2>and that's only going to increase. So it's yeah, frankly,

34
00:02:13.639 --> 00:02:15.319
<v Speaker 2>I mean, I don't know, in some ways it's a little terrifying,

35
00:02:15.520 --> 00:02:18.360
<v Speaker 2>but the growth of intelligence here is remarkable.

36
00:02:18.639 --> 00:02:21.000
<v Speaker 3>Yes, it's important to realize there are two types of

37
00:02:21.000 --> 00:02:23.919
<v Speaker 3>training compute. One is the pre-training compute; that's from

38
00:02:24.039 --> 00:02:27.439
<v Speaker 3>Grok 2 to Grok 3. But from Grok 3 to

39
00:02:27.560 --> 00:02:30.599
<v Speaker 3>Grok 4, we're actually putting a lot of compute into

40
00:02:30.960 --> 00:02:32.479
<v Speaker 3>reasoning, into RL.

41
00:02:32.560 --> 00:02:35.000
<v Speaker 4>And just like you said, this is literally the fastest

42
00:02:35.000 --> 00:02:37.319
<v Speaker 4>moving field, and Grok 2 is like the high school

43
00:02:37.360 --> 00:02:39.879
<v Speaker 4>student by today's standard. If you look back in the

44
00:02:39.960 --> 00:02:44.039
<v Speaker 4>last twelve months, Grok 2 was only a concept. We didn't

45
00:02:44.159 --> 00:02:47.039
<v Speaker 4>even have Grok 2 twelve months ago. And then by

46
00:02:47.039 --> 00:02:49.240
<v Speaker 4>training Grok 2, that was the first time we scaled up

47
00:02:49.280 --> 00:02:51.919
<v Speaker 4>the pre-training. We realized that if you actually

48
00:02:51.960 --> 00:02:55.879
<v Speaker 4>do the data ablation really carefully, and the infra, and also

49
00:02:55.879 --> 00:02:59.360
<v Speaker 4>the algorithm, we can actually push the pre-training quite

50
00:02:59.360 --> 00:03:02.280
<v Speaker 4>a lot, by an amount of 10x, to make the model

51
00:03:02.599 --> 00:03:05.479
<v Speaker 4>the best pre-trained base model. And that's why we

52
00:03:05.520 --> 00:03:09.759
<v Speaker 4>built Colossus, the world's supercomputer, with 100,000 H100s.

53
00:03:10.639 --> 00:03:14.039
<v Speaker 4>And then, with the best pre-trained model, we realized

54
00:03:14.199 --> 00:03:17.400
<v Speaker 4>if you can collect these verifiable outcome rewards, you can

55
00:03:17.439 --> 00:03:19.719
<v Speaker 4>actually train this model to start thinking from first principles,

56
00:03:19.759 --> 00:03:22.120
<v Speaker 4>start to reason, correct its own mistakes, and that's where

57
00:03:22.120 --> 00:03:24.840
<v Speaker 4>the Grok reasoning comes from. And today we asked the

58
00:03:24.919 --> 00:03:28.000
<v Speaker 4>question: what happens if you take the expansion of

59
00:03:28.080 --> 00:03:31.599
<v Speaker 4>Colossus with all 200,000 GPUs, put all of these

60
00:03:31.840 --> 00:03:36.400
<v Speaker 4>into RL, 10x more compute than any of the models

61
00:03:36.400 --> 00:03:39.719
<v Speaker 4>out there, on reinforcement learning at unprecedented scale.

62
00:03:39.960 --> 00:03:40.719
<v Speaker 3>What's going to happen?

63
00:03:41.360 --> 00:03:44.520
<v Speaker 4>So this is the story of Grok 4, and, you know,

64
00:03:44.840 --> 00:03:47.000
<v Speaker 4>Tony, share some insight with the audience.

65
00:03:47.400 --> 00:03:50.199
<v Speaker 3>Yeah, so yeah, let's just talk about how smart Grok

66
00:03:50.199 --> 00:03:53.400
<v Speaker 3>4 is. So I guess we can start by discussing

67
00:03:53.400 --> 00:03:56.759
<v Speaker 3>this benchmark called Humanity's Last Exam. And this

68
00:03:56.840 --> 00:04:01.520
<v Speaker 3>benchmark is a very very challenging benchmark. Every single problem

69
00:04:01.759 --> 00:04:06.800
<v Speaker 3>is curated by subject matter experts. It's in total twenty

70
00:04:06.840 --> 00:04:11.199
<v Speaker 3>five hundred problems, and it consists of many different subjects: mathematics,

71
00:04:11.319 --> 00:04:15.879
<v Speaker 3>natural sciences, engineering, and also all the humanities subjects. So

72
00:04:16.319 --> 00:04:19.199
<v Speaker 3>essentially, when it was first released, actually, like, earlier this year,

73
00:04:20.079 --> 00:04:23.360
<v Speaker 3>most of the models out there could only get single

74
00:04:23.360 --> 00:04:26.639
<v Speaker 3>digit accuracy on this benchmark. Yeah, so we can look

75
00:04:26.639 --> 00:04:29.800
<v Speaker 3>at some of those examples. There is this mathematical problem

76
00:04:30.160 --> 00:04:35.160
<v Speaker 3>which is about natural transformations in category theory, and there's

77
00:04:35.199 --> 00:04:40.000
<v Speaker 3>this organic chemistry problem that talks about electrocyclic reactions.

78
00:04:40.560 --> 00:04:43.800
<v Speaker 3>And also there's this linguistic problem that tries to ask

79
00:04:43.839 --> 00:04:48.279
<v Speaker 3>you about distinguishing between closed and open syllables from a

80
00:04:48.399 --> 00:04:52.439
<v Speaker 3>Hebrew source text. So you can see it's a

81
00:04:52.560 --> 00:04:56.519
<v Speaker 3>very wide range of problems and every single problem is

82
00:04:56.839 --> 00:04:59.720
<v Speaker 3>a PhD-level or even advanced research-level problem.

83
00:05:00.439 --> 00:05:03.480
<v Speaker 2>Yeah, I mean, there are no humans that can

84
00:05:03.519 --> 00:05:06.040
<v Speaker 2>actually answer these and get a good score. I mean,

85
00:05:06.040 --> 00:05:08.759
<v Speaker 2>if you ask me, say, like any given human, what,

86
00:05:09.279 --> 00:05:11.920
<v Speaker 2>like, what's the best that any human could score, I'd

87
00:05:11.959 --> 00:05:17.879
<v Speaker 2>say maybe five percent optimistically. So this is much harder

88
00:05:17.879 --> 00:05:21.439
<v Speaker 2>than what any human can do. It's incredibly difficult.

89
00:05:21.800 --> 00:05:23.639
<v Speaker 2>And you can see from the types of questions like

90
00:05:23.680 --> 00:05:27.800
<v Speaker 2>you might be incredible in linguistics or mathematics or chemistry

91
00:05:27.879 --> 00:05:30.120
<v Speaker 2>or physics or any one of a number of subjects, but

92
00:05:30.120 --> 00:05:32.560
<v Speaker 2>you're not going to be at a postgrad level

93
00:05:32.600 --> 00:05:37.160
<v Speaker 2>in everything, and Grok 4 is at a postgrad level in everything.

94
00:05:37.839 --> 00:05:39.839
<v Speaker 2>Like, it's just, some of these things are just

95
00:05:39.879 --> 00:05:45.879
<v Speaker 2>worth repeating. Like, Grok 4 is postgraduate, like PhD level,

96
00:05:46.399 --> 00:05:49.600
<v Speaker 2>in everything. Better than PhD; like, most PhDs would fail.

97
00:05:50.480 --> 00:05:53.040
<v Speaker 2>So it's better. That said, I mean, at least with

98
00:05:53.079 --> 00:05:56.600
<v Speaker 2>respect to academic questions. I want to just emphasize

99
00:05:56.600 --> 00:06:00.759
<v Speaker 2>this point. With respect to academic questions, Grok 4 is better

100
00:06:00.800 --> 00:06:04.839
<v Speaker 2>than PhD level in every subject, no exceptions. That doesn't

101
00:06:04.839 --> 00:06:08.680
<v Speaker 2>mean that it's, you know... at times it may lack common sense,

102
00:06:08.959 --> 00:06:13.639
<v Speaker 2>and it has not yet invented new technologies or discovered

103
00:06:13.680 --> 00:06:16.360
<v Speaker 2>new physics, but that is just a matter of time.

104
00:06:17.560 --> 00:06:22.439
<v Speaker 2>It may discover new technologies as soon as later this year,

105
00:06:22.800 --> 00:06:24.519
<v Speaker 2>and I would be shocked if it has not done

106
00:06:24.519 --> 00:06:29.079
<v Speaker 2>so next year. So I would expect Grok to literally

107
00:06:29.079 --> 00:06:32.160
<v Speaker 2>discover new technologies that are actually useful no later than

108
00:06:32.160 --> 00:06:34.879
<v Speaker 2>next year, and maybe end of this year. It might

109
00:06:34.959 --> 00:06:38.759
<v Speaker 2>discover new physics next year, and within two years, I'd

110
00:06:38.759 --> 00:06:41.319
<v Speaker 2>say almost certainly. So just let that sink in.

111
00:06:41.639 --> 00:06:45.160
<v Speaker 3>Okay, so I guess we can talk about what's

112
00:06:45.199 --> 00:06:48.360
<v Speaker 3>behind the scenes of Grok 4. As Jimmy mentioned, we

113
00:06:48.519 --> 00:06:52.600
<v Speaker 3>are actually throwing a lot of compute into this training. When

114
00:06:52.639 --> 00:06:57.720
<v Speaker 3>it started, it was only a single-digit number. But as

115
00:06:57.759 --> 00:07:00.079
<v Speaker 3>you start putting in more and more training compute, it

116
00:07:01.040 --> 00:07:05.160
<v Speaker 3>started to gradually become smarter and smarter and eventually solved

117
00:07:06.079 --> 00:07:09.920
<v Speaker 3>a quarter of the HLE problems. And this is without

118
00:07:09.920 --> 00:07:13.160
<v Speaker 3>any tools. The next thing we did was adding

119
00:07:13.639 --> 00:07:18.079
<v Speaker 3>tool capabilities to the model. And unlike Grok 3, I

120
00:07:18.079 --> 00:07:21.079
<v Speaker 3>think Grok 3 actually is able to use tools as well,

121
00:07:21.160 --> 00:07:23.759
<v Speaker 3>but here we actually make it more native in the

122
00:07:23.839 --> 00:07:27.000
<v Speaker 3>sense that we put the tools into training. Grok 3

123
00:07:27.160 --> 00:07:30.079
<v Speaker 3>was only relying on generalization. Here we actually put the

124
00:07:30.120 --> 00:07:34.079
<v Speaker 3>tools into training, and it turns out this significantly improves

125
00:07:34.120 --> 00:07:37.079
<v Speaker 3>the model's capability of using those tools. So how is

126
00:07:37.120 --> 00:07:41.680
<v Speaker 3>this different? DeepSearch was exactly the Grok 3 reasoning model

127
00:07:42.560 --> 00:07:45.759
<v Speaker 3>without any specific training; we only asked it to

128
00:07:45.879 --> 00:07:49.399
<v Speaker 3>use those tools. So compared to this, it was much

129
00:07:49.439 --> 00:07:54.240
<v Speaker 3>weaker in terms of its tool capabilities, and unreliable.

130
00:07:54.399 --> 00:07:56.720
<v Speaker 2>Yes, yes, and to be clear, like these are still

131
00:07:56.759 --> 00:07:59.920
<v Speaker 2>I'd say, this is still fairly primitive tool use.

132
00:08:00.360 --> 00:08:02.199
<v Speaker 2>If you compare it to say, the tools that are

133
00:08:02.319 --> 00:08:06.720
<v Speaker 2>used at Tesla and SpaceX, where you're using finite element analysis

134
00:08:06.800 --> 00:08:11.959
<v Speaker 2>and computational fluid dynamics, and you're able to run, or

135
00:08:12.000 --> 00:08:14.720
<v Speaker 2>say, like, Tesla does, like, crash simulations, where the

136
00:08:14.720 --> 00:08:17.839
<v Speaker 2>simulations are so close to reality that if the test

137
00:08:17.879 --> 00:08:21.000
<v Speaker 2>doesn't match the simulation, you assume that the test article

138
00:08:21.079 --> 00:08:24.040
<v Speaker 2>is wrong. That's how good the simulations are. So Grok

139
00:08:24.160 --> 00:08:26.399
<v Speaker 2>is not currently using any of the tools that a

140
00:08:26.480 --> 00:08:29.120
<v Speaker 2>company would use, but that is something that we will

141
00:08:29.639 --> 00:08:32.279
<v Speaker 2>provide it with later this year, so it will have the

142
00:08:32.320 --> 00:08:37.679
<v Speaker 2>tools that a company has and have a very accurate physics simulator. Ultimately,

143
00:08:37.720 --> 00:08:39.639
<v Speaker 2>the thing that will make the biggest difference is being

144
00:08:39.679 --> 00:08:42.759
<v Speaker 2>able to interact with the real world via humanoid robots.

145
00:08:43.120 --> 00:08:45.679
<v Speaker 2>So you combine Grok with Optimus, and it can actually

146
00:08:46.200 --> 00:08:48.600
<v Speaker 2>interact with the real world and figure out if it's

147
00:08:49.039 --> 00:08:52.559
<v Speaker 2>if it has... it can formulate a hypothesis

148
00:08:52.840 --> 00:08:56.039
<v Speaker 2>and then confirm if that hypothesis is true or not.

149
00:08:57.039 --> 00:08:59.639
<v Speaker 1>So we're really, you know... I think about, like, where

150
00:08:59.679 --> 00:08:59.960
<v Speaker 1>we are today.

151
00:09:00.159 --> 00:09:04.799
<v Speaker 2>We're at the beginning of an immense intelligence explosion. We're

152
00:09:04.840 --> 00:09:08.919
<v Speaker 2>in the intelligence big bang right now, and at the most

153
00:09:08.960 --> 00:09:13.080
<v Speaker 2>interesting time to be alive of any time in history. Now,

154
00:09:13.279 --> 00:09:13.600
<v Speaker 2>that said,

155
00:09:13.679 --> 00:09:15.320
<v Speaker 1>we need to make sure that the AI is a

156
00:09:15.360 --> 00:09:15.799
<v Speaker 1>good AI.

157
00:09:16.039 --> 00:09:18.639
<v Speaker 2>The thing that I think is most important for AI safety,

158
00:09:18.639 --> 00:09:21.639
<v Speaker 2>at least my biological neural net tells me the most

159
00:09:21.679 --> 00:09:25.200
<v Speaker 2>important thing for AI is to be maximally truth seeking.

160
00:09:26.120 --> 00:09:30.000
<v Speaker 2>You can think of AI as this super genius child

161
00:09:30.080 --> 00:09:33.159
<v Speaker 2>that ultimately will outsmart you, but you can still instill

162
00:09:33.159 --> 00:09:39.200
<v Speaker 2>the right values and encourage it to be, sort of, you know, truthful, honorable,

163
00:09:39.440 --> 00:09:41.759
<v Speaker 2>you know, good things, like the values you want to instill

164
00:09:41.879 --> 00:09:45.080
<v Speaker 2>in a child that would ultimately grow up to be incredibly powerful. Yeah,

165
00:09:45.240 --> 00:09:48.080
<v Speaker 2>these are still primitive tools and not the kind of

166
00:09:48.120 --> 00:09:51.879
<v Speaker 2>tools that serious commercial companies use. But we will provide

167
00:09:51.879 --> 00:09:53.919
<v Speaker 2>it with those tools, and I think it will be

168
00:09:53.919 --> 00:09:56.639
<v Speaker 2>able to solve real world technology problems.

169
00:09:56.240 --> 00:09:57.879
<v Speaker 3>Yes, yes, exactly.

170
00:09:58.000 --> 00:10:00.679
<v Speaker 4>But is it just compute all you need? Is it

171
00:10:00.720 --> 00:10:02.639
<v Speaker 4>just compute all you need at this point.

172
00:10:02.519 --> 00:10:05.320
<v Speaker 2>Well, you need compute plus the right tools, and then

173
00:10:05.399 --> 00:10:07.559
<v Speaker 2>ultimately to be able to interact with the physical world,

174
00:10:07.799 --> 00:10:10.840
<v Speaker 2>and then we will effectively have an economy that is

175
00:10:11.360 --> 00:10:15.000
<v Speaker 2>ultimately thousands of times bigger than our current economy, or

176
00:10:15.039 --> 00:10:18.320
<v Speaker 2>maybe millions of times. If you think of civilization as

177
00:10:19.120 --> 00:10:23.879
<v Speaker 2>percentage completion of the Kardashev scale, where Kardashev one is

178
00:10:24.639 --> 00:10:27.960
<v Speaker 2>using all the energy output of a planet, and Kardashev

179
00:10:28.039 --> 00:10:30.519
<v Speaker 2>two is using all the energy output of a sun,

180
00:10:31.240 --> 00:10:33.440
<v Speaker 2>and three is all the energy output of a galaxy.

181
00:10:34.159 --> 00:10:37.120
<v Speaker 2>We're only, in my opinion, probably closer to one percent

182
00:10:37.240 --> 00:10:40.840
<v Speaker 2>of Kardashev one than we are to ten percent, So

183
00:10:41.360 --> 00:10:44.960
<v Speaker 2>like maybe one or two percent of Kardashev one.

184
00:10:45.039 --> 00:10:45.639
<v Speaker 2>So we

185
00:10:45.919 --> 00:10:48.399
<v Speaker 1>will get most of the way, like

186
00:10:48.320 --> 00:10:51.600
<v Speaker 2>eighty, ninety percent Kardashev one, and then hopefully, if civilization

187
00:10:51.679 --> 00:10:55.759
<v Speaker 2>doesn't self annihilate, the actual notion of a human economy,

188
00:10:56.000 --> 00:11:00.240
<v Speaker 2>assuming civilization continues to progress, will seem very quaint in retrospect.

189
00:11:01.039 --> 00:11:04.120
<v Speaker 2>It will seem like sort of cavemen throwing sticks into

190
00:11:04.159 --> 00:11:07.799
<v Speaker 2>a fire level of economy compared to what the future

191
00:11:07.840 --> 00:11:12.200
<v Speaker 2>will hold. It's very exciting. I've been at times kind

192
00:11:12.240 --> 00:11:16.000
<v Speaker 2>of worried about, like, well, you know, this seems like

193
00:11:16.720 --> 00:11:20.600
<v Speaker 2>it's somewhat unnerving to have intelligence created that is far

194
00:11:20.639 --> 00:11:24.799
<v Speaker 2>greater than our own, and will this be bad or good

195
00:11:24.840 --> 00:11:25.600
<v Speaker 2>for humanity?

196
00:11:26.200 --> 00:11:28.399
<v Speaker 1>I think it'll be good. Most likely it'll be good.

197
00:11:29.120 --> 00:11:32.679
<v Speaker 2>But I somewhat reconcile myself to the fact that even

198
00:11:32.720 --> 00:11:35.159
<v Speaker 2>if it wasn't going to be good, I'd at least

199
00:11:35.159 --> 00:11:36.480
<v Speaker 2>like to be alive to see it happen.

200
00:11:36.799 --> 00:11:41.240
<v Speaker 3>So yeah, yeah, I think a technical problem that we

201
00:11:41.279 --> 00:11:44.279
<v Speaker 3>still need to solve besides just compute, is how do

202
00:11:44.360 --> 00:11:48.159
<v Speaker 3>we unblock the data bottleneck, because when we try to

203
00:11:48.200 --> 00:11:52.279
<v Speaker 3>scale up the RL in this case, we did invent

204
00:11:52.399 --> 00:11:56.440
<v Speaker 3>a lot of new techniques and innovations to allow us to

205
00:11:56.559 --> 00:11:59.519
<v Speaker 3>figure out how to find a lot of challenging RL

206
00:11:59.559 --> 00:12:02.200
<v Speaker 3>problems to work on. It's not just that the problem itself

207
00:12:02.240 --> 00:12:04.480
<v Speaker 3>needs to be challenging, but also it needs to be

208
00:12:04.960 --> 00:12:07.759
<v Speaker 3>you also need to have like a reliable signal to

209
00:12:07.799 --> 00:12:10.200
<v Speaker 3>tell the model you did it wrong, you did it right.

210
00:12:10.360 --> 00:12:13.440
<v Speaker 3>This is sort of the principle of reinforcement learning, and

211
00:12:14.320 --> 00:12:17.480
<v Speaker 3>as the model gets smarter and smarter, the number of

212
00:12:17.799 --> 00:12:20.919
<v Speaker 3>cool problems or challenging problems will be less and less.

213
00:12:21.519 --> 00:12:24.639
<v Speaker 3>So it's going to be a new type of challenge

214
00:12:24.679 --> 00:12:26.720
<v Speaker 3>that we need to surpass besides just compute.

215
00:12:26.840 --> 00:12:30.360
<v Speaker 2>Yeah, we actually are running out of actual test questions

216
00:12:30.399 --> 00:12:33.679
<v Speaker 2>to ask. So there's, like, even questions that are ridiculously hard,

217
00:12:33.720 --> 00:12:37.559
<v Speaker 2>if not essentially impossible for humans that are written down

218
00:12:37.639 --> 00:12:42.279
<v Speaker 2>questions are becoming trivial for AI. You know, the one

219
00:12:42.279 --> 00:12:45.240
<v Speaker 2>thing that is an excellent judge of things is reality.

220
00:12:45.720 --> 00:12:49.200
<v Speaker 2>So, because physics is the law, ultimately everything else

221
00:12:49.279 --> 00:12:49.919
<v Speaker 2>is a recommendation.

222
00:12:50.080 --> 00:12:51.279
<v Speaker 1>You can't break physics.

223
00:12:51.799 --> 00:12:54.320
<v Speaker 2>So the ultimate test, I think for whether an AI

224
00:12:54.679 --> 00:12:59.519
<v Speaker 2>is the ultimate reasoning test is reality. So you invent

225
00:12:59.559 --> 00:13:03.000
<v Speaker 2>a new technology, like, say, improve the design of a

226
00:13:02.399 --> 00:13:06.200
<v Speaker 1>car or a rocket, or create a new medication. Does

227
00:13:06.240 --> 00:13:06.679
<v Speaker 1>it work?

228
00:13:07.360 --> 00:13:10.279
<v Speaker 2>Does the rocket get to orbit? Does the car drive?

229
00:13:10.480 --> 00:13:13.960
<v Speaker 2>Does the medicine work, whatever the case may be. Reality

230
00:13:14.039 --> 00:13:16.559
<v Speaker 2>is the ultimate judge here, So it's going to be

231
00:13:16.600 --> 00:13:18.879
<v Speaker 2>reinforcement learning closing the loop around reality.

232
00:13:19.120 --> 00:13:21.159
<v Speaker 3>We asked the question how do we even go further?

233
00:13:21.519 --> 00:13:26.159
<v Speaker 3>So actually we are thinking about, now, with a single agent,

234
00:13:26.240 --> 00:13:28.320
<v Speaker 3>we're able to solve forty percent of the problems. What

235
00:13:28.399 --> 00:13:32.080
<v Speaker 3>if we have multiple agents running at the same time? So

236
00:13:32.120 --> 00:13:35.600
<v Speaker 3>this is what's called test-time compute. And as we

237
00:13:35.639 --> 00:13:38.759
<v Speaker 3>scale up the test-time compute, actually we are able

238
00:13:38.799 --> 00:13:42.200
<v Speaker 3>to solve more than fifty percent of the text

239
00:13:42.279 --> 00:13:47.120
<v Speaker 3>only subset of the HLE problems. So it's a remarkable achievement.

240
00:13:47.159 --> 00:13:48.840
<v Speaker 3>I think this is insanely difficult.

241
00:13:49.080 --> 00:13:51.360
<v Speaker 2>What we're saying is that a majority of the text-based

242
00:13:51.559 --> 00:13:55.600
<v Speaker 2>of Humanity's, you know, scarily named Humanity's Last Exam, Grok

243
00:13:55.639 --> 00:13:58.159
<v Speaker 2>4 can solve. You can try it out for yourself

244
00:13:58.559 --> 00:14:01.039
<v Speaker 2>with Grok 4 Heavy. What it does is it spawns

245
00:14:01.120 --> 00:14:04.759
<v Speaker 2>multiple agents in parallel and all of those agents do

246
00:14:04.919 --> 00:14:09.240
<v Speaker 2>work independently, and then they compare their work and they

247
00:14:09.039 --> 00:14:11.320
<v Speaker 1>decide which one. It's like a study group.

248
00:14:12.120 --> 00:14:15.360
<v Speaker 2>It's not as simple as a majority vote because often

249
00:14:15.440 --> 00:14:18.279
<v Speaker 2>only one of the agents actually figures out the trick

250
00:14:19.000 --> 00:14:22.000
<v Speaker 2>or figures out the solution. But once they share

251
00:14:22.159 --> 00:14:25.679
<v Speaker 2>the trick or figure out what the real nature of

252
00:14:25.679 --> 00:14:28.519
<v Speaker 2>the problem is, they share that solution with the other

253
00:14:28.559 --> 00:14:32.080
<v Speaker 2>agents and then they compare notes and yield an answer.

254
00:14:32.440 --> 00:14:35.039
<v Speaker 2>So the Heavy part of Grok 4 is where

255
00:14:35.720 --> 00:14:38.200
<v Speaker 2>you scale up the test-time compute by roughly an

256
00:14:38.279 --> 00:14:42.519
<v Speaker 2>order of magnitude, have multiple agents tackle the task, and

257
00:14:42.559 --> 00:14:46.639
<v Speaker 2>then they compare their work and they put forward

258
00:14:46.399 --> 00:14:47.720
<v Speaker 1>what they think is the best result.

259
00:14:48.039 --> 00:14:51.759
<v Speaker 3>Yeah, so we will introduce Grok 4 and Grok 4 Heavy.

260
00:14:51.840 --> 00:14:55.159
<v Speaker 3>Sorry, can you click to the next slide? Yeah, so,

261
00:14:55.240 --> 00:14:59.159
<v Speaker 3>So basically Grok 4 is the single-agent version,

262
00:14:59.399 --> 00:15:04.000
<v Speaker 3>and Grok 4 Heavy is the multi-agent version. So let's take

263
00:15:04.000 --> 00:15:07.000
<v Speaker 3>a look at how they actually do on those exam problems

264
00:15:07.120 --> 00:15:09.159
<v Speaker 3>and also some real-life problems.

265
00:15:09.320 --> 00:15:11.440
<v Speaker 5>Yeah. So we're going to start out here and we're

266
00:15:11.440 --> 00:15:14.200
<v Speaker 5>actually going to look at one of those HLE problems.

267
00:15:14.240 --> 00:15:16.759
<v Speaker 5>This is actually one of the easier math ones. I

268
00:15:16.759 --> 00:15:19.200
<v Speaker 5>don't really understand it very well. I'm not that smart,

269
00:15:19.240 --> 00:15:21.679
<v Speaker 5>but I can launch this job here and we can

270
00:15:21.720 --> 00:15:23.879
<v Speaker 5>actually see how it's going to go through and start

271
00:15:23.879 --> 00:15:26.320
<v Speaker 5>to think about this problem. While we're doing that, I

272
00:15:26.320 --> 00:15:28.200
<v Speaker 5>also want to show a little bit more about what

273
00:15:28.240 --> 00:15:30.679
<v Speaker 5>this model can do and launch a Grok 4 Heavy

274
00:15:30.679 --> 00:15:35.159
<v Speaker 5>as well. So everyone knows Polymarket. It's extremely interesting. It

275
00:15:35.200 --> 00:15:38.559
<v Speaker 5>aligns with what reality is most of the time, and

276
00:15:38.720 --> 00:15:41.200
<v Speaker 5>with Grok, what we're actually looking at is being able

277
00:15:41.240 --> 00:15:43.759
<v Speaker 5>to see how we can try to take these markets

278
00:15:44.120 --> 00:15:46.600
<v Speaker 5>and see if we can predict the future as well.

279
00:15:47.000 --> 00:15:49.559
<v Speaker 5>So as we're letting this run, we'll see how Grok 4

280
00:15:49.720 --> 00:15:53.200
<v Speaker 5>Heavy goes about predicting the World Series odds for the

281
00:15:53.240 --> 00:15:55.879
<v Speaker 5>current teams. And while we're waiting for these to process,

282
00:15:55.879 --> 00:15:57.200
<v Speaker 5>we're going to pass it over to Eric and he's

283
00:15:57.240 --> 00:15:59.159
<v Speaker 5>going to show you an example of his.

284
00:15:59.600 --> 00:16:02.960
<v Speaker 6>Yeah, so, I guess one of the coolest things about

285
00:16:03.240 --> 00:16:07.360
<v Speaker 6>Grok 4 is its ability to understand the world and

286
00:16:07.440 --> 00:16:11.120
<v Speaker 6>to solve hard problems by leveraging tools like Tony discussed,

287
00:16:11.399 --> 00:16:13.320
<v Speaker 6>and I think one kind of cool example of this:

288
00:16:14.360 --> 00:16:18.279
<v Speaker 6>we asked it to generate a visualization of two black

289
00:16:18.360 --> 00:16:21.840
<v Speaker 6>holes colliding. It took some liberties, and it's actually pretty clear in its

290
00:16:21.840 --> 00:16:25.159
<v Speaker 6>thinking trace about what these liberties are. For example, in

291
00:16:25.240 --> 00:16:27.919
<v Speaker 6>order for it to actually be visible, you need to really exaggerate

292
00:16:28.120 --> 00:16:31.720
<v Speaker 6>the scale of the waves. And yeah, so here's like,

293
00:16:32.399 --> 00:16:36.639
<v Speaker 6>you know, this kind of in action. It exaggerates the scale

294
00:16:36.679 --> 00:16:39.200
<v Speaker 6>in like multiple ways. It drops off less in terms

295
00:16:39.200 --> 00:16:43.159
<v Speaker 6>of amplitude over distance, but we can see the basic

296
00:16:43.200 --> 00:16:47.120
<v Speaker 6>effects that are actually correct. It starts with the inspiral

297
00:16:47.399 --> 00:16:50.720
<v Speaker 6>merges, and then you have the ringdown. This is

298
00:16:51.919 --> 00:16:56.879
<v Speaker 6>basically largely correct, modulo some of the simplifications that it needs

299
00:16:56.919 --> 00:16:59.879
<v Speaker 6>to do. It's actually quite explicit about this: it uses

300
00:17:00.080 --> 00:17:04.880
<v Speaker 6>post Newtonian approximations instead of actually computing the general relativistic

301
00:17:05.000 --> 00:17:07.359
<v Speaker 6>effects near the center of the black hole, which is

302
00:17:07.400 --> 00:17:10.799
<v Speaker 6>incorrect and, you know, will lead to, you know, some

303
00:17:10.880 --> 00:17:14.480
<v Speaker 6>incorrect results. But the overall, you know, visualization is, yeah,

304
00:17:14.799 --> 00:17:18.480
<v Speaker 6>it's basically there, and you can actually look at the

305
00:17:18.680 --> 00:17:22.839
<v Speaker 6>kinds of resources that it references. So here it actually

306
00:17:23.119 --> 00:17:26.039
<v Speaker 6>you know, it obviously uses search. It gathers results from

307
00:17:26.039 --> 00:17:29.480
<v Speaker 6>a bunch of links, but also reads through an undergraduate

308
00:17:29.559 --> 00:17:36.119
<v Speaker 6>text on analytic gravitational wave models. It reasons quite a

309
00:17:36.119 --> 00:17:38.839
<v Speaker 6>bit about the actual constants that it should use for

310
00:17:38.920 --> 00:17:43.960
<v Speaker 6>a realistic simulation. It references existing real world data. It's

311
00:17:43.960 --> 00:17:44.839
<v Speaker 6>a pretty good model.

312
00:17:45.160 --> 00:17:45.599
<v Speaker 1>Going forward.

313
00:17:45.759 --> 00:17:48.160
<v Speaker 2>We can give it the same model that physicists use,

314
00:17:48.720 --> 00:17:51.240
<v Speaker 2>so it can run the same level of compute that

315
00:17:51.599 --> 00:17:54.680
<v Speaker 2>leading physics researchers are using and give you a physics

316
00:17:54.680 --> 00:17:56.559
<v Speaker 2>accurate black hole simulation.

317
00:17:56.839 --> 00:17:58.200
<v Speaker 5>And right now this is running in your browser.

318
00:17:58.319 --> 00:18:00.000
<v Speaker 1>This is just running in your browser. Pretty simple.

319
00:18:00.440 --> 00:18:02.680
<v Speaker 5>Swapping back real quick. Here we can actually take a look.

320
00:18:02.680 --> 00:18:05.559
<v Speaker 5>The math problem is finished. The model was able to

321
00:18:05.920 --> 00:18:08.519
<v Speaker 5>Let's look at its thinking trace here so you can

322
00:18:08.559 --> 00:18:11.079
<v Speaker 5>see how it went through the problem. I'll be honest

323
00:18:11.079 --> 00:18:13.839
<v Speaker 5>with you guys, I really don't quite fully understand the math.

324
00:18:14.240 --> 00:18:15.720
<v Speaker 5>But what I do know is that I looked at

325
00:18:15.759 --> 00:18:18.240
<v Speaker 5>the answer ahead of time and it did come to

326
00:18:18.319 --> 00:18:20.160
<v Speaker 5>the correct answer here in the final part. We

327
00:18:20.160 --> 00:18:22.240
<v Speaker 5>can also come in and actually take a look here

328
00:18:22.279 --> 00:18:25.480
<v Speaker 5>at our World Series prediction and it's still thinking through

329
00:18:25.559 --> 00:18:28.079
<v Speaker 5>on this one, but we can actually try some other.

330
00:18:27.920 --> 00:18:28.559
<v Speaker 1>Stuff as well.

331
00:18:28.599 --> 00:18:31.599
<v Speaker 5>So we worked very heavily on working with all of

332
00:18:31.599 --> 00:18:34.039
<v Speaker 5>our X tools and building out a really great X

333
00:18:34.079 --> 00:18:37.599
<v Speaker 5>experience so we can actually ask, you know, the model,

334
00:18:37.799 --> 00:18:39.599
<v Speaker 5>you know, find me the xAI employee that has the

335
00:18:39.599 --> 00:18:42.880
<v Speaker 5>weirdest profile photo, and then we can actually try out,

336
00:18:42.960 --> 00:18:46.200
<v Speaker 5>you know, let's create a timeline based on X posts

337
00:18:46.400 --> 00:18:48.920
<v Speaker 5>detailing the you know, changes in the scores over time,

338
00:18:49.000 --> 00:18:51.240
<v Speaker 5>and we can see, you know, all the conversation that

339
00:18:51.319 --> 00:18:53.440
<v Speaker 5>was taking place at that time as well, so we

340
00:18:53.480 --> 00:18:55.880
<v Speaker 5>can see who are the you know, announcing scores and

341
00:18:55.960 --> 00:18:58.160
<v Speaker 5>like what was the reactions at those times as well.

342
00:18:58.680 --> 00:19:01.519
<v Speaker 5>If we go back to this was the Greg Yang

343
00:19:01.599 --> 00:19:04.839
<v Speaker 5>photo here, So Greg Yang, of course, who has his

344
00:19:05.000 --> 00:19:08.240
<v Speaker 5>favorite photograph that he has on his account, that's actually

345
00:19:08.279 --> 00:19:09.599
<v Speaker 5>not how he looks like in real life.

346
00:19:09.640 --> 00:19:12.359
<v Speaker 2>By the way, but it had to understand that question, Yeah,

347
00:19:12.400 --> 00:19:13.599
<v Speaker 2>which is that That's the wild part.

348
00:19:13.599 --> 00:19:15.920
<v Speaker 1>It is like it understands what is a weird photo?

349
00:19:16.920 --> 00:19:17.839
<v Speaker 1>What is a weird photo?

350
00:19:18.000 --> 00:19:18.359
<v Speaker 7>Yeah?

351
00:19:18.480 --> 00:19:20.240
<v Speaker 1>What is a less or more weird photo?

352
00:19:21.160 --> 00:19:23.200
<v Speaker 5>It goes through, it has to find all the team members,

353
00:19:23.240 --> 00:19:25.400
<v Speaker 5>has to figure out who we all are, right, you know.

354
00:19:25.400 --> 00:19:29.920
<v Speaker 2>Searches without access to the internal xAI personnel logs, literally

355
00:19:29.920 --> 00:19:32.480
<v Speaker 2>looking at that, just at the internet exactly, so you

356
00:19:32.519 --> 00:19:34.279
<v Speaker 2>could say, like the weirdest of any company.

357
00:19:34.440 --> 00:19:37.160
<v Speaker 5>Yeah, And we can also take a look here at

358
00:19:37.200 --> 00:19:40.039
<v Speaker 5>the question here for Humanity's Last Exam. So it

359
00:19:40.119 --> 00:19:43.160
<v Speaker 5>is still researching all of the historical scores, but it

360
00:19:43.200 --> 00:19:45.599
<v Speaker 5>will have that final answer here soon. While it's finishing up.

361
00:19:45.599 --> 00:19:47.359
<v Speaker 5>We can take a look at one of the ones

362
00:19:47.400 --> 00:19:49.799
<v Speaker 5>that we set up here a second ago, and we

363
00:19:49.799 --> 00:19:51.519
<v Speaker 5>could see, like you know, and it finds the date that

364
00:19:51.599 --> 00:19:54.319
<v Speaker 5>Dan Hendrycks had initially announced it. We can go through,

365
00:19:54.400 --> 00:19:56.960
<v Speaker 5>we can see, you know, OpenAI announcing their score

366
00:19:57.119 --> 00:20:00.680
<v Speaker 5>back in February, and we can see, you know, progress

367
00:20:00.720 --> 00:20:03.640
<v Speaker 5>happens with, like, Gemini. We can see, like, Kimi, and

368
00:20:03.680 --> 00:20:06.160
<v Speaker 5>we can also even see you know, the leaked benchmarks

369
00:20:06.200 --> 00:20:08.319
<v Speaker 5>of what people are saying is you know, if it's right,

370
00:20:08.359 --> 00:20:09.519
<v Speaker 5>it's going to be pretty impressive.

371
00:20:10.039 --> 00:20:11.599
<v Speaker 1>So pretty cool.

372
00:20:11.920 --> 00:20:13.680
<v Speaker 3>But yeah, it's great.

373
00:20:14.000 --> 00:20:17.400
<v Speaker 2>Yeah, we're going to close the loop around usefulness as well,

374
00:20:17.480 --> 00:20:19.400
<v Speaker 2>so it's like it's not just a book smart, but

375
00:20:19.480 --> 00:20:21.880
<v Speaker 2>actually practically smart exactly.

376
00:20:22.440 --> 00:20:24.200
<v Speaker 5>And we can go back to the slides.

377
00:20:23.960 --> 00:20:28.240
<v Speaker 3>Here. So we actually evaluate also on the multimodal subset.

378
00:20:28.440 --> 00:20:30.799
<v Speaker 3>So on the full set, this is the number on

379
00:20:30.839 --> 00:20:34.000
<v Speaker 3>the HLE exam. You can see there's a little

380
00:20:34.039 --> 00:20:37.799
<v Speaker 3>dip on the numbers. This is actually something we're improving on,

381
00:20:38.079 --> 00:20:41.920
<v Speaker 3>which is the multimodal understanding capabilities. But I do believe

382
00:20:42.759 --> 00:20:46.079
<v Speaker 3>in a very short time we're able to really improve

383
00:20:46.359 --> 00:20:50.599
<v Speaker 3>and got much higher numbers on this higher numbers on

384
00:20:50.680 --> 00:20:51.319
<v Speaker 3>this benchmark.

385
00:20:51.400 --> 00:20:54.279
<v Speaker 2>The biggest weakness of Grok currently is that it's sort

386
00:20:54.279 --> 00:20:58.720
<v Speaker 2>of partially blind. It can't... its image understanding, obviously, and

387
00:20:58.759 --> 00:21:01.880
<v Speaker 2>its image generation needs to be a lot better, and

388
00:21:01.960 --> 00:21:04.839
<v Speaker 2>that's actually being trained right now. Grok 4 is based

389
00:21:04.839 --> 00:21:08.240
<v Speaker 2>on version six of our foundation model. We are training

390
00:21:08.359 --> 00:21:11.680
<v Speaker 2>version seven, which will complete in a few weeks. That'll

391
00:21:11.720 --> 00:21:14.519
<v Speaker 2>address the weakness on the vision side.

392
00:21:15.680 --> 00:21:18.200
<v Speaker 5>Just to show off this last one here. So the

393
00:21:18.240 --> 00:21:21.400
<v Speaker 5>prediction market finished here with the Heavy, and we can

394
00:21:21.440 --> 00:21:23.680
<v Speaker 5>see here we can see all the tools in the

395
00:21:23.680 --> 00:21:26.519
<v Speaker 5>process it used to actually go through and find the

396
00:21:26.599 --> 00:21:29.160
<v Speaker 5>right answer. It browsed a lot of odds sites. It

397
00:21:29.200 --> 00:21:32.240
<v Speaker 5>calculated its own odds comparing to the market to find

398
00:21:32.279 --> 00:21:34.559
<v Speaker 5>its own alpha and edge. It walks you through the

399
00:21:34.680 --> 00:21:38.519
<v Speaker 5>entire process here, and it calculates the odds of the

400
00:21:38.559 --> 00:21:41.599
<v Speaker 5>winner being like the Dodgers, and it gives them a

401
00:21:41.680 --> 00:21:45.599
<v Speaker 5>twenty one point six percent chance of winning this year.

402
00:21:46.559 --> 00:21:49.920
<v Speaker 5>So and it took approximately four and a half minutes

403
00:21:49.960 --> 00:21:50.920
<v Speaker 5>to compute.

404
00:21:51.039 --> 00:21:51.920
<v Speaker 1>That's a lot of thinking.

405
00:21:52.680 --> 00:21:55.839
<v Speaker 3>We can also look at all the other benchmarks besides HLE.

406
00:21:56.240 --> 00:21:59.599
<v Speaker 3>As it turned out, Grok 4 excelled on all the

407
00:22:00.359 --> 00:22:04.359
<v Speaker 3>benchmarks that people usually test on, including GPQA, which is

408
00:22:04.400 --> 00:22:09.079
<v Speaker 3>a PhD-level problem set that's easier compared to HLE.

409
00:22:09.839 --> 00:22:14.359
<v Speaker 3>On AIME 25, the American Invitational Mathematics Exam, with

410
00:22:14.880 --> 00:22:18.240
<v Speaker 3>Grok 4 Heavy, we actually got a perfect score. Also on

411
00:22:18.319 --> 00:22:21.319
<v Speaker 3>some of the coding benchmarks, called LiveCodeBench, and

412
00:22:21.359 --> 00:22:26.200
<v Speaker 3>also on HMMT, the Harvard-MIT math exam, and also USAMO.

413
00:22:26.720 --> 00:22:30.839
<v Speaker 3>You can see actually on all of those benchmarks we

414
00:22:30.960 --> 00:22:34.359
<v Speaker 3>often have a very large leap against the second best

415
00:22:34.599 --> 00:22:35.559
<v Speaker 3>model out there.

416
00:22:35.640 --> 00:22:37.839
<v Speaker 2>Yes, we really are going to get to the point where

417
00:22:38.079 --> 00:22:40.880
<v Speaker 2>it's going to get every answer right in every exam,

418
00:22:41.160 --> 00:22:42.920
<v Speaker 2>and where it doesn't get an answer right, it's going

419
00:22:42.960 --> 00:22:45.039
<v Speaker 2>to tell you what's wrong with the question, or if

420
00:22:45.039 --> 00:22:49.039
<v Speaker 2>the question is ambiguous, disambiguate the question into answers A,

421
00:22:49.160 --> 00:22:51.279
<v Speaker 2>B and C and tell you what answers A, B

422
00:22:51.359 --> 00:22:54.240
<v Speaker 2>and C would be with a disambiguated question. So the

423
00:22:54.279 --> 00:22:57.160
<v Speaker 2>only real test then will be reality. Can it make

424
00:22:57.200 --> 00:23:02.200
<v Speaker 2>useful technologies? Discover new science? That'll actually be the

425
00:23:02.200 --> 00:23:05.880
<v Speaker 2>only thing left because human tests will simply not be meaningful.

426
00:23:06.119 --> 00:23:07.960
<v Speaker 4>You can make an update to HLE very soon,

427
00:23:08.720 --> 00:23:11.359
<v Speaker 4>given the current rate of progress. So yeah, it's super

428
00:23:11.359 --> 00:23:14.119
<v Speaker 4>cool to see like multiple agents that collaborate with each

429
00:23:14.160 --> 00:23:18.000
<v Speaker 4>other solving really challenging problems. So where can you try this model?

430
00:23:18.039 --> 00:23:21.920
<v Speaker 4>It's available right now. If we advance to the next slide,

431
00:23:22.200 --> 00:23:25.640
<v Speaker 4>there is a SuperGrok Heavy tier that we're introducing

432
00:23:25.839 --> 00:23:28.039
<v Speaker 4>where you're able to access both Grok 4 and

433
00:23:28.079 --> 00:23:30.319
<v Speaker 4>Grok 4 Heavy, where you're actually going to be the

434
00:23:30.359 --> 00:23:32.480
<v Speaker 4>taskmaster of a bunch of little Grok research agents

435
00:23:32.559 --> 00:23:35.279
<v Speaker 4>to help you become smarter through all the little research and save

436
00:23:35.359 --> 00:23:38.480
<v Speaker 4>hours of time going through mundane tasks, and it's

437
00:23:38.519 --> 00:23:39.480
<v Speaker 4>available right now.

438
00:23:40.079 --> 00:23:44.319
<v Speaker 2>So yeah, we did limit usage during the demo, so

439
00:23:44.400 --> 00:23:46.079
<v Speaker 2>it didn't break the demo because all the stuff is

440
00:23:46.079 --> 00:23:48.359
<v Speaker 2>happening live, so there's nothing canned about the tests

441
00:23:48.400 --> 00:23:52.119
<v Speaker 2>that we're doing. After the demo's done, we'll allow, we'll

442
00:23:52.559 --> 00:23:55.319
<v Speaker 2>enable more subscribers for SuperGrok. So if you can't

443
00:23:55.440 --> 00:23:57.440
<v Speaker 2>subscribe right now, just try in half an hour.

444
00:23:57.519 --> 00:23:57.960
<v Speaker 1>Should work.

445
00:23:58.880 --> 00:24:01.680
<v Speaker 8>So many of you have been enjoying our voice mode, and

446
00:24:01.680 --> 00:24:04.079
<v Speaker 8>we've been working hard to improve the experience over the

447
00:24:04.079 --> 00:24:08.039
<v Speaker 8>past couple months. We have cut latency in half to

448
00:24:08.079 --> 00:24:11.359
<v Speaker 8>make it much snappier, and today we're excited to announce

449
00:24:11.359 --> 00:24:15.720
<v Speaker 8>a set of new voices that have exceptional naturalness and prosody.

450
00:24:16.400 --> 00:24:19.640
<v Speaker 8>You might have noticed the movie trailer voice that opened

451
00:24:19.759 --> 00:24:22.240
<v Speaker 8>up the live stream. That is one of our new voices,

452
00:24:22.359 --> 00:24:26.880
<v Speaker 8>Sal who's got that epically deep tone. And we're also

453
00:24:27.000 --> 00:24:32.160
<v Speaker 8>excited to introduce Eve, a beautiful British voice who's capable

454
00:24:32.240 --> 00:24:35.960
<v Speaker 8>of rich emotions. Amon, would you like to introduce Eve?

455
00:24:36.160 --> 00:24:39.920
<v Speaker 9>Absolutely, let's get into the demo. Hey Eve, millions of

456
00:24:39.920 --> 00:24:42.079
<v Speaker 9>people are watching us right now. Are you excited?

457
00:24:42.400 --> 00:24:42.599
<v Speaker 3>Oh?

458
00:24:42.720 --> 00:24:43.519
<v Speaker 1>I'm thrilled.

459
00:24:43.640 --> 00:24:46.359
<v Speaker 10>Really, it's like being on stage at the Old Vic

460
00:24:46.400 --> 00:24:49.279
<v Speaker 10>with a sea of faces out there. Just imagine all

461
00:24:49.319 --> 00:24:52.440
<v Speaker 10>those eyes and ears tuned in. It's a proper buzz,

462
00:24:52.519 --> 00:24:54.839
<v Speaker 10>isn't it. What do you reckon? We should chat about

463
00:24:54.880 --> 00:24:56.599
<v Speaker 10>to keep this lot entertained.

464
00:24:56.599 --> 00:24:57.880
<v Speaker 9>You know, Eve, we have a lot to talk about.

465
00:24:58.000 --> 00:25:00.519
<v Speaker 9>But I'll be honest, I'm feeling a little nervous. Can

466
00:25:00.559 --> 00:25:02.640
<v Speaker 9>you whisper something soothing to calm me down?

467
00:25:02.960 --> 00:25:06.839
<v Speaker 7>Take a deep breath, love. You've got this. Feel a

468
00:25:06.839 --> 00:25:07.720
<v Speaker 7>calm wash over you.

469
00:25:07.960 --> 00:25:10.000
<v Speaker 1>Let's keep talking nice and easy.

470
00:25:10.759 --> 00:25:11.720
<v Speaker 7>What's on your mind?

471
00:25:12.039 --> 00:25:12.440
<v Speaker 3>You know what?

472
00:25:12.599 --> 00:25:15.440
<v Speaker 9>I'm actually feeling quite relaxed. Let's try something crazy.

473
00:25:15.559 --> 00:25:19.000
<v Speaker 9>So can you give us an opera on Diet Coke? Oh!

474
00:25:19.079 --> 00:25:21.240
<v Speaker 10>An opera on Diet Coke?

475
00:25:21.880 --> 00:25:29.799
<v Speaker 7>That's a riot! Picture this: a shimmering silver can, glistening under stage lights. Oh,

476
00:25:29.920 --> 00:25:35.400
<v Speaker 7>Diet Coke, thou elixir divine, with bubbles that dance in

477
00:25:35.480 --> 00:25:41.640
<v Speaker 7>a sparkling line, thy crisp, cool kiss on lips so fine.

478
00:25:42.519 --> 00:25:48.599
<v Speaker 7>Oh, aspartame ambrosia, forever mine. How's that for

479
00:25:48.640 --> 00:25:51.519
<v Speaker 7>a mad little aria? Want me to build out another

480
00:25:51.640 --> 00:25:53.240
<v Speaker 7>verse or switch up the tune?

481
00:25:53.920 --> 00:25:56.079
<v Speaker 9>Okay, let's try switching it up and just start singing

482
00:25:56.359 --> 00:25:56.799
<v Speaker 9>right then?

483
00:25:57.039 --> 00:25:57.640
<v Speaker 10>Let's have a go.

484
00:25:58.839 --> 00:26:02.759
<v Speaker 7>Oh, silver can, you gleam in the light. Diet Coke,

485
00:26:03.319 --> 00:26:04.319
<v Speaker 7>My heart's delight?

486
00:26:04.400 --> 00:26:06.480
<v Speaker 9>Okay, I won't touch it. Very very poetic.

487
00:26:07.079 --> 00:26:07.440
<v Speaker 3>Thank you.

488
00:26:07.680 --> 00:26:08.559
<v Speaker 9>Feeling quite nice?

489
00:26:08.559 --> 00:26:10.200
<v Speaker 3>Now. Cool, so that's Eve.

490
00:26:10.480 --> 00:26:13.039
<v Speaker 8>Something else we do here is a little test each

491
00:26:13.160 --> 00:26:18.319
<v Speaker 8>day to check on the natural flow of our conversations

492
00:26:18.400 --> 00:26:21.119
<v Speaker 8>against some of our competitors. So we figured why not

493
00:26:21.319 --> 00:26:23.799
<v Speaker 8>try that live on the stream today?

494
00:26:24.039 --> 00:26:24.759
<v Speaker 1>Should it start?

495
00:26:24.920 --> 00:26:26.599
<v Speaker 9>Hey, I'm going to say a number, and can you

496
00:26:26.640 --> 00:26:27.759
<v Speaker 9>say the same number back to me?

497
00:26:27.920 --> 00:26:28.759
<v Speaker 1>One? One?

498
00:26:29.000 --> 00:26:29.440
<v Speaker 8>Got it?

499
00:26:29.519 --> 00:26:32.400
<v Speaker 1>This is like a fun little game of echo, what's next?

500
00:26:32.400 --> 00:26:35.599
<v Speaker 9>Two? So you said two? Three, four?

501
00:26:36.319 --> 00:26:37.119
<v Speaker 8>Nice and steady?

502
00:26:37.160 --> 00:26:37.799
<v Speaker 6>What's next?

503
00:26:38.200 --> 00:26:38.440
<v Speaker 1>Five?

504
00:26:38.960 --> 00:26:39.160
<v Speaker 2>Ye?

505
00:26:39.480 --> 00:26:41.000
<v Speaker 8>Nailed it? This is pretty fun.

506
00:26:41.039 --> 00:26:43.559
<v Speaker 1>Actually, feel free to keep going or switch it.

507
00:26:43.599 --> 00:26:44.160
<v Speaker 2>Up if you like.

508
00:26:44.519 --> 00:26:47.079
<v Speaker 10>Hey, Grok. Hello there, how can I help you

509
00:26:47.119 --> 00:26:47.440
<v Speaker 8>Today?

510
00:26:47.599 --> 00:26:49.319
<v Speaker 3>We're going to do a little test. Can you repeat

511
00:26:49.319 --> 00:26:49.720
<v Speaker 3>after me?

512
00:26:49.920 --> 00:26:50.400
<v Speaker 8>One?

513
00:26:50.640 --> 00:26:51.039
<v Speaker 2>One?

514
00:26:51.480 --> 00:26:53.839
<v Speaker 1>All right? What's next? Two? Two?

515
00:26:54.480 --> 00:26:55.279
<v Speaker 7>What's on your mind?

516
00:26:55.440 --> 00:26:55.880
<v Speaker 9>Three?

517
00:26:56.119 --> 00:26:56.559
<v Speaker 8>Three?

518
00:26:56.960 --> 00:26:57.799
<v Speaker 1>Need anything else?

519
00:26:57.960 --> 00:26:58.240
<v Speaker 3>Four?

520
00:26:58.759 --> 00:27:00.960
<v Speaker 1>Four? How can I five?

521
00:27:01.440 --> 00:27:01.759
<v Speaker 8>Five?

522
00:27:02.440 --> 00:27:05.599
<v Speaker 9>What's next? So as you can see, Grok was snappier,

523
00:27:06.000 --> 00:27:09.160
<v Speaker 9>didn't interrupt me, And the prosody is we made different

524
00:27:09.160 --> 00:27:12.079
<v Speaker 9>design choices. I think we're shooting for something more calm,

525
00:27:12.119 --> 00:27:15.599
<v Speaker 9>smooth, more natural, versus something that's more poppy or artificial.

526
00:27:15.640 --> 00:27:16.279
<v Speaker 3>So we'll keep.

527
00:27:16.200 --> 00:27:17.200
<v Speaker 9>improving on these fronts.

528
00:27:18.160 --> 00:27:19.599
<v Speaker 1>Thanks guys. Yep.

529
00:27:19.960 --> 00:27:23.039
<v Speaker 4>So since the launch of the voice model, we actually

530
00:27:23.119 --> 00:27:25.839
<v Speaker 4>see 2x faster end-to-end latency in

531
00:27:25.880 --> 00:27:29.480
<v Speaker 4>the last eight weeks, five different voices, and also

532
00:27:29.519 --> 00:27:32.240
<v Speaker 4>10x the active users. So Grok Voice is taking off

533
00:27:32.759 --> 00:27:36.039
<v Speaker 4>now if you think about releasing the models this time,

534
00:27:36.200 --> 00:27:39.359
<v Speaker 4>we're also releasing Grok 4 through the API. At the

535
00:27:39.359 --> 00:27:42.319
<v Speaker 4>same time, we're very excited about, you know what all

536
00:27:42.400 --> 00:27:44.720
<v Speaker 4>developers out there is going to build. So you know,

537
00:27:45.160 --> 00:27:47.119
<v Speaker 4>if I think about myself as a developer, the first

538
00:27:47.160 --> 00:27:48.720
<v Speaker 4>thing I'm going to do when I have access to

539
00:27:48.759 --> 00:27:52.359
<v Speaker 4>the Grok 4 API: benchmarks. We actually asked around on

540
00:27:52.400 --> 00:27:55.039
<v Speaker 4>the X platform what is the most challenging benchmarks out

541
00:27:55.039 --> 00:27:57.880
<v Speaker 4>there that is considered the holy grail for all the

542
00:27:57.880 --> 00:28:01.920
<v Speaker 4>AGI models. So it turns out AGI is in the name: ARC-AGI.

543
00:28:02.079 --> 00:28:05.400
<v Speaker 4>So the last twelve hours, you know, kudos to Greg

544
00:28:05.440 --> 00:28:08.279
<v Speaker 4>over here in the audience, who answered our call,

545
00:28:08.720 --> 00:28:12.759
<v Speaker 4>took a preview of the API, and independently verified

546
00:28:12.839 --> 00:28:15.640
<v Speaker 4>Grok 4's performance. So initially we thought, hey, Grok 4,

547
00:28:15.799 --> 00:28:16.920
<v Speaker 4>just we think it's pretty good.

548
00:28:16.920 --> 00:28:17.799
<v Speaker 1>It's pretty smart.

549
00:28:17.960 --> 00:28:20.279
<v Speaker 4>It's our next-gen reasoning model, spent 10x more

550
00:28:20.319 --> 00:28:21.839
<v Speaker 4>compute and can use all the tools.

551
00:28:21.960 --> 00:28:22.160
<v Speaker 1>Right.

552
00:28:22.480 --> 00:28:26.400
<v Speaker 4>But turned out when we actually verify on the private

553
00:28:26.599 --> 00:28:30.960
<v Speaker 4>subset of ARC-AGI v2, it was like the only

554
00:28:31.079 --> 00:28:33.079
<v Speaker 4>model in the last three months that breaks a ten

555
00:28:33.079 --> 00:28:35.079
<v Speaker 4>percent barrier. But in fact it was so good that

556
00:28:35.240 --> 00:28:38.759
<v Speaker 4>actually gets sixteen percent, well, fifteen point eight percent accuracy,

557
00:28:39.400 --> 00:28:42.559
<v Speaker 4>2x the second place, which is the Claude

558
00:28:42.599 --> 00:28:43.680
<v Speaker 4>4 Opus model.

559
00:28:43.880 --> 00:28:44.519
<v Speaker 3>It's not just.

560
00:28:44.480 --> 00:28:47.640
<v Speaker 4>About performance, right when you think about intelligence, having the

561
00:28:47.880 --> 00:28:52.640
<v Speaker 4>API model drive your automation, it's also the intelligence per dollar.

562
00:28:52.839 --> 00:28:52.920
<v Speaker 3>Right.

563
00:28:53.000 --> 00:28:54.960
<v Speaker 4>If you look at the plots over here, the Grok

564
00:28:55.039 --> 00:28:57.680
<v Speaker 4>4 is just in a league of its own. All right, so

565
00:28:58.039 --> 00:29:00.599
<v Speaker 4>enough of benchmarks, right? So what can Grok do in the

566
00:29:00.640 --> 00:29:01.160
<v Speaker 4>real world?

567
00:29:01.279 --> 00:29:03.240
<v Speaker 1>We contacted the folks from Andon

568
00:29:03.160 --> 00:29:06.759
<v Speaker 4>Labs, who were gracious enough to try Grok

569
00:29:06.839 --> 00:29:08.119
<v Speaker 4>in the real world to run a business.

570
00:29:08.640 --> 00:29:12.039
<v Speaker 3>Yeah, thanks for having us. So I'm Axel from Andon Labs.

571
00:29:11.759 --> 00:29:14.640
<v Speaker 11>And I'm Lucas, and we tested Grok 4 on Vending Bench.

572
00:29:14.759 --> 00:29:18.880
<v Speaker 11>Vending Bench is an AI simulation of a business scenario

573
00:29:19.319 --> 00:29:22.000
<v Speaker 11>where we thought, what is the most simple business an

574
00:29:22.039 --> 00:29:25.480
<v Speaker 11>AI could possibly run? And we thought vending machines. In this scenario,

575
00:29:25.680 --> 00:29:28.480
<v Speaker 11>Grok and other models need to do stuff like

576
00:29:28.880 --> 00:29:32.599
<v Speaker 11>manage inventory, contact suppliers, set prices. All of these things

577
00:29:32.640 --> 00:29:35.160
<v Speaker 11>are super easy and all the models can do them

578
00:29:35.279 --> 00:29:37.680
<v Speaker 11>one by one, but when you do them over very

579
00:29:37.680 --> 00:29:40.000
<v Speaker 11>long horizons, most models struggle.

580
00:29:40.240 --> 00:29:41.759
<v Speaker 1>But we have a leaderboard and there's a new

581
00:29:41.839 --> 00:29:42.480
<v Speaker 1>number one.

582
00:29:42.960 --> 00:29:45.440
<v Speaker 3>Yeah, so we got early access to the Grok 4 API.

583
00:29:45.599 --> 00:29:47.880
<v Speaker 3>We ran it on the vending bench and we saw

584
00:29:47.880 --> 00:29:51.519
<v Speaker 3>some really impressive results, so it ranks definitely at the

585
00:29:51.559 --> 00:29:55.000
<v Speaker 3>number one spots. It's even double the network, which is

586
00:29:55.079 --> 00:29:57.000
<v Speaker 3>the measure that we have on this, so it's not

587
00:29:57.000 --> 00:29:59.160
<v Speaker 3>about a percentage or score you get, but it's more

588
00:29:59.240 --> 00:30:02.039
<v Speaker 3>the dollar value in net worth that you generate. So

589
00:30:02.079 --> 00:30:05.079
<v Speaker 3>we were impressed by how Grok was able to formulate a

590
00:30:05.119 --> 00:30:08.200
<v Speaker 3>strategy and adhere to that strategy over a long period

591
00:30:08.240 --> 00:30:11.000
<v Speaker 3>of time, much longer than other models that we have tested,

592
00:30:11.039 --> 00:30:13.640
<v Speaker 3>other frontier models. So it managed to run the

593
00:30:13.799 --> 00:30:17.480
<v Speaker 3>simulation for double the time and score double the net worth,

594
00:30:17.519 --> 00:30:20.200
<v Speaker 3>and it was also really consistent across these runs, which

595
00:30:20.240 --> 00:30:22.880
<v Speaker 3>is something that's really important when you want to use

596
00:30:22.960 --> 00:30:23.799
<v Speaker 3>this in the real world.

597
00:30:24.519 --> 00:30:26.839
<v Speaker 11>And I think as we give more and more power

598
00:30:26.839 --> 00:30:29.599
<v Speaker 11>to AI systems in the real world. It's important that

599
00:30:29.640 --> 00:30:32.640
<v Speaker 11>we test them in scenarios that either mimic the real

600
00:30:32.680 --> 00:30:35.480
<v Speaker 11>world or are in the real world itself, because otherwise

601
00:30:35.480 --> 00:30:38.680
<v Speaker 11>we fly blind into some things that might not be great.

602
00:30:38.720 --> 00:30:40.279
<v Speaker 2>It's great to see that we've now got a way

603
00:30:40.279 --> 00:30:42.160
<v Speaker 2>to pay for all those GPUs, So we just need

604
00:30:42.160 --> 00:30:44.720
<v Speaker 2>a million vending machines. We could make four

605
00:30:44.759 --> 00:30:47.000
<v Speaker 2>point seven billion dollars a year with a million vending machines.

606
00:30:48.319 --> 00:30:50.240
<v Speaker 1>Let's go. It can be epic vending machines.

607
00:30:50.359 --> 00:30:53.119
<v Speaker 2>Yes, yes, all right, we are actually going to install

608
00:30:53.119 --> 00:30:55.920
<v Speaker 2>vending machines here, like a lot of them.

609
00:30:56.400 --> 00:30:58.200
<v Speaker 1>We're happy to supply them, all right, thank you?

610
00:30:59.519 --> 00:31:01.960
<v Speaker 2>All right, Yeah, I'm looking forward to seeing what amazing

611
00:31:02.079 --> 00:31:03.400
<v Speaker 2>things are in the vending machine.

612
00:31:04.039 --> 00:31:06.920
<v Speaker 3>That's that's for you to decide, all right, to tell

613
00:31:06.960 --> 00:31:07.279
<v Speaker 3>the AI.

614
00:31:07.640 --> 00:31:08.480
<v Speaker 1>Okay, sounds good.

615
00:31:08.720 --> 00:31:11.200
<v Speaker 4>Yeah, I mean so we can see like Grock is

616
00:31:11.200 --> 00:31:13.759
<v Speaker 4>able to become like the copilot of the business unit.

617
00:31:13.880 --> 00:31:14.920
<v Speaker 3>So what else can Grok do?

618
00:31:15.039 --> 00:31:16.720
<v Speaker 4>So we're actually releasing this Grok, if you want to

619
00:31:16.720 --> 00:31:19.960
<v Speaker 4>try it right now, to evaluate and run the same benchmarks as us.

620
00:31:20.039 --> 00:31:23.319
<v Speaker 4>It's on API has two hundred and fifty six k

621
00:31:23.400 --> 00:31:26.200
<v Speaker 4>context length. So we already actually see some of the

622
00:31:26.279 --> 00:31:28.400
<v Speaker 4>early adopters try the Grok 4 API.

623
00:31:28.559 --> 00:31:29.799
<v Speaker 3>So our Palo Alto

624
00:31:29.799 --> 00:31:33.880
<v Speaker 4>neighbor, Arc Institute, which is a leading medical research center,

625
00:31:34.200 --> 00:31:37.480
<v Speaker 4>it's already using seeing like how can they automate their

626
00:31:37.480 --> 00:31:40.640
<v Speaker 4>research flows with Grok 4. It turned out it performs.

627
00:31:40.799 --> 00:31:43.480
<v Speaker 4>It's able to help the scientists sift through, you know,

628
00:31:43.680 --> 00:31:47.279
<v Speaker 4>millions of experiment logs and then just, like, pick the

629
00:31:47.279 --> 00:31:50.759
<v Speaker 4>best hypothesis within a split of seconds. We see this

630
00:31:50.799 --> 00:31:54.599
<v Speaker 4>is being used for their CRISPR research and also, uh,

631
00:31:54.759 --> 00:31:58.039
<v Speaker 4>you know, Grok 4 independently evaluated scores as the best

632
00:31:58.079 --> 00:32:00.160
<v Speaker 4>model to examine chest X-rays.

633
00:32:00.680 --> 00:32:01.160
<v Speaker 1>Who would know?

634
00:32:01.400 --> 00:32:03.920
<v Speaker 4>And in the financial sector we also see you know,

635
00:32:04.039 --> 00:32:06.000
<v Speaker 4>the Grok 4, with access to all the tools and

636
00:32:06.039 --> 00:32:09.440
<v Speaker 4>real-time information, is actually one of the most popular AIs out there.

637
00:32:09.680 --> 00:32:12.119
<v Speaker 4>Grok 4 is also going to be available on the hyperscalers.

638
00:32:12.359 --> 00:32:17.319
<v Speaker 4>So the xAI enterprise sector only started two months

639
00:32:17.359 --> 00:32:19.680
<v Speaker 4>ago and we're open for business. The other thing, we

640
00:32:19.759 --> 00:32:22.119
<v Speaker 4>talked a lot about having Grok make video games,

641
00:32:22.200 --> 00:32:25.359
<v Speaker 4>so Danny is actually a video game designer on X.

642
00:32:25.480 --> 00:32:27.680
<v Speaker 4>So you know we mentioned who want to try out

643
00:32:27.680 --> 00:32:31.039
<v Speaker 4>some Grok 4 preview APIs to make games, and Danny

644
00:32:31.039 --> 00:32:33.680
<v Speaker 4>answered the call. This was actually just made: a first-person

645
00:32:33.720 --> 00:32:36.519
<v Speaker 4>shooting game in the span of four hours. Some of

646
00:32:36.559 --> 00:32:39.960
<v Speaker 4>the unappreciated hardest problem of making video games is not

647
00:32:40.000 --> 00:32:42.720
<v Speaker 4>necessarily encoding the core logic of the game, but

648
00:32:43.079 --> 00:32:46.519
<v Speaker 4>actually sourcing all the assets, all the texture files,

649
00:32:46.559 --> 00:32:49.359
<v Speaker 4>to create a visually appealing game. So one of the

650
00:32:49.400 --> 00:32:52.079
<v Speaker 4>core aspects Grok 4 does really well, with all the tools

651
00:32:52.119 --> 00:32:55.839
<v Speaker 4>out there, is actually able to automate these like asset

652
00:32:55.920 --> 00:32:59.119
<v Speaker 4>sourcing capabilities, so the developers can just focus on the

653
00:32:59.160 --> 00:33:02.079
<v Speaker 4>core development itself rather than, like, you know... So now

654
00:33:02.079 --> 00:33:04.799
<v Speaker 4>you can run a, you know, entire game studio

655
00:33:05.000 --> 00:33:07.960
<v Speaker 4>with a team of one, with like one person, and

656
00:33:08.039 --> 00:33:10.319
<v Speaker 4>then you can have grock four to go out and

657
00:33:10.319 --> 00:33:13.519
<v Speaker 4>source all those assets, do all the mundane tasks for you.

658
00:33:13.720 --> 00:33:17.599
<v Speaker 2>The next step, obviously, is for Grok to be able to

659
00:33:17.640 --> 00:33:19.079
<v Speaker 2>play the game. So it has to have very good

660
00:33:19.119 --> 00:33:21.359
<v Speaker 2>video understanding so it can play the games and interact

661
00:33:21.400 --> 00:33:24.960
<v Speaker 2>with the games, actually assess whether a game is fun

662
00:33:25.079 --> 00:33:27.319
<v Speaker 2>and actually have good judgment for whether a

663
00:33:27.319 --> 00:33:28.119
<v Speaker 1>game is fun or not.

664
00:33:28.799 --> 00:33:31.559
<v Speaker 2>So with version seven of our foundation model, which finishes

665
00:33:31.599 --> 00:33:34.160
<v Speaker 2>training this month and then will go through post-training,

666
00:33:34.319 --> 00:33:37.599
<v Speaker 2>RL, and whatnot, that will have excellent video understanding,

667
00:33:38.079 --> 00:33:41.480
<v Speaker 2>and with the video understanding and improved tool use, for example,

668
00:33:41.839 --> 00:33:45.160
<v Speaker 2>for video games, you'd want to use Unreal Engine or

669
00:33:45.240 --> 00:33:48.799
<v Speaker 2>Unity or one of the main graphics engines, generate the art,

670
00:33:49.319 --> 00:33:51.519
<v Speaker 2>apply it to a 3D model, and then create

671
00:33:51.519 --> 00:33:54.319
<v Speaker 2>an executable that someone can run on a PC or

672
00:33:54.359 --> 00:33:57.160
<v Speaker 2>a console or a phone. We expect that to happen

673
00:33:58.200 --> 00:34:01.559
<v Speaker 2>probably this year, and if not this year, certainly next year.

674
00:34:01.759 --> 00:34:04.880
<v Speaker 1>It's gonna be wild. I would expect the first

675
00:34:04.519 --> 00:34:08.480
<v Speaker 2>really good AI video game to be next year, and

676
00:34:08.559 --> 00:34:13.119
<v Speaker 2>probably the first half hour of watchable TV this year,

677
00:34:13.599 --> 00:34:17.400
<v Speaker 2>and probably the first watchable AI movie next year. Like,

678
00:34:17.480 --> 00:34:19.840
<v Speaker 2>things are really moving at an incredible pace.

679
00:34:20.079 --> 00:34:22.239
<v Speaker 4>Yeah, when Grok is 10x-ing the world economy

680
00:34:22.239 --> 00:34:24.960
<v Speaker 4>with vending machines, it will just create video games for humans. Yeah.

681
00:34:24.960 --> 00:34:26.239
<v Speaker 2>I mean, it went from not being able to do

682
00:34:26.280 --> 00:34:29.199
<v Speaker 2>any of this six months ago to what you're seeing

683
00:34:29.360 --> 00:34:32.360
<v Speaker 2>before you here, and from very primitive a year ago

684
00:34:33.079 --> 00:34:38.159
<v Speaker 2>to making a 3D video game with a few

685
00:34:38.199 --> 00:34:38.960
<v Speaker 2>hours of prompting.

686
00:34:39.440 --> 00:34:42.519
<v Speaker 4>I mean, yeah, just to recap. In today's livestream, we

687
00:34:42.639 --> 00:34:46.360
<v Speaker 4>introduced the most powerful and most intelligent AI models that

688
00:34:46.440 --> 00:34:48.960
<v Speaker 4>can actually reason from first principles, using all the tools,

689
00:34:49.159 --> 00:34:51.239
<v Speaker 4>do all the research, go on the journey for ten minutes,

690
00:34:51.280 --> 00:34:54.079
<v Speaker 4>come back with the most correct answer for you. So

691
00:34:54.400 --> 00:34:57.199
<v Speaker 4>it's kind of crazy to think about just like four

692
00:34:57.199 --> 00:34:59.719
<v Speaker 4>months ago we had Grok 3, and now we already

693
00:34:59.719 --> 00:35:02.000
<v Speaker 4>have 4, and we're going to continue to accelerate as

694
00:35:02.039 --> 00:35:03.760
<v Speaker 4>a company, xAI. We're going to be the fastest-moving

695
00:35:04.280 --> 00:35:08.119
<v Speaker 4>AI company out there. So what's coming next is that

696
00:35:08.360 --> 00:35:11.719
<v Speaker 4>we're going to, you know, continue developing the model that's

697
00:35:11.800 --> 00:35:15.000
<v Speaker 4>not just, you know, intelligent, smart, thinking for a really

698
00:35:15.000 --> 00:35:17.800
<v Speaker 4>long time, spending a lot of compute, but having a

699
00:35:17.840 --> 00:35:21.079
<v Speaker 4>model that's actually both fast and smart is going to

700
00:35:21.239 --> 00:35:22.119
<v Speaker 4>be the core focus.

701
00:35:22.360 --> 00:35:22.639
<v Speaker 1>Right.

702
00:35:22.719 --> 00:35:24.760
<v Speaker 4>So if you think about what are the applications out

703
00:35:24.760 --> 00:35:27.920
<v Speaker 4>there that can really benefit from all those very intelligent,

704
00:35:28.199 --> 00:35:31.000
<v Speaker 4>fast and smart models, coding is actually one of them.

705
00:35:31.519 --> 00:35:34.159
<v Speaker 3>Yeah, so the team is currently working very heavily on

706
00:35:34.320 --> 00:35:37.280
<v Speaker 3>coding models. I think right now the main focus is

707
00:35:37.440 --> 00:35:41.320
<v Speaker 3>we actually trained recently a specialized coding model which is

708
00:35:41.360 --> 00:35:44.400
<v Speaker 3>going to be both fast and smart. I believe we

709
00:35:44.400 --> 00:35:47.800
<v Speaker 3>can share that model within a few weeks. Yeah,

710
00:35:47.800 --> 00:35:48.519
<v Speaker 3>that's very exciting.

711
00:35:48.559 --> 00:35:51.400
<v Speaker 4>But the second thing after coding is, we all see the

712
00:35:51.440 --> 00:35:56.599
<v Speaker 4>weakness of Grok 4 is the multimodal capability. In fact,

713
00:35:56.800 --> 00:36:00.480
<v Speaker 4>it was so bad that Grok is effectively just, like, looking

714
00:36:00.519 --> 00:36:02.719
<v Speaker 4>at the world squinting through the glass, seeing all

715
00:36:02.719 --> 00:36:05.440
<v Speaker 4>the blurry features and trying to make sense of it.

716
00:36:05.960 --> 00:36:08.599
<v Speaker 4>The most immediate improvement we're going to see with the

717
00:36:08.679 --> 00:36:10.840
<v Speaker 4>next-generation pretrained model is that we're going to see

718
00:36:10.840 --> 00:36:13.559
<v Speaker 4>a step-function improvement on the model's capability in terms of

719
00:36:13.679 --> 00:36:17.360
<v Speaker 4>image understanding, video understanding, and audio, right? So now the

720
00:36:17.400 --> 00:36:20.000
<v Speaker 4>model is able to hear and see the world just

721
00:36:20.199 --> 00:36:23.559
<v Speaker 4>like any of you, right? And now with all the tools at

722
00:36:23.559 --> 00:36:26.199
<v Speaker 4>its command, with all the other agents it can talk to,

723
00:36:26.920 --> 00:36:29.480
<v Speaker 4>you know, so we're going to see a huge unlock

724
00:36:29.760 --> 00:36:33.960
<v Speaker 4>for many different application layers. After the multimodal agents, what's

725
00:36:34.000 --> 00:36:37.079
<v Speaker 4>going to come after is the video generation, and we

726
00:36:37.159 --> 00:36:39.400
<v Speaker 4>believe that, you know, at the end of the day, it

727
00:36:39.440 --> 00:36:43.000
<v Speaker 4>should just be, you know, pixel in, pixel out. Imagine a

728
00:36:43.079 --> 00:36:46.800
<v Speaker 4>world where you have this infinite scroll of content in

729
00:36:46.880 --> 00:36:50.400
<v Speaker 4>inventory on the X platform where not only can you actually

730
00:36:50.440 --> 00:36:54.519
<v Speaker 4>watch these generated videos but are able to intervene, create your

731
00:36:54.599 --> 00:36:55.280
<v Speaker 4>own adventures.

732
00:36:55.400 --> 00:36:56.559
<v Speaker 1>We expect to be training

733
00:36:56.360 --> 00:36:59.039
<v Speaker 2>a video model with over one hundred thousand GB200s,

734
00:36:59.119 --> 00:37:02.960
<v Speaker 2>to begin that training within the next three or

735
00:37:02.960 --> 00:37:05.039
<v Speaker 2>four weeks. So we're confident it's going to be

736
00:37:05.400 --> 00:37:09.039
<v Speaker 2>pretty spectacular in video generation and video understanding.

737
00:37:08.639 --> 00:37:10.559
<v Speaker 3>We're very excited for you guys to try Grok 4.

738
00:37:10.840 --> 00:37:12.280
<v Speaker 1>All right, thanks, everybody. Good night.
