WEBVTT

1
00:00:00.120 --> 00:00:03.960
Eight eight seven. Tony even stays
awake all night, twenty four hours a

2
00:00:04.000 --> 00:00:07.639
day, seven days a week,
so you can sleep better and rest easy.

3
00:00:08.320 --> 00:00:12.240
South Pacific sleep left, start feeling
better and get in a great night

4
00:00:12.320 --> 00:00:21.199
of sleep. Today. You're on
board caseyaas Inland Express caseyaa home, Linda

5
00:00:21.440 --> 00:00:32.719
en fifty am the station that needs
notice here behind the information economy has

6
00:00:32.840 --> 00:00:38.240
arrived. The world is teeming with
innovation as new business models reinvent every industry

7
00:00:38.280 --> 00:00:43.600
Inside Analysis is your source
of information and insight about how to make

8
00:00:43.640 --> 00:00:48.439
the most of this exciting new era. Learn more at inside analysis dot com,

9
00:00:48.520 --> 00:00:53.039
Inside analysis dot com. And now
here's your host, Eric Kavanagh.

10
00:00:57.679 --> 00:01:00.039
All right, ladies and gentlemen,
Hello and welcome back once again to the only

11
00:01:00.159 --> 00:01:04.680
coast to coast show all about the
information economy. It's called Inside Analysis,

12
00:01:04.760 --> 00:01:08.000
yours truly, Eric Kavanagh, here with
one of my best friends in the business,

13
00:01:08.079 --> 00:01:12.079
a guy who's been around for a
couple of years now and has made

14
00:01:12.079 --> 00:01:18.040
his way into all sorts of different
data scenarios, data products, data concepts,

15
00:01:18.120 --> 00:01:22.599
data models, data designs. Mister
Kent Graziano, I asked him before

16
00:01:22.599 --> 00:01:26.680
we started, does that mean?
Does that mean thank you very much in

17
00:01:26.719 --> 00:01:33.040
Italian? Because of course, grazie mille. But
it means grace or beloved or dear,

18
00:01:33.560 --> 00:01:38.599
so Kent "dear" Graziano. How's it
going out there, buddy? Good.

19
00:01:38.640 --> 00:01:41.760
How are you doing, Eric? I'm
doing good. So yeah, folks,

20
00:01:41.799 --> 00:01:46.200
I had this idea to do a
series called Profiles in Data, because it's

21
00:01:46.200 --> 00:01:48.200
all about the people when you get
right down to it. I mean,

22
00:01:48.239 --> 00:01:51.319
of course the data is important,
and the tools and the technologies and the

23
00:01:51.359 --> 00:01:53.920
methods and how to do all that
stuff. But at the end of the

24
00:01:53.000 --> 00:01:57.000
day, even in this era of
AI, it's still about the people.

25
00:01:57.239 --> 00:02:00.359
And it's always going to be about
the people. I mean, the jobs

26
00:02:00.359 --> 00:02:02.680
are going to change. I think
we're going to go through a very disruptive

27
00:02:02.719 --> 00:02:07.159
period. It's already happening from what
I've seen. It's already taking place.

28
00:02:07.719 --> 00:02:09.360
There are lots of reasons for it, but the one thing you can

29
00:02:09.360 --> 00:02:14.199
count on is the wisdom of sages. And so, Kent, I'm going to

30
00:02:14.240 --> 00:02:16.599
throw you in that category as a
data sage. You've been around a long

31
00:02:16.599 --> 00:02:22.319
time, you've seen the ups and
downs, and we're in a very different

32
00:02:22.360 --> 00:02:23.680
world right now. You know,
I'm going to throw kind of a curveball

33
00:02:23.719 --> 00:02:29.080
at you to start. I
feel like these large language models represent

34
00:02:29.199 --> 00:02:35.240
a very serious inflection point in the
history of business, quite frankly, but

35
00:02:35.560 --> 00:02:39.680
primarily of the data world. And
I'll explain why. So in the very

36
00:02:39.719 --> 00:02:44.520
earliest days of DM Radio, I
remember wrapping my head around ETL, all

37
00:02:44.560 --> 00:02:47.280
this ETL moving around and around and
around and around, all these batch windows

38
00:02:47.280 --> 00:02:51.280
being hit, and I thought to
myself, this is crazy. Like,

39
00:02:51.680 --> 00:02:54.360
You're probably moving data that doesn't get
used. You're probably moving the same

40
00:02:54.439 --> 00:02:59.960
data multiple times. You're probably sometimes
overwriting good data with bad. I mean,

41
00:03:00.000 --> 00:03:01.759
and there just must be so many
things happening here on the data front.

42
00:03:02.000 --> 00:03:07.159
But business just demands things. The
business demands data, it demands execution,

43
00:03:07.240 --> 00:03:09.599
it demands performance. Business people say
we got to get our data in there.

44
00:03:09.639 --> 00:03:13.080
Okay, fine, let's move it
in there. Oh, we didn't get the

45
00:03:13.120 --> 00:03:16.000
schema right, and then things changed, right? Like with Snowflake, and you

46
00:03:16.039 --> 00:03:20.800
know a lot about Snowflake. Maybe
we'll start there. One thing that I

47
00:03:20.840 --> 00:03:25.680
thought they really solved and really figured
out was that schemas change because the world

48
00:03:25.800 --> 00:03:30.120
changes. So you want to be
able to change the schema of your warehouse.

49
00:03:30.479 --> 00:03:34.240
Well, in the pre-Snowflake days, that was a huge problem.

50
00:03:34.280 --> 00:03:36.879
I mean, you know, you'd
be asking for lots of trouble, a

51
00:03:36.879 --> 00:03:39.800
lot of money, a lot of
very unhappy engineers, like lots of bad

52
00:03:39.840 --> 00:03:44.240
things would happen, and Snowflake's like, no, let's make it easy to

53
00:03:44.319 --> 00:03:46.520
pull this thing down, change the
schema, and then spin it back up again.

54
00:03:47.280 --> 00:03:51.000
Is that your take as well? That was a very big change

55
00:03:51.039 --> 00:03:53.840
in the industry. Yeah, I mean, Snowflake was

56
00:03:54.960 --> 00:03:59.400
to use the standard
phrase, a game changer, certainly in the

57
00:03:59.479 --> 00:04:06.199
data warehouse world because of a number
of the features that had been architected into

58
00:04:06.400 --> 00:04:11.759
the platform, right? And, you know, what
you're talking about there, which made a

59
00:04:11.840 --> 00:04:15.240
huge difference and still obviously does to
a lot of people, was their zero-copy

60
00:04:15.240 --> 00:04:20.240
clone. The ability to take an
existing schema and with one command clone the

61
00:04:20.399 --> 00:04:27.079
entire thing and basically make a complete
replica of it with no additional storage costs.

62
00:04:27.120 --> 00:04:30.160
That was the thing, because that
was something I couldn't do in the

63
00:04:30.240 --> 00:04:34.519
data warehousing world if I wanted to
have a dev instance that had the same

64
00:04:34.560 --> 00:04:40.959
amount of data as my prod instance. I never ever in my career was

65
00:04:41.000 --> 00:04:45.399
able to get that because the cost
was too much and the time, you

66
00:04:45.399 --> 00:04:49.959
know, once you started getting into
terabytes of data to replicate that on storage

67
00:04:50.000 --> 00:04:58.000
with the systems at the time,
that could take days, weeks sometimes depending

68
00:04:58.000 --> 00:05:02.079
on the system, and you really
couldn't get that, which meant you couldn't actually innovate

69
00:05:02.879 --> 00:05:09.279
with confidence that you weren't going to
break something. Right, right. You couldn't test

70
00:05:09.319 --> 00:05:11.519
it. It's like, oh,
the query works great. Yeah, well

71
00:05:11.560 --> 00:05:15.639
you've got one tenth of the data
in your QA instance as you do in

72
00:05:15.680 --> 00:05:17.399
production. And then it goes to
production and the users are screaming, going,

73
00:05:18.160 --> 00:05:20.600
I can't live with this, this
is too slow. And

74
00:05:20.639 --> 00:05:26.720
now you're doing, you know, again, additional iterations and refactoring to figure out

75
00:05:26.720 --> 00:05:28.920
how can I tune it to make
it better, how can I make it

76
00:05:28.959 --> 00:05:31.160
faster? And you know, the
users are already mad at you because you

77
00:05:31.199 --> 00:05:33.879
gave them, you told them you
had what they wanted, and they're like,

78
00:05:33.920 --> 00:05:38.959
well, this is unusable, it's
too slow. And the zero copy

79
00:05:39.000 --> 00:05:42.560
cloning thing, that was like, wow, this is awesome. This means

80
00:05:42.600 --> 00:05:46.480
I can actually not shoot myself in
the foot before I move something into production

81
00:05:46.879 --> 00:05:49.959
and have to do these cycles again. We could do real user acceptance testing

82
00:05:50.839 --> 00:05:57.079
without this exorbitant cost or you know, waiting weeks for the infrastructure team to

83
00:05:57.079 --> 00:06:01.439
put more disk online and for the
DBAs to, you know, restore something from

84
00:06:01.480 --> 00:06:05.079
backup, assuming the backup tape was
actually good, right, you know that

85
00:06:05.160 --> 00:06:08.959
they could do all of that.
So that really did change a

86
00:06:09.000 --> 00:06:14.160
lot for a lot of companies out
there. And then, of course, the

87
00:06:14.199 --> 00:06:18.040
whole separation of compute from storage that
allowed you to scale the storage if you

88
00:06:18.120 --> 00:06:23.399
needed without scaling the compute, and
scale the compute without scaling the storage just

89
00:06:23.439 --> 00:06:27.720
made it so flexible. It just
changed. It changed so many things.

90
00:06:28.480 --> 00:06:31.000
Then you throw in there the variant
data type that allowed you to load JSON.

91
00:06:31.199 --> 00:06:35.000
So you talk about schema changes.
You know, we actually got real

92
00:06:35.120 --> 00:06:41.839
schema-on-read: load a JSON
document into a table in a relational database and

93
00:06:41.879 --> 00:06:46.360
access it with SQL.
And you know, as your data pipeline

94
00:06:46.399 --> 00:06:49.800
is running and you got new data
coming in, if there's a little change

95
00:06:49.800 --> 00:06:53.360
in the schema, it didn't break
anything. I mean, I'm sure you

96
00:06:53.439 --> 00:06:56.920
remember the days that you were talking
about ETL. It's like the source system

97
00:06:57.079 --> 00:07:01.680
changed. They dropped one column, right, and didn't tell anybody because they're thinking

98
00:07:01.720 --> 00:07:05.480
about their operational system, they're not
talking about the data warehouse. And then

99
00:07:05.519 --> 00:07:10.879
your Informatica job blows up in the
middle of the night and now everybody's screaming

100
00:07:10.920 --> 00:07:13.800
the next morning because the data hasn't
been refreshed. You told us that'd be

101
00:07:13.920 --> 00:07:16.519
refreshed overnight and it's not been refreshed. What happened? Eh, you changed

102
00:07:16.519 --> 00:07:19.399
the source. They changed the source
system. Didn't tell us. Now we

103
00:07:19.439 --> 00:07:25.079
got to go back and re-engineer
the ETL and change the tables and the

104
00:07:25.160 --> 00:07:29.279
database, and you know, so
much of that went away in the last

105
00:07:29.639 --> 00:07:34.160
Well see, Snowflake was founded in
twenty twelve, so last ten years.

106
00:07:34.199 --> 00:07:39.040
I started with them in twenty fifteen, so it's been less than a decade,

107
00:07:39.519 --> 00:07:43.319
and you know, they solved those
really big problems that we had in

108
00:07:43.360 --> 00:07:47.079
data warehousing for multiple decades. Well, I mean, you kind of hinted at

109
00:07:46.959 --> 00:07:51.639
the big downside of the old way, which is basically an opportunity cost around

110
00:07:51.720 --> 00:07:59.120
innovation because it was fragile, because
it was difficult to work with, because

111
00:07:59.240 --> 00:08:03.240
changes were very hard, people didn't
try to make changes. I mean,

112
00:08:03.600 --> 00:08:05.439
it's like a kid who keeps getting
reprimanded for trying new things. They just

113
00:08:05.480 --> 00:08:09.040
stop. Like, fine, I won't
do anything, right? So the psychology of

114
00:08:09.639 --> 00:08:16.000
data use really took a whole new
turn, in large part thanks to Snowflake.

115
00:08:16.040 --> 00:08:18.319
Now, of course, we have
Databricks, we have a whole bunch

116
00:08:18.319 --> 00:08:20.920
of other options of ways to do
things. But the point is that now

117
00:08:22.000 --> 00:08:26.000
there is more confidence to move forward
to try changes, to do things, because you're

118
00:08:26.040 --> 00:08:31.160
not so worried about bringing the whole
thing to a crashing halt. Right, right.

119
00:08:31.360 --> 00:08:37.639
You're not creating instant technical debt by
doing something right, and so you

120
00:08:37.679 --> 00:08:41.639
can take a more agile approach.
You can be more responsive to the business

121
00:08:43.159 --> 00:08:46.200
and the business requirements when they want
to change, like, yeah, without

122
00:08:46.240 --> 00:08:50.679
having to worry that, oh yeah, we're gonna break it. And then

123
00:08:50.720 --> 00:08:54.080
the tools have evolved so much too, between the data catalogs and the data

124
00:08:54.120 --> 00:08:58.679
engineering tools. You know, the
thing that we always needed was really good

125
00:08:58.720 --> 00:09:03.320
impact analysis reports. But to get
the impact analysis reports, you needed something

126
00:09:03.360 --> 00:09:09.039
tracking the metadata around data lineage.
You had to know where did that data

127
00:09:09.080 --> 00:09:11.440
come from and where did it go? We pulled a data element in from

128
00:09:11.480 --> 00:09:16.080
one source to a system, how
many places did it go? And if we

129
00:09:16.200 --> 00:09:18.519
changed that, how many things is
it going to impact? Well, you've

130
00:09:18.519 --> 00:09:22.279
got to have the lineage to do
the impact analysis. And now we've got

131
00:09:22.399 --> 00:09:26.519
a lot more tools in what
we call our modern data stack that allow

132
00:09:26.600 --> 00:09:30.159
us to do that so much easier, so much quicker. You're no longer

133
00:09:30.200 --> 00:09:35.919
relying on a couple of data engineers
to know the code inside and out.

134
00:09:35.960 --> 00:09:37.279
And go, oh, yeah,
that's right. I remember over in that

135
00:09:37.919 --> 00:09:41.759
routine we did this, and we
did this based on the user requirements.

136
00:09:41.799 --> 00:09:45.600
But then they had us do this
in a different routine and yeah, if

137
00:09:45.600 --> 00:09:48.720
we touch that, we're going to
break three different things. You don't have

138
00:09:48.759 --> 00:09:52.440
to do that anymore. You can
find it through the metadata. Yeah.

139
00:09:52.519 --> 00:09:56.279
Well, and you hinted at something
else too. I threw out the large language models

140
00:09:56.320 --> 00:10:01.559
to start us off here, and
you think about the impact of these engines

141
00:10:01.639 --> 00:10:05.000
now, I think there are numerous
other shoes left to drop in this equation,

142
00:10:05.720 --> 00:10:09.120
just one way of putting it,
basically, because they are very powerful.

143
00:10:09.279 --> 00:10:13.200
But you know, Sam Altman had
that very curious comment he made a

144
00:10:13.279 --> 00:10:15.799
couple of months ago, saying,
oh, the era of large language models

145
00:10:15.879 --> 00:10:20.639
is over. Everyone was like,
what, what are you talking about? I

146
00:10:20.720 --> 00:10:22.840
think, now, I don't know, this is just my personal theory.

147
00:10:22.879 --> 00:10:28.360
But we've now seen that they've had
some challenges. People, and this is research

148
00:10:28.399 --> 00:10:30.799
that I've done, and people I've talked
to have said, there are things that

149
00:10:30.840 --> 00:10:33.120
it did very well three months ago, four months ago that it now has a

150
00:10:33.159 --> 00:10:37.240
hard time doing. A good buddy
of mine, who actually almost worked with

151
00:10:37.320 --> 00:10:41.600
OpenAI, said that he looked
at their models three years ago or so

152
00:10:41.720 --> 00:10:46.279
and said, you're gonna have all
kinds of problems because you're not respecting entropy.

153
00:10:46.679 --> 00:10:48.399
He said, you're gonna have short
term memory problems, long term memory

154
00:10:48.399 --> 00:10:52.240
problems. Basically, he blew it
up at them, and they just said,

155
00:10:52.240 --> 00:10:54.320
all right, well, get out of
here. We're going to do it

156
00:10:54.320 --> 00:10:56.399
our way. We'll deal with that
later. Yeah, we'll do exactly,

157
00:10:56.440 --> 00:11:00.639
We'll figure that out later. And
of course they rolled it out, and it was huge.

158
00:11:01.039 --> 00:11:03.799
I mean, you know, the
friends I have at Google, I can

159
00:11:03.840 --> 00:11:07.279
tell they are stressed out. The
company is very stressed out right now,

160
00:11:07.279 --> 00:11:09.120
which is kind of hard to believe
because it's Google, Like, I mean,

161
00:11:09.159 --> 00:11:13.159
goodness, you're one of the biggest
companies in the world. You changed

162
00:11:13.240 --> 00:11:16.960
the world with the stuff that you
did, and of course, what happened?

163
00:11:16.159 --> 00:11:20.720
Microsoft goes and throws a whole bunch of
money at OpenAI, then the OpenAI

164
00:11:20.759 --> 00:11:26.080
board curiously kicks him out,
and then Satya Nadella hires him,

165
00:11:26.159 --> 00:11:30.399
and like, wow, that was
one of the biggest tennis matches I've ever

166
00:11:30.399 --> 00:11:35.480
seen. Like what is going back
and forth? Point being, and where

167
00:11:35.480 --> 00:11:39.960
I'm kind of going with all this
is, even though there are some challenges,

168
00:11:39.000 --> 00:11:43.600
they're so powerful. You can load
code and ask the machine to tell

169
00:11:43.639 --> 00:11:48.440
you what the code means. You
can load a whole bunch of documents and

170
00:11:48.480 --> 00:11:50.320
ask it to summarize. So I
think those are the two better use cases.

171
00:11:50.759 --> 00:11:54.840
But kind of where I'm going is
I see these LLMs, the engines

172
00:11:54.879 --> 00:11:58.159
of them at least, and maybe
it's SLMs, small language models, to become more

173
00:11:58.200 --> 00:12:05.679
prominent, really serving as a very
powerful component in a workflow. So they're

174
00:12:05.720 --> 00:12:09.279
good at ETL. For example,
I scraped a whole bunch of data off

175
00:12:09.360 --> 00:12:13.720
the CES website because I wanted to
get a clean list of companies from that

176
00:12:13.879 --> 00:12:16.960
site. So I scraped a whole
I just did a Command-A, Command-C,

177
00:12:16.600 --> 00:12:20.360
Command-V into a Google Doc, and
then I pasted that in the chat into

178
00:12:22.159 --> 00:12:24.559
Gemini, which was Bard back then, and
I said, please give me a clean

179
00:12:24.639 --> 00:12:28.200
list of just the company names in
all this text, and there's all this

180
00:12:28.279 --> 00:12:31.919
other text inside. It was like, okay, and just banged it out.

181
00:12:31.960 --> 00:12:37.080
I'm like, all right, that's
pretty impressive. You know that it

182
00:12:37.120 --> 00:12:41.120
can take an instruction like that, ascertain
I only want company names, know what

183
00:12:41.159 --> 00:12:46.759
those are in the context, and
then bang out a whole list, And

184
00:12:46.799 --> 00:12:48.679
then I asked it to give me
a Twitter handle for each one. It

185
00:12:48.720 --> 00:12:52.639
did that too, but half of
them were bad. So it does make

186
00:12:52.679 --> 00:12:56.360
stuff up. But nonetheless, I
mean, think about if you'd seen this

187
00:12:56.440 --> 00:12:58.559
blob of text, you know what
I'm talking about, just all kinds of

188
00:12:58.600 --> 00:13:01.840
other words in between, spaces and tabs
and commas, and all sorts of

189
00:13:01.840 --> 00:13:07.600
crap. It had no problem just
stripping all that stuff out. That's a

190
00:13:07.720 --> 00:13:11.320
very powerful function, which would have
taken a good ETL developer. I don't

191
00:13:11.320 --> 00:13:13.720
even know how long to sort through
and to figure out it would be hard

192
00:13:13.720 --> 00:13:18.200
to do. Not for the LLMs. So what kind of impact do you

193
00:13:18.200 --> 00:13:22.559
think they're going to have on the
data management business writ large? Well,

194
00:13:22.080 --> 00:13:28.879
I think that what we're seeing is
the next step in data democratization, right?

195
00:13:30.240 --> 00:13:33.799
We can now, with even what you
described, you basically you took the data

196
00:13:33.879 --> 00:13:37.679
engineer out of the loop, right, You were able to do something very

197
00:13:37.759 --> 00:13:41.799
quickly on your own. And I
think that's part of the promise and the

198
00:13:41.840 --> 00:13:48.320
power of these things is to allow
business analysts now to do things with data

199
00:13:48.720 --> 00:13:52.480
without having to be coders. Right,
they don't even necessarily need to know SQL.

200
00:13:52.720 --> 00:13:56.480
In one of the recent demos I
saw, I think it's called Cortex

201
00:13:58.559 --> 00:14:01.360
one of the new Snowflake features, right, is like, you can ask

202
00:14:01.399 --> 00:14:05.320
a question, say, I want
the uh, you know, the average

203
00:14:05.360 --> 00:14:13.320
sales over the last three years for
my top five regions, and it writes

204
00:14:13.320 --> 00:14:18.159
the SQL for you, right?
And, you know, that's starting to come

205
00:14:18.200 --> 00:14:22.200
out. And I've seen a couple
of demos like that recently from the data

206
00:14:22.200 --> 00:14:26.320
analytics perspective that now again you don't
have to know SQL. Uh,

207
00:14:26.639 --> 00:14:31.639
you don't even necessarily have to understand
the schema of the database in order

208
00:14:31.720 --> 00:14:37.080
to get some basic analytics done using
an LLM, using a, you

209
00:14:37.120 --> 00:14:41.840
know, a prompt. And my son
is in school now and he mentioned

210
00:14:41.840 --> 00:14:48.240
something about one of the careers that
they've talked about in the career education stuff

211
00:14:48.240 --> 00:14:52.279
that he's going through at college was
a thing called prompt engineering, right, that

212
00:14:52.720 --> 00:14:56.360
you could take classes in
prompt engineering. It's like, well,

213
00:14:56.360 --> 00:15:00.919
what does that mean? It's, okay, how to ask the question. So

214
00:15:01.080 --> 00:15:05.000
I think we used to call that
critical thinking, right? Learning how to ask

215
00:15:05.039 --> 00:15:09.799
the question, But now you're going
to learn how to ask the question of

216
00:15:09.960 --> 00:15:15.559
an AI, basically, right? And how
you ask the question will have an effect

217
00:15:15.639 --> 00:15:20.320
on the outcome because again, the
AI, the LLM, can't read your mind, and

218
00:15:20.360 --> 00:15:26.039
it can only interpret so much.
So being able to ask very clear,

219
00:15:26.200 --> 00:15:33.840
concise questions using language that the AI
recognizes is going to be the skill,

220
00:15:33.240 --> 00:15:37.120
right, That's going to be the
skill rather than knowing how to write,

221
00:15:37.559 --> 00:15:43.279
you know, SELECT SUM, COUNT(*),
GROUP BY, right, in order to get it

222
00:15:43.000 --> 00:15:48.600
in order to get the answer,
which is so huge because and this is

223
00:15:48.679 --> 00:15:50.919
leading up to maybe segment two.
I'm going to give you one of my

224
00:15:50.960 --> 00:15:56.320
big ideas and see what you think about
it. But because now you've democratized access

225
00:15:56.399 --> 00:16:00.480
to analytics, is what you've done. Basically, you had to know some

226
00:16:00.519 --> 00:16:03.320
SQL or have a good idea at
least how to use the technologies to be

227
00:16:03.360 --> 00:16:07.960
able to build the queries. Now
you can just ask it a question.

228
00:16:07.200 --> 00:16:11.519
Now again, it does make things
up, so you have to be careful.

229
00:16:11.519 --> 00:16:12.600
You have to vet things. But
we've always had to vet things.

230
00:16:12.639 --> 00:16:15.480
I mean, there are times when
the data engineering was done wrong. So

231
00:16:15.799 --> 00:16:19.120
it's not like humans didn't make mistakes, and now the AI is creating some

232
00:16:19.440 --> 00:16:22.639
new function called a mistake. No, like, it makes stuff up,

233
00:16:22.639 --> 00:16:26.759
and it has some problems, but
the fact that you can sit there and

234
00:16:26.799 --> 00:16:32.200
just have such an interactive discourse with
the data, to me is incredibly game

235
00:16:32.320 --> 00:16:37.320
changing. And I think it's going
to put even more excitement in the workplace

236
00:16:37.399 --> 00:16:41.759
to use the data because a lot
of times people don't use the technologies because

237
00:16:41.799 --> 00:16:45.960
they're hard to use, because they're
slow, because they don't trust the data

238
00:16:47.000 --> 00:16:49.080
for example, and we do have
the trust issue. But this stuff is

239
00:16:49.120 --> 00:16:52.360
not hard to use. I mean, it can be hard to get really

240
00:16:52.399 --> 00:16:56.320
good at your prompt engineering, but
even that, just to ask a different

241
00:16:56.360 --> 00:17:00.080
question, it doesn't get tired,
it doesn't get annoyed at you. I

242
00:17:00.120 --> 00:17:03.000
will admit I tried Grok a little
bit. I thought Grok was kind of

243
00:17:03.039 --> 00:17:06.519
annoying. That's the one that's in
Twitter these days. It's supposed to be

244
00:17:06.640 --> 00:17:10.599
sarcastic and goofy, And I'm like, is there a big market for an

245
00:17:10.680 --> 00:17:14.319
engine that gives me sarcastic answers?
I just don't think so. I think,

246
00:17:14.680 --> 00:17:17.480
you know, maybe Elon smoked a
couple of doobies before that one came

247
00:17:17.519 --> 00:17:19.279
out, which is fine. I
mean, the guy's done amazing things.

248
00:17:19.440 --> 00:17:25.000
He's an engineer himself, and you
look at the accomplishments this guy's had.

249
00:17:25.000 --> 00:17:27.519
It's just absolutely shocking. But you
do have competition out there. And I

250
00:17:27.519 --> 00:17:33.720
guess what I'm driving at is by
democratizing access to systems of record with these

251
00:17:34.039 --> 00:17:38.599
LLMs, we are fundamentally changing the
game about how people interact with data,

252
00:17:38.799 --> 00:17:42.359
how they're able to consume data, learn
from data. I'm writing an abstract right

253
00:17:42.359 --> 00:17:45.160
now, in fact, for the
show I'll do with you on Thursday,

254
00:17:47.240 --> 00:17:52.319
all about this concept of data literacy. And I think that just playing around

255
00:17:52.400 --> 00:17:56.880
with these tools connected to your data
sources, you are fostering better data literacy

256
00:17:56.960 --> 00:18:00.759
because you're learning things about the data
and you don't have to take a class

257
00:18:00.839 --> 00:18:04.079
necessarily, just have to dedicate the
time to hop online, play around,

258
00:18:04.319 --> 00:18:07.720
click a few buttons, type in
some questions and see the different answers,

259
00:18:07.880 --> 00:18:11.720
and you have to work with it
to see how it operates and to kind

260
00:18:11.720 --> 00:18:15.240
of know sort of the contours of
what it does. But folks, don't

261
00:18:15.279 --> 00:18:18.079
touch that dial. I'll be right
back in a moment. We're talking to

262
00:18:18.920 --> 00:18:30.039
a hero of data, Kent Graziano.
We'll be right back. Welcome back to

263
00:18:30.200 --> 00:18:37.559
Inside Analysis. Here's your host,
Eric Kavanagh. All right, folks,

264
00:18:37.559 --> 00:18:41.920
back here on Inside Analysis with Kent
Graziano, who has been a data avenger

265
00:18:41.000 --> 00:18:44.799
for years. Quite frankly, you've
been out there on the front line.

266
00:18:44.960 --> 00:18:48.559
I remember I met you at the
Data Vault conference a number of years ago.

267
00:18:48.640 --> 00:18:52.000
That was fun. That was before
the big COVID, And you know,

268
00:18:52.200 --> 00:18:53.680
COVID gave us a lot of time
to think about stuff, for sure,

269
00:18:53.720 --> 00:18:56.720
and to sort of reevaluate what was
going on. I'm sure there's a

270
00:18:56.720 --> 00:19:00.559
bit of a COVID surge in terms
of innovation because people had time to think

271
00:19:00.559 --> 00:19:03.240
about stuff and do cool new things. And with that in mind, I'm

272
00:19:03.279 --> 00:19:07.839
going to throw my big idea over
to the sage Graziano. I want

273
00:19:07.839 --> 00:19:11.839
to get your thoughts on this,
so I keep thinking about how executives use

274
00:19:11.920 --> 00:19:17.079
data and how we've used data historically, and there are these long processes

275
00:19:17.160 --> 00:19:21.599
of building reports and doing all these
different things. I think the LLMs kind

276
00:19:21.599 --> 00:19:23.680
of turn all that on its head. And if you do it right,

277
00:19:25.200 --> 00:19:26.720
I think what's going to happen is
companies are going to get their own idea,

278
00:19:26.759 --> 00:19:30.440
to get their own private model.
Maybe it's a small language model for

279
00:19:30.480 --> 00:19:34.759
a particular industry like legal or healthcare
or retail, because they have their own

280
00:19:34.839 --> 00:19:40.599
lexicon, and so it's good to
kind of weed out the semantic issues that

281
00:19:40.680 --> 00:19:42.200
you'll find with a large language model. That's kind of the point of the

282
00:19:42.240 --> 00:19:45.400
small language models. So you get
that, and then you train, you

283
00:19:45.599 --> 00:19:51.000
get your vector database. You basically
take your curated, trusted data and you

284
00:19:51.039 --> 00:19:53.880
start feeding it into the vector database
as your embeddings. These are your anchors

285
00:19:53.880 --> 00:19:59.000
of truth. And what I think
is going to happen is this is sort

286
00:19:59.000 --> 00:20:02.720
of the Valhalla, if you will, is that you get enough of your

287
00:20:03.039 --> 00:20:07.480
corporate data in your embeddings. You've
got your anchors of truth now, and

288
00:20:07.519 --> 00:20:11.519
then you set up whether it's Kafka
topics or some kind of stream of changes

289
00:20:11.640 --> 00:20:18.000
coming through like CDC, basically into
the vector database and what's going to happen

290
00:20:18.119 --> 00:20:22.960
is the executives are just going to
sit there and have this amazing intern with

291
00:20:22.039 --> 00:20:26.599
tremendous amounts of knowledge at their fingertips. They'll be able to ask any number

292
00:20:26.599 --> 00:20:30.079
of questions, you know, how
how are we doing this month? What

293
00:20:30.119 --> 00:20:33.680
can I do to change? Who's
really excelling in my sales team? Like

294
00:20:33.759 --> 00:20:37.160
all these kind of questions that you
can ask and just get information pouring into

295
00:20:37.200 --> 00:20:41.119
them. And it's going to be
very useful because it's connected to your data

296
00:20:41.119 --> 00:20:44.799
warehouse, to your CRM system,
to your Salesforce, to your clickstream

297
00:20:44.839 --> 00:20:48.680
analysis, whatever it is that you've
got, that senior executive is tapped into

298
00:20:48.720 --> 00:20:52.799
it and can now ask any question
he or she wants. How viable is

299
00:20:52.839 --> 00:20:56.559
that, do you think? And
is that the future? Yeah, I

300
00:20:56.640 --> 00:21:00.839
think you've hit on something there,
because really this is, you know,

301
00:21:00.880 --> 00:21:07.240
the extension, the evolution, maybe it's the
fruition of all the ideas that we

302
00:21:07.319 --> 00:21:15.519
had around data warehousing and decision support
systems and advanced analytics and business intelligence in

303
00:21:15.599 --> 00:21:21.920
general. Right that to have that
at the fingertips of an executive, that's

304
00:21:22.160 --> 00:21:23.599
that's where the value comes in,
Right, is you've got to be able

305
00:21:23.640 --> 00:21:29.759
to use the data to make effective
business decisions. And I think that what

306
00:21:29.839 --> 00:21:38.200
you're describing could be the differentiator that
allows an organization to get their competitive advantage.

307
00:21:38.519 --> 00:21:45.240
And specifically when they're looking at their
internal data right and training a model

308
00:21:45.319 --> 00:21:48.759
on their data, if they do
that right, that's the thing that's going

309
00:21:48.799 --> 00:21:52.720
to give them the competitive advantage.
Because so many things have been commoditized,

310
00:21:52.799 --> 00:21:56.279
right, and we've got lots of
data out there that's being shared via data

311
00:21:56.279 --> 00:21:57.720
marketplaces, and sure, you're probably
going to want to pull some of that

312
00:21:57.839 --> 00:22:03.359
in. But it's the combination of
that external data with your internal data,

313
00:22:03.440 --> 00:22:07.079
your proprietary data, that's the thing
that's got to give you the competitive advantage.

314
00:22:07.279 --> 00:22:12.839
And if you've got like you said
this uh basically online on demand uh

315
00:22:14.000 --> 00:22:22.599
executive intern data intern that's putting all
this together automatically right through through a small

316
00:22:22.680 --> 00:22:26.039
language model that the executive got a
question they just you know, like like

317
00:22:26.079 --> 00:22:29.759
Alexa, Well they'll probably give it. They'll have to give it their own

318
00:22:29.799 --> 00:22:34.759
name and say, hey, uh, I heard from my sales guy that

319
00:22:34.759 --> 00:22:40.960
we're having a problem in the western
region. Give me give me a summary

320
00:22:41.000 --> 00:22:45.000
of what happened in sales in the
last three days in the West Region and

321
00:22:45.359 --> 00:22:49.279
boom, it's like and eventually it
should be able to give us. You

322
00:22:49.279 --> 00:22:52.359
know, well, what's the implication
of that, Oh, the trend.

323
00:22:52.400 --> 00:22:56.319
The trend is if the sales,
if what's happening in sales in the West

324
00:22:56.319 --> 00:23:02.200
Region continues for X number of more
weeks, you're going to lose this much

325
00:23:02.200 --> 00:23:07.440
profit, right, right, These
are the kinds of questions that we built

326
00:23:07.480 --> 00:23:10.720
data warehouses to answer. Yes,
it's kind of where I'm going with this.

327
00:23:10.920 --> 00:23:14.640
Absolutely, I don't I don't think
that data warehouse is going to go

328
00:23:14.680 --> 00:23:18.599
away, but it's to me,
it has to get simpler, I think.

329
00:23:18.839 --> 00:23:22.279
And you know, so you look
at the marketplace. What happened.

330
00:23:22.680 --> 00:23:26.119
Databricks went out and bought MosaicML
for like one point four billion or

331
00:23:26.119 --> 00:23:29.599
something like that when this was just
taking off, and Teradata and some

332
00:23:29.640 --> 00:23:32.799
of the others were still kind of
laughing at the GenAI stuff, which

333
00:23:32.839 --> 00:23:37.200
I thought was a bit foolish frankly, But those guys were laughing at the

334
00:23:37.200 --> 00:23:41.000
cloud originally too, if you remember, Yeah, well that's wrong, that's

335
00:23:41.000 --> 00:23:45.759
pretty funny. It didn't work either, right, Well, I don't know,

336
00:23:45.839 --> 00:23:48.920
man, I just I keep thinking
about the disruptive power of these engines.

337
00:23:48.960 --> 00:23:52.880
And again, yes, they get
some stuff wrong. So do the

338
00:23:52.920 --> 00:23:56.039
old systems. The old systems got
stuff wrong because their data was wrong or

339
00:23:56.079 --> 00:23:59.440
the model was wrong, and it
takes even longer to find it and fix

340
00:23:59.480 --> 00:24:03.359
it in the old systems. Well, that's a different other thing, right,

341
00:24:03.400 --> 00:24:06.880
that's the other thing. So it's
like, here's what fascinates me is

342
00:24:07.400 --> 00:24:11.839
trying to figure out how do these
models figure out what to choose when they

343
00:24:11.920 --> 00:24:15.359
give you their answer. Now,
Kyndi, that got bought by Qlik, I took

344
00:24:15.359 --> 00:24:21.160
a demo of their engine. It's
pretty interesting because it connects to an LLM

345
00:24:21.559 --> 00:24:26.920
and basically you use it for like
frequently asked questions for your manual for some

346
00:24:26.039 --> 00:24:30.319
new product that you bought, like
the new iPhone or something, and you

347
00:24:30.440 --> 00:24:33.880
load the whole thing in there.
It will automatically give you frequently asked questions

348
00:24:33.920 --> 00:24:37.799
that it suggests and the answers,
and you curate that and then when someone

349
00:24:37.920 --> 00:24:40.759
uses it, you'll get the answer
you asked. But they'll give you a

350
00:24:40.839 --> 00:24:44.720
drop down that'll show you where it
got those bits from, So it'll say

351
00:24:44.799 --> 00:24:47.799
the first sentence came from page three, paragraph two. The other one came

352
00:24:47.839 --> 00:24:51.599
from here. I'm like, now
that is compelling because now you're getting no

353
00:24:51.599 --> 00:24:56.039
annotation, right, that's the annotation
that we were taught in our language.

354
00:24:56.079 --> 00:25:00.599
Our language classes, right in our
English classes are learning how to do research

355
00:25:00.640 --> 00:25:03.480
papers. It's like, don't put
anything in there that you can't prove where

356
00:25:03.519 --> 00:25:07.240
it came from, right, And
it's and whether there's endnotes or footnotes,

357
00:25:07.559 --> 00:25:11.359
all those annotations are there, and
that's yeah, that's part of what we

358
00:25:11.400 --> 00:25:17.200
need. That's I guess the part
of the QA on these things is can

359
00:25:17.240 --> 00:25:19.599
we can we see where this data
came from rather than it even though it

360
00:25:19.640 --> 00:25:25.440
might be a black box that generated
the result, it's still traceable to the

361
00:25:25.480 --> 00:25:29.319
source and say, these are the
references, this is where these numbers came

362
00:25:29.319 --> 00:25:32.759
from, This is where this concept
came from, this is this is how

363
00:25:32.799 --> 00:25:37.759
we're coming up with this particular recommendation. And I think that's critical. That

364
00:25:37.759 --> 00:25:42.480
that is critical, you know,
to differentiate between a hallucination and a good

365
00:25:42.519 --> 00:25:47.079
answer that you actually want to run
your business off of. Right. Well,

366
00:25:47.119 --> 00:25:48.759
and that's the thing, right,
is that to run your business off

367
00:25:48.799 --> 00:25:52.319
these things, you're going to have
to have some certitude about what they're telling

368
00:25:52.359 --> 00:25:56.200
you. So that is a concern, But I just I feel like there's

369
00:25:56.240 --> 00:26:00.720
going to be downward pressure on pricing
data warehousing. But at the same time,

370
00:26:02.000 --> 00:26:04.480
you've got so much automation now,
it's so much easier to do things

371
00:26:04.519 --> 00:26:07.200
than it was in the old days. I mean, I guess that one

372
00:26:07.240 --> 00:26:11.240
of the biggest hurdles to overcome here
is just old fashioned mindsets. What do

373
00:26:11.319 --> 00:26:15.920
you think? Absolutely? Oh yeah, I mean that's the Even when I

374
00:26:15.960 --> 00:26:22.839
started off with Snowflake, the most
frequent question or might actually comment I got

375
00:26:22.880 --> 00:26:25.920
when I was out, you know, being the evangelist and talking to people

376
00:26:25.960 --> 00:26:29.400
about the separation of compute from storage and
zero copy cloning and all the things that

377
00:26:29.440 --> 00:26:32.039
we talked about. And the first
thing is like, oh, that'd be

378
00:26:32.160 --> 00:26:37.039
that'd be awesome if it was true. And then they say, okay,

379
00:26:37.039 --> 00:26:38.720
well, well how do we how
do we index our queries? It's like

380
00:26:38.720 --> 00:26:41.960
you don't, Well, then this
thing can't work. It can't work.

381
00:26:42.200 --> 00:26:45.839
There's no way it can work because
you don't have indexes. It's like, no,

382
00:26:47.000 --> 00:26:52.240
this is a completely different architecture.
And it was getting that that mindset change.

383
00:26:52.519 --> 00:26:56.799
Same thing when we went from Waterfall
to agile, right, trying to

384
00:26:56.839 --> 00:27:00.119
think about how do we do a
project in iterations, versus spending six months

385
00:27:00.119 --> 00:27:04.000
writing up the requirements document, getting
everybody signed off, and then spending two

386
00:27:04.079 --> 00:27:07.480
years building it, or spending a
year and a half doing an enterprise data

387
00:27:07.519 --> 00:27:11.200
model and then having to hand it
to a DBA to convert it to a

388
00:27:11.240 --> 00:27:15.440
schema in a database, and three
years later you've got a data warehouse that

389
00:27:15.440 --> 00:27:21.880
has nothing in it because nobody cares
anymore. Right, we had to change

390
00:27:21.880 --> 00:27:26.079
the way people were thinking. I
think what we're talking about here, it's

391
00:27:26.119 --> 00:27:29.920
the same thing. It's the you
know, we've talked over the last couple

392
00:27:29.920 --> 00:27:33.240
of years about companies about data literacy
and data culture, and it really is

393
00:27:33.279 --> 00:27:41.319
the organizational culture and expectations that you've
got to find a way to shift those

394
00:27:41.839 --> 00:27:47.200
right and get out of the well
the classic and this is this is every

395
00:27:47.200 --> 00:27:49.839
industry. It's not just it,
it is every industry. Well, we've

396
00:27:49.880 --> 00:27:53.599
never done it that way before,
or the reverse, Well, we've always

397
00:27:53.640 --> 00:27:57.920
done it this way, and I'm
comfortable doing it this way, right,

398
00:27:59.000 --> 00:28:02.920
so I want to just keep going
this way. I actually, early on

399
00:28:03.160 --> 00:28:11.119
Snowflake, we worked with a large
retailer in England and they after their evaluation,

400
00:28:11.279 --> 00:28:17.559
decided to switch from one of the
existing big data warehouse companies to Snowflake,

401
00:28:18.440 --> 00:28:22.640
and several DBAs literally resigned and went
and found other jobs. And this

402
00:28:22.720 --> 00:28:26.119
is in Europe, where you know, people don't change jobs hardly at all.

403
00:28:26.559 --> 00:28:30.960
Because they didn't want to spend the
last five years of their career learning

404
00:28:32.000 --> 00:28:34.480
Snowflake. They wanted to stay with
what they were comfortable with, which was

405
00:28:34.880 --> 00:28:41.000
the older system. And they found
another company in London that needed a DBA

406
00:28:41.240 --> 00:28:44.599
that had their expertise, and they
figured they were just they were going to

407
00:28:44.640 --> 00:28:48.200
go over there and sail off into
the sunset, that that was going to

408
00:28:48.200 --> 00:28:53.559
be less stressful for them than to
stay where they were and learn this really

409
00:28:53.599 --> 00:29:00.079
exciting new technology that they literally left
the organization rather than change the way they

410
00:29:00.079 --> 00:29:06.000
were doing things and thinking about things
differently. Yeah, that's not uncommon.

411
00:29:06.079 --> 00:29:08.559
I was. I was floored.
I was like, you do they did?

412
00:29:08.599 --> 00:29:15.640
What? Okay, I'm done,
I'm out, No, thank you.

413
00:29:15.640 --> 00:29:18.559
That's uh, that's wild, I
mean, and that that's I feel

414
00:29:18.559 --> 00:29:22.799
like, that's where we are right
now because of these LLMs, I mean,

415
00:29:22.960 --> 00:29:26.680
so much is being shaken at the
moment, and people still have to do

416
00:29:26.680 --> 00:29:30.480
their jobs. They have to get
stuff done. I mean, I've played

417
00:29:30.519 --> 00:29:34.480
around with these things and I'm really
amazed at what they're able to capture.

418
00:29:34.519 --> 00:29:37.519
Even though they do hallucinate, even
though they do make things up. It's

419
00:29:37.559 --> 00:29:41.880
really impressive what they can find.
And you just have to sit there and

420
00:29:41.920 --> 00:29:45.440
wonder, Wow, all of this
stuff is embedded in these models, and

421
00:29:45.519 --> 00:29:48.400
it's just deep in there. It's
gonna take us years to find out what's

422
00:29:48.440 --> 00:29:52.119
in there because there's so much stuff
in there already, right, I mean,

423
00:29:52.200 --> 00:29:56.960
like, how do you even how
do you begin? Yeah, and

424
00:29:56.480 --> 00:30:00.480
like everything is, you got to
start one foot in front of the other.

425
00:30:00.319 --> 00:30:03.920
But you've got to be willing to
try. And I think that's that's

426
00:30:03.960 --> 00:30:07.720
the big that's the big lesson in
all of this, and certainly, you

427
00:30:07.720 --> 00:30:11.400
know, even the last ten years
with the, you know, going from

428
00:30:11.559 --> 00:30:18.599
Hadoop to cloud data warehouses like Snowflake
and then Databricks, is you've got

429
00:30:18.640 --> 00:30:23.160
to be willing to take a look
at these new technologies and think about it

430
00:30:23.200 --> 00:30:27.160
critically. Is like how can I
take advantage of this technology rather than go,

431
00:30:27.200 --> 00:30:30.440
oh my god, I'm going to
lose my job, right, They're

432
00:30:30.480 --> 00:30:32.599
not going to need me anymore.
It's like, well, yeah, they're

433
00:30:32.599 --> 00:30:34.119
not going to need you anymore if
you don't learn how to do new things,

434
00:30:34.559 --> 00:30:37.359
if you don't learn how to use
these new technologies, you know,

435
00:30:37.359 --> 00:30:42.400
figure out how how to apply yourself
to the new technology with the knowledge that

436
00:30:42.440 --> 00:30:47.319
you already have, you know,
whether it's you know, in the data

437
00:30:47.319 --> 00:30:51.559
world, we're talking about data management. You understand things like schemas and SQL

438
00:30:51.599 --> 00:30:55.079
and all that. Well, how
can you apply that in the LLM

439
00:30:55.200 --> 00:31:00.240
world and the SLM world? You
know, how can you be of value

440
00:31:00.640 --> 00:31:08.400
to your organization helping them make these
transitions to the newer technologies for the advantage

441
00:31:08.400 --> 00:31:12.599
of the organization. Yeah, that's
very interesting. I mean you mentioned earlier

442
00:31:12.599 --> 00:31:18.400
you schema on read, and now
with these LLMs, it's almost it's almost

443
00:31:18.440 --> 00:31:21.680
like you can do that to a
certain extent, right and schema, but

444
00:31:22.000 --> 00:31:25.279
not in the way that any of
those SQL people ever dreamed of, right,

445
00:31:25.440 --> 00:31:30.640
right, So that we now have
an a an AI that can read

446
00:31:30.720 --> 00:31:34.759
the schema right right, that that
somebody doesn't have to look at the ER

447
00:31:34.960 --> 00:31:41.559
diagram and know how to write the
joins. Now, the downside, you're

448
00:31:41.640 --> 00:31:45.480
I guess the caveat and all of
this and you mentioned it kind of in

449
00:31:45.839 --> 00:31:48.759
one of your earlier statements about having
your data warehouses and all that is.

450
00:31:48.839 --> 00:32:00.440
This presumes you've got a good,
a well structured data architecture, that there's

451
00:32:00.599 --> 00:32:04.559
metadata at least that tells you how
the data is related and what the data

452
00:32:04.720 --> 00:32:07.480
means. The semantics of the data
you talked about, you know, the

453
00:32:07.920 --> 00:32:13.319
you know, having a lexicon of
say a law firm. You know you

454
00:32:13.440 --> 00:32:17.000
got that terminology there, well,
you know the underlying data. There's got

455
00:32:17.039 --> 00:32:23.640
to be a relationship there between that
legal legalese and the data in order

456
00:32:23.720 --> 00:32:28.759
to be able to ask a question
and create a prompt that will get you

457
00:32:28.839 --> 00:32:32.440
the answer you want. And so
it presumes it it's your data structure that

458
00:32:32.559 --> 00:32:37.359
is documented and that it's high quality, right, because the LLM

459
00:32:37.599 --> 00:32:43.200
is not going to discriminate between a
bad row of data and good row of

460
00:32:43.279 --> 00:32:46.480
data unless you've somehow programmed it to
say, you know, ignore all the

461
00:32:46.599 --> 00:32:52.000
rows where data is null. Right, So you're you've put a data quality

462
00:32:52.079 --> 00:32:54.559
check into the prompt itself. Right. I know I don't want to do

463
00:32:54.599 --> 00:33:00.440
analysis on things where the sales date
is blank, right, because obviously I've

464
00:33:00.440 --> 00:33:05.319
got a problem. But that you
would have to know enough about data quality

465
00:33:06.000 --> 00:33:09.160
and know enough about the data to
ask a question that way. So otherwise

466
00:33:09.200 --> 00:33:14.880
you got it somewhere. Somebody has
to still do you know the hard part

467
00:33:14.960 --> 00:33:20.839
that we've always done there with with
building that data platform and ingesting the data

468
00:33:21.079 --> 00:33:27.799
and curating the data to get you
use that word curating curated data that we're

469
00:33:27.839 --> 00:33:30.079
going to use to feed all these
things, right, And I think that's

470
00:33:30.119 --> 00:33:36.119
going to be that the next big
push is figuring out these pipelines to feed

471
00:33:36.160 --> 00:33:38.920
the LLMs. Do you have
to get your your sort of foundation in

472
00:33:39.000 --> 00:33:43.119
place, but then it's going to
be updating these things and then monitoring over

473
00:33:43.160 --> 00:33:45.160
time. But folks, don't touch
that dial. We'll be right back. You're

474
00:33:45.160 --> 00:33:55.519
listening to Inside Analysis. Welcome back
to Inside Analysis. Here's your host,

475
00:33:57.039 --> 00:34:01.640
Eric Kavanaugh. All right,
folks, back here on Inside Analysis talking

476
00:34:01.640 --> 00:34:05.319
to the one, the only.
Kent Graziano is going to be at Data

477
00:34:05.440 --> 00:34:08.840
Universe April tenth and eleventh in New
York City, the Javits Center. Don't miss

478
00:34:08.840 --> 00:34:12.880
it, folks, It's going to
be fun. Yours truly will be there

479
00:34:13.239 --> 00:34:15.159
and I'll throw one of my other
curve balls at you, Kent, just because

480
00:34:15.320 --> 00:34:17.760
you're such a good guy and I
know you can hit good curve balls.

481
00:34:19.320 --> 00:34:22.440
What I'm going to talk about one
of my talks is the death of journalism,

482
00:34:22.559 --> 00:34:25.679
as I call it, And what
I'm really referring to is the fact

483
00:34:25.679 --> 00:34:32.079
that there's no media company who can
stand up to these large language models and

484
00:34:32.119 --> 00:34:38.519
the power of this AI in terms
of personalization, in terms of covering everything

485
00:34:38.559 --> 00:34:43.400
that could be interesting to someone.
And I think what's going to happen here

486
00:34:43.440 --> 00:34:45.719
is that the smart media companies are
going to figure out how to do what

487
00:34:45.760 --> 00:34:50.079
we mentioned. They're going to get
their own model. They're going to train

488
00:34:50.400 --> 00:34:54.760
their model on their voice by loading
all the past articles as embeddings. And

489
00:34:54.800 --> 00:35:00.440
then what's going to happen is journalists
are going to be more like curators and

490
00:35:00.559 --> 00:35:05.079
editors, and you're going to be
spinning out stories from facts. You have

491
00:35:05.119 --> 00:35:07.920
to have your fact sets. This
is really really important fact sets. And

492
00:35:08.079 --> 00:35:13.880
that's systems of record. So I
think like sales tax systems, for example,

493
00:35:13.920 --> 00:35:16.400
I think a municipality has to collect
sales tax. They're collecting all this

494
00:35:16.519 --> 00:35:21.760
data from all the different stores and
you determine what products are sold. Most

495
00:35:21.840 --> 00:35:24.280
have barcodes so you can be able
to do some analysis. Wouldn't that be

496
00:35:24.360 --> 00:35:29.519
a fantastic service to subscribe to as
a business person. Let's say I run

497
00:35:29.559 --> 00:35:32.119
a small like an easy Mart or
something like that, to be able to

498
00:35:32.159 --> 00:35:36.440
see who's buying what. And the
example I give is Squishmallows. Like

499
00:35:36.440 --> 00:35:37.920
one day you're just looking at your
report, like, what are these

500
00:35:37.920 --> 00:35:42.119
Squishmallow things that everyone's making tons of
money on? Look into this, Oh,

501
00:35:42.199 --> 00:35:45.719
there's some new toy for kids that
kids freaking love. So now kids

502
00:35:45.760 --> 00:35:51.480
have to have every Squishmallow under
the sun. Beanie Babies. It is. It's

503
00:35:51.480 --> 00:35:53.800
a new, it's exactly what it is. It's the new Beanie Baby with lots

504
00:35:53.840 --> 00:35:58.320
of different sizes. My point is
that it's not going to be so much

505
00:35:58.400 --> 00:36:02.480
individual reporters going out spending all day
writing something, as it's going to be

506
00:36:02.639 --> 00:36:09.159
dynamically generated bits of information from systems
of record, spoken in text that is,

507
00:36:09.280 --> 00:36:13.800
from a large language model that can
be altered. Because now, like,

508
00:36:14.280 --> 00:36:16.320
it's like having a reporter that you
can ask questions of all day long.

509
00:36:16.559 --> 00:36:19.119
No, wait a minute, tell
me more about this. Tell me

510
00:36:19.119 --> 00:36:22.239
more about that. No reporter can
sit there and answer questions for everyone all

511
00:36:22.320 --> 00:36:24.199
day long you just can't do it. So I think there's going to be

512
00:36:24.239 --> 00:36:30.039
a whole movement of data engineering around
media, of connecting to these large language

513
00:36:30.039 --> 00:36:34.880
models. And you still have reporters
and journalists who will sort of review this

514
00:36:34.960 --> 00:36:38.159
stuff and write some of their own
original content for their voice. But to

515
00:36:38.239 --> 00:36:43.000
me, there's no stopping this train. It's like a high speed train coming

516
00:36:43.000 --> 00:36:45.719
at all of us. What do
you think? Well, my one question

517
00:36:45.800 --> 00:36:51.760
there is like, but where how
do we collect the facts to go into

518
00:36:51.880 --> 00:36:55.960
generating these stories? I like the
idea of being able to ask the questions.

519
00:36:57.000 --> 00:37:00.000
And you know, right now we've
got all kinds of clickbait headlines,

520
00:37:00.159 --> 00:37:01.920
so it's like, you want to
ask the question. You read the article

521
00:37:01.920 --> 00:37:06.039
and it doesn't quite tell you.
It's like, well, right, where

522
00:37:06.480 --> 00:37:14.079
what was the vaccination rate in New
York State between March twenty twenty one and

523
00:37:14.119 --> 00:37:17.880
April twenty twenty two, broken down
by county? And you might want to

524
00:37:17.880 --> 00:37:22.320
get that detail because you live in
a particular county in New York. You

525
00:37:22.360 --> 00:37:24.920
want to say, well, what
really happened? The headline says vaccination rates

526
00:37:25.000 --> 00:37:30.760
dropped fifty percent over what it was
pre COVID or something like that. And

527
00:37:30.920 --> 00:37:34.360
you want to get the details,
but the article doesn't quite cover it,

528
00:37:34.599 --> 00:37:37.159
but you want to ask those questions. But again back to what's the source

529
00:37:37.199 --> 00:37:44.320
of the data though, where's that
information going to come from? If you

530
00:37:44.360 --> 00:37:46.480
don't have you still have to have
I think the reporters in the field.

531
00:37:46.599 --> 00:37:51.239
Somebody has to go out and interview
people somehow, and it might be via

532
00:37:51.320 --> 00:37:53.880
zoom like this, right that you
find, Hey, here's a person who

533
00:37:53.920 --> 00:38:00.360
has information about something that just happened, and so we get them on a

534
00:38:00.400 --> 00:38:06.119
zoom call and we talked to them, and then the zoom AI transcribes that

535
00:38:07.239 --> 00:38:10.159
transcribes that conversation, so nobody's actually
having to type it all up, right,

536
00:38:12.360 --> 00:38:15.119
And then that could be a source, I guess for the LLM.

537
00:38:15.199 --> 00:38:19.639
But somewhere there's still got to be
that interaction with the external world somehow,

538
00:38:19.800 --> 00:38:22.360
right, Yeah, And I think
that's going to be these systems of record.

539
00:38:22.360 --> 00:38:28.920
So taxes is one, because every
county collects taxes. There are other

540
00:38:29.079 --> 00:38:34.440
sources for government, like the Federal
Register for example, wherever actions are taken.

541
00:38:34.519 --> 00:38:37.880
So I think like your ERP system, basically that's tracking the movement of

542
00:38:38.000 --> 00:38:42.079
goods and the sales and all this
kind of stuff, Salesforce, That's all

543
00:38:42.079 --> 00:38:45.199
I was mentioning. You want to
tap into all these different systems and get

544
00:38:45.239 --> 00:38:49.119
a feed from them to be able
to ask questions of your business and so

545
00:38:49.280 --> 00:38:52.000
of the government. You could do
the same thing. And I just think

546
00:38:52.039 --> 00:38:57.679
about how much there is to be
gleaned from these from these environments of ours.

547
00:38:57.719 --> 00:39:00.079
And you think about even policy documents. Right. A buddy of mine,

548
00:39:00.119 --> 00:39:02.920
Jim Harris, was saying he was
talking to a friend who works for

549
00:39:04.199 --> 00:39:06.880
I think a Canadian government organization,
and he was like, Oh, I

550
00:39:06.920 --> 00:39:07.920
need to fire this person. How
can I do that? I don't know

551
00:39:08.360 --> 00:39:12.239
what the paperwork is going to look
like. And he said, just ask an

552
00:39:12.400 --> 00:39:15.000
LLM. And it's like, okay, he asked, and it's like okay.

553
00:39:15.039 --> 00:39:16.920
Step one you have to file a
letter of grievance and say hey,

554
00:39:16.960 --> 00:39:20.679
your job is not performing well at
et cetera. Step two is to wait

555
00:39:20.719 --> 00:39:23.119
two weeks. Step three is do
this and all these things like, because

556
00:39:23.360 --> 00:39:28.039
if you've fed all this documentation into
a large language model, it can give

557
00:39:28.039 --> 00:39:31.039
you a good summary. And so
I mean, wow, think about,

558
00:39:31.199 --> 00:39:35.880
like in the court system, how
you have all these different motions you can

559
00:39:35.920 --> 00:39:38.679
file and you have to found the
motion on something based on this, based

560
00:39:38.719 --> 00:39:43.800
on that case law for example.
I mean there are these stories of an

561
00:39:43.800 --> 00:39:47.639
attorney who just used whatever the LLM
gave him and got in trouble. I

562
00:39:47.639 --> 00:39:52.880
almost think that's law. Yeah,
it actually referenced, it didn't exist,

563
00:39:53.599 --> 00:39:58.119
right, and and that's a problem. Right. But a small language model

564
00:39:58.119 --> 00:40:00.960
that's trained on all the case laws, it's going to be a very different

565
00:40:00.000 --> 00:40:02.920
story. And now you can check
things. And I mean you can even

566
00:40:02.960 --> 00:40:07.599
work that into the workflow to say, ask your RAG model, check and

567
00:40:07.719 --> 00:40:13.000
make sure that these are actual cases
at the So again that gets back to

568
00:40:13.039 --> 00:40:17.440
the workflow stuff. Well that I
mean we've paid lawyers hundreds even thousands of

569
00:40:17.480 --> 00:40:22.159
dollars an hour sometimes to know the
nuances of these codes and these policies.

570
00:40:22.519 --> 00:40:25.119
Well that you know, you don't
have to do that anymore. To me,

571
00:40:25.239 --> 00:40:30.320
that's just a huge change. It's
I think about just you your manual,

572
00:40:30.440 --> 00:40:32.440
like your you know, your code
of conducts or something. For these

573
00:40:32.440 --> 00:40:36.800
big companies to have like big thing
or even the laws that they're passing these

574
00:40:36.880 --> 00:40:39.280
days a thousand pages. Oh we
can't. There was a famous politician who

575
00:40:39.280 --> 00:40:42.000
said, oh, what's the point
of reading it if you don't have two

576
00:40:42.039 --> 00:40:45.679
days and two lawyers, Well now
you can. Now you can just feed

577
00:40:45.719 --> 00:40:47.920
that sucker into an LLM and ask
it all kinds of questions. What are

578
00:40:47.960 --> 00:40:52.119
the strangest things in here, what's
the most expensive, what's the least expensive?

579
00:40:52.559 --> 00:40:57.480
Point being that is a massive game
changer for day to day workflow.

580
00:40:57.599 --> 00:41:00.960
And it's not just text generation,
it's not just image generation. I mean,

581
00:41:00.000 --> 00:41:04.840
those are two categories of use case. But to me, the analytical

582
00:41:04.960 --> 00:41:08.400
side is the most interesting because it
does have all these different ways of looking

583
00:41:08.400 --> 00:41:13.800
at things because of all the parameters. Right, Yeah, no, I

584
00:41:13.840 --> 00:41:17.480
think I think you're You're not wrong
on that. I'm still thinking about the

585
00:41:17.840 --> 00:41:23.159
journalism experience example, though, as
to how that's going to work, Like

586
00:41:23.320 --> 00:41:27.400
you can see the other end of
it. But I still say you have

587
00:41:27.480 --> 00:41:31.519
a sourcing issue. You've still got
to have the humans involved somehow to get

588
00:41:31.559 --> 00:41:37.079
the basic information for it. Though. I guess, you know, these

589
00:41:37.199 --> 00:41:42.920
days you could, I guess you
could scrape everything off of Twitter because there's

590
00:41:43.639 --> 00:41:45.599
uh, you know, we talked
about citizen data scientists in the past.

591
00:41:45.639 --> 00:41:51.440
Here with democratizing data, we've got
citizen journalists out there. Unfortunately, you

592
00:41:51.440 --> 00:41:55.960
don't know necessarily how accurate the reporting
is, but I guess if you have

593
00:41:57.559 --> 00:42:01.280
something happening in the world and you've
got you know, twenty or thirty Twitter

594
00:42:01.320 --> 00:42:07.360
feeds that are reporting it, that
are that are not just re reposting what

595
00:42:07.519 --> 00:42:13.039
somebody else posted, but are actually
taking pictures themselves. And now we've got

596
00:42:13.079 --> 00:42:17.199
ais that are now able to start
doing things like analyze right, pictures right,

597
00:42:17.280 --> 00:42:22.760
and say what's in that picture?
Say these these five these five posts

598
00:42:23.159 --> 00:42:28.719
all have pictures supposedly of this event, and it should be able to overlay

599
00:42:28.719 --> 00:42:31.639
them and go, yeah, those
pictures, they're unique pictures, but they

600
00:42:31.679 --> 00:42:34.800
are of the same area, and
you know, you get the kind of

601
00:42:34.800 --> 00:42:37.519
the landscaping overlay of this line here
that yep, that's that same building in

602
00:42:37.559 --> 00:42:42.960
the background, and be able to
verify all of that, right? Well,

603
00:42:43.079 --> 00:42:45.840
that is a very very interesting point, and so kind of to that

604
00:42:46.000 --> 00:42:50.719
end, I thought of this a
while ago. And cameras, all these

605
00:42:50.719 --> 00:42:53.079
digital cameras, they have their own
metadata, right? Some of

606
00:42:53.119 --> 00:42:55.880
them have GPS, so they know
where you are, they know what time

607
00:42:55.880 --> 00:43:00.760
of day you took the picture.
And they have the metadata. I'm

608
00:43:00.760 --> 00:43:04.199
sure that's encoded in the file,
like we talked about Parquet files. Right,

609
00:43:04.199 --> 00:43:07.159
it's got the metadata baked in there
so you can vet and see, aha,

610
00:43:07.599 --> 00:43:12.000
this one was doctored, that one wasn't. But to your point, you

611
00:43:12.039 --> 00:43:15.599
could aggregate on demand when there's a
big event, like when there's a riot

612
00:43:15.639 --> 00:43:20.679
someplace, or when you know there's
a smash and grab or something, to

613
00:43:20.760 --> 00:43:24.800
be able to dynamically pick up little
bits and pieces of that. I could

614
00:43:24.840 --> 00:43:30.599
see an AI engine writing a story
there was a row at Bob's bar last

615
00:43:30.719 --> 00:43:36.239
night and, you know, twenty seven
people were involved. Like, you could get

616
00:43:36.239 --> 00:43:39.559
some juicy details and Bob Jones was
taken to jail because you get his mugshot

617
00:43:39.639 --> 00:43:43.280
or something. I mean, it
is possible, But to your point,

618
00:43:43.280 --> 00:43:46.119
you have to have some source.
So Twitter becomes a source, social media

619
00:43:46.119 --> 00:43:52.199
becomes a source. Official records like
police reports and things of that nature become

620
00:43:52.199 --> 00:43:57.719
sources, and those things have
to be accessible. That's the other

621
00:43:57.800 --> 00:44:01.760
thing, the whole data sharing thing: whether it's a government source, the economics

622
00:44:01.840 --> 00:44:06.000
data you were talking about, the
tax data you're talking about, it has

623
00:44:06.039 --> 00:44:12.599
to be accessible somewhere so
that the LLM can get to the data,

624
00:44:12.719 --> 00:44:15.960
or you could build a data pipeline
to pull it in. That's right,

625
00:44:15.039 --> 00:44:17.360
No, that's exactly correct. And
I think there's gonna be a lot

626
00:44:17.400 --> 00:44:21.880
of pressure on those systems because when
this stuff starts to take off. I

627
00:44:21.880 --> 00:44:27.159
mean, apparently that's why Elon instituted
that rate limiting stuff for a period of

628
00:44:27.159 --> 00:44:30.199
time, because they realized that someone
was sucking all their data out and using

629
00:44:30.280 --> 00:44:34.760
it to train other models because here
it was a free source and you could

630
00:44:34.840 --> 00:44:37.119
use it. So he was trying
to stop that. It's just crazy the

631
00:44:37.199 --> 00:44:38.599
things we're dealing with these days.
But don't touch that dial. The podcast bonus

632
00:44:38.599 --> 00:44:45.199
segment is up next. We'll be
right back. All right, folks,

633
00:44:45.239 --> 00:44:50.000
time for the podcast bonus segment here
on a fantastic Inside Analysis with our profiles

634
00:44:50.039 --> 00:44:53.800
in data sage Kent Graziano. We've
been talking all about the Data Universe conference.

635
00:44:53.800 --> 00:44:59.280
Coming up on April tenth and
eleventh, just a few weeks away

636
00:44:59.280 --> 00:45:02.960
now. We'll do a fireside chat
with yours truly about privacy, which of

637
00:45:04.000 --> 00:45:08.960
course conjures up images of governance and
security and data management and ethics and all

638
00:45:09.000 --> 00:45:13.800
that fun stuff. And there's also
the Data Vault conference coming up in the

639
00:45:13.800 --> 00:45:15.920
beginning of May. Is that right, Kent? Yeah, yeah,

640
00:45:16.199 --> 00:45:22.679
WWDVC. It's gonna be the
tenth anniversary. Time flies when you're having fun.

641
00:45:22.800 --> 00:45:28.559
Oh yeah. And it's in
Stowe, Vermont. Beautiful, beautiful location.

642
00:45:29.360 --> 00:45:31.840
It is a little harder to get
to than New York City, but

643
00:45:32.039 --> 00:45:36.679
but it's worth
it. It's worth it to get there.

644
00:45:36.760 --> 00:45:39.639
Well, it's funny speaking of getting
there. When we drove, when

645
00:45:39.679 --> 00:45:43.320
you and I met there, which
was, like, I guess, the year before COVID,

646
00:45:43.360 --> 00:45:47.760
twenty nineteen maybe, and
we drove, and you know, higher forces

647
00:45:47.800 --> 00:45:51.199
were watching out for us because
I'm trying to look at the map, like,

648
00:45:51.239 --> 00:45:53.679
to understand what this is, like
the straight thing we get and we realize

649
00:45:54.079 --> 00:45:59.440
it's a ferry. Like, we better
get that ferry. And we were there

650
00:45:59.519 --> 00:46:04.559
just in time to get the last
one across the southern end of Lake

651
00:46:04.639 --> 00:46:07.480
Champlain. Yeah, I guess so, because we would have had to drive

652
00:46:07.519 --> 00:46:08.920
all the way around. It would have
been a really... You'll be glad to know

653
00:46:09.000 --> 00:46:13.280
there's a bridge there now. It's
there? Okay. There is a

654
00:46:13.320 --> 00:46:19.320
bridge. It's a windy
drive through the countryside there in the

655
00:46:19.480 --> 00:46:22.920
far east border of New York State, the very southern end of Lake Champlain.

656
00:46:23.199 --> 00:46:25.719
But there's actually a bridge that goes
over now, so you

657
00:46:25.760 --> 00:46:30.599
don't have to wait for a ferry. There's actually a real bridge that got

658
00:46:30.599 --> 00:46:34.000
put in there. It must have
been during COVID, because, you know,

659
00:46:34.039 --> 00:46:36.599
when I went after COVID and drove,
I said, oh, hey,

660
00:46:36.639 --> 00:46:39.360
this is great. And I used
Google Maps, and I

661
00:46:39.400 --> 00:46:44.280
was just following the map going it
looks like there's a bridge on the map.

662
00:46:44.400 --> 00:46:46.400
I say, okay, I'll go
that way, sure enough, and

663
00:46:46.440 --> 00:46:51.360
it goes into southern Vermont. Well, and it's funny because, you know,

664
00:46:51.840 --> 00:46:53.960
in terms of following bots and just
doing what the bot tells you.

665
00:46:54.039 --> 00:46:58.480
I heard someone give a real good
example and say you're already doing it when

666
00:46:58.480 --> 00:47:01.280
you use your map feature on your
phone, just trusting what it tells you

667
00:47:01.320 --> 00:47:04.400
to do, and like, oh, here we go. And I think

668
00:47:04.480 --> 00:47:08.960
that is going to be a very
interesting other change in our industries, these

669
00:47:08.960 --> 00:47:14.320
little AI agents doing little things,
just scanning around looking for stuff. I

670
00:47:14.320 --> 00:47:16.679
mean, you and I are to
talk about privacy, and of course that

671
00:47:16.719 --> 00:47:22.039
again brings governance into the picture and
ethics, what you should do ethically.

672
00:47:22.280 --> 00:47:25.679
I think that there are all these
conversations happening now because we need them

673
00:47:25.679 --> 00:47:30.119
now, because guess what, this
is a big deal. And you know,

674
00:47:30.280 --> 00:47:35.280
I'll tell you one of my other
big visions here is these companies like

675
00:47:35.320 --> 00:47:38.880
Amazon and Google and Facebook and LinkedIn
and others have done very well by using

676
00:47:38.920 --> 00:47:43.280
our data to train their algorithms,
which of course they then use, they

677
00:47:43.280 --> 00:47:46.079
say, to give us better stuff. I've never yet bought the argument I'm

678
00:47:46.079 --> 00:47:50.320
going to get better advertising, like
you know, I don't think I've seen

679
00:47:50.440 --> 00:47:54.599
better advertising. Exactly. It's like,
you know, really, is that that's

680
00:47:54.639 --> 00:47:58.000
that's the carrot at the end of
the stick here, is I give up

681
00:47:58.000 --> 00:48:00.639
all my data and then I
know that you're gonna give me good things

682
00:48:00.679 --> 00:48:05.960
that people are trying to sell me, and I haven't seen that. But

683
00:48:06.280 --> 00:48:08.760
I will say, there is some
cool stuff on you know, some cool

684
00:48:08.840 --> 00:48:12.920
gadgets these days. So I've come
across some of those on Facebook and some

685
00:48:12.960 --> 00:48:15.360
other places, so there is some
benefit to it. But I want to

686
00:48:15.360 --> 00:48:19.719
be able to own our algorithm,
like everyone talks about being able to own

687
00:48:19.760 --> 00:48:22.400
your data and get some money like
secondary income. I mean, I think

688
00:48:22.480 --> 00:48:28.079
that the numbers aren't quite there to
really make that interesting, Like how much

689
00:48:28.159 --> 00:48:30.880
data will it be? Is that like how much Spotify pays small bands?

690
00:48:30.920 --> 00:48:34.239
I hear people joke, Yeah,
I got my check for a dollar

691
00:48:34.320 --> 00:48:37.239
forty nine in the mail, and
how exciting is that? There is this

692
00:48:37.360 --> 00:48:42.119
scale problem, But I do feel
like still all the power is centralized with

693
00:48:42.159 --> 00:48:45.760
these big organizations, and I'd love
to find some way to kind of pull

694
00:48:45.840 --> 00:48:50.239
that out and get some more power
back in the individual's hands or even in

695
00:48:50.239 --> 00:48:53.519
aggregate you know, Like I wrote
about what I referred to as a consumer

696
00:48:53.559 --> 00:49:00.760
facing data lake of information about transactions. So you think credit card companies sell

697
00:49:00.800 --> 00:49:05.559
their exhaust data to investment banks so
they can optimize what they buy and the

698
00:49:05.599 --> 00:49:08.800
stock market and other reasons like that. I think when they sell our data

699
00:49:09.280 --> 00:49:14.719
to some third party like that,
they should also provide an anonymized version for

700
00:49:14.840 --> 00:49:17.920
this consumer data lake where you and
I and anyone else could kind of log

701
00:49:17.960 --> 00:49:22.079
in and just see interesting data about
what's happening in the world. How many

702
00:49:22.079 --> 00:49:25.480
apples are being sold, how many
cars are being sold. That's interesting information

703
00:49:25.559 --> 00:49:29.239
for business people, and this kind
of gets back to the journalism thing,

704
00:49:29.800 --> 00:49:34.119
because, especially for business people,
what I want more than anything is information.

705
00:49:34.280 --> 00:49:37.840
I don't really want attitude or spin
or narrative or all that stuff.

706
00:49:37.840 --> 00:49:40.760
What I want is facts about what
are people paying for this, what are

707
00:49:40.760 --> 00:49:44.760
people paying for that? How long
does it take this company to do that

708
00:49:44.840 --> 00:49:46.559
job, how long does it take
that company to do this job. That's

709
00:49:46.719 --> 00:49:52.440
really useful information, and it's just
facts. It's just basic facts that come

710
00:49:52.440 --> 00:49:55.760
from transactional systems. So that's my
big vision for the future. But what

711
00:49:55.800 --> 00:50:00.320
do you think about that? Oh? Yeah, I'm with you because you

712
00:50:00.320 --> 00:50:05.519
know, Dan Linstedt, who
invented Data Vault, he used to

713
00:50:05.519 --> 00:50:10.480
say, you know, the data
vault is a historical repository of the facts

714
00:50:10.960 --> 00:50:15.079
from your source system. It's not
a single source of truth. It's a

715
00:50:15.119 --> 00:50:19.679
single source of facts because you know, truth becomes malleable to a certain extent

716
00:50:19.719 --> 00:50:24.199
with your interpretation, right? You
know, but to create the information,

717
00:50:24.320 --> 00:50:28.519
for it to be information and to be
informative, it has to be based

718
00:50:28.559 --> 00:50:30.159
on facts. So you've got to
start, You've got to set that foundation

719
00:50:30.840 --> 00:50:35.880
right. And so having a you
know, a we don't have to have

720
00:50:35.960 --> 00:50:39.360
a single source of facts like we
used to. That was our goal with

721
00:50:39.440 --> 00:50:44.599
data warehousing was the single source of
truth. You know the technologies. Well,

722
00:50:44.800 --> 00:50:49.079
not everything has to necessarily be in
the data warehouse, provided you've got

723
00:50:49.119 --> 00:50:52.800
the data pipelines and you know the
right business use cases for it, but

724
00:50:52.880 --> 00:50:58.199
you do have to have a source
of fact in order to do this,

725
00:50:58.199 --> 00:51:02.079
because you don't want you know,
guesswork going on, you know, unless,

726
00:51:02.079 --> 00:51:07.639
of course you're dealing with customer sentiment, which is different. But then

727
00:51:07.840 --> 00:51:13.880
even that, getting customer sentiment off
of, say, X or Facebook

728
00:51:13.960 --> 00:51:19.920
or LinkedIn by people posting about your
product, those are facts. This customer

729
00:51:20.039 --> 00:51:25.159
said this, right? Right. And
from that fact and a lot of other

730
00:51:25.199 --> 00:51:31.719
similar facts you can start to evolve
then an understanding of customer sentiment about your

731
00:51:31.760 --> 00:51:37.119
product, right? Right. That's so
cool. I love that, a single source

732
00:51:37.159 --> 00:51:39.920
of facts. It makes a lot
of sense because, and I'll throw one

733
00:51:40.000 --> 00:51:44.920
last teaser out here, I came
across a stat as I was preparing for

734
00:51:44.960 --> 00:51:49.079
my talk on the death of journalism, and the biggest challenges are bias.

735
00:51:49.239 --> 00:51:52.559
Quite frankly, there's bias. There's
also something in might like there's misinformation,

736
00:51:52.840 --> 00:51:59.119
there's disinformation, and there's missed information, and to me, that's the biggest

737
00:51:59.119 --> 00:52:01.679
one. Go ahead.
I'll give you another one: poisoned information.

738
00:52:02.519 --> 00:52:07.800
Poisoned Yeah, yeah, Dan alerted
me to that one. There's been

739
00:52:07.800 --> 00:52:12.599
a couple of articles about that,
about poisoned data. So think of it

740
00:52:12.639 --> 00:52:16.800
as reverse hacking. Instead of somebody
hacking into your system and stealing your data,

741
00:52:17.119 --> 00:52:22.559
they hack into your system and inject
bad data so that your algorithms come

742
00:52:22.559 --> 00:52:27.199
out with the wrong answer. Wow, that's crazy. Isn't that crazy?

743
00:52:27.880 --> 00:52:30.719
That is crazy. Well, it's
been so much fun talking to you.

744
00:52:30.800 --> 00:52:32.920
Kent, you're such a rock star in
the industry. I've always looked to you

745
00:52:32.920 --> 00:52:37.159
for advice on what's happening. I
look forward to seeing you in New York.

746
00:52:37.199 --> 00:52:39.079
And it's in a couple of weeks.
Folks, hop online to Data Universe, and

747
00:52:39.280 --> 00:52:42.760
we'll see you in a couple of weeks. You've been listening to Inside Analysis.

748
00:52:42.960 --> 00:52:46.280
KCAA Radio has openings for one hour
talk shows. If you want to host

749
00:52:46.320 --> 00:52:51.480
a radio show, now is the
time. Make KCAA your flagship station.

750
00:52:51.760 --> 00:52:54.679
Our rates are affordable and our services
are second to none. We broadcast

751
00:52:54.679 --> 00:52:59.880
to a population of five million people
plus. We stream and podcast on all

752
00:53:00.079 --> 00:53:04.719
major online audio and video systems.
If you've been thinking about broadcasting a weekly

753
00:53:04.840 --> 00:53:09.039
radio program on real radio plus the
Internet, contact our CEO at two eight

754
00:53:09.119 --> 00:53:15.119
one five nine nine ninety eight hundred
two eight one five nine nine ninety eight

755
00:53:15.239 --> 00:53:17.679
hundred. You could skype your show
from your home to our Redlands, California

756
00:53:17.679 --> 00:53:22.119
studio, where our live producers and
engineers are ready to work with you personally.

757
00:53:22.320 --> 00:53:28.039
A radio program on KCAA is the
perfect work from home avocation in these

758
00:53:28.039 --> 00:53:32.039
stressful times. Just type KCAA Radio
dot com into your browser to learn more

759
00:53:32.079 --> 00:53:36.599
about hosting a show on the best
station in the nation, or call our

760
00:53:36.679 --> 00:53:43.360
CEO for details, two eight one five
nine nine ninety eight hundred. I'm doctor

761
00:53:43.400 --> 00:53:50.639
Anthony Leiserowitz, and this is Climate
Connections. Plastic is used in many products,

762
00:53:50.760 --> 00:53:54.760
from containers and bags to electronics and
vehicles, and it's a significant source

763
00:53:54.880 --> 00:53:59.960
of climate warming pollution. Plastics production, at the moment, it contributes

764
00:54:00.480 --> 00:54:06.199
to four point five percent of global
greenhouse gas emissions. Livia Cabernard of the

765
00:54:06.239 --> 00:54:09.639
Technical University in Munich, Germany,
explains that almost all plastics are made from

766
00:54:09.679 --> 00:54:15.320
oil, natural gas or coal.
Extracting and transporting those fuels emits carbon pollution.

767
00:54:16.000 --> 00:54:20.719
Then more fossil fuels are burned to
supply the heat and electricity used to

768
00:54:20.760 --> 00:54:25.159
refine those raw materials and manufacture the
plastic products, so the entire process is

769
00:54:25.320 --> 00:54:30.559
very carbon intensive. Cabernard's research shows
that in the past few decades, carbon

770
00:54:30.599 --> 00:54:36.719
pollution from making plastic has doubled because
production has grown and shifted to parts of

771
00:54:36.760 --> 00:54:39.239
the world that burn a lot of
coal for energy. So she says the

772
00:54:39.280 --> 00:54:45.000
industry needs to change. Making plastics
from algae or plants can reduce the need

773
00:54:45.079 --> 00:54:50.199
for fossil fuels as raw ingredients,
and switching to renewable energy can reduce carbon

774
00:54:50.239 --> 00:54:54.639
pollution from production facilities. Also,
for sure, we should avoid plastics whenever

775
00:54:54.840 --> 00:55:00.280
possible, like the single use plastics
to help limit how much is made in

776
00:55:00.320 --> 00:55:05.719
the first place. Climate Connections is
produced by the Yale Center for Environmental Communication.

777
00:55:06.159 --> 00:55:12.440
To learn more about climate change,
visit climate connections dot org. And

778
00:55:12.519 --> 00:55:17.039
now the Voices of KCAA with an
exciting announcement. Want to hear NBC News

779
00:55:17.199 --> 00:55:21.440
or KCAA anywhere you go? Well, now there's an app for that.

780
00:55:21.719 --> 00:55:25.679
KCAA is celebrating twenty five years in
our silver Anniversary with a brand new app.

781
00:55:25.840 --> 00:55:30.679
The new KCAA app is now available
on your smart device, cell phone,

782
00:55:30.920 --> 00:55:35.920
in your car, or any place. Just search KCAA on Google Play

783
00:55:36.199 --> 00:55:38.599
or in the Apple Store. One touch
and you can listen on your car radio,

784
00:55:38.679 --> 00:55:44.599
Bluetooth device, Android Auto or Apple
CarPlay. Catch the KCAA buzz in

785
00:55:44.639 --> 00:55:47.800
your earbuds or on the streets.
Celebrating twenty five years of talk news and

786
00:55:47.880 --> 00:55:53.360
excellence with our new KCAA app,
Just do it and download it. KCAA

787
00:55:53.440 --> 00:56:00.440
celebrating twenty five years. I'm Listening
reminds you that talk saves lives and nine

788
00:56:00.440 --> 00:56:04.239
eight eight makes it even easier to
reach out and talk. Nine one

789
00:56:04.239 --> 00:56:07.360
one for emergency services, nine eight eight
for mental health needs. Nine eight eight connects

790
00:56:07.360 --> 00:56:12.239
you with trained counselors and over two
hundred crisis centers nationwide. Find out more

791
00:56:12.360 --> 00:56:15.920
at I'm Listening dot org. AM
radio provides always-on news, sports, talk,

792
00:56:16.039 --> 00:56:21.239
traffic, and weather reports. It
also delivers vital emergency information when your

793
00:56:21.280 --> 00:56:24.639
community needs it most. A new
bill in Congress would ensure AM radio stays

794
00:56:24.679 --> 00:56:29.880
in your car because when cell and
internet services are down, this free emergency

795
00:56:29.920 --> 00:56:34.320
service is critical. Text AM to
five two eight eighty six and tell Congress

796
00:56:34.360 --> 00:56:37.239
to support the AM Radio for
Every Vehicle Act. Message and data rates may apply.

797
00:56:37.360 --> 00:56:39.199
You may receive up to four messages
a month, and you may text

798
00:56:39.199 --> 00:56:44.320
stop to stop this message. Furnished
by the National Association of Broadcasters.

799
00:56:44.480 --> 00:56:49.159
Taheebo Tea Club's original pure pau d'arco super
tea comes from the only tree in the
tea comes from the only tree in the

800
00:56:49.239 --> 00:56:52.719
world that fungus does not grow on. As a result, it naturally has

801
00:56:52.800 --> 00:56:58.559
anti fungal, anti infection, anti
viral, antibacterial, anti inflammation, and

802
00:56:58.800 --> 00:57:01.760
anti parasite properties. So the tea
is great for healthy people because it helps

803
00:57:01.760 --> 00:57:06.800
build the immune system, and it
can be truly miraculous for someone fighting a

804
00:57:06.840 --> 00:57:10.320
potentially life threatening disease due to an
infection, diabetes, or cancer. The

805
00:57:10.360 --> 00:57:15.599
tea is also organic and naturally caffeine
free. A one pound package of tea

806
00:57:15.679 --> 00:57:17.760
is forty nine ninety five,
which includes shipping. To order, please

807
00:57:17.840 --> 00:57:22.880
visit Taheebo Tea Club dot com.
Taheebo is spelled T like Tom, A,

808
00:57:22.480 --> 00:57:27.719
H, E, E, B like boy, O.
Then continue with the word tea and

809
00:57:27.760 --> 00:57:30.920
then the word club. The complete
website is Taheebo Tea Club dot com or

810
00:57:30.960 --> 00:57:36.519
call us at eight one eight six
one zero eight zero eight eight Monday through

811
00:57:36.559 --> 00:57:40.159
Saturday nine am to five pm California
time. That's eight one eight six one

812
00:57:40.239 --> 00:57:45.599
zero eight zero eight eight. Taheebo Tea
Club dot com. Are you graduating high

813
00:57:45.639 --> 00:57:51.199
school soon and wondering what to do
next? College is one option, but

814
00:57:51.400 --> 00:57:58.119
why not consider the high paying jobs
made possible by union power Labor Union Teamsters

815
00:57:58.199 --> 00:58:01.400
Local nineteen thirty two has opened a
training center to get you into the high

816
00:58:01.480 --> 00:58:07.400
school to high paying job pipeline.
You'll learn all the skills needed to excel

817
00:58:07.440 --> 00:58:15.039
in opportunities across industries. Visit nineteen
thirty two Trainingcenter dot org to enroll today.

818
00:58:15.639 --> 00:58:23.159
That's nineteen thirty two Trainingcenter dot org. Look around your office? Is

819
00:58:23.159 --> 00:58:27.760
it time to change things up?
Start a new home office or reorganize your

820
00:58:27.760 --> 00:58:31.639
professional office space. Visit Office Furniture
Outlet in Corona and you'll feel great.

821
00:58:31.760 --> 00:58:36.880
With a huge inventory of both new
and pre owned office furniture you can buy,

822
00:58:37.119 --> 00:58:39.559
sell, or even trade to get
the job done. Office Furniture Outlet

823
00:58:39.599 --> 00:58:44.960
in Corona will get you looking and
feeling good, and that simply means success

824
00:58:44.960 --> 00:58:49.880
and great business results. From executive
office collections to home office options, you

825
00:58:49.960 --> 00:58:53.960
can find exactly what you need at
an affordable price at Office Furniture Outlet. They

826
00:58:54.000 --> 00:58:59.599
have desk selections that range from modern
and contemporary to traditional and elegant. With

827
00:58:59.679 --> 00:59:02.320
the large selection of sizes,
finishes, and styles, you can design

828
00:59:02.360 --> 00:59:06.760
an office just the way you need
it, an office you can be proud

829
00:59:06.800 --> 00:59:08.880
of. Office Furniture Outlet in
Corona is just south of the ninety one

830
00:59:08.880 --> 00:59:15.280
Freeway in McKinley at two eighty four
DuPont Street, or visit ofousa dot com.

831
00:59:15.320 --> 00:59:22.599
That's ofousa dot Com for the office
furniture outlet in Corona today. For

832
00:59:22.679 --> 00:59:29.079
several years, KCAA has been marketing
the Youngevity brand of nutritional and personal care

833
00:59:29.119 --> 00:59:34.320
products. Our experience with Youngevity has
been one hundred percent positive, so we

834
00:59:34.400 --> 00:59:38.159
are pleased to recommend them to you. Regarding nutritional supplements, we recommend pollen

835
00:59:38.199 --> 00:59:44.559
Burst in the berry flavor and tangy
Tangerine two point zero in the tablet form.

836
00:59:44.760 --> 00:59:49.119
For regularity issues, we recommend three
day cleanse, and for personal care

837
00:59:49.400 --> 00:59:54.360
we recommend morning hydration cream. You
can shop online for Youngevity at www dot

838
00:59:54.440 --> 00:59:59.960
KCAA team dot com, or you
can order by phone by calling eight hundred,

839
01:00:00.039 --> 01:00:02.360
nine eight two, three one nine seven, and tell customer support

840
01:00:02.400 --> 01:00:07.320
that you are part of the KCAA
team. Youngevity is an American company based

841
01:00:07.360 --> 01:00:08.119
in San Diego.

