WEBVTT

1
00:00:00.320 --> 00:00:05.040
As a ride. The world is
teeming with innovation as new business models reinvent

2
00:00:05.240 --> 00:00:11.199
every industry. Inside Analysis is
your source of information and insight about how

3
00:00:11.240 --> 00:00:15.880
to make the most of this exciting
new era. Learn more at inside analysis

4
00:00:15.960 --> 00:00:20.920
dot com, insideanalysis dot com.
And now here's your host, Eric Kavanagh.

5
00:00:21.519 --> 00:00:30.120
Ladies and gentlemen, hello, and
welcome back once again to the Bloor

6
00:00:30.120 --> 00:00:34.920
Group's webinar series. I'm very pleased
to have a very special guest today with

7
00:00:35.039 --> 00:00:38.439
us. A former Gartner analyst and
all-around expert who has been in the field

8
00:00:38.880 --> 00:00:42.240
for quite some time, Sumit Pal is
with us today, and we're going to talk

9
00:00:42.280 --> 00:00:47.119
about knowledge graphs and the whole concept
of a center of excellence. And if

10
00:00:47.119 --> 00:00:49.960
you were here in the pre-show, I was chit-chatting with Sumit about

11
00:00:50.439 --> 00:00:55.119
all the different applications of knowledge graphs, in particular within the context of all

12
00:00:55.159 --> 00:01:00.560
this generative AI and, beyond that,
foundational models. Gen AI is just one flavor

13
00:01:00.600 --> 00:01:06.200
of artificial intelligence. There are many
other forms of AI. But I think,

14
00:01:06.480 --> 00:01:10.680
as we've been discussing, by and
large, we're going through a major

15
00:01:10.840 --> 00:01:15.680
transformation in enterprise software now and in
how business gets done, quite frankly,

16
00:01:15.159 --> 00:01:22.239
and these AI models are going to
subsume most of traditional enterprise software sooner or

17
00:01:22.319 --> 00:01:26.799
later. Some of the low hanging
fruit is definitely in the customer service space,

18
00:01:26.840 --> 00:01:32.000
obviously in copywriting and content creation, things
of this nature. But you can

19
00:01:32.040 --> 00:01:37.560
rest assured that business intelligence, analytics, most of what enterprise software does,

20
00:01:37.840 --> 00:01:42.280
is going to find itself in the
crosshairs of these foundational models. And here's

21
00:01:42.319 --> 00:01:49.079
the good news, folks. Graphs,
especially knowledge graphs, are extremely valuable at

22
00:01:49.120 --> 00:01:53.640
being able to true the wheel of
GenAI, if you will. In other

23
00:01:53.680 --> 00:01:57.120
words, whether it's part of a
RAG architecture, and that's probably most of

24
00:01:57.159 --> 00:02:00.959
what you'll see, or some other
type of implementation, perhaps fine-tuning, knowledge

25
00:02:01.000 --> 00:02:08.400
graphs provide a tremendous foundation for the
concepts and constructs and key ideas that you're

26
00:02:08.400 --> 00:02:13.479
trying to manage with your enterprise.
So with that, I'm going to hand

27
00:02:13.479 --> 00:02:16.800
it over to Sumit Pal. Forgive my
long intro there. Knowledge graph center of excellence:

28
00:02:16.800 --> 00:02:20.520
take it away, Sumit. Thank
you, Eric. Thank you for

29
00:02:20.560 --> 00:02:23.800
the introduction and for giving us the
opportunity to discuss knowledge graphs and a center

30
00:02:23.840 --> 00:02:28.520
of excellence around them. Hi,
my name is Sumit, and I'm delighted to

31
00:02:28.560 --> 00:02:31.879
have all of you here. My
role here at Ontotext straddles marketing,

32
00:02:31.960 --> 00:02:38.680
pre sales and solutions engineering to bring
in thought leadership across various industries for adoption

33
00:02:38.879 --> 00:02:46.000
of knowledge graph and semantic technologies. So I'd like to welcome you

34
00:02:46.039 --> 00:02:47.759
to this buffet, the data buffet, right? This is a picture from

35
00:02:47.879 --> 00:02:52.360
Matt Turck's MAD Landscape 2024. I'm not sure how many of you

36
00:02:52.400 --> 00:02:55.159
have seen it. There are about
twenty four hundred logos here on this one

37
00:02:55.199 --> 00:03:00.000
single chart. It's called MAD for
a reason, because it covers machine learning,

38
00:03:00.080 --> 00:03:05.840
AI, business intelligence, and
data. And what we see here is

39
00:03:05.960 --> 00:03:10.159
the data ecosystem today is crowded with
shiny objects, dazzling buzzwords, and data

40
00:03:10.199 --> 00:03:15.800
ecosystems have sort of become data jungles
where data teams are struggling, grappling with

41
00:03:16.039 --> 00:03:24.439
the high entropy in this ecosystem to
create in this disaggregated system a functional modern

42
00:03:24.520 --> 00:03:30.199
data experience. What is happening as
a result of this is that data skills

43
00:03:30.199 --> 00:03:35.919
have become sort of the most sought
after skills, and job descriptions are shifting

44
00:03:35.960 --> 00:03:39.319
as quickly as the new tools hit
the market. This unbundling of the data

45
00:03:39.360 --> 00:03:44.360
ecosystem has led to the problem that
there is no one end-to-end

46
00:03:44.360 --> 00:03:50.759
tool, causing data teams
and data personas to duct-tape different products

47
00:03:50.759 --> 00:03:55.639
and frameworks to build these
end-to-end automated, agile, and repeatable

48
00:03:55.800 --> 00:04:01.960
data-driven systems. Let's look at
this data maturity spectrum, right? We call

49
00:04:02.000 --> 00:04:08.080
it the DIKW pyramid, which I will
show in the next slide. All organizations

50
00:04:08.080 --> 00:04:13.120
today are on this data journey,
but it turns out that almost every organization

51
00:04:13.879 --> 00:04:18.040
is stuck at the information layer.
They find it very difficult to cross that

52
00:04:18.160 --> 00:04:23.040
chasm from information layer to the knowledge
layer. They have no dearth of data.

53
00:04:23.199 --> 00:04:27.759
The data is hidden in data lakes, lakehouses, data warehouses. There's

54
00:04:27.800 --> 00:04:32.040
lots of data, but it lacks
context to make the connections across these data

55
00:04:32.040 --> 00:04:38.519
sources, across these data units to
gain valuable insights. So we have this

56
00:04:38.600 --> 00:04:43.879
thing where we say: data, data everywhere, not a drop of insight, nor a

57
00:04:44.160 --> 00:04:47.879
drop of context. We have heard
about big data, wide data and so

58
00:04:47.959 --> 00:04:54.720
on, but actually no one talks
about the context aspects. And as we

59
00:04:54.879 --> 00:05:00.439
get more and more data, the
context gets diluted unless it's managed well.

60
00:05:00.959 --> 00:05:08.360
So this is the DIKW pyramid, and
what we see is most organizations just stop

61
00:05:08.360 --> 00:05:11.959
at that information layer. There are
some organizations that can make this transition to

62
00:05:12.000 --> 00:05:15.839
the knowledge layer through the use of
graphs and knowledge graphs. And the critical

63
00:05:15.959 --> 00:05:23.240
step, the critical misstep, I
would say, in adopting AI and data

64
00:05:23.319 --> 00:05:29.879
technologies is the disconnect, the major
disconnect between business needs and technology. Organizations

65
00:05:30.160 --> 00:05:34.839
rush to embrace technologies without clearly defining
the problems they are going to solve.

66
00:05:35.959 --> 00:05:42.720
They bring in different technologies in a
more bottom up approach instead of taking a

67
00:05:42.800 --> 00:05:47.120
top down approach to solve business use
cases that leverage data and the technologies,

68
00:05:47.199 --> 00:05:53.600
and hence miss the fact that they
are building solutions that

69
00:05:53.680 --> 00:05:57.720
often don't serve the business. And
this is where we come up with this

70
00:05:57.800 --> 00:06:03.639
term called bad data tax. In
this race to become data driven, most

71
00:06:03.639 --> 00:06:10.759
of the efforts of most organizations have
resulted in a tangled web of data integrations

72
00:06:10.800 --> 00:06:16.439
often point-to-point integrations,
as well as reconciliations across these data silos,

73
00:06:16.800 --> 00:06:21.439
and this has resulted in a huge
cost to the organization, often forty

74
00:06:21.560 --> 00:06:28.920
to sixty percent of an enterprise's annual
technology spend, which amounts to millions of

75
00:06:29.000 --> 00:06:33.959
dollars. We call this the bad
data tax, and these investments often don't

76
00:06:34.040 --> 00:06:43.079
translate into insights needed to deliver better
decisions or build better processes, and hence

77
00:06:43.160 --> 00:06:47.800
there is a solid justification in every
organization to fix this so that those who

78
00:06:47.879 --> 00:06:53.879
need access to the data can convert
it to insights and drive business decisions and

79
00:06:53.959 --> 00:07:00.399
processes, and make data available and
accessible in the right format, one that

80
00:07:00.439 --> 00:07:08.000
is flexible, accurate, and
machine-readable. It has also been seen that,

81
00:07:08.319 --> 00:07:13.120
you know, from my experience at
Gartner when I used to talk to customers

82
00:07:13.160 --> 00:07:15.879
and clients all the time, it
seems that only half of the CDOs are

83
00:07:16.279 --> 00:07:21.600
able to drive innovation using data and
just about forty percent of the CDOs manage

84
00:07:21.720 --> 00:07:27.720
data as a business asset. These
are some of the numbers that we have

85
00:07:27.839 --> 00:07:33.240
seen. What also comes to my mind
is that about twenty to

86
00:07:33.279 --> 00:07:39.079
twenty five percent of the CDOs also
do not have a single point of accountability

87
00:07:39.279 --> 00:07:46.120
for data within their organizations. This
was a survey done by Salesforce a

88
00:07:46.240 --> 00:07:54.680
year back, where they illustrated this point:
ninety percent of data teams agree

89
00:07:54.680 --> 00:07:59.240
that the need for trustworthy data is
higher than ever and today it's even more

90
00:07:59.319 --> 00:08:03.800
because it's again the whole idea about
garbage in, garbage out. If

91
00:08:03.839 --> 00:08:07.399
you're feeding garbage to your AI technology, you're going to get garbage out. So

92
00:08:07.519 --> 00:08:13.000
this mind map
sort of summarizes what we discussed in terms

93
00:08:13.040 --> 00:08:16.319
of challenges that most organizations are facing. One of the biggest challenges you'll see

94
00:08:16.360 --> 00:08:22.560
the one which is right at the
bottom, is about findability.

95
00:08:22.600 --> 00:08:26.879
Most data personas in most organizations cannot
link the data, cannot find the data

96
00:08:28.399 --> 00:08:33.080
because of a lack of context.
And McKinsey and IDC had done research

97
00:08:33.120 --> 00:08:39.440
about a few years back where they
found out most data personas in organizations spend

98
00:08:39.480 --> 00:08:45.600
about thirty percent of their time just
finding the right data for their use case.

99
00:08:46.559 --> 00:08:50.679
So if we ask ourselves these questions, right, why do most data

100
00:08:50.879 --> 00:08:56.919
lake efforts fail, and why is data
getting increasingly harder to find? Why are existing

101
00:08:56.000 --> 00:09:01.679
data catalogs not working for most
enterprises? The reason is that, to be effective

102
00:09:01.720 --> 00:09:05.879
to work with data, it's not
just technology and more data. It requires

103
00:09:07.080 --> 00:09:13.559
context and semantics to make the data
more powerful. What has happened is we

104
00:09:13.600 --> 00:09:20.399
have allowed the data to lose consistency
and precision of meaning because most organizations haven't

105
00:09:20.440 --> 00:09:30.519
thought about context and semantics as they
build their enterprise data platforms. So data

106
00:09:30.519 --> 00:09:35.399
by itself is powerful, but the
challenge is again the context. As data

107
00:09:35.440 --> 00:09:39.919
has grown in volumes, the need
for automated context has grown as well,

108
00:09:41.279 --> 00:09:50.000
and that's why organizations cannot aspire to
or cannot dream to become data or AI

109
00:09:50.080 --> 00:09:54.279
driven unless, in my mind,
they are context driven. And data context here

110
00:09:54.320 --> 00:10:01.639
includes both business and technical metadata, governance, privacy, and accessibility issues. It

111
00:10:01.720 --> 00:10:11.000
is context that makes data more valuable.
And in my opinion, data

112
00:10:11.080 --> 00:10:18.600
engineering will still remain a huge cost
center for most organizations until it matures from

113
00:10:18.720 --> 00:10:26.200
being a pure ETL-oriented or an
ELT-oriented approach to an ECL-oriented approach.

114
00:10:26.240 --> 00:10:33.200
Where C is the context by leveraging
knowledge graphs and ontologies for knowledge management.

115
00:10:35.240 --> 00:10:39.480
So what is a knowledge graph?
A knowledge graph is sort of a network

116
00:10:39.519 --> 00:10:45.399
of entities representing real world domain objects
like people and organizations, as well as

117
00:10:45.440 --> 00:10:52.240
concepts, topics and their semantic relationships
and attributes. Knowledge graphs with their emphasis

118
00:10:52.399 --> 00:11:00.360
on semantic
relations between the entities, create the context

119
00:11:00.480 --> 00:11:07.799
for both humans as well as machines
to do automated reasoning. And knowledge graphs

120
00:11:07.840 --> 00:11:11.360
go beyond just simple storage and querying
of data and focus more on

121
00:11:11.440 --> 00:11:18.840
the idea of definitions of the connections
between the entities and as you will see,

122
00:11:18.840 --> 00:11:24.679
it requires connecting the dots across most
of the organization where most organizations struggle,

123
00:11:24.200 --> 00:11:31.799
and knowledge graphs help to build this
foundational semantic graph layer by semantically linking

124
00:11:31.840 --> 00:11:35.399
the data across the silos, whether
it's structured, unstructured, semi structured,

125
00:11:37.000 --> 00:11:46.960
and that reduces or eliminates bottlenecks in
the process of becoming data driven. What

126
00:11:46.120 --> 00:11:50.840
makes the knowledge graph useful and powerful
is that semantic model: the ontologies, the

127
00:11:50.960 --> 00:11:58.000
taxonomies that include domain concepts and their
interrelationships, their hierarchies, their dependencies.

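NOTE
Editor's illustration, not part of the spoken webinar: a minimal Python sketch of how a taxonomy's hierarchy can attach context to a concept. The concept names, the SUBCLASS_OF table, and the ancestors() helper are all hypothetical; a production knowledge graph would typically express this with RDFS or OWL and a reasoner rather than a dictionary.
```python
# Hypothetical toy taxonomy: each concept points to its broader (parent) concept.
SUBCLASS_OF = {
    "SavingsAccount": "Account",
    "Account": "FinancialProduct",
    "Mortgage": "FinancialProduct",
}
def ancestors(concept: str) -> list[str]:
    """Walk up the hierarchy and return every broader concept, nearest first."""
    chain = []
    while concept in SUBCLASS_OF:
        concept = SUBCLASS_OF[concept]
        chain.append(concept)
    return chain
# A record typed only as SavingsAccount is also an Account and a FinancialProduct.
print(ancestors("SavingsAccount"))
```
The point of the sketch is that the broader meanings are derived from the model, not hand-coded into each pipeline.
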
128
00:11:58.480 --> 00:12:05.039
It is this semantics that actually enriches
data with the context that both machines

129
00:12:05.159 --> 00:12:15.200
and humans can interpret unambiguously. So
knowledge graphs actually give a holistic view of

130
00:12:15.240 --> 00:12:22.000
the data, revealing these intricate hierarchies
within the data and the precise definitions that give

131
00:12:22.120 --> 00:12:26.000
meaning to the data. As I
said earlier, data by itself is powerful,

132
00:12:26.039 --> 00:12:33.639
but context is what gives the meaning
and real value to the data.

133
00:12:33.759 --> 00:12:35.519
You know, real quick, Sumit, we got a question from

134
00:12:35.519 --> 00:12:39.200
the audience, and it's a very
good question about data literacy, it seems

135
00:12:39.240 --> 00:12:43.840
to me, and you've already mentioned
data catalogs. Obviously, data catalogs are

136
00:12:43.879 --> 00:12:48.960
great for improving data literacy because the
whole point is to capture the definitions and

137
00:12:48.000 --> 00:12:52.600
the meanings of these concepts and to
share them in a useful, accessible way.

138
00:12:52.840 --> 00:12:56.559
But it seems to me a knowledge
graph is an incredibly powerful tool for

139
00:12:56.720 --> 00:13:01.320
improving data literacy. What do you
think? Absolutely, absolutely. That's a

140
00:13:01.440 --> 00:13:07.320
very good question and a very good point:
a knowledge graph has embedded in it the

141
00:13:07.600 --> 00:13:11.799
semantics and the context, as I
keep saying again and again, which are

142
00:13:11.879 --> 00:13:16.720
again very domain specific. A lot
of the times, the data literacy aspect

143
00:13:16.759 --> 00:13:22.000
is understanding the data and understanding the
nuances associated with the data, which is

144
00:13:22.039 --> 00:13:30.799
oftentimes embedded in the heads of domain
specialists, data stewards, even data engineers.

145
00:13:30.799 --> 00:13:35.360
They are encoding all those business rules
in their SQL code, and that sort

146
00:13:35.399 --> 00:13:41.440
of, you know, decouples the data
from the logic, whereas in a knowledge

147
00:13:41.480 --> 00:13:46.679
graph, everything is connected.
The data and the

148
00:13:46.720 --> 00:13:52.600
metadata are in the same place.
That provides enormous benefits from a data literacy

149
00:13:52.639 --> 00:13:58.679
perspective. Spot on.
Okay, go ahead. So the knowledge

150
00:13:58.679 --> 00:14:03.559
graph platform adds semantics and meaning to
the data, as we said, but

151
00:14:03.600 --> 00:14:09.159
how does it do it right?
It treats the connections between the different entities

152
00:14:09.440 --> 00:14:15.519
as first-class citizens. Relationships are first-class citizens
in a knowledge graph or in a graph-based

153
00:14:15.559 --> 00:14:18.519
technology, which uses nodes, edges, and labels to depict these entities, their

154
00:14:18.519 --> 00:14:24.399
interrelationships, and properties. It's this
semantic layer that comes out from it that

155
00:14:24.480 --> 00:14:31.320
contextualizes the data, giving it a formal representation and meaning, making

156
00:14:31.320 --> 00:14:37.080
it machine-interpretable. The other
advantage of knowledge graphs, especially those built with

157
00:14:37.600 --> 00:14:43.679
the RDF stack, is that they follow open standards. Everything in a knowledge graph that is

158
00:14:43.720 --> 00:14:50.120
built with an RDF stack is thereby
reusable, it's interoperable, and it's very

159
00:14:52.000 --> 00:14:58.960
amenable to data sharing with
unambiguous semantics. Also, the other aspect

160
00:14:58.960 --> 00:15:03.320
of graphs is that graphs can very
easily accommodate change, because

161
00:15:03.480 --> 00:15:11.759
enterprise data systems are always changing,
and graphs provide the flexibility of a flexible schema

162
00:15:11.879 --> 00:15:18.240
to make adjustments and changes. This
shared meaning resolves a lot of the ambiguities

163
00:15:18.240 --> 00:15:22.120
associated, especially when you're building, let's
say, a simple data pipeline. As you're

164
00:15:22.120 --> 00:15:28.320
handing off data or bringing in data
from operational systems to the analytical systems you utilize,

165
00:15:28.399 --> 00:15:33.720
there is sort of an impedance
mismatch, where a term can mean

166
00:15:33.759 --> 00:15:37.840
one thing in the operational systems while
it means something else in the analytics system.

167
00:15:37.240 --> 00:15:43.120
That's where knowledge graphs, with their
ideas of semantics, with ontologies and taxonomies,

168
00:15:43.600 --> 00:15:50.039
remove these ambiguities. Now, there
are different ways in which semantic technologies

169
00:15:50.080 --> 00:15:54.120
increase the value of data. First, they help in data integration.

170
00:15:54.120 --> 00:15:58.480
We all do data integration.
Data integration is probably one of the richest

171
00:16:00.039 --> 00:16:03.080
tools you would see in that first
slide or the third slide that I showed

172
00:16:03.120 --> 00:16:07.000
you. But those are all doing
data integration with a lot of the code,

173
00:16:07.039 --> 00:16:08.519
with a lot of the business logic
in the minds of the people who

174
00:16:08.559 --> 00:16:12.559
are implementing it. But with knowledge
graphs, we do what is called semantic

175
00:16:12.679 --> 00:16:19.120
data integration, where you are semantically
joining the data across different data silos.

176
00:16:19.200 --> 00:16:25.080
That's why you will see a lot
of the data fabrics are powered with knowledge

177
00:16:25.080 --> 00:16:30.200
graphs. The second aspect is data
quality as it captures relationships between things and

178
00:16:30.600 --> 00:16:37.919
adding context through ontologies, as well as
inferencing through the relationships. And a

179
00:16:38.000 --> 00:16:44.519
side benefit of this is doing entity
resolution, so we don't have to

180
00:16:44.600 --> 00:16:48.879
deduplicate nodes in a graph.
In a knowledge graph built with an RDF stack,

181
00:16:49.360 --> 00:16:53.480
you cannot have duplicates. The system
will not allow you to have duplicates.

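NOTE
Editor's illustration, not part of the spoken webinar: a sketch of why re-ingesting the same fact does not create a duplicate. In the RDF data model a graph is a set of triples, so it is modeled here with a plain Python set. The example.org IRIs and the ingest() helper are hypothetical. Note the limit of this guarantee: records that arrive under different IRIs remain distinct nodes and still need explicit entity resolution.
```python
Triple = tuple[str, str, str]
def ingest(graph: set[Triple], triples: list[Triple]) -> int:
    """Add triples to the graph and return how many were actually new."""
    before = len(graph)
    graph.update(triples)
    return len(graph) - before
graph: set[Triple] = set()
batch = [
    ("http://example.org/customer/42", "http://example.org/name", "Acme Corp"),
    ("http://example.org/customer/42", "http://example.org/locatedIn", "http://example.org/city/Boston"),
]
first = ingest(graph, batch)   # both triples are new
second = ingest(graph, batch)  # same batch again: nothing is added
print(first, second, len(graph))
```
Loading the batch twice leaves the graph with the same two triples, which is the set behavior the speaker is relying on.
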
182
00:16:53.720 --> 00:16:59.960
It removes that whole data engineering aspect
of you know, where we do

183
00:17:00.039 --> 00:17:03.400
deduplication, which is a huge set of
work that needs to be done. But

184
00:17:03.440 --> 00:17:07.200
with knowledge graphs, when you're ingesting
data into a knowledge graph, especially with

185
00:17:07.240 --> 00:17:11.880
an RDF stack, it will prevent
duplicates from happening. So all this leads

186
00:17:11.880 --> 00:17:18.720
to trust with data validation, lineage
and provenance. Out of the box, a

187
00:17:18.759 --> 00:17:23.680
knowledge graph built with RDF provides you
provenance. Think about the tools that provide

188
00:17:23.680 --> 00:17:29.160
you provenance and lineage, and how much work
they have to do to provide you the

189
00:17:29.200 --> 00:17:36.039
whole lineage stack. With knowledge
graphs, there are built-in versioning capabilities and capabilities

190
00:17:36.079 --> 00:17:40.759
to do provenance. There is a
provenance-based ontology. It comes to you

191
00:17:41.000 --> 00:17:45.160
out of the box. Same thing
with what we call the FAIR principles. Data,

192
00:17:47.079 --> 00:17:51.279
in our opinion, needs to adhere
to these FAIR principles, which are findable,

193
00:17:51.720 --> 00:17:56.039
accessible, interoperable, and reusable.
And knowledge graphs provide you with all

194
00:17:56.119 --> 00:18:02.119
these capabilities. So this slide sort
of summarizes all that we discussed earlier

195
00:18:02.119 --> 00:18:07.319
about the capabilities of knowledge graphs to
remove ambiguities, to represent data consistently,

196
00:18:07.839 --> 00:18:14.920
and integrate and unify the data sources. Now, this graph foundation, this

197
00:18:15.200 --> 00:18:19.839
is what I've been talking about.
A graph foundation with a knowledge graph based

198
00:18:19.960 --> 00:18:26.799
on taxonomies and ontologies enables you
to do these things that we have

199
00:18:26.920 --> 00:18:33.519
all been doing in a much more
seamless, much more cost effective way where

200
00:18:33.680 --> 00:18:40.319
you get these
new capabilities out of the box. And

201
00:18:40.599 --> 00:18:44.480
how does a knowledge graph do it? It does it with these semantic standards.

202
00:18:44.519 --> 00:18:48.839
Semantic standards have been there since the
beginning of the century or end of

203
00:18:48.359 --> 00:18:56.519
last century. They use identifiers
to represent entities and concepts, and

204
00:18:56.720 --> 00:19:03.160
all these result in these foundational capabilities
that are very essential to the whole data

205
00:19:03.200 --> 00:19:10.400
management practice, to provide things like data quality
with validation capabilities,

206
00:19:10.559 --> 00:19:15.240
reuse, governance, and lineage.
And on the right-hand side you

207
00:19:15.279 --> 00:19:21.920
see the value drivers of all these
foundational capabilities. Now, this is the

208
00:19:22.160 --> 00:19:29.039
outline of an enterprise knowledge graph platform
and how it interfaces, how it interconnects

209
00:19:29.119 --> 00:19:33.920
with the different tools and engines that
we have in most organizations in their legacy

210
00:19:33.920 --> 00:19:41.160
systems, and it supports two major
design patterns here, the semantic knowledge hub

211
00:19:41.559 --> 00:19:45.839
and the semantic data fabric. The
knowledge hub part of it uses knowledge graphs

212
00:19:45.839 --> 00:19:52.720
to manage documents and unstructured content.
Unstructured content is all around us, and this improves

213
00:19:52.759 --> 00:19:59.200
the way the documents are found,
especially with the relevance and with their precision

214
00:19:59.240 --> 00:20:03.640
and accuracy. The data fabric side, the semantic data fabric side, is

215
00:20:03.799 --> 00:20:11.640
the pattern that provides better unified access
across multiple structured or semi structured data sources

216
00:20:11.440 --> 00:20:17.640
and its objective is to enable querying
all of them as if it's a single

217
00:20:17.720 --> 00:20:22.039
federated database or data source.
Both these use

218
00:20:22.079 --> 00:20:26.319
cases leverage the semantic metadata, which, as you see in the center,

219
00:20:26.440 --> 00:20:33.759
is the conceptual model, which is
based on domain specific ontologies and other metadata

220
00:20:33.799 --> 00:20:41.559
capabilities. When you do this,
it makes data much more discoverable, interpretable,

221
00:20:42.519 --> 00:20:49.240
unambiguous and also consistent. And at
the bottom we see to manage these

222
00:20:49.279 --> 00:20:56.240
platforms we need different engines and capabilities. These include integration with LLMs, with

223
00:20:56.359 --> 00:21:00.839
machine learning tools, especially text
analytics, document stores, maybe full-text search

224
00:21:00.880 --> 00:21:08.119
engines and vector databases, and
again integration with other LPG-based graph sources

225
00:21:08.400 --> 00:21:15.160
to do graph analytics. And these
are some of the major use cases that

226
00:21:15.359 --> 00:21:19.880
Ontotext has been doing in the last
twenty years to solve some of these data

227
00:21:19.880 --> 00:21:27.720
management problems along with the high level
architectural patterns that support those use cases.

228
00:21:29.400 --> 00:21:34.400
Now, some of the
next slides are mostly quotes, which I'll quickly

229
00:21:34.440 --> 00:21:40.039
go through. You can read them
once these decks are shared. However,

230
00:21:40.079 --> 00:21:44.240
there is one particular example: even
Gartner has been talking in

231
00:21:44.279 --> 00:21:48.279
the last five to seven years about the
value of graphs and knowledge graphs as well

232
00:21:48.319 --> 00:21:52.799
as the semantics and the metadata needed
to become successful with your data management

233
00:21:52.839 --> 00:21:57.440
practices. And if you have seen
Gartner's 2024 Impact Radar,

234
00:21:57.519 --> 00:22:03.119
you see knowledge graphs at the center
here where they're talking about knowledge graphs with

235
00:22:03.200 --> 00:22:08.559
the metadata aspects and how knowledge graphs can
help with GenAI and LLMs. One particular

236
00:22:08.599 --> 00:22:14.039
example I'd like you to look at
and this is available even on YouTube if

237
00:22:14.039 --> 00:22:18.559
you search. At the Knowledge Graph
Conference in 2022 or 2023, Gregor

238
00:22:18.640 --> 00:22:26.079
Wobbe, who is the head of Data
Architecture at UBS, spoke about implementing their next

239
00:22:26.119 --> 00:22:33.240
generation data management based on this foundational
graph layer, the knowledge graph layer. They're doing

240
00:22:33.319 --> 00:22:37.640
all the things that I sort of
highlighted in the earlier slides, building common

241
00:22:37.720 --> 00:22:42.440
data models and ontologies with the unified
meaning that could be shared across the organization.

242
00:22:44.640 --> 00:22:48.599
They built this with schema.org,
with all the shared models and schemas

243
00:22:48.640 --> 00:22:53.839
that were standardized across the organization.
They built a data service to enrich their

244
00:22:53.880 --> 00:23:00.680
data by converting it into a knowledge
graph based on these shared ontologies. They

245
00:23:00.720 --> 00:23:06.000
cataloged all their data assets to build
this conceptual layer and map data to this

246
00:23:06.400 --> 00:23:14.279
layer to power their downstream applications,
analytics and data products. So, before

247
00:23:14.319 --> 00:23:18.759
I finish, just the last two slides: what is this graph center of

248
00:23:18.799 --> 00:23:25.200
excellence? Right? So how can
something like this be implemented in larger organizations?

249
00:23:25.920 --> 00:23:32.960
Most large organizations also have a few isolated
graph projects for specific niche

250
00:23:33.079 --> 00:23:37.599
use cases like fraud detection or recommendation, but this data management case is very

251
00:23:37.640 --> 00:23:42.319
different. We see companies like UBS
which have gone down this path and they

252
00:23:42.359 --> 00:23:52.200
have adopted this graph center of excellence
with strategic prioritization of graph use cases across

253
00:23:52.240 --> 00:24:00.160
the organization, with C-suite sponsorship, to
start with, a single C-level executive

254
00:24:00.240 --> 00:24:07.920
who champions the strategic vision to build
or bring in knowledge graph based approaches to

255
00:24:07.960 --> 00:24:12.880
solve the data problems. And finally, these are some of the key takeaways.

256
00:24:14.559 --> 00:24:18.599
Becoming data driven in
the age of AI requires organizations to

257
00:24:18.680 --> 00:24:25.799
shift to a more
connected and contextualized way of thinking about their

258
00:24:25.880 --> 00:24:33.519
data with graph technologies as the foundational
layer for their modern data management requirements because

259
00:24:33.599 --> 00:24:41.759
graph enables semantic data integration, traceability, resolving ambiguous entities,

260
00:24:44.400 --> 00:24:48.359
promoting sort of consistency, sharing,
reuse, and following the FAIR data

261
00:24:48.400 --> 00:24:56.960
principles to connect the dots across the
organization with this semantic graph layer using ontologies

262
00:24:56.960 --> 00:25:03.440
and taxonomies and controlled vocabularies, which are
domain specific models for the organization. That

263
00:25:03.519 --> 00:25:10.680
was, in short, a quick
run-through of the graph approach. That's

264
00:25:10.680 --> 00:25:12.160
pretty impressive. We've got a couple
questions, and I know you have to run

265
00:25:12.160 --> 00:25:17.680
in a few minutes here, but
a couple of quick questions. How does

266
00:25:17.680 --> 00:25:21.920
a graph maintain lineage to the source
once it's been loaded? Is that in

267
00:25:21.960 --> 00:25:26.920
the relationships between the entities or how
do you actually preserve lineage? Yeah,

268
00:25:26.960 --> 00:25:33.839
so if you have to preserve lineage, there is a

269
00:25:33.920 --> 00:25:38.319
specific ontology that can be incorporated into
the graph. It's called PROV-O. PROV-O

270
00:25:38.480 --> 00:25:44.559
is a well-known,
publicly available ontology that is based on the

271
00:25:44.640 --> 00:25:48.880
RDF stack that, when incorporated into
your knowledge graph, can help you to

272
00:25:49.000 --> 00:25:55.559
do sort of what current data lakes
also are doing in terms of doing time

273
00:25:55.640 --> 00:26:00.799
travel, in terms of doing versioning, to keep track of how the data

274
00:26:00.920 --> 00:26:04.960
is morphing as it is moving through
your system, because remember, your data

275
00:26:06.000 --> 00:26:10.160
pipelines are not getting replaced
by knowledge graphs or the graph layer.

276
00:26:10.200 --> 00:26:15.920
The graph layer is just the metadata
around all your data workflows. This metadata

277
00:26:15.079 --> 00:26:22.200
will help you enrich your provenance,
lineage, time tracking, versioning, those

278
00:26:22.240 --> 00:26:26.839
aspects of it. Yeah right,
I mean, I'm reminded of how we

279
00:26:26.920 --> 00:26:30.799
got here with the whole concept of
data warehousing, and years ago we realized

280
00:26:30.839 --> 00:26:36.759
that you cannot analyze, you cannot
run queries on operational systems very effectively.

281
00:26:37.279 --> 00:26:41.319
So we pulled out key aspects of transactional
data from these systems, but in doing

282
00:26:41.359 --> 00:26:47.039
so, stripped out all the context. And what you're suggesting here is that

283
00:26:47.119 --> 00:26:52.519
the graph, especially via a center
of excellence, can preserve all of that

284
00:26:52.720 --> 00:26:56.160
context, which can then be leveraged
downstream in any application, whether it's a

285
00:26:56.240 --> 00:27:02.119
data warehouse or a BI tool or
just some front end that a user has

286
00:27:02.240 --> 00:27:06.119
to get access to information about customers
or products or whatever. Is that about,

287
00:27:06.160 --> 00:27:11.519
right? Exactly, exactly, because as
we move from one system to another,

288
00:27:11.640 --> 00:27:15.200
right, as we are making a
lot of organizations are copying data,

289
00:27:15.200 --> 00:27:18.319
and copying data is bad, right? And as we're moving data, at each of

290
00:27:18.359 --> 00:27:23.119
these handshake points, there is a
lot of information that is getting lost in

291
00:27:23.319 --> 00:27:27.920
translation, and the knowledge graph,
the graph foundation layer, helps to capture

292
00:27:27.960 --> 00:27:33.359
them and interconnect them. What has
happened, why it has happened, and

293
00:27:33.400 --> 00:27:37.559
what was the reason it happened?
Right? So you're spot on, Eric

294
00:27:37.599 --> 00:27:40.640
in terms of that kind of an
interpretation. Yes, yeah, We've got

295
00:27:40.640 --> 00:27:45.440
a couple more questions here. One
was could you explain the difference between a

296
00:27:45.559 --> 00:27:51.720
data vault approach, a data vault
modeling approach, and a knowledge graph.

297
00:27:51.839 --> 00:27:56.160
That's a pretty long discussion in terms
of a data vault versus a knowledge graph.

298
00:27:56.240 --> 00:28:00.680
However, please reach out to us. I would say, it's an interesting

299
00:28:00.759 --> 00:28:06.519
conversation. Reach out to us and
we'll be able to explain to you how

300
00:28:07.519 --> 00:28:10.799
we are. What we are saying
is, yes, the graph model with

301
00:28:10.880 --> 00:28:14.720
the found with the knowledge graph and
an RDF, So knowledge graphs can be

302
00:28:14.759 --> 00:28:18.599
built with an RDF data model
or an LPG data model. RDF data

303
00:28:18.680 --> 00:28:26.480
model by itself has the semantics and
the context associated with it. Now the

304
00:28:26.599 --> 00:28:32.039
data vault model or any of the
other data modeling they are more at the

305
00:28:32.319 --> 00:28:37.440
logical layer, while the
knowledge graph is more at the foundational layer

306
00:28:37.519 --> 00:28:41.680
or at the implementational layer. Interesting, We've got a bunch of good questions

307
00:28:41.720 --> 00:28:45.200
coming here. I'll just throw a
couple more at you with the time we

308
00:28:45.240 --> 00:28:48.400
have left. One attendee is asking, how can we ensure the scalability and

309
00:28:48.480 --> 00:28:56.279
efficiency of querying and updating large scale
dynamic knowledge graphs while maintaining data consistency and

310
00:28:56.359 --> 00:29:03.240
minimizing latency. I mean, how
do you maintain performance over time? Now,

311
00:29:03.519 --> 00:29:11.200
one of the things is, so
don't think about bringing, copying your data

312
00:29:11.559 --> 00:29:15.519
from your source systems into knowledge graphs, right? Knowledge graphs are more about

313
00:29:15.680 --> 00:29:21.640
the metadata aspect of it, right?
When you're doing graph traversals or querying a

314
00:29:21.720 --> 00:29:26.799
graph where you have billions of nodes. A lot of our customers have

315
00:29:27.200 --> 00:29:33.640
knowledge graphs where the metadata as well
as the instance data runs into millions

316
00:29:33.680 --> 00:29:38.079
of nodes and edges. However,
your knowledge graph is not your OLAP

317
00:29:38.200 --> 00:29:45.960
system where concurrency and latency is like
that deadly combination where you need high concurrency

318
00:29:45.000 --> 00:29:49.799
and low latency, right? Here,
you are doing most of the traversals or

319
00:29:49.880 --> 00:29:56.359
most of the querying of the knowledge
graph for figuring out the dependencies, for

320
00:29:56.440 --> 00:30:00.440
figuring out, as maybe one example, how the whole traceability happens, how

321
00:30:00.519 --> 00:30:03.480
the lineage happens, as you have
to traverse the graph. But you're not

322
00:30:03.559 --> 00:30:10.519
doing aggregations right where you are rolling
up data or doing a lot of analytical

323
00:30:10.640 --> 00:30:15.880
type, OLAP-type, reporting-type queries. Gotcha, there's one last question here

324
00:30:15.880 --> 00:30:18.880
as well, and folks, for
any unanswered questions, we will pass these

325
00:30:18.920 --> 00:30:22.359
on to Sumit. He can get
back to you to be able to design

326
00:30:23.039 --> 00:30:29.240
and implement knowledge graphs. Do I
need to learn graph databases or vector databases?

327
00:30:29.279 --> 00:30:34.039
Well, vector databases of course are
purpose built really for large language models,

328
00:30:34.440 --> 00:30:44.640
and they represent entities like concepts,
people, et cetera. They represent them

329
00:30:45.039 --> 00:30:48.039
as vectors, so as memories will
formula. Right, So there's always going

330
00:30:48.079 --> 00:30:56.240
to be a lossy capacity with storing
something as a vector because you're converting it

331
00:30:56.359 --> 00:31:02.599
from text or imagery into a mathematical
function. So they're different, they're different

332
00:31:02.720 --> 00:31:04.720
entities. They do a lot of
overlap, but final thoughts from you,

333
00:31:04.759 --> 00:31:07.920
Sumit, go ahead. Yeah,
they are totally different things, right.

334
00:31:07.000 --> 00:31:14.160
Vector databases are mostly to store the
embeddings from your concepts, from your data

335
00:31:14.200 --> 00:31:18.240
from every instance data into an embedding
format and store it and retrieve it quickly.

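NOTE
Editorial sketch of the "store embeddings and retrieve them quickly" idea described
in the cue above. It is a minimal illustration, not Ontotext's or any product's
implementation. The three-dimensional vectors, the `store` dictionary and the
`nearest` helper are hypothetical; a real vector database would use an index
instead of the brute-force scan shown here.
```python
import math
def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm
# Hypothetical toy embeddings; real ones typically have hundreds of dimensions.
store = {
    "customer": [0.9, 0.1, 0.0],
    "client": [0.8, 0.2, 0.1],
    "invoice": [0.1, 0.9, 0.3],
}
def nearest(query, k=2):
    # Brute-force scan over every stored vector, ranked by similarity.
    ranked = sorted(store, key=lambda name: cosine(query, store[name]), reverse=True)
    return ranked[:k]
print(nearest([1.0, 0.0, 0.0]))
```
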
336
00:31:18.680 --> 00:31:22.640
Retrieving it quickly is what vector databases
do well because of support for indices

337
00:31:22.680 --> 00:31:26.200
and all that. Now, a
lot of the existing databases, whether it's

338
00:31:26.319 --> 00:31:33.160
relational or non relational, are also
adding vector capabilities. Right. Graph databases

339
00:31:33.200 --> 00:31:38.480
are also adding vector capabilities. We
at Ontotext integrate with vector databases. Choose

340
00:31:38.519 --> 00:31:41.519
your own or bring your own vector
database and we'll integrate. If you have

341
00:31:41.599 --> 00:31:47.079
to store like that's the way we
do Graph RAG, where we build on

342
00:31:47.119 --> 00:31:52.480
top of RAG with graphs store the
embeddings in an external vector database. We

343
00:31:52.599 --> 00:31:55.960
could add vector databases on our own, but right now we don't see a

344
00:31:56.039 --> 00:32:00.319
huge need for it. But vector
databases and graphs are totally separate things.

345
00:32:00.960 --> 00:32:04.799
Yeah, and the last question,
I'll answer this you tell me if I'm

346
00:32:04.839 --> 00:32:07.440
wrong. An attendee asks, is it a separate knowledge graph ontology

347
00:32:07.519 --> 00:32:13.039
database for each use case you deploy? Not necessarily, I mean it could

348
00:32:13.039 --> 00:32:16.519
be. There are different ontologies that
make sense for different chunks of data basically,

349
00:32:16.599 --> 00:32:21.759
but you do want to reuse the
ontologies that you deploy, right? Real quick.

350
00:32:22.279 --> 00:32:27.559
Absolutely, Yes, different use cases
can have an overlap of different ontologies

351
00:32:27.599 --> 00:32:30.880
and that is where the knowledge graph
platform is meant for sharing for reusing.

352
00:32:31.400 --> 00:32:35.960
But again, if it's a siloed
use case, you could definitely use it.

353
00:32:37.440 --> 00:32:38.960
Wow. This has been fantastic.
Thank you so much for your time,

354
00:32:39.000 --> 00:32:43.359
Sumit has to jump to a customer
call. Customers always come first,

355
00:32:43.680 --> 00:32:45.480
but thank you so much for all
these excellent questions, folks. We'll be

356
00:32:45.519 --> 00:32:49.519
sure to pass these along to the
Ontotext folks. And this is the

357
00:32:49.559 --> 00:32:52.640
second in a series of three.
So I have one more webinar coming up

358
00:32:52.680 --> 00:32:54.920
on the bad Data Tax. So
and if anyone wants to be on a

359
00:32:54.960 --> 00:32:59.039
show like this, I mean, email
info at dmradio dot biz. That comes

360
00:32:59.160 --> 00:33:00.240
right to me. And with that
we'll bid you farewell. Folks. Thanks

361
00:33:00.240 --> 00:33:02.680
so much for your time and attention. Thank you, Sumit, we'll talk

362
00:33:02.720 --> 00:33:05.759
to you. Take care. Thank
you, Eric, and send me the

363
00:33:05.799 --> 00:33:07.559
questions please. I think you will
have it in the database somewhere right.

364
00:33:07.640 --> 00:33:15.440
Yeah, thank you, take care, folks. Bye bye.

365
00:33:16.480 --> 00:33:22.359
The information economy has arrived.
The world is teeming with innovation as new

366
00:33:22.400 --> 00:33:28.720
business models reinvent every industry.
Inside Analysis is your source of information and

367
00:33:28.839 --> 00:33:32.039
insight about how to make the most
of this exciting new era. Learn more

368
00:33:32.200 --> 00:33:37.240
at inside analysis dot com, insideanalysis
dot com. And now here's your host,

369
00:33:37.720 --> 00:33:47.119
Eric Kavanagh. Ladies and gentlemen,
Hello, and welcome back once again to

370
00:33:47.359 --> 00:33:52.079
the Bloor Group's webinar series. I'm
very pleased to have a very special guest

371
00:33:52.200 --> 00:33:55.720
today with us. A former Gartner
analyst and all around expert has been in

372
00:33:55.720 --> 00:34:00.519
the field for quite some time.
Sumit Pal is with us today, and we're

373
00:34:00.519 --> 00:34:04.799
going to talk about knowledge graphs and
the whole concept of a center of excellence,

374
00:34:05.279 --> 00:34:07.720
And if you were here in the
pre show, I was chit chatting

375
00:34:07.759 --> 00:34:12.760
with Sumit about all the different applications
of knowledge graphs, in particular within the

376
00:34:12.800 --> 00:34:19.280
context of all this generative AI and
beyond that foundational models. Generative AI is

377
00:34:19.360 --> 00:34:22.360
just one flavor of artificial intelligence.
There are many other forms of AI.

378
00:34:23.000 --> 00:34:28.239
But I think, as we've been
discussing, by and large, we're going

379
00:34:28.280 --> 00:34:32.880
through a major transformation in enterprise software
now and in how business gets done quite

380
00:34:32.920 --> 00:34:39.039
frankly, and these AI models are
going to subsume most of traditional enterprise software

381
00:34:39.440 --> 00:34:44.679
sooner or later. Some of the
low hanging fruit is definitely in the customer

382
00:34:44.760 --> 00:34:49.800
service space, obviously in copywriting and
content creation, things of this nature.

383
00:34:50.159 --> 00:34:54.480
But you can rest assured that business
intelligence, analytics, most of what enterprise

384
00:34:54.559 --> 00:35:00.320
software does, is going to find
itself in the crosshairs of these foundational models.

385
00:35:00.480 --> 00:35:06.159
And here's the good news, folks. Graph especially knowledge graphs, are

386
00:35:06.199 --> 00:35:12.039
extremely valuable at being able to true
the wheel of GenAI, if you will.

387
00:35:12.039 --> 00:35:14.880
In other words, whether it's part
of a RAG architecture. And that's

388
00:35:14.880 --> 00:35:19.079
probably the most of what you'll see
or some other type of implementation, perhaps

389
00:35:19.119 --> 00:35:25.599
fine tuning. Knowledge graphs provide a
tremendous foundation of the concepts and constructs and

390
00:35:25.760 --> 00:35:30.599
key ideas that you're trying to manage
with your enterprise. So with that,

391
00:35:30.599 --> 00:35:34.239
I'm going to hand it over to
Sumit Pal for my long intro there,

392
00:35:34.599 --> 00:35:37.559
Knowledge Graph Center of Excellence. Take
it away, Sumit. Thank you,

393
00:35:37.679 --> 00:35:42.280
Eric, thank you for the introduction
and giving us the opportunity to discuss about

394
00:35:42.360 --> 00:35:45.480
knowledge graphs and center of excellence around
it. Hi, my name is Sumit

395
00:35:45.480 --> 00:35:50.159
and I'm delighted to have all of
you here. My role here at Ontotext

396
00:35:50.239 --> 00:35:54.519
straddles between marketing, pre sales and
solutions engineering to bring in thought leadership across

397
00:35:54.679 --> 00:36:04.159
various industries for adoption of knowledge graph
and semantic technologies. So let's I'd like

398
00:36:04.199 --> 00:36:06.840
to welcome you to this buffet,
the data buffet, right? This is a

399
00:36:06.840 --> 00:36:09.840
picture from Matt Turck's MAD Landscape twenty
twenty four. I'm not sure how many

400
00:36:09.880 --> 00:36:14.320
of you have seen it. There
are about twenty four hundred logos here on

401
00:36:14.360 --> 00:36:17.400
this one single chart. It's called
MAD for a reason, because it covers machine

402
00:36:17.440 --> 00:36:23.519
learning, AI and business intelligence, and data,
and what we see here is

403
00:36:23.639 --> 00:36:29.840
the data ecosystem today is crowded with
shiny objects, dazzling buzzwords, and data

404
00:36:29.920 --> 00:36:35.800
ecosystems have sort of become data jungles
where data teams are struggling grappling with the

405
00:36:35.920 --> 00:36:44.519
high entropy in this ecosystem to create
in this disaggregated system a functional modern data

406
00:36:44.559 --> 00:36:50.000
experience. What is happening as a
result of this is that data skills have

407
00:36:50.039 --> 00:36:53.800
become sort of the most sought after
skills, and job descriptions are shifting as

408
00:36:53.880 --> 00:36:59.679
quickly as the new tools hit the
market. This unbundling of the data ecosystem

409
00:36:59.760 --> 00:37:04.239
has led to the problem that there
is no one end-to-end tool, causing

410
00:37:04.320 --> 00:37:09.079
teams to causing data teams and data
personas to duct tape different products and frameworks

411
00:37:09.079 --> 00:37:15.119
to build these end to end automatic
or automated, agile and repeatable data driven

412
00:37:15.159 --> 00:37:21.880
systems. Let's look at this data
maturity spectrum, right? We call it the

413
00:37:21.960 --> 00:37:27.239
DIKW pyramid, which we'll show in
the next slide. All organizations today are

414
00:37:27.280 --> 00:37:32.199
on this data journey, but it
turns out that almost every organization is stuck

415
00:37:32.280 --> 00:37:37.199
at the information layer. They find
it very difficult to cross that chasm

416
00:37:37.239 --> 00:37:40.840
from the information layer to the knowledge layer. They have no dearth of data.

417
00:37:40.880 --> 00:37:45.719
The data is hidden in data lakes, lakehouses, data warehouses. There's lots

418
00:37:45.760 --> 00:37:52.199
of data, but it lacks context
to make the connections across these data sources,

419
00:37:52.239 --> 00:37:57.440
across these data units to gain valuable
insights. So we have this thing

420
00:37:57.599 --> 00:38:01.559
where we say data, data everywhere, not a drop of insight, neither a

421
00:38:01.840 --> 00:38:07.559
drop of context. We have heard
about big data, wide data and so

422
00:38:07.639 --> 00:38:13.400
on, but actually no one talks
about the context aspects. And as we

423
00:38:13.559 --> 00:38:19.119
get more and more data, the
context gets diluted unless it's managed well.

424
00:38:20.639 --> 00:38:25.639
So this is the DIKW pyramid.
And what we see is most organizations just

425
00:38:25.760 --> 00:38:30.480
stop at that information layer. There
are some organizations that can make this transition

426
00:38:30.559 --> 00:38:35.000
to the knowledge layer through the use
of graphs and knowledge graphs. And the

427
00:38:35.039 --> 00:38:40.519
critical step, the critical misstep,
I would say, in adopting AI and

428
00:38:40.639 --> 00:38:45.920
data technologies is the disconnect, the
major disconnect between business needs and technology.

429
00:38:46.679 --> 00:38:53.519
Organizations rush to embrace technologies without clearly
defining the problems they're going to solve.

430
00:38:54.639 --> 00:39:00.440
They bring in different technologies in a
more bottom-up way instead of taking a

431
00:39:00.480 --> 00:39:06.960
top down approach to solve business use
cases that leverage data and the technologies and

432
00:39:07.079 --> 00:39:12.880
hence miss the fact that they are
building solutions that often don't

433
00:39:13.000 --> 00:39:16.039
serve the business, and this is
where we come up with this term called

434
00:39:16.280 --> 00:39:22.519
bad data tax. In this race
to become data driven, most of the

435
00:39:22.559 --> 00:39:30.119
efforts of most organizations have resulted in
a tangled web of data integrations often point

436
00:39:30.159 --> 00:39:36.599
to point integrations, as well
as reconciliations across these data silos, and

437
00:39:36.840 --> 00:39:40.880
this has resulted in a huge cost
to the organization, often forty to sixty

438
00:39:40.960 --> 00:39:47.760
percent of an enterprise's annual technology spend,
which amounts to millions of dollars. We

439
00:39:47.840 --> 00:39:55.719
call this the bad data tax,
and these investments often don't translate into insights

440
00:39:55.920 --> 00:40:01.199
needed to deliver better decisions or build
better processes, and hence there is a

441
00:40:01.239 --> 00:40:07.280
solid justification in every organization to fix
this so that those who need access to

442
00:40:07.320 --> 00:40:14.639
the data can convert it to insights
and drive business decisions and processes, and

443
00:40:14.719 --> 00:40:19.199
make data available and accessible in the
right format, in the format that is

444
00:40:19.239 --> 00:40:27.239
flexible, accurate, and machine readable. It's also been seen that, you know,

445
00:40:27.840 --> 00:40:30.920
from my experience at Gartner, when
I used to talk to customers and

446
00:40:31.039 --> 00:40:36.280
clients all the time. It seems
that only half of the CDOs are able

447
00:40:36.320 --> 00:40:40.639
to drive innovation using data and just
about forty percent of the CDOs manage data

448
00:40:40.719 --> 00:40:45.760
as a business asset. These are
some of the numbers that we have seen.

449
00:40:46.559 --> 00:40:52.400
Also comes to my mind is about
sixty sorry, about twenty twenty five

450
00:40:52.440 --> 00:40:58.440
percent of the CDOs also do not
have a single point of accountability for data

451
00:40:58.519 --> 00:41:05.480
within their organizations. This was a
survey done by Salesforce a year back

452
00:41:05.559 --> 00:41:13.559
where they illustrated this point that ninety
percent of the data teams agree that the

453
00:41:13.639 --> 00:41:17.679
need for trustworthy data is higher than
ever and today it's even more because it's

454
00:41:17.719 --> 00:41:22.679
again the whole idea about garbage in, garbage out. If you're

455
00:41:22.679 --> 00:41:27.320
feeding garbage to your AI technology,
you're going to get garbage out. So this

456
00:41:27.519 --> 00:41:30.800
sort of summarizes this mind map sort
of summarizes what we discussed in terms of

457
00:41:30.920 --> 00:41:36.039
challenges that most organizations are facing.
One of the biggest challenges you will see,

458
00:41:36.039 --> 00:41:39.360
the one which is right at the bottom, is also about findability. Most

459
00:41:39.519 --> 00:41:45.480
organizations, most data personas in
most organizations cannot link the data, cannot

460
00:41:45.519 --> 00:41:52.039
find the data because of lack
of context. McKinsey and IDC had done

461
00:41:52.079 --> 00:41:57.920
research a few years back
where they found out most data personas in

462
00:41:58.119 --> 00:42:04.000
organizations spend about thirty percent of their time
just finding the right data for their use

463
00:42:04.039 --> 00:42:08.000
case. So if we ask ourselves
these questions, right, why do most

464
00:42:08.119 --> 00:42:15.119
data lake efforts fail and why is
data getting increasingly harder to find? Why

465
00:42:15.199 --> 00:42:20.800
existing data catalogs are not working for
most enterprises? The reason is to be

466
00:42:20.840 --> 00:42:23.960
effective in working with data, it's
not just technology and more data. It

467
00:42:24.039 --> 00:42:30.760
requires context and semantics to make the
data more powerful. What has happened is

468
00:42:31.119 --> 00:42:38.440
we have allowed the data to lose
consistency and precision of meaning because most organizations

469
00:42:38.760 --> 00:42:47.840
haven't thought about context and semantics as
they build their enterprise data platforms. So

470
00:42:47.920 --> 00:42:53.679
data by itself is powerful, but
the challenge is again the context. As

471
00:42:53.800 --> 00:42:59.639
data has grown in volumes, the
need for automated context has grown as well.

472
00:43:00.039 --> 00:43:07.360
And that's why organizations cannot aspire to
or cannot dream to become data or

473
00:43:07.400 --> 00:43:13.760
AI driven unless, in my mind, they are context driven and data context

474
00:43:13.760 --> 00:43:20.320
here includes both business and technical metadata,
governance, privacy, accessibility issues. It

475
00:43:20.400 --> 00:43:29.679
is context that makes data more valuable. Okay, and in my opinion,

476
00:43:30.400 --> 00:43:37.000
data engineering will still remain a huge
cost center for most organizations until it matures

477
00:43:37.039 --> 00:43:45.480
from being a pure ETL-oriented or
an ELT oriented approach to an ECL oriented

478
00:43:45.480 --> 00:43:52.320
approach. Where C is the context
by leveraging knowledge graphs and ontologies for knowledge

479
00:43:52.360 --> 00:43:57.639
management. So what is a knowledge
graph? Knowledge graph is sort of a

480
00:43:57.719 --> 00:44:02.880
network of entities representing real world domain
objects like people, organization, as well

481
00:44:02.880 --> 00:44:09.960
as concepts, topics and their semantic
relationships and attributes. Knowledge graphs, with

482
00:44:10.000 --> 00:44:16.639
their emphasis or with their more stress
on semantic relations between the entities, create

483
00:44:17.360 --> 00:44:23.440
the context for both humans as well
as machines to do automated reasoning. And

484
00:44:24.760 --> 00:44:30.920
knowledge graphs go beyond just simple storage
and querying of data and it focuses more

485
00:44:30.960 --> 00:44:37.280
on the idea of definitions of the
connections between the entities and as you will

486
00:44:37.320 --> 00:44:42.880
see, it requires connecting the dots
across most of the organization where most organizations

487
00:44:42.880 --> 00:44:49.960
struggle, and knowledge graphs help to
build this foundational semantic graph layer by semantically

488
00:44:50.079 --> 00:44:54.079
linking the data across the silos,
whether it's structured, unstructured, semi structured,

489
00:44:54.679 --> 00:45:00.920
and that reduces or eliminates bottlenecks in
the process of becoming data driven.

490
00:45:05.559 --> 00:45:08.679
What makes the knowledge graph useful and
powerful is that semantic model, the

491
00:45:08.719 --> 00:45:15.639
ontologies, the taxonomies that include domain
concepts and their interrelationships, their hierarchies,

492
00:45:15.679 --> 00:45:22.960
their dependencies. It is the semantics
that actually enriches data with the context

493
00:45:22.239 --> 00:45:31.880
that both machines and humans can interpret
unambiguously. So knowledge graphs actually give a

494
00:45:31.960 --> 00:45:37.360
holistic view of the data, revealing
these intricate hierarchies within the data, the

495
00:45:37.400 --> 00:45:44.360
precise definitions that give meaning to the
data. As I said earlier, data

496
00:45:44.360 --> 00:45:49.559
by itself is powerful, but context
is what gives the meaning and real value

497
00:45:49.559 --> 00:45:52.480
to the data. You know,
real quick, Sumit, we

498
00:45:52.519 --> 00:45:57.159
got a question from the audience,
and it's a really good question about data

499
00:45:57.239 --> 00:46:00.599
literacy, it seems to me.
And you've already mentioned data catalogs. Obviously,

500
00:46:00.639 --> 00:46:06.360
data catalogs are great for improving data
literacy because the whole point is to

501
00:46:06.400 --> 00:46:09.079
capture the definitions and the meanings of
these concepts and to share them in a

502
00:46:09.880 --> 00:46:14.400
useful, accessible way. But it
seems to me a knowledge graph is an

503
00:46:14.440 --> 00:46:19.199
incredibly powerful tool for improving data literacy. What do you think? Absolutely,

504
00:46:19.199 --> 00:46:23.679
absolutely, that's a very good question
and very good point that knowledge graph has

505
00:46:23.840 --> 00:46:29.199
embedded in it the semantics and the
context, as I keep saying again and

506
00:46:29.239 --> 00:46:32.519
again, which are again very domain
specific. A lot of the times.

507
00:46:32.679 --> 00:46:40.000
The data literacy aspect is understanding the
data and understanding the nuances associated with the

508
00:46:40.079 --> 00:46:46.239
data, which is oftentimes embedded in
the heads of domain specialists, right data

509
00:46:46.320 --> 00:46:51.880
stewards, even data engineers. They
are encoding all those business rules in their

510
00:46:51.920 --> 00:46:59.599
SQL code and that sort of you
know, decouples the data from the logic,

511
00:47:00.119 --> 00:47:04.320
whereas in a knowledge graph, where
everything is connected. In a knowledge

512
00:47:04.360 --> 00:47:07.920
graph, the data and the metadata
are in the same place. That provides

513
00:47:08.199 --> 00:47:14.679
enormous benefits from a data literacy perspective. Spot on. Mm hmm, okay,

514
00:47:14.719 --> 00:47:21.119
go ahead. So the knowledge graph
platform adds semantics and meaning to the

515
00:47:21.199 --> 00:47:22.800
data, as we said, but
how does it do it right? It

516
00:47:22.960 --> 00:47:32.000
treats the connections between the different entities,
the relationships, as first class citizens, using, in

517
00:47:32.039 --> 00:47:36.199
a knowledge graph or in a graph
based technology, using nodes, edges and

518
00:47:36.320 --> 00:47:42.199
labels to depict these entities, their
interrelationships and properties. It's this semantic layer

519
00:47:42.280 --> 00:47:46.119
that comes out from it that contextualizes
the data, giving it meaning, a

520
00:47:46.239 --> 00:47:52.840
formal representation and meaning, making it
machine interpretable. The other advantage of

521
00:47:52.880 --> 00:48:00.400
knowledge graphs, especially built with an RDF stack,
is it follows open standards. The thing

522
00:48:00.599 --> 00:48:06.079
in a knowledge graph that is built
with an R day of stack is thereby

523
00:48:06.440 --> 00:48:15.800
reusable, it's interoperable, and it's
very amenable to data sharing with unambiguous semantics.

524
00:48:15.639 --> 00:48:20.679
Also, the other aspect of graphs
is that graphs can very easily accommodate

525
00:48:20.719 --> 00:48:27.239
change, to make changes, because enterprise
data systems are always changing and graphs provide

526
00:48:27.239 --> 00:48:36.519
that flexibility of flexible schema to make
adjustments and changes. This shared meaning resolves

527
00:48:36.559 --> 00:48:39.800
a lot of the ambiguities associated,
especially when you're building let's say simple data

528
00:48:39.840 --> 00:48:45.039
pipeline. As you're handing off data
or bringing in data from operational systems to

529
00:48:45.559 --> 00:48:51.519
analytical systems, there is a
sort of an impedance mismatch that happens, where the

530
00:48:51.639 --> 00:48:54.480
terms can mean something else in the
operational systems while it means something else in

531
00:48:54.519 --> 00:49:00.920
the analytics system. That's where knowledge graphs, with their ideas of semantics, with ontologies

532
00:49:00.960 --> 00:49:07.400
and taxonomies, remove these ambiguities.
Now, there are different ways in which

533
00:49:07.440 --> 00:49:13.280
semantic technologies increase the value of data. First, it helps in data integration.

534
00:49:13.599 --> 00:49:16.559
It's not we all do data integration. Data integration is probably one of

535
00:49:16.599 --> 00:49:21.360
the richest tools you would see in
that first slide or the third slide that

536
00:49:21.400 --> 00:49:23.960
I showed you, But those are
all doing data integration with a lot of

537
00:49:24.000 --> 00:49:27.760
the code, with a lot of
the business logic in the minds of the

538
00:49:27.800 --> 00:49:30.599
people who are implementing it. But
with knowledge graphs we do what is called

539
00:49:30.719 --> 00:49:37.840
semantic data integration, where you are
semantically joining the data across different data silos.

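NOTE
Editorial sketch of the "semantically joining data across silos" idea from the cue
above, under stated assumptions. Statements are modelled as plain Python tuples of
(subject, predicate, object). The `ex:` identifiers, the `crm` and `billing` silos
and the `about` helper are hypothetical and do not come from the webinar. Because
an RDF graph is defined as a set of triples, merging two sources keeps a single
copy of any statement they share. This only collapses identical statements;
records that use different identifiers for the same entity would still need
to be matched.
```python
# Hypothetical identifiers and predicates, for illustration only.
crm = {
    ("ex:cust42", "ex:name", "Acme Corp"),
    ("ex:cust42", "ex:locatedIn", "ex:Boston"),
}
billing = {
    ("ex:cust42", "ex:name", "Acme Corp"),
    ("ex:inv7", "ex:billedTo", "ex:cust42"),
}
# Set union: the statement repeated in both silos is stored once.
graph = crm | billing
def about(node):
    # Every statement that mentions the node as subject or object.
    return sorted(t for t in graph if node in (t[0], t[2]))
print(len(graph))
print(about("ex:cust42"))
```
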
540
00:49:37.880 --> 00:49:42.960
That's why you will see a lot
of the data fabrics are powered with

541
00:49:43.360 --> 00:49:47.800
knowledge graphs. The second aspect is
data quality, as it captures relationships between

542
00:49:47.840 --> 00:49:55.119
things and adding context through ontologies as
well as doing inferencing through the relationships.

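NOTE
Editorial sketch of "inferencing through the relationships" as mentioned in the cue
above. It is a toy illustration, not a reasoner. The class hierarchy, the
`ex:` names and the `inferred_types` helper are hypothetical. The idea shown is
that a type asserted once can imply broader types by following subclass links
declared in an ontology.
```python
# Hypothetical mini-ontology: each class maps to its direct superclass.
sub_class_of = {"ex:Analyst": "ex:Employee", "ex:Employee": "ex:Person"}
# Only the most specific type is asserted in the data.
asserted_types = {"ex:sam": "ex:Analyst"}
def inferred_types(entity):
    # Walk up the hierarchy from the asserted type, collecting each class.
    types = []
    current = asserted_types.get(entity)
    while current is not None:
        types.append(current)
        current = sub_class_of.get(current)
    return types
print(inferred_types("ex:sam"))
```
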
543
00:49:57.159 --> 00:50:01.039
And a side benefit of this is
doing entity resolution, so we don't have to

544
00:50:01.119 --> 00:50:07.239
duplicate nodes or deduplicate nodes in a
graph. A knowledge graph built with RDF

545
00:50:07.239 --> 00:50:12.199
stack, you cannot have duplicates.
The system will not allow you to have

546
00:50:12.320 --> 00:50:17.480
duplicates. It removes that whole sort of
data engineering aspect of you know where we

547
00:50:17.519 --> 00:50:22.079
do deduplication, which is a huge set of
work that needs to be done. But

548
00:50:22.119 --> 00:50:25.880
with knowledge graphs. When you're ingesting
data into a knowledge graph, especially with

549
00:50:25.920 --> 00:50:30.559
an RDF stack, it will prevent
duplicates from happening. So all this leads

550
00:50:30.599 --> 00:50:37.239
to trust with data validation, lineage, and provenance. Out of the box,

551
00:50:37.320 --> 00:50:42.800
a knowledge graph built with RDF provides
you provenance. Think about the tools

552
00:50:42.800 --> 00:50:46.039
that provide you provenance and lineage,
how much work they have to do to

553
00:50:46.199 --> 00:50:52.519
provide you the whole lineage stack.
While with knowledge graphs built in version in

554
00:50:52.599 --> 00:50:58.119
capabilities and capabilities to do provenance,
there is a provenance based ontology. It

555
00:50:58.679 --> 00:51:02.199
comes to you out of the box. Same thing with what we call FAIR

556
00:51:02.280 --> 00:51:08.039
principles. Data in our opinion,
needs to adhere to these FAIR principles,

557
00:51:08.079 --> 00:51:14.920
which are findable, accessible, interoperable, and reusable. And knowledge graphs provide

558
00:51:14.960 --> 00:51:19.719
you with all these capabilities. So
this slide sort of summarizes all that we

559
00:51:19.880 --> 00:51:24.239
discussed earlier about the capabilities of knowledge
graphs to remove ambiguities, to represent data

560
00:51:24.280 --> 00:51:32.320
consistently, and integrate and unify the
data sources. Now, this graph foundation,

561
00:51:32.519 --> 00:51:37.800
this is what I've been talking about. Graph foundation with a knowledge graph

562
00:51:38.039 --> 00:51:45.280
based on taxonomies and ontologies enables
you to do these things that we

563
00:51:45.400 --> 00:51:51.880
have all been doing in a much
more seamless, much more cost effective way

564
00:51:51.960 --> 00:51:58.239
where you get
these new capabilities out of the box and

565
00:51:58.719 --> 00:52:01.519
so how does a knowledge graph do
it? It does it with these semantic

566
00:52:01.679 --> 00:52:07.360
standards. Semantic standards have been there
since the beginning of the century and last

567
00:52:07.400 --> 00:52:15.599
century, where it uses identifiers to
represent entities, to represent the concepts, and all

568
00:52:15.679 --> 00:52:22.360
these result in these foundational capabilities that
are very essential to the whole data management

569
00:52:22.800 --> 00:52:29.079
practice, to provide things like data quality
with validation, and capabilities of

570
00:52:29.239 --> 00:52:32.920
reusing, of governance and lineage,
and on the right-hand side you

571
00:52:32.960 --> 00:52:39.679
see the value drivers of all these
foundational capabilities. Now, this is the

572
00:52:39.840 --> 00:52:46.760
outline of an enterprise knowledge graph platform
and how it interplays, how it interconnects

573
00:52:46.800 --> 00:52:52.599
with the different tools and engines that
we have in most organizations in their legacy

574
00:52:52.599 --> 00:52:59.840
systems, and it supports two major
design patterns here, the semantic knowledge hub

575
00:53:00.239 --> 00:53:05.519
and the semantic data fabric. The
knowledge hub part of it uses knowledge graphs

576
00:53:05.559 --> 00:53:12.400
to manage documents and unstructured content.
Unstructured content is all around us, and it improves

577
00:53:12.440 --> 00:53:16.880
the way the documents are found,
especially with the relevance and with their precision

578
00:53:16.920 --> 00:53:22.320
and accuracy. The data fabric side, the semantic data fabric side, is

579
00:53:22.480 --> 00:53:30.320
the pattern that provides better unified access
across multiple structured or semi structured data sources

580
00:53:30.159 --> 00:53:36.360
and its objective is to enable querying
all of them as if they were a single

581
00:53:36.400 --> 00:53:40.719
federated database or data source.
Both of these use

582
00:53:40.760 --> 00:53:45.000
cases leverage the semantic metadata, which, as you see in the center,

583
00:53:45.119 --> 00:53:52.440
is the conceptual model, which is
based on domain specific ontologies and other metadata

584
00:53:52.480 --> 00:54:00.239
capabilities. When you do this,
it makes data much more discoverable, interpretable,

585
00:54:01.199 --> 00:54:07.920
unambiguous and also consistent. And at
the bottom we see that, to manage these

586
00:54:07.960 --> 00:54:14.920
platforms, we need different engines and capabilities. These include integration with LLMs, with

587
00:54:15.039 --> 00:54:20.559
machine learning tools, especially with text
analytics, document stores, maybe full-text search

588
00:54:20.559 --> 00:54:27.800
engines and vector databases, and
again integration with other LPG-based graph sources

589
00:54:28.079 --> 00:54:34.840
to do graph analytics. And these
are some of the major use cases that

590
00:54:35.079 --> 00:54:38.559
Ontotext has been doing in the last
twenty years to solve some of these data

591
00:54:38.559 --> 00:54:46.440
management problems, along with the high
level architectural patterns that support those use cases.

592
00:54:47.079 --> 00:54:52.440
Now, some of the
next slides are mostly quotes, which I'll

593
00:54:52.719 --> 00:54:57.679
quickly go through. You can read
them once these decks are shared.

594
00:54:58.440 --> 00:55:01.639
However, there is one particular example
where even Gartner now has been talking about

595
00:55:01.719 --> 00:55:06.840
in the last five to seven years about
the value of graphs and knowledge graphs as

596
00:55:06.840 --> 00:55:12.039
well as the semantics and the metadata
needed to become successful with your data

597
00:55:12.079 --> 00:55:16.119
management practices. And if you have
seen Gartner's twenty twenty-four Impact Radar,

598
00:55:16.199 --> 00:55:21.320
you see knowledge graphs at the center
here where they're talking about knowledge graphs

599
00:55:21.719 --> 00:55:27.719
with the metadata aspects, and how knowledge graphs
can help with GenAI and LLMs. One

600
00:55:27.719 --> 00:55:31.159
particular example I'd like you to look
at, and this is available even on

601
00:55:31.199 --> 00:55:37.000
YouTube if you search. At the
Knowledge Graph Conference, twenty twenty-two or twenty twenty-three,

602
00:55:37.840 --> 00:55:43.440
Gregor womb, he's the head of
Data Architecture at UBS, spoke about

603
00:55:43.599 --> 00:55:50.320
implementing their next generation data management based
on this foundational graph layer, the knowledge graph layer.

604
00:55:51.000 --> 00:55:53.880
They're doing all the things that I sort
of highlighted in the earlier slides,

605
00:55:54.360 --> 00:56:00.880
building common data models and ontologies with
the unified meaning that could be shared

606
00:56:00.920 --> 00:56:07.199
across the organization. They built this
with schema dot org with all the shared

607
00:56:07.239 --> 00:56:12.679
models and schemas that were standardized across
the organization. They built a data service

608
00:56:12.719 --> 00:56:16.599
to enrich their data by converting it
into a knowledge graph based on these shared

609
00:56:16.639 --> 00:56:22.920
ontologies. They cataloged all their data
assets to build this conceptual layer and map

610
00:56:23.000 --> 00:56:30.199
data to this layer to power their
downstream applications, analytics and data products.

611
00:56:30.920 --> 00:56:36.880
So before I finish, just the last
two slides: what is this graph

612
00:56:37.000 --> 00:56:42.559
center of excellence? Right? So
how can something like this be implemented in

613
00:56:42.679 --> 00:56:51.119
larger organizations? Most large organizations also
have a few isolated graph projects for specific niche

614
00:56:51.119 --> 00:56:55.480
projects and use cases like fraud detection
or recommendation. But this data management case

615
00:56:55.559 --> 00:57:00.800
is very different. We see companies
like UBS, which have gone down this path

616
00:57:00.800 --> 00:57:10.000
and they have adopted this graph center
of excellence with strategic prioritization of graph use

617
00:57:10.039 --> 00:57:17.840
cases across the organization, with C-suite
sponsorship, to start with a single C-level

618
00:57:17.880 --> 00:57:25.360
executive who champions the strategic vision
to build or bring in knowledge graph based

619
00:57:25.880 --> 00:57:30.599
approaches to solve the data problems.
And finally, these are some of the

620
00:57:30.679 --> 00:57:37.079
key takeaways. Becoming
data driven in the age of AI requires

621
00:57:37.360 --> 00:57:43.920
organizations to shift to
a more connected and contextualized aspect of thinking

622
00:57:44.000 --> 00:57:50.480
about their data with graph technologies as
the foundational layer for their modern data management

623
00:57:51.400 --> 00:57:59.800
requirements, because graph enables semantic data integration, traceability, resolving ambiguity and

624
00:58:00.039 --> 00:58:07.239
ambiguous entities, promoting sort of consistency, sharing, reuse, and following the

625
00:58:07.239 --> 00:58:14.000
FAIR data principles to connect the dots
across the organization with this semantic graph layer

626
00:58:15.119 --> 00:58:21.960
using ontologies and taxonomies and controlled vocabularies
which are domain specific models for the organization.

627
00:58:22.960 --> 00:58:28.719
That was, in short, a quick
run-through of the graph approach.

628
00:58:29.159 --> 00:58:30.280
That's pretty impressive. We've got a
couple of questions, and I know

629
00:58:30.320 --> 00:58:34.400
you have to run in a few
minutes here, but a couple of quick

630
00:58:34.480 --> 00:58:39.159
questions. How does a graph maintain
lineage to the source once it's been loaded?

631
00:58:39.239 --> 00:58:43.920
Is that in the relationships between the
entities or how do you actually preserve

632
00:58:43.960 --> 00:58:49.800
lineage? Yeah, so if you
have to preserve lineage, then both yours.

633
00:58:51.079 --> 00:58:55.760
There is a specific ontology that can
be incorporated into the graph. It's

634
00:58:55.760 --> 00:59:00.800
called PROV-O. PROV-O is a
well-known public ontology

635
00:59:00.519 --> 00:59:07.400
that is based on the RDF stack
that, when incorporated into your knowledge graph,

636
00:59:07.719 --> 00:59:12.840
can help you to do sort of
what current data lakes also are doing

637
00:59:12.960 --> 00:59:16.559
in terms of doing time travel,
in terms of doing versioning, to keep

638
00:59:16.639 --> 00:59:22.519
track of how the data is morphing
as it is moving through your system,

639
00:59:22.599 --> 00:59:28.119
because remember, your data pipelines are
not getting replaced by knowledge graphs

640
00:59:28.159 --> 00:59:31.280
or the graph layer. The graph layer
is just the metadata around all your data

641
00:59:31.320 --> 00:59:39.880
workflows. Over KAA is Inland Express
KCAA Loma Linda, ten fifty AM, the station that

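NOTE
A minimal sketch of the lineage answer above, assuming provenance is kept
as triples that use the PROV-O property prov:wasDerivedFrom. The dataset
IRIs and the trace_lineage helper are hypothetical, not from the talk.
```python
# Stdlib-only sketch: lineage as PROV-O style triples in a plain Python set.
PROV = "http://www.w3.org/ns/prov#"  # PROV-O namespace
EX = "http://example.org/"  # hypothetical namespace for the datasets
triples = {
    (EX + "report", PROV + "wasDerivedFrom", EX + "cleaned_orders"),
    (EX + "cleaned_orders", PROV + "wasDerivedFrom", EX + "raw_orders"),
    (EX + "cleaned_orders", PROV + "wasGeneratedBy", EX + "dedup_job"),
}
def trace_lineage(entity, graph):
    """Follow prov:wasDerivedFrom edges from an entity back to its sources."""
    lineage, frontier, seen = [], [entity], {entity}
    while frontier:
        current = frontier.pop()
        for s, p, o in sorted(graph):
            if s == current and p == PROV + "wasDerivedFrom" and o not in seen:
                seen.add(o)
                lineage.append(o)
                frontier.append(o)
    return lineage
sources = trace_lineage(EX + "report", triples)
print(sources)
```
The pipeline itself is untouched here. The triples are metadata that sit
alongside it, which matches the point that the graph layer does not
replace the data pipelines.
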
642
00:59:40.000 --> 00:59:51.119
leaves no listener behind. You're
listening to an encore presentation of this program

643
00:59:51.320 --> 00:59:59.679
KCAA the Inland Topic Express. Thank
you for tuning here for this suggestion.

644
01:00:00.280 --> 01:00:05.400
Justice Watch with Attorney Zulu Ali.
I am Attorney Zulu Ali with a Justice

645
01:00:05.440 --> 01:00:08.280
Watch crew Rosa Nunez, Michael Blaud, Clark and