WEBVTT

1
00:00:00.080 --> 00:00:02.600
<v Speaker 1>You've seen the headlines, you've definitely heard the buzz about AI,

2
00:00:03.160 --> 00:00:07.440
<v Speaker 1>and maybe you've wondered how those really cool prototypes actually

3
00:00:07.440 --> 00:00:10.480
<v Speaker 1>become real world solutions.

4
00:00:09.919 --> 00:00:12.759
<v Speaker 2>Right, the ones that deliver actual tangible value.

5
00:00:12.919 --> 00:00:16.359
<v Speaker 1>Exactly. What does it really take to move AI from

6
00:00:16.399 --> 00:00:19.320
<v Speaker 1>just a concept, like a neat idea to something that

7
00:00:19.359 --> 00:00:22.519
<v Speaker 1>actually drives business results? That's what we're getting into today.

8
00:00:22.640 --> 00:00:24.600
<v Speaker 2>Yeah, our mission here is to take you on a

9
00:00:24.640 --> 00:00:29.440
<v Speaker 2>deep dive into, well, productionizing AI. We're using insights from

10
00:00:29.519 --> 00:00:34.240
<v Speaker 2>Barry Walsh's guide on delivering AI B2B solutions,

11
00:00:34.240 --> 00:00:36.240
<v Speaker 2>specifically with Cloud and Python.

12
00:00:35.960 --> 00:00:37.920
<v Speaker 1>And we want to cut through the jargon, right?

13
00:00:37.840 --> 00:00:42.000
<v Speaker 2>Absolutely, focus on the essential nuggets, you know, the core ecosystem,

14
00:00:42.039 --> 00:00:45.640
<v Speaker 2>the practical steps, the best practices for building AI apps

15
00:00:45.840 --> 00:00:47.560
<v Speaker 2>that actually work and succeed out there.

16
00:00:47.600 --> 00:00:50.039
<v Speaker 1>Because this isn't just about the technology itself, is it?

17
00:00:50.039 --> 00:00:53.640
<v Speaker 1>It's more about understanding that whole journey, how AI moves

18
00:00:53.640 --> 00:00:57.200
<v Speaker 1>from, you know, pure hype to actual, concrete return on

19
00:00:57.280 --> 00:00:58.479
<v Speaker 1>investment, ROI.

20
00:00:58.640 --> 00:00:59.280
<v Speaker 2>That's the key.

21
00:00:59.359 --> 00:01:01.960
<v Speaker 1>We want to show you why employers are so focused

22
00:01:02.000 --> 00:01:07.120
<v Speaker 1>on these high value AI solutions and how understanding this

23
00:01:07.200 --> 00:01:11.000
<v Speaker 1>process can give you some real aha moments about the

24
00:01:11.000 --> 00:01:14.200
<v Speaker 1>future of tech and business, and maybe your role in it too.

25
00:01:14.519 --> 00:01:17.840
<v Speaker 2>So okay, let's start maybe by demystifying the AI ecosystem

26
00:01:17.879 --> 00:01:21.719
<v Speaker 2>a bit and what we even mean by productionizing AI.

27
00:01:20.920 --> 00:01:22.040
<v Speaker 1>Good idea.

28
00:01:22.280 --> 00:01:25.040
<v Speaker 2>For years, AI felt like it was always just around

29
00:01:25.040 --> 00:01:28.560
<v Speaker 2>the corner. But now, well, now it's actually delivering, and

30
00:01:28.760 --> 00:01:31.159
<v Speaker 2>things like the pandemic definitely accelerated that.

31
00:01:31.280 --> 00:01:33.680
<v Speaker 1>Saw that with chatbots, right? Suddenly they were everywhere for

32
00:01:33.680 --> 00:01:34.719
<v Speaker 1>customer service.

33
00:01:34.519 --> 00:01:38.920
<v Speaker 2>Exactly, or deep learning aiding healthcare diagnostics, even computer vision

34
00:01:39.000 --> 00:01:42.000
<v Speaker 2>for, remember, the social distancing stuff? Oh yeah. These weren't

35
00:01:42.040 --> 00:01:44.640
<v Speaker 2>just lab experiments. They became pretty critical tools fast.

36
00:01:45.200 --> 00:01:48.040
<v Speaker 1>And the really big shift, Walsh points this out, is

37
00:01:48.079 --> 00:01:52.680
<v Speaker 1>moving beyond those cool but maybe isolated, standalone AI projects.

38
00:01:53.159 --> 00:01:56.920
<v Speaker 1>Companies are now chasing this bigger enterprise AI vision and

39
00:01:56.959 --> 00:01:59.519
<v Speaker 1>the market for that, it's projected at, what, three hundred

40
00:01:59.519 --> 00:02:01.719
<v Speaker 1>and forty-one billion dollars? Huge.

41
00:02:01.640 --> 00:02:04.799
<v Speaker 2>It's massive. Yeah, so it's not just one-off solutions anymore.

42
00:02:04.799 --> 00:02:07.640
<v Speaker 2>It's got to be a broader, integrated strategy.

43
00:02:07.200 --> 00:02:10.159
<v Speaker 1>Which demands something different, right? This industrialization of AI.

44
00:02:10.120 --> 00:02:15.439
<v Speaker 2>Exactly, that's the term. It means focusing on reusability, scalability, safety,

45
00:02:16.360 --> 00:02:19.520
<v Speaker 2>and, this is key, building these things in right from

46
00:02:19.360 --> 00:02:21.879
<v Speaker 1>the design phase, not just an afterthought.

47
00:02:21.599 --> 00:02:25.280
<v Speaker 2>Definitely not an afterthought. That way, AI becomes a reliable

48
00:02:25.360 --> 00:02:28.080
<v Speaker 2>business asset, not just, you know, a science project.

49
00:02:27.759 --> 00:02:30.919
<v Speaker 1>That makes complete sense. But okay, if it's such

50
00:02:30.960 --> 00:02:34.680
<v Speaker 1>a clear benefit, what holds companies back? What are the

51
00:02:34.759 --> 00:02:38.680
<v Speaker 1>hurdles they face trying to get this enterprise AI thing going?

52
00:02:38.800 --> 00:02:39.000
<v Speaker 3>Well,

53
00:02:39.039 --> 00:02:41.240
<v Speaker 2>You know, even with all the potential, a lot of

54
00:02:41.240 --> 00:02:43.919
<v Speaker 2>companies do struggle. Sometimes it's just a lack of awareness

55
00:02:43.919 --> 00:02:45.240
<v Speaker 2>of what AI can really do.

56
00:02:45.039 --> 00:02:47.000
<v Speaker 1>Or they're stuck with old tools.

57
00:02:46.680 --> 00:02:51.080
<v Speaker 2>That too, legacy tools, maybe a general resistance to innovation.

58
00:02:51.199 --> 00:02:53.319
<v Speaker 2>And of course you always have ethical concerns or worries

59
00:02:53.360 --> 00:02:57.120
<v Speaker 2>about jobs. Understandable. But ultimately, the best way to look

60
00:02:57.159 --> 00:03:02.199
<v Speaker 2>at it, the forward-thinking view, is augmented intelligence: AI

61
00:03:02.360 --> 00:03:06.400
<v Speaker 2>designed to solve specific business problems, and often there's still

62
00:03:06.439 --> 00:03:07.080
<v Speaker 2>a human in the loop.

63
00:03:07.080 --> 00:03:09.680
<v Speaker 1>Right, guiding it, checking it.

64
00:03:09.719 --> 00:03:13.560
<v Speaker 2>Exactly. It's about augmenting what humans can do, not necessarily replacing

65
00:03:13.599 --> 00:03:14.240
<v Speaker 2>them entirely.

66
00:03:14.719 --> 00:03:18.719
<v Speaker 1>That's a really important distinction. Now, okay, core concepts. Sometimes

67
00:03:18.800 --> 00:03:22.240
<v Speaker 1>terms get jumbled. What we're mostly using in business today,

68
00:03:22.280 --> 00:03:23.400
<v Speaker 1>that's narrow AI.

69
00:03:23.639 --> 00:03:28.879
<v Speaker 2>That's right, machines designed for one single, specific task, like

70
00:03:28.919 --> 00:03:32.360
<v Speaker 2>Google Translate. It's amazing at translation, but you can't ask

71
00:03:32.400 --> 00:03:33.319
<v Speaker 2>it about the weather, right?

72
00:03:33.360 --> 00:03:36.120
<v Speaker 1>It doesn't have general smarts. The sci-fi version, the

73
00:03:36.120 --> 00:03:42.159
<v Speaker 1>thinking machine, that's artificial general intelligence, AGI. Still mostly theory.

74
00:03:41.919 --> 00:03:45.319
<v Speaker 2>Still very much theory. And within narrow AI, the key techniques

75
00:03:45.360 --> 00:03:49.400
<v Speaker 2>we use are machine learning, ML, and deep learning, DL.

76
00:03:49.560 --> 00:03:51.840
<v Speaker 1>And deep learning is a type of machine learning.

77
00:03:51.639 --> 00:03:54.520
<v Speaker 2>Yeah, basically a more advanced subset. It's specifically built for

78
00:03:54.599 --> 00:03:58.879
<v Speaker 2>tackling those really complex big data problems, often using structures

79
00:03:58.919 --> 00:03:59.960
<v Speaker 2>we call neural networks.

80
00:04:00.199 --> 00:04:02.879
<v Speaker 1>Okay, so if ML and DL are the techniques, how

81
00:04:02.919 --> 00:04:04.960
<v Speaker 1>does data science fit in? Where does that sit?

82
00:04:05.319 --> 00:04:08.599
<v Speaker 2>Good question. Data science is kind of the big umbrella.

83
00:04:09.120 --> 00:04:13.840
<v Speaker 2>Think of it as covering everything: the modeling, statistics, programming, plus,

84
00:04:14.039 --> 00:04:18.800
<v Speaker 2>crucially, knowing the business domain, all aimed at extracting real

85
00:04:18.839 --> 00:04:23.800
<v Speaker 2>insights and value from data. It's the whole toolkit you need, really.

86
00:04:23.519 --> 00:04:26.199
<v Speaker 1>And honestly, none of this would scale, would it, without

87
00:04:26.199 --> 00:04:28.959
<v Speaker 1>the cloud? Cloud computing is just fundamental for AI today.

88
00:04:29.040 --> 00:04:32.120
<v Speaker 2>Absolutely essential. It's the primary enabler.

89
00:04:32.160 --> 00:04:37.120
<v Speaker 1>And when we talk cloud, everyone knows the big three: AWS, Microsoft Azure,

90
00:04:37.199 --> 00:04:39.279
<v Speaker 1>Google Cloud Platform, GCP.

91
00:04:39.600 --> 00:04:41.720
<v Speaker 2>Those are the main players for sure, though you also

92
00:04:41.759 --> 00:04:45.360
<v Speaker 2>have others like IBM Cloud and Heroku doing significant work too.

93
00:04:45.560 --> 00:04:48.639
<v Speaker 1>The key things the cloud provides for AI are basically

94
00:04:48.839 --> 00:04:50.199
<v Speaker 1>storage and compute power.

95
00:04:50.319 --> 00:04:53.279
<v Speaker 2>Right, that's it, storage and compute. And it's worth noting,

96
00:04:53.399 --> 00:04:57.480
<v Speaker 2>deep learning projects, they need a lot of both. Huge overhead sometimes.

97
00:04:57.120 --> 00:05:00.160
<v Speaker 1>But machine learning is maybe less demanding?

98
00:04:59.759 --> 00:05:03.279
<v Speaker 2>Often, yeah. ML projects can frequently run with fewer resources.

99
00:05:03.439 --> 00:05:06.480
<v Speaker 2>What's interesting though, a trend we're seeing is companies worried

100
00:05:06.480 --> 00:05:07.360
<v Speaker 2>about vendor lock-in.

101
00:05:07.160 --> 00:05:09.319
<v Speaker 1>Being stuck with one provider.

102
00:05:09.160 --> 00:05:12.199
<v Speaker 2>Exactly, so they're moving towards multi-cloud or hybrid setups.

103
00:05:12.560 --> 00:05:15.040
<v Speaker 2>Gives them more flexibility strategically.

104
00:05:15.000 --> 00:05:17.920
<v Speaker 1>And how do they package these AI applications up to run

105
00:05:18.040 --> 00:05:22.079
<v Speaker 1>reliably everywhere? That's containerization, isn't it? Like Docker?

106
00:05:22.120 --> 00:05:23.600
<v Speaker 2>Spot on, containerization.

107
00:05:23.759 --> 00:05:26.279
<v Speaker 2>Docker's the big name there. It has pretty much become the

108
00:05:26.360 --> 00:05:30.839
<v Speaker 2>standard way to productionize AI apps. Why is that? Well,

109
00:05:31.279 --> 00:05:34.959
<v Speaker 2>think of containers like those standard shipping containers. They bundle

110
00:05:35.040 --> 00:05:39.680
<v Speaker 2>everything the app needs, code, libraries, settings. It makes them lightweight, portable,

111
00:05:39.959 --> 00:05:41.680
<v Speaker 2>and they run in isolated environments.

112
00:05:41.759 --> 00:05:44.920
<v Speaker 1>So consistent deployment, no matter the underlying system?

113
00:05:44.480 --> 00:05:48.839
<v Speaker 2>Precisely. Really important for getting things working reliably in production.

114
00:05:49.120 --> 00:05:52.079
<v Speaker 1>Okay, so we have the concepts, the cloud foundation. Now

115
00:05:52.160 --> 00:05:56.120
<v Speaker 1>let's dig into the operational side, the blueprint. Barry Walsh

116
00:05:56.160 --> 00:05:59.560
<v Speaker 1>really hammers this home: data strategy is paramount. He cites

117
00:05:59.560 --> 00:06:02.600
<v Speaker 1>the numbers, something like eighty-nine percent of businesses struggle

118
00:06:02.639 --> 00:06:03.639
<v Speaker 1>with data management.

119
00:06:03.759 --> 00:06:06.079
<v Speaker 2>Yeah, it's a striking figure, isn't it. And his point

120
00:06:06.160 --> 00:06:09.800
<v Speaker 2>is without a solid data strategy, your analytics, your AI initiatives,

121
00:06:10.279 --> 00:06:12.000
<v Speaker 2>they're likely doomed from the start.

122
00:06:12.240 --> 00:06:14.120
<v Speaker 1>That's a sobering thought.

123
00:06:14.360 --> 00:06:17.959
<v Speaker 2>It is. And to tackle that exact challenge, we have this methodology

124
00:06:18.000 --> 00:06:18.839
<v Speaker 2>called DataOps.

125
00:06:19.160 --> 00:06:21.040
<v Speaker 1>Okay, DataOps, what's that about?

126
00:06:21.240 --> 00:06:26.600
<v Speaker 2>It basically merges ideas from DevOps, agile methods, and lean manufacturing principles.

127
00:06:25.959 --> 00:06:28.519
<v Speaker 1>And data, obviously, right?

128
00:06:28.480 --> 00:06:31.879
<v Speaker 2>The core goal is to streamline your data pipelines, really

129
00:06:31.879 --> 00:06:35.800
<v Speaker 2>boost data quality and reliability, shorten that innovation cycle time,

130
00:06:36.120 --> 00:06:36.759
<v Speaker 2>cut down production errors.

131
00:06:36.560 --> 00:06:38.399
<v Speaker 1>And improve collaboration.

132
00:06:38.600 --> 00:06:41.879
<v Speaker 2>Yeah, often through things like self-service tools. It fosters

133
00:06:41.920 --> 00:06:46.319
<v Speaker 2>this culture of continuous improvement, encouraging experimentation, sort of lab-based

134
00:06:46.360 --> 00:06:48.480
<v Speaker 2>innovation with data.

135
00:06:48.600 --> 00:06:51.920
<v Speaker 1>Okay, so DataOps handles the data flow and quality.

136
00:06:52.079 --> 00:06:54.240
<v Speaker 1>Then you mentioned MLOps.

137
00:06:54.319 --> 00:06:57.680
<v Speaker 2>Yes. Now, a really critical insight here is that AI

138
00:06:57.720 --> 00:07:01.800
<v Speaker 2>and machine learning aren't just about code. It's code plus data.

139
00:07:01.399 --> 00:07:04.480
<v Speaker 1>And the data part is tricky because it changes.

140
00:07:04.519 --> 00:07:05.199
<v Speaker 2>Exactly. Code development,

141
00:07:05.279 --> 00:07:07.160
<v Speaker 2>we kind of know how to control that, but data

142
00:07:07.560 --> 00:07:09.959
<v Speaker 2>is dynamic, it evolves on its own, and that's a

143
00:07:10.000 --> 00:07:13.600
<v Speaker 2>massive challenge. There's this frustrating statistic that less than half

144
00:07:13.800 --> 00:07:15.959
<v Speaker 2>of ML models actually make it into production.

145
00:07:16.079 --> 00:07:17.079
<v Speaker 1>Wow, less than half?

146
00:07:17.360 --> 00:07:20.959
<v Speaker 2>Yeah, not great. It's not. And that's precisely where MLOps

147
00:07:20.959 --> 00:07:23.839
<v Speaker 2>comes in. It applies all those proven best practices from

148
00:07:23.879 --> 00:07:27.560
<v Speaker 2>DevOps and DataOps, but specifically to the entire machine

149
00:07:27.639 --> 00:07:28.480
<v Speaker 2>learning life cycle.

150
00:07:28.600 --> 00:07:30.240
<v Speaker 1>So from the very beginning.

151
00:07:29.959 --> 00:07:33.800
<v Speaker 2>Yeah, data prep, model training, all the way through deployment, monitoring,

152
00:07:34.279 --> 00:07:39.720
<v Speaker 2>and that vital continuous improvement loop. MLOps is really about bridging

153
00:07:39.759 --> 00:07:43.720
<v Speaker 2>that gap, that painful gap between developing a model and

154
00:07:43.759 --> 00:07:46.000
<v Speaker 2>actually using it effectively in the real world.

155
00:07:46.120 --> 00:07:48.519
<v Speaker 1>And you mentioned agile methods are key in DataOps.

156
00:07:48.560 --> 00:07:49.439
<v Speaker 1>Same for MLOps?

157
00:07:49.480 --> 00:07:53.480
<v Speaker 2>Absolutely central. Agile fosters that collaboration and adaptability you need.

158
00:07:54.240 --> 00:07:56.920
<v Speaker 2>AI project teams ideally are really diverse.

159
00:07:57.000 --> 00:07:57.959
<v Speaker 1>Who's usually on them.

160
00:07:58.040 --> 00:08:01.759
<v Speaker 2>You'll have business users, data architects, solution architects, data engineers,

161
00:08:01.839 --> 00:08:05.879
<v Speaker 2>data scientists, ML engineers, IT operations folks, everyone involved.

162
00:08:05.560 --> 00:08:07.439
<v Speaker 1>And they work in sprints, typically?

163
00:08:07.519 --> 00:08:10.600
<v Speaker 2>Yeah, development sprints or product sprints, usually two to four

164
00:08:10.600 --> 00:08:14.040
<v Speaker 2>weeks long. This lets them prioritize and deliver features, fix

165
00:08:14.040 --> 00:08:15.639
<v Speaker 2>bugs incrementally.

166
00:08:15.759 --> 00:08:16.800
<v Speaker 1>The benefits seem clear, then. More flexibility?

167
00:08:16.879 --> 00:08:21.519
<v Speaker 2>Definitely. Faster delivery times too, and crucially,

168
00:08:21.680 --> 00:08:24.560
<v Speaker 2>you reduce the risk of building something nobody actually wants

169
00:08:24.639 --> 00:08:28.519
<v Speaker 2>or needs because you're constantly adapting to changing requirements as

170
00:08:28.519 --> 00:08:29.120
<v Speaker 2>you learn more.

171
00:08:29.480 --> 00:08:32.879
<v Speaker 1>Collaboration sounds key. You mentioned tools like Git and GitHub.

172
00:08:33.080 --> 00:08:33.919
<v Speaker 1>How do they fit in?

173
00:08:34.559 --> 00:08:39.360
<v Speaker 2>They're huge. Version control, especially distributed version control systems, or

174
00:08:39.440 --> 00:08:43.960
<v Speaker 2>DVCS, like Git, is just invaluable. It tracks and manages

175
00:08:44.080 --> 00:08:46.279
<v Speaker 2>changes not just to source code, but also to your

176
00:08:46.360 --> 00:08:47.120
<v Speaker 2>datasets as they evolve.

177
00:08:47.000 --> 00:08:50.480
<v Speaker 1>Tracking data changes too? That's interesting for AI.

178
00:08:50.759 --> 00:08:54.879
<v Speaker 2>It's a massive benefit. Git helps reduce development time, leads

179
00:08:54.879 --> 00:08:58.720
<v Speaker 2>to much higher success rates for deployments, gives you transparent traceability.

180
00:08:58.960 --> 00:09:02.279
<v Speaker 2>You know exactly who changed what, when, and critically, the

181
00:09:02.320 --> 00:09:03.519
<v Speaker 2>ability to roll back.

182
00:09:03.559 --> 00:09:05.919
<v Speaker 1>To a previous version if something goes wrong.

183
00:09:05.759 --> 00:09:09.279
<v Speaker 2>Exactly, previous states of both code and data. The GitHub

184
00:09:09.279 --> 00:09:12.120
<v Speaker 2>ecosystem ties it all together. You've got the command line

185
00:09:12.120 --> 00:09:15.679
<v Speaker 2>tool, Git itself, the cloud hosting on GitHub, and a

186
00:09:15.720 --> 00:09:18.919
<v Speaker 2>desktop app too. Makes team collaboration much smoother.

187
00:09:18.720 --> 00:09:22.320
<v Speaker 1>And then automating the whole release pipeline. That's CI/CD, right?

188
00:09:22.399 --> 00:09:24.600
<v Speaker 1>Continuous integration, continuous delivery.

189
00:09:24.440 --> 00:09:28.919
<v Speaker 2>That's the one. CI/CD automates the build, test, and deployment stages.

190
00:09:29.399 --> 00:09:33.039
<v Speaker 2>It lets you push out software updates constantly, reliably with

191
00:09:33.240 --> 00:09:34.639
<v Speaker 2>minimal manual fuss.

192
00:09:34.759 --> 00:09:37.440
<v Speaker 1>How does that apply specifically in the AI DataOps world?

193
00:09:37.559 --> 00:09:40.720
<v Speaker 2>Well, the automation extends even further there. It includes orchestrating

194
00:09:40.759 --> 00:09:44.080
<v Speaker 2>your data pipelines automatically. It can even integrate things like

195
00:09:44.159 --> 00:09:45.080
<v Speaker 2>data drift detection.

196
00:09:45.240 --> 00:09:46.080
<v Speaker 1>What's data drift?

197
00:09:46.200 --> 00:09:49.000
<v Speaker 2>That's when the live data your model sees in production starts

198
00:09:49.039 --> 00:09:51.679
<v Speaker 2>looking different from the data it was trained on, which

199
00:09:51.759 --> 00:09:53.159
<v Speaker 2>can really mess up performance.

200
00:09:53.399 --> 00:09:55.720
<v Speaker 1>Ah, so CI/CD can catch that?

201
00:09:55.480 --> 00:09:58.919
<v Speaker 2>It can trigger alerts or even automated model retraining.

202
00:09:59.159 --> 00:10:03.519
<v Speaker 2>Tools like Jenkins make setting up these CI/CD environments relatively straightforward

203
00:10:03.600 --> 00:10:06.519
<v Speaker 2>using pipeline scripts. It just makes the whole end-to-end

204
00:10:06.519 --> 00:10:08.279
<v Speaker 2>process much more efficient and robust.

205
00:10:08.480 --> 00:10:11.200
<v Speaker 1>Okay, let's switch gears slightly and talk about the data itself,

206
00:10:11.240 --> 00:10:14.000
<v Speaker 1>because we hear about the data deluge all the time.

207
00:10:14.039 --> 00:10:17.240
<v Speaker 1>What was that prediction? One hundred and seventy-five zettabytes

208
00:10:17.759 --> 00:10:19.039
<v Speaker 1>by twenty twenty-five?

209
00:10:18.919 --> 00:10:22.879
<v Speaker 2>Something staggering like that. Yeah, a zettabyte is a trillion gigabytes.

210
00:10:23.000 --> 00:10:24.639
<v Speaker 2>It's almost impossible to comprehend.

211
00:10:24.879 --> 00:10:28.159
<v Speaker 1>But the real challenge, the insight for businesses isn't just

212
00:10:28.240 --> 00:10:30.039
<v Speaker 1>the volume, is it? It's the messiness.

213
00:10:30.120 --> 00:10:34.799
<v Speaker 2>Absolutely, raw data almost never comes ready to use. Companies

214
00:10:34.840 --> 00:10:37.879
<v Speaker 2>often seriously underestimate the sheer effort needed to turn that

215
00:10:38.000 --> 00:10:40.759
<v Speaker 2>raw stuff into clean, valuable data.

216
00:10:40.480 --> 00:10:41.759
<v Speaker 1>And that's a critical step.

217
00:10:42.000 --> 00:10:44.600
<v Speaker 2>It can absolutely make or break an AI project right

218
00:10:44.639 --> 00:10:47.639
<v Speaker 2>at the start. Many initiatives stumble right there because they

219
00:10:47.720 --> 00:10:49.360
<v Speaker 2>underestimate the cleaning and prep work.

220
00:10:49.440 --> 00:10:51.840
<v Speaker 1>So how do you manage that influx? IBM has this

221
00:10:51.960 --> 00:10:53.240
<v Speaker 1>concept, the AI Ladder.

222
00:10:53.399 --> 00:10:58.240
<v Speaker 2>Yeah, it outlines a strategic sequence: collect, organize, analyze, and

223
00:10:58.279 --> 00:11:02.159
<v Speaker 2>then infuse AI. A key part of it is unifying

224
00:11:02.240 --> 00:11:05.879
<v Speaker 2>your data, often across multiple clouds, maybe using a data lake.

225
00:11:06.159 --> 00:11:09.159
<v Speaker 1>Unification is key for getting that complete picture.

226
00:11:09.200 --> 00:11:11.840
<v Speaker 2>Totally. And this leads us to data pipelines. These are the

227
00:11:11.919 --> 00:11:13.519
<v Speaker 2>automated workflows that move data around.

228
00:11:13.399 --> 00:11:16.039
<v Speaker 1>Like plumbing for data?

229
00:11:15.919 --> 00:11:19.559
<v Speaker 2>Kind of, yeah. An automated series of actions: extract or ingest data

230
00:11:19.600 --> 00:11:22.720
<v Speaker 2>from sources, transform it so it's usable, and then load

231
00:11:22.759 --> 00:11:25.679
<v Speaker 2>it into a data store for analysis. It's that classic

232
00:11:26.000 --> 00:11:29.000
<v Speaker 2>extract, transform, load loop, ETL.

233
00:11:29.200 --> 00:11:30.559
<v Speaker 1>Let's break that ETL down.

234
00:11:30.759 --> 00:11:34.080
<v Speaker 3>Extraction, that's just pulling data from all sorts of places:

235
00:11:34.080 --> 00:11:39.480
<v Speaker 3>text files, databases, websites, APIs, and increasingly using efficient formats

236
00:11:39.480 --> 00:11:42.879
<v Speaker 3>like Parquet or Avro. Then transformation, that's the prep work,

237
00:11:43.279 --> 00:11:47.039
<v Speaker 3>getting the data ready for whatever system comes next. It involves formatting,

238
00:11:47.240 --> 00:11:52.759
<v Speaker 3>filtering out bad data, encoding things numerically, scaling values, normalizing,

239
00:11:53.080 --> 00:11:56.039
<v Speaker 3>maybe splitting datasets. Lots of steps, potentially.

240
00:11:55.679 --> 00:11:58.200
<v Speaker 1>And finally, loading?

241
00:11:58.200 --> 00:12:01.759
<v Speaker 2>Just putting that cleaned, transformed data into its final destination, the

242
00:12:01.840 --> 00:12:02.399
<v Speaker 2>data store.

243
00:12:02.440 --> 00:12:06.240
<v Speaker 1>Okay, ETL. But then there's another term, data wrangling. How

244
00:12:06.279 --> 00:12:07.559
<v Speaker 1>is that different from transformation?

245
00:12:07.759 --> 00:12:11.120
<v Speaker 2>Good question. Data wrangling is maybe more active, more iterative.

246
00:12:11.720 --> 00:12:14.519
<v Speaker 2>It happens after you've acquired the data, but before you

247
00:12:14.559 --> 00:12:18.360
<v Speaker 2>start building models. Unlike, say, exploratory data analysis, where you're

248
00:12:18.360 --> 00:12:21.600
<v Speaker 2>just looking, wrangling actively changes the data to make it

249
00:12:21.639 --> 00:12:22.799
<v Speaker 2>suitable for ML or DL.

250
00:12:22.759 --> 00:12:25.799
<v Speaker 1>So it's really shaping the data for the model?

251
00:12:25.919 --> 00:12:29.519
<v Speaker 2>Exactly. It includes things like deciding how to handle missing values:

252
00:12:29.559 --> 00:12:33.320
<v Speaker 2>do you drop the rows, fill them with the average, interpolate?

253
00:12:34.159 --> 00:12:38.440
<v Speaker 2>It also means dealing with outliers, encoding categorical data, scaling

254
00:12:38.519 --> 00:12:43.080
<v Speaker 2>numerical features. It's all about optimizing the data for the algorithm.

255
00:12:43.399 --> 00:12:45.759
<v Speaker 1>Right, makes sense. So once the data is wrangled, you

256
00:12:45.799 --> 00:12:47.639
<v Speaker 1>need somewhere to put it. You mentioned data lakes.

257
00:12:47.720 --> 00:12:50.200
<v Speaker 2>Yep, data lakes are popular. They're basically a single large

258
00:12:50.240 --> 00:12:54.720
<v Speaker 2>repository for all kinds of data: raw, structured, unstructured, semi-structured.

259
00:12:55.039 --> 00:12:58.399
<v Speaker 2>Great for handling that variety, velocity, volume, veracity, the

260
00:12:58.480 --> 00:12:59.440
<v Speaker 2>Vs of big data.

261
00:12:59.600 --> 00:13:00.440
<v Speaker 1>But there's a catch.

262
00:13:00.519 --> 00:13:03.720
<v Speaker 2>There is. Without really good cataloging and governance, a data

263
00:13:03.799 --> 00:13:06.559
<v Speaker 2>lake can easily turn into a data swamp, just a

264
00:13:06.799 --> 00:13:08.200
<v Speaker 2>mess of unusable data.

265
00:13:08.279 --> 00:13:11.240
<v Speaker 1>Okay, so governance is crucial. What about data warehouses?

266
00:13:11.679 --> 00:13:17.159
<v Speaker 2>Data warehouses are traditionally more structured: clean, organized data, mostly structured,

267
00:13:17.639 --> 00:13:20.399
<v Speaker 2>often serving as the single source of truth for reporting

268
00:13:20.440 --> 00:13:23.639
<v Speaker 2>and analytics, though modern ones are getting better at handling

269
00:13:23.720 --> 00:13:24.919
<v Speaker 2>unstructured data too.

270
00:13:24.919 --> 00:13:25.799
<v Speaker 1>And data marts.

271
00:13:25.840 --> 00:13:28.840
<v Speaker 2>Those are usually smaller, focused subsets of a data warehouse,

272
00:13:29.120 --> 00:13:31.639
<v Speaker 2>tailored for a specific department or analytical need.

273
00:13:31.840 --> 00:13:34.200
<v Speaker 1>And there's a newer concept, lakehouse.

274
00:13:34.480 --> 00:13:36.600
<v Speaker 2>Yeah. The lakehouse idea tries to blend the best of

275
00:13:36.639 --> 00:13:39.799
<v Speaker 2>both: the flexibility and cheap storage of a data lake,

276
00:13:40.240 --> 00:13:42.799
<v Speaker 2>combined with the data management and structure features of a

277
00:13:42.879 --> 00:13:45.159
<v Speaker 2>data warehouse. Still evolving but promising.

278
00:13:45.200 --> 00:13:48.720
<v Speaker 1>And choosing between ETL and ELT, that's a strategic

279
00:13:48.759 --> 00:13:50.159
<v Speaker 1>decision too, right?

280
00:13:50.399 --> 00:13:55.120
<v Speaker 2>Definitely. ETL, extract, transform, load, is the classic way: transform the

281
00:13:55.200 --> 00:13:58.480
<v Speaker 2>data before loading it. Often good for structured data or

282
00:13:58.600 --> 00:14:03.559
<v Speaker 2>migrating to the cloud. ELT, extract, load, transform, flips it.

283
00:14:03.960 --> 00:14:07.480
<v Speaker 2>You load the raw data into storage first, then transform it.

284
00:14:07.600 --> 00:14:10.600
<v Speaker 2>This is really popular for data lakes and exploratory analysis.

285
00:14:10.639 --> 00:14:12.559
<v Speaker 1>Why? More flexibility?

286
00:14:13.159 --> 00:14:15.679
<v Speaker 2>Exactly. Data scientists can access the raw data and decide on

287
00:14:15.720 --> 00:14:18.559
<v Speaker 2>transformations later as they figure out what they need. Much

288
00:14:18.559 --> 00:14:20.639
<v Speaker 2>more agile for exploration.

289
00:14:20.200 --> 00:14:24.600
<v Speaker 1>Then the databases themselves. SQL versus NoSQL, quick rundown?

290
00:14:24.519 --> 00:14:28.480
<v Speaker 2>Sure. SQL databases, think Microsoft SQL Server, MySQL, PostgreSQL, are relational.

291
00:14:28.639 --> 00:14:31.679
<v Speaker 2>They use predefined schemas, pretty structured. Great for complex

292
00:14:31.720 --> 00:14:34.799
<v Speaker 2>analytical queries, what we call OLAP, using powerful joins.

293
00:14:34.919 --> 00:14:38.720
<v Speaker 2>And NoSQL databases, like MongoDB, Cassandra, AWS

294
00:14:38.799 --> 00:14:43.720
<v Speaker 2>DynamoDB, are non-relational, often schemaless or flexible-schema, designed

295
00:14:43.720 --> 00:14:47.440
<v Speaker 2>for massive scale, high speed and handling frequent changes like

296
00:14:47.440 --> 00:14:51.200
<v Speaker 2>in web apps. Often used for OLTP, transactional data.

297
00:14:50.639 --> 00:14:53.759
<v Speaker 1>And what if you need insights right now from

298
00:14:53.840 --> 00:14:55.279
<v Speaker 1>data that's constantly flowing.

299
00:14:55.440 --> 00:14:58.559
<v Speaker 2>Ah, then you need stream processing and analytics. This is

300
00:14:58.559 --> 00:15:02.039
<v Speaker 2>about querying data streams as they arrive, in near real time.

301
00:15:02.320 --> 00:15:04.159
<v Speaker 2>Crucial when the data's value drops quickly.

302
00:15:03.919 --> 00:15:07.200
<v Speaker 1>Like IoT sensor data or stock prices?

303
00:15:07.240 --> 00:15:12.960
<v Speaker 2>Perfect examples. Tools like Apache Storm, Spark Streaming, Flink, Kafka Streams, AWS Kinesis.

304
00:15:13.200 --> 00:15:15.679
<v Speaker 2>They're all built for this kind of high speed, continuous

305
00:15:15.759 --> 00:15:19.399
<v Speaker 2>querying and analysis, getting insights in milliseconds or seconds.

306
00:15:19.399 --> 00:15:23.039
<v Speaker 1>Okay, we've got the data flowing, stored, prepped. Let's finally

307
00:15:23.080 --> 00:15:25.840
<v Speaker 1>dive into the engine: machine learning and deep learning. Starting

308
00:15:25.840 --> 00:15:27.799
<v Speaker 1>with ML. Supervised learning?

309
00:15:27.600 --> 00:15:29.679
<v Speaker 2>Yeah, probably the most common type. You train the model on

310
00:15:29.759 --> 00:15:31.679
<v Speaker 2>data where you already know the right answer, the label.

311
00:15:31.759 --> 00:15:34.320
<v Speaker 1>Like predicting customer churn. You have past data on who

312
00:15:34.399 --> 00:15:35.480
<v Speaker 1>left.

313
00:15:35.559 --> 00:15:40.120
<v Speaker 2>Exactly. That's a classification problem. Or forecasting customer revenue based on past spending.

314
00:15:40.159 --> 00:15:43.000
<v Speaker 2>That's regression. Known inputs, known outputs.

315
00:15:43.159 --> 00:15:46.759
<v Speaker 1>Then unsupervised learning, this is where it gets interesting.

316
00:15:47.080 --> 00:15:49.759
<v Speaker 2>You don't have labels. The goal is to find hidden

317
00:15:49.799 --> 00:15:52.720
<v Speaker 2>patterns or structures in the data itself.

318
00:15:52.440 --> 00:15:54.799
<v Speaker 1>Like grouping similar customers together.

319
00:15:54.679 --> 00:15:58.600
<v Speaker 2>Right, that's clustering. Another key technique is dimensionality reduction, like

320
00:15:58.639 --> 00:16:03.759
<v Speaker 2>PCA, principal component analysis. It helps simplify massive data sets

321
00:16:03.759 --> 00:16:07.200
<v Speaker 2>by finding the most important underlying features, making them easier

322
00:16:07.200 --> 00:16:07.679
<v Speaker 2>to work with.

323
00:16:07.840 --> 00:16:10.399
<v Speaker 1>And then there's reinforcement learning. That sounds different again.

324
00:16:10.159 --> 00:16:13.360
<v Speaker 2>It is. It's about real-time learning. An agent

325
00:16:13.679 --> 00:16:16.960
<v Speaker 2>learns by trial and error in an environment, getting rewards

326
00:16:17.000 --> 00:16:18.039
<v Speaker 2>or penalties for its actions.

327
00:16:17.879 --> 00:16:20.519
<v Speaker 1>Like training a robot to walk.

328
00:16:20.320 --> 00:16:24.879
<v Speaker 2>Yeah, or game-playing AI. Google's search engine uses it. Autonomous vehicles

329
00:16:24.919 --> 00:16:30.440
<v Speaker 2>rely heavily on it. Robotics. It's powering some really advanced applications.

330
00:16:29.919 --> 00:16:31.960
<v Speaker 1>Okay. And building on ML, we get to deep learning.

331
00:16:32.039 --> 00:16:34.399
<v Speaker 1>This is where the really complex stuff happens, right?

332
00:16:34.679 --> 00:16:39.519
<v Speaker 2>Pretty much. DL extends ML using these things called artificial

333
00:16:39.519 --> 00:16:43.000
<v Speaker 2>neural networks, ANNs, often with many, many hidden layers.

334
00:16:43.240 --> 00:16:45.240
<v Speaker 2>They're loosely inspired by the brain's structure.

335
00:16:45.320 --> 00:16:48.360
<v Speaker 1>Why is DL booming now? These ideas aren't brand new.

336
00:16:48.120 --> 00:16:51.200
<v Speaker 2>True, the concepts have been around, but it's the

337
00:16:51.320 --> 00:16:55.200
<v Speaker 2>massive leap in computing power, especially from GPUs, and huge

338
00:16:55.240 --> 00:16:58.879
<v Speaker 2>amounts of data that have made training these deep networks feasible.

339
00:16:59.120 --> 00:17:02.639
<v Speaker 2>Think milestones like IBM's Deep Blue beating Kasparov.

340
00:17:02.279 --> 00:17:04.160
<v Speaker 1>Watson on Jeopardy.

341
00:17:03.799 --> 00:17:08.039
<v Speaker 2>ImageNet breakthroughs, Google DeepMind's AlphaGo. These were all

342
00:17:08.079 --> 00:17:11.119
<v Speaker 2>powered by or pushed the limits of deep learning and

343
00:17:11.160 --> 00:17:12.319
<v Speaker 2>the hardware behind it.

344
00:17:12.240 --> 00:17:14.480
<v Speaker 1>And inside DL, there are different kinds of neural networks?

345
00:17:14.480 --> 00:17:15.400
<v Speaker 2>Oh yeah, whole zoo of them.

346
00:17:15.480 --> 00:17:15.720
<v Speaker 1>Yeah.

347
00:17:15.720 --> 00:17:19.440
<v Speaker 2>For image recognition, the standard is convolutional neural networks, CNNs.

348
00:17:19.680 --> 00:17:22.839
<v Speaker 2>They're brilliant at picking out spatial hierarchies of features in images,

349
00:17:23.119 --> 00:17:25.680
<v Speaker 2>treating them as multidimensional grids or tensors.

350
00:17:25.720 --> 00:17:28.799
<v Speaker 1>Tensors, right, like complex spreadsheets?

351
00:17:28.119 --> 00:17:31.960
<v Speaker 2>Sort of, yeah, multidimensional arrays. Then for sequential data, time

352
00:17:32.079 --> 00:17:35.680
<v Speaker 2>series, language, you have recurrent neural networks, RNNs, and

353
00:17:35.759 --> 00:17:39.839
<v Speaker 2>a more powerful variant called LSTMs, long short-term memory models.

354
00:17:40.119 --> 00:17:42.599
<v Speaker 1>They have memory of past inputs essentially.

355
00:17:42.720 --> 00:17:45.920
<v Speaker 2>Yes, they maintain a state that captures information from previous

356
00:17:45.920 --> 00:17:48.559
<v Speaker 2>steps in the sequence. And then you get into networks

357
00:17:48.599 --> 00:17:51.839
<v Speaker 2>that can even generate new data, like autoencoders or

358
00:17:51.920 --> 00:17:52.920
<v Speaker 2>variational autoencoders.

359
00:17:52.759 --> 00:17:56.359
<v Speaker 1>And the famous GANs, generative adversarial networks.

360
00:17:56.480 --> 00:17:59.519
<v Speaker 2>That's them, the tech behind deep fakes. They learn patterns

361
00:17:59.559 --> 00:18:03.720
<v Speaker 2>from input data and can create incredibly realistic synthetic images, text,

362
00:18:04.039 --> 00:18:04.920
<v Speaker 2>even music.

363
00:18:04.920 --> 00:18:07.440
<v Speaker 1>Wild stuff. What tools do people use to build these?

364
00:18:07.559 --> 00:18:11.000
<v Speaker 2>Python is king here. The dominant frameworks are TensorFlow, which

365
00:18:11.039 --> 00:18:14.359
<v Speaker 2>is Google's open source library, and Keras, which is a

366
00:18:14.440 --> 00:18:17.119
<v Speaker 2>high level API that makes TensorFlow much easier to use.

367
00:18:17.319 --> 00:18:18.480
<v Speaker 1>And the other big one?

368
00:18:18.440 --> 00:18:22.720
<v Speaker 2>PyTorch from Facebook. It's another very popular open source framework

369
00:18:22.920 --> 00:18:26.759
<v Speaker 2>known for its flexibility and dynamic approach often preferred in research.

370
00:18:26.960 --> 00:18:29.319
<v Speaker 1>Building the model is one thing, but getting it to

371
00:18:29.359 --> 00:18:33.640
<v Speaker 1>perform well, that's tuning, right? Sounds complex.

372
00:18:34.000 --> 00:18:37.279
<v Speaker 2>It's definitely an iterative process, almost an art form sometimes.

373
00:18:37.839 --> 00:18:41.119
<v Speaker 2>You need to choose the right activation functions, deciding how

374
00:18:41.160 --> 00:18:45.359
<v Speaker 2>neurons fire, select appropriate loss functions to measure the model's error,

375
00:18:46.039 --> 00:18:50.079
<v Speaker 2>and pick good optimization algorithms to adjust the model's internal

376
00:18:50.079 --> 00:18:51.839
<v Speaker 2>weights to minimize that error.

377
00:18:52.160 --> 00:18:54.759
<v Speaker 1>And then there are hyperparameters, like dials you can

378
00:18:54.759 --> 00:18:55.799
<v Speaker 1>turn.

379
00:18:56.240 --> 00:18:58.799
<v Speaker 2>Exactly. Think of them as settings outside the model that control

380
00:18:58.839 --> 00:19:02.119
<v Speaker 2>the learning process itself. Things like the learning rate, how

381
00:19:02.119 --> 00:19:04.920
<v Speaker 2>many examples you process at once, the batch size, and

382
00:19:05.000 --> 00:19:08.160
<v Speaker 2>really important techniques like regularization and dropout.

383
00:19:08.240 --> 00:19:09.640
<v Speaker 1>What does regularization do?

384
00:19:10.079 --> 00:19:12.880
<v Speaker 2>It helps prevent the model from overfitting. That's where it

385
00:19:12.920 --> 00:19:15.920
<v Speaker 2>learns the training data too well, including noise, and then

386
00:19:16.000 --> 00:19:19.839
<v Speaker 2>fails to generalize to new unseen data. Dropout is a

387
00:19:19.839 --> 00:19:22.640
<v Speaker 2>common way to fight that. Fine-tuning these hyperparameters

388
00:19:22.759 --> 00:19:24.400
<v Speaker 2>is critical for getting good results.
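
NOTE
Editor's sketch, not from the episode or from Walsh's guide: one common form of
the dropout technique mentioned above ("inverted" dropout), in plain Python.
The function name, the example values, and the 0.5 rate are illustrative
assumptions; real frameworks such as Keras or PyTorch provide their own layers.
```python
import random
def dropout(activations, rate, rng, training=True):
    # At inference time, activations pass through unchanged.
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate  # assumes 0 <= rate < 1
    # Zero each unit with probability `rate`; scale survivors by 1/keep
    # so the expected activation stays the same during training.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]
train_out = dropout([1.0, 2.0, 3.0, 4.0], rate=0.5, rng=random.Random(0))
infer_out = dropout([1.0, 2.0, 3.0, 4.0], rate=0.5, rng=random.Random(0), training=False)
```
Randomly silencing units during training discourages the network from leaning
on any single unit, which is how dropout helps against overfitting.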

389
00:19:24.680 --> 00:19:27.480
<v Speaker 1>You know, thinking back, we've seen so many cool AI

390
00:19:27.559 --> 00:19:30.440
<v Speaker 1>prototypes over the years, but historically, didn't a lot

391
00:19:30.440 --> 00:19:32.839
<v Speaker 1>of them just fail to make it into actual use?

392
00:19:33.119 --> 00:19:36.200
<v Speaker 2>Sadly, yes, that was a common story, often due to

393
00:19:36.240 --> 00:19:38.799
<v Speaker 2>those operational silos we talked about, or maybe relying too

394
00:19:38.839 --> 00:19:42.000
<v Speaker 2>much on niche experts, models being too complex and code-heavy,

395
00:19:42.079 --> 00:19:44.480
<v Speaker 2>poor integration, lots of reasons.

396
00:19:44.519 --> 00:19:46.880
<v Speaker 1>But it feels like that's changing now, like there's an

397
00:19:46.920 --> 00:19:48.519
<v Speaker 1>automation revolution happening.

398
00:19:49.000 --> 00:19:51.079
<v Speaker 2>I think that's fair to say, and that's where AutoML

399
00:19:51.160 --> 00:19:54.920
<v Speaker 2>comes in, alongside NoLo, no-code and low-code, platforms.

400
00:19:54.519 --> 00:19:57.599
<v Speaker 1>Okay, AutoML, automating machine learning?

401
00:19:57.720 --> 00:20:00.680
<v Speaker 2>Pretty much. It automates big chunks of the ML workflow: data prep,

402
00:20:01.440 --> 00:20:04.759
<v Speaker 2>picking the right algorithm, tuning those tricky hyperparameters we

403
00:20:04.920 --> 00:20:07.720
<v Speaker 2>just discussed, even deploying the model. It takes a lot of

404
00:20:07.759 --> 00:20:10.039
<v Speaker 2>the manual gruntwork and guesswork out of it. And the

405
00:20:10.079 --> 00:20:13.880
<v Speaker 2>NoLo platforms, these are crucial for, well, democratizing AI.

406
00:20:14.240 --> 00:20:17.200
<v Speaker 2>They provide user interfaces that let people who aren't hardcore

407
00:20:17.279 --> 00:20:19.720
<v Speaker 2>data scientists build and deploy AI solutions.

408
00:20:19.559 --> 00:20:22.079
<v Speaker 1>So business analysts, maybe?

409
00:20:22.480 --> 00:20:25.559
<v Speaker 2>Exactly. It enables a shift from older rule-based automation to

410
00:20:25.720 --> 00:20:31.559
<v Speaker 2>more intelligent, AI-infused cognitive robotic process automation, or CRPA,

411
00:20:32.079 --> 00:20:33.200
<v Speaker 2>making automation smarter.

412
00:20:33.519 --> 00:20:36.240
<v Speaker 1>How does AutoML actually work underneath? How does it

413
00:20:36.279 --> 00:20:37.400
<v Speaker 1>find the best model?

414
00:20:37.720 --> 00:20:42.119
<v Speaker 2>It uses clever search algorithms. Simple ones exist like random

415
00:20:42.160 --> 00:20:45.079
<v Speaker 2>search or grid search, but the really powerful ones are

416
00:20:45.079 --> 00:20:46.759
<v Speaker 2>adaptive, like Bayesian optimization.

417
00:20:46.480 --> 00:20:48.240
<v Speaker 1>Bayesian optimization?

418
00:20:48.400 --> 00:20:51.240
<v Speaker 2>Yeah. It learns from each model it tries. Based on

419
00:20:51.279 --> 00:20:56.039
<v Speaker 2>how well previous configurations performed, it intelligently decides which combination

420
00:20:56.119 --> 00:20:59.680
<v Speaker 2>of algorithm and hyperparameters to try next. It's much

421
00:20:59.680 --> 00:21:01.400
<v Speaker 2>more efficient than just trying things randomly.

422
00:21:01.279 --> 00:21:04.319
<v Speaker 1>Like a smarter trial and error?

423
00:21:05.079 --> 00:21:07.839
<v Speaker 2>Exactly. And for coders who want automation, there are great Python

424
00:21:07.920 --> 00:21:12.599
<v Speaker 2>libraries too, things like PyCaret for low-code ML, auto-sklearn, Auto-WEKA,

425
00:21:12.920 --> 00:21:16.640
<v Speaker 2>even TPOT, which uses genetic programming to evolve entire pipelines.
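
NOTE
Editor's sketch, not from the episode or from Walsh's guide: the two simple
search strategies named a few cues earlier, grid search and random search, in
plain Python. The search space and validation_score are hypothetical stand-ins
for "train a model with these hyperparameters and measure it". Bayesian
optimization is not shown; it would pick each next trial using earlier results.
```python
import itertools
import random
# Hypothetical stand-in for training a model and returning a validation score.
def validation_score(learning_rate, batch_size):
    return -(learning_rate - 0.01) ** 2 - ((batch_size - 64) / 1000) ** 2
SPACE = {"learning_rate": [0.001, 0.01, 0.1], "batch_size": [32, 64, 128]}
def grid_search(space):
    # Try every combination in the grid and keep the best-scoring one.
    names = list(space)
    combos = (dict(zip(names, values)) for values in itertools.product(*space.values()))
    return max(combos, key=lambda params: validation_score(**params))
def random_search(space, n_trials, seed=0):
    # Sample n_trials random combinations instead of walking the full grid.
    rng = random.Random(seed)
    trials = [{name: rng.choice(values) for name, values in space.items()} for _ in range(n_trials)]
    return max(trials, key=lambda params: validation_score(**params))
best_grid = grid_search(SPACE)
best_random = random_search(SPACE, n_trials=4)
```
Grid search costs one training run per combination, so it grows quickly with
every added hyperparameter; random search caps the budget at n_trials.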

426
00:21:16.680 --> 00:21:20.200
<v Speaker 1>Beyond libraries, there are full platforms now, right? Enterprise-level stuff.

427
00:21:20.240 --> 00:21:23.759
<v Speaker 2>Oh yeah, the major cloud providers and others have comprehensive offerings.

428
00:21:24.000 --> 00:21:26.599
<v Speaker 2>IBM Cloud Pak for Data is one. It aims to

429
00:21:26.599 --> 00:21:29.839
<v Speaker 2>be a unified platform with a data fabric approach, and

430
00:21:29.920 --> 00:21:32.240
<v Speaker 2>its AutoAI feature automates the process and ranks models.

431
00:21:32.279 --> 00:21:33.680
<v Speaker 1>And Azure?

432
00:21:33.839 --> 00:21:37.000
<v Speaker 2>Azure Machine Learning has really strong cloud integration and built

433
00:21:37.000 --> 00:21:40.839
<v Speaker 2>in automated ML features. Google Cloud Vertex AI, again a

434
00:21:40.960 --> 00:21:44.079
<v Speaker 2>unified platform tightly integrated with all the other Google Cloud

435
00:21:44.079 --> 00:21:48.599
<v Speaker 2>services. AWS SageMaker Autopilot is their main offering for simplifying

436
00:21:48.640 --> 00:21:51.599
<v Speaker 2>the model building, often paired with SageMaker Data Wrangler

437
00:21:51.720 --> 00:21:52.880
<v Speaker 2>for the data prep side.

438
00:21:52.960 --> 00:21:54.480
<v Speaker 1>And TensorFlow has its own too.

439
00:21:54.640 --> 00:21:58.599
<v Speaker 2>Yep, TensorFlow Extended or TFX. It's open source from Google,

440
00:21:58.880 --> 00:22:02.680
<v Speaker 2>specifically designed for building scalable production ML pipelines and

441
00:22:02.880 --> 00:22:06.079
<v Speaker 2>end-to-end MLOps. These platforms are really making

442
00:22:06.079 --> 00:22:07.680
<v Speaker 2>productionizing AI much more achievable.

443
00:22:07.519 --> 00:22:10.440
<v Speaker 1>And the impact of the NoLo interfaces on these

444
00:22:10.440 --> 00:22:11.799
<v Speaker 1>platforms seems massive.

445
00:22:12.119 --> 00:22:15.319
<v Speaker 2>It really is. They're essential for getting more people involved,

446
00:22:15.519 --> 00:22:18.920
<v Speaker 2>broader stakeholder engagement, moving AI out of just the data

447
00:22:18.960 --> 00:22:21.880
<v Speaker 2>science lab and into the hands of people across the business.

448
00:22:22.279 --> 00:22:23.640
<v Speaker 2>That's huge for adoption.

449
00:22:23.920 --> 00:22:27.039
<v Speaker 1>Okay, so we've got data models, automation. How do we

450
00:22:27.079 --> 00:22:31.160
<v Speaker 1>actually go from concept to a working solution? What's the process?

451
00:22:31.400 --> 00:22:34.039
<v Speaker 2>Well, the advice is generally to follow an agile approach,

452
00:22:34.440 --> 00:22:38.119
<v Speaker 2>start small, but keep that bigger enterprise AI picture in mind.

453
00:22:38.279 --> 00:22:39.079
<v Speaker 1>Is there a framework?

454
00:22:39.319 --> 00:22:42.519
<v Speaker 2>Walsh suggests a kind of seven-step process: pick a

455
00:22:42.640 --> 00:22:49.480
<v Speaker 2>specific, well-defined problem, develop low-fidelity solutions quickly, prototypes, iterate,

456
00:22:49.680 --> 00:22:53.599
<v Speaker 2>maybe try different related problems, collaborate, like on Kaggle competitions

457
00:22:53.640 --> 00:22:57.480
<v Speaker 2>to learn and always always make sure the solution delivers

458
00:22:57.559 --> 00:22:59.400
<v Speaker 2>real measurable value.

459
00:22:59.440 --> 00:23:02.319
<v Speaker 1>And you need the right infrastructure underneath all this. Absolutely

460
00:23:02.319 --> 00:23:06.559
<v Speaker 1>critical. Things like APIs, application programming interfaces, and endpoints.

461
00:23:06.920 --> 00:23:09.519
<v Speaker 2>These are the standard ways you expose your trained mL

462
00:23:09.559 --> 00:23:12.400
<v Speaker 2>models so other applications or UIs can use them, usually

463
00:23:12.480 --> 00:23:14.079
<v Speaker 2>as a web service, like a REST API.

464
00:23:14.319 --> 00:23:15.680
<v Speaker 1>That's how the model gets called to make predictions?

465
00:23:15.720 --> 00:23:19.200
<v Speaker 2>Precisely. And to handle big data volumes

466
00:23:19.200 --> 00:23:22.039
<v Speaker 2>and lots of users hitting those APIs, you need serious

467
00:23:22.039 --> 00:23:25.079
<v Speaker 2>processing power: distributed processing clusters.

468
00:23:25.119 --> 00:23:26.519
<v Speaker 1>This is where GPUs come in.

469
00:23:26.640 --> 00:23:30.200
<v Speaker 2>Yes. GPUs, graphics processing units, and now TPUs, tensor processing

470
00:23:30.240 --> 00:23:33.359
<v Speaker 2>units, are key because they excel at the parallel computations

471
00:23:33.400 --> 00:23:36.079
<v Speaker 2>needed for deep learning. And newer things are emerging too,

472
00:23:36.200 --> 00:23:40.640
<v Speaker 2>like DPUs, data processing units, and FPGAs, field-programmable gate arrays.

473
00:23:40.160 --> 00:23:42.759
<v Speaker 1>And dealing with massive data sets?

474
00:23:42.680 --> 00:23:46.119
<v Speaker 2>Techniques like sharding are used. It basically means splitting your

475
00:23:46.160 --> 00:23:49.559
<v Speaker 2>data horizontally across multiple servers, so you can process it

476
00:23:49.599 --> 00:23:52.119
<v Speaker 2>in parallel and handle huge volumes.
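
NOTE
Editor's sketch, not from the episode or from Walsh's guide: one simple way to
shard rows horizontally, by hashing a key so each row lands on exactly one
shard. The field name customer_id, the sample rows, and the shard count of 3
are illustrative assumptions; production systems often use range-based or
consistent-hashing schemes instead.
```python
import hashlib
def shard_for(key, num_shards):
    # Stable hash, so the same key always maps to the same shard.
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards
def shard_rows(rows, key_field, num_shards):
    # Split the rows (not the columns) into num_shards buckets by key.
    shards = [[] for _ in range(num_shards)]
    for row in rows:
        shards[shard_for(row[key_field], num_shards)].append(row)
    return shards
rows = [{"customer_id": f"c{i}", "spend": i * 10} for i in range(8)]
shards = shard_rows(rows, "customer_id", num_shards=3)
```
Each bucket could then live on, and be processed by, a separate server.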

477
00:23:52.319 --> 00:23:55.839
<v Speaker 1>So the model's deployed via an API. How do users

478
00:23:55.880 --> 00:23:58.440
<v Speaker 1>interact with it? Through user interfaces?

479
00:23:58.640 --> 00:24:02.079
<v Speaker 2>UIs, right. And Python, being the main AI language, has

480
00:24:02.119 --> 00:24:05.359
<v Speaker 2>great tools for building these UIs quickly. Like what? Well,

481
00:24:05.400 --> 00:24:09.160
<v Speaker 2>there's Dash, which is excellent for building analytical web dashboards.

482
00:24:09.680 --> 00:24:13.000
<v Speaker 2>Flask is a popular lightweight framework good for simpler web

483
00:24:13.039 --> 00:24:15.720
<v Speaker 2>apps or API back ends. Django is more of a

484
00:24:15.759 --> 00:24:19.440
<v Speaker 2>full featured, batteries included framework for bigger projects.

485
00:24:19.480 --> 00:24:21.440
<v Speaker 1>And I've heard a lot about Streamlit recently.

486
00:24:21.720 --> 00:24:24.640
<v Speaker 2>Yeah, Streamlit has become incredibly popular because it makes it

487
00:24:24.680 --> 00:24:28.000
<v Speaker 2>super easy to turn data scripts into sharable web apps

488
00:24:28.000 --> 00:24:32.160
<v Speaker 2>with sliders, buttons, charts very quickly, great for prototyping and

489
00:24:32.200 --> 00:24:32.839
<v Speaker 2>sharing results.

490
00:24:32.880 --> 00:24:34.839
<v Speaker 1>Okay, let's see AI in action. How is all this

491
00:24:34.960 --> 00:24:36.240
<v Speaker 1>impacting different industries?

492
00:24:36.319 --> 00:24:38.200
<v Speaker 2>The impact is becoming really widespread.

493
00:24:38.240 --> 00:24:40.000
<v Speaker 1>Take telecommunications, what are they doing?

494
00:24:40.160 --> 00:24:43.599
<v Speaker 2>A big focus is predictive analytics for customer churn, trying

495
00:24:43.640 --> 00:24:46.640
<v Speaker 2>to figure out who might leave based on factors like price,

496
00:24:46.839 --> 00:24:51.039
<v Speaker 2>customer service interactions, network coverage issues. They also use real

497
00:24:51.079 --> 00:24:53.839
<v Speaker 2>time dashboards and sentiment analysis.

498
00:24:53.319 --> 00:24:55.240
<v Speaker 1>Using NLP on social media.

499
00:24:54.880 --> 00:24:58.759
<v Speaker 2>Exactly, like analyzing Twitter data using the API to gauge

500
00:24:58.799 --> 00:25:01.599
<v Speaker 2>customer feelings about the brand or service quality. What about

501
00:25:01.599 --> 00:25:06.240
<v Speaker 2>retail? Again, churn and retention are huge: mining customer data,

502
00:25:06.279 --> 00:25:11.000
<v Speaker 2>purchase history, website behavior, demographics to build models predicting who

503
00:25:11.079 --> 00:25:14.079
<v Speaker 2>might leave and why. Also lots of predictive analytics for

504
00:25:14.119 --> 00:25:16.319
<v Speaker 2>online sales trends and customer behavior patterns.

505
00:25:16.359 --> 00:25:19.599
<v Speaker 1>And banking and finance? Fraud detection must be a big one.

506
00:25:19.680 --> 00:25:23.160
<v Speaker 2>Absolutely, that's kind of the flagship AI use case there,

507
00:25:23.359 --> 00:25:27.039
<v Speaker 2>combining sophisticated machine learning with complex rule engines to spot

508
00:25:27.079 --> 00:25:31.759
<v Speaker 2>anomalies, flag suspicious transactions in real time. It's constantly evolving

509
00:25:31.799 --> 00:25:33.000
<v Speaker 2>to catch new fraud patterns.

510
00:25:33.039 --> 00:25:36.079
<v Speaker 1>Supply chain management seems ripe for optimization.

511
00:25:36.519 --> 00:25:41.160
<v Speaker 2>Definitely. Optimization and prescriptive analytics are key: using AI to

512
00:25:41.240 --> 00:25:44.920
<v Speaker 2>better match supply with demand, optimize delivery routes, improve planning

513
00:25:44.960 --> 00:25:48.559
<v Speaker 2>and scheduling, ultimately aiming to reduce costs and increase efficiency.

514
00:25:48.599 --> 00:25:51.480
<v Speaker 2>Healthcare and pharma, we're seeing a lot more chatbots and

515
00:25:51.839 --> 00:25:57.119
<v Speaker 2>intelligent virtual assistants, IVAs, being used for patient support, appointment scheduling,

516
00:25:57.440 --> 00:26:01.160
<v Speaker 2>even initial mental health screening or support, leveraging NLP and

517
00:26:01.279 --> 00:26:03.960
<v Speaker 2>NLG, natural language generation, for human-like interaction.

518
00:26:03.880 --> 00:26:07.039
<v Speaker 1>And human resources? Can AI help there?

519
00:26:07.119 --> 00:26:10.400
<v Speaker 2>Yeah. HR analytics is a growing field using AI for

520
00:26:10.440 --> 00:26:15.680
<v Speaker 2>better recruitment, screening resumes, identifying promising candidates, talent management, improving

521
00:26:15.720 --> 00:26:19.160
<v Speaker 2>employee experience. And employee attrition modeling is a big one,

522
00:26:19.319 --> 00:26:21.359
<v Speaker 2>figuring out why people leave and how to keep your

523
00:26:21.359 --> 00:26:21.839
<v Speaker 2>best talent.

524
00:26:21.920 --> 00:26:23.519
<v Speaker 1>It seems like it's touching almost everywhere.

525
00:26:23.559 --> 00:26:27.359
<v Speaker 2>It really is. Manufacturing with Industry 4.0, predictive

526
00:26:27.400 --> 00:26:31.480
<v Speaker 2>maintenance on machines, cybersecurity using ML and DL to detect

527
00:26:31.519 --> 00:26:36.359
<v Speaker 2>sophisticated attacks, insurance, especially telematics data from cars for risk

528
00:26:36.400 --> 00:26:40.279
<v Speaker 2>assessment and personalized premiums. Even the legal sector for automating

529
00:26:40.319 --> 00:26:41.640
<v Speaker 2>research and contract review.

530
00:26:41.960 --> 00:26:45.599
<v Speaker 1>Wow. And a key trend speeding this all up is

531
00:26:45.720 --> 00:26:49.480
<v Speaker 1>using pre trained models, right, not building from scratch every time.

532
00:26:49.720 --> 00:26:51.920
<v Speaker 2>That's a really important point. It's becoming much more common.

533
00:26:52.079 --> 00:26:55.319
<v Speaker 2>Instead of spending months training your own massive model, you

534
00:26:55.319 --> 00:26:58.920
<v Speaker 2>can leverage powerful pre trained models offered by cloud providers

535
00:26:59.000 --> 00:27:00.720
<v Speaker 2>or others through simple APIs.

536
00:27:00.480 --> 00:27:02.480
<v Speaker 1>Like for analyzing text?

537
00:27:02.599 --> 00:27:06.640
<v Speaker 2>Yeah, things like the Azure Text Analytics API or Google's

538
00:27:06.640 --> 00:27:10.400
<v Speaker 2>Teachable Machine, which lets you train models easily. Walsh's guide

539
00:27:10.440 --> 00:27:14.200
<v Speaker 2>even mentions using models like DALL-E for generating creative images.

540
00:27:14.279 --> 00:27:16.240
<v Speaker 2>It massively accelerates development.

541
00:27:16.559 --> 00:27:19.319
<v Speaker 1>So quite a journey we've taken today, a real deep

542
00:27:19.400 --> 00:27:23.880
<v Speaker 1>dive through this well sometimes complex world of productionizing AI.

543
00:27:23.960 --> 00:27:25.519
<v Speaker 2>We covered a lot of ground, yeah.

544
00:27:25.359 --> 00:27:29.119
<v Speaker 1>From the basic ecosystem, the data strategies, the crucial roles of

545
00:27:29.200 --> 00:27:32.519
<v Speaker 1>DataOps and MLOps, then into the mechanics of

546
00:27:32.640 --> 00:27:37.160
<v Speaker 1>machine learning, deep learning, the automation with AutoML and NoLo.

547
00:27:37.240 --> 00:27:39.319
<v Speaker 2>And finally seeing how it all comes together in real

548
00:27:39.319 --> 00:27:41.680
<v Speaker 2>world applications across so many industries.

549
00:27:41.759 --> 00:27:43.960
<v Speaker 1>And I think what's really clear, what stands out is

550
00:27:44.000 --> 00:27:47.319
<v Speaker 1>that getting AI solutions delivered successfully, it's not just about

551
00:27:47.319 --> 00:27:49.279
<v Speaker 1>having a clever model.

552
00:27:49.480 --> 00:27:53.680
<v Speaker 2>Not at all. It's the entire end-to-end process: robust data pipelines,

553
00:27:53.880 --> 00:27:59.400
<v Speaker 2>smart storage, careful orchestration, testing, continuous monitoring once it's live.

554
00:28:00.039 --> 00:28:00.960
<v Speaker 2>It's the whole life cycle.

555
00:28:01.039 --> 00:28:02.359
<v Speaker 1>It really is a journey, isn't it.

556
00:28:02.440 --> 00:28:04.920
<v Speaker 2>Definitely, And as we've touched on, that journey is full

557
00:28:04.920 --> 00:28:07.880
<v Speaker 2>of amazing technical opportunities, but also, let's be honest, some

558
00:28:08.000 --> 00:28:09.079
<v Speaker 2>very practical challenges.

559
00:28:09.240 --> 00:28:12.160
<v Speaker 1>Yeah. So maybe the deep dive for you the listener

560
00:28:12.200 --> 00:28:15.519
<v Speaker 1>now is to think beyond just what AI can do.

561
00:28:16.160 --> 00:28:18.160
<v Speaker 1>Think about how it's actually brought to life.

562
00:28:18.559 --> 00:28:22.640
<v Speaker 2>Consider those hidden costs, maybe. Cloud services for development aren't free.

563
00:28:23.279 --> 00:28:26.960
<v Speaker 2>Managing data that constantly changes, that's complex.

564
00:28:26.519 --> 00:28:29.039
<v Speaker 1>And needing the right mix of skills in your workforce

565
00:28:29.079 --> 00:28:31.240
<v Speaker 1>to actually pull it all off.

566
00:28:31.640 --> 00:28:35.759
<v Speaker 2>Exactly. The real aha moment might come from understanding those underlying infrastructures,

567
00:28:36.039 --> 00:28:39.680
<v Speaker 2>the economic realities, the operational hurdles that make AI really happen,

568
00:28:40.119 --> 00:28:42.880
<v Speaker 2>or sometimes surprisingly make it harder than you'd think.
