WEBVTT

1
00:00:00.120 --> 00:00:02.319
<v Speaker 1>Welcome to your deep dive. Today we're going to crack

2
00:00:02.399 --> 00:00:06.160
<v Speaker 1>open this Intro to Python for Computer Science and Data

3
00:00:06.160 --> 00:00:10.000
<v Speaker 1>Science textbook. Oh yeah, and really kind of explore this

4
00:00:10.039 --> 00:00:11.320
<v Speaker 1>whole world of data science.

5
00:00:11.480 --> 00:00:12.400
<v Speaker 2>I love this book.

6
00:00:12.480 --> 00:00:15.000
<v Speaker 1>Think of it as like a guided tour through some

7
00:00:15.039 --> 00:00:17.600
<v Speaker 1>of the most important and engaging parts of this field.

8
00:00:17.719 --> 00:00:19.440
<v Speaker 2>It really is a fantastic foundation.

9
00:00:19.559 --> 00:00:19.760
<v Speaker 1>Yeah.

10
00:00:19.800 --> 00:00:22.480
<v Speaker 2>It covers both the core concepts and like real world

11
00:00:22.559 --> 00:00:25.600
<v Speaker 2>applications of data science. Right, we'll go from the basics

12
00:00:25.640 --> 00:00:28.239
<v Speaker 2>all the way to really advanced stuff like deep learning.

13
00:00:28.760 --> 00:00:31.440
<v Speaker 1>Okay, so let's start with kind of unpacking this whole

14
00:00:31.519 --> 00:00:32.520
<v Speaker 1>data science boom.

15
00:00:32.600 --> 00:00:32.920
<v Speaker 2>Okay.

16
00:00:33.320 --> 00:00:36.280
<v Speaker 1>The book starts by talking about this huge surge in interest.

17
00:00:36.920 --> 00:00:39.439
<v Speaker 1>So why is data science such a big deal right now?

18
00:00:39.679 --> 00:00:44.280
<v Speaker 2>Well, it's fascinating because it's really like multiple technological forces

19
00:00:44.560 --> 00:00:46.399
<v Speaker 2>all aligning perfectly at the same time.

20
00:00:46.679 --> 00:00:47.079
<v Speaker 1>Okay.

21
00:00:47.159 --> 00:00:52.000
<v Speaker 2>So you have cheaper and faster technology that leads to

22
00:00:52.039 --> 00:00:55.640
<v Speaker 2>the rapid growth of big data. Right. We're collecting more

23
00:00:55.679 --> 00:00:58.399
<v Speaker 2>information than ever before. And then you have cheaper and

24
00:00:58.479 --> 00:01:01.679
<v Speaker 2>faster internet bandwidth that makes it possible to move that

25
00:01:01.799 --> 00:01:05.439
<v Speaker 2>data around quickly and easily. And then you have tons

26
00:01:05.599 --> 00:01:08.359
<v Speaker 2>of almost free software that gives you the tools to

27
00:01:08.439 --> 00:01:09.200
<v Speaker 2>analyze it all.

28
00:01:09.319 --> 00:01:09.519
<v Speaker 1>Yeah.

29
00:01:09.640 --> 00:01:12.000
<v Speaker 2>It really is the perfect storm for a data revolution.

30
00:01:12.319 --> 00:01:15.840
<v Speaker 1>So it's not just about our ability to like collect

31
00:01:15.920 --> 00:01:18.280
<v Speaker 1>this data but actually make sense

32
00:01:18.079 --> 00:01:19.040
<v Speaker 2>of it all. Exactly.

33
00:01:19.159 --> 00:01:22.519
<v Speaker 1>Like we suddenly can read an entire library, yes, instead

34
00:01:22.519 --> 00:01:24.400
<v Speaker 1>of just staring at the bookshelves. Exactly.

35
00:01:25.040 --> 00:01:30.159
<v Speaker 2>The book highlights this fundamental shift in programming. Okay, you know,

36
00:01:30.280 --> 00:01:33.359
<v Speaker 2>the cutting edge innovations aren't just about building software anymore.

37
00:01:33.799 --> 00:01:39.159
<v Speaker 2>They're about extracting valuable insights from these massive data sets.

38
00:01:39.319 --> 00:01:41.640
<v Speaker 1>So give me some examples. Like, what are some of

39
00:01:41.680 --> 00:01:44.280
<v Speaker 1>the things that data science is powering today?

40
00:01:44.959 --> 00:01:51.079
<v Speaker 2>Oh, think about it. Mobile navigation apps okay, personalized recommendations

41
00:01:51.120 --> 00:01:54.920
<v Speaker 2>on your streaming services, right, even self driving cars, they're

42
00:01:55.000 --> 00:01:58.560
<v Speaker 2>all powered by data science. Wow. This field is really

43
00:01:58.599 --> 00:01:59.599
<v Speaker 2>shaping the future.

44
00:01:59.680 --> 00:02:02.920
<v Speaker 1>And that's where it gets really interesting. The book points

45
00:02:02.959 --> 00:02:05.959
<v Speaker 1>to Python as kind of this go-to language for

46
00:02:06.040 --> 00:02:09.080
<v Speaker 1>data science. So of all the languages out there, why Python?

47
00:02:09.439 --> 00:02:13.439
<v Speaker 2>Python is famous for being beginner friendly. Its syntax is

48
00:02:13.520 --> 00:02:16.800
<v Speaker 2>so clear you can almost read it like plain English,

49
00:02:17.240 --> 00:02:20.479
<v Speaker 2>which makes it way less intimidating for like first time coders.

50
00:02:21.000 --> 00:02:24.520
<v Speaker 2>But don't let that fool you. Python is powerful enough

51
00:02:24.759 --> 00:02:29.000
<v Speaker 2>to handle complex data science projects used by seasoned professionals.

52
00:02:29.080 --> 00:02:31.960
<v Speaker 1>So it's kind of this like multi-tool. Exactly. You know,

53
00:02:32.039 --> 00:02:36.240
<v Speaker 1>it's perfect for both like simple repairs and intricate projects. Exactly.

54
00:02:36.280 --> 00:02:38.879
<v Speaker 1>And I remember from like even my intro to programming class,

55
00:02:38.960 --> 00:02:42.360
<v Speaker 1>you know how satisfying it was to test out just

56
00:02:42.639 --> 00:02:47.080
<v Speaker 1>little snippets of Python code and like instantly see the results. Yes,

57
00:02:47.199 --> 00:02:51.080
<v Speaker 1>you know, it's really helpful for learning, especially for something

58
00:02:51.080 --> 00:02:52.599
<v Speaker 1>as hands on as data science.

59
00:02:52.759 --> 00:02:57.080
<v Speaker 2>Absolutely, that interactivity lets you experiment and quickly see how

60
00:02:57.120 --> 00:03:00.159
<v Speaker 2>different parts of your code work together. It's essential for

61
00:03:00.199 --> 00:03:04.039
<v Speaker 2>both learning and the process of building data science solutions.

62
00:03:03.960 --> 00:03:05.639
<v Speaker 1>So you can test-drive the code as you build it.

63
00:03:05.719 --> 00:03:08.599
<v Speaker 1>Exactly like that. Yeah. So we've covered the why of

64
00:03:08.719 --> 00:03:12.319
<v Speaker 1>data science, the why of Python. Let's get into some

65
00:03:12.360 --> 00:03:14.960
<v Speaker 1>of the nuts and bolts here. Okay. Chapter three of

66
00:03:14.960 --> 00:03:18.599
<v Speaker 1>the book kind of dives into Python fundamentals, starting with

67
00:03:19.120 --> 00:03:22.879
<v Speaker 1>variables and assignment statements. Yeah. These seem really basic, but

68
00:03:22.960 --> 00:03:24.680
<v Speaker 1>the book emphasizes them a lot.

69
00:03:25.159 --> 00:03:26.639
<v Speaker 2>They are absolutely essential.

70
00:03:27.080 --> 00:03:28.240
<v Speaker 1>Are they really that crucial?

71
00:03:28.360 --> 00:03:32.400
<v Speaker 2>They are. Variables are the containers that hold information. Okay,

72
00:03:32.439 --> 00:03:35.360
<v Speaker 2>think of them like labeled boxes in a vast warehouse

73
00:03:35.360 --> 00:03:38.400
<v Speaker 2>of data. And assignment statements are how we put the

74
00:03:38.479 --> 00:03:42.000
<v Speaker 2>data into those containers. Right. Without those basic concepts, you

75
00:03:42.039 --> 00:03:43.439
<v Speaker 2>can't do anything in programming.

76
00:03:43.520 --> 00:03:45.960
<v Speaker 1>So it's like organizing all the ingredients before you start cooking.

77
00:03:46.120 --> 00:03:47.319
<v Speaker 2>Yes, precisely.

78
00:03:47.520 --> 00:03:50.479
<v Speaker 1>So, if I wanted to store a Twitter user's age,

79
00:03:50.520 --> 00:03:53.240
<v Speaker 1>for example, I would use a variable to hold that

80
00:03:53.280 --> 00:03:54.520
<v Speaker 1>piece of information.

81
00:03:54.280 --> 00:03:57.879
<v Speaker 2>Exactly, and then you can use that variable to perform calculations, okay,

82
00:03:58.199 --> 00:04:01.400
<v Speaker 2>make comparisons, or even and use it as input for

83
00:04:01.520 --> 00:04:04.639
<v Speaker 2>other parts of your program. Variables are really the foundation

84
00:04:04.719 --> 00:04:06.520
<v Speaker 2>of everything we do in data science.
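
NOTE
Editor's sketch, not from the episode or the book: a minimal Python illustration of the
variables discussion above. The name user_age and the value 27 are made up for illustration.
```python
# Assignment statement: bind the name user_age to a made-up value.
user_age = 27
# Use the variable in a calculation.
age_next_year = user_age + 1
# Use it in a comparison; the result is a bool.
is_adult = user_age >= 18
# Use it as input to another part of the program.
print(f"Age next year: {age_next_year}; adult: {is_adult}")
```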

85
00:04:06.960 --> 00:04:10.400
<v Speaker 1>Okay, So I'm starting to see how those seemingly simple

86
00:04:10.479 --> 00:04:14.599
<v Speaker 1>concepts build up to more complex actions. What about functions,

87
00:04:14.639 --> 00:04:15.759
<v Speaker 1>What role do those play?

88
00:04:15.960 --> 00:04:21.560
<v Speaker 2>Functions are like prepackaged units of code that perform specific tasks. Okay,

89
00:04:21.920 --> 00:04:24.800
<v Speaker 2>they make your code more organized and efficient. It's kind

90
00:04:24.800 --> 00:04:28.839
<v Speaker 2>of like having a well stocked toolbox where each tool

91
00:04:28.920 --> 00:04:30.560
<v Speaker 2>has a very specific purpose.

92
00:04:30.920 --> 00:04:31.040
<v Speaker 1>Right.

93
00:04:31.360 --> 00:04:33.199
<v Speaker 2>You don't need to reinvent the wheel every time you

94
00:04:33.240 --> 00:04:35.759
<v Speaker 2>want to perform a common action, You just call the

95
00:04:35.800 --> 00:04:36.680
<v Speaker 2>appropriate function.

96
00:04:36.920 --> 00:04:37.160
<v Speaker 1>Okay.

97
00:04:37.360 --> 00:04:41.439
<v Speaker 2>And the book even dives into random number generation okay,

98
00:04:41.600 --> 00:04:44.560
<v Speaker 2>which is crucial for simulations and game development.
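
NOTE
Editor's sketch, not from the episode: one way a small function and random number
generation might fit together. The function name roll_die is an assumption for illustration;
random.randrange is from Python's standard library.
```python
import random
def roll_die():
    """Return a random integer from 1 to 6, like one roll of a six-sided die."""
    return random.randrange(1, 7)
# Call the function whenever a roll is needed instead of rewriting the logic.
rolls = [roll_die() for _ in range(5)]
print(rolls)
```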

99
00:04:44.680 --> 00:04:47.680
<v Speaker 1>Oh yeah, like that whole game of craps exercise. Yes,

100
00:04:47.959 --> 00:04:51.279
<v Speaker 1>I vaguely remember that. Yeah, I might have to revisit that chapter. Okay,

101
00:04:51.439 --> 00:04:54.639
<v Speaker 1>but hold on, Data isn't always just single pieces of information,

102
00:04:54.759 --> 00:04:57.439
<v Speaker 1>right? Right. I mean, the book then goes into lists

103
00:04:57.480 --> 00:05:00.600
<v Speaker 1>and dictionaries. Yeah. What role do those play in data science?

104
00:05:00.920 --> 00:05:04.839
<v Speaker 2>So you're right, real world data often comes in collections, right,

105
00:05:05.160 --> 00:05:08.839
<v Speaker 2>and Python has powerful built in tools to handle that.

106
00:05:09.040 --> 00:05:09.360
<v Speaker 1>Okay.

107
00:05:09.920 --> 00:05:12.759
<v Speaker 2>Lists are like ordered to-do lists for data.

108
00:05:12.839 --> 00:05:13.079
<v Speaker 1>Okay.

109
00:05:13.240 --> 00:05:18.000
<v Speaker 2>They store information sequentially, and the book goes into detail

110
00:05:18.079 --> 00:05:23.000
<v Speaker 2>about how to slice and dice lists to extract specific information.

111
00:05:23.240 --> 00:05:25.839
<v Speaker 1>So it's like pulling out individual ingredients from

112
00:05:25.720 --> 00:05:26.959
<v Speaker 2>a recipe. Precisely.

113
00:05:27.120 --> 00:05:29.639
<v Speaker 1>So if I had a list of all Twitter users, yeah,

114
00:05:29.680 --> 00:05:31.560
<v Speaker 1>I could use slicing to just pull out the user

115
00:05:31.639 --> 00:05:34.519
<v Speaker 1>names that start with A. Exactly. That seems really useful

116
00:05:34.560 --> 00:05:36.439
<v Speaker 1>for like targeting specific data.

117
00:05:36.519 --> 00:05:39.720
<v Speaker 2>It helps you pinpoint the exact information you need within

118
00:05:39.759 --> 00:05:40.920
<v Speaker 2>a larger data set.
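
NOTE
Editor's sketch, not from the episode: in Python, slicing selects list items by position,
while picking names by their first letter is usually done with a filter such as a list
comprehension. The usernames below are made-up sample data.
```python
usernames = ["alice", "amir", "bob", "carla", "anna"]  # made-up sample data
# Slicing selects by position: the first two items.
first_two = usernames[:2]
# Selecting by content (names starting with "a") takes a filter, not a slice.
a_names = [name for name in usernames if name.startswith("a")]
print(first_two)
print(a_names)
```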

119
00:05:41.000 --> 00:05:41.319
<v Speaker 1>Okay.

120
00:05:41.399 --> 00:05:42.720
<v Speaker 2>Now, dictionaries are a little

121
00:05:42.480 --> 00:05:43.120
<v Speaker 1>different. Okay.

122
00:05:43.279 --> 00:05:48.240
<v Speaker 2>They're more like those super organized filing cabinets with labeled folders,

123
00:05:48.959 --> 00:05:52.279
<v Speaker 2>So each piece of data or value has a unique

124
00:05:52.360 --> 00:05:55.000
<v Speaker 2>key associated with it. Okay, and this setup lets you

125
00:05:55.079 --> 00:05:58.279
<v Speaker 2>quickly find the information you need just by using its key.

126
00:05:58.480 --> 00:06:01.399
<v Speaker 1>So it's like if you knew someone's customer ID on Amazon

127
00:06:01.720 --> 00:06:05.000
<v Speaker 1>and you could immediately pull up their purchase history. Exactly.

128
00:06:05.040 --> 00:06:08.800
<v Speaker 1>So much more efficient than searching through some giant list. Precisely.

129
00:06:09.000 --> 00:06:14.240
<v Speaker 2>Dictionaries are fantastic for quickly retrieving specific pieces of data,

130
00:06:14.439 --> 00:06:18.399
<v Speaker 2>and they're essential for building efficient data driven applications.
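
NOTE
Editor's sketch, not from the episode: a dictionary lookup by key, loosely following the
purchase-history analogy above. The customer IDs and items are hypothetical.
```python
# Keys are hypothetical customer IDs; values are lists of purchased items.
purchase_history = {
    "C1001": ["headphones", "usb cable"],
    "C1002": ["novel"],
}
# Look up one customer's purchases directly by key.
print(purchase_history["C1001"])
# get() returns a default instead of raising KeyError for an unknown key.
print(purchase_history.get("C9999", []))
```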

131
00:06:19.279 --> 00:06:22.160
<v Speaker 1>Now that we've got our data organized in these lists

132
00:06:22.160 --> 00:06:26.160
<v Speaker 1>and dictionaries, what's next in our data science journey?

133
00:06:26.240 --> 00:06:28.279
<v Speaker 2>Well, now we've got to figure out how to make

134
00:06:28.319 --> 00:06:31.480
<v Speaker 2>sense of it all. And the book introduces some data

135
00:06:31.519 --> 00:06:35.160
<v Speaker 2>science essentials, starting with descriptive

136
00:06:34.680 --> 00:06:38.120
<v Speaker 1>statistics. Data science essentials. Yes. Okay, we're getting into the

137
00:06:38.120 --> 00:06:39.959
<v Speaker 1>heart of it now, that's right, I'm ready.

138
00:06:40.079 --> 00:06:44.199
<v Speaker 2>Descriptive statistics are the foundation of data analysis, okay. They

139
00:06:44.240 --> 00:06:47.040
<v Speaker 2>give us a way to summarize and understand the basic

140
00:06:47.120 --> 00:06:50.160
<v Speaker 2>features of our data, and the book covers key concepts

141
00:06:50.199 --> 00:06:54.040
<v Speaker 2>like minimum, maximum, range, count, and sum, all essential for

142
00:06:54.079 --> 00:06:55.319
<v Speaker 2>getting a handle on your data.

143
00:06:55.600 --> 00:06:57.959
<v Speaker 1>So if I wanted to find out the average age

144
00:06:58.399 --> 00:07:02.360
<v Speaker 1>of Twitter users, I would use descriptive statistics to calculate

145
00:07:02.399 --> 00:07:03.439
<v Speaker 1>that. Exactly.
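
NOTE
Editor's sketch, not from the episode: the descriptive statistics named above, computed with
Python built-ins plus statistics.mean from the standard library. The ages are made-up data.
```python
import statistics
ages = [19, 24, 31, 24, 45]  # made-up sample ages
print("minimum:", min(ages))
print("maximum:", max(ages))
print("range:", max(ages) - min(ages))
print("count:", len(ages))
print("sum:", sum(ages))
print("mean:", statistics.mean(ages))
```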

146
00:07:03.920 --> 00:07:08.199
<v Speaker 2>Descriptive statistics give you a quick overview of your data's characteristics. Okay,

147
00:07:08.240 --> 00:07:11.560
<v Speaker 2>But the book doesn't stop there, okay. It also emphasizes

148
00:07:11.639 --> 00:07:16.319
<v Speaker 2>the power of data visualization, using charts and graphs to

149
00:07:16.360 --> 00:07:17.480
<v Speaker 2>bring your data to life.

150
00:07:17.560 --> 00:07:19.839
<v Speaker 1>Now that's something I can get behind. Yes, a picture's

151
00:07:19.879 --> 00:07:22.879
<v Speaker 1>worth a thousand data points, right? Exactly. What kind of

152
00:07:22.959 --> 00:07:25.519
<v Speaker 1>visualization tools does Python have for us?

153
00:07:25.920 --> 00:07:30.000
<v Speaker 2>Python boasts these incredibly powerful libraries like Matplotlib and

154
00:07:30.079 --> 00:07:33.279
<v Speaker 2>Seaborn, okay, and they let you create charts and graphs

155
00:07:33.319 --> 00:07:36.759
<v Speaker 2>that transform raw data into insightful visuals.

156
00:07:37.000 --> 00:07:37.360
<v Speaker 1>Okay.

157
00:07:37.639 --> 00:07:42.240
<v Speaker 2>Imagine trying to understand public sentiment about a new product, Okay,

158
00:07:42.279 --> 00:07:45.800
<v Speaker 2>Instead of sifting through thousands of tweets, you could use

159
00:07:45.800 --> 00:07:49.040
<v Speaker 2>Matplotlib to create a visual map of sentiment and

160
00:07:49.079 --> 00:07:50.800
<v Speaker 2>see where the love is, where the hate is, all

161
00:07:50.839 --> 00:07:51.439
<v Speaker 2>at a glance.

162
00:07:51.800 --> 00:07:53.839
<v Speaker 1>Wow, So a heat map of emotions.

163
00:07:53.879 --> 00:07:54.360
<v Speaker 2>Exactly.

164
00:07:54.519 --> 00:07:55.240
<v Speaker 1>That's amazing.

165
00:07:55.439 --> 00:07:58.279
<v Speaker 2>It's about making data accessible and meaningful.

166
00:07:58.639 --> 00:08:00.439
<v Speaker 1>But it sounds like we're moving beyond just the

167
00:08:00.480 --> 00:08:03.000
<v Speaker 1>basics here. Where does the book take us from here?

168
00:08:03.279 --> 00:08:06.720
<v Speaker 2>Well, the textbook doesn't shy away from more complex topics.

169
00:08:07.199 --> 00:08:11.040
<v Speaker 2>It takes you beyond the fundamentals and introduces advanced concepts

170
00:08:11.120 --> 00:08:14.399
<v Speaker 2>like object oriented programming and recursion.

171
00:08:14.839 --> 00:08:17.800
<v Speaker 1>Object-oriented programming. Yes, that sounds a little intimidating.

172
00:08:17.920 --> 00:08:20.600
<v Speaker 2>It might sound complex, yeah, but it's actually a way

173
00:08:20.639 --> 00:08:24.279
<v Speaker 2>to organize your code more efficiently, especially as your programs

174
00:08:24.319 --> 00:08:27.439
<v Speaker 2>grow larger and more intricate. Okay, the book uses this

175
00:08:27.600 --> 00:08:30.959
<v Speaker 2>brilliant analogy: building with Lego blocks.

176
00:08:31.519 --> 00:08:32.320
<v Speaker 1>Okay, like that.

177
00:08:32.480 --> 00:08:36.879
<v Speaker 2>In object oriented programming, each block represents a self contained

178
00:08:37.000 --> 00:08:40.279
<v Speaker 2>unit of code, complete with its own properties and actions.

179
00:08:40.759 --> 00:08:43.960
<v Speaker 1>So instead of writing this huge, messy program, yes, I

180
00:08:44.000 --> 00:08:49.320
<v Speaker 1>can break it down into these smaller, reusable Lego blocks. Precisely. Okay,

181
00:08:49.399 --> 00:08:50.000
<v Speaker 1>that makes sense.

182
00:08:50.080 --> 00:08:53.240
<v Speaker 2>It makes your code much easier to understand, modify, and reuse.

183
00:08:53.720 --> 00:08:58.039
<v Speaker 2>For example, think about a Twitter user. Okay. In object-oriented programming,

184
00:08:58.080 --> 00:09:02.000
<v Speaker 2>we could represent that user as an object with properties

185
00:09:02.039 --> 00:09:05.600
<v Speaker 2>like user name, age, number of followers, and so on.

186
00:09:05.879 --> 00:09:10.279
<v Speaker 2>We could even define actions or methods that this user

187
00:09:10.360 --> 00:09:16.159
<v Speaker 2>object can perform, like post tweet, follow user or like tweet.
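
NOTE
Editor's sketch, not from the episode or the book: a hypothetical TwitterUser class showing
properties (attributes) and actions (methods) bundled in one object. All names here are
assumptions chosen to mirror the spoken example; this does not use any real Twitter API.
```python
class TwitterUser:
    """A hypothetical user object bundling data (attributes) with actions (methods)."""
    def __init__(self, username, age, followers=0):
        self.username = username
        self.age = age
        self.followers = followers
        self.following = []
        self.tweets = []
    def post_tweet(self, text):
        """Record a new tweet for this user."""
        self.tweets.append(text)
    def follow_user(self, other):
        """Follow another user and bump that user's follower count."""
        self.following.append(other.username)
        other.followers += 1
alice = TwitterUser("alice", 27)
bob = TwitterUser("bob", 31)
alice.post_tweet("Learning Python!")
alice.follow_user(bob)
print(bob.followers, alice.tweets)
```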

188
00:09:16.320 --> 00:09:19.679
<v Speaker 1>So each user is like this self contained unit with

189
00:09:19.759 --> 00:09:21.679
<v Speaker 1>its own data and actions.

190
00:09:21.720 --> 00:09:22.279
<v Speaker 2>Precisely.

191
00:09:22.440 --> 00:09:23.120
<v Speaker 1>Okay, I like that.

192
00:09:23.159 --> 00:09:25.720
<v Speaker 2>It's a mini program within a larger program.

193
00:09:26.120 --> 00:09:28.559
<v Speaker 1>Yeah. Okay, what about recursion?

194
00:09:29.120 --> 00:09:31.000
<v Speaker 2>Ah, yes, recursion.

195
00:09:31.000 --> 00:09:33.679
<v Speaker 1>That was a mind-bending concept. It is. From chapter eleven.

196
00:09:33.720 --> 00:09:35.960
<v Speaker 2>It is where a function calls itself kind of like

197
00:09:35.960 --> 00:09:38.720
<v Speaker 2>a snake eating its own tail. It can be tricky

198
00:09:38.720 --> 00:09:43.080
<v Speaker 2>to grasp at first, but the book provides clear explanations

199
00:09:43.480 --> 00:09:47.240
<v Speaker 2>and some elegant examples, like using recursion to calculate the

200
00:09:47.240 --> 00:09:48.360
<v Speaker 2>Fibonacci sequence.

201
00:09:48.440 --> 00:09:51.679
<v Speaker 1>The Fibonacci sequence, I vaguely remember that from math class,

202
00:09:52.120 --> 00:09:55.320
<v Speaker 1>something about spirals and seashells. Exactly. It's coming back to me.

203
00:09:55.399 --> 00:09:58.559
<v Speaker 2>Now, each number in the Fibonacci sequence is the sum

204
00:09:58.639 --> 00:10:02.960
<v Speaker 2>of the two numbers before it. Okay, so zero, one, one, two, three, five, eight,

205
00:10:03.039 --> 00:10:03.480
<v Speaker 2>and so on.

206
00:10:03.679 --> 00:10:03.879
<v Speaker 1>Right.

207
00:10:04.240 --> 00:10:08.480
<v Speaker 2>Recursion provides a really elegant way to calculate these numbers, Okay,

208
00:10:08.519 --> 00:10:10.759
<v Speaker 2>and it has applications beyond just math.
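
NOTE
Editor's sketch, not from the episode: a plain recursive Fibonacci function matching the
sequence read out above (0, 1, 1, 2, 3, 5, 8). It is written for clarity, not speed; the
naive recursion repeats work and gets slow for large n.
```python
def fibonacci(n):
    """Return the nth Fibonacci number, with fibonacci(0) == 0 and fibonacci(1) == 1."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n in (0, 1):  # base cases stop the recursion
        return n
    # Recursive step: the sum of the two preceding numbers.
    return fibonacci(n - 1) + fibonacci(n - 2)
print([fibonacci(i) for i in range(7)])
```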

209
00:10:10.960 --> 00:10:12.759
<v Speaker 1>All right, so we've covered a lot of ground here.

210
00:10:12.840 --> 00:10:16.840
<v Speaker 1>We've gone from the basic building blocks of Python to more

211
00:10:16.840 --> 00:10:21.679
<v Speaker 1>complex concepts like object oriented programming and recursion. What comes

212
00:10:21.799 --> 00:10:23.600
<v Speaker 1>next in our data science adventure?

213
00:10:23.679 --> 00:10:26.480
<v Speaker 2>Well, now it's time to put all these concepts together

214
00:10:27.000 --> 00:10:29.840
<v Speaker 2>and see them in action. The book dives into some

215
00:10:30.120 --> 00:10:35.039
<v Speaker 2>fascinating real world applications of data science, right, showing us

216
00:10:35.080 --> 00:10:37.360
<v Speaker 2>how these tools are used to solve real problems.

217
00:10:37.519 --> 00:10:39.440
<v Speaker 1>Give me the good stuff. I'm ready to see data

218
00:10:39.480 --> 00:10:42.240
<v Speaker 1>science in action. Okay, what kind of real world applications

219
00:10:42.240 --> 00:10:43.159
<v Speaker 1>are we talking about here?

220
00:10:43.200 --> 00:10:46.159
<v Speaker 2>One of the most captivating chapters is about data mining Twitter.

221
00:10:46.720 --> 00:10:48.639
<v Speaker 1>Ooh, that sounds fun.

222
00:10:48.879 --> 00:10:51.919
<v Speaker 2>You'll learn how to tap into Twitter's API, okay, think

223
00:10:51.960 --> 00:10:55.919
<v Speaker 2>of it like a backdoor to Twitter's data treasure trove. Okay,

224
00:10:56.039 --> 00:11:00.879
<v Speaker 2>to collect and analyze tweets, identify trending topics, and even

225
00:11:01.000 --> 00:11:03.120
<v Speaker 2>map tweet locations geographically.

226
00:11:03.440 --> 00:11:04.919
<v Speaker 1>We can do all that with Python.

227
00:11:05.120 --> 00:11:05.559
<v Speaker 2>We can.

228
00:11:06.039 --> 00:11:09.639
<v Speaker 1>That's amazing. I'm already picturing myself uncovering like hidden trends

229
00:11:09.639 --> 00:11:10.320
<v Speaker 1>and insights.

230
00:11:10.600 --> 00:11:14.279
<v Speaker 2>It's a powerful example of how Python unlocks the potential

231
00:11:14.399 --> 00:11:17.559
<v Speaker 2>of real world data sources. And this is just the

232
00:11:17.600 --> 00:11:20.080
<v Speaker 2>beginning of our journey into the world of data science.

233
00:11:20.080 --> 00:11:21.679
<v Speaker 2>There's so much more to explore.

234
00:11:21.879 --> 00:11:23.919
<v Speaker 1>I can't wait. But for now we'll have to pause

235
00:11:23.919 --> 00:11:26.679
<v Speaker 1>our deep dive here. Okay, don't worry, We'll be back

236
00:11:26.720 --> 00:11:31.320
<v Speaker 1>soon to uncover more fascinating insights and real world applications

237
00:11:31.480 --> 00:11:32.080
<v Speaker 1>in part two.

238
00:11:32.440 --> 00:11:35.120
<v Speaker 2>Sounds good. You know, one thing that really stood out

239
00:11:35.120 --> 00:11:38.399
<v Speaker 2>to me about this textbook is how it goes beyond

240
00:11:38.480 --> 00:11:40.480
<v Speaker 2>just teaching the syntax of Python.

241
00:11:40.639 --> 00:11:40.960
<v Speaker 1>Okay.

242
00:11:41.000 --> 00:11:44.080
<v Speaker 2>It really gets into like the computer science thinking right

243
00:11:44.159 --> 00:11:47.039
<v Speaker 2>that underpins data science. Okay, It's not just about learning

244
00:11:47.080 --> 00:11:49.639
<v Speaker 2>the code. It's about learning how to think like a

245
00:11:49.679 --> 00:11:50.360
<v Speaker 2>problem solver.

246
00:11:50.679 --> 00:11:52.919
<v Speaker 1>I noticed that too. Yeah, it's like the book is

247
00:11:52.960 --> 00:11:56.840
<v Speaker 1>training our minds, yes, to approach challenges in this really

248
00:11:56.879 --> 00:12:00.679
<v Speaker 1>structured and logical way, exactly, which seems essential in a field as

249
00:12:00.720 --> 00:12:04.399
<v Speaker 1>complex as data science. Absolutely, and chapter three is a

250
00:12:04.399 --> 00:12:08.960
<v Speaker 1>great example. Okay. It dives deep into algorithms and pseudocode.

251
00:12:09.240 --> 00:12:13.519
<v Speaker 2>Yes, an algorithm is like a detailed recipe for solving

252
00:12:13.519 --> 00:12:14.039
<v Speaker 2>a problem.

253
00:12:14.200 --> 00:12:14.600
<v Speaker 1>Okay.

254
00:12:14.799 --> 00:12:18.240
<v Speaker 2>It outlines the ingredients, the steps, and the order of

255
00:12:18.320 --> 00:12:22.080
<v Speaker 2>operations in a very clear and logical sequence.

256
00:12:22.159 --> 00:12:25.320
<v Speaker 1>So it's like a blueprint for your code. Precisely. So

257
00:12:25.440 --> 00:12:27.559
<v Speaker 1>if I wanted to write a program to sort a

258
00:12:27.679 --> 00:12:31.120
<v Speaker 1>list of Twitter users by age, I would first need

259
00:12:31.159 --> 00:12:34.399
<v Speaker 1>to create an algorithm, yes, that outlines the exact steps

260
00:12:34.440 --> 00:12:37.159
<v Speaker 1>to achieve that. You got it. It's like planning the

261
00:12:37.240 --> 00:12:39.559
<v Speaker 1>route before starting a road trip. Exactly.

262
00:12:39.840 --> 00:12:42.559
<v Speaker 2>And that's where pseudocode comes in. It's a way to

263
00:12:42.639 --> 00:12:47.080
<v Speaker 2>express the logic of your algorithm using plain English like

264
00:12:47.120 --> 00:12:50.720
<v Speaker 2>statements without getting bogged down in the specific syntax of

265
00:12:50.799 --> 00:12:51.879
<v Speaker 2>a programming language.

266
00:12:51.919 --> 00:12:54.279
<v Speaker 1>So it's like a rough draft of my program, yeah,

267
00:12:54.320 --> 00:12:56.480
<v Speaker 1>where I can focus on the big picture before getting

268
00:12:56.480 --> 00:12:58.600
<v Speaker 1>into the nitty gritty details.

269
00:12:58.159 --> 00:13:01.080
<v Speaker 2>Precisely, and the book does a great job of showing how

270
00:13:01.080 --> 00:13:04.879
<v Speaker 2>to translate pseudocode into actual Python code. Okay, it's like

271
00:13:04.960 --> 00:13:06.639
<v Speaker 2>watching a blueprint come to life.
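
NOTE
Editor's sketch, not from the episode or the book: the sort-users-by-age idea from earlier in
the conversation, first as plain-English pseudocode in comments and then as Python. The data
is made up, and Python's built-in sorted() stands in for a hand-written sorting algorithm.
```python
# Pseudocode (plain-English steps):
#   1. Start with a list of users, each with a name and an age.
#   2. Order the users from youngest to oldest.
#   3. Display the ordered list.
users = [("carla", 31), ("alice", 27), ("bob", 45)]  # step 1, made-up data
users_by_age = sorted(users, key=lambda user: user[1])  # step 2
for name, age in users_by_age:  # step 3
    print(name, age)
```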

272
00:13:06.799 --> 00:13:10.320
<v Speaker 1>Speaking of bringing things to life, I'm eager to revisit

273
00:13:10.440 --> 00:13:13.600
<v Speaker 1>that Game of Craps example. Oh yeah, I have a

274
00:13:13.639 --> 00:13:17.320
<v Speaker 1>feeling it involves some clever algorithms. It does, and pseudocode

275
00:13:17.320 --> 00:13:18.320
<v Speaker 1>to simulate the game.

276
00:13:18.480 --> 00:13:22.600
<v Speaker 2>You bet. Simulating a game like craps requires careful planning

277
00:13:23.279 --> 00:13:26.639
<v Speaker 2>and breaking the problem down into manageable steps. You need

278
00:13:26.720 --> 00:13:30.159
<v Speaker 2>to consider how to represent the dice rolls, how to

279
00:13:30.200 --> 00:13:33.320
<v Speaker 2>track the score, and how to determine the winner. It's

280
00:13:33.320 --> 00:13:36.080
<v Speaker 2>a great example of how algorithms and pseudocode can help

281
00:13:36.120 --> 00:13:38.519
<v Speaker 2>you tackle complex problems in a structured way.
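
NOTE
Editor's sketch, not the book's code: one possible craps simulation, assuming the standard
rules (7 or 11 wins on the first roll; 2, 3, or 12 loses; any other sum becomes the point,
and the player rolls until making the point or rolling a 7). Function names are assumptions.
```python
import random
def roll_dice():
    """Roll two six-sided dice and return their sum."""
    return random.randrange(1, 7) + random.randrange(1, 7)
def play_craps():
    """Play one game under the rules assumed above; return True for a win."""
    first_roll = roll_dice()
    if first_roll in (7, 11):  # win on the first roll
        return True
    if first_roll in (2, 3, 12):  # lose on the first roll ("craps")
        return False
    point = first_roll  # any other sum becomes the point
    while True:
        roll = roll_dice()
        if roll == point:  # made the point: win
            return True
        if roll == 7:  # rolled a 7 before the point: lose
            return False
wins = sum(play_craps() for _ in range(10_000))
print(f"Won {wins} of 10,000 games")
```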

282
00:13:38.600 --> 00:13:41.200
<v Speaker 1>Okay, definitely adding that to my to do list. Revisit

283
00:13:41.279 --> 00:13:44.360
<v Speaker 1>game of Craps example. Okay, but let's shift gears a

284
00:13:44.399 --> 00:13:46.799
<v Speaker 1>little bit to something that feels a bit more practical.

285
00:13:47.360 --> 00:13:51.080
<v Speaker 1>This data mining Twitter, yeah, really piqued my curiosity.

286
00:13:51.240 --> 00:13:51.600
<v Speaker 2>Okay.

287
00:13:52.000 --> 00:13:54.840
<v Speaker 1>It sounds like we can unlock all sorts of insights

288
00:13:55.120 --> 00:13:56.120
<v Speaker 1>from the Twitterverse.

289
00:13:56.320 --> 00:13:59.480
<v Speaker 2>It's a fantastic example of how Python can be used

290
00:13:59.519 --> 00:14:03.919
<v Speaker 2>to access and analyze real-world data. The book guides

291
00:14:03.960 --> 00:14:07.200
<v Speaker 2>you through using the Tweepy library, which acts as a

292
00:14:07.240 --> 00:14:11.320
<v Speaker 2>bridge between Python and Twitter's massive data repository.

293
00:14:11.639 --> 00:14:15.399
<v Speaker 1>So Tweepy is our key to unlocking Twitter's treasure trove

294
00:14:15.480 --> 00:14:18.240
<v Speaker 1>of data. So besides just the text of the tweets,

295
00:14:18.720 --> 00:14:20.840
<v Speaker 1>what kind of information can we actually extract?

296
00:14:21.279 --> 00:14:24.759
<v Speaker 2>Oh, there's a wealth of information hidden within each tweet,

297
00:14:25.200 --> 00:14:28.039
<v Speaker 2>Like what? You can access details about the user who

298
00:14:28.080 --> 00:14:31.879
<v Speaker 2>posted it, the time and date it was posted, its location,

299
00:14:32.080 --> 00:14:34.759
<v Speaker 2>if it was shared, whether it's a retweet or not,

300
00:14:35.639 --> 00:14:38.919
<v Speaker 2>and even analyze the sentiment expressed in the text.

301
00:14:39.039 --> 00:14:42.399
<v Speaker 1>Sentiment analysis. Yes, so that's where we determine if a

302
00:14:42.440 --> 00:14:46.600
<v Speaker 1>tweet is like positive, negative, or neutral. Exactly. Seems like

303
00:14:46.639 --> 00:14:49.200
<v Speaker 1>that could be really valuable for understanding public opinion.

304
00:14:49.360 --> 00:14:52.399
<v Speaker 2>It is, and the book explains how Python can be

305
00:14:52.519 --> 00:14:56.320
<v Speaker 2>used to perform sentiment analysis on tweets, okay, providing a

306
00:14:56.360 --> 00:14:58.919
<v Speaker 2>glimpse into public opinion or customer feedback.

307
00:14:59.200 --> 00:15:02.440
<v Speaker 1>Oh, so like if I'm a company, Yes, I could

308
00:15:02.480 --> 00:15:05.759
<v Speaker 1>gauge how people feel about my new product launch exactly

309
00:15:05.799 --> 00:15:07.120
<v Speaker 1>just by analyzing tweets.

310
00:15:07.200 --> 00:15:10.320
<v Speaker 2>It's a powerful tool for businesses and researchers alike.

311
00:15:10.440 --> 00:15:13.320
<v Speaker 1>So I could potentially see if people are generally happy

312
00:15:13.360 --> 00:15:14.600
<v Speaker 1>or unhappy with my brand?

313
00:15:14.759 --> 00:15:16.919
<v Speaker 2>You could. It seems incredibly useful.

314
00:15:16.679 --> 00:15:18.480
<v Speaker 1>Just one of the many possibilities.

315
00:15:18.639 --> 00:15:23.919
<v Speaker 2>The book also shows how to map tweet locations geographically, okay,

316
00:15:24.159 --> 00:15:28.600
<v Speaker 2>creating visualizations that reveal where certain topics or sentiments are

317
00:15:28.679 --> 00:15:29.480
<v Speaker 2>most prevalent.

318
00:15:29.679 --> 00:15:32.639
<v Speaker 1>Wow, So like a heat map of emotions across the globe.

319
00:15:32.679 --> 00:15:36.480
<v Speaker 2>Exactly. It's a powerful way to understand how ideas and

320
00:15:36.559 --> 00:15:38.320
<v Speaker 2>sentiments spread across social media.

321
00:15:38.759 --> 00:15:41.519
<v Speaker 1>It's like watching the collective consciousness of Twitter.

322
00:15:42.919 --> 00:15:46.679
<v Speaker 2>Yes, in real time, it is. The book also mentions

323
00:15:46.679 --> 00:15:48.519
<v Speaker 2>something called streaming tweets.

324
00:15:49.559 --> 00:15:51.480
<v Speaker 1>Streaming tweets. Have you heard of that? Yeah.

325
00:15:51.519 --> 00:15:53.519
<v Speaker 2>It sounds like we're tapping into the live flow of

326
00:15:53.559 --> 00:15:56.440
<v Speaker 2>tweets as they're happening, yes, instead of just looking at

327
00:15:56.519 --> 00:15:59.559
<v Speaker 2>historical data. So it's like listening in on a conversation

328
00:15:59.600 --> 00:16:00.519
<v Speaker 2>as it's unfolding.

329
00:16:00.600 --> 00:16:04.840
<v Speaker 1>You got it. It's Twitter's streaming API, which lets you

330
00:16:04.879 --> 00:16:06.799
<v Speaker 1>tap into the live stream of tweets.

331
00:16:06.840 --> 00:16:10.120
<v Speaker 2>So I could track trending topics or breaking news as

332
00:16:10.159 --> 00:16:11.159
<v Speaker 2>it emerges on Twitter.

333
00:16:11.320 --> 00:16:13.440
<v Speaker 1>Exactly. The possibilities are endless.

334
00:16:13.480 --> 00:16:14.080
<v Speaker 2>Wow.

335
00:16:14.120 --> 00:16:16.440
<v Speaker 1>But I have a question for you. Okay, once we've

336
00:16:16.480 --> 00:16:19.080
<v Speaker 1>collected all this data from Twitter. Uh huh, where do

337
00:16:19.159 --> 00:16:21.399
<v Speaker 1>we store it? That's a good question, right.

338
00:16:21.639 --> 00:16:23.720
<v Speaker 2>It seems like we'd need a pretty big container for

339
00:16:23.759 --> 00:16:24.399
<v Speaker 2>all those tweets.

340
00:16:24.519 --> 00:16:27.480
<v Speaker 1>Yeah, we've talked about lists and dictionaries, but I imagine

341
00:16:27.480 --> 00:16:29.840
<v Speaker 1>those would get pretty unwieldy with millions of tweets.

342
00:16:29.919 --> 00:16:31.120
<v Speaker 2>Yeah, they would.

343
00:16:31.200 --> 00:16:35.159
<v Speaker 1>So what does the book recommend for handling like massive

344
00:16:35.200 --> 00:16:35.919
<v Speaker 1>amounts of data?

345
00:16:36.519 --> 00:16:40.000
<v Speaker 2>The book explores various ways to store tweets, okay, from

346
00:16:40.080 --> 00:16:45.000
<v Speaker 2>simple text files to more sophisticated databases like

347
00:16:45.399 --> 00:16:49.000
<v Speaker 2>MongoDB, which is particularly well suited for handling the

348
00:16:49.120 --> 00:16:52.559
<v Speaker 2>unstructured data, okay, that's common in social media.

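NOTE
Editor's sketch, not taken from the book: the simplest option mentioned here,
a plain text file, can hold one JSON object per line. The tweet fields, the
sample data, and the helper names save_tweets and load_tweets are hypothetical.
```python
import json
import os
import tempfile
def save_tweets(tweets, path):
    """Append each tweet dict to `path` as one JSON object per line."""
    with open(path, "a", encoding="utf-8") as f:
        for tweet in tweets:
            f.write(json.dumps(tweet) + "\n")
def load_tweets(path):
    """Read the file back into a list of dicts, skipping empty lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
# Hypothetical sample data; real tweets carry many more fields.
sample = [
    {"id": 1, "user": "alice", "text": "Loving the new release!"},
    {"id": 2, "user": "bob", "text": "Not impressed so far."},
]
path = os.path.join(tempfile.mkdtemp(), "tweets.jsonl")
save_tweets(sample, path)
stored = load_tweets(path)
by_alice = [t for t in stored if t["user"] == "alice"]
```
A file like this is easy to append to, but every query rescans the whole file,
which is one reason a document database becomes attractive at scale.
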
349
00:16:52.440 --> 00:16:55.399
<v Speaker 1>So it's like a custom-built warehouse, yes, for all

350
00:16:55.440 --> 00:16:59.600
<v Speaker 1>my Twitter data. Precisely. Wasn't MongoDB mentioned in Chapter 17,

351
00:17:00.279 --> 00:17:03.120
<v Speaker 1>with big data and the Internet of Things? Yes. It

352
00:17:03.120 --> 00:17:05.559
<v Speaker 1>seems like we're starting to connect the dots between different

353
00:17:05.559 --> 00:17:06.640
<v Speaker 1>concepts from the book.

354
00:17:06.759 --> 00:17:10.559
<v Speaker 2>You're absolutely right. The book features a detailed case study

355
00:17:10.599 --> 00:17:13.880
<v Speaker 2>on streaming tweets into a MongoDB database.

356
00:17:14.160 --> 00:17:17.759
<v Speaker 1>So we can use MongoDB to store and organize

357
00:17:17.759 --> 00:17:21.079
<v Speaker 1>all those tweets, yes, making it easy to access and

358
00:17:21.119 --> 00:17:22.039
<v Speaker 1>analyze them later.

359
00:17:22.319 --> 00:17:25.960
<v Speaker 2>Precisely, and the book even shows how to use Python

360
00:17:26.519 --> 00:17:30.000
<v Speaker 2>to query and analyze the data stored in MongoDB.

361
00:17:30.359 --> 00:17:33.960
<v Speaker 1>It's like having a librarian who can instantly retrieve any tweet

362
00:17:33.640 --> 00:17:37.839
<v Speaker 2>I need. Exactly. It's a powerful combination for unlocking insights

363
00:17:37.839 --> 00:17:39.079
<v Speaker 2>from social media data.

364
00:17:39.359 --> 00:17:41.799
<v Speaker 1>This is starting to feel pretty advanced. It is. But

365
00:17:41.839 --> 00:17:44.319
<v Speaker 1>before we move on to like even more complex topics,

366
00:17:44.720 --> 00:17:46.960
<v Speaker 1>there's something else from the Twitter chapter I wanted to

367
00:17:46.960 --> 00:17:50.720
<v Speaker 1>touch on. Right. The book mentions the importance of cleaning

368
00:17:50.759 --> 00:17:54.160
<v Speaker 1>and preprocessing tweets before analyzing them.

369
00:17:54.440 --> 00:17:58.359
<v Speaker 2>That's a crucial step, okay, that often gets overlooked. Really? Yeah,

370
00:17:58.400 --> 00:18:03.759
<v Speaker 2>tweets are full of noise, okay: URLs, hashtags, mentions, special characters, emojis.

371
00:18:04.319 --> 00:18:07.559
<v Speaker 2>All these elements can make analysis really messy if not

372
00:18:07.759 --> 00:18:08.599
<v Speaker 2>handled properly.

373
00:18:08.720 --> 00:18:10.720
<v Speaker 1>It's like trying to read a book with coffee stains

374
00:18:10.759 --> 00:18:11.799
<v Speaker 1>and scribbles all over it.

375
00:18:11.920 --> 00:18:12.400
<v Speaker 2>Exactly.

376
00:18:12.519 --> 00:18:14.599
<v Speaker 1>You can still get the gist, yeah, but it's a lot

377
00:18:14.640 --> 00:18:15.519
<v Speaker 1>harder to decipher.

378
00:18:15.799 --> 00:18:19.519
<v Speaker 2>Raw tweets are often messy and inconsistent, which can really

379
00:18:19.559 --> 00:18:23.400
<v Speaker 2>skew your analysis. That's where preprocessing comes in. It's about

380
00:18:23.440 --> 00:18:27.640
<v Speaker 2>transforming those raw tweets into a cleaner, more structured format

381
00:18:27.960 --> 00:18:29.240
<v Speaker 2>that's easier to analyze.

382
00:18:29.279 --> 00:18:31.720
<v Speaker 1>So how do we actually clean up these messy tweets?

383
00:18:32.039 --> 00:18:33.920
<v Speaker 2>Well, thankfully, you don't have to do it manually.

384
00:18:34.039 --> 00:18:34.559
<v Speaker 1>Okay. Good.

385
00:18:34.720 --> 00:18:39.319
<v Speaker 2>The book introduces a handy Python library called tweet-preprocessor,

386
00:18:39.799 --> 00:18:43.279
<v Speaker 2>and it can automatically remove all that noise from tweets, wow,

387
00:18:43.440 --> 00:18:45.240
<v Speaker 2>leaving you with just the essential text.

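NOTE
Editor's sketch: this does not use the tweet-preprocessor library itself. It
swaps in plain regular expressions to show the kind of noise removal being
described. The patterns, the sample tweet, and the name clean_tweet are
illustrative assumptions, and the non-ASCII rule is a crude emoji filter that
would also strip accented or non-Latin text.
```python
import re
URL = re.compile(r"https?://\S+")
MENTION = re.compile(r"@\w+")
HASHTAG = re.compile(r"#\w+")
NON_ASCII = re.compile(r"[^\x00-\x7F]+")  # crude stand-in for emoji removal
def clean_tweet(text):
    """Strip URLs, mentions, hashtags, and non-ASCII symbols, then tidy spaces."""
    for pattern in (URL, MENTION, HASHTAG, NON_ASCII):
        text = pattern.sub(" ", text)
    return " ".join(text.split())
raw = "Loving the new phone 😍 @acme #gadgets https://example.com/deal"
cleaned = clean_tweet(raw)
print(cleaned)
```
Whether hashtags should be dropped or kept as words depends on the analysis,
so a real pipeline would likely make each rule optional.
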
388
00:18:45.279 --> 00:18:48.759
<v Speaker 1>So it's like a robotic editor, exactly, that can instantly

389
00:18:48.759 --> 00:18:49.720
<v Speaker 1>clean up my tweets.

390
00:18:49.799 --> 00:18:50.119
<v Speaker 2>It is.

391
00:18:50.279 --> 00:18:52.599
<v Speaker 1>That sounds like a huge time saver. It is.

392
00:18:52.720 --> 00:18:54.920
<v Speaker 2>And it also makes the analysis much

393
00:18:54.720 --> 00:18:58.279
<v Speaker 1>more accurate. Right, because then you're working with clean and

394
00:18:58.359 --> 00:19:02.079
<v Speaker 1>consistent data. Exactly. This deep dive is really giving me

395
00:19:02.119 --> 00:19:05.640
<v Speaker 1>a new appreciation for all the work that goes into

396
00:19:05.759 --> 00:19:09.759
<v Speaker 1>data science. It's not just about writing code. It's about

397
00:19:09.880 --> 00:19:14.480
<v Speaker 1>understanding the data, cleaning it, yes, preparing it for analysis,

398
00:19:14.960 --> 00:19:18.279
<v Speaker 1>and then applying the right tools and techniques to extract

399
00:19:18.440 --> 00:19:19.640
<v Speaker 1>meaningful insights.

400
00:19:19.839 --> 00:19:22.599
<v Speaker 2>It's a whole process. It really is. But before we

401
00:19:22.680 --> 00:19:25.640
<v Speaker 2>dive into the even more complex world of machine learning, okay,

402
00:19:25.759 --> 00:19:28.039
<v Speaker 2>there's one more essential concept from the book that I

403
00:19:28.039 --> 00:19:28.720
<v Speaker 2>want to highlight.

404
00:19:28.839 --> 00:19:32.319
<v Speaker 1>All right, I'm all ears. What other gems has this

405
00:19:32.599 --> 00:19:33.599
<v Speaker 1>textbook revealed?

406
00:19:34.039 --> 00:19:39.599
<v Speaker 2>Chapter 11, Computer Science Thinking, introduces this powerful mental model

407
00:19:39.640 --> 00:19:42.319
<v Speaker 2>for problem solving. It's called decomposition.

408
00:19:42.680 --> 00:19:43.759
<v Speaker 1>Decomposition.

409
00:19:44.160 --> 00:19:47.880
<v Speaker 2>Yes, it's the process of breaking down a large, complex

410
00:19:48.079 --> 00:19:51.680
<v Speaker 2>problem into smaller, more manageable subproblems.

411
00:19:51.680 --> 00:19:54.880
<v Speaker 1>So it's like tackling a giant puzzle, yes, by focusing

412
00:19:54.920 --> 00:19:56.599
<v Speaker 1>on individual pieces, one at a time.

413
00:19:56.759 --> 00:19:57.279
<v Speaker 2>Exactly.

414
00:19:57.519 --> 00:20:00.279
<v Speaker 1>It seems like an essential skill for data science, it

415
00:20:00.359 --> 00:20:03.200
<v Speaker 1>is, where we're often dealing with massive amounts of data

416
00:20:03.640 --> 00:20:04.960
<v Speaker 1>and complex challenges.

417
00:20:05.000 --> 00:20:08.640
<v Speaker 2>Absolutely, and the book provides some great examples of how

418
00:20:08.680 --> 00:20:12.200
<v Speaker 2>to apply decomposition to real-world data science problems.

419
00:20:12.319 --> 00:20:15.279
<v Speaker 1>So like that game of craps example, yes, exactly,

420
00:20:15.279 --> 00:20:18.519
<v Speaker 1>we need to break down the game into its individual components, yeah,

421
00:20:18.720 --> 00:20:22.039
<v Speaker 1>the dice rolls, the scoring rules, the win conditions, before

422
00:20:22.079 --> 00:20:25.160
<v Speaker 1>we can even begin to write code to simulate it. Precisely.

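NOTE
Editor's sketch of the decomposition just described, using the standard rules
of craps: one small function each for the dice roll, the opening-roll scoring
rules, and the win condition. The function names and the fixed seed are
illustrative choices, not the book's code.
```python
import random
def roll_dice(rng):
    """Dice rolls: return the sum of two six-sided dice."""
    return rng.randint(1, 6) + rng.randint(1, 6)
def first_roll_outcome(total):
    """Scoring rules for the opening roll."""
    if total in (7, 11):
        return "win"
    if total in (2, 3, 12):
        return "lose"
    return "point"
def play_craps(rng):
    """Win conditions: keep rolling until the point (win) or a 7 (lose)."""
    point = roll_dice(rng)
    outcome = first_roll_outcome(point)
    while outcome == "point":
        total = roll_dice(rng)
        if total == point:
            outcome = "win"
        elif total == 7:
            outcome = "lose"
    return outcome
rng = random.Random(42)  # fixed seed so runs are repeatable
results = [play_craps(rng) for _ in range(10_000)]
win_rate = results.count("win") / len(results)
print(f"estimated win rate: {win_rate:.3f}")
```
Each piece can be tested on its own, which is the practical payoff of
breaking the problem down before simulating it.
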
423
00:20:25.839 --> 00:20:29.400
<v Speaker 2>And this principle applies to all sorts of data science projects,

424
00:20:29.839 --> 00:20:34.519
<v Speaker 2>whether you're analyzing social media data, building a machine learning model,

425
00:20:35.079 --> 00:20:37.839
<v Speaker 2>or developing a complex data visualization.

426
00:20:38.079 --> 00:20:41.160
<v Speaker 1>So it's like this mental toolkit, yes, for breaking down

427
00:20:41.480 --> 00:20:45.559
<v Speaker 1>complex challenges into manageable steps. It is. That seems like

428
00:20:45.599 --> 00:20:46.480
<v Speaker 1>a valuable skill.

429
00:20:46.640 --> 00:20:51.880
<v Speaker 2>It is a fundamental problem solving strategy that transcends any

430
00:20:52.000 --> 00:20:55.799
<v Speaker 2>specific field or domain. It's a way of thinking that

431
00:20:55.799 --> 00:20:59.880
<v Speaker 2>can help you approach any challenge with clarity and confidence.

432
00:21:00.359 --> 00:21:03.240
<v Speaker 1>It's like we're expanding our mental toolkit, we are, with

433
00:21:03.319 --> 00:21:06.359
<v Speaker 1>this deep dive. Yes, I'm learning not just the technical

434
00:21:06.400 --> 00:21:10.319
<v Speaker 1>aspects of data science, but also the mental models and

435
00:21:10.480 --> 00:21:13.799
<v Speaker 1>problem solving approaches that underpin this field.

436
00:21:13.880 --> 00:21:14.400
<v Speaker 2>That's great.

437
00:21:14.440 --> 00:21:16.039
<v Speaker 1>It's like I'm learning a new way of thinking.

438
00:21:16.200 --> 00:21:17.359
<v Speaker 2>That's fantastic to hear.

439
00:21:17.519 --> 00:21:19.279
<v Speaker 1>So are you ready to step into the world of

440
00:21:19.319 --> 00:21:20.039
<v Speaker 1>machine learning?

441
00:21:20.240 --> 00:21:20.599
<v Speaker 2>I am.

442
00:21:20.759 --> 00:21:22.480
<v Speaker 1>Let's do it. All right. It's like we're entering the

443
00:21:22.519 --> 00:21:25.279
<v Speaker 1>realm of science fiction here, where machines can learn and

444
00:21:25.319 --> 00:21:26.119
<v Speaker 1>make predictions.

445
00:21:26.240 --> 00:21:27.599
<v Speaker 2>I know. It's so cool.

446
00:21:27.640 --> 00:21:30.319
<v Speaker 1>Okay, so we've laid the groundwork, talked about data, Python,

447
00:21:30.559 --> 00:21:34.559
<v Speaker 1>even touched on those problem solving approaches like decomposition. Yeah,

448
00:21:34.880 --> 00:21:36.279
<v Speaker 1>but now it's time to get to the heart of

449
00:21:36.319 --> 00:21:40.079
<v Speaker 1>it all: machine learning. Yes, this is where we step

450
00:21:40.079 --> 00:21:43.079
<v Speaker 1>into the realm of science fiction, I know, where machines can

451
00:21:43.200 --> 00:21:44.799
<v Speaker 1>learn and make predictions.

452
00:21:45.039 --> 00:21:45.880
<v Speaker 2>It's pretty amazing.

453
00:21:46.039 --> 00:21:48.680
<v Speaker 1>So let's start with Chapter 15, Machine Learning.

454
00:21:48.839 --> 00:21:49.279
<v Speaker 2>Okay.

455
00:21:49.519 --> 00:21:54.039
<v Speaker 1>It focuses on this technique called k-nearest neighbors, or kNN.

456
00:21:54.359 --> 00:21:59.519
<v Speaker 2>Yeah. kNN is a classic machine learning algorithm. It's used

457
00:21:59.519 --> 00:22:03.039
<v Speaker 2>for classification, okay, and it's based on a really simple

458
00:22:03.160 --> 00:22:07.160
<v Speaker 2>but powerful idea. Okay, what's that? Similar things tend to

459
00:22:07.240 --> 00:22:08.000
<v Speaker 2>belong together.

460
00:22:08.200 --> 00:22:09.279
<v Speaker 1>Okay, think about it.

461
00:22:09.480 --> 00:22:13.319
<v Speaker 2>So, if you're trying to classify a tweet as positive or negative, okay,

462
00:22:13.400 --> 00:22:16.440
<v Speaker 2>you would look at how similar it is to other

463
00:22:16.519 --> 00:22:18.200
<v Speaker 2>tweets that have already been classified.

464
00:22:18.440 --> 00:22:21.480
<v Speaker 1>So it's like judging a book by its cover, exactly, or,

465
00:22:21.480 --> 00:22:23.839
<v Speaker 1>in this case, a tweet by its neighbors. Precisely.

466
00:22:24.039 --> 00:22:29.319
<v Speaker 2>Yeah, you compare the tweet to a set of labeled examples, okay,

467
00:22:29.400 --> 00:22:32.319
<v Speaker 2>tweets that have already been categorized as positive or negative,

468
00:22:32.920 --> 00:22:35.000
<v Speaker 2>and see which category it's closest to.

469
00:22:35.400 --> 00:22:35.839
<v Speaker 1>Okay.

470
00:22:36.079 --> 00:22:40.160
<v Speaker 2>The k in kNN refers to the number of neighbors,

471
00:22:40.480 --> 00:22:42.359
<v Speaker 2>okay, you consider in this comparison.

472
00:22:42.839 --> 00:22:45.680
<v Speaker 1>So if K is five, yes, I'd look at the

473
00:22:45.720 --> 00:22:48.720
<v Speaker 1>five most similar tweets, exactly, and see which category gets

474
00:22:48.759 --> 00:22:49.480
<v Speaker 1>the most votes.

475
00:22:49.799 --> 00:22:52.240
<v Speaker 2>Yes, majority rules in the world of kNN.

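NOTE
Editor's sketch of the majority-vote idea, in plain Python rather than the
library the book uses. The two-number features for each tweet (counts of
positive and negative words), the sample data, and the name knn_predict are
hypothetical; Euclidean distance is one common choice of similarity measure.
```python
from collections import Counter
import math
def knn_predict(train, query, k=5):
    """Label `query` by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
# Hypothetical 2-D features: (positive-word count, negative-word count).
train = [
    ((5, 0), "positive"), ((4, 1), "positive"), ((6, 1), "positive"),
    ((0, 5), "negative"), ((1, 4), "negative"), ((1, 6), "negative"),
]
label = knn_predict(train, (4, 2), k=3)
print(label)
```
Picking an odd k for a two-class problem avoids tied votes.
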
476
00:22:52.519 --> 00:22:54.480
<v Speaker 1>It seems pretty straightforward.

477
00:22:54.240 --> 00:22:57.240
<v Speaker 2>Conceptually, it is. Okay, and the book uses a really

478
00:22:57.279 --> 00:22:59.039
<v Speaker 2>cool data set to illustrate this.

479
00:22:59.240 --> 00:23:03.839
<v Speaker 1>Oh yeah, what's that? The MNIST data set. MNIST?

480
00:23:04.000 --> 00:23:04.839
<v Speaker 2>Yeah, have you heard of it?

481
00:23:05.000 --> 00:23:07.079
<v Speaker 1>Yeah, that's the one with all the handwritten digits, right?

482
00:23:07.200 --> 00:23:11.119
<v Speaker 2>Yes, it's a massive collection of images of handwritten digits okay,

483
00:23:11.160 --> 00:23:14.799
<v Speaker 2>commonly used to train machine learning models for image recognition.

484
00:23:15.039 --> 00:23:17.640
<v Speaker 1>So we can use kNN to teach a computer to

485
00:23:17.759 --> 00:23:19.240
<v Speaker 1>recognize handwritten digits.

486
00:23:19.240 --> 00:23:19.680
<v Speaker 2>Exactly.

487
00:23:19.839 --> 00:23:20.559
<v Speaker 1>That's incredible.

488
00:23:20.640 --> 00:23:23.519
<v Speaker 2>It's like giving machines the ability to read our handwriting.

489
00:23:23.839 --> 00:23:24.160
<v Speaker 1>Wow.

490
00:23:24.359 --> 00:23:27.119
<v Speaker 2>The book walks you through the entire process, from loading

491
00:23:27.160 --> 00:23:28.160
<v Speaker 2>the data set, uh

492
00:23:28.079 --> 00:23:32.240
<v Speaker 1>huh, to training the model and evaluating its accuracy. It

493
00:23:32.279 --> 00:23:34.440
<v Speaker 1>even shows you how to visualize the results.

494
00:23:34.680 --> 00:23:37.799
<v Speaker 2>Visualizations always help, they do. But before we get too

495
00:23:37.960 --> 00:23:41.839
<v Speaker 2>deep into deciphering digits, yeah, the book mentions something about

496
00:23:41.880 --> 00:23:42.880
<v Speaker 2>splitting the data.

497
00:23:43.119 --> 00:23:46.359
<v Speaker 1>Yes, splitting the data is crucial in machine learning.

498
00:23:46.559 --> 00:23:46.839
<v Speaker 2>Okay.

499
00:23:46.839 --> 00:23:50.319
<v Speaker 1>Why is that? We typically divide our data into two

500
00:23:50.319 --> 00:23:53.480
<v Speaker 1>sets, okay, a training set and a testing set. We

501
00:23:53.599 --> 00:23:55.680
<v Speaker 1>use the training set to teach our model how to

502
00:23:55.680 --> 00:23:58.359
<v Speaker 1>make predictions, uh huh, and then we use the testing

503
00:23:58.400 --> 00:23:59.839
<v Speaker 1>set to see how well it's learned.

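NOTE
Editor's sketch of the split being described: shuffle the labeled samples,
then hold some back for testing. The 25 percent test fraction, the seed, and
the helper name split_train_test are assumptions for illustration.
```python
import random
def split_train_test(samples, test_fraction=0.25, seed=0):
    """Shuffle a copy of `samples` and split it into (train, test) lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
samples = list(range(100))  # stand-ins for labeled examples
train, test = split_train_test(samples)
print(len(train), len(test))
```
Because no sample lands in both lists, accuracy on the test list reflects how
the model handles data it has not seen.
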
504
00:24:00.279 --> 00:24:02.319
<v Speaker 2>So it's like a practice exam before the real deal.

505
00:24:02.480 --> 00:24:05.240
<v Speaker 1>Exactly, you got it. We want to make sure our

506
00:24:05.279 --> 00:24:10.240
<v Speaker 1>model can handle new unseen data and not just memorize

507
00:24:10.279 --> 00:24:11.920
<v Speaker 1>the answers to the practice test.

508
00:24:11.720 --> 00:24:13.480
<v Speaker 2>We don't want it to be a one-trick pony.

509
00:24:13.599 --> 00:24:14.000
<v Speaker 1>We don't.

510
00:24:14.039 --> 00:24:15.359
<v Speaker 2>We need it to be adaptable.

511
00:24:15.640 --> 00:24:19.000
<v Speaker 1>Exactly. We want to make sure it can generalize its

512
00:24:19.039 --> 00:24:21.960
<v Speaker 1>knowledge to new situations. Okay, and this helps us avoid

513
00:24:22.039 --> 00:24:26.680
<v Speaker 1>something called overfitting. Overfitting? Where the model becomes too specialized

514
00:24:27.000 --> 00:24:28.640
<v Speaker 1>to the training data and

515
00:24:28.519 --> 00:24:30.599
<v Speaker 2>Then performs poorly on new data.

516
00:24:31.359 --> 00:24:34.480
<v Speaker 1>So the book also mentions training the model. Yes, what

517
00:24:34.680 --> 00:24:36.839
<v Speaker 1>exactly happens during this training process?

518
00:24:36.960 --> 00:24:41.359
<v Speaker 2>So training a model involves feeding it data and letting

519
00:24:41.400 --> 00:24:42.759
<v Speaker 2>it learn from that data.

520
00:24:42.880 --> 00:24:43.279
<v Speaker 1>Okay.

521
00:24:43.319 --> 00:24:46.640
<v Speaker 2>In the case of kNN, the training is relatively simple.

522
00:24:47.079 --> 00:24:50.359
<v Speaker 2>The algorithm just remembers all the training data points. But

523
00:24:50.440 --> 00:24:54.680
<v Speaker 2>for more complex algorithms, the training process involves adjusting a

524
00:24:54.720 --> 00:24:58.359
<v Speaker 2>bunch of internal parameters to improve the model's predictions.

525
00:24:58.680 --> 00:25:01.119
<v Speaker 1>So it's like we're coaching our model to become a

526
00:25:01.119 --> 00:25:04.400
<v Speaker 1>better predictor. Exactly, it's getting feedback and improving.

527
00:25:04.119 --> 00:25:06.279
<v Speaker 2>Exactly, you're giving it practice and feedback.

528
00:25:06.319 --> 00:25:10.000
<v Speaker 1>Now, the book also talks about evaluating the model's accuracy. Yes,

529
00:25:10.240 --> 00:25:12.160
<v Speaker 1>how do we actually know if our model is a

530
00:25:12.200 --> 00:25:14.319
<v Speaker 1>star student or needs more tutoring?

531
00:25:15.400 --> 00:25:18.680
<v Speaker 2>There are various ways to evaluate a model's performance, okay,

532
00:25:18.799 --> 00:25:22.839
<v Speaker 2>but for classification tasks like the MNIST example, we often

533
00:25:22.960 --> 00:25:25.039
<v Speaker 2>use something called a confusion matrix.

534
00:25:25.200 --> 00:25:27.160
<v Speaker 1>A confusion matrix.

535
00:25:26.799 --> 00:25:30.680
<v Speaker 2>It shows us how many samples were correctly and incorrectly classified.

536
00:25:30.960 --> 00:25:32.920
<v Speaker 2>It's like a report card for our model.

537
00:25:33.680 --> 00:25:35.799
<v Speaker 1>So it would tell me how many times the model

538
00:25:36.400 --> 00:25:40.480
<v Speaker 1>correctly identified a handwritten three, yes, versus how many times

539
00:25:40.480 --> 00:25:43.279
<v Speaker 1>it mistook it for like an eight. Exactly.

540
00:25:43.759 --> 00:25:46.680
<v Speaker 2>It gives us a detailed breakdown of the model's performance,

541
00:25:46.799 --> 00:25:50.640
<v Speaker 2>helping us understand where it excels and where it needs improvement.

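NOTE
Editor's sketch of the "report card" idea: count every (actual, predicted)
pair, so a three mistaken for an eight shows up as the pair (3, 8). The digit
labels below are made up, and this Counter-based helper is a stand-in for a
library routine, not the book's code.
```python
from collections import Counter
def confusion_matrix(actual, predicted):
    """Count (actual, predicted) pairs; matching pairs are correct predictions."""
    return Counter(zip(actual, predicted))
# Hypothetical digit labels and model outputs.
actual = [3, 3, 3, 8, 8, 1]
predicted = [3, 8, 3, 8, 8, 1]
matrix = confusion_matrix(actual, predicted)
correct = sum(n for (a, p), n in matrix.items() if a == p)
accuracy = correct / len(actual)
print(matrix[(3, 8)], round(accuracy, 2))
```
Overall accuracy is the share of pairs where both numbers match; the
off-diagonal pairs show which digits the model confuses.
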
542
00:25:51.039 --> 00:25:55.359
<v Speaker 1>So we've talked about kNN for classification. What about predicting

543
00:25:55.480 --> 00:25:56.640
<v Speaker 1>future outcomes?

544
00:25:56.920 --> 00:25:57.599
<v Speaker 2>Ah?

545
00:25:57.759 --> 00:26:00.000
<v Speaker 1>Can machine learning help us see into the future?

546
00:26:00.160 --> 00:26:04.759
<v Speaker 2>Absolutely. The book introduces linear regression. Okay, it's a powerful

547
00:26:04.839 --> 00:26:09.279
<v Speaker 2>technique used to model relationships between variables and make predictions

548
00:26:09.319 --> 00:26:10.559
<v Speaker 2>about future values.

549
00:26:10.720 --> 00:26:13.279
<v Speaker 1>So if I had data on the average temperature in

550
00:26:13.279 --> 00:26:16.160
<v Speaker 1>New York City for the past one hundred years, I

551
00:26:16.200 --> 00:26:20.319
<v Speaker 1>could use linear regression to predict the average temperature next January.

552
00:26:20.359 --> 00:26:23.720
<v Speaker 2>You could. The book provides an example where you analyze

553
00:26:23.799 --> 00:26:26.599
<v Speaker 2>historical temperature data using linear regression.

554
00:26:26.640 --> 00:26:28.960
<v Speaker 1>Wow. So it's like a data-driven crystal ball. It is.

555
00:26:29.000 --> 00:26:31.279
<v Speaker 2>It's a great illustration of how machine learning can be

556
00:26:31.400 --> 00:26:34.440
<v Speaker 2>used to make predictions based on historical trends.

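NOTE
Editor's sketch of simple linear regression by ordinary least squares: fit a
straight line to (year, temperature) pairs and extend it one year ahead. The
temperatures are made-up numbers, not real New York measurements, and
fit_line is a hypothetical helper rather than the book's library call.
```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
# Made-up January averages (degrees F), not real measurements.
years = [2015, 2016, 2017, 2018, 2019]
temps = [30.0, 30.5, 31.0, 31.5, 32.0]
slope, intercept = fit_line(years, temps)
prediction_2020 = slope * 2020 + intercept
print(slope, prediction_2020)
```
A straight-line extrapolation only continues the past trend, so predictions
far beyond the data deserve caution.
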
557
00:26:34.680 --> 00:26:37.880
<v Speaker 1>So if the trend is for warmer winters, yes, my

558
00:26:38.000 --> 00:26:40.960
<v Speaker 1>model might predict that next January will be slightly warmer

559
00:26:41.000 --> 00:26:41.599
<v Speaker 1>than average.

560
00:26:41.680 --> 00:26:44.240
<v Speaker 2>Exactly. It's like having a weather forecaster that can see

561
00:26:44.400 --> 00:26:45.400
<v Speaker 2>years into the future.

562
00:26:45.640 --> 00:26:46.240
<v Speaker 1>Amazing.

563
00:26:46.440 --> 00:26:50.000
<v Speaker 2>The book even shows how to visualize those predictions, okay,

564
00:26:50.160 --> 00:26:54.000
<v Speaker 2>creating graphs that show the predicted temperature trend over time.

565
00:26:54.319 --> 00:26:56.720
<v Speaker 1>Visualizations always make things more compelling.

566
00:26:56.519 --> 00:26:56.960
<v Speaker 2>They do.

567
00:26:57.240 --> 00:26:59.920
<v Speaker 1>But this is all seeming pretty straightforward so far. Yeah,

568
00:27:00.319 --> 00:27:03.160
<v Speaker 1>what about more complex types of machine learning.

569
00:27:03.440 --> 00:27:06.799
<v Speaker 2>Well, the book does touch on a few more advanced

570
00:27:06.799 --> 00:27:11.880
<v Speaker 2>techniques, okay, like what? Like multiple linear regression, okay, where

571
00:27:11.920 --> 00:27:16.039
<v Speaker 2>you consider multiple factors to make a prediction. And it

572
00:27:16.119 --> 00:27:19.799
<v Speaker 2>even introduces clustering. Clustering? Which is a way to group

573
00:27:19.960 --> 00:27:23.799
<v Speaker 2>similar data points together, okay, without any preexisting labels.

574
00:27:24.319 --> 00:27:27.160
<v Speaker 1>So instead of predicting a specific outcome, yes, we're just

575
00:27:27.200 --> 00:27:28.839
<v Speaker 1>trying to find patterns in the data.

576
00:27:28.880 --> 00:27:33.200
<v Speaker 2>Exactly. It's often used in exploratory data analysis, okay, where

577
00:27:33.200 --> 00:27:36.680
<v Speaker 2>you're trying to understand the underlying structure of your data.

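NOTE
Editor's sketch of grouping unlabeled points. It uses k-means, one common
clustering algorithm; the conversation does not say which algorithm the book
applies here. The house data, the starting centroids, and the name k_means
are hypothetical.
```python
import math
def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and re-averaging."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters
# Hypothetical houses as (size in 100 sq ft, bedrooms); no labels are given.
houses = [(8, 1), (9, 2), (10, 2), (30, 4), (32, 5), (31, 4)]
centroids, clusters = k_means(houses, centroids=[houses[0], houses[3]])
print(clusters)
```
No labels go in; the groups come purely from which points sit close together.
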
578
00:27:36.759 --> 00:27:38.880
<v Speaker 1>So it's like a data driven treasure hunt.

579
00:27:39.079 --> 00:27:40.359
<v Speaker 2>Exactly, you got it.

580
00:27:40.559 --> 00:27:43.720
<v Speaker 1>The book gives an example of using clustering to analyze

581
00:27:43.759 --> 00:27:47.799
<v Speaker 1>housing data, it does, revealing groups of houses with similar characteristics.

582
00:27:47.839 --> 00:27:48.720
<v Speaker 2>Yes, very cool.

583
00:27:48.839 --> 00:27:51.480
<v Speaker 1>I'm seeing how diverse machine learning is. It is. There

584
00:27:51.480 --> 00:27:54.000
<v Speaker 1>are so many different techniques, there are, each with its

585
00:27:54.000 --> 00:27:57.240
<v Speaker 1>own strengths and weaknesses. Absolutely. This textbook has been a

586
00:27:57.279 --> 00:28:00.279
<v Speaker 1>fantastic guide, it has, giving us a taste of all

587
00:28:00.279 --> 00:28:03.160
<v Speaker 1>these different techniques, showing us how to apply them to

588
00:28:03.240 --> 00:28:04.359
<v Speaker 1>real world problems.

589
00:28:04.400 --> 00:28:06.880
<v Speaker 2>And while it covers a lot of ground, it's important

590
00:28:06.880 --> 00:28:10.079
<v Speaker 2>to remember that this is just the beginning. Machine learning

591
00:28:10.160 --> 00:28:14.359
<v Speaker 2>is a vast and rapidly evolving field, with new techniques

592
00:28:14.359 --> 00:28:16.359
<v Speaker 2>and applications emerging all the time.

593
00:28:16.839 --> 00:28:20.079
<v Speaker 1>So for someone who wants to continue exploring this world, yes,

594
00:28:20.200 --> 00:28:23.640
<v Speaker 1>what's next? What lies beyond the pages of this textbook?

595
00:28:23.880 --> 00:28:28.519
<v Speaker 2>Well, this book provides a solid foundation, but there's a

596
00:28:28.559 --> 00:28:34.599
<v Speaker 2>whole universe of resources out there. Like what? Online courses, okay, tutorials,

597
00:28:35.000 --> 00:28:39.240
<v Speaker 2>open source projects, and of course there's no substitute for

598
00:28:39.440 --> 00:28:42.759
<v Speaker 2>hands on experience. Yeah, the best way to learn is

599
00:28:42.799 --> 00:28:46.279
<v Speaker 2>to dive in, start building your own machine learning projects

600
00:28:46.680 --> 00:28:47.880
<v Speaker 2>and see what you can create.

601
00:28:48.400 --> 00:28:49.880
<v Speaker 1>That's inspiring advice.

602
00:28:50.000 --> 00:28:50.400
<v Speaker 2>Thank you.

603
00:28:50.640 --> 00:28:53.200
<v Speaker 1>It's like we've been given the keys to this powerful car,

604
00:28:53.599 --> 00:28:54.279
<v Speaker 1>and now it's time to

605
00:28:54.319 --> 00:28:57.039
<v Speaker 2>hit the road. That's right. See where it takes us. Exactly.

606
00:28:57.319 --> 00:28:59.720
<v Speaker 1>This deep dive has given me a much deeper understanding

607
00:28:59.759 --> 00:29:01.759
<v Speaker 1>of data science and machine learning.

608
00:29:01.839 --> 00:29:02.720
<v Speaker 2>I'm glad to hear that.

609
00:29:02.799 --> 00:29:07.279
<v Speaker 1>It's amazing to me the power of Python to unlock

610
00:29:07.519 --> 00:29:10.720
<v Speaker 1>insights from data and make predictions about the future.

611
00:29:11.000 --> 00:29:14.400
<v Speaker 2>It's been a pleasure exploring this fascinating field with you. Likewise,

612
00:29:14.519 --> 00:29:18.039
<v Speaker 2>Remember, as you continue your journey, okay, always stay curious,

613
00:29:18.319 --> 00:29:21.960
<v Speaker 2>keep learning, and never stop experimenting. The world of data

614
00:29:22.000 --> 00:29:24.759
<v Speaker 2>science is vast and full of possibilities.

615
00:29:24.920 --> 00:29:27.480
<v Speaker 1>And that's a wrap on our Python and data science

616
00:29:27.519 --> 00:29:28.119
<v Speaker 1>deep dive.

617
00:29:28.279 --> 00:29:28.599
<v Speaker 2>Great.

618
00:29:28.720 --> 00:29:31.240
<v Speaker 1>We've covered a lot of ground today, we have, from

619
00:29:31.279 --> 00:29:36.240
<v Speaker 1>the basics of Python programming to the intricacies of machine learning.

620
00:29:36.279 --> 00:29:37.119
<v Speaker 2>It's been a journey.

621
00:29:37.359 --> 00:29:39.880
<v Speaker 1>We hope you've enjoyed this journey into the world of

622
00:29:39.960 --> 00:29:43.079
<v Speaker 1>data science, I am, and found it as insightful and

623
00:29:43.160 --> 00:29:47.240
<v Speaker 1>engaging as we have. Me too. Until next time, happy coding, yes,

624
00:29:48.440 --> 00:29:50.359
<v Speaker 1>and may your data always be insightful.
