WEBVTT 1 00:00:00.080 --> 00:00:03.680 Imagine really getting a handle on artificial intelligence, not just 2 00:00:04.160 --> 00:00:08.919 the headlines, but the actual mechanics, the tools, what makes 3 00:00:08.960 --> 00:00:11.279 it tick. That's exactly what we're doing today. This is 4 00:00:11.279 --> 00:00:14.480 our deep dive into AI with Python. We've gathered a 5 00:00:14.519 --> 00:00:18.120 stack of sources, articles, research papers, tech notes, and our 6 00:00:18.160 --> 00:00:21.719 mission really is to pull out the absolute key insights, 7 00:00:22.120 --> 00:00:26.160 the surprising bits, and give you a solid but easy 8 00:00:26.199 --> 00:00:29.839 to grasp understanding, from what intelligence even means here to 9 00:00:29.920 --> 00:00:35.000 how machines learn, see, hear, even play games. So let's 10 00:00:35.079 --> 00:00:37.119 unpack this right from the start. What does it really 11 00:00:37.159 --> 00:00:39.960 mean when we talk about artificial intelligence? Right. 12 00:00:40.000 --> 00:00:43.640 Well, the foundational idea, going back to John McCarthy, people 13 00:00:43.719 --> 00:00:46.159 often call him the father of AI. He defined it 14 00:00:46.200 --> 00:00:49.200 as the science and engineering of making intelligent machines, okay, 15 00:00:49.280 --> 00:00:52.960 especially intelligent computer programs. So fundamentally it's about trying to 16 00:00:52.960 --> 00:00:55.439 build machines or software that can do things we normally 17 00:00:55.520 --> 00:01:00.200 associate with, well, human intelligence: learning, problem solving, making decisions. 18 00:01:00.159 --> 00:01:02.320 That kind of thing. Things humans do. Exactly. 19 00:01:02.479 --> 00:01:05.159 And you know what's really interesting isn't just the definition, 20 00:01:05.200 --> 00:01:07.120 but why we need it, what it lets us do. 21 00:01:07.799 --> 00:01:11.280 Think about learning from just massive amounts of data.
Yeah, 22 00:01:11.319 --> 00:01:16.400 impossible for one person. Totally. Or doing repetitive tasks accurately, tirelessly. 23 00:01:16.920 --> 00:01:21.200 And crucially, AI can sort of teach itself as new 24 00:01:21.200 --> 00:01:24.879 information comes in. That's huge because, you know, the world 25 00:01:24.879 --> 00:01:28.000 doesn't stand still. Plus, it can react in real time, 26 00:01:28.200 --> 00:01:31.680 organize huge data sets efficiently, things that give us much 27 00:01:31.719 --> 00:01:34.480 better results than we could manage alone, especially at scale. 28 00:01:35.280 --> 00:01:38.719 And when we think about what makes the system intelligent 29 00:01:38.840 --> 00:01:41.920 in this context, it's useful to think about Howard Gardner's 30 00:01:41.920 --> 00:01:43.680 idea of multiple intelligences. 31 00:01:43.719 --> 00:01:48.159 Ah, yeah, I remember that, like musical, logical. Exactly, linguistic, logical-mathematical, spatial, 32 00:01:48.200 --> 00:01:48.519 and so on. 33 00:01:49.280 --> 00:01:52.040 The idea is if a machine shows capability in even 34 00:01:52.079 --> 00:01:54.840 one or maybe several of these areas, we consider it 35 00:01:54.959 --> 00:01:58.680 artificially intelligent. It's not always about perfectly mimicking humans, but 36 00:01:58.760 --> 00:02:01.000 about having the right kind of intelligence for a task. 37 00:02:01.200 --> 00:02:04.319 Okay, so if that's the big picture of AI, then 38 00:02:05.680 --> 00:02:08.240 here's the key question, I think, for anyone wanting to 39 00:02:08.240 --> 00:02:12.560 build this stuff: why Python? Why is that language so central? 40 00:02:12.639 --> 00:02:15.479 Yeah, that's a great question, and it's not really an accident. 41 00:02:15.879 --> 00:02:19.039 Python's dominance in AI comes down to a few really 42 00:02:19.080 --> 00:02:23.159 key things.
First, its syntax is, well, simple and consistent, 43 00:02:23.240 --> 00:02:26.439 easier to learn, much easier to learn, write, and importantly 44 00:02:26.479 --> 00:02:29.840 to read. This means you can prototype ideas really fast, 45 00:02:30.080 --> 00:02:32.479 try something out in hours, maybe not days or weeks. 46 00:02:32.879 --> 00:02:36.120 That speed is vital in AI research. Then there's the community. 47 00:02:36.120 --> 00:02:39.360 It's huge, it's active, it's open source, so much support, 48 00:02:39.439 --> 00:02:44.000 so many people contributing tools. But maybe the killer feature 49 00:02:44.039 --> 00:02:47.639 for AI is its libraries. Python has this incredible ecosystem 50 00:02:47.879 --> 00:02:51.000 of pre-built libraries specifically for AI tasks. 51 00:02:51.000 --> 00:02:52.520 I mean, like toolkits? 52 00:02:52.080 --> 00:02:55.120 Exactly, things like NumPy for handling numbers and arrays efficiently, 53 00:02:55.280 --> 00:02:58.400 SciPy for more scientific stuff, Matplotlib for plotting data, 54 00:02:58.520 --> 00:03:02.120 NLTK for natural language processing. Yeah, and OpenCV 55 00:03:02.199 --> 00:03:06.479 for computer vision, seeing things, and pandas for manipulating data, 56 00:03:06.759 --> 00:03:10.680 OpenAI Gym for reinforcement learning experiments. The list goes on. 57 00:03:10.759 --> 00:03:12.719 These aren't just bits of code, they're like whole 58 00:03:12.759 --> 00:03:15.280 workbenches that save developers tons of time. 59 00:03:15.879 --> 00:03:18.919 So for you, if you're looking to actually build AI applications, 60 00:03:19.000 --> 00:03:22.639 starting with Python means you've got a language built for clarity, speed, 61 00:03:23.199 --> 00:03:26.360 and you get this massive head start with all these 62 00:03:26.479 --> 00:03:29.560 powerful tools ready to go. Okay, so we've got the 63 00:03:29.599 --> 00:03:32.639 what and the why Python.
Now the next big piece 64 00:03:32.719 --> 00:03:36.120 is how these machines actually learn, and that really brings 65 00:03:36.199 --> 00:03:39.560 us to machine learning, right, which is essentially giving computers 66 00:03:39.560 --> 00:03:42.479 the ability to learn from data, find patterns, and get 67 00:03:42.479 --> 00:03:46.039 better with experience, but without programming every single step explicitly. 68 00:03:46.520 --> 00:03:48.639 And there seem to be three main ways they do this. That's 69 00:03:48.560 --> 00:03:51.400 right, three main paradigms. The first and probably the most 70 00:03:51.439 --> 00:03:54.240 common you'll encounter is supervised machine learning. 71 00:03:54.439 --> 00:03:57.159 Supervised, like with a teacher? Exactly like that. 72 00:03:57.159 --> 00:03:59.080 It's like the algorithm is a student and you give 73 00:03:59.080 --> 00:04:01.680 it a textbook with problems and the answers. Yeah. So 74 00:04:01.759 --> 00:04:04.360 the training data is labeled. You tell it this input 75 00:04:04.360 --> 00:04:08.039 corresponds to this correct output. The goal is for the 76 00:04:08.080 --> 00:04:10.719 algorithm to learn that mapping so it can predict the 77 00:04:10.759 --> 00:04:15.280 output for new, unseen inputs. Okay. And supervised learning usually 78 00:04:15.319 --> 00:04:19.639 handles two kinds of problems. There's classification, where the output 79 00:04:19.680 --> 00:04:23.120 is a category, like is this email spam or not spam? 80 00:04:23.319 --> 00:04:26.040 Is this picture a cat or a dog? Or in medicine, 81 00:04:26.079 --> 00:04:27.920 is this tumor malignant or benign? 82 00:04:28.040 --> 00:04:28.720 Categories. 83 00:04:29.079 --> 00:04:32.639 And the other is regression. Here the output is a continuous 84 00:04:32.720 --> 00:04:35.279 value, a number, like predicting the price of a house, 85 00:04:35.439 --> 00:04:39.360 or forecasting temperature, or estimating, say, someone's age from a photo.
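A minimal sketch of the classification-versus-regression distinction described here. It uses a nearest-neighbor rule purely for brevity, which is an assumption on our part rather than something the sources prescribe at this point, and the data, feature choices, and function names are all hypothetical.

```python
from collections import Counter


def nearest(train, x, k=3):
    """Return the targets of the k training points closest to x (one feature)."""
    ranked = sorted(train, key=lambda pair: abs(pair[0] - x))
    return [target for _, target in ranked[:k]]


def classify(train, x, k=3):
    # Classification: the output is a category, so take a majority vote.
    return Counter(nearest(train, x, k)).most_common(1)[0][0]


def regress(train, x, k=3):
    # Regression: the output is a number, so average the neighbors' values.
    neighbors = nearest(train, x, k)
    return sum(neighbors) / len(neighbors)


# Hypothetical labeled data: (count of suspicious words, label).
emails = [(0, "not spam"), (1, "not spam"), (2, "not spam"),
          (7, "spam"), (8, "spam"), (9, "spam")]
# Hypothetical labeled data: (floor area in square meters, price in thousands).
houses = [(50, 150.0), (60, 180.0), (70, 210.0),
          (100, 300.0), (110, 330.0), (120, 360.0)]

spam_label = classify(emails, 8)     # a category
house_price = regress(houses, 65)    # a number
```

The same labeled-data setup serves both problems; only the way the neighbors' answers are combined changes.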
86 00:04:39.560 --> 00:04:41.319 So numbers, not labels. 87 00:04:40.959 --> 00:04:44.480 Exactly, and you see algorithms like decision trees, random forests, 88 00:04:44.560 --> 00:04:48.160 KNN, logistic regression used a lot here. The 89 00:04:48.839 --> 00:04:51.839 main challenge often, especially with big projects, is just getting 90 00:04:51.920 --> 00:04:54.720 enough high quality labeled data. That can be expensive and 91 00:04:54.759 --> 00:04:55.399 time consuming. 92 00:04:55.519 --> 00:04:57.639 Right, someone has to label it all precisely. 93 00:04:58.199 --> 00:05:01.800 Now, moving on. If supervised learning is like learning with 94 00:05:01.839 --> 00:05:07.600 an answer key, unsupervised machine learning is, well, it's more 95 00:05:07.680 --> 00:05:10.120 like being thrown into a library without a catalog and 96 00:05:10.199 --> 00:05:12.839 asked to find interesting connections. Some might even say it's 97 00:05:12.839 --> 00:05:15.160 closer to true AI 98 00:05:15.040 --> 00:05:17.079 in some ways. Because there's no teacher? 99 00:05:16.839 --> 00:05:19.639 Exactly, no supervisor, no pre-labeled answers. You give the 100 00:05:19.680 --> 00:05:23.560 algorithm raw, unlabeled data, and its job is to discover 101 00:05:23.680 --> 00:05:26.240 hidden structures or patterns all by itself. 102 00:05:26.360 --> 00:05:28.240 Okay, like what kind of patterns? Well, 103 00:05:28.279 --> 00:05:31.319 the two main types again. Clustering is about finding natural 104 00:05:31.319 --> 00:05:34.680 groupings in the data. Imagine grouping customers based on their 105 00:05:34.680 --> 00:05:37.680 buying habits without knowing beforehand what those groups might be. 106 00:05:38.040 --> 00:05:40.199 The algorithm figures out the clusters. 107 00:05:39.920 --> 00:05:41.720 Ah, finding similar things together. 108 00:05:41.839 --> 00:05:44.920 Yeah, and then there's association.
This is about discovering rules 109 00:05:44.920 --> 00:05:48.079 that describe large parts of the data. The classic example 110 00:05:48.160 --> 00:05:53.000 is market basket analysis, finding that customers who buy, say, diapers, 111 00:05:53.439 --> 00:05:54.480 often also buy beer. 112 00:05:54.600 --> 00:05:55.959 Right, the surprising connections. 113 00:05:56.199 --> 00:05:59.800 Those kinds of rules. Algorithms like k-means are popular 114 00:05:59.839 --> 00:06:03.839 for clustering, and Apriori for association rules. And the 115 00:06:03.879 --> 00:06:07.439 third type, which is maybe less common but really powerful, 116 00:06:07.839 --> 00:06:11.319 is reinforcement machine learning. Reinforcement? This is where the machine, 117 00:06:11.519 --> 00:06:15.319 or the agent, learns by doing. It interacts with an 118 00:06:15.399 --> 00:06:18.040 environment, could be a game, could be the real world 119 00:06:18.040 --> 00:06:20.600 for a robot, and it takes actions. Trial and error? 120 00:06:20.600 --> 00:06:23.759 Exactly, trial and error. Based on its actions, it gets feedback, 121 00:06:23.839 --> 00:06:26.759 usually as rewards or penalties, and over time it learns 122 00:06:26.759 --> 00:06:31.399 a strategy, or policy, to maximize its total reward. Think 123 00:06:31.439 --> 00:06:33.199 of training a dog with treats. 124 00:06:33.040 --> 00:06:35.439 Okay, so it learns from consequences. 125 00:06:35.800 --> 00:06:39.160 Precisely, it learns from experience to make better decisions towards 126 00:06:39.199 --> 00:06:41.800 a goal. This is how AI gets really good at 127 00:06:41.839 --> 00:06:44.639 games or controlling robots in complex situations.
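The reward-driven trial and error described here can be sketched with tabular Q-learning. That is one common reinforcement learning method and an assumption on our part, since the sources do not name a specific algorithm at this point. The five-cell corridor environment, the constants, and the function names are all made up for illustration.

```python
import random

N_STATES = 5          # positions 0..4; reaching position 4 ends the episode
ACTIONS = (-1, +1)    # step left or step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1   # learning rate, discount, exploration


def step(state, action):
    """Hypothetical environment: move, stay inside the corridor, reward 1 at the goal."""
    nxt = max(0, min(N_STATES - 1, state + action))
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def train(episodes=200, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < EPSILON:
                action = rng.choice(ACTIONS)          # explore
            else:
                best = max(q[(state, a)] for a in ACTIONS)
                # Exploit, breaking ties at random.
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            nxt, reward, done = step(state, action)
            future = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            # Nudge the estimate toward reward plus discounted future value.
            q[(state, action)] += ALPHA * (reward + GAMMA * future - q[(state, action)])
            state = nxt
    return q


q_table = train()
# The learned policy: the highest-valued action in each non-goal position.
policy = {s: max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(N_STATES - 1)}
```

Nothing tells the agent that moving right is correct; it only ever sees the reward at the goal, and the value estimates propagate backward from there.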
128 00:06:44.720 --> 00:06:46.560 So those are learning styles, but how do you actually 129 00:06:46.600 --> 00:06:49.240 build an AI using these? What's the first step with, 130 00:06:49.480 --> 00:06:51.759 you know, just raw data? Because I imagine you can't 131 00:06:51.800 --> 00:06:54.519 just dump raw data into these algorithms, right? It probably 132 00:06:54.600 --> 00:06:56.199 needs cleaning up, shaping. 133 00:06:56.319 --> 00:07:00.759 Oh, absolutely, that's a critical, often overlooked step. Data preparation, 134 00:07:01.240 --> 00:07:05.839 specifically data preprocessing. You've hit on a key point. Garbage in, 135 00:07:05.879 --> 00:07:09.720 garbage out. So preprocessing is all about taking that raw, messy, 136 00:07:10.120 --> 00:07:13.920 maybe incomplete data and transforming it into a clean, structured 137 00:07:13.959 --> 00:07:17.360 format that the machine learning algorithms can actually understand and 138 00:07:17.399 --> 00:07:18.399 work with effectively. 139 00:07:18.680 --> 00:07:21.519 Okay, so what does that involve, like specific techniques? 140 00:07:21.720 --> 00:07:25.720 Yeah, there are several standard techniques. One is binarization. That's 141 00:07:25.759 --> 00:07:29.680 basically converting numerical values into simple boolean values, zero or 142 00:07:29.720 --> 00:07:32.480 one, based on some threshold, like if a temperature is 143 00:07:32.480 --> 00:07:35.879 above thirty degrees, it's one, hot, otherwise zero, not hot. 144 00:07:36.160 --> 00:07:39.319 Simple but useful sometimes. Turning things into yes/no? Kind of, yeah. 145 00:07:39.399 --> 00:07:43.040 Then there's mean removal. This involves subtracting the average value, 146 00:07:43.120 --> 00:07:46.639 the mean, from each feature across all samples. This centers 147 00:07:46.639 --> 00:07:50.279 the data around zero, which can help some algorithms perform better. 148 00:07:50.519 --> 00:07:54.279 And another really common one is scaling.
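The three techniques named so far (binarization, mean removal, scaling) can be sketched in plain Python. This is an illustrative sketch rather than the sources' code: the function names and sample values are hypothetical, and the scaling shown is the zero-to-one (min-max) variant.

```python
def binarize(values, threshold):
    """Map each value to 1 if it is above the threshold, otherwise 0."""
    return [1 if v > threshold else 0 for v in values]


def remove_mean(values):
    """Subtract the average so the feature is centered on zero."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def min_max_scale(values):
    """Rescale linearly so the smallest value maps to 0 and the largest to 1.

    Assumes the feature is not constant (otherwise high == low).
    """
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]


temperatures = [18.0, 25.0, 31.0, 36.0]   # hypothetical readings in degrees
incomes = [20.0, 40.0, 60.0, 100.0]       # hypothetical incomes in thousands

hot_flags = binarize(temperatures, 30.0)  # 1 means "hot", 0 means "not hot"
centered = remove_mean(incomes)
scaled = min_max_scale(incomes)
```

In practice each transformation is applied per feature (per column), with the mean, minimum, and maximum computed on the training data.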
Data often comes 149 00:07:54.279 --> 00:07:57.000 in with features on vastly different scales, like age in 150 00:07:57.120 --> 00:08:00.879 years and income in thousands of dollars. Scaling brings all 151 00:08:00.920 --> 00:08:04.800 features to a comparable range, maybe between zero and one, 152 00:08:05.199 --> 00:08:08.000 or maybe so they have a standard deviation of one. 153 00:08:08.040 --> 00:08:11.800 This prevents features with larger values from unfairly dominating the 154 00:08:11.879 --> 00:08:12.680 learning process. 155 00:08:12.920 --> 00:08:16.040 Makes sense, so everyone gets a fair say, data wise. 156 00:08:15.879 --> 00:08:18.439 Exactly, it levels the playing field for the features. 157 00:08:18.839 --> 00:08:21.199 So, okay, the data is prepped, it's clean and scaled. 158 00:08:21.480 --> 00:08:24.319 Now what? What are some of those workhorse algorithms that 159 00:08:24.399 --> 00:08:26.079 actually start finding the patterns? 160 00:08:26.319 --> 00:08:28.279 Right. So, once the data is ready, you can apply 161 00:08:28.399 --> 00:08:31.079 various algorithms. Let's touch on a couple of common ones 162 00:08:31.120 --> 00:08:34.320 mentioned in the sources. One is naive Bayes. Naive Bayes? 163 00:08:34.519 --> 00:08:39.519 It's a classification technique based on Bayes' theorem from probability. 164 00:08:39.639 --> 00:08:43.320 The naive part comes from its core assumption. It assumes 165 00:08:43.360 --> 00:08:46.279 that all the features, all the input variables, are independent of 166 00:08:46.200 --> 00:08:48.519 each other. Which isn't always true in real life? 167 00:08:48.559 --> 00:08:51.679 Often not, no. That's why it's naive. But surprisingly it 168 00:08:51.679 --> 00:08:55.080 works really well in many situations, especially for text classification 169 00:08:55.159 --> 00:08:58.879 like spam filtering, and it's computationally very efficient, easy to build, 170 00:08:59.200 --> 00:09:01.840 and good with large data sets.
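Here is a rough sketch of the naive Bayes idea applied to text, assuming a word-count (multinomial) model with add-one smoothing. Both of those choices, the toy messages, and the function names are assumptions for illustration rather than anything taken from the sources.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """examples: list of (text, label). Returns per-class word counts and class counts."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, class_counts


def predict(model, text):
    word_counts, class_counts = model
    vocabulary = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        # Log prior: how common the class is overall.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # The "naive" step: every word contributes independently.
            # Add-one smoothing avoids log(0) for words unseen in this class.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical, hand-made training messages.
training = [
    ("win money now", "spam"),
    ("free prize win", "spam"),
    ("meeting at noon", "ham"),
    ("lunch meeting tomorrow", "ham"),
]
model = train_naive_bayes(training)
spam_guess = predict(model, "win a free prize")
ham_guess = predict(model, "meeting tomorrow at noon")
```

Training is just counting, which is why the method is cheap to build and scales to large data sets.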
Then you have support 171 00:09:01.919 --> 00:09:04.000 vector machines, or SVMs. 172 00:09:04.240 --> 00:09:05.840 SVM, heard of that one. 173 00:09:05.919 --> 00:09:09.919 Yes, a powerful supervised learning algorithm used for both classification 174 00:09:10.039 --> 00:09:14.000 and regression, though maybe more famous for classification. The basic 175 00:09:14.080 --> 00:09:16.919 idea is to represent the data points as vectors in 176 00:09:16.960 --> 00:09:20.759 space and find the optimal boundary, the hyperplane, that best 177 00:09:20.799 --> 00:09:22.559 separates the different classes. 178 00:09:22.279 --> 00:09:24.000 Like drawing a line between the dots. 179 00:09:24.240 --> 00:09:27.639 Essentially yes, but in potentially very high dimensional spaces. And 180 00:09:27.679 --> 00:09:30.320 it tries to find the line that has the maximum margin, 181 00:09:30.360 --> 00:09:33.399 the biggest possible gap between the classes, which often leads 182 00:09:33.440 --> 00:09:36.519 to good generalization. And you know, the sources actually give 183 00:09:36.519 --> 00:09:39.399 a concrete example using naive Bayes. They applied it to 184 00:09:39.440 --> 00:09:42.000 the Breast Cancer Wisconsin (Diagnostic) database. 185 00:09:42.159 --> 00:09:43.679 Oh wow, real medical data. 186 00:09:43.840 --> 00:09:46.559 Yeah. And the goal was to classify tumors as either 187 00:09:46.639 --> 00:09:50.200 malignant or benign based on certain features from diagnostic tests. 188 00:09:50.960 --> 00:09:54.320 And the naive Bayes classifier achieved an accuracy of ninety 189 00:09:54.399 --> 00:09:56.120 five point one seven percent. 190 00:09:56.279 --> 00:09:58.600 That's really high. Impressive. 191 00:09:58.679 --> 00:10:00.960 It is impressive, but it also highlights something crucial you 192 00:10:01.039 --> 00:10:04.600 mentioned earlier: just accuracy isn't always the whole story, especially 193 00:10:04.639 --> 00:10:07.159 in medicine.
You need to understand what kind of 194 00:10:07.200 --> 00:10:08.519 mistakes the model makes. 195 00:10:08.840 --> 00:10:11.360 Right, like telling someone they don't have cancer when they 196 00:10:11.360 --> 00:10:13.840 do is way worse than the other way around. 197 00:10:13.960 --> 00:10:17.240 Exactly, that's a false negative, and it's often much more 198 00:10:17.240 --> 00:10:20.480 critical to minimize than a false positive, a false alarm. 199 00:10:20.559 --> 00:10:21.960 This is where the confusion matrix comes 200 00:10:21.960 --> 00:10:23.759 in. The confusion matrix? It's just 201 00:10:23.720 --> 00:10:26.559 a simple table, really, that summarizes the performance of a 202 00:10:26.600 --> 00:10:30.159 classification model. It breaks down the predictions into four categories. 203 00:10:30.720 --> 00:10:34.159 True positives, TP, correctly identified positive. 204 00:10:33.840 --> 00:10:35.039 Got the cancer right. Yep. 205 00:10:35.200 --> 00:10:37.519 True negatives, TN, correctly identified 206 00:10:37.120 --> 00:10:38.840 negative. You correctly said no cancer. Right. 207 00:10:39.440 --> 00:10:43.480 False positives, FP, incorrectly identified as positive, a false alarm, and 208 00:10:43.519 --> 00:10:47.039 false negatives, FN, incorrectly identified as negative. 209 00:10:46.840 --> 00:10:48.679 The dangerous miss. That's the one you 210 00:10:48.639 --> 00:10:51.879 often worry about most. And from these four numbers in 211 00:10:51.919 --> 00:10:55.799 the matrix you calculate other important metrics. Besides overall accuracy, 212 00:10:56.279 --> 00:10:59.600 there's precision. Out of all the times the model said positive, 213 00:10:59.600 --> 00:11:01.039 how many are actually positive? 214 00:11:01.120 --> 00:11:03.679 How trustworthy are the positive predictions? Exactly. 215 00:11:03.799 --> 00:11:06.720 Then there's recall, also called sensitivity. Out of all the 216 00:11:06.720 --> 00:11:09.360 actual positive cases, how many did the model
217 00:11:09.120 --> 00:11:12.000 find? How good is it at catching the positives? Yep. 218 00:11:12.360 --> 00:11:16.320 And specificity: out of all the actual negative cases, how 219 00:11:16.320 --> 00:11:20.000 many did the model correctly identify as negative? So depending 220 00:11:20.000 --> 00:11:24.120 on the problem, medical diagnosis, spam filtering, fraud detection, you 221 00:11:24.200 --> 00:11:28.080 might care more about maximizing recall or precision or specificity, 222 00:11:28.360 --> 00:11:29.919 not just the overall accuracy. 223 00:11:30.000 --> 00:11:31.759 Okay, that makes a lot of sense. It's about choosing 224 00:11:31.799 --> 00:11:34.639 the right measure for what matters most. All right, let's 225 00:11:34.679 --> 00:11:37.480 shift gears a bit. We've talked about learning from data, 226 00:11:37.480 --> 00:11:41.080 classifying things. But what about how AI interacts with something 227 00:11:41.159 --> 00:11:44.799 uniquely human, like language? How do machines understand us? 228 00:11:45.200 --> 00:11:48.840 Ah, yeah, that's the fascinating world of natural language processing, 229 00:11:49.159 --> 00:11:52.960 or NLP. It's basically the field of AI focused on 230 00:11:53.159 --> 00:11:57.759 enabling computers to understand, interpret, and even generate human language, 231 00:11:57.960 --> 00:11:58.799 both spoken and 232 00:11:58.759 --> 00:12:02.080 written. Understanding and talking back, essentially? Pretty much. 233 00:12:02.080 --> 00:12:04.519 It usually breaks down into two main parts. There's natural 234 00:12:04.559 --> 00:12:08.279 language understanding, NLU, which is about figuring out the meaning 235 00:12:08.320 --> 00:12:11.320 behind the words, analyzing the structure, 236 00:12:10.840 --> 00:12:13.159 the intent. Making sense of the input. Right.
237 00:12:13.360 --> 00:12:16.320 And then there's natural language generation, NLG, which is the 238 00:12:16.320 --> 00:12:21.559 flip side, taking some internal computer representation or data and 239 00:12:21.639 --> 00:12:25.080 producing natural sounding language output, like writing a summary or 240 00:12:25.080 --> 00:12:29.279 answering a question coherently. Now, the NLU part, understanding language, 241 00:12:29.320 --> 00:12:32.799 is notoriously difficult because human language is just full of ambiguity. 242 00:12:32.879 --> 00:12:36.039 Ambiguity? How so? Well, on multiple levels. 243 00:12:36.080 --> 00:12:39.639 There's lexical ambiguity, single words having multiple meanings. Think of 244 00:12:39.679 --> 00:12:44.480 the word bank: riverbank, financial thing. Okay, yeah. Then syntactic ambiguity, 245 00:12:44.639 --> 00:12:47.840 where the sentence structure is unclear. The classic example is 246 00:12:47.960 --> 00:12:50.279 I saw the man on the hill with a telescope. 247 00:12:50.879 --> 00:12:53.840 Who has the telescope? Me? The man? Is it the 248 00:12:53.840 --> 00:12:54.759 man on the hill that has a 249 00:12:54.759 --> 00:12:57.720 telescope? Ah, right, grammar puzzles. Exactly. 250 00:12:58.159 --> 00:13:01.919 And then there's referential ambiguity, especially with pronouns. The cat 251 00:13:02.000 --> 00:13:04.919 chased the mouse and it was fast. What does it 252 00:13:05.360 --> 00:13:09.120 refer to, the cat or the mouse? Context usually tells us. 253 00:13:09.159 --> 00:13:10.519 But for a computer that's tricky. 254 00:13:10.720 --> 00:13:13.919 Wow, okay, so how does AI even begin to untangle 255 00:13:13.960 --> 00:13:14.279 all that? 256 00:13:14.600 --> 00:13:17.639 It has to break it down systematically. NLP typically involves 257 00:13:17.639 --> 00:13:21.639 a pipeline of steps. First is lexical analysis, just identifying 258 00:13:21.639 --> 00:13:25.720 words and their structures.
Then syntactic analysis, or parsing, which 259 00:13:25.759 --> 00:13:28.639 figures out the grammatical structure of the sentence, how words 260 00:13:28.399 --> 00:13:30.720 relate. Like diagramming sentences in school? 261 00:13:31.000 --> 00:13:34.120 Kind of like that, yeah, but automated. Then semantic analysis 262 00:13:34.159 --> 00:13:36.600 tries to figure out the actual meaning based on that structure. 263 00:13:36.799 --> 00:13:39.120 Discourse integration looks at how the meaning of a sentence 264 00:13:39.159 --> 00:13:42.360 depends on the sentences that came before it. Context matters hugely. 265 00:13:42.840 --> 00:13:46.159 And finally, pragmatic analysis tries to understand the meaning in 266 00:13:46.200 --> 00:13:50.000 the broader context of the situation, the speaker's intent, real 267 00:13:50.039 --> 00:13:53.360 world knowledge. It's about understanding not just what was said, 268 00:13:53.440 --> 00:13:56.679 but why. Now, to actually do this analysis, NLP uses 269 00:13:56.759 --> 00:14:01.039 various techniques, many available in Python's NLTK library. A fundamental one 270 00:14:01.080 --> 00:14:05.279 is tokenization. Tokenization? Just breaking text down into smaller units, 271 00:14:05.360 --> 00:14:09.440 or tokens, usually words, sometimes sentences, or even characters. It's 272 00:14:09.440 --> 00:14:12.399 a first step in processing text. Chopping it up? Pretty much. 273 00:14:12.960 --> 00:14:16.240 Then you have things like stemming and lemmatization. Both try 274 00:14:16.240 --> 00:14:19.240 to reduce words to their root or base form. Stemming 275 00:14:19.279 --> 00:14:21.240 is simpler, kind of a blunt tool, just chops off 276 00:14:21.320 --> 00:14:25.159 endings based on rules, so writing, writes, written might all 277 00:14:25.200 --> 00:14:28.480 become writ or write, depending on the stemmer. It's fast, 278 00:14:28.600 --> 00:14:32.840 but can be crude. Lemmatization is smarter.
It uses vocabulary 279 00:14:32.919 --> 00:14:36.279 and morphological analysis, understanding word structure and parts of speech, 280 00:14:36.480 --> 00:14:40.799 to get the actual dictionary form, the lemma. So writing, 281 00:14:41.360 --> 00:14:45.759 writes, written would likely all become write. And importantly, it 282 00:14:45.799 --> 00:14:49.080 can distinguish based on context, like the word saw could 283 00:14:49.120 --> 00:14:52.159 become see (verb) or stay saw (noun). 284 00:14:52.279 --> 00:14:54.120 More accurate, but probably slower. 285 00:14:54.320 --> 00:14:58.440 Generally yes, lemmatization is usually more linguistically correct. And then 286 00:14:58.480 --> 00:15:00.360 for machine learning on text, you need to convert words 287 00:15:00.399 --> 00:15:03.279 into numbers, into features. Two common ways are the bag 288 00:15:03.320 --> 00:15:05.080 of words, BoW, model. 289 00:15:05.240 --> 00:15:07.519 Bag of words sounds messy. 290 00:15:07.360 --> 00:15:10.200 Huh, it kind of is. It basically treats a document 291 00:15:10.240 --> 00:15:12.600 as just a collection, a bag, of its words, ignoring 292 00:15:12.639 --> 00:15:14.960 grammar and word order, and just counts how many times 293 00:15:14.960 --> 00:15:17.759 each word appears. Simple but often effective for things like 294 00:15:17.799 --> 00:15:21.279 topic classification. Just the counts matter? Mostly, yes, yeah. And 295 00:15:21.320 --> 00:15:25.000 a more sophisticated approach is TF-IDF, which stands for term 296 00:15:25.039 --> 00:15:29.480 frequency-inverse document frequency. TF-IDF? This tries to figure out 297 00:15:29.519 --> 00:15:31.639 how important a word is to a document within a 298 00:15:31.679 --> 00:15:35.000 larger collection of documents. It weighs words higher if they 299 00:15:35.039 --> 00:15:39.120 appear frequently in one document, term frequency, but rarely in 300 00:15:39.159 --> 00:15:43.200 other documents, inverse document frequency.
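To make bag of words and TF-IDF concrete, here is a small sketch. It assumes whitespace tokenization and the plain, unsmoothed weighting tf * log(N / df); libraries often use smoothed variants, so their exact numbers will differ. The sample documents and function names are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Count how often each word occurs, ignoring order and grammar."""
    return Counter(text.lower().split())


def tf_idf(documents):
    """Return one {word: weight} dict per document."""
    bags = [bag_of_words(doc) for doc in documents]
    n_docs = len(documents)
    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter(word for bag in bags for word in bag)
    weights = []
    for bag in bags:
        total = sum(bag.values())
        weights.append({
            # Term frequency times inverse document frequency.
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in bag.items()
        })
    return weights


docs = [
    "the cat chased the mouse",
    "the dog chased the ball",
    "the market closed higher",
]
counts = bag_of_words(docs[0])
scores = tf_idf(docs)
```

In this toy collection, "the" appears in every document, so its weight drops to zero, while a word found in only one document, such as "cat", gets the highest weight in that document.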
This helps filter out common 301 00:15:43.200 --> 00:15:46.759 words like the or is and highlight the truly meaningful 302 00:15:46.440 --> 00:15:50.159 terms. So it finds the keywords, essentially? In a statistical way, 303 00:15:50.240 --> 00:15:54.840 yes. And these techniques, BoW and TF-IDF, are really useful 304 00:15:54.840 --> 00:15:58.639 for things like automatically predicting the category of a news article, or, 305 00:15:58.679 --> 00:16:01.840 as the sources mention, even something like predicting gender based 306 00:16:01.840 --> 00:16:02.440 on names. 307 00:16:02.720 --> 00:16:05.960 Interesting. Okay, so that's text. What about sound? How does 308 00:16:06.000 --> 00:16:08.840 AI hear and understand speech? Right, 309 00:16:08.919 --> 00:16:12.879 speech recognition. It's about getting a machine to understand spoken language, 310 00:16:13.320 --> 00:16:16.240 and the difficulty really varies. A big factor is the 311 00:16:16.320 --> 00:16:20.240 vocabulary size. A system designed to understand just digits, zero 312 00:16:20.320 --> 00:16:23.039 through nine, is much simpler than one trying to handle 313 00:16:23.200 --> 00:16:25.519 general dictation with tens of thousands 314 00:16:25.120 --> 00:16:27.840 of words. Makes sense, fewer options. Exactly. 315 00:16:28.159 --> 00:16:31.639 Then you have channel characteristics. Is it a clean recording or 316 00:16:31.720 --> 00:16:34.799 is there lots of background noise? Signal to noise ratio 317 00:16:34.919 --> 00:16:38.159 is key, and the microphone quality and placement matter too. 318 00:16:38.440 --> 00:16:41.039 Okay, so how does it process the sound itself? 319 00:16:41.480 --> 00:16:45.519 Well, the first steps are usually recording, which digitizes the 320 00:16:45.559 --> 00:16:49.759 analog sound wave, and sampling, which converts that continuous signal 321 00:16:49.759 --> 00:16:52.840 into a series of discrete numerical values at a certain rate, 322 00:16:53.080 --> 00:16:54.000 the sampling
323 00:16:53.639 --> 00:16:56.120 frequency. Turning sound into numbers? Precisely. 324 00:16:56.639 --> 00:16:59.039 Then to make sense of those numbers, AI needs to 325 00:16:59.080 --> 00:17:02.840 extract meaningful features from the speech signal. A very common 326 00:17:02.879 --> 00:17:09.640 technique here is using MFCCs, Mel-frequency cepstral coefficients. MFCCs? Catchy, huh. 327 00:17:09.880 --> 00:17:12.640 Yeah. They're basically a way to represent the short term 328 00:17:12.680 --> 00:17:15.720 power spectrum of the sound, but transformed onto a scale, 329 00:17:15.759 --> 00:17:19.359 the mel scale, that better reflects human hearing perception. It 330 00:17:19.400 --> 00:17:22.640 helps capture the unique characteristics of different sounds, like vowels 331 00:17:22.640 --> 00:17:25.319 and consonants, in a compact form that the AI can 332 00:17:25.400 --> 00:17:25.960 learn from. 333 00:17:26.039 --> 00:17:28.759 So it's finding the fingerprints of speech sounds. That's a 334 00:17:28.759 --> 00:17:30.839 good way to put it, and the sources point out 335 00:17:30.880 --> 00:17:33.599 how practical this has become, mentioning things like using the 336 00:17:33.640 --> 00:17:37.400 Google Speech API, readily available tools that let developers incorporate 337 00:17:37.440 --> 00:17:38.960 speech recognition into their apps. 338 00:17:39.440 --> 00:17:43.240 Very cool. All right, so we've covered language and hearing. 339 00:17:43.680 --> 00:17:46.119 What about sight? How does AI see? 340 00:17:46.440 --> 00:17:49.640 That brings us to computer vision, or CV. This is 341 00:17:49.680 --> 00:17:52.000 the field that tries to enable machines to see and 342 00:17:52.079 --> 00:17:55.799 interpret the visual world, usually from digital images or videos. 343 00:17:56.279 --> 00:17:59.759 The goal is often to reconstruct, understand, or interpret a 344 00:17:59.759 --> 00:18:01.799 3D scene from its 2D images.
345 00:18:02.079 --> 00:18:04.480 So it's more than just processing an image. 346 00:18:04.559 --> 00:18:07.640 Yeah, that's a key distinction. Image processing usually takes an 347 00:18:07.680 --> 00:18:11.240 image as input and produces another image as output, maybe enhanced 348 00:18:11.440 --> 00:18:14.400 or filtered. Computer vision, on the other hand, takes an 349 00:18:14.440 --> 00:18:17.079 image as input but aims to produce some kind of 350 00:18:17.160 --> 00:18:21.119 understanding or description as output. What objects are in the image? 351 00:18:21.119 --> 00:18:22.319 Where are they? What's happening? 352 00:18:22.480 --> 00:18:25.039 Got it, understanding, not just tweaking. 353 00:18:24.759 --> 00:18:27.880 Exactly. And the applications are just huge. Think robotics, helping 354 00:18:27.960 --> 00:18:32.720 robots navigate, identify objects, avoid obstacles, even understand human gestures. 355 00:18:32.960 --> 00:18:36.759 Self-driving cars must use this heavily. Absolutely, a prime example. 356 00:18:36.799 --> 00:18:41.000 And in medicine it's revolutionizing things like analyzing medical scans 357 00:18:41.039 --> 00:18:44.839 to detect tumors or anomalies, reconstructing 3D models of organs, 358 00:18:45.480 --> 00:18:49.160 really powerful stuff. A cornerstone library for doing CV in 359 00:18:49.200 --> 00:18:53.400 Python is OpenCV, the Open Source Computer Vision Library. OpenCV? 360 00:18:53.640 --> 00:18:54.000 Okay. 361 00:18:54.160 --> 00:18:56.680 It's incredibly powerful and versatile. Lets you do all sorts 362 00:18:56.720 --> 00:19:00.680 of things: read, write, display images and video, convert between color 363 00:19:00.720 --> 00:19:04.920 spaces, like from standard BGR color to grayscale, detect edges 364 00:19:05.039 --> 00:19:06.680 using algorithms like the Canny edge 365 00:19:06.559 --> 00:19:08.839 detector. Finding the outlines of things? Yep. 366 00:19:08.920 --> 00:19:12.799 And even more complex tasks like object detection.
OpenCV includes 367 00:19:12.839 --> 00:19:16.119 pre-trained models like Haar cascade classifiers that are quite 368 00:19:16.119 --> 00:19:19.319 effective at detecting specific objects like faces or even eyes 369 00:19:19.359 --> 00:19:20.799 within an image in real time. 370 00:19:20.920 --> 00:19:24.000 Wow, okay, so AI can learn. It can understand language, 371 00:19:24.039 --> 00:19:26.680 process audio, see the world. Yeah. How does it build 372 00:19:26.720 --> 00:19:30.240 the brains behind all this? The complex structures that enable 373 00:19:30.279 --> 00:19:31.160 this advanced stuff? 374 00:19:31.279 --> 00:19:33.640 Right. That takes us into the realm of neural networks 375 00:19:33.680 --> 00:19:37.799 and deep learning. At a basic level, artificial neural networks, 376 00:19:37.839 --> 00:19:42.079 ANNs, are computing systems inspired by the structure and function 377 00:19:42.359 --> 00:19:45.000 of the biological neural networks that make up animal brains. 378 00:19:45.440 --> 00:19:47.400 Like modeling the brain? Loosely, 379 00:19:47.119 --> 00:19:51.640 yes. They consist of interconnected nodes, or neurons, organized in layers. 380 00:19:52.160 --> 00:19:55.039 Each connection has a weight associated with it, and these 381 00:19:55.039 --> 00:19:58.799 weights are adjusted during the learning process. They essentially learn 382 00:19:58.880 --> 00:20:03.319 to recognize patterns by strengthening or weakening these connections based 383 00:20:03.359 --> 00:20:06.200 on the input data. Deep learning is essentially a type 384 00:20:06.200 --> 00:20:09.880 of machine learning that uses ANNs with many layers, 385 00:20:10.000 --> 00:20:13.039 hence deep. What makes deep learning special is that these 386 00:20:13.119 --> 00:20:16.640 layered structures allow the model to learn hierarchies of features 387 00:20:16.680 --> 00:20:17.680 directly from the data. 388 00:20:17.759 --> 00:20:18.440 Hierarchies?
389 00:20:18.519 --> 00:20:21.119 Yeah, so, for image recognition, the first layer might learn 390 00:20:21.160 --> 00:20:23.799 to detect simple edges. The next layer might combine edges 391 00:20:23.839 --> 00:20:26.400 to detect shapes. The layer after that might combine shapes 392 00:20:26.440 --> 00:20:28.519 to detect parts of objects, like an eye or a nose, 393 00:20:28.880 --> 00:20:32.039 and later layers combine those parts to recognize whole objects 394 00:20:32.279 --> 00:20:36.119 like a face. It learns these representations automatically. 395 00:20:35.519 --> 00:20:38.759 So it builds understanding layer by layer. Exactly. 396 00:20:38.680 --> 00:20:41.599 And that's a key difference from traditional machine learning. In 397 00:20:41.680 --> 00:20:45.880 traditional ML, you often need significant human effort to design 398 00:20:45.920 --> 00:20:49.160 and select the right features from the data first. Deep 399 00:20:49.240 --> 00:20:52.200 learning aims to learn those features automatically as part of 400 00:20:52.240 --> 00:20:52.799 the process. 401 00:20:53.000 --> 00:20:56.640