WEBVTT 1 00:00:00.120 --> 00:00:04.280 Welcome to the Deep Dive. Today, we're really getting into 2 00:00:04.320 --> 00:00:07.120 the weeds of artificial intelligence. 3 00:00:06.719 --> 00:00:09.839 Specifically for .NET developers. 4 00:00:09.320 --> 00:00:11.839 Right, exactly. We're looking at how you can actually use 5 00:00:11.880 --> 00:00:14.480 things like speech, language, and search. 6 00:00:14.480 --> 00:00:17.559 All powered by Microsoft's Cognitive Services, based on the source 7 00:00:17.559 --> 00:00:18.280 material we've got. 8 00:00:18.359 --> 00:00:21.760 Yeah, so think of this as, well, moving past the buzzwords. 9 00:00:21.239 --> 00:00:23.640 Right, from the big ideas down to the practical tools 10 00:00:23.679 --> 00:00:25.320 you can use to build smarter apps. 11 00:00:25.480 --> 00:00:27.960 Our goal here is, you know, to cut through the noise, 12 00:00:28.120 --> 00:00:30.079 pull out the really key concepts, get 13 00:00:29.960 --> 00:00:33.439 those aha moments, maybe uncover some surprising bits. 14 00:00:33.399 --> 00:00:35.759 And do it without you needing to, like, write a 15 00:00:35.799 --> 00:00:38.079 single line of code right now. It's 16 00:00:37.920 --> 00:00:41.799 about understanding the essentials, the foundation, the services, so you 17 00:00:41.920 --> 00:00:42.960 know what's possible. 18 00:00:43.240 --> 00:00:47.520 Okay, so let's lay that foundation. AI: it's everywhere, often 19 00:00:47.560 --> 00:00:49.960 sounds like science fiction. What's the real deal? 20 00:00:50.200 --> 00:00:53.240 Well, at its core, it's about building systems that do 21 00:00:53.359 --> 00:00:54.920 things needing human- 22 00:00:54.759 --> 00:00:58.799 like intelligence, but maybe not sentient robots just yet. 23 00:00:58.880 --> 00:01:02.960 Ha, no, it's much more grounded. Think specific tasks, specific 24 00:01:03.039 --> 00:01:04.760 capabilities that are available now.
25 00:01:05.040 --> 00:01:07.920 And getting here wasn't exactly a smooth ride, was it? 26 00:01:08.480 --> 00:01:10.159 There were these AI winters. 27 00:01:10.480 --> 00:01:13.920 That's right, two main periods, sort of mid-seventies to 28 00:01:14.000 --> 00:01:17.439 early eighties, and then again late eighties to early nineties. 29 00:01:17.680 --> 00:01:20.680 What happened there? Basically, the hype got way ahead of 30 00:01:20.680 --> 00:01:23.640 the actual technology. Promises were made that just couldn't be 31 00:01:23.680 --> 00:01:25.519 delivered with the computing power 32 00:01:25.200 --> 00:01:29.079 back then. So, uh, funding dried up, progress stalled. Exactly. 33 00:01:29.120 --> 00:01:32.879 People got disillusioned. But then things started picking up again. 34 00:01:33.239 --> 00:01:34.079 Why? What changed? 35 00:01:34.159 --> 00:01:37.920 Computers got faster, cheaper, you know, Moore's law in action. 36 00:01:38.640 --> 00:01:40.879 Suddenly some older ideas became 37 00:01:40.599 --> 00:01:45.200 feasible, and we started seeing these specialized systems achieving big things. 38 00:01:45.400 --> 00:01:48.799 Yes, like IBM's Deep Blue beating Garry Kasparov at 39 00:01:48.879 --> 00:01:49.920 chess in ninety-seven. 40 00:01:50.040 --> 00:01:52.719 That was huge, a machine beating the world champ in 41 00:01:52.760 --> 00:01:53.719 such a complex game. 42 00:01:53.840 --> 00:01:56.040 It showed that focused AI could really excel. But the 43 00:01:56.680 --> 00:01:58.920 current boom, that really kicked off more recently. 44 00:01:58.640 --> 00:02:00.680 Driven by the tech giants investing heavily. 45 00:02:00.879 --> 00:02:04.120 Absolutely, and then you had moments like IBM Watson winning 46 00:02:04.200 --> 00:02:06.120 Jeopardy, yeah, in twenty eleven. 47 00:02:05.920 --> 00:02:09.199 Right, that really showcased natural language processing in a big way.
48 00:02:09.240 --> 00:02:13.520 It did, and after that companies really started productizing these 49 00:02:13.599 --> 00:02:17.120 AI capabilities, making them available as services. 50 00:02:16.719 --> 00:02:19.520 Which brings us to today, where AI feels like it's 51 00:02:19.560 --> 00:02:21.439 baked into so much tech. 52 00:02:21.680 --> 00:02:25.719 Your phone, websites, even video game characters reacting to what 53 00:02:25.800 --> 00:02:26.120 you do. 54 00:02:26.479 --> 00:02:30.199 So it went from big dreams, some setbacks, and now 55 00:02:30.240 --> 00:02:32.599 it's practical, usable tech. 56 00:02:32.479 --> 00:02:35.639 Pretty much, and that practicality is changing how we even 57 00:02:35.719 --> 00:02:36.919 interact with computers. 58 00:02:37.159 --> 00:02:40.199 Let's talk about user interfaces. We started way back with 59 00:02:40.280 --> 00:02:41.560 the command line, the CLI. 60 00:02:41.719 --> 00:02:44.840 Powerful, yeah, if you knew the commands, but super intimidating 61 00:02:44.840 --> 00:02:45.840 for beginners. 62 00:02:45.520 --> 00:02:47.879 Like learning a secret code. Kind of, yeah. 63 00:02:47.919 --> 00:02:51.680 Then came the GUI, the graphical user interface. 64 00:02:51.319 --> 00:02:54.439 Total game changer: windows, icons, the mouse. 65 00:02:54.360 --> 00:02:58.240 Building on work from places like Xerox PARC, then popularized 66 00:02:58.280 --> 00:03:01.120 by Apple and Microsoft. It made computing accessible. 67 00:03:01.159 --> 00:03:02.759 You could see what you were doing. Exactly. 68 00:03:03.080 --> 00:03:06.840 But now there's another shift happening, towards conversation. 69 00:03:06.560 --> 00:03:09.520 The conversational user interface, or CUI. Right. 70 00:03:09.639 --> 00:03:12.080 The idea is you just talk or type to the system, 71 00:03:12.199 --> 00:03:14.759 like messaging a friend. No clicking through menus. 72 00:03:14.639 --> 00:03:17.719 So ordering pizza is just typing "get me a large pepperoni."
73 00:03:18.199 --> 00:03:22.280 That's the goal, simple natural interaction. Messaging apps have made 74 00:03:22.360 --> 00:03:23.960 us really comfortable 75 00:03:23.479 --> 00:03:25.120 with this. Okay, but that sounds like it needs some 76 00:03:25.159 --> 00:03:26.479 serious smarts behind it. 77 00:03:26.479 --> 00:03:30.879 It absolutely does. This is where AI is crucial, specifically 78 00:03:31.039 --> 00:03:33.960 natural language understanding, NLU. 79 00:03:33.719 --> 00:03:36.599 Because the system has to figure out what you actually mean, 80 00:03:37.120 --> 00:03:38.120 not just what you typed. 81 00:03:38.360 --> 00:03:42.159 Precisely, it needs NLU in the back end to interpret 82 00:03:42.199 --> 00:03:43.520 that conversational input. 83 00:03:43.680 --> 00:03:48.439 So "what's the weather" and "forecast for today" mean the 84 00:03:48.439 --> 00:03:49.439 same thing to the system? 85 00:03:49.680 --> 00:03:52.960 A good NLU system should understand that, yes. It identifies 86 00:03:53.039 --> 00:03:54.960 the user's goal, the intent. 87 00:03:55.240 --> 00:03:57.000 It pulls out the important bits of information. 88 00:03:57.199 --> 00:04:00.759 Those are the entities. So, "weather in London tomorrow": the 89 00:04:00.759 --> 00:04:03.680 intent is get weather; the entities are London and tomorrow. 90 00:04:03.840 --> 00:04:06.960 Seems intuitive for us, but you mentioned CUIs aren't perfect. 91 00:04:07.039 --> 00:04:10.400 There are challenges? Oh, definitely. They struggle with really complex, 92 00:04:10.520 --> 00:04:12.840 nuanced conversations, and there are risks. 93 00:04:13.120 --> 00:04:17.519 Remember Tay, Microsoft's Twitter bot? Oh yeah, that one went sideways fast. 94 00:04:17.800 --> 00:04:22.000 It learned from interactions, but unfortunately it learned toxic stuff 95 00:04:22.160 --> 00:04:25.160 very quickly and started spewing offensive tweets. 96 00:04:25.360 --> 00:04:27.199 Had to be shut down almost immediately.
97 00:04:27.319 --> 00:04:30.360 A really stark reminder that AI learning from the real 98 00:04:30.399 --> 00:04:33.480 world needs careful controls. You can't just let it loose 99 00:04:33.480 --> 00:04:34.399 without safeguards. 100 00:04:34.600 --> 00:04:38.279 So maybe a mix is better for now, combining CUI and GUI? 101 00:04:38.560 --> 00:04:41.879 Yeah, the source suggests a hybrid approach often makes sense. 102 00:04:42.360 --> 00:04:46.160 Use conversation for simple things, stick to graphical interfaces for 103 00:04:46.240 --> 00:04:47.439 more complex tasks. 104 00:04:47.519 --> 00:04:51.079 Okay, let's dig into that NLU piece more. It's fundamental. 105 00:04:51.720 --> 00:04:54.800 Why is it considered an AI-hard problem? 106 00:04:54.639 --> 00:04:58.240 Because human language is just incredibly complex and subtle. Getting 107 00:04:58.279 --> 00:05:01.560 a machine to grasp it properly isn't just one algorithm. 108 00:05:01.720 --> 00:05:04.560 It's like computer vision or machine translation: it requires lots of 109 00:05:04.560 --> 00:05:05.800 different techniques working together. 110 00:05:05.959 --> 00:05:09.639 Exactly, there are multiple layers of difficulty. Like what? Well, first, 111 00:05:09.639 --> 00:05:13.680 there's syntax, the grammar, the structure of sentences. Machines need 112 00:05:13.720 --> 00:05:14.680 to parse that correctly. 113 00:05:14.720 --> 00:05:16.439 Okay, sentence rules, makes sense. 114 00:05:16.600 --> 00:05:20.560 Then semantics. That's the meaning of words and sentences: synonyms, 115 00:05:20.600 --> 00:05:21.879 words with multiple meanings. 116 00:05:22.160 --> 00:05:26.319 "I like apples" versus "I'm fond of apples": same meaning, 117 00:05:26.480 --> 00:05:27.160 different words. 118 00:05:27.319 --> 00:05:30.399 Right, the machine needs to get that underlying concept. 119 00:05:30.600 --> 00:05:32.439 That sounds tricky. What's 120 00:05:32.279 --> 00:05:36.800 next? Pragmatics. This is maybe the toughest bit.
It's understanding 121 00:05:36.920 --> 00:05:40.759 the implied meaning, the context, what's not being explicitly said. Exactly, 122 00:05:40.800 --> 00:05:42.480 if I say "wow, it's hot in here," I might 123 00:05:42.519 --> 00:05:46.319 actually mean "can you open a window?" The machine needs 124 00:05:46.519 --> 00:05:48.720 situational awareness. 125 00:05:48.399 --> 00:05:51.519 Which computers usually lack. They don't have our common sense 126 00:05:51.600 --> 00:05:53.199 or world knowledge. Precisely. 127 00:05:53.680 --> 00:05:55.319 And then you've got just plain 128 00:05:55.319 --> 00:05:57.480 ambiguity, words meaning different things. 129 00:05:57.360 --> 00:06:00.720 Yeah, like "bank": riverbank or financial bank? Or sentences that 130 00:06:00.759 --> 00:06:03.000 can be read multiple ways, like "I saw a man 131 00:06:03.040 --> 00:06:04.759 on a hill with a telescope." 132 00:06:04.519 --> 00:06:06.800 The classic: who has the telescope? Right. 133 00:06:06.879 --> 00:06:10.680 And finally, just the sheer variation in language: spoken versus 134 00:06:10.680 --> 00:06:15.000 written, dialects, slang, typos. It's messy, very messy for a 135 00:06:15.000 --> 00:06:17.240 machine to handle consistently. 136 00:06:16.639 --> 00:06:22.160 So syntax, semantics, pragmatics, ambiguity, variation: quite a challenge. Were 137 00:06:22.160 --> 00:06:23.759 there early attempts to crack this? 138 00:06:24.240 --> 00:06:27.800 Yeah, some famous ones. ELIZA, back in the sixties, mimicked 139 00:06:27.839 --> 00:06:31.680 a therapist using pattern matching. It seemed smart, but didn't really 140 00:06:31.560 --> 00:06:33.199 understand. More like clever tricks. 141 00:06:33.360 --> 00:06:37.240 Kind of. A bigger step was SHRDLU, around nineteen seventy. 142 00:06:37.480 --> 00:06:39.199 SHRDLU? What did it do? 143 00:06:39.560 --> 00:06:43.279 It operated in a tiny virtual world of blocks.
You 144 00:06:43.319 --> 00:06:45.480 could tell it "pick up the blue pyramid" or ask 145 00:06:45.600 --> 00:06:49.519 questions about the blocks. And it understood, within that very limited world? Yes, 146 00:06:49.639 --> 00:06:53.199 remarkably well. It showed that NLU was possible if you 147 00:06:53.279 --> 00:06:55.279 constrain the domain significantly. 148 00:06:55.800 --> 00:06:59.759 Okay, so NLU is hard but vital. How do developers 149 00:06:59.839 --> 00:07:02.680 like our listeners actually use it today without building it 150 00:07:02.680 --> 00:07:03.279 all themselves? 151 00:07:03.360 --> 00:07:06.959 That's where cloud services come in, like Microsoft's LUIS, the Language 152 00:07:07.040 --> 00:07:09.160 Understanding Intelligent Service. LUIS. 153 00:07:09.240 --> 00:07:11.560 So it's like NLU as a service? Pretty much. 154 00:07:11.600 --> 00:07:13.360 You don't need to be the deep learning expert. Your 155 00:07:13.439 --> 00:07:16.199 job is mainly to train it for your specific application. 156 00:07:16.319 --> 00:07:17.399 How does that training work? 157 00:07:17.600 --> 00:07:21.680 You feed it example sentences, they're called utterances, that your users might 158 00:07:21.560 --> 00:07:24.839 say. Things like "find me a nearby Italian restaurant." 159 00:07:24.720 --> 00:07:27.480 Exactly. And for each utterance you tell LUIS the 160 00:07:27.600 --> 00:07:30.600 user's goal, the intent, like find restaurant, and you 161 00:07:30.680 --> 00:07:32.079 label the key info, the 162 00:07:32.279 --> 00:07:37.720 entities. So Italian would be cuisine type, nearby implies location? Precisely. 163 00:07:37.959 --> 00:07:41.160 You provide lots of examples, LUIS learns from them using 164 00:07:41.199 --> 00:07:44.800 machine learning algorithms. Then when a new sentence comes in, 165 00:07:44.879 --> 00:07:47.399 it predicts the intent and extracts the entities. 166 00:07:47.839 --> 00:07:49.639 What are the main bits you configure in LUIS?
167 00:07:50.079 --> 00:07:52.800 You define your intents, the actions users can take, and 168 00:07:52.839 --> 00:07:55.079 you define your entities, the data points you need. 169 00:07:55.480 --> 00:07:57.040 Are there different types of entities? 170 00:07:57.319 --> 00:07:59.959 Yes, quite a few. Simple entities are ones you define, 171 00:08:00.240 --> 00:08:04.120 like product category. But LUIS also has prebuilt entities, 172 00:08:04.120 --> 00:08:05.399 which are super useful. 173 00:08:05.560 --> 00:08:06.199 What do they cover? 174 00:08:06.439 --> 00:08:10.839 Common stuff like dates, times, numbers, locations, email addresses, percentages. 175 00:08:10.959 --> 00:08:13.199 Saves you a lot of effort. It already knows how 176 00:08:13.240 --> 00:08:16.040 to recognize "next Tuesday at three pm." 177 00:08:16.240 --> 00:08:17.279 That's handy. What else? 178 00:08:17.439 --> 00:08:20.920 You can create composite entities to group related entities, like 179 00:08:21.000 --> 00:08:25.439 an order entity containing item and quantity, and hierarchical entities 180 00:08:25.480 --> 00:08:29.199 for parent-child relationships, like person name having first name 181 00:08:29.240 --> 00:08:30.439 and last name. Helps 182 00:08:30.360 --> 00:08:33.200 structure the extracted data. What about phrase lists? 183 00:08:33.799 --> 00:08:37.480 Think of them as giving LUIS hints. You list words 184 00:08:37.559 --> 00:08:41.639 or phrases that are strong indicators for certain intents or entities, 185 00:08:42.039 --> 00:08:44.679 like a list of all your product names or synonyms 186 00:08:44.759 --> 00:08:46.039 for "book a meeting." 187 00:08:46.200 --> 00:08:49.000 It helps boost the signal for important terms. Exactly. 188 00:08:49.039 --> 00:08:51.399 And then there's active learning. This is really important 189 00:08:51.440 --> 00:08:52.799 after you launch. What does that do? 190 00:08:53.080 --> 00:08:56.720 LUIS identifies utterances it wasn't very sure about.
It shows 191 00:08:56.799 --> 00:08:59.799 them to you. You clarify the correct intents and entities, 192 00:09:00.240 --> 00:09:03.919 and that feedback helps retrain and improve the model over time. 193 00:09:04.360 --> 00:09:07.639 So the model gets smarter based on real user interactions. 194 00:09:07.879 --> 00:09:10.120 Correct. It's a continuous improvement cycle. And 195 00:09:10.120 --> 00:09:13.519 the overall flow for an app using LUIS? Your 196 00:09:13.399 --> 00:09:15.919 app gets the user's text, sends it to the 197 00:09:15.960 --> 00:09:19.279 LUIS API. LUIS sends back JSON with the predicted intent 198 00:09:19.440 --> 00:09:22.039 and entities. Your app uses that info to 199 00:09:21.960 --> 00:09:25.840 do the right thing, like calling another API, querying a database, whatever 200 00:09:25.600 --> 00:09:28.639 the action is. Exactly. It integrates nicely with things like 201 00:09:28.720 --> 00:09:31.399 the Microsoft Bot Framework for building chatbots. 202 00:09:31.679 --> 00:09:36.120 Okay. LUIS handles the core understanding. What other text analysis 203 00:09:36.159 --> 00:09:38.639 tools are there in Cognitive Services? 204 00:09:38.679 --> 00:09:42.559 Several useful ones. There's the Bing Spell Check API. Just 205 00:09:42.600 --> 00:09:46.159 basic spell check? It's smarter than that. It's contextual. It 206 00:09:46.240 --> 00:09:49.799 understands that "booking" is correct in "booking a flight," but 207 00:09:49.879 --> 00:09:53.919 maybe not somewhere else. It gets proper nouns like Microsoft, 208 00:09:54.399 --> 00:09:56.840 even if slightly misspelled. Ah, 209 00:09:56.879 --> 00:09:59.799 so it considers the surrounding words. Useful for cleaning up 210 00:09:59.879 --> 00:10:01.159 user input. Definitely. 211 00:10:01.519 --> 00:10:04.919 It even handles some slang and common brand name misspellings. 212 00:10:05.120 --> 00:10:06.480 What else in the text suite?
213 00:10:06.639 --> 00:10:10.159 The Text Analytics API bundles a few things. Language detection 214 00:10:10.279 --> 00:10:13.320 figures out what language the text is in, useful for routing 215 00:10:13.320 --> 00:10:15.559 support tickets or filtering content. 216 00:10:15.360 --> 00:10:17.639 And sentiment analysis? That seems really popular. 217 00:10:17.759 --> 00:10:21.000 It is. Analyzing if text is positive, negative, or neutral. 218 00:10:21.360 --> 00:10:25.200 Companies use it constantly for customer reviews, social media monitoring. 219 00:10:24.799 --> 00:10:26.960 Getting a pulse on customer opinion at scale. 220 00:10:26.759 --> 00:10:29.360 Right, it usually gives a score, like point nine for 221 00:10:29.559 --> 00:10:31.720 very positive, point one for very negative. 222 00:10:31.759 --> 00:10:33.120 Does it do summarization too? 223 00:10:33.480 --> 00:10:36.759 Not exactly summarization, but key phrase extraction pulls out the 224 00:10:36.799 --> 00:10:40.679 main talking points, the important noun phrases, and topic detection 225 00:10:40.840 --> 00:10:45.000 can group large amounts of text, like reviews, into underlying themes. 226 00:10:45.279 --> 00:10:48.039 The source also mentioned something called the Web Language Model, 227 00:10:48.279 --> 00:10:48.960 or Web LM. 228 00:10:49.240 --> 00:10:52.919 Yeah, that's a language model trained on, well, huge amounts 229 00:10:52.960 --> 00:10:56.639 of web data from Bing. It understands common word sequences 230 00:10:56.639 --> 00:10:57.600 and probabilities. 231 00:10:57.679 --> 00:10:58.200 What's that used 232 00:10:58.279 --> 00:11:02.480 for? Things like word breaking, splitting "buyticketsnow" into 233 00:11:02.919 --> 00:11:07.440 "buy tickets now." Calculating joint probability: how likely is the 234 00:11:07.480 --> 00:11:12.080 phrase "natural language processing" versus, say, "natural language pineapple"? Okay, 235 00:11:12.240 --> 00:11:14.320 measuring how natural a phrase sounds.
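NOTE Editor's sketch, not part of the recording or the source material. The word breaking and joint probability ideas just described can be illustrated with a toy unigram model. The `UNIGRAM` table, its probabilities, and the function names are all made up for illustration; this is not the Web LM API. Python is used only for brevity, and the same logic ports directly to C#.

```python
import math

# Hypothetical toy unigram probabilities (invented for this sketch).
# A real web-scale model learns these from very large text corpora.
UNIGRAM = {
    "buy": 0.02, "tickets": 0.01, "now": 0.03,
    "natural": 0.01, "language": 0.01,
    "processing": 0.005, "pineapple": 0.0001,
}


def log_joint(words):
    """Log joint probability of a sequence of known words.

    A unigram model treats words as independent, so it ignores word order;
    unknown words raise KeyError in this sketch.
    """
    return sum(math.log(UNIGRAM[w]) for w in words)


def break_words(text):
    """Return the most probable split of `text` into known words, or None."""
    # best[i] holds (log probability, words) for the best split of text[:i].
    best = [None] * (len(text) + 1)
    best[0] = (0.0, [])
    for end in range(1, len(text) + 1):
        for start in range(end):
            piece = text[start:end]
            if best[start] is None or piece not in UNIGRAM:
                continue
            score = best[start][0] + math.log(UNIGRAM[piece])
            if best[end] is None or score > best[end][0]:
                best[end] = (score, best[start][1] + [piece])
    return best[-1][1] if best[-1] else None


print(break_words("buyticketsnow"))  # ['buy', 'tickets', 'now']
likely = log_joint(["natural", "language", "processing"])
unlikely = log_joint(["natural", "language", "pineapple"])
print(likely > unlikely)  # True
```

A production model would score word sequences with higher-order n-grams rather than independent words, which is what lets it tell natural phrasing from merely common vocabulary.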
236 00:11:14.039 --> 00:11:17.919 And conditional probability, predicting the next word: given "artificial," how 237 00:11:18.080 --> 00:11:21.679 likely is "intelligence" to follow? This powers things like autocorrect 238 00:11:21.799 --> 00:11:22.879 and text suggestions. 239 00:11:22.879 --> 00:11:25.799 Wow, quite a toolbox for text. Now, what about turning 240 00:11:25.840 --> 00:11:27.639 speech into text and back? 241 00:11:27.720 --> 00:11:30.039 That's where the speech APIs come in. Speech-to-text, 242 00:11:30.279 --> 00:11:34.159 STT, converts audio to text; text-to-speech, TTS, does 243 00:11:34.200 --> 00:11:34.679 the reverse. 244 00:11:34.720 --> 00:11:36.279 How does STT work, generally? 245 00:11:36.559 --> 00:11:40.080 It analyzes the audio signal, breaks it down into basic 246 00:11:40.159 --> 00:11:44.440 sound units called phonemes, and uses acoustic and language models 247 00:11:44.480 --> 00:11:47.519 to figure out the most likely sequence of words. Usually 248 00:11:47.559 --> 00:11:48.799 gives a confidence score, 249 00:11:48.559 --> 00:11:50.440 too. And Microsoft's offerings? 250 00:11:50.679 --> 00:11:53.960 There are standard speech APIs, but the really interesting one 251 00:11:54.039 --> 00:11:59.840 is the Custom Speech Service, CRIS. Custom? How so? CRIS lets 252 00:11:59.879 --> 00:12:03.200 you adapt the speech recognition model to your specific scenario. 253 00:12:03.360 --> 00:12:03.879 What does that mean? 254 00:12:04.039 --> 00:12:07.159 You can upload your own audio data and accurate transcripts. 255 00:12:07.519 --> 00:12:09.720 If your app will be used in a noisy factory 256 00:12:09.919 --> 00:12:13.320 or involves lots of specific jargon or product names, you 257 00:12:13.360 --> 00:12:15.879 can train a model that's much better at understanding that 258 00:12:15.960 --> 00:12:18.279 specific audio environment and vocabulary.
259 00:12:18.480 --> 00:12:21.480 Ah, so you tailor it to overcome background noise or 260 00:12:21.519 --> 00:12:23.200 specialized language? Exactly. 261 00:12:23.240 --> 00:12:26.120 It can make a huge difference in accuracy for specific 262 00:12:26.200 --> 00:12:28.320 use cases compared to a general-purpose model. 263 00:12:28.360 --> 00:12:30.360 And what about recognizing who is talking? 264 00:12:30.799 --> 00:12:34.879 That's speaker recognition. Two main types. Verification confirms if a 265 00:12:34.960 --> 00:12:38.480 voice matches a known person, like voice login. It usually 266 00:12:38.559 --> 00:12:41.519 needs enrollment, where the person says specific phrases. 267 00:12:41.559 --> 00:12:42.960 Okay, one-to-one matching. 268 00:12:42.879 --> 00:12:47.159 And identification, which tries to figure out which speaker from 269 00:12:47.200 --> 00:12:50.200 a pre-enrolled group is the one talking. Useful for 270 00:12:50.240 --> 00:12:51.960 transcription that notes who said 271 00:12:51.679 --> 00:12:54.480 what. And for the other way, text-to-speech, making 272 00:12:54.519 --> 00:12:55.840 the computer talk naturally? Yeah. 273 00:12:55.879 --> 00:12:59.879 TTS takes text and generates audio. The source mentions SSML, 274 00:13:00.039 --> 00:13:03.320 Speech Synthesis Markup Language. What's that for? It lets you 275 00:13:03.360 --> 00:13:07.039 control how the text is spoken. Things like emphasis, pitch, 276 00:13:07.440 --> 00:13:12.519 speaking rate, pauses, even pronunciation of specific words. Helps make 277 00:13:12.559 --> 00:13:14.240 the synthesized voice sound less 278 00:13:14.320 --> 00:13:17.320 robotic. Speech-to-text feels like it's gotten way better recently. 279 00:13:17.399 --> 00:13:21.000 It really has, largely thanks to deep learning models, but 280 00:13:21.080 --> 00:13:24.039 accuracy is still a challenge, especially in noisy places or 281 00:13:24.080 --> 00:13:27.240 with strong accents.
The source notes that even with claims 282 00:13:27.279 --> 00:13:30.080 of low error rates, like Google's four point nine percent, 283 00:13:30.399 --> 00:13:32.799 that's often in ideal conditions. 284 00:13:32.279 --> 00:13:34.879 Which is why that Custom Speech Service is valuable for 285 00:13:34.919 --> 00:13:36.840 bridging the gap in real-world scenarios. 286 00:13:36.879 --> 00:13:37.440 Precisely. 287 00:13:37.720 --> 00:13:41.120 Okay, shifting focus again, let's talk search and recommendations, making 288 00:13:41.159 --> 00:13:42.240 information findable. 289 00:13:42.480 --> 00:13:45.600 Right, we have explicit search: you type a query. But 290 00:13:45.840 --> 00:13:48.759 AI enables implicit search, 291 00:13:48.639 --> 00:13:50.360 where the system anticipates what you need. 292 00:13:50.639 --> 00:13:54.240 Yeah, like Amazon showing "customers who bought this also bought," 293 00:13:54.519 --> 00:13:56.480 or related items. It's proactive. 294 00:13:56.600 --> 00:13:59.000 The source mentioned the three Ps of search. 295 00:13:58.919 --> 00:14:04.840 Right: pervasive, search everywhere; predictive, anticipating needs; proactive, giving answers before 296 00:14:04.879 --> 00:14:05.399 you ask. 297 00:14:05.879 --> 00:14:09.720 That's the ideal, and Microsoft has Bing APIs for web, 298 00:14:09.879 --> 00:14:14.480 image, news search. Let's focus on recommendations, though. How do 299 00:14:14.600 --> 00:14:15.159 those work? 300 00:14:15.440 --> 00:14:18.320 The main goal is usually to increase sales or engagement 301 00:14:18.360 --> 00:14:19.879 by suggesting relevant things. 302 00:14:20.000 --> 00:14:20.799 What kinds are there? 303 00:14:20.919 --> 00:14:24.679 Frequently bought together, FBT, is common: items often bought in 304 00:14:24.720 --> 00:14:27.159 the same transaction, like a camera and a memory card. 305 00:14:27.279 --> 00:14:29.840 Makes sense. Then item to item, which is a type 306 00:14:29.840 --> 00:14:33.320 of collaborative filtering.
It suggests items based on what other 307 00:14:33.480 --> 00:14:37.120 similar users liked: "people who viewed this also viewed." 308 00:14:36.840 --> 00:14:38.080 Based on collective behavior. 309 00:14:38.200 --> 00:14:40.679 And user to item, which is more personalized. It looks 310 00:14:40.679 --> 00:14:44.080 at your past history, views, purchases, to recommend things specifically 311 00:14:44.080 --> 00:14:44.320 for you. 312 00:14:44.799 --> 00:14:46.919 How do you build these using the Microsoft service? 313 00:14:47.200 --> 00:14:51.519 You need data. Two main types: catalog data, which is 314 00:14:51.639 --> 00:14:55.799 info about your items, products, articles, whatever, including features, and 315 00:14:56.039 --> 00:15:00.519 usage data, records of user interactions like clicks, purchases, ratings. 316 00:15:00.600 --> 00:15:01.200 So you feed it 317 00:15:01.240 --> 00:15:02.840 your product list and how people 318 00:15:02.559 --> 00:15:06.120 interact with it. Exactly. Then you train different recommendation models, called 319 00:15:06.159 --> 00:15:09.320 builds, on that data. There are specific builds for FBT, 320 00:15:09.559 --> 00:15:12.360 and others, like SAR, that handle item to item and user 321 00:15:12.399 --> 00:15:13.000 to item. And 322 00:15:12.960 --> 00:15:15.679 the quality of recommendations depends heavily on that input data. 323 00:15:15.720 --> 00:15:19.519 Absolutely. Good data, good quantity, leads to better recommendations. 324 00:15:19.559 --> 00:15:22.120 The source also mentioned ranking and offline evaluation. 325 00:15:22.519 --> 00:15:25.360 Ranking is crucial: how do you order the results? It's 326 00:15:25.360 --> 00:15:28.279 often based on relevance scores derived from usage data and 327 00:15:28.320 --> 00:15:32.320 item features. Offline evaluation lets you test your trained models 328 00:15:32.320 --> 00:15:35.440 on historical data before deploying them, to see which build 329 00:15:35.519 --> 00:15:36.360 performs best.
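NOTE Editor's sketch, not part of the recording or the source material. The "frequently bought together" idea discussed above can be approximated by counting how often two items share a transaction in the usage data. The `TRANSACTIONS` sample and the function names are hypothetical, and this simple co-occurrence count is a stand-in, not the algorithm behind the FBT builds in Microsoft's service. Python is used only for brevity.

```python
from collections import Counter
from itertools import combinations

# Hypothetical usage data: each set is one transaction (basket).
TRANSACTIONS = [
    {"camera", "memory card"},
    {"camera", "memory card", "tripod"},
    {"camera", "tripod"},
    {"memory card", "card reader"},
]


def co_purchase_counts(transactions):
    """Count how many transactions contain each unordered pair of items."""
    counts = Counter()
    for basket in transactions:
        # Sorting gives every pair a single canonical (a, b) key.
        for a, b in combinations(sorted(basket), 2):
            counts[(a, b)] += 1
    return counts


def bought_together(item, transactions, top_n=3):
    """Items most often bought with `item`, ranked by co-purchase count."""
    scores = Counter()
    for (a, b), n in co_purchase_counts(transactions).items():
        if a == item:
            scores[b] += n
        elif b == item:
            scores[a] += n
    # Rank by count (descending), then by name so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]


print(bought_together("camera", TRANSACTIONS))  # ['memory card', 'tripod']
```

Holding back some historical transactions and checking whether the suggestions match what was actually bought is the simplest form of the offline evaluation mentioned above.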
330 00:15:36.679 --> 00:15:39.200 Okay, so we've covered a lot of ground, from AI 331 00:15:39.200 --> 00:15:44.519 history to interfaces, NLU with LUIS, text analysis, speech 332 00:15:44.600 --> 00:15:46.919 tech, search, recommendations. 333 00:15:47.000 --> 00:15:49.360 It's quite a journey, but the key takeaway, I think, 334 00:15:49.440 --> 00:15:53.200 is how these advanced AI capabilities are becoming accessible. Right. 335 00:15:53.759 --> 00:15:56.759 Things that needed huge research teams are now available as 336 00:15:56.799 --> 00:15:59.360 APIs, like Cognitive Services. 337 00:15:59.159 --> 00:16:02.120 Especially for developers already in the .NET world. It 338 00:16:02.240 --> 00:16:05.039 lowers the barrier significantly to adding intelligence. 339 00:16:05.080 --> 00:16:07.120 It makes you really think about where AI is already 340 00:16:07.159 --> 00:16:08.200 working behind the scenes in 341 00:16:08.200 --> 00:16:11.879 your life, or how these tools could reshape industries. Think 342 00:16:11.879 --> 00:16:14.720 about customer service, retail, healthcare. 343 00:16:14.799 --> 00:16:17.759 Definitely, yeah, and that brings us to the future. The 344 00:16:17.799 --> 00:16:22.679 source touches on this idea of AI-first organizations. 345 00:16:22.080 --> 00:16:25.960 Yeah, companies embedding AI into their core strategy, their products, 346 00:16:25.960 --> 00:16:27.200 how their people work. 347 00:16:27.080 --> 00:16:29.360 And it addresses the big question about jobs. 348 00:16:29.559 --> 00:16:34.080 The perspective offered is interesting: tasks, not jobs, will be eliminated. 349 00:16:34.639 --> 00:16:37.440 The focus shifts to how human roles will change and 350 00:16:37.519 --> 00:16:38.679 work alongside AI. 351 00:16:38.919 --> 00:16:43.399 Augmented intelligence, not just artificial intelligence replacing humans. Exactly.
352 00:16:43.519 --> 00:16:46.919 Combining the strengths of both: humans and machines working together 353 00:16:47.000 --> 00:16:49.559 can achieve way more than either could alone. 354 00:16:49.639 --> 00:16:53.840 It's a vision of AI becoming woven into everything: cars, factories, shopping, 355 00:16:54.240 --> 00:16:55.200 daily life. 356 00:16:55.000 --> 00:16:56.360 A fundamental transformation. 357 00:16:56.919 --> 00:17:00.200 So to wrap up, we've seen how AI has evolved, 358 00:17:00.559 --> 00:17:03.799 how tools like Cognitive Services make it practical for developers, 359 00:17:04.039 --> 00:17:07.920 especially with .NET, to build apps that understand language, speech, 360 00:17:08.039 --> 00:17:10.720 and user needs through search and recommendations. 361 00:17:10.799 --> 00:17:13.160 It really puts powerful capabilities within reach. 362 00:17:13.440 --> 00:17:17.200 And thinking about that future, that augmented intelligence idea where 363 00:17:17.240 --> 00:17:21.319 tasks change and humans partner with AI, here's a final 364 00:17:21.359 --> 00:17:23.799 thought for you to consider. If AI is set to 365 00:17:23.839 --> 00:17:27.839 transform our tasks and merge with our capabilities, what completely 366 00:17:27.880 --> 00:17:30.960 new roles, new kinds of expertise, or maybe even entirely 367 00:17:31.000 --> 00:17:34.400 new opportunities might emerge from this human-machine partnership in 368 00:17:34.440 --> 00:17:37.200 the coming years? Things we perhaps can't even quite imagine 369 00:17:37.240 --> 00:17:38.960 today. Something definitely worth pondering.