WEBVTT

1
00:00:00.200 --> 00:00:04.120
<v Speaker 1>Right now, sitting inside almost every single cell of your

2
00:00:04.160 --> 00:00:08.320
<v Speaker 1>body is a three-billion-letter instruction manual.

3
00:00:07.960 --> 00:00:10.960
<v Speaker 2>Which is just, I mean, it's a staggering scale

4
00:00:11.000 --> 00:00:12.279
<v Speaker 2>to even try to picture.

5
00:00:12.599 --> 00:00:15.160
<v Speaker 1>Yeah, think about that scale for a second. If a

6
00:00:15.160 --> 00:00:17.800
<v Speaker 1>doctor wants to find out why you're sick or you know,

7
00:00:17.839 --> 00:00:21.160
<v Speaker 1>why a medication isn't working, they essentially have to find

8
00:00:21.239 --> 00:00:24.480
<v Speaker 1>a single microscopic typo in a book that is three

9
00:00:24.640 --> 00:00:25.960
<v Speaker 1>million pages long.

10
00:00:26.000 --> 00:00:28.160
<v Speaker 2>Right, and they need to find it fast. I mean,

11
00:00:28.160 --> 00:00:31.160
<v Speaker 2>thirty years ago, doing that was a biological impossibility. It

12
00:00:31.239 --> 00:00:35.200
<v Speaker 2>took over a decade and literally billions of dollars just

13
00:00:35.280 --> 00:00:35.840
<v Speaker 2>to do it once.

14
00:00:36.000 --> 00:00:36.640
<v Speaker 1>Wow.

15
00:00:36.880 --> 00:00:39.880
<v Speaker 2>But today we expect those answers in a matter of days.

16
00:00:40.039 --> 00:00:43.399
<v Speaker 2>It's moved from being this purely biological challenge to what

17
00:00:43.479 --> 00:00:46.399
<v Speaker 2>is essentially a computational miracle.

18
00:00:46.560 --> 00:00:48.960
<v Speaker 1>Okay, let's unpack this because if you've ever wondered how

19
00:00:48.960 --> 00:00:51.399
<v Speaker 1>a simple cheek swab or like a vial of blood

20
00:00:51.479 --> 00:00:54.079
<v Speaker 1>drawn at a clinic actually turns into a highly personalized

21
00:00:54.119 --> 00:00:57.159
<v Speaker 1>medical profile, well, this is exactly the breakdown you need.

22
00:00:57.280 --> 00:00:58.960
<v Speaker 2>Yeah. It's a fascinating journey.

23
00:00:58.840 --> 00:01:02.560
<v Speaker 1>It really is. We're not just talking about the biology today.

24
00:01:02.640 --> 00:01:05.120
<v Speaker 1>We are taking a deep dive into the journey from

25
00:01:05.120 --> 00:01:09.079
<v Speaker 1>the wet, messy chemistry of a human cell to the

26
00:01:09.280 --> 00:01:13.400
<v Speaker 1>digital data on a computer screen. And more importantly, we're

27
00:01:13.400 --> 00:01:17.640
<v Speaker 1>looking at the mind bending mathematical tricks that allow a standard,

28
00:01:17.760 --> 00:01:22.000
<v Speaker 1>cheap laptop to search your entire genetic code without instantly

29
00:01:22.040 --> 00:01:22.680
<v Speaker 1>catching fire.

30
00:01:22.920 --> 00:01:25.400
<v Speaker 2>It really is a collision of two completely different worlds.

31
00:01:25.599 --> 00:01:28.359
<v Speaker 2>I mean, you have to physically extract the data from

32
00:01:28.359 --> 00:01:31.280
<v Speaker 2>the molecule first, Yeah, right, and only then can the

33
00:01:31.319 --> 00:01:32.760
<v Speaker 2>algorithms do their heavy lifting.

34
00:01:32.920 --> 00:01:35.040
<v Speaker 1>Right, So let's start with that physical extraction. We've got

35
00:01:35.040 --> 00:01:37.799
<v Speaker 1>this invisible DNA in a tube. How did we go

36
00:01:37.920 --> 00:01:41.040
<v Speaker 1>from painstakingly reading one genetic sentence at a time to

37
00:01:41.280 --> 00:01:45.079
<v Speaker 1>basically scanning the entire three-million-page library in an afternoon?

38
00:01:45.200 --> 00:01:46.799
<v Speaker 1>I mean it didn't happen overnight.

39
00:01:46.560 --> 00:01:49.359
<v Speaker 2>No, not at all. It's a story of well, constant,

40
00:01:49.359 --> 00:01:52.079
<v Speaker 2>aggressive problem solving. It started back in nineteen seventy seven

41
00:01:52.120 --> 00:01:54.400
<v Speaker 2>with what we now call first-generation sequencing, or Sanger

42
00:01:54.480 --> 00:01:58.319
<v Speaker 2>sequencing. The foundational idea was brilliant. Honestly, they used

43
00:01:58.359 --> 00:02:01.480
<v Speaker 2>a natural enzyme to copy a strand of DNA, but

44
00:02:01.560 --> 00:02:05.200
<v Speaker 2>they spiked the chemical soup with these modified nucleotides. You know,

45
00:02:05.359 --> 00:02:07.560
<v Speaker 2>the A, C, G, and T building blocks.

46
00:02:07.680 --> 00:02:11.000
<v Speaker 1>Right. The sources mentioned these modified blocks have fluorescent glowing

47
00:02:11.039 --> 00:02:14.439
<v Speaker 1>tags on them. They act like molecular stop signs.

48
00:02:14.520 --> 00:02:17.599
<v Speaker 2>Yeah, that's exactly it. That's the key. Imagine you're copying

49
00:02:17.639 --> 00:02:19.599
<v Speaker 2>a sentence, but every time you write the letter A,

50
00:02:20.080 --> 00:02:20.960
<v Speaker 2>your pen freezes.

51
00:02:21.159 --> 00:02:22.919
<v Speaker 1>Oh weird, okay.

52
00:02:22.919 --> 00:02:25.919
<v Speaker 2>Right, so you'd end up with a fragment ending in A. By

53
00:02:26.000 --> 00:02:28.439
<v Speaker 2>running this process over and over, you end up with

54
00:02:28.479 --> 00:02:31.800
<v Speaker 2>a massive mixture of DNA fragments of all different lengths.

55
00:02:32.360 --> 00:02:36.120
<v Speaker 2>You sort them by size using electrical charge, a technique

56
00:02:36.120 --> 00:02:39.719
<v Speaker 2>called electrophoresis, got it, and then a camera reads the

57
00:02:39.759 --> 00:02:42.039
<v Speaker 2>glowing colors at the end of each fragment one by

58
00:02:42.080 --> 00:02:43.080
<v Speaker 2>one to spell out the sequence.

59
00:02:42.879 --> 00:02:45.840
<v Speaker 1>Which sounds incredibly accurate, but I mean

60
00:02:46.039 --> 00:02:47.319
<v Speaker 1>practically agonizing.

61
00:02:47.439 --> 00:02:49.280
<v Speaker 2>Oh it's painfully slow. Yeah.

62
00:02:49.319 --> 00:02:52.479
<v Speaker 1>The sources say this method maxes out at reading fragments

63
00:02:52.520 --> 00:02:55.560
<v Speaker 1>about eight hundred letters long. If I'm trying to read

64
00:02:55.599 --> 00:02:58.759
<v Speaker 1>a three billion letter genome, that makes me think of

65
00:02:58.960 --> 00:03:03.960
<v Speaker 1>like a medieval monk painstakingly copying an encyclopedia by hand,

66
00:03:04.159 --> 00:03:05.159
<v Speaker 1>letter by single letter.

67
00:03:05.199 --> 00:03:06.199
<v Speaker 2>It's a great analogy.

68
00:03:06.479 --> 00:03:09.319
<v Speaker 1>It works, but you aren't mass producing anything that way.

69
00:03:09.479 --> 00:03:13.039
<v Speaker 2>No, it was a massive bottleneck, and that specific speed

70
00:03:13.080 --> 00:03:16.199
<v Speaker 2>limit is what triggered a complete rethinking of the process.

71
00:03:16.759 --> 00:03:20.639
<v Speaker 2>Companies like Illumina came along and essentially disrupted the biological

72
00:03:20.639 --> 00:03:23.759
<v Speaker 2>space like a Silicon Valley tech company. Right. They introduced

73
00:03:23.800 --> 00:03:25.560
<v Speaker 2>second generation sequencing.

74
00:03:25.159 --> 00:03:28.680
<v Speaker 1>The massively parallel approach. So if Sanger was the medieval monk,

75
00:03:29.000 --> 00:03:32.680
<v Speaker 1>Illumina is like taking that three-million-page book, tossing

76
00:03:32.719 --> 00:03:35.159
<v Speaker 1>it into a wood chipper and turning it into millions

77
00:03:35.159 --> 00:03:38.039
<v Speaker 1>of tiny pieces of confetti. Exactly. And then you read

78
00:03:38.120 --> 00:03:40.599
<v Speaker 1>every single shred of confetti at the exact same moment

79
00:03:40.719 --> 00:03:43.759
<v Speaker 1>and basically force a computer to paste the book back together.

80
00:03:44.080 --> 00:03:47.199
<v Speaker 2>That is essentially what they do. Yeah, but to read

81
00:03:47.360 --> 00:03:50.159
<v Speaker 2>millions of tiny shreds at once, the signal has to

82
00:03:50.199 --> 00:03:52.080
<v Speaker 2>be loud enough for a camera sensor to actually pick

83
00:03:52.120 --> 00:03:55.599
<v Speaker 2>it up. A single DNA molecule is just too faint.

84
00:03:56.199 --> 00:03:59.520
<v Speaker 2>So they use techniques like emulsion PCR, bridge PCR.

85
00:03:59.199 --> 00:04:02.199
<v Speaker 1>Hold on, I see that mentioned everywhere in the deep dive sources.

86
00:04:02.199 --> 00:04:04.960
<v Speaker 1>But what does that actually mean in this context? Bridge

87
00:04:05.039 --> 00:04:06.280
<v Speaker 1>PCR?

88
00:04:06.159 --> 00:04:10.000
<v Speaker 2>Think of it as microscopic photocopying. They wash the DNA fragments

89
00:04:10.039 --> 00:04:13.439
<v Speaker 2>over a tiny glass slide. The fragments attach to the slide,

90
00:04:13.759 --> 00:04:15.759
<v Speaker 2>and enzymes duplicate them right there in place.

91
00:04:15.719 --> 00:04:17.279
<v Speaker 1>Okay, right there on the glass.

92
00:04:17.360 --> 00:04:20.720
<v Speaker 2>Yeah, they bend over, forming a bridge and copy themselves

93
00:04:20.759 --> 00:04:24.759
<v Speaker 2>again and again. Suddenly, instead of one faint DNA molecule,

94
00:04:25.079 --> 00:04:28.000
<v Speaker 2>you have a dense little cluster of thousands of identical

95
00:04:28.040 --> 00:04:30.360
<v Speaker 2>clones standing up like a tiny forest on the glass.

96
00:04:30.399 --> 00:04:30.920
<v Speaker 1>Oh wow.

97
00:04:31.000 --> 00:04:33.040
<v Speaker 2>So when you attach a glowing chemical tag to them,

98
00:04:33.360 --> 00:04:36.399
<v Speaker 2>that entire cluster flashes brightly enough for a digital camera

99
00:04:36.439 --> 00:04:37.079
<v Speaker 2>to photograph.

100
00:04:37.360 --> 00:04:40.160
<v Speaker 1>That is wild. So you take a picture, wash the

101
00:04:40.199 --> 00:04:43.079
<v Speaker 1>chemicals away, add the next letter, and take another picture.

102
00:04:43.600 --> 00:04:47.480
<v Speaker 1>Just millions of clusters flashing in sequence. But you know,

103
00:04:47.600 --> 00:04:51.079
<v Speaker 1>reading the sources, there's another second gen variation that completely

104
00:04:51.120 --> 00:04:54.560
<v Speaker 1>blew my mind. Ion Torrent. Oh yeah. They don't use lasers,

105
00:04:54.560 --> 00:04:55.959
<v Speaker 1>they don't use cameras.

106
00:04:55.720 --> 00:04:57.839
<v Speaker 2>Because they aren't looking at light at all. They are literally

107
00:04:57.879 --> 00:04:59.839
<v Speaker 2>measuring the acidity of the chemical soup.

108
00:05:00.160 --> 00:05:02.160
<v Speaker 1>Hold on, how do you read a genetic code by

109
00:05:02.319 --> 00:05:03.439
<v Speaker 1>checking the pH level?

110
00:05:03.519 --> 00:05:06.480
<v Speaker 2>It comes down to basic chemistry. Really, Every single time

111
00:05:06.480 --> 00:05:10.279
<v Speaker 2>a new nucleotide successfully attaches to a growing DNA strand,

112
00:05:10.560 --> 00:05:15.439
<v Speaker 2>the chemical bond naturally releases a single positively charged hydrogen

113
00:05:15.480 --> 00:05:19.120
<v Speaker 2>ion. Okay. Ion Torrent machines use a semiconductor chip

114
00:05:19.439 --> 00:05:23.399
<v Speaker 2>layered with millions of microscopic wells. It's basically a massive

115
00:05:23.439 --> 00:05:27.480
<v Speaker 2>grid of tiny pH meters. It detects that microscopic drop

116
00:05:27.480 --> 00:05:30.519
<v Speaker 2>in pH when the hydrogen ion pops off. So it's

117
00:05:30.560 --> 00:05:34.199
<v Speaker 2>translating a biological event directly into a digital electronic signal.

118
00:05:34.279 --> 00:05:37.759
<v Speaker 1>You're just listening for the electrical pop of a hydrogen atom. Unbelievable.

119
00:05:37.800 --> 00:05:38.759
<v Speaker 2>It is pretty incredible.

120
00:05:38.839 --> 00:05:42.839
<v Speaker 1>But even with that speed, second generation sequencing still relies

121
00:05:42.879 --> 00:05:47.319
<v Speaker 1>on tearing the DNA into tiny confetti, right, which brings

122
00:05:47.360 --> 00:05:51.399
<v Speaker 1>us to the third generation technologies like PacBio and

123
00:05:51.439 --> 00:05:55.040
<v Speaker 1>Oxford Nanopore. This reads like pure science fiction. They don't

124
00:05:55.079 --> 00:05:57.120
<v Speaker 1>chop it up, they don't pause to take pictures, they

125
00:05:57.160 --> 00:05:58.279
<v Speaker 1>just read it continuously.

126
00:05:58.439 --> 00:06:03.199
<v Speaker 2>Yeah, it's called single-molecule real-time sequencing. With nanopore, imagine

127
00:06:03.240 --> 00:06:08.079
<v Speaker 2>a microscopic hole, a literal pore punctured through a synthetic membrane.

128
00:06:08.680 --> 00:06:12.319
<v Speaker 2>They apply a steady electrical current across that membrane. Then

129
00:06:12.720 --> 00:06:16.120
<v Speaker 2>they physically pull a single long strand of DNA through that hole.

130
00:06:15.879 --> 00:06:17.639
<v Speaker 1>Like threading a needle.

131
00:06:17.800 --> 00:06:20.800
<v Speaker 2>Exactly like that. And because the molecular shapes of an

132
00:06:20.800 --> 00:06:24.199
<v Speaker 2>A, C, G, and a T are slightly different. They each block

133
00:06:24.279 --> 00:06:26.800
<v Speaker 2>the hole in a uniquely different way as they pass through.

134
00:06:26.920 --> 00:06:29.720
<v Speaker 2>Oh I see, yeah, that physically alters the electrical current.

135
00:06:29.879 --> 00:06:32.560
<v Speaker 2>The machine reads these specific changes in the voltage to

136
00:06:32.600 --> 00:06:35.240
<v Speaker 2>spell out the letters as the strand zips through.

137
00:06:35.120 --> 00:06:38.759
<v Speaker 1>Which means you can read massive uninterrupted stretches. The sources

138
00:06:38.800 --> 00:06:41.199
<v Speaker 1>say up to twenty thousand letters in a single read.

139
00:06:41.279 --> 00:06:44.160
<v Speaker 1>It's like feeding the entire intact book through a high

140
00:06:44.160 --> 00:06:45.560
<v Speaker 1>speed ticker tape scanner.

141
00:06:45.800 --> 00:06:47.399
<v Speaker 2>Yep, it's a huge leap in read length.

142
00:06:47.600 --> 00:06:49.519
<v Speaker 1>But I've got to pause you here. I'm looking at

143
00:06:49.519 --> 00:06:53.199
<v Speaker 1>the data from the sources. If this third generation tech

144
00:06:53.360 --> 00:06:57.000
<v Speaker 1>is so revolutionary and reads so fast, why are we

145
00:06:57.000 --> 00:06:59.800
<v Speaker 1>still using the second generation confetti method at all?

146
00:07:00.120 --> 00:07:04.319
<v Speaker 2>Right? Well, what's fascinating here is a very stubborn, hidden

147
00:07:04.360 --> 00:07:07.759
<v Speaker 2>trade off between length and accuracy. When you are violently

148
00:07:07.800 --> 00:07:10.560
<v Speaker 2>pulling a molecule through a microscopic hole at high speed,

149
00:07:10.920 --> 00:07:14.519
<v Speaker 2>the sensor occasionally blinks. It might miss a letter entirely,

150
00:07:14.639 --> 00:07:17.600
<v Speaker 2>or accidentally read the same letter twice. These are called

151
00:07:17.639 --> 00:07:21.600
<v Speaker 2>insertion and deletion errors. Third generation tools historically sit at

152
00:07:21.639 --> 00:07:25.079
<v Speaker 2>an error rate of about seventeen point eight to seventeen

153
00:07:25.160 --> 00:07:27.240
<v Speaker 2>point nine percent.

154
00:07:26.879 --> 00:07:30.199
<v Speaker 1>Almost an eighteen percent error rate. In a medical context, I mean,

155
00:07:30.199 --> 00:07:32.920
<v Speaker 1>if I'm looking for a single cancer causing mutation, an

156
00:07:32.959 --> 00:07:35.680
<v Speaker 1>eighteen percent failure rate sounds absolutely terrifying.

157
00:07:35.759 --> 00:07:38.680
<v Speaker 2>It does sound alarming, for sure, but scientists realized something

158
00:07:38.720 --> 00:07:43.399
<v Speaker 2>brilliant about those errors. They're completely random. The nanopore doesn't

159
00:07:43.399 --> 00:07:46.319
<v Speaker 2>systematically struggle with the letter C, for example. It's just

160
00:07:46.399 --> 00:07:47.120
<v Speaker 2>random static.

161
00:07:47.199 --> 00:07:49.240
<v Speaker 1>Okay, So how do you fix random static?

162
00:07:49.480 --> 00:07:52.399
<v Speaker 2>The workaround is actually quite elegant. You just sequence the

163
00:07:52.480 --> 00:07:55.160
<v Speaker 2>exact same strand of DNA twenty or thirty times.

164
00:07:55.160 --> 00:07:57.680
<v Speaker 1>Oh, I see, because the odds of the machine making

165
00:07:57.720 --> 00:08:00.920
<v Speaker 1>the exact same random mistake on the exact same letter

166
00:08:01.000 --> 00:08:03.279
<v Speaker 1>twenty times in a row is basically zero.

167
00:08:03.480 --> 00:08:06.399
<v Speaker 2>Precisely, you layer the thirty reads on top of each other,

168
00:08:06.759 --> 00:08:11.160
<v Speaker 2>the random glitches mathematically cancel out, and the true underlying

169
00:08:11.199 --> 00:08:12.680
<v Speaker 2>sequence emerges clearly.

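NOTE
The layering idea described above can be sketched as a per-position majority vote.
This is a minimal sketch, assuming the reads are already aligned, equal in length,
and differ only by substitutions. Real long-read errors are mostly insertions and
deletions, which need an alignment step first. The name consensus is hypothetical.
```python
from collections import Counter
def consensus(reads):
    """Majority vote per column over equal-length, pre-aligned reads."""
    if len({len(r) for r in reads}) != 1:
        raise ValueError("sketch assumes equal-length, pre-aligned reads")
    # zip(*reads) yields one column (one position across all reads) at a time.
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*reads))
# Three noisy copies of the same strand, each with a different random error.
reads = ["ACGTAC", "ACGAAC", "TCGTAC"]
print(consensus(reads))
```
With twenty or thirty reads instead of three, a random error at any one position is outvoted even more reliably.
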
170
00:08:12.879 --> 00:08:15.959
<v Speaker 1>Okay, So that leads us directly to the next massive problem.

171
00:08:16.279 --> 00:08:19.720
<v Speaker 1>If third generation sequencing has a nearly eighteen percent raw

172
00:08:19.839 --> 00:08:22.439
<v Speaker 1>error rate, just dumping all that text into a computer

173
00:08:22.519 --> 00:08:25.560
<v Speaker 1>file is completely useless. The computer needs to know which

174
00:08:25.639 --> 00:08:29.279
<v Speaker 1>letters are biological facts and which letters are just machine hallucinations.

175
00:08:29.839 --> 00:08:31.639
<v Speaker 1>So how do we tag the trustworthy data.

176
00:08:32.000 --> 00:08:35.039
<v Speaker 2>That's where specialized file formats come in. The most basic

177
00:08:35.120 --> 00:08:37.080
<v Speaker 2>format used to be called FASTA. It was just a

178
00:08:37.080 --> 00:08:40.840
<v Speaker 2>plain text file, literally just a string of A's, C's, G's, and T's.

179
00:08:41.320 --> 00:08:44.159
<v Speaker 2>But as you pointed out, FASTA isn't enough anymore.

180
00:08:44.559 --> 00:08:46.759
<v Speaker 2>We needed a way to track the confidence of every

181
00:08:46.799 --> 00:08:47.399
<v Speaker 2>single letter.

182
00:08:47.679 --> 00:08:51.200
<v Speaker 1>Enter the FASTQ format, where the Q literally stands

183
00:08:51.200 --> 00:08:52.600
<v Speaker 1>for quality.

184
00:08:52.879 --> 00:08:56.720
<v Speaker 2>Exactly. FASTQ attaches a crucial piece of metadata called the Phred

185
00:08:57.279 --> 00:09:01.039
<v Speaker 2>quality score or Q score. The sequencing machine actually grades

186
00:09:01.039 --> 00:09:04.200
<v Speaker 2>its own homework. For every single letter it outputs, it

187
00:09:04.279 --> 00:09:06.960
<v Speaker 2>calculates a mathematical probability that it made a mistake.

188
00:09:07.120 --> 00:09:09.840
<v Speaker 1>I found the engineering behind this fascinating. A Q score

189
00:09:09.919 --> 00:09:12.679
<v Speaker 1>is a number, right, say a score of thirty means

190
00:09:12.720 --> 00:09:15.360
<v Speaker 1>a ninety nine point nine percent accuracy rate.

191
00:09:15.480 --> 00:09:16.919
<v Speaker 2>Right, it's a logarithmic scale.

192
00:09:16.960 --> 00:09:19.240
<v Speaker 1>But if you have to store a two digit number

193
00:09:19.399 --> 00:09:22.879
<v Speaker 1>next to every single letter of a three billion letter genome,

194
00:09:23.279 --> 00:09:26.279
<v Speaker 1>you instantly double or triple your file size. Our hard

195
00:09:26.360 --> 00:09:29.720
<v Speaker 1>drives would fill up immediately. So instead, the algorithms take

196
00:09:29.759 --> 00:09:32.919
<v Speaker 1>that Q score number, add exactly thirty three to it,

197
00:09:33.159 --> 00:09:34.679
<v Speaker 1>and map it to a keyboard symbol.

198
00:09:34.720 --> 00:09:36.519
<v Speaker 2>It's an incredibly clever compression hack.

199
00:09:36.720 --> 00:09:38.919
<v Speaker 1>But wait, why add exactly thirty three? Why not just

200
00:09:39.039 --> 00:09:40.000
<v Speaker 1>use the number itself.

201
00:09:40.120 --> 00:09:42.399
<v Speaker 2>Well, it's because of how computers read text using the

202
00:09:42.440 --> 00:09:45.639
<v Speaker 2>ASCII standard. The first thirty two characters in a computer's

203
00:09:45.720 --> 00:09:50.480
<v Speaker 2>language aren't printable. They are invisible commands like escape or return.

204
00:09:50.639 --> 00:09:51.360
<v Speaker 1>Oh right, okay.

205
00:09:51.399 --> 00:09:54.519
<v Speaker 2>By mathematically adding thirty three to the Q score, you

206
00:09:54.639 --> 00:09:58.519
<v Speaker 2>jump past those invisible commands and land perfectly on standard

207
00:09:58.559 --> 00:10:01.879
<v Speaker 2>printable characters. So instead of storing the number thirty, the

208
00:10:01.919 --> 00:10:05.480
<v Speaker 2>computer stores a single symbol, say a hash sign or a question mark.

209
00:10:05.879 --> 00:10:09.000
<v Speaker 2>You fit complex probability data into a single byte of memory.

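NOTE
The add-thirty-three trick described above can be sketched in a few lines.
This is a minimal sketch, assuming the common Phred+33 convention. The function
names are hypothetical, not taken from any sequencing toolkit.
```python
OFFSET = 33  # shifts scores past the non-printable ASCII control characters
def phred_to_char(q):
    """Encode an integer quality score as one printable ASCII character."""
    return chr(q + OFFSET)
def char_to_phred(c):
    """Decode a FASTQ quality character back to its integer score."""
    return ord(c) - OFFSET
def error_probability(q):
    """Phred definition: Q = -10 * log10(P_error), so P_error = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)
print(phred_to_char(30))      # a score of thirty becomes a question mark
print(char_to_phred("?"))
print(error_probability(30))  # roughly 1 error in 1,000 calls
```
Each quality score costs one byte, the same as the base it annotates.
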
210
00:10:09.200 --> 00:10:12.440
<v Speaker 1>That is brilliant, And the stakes here are real because

211
00:10:12.480 --> 00:10:14.919
<v Speaker 1>if a doctor is looking at your file and the

212
00:10:15.000 --> 00:10:18.600
<v Speaker 1>sequence shows a genetic marker for a severe disease, they

213
00:10:18.639 --> 00:10:20.799
<v Speaker 1>need to know if that marker has a high Q

214
00:10:21.000 --> 00:10:23.799
<v Speaker 1>score or if it's just a low quality machine glitch.

215
00:10:23.960 --> 00:10:26.960
<v Speaker 2>Exactly, if we connect this to the bigger picture, we

216
00:10:27.000 --> 00:10:29.360
<v Speaker 2>aren't just trusting one read. We look at the read

217
00:10:29.399 --> 00:10:32.960
<v Speaker 2>depth and the genotype quality. If fifty reads show a mutation

218
00:10:33.120 --> 00:10:36.000
<v Speaker 2>and have high Q scores, the algorithm confidently calls it

219
00:10:36.039 --> 00:10:37.240
<v Speaker 2>a true variant.

220
00:10:36.960 --> 00:10:38.720
<v Speaker 1>It ignores the low-quality glitches.

221
00:10:38.960 --> 00:10:41.399
<v Speaker 2>Yes, and once we trust the letters, we have to

222
00:10:41.399 --> 00:10:43.759
<v Speaker 2>figure out what they mean. You take those millions of

223
00:10:43.840 --> 00:10:46.600
<v Speaker 2>verified FASTQ shreds and you align them against a

224
00:10:46.639 --> 00:10:50.000
<v Speaker 2>standard reference human genome. It's like checking your puzzle pieces

225
00:10:50.000 --> 00:10:52.559
<v Speaker 2>against the picture on the front of the box. Once

226
00:10:52.559 --> 00:10:55.159
<v Speaker 2>they are aligned, they are saved as a BAM file,

227
00:10:55.440 --> 00:10:57.519
<v Speaker 2>which is a highly compressed binary format.

228
00:10:57.799 --> 00:11:01.159
<v Speaker 1>But humans are fundamentally ninety nine point nine percent identical.

229
00:11:01.519 --> 00:11:04.120
<v Speaker 1>If you sequence my DNA, almost all of it is

230
00:11:04.159 --> 00:11:06.759
<v Speaker 1>exactly the same as the reference map. It seems wildly

231
00:11:06.759 --> 00:11:10.519
<v Speaker 1>inefficient to store three billion letters just to say yep,

232
00:11:10.639 --> 00:11:11.600
<v Speaker 1>still human.

233
00:11:11.519 --> 00:11:14.080
<v Speaker 2>Which is why the final piece of this file pipeline

234
00:11:14.360 --> 00:11:18.519
<v Speaker 2>is the VCF or variant call format. We don't store

235
00:11:18.559 --> 00:11:22.200
<v Speaker 2>your whole genome. The VCF file only stores your mutations,

236
00:11:22.440 --> 00:11:25.240
<v Speaker 2>the differences. It's essentially a list of typos. It says,

237
00:11:25.519 --> 00:11:28.200
<v Speaker 2>at chromosome four position one million, there should be an A,

238
00:11:28.360 --> 00:11:29.559
<v Speaker 2>but in this patient it's a G.

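NOTE
The list-of-typos idea above can be sketched as a comparison against the reference.
This is a minimal sketch, assuming equal-length strings and substitutions only. It
is not the real VCF layout, which has a header and more columns. The names
list_variants, chr4, and the toy sequences are hypothetical.
```python
def list_variants(chrom, reference, sample, start=1):
    """Yield (chrom, position, ref_base, alt_base) wherever sample differs."""
    for offset, (ref, alt) in enumerate(zip(reference, sample)):
        if ref != alt:
            yield (chrom, start + offset, ref, alt)
reference = "GATTACA"
sample = "GATTGCA"  # one substitution relative to the reference
# start is chosen so the differing base lands on position one million.
for chrom, pos, ref, alt in list_variants("chr4", reference, sample, start=999_996):
    print(chrom, pos, ref, alt, sep="\t")
```
Only the one differing position is emitted, so the output stays tiny however long the matching stretch is.
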
239
00:11:30.200 --> 00:11:32.399
<v Speaker 1>Okay, let's step back. Because I'm looking at the sheer

240
00:11:32.440 --> 00:11:34.759
<v Speaker 1>math of this alignment process. We sort of glossed over

241
00:11:34.799 --> 00:11:37.720
<v Speaker 1>how we actually match the puzzle pieces. If I have

242
00:11:37.759 --> 00:11:40.440
<v Speaker 1>a one hundred letter fragment, and I have three billion possible

243
00:11:40.480 --> 00:11:43.200
<v Speaker 1>places to stick it on the reference genome. Wouldn't

244
00:11:43.279 --> 00:11:46.279
<v Speaker 1>a standard computer search algorithm just freeze? I mean, how

245
00:11:46.320 --> 00:11:48.120
<v Speaker 1>do they avoid a total system crash?

246
00:11:48.200 --> 00:11:51.320
<v Speaker 2>This is where we get into the real heavy algorithmic lifting.

247
00:11:51.440 --> 00:11:54.120
<v Speaker 2>The first major hurdle is that genetic mutations mean you

248
00:11:54.159 --> 00:11:56.919
<v Speaker 2>almost never have an exact match. You might have a

249
00:11:56.919 --> 00:11:59.440
<v Speaker 2>missing letter or an extra one, so you can't just

250
00:11:59.559 --> 00:12:03.399
<v Speaker 2>hit Ctrl+F and search for the exact string.

251
00:12:03.919 --> 00:12:06.440
<v Speaker 2>You have to use something called dynamic programming to calculate

252
00:12:06.480 --> 00:12:07.279
<v Speaker 2>the edit distance.

253
00:12:07.919 --> 00:12:10.559
<v Speaker 1>I read about this. It's about finding the minimum number

254
00:12:10.559 --> 00:12:15.519
<v Speaker 1>of operations (insertions, deletions, or substitutions) to change one string

255
00:12:15.519 --> 00:12:19.000
<v Speaker 1>of text into another. The source gave a great simple example,

256
00:12:19.440 --> 00:12:22.960
<v Speaker 1>changing the word ants to bent. You swap the A

257
00:12:23.279 --> 00:12:25.759
<v Speaker 1>for an E, insert a B at the front, and

258
00:12:25.799 --> 00:12:28.279
<v Speaker 1>delete the S at the end. That takes three steps,

259
00:12:28.360 --> 00:12:31.480
<v Speaker 1>Perfect. Exactly. But scaling that up to thousands of letters

260
00:12:31.519 --> 00:12:34.840
<v Speaker 1>creates an astronomical number of possible operations.

261
00:12:34.960 --> 00:12:38.360
<v Speaker 2>Right. If you try to calculate every single possible combination

262
00:12:38.480 --> 00:12:42.519
<v Speaker 2>from scratch using standard recursion, which essentially means the computer

263
00:12:42.600 --> 00:12:44.960
<v Speaker 2>solves the problem by breaking it into smaller pieces and

264
00:12:44.960 --> 00:12:47.960
<v Speaker 2>solving every single piece over and over, the computing time

265
00:12:48.000 --> 00:12:52.080
<v Speaker 2>grows exponentially. The universe would literally end before your laptop finished.

266
00:12:52.200 --> 00:12:55.279
<v Speaker 1>Okay, so if recursion crashes the computer, how does dynamic

267
00:12:55.320 --> 00:12:56.440
<v Speaker 1>programming solve it?

268
00:12:56.759 --> 00:13:00.240
<v Speaker 2>By using memory to save time, it builds what's called

269
00:13:00.279 --> 00:13:05.320
<v Speaker 2>a dependency graph or a table. Think of it like

270
00:13:05.360 --> 00:13:09.039
<v Speaker 2>getting driving directions. If you want to calculate the absolute

271
00:13:09.159 --> 00:13:11.879
<v Speaker 2>fastest route from New York to Seattle, and part of

272
00:13:11.879 --> 00:13:15.559
<v Speaker 2>your route goes through Chicago, you calculate the Chicago Seattle

273
00:13:15.639 --> 00:13:18.679
<v Speaker 2>leg once. You write that answer down on a sticky note.

274
00:13:18.799 --> 00:13:21.399
<v Speaker 2>Oh okay, so yeah, if you were testing a million

275
00:13:21.440 --> 00:13:24.080
<v Speaker 2>different routes out of New York and a bunch of

276
00:13:24.159 --> 00:13:28.200
<v Speaker 2>them eventually passed through Chicago, you don't mathematically recalculate the

277
00:13:28.240 --> 00:13:30.960
<v Speaker 2>western half of the United States every single time. You

278
00:13:31.000 --> 00:13:32.320
<v Speaker 2>just look at your sticky note.

279
00:13:31.960 --> 00:13:34.200
<v Speaker 1>Right. You've already done that math.

280
00:13:34.600 --> 00:13:37.879
<v Speaker 2>Exactly. Dynamic programming does this for DNA. It solves the tiny

281
00:13:37.960 --> 00:13:40.679
<v Speaker 2>sub problems of the text, saves the answers in a

282
00:13:40.720 --> 00:13:43.519
<v Speaker 2>massive table, and just looks them up. It drops the

283
00:13:43.519 --> 00:13:46.000
<v Speaker 2>computing time from trillions of years down to minutes.

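NOTE
The sticky-note table described above can be sketched with the classic edit
distance recurrence. This is a minimal sketch, assuming unit cost for every
insertion, deletion, and substitution. The name edit_distance is hypothetical, and
real aligners use more elaborate scoring.
```python
def edit_distance(a, b):
    """Minimum insertions, deletions, and substitutions turning a into b."""
    # table[i][j] holds the answer for the prefixes a[:i] and b[:j].
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        table[i][0] = i  # delete all i letters of a[:i]
    for j in range(len(b) + 1):
        table[0][j] = j  # insert all j letters of b[:j]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            table[i][j] = min(
                table[i - 1][j] + 1,         # deletion
                table[i][j - 1] + 1,         # insertion
                table[i - 1][j - 1] + cost,  # match or substitution
            )
    return table[len(a)][len(b)]
print(edit_distance("ants", "bent"))  # 3, matching the example in the discussion
```
Each cell is computed once from three saved neighbours, so the work grows with len(a) * len(b) rather than exponentially.
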
284
00:13:46.200 --> 00:13:49.240
<v Speaker 1>It caches the answers. That makes total sense. But even

285
00:13:49.240 --> 00:13:52.039
<v Speaker 1>with the sticky notes, searching every edge of a three

286
00:13:52.120 --> 00:13:55.799
<v Speaker 1>billion letter genome for millions of tiny confetti fragments is

287
00:13:55.840 --> 00:13:59.159
<v Speaker 1>still too slow, which brings us to a concept called

288
00:13:59.200 --> 00:14:01.559
<v Speaker 1>a Bloom filter. And I've got to admit this is

289
00:14:01.559 --> 00:14:04.120
<v Speaker 1>where the computer science gets really counterintuitive for me.

290
00:14:04.480 --> 00:14:06.519
<v Speaker 2>It is a bit mind bending at first.

291
00:14:06.759 --> 00:14:11.600
<v Speaker 1>It's a space efficient probabilistic data structure. Basically, it asks

292
00:14:11.639 --> 00:14:15.759
<v Speaker 1>a massive database, does this sequence exist in here? Without

293
00:14:15.799 --> 00:14:17.240
<v Speaker 1>actually looking through the data.

294
00:14:17.279 --> 00:14:20.320
<v Speaker 2>Yeah. It uses mathematical hash functions and a simple bit array,

295
00:14:20.559 --> 00:14:23.080
<v Speaker 2>just a microscopic sequence of ones and zeros. When you

296
00:14:23.120 --> 00:14:26.120
<v Speaker 2>insert a genetic sequence into the system, it runs it

297
00:14:26.159 --> 00:14:29.960
<v Speaker 2>through a math formula that flips specific zeros to ones. Okay,

298
00:14:30.360 --> 00:14:32.480
<v Speaker 2>when you want to search for a sequence later, you

299
00:14:32.559 --> 00:14:35.480
<v Speaker 2>run it through the same formula. If all the corresponding

300
00:14:35.519 --> 00:14:38.440
<v Speaker 2>bits are ones, it tells you the item is probably there.

301
00:14:38.799 --> 00:14:41.159
<v Speaker 2>But if even a single bit is a zero, it

302
00:14:41.240 --> 00:14:45.120
<v Speaker 2>guarantees with absolute mathematical certainty that the item is not there.

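NOTE
The bit-flipping scheme described above can be sketched as a small class.
This is a minimal sketch, assuming SHA-256 salted with an index stands in for
several independent hash functions. The class name, sizes, and example k-mers are
hypothetical. Production filters use faster hashes and a packed bit array.
```python
import hashlib
class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [0] * num_bits  # the bit array, all zeros to start
    def _positions(self, item):
        # Derive several bit positions by salting one hash with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.num_bits
    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1  # flip the selected zeros to ones
    def might_contain(self, item):
        # Any zero bit means definitely absent; all ones means probably present.
        return all(self.bits[pos] for pos in self._positions(item))
bf = BloomFilter()
for kmer in ["ACGT", "GGCA", "TTAG"]:
    bf.add(kmer)
print(bf.might_contain("ACGT"))  # True: inserted items are never reported absent
```
A query for a sequence that was never added usually returns False, but can return True when other insertions happen to have set all of its bits.
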
303
00:14:45.320 --> 00:14:46.919
<v Speaker 1>I was trying to picture this, and it makes me

304
00:14:46.960 --> 00:14:50.639
<v Speaker 1>think of a very strict bouncer at a crowded VIP club.

305
00:14:50.960 --> 00:14:53.679
<v Speaker 1>The bouncer uses a series of quick, weird rules to

306
00:14:53.759 --> 00:14:56.559
<v Speaker 1>check people at the door. Are you wearing red shoes?

307
00:14:56.759 --> 00:14:57.600
<v Speaker 1>Do you have a ticket?

308
00:14:57.840 --> 00:15:00.200
<v Speaker 2>That's a good way to look at it.

309
00:15:00.240 --> 00:15:03.200
<v Speaker 1>Right. Now and then the bouncer might mistakenly let a random person

310
00:15:03.200 --> 00:15:05.480
<v Speaker 1>in who isn't on the list. That's a false positive.

311
00:15:05.639 --> 00:15:08.960
<v Speaker 1>But the bouncer will absolutely never ever turn away someone

312
00:15:09.000 --> 00:15:12.000
<v Speaker 1>who is actually on the list. There are zero false negatives.

313
00:15:12.360 --> 00:15:14.720
<v Speaker 1>But let me challenge this directly. Go for it. Why

314
00:15:14.759 --> 00:15:18.320
<v Speaker 1>would computer scientists intentionally design an algorithm that we know

315
00:15:18.600 --> 00:15:22.240
<v Speaker 1>for a fact gives false positives? Isn't accuracy the

316
00:15:22.399 --> 00:15:24.279
<v Speaker 1>entire point of medical science?

317
00:15:24.639 --> 00:15:28.039
<v Speaker 2>This raises an important question about computational trade offs. It's

318
00:15:28.080 --> 00:15:31.519
<v Speaker 2>all about conserving memory and speed. A Bloom filter takes

319
00:15:31.600 --> 00:15:35.679
<v Speaker 2>up an unbelievably small amount of memory. By intentionally allowing

320
00:15:35.759 --> 00:15:39.080
<v Speaker 2>a tiny, predictable margin of error, say a one or

321
00:15:39.120 --> 00:15:43.519
<v Speaker 2>two percent false positive rate, we can achieve near-instantaneous search

322
00:15:43.320 --> 00:15:45.879
<v Speaker 1>Speeds because you aren't using the Bloom filter for the

323
00:15:45.919 --> 00:15:48.720
<v Speaker 1>final answer. You use it to instantly discard the ninety

324
00:15:48.799 --> 00:15:52.039
<v Speaker 1>nine percent of the genome where the sequence definitely doesn't belong.

325
00:15:51.840 --> 00:15:54.879
<v Speaker 2>Precisely. You use the cheap, fast algorithm to clear

326
00:15:54.879 --> 00:15:57.440
<v Speaker 2>away the junk, and then you only perform the slow,

327
00:15:57.639 --> 00:16:01.600
<v Speaker 2>rigorous dynamic programming check on the few positive hits. You

328
00:16:01.639 --> 00:16:04.720
<v Speaker 2>save your heavy computational artillery for the targets that actually matter.

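NOTE
Editor's sketch of the Bloom filter pre-screen described above, assuming
Python, a 1024-bit array, and three salted SHA-256 hashes. The class name,
sizes, k-mer length, and the reference string are all illustrative choices,
not anything the speakers or their sources specify.
```python
import hashlib
class BloomFilter:
    """Toy Bloom filter: a bit array plus k hash functions (sizes are illustrative)."""
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits
    def _positions(self, item):
        # Derive k bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits
    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True
    def might_contain(self, item):
        # All bits set means "probably present"; any clear bit means definitely absent.
        return all(self.bits[pos] for pos in self._positions(item))
def kmers(sequence, k):
    """Every length-k substring of sequence, in order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
reference = "ACGTACGGTCAGT"  # hypothetical stand-in for a reference genome
bloom = BloomFilter()
for kmer in kmers(reference, 4):
    bloom.add(kmer)
print(bloom.might_contain("ACGT"))  # True: an inserted item is never rejected
```
A query that returns False can be discarded immediately; only queries that
return True would go on to the slower exact alignment step.
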
329
00:16:04.960 --> 00:16:08.559
<v Speaker 1>Okay, so Bloom filters tell us if a sequence exists somewhere.

330
00:16:09.000 --> 00:16:11.080
<v Speaker 1>But to find exactly where it lives in the genome,

331
00:16:11.120 --> 00:16:13.279
<v Speaker 1>we need an index, like the index at the back

332
00:16:13.320 --> 00:16:15.320
<v Speaker 1>of a textbook telling you which page a word is on.

333
00:16:15.879 --> 00:16:18.519
<v Speaker 1>But when I was looking at the source text, standard

334
00:16:18.519 --> 00:16:22.840
<v Speaker 1>computer indexes for something this large are impossibly bloated. A

335
00:16:22.879 --> 00:16:26.039
<v Speaker 1>standard index, a suffix tree, for the human genome takes

336
00:16:26.120 --> 00:16:28.120
<v Speaker 1>up about forty gigabytes of active.

337
00:16:27.840 --> 00:16:30.840
<v Speaker 2>Memory, which is a fatal bottleneck. You can't load forty

338
00:16:30.879 --> 00:16:33.440
<v Speaker 2>gigabytes of data into the RAM of a standard computer.

339
00:16:33.799 --> 00:16:35.879
<v Speaker 2>It means the computer would have to constantly read back

340
00:16:35.879 --> 00:16:38.240
<v Speaker 2>and forth from the hard drive, which slows everything to

341
00:16:38.279 --> 00:16:39.200
<v Speaker 2>an absolute crawl.

342
00:16:39.320 --> 00:16:41.519
<v Speaker 1>And this is where we get to the absolute crown

343
00:16:41.639 --> 00:16:45.559
<v Speaker 1>jewel of this whole deep dive. Researchers Ferragina and Manzini

344
00:16:45.759 --> 00:16:48.559
<v Speaker 1>created the FM index, and they did it using a

345
00:16:48.559 --> 00:16:53.600
<v Speaker 1>mathematical trick called the Burrows-Wheeler transform, or BWT. But honestly,

346
00:16:53.679 --> 00:16:55.720
<v Speaker 1>reading the mechanics of this transform broke my brain a

347
00:16:55.759 --> 00:16:57.679
<v Speaker 1>little bit. How does BWT actually work?

348
00:16:57.960 --> 00:17:02.200
<v Speaker 2>It is notoriously difficult to visualize, but incredibly elegant

349
00:17:02.200 --> 00:17:06.480
<v Speaker 2>once you get it. The BWT is a permutation. It

350
00:17:06.559 --> 00:17:11.279
<v Speaker 2>reorganizes the text. Imagine taking a sequence of letters, rotating

351
00:17:11.279 --> 00:17:13.960
<v Speaker 2>the whole sequence by one letter, writing that down, rotating

352
00:17:14.039 --> 00:17:17.480
<v Speaker 2>it again, and listing out all the possible rotations. Then

353
00:17:17.720 --> 00:17:19.599
<v Speaker 2>you sort those rows alphabetically.

354
00:17:19.720 --> 00:17:21.759
<v Speaker 1>Okay, I'm with you, but why do that? What does

355
00:17:21.799 --> 00:17:25.000
<v Speaker 1>alphabetically sorting a bunch of rotated gibberish actually achieve?

356
00:17:25.559 --> 00:17:28.440
<v Speaker 2>Because of the underlying structure of human language and DNA,

357
00:17:29.039 --> 00:17:32.759
<v Speaker 2>when you sort those rotations alphabetically, a mathematical magic trick happens.

358
00:17:32.799 --> 00:17:36.119
<v Speaker 2>In the final column of that list, identical characters suddenly

359
00:17:36.200 --> 00:17:39.440
<v Speaker 2>group together. So instead of a random string like ACGTAC,

360
00:17:39.960 --> 00:17:42.079
<v Speaker 2>the final column will spit out long runs of the

361
00:17:42.119 --> 00:17:44.759
<v Speaker 2>same letter, like AACCGT. Oh,

362
00:17:44.839 --> 00:17:47.759
<v Speaker 1>And because they are grouped together, you can compress them exactly.

363
00:17:47.799 --> 00:17:49.200
<v Speaker 2>It's called run-length encoding.

364
00:17:49.359 --> 00:17:51.279
<v Speaker 1>Wait, let me make sure I'm picturing this right. Instead

365
00:17:51.279 --> 00:17:54.720
<v Speaker 1>of the computer wasting memory writing out AAAAA, run-length

366
00:17:54.720 --> 00:17:57.279
<v Speaker 1>encoding just writes five A. Yes, that

367
00:17:57.359 --> 00:18:00.279
<v Speaker 2>Single trick allows the FM index to shrink the already

368
00:18:00.279 --> 00:18:02.359
<v Speaker 2>gigabyte in decks down to less than two gigabytes.

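NOTE
Editor's sketch of the rotate, sort, take-last-column, then run-length-encode
idea described above, assuming Python and a "$" end marker that sorts before
A, C, G, and T. The function names and the sample string are illustrative.
Sorting explicit rotations only works for tiny inputs; real tools typically
derive the transform from a suffix array instead.
```python
from itertools import groupby
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (fine for tiny inputs only)."""
    text += "$"  # end marker, assumed absent from the text itself
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(row[-1] for row in rotations)
def run_length_encode(s):
    """Collapse each run of identical characters into a (count, char) pair."""
    return [(len(list(group)), char) for char, group in groupby(s)]
last_column = bwt("ACACACAC")
print(last_column)                     # CCCC$AAAA
print(run_length_encode(last_column))  # [(4, 'C'), (1, '$'), (4, 'A')]
```
The repetitive input is chosen so the grouping is easy to see; on less
repetitive text the runs in the last column are shorter.
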
369
00:18:02.599 --> 00:18:06.440
<v Speaker 1>Here's where it gets really interesting. Suddenly the entire searchable

370
00:18:06.519 --> 00:18:09.559
<v Speaker 1>map of the human genome fits comfortably into the active

371
00:18:09.599 --> 00:18:11.480
<v Speaker 1>memory of a cheap laptop you could buy at a

372
00:18:11.559 --> 00:18:12.359
<v Speaker 1>big box store.

373
00:18:12.839 --> 00:18:16.319
<v Speaker 2>Yeah, that is just staggering. But the compression isn't even

374
00:18:16.359 --> 00:18:19.240
<v Speaker 2>the craziest part. The sources mention it allows for

375
00:18:19.599 --> 00:18:24.359
<v Speaker 2>backward search, which sounds impossible. How do you search compressed

376
00:18:24.400 --> 00:18:27.720
<v Speaker 2>data without uncompressing it first? This is the true genius

377
00:18:27.720 --> 00:18:31.319
<v Speaker 2>of the BWT. Because of how the matrix is mathematically structured,

378
00:18:31.559 --> 00:18:34.519
<v Speaker 2>you can jump between the columns to trace a sequence backwards,

379
00:18:34.599 --> 00:18:38.000
<v Speaker 2>letter by letter without ever unpacking the file. And here

380
00:18:38.079 --> 00:18:40.519
<v Speaker 2>is the kicker. The time it takes to search for

381
00:18:40.559 --> 00:18:43.000
<v Speaker 2>a pattern is proportional only to the length of your

382
00:18:43.039 --> 00:18:46.359
<v Speaker 2>query string. It completely ignores the massive size of the

383
00:18:46.359 --> 00:18:47.119
<v Speaker 2>actual genome.

384
00:18:47.200 --> 00:18:49.240
<v Speaker 1>Hold on, you're saying that if I want to search

385
00:18:49.240 --> 00:18:51.839
<v Speaker 1>for a fifty letter sequence, it takes the exact same

386
00:18:51.880 --> 00:18:54.359
<v Speaker 1>amount of time whether I am searching the tiny genome

387
00:18:54.400 --> 00:18:57.640
<v Speaker 1>of a fruit fly or the three billion letter human genome?

388
00:18:57.799 --> 00:19:00.480
<v Speaker 2>Exactly. The size of the haystack no longer matters. The

389
00:19:00.519 --> 00:19:02.599
<v Speaker 2>time it takes only depends on the size of the needle.

390
00:19:02.839 --> 00:19:05.319
<v Speaker 2>It completely democratized genomic research overnight.

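NOTE
Editor's sketch of the backward search described above, assuming Python, a
"$" end marker, and a toy reference string. The names build_index,
count_occurrences, and first_row are hypothetical. The rank step here counts
characters naively, so this toy is not itself independent of text length;
real FM indexes precompute rank tables so each pattern character costs
constant time.
```python
from collections import Counter
def build_index(text):
    """Return the BWT last column and a first-row table for text (toy construction)."""
    text += "$"  # end marker, assumed to sort before every other character
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    last = "".join(row[-1] for row in rotations)
    counts = Counter(text)
    first_row, total = {}, 0
    for char in sorted(counts):
        first_row[char] = total  # number of characters in text smaller than char
        total += counts[char]
    return last, first_row
def count_occurrences(pattern, last, first_row):
    """Backward search: one interval update per pattern character, last to first."""
    lo, hi = 0, len(last)  # half-open range of sorted rotations still matching
    for char in reversed(pattern):
        if char not in first_row:
            return 0
        # Naive rank by counting; real indexes precompute this lookup.
        lo = first_row[char] + last[:lo].count(char)
        hi = first_row[char] + last[:hi].count(char)
        if lo >= hi:
            return 0
    return hi - lo
last, first_row = build_index("ACGTACGGTCAGT")
print(count_occurrences("ACG", last, first_row))  # 2
print(count_occurrences("TTT", last, first_row))  # 0
```
The loop runs once per letter of the query and never reconstructs the
original text, which is the property the speakers are pointing at.
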
391
00:19:05.440 --> 00:19:08.920
<v Speaker 1>Okay, so we've gone from wet chemistry to massive raw data,

392
00:19:09.440 --> 00:19:14.599
<v Speaker 1>to error-correcting files, to mind-blowing compression algorithms. So

393
00:19:14.839 --> 00:19:17.720
<v Speaker 1>what does this all mean? What does all this computational

394
00:19:17.799 --> 00:19:21.240
<v Speaker 1>heavy lifting actually do for the person listening right now?

395
00:19:21.359 --> 00:19:23.440
<v Speaker 1>If you are a patient in a hospital, what are

396
00:19:23.440 --> 00:19:24.279
<v Speaker 1>the stakes?

397
00:19:24.039 --> 00:19:27.279
<v Speaker 2>The stakes are your life. Before these algorithms, the genome

398
00:19:27.359 --> 00:19:30.079
<v Speaker 2>was a black box. Today, because we can search it

399
00:19:30.079 --> 00:19:33.000
<v Speaker 2>so quickly, we discovered incredible things. We found out that

400
00:19:33.039 --> 00:19:36.680
<v Speaker 2>humans only have about twenty thousand protein-coding genes. That's

401
00:19:36.720 --> 00:19:38.480
<v Speaker 2>a mere three percent of our total.

402
00:19:38.319 --> 00:19:41.599
<v Speaker 1>DNA. Really, just three percent actually builds the proteins?

403
00:19:41.680 --> 00:19:44.000
<v Speaker 1>The rest is essentially regulatory instructions.

404
00:19:44.119 --> 00:19:46.880
<v Speaker 2>Yes, and because we can quickly map a patient's DNA

405
00:19:46.960 --> 00:19:50.440
<v Speaker 2>against the reference, we can find the exact microscopic typos

406
00:19:50.480 --> 00:19:53.960
<v Speaker 2>causing their illness. The sources highlight a perfect example, the

407
00:19:54.000 --> 00:19:55.799
<v Speaker 2>BCR-ABL1 fusion gene.

408
00:19:55.920 --> 00:19:57.920
<v Speaker 1>Right, that's a structural variation. It's when a piece of

409
00:19:57.960 --> 00:20:01.559
<v Speaker 1>chromosome nine accidentally breaks off and attaches to chromosome twenty two.

410
00:20:01.680 --> 00:20:05.480
<v Speaker 2>Right, and that specific structural typo is present in ninety

411
00:20:05.480 --> 00:20:10.400
<v Speaker 2>five percent of patients with chronic myelogenous leukemia. Before this technology,

412
00:20:10.759 --> 00:20:13.359
<v Speaker 2>we just knew a patient had cancer, and we threw

413
00:20:13.440 --> 00:20:17.480
<v Speaker 2>toxic chemotherapy at them, hoping it worked. Today, we sequence

414
00:20:17.519 --> 00:20:20.920
<v Speaker 2>the genome, find the exact broken gear in the cellular machinery,

415
00:20:21.200 --> 00:20:24.240
<v Speaker 2>and use highly targeted drugs designed specifically to block that

416
00:20:24.359 --> 00:20:25.160
<v Speaker 2>mutated protein.

417
00:20:25.319 --> 00:20:27.880
<v Speaker 1>And it isn't just about static DNA either, right? The

418
00:20:27.960 --> 00:20:30.519
<v Speaker 1>sources talk about RNA-seq and ChIP-seq.

419
00:20:30.640 --> 00:20:33.519
<v Speaker 2>Yes. If DNA is the architectural blueprint of a house,

420
00:20:33.880 --> 00:20:37.559
<v Speaker 2>RNA-seq is watching the construction workers actually build it.

421
00:20:37.559 --> 00:20:40.799
<v Speaker 2>It tells us the transcriptome: which specific genes are actively

422
00:20:40.839 --> 00:20:43.240
<v Speaker 2>turned on or off in a cell at any given second,

423
00:20:43.559 --> 00:20:47.240
<v Speaker 2>and ChIP-seq maps out the specific proteins, the transcription factors,

424
00:20:47.400 --> 00:20:48.559
<v Speaker 2>that are flipping those switches.

425
00:20:48.599 --> 00:20:51.079
<v Speaker 1>It's watching the engine run in real time, which really

426
00:20:51.119 --> 00:20:53.839
<v Speaker 1>means the era of one size fits all medicine is dying.

427
00:20:54.079 --> 00:20:56.319
<v Speaker 1>If you get sick, doctors won't be guessing your treatment

428
00:20:56.359 --> 00:20:59.119
<v Speaker 1>based on population averages anymore. They are going to use

429
00:20:59.160 --> 00:21:02.960
<v Speaker 1>these algorithms to read your specific genetic typos and prescribe personalized,

430
00:21:03.000 --> 00:21:06.279
<v Speaker 1>stratified medicine designed exactly for your unique biology.

431
00:21:06.519 --> 00:21:10.000
<v Speaker 2>It is a complete paradigm shift. We've moved from observing

432
00:21:10.000 --> 00:21:13.640
<v Speaker 2>symptoms to observing the fundamental code of life in real.

433
00:21:13.440 --> 00:21:17.400
<v Speaker 1>Time, and the technology is accelerating. Our source text mentions

434
00:21:17.559 --> 00:21:21.759
<v Speaker 1>that Oxford Nanopore, the company pulling DNA through microscopic holes,

435
00:21:22.039 --> 00:21:25.440
<v Speaker 1>has created a device called the MinION. It's a disposable

436
00:21:25.519 --> 00:21:28.519
<v Speaker 1>DNA sequencer, the exact size and shape of a standard

437
00:21:28.640 --> 00:21:31.559
<v Speaker 1>USB flash drive. You just plug it right into a laptop.

438
00:21:31.880 --> 00:21:33.759
<v Speaker 2>Just think about the implications of that for a moment.

439
00:21:34.000 --> 00:21:36.839
<v Speaker 1>It completely flips the power dynamic, building on everything we've

440
00:21:36.880 --> 00:21:40.119
<v Speaker 1>talked about today, the error correction, the dynamic programming, the

441
00:21:40.160 --> 00:21:44.079
<v Speaker 1>heavily compressed FM index. We are rapidly approaching a world

442
00:21:44.079 --> 00:21:47.559
<v Speaker 1>where you could sequence your own DNA at home. If

443
00:21:47.599 --> 00:21:50.400
<v Speaker 1>the code that dictates whether your cells live, die, or

444
00:21:50.519 --> 00:21:53.160
<v Speaker 1>mutate can be read as easily as scanning a grocery

445
00:21:53.200 --> 00:21:56.440
<v Speaker 1>store barcode in your living room, how is that going

446
00:21:56.519 --> 00:21:59.279
<v Speaker 1>to change our relationship with our own biology? What does

447
00:21:59.319 --> 00:22:01.920
<v Speaker 1>it mean for our privacy or our insurance if anyone

448
00:22:01.920 --> 00:22:04.559
<v Speaker 1>can plug a thumb drive in and read our biological destiny?

449
00:22:04.920 --> 00:22:08.720
<v Speaker 2>It's a profound frontier. The diagnostic muddy waters are finally clearing,

450
00:22:09.039 --> 00:22:11.200
<v Speaker 2>but what we find underneath is going to challenge us

451
00:22:11.240 --> 00:22:13.200
<v Speaker 2>as a society in entirely new ways.

452
00:22:13.519 --> 00:22:16.240
<v Speaker 1>It really is something for you to ponder long after

453
00:22:16.279 --> 00:22:19.839
<v Speaker 1>this deep dive ends, because the ultimate instruction manual is

454
00:22:19.880 --> 00:22:23.160
<v Speaker 1>no longer hidden. Thank you so much for joining us

455
00:22:23.160 --> 00:22:26.200
<v Speaker 1>as we unpack the invisible architecture of your own biology.

456
00:22:26.519 --> 00:22:29.119
<v Speaker 1>Keep questioning, keep learning, and join us next time as

457
00:22:29.160 --> 00:22:32.039
<v Speaker 1>we continue to explore the absolute edges of human knowledge.
