WEBVTT 1 00:00:00.080 --> 00:00:03.600 Welcome back to another deep dive. Today we're opening up 2 00:00:03.640 --> 00:00:06.639 a topic that I think a lot of us in 3 00:00:06.679 --> 00:00:08.880 the trenches have a bit of a love-hate relationship with. 4 00:00:09.000 --> 00:00:09.519 Oh, definitely. 5 00:00:09.679 --> 00:00:12.240 We are talking about vSphere 7.x, and I 6 00:00:12.320 --> 00:00:15.759 know the immediate reaction for you listening is probably: great, 7 00:00:16.480 --> 00:00:19.519 another update, another version number to track. But we've been 8 00:00:19.640 --> 00:00:22.679 poring over the official cert guide for exam 9 00:00:22.800 --> 00:00:26.039 2V0-21.20. Yeah, that's by Davis, 10 00:00:26.199 --> 00:00:31.199 Baca, and Thomas. And honestly, this feels different. It really 11 00:00:31.199 --> 00:00:32.719 doesn't feel like just a service pack. 12 00:00:33.000 --> 00:00:35.000 It really isn't. I mean, when you actually dig into 13 00:00:35.000 --> 00:00:38.079 the architecture changes laid out in this guide, vSphere 7 14 00:00:38.159 --> 00:00:40.240 represents a massive pivot. 15 00:00:40.359 --> 00:00:40.439 Yeah. 16 00:00:40.600 --> 00:00:44.840 It's the exact moment where VMware stopped just virtualizing servers 17 00:00:44.920 --> 00:00:47.560 and really started enforcing the software-defined data center, 18 00:00:47.840 --> 00:00:51.119 the SDDC. We're moving from a world where we manage 19 00:00:51.119 --> 00:00:54.000 individual boxes to a world where we manage policies and 20 00:00:54.039 --> 00:00:54.960 desired states. 21 00:00:55.119 --> 00:00:58.079 Desired state, that is the buzzword, right? Yeah. But looking 22 00:00:58.119 --> 00:01:02.039 at the source material here, there is some serious engineering 23 00:01:02.079 --> 00:01:04.920 behind that marketing term. Absolutely. We've got the death of 24 00:01:04.959 --> 00:01:08.560 the external Platform Services Controller, finally long.
25 00:01:08.359 --> 00:01:12.239 Overdue, the complete overhaul of how storage is handled with 26 00:01:12.359 --> 00:01:16.040 vSAN and vVols, and some pretty scary warnings about where 27 00:01:16.159 --> 00:01:19.239 you can and cannot install ESXi anymore. 28 00:01:19.599 --> 00:01:22.359 Yeah, it's a lot to unpack. The guide is dense 29 00:01:22.480 --> 00:01:26.200 because the changes are so fundamental. They're basically ripping out 30 00:01:26.519 --> 00:01:29.879 legacy code that has been there for a decade and 31 00:01:29.920 --> 00:01:33.280 replacing it with this modern, container-aware architecture. 32 00:01:33.920 --> 00:01:37.480 So let's start with the brain of the beast, vCenter Server. Now, 33 00:01:37.519 --> 00:01:39.760 if you're listening and you were running vSphere 6.0 34 00:01:39.799 --> 00:01:42.760 or 6.5, you probably have battle scars 35 00:01:42.799 --> 00:01:46.519 from dealing with the Platform Services Controller, the PSC. 36 00:01:46.519 --> 00:01:47.560 Oh, the topologies. 37 00:01:47.680 --> 00:01:50.959 I remember having to design these incredibly complex topologies with 38 00:01:51.079 --> 00:01:54.719 external PSCs sitting behind third-party load balancers just to 39 00:01:54.760 --> 00:01:57.079 get single sign-on to work across sites. It was 40 00:01:57.120 --> 00:01:58.000 frankly a nightmare. 41 00:01:58.040 --> 00:02:00.920 It was incredibly complex. You had to manage replication agreements 42 00:02:00.959 --> 00:02:03.400 between those PSCs. You had to manage the certificates for 43 00:02:03.480 --> 00:02:04.640 each individual one. 44 00:02:04.760 --> 00:02:07.359 The certificates were the worst. Right, and if the load 45 00:02:07.400 --> 00:02:11.120 balancer misbehaved, your admins literally couldn't log in. But 46 00:02:11.199 --> 00:02:13.599 the good news from the cert guide is that vSphere 47 00:02:13.919 --> 00:02:16.759 7 effectively kills the external PSC.
48 00:02:16.560 --> 00:02:18.680 So it's dead? Like, I don't have to build them anymore? 49 00:02:18.759 --> 00:02:21.960 It is dead. The entire architecture has collapsed back into 50 00:02:22.000 --> 00:02:26.599 a simplified single-appliance model. Okay. All those services, single 51 00:02:26.639 --> 00:02:31.319 sign-on, the license service, the VMware Certificate Authority, they're 52 00:02:31.360 --> 00:02:34.680 all now running natively inside the vCenter Server Appliance, 53 00:02:35.280 --> 00:02:36.080 the VCSA. 54 00:02:36.319 --> 00:02:39.080 Okay, that sounds great for a greenfield deployment, but what 55 00:02:39.199 --> 00:02:41.840 about the poor admin out there who has that complex 56 00:02:41.919 --> 00:02:46.280 6.7 topology with external PSCs? Is the upgrade 57 00:02:46.280 --> 00:02:48.439 path going to be a complete rip-and-replace scenario? 58 00:02:49.000 --> 00:02:52.840 Surprisingly, no. The guide highlights that the upgrade tool is 59 00:02:52.840 --> 00:02:55.400 actually a converge tool. Oh, interesting. When you run the 60 00:02:55.479 --> 00:02:58.599 vSphere 7 installer against your existing environment, it actually 61 00:02:58.680 --> 00:03:02.840 detects those external PSCs, migrates their data and identity into 62 00:03:02.879 --> 00:03:07.159 the new vCenter appliance, and then essentially decommissions the old nodes. Ah, 63 00:03:07.439 --> 00:03:09.560 it converges the topology automatically for you. 64 00:03:09.759 --> 00:03:12.639 That is a huge relief to hear. But there's another 65 00:03:12.639 --> 00:03:17.280 obituary in here. The guide is pretty explicit that vCenter 66 00:03:17.319 --> 00:03:18.479 Server for Windows is 67 00:03:18.479 --> 00:03:22.080 gone. Correct. Done. No more installing vCenter on top 68 00:03:22.120 --> 00:03:25.000 of Windows Server.
We are strictly in the world of 69 00:03:25.000 --> 00:03:28.240 the Photon OS appliance now. Right, which is great for security 70 00:03:28.280 --> 00:03:31.520 and patching, but it does mean if you are relying 71 00:03:31.560 --> 00:03:35.120 on Windows-specific scripts or agents running locally on your 72 00:03:35.199 --> 00:03:36.240 vCenter box, 73 00:03:36.039 --> 00:03:38.280 Which a lot of people did. Exactly, 74 00:03:38.039 --> 00:03:40.439 that workflow is completely broken now. You have to adapt. 75 00:03:40.520 --> 00:03:45.199 Okay, so we have this single monolithic appliance now handling everything. 76 00:03:45.199 --> 00:03:48.039 It's the brain, the heart, and the nervous system. But 77 00:03:48.080 --> 00:03:51.599 if that appliance dies, we are flying blind. I know 78 00:03:51.719 --> 00:03:55.319 vCenter High Availability existed before, but the guide makes 79 00:03:55.319 --> 00:03:58.360 it sound like the architecture under the hood has really changed. 80 00:03:58.879 --> 00:04:00.479 How are we keeping this thing alive? 81 00:04:00.599 --> 00:04:03.639 So, vCenter HA in version 7 is very slick, but 82 00:04:03.680 --> 00:04:06.560 you have to understand it requires specific networking. It uses 83 00:04:06.599 --> 00:04:09.159 a three-node cluster. Okay. You've got an active node, 84 00:04:09.199 --> 00:04:10.960 a passive node, and a witness node. Walk 85 00:04:10.879 --> 00:04:12.919 us through the replication there. Is this just doing a 86 00:04:12.960 --> 00:04:14.159 standard storage mirror? 87 00:04:14.280 --> 00:04:17.120 No, no, it's much more intelligent than that. The active node, 88 00:04:17.319 --> 00:04:19.879 the one you are actually logged into and using, is 89 00:04:19.959 --> 00:04:23.879 replicating data to the passive node through two distinct channels.
90 00:04:23.879 --> 00:04:27.800 Two channels? Right, it uses native PostgreSQL replication for the database, 91 00:04:28.319 --> 00:04:31.240 ensuring that all your inventory and events are synced transactionally. 92 00:04:32.079 --> 00:04:35.920 And then it uses a separate file-level replication, basically 93 00:04:36.079 --> 00:04:39.000 rsync, to keep the configuration files in sync. 94 00:04:39.279 --> 00:04:42.040 And the witness, is that just a third copy of 95 00:04:42.040 --> 00:04:43.199 the database? 96 00:04:42.879 --> 00:04:45.040 Not at all. The witness is just a tiebreaker. It's 97 00:04:45.079 --> 00:04:48.000 a very lightweight clone. It doesn't hold the database or 98 00:04:48.040 --> 00:04:48.519 the files. 99 00:04:48.600 --> 00:04:49.319 No? What does it do? 100 00:04:49.839 --> 00:04:53.040 Its only job is to provide quorum. If your network 101 00:04:53.120 --> 00:04:55.439 hiccups and the active and passive nodes lose sight of 102 00:04:55.439 --> 00:04:57.839 each other, you run the massive risk of a split- 103 00:04:57.879 --> 00:05:00.920 brain scenario where both think they are the master. 104 00:05:00.439 --> 00:05:02.399 Right, which corrupts everything. Exactly. 105 00:05:02.720 --> 00:05:06.240 The witness basically casts the deciding vote on who actually 106 00:05:06.040 --> 00:05:08.959 owns the cluster. And the guide mentions some strict requirements 107 00:05:08.959 --> 00:05:11.639 for this, right? You can't just throw these nodes anywhere 108 00:05:11.680 --> 00:05:12.399 on your network. 109 00:05:12.639 --> 00:05:16.120 No, you really can't. You need a dedicated vCenter HA 110 00:05:16.279 --> 00:05:20.639 network interface on each node, and the latency requirement is strict. 111 00:05:20.720 --> 00:05:23.759 How strict? Less than 10 milliseconds between the active and 112 00:05:23.800 --> 00:05:26.720 passive nodes.
If you try to stretch this across a 113 00:05:26.800 --> 00:05:30.319 laggy WAN link, the PostgreSQL replication will time out 114 00:05:30.480 --> 00:05:31.800 and the cluster will just fail. 115 00:05:31.920 --> 00:05:34.920 Good to know. So we've bulletproofed the brain, but the 116 00:05:34.959 --> 00:05:39.279 brain is useless if the body, meaning the ESXi hosts themselves, 117 00:05:39.360 --> 00:05:40.120 are crumbling. 118 00:05:40.240 --> 00:05:41.319 That's true. 119 00:05:41.240 --> 00:05:44.319 And that brings us to a somewhat controversial change in the 120 00:05:44.319 --> 00:05:46.759 guide regarding how we actually boot these servers. 121 00:05:46.800 --> 00:05:48.399 Oh yeah, the boot media. 122 00:05:48.600 --> 00:05:51.399 For years in my home lab, and honestly even in 123 00:05:51.399 --> 00:05:55.000 some production environments, I'd just slap ESXi on a generic 124 00:05:55.079 --> 00:05:58.040 8-gig USB stick or an SD card. It was cheap, 125 00:05:58.079 --> 00:06:00.959 it worked. But reading this guide, it sounds like VMware 126 00:06:01.040 --> 00:06:03.000 is declaring war on USB boot drives. 127 00:06:03.079 --> 00:06:05.319 War might be a strong word, but they are definitely 128 00:06:05.360 --> 00:06:08.399 waving a massive red flag for you. The issue isn't 129 00:06:08.439 --> 00:06:10.879 just the capacity, it's the partition structure. 130 00:06:10.959 --> 00:06:12.040 Okay, break that down. 131 00:06:12.319 --> 00:06:14.680 In previous versions, that boot drive really just held the 132 00:06:14.879 --> 00:06:17.959 hypervisor image, which is tiny. It loads into memory and 133 00:06:18.000 --> 00:06:23.079 it's done. But vSphere 7 introduces a completely new partition layout. 134 00:06:23.480 --> 00:06:26.120 The big one you need to know about is ESX-OSData. 135 00:06:26.519 --> 00:06:29.560 ESX-OSData. That sounds ominous. What goes in there? 136 00:06:29.680 --> 00:06:33.240 It's consolidation.
It takes the old scratch partition, the locker 137 00:06:33.240 --> 00:06:36.319 for VMware Tools, and the core dump location and puts 138 00:06:36.360 --> 00:06:39.800 them in one place. But here is the kicker. It 139 00:06:39.879 --> 00:06:42.199 is formatted with VMFS-L. 140 00:06:42.480 --> 00:06:45.680 VMFS-L, like the filesystem used in vSAN? 141 00:06:45.360 --> 00:06:48.879 Exactly like that. It's a high-performance filesystem designed specifically 142 00:06:48.920 --> 00:06:52.800 for frequent reads and writes. This partition stores system logs, traces, 143 00:06:52.800 --> 00:06:54.759 and live database entries for the host itself. 144 00:06:54.800 --> 00:06:56.120 Okay, I see where this is going. 145 00:06:56.240 --> 00:06:58.600 Right. If you put that kind of heavy I/O load 146 00:06:58.639 --> 00:07:01.519 on a cheap consumer-grade USB stick or an SD card, 147 00:07:01.800 --> 00:07:04.079 you're going to burn out those NAND flash cells in 148 00:07:04.120 --> 00:07:05.759 a matter of months, maybe even weeks. 149 00:07:05.839 --> 00:07:07.680 So the hypervisor literally 150 00:07:07.360 --> 00:07:10.199 writes the drive to death. Yes. And because of this, 151 00:07:10.360 --> 00:07:12.600 if the installer detects you are booting from a low- 152 00:07:12.680 --> 00:07:16.519 quality USB device, it creates the ESX-OSData partition, 153 00:07:16.879 --> 00:07:17.920 but it runs it in 154 00:07:17.759 --> 00:07:19.480 degraded mode. Degraded mode? 155 00:07:19.600 --> 00:07:21.759 Yeah, it tries to limit the writes to save the drive, 156 00:07:22.279 --> 00:07:26.480 but you lose functionality and your logs are seriously at risk. 157 00:07:27.079 --> 00:07:29.639 So what is the actual recommendation from the cert guide? 158 00:07:29.720 --> 00:07:31.879 Are we going back to spinning rust for boot drives?
159 00:07:32.120 --> 00:07:36.079 The guide firmly recommends a local persistent disk. That means 160 00:07:36.079 --> 00:07:39.639 an HDD or an SSD of at least 32 gigabytes. 161 00:07:40.160 --> 00:07:42.800 That gives you enough room for the boot banks and 162 00:07:42.879 --> 00:07:45.800 a fully functional ESX-OSData partition. 163 00:07:45.519 --> 00:07:47.680 And if you absolutely have to use USB? 164 00:07:47.879 --> 00:07:50.040 If you must use USB, you have to pair it 165 00:07:50.079 --> 00:07:53.519 with a local disk to offload that scratch partition, or 166 00:07:53.560 --> 00:07:55.000 you are living on borrowed time. 167 00:07:55.120 --> 00:07:56.720 That is going to catch a lot of people off 168 00:07:56.759 --> 00:07:59.399 guard during a hardware refresh. Speaking of things that catch 169 00:07:59.399 --> 00:08:02.839 people off guard: DNS and NTP. Classics. I feel 170 00:08:02.839 --> 00:08:04.639 like we talk about this every year, but the guide 171 00:08:04.720 --> 00:08:06.079 is just relentless about it 172 00:08:06.120 --> 00:08:08.560 this time. It has to be. In vSphere 7, the 173 00:08:08.600 --> 00:08:12.240 dependencies are hard-coded. Take DNS, for example. When you 174 00:08:12.240 --> 00:08:16.160 are deploying that new VCSA, the installer pauses and actually 175 00:08:16.240 --> 00:08:19.079 performs a reverse lookup on the IP address you provided. 176 00:08:19.319 --> 00:08:22.240 If it cannot resolve that IP back to the fully 177 00:08:22.279 --> 00:08:27.240 qualified domain name, the FQDN, the installation literally fails. It 178 00:08:27.279 --> 00:08:29.120 doesn't warn you, it just stops. 179 00:08:29.279 --> 00:08:30.040 Zero tolerance. 180 00:08:30.160 --> 00:08:33.000 Zero tolerance. And NTP is even more critical because of 181 00:08:33.080 --> 00:08:36.440 SSO. How so? The authentication tokens, the SAML tokens used 182 00:08:36.440 --> 00:08:39.279 between vCenter and the hosts, they're all timestamped.
183 00:08:39.600 --> 00:08:41.960 If your host drifts more than a few minutes away 184 00:08:41.960 --> 00:08:45.200 from the vCenter time, those tokens are rejected. Oh man. 185 00:08:45.320 --> 00:08:48.840 Suddenly your backups fail, vMotion fails, and you literally 186 00:08:48.919 --> 00:08:50.120 can't log into the host. 187 00:08:50.279 --> 00:08:52.879 So for everyone listening, check your PTR records and your 188 00:08:52.879 --> 00:08:55.200 time servers before you even download the ISO. 189 00:08:55.360 --> 00:08:57.840 The unsexy work that saves the deployment. Let's 190 00:08:57.639 --> 00:09:01.080 shift gears to something a little sexier: storage. This 191 00:09:01.159 --> 00:09:04.480 seems to be where the software-defined part of SDDC 192 00:09:04.679 --> 00:09:07.840 really kicks into overdrive. Absolutely. We still have the classics, 193 00:09:07.960 --> 00:09:11.279 VMFS and NFS. Are they just legacy support now, or 194 00:09:11.320 --> 00:09:12.919 have they actually improved in this version? 195 00:09:13.000 --> 00:09:15.919 Oh, they've definitely evolved. VMFS 6 is the absolute standard 196 00:09:15.919 --> 00:09:20.120 now, and the big thing it handles is automatic UNMAP. 197 00:09:20.440 --> 00:09:21.679 UNMAP? How does that work? 198 00:09:21.879 --> 00:09:24.039 In the old days, if you deleted 100 gigs 199 00:09:24.039 --> 00:09:27.480 of data inside a Windows VM, the underlying storage array 200 00:09:27.679 --> 00:09:30.440 had no idea that space was free. It stayed marked 201 00:09:30.440 --> 00:09:34.320 as used. VMFS 6 automatically sends commands down to the 202 00:09:34.440 --> 00:09:35.960 array to reclaim that space. 203 00:09:36.360 --> 00:09:38.840 That's a huge space saver. And what about NFS? I 204 00:09:38.919 --> 00:09:43.399 usually associate NFS with holding ISO files, not running high-performance workloads.
205 00:09:43.480 --> 00:09:46.600 Right. But vSphere 7 pushes NFS 4.1, which 206 00:09:46.679 --> 00:09:49.320 is a massive leap over NFS 3. The two big 207 00:09:49.320 --> 00:09:51.639 features you get are multipathing and Kerberos. 208 00:09:51.759 --> 00:09:53.240 Okay, multipathing makes sense. 209 00:09:53.320 --> 00:09:56.120 Yeah, NFS 3 relied on a single TCP session. If 210 00:09:56.120 --> 00:09:59.200 that link got saturated, you were bottlenecked. NFS 211 00:09:59.240 --> 00:10:03.200 4.1 supports true multipathing across multiple links, and Kerberos means 212 00:10:03.200 --> 00:10:05.240 we can finally encrypt that storage traffic on the 213 00:10:05.159 --> 00:10:07.559 wire. Which is a huge compliance requirement for a lot 214 00:10:07.559 --> 00:10:10.240 of organizations now. Exactly. Now, I saw an acronym in 215 00:10:10.279 --> 00:10:13.000 the guide that was new to me, HPP, the 216 00:10:13.039 --> 00:10:16.360 High-Performance Plug-in. For the last decade, we've relied on NMP, 217 00:10:16.559 --> 00:10:19.320 the Native Multipathing Plug-in. Why do we need a 218 00:10:19.360 --> 00:10:20.000 new one now? 219 00:10:20.200 --> 00:10:23.279 This is entirely driven by the rise of NVMe, Non- 220 00:10:23.360 --> 00:10:28.279 Volatile Memory Express. Think about NMP as a traffic cop 221 00:10:28.519 --> 00:10:32.000 that was designed for traffic in the 1990s: spinning disks. 222 00:10:32.159 --> 00:10:32.399 Right. 223 00:10:32.720 --> 00:10:35.279 It has these complex locks and queues that made total 224 00:10:35.320 --> 00:10:38.600 sense when drives were slow. But modern NVMe flash 225 00:10:38.919 --> 00:10:42.799 is so incredibly fast that the software stack, NMP itself, 226 00:10:43.000 --> 00:10:44.399 actually became the bottleneck. 227 00:10:44.519 --> 00:10:46.559 The software couldn't click the send button fast enough for 228 00:10:46.559 --> 00:10:47.440 the hardware.
229 00:10:47.080 --> 00:10:50.840 Precisely. So VMware wrote the HPP specifically for NVMe and 230 00:10:50.960 --> 00:10:55.320 NVMe over Fabrics. It removes those legacy locks and optimizes 231 00:10:55.360 --> 00:10:59.240 the whole I/O path to handle millions of IOPS without killing 232 00:10:59.279 --> 00:11:00.000 your CPU. 233 00:11:00.360 --> 00:11:02.960 So, if you're buying an all-flash NVMe array today, 234 00:11:03.200 --> 00:11:04.480 you need to be using HPP. 235 00:11:04.720 --> 00:11:06.720 If you aren't, you're just wasting the money you spent 236 00:11:06.799 --> 00:11:07.279 on that array. 237 00:11:07.559 --> 00:11:11.320 This moves us nicely into the "no more LUNs" conversation: 238 00:11:11.759 --> 00:11:14.840 vVols, or Virtual Volumes. The concept has been around for 239 00:11:14.840 --> 00:11:16.600 a while, but the guide really treats it as a 240 00:11:16.600 --> 00:11:19.720 primary citizen in version 7. For those who haven't deployed it, 241 00:11:19.799 --> 00:11:21.240 what is the actual mechanism here? 242 00:11:21.519 --> 00:11:25.200 So vVols changes the entire relationship between vCenter and the 243 00:11:25.240 --> 00:11:28.399 storage array. It introduces a component called the VASA 244 00:11:28.200 --> 00:11:32.159 provider. vSphere APIs for Storage Awareness, right? 245 00:11:32.320 --> 00:11:35.240 This acts as a translator. Instead of the array presenting 246 00:11:35.279 --> 00:11:39.600 a dumb 10-terabyte block of space, a LUN, the 247 00:11:40.240 --> 00:11:44.200 VASA provider tells vCenter, hey, I can do replication, I 248 00:11:44.200 --> 00:11:47.159 can do deduplication, and I can do encryption. And then 249 00:11:47.240 --> 00:11:51.120 vCenter pushes that policy down to the individual VM level. Exactly. 250 00:11:51.320 --> 00:11:53.840 When you create a VM, the array creates a specific 251 00:11:54.000 --> 00:11:57.120 virtual volume just for that VM's disk.
If you need 252 00:11:57.159 --> 00:12:00.360 to snapshot that VM, the array snapshots only that volume. 253 00:12:00.360 --> 00:12:02.600 You aren't snapshotting a whole LUN with twenty other 254 00:12:02.679 --> 00:12:03.399 VMs on it. 255 00:12:03.399 --> 00:12:06.679 It gives you granular control that matches the application, not 256 00:12:06.759 --> 00:12:09.759 the hardware limitations. Exactly. But if we really want to 257 00:12:09.799 --> 00:12:12.559 talk about true software-defined storage, we have to talk 258 00:12:12.559 --> 00:12:15.039 about vSAN. This really feels like the heart of the 259 00:12:15.120 --> 00:12:18.840 modern VMware stack. Conceptually, we're taking local disks in the 260 00:12:18.879 --> 00:12:21.639 servers and pooling them across the network. But the devil 261 00:12:21.679 --> 00:12:24.759 is always in the details, specifically regarding disk groups. 262 00:12:24.919 --> 00:12:27.480 The disk group is the fundamental building block of vSAN. 263 00:12:28.080 --> 00:12:30.919 Each host participating needs at least one, and the strict 264 00:12:31.000 --> 00:12:33.600 rule laid out in the guide is one cache device 265 00:12:33.960 --> 00:12:36.120 and one or more capacity devices 266 00:12:35.720 --> 00:12:38.639 per group. And that cache device is 267 00:12:38.399 --> 00:12:42.080 non-negotiable? Completely non-negotiable. It must be flash. Even in 268 00:12:42.120 --> 00:12:44.840 a hybrid cluster where your capacity tier is made of 269 00:12:44.919 --> 00:12:48.440 cheap spinning disks, that cache tier has to be high- 270 00:12:48.440 --> 00:12:51.720 performance SSD. Why is that? Because it absorbs 100 271 00:12:51.720 --> 00:12:54.000 percent of the write operations. It acts as a buffer. 272 00:12:54.600 --> 00:12:58.200 And here's the critical part. If that cache drive fails, 273 00:12:58.720 --> 00:13:00.679 the entire disk group goes offline. 274 00:13:00.720 --> 00:13:03.320 Wow.
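NOTE
The disk group rule described above (exactly one flash cache device, one or more capacity devices) can be sketched as a small validation function. This is a minimal illustration only: `Device` and `validate_disk_group` are hypothetical names invented for this note, not part of any vSphere or vSAN API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Device:
    """A hypothetical local disk in a host (not a vSAN API object)."""
    name: str
    is_flash: bool


def validate_disk_group(cache: List[Device], capacity: List[Device]) -> List[str]:
    """Return rule violations; an empty list means the group is valid."""
    problems = []
    if len(cache) != 1:
        problems.append("a disk group needs exactly one cache device")
    elif not cache[0].is_flash:
        problems.append("the cache device must be flash")
    if len(capacity) < 1:
        problems.append("a disk group needs at least one capacity device")
    return problems


# Hybrid group: flash cache in front of spinning capacity disks is allowed.
hybrid = validate_disk_group(
    cache=[Device("ssd0", is_flash=True)],
    capacity=[Device("hdd0", is_flash=False), Device("hdd1", is_flash=False)],
)
# Invalid group: spinning cache device and no capacity devices at all.
bad = validate_disk_group(cache=[Device("hdd9", is_flash=False)], capacity=[])

print(hybrid)
print(bad)
```

The hybrid group passes because only the cache tier is required to be flash, while the second group breaks both rules at once.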
Now the guide gets into some math that I 275 00:13:03.360 --> 00:13:06.360 think is really important for anyone doing capacity planning. It 276 00:13:06.399 --> 00:13:09.799 talks about RAID 5 and RAID 6 erasure coding. Traditionally, 277 00:13:09.879 --> 00:13:13.399 if I wanted redundancy, I'd use RAID 1 mirroring. I 278 00:13:13.440 --> 00:13:15.600 have 100 gigs of data, I need 200 279 00:13:15.639 --> 00:13:17.000 gigs of disk space. 280 00:13:17.039 --> 00:13:19.840 Right, a 2x overhead. That gets incredibly expensive 281 00:13:19.840 --> 00:13:24.320 when you're buying enterprise flash drives. Erasure coding changes the algorithm. 282 00:13:24.360 --> 00:13:26.960 It stripes the data across the hosts with parity 283 00:13:26.600 --> 00:13:29.159 bits. Like old-school hardware RAID 5? 284 00:13:29.120 --> 00:13:32.159 Very similar logic, but distributed across the network instead of 285 00:13:32.159 --> 00:13:35.000 a backplane. With RAID 5 erasure coding, you need 286 00:13:35.039 --> 00:13:38.000 a minimum of four hosts. It uses a 3+1 287 00:13:38.000 --> 00:13:41.159 calculation, so your overhead drops from 2x down 288 00:13:41.200 --> 00:13:43.399 to about 1.33x. That's significant. 289 00:13:43.519 --> 00:13:45.480 You get the exact same level of protection, but you 290 00:13:45.559 --> 00:13:49.120 save a massive amount of raw storage capacity. 291 00:13:49.200 --> 00:13:51.279 That is a huge cost difference at scale. But it's 292 00:13:51.279 --> 00:13:53.639 only available on all-flash configurations, right? 293 00:13:53.679 --> 00:13:58.120 Correct. The parity calculation requires significant CPU and random I/O performance. 294 00:13:58.720 --> 00:14:02.519 Spinning disks simply cannot keep up with the read-modify- 295 00:14:02.639 --> 00:14:07.200 write penalty of erasure coding without totally tanking your VM performance. 296 00:14:07.279 --> 00:14:10.639 Got it now.
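NOTE
The capacity math above can be worked through in a few lines. This is a minimal sketch: the 1+1 mirror and the 3+1 RAID 5 layout come from the conversation, while the 4+2 layout for RAID 6 is an assumption added here for comparison and is not stated in the episode. The function names are made up for illustration.

```python
from fractions import Fraction

# (data fragments, redundancy fragments) per mirror set or stripe.
LAYOUTS = {
    "RAID 1 mirroring": (1, 1),
    "RAID 5 erasure coding": (3, 1),
    "RAID 6 erasure coding": (4, 2),  # assumption, not stated in the episode
}


def raw_multiplier(data: int, redundancy: int) -> Fraction:
    """Raw capacity consumed per unit of usable data."""
    return Fraction(data + redundancy, data)


def raw_capacity_gb(usable_gb: int, data: int, redundancy: int) -> float:
    """Raw gigabytes needed to store `usable_gb` of VM data."""
    return float(usable_gb * raw_multiplier(data, redundancy))


for name, (data, redundancy) in LAYOUTS.items():
    factor = float(raw_multiplier(data, redundancy))
    needed = raw_capacity_gb(100, data, redundancy)
    print(f"{name}: {factor:.2f}x, 100 GB usable needs {needed:.1f} GB raw")
```

For 100 GB of VM data this gives 200 GB raw with mirroring and roughly 133.3 GB raw with the 3+1 layout, which is where the "2x down to about 1.33x" figure comes from.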
One of the most fascinating topologies in 297 00:14:10.679 --> 00:14:14.279 the guide is the stretched cluster. This is the scenario 298 00:14:14.279 --> 00:14:16.879 where you have two data centers, Site A and Site B, 299 00:14:17.240 --> 00:14:20.080 and you want them to essentially act as one giant cluster. 300 00:14:20.679 --> 00:14:22.960 But I've always wondered about the split-brain problem here. 301 00:14:23.279 --> 00:14:25.960 If the fiber line between the buildings gets cut, how 302 00:14:25.960 --> 00:14:27.840 do you stop them from fighting over who is the 303 00:14:27.879 --> 00:14:28.600 active site? 304 00:14:28.639 --> 00:14:31.679 That's the classic two generals problem, and vSAN solves this 305 00:14:31.759 --> 00:14:34.279 with a witness host. You place this witness in a 306 00:14:34.279 --> 00:14:36.840 third location, a totally different fault domain from Site A 307 00:14:36.919 --> 00:14:41.759 and Site B, and it holds the metadata components of the vSAN 308 00:14:41.399 --> 00:14:43.159 objects. So it acts as the referee. 309 00:14:43.399 --> 00:14:47.360 Exactly. Imagine Site A and Site B lose connection. Site 310 00:14:47.360 --> 00:14:49.480 A looks at the witness and says, hey, can you 311 00:14:49.480 --> 00:14:52.600 see me? The witness says yes. Site A then knows 312 00:14:52.639 --> 00:14:55.240 it has a quorum. It has two out of three votes, 313 00:14:55.360 --> 00:14:58.600 so it stays online. And Site B? Site B can't 314 00:14:58.639 --> 00:15:01.679 see the witness or Site A, so it mathematically knows 315 00:15:01.679 --> 00:15:04.080 it has lost the vote. It immediately shuts down its 316 00:15:04.159 --> 00:15:05.840 VMs to prevent any data corruption. 317 00:15:06.000 --> 00:15:08.720 And since the witness only holds metadata, it can run 318 00:15:08.720 --> 00:15:10.360 on a pretty small connection, right? Yeah, 319 00:15:10.200 --> 00:15:12.320 it's tiny.
You can run it inside a small cloud 320 00:15:12.360 --> 00:15:15.200 instance or just an old server in a remote office. 321 00:15:15.399 --> 00:15:18.759 It's not storing the actual VMDK data, just the state 322 00:15:18.799 --> 00:15:19.440 of the data. 323 00:15:19.559 --> 00:15:22.840 This all leads to the overarching philosophy of vSphere 7, 324 00:15:22.919 --> 00:15:27.080 which is SPBM, Storage Policy Based Management. It feels like 325 00:15:27.120 --> 00:15:30.360 we are finally moving away from the old gold, silver, 326 00:15:30.480 --> 00:15:32.360 bronze LUN mindset. 327 00:15:32.600 --> 00:15:36.279 That is absolutely the goal. In the past, the infrastructure 328 00:15:36.320 --> 00:15:38.600 dictated the policy. You'd say, I have a fast LUN, 329 00:15:38.639 --> 00:15:42.000 so put the database there. With SPBM, the application dictates 330 00:15:42.000 --> 00:15:42.679 the infrastructure. 331 00:15:42.799 --> 00:15:44.000 How does that look in practice? 332 00:15:44.120 --> 00:15:47.679 You create a policy in vCenter, say, mission critical: encryption 333 00:15:47.879 --> 00:15:51.120 enabled, RAID 1 protection. Yeah. You just assign that policy 334 00:15:51.159 --> 00:15:51.679 directly to 335 00:15:51.639 --> 00:15:54.480 the VM. And vCenter acts as the broker? Exactly. 336 00:15:54.600 --> 00:15:56.759 vCenter looks at your vSAN datastore or your 337 00:15:56.799 --> 00:16:02.639 vVols and checks: can I satisfy this requirement? If yes, 338 00:16:03.279 --> 00:16:06.639 it places the VM. If, say, six months later, a 339 00:16:06.720 --> 00:16:10.120 drive fails and the VM is no longer protected, 340 00:16:10.240 --> 00:16:12.279 vCenter flags it as non-compliant. 341 00:16:12.399 --> 00:16:15.000 That's a big shift. It changes the admin's job from 342 00:16:15.000 --> 00:16:17.879 provisioning storage to monitoring compliance. 343 00:16:18.080 --> 00:16:19.399 It's all about desired state.
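NOTE
The placement and compliance flow described above can be modeled as a simple set comparison between what a policy requires and what a datastore currently offers. This is a toy model under stated assumptions, not the SPBM API: `StoragePolicy`, `find_placement`, `is_compliant`, and the capability strings are all hypothetical names chosen for this note.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional


@dataclass(frozen=True)
class StoragePolicy:
    """A hypothetical policy: a name plus the capabilities it demands."""
    name: str
    required: FrozenSet[str]


def find_placement(policy: StoragePolicy,
                   datastores: Dict[str, FrozenSet[str]]) -> Optional[str]:
    """Return the first datastore whose capabilities cover the policy."""
    for ds_name, capabilities in datastores.items():
        if policy.required <= capabilities:
            return ds_name
    return None


def is_compliant(policy: StoragePolicy, current: FrozenSet[str]) -> bool:
    """A VM stays compliant only while every required capability holds."""
    return policy.required <= current


mission_critical = StoragePolicy(
    "mission-critical", frozenset({"encryption", "raid1"})
)
datastores = {
    "nfs-archive": frozenset({"encryption"}),
    "vsan-prod": frozenset({"encryption", "raid1", "dedup"}),
}

placement = find_placement(mission_critical, datastores)
print(placement)

# Months later a drive fails and the mirror is lost: same policy, new state.
after_failure = datastores["vsan-prod"] - {"raid1"}
print(is_compliant(mission_critical, after_failure))
```

The point of the model is that the policy never changes; only the observed state does, and compliance is just the policy re-evaluated against that state.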
344 00:16:19.519 --> 00:16:21.000 Before we wrap up, we have to touch on the 345 00:16:21.039 --> 00:16:24.279 integration of modern apps. The guide mentions First Class Disks 346 00:16:24.279 --> 00:16:27.440 and Kubernetes support. I think a lot of infrastructure admins 347 00:16:27.480 --> 00:16:30.120 hear Kubernetes and just tune out, thinking it's purely a 348 00:16:30.159 --> 00:16:33.759 developer problem. But vSphere 7 makes it an infrastructure problem. 349 00:16:33.799 --> 00:16:36.440 It really does. The concept of the First Class Disk, 350 00:16:36.519 --> 00:16:39.279 or FCD, is crucial here. In the past, a virtual 351 00:16:39.320 --> 00:16:41.840 disk was always a child of a virtual machine. If 352 00:16:41.879 --> 00:16:43.840 you deleted the VM, the disk died 353 00:16:43.639 --> 00:16:45.960 with it. Which is fine for a traditional server, but 354 00:16:46.159 --> 00:16:48.200 really bad for a container. Exactly. 355 00:16:48.279 --> 00:16:50.639 Containers are ephemeral. They spin up and die in seconds, 356 00:16:51.120 --> 00:16:53.720 but the data they generate, like a database file, needs 357 00:16:53.759 --> 00:16:57.440 to persist. A First Class Disk is a managed storage 358 00:16:57.480 --> 00:16:59.960 object that exists completely independently of any 359 00:16:59.879 --> 00:17:01.519 VM. So it just floats out there. 360 00:17:01.679 --> 00:17:05.079 Yes, and Kubernetes can request storage. vSphere creates an 361 00:17:05.119 --> 00:17:07.720 FCD, and that disk can be attached to and detached from 362 00:17:07.759 --> 00:17:10.720 different container worker nodes as needed. It allows you to 363 00:17:10.799 --> 00:17:13.839 run stateful apps on the exact same vSphere platform you 364 00:17:13.960 --> 00:17:15.799 use for your traditional Windows servers. 365 00:17:16.119 --> 00:17:19.240 So bringing it all together, vSphere 7 isn't just a facelift. 366 00:17:19.359 --> 00:17:22.799 It is a fundamental re-architecture.
We've killed the external 367 00:17:22.839 --> 00:17:26.279 PSC and simplified the management plane. We've tightened the screws 368 00:17:26.279 --> 00:17:30.319 on hardware reliability with VMFS-L boot partitions and those strict 369 00:17:30.400 --> 00:17:34.039 DNS requirements. And we've moved storage from a static hardware 370 00:17:34.119 --> 00:17:39.000 mapping to a dynamic, policy-driven engine with vSAN and SPBM. 371 00:17:39.359 --> 00:17:40.079 It's really 372 00:17:39.839 --> 00:17:43.680 about vSphere becoming the universal platform. Whether it's a legacy 373 00:17:43.680 --> 00:17:46.400 SQL Server or a cloud-native microservice, the goal is 374 00:17:46.400 --> 00:17:49.759 to manage them with the exact same policies, the same availability, 375 00:17:50.000 --> 00:17:51.039 and the same security. 376 00:17:51.160 --> 00:17:52.960 I want to leave you with one final thought from 377 00:17:53.000 --> 00:17:56.799 the guide, something called Lifecycle Manager. We didn't have 378 00:17:56.839 --> 00:17:59.119 time to deep dive into it today, but it applies 379 00:17:59.200 --> 00:18:03.200 that same desired state logic to the hardware firmware itself. 380 00:18:03.599 --> 00:18:06.720 It can actually push BIOS and HBA driver updates to 381 00:18:06.720 --> 00:18:09.240 the physical servers to match a policy you set. 382 00:18:09.400 --> 00:18:12.000 It's an absolute game changer. It means you aren't just 383 00:18:12.160 --> 00:18:16.720 patching ESXi anymore. You are patching the actual metal underneath 384 00:18:16.720 --> 00:18:18.519 it all, directly from vCenter. 385 00:18:18.839 --> 00:18:22.039 Right, and if vSphere is managing the physical firmware, the 386 00:18:22.079 --> 00:18:25.839 storage controller, and the application policy, are we looking at 387 00:18:25.880 --> 00:18:28.839 the end of the traditional hardware maintenance window as we 388 00:18:28.880 --> 00:18:30.720 know it?
It feels like we are getting closer to 389 00:18:30.759 --> 00:18:32.759 the dream of a truly fluid infrastructure. 390 00:18:32.839 --> 00:18:35.039 We are definitely getting there. The hardware is basically just 391 00:18:35.079 --> 00:18:36.319 becoming code at this point. 392 00:18:36.480 --> 00:18:39.440 Plenty to think about before your next upgrade. Check those 393 00:18:39.440 --> 00:18:42.839 boot partitions, verify your DNS, and maybe start playing with 394 00:18:43.000 --> 00:18:45.400 SPBM in your lab. Thanks for joining us on this 395 00:18:45.480 --> 00:18:46.799 deep dive into vSphere 7. 396 00:18:46.839 --> 00:18:47.680 So glad to be here.
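NOTE
The "check your PTR records and your time servers" advice from the episode could be scripted roughly as follows. This is a minimal pre-flight sketch, not the installer's actual logic: `reverse_matches` and `within_skew` are hypothetical helpers, the five-minute tolerance is an assumption standing in for the "few minutes" mentioned in the conversation, and the hostname and address are documentation placeholders. The resolver is injectable so the example runs without touching a real DNS server.

```python
import socket
from datetime import datetime, timedelta, timezone
from typing import Callable


def reverse_matches(ip: str, fqdn: str,
                    reverse: Callable[[str], tuple] = socket.gethostbyaddr) -> bool:
    """True if the reverse (PTR) lookup for `ip` returns `fqdn`."""
    try:
        hostname, _aliases, _addresses = reverse(ip)
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False
    return hostname.rstrip(".").lower() == fqdn.rstrip(".").lower()


def within_skew(host_time: datetime, vcenter_time: datetime,
                max_skew: timedelta = timedelta(minutes=5)) -> bool:
    """True if the two clocks differ by no more than `max_skew` (assumed value)."""
    return abs(host_time - vcenter_time) <= max_skew


def fake_reverse(ip: str) -> tuple:
    """Stand-in resolver with one made-up PTR record, for offline demos."""
    table = {"192.0.2.10": ("vcsa01.lab.example", [], ["192.0.2.10"])}
    if ip not in table:
        raise OSError("no PTR record")
    return table[ip]


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(reverse_matches("192.0.2.10", "vcsa01.lab.example", fake_reverse))
print(reverse_matches("192.0.2.99", "vcsa01.lab.example", fake_reverse))
print(within_skew(now + timedelta(minutes=9), now))
```

Dropping the `reverse` argument makes `reverse_matches` use the system resolver via `socket.gethostbyaddr`, which is the check to run against a planned VCSA address before starting a deployment.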