1
00:00:14,480 --> 00:00:17,800
Hey, what's going on, everybody? I am the host of Adventures in

2
00:00:17,839 --> 00:00:22,079
DevOps and... wait, yeah, that's
the right channel name. Sorry. You

3
00:00:22,120 --> 00:00:26,480
know sometimes I get that mixed up, like almost every time I think,

4
00:00:26,519 --> 00:00:29,760
so if you're a frequent listener to
the podcast, you know that I messed

5
00:00:29,800 --> 00:00:33,679
that up just about every time.
So thanks for bearing with me on that.

6
00:00:34,439 --> 00:00:38,479
But today I'm not going to mess
this part up because today I have

7
00:00:39,320 --> 00:00:44,520
as our guest Drew Stokes.
He's the senior manager of software engineering for

8
00:00:44,759 --> 00:00:49,240
PagerDuty, and we're talking
about incident management. And that's one of

9
00:00:49,240 --> 00:00:54,479
my favorite topics because it just goes
so deep and it crosses so many disciplines

10
00:00:54,520 --> 00:01:00,960
across your infrastructure and your application teams
and marketing and sales and the executive suite.

11
00:01:02,000 --> 00:01:06,079
Like depending on the level of the
incident, you're just all over,

12
00:01:06,560 --> 00:01:08,280
all over the board with this.
So Drew, welcome to the show.

13
00:01:08,319 --> 00:01:11,680
I'm excited to have you here.
Thanks, I'm excited to be here.

14
00:01:11,719 --> 00:01:18,079
It's going to be, too. Well, right
on. So tell us a little bit

15
00:01:18,120 --> 00:01:23,799
about how you got into the field of
incident management or incident response. Oh that's

16
00:01:23,799 --> 00:01:29,079
a good question. Okay, yeah. So I've been in tech for a

17
00:01:29,079 --> 00:01:33,040
while, like most people here,
I think it's been something like sixteen years,

18
00:01:33,840 --> 00:01:38,840
and I think originally I was kind
of trying to figure out my way

19
00:01:38,920 --> 00:01:42,079
helping folks out with technology and networks, and then I got into front end

20
00:01:42,079 --> 00:01:46,719
development and moved into back end and
then dropped into SRE and that's when I

21
00:01:46,799 --> 00:01:52,079
kind of really got familiar with not
just the process of mitigating incidents, but

22
00:01:52,120 --> 00:01:56,000
actually managing them and trying to learn
from them. So I did that for

23
00:01:56,040 --> 00:01:57,400
a while, and then I think
for something like the last eight years,

24
00:01:57,439 --> 00:02:02,719
I've been primarily focused on a people manager
role. And there's a lot of ways

25
00:02:02,760 --> 00:02:07,319
in which you know, people managers
are involved in incident management as well,

26
00:02:07,359 --> 00:02:13,240
both as stakeholders but also you know, facilitators and folks who are playing a

27
00:02:13,240 --> 00:02:16,479
supportive role for people who are responding. So kind of been in that space

28
00:02:17,639 --> 00:02:22,599
for a while now, and back
in I think it was May of twenty

29
00:02:22,639 --> 00:02:27,360
twenty one, I joined a startup
called Jeli, which was founded by Nora

30
00:02:27,479 --> 00:02:30,800
Jones, who's the author of the
Chaos Engineering book and the founder of the

31
00:02:30,879 --> 00:02:35,759
Learning from Incidents community, and that
was kind of where I really dropped into

32
00:02:36,840 --> 00:02:40,240
you know, incident management in general, but specifically this opportunity to kind of

33
00:02:40,280 --> 00:02:45,960
not just resolve incidents, mitigate the
issues, but also to learn from them

34
00:02:45,960 --> 00:02:51,840
in order to improve future response and
organizational performance. So there's a lot of

35
00:02:51,840 --> 00:02:53,680
really interesting ways to think about the
space, and you mentioned at the beginning

36
00:02:53,719 --> 00:02:58,319
it's really important. And part of
the reason is because it's so cross cutting,

37
00:02:58,400 --> 00:03:01,120
right, because incidents are a lens
through which you can see the way

38
00:03:01,120 --> 00:03:06,520
that your organization and your people operate, and that applies to customer service,

39
00:03:06,599 --> 00:03:09,680
that applies to executives and to
the folks actually responding to the incidents.

40
00:03:09,680 --> 00:03:15,240
It's a really interesting space with a
lot of opportunity, which you'll hear

41
00:03:15,319 --> 00:03:19,479
that word a lot in this conversation. We refer to incidents as opportunities.

42
00:03:20,159 --> 00:03:23,080
Oh for sure they are because you
know, one of the things that I

43
00:03:23,080 --> 00:03:25,800
think about a lot is just because
we're in tech. You know, we've

44
00:03:27,120 --> 00:03:31,840
all done the Google search for "is
such and such service down" because you're having

45
00:03:31,879 --> 00:03:36,319
problems and you're like, did I
do something wrong? Or are they actually

46
00:03:36,360 --> 00:03:38,599
dead in the water right now?
And I think that's like, to me,

47
00:03:38,680 --> 00:03:46,240
that's one of the hallmarks of highlighting
that your incident response plan is really

48
00:03:46,280 --> 00:03:53,879
really well done whenever your customers know
that you're having an incident because you

49
00:03:54,120 --> 00:04:00,400
told them versus them discovering that something
was broken. Yeah, there's

50
00:04:00,439 --> 00:04:04,879
a level of, well... So,
one interesting aspect here is you mentioned

51
00:04:04,919 --> 00:04:10,159
another cross cutting function there, right, which is you have internal stakeholders and

52
00:04:10,280 --> 00:04:14,479
external stakeholders for these types of things. But there's also this layer I think

53
00:04:14,479 --> 00:04:18,240
that you're referring to here of like
operational excellence and observability. Right, do

54
00:04:18,279 --> 00:04:23,000
you know that the system's broken before
someone tells you that the system is broken,

55
00:04:23,720 --> 00:04:28,199
And a lot of the ways in
which you can improve that process is

56
00:04:28,199 --> 00:04:30,560
through the learning process after the incident. Right, So, if you have

57
00:04:30,600 --> 00:04:34,720
an incident, for example, where
a customer reports an issue, looking at

58
00:04:34,759 --> 00:04:40,120
the details of that timeline and what
actually happened can help you figure out where

59
00:04:40,199 --> 00:04:44,800
you need to add additional instrumentation or
alerting, or how to adjust your team's

60
00:04:45,399 --> 00:04:49,439
processes, you know, your software
development life cycle or your release process to

61
00:04:50,040 --> 00:04:56,600
better account for those kind of unpredictable
behaviors in the system. So really interesting,

62
00:04:56,720 --> 00:05:00,600
like, complicated, you know, when
you're dealing with not just complex software

63
00:05:00,639 --> 00:05:05,720
systems, but also complex organizations and
groups of people, right, really interesting

64
00:05:05,759 --> 00:05:11,399
opportunities to figure out how do we
kind of iteratively improve our understanding of

65
00:05:11,399 --> 00:05:15,600
the system and our understanding of failure
modes so that we can kind of inspire

66
00:05:15,720 --> 00:05:19,720
customer confidence and trust, right,
letting them know that there's an issue before

67
00:05:20,120 --> 00:05:26,360
they notice. Yeah, for
sure. So early on in

68
00:05:26,360 --> 00:05:31,360
this you mentioned something I want to
highlight, mitigating an incident versus managing an

69
00:05:31,439 --> 00:05:36,360
incident. Can you elaborate on the
difference between those two? Yeah, that's

70
00:05:36,399 --> 00:05:42,720
a great question. So
there are a lot of different aspects of

71
00:05:42,759 --> 00:05:46,199
incident management in general, and I'll
try to like decompose them in a way

72
00:05:46,240 --> 00:05:50,920
that makes sense here. So I
think when you just referenced detection, right,

73
00:05:51,000 --> 00:05:55,399
so there's a phase there
of like understanding whether or not there's

74
00:05:55,399 --> 00:05:58,360
an incident and trying to do something
about it. And I think when we

75
00:05:58,399 --> 00:06:04,879
talk about managing incidents, what we're
talking about is providing information and coordinating folks

76
00:06:04,920 --> 00:06:11,959
in incident response. Right. Mitigating
an incident is doing something to address the

77
00:06:12,000 --> 00:06:15,839
issue and get the system back to
a stable state or you know, performing

78
00:06:15,839 --> 00:06:20,120
in a way that's expected with regard
to external stakeholders. But I think for

79
00:06:20,279 --> 00:06:27,279
us, managing an incident is really
about investigating what's going on, getting the

80
00:06:27,360 --> 00:06:30,720
necessary folks with the subject matter expertise
into the room to contribute to that,

81
00:06:31,279 --> 00:06:36,199
coordinating that group of people in, you
know, large organizations or really complex incidents.

82
00:06:36,199 --> 00:06:41,759
Sometimes you have multiple work streams of
investigation within an incident, and then

83
00:06:41,800 --> 00:06:46,519
communicating status out to stakeholders, your
customer success team, your executives in a

84
00:06:46,560 --> 00:06:50,839
way that allows them to stay informed
but does not have them jump in and

85
00:06:50,879 --> 00:06:54,439
start, you know, trying to
get involved in the process in a way

86
00:06:54,439 --> 00:07:00,439
that can you know, add additional
complexity to the overall incident management. So

87
00:07:00,120 --> 00:07:04,199
from my perspective, I think management
is a lot more about the process of

88
00:07:04,199 --> 00:07:10,839
coordinating and communicating during an incident,
and mitigation is about that moment when you've

89
00:07:10,920 --> 00:07:17,040
kind of identified and addressed the issue
to stop whatever impact is associated with the

90
00:07:17,040 --> 00:07:21,000
incident. Right, that's your signal
to your external stakeholders that we are in

91
00:07:21,000 --> 00:07:25,480
a stable state, we've seen things
are good, and there are various other

92
00:07:25,480 --> 00:07:30,720
steps after that, But for me, that's the primary difference. Yeah,

93
00:07:30,920 --> 00:07:35,079
Yeah, I think that's really important
for someone who's not done a lot of

94
00:07:35,120 --> 00:07:43,800
incident responses to understand that the management
of it is just as important as the mitigation

95
00:07:43,839 --> 00:07:47,720
of it. And in many of
the environments I've worked in, those are

96
00:07:47,720 --> 00:07:56,319
actually two key roles for any incident. You have the first responder who's trying

97
00:07:56,399 --> 00:08:00,240
to find the cause and restore the
service. But then you know, alf

98
00:08:00,240 --> 00:08:07,600
what have your primary communications individual who
is getting the information from that first responder

99
00:08:07,639 --> 00:08:11,000
and relaying it out, and doing so
in a way that everyone feels like

100
00:08:11,079 --> 00:08:18,279
they're in touch with what's going on
and they aren't going around the back door

101
00:08:18,000 --> 00:08:24,000
sending DMs to the first responder to
get status updates. Yeah. Yeah.

102
00:08:24,079 --> 00:08:26,720
One thing we talk a lot about
is kind of this incident management maturity

103
00:08:26,759 --> 00:08:33,840
model, and we think about different
buckets of you know, engineering teams or

104
00:08:33,919 --> 00:08:37,200
organizations with regard to kind of how
they approach this. And I've been in

105
00:08:37,879 --> 00:08:43,360
you know, the lower layers of the
maturity model, and sometimes it can

106
00:08:43,399 --> 00:08:46,759
be really difficult, yeah, to
even understand who's doing what and who do

107
00:08:46,840 --> 00:08:48,480
I ask for an update? You
know, I've got a customer who needs

108
00:08:48,480 --> 00:08:52,039
an update now, and we have
an SLA in the contract, what's going

109
00:08:52,080 --> 00:08:56,399
on? It can be really difficult
to even know who's doing that. And

110
00:08:56,440 --> 00:09:00,600
I think you find that in you
know, incident response tooling like Jeli,

111
00:09:00,919 --> 00:09:05,039
those roles are actually codified in the
process. You're assigning an incident commander,

112
00:09:05,440 --> 00:09:09,759
you're assigning a communications lead to try
and take care of that external communication of

113
00:09:09,919 --> 00:09:13,200
here's the person to you know,
connect with if you need an update,

114
00:09:13,559 --> 00:09:18,120
or here's the person responsible for managing
this incident, so that if you join

115
00:09:18,159 --> 00:09:20,039
in, you can say, hey, I'm here and I know about X,

116
00:09:20,200 --> 00:09:24,320
you know, can I help that
sort of thing? Right? And

117
00:09:24,399 --> 00:09:28,320
so that's one of the things that
Jeli does for you. If you need

118
00:09:28,360 --> 00:09:33,399
to improve the maturity
of your incident response plan, using

119
00:09:33,440 --> 00:09:37,519
something like Jeli can kind of help
you say, hey, here are the

120
00:09:39,000 --> 00:09:43,679
here are the people and the processes
you need in place, and provide a

121
00:09:43,720 --> 00:09:50,080
framework, right. Yeah. I
think like every small organization goes

122
00:09:50,120 --> 00:09:52,840
through a phase where someone opens a
Google doc and writes down a run book

123
00:09:52,840 --> 00:09:56,039
for how to run incidents, right, And so what we wanted to do

124
00:09:56,240 --> 00:10:00,159
is to provide some of that for
you in a way that didn't get in

125
00:10:00,200 --> 00:10:03,240
your way. So we've got a
bot in Slack right that you can use

126
00:10:03,279 --> 00:10:07,960
to declare incidents, assign stakeholders, set stages, communicate status, all

127
00:10:07,960 --> 00:10:11,159
that sort of stuff, so that
you don't really have to go in and

128
00:10:11,240 --> 00:10:16,159
kind of trial and error that Google
doc and try to get folks enrolled in

129
00:10:16,200 --> 00:10:20,679
the process. There's just a thing
kind of nudging you along the way and

130
00:10:20,799 --> 00:10:24,840
helping to offload some of that cognitive
burden. When you're in the middle of

131
00:10:24,919 --> 00:10:28,600
managing an incident, right or typically
as an incident commander, you're thinking about

132
00:10:28,600 --> 00:10:31,519
a lot of things. Sometimes you're
also trying to mitigate the incident. Right

133
00:10:31,559 --> 00:10:35,240
if it's two am, you may
have a stretch of time where you're doing

134
00:10:35,279 --> 00:10:41,279
everything on your own. And so
I think the more folks can find mechanisms

135
00:10:41,480 --> 00:10:48,120
and processes that help them reduce the
number of things they're doing during management so

136
00:10:48,159 --> 00:10:52,720
they can focus on getting the right
folks in the room and finding the means

137
00:10:52,759 --> 00:10:58,600
to mitigation, the more successful the
response process becomes, which results in better

138
00:10:58,679 --> 00:11:03,320
data for your post incident analysis,
and then your, you know, cross

139
00:11:03,320 --> 00:11:07,519
incident learning over time. Yeah,
it's one of those things that like we've

140
00:11:09,480 --> 00:11:15,840
we've all done incident response wrong
enough times that we kind of

141
00:11:15,919 --> 00:11:20,240
know. So I think it's
one of those things, like, you know,

142
00:11:20,279 --> 00:11:24,679
like in software engineering, like writing
logs has been done for decades now,

143
00:11:24,759 --> 00:11:28,960
so you don't write your own logging
engine. You just pull in a logging

144
00:11:30,039 --> 00:11:33,600
library because you don't need to reinvent
that wheel. And I think incident response

145
00:11:33,679 --> 00:11:37,879
is one of those: we don't need
to reinvent this wheel, we can just buy

146
00:11:37,919 --> 00:11:41,240
a wheel that's already built. Yeah, we actually have a couple

147
00:11:41,279 --> 00:11:46,639
of customers of Jeli who are trying
to replace their wheels, right because you

148
00:11:46,679 --> 00:11:50,600
know, some of the
large organizations who started this process ten years

149
00:11:50,600 --> 00:11:54,000
ago had to make their own I
used to work at New Relic and we

150
00:11:54,080 --> 00:12:00,360
had a Slack bot we called nerd
bot, which was our incident response, you

151
00:12:00,440 --> 00:12:03,639
know, facilitation tool. But there's
a cost associated with those things, right,

152
00:12:03,679 --> 00:12:07,559
You have to maintain them over time. Oftentimes they kind of fall to

153
00:12:07,600 --> 00:12:11,360
the bottom of the priority stack,
and so iterating on your internal process becomes

154
00:12:11,399 --> 00:12:15,120
really hard. And I think that's
where if you go with something you know,

155
00:12:15,320 --> 00:12:18,200
like Jeli's incident response bot,
which is you know, fairly opinionated

156
00:12:18,200 --> 00:12:22,600
but narrow in scope, right,
it's just here are the set of criteria

157
00:12:22,639 --> 00:12:28,799
that we use for this thing.
With some customizable features like automation, then

158
00:12:28,840 --> 00:12:33,840
you don't have to kind of invent
that wheel and then reinvent it iteratively for

159
00:12:33,919 --> 00:12:37,200
all time. And you also don't
really have to, you know, answer

160
00:12:37,240 --> 00:12:41,639
a lot of those questions when your
incidents become more complex. There's like different

161
00:12:41,679 --> 00:12:46,039
phases of your incident response process.
When you're a five person team, you

162
00:12:46,120 --> 00:12:48,559
jump in a Zoom call, right, and you fix it. When you're

163
00:12:48,639 --> 00:12:54,120
fifty people in a major incident room, it's a very different experience and requires

164
00:12:54,159 --> 00:13:00,679
a different set of skills and supporting
tooling. So, yeah, cool,

165
00:13:00,799 --> 00:13:07,559
you mentioned a couple of times the
post incident response plan, so elaborate on

166
00:13:07,600 --> 00:13:11,480
that a little bit for me.
Yeah, this is another area where I

167
00:13:11,519 --> 00:13:16,799
think everyone kind of starts with a
recognition that there's more that can be gleaned

168
00:13:16,840 --> 00:13:20,759
from these experiences. Right early on, you have an incident, you respond

169
00:13:20,799 --> 00:13:22,159
to it, you fix it,
maybe you shoot an email off to folks

170
00:13:22,200 --> 00:13:24,639
saying what happened, and you know, here's what we're going to do to

171
00:13:24,639 --> 00:13:31,039
address it in the future. But as
your system complexity grows and as your organization

172
00:13:31,159 --> 00:13:35,120
grows, there are you know,
many more opportunities to figure out how to

173
00:13:35,320 --> 00:13:41,720
change not just the system itself right
to you know, write better logs or

174
00:13:41,840 --> 00:13:46,080
increase visibility into the system's behavior,
but also to change how the organization is

175
00:13:46,080 --> 00:13:52,639
structured around those systems. Right.
So, one anecdote I like to share

176
00:13:52,799 --> 00:13:56,480
is at my time in a previous
company, we had this custom feature flag

177
00:13:56,559 --> 00:14:01,080
system that had been around for I
don't know, it was like eight or

178
00:14:01,159 --> 00:14:03,440
nine years or something. Everybody wanted
to get off of it. It wasn't

179
00:14:03,480 --> 00:14:07,360
great, and every time there was
an incident with that system, someone from

180
00:14:07,399 --> 00:14:11,240
the network engineering team would be pulled
in because they were one of the original

181
00:14:11,279 --> 00:14:15,080
authors. They had nothing to do
with this system anymore, but no one

182
00:14:15,120 --> 00:14:20,039
else knew how it worked. And
so if you're just responding to and mitigating

183
00:14:20,080 --> 00:14:24,120
incidents and not looking any further,
you don't see those types of organizational misalignment

184
00:14:24,240 --> 00:14:28,960
right where you've got a primary owner
or subject matter expert that is, you

185
00:14:30,000 --> 00:14:31,840
know, accountable for a whole slew
of things that have nothing to do with

186
00:14:31,879 --> 00:14:37,440
this foundational service that's critical for business
function. If you've got a feature flag

187
00:14:37,480 --> 00:14:41,840
system in, you know, a fourteen
year old code base, it's got to work.

188
00:14:43,120 --> 00:14:48,159
So I think when we talk about
post incident learning, this

189
00:14:48,240 --> 00:14:52,080
is the next phase in maturity.
Right, you figured out your response process,

190
00:14:52,200 --> 00:14:54,039
you know how to get the right
folks in the room, you know

191
00:14:54,080 --> 00:14:58,240
how to move toward mitigation, and
you're starting to capture some of the you

192
00:14:58,279 --> 00:15:01,200
know, follow ups that you want
to take. Maybe we need more observability.

193
00:15:01,399 --> 00:15:05,159
Maybe this library in our service is out
of date, and if we update it

194
00:15:05,240 --> 00:15:09,960
we'll get better performance, things like that. But it goes beyond some of those

195
00:15:11,000 --> 00:15:13,879
follow ups, and as you start
to cultivate a process around this, and

196
00:15:13,919 --> 00:15:16,799
there's a lot of different ways that
folks do this. You know, they're referred

197
00:15:16,879 --> 00:15:22,240
to as post mortems or learning
reviews, or you know, sometimes you're

198
00:15:22,279 --> 00:15:26,519
just getting in a room and talking
about the incident without the structure, you

199
00:15:26,559 --> 00:15:31,120
start to uncover all of these really
interesting aspects of not only the responding team,

200
00:15:31,159 --> 00:15:33,639
but the organization overall. And so
some of the things that we're most

201
00:15:33,720 --> 00:15:39,200
interested in learning is, you know, what did folks know when they responded

202
00:15:39,240 --> 00:15:41,879
to the incident and what did they
not know? Right? What are the

203
00:15:41,919 --> 00:15:50,360
ways in which the folks involved communicated
successfully and maybe not so much? How

204
00:15:50,399 --> 00:15:56,840
did the organization's processes contribute to or
prevent aspects of a specific incident. It's

205
00:15:56,879 --> 00:16:00,519
all kinds of interesting stuff to dig
into, and you can look at it

206
00:16:00,559 --> 00:16:04,200
from a bunch of different angles.
So we have, you know, a

207
00:16:04,200 --> 00:16:10,360
lot of examples of our customers creating
multiple investigations on an incident where a person

208
00:16:10,399 --> 00:16:14,480
A and person B both investigate, and
then you see like where the differences are,

209
00:16:15,279 --> 00:16:18,600
and I think that turns up a
lot of interesting stuff. We've taken

210
00:16:18,799 --> 00:16:23,000
the approach in Jeli of writing incident
narratives, so, you know, post incident learning

211
00:16:23,039 --> 00:16:26,799
review, post mortem, whatever you
want to call it. Our feeling

212
00:16:26,919 --> 00:16:33,120
is that incidents are stories and the
way that people connect with information and learn

213
00:16:33,240 --> 00:16:36,679
is through storytelling. And so we've
taken the approach that, you know,

214
00:16:37,000 --> 00:16:41,519
we want to provide folks with a
tool to tell a story backed by evidence,

215
00:16:41,600 --> 00:16:45,080
right, what was actually said during
the incident, what you know,

216
00:16:45,519 --> 00:16:48,519
metrics or data we were looking at, but to kind of nudge folks in

217
00:16:48,559 --> 00:16:55,679
the direction of sharing their perspective and
their assertions about what it means. Right,

218
00:16:55,840 --> 00:16:59,639
when these two folks were talking,
they were talking about different aspects of

219
00:16:59,679 --> 00:17:02,720
the system, and they didn't realize
it. What does that mean, right,

220
00:17:02,720 --> 00:17:08,079
what's the opportunity there to improve the
incident management and the way that these teams

221
00:17:08,079 --> 00:17:14,599
are connected and communicating those sorts of
things. Yeah, you see that a

222
00:17:14,599 --> 00:17:18,240
lot whenever you have people with different
disciplines or different backgrounds, you know,

223
00:17:18,279 --> 00:17:25,519
a networking background versus a software engineering
background. And I think that highlights one

224
00:17:25,599 --> 00:17:33,160
of the arts of
post incident response is creating those follow up

225
00:17:33,200 --> 00:17:41,000
items and getting the right people
engaged to recognize, prioritize, and address

226
00:17:41,160 --> 00:17:45,200
the things that you learned from that
incident. Yeah, and you know that

227
00:17:45,400 --> 00:17:51,119
you mentioned like different disciplines. There
are different disciplines within the responding team,

228
00:17:51,200 --> 00:17:55,839
but incidents also provide this
really unique opportunity to consider the different

229
00:17:55,839 --> 00:17:59,920
disciplines across an organization. Right,
So for your major incidents, it's not

230
00:18:00,160 --> 00:18:03,720
just your you know, senior engineers
from a specific team. It's also your

231
00:18:03,759 --> 00:18:08,480
customer support folks on critical accounts. It's also your group leads and your

232
00:18:08,480 --> 00:18:15,960
executives. All of these people have
different priorities and perspectives and understanding with regard

233
00:18:17,079 --> 00:18:19,319
to the impacted systems and the impact
on the business. Right, if I'm

234
00:18:19,359 --> 00:18:22,880
responding to an incident, my goal
is to make the chart go down,

235
00:18:23,440 --> 00:18:30,440
whereas my executive or salespeople's goal is
to minimize the costs associated with customer impact.

236
00:18:30,559 --> 00:18:33,319
Right, we've got SLAs with our
customers for uptime, and we need

237
00:18:33,359 --> 00:18:38,279
to keep that in line. And
I think the different perspectives and priorities there

238
00:18:38,359 --> 00:18:42,599
result in that same kind of differing
perspective that I mentioned earlier, where I

239
00:18:42,680 --> 00:18:47,480
may look at an incident and think
it means one thing, but my group

240
00:18:47,599 --> 00:18:51,960
lead or you know, my sales
associate may look at it and think another

241
00:18:52,039 --> 00:18:59,079
thing. And that opportunity with you
know, incident narratives or post incident learning

242
00:18:59,160 --> 00:19:03,640
is to try and bridge that divide
between those different perspectives and help everyone cultivate

243
00:19:03,640 --> 00:19:07,160
a shared understanding of what it means
across those dimensions. Right, this is

244
00:19:07,200 --> 00:19:14,119
what this incident meant for business impact
and process, for customer satisfaction, and

245
00:19:14,359 --> 00:19:18,440
for the you know, sustainability of
our you know, critical services something like

246
00:19:18,440 --> 00:19:25,240
that. Yeah. I've even worked
in organizations where it involved the marketing team

247
00:19:25,359 --> 00:19:30,880
because they were out scrolling Twitter,
you know, catching tweets going around about

248
00:19:30,880 --> 00:19:34,599
the incident and responding to those, trying
to minimize the blast radius

249
00:19:34,640 --> 00:19:38,920
there. Yeah, this is a
whole other aspect that's really interesting, which

250
00:19:38,960 --> 00:19:42,960
is like where do incidents come from? Right? Who says what an incident

251
00:19:44,160 --> 00:19:48,240
is? We've taken the approach that
anyone can declare an incident. Some organizations

252
00:19:48,240 --> 00:19:52,359
we've worked with are very narrow in
terms of who can declare them. But

253
00:19:52,359 --> 00:19:56,160
yeah, customer success, marketing, you
know, a random person from the internet.

254
00:19:56,279 --> 00:20:02,519
Those are all sources of potential incidents, you know, automation and observability,

255
00:20:02,519 --> 00:20:06,559
those sorts of things, and so
it's, you know... once you

256
00:20:06,599 --> 00:20:11,440
start thinking about this space and you
start exploring ways of benefiting from these lenses

257
00:20:11,759 --> 00:20:18,720
on current state of systems and organizational
process, you start to see like there

258
00:20:18,759 --> 00:20:25,359
are opportunities everywhere. Right? At Jeli
internally, we create incidents for things that

259
00:20:25,400 --> 00:20:29,079
are not incidents. If we have
a release going out that we think might

260
00:20:29,119 --> 00:20:33,839
be you know, impactful to customers
because it changes some aspect of the user

261
00:20:33,880 --> 00:20:37,599
experience, that's an incident. If
we're trying to better understand database failover in

262
00:20:37,799 --> 00:20:41,799
RDS, for example, we run
a game day as an incident, and

263
00:20:42,200 --> 00:20:47,480
doing that gives you this repository of
information that you can use again to build

264
00:20:47,519 --> 00:20:51,519
that narrative and make those assertions about
where are we and where do we want

265
00:20:51,519 --> 00:20:55,759
to be with regard to how we're
operating and the health and stability of our

266
00:20:56,359 --> 00:21:00,279
systems. So that's a really interesting
anecdote about marketing. I love when those

267
00:21:00,319 --> 00:21:04,240
things come in from places you don't
expect, right, You just kind of

268
00:21:04,279 --> 00:21:07,519
get a message from someone that you
haven't met before and they're like, hey,

269
00:21:07,559 --> 00:21:11,599
there's something going on, we'd better
declare. Yeah, yeah, you see

270
00:21:12,039 --> 00:21:17,839
someone from marketing enter one of
the tech Slack channels and know that this is

271
00:21:17,880 --> 00:21:23,319
not going to go well. So
I think one of

272
00:21:23,319 --> 00:21:30,839
the cool types of companies I like
to work with fit the model of Jeli

273
00:21:30,920 --> 00:21:34,920
because you actually use your own product, you know, like when you build

274
00:21:34,960 --> 00:21:40,799
and release it, your team actually
uses it to manage your own incidents.

275
00:21:40,839 --> 00:21:45,519
And I think that is really really
cool because you get firsthand experience of what

276
00:21:45,559 --> 00:21:49,920
it's like to be your own customer, and you can understand what your customers

277
00:21:49,960 --> 00:21:56,240
are actually seeing when they're trying to
use your tool. Yeah, one thing

278
00:21:56,240 --> 00:22:03,359
that was really interesting in the
early days was working with our customers.

279
00:22:03,400 --> 00:22:06,519
It's interesting now as well. We'll
have to talk about PagerDuty at some

280
00:22:06,519 --> 00:22:10,519
point later. But one thing that
was really interesting is that the customers that

281
00:22:10,559 --> 00:22:14,519
we work with are really passionate about
their process and those opportunities to learn,

282
00:22:14,559 --> 00:22:18,839
and so we get to work really
closely with them on you know, understanding

283
00:22:18,920 --> 00:22:22,079
their process and building tooling that works
for them. We work with F5

284
00:22:22,200 --> 00:22:26,960
and Indeed and Honeycomb and Zendesk.
These are like, you know, large

285
00:22:26,079 --> 00:22:30,920
influential organizations who are kind of at
the cutting edge of this process. So

286
00:22:32,640 --> 00:22:37,759
there's this bi directional information share where
you know, we can build features that

287
00:22:37,799 --> 00:22:41,039
support those organizations' processes, but then
we can also adopt some of those organizational

288
00:22:41,039 --> 00:22:45,119
processes because they make a lot of
sense and they work well for us.

289
00:22:45,880 --> 00:22:51,759
We were doing a product
demo for an important group of people the

290
00:22:51,799 --> 00:22:55,519
other day and we noticed some lag
in one of our features and I actually

291
00:22:55,559 --> 00:23:00,160
declared an incident in Jeli about the
performance of the incident console,

292
00:23:00,720 --> 00:23:03,759
and we ran that in parallel during
the demo, and there

293
00:23:03,799 --> 00:23:08,559
was this moment where I was just
like, this is so cool running an

294
00:23:08,559 --> 00:23:11,880
incident with the tool that we're demoing
to people, and there wasn't actually an

295
00:23:11,880 --> 00:23:15,880
issue. It was a Wi-Fi
lag, you know, thing. So everything was

296
00:23:15,920 --> 00:23:19,880
good and that's okay. That's also
a learning opportunity. But yeah, it's

297
00:23:19,880 --> 00:23:26,519
been really exciting to kind of watch
things evolve over time and be a you

298
00:23:26,559 --> 00:23:30,440
know, beneficiary of that system as
well as trying to evolve it for our

299
00:23:30,480 --> 00:23:36,759
customers and find that alignment across
orgs, which is really unique. Most

300
00:23:36,759 --> 00:23:41,200
of your incident response and post incident
learning is within an org. We've had

301
00:23:41,200 --> 00:23:48,559
the unique opportunity to kind of extend
that outward. So fun. Right on.

302
00:23:48,559 --> 00:23:52,119
One of the things I'm interested to
get your opinion on is I over the

303
00:23:52,200 --> 00:23:59,279
years, I've developed the opinion that
there's a difference between mitigating the issue and

304
00:23:59,400 --> 00:24:02,720
resolving the issue. And I refer to
that in terms of, like, during

305
00:24:02,759 --> 00:24:07,279
the incident, you know, you
have, you know, say your API

306
00:24:07,359 --> 00:24:15,279
service is slow, it's okay during
the incident to throw more servers at it.

307
00:24:15,839 --> 00:24:19,039
You know, we're
going to mitigate the issue by adding

308
00:24:19,079 --> 00:24:23,079
more servers or adding more memory,
or do something to make the symptoms of

309
00:24:23,119 --> 00:24:29,319
the problem go away. But then
there's this like defining moment of okay,

310
00:24:30,079 --> 00:24:34,599
customer impact has been resolved, but
now we've got to go back and find

311
00:24:34,839 --> 00:24:40,599
the root cause because adding the additional
servers did not fix the issue; it fixed

312
00:24:40,599 --> 00:24:45,039
the symptoms. And I'm interested to
get your opinion on that. Yeah,

313
00:24:45,119 --> 00:24:48,000
it's a really good distinction that you're
making there, and I think it has

314
00:24:48,039 --> 00:24:52,119
a lot to do with prioritization and
understanding. Right. So oftentimes, especially

315
00:24:52,119 --> 00:24:57,680
in major incidents, there's a priority
involved there to minimize customer impact, right,

316
00:24:57,680 --> 00:25:03,920
because customer impact means lost revenue.
Incidents are expensive both in terms of

317
00:25:03,960 --> 00:25:07,519
time and you know, customer satisfaction
and trust. And so I think there

318
00:25:07,519 --> 00:25:11,960
are kind of two ways in my
experience that you mitigate before resolution. And

319
00:25:12,000 --> 00:25:18,200
the first I'm mentioning now is about
minimizing the impact in favor of kind of

320
00:25:18,240 --> 00:25:22,480
getting things back on track. And
so, like you said, throw some

321
00:25:22,519 --> 00:25:27,039
additional servers at the API and that'll
address the symptom, but we still don't

322
00:25:27,119 --> 00:25:30,880
understand what's going on under the hood, right. And so I think the

323
00:25:30,920 --> 00:25:36,400
second reason, sometimes you can choose
not to mitigate an issue. I've been

324
00:25:36,440 --> 00:25:41,720
in situations where we've had customer impact, but the priority of understanding what's going

325
00:25:41,759 --> 00:25:45,160
on has exceeded the priority of needing
to address that impact, maybe because it's

326
00:25:45,200 --> 00:25:48,119
like, you know, one user
at a customer rather than all customers in

327
00:25:48,160 --> 00:25:52,119
a major incident. And so that
second bit I think is really interesting because

328
00:25:52,240 --> 00:25:59,920
you can use the incident and
the levers you can pull during the incident

329
00:26:00,039 --> 00:26:03,119
to create the conditions for learning while
it's happening. Right, So if you

330
00:26:03,160 --> 00:26:06,920
mitigate the incident with the API,
it means that you have an opportunity to

331
00:26:06,920 --> 00:26:10,480
explore what was actually going on.
Maybe you isolate one of those servers and

332
00:26:10,519 --> 00:26:15,680
you start to dig into you know
which function calls. If you've got distributed

333
00:26:15,720 --> 00:26:19,599
tracing, which is amazing, you
know which specific function or endpoint is causing

334
00:26:19,680 --> 00:26:25,200
delay in the response, right,
that's causing a delay across all responses,

335
00:26:26,039 --> 00:26:30,559
And you can kind of take advantage
of that system state, which you know,

336
00:26:30,599 --> 00:26:33,000
if you reboot the servers, if
you add a ton of them,

337
00:26:33,240 --> 00:26:37,160
those conditions go away and you lose
your opportunity to understand what's going on.

338
00:26:37,200 --> 00:26:41,119
And so there's a lot of different
ways to look at it. I think

339
00:26:41,200 --> 00:26:47,039
mitigation and resolution for folks outside of
incident response, that's a mental framework for

340
00:26:47,160 --> 00:26:51,480
understanding are we good now and are
we good for the long term? Right?

341
00:26:52,000 --> 00:26:56,400
But as a responder, those two
events are really key in terms of

342
00:26:56,480 --> 00:27:02,000
communicating within the response group what our
level of understanding is and what priority decisions we're

343
00:27:02,039 --> 00:27:07,599
making with regard to customer impact or
you know, system stability or what have

344
00:27:07,720 --> 00:27:14,400
you. Sometimes incidents are not resolved
for days after you know the actual incident.

345
00:27:14,519 --> 00:27:18,799
Especially for large, complex
incidents, sometimes you just have to

346
00:27:18,839 --> 00:27:22,640
get things to a steady state and
let them stay there until you have a chance

347
00:27:22,680 --> 00:27:26,319
to enroll more folks or get a
deeper understanding of what's going on. And

348
00:27:26,400 --> 00:27:29,359
sometimes those fixes are not things you
can roll out, you know, as

349
00:27:29,720 --> 00:27:33,559
one hot fix. Sometimes they are
major upgrades or major changes to kind of

350
00:27:33,599 --> 00:27:40,039
foundational business logic. So I'm glad
you made that distinction because they're really

351
00:27:40,079 --> 00:27:42,319
important, and I think oftentimes folks
outside of the incident are just like,

352
00:27:42,359 --> 00:27:47,559
when are we mitigated? When are we
mitigated? But you can't lose

353
00:27:47,599 --> 00:27:51,759
sight of that time frame
between mitigation and resolution, because that's where

354
00:27:51,799 --> 00:28:02,160
a lot of the you know,
exploratory understanding comes out for sure. And

355
00:28:02,440 --> 00:28:08,240
one of the things that I try
to insist on is that for mitigating the issue,

356
00:28:08,599 --> 00:28:12,880
we're allowed to make live changes in
production, but the actual root cause

357
00:28:12,960 --> 00:28:18,440
fix has to go through our normal
development cycle of making the changes in DEV,

358
00:28:18,880 --> 00:28:22,799
pushing the changes to a staging environment, validating them, and then promoting

359
00:28:22,799 --> 00:28:27,160
those changes to production. So it
has to follow that flow. Yeah,

360
00:28:27,240 --> 00:28:32,839
and that goes back to that
prioritization opportunity. Right. So once you've

361
00:28:32,920 --> 00:28:37,039
kind of addressed the business impacting issue, then you've got to get back to

362
00:28:37,079 --> 00:28:40,960
your fundamentals, right, and your
business processes and compliance and all of that.

363
00:28:41,519 --> 00:28:48,279
And so detangling those two things allows
you to respond in a way that

364
00:28:48,319 --> 00:28:52,079
helps the business, and then address
the issue in a way that helps the

365
00:28:52,119 --> 00:28:56,079
business, and do those in different
ways, because especially when you're

366
00:28:56,079 --> 00:28:59,119
further along in your maturity model,
when you're a large organization, there's a

367
00:28:59,160 --> 00:29:03,480
lot of things that can stand
in the way of quickly addressing an issue.

368
00:29:03,559 --> 00:29:07,559
Right. If you don't create a
path for doing that, then incidents

369
00:29:07,599 --> 00:29:11,640
end up taking longer and having a
lot more impact. So yeah, and

370
00:29:12,200 --> 00:29:15,559
the other thing we've learned in all
of this is that every organization is different,

371
00:29:15,680 --> 00:29:22,079
Right. Some organizations have response processes
that specifically call out different ways of

372
00:29:22,160 --> 00:29:29,160
mitigating impacting issues and different ways of
capturing follow ups for those. Right,

373
00:29:29,240 --> 00:29:33,279
Sometimes the incident's not closed until you've
resolved it, and sometimes it's closed at

374
00:29:33,279 --> 00:29:37,160
the point that it's mitigated and you've
captured the follow ups you want to take

375
00:29:37,160 --> 00:29:41,680
action on. You know. As
a result, sometimes folks keep talking about

376
00:29:41,680 --> 00:29:44,720
the incident after it's been closed and
they want all of that for their post

377
00:29:44,799 --> 00:29:51,039
incident learning review as well. There's
just so many different ways to tailor this

378
00:29:51,480 --> 00:29:56,759
whole incident management process to help an
organization be more successful. Yeah, one

379
00:29:56,799 --> 00:30:00,920
of the places I worked years ago
was a healthcare provider, and

380
00:30:00,960 --> 00:30:10,079
we provided medical services for
hospitals across the US for trauma patients,

381
00:30:10,599 --> 00:30:15,319
and so every incident that we had, whenever we broke out an incident room,

382
00:30:15,400 --> 00:30:18,680
we actually had a person from our
quality team who would join the call

383
00:30:18,720 --> 00:30:22,519
as well and let us know,
like every five or ten minutes, how

384
00:30:22,519 --> 00:30:29,359
many patients across the United States couldn't
receive life saving healthcare because our stuff was

385
00:30:29,359 --> 00:30:34,119
broken. And so we had a
very unique incident response model there that doesn't

386
00:30:34,160 --> 00:30:37,799
really apply anywhere I've been since then, but there were still lessons that I've

387
00:30:37,839 --> 00:30:42,319
taken away from that, you know, number one is mitigate the issue as

388
00:30:42,400 --> 00:30:48,799
soon as possible. Right. I'm so interested
to hear how did that information help

389
00:30:48,920 --> 00:30:59,160
or hinder mitigation for your teams.
It really set the priority and kept us

390
00:30:59,200 --> 00:31:03,920
focused, you know, because as
that number went up, you started to

391
00:31:03,000 --> 00:31:08,079
understand, you know, the impact
that this was having. And this was

392
00:31:08,119 --> 00:31:15,359
not an 'other development team sucks' or
'their network is terrible' thing. And many,

393
00:31:15,599 --> 00:31:22,200
many of our incidents it was because
of user error at one of the trauma

394
00:31:22,279 --> 00:31:26,480
centers. But it's still not okay
to say, oh, well, they're

395
00:31:26,559 --> 00:31:29,119
just doing it wrong, because you
have to realize at the same time,

396
00:31:29,720 --> 00:31:33,480
you know, while you're on the
phone with that person, they're up on

397
00:31:33,400 --> 00:31:37,680
a table in the emergency room doing
chest compressions on this patient. So they're

398
00:31:37,680 --> 00:31:41,559
going to give it their best shot, but they may not be the most

399
00:31:41,559 --> 00:31:45,240
attentive user at that time, and
you just got to work with that.

400
00:31:47,079 --> 00:31:52,359
Yeah, you're highlighting like a
perfect example of I think why we are

401
00:31:52,440 --> 00:31:57,359
so focused on post incident learning,
and it's because the most important aspect of

402
00:31:57,400 --> 00:32:02,599
these complex technical systems that we're all
building and maintaining are the people involved,

403
00:32:02,720 --> 00:32:07,559
Right, and when you're in an
incident response room, a major incident room,

404
00:32:07,599 --> 00:32:10,279
whatever, and you've got someone reminding
you of the impact, especially when

405
00:32:10,279 --> 00:32:15,000
that impact is you know, not
just on dollars, but also on people's

406
00:32:15,440 --> 00:32:24,200
lives. You create the conditions for
this like profound human creativity, right in

407
00:32:24,640 --> 00:32:29,160
terms of figuring out, you know, what can we do as a team

408
00:32:29,319 --> 00:32:32,039
to kind of we're back to the
incident management space, what can we do

409
00:32:32,039 --> 00:32:35,839
as a team to kind of come
up with a creative solution here and get

410
00:32:35,880 --> 00:32:38,359
us back to good, you know, temporarily. And I think if you're

411
00:32:38,400 --> 00:32:45,440
not reflecting on and talking about those
moments in incident response and your, you know,

412
00:32:45,759 --> 00:32:49,839
post incident learning review or narrative
review, whatever you call it,

413
00:32:50,480 --> 00:32:52,920
you're missing out on all of
those examples of the ways in which the

414
00:32:53,079 --> 00:32:58,039
people are helping support the system and
keep things moving. You know, we

415
00:32:58,400 --> 00:33:04,359
hear a lot in tech and DevOps
and elsewhere that like automation is the key

416
00:33:04,559 --> 00:33:09,799
to sustainability and more reliable systems.
And there are things that we can automate,

417
00:33:10,039 --> 00:33:15,400
you know, especially assigning roles during
incident management and response. But there's

418
00:33:15,440 --> 00:33:21,440
a lot of you know, human
involvement tweaking the system and adding you know

419
00:33:21,559 --> 00:33:25,920
capacity, not you know, technical
capacity in terms of the number of network requests

420
00:33:25,960 --> 00:33:30,880
you can handle, things like that,
but adding capacity in terms of the

421
00:33:30,880 --> 00:33:36,359
system's adaptability. And I just like, I would love to be a fly

422
00:33:36,480 --> 00:33:38,480
on the wall for one of those
incidents that you mentioned, because I imagine

423
00:33:38,519 --> 00:33:44,000
folks really came together and came up
with some creative solutions to find a way

424
00:33:44,000 --> 00:33:47,440
to mitigate those incidents and get things
back to good so that they could figure

425
00:33:47,440 --> 00:33:50,799
out, you know, what the
long term solutions were. That's such an

426
00:33:50,799 --> 00:33:55,880
exciting like space, Yeah, for
sure, and it's you know, it

427
00:33:55,920 --> 00:34:02,200
was a majority of the role was
communication. Like all of my

428
00:34:02,319 --> 00:34:09,360
coworkers there had exceptional technical skills,
but their communication skills were just a plus

429
00:34:09,440 --> 00:34:14,400
one on top of that. And
I think that's what made it work so

430
00:34:14,559 --> 00:34:16,559
well. And I still say that
to this day. You know, DevOps

431
00:34:17,519 --> 00:34:22,119
is not a technical world. There's
a technical component, but it really

432
00:34:22,199 --> 00:34:29,280
is communication: building the technical framework, but then communicating that out to your

433
00:34:29,320 --> 00:34:32,360
customers, the engineers that you support, and getting the feedback from them to

434
00:34:32,480 --> 00:34:37,840
understand what's the difference between what I
built and what they thought I built.

435
00:34:38,519 --> 00:34:43,119
Yeah, it's really great when
you have those folks who kind of know

436
00:34:43,280 --> 00:34:46,800
how to be in a critical situation
and maintain you know, effective communication and

437
00:34:46,840 --> 00:34:52,360
find a solution to the issue.
One thing we talk a lot about is

438
00:34:52,360 --> 00:34:54,880
like how do you scale that,
how do you externalize those

439
00:34:54,920 --> 00:35:00,119
skills? Oftentimes we find that the
folks who are most effective in incidents

440
00:35:00,239 --> 00:35:04,280
don't have the capacity or time to
help up-level or train folks into that

441
00:35:04,920 --> 00:35:07,119
discipline. Right. It kind of
requires a lot of different skills. You

442
00:35:07,159 --> 00:35:12,119
need technical expertise, you need
experience with the systems involved, and you

443
00:35:12,159 --> 00:35:17,920
need a good handle on like effective
communication, not just for communicating the status

444
00:35:17,960 --> 00:35:22,119
of the incident, but also communicating
with the folks that you are directing if

445
00:35:22,159 --> 00:35:27,559
you're in an incident commander role for
example. There's another area where if you

446
00:35:27,599 --> 00:35:30,079
invest in learning from these things,
you can create artifacts that folks pick up

447
00:35:30,079 --> 00:35:36,000
when they join the organization. Right
in almost every large org I've been in,

448
00:35:36,079 --> 00:35:42,960
there's a Confluence space or Google Drive
folder or something full of post incident reviews.

449
00:35:43,440 --> 00:35:45,920
Sometimes I'll just go in and read
those, right, and you start

450
00:35:45,960 --> 00:35:50,840
to learn, you know, who
are the folks who demonstrate an ability to

451
00:35:50,920 --> 00:35:53,320
kind of respond to some of the
most significant incidents and what are they doing,

452
00:35:53,920 --> 00:35:59,800
how are they doing that right?
What skills or actions have they taken

453
00:35:59,840 --> 00:36:02,960
that stood out in the learning review that I
should try and cultivate as a

454
00:36:04,000 --> 00:36:07,920
responder? And so that can
be a really interesting space too, is

455
00:36:08,440 --> 00:36:13,119
not just learning about the system and
what things we can change to improve performance

456
00:36:13,159 --> 00:36:16,079
at the time, but how
are we leaving breadcrumbs for the new folks

457
00:36:16,119 --> 00:36:21,559
coming into the org who are growing
into that discipline, because trial by fire

458
00:36:21,840 --> 00:36:25,639
during a major incident can be a
really stressful, kind of terrifying experience,

459
00:36:27,119 --> 00:36:30,320
and so the more you can kind
of give, you know, these

460
00:36:30,480 --> 00:36:37,639
anecdotal or story based accounts of how
things go in your organization, the more comfortable

461
00:36:37,679 --> 00:36:42,320
folks will feel when they step into
that role. Yeah, I think it's

462
00:36:42,320 --> 00:36:47,679
one of those areas where there's like
a mentoring path there. And as I

463
00:36:47,719 --> 00:36:52,599
have gotten older and been doing this
for a while, I've realized that

464
00:36:52,159 --> 00:36:58,719
that's a larger part of my job
is sharing that context because you can

465
00:36:58,760 --> 00:37:04,639
put the documentation, but then there's
also like the unspoken or the unwritten part

466
00:37:04,679 --> 00:37:07,880
of that. You know, there's
the mood, the feel, the context of

467
00:37:07,920 --> 00:37:14,000
the situation. And I think that's
been a problem for you know, far

468
00:37:14,079 --> 00:37:17,159
beyond my lifetime, and the only
way we've been successful at solving it now

469
00:37:17,639 --> 00:37:22,719
up to this point is just through
that mentoring type role where you bring people

470
00:37:22,760 --> 00:37:28,000
in even though you know that they
aren't ready to be the lead in this,

471
00:37:28,440 --> 00:37:32,280
you bring them in just so that
they can witness it and start

472
00:37:32,320 --> 00:37:38,079
making notes for themselves. Yeah,
and that's where a process or a policy

473
00:37:38,159 --> 00:37:44,920
around incident response and incident learning that
is based on transparency can be really helpful.

474
00:37:45,480 --> 00:37:49,719
Right, Sometimes you get a lot
of folks joining the major incident room

475
00:37:49,760 --> 00:37:53,239
that are trying to contribute in ways
that may not actually you know, help

476
00:37:53,360 --> 00:37:58,519
with mitigation. But a lot of
times we find in large organizations that have

477
00:37:59,199 --> 00:38:02,599
you know, policies angled toward transparency, folks just join to kind of understand

478
00:38:02,639 --> 00:38:07,840
and learn in the moment and also
after the fact. So, you know,

479
00:38:07,920 --> 00:38:13,880
the incident learning review calendar is
always a place that I go and

480
00:38:13,920 --> 00:38:16,159
try to figure out, you know, which of these incidents are going

481
00:38:16,199 --> 00:38:21,079
to be most helpful for me understanding
the way this organization operates and the critical

482
00:38:21,119 --> 00:38:24,360
systems. Right. In a past
role, we had a Kafka platform that was

483
00:38:25,039 --> 00:38:29,280
you know, involved in a lot
of incidents, not because the Kafka platform

484
00:38:29,360 --> 00:38:31,039
was a problem, but because everything
was built around it, right, So

485
00:38:31,079 --> 00:38:35,280
every time there was an issue with
any system, that kind of tied back

486
00:38:35,320 --> 00:38:37,760
to there. And that presents a
really interesting lens for you know, how

487
00:38:37,760 --> 00:38:42,840
do these folks communicate with the
broader org and what changes are we making

488
00:38:42,920 --> 00:38:46,960
to shore up some of those critical
dependencies, And you know, just being

489
00:38:46,960 --> 00:38:52,039
able to join a conversation about that, not having been involved in response or

490
00:38:52,079 --> 00:38:57,199
having anything to do with the teams
involved, can be a really powerful opportunity

491
00:38:57,199 --> 00:39:00,079
for you to kind of learn about
the team that you're working with and the

492
00:39:00,280 --> 00:39:05,519
underlying technologies. Especially for folks like
me, it's been eight years since I

493
00:39:05,679 --> 00:39:08,840
was you know, maintaining those types
of platforms and so picking up on some

494
00:39:08,920 --> 00:39:14,320
of that nuance so that I can
support the folks who are around those systems

495
00:39:14,320 --> 00:39:16,920
can be really helpful. There's a
line there, though, You've got to

496
00:39:16,920 --> 00:39:22,480
make sure that expectations are clear,
right. If you're participating in something for

497
00:39:22,519 --> 00:39:28,400
the purpose of learning, you're kind
of a sponge rather than someone who's bringing

498
00:39:28,519 --> 00:39:31,360
opinions, you know, not having
understood the circumstances of the specific incident.

499
00:39:31,920 --> 00:39:37,400
So you need a healthy kind of
culture and set of expectations around this.

500
00:39:37,559 --> 00:39:40,000
But I've seen a lot of orgs
that do it well, and it is

501
00:39:40,599 --> 00:39:45,760
a game changer, you know,
for helping to provide, you know,

502
00:39:46,599 --> 00:39:52,519
scalable mentorship and opportunities for folks to
get a better understanding of the details.

503
00:39:52,760 --> 00:39:57,039
Yeah. One of the things you
commented on that I think just can't be

504
00:39:57,119 --> 00:40:04,559
elaborated on enough is transparency. And
I've worked in multiple places, and when

505
00:40:04,559 --> 00:40:09,440
I first started my career, it
was, in many instances, a fireable

506
00:40:09,599 --> 00:40:14,920
event if you created an incident,
and for that reason, people would try

507
00:40:14,920 --> 00:40:20,679
to hide and cover up their incidents, which led to no one learning from

508
00:40:20,719 --> 00:40:25,079
that. And these days, you
know, I almost parade it around, you know,

509
00:40:25,199 --> 00:40:31,199
hey, I broke this because there's
a learning opportunity there. And I

510
00:40:31,199 --> 00:40:37,559
think it's really important to be open
and to build the environment where people aren't

511
00:40:37,599 --> 00:40:42,320
afraid to say that they made mistakes, and even the dumb mistakes, we

512
00:40:42,400 --> 00:40:45,119
all do them, you know,
you learn from it. And I actually,

513
00:40:45,159 --> 00:40:49,679
at some point in my career a
boss of mine told me, and

514
00:40:49,800 --> 00:40:54,480
it's an anecdotal story, but it's
still effective. Someone created an incident that cost

515
00:40:54,519 --> 00:40:59,159
several hundred thousand dollars and said,
oh, am I going to be fired

516
00:40:59,199 --> 00:41:02,440
now, and they responded, no, I just spent two hundred thousand dollars

517
00:41:02,480 --> 00:41:09,960
on your education. Why would I
fire you now? Yeah? And this

518
00:41:10,000 --> 00:41:15,440
is where I think, like,
it's really difficult to build trust, right,

519
00:41:15,480 --> 00:41:17,320
It's really easy to damage trust,
it's really difficult to build it.

520
00:41:17,320 --> 00:41:22,599
And so if you're approaching your
incident management, you know, life cycle

521
00:41:22,679 --> 00:41:29,159
and process from the perspective of trying
to support folks doing what they can to

522
00:41:29,199 --> 00:41:36,079
help the business be successful, you
get a lot of really impactful contribution and

523
00:41:36,119 --> 00:41:39,079
collaboration with regard to you know,
keeping systems healthy and things like that.

524
00:41:39,599 --> 00:41:45,199
But if you over index on you
know, the measurable metrics. We're humans,

525
00:41:45,280 --> 00:41:49,800
right, every human will game
a measure, right? You start to

526
00:41:49,840 --> 00:41:53,360
cultivate some of those types of environments
where you know, what's the consequence of

527
00:41:53,400 --> 00:41:58,280
me doing the right thing here?
Is it going to reflect poorly on

528
00:41:58,360 --> 00:42:00,199
me? Is it going to cause
an issue? And so, thankfully,

529
00:42:00,719 --> 00:42:06,079
I think every organization that Jeli
has worked with over the past two and

530
00:42:06,119 --> 00:42:10,000
a half plus years since I joined,
they've taken

531
00:42:10,000 --> 00:42:14,800
the approach that, yeah, these
are blame-aware learning reviews.

532
00:42:14,920 --> 00:42:17,920
Right. We know that folks make
mistakes, that they don't have sufficient context

533
00:42:19,639 --> 00:42:23,400
in the moment, and that they
can learn from those experiences and change their

534
00:42:23,400 --> 00:42:30,239
approach next time, versus this kind
of you know, older model we'll say,

535
00:42:30,320 --> 00:42:37,800
of prioritizing the, you know, public visibility of how things are

536
00:42:37,840 --> 00:42:39,519
going, and maybe we
don't declare an incident for that one,

537
00:42:39,559 --> 00:42:43,800
we just try to fix it quickly. Early in my career, I was

538
00:42:44,400 --> 00:42:49,679
learning how to use Microsoft SQL databases
and we had a large SharePoint site.

539
00:42:49,719 --> 00:42:54,800
It was another medical audit company,
and I learned what DROP DATABASE commands

540
00:42:54,840 --> 00:43:01,440
do, and I did it to our
production database and fortunately I had enough experience

541
00:43:01,440 --> 00:43:06,159
to quickly restore it before anyone noticed. But that was an environment where I

542
00:43:06,199 --> 00:43:10,119
didn't feel comfortable, you know,
broadcasting that I had just, in the

543
00:43:10,199 --> 00:43:15,239
process of learning some new commands, stopped
the entire database. So yeah, it

544
00:43:15,239 --> 00:43:22,360
can be a tricky balance, but
you know, sunlight is the best

545
00:43:22,360 --> 00:43:25,360
medicine, right. Transparency in these
types of environments allows folks to do what's

546
00:43:25,440 --> 00:43:30,760
necessary to get things back to good.
And I think the more you can kind

547
00:43:30,800 --> 00:43:36,760
of socialize and demonstrate that transparency,
the more effective your organization is going to

548
00:43:36,760 --> 00:43:39,840
be, and the more folks are
going to want to contribute to that mission,

549
00:43:39,960 --> 00:43:45,360
whatever it is. Yeah, yeah, absolutely agreed. So let's talk

550
00:43:45,400 --> 00:43:51,719
a little bit about what's going on
with Jeli these days. Yeah. So

551
00:43:51,880 --> 00:43:57,360
Jeli has been like the most interesting
experience of my career. I think I

552
00:43:57,440 --> 00:44:02,239
mentioned I joined in twenty twenty one, I think it was. Jeli was just

553
00:44:02,360 --> 00:44:07,199
a post incident analysis tool at that
time. So we had this notion of

554
00:44:07,239 --> 00:44:12,599
building narratives and not much else,
and we recognized that part of the post

555
00:44:12,679 --> 00:44:16,000
incident learning process involves having good data, and the way that you get good

556
00:44:16,079 --> 00:44:20,880
data is you get consistent in your
process. And so we ended up building

557
00:44:20,880 --> 00:44:24,840
this incident response bot and we also
went to the other end of the spectrum

558
00:44:24,840 --> 00:44:30,079
and started introducing features for cross incident
analysis. And so this is, you

559
00:44:30,119 --> 00:44:32,840
know, after an incident, let's
spend some time learning, but then how

560
00:44:32,840 --> 00:44:38,719
do we roll up those learnings into
themes across incidents that help the organization make

561
00:44:38,800 --> 00:44:46,239
decisions around growing teams to support services
or changing direction with regard to build versus

562
00:44:46,239 --> 00:44:51,960
buy those sorts of things. And
so we've been working on a lot of

563
00:44:51,960 --> 00:44:57,000
cool stuff for the last two and
a half years. And then in what

564
00:44:57,079 --> 00:45:01,440
was it, I think November second,
the public announcement that we were merging with

565
00:45:01,480 --> 00:45:09,400
PagerDuty went out, which has
been like really exciting and also a trying

566
00:45:09,440 --> 00:45:14,079
experience. It's been a month, right? And so PagerDuty is something like

567
00:45:15,360 --> 00:45:21,880
eleven hundred employees as of January of
this year, we were twenty one.

568
00:45:22,360 --> 00:45:25,719
We're kind of in the process of
figuring out how to bridge those two divides.

569
00:45:25,760 --> 00:45:30,280
And one thing that I'm really excited
about is, you know, Jeli

570
00:45:30,360 --> 00:45:35,880
has spent a lot of time differentiating
itself as a product in the post incident

571
00:45:36,039 --> 00:45:39,880
learning area, and I think we've
brought a lot of kind of novel approaches

572
00:45:39,880 --> 00:45:45,559
and opinions to incident response in general. PagerDuty has been doing this for

573
00:45:45,639 --> 00:45:51,079
fourteen plus years, right? And
they created the category within which Jeli could

574
00:45:51,079 --> 00:45:54,519
become a company, which is pretty
cool. And so what we're looking to

575
00:45:54,599 --> 00:46:01,280
do now is to take that practice, you know, post incident learning, and really

576
00:46:01,519 --> 00:46:06,719
get folks from the earlier phases of
the maturity level where they're just doing incident

577
00:46:06,760 --> 00:46:09,760
response and maybe they're doing a post-incident
learning review in a Google Doc,

578
00:46:10,159 --> 00:46:15,880
and bring them into the modern era, right,
and start creating incident narratives and doing learning

579
00:46:15,880 --> 00:46:21,239
reviews. PagerDuty has something
like twenty-seven thousand free and paid customers.

580
00:46:21,760 --> 00:46:24,800
There's a huge opportunity there to help
folks understand a better way of kind

581
00:46:24,840 --> 00:46:31,159
of benefiting from all of it. So that's
my focus right now is figuring out how

582
00:46:31,159 --> 00:46:37,719
do we bring those two worlds together
while keeping an eye on preserving that kind

583
00:46:37,760 --> 00:46:45,760
of post-incident learning tooling and opportunity. But yeah, a lot

584
00:46:45,760 --> 00:46:49,760
of exciting stuff on the horizon. We
are going into a new

585
00:46:49,880 --> 00:46:53,400
year, so I think things will
look very different on the PagerDuty

586
00:46:53,440 --> 00:47:00,000
side and probably also improve on the
Jelly side as well. It's going to

587
00:47:00,039 --> 00:47:04,840
be really interesting. Yeah. I think it's a natural

588
00:47:04,880 --> 00:47:08,840
fit, you know, because PagerDuty
is hands down a great tool

589
00:47:09,039 --> 00:47:19,079
for notifying people that there's something requesting
their attention, but what you do after

590
00:47:19,159 --> 00:47:23,760
that is kind of up to you, and so it seems like a natural

591
00:47:23,800 --> 00:47:31,400
fit to just roll that right
into Jeli and help people like,

592
00:47:31,480 --> 00:47:37,079
just from a business perspective, take
this huge PagerDuty customer base and

593
00:47:37,199 --> 00:47:42,159
just guide them into the thing that
they thought they were doing all along.

594
00:47:43,639 --> 00:47:46,239
Yeah, one focus for us
has always been, you know, how

595
00:47:46,280 --> 00:47:52,920
can we improve the quality of our
customers' post-incident learning reviews, and how

596
00:47:52,920 --> 00:47:58,960
can we allow the folks conducting those
investigations to focus on what matters. We've

597
00:47:58,960 --> 00:48:05,119
talked to organizations where, you know, there was one problem manager at a

598
00:48:05,119 --> 00:48:07,360
company that used Microsoft Teams,
and part of their job was to go

599
00:48:07,440 --> 00:48:13,440
through every team's channel and find transcripts
associated with an incident and put them in

600
00:48:13,480 --> 00:48:16,239
ServiceNow. Nobody should be
doing that, right, That's just toil.

601
00:48:16,440 --> 00:48:22,960
That's not productive. And so
one thing I'm especially excited for with

602
00:48:22,000 --> 00:48:27,320
this partnership with PagerDuty, or
this acquisition by PagerDuty, is

603
00:48:27,840 --> 00:48:30,079
they've got a ton of data,
right, And so when you're building your

604
00:48:30,440 --> 00:48:34,800
post incident narratives, your timeline,
and you're adding evidence and you're trying to

605
00:48:34,840 --> 00:48:38,760
help folks understand the details of an
incident, the more data you have to

606
00:48:38,840 --> 00:48:43,719
substantiate those claims and those events that
you're highlighting in the incident, the more

607
00:48:44,440 --> 00:48:49,079
folks can learn from, you know,
not only the overall shape of the

608
00:48:49,119 --> 00:48:52,400
incident, but the systems involved and
how they're used to understand you know,

609
00:48:52,440 --> 00:48:57,440
the underlying technology. And so there's
an element there that's really exciting, which

610
00:48:57,480 --> 00:49:00,920
is just we have a lot more
data to allow our users to work

611
00:49:00,960 --> 00:49:07,239
with. But I also think,
like I said, PagerDuty has been

612
00:49:07,280 --> 00:49:10,400
known for a really long time,
as kind of an industry leader in

613
00:49:10,519 --> 00:49:15,519
scheduling and alerting, right? Ack
and bail: I got paged, I'm gonna

614
00:49:15,519 --> 00:49:19,559
go fix it. There
is a better way, right? Like,

615
00:49:19,760 --> 00:49:24,559
there are ways to tie that process
into the incident response process and the

616
00:49:24,599 --> 00:49:29,519
post-incident review, and I think
that's going to be our focus over

617
00:49:29,599 --> 00:49:32,119
the next, you know, several months: figuring out how do we give

618
00:49:35,639 --> 00:49:40,760
PagerDuty more mechanisms for supporting responders
throughout the entire incident management life cycle,

619
00:49:40,880 --> 00:49:45,440
not just the detection phase, which
a lot of folks know and they're familiar

620
00:49:45,519 --> 00:49:51,039
with, but, you know, PagerDuty's
full Operations Cloud, which most folks

621
00:49:51,119 --> 00:49:53,920
I've talked to don't even know exists. And this is, you

622
00:49:53,960 --> 00:50:00,440
know, the AI automation for
improving the signal-to-noise ratio when it comes to

623
00:50:00,519 --> 00:50:06,440
events. This is all of the
mechanisms around running actual incidents, and then

624
00:50:06,480 --> 00:50:10,599
this is the post-incident as well. PagerDuty has a feature today called post

625
00:50:10,639 --> 00:50:15,519
mortems, which is fairly straightforward.
It's your Google postmortem doc.

626
00:50:15,880 --> 00:50:20,880
But we think there's a lot of
opportunity to not require that folks are going

627
00:50:20,920 --> 00:50:23,679
and creating these data sets manually,
but just kind of provide that information so

628
00:50:23,719 --> 00:50:30,039
they can use it to build better narratives
that live alongside all those things. And

629
00:50:30,119 --> 00:50:31,039
yeah, I think it's
a natural fit too. I mean,

630
00:50:31,079 --> 00:50:37,000
I've been using PagerDuty for
basically my entire career, right and being

631
00:50:37,039 --> 00:50:42,599
able to bridge that gap between that
paging and scheduling and then the things that

632
00:50:42,639 --> 00:50:45,760
I need to do to help my
team be successful, it's going to be

633
00:50:45,159 --> 00:50:51,639
huge, you know, from
my experience. Yeah, I think having

634
00:50:51,719 --> 00:50:57,559
access to that data is going to
lead to better collaboration after the fact,

635
00:50:57,559 --> 00:51:00,280
because, for me, I've always
struggled with that. You know, after

636
00:51:00,320 --> 00:51:04,960
the incident's over and you're trying to
do the review of it, trying to

637
00:51:05,079 --> 00:51:09,000
remember what things happened in what order
and remember all of those steps that you

638
00:51:09,079 --> 00:51:14,679
took. So if you've got something
that can prompt you with reminders and

639
00:51:15,000 --> 00:51:17,320
kind of pre populate that narrative for
you. I think it's just going to

640
00:51:17,440 --> 00:51:23,360
lead to better results at
the end. Yeah, there is nothing

641
00:51:23,440 --> 00:51:28,679
better than having a starting point when
you are trying to investigate an incident.

642
00:51:28,760 --> 00:51:31,880
Right, If you open an empty
Google doc, it's a hard time.

643
00:51:32,000 --> 00:51:37,119
But if you can start with you
know, in Jelly today, you start

644
00:51:37,119 --> 00:51:39,719
with the incident transcript, all the
conversation that happened in Slack and data about

645
00:51:39,719 --> 00:51:45,039
who was involved, it's so much better
than starting from nothing. And that's especially

646
00:51:45,079 --> 00:51:52,239
true when you know your incident response
process uses multiple data sources like multiple incident

647
00:51:52,280 --> 00:51:57,000
response channels or your Datadog charts
or what have you. So we're not

648
00:51:57,320 --> 00:52:02,079
really looking to do the post incident
narrative for you. We're looking to give

649
00:52:02,079 --> 00:52:07,400
you a point to start from because
that saves time, it saves energy,

650
00:52:07,440 --> 00:52:12,320
and lets you focus on the things
that only you can create within your post

651
00:52:12,360 --> 00:52:16,039
incident narrative. Right, the investigator
is a conduit through which the folks who

652
00:52:16,079 --> 00:52:23,400
are involved in the organizational miscellany kind
of come together into a coherent story about

653
00:52:23,400 --> 00:52:29,199
what happened and what it means.
So we really want to like provide a

654
00:52:29,280 --> 00:52:34,840
foundation on top of which folks can
have these conversations. And I think there's

655
00:52:34,840 --> 00:52:38,840
a lot of opportunity there with this
kind of broader spectrum of data and integrations

656
00:52:40,199 --> 00:52:46,760
within customers' existing processes. Yeah,
it reminds me a lot of like a

657
00:52:47,920 --> 00:52:53,800
there's like a people skill
there. You put two people who don't

658
00:52:53,840 --> 00:53:00,239
know each other in a room and
anything could possibly happen. They could strike

659
00:53:00,280 --> 00:53:02,239
up conversation, they could sit there
in silence. You know, there's just

660
00:53:02,320 --> 00:53:07,880
no way to gauge it. But
then if you give them a conversation starter,

661
00:53:07,800 --> 00:53:13,199
then you can sort of like guide
the results from there. And I

662
00:53:13,239 --> 00:53:19,039
think that's the real value
of what the post incident narrative does,

663
00:53:19,199 --> 00:53:23,440
is it's that conversation starter. Yeah, I mean, certainly for us,

664
00:53:23,639 --> 00:53:27,679
as you mentioned, like we use
Jeli internally and we do our own learning

665
00:53:27,719 --> 00:53:32,800
reviews. I think the exercise of
you know, mitigating the incident, putting

666
00:53:32,880 --> 00:53:37,039
together the learning review, those are
valuable experiences for the folks involved. But

667
00:53:37,119 --> 00:53:40,880
getting everyone in the company, because
we can do that at twenty one people

668
00:53:42,480 --> 00:53:45,440
into a room to talk about what
happened, to ask questions to figure out

669
00:53:45,440 --> 00:53:49,119
what did you know? What did
you not know? You know? What

670
00:53:49,199 --> 00:53:52,079
did I know? And I wasn't
involved? Those sorts of things. That's

671
00:53:52,119 --> 00:54:00,239
where you get really interesting kind of
exponential increases in understanding. And

672
00:54:00,760 --> 00:54:04,760
the thing that most excites me about
these learning reviews is it's not just the

673
00:54:04,840 --> 00:54:08,159
understanding of the technical or the organizational
process. It's the understanding of each other.

674
00:54:08,480 --> 00:54:13,400
Right? How I communicate in
these environments? How you communicate what

675
00:54:13,440 --> 00:54:16,760
your expectations are, what sorts of
things I need to be better about informing

676
00:54:16,840 --> 00:54:24,000
during response. It's a
retro, right? And the software can't operate

677
00:54:24,039 --> 00:54:28,719
itself. And so if the people
are working effectively together, then the software

678
00:54:28,800 --> 00:54:31,079
is working effectively and if they're not, then it's not. And I think

679
00:54:31,079 --> 00:54:36,119
that's one of the really big
opportunities, especially for you know, the

680
00:54:36,280 --> 00:54:43,199
large complex organizations in novel economic environments, to figure out, you know,

681
00:54:43,280 --> 00:54:47,880
how do we improve our efficiency in
our collaboration so that we can do what

682
00:54:49,320 --> 00:54:53,000
needs to be done. Really exciting. Oh, it is really exciting.

683
00:54:53,119 --> 00:54:59,039
I'm looking forward to seeing how this
plays out for y'all. Yeah, I'll

684
00:54:59,079 --> 00:55:01,760
have to tell you, you
know, we're in a phase right now

685
00:55:01,800 --> 00:55:05,599
where there are too many good things
for us to do, so we got

686
00:55:05,599 --> 00:55:10,440
to figure out the next best thing
and focus on that. But yeah,

687
00:55:10,480 --> 00:55:15,760
that's the spot I want to
be in. Endless opportunity ahead of us.

688
00:55:15,800 --> 00:55:20,800
We just got to figure out how
we're going to get that to our

689
00:55:20,800 --> 00:55:27,719
customers as quickly as possible. Yeah, for sure. Cool. So,

690
00:55:28,440 --> 00:55:30,719
anything else you'd like to share with
us about incident response, Jeli,

691
00:55:30,960 --> 00:55:37,039
PagerDuty, any topics at
all. Yeah, if you're not already

692
00:55:37,119 --> 00:55:39,840
using PagerDuty, take a
look. It's the best thing for paging

693
00:55:39,920 --> 00:55:43,519
that I've ever found. And if
you want to give Jeli a try,

694
00:55:43,719 --> 00:55:45,840
there's a free trial on the site
and we start you off with some pre

695
00:55:45,880 --> 00:55:51,280
built learning reviews so you can see
what they look like. Start playing around

696
00:55:51,320 --> 00:55:54,199
in there, and if you have
any questions, you know, I'm sure

697
00:55:54,280 --> 00:55:57,960
you'll be able to find me in
the show notes here. But it's been

698
00:55:58,000 --> 00:56:00,599
really great to meet you, Will,
and thank you so much for the opportunity to

699
00:56:00,679 --> 00:56:04,559
chat. No, thank you,
it's been a great conversation. I've enjoyed

700
00:56:04,599 --> 00:56:07,760
it. And uh, if you're
up for it, I would love to

701
00:56:07,400 --> 00:56:12,360
have you back on the show. That'd
be great. All right, thanks so much.

702
00:56:13,000 --> 00:56:15,800
All right, cool. Well,
thanks for listening, everyone, and I

703
00:56:15,840 --> 00:56:20,000
will see y'all next week.
