WEBVTT

1
00:00:14.480 --> 00:00:17.800
Hey, what's going on, everybody? I am the host of Adventures in

2
00:00:17.839 --> 00:00:22.079
DevOps and wait, yeah, that's
the right channel name. Sorry. You

3
00:00:22.120 --> 00:00:26.480
know sometimes I get that mixed up, like almost every time I think,

4
00:00:26.519 --> 00:00:29.760
so if you're a frequent listener to
the podcast, you know that I mess

5
00:00:29.800 --> 00:00:33.679
that up just about every time.
So thanks for bearing with me on that.

6
00:00:34.439 --> 00:00:38.479
But today I'm not going to mess
this part up because today I have

7
00:00:39.320 --> 00:00:44.520
as our guest Drew Stokes.
He's the senior manager of software engineering for

8
00:00:44.759 --> 00:00:49.240
PagerDuty, and we're talking
about incident management. And that's one of

9
00:00:49.240 --> 00:00:54.479
my favorite topics because it just goes
so deep and it crosses so many disciplines

10
00:00:54.520 --> 00:01:00.960
across your infrastructure and your application teams
and marketing and sales and the executive suite.

11
00:01:02.000 --> 00:01:06.079
Like depending on the level of the
incident, you're just all overboard,

12
00:01:06.560 --> 00:01:08.280
all over the board with this.
So Drew, welcome to the show.

13
00:01:08.319 --> 00:01:11.680
I'm excited to have you here.
Thanks, I'm excited to be here.

14
00:01:11.719 --> 00:01:18.079
It's going to be too well right
on. So tell us a little bit

15
00:01:18.120 --> 00:01:23.799
about how you got into the field of
incident management or incident response. Oh, that's

16
00:01:23.799 --> 00:01:29.079
a good question. Okay, yeah. So I've been in tech for a

17
00:01:29.079 --> 00:01:33.040
while, like most people here,
I think it's been something like sixteen years,

18
00:01:33.840 --> 00:01:38.840
and I think originally I was kind
of trying to figure out my way

19
00:01:38.920 --> 00:01:42.079
helping folks out with technology and networks, and then I got into front end

20
00:01:42.079 --> 00:01:46.719
development and moved into back end and
then dropped into SRE and that's when I

21
00:01:46.799 --> 00:01:52.079
kind of really got familiar with not
just the process of mitigating incidents, but

22
00:01:52.120 --> 00:01:56.000
actually managing them and trying to learn
from them. So I did that for

23
00:01:56.040 --> 00:01:57.400
a while, and then I think
for something like the last eight years,

24
00:01:57.439 --> 00:02:02.719
I've been primarily focused on a people manager
role. And there's a lot of ways

25
00:02:02.760 --> 00:02:07.319
in which you know, people managers
are involved in incident management as well,

26
00:02:07.359 --> 00:02:13.240
both as stakeholders but also you know, facilitators and folks who are playing a

27
00:02:13.240 --> 00:02:16.479
supportive role for people who are responding. So kind of been in that space

28
00:02:17.639 --> 00:02:22.599
for a while now, and back
in I think it was May of twenty

29
00:02:22.639 --> 00:02:27.360
twenty one, I joined a startup
called Jeli, which was founded by Nora

30
00:02:27.479 --> 00:02:30.800
Jones, who's the author of the
Chaos Engineering book and the founder of the

31
00:02:30.879 --> 00:02:35.759
Learning from Incidents community, and that
was kind of where I really dropped into

32
00:02:36.840 --> 00:02:40.240
you know, incident management in general, but specifically this opportunity to kind of

33
00:02:40.280 --> 00:02:45.960
not just resolve incidents, mitigate the
issues, but also to learn from them

34
00:02:45.960 --> 00:02:51.840
in order to improve future response and
organizational performance. So there's a lot of

35
00:02:51.840 --> 00:02:53.680
really interesting ways to think about the
space, and you mentioned at the beginning

36
00:02:53.719 --> 00:02:58.319
it's really important. And part of
the reason is because it's so cross cutting,

37
00:02:58.400 --> 00:03:01.120
right, because incidents are a lens
through which you can see the way

38
00:03:01.120 --> 00:03:06.520
that your organization and your people operate, and that applies to customer service,

39
00:03:06.599 --> 00:03:09.680
that applies to executives and to
the folks actually responding to the incidents.

40
00:03:09.680 --> 00:03:15.240
It's a really interesting space with a
lot of opportunity, which you'll hear

41
00:03:15.319 --> 00:03:19.479
that word a lot in this conversation. We refer to incidents as opportunities.

42
00:03:20.159 --> 00:03:23.080
Oh for sure they are because you
know, one of the things that I

43
00:03:23.080 --> 00:03:25.800
think about a lot is just because
we're in tech. You know, we've

44
00:03:27.120 --> 00:03:31.840
all done the Google search for is
such and such service down because you're having

45
00:03:31.879 --> 00:03:36.319
problems and you're like, did I
do something wrong? Or are they actually

46
00:03:36.360 --> 00:03:38.599
dead in the water right now?
And I think that's like, to me,

47
00:03:38.680 --> 00:03:46.240
that's one of the hallmarks of highlighting
that your incident response plan is really

48
00:03:46.280 --> 00:03:53.879
really well done whenever your customers know
that you're having an incident because you

49
00:03:54.120 --> 00:04:00.400
told them versus them discovering that something
was broken. Yeah, there's a there's

50
00:04:00.439 --> 00:04:04.879
a level of well. So,
so one interesting aspect here is you mentioned

51
00:04:04.919 --> 00:04:10.159
another cross cutting function there, right, which is you have internal stakeholders and

52
00:04:10.280 --> 00:04:14.479
external stakeholders for these types of things. But there's also this layer I think

53
00:04:14.479 --> 00:04:18.240
that you're referring to here of like
operational excellence and observability. Right, do

54
00:04:18.279 --> 00:04:23.000
you know that the system's broken before
someone tells you that the system is broken?

55
00:04:23.720 --> 00:04:28.199
And a lot of the ways in
which you can improve that process is

56
00:04:28.199 --> 00:04:30.560
through the learning process after the incident. Right, So, if you have

57
00:04:30.600 --> 00:04:34.720
an incident, for example, where
a customer reports an issue, looking at

58
00:04:34.759 --> 00:04:40.120
the details of that timeline and what
actually happened can help you figure out where

59
00:04:40.199 --> 00:04:44.800
you need to add additional instrumentation or
alerting, or how to adjust your team's

60
00:04:45.399 --> 00:04:49.439
processes, you know, your software
development life cycle or your release process to

61
00:04:50.040 --> 00:04:56.600
better account for those kind of unpredictable
behaviors in the system. So really interesting,

62
00:04:56.720 --> 00:05:00.600
like complicate, you know, when
you're dealing with not just complex software

63
00:05:00.639 --> 00:05:05.720
systems, but also complex organizations and
groups of people, right, really interesting

64
00:05:05.759 --> 00:05:11.399
opportunities to figure out how do we
kind of iteratively improve our understanding of

65
00:05:11.399 --> 00:05:15.600
the system and our understanding of failure
modes so that we can kind of inspire

66
00:05:15.720 --> 00:05:19.720
customer confidence and trust, right,
letting them know that there's an issue before

67
00:05:20.120 --> 00:05:26.360
before they do. Yeah, for
sure. So early on in

68
00:05:26.360 --> 00:05:31.360
this you mentioned something I want to
highlight, mitigating an incident versus managing an

69
00:05:31.439 --> 00:05:36.360
incident. Can you elaborate on the
difference between those two? Yeah, that's

70
00:05:36.399 --> 00:05:42.720
a that's a great question. So
there are a lot of different aspects of

71
00:05:42.759 --> 00:05:46.199
incident management in general, and I'll
try to like decompose them in a way

72
00:05:46.240 --> 00:05:50.920
that makes sense here. So I
think what you just referenced was detection, right,

73
00:05:51.000 --> 00:05:55.399
so there's a there's a phase there
of like understanding whether or not there's

74
00:05:55.399 --> 00:05:58.360
an incident and trying to do something
about it. And I think when we

75
00:05:58.399 --> 00:06:04.879
talk about managing incidents, what we're
talking about is providing information and coordinating folks

76
00:06:04.920 --> 00:06:11.959
in incident response. Right. Mitigating
an incident is doing something to address the

77
00:06:12.000 --> 00:06:15.839
issue and get the system back to
a stable state or you know, performing

78
00:06:15.839 --> 00:06:20.120
in a way that's expected with regard
to external stakeholders. But I think for

79
00:06:20.279 --> 00:06:27.279
us, managing an incident is really
about investigating what's going on, getting the

80
00:06:27.360 --> 00:06:30.720
necessary folks with the subject matter expertise
into the room to contribute to that,

81
00:06:31.279 --> 00:06:36.199
coordinating that group of people in you
know, large organizations or really complex incidents.

82
00:06:36.199 --> 00:06:41.759
Sometimes you have multiple work streams of
investigation within an incident, and then

83
00:06:41.800 --> 00:06:46.519
communicating status out to stakeholders, your
customer success team, your executives in a

84
00:06:46.560 --> 00:06:50.839
way that allows them to stay informed
but does not have them jump in and

85
00:06:50.879 --> 00:06:54.439
start, you know, trying to
get involved in the process in a way

86
00:06:54.439 --> 00:07:00.439
that can you know, add additional
complexity to the overall incident management. So

87
00:07:00.120 --> 00:07:04.199
from my perspective, I think management
is a lot more about the process of

88
00:07:04.199 --> 00:07:10.839
coordinating and communicating during an incident,
and mitigation is about that moment when you've

89
00:07:10.920 --> 00:07:17.040
kind of identified and addressed the issue
to stop whatever impact is associated with the

90
00:07:17.040 --> 00:07:21.000
incident, Right, that's your signal
to your external stakeholders that we are in

91
00:07:21.000 --> 00:07:25.480
a stable state, we've seen things
are good, and there are various other

92
00:07:25.480 --> 00:07:30.720
steps after that, But for me, that's the primary difference. Yeah,

93
00:07:30.920 --> 00:07:35.079
Yeah, I think that's really important
for someone who's not done a lot of

94
00:07:35.120 --> 00:07:43.800
incident responses to understand that the management
of it is equally important as the mitigating

95
00:07:43.839 --> 00:07:47.720
of it. And in many of
the environments I've worked in, those are

96
00:07:47.720 --> 00:07:56.319
actually two key roles for any incident. You have the first responder who's trying

97
00:07:56.399 --> 00:08:00.240
to find the cause and restore the
service. But then, you know, you also

98
00:08:00.240 --> 00:08:07.600
have your primary communications individual who
is getting the information from that first responder

99
00:08:07.639 --> 00:08:11.000
and relaying it out and doing so
in a way so that everyone feels like

100
00:08:11.079 --> 00:08:18.279
they're in touch with what's going on
and they aren't going around the back door

101
00:08:18.000 --> 00:08:24.000
sending DMs to the first responder to
get status updates. Yeah. Yeah.

102
00:08:24.079 --> 00:08:26.720
One thing we talk a lot about
is kind of this incident management maturity

103
00:08:26.759 --> 00:08:33.840
model, and we think about different
buckets of you know, engineering teams or

104
00:08:33.919 --> 00:08:37.200
organizations with regard to kind of how
they approach this. And I've been in

105
00:08:37.879 --> 00:08:43.360
you know, multiple layers of the
lower maturity model, and sometimes it can

106
00:08:43.399 --> 00:08:46.759
be really difficult, yeah, to
even understand who's doing what and who do

107
00:08:46.840 --> 00:08:48.480
I ask for an update? You
know, I've got a customer who needs

108
00:08:48.480 --> 00:08:52.039
an update now, and we have
an SLA in the contract, what's going

109
00:08:52.080 --> 00:08:56.399
on? It can be really difficult
to even know who's doing that. And

110
00:08:56.440 --> 00:09:00.600
I think you find that in you
know, incident response tooling like Jeli,

111
00:09:00.919 --> 00:09:05.039
those roles are actually codified in the
process. You're assigning an incident commander,

112
00:09:05.440 --> 00:09:09.759
you're assigning a communications lead to try
and take care of that external communication of

113
00:09:09.919 --> 00:09:13.200
here's the person to you know,
connect with if you need an update,

114
00:09:13.559 --> 00:09:18.120
or here's the person responsible for managing
this incident, so that if you join

115
00:09:18.159 --> 00:09:20.039
in, you can say, hey, I'm here and I know about X,

116
00:09:20.200 --> 00:09:24.320
you know, can I help that
sort of thing? Right? And

117
00:09:24.399 --> 00:09:28.320
so that's one of the things that
Jeli does for you. If you need

118
00:09:28.360 --> 00:09:33.399
to improve the maturity
of your incident response plan, using

119
00:09:33.440 --> 00:09:37.519
something like Jeli can kind of help
you say, hey, here are the

120
00:09:39.000 --> 00:09:43.679
here are the people and the processes
you need in place, and provide a

121
00:09:43.720 --> 00:09:50.080
framework, right. Yeah. I
think I think like every small organization goes

122
00:09:50.120 --> 00:09:52.840
through a phase where someone opens a
Google Doc and writes down a runbook

123
00:09:52.840 --> 00:09:56.039
for how to run incidents, right, And so what we wanted to do

124
00:09:56.240 --> 00:10:00.159
is to provide some of that for
you in a way that didn't get in

125
00:10:00.200 --> 00:10:03.240
your way. So we've got a
bot in Slack right that you can use

126
00:10:03.279 --> 00:10:07.960
to declare incidents, assign stakeholders, set stages, communicate status, all

127
00:10:07.960 --> 00:10:11.159
that sort of stuff, so that
you don't really have to go in and

128
00:10:11.240 --> 00:10:16.159
kind of trial and error that Google
doc and try to get folks enrolled in

129
00:10:16.200 --> 00:10:20.679
the process. There's just a thing
kind of nudging you along the way and

130
00:10:20.799 --> 00:10:24.840
helping to offload some of that cognitive
burden. When you're in the middle of

131
00:10:24.919 --> 00:10:28.600
managing an incident, right or typically
as an incident commander, you're thinking about

132
00:10:28.600 --> 00:10:31.519
a lot of things. Sometimes you're
also trying to mitigate the incident. Right

133
00:10:31.559 --> 00:10:35.240
if it's two am, you may
have a stretch of time where you're doing

134
00:10:35.279 --> 00:10:41.279
everything on your own. And so
I think the more folks can find mechanisms

135
00:10:41.480 --> 00:10:48.120
and processes that help them reduce the
number of things they're doing during management so

136
00:10:48.159 --> 00:10:52.720
they can focus on getting the right
folks in the room and finding the means

137
00:10:52.759 --> 00:10:58.600
to mitigation, the more successful the
response process becomes, which results in better

138
00:10:58.679 --> 00:11:03.320
data for your post-incident analysis,
and then your, you know, cross-incident

139
00:11:03.320 --> 00:11:07.519
learning over time. Yeah,
it's one of those things that like we've

140
00:11:09.480 --> 00:11:15.840
we've all done incident response wrong enough
times that we kind of

141
00:11:15.919 --> 00:11:20.240
know, So it's I think it's
one of those things like you know,

142
00:11:20.279 --> 00:11:24.679
like in software engineering, like writing
logs has been done for decades now,

143
00:11:24.759 --> 00:11:28.960
so you don't write your own logging
engine. You just pull in a logging

144
00:11:30.039 --> 00:11:33.600
library because you don't need to reinvent
that wheel. And I think incident response

145
00:11:33.679 --> 00:11:37.879
is one of those we don't need
to reinvent this wheel. We can just buy

146
00:11:37.919 --> 00:11:41.240
a wheel that's already built. Yeah, we actually have a couple

147
00:11:41.279 --> 00:11:46.639
of customers of Jeli who are trying
to replace their wheels, right because you

148
00:11:46.679 --> 00:11:50.600
know, some of some of the
large organizations who started this process ten years

149
00:11:50.600 --> 00:11:54.000
ago had to make their own. I
used to work at New Relic and we

150
00:11:54.080 --> 00:12:00.360
had a Slack bot we called nerd
bot, which was our incident response you

151
00:12:00.440 --> 00:12:03.639
know, facilitation tool. But there's
a cost associated with those things, right,

152
00:12:03.679 --> 00:12:07.559
You have to maintain them over time. Oftentimes they kind of fall to

153
00:12:07.600 --> 00:12:11.360
the bottom of the priority stack,
and so iterating on your internal process becomes

154
00:12:11.399 --> 00:12:15.120
really hard. And I think that's
where if you go with something you know,

155
00:12:15.320 --> 00:12:18.200
like Jeli's incident response bot,
which is you know, fairly opinionated

156
00:12:18.200 --> 00:12:22.600
but narrow in scope, right,
it's just here are the set of criteria

157
00:12:22.639 --> 00:12:28.799
that we use for this thing.
With some customizable features like automation, then

158
00:12:28.840 --> 00:12:33.840
you don't have to kind of invent
that wheel and then reinvent it iteratively for

159
00:12:33.919 --> 00:12:37.200
all time. And you also don't
really have to, you know, answer

160
00:12:37.240 --> 00:12:41.639
a lot of those questions when your
incidents become more complex. There's like different

161
00:12:41.679 --> 00:12:46.039
phases of your incident response process.
When you're a five person team, you

162
00:12:46.120 --> 00:12:48.559
jump in a Zoom call, right, and you fix it. When you're

163
00:12:48.639 --> 00:12:54.120
fifty people in a major incident room, it's a very different experience and requires

164
00:12:54.159 --> 00:13:00.679
a different set of skills and supporting
tooling. So, yeah, cool,

165
00:13:00.799 --> 00:13:07.559
you mentioned a couple of times the
post incident response plan, so elaborate on

166
00:13:07.600 --> 00:13:11.480
that a little bit for me.
Yeah, this is another area where I

167
00:13:11.519 --> 00:13:16.799
think everyone kind of starts with a
recognition that there's more that can be gleaned

168
00:13:16.840 --> 00:13:20.759
from these experiences. Right early on, you have an incident, you respond

169
00:13:20.799 --> 00:13:22.159
to it, you fix it,
maybe you shoot an email off to folks

170
00:13:22.200 --> 00:13:24.639
saying what happened, and you know, here's what we're going to do to

171
00:13:24.639 --> 00:13:31.039
address it in the future. But as
your system complexity grows and as your organization

172
00:13:31.159 --> 00:13:35.120
grows, there are you know,
many more opportunities to figure out how to

173
00:13:35.320 --> 00:13:41.720
change not just the system itself right
to you know, write better logs or

174
00:13:41.840 --> 00:13:46.080
increase visibility into the system's behavior,
but also to change how the organization is

175
00:13:46.080 --> 00:13:52.639
structured around those systems. Right.
So, one anecdote I like to share

176
00:13:52.799 --> 00:13:56.480
is at my time in a previous
company, we had this custom feature flag

177
00:13:56.559 --> 00:14:01.080
system that had been around for I
don't know, it was like eight or

178
00:14:01.159 --> 00:14:03.440
nine years or something. Everybody wanted
to get off of it. It wasn't

179
00:14:03.480 --> 00:14:07.360
great, and every time there was
an incident with that system, someone from

180
00:14:07.399 --> 00:14:11.240
the network engineering team would be pulled
in because they were one of the original

181
00:14:11.279 --> 00:14:15.080
authors. They had nothing to do
with this system anymore, but no one

182
00:14:15.120 --> 00:14:20.039
else knew how it worked. And
so if you're just responding to and mitigating

183
00:14:20.080 --> 00:14:24.120
incidents and not looking any further,
you don't see those types of organizational misalignment

184
00:14:24.240 --> 00:14:28.960
right where you've got a primary owner
or subject matter expert that is, you

185
00:14:30.000 --> 00:14:31.840
know, accountable for a whole slew
of things that have nothing to do with

186
00:14:31.879 --> 00:14:37.440
this foundational service that's critical for business
function. If you've got a feature flag

187
00:14:37.480 --> 00:14:41.840
system in you know, a fourteen
year old codebase, it's got to work.

188
00:14:43.120 --> 00:14:48.159
So I think when we talk about
post incident learning, this is this

189
00:14:48.240 --> 00:14:52.080
is the next phase in maturity.
Right, you figured out your response process,

190
00:14:52.200 --> 00:14:54.039
you know how to get the right
folks in the room, you know

191
00:14:54.080 --> 00:14:58.240
how to move toward mitigation, and
you're starting to capture some of the you

192
00:14:58.279 --> 00:15:01.200
know, follow ups that you want
to take. Maybe we need more observability.

193
00:15:01.399 --> 00:15:05.159
Maybe this library in our service is out
of date, and if we update it

194
00:15:05.240 --> 00:15:09.960
we'll get better performance, like that. But it goes beyond some of those

195
00:15:11.000 --> 00:15:13.879
follow ups, and as you start
to cultivate a process around this, and

196
00:15:13.919 --> 00:15:16.799
there's a lot of different ways that
folks do this. You know, they're referred

197
00:15:16.879 --> 00:15:22.240
to as postmortems or learning
reviews, or you know, sometimes you're

198
00:15:22.279 --> 00:15:26.519
just getting in a room and talking
about the incident without the structure, you

199
00:15:26.559 --> 00:15:31.120
start to uncover all of these really
interesting aspects of not only the responding team,

200
00:15:31.159 --> 00:15:33.639
but the organization overall. And so
some of the things that we're most

201
00:15:33.720 --> 00:15:39.200
interested in learning is, you know, what did folks know when they responded

202
00:15:39.240 --> 00:15:41.879
to the incident and what did they
not know? Right? What are the

203
00:15:41.919 --> 00:15:50.360
ways in which the folks involved communicated
successfully and maybe not so much? How

204
00:15:50.399 --> 00:15:56.840
did the organization's processes contribute to or
prevent aspects of a specific incident. It's

205
00:15:56.879 --> 00:16:00.519
all kinds of interesting stuff to dig
into, and you can look at it

206
00:16:00.559 --> 00:16:04.200
from a bunch of different angles.
So we have, you know, a

207
00:16:04.200 --> 00:16:10.360
lot of examples of our customers creating
multiple investigations on an incident where a person

208
00:16:10.399 --> 00:16:14.480
A and person B both investigate and
then you see like where the differences are,

209
00:16:15.279 --> 00:16:18.600
and I think that turns up a
lot of interesting stuff. We've taken

210
00:16:18.799 --> 00:16:23.000
the approach in Jeli of writing incident
narratives, so you know, post learning,

211
00:16:23.039 --> 00:16:26.799
review, post mortem, whatever you
want to call it. Our feeling

212
00:16:26.919 --> 00:16:33.120
is that incidents are stories and the
way that people connect with information and learn

213
00:16:33.240 --> 00:16:36.679
is through storytelling. And so we've
taken the approach that, you know,

214
00:16:37.000 --> 00:16:41.519
we want to provide folks with a
tool to tell a story backed by evidence,

215
00:16:41.600 --> 00:16:45.080
right, what was actually said during
the incident, what you know,

216
00:16:45.519 --> 00:16:48.519
metrics or data we were looking at, but to kind of nudge folks in

217
00:16:48.559 --> 00:16:55.679
the direction of sharing their perspective and
their assertions about what it means. Right,

218
00:16:55.840 --> 00:16:59.639
when these two folks were talking,
they were talking about different aspects of

219
00:16:59.679 --> 00:17:02.720
the system, and they didn't realize
it. What does that mean, right,

220
00:17:02.720 --> 00:17:08.079
what's the opportunity there to improve the
incident management and the way that these teams

221
00:17:08.079 --> 00:17:14.599
are connected and communicating those sorts of
things. Yeah, you see that a

222
00:17:14.599 --> 00:17:18.240
lot whenever you have people with different
disciplines or different backgrounds, you know,

223
00:17:18.279 --> 00:17:25.519
a networking background versus a software engineering
background. And I think that highlights one

224
00:17:25.599 --> 00:17:33.160
of the one of the arts of
post incident response is creating those follow up

225
00:17:33.200 --> 00:17:41.000
items and getting the right people
engaged to recognize, prioritize, and address

226
00:17:41.160 --> 00:17:45.200
the things that you learned from that
incident. Yeah, and you know that

227
00:17:45.400 --> 00:17:51.119
you mentioned like different disciplines. There
are different disciplines within the responding team,

228
00:17:51.200 --> 00:17:55.839
but incidents also provide this
really unique opportunity to consider the different

229
00:17:55.839 --> 00:17:59.920
disciplines across an organization. Right,
So for your major incidents, it's not

230
00:18:00.160 --> 00:18:03.720
just your you know, senior engineers
from a specific team. It's also your

231
00:18:03.759 --> 00:18:08.480
customer support folks on critical accounts. It's also your group leads and your

232
00:18:08.480 --> 00:18:15.960
executives. All of these people have
different priorities and perspectives and understanding with regard

233
00:18:17.079 --> 00:18:19.319
to the impacted systems and the impact
on the business. Right, if I'm

234
00:18:19.359 --> 00:18:22.880
responding to an incident, my goal
is to make the chart go down,

235
00:18:23.440 --> 00:18:30.440
whereas my executive or salespeople's goal is
to minimize the costs associated with customer impact.

236
00:18:30.559 --> 00:18:33.319
Right, we've got SLAs with our
customers for uptime, and we need

237
00:18:33.359 --> 00:18:38.279
to keep that in line. And
I think the different perspectives and priorities there

238
00:18:38.359 --> 00:18:42.599
result in that same kind of differing
perspective that I mentioned earlier, where I

239
00:18:42.680 --> 00:18:47.480
may look at an incident and think
it means one thing, but my group

240
00:18:47.599 --> 00:18:51.960
lead or you know, my sales
associate may look at it and think another

241
00:18:52.039 --> 00:18:59.079
thing. And that opportunity with you
know, incident narratives or post incident learning

242
00:18:59.160 --> 00:19:03.640
is to try and bridge that divide
between those different perspectives and help everyone cultivate

243
00:19:03.640 --> 00:19:07.160
a shared understanding of what it means
across those dimensions. Right, this is

244
00:19:07.200 --> 00:19:14.119
what this incident meant for business impact
and process, for customer satisfaction, and

245
00:19:14.359 --> 00:19:18.440
for the you know, sustainability of
our you know, critical services something like

246
00:19:18.440 --> 00:19:25.240
that. Yeah. I've even worked
in organizations where it involved the marketing team

247
00:19:25.359 --> 00:19:30.880
because they were out scrolling Twitter,
you know, catching tweets going on about

248
00:19:30.880 --> 00:19:34.599
the incident and responding to those and trying
to minimize the blast radius

249
00:19:34.640 --> 00:19:38.920
there. Yeah, this is a
whole other aspect that's really interesting, which

250
00:19:38.960 --> 00:19:42.960
is like where do incidents come from? Right? Who says what an incident

251
00:19:44.160 --> 00:19:48.240
is? We've taken the approach that
anyone can declare an incident. Some organizations

252
00:19:48.240 --> 00:19:52.359
we've worked with are very narrow in
terms of who can declare them. But

253
00:19:52.359 --> 00:19:56.160
yeah, customer success, marketing, you
know, random person from the internet.

254
00:19:56.279 --> 00:20:02.519
Those are all sources of potential incidents, you know, automation and observability,

255
00:20:02.519 --> 00:20:06.559
those sorts of things, and so
it's you know, the the once you

256
00:20:06.599 --> 00:20:11.440
start thinking about this space and you
start exploring ways of benefiting from these lenses

257
00:20:11.759 --> 00:20:18.720
on current state of systems and organizational
process, you start to see like there

258
00:20:18.759 --> 00:20:25.359
are opportunities everywhere. Right? At Jeli
internally, we create incidents for things that

259
00:20:25.400 --> 00:20:29.079
are not incidents. If we have
a release going out that we think might

260
00:20:29.119 --> 00:20:33.839
be you know, impactful to customers
because it changes some aspect of the user

261
00:20:33.880 --> 00:20:37.599
experience, that's an incident. If
we're trying to better understand database failover in

262
00:20:37.799 --> 00:20:41.799
RDS, for example, we run
a game day as an incident, and

263
00:20:42.200 --> 00:20:47.480
doing that gives you this repository of
information that you can use again to build

264
00:20:47.519 --> 00:20:51.519
that narrative and make those assertions about
where are we and where do we want

265
00:20:51.519 --> 00:20:55.759
to be with regard to how we're
operating and the health and stability of our

266
00:20:56.359 --> 00:21:00.279
systems. So that's a really interesting
anecdote about marketing. I love when those

267
00:21:00.319 --> 00:21:04.240
things come in from places you don't
expect, right, You just kind of

268
00:21:04.279 --> 00:21:07.519
get a message from someone that you
haven't met before and they're like, hey,

269
00:21:07.559 --> 00:21:11.599
there's something going on. Yeah, we'd better
declare. Yeah, yeah, you see

270
00:21:12.039 --> 00:21:17.839
someone from marketing enter in one of
the tech Slack channels and that this is

271
00:21:17.880 --> 00:21:23.319
not going to go well. So
I think one of the cool one of

272
00:21:23.319 --> 00:21:30.839
the cool types of companies I like
to work with fit the model of Jeli

273
00:21:30.920 --> 00:21:34.920
because you actually use your own product, you know, like when you build

274
00:21:34.960 --> 00:21:40.799
and release it, your team actually
uses it to manage your own incidents.

275
00:21:40.839 --> 00:21:45.519
And I think that is really really
cool because you get firsthand experience of what

276
00:21:45.559 --> 00:21:49.920
it's like to be your own customer, and you can understand what your customers

277
00:21:49.960 --> 00:21:56.240
are actually seeing when they're trying to
use your tool. Yeah, one thing

278
00:21:56.240 --> 00:22:03.359
that was really interesting in the
early days about working with our customers.

279
00:22:03.400 --> 00:22:06.519
It's interesting now as well. We'll
have to talk about PagerDuty at some

280
00:22:06.519 --> 00:22:10.519
point later. But one thing that
was really interesting is that the customers that

281
00:22:10.559 --> 00:22:14.519
we work with are really passionate about
their process and those opportunities to learn,

282
00:22:14.559 --> 00:22:18.839
and so we get to work really
closely with them on you know, understanding

283
00:22:18.920 --> 00:22:22.079
their process and building tooling that works
for them. We work with F5

284
00:22:22.200 --> 00:22:26.960
and Indeed and Honeycomb and Zendesk.
These are like, you know, large

285
00:22:26.079 --> 00:22:30.920
influential organizations who are kind of at
the cutting edge of this process. So

286
00:22:32.640 --> 00:22:37.759
there's this bi directional information share where
you know, we can build features that

287
00:22:37.799 --> 00:22:41.039
support those organizations' processes, but then
we can also adopt some of those organizational

288
00:22:41.039 --> 00:22:45.119
processes because they make a lot of
sense and they work well for us.

289
00:22:45.880 --> 00:22:51.759
I was we were doing a product
demo for an important group of people the

290
00:22:51.799 --> 00:22:55.519
other day and we noticed some lag
in one of our features and I actually

291
00:22:55.559 --> 00:23:00.160
declared an incident with Jeli about the
performance of the incident response tool, Jeli,

292
00:23:00.720 --> 00:23:03.759
and we ran that in parallel during
the demo, and it was there

293
00:23:03.799 --> 00:23:08.559
was this moment where I was just
like, this is so cool running an

294
00:23:08.559 --> 00:23:11.880
incident with the tool that we're demoing
to people, and there wasn't actually an

295
00:23:11.880 --> 00:23:15.880
issue. It was a Wi-Fi
lag, you know, thing. So everything was

296
00:23:15.920 --> 00:23:19.880
good and that's okay. That's also
a learning opportunity. But yeah, it's

297
00:23:19.880 --> 00:23:26.519
been really exciting to kind of watch
things evolve over time and be a you

298
00:23:26.559 --> 00:23:30.440
know, benefactor of that system as
well as trying to evolve it for our

299
00:23:30.480 --> 00:23:36.759
customers and find that alignment across
orgs, which is really unique. Most

300
00:23:36.759 --> 00:23:41.200
of your incident response and post incident
learning is within an org. We've had

301
00:23:41.200 --> 00:23:48.559
the unique opportunity to kind of extend
that outward. So fun. Right on.

302
00:23:48.559 --> 00:23:52.119
One of the things I'm interested to
get your opinion on is I over the

303
00:23:52.200 --> 00:23:59.279
years, I've developed the opinion that
there's a difference between mitigating the issue and

304
00:23:59.400 --> 00:24:02.720
resolving the issue. And I refer to
that in terms of, like, during

305
00:24:02.759 --> 00:24:07.279
the incident, you know, you
have, you know, let's say your API

306
00:24:07.359 --> 00:24:15.279
service is slow, it's okay during
the incident to throw more servers at it.

307
00:24:15.839 --> 00:24:19.039
You know, we're going to we're
going to mitigate the issue by adding

308
00:24:19.079 --> 00:24:23.079
more servers or adding more memory,
or do something to make the symptoms of

309
00:24:23.119 --> 00:24:29.319
the problem go away. But then
there's this like defining moment of okay,

310
00:24:30.079 --> 00:24:34.599
customer impact has been resolved, but
now we've got to go back and find

311
00:24:34.839 --> 00:24:40.599
the root cause because adding the additional
servers did not fix the issue; that fixed

312
00:24:40.599 --> 00:24:45.039
the symptoms. And I'm interested to
get your opinion on that. Yeah,

313
00:24:45.119 --> 00:24:48.000
it's a really good distinction that you're
making there, and I think it has

314
00:24:48.039 --> 00:24:52.119
a lot to do with prioritization and
understanding. Right. So oftentimes, especially

315
00:24:52.119 --> 00:24:57.680
in major incidents, there's a priority
involved there to minimize customer impact, right,

316
00:24:57.680 --> 00:25:03.920
because customer impact means lost revenue.
Incidents are expensive both in terms of

317
00:25:03.960 --> 00:25:07.519
time and you know, customer satisfaction
and trust. And so I think there

318
00:25:07.519 --> 00:25:11.960
are kind of two ways in my
experience that you mitigate before resolution. And

319
00:25:12.000 --> 00:25:18.200
the first I'm mentioning now is about
minimizing the impact in favor of kind of

320
00:25:18.240 --> 00:25:22.480
getting things back on track. And
so, like you said, throw some

321
00:25:22.519 --> 00:25:27.039
additional servers at the API and that'll
address the symptom, but we still don't

322
00:25:27.119 --> 00:25:30.880
understand what's going on under the hood, right? And so I think the

323
00:25:30.920 --> 00:25:36.400
second reason, sometimes you can choose
not to mitigate an issue. I've been

324
00:25:36.440 --> 00:25:41.720
in situations where we've had customer impact, but the priority of understanding what's going

325
00:25:41.759 --> 00:25:45.160
on has exceeded the priority of needing
to address that impact, maybe because it's

326
00:25:45.200 --> 00:25:48.119
like, you know, one user
at a customer rather than all customers in

327
00:25:48.160 --> 00:25:52.119
a major incident. And so that
second bit I think is really interesting because

328
00:25:52.240 --> 00:25:59.920
you can use the incident and
the levers you can pull during the incident

329
00:26:00.039 --> 00:26:03.119
to create the conditions for learning while
it's happening. Right, So if you

330
00:26:03.160 --> 00:26:06.920
mitigate the incident with the API,
it means that you have an opportunity to

331
00:26:06.920 --> 00:26:10.480
explore what was actually going on.
Maybe you isolate one of those servers and

332
00:26:10.519 --> 00:26:15.680
you start to dig into you know
which function calls. If you've got distributed

333
00:26:15.720 --> 00:26:19.599
tracing, which is amazing, you
know which specific function or endpoint is causing

334
00:26:19.680 --> 00:26:25.200
delay in the response, right,
that's causing a delay across all responses,

335
00:26:26.039 --> 00:26:30.559
And you can kind of take advantage
of that system state, which you know,

336
00:26:30.599 --> 00:26:33.000
if you reboot the servers, if
you add a ton of them,

337
00:26:33.240 --> 00:26:37.160
those conditions go away and you lose
your opportunity to understand what's going on.

338
00:26:37.200 --> 00:26:41.119
And so there's a lot of different
ways to look at it. I think

339
00:26:41.200 --> 00:26:47.039
mitigation and resolution for folks outside of
incident response, that's a mental framework for

340
00:26:47.160 --> 00:26:51.480
understanding are we good now and are
we good for the long term? Right?

341
00:26:52.000 --> 00:26:56.400
But as a responder, those two
events are really key in terms of

342
00:26:56.480 --> 00:27:02.000
communicating within the response group what our
level of understanding is and what priority decisions we're

343
00:27:02.039 --> 00:27:07.599
making with regard to customer impact or
you know, system stability or what have

344
00:27:07.720 --> 00:27:14.400
you. Sometimes incidents are not resolved
for days after you know the actual incident.

345
00:27:14.519 --> 00:27:18.799
Especially for large, complex
incidents. Sometimes you just have to

346
00:27:18.839 --> 00:27:22.640
get things to a steady state and
let them stay there until you have a chance

347
00:27:22.680 --> 00:27:26.319
to enroll more folks or get a
deeper understanding of what's going on. And

348
00:27:26.400 --> 00:27:29.359
sometimes those fixes are not things you
can roll out, you know, as

349
00:27:29.720 --> 00:27:33.559
one hot fix. Sometimes they are
major upgrades or major changes to kind of

350
00:27:33.599 --> 00:27:40.039
foundational business logic. So I'm glad
you made that distinction because they're they're really

351
00:27:40.079 --> 00:27:42.319
important, and I think oftentimes folks
outside of the incident are just like,

352
00:27:42.359 --> 00:27:47.559
when are we mitigated? When are we
mitigated? But you can't you can't lose

353
00:27:47.599 --> 00:27:51.759
sight of that, that time frame
between mitigation and resolution, because that's where

354
00:27:51.799 --> 00:28:02.160
a lot of the you know,
exploratory understanding comes out for sure. And

355
00:28:02.440 --> 00:28:08.240
one of the things that I try
to insist on is that mitigating the issue,

356
00:28:08.599 --> 00:28:12.880
we're allowed to make live changes in
production, but the actual root cause

357
00:28:12.960 --> 00:28:18.440
fix has to go through our normal
development cycle of making the changes in DEV,

358
00:28:18.880 --> 00:28:22.799
pushing the changes to a staging environment, validating them, and then promoting

359
00:28:22.799 --> 00:28:27.160
those changes to production. So it
has to follow that flow. Yeah,

360
00:28:27.240 --> 00:28:32.839
and that's that goes back to that
prioritization opportunity. Right. So once you've

361
00:28:32.920 --> 00:28:37.039
kind of addressed the business impacting issue, then you've got to get back to

362
00:28:37.079 --> 00:28:40.960
your fundamentals, right, and your
business processes and compliance and all of that.

363
00:28:41.519 --> 00:28:48.279
And so detangling those two things allows
you to respond in a way that

364
00:28:48.319 --> 00:28:52.079
helps the business, and then address
the issue in a way that helps the

365
00:28:52.119 --> 00:28:56.079
business, and do those in different
ways, because especially when you're when you're

366
00:28:56.079 --> 00:28:59.119
further along in your maturity model,
when you're a large organization, there's a

367
00:28:59.160 --> 00:29:03.480
lot of things that can stand
in the way of quickly addressing an issue.

368
00:29:03.559 --> 00:29:07.559
Right. If you don't create a
path for doing that, then incidents

369
00:29:07.599 --> 00:29:11.640
end up taking longer and having a
lot more impact. So yeah, and

370
00:29:12.200 --> 00:29:15.559
the other thing we've learned in all
of this is that every organization is different,

371
00:29:15.680 --> 00:29:22.079
Right. Some organizations have response processes
that specifically call out different ways of

372
00:29:22.160 --> 00:29:29.160
mitigating impacting issues and different ways of
capturing follow ups for those. Right,

373
00:29:29.240 --> 00:29:33.279
Sometimes the incident's not closed until you've
resolved it, and sometimes it's closed at

374
00:29:33.279 --> 00:29:37.160
the point that it's mitigated and you've
captured the follow ups you want to take

375
00:29:37.160 --> 00:29:41.680
action on. You know, as
a result, sometimes folks keep talking about

376
00:29:41.680 --> 00:29:44.720
the incident after it's been closed and
they want all of that for their post

377
00:29:44.799 --> 00:29:51.039
incident learning review as well. There's
just so many different ways to tailor this

378
00:29:51.480 --> 00:29:56.759
whole incident management process to help an
organization be more successful. Yeah, one

379
00:29:56.799 --> 00:30:00.920
of the places I worked years ago
was at a healthcare provider, and

380
00:30:00.960 --> 00:30:10.079
we did we provided medical services for
hospitals across the US for trauma patients,

381
00:30:10.599 --> 00:30:15.319
and so every incident that we had, whenever we broke out an incident room,

382
00:30:15.400 --> 00:30:18.680
we actually had a person from our
quality team who would join the call

383
00:30:18.720 --> 00:30:22.519
as well and let us know,
like every five or ten minutes, how

384
00:30:22.519 --> 00:30:29.359
many patients across the United States couldn't
receive life saving healthcare because our stuff was

385
00:30:29.359 --> 00:30:34.119
broken. And so we had a
very unique incident response model there that doesn't

386
00:30:34.160 --> 00:30:37.799
really apply anywhere I've been since then, but there were still lessons that I've

387
00:30:37.839 --> 00:30:42.319
taken away from that, you know, number one is mitigate the issue as

388
00:30:42.400 --> 00:30:48.799
quickly as possible. Right. I'm so interested
to hear, how did that information help

389
00:30:48.920 --> 00:30:59.160
or hinder mitigation for your teams.
It really set the priority and kept us

390
00:30:59.200 --> 00:31:03.920
focused, you know, because as
that number went up, you started to

391
00:31:03.000 --> 00:31:08.079
understand, you know, the impact
that this was having. And this was

392
00:31:08.119 --> 00:31:15.359
not an "other development team sucks" or
"their network is terrible." And in many,

393
00:31:15.599 --> 00:31:22.200
many of our incidents it was because
of user error at one of the trauma

394
00:31:22.279 --> 00:31:26.480
centers. But it's still not okay
to say, oh, well, they're

395
00:31:26.559 --> 00:31:29.119
just doing it wrong, because you
have to realize at the same time,

396
00:31:29.720 --> 00:31:33.480
you know, while you're on the
phone with that person, they're up on

397
00:31:33.400 --> 00:31:37.680
a table in the emergency room doing
chest compressions on this patient. So they're

398
00:31:37.680 --> 00:31:41.559
going to give it their best shot, but they may not be the most

399
00:31:41.559 --> 00:31:45.240
attentive user at that time, and
you just got to work with that.

400
00:31:47.079 --> 00:31:52.359
Yeah, you're highlighting, like, a
perfect example of I think why we are

401
00:31:52.440 --> 00:31:57.359
so focused on post incident learning,
and it's because the most important aspect of

402
00:31:57.400 --> 00:32:02.599
these complex technical systems that we're all
building and maintaining are the people involved,

403
00:32:02.720 --> 00:32:07.559
Right, and when you're in an
incident response room, a major incident room,

404
00:32:07.599 --> 00:32:10.279
whatever, and you've got someone reminding
you of the impact, especially when

405
00:32:10.279 --> 00:32:15.000
that impact is you know, not
just on dollars, but also on people's

406
00:32:15.440 --> 00:32:24.200
lives. You create the conditions for
this like profound human creativity, right in

407
00:32:24.640 --> 00:32:29.160
terms of figuring out, you know, what can we do as a team

408
00:32:29.319 --> 00:32:32.039
to kind of we're back to the
incident management space, what can we do

409
00:32:32.039 --> 00:32:35.839
as a team to kind of come
up with a creative solution here and get

410
00:32:35.880 --> 00:32:38.359
us back to good, you know, temporarily. And I think if you're

411
00:32:38.400 --> 00:32:45.440
not reflecting on and talking about those
moments in incident response in your, you know,

412
00:32:45.759 --> 00:32:49.839
post-incident learning review or narrative
review, whatever you call it,

413
00:32:50.480 --> 00:32:52.920
then you're missing out on all of
those examples of the ways in which the

414
00:32:53.079 --> 00:32:58.039
people are helping support the system and
keep things moving. You know, we

415
00:32:58.400 --> 00:33:04.359
hear a lot in tech and DevOps
and elsewhere that like automation is the key

416
00:33:04.559 --> 00:33:09.799
to sustainability and more reliable systems.
And there are things that we can automate,

417
00:33:10.039 --> 00:33:15.400
you know, especially assigning roles during
incident management and response. But there's

418
00:33:15.440 --> 00:33:21.440
a lot of you know, human
involvement tweaking the system and adding you know

419
00:33:21.559 --> 00:33:25.920
capacity, not you know, technical
capacity in terms of number of network requests

420
00:33:25.960 --> 00:33:30.880
you can handle, things like that,
but adding capacity in terms of the

421
00:33:30.880 --> 00:33:36.359
system's adaptability. And I just like, I would love to be a fly

422
00:33:36.480 --> 00:33:38.480
on the wall for one of those
incidents that you mentioned, because I imagine

423
00:33:38.519 --> 00:33:44.000
folks really came together and came up
with some creative solutions to find a way

424
00:33:44.000 --> 00:33:47.440
to mitigate those incidents and get things
back to good so that they could figure

425
00:33:47.440 --> 00:33:50.799
out, you know, what the
long term solutions were. That's such an

426
00:33:50.799 --> 00:33:55.880
exciting, like, space. Yeah, for
sure, and it's you know, it

427
00:33:55.920 --> 00:34:02.200
was a majority of the role was
communication. Like, all of my

428
00:34:02.319 --> 00:34:09.360
coworkers there had exceptional technical skills,
but their communication skills were just a plus

429
00:34:09.440 --> 00:34:14.400
one on top of that. And
I think that's what made it work so

430
00:34:14.559 --> 00:34:16.559
well. And I still say that
to this day. You know, DevOps

431
00:34:17.519 --> 00:34:22.119
is not a technical world. There's
a technical component, but it really

432
00:34:22.199 --> 00:34:29.280
is communications in building the technical framework, but then communicating that out to your

433
00:34:29.320 --> 00:34:32.360
customers, the engineers that you support, and getting the feedback from them to

434
00:34:32.480 --> 00:34:37.840
understand what's the difference between what I
built and what they thought I built.

435
00:34:38.519 --> 00:34:43.119
Yeah, it's really great when
you have those folks who kind of know

436
00:34:43.280 --> 00:34:46.800
how to be in a critical situation
and maintain you know, effective communication and

437
00:34:46.840 --> 00:34:52.360
find a solution to the issue.
One thing we talk a lot about is

438
00:34:52.360 --> 00:34:54.880
like how do you scale that,
how do you externalize those

439
00:34:54.920 --> 00:35:00.119
skills? Oftentimes we find that the
folks who are most effective in incidents

440
00:35:00.239 --> 00:35:04.280
don't have the capacity or time to
help up level or train folks into that

441
00:35:04.920 --> 00:35:07.119
discipline. Right. It kind of
requires a lot of different skills. You

442
00:35:07.159 --> 00:35:12.119
need technical expertise, you need
experience with the systems involved, and you

443
00:35:12.159 --> 00:35:17.920
need a good handle on like effective
communication, not just for communicating the status

444
00:35:17.960 --> 00:35:22.119
of the incident, but also communicating
with the folks that you are directing if

445
00:35:22.159 --> 00:35:27.559
you're in an incident commander role for
example. There's another area where if you

446
00:35:27.599 --> 00:35:30.079
invest in learning from these things,
you can create artifacts that folks pick up

447
00:35:30.079 --> 00:35:36.000
when they join the organization. Right?
In almost every large org I've been in,

448
00:35:36.079 --> 00:35:42.960
there's a confluence space or Google drive
folder or something full of post-incident reviews.

449
00:35:43.440 --> 00:35:45.920
Sometimes I'll just go in and read
those, right, and you start

450
00:35:45.960 --> 00:35:50.840
to learn, you know, who
are the folks who demonstrate an ability to

451
00:35:50.920 --> 00:35:53.320
kind of respond to some of the
most significant incidents and what are they doing,

452
00:35:53.920 --> 00:35:59.800
how are they doing that right?
What skills or actions have they taken

453
00:35:59.840 --> 00:36:02.960
that stood out in the learning
review that I should try and cultivate as a

454
00:36:04.000 --> 00:36:07.920
responder? And so that that can
be a really interesting space too, is

455
00:36:08.440 --> 00:36:13.119
not just learning about the system and
what things we can change to improve performance

456
00:36:13.159 --> 00:36:16.079
at the time, but how
are we leaving breadcrumbs for the new folks

457
00:36:16.119 --> 00:36:21.559
coming into the org who are growing
into that discipline, because trial by fire

458
00:36:21.840 --> 00:36:25.639
during a major incident can be a
really stressful, kind of terrifying experience,

459
00:36:27.119 --> 00:36:30.320
and so the more you can kind
of give, you know, these these

460
00:36:30.480 --> 00:36:37.639
anecdotal or story based accounts of how
things go in your organization, the more comfortable

461
00:36:37.679 --> 00:36:42.320
folks can feel when they step into
that role. Yeah, I think it's

462
00:36:42.320 --> 00:36:47.679
one of those areas where there's like
a mentoring path there. And as I

463
00:36:47.719 --> 00:36:52.599
have gotten older and been doing this
for a while, I've realized that that's

464
00:36:52.159 --> 00:36:58.719
a larger part of my job,
sharing that context, because you can

465
00:36:58.760 --> 00:37:04.639
put the documentation, but then there's
also like the unspoken or the unwritten part

466
00:37:04.679 --> 00:37:07.880
of that. You know, there's
the mood, the feel, the context of

467
00:37:07.920 --> 00:37:14.000
the situation. And I think that's
been a problem for you know, far

468
00:37:14.079 --> 00:37:17.159
beyond my lifetime, and the only
way we've been successful at solving it now

469
00:37:17.639 --> 00:37:22.719
up to this point is just through
that mentoring type role where you bring people

470
00:37:22.760 --> 00:37:28.000
in even though you know that they
aren't ready to be the lead in this,

471
00:37:28.440 --> 00:37:32.280
you bring them in just so that
they can witness it and start

472
00:37:32.320 --> 00:37:38.079
making notes for themselves. Yeah,
and that's where a process or a policy

473
00:37:38.159 --> 00:37:44.920
around incident response and incident learning that
is based on transparency can be really helpful.

474
00:37:45.480 --> 00:37:49.719
Right, Sometimes you get a lot
of folks joining the major incident room

475
00:37:49.760 --> 00:37:53.239
that are trying to contribute in ways
that may not actually you know, help

476
00:37:53.360 --> 00:37:58.519
with mitigation. But a lot of
times we find in large organizations that have

477
00:37:59.199 --> 00:38:02.599
you know, policies angled toward transparency, folks just join to kind of understand

478
00:38:02.639 --> 00:38:07.840
and learn in the moment and also
after the fact. So, you know,

479
00:38:07.920 --> 00:38:13.880
the incident learning review calendar is
always a place that I go and

480
00:38:13.920 --> 00:38:16.159
try to figure out, you know, which of these incidents are going

481
00:38:16.199 --> 00:38:21.079
to be most helpful for me understanding
the way this organization operates and the critical

482
00:38:21.119 --> 00:38:24.360
systems, right? In a past
role we had a Kafka platform that was

483
00:38:25.039 --> 00:38:29.280
you know, involved in a lot
of incidents, not because the Kafka platform

484
00:38:29.360 --> 00:38:31.039
was a problem, but because everything
was built around it, right, So

485
00:38:31.079 --> 00:38:35.280
every time there was an issue with
any system, it kind of tied back

486
00:38:35.320 --> 00:38:37.760
to there. And that presents a
really interesting lens for you know, how

487
00:38:37.760 --> 00:38:42.840
do these folks communicate with the
broader org and what changes are we making

488
00:38:42.920 --> 00:38:46.960
to shore up some of those critical
dependencies, And you know, just being

489
00:38:46.960 --> 00:38:52.039
able to join a conversation about that, not having been involved in response or

490
00:38:52.079 --> 00:38:57.199
having anything to do with the teams
involved, can be a really powerful opportunity

491
00:38:57.199 --> 00:39:00.079
for you to kind of learn about
the team that you're working with and the

492
00:39:00.280 --> 00:39:05.519
underlying technologies. Especially for folks like
me, it's been eight years since I

493
00:39:05.679 --> 00:39:08.840
was you know, maintaining those types
of platforms and so picking up on some

494
00:39:08.920 --> 00:39:14.320
of that nuance so that I can
support the folks who are around those systems

495
00:39:14.320 --> 00:39:16.920
can be really helpful. There's a
line there, though, You've got to

496
00:39:16.920 --> 00:39:22.480
make sure that expectations are clear,
right. If you're participating in something for

497
00:39:22.519 --> 00:39:28.400
the purpose of learning, you're kind
of a sponge rather than someone who's bringing

498
00:39:28.519 --> 00:39:31.360
opinions, you know, not having
understood the circumstances of the specific incident.

499
00:39:31.920 --> 00:39:37.400
So you need a healthy kind of
culture and set of expectations around this.

500
00:39:37.559 --> 00:39:40.000
But I've seen a lot of orgs
that do it well, and it is

501
00:39:40.599 --> 00:39:45.760
a game changer, you know,
for for helping to provide you know,

502
00:39:46.599 --> 00:39:52.519
scalable mentorship and opportunities for folks to
get a better understanding of the details.

503
00:39:52.760 --> 00:39:57.039
Yeah. One of the things you
commented on that I think just can't be

504
00:39:57.119 --> 00:40:04.559
elaborated enough is transparency. And
I've worked in multiple places, and when

505
00:40:04.559 --> 00:40:09.440
I first started my career, it
was, in many instances, a fireable

506
00:40:09.599 --> 00:40:14.920
event if you created an incident,
and for that reason, people would try

507
00:40:14.920 --> 00:40:20.679
to hide and cover up their incidents, which led to no one learning from

508
00:40:20.719 --> 00:40:25.079
that. And these days, you
know, I almost parade it around, you know,

509
00:40:25.199 --> 00:40:31.199
hey, I broke this because there's
a learning opportunity there. And I

510
00:40:31.199 --> 00:40:37.559
think it's really important to be open
and to build the environment where people aren't

511
00:40:37.599 --> 00:40:42.320
afraid to say that they made mistakes, and even the dumb mistakes, we

512
00:40:42.400 --> 00:40:45.119
all do them, you know,
you learn from it. And I actually,

513
00:40:45.159 --> 00:40:49.679
at some point in my career a
boss of mine told me, and

514
00:40:49.800 --> 00:40:54.480
it's an anecdotal story, but it's
still effective. Someone created an incident that cost

515
00:40:54.519 --> 00:40:59.159
several hundred thousand dollars and said,
oh, am I going to be fired

516
00:40:59.199 --> 00:41:02.440
now? And the boss responded, no, I just spent two hundred thousand dollars

517
00:41:02.480 --> 00:41:09.960
on your education. Why would I
fire you now? Yeah. And this

518
00:41:10.000 --> 00:41:15.440
is where I think, like,
it's really difficult to build trust, right,

519
00:41:15.480 --> 00:41:17.320
It's really easy to damage trust,
it's really difficult to build it.

520
00:41:17.320 --> 00:41:22.599
And so if you're approaching your
incident management, you know, life cycle

521
00:41:22.679 --> 00:41:29.159
and process from the perspective of trying
to support folks doing what they can to

522
00:41:29.199 --> 00:41:36.079
help the business be successful, you
get a lot of really impactful contribution and

523
00:41:36.119 --> 00:41:39.079
collaboration with regard to you know,
keeping systems healthy and things like that.

524
00:41:39.599 --> 00:41:45.199
But if you over-index on, you
know, the measurable metrics, we're humans,

525
00:41:45.280 --> 00:41:49.800
right? Every human will game
a measure, right? You start to

526
00:41:49.840 --> 00:41:53.360
cultivate some of those types of environments
where you know, what's the consequence of

527
00:41:53.400 --> 00:41:58.280
me doing the right thing here?
Is is it going to reflect poorly on

528
00:41:58.360 --> 00:42:00.199
me? Is it going to cause
an issue? And so, thankfully,

529
00:42:00.719 --> 00:42:06.079
I think every organization that Jeli
has worked with over the past two and

530
00:42:06.119 --> 00:42:10.000
a half years since I joined two
and a half plus years, they've taken

531
00:42:10.000 --> 00:42:14.800
the approach that, yeah, these
are blame-aware learning reviews.

532
00:42:14.920 --> 00:42:17.920
Right. We know that folks make
mistakes, that they don't have sufficient context

533
00:42:19.639 --> 00:42:23.400
in the moment, and that they
can learn from those experiences and change their

534
00:42:23.400 --> 00:42:30.239
approach next time, versus this kind
of you know, older model we'll say,

535
00:42:30.320 --> 00:42:37.800
of prioritizing the, you know, public visibility of how things are

536
00:42:37.840 --> 00:42:39.519
going, and maybe like maybe we
don't declare an incident for that one,

537
00:42:39.559 --> 00:42:43.800
we just try to fix it quickly. Early in my career, I was

538
00:42:44.400 --> 00:42:49.679
learning how to use Microsoft SQL databases
and we had a large SharePoint site.

539
00:42:49.719 --> 00:42:54.800
It was another medical audit company,
and I learned what drop database commands

540
00:42:54.840 --> 00:43:01.440
do, and I did the entire
production database and fortunately I had enough experience

541
00:43:01.440 --> 00:43:06.159
to quickly restore it before anyone noticed. But that was an environment where I

542
00:43:06.199 --> 00:43:10.119
didn't feel comfortable, you know,
broadcasting that I had just, in the

543
00:43:10.199 --> 00:43:15.239
process of learning some new commands, dropped
the entire database. So yeah, it

544
00:43:15.239 --> 00:43:22.360
can be a tricky balance, but
you know, sunlight is the best

545
00:43:22.360 --> 00:43:25.360
medicine, right. Transparency in these
types of environments allows folks to do what's

546
00:43:25.440 --> 00:43:30.760
necessary to get things back to good.
And I think the more you can kind

547
00:43:30.800 --> 00:43:36.760
of socialize and demonstrate that transparency,
the more effective your organization is going to

548
00:43:36.760 --> 00:43:39.840
be, and the more folks are
going to want to contribute to that mission,

549
00:43:39.960 --> 00:43:45.360
whatever it is. Yeah, yeah, absolutely agreed. So let's talk

550
00:43:45.400 --> 00:43:51.719
a little bit about what's going on
with Jeli these days. Yeah. So

551
00:43:51.880 --> 00:43:57.360
Jeli has been, like, the most interesting
experience of my career. I think I

552
00:43:57.440 --> 00:44:02.239
mentioned I joined in twenty twenty-one, I think it was. Jeli was just

553
00:44:02.360 --> 00:44:07.199
a post incident analysis tool at that
time. So we had this notion of

554
00:44:07.239 --> 00:44:12.599
building narratives and not much else,
and we recognized that part of the post

555
00:44:12.679 --> 00:44:16.000
incident learning process involves having good data, and the way that you get good

556
00:44:16.079 --> 00:44:20.880
data is you get consistent in your
process. And so we ended up building

557
00:44:20.880 --> 00:44:24.840
this incident response bot and we also
went to the other end of the spectrum

558
00:44:24.840 --> 00:44:30.079
and started introducing features for cross incident
analysis. And so this is, you

559
00:44:30.119 --> 00:44:32.840
know, after an incident, let's
spend some time learning, but then how

560
00:44:32.840 --> 00:44:38.719
do we roll up those learnings into
themes across incidents that help the organization make

561
00:44:38.800 --> 00:44:46.239
decisions around growing teams to support services
or changing direction with regard to build versus

562
00:44:46.239 --> 00:44:51.960
buy those sorts of things. And
so we've been working on a lot of

563
00:44:51.960 --> 00:44:57.000
cool stuff for the last two and
a half years. And then in what

564
00:44:57.079 --> 00:45:01.440
was it, I think November second,
the public announcement that we were merging with

565
00:45:01.480 --> 00:45:09.400
PagerDuty went out, which has
been, like, really exciting and also a trying

566
00:45:09.440 --> 00:45:14.079
experience. It's been a month, right? And so PagerDuty is something like

567
00:45:15.360 --> 00:45:21.880
eleven hundred employees as of January of
this year. We were twenty-one.

568
00:45:22.360 --> 00:45:25.719
We're kind of in the process of
figuring out how to bridge those two divides.

569
00:45:25.760 --> 00:45:30.280
And one thing that I'm really excited
about is, you know, Jeli

570
00:45:30.360 --> 00:45:35.880
has spent a lot of time differentiating
itself as a product in the post-incident

571
00:45:36.039 --> 00:45:39.880
learning area, and I think we've
brought a lot of kind of novel approaches

572
00:45:39.880 --> 00:45:45.559
and opinions to incident response in general. PagerDuty has been doing this for

573
00:45:45.639 --> 00:45:51.079
fourteen-plus years, right? And
they created the category within which Jeli could

574
00:45:51.079 --> 00:45:54.519
become a company, which is pretty
cool. And so what we're looking to

575
00:45:54.599 --> 00:46:01.280
do now is to take that practice, you know, post incident learning really

576
00:46:01.519 --> 00:46:06.719
get folks from the earlier phases of
the maturity level where they're just doing incident

577
00:46:06.760 --> 00:46:09.760
response and maybe they're doing a post
incident learning review on a Google doc,

578
00:46:10.159 --> 00:46:15.880
and bring them into the modern era, right,
and start creating incident narratives and doing learning

579
00:46:15.880 --> 00:46:21.239
reviews. PagerDuty has something
like twenty seven thousand free and paid customers.

580
00:46:21.760 --> 00:46:24.800
There's a huge opportunity there to help
folks understand a better way of kind

581
00:46:24.840 --> 00:46:31.159
of benefiting from all. So that's
my focus right now is figuring out how

582
00:46:31.159 --> 00:46:37.719
do we bring those two worlds together
while keeping an eye on preserving that kind

583
00:46:37.760 --> 00:46:45.760
of post incident learning tooling and opportunity. But yeah, a lot a lot

584
00:46:45.760 --> 00:46:49.760
of exciting stuff on the horizon. We
are, we are going into a new

585
00:46:49.880 --> 00:46:53.400
year, so I think things will
look very different on the PagerDuty

586
00:46:53.440 --> 00:47:00.000
side and probably also improve on the
Jeli side as well. It's going to

587
00:47:00.039 --> 00:47:04.840
be it's going to be really interesting. Yeah. I think it's a natural

588
00:47:04.880 --> 00:47:08.840
fit, you know, because
PagerDuty is hands down a great tool

589
00:47:09.039 --> 00:47:19.079
for notifying people that there's something requesting
their attention, but what you do after

590
00:47:19.159 --> 00:47:23.760
that is kind of up to you, and so it seems like a natural

591
00:47:23.800 --> 00:47:31.400
fit to just roll that right into
into Jeli and, and help, help people like

592
00:47:31.480 --> 00:47:37.079
just from a business perspective, take
this huge PagerDuty customer base and

593
00:47:37.199 --> 00:47:42.159
just guide them into the thing that
they thought they were doing all along.

594
00:47:43.639 --> 00:47:46.239
Yeah, one one focus for us
has always been, you know, how

595
00:47:46.280 --> 00:47:52.920
can we improve the quality of our
customers' post-incident learning reviews, and how

596
00:47:52.920 --> 00:47:58.960
can we allow the folks conducting those
investigations to focus on what matters. We've

597
00:47:58.960 --> 00:48:05.119
talked to organizations where, you know, there was one problem manager at a

598
00:48:05.119 --> 00:48:07.360
company that used Microsoft Teams,
and part of their job was to go

599
00:48:07.440 --> 00:48:13.440
through every Teams channel and find transcripts
associated with an incident and put them in

600
00:48:13.480 --> 00:48:16.239
ServiceNow. Nobody should be
doing that, right? That's just toil.

601
00:48:16.440 --> 00:48:22.960
That's that's not productive. And so
one thing I'm especially excited for with

602
00:48:22.000 --> 00:48:27.320
this partnership with PagerDuty, or
this, this acquisition by PagerDuty, is

603
00:48:27.840 --> 00:48:30.079
they've got a ton of data,
right, And so when you're building your

604
00:48:30.440 --> 00:48:34.800
post incident narratives, your timeline,
and you're adding evidence and you're trying to

605
00:48:34.840 --> 00:48:38.760
help folks understand the details of an
incident, the more data you have to

606
00:48:38.840 --> 00:48:43.719
substantiate those claims and those events that
you're highlighting in the incident, the more

607
00:48:44.440 --> 00:48:49.079
folks can learn from you know,
the not only the overall shape of the

608
00:48:49.119 --> 00:48:52.400
incident, but the systems involved and
how they're used to understand you know,

609
00:48:52.440 --> 00:48:57.440
the underlying technology. And so there's
an element there that's really exciting, which

610
00:48:57.480 --> 00:49:00.920
is just we have a lot more
data to allow our our users to work

611
00:49:00.960 --> 00:49:07.239
with. But I also think,
like I said, PagerDuty has been

612
00:49:07.280 --> 00:49:10.400
known for a really long time,
uh, as kind of an industry leader in

613
00:49:10.519 --> 00:49:15.519
scheduling and alerting. Right? Uh, ack
and bail: I got paged, I'm gonna

614
00:49:15.519 --> 00:49:19.559
go fix it. Uh, there
is a better way, right, Like,

615
00:49:19.760 --> 00:49:24.559
there are ways to tie that process
into the incident response process and the

616
00:49:24.599 --> 00:49:29.519
post-incident review, and I think
that's that's going to be our focus over

617
00:49:29.599 --> 00:49:32.119
the next you know, several months, is figuring out how do we give

618
00:49:35.639 --> 00:49:40.760
PagerDuty more mechanisms for supporting responders
throughout the entire incident management life cycle,

619
00:49:40.880 --> 00:49:45.440
not just the detection phase, which
a lot of folks know and they're familiar

620
00:49:45.519 --> 00:49:51.039
with, but, you know,
PagerDuty's full Operations Cloud, which most folks

621
00:49:51.119 --> 00:49:53.920
I've talked to don't even know exists. Uh, and, and this is, you

622
00:49:53.960 --> 00:50:00.440
know, the the AI automation for
reducing noise to signal when it comes to

623
00:50:00.519 --> 00:50:06.440
events. This is all of the
mechanisms around running actual incidents, and then

624
00:50:06.480 --> 00:50:10.599
this is the post-incident as well. PagerDuty has a feature today called post

625
00:50:10.639 --> 00:50:15.519
mortems, which is fairly straightforward.
It's your your Google post mortem doc.

626
00:50:15.880 --> 00:50:20.880
But we think there's a lot of
opportunity to not require that folks are going

627
00:50:20.920 --> 00:50:23.679
and creating these data sets manually,
but just kind of provide that information so

628
00:50:23.719 --> 00:50:30.039
they can use it to better narratives
that are living all the things. And

629
00:50:30.119 --> 00:50:31.039
yeah, I think I think it's
a natural fit too. I mean,

630
00:50:31.079 --> 00:50:37.000
I've been using PagerDuty for
basically my entire career, right and being

631
00:50:37.039 --> 00:50:42.599
able to bridge that gap between that
paging and scheduling and then the things that

632
00:50:42.639 --> 00:50:45.760
I need to do to help my
team be successful, it's going to be

633
00:50:45.159 --> 00:50:51.639
huge, you know, for from
my experience. Yeah, I think having

634
00:50:51.719 --> 00:50:57.559
access to that data is going to
lead to better collaboration after the fact,

635
00:50:57.559 --> 00:51:00.280
because that's for me, I've always
struggled with that. You know, after

636
00:51:00.320 --> 00:51:04.960
the incident's over and you're trying to
do the review of it, trying to

637
00:51:05.079 --> 00:51:09.000
remember what things happened in what order
and remember all of those steps that you

638
00:51:09.079 --> 00:51:14.679
took. So if you've got something
that can can prompt you with reminders and

639
00:51:15.000 --> 00:51:17.320
kind of pre-populate that narrative for
you. I think it's just going to

640
00:51:17.440 --> 00:51:23.360
lead to better, better results at
the end. Yeah, there is nothing

641
00:51:23.440 --> 00:51:28.679
better than having a starting point when
you are trying to investigate an incident.

642
00:51:28.760 --> 00:51:31.880
Right, If you open an empty
Google doc, it's a hard time.

643
00:51:32.000 --> 00:51:37.119
But if you can start with you
know, in Jeli today, you start

644
00:51:37.119 --> 00:51:39.719
with the incident transcript, all the
conversation that happened in Slack and data about

645
00:51:39.719 --> 00:51:45.039
who is involved, so much better
than starting from nothing. And that's especially

646
00:51:45.079 --> 00:51:52.239
true when you know your incident response
process uses multiple data sources like multiple incident

647
00:51:52.280 --> 00:51:57.000
response channels or your Datadog charts
or what have you. So we're not

648
00:51:57.320 --> 00:52:02.079
really looking to do the post incident
narrative for you. We're looking to give

649
00:52:02.079 --> 00:52:07.400
you a point to start from because
that saves time, it saves energy,

650
00:52:07.440 --> 00:52:12.320
and lets you focus on the things
that only you can create within your post

651
00:52:12.360 --> 00:52:16.039
incident narrative. Right, the investigator
is a conduit through which the folks who

652
00:52:16.079 --> 00:52:23.400
are involved in the organizational miscellany kind
of come together into a coherent story about

653
00:52:23.400 --> 00:52:29.199
what happens and what it means.
So we really want to like provide a

654
00:52:29.280 --> 00:52:34.840
foundation on top of which folks can
have these conversations. And I think there's

655
00:52:34.840 --> 00:52:38.840
a lot of opportunity there with this
kind of broader spectrum of data and integrations

656
00:52:40.199 --> 00:52:46.760
within customers' existing processes. Yeah,
it reminds me a lot of like a

657
00:52:47.920 --> 00:52:53.800
there's a like a a people skill
there. You put two people who don't

658
00:52:53.840 --> 00:53:00.239
know each other in a room and
anything could possibly happen. They could strike

659
00:53:00.280 --> 00:53:02.239
up conversation, they could sit there
in silence. You know, there's just

660
00:53:02.320 --> 00:53:07.880
no way to gauge it. But
then if you give them a conversation starter,

661
00:53:07.800 --> 00:53:13.199
then you can sort of like guide
the results from there. And I

662
00:53:13.239 --> 00:53:19.039
think I think that's the real value
of what the post incident narrative does,

663
00:53:19.199 --> 00:53:23.440
is it's that conversation starter. Yeah, that I mean certainly for us,

664
00:53:23.639 --> 00:53:27.679
as you mentioned, like we use
Jeli internally and we do our own learning

665
00:53:27.719 --> 00:53:32.800
reviews. I think the exercise of
you know, mitigating the incident, putting

666
00:53:32.880 --> 00:53:37.039
together the learning review, those are
valuable experiences for the folks involved. But

667
00:53:37.119 --> 00:53:40.880
getting everyone in the company, because
we can do that at twenty one people

668
00:53:42.480 --> 00:53:45.440
into a room to talk about what
happened, to ask questions to figure out

669
00:53:45.440 --> 00:53:49.119
what did you know? What did
you not know? You know? What

670
00:53:49.199 --> 00:53:52.079
did I know? And I wasn't
involved those those sorts of things. That's

671
00:53:52.119 --> 00:54:00.239
where you get really interesting kind of
exponential increases in understanding. And it's not just

672
00:54:00.760 --> 00:54:04.760
the thing that most excites me about
these learning reviews is it's not just the

673
00:54:04.840 --> 00:54:08.159
understanding of the technical or the organizational
process. It's the understanding of each other.

674
00:54:08.480 --> 00:54:13.400
Right? How, how I communicate in
these environments, how you communicate, what

675
00:54:13.440 --> 00:54:16.760
your expectations are, what sorts of
things I need to be better about informing

676
00:54:16.840 --> 00:54:24.000
during response. It's a it's a
retro, right? And the software can't operate

677
00:54:24.039 --> 00:54:28.719
itself. And so if the people
are working effectively together, then the software

678
00:54:28.800 --> 00:54:31.079
is working effectively and if they're not, then it's not. And I think

679
00:54:31.079 --> 00:54:36.119
that's that's one of the really big
opportunities, especially for you know, the

680
00:54:36.280 --> 00:54:43.199
large complex organizations in novel economic environments, to figure out, you know,

681
00:54:43.280 --> 00:54:47.880
how do we improve our efficiency in
our collaboration so that we can do what

682
00:54:49.320 --> 00:54:53.000
needs to be done. Really exciting. Oh, it is really exciting.

683
00:54:53.119 --> 00:54:59.039
I'm looking forward to seeing how this
plays out for y'all. Yeah, I'll

684
00:54:59.079 --> 00:55:01.760
have to let you know. There's, you
know, we're in a phase right now

685
00:55:01.800 --> 00:55:05.599
where there are too many good things
for us to do, so we got

686
00:55:05.599 --> 00:55:10.440
to figure out the next best thing
and focus on that. But yeah,

687
00:55:10.480 --> 00:55:15.760
that's that's the spot I want to
be in. Endless opportunity ahead of us.

688
00:55:15.800 --> 00:55:20.800
We just got to figure out how
we're going to get that to our

689
00:55:20.800 --> 00:55:27.719
customers as quickly as possible. Yeah, for sure. Yeah cool. So,

690
00:55:28.440 --> 00:55:30.719
anything else you'd like to share with
us about incident response, Jeli,

691
00:55:30.960 --> 00:55:37.039
PagerDuty, any topics at
all. Yeah, if you're not already

692
00:55:37.119 --> 00:55:39.840
using PagerDuty, take a
look. It's the best thing for paging

693
00:55:39.920 --> 00:55:43.519
that I've ever found. And if
you want to give Jeli a try,

694
00:55:43.719 --> 00:55:45.840
there's a free trial on the site
and we start you off with some pre

695
00:55:45.880 --> 00:55:51.280
built learning reviews so you can see
what they look like. Start playing around

696
00:55:51.320 --> 00:55:54.199
in there, and if you have
any questions, you know, I'm sure

697
00:55:54.280 --> 00:55:57.960
you'll be able to find me in
the show notes here. But it's been

698
00:55:58.000 --> 00:56:00.599
really great to meet you, Will,
and thank you so much for the opportunity to

699
00:56:00.679 --> 00:56:04.559
chat. No, thank you,
it's been a great conversation. I've enjoyed

700
00:56:04.599 --> 00:56:07.760
it. And uh, if you're
up for it, I would love to

701
00:56:07.400 --> 00:56:12.360
have you back on the show. That'd
be great. All right. So much.

702
00:56:13.000 --> 00:56:15.800
All right, cool. Well,
thanks for listening everyone, and I

703
00:56:15.840 --> 00:56:20.000
will see y'all next week.