WEBVTT

1
00:00:05.679 --> 00:00:08.480
<v Speaker 1>Hey everybody, and welcome to another episode of the Ruby

2
00:00:08.519 --> 00:00:12.480
<v Speaker 1>Rogues podcast. This week on our panel, we have Luke Stutters. Hello,

3
00:00:12.640 --> 00:00:16.079
<v Speaker 1>we have Dave Kimura. Hey everyone. I'm Charles Max Wood from

4
00:00:16.120 --> 00:00:17.039
<v Speaker 1>devchat.tv.

5
00:00:17.359 --> 00:00:20.399
<v Speaker 2>Quick shout-out about mostvaluable.dev. Go check

6
00:00:20.440 --> 00:00:22.920
<v Speaker 2>it out. We have a special guest this week, and

7
00:00:22.960 --> 00:00:24.800
<v Speaker 2>that is Paul Zaich.

8
00:00:25.199 --> 00:00:27.039
<v Speaker 3>Zaich. Well done, thank you.

9
00:00:27.920 --> 00:00:29.679
<v Speaker 2>Now you're here from Checkr.

10
00:00:29.760 --> 00:00:33.039
<v Speaker 1>You gave a talk at RailsConf about how you

11
00:00:33.119 --> 00:00:35.000
<v Speaker 1>broke stuff or somebody broke stuff.

12
00:00:35.079 --> 00:00:36.600
<v Speaker 2>Do you want to just kind of give us a

13
00:00:36.640 --> 00:00:37.759
<v Speaker 2>quick intro to who you are

14
00:00:37.679 --> 00:00:39.399
<v Speaker 1>and what you do, and then we'll dive in and

15
00:00:39.439 --> 00:00:42.359
<v Speaker 1>talk about what broke and how you figured it out.

16
00:00:42.600 --> 00:00:42.960
<v Speaker 3>Sure.

17
00:00:43.200 --> 00:00:46.920
<v Speaker 4>So, I've been a software engineer for about ten years. Recently,

18
00:00:46.960 --> 00:00:49.640
<v Speaker 4>in the last year or so, transitioned into an engineering

19
00:00:49.679 --> 00:00:53.000
<v Speaker 4>management role. But I've worked at a number of different

20
00:00:53.560 --> 00:00:57.280
<v Speaker 4>small startups, and I joined Checkr in twenty seventeen when

21
00:00:57.280 --> 00:01:00.520
<v Speaker 4>the company was at about one hundred employees, thirty engineers.

22
00:01:00.679 --> 00:01:03.119
<v Speaker 4>Contributed as an engineer for a couple of years to

23
00:01:03.200 --> 00:01:06.799
<v Speaker 4>our team, and then have recently transitioned, like I said,

24
00:01:06.840 --> 00:01:09.799
<v Speaker 4>into an engineering management role at the company.

25
00:01:09.959 --> 00:01:10.359
<v Speaker 2>Very cool.

26
00:01:10.400 --> 00:01:12.840
<v Speaker 1>I actually have a Checkr T-shirt in my closet

27
00:01:12.840 --> 00:01:15.719
<v Speaker 1>that I never wear. It's "Check-R," for those that

28
00:01:15.799 --> 00:01:18.280
<v Speaker 1>are listening and not reading it. Yeah, So why don't

29
00:01:18.280 --> 00:01:20.519
<v Speaker 1>you kind of tee us up for this as far

30
00:01:20.599 --> 00:01:24.280
<v Speaker 1>as yeah, what happened, what broke? Yeah, give us a

31
00:01:24.319 --> 00:01:28.400
<v Speaker 1>sort of a preliminary timeline and explain what Checkr does

32
00:01:28.400 --> 00:01:29.159
<v Speaker 1>and why that matters.

33
00:01:29.480 --> 00:01:29.799
<v Speaker 3>Sure.

34
00:01:30.040 --> 00:01:34.239
<v Speaker 4>So Checkr was founded in twenty fourteen. Daniel and Jonathan,

35
00:01:34.519 --> 00:01:38.680
<v Speaker 4>our founders, had worked in the on-demand space at another

36
00:01:38.760 --> 00:01:42.480
<v Speaker 4>company and had discovered that it was very difficult to integrate

37
00:01:42.519 --> 00:01:46.359
<v Speaker 4>background checks into their onboarding process. Background checks tend to

38
00:01:46.359 --> 00:01:49.480
<v Speaker 4>be a very important final safety step for a lot

39
00:01:49.480 --> 00:01:52.120
<v Speaker 4>of these companies to make sure that their platform is

40
00:01:52.159 --> 00:01:56.159
<v Speaker 4>going to be safe and secure for their customers, and

41
00:01:56.239 --> 00:01:59.959
<v Speaker 4>so in twenty fourteen they started an automated background check company.

42
00:02:00.239 --> 00:02:05.120
<v Speaker 4>And initially the biggest selling point was that Checker abstracted

43
00:02:05.120 --> 00:02:08.280
<v Speaker 4>away a lot of the complexity of the background check process,

44
00:02:08.520 --> 00:02:13.639
<v Speaker 4>collecting candidate information and then executing that flow and exposing

45
00:02:13.680 --> 00:02:17.199
<v Speaker 4>that via an API that was developed in a Sinatra app.

46
00:02:17.479 --> 00:02:21.439
<v Speaker 4>And three years later, in twenty seventeen, I had just joined

47
00:02:21.439 --> 00:02:25.039
<v Speaker 4>about four or five months before

48
00:02:25.280 --> 00:02:28.400
<v Speaker 4>this particular incident happened. Fast forward to that point. We're

49
00:02:28.479 --> 00:02:31.800
<v Speaker 4>running, I'd say, a few million checks a year for

50
00:02:32.080 --> 00:02:36.000
<v Speaker 4>a variety of different customers. Most of those customers use

51
00:02:36.039 --> 00:02:39.599
<v Speaker 4>our API, like I said before, to manage that process,

52
00:02:39.639 --> 00:02:42.879
<v Speaker 4>and they do most of the collection and interface with

53
00:02:42.919 --> 00:02:45.759
<v Speaker 4>the candidate on their side in their own application.

54
00:02:46.120 --> 00:02:47.080
<v Speaker 2>Oh that's interesting.

55
00:02:47.439 --> 00:02:50.199
<v Speaker 1>Yeah, I think a lot of the background check portals

56
00:02:50.199 --> 00:02:54.240
<v Speaker 1>that I've seen, yeah, they're like the fully baked

57
00:02:54.280 --> 00:02:57.400
<v Speaker 1>portal instead of yeah, being a background service that somebody

58
00:02:57.439 --> 00:02:59.800
<v Speaker 1>else can integrate into their own app.

59
00:03:00.120 --> 00:03:02.120
<v Speaker 5>Yeah, exactly. Did you run a background check on me

60
00:03:02.319 --> 00:03:05.680
<v Speaker 4>before this episode? I did not. There are a

61
00:03:05.719 --> 00:03:11.280
<v Speaker 4>lot of very important guidelines and stipulations governed by the

62
00:03:11.280 --> 00:03:14.560
<v Speaker 4>Fair Credit Reporting Act that make sure that you have

63
00:03:14.599 --> 00:03:17.319
<v Speaker 4>to have a permissible purpose for running a background check.

64
00:03:17.439 --> 00:03:20.199
<v Speaker 4>So in this case, most of our customers are using

65
00:03:20.680 --> 00:03:25.120
<v Speaker 4>the permissible purpose around employment as the reason for actually

66
00:03:25.199 --> 00:03:25.840
<v Speaker 4>running that check.

67
00:03:25.919 --> 00:03:28.240
<v Speaker 1>Well, that's no fun. I know, right? I want to

68
00:03:28.240 --> 00:03:32.960
<v Speaker 1>know everybody's dirty secrets. Interesting, So, yeah, why don't you

69
00:03:32.960 --> 00:03:35.759
<v Speaker 1>tell us a little bit about what went down with

70
00:03:35.960 --> 00:03:36.759
<v Speaker 1>the app?

71
00:03:36.879 --> 00:03:37.159
<v Speaker 2>Right?

72
00:03:37.439 --> 00:03:41.639
<v Speaker 4>So, like I said, in twenty seventeen, Checkr at this

73
00:03:41.719 --> 00:03:44.759
<v Speaker 4>point was a pretty important component of a number of

74
00:03:44.759 --> 00:03:49.159
<v Speaker 4>customers' onboarding process. But we'd started off small and things

75
00:03:49.240 --> 00:03:51.840
<v Speaker 4>grew quickly in a lot of ways. We were just

76
00:03:51.879 --> 00:03:55.000
<v Speaker 4>trying to keep the lights on and scale the system

77
00:03:55.680 --> 00:03:58.759
<v Speaker 4>along with our customers as they continued to grow. On

78
00:03:58.840 --> 00:04:01.280
<v Speaker 4>demand was growing a lot at this time as well. So

79
00:04:01.639 --> 00:04:06.360
<v Speaker 4>in twenty seventeen, we were doing some fairly routine changes

80
00:04:06.400 --> 00:04:09.039
<v Speaker 4>to a data model. I wasn't directly involved with that,

81
00:04:09.439 --> 00:04:12.919
<v Speaker 4>but we were changing something from an integer ID to

82
00:04:14.280 --> 00:04:17.160
<v Speaker 4>a UUID, and the references, and there were some backfills that

83
00:04:17.199 --> 00:04:21.079
<v Speaker 4>needed to happen, And so an engineer executed a script

84
00:04:21.240 --> 00:04:26.120
<v Speaker 4>on a Friday afternoon, which is always a great idea,

85
00:04:26.480 --> 00:04:29.920
<v Speaker 4>and they executed a script at about four thirty pm,

86
00:04:30.120 --> 00:04:33.040
<v Speaker 4>probably went and grabbed something, had a little happy hour,

87
00:04:33.079 --> 00:04:37.160
<v Speaker 4>and then headed home. And about an hour later we

88
00:04:37.199 --> 00:04:41.680
<v Speaker 4>started to receive a few different pages to completely

89
00:04:41.839 --> 00:04:44.120
<v Speaker 4>unrelated teams that didn't really know what was going on

90
00:04:44.160 --> 00:04:46.279
<v Speaker 4>in terms of this backfill, and it didn't look like

91
00:04:46.319 --> 00:04:49.319
<v Speaker 4>anything too serious. It was just an elevated number of

92
00:04:49.360 --> 00:04:52.879
<v Speaker 4>exceptions in our client application that does some of the

93
00:04:52.920 --> 00:04:57.439
<v Speaker 4>candidate PII collection, and so that

94
00:04:57.560 --> 00:05:00.759
<v Speaker 4>team decided to snooze that, decided to just kind of

95
00:05:00.759 --> 00:05:01.240
<v Speaker 4>ignore it.

96
00:05:01.319 --> 00:05:04.160
<v Speaker 1>Yeah, so people that aren't aware. PII is an acronym

97
00:05:04.199 --> 00:05:08.480
<v Speaker 1>for personally identifiable information, and it is usually protected by law.

98
00:05:08.839 --> 00:05:09.959
<v Speaker 2>Thank you. Anyway, go ahead.

99
00:05:10.319 --> 00:05:14.279
<v Speaker 4>So come Saturday morning, this has been going on for

100
00:05:14.319 --> 00:05:18.279
<v Speaker 4>about twelve hours now, this exception comes in again, and

101
00:05:18.680 --> 00:05:21.279
<v Speaker 4>at that point someone on our team actually decided to

102
00:05:21.360 --> 00:05:25.160
<v Speaker 4>escalate that and get more stakeholders involved. We had a

103
00:05:25.439 --> 00:05:28.319
<v Speaker 4>variety of other issues going on. We had just migrated from

104
00:05:28.639 --> 00:05:32.360
<v Speaker 4>one deployment platform to Kubernetes, and so we had some

105
00:05:32.399 --> 00:05:35.439
<v Speaker 4>issues getting onto the cluster. There were too many of

106
00:05:35.480 --> 00:05:36.879
<v Speaker 4>us trying to get on at the same time, so

107
00:05:36.879 --> 00:05:39.639
<v Speaker 4>we ended up all having to actually go into the

108
00:05:39.680 --> 00:05:43.000
<v Speaker 4>office to the physical internet to finally get in and debug

109
00:05:43.040 --> 00:05:46.319
<v Speaker 4>the issue. So we had a couple of other confounding

110
00:05:46.480 --> 00:05:48.879
<v Speaker 4>issues come up at the same time that made the

111
00:05:48.920 --> 00:05:52.560
<v Speaker 4>process of response even worse. So finally, this is maybe

112
00:05:53.040 --> 00:05:55.399
<v Speaker 4>ten o'clock in the morning, ten or eleven in the morning,

113
00:05:55.759 --> 00:05:58.560
<v Speaker 4>we finally, after being able to take a look at that,

114
00:05:59.079 --> 00:06:01.959
<v Speaker 4>identified what the issue was. We were successfully

115
00:06:01.959 --> 00:06:05.279
<v Speaker 4>responding to only about fifty to sixty percent of requests to one

116
00:06:05.279 --> 00:06:08.120
<v Speaker 4>of the most critical endpoints on our system, which

117
00:06:08.160 --> 00:06:11.279
<v Speaker 4>is the request to actually create a report. So

118
00:06:11.439 --> 00:06:14.879
<v Speaker 4>after you've collected the candidate's information, you say, please execute

119
00:06:14.879 --> 00:06:16.800
<v Speaker 4>this report so we can get that back, and that's

120
00:06:16.800 --> 00:06:20.560
<v Speaker 4>a synchronous request that you make using our API. And

121
00:06:20.720 --> 00:06:24.560
<v Speaker 4>when that request was failing, it was failing about forty

122
00:06:24.600 --> 00:06:26.160
<v Speaker 4>to fifty percent of the time with a 404

123
00:06:26.160 --> 00:06:29.439
<v Speaker 4>response, which isn't really expected. So at that point

124
00:06:29.480 --> 00:06:31.040
<v Speaker 4>we were finally able to pin down the issue and

125
00:06:31.160 --> 00:06:34.279
<v Speaker 4>it came back to this script, and it turned out

126
00:06:34.439 --> 00:06:37.319
<v Speaker 4>that when you went to create this report, we would

127
00:06:37.360 --> 00:06:42.000
<v Speaker 4>create these additional sub-objects called screenings, and due

128
00:06:42.040 --> 00:06:44.360
<v Speaker 4>to the script, we had actually created an issue where

129
00:06:44.480 --> 00:06:46.560
<v Speaker 4>validation would cause

130
00:06:46.079 --> 00:06:48.439
<v Speaker 3>the reports to fail to create in this edge

131
00:06:48.399 --> 00:06:51.600
<v Speaker 4>case. So there were some confounding issues with the way that

132
00:06:51.639 --> 00:06:54.040
<v Speaker 4>we had set up the data modeling to begin with

133
00:06:54.120 --> 00:06:57.399
<v Speaker 4>that we were trying to work around, and this exception happened.

134
00:06:57.600 --> 00:06:58.439
<v Speaker 3>But when we

135
00:06:58.399 --> 00:07:02.480
<v Speaker 4>finally fixed the issue, that's where we shifted more into

136
00:07:02.600 --> 00:07:05.439
<v Speaker 4>what actually went wrong and what were the

137
00:07:05.480 --> 00:07:08.800
<v Speaker 4>real issues that caused us this outage. Gotcha.

138
00:07:08.920 --> 00:07:11.560
<v Speaker 1>So I'm curious as you work through this, what did

139
00:07:11.600 --> 00:07:13.399
<v Speaker 1>you add to your workflow to make sure that this

140
00:07:13.560 --> 00:07:15.959
<v Speaker 1>doesn't happen again, because I mean, some of it's going

141
00:07:16.040 --> 00:07:19.360
<v Speaker 1>to be technical, right, it's testing or you know, maybe

142
00:07:19.360 --> 00:07:21.759
<v Speaker 1>you set up a staging environment or something like that,

143
00:07:22.240 --> 00:07:24.720
<v Speaker 1>and some of it is going to be hey, when

144
00:07:24.800 --> 00:07:27.399
<v Speaker 1>this kind of alert comes up, do this thing right,

145
00:07:27.439 --> 00:07:29.800
<v Speaker 1>because it sounded like you did have some early indication

146
00:07:29.879 --> 00:07:31.160
<v Speaker 1>that this happened. Right.

147
00:07:31.279 --> 00:07:33.560
<v Speaker 4>So I think the first most important thing that we

148
00:07:33.600 --> 00:07:36.360
<v Speaker 4>did was that, really from the beginning, we have

149
00:07:36.560 --> 00:07:39.959
<v Speaker 4>had what we call a blameless culture. I think

150
00:07:39.959 --> 00:07:42.439
<v Speaker 4>it's a common term now in the industry. But the

151
00:07:42.519 --> 00:07:46.800
<v Speaker 4>idea there is to really focus on learning from issues,

152
00:07:46.920 --> 00:07:51.759
<v Speaker 4>not trying to find who made the particular mistake, and

153
00:07:51.879 --> 00:07:54.959
<v Speaker 4>trying to look at what processes you're missing and what

154
00:07:55.240 --> 00:07:57.439
<v Speaker 4>changes you need to make in just your code base

155
00:07:57.480 --> 00:07:59.959
<v Speaker 4>as well that would have prevented the problem from happening,

156
00:08:00.120 --> 00:08:03.319
<v Speaker 4>so not trying to focus on the individual mistake

157
00:08:03.079 --> 00:08:05.399
<v Speaker 3>that was made. So as part of that, we did a

158
00:08:05.399 --> 00:08:09.240
<v Speaker 4>postmortem doc and we went through and identified things

159
00:08:09.319 --> 00:08:12.879
<v Speaker 4>like, one, we should really have a dedicated script

160
00:08:13.120 --> 00:08:17.160
<v Speaker 4>repository that goes through a code review process. So that's

161
00:08:17.319 --> 00:08:20.759
<v Speaker 4>that's one thing we implemented, and we made some safeguards

162
00:08:20.800 --> 00:08:24.040
<v Speaker 4>and to address this particular issue with the data models

163
00:08:24.319 --> 00:08:26.959
<v Speaker 4>as well. But I think for everyone the bigger

164
00:08:27.040 --> 00:08:30.120
<v Speaker 4>issue was really the fact that we missed the outage

165
00:08:30.800 --> 00:08:34.320
<v Speaker 4>for so long. And we did actually have some monitoring

166
00:08:34.320 --> 00:08:36.399
<v Speaker 4>in place for this particular issue that

167
00:08:36.399 --> 00:08:39.399
<v Speaker 4>should have paged for the downtime that we were experiencing

168
00:08:39.559 --> 00:08:43.279
<v Speaker 4>for report creation. But it turned out that our

169
00:08:43.320 --> 00:08:46.840
<v Speaker 4>monitors were really just not set up in the most

170
00:08:46.840 --> 00:08:51.200
<v Speaker 4>effective way to trigger for that particular type of outage.

171
00:08:51.240 --> 00:08:54.600
<v Speaker 4>In this case it was a partial outage, and that

172
00:08:54.679 --> 00:08:57.960
<v Speaker 4>requires a much more sensitive monitor in order for us

173
00:08:58.000 --> 00:09:01.639
<v Speaker 4>to detect it. Everything we had designed beforehand was much more

174
00:09:01.919 --> 00:09:04.559
<v Speaker 4>targeted towards a complete failure of our system.

175
00:09:04.879 --> 00:09:07.679
<v Speaker 6>And so was this something that could have been caught

176
00:09:07.759 --> 00:09:09.600
<v Speaker 6>by automated tests.

177
00:09:09.879 --> 00:09:13.399
<v Speaker 4>This particular issue most likely could not have been caught

178
00:09:13.440 --> 00:09:16.639
<v Speaker 4>by an automated test because it was

179
00:09:17.039 --> 00:09:20.679
<v Speaker 4>so far outside of the norm of what we expected

180
00:09:20.840 --> 00:09:24.159
<v Speaker 4>data to look like. So we had, I mean,

181
00:09:24.159 --> 00:09:26.799
<v Speaker 4>we had of course unit tests for everything that we

182
00:09:26.799 --> 00:09:30.039
<v Speaker 4>were running, and we had request specs as well. We

183
00:09:30.159 --> 00:09:32.960
<v Speaker 4>did not have an end-to-end environment set

184
00:09:33.039 --> 00:09:35.120
<v Speaker 4>up for like a staging environment where you could run

185
00:09:35.159 --> 00:09:36.159
<v Speaker 4>these tests end to end.

186
00:09:36.679 --> 00:09:40.480
<v Speaker 3>But again, the data in this particular

187
00:09:40.000 --> 00:09:43.919
<v Speaker 4>case was very old, and we were essentially doing a

188
00:09:44.000 --> 00:09:47.679
<v Speaker 4>migration where that data was in a state that wasn't

189
00:09:47.759 --> 00:09:50.159
<v Speaker 4>anywhere in our code base at this point, So I'm

190
00:09:50.200 --> 00:09:52.720
<v Speaker 4>not sure we could have anticipated this particular issue.

191
00:09:52.799 --> 00:09:56.000
<v Speaker 7>What was the fallout? Did everyone, like, phone up and

192
00:09:56.000 --> 00:09:56.919
<v Speaker 7>get really angry?

193
00:09:57.159 --> 00:09:59.000
<v Speaker 3>Oh, from a customer's perspective?

194
00:09:58.799 --> 00:10:01.159
<v Speaker 7>Yeah, yeah, this is the best bit of outage

195
00:10:01.159 --> 00:10:03.799
<v Speaker 7>stories is the kind of the human cost of whoever

196
00:10:03.840 --> 00:10:07.080
<v Speaker 7>has to answer the phone the next week. Code drama.

197
00:10:06.720 --> 00:10:10.000
<v Speaker 4>Right, that's always one of the things, especially as

198
00:10:10.240 --> 00:10:13.559
<v Speaker 4>your application and your service become more important to customers:

199
00:10:13.600 --> 00:10:18.279
<v Speaker 4>the impact to customers is more and more extreme.

200
00:10:18.559 --> 00:10:21.240
<v Speaker 4>And so in this case, because it was a Friday night, it

201
00:10:21.279 --> 00:10:24.000
<v Speaker 4>wasn't something where a lot of our customers were actively

202
00:10:24.639 --> 00:10:28.159
<v Speaker 4>monitoring on their end as well. Fortunately, we were able

203
00:10:28.200 --> 00:10:31.320
<v Speaker 4>to see that retries were happening, and many of our

204
00:10:31.320 --> 00:10:35.720
<v Speaker 4>customers use a retry fallback mechanism, so they were able

205
00:10:35.799 --> 00:10:39.000
<v Speaker 4>to just allow those to run through. But this is

206
00:10:39.039 --> 00:10:43.120
<v Speaker 4>particularly tricky in this case because there wasn't actually like

207
00:10:43.159 --> 00:10:47.559
<v Speaker 4>a record ID for many of these particular responses. Fortunately,

208
00:10:47.600 --> 00:10:50.600
<v Speaker 4>we do keep API logs, so

209
00:10:50.639 --> 00:10:53.080
<v Speaker 4>we were able to see exactly which requests failed for

210
00:10:53.159 --> 00:10:55.559
<v Speaker 4>each of our customers, and so we were able to

211
00:10:55.679 --> 00:10:58.720
<v Speaker 4>then reach out to our customer success team and they

212
00:10:58.799 --> 00:11:01.159
<v Speaker 4>were able to start to share the impact with each

213
00:11:01.159 --> 00:11:04.559
<v Speaker 4>of those customers pretty quickly. I will say that we've

214
00:11:04.879 --> 00:11:07.759
<v Speaker 4>done a lot of work to make our customer communication

215
00:11:07.840 --> 00:11:10.559
<v Speaker 4>a lot more polished since then, and that's

216
00:11:10.559 --> 00:11:13.240
<v Speaker 4>something that we're really focusing on now as well, and

217
00:11:13.360 --> 00:11:16.879
<v Speaker 4>just being able to give more visibility to customers sooner. And

218
00:11:16.960 --> 00:11:18.960
<v Speaker 4>one of the most important things there is when it

219
00:11:18.960 --> 00:11:21.120
<v Speaker 4>comes to monitoring, is that you really want to be

220
00:11:21.159 --> 00:11:23.720
<v Speaker 4>able to find the issue and be able to start

221
00:11:23.759 --> 00:11:27.600
<v Speaker 4>to investigate it yourself. You don't want a customer to

222
00:11:27.639 --> 00:11:30.600
<v Speaker 4>identify it first. You should really understand what's happening in

223
00:11:30.639 --> 00:11:33.799
<v Speaker 4>your system before anyone else detects that issue.

224
00:11:33.840 --> 00:11:37.120
<v Speaker 7>And I guess with this, not this specific product,

225
00:11:37.159 --> 00:11:41.679
<v Speaker 7>but this kind of product where your customers are consuming your API,

226
00:11:42.200 --> 00:11:45.559
<v Speaker 7>you're also at the mercy of their implementation too, so

227
00:11:46.240 --> 00:11:49.320
<v Speaker 7>you know, they're making a kind of call against you, and

228
00:11:49.799 --> 00:11:52.720
<v Speaker 7>if that call is failing, you know, you've got to

229
00:11:52.759 --> 00:11:55.159
<v Speaker 7>hope that their system can cope with that as well.

230
00:11:55.600 --> 00:11:59.200
<v Speaker 4>Exactly if some of these requests are happening in the browser,

231
00:11:59.480 --> 00:12:02.960
<v Speaker 4>or weren't set up to automatically retry, that

232
00:12:02.960 --> 00:12:05.480
<v Speaker 4>could be a much worse impact on the customers.

233
00:12:05.759 --> 00:12:08.120
<v Speaker 7>Can we talk about the blameless culture for a bit.

234
00:12:08.679 --> 00:12:11.600
<v Speaker 7>This is a new idea. And when I

235
00:12:11.679 --> 00:12:14.320
<v Speaker 7>was managing engineering teams, I used to have what I

236
00:12:14.360 --> 00:12:17.279
<v Speaker 7>called the finger of blame. So

237
00:12:17.320 --> 00:12:19.559
<v Speaker 7>I would hold up my finger in the meeting and

238
00:12:19.600 --> 00:12:22.960
<v Speaker 7>I'd introduce the finger as the finger of blame, and

239
00:12:23.000 --> 00:12:25.960
<v Speaker 7>then we'd work out who the finger of blame should

240
00:12:25.960 --> 00:12:28.960
<v Speaker 7>be pointing to. Now, more often than not, of course

241
00:12:29.000 --> 00:12:31.360
<v Speaker 7>it was me. So the finger of blame was a

242
00:12:31.399 --> 00:12:34.600
<v Speaker 7>double-edged finger, but it was a kind

243
00:12:34.600 --> 00:12:36.960
<v Speaker 7>of way of, you know, people take it very seriously

244
00:12:37.039 --> 00:12:38.759
<v Speaker 7>when they mess up this kind of stuff, so you

245
00:12:38.879 --> 00:12:41.120
<v Speaker 7>kind of have to get your team back

246
00:12:41.200 --> 00:12:43.759
<v Speaker 7>on board. So it's a way of kind of lightening

247
00:12:43.799 --> 00:12:49.159
<v Speaker 7>the mood after that week's disaster. But a blameless culture,

248
00:12:49.200 --> 00:12:51.960
<v Speaker 7>as you said, is a kind of more sophisticated

249
00:12:52.240 --> 00:12:55.480
<v Speaker 7>way of doing it instead of pointing a jovial finger

250
00:12:55.759 --> 00:12:58.799
<v Speaker 7>at the person who messed up. What does that

251
00:12:58.960 --> 00:13:01.120
<v Speaker 7>look like? I mean, you know, do you just go

252
00:13:01.200 --> 00:13:03.960
<v Speaker 7>around telling people it's not their fault, or, you know,

253
00:13:04.000 --> 00:13:07.279
<v Speaker 7>how do you implement a blameless culture in what sounds

254
00:13:07.320 --> 00:13:09.039
<v Speaker 7>like quite a big engineering team.

255
00:13:09.360 --> 00:13:11.679
<v Speaker 4>I think for us it

256
00:13:11.720 --> 00:13:17.000
<v Speaker 4>really started with our CTO and co-founder Jonathan making

257
00:13:17.039 --> 00:13:20.519
<v Speaker 4>that a priority from pretty much day one, basically from

258
00:13:20.519 --> 00:13:24.840
<v Speaker 4>the beginning of our process. When we've had issues or incidents,

259
00:13:25.120 --> 00:13:28.000
<v Speaker 4>we've done a post mortem doc, we've had a process

260
00:13:28.039 --> 00:13:32.960
<v Speaker 4>around that, and it's always been very forward facing, very

261
00:13:33.080 --> 00:13:35.480
<v Speaker 4>much about what could we have done better, what can

262
00:13:35.480 --> 00:13:37.960
<v Speaker 4>we improve, what are the things we should be doing

263
00:13:38.399 --> 00:13:42.279
<v Speaker 4>going forward. So I think having that first touch point

264
00:13:42.440 --> 00:13:45.000
<v Speaker 4>and really having that emphasis from the beginning was really

265
00:13:45.039 --> 00:13:48.559
<v Speaker 4>important and cascades down. I think as you're building out

266
00:13:48.639 --> 00:13:52.120
<v Speaker 4>a bigger engineering team, that's critical is to be able

267
00:13:52.159 --> 00:13:55.200
<v Speaker 4>to just continue to keep that culture going. And

268
00:13:55.240 --> 00:13:58.320
<v Speaker 4>I think that's a challenge to continue doing.

269
00:13:58.559 --> 00:14:00.720
<v Speaker 4>But I think as we've grown we've been able

270
00:14:00.799 --> 00:14:03.559
<v Speaker 4>to do that so far, so I think that was

271
00:14:03.600 --> 00:14:06.000
<v Speaker 4>step number one. I think a second piece of it

272
00:14:06.039 --> 00:14:08.600
<v Speaker 4>is understanding and trying to understand when it's more of

273
00:14:08.600 --> 00:14:13.480
<v Speaker 4>a process issue versus something that someone particularly did wrong.

274
00:14:13.600 --> 00:14:15.480
<v Speaker 4>And I think, a lot of the time,

275
00:14:15.519 --> 00:14:18.240
<v Speaker 4>a lot of incidents do occur because

276
00:14:18.639 --> 00:14:22.200
<v Speaker 4>you're trying to make different prioritization decisions, and you're trying

277
00:14:22.240 --> 00:14:25.799
<v Speaker 4>to make sure that you anticipate things in advance or

278
00:14:26.080 --> 00:14:31.000
<v Speaker 4>failure scenarios, and sometimes you just miss those. And those

279
00:14:31.000 --> 00:14:34.399
<v Speaker 4>are particular cases where I think the management team needs

280
00:14:34.440 --> 00:14:37.720
<v Speaker 4>to really take responsibility for it. It's not an individual

281
00:14:38.000 --> 00:14:42.200
<v Speaker 4>issue that caused that particular downtime, or that was

282
00:14:42.600 --> 00:14:45.080
<v Speaker 4>necessarily that one piece of code. And so,

283
00:14:45.080 --> 00:14:48.000
<v Speaker 4>just as an example, I mean, this is an

284
00:14:48.000 --> 00:14:50.600
<v Speaker 4>example I think actually where we had some technical debt.

285
00:14:50.600 --> 00:14:53.080
<v Speaker 4>We were trying to clean it up, and that was

286
00:14:53.120 --> 00:14:55.080
<v Speaker 4>a good thing, but I think we didn't necessarily have

287
00:14:55.240 --> 00:14:58.240
<v Speaker 4>everything in place to be able to address that technical

288
00:14:58.320 --> 00:15:02.960
<v Speaker 4>debt effectively, and that's not necessarily one engineer's responsibility to

289
00:15:03.039 --> 00:15:03.840
<v Speaker 4>be out in front of.

290
00:15:03.639 --> 00:15:05.360
<v Speaker 1>Yeah, one thing I just want to add is

291
00:15:05.399 --> 00:15:08.519
<v Speaker 1>that I like the blameless culture just from the sense

292
00:15:08.600 --> 00:15:13.000
<v Speaker 1>of unless somebody is either malicious, which I have never

293
00:15:13.039 --> 00:15:16.440
<v Speaker 1>ever ever encountered, or is chronically reckless, which I've also

294
00:15:16.559 --> 00:15:19.759
<v Speaker 1>never encountered. Right, everybody is usually trying to pull along

295
00:15:20.440 --> 00:15:23.039
<v Speaker 1>in the same way. You know, if somebody has that issue,

296
00:15:23.080 --> 00:15:26.519
<v Speaker 1>you identify it pretty fast and you usually are able

297
00:15:26.559 --> 00:15:30.600
<v Speaker 1>to counter it before it becomes a real problem. But yeah,

298
00:15:31.159 --> 00:15:34.480
<v Speaker 1>just to put that together then you know, yeah, the

299
00:15:34.519 --> 00:15:36.679
<v Speaker 1>rest of it, it's, hey, look, we're on the same team.

300
00:15:36.679 --> 00:15:39.480
<v Speaker 1>We're all trying to get to the same place. So let's

301
00:15:39.519 --> 00:15:41.159
<v Speaker 1>talk about how we can do this better so that this

302
00:15:41.159 --> 00:15:44.080
<v Speaker 1>doesn't happen again, because next time it might be me, right,

303
00:15:44.440 --> 00:15:48.279
<v Speaker 1>that misses a critical step. And I don't want you

304
00:15:48.320 --> 00:15:51.480
<v Speaker 1>pointing the finger at me either. I mean, I want to learn

305
00:15:51.480 --> 00:15:53.960
<v Speaker 1>from it, but, you know, we don't want people

306
00:15:54.399 --> 00:15:57.759
<v Speaker 1>walking around in fear. Instead, if somebody screws up, we

307
00:15:57.759 --> 00:15:59.799
<v Speaker 1>want them to come forward and say, hey, I might

308
00:15:59.840 --> 00:16:02.600
<v Speaker 1>have messed this up, before it becomes an issue next time.

309
00:16:02.960 --> 00:16:03.480
<v Speaker 3>Absolutely.

310
00:16:03.480 --> 00:16:06.320
<v Speaker 4>And I think one other thing to highlight here is

311
00:16:06.360 --> 00:16:09.360
<v Speaker 4>that when you don't have a blameless culture, folks are

312
00:16:09.360 --> 00:16:12.159
<v Speaker 4>going to be very afraid to speak out when they

313
00:16:12.159 --> 00:16:14.879
<v Speaker 4>do see an issue. Whether they think

314
00:16:15.159 --> 00:16:17.639
<v Speaker 4>it was their mistake or someone else's, they're not going

315
00:16:17.720 --> 00:16:20.360
<v Speaker 4>to want to escalate that issue and make sure that

316
00:16:20.360 --> 00:16:24.279
<v Speaker 4>it gets attention necessarily. And so one of the best

317
00:16:24.320 --> 00:16:27.159
<v Speaker 4>side effects of having a blameless culture is that you

318
00:16:27.240 --> 00:16:31.120
<v Speaker 4>get really engaged response and everyone's going to work together

319
00:16:31.240 --> 00:16:33.679
<v Speaker 4>to try to address the issue. I think that even

320
00:16:33.799 --> 00:16:39.240
<v Speaker 4>cascades down to customer communication as well, because when you're

321
00:16:39.279 --> 00:16:42.399
<v Speaker 4>really engaged in trying to do that, then you're doing

322
00:16:42.440 --> 00:16:44.360
<v Speaker 4>the best thing for the customers as well, because you're

323
00:16:44.360 --> 00:16:48.000
<v Speaker 4>trying to address these issues head on and not try

324
00:16:48.039 --> 00:16:49.919
<v Speaker 4>to find ways to kind of smooth them out under

325
00:16:49.919 --> 00:16:50.399
<v Speaker 4>the surface.

326
00:16:50.799 --> 00:16:53.960
<v Speaker 1>Yeah, it also and this is important, and sometimes I

327
00:16:53.960 --> 00:16:56.879
<v Speaker 1>think people hear this and they're going to go that

328
00:16:56.960 --> 00:16:59.600
<v Speaker 1>sounds a little scary, But you want people to take

329
00:16:59.799 --> 00:17:03.240
<v Speaker 1>chances sometimes, right, you want people to kind of take

330
00:17:03.279 --> 00:17:06.200
<v Speaker 1>a shot at making things better. That opens it up

331
00:17:06.200 --> 00:17:09.000
<v Speaker 1>to them to do that, right, it's oh, well, you know,

332
00:17:09.200 --> 00:17:12.039
<v Speaker 1>I tried this tweak on the Jenkinsfile, or I

333
00:17:12.160 --> 00:17:15.279
<v Speaker 1>tried this tweak on the Kubernetes setup, or I tried

334
00:17:15.279 --> 00:17:17.960
<v Speaker 1>this tweak on this other thing, and a lot of

335
00:17:18.000 --> 00:17:20.079
<v Speaker 1>times those things pay off. But if you don't give

336
00:17:20.119 --> 00:17:22.680
<v Speaker 1>people the freedom to go for it, a lot of

337
00:17:22.680 --> 00:17:24.160
<v Speaker 1>times you're going to miss out on a lot of

338
00:17:24.160 --> 00:17:27.400
<v Speaker 1>those benefits. And again, as long as they're not being

339
00:17:27.480 --> 00:17:29.920
<v Speaker 1>reckless about it, right, so they're taking the steps, they're

340
00:17:30.000 --> 00:17:31.960
<v Speaker 1>verifying it on their own system and things like that.

341
00:17:32.640 --> 00:17:35.759
<v Speaker 1>Then you benefit much much more from people being willing

342
00:17:35.799 --> 00:17:38.720
<v Speaker 1>to take a shot. So yeah, so with the blameless culture,

343
00:17:38.759 --> 00:17:41.759
<v Speaker 1>I'm curious. So you get together and you start identifying

344
00:17:41.759 --> 00:17:43.920
<v Speaker 1>what the issue is. So what does that look like

345
00:17:44.000 --> 00:17:47.359
<v Speaker 1>then as far as figuring out what's going on, because

346
00:17:47.359 --> 00:17:50.200
<v Speaker 1>you're not pointing fingers, but you are looking for the

347
00:17:50.240 --> 00:17:52.160
<v Speaker 1>commit that made the problem, right?

348
00:17:53.160 --> 00:17:55.440
<v Speaker 4>You are. I think at the end of the day, you're going

349
00:17:55.519 --> 00:17:57.880
<v Speaker 4>to try to find the root cause. Right, You're going

350
00:17:57.960 --> 00:17:59.519
<v Speaker 4>to look for that commit, You're going to look for

351
00:17:59.559 --> 00:18:02.440
<v Speaker 4>the log. Maybe it was a script that was logged

352
00:18:02.440 --> 00:18:06.880
<v Speaker 4>into your logging system, whatever it is, You're going to

353
00:18:06.920 --> 00:18:09.480
<v Speaker 4>look for that and look for the root cause. So honestly,

354
00:18:09.480 --> 00:18:11.599
<v Speaker 4>a lot of times, you know, maybe what caused the

355
00:18:11.640 --> 00:18:14.400
<v Speaker 4>issue was something that was specifically

356
00:18:14.440 --> 00:18:17.079
<v Speaker 4>run by a specific person, and they probably feel a

357
00:18:17.119 --> 00:18:19.200
<v Speaker 4>little bit of guilt there, but there's no reason.

358
00:18:18.920 --> 00:18:19.920
<v Speaker 3>to lay on more there.

359
00:18:20.160 --> 00:18:22.880
<v Speaker 4>And I think everyone, like you said, feels a lot

360
00:18:22.880 --> 00:18:26.759
<v Speaker 4>of responsibility around the work that they're doing already, so

361
00:18:26.839 --> 00:18:30.160
<v Speaker 4>there's no reason to overemphasize that. So what that looks

362
00:18:30.240 --> 00:18:34.799
<v Speaker 4>like is typically the team that is impacted is really

363
00:18:34.839 --> 00:18:37.799
<v Speaker 4>going to own that post mortem, and that's one way

364
00:18:37.960 --> 00:18:42.000
<v Speaker 4>for you to feel like you're resolving the incident or

365
00:18:42.039 --> 00:18:45.160
<v Speaker 4>the issue that caused the incident. So this has

366
00:18:45.200 --> 00:18:47.440
<v Speaker 4>definitely become a bit of a different

367
00:18:47.440 --> 00:18:48.640
<v Speaker 4>process as the team is growing.

368
00:18:48.680 --> 00:18:50.160
<v Speaker 3>When we were at thirty,

369
00:18:50.039 --> 00:18:52.119
<v Speaker 4>I think it was a little bit easier just to know

370
00:18:52.200 --> 00:18:55.240
<v Speaker 4>exactly who should work on those types of mitigations. It

371
00:18:55.279 --> 00:18:59.279
<v Speaker 4>typically was pretty isolated to a specific team. As

372
00:18:59.359 --> 00:19:02.079
<v Speaker 4>the team has been growing and the system is growing,

373
00:19:02.480 --> 00:19:05.480
<v Speaker 4>that's definitely become more of a challenge because sometimes incidents

374
00:19:05.519 --> 00:19:08.960
<v Speaker 4>happen because of different issues that multiple teams have introduced, or

375
00:19:09.000 --> 00:19:11.359
<v Speaker 4>maybe there's multiple teams that need to be involved in

376
00:19:11.400 --> 00:19:15.000
<v Speaker 4>the mitigation, and in that case, we've definitely

377
00:19:15.000 --> 00:19:17.720
<v Speaker 4>been trying to evolve our post mortem process and the

378
00:19:17.759 --> 00:19:21.200
<v Speaker 4>action items. So we have a program manager that one

379
00:19:21.240 --> 00:19:24.920
<v Speaker 4>of her responsibilities is specifically around making sure that we

380
00:19:24.960 --> 00:19:28.839
<v Speaker 4>are coordinating some of those efforts and meeting some SLAs.

381
00:19:28.880 --> 00:19:34.599
<v Speaker 4>So we had to add some additional rules and coordination around

382
00:19:34.960 --> 00:19:38.079
<v Speaker 4>the process as we've started to grow. A

383
00:19:38.079 --> 00:19:41.200
<v Speaker 4>lot of it was just on the individual teams initially,

384
00:19:41.240 --> 00:19:44.119
<v Speaker 4>and now as we've grown again, there's more processes involved.

385
00:19:44.200 --> 00:19:46.039
<v Speaker 4>I think that's a pretty common thing that you have to

386
00:19:46.079 --> 00:19:47.559
<v Speaker 4>introduce as teams grow.

387
00:19:48.000 --> 00:19:51.559
<v Speaker 7>I will say that if you've got relatives who are

388
00:19:51.599 --> 00:19:56.039
<v Speaker 7>in the medical profession, especially if they're pathologists, even the

389
00:19:56.160 --> 00:19:59.119
<v Speaker 7>use of the term post mortem makes me

390
00:19:59.200 --> 00:20:03.359
<v Speaker 7>uncomfortable because those are no fun at all. But yeah,

391
00:20:03.400 --> 00:20:07.240
<v Speaker 7>it's also a word that we use. So yeah, it's oh,

392
00:20:07.759 --> 00:20:10.799
<v Speaker 7>it just makes me. Oh, it's creepy. It's all zombies.

393
00:20:11.160 --> 00:20:11.519
<v Speaker 2>I don't know.

394
00:20:11.680 --> 00:20:16.960
<v Speaker 7>Yeah, the post mortem brings me flashbacks to episodes of

395
00:20:16.960 --> 00:20:19.400
<v Speaker 7>The X-Files in the nineties when Dana Scully was

396
00:20:19.440 --> 00:20:20.680
<v Speaker 7>taking an alien apart.

397
00:20:20.960 --> 00:20:24.119
<v Speaker 1>Yeah, but it does give you a little perspective too, right,

398
00:20:24.200 --> 00:20:27.440
<v Speaker 1>because usually in our post mortems, we're talking about what

399
00:20:27.480 --> 00:20:30.440
<v Speaker 1>went wrong with the system, not that somebody actually died

400
00:20:30.519 --> 00:20:31.720
<v Speaker 1>because of this, right?

401
00:20:31.799 --> 00:20:33.519
<v Speaker 5>I've just got a weird brain, right? That's what my

402
00:20:33.519 --> 00:20:34.319
<v Speaker 5>brain thinks.

403
00:20:34.119 --> 00:20:39.279
<v Speaker 1>Well, some software is life-supporting, you know,

404
00:20:39.680 --> 00:20:41.640
<v Speaker 1>a lot of the medical equipment and stuff out there.

405
00:20:41.680 --> 00:20:44.960
<v Speaker 1>But you know, in this case, yeah, we all want

406
00:20:44.960 --> 00:20:47.400
<v Speaker 1>to keep our jobs as well, so I mean, it's

407
00:20:47.400 --> 00:20:49.920
<v Speaker 1>not like we can just blow it off either. So yeah,

408
00:20:49.960 --> 00:20:53.000
<v Speaker 1>So I want to get back to the topic at hand, though,

409
00:20:53.079 --> 00:20:56.319
<v Speaker 1>and talk a little bit about what kind of monitoring

410
00:20:56.400 --> 00:20:58.200
<v Speaker 1>did you have before and what kind of monitoring do

411
00:20:58.200 --> 00:21:00.799
<v Speaker 1>you have now in order to catch this kind of thing.

412
00:21:01.359 --> 00:21:05.279
<v Speaker 4>So we use a number of different types of monitoring.

413
00:21:05.480 --> 00:21:09.240
<v Speaker 4>At the time, we used a lot. We were pretty

414
00:21:09.240 --> 00:21:13.240
<v Speaker 4>heavily reliant on exception tracking, and we also had some

415
00:21:13.279 --> 00:21:17.559
<v Speaker 4>application and performance monitoring as well, commonly called APM. A

416
00:21:17.599 --> 00:21:20.079
<v Speaker 4>couple examples of that would be something like New Relic

417
00:21:20.240 --> 00:21:22.720
<v Speaker 4>or Datadog as a product as well. And

418
00:21:22.759 --> 00:21:26.160
<v Speaker 4>then we did also use a StatsD cluster that

419
00:21:26.480 --> 00:21:29.759
<v Speaker 4>sent metrics over to Datadog, and I think we

420
00:21:30.160 --> 00:21:33.200
<v Speaker 4>just had started using that maybe just a few months

421
00:21:33.240 --> 00:21:37.480
<v Speaker 4>before this particular incident occurred. So, like I alluded to before,

422
00:21:37.599 --> 00:21:40.720
<v Speaker 4>we had some monitors for this particular issue,

423
00:21:41.039 --> 00:21:44.799
<v Speaker 4>but they were pretty simplistic. They basically just looked for

424
00:21:44.839 --> 00:21:48.240
<v Speaker 4>a minimum threshold of the number of reports that we're creating,

425
00:21:48.759 --> 00:21:51.240
<v Speaker 4>and we had to set that threshold.

426
00:21:50.759 --> 00:21:53.000
<v Speaker 3>to be very low over like an hour period,

427
00:21:53.160 --> 00:21:56.839
<v Speaker 4>because traffic is variable. You never know exactly how many

428
00:21:56.880 --> 00:21:59.480
<v Speaker 4>reports you're going to get created. There are times of day

429
00:21:59.519 --> 00:22:02.720
<v Speaker 4>where we receive very few requests, and then there's other

430
00:22:02.720 --> 00:22:05.119
<v Speaker 4>times where we see large spikes. So we just had

431
00:22:05.240 --> 00:22:09.359
<v Speaker 4>very simplistic monitoring in place for some of these key

432
00:22:09.880 --> 00:22:12.559
<v Speaker 4>metrics at that point, and at that point we were

433
00:22:12.599 --> 00:22:15.960
<v Speaker 4>still very heavily reliant on, like I said, exception tracking

434
00:22:16.319 --> 00:22:20.440
<v Speaker 4>using bug trackers like Sentry that could then

435
00:22:20.519 --> 00:22:23.480
<v Speaker 4>alert if you had certain thresholds of number of errors

436
00:22:23.680 --> 00:22:26.960
<v Speaker 4>over a period of time. In this particular case, exception

437
00:22:27.000 --> 00:22:29.519
<v Speaker 4>tracking isn't very useful because we were responding with a

438
00:22:29.680 --> 00:22:32.839
<v Speaker 4>404. There actually was an

439
00:22:32.880 --> 00:22:34.119
<v Speaker 4>exception in the system.

440
00:22:34.319 --> 00:22:34.839
<v Speaker 2>It was just

441
00:22:35.359 --> 00:22:39.400
<v Speaker 4>an ActiveRecord::RecordNotFound, something like that, that was

442
00:22:39.440 --> 00:22:42.039
<v Speaker 4>then handled automatically, and we then responded with the 404.

443
00:22:42.440 --> 00:22:46.079
<v Speaker 4>So it wasn't expected behavior, but there wasn't an exception

444
00:22:46.160 --> 00:22:47.039
<v Speaker 4>that could have been caught.

445
00:22:47.359 --> 00:22:50.039
<v Speaker 1>Yeah, that makes sense. Somebody typed this question in. It

446
00:22:50.119 --> 00:22:52.559
<v Speaker 1>was one of the panelists. Did you get that answered?

447
00:22:52.759 --> 00:22:54.119
<v Speaker 1>I don't know if it was Luke or Dave.

448
00:22:54.039 --> 00:22:58.400
<v Speaker 5>It was me. Just to be clear, was this incident... was

449
00:22:58.400 --> 00:23:03.319
<v Speaker 5>it a monitoring problem or an alerting problem? Because it

450
00:23:03.359 --> 00:23:05.920
<v Speaker 5>sounds like an alert did go off at some point.

451
00:23:06.400 --> 00:23:08.680
<v Speaker 6>Sounds like it was a people problem because they snoozed

452
00:23:08.720 --> 00:23:09.240
<v Speaker 6>the alert.

453
00:23:09.640 --> 00:23:12.880
<v Speaker 4>I think this was more of a monitoring problem overall.

454
00:23:13.400 --> 00:23:18.160
<v Speaker 4>As Dave mentioned, there was a component where

455
00:23:18.079 --> 00:23:21.640
<v Speaker 3>a page was snoozed, but I

456
00:23:21.519 --> 00:23:24.480
<v Speaker 4>think that was still a failure of our

457
00:23:25.680 --> 00:23:31.240
<v Speaker 4>monitoring because in this case that was just a signal

458
00:23:31.559 --> 00:23:34.839
<v Speaker 4>of what the true issue was. It was a downstream

459
00:23:34.920 --> 00:23:38.680
<v Speaker 4>client application that had had a page earlier on and

460
00:23:38.720 --> 00:23:41.759
<v Speaker 4>it wasn't clear at all what the issue was.

461
00:23:42.119 --> 00:23:45.519
<v Speaker 4>And I think when you're developing a system

462
00:23:45.640 --> 00:23:49.559
<v Speaker 4>for alerting, you need to have clear action items. So

463
00:23:49.599 --> 00:23:53.119
<v Speaker 4>that's where custom metrics, building

464
00:23:53.200 --> 00:23:56.920
<v Speaker 4>application metrics as you grow, become really important:

465
00:23:58.079 --> 00:24:01.559
<v Speaker 4>having a clear signal of what's wrong so that

466
00:24:01.559 --> 00:24:05.160
<v Speaker 4>someone knows where to investigate. In this case, it was

467
00:24:05.200 --> 00:24:08.440
<v Speaker 4>a client application in the browser. There's a lot of noise

468
00:24:08.480 --> 00:24:12.720
<v Speaker 4>there and I can easily understand why someone would just

469
00:24:12.880 --> 00:24:15.720
<v Speaker 4>snooze something like that. In my opinion, it wasn't really

470
00:24:15.720 --> 00:24:17.599
<v Speaker 4>a people issue in this particular case.

471
00:24:18.240 --> 00:24:21.559
<v Speaker 6>Yeah, I think we've all been there before where we

472
00:24:21.640 --> 00:24:26.640
<v Speaker 6>get an alert from whatever monitoring that we're doing and

473
00:24:26.759 --> 00:24:29.559
<v Speaker 6>the error looks serious, but you kind of read it

474
00:24:29.599 --> 00:24:31.519
<v Speaker 6>and like, oh, you know what, this is probably just

475
00:24:31.559 --> 00:24:36.200
<v Speaker 6>a one off situation, and then turns out it is

476
00:24:36.240 --> 00:24:39.079
<v Speaker 6>actually a big deal that needs to be addressed as

477
00:24:39.119 --> 00:24:42.759
<v Speaker 6>soon as possible. So I know, I've been there before,

478
00:24:43.440 --> 00:24:48.119
<v Speaker 6>and, you know, it's hard at times to really track this.

479
00:24:48.640 --> 00:24:53.640
<v Speaker 6>I use Sentry for my error tracking, and so I

480
00:24:53.680 --> 00:24:57.319
<v Speaker 6>get email text notifications with that. And one of the

481
00:24:57.440 --> 00:24:59.839
<v Speaker 6>nice things about it is that it'll show the number

482
00:24:59.839 --> 00:25:03.839
<v Speaker 6>of occurrences, whether they are unique or not, so I

483
00:25:03.839 --> 00:25:07.720
<v Speaker 6>can see if okay, this particular error is only coming

484
00:25:07.759 --> 00:25:12.440
<v Speaker 6>from one user, or I could see we're getting one

485
00:25:12.519 --> 00:25:16.559
<v Speaker 6>hundred errors that are coming from one hundred different users, so

486
00:25:16.599 --> 00:25:20.279
<v Speaker 6>there's a more widespread problem. So I think, you know,

487
00:25:20.480 --> 00:25:25.319
<v Speaker 6>definitely getting the notifications, but then having proper analytics on

488
00:25:25.400 --> 00:25:28.799
<v Speaker 6>your errors so you can actually see the scope of

489
00:25:28.799 --> 00:25:32.160
<v Speaker 6>how big this is can really kind of weigh in

490
00:25:32.240 --> 00:25:33.240
<v Speaker 6>on the importance.

491
00:25:33.799 --> 00:25:35.960
<v Speaker 5>Yeah, makes sense. I imagine, Dave,

492
00:25:36.039 --> 00:25:41.920
<v Speaker 7>you've been through, like me, many different monitoring platforms: Datadog,

493
00:25:42.680 --> 00:25:45.759
<v Speaker 7>you said, New Relic. You know, which...

494
00:25:46.279 --> 00:25:49.559
<v Speaker 7>Which are the good monitoring platforms for which one? So

495
00:25:49.640 --> 00:25:52.519
<v Speaker 7>you're like, this is the platform that works really well

496
00:25:52.559 --> 00:25:54.079
<v Speaker 7>for this API situation.

497
00:25:54.559 --> 00:25:57.400
<v Speaker 6>I think it all depends on what you're doing. So

498
00:25:57.599 --> 00:26:01.400
<v Speaker 6>if you have a heavy JavaScript front-end kind of deal,

499
00:26:01.599 --> 00:26:04.279
<v Speaker 6>and if you also have a lot of Ruby back-end code,

500
00:26:04.720 --> 00:26:07.799
<v Speaker 6>I know Sentry can handle both of those situations.

501
00:26:08.880 --> 00:26:14.960
<v Speaker 6>Other people will go with another solution. So I personally

502
00:26:15.079 --> 00:26:19.519
<v Speaker 6>found Sentry to be my flavor of choice, but you know,

503
00:26:19.640 --> 00:26:22.200
<v Speaker 6>mileage will vary based on what other people have.

504
00:26:22.720 --> 00:26:26.319
<v Speaker 4>It also depends on where you are in terms

505
00:26:26.400 --> 00:26:31.640
<v Speaker 4>of your applications, use cases, what customers, what the customer

506
00:26:31.680 --> 00:26:34.799
<v Speaker 4>profile looks like, how large the company has grown, how

507
00:26:34.799 --> 00:26:37.640
<v Speaker 4>many people are supporting it. When you're early on, when

508
00:26:37.640 --> 00:26:42.839
<v Speaker 4>you're building a new application, new product, by definition, the

509
00:26:43.599 --> 00:26:45.519
<v Speaker 4>developers on that are going to really understand the full

510
00:26:45.640 --> 00:26:50.480
<v Speaker 4>system very well. So exception tracking probably is going

511
00:26:50.559 --> 00:26:52.799
<v Speaker 4>to be able to give you most of what you

512
00:26:52.880 --> 00:26:55.440
<v Speaker 4>need to know in terms of understanding what's going on.

513
00:26:55.960 --> 00:26:59.119
<v Speaker 4>As the system starts to grow, and especially as you

514
00:26:59.200 --> 00:27:02.960
<v Speaker 4>have more great teams, I think that's where things like

515
00:27:03.079 --> 00:27:06.720
<v Speaker 4>StatsD become more useful because you need to be

516
00:27:06.759 --> 00:27:10.720
<v Speaker 4>able to set up specific use cases for core parts

517
00:27:10.720 --> 00:27:13.039
<v Speaker 4>of your application. And I would maybe say that the

518
00:27:13.079 --> 00:27:14.759
<v Speaker 4>bar there is maybe when you start to hit the

519
00:27:14.799 --> 00:27:17.359
<v Speaker 4>point where you start to have a significant number of

520
00:27:17.400 --> 00:27:21.039
<v Speaker 4>paying customers using specific features, maybe you need to start

521
00:27:21.079 --> 00:27:24.000
<v Speaker 4>to hone in on one or two key processes where,

522
00:27:24.200 --> 00:27:28.079
<v Speaker 4>if they break, it's absolutely critical that you know immediately. That's

523
00:27:28.160 --> 00:27:30.200
<v Speaker 4>kind of the point that Checker was at in twenty seventeen.

524
00:27:30.200 --> 00:27:33.759
<v Speaker 4>We really needed to have high intelligence, or very clear

525
00:27:33.799 --> 00:27:39.559
<v Speaker 4>intelligence and visibility into specific parts of our system, and

526
00:27:39.079 --> 00:27:42.640
<v Speaker 4>we were trying to move in that direction when the incident happened.

527
00:27:42.720 --> 00:27:45.440
<v Speaker 4>We've continued to invest in that area going forward. I

528
00:27:45.440 --> 00:27:48.880
<v Speaker 4>think it's become even more important as we're getting larger

529
00:27:49.119 --> 00:27:52.160
<v Speaker 4>because there's just so many different systems that are interacting

530
00:27:52.359 --> 00:27:55.759
<v Speaker 4>together that no one really understands the whole system at

531
00:27:55.759 --> 00:27:58.480
<v Speaker 4>this point. And the only way to really know how

532
00:27:58.519 --> 00:28:01.160
<v Speaker 4>the different systems are working together, and make sure

533
00:28:01.160 --> 00:28:04.759
<v Speaker 4>everything's working properly, is to have some of these custom

534
00:28:04.799 --> 00:28:07.799
<v Speaker 4>metrics defined for specific key processes.

535
00:28:08.119 --> 00:28:11.440
<v Speaker 5>Do you find that putting really large screens on the

536
00:28:11.480 --> 00:28:15.119
<v Speaker 5>office wall helps make your application more reliable?

537
00:28:15.200 --> 00:28:16.720
<v Speaker 3>That's a good question. We don't.

538
00:28:16.880 --> 00:28:19.759
<v Speaker 4>We're all remote now, so at this point, we haven't had

539
00:28:19.759 --> 00:28:22.279
<v Speaker 4>a chance to experiment with that. We did have some of those

540
00:28:22.680 --> 00:28:25.240
<v Speaker 4>in our office. I think I've been trying to find

541
00:28:25.279 --> 00:28:28.359
<v Speaker 4>ways to make that more visible and make metrics more

542
00:28:28.480 --> 00:28:31.000
<v Speaker 4>visible to our team as we've shifted to

543
00:28:31.079 --> 00:28:34.559
<v Speaker 4>one hundred percent remote due to the pandemic. There's also

544
00:28:34.599 --> 00:28:38.640
<v Speaker 4>a challenge for our business in particular, where

545
00:28:38.720 --> 00:28:41.400
<v Speaker 4>many of our processes are very asynchronous and

546
00:28:41.440 --> 00:28:46.079
<v Speaker 4>they could take hours to days to fully execute, and

547
00:28:46.160 --> 00:28:48.720
<v Speaker 4>so finding ways to short circuit and know that those

548
00:28:48.759 --> 00:28:52.359
<v Speaker 4>things are broken can be challenging at times. So one

549
00:28:52.400 --> 00:28:53.559
<v Speaker 4>of the things we have to do is we have

550
00:28:53.599 --> 00:28:56.240
<v Speaker 4>to look at the data over time as well and

551
00:28:56.279 --> 00:28:59.039
<v Speaker 4>not just look at real time metrics. So one thing

552
00:28:59.039 --> 00:29:02.079
<v Speaker 4>I've been experimenting with is trying to create more automated

553
00:29:02.119 --> 00:29:04.839
<v Speaker 4>reports that go into sort of a Slack channel that

554
00:29:04.880 --> 00:29:07.039
<v Speaker 4>we can look at and so people can review that.

555
00:29:07.160 --> 00:29:11.160
<v Speaker 4>And we've also implemented basically a biweekly review

556
00:29:11.240 --> 00:29:13.920
<v Speaker 4>during our retro where we just look at our metrics

557
00:29:14.079 --> 00:29:16.559
<v Speaker 4>and some of the longer-running trends so that

558
00:29:16.599 --> 00:29:19.799
<v Speaker 4>we can see if those look correct, is there anything

559
00:29:19.799 --> 00:29:21.720
<v Speaker 4>that's wrong, We can talk about it, see if there's

560
00:29:21.759 --> 00:29:24.039
<v Speaker 4>things that we want to actually action on based on

561
00:29:24.359 --> 00:29:27.480
<v Speaker 4>that review. So we're trying to find some ways to

562
00:29:27.920 --> 00:29:30.039
<v Speaker 4>do check ins that don't require us to be all

563
00:29:30.039 --> 00:29:30.519
<v Speaker 4>in office.

564
00:29:30.960 --> 00:29:34.680
<v Speaker 7>The Slack channel truly is the giant performance monitor of

565
00:29:35.599 --> 00:29:38.640
<v Speaker 7>twenty twenty. That is literally what tells me

566
00:29:39.240 --> 00:29:41.559
<v Speaker 7>whether stuff is working at the moment. I'm thinking a

567
00:29:41.640 --> 00:29:44.039
<v Speaker 7>lot of people are in the same boat. So it sounds

568
00:29:44.079 --> 00:29:45.839
<v Speaker 7>like you were saying that once you get to

569
00:29:45.839 --> 00:29:49.839
<v Speaker 7>a certain stage, then the off-the-shelf monitoring isn't really

570
00:29:49.880 --> 00:29:53.640
<v Speaker 7>going to cut it. So you have written custom monitoring

571
00:29:53.920 --> 00:29:55.440
<v Speaker 7>for your application.

572
00:29:55.599 --> 00:29:56.279
<v Speaker 5>Is that correct?

573
00:29:56.480 --> 00:30:01.680
<v Speaker 4>We have implemented what I'd consider custom metrics. We use

574
00:30:01.759 --> 00:30:04.279
<v Speaker 4>Datadog, so a lot of this is out of

575
00:30:04.319 --> 00:30:08.559
<v Speaker 4>the box. You can use their implementation, but you're

576
00:30:08.599 --> 00:30:12.720
<v Speaker 4>adding some code to parts of your application. Maybe

577
00:30:12.759 --> 00:30:15.920
<v Speaker 4>it's a callback on your Active Record model.

578
00:30:16.000 --> 00:30:19.000
<v Speaker 4>When something is created, you send a message to a

579
00:30:19.279 --> 00:30:22.440
<v Speaker 4>queue, and then that triggers a message over into StatsD

580
00:30:23.319 --> 00:30:26.000
<v Speaker 4>that goes to Datadog. Anyway,

581
00:30:26.319 --> 00:30:29.279
<v Speaker 4>it's a pretty lightweight implementation in terms of what

582
00:30:29.279 --> 00:30:32.200
<v Speaker 4>you can do. But you're adding specific events that you

583
00:30:32.240 --> 00:30:34.640
<v Speaker 4>want to track, and then you can create

584
00:30:34.640 --> 00:30:38.440
<v Speaker 4>your own monitors and alerting around those or correlations between

585
00:30:38.480 --> 00:30:42.359
<v Speaker 4>different events in your system. So you could potentially

586
00:30:42.400 --> 00:30:45.440
<v Speaker 4>look at a custom metric and then look at that

587
00:30:45.599 --> 00:30:49.440
<v Speaker 4>compared to HTTP statuses that are coming through or the

588
00:30:49.559 --> 00:30:52.519
<v Speaker 4>latency of an endpoint and then you could correlate those

589
00:30:52.519 --> 00:30:56.359
<v Speaker 4>two metrics as well. So there's there's some more advanced

590
00:30:56.359 --> 00:30:58.200
<v Speaker 4>things you can do there as well if you need to.

591
00:30:58.440 --> 00:31:01.039
<v Speaker 4>But again it's not really a lot of custom work.

592
00:31:01.160 --> 00:31:04.000
<v Speaker 4>It's just adding some specific points in your codebase

593
00:31:04.079 --> 00:31:06.599
<v Speaker 4>that you feel like are really important to track. And

594
00:31:06.799 --> 00:31:09.799
<v Speaker 4>one example of this for Rails users is I believe

595
00:31:09.839 --> 00:31:12.839
<v Speaker 4>there's something like this already set up for Datadog

596
00:31:13.079 --> 00:31:13.880
<v Speaker 4>for Sidekiq.

597
00:31:13.920 --> 00:31:15.839
<v Speaker 3>So we instrument a lot of our

598
00:31:15.680 --> 00:31:18.799
<v Speaker 4>Sidekiq jobs, and we can see when the backlog is

599
00:31:18.839 --> 00:31:21.240
<v Speaker 4>growing on one of those queues, we can see what

600
00:31:21.279 --> 00:31:25.920
<v Speaker 4>the average completion time is, or the p90 completion time

601
00:31:25.920 --> 00:31:28.559
<v Speaker 4>for different types of jobs. So you get a lot

602
00:31:28.599 --> 00:31:32.680
<v Speaker 4>of visibility into your Sidekiq workers and processes very easily,

603
00:31:32.960 --> 00:31:33.799
<v Speaker 4>basically for free.

604
00:31:33.960 --> 00:31:37.039
<v Speaker 6>And if you're going to use Slack for your error notification,

605
00:31:37.720 --> 00:31:39.920
<v Speaker 6>now, I'm not dissing that at all. No, I have

606
00:31:39.960 --> 00:31:43.640
<v Speaker 6>a few applications that actually do that. It just triggers

607
00:31:43.680 --> 00:31:47.599
<v Speaker 6>a Slack notification. But if you're only capturing the error

608
00:31:47.640 --> 00:31:50.839
<v Speaker 6>message and not a stack trace along with it, then

609
00:31:50.880 --> 00:31:54.720
<v Speaker 6>that error message is pretty much useless because it tells

610
00:31:54.759 --> 00:31:57.920
<v Speaker 6>you you have a problem somewhere in your millions of

611
00:31:57.960 --> 00:31:59.680
<v Speaker 6>lines of code, but we're not going to tell you

612
00:31:59.720 --> 00:32:00.279
<v Speaker 6>where.

613
00:32:00.839 --> 00:32:04.359
<v Speaker 4>Just to be clear, we capture all of our

614
00:32:04.559 --> 00:32:08.000
<v Speaker 4>errors in Sentry. We do have some alerting via Slack.

615
00:32:08.599 --> 00:32:13.359
<v Speaker 4>But I would also want to emphasize that anything that

616
00:32:13.400 --> 00:32:17.519
<v Speaker 4>truly has any chance of being a serious issue should

617
00:32:17.519 --> 00:32:20.799
<v Speaker 4>never be just an email or a Sentry

618
00:32:20.799 --> 00:32:24.119
<v Speaker 4>alert or, sorry, a Slack alert. You really should have

619
00:32:24.240 --> 00:32:28.480
<v Speaker 4>some kind of escalation via either maybe it's text, maybe

620
00:32:28.519 --> 00:32:32.000
<v Speaker 4>it's an actual incident response system like PagerDuty where

621
00:32:32.039 --> 00:32:33.519
<v Speaker 4>you can have an escalation policy.

622
00:32:33.759 --> 00:32:35.319
<v Speaker 3>For us, that's what we're using.

623
00:32:35.400 --> 00:32:39.160
<v Speaker 4>It should have this synchronous alerting that really forces someone

624
00:32:39.200 --> 00:32:41.960
<v Speaker 4>to look at it. You can't rely on something asynchronous

625
00:32:42.079 --> 00:32:46.279
<v Speaker 4>like Slack in this case for serious response on issues.

626
00:32:46.680 --> 00:32:48.200
<v Speaker 2>This is a little off topic,

627
00:32:48.240 --> 00:32:50.839
<v Speaker 6>But you know what issue I found with that is

628
00:32:51.200 --> 00:32:54.000
<v Speaker 6>I use my cell phone for everything. It's where I

629
00:32:54.000 --> 00:32:57.319
<v Speaker 6>have my email, get my text messages, phone calls and

630
00:32:57.359 --> 00:32:59.720
<v Speaker 6>all that stuff. So I would like to keep it

631
00:32:59.759 --> 00:33:03.480
<v Speaker 6>on full volume late at night when I'm sleeping, so

632
00:33:03.680 --> 00:33:08.000
<v Speaker 6>if a critical does arrive, then I can get notified.

633
00:33:08.480 --> 00:33:11.759
<v Speaker 6>But my issue is that I would never get any

634
00:33:11.880 --> 00:33:14.519
<v Speaker 6>sleep because my phone would just go off so I

635
00:33:14.559 --> 00:33:16.920
<v Speaker 6>need to figure out some way that I can set

636
00:33:17.000 --> 00:33:21.960
<v Speaker 6>up for a particular phone number or something to override

637
00:33:22.480 --> 00:33:25.559
<v Speaker 6>any kind of sleep mode or whatever that I have

638
00:33:25.640 --> 00:33:28.240
<v Speaker 6>on my phone right now, or get a different phone

639
00:33:28.240 --> 00:33:30.880
<v Speaker 6>for that purpose. That seems a bit overkill, doesn't it?

640
00:33:30.559 --> 00:33:32.759
<v Speaker 4>You can actually do that. You can do that,

641
00:33:32.799 --> 00:33:35.440
<v Speaker 4>I believe, at least with iOS. You can set

642
00:33:35.480 --> 00:33:38.599
<v Speaker 4>up an override where you snooze everything else, and then

643
00:33:38.640 --> 00:33:40.000
<v Speaker 4>you can set up and you have to just put

644
00:33:40.039 --> 00:33:43.359
<v Speaker 4>it in your personal contacts, whatever numbers you think you're

645
00:33:43.400 --> 00:33:47.000
<v Speaker 4>going to receive critical notifications from, and then that'll actually

646
00:33:47.079 --> 00:33:47.480
<v Speaker 4>ring through.

647
00:33:47.559 --> 00:33:49.759
<v Speaker 6>All right, I need to quit being lazy then and just

648
00:33:49.839 --> 00:33:50.279
<v Speaker 6>do that.

649
00:33:50.519 --> 00:33:53.160
<v Speaker 7>Back in twenty fifteen, I was working in the States

650
00:33:53.400 --> 00:33:57.720
<v Speaker 7>and due to various issues, I was still responsible, thankfully

651
00:33:57.759 --> 00:34:00.240
<v Speaker 7>for a bunch of servers in the UK, and I'd

652
00:34:00.240 --> 00:34:02.720
<v Speaker 7>gone to see a film and put my phone on silent,

653
00:34:02.759 --> 00:34:06.599
<v Speaker 7>and of course all the servers melted halfway through Skyfall

654
00:34:06.759 --> 00:34:09.920
<v Speaker 7>or whatever movie it was. Tom Cruise did not alert

655
00:34:10.000 --> 00:34:12.800
<v Speaker 7>me of the impending server disaster while he was dealing

656
00:34:12.800 --> 00:34:15.639
<v Speaker 7>with the aliens. So I came out and everyone was

657
00:34:15.719 --> 00:34:20.079
<v Speaker 7>very upset. So I ended up writing custom alerting with

658
00:34:20.599 --> 00:34:25.360
<v Speaker 7>a custom app, using the Android Automator, that when

659
00:34:25.360 --> 00:34:29.000
<v Speaker 7>it received a text message with the magic string in

660
00:34:29.039 --> 00:34:33.079
<v Speaker 7>it, would actually, like, manically turn up the volume and

661
00:34:33.119 --> 00:34:38.760
<v Speaker 7>then play the Beatles' "Help!" at full volume. And that worked.

662
00:34:39.000 --> 00:34:42.719
<v Speaker 7>That worked very well. But what it didn't have, which I

663
00:34:42.880 --> 00:34:45.320
<v Speaker 7>like on the PagerDuty system, is the acknowledgement

664
00:34:45.920 --> 00:34:48.800
<v Speaker 7>so you can see, you know, yeah, I've sent the message.

665
00:34:48.840 --> 00:34:52.079
<v Speaker 7>Has that person seen that message and, you know, tapped

666
00:34:52.119 --> 00:34:55.000
<v Speaker 7>the "yes, I am aware the servers are melting" button.

667
00:34:55.400 --> 00:34:59.800
<v Speaker 1>Yeah, I've got I think it's the bedtime settings in iOS,

668
00:35:00.079 --> 00:35:02.239
<v Speaker 1>and yeah, I've just told it. If it's a number

669
00:35:02.280 --> 00:35:05.039
<v Speaker 1>in my contacts, then ring and if it's not, then don't.

670
00:35:05.239 --> 00:35:07.760
<v Speaker 1>So yeah, it'll go off, but it'll only go off

671
00:35:07.800 --> 00:35:10.000
<v Speaker 1>if it's, yeah, if it's in my contacts. So yeah,

672
00:35:10.039 --> 00:35:12.679
<v Speaker 1>then I just add whoever or whatever to my contacts

673
00:35:12.719 --> 00:35:13.639
<v Speaker 1>and I'm set.

674
00:35:13.840 --> 00:35:16.000
<v Speaker 6>Yeah that should work well for my use case because

675
00:35:16.079 --> 00:35:17.079
<v Speaker 6>no one ever calls me.

676
00:35:17.320 --> 00:35:17.519
<v Speaker 2>Yeah.

677
00:35:18.480 --> 00:35:22.400
<v Speaker 5>Right, So that's a tragic thing to say to you.

678
00:35:22.440 --> 00:35:22.559
<v Speaker 3>Now.

679
00:35:22.599 --> 00:35:26.480
<v Speaker 6>I had the Verizon call filter, which actually works pretty well.

680
00:35:26.639 --> 00:35:30.079
<v Speaker 6>It's reduced the fifteen to twenty phone calls I would

681
00:35:30.119 --> 00:35:32.039
<v Speaker 6>get a day down to like one.

682
00:35:32.320 --> 00:35:33.800
<v Speaker 2>Yeah, the iPhone has that feature

683
00:35:33.800 --> 00:35:36.400
<v Speaker 1>to where you can essentially tell it don't ring unless

684
00:35:36.400 --> 00:35:37.760
<v Speaker 1>the number's in my contacts.

685
00:35:37.920 --> 00:35:39.440
<v Speaker 2>Yeah, I got.

686
00:35:39.239 --> 00:35:42.000
<v Speaker 6>Burned by that pretty bad. One time. My wife was

687
00:35:42.039 --> 00:35:44.320
<v Speaker 6>over the polls. She had forgotten her phone or she

688
00:35:44.360 --> 00:35:50.559
<v Speaker 6>had lost her phone there, and because that random person

689
00:35:50.679 --> 00:35:53.280
<v Speaker 6>wasn't in my contacts, I never got her phone call.

690
00:35:54.000 --> 00:35:55.400
<v Speaker 2>My phone just stayed silent.

691
00:35:55.519 --> 00:35:57.519
<v Speaker 6>So I had to disable that pretty quick.

692
00:35:57.639 --> 00:35:58.280
<v Speaker 2>That'll teach you.

693
00:35:58.480 --> 00:36:03.119
<v Speaker 7>Can I ask you about composite monitors, because that is

694
00:36:03.159 --> 00:36:06.400
<v Speaker 7>a phrase I haven't heard before. I'm familiar with a

695
00:36:06.559 --> 00:36:10.400
<v Speaker 7>rate monitor, and my understanding of that is, if it drops

696
00:36:10.480 --> 00:36:13.079
<v Speaker 7>really quick, it goes off, but if it drops slowly,

697
00:36:13.119 --> 00:36:16.039
<v Speaker 7>it doesn't go off. But what is this composite monitor?

698
00:36:16.280 --> 00:36:22.000
<v Speaker 4>So a composite monitor is basically a combination of several different

699
00:36:22.119 --> 00:36:26.039
<v Speaker 4>metrics that you're measuring, chained together with "and"

700
00:36:26.079 --> 00:36:27.000
<v Speaker 3>or "or" statements.

701
00:36:27.719 --> 00:36:30.800
<v Speaker 4>So maybe referencing what I was talking about before,

702
00:36:30.800 --> 00:36:32.760
<v Speaker 4>you might want to have a custom metric that you're

703
00:36:32.800 --> 00:36:35.000
<v Speaker 4>looking at and you want to look at how many

704
00:36:35.039 --> 00:36:38.320
<v Speaker 4>of those are coming through, how many events are coming through?

705
00:36:38.800 --> 00:36:40.960
<v Speaker 3>And then you might also want to look at, in

706
00:36:40.880 --> 00:36:44.920
<v Speaker 4>this case, the error rate for HTTP status, maybe how

707
00:36:44.920 --> 00:36:48.480
<v Speaker 4>many four hundred errors you're getting relative to two hundreds.

708
00:36:48.559 --> 00:36:51.159
<v Speaker 4>You could basically do something where you have an "and"

709
00:36:51.239 --> 00:36:55.920
<v Speaker 4>statement between those two different measures and those Boolean evaluations,

710
00:36:56.199 --> 00:36:59.159
<v Speaker 4>or you could do something where you have an "or"

711
00:36:59.320 --> 00:37:02.159
<v Speaker 4>so you can say, these are basically signaling for the

712
00:37:02.199 --> 00:37:04.599
<v Speaker 4>same type of issue that I want to alert on,

713
00:37:04.760 --> 00:37:09.000
<v Speaker 4>but I'm going to look for these different conditions all

714
00:37:09.039 --> 00:37:09.800
<v Speaker 4>in the same monitor.

715
00:37:09.960 --> 00:37:12.599
<v Speaker 7>So you look at multiple different things at once. Is

716
00:37:12.639 --> 00:37:15.719
<v Speaker 7>that so that you could combine those to kind of

717
00:37:15.760 --> 00:37:19.360
<v Speaker 7>set effectively a much lower threshold and get higher signal

718
00:37:19.400 --> 00:37:22.159
<v Speaker 7>to noise. So you say something like, you know, well,

719
00:37:22.280 --> 00:37:26.239
<v Speaker 7>we'll allow some number of 404s, this number

720
00:37:26.280 --> 00:37:29.920
<v Speaker 7>of server load, this number of other errors. But if

721
00:37:29.960 --> 00:37:32.880
<v Speaker 7>you get all three at the same time, then it

722
00:37:32.920 --> 00:37:36.400
<v Speaker 7>triggers something different, or does it use a lower number?

723
00:37:36.840 --> 00:37:38.679
<v Speaker 5>What's the what's the result of that?

724
00:37:39.119 --> 00:37:41.559
<v Speaker 7>The advantage of using that logic instead of just saying,

725
00:37:41.679 --> 00:37:43.960
<v Speaker 7>here is the minimum number of 404s.

726
00:37:44.079 --> 00:37:47.320
<v Speaker 5>Here is here's the minimum, here's.

727
00:37:47.079 --> 00:37:52.079
<v Speaker 7>The maximum number of 404s, here's the maximum number of errors.

728
00:37:52.639 --> 00:37:55.719
<v Speaker 7>How does that actually translate into a better metric?

729
00:37:55.920 --> 00:37:56.119
<v Speaker 3>Right?

730
00:37:56.159 --> 00:37:57.960
<v Speaker 4>So, I think I think it gives you the ability

731
00:37:58.119 --> 00:38:01.960
<v Speaker 4>to tune things to potentially make something have

732
00:38:02.000 --> 00:38:04.880
<v Speaker 4>a higher fidelity of when it alerts, so you're not

733
00:38:04.920 --> 00:38:08.199
<v Speaker 4>getting one. You can set the thresholds actually higher and

734
00:38:08.679 --> 00:38:10.360
<v Speaker 4>keep things. It depends how you want to use it,

735
00:38:10.360 --> 00:38:12.280
<v Speaker 4>but you can. In this case, you could set the

736
00:38:12.320 --> 00:38:15.159
<v Speaker 4>thresholds higher, but you could have something where it's like, well,

737
00:38:15.199 --> 00:38:18.239
<v Speaker 4>if there aren't any errors coming through, then

738
00:38:18.760 --> 00:38:21.159
<v Speaker 4>maybe we're okay with that even though the numbers are

739
00:38:21.159 --> 00:38:23.519
<v Speaker 4>a little bit lower, or you can do things where

740
00:38:23.599 --> 00:38:25.559
<v Speaker 4>you can be more, and again, you can also tune

741
00:38:25.559 --> 00:38:28.599
<v Speaker 4>this to be more sensitive. In this particular incident, if

742
00:38:28.599 --> 00:38:32.199
<v Speaker 4>we had had some error monitoring around four hundreds in

743
00:38:32.239 --> 00:38:35.199
<v Speaker 4>addition to the threshold that we had that was pretty low,

744
00:38:35.440 --> 00:38:37.280
<v Speaker 4>I think we would have been triggered on we would

745
00:38:37.280 --> 00:38:40.760
<v Speaker 4>have been alerted on that within maybe an hour. So

746
00:38:41.280 --> 00:38:44.000
<v Speaker 4>you can do things there that give you more sensitivity

747
00:38:44.039 --> 00:38:47.639
<v Speaker 4>without necessarily causing a lot more false alarms.

748
00:38:47.719 --> 00:38:49.280
<v Speaker 3>And that's something that.

749
00:38:49.239 --> 00:38:51.599
<v Speaker 4>You have to just be really careful with any kind

750
00:38:51.599 --> 00:38:54.039
<v Speaker 4>of monitor on a team, is you really need to

751
00:38:54.079 --> 00:38:58.039
<v Speaker 4>make sure that you are not creating false alarms. I'd

752
00:38:58.079 --> 00:39:01.760
<v Speaker 4>say it's almost as important or equally important to the

753
00:39:01.840 --> 00:39:05.119
<v Speaker 4>sensitivity of the alarm as well, because if you're creating

754
00:39:05.159 --> 00:39:08.440
<v Speaker 4>false alarms all the time. It's just human nature to

755
00:39:08.760 --> 00:39:11.880
<v Speaker 4>basically start to ignore those or not really give them

756
00:39:12.159 --> 00:39:14.280
<v Speaker 4>the review that they need. So if you're doing that

757
00:39:14.360 --> 00:39:17.519
<v Speaker 4>all the time, you're probably going to miss something inevitably

758
00:39:18.119 --> 00:39:19.480
<v Speaker 4>when there's actually a real issue.

759
00:39:19.639 --> 00:39:21.599
<v Speaker 1>Makes sense. All right, we're getting close to the end

760
00:39:21.639 --> 00:39:25.599
<v Speaker 1>of our time. Are there any other stories or examples

761
00:39:25.920 --> 00:39:28.679
<v Speaker 1>or lessons that you want to make sure somebody listening

762
00:39:28.760 --> 00:39:31.000
<v Speaker 1>to this gets.

763
00:39:30.679 --> 00:39:33.840
<v Speaker 4>I just want to emphasize that this is a growing

764
00:39:34.079 --> 00:39:39.599
<v Speaker 4>process that I think every team should go through. It's

765
00:39:39.599 --> 00:39:42.920
<v Speaker 4>something that is going to evolve over time, and as

766
00:39:43.000 --> 00:39:46.920
<v Speaker 4>your product becomes more important to customers and continues

767
00:39:46.960 --> 00:39:50.039
<v Speaker 4>to grow, you need to just be constantly reevaluating what

768
00:39:50.079 --> 00:39:53.519
<v Speaker 4>your approach is to this. What's going to work for a brand

769
00:39:53.519 --> 00:39:57.079
<v Speaker 4>new product, brand new startup, brand new company isn't necessarily

770
00:39:57.159 --> 00:39:58.199
<v Speaker 4>going to be the right fit

771
00:39:58.280 --> 00:39:59.119
<v Speaker 3>as you continue to

772
00:39:59.119 --> 00:40:02.119
<v Speaker 4>grow, and it's something that you need to evaluate. And as

773
00:40:02.519 --> 00:40:05.079
<v Speaker 4>your product starts to be something that's really a critical

774
00:40:05.119 --> 00:40:09.159
<v Speaker 4>service for your customers or for other teams at your company,

775
00:40:09.360 --> 00:40:12.800
<v Speaker 4>you just need to continually set the bar higher and

776
00:40:13.360 --> 00:40:17.320
<v Speaker 4>make sure that you're continuing to grow observability across the stack.

777
00:40:17.599 --> 00:40:19.719
<v Speaker 1>All right, Well, one more thing before we go to picks,

778
00:40:19.760 --> 00:40:21.840
<v Speaker 1>and that is if people want to get in contact

779
00:40:21.880 --> 00:40:23.639
<v Speaker 1>with you, how do they find you on the internet.

780
00:40:23.800 --> 00:40:28.840
<v Speaker 4>You're welcome to reach out to me on Twitter at Kyzeitch,

781
00:40:29.159 --> 00:40:31.320
<v Speaker 4>or you can reach out to me on LinkedIn as well.

782
00:40:31.440 --> 00:40:31.840
<v Speaker 2>Awesome.

783
00:40:31.920 --> 00:40:33.360
<v Speaker 1>Yeah, we'll get links to those and we'll put them

784
00:40:33.400 --> 00:40:35.119
<v Speaker 1>in the show notes. Let's go ahead and do some

785
00:40:35.159 --> 00:40:36.800
<v Speaker 1>picks then. Dave, do you want to start us off

786
00:40:36.840 --> 00:40:37.239
<v Speaker 1>with the picks?

787
00:40:37.320 --> 00:40:37.519
<v Speaker 2>Yeah?

788
00:40:37.599 --> 00:40:40.800
<v Speaker 6>Sure. So went to the doctor the other week and

789
00:40:40.960 --> 00:40:43.559
<v Speaker 6>they said I had high blood pressure, which I attribute

790
00:40:43.599 --> 00:40:49.920
<v Speaker 6>to raising kids and them stressing me out. So I

791
00:40:49.960 --> 00:40:52.960
<v Speaker 6>got this blood pressure monitor that syncs up with my

792
00:40:53.039 --> 00:40:56.679
<v Speaker 6>iPhone so it keeps a historical track of it. And

793
00:40:56.760 --> 00:41:00.119
<v Speaker 6>it's been really nice, and I guess it's accurate. I

794
00:41:00.119 --> 00:41:02.440
<v Speaker 6>don't know. It says it's high, so I guess it's

795
00:41:02.480 --> 00:41:06.519
<v Speaker 6>doing something. So it is the Withings, and it's a

796
00:41:06.840 --> 00:41:09.239
<v Speaker 6>wireless rechargeable blood pressure monitor.

797
00:41:09.519 --> 00:41:11.199
<v Speaker 2>Cool, Luke, how about you.

798
00:41:11.360 --> 00:41:15.880
<v Speaker 7>That's really interesting. Is this something you wear

799
00:41:16.079 --> 00:41:17.000
<v Speaker 7>all the time, Dave?

800
00:41:17.239 --> 00:41:19.920
<v Speaker 6>No, it's just like the doctor's one where they put it,

801
00:41:20.119 --> 00:41:22.119
<v Speaker 6>roll up your sleeve, put it on your arm and

802
00:41:22.599 --> 00:41:25.239
<v Speaker 6>you know it starts to squeeze your arm. It's not

803
00:41:25.360 --> 00:41:27.679
<v Speaker 6>like a wristwatch or anything. So I do it a

804
00:41:27.719 --> 00:41:28.639
<v Speaker 6>couple of times a day.

805
00:41:29.440 --> 00:41:32.280
<v Speaker 7>Blood pressure just kidding, yeah, just just checking it, just

806
00:41:32.360 --> 00:41:35.079
<v Speaker 7>obsessing about it. I suppose that's that's good. It's not

807
00:41:35.159 --> 00:41:37.599
<v Speaker 7>real time. Otherwise, that'd be even more stressful because

808
00:41:37.599 --> 00:41:40.360
<v Speaker 7>you'd be sitting here and it'd go off and say, yeah,

809
00:41:40.719 --> 00:41:43.840
<v Speaker 7>blood pressure is going up. Get caught in the feedback loop.

810
00:41:44.119 --> 00:41:46.159
<v Speaker 2>Cool? How about you, Luke? What are your picks?

811
00:41:46.519 --> 00:41:48.639
<v Speaker 5>I've been fighting the code this week, Chuck.

812
00:41:49.440 --> 00:41:52.159
<v Speaker 7>I've been building strange command-line interfaces in

813
00:41:52.239 --> 00:41:56.559
<v Speaker 7>Ruby, and I've been using a little application which

814
00:41:56.639 --> 00:41:59.920
<v Speaker 7>is installed by default on most Debian-based systems

815
00:42:00.639 --> 00:42:03.920
<v Speaker 7>called whiptail. This is an old-school text-style

816
00:42:04.000 --> 00:42:07.159
<v Speaker 7>interface for when you can't put a GUI on it

817
00:42:07.239 --> 00:42:11.000
<v Speaker 7>for various reasons, so this is kind of like it

818
00:42:11.039 --> 00:42:13.159
<v Speaker 7>makes makes it look more professional, you know, it makes

819
00:42:13.199 --> 00:42:15.960
<v Speaker 7>it like a real piece of software. And using this

820
00:42:16.119 --> 00:42:19.039
<v Speaker 7>from Ruby has been a real pain because you need

821
00:42:19.079 --> 00:42:22.440
<v Speaker 7>to do funny things with file descriptors to get

822
00:42:22.440 --> 00:42:25.480
<v Speaker 7>the user data out. So it turns out a very

823
00:42:25.599 --> 00:42:28.960
<v Speaker 7>nice man by the name of Felix C. Stegerman has

824
00:42:28.960 --> 00:42:31.239
<v Speaker 7>written a gem to do it

825
00:42:31.280 --> 00:42:35.559
<v Speaker 7>all for you in Felix. So yeah, you know, all

826
00:42:35.559 --> 00:42:38.440
<v Speaker 7>of that work I did was totally unnecessary, and you

827
00:42:38.599 --> 00:42:43.679
<v Speaker 7>too can build amazing old-school ASCII-looking interfaces using

828
00:42:43.920 --> 00:42:48.480
<v Speaker 7>the gem. It's called eft, and it's on GitHub under

829
00:42:48.559 --> 00:42:52.280
<v Speaker 7>the obfusk account, and there's loads of really interesting

830
00:42:52.559 --> 00:42:56.599
<v Speaker 7>utilities on the obfusk GitHub. If you dig in,

831
00:42:56.639 --> 00:43:00.199
<v Speaker 7>there's some interesting low-level stuff for when you want

832
00:43:00.199 --> 00:43:02.760
<v Speaker 7>to kind of Ruby yourself up on the command line, say well,

833
00:43:02.760 --> 00:43:03.800
<v Speaker 7>well look awesome.

834
00:43:03.920 --> 00:43:05.719
<v Speaker 1>All right, I'm gonna throw out a couple of picks.

835
00:43:06.039 --> 00:43:08.679
<v Speaker 1>The first one is I'm still working on this, so

836
00:43:08.760 --> 00:43:12.039
<v Speaker 1>keep checking in at Most Valuable dot dev and Summit dot

837
00:43:12.119 --> 00:43:14.280
<v Speaker 1>Most Valuable dot dev. I think I've mentioned it on

838
00:43:14.320 --> 00:43:16.480
<v Speaker 1>the show before, but I'm talking to folks out there

839
00:43:16.480 --> 00:43:18.320
<v Speaker 1>in the community. We've talked to a number of people

840
00:43:18.360 --> 00:43:20.559
<v Speaker 1>that you've heard of, that you know well, that you're

841
00:43:20.599 --> 00:43:22.519
<v Speaker 1>excited to hear from. But yeah, I'm going to be

842
00:43:22.559 --> 00:43:24.599
<v Speaker 1>interviewing them and asking them what they would do if

843
00:43:24.639 --> 00:43:27.199
<v Speaker 1>they woke up tomorrow as a mid-level developer and

844
00:43:27.440 --> 00:43:29.920
<v Speaker 1>felt like they didn't quite know where to go from there.

845
00:43:30.119 --> 00:43:32.960
<v Speaker 1>So a lot of folks that's where they kind of

846
00:43:33.000 --> 00:43:36.480
<v Speaker 1>end up right, they get to junior or mid level developer,

847
00:43:36.599 --> 00:43:38.440
<v Speaker 1>and then it's okay, I'm proficient.

848
00:43:38.519 --> 00:43:38.800
<v Speaker 2>Now what.

849
00:43:39.000 --> 00:43:40.440
<v Speaker 1>Yeah, there are a lot of options, a lot of

850
00:43:40.440 --> 00:43:42.239
<v Speaker 1>ways you can go. I'm hoping to have people come

851
00:43:42.239 --> 00:43:46.239
<v Speaker 1>talk about blogging, podcasting, speaking at conferences and all the

852
00:43:46.280 --> 00:43:48.760
<v Speaker 1>other stuff, and then just how to stay current, you know,

853
00:43:48.800 --> 00:43:50.880
<v Speaker 1>how they keep up on what's going on out there.

854
00:43:50.960 --> 00:43:52.000
<v Speaker 2>So I'm going to pick that.

855
00:43:52.239 --> 00:43:54.199
<v Speaker 1>I've been playing a game on my phone just when

856
00:43:54.199 --> 00:43:56.280
<v Speaker 1>I have a minute, and you know, I want to

857
00:43:56.440 --> 00:43:58.159
<v Speaker 1>sink a little bit of time into it. It's called

858
00:43:58.239 --> 00:44:01.320
<v Speaker 1>Mushroom Wars two. It's on the iPhone. I don't know

859
00:44:01.320 --> 00:44:04.280
<v Speaker 1>if it's on the Android phone. Yeah, liking that, and

860
00:44:04.320 --> 00:44:07.000
<v Speaker 1>then yeah, I'm also putting on a podcasting summit, So

861
00:44:07.000 --> 00:44:09.559
<v Speaker 1>if you're interested in that, you can go to

862
00:44:10.159 --> 00:44:13.639
<v Speaker 1>Podcast Growth Summit dot co, and we'll have all the

863
00:44:13.639 --> 00:44:15.920
<v Speaker 1>information up there if you listen to the Freelancer Show.

864
00:44:16.800 --> 00:44:19.440
<v Speaker 2>The first interview I did was with Petromanos.

865
00:44:19.679 --> 00:44:21.559
<v Speaker 1>She's in Australia, so I was talking to her in

866
00:44:21.679 --> 00:44:24.239
<v Speaker 1>the evening here in the morning there, which is always

867
00:44:24.280 --> 00:44:26.440
<v Speaker 1>fun with all the time zone stuff. But she talked

868
00:44:26.480 --> 00:44:29.840
<v Speaker 1>about basically how to measure your growth and then how

869
00:44:29.880 --> 00:44:32.559
<v Speaker 1>to use Google's tools not just to measure your growth,

870
00:44:32.599 --> 00:44:34.239
<v Speaker 1>but then to figure out where to double down on

871
00:44:34.280 --> 00:44:35.639
<v Speaker 1>it and get more traffic.

872
00:44:35.840 --> 00:44:37.800
<v Speaker 2>So it was awesome.

873
00:44:37.840 --> 00:44:39.800
<v Speaker 1>I'm talking to a bunch of other people that I've

874
00:44:39.840 --> 00:44:41.960
<v Speaker 1>known for years and years in the podcasting space, and

875
00:44:42.039 --> 00:44:44.760
<v Speaker 1>I'm super excited about it too. And I should probably

876
00:44:44.840 --> 00:44:46.679
<v Speaker 1>throw out one more pick. So I'm gonna throw out

877
00:44:46.719 --> 00:44:50.199
<v Speaker 1>Gmelius, that's G M E L I U S. And

878
00:44:50.639 --> 00:44:53.360
<v Speaker 1>what it is is it's a tool. It's a CRM,

879
00:44:53.559 --> 00:44:57.039
<v Speaker 1>but it also has, like, scheduling, so like ScheduleOnce

880
00:44:57.159 --> 00:45:03.800
<v Speaker 1>or what's the other one. It allows you to set

881
00:45:03.960 --> 00:45:08.280
<v Speaker 1>up a series of emails. It'll do automatic follow up

882
00:45:08.280 --> 00:45:10.880
<v Speaker 1>for you and stuff like that, and so it just

883
00:45:10.920 --> 00:45:13.079
<v Speaker 1>does a whole bunch of email automation, but it runs

884
00:45:13.119 --> 00:45:16.679
<v Speaker 1>out of your email account, your Gmail account. That's the

885
00:45:16.719 --> 00:45:19.400
<v Speaker 1>big nice thing about it is that you don't get

886
00:45:19.679 --> 00:45:23.159
<v Speaker 1>downgraded by SendGrid or something if your emails aren't landing.

887
00:45:23.639 --> 00:45:26.000
<v Speaker 2>And so that's another thing that I'm just really digging.

888
00:45:26.119 --> 00:45:28.639
<v Speaker 2>So I'm going to shout out about that, Paul, what

889
00:45:28.679 --> 00:45:29.480
<v Speaker 2>are your picks?

890
00:45:29.760 --> 00:45:33.960
<v Speaker 4>I really enjoyed something that was in the Ruby Weekly

891
00:45:34.320 --> 00:45:38.360
<v Speaker 4>newsletter this last week. There's a Ruby one liner cookbook,

892
00:45:38.599 --> 00:45:40.719
<v Speaker 4>so it has a bunch of different one liners. You

893
00:45:40.719 --> 00:45:44.559
<v Speaker 4>can actually just shell out and make those calls,

894
00:45:44.599 --> 00:45:46.360
<v Speaker 4>and it explains how you can do a lot of

895
00:45:46.440 --> 00:45:50.719
<v Speaker 4>things you'd do with a shell script very easily with Ruby.

896
00:45:50.880 --> 00:45:52.400
<v Speaker 2>Awesome. Have to check that out.

897
00:45:52.559 --> 00:45:55.000
<v Speaker 1>Sounds like a decent episode too, whether we just go

898
00:45:55.079 --> 00:45:56.880
<v Speaker 1>through some of those and pick our favorites or whether

899
00:45:56.920 --> 00:45:59.599
<v Speaker 1>we get whoever compiled it on. Thanks for coming, Paul.

900
00:45:59.639 --> 00:46:02.000
<v Speaker 1>This was really helpful, and I think some folks are

901
00:46:02.079 --> 00:46:05.079
<v Speaker 1>probably gonna either encounter this and go, yeah, I wish

902
00:46:05.159 --> 00:46:07.400
<v Speaker 1>we were doing that, because the last time we

903
00:46:07.440 --> 00:46:10.360
<v Speaker 1>encountered something like this it was painful, or some folks

904
00:46:10.360 --> 00:46:12.960
<v Speaker 1>hopefully will be proactive and go out there and set

905
00:46:12.960 --> 00:46:16.280
<v Speaker 1>things up so that they're watching things and communicating about

906
00:46:16.320 --> 00:46:18.719
<v Speaker 1>the way that they handle issues and the way that

907
00:46:18.760 --> 00:46:20.079
<v Speaker 1>they avoid them in the first place.

908
00:46:20.199 --> 00:46:21.760
<v Speaker 3>It's a pleasure, all right.

909
00:46:21.880 --> 00:46:23.559
<v Speaker 1>We'll go ahead and wrap this up and we will

910
00:46:23.599 --> 00:46:26.000
<v Speaker 1>be back next week. Until next time, max out, everybody,
