1
00:00:14,439 --> 00:00:17,839
Speaker 1: Hey everybody, and welcome back to another episode of Adventures

2
00:00:17,839 --> 00:00:20,719
in DevOps. This week on our panel we have Will Button.

3
00:00:20,960 --> 00:00:23,559
What's going on, everyone? We have Jeffrey Groman. Hey there,

4
00:00:23,640 --> 00:00:26,440
I'm Charles Max Wood from Devchat.tv. And this week

5
00:00:26,519 --> 00:00:29,239
we have a special guest and it is Shimon Tolts.

6
00:00:29,519 --> 00:00:31,640
Speaker 2: Hi, everyone, it's a pleasure to be here. Thank you

7
00:00:31,719 --> 00:00:32,759
very much for hosting me.

8
00:00:32,960 --> 00:00:35,479
Speaker 3: So fun. Yeah, absolutely. I mean, you brought all the energy.

9
00:00:35,520 --> 00:00:37,520
It's funny. We were talking beforehand and you're just

10
00:00:37,600 --> 00:00:40,159
excited and I love it. Do you want to just

11
00:00:40,240 --> 00:00:42,359
introduce yourself real quick, let people know who you are,

12
00:00:42,439 --> 00:00:45,920
what you do at? Is it Datree? Datree?

13
00:00:45,200 --> 00:00:48,560
Speaker 2: Datree, yeah. Sure, yep. So yeah, my name is Shimon.

14
00:00:48,840 --> 00:00:52,799
I've been in the infrastructure R and D space for

15
00:00:52,840 --> 00:00:56,200
more than ten years, and I've worked at large companies

16
00:00:56,240 --> 00:00:59,799
like Intel, and I worked at startups from like thirty

17
00:01:00,000 --> 00:01:04,599
employees until we were a thousand employees. And my previous role,

18
00:01:04,640 --> 00:01:08,439
I was an engineering manager for a media company and

19
00:01:08,680 --> 00:01:11,879
I grew together with the organization from thirty employees to

20
00:01:11,959 --> 00:01:16,159
one thousand, and I really saw how the struggle is

21
00:01:16,239 --> 00:01:19,840
real when you have four hundred engineers and you're trying

22
00:01:19,920 --> 00:01:23,959
to make this work while breaking things and moving fast.

23
00:01:24,120 --> 00:01:26,439
So this is actually what brought us here and what

24
00:01:26,599 --> 00:01:29,879
brought me to almost four years ago, me and my

25
00:01:29,959 --> 00:01:33,599
co-founder to open Datree, which helps prevent misconfigurations

26
00:01:33,640 --> 00:01:37,840
from ever reaching production environments, especially around Kubernetes.

27
00:01:38,040 --> 00:01:40,760
Speaker 3: All right, so the code that crashes my stuff, that's

28
00:01:40,840 --> 00:01:45,040
my fault. The misconfigurations that's somebody else's fault. I'm just

29
00:01:45,079 --> 00:01:48,799
being clear. I hope my boss hears this anyway. So yeah,

30
00:01:48,840 --> 00:01:51,439
so we were talking before the episode and you said

31
00:01:51,439 --> 00:01:55,120
that you experienced this outage and this is where you've

32
00:01:55,200 --> 00:01:57,640
learned a lot of the lessons that led to Datree.

33
00:01:58,040 --> 00:02:00,280
Do you want to just talk about that? Kind of

34
00:02:00,280 --> 00:02:02,359
give us the background and the story so that we

35
00:02:02,439 --> 00:02:04,879
know what you screwed up? I mean how that went?

36
00:02:07,280 --> 00:02:12,199
Speaker 2: Sure, sure thing. So let me describe the scenery first,

37
00:02:12,280 --> 00:02:15,319
you know, like in a book. So imagine a company

38
00:02:15,879 --> 00:02:18,560
and you have one thousand employees, four hundred of them

39
00:02:18,599 --> 00:02:22,400
are developers. The company is born in the cloud, paying

40
00:02:22,439 --> 00:02:26,520
one point five million dollars for AWS every month, a

41
00:02:26,599 --> 00:02:30,919
lot of stuff running. No one actually knows what is everything,

42
00:02:31,000 --> 00:02:34,840
but it kind of works, you know, and moving really

43
00:02:34,879 --> 00:02:38,000
really really fast, working in microservices, a lot of

44
00:02:38,039 --> 00:02:42,599
different programming languages, and the philosophy of the company is,

45
00:02:42,639 --> 00:02:45,919
you know, like small speed boats, like Amazon calls it,

46
00:02:45,919 --> 00:02:49,240
two-pizza teams. You know, you have your team, you find,

47
00:02:49,439 --> 00:02:51,680
you have a problem, you find the best solution, you go,

48
00:02:51,759 --> 00:02:56,000
you do it, responsible for it, sounds great, removes bottlenecks,

49
00:02:56,039 --> 00:03:00,479
makes you move fast, and really gives great energy to people

50
00:03:00,520 --> 00:03:03,199
because they don't feel like they're a small, small screw

51
00:03:03,479 --> 00:03:06,400
in a big organization. And my role there was the

52
00:03:06,479 --> 00:03:10,159
general manager of the infrastructure division. So my work was

53
00:03:10,199 --> 00:03:13,000
to find things that are relevant to all the other

54
00:03:13,120 --> 00:03:16,240
teams and build it as an infrastructure. So for example,

55
00:03:16,280 --> 00:03:20,199
we built a data collection pipeline that ingested more than

56
00:03:20,280 --> 00:03:24,960
two hundred billion, with a B, events every month from

57
00:03:25,039 --> 00:03:29,719
thirteen regions in AWS. And this is what we would do.

58
00:03:29,840 --> 00:03:32,599
And every time there's a new technology or something that

59
00:03:32,680 --> 00:03:35,680
is across the company, we would be responsible for it.

60
00:03:35,759 --> 00:03:38,840
My team, sort of like special ops

61
00:03:38,879 --> 00:03:42,280
in a way. And one day we had a production outage.

62
00:03:42,439 --> 00:03:45,400
Now this happens. You know, we're all people. I make mistakes.

63
00:03:45,439 --> 00:03:52,479
Everyone makes mistakes, besides Charles. It happens, and like every

64
00:03:52,520 --> 00:03:55,000
company we said, okay, let's post-mortem the problem,

65
00:03:55,240 --> 00:03:57,639
understand what happened, let's find the root cause. And we

66
00:03:57,719 --> 00:04:00,960
did it. And a developer made a misconfiguration in

67
00:04:01,039 --> 00:04:04,159
one manifest file, and we said, okay, we totally understand

68
00:04:04,199 --> 00:04:06,800
people make mistakes, and we believe in the

69
00:04:06,800 --> 00:04:09,879
philosophy of, you know, run fast and break things, but

70
00:04:10,360 --> 00:04:13,080
we don't believe in the philosophy of let's make the

71
00:04:13,120 --> 00:04:16,120
same mistake five times. You know there is a limit

72
00:04:16,240 --> 00:04:18,720
to that as well. So we said, okay, so how

73
00:04:18,720 --> 00:04:22,000
do we make sure that this does not happen again.

74
00:04:22,160 --> 00:04:24,319
So first of all, you send the post mortem in

75
00:04:24,360 --> 00:04:27,000
an email to everyone in the company. So we tried that.

76
00:04:27,480 --> 00:04:31,000
Nice, but it doesn't really work. No one reads it, no one

77
00:04:31,079 --> 00:04:33,399
remembers it. And I got to tell you from the

78
00:04:33,439 --> 00:04:37,360
other side as a developer getting emails every day telling

79
00:04:37,399 --> 00:04:41,360
me like, use this package, use this configuration, check this thing.

80
00:04:41,800 --> 00:04:44,279
It's not scalable. Like, how am I supposed to

81
00:04:44,319 --> 00:04:48,240
remember everything? It's just not feasible. And we said, okay,

82
00:04:48,279 --> 00:04:52,360
we did, you know, internal educational sessions, and we did

83
00:04:52,399 --> 00:04:55,199
an internal meetup and explain to everyone, and everyone agreed,

84
00:04:55,240 --> 00:04:59,800
and everyone understands but it didn't really work well because

85
00:05:00,160 --> 00:05:02,079
I think, and this is what we thought, and this

86
00:05:02,199 --> 00:05:04,920
is what drove us to actually open the company, is

87
00:05:04,959 --> 00:05:08,639
that it has to happen in an automated way within

88
00:05:08,720 --> 00:05:13,480
the development flow of the developer. Because every inch, every

89
00:05:13,519 --> 00:05:16,360
small thing that you do in order to change the

90
00:05:16,399 --> 00:05:20,839
workflow of the developer, it's crazy. It's almost never going

91
00:05:20,879 --> 00:05:22,720
to happen, and if it's going to happen, it's going

92
00:05:22,720 --> 00:05:25,680
to be very very painful for the developer, for the manager,

93
00:05:25,680 --> 00:05:27,839
and for the company. So we said, how can we

94
00:05:27,879 --> 00:05:30,560
do something that will be seamless in the flow, Because

95
00:05:30,600 --> 00:05:33,240
when we spoke with people, they said, I want to

96
00:05:33,279 --> 00:05:36,279
know when I'm doing something wrong. I don't want to

97
00:05:36,279 --> 00:05:39,839
be the person that submits secret keys into our public

98
00:05:39,879 --> 00:05:42,720
GitHub repository. I don't want to be the person that

99
00:05:42,800 --> 00:05:47,199
takes production down. But sometimes I just don't know. And

100
00:05:47,240 --> 00:05:49,439
this is what drove us to actually opening Datree

101
00:05:49,920 --> 00:05:54,920
and building a solution that hooks directly within the development workflow.

102
00:05:55,480 --> 00:05:58,279
So it's a CLI utility. You can run it on

103
00:05:58,319 --> 00:06:01,399
your laptop, Linux and Mac. Just run datree test

104
00:06:01,720 --> 00:06:05,480
on your Kubernetes manifest file or Helm file, and then

105
00:06:05,639 --> 00:06:09,319
we provide out-of-the-box predefined policies. So

106
00:06:09,360 --> 00:06:13,920
I'll give you simple examples that seem, you know, really trivial,

107
00:06:14,000 --> 00:06:17,399
but people don't do it. It's like memory limit, CPU

108
00:06:17,399 --> 00:06:21,959
limit, a liveness probe, readiness probe, pulling containers from

109
00:06:21,959 --> 00:06:25,920
a centralized registry, not using the latest Docker tag. Because

110
00:06:25,959 --> 00:06:28,279
then every time you build it, it's like going to

111
00:06:28,319 --> 00:06:31,240
the casino. You don't know which version you're gonna get, what

112
00:06:31,839 --> 00:06:35,480
you're gonna have in production, like, you know. And

113
00:06:35,959 --> 00:06:38,399
next after you have it in your computer, you install

114
00:06:38,439 --> 00:06:41,920
it in your CI/CD. And at this point this is

115
00:06:41,959 --> 00:06:44,279
one of the most powerful things because you get a

116
00:06:44,399 --> 00:06:49,240
centralized policy management solution. So I, as the DevOps engineer,

117
00:06:49,639 --> 00:06:52,680
can identify a problem, think of a policy that I

118
00:06:52,759 --> 00:06:55,959
want to apply to all of my hundred or fifty

119
00:06:56,120 --> 00:06:59,240
or five thousand engineers, and with the click of a button,

120
00:06:59,360 --> 00:07:02,560
I can enable policy. And now all the projects that

121
00:07:02,639 --> 00:07:06,639
go through this CI/CD pipeline will actually comply with this

122
00:07:06,759 --> 00:07:11,199
policy and otherwise it will fail. And the idea is

123
00:07:11,240 --> 00:07:14,720
that once it fails, it does not notify the DevOps person.

124
00:07:15,079 --> 00:07:18,680
It explains to the developer what they have to do

125
00:07:19,240 --> 00:07:21,800
and shows them and links them to wiki and to

126
00:07:21,879 --> 00:07:26,160
our docs and tells them, hey, mister developer, hey, missus developer,

127
00:07:26,279 --> 00:07:28,720
this is how you can fix it. So we are

128
00:07:28,839 --> 00:07:32,040
very very proud of it because I really believe that

129
00:07:32,160 --> 00:07:35,319
this is how I would want my organization to communicate

130
00:07:35,360 --> 00:07:38,519
those policies and practices to me as a developer, as

131
00:07:38,560 --> 00:07:39,160
an engineer.

132
00:07:39,240 --> 00:07:45,480
Speaker 1: Quit telling me what to do. No, it makes sense,

133
00:07:45,759 --> 00:07:48,160
and to be perfectly honest, you know. So, yeah, I

134
00:07:48,480 --> 00:07:51,279
write web... I'm a web developer for a fairly

135
00:07:51,360 --> 00:07:54,959
large financial firm. And what's nice is a lot of

136
00:07:54,959 --> 00:07:58,120
this stuff does kind of get pushed into our CI/CD.

137
00:07:58,959 --> 00:08:02,600
But the other nice part of it is that generally

138
00:08:02,680 --> 00:08:05,720
when these kinds of policy changes come down and I

139
00:08:05,720 --> 00:08:07,639
don't think they're using Datree. I think they're using

140
00:08:07,720 --> 00:08:10,759
just we're making this policy change and we're configuring Jenkins

141
00:08:10,800 --> 00:08:13,920
to do it. But they generally are pretty good about

142
00:08:13,920 --> 00:08:17,160
going in and making the initial move right, so they

143
00:08:17,319 --> 00:08:21,240
move it to match the policy, and then

144
00:08:21,279 --> 00:08:25,000
from there when it lines up with whatever we're doing.

145
00:08:25,399 --> 00:08:28,000
That's when we get to the point where it's like, okay,

146
00:08:28,079 --> 00:08:30,759
So then if we change something that messes it up right,

147
00:08:30,800 --> 00:08:32,840
then it's on us. Okay, we can roll this back.

148
00:08:32,960 --> 00:08:35,480
But yeah, they're usually the ones that initially make it comply.

149
00:08:35,600 --> 00:08:37,720
And I just wanted to add that because I think

150
00:08:37,759 --> 00:08:40,519
there's some level of responsibility that goes both ways, and

151
00:08:40,600 --> 00:08:43,720
so that's what I like about this. But if you're

152
00:08:43,720 --> 00:08:46,240
the one that's making the initial change that's going to

153
00:08:46,320 --> 00:08:49,120
cause it to fail in CI, then you probably also

154
00:08:49,159 --> 00:08:51,399
ought to be the person that's either working with somebody

155
00:08:51,799 --> 00:08:54,000
or doing the work yourself to make it comply in

156
00:08:54,039 --> 00:08:54,600
the first place.

157
00:08:54,879 --> 00:08:57,799
Speaker 2: I totally agree, and this is why when we designed

158
00:08:57,879 --> 00:09:01,519
our policy engine, we designed it to have several

159
00:09:01,960 --> 00:09:05,519
points of granularity. So first of all, you can see

160
00:09:05,799 --> 00:09:09,080
how am I doing now. Let's scan my GitHub repositories

161
00:09:09,120 --> 00:09:13,679
and see do I have any violations now or not. Secondly,

162
00:09:14,039 --> 00:09:17,039
you can enable a rule in a way that we

163
00:09:17,120 --> 00:09:21,440
call it gradual rollout. So now every time that they

164
00:09:21,480 --> 00:09:24,039
make a change that is not compliant, you tell them: listen,

165
00:09:24,480 --> 00:09:28,320
on August first, this change will not be compliant. Now

166
00:09:29,039 --> 00:09:31,720
it is passing, and that's okay, totally fine, but just

167
00:09:31,759 --> 00:09:33,600
so you know, we're going to have a policy in

168
00:09:33,639 --> 00:09:36,600
place on August first, and this is the policy, and

169
00:09:36,639 --> 00:09:39,639
here you have time to actually prepare for it, and

170
00:09:39,679 --> 00:09:44,440
then once August first hits, then it fails as a

171
00:09:44,480 --> 00:09:48,240
warning and not as enforcement, and then you have a

172
00:09:48,279 --> 00:09:52,240
grace period for adoption of the policy, and only then

173
00:09:52,320 --> 00:09:53,919
at the end of the end of the end it

174
00:09:54,039 --> 00:09:57,360
actually goes to full enforcement. And you're totally right. This

175
00:09:57,440 --> 00:10:00,200
is the feedback that we got from our customers, and

176
00:10:00,240 --> 00:10:03,080
this is how they designed it, and we built it

177
00:10:03,200 --> 00:10:04,360
because this is how they wanted it.
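The gradual rollout described in this answer can be sketched as a small decision function. This is only an illustration in Python, not Datree's implementation: the names `Outcome` and `evaluate_violation`, the year, and the 30-day grace period are all assumptions made for the example.

```python
from datetime import date, timedelta
from enum import Enum


class Outcome(Enum):
    PASS_WITH_NOTICE = "pass, but announce the upcoming policy"
    WARN = "warn during the grace period"
    FAIL = "fail the pipeline"


def evaluate_violation(today: date, effective: date, grace_days: int) -> Outcome:
    """Decide how a non-compliant change is treated on a given day."""
    if today < effective:
        # Before the announced date: the change passes, with a heads-up.
        return Outcome.PASS_WITH_NOTICE
    if today < effective + timedelta(days=grace_days):
        # Policy is live but still in its grace period: warn, do not block.
        return Outcome.WARN
    # Grace period is over: full enforcement.
    return Outcome.FAIL


if __name__ == "__main__":
    effective = date(2021, 8, 1)  # hypothetical "August first"
    for day in (date(2021, 7, 15), date(2021, 8, 10), date(2021, 9, 15)):
        print(day, evaluate_violation(day, effective, grace_days=30).name)
```

The three branches mirror the three phases mentioned: notice before the date, a warning during the grace period, and enforcement at the end.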

178
00:10:04,600 --> 00:10:05,480
Speaker 4: Now that's super cool.

179
00:10:05,519 --> 00:10:07,879
Speaker 5: I really like the approach. You mentioned the path

180
00:10:07,919 --> 00:10:11,120
of how you got here through the emails and the

181
00:10:11,200 --> 00:10:14,919
meetings and the workshops and stuff, but really all that

182
00:10:15,440 --> 00:10:17,639
is only relevant at the time. And doing it this way,

183
00:10:18,279 --> 00:10:20,120
I think one of the key things there is that

184
00:10:20,159 --> 00:10:24,159
you're meeting the developers where they are, because that's the

185
00:10:24,279 --> 00:10:28,360
right time to introduce the solution or the information is

186
00:10:28,399 --> 00:10:31,720
when it's relevant to them. Otherwise it's just out of context.

187
00:10:32,080 --> 00:10:36,279
Speaker 2: I totally agree. You have to get the warning and

188
00:10:36,080 --> 00:10:39,440
the data in line. I call it in line. And

189
00:10:39,480 --> 00:10:41,840
this is why we're working on... we have a Helm

190
00:10:41,919 --> 00:10:44,399
plugin, working on a kubectl plugin, VS

191
00:10:44,440 --> 00:10:49,320
Code plugin, everything, and this is very important because

192
00:10:49,639 --> 00:10:52,039
if it's not convenient, and if it's not in the

193
00:10:52,080 --> 00:10:56,480
developer's workflow, then I'll give you a story. Okay, I

194
00:10:56,519 --> 00:11:00,519
met with a big enterprise company. They talked to me

195
00:11:00,519 --> 00:11:03,279
about a certain policy that we have that says, like,

196
00:11:03,440 --> 00:11:06,840
pull containers from the centralized registry of the company. So

197
00:11:06,879 --> 00:11:10,960
it's like, um, docker at company acme.com, right,

198
00:11:11,200 --> 00:11:13,440
and he's like, it's it's a good policy. I want

199
00:11:13,679 --> 00:11:16,600
to use your solution instead of ours. And I'm like, what,

200
00:11:16,600 --> 00:11:19,320
what's your solution? So what do you do today? Oh,

201
00:11:19,399 --> 00:11:22,279
we just blocked Docker Hub in our firewall and no one

202
00:11:22,279 --> 00:11:26,200
can access it. And I'm like, what.

203
00:11:27,480 --> 00:11:28,600
Speaker 5: Problem solved?

204
00:11:29,960 --> 00:11:31,679
Speaker 2: For real. This is real, really, this is what

205
00:11:31,720 --> 00:11:38,320
he told me. I was and he's like, yeah, well,

206
00:11:38,320 --> 00:11:40,679
we just block it in the DNS and firewall level

207
00:11:40,720 --> 00:11:42,639
and that's it and they can't pull it from there.

208
00:11:42,960 --> 00:11:45,799
And I think that this is absolutely not the way

209
00:11:45,879 --> 00:11:50,360
to do it. As we go forward, developers want to

210
00:11:50,399 --> 00:11:52,759
achieve left nice to play solution.

211
00:11:53,000 --> 00:11:56,200
Speaker 4: Yeah, that works until your developers get smart enough to

212
00:11:56,200 --> 00:11:58,519
figure out how to use a VPN or other ways

213
00:11:58,519 --> 00:12:01,120
of Yeah. Yeah, I mean this.

214
00:12:01,200 --> 00:12:03,440
Speaker 2: Is security by obscurity.

215
00:12:03,240 --> 00:12:06,679
Speaker 1: That's yeah, it's the wrong way I was going to

216
00:12:06,720 --> 00:12:12,679
say, Jurassic Park: nature, or developers, will find a way.

217
00:12:13,639 --> 00:12:16,519
Speaker 4: Absolutely. You know, just to pile on, I totally love

218
00:12:16,559 --> 00:12:19,679
this idea too, and it really speaks, I think,

219
00:12:19,679 --> 00:12:22,679
to like the whole DevOps mentality of like flow and

220
00:12:22,759 --> 00:12:25,919
like pull requests versus push requests, because I think, you know,

221
00:12:25,960 --> 00:12:28,159
the way you were describing it earlier was that it's

222
00:12:28,200 --> 00:12:31,120
basically just you know, pushing stuff out, which never really

223
00:12:31,200 --> 00:12:33,399
works all that well. But if you, as Will said,

224
00:12:33,559 --> 00:12:37,559
you get timing right so that now developers can pull

225
00:12:37,600 --> 00:12:40,960
that information as they need it. The timing is right,

226
00:12:41,120 --> 00:12:44,679
the method is right, the information is there for them

227
00:12:44,720 --> 00:12:45,720
to pull it.

228
00:12:45,919 --> 00:12:46,519
Speaker 2: Just you know.

229
00:12:46,639 --> 00:12:48,440
Speaker 4: Again, I know I'm just piling on, but I just

230
00:12:48,480 --> 00:12:51,559
feel like it actually fits really nicely and elegantly

231
00:12:51,960 --> 00:12:54,720
in the whole DevOps, you know, sort of methodology.

232
00:12:54,919 --> 00:12:57,240
Speaker 2: But you know what I got to say, of course

233
00:12:57,240 --> 00:13:02,039
I gave a very radical example. Well, now, but when

234
00:13:02,080 --> 00:13:06,039
we meet with companies that are at this crossroad, because

235
00:13:06,080 --> 00:13:08,720
they say, okay, listen, we scaled. We had thirty developers,

236
00:13:08,720 --> 00:13:10,879
it was okay, forty, fifty, like now we have like

237
00:13:10,960 --> 00:13:15,039
seventy developers. It's COVID. We're all working from home. You

238
00:13:15,080 --> 00:13:17,720
can't just come to a room and ask, hey, how

239
00:13:17,720 --> 00:13:20,000
do we do this and that, and it's like we

240
00:13:20,240 --> 00:13:22,720
need to put something in place. And then I see

241
00:13:22,759 --> 00:13:27,279
companies choose two different paths. It's like two opposites that

242
00:13:27,600 --> 00:13:29,720
you can go to. And of course I think that

243
00:13:29,879 --> 00:13:32,919
the best solution is the middle ground. But like some

244
00:13:33,039 --> 00:13:37,120
companies go the whole way to: okay,

245
00:13:37,519 --> 00:13:41,279
So DevOps is responsible for the cluster, which is true

246
00:13:41,279 --> 00:13:46,080
in many organizations, they're responsible for the operational excellence of

247
00:13:46,120 --> 00:13:48,879
the cluster and for the day to day operations. But

248
00:13:48,919 --> 00:13:52,679
then the developers write the application. Then what they say

249
00:13:52,879 --> 00:13:56,519
is okay. So now every change that the developer makes

250
00:13:56,879 --> 00:13:59,960
to a Kubernetes manifest or Helm or anything that touches

251
00:14:00,120 --> 00:14:03,080
the infrastructure has to go through the ops team. Now what

252
00:14:03,200 --> 00:14:07,000
happens at this point is that there's a huge bottleneck. Eventually,

253
00:14:07,039 --> 00:14:11,279
it frustrates both sides because the developers they have the

254
00:14:11,440 --> 00:14:13,960
R and D backlog and the product sitting on them

255
00:14:14,200 --> 00:14:16,840
with timelines that they need to release stuff and they're

256
00:14:16,879 --> 00:14:20,080
waiting for the ops team to approve it. The ops

257
00:14:20,120 --> 00:14:23,519
team they don't want to babysit developers and tell them listen,

258
00:14:23,639 --> 00:14:26,080
you forgot, you're pulling the latest image, put a pinned

259
00:14:26,200 --> 00:14:29,399
down version. No, because it's not interesting. They want to

260
00:14:29,399 --> 00:14:32,679
do cost reduction, they want to optimize the performance, they

261
00:14:32,679 --> 00:14:34,679
want to upgrade it, they want to bring the new

262
00:14:34,960 --> 00:14:39,639
best versions. They, you know, do crazy POCs. And then,

263
00:14:39,759 --> 00:14:44,480
like, all sides are basically frustrated because they

264
00:14:44,519 --> 00:14:48,720
babysit developers. The developers don't get autonomy, and at the

265
00:14:48,799 --> 00:14:51,919
end of the day, it just bottlenecks the DevOps. And

266
00:14:51,960 --> 00:14:54,480
not to talk about the fact that SRE and DevOps

267
00:14:54,519 --> 00:14:57,919
teams are usually like one to ten developers, so you

268
00:14:58,039 --> 00:15:00,879
might have like ten DevOps people to one hundred or

269
00:15:00,960 --> 00:15:03,840
two hundred developers. So that's one side.

270
00:15:03,679 --> 00:15:05,320
Speaker 3: I can definitely identify with this.

271
00:15:05,399 --> 00:15:07,759
Speaker 1: I mean, I'm working on a project right now that's

272
00:15:07,799 --> 00:15:11,559
on several timelines, right, and yeah, when things won't deploy,

273
00:15:11,679 --> 00:15:15,480
when it doesn't play nicely with the cluster, things like that,

274
00:15:15,879 --> 00:15:18,480
we get frustrated, right, And then my boss gets frustrated

275
00:15:18,519 --> 00:15:20,519
and it's like, why isn't this out there?

276
00:15:20,559 --> 00:15:20,720
Speaker 4: Right?

277
00:15:20,799 --> 00:15:23,120
Speaker 1: And then you know, well DevOps, right, and so then

278
00:15:23,159 --> 00:15:25,559
they go to DevOps and it's the same thing, right,

279
00:15:25,600 --> 00:15:29,120
and then the DevOps guys. Sometimes it's okay, well, let's

280
00:15:29,200 --> 00:15:31,120
they'll go figure out what it is and it's something

281
00:15:31,159 --> 00:15:33,639
that they can fix. And sometimes they're coming back to

282
00:15:33,720 --> 00:15:37,879
us and saying, well, there's this problem, and they don't

283
00:15:37,879 --> 00:15:41,039
want to come back to us and manage us, and

284
00:15:42,039 --> 00:15:45,799
and nobody else is happy because whoever the powers that

285
00:15:45,879 --> 00:15:48,320
be are for the business needs, they just want it

286
00:15:48,360 --> 00:15:49,240
out right.

287
00:15:49,679 --> 00:15:53,080
Speaker 3: And so, yeah, what you're talking about. We've run into

288
00:15:53,080 --> 00:15:54,639
this more than once over the last year.

289
00:15:55,080 --> 00:15:57,879
Speaker 2: And then it's like, what am I doing, you know? I

290
00:15:57,919 --> 00:16:00,559
heard it from several companies, multiple times.

291
00:16:00,600 --> 00:16:04,080
And just to tell you another example, the most common

292
00:16:04,120 --> 00:16:06,159
thing is that they come to DevOps and tell them

293
00:16:06,159 --> 00:16:08,159
I have a deadline and then they go like, yeah,

294
00:16:08,159 --> 00:16:11,039
but the CFO told me to do cost reduction on AWS.

295
00:16:11,120 --> 00:16:13,080
So what do I do? I listen to the CEO

296
00:16:13,200 --> 00:16:15,519
to the CFO, or do I listen to the VP of R

297
00:16:15,559 --> 00:16:18,679
and D, or like, what's more important? And who knows?

298
00:16:18,919 --> 00:16:23,159
I don't know. It's it's hard. Now let's talk about

299
00:16:23,200 --> 00:16:27,519
the other side. The other side is actually when this happens,

300
00:16:28,120 --> 00:16:33,039
but DevOps does not assume responsibility and they go the

301
00:16:33,080 --> 00:16:36,120
path of educating the developers and they say, no, we're

302
00:16:36,120 --> 00:16:39,200
not going to lock everything. We're not going to lock anything,

303
00:16:39,279 --> 00:16:42,360
but we're going to put additional efforts into educating the

304
00:16:42,399 --> 00:16:45,639
developers in order to make the right decisions, which is nice.

305
00:16:45,919 --> 00:16:50,399
The thing is, while this happens, you're really not sleeping

306
00:16:50,440 --> 00:16:54,120
at night, both because you're afraid and because you're getting

307
00:16:54,279 --> 00:16:58,000
like, paged for things that are happening. And secondly

308
00:16:58,639 --> 00:17:03,080
the developers are I find them terrified. They go like,

309
00:17:03,559 --> 00:17:06,480
I'm going to do something that is going to

310
00:17:07,240 --> 00:17:10,799
change production now. And they go like, I'm a Java

311
00:17:10,799 --> 00:17:14,480
billing engineer, I don't know Docker, I don't know Kubernetes.

312
00:17:15,160 --> 00:17:18,759
I'm expert in Java billing, not in you know, Docker,

313
00:17:19,160 --> 00:17:21,519
I don't know. And then you find them like almost

314
00:17:21,519 --> 00:17:25,200
crippled because they're afraid, they say, I don't know,

315
00:17:25,279 --> 00:17:27,799
I'm not an expert, and I'm afraid to break it

316
00:17:27,839 --> 00:17:30,000
and I don't want to do it. And then the

317
00:17:30,319 --> 00:17:33,160
teams they try to educate them and so on. But

318
00:17:33,279 --> 00:17:37,359
I think that what goes best, and it's solutions like

319
00:17:37,400 --> 00:17:40,119
Datree, or you can take Open Policy Agent

320
00:17:40,599 --> 00:17:44,200
with Conftest and Gatekeeper and write your own policies

321
00:17:44,559 --> 00:17:47,319
and what I've heard from developers is that when the

322
00:17:47,359 --> 00:17:51,480
middle ground they call it, I feel like I have guardrails,

323
00:17:52,000 --> 00:17:55,599
Like I'm riding the freeway, but I have guardrails so

324
00:17:55,720 --> 00:17:58,960
I can do it by myself. I'm not bottlenecked by

325
00:17:59,000 --> 00:18:02,640
DevOps, but if I do something horribly wrong, the

326
00:18:02,759 --> 00:18:06,799
system will stop me. And then it's like a nice

327
00:18:06,799 --> 00:18:10,319
middle ground between the two, which I think can greatly

328
00:18:10,400 --> 00:18:12,720
help both sides of the equation.
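The guardrail idea can be illustrated with a tiny policy check of the kind mentioned earlier in the episode: pull from a central registry and do not rely on the latest tag. This is a hedged Python sketch, not how Datree, Conftest, or Gatekeeper implement their rules; the function name `image_findings` and the registry host `registry.example.com` are hypothetical.

```python
def image_findings(image: str, allowed_registry: str) -> list[str]:
    """Return human-readable policy violations for one container image reference."""
    findings = []

    # Guardrail 1: images must come from the company's central registry.
    if not image.startswith(allowed_registry + "/"):
        findings.append(f"{image}: pull from {allowed_registry} instead")

    # Only the last path segment can carry a tag or digest.
    name = image.rsplit("/", 1)[-1]
    if "@" in name:
        # Pinned by digest, so the version is fixed.
        return findings

    # Guardrail 2: a missing tag defaults to "latest", which is not reproducible.
    tag = name.split(":", 1)[1] if ":" in name else ""
    if tag in ("", "latest"):
        findings.append(f"{image}: pin an explicit version instead of 'latest'")

    return findings


if __name__ == "__main__":
    registry = "registry.example.com"  # hypothetical central registry
    for ref in ("nginx", f"{registry}/team/api:latest", f"{registry}/team/api:1.4.2"):
        print(ref, "->", image_findings(ref, registry) or "ok")
```

Each finding tells the developer what to change, which matches the point made in the conversation that a failed check should explain the fix rather than just block.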

329
00:18:13,160 --> 00:18:14,880
Speaker 5: I think that's one of the approaches I try to

330
00:18:14,880 --> 00:18:17,759
take in specifically in post mortems, you know, because in

331
00:18:17,799 --> 00:18:21,000
post-mortems a lot of the focus is on root

332
00:18:21,079 --> 00:18:22,319
cause and what went wrong.

333
00:18:22,400 --> 00:18:23,119
Speaker 3: But I try to.

334
00:18:23,039 --> 00:18:25,799
Speaker 5: Take it a little bit further than that and say,

335
00:18:26,319 --> 00:18:29,839
you know, the failure was not that this code did whatever.

336
00:18:29,920 --> 00:18:34,799
The failure is that the system didn't warn somebody that

337
00:18:34,880 --> 00:18:35,839
this was going to happen.

338
00:18:35,960 --> 00:18:36,200
Speaker 4: Right.

339
00:18:36,279 --> 00:18:40,559
Speaker 5: We built an environment where a developer or an engineer

340
00:18:41,200 --> 00:18:43,200
was able to make a change that they shouldn't have

341
00:18:43,240 --> 00:18:45,000
been allowed to make. And I think that's what you're

342
00:18:45,039 --> 00:18:47,880
describing there, is they're free to do whatever they want,

343
00:18:47,920 --> 00:18:50,759
but there's the guardrails in place to keep them from

344
00:18:51,240 --> 00:18:53,720
doing something that they didn't intentionally want to do.

345
00:18:53,759 --> 00:18:57,400
Speaker 2: In AWS, you go delete the resource and

346
00:18:57,400 --> 00:19:02,279
it's like, there are fifteen resources attached to this security group.

347
00:19:03,440 --> 00:19:05,960
I guess you don't want to delete the security group.

348
00:19:07,640 --> 00:19:10,480
Speaker 4: So one thing I'm curious about you is when you

349
00:19:10,480 --> 00:19:14,240
talk about your clients, I'm curious to hear what are

350
00:19:14,240 --> 00:19:18,519
the common sort of misconfigurations. You know, I don't know

351
00:19:18,559 --> 00:19:20,480
if you were to say, you know, if I were to

352
00:19:20,480 --> 00:19:22,319
ask you here, what are the top five or top ten,

353
00:19:22,880 --> 00:19:26,559
I'm really curious to hear like what you see commonly.

354
00:19:27,559 --> 00:19:32,839
Speaker 2: Yeah, great question, It's an absolutely great question. And by

355
00:19:32,839 --> 00:19:34,880
the way, in our docs at hub.datree.io,

356
00:19:34,960 --> 00:19:37,200
we list all of the policies that we

357
00:19:37,279 --> 00:19:41,440
have and you can view everything. So let's go over

358
00:19:41,559 --> 00:19:45,279
some categories and talk about them and talk about the

359
00:19:45,039 --> 00:19:51,119
security. So one day, one company, a very big

360
00:19:51,160 --> 00:19:54,160
messaging company, told me: I want an

361
00:19:54,200 --> 00:19:57,759
infra safety score. I want this to

362
00:19:57,880 --> 00:20:02,200
run and for me to feel safe. Not safe

363
00:20:02,240 --> 00:20:06,079
in regards to security, although it also has security aspects,

364
00:20:06,079 --> 00:20:09,519
but I want to know that my safety score is high.

365
00:20:10,400 --> 00:20:15,240
So it starts from resource management. So in Kubernetes, for example,

366
00:20:15,240 --> 00:20:19,359
you can have a CPU request and a CPU limit, a memory limit,

367
00:20:19,519 --> 00:20:23,240
and a memory request. This is very, very common, and it

368
00:20:23,279 --> 00:20:29,920
especially happens because the developer, she codes the app and

369
00:20:29,960 --> 00:20:33,000
then she sends it to the cluster, she doesn't know

370
00:20:33,680 --> 00:20:36,480
what she is going to be paired with. Now, the

371
00:20:36,519 --> 00:20:42,519
DevOps engineer works on workload management and optimizes the workloads

372
00:20:42,799 --> 00:20:46,160
on the different nodes so it will be cost effective.

373
00:20:46,559 --> 00:20:49,119
And the problem starts when you don't have memory and

374
00:20:49,200 --> 00:20:52,599
CPU limits, and then you have a memory leak in

375
00:20:52,680 --> 00:20:56,400
one of your containers, and then it starts affecting others. It's

376
00:20:56,440 --> 00:21:00,279
like a noisy neighbor, but very, very noisy, and then

377
00:21:00,400 --> 00:21:04,160
Kubernetes starts to, it depends on how you configured it

378
00:21:04,200 --> 00:21:07,119
and so on, but it starts to kill services, starts

379
00:21:07,119 --> 00:21:10,079
to run out of memory. There are different behaviors that

380
00:21:10,400 --> 00:21:12,640
none of them is good and none of them is

381
00:21:12,680 --> 00:21:16,400
as expected. So I'd say this is the like no

382
00:21:16,559 --> 00:21:19,319
brainer one that you should do.
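
The policy described above (every container declares CPU and memory requests and limits) can be sketched as a small check over a manifest that has already been parsed into a dictionary. This is only an illustration under assumptions: the function name, the `REQUIRED` table, and the sample `deployment` are hypothetical and are not Datree's actual rules or API.

```python
from typing import Any

# The four settings mentioned in the conversation.
REQUIRED = (
    ("requests", "cpu"),
    ("requests", "memory"),
    ("limits", "cpu"),
    ("limits", "memory"),
)


def missing_resource_settings(manifest: dict[str, Any]) -> list[str]:
    """Return 'container: section.key' for every missing request or limit."""
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    problems = []
    for container in pod_spec.get("containers", []):
        resources = container.get("resources") or {}
        for section, key in REQUIRED:
            if key not in (resources.get(section) or {}):
                problems.append(f"{container.get('name', '?')}: {section}.{key}")
    return problems


# Hypothetical Deployment, already parsed from YAML into a dict.
deployment = {
    "kind": "Deployment",
    "spec": {"template": {"spec": {"containers": [
        {
            "name": "api",
            "resources": {
                "requests": {"cpu": "250m", "memory": "256Mi"},
                "limits": {"memory": "512Mi"},  # CPU limit left out on purpose
            },
        },
    ]}}},
}

print(missing_resource_settings(deployment))
```

A check like this only confirms that the values exist; whether 512Mi is the right number for a given service is still the owning team's decision.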

383
00:21:20,359 --> 00:21:20,519
Speaker 4: Now.

384
00:21:20,559 --> 00:21:23,319
Speaker 2: What we see companies usually do also is that they

385
00:21:23,359 --> 00:21:27,920
start and they apply cluster-wide memory and CPU limits

386
00:21:28,799 --> 00:21:30,880
because you can do it on the runtime level. The

387
00:21:30,920 --> 00:21:34,680
problem starts when you know you have different departments. One

388
00:21:34,680 --> 00:21:37,960
of them needs four gigabytes of memory and it's great,

389
00:21:38,039 --> 00:21:40,440
but then you have the AI engineers and they're like,

390
00:21:40,480 --> 00:21:43,960
we need forty gigabytes. And then if you don't configure

391
00:21:44,000 --> 00:21:46,640
it on the shift-left side, then you go like, okay,

392
00:21:46,680 --> 00:21:49,160
so I need to increase everyone's limit to forty gigabytes,

393
00:21:49,200 --> 00:21:52,440
and then it's like nothing, it doesn't matter. So it's

394
00:21:52,440 --> 00:21:55,440
really important to set it on the resources side. So

395
00:21:55,599 --> 00:21:58,839
that's one. The second one is I would say around

396
00:21:59,160 --> 00:22:02,720
workload management, in terms of making sure that you have

397
00:22:02,759 --> 00:22:07,079
a liveness probe, a readiness probe, that your Docker container

398
00:22:07,119 --> 00:22:11,559
has a health check. It sounds so trivial, it sounds

399
00:22:11,680 --> 00:22:16,720
so simple, but so many times people go and create

400
00:22:16,759 --> 00:22:20,480
the workload, don't set those things. They just go, oh, just

401
00:22:20,680 --> 00:22:24,039
HTTP to it, ah, HTTP 200, it works, great,

402
00:22:26,920 --> 00:22:29,519
and then they don't configure it on the workload level.

403
00:22:29,960 --> 00:22:32,640
And not to talk about you know, deeper things where

404
00:22:32,680 --> 00:22:34,400
you have a service and you want the health check

405
00:22:34,440 --> 00:22:37,039
to include maybe a connection to a database or to

406
00:22:37,119 --> 00:22:41,200
a cache, and I would really, really advise, like,

407
00:22:41,319 --> 00:22:45,200
in order to increase your safety and stability, really put

408
00:22:45,279 --> 00:22:49,440
an effort into your health checks, readiness, liveness, because if

409
00:22:49,519 --> 00:22:52,839
you do it right and correctly, once things fail and

410
00:22:52,880 --> 00:22:55,880
things always fail. It will be easy for you to

411
00:22:55,960 --> 00:22:58,640
find the root cause, and it will be easy for

412
00:22:58,680 --> 00:23:01,480
you to protect yourself and for Kubernetes to kill this

413
00:23:01,559 --> 00:23:03,759
workload and to get another workload running.
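
One way to picture a health check that goes deeper than "the endpoint answered with a 200" is a function that runs one small check per dependency and reports each result by name. The sketch below is a minimal illustration under assumptions: `deep_health`, `database_ping`, and `cache_ping` are hypothetical names, and the two ping functions are stand-ins for real connection tests.

```python
from typing import Callable


def deep_health(checks: dict[str, Callable[[], bool]]) -> tuple[int, dict[str, str]]:
    """Run every dependency check and return an HTTP-style status plus a report."""
    report = {}
    for name, check in checks.items():
        try:
            report[name] = "ok" if check() else "failed"
        except Exception as exc:  # a crashing check counts as unhealthy
            report[name] = f"error: {exc}"
    status = 200 if all(result == "ok" for result in report.values()) else 503
    return status, report


def database_ping() -> bool:
    """Hypothetical stand-in for a real database round trip."""
    return True


def cache_ping() -> bool:
    """Hypothetical stand-in for a cache that cannot be reached."""
    raise ConnectionError("cache unreachable")


status, report = deep_health({"database": database_ping, "cache": cache_ping})
print(status, report)
```

Because the report names the failing dependency, the person on call can see which layer is broken instead of a generic failure.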

414
00:23:03,960 --> 00:23:05,880
Speaker 4: So let me just stop you there, because I want

415
00:23:06,039 --> 00:23:09,200
to ask the dumb questions. So I think, I think

416
00:23:09,240 --> 00:23:11,799
I understand what you're saying. But for some of our

417
00:23:12,039 --> 00:23:14,079
for other people, you know, our listeners, who may not

418
00:23:14,400 --> 00:23:16,519
have followed that whole train of thought, right, because there's

419
00:23:16,519 --> 00:23:19,400
a lot there. You just said in two minutes that

420
00:23:19,480 --> 00:23:22,319
I feel like we could unpack, right, So I'm going

421
00:23:22,359 --> 00:23:23,400
to say it, and then you need to tell me

422
00:23:23,440 --> 00:23:25,400
how wrong I am or if I miss something. But

423
00:23:26,160 --> 00:23:29,640
so let's say we spin up, you know we we

424
00:23:29,640 --> 00:23:32,279
we send out a new package, we spin it up,

425
00:23:32,599 --> 00:23:34,839
and I'm the developer. So what I do is I

426
00:23:34,920 --> 00:23:37,119
just check to say, hey, can I hit, can I,

427
00:23:37,119 --> 00:23:39,279
can I with my browser hit it? And I get

428
00:23:39,319 --> 00:23:43,559
a two hundred response back saying we're good. So that's

429
00:23:43,599 --> 00:23:45,759
only a piece of it, because maybe I'm only hitting

430
00:23:45,839 --> 00:23:48,240
let's say the load balancer, and so the load balancer

431
00:23:48,319 --> 00:23:52,000
is saying I'm here, right, I'm answering you, but the

432
00:23:52,000 --> 00:23:57,279
application behind it is dead, or maybe the application is alive,

433
00:23:57,480 --> 00:24:01,039
but the database behind it is dead. So unless I'm

434
00:24:01,079 --> 00:24:03,799
doing health checks that are sort of going through those

435
00:24:03,839 --> 00:24:07,680
steps we may have had, we may have just deployed

436
00:24:07,720 --> 00:24:11,119
something that broke everything and I don't even realize it

437
00:24:11,119 --> 00:24:15,000
because all I'm doing is pinging the load balancer and

438
00:24:15,039 --> 00:24:17,279
getting a two hundred response and everything looks good to

439
00:24:17,319 --> 00:24:19,200
me because I didn't check what's going on behind you know,

440
00:24:19,200 --> 00:24:20,960
I sort of peeling back the layers of the onion.

441
00:24:21,559 --> 00:24:23,279
Did I get that right? Or am I missing something?

442
00:24:23,039 --> 00:24:27,400
Speaker 2: Absolutely right. This is one of the most common

443
00:24:27,440 --> 00:24:33,559
mistakes developers make is they just check the simple front

444
00:24:33,680 --> 00:24:37,000
end web browser and they don't do the entire process.

445
00:24:37,240 --> 00:24:39,119
And then when you do have a problem, it's so

446
00:24:39,279 --> 00:24:42,880
hard to debug it because everything returns a great health check.

447
00:24:42,960 --> 00:24:46,119
So you don't understand what the problem actually is.

448
00:24:46,440 --> 00:24:50,039
Speaker 4: So is this something that, you know, like, Datree

449
00:24:50,200 --> 00:24:53,000
does for me? Is this something I've got to figure out?

450
00:24:53,039 --> 00:24:54,119
Like how does that?

451
00:24:54,200 --> 00:24:54,359
Speaker 2: You know?

452
00:24:54,480 --> 00:24:56,079
Speaker 4: How do you build that into a health check? So

453
00:24:55,920 --> 00:24:57,480
that sounds like there's a lot of steps, and it

454
00:24:58,079 --> 00:25:00,720
really depends on the architecture of your application.

455
00:25:01,559 --> 00:25:05,000
Speaker 2: Yeah, so we can talk about it from an engineering standpoint.

456
00:25:05,920 --> 00:25:08,559
In terms of Datree, Datree is a tool,

457
00:25:08,799 --> 00:25:10,839
and it's a tool that you can use in order

458
00:25:10,880 --> 00:25:15,079
to say, listen, from now on, all of our Kubernetes

459
00:25:15,160 --> 00:25:19,279
workloads are going to have a liveness probe, a readiness probe. Now,

460
00:25:19,279 --> 00:25:21,839
how you configure this liveness probe and readiness probe is

461
00:25:21,920 --> 00:25:24,160
up to you. Same thing, you're going to put a

462
00:25:24,200 --> 00:25:26,880
memory limit. If you put a memory limit of sixty

463
00:25:26,920 --> 00:25:30,359
four megabytes and your server can't even it's some Java

464
00:25:30,680 --> 00:25:33,200
huge jar, I don't know, it can't even load up,

465
00:25:33,759 --> 00:25:36,839
it's your problem. But what we will do is we

466
00:25:36,920 --> 00:25:40,640
will make sure that a policy exists and that it

467
00:25:41,960 --> 00:25:45,960
is configured on the resource. The next layer is another

468
00:25:46,160 --> 00:25:50,119
thing that is like what is the most common you know, policies.

469
00:25:50,799 --> 00:25:55,440
It's actually labels. Again, it sounds so simple, it sounds

470
00:25:55,440 --> 00:25:59,119
so trivial to put the label, and there are so

471
00:25:59,279 --> 00:26:01,720
many reasons why to put a label. So I'll

472
00:26:01,720 --> 00:26:04,759
start with the one that we're talking about now. So

473
00:26:05,240 --> 00:26:08,200
first of all, you can use labels in order to

474
00:26:08,240 --> 00:26:10,680
say what type of workload it is in order to

475
00:26:10,720 --> 00:26:14,279
determine which type of policy in terms of resource management,

476
00:26:14,319 --> 00:26:16,759
for example, it should use. So then you could say

477
00:26:16,839 --> 00:26:21,079
this is from type AI and they use those types

478
00:26:21,119 --> 00:26:24,599
of limits and those are from type back end front end.

479
00:26:24,680 --> 00:26:27,759
I don't know. Different teams call it by different names,

480
00:26:28,039 --> 00:26:30,839
and you can use it in order to understand what

481
00:26:30,880 --> 00:26:34,640
are the relevant policies you should use. This is number one.

482
00:26:35,119 --> 00:26:38,880
Number two cost management. This is also a very very

483
00:26:38,880 --> 00:26:42,400
common use case that DevOps people have to deal with,

484
00:26:42,880 --> 00:26:47,599
which is constantly knowing to assign the cost center because

485
00:26:47,640 --> 00:26:50,480
they run the shared resources and at the end of

486
00:26:50,519 --> 00:26:52,680
the day they pay the check to AWS or

487
00:26:52,720 --> 00:26:55,519
Azure or wherever you run it, and then the internal

488
00:26:55,599 --> 00:26:58,079
company goes like, okay, but how much do we need

489
00:26:58,119 --> 00:27:01,640
to bill each business unit inside of the organization? And then

490
00:27:01,680 --> 00:27:04,319
they go like, I don't know. We had like five

491
00:27:04,400 --> 00:27:08,319
thousand servers and then they go like, okay, now it's

492
00:27:08,359 --> 00:27:11,920
mandatory everyone should say which department this server belongs to,

493
00:27:12,000 --> 00:27:14,079
because otherwise we're not going to know how much to

494
00:27:14,599 --> 00:27:17,119
allocate to it, because then you don't know what is

495
00:27:17,160 --> 00:27:20,119
the cost of goods of your business, and then the

496
00:27:20,160 --> 00:27:24,160
CFO doesn't know if the business is profitable or not profitable,

497
00:27:24,200 --> 00:27:26,160
or can we hire people? Can we not hire people?

498
00:27:26,519 --> 00:27:30,279
And it's crazy because it's like board of director's decisions

499
00:27:30,279 --> 00:27:32,680
that go down to the CFO that go down, down, down,

500
00:27:32,720 --> 00:27:35,279
down down to the simple label that you need to

501
00:27:35,319 --> 00:27:38,039
put on your Kubernetes workload in order to know

502
00:27:38,079 --> 00:27:39,079
how much it costs.
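
The two uses of labels described above (choosing which policy applies, and allocating cost to a department) can be sketched in a few lines. Everything named here is an assumption for illustration: the label keys `team`, `cost-center`, and `workload-type`, the function names, and the sample costs are hypothetical, not a standard or a Datree rule.

```python
# Hypothetical label keys an organization might make mandatory.
REQUIRED_LABELS = ("team", "cost-center", "workload-type")


def labels_of(manifest: dict) -> dict:
    """Return the metadata.labels mapping, or an empty dict if absent."""
    return manifest.get("metadata", {}).get("labels") or {}


def missing_labels(manifest: dict, required: tuple = REQUIRED_LABELS) -> list:
    """List every required label key that is absent or empty."""
    labels = labels_of(manifest)
    return [key for key in required if not labels.get(key)]


def cost_by_center(workloads: list) -> dict:
    """Sum (manifest, monthly_cost) pairs per cost-center label."""
    totals = {}
    for manifest, cost in workloads:
        center = labels_of(manifest).get("cost-center", "unallocated")
        totals[center] = totals.get(center, 0.0) + cost
    return totals


ai_job = {"metadata": {"labels": {
    "team": "ml", "cost-center": "research", "workload-type": "ai"}}}
web_app = {"metadata": {"labels": {"team": "web"}}}

print(missing_labels(web_app))
print(cost_by_center([(ai_job, 100.0), (web_app, 50.0), (ai_job, 25.0)]))
```

Anything that lands in the "unallocated" bucket is exactly the spend that finance cannot attribute to a business unit.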

503
00:27:39,319 --> 00:27:42,480
Speaker 5: Can you define for us the difference between a liveness

504
00:27:42,480 --> 00:27:43,799
probe and a readiness probe?

505
00:27:44,119 --> 00:27:48,759
Speaker 2: That is a great question. So liveness probe works on

506
00:27:49,240 --> 00:27:52,240
I said readiness. Readiness probe is when I'm ready to

507
00:27:52,279 --> 00:27:57,160
serve traffic. So let's say I'm initializing myself. I need

508
00:27:57,200 --> 00:27:59,519
to start. I need to go create a cache, make

509
00:27:59,519 --> 00:28:01,240
sure that I can put things there, and so on.

510
00:28:01,839 --> 00:28:04,640
And then a liveness probe is when I'm running. Am

511
00:28:04,640 --> 00:28:08,079
I running correctly? Can I continue communicating, for example, with

512
00:28:08,240 --> 00:28:12,319
my existing cache or whatever it is. I like the

513
00:28:13,319 --> 00:28:16,240
I think it's too much. I like the simple

514
00:28:16,279 --> 00:28:18,400
health check you know that goes end to end and

515
00:28:19,400 --> 00:28:22,359
does the check. In addition, by the way, I also

516
00:28:22,880 --> 00:28:25,039
suggest for companies, it has nothing to do with

517
00:28:25,079 --> 00:28:29,240
Datree and so on, but, like, to configure outside health checks that

518
00:28:29,400 --> 00:28:33,400
actually go and do a user activity on your services

519
00:28:33,559 --> 00:28:36,319
for real, because the worst thing you want is a

520
00:28:36,359 --> 00:28:39,240
customer calling and saying the service is down. You want

521
00:28:39,279 --> 00:28:41,400
to be the first to know, so I think that

522
00:28:41,480 --> 00:28:44,680
those are the main things I would focus on.

523
00:28:45,039 --> 00:28:47,680
Speaker 5: That's a really good point. I've been in a few

524
00:28:47,759 --> 00:28:53,319
outages where everything was working internally but nothing was working externally.

525
00:28:53,880 --> 00:28:57,640
Speaker 2: Yep, I'll tell you one of my most severe outages.

526
00:28:58,160 --> 00:29:01,720
It was so hard to debug. It was my previous company.

527
00:29:02,440 --> 00:29:04,799
It was not an outage, it was even worse. What's

528
00:29:04,799 --> 00:29:08,400
worse than an outage? Everything slows down and works really

529
00:29:08,440 --> 00:29:11,759
really bad, and it doesn't break, so you don't really know.

530
00:29:12,119 --> 00:29:15,680
And it was a data pipeline that collected two hundred

531
00:29:15,799 --> 00:29:21,640
billion events every month, and it was a geolocation based routing,

532
00:29:22,000 --> 00:29:24,640
so every time someone would click an ad,

533
00:29:25,039 --> 00:29:28,279
it would route to the closest AWS region and send

534
00:29:28,440 --> 00:29:31,599
the event there. And then we had thirteen regions that

535
00:29:31,640 --> 00:29:34,880
would send everything to a centralized Kinesis, and then we

536
00:29:34,920 --> 00:29:37,720
would have workers that would process it. Now, in order

537
00:29:37,759 --> 00:29:41,400
to do the deduplication and add some attributes, we had

538
00:29:41,400 --> 00:29:44,759
a Redis cache, and this Redis, so all the workers

539
00:29:44,839 --> 00:29:47,839
would access the Redis cache in order to put in IDs,

540
00:29:48,039 --> 00:29:51,480
select IDs and so on. And at some point, with

541
00:29:51,559 --> 00:29:55,640
the amount of messages increased, you know, slowly, slowly, slowly, slowly,

542
00:29:55,960 --> 00:29:58,200
and then at some point the memory of the Redis

543
00:29:58,359 --> 00:30:01,279
got filled. So what did it do? It switched to swap.

544
00:30:01,720 --> 00:30:04,559
This is a problem with swap. It's slow. And then

545
00:30:04,640 --> 00:30:08,319
all the requests started returning really really slowly. And then

546
00:30:08,359 --> 00:30:10,799
you don't understand. You think, okay, there's a problem, so

547
00:30:10,839 --> 00:30:14,200
you put on more servers and then they bombard the

548
00:30:14,279 --> 00:30:16,599
Redis even more, and then you put on more workers

549
00:30:16,680 --> 00:30:20,240
and like you're trying everything from like you're just trying

550
00:30:20,240 --> 00:30:23,240
to debug everything until finally we're like opening and I

551
00:30:23,319 --> 00:30:25,480
was like, oh my god, the Redis is running on swap.

552
00:30:27,039 --> 00:30:29,920
And then we had to increase the Redis memory and

553
00:30:29,960 --> 00:30:33,279
then and then it fixed it. And if we had

554
00:30:33,279 --> 00:30:36,440
a check, a liveness check, that said I'm going

555
00:30:36,519 --> 00:30:38,960
to perform, I put events to the Redis and I

556
00:30:38,960 --> 00:30:42,880
expect it to take two to four milliseconds. I'm just making

557
00:30:42,920 --> 00:30:45,799
this up. And if at some point this is more

558
00:30:45,839 --> 00:30:49,359
than four milliseconds, there's a problem, we would have immediately

559
00:30:49,480 --> 00:30:52,839
known where the root cause of this issue is. But

560
00:30:52,880 --> 00:30:56,000
we didn't. That's the truth.
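
The check described in this story (perform one representative cache operation and treat it as unhealthy if it takes longer than expected) can be sketched with a timer around a callable. This is an illustration under assumptions: the function names are hypothetical, the two cache operations are stand-ins rather than real Redis calls, and the 4 ms threshold is the figure the speaker says he made up.

```python
import time
from typing import Callable


def timed_liveness_check(operation: Callable[[], object],
                         max_seconds: float) -> tuple[bool, float]:
    """Run one representative operation and fail the check if it is too slow."""
    start = time.perf_counter()
    operation()
    elapsed = time.perf_counter() - start
    return elapsed <= max_seconds, elapsed


def fast_cache_write() -> None:
    """Hypothetical stand-in for a healthy cache write."""


def swapping_cache_write() -> None:
    """Hypothetical stand-in for a cache that has spilled to swap."""
    time.sleep(0.05)


THRESHOLD_SECONDS = 0.004  # the "four milliseconds" from the story, itself a guess

healthy, _ = timed_liveness_check(fast_cache_write, THRESHOLD_SECONDS)
degraded, elapsed = timed_liveness_check(swapping_cache_write, THRESHOLD_SECONDS)
print(healthy, degraded, f"{elapsed * 1000:.0f} ms")
```

A latency-based check like this catches the "slow but not down" case, which a plain reachability check would report as healthy.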

561
00:30:58,039 --> 00:31:01,039
Speaker 4: Yeah, another good reason why post mortems are so important.

562
00:31:01,559 --> 00:31:05,960
I'm curious to know because here's my anecdotal I guess

563
00:31:05,960 --> 00:31:09,640
experience is that I find that very few organizations do

564
00:31:10,400 --> 00:31:14,960
post mortems well, and if they're doing them, I don't

565
00:31:15,000 --> 00:31:17,880
think that they do them in a very effective way.

566
00:31:17,920 --> 00:31:20,079
I think they do them in more of a finger pointing,

567
00:31:20,720 --> 00:31:23,720
root cause analysis, who caused the problem and who should

568
00:31:23,720 --> 00:31:27,799
we fire right? And I just feel like you feel

569
00:31:27,839 --> 00:31:31,319
that Unfortunately, Listen, I'm on the security side, so a

570
00:31:31,319 --> 00:31:34,480
lot of the post mortems I'm involved with are security incidents,

571
00:31:35,119 --> 00:31:37,640
so those might be a little bit, you know, handled

572
00:31:37,640 --> 00:31:39,960
a little bit differently than, like, a, you know, typical

573
00:31:39,960 --> 00:31:42,359
outage or or that sort of situation.

574
00:31:43,240 --> 00:31:45,240
Speaker 3: But yeah, we will did it.

575
00:31:45,960 --> 00:31:51,680
Speaker 4: Right, Yeah, I mean so, I guess I should say

576
00:31:51,680 --> 00:31:54,720
I feel like most of the time the post mortems

577
00:31:54,799 --> 00:31:57,599
just don't happen. I feel like the times that they do,

578
00:31:58,519 --> 00:32:02,200
it almost becomes a witch hunt. And those are very rare.

579
00:32:02,400 --> 00:32:05,599
But when they do happen, again, that's just my experience,

580
00:32:06,599 --> 00:32:10,480
they just just get nasty. So I'm curious. I want

581
00:32:10,519 --> 00:32:13,119
to hear a better story because I feel like my

582
00:32:13,200 --> 00:32:14,880
experience is not good.

583
00:32:15,079 --> 00:32:19,680
Speaker 2: I've never experienced anything like it. Thankfully, the organizations that

584
00:32:19,759 --> 00:32:23,680
I've worked with, my company, thank god, we did not

585
00:32:23,880 --> 00:32:27,880
have a security incident that someone stole all of our

586
00:32:27,920 --> 00:32:32,039
records or something, because then I think you're like obligated

587
00:32:32,359 --> 00:32:37,160
to take action. And maybe most of the like your

588
00:32:37,279 --> 00:32:40,839
cases were those type of severe cases where you know,

589
00:32:40,920 --> 00:32:44,680
it's it's just it's like borderline, like federal, it's like

590
00:32:45,039 --> 00:32:49,240
it's really a problem. And what I am, yeah, what

591
00:32:49,319 --> 00:32:53,039
I'm referring to is more of a engineers and like

592
00:32:53,039 --> 00:32:56,400
like the Redis story I just told you. Who

593
00:32:56,440 --> 00:32:58,720
are you gonna fire? No one. It's just gonna make

594
00:32:58,759 --> 00:33:01,039
everyone better, you know, and tell them and then

595
00:33:01,160 --> 00:33:03,279
think of how we could have fixed it. And then

596
00:33:03,359 --> 00:33:05,680
we implemented the check. Believe me, every time there was

597
00:33:05,720 --> 00:33:09,319
a problem, the first thing everyone checked was the Redis. Everyone

598
00:33:09,480 --> 00:33:12,319
went to see that Redis is okay. It was like

599
00:33:12,359 --> 00:33:15,480
a small baby that everyone takes care of. But I

600
00:33:15,480 --> 00:33:18,119
don't believe in the witch hunts. I really believe in

601
00:33:18,160 --> 00:33:21,480
the culture where people come and they say I made

602
00:33:21,480 --> 00:33:25,039
a mistake and people help them understand. And again, as

603
00:33:25,079 --> 00:33:29,279
long as there was no negligence, I don't know, you know,

604
00:33:29,480 --> 00:33:34,480
something criminal or something like that, people make mistakes. Another story,

605
00:33:34,519 --> 00:33:37,559
there was an employee. It was her first day. The

606
00:33:37,599 --> 00:33:40,599
company was still using SVN and not Git, and on

607
00:33:40,640 --> 00:33:42,920
her first day on the job, she deleted the entire

608
00:33:43,359 --> 00:33:53,279
SVN tree. Nothing happened to her. Yeah, so I think

609
00:33:53,319 --> 00:33:58,039
they restored the backup and it's okay. But I think this

610
00:33:58,119 --> 00:33:59,519
is the main difference. I don't know what is your

611
00:33:59,559 --> 00:34:00,480
experience personally.

612
00:34:00,519 --> 00:34:01,960
Speaker 4: I've seen both ways, you know.

613
00:34:01,960 --> 00:34:06,279
Speaker 5: I remember in years past post mortems where the lynch

614
00:34:06,359 --> 00:34:10,159
mob had the pitchforks and the torches, trying

615
00:34:10,159 --> 00:34:12,480
to find out who we were going to grab. But

616
00:34:12,639 --> 00:34:15,679
I think that's in my experience that's gone away over

617
00:34:15,679 --> 00:34:20,599
the last few years to people being more willing to

618
00:34:21,000 --> 00:34:25,239
accept that mistakes happen. But it almost feels like a

619
00:34:25,320 --> 00:34:32,880
pendulum where now it's overly trying too hard,

620
00:34:32,960 --> 00:34:37,519
I guess to make sure that someone doesn't feel attacked

621
00:34:37,679 --> 00:34:39,760
in the post mortem, that you never get to the

622
00:34:39,840 --> 00:34:42,960
root cause either, you know, And so I think you

623
00:34:43,079 --> 00:34:47,480
got to struggle to find the happy medium there. And

624
00:34:47,920 --> 00:34:50,840
I mean, ultimately, you know, in a lot of these situations,

625
00:34:51,000 --> 00:34:55,440
someone did do something incorrect and you've got to point

626
00:34:55,480 --> 00:34:58,079
that out in order to identify it. And when you

627
00:34:58,079 --> 00:35:00,760
point it out, you know, you're not like calling that

628
00:35:00,840 --> 00:35:05,559
person out or attacking their skills. It was just a mistake.

629
00:35:05,599 --> 00:35:07,880
It happened, but it's important to fully understand what that

630
00:35:07,920 --> 00:35:11,800
mistake was so that you can build in the systems

631
00:35:11,840 --> 00:35:13,280
to prevent it from happening again.

632
00:35:13,880 --> 00:35:14,119
Speaker 2: Yeah.

633
00:35:14,199 --> 00:35:18,519
Speaker 1: Yeah, I've been in the situation where and not because

634
00:35:18,519 --> 00:35:21,880
of a post mortem, but just because of other things.

635
00:35:21,920 --> 00:35:22,079
Speaker 3: You know.

636
00:35:22,079 --> 00:35:24,599
Speaker 1: I had a boss come in once on one of

637
00:35:24,599 --> 00:35:27,480
the teams, I was team lead, and he basically walked

638
00:35:27,480 --> 00:35:30,639
in the room and said, somebody's getting fired today, right,

639
00:35:31,320 --> 00:35:35,079
And you don't want people to feel that, right, because

640
00:35:35,119 --> 00:35:37,159
I took him outside and I said, I said, if

641
00:35:37,199 --> 00:35:39,679
you're going to pull this, they're all keeping their jobs.

642
00:35:39,679 --> 00:35:43,400
I'm just going to quit, right, And it's because nobody

643
00:35:43,440 --> 00:35:46,199
should live in that kind of fear. Right, We're all

644
00:35:46,239 --> 00:35:48,559
trying to work on the same thing. But the flip

645
00:35:48,599 --> 00:35:51,960
side is, yeah, I mean, if somebody is routinely reckless, right,

646
00:35:52,360 --> 00:35:53,800
it's always Jim.

647
00:35:54,039 --> 00:35:54,280
Speaker 3: Right.

648
00:35:54,400 --> 00:35:57,440
Speaker 1: It's gone down four times this month, and Jim has

649
00:35:57,480 --> 00:35:59,679
been the one to mess it up every time, and

650
00:36:00,199 --> 00:36:02,199
this is all stuff that we've done training on, and

651
00:36:02,239 --> 00:36:04,639
so Jim should know better, you know. The first time, Hey,

652
00:36:04,719 --> 00:36:08,760
Jim's a human. Second time, Jim's still a human. Third time, Okay,

653
00:36:09,119 --> 00:36:12,039
Jim's a human. But Jim is starting to cause some problems.

654
00:36:12,280 --> 00:36:14,639
You can have the conversation about whether or not Jim

655
00:36:14,719 --> 00:36:17,719
needs to keep his job. But if people feel like

656
00:36:18,400 --> 00:36:23,039
they're going to be punished for making a mistake every

657
00:36:23,079 --> 00:36:25,239
once in a great while, then you're going to slow

658
00:36:25,320 --> 00:36:28,239
the whole system way down. And the whole point, as

659
00:36:28,239 --> 00:36:31,400
Shimon keeps pointing out, is we want to keep moving fast.

660
00:36:31,519 --> 00:36:32,679
We want to move fast, we.

661
00:36:32,639 --> 00:36:35,000
Speaker 3: Want to get stuff out, we want to solve problems

662
00:36:35,000 --> 00:36:37,679
for our customers as quickly as possible, and at the

663
00:36:37,679 --> 00:36:40,599
same time maintain some level of stability.

664
00:36:40,760 --> 00:36:43,000
Speaker 4: Yeah, really agree. I think the last point I would

665
00:36:43,000 --> 00:36:45,000
make is that I think the whole idea of root

666
00:36:45,039 --> 00:36:48,039
cause analysis, even if it is one person's you know,

667
00:36:48,039 --> 00:36:49,280
at the end of the day, even if you can

668
00:36:49,280 --> 00:36:53,239
tie it back to one person's typo or mistake or whatever,

669
00:36:53,880 --> 00:36:56,400
I personally feel like the root cause analysis is generally

670
00:36:56,440 --> 00:37:02,079
flawed in that it's rarely one person right. It's it

671
00:37:02,159 --> 00:37:05,239
might be one person you know again who typed it

672
00:37:05,280 --> 00:37:08,280
in wrong or did whatever, but there's a process breakdown

673
00:37:08,320 --> 00:37:11,599
as well, and there was an authority breakdown, or like

674
00:37:11,760 --> 00:37:14,800
what Shimon was talking about before, the guardrails didn't exist.

675
00:37:15,639 --> 00:37:18,000
You just can't point it at one person like it's

676
00:37:18,039 --> 00:37:22,119
the system broke down. Yes, it resulted in somebody's mistake

677
00:37:22,159 --> 00:37:24,320
in a manifest file or something like that. But if

678
00:37:24,360 --> 00:37:26,119
you go, you know, if you take it back, you

679
00:37:26,119 --> 00:37:28,320
look at it and you say, well, wait a second, guys,

680
00:37:28,400 --> 00:37:30,639
because our process isn't all that great. He was trying

681
00:37:30,679 --> 00:37:32,800
to do the best he could, he didn't know or

682
00:37:32,800 --> 00:37:34,840
whatever it was. You can't be an expert in everything

683
00:37:35,239 --> 00:37:38,360
made a mistake, but it's because the entire process broke down,

684
00:37:38,360 --> 00:37:40,280
not just because one person made a mistake. And I

685
00:37:40,280 --> 00:37:42,719
feel like that's the piece that you know, you're trying

686
00:37:42,719 --> 00:37:45,719
to do the cause analysis, that's the piece that people

687
00:37:45,840 --> 00:37:46,760
just don't think about.

688
00:37:47,000 --> 00:37:50,119
Speaker 2: I totally agree. Just just to finish on this point,

689
00:37:50,159 --> 00:37:53,559
the best root cause analysis process that I have ever

690
00:37:53,599 --> 00:37:56,920
seen in my life is GitLab. They went down

691
00:37:57,519 --> 00:38:01,760
and they've opened a live doc that everyone could see,

692
00:38:01,840 --> 00:38:06,159
all the customers, everyone, and they had sessions

693
00:38:06,159 --> 00:38:09,639
that are like an open Google Hangout or Zoom. I remember

694
00:38:09,679 --> 00:38:12,280
what they did, and anyone could join, and it was

695
00:38:12,320 --> 00:38:16,400
a totally transparent process of them debugging the outage that

696
00:38:16,440 --> 00:38:20,280
they had, and of course afterwards they published everything like

697
00:38:20,760 --> 00:38:24,159
including like logs, crazy stuff and like here's what happened.

698
00:38:24,440 --> 00:38:27,880
Here's for transparency, and here's for you to learn how

699
00:38:28,320 --> 00:38:30,880
not to make our mistakes. And I really admired it.

700
00:38:31,000 --> 00:38:33,239
Speaker 5: Yeah, I think there's something to be said for gaining

701
00:38:33,320 --> 00:38:37,639
credibility with your customers whenever they find out that there's

702
00:38:37,679 --> 00:38:40,760
an outage from you, instead of them telling you that

703
00:38:40,840 --> 00:38:44,400
there's an outage, and then you provide real time or

704
00:38:44,480 --> 00:38:47,880
near-real-time updates to them up until the issue's resolved.

705
00:38:48,480 --> 00:38:49,039
Speaker 2: Definitely.

706
00:38:49,280 --> 00:38:51,599
Speaker 5: So I think we've all seen scenarios where AWS has

707
00:38:51,639 --> 00:38:55,199
had an incident and you find out about it either

708
00:38:55,239 --> 00:38:58,199
personally or on Reddit three or four hours before the

709
00:38:58,239 --> 00:38:59,920
AWS status page updates.

710
00:39:00,519 --> 00:39:03,440
Speaker 2: That is, if it did not affect the status page,

711
00:39:03,559 --> 00:39:07,679
because that happened as well, you know.

712
00:39:10,320 --> 00:39:12,880
Speaker 1: Yeah, well, and that's interesting to me too, right, is

713
00:39:12,920 --> 00:39:17,199
that sometimes it's hey, we screwed this stuff up and

714
00:39:17,360 --> 00:39:20,159
so therefore our app didn't run. And then yeah, we

715
00:39:20,199 --> 00:39:22,480
see these big companies that use a lot of the

716
00:39:22,519 --> 00:39:25,679
AWS or other infrastructure out there on the cloud,

717
00:39:26,360 --> 00:39:29,599
and what winds up happening is yeah, what we're kind

718
00:39:29,599 --> 00:39:33,239
of talking about, except they take down the entire US

719
00:39:33,320 --> 00:39:36,880
East 1 region, right, and everybody goes, why is the

720
00:39:37,000 --> 00:39:41,400
Internet not working? And yeah, it turns out that, yeah,

721
00:39:41,559 --> 00:39:44,320
the Internet relied on that that region for a whole

722
00:39:44,320 --> 00:39:47,599
bunch of stuff and it's gone. And so those kinds

723
00:39:47,599 --> 00:39:50,960
of externalities too, where it's it goes beyond even your code,

724
00:39:51,000 --> 00:39:56,400
your company, your infrastructure, your cloud set up. That's fascinating too,

725
00:39:56,519 --> 00:39:59,719
and those cases, you know, as Will's pointing out, we

726
00:39:59,800 --> 00:40:03,159
all kind of want to know, right, because it's affecting everybody.

727
00:40:03,400 --> 00:40:05,079
Speaker 2: This is why, you know, it was very interesting when

728
00:40:05,159 --> 00:40:08,280
Jeffrey said that, like, witch hunt and so on. And

729
00:40:08,320 --> 00:40:12,679
I think this is like the defining line between security

730
00:40:12,719 --> 00:40:16,320
and infrastructure, where it's like the culture in infrastructure is like, yeah,

731
00:40:16,440 --> 00:40:19,480
we all like seventeen out the jazz and no problem.

732
00:40:19,800 --> 00:40:24,320
And then when it crosses this line specifically you know

733
00:40:25,000 --> 00:40:32,039
into privacy, security, you know, personally identifiable information, and then

734
00:40:32,079 --> 00:40:36,280
it's like, okay, something different is going to happen here. And

735
00:40:36,639 --> 00:40:41,760
it's interesting because in organizations, like government organizations, there are

736
00:40:42,360 --> 00:40:45,000
special ways to investigate what happened. Let's say in the

737
00:40:45,039 --> 00:40:48,159
military when there was an operation, so they want to

738
00:40:48,239 --> 00:40:51,360
learn from it. So there are two paths of investigation.

739
00:40:51,519 --> 00:40:54,320
One path is like the regular path, they investigate and

740
00:40:54,360 --> 00:40:56,320
like they can put someone to jail and so on.

741
00:40:56,679 --> 00:41:00,639
And then there's what's called the professional combat review, where

742
00:41:00,760 --> 00:41:03,000
everyone can say whatever. They can say, I killed someone

743
00:41:03,440 --> 00:41:07,119
and they will not be liable for anything, like they

744
00:41:07,119 --> 00:41:09,400
can't do anything to them, and they have one hundred

745
00:41:09,400 --> 00:41:12,079
percent immunity in this process. And this is done in

746
00:41:12,159 --> 00:41:15,599
order to make sure that we learn and that everyone

747
00:41:15,719 --> 00:41:18,679
says what really, really happened, and like everything you say

748
00:41:18,679 --> 00:41:21,559
there is classified, it cannot be used against you and

749
00:41:21,599 --> 00:41:24,480
so on. So I think it's also an interesting thing

750
00:41:24,559 --> 00:41:25,960
to think about in our field.

751
00:41:26,360 --> 00:41:30,239
Speaker 4: I totally agree. I feel like the organizations that yeah,

752
00:41:30,280 --> 00:41:32,559
like I said, I think that witch hunt, you know,

753
00:41:32,679 --> 00:41:36,800
mentality is a terrible one regardless of what happened.

754
00:41:36,960 --> 00:41:39,400
I mean unless you are talking about like we said before,

755
00:41:39,519 --> 00:41:42,920
malfeasance or negligence or something like that, or you know,

756
00:41:43,000 --> 00:41:45,920
beyond negligence, but you know, really criminal negligence, which

757
00:41:46,679 --> 00:41:51,119
rarely happens, right? It's, generally speaking, you know, it's

758
00:41:51,119 --> 00:41:54,639
a breakdown in process, and just fix it. I mean, just

759
00:41:54,679 --> 00:41:57,639
work together and fix it. Nobody wants to. I mean,

760
00:41:57,719 --> 00:42:00,440
I've just been involved in so many companies post breach,

761
00:42:01,199 --> 00:42:03,760
and so everybody just wants things to go back

762
00:42:03,800 --> 00:42:05,719
to normal. It's like COVID, right, everyone just wants things

763
00:42:05,760 --> 00:42:09,400
to go back to normal. Let's just get past this, move on,

764
00:42:09,960 --> 00:42:12,280
you know, do what we have to do, but let's stop

765
00:42:12,480 --> 00:42:14,079
reliving it on a daily basis.

766
00:42:14,320 --> 00:42:16,760
Speaker 1: Yeah, all right, Well, I think we're kind of getting

767
00:42:16,760 --> 00:42:19,000
towards a place where we can start to wrap up.

768
00:42:19,280 --> 00:42:23,639
Are there any other kind of big pieces of advice

769
00:42:23,760 --> 00:42:26,360
that we need to put out there before we go

770
00:42:26,440 --> 00:42:27,159
to our picks.

771
00:42:27,360 --> 00:42:30,239
Speaker 2: I want to point out one thing which I really

772
00:42:30,239 --> 00:42:33,559
believe in, which is, it's a big word, called GitOps.

773
00:42:33,639 --> 00:42:37,800
But in general, make sure that all of your configuration

774
00:42:38,119 --> 00:42:40,639
and all of your assets, everything is in code and

775
00:42:41,079 --> 00:42:44,239
in Git. And if you leave with one thing from

776
00:42:44,239 --> 00:42:48,280
this podcast, it's make sure that everything is infrastructure as code

777
00:42:48,280 --> 00:42:50,400
and in Git, because then you will be able to

778
00:42:50,480 --> 00:42:52,760
at least see what happened and what was the configuration

779
00:42:52,880 --> 00:42:55,480
and how did we configure it. So this is my

780
00:42:56,199 --> 00:42:58,039
final small remark here.

781
00:42:58,079 --> 00:42:58,800
Speaker 3: That's good advice.

782
00:42:59,000 --> 00:43:01,800
Speaker 1: All right, Well let's roll into picks then, Jeffrey, do

783
00:43:01,800 --> 00:43:03,079
you want to start us off with the picks?

784
00:43:03,239 --> 00:43:06,119
Speaker 4: All right? So it's something I was just thinking about.

785
00:43:06,199 --> 00:43:08,199
I was actually thinking about as we're talking, you know,

786
00:43:08,320 --> 00:43:12,559
just having our conversation here. So my pick isn't a

787
00:43:12,800 --> 00:43:16,400
specific thing. It's more of just an approach. So I

788
00:43:16,440 --> 00:43:20,320
get asked all the time like how do you, you know,

789
00:43:20,400 --> 00:43:24,880
sort of continue to continuously learn and you know, learn

790
00:43:24,920 --> 00:43:28,480
new technologies, you know, sort of stay on

791
00:43:28,559 --> 00:43:32,360
top of current threats. Technology in general is just

792
00:43:32,400 --> 00:43:35,639
that constantly changing space. But I mean, honestly, I think

793
00:43:35,679 --> 00:43:39,559
that applies beyond technology. Our world is just constantly changing,

794
00:43:40,320 --> 00:43:41,760
and how do you stay on top of that? And

795
00:43:42,280 --> 00:43:44,440
how do you do that without spending eight hours a

796
00:43:44,519 --> 00:43:47,360
day just trying to read or learn or watch or whatever.

797
00:43:48,599 --> 00:43:51,480
And so a couple of things that I have learned.

798
00:43:51,840 --> 00:43:55,400
So I think these are more just ideas than actual like

799
00:43:55,440 --> 00:43:58,159
go go and buy a product or something like that

800
00:43:58,920 --> 00:44:01,159
is, you know, the way that we learn. I think

801
00:44:01,199 --> 00:44:03,400
that you know, there are different you know, different people

802
00:44:03,400 --> 00:44:07,000
do learn differently. But what I've seen is that, you know,

803
00:44:07,039 --> 00:44:09,599
there's so much out there now like on YouTube, for instance,

804
00:44:10,840 --> 00:44:13,360
I mean, there's so much content out there, but it

805
00:44:13,400 --> 00:44:15,599
takes a long time to go through, especially now that

806
00:44:15,639 --> 00:44:17,960
everything has ads in it. So now every video

807
00:44:18,079 --> 00:44:21,840
takes much longer to get through, right, But if what

808
00:44:21,880 --> 00:44:25,039
you're trying to learn is very specific, it's sometimes harder

809
00:44:25,079 --> 00:44:29,320
to figure out how to learn it because there are

810
00:44:29,320 --> 00:44:32,119
so many blog posts that are too generic or

811
00:44:32,199 --> 00:44:34,960
just repeating what everybody else has already said on the

812
00:44:34,960 --> 00:44:37,079
topic already, and everyone just wants to put it into

813
00:44:37,119 --> 00:44:39,280
their blog to try and you know, get whatever it

814
00:44:39,320 --> 00:44:41,559
is, SEO, or get, you know, traction out of it,

815
00:44:41,599 --> 00:44:44,800
traffic that sort of thing, or you can try and

816
00:44:44,920 --> 00:44:46,599
you know, pick it out of like a video, but

817
00:44:46,760 --> 00:44:48,480
you know, you could be going through a sixty minute

818
00:44:48,559 --> 00:44:50,440
video and trying to figure out where it is.

819
00:44:50,480 --> 00:44:52,840
So I think part of it is and there's no

820
00:44:52,880 --> 00:44:54,920
real answer here, but part of it is just figuring

821
00:44:54,920 --> 00:44:57,519
out what's the best medium for learning what I'm trying

822
00:44:57,559 --> 00:44:59,280
to learn? Am I just trying to get an overview

823
00:44:59,320 --> 00:45:01,880
of that subject, then maybe a video is

824
00:45:01,920 --> 00:45:04,719
good if I'm trying to learn something very specific, maybe

825
00:45:04,719 --> 00:45:08,119
going to, like, Stack Overflow. I think building that skill

826
00:45:08,159 --> 00:45:10,800
set in yourself of figuring out what is it that

827
00:45:10,840 --> 00:45:13,480
I'm trying to learn and what's the best way for

828
00:45:13,519 --> 00:45:15,599
me to get there is something that we all have

829
00:45:15,679 --> 00:45:17,119
to just sort of develop. And I think a lot

830
00:45:17,159 --> 00:45:19,960
of us who've been doing this for years, you're probably thinking, yeah,

831
00:45:20,239 --> 00:45:22,719
I've been there, I've done that. I think I'm there already.

832
00:45:23,039 --> 00:45:26,119
But I think for some of the people earlier on

833
00:45:26,199 --> 00:45:29,239
in their career, this might be

834
00:45:29,280 --> 00:45:31,719
something that you should really be thinking about, is just

835
00:45:32,599 --> 00:45:36,280
how to be most efficient learning something new. And obviously

836
00:45:36,280 --> 00:45:39,000
it also goes back to figuring out what the best

837
00:45:39,039 --> 00:45:40,920
sources are, because, like I said, there's a lot of

838
00:45:41,000 --> 00:45:44,280
content out there, and it's just regurgitating what's already

839
00:45:44,280 --> 00:45:46,519
out there and sort of dumbing it down sometimes,

840
00:45:46,519 --> 00:45:49,559
like pulling out some of the details. So those sources,

841
00:45:49,639 --> 00:45:51,360
you know, you want to toss and you want to

842
00:45:51,400 --> 00:45:53,039
just sort of go to, you know, figure out what

843
00:45:53,079 --> 00:45:54,960
are the right sources that, you know, give you

844
00:45:54,960 --> 00:45:57,400
the information. So that's one piece. The other thing I

845
00:45:57,400 --> 00:45:59,920
was going to say is I think sometimes a lot

846
00:45:59,920 --> 00:46:02,159
of times we have this sort of natural

847
00:46:02,199 --> 00:46:05,400
tendency to look for, you know, when we do have

848
00:46:05,440 --> 00:46:08,320
to buy something, we think about, what's the cheapest product

849
00:46:08,320 --> 00:46:10,760
out there, right? And I think so

850
00:46:10,880 --> 00:46:14,360
often the cheapest product actually takes you more time, more energy,

851
00:46:14,440 --> 00:46:16,320
and you end up having to do things over you know,

852
00:46:16,360 --> 00:46:19,000
over again or whatever, and it's not the cheapest product,

853
00:46:19,000 --> 00:46:22,079
and I think, you know, it's, again, you know,

854
00:46:22,159 --> 00:46:23,920
as you go through that learning and figuring out what

855
00:46:24,239 --> 00:46:27,000
it is that I need, don't fall into the

856
00:46:27,039 --> 00:46:31,280
trap of just buying the cheapest product. Sometimes it's buying

857
00:46:31,320 --> 00:46:33,639
the more expensive product. I mean, sometimes it is the

858
00:46:33,679 --> 00:46:36,280
cheapest product. Generally it's a use once type of thing,

859
00:46:36,400 --> 00:46:38,280
or you know, I'm really going to use it. Great,

860
00:46:38,599 --> 00:46:41,039
But if it's something you're not going to do that

861
00:46:41,079 --> 00:46:43,639
you are going to continue to use, spend some time

862
00:46:43,679 --> 00:46:46,079
figuring out does it make sense to invest in something

863
00:46:46,119 --> 00:46:49,079
a little bit you know, better quality. So anyway, those

864
00:46:49,079 --> 00:46:52,679
are my two picks, methodologies, whatever, thoughts for the day.

865
00:46:52,960 --> 00:46:54,920
Speaker 3: Nice Will, what are your picks?

866
00:46:55,000 --> 00:46:55,400
Speaker 4: All right?

867
00:46:55,480 --> 00:46:58,559
Speaker 5: So I have been working my way through this book,

868
00:46:58,639 --> 00:47:02,559
The Manual from Epictetus. So he was a Stoic philosopher,

869
00:47:02,719 --> 00:47:06,840
and I've actually tried to read Marcus Aurelius's Meditations in

870
00:47:06,880 --> 00:47:09,920
the past and not really sure how much I actually

871
00:47:09,960 --> 00:47:14,000
retained from that. So I came across this book and

872
00:47:14,039 --> 00:47:16,840
I really like it because it's just it's very short,

873
00:47:16,840 --> 00:47:20,880
like each page just has one particular quote or saying

874
00:47:21,000 --> 00:47:25,440
from Epictetus and it's been really helpful to just kind

875
00:47:25,440 --> 00:47:29,039
of come to an understanding with the whole Stoic philosophy and

876
00:47:29,400 --> 00:47:34,079
that in combination with daily emails the email list from

877
00:47:34,119 --> 00:47:37,280
the Daily Stoic dot com, I start each day by

878
00:47:37,320 --> 00:47:40,039
reading those and it's a really good way to kind

879
00:47:40,039 --> 00:47:43,159
of level set your mind before you get started in

880
00:47:43,159 --> 00:47:45,199
a day and put things in perspective, because I think

881
00:47:45,199 --> 00:47:49,000
that's helpful, especially with the amount of information and if

882
00:47:49,039 --> 00:47:52,519
you can't avoid the news that's going on every day,

883
00:47:53,000 --> 00:47:56,599
it kind of helps you temper that message and keep

884
00:47:56,639 --> 00:47:59,719
things into more of a longer range perspective. So the

885
00:47:59,760 --> 00:48:02,599
Manual from Epictetus and the Daily Stoic dot com are

886
00:48:02,719 --> 00:48:03,719
my picks for today.

887
00:48:04,119 --> 00:48:05,880
Speaker 3: Nice, what do I have for picks?

888
00:48:05,920 --> 00:48:08,960
Speaker 1: So Father's Day, I've got a couple of picks for

889
00:48:09,079 --> 00:48:12,039
stuff that I did or got for Father's Day. The

890
00:48:12,119 --> 00:48:15,599
first pick that I have is my wife's like, Hey,

891
00:48:15,639 --> 00:48:17,920
you get to control the TV, which never happens at

892
00:48:17,960 --> 00:48:19,679
my house, both because I don't watch a ton of

893
00:48:19,719 --> 00:48:24,039
TV and because my kids just are on video games

894
00:48:24,079 --> 00:48:27,159
all day during the summer, so you know, I'll go

895
00:48:27,199 --> 00:48:29,480
down there and I'll just kind of see what's going on.

896
00:48:29,599 --> 00:48:33,639
But yeah, So on Sunday afternoon, I watched Willow, which

897
00:48:33,719 --> 00:48:37,000
is one of my favorite old timey movies.

898
00:48:37,239 --> 00:48:40,239
Speaker 3: So I'm gonna I'm gonna pick that because I enjoyed it.

899
00:48:40,280 --> 00:48:41,119
I really enjoyed it.

900
00:48:41,280 --> 00:48:43,079
Speaker 1: Of course, all my kids the second we turn it

901
00:48:43,119 --> 00:48:46,480
on, they sat there for ten to fifteen minutes

902
00:48:46,480 --> 00:48:47,679
and then just cleared out of the room.

903
00:48:47,679 --> 00:48:52,599
Speaker 3: And I'm just like, guys, this is a good movie. Whatever. Whatever.

904
00:48:52,920 --> 00:48:55,760
Speaker 1: Anyway, the other pick that I have so my wife,

905
00:48:56,079 --> 00:48:58,519
I've been having issues. My grill has been falling apart

906
00:48:58,559 --> 00:49:01,079
for a few years, and I like cooking me some meat.

907
00:49:01,719 --> 00:49:06,559
So my wife got me a Traeger smoker for Father's Day.

908
00:49:07,039 --> 00:49:07,639
Speaker 4: Oh nice.

909
00:49:08,280 --> 00:49:10,599
Speaker 1: And it's got a couple of meat probes in it

910
00:49:10,840 --> 00:49:14,239
and stuff like that, which is super nice because a

911
00:49:14,239 --> 00:49:16,119
lot of time, it's not, it doesn't have Bluetooth or

912
00:49:16,119 --> 00:49:17,880
anything in it. I know some of the more expensive

913
00:49:17,880 --> 00:49:21,239
models do, but it's nice just because you can kind

914
00:49:21,239 --> 00:49:22,920
of cook at the temperature and then you know you're

915
00:49:22,960 --> 00:49:26,639
ready to pull it out right. And so anyway, made

916
00:49:26,679 --> 00:49:29,599
a brisket on it for Father's Day so good.

917
00:49:29,840 --> 00:49:31,880
Speaker 3: Oh my gosh.

918
00:49:32,039 --> 00:49:34,159
Speaker 1: You know, I've got some baby back ribs in the

919
00:49:34,159 --> 00:49:36,480
fridge that I need to throw on there sooner rather

920
00:49:36,559 --> 00:49:40,559
than later. But it's just it's so nice and all

921
00:49:40,599 --> 00:49:42,280
of the stuff that you kind of cook on the

922
00:49:42,360 --> 00:49:46,400
slow cook end of things, they just come out so

923
00:49:46,400 --> 00:49:47,199
so so good.

924
00:49:47,440 --> 00:49:47,639
Speaker 4: Right.

925
00:49:48,119 --> 00:49:50,079
Speaker 1: So the other forms of that I guess are like

926
00:49:50,119 --> 00:49:53,800
the crockpot or the sous vide. But yeah, the smoker's nice

927
00:49:53,800 --> 00:49:58,280
too because it gets all this flavor in there. And anyway, yeah,

928
00:49:58,400 --> 00:50:01,800
I am loving having it. So I'm gonna pick that. Shimon,

929
00:50:02,039 --> 00:50:02,800
Speaker 3: What are your picks?

930
00:50:02,920 --> 00:50:06,199
Speaker 2: So I'm gonna have a barbecue now, fifteen of my

931
00:50:06,280 --> 00:50:10,159
friends are coming and I have a Napoleon grill and

932
00:50:10,239 --> 00:50:13,079
I really love grilling and also I always measure the

933
00:50:13,119 --> 00:50:15,960
temperature of the meat and I really really love it

934
00:50:16,039 --> 00:50:20,559
in terms of my picks. So I found daily dot dev.

935
00:50:20,760 --> 00:50:25,320
It's something cool that you can daily dot too. Sorry

936
00:50:25,800 --> 00:50:30,599
that, no, I'm mistaking several things here. It's called

937
00:50:30,679 --> 00:50:34,880
daily dot dev and it's a Chrome homepage extension so

938
00:50:34,920 --> 00:50:37,559
when you open up a new tab, it actually shows

939
00:50:37,599 --> 00:50:40,360
you like stuff from news and stuff like that, but

940
00:50:40,880 --> 00:50:43,599
you know, targeted at dev So it's really really nice

941
00:50:44,000 --> 00:50:48,480
because it just gives you a like a thumbnail and

942
00:50:48,519 --> 00:50:51,639
a title and it shows you what's going on. So

943
00:50:51,840 --> 00:50:54,719
I thought it's it's something nice because it's really targeted

944
00:50:54,760 --> 00:50:59,360
towards our target audience, so it's nice. So that's my

945
00:50:59,719 --> 00:51:02,239
small tip besides the GitOps tip that I gave

946
00:51:02,280 --> 00:51:02,840
at the beginning.

947
00:51:03,199 --> 00:51:05,800
Speaker 1: Awesome. If people want to connect with you online, where

948
00:51:05,800 --> 00:51:06,320
do they find you?

949
00:51:06,679 --> 00:51:11,400
Speaker 2: Yeah, so I'm at Shimon Tolts on Twitter and

950
00:51:11,599 --> 00:51:14,599
you can always go to datree dot io and

951
00:51:14,800 --> 00:51:18,679
see our website there. You can try to message

952
00:51:18,679 --> 00:51:22,360
me on LinkedIn, but it's gonna be you know, it's

953
00:51:22,480 --> 00:51:24,679
it's, we can do a whole session about like

954
00:51:24,719 --> 00:51:29,039
what has LinkedIn become in that regard. But yeah, so

955
00:51:29,159 --> 00:51:32,199
Shimon Tolts on Twitter, that's the best place to reach out.

956
00:51:32,360 --> 00:51:35,360
And I look forward to hearing from you and listening

957
00:51:35,360 --> 00:51:37,519
to feedback from users because this is what we love

958
00:51:37,559 --> 00:51:40,519
the most. When people come in, run our

959
00:51:40,519 --> 00:51:42,920
CLI, get some stuff, and then they write

960
00:51:42,920 --> 00:51:44,960
to us this is great, but we hate this thing

961
00:51:45,000 --> 00:51:47,119
and why can't I do this and that? And then

962
00:51:47,159 --> 00:51:49,320
we talk to them and we hear their feedback and

963
00:51:49,320 --> 00:51:51,920
this is how we prioritize our roadmap, so I encourage

964
00:51:51,960 --> 00:51:54,440
you to give us feedback about our product at datree

965
00:51:54,480 --> 00:51:56,639
dot io, D-A-T-R-E-E dot

966
00:51:56,760 --> 00:51:57,880
io. Awesome.

967
00:51:58,039 --> 00:51:59,679
Speaker 3: All right, well we'll go ahead and wrap up here.

968
00:51:59,760 --> 00:52:01,360
Thanks than for coming. This was a lot of fun.

969
00:52:01,519 --> 00:52:04,159
Speaker 2: Thank you very much for having me. It was really

970
00:52:04,159 --> 00:52:06,880
really fun being here and geeking out about DevOps with you.

971
00:52:07,199 --> 00:52:09,599
I feel at home, so thank you very much for

972
00:52:09,639 --> 00:52:10,079
having me.

973
00:52:10,320 --> 00:52:13,440
Speaker 3: All right, well, until next time, folks, Max out.

