WEBVTT

1
00:00:05.679 --> 00:00:09.919
<v Speaker 1>Hey, welcome back to another episode of JavaScript Jabber. This week,

2
00:00:10.000 --> 00:00:12.000
<v Speaker 1>on our panel, we have Dan Shappir.

3
00:00:12.599 --> 00:00:17.160
<v Speaker 2>Hello, from this very hot Tel Aviv. Well, isn't that

4
00:00:17.199 --> 00:00:19.839
<v Speaker 2>a surprise somewhere in Israel? Right?

5
00:00:20.600 --> 00:00:22.679
<v Speaker 1>It's also pretty hot here. It's been getting up to

6
00:00:22.719 --> 00:00:25.760
<v Speaker 1>one hundred degrees, which is like forty, forty-one degrees Celsius.

7
00:00:25.920 --> 00:00:30.839
<v Speaker 2>Oh, that's actually hotter than it is here. So we've

8
00:00:30.879 --> 00:00:31.559
<v Speaker 2>got it better.

9
00:00:32.719 --> 00:00:35.719
<v Speaker 1>It's been that way here for over a month.

10
00:00:36.399 --> 00:00:39.719
<v Speaker 2>It's been frying your eyeballs.

11
00:00:40.159 --> 00:00:44.320
<v Speaker 1>Just about. Anyway, I'm Charles Max Wood from Top End Devs, and

12
00:00:45.000 --> 00:00:49.560
<v Speaker 1>we're going to be talking about monitoring and alerting using

13
00:00:49.640 --> 00:00:56.000
<v Speaker 1>Prometheus and Grafana. Now, I've played with Grafana. This

14
00:00:56.280 --> 00:00:59.560
<v Speaker 1>was proposed by Dan, and so I figure he's probably

15
00:00:59.560 --> 00:01:04.159
<v Speaker 1>gonna do more of the talking. But yeah, just to give

16
00:01:04.159 --> 00:01:06.519
<v Speaker 1>a little context, I don't even know what Prometheus is.

17
00:01:06.680 --> 00:01:15.760
<v Speaker 2>So yeah, yeah, So to backtrack a little bit before

18
00:01:15.799 --> 00:01:19.879
<v Speaker 2>we get into specifically what Prometheus and Grafana are, I

19
00:01:19.959 --> 00:01:25.120
<v Speaker 2>want to talk a little bit about monitoring and alerting in general.

20
00:01:26.640 --> 00:01:32.040
<v Speaker 2>You know, I've been working on stuff like performance monitoring

21
00:01:32.239 --> 00:01:35.920
<v Speaker 2>and performance optimizations for a good number of years now,

22
00:01:36.519 --> 00:01:39.159
<v Speaker 2>as you all know if you've been listening to this podcast,

23
00:01:40.439 --> 00:01:44.840
<v Speaker 2>and probably one of the most important lessons that I've

24
00:01:44.959 --> 00:01:50.000
<v Speaker 2>learned is that you need to have monitoring in place

25
00:01:50.400 --> 00:01:54.439
<v Speaker 2>and alerting in place before you start any work on

26
00:01:54.560 --> 00:01:58.680
<v Speaker 2>improving things. And in this context, I love a quote

27
00:01:59.079 --> 00:02:03.560
<v Speaker 2>from Peter Drucker. Peter Drucker more or less created the

28
00:02:03.680 --> 00:02:09.280
<v Speaker 2>field of business management, and the quote is, if you

29
00:02:09.439 --> 00:02:14.319
<v Speaker 2>can't measure it, you can't improve it. And I'm really

30
00:02:14.840 --> 00:02:19.400
<v Speaker 2>a big fan of that. If you're unable to measure something,

31
00:02:19.919 --> 00:02:23.360
<v Speaker 2>there's really no way for you to know if you're

32
00:02:24.240 --> 00:02:27.919
<v Speaker 2>making progress, if you're improving things, or if you're actually

33
00:02:28.039 --> 00:02:33.560
<v Speaker 2>degrading things or making no impact at all. And one

34
00:02:33.639 --> 00:02:37.199
<v Speaker 2>of the things that I've done in this context is

35
00:02:37.280 --> 00:02:40.520
<v Speaker 2>that whenever I, like say, I join a new company

36
00:02:40.719 --> 00:02:44.159
<v Speaker 2>or start a new project, I often find myself under

37
00:02:44.240 --> 00:02:49.039
<v Speaker 2>pressure to start delivering improvements, you know, from the get

38
00:02:49.120 --> 00:02:52.560
<v Speaker 2>go as quickly as possible, and I always push back

39
00:02:52.639 --> 00:02:56.159
<v Speaker 2>on that in order to make sure that we have

40
00:02:57.439 --> 00:03:02.280
<v Speaker 2>proper data collection and proper monitoring in place before we

41
00:03:02.439 --> 00:03:07.000
<v Speaker 2>start making any improvements. By the way, in this context,

42
00:03:07.120 --> 00:03:11.080
<v Speaker 2>it's probably not surprising that I joined Sisense, which

43
00:03:11.199 --> 00:03:15.599
<v Speaker 2>is a BI analytics company, because I'm really a big

44
00:03:15.719 --> 00:03:19.879
<v Speaker 2>believer in that. And you know, as an extra benefit

45
00:03:20.080 --> 00:03:22.680
<v Speaker 2>if you're doing this kind of work, one of the

46
00:03:23.639 --> 00:03:29.800
<v Speaker 2>big benefits of having some sort of measurement and monitoring

47
00:03:29.919 --> 00:03:33.840
<v Speaker 2>solution in place is that after you've made improvements, you

48
00:03:34.039 --> 00:03:38.000
<v Speaker 2>have graphs to show the impact of your hard work.

49
00:03:38.520 --> 00:03:41.919
<v Speaker 2>And from my experience, that's really beneficial when you're let's

50
00:03:41.919 --> 00:03:45.680
<v Speaker 2>say you're looking for a raise or advancement or something like that.

51
00:03:47.400 --> 00:03:51.199
<v Speaker 2>But really in order to be successful in any project

52
00:03:51.319 --> 00:03:54.800
<v Speaker 2>that requires taking a system and trying to let's say,

53
00:03:54.840 --> 00:03:59.800
<v Speaker 2>improve it, then having some sort of a monitoring capability

54
00:04:00.039 --> 00:04:04.280
<v Speaker 2>in place is crucial. Now the question obviously then becomes what

55
00:04:04.560 --> 00:04:06.960
<v Speaker 2>do I actually use for that. I mean, you know,

56
00:04:07.080 --> 00:04:10.400
<v Speaker 2>it's fairly straightforward to collect a lot of data these days,

57
00:04:10.879 --> 00:04:12.719
<v Speaker 2>but what do you do with it? Where do you

58
00:04:12.840 --> 00:04:15.719
<v Speaker 2>put it, how do you process it, how do you

59
00:04:15.879 --> 00:04:20.120
<v Speaker 2>visualize it, et cetera. And in this context, I

60
00:04:20.199 --> 00:04:24.199
<v Speaker 2>want to talk specifically about two things, which are Prometheus

61
00:04:24.839 --> 00:04:28.680
<v Speaker 2>and Grafana. And first I'll start with a riddle for you, Chuck,

62
00:04:30.319 --> 00:04:34.319
<v Speaker 2>do you know like who Prometheus was?

63
00:04:35.360 --> 00:04:38.519
<v Speaker 1>Like you know, he was a titan that brought us fire? Right,

64
00:04:39.439 --> 00:04:44.600
<v Speaker 1>Wow, mythology. And he was punished by having

65
00:04:44.879 --> 00:04:47.240
<v Speaker 1>what an eagle eat his entrails out.

66
00:04:48.160 --> 00:04:52.279
<v Speaker 2>Exactly that. He was tied to a mountain on the, well,

67
00:04:52.319 --> 00:04:56.439
<v Speaker 2>I think it was the Tartarus mountains, and then Zeus's eagle

68
00:04:56.600 --> 00:04:59.920
<v Speaker 2>would come once a day and eat his entrails,

69
00:05:00.160 --> 00:05:03.759
<v Speaker 2>and because he's immortal, he cannot die, so he suffers forever.

70
00:05:04.360 --> 00:05:07.560
<v Speaker 2>By the way, later times Greeks or Romans kind of

71
00:05:07.600 --> 00:05:10.439
<v Speaker 2>thought that this was like too bad of a fate,

72
00:05:10.560 --> 00:05:13.959
<v Speaker 2>so they had Hercules release him or something along these lines.

73
00:05:14.000 --> 00:05:20.120
<v Speaker 2>But, you know, I can't prove it. But I think

74
00:05:20.639 --> 00:05:24.279
<v Speaker 2>that the reason that this project was called Prometheus is

75
00:05:24.399 --> 00:05:28.600
<v Speaker 2>because it's about bringing knowledge, because fire in this context

76
00:05:29.199 --> 00:05:33.360
<v Speaker 2>is like synonymous with knowledge, knowledge from the gods to

77
00:05:33.399 --> 00:05:37.439
<v Speaker 2>the humans. So it's about bringing knowledge to us developers

78
00:05:37.800 --> 00:05:43.319
<v Speaker 2>about how our systems are operating. So that's the mythology

79
00:05:44.240 --> 00:05:48.319
<v Speaker 2>of what Prometheus is. Now let's talk about what Prometheus

80
00:05:48.439 --> 00:05:54.480
<v Speaker 2>the service is. So Prometheus is free software, as in

81
00:05:54.600 --> 00:05:59.000
<v Speaker 2>open source that you can install on premises or you

82
00:05:59.079 --> 00:06:01.959
<v Speaker 2>can use as a service I think as well that

83
00:06:02.160 --> 00:06:08.040
<v Speaker 2>is used for event monitoring and alerting. It was originally created

84
00:06:08.800 --> 00:06:14.439
<v Speaker 2>something like twelve years ago at SoundCloud when they came

85
00:06:14.480 --> 00:06:19.079
<v Speaker 2>to the conclusion that none of the third party solutions

86
00:06:19.199 --> 00:06:23.279
<v Speaker 2>for monitoring were sufficient for their needs. And after they

87
00:06:23.399 --> 00:06:28.120
<v Speaker 2>built their system and used it internally, about four years later,

88
00:06:28.879 --> 00:06:33.279
<v Speaker 2>they donated it to the Cloud Native Computing Foundation, which

89
00:06:33.560 --> 00:06:39.360
<v Speaker 2>is the same foundation that also hosts the Kubernetes project.

90
00:06:39.519 --> 00:06:45.360
<v Speaker 2>So you know, this is another successful project from that foundation.

91
00:06:46.759 --> 00:06:51.000
<v Speaker 2>I think the people who are mostly working on it

92
00:06:51.120 --> 00:06:55.079
<v Speaker 2>these days are the people from the company that does Grafana.

93
00:06:57.040 --> 00:06:59.240
<v Speaker 2>But again it's open source and you can see the

94
00:06:59.360 --> 00:07:02.800
<v Speaker 2>source code on GitHub and there are a lot of contributors.

95
00:07:04.079 --> 00:07:08.120
<v Speaker 2>I actually personally contributed to one of the satellite projects

96
00:07:08.199 --> 00:07:13.600
<v Speaker 2>around Prometheus, which is the Prometheus client for Node, which

97
00:07:13.720 --> 00:07:18.439
<v Speaker 2>makes it possible to connect Node.js to Prometheus in

98
00:07:18.600 --> 00:07:23.319
<v Speaker 2>order to monitor Node. So I contributed specifically to

99
00:07:23.519 --> 00:07:25.319
<v Speaker 2>that part of the project.

100
00:07:26.399 --> 00:07:30.199
<v Speaker 1>I'm gonna stop you just for a minute. I've been

101
00:07:30.279 --> 00:07:33.680
<v Speaker 1>posting the links in the comments, but they don't go

102
00:07:33.759 --> 00:07:36.279
<v Speaker 1>to X, and that's where most of our live listeners are.

103
00:07:36.600 --> 00:07:40.800
<v Speaker 1>So Prometheus is prometheus.io and Grafana is at

104
00:07:40.839 --> 00:07:41.680
<v Speaker 1>grafana.com.

105
00:07:42.240 --> 00:07:50.480
<v Speaker 2>So anyway, yeah, that's true. Okay, So, so where were

106
00:07:50.560 --> 00:07:53.839
<v Speaker 2>we So we were talking about a little bit about

107
00:07:53.879 --> 00:07:58.720
<v Speaker 2>the history of Prometheus, both the mythological figure and the project.

108
00:07:59.680 --> 00:08:02.839
<v Speaker 2>Now let's talk a little bit about what it actually is.

109
00:08:03.639 --> 00:08:09.160
<v Speaker 2>So it's a service used for event monitoring and alerting. It

110
00:08:09.720 --> 00:08:15.720
<v Speaker 2>records real time metrics in something called a time series database,

111
00:08:16.279 --> 00:08:19.399
<v Speaker 2>which is kind of a special type of database, and

112
00:08:19.519 --> 00:08:22.600
<v Speaker 2>we'll talk about it in more detail and how it

113
00:08:22.720 --> 00:08:25.839
<v Speaker 2>differs from the databases that most of us are familiar with.

114
00:08:27.000 --> 00:08:31.560
<v Speaker 2>It allows for something called high dimensionality, which I also

115
00:08:31.920 --> 00:08:37.240
<v Speaker 2>will try to explain. It supports flexible queries and real

116
00:08:37.320 --> 00:08:41.440
<v Speaker 2>time alerting. And as I said, it's free software. It's

117
00:08:41.519 --> 00:08:47.120
<v Speaker 2>licensed under the Apache 2.0 license. So that's what Prometheus is.

118
00:08:47.679 --> 00:08:51.000
<v Speaker 2>So let's say you want to use Prometheus in your organization.

119
00:08:51.720 --> 00:08:55.000
<v Speaker 2>What you would do is that you would install the

120
00:08:55.159 --> 00:09:01.559
<v Speaker 2>Prometheus service and then hook it up to your various

121
00:09:01.639 --> 00:09:06.159
<v Speaker 2>services that you want to monitor. Now, it's a monitoring

122
00:09:06.320 --> 00:09:13.000
<v Speaker 2>solution for back end infrastructure, so things like Node.js

123
00:09:13.799 --> 00:09:17.639
<v Speaker 2>or for you know, the JVM, or for something that's

124
00:09:17.639 --> 00:09:21.679
<v Speaker 2>say written in Go. And that's not surprising, because Prometheus

125
00:09:21.720 --> 00:09:25.559
<v Speaker 2>itself is actually written in Go. So maybe it's a

126
00:09:25.600 --> 00:09:28.440
<v Speaker 2>shame that we don't have AJ on the show this time.

127
00:09:28.840 --> 00:09:32.159
<v Speaker 1>Yeah, maybe. But your point is that, you know,

128
00:09:32.360 --> 00:09:36.519
<v Speaker 1>any language or system could have a driver that, yes.

129
00:09:36.840 --> 00:09:40.799
<v Speaker 2>There's basically a connector for anything. There are also connectors

130
00:09:40.960 --> 00:09:45.440
<v Speaker 2>for a lot of general services. So if you want

131
00:09:45.519 --> 00:09:48.799
<v Speaker 2>to monitor, let's say we were talking about Kubernetes, you

132
00:09:48.879 --> 00:09:54.080
<v Speaker 2>can monitor Kubernetes. Kubernetes has a built in connector for Prometheus,

133
00:09:54.159 --> 00:09:57.120
<v Speaker 2>so you can look at how pods are functioning, or

134
00:09:57.159 --> 00:10:04.039
<v Speaker 2>the Kubernetes cluster itself. There are connectors for various AWS

135
00:10:04.159 --> 00:10:07.039
<v Speaker 2>services and so on and so forth, so you can

136
00:10:07.159 --> 00:10:13.480
<v Speaker 2>collect a lot of data from third party services and infrastructure,

137
00:10:14.320 --> 00:10:19.879
<v Speaker 2>and you can also attach it like create applicative level monitoring,

138
00:10:20.039 --> 00:10:23.679
<v Speaker 2>so you can monitor the behavior of your own applications

139
00:10:24.080 --> 00:10:27.720
<v Speaker 2>that are running on platforms such as the JVM, such

140
00:10:27.759 --> 00:10:32.399
<v Speaker 2>as Node.js, or in Go, etc. Well, more or

141
00:10:32.480 --> 00:10:37.200
<v Speaker 2>less any programming language that you can think of. Now.

142
00:10:38.399 --> 00:10:41.840
<v Speaker 2>The way that you configure the system, so again not

143
00:10:42.039 --> 00:10:44.679
<v Speaker 2>very surprising, given perhaps that it's written in Go, the

144
00:10:44.799 --> 00:10:48.639
<v Speaker 2>configurations are YAML files, and again this is kind of

145
00:10:48.759 --> 00:10:54.519
<v Speaker 2>correlated with Kubernetes. So YAML, for those of our listeners

146
00:10:54.559 --> 00:11:00.240
<v Speaker 2>who somehow don't know, is a configuration format. You can

147
00:11:00.320 --> 00:11:03.639
<v Speaker 2>think of it to an extent kind of sort of

148
00:11:03.840 --> 00:11:07.799
<v Speaker 2>similar to what we usually do with JSON files, but

149
00:11:08.000 --> 00:11:12.679
<v Speaker 2>it's a different format. It has certain advantages over JSON.

150
00:11:12.840 --> 00:11:18.360
<v Speaker 2>For example, it supports comments. It's used a lot

151
00:11:18.879 --> 00:11:22.600
<v Speaker 2>for a lot of administrative stuff. So any DevOps person

152
00:11:22.759 --> 00:11:24.840
<v Speaker 2>is likely very familiar with YAML.

153
00:11:26.080 --> 00:11:29.399
<v Speaker 1>The default configs for Rails were all done in YAML

154
00:11:29.480 --> 00:11:29.799
<v Speaker 1>as well.

155
00:11:30.720 --> 00:11:38.679
<v Speaker 2>So yeah, so you basically create configuration files for Prometheus

156
00:11:38.840 --> 00:11:43.320
<v Speaker 2>in YAML. And the way that Prometheus works is kind

157
00:11:43.360 --> 00:11:46.320
<v Speaker 2>of the reverse of what you might expect. So you

158
00:11:46.759 --> 00:11:52.000
<v Speaker 2>might think that you somehow configure various systems to push

159
00:11:52.200 --> 00:11:56.720
<v Speaker 2>data into Prometheus, but that's not how it works. The

160
00:11:56.799 --> 00:12:02.840
<v Speaker 2>way that it actually works is that Prometheus pulls data into itself.

161
00:12:03.440 --> 00:12:09.240
<v Speaker 2>So in the YAML file, you tell Prometheus the address

162
00:12:09.440 --> 00:12:12.919
<v Speaker 2>of the various addresses of the various services that you

163
00:12:13.000 --> 00:12:16.879
<v Speaker 2>want it to monitor and the rate at which it should

164
00:12:16.960 --> 00:12:22.080
<v Speaker 2>effectively ping those services, and it effectively does an HTTP

165
00:12:22.320 --> 00:12:26.360
<v Speaker 2>GET to an endpoint exposed by these services and

166
00:12:26.639 --> 00:12:30.559
<v Speaker 2>downloads data from them. So it actually pulls the data

167
00:12:31.159 --> 00:12:35.960
<v Speaker 2>from them into itself. Now, the advantage of this approach

168
00:12:36.759 --> 00:12:39.679
<v Speaker 2>is that first of all, they don't need to be

169
00:12:39.799 --> 00:12:44.200
<v Speaker 2>aware of where the Prometheus server is, so it's all

170
00:12:44.279 --> 00:12:47.360
<v Speaker 2>the configuration is centralized. They just need to open the port,

171
00:12:47.759 --> 00:12:49.919
<v Speaker 2>you know, listen on it, and that's more or less it.

172
00:12:50.000 --> 00:12:53.279
<v Speaker 2>It also means that they can work with multiple Prometheus

173
00:12:53.320 --> 00:12:56.559
<v Speaker 2>servers at the same time because they all just

174
00:12:56.759 --> 00:13:00.360
<v Speaker 2>pull the data, because pulling the data doesn't clear the data.

175
00:13:00.399 --> 00:13:02.679
<v Speaker 2>It's not as if, you know, they give the data

176
00:13:02.679 --> 00:13:06.879
<v Speaker 2>and then forget it. They retain the data. Those

177
00:13:06.960 --> 00:13:10.080
<v Speaker 2>services are expected to retain their data in memory, so

178
00:13:10.200 --> 00:13:12.159
<v Speaker 2>you can just hit any one of them at any

179
00:13:12.240 --> 00:13:17.480
<v Speaker 2>time and pull from them their current situation state. I

180
00:13:17.639 --> 00:13:18.360
<v Speaker 2>hope that's clear.
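
NOTE A minimal sketch of the pull model described above, assuming a toy in-process setup rather than real HTTP scraping.
MetricStore and Scraper are hypothetical names for illustration, not part of Prometheus.
It shows that the monitored service only keeps its current values in memory, that the target list lives with the scraper,
and that two scrapers can read the same service because a scrape does not clear anything.
```javascript
// Hypothetical in-memory metric store that a monitored service keeps.
class MetricStore {
  constructor() {
    this.values = new Map();
  }
  inc(name, by = 1) {
    this.values.set(name, (this.values.get(name) ?? 0) + by);
  }
  // A scrape only reads the current state; nothing is cleared.
  snapshot() {
    return Object.fromEntries(this.values);
  }
}
// Hypothetical stand-in for a Prometheus server: it owns the list of targets.
class Scraper {
  constructor(targets) {
    this.targets = targets; // Map of address to MetricStore
    this.samples = [];
  }
  scrapeAll(timestamp) {
    for (const [address, store] of this.targets) {
      this.samples.push({ address, timestamp, values: store.snapshot() });
    }
  }
}
const service = new MetricStore();
service.inc("http_requests_total", 3);
const targets = new Map([["app-1:3000", service]]);
const serverA = new Scraper(targets);
const serverB = new Scraper(targets);
serverA.scrapeAll(0);
serverB.scrapeAll(0); // sees the same values as serverA
service.inc("http_requests_total", 2);
serverA.scrapeAll(60);
console.log(serverA.samples.map((s) => s.values.http_requests_total)); // [ 3, 5 ]
```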

181
00:13:19.600 --> 00:13:23.360
<v Speaker 1>Yep, makes sense to me. I kind of like it too,

182
00:13:23.559 --> 00:13:28.360
<v Speaker 1>just from the standpoint of so I've used other systems,

183
00:13:28.519 --> 00:13:30.879
<v Speaker 1>paid systems, you know, we've been sponsored in the past

184
00:13:30.919 --> 00:13:33.159
<v Speaker 1>by Sentry and Raygun and stuff like that that

185
00:13:34.559 --> 00:13:38.240
<v Speaker 1>grab a lot of this information. Though I think we're

186
00:13:38.320 --> 00:13:40.720
<v Speaker 1>talking kind of a level below that right where we're

187
00:13:40.759 --> 00:13:43.360
<v Speaker 1>not talking specifically about the information that's being sent. We're

188
00:13:43.440 --> 00:13:46.480
<v Speaker 1>just talking about how the information gets into the system.

189
00:13:46.519 --> 00:13:50.679
<v Speaker 1>At this point, I like the fact that it's like, Okay,

190
00:13:50.720 --> 00:13:53.480
<v Speaker 1>I'm gonna periodically check and then I don't have ten

191
00:13:53.600 --> 00:13:57.480
<v Speaker 1>million hits on the service on the other end, right,

192
00:13:57.840 --> 00:14:00.200
<v Speaker 1>because I'm not pushing it to it, it's pulling it,

193
00:14:00.639 --> 00:14:03.360
<v Speaker 1>and so it only does the work that it has

194
00:14:03.440 --> 00:14:06.200
<v Speaker 1>to do, right Yeah, Yeah, I don't have all this

195
00:14:06.399 --> 00:14:08.279
<v Speaker 1>extra network crap going on.

196
00:14:09.399 --> 00:14:12.919
<v Speaker 2>Yeah. So it basically does an HTTP GET, let's say

197
00:14:12.960 --> 00:14:17.279
<v Speaker 2>once every minute to any one of those services. The

198
00:14:17.440 --> 00:14:23.279
<v Speaker 2>response is essentially text in their own format. It's quite

199
00:14:23.360 --> 00:14:26.200
<v Speaker 2>readable actually, so you could literally go to one of

200
00:14:26.279 --> 00:14:29.200
<v Speaker 2>those endpoints and just hit it with your browser and

201
00:14:29.279 --> 00:14:32.679
<v Speaker 2>see what the response would be. By the way, obviously

202
00:14:32.759 --> 00:14:35.360
<v Speaker 2>you probably want to make sure that those endpoints are

203
00:14:35.440 --> 00:14:40.120
<v Speaker 2>not externally exposed, so you know that everything stays behind

204
00:14:40.159 --> 00:14:44.799
<v Speaker 2>the firewall. The one kind of caveat to that is

205
00:14:44.879 --> 00:14:49.200
<v Speaker 2>that sometimes you have like short lived services, think about

206
00:14:49.320 --> 00:14:53.879
<v Speaker 2>something like a lambda. In that case, what they have

207
00:14:54.120 --> 00:14:57.039
<v Speaker 2>is something called a push gateway, which is like a

208
00:14:57.159 --> 00:15:02.080
<v Speaker 2>standalone service that those short-lived jobs can push

209
00:15:02.240 --> 00:15:05.360
<v Speaker 2>data in. Because they are really short lived, so you

210
00:15:05.480 --> 00:15:08.639
<v Speaker 2>can't assume that they'll hang around until they're pulled again.

211
00:15:09.200 --> 00:15:13.799
<v Speaker 2>So they can push their data into that push gateway,

212
00:15:13.960 --> 00:15:17.919
<v Speaker 2>which holds onto that data, and then Prometheus pulls that

213
00:15:18.279 --> 00:15:21.879
<v Speaker 2>data from that push gateway. So it's kind of an

214
00:15:21.960 --> 00:15:26.600
<v Speaker 2>intermediary service that, you know, is for those special cases.

215
00:15:26.679 --> 00:15:29.879
<v Speaker 2>But in most cases and the cases that I've used

216
00:15:29.919 --> 00:15:33.159
<v Speaker 2>it in, you know, it was with long lived servers

217
00:15:33.279 --> 00:15:36.120
<v Speaker 2>or services, and then you know that's just the way

218
00:15:36.159 --> 00:15:37.799
<v Speaker 2>it worked, right.
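
NOTE A rough sketch of the push gateway idea described above, assuming a toy in-process model.
The real Pushgateway is a standalone service that jobs push to over the network; PushGateway and runShortLivedJob here are hypothetical names.
The job pushes its final values and exits, the gateway holds on to them, and a later scrape reads them back as lines of name, labels, and value.
```javascript
// Hypothetical intermediary that holds the last values pushed by each job.
class PushGateway {
  constructor() {
    this.groups = new Map(); // job name mapped to its last pushed metrics
  }
  push(job, metrics) {
    this.groups.set(job, { ...metrics });
  }
  // What a scraper would read when it pulls from the gateway.
  render() {
    const lines = [];
    for (const [job, metrics] of this.groups) {
      for (const [name, value] of Object.entries(metrics)) {
        lines.push(`${name}{job="${job}"} ${value}`);
      }
    }
    return lines.join("\n") + "\n";
  }
}
// A short-lived job: it does its work, pushes once, and is gone.
function runShortLivedJob(gateway) {
  const processed = 42; // stand-in for real work
  gateway.push("nightly_batch", { records_processed: processed });
}
const gateway = new PushGateway();
runShortLivedJob(gateway);
// Later, long after the job has exited, the scraper pulls from the gateway.
const scraped = gateway.render();
console.log(scraped); // records_processed{job="nightly_batch"} 42
```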

219
00:15:38.480 --> 00:15:40.639
<v Speaker 1>That makes a lot of sense too, just in the yeah,

220
00:15:40.720 --> 00:15:42.919
<v Speaker 1>like I'm thinking like serverless functions and things like that,

221
00:15:43.279 --> 00:15:47.120
<v Speaker 1>or you know, if you we've also run background jobs

222
00:15:47.639 --> 00:15:49.919
<v Speaker 1>a lot in the apps that I have right where

223
00:15:49.960 --> 00:15:51.399
<v Speaker 1>it pulls it off the queue and then runs it.

224
00:15:51.919 --> 00:15:54.559
<v Speaker 1>And so yeah, in either of those cases, Yeah, you

225
00:15:54.639 --> 00:15:59.320
<v Speaker 1>don't want or necessarily need something hanging out so that

226
00:15:59.399 --> 00:16:01.120
<v Speaker 1>it can say, oh, you're going to query me within

227
00:16:01.200 --> 00:16:03.960
<v Speaker 1>the next minute or so, it just says poof, I'm

228
00:16:04.000 --> 00:16:05.320
<v Speaker 1>done and then hands it off, right.

229
00:16:06.159 --> 00:16:10.000
<v Speaker 2>Yeah. So there's for example, in the case of Node.js,

230
00:16:10.120 --> 00:16:15.039
<v Speaker 2>there's a Prometheus client for Node. I think it's literally,

231
00:16:15.159 --> 00:16:18.759
<v Speaker 2>the project is literally, as I recall, called prom-client.

232
00:16:19.000 --> 00:16:23.720
<v Speaker 2>So you just npm install it and then you know

233
00:16:24.399 --> 00:16:28.639
<v Speaker 2>it's in there. You just give it the port

234
00:16:28.759 --> 00:16:32.679
<v Speaker 2>to listen on and then it does you know, just

235
00:16:33.320 --> 00:16:35.559
<v Speaker 2>let's say it uses Express or something like that, and

236
00:16:35.720 --> 00:16:40.360
<v Speaker 2>then it basically collects the information and exposes that to

237
00:16:40.519 --> 00:16:43.279
<v Speaker 2>that port for you, and you don't really need to

238
00:16:43.399 --> 00:16:51.080
<v Speaker 2>do anything to start monitoring basic system level stuff. Now,

239
00:16:51.759 --> 00:16:57.759
<v Speaker 2>the Prometheus server gets the data in and then put

240
00:16:57.919 --> 00:17:03.039
<v Speaker 2>saves it into its own persistent database, right, and that

241
00:17:03.279 --> 00:17:08.440
<v Speaker 2>database is what I call the time series database. And

242
00:17:08.920 --> 00:17:15.799
<v Speaker 2>what I mean by that is that Prometheus doesn't store data, Like,

243
00:17:15.960 --> 00:17:18.240
<v Speaker 2>don't think of it as something like a database, you know,

244
00:17:18.359 --> 00:17:21.240
<v Speaker 2>we tend to think of something like a relational database

245
00:17:21.400 --> 00:17:24.920
<v Speaker 2>or maybe a NoSQL database. That's not really that

246
00:17:25.119 --> 00:17:28.880
<v Speaker 2>sort of a thing. It's something called the time series database.

247
00:17:29.000 --> 00:17:33.920
<v Speaker 2>So basically it has metrics, and it basically saves the

248
00:17:34.519 --> 00:17:38.880
<v Speaker 2>value of a metric at every point in time, so

249
00:17:40.400 --> 00:17:45.000
<v Speaker 2>it's really like keeps on collecting metric data. So there's

250
00:17:45.079 --> 00:17:47.680
<v Speaker 2>no such thing really as schemas or something like that.

251
00:17:47.880 --> 00:17:52.599
<v Speaker 2>It just has metrics that it collects data into, and

252
00:17:53.119 --> 00:17:56.680
<v Speaker 2>it's data collected over time. So like think of I

253
00:17:56.720 --> 00:17:59.880
<v Speaker 2>don't know, let's say you're let's say you're a farmer

254
00:18:00.440 --> 00:18:04.079
<v Speaker 2>and you've got a field and you're measuring the temperature

255
00:18:04.480 --> 00:18:08.400
<v Speaker 2>in the field. So you've got the temperature measurement. Let's

256
00:18:08.400 --> 00:18:13.039
<v Speaker 2>say every minute that you got from from the whatever,

257
00:18:13.440 --> 00:18:17.119
<v Speaker 2>you know, the thermometer device that you use to monitor

258
00:18:17.200 --> 00:18:21.480
<v Speaker 2>the temperature, and it just gets recorded into that persistent

259
00:18:21.640 --> 00:18:25.960
<v Speaker 2>storage and you can go backward and forward in time

260
00:18:26.400 --> 00:18:31.119
<v Speaker 2>and look at any point in time what the temperature was. Right.
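
NOTE A minimal sketch of the time series idea from the farmer example, assuming a toy append-only list per metric.
TimeSeries and valueAt are hypothetical names; Prometheus's actual storage engine is far more involved.
There is no schema: each metric is just a growing list of timestamped values that can be read back for any point in time.
```javascript
// Hypothetical append-only series: one metric, many timestamped samples.
class TimeSeries {
  constructor() {
    this.samples = []; // kept in timestamp order because we only append
  }
  append(timestamp, value) {
    this.samples.push({ timestamp, value });
  }
  // Returns the most recent value recorded at or before the given time.
  valueAt(time) {
    let found;
    for (const sample of this.samples) {
      if (sample.timestamp > time) break;
      found = sample;
    }
    return found?.value;
  }
}
// One reading per minute from the field thermometer (timestamps in seconds).
const fieldTemperature = new TimeSeries();
fieldTemperature.append(0, 21.5);
fieldTemperature.append(60, 22.0);
fieldTemperature.append(120, 22.4);
console.log(fieldTemperature.valueAt(90)); // 22
```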

261
00:18:31.319 --> 00:18:33.720
<v Speaker 1>So I'm trying to imagine what this looks like for

262
00:18:33.839 --> 00:18:36.720
<v Speaker 1>an app. So is it measuring like how much CPU

263
00:18:36.839 --> 00:18:41.079
<v Speaker 1>it's using and how much memory? Oh we'll get into that.

264
00:18:41.240 --> 00:18:44.160
<v Speaker 2>Okay, yeah, we'll get into that. But basically, as I

265
00:18:44.359 --> 00:18:45.039
<v Speaker 2>was saying, you can.

266
00:18:45.160 --> 00:18:47.279
<v Speaker 1>I'll keep my enthusiasm down for a minute.

267
00:18:47.319 --> 00:18:50.799
<v Speaker 2>Then it's it's you know what, let's talk about that

268
00:18:50.920 --> 00:18:54.920
<v Speaker 2>a little bit. So it's really monitoring two types of information.

269
00:18:55.480 --> 00:18:59.279
<v Speaker 2>One you can think about it as system level stuff.

270
00:19:00.119 --> 00:19:04.000
<v Speaker 2>So in the context of Node, a Node server, that

271
00:19:04.319 --> 00:19:10.680
<v Speaker 2>might be CPU usage or memory slash heap usage, or

272
00:19:11.240 --> 00:19:15.000
<v Speaker 2>the event loop lag or if it's an express service,

273
00:19:15.599 --> 00:19:19.279
<v Speaker 2>the event loop lag or if it's an Express service,

274
00:19:19.440 --> 00:19:26.599
<v Speaker 2>duration of the responses stuff like that. So those are

275
00:19:27.079 --> 00:19:31.759
<v Speaker 2>system level things and they're collected for you automatically. So

276
00:19:31.920 --> 00:19:36.519
<v Speaker 2>as soon as you npm install the prom-client for

277
00:19:36.720 --> 00:19:41.599
<v Speaker 2>Node and it is loaded into your project, your Node server,

278
00:19:42.319 --> 00:19:46.039
<v Speaker 2>then all that stuff is automatically collected for you, and

279
00:19:46.920 --> 00:19:53.000
<v Speaker 2>when the Prometheus server hits that port, that information,

280
00:19:53.319 --> 00:19:57.920
<v Speaker 2>those metrics are available from the get go. On top

281
00:19:58.000 --> 00:20:04.000
<v Speaker 2>of that, you can add applicative level metrics that you

282
00:20:05.319 --> 00:20:10.279
<v Speaker 2>push into Prometheus using an API. So, for example, if

283
00:20:10.359 --> 00:20:15.079
<v Speaker 2>you've got like your own, let's say, queues, internal queues

284
00:20:15.200 --> 00:20:18.400
<v Speaker 2>that you want to monitor the usage of, or your

285
00:20:18.480 --> 00:20:22.799
<v Speaker 2>own business logic processes that you want to measure the

286
00:20:22.960 --> 00:20:28.559
<v Speaker 2>duration of, you can measure those as well. Okay, so

287
00:20:28.720 --> 00:20:34.160
<v Speaker 2>you've got both the system level stuff and the applicative stuff.
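
NOTE A small sketch of the two application-level measurements mentioned above: the depth of an internal queue and the duration of a piece of business logic.
It assumes a plain object as the metric holder; appMetrics, enqueue, and processNext are hypothetical names and this is not prom-client's API.
In practice prom-client provides Gauge and Histogram classes for values like these.
```javascript
// Hypothetical holder for application-level metrics.
const appMetrics = {
  queue_depth: 0, // gauge: goes up and down
  job_duration_seconds_sum: 0, // total time spent in jobs
  job_duration_seconds_count: 0, // number of jobs timed
};
const queue = [];
function enqueue(job) {
  queue.push(job);
  appMetrics.queue_depth = queue.length;
}
function processNext() {
  const job = queue.shift();
  appMetrics.queue_depth = queue.length;
  const start = process.hrtime.bigint();
  const result = job(); // the business logic being timed
  const seconds = Number(process.hrtime.bigint() - start) / 1e9;
  appMetrics.job_duration_seconds_sum += seconds;
  appMetrics.job_duration_seconds_count += 1;
  return result;
}
enqueue(() => 1 + 1);
enqueue(() => "done");
processNext();
console.log(appMetrics.queue_depth, appMetrics.job_duration_seconds_count); // 1 1
```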

288
00:20:35.839 --> 00:20:40.359
<v Speaker 2>And by the way, one system level stuff that is

289
00:20:40.599 --> 00:20:43.640
<v Speaker 2>really important in the context of Node that may be

290
00:20:43.880 --> 00:20:47.720
<v Speaker 2>less obvious or familiar to some of our listeners is

291
00:20:47.799 --> 00:20:50.880
<v Speaker 2>something called the event loop lag. Are you familiar with that?

292
00:20:52.960 --> 00:20:53.400
<v Speaker 1>I am not.

293
00:20:55.359 --> 00:20:59.160
<v Speaker 2>So, as you know, the way that JavaScript works is

294
00:20:59.279 --> 00:21:02.400
<v Speaker 2>that it's all based on an event loop, be it

295
00:21:02.559 --> 00:21:07.519
<v Speaker 2>either in the browser or in Node. JavaScript, the way

296
00:21:07.559 --> 00:21:13.000
<v Speaker 2>that it works is you've got whenever something happens, like

297
00:21:13.119 --> 00:21:16.440
<v Speaker 2>if it's a browser, then it's something arrives over the

298
00:21:16.559 --> 00:21:21.200
<v Speaker 2>network or the user does some sort of an interaction

299
00:21:21.519 --> 00:21:27.200
<v Speaker 2>a mouse click, a keyboard press. Whenever something happens, an

300
00:21:27.279 --> 00:21:32.319
<v Speaker 2>event is triggered and that puts the event information in

301
00:21:32.440 --> 00:21:37.559
<v Speaker 2>a queue, and JavaScript, which runs

302
00:21:37.680 --> 00:21:42.039
<v Speaker 2>kind of in a single threaded type approach, pulls data

303
00:21:42.279 --> 00:21:46.160
<v Speaker 2>out, the top data, out of the queue,

304
00:21:46.839 --> 00:21:51.440
<v Speaker 2>processes it, and then moves to the next item in

305
00:21:51.519 --> 00:21:53.319
<v Speaker 2>the queue, the next item in the queue, and so

306
00:21:53.519 --> 00:21:57.000
<v Speaker 2>forth until you know. If the queue becomes empty, then

307
00:21:57.079 --> 00:22:01.839
<v Speaker 2>it effectively idles until more stuff is put into the queue.

308
00:22:02.799 --> 00:22:04.720
<v Speaker 1>So far, so good, yep.

309
00:22:05.960 --> 00:22:11.720
<v Speaker 2>Now, the problem is that usually rather than idling, what

310
00:22:11.960 --> 00:22:17.039
<v Speaker 2>happens really is that information gets into the

311
00:22:17.160 --> 00:22:20.119
<v Speaker 2>queue at a very rapid pace, at a high clip.

312
00:22:20.839 --> 00:22:24.400
<v Speaker 2>So it might be that new events are placed into the

313
00:22:24.519 --> 00:22:31.640
<v Speaker 2>queue before the JavaScript engine is ready

314
00:22:31.799 --> 00:22:35.759
<v Speaker 2>to process them, because it's still busy processing other stuff.

315
00:22:36.440 --> 00:22:39.960
<v Speaker 2>Like, think about, let's say, a Node server running

316
00:22:40.039 --> 00:22:45.559
<v Speaker 2>Express: the events coming in are the HTTP requests.

317
00:22:46.720 --> 00:22:51.279
<v Speaker 2>If HTTP requests arrive at too high a rate, then

318
00:22:52.000 --> 00:22:56.160
<v Speaker 2>the Node service might not be able to process

319
00:22:56.240 --> 00:23:00.160
<v Speaker 2>them quickly enough and will get overloaded. Right, So, what

320
00:23:00.400 --> 00:23:04.440
<v Speaker 2>the event loop lag actually measures is the amount of

321
00:23:04.640 --> 00:23:08.039
<v Speaker 2>time from when a message is placed into the queue

322
00:23:08.720 --> 00:23:11.480
<v Speaker 2>until it's taken out of the queue in order to

323
00:23:11.559 --> 00:23:17.519
<v Speaker 2>be processed. So if that period of time is small,

324
00:23:18.079 --> 00:23:21.640
<v Speaker 2>then you know that your service is really responsive. If

325
00:23:21.720 --> 00:23:25.200
<v Speaker 2>it gets to be too high, then it means, you know,

326
00:23:26.559 --> 00:23:31.839
<v Speaker 2>your system is overloaded and it's not responsive enough.
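
NOTE
Editor's sketch, not part of the recording. A minimal illustration of the
event loop lag described above, assuming a single-threaded engine that takes
events strictly in arrival order. The helper name simulateQueueLag and the
event shape (arrivesAt, costMs) are hypothetical and exist only for this example.
```javascript
// Hypothetical helper: simulates a single-threaded event queue.
// Each event has an arrival time and a processing cost, both in milliseconds.
// Events are assumed to be sorted by arrival time.
function simulateQueueLag(events) {
  let engineFreeAt = 0; // time at which the engine finishes its current work
  return events.map(({ arrivesAt, costMs }) => {
    const startedAt = Math.max(arrivesAt, engineFreeAt);
    engineFreeAt = startedAt + costMs;
    return startedAt - arrivesAt; // lag: time spent waiting in the queue
  });
}
// Three requests arrive 10 ms apart, but each takes 30 ms to process.
const lags = simulateQueueLag([
  { arrivesAt: 0, costMs: 30 },
  { arrivesAt: 10, costMs: 30 },
  { arrivesAt: 20, costMs: 30 },
]);
console.log(lags); // [ 0, 20, 40 ]
```
The lag grows with every request because events arrive faster than they can
be processed, which is the "keeps on growing" case mentioned in the episode.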

327
00:23:33.519 --> 00:23:37.119
<v Speaker 2>You know, think about a service, an

328
00:23:37.200 --> 00:23:41.519
<v Speaker 2>Express service that takes, I don't know, two seconds to

329
00:23:41.640 --> 00:23:43.759
<v Speaker 2>pull something out of the queue. It means

330
00:23:43.839 --> 00:23:46.680
<v Speaker 2>that the browser, the client side, is waiting for two

331
00:23:46.799 --> 00:23:51.960
<v Speaker 2>seconds before its event is even processed. So obviously

332
00:23:52.039 --> 00:23:54.640
<v Speaker 2>that's a bad thing, and it's especially bad if it

333
00:23:54.960 --> 00:23:59.000
<v Speaker 2>keeps on growing, because then eventually your server will

334
00:23:59.119 --> 00:24:07.720
<v Speaker 2>just become unresponsive, right? So that's information that Prometheus,

335
00:24:07.880 --> 00:24:10.559
<v Speaker 2>that the system, the prom-client, is actually able to

336
00:24:10.680 --> 00:24:15.119
<v Speaker 2>extract out of Node and expose to Prometheus. So

337
00:24:15.519 --> 00:24:19.200
<v Speaker 2>that's one of the system monitoring things that's really useful

338
00:24:19.279 --> 00:24:24.680
<v Speaker 2>to look at when you're monitoring a Node service. Another

339
00:24:24.759 --> 00:24:28.079
<v Speaker 2>thing that's really useful to look at is, for example,

340
00:24:28.200 --> 00:24:32.160
<v Speaker 2>heap usage, because if you've got let's say, some sort

341
00:24:32.200 --> 00:24:37.720
<v Speaker 2>of memory leak, then you'll see that after a garbage

342
00:24:37.759 --> 00:24:41.319
<v Speaker 2>collection, a GC, rather than going all the way down,

343
00:24:42.039 --> 00:24:46.480
<v Speaker 2>your memory utilization just keeps going up and up and

344
00:24:46.640 --> 00:24:50.960
<v Speaker 2>up and up. And again, if it keeps on going,

345
00:24:51.079 --> 00:24:55.279
<v Speaker 2>what will happen is that the Node service will try

346
00:24:55.359 --> 00:24:57.680
<v Speaker 2>to do GCs more and more and more and more

347
00:24:58.079 --> 00:25:00.839
<v Speaker 2>in order to free memory, but fail to do so, and

348
00:25:00.960 --> 00:25:03.920
<v Speaker 2>you get what's known as a GC storm, where effectively,

349
00:25:04.000 --> 00:25:07.240
<v Speaker 2>all the service does is just try to free

350
00:25:07.319 --> 00:25:10.880
<v Speaker 2>up memory that it can't and it becomes totally stuck.

351
00:25:11.400 --> 00:25:14.079
<v Speaker 2>So that's another thing that you can look at in

352
00:25:14.160 --> 00:25:17.119
<v Speaker 2>the context of Prometheus.
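
NOTE
Editor's sketch, not part of the recording. One way to picture the leak
pattern described above: look only at heap size measured right after each
garbage collection and check whether that floor keeps climbing. The function
name, the sample values, and the minSamples default are assumptions made up
for this illustration, not something taken from Prometheus or Node.
```javascript
// Hypothetical helper: given heap sizes (in MB) sampled right after each
// garbage collection, report whether the post-GC floor keeps rising.
function postGcFloorKeepsRising(heapAfterGcMb, minSamples = 4) {
  if (heapAfterGcMb.length < minSamples) return false; // not enough evidence
  return heapAfterGcMb.every((v, i) => i === 0 || v > heapAfterGcMb[i - 1]);
}
console.log(postGcFloorKeepsRising([50, 52, 51, 50, 53])); // false: floor is stable
console.log(postGcFloorKeepsRising([50, 58, 66, 75, 83])); // true: likely leak
```
A strictly increasing check is deliberately crude. A real alert would more
likely compare averages over longer windows.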

353
00:25:18.039 --> 00:25:21.480
<v Speaker 1>Well, it seems like on both of those measures, on

354
00:25:21.680 --> 00:25:26.000
<v Speaker 1>the lag, what is it, Node event loop lag,

355
00:25:27.640 --> 00:25:30.400
<v Speaker 1>it seems like because I'm sitting here and I'm thinking, okay,

356
00:25:30.839 --> 00:25:34.240
<v Speaker 1>so it's gonna tell me if it doesn't have the

357
00:25:34.279 --> 00:25:38.279
<v Speaker 1>resources to handle whatever's coming at it. But for me,

358
00:25:39.200 --> 00:25:43.279
<v Speaker 1>I find it useful because I mean, let's just take

359
00:25:43.319 --> 00:25:45.799
<v Speaker 1>podcasting for an example, Right, Like I don't go and

360
00:25:46.200 --> 00:25:50.400
<v Speaker 1>religiously obsess over the numbers, right, I don't go look

361
00:25:50.480 --> 00:25:54.720
<v Speaker 1>at the metrics. But for an app, if you're checking

362
00:25:54.759 --> 00:25:57.119
<v Speaker 1>the metrics on a regular basis, it seems like you

363
00:25:57.160 --> 00:26:04.079
<v Speaker 1>could start to see this lag, event loop lag, steadily

364
00:26:04.200 --> 00:26:07.319
<v Speaker 1>increasing and go, Okay, we are getting to the point

365
00:26:07.359 --> 00:26:10.839
<v Speaker 1>where we need to start looking at it, right, instead of

366
00:26:10.960 --> 00:26:13.599
<v Speaker 1>all of a sudden being, whoa, whoa, whoa, we're, you know,

367
00:26:13.680 --> 00:26:17.960
<v Speaker 1>we're way over the edge, right, And so it allows

368
00:26:18.000 --> 00:26:21.039
<v Speaker 1>you to be proactive, right instead of reacting to people

369
00:26:21.079 --> 00:26:22.079
<v Speaker 1>are complaining it's slow.

370
00:26:22.880 --> 00:26:25.519
<v Speaker 2>Yeah. And actually, what you really want is to have

371
00:26:25.680 --> 00:26:30.799
<v Speaker 2>good alerting, and we'll get to that as well, because yeah, yeah,

372
00:26:30.920 --> 00:26:35.680
<v Speaker 2>because realistically, you're not going to check the graphs for

373
00:26:36.079 --> 00:26:40.319
<v Speaker 2>all your services every morning or every afternoon. What you

374
00:26:40.440 --> 00:26:43.000
<v Speaker 2>really want is you want a system that alerts you

375
00:26:43.599 --> 00:26:46.920
<v Speaker 2>in case something is wrong. And you usually want an

376
00:26:46.960 --> 00:26:50.079
<v Speaker 2>alert to say not that, you know, the system is broken.

377
00:26:50.599 --> 00:26:53.759
<v Speaker 2>You want an alert to tell you, you know, the system

378
00:26:53.839 --> 00:26:58.880
<v Speaker 2>is running hot, you should do something before

379
00:26:58.880 --> 00:27:04.240
<v Speaker 2>it breaks, right? And yeah, Prometheus is great

380
00:27:04.319 --> 00:27:07.720
<v Speaker 2>for that as well because you can specify alerts. And

381
00:27:08.200 --> 00:27:11.319
<v Speaker 2>so that's the other part. So I was talking about

382
00:27:11.640 --> 00:27:15.519
<v Speaker 2>how all the data is collected into the Prometheus service

383
00:27:15.680 --> 00:27:20.279
<v Speaker 2>and then saved into persistent storage. But then you can

384
00:27:20.400 --> 00:27:24.920
<v Speaker 2>also do queries on top of that data. Prometheus has

385
00:27:25.119 --> 00:27:29.039
<v Speaker 2>its own query language. It's called PromQL, and it

386
00:27:29.200 --> 00:27:34.319
<v Speaker 2>looks nothing like SQL. You know, even though it's a

387
00:27:34.400 --> 00:27:39.240
<v Speaker 2>query language, it's a totally different query language as well.

388
00:27:39.079 --> 00:27:40.920
<v Speaker 1>It's a query language, and it doesn't look like...

389
00:27:40.839 --> 00:27:47.079
<v Speaker 2>Exactly, and you can do two sorts of things.

390
00:27:47.200 --> 00:27:52.200
<v Speaker 2>You can have Grafana as a visualization environment. So Grafana

391
00:27:52.359 --> 00:27:54.839
<v Speaker 2>is a service that you run in the browser

392
00:27:55.279 --> 00:27:57.599
<v Speaker 2>and it can show you all sorts of graphs from

393
00:27:57.720 --> 00:28:02.240
<v Speaker 2>various data sources, and one of the data sources is Prometheus,

394
00:28:02.319 --> 00:28:06.880
<v Speaker 2>and you can write PromQL queries that extract data

395
00:28:07.079 --> 00:28:11.759
<v Speaker 2>and then graph this data in whatever dashboard you're using.

396
00:28:12.240 --> 00:28:16.519
<v Speaker 2>So that's one possible usage. Another possible usage is something

397
00:28:16.599 --> 00:28:21.279
<v Speaker 2>called the Alertmanager, which is another component of Prometheus

398
00:28:21.440 --> 00:28:25.200
<v Speaker 2>that comes along with it. It's a standalone service that

399
00:28:25.680 --> 00:28:30.839
<v Speaker 2>at regular intervals runs

400
00:28:31.759 --> 00:28:36.960
<v Speaker 2>PromQL queries, gets the data from them, and sees

401
00:28:37.160 --> 00:28:41.559
<v Speaker 2>if alerting criteria are met, and if they are,

402
00:28:42.160 --> 00:28:48.039
<v Speaker 2>it can then push alerts into emails or Slack or

403
00:28:48.279 --> 00:28:53.279
<v Speaker 2>PagerDuty or whatever, so that you can generate

404
00:28:53.400 --> 00:28:56.599
<v Speaker 2>alerts out of the Prometheus data. So,

405
00:28:56.720 --> 00:29:00.680
<v Speaker 2>for example, going back to that farmer and field and

406
00:29:01.119 --> 00:29:04.960
<v Speaker 2>thermometer example, you could say that if the temperature goes

407
00:29:05.039 --> 00:29:07.960
<v Speaker 2>above I don't know, thirty degrees on the ground, send

408
00:29:08.000 --> 00:29:10.359
<v Speaker 2>an alert. Something along these lines.
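
NOTE
Editor's sketch, not part of the recording. The thermometer example above can
be pictured as a threshold rule evaluated over recent readings. This is plain
JavaScript rather than PromQL, and the function name shouldAlert and the
forSamples parameter are hypothetical. Requiring several consecutive readings
above the threshold is an assumption added here to avoid alerting on a
single spike.
```javascript
// Hypothetical rule evaluator: fires when the most recent `forSamples`
// readings are all above `threshold`. Names and shape are illustrative only.
function shouldAlert(readings, threshold, forSamples = 1) {
  if (readings.length < forSamples) return false;
  return readings.slice(-forSamples).every((value) => value > threshold);
}
const groundTempC = [27.5, 29.0, 30.5, 31.2, 31.8];
console.log(shouldAlert(groundTempC, 30, 3)); // true: last three readings exceed 30
console.log(shouldAlert([27.5, 31.0, 29.0], 30, 2)); // false: the spike did not persist
```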

409
00:29:12.039 --> 00:29:12.920
<v Speaker 1>Yeah, that makes sense.

410
00:29:13.799 --> 00:29:14.599
<v Speaker 2>Yeah, So.

411
00:29:16.319 --> 00:29:18.720
<v Speaker 1>I just want to throw in here real quick, because

412
00:29:20.279 --> 00:29:24.720
<v Speaker 1>I think sometimes we kind of treat the time series

413
00:29:24.839 --> 00:29:28.039
<v Speaker 1>data as kind of monolithic in ways, like treating it

414
00:29:28.160 --> 00:29:29.359
<v Speaker 1>like for the day or the week.

415
00:29:29.720 --> 00:29:29.880
<v Speaker 2>Right.

416
00:29:31.000 --> 00:29:33.400
<v Speaker 1>I was actually looking on the Discord server that we

417
00:29:33.519 --> 00:29:37.559
<v Speaker 1>use for the hosts, and Adventures in DevOps, they were

418
00:29:37.599 --> 00:29:39.599
<v Speaker 1>going to do an episode where they were talking about

419
00:29:40.319 --> 00:29:43.680
<v Speaker 1>holiday rushes, right, And so one day to the next,

420
00:29:43.799 --> 00:29:45.880
<v Speaker 1>it may vary, or you may get a lot of

421
00:29:45.920 --> 00:29:48.440
<v Speaker 1>your traffic in the morning or the evening, and so

422
00:29:49.039 --> 00:29:51.240
<v Speaker 1>you know, by having these alerts, you can start to

423
00:29:51.279 --> 00:29:53.839
<v Speaker 1>pick up some of the patterns. And the other thing

424
00:29:53.960 --> 00:29:56.640
<v Speaker 1>is, you can turn around and you can say, okay,

425
00:29:57.720 --> 00:30:00.039
<v Speaker 1>not only do I know that something's happening now, but

426
00:30:00.160 --> 00:30:01.920
<v Speaker 1>I can go look at the current state of things

427
00:30:02.000 --> 00:30:05.480
<v Speaker 1>or get a snapshot from the Prometheus data, and then

428
00:30:05.640 --> 00:30:08.759
<v Speaker 1>start to solve whatever the issue is, right, whether

429
00:30:08.839 --> 00:30:12.400
<v Speaker 1>it's I need more resources or oh I didn't realize,

430
00:30:12.519 --> 00:30:15.440
<v Speaker 1>but I built something into the application that makes it

431
00:30:15.599 --> 00:30:18.680
<v Speaker 1>memory heavy, and so my heap size is going out

432
00:30:18.720 --> 00:30:20.759
<v Speaker 1>of control and I'm running out of memory or whatever.

433
00:30:21.359 --> 00:30:24.960
<v Speaker 2>For sure, But definitely also the fact that data can

434
00:30:25.079 --> 00:30:29.160
<v Speaker 2>vary over time, even if regularly, can make some of

435
00:30:29.279 --> 00:30:32.880
<v Speaker 2>this stuff pretty challenging, but still doable. PromQL is a

436
00:30:33.000 --> 00:30:36.319
<v Speaker 2>very sophisticated query language, and I'll give examples of some

437
00:30:36.559 --> 00:30:40.720
<v Speaker 2>of the challenges that I've run into when creating alerts. So,

438
00:30:40.960 --> 00:30:44.480
<v Speaker 2>for example, a lot of companies that I've worked for,

439
00:30:44.599 --> 00:30:49.200
<v Speaker 2>companies like Wix or Next Insurance, there was a lot

440
00:30:49.359 --> 00:30:53.319
<v Speaker 2>more traffic, let's say, over the weekdays than over

441
00:30:53.400 --> 00:30:59.079
<v Speaker 2>the weekends, which is not surprising. But the downside

442
00:30:59.119 --> 00:31:03.279
<v Speaker 2>of that was that data over the weekend would fluctuate

443
00:31:03.599 --> 00:31:08.079
<v Speaker 2>a lot more because the sample size was

444
00:31:08.680 --> 00:31:14.799
<v Speaker 2>significantly lower. When you think about it, let's say you

445
00:31:15.000 --> 00:31:19.400
<v Speaker 2>have, let's say, talking about Next Insurance, you might have

446
00:31:20.119 --> 00:31:23.599
<v Speaker 2>ten thousand sessions in a, you know, in

447
00:31:23.720 --> 00:31:27.799
<v Speaker 2>a weekday, in a working day, but only one

448
00:31:27.880 --> 00:31:31.680
<v Speaker 2>hundred over the weekend. And when you've got only one

449
00:31:31.759 --> 00:31:38.559
<v Speaker 2>hundred sessions, then ten bad sessions can really impact your

450
00:31:38.759 --> 00:31:46.359
<v Speaker 2>performance numbers. And so we would see that if

451
00:31:46.400 --> 00:31:50.079
<v Speaker 2>we weren't careful, we would start getting alerts over the

452
00:31:50.160 --> 00:31:54.400
<v Speaker 2>weekends because the data was less stable because there were

453
00:31:54.559 --> 00:31:58.440
<v Speaker 2>just fewer sessions. So we basically kind of had to

454
00:31:58.559 --> 00:32:02.680
<v Speaker 2>come up with queries that were also dependent on the

455
00:32:03.480 --> 00:32:07.680
<v Speaker 2>number of sessions, not just the duration of the sessions,

456
00:32:07.759 --> 00:32:10.880
<v Speaker 2>So if there were too few sessions, we would ignore

457
00:32:11.559 --> 00:32:14.880
<v Speaker 2>the other criteria of the duration of the sessions. So

458
00:32:15.000 --> 00:32:18.319
<v Speaker 2>that might be something. You know, it makes the queries

459
00:32:18.400 --> 00:32:21.599
<v Speaker 2>more challenging. But you want to take these sort of

460
00:32:21.680 --> 00:32:27.160
<v Speaker 2>things into account. So quickly going back to when is
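
NOTE
Editor's sketch, not part of the recording. The weekend problem described
above, where too few sessions make the numbers unstable, can be handled by
guarding the duration check with a minimum session count. This is plain
JavaScript, not the actual queries used at Wix or Next Insurance. The function
name and both thresholds (minSessions, maxP90Ms) are made-up assumptions.
```javascript
// Hypothetical alert condition: only judge session duration when enough
// sessions were observed in the window. Thresholds are made-up examples.
function slowSessionsAlert({ sessionCount, p90DurationMs }, opts = {}) {
  const { minSessions = 500, maxP90Ms = 3000 } = opts;
  if (sessionCount < minSessions) return false; // too few samples to trust
  return p90DurationMs > maxP90Ms;
}
console.log(slowSessionsAlert({ sessionCount: 10000, p90DurationMs: 4200 })); // true
console.log(slowSessionsAlert({ sessionCount: 100, p90DurationMs: 4200 })); // false
```
The same slow number triggers an alert on a busy weekday but is ignored on a
quiet weekend, which matches the behavior described in the episode.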

461
00:32:27.279 --> 00:32:30.599
<v Speaker 2>Prometheus a good match and when isn't it a good match?

462
00:32:31.279 --> 00:32:34.200
<v Speaker 2>What type of data would you want to put into

463
00:32:34.279 --> 00:32:37.599
<v Speaker 2>Prometheus and what kind of data you would probably want

464
00:32:37.640 --> 00:32:41.000
<v Speaker 2>to put somewhere else. So Prometheus is a good match

465
00:32:41.359 --> 00:32:48.920
<v Speaker 2>when you're recording pure numeric time series data. It's appropriate

466
00:32:49.079 --> 00:32:54.759
<v Speaker 2>for machine-centric monitoring, when you're monitoring systems, when it's highly

467
00:32:55.000 --> 00:32:59.480
<v Speaker 2>dynamic service-oriented architectures, because, you know, this whole pull

468
00:32:59.519 --> 00:33:05.839
<v Speaker 2>type mechanism makes it very easy to adjust to additional

469
00:33:06.000 --> 00:33:09.960
<v Speaker 2>instances coming up or going down in a very dynamic

470
00:33:10.079 --> 00:33:13.559
<v Speaker 2>sort of a manner. And again, I'll talk about it

471
00:33:13.720 --> 00:33:18.480
<v Speaker 2>assuming we have the time, but it's very important

472
00:33:18.599 --> 00:33:22.680
<v Speaker 2>when the data is multidimensional, both in the

473
00:33:22.799 --> 00:33:25.559
<v Speaker 2>collection and the querying, and I'll explain what that is

474
00:33:25.720 --> 00:33:29.200
<v Speaker 2>a bit later on. Now, when is it not a

475
00:33:29.279 --> 00:33:32.039
<v Speaker 2>good match. First of all, it's not a good match

476
00:33:32.119 --> 00:33:36.079
<v Speaker 2>when you need one hundred percent accuracy. So, for example,

477
00:33:36.119 --> 00:33:39.720
<v Speaker 2>if you're looking at stuff like billing, when the numbers

478
00:33:40.200 --> 00:33:44.599
<v Speaker 2>have to be perfect, then Prometheus is not a good solution.

479
00:33:44.759 --> 00:33:49.960
<v Speaker 2>Prometheus in most cases kind of averages out data because again,

480
00:33:50.039 --> 00:33:54.079
<v Speaker 2>when you're looking at stuff like CPU usage, it doesn't

481
00:33:54.119 --> 00:33:58.319
<v Speaker 2>really matter if you're running at ninety three or ninety

482
00:33:58.400 --> 00:34:02.240
<v Speaker 2>three point two or even ninety four percent. But if

483
00:34:02.279 --> 00:34:05.240
<v Speaker 2>you know, obviously, if you're looking at stuff like, you know,

484
00:34:05.519 --> 00:34:10.719
<v Speaker 2>your taxes, you probably need to be accurate. Uh, it's

485
00:34:10.840 --> 00:34:11.559
<v Speaker 2>not appropriate...

486
00:34:12.760 --> 00:34:16.199
<v Speaker 1>Sorry you said, don't talk to me about taxes right now.

487
00:34:17.400 --> 00:34:23.440
<v Speaker 2>Yeah, yeah. Anyway, it's also not appropriate

488
00:34:23.519 --> 00:34:27.280
<v Speaker 2>when you're recording non numeric data. So for example, if

489
00:34:27.320 --> 00:34:33.320
<v Speaker 2>you're recording stuff like email addresses or phone numbers, even

490
00:34:33.400 --> 00:34:35.840
<v Speaker 2>though they seem like they're numeric, they're not really,

491
00:34:37.840 --> 00:34:41.320
<v Speaker 2>or you know, street addresses, stuff like that, that's not appropriate.

492
00:34:42.199 --> 00:34:45.679
<v Speaker 2>And it's also not appropriate when the data has to

493
00:34:45.840 --> 00:34:50.400
<v Speaker 2>be totally persistent, when, you know, you can't

494
00:34:50.440 --> 00:34:55.480
<v Speaker 2>afford to lose any data. As I said, Prometheus pulls

495
00:34:56.039 --> 00:35:00.519
<v Speaker 2>the data out of the various services

496
00:35:01.079 --> 00:35:04.679
<v Speaker 2>at regular intervals, let's say every one minute. If that

497
00:35:04.960 --> 00:35:10.159
<v Speaker 2>service crashes during that one minute, you've lost the data

498
00:35:10.360 --> 00:35:15.679
<v Speaker 2>since you previously pulled it. And that's obviously not something

499
00:35:15.760 --> 00:35:18.320
<v Speaker 2>that you can live with in a lot of cases

500
00:35:18.400 --> 00:35:22.480
<v Speaker 2>where you need persistent data. And, you know,

501
00:35:22.920 --> 00:35:26.400
<v Speaker 2>if you're dealing with bank accounts and stuff like that,

502
00:35:26.559 --> 00:35:29.800
<v Speaker 2>you can't afford to lose transactions or stuff like that.
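
NOTE
Editor's sketch, not part of the recording. A toy model of why a pull-based
collector can miss short-lived events: the collector only sees the value at
each scrape, so anything that rises and falls between two scrapes is
invisible. The function name scrape and the memory timeline are hypothetical
and exist only for this illustration.
```javascript
// Hypothetical illustration: `memoryMbBySecond[i]` is the service's memory
// use at second i. A scraper only sees every `intervalSec`-th value.
function scrape(memoryMbBySecond, intervalSec) {
  return memoryMbBySecond.filter((_, second) => second % intervalSec === 0);
}
// Memory sits at 200 MB, then rockets to 900 MB between seconds 70 and 75.
const timeline = Array.from({ length: 121 }, (_, s) => (s >= 70 && s <= 75 ? 900 : 200));
console.log(Math.max(...timeline)); // 900: the spike really happened
console.log(Math.max(...scrape(timeline, 60))); // 200: scrapes at 0, 60, 120 s missed it
console.log(Math.max(...scrape(timeline, 5))); // 900: a 5 s interval happens to catch it
```
A shorter interval improves the odds but does not guarantee anything, which
is why the numbers should not be treated as exact.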

503
00:35:30.079 --> 00:35:33.639
<v Speaker 1>So yeah, but it seems like most people are pulling

504
00:35:33.800 --> 00:35:38.360
<v Speaker 1>or using it by putting this client onto their application,

505
00:35:38.559 --> 00:35:42.639
<v Speaker 1>and so it's only recording those specific kinds of data.

506
00:35:42.960 --> 00:35:45.719
<v Speaker 1>You're talking about a custom use of Prometheus where you

507
00:35:45.800 --> 00:35:48.880
<v Speaker 1>might push other data into it as well.

508
00:35:48.800 --> 00:35:54.239
<v Speaker 2>Exactly, exactly. Now, it can be a challenge because again,

509
00:35:54.320 --> 00:35:56.800
<v Speaker 2>let's say you're pulling the data, pulling

510
00:35:56.880 --> 00:35:59.440
<v Speaker 2>it in every minute, and if your server has a

511
00:35:59.559 --> 00:36:03.840
<v Speaker 2>problem where, in a certain scenario, memory consumption, like, rockets

512
00:36:03.880 --> 00:36:06.599
<v Speaker 2>out of control and then the server crashes, then you

513
00:36:06.760 --> 00:36:09.639
<v Speaker 2>might not be able to catch that if it, you know,

514
00:36:09.800 --> 00:36:12.000
<v Speaker 2>if it happens within the span of

515
00:36:12.039 --> 00:36:17.400
<v Speaker 2>a few seconds. So you can either then try to

516
00:36:17.639 --> 00:36:21.800
<v Speaker 2>increase the rate at which you pull and hope that

517
00:36:21.880 --> 00:36:26.079
<v Speaker 2>you're lucky, or look for some other solution. Right?

518
00:36:27.920 --> 00:36:32.159
<v Speaker 2>As I said, it has integrations for Node, JVM, Go, Python, Ruby,

519
00:36:32.639 --> 00:36:38.400
<v Speaker 2>and in terms of systems, stuff like Kubernetes, GitLab, AWS, Jira,

520
00:36:38.679 --> 00:36:44.000
<v Speaker 2>MongoDB, Redis. For visualization, usually you'd use Grafana, even

521
00:36:44.079 --> 00:36:48.159
<v Speaker 2>though it also has its own built-in simple visualization capabilities.

522
00:36:48.679 --> 00:36:52.400
<v Speaker 2>And it's compatible with OpenTelemetry, so if you're

523
00:36:52.480 --> 00:36:56.960
<v Speaker 2>using OpenTelemetry for collecting telemetry information, you can also

524
00:36:57.840 --> 00:37:00.760
<v Speaker 2>use OpenTelemetry on top of Prometheus and have

525
00:37:00.800 --> 00:37:03.880
<v Speaker 2>OpenTelemetry put its data, the data that

526
00:37:04.000 --> 00:37:09.440
<v Speaker 2>it collects, the relevant data, the time series data,

527
00:37:09.960 --> 00:37:21.320
<v Speaker 2>into Prometheus. So there are several types of metrics and

528
00:37:21.559 --> 00:37:27.039
<v Speaker 2>they're appropriate for different scenarios. So very quickly going over

529
00:37:27.119 --> 00:37:32.519
<v Speaker 2>the different metric types, you've got something called a counter.

530
00:37:32.679 --> 00:37:35.360
<v Speaker 2>Think about, you know, let's say, think

531
00:37:35.400 --> 00:37:38.960
<v Speaker 2>about a club. Let's say that you want to go

532
00:37:39.119 --> 00:37:41.960
<v Speaker 2>into and there's somebody at the door with this kind

533
00:37:42.000 --> 00:37:45.480
<v Speaker 2>of a counter device clicking it every time somebody goes in,

534
00:37:46.519 --> 00:37:49.199
<v Speaker 2>like counting the number of people inside, because you know,

535
00:37:49.480 --> 00:37:53.360
<v Speaker 2>the fire regulations only allow up to x number of

536
00:37:53.400 --> 00:37:56.599
<v Speaker 2>people to be inside at the same time, and they

537
00:37:56.760 --> 00:37:58.760
<v Speaker 2>so they need to count the number of people going

538
00:37:58.840 --> 00:38:01.400
<v Speaker 2>in and the number of people coming out to

539
00:38:01.519 --> 00:38:04.079
<v Speaker 2>make sure that you know they don't exceed those limits.

540
00:38:04.400 --> 00:38:07.199
<v Speaker 2>So a counter is really something like that. It's a metric

541
00:38:07.800 --> 00:38:14.079
<v Speaker 2>that can only increase, and you basically add one

542
00:38:14.320 --> 00:38:17.679
<v Speaker 2>or add n, which is like adding one

543
00:38:17.880 --> 00:38:21.800
<v Speaker 2>n times, so you basically just add into it

544
00:38:21.880 --> 00:38:25.880
<v Speaker 2>and it keeps on getting higher. Can you think about

545
00:38:26.119 --> 00:38:30.199
<v Speaker 2>things that you would measure using something like a counter?

546
00:38:31.360 --> 00:38:34.840
<v Speaker 1>Yeah, like the number of requests that come in or so.

547
00:38:35.199 --> 00:38:39.920
<v Speaker 2>Funnily, that's literally the first example that

548
00:38:40.079 --> 00:38:43.079
<v Speaker 2>I have. So the request count is exactly such a

549
00:38:43.320 --> 00:38:49.039
<v Speaker 2>thing. Tasks completed, error count, all these kinds

550
00:38:49.079 --> 00:38:52.159
<v Speaker 2>of things that only go up. They only increase until

551
00:38:52.800 --> 00:38:57.679
<v Speaker 2>at least until you restart the service, right. And the

552
00:38:57.920 --> 00:39:01.880
<v Speaker 2>cool thing about a counter is that because of this

553
00:39:02.239 --> 00:39:07.920
<v Speaker 2>behavior of only increasing, you can compute the rate of increase,

554
00:39:08.440 --> 00:39:12.679
<v Speaker 2>which makes it possible to have predictions because if you

555
00:39:12.800 --> 00:39:18.920
<v Speaker 2>can calculate the rate, then you can also predict where

556
00:39:18.960 --> 00:39:21.480
<v Speaker 2>you'll be in a certain amount of time.
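
NOTE
Editor's sketch, not part of the recording. A toy counter plus the rate and
prediction idea described above. This is not the prom-client API. The class
and the helpers ratePerSecond and predict are hypothetical names. Treating a
drop in value as a restart from zero is an assumption made for this example.
```javascript
// Hypothetical in-process counter: it can only go up.
class Counter {
  constructor() { this.value = 0; }
  inc(n = 1) {
    if (n < 0) throw new RangeError("a counter can only increase");
    this.value += n;
  }
}
// Per-second rate between two samples taken `seconds` apart.
// If the newer value is lower, assume the service restarted from zero.
function ratePerSecond(older, newer, seconds) {
  const increase = newer >= older ? newer - older : newer;
  return increase / seconds;
}
// Linear extrapolation: where will the counter be `aheadSeconds` from now?
function predict(current, rate, aheadSeconds) {
  return current + rate * aheadSeconds;
}
const requests = new Counter();
requests.inc(); // one request
requests.inc(119); // a batch of 119 more
const rate = ratePerSecond(0, requests.value, 60);
console.log(rate); // 2 requests per second
console.log(predict(requests.value, rate, 600)); // 1320
```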

557
00:39:24.320 --> 00:39:24.719
<v Speaker 1>Makes sense.

558
00:39:25.559 --> 00:39:30.119
<v Speaker 2>Yeah, So that's the simplest type of metric. Again, it's

559
00:39:30.239 --> 00:39:33.800
<v Speaker 2>used automatically for things like a request counter or a

560
00:39:33.920 --> 00:39:36.559
<v Speaker 2>task-completed counter, or an error counter, or stuff like that.

561
00:39:37.079 --> 00:39:40.760
<v Speaker 2>But you can also create your own applicative counter if

562
00:39:40.840 --> 00:39:45.639
<v Speaker 2>you want to count your own stuff. Right. The next

563
00:39:45.719 --> 00:39:50.000
<v Speaker 2>type of metric is called a gauge, and it records

564
00:39:50.119 --> 00:39:53.480
<v Speaker 2>a value that can go up or down or literally

565
00:39:53.679 --> 00:39:57.480
<v Speaker 2>be set at any value that you want. So you

566
00:39:57.599 --> 00:40:00.679
<v Speaker 2>can literally say the value of this gauge right

567
00:40:00.719 --> 00:40:04.880
<v Speaker 2>now is x. Again, can you think of when you

568
00:40:05.079 --> 00:40:09.400
<v Speaker 2>might use a gauge? This would be like queue size

569
00:40:09.840 --> 00:40:13.920
<v Speaker 2>or... Chuck, are you looking at my slides? No?

570
00:40:15.039 --> 00:40:18.440
<v Speaker 2>Because again, that's the first item that I have on

571
00:40:18.559 --> 00:40:21.559
<v Speaker 2>my slide for using it: queue

572
00:40:21.840 --> 00:40:29.840
<v Speaker 2>size. Exactly: queue size, memory usage, CPU usage, number of

573
00:40:29.960 --> 00:40:34.440
<v Speaker 2>requests currently in progress, all these sorts of things that

574
00:40:34.519 --> 00:40:38.079
<v Speaker 2>you just set to a particular value at each

575
00:40:38.159 --> 00:40:42.880
<v Speaker 2>point in time. Now, the key... So it can go

576
00:40:43.039 --> 00:40:46.199
<v Speaker 2>up and down or be set to a particular value. The

577
00:40:46.320 --> 00:40:49.880
<v Speaker 2>thing about it, though, is that because it's so arbitrary,

578
00:40:50.519 --> 00:40:54.079
<v Speaker 2>you can't use it to assess rate of change, because

579
00:40:54.119 --> 00:40:57.280
<v Speaker 2>if it can just jump between numbers, there's really no

580
00:40:57.440 --> 00:41:02.559
<v Speaker 2>meaningful rate that you can think about.
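
NOTE
Editor's sketch, not part of the recording. A toy gauge showing the two usage
styles mentioned above: moving up and down (requests in progress) and being
set to an arbitrary value (queue size). This is not the prom-client API. The
class and the tracked wrapper are hypothetical names for illustration.
```javascript
// Hypothetical in-process gauge: holds whatever value was set last.
class Gauge {
  constructor() { this.value = 0; }
  set(v) { this.value = v; }
  inc(n = 1) { this.value += n; }
  dec(n = 1) { this.value -= n; }
}
const inFlight = new Gauge();
// Wrap a handler so the gauge tracks requests currently in progress.
async function tracked(handler) {
  inFlight.inc();
  try {
    return await handler();
  } finally {
    inFlight.dec();
  }
}
const queueSize = new Gauge();
queueSize.set(42); // jump straight to an arbitrary value
queueSize.set(7);
console.log(queueSize.value); // 7
```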

581
00:41:02.800 --> 00:41:09.599
<v Speaker 1>Yeah, you can average it out, but yeah.

582
00:41:09.480 --> 00:41:15.400
<v Speaker 2>The next one is slightly more complicated. And I actually have

583
00:41:15.559 --> 00:41:19.400
<v Speaker 2>presentations that I do, that so far I have done internally.

584
00:41:19.440 --> 00:41:21.639
<v Speaker 2>I'm kind of looking for a conference that wants a talk

585
00:41:21.679 --> 00:41:24.880
<v Speaker 2>about this. But if you want to be able to

586
00:41:25.039 --> 00:41:32.440
<v Speaker 2>measure things like, sorry, like averages or even

587
00:41:32.519 --> 00:41:37.400
<v Speaker 2>more importantly, percentiles like the median or the ninety-eighth

588
00:41:37.480 --> 00:41:40.719
<v Speaker 2>percentile or stuff like that, then you use a metric

589
00:41:41.199 --> 00:41:46.800
<v Speaker 2>called a histogram. Now that might seem surprising why it's

590
00:41:46.920 --> 00:41:50.760
<v Speaker 2>called a histogram if it's used to measure percentiles, for example.

591
00:41:50.800 --> 00:41:53.519
<v Speaker 2>And I'll explain it. I'll try to explain it in

592
00:41:53.920 --> 00:41:57.239
<v Speaker 2>a minute. But can you think about when you want

593
00:41:57.360 --> 00:41:59.599
<v Speaker 2>something like percentiles.

594
00:42:00.800 --> 00:42:03.119
<v Speaker 1>I would think like memory usage or.

595
00:42:03.719 --> 00:42:07.239
<v Speaker 2>No, memory usage, we talked about counter...

596
00:42:07.320 --> 00:42:11.039
<v Speaker 1>Well, what I meant was, you know, percentage of

597
00:42:11.400 --> 00:42:16.079
<v Speaker 1>like memory used or percentage of resources used. So what

598
00:42:16.239 --> 00:42:18.519
<v Speaker 1>they have is something I'm not understanding, what it is.

599
00:42:18.159 --> 00:42:21.119
<v Speaker 2>So I'll give examples,

600
00:42:21.159 --> 00:42:23.039
<v Speaker 2>and I think that then it will click for you.

601
00:42:23.559 --> 00:42:29.199
<v Speaker 2>So think about something like request duration, like oh I gotcha,

602
00:42:29.760 --> 00:42:36.119
<v Speaker 2>So you want to say, like my median request I

603
00:42:36.199 --> 00:42:42.599
<v Speaker 2>got, yeah... rendering duration is x, or the ninetieth percentile

604
00:42:43.239 --> 00:42:49.280
<v Speaker 2>is y. Another example might be the response size. So

605
00:42:49.519 --> 00:42:54.119
<v Speaker 2>you might say my average response size or my median

606
00:42:54.199 --> 00:42:58.119
<v Speaker 2>response size is such and such, it goes up to

607
00:42:58.960 --> 00:43:01.960
<v Speaker 2>something else when I'm looking at the ninety-ninth percentile.

608
00:43:04.760 --> 00:43:08.000
<v Speaker 2>So you kind of want to be able to get

609
00:43:08.159 --> 00:43:12.239
<v Speaker 2>the measurements but then use them in order to calculate,

610
00:43:12.400 --> 00:43:17.280
<v Speaker 2>as I said, percentiles, right? Now, it's called a

611
00:43:17.440 --> 00:43:21.880
<v Speaker 2>histogram because the way that it's actually implemented is that

612
00:43:22.239 --> 00:43:25.840
<v Speaker 2>internally you create buckets. You can say, let's say, like

613
00:43:26.079 --> 00:43:29.199
<v Speaker 2>let's say you're talking about request duration. You'd say, I

614
00:43:29.320 --> 00:43:32.719
<v Speaker 2>have a bucket from zero to ten milliseconds, from ten

615
00:43:32.800 --> 00:43:36.079
<v Speaker 2>milliseconds to fifty milliseconds, from fifty milliseconds

616
00:43:36.119 --> 00:43:39.360
<v Speaker 2>to one hundred milliseconds, and anything above one hundred

617
00:43:39.400 --> 00:43:43.239
<v Speaker 2>milliseconds, okay? And each measurement goes into one of

618
00:43:43.320 --> 00:43:47.679
<v Speaker 2>these buckets, and then you look at how many fell

619
00:43:47.800 --> 00:43:50.280
<v Speaker 2>into each one of those buckets, and then you can

620
00:43:50.440 --> 00:43:52.079
<v Speaker 2>use it to calculate percentiles.
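
NOTE
Editor's sketch, not part of the recording. A toy histogram using the bucket
boundaries from the example above (10, 50, 100 ms, plus everything larger).
This is not the prom-client API. The class is hypothetical. It stores
cumulative counts, which is my understanding of how Prometheus exposes
buckets, and estimates a percentile by linear interpolation inside the
bucket, so the result is an approximation rather than an exact value.
```javascript
// Hypothetical histogram with cumulative buckets: counts[i] is the number of
// observations less than or equal to upperBounds[i].
class Histogram {
  constructor(upperBounds) {
    this.upperBounds = [...upperBounds, Infinity];
    this.counts = this.upperBounds.map(() => 0);
  }
  observe(value) {
    this.upperBounds.forEach((bound, i) => {
      if (value <= bound) this.counts[i] += 1;
    });
  }
  // Estimate quantile q (0 < q <= 1) by linear interpolation inside the
  // bucket that contains the target rank. Approximate by design.
  quantile(q) {
    const total = this.counts[this.counts.length - 1];
    if (total === 0) return NaN;
    const rank = q * total;
    const i = this.counts.findIndex((c) => c >= rank);
    const upper = this.upperBounds[i];
    const lower = i === 0 ? 0 : this.upperBounds[i - 1];
    if (upper === Infinity) return lower; // cannot interpolate an open bucket
    const below = i === 0 ? 0 : this.counts[i - 1];
    return lower + (upper - lower) * ((rank - below) / (this.counts[i] - below));
  }
}
const durationsMs = new Histogram([10, 50, 100]);
[4, 8, 12, 20, 30, 45, 60, 70, 90, 250].forEach((ms) => durationsMs.observe(ms));
console.log(durationsMs.counts); // [ 2, 6, 9, 10 ]
console.log(durationsMs.quantile(0.5)); // 40
console.log(durationsMs.quantile(0.9)); // 100
```
The true median of these ten values is 37.5, while the estimate is 40. The
bucket boundaries decide how precise the percentiles can be.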

621
00:43:53.079 --> 00:43:54.920
<v Speaker 1>Right, No, that makes sense.

622
00:43:55.320 --> 00:44:00.280
<v Speaker 2>So it's like a distribution for each point in time, right?

623
00:44:00.719 --> 00:44:03.599
<v Speaker 1>The thing that I'm imagining is anything that you would

624
00:44:03.639 --> 00:44:07.760
<v Speaker 1>put like on a bell curve or something like that, right,

625
00:44:08.280 --> 00:44:11.159
<v Speaker 1>And so then, yeah, your ninetieth percentile is

626
00:44:11.239 --> 00:44:12.199
<v Speaker 1>you know, out on the end.

627
00:44:13.039 --> 00:44:13.159
<v Speaker 2>Right.

628
00:44:13.320 --> 00:44:15.800
<v Speaker 1>But so these are kind of rare, but they're also

629
00:44:15.880 --> 00:44:19.519
<v Speaker 1>kind of the extreme that you have to deal with. Right,

630
00:44:19.639 --> 00:44:22.679
<v Speaker 1>So it's a histogram.

631
00:44:23.039 --> 00:44:26.880
<v Speaker 2>It's a histogram that gets recorded and updated at every interval,

632
00:44:27.320 --> 00:44:29.360
<v Speaker 2>so it's like a histogram over time.

633
00:44:30.920 --> 00:44:33.280
<v Speaker 1>Oh interesting, so you could you could literally see like

634
00:44:33.360 --> 00:44:35.880
<v Speaker 1>the hump move or whatever exactly.

635
00:44:35.960 --> 00:44:38.679
<v Speaker 2>You say, this is my histogram now, this is my

636
00:44:38.800 --> 00:44:42.079
<v Speaker 2>histogram a minute later, this is my histogram another minute later,

637
00:44:42.159 --> 00:44:49.159
<v Speaker 2>and so on. Interesting. Okay, so hopefully that's kind of clear.

638
00:44:49.639 --> 00:44:52.440
<v Speaker 2>Now there's another metric called summary. I don't want to get

639
00:44:52.480 --> 00:44:56.840
<v Speaker 2>into it. It's more rarely used,

640
00:44:57.480 --> 00:45:00.559
<v Speaker 2>so we'll skip it for now. You know, we're starting

641
00:45:00.599 --> 00:45:03.480
<v Speaker 2>to run long in any event, because what I really

642
00:45:03.559 --> 00:45:07.920
<v Speaker 2>wanted to talk about was the concept of labels and dimensionality.

643
00:45:08.559 --> 00:45:11.760
<v Speaker 2>So let's go back again to that field example of

644
00:45:11.880 --> 00:45:15.000
<v Speaker 2>the farmer wanting to measure let's say, humidity and temperature

645
00:45:15.039 --> 00:45:19.199
<v Speaker 2>in their field. But it's a big field,

646
00:45:19.320 --> 00:45:22.320
<v Speaker 2>so they're not using just a single measurement device. They've

647
00:45:22.360 --> 00:45:26.119
<v Speaker 2>got devices spread out in different locations in the field.

648
00:45:27.039 --> 00:45:29.719
<v Speaker 2>So one way in which you might think about it

649
00:45:30.159 --> 00:45:33.039
<v Speaker 2>is that we'll create a separate metric for each one

650
00:45:33.119 --> 00:45:37.519
<v Speaker 2>of those measurement devices. But the way that Prometheus looks

651
00:45:37.519 --> 00:45:40.599
<v Speaker 2>at it is that we've got a single temperature metric,

652
00:45:41.440 --> 00:45:46.840
<v Speaker 2>but for each measurement we associate labels with it, and

653
00:45:47.000 --> 00:45:52.119
<v Speaker 2>that label is a textual value. It might be the name

654
00:45:52.320 --> 00:45:56.199
<v Speaker 2>or the ID of the specific measurement device. So let's

655
00:45:56.199 --> 00:45:59.079
<v Speaker 2>say I've got four devices in the field. They're called

656
00:45:59.119 --> 00:46:04.320
<v Speaker 2>A, B, C, and D. I've got the single temperature metric, but

657
00:46:04.480 --> 00:46:08.679
<v Speaker 2>each measurement is associated with either A or B or

658
00:46:08.840 --> 00:46:12.880
<v Speaker 2>C or D. Mm-hmm. Is that clear?

659
00:46:13.719 --> 00:46:13.920
<v Speaker 1>Yep.

660
00:46:15.880 --> 00:46:20.320
<v Speaker 2>So now, the benefit of this approach is that

661
00:46:20.519 --> 00:46:24.639
<v Speaker 2>it's very dynamic. If I add, and

662
00:46:25.039 --> 00:46:28.400
<v Speaker 2>you know, if I buy some more land

663
00:46:28.920 --> 00:46:33.280
<v Speaker 2>and I now also need E and F measurements,

664
00:46:33.719 --> 00:46:36.599
<v Speaker 2>I don't need to create new metrics. It's the same metric,

665
00:46:37.199 --> 00:46:40.480
<v Speaker 2>but the new values coming

666
00:46:40.519 --> 00:46:44.119
<v Speaker 2>in are associated with new label values. So I've got

667
00:46:44.199 --> 00:46:49.719
<v Speaker 2>a label called you know, the device name, but it

668
00:46:49.840 --> 00:46:55.119
<v Speaker 2>can come in with any arbitrary textual value. And

669
00:46:55.320 --> 00:47:00.280
<v Speaker 2>then when I query the data, I can say, show

670
00:47:00.400 --> 00:47:05.199
<v Speaker 2>me just the values for that metric for

671
00:47:05.760 --> 00:47:09.199
<v Speaker 2>when the label is equal to A or to A

672
00:47:09.599 --> 00:47:13.239
<v Speaker 2>or B or A and B because when you do

673
00:47:13.480 --> 00:47:17.280
<v Speaker 2>queries in the PromQL query language, you can either

674
00:47:17.880 --> 00:47:20.599
<v Speaker 2>set a label like in the query to have a

675
00:47:20.639 --> 00:47:24.880
<v Speaker 2>specific value or use a regular expression.
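
NOTE
A small sketch of the label idea, in plain JavaScript rather than a real
Prometheus client. The metric name field_temperature_celsius and the label
name device are hypothetical. The comments show the PromQL selectors that the
two helper functions are meant to mimic.
```javascript
// One metric, many series; the metric and label names are hypothetical.
const samples = [
  { metric: "field_temperature_celsius", labels: { device: "A" }, value: 21.5 },
  { metric: "field_temperature_celsius", labels: { device: "B" }, value: 22.1 },
  { metric: "field_temperature_celsius", labels: { device: "C" }, value: 20.8 },
  { metric: "field_temperature_celsius", labels: { device: "D" }, value: 23.0 },
];
// Exact match, comparable to the PromQL selector field_temperature_celsius{device="A"}.
function selectExact(label, value) {
  return samples.filter((s) => s.labels[label] === value);
}
// Regex match, comparable to field_temperature_celsius{device=~"A|B"}.
// PromQL regex matchers are fully anchored, hence the ^(?:...)$ wrapper.
function selectRegex(label, pattern) {
  const re = new RegExp(`^(?:${pattern})$`);
  return samples.filter((s) => re.test(s.labels[label]));
}
// A new device needs no new metric, only a new label value.
samples.push({ metric: "field_temperature_celsius", labels: { device: "E" }, value: 19.4 });
console.log(selectExact("device", "A").length, selectRegex("device", "A|B").length);
```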

676
00:47:26.440 --> 00:47:30.960
<v Speaker 1>Okay, So I'm imagining that you could do this if

677
00:47:31.000 --> 00:47:35.840
<v Speaker 1>you're spinning up, say, another Docker container or server. Exactly.

678
00:47:36.079 --> 00:47:39.920
<v Speaker 2>So you might have the pod name. You would have

679
00:47:40.039 --> 00:47:43.920
<v Speaker 2>a label called pod name, and the value is the

680
00:47:44.000 --> 00:47:49.400
<v Speaker 2>name of the pod, right yep. And you can have

681
00:47:49.760 --> 00:47:55.800
<v Speaker 2>any number of labels associated with each metric, and those

682
00:47:55.960 --> 00:47:59.639
<v Speaker 2>labels can have any number of values.

683
00:48:01.440 --> 00:48:01.719
<v Speaker 1>Okay.

684
00:48:02.920 --> 00:48:10.480
<v Speaker 2>Now that I think highlights the potential issue of dimensionality

685
00:48:10.840 --> 00:48:16.440
<v Speaker 2>because it makes the system very flexible. But you'll see

686
00:48:16.480 --> 00:48:18.800
<v Speaker 2>the problem in a minute. So let's say I have

687
00:48:18.960 --> 00:48:25.960
<v Speaker 2>a metric and I have three labels associated with it.

688
00:48:26.840 --> 00:48:31.000
<v Speaker 2>So they actually define a three-dimensional space for

689
00:48:31.159 --> 00:48:36.960
<v Speaker 2>that metric because each measurement has a coordinate that is

690
00:48:37.079 --> 00:48:43.440
<v Speaker 2>specified by the values of those three labels. Okay, is

691
00:48:43.519 --> 00:48:47.800
<v Speaker 2>that clear? I'm kind of waving my hands, think about it.

692
00:48:48.320 --> 00:48:52.599
<v Speaker 2>So think about again, going back to the field example,

693
00:48:52.639 --> 00:48:57.400
<v Speaker 2>and let's say that instead of having each measurement in

694
00:48:57.480 --> 00:49:03.360
<v Speaker 2>the field have its own name, it has a coordinate

695
00:49:03.599 --> 00:49:06.840
<v Speaker 2>in the field, so it has an x-axis, an

696
00:49:07.159 --> 00:49:12.039
<v Speaker 2>x value, and a y value, So it's a two

697
00:49:12.199 --> 00:49:13.239
<v Speaker 2>dimensional space.

698
00:49:14.199 --> 00:49:14.519
<v Speaker 1>Okay.

699
00:49:17.119 --> 00:49:22.880
<v Speaker 2>Now, in reality, again, when you're measuring things,

700
00:49:23.559 --> 00:49:26.000
<v Speaker 2>you might have many more labels than that, so it

701
00:49:26.079 --> 00:49:32.280
<v Speaker 2>becomes an n-dimensional space, right? And for any

702
00:49:32.519 --> 00:49:39.320
<v Speaker 2>so, for each axis, the number

703
00:49:39.360 --> 00:49:42.519
<v Speaker 2>of points on that axis is the number of different

704
00:49:42.719 --> 00:49:49.440
<v Speaker 2>values that label might have. Okay, So let's say I

705
00:49:49.559 --> 00:49:55.239
<v Speaker 2>have three labels and each one has ten potential values?

706
00:49:55.920 --> 00:50:01.119
<v Speaker 2>How many points, how many different numbers in

707
00:50:01.239 --> 00:50:01.840
<v Speaker 2>that space?

708
00:50:01.960 --> 00:50:04.239
<v Speaker 1>A thousand?

709
00:50:05.639 --> 00:50:09.960
<v Speaker 2>Exactly? So I need to remember one thousand different numbers.

710
00:50:12.800 --> 00:50:17.360
<v Speaker 1>For that single metric, right and make sense of it hopefully.

711
00:50:18.079 --> 00:50:21.199
<v Speaker 2>Yeah, But the problem is that if I'm not careful

712
00:50:21.800 --> 00:50:25.119
<v Speaker 2>with the number of labels that I use, and especially

713
00:50:25.239 --> 00:50:29.400
<v Speaker 2>the number of different values that each label might have,

714
00:50:30.239 --> 00:50:35.480
<v Speaker 2>I can literally blow up my memory. Right and again

715
00:50:35.639 --> 00:50:40.679
<v Speaker 2>going back to how Prometheus works, Prometheus pulls. Let's

716
00:50:40.760 --> 00:50:43.400
<v Speaker 2>say you have a Node server, and Prometheus pulls the

717
00:50:43.519 --> 00:50:47.320
<v Speaker 2>data from the Node server once every minute. So now,

718
00:50:47.679 --> 00:50:51.880
<v Speaker 2>that node server keeps all that data in its memory.

719
00:50:54.320 --> 00:51:01.239
<v Speaker 2>Oh, okay. If you're not careful, you'll blow up Node's memory.

720
00:51:02.039 --> 00:51:05.880
<v Speaker 2>So I'll give a

721
00:51:06.039 --> 00:51:12.960
<v Speaker 2>real example. We were using Prometheus to monitor data associated

722
00:51:13.400 --> 00:51:18.719
<v Speaker 2>with the page performance at Next Insurance,

723
00:51:20.119 --> 00:51:23.000
<v Speaker 2>and so one of the labels was the URL.

724
00:51:24.480 --> 00:51:24.519
<v Speaker 1>H.

725
00:51:26.039 --> 00:51:32.800
<v Speaker 2>We had something like two thousand different pages, Okay, so

726
00:51:33.360 --> 00:51:36.679
<v Speaker 2>we had a label, so one of the dimensions

727
00:51:37.559 --> 00:51:42.000
<v Speaker 2>was the URL, and it had two thousand

728
00:51:42.280 --> 00:51:47.000
<v Speaker 2>possible different values. Actually, if we weren't careful, it could

729
00:51:47.000 --> 00:51:52.039
<v Speaker 2>have been a lot worse if we didn't filter out parameters,

730
00:51:52.559 --> 00:51:59.800
<v Speaker 2>query parameters and stuff like that. Exactly. And another label

731
00:51:59.840 --> 00:52:02.599
<v Speaker 2>that we had was device type because we wanted to

732
00:52:02.679 --> 00:52:09.079
<v Speaker 2>distinguish between desktop and mobile. And another label that we

733
00:52:09.280 --> 00:52:13.840
<v Speaker 2>had was the browser the browser type because we wanted

734
00:52:13.880 --> 00:52:18.159
<v Speaker 2>to distinguish between performance on, say, Chrome and on Safari, right?

735
00:52:18.840 --> 00:52:25.039
<v Speaker 2>So think about it. It's two thousand different URLs times

736
00:52:25.800 --> 00:52:31.000
<v Speaker 2>ten different browsers, times three or four different types of

737
00:52:31.119 --> 00:52:35.840
<v Speaker 2>devices or five types of devices, and all of a sudden,

738
00:52:35.920 --> 00:52:42.440
<v Speaker 2>and for each one of those three-dimensional coordinates, you

739
00:52:42.559 --> 00:52:45.320
<v Speaker 2>need to remember a metric, so it's a number. So

740
00:52:45.480 --> 00:52:50.239
<v Speaker 2>it's millions of numbers that you're

741
00:52:50.280 --> 00:52:53.880
<v Speaker 2>holding on to in memory, and we literally kind of blew

742
00:52:54.000 --> 00:52:59.679
<v Speaker 2>up the memory space. So yeah, so you need to

743
00:52:59.719 --> 00:53:04.599
<v Speaker 2>be careful not to be cavalier about the number

744
00:53:04.639 --> 00:53:09.639
<v Speaker 2>of labels and dimensions that you're using. So it's called cardinality,

745
00:53:10.199 --> 00:53:14.519
<v Speaker 2>and basically you need to beware of high cardinality, right?
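
NOTE
A back-of-the-envelope sketch of the cardinality arithmetic, using the figures
mentioned in the conversation. The function names and the example URL are
hypothetical. The product is the worst case for a single plain number per
combination; a histogram additionally keeps one count per bucket for each
combination, so the real total can be considerably larger.
```javascript
// Worst-case number of series for one metric: the product of distinct values per label.
function seriesCount(distinctValuesPerLabel) {
  return Object.values(distinctValuesPerLabel).reduce((product, n) => product * n, 1);
}
// Figures from the conversation: 2000 URLs, 10 browsers, 5 device types.
console.log(seriesCount({ url: 2000, browser: 10, device: 5 })); // prints 100000
// Dropping the query string keeps the url label from growing without bound.
function urlLabel(rawUrl) {
  return new URL(rawUrl).pathname;
}
console.log(urlLabel("https://example.com/quote?session=abc123")); // prints /quote
```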

746
00:53:16.119 --> 00:53:20.679
<v Speaker 2>But the benefit is that the system is extremely flexible.

747
00:53:21.360 --> 00:53:25.079
<v Speaker 2>Like, if there's a new URL,

748
00:53:25.480 --> 00:53:28.559
<v Speaker 2>you don't need to do any modification in the system.

749
00:53:28.760 --> 00:53:33.039
<v Speaker 2>It automatically adjusts to that because it just associates a

750
00:53:33.199 --> 00:53:36.760
<v Speaker 2>label value, another label value, with

751
00:53:36.840 --> 00:53:44.000
<v Speaker 2>that particular metric. Right now, there's all sorts of things

752
00:53:44.039 --> 00:53:49.639
<v Speaker 2>about naming conventions, about how

753
00:53:49.760 --> 00:53:52.880
<v Speaker 2>you name your metrics and how you name your labels,

754
00:53:53.480 --> 00:53:58.559
<v Speaker 2>I won't go into that. I will say that there's

755
00:53:58.679 --> 00:54:03.800
<v Speaker 2>a really cool API for working with all the different

756
00:54:04.320 --> 00:54:09.400
<v Speaker 2>metric types. So we talked about counter and gauge and histogram,

757
00:54:10.199 --> 00:54:16.480
<v Speaker 2>so if you import the

758
00:54:16.920 --> 00:54:23.119
<v Speaker 2>prom-client npm package, then you get those

759
00:54:23.199 --> 00:54:29.239
<v Speaker 2>APIs with which you can push your numbers with associated labels

760
00:54:30.039 --> 00:54:35.840
<v Speaker 2>into the system and they just get recorded. Right. So again,

761
00:54:36.639 --> 00:54:40.960
<v Speaker 2>you might have, let's say, some sort of business

762
00:54:41.079 --> 00:54:45.880
<v Speaker 2>logic process that, you know, does all

763
00:54:45.960 --> 00:54:48.599
<v Speaker 2>sorts of things: it goes to one database,

764
00:54:48.719 --> 00:54:51.360
<v Speaker 2>goes to another database, does all sorts of things, and

765
00:54:51.519 --> 00:54:56.000
<v Speaker 2>you want to measure its duration overall. Or maybe you've

766
00:54:56.079 --> 00:54:59.559
<v Speaker 2>got your own like custom queue for something and you

767
00:54:59.719 --> 00:55:04.960
<v Speaker 2>want to measure the size of that queue, or you've

768
00:55:05.039 --> 00:55:10.519
<v Speaker 2>got something like that, or a pool.

769
00:55:10.639 --> 00:55:15.239
<v Speaker 2>Let's say you're utilizing your own custom pool

770
00:55:15.280 --> 00:55:18.519
<v Speaker 2>of resources, and you want to make sure that you've

771
00:55:18.559 --> 00:55:22.920
<v Speaker 2>made the pool not too big, not too small. Then

772
00:55:23.400 --> 00:55:25.840
<v Speaker 2>again you could be looking at the size of that

773
00:55:26.039 --> 00:55:30.079
<v Speaker 2>pool and what its utilization is at each

774
00:55:30.320 --> 00:55:34.440
<v Speaker 2>point in time. So you can, through those APIs, measure

775
00:55:34.559 --> 00:55:39.559
<v Speaker 2>your own business logic values. And the example that I

776
00:55:39.760 --> 00:55:43.239
<v Speaker 2>gave is we were actually using it to measure Core

777
00:55:43.320 --> 00:55:46.280
<v Speaker 2>Web Vitals. If you remember, we've had a lot of

778
00:55:46.880 --> 00:55:53.960
<v Speaker 2>yeah, exactly. Now, those are metrics that are actually

779
00:55:54.159 --> 00:55:57.519
<v Speaker 2>measured on the browser side. So you might ask how

780
00:55:57.599 --> 00:56:02.079
<v Speaker 2>did they make their way into Prometheus? Well, we would

781
00:56:02.599 --> 00:56:07.119
<v Speaker 2>collect this data on the client side, post it to

782
00:56:07.320 --> 00:56:10.320
<v Speaker 2>the node server, and the node server would use that

783
00:56:10.519 --> 00:56:16.360
<v Speaker 2>API to put it into Prometheus. Right, So this way

784
00:56:16.440 --> 00:56:20.039
<v Speaker 2>you can collect data not just from the node side,

785
00:56:20.400 --> 00:56:22.880
<v Speaker 2>but also from the browser side. Again, it could be

786
00:56:23.000 --> 00:56:26.400
<v Speaker 2>system-level stuff like Core Web Vitals, or it could

787
00:56:26.480 --> 00:56:33.360
<v Speaker 2>be applicative stuff. You know, whatever your web application happens

788
00:56:33.400 --> 00:56:36.239
<v Speaker 2>to be doing that you would like to measure and monitor.
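
NOTE
A dependency-free stand-in that illustrates what a metrics client library does
on the Node side: it keeps one value in memory per label combination and
renders everything as text when Prometheus scrapes it. This is explicitly NOT
the prom-client API. The class name GaugeSketch, the metric name, and the label
names are hypothetical, and label-value escaping is omitted for brevity.
```javascript
// Hypothetical stand-in for a client library gauge; not the prom-client API.
class GaugeSketch {
  constructor(name, help) {
    this.name = name;
    this.help = help;
    this.series = new Map(); // one entry per distinct label combination
  }
  // Record the latest value for a label combination, e.g. data posted from a browser.
  set(labels, value) {
    const key = Object.keys(labels).sort().map((k) => `${k}="${labels[k]}"`).join(",");
    this.series.set(key, value);
  }
  // Render in the Prometheus text exposition format for a scrape.
  render() {
    const lines = [`# HELP ${this.name} ${this.help}`, `# TYPE ${this.name} gauge`];
    for (const [key, value] of this.series) lines.push(`${this.name}{${key}} ${value}`);
    return lines.join("\n") + "\n";
  }
}
const lcp = new GaugeSketch("web_vitals_lcp_seconds", "Last reported Largest Contentful Paint");
lcp.set({ device: "mobile", browser: "chrome" }, 2.4);
lcp.set({ device: "desktop", browser: "safari" }, 1.1);
console.log(lcp.render());
```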

789
00:56:39.159 --> 00:56:39.719
<v Speaker 1>That's cool.

790
00:56:40.880 --> 00:56:45.639
<v Speaker 2>Yeah. So now the final piece that I wanted to

791
00:56:45.760 --> 00:56:49.679
<v Speaker 2>mention is how to get data out of Prometheus. So,

792
00:56:49.880 --> 00:56:55.000
<v Speaker 2>as I mentioned, there's a query language called PromQL.

793
00:56:55.159 --> 00:57:01.480
<v Speaker 2>That's PromQL, and like I said, it's definitely not SQL.

794
00:57:02.760 --> 00:57:09.280
<v Speaker 2>It can do stuff like filtering data, aggregating data, run

795
00:57:09.360 --> 00:57:13.960
<v Speaker 2>all sorts of predictive functions, take quantiles, which is a

796
00:57:14.039 --> 00:57:19.320
<v Speaker 2>fancy name for percentiles and averages and whatnot. And it

797
00:57:19.480 --> 00:57:24.280
<v Speaker 2>can be used to answer questions like what's the ninety

798
00:57:24.480 --> 00:57:29.039
<v Speaker 2>fifth percentile latency in each data center over the

799
00:57:29.119 --> 00:57:34.119
<v Speaker 2>past month, right? You know, so very sophisticated queries. Or

800
00:57:34.599 --> 00:57:38.559
<v Speaker 2>how full will the disks be in four days? So

801
00:57:38.880 --> 00:57:43.599
<v Speaker 2>here's an example of a predictive query, or which servers

802
00:57:43.719 --> 00:57:48.639
<v Speaker 2>are the top five users of CPU? All these sorts

803
00:57:48.639 --> 00:57:52.719
<v Speaker 2>of queries, and you can either use these queries in

804
00:57:52.920 --> 00:57:58.880
<v Speaker 2>order to graph things or in order to create alerts.
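
NOTE
For readers following along in text, here is a sketch of what the three
questions above might look like as PromQL, held as JavaScript strings. The
functions histogram_quantile, predict_linear, topk, rate, and sum are real
PromQL. The metric names and the datacenter label are assumptions about what a
given setup exports, so treat the expressions as illustrations, not as queries
that will work unchanged.
```javascript
// PromQL expressions kept as strings; all metric and label names are assumptions.
const queries = {
  // 95th percentile latency per data center over the past 30 days.
  p95LatencyByDatacenter:
    "histogram_quantile(0.95, sum by (le, datacenter) (rate(http_request_duration_seconds_bucket[30d])))",
  // Predicted free disk space four days from now, extrapolated from the last 6 hours.
  diskFreeInFourDays: "predict_linear(node_filesystem_avail_bytes[6h], 4 * 24 * 3600)",
  // The five instances using the most CPU.
  top5CpuUsers: 'topk(5, sum by (instance) (rate(node_cpu_seconds_total{mode!="idle"}[5m])))',
};
for (const [name, expr] of Object.entries(queries)) console.log(`${name}: ${expr}`);
```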

805
00:58:00.599 --> 00:58:06.519
<v Speaker 1>Mm, okay. And that's what Grafana does. Yes, it uses

806
00:58:06.599 --> 00:58:09.079
<v Speaker 1>PromQL to do this stuff. Exactly.

807
00:58:09.199 --> 00:58:12.119
<v Speaker 2>So in Grafana, what you do is you create a graph,

808
00:58:12.880 --> 00:58:17.039
<v Speaker 2>and in the graph you specify Prometheus as the data source,

809
00:58:17.960 --> 00:58:22.079
<v Speaker 2>and you write the PromQL query, and then it

810
00:58:22.239 --> 00:58:28.719
<v Speaker 2>basically graphs that query over time, and it has

811
00:58:28.840 --> 00:58:31.719
<v Speaker 2>all sorts of fancy graphs, so you can do like

812
00:58:31.920 --> 00:58:35.400
<v Speaker 2>line charts and bar charts and heat maps and all

813
00:58:35.519 --> 00:58:40.679
<v Speaker 2>sorts of really fancy graphs, and it's really powerful.

814
00:58:42.039 --> 00:58:45.440
<v Speaker 2>If you're interested in this sort of stuff, I highly

815
00:58:45.519 --> 00:58:48.920
<v Speaker 2>recommend just going into YouTube, let's say, and searching for

816
00:58:51.280 --> 00:58:58.199
<v Speaker 2>PromQL. Also, on the Prometheus website,

817
00:58:59.840 --> 00:59:04.559
<v Speaker 2>there's documentation and tutorials for PromQL as well, so

818
00:59:04.800 --> 00:59:09.280
<v Speaker 2>we can post links to that later on in the

819
00:59:10.159 --> 00:59:16.519
<v Speaker 2>notes. Obviously, if this was a talk, then

820
00:59:16.559 --> 00:59:19.880
<v Speaker 2>I would actually be showing examples of PromQL,

821
00:59:20.039 --> 00:59:29.760
<v Speaker 2>but, you know, I can't really do that. But exactly. So,

822
00:59:31.239 --> 00:59:33.800
<v Speaker 2>you would put PromQL in one of two places,

823
00:59:34.400 --> 00:59:38.159
<v Speaker 2>actually, one of three places. So

824
00:59:38.840 --> 00:59:42.639
<v Speaker 2>Prometheus itself has its own simple web interface where

825
00:59:42.760 --> 00:59:46.199
<v Speaker 2>you can put in PromQL queries and get

826
00:59:46.320 --> 00:59:52.119
<v Speaker 2>data either as a table, basically just a tabular representation

827
00:59:52.280 --> 00:59:57.039
<v Speaker 2>of numbers or in simple graphs. If you want much

828
00:59:57.159 --> 01:00:01.920
<v Speaker 2>more sophisticated graphs, then you can use Grafana, and Grafana

829
01:00:02.079 --> 01:00:07.000
<v Speaker 2>actually has a pretty sophisticated editor for PromQL, so

830
01:00:07.199 --> 01:00:10.840
<v Speaker 2>it does automatic completion for you. It knows all the

831
01:00:11.519 --> 01:00:14.159
<v Speaker 2>metrics in the system and the labels and the different

832
01:00:14.239 --> 01:00:17.800
<v Speaker 2>label values, so it can actually do autocomplete for you

833
01:00:17.920 --> 01:00:23.639
<v Speaker 2>when you're typing in the queries. Yeah, it's

834
01:00:23.800 --> 01:00:26.719
<v Speaker 2>really nice. As I think I said, they're kind

835
01:00:26.760 --> 01:00:29.920
<v Speaker 2>of sister projects, so it's the

836
01:00:30.000 --> 01:00:32.320
<v Speaker 2>same people working on both of them, so they made

837
01:00:32.360 --> 01:00:36.360
<v Speaker 2>sure that the integration is really nice. And you can

838
01:00:36.440 --> 01:00:40.159
<v Speaker 2>also use PromQL in the Alertmanager, where

839
01:00:40.559 --> 01:00:45.360
<v Speaker 2>you would type the queries in the YAML files and

840
01:00:46.119 --> 01:00:53.000
<v Speaker 2>then they get run automatically, periodically, and if

841
01:00:53.800 --> 01:00:57.400
<v Speaker 2>they come back with data that matches the query, then

842
01:00:57.400 --> 01:01:00.599
<v Speaker 2>an alert gets raised. So you run a query. If

843
01:01:00.639 --> 01:01:04.079
<v Speaker 2>it comes back no data, no alert. If it comes

844
01:01:04.199 --> 01:01:08.960
<v Speaker 2>back with data, then there's an alert. So an

845
01:01:08.960 --> 01:01:13.719
<v Speaker 2>alert query might be the number of CPUs

846
01:01:14.239 --> 01:01:19.559
<v Speaker 2>that are at utilization higher than eighty percent? Right? And

847
01:01:19.679 --> 01:01:21.920
<v Speaker 2>if the number is zero, then there's no alert. If

848
01:01:21.920 --> 01:01:23.960
<v Speaker 2>the number is greater than zero, then there would be

849
01:01:23.960 --> 01:01:28.920
<v Speaker 2>an alert. Makes sense? Something like that. And it's actually

850
01:01:28.960 --> 01:01:32.360
<v Speaker 2>even more sophisticated than that because you can say you

851
01:01:32.480 --> 01:01:36.000
<v Speaker 2>want to avoid like peaks, momentary peaks, so you might

852
01:01:36.079 --> 01:01:39.760
<v Speaker 2>say it needs to be higher than eighty percent over

853
01:01:39.880 --> 01:01:41.840
<v Speaker 2>a period of two minutes.
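
NOTE
A sketch of the "must hold for two minutes" idea, in plain JavaScript. In
Prometheus rule files this is expressed with the expr and for fields of an
alerting rule; the code below only imitates that behaviour and is not how
Prometheus implements it. The names makeAlert and cpuAlert and the sample
values are hypothetical.
```javascript
// Returns an evaluator that fires only once the value has stayed above
// the threshold continuously for at least holdMs milliseconds.
function makeAlert(threshold, holdMs) {
  let breachedSince = null;
  return function evaluate(value, nowMs) {
    if (value <= threshold) {
      breachedSince = null; // condition cleared, so momentary peaks are forgotten
      return false;
    }
    if (breachedSince === null) breachedSince = nowMs;
    return nowMs - breachedSince >= holdMs;
  };
}
// CPU utilization above 80 percent for two minutes.
const cpuAlert = makeAlert(80, 2 * 60 * 1000);
console.log(cpuAlert(95, 0)); // false: the breach has only just started
console.log(cpuAlert(96, 60000)); // false: one minute is not long enough
console.log(cpuAlert(97, 120000)); // true: above the threshold for two minutes
```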

854
01:01:42.679 --> 01:01:45.639
<v Speaker 1>Right, stuff like that makes sense.

855
01:01:46.119 --> 01:01:50.760
<v Speaker 2>Yeah, so it's it's a very sophisticated and powerful system.

856
01:01:52.559 --> 01:01:55.119
<v Speaker 1>So, it sounds like something you used at

857
01:01:55.199 --> 01:01:59.639
<v Speaker 1>Next Insurance. Is this something that you're using?

858
01:01:59.559 --> 01:02:03.639
<v Speaker 2>Well, yeah, we're also using it at Sisense. So for example,

859
01:02:03.800 --> 01:02:09.000
<v Speaker 2>when we deploy our service, our customers have the ability

860
01:02:09.519 --> 01:02:16.119
<v Speaker 2>to run our services on premises and then they can

861
01:02:16.280 --> 01:02:21.719
<v Speaker 2>collect monitoring data for our own services inside an instance

862
01:02:21.760 --> 01:02:23.920
<v Speaker 2>of Prometheus that we install with it.

863
01:02:25.639 --> 01:02:26.159
<v Speaker 1>That's cool.

864
01:02:30.000 --> 01:02:33.719
<v Speaker 2>But like I said, it's a really powerful system and

865
01:02:34.440 --> 01:02:39.079
<v Speaker 2>it's really flexible. And again, if you want to

866
01:02:39.159 --> 01:02:45.159
<v Speaker 2>do monitoring or alerting, you probably want to do both, then

867
01:02:45.280 --> 01:02:47.360
<v Speaker 2>it's a very good solution for that.

868
01:02:49.519 --> 01:02:51.599
<v Speaker 1>Nice and that more or less covers it.

869
01:02:52.639 --> 01:02:56.119
<v Speaker 2>Like I said, I have a talk presentation on it

870
01:02:56.280 --> 01:02:59.880
<v Speaker 2>in which, you know, I am able to show

871
01:03:00.079 --> 01:03:05.719
<v Speaker 2>actual examples and visuals, so it's, you know,

872
01:03:06.599 --> 01:03:10.760
<v Speaker 2>more informative. If you're interested in that, if

873
01:03:10.840 --> 01:03:13.559
<v Speaker 2>you're running a conference and you're interested in that, I'm

874
01:03:14.440 --> 01:03:15.760
<v Speaker 2>shopping this talk out.

875
01:03:17.840 --> 01:03:21.760
<v Speaker 1>Nice. Well, I mean this is something that I've kind

876
01:03:21.800 --> 01:03:26.039
<v Speaker 1>of been contemplating figuring out how to put into my

877
01:03:26.199 --> 01:03:29.159
<v Speaker 1>own systems, right? I mean, I send data into

878
01:03:29.360 --> 01:03:31.760
<v Speaker 1>I have a self-hosted version of Sentry, is the

879
01:03:31.840 --> 01:03:35.760
<v Speaker 1>thing I've been using lately, and it collects some of

880
01:03:35.840 --> 01:03:39.960
<v Speaker 1>this kind of data. But you know, I've gotten more

881
01:03:40.000 --> 01:03:42.800
<v Speaker 1>and more into self hosting my own kind of thing

882
01:03:43.360 --> 01:03:46.960
<v Speaker 1>and and running through some of this, right, and it

883
01:03:47.039 --> 01:03:49.440
<v Speaker 1>sounds like this is kind of right up

884
01:03:49.480 --> 01:03:53.559
<v Speaker 1>that alley too, where it's okay when I deploy right,

885
01:03:53.719 --> 01:03:56.840
<v Speaker 1>it just you know, it makes sure that I have

886
01:03:57.000 --> 01:04:00.760
<v Speaker 1>a Grafana and a Prometheus instance setup that it just

887
01:04:01.000 --> 01:04:02.960
<v Speaker 1>reports to exactly.

888
01:04:03.159 --> 01:04:06.360
<v Speaker 2>Now, obviously there's overlap between these sorts of systems,

889
01:04:06.599 --> 01:04:09.480
<v Speaker 2>but I think their focus is kind of like not

890
01:04:10.239 --> 01:04:15.320
<v Speaker 2>exactly the same. So let's say, if you're doing like

891
01:04:15.599 --> 01:04:19.519
<v Speaker 2>error logging, then, you know, Sentry is the tool for you.

892
01:04:19.920 --> 01:04:23.280
<v Speaker 2>I think I said that Prometheus is not the appropriate

893
01:04:23.360 --> 01:04:27.360
<v Speaker 2>tool for textual information. So if you're looking at stuff

894
01:04:27.559 --> 01:04:31.480
<v Speaker 2>like keeping stuff like stack traces and stuff like that,

895
01:04:31.800 --> 01:04:34.559
<v Speaker 2>then you want to use something like Sentry. You know,

896
01:04:34.760 --> 01:04:40.519
<v Speaker 2>you could theoretically count the occurrences of a particular type

897
01:04:40.559 --> 01:04:44.199
<v Speaker 2>of error, but yeah, you probably want to use Sentry

898
01:04:44.280 --> 01:04:48.159
<v Speaker 2>for something like that. Also, Sentry has stuff that's

899
01:04:48.559 --> 01:04:54.159
<v Speaker 2>specifically intended for performance monitoring in certain scenarios. But if

900
01:04:54.239 --> 01:04:57.559
<v Speaker 2>you want to do more general purpose monitoring and system

901
01:04:57.679 --> 01:05:04.440
<v Speaker 2>level monitoring and and monitoring applicative stuff, then Prometheus could

902
01:05:04.480 --> 01:05:05.480
<v Speaker 2>be the solution for you.

903
01:05:07.159 --> 01:05:10.199
<v Speaker 1>Right. I also like that, you know, as it aggregates

904
01:05:10.280 --> 01:05:13.599
<v Speaker 1>that data effectively, what you can learn from it is

905
01:05:13.679 --> 01:05:17.559
<v Speaker 1>only limited by I guess what it's measuring, but also

906
01:05:18.559 --> 01:05:20.360
<v Speaker 1>what you can come up with to query out of it.

907
01:05:21.039 --> 01:05:26.159
<v Speaker 2>Oh yeah, for sure. I created some amazing queries using

908
01:05:26.280 --> 01:05:31.320
<v Speaker 2>PromQL. So I'll give an example. So at

909
01:05:31.400 --> 01:05:37.199
<v Speaker 2>Next Insurance, we had something like fifty services in

910
01:05:37.280 --> 01:05:42.519
<v Speaker 2>a service-oriented architecture which were communicating with each other

911
01:05:43.199 --> 01:05:48.039
<v Speaker 2>over various API endpoints, and we wanted alerts to

912
01:05:48.159 --> 01:05:52.920
<v Speaker 2>be fired when, suddenly, any endpoint in the system

913
01:05:53.960 --> 01:05:58.440
<v Speaker 2>became too slow. But then the question was what does

914
01:05:58.559 --> 01:06:03.199
<v Speaker 2>too slow mean? Because you know, it can't be absolute numbers,

915
01:06:03.320 --> 01:06:09.239
<v Speaker 2>because if something usually takes fifty milliseconds, and then

916
01:06:10.239 --> 01:06:13.039
<v Speaker 2>all of a sudden it grows to one hundred milliseconds,

917
01:06:13.440 --> 01:06:16.320
<v Speaker 2>that means it's too slow. But if something always runs

918
01:06:16.360 --> 01:06:19.679
<v Speaker 2>at one hundred milliseconds and then becomes one hundred and ten,

919
01:06:20.239 --> 01:06:22.840
<v Speaker 2>well, that's a smaller change, and you might want to

920
01:06:22.960 --> 01:06:25.960
<v Speaker 2>ignore it. So you can't look at absolute numbers. So

921
01:06:26.320 --> 01:06:31.920
<v Speaker 2>I created really sophisticated queries that basically looked at

922
01:06:32.559 --> 01:06:37.440
<v Speaker 2>behavior over time and tried to see if something became

923
01:06:38.039 --> 01:06:45.760
<v Speaker 2>significantly slower compared to its own specific behavior from you know,

924
01:06:46.239 --> 01:06:52.840
<v Speaker 2>previous durations. And also, again, we wanted to only look at

925
01:06:52.920 --> 01:06:56.719
<v Speaker 2>those endpoints which were sufficiently used, because if there was

926
01:06:56.760 --> 01:07:00.679
<v Speaker 2>some endpoint that was hardly ever used, then we probably

927
01:07:00.719 --> 01:07:04.920
<v Speaker 2>don't care about it, right? So yes, you can

928
01:07:05.039 --> 01:07:09.760
<v Speaker 2>create really sophisticated queries like that. And it was exactly

929
01:07:09.920 --> 01:07:13.760
<v Speaker 2>because with all these endpoints, you don't want to

930
01:07:13.920 --> 01:07:17.400
<v Speaker 2>have to tell a developer, well, you know,

931
01:07:18.079 --> 01:07:20.639
<v Speaker 2>you're in charge of a service that has one hundred

932
01:07:20.719 --> 01:07:23.880
<v Speaker 2>different API endpoints. You need to look every day

933
01:07:23.920 --> 01:07:26.800
<v Speaker 2>at one hundred different graphs to figure out if there's

934
01:07:26.840 --> 01:07:31.360
<v Speaker 2>a problem. You want

935
01:07:31.400 --> 01:07:35.039
<v Speaker 2>an alert to be sent right to the

936
01:07:35.159 --> 01:07:39.960
<v Speaker 2>Slack channel. Let's say, of that particular team, if any

937
01:07:40.039 --> 01:07:43.159
<v Speaker 2>one of those endpoints all of a sudden became

938
01:07:43.559 --> 01:07:47.519
<v Speaker 2>slower in a way that potentially impacts the

939
01:07:47.599 --> 01:07:52.239
<v Speaker 2>system as a whole, right? And yeah,

940
01:07:52.480 --> 01:07:55.480
<v Speaker 2>that's one of the projects that I did at Next Insurance,

941
01:07:56.480 --> 01:07:59.440
<v Speaker 2>and we used Prometheus and

942
01:07:59.559 --> 01:08:00.119
<v Speaker 2>Grafana for that.
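
NOTE
A sketch of the relative-slowdown idea described above, in plain JavaScript.
It is not the PromQL that was actually used, which is not shown in the
conversation. The function name, the 1.5x ratio, and the minimum traffic of 10
requests per minute are all assumptions chosen for illustration.
```javascript
// Compare an endpoint against its own history instead of a fixed number,
// and ignore endpoints that are hardly ever used.
function isSignificantlySlower(
  { currentMs, baselineMs, requestsPerMin },
  { maxRatio = 1.5, minRequestsPerMin = 10 } = {}
) {
  if (requestsPerMin < minRequestsPerMin) return false;
  return currentMs / baselineMs > maxRatio;
}
// 50 ms growing to 100 ms doubles the latency, so it is flagged.
console.log(isSignificantlySlower({ currentMs: 100, baselineMs: 50, requestsPerMin: 120 }));
// 100 ms growing to 110 ms is a ten percent change, so it is ignored.
console.log(isSignificantlySlower({ currentMs: 110, baselineMs: 100, requestsPerMin: 120 }));
// A rarely used endpoint is ignored even if it got much slower.
console.log(isSignificantlySlower({ currentMs: 500, baselineMs: 50, requestsPerMin: 1 }));
```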

943
01:08:01.800 --> 01:08:02.239
<v Speaker 1>Very cool.

944
01:08:03.119 --> 01:08:06.639
<v Speaker 2>So when an alert like that was sent to the

945
01:08:06.719 --> 01:08:11.519
<v Speaker 2>Slack channel, the on call member of the team could

946
01:08:11.599 --> 01:08:15.679
<v Speaker 2>actually click a link, see the graph for the

947
01:08:16.000 --> 01:08:20.680
<v Speaker 2>performance of that endpoint in Grafana over time, and say, okay,

948
01:08:20.880 --> 01:08:23.640
<v Speaker 2>we actually have a problem, or no, this is just

949
01:08:24.399 --> 01:08:27.000
<v Speaker 2>you know, a fluke or something that we can ignore.

950
01:08:27.199 --> 01:08:31.000
<v Speaker 1>Yeah, yeah, it spiked for such and such a thing

951
01:08:31.319 --> 01:08:32.479
<v Speaker 1>and no big deal.

952
01:08:33.239 --> 01:08:36.239
<v Speaker 2>Yeah we were doing something. We ran some sort of

953
01:08:36.279 --> 01:08:40.399
<v Speaker 2>a backup service and you know that's why it impacted everything.

954
01:08:40.119 --> 01:08:42.800
<v Speaker 1>And right, or you migrated some data on the back

955
01:08:42.960 --> 01:08:45.479
<v Speaker 1>end and it slowed it down for two minutes.

956
01:08:45.159 --> 01:08:52.880
<v Speaker 4>Exactly. Yeah, awesome. All right, well, we put

957
01:08:52.960 --> 01:08:58.319
<v Speaker 4>some links into the comments on Twitch and YouTube

958
01:08:58.439 --> 01:09:00.359
<v Speaker 4>and Facebook if you want to go find those.

959
01:09:00.439 --> 01:09:02.279
<v Speaker 1>We'll try and get them into the show notes as well.

960
01:09:03.239 --> 01:09:04.720
<v Speaker 1>But let's go ahead and do some picks.

961
01:09:05.760 --> 01:09:10.680
<v Speaker 2>For sure, although I don't have that many. Okay, I

962
01:09:10.880 --> 01:09:15.439
<v Speaker 2>actually do have something. So for some inexplicable reason, I

963
01:09:15.600 --> 01:09:23.119
<v Speaker 2>decided to start tweeting out my favorite standalone fantasy novels

964
01:09:23.439 --> 01:09:29.000
<v Speaker 2>or books. You know, fantasy tends to be written as

965
01:09:29.039 --> 01:09:33.039
<v Speaker 2>a lengthy series of books or at least trilogies, but

966
01:09:33.640 --> 01:09:36.439
<v Speaker 2>occasionally I just want to read one book that stands

967
01:09:36.880 --> 01:09:39.640
<v Speaker 2>on its own merit, because, you know,

968
01:09:39.760 --> 01:09:41.760
<v Speaker 2>it starts and it ends and you can move on

969
01:09:41.880 --> 01:09:45.399
<v Speaker 2>to the next thing. So I started to

970
01:09:46.079 --> 01:09:50.840
<v Speaker 2>tweet out my list of the top standalone

971
01:09:51.319 --> 01:09:56.439
<v Speaker 2>fantasy books, and you know what, I'll

972
01:09:56.560 --> 01:09:59.840
<v Speaker 2>pick one each time. So let's see which

973
01:09:59.880 --> 01:10:04.000
<v Speaker 2>one was my first. Can you think of one?

974
01:10:04.359 --> 01:10:07.760
<v Speaker 2>There's the obvious choice, by the way, which is I

975
01:10:07.840 --> 01:10:10.199
<v Speaker 2>think, The Hobbit.

976
01:10:10.359 --> 01:10:12.560
<v Speaker 1>I was thinking that, but then I was like, I don't

977
01:10:12.560 --> 01:10:14.640
<v Speaker 1>know if he considers that part of a series, because

978
01:10:14.640 --> 01:10:18.159
<v Speaker 1>it is sort of a prequel to Fellowship of the

979
01:10:18.239 --> 01:10:19.680
<v Speaker 1>Ring and that series.

980
01:10:19.800 --> 01:10:22.800
<v Speaker 2>Yeah, yeah, I think it's.

981
01:10:24.159 --> 01:10:24.359
<v Speaker 1>Yeah.

982
01:10:24.720 --> 01:10:27.359
<v Speaker 2>Yeah. First of all, it's totally self-contained, and the

983
01:10:27.600 --> 01:10:30.920
<v Speaker 2>second important thing is that it was written well before

984
01:10:30.920 --> 01:10:34.800
<v Speaker 2>Lord of the Rings was written. Yes. So Tolkien

985
01:10:34.880 --> 01:10:38.760
<v Speaker 2>actually wrote Lord of the Rings because the

986
01:10:38.880 --> 01:10:42.039
<v Speaker 2>publisher was so happy with the success of The Hobbit

987
01:10:42.119 --> 01:10:46.560
<v Speaker 2>that they wanted more stories in that world, and it

988
01:10:46.920 --> 01:10:51.159
<v Speaker 2>was his idea to transform it from a child's story

989
01:10:51.560 --> 01:10:55.439
<v Speaker 2>to a story for grown-ups. But The Hobbit

990
01:10:55.600 --> 01:10:58.960
<v Speaker 2>was originally written for his children, although

991
01:10:58.960 --> 01:11:02.000
<v Speaker 2>a lot of adults like that story as well. Oh yeah,

992
01:11:02.199 --> 01:11:04.600
<v Speaker 2>you know what, I'll tell you another interesting story about

993
01:11:05.239 --> 01:11:09.720
<v Speaker 2>about The Hobbit. So, there are a

994
01:11:09.720 --> 01:11:16.319
<v Speaker 2>couple of translations of The Hobbit into Hebrew. But

995
01:11:16.520 --> 01:11:22.119
<v Speaker 2>one of the translations is especially interesting. It's called the

996
01:11:22.279 --> 01:11:31.319
<v Speaker 2>Pilots' Translation, because there was a war

997
01:11:31.479 --> 01:11:36.159
<v Speaker 2>between Israel and Egypt in the early seventies,

998
01:11:36.279 --> 01:11:40.039
<v Speaker 2>in between the Six-Day War and the Yom

999
01:11:40.159 --> 01:11:42.479
<v Speaker 2>Kippur War. It's called the War

1000
01:11:42.520 --> 01:11:47.920
<v Speaker 2>of Attrition, and various pilots were downed and captured by

1001
01:11:47.960 --> 01:11:52.600
<v Speaker 2>the Egyptians and held as POWs for several years, kind of

1002
01:11:52.800 --> 01:11:57.880
<v Speaker 2>like the American POWs in Vietnam, and they were looking

1003
01:11:58.199 --> 01:12:01.079
<v Speaker 2>for ways to pass the time, and one of them

1004
01:12:01.479 --> 01:12:05.039
<v Speaker 2>got their hands on the English version of The Hobbit,

1005
01:12:05.119 --> 01:12:08.279
<v Speaker 2>the original Hobbit, and they decided to translate it

1006
01:12:08.359 --> 01:12:11.920
<v Speaker 2>into Hebrew as a way to pass the time. And

1007
01:12:12.319 --> 01:12:15.680
<v Speaker 2>when they were released, they took the

1008
01:12:15.760 --> 01:12:19.640
<v Speaker 2>translation with them and it literally got published, and it's

1009
01:12:19.720 --> 01:12:25.560
<v Speaker 2>called the Pilots' Translation of The Hobbit from

1010
01:12:25.640 --> 01:12:29.359
<v Speaker 2>English into Hebrew. So anyway, The Hobbit would be one.

1011
01:12:29.920 --> 01:12:34.119
<v Speaker 2>But because that's the obvious one, I'll give another one,

1012
01:12:34.880 --> 01:12:40.119
<v Speaker 2>and that book is called Tigana. It's by Guy

1013
01:12:40.640 --> 01:12:45.439
<v Speaker 2>Gavriel Kay. Interestingly, to kind of close the loop, he's

1014
01:12:45.600 --> 01:12:50.720
<v Speaker 2>the person that worked with Tolkien's son on

1015
01:12:51.600 --> 01:12:55.640
<v Speaker 2>publishing The Silmarillion. So

1016
01:12:55.800 --> 01:12:58.600
<v Speaker 2>they had a lot of notes left over. Yeah, they had

1017
01:12:58.640 --> 01:13:02.359
<v Speaker 2>a lot of notes left over from Tolkien. So they actually

1018
01:13:02.439 --> 01:13:05.039
<v Speaker 2>took all those notes and kind of rounded out

1019
01:13:05.119 --> 01:13:07.560
<v Speaker 2>and filled out the story and then released it as

1020
01:13:07.600 --> 01:13:11.880
<v Speaker 2>The Silmarillion after Tolkien had died, and he worked

1021
01:13:12.000 --> 01:13:14.680
<v Speaker 2>on it with Tolkien's son. But he also wrote

1022
01:13:14.720 --> 01:13:17.560
<v Speaker 2>several books on his own. One of them is a

1023
01:13:17.600 --> 01:13:22.720
<v Speaker 2>standalone novel called Tigana. You might actually find it especially

1024
01:13:22.800 --> 01:13:29.159
<v Speaker 2>interesting because it's very much inspired by Renaissance Italy,

1025
01:13:30.760 --> 01:13:33.560
<v Speaker 2>so the setting there, it's a fictional world, obviously, with

1026
01:13:33.760 --> 01:13:37.199
<v Speaker 2>magic and, you know, stuff like that, but

1027
01:13:37.399 --> 01:13:42.359
<v Speaker 2>the situation, the scenario, is very reminiscent of Italy

1028
01:13:42.680 --> 01:13:48.520
<v Speaker 2>in the Renaissance, with a little country of warring principalities

1029
01:13:48.840 --> 01:13:52.720
<v Speaker 2>influenced by larger external powers and so

1030
01:13:52.840 --> 01:13:56.039
<v Speaker 2>on and so forth, and the culture and whatnot,

1031
01:13:56.560 --> 01:14:00.479
<v Speaker 2>and it's a great book. It's amazing, the

1032
01:14:00.800 --> 01:14:06.319
<v Speaker 2>amount of story and settings and scenery and characters that

1033
01:14:06.439 --> 01:14:10.199
<v Speaker 2>he was able to cram into this one book. It's

1034
01:14:10.239 --> 01:14:13.359
<v Speaker 2>a pretty thick book, but still one book. And it's

1035
01:14:13.880 --> 01:14:17.119
<v Speaker 2>very highly recommended. As I said, it's called Tigana. That's

1036
01:14:17.239 --> 01:14:22.359
<v Speaker 2>T-I-G-A-N-A, by Guy Gavriel Kay.

1037
01:14:23.600 --> 01:14:26.800
<v Speaker 2>And that would be my pick for today.

1038
01:14:28.560 --> 01:14:33.800
<v Speaker 1>Awesome. I'll have to check that one out. We were

1039
01:14:34.199 --> 01:14:37.119
<v Speaker 1>watching the Lord of the Rings movies and my daughter

1040
01:14:37.560 --> 01:14:39.239
<v Speaker 1>was saying that she'd never read them, and so I

1041
01:14:41.279 --> 01:14:44.359
<v Speaker 1>had it on Audible, and so she's been listening to it.

1042
01:14:44.560 --> 01:14:48.479
<v Speaker 2>But if she's already watched the movie, can she

1043
01:14:49.640 --> 01:14:50.560
<v Speaker 2>get into it?

1044
01:14:52.239 --> 01:14:58.000
<v Speaker 1>I think so. She's pretty into others, like Harry Potter

1045
01:14:58.279 --> 01:15:02.640
<v Speaker 1>and Percy Jackson, and she got into the books too

1046
01:15:02.720 --> 01:15:07.840
<v Speaker 1>after she'd seen the movies. So yeah, I'm gonna jump

1047
01:15:07.920 --> 01:15:11.199
<v Speaker 1>in with some picks. Now. I have to say my

1048
01:15:11.279 --> 01:15:13.399
<v Speaker 1>board game group hasn't gotten together in a little bit.

1049
01:15:16.399 --> 01:15:18.600
<v Speaker 1>It's just kind of that season of life.

1050
01:15:21.000 --> 01:15:24.560
<v Speaker 1>But, uh, I'm trying to think of a game

1051
01:15:24.600 --> 01:15:30.600
<v Speaker 1>I should pick. What was the game that we played

1052
01:15:31.760 --> 01:15:37.039
<v Speaker 1>last Sunday with the kids? I don't remember. Anyway,

1053
01:15:40.479 --> 01:15:42.159
<v Speaker 1>I mean, there are all kinds of games out

1054
01:15:42.159 --> 01:15:45.159
<v Speaker 1>there that you can play. I'll just go with one

1055
01:15:45.199 --> 01:15:50.359
<v Speaker 1>of my favorites. It's funny because I've never actually completely

1056
01:15:50.399 --> 01:15:53.840
<v Speaker 1>played through this game. No, I take it back, I

1057
01:15:53.920 --> 01:15:57.199
<v Speaker 1>have played through it once. It's called.

1058
01:15:59.399 --> 01:16:05.760
<v Speaker 2>Monopoly. Does anybody ever play Monopoly all the way through?

1059
01:16:11.279 --> 01:16:15.239
<v Speaker 1>My kids do, all the way through.

1060
01:16:16.439 --> 01:16:16.640
<v Speaker 2>Yeah.

1061
01:16:19.800 --> 01:16:22.359
<v Speaker 1>I haven't played Monopoly in a long time. There

1062
01:16:22.439 --> 01:16:23.560
<v Speaker 1>are reasons that I don't.

1063
01:16:24.800 --> 01:16:29.359
<v Speaker 2>I played it for nostalgia, but notably, it's not

1064
01:16:29.520 --> 01:16:31.720
<v Speaker 2>such a fun game. Yeah.

1065
01:16:31.960 --> 01:16:34.920
<v Speaker 1>The game that I'm thinking... yeah, well, there are

1066
01:16:34.960 --> 01:16:39.399
<v Speaker 1>a number of issues that make it... anyway.

1067
01:16:40.079 --> 01:16:44.199
<v Speaker 2>I seem to recall that somebody once

1068
01:16:44.279 --> 01:16:47.680
<v Speaker 2>told me that the original motivation for the creation of

1069
01:16:47.720 --> 01:16:50.199
<v Speaker 2>the game Monopoly was to show, to prove, the

1070
01:16:50.319 --> 01:16:51.720
<v Speaker 2>futility of capitalism.

1071
01:16:53.720 --> 01:16:57.159
<v Speaker 1>Yeah, well it's not. It's not pure capitalism.

1072
01:16:57.319 --> 01:16:59.760
<v Speaker 2>So yeah.

1073
01:17:00.119 --> 01:17:04.000
<v Speaker 1>Anyway, so the game that I'm gonna pick is called

1074
01:17:04.079 --> 01:17:09.560
<v Speaker 1>Letters from Whitechapel, and I understand it's like Scotland Yard,

1075
01:17:09.600 --> 01:17:13.439
<v Speaker 1>but I've never played Scotland Yard. So one player plays

1076
01:17:13.479 --> 01:17:17.640
<v Speaker 1>as Jack the Ripper, and then the other players play

1077
01:17:17.760 --> 01:17:21.000
<v Speaker 1>as the police deputies. So they're trying to catch Jack.

1078
01:17:23.159 --> 01:17:24.800
<v Speaker 2>Is it kind of like Clue or something?

1079
01:17:26.000 --> 01:17:31.279
<v Speaker 1>No, in the sense that you're not trying

1080
01:17:31.319 --> 01:17:33.960
<v Speaker 1>to figure out who did it or anything like that.

1081
01:17:34.239 --> 01:17:37.159
<v Speaker 1>So the way it works is

1082
01:17:38.720 --> 01:17:42.960
<v Speaker 1>you have the women that Jack the Ripper kills, and

1083
01:17:44.960 --> 01:17:49.199
<v Speaker 1>so when he murders somebody, I just dropped the link,

1084
01:17:49.239 --> 01:17:51.840
<v Speaker 1>but I didn't label it. When he murders somebody in

1085
01:17:51.920 --> 01:17:55.960
<v Speaker 1>the game, he has so many moves that he can

1086
01:17:56.039 --> 01:17:58.640
<v Speaker 1>make to get back to his hideout. And you play

1087
01:17:58.640 --> 01:18:01.439
<v Speaker 1>it in five rounds, and so the police deputies are

1088
01:18:01.479 --> 01:18:04.520
<v Speaker 1>trying to block him off, and so they'll go investigate

1089
01:18:04.560 --> 01:18:06.800
<v Speaker 1>different spots on the board, and there are hundreds of

1090
01:18:06.920 --> 01:18:09.960
<v Speaker 1>spots on the board, and so they'll investigate a spot

1091
01:18:09.960 --> 01:18:12.039
<v Speaker 1>and they'll either find a clue, which means that Jack

1092
01:18:12.199 --> 01:18:16.760
<v Speaker 1>was there, and Jack writes down when he moves through

1093
01:18:16.800 --> 01:18:19.680
<v Speaker 1>a space, he writes it down on his thing. And

1094
01:18:19.840 --> 01:18:21.479
<v Speaker 1>one of my favorite things to do when I'm Jack

1095
01:18:21.560 --> 01:18:25.840
<v Speaker 1>the Ripper is actually to loop back around and so

1096
01:18:25.960 --> 01:18:30.439
<v Speaker 1>they'll find clues. They'll find clues all around this spot

1097
01:18:30.479 --> 01:18:33.039
<v Speaker 1>where I went through twice, and so then they don't

1098
01:18:33.079 --> 01:18:36.600
<v Speaker 1>know which direction I went coming out of there. But anyway,

1099
01:18:37.520 --> 01:18:39.520
<v Speaker 1>if you manage to get back to your hideout

1100
01:18:39.760 --> 01:18:44.760
<v Speaker 1>after all five rounds, then you win as Jack the Ripper.

1101
01:18:45.159 --> 01:18:49.239
<v Speaker 1>And then if the deputies investigate a spot and

1102
01:18:49.359 --> 01:18:54.640
<v Speaker 1>Jack's there, and they have to specifically say that they're

1103
01:18:54.680 --> 01:18:58.239
<v Speaker 1>staging an arrest at that spot, if they do

1104
01:18:58.439 --> 01:19:02.800
<v Speaker 1>that properly where Jack is at, then they win.

1105
01:19:03.359 --> 01:19:06.359
<v Speaker 2>And I have to tell you that the

1106
01:19:06.439 --> 01:19:09.279
<v Speaker 2>subject matter seems a bit dark for a game, I

1107
01:19:09.399 --> 01:19:11.199
<v Speaker 2>have to tell you.

1108
01:19:12.720 --> 01:19:16.239
<v Speaker 1>It is, but the overall gameplay is fun. And so what

1109
01:19:16.359 --> 01:19:18.640
<v Speaker 1>happens is you start to narrow down where the hideout

1110
01:19:18.800 --> 01:19:22.399
<v Speaker 1>is, right? And so you can then start

1111
01:19:22.479 --> 01:19:28.920
<v Speaker 1>to, uh, know where he's going, and so wherever

1112
01:19:29.039 --> 01:19:31.359
<v Speaker 1>the murder happens, right, then you can start to,

1113
01:19:32.920 --> 01:19:35.159
<v Speaker 1>you know, kind of fan out along where he might

1114
01:19:35.239 --> 01:19:39.199
<v Speaker 1>travel through and get a sense of where he's

1115
01:19:39.239 --> 01:19:41.560
<v Speaker 1>at and then be able to arrest him. And so

1116
01:19:41.680 --> 01:19:46.359
<v Speaker 1>that's kind of the gameplay. Let me look it

1117
01:19:46.479 --> 01:19:48.079
<v Speaker 1>up on BoardGameGeek, so I get the weight.

1118
01:19:52.600 --> 01:19:54.520
<v Speaker 1>The reason that I've only finished it once is I tried to

1119
01:19:54.560 --> 01:19:58.560
<v Speaker 1>play it with my family, and it's kind of

1120
01:19:58.640 --> 01:20:00.680
<v Speaker 1>a longer game. I mean, it can run like two

1121
01:20:00.760 --> 01:20:04.960
<v Speaker 1>hours or longer, and so they just kind of especially

1122
01:20:05.079 --> 01:20:07.600
<v Speaker 1>my kids, just lose interest. And it's not the kind

1123
01:20:07.640 --> 01:20:10.560
<v Speaker 1>of... My wife likes to play the kind of light,

1124
01:20:10.800 --> 01:20:14.039
<v Speaker 1>airy games, and this one's a heavier game, and so,

1125
01:20:14.479 --> 01:20:18.359
<v Speaker 1>you know, all this strategizing and stuff, she just doesn't love.

1126
01:20:18.960 --> 01:20:21.800
<v Speaker 1>It has a board game weight of two point sixty four,

1127
01:20:22.000 --> 01:20:24.640
<v Speaker 1>so it is on the heavier side of a game

1128
01:20:24.720 --> 01:20:28.800
<v Speaker 1>that a casual player would play. But anyway, it's it's

1129
01:20:28.880 --> 01:20:32.600
<v Speaker 1>a fun one. So yeah, I'll pick Letters from Whitechapel

1130
01:20:33.119 --> 01:20:34.279
<v Speaker 1>as my board game pick.

1131
01:20:34.880 --> 01:20:35.399
<v Speaker 2>And then.

1132
01:20:36.880 --> 01:20:39.399
<v Speaker 1>I've been doing a whole bunch of stuff with just

1133
01:20:39.479 --> 01:20:43.000
<v Speaker 1>being more... I took way too long on that pick,

1134
01:20:44.239 --> 01:20:46.439
<v Speaker 1>so I'll be quite brief on these other ones. I've

1135
01:20:46.479 --> 01:20:48.239
<v Speaker 1>been trying to be a little bit more efficient with

1136
01:20:48.359 --> 01:20:53.800
<v Speaker 1>my time. And I've also been working on getting back

1137
01:20:53.840 --> 01:20:56.560
<v Speaker 1>into shape. So you know, I picked a marathon and

1138
01:20:56.800 --> 01:21:00.800
<v Speaker 1>started training for it. So a few picks that I've

1139
01:21:00.880 --> 01:21:02.439
<v Speaker 1>got here that I'm just going to put out there.

1140
01:21:02.520 --> 01:21:07.720
<v Speaker 1>So my training program I do in TrainingPeaks, and

1141
01:21:07.840 --> 01:21:11.319
<v Speaker 1>that's just trainingpeaks.com. I'm gonna put it

1142
01:21:11.760 --> 01:21:15.800
<v Speaker 1>in the comments as well. And effectively, it just gives

1143
01:21:15.800 --> 01:21:19.279
<v Speaker 1>you a calendar. You can buy workout plans. The ones

1144
01:21:19.319 --> 01:21:21.479
<v Speaker 1>I've bought cost anywhere from five dollars all the way

1145
01:21:21.560 --> 01:21:24.960
<v Speaker 1>up to like fifty dollars. I think the marathon and

1146
01:21:25.039 --> 01:21:28.760
<v Speaker 1>triathlon ones were more expensive. But it literally just puts

1147
01:21:28.760 --> 01:21:31.359
<v Speaker 1>the workouts into your thing. And then I have a

1148
01:21:31.479 --> 01:21:34.079
<v Speaker 1>Garmin Forerunner watch, and so it just syncs it

1149
01:21:34.159 --> 01:21:36.119
<v Speaker 1>down to my watch. So all I have to do

1150
01:21:36.279 --> 01:21:40.159
<v Speaker 1>is tell it go into my training calendar, pull the workout,

1151
01:21:40.439 --> 01:21:42.239
<v Speaker 1>run it through the watch. Right, So I did that

1152
01:21:42.399 --> 01:21:43.800
<v Speaker 1>this morning, went for a run.

1153
01:21:44.039 --> 01:21:44.680
<v Speaker 2>It was awesome.

1154
01:21:45.880 --> 01:21:49.920
<v Speaker 1>As far as being more efficient, I've also picked up

1155
01:21:50.000 --> 01:21:55.039
<v Speaker 1>and been using Linear, which seems to be pretty popular

1156
01:21:55.279 --> 01:22:03.039
<v Speaker 1>in the development circles: linear.app. I'll put

1157
01:22:03.079 --> 01:22:09.760
<v Speaker 1>that in the comments as well, And I mean it's

1158
01:22:09.840 --> 01:22:12.840
<v Speaker 1>basically a project management board like any of the other

1159
01:22:12.880 --> 01:22:16.239
<v Speaker 1>ones that you're used to. And then what I've been

1160
01:22:16.279 --> 01:22:19.039
<v Speaker 1>doing is, anything that I intend to do

1161
01:22:19.159 --> 01:22:23.159
<v Speaker 1>anytime soon out of Linear, or if it's other stuff

1162
01:22:23.199 --> 01:22:25.680
<v Speaker 1>that I need to be doing. Like for example, when

1163
01:22:25.840 --> 01:22:28.439
<v Speaker 1>Dan and I were talking before the show, he was like, Hey,

1164
01:22:28.560 --> 01:22:30.600
<v Speaker 1>we ought to get people like this on the show,

1165
01:22:30.760 --> 01:22:32.159
<v Speaker 1>or Hey, I was going to reach out to so

1166
01:22:32.239 --> 01:22:34.079
<v Speaker 1>and so, you know, and I was like, oh, I

1167
01:22:34.199 --> 01:22:37.279
<v Speaker 1>know them, and so as we were having that conversation,

1168
01:22:37.359 --> 01:22:42.279
<v Speaker 1>I put them into another system called Motion and I

1169
01:22:43.119 --> 01:22:45.479
<v Speaker 1>kind of I'll look and see if they have an

1170
01:22:45.479 --> 01:22:47.600
<v Speaker 1>affiliate program, but I'm just going to put the link in.

1171
01:22:51.520 --> 01:22:55.039
<v Speaker 1>It's usemotion.com, is the app. And the way

1172
01:22:55.119 --> 01:22:59.279
<v Speaker 1>that this works is, so you put your tasks in...

1173
01:23:01.760 --> 01:23:06.279
<v Speaker 1>let me put it, usemotion.com. Sorry, I'm typing

1174
01:23:06.319 --> 01:23:08.319
<v Speaker 1>and talking at the same time. So you put your

1175
01:23:08.399 --> 01:23:11.479
<v Speaker 1>tasks in and then it pulls like it pulls from

1176
01:23:11.520 --> 01:23:13.800
<v Speaker 1>my Google calendar, and so it knows when I have

1177
01:23:13.920 --> 01:23:16.520
<v Speaker 1>something scheduled like this episode. And then what it does

1178
01:23:16.680 --> 01:23:19.079
<v Speaker 1>is it says, Okay, I'm going to put the other

1179
01:23:19.279 --> 01:23:25.640
<v Speaker 1>tasks that you've put into Motion into your calendar and

1180
01:23:25.880 --> 01:23:28.359
<v Speaker 1>so and you can set it up so that it'll

1181
01:23:28.439 --> 01:23:30.359
<v Speaker 1>actually add them to the Google calendar, which is what

1182
01:23:30.439 --> 01:23:32.079
<v Speaker 1>I did, and then I told it not to mark it

1183
01:23:32.199 --> 01:23:35.319
<v Speaker 1>as busy, so they show up as free time. So

1184
01:23:35.560 --> 01:23:38.960
<v Speaker 1>it's got tasks set up for Tuesday, Wednesday, Thursday, and

1185
01:23:39.000 --> 01:23:42.279
<v Speaker 1>Friday this week as well, but they show up as

1186
01:23:42.399 --> 01:23:44.399
<v Speaker 1>free time. And so what that means is that

1187
01:23:44.479 --> 01:23:47.319
<v Speaker 1>if somebody books a time on my calendar, it'll just

1188
01:23:47.439 --> 01:23:51.199
<v Speaker 1>shift everything around it, and so it essentially tells me

1189
01:23:51.279 --> 01:23:53.119
<v Speaker 1>what to work on. And then finally, the last thing

1190
01:23:53.199 --> 01:23:56.800
<v Speaker 1>that I've been using is Focus Blocks and I'm actually

1191
01:23:56.880 --> 01:23:59.079
<v Speaker 1>going to do another premium episode with Manny who's the

1192
01:23:59.079 --> 01:24:02.119
<v Speaker 1>guy that created it. But effectively, what it does is

1193
01:24:02.159 --> 01:24:05.319
<v Speaker 1>you get on a Zoom call

1194
01:24:05.920 --> 01:24:08.479
<v Speaker 1>and you do a breathing exercise before you start,

1195
01:24:08.800 --> 01:24:10.680
<v Speaker 1>and you commit to this is what I'm going to

1196
01:24:10.800 --> 01:24:13.279
<v Speaker 1>do this hour. And sometimes I hit it. Sometimes I don't,

1197
01:24:13.399 --> 01:24:16.159
<v Speaker 1>and it's things out of my control, or it turns

1198
01:24:16.159 --> 01:24:17.760
<v Speaker 1>out to be harder than I thought it was going

1199
01:24:17.840 --> 01:24:20.720
<v Speaker 1>to be or whatever. But it keeps me on task

1200
01:24:20.720 --> 01:24:22.159
<v Speaker 1>because what they make you do is they make you

1201
01:24:22.239 --> 01:24:26.640
<v Speaker 1>put your phone away from your desk and so you

1202
01:24:26.800 --> 01:24:28.359
<v Speaker 1>focus on it. And then at the end of the hour,

1203
01:24:28.439 --> 01:24:30.840
<v Speaker 1>you do another breathing exercise, you report on whether or

1204
01:24:30.840 --> 01:24:32.520
<v Speaker 1>not you got your thing done, and then you do

1205
01:24:32.640 --> 01:24:36.600
<v Speaker 1>it over again. And I usually get two, three, four

1206
01:24:37.239 --> 01:24:40.239
<v Speaker 1>focus blocks in in an afternoon. I try and schedule

1207
01:24:40.319 --> 01:24:44.840
<v Speaker 1>all the regular stuff in the mornings, but yeah, I'll

1208
01:24:44.880 --> 01:24:48.880
<v Speaker 1>do that in the afternoons and then Yeah, so a

1209
01:24:48.920 --> 01:24:51.239
<v Speaker 1>lot of my like prospecting. So if I'm trying to

1210
01:24:51.279 --> 01:24:54.680
<v Speaker 1>find sponsors or stuff, you know, that's on there first

1211
01:24:54.720 --> 01:24:58.319
<v Speaker 1>thing in the morning, getting through my inbox is a

1212
01:24:58.399 --> 01:25:01.399
<v Speaker 1>first thing in the morning thing. So after I go

1213
01:25:01.520 --> 01:25:02.880
<v Speaker 1>for a run and things like that, that's.

1214
01:25:02.760 --> 01:25:03.039
<v Speaker 2>What we do.

1215
01:25:03.439 --> 01:25:08.239
<v Speaker 1>So anyway, those are my picks. I think it's

1216
01:25:08.319 --> 01:25:13.760
<v Speaker 1>focusblocks.com, and I know I have an affiliate

1217
01:25:13.800 --> 01:25:15.680
<v Speaker 1>link for that, but so we'll put that in the

1218
01:25:15.760 --> 01:25:18.279
<v Speaker 1>show notes and it doesn't cost you anything extra, but

1219
01:25:18.319 --> 01:25:21.039
<v Speaker 1>I get a kickback if you use it. But ultimately,

1220
01:25:21.199 --> 01:25:23.239
<v Speaker 1>I mean, if these things save you a bunch of

1221
01:25:23.359 --> 01:25:26.479
<v Speaker 1>time and make you more effective, great, and then if

1222
01:25:26.520 --> 01:25:28.640
<v Speaker 1>I get a kickback because I found an affiliate link

1223
01:25:28.680 --> 01:25:31.520
<v Speaker 1>for it, even better. But ultimately, this is what I'm using.

1224
01:25:31.640 --> 01:25:37.000
<v Speaker 1>So yeah, that's what I've got for picks. So we'll

1225
01:25:37.000 --> 01:25:39.239
<v Speaker 1>wrap it up until next time.

1226
01:25:39.279 --> 01:25:40.039
<v Speaker 2>Folks, Max out.
