# Nagios-MongoDB

## Overview

This is a simple Nagios check script to monitor your MongoDB server(s). 

## Authors

### Main Author
 Mike Zupan mike -(at)- zcentric.com
### Contributors
 - Frank Brandewiede <brande -(at)- travel-iq.com> <brande -(at)- bfiw.de> <brande -(at)- novolab.de>
 - Sam Perman <sam -(at)- brightcove.com>
 - Shlomo Priymak <shlomoid -(at)- gmail.com>
 - @jhoff909 on github
 - Dag Stockstad <dag.stockstad -(at)- gmail.com>

## Installation

In your Nagios plugins directory run

<pre><code>git clone git://github.com/mzupan/nagios-plugin-mongodb.git</code></pre>

Then, from inside the cloned directory, use pip to install the prerequisites listed in its requirements file.

<pre><code>pip install -r requirements</code></pre>

## Usage

### Install in Nagios

Edit your commands.cfg and add the following

<pre><code>
define command {
    command_name    check_mongodb
    command_line    $USER1$/nagios-plugin-mongodb/check_mongodb.py -H $HOSTADDRESS$ -A $ARG1$ -P $ARG2$ -W $ARG3$ -C $ARG4$
}

define command {
    command_name    check_mongodb_database
    command_line    $USER1$/nagios-plugin-mongodb/check_mongodb.py -H $HOSTADDRESS$ -A $ARG1$ -P $ARG2$ -W $ARG3$ -C $ARG4$ -d $ARG5$
}

define command {
    command_name    check_mongodb_collection
    command_line    $USER1$/nagios-plugin-mongodb/check_mongodb.py -H $HOSTADDRESS$ -A $ARG1$ -P $ARG2$ -W $ARG3$ -C $ARG4$ -d $ARG5$ -c $ARG6$
}

define command {
    command_name    check_mongodb_replicaset
    command_line    $USER1$/nagios-plugin-mongodb/check_mongodb.py -H $HOSTADDRESS$ -A $ARG1$ -P $ARG2$ -W $ARG3$ -C $ARG4$ -r $ARG5$
}

define command {
    command_name    check_mongodb_query
    command_line    $USER1$/nagios-plugin-mongodb/check_mongodb.py -H $HOSTADDRESS$ -A $ARG1$ -P $ARG2$ -W $ARG3$ -C $ARG4$ -q $ARG5$
}
</code></pre>
Add -D to a command line if you want performance data (perfdata) in the output.

You can then reference these commands from your service definitions. The examples below come from my services.cfg.

#### Check Connection

This will check each host that is listed in the Mongo Servers group. It will issue a warning if connecting to the server takes over 2 seconds and a critical error if it takes over 4 seconds.

<pre><code>
define service {
    use                     generic-service
    hostgroup_name          Mongo Servers
    service_description     Mongo Connect Check
    check_command           check_mongodb!connect!27017!2!4
}
</code></pre>
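
Nagios splits the check_command value on `!`: the first field names the command, and the remaining fields fill `$ARG1$`, `$ARG2$`, and so on. The sketch below mimics that substitution for the check_mongodb definition so you can see the command line that ends up being run. `expand_check_command` is a hypothetical helper written for illustration, not part of Nagios or this plugin, and `$USER1$` and `$HOSTADDRESS$` are deliberately left unexpanded.

```python
# Illustrative only: mimics how Nagios fills $ARGn$ macros from a check_command.
COMMAND_LINE = (
    "$USER1$/nagios-plugin-mongodb/check_mongodb.py "
    "-H $HOSTADDRESS$ -A $ARG1$ -P $ARG2$ -W $ARG3$ -C $ARG4$"
)


def expand_check_command(check_command, command_line=COMMAND_LINE):
    """Substitute the '!'-separated arguments into $ARG1$, $ARG2$, ..."""
    _name, *args = check_command.split("!")
    for position, value in enumerate(args, start=1):
        command_line = command_line.replace(f"$ARG{position}$", value)
    return command_line


print(expand_check_command("check_mongodb!connect!27017!2!4"))
```

Here `connect` becomes the action (-A), 27017 the port (-P), and 2 and 4 the warning (-W) and critical (-C) thresholds.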

#### Check Percentage of Open Connections

This test checks the percentage of connections in use on the Mongo server. In the following example it will send out a warning if the connection pool is 70% used and a critical error if it is 80% used.

<pre><code>
define service {
    use                     generic-service
    hostgroup_name          Mongo Servers
    service_description     Mongo Free Connections
    check_command           check_mongodb!connections!27017!70!80
}
</code></pre>
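
As a rough sketch of the arithmetic behind this check: MongoDB's serverStatus output has a `connections` section with `current` and `available` counts, and the used percentage can be derived from the two. That the plugin computes it exactly this way is an assumption; `connection_usage_percent` and `nagios_status` are hypothetical names used only here.

```python
def connection_usage_percent(current, available):
    """Percentage of the connection pool that is in use."""
    total = current + available
    if total == 0:
        return 0.0
    return 100.0 * current / total


def nagios_status(value, warning, critical):
    """Conventional Nagios exit codes: 0 OK, 1 WARNING, 2 CRITICAL."""
    if value >= critical:
        return 2
    if value >= warning:
        return 1
    return 0


# Hand-written sample: 150 connections open, 50 still available.
used = connection_usage_percent(current=150, available=50)
print(used, nagios_status(used, warning=70, critical=80))
```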

#### Check Replication Lag

This test checks the replication lag of Mongo servers. It will send out a warning if the lag is over 15 seconds and a critical error if it's over 30 seconds. Please note that this check uses 'optime' from rs.status(), which trails real time because heartbeat requests between servers only occur every few seconds. The check may therefore show an apparent lag of under 10 seconds when there really isn't any. Use larger thresholds for reliable monitoring.

<pre><code>
define service {
    use                     generic-service
    hostgroup_name          Mongo Servers
    service_description     Mongo Replication Lag
    check_command           check_mongodb!replication_lag!27017!15!30
}
</code></pre>
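
To make the optime comparison concrete, here is a minimal sketch that works on data shaped like the `members` array returned by rs.status() (each member has `name`, `stateStr` and `optimeDate`). It is not the plugin's actual code; `replication_lag_seconds` is a hypothetical helper and the sample data is hand-written.

```python
from datetime import datetime


def replication_lag_seconds(members, host):
    """Seconds by which `host` trails the primary, based on optimeDate."""
    primary = next(m for m in members if m["stateStr"] == "PRIMARY")
    member = next(m for m in members if m["name"] == host)
    return (primary["optimeDate"] - member["optimeDate"]).total_seconds()


# Hand-written sample shaped like the "members" array of rs.status().
members = [
    {"name": "db1:27017", "stateStr": "PRIMARY",
     "optimeDate": datetime(2019, 3, 1, 12, 0, 20)},
    {"name": "db2:27017", "stateStr": "SECONDARY",
     "optimeDate": datetime(2019, 3, 1, 12, 0, 0)},
]
print(replication_lag_seconds(members, "db2:27017"))  # 20.0
```

With the thresholds from the service definition, a lag of 20 seconds would raise a warning (over 15) but not a critical error (over 30).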


#### Check Replication Lag Percentage

This test checks the replication lag of Mongo servers as a percentage. It will send out a warning if the lag is over 50 percent and a critical error if it's over 75 percent. Please note that this check gets the oplog timeDiff from the primary and compares it to the replication lag. When this check reaches 100 percent, a full resync is needed.

<pre><code>
define service {
    use                     generic-service
    hostgroup_name          Mongo Servers
    service_description     Mongo Replication Lag Percentage
    check_command           check_mongodb!replication_lag_percent!27017!50!75
}
</code></pre>
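
The comparison described above boils down to dividing the lag by the time span the primary's oplog covers. The sketch below shows that calculation under the assumption that both values are available in seconds; `replication_lag_percent` is a hypothetical name, not a function of the plugin.

```python
def replication_lag_percent(lag_seconds, oplog_window_seconds):
    """Replication lag as a percentage of the primary's oplog time span."""
    if oplog_window_seconds <= 0:
        raise ValueError("oplog window must be positive")
    return 100.0 * lag_seconds / oplog_window_seconds


# A secondary 30 minutes behind a primary whose oplog spans one hour.
print(replication_lag_percent(lag_seconds=1800, oplog_window_seconds=3600))  # 50.0
```

At 100 percent the secondary has fallen behind by the whole oplog window, which is why a full resync is needed at that point.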


#### Check Memory Usage

This test checks the memory usage of the Mongo server. In my example my Mongo servers have 32 GB of memory, so I'll trigger a warning if Mongo uses over 20 GB of RAM and a critical error if it uses over 28 GB.

<pre><code>
define service {
    use                     generic-service
    hostgroup_name          Mongo Servers
    service_description     Mongo Memory Usage
    check_command           check_mongodb!memory!27017!20!28
}
</code></pre>
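
Note that serverStatus reports `mem.resident` in mebibytes, while the thresholds in this example are given in gigabytes. The sketch below shows that conversion. Treating `mem.resident` as the value this check compares is an assumption, and `resident_gigabytes` is a hypothetical helper.

```python
def resident_gigabytes(server_status):
    """Convert mem.resident (MiB in serverStatus output) to GiB."""
    return server_status["mem"]["resident"] / 1024.0


# Hand-written sample: 22528 MiB resident on a 32 GB host.
sample = {"mem": {"resident": 22528}}
used = resident_gigabytes(sample)
print(used)             # 22.0
print(20 < used <= 28)  # True: above the warning level, not above critical
```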

#### Check Mapped Memory Usage

This test checks the mapped memory usage of the Mongo server.

<pre><code>
define service {
    use                     generic-service
    hostgroup_name          Mongo Servers
    service_description     Mongo Mapped Memory Usage
    check_command           check_mongodb!memory_mapped!27017!20!28
}
</code></pre>

#### Check Lock Time Percentage

This test checks the lock time percentage of the Mongo server. In my example I want to be warned if the lock time is above 5% and get a critical error if it's above 10%. Sustained lock time generally means your database is overloaded.

<pre><code>
define service {
    use                     generic-service
    hostgroup_name          Mongo Servers
    service_description     Mongo Lock Percentage
    check_command           check_mongodb!lock!27017!5!10
}
</code></pre>

#### Check Average Flush Time

This test checks the average flush time of the Mongo server. In my example I want to be warned if the average flush time is above 100ms and get a critical error if it's above 200ms. A high average flush time means your database is write bound.

<pre><code>
define service {
    use                     generic-service
    hostgroup_name          Mongo Servers
    service_description     Mongo Flush Average
    check_command           check_mongodb!flushing!27017!100!200
}
</code></pre>

#### Check Last Flush Time

This test checks the last flush time of the Mongo server. In my example I want to be warned if the last flush time is above 200ms and get a critical error if it's above 400ms. A high flush time means your server might need faster disks or that it's time to shard.

<pre><code>
define service {
    use                     generic-service
    hostgroup_name          Mongo Servers
    service_description     Mongo Last Flush Time
    check_command           check_mongodb!last_flush_time!27017!200!400
}
</code></pre>

#### Check status of mongodb replicaset
This test checks the status of nodes within a replicaset. It sends a warning for status 0, 3 and 5, a critical error for status 4, 6 or 8, and OK for status 1, 2 and 7.

Note the two trailing 0s in the check command. The check doesn't compare against them, but they must be present for the check to work.

<pre><code>
define service {
      use                     generic-service
      hostgroup_name          Mongo Servers
      service_description     MongoDB state
      check_command           check_mongodb!replset_state!27017!0!0
}
</code></pre>
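
The state-to-result mapping described above can be sketched as a small lookup. The numeric groups come straight from the text; the state names in the comments follow MongoDB's replica set member states, and returning UNKNOWN (3) for any other number is an assumption made for this sketch. `replset_exit_code` is a hypothetical name.

```python
OK_STATES = {1, 2, 7}        # PRIMARY, SECONDARY, ARBITER
WARNING_STATES = {0, 3, 5}   # STARTUP, RECOVERING, STARTUP2
CRITICAL_STATES = {4, 6, 8}  # FATAL (older versions), UNKNOWN, DOWN


def replset_exit_code(state):
    """Map a replica set member state to a Nagios exit code."""
    if state in OK_STATES:
        return 0  # OK
    if state in WARNING_STATES:
        return 1  # WARNING
    if state in CRITICAL_STATES:
        return 2  # CRITICAL
    return 3      # UNKNOWN: state not covered by the lists above (assumption)


for state in (1, 3, 8):
    print(state, replset_exit_code(state))
```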

#### Check status of index miss ratio
This test checks the ratio of index hits to misses. If the ratio is high, you should consider adding indexes. I want to get a warning if the ratio is above .005 and a critical error if it's above .01.

<pre><code>
define service {
      use                     generic-service
      hostgroup_name          Mongo Servers
      service_description     MongoDB Index Miss Ratio
      check_command           check_mongodb!index_miss_ratio!27017!.005!.01
}
</code></pre>

#### Check number of databases and number of collections
These tests count the number of databases and the number of collections. This is useful, for example, when your application "leaks" databases or collections. Set the warning and critical levels to fit your application.

<pre><code>
define service {
      use                     generic-service
      hostgroup_name          Mongo Servers
      service_description     MongoDB Number of databases
      check_command           check_mongodb!databases!27017!300!500
}

define service {
      use                     generic-service
      hostgroup_name          Mongo Servers
      service_description     MongoDB Number of collections
      check_command           check_mongodb!collections!27017!300!500
}
</code></pre>



#### Check size of a database
This will check the size of a database. This is useful for keeping track of the growth of a particular database.
Replace your-database with the name of your database.
<pre><code>
define service {
      use                     generic-service
      hostgroup_name          Mongo Servers
      service_description     MongoDB Database size your-database
      check_command           check_mongodb_database!database_size!27017!300!500!your-database
}
</code></pre>



#### Check index size of a database
This will check the index size of a database. Overlarge indexes eat up memory and indicate a need for compaction.
Replace your-database with the name of your database.
<pre><code>
define service {
      use                     generic-service
      hostgroup_name          Mongo Servers
      service_description     MongoDB Database index size your-database
      check_command           check_mongodb_database!database_indexes!27017!50!100!your-database
}
</code></pre>



#### Check index size of a collection
This will check the index size of a collection. Overlarge indexes eat up memory and indicate a need for compaction.
Replace your-database with the name of your database and your-collection with the name of your collection.
<pre><code>
define service {
      use                     generic-service
      hostgroup_name          Mongo Servers
      service_description     MongoDB Collection index size your-database your-collection
      check_command           check_mongodb_collection!collection_indexes!27017!50!100!your-database!your-collection
}
</code></pre>



#### Check the primary server of replicaset
This will check the primary server of a replicaset. This is useful for catching unexpected stepdowns of the replicaset's primary server.
Replace your-replicaset with the name of your replicaset.
<pre><code>
define service {
      use                     generic-service
      hostgroup_name          Mongo Servers
      service_description     MongoDB Replicaset Master Monitor: your-replicaset
      check_command           check_mongodb_replicaset!replica_primary!27017!0!1!your-replicaset
}
</code></pre>


#### Check the number of queries per second
This will check the number of queries per second on a server. Since MongoDB reports the number as a running counter, we store the last value in the nagios_check collection of the local database. The following types are accepted: query|insert|update|delete|getmore|command

This command checks updates per second. It warns if the count is over 150 and goes critical if it is over 200.
<pre><code>
define service {
      use                     generic-service
      hostgroup_name          Mongo Servers
      service_description     MongoDB Updates per Second
      check_command           check_mongodb_query!queries_per_second!27017!150!200!update
}
</code></pre>
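
Turning a running counter into a per-second rate means keeping the previous reading and dividing the difference by the elapsed time. The sketch below shows only that arithmetic; the plugin persists the previous value in MongoDB as described above, whereas here the readings are simply passed in. `per_second` is a hypothetical helper, and returning 0 after a counter reset is an assumption.

```python
def per_second(previous_count, previous_time, current_count, current_time):
    """Rate of change of a running counter between two readings."""
    elapsed = current_time - previous_time
    delta = current_count - previous_count
    if elapsed <= 0 or delta < 0:
        # No time has passed, or the counter was reset (e.g. server restart).
        return 0.0
    return delta / elapsed


# Two readings of the update counter taken 5 seconds apart.
rate = per_second(previous_count=1000, previous_time=0,
                  current_count=1900, current_time=5)
print(rate)  # 180.0
```

A rate of 180 updates per second sits between the two thresholds in the example, so it would produce a warning.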

#### Check Primary Connection

This will check each host that is listed in the Mongo Servers group. It will issue a warning if connecting to the primary server of the current replicaset takes over 2 seconds and a critical error if it takes over 4 seconds.

<pre><code>
define service {
    use                     generic-service
    hostgroup_name          Mongo Servers
    service_description     Mongo Primary Connect Check
    check_command           check_mongodb!connect_primary!27017!2!4
}
</code></pre>


#### Check Collection State

This will check each host that is listed in the Mongo Servers group. It can be useful for checking the availability of a critical collection (locks, timeouts, config server unavailable...). It will issue a critical error if the find_one query fails.

<pre><code>
define service {
    use                     generic-service
    hostgroup_name          Mongo Servers
    service_description     Mongo Collection State
    check_command           check_mongodb_collection!collection_state!27017!0!0!your-database!your-collection
}
</code></pre>
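
The idea behind this check can be sketched as "try a find_one and report critical if it raises". The function below accepts any object with a `find_one()` method, such as a pymongo collection, so it can be tried without a running server. `collection_state`, the stand-in classes and the message texts are all hypothetical and not taken from the plugin.

```python
def collection_state(collection):
    """Return (exit_code, message): 0 if find_one() works, 2 if it raises."""
    try:
        collection.find_one()
    except Exception as error:
        return 2, f"CRITICAL - Collection Error: {error}"
    return 0, "OK - Collection is reachable"


class HealthyCollection:
    """Stand-in for a collection that answers normally."""
    def find_one(self):
        return {"_id": 1}


class BrokenCollection:
    """Stand-in for a collection whose query fails."""
    def find_one(self):
        raise RuntimeError("timed out")


print(collection_state(HealthyCollection()))
print(collection_state(BrokenCollection()))
```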