Monitor your Zabbix proxies!
Written by uTux
We had a crazy situation where a Zabbix proxy was down (VM stopped) and we had no idea about it.
That was quite a surprise, because "Well, we should have had a crapload of 'No data' alerts from every host that reports to that proxy". Spoiler alert: no, we didn't. We lost monitoring of hundreds of hosts for 24+ hours while everything looked fine in the Zabbix UI. In fact, I only noticed the problem when the network engineer asked me why his Zabbix dashboards were empty.
An explanation
I admit I don't fully understand what the nodata() documentation says about the function's behavior when a proxy is down. What I understand from real-life use: in a fully Active configuration, nodata()-based triggers won't fire if data can't be collected.
Yes, that sounds stupid, because we usually use nodata() triggers precisely to detect when a host goes offline and no longer reports to Zabbix.
Workarounds
1 - Monitor your proxies
There is a "Zabbix internal" item that can help us: zabbix[proxy,<name>,<param>]. According to the documentation:
- param - lastaccess - the timestamp of the last heartbeat message received from proxy;
- This item is always processed by Zabbix server regardless of host location (on server or proxy).
This means we have a reliable way to know when the Zabbix proxy last reported to the server. We can use that information to build a trigger.
You can edit the "Zabbix proxy health" template and add an item:
- Name: Zabbix Proxy last access time
- Type: Zabbix internal
- Key: zabbix[proxy,{HOST.NAME},lastaccess]
- Type of information: Numeric (unsigned)
- Units: unixtime
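If you would rather script this than click through the UI, the same item can in principle be created through the Zabbix JSON-RPC API with item.create. The sketch below only builds the request body and does not send it. The template ID "10048" and the 1m update interval are hypothetical values, not something from my setup, and authentication is left out on purpose because it differs between Zabbix versions. The numeric codes 5 (Zabbix internal) and 3 (numeric unsigned) are the API equivalents of the settings above.

```python
import json

# Zabbix API codes matching the UI settings listed above.
ITEM_TYPE_ZABBIX_INTERNAL = 5      # "Type: Zabbix internal"
VALUE_TYPE_NUMERIC_UNSIGNED = 3    # "Type of information: Numeric (unsigned)"


def build_item_create_request(template_id: str, request_id: int = 1) -> dict:
    """Build (but do not send) an item.create JSON-RPC request body.

    template_id is the ID of the template that should own the item.
    Authentication is intentionally omitted from this sketch.
    """
    return {
        "jsonrpc": "2.0",
        "method": "item.create",
        "params": {
            "name": "Zabbix Proxy last access time",
            "key_": "zabbix[proxy,{HOST.NAME},lastaccess]",
            "hostid": template_id,
            "type": ITEM_TYPE_ZABBIX_INTERNAL,
            "value_type": VALUE_TYPE_NUMERIC_UNSIGNED,
            "units": "unixtime",
            "delay": "1m",  # assumption: update interval, pick your own
        },
        "id": request_id,
    }


if __name__ == "__main__":
    # "10048" is a made-up template ID used only for illustration.
    print(json.dumps(build_item_create_request("10048"), indent=2))
```

You would POST this body to your server's api_jsonrpc.php endpoint with whatever authentication your Zabbix version expects.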
Then create a trigger:
- Name: "Proxy has not reported to the server for a long time"
- Expression: fuzzytime(/template/zabbix[proxy,{HOST.NAME},lastaccess],15m)=0
Now if you shut down your proxy, an alert should be triggered once its last access time is more than 15 minutes old.
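To make the trigger logic less magical, here is a small Python sketch of how I understand fuzzytime(): it returns 1 when the item value (a Unix timestamp) is within the given number of seconds of the current time, and 0 otherwise. The function names and the injectable "now" argument are my own, for illustration only. This is not Zabbix code.

```python
import time
from typing import Optional

FIFTEEN_MINUTES = 15 * 60  # the "15m" used in the trigger expression


def fuzzytime(value: int, max_diff: int, now: Optional[float] = None) -> int:
    """Return 1 if `value` is within `max_diff` seconds of `now`, else 0."""
    if now is None:
        now = time.time()
    return 1 if abs(now - value) <= max_diff else 0


def proxy_trigger_fires(lastaccess: int, now: Optional[float] = None) -> bool:
    """Mimic the trigger: fuzzytime(lastaccess, 15m) = 0 means PROBLEM."""
    return fuzzytime(lastaccess, FIFTEEN_MINUTES, now) == 0


if __name__ == "__main__":
    current = time.time()
    # Proxy seen one minute ago: trigger stays quiet.
    print(proxy_trigger_fires(int(current) - 60, current))
    # Proxy last seen an hour ago: trigger fires.
    print(proxy_trigger_fires(int(current) - 3600, current))
```

As long as the proxy keeps checking in, lastaccess stays close to the current time and the expression evaluates to 1. Once the proxy goes silent, the timestamp stops moving and the difference eventually exceeds the threshold.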
2 - Enforce "strict" mode in nodata()
The nodata() function has an optional parameter that you can set to "strict":
mode - if set to strict (double-quoted), this function will be insensitive to proxy availability (see comments for details).
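You can add that parameter to the nodata() triggers of the "base" Linux and Windows active templates, for example nodata(/template/agent.ping,30m,"strict")=1 (the item key and the 30m period here are only illustrative). Your nodata()-based triggers should then fire for every single monitored host, even when the hosts can't report to Zabbix because the proxy is down.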