ZCP_day_3_slides_Part5.pdf

Full Transcript


PERCENTILE

Function: percentile(item key,period|#num,percentage)

Parameters:
• period: time period
• #num: number of values
• time_shift: time shift period
• percentage: range of 0 to 100

This trigger will fire if a single value is more than 100M:
last(/Production server/net.if.in[eth0,bytes]) > 100M

This trigger will fire if the 95th percentile for the last 1 hour is higher than 100M:
percentile(/Production server/net.if.in[eth0,bytes],1h,95) > 100M
• 95% of the time, the usage is at or below the specified amount
• 5% of the time, the usage can burst beyond this rate

6.0 Certified Professional ● Day 3 © 2023 by Zabbix. All rights reserved

PERCENTILE EXAMPLE

Percentile can be used:
• In trigger expressions
• In calculated items
• Displayed on graphs

[Graph: last() values plotted against the percentile line; annotation: "Percentile below 1M"]

PRACTICAL SETUP

On the "Template Advanced" create new user macros:
• {$FS.PREDICT.TIME} = 1h
• {$FS.PREDICT.HISTORY} = 4h
• {$FS.PREDICT.SIZE.LEFT} = 500M
• {$FS.PREDICT.WARNING.TIME} = 7d
• {$FS.PREDICT.WARNING.SIZE} = 40G

On the "Mounted filesystems discovery" rule create:
• Calculated item prototypes:
  - {#FSNAME}:Free space after {$FS.PREDICT.TIME}
  - {#FSNAME}:Time left until {$FS.PREDICT.SIZE.LEFT}
• Trigger prototypes:
  - {$FS.PREDICT.WARNING.TIME} left until {$FS.PREDICT.SIZE.LEFT} on {HOST.NAME} {#FSNAME}
  - Free space below {$FS.PREDICT.WARNING.SIZE} in {$FS.PREDICT.TIME} on {HOST.NAME} {#FSNAME}
• A graph prototype to display used, predicted size and time left.

Add context to {$FS.PREDICT.WARNING.SIZE} to use value 200M for "/dev" space.
Update the trigger prototype to use context-based macros.

Practical task No. 21 (30 minutes)
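The difference between the last() trigger and the percentile() trigger on the PERCENTILE slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not say which percentile method Zabbix uses, so the nearest-rank method is assumed here, and the sample values and the helper name percentile_nearest_rank are hypothetical.

```python
import math

MIB = 1024 ** 2  # assumption: the "M" suffix is read as 1024 * 1024


def percentile_nearest_rank(values, percentage):
    """Return the value at the given percentile (nearest-rank method)."""
    if not values:
        raise ValueError("at least one value is required")
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be in the range 0 to 100")
    ordered = sorted(values)
    # Nearest rank: the smallest value that has at least `percentage`
    # percent of all values at or below it (rank is 1-based).
    rank = max(1, math.ceil(len(ordered) * percentage / 100))
    return ordered[rank - 1]


# Hypothetical one-hour sample: 19 readings at 40M, then one burst of 150M.
samples = [40 * MIB] * 19 + [150 * MIB]
threshold = 100 * MIB

# last(...) > 100M looks only at the newest value.
last_fires = samples[-1] > threshold
# percentile(...,1h,95) > 100M ignores the top 5% of values.
percentile_fires = percentile_nearest_rank(samples, 95) > threshold

print("last() trigger fires:", last_fires)
print("percentile() trigger fires:", percentile_fires)
```

With this sample a single burst is enough for the last() trigger, while the percentile() trigger stays quiet because the burst falls inside the top 5% of values.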
Event Correlation (5 minutes)

EVENT CORRELATION OVERVIEW

Event correlation allows problem events to be correlated with their resolution.

It can be defined:
• On a trigger level:
  - Allows correlating separate problems reported by one trigger
  - Works with triggers that have "Multiple Problem Event Generation" mode enabled
• Globally:
  - Problems reported by different triggers can be correlated using global correlation rules

https://www.zabbix.com/documentation/current/manual/config/event_correlation

Trigger-based event correlation (15 minutes)

TRIGGER-BASED EVENT CORRELATION

While an OK event generally closes all the problem events created by one trigger, there are cases when a more detailed approach is needed.

Correlate separate problems reported by one trigger:
• Tags are used to extract values and create identification for problem events
• Problems can be closed individually based on a matching tag
• Used with "Multiple Problem Event Generation" mode enabled
• Useful for log files, SNMP traps, etc.
• Discover certain problems in a log file and close them individually rather than all together

Actual tags and tag values only become visible when a trigger fires:
• If the regular expression used is invalid, it is silently replaced with the string *UNKNOWN*
• Subsequent OK events with the same *UNKNOWN* tag value may close problem events that they should not have closed

https://www.zabbix.com/documentation/current/manual/config/event_correlation/trigger
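The tag-matching idea above, including the *UNKNOWN* pitfall, can be simulated with a short Python sketch. It is a toy model, not Zabbix code: the log lines, the regular expressions and the function names are hypothetical, and returning an empty string when a valid pattern does not match is an assumption of this sketch.

```python
import re

UNKNOWN = "*UNKNOWN*"
GOOD_PATTERN = r'on host "([^"]+)"'
BAD_PATTERN = r'on host "([^"]+'  # unbalanced parenthesis, so it is invalid


def tag_value(pattern, line):
    """Extract the tag value from a log line; an invalid regex yields *UNKNOWN*."""
    try:
        match = re.search(pattern, line)
    except re.error:
        return UNKNOWN
    # Assumption of this sketch: a valid pattern without a match gives "".
    return match.group(1) if match else ""


def open_problems_after(events, pattern):
    """Replay (kind, line) events; an OK event closes problems with the same tag value."""
    open_values = []
    for kind, line in events:
        value = tag_value(pattern, line)
        if kind == "PROBLEM":
            open_values.append(value)
        else:
            open_values = [v for v in open_values if v != value]
    return open_values


# Hypothetical log lines in the style of Zabbix server messages.
events = [
    ("PROBLEM", 'disabling agent checks on host "DEV": host unavailable'),
    ("PROBLEM", 'disabling agent checks on host "PROD 2": host unavailable'),
    ("PROBLEM", 'disabling agent checks on host "PROD 1": host unavailable'),
    ("OK", 'enabling agent checks on host "PROD 1": host became available'),
    ("OK", 'enabling agent checks on host "PROD 2": host became available'),
]

print(open_problems_after(events, GOOD_PATTERN))  # only the DEV problem stays open
print(open_problems_after(events, BAD_PATTERN))   # every problem was closed
```

With the valid pattern each OK event closes only the problem for its own host. With the invalid pattern every event carries the same *UNKNOWN* value, so the first OK event closes all three problems, which is exactly the risk the slide warns about.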
TRIGGER-BASED EVENT CORRELATION

125044.865 temporarily disabling Zabbix agent checks on host "DEV": host unavailable
125048.899 temporarily disabling Zabbix agent checks on host "PROD 2": host unavailable
125048.904 temporarily disabling Zabbix agent checks on host "PROD 1": host unavailable
134849.769 enabling Zabbix agent checks on host "PROD 1": host became available
134849.772 enabling Zabbix agent checks on host "PROD 2": host became available

• 3 separate problems
• 2 OK events

TRIGGER-BASED EVENT CORRELATION

[Screenshot: trigger configuration with callouts "Problem", "Recovery" and "Settings"; the host name is placed in a tag for matching]

Global event correlation (20 minutes)

GLOBAL EVENT CORRELATION

Global event correlation reaches across all the metrics monitored by Zabbix to create correlations:
• Allows correlation of problems based on event tag information
• Resolve a problem created by one trigger with another trigger on the same or another host
• Problems matching correlation rules are closed automatically
• Events are still generated, but actions are not executed
• Focus on the root causes of a problem by saving yourself from repetitive notifications

Important safety tips:
• Always set a unique tag for a new event that is paired with old events
• Avoid using common tag names that may end up being used by different correlation configurations
• Keep the number of correlation rules limited to the ones you really need

Configuring global correlation rules is available only to Zabbix Super Admins.

https://www.zabbix.com/documentation/current/manual/config/event_correlation/global
GLOBAL EVENT CORRELATION RULES

• Open Configuration > Event correlation to configure global event correlation rules
• Define conditions for the correlation rule

Important: be very careful with "Type of calculation".

GLOBAL EVENT CORRELATION CONDITIONS

Conditions are based on "new" and "old" events:
• Old event: an already existing unresolved problem event
• New event: an event that has just been detected

Multiple conditions are available for matching:
• Old event tag name: the tag name is used
• New event tag name: the tag name is used
• New event host group: the host group is used
• Event tag pair: the values for specified tags are used
• Old event tag value: the old event tag name and value are used
• New event tag value: the new event tag name and value are used

Conditions are very important:
• If no old event condition is specified, all the old events may be matched and closed
• If no new event condition is specified, all the new events may be matched and closed

GLOBAL EVENT CORRELATION ACTIONS

Operations define what to do in case of a match:
• Close old events: close existing (old) events when a new event happens
  - Closing an event resolves the problem that was generated by this event
• Close new event: close the detected (new) event immediately after it occurs
  - Problems caused by this event are detected and resolved with 0 duration
  - Actions for the new events are not executed
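The conditions and operations above can be modelled with a small Python sketch. It is a simplified illustration rather than the server's algorithm: the Event and Rule classes, the "peer" tag and the sample data are hypothetical, and all conditions are assumed to be combined with an "And" type of calculation.

```python
from dataclasses import dataclass


@dataclass
class Event:
    name: str
    tags: dict
    closed: bool = False


@dataclass
class Rule:
    old_tag_name: str   # "Old event tag name" condition
    new_tag_name: str   # "New event tag name" condition
    pair: tuple         # "Event tag pair": (old tag name, new tag name)
    close_old: bool = False
    close_new: bool = False


def matches(rule, old, new):
    """All conditions must hold (assumed 'And' type of calculation)."""
    old_pair_tag, new_pair_tag = rule.pair
    return (
        rule.old_tag_name in old.tags
        and rule.new_tag_name in new.tags
        and old_pair_tag in old.tags
        and new_pair_tag in new.tags
        and old.tags[old_pair_tag] == new.tags[new_pair_tag]
    )


def correlate(rule, problems, new):
    """Check a new event against every unresolved old event, then record it."""
    for old in problems:
        if old.closed or not matches(rule, old, new):
            continue
        if rule.close_old:
            old.closed = True
        if rule.close_new:
            new.closed = True
    problems.append(new)


# Hypothetical rule: close a new event if an open problem has the same peer.
rule = Rule("peer", "peer", ("peer", "peer"), close_new=True)
problems = []
for peer in ["10.0.2.44", "10.0.2.51", "10.0.2.40", "10.0.2.51", "10.0.2.44"]:
    correlate(rule, problems, Event("cannot accept connection", {"peer": peer}))

still_open = [e.tags["peer"] for e in problems if not e.closed]
print(still_open)
```

Because the rule names a tag for both the old and the new event, events without the "peer" tag are never touched. Dropping those conditions is what lets a rule match, and close, far more events than intended.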
GLOBAL EVENT CORRELATION ON THE SAME HOST

2020/07/09 20:34:40.343478 cannot accept incoming connection for peer: 10.0.2.44
2020/07/09 20:34:43.493450 cannot accept incoming connection for peer: 10.0.2.51
2020/07/09 20:35:05.953718 cannot accept incoming connection for peer: 10.0.2.40
2020/07/09 20:35:43.575373 cannot accept incoming connection for peer: 10.0.2.51
2020/07/09 20:35:45.384897 cannot accept incoming connection for peer: 10.0.2.44

• 3 separate problems
• 2 duplicate problems

[Screenshot annotations: initial problems; duplicate problems; duplicate problems are correlated]

GLOBAL EVENT CORRELATION ON MULTIPLE HOSTS

• A switch has multiple devices connected to its ports
• When the switch is unreachable, problems with all the connected devices are correlated

[Diagram]
Switch (48 ports):
• TAG: Device type, Value: Switch
• TAG: Switch name, Value: 3rd floor
• TAG: Problem, Value: Unreachable
Server 1, Server 2, Server 3, Server ...:
• TAG: Device type, Value: Server
• TAG: Switch name, Value: 3rd floor

GLOBAL EVENT CORRELATION - NOTES

Event correlation must be configured very carefully:
• It can close all the existing problems in the worst case

Always set a unique tag for a new event that is paired with old events:
• Use the "New event tag" correlation condition

Add a condition based on the old event when using the "Close old event" operation:
• Otherwise, all the existing problems could be closed

Negatively affects performance:
• All events are checked against the correlation rules
• Zabbix generates two events every time a problem is correlated: a problem event and a recovery event
• Problems may not get correlated correctly if there is only a short time interval between the first and the second problem (e.g., 0.5s)
Important: keep the number of correlation rules limited to the ones you really need.

Zabbix Internals (30 minutes)

CONFIGURATION CACHE

Configuration cache (128K-64G, default 32M):
• Contains information on hosts, items, LLD rules and triggers to be monitored
• Changes in the frontend are written to the database, but are not immediately seen by the server
  - Configuration is updated every 60 seconds by default
  - This period can be customized with the configuration parameter CacheUpdateFrequency
• The cache size is set with the configuration parameter CacheSize
  - If Zabbix runs out of configuration cache, it will crash with an error message
• It can be reloaded immediately by executing the zabbix_server -R config_cache_reload command
• Zabbix proxies also have a configuration cache containing entities related to the proxy

[Diagram: Configuration syncer]

HISTORY CACHE

History cache (128K-2G, default 16M):
• Temporarily keeps data collected by the data collectors
• History syncers use this cache to write data to the DB
• The cache size is set with the configuration parameter HistoryCacheSize
• Must be almost empty most of the time:
  - A full history cache usually means history syncers cannot write data to the DB fast enough
  - May be filled quickly by proxies after long offline periods
• Has an additional cache for indexes

History index cache, HistoryIndexCacheSize (128K-2G, default 4M):
• Shared memory size for indexing the history cache
• Usual practice is to set this cache 4 times smaller than the history cache

[Diagram: Data collectors, Preprocessing, History cache, History syncer]
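The size ranges and the 4:1 guideline from the two cache slides can be turned into a quick sanity check. The Python sketch below is an illustration only: the helper names are hypothetical, the index cache is assumed to be configured through the HistoryIndexCacheSize parameter of zabbix_server.conf, and the K/M/G suffixes are assumed to be powers of 1024.

```python
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

# Allowed ranges as listed on the slides.
LIMITS = {
    "CacheSize": ("128K", "64G"),
    "HistoryCacheSize": ("128K", "2G"),
    "HistoryIndexCacheSize": ("128K", "2G"),
}


def parse_size(text):
    """Convert a size such as '32M' into bytes."""
    text = text.strip().upper()
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


def check_cache_sizes(config):
    """Return a list of findings for the cache parameters present in config."""
    findings = []
    for name, (low, high) in LIMITS.items():
        if name not in config:
            continue
        size = parse_size(config[name])
        if not parse_size(low) <= size <= parse_size(high):
            findings.append(f"{name}={config[name]} is outside {low}-{high}")
    if "HistoryCacheSize" in config and "HistoryIndexCacheSize" in config:
        history = parse_size(config["HistoryCacheSize"])
        index = parse_size(config["HistoryIndexCacheSize"])
        # Guideline from the slide: index cache is a quarter of the history cache.
        if index * 4 != history:
            findings.append("HistoryIndexCacheSize is not one quarter of HistoryCacheSize")
    return findings


defaults = {
    "CacheSize": "32M",
    "HistoryCacheSize": "16M",
    "HistoryIndexCacheSize": "4M",
}
print(check_cache_sizes(defaults))
print(check_cache_sizes({"HistoryCacheSize": "4G"}))
```

The default values from the slides pass both checks. The 4:1 ratio is a rule of thumb from the course material, so a finding about it is a prompt to review the setting, not an error.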
