Full Transcript

TREND CACHE

Trend cache (128K-2G, default 4M):
- Zabbix server accumulates trend data at runtime in the trend cache as the data flows in
- The cache size is set with the configuration parameter TrendCacheSize
- The server flushes trends into the database:
  - When a new hour starts
  - When Zabbix server stops
- When the server flushes the trend cache and trends for this hour already exist in the database (for example, the server was restarted mid-hour), the new trend calculation replaces the existing one
- On a large installation, if a restart is needed, stop the server at the end of one hour and start it at the beginning of the next hour to avoid unnecessary trend recalculation

Trend function cache (128K-2G, default 4M):
- Shared memory size for caching calculated trend function data
- The cache size is set with the configuration parameter TrendFunctionCacheSize

6.0 Certified Professional ● Day 3 © 2023 by Zabbix. All rights reserved

VALUE CACHE

Value cache (0, 128K-64G, default 8M):
- Makes calculation much faster for:
  - Trigger expressions
  - Calculated/aggregate items
- Used for accessing historical data instead of making direct SQL calls to the database
- Has performance metrics:
  - Value cache hits: historical values are present in the cache
  - Value cache misses: historical values are not present in the cache; the missing values are requested from the database and the cache is updated accordingly
- The size is controlled by the ValueCacheSize parameter in the Zabbix server configuration file
- The value cache can be disabled by setting the parameter to 0 (not recommended)
- If the value cache is full, Zabbix starts working in low memory mode:
  - Problems will be detected by internal monitoring
  - Direct DB queries will be made, which degrades overall performance
  - Increase the cache size and restart Zabbix server
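The value cache hit and miss metrics described above are easiest to read as a ratio. The following is a minimal sketch, assuming the two counters were sampled over the same interval; the function names and the 0.95 threshold are hypothetical illustrations, not Zabbix defaults.

```python
def value_cache_hit_ratio(hits: int, misses: int) -> float:
    """Return the fraction of history requests served from the value cache."""
    if hits < 0 or misses < 0:
        raise ValueError("counters must be non-negative")
    total = hits + misses
    if total == 0:
        return 1.0  # no requests yet, so nothing was missed
    return hits / total


def needs_attention(hits: int, misses: int, min_ratio: float = 0.95) -> bool:
    """Flag a cache whose hit ratio fell below a hypothetical threshold."""
    return value_cache_hit_ratio(hits, misses) < min_ratio


if __name__ == "__main__":
    # 9800 hits and 200 misses in the sampled interval
    print(f"{value_cache_hit_ratio(9_800, 200):.2%}")
    print(needs_attention(9_800, 200))
```

A persistently low ratio means many requests fall through to the database, which is the situation where increasing ValueCacheSize is worth considering.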
DATA COLLECTORS

Pollers:
- Zabbix passive agent checks
- SNMP checks
- HTTP items
- Simple checks (only net.tcp.* and net.udp.*)
Trappers:
- Zabbix active agent metrics and Zabbix sender
- Active proxies (start at least one trapper per proxy)
- Execute frontend scripts
Unreachable pollers:
- Pollers for unreachable hosts (including IPMI and Java)
ICMP pingers:
- Perform ICMP ping requests
History pollers:
- Calculated checks
ODBC pollers:
- ODBC checks
HTTP pollers:
- Web scenarios
Java pollers:
- Connect to Zabbix Java gateway
Proxy pollers:
- Collect data from Zabbix passive proxies
- Two proxy pollers per passive proxy is the best practice
Discoverers:
- Perform network discovery tasks
- Common practice is to have one discoverer per discovery rule

INTERNAL PROCESSES

History syncers:
- Calculate triggers
- Write historical data to the database
- Configuration parameter "StartDBSyncers"
- Recommended: 1 DB syncer per ~1000 NVPS; the default settings are the best practice for NVPS < 4000
- If 100% busy, this usually means the data is not written into the database fast enough
- 100% busy history syncers will affect many other processes as well
- Will spike at the beginning of each hour (trends are written to the DB)
Configuration syncer:
- Reads configuration from the DB and writes it into the configuration cache
- If it is constantly busy, increase the "CacheUpdateFrequency" parameter
Housekeeper:
- Cleans up historical data:
  - History
  - Trends
  - Events
  - Sessions
  - Audit log
- A housekeeper that is 100% busy for a long period means poor DB performance
- Will affect other internal processes
- Maybe it is time to partition the database
Timer:
- Responsible for switching the host status to/from maintenance at 0 seconds of every minute
- Only the first timer puts hosts into maintenance mode
- Problem suppression updates are shared between all the timer processes

PERFORMANCE TUNING EXAMPLE

Sample Zabbix server configuration file, zabbix_server.conf:

  Tune the number of processes:
    StartPollers=80
    StartPingers=10
    StartPollersUnreachable=80
    StartIPMIPollers=10
    StartTrappers=20
    StartDBSyncers=6

  Tune the size of the in-memory caches:
    VMwareCacheSize=64M
    CacheSize=32M
    HistoryCacheSize=256M
    TrendCacheSize=64M
    HistoryIndexCacheSize=32M
    ValueCacheSize=64M

! This is just an example of a server configuration; do not copy/paste it into production!

DATA COLLECTION DIAGRAM

[Diagram: data flow between data collectors, internal processes, and caches. Legend: data collector, internal process, self-monitoring, internal cache. Components shown: poller, unreachable poller, history poller, HTTP poller, Java poller, ODBC poller, proxy poller, IPMI poller, IPMI manager, ICMP pinger, trapper, SNMP trapper, VMware collector, VMware cache, data from proxies, preprocessing manager, configuration syncer, configuration cache, history syncer, history cache, trend cache, trend function cache, value cache.]

PRACTICAL SETUP

1) Test the configuration cache size in Zabbix server:
   - Set it to the minimum, 128K
   - Restart Zabbix server
   - Check the results
   - Fix the problem when Zabbix server does not start
2) Link the template "Linux CPU by Zabbix agent" to all the Training-VM-** hosts
3) Change the item update interval to 5s on the "Linux CPU by Zabbix agent" template
4) Look at the impact on NVPS and value cache hits:
   - Use Monitoring -> Dashboard -> Zabbix server health dashboard
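The effect of shortening an update interval on NVPS, and the "1 DB syncer per ~1000 NVPS" rule of thumb from the history syncer notes, can be estimated before touching the configuration. This is a rough sketch under stated assumptions: the host and item counts are hypothetical, NVPS is approximated as items divided by update interval, and the default of 4 history syncers is assumed to be the stock StartDBSyncers value.

```python
import math

DEFAULT_DB_SYNCERS = 4  # assumption: stock StartDBSyncers value


def estimate_nvps(item_groups):
    """Approximate new values per second.

    item_groups is an iterable of (item_count, update_interval_seconds) pairs.
    """
    total = 0.0
    for count, interval in item_groups:
        if interval <= 0:
            raise ValueError("update interval must be positive")
        total += count / interval
    return total


def suggested_db_syncers(nvps, per_syncer=1000):
    """Apply the ~1000 NVPS per syncer rule, never going below the default."""
    return max(DEFAULT_DB_SYNCERS, math.ceil(nvps / per_syncer))


if __name__ == "__main__":
    # Hypothetical lab: 10 hosts with 15 template items each.
    before = estimate_nvps([(10 * 15, 60)])  # 1 minute interval
    after = estimate_nvps([(10 * 15, 5)])    # 5 second interval
    print(before, after, suggested_db_syncers(after))
```

The real figure on the Zabbix server health dashboard will differ, since it also reflects items from other templates and internal checks.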
Practical task No. 22 (15 minutes)

Database Tuning (15 minutes)

DATABASE PERFORMANCE

Database performance has a significant effect on the performance of Zabbix:
- All the collected data must be written into the database as soon as possible
- Zabbix server periodically reads configuration from the database
- Some Zabbix processes access the database directly when required:
  - History syncer: collected values, events, trigger status, inventory data
  - History pollers: calculated checks
  - Configuration syncer: configuration updates
  - Availability manager: availability updates
  - Discoverers: network discovery data
  - Trappers: autoregistration updates
  - Housekeeper: removes outdated information
  - Some other processes
- All the frontend pages read the data directly from the database
  - Complicated dashboards or filters can generate complex queries
  - Multiple simultaneous frontend users generate more load

MYSQL TIPS

- Do at least minimal tuning (buffer pool size, InnoDB log flush mode, etc.)
  - The default database settings are not optimized for production use
  - Simply tuning a few critical parameters can lead to a significant performance improvement
  - Change parameters only if you are sure of what you are doing!
- Use the latest stable version (MySQL 8.0 is faster than 5.6, for example)
- ANALYZE on a table can speed up queries many times:
    mysql> ANALYZE TABLE <table name>;
- Use SSD drives if possible
- Mount the file system with suitable mount options (e.g. noatime)
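The ANALYZE tip above applies per table, so it is convenient to generate the statements for a list of tables. The following is a small sketch that only builds the SQL text; the helper name is hypothetical, and the table list is taken from the history and trend tables named elsewhere in this transcript. Running the statements (and choosing a quiet time to do so) is left to the administrator.

```python
import re

# History and trend tables named in the partitioning notes of this transcript.
HISTORY_AND_TREND_TABLES = [
    "history", "history_uint", "history_str", "history_log", "history_text",
    "trends", "trends_uint",
]

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def analyze_statements(tables):
    """Return one ANALYZE TABLE statement per table name."""
    statements = []
    for table in tables:
        # Reject anything that is not a plain identifier to avoid SQL injection.
        if not _IDENTIFIER.match(table):
            raise ValueError(f"unexpected table name: {table!r}")
        statements.append(f"ANALYZE TABLE `{table}`;")
    return statements


if __name__ == "__main__":
    for statement in analyze_statements(HISTORY_AND_TREND_TABLES):
        print(statement)
```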
MYSQL CONFIGURATION

Sample MySQL configuration:

  [mysqld]
  innodb_file_per_table = 1
  innodb_buffer_pool_size = <large> (~75% of total RAM)
  innodb_buffer_pool_instances = 8
  innodb_flush_log_at_trx_commit = 2
  innodb_flush_method = O_DIRECT
  innodb_log_file_size = 256M
  max_allowed_packet = 128M
  open_files_limit = 65535
  max_connections = 100
  query_cache_type = 0
  wait_timeout = 86400
  optimizer_switch = index_condition_pushdown=off

! This is just an example of a MySQL configuration; do not copy/paste it into production!

HOUSEKEEPER ISSUES

The housekeeper process can degrade DB performance on large instances:
- More than 500GB of historical data stored, or more than 1000 NVPS processed
- The trigger "Zabbix housekeeper processes more than 75% busy" is in the problem state for hours (or days)
- Zabbix performance drops during housekeeping

[Graph: housekeeper process utilization. The yellow line is constantly at 100%.]

TABLE PARTITIONING

Partitioning is a way to split large tables into smaller partitions:
- DELETE queries perform a full table scan (very expensive / slow)
- Partitioned historical data is removed in bulk by deleting physical files (partitions)
- Recommended for historical tables only!
  - Daily partitions: history, history_uint, history_str, history_log, history_text
  - Monthly partitions: trends, trends_uint
- Do not use partitioning for other tables
  - The events tables are linked together, so partitioning them would cause inconsistent data
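The daily and monthly split above can be sketched as a helper that maps a date to a partition name and boundary. This is an illustration only: the `pYYYY_MM_DD` / `pYYYY_MM` naming, the UTC boundaries, and the helper names are assumptions, and the generated statement presumes the table is already partitioned by RANGE on its `clock` column.

```python
from datetime import date, datetime, timedelta, timezone

DAILY_TABLES = {"history", "history_uint", "history_str", "history_log", "history_text"}
MONTHLY_TABLES = {"trends", "trends_uint"}


def partition_for(table: str, day: date):
    """Return (partition_name, exclusive_upper_bound_epoch) covering `day`."""
    if table in DAILY_TABLES:
        start = day
        end = day + timedelta(days=1)
        name = f"p{start:%Y_%m_%d}"
    elif table in MONTHLY_TABLES:
        start = day.replace(day=1)
        # Jump past the end of the month, then snap back to its first day.
        end = (start + timedelta(days=32)).replace(day=1)
        name = f"p{start:%Y_%m}"
    else:
        raise ValueError(f"{table} should not be partitioned")
    bound = datetime(end.year, end.month, end.day, tzinfo=timezone.utc)
    return name, int(bound.timestamp())


def add_partition_sql(table: str, day: date) -> str:
    """Build the MySQL statement that adds the partition covering `day`."""
    name, bound = partition_for(table, day)
    return (f"ALTER TABLE `{table}` ADD PARTITION "
            f"(PARTITION {name} VALUES LESS THAN ({bound}));")


if __name__ == "__main__":
    print(add_partition_sql("history", date(2022, 1, 4)))
    print(add_partition_sql("trends", date(2022, 1, 4)))
```

Rejecting tables outside the two sets mirrors the advice not to partition anything other than the history and trend tables.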
TABLE PARTITIONING

Physically, a big table is split into smaller sections:
- Logically, partitioned tables still function like a standard table
- Partitioning is fully transparent

[Diagram: logically, a single Trends table; physically, Trends partitions 2022_01, 2022_02, 2022_03, 2022_04, and 2022_05. Trends data for 2022.01.04 19:00 is stored in partition 2022_01.]

TABLE PARTITIONING

Benefits:
- Easy and fast way to remove older data
- Much better performance
- Zabbix supports out-of-the-box partitioning for TimescaleDB

Drawbacks:
- For other DB engines, custom scripts must be used to maintain partitions:
  - Old partitions need to be dropped to free up space
  - New partitions need to be created before they can be used
- If partitions for current data are not created, data collection will stop
- All items with the same type of information will have the same history/trend retention period

Zabbix Troubleshooting
Duration: 15 minutes

GENERIC TOOLS FOR DEBUGGING

- Command-line utilities: top, ntop, iostat, vmstat, sar
- Zabbix itself
- Log file with debugging mode enabled, or strace

Normal operation:

  # ps ax | grep sync
  history syncer #1 [synced 1845 items in 0.257111 sec, syncing history]
  history syncer #2 [synced 24 items in 0.060314 sec, idle 4 sec]
  history syncer #3 [synced 365 items in 0.000018 sec, idle 4 sec]
  history syncer #4 [synced 185 items in 0.000009 sec, syncing history]

During an issue:

  # ps ax | grep sync
  history syncer #1 [synced 962 items in 285.198752 sec, syncing history]
  history syncer #2 [synced 867 items in 285.177799 sec, syncing history]
  history syncer #3 [synced 1000 items in 284.936376 sec, syncing history]
  history syncer #4 [synced 988 items in 285.280719 sec, syncing history]

SLOW QUERIES

Slow queries on the database will cause Zabbix performance degradation:
- They are one of the visible symptoms of a slow database
- They can be tracked using the Zabbix server log file

Zabbix server configuration file "zabbix_server.conf":

  ### Option: LogSlowQueries
  # How long a database query may take before being logged (in milliseconds).
  # 0 - don't log slow queries.
  LogSlowQueries=3000

Check the server/proxy log file for "slow query" lines.

During an issue:

  # grep slow /var/log/zabbix/zabbix_server.log
  slow query: 9.054528 sec, "insert into events (eventid, source, object, objectid, clock...
  slow query: 8.501505 sec, "update hosts set lastaccess=1421211815 where hostid...
  slow query: 6.754405 sec, "insert into history (itemid,clock,ns,value) values...
  slow query: 70.877295 sec, "select distinct t.triggerid, t.description, t.expression, t.error...
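When the log contains many "slow query" lines, sorting them by duration shows the worst offenders first. The sketch below assumes only the `slow query: <seconds> sec, "<sql>` fragment shown in the grep output above; any prefix a real log line carries before that fragment is ignored, and the function name is hypothetical.

```python
import re

# Matches the fragment shown in the grep output; text before it is ignored.
SLOW_QUERY = re.compile(r'slow query: (?P<seconds>\d+(?:\.\d+)?) sec, "(?P<sql>.*)')


def slow_queries(lines, min_seconds=0.0):
    """Return (seconds, sql) tuples sorted from slowest to fastest."""
    found = []
    for line in lines:
        match = SLOW_QUERY.search(line)
        if match is None:
            continue
        seconds = float(match.group("seconds"))
        if seconds >= min_seconds:
            # Drop trailing whitespace and a closing quote, if present.
            found.append((seconds, match.group("sql").rstrip().rstrip('"')))
    return sorted(found, key=lambda item: item[0], reverse=True)


SAMPLE = [
    'slow query: 9.054528 sec, "insert into events (eventid, source, object, objectid, clock...',
    'slow query: 8.501505 sec, "update hosts set lastaccess=1421211815 where hostid...',
    'slow query: 6.754405 sec, "insert into history (itemid,clock,ns,value) values...',
    'slow query: 70.877295 sec, "select distinct t.triggerid, t.description, t.expression, t.error...',
]

if __name__ == "__main__":
    for seconds, sql in slow_queries(SAMPLE, min_seconds=8):
        print(f"{seconds:10.3f}s  {sql[:60]}")
```

In practice the lines would come from the server log file instead of the `SAMPLE` list, for example by iterating over an open file handle.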
