[ZBX-12957] zabbix agentd use active mode traffic abnormality Created: 2017 Oct 27  Updated: 2024 Apr 10  Resolved: 2018 Dec 21

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Proxy (P), Server (S)
Affects Version/s: 3.2.9
Fix Version/s: 4.0.0alpha9, 4.0 (plan)

Type: Problem report Priority: Trivial
Reporter: yinyaliang Assignee: Michael Veksler
Resolution: Fixed Votes: 0
Labels: agent, client_timediff, proxy_timediff
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: centos 6.5
Attachments: PNG File 20171028004134.png     PNG File 20171028004203.png     JPEG File 20171030102811.jpg     JPEG File 20171030103410.jpg     JPEG File 20171030103424.jpg     JPEG File 20171030155458_gather.jpg     JPEG File 20171030160705_localtime.jpg     JPEG File 20171030160800_uptime.jpg     PNG File get_client_timediff.png    
Issue Links:
Causes
causes ZBX-15301 System.localtime on Windows monotonic... Closed
Duplicate
is duplicated by ZBXNEXT-2476 item timestamps sent with zabbix_send... Closed
is duplicated by ZBX-12959 CLONE - zabbix agentd use active mode... Closed
is duplicated by ZBX-12958 CLONE - zabbix agentd use active mode... Closed
is duplicated by ZBX-13324 Use of uninitialized nanoseconds on Z... Closed
is duplicated by ZBX-13616 As Trapper processes adjust monitorin... Closed
Sub-task
part of ZBX-11883 Incorrect nanosecond calculation for ... Closed
Team: Team A
Sprint: Sprint 27, Sprint 28, Sprint 29, Sprint 30, Sprint 31, Sprint 32, Sprint 33, Sprint 34, Sprint 35, Sprint 36, Sprint 37, Sprint 38, Sprint 47, Dec 2018
Story Points: 1

 Description   

Hello. I use version 3.2. When the network is interrupted for some time, or when I stop MySQL for some time, the traffic reported on the agentd side suddenly increases to several times or even dozens of times my bandwidth.



 Comments   
Comment by Ingus Vilnis [ 2017 Oct 27 ]

Stop creating duplicate issues here! One report is enough, and duplicates will not speed up resolution.

When the network or database recovers and becomes available again, the active agent sends the collected values to the server. Because of the very low item update interval (5 seconds), values that are afterwards calculated as speed per second may not be computed correctly over such short intervals and can mathematically result in very large numbers.

You may want to increase the update interval of the items to somewhat higher values (e.g. 30 or 60 seconds) and try to reproduce again.
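
A rough sketch of the arithmetic behind this explanation, using hypothetical counter values and a hypothetical `per_second` helper (this is not Zabbix code): if two samples collected a minute apart end up stored with timestamps only one second apart, the calculated rate is inflated by the same factor.

```python
def per_second(prev_value: int, prev_clock: int, value: int, clock: int) -> float:
    """Rate of change between two counter samples: value delta / time delta."""
    elapsed = clock - prev_clock
    if elapsed <= 0:
        raise ValueError("samples must have strictly increasing timestamps")
    return (value - prev_value) / elapsed

# Hypothetical byte counter growing at a steady 1000 B/s, sampled 60 s apart.
normal_rate = per_second(0, 100, 60_000, 160)

# Same two counter values, but stored with timestamps only 1 s apart.
skewed_rate = per_second(0, 100, 60_000, 101)

print(normal_rate)  # 1000.0
print(skewed_rate)  # 60000.0
```

In this made-up case the graph would show 60 times the real traffic, which is the order of magnitude described in the report.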

Comment by yinyaliang [ 2017 Oct 29 ]

Sorry, this is the first time I have submitted a question on this platform and I am not very experienced with it. Indeed, my update interval is 5s. Thank you for your solution.

Comment by Glebs Ivanovskis (Inactive) [ 2017 Oct 29 ]

Could you please clarify what was down, database or the network between agent and server? Could you show us Zabbix server's process busyness graphs for that time interval? Are there any other active items monitored by this agent? I would love to see system.uptime or system.localtime Latest data around that time if you have those.

Comment by yinyaliang [ 2017 Oct 30 ]

This problem appeared twice: once because of a core network problem, and once when the database service was stopped for maintenance. There are some other active items on the agentd, but those items do not have the problem (for example, memory).

Comment by Glebs Ivanovskis (Inactive) [ 2017 Oct 30 ]

Oh, great! Can you please show data gathering process busyness graph too? If system.uptime and system.localtime are active, can you show their Values for the time interval around the moment when problem occurred, just like in 20171028004134.png?

Comment by yinyaliang [ 2017 Oct 30 ]

Thanks for your help. Attached are the data gathering process, localtime and uptime graphs.

Comment by Glebs Ivanovskis (Inactive) [ 2017 Oct 30 ]

Thank you!

Comment by Glebs Ivanovskis (Inactive) [ 2017 Nov 07 ]

Managed to get quite interesting results for system.localtime converted to Zabbix agent (active) type by suspending trappers and releasing one of them from time to time:

2017-11-07 00:43:53 1510008233
2017-11-07 00:42:53 1510008173
2017-11-07 00:42:02 1510007965
2017-11-07 00:42:00 1510008113
2017-11-07 00:42:00 1510008038
2017-11-07 00:38:23 1510007903
2017-11-07 00:37:41 1510007774
2017-11-07 00:37:41 1510007843
2017-11-07 00:37:41 1510007705
2017-11-07 00:34:02 1510007642
2017-11-07 00:33:02 1510007582
2017-11-07 00:32:02 1510007522

From the log file:

 19489:20171107:004202.024 trapper got '{"request":"agent data","data":[{"host":"Zabbix server","key":"system.cpu.switches","value":"45978494","clock":1510008038,"ns":14401032},{"host":"Zabbix server","key":"system.cpu.util[,idle]","value":"95.713509","clock":1510008038,"ns":14567055},{"host":"Zabbix server","key":"system.cpu.util[,interrupt]","value":"0.000000","clock":1510008038,"ns":14719329},{"host":"Zabbix server","key":"system.cpu.util[,softirq]","value":"0.010450","clock":1510008038,"ns":14868482},{"host":"Zabbix server","key":"system.cpu.util[,steal]","value":"0.000000","clock":1510008038,"ns":15025410},{"host":"Zabbix server","key":"system.cpu.util[,iowait]","value":"0.158836","clock":1510008038,"ns":15196637},{"host":"Zabbix server","key":"system.cpu.util[,system]","value":"1.519395","clock":1510008038,"ns":15352948},{"host":"Zabbix server","key":"system.cpu.util[,nice]","value":"0.025079","clock":1510008038,"ns":15507540},{"host":"Zabbix server","key":"system.cpu.util[,user]","value":"2.572730","clock":1510008038,"ns":15665049},{"host":"Zabbix server","key":"system.swap.size[,free]","value":"2161111040","clock":1510008038,"ns":15825709},{"host":"Zabbix server","key":"system.swap.size[,pfree]","value":"100.000000","clock":1510008038,"ns":15993437},{"host":"Zabbix server","key":"system.localtime","value":"1510008038","clock":1510008038,"ns":16147397},{"host":"Zabbix server","key":"system.cpu.intr","value":"6064854","clock":1510008038,"ns":16389943},{"host":"Zabbix server","key":"system.users.num","value":"1","clock":1510008038,"ns":19364872},{"host":"Zabbix server","key":"proc.num[]","value":"322","clock":1510008038,"ns":22950414}],"clock":1510008040,"ns":23786452}'
 
$ date -d @1510008038
Tue Nov  7 00:40:38 EET 2017

As we can see, the value 1510008038 originally had the timestamp 1510008038 (00:40:38), but it was received by Zabbix server at a different time, and as a result its timestamp was adjusted to 00:42:00.
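
The numbers above are consistent with a simple correction of the form "value clock plus the difference between server time and the request's top-level clock". The sketch below only reproduces that arithmetic from the log line; the `adjust_timestamp` helper is hypothetical, the formula is an assumption rather than something taken from Zabbix sources, and EET is modelled as a fixed UTC+2 offset.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC+2 offset standing in for EET (assumption; valid for November).
EET = timezone(timedelta(hours=2))

def adjust_timestamp(value_clock: int, request_clock: int, server_clock: int) -> int:
    """Shift a value's timestamp by the sender/server clock difference."""
    timediff = server_clock - request_clock
    return value_clock + timediff

value_clock = 1510008038    # "clock" of the system.localtime value in the log
request_clock = 1510008040  # top-level "clock" of the "agent data" request
# Trapper log timestamp 20171107:004202, interpreted as EET.
server_clock = int(datetime(2017, 11, 7, 0, 42, 2, tzinfo=EET).timestamp())

adjusted = adjust_timestamp(value_clock, request_clock, server_clock)
print(server_clock - request_clock)  # 82
print(datetime.fromtimestamp(adjusted, EET).strftime("%Y-%m-%d %H:%M:%S"))
# 2017-11-07 00:42:00
```

The 82-second shift is roughly how long the request waited for a suspended trapper, so every value in that batch lands near the time of receipt instead of the time of collection.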

Comment by Glebs Ivanovskis (Inactive) [ 2017 Nov 07 ]

The problem is that Zabbix server tries to do ntpd's work. sasha strongly insists it needs to. Discussions on the topic happen in ZBXNEXT-3298.

Comment by Sergejs Paskevics [ 2018 Jun 12 ]

Successfully tested

Comment by Michael Veksler [ 2018 Jun 19 ]

Available in 4.0.0alpha9 r82013.

Comment by richlv [ 2018 Aug 02 ]

Looking at the issue comments, it's unclear what was done here. Could you please clarify what exact changes were in the scope here?

Comment by dimir [ 2018 Nov 22 ]

As I understand it, we removed the time adjustments completely. The difference is still calculated, but only so that it can be printed in DEBUG mode.

Comment by richlv [ 2018 Nov 22 ]

Michael, can you please confirm or deny the changes mentioned?
Where has this been documented?

Comment by Michael Veksler [ 2018 Nov 22 ]

According to the decision, the protocol is not changed and the difference is calculated only for debug logging.

Documentation:

Comment by richlv [ 2018 Nov 22 ]

Thank you, Michael. While you link to some decision, that link only leads back to this page.
I would guess this is another case of Zabbix closing up and the link is to a secret comment, is that correct?

Comment by dimir [ 2018 Nov 22 ]

Yes, that link points to very secret internal information where different approaches to solving this issue are listed and the decision is made.

Comment by Glebs Ivanovskis [ 2018 Dec 12 ]

According to discussion in ZBX-15301 this change hasn't been documented properly:

  1. changes affect active agents too, not just proxies;
  2. system.localtime as an active check is now useless for the fuzzytime() trigger function; users will need a new way of monitoring the time difference between server and agents (e.g. monitoring NTP service log files or incorporating a time-based function in the trigger expression).
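
To illustrate point 2, here is a simplified model of the comparison. It assumes fuzzytime() checks the item value (a Unix timestamp) against the timestamp stored with that value; the function below and the 300-second clock offset are hypothetical and are not taken from Zabbix sources.

```python
def fuzzytime(value: int, clock: int, max_diff: int) -> int:
    """Return 1 if value is within max_diff seconds of the value's clock, else 0."""
    return 1 if abs(value - clock) <= max_diff else 0

agent_time = 1510008038          # what system.localtime returns on the agent
server_time = agent_time + 300   # hypothetical server clock, 5 minutes ahead

# Passive check: the server stamps the value with its own clock,
# so the 300 s drift is visible.
passive_result = fuzzytime(agent_time, server_time, 60)

# Active check without timestamp adjustment: the agent stamps the value
# with its own clock, so value and clock always agree.
active_result = fuzzytime(agent_time, agent_time, 60)

print(passive_result)  # 0
print(active_result)   # 1
```

Under these assumptions the active variant reports "in sync" no matter how far the agent's clock has drifted, which is why a different way of monitoring the time difference is needed.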

Comment by dimir [ 2018 Dec 13 ]

Correct. Re-opening on behalf of previous comment.

Comment by dimir [ 2019 Mar 15 ]

Additional documentation changes:

  • Timestamp correction: note that the changes also affect active agents and sender
  • system.localtime: description stating that it should be used in passive mode
  • fuzzytime: description stating that "system.localtime" has to be received through an agent in passive mode
  • Example 10: description stating that "system.localtime" has to be received through an agent in passive mode
  • Time synchronization in installation requirements

Comment by Glebs Ivanovskis [ 2019 Mar 17 ]

Good job, dimir! Consider adding a link to the explanation of what a passive check is.

Comment by dimir [ 2019 Mar 17 ]

The time synchronization link?

Comment by Glebs Ivanovskis [ 2019 Mar 17 ]

No, I mean that the system.localtime, fuzzytime() and trigger expression example descriptions mention "passive check". A reader may not know what that is.

Comment by dimir [ 2019 Mar 17 ]

There, thanks for pointing that out!

Generated at Tue Apr 23 20:59:10 EEST 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.