ZABBIX BUGS AND ISSUES

Zabbix Proxy fails to send some data

Details

  • Zabbix ID:
    RTD

Description

Zabbix Proxy 1.8.x fails to send some data. It depends on how long a transaction takes to commit in the DB.

In Zabbix Proxy...
Process A: DCsync_history() opens a transaction that calls DCmass_proxy_add_history() and DCmass_proxy_update_items(), but DCmass_proxy_update_items() is delayed for some reason.
Process B: also calls DCmass_proxy_add_history() in its own transaction and commits, while process A is still not finished.
Process C: tries to send data to the server. It reads from the DB the data committed by process B, but because process A has not committed yet, it cannot read process A's data.
After that, process A commits. But since lastid has already been advanced, process A's data is never sent to the server.
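The failure mode can be illustrated without a database. Below is a minimal, self-contained simulation in plain C (not Zabbix code; the row ids and the "sender" loop only model the proxy_history/lastid logic): ids 1..3 belong to the delayed transaction A, ids 4..5 to transaction B, which commits first.

  /*
   * Simulation of the race: the sender only sees committed rows with
   * id > lastid and then advances lastid to the highest id it sent,
   * so rows committed later with lower ids are skipped forever.
   */
  #include <stdio.h>

  #define NROWS 5

  static int committed[NROWS + 1];      /* committed[id] = 1 once visible in the DB   */
  static unsigned long long lastid;     /* ids.history_lastid equivalent on the proxy */

  static void sender_run(void)
  {
      unsigned long long id, max_sent = lastid;

      for (id = lastid + 1; id <= NROWS; id++)
      {
          if (!committed[id])
              continue;                 /* invisible: its transaction is still open */
          printf("sent row %llu\n", id);
          if (id > max_sent)
              max_sent = id;
      }
      lastid = max_sent;                /* proxy_set_lastid() equivalent */
  }

  int main(void)
  {
      int id;

      committed[4] = committed[5] = 1;  /* B (ids 4..5) commits while A (ids 1..3) is open */
      sender_run();                     /* sends 4 and 5, lastid becomes 5                 */

      for (id = 1; id <= 3; id++)       /* A finally commits its lower ids */
          committed[id] = 1;
      sender_run();                     /* nothing above 5: rows 1..3 are never sent */

      printf("lastid=%llu; rows 1..3 were lost\n", lastid);
      return 0;
  }

Running it prints only rows 4 and 5; rows 1..3 are never sent, which matches the description above.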

So I think that to resolve this problem, a WRITE table lock should be taken around DCmass_proxy_add_history() and DCmass_proxy_update_items().

Activity

Bashman added a comment - - edited

In my case, I'm using version 1.8.15, and the Zabbix proxy doesn't send all data to the server.

  9817:20130508:121800.348 Trapper got [{
        "request":"history data"}] len 28
  9817:20130508:121800.348 In send_proxyhistory()
  9817:20130508:121800.348 In proxy_get_history_data() table:'proxy_history'
  9817:20130508:121800.348 In proxy_get_lastid() [proxy_history.history_lastid]
  9817:20130508:121800.349 query [txnlev:0] [select nextid from ids where table_name='proxy_history' and field_name='history_lastid']
  9814:20130508:121800.349 zbx_recv_response() '{
        "response":"success"}'
  9814:20130508:121800.349 End of zbx_recv_response():SUCCEED
  9814:20130508:121800.349 End of send_host_availability()
  9817:20130508:121800.349 End of proxy_get_lastid():601142242
  9817:20130508:121800.349 query [txnlev:0] [select p.id,h.host,i.key_,p.clock,p.timestamp,p.source,p.severity,p.value,p.logeventid from hosts h,items i,proxy_history p where h.hostid=i.hostid and i.itemid=p.itemid and p.id>601142242 order by p.id limit 1000]
  9817:20130508:121800.350 End of proxy_get_history_data():36 lastid:601142278
  9817:20130508:121800.350 In zbx_recv_response()
  9816:20130508:121800.351 Trapper got [{
        "request":"discovery data"}] len 30
  9816:20130508:121800.352 In send_discovery_data()
  9816:20130508:121800.352 In proxy_get_history_data() table:'proxy_dhistory'
  9816:20130508:121800.352 In proxy_get_lastid() [proxy_dhistory.dhistory_lastid]
  9816:20130508:121800.352 query [txnlev:0] [select nextid from ids where table_name='proxy_dhistory' and field_name='dhistory_lastid']
  9817:20130508:121800.352 zbx_recv_response() '{
        "response":"success"}'
  9817:20130508:121800.352 End of zbx_recv_response():SUCCEED
  9817:20130508:121800.352 In proxy_set_lastid() [proxy_history.history_lastid:601142278]
  9817:20130508:121800.352 query [txnlev:0] [select 1 from ids where table_name='proxy_history' and field_name='history_lastid']
  9816:20130508:121800.352 End of proxy_get_lastid():777
  9816:20130508:121800.352 query [txnlev:0] [select p.id,p.clock,p.druleid,p.dcheckid,p.type,p.ip,p.port,p.key_,p.value,p.status from proxy_dhistory p where p.id>777 order by p.id limit 1000]
  9817:20130508:121800.353 query without transaction detected
  9817:20130508:121800.353 query [txnlev:0] [update ids set nextid=601142278 where table_name='proxy_history' and field_name='history_lastid']
  9816:20130508:121800.353 End of proxy_get_history_data():0 lastid:0
  9816:20130508:121800.353 In zbx_recv_response()
  9818:20130508:121800.353 Trapper got [{
        "request":"auto registration"}] len 33
  9818:20130508:121800.353 In send_areg_data()
  9818:20130508:121800.353 In proxy_get_history_data() table:'proxy_autoreg_host'
  9818:20130508:121800.353 In proxy_get_lastid() [proxy_autoreg_host.autoreg_host_lastid]
  9818:20130508:121800.353 query [txnlev:0] [select nextid from ids where table_name='proxy_autoreg_host' and field_name='autoreg_host_lastid']
  9816:20130508:121800.353 zbx_recv_response() '{
        "response":"success"}'
  9816:20130508:121800.353 End of zbx_recv_response():SUCCEED
  9816:20130508:121800.353 End of send_discovery_data()
  9817:20130508:121800.354 End of proxy_set_lastid()
  9817:20130508:121800.354 End of send_proxyhistory()
  9818:20130508:121800.354 End of proxy_get_lastid():10306
  9818:20130508:121800.354 query [txnlev:0] [select p.id,p.clock,p.host from proxy_autoreg_host p where p.id>10306 order by p.id limit 1000]
  9818:20130508:121800.354 End of proxy_get_history_data():0 lastid:0
  9818:20130508:121800.354 In zbx_recv_response()
  9818:20130508:121800.354 zbx_recv_response() '{
        "response":"success"}'
  9818:20130508:121800.355 End of zbx_recv_response():SUCCEED
  9818:20130508:121800.355 End of send_areg_data()
mysql> select count(*) from proxy_history;
+----------+
| count(*) |
+----------+
|    15447 |
+----------+
1 row in set (0.01 sec)

Could be related to ZBX-5448.

Bashman added a comment -

It doesn't happen with ICMP.

Bashman added a comment -

What is the socket size limit? I created a new zabbix_proxy with fewer hosts and it works fine.

MATSUDA Daiki added a comment -

The important thing is that it occurs after a commit delay, as I wrote. So, did you see a commit delay?
At least Zabbix Proxy 1.8.16, and probably also 2.0.x, does not have a good algorithm for this; it should wait.

Bashman added a comment -

I don't see any delay after the commit:

 9825:20130508:121929.942 Syncing ...
  9825:20130508:121929.942 In DCsync_history() history_first:1419 history_num:27
  9825:20130508:121929.942 In DCinit_nextchecks()
  9825:20130508:121929.942 query [txnlev:1] [begin;]
  9824:20130508:121929.943 Syncing ...
  9824:20130508:121929.943 In DCsync_history() history_first:1446 history_num:0
  9824:20130508:121929.943 history syncer #3 spent 0.000023 seconds while processing 0 items
  9824:20130508:121929.943 sleeping for 5 seconds
  9825:20130508:121929.943 In DCmass_proxy_add_history()
  9825:20130508:121929.943 query [txnlev:1] [insert into proxy_history (itemid,clock,value) values (49192,1368008365,'33.000000'),(58183,1368008366,'1469584703.000000'),(53185,1368008368,'0.209273');
insert into proxy_history (itemid,clock,value) values (51935,1368008364,'3937898075'),(39832,1368008365,'0'),(38753,1368008365,'5537877'),(49221,1368008362,'1'),(60111,1368008362,'1'),(44452,1368008362,'1'),(42562,1368008362,'1'),(44272,1368008362,'1'),(49852,1368008362,'1'),(50032,1368008362,'1'),(66052,1368008362,'1'),(48951,1368008362,'1'),(50031,1368008362,'1'),(50570,1368008362,'0'),(49184,1368008366,'26'),(66005,1368008367,'0'),(39065,1368008367,'3570903802'),(43824,1368008365,'1'),(50574,1368008365,'0'),(50035,1368008365,'1'),(50575,1368008365,'0'),(52014,1368008365,'1'),(50034,1368008365,'1'),(50033,1368008365,'1');
]
  9823:20130508:121929.943 Syncing ...
  9823:20130508:121929.943 In DCsync_history() history_first:1446 history_num:0
  9823:20130508:121929.943 history syncer #2 spent 0.000034 seconds while processing 0 items
  9823:20130508:121929.943 sleeping for 5 seconds
  9825:20130508:121929.944 End of DCmass_proxy_add_history()
  9825:20130508:121929.944 In DCmass_proxy_update_items()
  9825:20130508:121929.944 End of DCmass_proxy_update_items()
  9825:20130508:121929.944 query [txnlev:1] [commit;]
  9825:20130508:121929.944 In DCflush_nextchecks()
  9825:20130508:121929.944 history syncer #4 spent 0.001907 seconds while processing 27 items
  9825:20130508:121929.944 sleeping for 5 seconds
  9822:20130508:121929.996 Syncing ...
  9822:20130508:121929.997 In DCsync_history() history_first:1446 history_num:0
  9822:20130508:121929.997 history syncer #1 spent 0.000033 seconds while processing 0 items
  9822:20130508:121929.997 sleeping for 5 seconds

The problem I see is that not all the proxy data is sent to the server. Maybe it is due to the high amount of data to be sent:

Host count: 381
Item count: 2378
Required performance (vps): 10.61

I tried with a new proxy:

Host count: 4
Item count: 80
Required performance (vps): 0.37

And everything is ok.

MATSUDA Daiki added a comment - - edited

What do you think about this problem? You only paste successful logs and do not try to check the source code!

In fact, the problem occurred in our customer's environment, where the DB often delays a process's commit. And I confirmed it against the source code and their logs, for 1.8.16 and 2.0.5. Because the history syncers work asynchronously, this race is possible. I have attached the patches.

Bashman added a comment -

All I know is that the logs apparently look correct, but the data the proxy must send to the server is not complete; there is missing data. However, if the proxy has fewer hosts, it is able to send all the data to the server.

MATSUDA Daiki added a comment -

No, as I wrote at the beginning:

1. Process A's transaction starts and takes last value N.
2. Process B's transaction starts and takes last value N' (N' > N).
3. Process A's transaction has not ended yet, but process B's transaction is committed.
4. Process C starts sending the data from N + 1 up to N' to the server.
5. Process A's transaction is committed.
6. The data up to N is never sent.

Do you understand? And have you seen this phenomenon?

Bashman added a comment -

Yeah, I understand. You're totally right:

mysql> SELECT max(id) from proxy_history;
+---------+
| max(id) |
+---------+
| 2906356 |
+---------+
1 row in set (0.00 sec)

mysql> select * from ids;
+--------+--------------------+---------------------+---------+
| nodeid | table_name         | field_name          | nextid  |
+--------+--------------------+---------------------+---------+
|      0 | proxy_autoreg_host | autoreg_host_lastid |   66285 |
|      0 | proxy_history      | history_lastid      | 2906383 |
+--------+--------------------+---------------------+---------+
2 rows in set (0.00 sec)
AndreaConsadori added a comment -

I also see this issue, because I have an unstable DSL connection that loses packets every 5 minutes.

Andris Mednis added a comment -

I looked into the proposed patch for 2.0.6. It sets up a small shared memory segment that keeps a counting semaphore: "how many DB syncers are doing their transactions at the moment". Every "DB syncer" process, around each transaction:

  • obtains a mutex lock, increments the semaphore by 1, releases the lock,
  • does its transaction,
  • obtains the mutex lock, decrements the semaphore by 1, releases the lock.

The "Data sender" process observes the semaphore and waits until it becomes 0 (no DB syncer is in a transaction). Waiting means sleeping for 0.1 second between checks.
I wonder how this will behave on heavily loaded proxies. Is there a guarantee that the "Data sender" will get access to the data in a reasonable time? Some time also passes between the "Data sender" seeing that the semaphore is 0 and the data actually being read by the "Data sender". As this is not an atomic operation, in the meantime the "DB syncer" processes can do something, maybe even a transaction, who knows.
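For illustration only, here is a sketch of the described guard, written with plain pthreads inside a single process (the real patch reportedly keeps the counter in a small shared memory segment so that the separate DB syncer and data sender processes can share it; none of the names below come from the Zabbix source):

  #include <pthread.h>
  #include <unistd.h>

  static pthread_mutex_t guard = PTHREAD_MUTEX_INITIALIZER;
  static int syncers_in_tx = 0;        /* "how many DB syncers are in a transaction" */

  static void syncer_tx_begin(void)    /* a DB syncer calls this right before "begin;" */
  {
      pthread_mutex_lock(&guard);
      syncers_in_tx++;
      pthread_mutex_unlock(&guard);
  }

  static void syncer_tx_end(void)      /* ...and this right after "commit;" or "rollback;" */
  {
      pthread_mutex_lock(&guard);
      syncers_in_tx--;
      pthread_mutex_unlock(&guard);
  }

  static void sender_wait_for_idle(void)   /* data sender, before reading proxy_history */
  {
      int busy;

      for (;;)
      {
          pthread_mutex_lock(&guard);
          busy = syncers_in_tx;
          pthread_mutex_unlock(&guard);

          if (0 == busy)
              return;                  /* no syncer is mid-transaction right now */

          usleep(100000);              /* poll every 0.1 second, as described above */
      }
  }

The window described above is visible here: between sender_wait_for_idle() returning and the sender's SELECT actually running, nothing prevents a syncer from calling syncer_tx_begin() again.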

Andris Mednis added a comment -

Proposed solution - modify functions proxy_get_history_data() and, maybe, proxy_get_history_data_simple():

  • when reading data from the 'proxy_history' table, check whether "id" grows in steps of 1.
  • if yes, then good: there are no missing records and we can proceed with sending the data to the master server.
  • if no, some records are missing. Maybe they will be committed in a moment by one of the DB syncers; maybe they will never arrive, if a rollback occurred in a DB syncer. In this case the Data sender sleeps a short time (0.1 second?) and re-reads the data from the 'proxy_history' table starting from the old "lastid". This gives delayed records a chance to be included in the batch sent to the master server. If there are still missing records, we "write them off" and proceed with sending what is available (see the sketch below).

The Data sender cannot send missing records later and cannot wait too long for them.
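A sketch of just the gap check under the assumptions above (plain C, hypothetical function name, not the actual implementation): the ids returned by "id > lastid order by id" should be lastid+1, lastid+2, ..., so any hole means a record is still uncommitted or was rolled back.

  #include <stddef.h>

  /* Returns 1 if the ids just read form the unbroken sequence
   * lastid+1, lastid+2, ..., or 0 if at least one record is missing. */
  static int ids_are_contiguous(const unsigned long long *ids, size_t num,
          unsigned long long lastid)
  {
      size_t i;

      for (i = 0; i < num; i++)
      {
          if (ids[i] != lastid + 1 + i)
              return 0;    /* gap: sleep ~0.1 s and re-read from the same lastid once */
      }

      return 1;            /* contiguous: safe to send and advance lastid */
  }

On a 0 result the sender would sleep, re-read from the old lastid once, and then send whatever is available, as proposed.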

MATSUDA Daiki added a comment -

> I looked into proposed patch for 2.0.6. It sets up a small shared memory segment for keeping a counting semaphore "How many DB syncers are doing their transactions at the moment". Every "DB syncer" process before starting a transaction:
>
> obtains a mutex lock, increments the semaphore by 1, releases the lock,
> does its transaction,
> obtains the mutex lock, decrements the semaphore by 1, releases the lock.

> The "Data sender" process observes the semaphore and waits until it becomes 0 (no one DB syncer is in transaction). Waiting is a sleeping for 0.1 second.

Yes.

> I wonder how this will behave on a heavy loaded proxies ?

Maybe, yes. But it is usual behaviour that the 2nd transaction is committed faster than the 1st transaction.

> Is there a guarantee that "Data sender" will get access to data in a reasonable time ?

Maybe not, because I am not sure what a 'reasonable time' means here. But there is no problem at the moment the Data sender accesses the DB, because no DB syncer is in a transaction then.

> Some time passes between the "Data sender" seeing that the semaphore is 0 and proceeding with reading data until the data is acquired by the "Data sender". As it is not an atomic operation, in the meantime "DB syncer" processes can do something, maybe even a transaction, who knows.

I know about the potential problem of the Data sender waiting forever. So, I attached a patch to limit the waiting time.
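A bounded-wait variant of sender_wait_for_idle() from the earlier sketch (reusing its guard mutex and syncers_in_tx counter; the actual timeout value in the attached patch is not stated in this ticket, so the parameter here is only illustrative):

  #include <time.h>

  static int sender_wait_for_idle_bounded(int max_wait_seconds)
  {
      time_t deadline = time(NULL) + max_wait_seconds;
      int busy;

      do
      {
          pthread_mutex_lock(&guard);
          busy = syncers_in_tx;
          pthread_mutex_unlock(&guard);

          if (0 == busy)
              return 0;            /* idle: go read proxy_history */

          usleep(100000);          /* 0.1 s poll, as in the unbounded version */
      }
      while (time(NULL) < deadline);

      return -1;                   /* gave up waiting; the sender proceeds anyway */
  }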

Alexander Vladishev added a comment - - edited

Changes in the development branch svn://svn.zabbix.com/branches/dev/ZBX-6249-trunk were successfully reviewed, with a small fix. Please review r37798.
Andris Mednis: Thanks! Reviewed and accepted.

Andris Mednis added a comment -

Available in development branches:

  • svn://svn.zabbix.com/branches/dev/ZBX-6249-18 (for version 1.8)
  • svn://svn.zabbix.com/branches/dev/ZBX-6249-trunk
Alexander Vladishev added a comment -

svn://svn.zabbix.com/branches/dev/ZBX-6249-18 was successfully tested!

Alexander Vladishev added a comment -

svn://svn.zabbix.com/branches/dev/ZBX-6249-trunk was successfully tested!

Andris Mednis added a comment - - edited

Available in pre-1.8.18 r37965, pre-2.0.9 r38027 and pre-2.1.3 (trunk) r37968.

Anton Samets added a comment -

Guys, this patch is awesome! Right now in "Administration - Queue" we see only green zeroes.
Before this there were 10-30 thousand items with a delay of more than 1 minute. The permanent I/O in htop on the main zabbix-server is gone.
Recommended. Right now we are using 2.0.8 with this patch.

Andris Mednis added a comment -

Are you using a patch from the ZBX-6249 attachments contributed by MATSUDA Daiki, or the one from 2.0.9rc1?
