[ZBX-11525] Log/eventlog item values sometimes are displayed in wrong order. Created: 2016 Nov 25  Updated: 2017 May 30  Resolved: 2017 Jan 16

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 3.2.1
Fix Version/s: 3.0.8rc1, 3.2.4rc1, 3.4.0alpha1

Type: Incident report Priority: Critical
Reporter: Ekaterina Letova Assignee: Unassigned
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File 25.11.png     PNG File compare2.png     Text File data_values.txt     PNG File eventlog.png    
Issue Links:
Duplicate
is duplicated by ZBX-11839 Trigger evaluation wasn't performed c... Closed
is duplicated by ZBX-11614 Events: wrong calculation of ITEM.VAL... Closed

 Description   

If log messages were generated one after another in a short time period and have equal Timestamp, they will be ordered incorrectly (see screenshot).

Steps to reproduce:

Create log/eventlog item (for example we have eventlog[System,,,,,10,skip])
Generate several messages for the chosen log.
Note: messages should be generated one after another, 3-5 in 30 seconds
Open item history for your log/eventlog item (Monitoring > Latest data > History> 500 latest values)

Result:
Values with the same Timestamp are ordered in a wrong way (according to "Local time" field) - not from the older to the newer.

Notes:

Most likely this behavior appeared after Database changes (p. 5.19) in version 3.2.0
Also this is noticeable if agent was disabled for some time and after enabling there are many messages for log/eventlog item to process.



 Comments   
Comment by Glebs Ivanovskis (Inactive) [ 2016 Nov 25 ]

Is proxy involved? If it is, then it could be a scenario similar to ZBX-9939.

Order may be broken in agent's send buffer. Actually, order is preserved there. By the way, which agent version is it?

I don't think database changes you mentioned matter here.

Comment by Ekaterina Letova [ 2016 Nov 25 ]

No, proxy is not ivolved.

Actually, order is preserved there. By the way, which agent version is it?

We use agent 3.2.1, but the problem is not in agent.

Steps to reproduce using zabbix_sender:

  1. Create txt file (data_values.txt) with the following contents:
    "test_host" "log[D:\datalog\test.log]" 1480080946 "2016-11-25 15:10:49,187 WARN  [Test] TestMessage1"
    "test_host" "log[D:\datalog\test.log]" 1480080946 "2016-11-25 15:11:12,187 WARN  [Test] TestMessage2"
    "test_host" "log[D:\datalog\test.log]" 1480080946 "2016-11-25 15:12:13,187 WARN  [Test] TestMessage3"
    "test_host" "log[D:\datalog\test.log]" 1480080946 "2016-11-25 15:13:44,187 WARN  [Test] TestMessage4"
    "test_host" "log[D:\datalog\test.log]" 1480080946 "2016-11-25 15:14:05,187 WARN  [Test] TestMessage5"
    "test_host" "log[D:\datalog\test.log]" 1480080946 "2016-11-25 15:15:19,187 WARN  [Test] TestMessage6"
    
  2. Send command:
    zabbix_sender -z 127.0.0.1 -i /tmp/data_values.txt -T

Result:
In the history of item "log[D:\datalog\test.log]" values with timestamp 16:35 are ordered exactly as in data_values.txt, though was expected reverse order.

Notes:

  1. Triggers work as expected, for example trigger with strlen function will analyse an actual last value (TestMessage6 in my example). So the problem only with the order in item history page.
  2. In zabbix v. 3.0.4 the order for the same values from data_values.txt is correct:
Comment by Glebs Ivanovskis (Inactive) [ 2016 Nov 25 ]

Could you also provide results of the following query:

select h.clock,h.ns,h.value from history_<text, str or log, depending on item type> as h,items as i where i.key_='<item key>' and i.itemid=h.itemid;
Comment by Ekaterina Letova [ 2016 Nov 28 ]
select h.clock,h.ns,h.value from history_log as h,items as i where i.key_='log[D:\datalog\test2.log]' and i.itemid=h.itemid;

v. 3.2.1

clock ns value
1480338181 188523 2016-11-28 15:21:12,187 WARN [Test] Test2
1480338181 188523 2016-11-28 15:22:13,187 WARN [Test] Test3
1480338181 188523 2016-11-28 15:23:44,187 WARN [Test] Test4
1480338181 188523 2016-11-28 15:24:05,187 WARN [Test] Test5
1480338181 188523 2016-11-28 15:25:19,187 WARN [Test] Test6
1480338181 188523 2016-11-28 15:20:49,187 WARN [Test] Test1
1480337868 139256 2016-11-28 15:10:49,187 WARN [Test] TestMessage1
1480337868 139256 2016-11-28 15:11:12,187 WARN [Test] TestMessage2
1480337868 139256 2016-11-28 15:12:13,187 WARN [Test] TestMessage3
1480337868 139256 2016-11-28 15:13:44,187 WARN [Test] TestMessage4
1480337868 139256 2016-11-28 15:14:05,187 WARN [Test] TestMessage5
1480337868 139256 2016-11-28 15:15:19,187 WARN [Test] TestMessage6

v. 3.0.4

clock ns value
1480337868 156319 2016-11-28 15:10:49,187 WARN [Test] TestMessage1
1480337868 156319 2016-11-28 15:11:12,187 WARN [Test] TestMessage2
1480337868 156319 2016-11-28 15:12:13,187 WARN [Test] TestMessage3
1480337868 156319 2016-11-28 15:13:44,187 WARN [Test] TestMessage4
1480337868 156319 2016-11-28 15:14:05,187 WARN [Test] TestMessage5
1480337868 156319 2016-11-28 15:15:19,187 WARN [Test] TestMessage6
1480338181 128743 2016-11-28 15:20:49,187 WARN [Test] Test1
1480338181 128743 2016-11-28 15:21:12,187 WARN [Test] Test2
1480338181 128743 2016-11-28 15:22:13,187 WARN [Test] Test3
1480338181 128743 2016-11-28 15:23:44,187 WARN [Test] Test4
1480338181 128743 2016-11-28 15:24:05,187 WARN [Test] Test5
1480338181 128743 2016-11-28 15:25:19,187 WARN [Test] Test6

Updated screenshot:

Comment by Glebs Ivanovskis (Inactive) [ 2016 Nov 28 ]

I guess the problem was introduced earlier (probably with ZBX-8476) and the root cause is this minuscule change:

Index: trunk/src/libs/zbxdbhigh/proxy.c
===================================================================
--- trunk/src/libs/zbxdbhigh/proxy.c	(revision 50444)
+++ trunk/src/libs/zbxdbhigh/proxy.c	(revision 50445)
@@ -2369,7 +2372,7 @@
 				}
 			}
 			else
-				av->ts.ns = -1;
+				av->ts.ns = proxy_timediff.ns;
 		}
 		else
 			zbx_timespec(&av->ts);

This makes part of DCcheck_ns() which ensures uniqueness of clock-ns pairs logically dead:

static void	DCcheck_ns(zbx_timespec_t *ts)
{
	if (ts->ns >= 0)
		return;

	...
}
Comment by dimir [ 2016 Dec 13 ]

We decided to fix it by auto-incrementing the nanosecond part of timediff for every value in case of data from sender with timestamps. The fix will be available for versions 3.0 and later.

Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-11525

Comment by Aleksandrs Saveljevs [ 2016 Dec 19 ]

ZBX-11614 is likely to be a duplicate of this issue. Please verify.

Comment by Aleksandrs Saveljevs [ 2016 Dec 20 ]

(1) The fix looks plausible for this particular case, but it seems to invalidate the fix in ZBX-2805 (which introduced DCcheck_ns() function) and bring that old problem back, considering that we still have ZBX-9229. Also, I have not found a place where ns field is set to a negative value, which makes DCcheck_ns() function useless otherwise, as glebs.ivanovskis pointed out.

<dimir> RESOLVED in r64645

asaveljevs As discussed, it seems to return ZBX-2805 back, because we can have the same client_timediff in different connections. REOPENED.

<dimir> Actually I see no way to guarantee this. Checking this from value cache is an overhead. Another option could be to save the timestamp of the last value, but this will not help if we receive a value with an older timestamp in between, or more than 2 values with the same clock,ns.

RESOLVED

asaveljevs Still, it becomes a regression. For instance, if we repeat the same scenario in ZBX-2805 with a Zabbix 1.8 agent (monitoring a log file with 3 entries written simultaneously), then we get the following data in the database:

mysql> select * from history_log where itemid=34855;
+-----+--------+------------+-----------+--------+----------+--------+------------+----+
| id  | itemid | clock      | timestamp | source | severity | value  | logeventid | ns |
+-----+--------+------------+-----------+--------+----------+--------+------------+----+
| 187 |  34855 | 1482827822 |         0 |        |        0 | Line 1 |          0 |  0 |
| 188 |  34855 | 1482827822 |         0 |        |        0 | Line 2 |          0 |  0 |
| 189 |  34855 | 1482827822 |         0 |        |        0 | Line 3 |          0 |  1 |
+-----+--------+------------+-----------+--------+----------+--------+------------+----+
3 rows in set (0.00 sec)

asaveljevs Decided to check how it affects zabbix_sender. It seems it got "broken" a long time ago. For instance, suppose we use unmodified Zabbix 3.0 server (from the stable branch, not this branch) and use zabbix_sender 2.2 to send values:

$ for i in {1..5}; do echo "ZBX-11525 trapper 1482835555 Value $i" | bin/zabbix_sender -z 127.0.0.1 -i - -T; done

This gives the following data in the database:

| 4468 |  34854 | 1482835555 |         0 |        |        0 | Value 1  |          0 |         0 |
| 4469 |  34854 | 1482835555 |         0 |        |        0 | Value 2  |          0 |         0 |
| 4470 |  34854 | 1482835555 |         0 |        |        0 | Value 3  |          0 |         0 |
| 4471 |  34854 | 1482835555 |         0 |        |        0 | Value 4  |          0 |         0 |
| 4472 |  34854 | 1482835555 |         0 |        |        0 | Value 5  |          0 |         0 |

Doing the same with zabbix_sender 3.0:

| 4473 |  34854 | 1482835555 |         0 |        |        0 | Value 1  |          0 |     74252 |
| 4474 |  34854 | 1482835555 |         0 |        |        0 | Value 2  |          0 |     96549 |
| 4475 |  34854 | 1482835555 |         0 |        |        0 | Value 3  |          0 |     64633 |
| 4476 |  34854 | 1482835555 |         0 |        |        0 | Value 4  |          0 |     43515 |
| 4477 |  34854 | 1482835555 |         0 |        |        0 | Value 5  |          0 |     43217 |

It can be seen that later values can have lower nanoseconds (last column) than earlier values. However, this should be left for ZBXNEXT-2476.

asaveljevs As discussed, since there have always been ways to achieve the same timestamps, no objections for moving this forward. CLOSED.

Comment by dimir [ 2016 Dec 27 ]

(2) [PS] Fix for trunk had lots of conflicts. Fixed in svn://svn.zabbix.com/branches/dev/ZBX-11525-trunk.

RESOLVED

asaveljevs Trunk (version 3.3.0) should probably not have a ChangeLog entry for this issue.

<dimir> Thanks, fixed.

asaveljevs Consider process_proxy_data() function. There, we call process_proxy_history_data_33(), which can modify the value of client_timediff, after which we call process_discovery_data_contents() and process_auto_registration_contents() with that modified value. This does not seem to be correct - the original value should be used instead.

asaveljevs While at it, please take a look at stylistic suggestions in r64750. REOPENED.

<dimir> Looks good. RESOLVED in r64764,r64777

vso CLOSED

Comment by Vladislavs Sokurenko [ 2017 Jan 04 ]

(3) [F] Nanoseconds are good now, however frontend does not take those into account when sorting.

Sort options

$options['sortfield'] = ['itemid', 'clock'];

Array

Parameters:
Array
(
    [history] => 4
    [itemids] => Array
        (
            [0] => 23670
        )

    [output] => extend
    [sortorder] => DESC
    [limit] => 500
    [sortfield] => Array
        (
            [0] => itemid
            [1] => clock
        )

)
	Result:
Array
(
    [0] => Array
        (
            [itemid] => 23670
            [clock] => 1480080946
            [value] => 2016-11-25 15:10:49,187 WARN  [Test] TestMessage1
            [ns] => 63392
        )

    [1] => Array
        (
            [itemid] => 23670
            [clock] => 1480080946
            [value] => 2016-11-25 15:11:12,187 WARN  [Test] TestMessage2
            [ns] => 63393
        )

    [2] => Array
        (
            [itemid] => 23670
            [clock] => 1480080946
            [value] => 2016-11-25 15:12:13,187 WARN  [Test] TestMessage3
            [ns] => 63394
        )

    [3] => Array
        (
            [itemid] => 23670
            [clock] => 1480080946
            [value] => 2016-11-25 15:13:44,187 WARN  [Test] TestMessage4
            [ns] => 63395
        )

    [4] => Array
        (
            [itemid] => 23670
            [clock] => 1480080946
            [value] => 2016-11-25 15:14:05,187 WARN  [Test] TestMessage5
            [ns] => 63396
        )

    [5] => Array
        (
            [itemid] => 23670
            [clock] => 1480080946
            [value] => 2016-11-25 15:15:19,187 WARN  [Test] TestMessage6
            [ns] => 63397
        )

)

SQL query

SQL (0.000199): SELECT h.* FROM history_text h WHERE h.itemid='23670' ORDER BY h.itemid DESC,h.clock DESC LIMIT 500 OFFSET 0
history.php:124 → CView->render() → include() → CScreenHistory->get() → CApiWrapper->__call() → CFrontendApiWrapper->callMethod() → CApiWrapper->callMethod() → CFrontendApiWrapper->callClientMethod() → CLocalApiClient->callMethod() → CHistory->get() → DBselect() in include/classes/api/services/CHistory.php:191

sasha We can't fix it in old versions of Zabbix because issues with performance. Please create a new ZBX issue.

vso Result before was tested on svn://svn.zabbix.com/branches/dev/ZBX-11525-trunk
Here is how it's sorted in 3.0 svn://svn.zabbix.com/branches/dev/ZBX-11525, looks like in this particular case also has nothing to do with nanoseconds, please let me know if I am looking in the wrong place.

Array

3. history.get [CScreenHistory.php:178]

Parameters:
Array
(
    [history] => 4
    [itemids] => Array
        (
            [0] => 23667
        )

    [output] => extend
    [sortorder] => DESC
    [limit] => 500
    [sortfield] => id
)
	Result:
Array
(
    [0] => Array
        (
            [id] => 6
            [itemid] => 23667
            [clock] => 1480080946
            [value] => 2016-11-25 15:15:19,187 WARN  [Test] TestMessage6
            [ns] => 64098
        )

    [1] => Array
        (
            [id] => 5
            [itemid] => 23667
            [clock] => 1480080946
            [value] => 2016-11-25 15:14:05,187 WARN  [Test] TestMessage5
            [ns] => 64097
        )

    [2] => Array
        (
            [id] => 4
            [itemid] => 23667
            [clock] => 1480080946
            [value] => 2016-11-25 15:13:44,187 WARN  [Test] TestMessage4
            [ns] => 64096
        )

    [3] => Array
        (
            [id] => 3
            [itemid] => 23667
            [clock] => 1480080946
            [value] => 2016-11-25 15:12:13,187 WARN  [Test] TestMessage3
            [ns] => 64095
        )

    [4] => Array
        (
            [id] => 2
            [itemid] => 23667
            [clock] => 1480080946
            [value] => 2016-11-25 15:11:12,187 WARN  [Test] TestMessage2
            [ns] => 64094
        )

    [5] => Array
        (
            [id] => 1
            [itemid] => 23667
            [clock] => 1480080946
            [value] => 2016-11-25 15:10:49,187 WARN  [Test] TestMessage1
            [ns] => 64093
        )

)

SQL

SELECT h.* FROM history_text h WHERE h.itemid='23667' ORDER BY h.id DESC LIMIT 500 OFFSET 0

Also for trunk I think that it's a bit odd that history values are sorted by itemid which is the same for the item.
This looks redundant in trunk WHERE h.itemid='23670' ORDER BY h.itemid

<dimir> We removed id fields from history tables in 3.2 (ZBXNEXT-3135).

vso WONTFIX, created separate issue for frontend ZBX-11677

Comment by Vladislavs Sokurenko [ 2017 Jan 05 ]

server side successfully tested, however frontend still display in wrong order.

Comment by dimir [ 2017 Jan 13 ]

Fixed in

  • pre-3.0.8rc1 (r64727)
  • pre-3.2.4rc1 (r64728)
  • pre-3.3.0 (trunk) (r65096)
Generated at Thu Apr 25 01:23:35 EEST 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.