[ZBX-21713] "Something impossible has just happened" on Oracle DB. events.c,update_trigger_problem_count Created: 2022 Sep 30  Updated: 2024 Apr 10  Resolved: 2023 Jan 23

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 6.0.8
Fix Version/s: 6.0.13rc1, 6.2.7rc1, 6.4.0beta6, 6.4 (plan)

Type: Problem report Priority: Blocker
Reporter: Oleksii Zagorskyi Assignee: Dmitrijs Goloscapovs
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: HTML File backtrace1-2events     HTML File backtrace2-3events     PNG File trigger-100190.png     PNG File trigger-25912.png    
Issue Links:
Duplicate
Team: Team A
Sprint: Sprint 93 (Oct 2022), Sprint 94 (Nov 2022), Sprint 95 (Dec 2022), Sprint 96 (Jan 2023)
Story Points: 1

 Description   

An installation (using Oracle DB) has been upgraded from 6.0.1 to 6.0.8 and an error started to be logged after a few minutes:

 41571:20220926:200340.883 Zabbix Server stopped. Zabbix 6.0.1 (revision a80cb13868).
 44422:20220926:200809.198 Starting Zabbix Server. Zabbix 6.0.8 (revision c7c3044a4a2).
 44629:20220926:200927.595 server #27 started [task manager #1]
--
 44641:20220926:201107.513 sending configuration data to proxy ....
zabbix_server [44629]: ERROR [file and function: <events.c,update_trigger_problem_count>, revision:c7c3044a4a2, line:1562] Something impossible has just happened.
 44629:20220926:201150.849 === Backtrace: ===
 44629:20220926:201150.849 11: /data/zabbix/sbin/zabbix_server: task manager [processing tasks](zbx_backtrace+0x35) [0x596635]
 44629:20220926:201150.850 10: /data/zabbix/sbin/zabbix_server: task manager [processing tasks]() [0x51c956]
 44629:20220926:201150.850 9: /data/zabbix/sbin/zabbix_server: task manager [processing tasks](zbx_close_problem+0x14a) [0x520bda]
 44629:20220926:201150.850 8: /data/zabbix/sbin/zabbix_server: task manager [processing tasks]() [0x4a06da]
 44629:20220926:201150.850 7: /data/zabbix/sbin/zabbix_server: task manager [processing tasks](taskmanager_thread+0x195) [0x4a0da5]
 44629:20220926:201150.850 6: /data/zabbix/sbin/zabbix_server: task manager [processing tasks](zbx_thread_start+0x3e) [0x5a190e]
 44629:20220926:201150.850 5: /data/zabbix/sbin/zabbix_server: task manager [processing tasks]() [0x44acda]
 44629:20220926:201150.850 4: /data/zabbix/sbin/zabbix_server: task manager [processing tasks](MAIN_ZABBIX_ENTRY+0x9f6) [0x44bb56]
 44629:20220926:201150.850 3: /data/zabbix/sbin/zabbix_server: task manager [processing tasks](daemon_start+0x1b2) [0x5963f2]
 44629:20220926:201150.850 2: /data/zabbix/sbin/zabbix_server: task manager [processing tasks](main+0x4dc) [0x449c7c]
 44629:20220926:201150.850 1: /lib64/libc.so.6(__libc_start_main+0xf5) [0x7fd258249555]
 44629:20220926:201150.850 0: /data/zabbix/sbin/zabbix_server: task manager [processing tasks]() [0x44a06a]
zabbix_server [44629]: ERROR [file and function: <events.c,update_trigger_problem_count>, revision:c7c3044a4a2, line:1562] Something impossible has just happened.
 44629:20220926:201200.891 === Backtrace: ===
 44629:20220926:201200.891 11: /data/zabbix/sbin/zabbix_server: task manager [processing tasks](zbx_backtrace+0x35) [0x596635]
 44629:20220926:201200.891 10: /data/zabbix/sbin/zabbix_server: task manager [processing tasks]() [0x51c956]
 44629:20220926:201200.891 9: /data/zabbix/sbin/zabbix_server: task manager [processing tasks](zbx_close_problem+0x14a) [0x520bda]
 44629:20220926:201200.891 8: /data/zabbix/sbin/zabbix_server: task manager [processing tasks]() [0x4a06da]
 44629:20220926:201200.891 7: /data/zabbix/sbin/zabbix_server: task manager [processing tasks](taskmanager_thread+0x195) [0x4a0da5]
 44629:20220926:201200.891 6: /data/zabbix/sbin/zabbix_server: task manager [processing tasks](zbx_thread_start+0x3e) [0x5a190e]
 44629:20220926:201200.891 5: /data/zabbix/sbin/zabbix_server: task manager [processing tasks]() [0x44acda]
 44629:20220926:201200.891 4: /data/zabbix/sbin/zabbix_server: task manager [processing tasks](MAIN_ZABBIX_ENTRY+0x9f6) [0x44bb56]
 44629:20220926:201200.891 3: /data/zabbix/sbin/zabbix_server: task manager [processing tasks](daemon_start+0x1b2) [0x5963f2]
 44629:20220926:201200.891 2: /data/zabbix/sbin/zabbix_server: task manager [processing tasks](main+0x4dc) [0x449c7c]
 44629:20220926:201200.891 1: /lib64/libc.so.6(__libc_start_main+0xf5) [0x7fd258249555]
 44629:20220926:201200.891 0: /data/zabbix/sbin/zabbix_server: task manager [processing tasks]() [0x44a06a]

The error repeated ~100 times per a day.

Debug level for task manager was increased and here are 2 files where we see when it happens. I've added empty lines just to split blocks/function call, all lines are present and in original order.

Its known that there is some scripts that close certain problems, including the *** problem shown in the logs. They do an event.acknowledge API call.

My findings:
in each file (in each cycle) task manager tries to close 2 or 3 tasks, for the same trigger. 1st (and next one) task/attempt of closing is failed with the error, while last one is without the error.



 Comments   
Comment by Andrejs Sitals (Inactive) [ 2022 Nov 01 ]

The bug is caused by the code that detects the "size of data" (i.e., the number of bytes required for representing the data as a string).

When doing a select query, we use OCIDefineByPos() to tell the driver that data should be converted to a string and stored into a preallocated buffer. The size of the buffer depends on the type of the data:

  • for SQLT_IBDOUBLE and SQLT_BDOUBLE we use hard-coded constant ZBX_MAX_DOUBLE_LEN;
  • for strings we call OCIAttrGet() with OCI_ATTR_CHAR_SIZE and then multiply the result by 4 with the assumption that a single char can take up to 4 bytes;
  • for all other data (e.g., integers) we call OCIAttrGet() with OCI_ATTR_DATA_SIZE.

When column is textual data:

  • OCI_ATTR_CHAR_SIZE is number of characters for the column;
  • OCI_ATTR_DATA_SIZE is number of bytes for the column by taking the charset into account.

When column is numeric data:

  • OCI_ATTR_CHAR_SIZE is 0;
  • OCI_ATTR_DATA_SIZE is number of bytes for storing binary data, it is NOT the number of decimal digits in the number.

Some official resources (e.g., https://docs.oracle.com/cd/B10500_01/appdev.920/a96584/oci06des.htm) claim that OCI_ATTR_DATA_SIZE is 22 for numbers, though that's not always the case. E.g., when selecting constant numbers, OCI_ATTR_DATA_SIZE can be smaller:

  • 2 for 1 (1 digit)
  • 6 for 1234567890 (10 digits)
  • 11 for 12345678901234567890 (20 digits)
  • 16 for 123456789012345678901234567890 (30 digits)

When executing the following query:
select objectid from problem where objectid=22962
OCI_ATTR_DATA_SIZE is 22, as stated in the documentation.

Though, as soon as grouping is added:
select objectid from problem where objectid=22962 group by objectid
OCI_ATTR_DATA_SIZE can be much smaller number, 4 in this case. As a result, we allocate smaller buffer and "22962" is truncated to "2296", and this causes "Something impossible has just happened" of this ticket.

My suggestion is to use OCI_ATTR_DISP_SIZE, "the display size" (from oci.h), which seems to return correct max size for different types of data (strings and numbers), although publicly available documentation does not describe it.

Comment by Dmitrijs Goloscapovs [ 2023 Jan 09 ]

Available in versions:
  - pre-6.0.13rc1 - ae7004065da
  - pre-6.2.7rc1 - 32ecb3fbc89
  - pre-6.4.0beta6 - 888d4029cd7

Generated at Wed Apr 30 06:33:53 EEST 2025 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.