[ZBX-12653] Heap corruption in function: 'zbx_mutex_create_per_process_name' within the 'mutexs.c' source file Created: 2017 Sep 01  Updated: 2018 Oct 15  Resolved: 2017 Sep 08

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Agent (G)
Affects Version/s: 3.0.11rc1, 3.2.7, 3.4.2rc1, 4.0.0alpha1
Fix Version/s: 3.0.11rc1, 3.2.8rc1, 3.4.2rc1, 4.0.0alpha1, 4.0 (plan)

Type: Patch request Priority: Critical
Reporter: Ronnie Kaech Assignee: Unassigned
Resolution: Fixed Votes: 0
Labels: agent, patch
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Windows 10 Pro 64-Bit Build 15063


Attachments: Text File mutexs.c    
Team: Team A
Sprint: Sprint 15, Sprint 16
Story Points: 3

 Description   

I was compiling the source code using the Windows NMake file and added the /MTd switch instead of the /MT compiler switch so I could debug the source code. When I tried to run the code then I encountered a heap corruption. I tracked the issue down to the zbx_mutex_create_per_process_name function in the src/libs/zbxsys/mutexs.c source file. You can see how I did it in the attached source file. I also corrected the issue and commented out the original line. According to the Microsoft documentation (https://msdn.microsoft.com/en-us/library/f30dzcf6.aspx) the second parameter of the _snwprintf_s function is the count of characters and not the size of the buffer in bytes.



 Comments   
Comment by Andrea Biscuola (Inactive) [ 2017 Sep 01 ]

The problem was introduced by fixing ZBX-10034. In subversion the revision number is r57062. In that revision the function zbx_mutex_create_per_process_name() was introduced and never modified since December 2015.

The version of zabbix affected are from 3.0.0 upwards.

Comment by Andrea Biscuola (Inactive) [ 2017 Sep 06 ]

Nihilius I was able to obtain a consistent trace with what you found and I'm working on it.

Comment by Andrea Biscuola (Inactive) [ 2017 Sep 06 ]

Hi Nihilius

I deleted my previous comment because it was so obviously wrong

Your patch is 99% correct, the only modification I did was this:

(void)_snwprintf_s(name, size, size - 1, format, prefix, pid);

Instead of:

(void)_snwprintf_s(name, size, size, format, prefix, pid);

As the amount of character is pre-determined before leaving room for the NULL wide character. It took me too much time due to the fact that the visual studio debugger doesn't gave me the right backtrace. Only with other tools I was able to reproduce the behavior.
Also, being a long time that I don't (really) program on Windows, every time I read "size" I think "bytes", but, on Windows, sizes in "words" or "dwords" are a thing.

Sorry if it took too much time. I also wonder why in a release setting this bug didn't caused crashes in a production setting in the past.

Lesson learned: words and dwords "sizes" are a thing in some operating systems

Comment by Andrea Biscuola (Inactive) [ 2017 Sep 06 ]

Fixed in svn://svn.zabbix.com/branches/dev/ZBX-12653

The second argument of the _snwprintf_s is the size in words (2 bytes) and NOT the size in bytes of the passed buffer pointer. Changed also the third argument to size - 1 (so it will not overflow in any way).

Comment by Ronnie Kaech [ 2017 Sep 06 ]

Dear Mr. Biscuola,

Thank you for fixing this issue. I guess you're mainly programming on Linux where you're used to UTF-8 character encoding and not like Windows which uses UTF-16 in turn.

The fact that it didn't caused any problems in the release version is just pure luck. The /MTd compiler switch of the Visual Studio compiler (linke) actually links to a special version of the Microsoft Common Runtime (CRT) that inserts certain patterns before and after an allocated block of memory. In the release build it was just pure luck that no crash occurred. It was also pretty strange to me as the initial error occurred in the 'fopen' call when the log file was opened.

About the size - 1: The function should actually stop copying characters when it encounters the terminating null character, but safe is safe.

Thank you again.

Comment by Andrea Biscuola (Inactive) [ 2017 Sep 07 ]

Nihilius

Yes, I generally think in terms of UTF-8 (variable sized). Not much experience with UTF-16 (fixed sized)

Thanks for the explanation! My knowledge of the Microsoft CRT is pretty limited, but I recall you could enable/disable additional protections with it (things like ASLR etc.). For sure the problem was causing heap memory to be zero'ed (as the crash happen while inside the function replace_memset()). The behavior of _snwprintf_s is the same as the canonical snprintf, so it mean that, after being done copying the data, set the remaining wide characters of the buffer to '\0' and that was crashing.
In this case, for sure the heap corruption was deleting other data allocated in the heap as, probably, the bound checks used by _snwprintf_s was never used. Anyway, out of this bug, I think we should consider using /MTd in all our debug builds.

Thank you again!

Comment by Ronnie Kaech [ 2017 Sep 07 ]

No problem, you're welcome. Thank you too for fixing this issue.

I can really recommend the /MTd switch as it helps to surface a lot of problems like this one.

Comment by Vladislavs Sokurenko [ 2017 Sep 07 ]

Successfully tested

Comment by Andrea Biscuola (Inactive) [ 2017 Sep 08 ]

Fixed in:

  • pre3.0.11rc1 r72367
  • pre3.2.8rc1 r72370
  • pre3.4.2rc1 r72371
  • pre4.0.0alpha1 (trunk) r72372
Comment by richlv [ 2017 Sep 11 ]

(1) being pedantic here, but the changelog entry does not correspond to the guidelines

Whenever possible, submitters of patches and translations should be acknowledged. This is done by including a string like "; thanks to John Smith" in the changelog entry before the developer's name.

abs Riiiiiiight, going to correct it

abs CLOSED

Generated at Thu Mar 28 15:50:16 EET 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.