Discussion: [Valgrind-users] Helgrind detects race with same lock
William Good
2017-05-29 17:33:42 UTC
Hello,

I am trying to understand this helgrind output. It says there is a data race on a read. However, both threads hold the same lock. How can this be a race when both threads hold the lock during the access?


==31341== ----------------------------------------------------------------
==31341==
==31341== Lock at 0x5990828 was first observed
==31341== at 0x4C31A76: pthread_mutex_init (hg_intercepts.c:779)
==31341== by 0x4026AF: thread_pool_submit (threadpool.c:85)
==31341== by 0x402012: qsort_internal_parallel (quicksort.c:142)
==31341== by 0x402040: qsort_internal_parallel (quicksort.c:151)
==31341== by 0x402040: qsort_internal_parallel (quicksort.c:151)
==31341== by 0x402450: thread_work (threadpool.c:233)
==31341== by 0x4C3083E: mythread_wrapper (hg_intercepts.c:389)
==31341== by 0x4E42DC4: start_thread (in /usr/lib64/libpthread-2.17.so)
==31341== by 0x5355CEC: clone (in /usr/lib64/libc-2.17.so)
==31341== Address 0x5990828 is 40 bytes inside a block of size 152 alloc'd
==31341== at 0x4C2CD95: calloc (vg_replace_malloc.c:711)
==31341== by 0x4026A1: thread_pool_submit (threadpool.c:84)
==31341== by 0x402012: qsort_internal_parallel (quicksort.c:142)
==31341== by 0x402040: qsort_internal_parallel (quicksort.c:151)
==31341== by 0x402040: qsort_internal_parallel (quicksort.c:151)
==31341== by 0x40279F: future_get (threadpool.c:112)
==31341== by 0x402048: qsort_internal_parallel (quicksort.c:152)
==31341== by 0x402040: qsort_internal_parallel (quicksort.c:151)
==31341== by 0x402450: thread_work (threadpool.c:233)
==31341== by 0x4C3083E: mythread_wrapper (hg_intercepts.c:389)
==31341== by 0x4E42DC4: start_thread (in /usr/lib64/libpthread-2.17.so)
==31341== by 0x5355CEC: clone (in /usr/lib64/libc-2.17.so)
==31341== Block was alloc'd by thread #3
==31341==
==31341== Possible data race during read of size 4 at 0x5990880 by thread #2
==31341== Locks held: 1, at address 0x5990828
==31341== at 0x4023A9: thread_work (threadpool.c:229)
==31341== by 0x4C3083E: mythread_wrapper (hg_intercepts.c:389)
==31341== by 0x4E42DC4: start_thread (in /usr/lib64/libpthread-2.17.so)
==31341== by 0x5355CEC: clone (in /usr/lib64/libc-2.17.so)
==31341==
==31341== This conflicts with a previous write of size 4 by thread #3
==31341== Locks held: 1, at address 0x5990828
==31341== at 0x4027B3: future_get (threadpool.c:114)
==31341== by 0x402048: qsort_internal_parallel (quicksort.c:152)
==31341== by 0x402040: qsort_internal_parallel (quicksort.c:151)
==31341== by 0x40279F: future_get (threadpool.c:112)
==31341== by 0x402048: qsort_internal_parallel (quicksort.c:152)
==31341== by 0x40279F: future_get (threadpool.c:112)
==31341== by 0x402048: qsort_internal_parallel (quicksort.c:152)
==31341== by 0x402040: qsort_internal_parallel (quicksort.c:151)
==31341== Address 0x5990880 is 128 bytes inside a block of size 152 alloc'd
==31341== at 0x4C2CD95: calloc (vg_replace_malloc.c:711)
==31341== by 0x4026A1: thread_pool_submit (threadpool.c:84)
==31341== by 0x402012: qsort_internal_parallel (quicksort.c:142)
==31341== by 0x402040: qsort_internal_parallel (quicksort.c:151)
==31341== by 0x402040: qsort_internal_parallel (quicksort.c:151)
==31341== by 0x40279F: future_get (threadpool.c:112)
==31341== by 0x402048: qsort_internal_parallel (quicksort.c:152)
==31341== by 0x402040: qsort_internal_parallel (quicksort.c:151)
==31341== by 0x402450: thread_work (threadpool.c:233)
==31341== by 0x4C3083E: mythread_wrapper (hg_intercepts.c:389)
==31341== by 0x4E42DC4: start_thread (in /usr/lib64/libpthread-2.17.so)
==31341== by 0x5355CEC: clone (in /usr/lib64/libc-2.17.so)
==31341== Block was alloc'd by thread #3
==31341==
==31341== ----------------------------------------------------------------
Philippe Waroquiers
2017-05-29 21:20:22 UTC
You might have been unlucky and have a lock that was freed and then
re-used at the same address.

See this extract from the mk_LockP_from_LockN comments:
So we check that each LockN is a member of the admin_locks double
linked list of all Lock structures. That stops us prodding around
in potentially freed-up Lock structures. However, it's not quite a
proper check: if a new Lock has been reallocated at the same
address as one which was previously freed, we'll wind up copying
the new one as the basis for the LockP, which is completely bogus
because it is unrelated to the previous Lock that lived there.
Let's hope that doesn't happen too often.
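
As an aside, here is a minimal sketch of the kind of pattern that can produce this (hypothetical code, not taken from this thread): a mutex embedded in a calloc'd block is destroyed and freed, and a later allocation reuses the same address for a brand-new mutex, which is exactly the situation the comment above warns about.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical future-like object; all names here are invented for the sketch. */
struct future {
    pthread_mutex_t lock;
    int done;
};

static struct future *future_new(void)
{
    struct future *f = calloc(1, sizeof *f);
    pthread_mutex_init(&f->lock, NULL);
    return f;
}

static void future_free(struct future *f)
{
    pthread_mutex_destroy(&f->lock);
    free(f);                            /* the allocator may reuse this address */
}

int main(void)
{
    struct future *f1 = future_new();
    void *old = (void *)f1;             /* remember where the first lock lived */
    future_free(f1);

    struct future *f2 = future_new();   /* often ends up at the same address */
    printf("old block %p, new block %p%s\n",
           old, (void *)f2, old == (void *)f2 ? " (address reused)" : "");

    future_free(f2);
    return 0;
}

Under a typical malloc implementation the second allocation will often land on the just-freed block, so the two locks are genuinely distinct objects that merely share an address.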

Do you have a small reproducer for this?
Philippe


William Good
2017-05-31 18:26:15 UTC
So it is actually two different locks that just happen to occupy the same address at different times? Usually, helgrind indicates when each lock was first observed, but there is no mention of a second lock. No, my reproducer is fairly large.


Philippe Waroquiers
2017-05-31 21:54:01 UTC
On Wed, 2017-05-31 at 18:26 +0000, William Good wrote:
> So it is actually two different locks that just happen to occupy the
> same address at different times? Usually, helgrind indicates when
> each lock was first observed but there is no mention of a second lock.
To verify this hypothesis, you might run with -v -v -v.
Each time a lock is pthread_mutex_init-ed, you should see a line
such as:
client request: code 48470103, addr 0x5400040, len 0
The request code corresponds to the client request enum defined
in helgrind.h: 0x103 = 256 + 3, which is
_VG_USERREQ__HG_PTHREAD_MUTEX_INIT_POST.

If you see such a line twice with the same addr, that indicates
there were two initialisations of a mutex at the same address.
And the mk_LockP_from_LockN comment quoted earlier makes me believe
helgrind does not handle that very cleanly.
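
If instrumenting the program is easier than sifting the verbose log, something like the following (a hypothetical helper, not part of valgrind or of the thread pool code under discussion) prints the address at every init, so a repeated address stands out immediately:

#include <pthread.h>
#include <stdio.h>

/* Hypothetical logging wrapper: use it in place of pthread_mutex_init and
   watch for the same address being printed twice. */
static int logged_mutex_init(pthread_mutex_t *m,
                             const pthread_mutexattr_t *attr,
                             const char *where)
{
    fprintf(stderr, "mutex init at %p (%s)\n", (void *)m, where);
    return pthread_mutex_init(m, attr);
}

int main(void)
{
    pthread_mutex_t mx;
    logged_mutex_init(&mx, NULL, "first init");
    pthread_mutex_destroy(&mx);
    logged_mutex_init(&mx, NULL, "second init, same address");
    pthread_mutex_destroy(&mx);
    return 0;
}

Equivalently, grepping the -v -v -v output for the "client request: code 48470103" lines quoted above and looking for a repeated addr value gives the same answer without touching the code.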


> No, my reproducer is fairly large.
That is not a surprise :).
If the problem is effectively linked to the re-creation of
another mutex at the same address, then I think a small
reproducer should be easy to write.

But let's first confirm that you see two initialisations.


You might also try with --tool=drd, to see if drd confirms
the race condition.

Philippe

Philippe Waroquiers
2017-06-03 10:00:29 UTC
On Wed, 2017-05-31 at 23:54 +0200, Philippe Waroquiers wrote:
> If the problem is effectively linked to the re-creation of
> another mutex at the same address, then I think a small
> reproducer should be easy to write.

The small program below reproduces the behaviour you have seen:
a race condition is reported between two threads while helgrind
reports that they hold the same lock.

But in reality, the lock was destroyed and re-created.
Helgrind falsely believes it is the same lock
(and falsely reports the re-creation as the first time
the lock was observed).

So it is probable that what you see is a similar case.
In the absence of any synchronisation other than this lock
(which is falsely presented as a common one), you might have
a real race condition.

Philippe

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <unistd.h>

pthread_mutex_t mx1;
int x = 0;

void* child_fn ( void* arg )
{
    int r;
    int destroy;
    printf("child_fn\n");
    r = pthread_mutex_lock(&mx1); assert(!r);
    x += 1;
    destroy = x == 1;   /* only the first thread through destroys the lock */
    r = pthread_mutex_unlock(&mx1); assert(!r);
    if (destroy) {
        /* destroy mx1 and re-create it at the same address; helgrind
           treats the new mutex as if it were the old one */
        printf("destroy/recreate mx1\n");
        r = pthread_mutex_destroy(&mx1); assert(!r);
        r = pthread_mutex_init(&mx1, NULL); assert(!r);
    }
    printf("child_fn returning ...\n");
    return NULL;
}

void* child_fn2 ( void* arg )
{
    sleep(20);   /* run well after child1 has destroyed/re-created mx1 */
    child_fn(arg);
    return NULL;
}

int main ( int argc, char** argv )
{
    pthread_t child1, child2;
    int r;

    r = pthread_mutex_init(&mx1, NULL); assert(!r);
    printf("creating threads\n");
    r = pthread_create(&child1, NULL, child_fn, NULL); assert(!r);
    r = pthread_create(&child2, NULL, child_fn2, NULL); assert(!r);
    printf("sleeping 5\n");
    sleep(5);

    printf("joining child1\n");
    r = pthread_join(child1, NULL); assert(!r);
    printf("joining child2\n");
    r = pthread_join(child2, NULL); assert(!r);
    printf("end\n");

    return 0;
}
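
(For completeness: this reproducer would presumably be built and run along the lines of gcc -pthread repro.c -o repro followed by valgrind --tool=helgrind ./repro, with repro.c being whatever the file is saved as; running the same binary under --tool=drd, as suggested earlier in the thread, is a useful cross-check on whether a reported race is real.)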