[Valgrind-users] helgrind/drd annotations for one statement

Discussion:

[Valgrind-users] helgrind/drd annotations for one statement

Leif Walsh

2012-12-03 19:05:54 UTC

I have a memory location whose behavior I'd like to verify with helgrind
and/or drd, but which I access in a racy way sometimes. I would like to
ignore the race I know about but still have it checked for other accesses.
Is this possible?

The race is that, while a thread has a read lock, it may set the value to
1, but it does not read the value. While a thread has a write lock, it may
read the value and it may set the value to 0. The race is that many
threads can set it to 1 if they all have the read lock. DRD and helgrind
tend to view this as a race.

I tried something like this, but wasn't surprised that it didn't work:

{
VALGRIND_HG_DISABLE_CHECKING(&var, sizeof var);
DRD_IGNORE_VAR(var);
var = 1;
DRD_STOP_IGNORING_VAR(var);
VALGRIND_HG_ENABLE_CHECKING(&var, sizeof var);
}

I can eliminate the race report if I disable checking on the variable
always, but I don't want to lose all that checking if I can avoid it.

--
Cheers,
Leif

Julian Seward

2012-12-04 09:49:57 UTC

Post by Leif Walsh
I have a memory location whose behavior I'd like to verify with helgrind
and/or drd, but which I access in a racy way sometimes. I would like to
ignore the race I know about but still have it checked for other accesses.
Is this possible?
The race is that, while a thread has a read lock, it may set the value to
1, but it does not read the value. While a thread has a write lock, it may
read the value and it may set the value to 0. The race is that many
threads can set it to 1 if they all have the read lock. DRD and helgrind
tend to view this as a race.

Correctly, IMO. Why do you think this behaviour is OK ? If you have
a read lock, you can read the location, but not write it.

Were you trying to achieve some larger goal, such as (at a guess) the
thread with the write lock creates a description of some work to be done
(in an object in memory) and multiple threads with read locks compete to
do that work, and write into the location you mentioned, in order to show
that they have completed it?

J

Leif Walsh

2012-12-04 13:22:24 UTC

Sent from my iPhone

Post by Julian Seward

Post by Leif Walsh
I have a memory location whose behavior I'd like to verify with helgrind
and/or drd, but which I access in a racy way sometimes. I would like to
ignore the race I know about but still have it checked for other accesses.
Is this possible?
The race is that, while a thread has a read lock, it may set the value to
1, but it does not read the value. While a thread has a write lock, it may
read the value and it may set the value to 0. The race is that many
threads can set it to 1 if they all have the read lock. DRD and helgrind
tend to view this as a race.

Correctly, IMO. Why do you think this behaviour is OK ? If you have
a read lock, you can read the location, but not write it.

It is a race, but one that has the same result as if everything were serialized. If three threads all set this bit to 1, that's ok. It's ok because they're all trying to write the same thing.

You're right, it is correctly flagged as a race, I'm just hoping to ask it to not report this one.

Post by Julian Seward
Were you trying to achieve some larger goal, such as (at a guess) the
thread with the write lock creates a description of some work to be done
(in an object in memory) and multiple threads with read locks compete to
do that work, and write into the location you mentioned, in order to show
that they have completed it?
J

Julian Seward

2012-12-04 14:33:54 UTC

Post by Leif Walsh
It is a race, but one that has the same result as if everything were
serialized. If three threads all set this bit to 1, that's ok. It's ok
because they're all trying to write the same thing.

Do you have suitable memory fences in place, so it won't break in
mysterious ways on targets that deliver stores out-of-order to other
processors, eg Power7?

Anyway ..

Post by Leif Walsh
You're right, it is correctly flagged as a race, I'm just hoping to ask it
to not report this one.

You might be best off removing the magic macros shown in your first posting,
and instead using Valgrind's suppression mechanism to hide precisely the
error(s) you don't want to see.

J

Leif Walsh

2012-12-04 15:24:00 UTC

Sent from my iPhone

Post by Julian Seward

Post by Leif Walsh
It is a race, but one that has the same result as if everything were
serialized. If three threads all set this bit to 1, that's ok. It's ok
because they're all trying to write the same thing.

Do you have suitable memory fences in place, so it won't break in
mysterious ways on targets that deliver stores out-of-order to other
processors, eg Power7?

I'm ashamed to say I don't quite understand what can fail here but I'm not worried until I start supporting such architectures. Thanks for the warning though.

Post by Julian Seward
Anyway ..

Post by Leif Walsh
You're right, it is correctly flagged as a race, I'm just hoping to ask it
to not report this one.

You might be best off removing the magic macros shown in your first posting,
and instead using Valgrind's suppression mechanism to hide precisely the
error(s) you don't want to see.

Ok, thanks.

Post by Julian Seward
J

Patrick J. LoPresti

2012-12-04 18:32:21 UTC

Post by Leif Walsh
I'm ashamed to say I don't quite understand

The first step to enlightenment

Post by Leif Walsh
what can fail here but I'm not worried until I start supporting such architectures. Thanks for the warning though.

It is not only about future architectures; it is about smart compilers
on any architecture. Do you ever plan to update your compiler? Then
you should be worried.

It is quite possible for concurrent writes of the _same value_ to
create problems. For a concrete example, see section 2.4 of Hans
Boehm's classic paper:

http://www.usenix.org/event/hotpar11/tech/final_files/Boehm.pdf

The specifics are complicated, but the general idea is very simple. A
data race is "undefined behavior", period. Therefore your compiler
may assume you do not have any and optimize your code accordingly.
(In fact, if your compiler can prove you are engaging in undefined
behavior, it is permitted to compile your entire program into a no-op.
Yes, really.)

Undefined behavior is always a serious bug, precisely because a
trivial, bug-free compiler upgrade can utterly break your program.
Boehm is kind enough to give realistic examples, since bad software
engineers (i.e. most of them) simply refuse to accept this fact
otherwise.

There is no such thing as a benign data race. Ever.

- Pat

Julian Seward

2012-12-05 17:29:08 UTC

Post by Patrick J. LoPresti
There is no such thing as a benign data race. Ever.

<soapbox>

Well said. I couldn't have put any of this better myself.

Having spent a considerable amount of time working on Helgrind and then
using it to chase races in some big hairy C++ codes, I became very
skeptical of the "oh it's only a harmless race" arguments. However,
I gave up shouting about it after a while since it just made me look
like a tiresome pedant hellbent on criticising people's clever go-faster-
by-avoiding-locking schemes.

IMO, the idea that some races are harmless is a dangerous myth that
we should try to stamp out.

I also noticed that threaded code that relies on unsynchronised
accesses is hard for newcomers to understand and reason about, so it
tends to be a maintenance hazard.

In some ways, the fact that the Intel architecture guarantees to
deliver stores in-order (x86-TSO), and is therefore somewhat tolerant
of such racery, is a disadvantage. I think people would take this
stuff more seriously if racey code got trashed more often by machines
with memory systems that do reordering, such as Power7.

</soapbox>

J

Brian Budge

2012-12-05 18:17:20 UTC

It would be better to use atomics (with proper memory order), which can
solve this kind of problem without unintended consequences. As mentioned
before, this still may not be as maintenance friendly as locking, but at
least it will be correct.

Brian

Post by Julian Seward

Post by Patrick J. LoPresti
There is no such thing as a benign data race. Ever.

<soapbox>
Well said. I couldn't have put any of this better myself.
Having spent a considerable amount of time working on Helgrind and then
using it to chase races in some big hairy C++ codes, I became very
skeptical of the "oh it's only a harmless race" arguments. However,
I gave up shouting about it after a while since it just made me look
like a tiresome pedant hellbent on criticising people's clever go-faster-
by-avoiding-locking schemes.
IMO, the idea that some races are harmless is a dangerous myth that
we should try to stamp out.
I also noticed that threaded code that relies on unsynchronised
accesses is hard for newcomers to understand and reason about, so it
tends to be a maintenance hazard.
In some ways, the fact that the Intel architecture guarantees to
deliver stores in-order (x86-TSO), and is therefore somewhat tolerant
of such racery, is a disadvantage. I think people would take this
stuff more seriously if racey code got trashed more often by machines
with memory systems that do reordering, such as Power7.
</soapbox>
J
------------------------------------------------------------------------------
LogMeIn Rescue: Anywhere, Anytime Remote support for IT. Free Trial
Remotely access PCs and mobile devices and provide instant support
Improve your efficiency, and focus on delivering more value-add services
Discover what IT Professionals Know. Rescue delivers
http://p.sf.net/sfu/logmein_12329d2d
_______________________________________________
Valgrind-users mailing list
https://lists.sourceforge.net/lists/listinfo/valgrind-users

Leif Walsh

2012-12-05 18:29:38 UTC

Here are the threads I have:

{
pthread_rwlock_rdlock();
x = 1;
pthread_rwlock_rdunlock();
}

{
pthread_rwlock_wrlock();
if (x) {
x = 0;
} else {
foo();
}
pthread_rwlock_wrunlock();
}

Boehm's paper did not convince me of a bug.

Since pthread locks will include proper mfences I believe this to be correct even on non-x86 machines.

Sent from my iPhone

It would be better to use atomics (with proper memory order), which can solve this kind of problem without unintended consequences. As mentioned before, this still may not be as maintenance friendly as locking, but at least it will be correct.
Brian

Post by Julian Seward

Post by Patrick J. LoPresti
There is no such thing as a benign data race. Ever.

<soapbox>
Well said. I couldn't have put any of this better myself.
Having spent a considerable amount of time working on Helgrind and then
using it to chase races in some big hairy C++ codes, I became very
skeptical of the "oh it's only a harmless race" arguments. However,
I gave up shouting about it after a while since it just made me look
like a tiresome pedant hellbent on criticising people's clever go-faster-
by-avoiding-locking schemes.
IMO, the idea that some races are harmless is a dangerous myth that
we should try to stamp out.
I also noticed that threaded code that relies on unsynchronised
accesses is hard for newcomers to understand and reason about, so it
tends to be a maintenance hazard.
In some ways, the fact that the Intel architecture guarantees to
deliver stores in-order (x86-TSO), and is therefore somewhat tolerant
of such racery, is a disadvantage. I think people would take this
stuff more seriously if racey code got trashed more often by machines
with memory systems that do reordering, such as Power7.
</soapbox>
J
------------------------------------------------------------------------------
LogMeIn Rescue: Anywhere, Anytime Remote support for IT. Free Trial
Remotely access PCs and mobile devices and provide instant support
Improve your efficiency, and focus on delivering more value-add services
Discover what IT Professionals Know. Rescue delivers
http://p.sf.net/sfu/logmein_12329d2d
_______________________________________________
Valgrind-users mailing list
https://lists.sourceforge.net/lists/listinfo/valgrind-users

------------------------------------------------------------------------------
LogMeIn Rescue: Anywhere, Anytime Remote support for IT. Free Trial
Remotely access PCs and mobile devices and provide instant support
Improve your efficiency, and focus on delivering more value-add services
Discover what IT Professionals Know. Rescue delivers
http://p.sf.net/sfu/logmein_12329d2d
_______________________________________________
Valgrind-users mailing list
https://lists.sourceforge.net/lists/listinfo/valgrind-users

Julian Seward

2012-12-05 20:38:29 UTC

Post by Leif Walsh
{
pthread_rwlock_rdlock();
x = 1;
pthread_rwlock_rdunlock();
}
{
pthread_rwlock_wrlock();
if (x) {
x = 0;
} else {
foo();
}
pthread_rwlock_wrunlock();
}
Boehm's paper did not convince me of a bug.
Since pthread locks will include proper mfences I believe this to be
correct even on non-x86 machines.

Well, but what's your definition of "correct"? That's central to the
issue at hand.

Imagine that pthread_rwlock_rdunlock contains a one-sided fence,
that causes all writes done by other processors to be visible on
this processor, but has no effect on migrating writes done by this
processor to others. (Why should it? Holding a reader lock doesn't
allow you to write shared memory, so a conforming program should
have no writes that need to be propagated to other processors at
lock release time.)

The above is just a guess. I'm not saying your code will fail on current
implementations. Merely that (as Patrick points out), non-adherence
to the standard puts the program pretty much in the Russian Roulette
department w.r.t. future hardware and compiler developments.

You may remember the gcc -fstrict-aliasing wars of the early 2000s,
in which the gcc developers starting using a relatively obscure aspect
of the C/C++ standard to improve optimisation. As a result vast reams
of previously-working code (including Valgrind) started to fail in
obscure ways.

Same thing happened on a smaller scale a couple of years later when
gcc started to assume that signed integer arithmetic would never overflow.

So .. seen from a historical context, the compiler crews are an extremely
ingenious bunch, and I would not be at all surprised to see newer compilers
causing non-compliant (racey) code to fail more often.

J

Leif Walsh

2012-12-05 21:39:47 UTC

Sent from my iPhone

Post by Julian Seward

Post by Leif Walsh
{
pthread_rwlock_rdlock();
x = 1;
pthread_rwlock_rdunlock();
}
{
pthread_rwlock_wrlock();
if (x) {
x = 0;
} else {
foo();
}
pthread_rwlock_wrunlock();
}
Boehm's paper did not convince me of a bug.
Since pthread locks will include proper mfences I believe this to be
correct even on non-x86 machines.

Well, but what's your definition of "correct"? That's central to the
issue at hand.
Imagine that pthread_rwlock_rdunlock contains a one-sided fence,
that causes all writes done by other processors to be visible on
this processor, but has no effect on migrating writes done by this
processor to others. (Why should it? Holding a reader lock doesn't
allow you to write shared memory, so a conforming program should
have no writes that need to be propagated to other processors at
lock release time.)

The important synchronization point isn't the rdunlock, it's the wrlock.

I have a hard time believing that you can take a pthread write lock and then look at a value some other processor wrote before you took the lock and not get that value. That sounds like pthread is broken or I am being exceedingly thick. But everything I have read on this since I started asking around says that my code is fine.

I agree that races that seem benign can easily stop being benign. I just don't think this is a race according to the spec.

Post by Julian Seward
The above is just a guess. I'm not saying your code will fail on current
implementations. Merely that (as Patrick points out), non-adherence
to the standard puts the program pretty much in the Russian Roulette
department w.r.t. future hardware and compiler developments.
You may remember the gcc -fstrict-aliasing wars of the early 2000s,
in which the gcc developers starting using a relatively obscure aspect
of the C/C++ standard to improve optimisation. As a result vast reams
of previously-working code (including Valgrind) started to fail in
obscure ways.
Same thing happened on a smaller scale a couple of years later when
gcc started to assume that signed integer arithmetic would never overflow.
So .. seen from a historical context, the compiler crews are an extremely
ingenious bunch, and I would not be at all surprised to see newer compilers
causing non-compliant (racey) code to fail more often.
J

David Faure

2012-12-05 21:51:41 UTC

Post by Leif Walsh
The important synchronization point isn't the rdunlock, it's the wrlock.

Well, you need two locks, for a happens-before relationship to be established.
If you remove the rdlock/rdunlock completely (since it could basically be a
no-op for a write operation anyway, as others pointed out earlier), then this
will be more clear: this write might never become visible to the other thread.

Post by Leif Walsh
I have a hard time believing that you can take a pthread write lock and then
look at a value some other processor wrote before you took the lock and not
get that value.

You say "before", but this assumes a global ordering, which you don't get,
when not using atomics or the proper locks.
Each CPU can have a different notion of "before", without the correct
synchronization primitives.

I recommend reading "C++ Concurrency in action" by Anthony Williams, it taught
me a lot on all this all works... definitely not a simple topic.

--
David Faure, ***@kde.org, http://www.davidfaure.fr
Working on KDE, in particular KDE Frameworks 5

Leif Walsh

2012-12-05 21:52:42 UTC

Rdunlock happens before wrlock.

Sent from my iPhone

Post by David Faure

Post by Leif Walsh
The important synchronization point isn't the rdunlock, it's the wrlock.

Well, you need two locks, for a happens-before relationship to be established.
If you remove the rdlock/rdunlock completely (since it could basically be a
no-op for a write operation anyway, as others pointed out earlier), then this
will be more clear: this write might never become visible to the other thread.

Post by Leif Walsh
I have a hard time believing that you can take a pthread write lock and then
look at a value some other processor wrote before you took the lock and not
get that value.

You say "before", but this assumes a global ordering, which you don't get,
when not using atomics or the proper locks.
Each CPU can have a different notion of "before", without the correct
synchronization primitives.
I recommend reading "C++ Concurrency in action" by Anthony Williams, it taught
me a lot on all this all works... definitely not a simple topic.
--
Working on KDE, in particular KDE Frameworks 5

David Faure

2012-12-05 22:02:16 UTC

Post by Leif Walsh
Rdunlock happens before wrlock.

... and is supposed to be about reading only, so why would CPUs bother to
propagate modified data inside that lock, to other CPUs?

--
David Faure, ***@kde.org, http://www.davidfaure.fr
Working on KDE, in particular KDE Frameworks 5

Patrick J. LoPresti

2012-12-05 21:31:15 UTC

Post by Leif Walsh
{
pthread_rwlock_rdlock();
x = 1;
pthread_rwlock_rdunlock();
}
{
pthread_rwlock_wrlock();
if (x) {
x = 0;
} else {
foo();
}
pthread_rwlock_wrunlock();
}

I am going to repeat myself a bit because this is an important point.
Then I will give up.

First, this code smells on its surface. If I were interviewing a
candidate who submitted this as a code sample, it would be an easy
"strong no hire" decision. Grabbing a reader lock to write a
variable? What the heck is that?

No matter how smart you think you are, sooner or later, you will work
with people who lack your genius. Truly smart programmers write code
that any idiot can prove correct. This does not qualify, to put it
mildly.

Second, as we keep trying to explain, concurrent writes to the same
location are *undefined behavior* under every common threading model
(including, in particular, POSIX and C++11). That means a compiler
looking at this code can PROVE that only one thread ever runs the
first block concurrently. If that is not true, then you have
introduced a contradiction into the compiler's reasoning. "

Leif Walsh

2012-12-05 21:51:58 UTC

Sent from my iPhone

Post by Patrick J. LoPresti

Post by Leif Walsh
{
pthread_rwlock_rdlock();
x = 1;
pthread_rwlock_rdunlock();
}
{
pthread_rwlock_wrlock();
if (x) {
x = 0;
} else {
foo();
}
pthread_rwlock_wrunlock();
}

I am going to repeat myself a bit because this is an important point.
Then I will give up.
First, this code smells on its surface. If I were interviewing a
candidate who submitted this as a code sample, it would be an easy
"strong no hire" decision.

Ouch.

Post by Patrick J. LoPresti
Grabbing a reader lock to write a
variable? What the heck is that?

It's an eviction heuristic. It doesn't even need to be perfectly accurate.

Post by Patrick J. LoPresti
No matter how smart you think you are, sooner or later, you will work
with people who lack your genius. Truly smart programmers write code
that any idiot can prove correct. This does not qualify, to put it
mildly.
Second, as we keep trying to explain, concurrent writes to the same
location are *undefined behavior* under every common threading model
(including, in particular, POSIX and C++11). That means a compiler
looking at this code can PROVE that only one thread ever runs the
first block concurrently. If that is not true, then you have
introduced a contradiction into the compiler's reasoning. "From a
contradiction, anything follows." How badly this code breaks depends
only on how smart your compiler is, and they get smarter all the time.

Try to sell a compiler that generates a random executable just because it proves undefined behavior.

I'm not trying to be (that) snarky, just realistic. I understand the risk you're describing and I appreciate you taking the time to do so. But adding extra fences here just because some theoretical compiler might turn this code into a giraffe drawing program is not a useful way for me or the CPU running my program to spend our time.

Post by Patrick J. LoPresti

Post by Leif Walsh
Boehm's paper did not convince me of a bug.

When dealing with multi-threading, the right question is not "Do I
fail to see the problem?" The right question is "Can I create a
simple proof that there is no problem?"
In my experience, most programmers either get this right away, or they
never will.

Post by Leif Walsh
Since pthread locks will include proper mfences I believe this to be correct
even on non-x86 machines.

(a) You believe incorrectly, because undefined behavior can result in
anything whatsoever.
(b) Even if you were correct, the chain of reasoning to prove it is
too long for maintainable code.
Anyway, Helgrind is perfectly correct to warn about this bug.

It is.

Post by Patrick J. LoPresti
- Pat

Godmar Back

2012-12-05 18:36:38 UTC

Post by Julian Seward

Post by Patrick J. LoPresti
There is no such thing as a benign data race. Ever.

<soapbox>
Well said. I couldn't have put any of this better myself.

For the interested audience: Adve/Boehm's very readable and
informative 2010 CACM paper [1] reflects, I believe, Julien's
sentiment as well. If my reading is correct, then the new C and C++
standards (C11 and C11++) effectively adopt the same approach by
leaving the semantics of programs with data races completely
undefined.

Post by Julian Seward
Having spent a considerable amount of time working on Helgrind and then
using it to chase races in some big hairy C++ codes, I became very
skeptical of the "oh it's only a harmless race" arguments. However,
I gave up shouting about it after a while since it just made me look
like a tiresome pedant hellbent on criticising people's clever go-faster-
by-avoiding-locking schemes.

Julien, while you're still on your soapbox, let me make an attempt to
request an answer to a question I sent Oct 1 to this mailing list -
out of academic and practical curiosity, I'm really interested in
learning why Helgrind works the way it does currently, i.e., why did
it abandon Eraser-style locksets [2] in recent versions?

Thanks.

- Godmar

[1] Memory Models: A Case for Rethinking Parallel Languages and Hardware
By Sarita V. Adve, Hans-J. Boehm
Communications of the ACM, Vol. 53 No. 8, Pages 90-101
10.1145/1787234.1787255
http://cacm.acm.org/magazines/2010/8/96610-memory-models-a-case-for-rethinking-parallel-languages-and-hardware/fulltext

[2] http://markmail.org/thread/odrhlvarckfgzvnk
"[Valgrind-users] Q.: Why did Helgrind abandon Eraser-style locksets?"

Julian Seward

2012-12-05 20:49:22 UTC

Post by Godmar Back
learning why Helgrind works the way it does currently, i.e., why did
it abandon Eraser-style locksets [2] in recent versions?

With Memcheck, great efforts were made, and continue to be made, to have
a very low -- essentially zero -- false error rate. Feedback from users
showed that to be something that was regarded as very important.

So it seemed natural to do the same with Helgrind. A pure h-b
framework gives no false positives for race-free programs, provided
it can see all the inter-thread synchronisation events.

J

17 Replies
54 Views
Permalink to this page
Disable enhanced parsing

Thread Navigation

Leif Walsh 2012-12-03 19:05:54 UTC

Julian Seward 2012-12-04 09:49:57 UTC

Leif Walsh 2012-12-04 13:22:24 UTC

Julian Seward 2012-12-04 14:33:54 UTC

Leif Walsh 2012-12-04 15:24:00 UTC

Patrick J. LoPresti 2012-12-04 18:32:21 UTC

Julian Seward 2012-12-05 17:29:08 UTC

Brian Budge 2012-12-05 18:17:20 UTC

Leif Walsh 2012-12-05 18:29:38 UTC

Julian Seward 2012-12-05 20:38:29 UTC

Leif Walsh 2012-12-05 21:39:47 UTC

David Faure 2012-12-05 21:51:41 UTC

Leif Walsh 2012-12-05 21:52:42 UTC

David Faure 2012-12-05 22:02:16 UTC

Patrick J. LoPresti 2012-12-05 21:31:15 UTC

Leif Walsh 2012-12-05 21:51:58 UTC

Godmar Back 2012-12-05 18:36:38 UTC

Julian Seward 2012-12-05 20:49:22 UTC

about - legalese

Loading...