[Valgrind-users] Segfault, Assertion 'bad_scanned_addr >= VG

Jim

2016-02-22 18:26:07 UTC

Hi, I'm using valgrind to debug some memory issues in a parallel
runtime system and program that use both MPI and C++11 threads. I am
hitting an issue that I believe is the same as Valgrind Bug 349128
(https://bugs.kde.org/show_bug.cgi?id=349128). I upgraded to
valgrind-3.12.0.SVN, hoping that Philippe's commit r15716 from October
would fix the issue, but even with the latest from svn I'm still
hitting this bug. It's quite possible, though, that I'm hitting a
different bug (either mine or Valgrind's), so I'm writing in to see if
anyone can help me out with this issue.

Overview: I'm building a new parallel runtime system, and the valgrind
error (segfault -- details below) shows up when the threads are
(almost) done running and are joining back to the parent thread. Many
other tests of my runtime system run under valgrind without this error
(and memcheck doesn't detect any issues with my runtime system -- the
valgrind logs are empty except for the metadata). However, it's quite
possible that there are some memory issues in my program that the
other tests are not detecting.

I can't easily post my source code, as it's hundreds of files, but
here's the basic structure of the test that is failing:

* system: OSX 10.11
* Valgrind valgrind-3.12.0.SVN
* compiler:
  * Configured with:
    --prefix=/Applications/Xcode.app/Contents/Developer/usr
    --with-gxx-include-dir=/usr/include/c++/4.2.1
  * Apple LLVM version 7.0.0 (clang-700.0.72)
  * Target: x86_64-apple-darwin15.0.0
  * Thread model: posix
* 2 MPI processes
* 4 C++11 threads per MPI process (8 threads total)
* each process spawns 4 threads, each running a very minimal function
that runs and returns.
* the threads then go to join back to the main thread, and the segfault
occurs somewhere in _pthread_join_cleanup (in
/usr/lib/system/libsystem_pthread.dylib)
* the program runs to completion *without a segfault* when I don't run
under valgrind. The segfault only arises when I run with valgrind.
* I'm running valgrind on both MPI processes and get valgrind output
for both of the processes. Both processes die with the same error
(segfault on _pthread_join_cleanup)
* valgrind log below

I'd appreciate any input on helping me track this error down. Is this
a problem with valgrind, or with my program? Please let me know if
there's more information I should post.
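
One way to narrow this down might be to take libc++ out of the
picture. The stack in the log below goes std::thread::join ->
pthread_join -> _pthread_join_cleanup, so the same spawn/join pattern
can be written against raw pthreads. The sketch below is only an
illustration under that assumption; run_minimal_pthreads and
minimal_thread_body are hypothetical names.

```cpp
// Sketch only: same spawn/join structure, using raw pthreads instead of
// std::thread. All names are hypothetical.
#include <pthread.h>

#include <cassert>
#include <cstddef>
#include <vector>

// Trivial thread body: record that the thread ran, then return.
static void* minimal_thread_body(void* arg) {
    *static_cast<int*>(arg) = 1;
    return nullptr;
}

// Create `num_threads` pthreads, join each one, and return how many ran.
int run_minimal_pthreads(int num_threads) {
    assert(num_threads >= 0);
    // One flag per thread; the vector is never resized after this point,
    // so the addresses handed to the threads stay valid.
    std::vector<int> ran(static_cast<std::size_t>(num_threads), 0);
    std::vector<pthread_t> threads;
    threads.reserve(ran.size());
    for (std::size_t i = 0; i < ran.size(); ++i) {
        pthread_t tid;
        if (pthread_create(&tid, nullptr, minimal_thread_body, &ran[i]) == 0) {
            threads.push_back(tid);
        }
    }
    for (pthread_t tid : threads) {
        // pthread_join is the frame directly above _pthread_join_cleanup
        // in the reported stack.
        pthread_join(tid, nullptr);
    }
    int count = 0;
    for (int flag : ran) {
        count += flag;
    }
    return count;
}
```

Calling run_minimal_pthreads(4) from main and running the result under
valgrind would exercise the join path without libc++ in between.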

Thank you!

Jim

Here's the valgrind log:

==59226== Memcheck, a memory error detector
==59226== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==59226== Using Valgrind-3.12.0.SVN and LibVEX; rerun with -h for copyright info
==59226== Command: specific_bundle_test 1
==59226== Parent PID: 59224
==59226==
--59226--
--59226-- Valgrind options:
--59226-- -v
--59226-- --trace-children=yes
--59226-- --log-file=valgrind/memcheck-%p.valgrind
--59226-- --tool=memcheck
--59226-- --leak-check=yes
--59226-- --dsymutil=yes
--59226-- --extra-debuginfo-path=/Users/jim/Documents/Research/projects/hybrid_programming/pure/test/../lib
--59226-- Output from sysctl({CTL_KERN,KERN_VERSION}):
--59226-- Darwin Kernel Version 15.0.0: Wed Aug 26 16:57:32 PDT 2015; root:xnu-3247.1.106~1/RELEASE_X86_64
--59226-- Arch and hwcaps: AMD64, LittleEndian, amd64-cx16-rdtscp-sse3
--59226-- Page sizes: currently 4096, max supported 4096
--59226-- Valgrind library directory: /Users/jim/local/lib/valgrind
--59226-- ./specific_bundle_test (rx at 0x100000000, rw at 0x1000ca000)
--59226-- reading syms from primary file (583 6949)
--59226-- dSYM= ./specific_bundle_test.dSYM/Contents/Resources/DWARF/specific_bundle_test
--59226-- reading dwarf3 from dsyms file
--59226-- /usr/lib/dyld (rx at 0x7fff5fc00000, rw at 0x7fff5fc37000)
--59226-- reading syms from primary file (6 1226)
--59226-- Scheduler: using generic scheduler lock implementation.
--59226-- Reading suppressions file: /Users/jim/local/lib/valgrind/default.supp
==59226== embedded gdbserver: reading from /var/folders/c1/vxvr6h9x10b8dbsxhh6nx05h0000gn/T//vgdb-pipe-from-vgdb-to-59226-by-jim-on-???
==59226== embedded gdbserver: writing to /var/folders/c1/vxvr6h9x10b8dbsxhh6nx05h0000gn/T//vgdb-pipe-to-vgdb-from-59226-by-jim-on-???
==59226== embedded gdbserver: shared mem /var/folders/c1/vxvr6h9x10b8dbsxhh6nx05h0000gn/T//vgdb-pipe-shared-mem-vgdb-59226-by-jim-on-???
==59226==
==59226== TO CONTROL THIS PROCESS USING vgdb (which you probably
==59226== don't want to do, unless you know exactly what you're doing,
==59226== or are doing some strange experiment):
==59226== /Users/jim/local/lib/valgrind/../../bin/vgdb --pid=59226 ...command...
==59226==
==59226== TO DEBUG THIS PROCESS USING GDB: start GDB like this
==59226== /path/to/gdb specific_bundle_test
==59226== and then give GDB the following command
==59226== target remote | /Users/jim/local/lib/valgrind/../../bin/vgdb --pid=59226
==59226== --pid is optional if only one valgrind process is running
==59226==
--59226-- REDIR: 0x7fff5fc1e5b9 (dyld:arc4random) redirected to 0x23806e30e (???)
--59226-- REDIR: 0x7fff5fc24780 (dyld:strcmp) redirected to 0x23806e270 (???)
--59226-- REDIR: 0x7fff5fc1e380 (dyld:strlen) redirected to 0x23806e23f (???)
--59226-- REDIR: 0x7fff5fc1e2e0 (dyld:strcpy) redirected to 0x23806e28c (???)
--59226-- REDIR: 0x7fff5fc21cdf (dyld:strcat) redirected to 0x23806e250 (???)
--59226-- REDIR: 0x7fff5fc21d1f (dyld:strlcat) redirected to 0x23806e2a9 (???)
--59226-- /Users/jim/local/lib/valgrind/vgpreload_core-amd64-darwin.so (rx at 0x100139000, rw at 0x10013b000)
--59226-- reading syms from primary file (3 42)
--59226-- dSYM= /Users/jim/local/lib/valgrind/vgpreload_core-amd64-darwin.so.dSYM/Contents/Resources/DWARF/vgpreload_core-amd64-darwin.so
--59226-- reading dwarf3 from dsyms file
--59226-- /Users/jim/local/lib/valgrind/vgpreload_memcheck-amd64-darwin.so (rx at 0x10013d000, rw at 0x100143000)
--59226-- reading syms from primary file (72 356)
--59226-- dSYM= /Users/jim/local/lib/valgrind/vgpreload_memcheck-amd64-darwin.so.dSYM/Contents/Resources/DWARF/vgpreload_memcheck-amd64-darwin.so
--59226-- reading dwarf3 from dsyms file
--59226-- /usr/local/Cellar/mpich/3.1.4_1/lib/libmpi.12.dylib (rx at 0x100148000, rw at 0x100220000)
--59226-- reading syms from primary file (595 736)
--59226-- /usr/lib/libSystem.B.dylib (rx at 0x100241000, rw at 0x100243000)
--59226-- reading syms from primary file (31 5)
--59226-- /usr/local/Cellar/mpich/3.1.4_1/lib/libmpicxx.12.dylib (rx at 0x100248000, rw at 0x100251000)
--59226-- reading syms from primary file (329 285)
--59226-- /usr/local/Cellar/mpich/3.1.4_1/lib/libpmpi.12.dylib (rx at 0x100262000, rw at 0x100463000)
--59226-- reading syms from primary file (2070 4514)
--59226-- /usr/lib/libc++.1.dylib (rx at 0x100506000, rw at 0x10055a000)
--59226-- reading syms from primary file (1960 1590)
--59226-- /usr/local/Cellar/gcc/5.2.0/lib/gcc/5/libgfortran.3.dylib (rx at 0x1005b6000, rw at 0x1006d0000)
--59226-- reading syms from primary file (1327 10741)
--59226-- /usr/local/Cellar/gcc/5.2.0/lib/gcc/5/libgcc_s.1.dylib (rx at 0x10073a000, rw at 0x100750000)
--59226-- reading syms from primary file (159 1046)
--59226-- /usr/local/Cellar/gcc/5.2.0/lib/gcc/5/libquadmath.0.dylib (rx at 0x10075a000, rw at 0x100792000)
--59226-- reading syms from primary file (98 1394)
--59226-- /usr/lib/system/libcache.dylib (rx at 0x1007a0000, rw at 0x1007a5000)
--59226-- reading syms from primary file (32 30)
--59226-- /usr/lib/system/libcommonCrypto.dylib (rx at 0x1007aa000, rw at 0x1007b6000)
--59226-- reading syms from primary file (214 188)
--59226-- /usr/lib/system/libcompiler_rt.dylib (rx at 0x1007c3000, rw at 0x1007cb000)
--59226-- reading syms from primary file (510 8)
--59226-- /usr/lib/system/libcopyfile.dylib (rx at 0x1007d8000, rw at 0x1007e1000)
--59226-- reading syms from primary file (13 35)
--59226-- /usr/lib/system/libcorecrypto.dylib (rx at 0x1007e7000, rw at 0x10085f000)
--59226-- reading syms from primary file (428 602)
--59226-- /usr/lib/system/libdispatch.dylib (rx at 0x100877000, rw at 0x1008a5000)
--59226-- reading syms from primary file (215 832)
--59226-- /usr/lib/system/libdyld.dylib (rx at 0x1008ce000, rw at 0x1008d2000)
--59226-- reading syms from primary file (80 109)
--59226-- /usr/lib/system/libkeymgr.dylib (rx at 0x1008d9000, rw at 0x1008da000)
--59226-- reading syms from primary file (12 3)
--59226-- /usr/lib/system/libmacho.dylib (rx at 0x1008e5000, rw at 0x1008eb000)
--59226-- reading syms from primary file (97 1)
--59226-- /usr/lib/system/libquarantine.dylib (rx at 0x1008f1000, rw at 0x1008f4000)
--59226-- reading syms from primary file (67 32)
--59226-- /usr/lib/system/libremovefile.dylib (rx at 0x1008fa000, rw at 0x1008fc000)
--59226-- reading syms from primary file (15 4)
--59226-- /usr/lib/system/libsystem_asl.dylib (rx at 0x100901000, rw at 0x100919000)
--59226-- reading syms from primary file (222 225)
--59226-- /usr/lib/system/libsystem_blocks.dylib (rx at 0x100926000, rw at 0x100928000)
--59226-- reading syms from primary file (25 22)
--59226-- /usr/lib/system/libsystem_c.dylib (rx at 0x10092c000, rw at 0x1009ba000)
--59226-- reading syms from primary file (1308 746)
--59226-- /usr/lib/system/libsystem_configuration.dylib (rx at 0x1009e5000, rw at 0x1009e8000)
--59226-- reading syms from primary file (28 58)
--59226-- /usr/lib/system/libsystem_coreservices.dylib (rx at 0x1009ee000, rw at 0x1009f1000)
--59226-- reading syms from primary file (13 30)
--59226-- /usr/lib/system/libsystem_coretls.dylib (rx at 0x1009f6000, rw at 0x100a0b000)
--59226-- reading syms from primary file (115 241)
--59226-- /usr/lib/system/libsystem_dnssd.dylib (rx at 0x100a14000, rw at 0x100a1d000)
--59226-- reading syms from primary file (68 33)
--59226-- /usr/lib/system/libsystem_info.dylib (rx at 0x100a23000, rw at 0x100a4d000)
--59226-- reading syms from primary file (526 527)
--59226-- /usr/lib/system/libsystem_kernel.dylib (rx at 0x100a62000, rw at 0x100a81000)
--59226-- reading syms from primary file (1046 83)
--59226-- /usr/lib/system/libsystem_m.dylib (rx at 0x100a96000, rw at 0x100ac6000)
--59226-- reading syms from primary file (593 1)
--59226-- /usr/lib/system/libsystem_malloc.dylib (rx at 0x100ad2000, rw at 0x100aef000)
--59226-- reading syms from primary file (102 201)
--59226-- /usr/lib/system/libsystem_network.dylib (rx at 0x100af8000, rw at 0x100b57000)
--59226-- reading syms from primary file (664 1939)
--59226-- /usr/lib/system/libsystem_networkextension.dylib (rx at 0x100b8c000, rw at 0x100b95000)
--59226-- reading syms from primary file (82 235)
--59226-- /usr/lib/system/libsystem_notify.dylib (rx at 0x100ba0000, rw at 0x100baa000)
--59226-- reading syms from primary file (136 53)
--59226-- /usr/lib/system/libsystem_platform.dylib (rx at 0x100bb2000, rw at 0x100bbb000)
--59226-- reading syms from primary file (142 158)
--59226-- /usr/lib/system/libsystem_pthread.dylib (rx at 0x100bc3000, rw at 0x100bcd000)
--59226-- reading syms from primary file (163 70)
--59226-- /usr/lib/system/libsystem_sandbox.dylib (rx at 0x100bda000, rw at 0x100bde000)
--59226-- reading syms from primary file (79 7)
--59226-- /usr/lib/system/libsystem_secinit.dylib (rx at 0x100be4000, rw at 0x100be6000)
--59226-- reading syms from primary file (3 6)
--59226-- /usr/lib/system/libsystem_trace.dylib (rx at 0x100beb000, rw at 0x100bfd000)
--59226-- reading syms from primary file (94 351)
--59226-- /usr/lib/system/libunwind.dylib (rx at 0x100c0f000, rw at 0x100c15000)
--59226-- reading syms from primary file (102 52)
--59226-- /usr/lib/system/libxpc.dylib (rx at 0x100c1c000, rw at 0x100c46000)
--59226-- reading syms from primary file (503 833)
--59226-- /usr/lib/libobjc.A.dylib (rx at 0x100c64000, rw at 0x100fcf000)
--59226-- reading syms from primary file (347 935)
--59226-- /usr/lib/libauto.dylib (rx at 0x1010af000, rw at 0x1010f6000)
--59226-- reading syms from primary file (68 658)
--59226-- /usr/lib/libc++abi.dylib (rx at 0x10110b000, rw at 0x101135000)
--59226-- reading syms from primary file (337 181)
--59226-- /usr/lib/libDiagnosticMessagesClient.dylib (rx at 0x101143000, rw at 0x101145000)
--59226-- reading syms from primary file (21 14)
--59226-- REDIR: 0x100bb2ac0 (libsystem_platform.dylib:_platform_memchr$VARIANT$Generic) redirected to 0x100140910 (_platform_memchr$VARIANT$Generic)
--59226-- REDIR: 0x100bb2c80 (libsystem_platform.dylib:_platform_memcmp) redirected to 0x100140f10 (_platform_memcmp)
--59226-- REDIR: 0x100bb3220 (libsystem_platform.dylib:_platform_strncmp) redirected to 0x1001407b0 (_platform_strncmp)
--59226-- REDIR: 0x100ad30b2 (libsystem_malloc.dylib:malloc) redirected to 0x10013de00 (malloc)
--59226-- REDIR: 0x10092cd20 (libsystem_c.dylib:strlen) redirected to 0x100140340 (strlen)
--59226-- REDIR: 0x100bb3800 (libsystem_platform.dylib:_platform_strcmp) redirected to 0x100140870 (_platform_strcmp)
--59226-- REDIR: 0x100ad5ea8 (libsystem_malloc.dylib:free) redirected to 0x10013e3b0 (free)
--59226-- REDIR: 0x100ad8441 (libsystem_malloc.dylib:calloc) redirected to 0x10013e790 (calloc)
--59226-- REDIR: 0x100ad7949 (libsystem_malloc.dylib:malloc_default_zone) redirected to 0x10013fde0 (malloc_default_zone)
--59226-- REDIR: 0x100ad456a (libsystem_malloc.dylib:malloc_zone_malloc) redirected to 0x10013e170 (malloc_zone_malloc)
--59226-- REDIR: 0x100ad7968 (libsystem_malloc.dylib:malloc_zone_calloc) redirected to 0x10013ea50 (malloc_zone_calloc)
--59226-- REDIR: 0x100ad7a22 (libsystem_malloc.dylib:malloc_zone_from_ptr) redirected to 0x10013fe30 (malloc_zone_from_ptr)
--59226-- REDIR: 0x100bb3380 (libsystem_platform.dylib:_platform_strchr$VARIANT$Generic) redirected to 0x100140190 (_platform_strchr$VARIANT$Generic)
--59226-- REDIR: 0x100ad8644 (libsystem_malloc.dylib:realloc) redirected to 0x10013ecb0 (realloc)
--59226-- REDIR: 0x100ada9f0 (libsystem_malloc.dylib:malloc_zone_memalign) redirected to 0x10013f7f0 (malloc_zone_memalign)
--59226-- REDIR: 0x10092cd80 (libsystem_c.dylib:strncpy) redirected to 0x100140540 (strncpy)
--59226-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option
--59226-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option (repeated 2 times)
--59226-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option (repeated 4 times)
--59226-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option (repeated 8 times)
==59226==
==59226== Process terminating with default action of signal 11 (SIGSEGV)
==59226== Access not within mapped region at address 0x7000036B2C1C
==59226==    at 0x100BC8873: _pthread_join_cleanup (in /usr/lib/system/libsystem_pthread.dylib)
==59226==    by 0x100BC87D7: pthread_join (in /usr/lib/system/libsystem_pthread.dylib)
==59226==    by 0x10054BE94: std::__1::thread::join() (in /usr/lib/libc++.1.dylib)
==59226==    by 0x1000A76DE: PureProcess::RunPureThreads(int, char**) (pure_process.cpp:128)
==59226==    by 0x1000B63AC: PureRT::Pure::Run(int, char**) (pure.cpp:53)
==59226==    by 0x1000B6431: main (pure.cpp:62)
==59226== If you believe this happened as a result of a stack
==59226== overflow in your program's main thread (unlikely but
==59226== possible), you can try to increase the size of the
==59226== main thread stack using the --main-stacksize= flag.
==59226== The main thread stack size used in this run was 8388608.
==59226==
==59226== HEAP SUMMARY:
==59226== in use at exit: 112,031 bytes in 651 blocks
==59226== total heap usage: 875 allocs, 224 frees, 178,835 bytes allocated
==59226==
==59226== Searching for pointers to 651 not-freed blocks

Memcheck: mc_leakcheck.c:1106 (void lc_scan_memory(Addr, SizeT, Bool, Int, Int, Addr, SizeT)): Assertion 'bad_scanned_addr >= VG_ROUNDUP(start, sizeof(Addr))' failed.

host stacktrace:
==59226== at 0x23804F24E: ???
==59226== by 0x23804F66C: ???
==59226== by 0x23804F64A: ???
==59226== by 0x238003953: ???
==59226== by 0x238003135: ???
==59226== by 0x238001D17: ???
==59226== by 0x23801482D: ???
==59226== by 0x23805C0F2: ???
==59226== by 0x2380EE752: ???

sched status:
running_tid=1

Note: see also the FAQ in the source distribution.
It contains workarounds to several common problems.
In particular, if Valgrind aborted or crashed after
identifying problems in your program, there's a good chance
that fixing those problems will prevent Valgrind aborting or
crashing, especially if it happened in m_mallocfree.c.

If that doesn't help, please report this bug to: www.valgrind.org

In the bug report, send all the above text, the valgrind
version, and what OS and version you are using. Thanks.
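
For reference, here is a rough sketch of the condition in the failed
assertion, 'bad_scanned_addr >= VG_ROUNDUP(start, sizeof(Addr))'. It
assumes that VG_ROUNDUP(p, a) rounds p up to the next multiple of a
power-of-two a, and it uses std::uintptr_t as a stand-in for Valgrind's
Addr type. The helper names (round_down, round_up,
fault_addr_not_below_range) are hypothetical and are not Valgrind's own
code.

```cpp
// Sketch of the arithmetic behind the failed assertion. Not Valgrind
// source: Addr and the helpers below are stand-ins under the stated
// assumption about what VG_ROUNDUP does.
#include <cassert>
#include <cstddef>
#include <cstdint>

using Addr = std::uintptr_t;  // stand-in for Valgrind's Addr

inline bool is_power_of_two(std::size_t a) {
    return a != 0 && (a & (a - 1)) == 0;
}

// Round p down to a multiple of a (a must be a power of two).
inline Addr round_down(Addr p, std::size_t a) {
    assert(is_power_of_two(a));
    return p & ~(static_cast<Addr>(a) - 1);
}

// Round p up to a multiple of a (a must be a power of two).
inline Addr round_up(Addr p, std::size_t a) {
    return round_down(p + (a - 1), a);
}

// The asserted condition: an address that faulted while scanning a
// range should not lie below the first word-aligned address at or
// after the start of that range.
inline bool fault_addr_not_below_range(Addr bad_scanned_addr, Addr start) {
    return bad_scanned_addr >= round_up(start, sizeof(Addr));
}
```

Under that reading, round_up(0x1001, 8) is 0x1008, and the assertion
failing means the address recorded for the fault was lower than the
aligned start of the block being scanned.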