Discussion:
[Valgrind-users] Segfault, Assertion 'bad_scanned_addr >= VG_ROUNDUP (MPI / pthread program)
Jim
2016-02-22 18:26:07 UTC
Hi, I'm using valgrind to debug some memory issues in a parallel
runtime system and program that uses both MPI and C++11 threads. I am
hitting an issue that, I believe, is the same as Valgrind Bug 349128
(https://bugs.kde.org/show_bug.cgi?id=349128). However, I upgraded to
valgrind-3.12.0.SVN, which I was hoping would fix the issue as per
Philippe's commit r15716 in October. Despite running the latest from
svn, I'm still hitting this bug. It's quite possible, though, that I'm
hitting a different bug (either mine or Valgrind's). So, I'm writing
in to see if anyone can help me out with this issue.

Overview: I'm building a new parallel runtime system, and the valgrind
error (segfault -- details below) shows up when the threads are done
running (almost) and are joining back to the parent thread. I'm able
to run many other tests using my runtime system under valgrind
without this error (and memcheck doesn't detect any issues with my
runtime system -- the valgrind logs are empty except for the
metadata). However, it's quite possible that there are some memory
issues in my program that the other tests are not detecting.

I can't easily post my source code, as it's hundreds of files, but
here's the basic structure of the test that is failing:

* system: OSX 10.11
* Valgrind: valgrind-3.12.0.SVN
* compiler:
  * Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
  * Apple LLVM version 7.0.0 (clang-700.0.72)
  * Target: x86_64-apple-darwin15.0.0
  * Thread model: posix
* 2 MPI processes
* 4 C++11 threads per MPI process (8 threads total)
* each process spawns 4 threads running a very minimal function; the
  function runs and returns.
* the threads go to join back to the main thread, and the program
  segfaults somewhere in _pthread_join_cleanup (in
  /usr/lib/system/libsystem_pthread.dylib)
* the program runs to completion *without segfault* when I don't run it
  under valgrind. The segfault only arises when I run with valgrind.
* I'm running valgrind on both MPI processes and get valgrind output
  for both of the processes. Both processes die with the same error
  (segfault in _pthread_join_cleanup)
* valgrind log below
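Since the full source can't be posted, one way to narrow this down is a standalone program with the same spawn/join shape as the test described in the list. The following is only a sketch under that assumption: it supposes the crash depends on the thread spawn/join pattern alone, `minimal_worker` and `run_threads` are hypothetical names (not taken from the runtime), and MPI is left out on purpose, so if it runs cleanly under valgrind, wrapping it in MPI_Init/MPI_Finalize would be the next variable to add.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for the runtime's per-thread function: the post
// only says it is "very minimal" and returns, so it just bumps a counter.
void minimal_worker(std::atomic<int>& finished) {
    finished.fetch_add(1, std::memory_order_relaxed);
}

// Spawns `threads_per_process` std::threads and joins them, mirroring the
// spawn/join shape of the failing test. MPI_Init/MPI_Finalize are left out
// deliberately, to see whether the SEGV in _pthread_join_cleanup needs MPI.
// Returns how many workers finished once every join has completed.
int run_threads(int threads_per_process) {
    assert(threads_per_process >= 0);
    std::atomic<int> finished{0};
    std::vector<std::thread> workers;
    workers.reserve(threads_per_process);
    for (int i = 0; i < threads_per_process; ++i) {
        workers.emplace_back(minimal_worker, std::ref(finished));
    }
    for (std::thread& t : workers) {
        // In the original program, the reported SEGV happens inside this call.
        t.join();
    }
    return finished.load();
}
```

Calling run_threads(4) from a small main and running that binary under valgrind on OS X 10.11 would show whether the plain C++11 join path alone is enough to reach the SEGV in _pthread_join_cleanup.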

I'd appreciate any input on helping me track this error down. Is this
a problem with valgrind, or with my program? Please let me know if
there's more information I should post.

Thank you!

Jim



Here's the valgrind log:

==59226== Memcheck, a memory error detector
==59226== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==59226== Using Valgrind-3.12.0.SVN and LibVEX; rerun with -h for copyright info
==59226== Command: specific_bundle_test 1
==59226== Parent PID: 59224
==59226==
--59226--
--59226-- Valgrind options:
--59226-- -v
--59226-- --trace-children=yes
--59226-- --log-file=valgrind/memcheck-%p.valgrind
--59226-- --tool=memcheck
--59226-- --leak-check=yes
--59226-- --dsymutil=yes
--59226-- --extra-debuginfo-path=/Users/jim/Documents/Research/projects/hybrid_programming/pure/test/../lib
--59226-- Output from sysctl({CTL_KERN,KERN_VERSION}):
--59226-- Darwin Kernel Version 15.0.0: Wed Aug 26 16:57:32 PDT 2015; root:xnu-3247.1.106~1/RELEASE_X86_64
--59226-- Arch and hwcaps: AMD64, LittleEndian, amd64-cx16-rdtscp-sse3
--59226-- Page sizes: currently 4096, max supported 4096
--59226-- Valgrind library directory: /Users/jim/local/lib/valgrind
--59226-- ./specific_bundle_test (rx at 0x100000000, rw at 0x1000ca000)
--59226-- reading syms from primary file (583 6949)
--59226-- dSYM= ./specific_bundle_test.dSYM/Contents/Resources/DWARF/specific_bundle_test
--59226-- reading dwarf3 from dsyms file
--59226-- /usr/lib/dyld (rx at 0x7fff5fc00000, rw at 0x7fff5fc37000)
--59226-- reading syms from primary file (6 1226)
--59226-- Scheduler: using generic scheduler lock implementation.
--59226-- Reading suppressions file: /Users/jim/local/lib/valgrind/default.supp
==59226== embedded gdbserver: reading from /var/folders/c1/vxvr6h9x10b8dbsxhh6nx05h0000gn/T//vgdb-pipe-from-vgdb-to-59226-by-jim-on-???
==59226== embedded gdbserver: writing to /var/folders/c1/vxvr6h9x10b8dbsxhh6nx05h0000gn/T//vgdb-pipe-to-vgdb-from-59226-by-jim-on-???
==59226== embedded gdbserver: shared mem /var/folders/c1/vxvr6h9x10b8dbsxhh6nx05h0000gn/T//vgdb-pipe-shared-mem-vgdb-59226-by-jim-on-???
==59226==
==59226== TO CONTROL THIS PROCESS USING vgdb (which you probably
==59226== don't want to do, unless you know exactly what you're doing,
==59226== or are doing some strange experiment):
==59226== /Users/jim/local/lib/valgrind/../../bin/vgdb --pid=59226 ...command...
==59226==
==59226== TO DEBUG THIS PROCESS USING GDB: start GDB like this
==59226== /path/to/gdb specific_bundle_test
==59226== and then give GDB the following command
==59226== target remote | /Users/jim/local/lib/valgrind/../../bin/vgdb --pid=59226
==59226== --pid is optional if only one valgrind process is running
==59226==
--59226-- REDIR: 0x7fff5fc1e5b9 (dyld:arc4random) redirected to 0x23806e30e (???)
--59226-- REDIR: 0x7fff5fc24780 (dyld:strcmp) redirected to 0x23806e270 (???)
--59226-- REDIR: 0x7fff5fc1e380 (dyld:strlen) redirected to 0x23806e23f (???)
--59226-- REDIR: 0x7fff5fc1e2e0 (dyld:strcpy) redirected to 0x23806e28c (???)
--59226-- REDIR: 0x7fff5fc21cdf (dyld:strcat) redirected to 0x23806e250 (???)
--59226-- REDIR: 0x7fff5fc21d1f (dyld:strlcat) redirected to 0x23806e2a9 (???)
--59226-- /Users/jim/local/lib/valgrind/vgpreload_core-amd64-darwin.so (rx at 0x100139000, rw at 0x10013b000)
--59226-- reading syms from primary file (3 42)
--59226-- dSYM= /Users/jim/local/lib/valgrind/vgpreload_core-amd64-darwin.so.dSYM/Contents/Resources/DWARF/vgpreload_core-amd64-darwin.so
--59226-- reading dwarf3 from dsyms file
--59226-- /Users/jim/local/lib/valgrind/vgpreload_memcheck-amd64-darwin.so (rx at 0x10013d000, rw at 0x100143000)
--59226-- reading syms from primary file (72 356)
--59226-- dSYM= /Users/jim/local/lib/valgrind/vgpreload_memcheck-amd64-darwin.so.dSYM/Contents/Resources/DWARF/vgpreload_memcheck-amd64-darwin.so
--59226-- reading dwarf3 from dsyms file
--59226-- /usr/local/Cellar/mpich/3.1.4_1/lib/libmpi.12.dylib (rx at 0x100148000, rw at 0x100220000)
--59226-- reading syms from primary file (595 736)
--59226-- /usr/lib/libSystem.B.dylib (rx at 0x100241000, rw at 0x100243000)
--59226-- reading syms from primary file (31 5)
--59226-- /usr/local/Cellar/mpich/3.1.4_1/lib/libmpicxx.12.dylib (rx at 0x100248000, rw at 0x100251000)
--59226-- reading syms from primary file (329 285)
--59226-- /usr/local/Cellar/mpich/3.1.4_1/lib/libpmpi.12.dylib (rx at 0x100262000, rw at 0x100463000)
--59226-- reading syms from primary file (2070 4514)
--59226-- /usr/lib/libc++.1.dylib (rx at 0x100506000, rw at 0x10055a000)
--59226-- reading syms from primary file (1960 1590)
--59226-- /usr/local/Cellar/gcc/5.2.0/lib/gcc/5/libgfortran.3.dylib (rx at 0x1005b6000, rw at 0x1006d0000)
--59226-- reading syms from primary file (1327 10741)
--59226-- /usr/local/Cellar/gcc/5.2.0/lib/gcc/5/libgcc_s.1.dylib (rx at 0x10073a000, rw at 0x100750000)
--59226-- reading syms from primary file (159 1046)
--59226-- /usr/local/Cellar/gcc/5.2.0/lib/gcc/5/libquadmath.0.dylib (rx at 0x10075a000, rw at 0x100792000)
--59226-- reading syms from primary file (98 1394)
--59226-- /usr/lib/system/libcache.dylib (rx at 0x1007a0000, rw at 0x1007a5000)
--59226-- reading syms from primary file (32 30)
--59226-- /usr/lib/system/libcommonCrypto.dylib (rx at 0x1007aa000, rw at 0x1007b6000)
--59226-- reading syms from primary file (214 188)
--59226-- /usr/lib/system/libcompiler_rt.dylib (rx at 0x1007c3000, rw at 0x1007cb000)
--59226-- reading syms from primary file (510 8)
--59226-- /usr/lib/system/libcopyfile.dylib (rx at 0x1007d8000, rw at 0x1007e1000)
--59226-- reading syms from primary file (13 35)
--59226-- /usr/lib/system/libcorecrypto.dylib (rx at 0x1007e7000, rw at 0x10085f000)
--59226-- reading syms from primary file (428 602)
--59226-- /usr/lib/system/libdispatch.dylib (rx at 0x100877000, rw at 0x1008a5000)
--59226-- reading syms from primary file (215 832)
--59226-- /usr/lib/system/libdyld.dylib (rx at 0x1008ce000, rw at 0x1008d2000)
--59226-- reading syms from primary file (80 109)
--59226-- /usr/lib/system/libkeymgr.dylib (rx at 0x1008d9000, rw at 0x1008da000)
--59226-- reading syms from primary file (12 3)
--59226-- /usr/lib/system/libmacho.dylib (rx at 0x1008e5000, rw at 0x1008eb000)
--59226-- reading syms from primary file (97 1)
--59226-- /usr/lib/system/libquarantine.dylib (rx at 0x1008f1000, rw at 0x1008f4000)
--59226-- reading syms from primary file (67 32)
--59226-- /usr/lib/system/libremovefile.dylib (rx at 0x1008fa000, rw at 0x1008fc000)
--59226-- reading syms from primary file (15 4)
--59226-- /usr/lib/system/libsystem_asl.dylib (rx at 0x100901000, rw at 0x100919000)
--59226-- reading syms from primary file (222 225)
--59226-- /usr/lib/system/libsystem_blocks.dylib (rx at 0x100926000, rw at 0x100928000)
--59226-- reading syms from primary file (25 22)
--59226-- /usr/lib/system/libsystem_c.dylib (rx at 0x10092c000, rw at 0x1009ba000)
--59226-- reading syms from primary file (1308 746)
--59226-- /usr/lib/system/libsystem_configuration.dylib (rx at 0x1009e5000, rw at 0x1009e8000)
--59226-- reading syms from primary file (28 58)
--59226-- /usr/lib/system/libsystem_coreservices.dylib (rx at 0x1009ee000, rw at 0x1009f1000)
--59226-- reading syms from primary file (13 30)
--59226-- /usr/lib/system/libsystem_coretls.dylib (rx at 0x1009f6000, rw at 0x100a0b000)
--59226-- reading syms from primary file (115 241)
--59226-- /usr/lib/system/libsystem_dnssd.dylib (rx at 0x100a14000, rw at 0x100a1d000)
--59226-- reading syms from primary file (68 33)
--59226-- /usr/lib/system/libsystem_info.dylib (rx at 0x100a23000, rw at 0x100a4d000)
--59226-- reading syms from primary file (526 527)
--59226-- /usr/lib/system/libsystem_kernel.dylib (rx at 0x100a62000, rw at 0x100a81000)
--59226-- reading syms from primary file (1046 83)
--59226-- /usr/lib/system/libsystem_m.dylib (rx at 0x100a96000, rw at 0x100ac6000)
--59226-- reading syms from primary file (593 1)
--59226-- /usr/lib/system/libsystem_malloc.dylib (rx at 0x100ad2000, rw at 0x100aef000)
--59226-- reading syms from primary file (102 201)
--59226-- /usr/lib/system/libsystem_network.dylib (rx at 0x100af8000, rw at 0x100b57000)
--59226-- reading syms from primary file (664 1939)
--59226-- /usr/lib/system/libsystem_networkextension.dylib (rx at 0x100b8c000, rw at 0x100b95000)
--59226-- reading syms from primary file (82 235)
--59226-- /usr/lib/system/libsystem_notify.dylib (rx at 0x100ba0000, rw at 0x100baa000)
--59226-- reading syms from primary file (136 53)
--59226-- /usr/lib/system/libsystem_platform.dylib (rx at 0x100bb2000, rw at 0x100bbb000)
--59226-- reading syms from primary file (142 158)
--59226-- /usr/lib/system/libsystem_pthread.dylib (rx at 0x100bc3000, rw at 0x100bcd000)
--59226-- reading syms from primary file (163 70)
--59226-- /usr/lib/system/libsystem_sandbox.dylib (rx at 0x100bda000, rw at 0x100bde000)
--59226-- reading syms from primary file (79 7)
--59226-- /usr/lib/system/libsystem_secinit.dylib (rx at 0x100be4000, rw at 0x100be6000)
--59226-- reading syms from primary file (3 6)
--59226-- /usr/lib/system/libsystem_trace.dylib (rx at 0x100beb000, rw at 0x100bfd000)
--59226-- reading syms from primary file (94 351)
--59226-- /usr/lib/system/libunwind.dylib (rx at 0x100c0f000, rw at 0x100c15000)
--59226-- reading syms from primary file (102 52)
--59226-- /usr/lib/system/libxpc.dylib (rx at 0x100c1c000, rw at 0x100c46000)
--59226-- reading syms from primary file (503 833)
--59226-- /usr/lib/libobjc.A.dylib (rx at 0x100c64000, rw at 0x100fcf000)
--59226-- reading syms from primary file (347 935)
--59226-- /usr/lib/libauto.dylib (rx at 0x1010af000, rw at 0x1010f6000)
--59226-- reading syms from primary file (68 658)
--59226-- /usr/lib/libc++abi.dylib (rx at 0x10110b000, rw at 0x101135000)
--59226-- reading syms from primary file (337 181)
--59226-- /usr/lib/libDiagnosticMessagesClient.dylib (rx at 0x101143000, rw at 0x101145000)
--59226-- reading syms from primary file (21 14)
--59226-- REDIR: 0x100bb2ac0 (libsystem_platform.dylib:_platform_memchr$VARIANT$Generic) redirected to 0x100140910 (_platform_memchr$VARIANT$Generic)
--59226-- REDIR: 0x100bb2c80 (libsystem_platform.dylib:_platform_memcmp) redirected to 0x100140f10 (_platform_memcmp)
--59226-- REDIR: 0x100bb3220 (libsystem_platform.dylib:_platform_strncmp) redirected to 0x1001407b0 (_platform_strncmp)
--59226-- REDIR: 0x100ad30b2 (libsystem_malloc.dylib:malloc) redirected to 0x10013de00 (malloc)
--59226-- REDIR: 0x10092cd20 (libsystem_c.dylib:strlen) redirected to 0x100140340 (strlen)
--59226-- REDIR: 0x100bb3800 (libsystem_platform.dylib:_platform_strcmp) redirected to 0x100140870 (_platform_strcmp)
--59226-- REDIR: 0x100ad5ea8 (libsystem_malloc.dylib:free) redirected to 0x10013e3b0 (free)
--59226-- REDIR: 0x100ad8441 (libsystem_malloc.dylib:calloc) redirected to 0x10013e790 (calloc)
--59226-- REDIR: 0x100ad7949 (libsystem_malloc.dylib:malloc_default_zone) redirected to 0x10013fde0 (malloc_default_zone)
--59226-- REDIR: 0x100ad456a (libsystem_malloc.dylib:malloc_zone_malloc) redirected to 0x10013e170 (malloc_zone_malloc)
--59226-- REDIR: 0x100ad7968 (libsystem_malloc.dylib:malloc_zone_calloc) redirected to 0x10013ea50 (malloc_zone_calloc)
--59226-- REDIR: 0x100ad7a22 (libsystem_malloc.dylib:malloc_zone_from_ptr) redirected to 0x10013fe30 (malloc_zone_from_ptr)
--59226-- REDIR: 0x100bb3380 (libsystem_platform.dylib:_platform_strchr$VARIANT$Generic) redirected to 0x100140190 (_platform_strchr$VARIANT$Generic)
--59226-- REDIR: 0x100ad8644 (libsystem_malloc.dylib:realloc) redirected to 0x10013ecb0 (realloc)
--59226-- REDIR: 0x100ada9f0 (libsystem_malloc.dylib:malloc_zone_memalign) redirected to 0x10013f7f0 (malloc_zone_memalign)
--59226-- REDIR: 0x10092cd80 (libsystem_c.dylib:strncpy) redirected to 0x100140540 (strncpy)
--59226-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option
--59226-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option (repeated 2 times)
--59226-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option (repeated 4 times)
--59226-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option (repeated 8 times)
==59226==
==59226== Process terminating with default action of signal 11 (SIGSEGV)
==59226== Access not within mapped region at address 0x7000036B2C1C
==59226== at 0x100BC8873: _pthread_join_cleanup (in /usr/lib/system/libsystem_pthread.dylib)
==59226== by 0x100BC87D7: pthread_join (in /usr/lib/system/libsystem_pthread.dylib)
==59226== by 0x10054BE94: std::__1::thread::join() (in /usr/lib/libc++.1.dylib)
==59226== by 0x1000A76DE: PureProcess::RunPureThreads(int, char**) (pure_process.cpp:128)
==59226== by 0x1000B63AC: PureRT::Pure::Run(int, char**) (pure.cpp:53)
==59226== by 0x1000B6431: main (pure.cpp:62)
==59226== If you believe this happened as a result of a stack
==59226== overflow in your program's main thread (unlikely but
==59226== possible), you can try to increase the size of the
==59226== main thread stack using the --main-stacksize= flag.
==59226== The main thread stack size used in this run was 8388608.
==59226==
==59226== HEAP SUMMARY:
==59226== in use at exit: 112,031 bytes in 651 blocks
==59226== total heap usage: 875 allocs, 224 frees, 178,835 bytes allocated
==59226==
==59226== Searching for pointers to 651 not-freed blocks

Memcheck: mc_leakcheck.c:1106 (void lc_scan_memory(Addr, SizeT, Bool, Int, Int, Addr, SizeT)): Assertion 'bad_scanned_addr >= VG_ROUNDUP(start, sizeof(Addr))' failed.

host stacktrace:
==59226== at 0x23804F24E: ???
==59226== by 0x23804F66C: ???
==59226== by 0x23804F64A: ???
==59226== by 0x238003953: ???
==59226== by 0x238003135: ???
==59226== by 0x238001D17: ???
==59226== by 0x23801482D: ???
==59226== by 0x23805C0F2: ???
==59226== by 0x2380EE752: ???

sched status:
running_tid=1


Note: see also the FAQ in the source distribution.
It contains workarounds to several common problems.
In particular, if Valgrind aborted or crashed after
identifying problems in your program, there's a good chance
that fixing those problems will prevent Valgrind aborting or
crashing, especially if it happened in m_mallocfree.c.

If that doesn't help, please report this bug to: www.valgrind.org

In the bug report, send all the above text, the valgrind
version, and what OS and version you are using. Thanks.
Philippe Waroquiers
2016-02-22 22:17:56 UTC
Post by Jim
I'd appreciate any input on helping me track this error down. Is this
a problem with valgrind, or with my program? Please let me know if
there's more information I should post.
Can you do the following 4 trials and attach the resulting logs
to bug 349128, which is probably the best-matching bug?

Looking at the log of your program, I am not very sure
that the 'initial problem' is linked to the leak search.
Bug 353891, fixed in revision 15716, had the following symptoms:
* program running normally, no SEGV encountered
* leak search starting
* SEGV during the leak search, due to unprotected heuristic dereferences
* assert failing, as there was no specific handler for heuristic dereferences

In your case, by contrast, we first see a SEGV,
followed by a leak search that then triggers a similar assert.

So, I suspect what you see is a 'client' SEGV that terminates
the program. This termination implies a leak search, and during
the leak search, we encounter a 'second bug', which is in any case
unexpected, as revision 15716 is supposed to have fixed all that :(.
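For reference, the failing check can be modelled as follows. This is a simplified sketch, not Valgrind's source: `round_dn`, `round_up` and `fault_inside_scanned_block` are illustrative names. VG_ROUNDUP(start, sizeof(Addr)) rounds the start of the block being scanned up to the next word boundary, and the assertion requires the address of a fault caught during the scan to be at or after that boundary. When it fails, the recorded fault address lies before the first word-aligned address the leak scanner reads, which would be consistent with the fault not coming from the scan itself.

```cpp
#include <cassert>
#include <cstdint>

using Addr = std::uintptr_t;

// Simplified models of Valgrind's VG_ROUNDDN / VG_ROUNDUP for a
// power-of-two alignment `a` (the failing assertion uses sizeof(Addr)).
constexpr Addr round_dn(Addr p, Addr a) { return p & ~(a - 1); }
constexpr Addr round_up(Addr p, Addr a) { return round_dn(p + a - 1, a); }

// True when a fault caught while scanning [start, start + len) landed on a
// word the scanner could actually have been reading. The lower bound is the
// assertion from the log; the upper bound is an assumption of this sketch.
bool fault_inside_scanned_block(Addr bad_scanned_addr, Addr start, Addr len) {
    const Addr word = sizeof(Addr);
    assert((word & (word - 1)) == 0);  // the rounding helpers need a power of two
    return bad_scanned_addr >= round_up(start, word) &&
           bad_scanned_addr < round_dn(start + len, word);
}
```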

The initial SEGV may indicate that something strange is happening
during thread termination, which then causes the SEGV and/or the leak
search problem. See helgrind/tests/stackteardown.c for a somewhat
strange way in which Android terminates a thread. Maybe something
similar happens on macOS and/or with MPI. For example, maybe MPI uses
shared memory in a special way?

With the 4 trials below, we might get a better idea of the bug's origin.

Thanks
Philippe

Trials to do:

1. Run with --leak-check-heuristics=none.
   If the problem is fixed, then do:
      for h in stdstring length64 newarray multipleinheritance
      do
         run with --leak-check-heuristics=$h
      done
   to find which heuristic causes the problem.

2. Run with --leak-check=no.
   Do we still see a SEGV?

3. Run with more tracing, i.e. with
      -v -v -v -d -d -d --vgdb-stop-at=valgrindabexit
   Valgrind should stop and wait for vgdb when encountering the assert.
   Then, in another shell window, do
      vgdb v.info memory aspacemgr
   Valgrind will produce the memory mapping.
   We can then see whether the address causing the SEGV is special in
   some way (e.g. which segment is this address in? Was it a thread
   stack? ...)

4. Would it be possible to translate the host addresses below into
   source file + line number?
   On Linux, the host stack trace is symbolic; I have no idea why this
   is not the case on macOS. I guess macOS has a tool that can translate
   these addresses, which are in the memcheck executable, i.e. something
   like memcheck-amd64-darwin (or I suppose gdb will be able to
   translate them).
Post by Jim
Memcheck: mc_leakcheck.c:1106 (void lc_scan_memory(Addr, SizeT, Bool, Int, Int, Addr, SizeT)): Assertion 'bad_scanned_addr >= VG_ROUNDUP(start, sizeof(Addr))' failed.
==59226== at 0x23804F24E: ???
==59226== by 0x23804F66C: ???
==59226== by 0x23804F64A: ???
==59226== by 0x238003953: ???
==59226== by 0x238003135: ???
==59226== by 0x238001D17: ???
==59226== by 0x23801482D: ???
==59226== by 0x23805C0F2: ???
==59226== by 0x2380EE752: ???
It's best to attach all the produced output/log files to bug
349128.


Thanks

Philippe
Philippe Waroquiers
2016-02-23 21:10:46 UTC
Post by Philippe Waroquiers
2. run with --leak-check=no
do we still see a SEGV?
Yes, the problem still persists. Still a SEGV with the same error stack
trace from memcheck.
So this indicates that what you see is not related to bug 353891,
which was fixed in revision 15716: that bug could only be
triggered during a leak search.

Is the SEGV also produced when using other tools?
(e.g. --tool=none? --tool=lackey? --tool=helgrind?
--tool=callgrind?)
Post by Philippe Waroquiers
3. run with more tracing i.e. with
-v -v -v -d -d -d --vgdb-stop-at=valgrindabexit
valgrind should stop and wait for vgdb when encountering the assert.
Then in another shell window, do
vgdb v.info memory aspacemgr
Valgrind will produce the memory mapping.
We can then see if the address causing the SEGV is somewhat special
(e.g. what is the segment of this address? Was this a thread stack?
...)
==> this didn't work for me; it ended with an error instead of waiting.
See the attached file for the full output.
vgdb is supposed to work on macOS, but the last time I tested it was
about 4 years ago, so its status is unknown :(.

Nevertheless, the trace does contain the aspacemgr mapping.

The log only contains the Valgrind trace, not the 'normal output'.
But assuming the same address causes the SEGV (0x7000036B2C1C),
this is really strange: this address is inside the segment:
--68920:1: aspacem 203: ANON 7000035bc000-7000036bbfff 1048576 rwx--
which is the Valgrind stack of thread tid 6 ???

The trace shows other entries that I cannot explain, such as:
--68920:2: stacks no addressable segment for SP 0x70000373EF80

So it looks like at thread startup, or very shortly after,
we have an SP that is outside any segment
maintained by the Valgrind aspacemgr.
This is all very strange and hard to understand.
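The segment reasoning above can be reproduced mechanically. This sketch assumes only the single aspacemgr entry quoted above (entry 203); a real check would load the full mapping from the -d -d -d trace, whose complete line format is not shown here, so no parser is attempted. `Segment` and `find_segment` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One address-space segment, using the inclusive end address printed in
// the aspacemgr trace (e.g. "7000035bc000-7000036bbfff").
struct Segment {
    std::uint64_t start;
    std::uint64_t end_inclusive;

    std::uint64_t size() const { return end_inclusive - start + 1; }
    bool contains(std::uint64_t addr) const {
        return addr >= start && addr <= end_inclusive;
    }
};

// Returns the first segment containing `addr`, or nullptr if none does
// (the "no addressable segment" case from the trace).
const Segment* find_segment(const std::vector<Segment>& segs, std::uint64_t addr) {
    for (const Segment& s : segs) {
        if (s.contains(addr)) return &s;
    }
    return nullptr;
}
```

With only that segment loaded, it is exactly 1048576 bytes long, 0x7000036B2C1C falls inside it (37860 bytes below its upper end), and the SP 0x70000373EF80 falls outside it.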

Maybe you could try to increase the Valgrind stack size, just
to experiment. Try e.g. with --valgrind-stacksize=8388608.
If you can control the user thread stack size, maybe also try
to increase it.
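C++11 std::thread has no stack-size parameter, so trying a larger user thread stack means creating the threads with pthreads directly. A minimal sketch, assuming the runtime can use pthread_create for this experiment; `minimal_body` and `run_with_stack_size` are hypothetical names, and 8388608 bytes simply reuses the value suggested for --valgrind-stacksize. The requested size must be at least PTHREAD_STACK_MIN, and on macOS it should also be a multiple of the page size.

```cpp
#include <cassert>
#include <cstddef>
#include <pthread.h>

// Minimal thread body: records that it ran, then returns.
void* minimal_body(void* arg) {
    *static_cast<int*>(arg) = 1;
    return nullptr;
}

// Creates one thread with an explicit stack size and joins it.
// Returns 0 on success, otherwise the first non-zero pthread error code.
int run_with_stack_size(std::size_t stack_bytes, int* ran) {
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);
    if (rc != 0) return rc;
    rc = pthread_attr_setstacksize(&attr, stack_bytes);
    if (rc == 0) {
        pthread_t tid;
        rc = pthread_create(&tid, &attr, minimal_body, ran);
        if (rc == 0) rc = pthread_join(tid, nullptr);
    }
    pthread_attr_destroy(&attr);
    return rc;
}
```

A zero return from run_with_stack_size(8388608, &ran) confirms that the stack-size attribute was accepted and the thread was joined.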

At this stage, I don't have many ideas, and without a reproducer
on Linux, there is not much chance of finding the problem through
remote debugging by mail :(.

Philippe
