Opened 15 months ago

Last modified 12 months ago

#15287 new bug

T11627[ab] fail on some Darwin environments

Reported by: bgamari
Owned by:
Priority: high
Milestone: 8.6.1
Component: Compiler
Version: 8.4.3
Keywords:
Cc: lelf, osa1
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple
Type of failure: None/Unknown
Test Case:
Blocked By:
Blocking:
Related Tickets: #14758
Differential Rev(s):
Wiki Page:

Description

As of d6216443c61cee94d8ffc31ca8510a534d9406b9 I am seeing T11627a and T11627b fail in the prof_hr way on Darwin under CircleCI (but not Harbormaster!):

Wrong exit code for T11627b(prof_hr)(expected 0 , actual 139 )
Stderr ( T11627b ):
/bin/sh: line 1: 73247 Segmentation fault: 11  ./T11627b +RTS -hr -RTS +RTS -i0 -RTS
*** unexpected failure for T11627b(prof_hr)

Change History (7)

comment:1 Changed 15 months ago by bgamari

$ lldb -- testsuite/tests/profiling/should_run/T11627a +RTS -hr
(lldb) target create "testsuite/tests/profiling/should_run/T11627a"
Current executable set to 'testsuite/tests/profiling/should_run/T11627a' (x86_64).
(lldb) settings set -- target.run-args  "+RTS" "-hr"
(lldb) run
Process 6687 launched: '/Users/distiller/project/testsuite/tests/profiling/should_run/T11627a' (x86_64)
T11627a was compiled with optimization - stepping may behave oddly; variables may not be available.
Process 6687 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0xfffffffffffffff8)
    frame #0: 0x00000001000d538b T11627a`retainClosure(c0=<unavailable>, cp0=0x000000010012f550, r0=0x000000010012dab0) at RetainerProfile.c:1409 [opt]
   1406	    //   if cp is not a retainer, r belongs to RSET(cp).
   1407	    //   if cp is a retainer, r == cp.
   1408	
-> 1409	    typeOfc = get_itbl(c)->type;
   1410	
   1411	#if defined(DEBUG_RETAINER)
   1412	    switch (typeOfc) {
Target 0: (T11627a) stopped.
(lldb) print c
(StgClosure *) $0 = 0x00007fff5fbff0b8
(lldb) x/x 0x00007fff5fbff0b8
0x7fff5fbff0b8: 0x00000000

comment:2 Changed 15 months ago by bgamari

A backtrace from another run (which inexplicably has no debug symbols in RetainerProfile.c):

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0xfffffffffffffff8)
  * frame #0: 0x00000001000d538b T11627a`retainClosure + 123
    frame #1: 0x00000001000dcdef T11627a`markStableTables at Stable.c:474 [opt]
    frame #2: 0x00000001000dcdaf T11627a`markStableTables(evac=(T11627a`retainRoot), user=0x0000000000000000) at Stable.c:492 [opt]
    frame #3: 0x00000001000d5155 T11627a`retainerProfile + 277
    frame #4: 0x00000001000d016b T11627a`heapCensus(t=<unavailable>) at ProfHeap.c:1174 [opt]
    frame #5: 0x00000001000ea029 T11627a`GarbageCollect(collect_gen=<unavailable>, do_heap_census=<unavailable>, gc_type=<unavailable>, cap=0x0000000100142500, idle_cap=0x0000000000000001) at GC.c:771 [opt]
    frame #6: 0x00000001000dc360 T11627a`scheduleDoGC(pcap=<unavailable>, task=<unavailable>, force_major=<unavailable>) at Schedule.c:1799 [opt]
    frame #7: 0x00000001000dc08e T11627a`scheduleWaitThread [inlined] schedule(initialCapability=<unavailable>, task=<unavailable>) at Schedule.c:545 [opt]
    frame #8: 0x00000001000db7eb T11627a`scheduleWaitThread(tso=<unavailable>, ret=<unavailable>, pcap=0x00007fff5fbff2f8) at Schedule.c:2533 [opt]
    frame #9: 0x00000001000d98ab T11627a`hs_main(argc=1, argv=0x00007fff5fbff458, main_closure=<unavailable>, rts_config=RtsConfig @ 0x00007fff5fbff320) at RtsMain.c:72 [opt]
    frame #10: 0x0000000100001b16 T11627a`main + 166
    frame #11: 0x00007fff8e844235 libdyld.dylib`start + 1
    frame #12: 0x00007fff8e844235 libdyld.dylib`start + 1

comment:3 Changed 15 months ago by bgamari

Another run, this time with DEBUG_RETAINER set:

pop() to the previous stack.
stackSize = 0
retainClosure() ends: oldStackTop = 0x200000, stackTop = 0x200000
retainClosure() called: c0 = 0x1345c0, cp0 = 0x1345c0, r0 = 0x13f7e0
push(): stackTop = 0x200000, currentStackBoundary = 0x200000
stackSize = 1
push(): stackTop = 0x1fffd8, currentStackBoundary = 0x200000
stackSize = 2
pop(): stackTop = 0x1fffb0, currentStackBoundary = 0x200000
        popOff(): stackTop = 0x1fffb0, currentStackBoundary = 0x200000
stackSize = 1
        popOff(): stackTop = 0x1fffd8, currentStackBoundary = 0x200000
pop() to the previous stack.
stackSize = 0
retainClosure() ends: oldStackTop = 0x200000, stackTop = 0x200000
retainClosure() called: c0 = 0x133ee0, cp0 = 0x133ee0, r0 = 0x13f7e0
push(): stackTop = 0x200000, currentStackBoundary = 0x200000
stackSize = 1
push(): stackTop = 0x1fffd8, currentStackBoundary = 0x200000
stackSize = 2
pop(): stackTop = 0x1fffb0, currentStackBoundary = 0x200000
pop(): stackTop = 0x1fffb0, currentStackBoundary = 0x200000
        popOff(): stackTop = 0x1fffb0, currentStackBoundary = 0x200000
stackSize = 1
        popOff(): stackTop = 0x1fffd8, currentStackBoundary = 0x200000
pop() to the previous stack.
stackSize = 0
retainClosure() ends: oldStackTop = 0x200000, stackTop = 0x200000
retainClosure() called: c0 = 0x134a80, cp0 = 0x134a80, r0 = 0x13f7e0
pop(): stackTop = 0x200000, currentStackBoundary = 0x200000
retainClosure() ends: oldStackTop = 0x200000, stackTop = 0x200000
retainClosure() called: c0 = 0x12f550, cp0 = 0x12f550, r0 = 0x12dab0
push(): stackTop = 0x200000, currentStackBoundary = 0x200000
T11627a was compiled with optimization - stepping may behave oddly; variables may not be available.
Process 36053 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.3
    frame #0: 0x00000001000d5cff T11627a`retainClosure [inlined] isRetainer(c=<unavailable>) at RetainerProfile.c:1057 [opt]
   1054     case IND:
   1055     case INVALID_OBJECT:
   1056     default:
-> 1057         barf("Invalid object in isRetainer(): %p, type=%d", c, get_itbl(c)->type);
   1058         return false;
   1059     }
   1060 }
Target 0: (T11627a) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.3
  * frame #0: 0x00000001000d5cff T11627a`retainClosure [inlined] isRetainer(c=<unavailable>) at RetainerProfile.c:1057 [opt]
    frame #1: 0x00000001000d5cff T11627a`retainClosure(c0=<unavailable>, cp0=0x000000010012f550, r0=0x000000010012dab0) at RetainerProfile.c:1517 [opt]
    frame #2: 0x00000042001fcec8
(lldb) 

Given that the backtrace is truncated, it looks rather like someone has smashed the C call stack.

comment:4 Changed 15 months ago by bgamari

I rather suspect that this is another manifestation of #14758.

comment:5 Changed 15 months ago by Ben Gamari <ben@…>

In f0179e3/ghc:

testsuite: Skip T11627a and T11627b on Darwin

Darwin tends to give us a very small stack which the retainer profiler tends to
overflow. Strangely, this manifested on CircleCI yet not Harbormaster.

See #15287 and #11627.

comment:6 Changed 14 months ago by lelf

Cc: lelf added

comment:7 Changed 12 months ago by osa1

Cc: osa1 added

I'm confused about comment:3. On Linux, with gdb, when I get a segfault because of a stack overflow and run bt, I get thousands of stack frames, not just 3. Why do we get only 3 frames if this is a stack overflow? Is this because of calling conventions on Darwin?
