Opened 9 years ago

Closed 9 years ago

#4992 closed bug (invalid)

LLVM trashes registers for primitive calls

Reported by: scpmw Owned by: davidterei@…
Priority: normal Milestone:
Component: Compiler (LLVM) Version: 7.1
Keywords: Cc:
Operating System: Linux Architecture: x86_64 (amd64)
Type of failure: Runtime crash Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

When calling, the LLVM backend will generate code that sets a number of global registers to "undefined" to prevent LLVM from saving them. I suppose that's a good idea for C calls, but for primitive calls it is unnecessary, even dangerous. "tan 0.5" will have the backend generate code like follows:

...
store i64 %ln1zX, i64* %ln1zZ
store i64 undef, i64* %R3_Var
store i64 undef, i64* %R4_Var
store i64 undef, i64* %R5_Var
store i64 undef, i64* %R6_Var
store float undef, float* %F1_Var
store float undef, float* %F2_Var
store float undef, float* %F3_Var
store float undef, float* %F4_Var
store double undef, double* %D1_Var
store double undef, double* %D2_Var
%ln1A0 = call ccc double (double)* @tan( double 0x3FE0000000000000 ) nounwind
...

A few lines later, these registers are then read again, with LLVM probably assuming that the read results don't matter, as it can deduce that "tan" can't have written them.

I discovered this when investigating why my programs using additional primitive operations kept crashing. Removing the trashing for primitive calls seems to fix the problem nicely, patch attached.

Attachments (2)

llvm-primcall-register-trashing.patch (93.5 KB) - added by scpmw 9 years ago.
trash_testcase.hs (3.5 KB) - added by scpmw 9 years ago.
Unfinished test program

Download all attachments as: .zip

Change History (10)

Changed 9 years ago by scpmw

comment:1 Changed 9 years ago by dterei

Looks good, thanks for the patch! Could you please attach a test case though.

comment:2 Changed 9 years ago by igloo

Status: newpatch

comment:3 Changed 9 years ago by scpmw

Producing a test case for isn't straightforward... It would have to construct a situation where LLVM is pressured for (non-floating-point) registers while at the same time making heavy use of LLVM primitives. At the moment, these all concern floating-point operations, which makes this hard. Using "round", for example, makes the code pieces end up in different thunks. I suppose that's the reason why this problem hasn't shown up so far.

It might be possible to come up with something by addressing the primops directly and using unsafeCoerce in order to force GHC to pull some ints out of nowhere... But I'm not really sure the testcase would even be useful at that point.

The problem surfaced for me when I tried to add a primop for querying the cycle counter (llvm.readcyclecounter), which always uses two integer registers and therefore quickly ends up overwriting important stuff.

Changed 9 years ago by scpmw

Attachment: trash_testcase.hs added

Unfinished test program

comment:4 Changed 9 years ago by scpmw

I have attached my best shot at reproducing the problem with just "tan". It generates roughly the following code before the patch:

	subq	$72, %rsp
	movq	%r15, 40(%rsp)          # 8-byte Spill
	movq	%r14, 32(%rsp)          # 8-byte Spill
	movq	%r12, 24(%rsp)          # 8-byte Spill
	movq	%rbp, 8(%rsp)           # 8-byte Spill
	movq	%r13, 16(%rsp)          # 8-byte Spill
	movaps	%xmm5, %xmm0
	callq	tan
	cvttsd2siq	%xmm0, %rax
	movq	%rax, 56(%rsp)          # 8-byte Spill

And then goes on using all available registers, which is clearly dangerous. With the patch applied, a lot more registers are saved on top of the stack.

I'm unsure though how to craft an actually failing program out of this. Maybe somebody with a better overview can put in the missing piece?

(Fun fact: The program actually manages to crash the LLVM optimizer both with and without the patch. Compiling without -O works.)

comment:5 Changed 9 years ago by dterei

Is it not possible for you to simply attach the original program that crashes without the patch? If its private code you could just share it with me via email if you would prefer. I am not 100% convinced this patch solves the root of the problem.

Across C calls (regardless of prim or not) the registers are trashed because GHC has previously generated code to save and restore the appropriate registers. In the Cmm code given to the LLVM backend spill/reload of the R3... registers already exists and so to have LLVM do it as well just duplicates work. Thats why the backend simply trashes all the caller save registers blindly. This isn't the best solution but it shouldn't cause any bugs.

e.g so if R3 is live across a call you should see code like:

%lnXe = load i64* %R3_Var
store i64 %ln1zX, i64* %ln1zZ
store i64 undef, i64* %R3_Var
store i64 undef, i64* %R4_Var
store i64 undef, i64* %R5_Var
store i64 undef, i64* %R6_Var
store float undef, float* %F1_Var
store float undef, float* %F2_Var
store float undef, float* %F3_Var
store float undef, float* %F4_Var
store double undef, double* %D1_Var
store double undef, double* %D2_Var
%ln1A0 = call ccc double (double)* @tan( double 0x3FE0000000000000 ) nounwind
store i64 %lnXe, i64* %R3_Var

comment:6 Changed 9 years ago by scpmw

I can't simply attach what crashed for me because it requires a number of additional GHC patches to reproduce - all of which are very experimental at this stage. I will have another go at reproducing it, the incomplete test case probably just needs one of the normally-saved registers to be live across "r" to fail.

comment:7 Changed 9 years ago by scpmw

Okay, seems you are indeed right. Once registers are actually live, they get saved and the code does what it should.

The root cause here was my GHC tinkering, which happened to generate primitive calls without paying attention to live variables - which is in itself perfectly safe except that LLVM codegen counts on this unnecessary register saving taking place in order to produce correct code. Whew, didn't expect that.

Apologies for the confusion. I guess this can be closed, as the upsides have now been reduced to just less redundant code getting generated.

comment:8 Changed 9 years ago by dterei

Resolution: invalid
Status: patchclosed

Cool! I wouldn't say that not paying attention to live variables is perfectly safe though. The native code generator as I understand also relies on this saving and restoring of live registers to have already been done in the Cmm representation. The only reason it didn't crash on the same code I assume is that the code for the primitive call never touched the live registers. Its perfectly legal for it to do so though, so there are some cases with bigger primitive calls where it would fail for the native code generator. LLVM failed earlier as the registers are trashed to prevent code duplication and slowdown and as there was no save and restore in the Cmm LLVM correctly used the actually live registers for intermediate code in the function.

Note: See TracTickets for help on using tickets.