Opened 5 years ago

Last modified 3 years ago

#9798 new bug

Frustrating behaviour of the INLINE pragma

Reported by: mojojojo Owned by:
Priority: normal Milestone:
Component: Compiler Version: 7.8.3
Keywords: Inlining Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

So I have a function "a", which uses another function "b" from another module.

  • Step 1. I benchmark its performance and get 250ns.
  • Step 2. I go and put the "INLINE" pragma on the function "b", run the benchmark again and get 500ns. That is twice as long.
  • Step 3. I go and add an explicit invocation of the inline function over function "b" in function "a" and finally get the desired optimization: 208ns.

You can reproduce the issue by executing cabal bench decoding --benchmark-options=numeric after cloning the trees of the following commits:

Attachments (3)

step1.7z (63.9 KB) - added by mojojojo 5 years ago.
The core dump and a ticky file
step2.7z (165.9 KB) - added by mojojojo 5 years ago.
A core dump and a ticky file
step3.7z (145.6 KB) - added by mojojojo 5 years ago.
A core dump and a ticky file

Download all attachments as: .zip

Change History (12)

comment:1 Changed 5 years ago by simonpj

I can see that is frustrating.

There are a lot of dependencies here. (Though, happily, I believe that one can run your benchmark without having any C stuff installed eg postgres. Is that right?)

But still, there's a lot of stuff. Could you

  • compile just the offending module (which is Decoder.hs I think),
  • on its own,
  • preferably with only the definition for numeric (which is the offending one I think)
  • with the extra flags -dverbose-core2core -ddump-inlinings

and save the three results for Step 1/2/3 attached to this ticket? That would mean I could take a look without reproducing the setup.

Another thing that would be revealing would be to add -ticky to the command line when compiling; and +RTS -rstep1.ticky to the command line when running. And then add the .ticky files to the ticket. That should show up where the performance differences are occurring without affecting anything.

Thanks

Simon

comment:2 Changed 5 years ago by mojojojo

I believe that one can run your benchmark without having any C stuff installed eg postgres. Is that right?

Yes. Only Haskell dependencies in the benchmark build target.

But still, there's a lot of stuff.

I'll try to approach this some time soon.

Changed 5 years ago by mojojojo

Attachment: step1.7z added

The core dump and a ticky file

Changed 5 years ago by mojojojo

Attachment: step2.7z added

A core dump and a ticky file

Changed 5 years ago by mojojojo

Attachment: step3.7z added

A core dump and a ticky file

comment:3 Changed 5 years ago by mojojojo

I've narrowed the code down as much as possible and published it in a dedicated branch. You'll find that latest commits reproduce the issue in the narrowed down code step by step.

Also you'll find the according dumps and ticky reports attached to this issue. I've generated them by passing the according GHC and runtime flags to Cabal, I hope this doesn't matter.

comment:4 Changed 5 years ago by simonpj

Did you use -ddump-inlinings? Could you add -ddump-occur-anal -ddump-stg. Can you combine stdout and stderr together?

I'm not sure which is which, because directory step2/ contains files labelled step3 and vice versa!

Since you are zipping, you could bundle all three in one.

Thanks! As you'll see, the ticky numbers are revealing. But it looks as if step1 allocates least. Is step3 really faster?

Simon

comment:5 Changed 5 years ago by mojojojo

I actually tried to upload it as a single archive, only to find out that you have a limit of 250KB for a single upload here. So I might have messed up with the archive titles later, the dump and picky files though should have been correct.

Concerning your request I'll try to post back with updates soon.

comment:6 Changed 5 years ago by mojojojo

I bet it won't be a problem for you to inspect the problem yourself now though, since the codebase is quite narrowed down now:

https://github.com/nikita-volkov/postgresql-binary/tree/ghc-issue-9798

comment:7 Changed 5 years ago by mojojojo

I've ran this thing with the new flags and capturing stderr too. You'll find the files in the root of the tree on the same branch: https://github.com/nikita-volkov/postgresql-binary/tree/ghc-issue-9798

comment:8 Changed 5 years ago by thomie

Architecture: x86_64 (amd64)Unknown/Multiple
Operating System: MacOS XUnknown/Multiple

comment:9 Changed 3 years ago by mpickering

Keywords: Inlining added
Note: See TracTickets for help on using tickets.