GHC Commentary: What the hell is a .cmm file?

A .cmm file is rather like C--. The syntax is almost C-- (a few constructs are missing), and it is augmented with some macros that are expanded by GHC's code generator (e.g. INFO_TABLE()). A .cmm file is compiled by GHC itself: the syntax is parsed by compiler/cmm/CmmParse.y and compiler/cmm/CmmLex.x into the Cmm data type, which is then passed through one of the back-ends.

We use the C preprocessor on .cmm files, making extensive use of macros to make writing this low-level code a bit less tedious and error-prone. Most of our C-- macros are in includes/Cmm.h. One useful fact about the macros is that P_ is an alias for gcptr, so you should not use it for non-garbage-collected pointers.

Reading references

Reading material for learning Cmm is somewhat scattered, so I (Arash) have created a list of useful links. Since the Cmm language is changing as GHC changes, I have prioritized resources that are not too old. (Feel free to add/remove/modify this list! :))

  • An overview of Cmm is given in David Terei's bachelor thesis (chapter 2.4.3).
  • The comments at the beginning of compiler/cmm/CmmParse.y are super-useful and kept up to date. The rest of the file contains the grammar of the language. Afraid of grammars? Edward Yang wrote this fantastic blog post on how to understand the constructs of Cmm by using the grammar.
  • Cmm has a preprocessor like the one in C and many of the macros are defined in includes/Cmm.h.
  • In 2012, Simon Marlow extended the Cmm language by adding a new high-level syntax which can be used when you don't need low-level access (like registers). The commit explains the details.
  • Cmm is also described on this wiki, but that page was written before the new syntax was introduced.
  • Stack frame types are created using INFO_TABLE_RET. The syntax can be confusing since there are both arguments and fields; I (Arash) have not seen anything like it in other programming languages. I tried to explain it in my master's thesis (sections 4.2 and 4.2.1).
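As a rough sketch of the INFO_TABLE_RET shape mentioned in the last item, following the general form documented in the comments of compiler/cmm/CmmParse.y: the names my_frame, saved and ret are made up for illustration and do not exist in the RTS, and the body does nothing useful.

```
#include "Cmm.h"

/* Hypothetical return frame, for illustration only.
   After the label and the frame type come the *fields*: words that
   live in the frame on the stack (the info pointer first).
   The `return (...)` part lists the *arguments*: values handed to
   this frame when some code returns to it. */
INFO_TABLE_RET ( my_frame, RET_SMALL, W_ info_ptr, P_ saved )
    return ( P_ ret )
{
    /* `saved` was stored when the frame was pushed;
       `ret` is the value being returned to us.
       Here we just pass `ret` on to the next frame. */
    return (ret);
}
```

The fields describe what is stored in the frame, while the arguments describe what arrives when control returns to it; keeping those two lists apart is most of what makes the syntax readable.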

Other information

It can take time to learn Cmm. One unintuitive thing to watch out for is that there are no function calls in low-level Cmm code. The new syntax from 2012 allows function calls, but you should know that they are kind of magical: the code generator saves the live local variables on the stack and sets up the return point behind the scenes.
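To make the difference concrete, here is a minimal sketch of the two styles, based on the description in the comments of compiler/cmm/CmmParse.y. All procedure names (my_add, my_add_twice, my_low_level_return) are hypothetical and not part of the RTS; treat it as an illustration rather than code to copy.

```
#include "Cmm.h"

/* High-level syntax: the procedure has an argument list, so
   arguments and results are named and no STG registers appear. */
my_add ( W_ x, W_ y )
{
    W_ r;
    r = x + y;
    return (r);
}

/* High-level code may use `call`. Local variables that are live
   across the call (here `x`) are preserved for us. */
my_add_twice ( W_ x )
{
    W_ r;
    (r) = call my_add(x, x);
    (r) = call my_add(r, x);
    return (r);
}

/* Low-level syntax: no argument list. Values sit in registers
   such as R1 and Sp, and control only ever leaves via `jump`;
   there is no `call`. Here we return whatever is already in R1
   to the frame on top of the stack. */
my_low_level_return
{
    jump %ENTRY_CODE(Sp(0)) [R1];
}
```

The presence or absence of the argument list is what selects the style, so mixing references to R1 or Sp into a procedure that has an argument list is something to avoid.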

We say that Cmm is GHC's implementation of C--. Unfortunately, this naming scheme is not applied consistently everywhere. If you are interested in C-- (which has diverged from Cmm), you can check out its specification.

Last modified on Jan 9, 2017 9:05:17 AM