Changes between Version 64 and Version 65 of CodeBaseCleanup


Timestamp:
Sep 28, 2018 4:45:07 PM (15 months ago)
Author:
hsyl20
Comment:

Remove useless page

  • CodeBaseCleanup

    v64 v65  
    1 This page documents some cleanups that I (Sylvain Henry) would like to perform on GHC's code base.
    2 
    3 == Why?
    4 
    5 * Make the code more beginner friendly
    6   * Avoid acronyms
    7   * Hierarchical modules help in understanding the compiler structure
    8   * Try to correctly name things:
    9     * e.g. the "type checker" doesn't only check types, hence maybe we should call it "type system" or split it (e.g. Deriver, TypeChecker, etc.)
    10     * Avoid meaningless codenames (e.g. backpack, hoopl)
    11 * Make the compiler more modular
    12   * Allow easier reuse (with the GHC API)
    13   * Make the compiler easier to debug
    14   * Make adding new passes/optimisations easier
    15   * Allow easier and faster testing (testing per component instead of testing the whole pipeline)
    16   * Allow new, more interactive frontends (step-run each compiler pass and show IR, stats, etc.)
    17   * Allow profile-guided optimizations (pass count and order, etc.)
    18 
    19 == Step 1: introduce basic module hierarchy
    20 
    21 Implement the [wiki:ModuleDependencies/Hierarchical proposal for hierarchical module structure in GHC] (#13009).
    22 
    23 It consists only of renaming/moving modules.
    24 
    25 Compared to the original proposal, I have:
    26 * Put IRs into GHC.IR and compilers into GHC.Compiler
    27 * changed GHC.Types into GHC.Data and GHC.Entity as the former is misleading (from a GHC API user's point of view)
    28 * split GHC.Typecheck into GHC.IR.Haskell.{TypeChecker,Deriver}
    29 * split GHC.Utils into GHC.Utils and GHC.Data (e.g., Bag is in Data, not Utils)
    30 * etc.
    31 
    32 Tree logic:
    33 * IR: intermediate representations. Each one contains its syntax and the code that manipulates it
    34     * Haskell
    35         * Syntax
    36         * Parser, Lexer, Printer
    37         * Analyser
    38         * TypeChecker, Renamer, Deriver
    39     * Core
    40         * Syntax
    41         * Analyser
    42         * Transformer.{Simplifier,Specialiser,Vectoriser,WorkerWrapper,FloatIn,FloatOut,CommonSubExpr, etc.}
    43     * Cmm
    44         * Syntax
    45         * Analyser
    46         * Parser, Lexer, Printer
    47         * Transformer.{CommonBlockElim,ConstantFolder,Dataflow,ShortCutter,Sinker}
    48     * Stg
    49         * Syntax
    50         * Analyser
    51         * Transformer.{CommonSubExpr,CostCentreCollecter,Unariser}
    52     * ByteCode.{Assembler,Linker...}
    53     * Interface.{Loader,Renamer,TypeChecker, Transformer.Tidier}
    54     * Llvm.{Syntax, Printer}
    55 * Compiler: converters between representations
    56     * HaskellToCore
    57     * CoreToStg
    58     * StgToCmm
    59     * CmmToAsm
    60     * CmmToLlvm
    61     * CoreToByteCode
    62     * CoreToInterface
    63     * CmmToC
    64     * TemplateToHaskell
    65 * Entity: entities shared by different phases of the compiler (Class, Id, Name, Unique, etc.)
    66 * Builtin: builtin stuff
    67     * Primitive.{Types,Operations}: primitives
    68     * Names, Types, Uniques: other wired-in stuff
    69 * Program: GHC-the-program (command-line parser, etc.) and its modes
    70     * Driver.{Phases,Pipeline}
    71     * Backpack
    72     * Make, MakeDepend
    73 * Interactive: interactive stuff (debugger, closure inspection, interpreter, etc.)
    74 * Data: data structures (Bag, Tree, etc.)
    75 * Config: GHC configuration
    76     * HostPlatform: host platform info
    77     * Flags: dynamic configuration (DynFlags)
    78     * Build: generated at build time
    79 * Packages: package management stuff
    80 * RTS: interaction with the runtime system (closure and table representation)
    81 * Utils: utility code or code that doesn't easily belong to another directory (e.g., Outputable, SysTools, Elf, Finder, etc.)
    82 * Plugin: modules to import to write compiler plugins
    83 
    84 Actual renaming: see CodeBaseCleanup/ModuleRenaming
    85 
    86 Issues:
    87 * name clashes: some modules in `base` (e.g. GHC.Desugar) and `ghc-prim` (e.g. GHC.Types) use the same GHC prefix
    88   * maybe we should put all GHC extensions to base under GHC.Exts.* or GHC.Base.*
    89   * use GHC.Builtin.Primitive.* prefix in ghc-prim?
    90 
    91 TODO in the future:
    92 * Fix comments:
    93   * Several comments refer to Note "Remote Template Haskell" (supposedly in libraries/ghci/GHCi/TH.hs), but that Note doesn't exist. Maybe it was replaced by Note "Remote GHCi"?
    94   * Undefined reference to "fill_in in PrelPack.hs" from GHC.Entity.Id
    95   * Undefined reference to CgConTbls.hs from GHC.Compiler.StgToCmm.Binding
    96   * Undefined reference to PprMach.hs from GHC.Compiler.CmmToAsm.PIC
    97   * Undefined reference to Renaming.hs from GHC.IR.Core.Transformer.Substitution
    98   * Undefined reference to simplStg/SRT.hs from GHC.IR.Cmm.Transformer.InfoTableBuilder
    99   * Undefined reference to codeGen/CodeGen.hs from GHC.Compiler.HaskellToCore.Foreign.Declaration
    100   * Undefined reference to RegArchBase.hs from GHC.Compiler.CmmToAsm.Register.Allocator.Graph.ArchX86
    101   * Undefined reference to MachRegs*.hs and MachRegs.hs from GHC.Compiler.CmmToAsm.Register.Allocator.Graph.ArchBase
    102 * Binutils 2.17 dates from 2006. Maybe we could remove the hack in GHC.Compiler.CmmToAsm.X86.CodeGen
    103 * Rename CAF into "static thunk"?
    104 * put notes files (e.g. profiling-notes, *.tex files) into actual notes or in the wiki
    105 * Remove the remaining references to RnHsSyn, which doesn't exist anymore
    106 * References to "NCG" should be replaced with references to the "CmmToAsm compiler"
    107 * Foreign export stubs are generated in GHC.Compiler.HaskellToCore.Foreign.Declaration...
    108 * Tests still reflect the old hierarchy (e.g., simplCore/should_compile) but renaming them could break other tools
    109 
    110 
    111 
    112 Questions:
    113 * Why don't we use the mangled selector name ($sel:foo:MkT) in every case (not only when we have -XDuplicateRecordFields) instead of using the ambiguous one (foo)?
    114   * Incidentally, partially answered yesterday (2017-06-12) on ticket #13352
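
To make the question concrete, here is a minimal sketch of the situation in which the plain name `foo` is ambiguous. The record types `T` and `U` and the helper names are made up for illustration; only the extension itself is real. With -XDuplicateRecordFields two records in one module may share a field name, so each field needs a distinct internal selector (the mangled form quoted above), while user code disambiguates through the constructor.

```haskell
{-# LANGUAGE DuplicateRecordFields #-}

-- Two hypothetical records sharing the field name 'foo'.
-- Without DuplicateRecordFields this module would be rejected.
data T = MkT { foo :: Int }
data U = MkU { foo :: Bool }

-- Pattern matching on the constructor says which 'foo' is meant,
-- so these accessors are unambiguous.
fooOfT :: T -> Int
fooOfT MkT { foo = x } = x

fooOfU :: U -> Bool
fooOfU MkU { foo = b } = b
```

Using the mangled name in every case would mean that a module with a single record declaring `foo` is treated the same way internally as the module sketched here.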
    115  
    116 
    117 == Step 2: split and edit some modules
    118 
    119 Some modules contain a lot of (unrelated) stuff. We should split them.
    120 
    121 * GHC.Utils (previously compiler/utils/Util.hs) contains a lot of stuff that should be split
    122   * Compiler configuration (ghciSupported, etc.): GHC.Config
    123   * List operations: GHC.Data.List{.Sort,.Fold}
    124   * Transitive closure: GHC.Data.Graph?
    125   * Edit distance and fuzzy match: GHC.Utils.FuzzyMatch?
    126   * Shared globals between GHC package instances: GHC.Utils.SharedGlobals?
    127   * Command-line parser: GHC.Utils.CmdLine
    128   * exactLog2 (Integer): GHC.Data.Integer (why isn't it in base?)
    129   * Read helpers (rational, maybe, etc.): GHC.Utils.Read?
    130   * doesDirNameExist, getModificationUTCTime: GHC.Utils.FilePath
    131   * hSetTranslit: GHC.Utils.Handle.Encoding
    132   * etc.
    133 * Split GHC.Types (was HscTypes) as it contains a lot of unrelated things
    134   * ModGuts/ModDetails/ModIface: move to GHC.Data.Module.*
    135   * Usage/Dependencies: move to GHC.Data.Module.Usage/Dependencies
    136 * GHC.Data.*: split
    137   * Split OccEnv from OccName (to harmonize with GHC.Data.Name.Env)?
    138   * Split ModuleEnv/ModuleSet from Module?
    139 * Split GHC.Data.Types (was TyCoRep)?
    140   * Contains many data types (TyThing, Coercion, Type, Kind, etc.)
    141 * Split PrettyPrint from GHC.Syntax.{Type,Expr,etc.}
    142 * Split GHC.IR.Core.Transform.{Simplify,SimplUtils,etc.}
    143 * Split GHC.Rename.ImportExport (e.g., contains "warnMissingSignature")
    144 * Put cmmToCmm optimisations from GHC.Compiler.CmmToAsm into GHC.IR.Cmm.Transform
    145 * Split type-checker solvers (class lookup, givens, wanted, etc.) (was TcSimplify, TcInteract, etc.)
    146 * Module name GHC.Compiler.StgToCmm.Layout seems dubious: split and rename?
    147 
    148 Some function/type names should be modified:
    149 
    150 * Rename codeGen function into stgToCmm
    151 * Rename nativeCodeGen into cmmToAsm
    152 * Rename OrdList (in GHC.Data.Tree.OrdList) into TreeSomething? (misleading)
    153 * CorePrep (prepare Core for codegen) could use a more explicit name
    154 * Maybe rename GHC.Data.RepType
    155 * Maybe rename OccName/RdrName/Name/Id to make them more explicit (may become obsolete with "trees that grow" patch)
    156   * OccName: NSName (NameSpacedName)
    157   * RdrName: ParsedName
    158   * Name: UniqueName
    159   * Id: TypedName
    160 
    161 
    162 
    163 
    164 == Step 3: clearly separate GHC-the-program and GHC's API
    165 
    166 * Make the GHC API purer
    167 
    168 === Abstract file loading (i.e. pluggable Finder)
    169 
    170 Currently the Finder assumes that a filesystem exists in which it can find packages/modules.
    171 
    172 I would like to add support for module sources that are only available in memory or that can be retrieved from elsewhere (network, etc.).
    173 
    174 Something similar to Java's class loaders.
    175 
    176 === Abstract error reporting and logging (i.e. pluggable Logger)
    177 
    178 Allow new frontends (using GHC API) to use HTML reporting, etc.
    179 
    180 * Avoid dumping to the filesystem and/or stdout/stderr
    181 * Use data types instead of raw SDoc reports
    182 
    183 
    184 == Step 4: clearly separate phases
    185 
    186 * split DynFlags to only pass the required info to each pass
    187     * e.g. only the required hooks
    188 * use data types to report phase statistics, intermediate representations, etc.
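
The two points above can be sketched as follows, with a toy pass standing in for a real one. Everything here is hypothetical: `GlobalFlags` stands in for DynFlags, the "IR" is just a list of integers, and the "simplifier" only drops zeros. The point is the shape: the pass receives a small configuration derived from the global flags and returns its statistics as a value.

```haskell
-- Hypothetical stand-in for the global DynFlags record.
data GlobalFlags = GlobalFlags
  { optLevel       :: Int
  , dumpCore       :: Bool
  , targetWordSize :: Int
  , verbosity      :: Int
  }

-- | Only what the (toy) simplifier needs.
data SimplifierConfig = SimplifierConfig
  { simplMaxIterations :: Int
  , simplDump          :: Bool
  } deriving (Eq, Show)

-- | The single place where global flags are translated for this pass.
simplifierConfig :: GlobalFlags -> SimplifierConfig
simplifierConfig fl = SimplifierConfig
  { simplMaxIterations = if optLevel fl >= 2 then 4 else 1
  , simplDump          = dumpCore fl
  }

-- | Statistics reported as data rather than printed by the pass.
data PassStats = PassStats
  { passName       :: String
  , passIterations :: Int
  , sizeBefore     :: Int
  , sizeAfter      :: Int
  } deriving (Eq, Show)

-- | Toy pass: repeatedly drop zeros until nothing changes or the
-- iteration limit is reached.
runSimplifier :: SimplifierConfig -> [Int] -> ([Int], PassStats)
runSimplifier cfg input = (output, stats)
  where
    step = filter (/= 0)
    (output, iterations) = loop 0 input
    loop n ir
      | n >= simplMaxIterations cfg = (ir, n)
      | step ir == ir               = (ir, n)
      | otherwise                   = loop (n + 1) (step ir)
    stats = PassStats
      { passName       = "simplifier"
      , passIterations = iterations
      , sizeBefore     = length input
      , sizeAfter      = length output
      }
```

Since `runSimplifier` never sees `GlobalFlags`, it can be tested in isolation with a hand-written `SimplifierConfig`, which is the per-component testing mentioned in the motivation.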