MCC Compiler Introduction                                  Justin David Smith
-----------------------------------------------------------------------------


   This document is intended to give an overview of the compiler layout
   and design.  It should provide a basic understanding of the compiler
   implementation to people who are new to the project.

   This document is still evolving and being written; please let me know
   if anything in here is particularly vague, and I'll clarify it :)

   For current release notes (where major issues and bugfixes are documented)
   please see the file doc/release-notes.txt.


WHAT IS MCC?

   MCC is the Mojave Compiler Collection, and is a multi-language compiler
   supporting safe process migration and transactions for programs written
   in C, Caml, Java, and Pascal.  More information about this compiler and
   our group can be found at http://mojave.cs.caltech.edu/; the development
   group can be contacted by e-mail at mcc@metaprl.org.


COMPILING AND INSTALLING MCC

 Prerequisite:  cons 2.2.0
   MCC is compiled using a build system called `cons'.  Cons allows for
   better dependency tracking and analysis than `make' can handle, and can
   be set up to handle OCaml dependencies (it is difficult to get `make' to
   handle these correctly).  Download cons from http://www.dsmit.com/cons/.
   MCC will not work with development releases of cons (2.3.x).

 Prerequisite:  ocaml 3.04
   You need to have a recent version of OCaml to build the compiler (one of
   the 3.x releases).  The compiler works with version 3.04.  OCaml can be
   downloaded from http://caml.inria.fr/ocaml/distrib.html

   ** NOTE:  If you build OCaml from source, you should build the native
      code and optimized versions of the compiler -- follow the compile
      instructions provided in the package to build a native code compiler.

 Prerequisite:  libbfd (from binutils 2.11.90 or 2.11.92)
   You need to have libbfd installed on your system; it is available as
   part of binutils.  Currently MCC's linker only works with the Linux
   version of the binutils package, available at
   http://www.kernel.org/pub/linux/devel/binutils/.  The compiler is known
   to work with binutils 2.11.90.0.8 and 2.11.92.0.12.3.

 Optional:  MetaPRL
   You can build the Mojave compiler with or without MetaPRL support.
   MetaPRL will allow you to use the MetaPRL logical framework to reason
   about intermediate code. If you don't know what this means, or if you
   don't want to reason about the code, we suggest you install without
   MetaPRL support.  
   
   To build with MetaPRL support, you'll need Aleksey Nogin's patch for
   OCaml 3.04, available at http://www.cs.cornell.edu/nogin/RPM/ocaml.html,
   and you'll need the source code for MetaPRL, at http://www.metaprl.org/
   
   To build with MetaPRL support, give the --with-metaprl=<path> option
   to the configure script, where <path> is the path to the top of the
   MetaPRL source directory.  Make sure you're using a snapshot of MetaPRL
   that is compatible with the current release; see the download page on
   the Mojave website, http://mojave.cs.caltech.edu/download.html, for more
   information.

 Building MCC
   Unpack the distribution tarball and cd into the top source directory
   (which should be named mcc-<version>).  Run ``./configure'' to configure
   MCC.  Some common options you can provide are:
   
      --prefix=<path>         Specify the install prefix for MCC.
      
      --enable-bytecode       Build MCC to bytecode.  By default, MCC is
                              built to native code.
                              
      --with-metaprl=<path>   Specify that MCC should be built with MetaPRL.
                              <path> is the location of the uninstalled
                              MetaPRL tree.

      --enable-full-metaprl   Build all of MetaPRL, including the toplevel
                              editor.  By default, only parts required by
                              MCC are built.

      --with-libbfd=<path>    Specify the path to libbfd libraries and
                              include files.  This is rarely necessary,
                              unless your name is Justin :)

      --enable-backend=<name> Configure to build a specific backend.  By
                              default the appropriate backend for the
                              machine's architecture will be built.  At
                              this time, you should not use this option.

   Once you've configured MCC, simply type ``make'' to build the compiler. 
   This will build the entire compiler and all of the required libraries and
   utilities.

   Once you've compiled MCC, the binary file ``main/mcc'' is produced, which
   provides a command-line interface similar to GCC's and is able to take
   a source file all the way to executable code.  A runtime library is also
   produced which contains the garbage collector, etc.; for the Intel
   architecture, this file is arch/x86/runtime/x86runtime.a.  The ``mcc''
   program will automatically link this library into executable programs.

 Installing MCC
   To install MCC (including program, headers, and runtime libraries) to the
   location you chose during configure, simply type ``make install''.  You 
   can also run MCC directly from the source tree; MCC will automatically 
   check to see if it is being run from uninstalled sources, and adjust the 
   search paths accordingly.


COMPILING TEST PROGRAMS

   After you've built the compiler, cd into test/fc/simple and try building
   some of the test programs there.  An example usage of the program is:

      cd test/fc/simple
      mcc -O2 -o mandel.exec mandel.c     # produces executable
      mcc -O2 -c -o mandel.o mandel.c     # produces object file

   For information about MCC's options, run ``mcc --help''.  This will give
   a brief summary of all the options MCC supports.

   There are build scripts for building the test cases under test/naml and
   test/fc; you can build individual test programs using cons, or build
   entire directories at a time by running `make test' in the directory you
   want to build.  This will attempt to build all test cases; for any test
   case that fails to build, a file with extension `.err' is emitted, which
   contains the full error message for that test case.

   You can also run the regression test system if you like; cd into test and
   run ``./regression_tests run''.  This script will run all stable test
   cases and produce a report indicating how many pass and how many fail.


OVERVIEW OF COMPILER LAYOUT

   At this point, you should be able to compile the compiler and build some
   of the test programs.  The rest of this document will discuss the basic
   compiler layout, and point you to the relevant files for the definitions
   of the intermediate representations.

   The compiler has three major sections:
      *  Front end (under front/) -- language implementations for C,
         ML, Pascal, etc.  Includes transformations to functional form.
      *  Middle end (under fir/) -- the functional intermediate language.
         Includes most major optimizations and the link to MetaPRL.
      *  Backend (under arch/) -- transformation from FIR to assembly.
         Includes arch-specific optimizations and runtime implementation.
         Runtime safety checks are implemented here.

   In addition, there is a library of various generic utilities (including
   set, table, and graph implementations) under the lib/ directory; these
   are used throughout the compiler.


LIBRARY FUNCTIONS/UTILITIES

   This section is incomplete.


COMPILER FRONT ENDS

   This section is incomplete.


COMPILER MIDDLE END

   This section is incomplete.


COMPILER BACKEND

   The backend is divided into two parts -- an architecture-independent
   Machine IR (MIR), and the architecture-dependent assembly.  The MIR
   resembles the FIR, except that types have been removed and most of the
   runtime checks and runtime-specific details are added.  MIR code is
   still in a tree-expression representation.  The assembly code is very
   close to actual machine code; it is linear and organized into blocks
   of code.  Each of these representations is discussed briefly below.

 Machine IR (MIR)
   The MIR is defined in arch/mir/type/mir.ml; the major transformation
   from FIR to MIR is defined in arch/mir/util/mir_fir.ml.

   The MIR is the last arch-independent stage of the compiler.  Types
   from the FIR are all replaced with low-level hardware types (int8,
   int16, int32, ...); polymorphic data types are removed from the code
   at this point, and pointers are converted to int32 values.  Any type
   coercions are transformed into low-level type coercions -- implicit
   coercions are not allowed in the MIR, so arguments to functions and
   operators must agree on the hardware data type.
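
   To make the "no implicit coercions" rule concrete, here is a minimal
   sketch in C.  It is not MIR syntax and none of the names come from
   mir.ml; the helpers int32_of_int8, int8_of_int32, add_int32 and
   add_mixed are hypothetical.  The point is only that a mixed-width
   expression such as an int32 plus an int8 has its widening written out
   before the operator is applied.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for explicit, MIR-style coercions.  The names
   are illustrative only and are not taken from the MCC sources. */

/* Widen an 8-bit value to 32 bits, preserving its sign. */
int32_t int32_of_int8(int8_t v) {
    return (int32_t)v;
}

/* Narrow a 32-bit value to its low 8 bits, read as a signed value. */
int8_t int8_of_int32(int32_t v) {
    uint8_t low = (uint8_t)((uint32_t)v & 0xFFu);
    return (low < 128) ? (int8_t)low : (int8_t)((int)low - 256);
}

/* A 32-bit add: both operands must already have the same hardware type.
   Overflow is not handled in this sketch. */
int32_t add_int32(int32_t a, int32_t b) {
    return a + b;
}

/* Source-level "x + c" with x : int32 and c : int8.  The widening of c
   is an explicit step rather than something the operator does silently. */
int32_t add_mixed(int32_t x, int8_t c) {
    return add_int32(x, int32_of_int8(c));
}

/* Small self-check showing the helpers in use. */
void coercion_demo(void) {
    assert(int32_of_int8(-1) == -1);
    assert(add_mixed(1000, -8) == 992);
}
```

   Writing every coercion as its own operation means that later stages
   never have to guess the width of an operand.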

   At this stage, all runtime checks are added.  This includes checks
   to verify data and function pointers are valid, array bounds checks,
   and other runtime checks.  Also, the hooks to the garbage collector
   are added; the key instruction here is the Reserve statement, which
   checks that sufficient memory is available for allocations that are
   coming up in the near future.
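
   The idea behind these checks can be sketched in C.  This is an
   illustration under stated assumptions, not the MCC runtime: the toy
   heap, toy_collect, toy_reserve, toy_alloc and checked_load are all
   hypothetical names, and the placeholder collector simply empties the
   heap.  The sketch shows a Reserve-style check made once before a run
   of allocations, so that the allocations themselves are plain pointer
   bumps, plus an array bounds check performed before a load.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy bump-allocated heap standing in for the runtime heap.  Every name
   here is hypothetical; none of it is taken from the MCC sources. */
enum { HEAP_WORDS = 8 };                 /* 8 words of 8 bytes each */
uint64_t toy_heap[HEAP_WORDS];
size_t   toy_used = 0;                   /* words handed out so far */
int      toy_gc_runs = 0;                /* how often the collector ran */

/* Placeholder collector: a real one would trace and keep live blocks;
   this one simply empties the heap to keep the sketch short. */
void toy_collect(void) {
    toy_used = 0;
    toy_gc_runs++;
}

/* Reserve-style check: ensure `words` are free before a run of
   allocations.  Returns 1 on success, 0 if the heap is too small. */
int toy_reserve(size_t words) {
    if (words > HEAP_WORDS) return 0;
    if (HEAP_WORDS - toy_used < words) toy_collect();
    return 1;
}

/* After a successful reserve, allocation is a plain pointer bump. */
void *toy_alloc(size_t words) {
    assert(HEAP_WORDS - toy_used >= words);   /* guaranteed by reserve */
    void *p = &toy_heap[toy_used];
    toy_used += words;
    return p;
}

/* Bounds-checked array load: stores the element in *out and returns 1,
   or returns 0 without reading memory when index is out of range. */
int checked_load(const int32_t *base, size_t len, size_t index,
                 int32_t *out) {
    if (index >= len) return 0;
    *out = base[index];
    return 1;
}
```

   Checking once up front means the collector can only run at the
   reserve point, never in the middle of the allocations that follow it.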

   Most importantly, the code is transformed from an abstract, high-
   level language to code that is tied to the runtime; the specifics of
   how blocks are formatted in memory, and how the pointer table works
   are exposed in the MIR.  This is to make the conversion to assembly
   easier; most of the runtime code is architecture-independent.  The
   MIR also makes it possible to run some optimizations that depend on
   details of the runtime but are painful to do with assembly code.
   Any code that is dependent on the runtime but is independent of the
   architecture should be added in the MIR stage.
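
   As a rough picture of what "tied to the runtime" can mean, the C
   sketch below models a pointer as a pointer-table index plus an offset
   from the start of a block, and validates both before use.  This is an
   assumption made for illustration: the actual block format and pointer
   table are defined by the MCC runtime and may differ, and block_t,
   safe_ptr, pointer_table and resolve are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical block layout: a size plus the start of the payload. */
typedef struct {
    size_t   size;      /* payload size in bytes */
    uint8_t *data;      /* start of the payload */
} block_t;

/* Hypothetical pointer table: maps small indices to blocks.  A NULL
   entry marks a slot that is not in use. */
enum { TABLE_SLOTS = 4 };
block_t *pointer_table[TABLE_SLOTS];

/* A pointer in this sketch is a table index plus a byte offset from the
   start of the block, never a raw interior address. */
typedef struct {
    uint32_t index;
    uint32_t offset;
} safe_ptr;

/* Resolve a pointer for an access of `width` bytes.  Returns the raw
   address, or NULL if the index is invalid or unused, or if the access
   would run past the end of the block. */
uint8_t *resolve(safe_ptr p, size_t width) {
    if (p.index >= TABLE_SLOTS) return NULL;
    block_t *b = pointer_table[p.index];
    if (b == NULL) return NULL;
    if (p.offset > b->size || width > b->size - p.offset) return NULL;
    return b->data + p.offset;
}

/* Small self-check showing a valid access and two rejected ones. */
void pointer_table_demo(void) {
    static uint8_t payload[8];
    static block_t blk = { sizeof payload, payload };
    pointer_table[1] = &blk;
    safe_ptr inside  = { 1, 4 };
    safe_ptr overrun = { 1, 6 };
    safe_ptr unused  = { 2, 0 };
    assert(resolve(inside, 4) == payload + 4);
    assert(resolve(overrun, 4) == NULL);
    assert(resolve(unused, 1) == NULL);
}
```

   None of this logic depends on the target machine, which is why checks
   of this kind fit in the MIR stage rather than in the assembly.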

 X86 Assembly
   The X86 assembly is defined in arch/x86/type/x86_inst_type.ml.

   This section is incomplete.

 Runtime Libraries
   This section is incomplete.

   Will discuss the POINTER TABLE, the GC (briefly), the fact that pointers
   always point to the beginning of a block, so PTR+OFS, etc...

   Mention the context (EBP)

   Mention the runtime library (for X86)


Generated on Thursday, Aug 24, 2006

Copyright (c) 2002-2004 Caltech Mojave Research Group.
Computer Science Dept., California Institute of Technology