MCC Compiler Introduction Justin David Smith
-----------------------------------------------------------------------------
This document is intended to give an overview of the compiler layout
and design. It should provide a basic understanding of the compiler
implementation to people who are new to the project.
This document is still evolving and being written; please let me know
if anything in here is particularly vague, and I'll clarify it :)
For current release notes (where major issues and bugfixes are documented)
please see the file doc/release-notes.txt.
WHAT IS MCC?
MCC is the Mojave Compiler Collection, and is a multi-language compiler
supporting safe process migration and transactions for programs written
in C, Caml, Java, and Pascal. More information about this compiler and
our group can be found at http://mojave.cs.caltech.edu/; the development
group can be contacted by e-mail at mcc@metaprl.org.
COMPILING AND INSTALLING MCC
Prerequisite: cons 2.2.0
MCC is compiled using a build system called `cons'. Cons allows for
better dependency tracking and analysis than `make' can handle, and can
be set up to handle OCaml dependencies (it is difficult to get `make' to
handle these correctly). Download cons from http://www.dsmit.com/cons/.
MCC will not build with the development releases of cons (2.3.x).
Prerequisite: ocaml 3.04
You need to have a recent version of OCaml to build the compiler (one of
the 3.x releases). The compiler works with version 3.04. OCaml can be
downloaded from http://caml.inria.fr/ocaml/distrib.html
** NOTE: If you build OCaml from source, you should build the native
code and optimized versions of the compiler -- follow the compile
instructions provided in the package to build a native code compiler.
Prerequisite: libbfd (from binutils 2.11.90 or 2.11.92)
You need to have libbfd installed on your system; it is available as part
of binutils. Currently MCC's linker only works with the Linux version of
the binutils package, at http://www.kernel.org/pub/linux/devel/binutils/.
The compiler is known to work with binutils 2.11.90.0.8 and 2.11.92.0.12.3.
Optional: MetaPRL
You can build the Mojave compiler with or without MetaPRL support.
MetaPRL will allow you to use the MetaPRL logical framework to reason
about intermediate code. If you don't know what this means, or if you
don't want to reason about the code, we suggest you install without
MetaPRL support.
To build with MetaPRL support, you'll need Aleksey Nogin's patch for
OCaml 3.04, available at http://www.cs.cornell.edu/nogin/RPM/ocaml.html,
and you'll need the source code for MetaPRL, at http://www.metaprl.org/
To build with MetaPRL support, give the --with-metaprl=<path> option
to the configure script, where <path> is the path to the top of the
MetaPRL source directory. Make sure you're using a snapshot of MetaPRL
that is compatible with the current release; see the download page on
the Mojave website, http://mojave.cs.caltech.edu/download.html, for more
information.
Building MCC
Unpack the distribution tarball and cd into the top source directory
(should be named mcc-<version>). Run ``./configure'' to configure MCC.
Some common options you can provide are:
--prefix=<path> Specify the install prefix for MCC.
--enable-bytecode Build MCC to bytecode. By default, MCC is
built to native code.
--with-metaprl=<path> Specify MCC should be built with MetaPRL.
The location of the uninstalled MetaPRL tree
should be specified.
--enable-full-metaprl Build all of MetaPRL, including the toplevel
editor. By default, only parts required by
MCC are built.
--with-libbfd=<path> Specify the path to libbfd libraries and
include files. This is rarely necessary,
unless your name is Justin :)
--enable-backend=<name> Configure to build a specific backend. By
default the appropriate backend for the
machine's architecture will be built. At
this time, you should not use this option.
Once you've configured MCC, simply type ``make'' to build the compiler.
This will build the entire compiler and all of the required libraries and
utilities.
Compiling MCC produces the binary file ``main/mcc'', which provides a
command-line interface similar to GCC's and is able to take a source
file all the way to executable code. A runtime library is also produced,
which contains the garbage collector, etc.; for the Intel architecture,
this file is arch/x86/runtime/x86runtime.a. The ``mcc'' program will
automatically link this library into executable programs.
Installing MCC
To install MCC (including program, headers, and runtime libraries) to the
location you chose during configure, simply type ``make install''. You
can also run MCC directly from the source tree; MCC will automatically
check to see if it is being run from uninstalled sources, and adjust the
search paths accordingly.
COMPILING TEST PROGRAMS
After you've built the compiler, cd into test/fc/simple and try building
some of the test programs there. An example usage of the program is:
cd test/fc/simple
mcc -O2 -o mandel.exec mandel.c # produces executable
mcc -O2 -c -o mandel.o mandel.c # produces object file
For information about MCC's options, run ``mcc --help''. This will give
a brief summary of all the options MCC supports.
There are build scripts for building the test cases under test/naml and
test/fc; you can build individual test programs using cons, or build
entire directories at a time by running `make test' in the directory you
want to build. This will attempt to build all test cases; for each test
case that fails to build, a file with extension `.err' is emitted which
contains the full error message for that test case.
You can also run the regression test system if you like; cd into test and
run ./regression_tests run. This script will run all stable test cases,
and produce a report indicating how many pass/fail.
OVERVIEW OF COMPILER LAYOUT
At this point, you should be able to compile the compiler and build some
of the test programs. The rest of this document will discuss the basic
compiler layout, and point you to the relevant files for the definitions
of the intermediate representations.
The compiler has three major sections:
* Front end (under front/) -- language implementations for C,
ML, Pascal, etc. Includes transformations to functional form.
* Middle end (under fir/) -- the functional intermediate language.
Includes most major optimizations and the link to MetaPRL.
* Backend (under arch/) -- transformation from FIR to assembly.
Includes arch-specific optimizations and runtime implementation.
Runtime safety checks are implemented here.
In addition, there is a library of various generic utilities (including
set, table, and graph implementations) under the lib/ directory; these
are used throughout the compiler.
LIBRARY FUNCTIONS/UTILITIES
This section is incomplete.
COMPILER FRONT ENDS
This section is incomplete.
COMPILER MIDDLE END
This section is incomplete.
COMPILER BACKEND
The backend is divided into two parts -- an architecture-independent
Machine IR (MIR), and the architecture-dependent assembly. The MIR
resembles the FIR, except that types have been removed and most of the
runtime checks and runtime-specific details are added. MIR code is
still in a tree-expression representation. The assembly code is very
close to actual machine code; it is linear and organized into blocks
of code. Each of these representations is discussed briefly below.
Machine IR (MIR)
The MIR is defined in arch/mir/type/mir.ml; the major transformation
from FIR to MIR is defined in arch/mir/util/mir_fir.ml.
The MIR is the last arch-independent stage of the compiler. Types
from the FIR are all replaced with low-level hardware types (int8,
int16, int32, ...); polymorphic data types are removed from the code
at this point, and pointers are converted to int32 values. Any type
coercions are transformed into low-level type coercions -- implicit
coercions are not allowed in the MIR, so arguments to functions and
operators must agree on the hardware data type.
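
To make the "no implicit coercions" rule concrete, here is a small sketch
in C. It is not MCC code: the names hw_type, hw_value, hw_coerce and
hw_add are hypothetical and exist only for this illustration (the real
definitions are in arch/mir/type/mir.ml). The idea shown is that an
operator refuses operands whose hardware types differ, so the caller must
insert an explicit coercion first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hardware types, for illustration only. */
typedef enum { HW_INT8, HW_INT16, HW_INT32 } hw_type;

typedef struct {
    hw_type type;   /* hardware type of the value */
    int32_t bits;   /* value, kept sign-extended to 32 bits */
} hw_value;

/* Wrap a 32-bit pattern to the given width and sign-extend the result. */
static int32_t wrap_to(hw_type type, uint32_t u) {
    switch (type) {
    case HW_INT8:
        u &= 0xFFu;
        return (u & 0x80u) ? (int32_t)u - 0x100 : (int32_t)u;
    case HW_INT16:
        u &= 0xFFFFu;
        return (u & 0x8000u) ? (int32_t)u - 0x10000 : (int32_t)u;
    default:
        return (u & 0x80000000u)
            ? (int32_t)(u - 0x80000000u) + INT32_MIN
            : (int32_t)u;
    }
}

/* Explicit low-level coercion: truncate or keep the value at the target width. */
hw_value hw_coerce(hw_value v, hw_type target) {
    hw_value r;
    r.type = target;
    r.bits = wrap_to(target, (uint32_t)v.bits);
    return r;
}

/* Addition is only defined when both operands have the same hardware type.
   Returns false (and leaves *out untouched) on a type mismatch. */
bool hw_add(hw_value a, hw_value b, hw_value *out) {
    assert(out != NULL);
    if (a.type != b.type)
        return false;
    out->type = a.type;
    /* Add as unsigned to avoid signed overflow, then wrap to the width. */
    out->bits = wrap_to(a.type, (uint32_t)a.bits + (uint32_t)b.bits);
    return true;
}
```

With these definitions, adding an HW_INT32 value to an HW_INT8 value is
rejected until one of them is passed through hw_coerce.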
At this stage, all runtime checks are added. This includes checks
to verify data and function pointers are valid, array bounds checks,
and other runtime checks. Also, the hooks to the garbage collector
are added; the key instruction here is the Reserve statement, which
checks that sufficient memory is available for allocations that are
coming up in the near future.
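
The sketch below illustrates the idea behind a Reserve-style check and
an array bounds check, in C. It is only an approximation under stated
assumptions: the heap struct, the trivial collect() stand-in, and the
function names are all hypothetical and are not taken from the MCC
runtime. The point is that one reserve call up front covers several
allocations that follow, so the allocations themselves need no check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bump-allocated heap; field names are illustrative only. */
typedef struct {
    size_t used;           /* bytes already handed out */
    size_t capacity;       /* total bytes in the heap */
    unsigned collections;  /* how often the stand-in collector has run */
} heap;

/* Stand-in for a garbage collector: it simply frees everything. */
static void collect(heap *h) {
    h->used = 0;
    h->collections++;
}

/* Reserve-style check: make sure `bytes` are available for the upcoming
   allocations, running the collector once if they are not. Returns false
   if there is still not enough room afterwards. */
bool heap_reserve(heap *h, size_t bytes) {
    assert(h != NULL);
    if (h->capacity - h->used < bytes)
        collect(h);
    return h->capacity - h->used >= bytes;
}

/* Allocate after a successful reserve; returns the offset of the block. */
size_t heap_alloc(heap *h, size_t bytes) {
    assert(h != NULL);
    assert(h->capacity - h->used >= bytes);
    size_t offset = h->used;
    h->used += bytes;
    return offset;
}

/* Array bounds check of the kind inserted before an indexed access. */
bool index_in_bounds(size_t index, size_t length) {
    return index < length;
}
```

For example, a single heap_reserve(&h, 48) can be followed by
heap_alloc(&h, 16) and heap_alloc(&h, 32) without further checks.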
Most importantly, the code is transformed from an abstract, high-
level language to code that is tied to the runtime; the specifics of
how blocks are formatted in memory, and how the pointer table works
are exposed in the MIR. This is to make the conversion to assembly
easier; most of the runtime code is architecture-independent. The
MIR also makes it possible to run some optimizations that depend on
details of the runtime but are painful to do with assembly code.
Any code that is dependent on the runtime but is independent of the
architecture should be added in the MIR stage.
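
As a rough picture of what "tied to the runtime" can mean, the C sketch
below models a pointer table in which a reference is a table index plus
an offset into the block, and every access is validated against the
table and the block size. This is an assumption-laden illustration, not
the MCC runtime: the block layout, TABLE_SIZE, and all function names
are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TABLE_SIZE 8  /* hypothetical, tiny on purpose */

/* Hypothetical block layout: a small header followed by the payload. */
typedef struct {
    uint32_t size;      /* payload size in bytes */
    uint32_t index;     /* slot of this block in the pointer table */
    uint8_t payload[];  /* block contents */
} block;

/* The pointer table maps an index to the beginning of a block. */
typedef struct {
    block *slots[TABLE_SIZE];  /* NULL marks a free slot */
} pointer_table;

/* Allocate a zeroed block with `size` payload bytes; NULL on failure. */
block *block_new(uint32_t size) {
    block *b = calloc(1, sizeof(block) + size);
    if (b != NULL)
        b->size = size;
    return b;
}

/* Enter a block into the first free slot; returns the index, or -1 if full. */
int table_register(pointer_table *t, block *b) {
    assert(t != NULL && b != NULL);
    for (uint32_t i = 0; i < TABLE_SIZE; i++) {
        if (t->slots[i] == NULL) {
            t->slots[i] = b;
            b->index = i;
            return (int)i;
        }
    }
    return -1;
}

/* Resolve (index, offset) to an address inside the block, or NULL if the
   index is out of range, the slot is free, or the offset is out of bounds. */
uint8_t *table_resolve(const pointer_table *t, uint32_t index, uint32_t offset) {
    assert(t != NULL);
    if (index >= TABLE_SIZE)
        return NULL;
    block *b = t->slots[index];
    if (b == NULL || offset >= b->size)
        return NULL;
    return b->payload + offset;
}
```

Checks like the ones in table_resolve depend on the runtime's data
layout but not on the target machine, which is the kind of code the
paragraph above suggests belongs at the MIR stage.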
X86 Assembly
The X86 assembly is defined in arch/x86/type/x86_inst_type.ml.
This section is incomplete.
Runtime Libraries
This section is incomplete.
Will discuss the POINTER TABLE, the GC (briefly), the fact that pointers
always point to the beginning of a block (so PTR+OFS), etc.
Mention the context (EBP).
Mention the runtime library (for X86).