EARTH (Efficient Architecture for Running THreads)
Execution Model
EARTH supports an adaptive,
event-driven multi-threaded execution model with two thread levels: threaded
procedures and fibers. A threaded procedure is invoked asynchronously, forking
a parallel thread of execution. Each threaded procedure is statically divided into
fibers: fine-grain threads that communicate through dataflow-like synchronization
operations. These operations generate events to signal that control and data dependences
are satisfied, triggering a fiber firing that schedules the fiber for execution.
One effective strategy for fiber formation is to place the source and destination
of a long-latency operation, such as a non-local data movement
operation (e.g. in caches or near memory), into different fibers. This model permits local synchronization
between fibers using only the relevant dependences, rather than global barriers.
It also enables an effective overlapping of communication and computation, allowing
a processor to grab any fiber whose data is ready. Conceptually, a node in an
EARTH virtual machine has an Execution Unit (EU), which runs the fibers, and
a Synchronization Unit (SU), which determines when fibers are ready to run,
and handles communication between nodes. The EU and SU communicate through
dedicated queues: a Ready Queue (RQ) of fibers waiting to run on the EU, and
an Event Queue (EQ) containing events corresponding to EARTH operations generated
by fibers executing in the EU. To address the memory wall problem, threaded
procedure invocation is asynchronous and can be made adaptive: an invocation may first
generate a template at little cost, while the actual time and site of its initiation
and initialization are determined by a dynamic load balancing scheme. Asynchronous
threaded procedure invocations provide good thread mobility and are effective
in balancing dynamically changing workloads. Fibers are scheduling quanta generated and optimized
by the compiler. They carry little architectural state, so invoking or terminating a fiber
requires only a few cycles. Fiber scheduling is event-driven: the order
of execution is determined at run time, based on dependence satisfaction
and available resources. Event-driven fine-grain multi-threading at the fiber
level has been shown to be particularly effective at tolerating latencies, especially
those due to irregular and dynamically changing access patterns with poor locality.
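
The interaction between the EU, the SU and the two queues described above can be illustrated with a small single-node simulation. This is only a sketch under stated assumptions, not EARTH code: the class and method names (`Fiber`, `Node`, `signal`, `su_step`, `eu_step`) are hypothetical, each fiber is assumed to carry a simple counter of outstanding dependences, and the EU and SU are interleaved sequentially rather than running concurrently.

```python
from collections import deque


class Fiber:
    """A fiber plus a count of dependences that must still be signaled.

    The per-fiber counter is an assumption made for this sketch.
    """

    def __init__(self, name, sync_count, body):
        self.name = name
        self.remaining = sync_count  # signals still needed before firing
        self.body = body             # callable(node), run to completion on the EU


class Node:
    """One node: an EU that runs fibers and an SU that processes events."""

    def __init__(self):
        self.ready_queue = deque()   # RQ: fibers waiting to run on the EU
        self.event_queue = deque()   # EQ: events generated by running fibers
        self.trace = []              # order in which fibers actually ran

    def signal(self, fiber):
        """Called by a running fiber: post a synchronization event to the EQ."""
        self.event_queue.append(fiber)

    def su_step(self):
        """SU: consume events; a fiber whose count reaches zero fires."""
        while self.event_queue:
            target = self.event_queue.popleft()
            target.remaining -= 1
            if target.remaining == 0:
                self.ready_queue.append(target)

    def eu_step(self):
        """EU: take one ready fiber and run it without interruption."""
        fiber = self.ready_queue.popleft()
        self.trace.append(fiber.name)
        fiber.body(self)

    def run(self):
        while self.ready_queue or self.event_queue:
            self.su_step()
            if self.ready_queue:
                self.eu_step()


def demo():
    node = Node()
    data = {}

    def consume(n):
        data["sum"] = data["a"] + data["b"]

    # The consumer depends on two values, so it needs two signals to fire.
    consumer = Fiber("consumer", 2, consume)

    def produce_a(n):
        data["a"] = 1
        n.signal(consumer)

    def produce_b(n):
        data["b"] = 2
        n.signal(consumer)

    # Producers have no dependences, so they start out in the ready queue.
    node.ready_queue.append(Fiber("produce_a", 0, produce_a))
    node.ready_queue.append(Fiber("produce_b", 0, produce_b))
    node.run()
    return node.trace, data["sum"]
```

Calling `demo()` returns the trace `["produce_a", "produce_b", "consumer"]` and the sum `3`. The consumer is never polled and no global barrier is used; it becomes ready only when the second signal arrives, which is the local, dependence-based synchronization the text describes.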
Presently, the EARTH virtual machine is programmed through the EARTH Threaded-C language,
an extension of C with EARTH primitive operations (Tremblay et al. 2000). On
the IBM-SP or Beowulf clusters, runtime system (RTS) libraries running under
the native operating system realize the EARTH virtual machine. Readers are
referred to (Theobald 1999) for more detailed information (including EARTH-C,
EARTH Threaded-C, the EARTH RTS, and the EARTH-MANNA simulator), and to
(Kakulavarapu 1999, Morrone 2001) for EARTH implementations on other platforms.