Execution Model
EARTH supports an adaptive, event-driven, multi-threaded execution model with two thread levels: threaded procedures and fibers. A threaded procedure is invoked asynchronously, forking a parallel thread of execution. Each threaded procedure is statically divided into fibers: fine-grain threads that communicate through dataflow-like synchronization operations. These operations generate events signaling that control and data dependences are satisfied; such an event triggers a fiber firing, which schedules the fiber for execution. One effective strategy for fiber formation is to place the source and destination of a long-latency operation, such as a non-local data movement (e.g. in caches or near memory), into different fibers. This model permits local synchronization between fibers using only the relevant dependences, rather than global barriers. It also enables effective overlapping of communication and computation, because a processor can pick up any fiber whose data is ready.
Conceptually, a node in an EARTH virtual machine has an Execution Unit (EU), which runs the fibers, and a Synchronization Unit (SU), which determines when fibers are ready to run and handles communication between nodes. The EU and SU communicate through dedicated queues: a Ready Queue (RQ) of fibers waiting to run on the EU, and an Event Queue (EQ) of events corresponding to EARTH operations generated by fibers executing in the EU.
To address the memory wall problem, threaded procedure invocation is asynchronous and can be made adaptive: an invocation may first generate a template at little cost, while the actual time and site of its initiation and initialization are determined by a dynamic load balancing scheme. Asynchronous threaded procedure invocations provide good thread mobility and are effective in balancing dynamically changing workloads.
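The interaction between the EU, the SU, and the two queues can be sketched in a few lines of Python. This is a minimal single-node illustration, not EARTH code: the `Fiber` and `Node` classes, the `pending` counter of unsatisfied dependences, and the strict alternation between the SU and the EU are all assumptions made for the example.

```python
from collections import deque


class Fiber:
    """Hypothetical fiber: fires once `pending` sync signals have arrived."""

    def __init__(self, name, pending, body):
        self.name = name
        self.pending = pending  # dependences not yet satisfied (assumed counter)
        self.body = body        # callable(node), executed by the EU


class Node:
    """Hypothetical single node with an EU, an SU, a Ready Queue and an Event Queue."""

    def __init__(self):
        self.ready_queue = deque()  # RQ: fibers waiting to run on the EU
        self.event_queue = deque()  # EQ: events generated by running fibers
        self.trace = []             # order in which fibers actually ran

    def spawn(self, fiber):
        # A fiber with no outstanding dependences is ready immediately.
        if fiber.pending == 0:
            self.ready_queue.append(fiber)

    def signal(self, fiber):
        # Called from a running fiber: post a sync event for the SU.
        self.event_queue.append(fiber)

    def su_step(self):
        # SU: drain events, decrement counters, fire fibers that reach zero.
        while self.event_queue:
            target = self.event_queue.popleft()
            target.pending -= 1
            if target.pending == 0:
                self.ready_queue.append(target)

    def run(self):
        # Alternate SU and EU work until both queues are empty.
        while self.ready_queue or self.event_queue:
            self.su_step()
            if self.ready_queue:
                fiber = self.ready_queue.popleft()
                self.trace.append(fiber.name)
                fiber.body(self)
        return self.trace


def demo():
    node = Node()
    total = Fiber("sum", 2, lambda n: None)  # needs both producers
    left = Fiber("left", 0, lambda n: n.signal(total))
    right = Fiber("right", 0, lambda n: n.signal(total))
    for fiber in (left, right, total):
        node.spawn(fiber)
    return node.run()


print(demo())  # ['left', 'right', 'sum']
```

The consumer fiber is never polled and no barrier is used: it enters the Ready Queue only after the second signal reaches the SU, which is the local, dependence-only synchronization described above.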
Fibers are scheduling quanta generated and optimized by the compiler. They carry little architectural state, so invoking or terminating a fiber takes only a few cycles. Fiber scheduling is event-driven: the order of execution is determined at run time, based on dependence satisfaction and available resources. Event-driven fine-grain multi-threading at the fiber level has been shown to be particularly good at tolerating latencies, especially those caused by irregular and dynamically changing access patterns with poor locality.
Presently, the EARTH virtual machine is programmed through the EARTH Threaded-C language, an extension of C with EARTH primitive operations (Tremblay et al. 2000). On the IBM-SP or Beowulf clusters, runtime system (RTS) libraries running under the native operating system realize the EARTH virtual machine. Readers are referred to (Theobald 1999) for more detailed information (including EARTH-C, EARTH Threaded-C, the EARTH RTS, and the EARTH-MANNA simulator), and to (Kakulavarapu 1999, Morrone 2001) for EARTH implementations on other platforms.
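The latency tolerance attributed to fiber-level multi-threading can be illustrated with a toy timing model. The sketch below is an assumption-laden illustration rather than a measurement of any EARTH system: the function names, the one-time-unit cost per fiber, and the single remote load are all hypothetical. The load is split into an issuing fiber and a consuming fiber, and the EU runs independent local fibers while the load is in flight.

```python
import heapq
from collections import deque


def run_split_phase(latency, independent_work):
    """Return (finish_time, order) when a remote load is split across two fibers."""
    clock = 0
    ready = deque(["issue_load"] + [f"local{i}" for i in range(independent_work)])
    in_flight = []  # min-heap of (arrival_time, fiber_to_enable)
    order = []
    while ready or in_flight:
        # Deliver completions that have arrived: they enable the consumer fiber.
        while in_flight and in_flight[0][0] <= clock:
            _, fiber = heapq.heappop(in_flight)
            ready.append(fiber)
        if not ready:
            clock = in_flight[0][0]  # EU idles until the next completion
            continue
        fiber = ready.popleft()
        order.append(fiber)
        clock += 1  # assumption: every fiber costs one time unit
        if fiber == "issue_load":
            heapq.heappush(in_flight, (clock + latency, "use_load"))
    return clock, order


def run_blocking(latency, independent_work):
    """Finish time if the EU stalls on the load: issue + wait + use + local work."""
    return 1 + latency + 1 + independent_work


print(run_split_phase(3, 3))  # (5, ['issue_load', 'local0', 'local1', 'local2', 'use_load'])
print(run_blocking(3, 3))     # 8
```

In this model the three time units of load latency are hidden entirely behind the three local fibers. With fewer local fibers than latency units, the EU idles only for the remainder.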
© CAPSL 1996-2013. All Rights Reserved.
capslwww@capsl.udel.edu