SC Conference - Activity Details
VGrADS: Enabling e-Science Workflows on Grids and Clouds with Fault Tolerance
Authors:
Lavanya Ramakrishnan (Indiana University)
Daniel Nurmi (University of California, Santa Barbara)
Anirban Mandal (Renaissance Computing Institute)
Charles Koelbel (Rice University)
Dennis Gannon (Microsoft Research)
T. Mark Huang (University of Houston)
Yang-Suk Kee (Oracle)
Graziano Obertelli (University of California, Santa Barbara)
Kiran Thyagaraja (Rice University)
Rich Wolski (University of California, Santa Barbara)
Asim Yarkhan (University of Tennessee, Knoxville)
Dmitrii Zagorodnov (University of California, Santa Barbara)
Papers Session
Dynamic Task Scheduling
Thursday, 03:30PM - 04:00PM
Room PB251
Abstract:
Today's scientific workflows use distributed heterogeneous resources
through diverse grid and cloud interfaces that are often hard to
program. In addition, especially for time-sensitive critical applications,
predictable quality of service is necessary across these distributed
resources. VGrADS' virtual grid execution system (vgES) provides a
uniform qualitative resource abstraction over grid and cloud
systems. We apply vgES to schedule a set of deadline-sensitive
weather forecasting workflows. Specifically, this paper reports on our
experiences with (1) virtualized reservations for batch-queue systems,
(2) coordinated usage of TeraGrid (batch queue), Amazon EC2 (cloud),
our own clusters (batch queue) and Eucalyptus (cloud) resources, and
(3) fault tolerance through automated task replication. The combined
effect of these techniques was to enable a new workflow planning
method to balance performance, reliability and cost considerations.
The results point toward improved resource selection and execution
management support for a variety of e-Science applications over grid
and cloud systems.