SC Conference - Activity Details
PLFS: A Checkpoint Filesystem for Parallel Applications
Authors:
John Bent (Los Alamos National Laboratory)
Garth Gibson (Carnegie Mellon University)
Gary Grider (Los Alamos National Laboratory)
Ben McClelland (Los Alamos National Laboratory)
Paul Nowoczynski (Pittsburgh Supercomputing Center)
James Nunez (Los Alamos National Laboratory)
Milo Polte (Carnegie Mellon University)
Meghan Wingate (Los Alamos National Laboratory)
Papers Session: High Performance Filesystems and I/O
Wednesday, 02:30PM - 03:00PM
Room PB251
Abstract:
Parallel applications running across thousands of processors must protect themselves from inevitable system failures. Many applications insulate themselves from failures by checkpointing. For many applications, checkpointing into a single shared file is most convenient. With such an approach, writes are often small and not aligned with file system boundaries. Unfortunately for these applications, this preferred data layout results in pathologically poor performance from the underlying file system, which is optimized for large, aligned writes to non-shared files.
To address this fundamental mismatch, we have developed a virtual parallel log-structured file system, PLFS. PLFS remaps an application’s preferred data layout into one that is optimized for the underlying file system. Through testing on PanFS, Lustre, and GPFS, we have seen that this layer of indirection and reorganization can reduce checkpoint time by an order of magnitude for several important benchmarks and real applications.
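
The remapping idea in the abstract can be illustrated with a toy, in-memory sketch. It is not PLFS code, and the abstract does not describe PLFS internals: the class and field names (ToyLogContainer, IndexEntry), the per-writer append-only logs, and the index layout are all assumptions chosen only to show how small, strided writes to one logical file might become sequential appends plus an index that a reader consults later.

```python
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    """Hypothetical index record mapping a logical extent to a log extent."""
    logical_offset: int  # where the bytes belong in the shared logical file
    length: int          # number of bytes written
    writer: int          # which per-writer log holds the bytes
    log_offset: int      # where the bytes sit inside that log
    seq: int             # global write order, so later writes win on overlap


@dataclass
class ToyLogContainer:
    """Toy stand-in for one logical shared file (an assumption, not PLFS)."""
    logs: dict = field(default_factory=dict)   # writer id -> bytearray
    index: list = field(default_factory=list)  # list of IndexEntry
    next_seq: int = 0

    def write(self, writer: int, logical_offset: int, data: bytes) -> None:
        # Each writer appends to its own log, so the physical write is
        # sequential no matter how scattered the logical offsets are.
        log = self.logs.setdefault(writer, bytearray())
        self.index.append(
            IndexEntry(logical_offset, len(data), writer, len(log), self.next_seq)
        )
        self.next_seq += 1
        log.extend(data)

    def read_all(self) -> bytes:
        # Rebuild the logical file by replaying index entries in write order.
        size = max((e.logical_offset + e.length for e in self.index), default=0)
        out = bytearray(size)
        for e in sorted(self.index, key=lambda entry: entry.seq):
            chunk = self.logs[e.writer][e.log_offset:e.log_offset + e.length]
            out[e.logical_offset:e.logical_offset + e.length] = chunk
        return bytes(out)


# Three writers checkpoint small interleaved records into one logical file.
container = ToyLogContainer()
RECORD = 4  # bytes per record (arbitrary for the example)
for step in range(2):
    for rank in range(3):
        offset = (step * 3 + rank) * RECORD
        container.write(rank, offset, bytes([ord("A") + rank]) * RECORD)

logical_view = container.read_all()
```

In this sketch the logical file interleaves records from all three writers, while each writer's log holds only its own records back to back. The indirection costs an index lookup on read, a trade that suits checkpoint workloads, which are written far more often than they are read.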