Global Grid Forum 5
Edinburgh, Scotland
July 23, 2002, 6:00-7:00

Grid Checkpoint Recovery BoF (GridCPR)

Agenda / Introduction

charter handout, www site and mailing list on there
- http://gridcpr.psc.edu/GGF/
- ggf-gridcpr@psc.edu
  Subscribe by sending mail to majordomo@psc.edu with "subscribe ggf-gridcpr"
  in the body of the message (not including the quotation marks)

round of mutual introduction of the people
seems like many "right" people are here (Condor, PVM, MPI, Cactus, ...)

reviewing the charter, to be discussed over email (4 weeks from now)
majority of people vote in favour of having a WG on gridcpr!
we will (majority vote again) finalize charter at GGF6

gridCPR requirements survey?

purposes: fault tolerance, migration, accounting(?)

Nathan Stone, brief presentation: "talking points" from experience at PSC:
- discussion of portability issues (needs debate!)
- user-level vs. system-level
- file format (e.g. HDF5 ??)
- how much transparency is needed/possible ???
- what can/cannot be checkpointed ?
- file I/O, checkpoint and other files ?? relocation of I/O ??? combine with gridftp ?
- focus on HPC on clusters, or expand to "general" apps, like distributed and wireless things?
- API: file vs. memory semantics

goal for the group:
- define APIs and interaction points where to apply checkpoints
- find an API for writing a CP image and restarting from an image (outside the app context)
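
illustration (not from the meeting): a minimal sketch of what a user-level API with both "memory" and "file" semantics could look like, to make the last two goals concrete. All names here (checkpoint_to_bytes, restore_from_bytes, write_checkpoint, restart_from_image, run) are hypothetical and not proposed by the group. The image format is Python pickle, chosen only for brevity; the file format question (e.g. HDF5) is left open above.

```python
import os
import pickle
import tempfile


def checkpoint_to_bytes(state):
    """Memory semantics: serialize application state into an in-memory image."""
    return pickle.dumps(state, protocol=pickle.HIGHEST_PROTOCOL)


def restore_from_bytes(image):
    """Memory semantics: rebuild application state from an in-memory image."""
    return pickle.loads(image)


def write_checkpoint(state, path):
    """File semantics: write the image to a temp file, then rename it into place.

    The rename means a crash during the write leaves the previous image intact.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".cp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(checkpoint_to_bytes(state))
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)
    except BaseException:
        # Clean up the partial temp file if anything went wrong before the rename.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def restart_from_image(path):
    """Load an image from disk; callable from outside the original app context."""
    with open(path, "rb") as f:
        return restore_from_bytes(f.read())


def run(state, steps, path, every=2):
    """Toy computation with explicit interaction points for checkpoints."""
    while state["step"] < steps:
        state["total"] += state["step"]
        state["step"] += 1
        if state["step"] % every == 0:
            # Interaction point: the application decides when it is safe to checkpoint.
            write_checkpoint(state, path)
    return state
```

usage, under the same assumptions: run({"step": 0, "total": 0}, 4, "app.cp") writes images at steps 2 and 4; a separate process can later call run(restart_from_image("app.cp"), 6, "app.cp") to continue from the last image. This is the user-level end of the user-level vs. system-level question: the application chooses what is saved and when, so open files and other I/O state are not captured.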