FTMP: a protocol for operating system fault tolerance in a fully distributed, loosely coupled environment

Bibliographic Details
Main Author: Safford, David Robert, 1953-
Other Authors: Friesen, Donald K. (degree committee member.), Koppa, Rodger (degree committee member.), McCormick, Bruce H. (degree committee member.)
Format: Thesis Book
Language: English
Published: 1990.
Online Access: ProQuest, Abstract; OAKTrust copy
Description
Summary: This research presents the design, implementation, and testing of FTMP (Fault Tolerant Monitor Protocol). FTMP provides distributed reliability services to application programs in a highly redundant, loosely coupled, distributed network. Design goals for this research included network and location transparency for the applications; topology and operating system independence; the ability to diagnose, contain, and recover from both hardware and software failures; and provision for future protocol extensions. Topology independence is a significant goal: it rules out reliance on existing network broadcast and routing services, but it permits much more fault-tolerant network designs, such as planar-2. The design approach combines the addition of new vertical operating system layers with full distribution and data replication for these new services. Vertical layering is used to place fault tolerance appropriately. While many prior efforts have centered only on hardware-based or application-based methods, this approach recognizes that fault tolerance is needed at all levels, including the hardware, device driver, operating system kernel, library, and application layers. In addition, two new layers are added between the application and the traditional operating system to provide the new distributed services. The specific design provides the necessary operating system extensions through one service daemon per node. This daemon implements FTMP in a fully distributed fashion. In addition, an interface library simplifies the application's use of the FTMP services while also implementing some of the fault tolerance services on a per-application basis.
Together, the interface library and service daemon provide automatic detection and correction of hardware and software failures, location-transparent communication through distributed named ports, automatic remote replication of critical files, and distributed object handling. A version is implemented on a 4 by 4 planar-2 mesh of Sun processors running Sun UNIX. This implementation is evaluated for completeness, correct operation, and efficiency. The results demonstrate that FTMP provides powerful tools for highly reliable distributed applications.
Item Description: Typescript (photocopy).
Vita.
"Major subject: Computer science."
Physical Description: xi, 336 leaves : illustrations ; 29 cm
Bibliography: Includes bibliographical references.