Seminar: Fault-Tolerant Distributed Real-Time Systems

Seminar (7CP)

Summer Semester 2013

Instructors: Björn Brandenburg & Allen Clement

Topic

Many safety-critical systems must be inherently distributed, are subject to stringent real-time constraints, and must remain fully functional in the face of transient and, to some extent, permanent subsystem failures. In particular, cyber-physical systems—systems in which computers closely monitor and interact with the physical environment—typically exhibit this combination of requirements.

Common examples from daily life include automotive systems (e.g., anti-lock brakes and drive-by-wire functionality), air traffic control, factory automation, and the monitoring and control of the power grid. In addition to satisfying the highest reliability expectations, such safety-critical systems are also often subject to certification requirements and/or formal validation efforts. That is, not only must they work in practice, but it must also be possible to formally establish their correctness a priori.

The focus of this seminar is to explore the algorithmic foundations that allow the construction of analytically sound fault-tolerant distributed real-time systems.

Prerequisites

This is a research-oriented Master's-level course. Students are expected to have at least an undergraduate-level understanding of operating systems and distributed systems. Prior exposure to real-time systems is recommended but not required.

Students are expected to have had prior training in the principles of effective scientific communication (i.e., students should already know how to write a scientific report and how to give a proper talk). Students who do not satisfy this requirement should enroll in a “Proseminar” first. This seminar is not a public speaking class.

Signup

Due to the format of the seminar, there are only a small number of topics available. Students interested in participating in the seminar should register early using the signup form.

Organization

When: regular meetings every Tuesday from 16:00 (c.t.) to 18:00.

The first meeting is on April 23.

Where: room 005, E1 5 (MPI-SWS building, UdS Campus).

Attendance policy: Seminars thrive on lively discussions. Therefore, attendance is mandatory. Absences require prior approval by the instructors (with the obvious exception of medical emergencies).

Format

The course is split into two phases.

Initially, there will be a few lectures covering real-time and distributed systems basics to establish a common terminology and a common ground for discussion, followed by a (short) graded quiz.

In the second phase, topics, each covering a small number of research papers, will be assigned to participating students. Students are expected to give a lecture presenting the key concepts and techniques of their assigned topic and to write a brief synopsis of it (4–8 pages).

The instructors reserve the right to deduct grade points for repeated failure to contribute in class.

Covered Papers

We will cover six topic areas across seven meetings.

Time-Triggered Real-Time Systems

On May 28, presented by Björn Brandenburg and Allen Clement.

Transient Faults in Real-Time Systems

On June 4, presented by Ufuoma Bright Ighoroje and Manohar Vanga.

Further reading:

Optionally, read Burns et al. (1996) for a gentle introduction to the topic, and Davis et al. (2007) for some corrections pertaining to the schedulability analysis of CAN.

Reliable Broadcast / Agreement

On June 11, presented by Aastha Mehta and Mennan Selimi.

Clock Synchronization

On June 25, presented by Xioafan Zhang and Xiao Chen.

Further reading:

Consensus (1)

On July 16, presented by Felipe Cerqueira and Raul Fernandes.

Consensus (2)

On July 23, presented by Arpan Gujarati and Konstantin Kuznetsov.

Systems

On July 25, presented by Nicholas Merritt.