Masterclass:
"Safety Critical & High Availability Systems"
* Advanced Learning for Experienced Real-Time Embedded System Designers and Software Developers
* How to Structure Embedded Systems and Software for Safety Critical and High Availability Applications
* 3-Day Intensive Seminar (lectures, discussions, design examples, exercises)
MASTERCLASS OVERVIEW
This Masterclass examines the design of embedded systems and software that are to provide services in
applications that could, when they fail, threaten the well-being or safety of people. Many, though not all, of these
systems must not be stopped under any circumstances, and thus must be designed for high availability. Practical
guidance is offered on how to address these concerns when designing systems in fields such as medical,
automotive, avionics, nuclear and chemical process control.
The Masterclass surveys concepts and alternatives for system and software architectures appropriate for
safety-critical and high availability systems. Following an examination of hazard and risk analysis techniques, the seminar
goes on to list a number of approaches to software safety that span fault avoidance, fault detection, and fault
containment tactics including redundancy, recovery, masking and barriers. A variety of candidate architectural design
patterns are examined, including dual/triple modular redundancy, shutdown monitors, dissimilar independent
designs, backup parallel patterns and active/monitor parallel patterns. Many real-world examples are presented.
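As a rough illustration of the fault-masking idea behind triple modular redundancy, the sketch below votes over the outputs of redundant channels. The function name `majority_vote` and the choice to return `None` when no strict majority exists are assumptions made for this example, not material taken from the Masterclass. Real voters typically compare analog values within a tolerance rather than testing for exact equality.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_vote(readings: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value reported by a strict majority of channels, else None.

    Hypothetical helper: with three channels, one faulty output is masked
    as long as the other two agree exactly.
    """
    if not readings:
        return None
    value, count = Counter(readings).most_common(1)[0]
    # Without a strict majority the fault cannot be masked, so the caller
    # must fall back to some other tactic (for example, a safe shutdown).
    return value if count * 2 > len(readings) else None


if __name__ == "__main__":
    print(majority_vote([10, 10, 99]))  # one disagreeing channel is outvoted
    print(majority_vote([1, 2, 3]))     # no agreement, so no result
```

Returning `None` instead of picking an arbitrary value keeps the "no majority" case visible to the caller, which is where a shutdown monitor or similar pattern would take over.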
Systems which are required to provide high availability must be designed to tolerate faults. Their design is usually
based on off-the-shelf hardware and software combined in ways that will achieve “five-nines” (99.999%) or greater
availability. Basic hardware N-plexing and voting issues are discussed, followed by an in-depth study of a number of
backward error recovery fault tolerance techniques including Checkpoint-Rollback, Process Pairs, and Recovery
Blocks. The class continues with several forward error recovery techniques. Software design approaches are
discussed for run-time Built-In Self Test ("BIST") of processor and peripheral hardware. Technical issues such as
failover management, data replication, and software design defects are addressed in depth.
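To put the "five-nines" figure in perspective, the short sketch below converts an availability fraction into expected downtime per year, and estimates the availability of redundant units. The function names are hypothetical. The redundancy formula assumes independent unit failures and perfect, instantaneous failover, which real systems only approximate.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def annual_downtime_minutes(availability: float) -> float:
    """Expected downtime per year, in minutes, for a given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def redundant_availability(unit_availability: float, units: int) -> float:
    """Availability of `units` redundant units where any one suffices.

    Assumption: failures are independent and failover is perfect.
    """
    return 1.0 - (1.0 - unit_availability) ** units


if __name__ == "__main__":
    # Five-nines allows roughly 5.26 minutes of downtime per year.
    print(round(annual_downtime_minutes(0.99999), 2))
    # Under the stated assumptions, two 99.9% units give about 99.9999%.
    print(round(redundant_availability(0.999, 2), 6))
```

In practice, common-mode faults and imperfect failover reduce the figure the second function predicts, which is part of why failover management and data replication need separate treatment.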
This Masterclass is not a general course on system or software design theory; rather, it is tightly focused
on the design of embedded systems and software that are required to provide their intended functions without
endangering the safety or life of users or their environment, while at the same time maintaining high availability if
required.
WHO SHOULD ATTEND?
This Masterclass is intended for practicing real-time and embedded systems engineers, software system architects,
project managers and technical consultants who have responsibility for designing, structuring and implementing the
hardware and software for real-time and embedded computer systems in applications that could, when they fail,
threaten the well-being or life of people. Many of these systems have high availability as an additional design
requirement.
Course participants are expected to be familiar with general embedded and real-time software design. [This
knowledge can be gained by attending a prerequisite embedded software design course such as "Architectural
Design of Real-Time Software".]
MASTERCLASS OBJECTIVES
The primary goal of this Masterclass is to give the participant the skills necessary to design systems and software for
real-time and embedded computers in which faults and failures could pose a danger to human life. As part of this,
participants gain skills in designing systems for high availability. This is very practical, results-oriented training that
provides knowledge and skills that can be applied immediately.
MASTERCLASS CONTENTS
Definitions and Background
* Hazards and Risks
* Safety vs. Fault Tolerance
* Design Issues for Safety
* Redundancy
* Approaches to Dependability
* Examples: Automotive Drive-by-Wire
Preparatory Analyses
* Hazard Analysis: FMEA
* Fault & Event Tree Analysis
* Exercise: Fault Tree Analysis for Railway Safety
* Probabilistic Event Tree Analysis
* Risk Analysis
* Approaches to Safety: Fault Avoidance, Fault Detection, Fault Tolerance
* Exercise: Event Tree and Risk Analysis for Railway Safety
Fundamental Safety Design Patterns
* Detection of Sensor Errors
* Failstop
* Fault Masking
* Shutdown Design Patterns
* Single Channel Patterns
Multi-Channel Safety Design Patterns
* Actuation Monitoring Options
* Dual Channel Patterns
* Dual Closed-Loop Patterns
* Heterogeneous Peer-Channel Pattern
* Example: Avionic Computer Software Development
* Dual-Dual Pattern
Design Patterns for Resiliency and Safety
* Monitor-Actuator Pattern
* Extended Example: Medical Respiratory Ventilator
* The Safety Executive
* Extended Example: Automotive Drive-by-Wire
* Extended Example: Airbus A330/340 Fly-by-Wire
* Extended Example: Boeing 777 Fly-by-Wire
A Cookbook for Safety-Critical Design Functionality
* BIST: Built-In Self Test Software Design
* Exercise: The Wild Scalar
Learning from System Failures and Accidents
* Sources of System Accidents
* Software Factors in Some Famous Accidents
* Case Study: Successful Spacecraft
Government and International Software Safety Standards
High Availability: Underlying Principles
* Fault Avoidance vs. Tolerance
* Replication vs. Functional Redundancy vs. Analytic Redundancy
* Dynamic vs. Static Redundancy
* Extended Example: Space Shuttle Software
Fundamental System-Level Availability Design Patterns
* Static Hardware Fault Tolerance
* N-Plex Design
* Exercise: MTBF, MTTF Calculations in Triple Modular Redundancy
* Dynamic System Fault Tolerance
* Redundant Pairs
* Clusters
* Cluster Failover Strategy Choices
Concepts for Backward Error Recovery
* Design Diversity
* Dynamic System Redundancy
* Backward Error Recovery
* Transactions & Checkpointing
System and Software Design Patterns for High Availability
* Checkpoint-Rollback
* Process Pairs
* Recovery Blocks
* Limitations of Backward Error Recovery Patterns
* Forward Error Recovery Design Patterns
Technical Issues in High Availability Design
* Failover Management
* Data Replication
* Dealing with Software Design Faults
C Language in Critical Systems
* Software Robustness: MISRA-C, LINT, Static Code Analyzers
* Exercise: C-Language Shenanigans
* Update on Static Code Analysis
* The JPL "Power of 10" Coding Rules
Final Examination.
INSTRUCTOR: Dr. David Kalinsky
© Copyright 2011, D. Kalinsky Associates, All Rights Reserved. This page Updated March 25, 2011