Technical Paper :
"Mutexes Battle Priority Inversions"
that will never interfere with the normal operation of their multitasking application software.
But in fact, priority inversion situations are sometimes the underlying cause of what are popularly called real-time
“software glitches”. They can have serious impacts on application software, such as data losses or even total system
resets. A very famous example is the Mars Pathfinder space mission. The operational software on its Sojourner
miniature rover vehicle is reported to have suffered from recurring priority inversions which triggered software resets
and caused the loss of precious Martian meteorological data and panoramic pictures. [See: "What really happened
on Mars Rover Pathfinder", at http://catless.ncl.ac.uk/Risks/19.49.html#subj1 .]
After the phenomenon of priority inversion is described in this paper, a detailed example of a priority inversion
scenario will be shown. It will be seen that even when using an operating system’s semaphore features, an
especially virulent form of priority inversion termed "unbounded priority inversion" can develop.
An operating system mechanism called a mutex is then introduced, with the potential for addressing problems of
unbounded priority inversion. Mutexes come in several varieties. Priority inheritance mutexes and priority ceiling
mutexes will be examined in detail, and their relative advantages and disadvantages weighed.
WHAT IS ‘PRIORITY INVERSION’?
Modern Real Time Operating Systems support multitasking with a priority-based preemptive scheduler. Each
concurrent task (or ‘thread’) in an application program is assigned a priority number. It is the job of the scheduler to
ensure that at all times, the highest priority task which is ready to run will be the task which is actually running. The
scheduler may stop ("pre-empt") a running task in mid-execution, in order to ensure this -- if a higher priority task
becomes ready to run.
However, because of the often complex interrelationships between tasks, situations might occur in which a relatively
low priority task will be running at a time when you would want or expect a high priority task to be executing. This is
called a Priority Inversion.
Nowadays, priority inversions do not occur because of errors in the design of the task scheduler of an operating
system. Operating system task schedulers are pretty reliable these days. Instead, they occur when the operating
system has marked an important high-priority task as unready to run at a moment when the software designer would
have wanted it to be running.
This might occur, for example, if a task of high priority should be made to run in order to handle a message arriving at
a certain queue. But if the message has been given to another (lower-priority) task instead, then that other task would
be allowed to run instead. And the operating system would prevent the high priority task from running, declaring it
unready to run for lack of a message to process. In this case, the operating system might well be working as
designed and as instructed to work. (Perhaps instructions were given to the operating system without their
ramifications being fully understood.) If the software designer would be dissatisfied with this situation because
“Hey, my message got stolen by that low-priority task”, or “Hey, the high priority task should have executed
instead”, then it’s a situation of priority inversion.
It’s the old tale of “Do what I want, not what I said!”, brought into the age of high-tech multitasking software for real-time
embedded computing. What is wanted, is that a high priority task be made ready to run and be allowed to run as
soon as possible.
Priority inversions can have serious consequences in critical real-time embedded application software. In particular,
a high priority task might be delayed before executing, for an unexpectedly long period of time. These delays might
cause missed time deadlines, gaps in data acquisition, or time gaps in the control of external devices. In conjunction
with a watchdog timer, these delays might even trigger system resets.
A PRIORITY INVERSION SCENARIO
A classic scenario of priority inversion is shown in Figure 1, involving a binary semaphore. In this scenario, the
semaphore is named ‘PRNT’ to symbolize that it is being used to regulate the access of tasks to a printer. It is
initialized with a count of one (or initially given one semaphore ‘token’) to indicate that initially, one printer is available.
Tasks in this scenario are designed with disciplined access to the printer: Whenever a task wishes to print, it must
first obtain the count (‘token’) from the semaphore. [If no ‘token’ is available, the task will be prevented from running,
by the operating system.] When the task later finishes printing, it must return the token to the semaphore.
|Figure 1: Priority Inversion involving a Binary Semaphore
The scenario in Figure 1 above begins with low-priority Task A requesting and getting the ‘token’ from the semaphore
“PRNT”. With the token “in-hand”, Task A begins to print. A short while later, high-priority Task B requests the token
from “PRNT”. There’s no token in “PRNT” since Task A still has it, so Task B is blocked from running. [This is true
despite Task B’s higher priority, since unavailability of a token in a semaphore takes precedence over a task’s
priority.] Task A continues to run, and continues printing. All is as expected, up to this point.
Now another task we'll call Task C, for some unrelated reason, becomes ready to run. Task C is not interested in
printing, hence it does nothing to the semaphore “PRNT”. But Task C has higher priority than Task A which is C to run
To summarize the situation: Task C is of lower priority than Task B. And Task C is running, while Task B is prevented
from running. This is a priority inversion.
It’s as if Task B has become “entrapped”: Task B can’t run because it’s waiting for Task A to release the semaphore
token. But Task A can’t release the semaphore token or even make any progress toward this goal, because it’s been
preempted by Task C.
The situation could get even worse …. Some other task, say Task D, of slightly higher priority than Task C, may for
some unrelated reason, also become ready to run. Task D would then begin running, preempting Task C, which
preempted Task A which is prevented from finishing its printing and thus prevented from returning the token to the
semaphore. All this time, Task B, which is of higher priority than all the others, is “stuck” being prevented from running
because it is waiting for the semaphore token. There may be many tasks behaving like Task D, causing greater and
greater delays for Task B. This is called chained blocking or unbounded priority inversion.
A more explicit time line of this scenario is shown in Figure 2 below.
|Figure 2: Timeline of a Priority Inversion
Time advances to the right in this diagram. In general, it is possible for either Task A or Task B to acquire the
semaphore first. The problematic scenario is when Task A acquires the semaphore first. The diagram shows low-
priority Task A becoming ready first, so it runs first and acquires the semaphore ‘token’ first (its request is represented
by the black triangle on its left side). It is then briefly preempted by Task B because of Task B’s higher priority. But as
soon as Task B attempts to acquire the semaphore token for itself, it is blocked from further execution and Task A
resumes running. Subsequently, it is preempted by Task C (of intermediate priority) which is itself preempted by Task
D (of slightly higher, but still intermediate priority). Only when both Tasks C and D complete their execution does Task
A continue to run. During the execution of all three of these tasks, the task of higher priority than all of them, Task B, is
prevented from running, since it can't acquire the semaphore token. After Task A relinquishes its semaphore token,
the token is given to Task B so that it can run.
If Tasks C and D ran for long periods of time, or if there were additional intermediate-priority tasks such as Tasks C
and D in chained blocking of Task B, then Task B might be prevented from running for a very long time. Once again,
an unbounded priority inversion.
MUTEX TO THE RESCUE!
A ‘mutex’ is a special kind of semaphore used particularly to provide mutual exclusion, hence its name. It can do so
while avoiding the problems of unbounded priority inversion. The operations on a mutex are called ‘lock’ and ‘unlock’,
with meanings which are pretty much intuitive.
A mutex can be thought of as being in either a locked or an unlocked state. A task can use the lock operation to take
the mutex from the unlocked to the locked state; and if it succeeds in this operation, the task becomes the “owner” of
the mutex while it remains locked. If the mutex is already in the locked state, a task trying to use the lock operation will
be prevented from running, until the mutex is unlocked and becomes available.
Only the task that “owns” the mutex at a given time, can perform the ‘unlock’ operation on the mutex. This will take the
mutex from the locked to the unlocked state. This will also end the “ownership” of the mutex by that task.
If a task is waiting for the mutex when an ‘unlock’ operation is done, that task will immediately become the new
“owner” of the mutex. [If several tasks are waiting for the mutex, the operating system will use well-defined criteria to
select the single new “owner”.]
MUTEXES ARE DIFFERENT FROM TRADITIONAL SEMAPHORES
Unlike traditional semaphores, mutexes should not be thought of as having a count. Instead, they’re best thought of
as being in either a locked or an unlocked state.
The concept of a ’token’ is also not relevant for mutexes. With traditional semaphores, the P operation (“take” or “get”
or “wait”) can be thought of as “taking” or “getting” a token, which can then be passed from task to task if necessary.
With a mutex, the ‘lock’ operation can be thought of as obtaining “ownership” of the mutex. But the “owner” task is not
permitted to pass the “ownership” directly to another task.
With traditional semaphores, a ‘token’ may be returned to the semaphore by any task invoking the V operation (“give”
or “release”). With a mutex, only the original “owner” may ‘unlock’ (or “release”) the mutex.
Probably the most significant difference between traditional semaphores and mutexes, is that the problem of
unbounded priority inversion can not easily be solved using traditional semaphores. This is because of their lack of a
notion of “ownership” in a traditional semaphore.
MUTEXES SOLVE THE PROBLEM OF UNBOUNDED PRIORITY INVERSION
There are a number of techniques a mutex could use to solve the problem of unbounded priority inversion.
Two simple approaches might be to turn off interrupts, or to turn off task pre-emption. While these would solve the
problem, they would also have very undesirable side effects. Disabling pre-emption would delay the execution of all
other tasks including high-priority tasks totally unrelated to the problem. Disabling interrupts would totally disconnect
the embedded computer from the outside world -- effectively un-embedding it for some time. These might be called
“brute force” or "bull in a China shop" methods. They’d solve the priority inversion problem, but they can also heavily
damage the behavior of the application software. They should be avoided. Mutexes do not do these sorts of things.
A more refined approach would be to simply prevent intermediate priority tasks from running, when low- and high-
priority tasks are competing for access to the same shared resource. In other words, to prevent Tasks C and D in
Figure 1 from running, when Tasks A and B are competing for access to the printer. This can be done by a mutex, by
raising the lower priority task to a higher priority on a temporary basis while it is accessing the resource. In Figure 3, a
mutex is used to prevent Tasks C and D from running while Tasks A and B are competing for access to the printer.
|Figure 3: Mutex Prevents a Priority Inversion by performing Priority Promotion
Operating systems people call such temporary raising of a task's priority by a mutex, by the name "Priority
Promotion". In other words, the mutex can ask the operating system that is controlling task execution, to raise the
priority of the task that owns it on a temporary basis -- if that is needed to avoid the danger of unbounded priority
inversion. There are two popular ways a mutex can promote a lower priority task to a higher priority temporarily while
the mutex is “owned” by the task. These are called priority inheritance and priority ceiling.
When using priority inheritance, the operating system dynamically changes the priority of the task that “owns” a
mutex, depending on the priorities of other tasks that attempt to lock the mutex. The change in priority occurs if (and
when) other tasks actually request to lock the mutex.
In other words, when a high-priority task attempts to lock a mutex which is already “owned” by a lower-priority task, the
priority of the “owner” is raised to the high priority of the other task. The other task is blocked from running until the
mutex is unlocked, at which time the previous “owner” is brought back to its original lower priority. The general attitude
is "Let the normally low-priority task blast its way through the rest of its critical code quickly at high priority, if
there's a high priority task waiting for it to finish that code".
If several tasks are competing for “ownership” of a mutex, the current “owner” will be temporarily given the priority of
the highest priority task competing for ownership. In all cases, the net result is that intermediate priority tasks will be
prevented from slowing the release of ownership from the current owner to waiting new owners.
[NOTE: If you'd like to read more theory about priority inheritance mutexes, written by the original "inventors" of this kind of mutex, here's
the reference: L. Sha, R. Rajkumar, and J. P. Lehoczky. "Priority Inheritance Protocols: An Approach to Real-Time Synchronization", in
IEEE Transactions on Computers, vol. 39, pp. 1175-1185, Sep. 1990. ]
|Figure 4: Timeline showing how a Priority Inheritance Mutex Prevents Priority Inversion
Figure 4 shows a timeline of the operation of a mutex with priority inheritance. Initially, low-priority Task A takes
ownership of the mutex. It is then pre-empted by high-priority Task B. When Task B tries to lock the mutex, it
becomes blocked. But then the operating system raises the priority of Task A to that of Task B, to allow Task A to
continue running briskly until it unlocks the mutex. At that moment, the priority of Task A is returned to its original low
value, and Task B begins to run as the new owner of the mutex. After Task B unlocks the mutex, all tasks go on to run
according to normal priority-based preemptive scheduling.
Please note that the net result of this "behind-the-scenes" manipulation of Task A's priority, is that Tasks C and D no
longer interfere with the progress of Tasks A and B. Instead, we have "turned the tables" -- now Tasks A and B are
slightly delaying the progress of Tasks C and D, in such a way that Tasks C and D now run only after both A and B
have finished using the mutex.
On the other hand, when using a priority ceiling mutex to avoid priority inversion, each such mutex is assigned a
priority, called its ceiling priority. The ceiling priority must be at least as high as the priority of the highest priority task
that ever works with this mutex. Priority ceiling mutexes operate very "aggressively": Whenever a task locks a priority
ceiling mutex, its priority is automatically raised to the ceiling priority for that mutex. This temporary priority promotion
occurs immediately, as soon as "ownership" of the mutex begins -- and the priority promotion takes place whether or
not any other task is actually asking for "ownership" of the mutex at that time.
In other words, when any task locks a mutex, its priority is raised to a priority at or above that of all tasks which may
want to lock the mutex. The net result is once again that intermediate priority tasks will be prevented from slowing the
release of ownership from the current owner to waiting new owners. The general attitude is "Let the normally low-
priority task blast its way through all of its critical code quickly at high priority, just in case a high priority task
might want to lock the mutex in the very near future".
In embedded software development environments using a real-time operating system (RTOS), priority ceiling
mutexes are sometimes also called "Highest Locker" mutexes. This is because the ceiling priority is found by
surveying all software tasks that contain in their code a request to "lock" the mutex, and identifying the highest priority
task in this group of tasks. If the RTOS does not support time-slicing among tasks of the same priority, then the
ceiling priority for the mutex is the priority of this task. [ If the RTOS does support time-slicing among tasks of the
same priority, then the ceiling priority for the mutex is one priority level higher than the priority of this task.]
The scenario in Figure 4 can be modified to show the operation of a mutex with priority ceiling, if the priority ceiling is
set to be the same as the priority of Task B of this example. The modification to the figure's timeline would be that the
portion of Task A that runs at the elevated priority (Priority Ceiling = Priority 150), would actually run before any part of
Task B got to run.
Please note that once again, as for a priority inheritance mutex, also for a priority ceiling mutex the net result of this
"behind-the-scenes" manipulation of Task A's priority, is that Tasks C and D no longer interfere with the progress of
Tasks A and B. Instead, we have once again "turned the tables" -- now Tasks A and B are slightly delaying the
progress of Tasks C and D, in such a way that Tasks C and D now run only after both A and B have finished using the
mutex. But when using a priority ceiling mutex, this is done aggressively so that Task A has already unlocked the
mutex before Task B even asks to lock it.
WHICH MUTEX TO CHOOSE? : PRIORITY INHERITANCE vs. PRIORITY CEILING MUTEXES
Either priority inheritance mutexes or priority ceiling mutexes (or, in some cases, both) are offered in a number of
modern commercially-available real-time operating system kernels. Both kinds of mutexes elegantly solve the
problem of unbounded priority inversion. But each has its own advantages and disadvantages.
Priority inheritance mutexes adjust the priorities of tasks in a totally automatic way, requiring no involvement from the
application software developers. On the other hand, priority ceiling mutexes require that the application software
developers identify the ceiling priority for the mutex at the time that each priority ceiling mutex is created. If tasks
change priorities dynamically, the required ceiling may not be easy to identify. It must be the highest priority at which
any task may request to lock the mutex at any time.
Most multitasking applications do not change the priorities of tasks dynamically. For those applications, the priority
ceiling should simply be the priority of the highest-priority task which might ever request to lock the mutex.
Another difference between the two kinds of mutexes, is that in many cases priority inheritance mutexes may cause
more task switches than priority ceiling mutexes. The impact of this tendency is certainly application software
dependent; but in all cases lowering the number of task switches will lower operating system overhead.
If multiple mutexes need to be locked by a given task, then mutual deadlocks may develop when using priority
inheritance mutexes. These are circular relations between several tasks and several mutexes, which may cause
some tasks to become blocked forever. Priority inheritance mutexes are prone to mutual deadlocks. Priority ceiling
mutexes are immune to such deadlocks. This is a very serious software design concern in heavily multi-tasking
applications. Hence, it is strongly recommended that priority ceiling mutexes be selected for use in applications
where multiple mutexes need to be locked by multiple tasks at various times.
An additional difference between the two kinds of mutexes relates to analysis of the timing behavior of a multitasking
software system. There is no significant difference in the timing behavior of the operating system services themselves
for the two kinds of mutexes. But it is often much easier to make timing calculations for multitasking applications
using priority ceiling mutexes. That’s because priority inheritance mutexes change task priorities based upon
dynamic situations as they develop, perhaps changing the priority of a task several times in the duration of a single
mutex "ownership". This can make the timing analysis of a hard real-time application extremely complex, if not
impossible. On the other hand, priority ceiling mutexes change a task’s priority in a fixed way whenever it locks the
Priority inversions are a significant problem in the design of multitasking real-time embedded application software
systems. Traditional semaphores can not be made to prevent chained blocking of tasks (unbounded priority
inversions), when used for regulating access to resources shared among tasks. These unbounded priority
inversions can lead to serious application software failures such as loss of data, loss of control, or even total system
Mutexes are a specialized kind of semaphore designed to prevent unbounded priority inversions. A number of
modern real-time operating system kernels support mutexes offering alternative ways to address unbounded priority
inversion. The most popular are priority inheritance mutexes and priority ceiling mutexes. It is the responsibility of the
real-time embedded software architect to determine which kind of mutex is appropriate in each application situation.
|© Copyright 2016, D. Kalinsky Associates, All Rights Reserved.
This page Updated March 25, 2016