Artifact Evaluation Instructions

This document describes how to evaluate the in-kernel implementation of the laminar APA scheduler presented in the paper:

V. Bonifaci, B. Brandenburg, G. D’Angelo, and A. Marchetti-Spaccamela, “Multiprocessor Real-Time Scheduling with Hierarchical Processor Affinities”, Proceedings of the 28th Euromicro Conference on Real-Time Systems (ECRTS 2016), July 2016.

The proposed scheduling algorithm was implemented and evaluated in LITMUS^RT, which in turn is based on Linux.

Note: a tutorial on Linux kernel development and detailed instructions on how to compile and configure a Linux kernel that actually works on a given hardware platform are beyond the scope of this document. We assume that the evaluator has basic familiarity with Linux and with compiling and installing custom Linux kernels.

Note: the paper reports empirical data measured on a particular hardware platform available at the time of writing in the shared research cluster of MPI-SWS. To reproduce the exact numbers reported in the paper, it would be necessary to have access to this particular machine, or one virtually identical to it. The focus of this document is hence how to obtain measurements like those reported in the paper (i.e., how to observe similar trends), and not on reproducing the exact numbers.

The rest of the document covers the following main points:

What hardware is required?
How to obtain, compile, and install the kernel, tools, and test workloads? This part necessarily assumes basic familiarity with Linux kernel development.
How to run experiments?
How to process the raw data?

We provide instructions both for running the full experiments, which require access to a 24-core machine, and for running toy experiments, for which a 4-core machine suffices. We hope that running the 4-core toy experiments is sufficient to establish that the provided artifact and the general procedure work.

Note: Even running the 24-core experiments will not reproduce the exact numbers reported in the paper, unless you happen to have an identical hardware platform. The general trends, however, should be observable even on other 24-core Intel platforms.

Hardware Requirements

To run the full experiments, you need a 24-core Intel platform (or larger) running Linux. To run the toy experiments, a 4-core Intel machine (or larger) running Linux will do. In a pinch, you can also use a virtual machine (with four or more virtual cores) to run the toy experiments. However, overheads measured in a virtual machine are distorted by virtualization overheads and hence not meaningful; a virtual machine is only suitable for checking that the workflow works.

If you have a system with more than 24 cores, you can tell Linux to use only 24 cores by booting with maxcpus=24 specified in the kernel command line.
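
Before starting an experiment, it can be useful to confirm how many cores Linux actually brought online (for instance, after booting with maxcpus=24). The following is a minimal sketch and not part of the provided artifact; the helper name have_enough_cores and its optional second argument (an explicit core count, used only for testing) are our own assumptions.

```shell
# Sketch: check that at least the required number of cores is online.
# have_enough_cores is a hypothetical helper, not part of the artifact.
have_enough_cores() {
    required=$1
    # Use the explicit count if given, otherwise ask nproc (coreutils).
    online=${2:-$(nproc)}
    if [ "$online" -lt "$required" ]; then
        echo "only $online cores online, need $required" >&2
        return 1
    fi
    echo "$online cores online (need $required)"
}

# Example: have_enough_cores 4    (toy experiments)
#          have_enough_cores 24   (full experiments)
```

Note that nproc reports the cores available to the calling process, so it should be run from an unrestricted shell.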

On your experimental platform, to approximate our settings, you should disable (in the BIOS) architectural features that cause unpredictability, such as hyperthreading and cache prefetching. To reproduce our exact setup, you need a Dell PowerEdge R920 server with four sockets, each containing a 12-core Intel Xeon E7-8857 v2 processor.

Obtaining, Compiling, and Installing the Software Artifact

To conduct experiments similar to those reported in the paper, you need the following software components:

1. The Linux kernel, version 4.1.3.
2. The LITMUS^RT patch, version 2015.1.
3. Our patch that adds the laminar APA scheduler.
4. The liblitmus user-space library, which provides the necessary tools for working with a LITMUS^RT kernel. The liblitmus library also comes with rtspin, a tool for simulating CPU-bound, periodic real-time tasks, which is employed as the workload in the experiments.
5. The feather-trace-tools project, a collection of tracing and analysis tools that we used to collect overheads under LITMUS^RT.
6. Our patch to feather-trace-tools, which adds support for the trace points used in message-passing-based schedulers (such as the implementation of the proposed laminar APA scheduler).
7. The workloads used to stress the kernel while running experiments.

For convenience, we provide items 2–7 together in a single archive that can be downloaded from here:

http://www.mpi-sws.org/~bbb/papers/ae/ecrts16/ae-laminar-apa.tgz

When unpacked, you should see the following directory contents:

drwxr-xr-x   2 bbb  wheel      68 May 11 20:52 data
drwxr-xr-x  20 bbb  wheel     680 May 11 23:28 feather-trace-tools
-rw-r--r--   1 bbb  wheel  114145 May 12 01:49 kernel-config-for-virtualbox
drwxr-xr-x  15 bbb  wheel     510 May 11 20:45 liblitmus
-rw-r--r--   1 bbb  wheel  600909 May 11 21:04 litmus-rt-with-laminar-apa.patch
drwxr-xr-x   4 bbb  wheel     136 May 11 22:16 workloads

To set up the software environment, carry out the following step-by-step instructions on your Linux host that you will use for the experiments. (This could be a virtual machine, provided your workstation can comfortably run one with at least four virtual cores.) In the following, we assume that you use /usr/local/litmus as the working directory.

Note: all instructions have been tested on a machine running Ubuntu Linux 14.04 LTS. In principle, you should be able to use just about any Linux distribution; however, different (especially newer) compiler versions carry the risk of compilation failures due to newly added warnings, since we compile with -Werror.

Support: we’ve made all efforts to make reproducing our work as painless as possible, but working with kernels does come with some challenges. If at any point you run into problems, please feel free to contact Björn Brandenburg (bbb@mpi-sws.org) for help.

Step 0: System Setup

Make sure your Linux box is set up for Linux kernel development. This means installing the typical Unix C development toolchain, including gcc, make, etc. Any Linux installation that can compile a vanilla Linux kernel should also be able to compile the provided software artifact.

The working directory /usr/local/litmus can be set up as follows:

cd /usr/local
sudo mkdir litmus
sudo chown $USER litmus

Step 1: Download Archive

Download and unpack the prepared archive.

cd /usr/local/litmus
wget http://www.mpi-sws.org/~bbb/papers/ae/ecrts16/ae-laminar-apa.tgz
tar xzf ae-laminar-apa.tgz

Step 2: Download Linux

Download and extract Linux version 4.1.3 in the folder extracted from the archive (= /usr/local/litmus/ae-laminar-apa/).

cd /usr/local/litmus/ae-laminar-apa/
# Download the Linux kernel sources
wget https://www.kernel.org/pub/linux/kernel/v4.x/linux-4.1.3.tar.gz
# Unpack Linux archive into linux-4.1.3
tar xzf linux-4.1.3.tar.gz
# Rename directory into litmus-rt
mv linux-4.1.3 litmus-rt

Note the last step: the directory containing Linux needs to be called litmus-rt in order for the provided build system to work.

Step 3: Apply the LITMUS^RT Patch

The provided archive contains a single patch that includes both the LITMUS^RT base system and the modifications specific to this paper. Apply this patch with the patch utility.

cd /usr/local/litmus/ae-laminar-apa/litmus-rt/
patch -p1 < ../litmus-rt-with-laminar-apa.patch

Our implementation of the laminar APA scheduler can be found in the file litmus/sched_lsa_fp_mp.c.

Step 4: Configure the Patched Kernel

Note: This is the step where prior experience with configuring and compiling the Linux kernel is required.

Prior to compilation, the Linux kernel must be configured. In general, it is not possible to tell in advance which configuration options will be required for the hardware platform on which the artifact is going to be evaluated. We thus cannot provide precise instructions for this step.

We provide the following guidelines:

Make sure in-kernel preemptions are enabled (CONFIG_PREEMPT).
Disable all support for sleep states, voltage scaling, etc.
Disable CPU hot-plug support (or make sure hot-plug events never happen at runtime).
Disable “Write protect kernel read-only data structures” (in kernel debug).

With regard to configuration options specific to LITMUS^RT, there is a configuration group at the very end of the configuration.

In the LITMUS^RT settings, make sure you enable overhead tracing (CONFIG_SCHED_OVERHEAD_TRACE=y), disable debug tracing (CONFIG_SCHED_DEBUG_TRACE not set), and enable release-master support (CONFIG_RELEASE_MASTER=y).

After the kernel has been configured, the last couple of lines of your .config file should look like this:

#                                                        
# LITMUS^RT                                              
#                                                        

#                                                        
# Scheduling                                             
#                                                        
CONFIG_PLUGIN_CEDF=y                                     
CONFIG_PLUGIN_PFAIR=y                                    
CONFIG_RELEASE_MASTER=y                                  
CONFIG_PREFER_LOCAL_LINKING=y                            
CONFIG_LITMUS_QUANTUM_LENGTH_US=1000                     
# CONFIG_BUG_ON_MIGRATION_DEADLOCK is not set            

#                                                        
# Real-Time Synchronization                              
#                                                        
CONFIG_NP_SECTION=y                                      
CONFIG_LITMUS_LOCKING=y                                  

#                                                        
# Performance Enhancements                               
#                                                        
# CONFIG_SCHED_CPU_AFFINITY is not set                   
# CONFIG_ALLOW_EARLY_RELEASE is not set                  
# CONFIG_EDF_TIE_BREAK_LATENESS is not set               
CONFIG_EDF_TIE_BREAK_LATENESS_NORM=y                     
# CONFIG_EDF_TIE_BREAK_HASH is not set                   
# CONFIG_EDF_PID_TIE_BREAK is not set                    

#                                                        
# Tracing                                                
#                                                        
CONFIG_FEATHER_TRACE=y                                   
CONFIG_SCHED_TASK_TRACE=y                                
CONFIG_SCHED_TASK_TRACE_SHIFT=13                         
# CONFIG_SCHED_LITMUS_TRACEPOINT is not set              
CONFIG_SCHED_OVERHEAD_TRACE=y                            
CONFIG_SCHED_OVERHEAD_TRACE_SHIFT=24                     
# CONFIG_SCHED_DEBUG_TRACE is not set
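
The options named in the guidelines above can be checked mechanically before compiling. The following is a minimal sketch, assuming a standard .config file in the kernel source directory; the helper name check_litmus_config is hypothetical and not part of the artifact, and the check covers only the options explicitly named in this step.

```shell
# Sketch: verify that a kernel .config has the options required in Step 4.
# check_litmus_config is a hypothetical helper, not part of the artifact.
check_litmus_config() {
    config=${1:-.config}
    status=0
    # Options that must be enabled.
    for opt in CONFIG_PREEMPT CONFIG_SCHED_OVERHEAD_TRACE CONFIG_RELEASE_MASTER; do
        if ! grep -q "^${opt}=y\$" "$config"; then
            echo "missing: ${opt}=y"
            status=1
        fi
    done
    # Debug tracing must be disabled.
    if grep -q '^CONFIG_SCHED_DEBUG_TRACE=y$' "$config"; then
        echo "must be disabled: CONFIG_SCHED_DEBUG_TRACE"
        status=1
    fi
    return $status
}

# Example (from the litmus-rt directory): check_litmus_config .config
```

The function prints one line per problem and returns a non-zero status if any check fails, so it can guard the compilation step.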

If the configured kernel fails to boot or run correctly, this is most likely a configuration problem unrelated to this artifact. When in doubt, make sure you can configure and boot a kernel without the LITMUS^RT patch first.

As an alternative, we provide a kernel configuration that is known to work under VirtualBox (the file kernel-config-for-virtualbox). To use this configuration, copy the file into the kernel source code directory and rename it to .config.

cp -v ../kernel-config-for-virtualbox .config

If all else fails, download and use the VM image provided for the LITMUS^RT tutorial held at TuToR’16. The provided kernel configuration has been derived from this VM image and is known to work. However, if you seek to carry out the instructions laid out in this document in the TuToR’16 VM, you will need to delete the directory /opt/litmus-rt prior to unpacking the kernel in Step 2 since the VM image has only very limited disk space.

Step 5: Compile and Install the Kernel

Once the kernel has been configured, simply compile and install it like any other Linux kernel.

make bzImage modules
sudo make modules_install install

Note that the second step requires root privileges.

Note: Depending on which bootloader your configuration uses (e.g., grub2 or lilo), you may have to run another utility at this point to update the bootloader configuration. In some cases, you may even have to manually edit the bootloader configuration. As this is basic Linux knowledge that anyone familiar with Linux kernel development will know, we do not provide any detailed instructions for this step. (On Ubuntu-based systems, no manual configuration should be needed.)

Step 6: Compile the User-Space Library

Change to the directory of the provided liblitmus and run make. It should find the kernel in ../litmus-rt automatically.

cd /usr/local/litmus/ae-laminar-apa/liblitmus
make

Step 7: Compile the Tracing Tools

Change to the directory of the provided feather-trace-tools and run make. It should find the kernel in ../litmus-rt automatically.

cd /usr/local/litmus/ae-laminar-apa/feather-trace-tools
make

At this point, the software is ready to run experiments.

Running Experiments

Running the experiments is quite easy since they are mostly automated by the included shell scripts.

Step 8: Reboot into the LITMUS^RT Kernel

Reboot the system and select the just-installed LITMUS^RT kernel in the menu of your boot loader.

Again, exactly how this works is specific to the particular flavor of Linux installed on the evaluation machine; it will be a familiar step to anyone who is comfortable with compiling and installing custom Linux kernels. On Ubuntu, the newly installed kernel can be found in the submenu labeled “Advanced Options for Ubuntu” under the name “Ubuntu, with Linux 4.1.3”.

Once the kernel has booted, you can verify that it is indeed the right kernel by inspecting the list of loaded scheduler plugins with the following command.

cat /proc/litmus/plugins/loaded

This should produce the following output:

PFAIR      
C-EDF      
LSA-FP-MP  
G-FP-MP    
G-EDF-MP   
P-FP       
PSN-EDF    
GSN-EDF    
Linux

The plugin LSA-FP-MP is the laminar strong APA fixed-priority scheduler plugin (the new scheduler); the plugin G-FP-MP is the global fixed-priority scheduler plugin (the baseline). However, these plugins will be automatically activated at the right times by the provided experiment scripts, as discussed below in Step 10.
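
The check for the two relevant plugins can also be scripted. The following is a minimal sketch; the helper name check_plugins is hypothetical, and the optional file argument exists only so that the function can be tried without a LITMUS^RT kernel.

```shell
# Sketch: confirm that both plugins used in the experiments are loaded.
# check_plugins is a hypothetical helper, not part of the artifact.
check_plugins() {
    loaded=${1:-/proc/litmus/plugins/loaded}
    for plugin in LSA-FP-MP G-FP-MP; do
        # Plugin names may be padded with trailing spaces, so strip
        # whitespace before matching the whole line.
        if ! tr -d ' \t' < "$loaded" | grep -qxF -- "$plugin"; then
            echo "plugin not loaded: $plugin" >&2
            return 1
        fi
    done
    echo "both plugins available"
}

# Example: check_plugins
```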

Step 9: Root Shell and Path

Running experiments requires superuser privileges. Thus, first open a root shell. For example, with sudo:

sudo -s

In the root shell, we need to set the PATH environmental variable to ensure that the experiment scripts can find all required tools. In particular, we need to add the liblitmus and the feather-trace-tools directories to the search path.

export PATH=/usr/local/litmus/ae-laminar-apa/liblitmus:$PATH
export PATH=/usr/local/litmus/ae-laminar-apa/feather-trace-tools:$PATH

You can check that the path was set up correctly by locating the rtspin and ftcat utilities:

which rtspin                                 
# expected output:
# /usr/local/litmus/ae-laminar-apa/liblitmus/rtspin          
which ftcat                                  
# expected output:
# /usr/local/litmus/ae-laminar-apa/feather-trace-tools/ftcat

Step 10: Run an Experiment

Finally, we can launch an experiment. The archive comes both with all workloads used in the experiments reported in the paper, and with toy experiments that allow trying out the kernel if no 24-core machine is available.

There is a shell script for each experiment that takes care of everything: setting up the experiment, launching rtspin processes with appropriate parameters, starting overhead tracing, and tearing everything down again at the end of the experiments.

The experiment scripts are provided in the folder workloads/ of the provided archive. They are organized by required core count and by scheduler:

the directory workloads/24/apa contains experiment scripts for the laminar APA scheduler (the new algorithm) for platforms with at least 24 cores;
the directory workloads/24/gfp contains experiment scripts for the global fixed-priority scheduler (the baseline) for platforms with at least 24 cores;
the directory workloads/4/apa contains experiment scripts for the laminar APA scheduler (the new algorithm) for platforms with at least 4 cores; and
the directory workloads/4/gfp contains experiment scripts for the global fixed-priority scheduler (the baseline) for platforms with at least 4 cores.

The file names of the shell scripts reflect the parameters of the workload. For example, the file workloads/4/apa/apa-workload_m=04_n=08_u=85_seq=00.sh is for m=4 processor cores, launches a workload consisting of n=8 tasks, and has a total utilization of 85%. The seq tag is simply a sequence number; there are 10 scripts for each parameter combination.
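
As an illustration of this naming scheme, the parameters can be recovered from a script name with standard tools. This is a minimal sketch; the helper names field and parse_workload_name are our own and not part of the artifact.

```shell
# Sketch: extract the workload parameters encoded in a script name.
# field and parse_workload_name are hypothetical helpers.
field() {
    # Print the digits following "_<key>=" in the given name.
    printf '%s\n' "$1" | sed -n "s/.*_$2=\([0-9]*\).*/\1/p"
}

parse_workload_name() {
    name=$(basename "$1" .sh)
    m=$(field "$name" m)       # number of processor cores
    n=$(field "$name" n)       # number of tasks
    u=$(field "$name" u)       # total utilization in percent
    seq=$(field "$name" seq)   # sequence number
    echo "cores=$m tasks=$n utilization=$u% seq=$seq"
}

parse_workload_name 'workloads/4/apa/apa-workload_m=04_n=08_u=85_seq=00.sh'
# prints: cores=04 tasks=08 utilization=85% seq=00
```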

To launch an experiment, simply run the corresponding script from the root shell.

Each experiment script produces raw overhead sample files in the directory in which it is launched. We therefore first move to the (still empty) data/ directory.

cd /usr/local/litmus/ae-laminar-apa/data
../workloads/4/apa/apa-workload_m=04_n=08_u=85_seq=00.sh

Each experiment runs for 60 seconds (plus a few seconds for setup and teardown). While it is running, it should provide a progress indicator (dots appearing, one per second). When the experiment is done, the shell script will terminate.

For example, this is what it should look like after the experiment has completed:

root@rts44:/usr/local/litmus# cd /usr/local/litmus/ae-laminar-apa/data                                  
root@rts44:/usr/local/litmus/ae-laminar-apa/data# ../workloads/4/apa/apa-workload_m=04_n=08_u=85_seq=00.sh                                                                                                      
Running apa-workload_m=04_n=08_u=85_seq=00 under LSA-FP-MP for 60 seconds...                            
Setting processor 3 to be the dedicated scheduling core.                                               
Waiting for 8 tasks to finish launching...                                                              
Launching overhead tracer.                                                                              
Waiting for overhead tracer to finish launching...                                                      
Released 8 real-time tasks.                                                                             
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .                                                                                         
All tasks finished.                                                                                     
Sent SIGUSR1 to stop tracers...                                                                         
root@rts44:/usr/local/litmus/ae-laminar-apa/data#

The experiment script generated a bunch of data files that contain overhead samples that were collected with Feather-Trace, the overhead tracing framework built into LITMUS^RT. You should now see (at least) the following files in the data/ directory.

overheads_host=rts44_scheduler=LSA-FP-MP_trace=apa-workload_m=04_n=08_u=85_seq=00_cpu=0.bin
overheads_host=rts44_scheduler=LSA-FP-MP_trace=apa-workload_m=04_n=08_u=85_seq=00_cpu=1.bin
overheads_host=rts44_scheduler=LSA-FP-MP_trace=apa-workload_m=04_n=08_u=85_seq=00_cpu=2.bin
overheads_host=rts44_scheduler=LSA-FP-MP_trace=apa-workload_m=04_n=08_u=85_seq=00_cpu=3.bin
overheads_host=rts44_scheduler=LSA-FP-MP_trace=apa-workload_m=04_n=08_u=85_seq=00_msg=0.bin
overheads_host=rts44_scheduler=LSA-FP-MP_trace=apa-workload_m=04_n=08_u=85_seq=00_msg=1.bin
overheads_host=rts44_scheduler=LSA-FP-MP_trace=apa-workload_m=04_n=08_u=85_seq=00_msg=2.bin
overheads_host=rts44_scheduler=LSA-FP-MP_trace=apa-workload_m=04_n=08_u=85_seq=00_msg=3.bin

The script creates two files for each processor; the total number of files created hence depends on the size of the experimental platform.

The above example ran a workload under the new laminar APA scheduler. To get data for the baseline scheduler, run the same experiment again, but this time under the global fixed-priority (GFP) scheduler.

cd /usr/local/litmus/ae-laminar-apa/data
../workloads/4/gfp/apa-workload_m=04_n=08_u=85_seq=00.sh

Again, the output should look something like this:

root@rts44:/usr/local/litmus/ae-laminar-apa/data# ../workloads/4/gfp/apa-workload_m=04_n=08_u=85_seq=00.sh                                                                                                      
Running apa-workload_m=04_n=08_u=85_seq=00 under G-FP-MP for 60 seconds...                              
Setting processor 3 to be the dedicated scheduling core.                                               
Waiting for 8 tasks to finish launching...                                                              
Launching overhead tracer.                                                                              
Waiting for overhead tracer to finish launching...                                                      
Released 8 real-time tasks.                                                                             
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .                                                                                         
All tasks finished.                                                                                     
Sent SIGUSR1 to stop tracers...

You can now run as many experiments as you wish. Obviously, running all workloads will take many hours.
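
To run a whole directory of experiments unattended, a simple loop suffices. The following is a minimal sketch, not part of the provided artifact; the helper name run_all is hypothetical. It runs the scripts one after another, since each experiment uses the whole machine.

```shell
# Sketch: run every experiment script in a directory, one at a time.
# run_all is a hypothetical helper, not part of the artifact.
run_all() {
    dir=$1
    failed=0
    for script in "$dir"/*.sh; do
        [ -e "$script" ] || continue    # directory contains no scripts
        echo "== $script"
        "$script" || { echo "FAILED: $script" >&2; failed=$((failed + 1)); }
    done
    # Succeed only if every script succeeded.
    [ "$failed" -eq 0 ]
}

# Example (from the root shell set up in Step 9):
#   cd /usr/local/litmus/ae-laminar-apa/data
#   run_all ../workloads/4/apa
#   run_all ../workloads/4/gfp
```

Since each experiment runs for about a minute, the total running time in minutes is roughly the number of scripts in the directory.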

Data Processing

Once a satisfactory amount of data has been collected, the overhead statistics reported in the paper can be obtained with the tools provided in feather-trace-tools/.

The following steps are based on the LITMUS^RT overhead tracing tutorial. The focus here is on documenting how to obtain the desired statistics, not on explaining why each step is necessary or what precisely it does. For a more in-depth explanation, please refer to the LITMUS^RT tracing tutorial.

Step 11: Sort Trace Files

In the first processing step, the raw trace files are cleaned up and prepared for further processing.

ft-sort-traces overheads_*.bin 2>&1 | tee -a overhead-processing.log

Step 12: Extract Samples

The files ending with the extension .bin are raw trace files in a kernel-defined binary format. Before extracting meaningful statistics, we need to extract the actual data samples.

ft-extract-samples overheads_*.bin 2>&1 | tee -a overhead-processing.log

Step 13: Aggregate Samples

At this point, we have many per-processor, per-task-count, per-utilization, etc. files. We are interested in aggregate overhead values across all tested scenarios. Hence we need to combine the individual sample files.

ft-combine-samples --std overheads_*.float32 2>&1 | tee -a overhead-processing.log

Step 14: Count Total Number of Samples

In general, the number of samples collected for the two schedulers will differ to some extent. The next step requires the minimum number of samples available for each overhead type, so we first count the number of samples in all files.

ft-count-samples  combined-overheads_*.float32 > counts.csv

Step 15: Draw a Random Sample

To compare sampled maxima in an unbiased way, we need to use an equal number of samples from each population. Since in all likelihood we recorded a different number of samples for each scheduler, we randomize and truncate the data files.

ft-select-samples counts.csv combined-overheads_*.float32 2>&1 | tee -a overhead-processing.log

Step 16: Compute Statistics

Finally, we can compute the statistics reported in the paper.

ft-compute-stats combined-overheads_*.sf32 > stats.csv

The above command reports overheads in terms of processor cycles. For human consumption, it is more convenient to report overheads in terms of microseconds. To obtain the overheads in microseconds, pass the option --cycles-per-usec to ft-compute-stats. The appropriate value can be obtained from /proc/cpuinfo with the following command:

grep 'cpu MHz' /proc/cpuinfo | uniq

For example, on our test machine (not the one used for the experiments reported in the paper), the output looks as follows.

bbb@rts44:/usr/local/litmus$ grep 'cpu MHz' /proc/cpuinfo | uniq
cpu MHz         : 2200.080

On this particular processor, there are 2200.080 cycles per microsecond. Hence we can compute the statistics in microseconds as follows.

ft-compute-stats --cycles-per-usec 2200.080 combined-overheads_*.sf32 > stats-us.csv
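
To avoid copying the clock rate by hand, the value can be read from /proc/cpuinfo in a script. This is a minimal sketch; the helper name cycles_per_usec is hypothetical, and it assumes that all cores report the same frequency (it simply takes the first entry).

```shell
# Sketch: read the clock rate in MHz (= cycles per microsecond).
# cycles_per_usec is a hypothetical helper, not part of the artifact.
cycles_per_usec() {
    cpuinfo=${1:-/proc/cpuinfo}
    # Lines look like "cpu MHz<tabs>: 2200.080"; print the first value only.
    awk -F': *' '/^cpu MHz/ { print $2; exit }' "$cpuinfo"
}

# Example:
#   ft-compute-stats --cycles-per-usec "$(cycles_per_usec)" \
#       combined-overheads_*.sf32 > stats-us.csv
```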

Having just run the two toy experiments mentioned in Step 10 above, the output looks as follows:

#    Plugin, #cores,        Overhead,                                 Unit, #tasks, #samples,      max, 99.9th perc., 99th perc., 95th perc.,     avg,     med,     min,     std,     var,                                                                                                           file
    G-FP-MP,     04,  CLIENT-REQUEST, microseconds (scale = 1/2200.080000),      *,    74673, 19.34430,     15.64589,    2.43958,    2.20492, 1.44456, 1.34995, 0.16272, 0.83952, 0.70479,   combined-overheads_host=rts44_scheduler=G-FP-MP_trace=apa-workload_m=04_overhead=CLIENT-REQUEST_LATENCY.sf32
    G-FP-MP,     04,             CXS, microseconds (scale = 1/2200.080000),      *,    74713, 21.48013,      3.50275,    1.59358,    1.51313, 1.16294, 1.19087, 0.20408, 0.44905, 0.20165,                      combined-overheads_host=rts44_scheduler=G-FP-MP_trace=apa-workload_m=04_overhead=CXS.sf32
    G-FP-MP,     04,     DSP-HANDLER, microseconds (scale = 1/2200.080000),      *,    74754, 26.97538,     11.48239,    1.30995,    1.03087, 0.73659, 0.69270, 0.09272, 0.55134, 0.30398,              combined-overheads_host=rts44_scheduler=G-FP-MP_trace=apa-workload_m=04_overhead=DSP-HANDLER.sf32
    G-FP-MP,     04, RELEASE-LATENCY,        microseconds (scale = 1/1000),      *,    31554, 24.16400,     17.20824,    6.06674,    3.61900, 1.76023, 1.43000, 0.63200, 1.40904, 1.98534,          combined-overheads_host=rts44_scheduler=G-FP-MP_trace=apa-workload_m=04_overhead=RELEASE-LATENCY.sf32
    G-FP-MP,     04,         RELEASE, microseconds (scale = 1/2200.080000),      *,    31553, 28.52260,     13.56427,    4.46666,    3.48669, 1.68209, 1.59085, 0.44862, 1.03474, 1.07066,                  combined-overheads_host=rts44_scheduler=G-FP-MP_trace=apa-workload_m=04_overhead=RELEASE.sf32
    G-FP-MP,     04,          SCHED2, microseconds (scale = 1/2200.080000),      *,    74715, 15.17581,      0.54998,    0.32999,    0.23999, 0.17463, 0.22363, 0.09545, 0.11122, 0.01237,                   combined-overheads_host=rts44_scheduler=G-FP-MP_trace=apa-workload_m=04_overhead=SCHED2.sf32
    G-FP-MP,     04,           SCHED, microseconds (scale = 1/2200.080000),      *,    74715, 28.92122,     11.12775,    3.12509,    2.54354, 1.26885, 1.13632, 0.34271, 0.70886, 0.50247,                    combined-overheads_host=rts44_scheduler=G-FP-MP_trace=apa-workload_m=04_overhead=SCHED.sf32
    G-FP-MP,     04,    SEND-RESCHED, microseconds (scale = 1/2200.080000),      *,    73766, 19.14249,     11.68789,    2.05356,    1.87311, 1.46852, 1.60312, 0.63543, 0.63861, 0.40782,             combined-overheads_host=rts44_scheduler=G-FP-MP_trace=apa-workload_m=04_overhead=SEND-RESCHED.sf32
  LSA-FP-MP,     04,  CLIENT-REQUEST, microseconds (scale = 1/2200.080000),      *,    74673, 16.72530,      9.88720,    1.43449,    1.23359, 1.13731, 1.09587, 0.85997, 0.46705, 0.21813, combined-overheads_host=rts44_scheduler=LSA-FP-MP_trace=apa-workload_m=04_overhead=CLIENT-REQUEST_LATENCY.sf32
  LSA-FP-MP,     04,             CXS, microseconds (scale = 1/2200.080000),      *,    74713, 10.75552,      1.30645,    0.82133,    0.75725, 0.52628, 0.64998, 0.17863, 0.22476, 0.05051,                    combined-overheads_host=rts44_scheduler=LSA-FP-MP_trace=apa-workload_m=04_overhead=CXS.sf32
  LSA-FP-MP,     04,     DSP-HANDLER, microseconds (scale = 1/2200.080000),      *,    74754, 31.82248,      5.14525,    3.49442,    3.14307, 2.53362, 2.42264, 0.09409, 0.45715, 0.20899,            combined-overheads_host=rts44_scheduler=LSA-FP-MP_trace=apa-workload_m=04_overhead=DSP-HANDLER.sf32
  LSA-FP-MP,     04, RELEASE-LATENCY,        microseconds (scale = 1/1000),      *,    31554, 18.40800,     11.55656,    2.41288,    1.72400, 0.95378, 0.70900, 0.62700, 0.67612, 0.45712,        combined-overheads_host=rts44_scheduler=LSA-FP-MP_trace=apa-workload_m=04_overhead=RELEASE-LATENCY.sf32
  LSA-FP-MP,     04,         RELEASE, microseconds (scale = 1/2200.080000),      *,    31553, 20.57107,     11.54750,    5.53980,    3.29715, 2.27388, 1.96993, 0.53589, 0.91940, 0.84526,                combined-overheads_host=rts44_scheduler=LSA-FP-MP_trace=apa-workload_m=04_overhead=RELEASE.sf32
  LSA-FP-MP,     04,          SCHED2, microseconds (scale = 1/2200.080000),      *,    74715, 16.82439,      0.30590,    0.22999,    0.17999, 0.12171, 0.10727, 0.09409, 0.07053, 0.00497,                 combined-overheads_host=rts44_scheduler=LSA-FP-MP_trace=apa-workload_m=04_overhead=SCHED2.sf32
  LSA-FP-MP,     04,           SCHED, microseconds (scale = 1/2200.080000),      *,    74715, 17.67618,      2.13031,    1.82130,    1.03133, 0.54628, 0.49816, 0.24726, 0.31187, 0.09726,                  combined-overheads_host=rts44_scheduler=LSA-FP-MP_trace=apa-workload_m=04_overhead=SCHED.sf32
  LSA-FP-MP,     04,    SEND-RESCHED, microseconds (scale = 1/2200.080000),      *,    73766, 17.93435,      9.49478,    1.61676,    0.97996, 0.90923, 0.88133, 0.61407, 0.41412, 0.17149,           combined-overheads_host=rts44_scheduler=LSA-FP-MP_trace=apa-workload_m=04_overhead=SEND-RESCHED.sf32

How to interpret this data: In Figure 1 of the paper, the bars labeled “DSP” correspond directly to the DSP-HANDLER overheads listed in the table. The bars labeled “DISPATCHER” correspond to the sum of the three overheads SCHED (scheduler invocation), SCHED2 (post-context-switch activities), and CXS (context switch). For technical reasons, the latter three overheads are measured separately. The other types of overhead were not reported in the paper.
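
The sum that corresponds to the “DISPATCHER” bars can be computed directly from the statistics file. The following is a minimal sketch; the helper name dispatcher_sum is hypothetical, and the choice of column is an assumption on our part: by default it sums the avg column (column 11 in the output format shown above), while column 7 selects the observed maxima.

```shell
# Sketch: per plugin, add up the SCHED, SCHED2, and CXS rows of one column.
# dispatcher_sum is a hypothetical helper, not part of the artifact.
dispatcher_sum() {
    stats=$1
    col=${2:-11}    # 11 = avg, 7 = max (see the header line of the file)
    awk -F',' -v col="$col" '
        /^#/ { next }    # skip the header line
        {
            plugin = $1; overhead = $3
            gsub(/[ \t]/, "", plugin)      # strip column padding
            gsub(/[ \t]/, "", overhead)
            if (overhead == "SCHED" || overhead == "SCHED2" || overhead == "CXS")
                sum[plugin] += $col
        }
        END { for (p in sum) printf "%s %.5f\n", p, sum[p] }
    ' "$stats" | sort
}

# Example: dispatcher_sum stats-us.csv
```

Applied to the avg column of the table above, this adds 1.16294 + 0.17463 + 1.26885 = 2.60642 microseconds for G-FP-MP and 0.52628 + 0.12171 + 0.54628 = 1.19427 microseconds for LSA-FP-MP.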

Final remarks: a close look at the above table reveals that it does not exhibit exactly the same trends as the data reported in the paper. There are multiple reasons for this.

First, with just two toy experiments, the number of samples is too small to draw any firm conclusions.

Second, with a workload for only four cores, the scalability bottlenecks that play a role in the 24-core configuration do not yet manifest.

Third, the data actually stems from a 44-core test machine, and not a 4-core machine (for which the toy experiments are designed). The global scheduler uses all cores and hence never preempts (due to the small number of tasks in the toy experiments). In contrast, the APA scheduler remains constrained by the specified affinities. Therefore, this is not actually a fair comparison; the maxcpus command line option would be required to ensure a level playing field.

This highlights two important points: (1) it is possible to validate the workflow on just about any Linux host with four or more cores, and (2) to replicate the actual reported numbers, a more or less identical machine is needed to run the experiments, and the experiments need to be run at full scale (i.e., all experiments over many hours, resulting in many gigabytes of data).

That said, when running the provided workloads on a 24-core machine (even if not exactly identical) for a reasonable number of task sets (at least a dozen), generally similar trends (the APA scheduler incurring higher, but still bearable overheads) should become apparent.