This document describes how to evaluate the in-kernel implementation of the laminar APA scheduler presented in the paper:
V. Bonifaci, B. Brandenburg, G. D’Angelo, and A. Marchetti-Spaccamela, “Multiprocessor Real-Time Scheduling with Hierarchical Processor Affinities”, Proceedings of the 28th Euromicro Conference on Real-Time Systems (ECRTS 2016), July 2016.
The proposed scheduling algorithm was implemented and evaluated in LITMUS^RT, which in turn is based on Linux.
Note: a tutorial on Linux kernel development and detailed instructions on how to compile and configure a Linux kernel that actually works on a given hardware platform are beyond the scope of this document. Basic familiarity with Linux and with the compilation and installation of custom Linux kernels is assumed on the part of the evaluator.
Note: the paper reports empirical data measured on a particular hardware platform available at the time of writing in the shared research cluster of MPI-SWS. To reproduce the exact numbers reported in the paper, it would be necessary to have access to this particular machine, or one virtually identical to it. The focus of this document is hence how to obtain measurements like those reported in the paper (i.e., how to observe similar trends), and not on reproducing the exact numbers.
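The rest of the document covers the required hardware and software, how to set up the software environment, how to run the experiments, and how to process the collected overhead data.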
We provide instructions both for running the full experiments, which require access to a 24-core machine, and for running toy experiments, for which a 4-core machine suffices. We hope that running the 4-core toy experiments is sufficient to establish that the provided artifact and the general procedure work.
Note: Even running the 24-core experiments will not reproduce the exact numbers reported in the paper, unless you happen to have an identical hardware platform. The general trends, however, should be observable even on other 24-core Intel platforms.
To run the full experiments, you need a 24-core Intel platform (or larger) running Linux. To run the toy experiments, a 4-core Intel machine (or larger) running Linux will do. In a pinch, you can also use a virtual machine (with four or more virtual cores) to run the toy experiments. However, in a virtual machine, all collected data will be bogus due to the virtualization overheads.
If you have a system with more than 24 cores, you can tell Linux to use only 24 cores by booting with maxcpus=24 specified on the kernel command line.
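After rebooting, it may be worth confirming that the limit actually took effect. The following is a minimal sketch, assuming the standard /proc/cmdline interface; the helper name maxcpus_from_cmdline is made up for illustration and is not part of the artifact.

```shell
# Hypothetical helper (not part of the artifact): print the value of the
# maxcpus= option found in a kernel command line string. Prints nothing if
# the option is absent.
maxcpus_from_cmdline() {
    for arg in $1; do
        case "$arg" in
            maxcpus=*) echo "${arg#maxcpus=}" ;;
        esac
    done
}

# Usage on a running system:
#   maxcpus_from_cmdline "$(cat /proc/cmdline)"   # e.g., prints 24
#   nproc                                          # processing units available
```

If the two values disagree, the boot loader entry most likely does not carry the option.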
On your experimental platform, to approximate our settings, you should disable (in the BIOS) architectural features that cause unpredictability, such as hyperthreading and cache prefetching. To reproduce our exact setup, you need a Dell PowerEdge R920 server with four sockets, each containing a 12-core Intel Xeon E7-8857 v2 processor.
To conduct experiments similar to those reported in the paper, you need the following software components:
The liblitmus user-space library, which provides the necessary tools for working with a LITMUS^RT kernel. The liblitmus library also comes with rtspin, a tool for simulating CPU-bound, periodic real-time tasks, which is employed as the workload in the experiments.
The feather-trace-tools project, a collection of tracing and analysis tools that we used to collect overheads under LITMUS^RT.
An extension of feather-trace-tools, which adds support for the trace points used in message-passing-based schedulers (such as the implementation of the proposed laminar APA scheduler).
For convenience, we provide items 2–7 together in a single archive that can be downloaded from http://www.mpi-sws.org/~bbb/papers/ae/ecrts16/ae-laminar-apa.tgz.
When unpacked, you should see the following directory contents:
drwxr-xr-x 2 bbb wheel 68 May 11 20:52 data
drwxr-xr-x 20 bbb wheel 680 May 11 23:28 feather-trace-tools
-rw-r--r-- 1 bbb wheel 114145 May 12 01:49 kernel-config-for-virtualbox
drwxr-xr-x 15 bbb wheel 510 May 11 20:45 liblitmus
-rw-r--r-- 1 bbb wheel 600909 May 11 21:04 litmus-rt-with-laminar-apa.patch
drwxr-xr-x 4 bbb wheel 136 May 11 22:16 workloads
To set up the software environment, carry out the following step-by-step instructions on the Linux host that you will use for the experiments. (This could be a virtual machine, provided your workstation can comfortably run one with at least four virtual cores.) In the following, we assume that you use /usr/local/litmus as the working directory.
Note: all instructions have been tested on a machine running Ubuntu Linux 14.04 LTS. In principle, you should be able to use just about any Linux distribution; however, different (and especially newer) compiler versions bring the risk of compilation failures due to newly added warnings (we compile with -Werror).
Support: we’ve made all efforts to make reproducing our work as painless as possible, but working with kernels does come with some challenges. If at any point you run into problems, please feel free to contact Björn Brandenburg (bbb@mpi-sws.org) for help.
Make sure your Linux box is set up for Linux kernel development. This means installing the typical Unix C development chain, including gcc, make, etc. Any Linux installation that can compile a vanilla Linux kernel should also be able to compile the provided software artifact.
The working directory /usr/local/litmus can be set up as follows:
cd /usr/local
sudo mkdir litmus
sudo chown $USER litmus
Download and unpack the prepared archive.
cd /usr/local/litmus
wget http://www.mpi-sws.org/~bbb/papers/ae/ecrts16/ae-laminar-apa.tgz
tar xzf ae-laminar-apa.tgz
Download and extract Linux version 4.1.3 in the folder extracted from the archive (i.e., /usr/local/litmus/ae-laminar-apa/).
cd /usr/local/litmus/ae-laminar-apa/
# Download the Linux kernel sources
wget https://www.kernel.org/pub/linux/kernel/v4.x/linux-4.1.3.tar.gz
# Unpack Linux archive into linux-4.1.3
tar xzf linux-4.1.3.tar.gz
# Rename directory into litmus-rt
mv linux-4.1.3 litmus-rt
Note the last step: the directory containing Linux needs to be called litmus-rt in order for the provided build system to work.
The provided archive contains a single patch that includes both the LITMUS^RT base system and the modifications specific to this paper. Apply this patch with the patch utility.
cd /usr/local/litmus/ae-laminar-apa/litmus-rt/
patch -p1 < ../litmus-rt-with-laminar-apa.patch
Our implementation of the laminar APA scheduler can be found in the file litmus/sched_lsa_fp_mp.c.
Note: This is the step where prior experience with configuring and compiling the Linux kernel is required.
Prior to compilation, the Linux kernel must be configured. In general, it is not possible to tell in advance which configuration options will be required for the hardware platform on which the artifact is going to be evaluated. We thus cannot provide precise instructions for this step.
We provide the following guidelines:
[...] (CONFIG_PREEMPT).
With regard to configuration options specific to LITMUS^RT, there is a configuration group at the very end of the configuration.
In the LITMUS^RT settings, make sure you enable overhead tracing (CONFIG_SCHED_OVERHEAD_TRACE=y), disable debug tracing (CONFIG_SCHED_DEBUG_TRACE not set), and enable release-master support (CONFIG_RELEASE_MASTER=y).
After the kernel has been configured, the last couple of lines of your .config file should look like this:
#
# LITMUS^RT
#
#
# Scheduling
#
CONFIG_PLUGIN_CEDF=y
CONFIG_PLUGIN_PFAIR=y
CONFIG_RELEASE_MASTER=y
CONFIG_PREFER_LOCAL_LINKING=y
CONFIG_LITMUS_QUANTUM_LENGTH_US=1000
# CONFIG_BUG_ON_MIGRATION_DEADLOCK is not set
#
# Real-Time Synchronization
#
CONFIG_NP_SECTION=y
CONFIG_LITMUS_LOCKING=y
#
# Performance Enhancements
#
# CONFIG_SCHED_CPU_AFFINITY is not set
# CONFIG_ALLOW_EARLY_RELEASE is not set
# CONFIG_EDF_TIE_BREAK_LATENESS is not set
CONFIG_EDF_TIE_BREAK_LATENESS_NORM=y
# CONFIG_EDF_TIE_BREAK_HASH is not set
# CONFIG_EDF_PID_TIE_BREAK is not set
#
# Tracing
#
CONFIG_FEATHER_TRACE=y
CONFIG_SCHED_TASK_TRACE=y
CONFIG_SCHED_TASK_TRACE_SHIFT=13
# CONFIG_SCHED_LITMUS_TRACEPOINT is not set
CONFIG_SCHED_OVERHEAD_TRACE=y
CONFIG_SCHED_OVERHEAD_TRACE_SHIFT=24
# CONFIG_SCHED_DEBUG_TRACE is not set
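The three LITMUS^RT options named above can also be checked mechanically before starting a lengthy kernel build. The following is a minimal sketch; the helper name check_litmus_config is made up for illustration, and the check covers only these three options, not the rest of the configuration.

```shell
# Hypothetical helper (not part of the artifact): verify the three
# LITMUS^RT options in the kernel .config file given as $1.
# Returns non-zero if any of them has the wrong setting.
check_litmus_config() {
    config=$1
    status=0
    # Overhead tracing and release-master support must be enabled.
    for opt in CONFIG_SCHED_OVERHEAD_TRACE CONFIG_RELEASE_MASTER; do
        if ! grep -q "^${opt}=y\$" "$config"; then
            echo "missing: ${opt}=y"
            status=1
        fi
    done
    # Debug tracing must be disabled, i.e., it must not be set to "y".
    if grep -q '^CONFIG_SCHED_DEBUG_TRACE=y$' "$config"; then
        echo "must be unset: CONFIG_SCHED_DEBUG_TRACE"
        status=1
    fi
    return $status
}

# Usage (from the kernel source directory):
#   check_litmus_config .config && echo "LITMUS^RT options look fine"
```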
If the configured kernel fails to boot or run correctly, this is most likely a configuration problem unrelated to this artifact. When in doubt, make sure you can configure and boot a kernel without the LITMUS^RT patch first.
As an alternative, we provide a kernel configuration that is known to work under VirtualBox (the file kernel-config-for-virtualbox). To use this configuration, copy the file into the kernel source code directory and rename it to .config.
cp -v ../kernel-config-for-virtualbox .config
If all else fails, download and use the VM image provided for the LITMUS^RT tutorial held at TuToR’16. The provided kernel configuration has been derived from this VM image and is known to work. However, if you seek to carry out the instructions laid out in this document in the TuToR’16 VM, you will need to delete the directory /opt/litmus-rt prior to unpacking the kernel in Step 2 since the VM image has only very limited disk space.
Once the kernel has been configured, simply compile and install it like any other Linux kernel.
make bzImage modules
sudo make modules_install install
Note that the second step requires root privileges.
Note: Depending on which bootloader your configuration uses (e.g., grub2 or lilo), you may have to run another utility at this point to update the bootloader configuration. In some cases, you may even have to manually edit the bootloader configuration. As this is basic knowledge for anyone familiar with Linux kernel development, we do not provide detailed instructions for this step. (On Ubuntu-based systems, no manual configuration should be needed.)
Change to the directory of the provided liblitmus and run make. It should find the kernel in ../litmus-rt automatically.
cd /usr/local/litmus/ae-laminar-apa/liblitmus
make
Change to the directory of the provided feather-trace-tools and run make. It should find the kernel in ../litmus-rt automatically.
cd /usr/local/litmus/ae-laminar-apa/feather-trace-tools
make
At this point, the software is ready to run experiments.
Running the experiments is quite easy since they are mostly automated by the included shell scripts.
Reboot the system and select the just-installed LITMUS^RT kernel in the menu of your boot loader.
Again, exactly how this works is specific to the particular flavor of Linux installed on the evaluation machine; it will be a familiar step to anyone who is comfortable with compiling and installing custom Linux kernels. On Ubuntu, the newly installed kernel can be found in the submenu labeled “Advanced Options for Ubuntu” under the name “Ubuntu, with Linux 4.1.3”.
Once the kernel has booted, you can verify that it is indeed the right kernel by inspecting the list of loaded scheduler plugins with the following command.
cat /proc/litmus/plugins/loaded
This should produce the following output:
PFAIR
C-EDF
LSA-FP-MP
G-FP-MP
G-EDF-MP
P-FP
PSN-EDF
GSN-EDF
Linux
The plugin LSA-FP-MP is the laminar strong APA fixed-priority scheduler plugin (the new scheduler); the plugin G-FP-MP is the global fixed-priority scheduler plugin (the baseline). However, these plugins will be automatically activated at the right times by the provided experiment scripts, as discussed below in Step 10.
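The presence of the two plugins of interest can also be checked with a small script rather than by eye. This is a minimal sketch; the helper name have_plugins is made up for illustration, and it only assumes that the plugin list contains one plugin name per line, as in the output shown above.

```shell
# Hypothetical helper (not part of the artifact): check that every plugin
# named in $2, $3, ... appears as a full line in the file $1.
have_plugins() {
    list=$1
    shift
    for plugin in "$@"; do
        # -x: match whole lines only; -F: treat the name as a fixed string.
        grep -qxF -- "$plugin" "$list" || {
            echo "missing plugin: $plugin"
            return 1
        }
    done
}

# Usage on the booted LITMUS^RT kernel:
#   have_plugins /proc/litmus/plugins/loaded LSA-FP-MP G-FP-MP
```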
Running experiments requires superuser privileges. Thus, first open a root shell. For example, with sudo:
sudo -s
In the root shell, we need to set the PATH environment variable to ensure that the experiment scripts can find all required tools. In particular, we need to add the liblitmus and the feather-trace-tools directories to the search path.
export PATH=/usr/local/litmus/ae-laminar-apa/liblitmus:$PATH
export PATH=/usr/local/litmus/ae-laminar-apa/feather-trace-tools:$PATH
You can check that the path was set up correctly by locating the rtspin and ftcat utilities:
which rtspin
# expected output:
# /usr/local/litmus/ae-laminar-apa/liblitmus/rtspin
which ftcat
# expected output:
# /usr/local/litmus/ae-laminar-apa/feather-trace-tools/ftcat
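The same check can be extended to all tools used later in this document. The following is a minimal sketch; the helper name check_tools is made up for illustration, and the tool names in the usage comment are the ones mentioned in this document.

```shell
# Hypothetical helper (not part of the artifact): succeed only if every
# command named in the arguments can be found via PATH.
check_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "not found in PATH: $tool"
            return 1
        fi
    done
}

# Usage (in the root shell, after setting PATH):
#   check_tools rtspin ftcat ft-sort-traces ft-extract-samples \
#               ft-combine-samples ft-count-samples ft-select-samples \
#               ft-compute-stats
```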
Finally, we can launch an experiment. The archive comes both with all workloads used in the experiments reported in the paper, and with toy experiments that allow trying out the kernel if no 24-core machine is available.
There is a shell script for each experiment that takes care of everything: setting up the experiment, launching rtspin processes with appropriate parameters, starting overhead tracing, and tearing everything down again at the end of the experiment.
The experiment scripts are provided in the folder workloads/ of the provided archive. They are organized by required core count and by scheduler:
workloads/24/apa contains experiment scripts for the laminar APA scheduler (the new algorithm) for platforms with at least 24 cores;
workloads/24/gfp contains experiment scripts for the global fixed-priority scheduler (the baseline) for platforms with at least 24 cores;
workloads/4/apa contains experiment scripts for the laminar APA scheduler (the new algorithm) for platforms with at least 4 cores; and
workloads/4/gfp contains experiment scripts for the global fixed-priority scheduler (the baseline) for platforms with at least 4 cores.
The file names of the shell scripts reflect the parameters of the workload. For example, the file workloads/4/apa/apa-workload_m=04_n=08_u=85_seq=00.sh is for m=4 processor cores, launches a workload consisting of n=8 tasks, and has a total utilization of 85%. The seq tag is simply a sequence number; there are 10 scripts for each parameter combination.
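Because the parameters are encoded in the file name, they can be recovered with standard text tools, for instance when grouping results later. The following is a minimal sketch; the helper name workload_params is made up for illustration and relies only on the naming scheme described above.

```shell
# Hypothetical helper (not part of the artifact): print the m, n, u, and
# seq fields encoded in the file name of an experiment script, one
# "key=value" pair per line.
workload_params() {
    basename "$1" .sh | tr '_' '\n' | grep -E '^(m|n|u|seq)=[0-9]+$'
}

# Usage:
#   workload_params workloads/4/apa/apa-workload_m=04_n=08_u=85_seq=00.sh
# prints:
#   m=04
#   n=08
#   u=85
#   seq=00
```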
To launch an experiment, simply run the corresponding script from the root shell.
Each experiment script produces raw overhead sample files in the directory in which it is launched. We therefore first move to the (still empty) data/ directory.
cd /usr/local/litmus/ae-laminar-apa/data
../workloads/4/apa/apa-workload_m=04_n=08_u=85_seq=00.sh
Each experiment runs for 60 seconds (plus a few seconds for setup and teardown). While it is running, it should provide a progress indicator (dots appearing, one per second). When the experiment is done, the shell script will terminate.
For example, this is what it should look like after the experiment has completed:
root@rts44:/usr/local/litmus# cd /usr/local/litmus/ae-laminar-apa/data
root@rts44:/usr/local/litmus/ae-laminar-apa/data# ../workloads/4/apa/apa-workload_m=04_n=08_u=85_seq=00.sh
Running apa-workload_m=04_n=08_u=85_seq=00 under LSA-FP-MP for 60 seconds...
Setting processor 3 to be the dedicated scheduling core.
Waiting for 8 tasks to finish launching...
Launching overhead tracer.
Waiting for overhead tracer to finish launching...
Released 8 real-time tasks.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
All tasks finished.
Sent SIGUSR1 to stop tracers...
root@rts44:/usr/local/litmus/ae-laminar-apa/data#
The experiment script generated a bunch of data files that contain overhead samples that were collected with Feather-Trace, the overhead tracing framework built into LITMUS^RT. You should now see (at least) the following files in the data/ directory.
overheads_host=rts44_scheduler=LSA-FP-MP_trace=apa-workload_m=04_n=08_u=85_seq=00_cpu=0.bin
overheads_host=rts44_scheduler=LSA-FP-MP_trace=apa-workload_m=04_n=08_u=85_seq=00_cpu=1.bin
overheads_host=rts44_scheduler=LSA-FP-MP_trace=apa-workload_m=04_n=08_u=85_seq=00_cpu=2.bin
overheads_host=rts44_scheduler=LSA-FP-MP_trace=apa-workload_m=04_n=08_u=85_seq=00_cpu=3.bin
overheads_host=rts44_scheduler=LSA-FP-MP_trace=apa-workload_m=04_n=08_u=85_seq=00_msg=0.bin
overheads_host=rts44_scheduler=LSA-FP-MP_trace=apa-workload_m=04_n=08_u=85_seq=00_msg=1.bin
overheads_host=rts44_scheduler=LSA-FP-MP_trace=apa-workload_m=04_n=08_u=85_seq=00_msg=2.bin
overheads_host=rts44_scheduler=LSA-FP-MP_trace=apa-workload_m=04_n=08_u=85_seq=00_msg=3.bin
The script creates two files for each processor (one cpu= file and one msg= file); the total number of files created hence depends on the size of the experimental platform.
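Whether a run left behind the expected number of files can be checked as follows. This is a minimal sketch that assumes the file naming pattern shown in the listing; the helper name check_trace_files is made up for illustration.

```shell
# Hypothetical helper (not part of the artifact): succeed if directory $1
# holds exactly one cpu= and one msg= raw trace file per processor for
# scheduler $2 and trace name $3, given $4 processors.
check_trace_files() {
    dir=$1
    sched=$2
    trace=$3
    ncpus=$4
    prefix="_scheduler=${sched}_trace=${trace}"
    # grep -c exits non-zero on a count of 0, hence the "|| true".
    cpu=$(ls "$dir" | grep -c "${prefix}_cpu=[0-9][0-9]*\.bin\$" || true)
    msg=$(ls "$dir" | grep -c "${prefix}_msg=[0-9][0-9]*\.bin\$" || true)
    [ "$cpu" -eq "$ncpus" ] && [ "$msg" -eq "$ncpus" ]
}

# Usage (for the 4-core toy experiment under the APA scheduler):
#   check_trace_files . LSA-FP-MP apa-workload_m=04_n=08_u=85_seq=00 4
```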
The above example ran a workload under the new laminar APA scheduler. To get data for the baseline scheduler, run the same experiment again, but this time under the global fixed-priority (GFP) scheduler.
cd /usr/local/litmus/ae-laminar-apa/data
../workloads/4/gfp/apa-workload_m=04_n=08_u=85_seq=00.sh
Again, the output should look something like this:
root@rts44:/usr/local/litmus/ae-laminar-apa/data# ../workloads/4/gfp/apa-workload_m=04_n=08_u=85_seq=00.sh
Running apa-workload_m=04_n=08_u=85_seq=00 under G-FP-MP for 60 seconds...
Setting processor 3 to be the dedicated scheduling core.
Waiting for 8 tasks to finish launching...
Launching overhead tracer.
Waiting for overhead tracer to finish launching...
Released 8 real-time tasks.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
All tasks finished.
Sent SIGUSR1 to stop tracers...
You can now run as many experiments as you wish. Obviously, running all workloads will take many hours.
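To run a whole directory of experiments unattended, a simple loop suffices. The following is a minimal sketch; the helper name run_all_workloads is made up for illustration, and it assumes that the scripts are executable and are meant to be started from the data/ directory, as in the examples above.

```shell
# Hypothetical helper (not part of the artifact): run every experiment
# script in directory $1, one after the other, and stop at the first
# script that fails.
run_all_workloads() {
    for script in "$1"/*.sh; do
        [ -e "$script" ] || continue   # directory contains no scripts
        echo "=== $(basename "$script")"
        "$script" || return 1
    done
}

# Usage (from the root shell, inside the data/ directory):
#   run_all_workloads ../workloads/4/apa
#   run_all_workloads ../workloads/4/gfp
```

Running the scripts strictly one after the other matters here, since each experiment needs the machine to itself.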
Once a satisfactory amount of data has been collected, the overhead statistics reported in the paper can be obtained with the tools provided in feather-trace-tools/.
The following steps are based on the LITMUS^RT overhead tracing tutorial. The focus here is on documenting how to obtain the desired statistics, not on explaining why each step is necessary or what precisely it does. For a more in-depth explanation, please refer to the LITMUS^RT tracing tutorial.
In the first processing step, the raw trace files are cleaned up and prepared for further processing.
ft-sort-traces overheads_*.bin 2>&1 | tee -a overhead-processing.log
The files ending with the extension .bin are raw trace files in a kernel-defined binary format. Before extracting meaningful statistics, we need to extract the actual data samples.
ft-extract-samples overheads_*.bin 2>&1 | tee -a overhead-processing.log
At this point, we have many per-processor, per-task-count, per-utilization, etc. files. We are interested in aggregate overhead values across all tested scenarios. Hence we need to combine the individual sample files.
ft-combine-samples --std overheads_*.float32 2>&1 | tee -a overhead-processing.log
In general, the number of samples that were collected for the two schedulers will differ to some extent. To get the minimum number available of each type, which we require for the next step, we count the number of samples in all files.
ft-count-samples combined-overheads_*.float32 > counts.csv
To compare sampled maxima in an unbiased way, we need to use an equal number of samples from each population. Since in all likelihood we recorded a different number of samples for each scheduler, we randomize and truncate the data files.
ft-select-samples counts.csv combined-overheads_*.float32 2>&1 | tee -a overhead-processing.log
Finally, we can compute the statistics reported in the paper.
ft-compute-stats combined-overheads_*.sf32 > stats.csv
The above command reports overheads in terms of processor cycles. For human consumption, it is more convenient to report overheads in terms of microseconds. To obtain the overheads in microseconds, pass the option --cycles-per-usec to ft-compute-stats. The appropriate value can be obtained from /proc/cpuinfo with the following command:
grep 'cpu MHz' /proc/cpuinfo | uniq
For example, on our test machine (not the one used for the experiments reported in the paper), the output looks as follows.
bbb@rts44:/usr/local/litmus$ grep 'cpu MHz' /proc/cpuinfo | uniq
cpu MHz : 2200.080
On this particular processor, there are 2200.080 cycles per microsecond (a clock rate of 1 MHz corresponds to one cycle per microsecond). Hence we can compute the statistics in microseconds as follows.
ft-compute-stats --cycles-per-usec 2200.080 combined-overheads_*.sf32 > stats-us.csv
Having just run the two toy experiments mentioned in Step 10 above, the output looks as follows:
# Plugin, #cores, Overhead, Unit, #tasks, #samples, max, 99.9th perc., 99th perc., 95th perc., avg, med, min, std, var, file
G-FP-MP, 04, CLIENT-REQUEST, microseconds (scale = 1/2200.080000), *, 74673, 19.34430, 15.64589, 2.43958, 2.20492, 1.44456, 1.34995, 0.16272, 0.83952, 0.70479, combined-overheads_host=rts44_scheduler=G-FP-MP_trace=apa-workload_m=04_overhead=CLIENT-REQUEST_LATENCY.sf32
G-FP-MP, 04, CXS, microseconds (scale = 1/2200.080000), *, 74713, 21.48013, 3.50275, 1.59358, 1.51313, 1.16294, 1.19087, 0.20408, 0.44905, 0.20165, combined-overheads_host=rts44_scheduler=G-FP-MP_trace=apa-workload_m=04_overhead=CXS.sf32
G-FP-MP, 04, DSP-HANDLER, microseconds (scale = 1/2200.080000), *, 74754, 26.97538, 11.48239, 1.30995, 1.03087, 0.73659, 0.69270, 0.09272, 0.55134, 0.30398, combined-overheads_host=rts44_scheduler=G-FP-MP_trace=apa-workload_m=04_overhead=DSP-HANDLER.sf32
G-FP-MP, 04, RELEASE-LATENCY, microseconds (scale = 1/1000), *, 31554, 24.16400, 17.20824, 6.06674, 3.61900, 1.76023, 1.43000, 0.63200, 1.40904, 1.98534, combined-overheads_host=rts44_scheduler=G-FP-MP_trace=apa-workload_m=04_overhead=RELEASE-LATENCY.sf32
G-FP-MP, 04, RELEASE, microseconds (scale = 1/2200.080000), *, 31553, 28.52260, 13.56427, 4.46666, 3.48669, 1.68209, 1.59085, 0.44862, 1.03474, 1.07066, combined-overheads_host=rts44_scheduler=G-FP-MP_trace=apa-workload_m=04_overhead=RELEASE.sf32
G-FP-MP, 04, SCHED2, microseconds (scale = 1/2200.080000), *, 74715, 15.17581, 0.54998, 0.32999, 0.23999, 0.17463, 0.22363, 0.09545, 0.11122, 0.01237, combined-overheads_host=rts44_scheduler=G-FP-MP_trace=apa-workload_m=04_overhead=SCHED2.sf32
G-FP-MP, 04, SCHED, microseconds (scale = 1/2200.080000), *, 74715, 28.92122, 11.12775, 3.12509, 2.54354, 1.26885, 1.13632, 0.34271, 0.70886, 0.50247, combined-overheads_host=rts44_scheduler=G-FP-MP_trace=apa-workload_m=04_overhead=SCHED.sf32
G-FP-MP, 04, SEND-RESCHED, microseconds (scale = 1/2200.080000), *, 73766, 19.14249, 11.68789, 2.05356, 1.87311, 1.46852, 1.60312, 0.63543, 0.63861, 0.40782, combined-overheads_host=rts44_scheduler=G-FP-MP_trace=apa-workload_m=04_overhead=SEND-RESCHED.sf32
LSA-FP-MP, 04, CLIENT-REQUEST, microseconds (scale = 1/2200.080000), *, 74673, 16.72530, 9.88720, 1.43449, 1.23359, 1.13731, 1.09587, 0.85997, 0.46705, 0.21813, combined-overheads_host=rts44_scheduler=LSA-FP-MP_trace=apa-workload_m=04_overhead=CLIENT-REQUEST_LATENCY.sf32
LSA-FP-MP, 04, CXS, microseconds (scale = 1/2200.080000), *, 74713, 10.75552, 1.30645, 0.82133, 0.75725, 0.52628, 0.64998, 0.17863, 0.22476, 0.05051, combined-overheads_host=rts44_scheduler=LSA-FP-MP_trace=apa-workload_m=04_overhead=CXS.sf32
LSA-FP-MP, 04, DSP-HANDLER, microseconds (scale = 1/2200.080000), *, 74754, 31.82248, 5.14525, 3.49442, 3.14307, 2.53362, 2.42264, 0.09409, 0.45715, 0.20899, combined-overheads_host=rts44_scheduler=LSA-FP-MP_trace=apa-workload_m=04_overhead=DSP-HANDLER.sf32
LSA-FP-MP, 04, RELEASE-LATENCY, microseconds (scale = 1/1000), *, 31554, 18.40800, 11.55656, 2.41288, 1.72400, 0.95378, 0.70900, 0.62700, 0.67612, 0.45712, combined-overheads_host=rts44_scheduler=LSA-FP-MP_trace=apa-workload_m=04_overhead=RELEASE-LATENCY.sf32
LSA-FP-MP, 04, RELEASE, microseconds (scale = 1/2200.080000), *, 31553, 20.57107, 11.54750, 5.53980, 3.29715, 2.27388, 1.96993, 0.53589, 0.91940, 0.84526, combined-overheads_host=rts44_scheduler=LSA-FP-MP_trace=apa-workload_m=04_overhead=RELEASE.sf32
LSA-FP-MP, 04, SCHED2, microseconds (scale = 1/2200.080000), *, 74715, 16.82439, 0.30590, 0.22999, 0.17999, 0.12171, 0.10727, 0.09409, 0.07053, 0.00497, combined-overheads_host=rts44_scheduler=LSA-FP-MP_trace=apa-workload_m=04_overhead=SCHED2.sf32
LSA-FP-MP, 04, SCHED, microseconds (scale = 1/2200.080000), *, 74715, 17.67618, 2.13031, 1.82130, 1.03133, 0.54628, 0.49816, 0.24726, 0.31187, 0.09726, combined-overheads_host=rts44_scheduler=LSA-FP-MP_trace=apa-workload_m=04_overhead=SCHED.sf32
LSA-FP-MP, 04, SEND-RESCHED, microseconds (scale = 1/2200.080000), *, 73766, 17.93435, 9.49478, 1.61676, 0.97996, 0.90923, 0.88133, 0.61407, 0.41412, 0.17149, combined-overheads_host=rts44_scheduler=LSA-FP-MP_trace=apa-workload_m=04_overhead=SEND-RESCHED.sf32
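The cycles-to-microseconds scaling that appears in the Unit column (scale = 1/2200.080000) can be reproduced by hand. The following is a minimal sketch; the helper names cpu_mhz and cycles_to_usec are made up for illustration, and the sketch assumes that the reported 'cpu MHz' value reflects the clock rate at which the samples were recorded.

```shell
# Hypothetical helper (not part of the artifact): read /proc/cpuinfo-style
# text on stdin and print the "cpu MHz" value of the first matching entry.
cpu_mhz() {
    awk -F: '/^cpu MHz/ { gsub(/[ \t]/, "", $2); print $2; exit }'
}

# Hypothetical helper: convert $1 cycles to microseconds at $2 MHz
# (1 MHz corresponds to one cycle per microsecond).
cycles_to_usec() {
    awk -v c="$1" -v mhz="$2" 'BEGIN { printf "%.5f\n", c / mhz }'
}

# Usage:
#   cpu_mhz < /proc/cpuinfo            # e.g., prints 2200.080
#   cycles_to_usec 22000.8 2200.080    # prints 10.00000
```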
How to interpret this data: In Figure 1 of the paper, the bars labeled “DSP” correspond directly to the DSP-HANDLER overheads listed in the table. The bars labeled “DISPATCHER” correspond to the sum of the three overheads SCHED (scheduler invocation), SCHED2 (post-context-switch activities), and CXS (context switch). For technical reasons, the latter three overheads are measured separately. The other types of overhead were not reported in the paper.
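The DISPATCHER sum can be computed directly from the statistics file. The following is a minimal sketch that assumes the column layout of the table above (plugin in column 1, overhead type in column 3, avg in column 11, max in column 7); the helper name dispatcher_sum is made up for illustration, and which statistic to sum is left as a parameter because this document does not state which one the figure uses.

```shell
# Hypothetical helper (not part of the artifact): for each plugin in the
# stats CSV file $1, add up column $2 (default: 11, the avg column) over
# the SCHED, SCHED2, and CXS rows.
dispatcher_sum() {
    awk -F', *' -v col="${2:-11}" '
        /^#/ { next }                     # skip the header line
        $3 == "SCHED" || $3 == "SCHED2" || $3 == "CXS" { sum[$1] += $col }
        END { for (p in sum) printf "%s %.5f\n", p, sum[p] }
    ' "$1" | sort
}

# Usage:
#   dispatcher_sum stats-us.csv       # sum of the averages
#   dispatcher_sum stats-us.csv 7     # sum of the observed maxima
```

For the table above, the default invocation yields 2.60642 microseconds for G-FP-MP and 1.19427 microseconds for LSA-FP-MP. Note that a sum of per-type maxima is an upper bound on the combined overhead rather than an observed value.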
Final remarks: a close look at the above table reveals that it does not exhibit exactly the same trends as the data reported in the paper. This has multiple reasons.
First, with just two toy experiments, the number of samples is too small to draw any firm conclusions.
Second, with a workload for only four cores, the scalability bottlenecks that play a role in the 24-core configuration do not yet manifest.
Third, the data actually stems from a 44-core test machine, and not a 4-core machine (for which the toy experiments are designed). The global scheduler uses all cores and hence never preempts (due to the small number of tasks in the toy experiments). In contrast, the APA scheduler remains constrained by the specified affinities. Therefore, this is not actually a fair comparison; the maxcpus command line option would be required to ensure a level playing field.
This highlights two important points: (1) it is possible to validate the workflow on just about any Linux host with four or more cores, and (2) to replicate the actual reported numbers, a more or less identical machine is needed to run the experiments, and the experiments need to be run at full scale (i.e., all experiments over many hours, resulting in many gigabytes of data).
That said, when running the provided workloads on a 24-core machine (even if not exactly identical) for a reasonable number of task sets (at least a dozen), generally similar trends (the APA scheduler incurring higher, but still bearable overheads) should become apparent.