This guide explains how to reproduce the schedulability experiments discussed in Sections V and VI of the paper:
B. Brandenburg and M. Gül, “Global Scheduling Not Required: Simple, Near-Optimal Multiprocessor Real-Time Scheduling with Semi-Partitioned Reservations”, Proceedings of the 37th IEEE Real-Time Systems Symposium (RTSS 2016), to appear, December 2016.
See also: Part 2 of the Artifact Evaluation instructions.
Please follow the instructions detailed in each section. In case of problems or further questions, feel free to contact us by email.
The schedulability experiments presented in the paper have been implemented using the SchedCAT library. SchedCAT supports Linux and Mac OS X environments. The use of Linux is recommended and assumed throughout the rest of these instructions.
To compile and run the experiments, the following standard packages are required:
- Python 2.7 (including development headers)
- NumPy and SciPy
- make
- swig
- g++
- libgmp (including development headers)

On a Debian-based Linux distribution, these packages can be easily installed with the following command:
# apt-get install python2.7 python-dev python-numpy python-scipy make swig g++ libgmp3-dev
The above command and all following instructions have been successfully tested on the latest release of Ubuntu Linux (16.04).
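Optionally, the Python-side prerequisites can be verified with a quick check like the following. This snippet is merely a convenience and not part of the artifact:

```python
# Optional sanity check (not part of the artifact): verify that the
# required Python modules and build tools are available.
import os
import subprocess

for mod in ("numpy", "scipy"):
    try:
        __import__(mod)
        print("%s: OK" % mod)
    except ImportError:
        print("%s: MISSING" % mod)

devnull = open(os.devnull, "w")
for tool in ("make", "swig", "g++"):
    rc = subprocess.call(["which", tool], stdout=devnull)
    print("%s: %s" % (tool, "OK" if rc == 0 else "MISSING"))
```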
For ease of use, the schedulability experiments are packaged as a single TGZ archive that contains the underlying SchedCAT library, the actual experiments, and all evaluated configurations.
To download and extract the archive, execute the following instructions in some work directory (e.g., $HOME or /tmp).
$ wget http://www.mpi-sws.org/~bbb/papers/ae/rtss16/sp-res-schedulability.tgz
$ tar xzf sp-res-schedulability.tgz
The preceding command should have created a folder named rtss16-sched-experiments/, which contains the Python and C++ code that comprise SchedCAT and the experiments. Change the current working directory to that folder for the remainder of these instructions.
$ cd rtss16-sched-experiments/
Before the experiments can be run, it is necessary to compile SchedCAT. If the above-mentioned packages have been installed, this should be trivial using the included Makefile.
To compile SchedCAT, move into the SchedCAT folder, compile the source code, and move back to the root folder:
$ cd lib/schedcat
$ make
$ cd ../../
At this point, the schedulability experiments can be executed.
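As an optional sanity check, you can confirm that the native code was actually built. The sketch below assumes that a successful build leaves SWIG-generated shared objects (*.so files) somewhere under lib/schedcat; if that assumption does not hold for your setup, simply rely on the troubleshooting hints further below.

```python
# Optional sanity check (not part of the artifact): after a successful
# build, SWIG-generated shared objects (*.so) are assumed to exist
# under lib/schedcat. If none are found, revisit the compilation step.
import fnmatch
import os

found = []
for root, dirs, files in os.walk("lib/schedcat"):
    found.extend(os.path.join(root, f) for f in fnmatch.filter(files, "*.so"))

print("\n".join(found) if found else "no .so files found -- build likely failed")
```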
Before proceeding with the next steps, it is necessary to introduce the notions of an experiment type and a configuration.
An experiment type is a hard-coded procedure for setting up and carrying out a particular type of empirical experiment. The directory confs/rtss16b contains a directory for each experiment type. In our paper, we ran six different schedulability experiment types. Correspondingly, there are the following six directories in confs/rtss16b:
$ ls confs/rtss16b
emstada emstada-chunk mp unc unc-chunk unc-tcount
These experiment types correspond to the figures in the paper as follows:
- The emstada experiment type, which uses the Emberson, Stafford, and Davis (2010) task-set generator [1], corresponds to Figures 3 and 4.
- The emstada-chunk experiment type, which uses the same task-set generator but imposes a varying minimum slice size during semi-partitioning, corresponds to Figure 5.
- The mp experiment type, which also uses the Emberson et al. (2010) generator, empirically measures an upper bound on preemptions and corresponds to Figure 6.
- The unc experiment type uses a setup similar to the emstada experiments, but uses the “UNC style” of task-set generation [2]. Due to space constraints, results from these experiments are not reported in detail in the paper.
- The unc-chunk experiment type matches the emstada-chunk experiments, using the “UNC style” generators instead of the Emberson et al. (2010) generator. Due to space constraints, results from these experiments are not reported in detail in the paper.
- The unc-tcount experiment type uses “UNC style” generators, but varies the task count (n) instead of the total utilization (U) as the main parameter. Due to space constraints, results from these experiments are not reported in detail in the paper.
The specific functions implementing these experimental setups can be found in the file exp/rtss16b.py, as defined by the EXPERIMENTS lookup table. However, a detailed understanding of the code is not required to reproduce the experiments; a discussion of the actual implementation is hence beyond the scope of these instructions.
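That said, if you are curious, the registered experiment types can be listed from a Python shell. The sketch below assumes that EXPERIMENTS is a dict-like mapping from experiment names to setup functions; if the import fails outside of the exp module's normal entry point, this listing is purely optional and can be skipped.

```python
# Illustrative only: list the experiment types registered in the
# EXPERIMENTS lookup table (assumed here to be a dict-like mapping).
from exp import rtss16b

for name in sorted(rtss16b.EXPERIMENTS):
    print(name)
```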
Each of the aforementioned experiment types can be configured with a number of parameters (e.g., the number of processors, the number of tasks, etc.).
A configuration represents one set of parameter choices and defines the experiment to be performed (e.g., varying the total utilization using the emstada setup for m=4 processors and n=8 tasks).
Each of the directories corresponding to the six experiment types contains .conf files for all parameter combinations considered in the experimental study discussed in the paper.
The name of each .conf file contains key-value pairs that encode the values of the most important parameters.
For example, the file confs/rtss16b/emstada/sd_exp=rtss16b-emstada_dist=hyper1000_m=08_n=20.conf contains the configuration that runs the emstada experiment for m=8 processors, n=20 tasks, and a period distribution that ensures a hyperperiod of at most 1000ms.
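As an aside, this naming convention is easy to process programmatically. The following helper is purely illustrative (params_from_name is not part of the artifact) and simply splits a file name into its key=value components:

```python
import os

def params_from_name(path):
    """Parse the key=value pairs encoded in a .conf file name.

    Purely illustrative helper (not part of the artifact). It assumes
    that keys and values themselves contain no underscores; fragments
    without an '=' (such as the leading 'sd') are dropped.
    """
    stem = os.path.splitext(os.path.basename(path))[0]
    return dict(kv.split("=", 1) for kv in stem.split("_") if "=" in kv)

print(params_from_name(
    "confs/rtss16b/emstada/sd_exp=rtss16b-emstada_dist=hyper1000_m=08_n=20.conf"))
# -> {'exp': 'rtss16b-emstada', 'dist': 'hyper1000', 'm': '08', 'n': '20'}
```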
The experiments are run by passing configuration files to the exp Python module, which then calls the appropriate setup functions, executes the experiments, and writes the results to disk.
For example, the following command allows launching the schedulability experiment for a specific configuration:
$ python -m exp ./confs/rtss16b/<experiment type>/<configuration file>
If the command is run again, it will skip the configuration because a corresponding output file already exists. To re-execute the experiment anyway, pass the -f (force) flag:
$ python -m exp -f ./confs/rtss16b/<experiment type>/<configuration file>
Each experiment will spawn multiple compute threads. While the experiment is running, a considerable amount of debug and progress output will be produced—this information can be safely ignored.
If you get an error message similar to python2.7: No module named native; 'exp' is a package and cannot be directly executed, then SchedCAT was not compiled correctly.
If you get an error message similar to python2.7: No module named scipy.stats; 'exp' is a package and cannot be directly executed, then you are missing the SciPy library.
On Mac OS X, when using the Homebrew package manager (or possibly some other package manager), it is possible to observe a crash with the following message: Fatal Python error: PyThreadState_Get: no current thread, followed by Abort trap: 6. In this case, SchedCAT was linked against the default system Python (/usr/bin/python), but launched with a different version provided by the package manager (e.g., /usr/local/bin/python).
This can be resolved by configuring the SchedCAT Makefile appropriately, but we recommend simply using Ubuntu Linux for this evaluation.
To run all experiments of a given type, you can simply pass the entire directory with the -d flag, which will execute all configurations in the given directory. However, before doing so, please take into consideration the note on expected runtimes below.
$ python -m exp -d ./confs/rtss16b/<experiment type>/
While each individual task set typically requires only a few seconds of processor time, each sampling point reflects over 1000 task sets, and each configuration involves dozens (or even hundreds) of sampling points. In total, the full scope of the experiments (six experiment types, a large range of processor and task counts, etc.) involves over 1000 configurations. This means that serious compute power is required to complete all experiments.
For context, we required over a week on a large Xeon cluster (8+ nodes with 64 cores each) to complete all experiments.
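To make the scale concrete, here is a rough back-of-the-envelope estimate. The specific numbers below (sampling points per configuration, seconds per task set) are illustrative assumptions, not measurements:

```python
# Rough CPU-time estimate with illustrative (assumed) numbers:
configs = 1000             # "over 1000 configurations"
points_per_config = 50     # "dozens (or even hundreds) of sampling points"
tasksets_per_point = 1152  # the default 'samples' value (see below)
secs_per_taskset = 2.0     # "a few seconds of processor time"

cpu_hours = (configs * points_per_config * tasksets_per_point
             * secs_per_taskset) / 3600.0
print("approx. %d CPU-hours" % cpu_hours)  # on the order of 30,000 CPU-hours
```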
Rerunning all experiments on a single computer is thus practically infeasible. We therefore suggest two ways to limit the replication effort to a reasonable timeframe: reducing the number of samples, and focusing on a few representative configurations most relevant to the paper, as discussed next.
By reducing the number of generated task sets, the overall runtime can be significantly reduced, albeit at the expense of increased sampling noise. There are two ways to do so.
First, each configuration specifies the number of generated and evaluated task sets per sampling point with the samples parameter. By default, this parameter is set to 1152.
$ grep samples confs/rtss16b/emstada/sd_exp=rtss16b-emstada_dist=hyper1000_m=02_n=03.conf
samples = 1152
By editing a configuration file, the experiments can be changed to evaluate any number of samples.
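For example, the following sketch bulk-edits the samples parameter across all emstada configurations. It is not part of the artifact and assumes the simple "key = value" line format shown by the grep output above; consider backing up confs/ before running it.

```python
# Illustrative bulk edit (not part of the artifact): set 'samples = 100'
# in all emstada configurations. Assumes the "key = value" line format
# shown above; back up confs/ before running this.
import fileinput
import glob
import re

for conf in glob.glob("confs/rtss16b/emstada/*.conf"):
    # With inplace=True, everything printed replaces the file's contents.
    for line in fileinput.input(conf, inplace=True):
        print(re.sub(r"^samples\s*=\s*\d+", "samples = 100",
                     line.rstrip("\n")))
```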
Second, the number of desired samples can also be overridden when launching the exp module by specifying the --samples switch. For example, to limit a given configuration to 250 samples, execute the following command.
$ python -m exp --samples 250 ./confs/rtss16b/<experiment type>/<configuration file>
The number of samples can be set to any positive value. For example, try 100 samples, which should speed up the experiments somewhat.
To limit the scope to a few experiments, we suggest running (primarily) the configurations corresponding to Figure 4 in the paper, which demonstrates the stability of the observed trends for different parameter choices.
These configurations are:
confs/rtss16b/emstada/sd_exp=rtss16b-emstada_dist=hyper1000_m=04_n=08.conf
confs/rtss16b/emstada/sd_exp=rtss16b-emstada_dist=hyper1000_m=16_n=32.conf
confs/rtss16b/emstada/sd_exp=rtss16b-emstada_dist=hyper1000_m=24_n=48.conf
For diversity, we further suggest running a few configurations corresponding to one inset each of Figures 5 and 6.
Figure 5(a):
confs/rtss16b/emstada-chunk/sd_exp=rtss16b-emstada-chunk_dist=hyper1000_m=08_chunk=0100_n=10.conf
confs/rtss16b/emstada-chunk/sd_exp=rtss16b-emstada-chunk_dist=hyper1000_m=08_chunk=0400_n=10.conf
confs/rtss16b/emstada-chunk/sd_exp=rtss16b-emstada-chunk_dist=hyper1000_m=08_chunk=1000_n=10.conf
Figure 6(b):
confs/rtss16b/mp/sd_exp=rtss16b-mp_dist=hyper1000_m=08_n=16.conf
To run these experiments with reduced resolution (100 task sets per point), execute the following command:
$ python -m exp --samples 100 \
confs/rtss16b/emstada/sd_exp=rtss16b-emstada_dist=hyper1000_m=04_n=08.conf \
confs/rtss16b/emstada/sd_exp=rtss16b-emstada_dist=hyper1000_m=16_n=32.conf \
confs/rtss16b/emstada/sd_exp=rtss16b-emstada_dist=hyper1000_m=24_n=48.conf \
confs/rtss16b/emstada-chunk/sd_exp=rtss16b-emstada-chunk_dist=hyper1000_m=08_chunk=0100_n=10.conf \
confs/rtss16b/emstada-chunk/sd_exp=rtss16b-emstada-chunk_dist=hyper1000_m=08_chunk=1000_n=10.conf \
confs/rtss16b/emstada-chunk/sd_exp=rtss16b-emstada-chunk_dist=hyper1000_m=08_chunk=0400_n=10.conf \
confs/rtss16b/mp/sd_exp=rtss16b-mp_dist=hyper1000_m=08_n=16.conf
Using an Ubuntu 16.04 VM with four virtual CPUs running on a 2012 MacBook Pro (2.2 GHz Intel Core i7), these experiments required about 90 minutes to complete (with a reduced target sample count, i.e., with --samples 100).
When a configuration has completed running, the exp module writes the resulting schedulability data into a CSV file. The name of the target CSV file is given in each configuration.
For simplicity, with the provided configurations, all data is written to the output/ subdirectory, which follows the same structure and naming convention as the confs/ directory.
For example, after running the configuration confs/rtss16b/emstada/sd_exp=rtss16b-emstada_dist=hyper1000_m=04_n=08.conf, there should be a corresponding output file in the folder output/rtss16b/emstada/.
$ ls output/rtss16b/emstada/
sd_exp=rtss16b-emstada_dist=hyper1000_m=04_n=08.csv
The CSV files produced by the emstada, emstada-chunk, unc, and unc-chunk experiment types contain a simple utilization-vs-schedulability table and are structured as follows:
################################ ENVIRONMENT #################################
[... environment data omitted ...]
#################################### DATA ####################################
# UTIL R-UTIL PART SEMI SEMI/PAF SEMI/RP ANY PFAIR-1000 PFAIR-500 PFAIR-100 QPS
1, 25.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000
2.50000, 62.50000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000
3.25000, 81.25000, 0.99000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000
3.34500, 83.62500, 0.95000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000
3.43500, 85.87500, 0.94000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000
3.53000, 88.25000, 0.92000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000
[... many rows omitted ...]
3.95500, 98.87500, 0.04000, 0.68000, 1.00000, 1.00000, 1.00000, 0.01000, 0.07000, 1.00000, 1.00000
3.96000, 99.00000, 0.00000, 0.77000, 1.00000, 1.00000, 1.00000, 0.01000, 0.04000, 0.98000, 1.00000
3.96500, 99.12500, 0.00000, 0.60000, 1.00000, 1.00000, 1.00000, 0.01000, 0.07000, 0.96000, 1.00000
3.97000, 99.25000, 0.00000, 0.66000, 1.00000, 1.00000, 1.00000, 0.00000, 0.03000, 0.96000, 1.00000
3.97500, 99.37500, 0.00000, 0.57000, 0.99000, 1.00000, 1.00000, 0.00000, 0.01000, 0.81000, 1.00000
3.98000, 99.50000, 0.00000, 0.69000, 1.00000, 0.99000, 1.00000, 0.00000, 0.00000, 0.57000, 1.00000
3.98500, 99.62500, 0.00000, 0.64000, 0.98000, 0.99000, 1.00000, 0.00000, 0.01000, 0.30000, 1.00000
3.99000, 99.75000, 0.00000, 0.54000, 0.97000, 0.99000, 1.00000, 0.00000, 0.00000, 0.09000, 1.00000
3.99500, 99.87500, 0.00000, 0.52000, 0.94000, 0.96000, 0.99000, 0.00000, 0.00000, 0.01000, 1.00000
4, 100.00000, 0.00000, 0.11000, 0.37000, 0.27000, 0.42000, 0.00000, 0.00000, 0.00000, 1.00000
############################### CONFIGURATION ################################
[... configuration data omitted ...]
The meaning of the columns is as follows:
Column | Description |
---|---|
UTIL | simply the total system utilization |
R-UTIL | relative system utilization (percentage, from 0% to 100%) |
PART | schedulability w/ partitioning only — corresponds to Figure 3(a) |
SEMI | schedulability w/ basic semi-partitioning — corresponds to Figure 3(b) |
SEMI/PAF | schedulability w/ semi-partitioning with the pre-assign failures meta-heuristic |
SEMI/RP | schedulability w/ semi-partitioning with the reduce-periods meta-heuristic |
ANY | schedulability w/ semi-partitioning using all heuristics and meta-heuristics — corresponds to Figure 3(c) |
PFAIR-1000 | schedulability w/ PD2, assuming a quantum size of 1000µs |
PFAIR-500 | schedulability w/ PD2, assuming a quantum size of 500µs |
PFAIR-100 | schedulability w/ PD2, assuming a quantum size of 100µs |
QPS | schedulability w/ QPS (optimal) |
The mp experiment type further splits each non-utilization column into five sub-columns:
Column | Description |
---|---|
X (AVG/core) | Average number of context switches per core per second under policy X |
X (AVG) | Average number of context switches per second under policy X |
X (MED) | Median number of context switches per core per second under policy X |
X (STD) | Standard deviation of the number of context switches per second under policy X |
X (MAX) | Maximum observed number of context switches per second under policy X |
Note that “number of context switches” should be interpreted as an analytical bound on the maximum number of context switches, and not as an actual number of context switches from simulation or observation.
We recommend loading the produced CSV files into OpenOffice Calc, Microsoft Excel, Apple Numbers, or some other spreadsheet-type application that makes it easy to visualize columnar data. By plotting the appropriate columns (as indicated above), graphs similar to those shown in the paper should become apparent.
Caveat: if the number of samples is reduced, the data will be quite noisy. Nonetheless, the general shape of the curves should still match those shown in the paper.
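Alternatively, if you prefer scripting, the following matplotlib sketch plots two of the columns. It is illustrative only (not part of the artifact) and assumes that matplotlib is installed and that every non-data line in the CSV begins with a '#', as in the excerpt above; adjust the parsing if the environment/configuration sections are formatted differently.

```python
# Illustrative plotting script (not part of the artifact). Assumes
# matplotlib is installed and that all non-data lines in the CSV start
# with '#', as in the excerpt shown above.
import matplotlib.pyplot as plt

rows = []
with open("output/rtss16b/emstada/"
          "sd_exp=rtss16b-emstada_dist=hyper1000_m=04_n=08.csv") as f:
    for line in f:
        line = line.strip()
        if line and not line.startswith("#"):
            rows.append([float(x) for x in line.split(",")])

util = [r[0] for r in rows]                          # UTIL column
plt.plot(util, [r[2] for r in rows], label="PART")   # cf. Figure 3(a)
plt.plot(util, [r[6] for r in rows], label="ANY")    # cf. Figure 3(c)
plt.xlabel("total utilization")
plt.ylabel("fraction of schedulable task sets")
plt.legend()
plt.show()
```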
This concludes the instructions for reproducing the experiments reported in Sections V and VI of our RTSS’16 paper. Should you face any problems or have any questions, please feel free to contact us by email.
[1] For details on the emstada task-set generation method, see P. Emberson, R. Stafford, and R. Davis, “Techniques for the synthesis of multiprocessor tasksets,” in WATERS’10.
[2] For details on the “UNC style” task-set generation method, see e.g. B. Brandenburg, J. Calandrino, and J. Anderson, “On the scalability of real-time scheduling algorithms on multicore platforms,” in RTSS’08; and B. Brandenburg, “Scheduling and locking in multiprocessor real-time operating systems,” Ph.D. dissertation, The University of North Carolina at Chapel Hill, 2011.