Cheng Li

PhD Student
Dependable Systems Group
Max Planck Institute for Software Systems
Department of Computer Science
Saarland University


Room 309
Building E1 5, Campus
66123 Saarbruecken, Germany

Email: chengli at mpi-sws dot org
Phone: +49 681 9303-8813
Fax: +49 681 9303-9199


♦ Building fast and consistent (geo-)replicated systems

Replication has been widely adopted to build highly scalable services, but this goal is often compromised by the coordination required to ensure application-specific properties such as state convergence and invariant preservation. In this paper, we propose a principled mechanism to minimize coordination in replicated systems via the following components: a) a notion of restriction over pairs of operations, which captures the fact that the two operations must be ordered w.r.t. each other in any partial order; b) a generic consistency model which, given a set of restrictions, requires those restrictions to be met in all admissible partial orders; c) principles for identifying a minimal set of restrictions to ensure the above properties; and d) a coordination service that dynamically maps restrictions to the most efficient coordination protocols. Our preliminary experience with example applications shows that we are able to determine a minimal coordination strategy.

- Minimizing Coordination in Replicated Systems [pdf]. Cheng Li, João Leitão, Allen Clement, Nuno Preguiça and Rodrigo Rodrigues, In Proceedings of the Workshop on Principles and Practice of Consistency for Distributed Data (PaPoC'15), Bordeaux, France
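The notion of restrictions over pairs of operations can be illustrated with a small sketch. This is a toy model, not the paper's actual analysis: it uses a bank-account state, checks a handful of sampled states instead of reasoning symbolically, and all function names are hypothetical. A pair of operations is conservatively flagged as needing a restriction when some order of applying them diverges the state or breaks the invariant.

```python
# Toy sketch of restriction identification: a pair of operations must be
# ordered (a "restriction") if the two application orders can disagree
# (no state convergence) or some interleaving breaks an invariant.
# All names here are illustrative, not from the paper's artifact.

def deposit(amount):
    return lambda balance: balance + amount

def withdraw(amount):
    return lambda balance: balance - amount

def invariant(balance):
    # Application-specific invariant: the balance never goes negative.
    return balance >= 0

def needs_restriction(op_a, op_b, sample_states):
    """Conservatively flag a pair of operations by testing, on each
    sampled state, whether both application orders agree and preserve
    the invariant."""
    for s in sample_states:
        ab = op_b(op_a(s))
        ba = op_a(op_b(s))
        if ab != ba:
            return True   # orders diverge: the pair must be serialized
        if not (invariant(ab) and invariant(ba)):
            return True   # combined effect can violate the invariant
    return False

states = [0, 50, 100]
print(needs_restriction(deposit(10), deposit(20), states))    # False
print(needs_restriction(withdraw(60), withdraw(60), states))  # True
```

Note the second pair: two withdrawals commute, yet together they can overdraw an account that covers each one individually, so invariant preservation alone forces the restriction.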

Recently, (geo-)replication has been widely adopted by online service providers (such as Google, Amazon, Yahoo!, and Microsoft) to offer their customers a good user experience in the form of low-latency access. Achieving this goal requires system builders to resolve a fundamental tension between consistency semantics and performance. Recent proposals, including our prior work on RedBlue consistency, suggest that strong consistency is an obstacle to scaling out a replicated system, since it requires costly coordination; avoiding coordination and taking advantage of eventual consistency are the main ways to make a replicated system fast. However, precisely determining the right consistency level for each operation is a substantial amount of non-trivial work, and without tool support, manual classification is error-prone and does not scale. For these reasons, we present SIEVE, a tool that relieves Java programmers of this decision process, allowing applications to automatically extract good performance when possible, while resorting to strong consistency whenever required by the target semantics. Taking as input a set of application-specific invariants and small annotations regarding merge semantics, SIEVE performs a combination of static and dynamic analysis, offline and at runtime. We evaluate SIEVE using two web applications and show that the automatic classification overhead is low, and the performance improvement enabled by replication is significant.

- Automating the Choice of Consistency Levels in Replicated Systems [pdf]. Cheng Li, João Leitão, Allen Clement, Nuno Preguiça, Rodrigo Rodrigues and Viktor Vafeiadis, In Proceedings of the 2014 USENIX Annual Technical Conference (USENIX ATC 2014), Philadelphia, PA, USA
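One way to picture the runtime side of such a classifier is a dispatch table produced by an assumed offline analysis: each operation maps either to a fixed consistency level or, when the safe level depends on the call's actual arguments, to a predicate evaluated at runtime. Everything below (operation names, the table, both execution paths) is a hypothetical illustration, not SIEVE's real interface.

```python
# Hypothetical sketch: dispatching operations to a weakly or strongly
# consistent execution path based on a classification computed offline.
# A classification entry is either a fixed level or a predicate over the
# operation's runtime arguments.

CLASSIFICATION = {
    "add_comment": "weak",                    # always safe to commute
    "delete_user": "strong",                  # may violate invariants
    # Level depends on the argument: a zero-amount withdrawal is a no-op.
    "withdraw": lambda amount: "weak" if amount == 0 else "strong",
}

def execute(op_name, *args):
    level = CLASSIFICATION[op_name]
    if callable(level):
        level = level(*args)   # decide using the actual arguments
    if level == "weak":
        return f"{op_name}: applied locally, replicated asynchronously"
    return f"{op_name}: routed through coordination"

print(execute("add_comment", "nice post"))
print(execute("withdraw", 100))
```

The point of the argument-dependent entry is that a single source-level operation need not receive one blanket classification; the cheap path can still be taken for the invocations that are provably harmless.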

Geo-replication, i.e., replicating data at geographically dispersed data centers, is a major technique that popular online services use to scale to their growing user bases. With this technique, users contact a nearby data center to get fast responses, while coordination among all data centers is required to keep data globally consistent. Due to the high cost of exchanging messages over a wide-area network, however, offering low latency and maintaining strong consistency cannot be achieved simultaneously. As a result, many systems resort to weak consistency models (such as eventual consistency), in which each data center processes requests independently and resolves conflicts much later, avoiding immediate coordination for each request. In this project, we propose a solution that allows multiple consistency levels to coexist in one system.

- Making Geo-Replicated Systems Fast as Possible, Consistent when Necessary [pdf]. Cheng Li, Daniel Porto, Allen Clement, Johannes Gehrke, Nuno Preguiça, and Rodrigo Rodrigues, In Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2012), Hollywood, CA, USA
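The multi-level idea can be sketched with a tiny two-replica simulation in the spirit of RedBlue consistency: commutative ("blue") operations are applied at the local site without coordination, while conflicting ("red") operations pass through a single total order. The replica model below is deliberately simplified (blue propagation is modeled as immediate delivery, and a shared list stands in for the coordination layer), so treat it as an illustration of the routing decision only.

```python
# Toy two-replica model: blue ops run locally and are shipped to peers
# without ordering guarantees; red ops are funneled through one total
# order so every replica applies them in the same sequence.

class Replica:
    def __init__(self):
        self.balance = 0

    def apply(self, op, amount):
        if op == "deposit":
            self.balance += amount
        elif op == "withdraw":
            self.balance -= amount

replicas = [Replica(), Replica()]
red_order = []   # stand-in for the coordination layer's total order

def submit(site, op, amount):
    if op == "deposit":                    # blue: no coordination needed
        replicas[site].apply(op, amount)
        for i, r in enumerate(replicas):
            if i != site:
                r.apply(op, amount)        # modeled as eventual delivery
    else:                                  # red: totally ordered
        red_order.append((op, amount))
        for r in replicas:
            r.apply(op, amount)

submit(0, "deposit", 100)   # issued at site 0, no coordination
submit(1, "deposit", 50)    # issued at site 1, no coordination
submit(0, "withdraw", 30)   # red: serialized across both sites
print(replicas[0].balance, replicas[1].balance)  # both converge to 120
```

Because deposits commute, delivering them in different orders at each site still yields the same balance; only the withdrawal, which can interact badly with concurrent withdrawals, pays the coordination cost.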

♦ Finding and recovering from concurrency bugs

When analyzing reports from the bug database of MySQL, a highly concurrent application, we made an interesting observation: some concurrency bugs, when triggered, first silently corrupt internal state and only become visible to users much later. We call this class of bugs latent bugs. Detecting latent bugs is challenging, but detection also creates an opportunity to recover from them before their effects reach users. We are currently building a novel tool to detect latent bugs. Our main idea is to check whether concurrent executions are linearizable by comparing both the outputs and the internal states of concurrent executions against those of sequential executions. Once the detector is built, we will use it at run time with the goal of enabling transparent recovery from latent concurrency bugs.

- Finding Complex Concurrency Bugs in Large Multi-Threaded Applications [pdf]. Pedro Fonseca, Cheng Li, and Rodrigo Rodrigues, In Proceedings of the 6th European Conference on Computer Systems (EuroSys 2011), Salzburg, Austria

- A Study of the Internal and External Effects of Concurrency Bugs [pdf]. Pedro Fonseca, Cheng Li, Vishal Singhal and Rodrigo Rodrigues, In Proceedings of the 40th IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2010), Chicago, USA
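The linearizability-based check described above can be sketched as follows. The idea: replay the same operations in every sequential order and compare both the external outputs and the final internal state against the concurrent run; when the outputs look sequential but no sequential order reproduces the internal state, the corruption is latent. The "racy" execution below is hand-constructed for illustration (a real detector would observe it from an actual interleaved run), and all names are hypothetical.

```python
# Sketch of latent-bug detection: a concurrent execution whose outputs
# match some sequential order, but whose internal state matches none,
# has silently corrupted state that users have not yet observed.

from itertools import permutations

def run_sequential(ops, state_factory):
    state = state_factory()
    outputs = [op(state) for op in ops]
    return outputs, state

def check_latent(concurrent_outputs, concurrent_state, ops, state_factory):
    for order in permutations(ops):
        seq_out, seq_state = run_sequential(order, state_factory)
        if seq_out == concurrent_outputs and seq_state == concurrent_state:
            return "linearizable"
    for order in permutations(ops):
        seq_out, _ = run_sequential(order, state_factory)
        if seq_out == concurrent_outputs:
            return "latent bug: outputs look sequential, state does not"
    return "visible bug: outputs match no sequential order"

def append(x):
    def op(state):
        state["items"].append(x)
        return "ok"        # the caller-visible output hides the state
    return op

ops = [append("a"), append("b")]
fresh = lambda: {"items": []}

# Pretend a race dropped one append: both callers saw "ok", but the
# internal list lost "b".
racy_outputs = ["ok", "ok"]
racy_state = {"items": ["a"]}
print(check_latent(racy_outputs, racy_state, ops, fresh))
```

Enumerating all permutations is exponential, so this is only the specification of the check; making it affordable on a large application such as MySQL is exactly where the engineering effort lies.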