I am a Ph.D. student at Humboldt-Universität zu Berlin and the Max Planck Institute for Software Systems (MPI-SWS) in the Social Computing Research group, advised by Ulf Leser, Carlos Castillo, and Krishna Gummadi. I was a visiting researcher at the WSSC group with Carlos Castillo at UPF Barcelona, Spain, in 2018, and at the VIDA lab with Julia Stoyanovich at New York University, USA, in 2019. I completed my Diploma degree in Computer Science at the Technische Universität Dresden with Nico Hoffmann and Uwe Petersohn as my advisors, where I developed a machine learning algorithm to recognize vascular pathologies in thermographic images of the brain. I studied Computer Science at MIIT (МИИТ) in Moscow, Russia, in 2009/2010, and at INSA in Lyon, France, in 2010.
My research interests center on artificial intelligence and its societal impact, in particular algorithmic discrimination, fairness, and algorithmic exploitation.
For a full list of publications, please see my Google Scholar or DBLP profile.
Matching code and law: achieving algorithmic fairness with optimal transport.
- Data Mining and Knowledge Discovery. Springer. Volume 34, Issue 1, January 2020.
Abstract: Increasingly, discrimination by algorithms is perceived as a societal and legal problem. As a response, a number of criteria for implementing algorithmic fairness in machine learning have been developed in the literature. This paper proposes the continuous fairness algorithm (CFAθ) which enables a continuous interpolation between different fairness definitions. More specifically, we make three main contributions to the existing literature. First, our approach allows the decision maker to continuously vary between specific concepts of individual and group fairness. As a consequence, the algorithm enables the decision maker to adopt intermediate “worldviews” on the degree of discrimination encoded in algorithmic processes, adding nuance to the extreme cases of “we’re all equal” and “what you see is what you get” proposed so far in the literature. Second, we use optimal transport theory, and specifically the concept of the barycenter, to maximize decision maker utility under the chosen fairness constraints. Third, the algorithm is able to handle cases of intersectionality, i.e., of multi-dimensional discrimination of certain groups on grounds of several criteria. We discuss three main examples (credit applications; college admissions; insurance contracts) and map out the legal and policy implications of our approach. The explicit formalization of the trade-off between individual and group fairness allows this post-processing approach to be tailored to different situational contexts in which one or the other fairness criterion may take precedence. Finally, we evaluate our model experimentally.
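The interpolation idea can be sketched in one dimension, where the Wasserstein barycenter of the group score distributions reduces to averaging their quantile functions. This is an illustrative sketch, not the paper's implementation: the function name, the uniform group weighting, and the quantile grid are all assumptions.

```python
import numpy as np

def cfa_theta(scores_by_group, theta, n_quantiles=100):
    """Illustrative 1-D sketch of CFA-theta-style score repair.

    In one dimension, the Wasserstein barycenter of the group score
    distributions is obtained by averaging their quantile functions.
    theta = 0 keeps the raw scores; theta = 1 maps every group fully
    onto the barycenter; intermediate values interpolate between the
    two "worldviews".
    """
    qs = np.linspace(0.0, 1.0, n_quantiles)
    # quantile function of each group's score distribution
    group_quantiles = {g: np.quantile(s, qs) for g, s in scores_by_group.items()}
    # barycenter quantile function = (here: unweighted) mean of group quantiles
    barycenter = np.mean(list(group_quantiles.values()), axis=0)
    repaired = {}
    for g, s in scores_by_group.items():
        # rank of each score within its own group, normalized to [0, 1]
        ranks = np.argsort(np.argsort(s)) / max(len(s) - 1, 1)
        # image of each score under the transport map onto the barycenter
        target = np.interp(ranks, qs, barycenter)
        repaired[g] = (1 - theta) * np.asarray(s, dtype=float) + theta * target
    return repaired
```

With theta = 1 the two group distributions coincide after repair; with theta = 0 the scores are returned unchanged.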
Reducing disparate exposure in ranking: A learning to rank approach
- Proceedings of The Web Conference 2020 (WWW'20). Taipei, Taiwan. April, 2020.
Abstract: Ranked search results have become the main mechanism by which we find content, products, places, and people online. Thus their ordering contributes not only to the satisfaction of the searcher, but also to career and business opportunities, educational placement, and even social success of those being ranked. Researchers have become increasingly concerned with systematic biases in data-driven ranking models, and various post-processing methods have been proposed to mitigate discrimination and inequality of opportunity. This approach, however, has the disadvantage that it still allows an unfair ranking model to be trained. In this paper we explore a new in-processing approach: DELTR, a learning-to-rank framework that addresses potential issues of discrimination and unequal opportunity in rankings at training time. We measure these problems in terms of discrepancies in the average group exposure and design a ranker that optimizes search results in terms of relevance and in terms of reducing such discrepancies. We perform an extensive experimental study showing that being “colorblind” can be among the best or the worst choices from the perspective of relevance and exposure, depending on how much and which kind of bias is present in the training set. We show that our in-processing method performs better in terms of relevance and exposure than a pre-processing and a post-processing method across all tested scenarios.
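The exposure discrepancy the abstract refers to can be illustrated with a small sketch. The logarithmic position-bias model and the function names are assumptions for illustration, not DELTR's exact formulation:

```python
import math

def group_exposure(ranking, group):
    """Average position-based exposure of `group` members in `ranking`.

    Uses a common logarithmic position-bias model v(j) = 1 / log2(j + 1),
    where j is the 1-based rank: items near the top receive most of the
    attention, and exposure decays quickly further down the list.
    """
    exposures = [1.0 / math.log2(j + 1)
                 for j, item in enumerate(ranking, start=1)
                 if item in group]
    return sum(exposures) / len(exposures) if exposures else 0.0

def exposure_gap(ranking, group_a, group_b):
    """Discrepancy in average exposure between two groups; a ranker in
    the spirit of DELTR would trade relevance against shrinking this gap."""
    return group_exposure(ranking, group_a) - group_exposure(ranking, group_b)
```

For a ranking that strictly alternates but always places one group first, the gap is positive for that group and flips sign when the ranking is reversed.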
FairSearch: A Tool For Fairness in Ranked Search Results
- Companion Proceedings of the Web Conference 2020 (WWW'20), Taipei, Taiwan. April 2020.
Abstract: Ranked search results and recommendations have become the main mechanism by which we find content, products, places, and people online. With hiring, selecting, purchasing, and dating being increasingly mediated by algorithms, rankings may determine business opportunities, education, access to benefits, and even social success. It is therefore of societal and ethical importance to ask whether search results can demote, marginalize, or exclude individuals of unprivileged groups or promote products with undesired features. In this paper we present FairSearch, the first fair open source search API to provide fairness notions in ranked search results. We implement two well-known algorithms from the literature, namely FA*IR (Zehlike et al., 2017) and DELTR (Zehlike and Castillo, 2018), and provide them as stand-alone libraries in Python and Java. Additionally, we implement interfaces to Elasticsearch, a well-known search engine based on Apache Lucene, for both algorithms. The interfaces use the aforementioned Java libraries and enable search engine developers who wish to ensure fair search results of different styles to easily integrate DELTR and FA*IR into their existing Elasticsearch environment.
Two-sided fairness for repeated matchings in two-sided markets: A case study of a ride-hailing platform
- Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD'19). Anchorage, Alaska, USA. August, 2019.
Abstract: Ride hailing platforms, such as Uber, Lyft, Ola or DiDi, have traditionally focused on the satisfaction of the passengers, or on boosting successful business transactions. However, recent studies provide a multitude of reasons to worry about the drivers in the ride hailing ecosystem. The concerns range from bad working conditions and worker manipulation to discrimination against minorities. With the sharing economy ecosystem growing, more and more drivers financially depend on online platforms and their algorithms to secure a living. It is pertinent to ask what a fair distribution of income on such platforms is and what power and means the platform has in shaping these distributions. In this paper, we analyze job assignments of a major taxi company and observe that there is significant inequality in the driver income distribution. We propose a novel framework to think about fairness in the matching mechanisms of ride hailing platforms. Specifically, our notion of fairness relies on the idea that, spread over time, all drivers should receive benefits proportional to the amount of time they are active in the platform. We postulate that by not requiring every match to be fair, but rather distributing fairness over time, we can achieve better overall benefit for the drivers and the passengers. We experiment with various optimization problems and heuristics to explore the means of achieving two-sided fairness, and investigate their caveats and side-effects. Overall, our work takes the first step towards rethinking fairness in ride hailing platforms with an additional emphasis on the well-being of drivers.
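The notion of distributing fairness over time, rather than requiring every single match to be fair, can be illustrated with a toy greedy heuristic. This is an illustrative sketch only, not the paper's algorithm; the function and variable names are hypothetical:

```python
def proportional_deficit_assignment(active_time, income, ride_value, eligible):
    """Assign a ride to the eligible driver furthest below their
    time-proportional fair share of accumulated income.

    Sketch of the idea that, spread over time, drivers should receive
    benefits proportional to the time they are active on the platform:
    individual matches may be "unfair", but repeated assignments steer
    the income distribution toward proportionality.
    """
    total_time = sum(active_time.values())
    total_income = sum(income.values())

    def deficit(driver):
        # how far the driver's income lags behind their proportional share
        fair_share = total_income * active_time[driver] / total_time
        return fair_share - income[driver]

    chosen = max(eligible, key=deficit)
    income[chosen] += ride_value  # record the benefit for future decisions
    return chosen
```

With two equally active drivers where one has already earned everything, the heuristic routes the next ride to the other driver, narrowing the gap over repeated matchings.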
FA*IR: A Fair Top-k Ranking Algorithm
- Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (CIKM'17). Singapore. November, 2017.
Abstract: In this work, we define and solve the Fair Top-k Ranking problem, in which we want to determine a subset of k candidates from a large pool of n ≫ k candidates, maximizing utility (i.e., select the "best" candidates) subject to group fairness criteria. Our ranked group fairness definition extends group fairness using the standard notion of protected groups and is based on ensuring that the proportion of protected candidates in every prefix of the top-k ranking remains statistically above or indistinguishable from a given minimum. Utility is operationalized in two ways: (i) every candidate included in the top-k should be more qualified than every candidate not included; and (ii) for every pair of candidates in the top-k, the more qualified candidate should be ranked above. An efficient algorithm is presented for producing the Fair Top-k Ranking, and tested experimentally on existing datasets as well as new datasets released with this paper, showing that our approach yields small distortions with respect to rankings that maximize utility without considering fairness criteria. To the best of our knowledge, this is the first algorithm grounded in statistical tests that can mitigate biases in the representation of an under-represented group along a ranked list.
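The per-prefix statistical test can be sketched with a binomial model: in each prefix of length k, the number of protected candidates must not fall significantly below a Binomial(k, p) draw. This sketch omits the multiple-testing correction the paper applies across prefixes, and the function names are assumptions:

```python
import math

def binom_cdf(x, n, p):
    """P[Bin(n, p) <= x], computed directly from the binomial pmf."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(x + 1))

def min_protected(k, p, alpha):
    """Smallest count of protected candidates required in a prefix of
    length k so that the count is not significantly below a Bin(k, p)
    draw at significance level alpha (uncorrected per-prefix test)."""
    tau = 0
    while binom_cdf(tau, k, p) <= alpha:
        tau += 1
    return tau

def is_fairly_represented(ranking_is_protected, p, alpha):
    """Check the ranked group fairness condition on every prefix of a
    ranking, given per-position protected-group indicators (0/1)."""
    protected_so_far = 0
    for k, is_prot in enumerate(ranking_is_protected, start=1):
        protected_so_far += is_prot
        if protected_so_far < min_protected(k, p, alpha):
            return False
    return True
```

For example, with p = 0.5 and alpha = 0.1 a short prefix may legitimately contain no protected candidates, but longer all-unprotected prefixes fail the test.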
Systems and Applications
Awards and Scholarships
- (2019) Google Women Techmakers Scholarship EMEA (WTM): a 7,000 EUR award for impact on diversity, demonstrated leadership, and a strong academic background.
- (2017) Data Transparency Lab Research Grant: a 50,000 EUR grant for the design and implementation of web tools that enable fairness, accountability, and transparency in machine learning systems.
- (2016) SOAMED Graduate School: PhD school on service-oriented Architectures for the Integration of Software-based Processes, exemplified by Health Care Systems and Medical Technology
- (2010) Femtec Network Career Building Scholarship: Career building program for female future leaders from science, technology, engineering and mathematics
- (2010) Erasmus Scholarship: Semester abroad in Lyon, France
- (2009) DAAD "GoEast" Scholarship: Semester abroad in Moscow, Russia
- (2009) DAAD Summer School Tomsk/Moscow Scholarship: Language summer school in Tomsk and Moscow, Russia
Talks and Panels
- (2020) Fairness in Algorithmic Decision Making. FTA Live 2020, Berlin, DE.
- (2020) Panel: How Does Artificial Intelligence Become Gender-Equitable? Panel discussion. Berlin, DE.
- (2019) Matching Code and Law. Columbia University, New York City, NY, US.
- (2019) Matching Code and Law. IBM Research, Yorktown, NY, US.
- (2019) Disparate Exposure in Learning to Rank. Microsoft Research, New York City, NY, US.
- (2019) Fairness-Aware Ranking Algorithms. CapGemini, DE.
- (2019) Fairness in Algorithmic Decision Making. Yale University, New Haven, CT, US.
- (2019) Fairness-Aware Ranking Algorithms. Technische Universität Berlin. Berlin, DE.
- (2019) Panel: Do We Need More Diversity in Data Journalism? nr19 Annual Conference. Hamburg, DE.
- (2019) Fairness-Aware Ranking Algorithms. Freie Universität Berlin. Berlin, DE.
- (2018) Fairness-Aware Ranking Algorithms. RWTH Aachen. Aachen, DE.
- (2017) Frameworks of Bias in Computer Systems and their Application in Rankings. Workshop on Data and Algorithmic Bias. DAB'17. Singapore.
- (2017) Panel: Algorithmic Fairness and Bias in Data. Workshop on Data and Algorithmic Bias. DAB'17. Singapore.
- (2017) On Fairness in Ranked Search Algorithms. Universität Hamburg. Hamburg, DE
A selection of newspaper articles I have written on algorithmic fairness, as well as TV appearances.
- Unfair: Süddeutsche Zeitung Nr. 104, 6th May 2019, p.9
- Algorithmen gegen Diskriminierung (Algorithms against Discrimination): Gegenblende - Debattenmagazin, 22nd May 2019
Academic Service
- Track Chair, Informatik 2020
- PC Member, SIGIR 2020
- PC Member, BIAS 2020
- PC Member, FACTS-IR 2019
- Reviewer, EDBT 2019
- Reviewer, FAT* 2019
- Academic Senate Member, Faculty Board Member. 2017 - 2019. TU Berlin
- Appointment Committee Member. 2017 - 2018. TU Berlin
Teaching and Supervision
- (2018 - 2019) Practical Project for Master Students, TU Berlin
- (2017 - 2018) Practical Course for Bachelor Students, TU Berlin
Master Theses Supervision
- Michal Jirku: Algorithmic Fairness Development in a Competitive Setting
- Frederic Mauss: Creating a gender-specific data set about the users of StackOverflow
- Stefanie Quach: Extending the DELTR Algorithm to Multinomial Use Cases
Bachelor Theses Supervision
- Flora Muscinelli: Mapping Algorithmic Fairness is Contextual
- Tom Sühr: Two-Sided Fairness for Repeated Matchings in Two-Sided Markets
- Jan Steffen Preßler: A Data Collection to Develop Fair Machine Learning Algorithms
- Gina-Theresa Diehn: FA*IR as Pre-Processing Fair Machine Learning Approach
- Simon Huber: Generating Discriminatory Datasets by Usage of Wasserstein Generative Adversarial Networks
- Laura Mons: Benchmarking for Fair Machine Learning Algorithms
- Hyerim Hwang: Extension of the FA*IR Top-k Ranking Algorithm to Multinomial Use Cases
Research Interests
- Algorithmic Fairness
- Discrimination and Exploitation
- Political Philosophy
- Machine Learning
Education
- Ph.D. in Computer Science, 2016 - present
- Diploma in Computer Science, 2006 - 2014
- Student Exchange, 2009 - 2010
Languages
- German (Native)
- English (Professional)
- French (Advanced)
- Russian (Advanced)