About me

I am a Ph.D. student at the Max Planck Institute for Software Systems (MPI-SWS) in the Social Computing Research group. I am advised by Dr. Krishna P. Gummadi. I completed my Master's degree in Computer Science at the Universidade Federal de Minas Gerais (UFMG), Brazil, with Dr. Fabrício Benevenuto as my advisor. During my MSc, I also held two research intern positions at MPI-SWS in Saarbrücken, Germany. I studied Computer Science at Eötvös Loránd University (ELTE) in Budapest, Hungary, in 2013/2014 during my exchange program (Science Without Borders - CAPES). I completed my undergraduate degree in Computer Science at the Universidade Federal de Ouro Preto (UFOP), Brazil, in 2015.

My complete name: Johnnatan Messias Peixoto Afonso.

Selected Publications

For the full list of publications, see my Google Scholar or DBLP profile.

Journals

  • Search Bias Quantification: Investigating Political Bias in Social Media and Web Search
  • Information Retrieval Journal. Springer. Volume 22, Issue 1-2, April 2019.
  • Abstract: Users frequently use search systems on the Web as well as online social media to learn about ongoing events and public opinion on personalities. Prior studies have shown that the top-ranked results returned by these search engines can shape user opinion about the topic (e.g., event or person) being searched. In case of polarizing topics like politics, where multiple competing perspectives exist, the political bias in the top search results can play a significant role in shaping public opinion towards (or away from) certain perspectives. Given the considerable impact that search bias can have on the user, we propose a generalizable search bias quantification framework that not only measures the political bias in ranked list output by the search system but also decouples the bias introduced by the different sources — input data and ranking system. We apply our framework to study the political bias in searches related to 2016 US Presidential primaries in Twitter social media search and find that both input data and ranking system matter in determining the final search output bias seen by the users. And finally, we use the framework to compare the relative bias for two popular search systems — Twitter social media search and Google web search — for queries related to politicians and political events. We end by discussing some potential solutions to signal the bias in the search results to make the users more aware of them.

  • Managing Longitudinal Exposure of Socially Shared Data on the Twitter Social Media
  • International Journal of Advances in Engineering Sciences and Applied Mathematics (Special Issue on Data Sciences), Springer, 2017.
  • Abstract: On most online social media sites today, user-generated data remains accessible to allowed viewers unless and until the data owner changes her privacy preferences. In this paper, we present a large-scale measurement study focused on understanding how users control the longitudinal exposure of their publicly shared data on social media sites. Our study, using data from Twitter, finds that a significant fraction of users withdraw a surprisingly large percentage of old publicly shared data: more than 28% of 6-year old public posts (tweets) on Twitter are not accessible today. The inaccessible tweets are either selectively deleted by users or withdrawn by users when they delete or make their accounts private. We also found a significant problem with the current exposure control mechanisms: even when a user deletes her tweets or her account, the current mechanisms leave traces of residual activity, i.e., tweets from other users sent as replies to those deleted tweets or accounts still remain accessible. We show that using this residual information one can recover significant information about the deleted tweets or even characteristics of the deleted accounts. To the best of our knowledge, we are the first to study the information leakage resulting from residual activities of deleted tweets and accounts. Finally, we propose two exposure control mechanisms that eliminate information leakage via residual activities. One of our mechanisms optimizes for allowing meaningful social interactions with user posts, and the other aims to control longitudinal exposure via anonymization. We discuss the merits and drawbacks of our proposed mechanisms compared to existing mechanisms.

  • An Evaluation of Sentiment Analysis for Mobile Devices
  • In Springer Nature Social Network Analysis and Mining. Volume 7, Issue 1, 2017.
  • Abstract: Sentiment Analysis has become a key tool to extract knowledge from data containing opinions and sentiments, particularly, data from online social systems. With the increasing use of smartphones to access social media platforms, a new wave of applications that explore sentiment analysis in the mobile environment is beginning to emerge. However, there are various existing sentiment analysis methods and it is unclear which of them are deployable in the mobile environment. In this paper, we provide the first of a kind study in which we compare the performance of 14 sentence-level sentiment analysis methods in the mobile environment. To do that, we adapted these methods to run on Android OS and then we measure their performance in terms of memory, CPU, and battery consumption. Our findings unveil methods that require almost no adaptations and run relatively fast as well as methods that could not be deployed due to excessive use of memory. We hope our effort provides a guide to developers and researchers interested in exploring sentiment analysis as part of a mobile application and can help new applications to be executed without the dependency of a server-side API. We also share the Android API that implements all 14 sentiment analysis methods used in this paper.

  • Longitudinal Privacy Management in Social Media: The Need for Better Controls
  • IEEE Internet Computing (Special Issue on Usable Privacy & Security). Volume 21, Issue 3, May-June, 2017.
  • Abstract: This large-scale measurement study of Twitter focuses on understanding how users control the longitudinal exposure of their publicly shared social data (that is, their tweets) and the limitations of currently used control mechanisms. Our study finds that, while Twitter users widely employ longitudinal exposure control mechanisms, they face two fundamental problems. First, even when users delete their data or account, the current mechanisms leave significant traces of residual activity. Second, these mechanisms single out withdrawn tweets or accounts, attracting undesirable attention to them. To address both problems, an inactivity-based withdrawal scheme for improved longitudinal exposure control is explored.

  • You followed my bot! Transforming robots into influential users in Twitter
  • First Monday. Volume 18, Issue 7, July, 2013.
  • Abstract: Systems like Klout and Twitalyzer were developed as an attempt to measure the influence of users within social networks. Although the algorithms used by these systems are not publicly known, they have been widely used to rank users according to their influence, especially in the Twitter social network. As media companies might base their viral marketing campaigns on influence scores, users might attempt to boost their influence scores with simple mechanisms like following unknown users to be followed back or even interacting with those who reciprocate these actions. In this paper, we investigate if widely used influence scores are vulnerable and easy to manipulate. Our approach consists of developing Twitter bot accounts able to interact with real users to verify strategies that can increase their influence scores according to different systems. Our results show that it is possible to become influential using very simple strategies, suggesting that these systems should review their influence score algorithms to avoid accounting for automatic activity.

Conferences

  • (Mis)Information Dissemination in WhatsApp: Gathering, Analyzing and Countermeasures
  • In Proceedings of the 28th Web Conference (WWW'19). San Francisco, USA. May, 2019.
  • Abstract: WhatsApp has revolutionized the way people communicate and interact. It is not only cheaper than the traditional Short Message Service (SMS) communication but it also brings a new form of mobile communication: the group chats. Such groups are great forums for collective discussions on a variety of topics. In particular, in events of great social mobilization, such as strikes and electoral campaigns, WhatsApp group chats are very attractive as they facilitate information exchange among interested people. Yet, recent events have raised concerns about the spreading of misinformation in WhatsApp. In this work, we analyze information dissemination within WhatsApp, focusing on publicly accessible political-oriented groups, collecting all shared messages during major social events in Brazil: a national truck drivers' strike and the Brazilian presidential campaign. We analyze the types of content shared within such groups as well as the network structures that emerge from user interactions within and cross-groups. We then deepen our analysis by identifying the presence of misinformation among the shared images using labels provided by journalists and by a proposed automatic procedure based on Google searches. We identify the most important sources of the fake images and analyze how they propagate across WhatsApp groups and from/to other Web platforms.

  • WhatsApp Monitor: A Fact-Checking System for WhatsApp
  • In Proceedings of the 13th International AAAI Conference on Web and Social Media (ICWSM’19). Munich, Germany. June, 2019.
  • Abstract: WhatsApp is the most popular communication application in many developing countries such as Brazil, India, and Mexico, where many people use it as an interface to the web. Due to its encrypted and peer-to-peer nature, it is hard for researchers to study which content people share through WhatsApp at scale. In this demo paper, we propose WhatsApp Monitor (http://www.whatsapp-monitor.dcc.ufmg.br), a web-based system that helps researchers and journalists explore the nature of content shared on WhatsApp public groups from two different contexts: Brazil and India. Our tool monitors multiple content categories such as images, videos, audio, and textual messages posted on a set of WhatsApp groups and displays the most shared content per day. Our tool has been used for monitoring content during the 2018 Brazilian general election and was one of the major sources for estimating the spread of misinformation and helping fact-checking efforts.

  • On Microtargeting Socially Divisive Ads: A Case Study of Russia-Linked Ad Campaigns on Facebook
  • In Proceedings of the Conference on Fairness, Accountability, and Transparency (FAT*'19), Atlanta, Georgia. January 2019.
  • Abstract: Targeted advertising is meant to improve the efficiency of matching advertisers to their customers. However, targeted advertising can also be abused by malicious advertisers to efficiently reach people susceptible to false stories, stoke grievances, and incite social conflict. Since targeted ads are not seen by non-targeted and non-vulnerable people, malicious ads are likely to go unreported and their effects undetected. This work examines a specific case of malicious advertising, exploring the extent to which political ads from the Russian Internet Research Agency (IRA) run prior to the 2016 U.S. elections exploited Facebook's targeted advertising infrastructure to efficiently target ads on divisive or polarizing topics (e.g., immigration, race-based policing) at vulnerable sub-populations. In particular, we do the following: (a) We conduct U.S. census-representative surveys to characterize how users with different political ideologies report, approve, and perceive truth in the content of the IRA ads. Our surveys show that many ads are "divisive": they elicit very different reactions from people belonging to different socially salient groups. (b) We characterize how these divisive ads are targeted to sub-populations that feel particularly aggrieved by the status quo. Our findings support existing calls for greater transparency of content and targeting of political ads. (c) We particularly focus on how the Facebook ad API facilitates such targeting. We show how the enormous amount of personal data Facebook aggregates about users and makes available to advertisers enables such malicious targeting.

  • A System for Monitoring Public Political Groups in WhatsApp
  • In Proceedings of the 24th Brazilian Symposium on Multimedia and the Web (Webmedia'18). Salvador, Brazil. October, 2018.
  • Abstract: In Brazil, 48% of the population use WhatsApp to share and discuss news. Currently, there are serious concerns that this platform can become a fertile ground for groups interested in disseminating misinformation, especially as part of articulated political campaigns. Particularly, WhatsApp provides an important space for users to engage in public conversations that are worth attention: the public groups. These groups are suitable for political activism and social movement organization. Additionally, it is reasonable to assume that a malicious misinformation campaign might attempt to maximize the audience of a fake story by sharing it in existing public groups. In this paper, we present a system for gathering, analyzing, and visualizing public groups in WhatsApp. In addition to describing our methodology, we also provide a brief characterization of the content shared in 127 Brazilian groups. We hope our system can help journalists and researchers to understand the repercussion of events related to the Brazilian elections within these groups.

  • White, Man, and Highly Followed: Gender and Race Inequalities in Twitter
  • In Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence (WI'17). Leipzig, Germany. August 2017.
  • Abstract: Social media is considered a democratic space in which people connect and interact with each other regardless of their gender, race, or any other demographic factor. Despite numerous efforts that explore demographic factors in social media, it is still unclear whether social media perpetuates old inequalities from the offline world. In this paper, we attempt to identify gender and race of Twitter users located in the U.S. using advanced image processing algorithms from Face++. Then, we investigate how different demographic groups (i.e. male/female, Asian/Black/White) connect with each other. We quantify to what extent one group follows and interacts with another and the extent to which these connections and interactions are reflected in inequalities in Twitter. Our analysis shows that users identified as White and male tend to attain higher positions in Twitter, in terms of the number of followers and number of times in users' lists. We hope our effort can stimulate the development of new theories of demographic information in the online space.

  • Demographics of News Sharing in the U.S. Twittersphere
  • In Proceedings of the 28th ACM Conference on Hypertext and Social Media (HT'17). Prague, Czech Republic. July 2017.
  • Abstract: The widespread adoption and dissemination of online news through social media systems have been revolutionizing many segments of our society and ultimately our daily lives. In these systems, users can play a central role as they share content to their friends. Despite that, little is known about news spreaders in social media. In this paper, we provide the first of its kind in-depth characterization of news spreaders in social media. In particular, we investigate their demographics, what kind of content they share, and the audience they reach. Among our main findings, we show that males and white users tend to be more active in terms of sharing news, biasing the news audience to the interests of these demographic groups. Our results also quantify differences in interests of news sharing across demographics, which has implications for personalized news digests.

  • Linguistic Diversities of Demographic Groups in Twitter
  • In Proceedings of the 28th ACM Conference on Hypertext and Social Media (HT'17). Prague, Czech Republic. July 2017.
  • Abstract: The massive popularity of online social media provides a unique opportunity for researchers to study the linguistic characteristics and patterns of users' interactions. In this paper, we provide an in-depth characterization of language usage across demographic groups in Twitter. In particular, we extract the gender and race of Twitter users located in the U.S. using advanced image processing algorithms from Face++. Then, we investigate how demographic groups (i.e. male/female, Asian/Black/White) differ in terms of linguistic styles and also their interests. We extract linguistic features from 6 categories (affective attributes, cognitive attributes, lexical density and awareness, temporal references, social and personal concerns, and interpersonal focus), in order to identify the similarities and differences in particular sets of writing attributes. In addition, we extract the absolute ranking difference of top phrases between demographic groups. As a dimension of diversity, we also use the topics of interest that we retrieve from each user. Our analysis unveils clear differences in the writing styles (and the topics of interest) of different demographic groups, with variation seen across both gender and race lines. We hope our effort can stimulate the development of new studies related to demographic information in the online space.

  • Who Makes Trends? Understanding Demographic Biases in Crowdsourced Recommendations
  • In Proceedings of the Int'l AAAI Conference on Web and Social Media (ICWSM’17). Montreal, Canada. May 2017.
  • Abstract: To help their users to discover the most interesting contents at a particular time, social media sites like Facebook and Twitter deploy content recommendation systems (such as Trending Topics), which often rely on crowdsourced popularity signals to select the contents. Once the contents are selected for recommendation, they reach a large population, effectively giving the initial users of the contents an opportunity to propagate their messages to the wider public. Hence, it is extremely important to understand the demographics of people who make a content worthy of recommendation, and explore whether there are demographic biases in the recommended contents where the majority of the recommended contents were initially popular with crowds exhibiting skewed demographic distributions.
    In this work, using extensive data collected from Twitter, we make the first attempt to quantify and explore the demographic biases in the crowdsourced recommendations (particularly, in the selection of trending topics). In our analysis, we find that very different topics are popular among different demographic groups, and in practice, there is a bias towards a particular demographic while selecting the trending topics. We further propose and evaluate different techniques to limit such demographic biases in trending topic selection.

  • Quantifying Search Bias: Investigating Sources of Bias for Political Searches in Social Media
  • In Proceedings of the ACM Conference on Computer Supported Cooperative Work and Social Computing (CSCW'17). Portland, Oregon, USA. February 2017.

  • From Migration Corridors to Clusters: The Value of Google+ Data for Migration Studies
  • In Proceedings of the IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM’16). San Francisco, USA. August 2016.
  • Abstract: Recently, there have been considerable efforts to use online data to investigate international migration. These efforts show that Web data are valuable for estimating migration rates and are relatively easy to obtain. However, existing studies have only investigated flows of people along migration corridors, i.e. between pairs of countries. In our work, we use data about "places lived" from millions of Google+ users in order to study migration "clusters", i.e. groups of countries in which individuals have lived. For the first time, we consider information about more than two countries people have lived in. We argue that these data are very valuable because this type of information is not available in traditional demographic sources which record country-to-country migration flows independent of each other. We show that migration clusters of country triads cannot be identified using information about bilateral flows alone. To demonstrate the additional insights that can be gained by using data about migration clusters, we first develop a model that tries to predict the prevalence of a given triad using only data about its constituent pairs. We then inspect the groups of three countries which are more or less prominent, compared to what we would expect based on bilateral flows alone. Next, we identify a set of features such as a shared language or colonial ties that explain which triple of country pairs are more or less likely to be clustered when looking at country triples. Then we select and contrast a few cases of clusters that provide some qualitative information about what our data set shows. The type of data that we use is potentially available for a number of social media services. We hope that this first study about migration clusters will stimulate the use of Web data for the development of new theories of international migration that could not be tested appropriately before.

  • Towards Sentiment Analysis for Mobile Devices
  • In Proceedings of the IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM’16). San Francisco, USA. August 2016.
  • Abstract: The increasing use of smartphones to access social media platforms opens a new wave of applications that explore sentiment analysis in the mobile environment. However, there are various existing sentiment analysis methods and it is unclear which of them are deployable in the mobile environment. This paper provides the first of a kind study in which we compare the performance of 17 sentence-level sentiment analysis methods in the mobile environment. To do that, we adapted these sentence-level methods to run on Android OS and then we measure their performance in terms of memory usage, CPU usage, and battery consumption. Our findings unveil sentence-level methods that require almost no adaptations and run relatively fast as well as methods that could not be deployed due to excessive use of memory. We hope our effort provides a guide to developers and researchers interested in exploring sentiment analysis as part of a mobile application and can help new applications to be executed without the dependency of a server-side API.

  • Forgetting in Social Media: Understanding and Controlling Longitudinal Exposure of Socially Shared Data
  • In Proceedings of the 12th Symposium on Usable Privacy and Security (SOUPS'16), Denver, CO, USA, June 2016
  • Abstract: On most online social media sites today, user-generated data remains accessible to allowed viewers unless and until the data owner changes her privacy preferences. In this paper, we present a large-scale measurement study focussed on understanding how users control the longitudinal exposure of their publicly shared data on social media sites. Our study, using data from Twitter, finds that a significant fraction of users withdraw a surprisingly large percentage of old publicly shared data: more than 28% of six-year old public posts (tweets) on Twitter are not accessible today. The inaccessible tweets are either selectively deleted by users or withdrawn by users when they delete or make their accounts private. We also found a significant problem with the current exposure control mechanisms: even when a user deletes her tweets or her account, the current mechanisms leave traces of residual activity, i.e., tweets from other users sent as replies to those deleted tweets or accounts still remain accessible. We show that using this residual information one can recover significant information about the deleted tweets or even characteristics of the deleted accounts. To the best of our knowledge, we are the first to study the information leakage resulting from residual activities of deleted tweets and accounts. Finally, we propose an exposure control mechanism that eliminates information leakage via residual activities, while still allowing meaningful social interactions with user posts. We discuss its merits and drawbacks compared to existing mechanisms.

Press Coverage

Here is some coverage of my recent research in prominent blogs, magazines, and newspapers.

Projeto Eleições sem Fake
Gender and Race Inequalities in Twitter - paper at Web Intelligence'17
Making a bot influential in Twitter - paper in First Monday'13

Systems and Applications

  • Covid-19 monitor: A system to help people understand the pandemic data.
  • Eleições sem Fake: A set of systems to help counter the fake news problem.
  • Ira Ads: Explores the demographics of ads that the Russian Internet Research Agency (IRA) ran prior to the 2016 U.S. elections to exploit Facebook's targeted advertising infrastructure and efficiently target ads on divisive or polarizing topics (e.g., immigration, race-based policing) at vulnerable sub-populations.
  • Who Makes Trends?: The demographics of trend promoters are the distribution (or combination) of demographic groups (such as middle-aged white men, young Asian women, adolescent black men) in the crowd promoting (or posting about) a topic before the topic becomes Trending on Twitter. Here, we consider only US-based Twitter users whose tweets on the trends appear in the 1% random sample distributed by Twitter.
  • Search Political Leaning of Twitter Users: You can log in with your Twitter credentials to see the political leaning (between Democratic and Republican) inferred for you. You can also search for other Twitter users and check their political leanings.
  • Secondary Digital Footprint: Twitter is social; people converse with you by mentioning your username in their tweets (e.g., while replying to your tweet or giving a shout-out to you). These conversations are your secondary digital footprint. Even if you delete your account or delete selected tweets, this secondary footprint is not deleted automatically and leaks information about you. Check what your secondary digital footprint reveals about you and your content.

Awards

  • Granted a scholarship in 2013 from the Brazilian Scientific Exchange Program (Science without Borders - CAPES) for Academic Excellence to study at a European university for fourteen months.
  • Motion of applause for developing the SmartHome project during the Science Without Borders exchange program in Budapest, Hungary - Câmara de Mariana/MG, Brazil (November 2014)
  • Best paper nominee: CTIC’13, BraSNAM’12
  • 3rd place in the XXXII Concurso de Trabalhos de Iniciação Científica (CTIC2013), XXXIII Congresso da Sociedade Brasileira de Computação (CSBC2013)
  • Honorable mention for the article "Sigam-me os Bons! Transformando robôs em pessoas influentes no Twitter", Brazilian Workshop on Social Network Analysis and Mining (BraSNAM’12)

Talks

  • (2020) Countering Misinformation on Social Media Platforms. ThoughtWorks. Belo Horizonte, BR.
  • (2020) Countering Misinformation on Social Media Platforms. SMART Data Sprint 2020. Lisbon, PT.
  • (2019) (Mis)Information Dissemination in WhatsApp: Gathering, Analyzing and Countermeasures. 5th International Conference on Computational Social Science (IC2S2’19). Amsterdam, NL.
  • (2016) From Migration Corridors to Clusters: The Value of Google+ Data for Migration Studies. IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM’16). San Francisco, US.
  • (2016) Towards Sentiment Analysis for Mobile Devices. IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM’16). San Francisco, US.
  • (2015) Bazinga! Caracterizando e Detectando Sarcasmo e Ironia no Twitter. Brazilian Workshop on Social Network Analysis and Mining (BraSNAM’15). Recife, BR.
  • (2015) Brazil Around the World: Characterizing and Detecting Brazilian Emigrants Using Google+. Brazilian Symposium on Multimedia and the Web (WebMedia’15). Manaus, BR.
  • (2013) Bots Sociais: Como robôs podem se tornar pessoas influentes no Twitter? XXXII Concurso de Trabalhos de Iniciação Científica (CTIC’13). Maceió, BR.
  • (2012) Sigam-me os bons! Transformando robôs em pessoas influentes no Twitter. Brazilian Workshop on Social Network Analysis and Mining (BraSNAM’12). Curitiba, BR.
  • (2011) UGUIDE: Rede Social Móvel Aplicada a Educação. XIX Seminário de Iniciação Científica da UFOP. Ouro Preto, BR.
  • (2011) Computação Móvel: Tendências e Android. 6a Semana da Informática - IFSULDEMINAS. Muzambinho, BR.
  • (2010) BlueGuide - Uma Plataforma de Suporte ao Turista em Ouro Preto. I Seminário de Pesquisa do PPGCC & UFOP e I Fórum de Alunos e Ex-Alunos do DECOM. Ouro Preto, BR.

Interests

  • Blockchains
  • Data Analysis
  • Social Networks
  • Machine Learning

Education

Language

  • Portuguese (Native)
  • English (Professional)
  • German (Basic)