International Journal Publications
- Managing Longitudinal Exposure of Socially Shared Data on the Twitter Social Media.
Abstract: On most online social media sites today, user-generated data remains accessible to allowed viewers unless and until the data owner changes her privacy preferences. In this paper, we present a large-scale measurement study focused on understanding how users control the longitudinal exposure of their publicly shared data on social media sites. Our study, using data from Twitter, finds that a significant fraction of users withdraw a surprisingly large percentage of old publicly shared data: more than 28% of 6-year-old public posts (tweets) on Twitter are not accessible today. The inaccessible tweets are either selectively deleted by users or withdrawn by users when they delete or make their accounts private. We also found a significant problem with the current exposure control mechanisms: even when a user deletes her tweets or her account, the current mechanisms leave traces of residual activity, i.e., tweets from other users sent as replies to those deleted tweets or accounts still remain accessible. We show that using this residual information one can recover significant information about the deleted tweets or even characteristics of the deleted accounts. To the best of our knowledge, we are the first to study the information leakage resulting from residual activities of deleted tweets and accounts. Finally, we propose two exposure control mechanisms that eliminate information leakage via residual activities. One of our mechanisms optimizes for allowing meaningful social interactions with user posts, and the other aims to control longitudinal exposure via anonymization. We discuss the merits and drawbacks of our proposed mechanisms compared to existing mechanisms.
- An Evaluation of Sentiment Analysis for Mobile Devices.
Abstract: Sentiment analysis has become a key tool to extract knowledge from data containing opinions and sentiments, particularly data from online social systems. With the increasing use of smartphones to access social media platforms, a new wave of applications that explore sentiment analysis in the mobile environment is beginning to emerge. However, there are various existing sentiment analysis methods and it is unclear which of them are deployable in the mobile environment. In this paper, we provide the first-of-its-kind study in which we compare the performance of 14 sentence-level sentiment analysis methods in the mobile environment. To do that, we adapted these methods to run on Android OS and then measured their performance in terms of memory, CPU, and battery consumption. Our findings unveil methods that require almost no adaptations and run relatively fast as well as methods that could not be deployed due to excessive use of memory. We hope our effort provides a guide to developers and researchers interested in exploring sentiment analysis as part of a mobile application and can help new applications to be executed without depending on a server-side API. We also share the Android API that implements all 14 sentiment analysis methods used in this paper.
- Longitudinal Privacy Management in Social Media: The Need for Better Controls.
Abstract: This large-scale measurement study of Twitter focuses on understanding how users control the longitudinal exposure of their publicly shared social data (that is, their tweets) and the limitations of currently used control mechanisms. Our study finds that, while Twitter users widely employ longitudinal exposure control mechanisms, they face two fundamental problems. First, even when users delete their data or account, the current mechanisms leave significant traces of residual activity. Second, these mechanisms single out withdrawn tweets or accounts, attracting undesirable attention to them. To address both problems, an inactivity-based withdrawal scheme for improved longitudinal exposure control is explored.
- You followed my bot! Transforming robots into influential users in Twitter.
Abstract: Systems like Klout and Twitalyzer were developed as an attempt to measure the influence of users within social networks. Although the algorithms used by these systems are not publicly known, they have been widely used to rank users according to their influence, especially in the Twitter social network. As media companies might base their viral marketing campaigns on influence scores, users might attempt to boost their influence scores with simple mechanisms like following unknown users to be followed back or even interacting with those who reciprocate these actions. In this paper, we investigate whether widely used influence scores are vulnerable and easy to manipulate. Our approach consists of developing Twitter bot accounts able to interact with real users to verify strategies that can increase their influence scores according to different systems. Our results show that it is possible to become influential using very simple strategies, suggesting that these systems should review their influence score algorithms to avoid counting automated activity.
- White, Man, and Highly Followed: Gender and Race Inequalities in Twitter.
Abstract: Social media is considered a democratic space in which people connect and interact with each other regardless of their gender, race, or any other demographic factor. Despite numerous efforts that explore demographic factors in social media, it is still unclear whether social media perpetuates old inequalities from the offline world. In this paper, we attempt to identify the gender and race of Twitter users located in the U.S. using advanced image processing algorithms from Face++. Then, we investigate how different demographic groups (i.e. male/female, Asian/Black/White) connect with each other. We quantify to what extent one group follows and interacts with another and the extent to which these connections and interactions are reflected in inequalities in Twitter. Our analysis shows that users identified as White and male tend to attain higher positions in Twitter, in terms of the number of followers and the number of times they appear in users' lists. We hope our effort can stimulate the development of new theories of demographic information in the online space.
- Demographics of News Sharing in the U.S. Twittersphere.
Abstract: The widespread adoption and dissemination of online news through social media systems have been revolutionizing many segments of our society and ultimately our daily lives. In these systems, users can play a central role as they share content with their friends. Despite that, little is known about news spreaders in social media. In this paper, we provide the first-of-its-kind, in-depth characterization of news spreaders in social media. In particular, we investigate their demographics, what kind of content they share, and the audience they reach. Among our main findings, we show that male and white users tend to be more active in terms of sharing news, biasing the news audience to the interests of these demographic groups. Our results also quantify differences in news-sharing interests across demographics, which has implications for personalized news digests.
- Linguistic Diversities of Demographic Groups in Twitter.
Abstract: The massive popularity of online social media provides a unique opportunity for researchers to study the linguistic characteristics and patterns of users' interactions. In this paper, we provide an in-depth characterization of language usage across demographic groups in Twitter. In particular, we extract the gender and race of Twitter users located in the U.S. using advanced image processing algorithms from Face++. Then, we investigate how demographic groups (i.e. male/female, Asian/Black/White) differ in terms of linguistic styles and also their interests. We extract linguistic features from 6 categories (affective attributes, cognitive attributes, lexical density and awareness, temporal references, social and personal concerns, and interpersonal focus) in order to identify the similarities and differences in specific sets of writing attributes. In addition, we extract the absolute ranking difference of top phrases between demographic groups. As a dimension of diversity, we also use the topics of interest that we retrieve from each user. Our analysis unveils clear differences in the writing styles (and the topics of interest) of different demographic groups, with variation seen across both gender and race lines. We hope our effort can stimulate the development of new studies related to demographic information in the online space.
- Who Makes Trends? Understanding Demographic Biases in Crowdsourced Recommendations.
Abstract: Users of social media sites like Facebook and Twitter rely on crowdsourced content recommendation systems (e.g., Trending Topics) to retrieve important and useful information. Content selected for recommendation indirectly gives the initial users who promoted it (by liking or posting) an opportunity to propagate their messages to a wider audience. Hence, it is important to understand the demographics of the people who make content worthy of recommendation, and to explore whether they are representative of the media site's overall population. In this work, using extensive data collected from Twitter, we make the first attempt to quantify and explore the demographic biases in crowdsourced recommendations. Our analysis, focusing on the selection of trending topics, finds that a large fraction of trends are promoted by crowds whose demographics are significantly different from the overall Twitter population. More worryingly, we find that certain demographic groups are systematically under-represented among the promoters of the trending topics. To make the demographic biases in Twitter trends more transparent, we developed and deployed a Web-based service "Who-Makes-Trends" at http://twitter-app.mpi-sws.org/who-makes-trends
- Quantifying Search Bias: Investigating Sources of Bias for Political Searches in Social Media.
Abstract: To help their users discover the most interesting content at a particular time, social media sites like Facebook and Twitter deploy content recommendation systems (such as Trending Topics), which often rely on crowdsourced popularity signals to select the content. Once content is selected for recommendation, it reaches a large population, effectively giving its initial users an opportunity to propagate their messages to the wider public. Hence, it is extremely important to understand the demographics of the people who make content worthy of recommendation, and to explore whether there are demographic biases in the recommended content, where the majority of it was initially popular with crowds exhibiting skewed demographic distributions.
In this work, using extensive data collected from Twitter, we make the first attempt to quantify and explore the demographic biases in the crowdsourced recommendations (particularly, in the selection of trending topics). In our analysis, we find that very different topics are popular among different demographic groups, and in practice, there is a bias towards a particular demographic while selecting the trending topics. We further propose and evaluate different techniques to limit such demographic biases in trending topic selection.
- From Migration Corridors to Clusters: The Value of Google+ Data for Migration Studies.
Abstract: Recently, there have been considerable efforts to use online data to investigate international migration. These efforts show that Web data are valuable for estimating migration rates and are relatively easy to obtain. However, existing studies have only investigated flows of people along migration corridors, i.e. between pairs of countries. In our work, we use data about "places lived" from millions of Google+ users in order to study migration "clusters", i.e. groups of countries in which individuals have lived. For the first time, we consider information about more than two countries people have lived in. We argue that these data are very valuable because this type of information is not available in traditional demographic sources, which record country-to-country migration flows independently of each other. We show that migration clusters of country triads cannot be identified using information about bilateral flows alone. To demonstrate the additional insights that can be gained by using data about migration clusters, we first develop a model that tries to predict the prevalence of a given triad using only data about its constituent pairs. We then inspect the groups of three countries which are more or less prominent, compared to what we would expect based on bilateral flows alone. Next, we identify a set of features, such as a shared language or colonial ties, that explain which country triples are more or less likely to be clustered. Then we select and contrast a few cases of clusters that provide some qualitative information about what our data set shows. The type of data that we use is potentially available for a number of social media services. We hope that this first study about migration clusters will stimulate the use of Web data for the development of new theories of international migration that could not be tested appropriately before.
- Towards Sentiment Analysis for Mobile Devices
Abstract: The increasing use of smartphones to access social media platforms opens a new wave of applications that explore sentiment analysis in the mobile environment. However, there are various existing sentiment analysis methods and it is unclear which of them are deployable in the mobile environment. This paper provides the first-of-its-kind study in which we compare the performance of 17 sentence-level sentiment analysis methods in the mobile environment. To do that, we adapted these sentence-level methods to run on Android OS and then measured their performance in terms of memory usage, CPU usage, and battery consumption. Our findings unveil sentence-level methods that require almost no adaptations and run relatively fast as well as methods that could not be deployed due to excessive use of memory. We hope our effort provides a guide to developers and researchers interested in exploring sentiment analysis as part of a mobile application and can help new applications to be executed without depending on a server-side API.
- Forgetting in Social Media: Understanding and Controlling Longitudinal Exposure of Socially Shared Data
- Algoritmos de Aprendizado de Máquina para Predição de Resultados das Lutas de MMA
Abstract: This paper proposes using machine learning algorithms to predict the outcome of an MMA fight based on the characteristics of the two fighters and their recent opponents. Our experimental evaluation presents an approach for creating a dataset applicable to individual sports, and one of the evaluated algorithms achieves 67% successful predictions.
- Brazil Around the World: Characterizing and Detecting Brazilian Emigrants Using Google+
Abstract: Currently available data about people who left their home country to live in a foreign country does not adequately capture the patterns of contemporary global migration flows. A new trend in migration studies is to study data from the Internet, whether from social networks or other data on the Web. In this study, we collected user data from the social network Google+ to investigate which features of Brazilian users are relevant to classify them as possible emigrants. Our study uses a machine learning technique, SVM. We selected some features to compose our dataset. Our results show that the network features were the ones with the greatest capacity for discrimination. The most relevant features for predicting Brazilian emigrant users are, in order: reciprocity, PageRank, in-degree, clustering coefficient, and ratio of incoming foreigners.
- Bazinga! Caracterizando e Detectando Sarcasmo e Ironia no Twitter
- Pollyanna Gonçalves, Daniel Dalip, Julio C. S. Reis, Johnnatan Messias, Filipe Ribeiro, Philipe Melo, Leandro A. A. Silva, Marcos Gonçalves, and Fabrício Benevenuto.
- In Proceedings of the Brazilian Workshop on Social Network Analysis and Mining (BraSNAM). Recife, Brazil. July, 2015.
Abstract: Sarcasm and irony are widely used forms of speech inside and outside the Web, having the power to transform the polarity or sense of a sentence. The ability to characterize and detect sarcastic and ironic messages in data collected from the Web could improve many decision-making systems based on Natural Language Processing (NLP), such as sentiment analysis, text summarization, and review ranking systems. In this work, we propose some approaches to the task of characterizing and detecting sarcasm and irony in messages posted on the Twitter online social network. Using an automatically collected dataset with the hashtags “#sarcasm” and “#irony”, and by exploiting a large set of characterization and classification techniques, our results show satisfactory rates of accuracy and Macro-F1.
- Bots Sociais: Como robôs podem se tornar pessoas influentes no Twitter?
- Sigam-me os bons! Transformando robôs em pessoas influentes no Twitter.
Abstract: Systems that classify influential users in social networks have been used with great frequency, being referenced in scientific papers and the media as the ideal standard for evaluating influence in the social network Twitter. We consider this measure complex and subjective, and therefore suspect that these systems are vulnerable and easy to manipulate. Based on this, we performed experiments and analyses on two influence ranking systems: Klout and Twitalyzer. We created simple robots capable of interacting through Twitter accounts and measured their influence. Our results show that it is possible to be influential through simple strategies. This suggests that the systems do not have an ideal metric to rank influence.
- Characterizing Interconnections and Linguistic Patterns in Twitter
Abstract: Social media is considered a democratic space in which people connect and interact with each other regardless of their gender, race, or any other demographic aspect. Despite numerous efforts that explore demographic aspects in social media, it is still unclear whether social media perpetuates old inequalities from the offline world. In this dissertation, we attempt to identify the gender and race of Twitter users located in the United States using advanced image processing algorithms from Face++. We investigate how different demographic groups (i.e. male/female, asian/black/white) connect with each other and differentiate them regarding linguistic styles and also their interests. We quantify to what extent one group follows and interacts with another and the extent to which these connections and interactions are reflected in inequalities in Twitter. We also extract linguistic features from six categories (affective attributes, cognitive attributes, lexical density and awareness, temporal references, social and personal concerns, and interpersonal focus) in order to identify the similarities and the differences in the messages they share in Twitter. Furthermore, we extract the absolute ranking difference of top phrases between demographic groups. As a dimension of diversity, we also use the topics of interest that we retrieve from each user. Our analysis shows that users identified as white and male tend to attain higher positions in Twitter, in terms of the number of followers and the number of times they appear in other users' lists. There are clear differences in writing style across demographic groups, in both the gender and race domains, as well as in topics of interest. We hope our effort can stimulate the development of new theories of demographic information in the online space. Finally, we developed a Web-based system that leverages the demographic aspects of users to provide transparency to the Twitter trending topics system.
- Framework Para Sistemas de Navegação de Veículos Aéreos Não Tripulados
Abstract: Making unmanned flights autonomous undoubtedly enables new opportunities for scientific development. Drones can be used in military services, for example in combat, as well as for rescue missions, aerial surveys, and the supervision and inspection of a territory, attracting significant attention from media outlets such as television stations, radio, newspapers, and the Internet. The goal of this project is to determine whether autonomous flights are viable on the AR.Drone 2.0 and to understand its operation. This requires the implementation of a control program for autonomous flights. The framework requires the acquisition of data during the flight, which is obtained using sensors connected to an Arduino. Communication between the Arduino and the drone is needed to include new sensors, and the AR.Drone is operated through the Node.js framework. Each remote control button has a specific command, so the user can create their own missions or run missions previously implemented by the developer. All tests were run on the AR.Drone 2.0, using the Node.js framework, sensors, and a remote control. Through the experiments and studies presented, it became possible to achieve the proposed objective, enabling autonomous flights on the drone. As a result, we designed a framework in which the user can create autonomous flight missions for the drone to run. The user sends these commands to the drone using a remote control. The remote control sends data to a sensor connected to the Arduino, which processes the data; the result is then read and interpreted by the drone.
For more information and the complete curriculum, visit: LinkedIn