Inferring Demographic Data of Marginalized Users in Twitter with Computer Vision APIs

Inferring demographic intelligence from unlabeled social media data is an actively growing area of research, challenged by the low availability of ground-truth annotated training corpora. High-accuracy approaches for labeling demographic traits of social media users employ various heuristics that do not scale up and often discount non-English texts and marginalized users. First, we present a framework for inferring the demographic attributes of Twitter users from their profile pictures (avatars) using the Microsoft Azure Face API. Second, we measure the inter-rater agreement between annotations made using our framework and two pre-labeled samples of Twitter users (N1=1163; N2=659) whose age labels were manually annotated. Our results indicate that the strength of the inter-rater agreement (Gwet’s AC1=0.89; 0.90) between the gold standard and our approach is ‘very good’ for labeling the age group of users. The paper provides a use case of Computer Vision for enabling the development of large cross-sectional labeled datasets, and further advances novel solutions in the field of demographic inference from short social media texts.
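
The agreement statistic reported in the abstract, Gwet's AC1, can be illustrated for the two-rater case with a short sketch. This is not the authors' code: the function name `gwet_ac1` and the age-group labels in the usage note are hypothetical, and the number of categories is taken from the labels actually observed, which is an assumption (a fixed rating scale could be passed in instead).

```python
from collections import Counter


def gwet_ac1(rater_a, rater_b):
    """Gwet's AC1 for two raters labeling the same items (nominal categories).

    Illustrative sketch only; the category set is assumed to be the set of
    labels observed across both raters.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty item list")

    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    q = len(categories)
    if q < 2:
        # Only one category observed: the raters agree on every item and the
        # chance-agreement term is undefined, so report perfect agreement.
        return 1.0

    # Observed agreement: share of items given the same label by both raters.
    p_a = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: based on the average marginal share pi_k of each
    # category, p_e = sum_k pi_k * (1 - pi_k) / (q - 1).
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_e = 0.0
    for category in categories:
        pi_k = (counts_a[category] / n + counts_b[category] / n) / 2
        p_e += pi_k * (1 - pi_k)
    p_e /= q - 1

    return (p_a - p_e) / (1 - p_e)
```

As a usage example, `gwet_ac1(gold_labels, inferred_labels)` would compare a list of manually annotated age groups (for instance the hypothetical labels `"under 30"` and `"30 or over"`) with the age groups derived from the avatar-based framework, one entry per user in the same order.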

Kostakos Panos, Pandya Abhinay, Kyriakouli Olga, Oussalah Mourad

Publication type:
A4 Article in conference proceedings

Place of publication:
Proceedings of the European Intelligence and Security Informatics Conference (EISIC) 2018, October 24-25, 2018, Blekinge Institute of Technology, Karlskrona, Sweden

Demographic data, Privacy, Social media mining, Twitter


Full citation:
P. Kostakos, A. Pandya, O. Kyriakouli and M. Oussalah, “Inferring Demographic Data of Marginalized Users in Twitter with Computer Vision APIs,” 2018 European Intelligence and Security Informatics Conference (EISIC), Karlskrona, Sweden, 2018, pp. 81-84. doi: 10.1109/EISIC.2018.00022


