Inferring demographic intelligence from unlabeled social media data is an actively growing area of research, challenged by the low availability of ground-truth annotated training corpora. High-accuracy approaches for labeling the demographic traits of social media users employ various heuristics that do not scale up and often discount non-English texts and marginalized users. First, we present a framework for inferring the demographic attributes of Twitter users from their profile pictures (avatars) using the Microsoft Azure Face API. Second, we measure the inter-rater agreement between annotations made using our framework and two pre-labeled samples of Twitter users (N1=1163; N2=659) whose age labels were manually annotated. Our results indicate that the strength of the inter-rater agreement (Gwet’s AC1=0.89; 0.90) between the gold standard and our approach is ‘very good’ for labeling the age group of users. The paper provides a use case of Computer Vision for enabling the development of large cross-sectional labeled datasets, and further advances novel solutions in the field of demographic inference from short social media texts.
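The agreement statistic reported in the abstract, Gwet's AC1, can be illustrated for the two-rater case with a minimal sketch. The function name `gwet_ac1` and the example labels below are assumptions made for illustration and are not taken from the paper; the authors' own computation (for example, how age groups are binned) may differ.

```python
from collections import Counter


def gwet_ac1(rater_a, rater_b):
    """Gwet's AC1 for two raters assigning nominal labels to the same items.

    Hypothetical helper for illustration; not the authors' code.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("Both raters must label the same non-empty set of items.")

    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    q = len(categories)
    if q < 2:
        raise ValueError("AC1 needs at least two observed categories.")

    # Observed agreement: share of items on which both raters agree.
    p_a = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Average marginal proportion of each category across the two raters.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    pi = {k: (count_a[k] / n + count_b[k] / n) / 2 for k in categories}

    # Chance agreement under Gwet's model.
    p_e = sum(p * (1 - p) for p in pi.values()) / (q - 1)

    return (p_a - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Made-up age-group labels: manual annotation vs. an automated labeler.
    manual = ["<30", "<30", "<30", "<30", "30+", "30+", "30+", "30+", "<30", "<30"]
    automated = ["<30", "<30", "<30", "<30", "30+", "30+", "30+", "<30", "<30", "<30"]
    print(f"AC1 = {gwet_ac1(manual, automated):.3f}")
```

Unlike Cohen's kappa, AC1 keeps the chance-agreement term bounded when one category dominates, which is one reason it is often preferred for skewed label distributions such as age groups.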
Authors:
Kostakos Panos, Pandya Abhinay, Kyriakouli Olga, Oussalah Mourad
Publication type:
A4 Article in conference proceedings
Place of publication:
Proceedings of the European Intelligence and Security Informatics Conference (EISIC) 2018, October 24-25, 2018, Blekinge Institute of Technology, Karlskrona, Sweden
Keywords:
Demographic data, Privacy, Social media mining, Twitter
Published:
Full citation:
P. Kostakos, A. Pandya, O. Kyriakouli and M. Oussalah, “Inferring Demographic Data of Marginalized Users in Twitter with Computer Vision APIs,” 2018 European Intelligence and Security Informatics Conference (EISIC), Karlskrona, Sweden, 2018, pp. 81-84. doi: 10.1109/EISIC.2018.00022
DOI:
https://doi.org/10.1109/EISIC.2018.00022
Read the publication here:
http://urn.fi/urn:nbn:fi-fe2019082024775