On the use of URLs and hashtags in age prediction of Twitter users

Social media data are an important resource for behavioral analysis of the ageing population. This paper addresses age prediction from Twitter data, framed as a classification task. We devise a model based on a Convolutional Neural Network (CNN) that relies on language-related features and social media specific metadata. More specifically, we introduce two features that have not previously been considered in the literature: the content of URLs and hashtags appearing in tweets. We also employ distributed representations of the words and phrases present in tweets, hashtags and URLs, pre-trained on appropriate corpora, to exploit their semantic information for age prediction. Compared with an SVM baseline model, our CNN-based classifier improves the micro-averaged F1 score by 12.3% on the Dutch dataset and by 6.6% on the English dataset.
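
For readers unfamiliar with the reported metric, the following is a minimal sketch of how a micro-averaged F1 score can be computed for a single-label classification task such as age-group prediction. It is not the authors' evaluation code: the function name `micro_f1` and the age-group labels in the usage example are hypothetical and chosen only for illustration.

```python
from collections import Counter


def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP and FN over all classes, then compute F1 once."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[guess] += 1   # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    total_tp = sum(tp.values())
    total_fp = sum(fp.values())
    total_fn = sum(fn.values())
    denominator = 2 * total_tp + total_fp + total_fn
    return 2 * total_tp / denominator if denominator else 0.0


# Hypothetical age-group labels, for illustration only.
gold = ["young", "old", "young", "middle"]
predicted = ["young", "young", "young", "old"]
score = micro_f1(gold, predicted)
```

When every user receives exactly one label, each error counts once as a false positive and once as a false negative, so the micro-averaged F1 equals plain accuracy (here 2 correct out of 4).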

Authors:
Pandya Abhinay, Oussalah Mourad, Monachesi Paola, Kostakos Panos, Lovén Lauri

Publication type:
A4 Article in conference proceedings

Place of publication:
2018 IEEE International Conference on Information Reuse and Integration (IRI), 7–9 July 2018, Salt Lake City, Utah, USA

Keywords:
Age prediction, Convolutional neural networks, Natural language processing, Social media mining, Twitter

Full citation:
A. Pandya, M. Oussalah, P. Monachesi, P. Kostakos and L. Lovén, “On the Use of URLs and Hashtags in Age Prediction of Twitter Users,” 2018 IEEE International Conference on Information Reuse and Integration (IRI), Salt Lake City, UT, 2018, pp. 62-69. doi: 10.1109/IRI.2018.00017

DOI:
https://doi.org/10.1109/IRI.2018.00017

Read the publication here:
http://urn.fi/urn:nbn:fi-fe2018112849416