Twitter-based gender recognition using transformers.

Journal: Mathematical Biosciences and Engineering (MBE)

Volume: 20

Issue: 9

Year of Publication: 2023

Affiliated Institutions: Africa-Canada Artificial Intelligence and Data Innovation Consortium (ACADIC), York University, Canada. K. N. Toosi University of Technology, Faculty of Computer Engineering, Tehran, Iran.

Abstract summary 

Social media contains useful information about people and society that could help advance research in many areas of health (e.g., by applying opinion mining, emotion/sentiment analysis and statistical analysis), such as mental health, health surveillance, socio-economic inequality and gender vulnerability. User demographics provide rich information that could help study the subject further. However, user demographics such as gender are considered private and are not freely available. In this study, we propose a transformer-based model to predict a user's gender from their images and tweets. The image-based classification model is trained in two different ways: using the profile image of the user, and using various image contents posted by the user on Twitter. For the first method, a Twitter gender recognition dataset publicly available on Kaggle is used; for the second, the PAN-18 dataset. Several transformer models, i.e., the vision transformer (ViT), LeViT and the Swin Transformer, are fine-tuned on both image datasets and then compared. Next, different transformer models, namely bidirectional encoder representations from transformers (BERT), RoBERTa and ELECTRA, are fine-tuned to recognize the user's gender from their tweets. This is highly beneficial because not all users provide an image that indicates their gender; the gender of such users can be detected from their tweets. The significance of the image and text classification models was evaluated using the Mann-Whitney U test. Finally, the combination model improved the accuracy of the image and text classification models by 11.73% and 5.26% for the Kaggle dataset, and by 8.55% and 9.8% for the PAN-18 dataset, respectively. This shows that the image and text classification models are capable of complementing each other by providing additional information to one another. Our overall multimodal method has an accuracy of 88.11% for the Kaggle dataset and 89.24% for the PAN-18 dataset, and outperforms state-of-the-art models. Our work benefits research that critically requires user demographic information such as gender to further analyze and study social media content for health-related issues.
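
The abstract's first step is fine-tuning vision transformers (ViT, LeViT, Swin) for binary gender classification from user images. The following is a minimal sketch of how such fine-tuning is commonly done with the HuggingFace transformers library; the checkpoint name, label mapping, learning rate and data-loading details are illustrative assumptions, not the authors' exact configuration.

```python
# Hedged sketch: fine-tune a ViT backbone for binary (male/female)
# image classification. Checkpoint and hyperparameters are assumptions.
import torch
from transformers import ViTForImageClassification, ViTImageProcessor

processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
model = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224-in21k",
    num_labels=2,  # binary gender label, as in the study population
    id2label={0: "male", 1: "female"},
    label2id={"male": 0, "female": 1},
)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

def train_step(images, labels):
    """images: list of PIL images (e.g. profile pictures);
    labels: LongTensor of 0/1 gender labels. Hypothetical helper."""
    inputs = processor(images=images, return_tensors="pt")
    outputs = model(**inputs, labels=labels)  # cross-entropy computed inside
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return outputs.loss.item()
```

Swapping the checkpoint (and the matching processor/model classes) for LeViT or Swin variants would follow the same pattern, which is presumably how the paper compares the three architectures.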
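The second step is fine-tuning text transformers (BERT, RoBERTa, ELECTRA) to predict gender from tweets. Below is a minimal sketch under the same caveats: the checkpoint, the idea of concatenating a user's tweets into one input, and the sequence length are assumptions for illustration.

```python
# Hedged sketch: fine-tune BERT for tweet-based gender classification.
# Checkpoint, max_length and input construction are assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
text_model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)
optimizer = torch.optim.AdamW(text_model.parameters(), lr=2e-5)

def train_step(tweet_texts, labels):
    """tweet_texts: list of strings (e.g. a user's tweets joined together);
    labels: LongTensor of 0/1 gender labels. Hypothetical helper."""
    enc = tokenizer(tweet_texts, padding=True, truncation=True,
                    max_length=128, return_tensors="pt")
    out = text_model(**enc, labels=labels)
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return out.loss.item()
```

RoBERTa and ELECTRA can be substituted by changing the checkpoint name passed to the `Auto*` classes; the fine-tuning loop itself is unchanged.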
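Finally, the abstract reports a combination model and a Mann-Whitney U significance test. The sketch below illustrates one plausible late-fusion rule (averaging the two models' class probabilities) and a scipy-based U test over per-run accuracies; the fusion rule and the accuracy values are assumptions, since the abstract does not specify the combination mechanism or the raw scores.

```python
# Hedged sketch: (a) late fusion of image- and text-model probabilities,
# (b) Mann-Whitney U test comparing two models' accuracy samples.
import numpy as np
from scipy.stats import mannwhitneyu

def fuse_predictions(p_image, p_text):
    """p_image, p_text: (n_users, 2) arrays of class probabilities.
    Simple averaging is an assumed fusion rule, not the paper's."""
    fused = (p_image + p_text) / 2.0
    return fused.argmax(axis=1)  # 0 = male, 1 = female

# Illustrative accuracy samples from repeated runs/folds (made up):
acc_model_a = np.array([0.86, 0.87, 0.88, 0.86, 0.89])
acc_model_b = np.array([0.82, 0.83, 0.81, 0.84, 0.82])
stat, p_value = mannwhitneyu(acc_model_a, acc_model_b,
                             alternative="two-sided")
print(f"U={stat:.1f}, p={p_value:.4f}")
```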

Authors & Co-authors:  Nia Zahra Movahedi ZM Ahmadi Ali A Mellado Bruce B Wu Jianhong J Orbinski James J Asgary Ali A Kong Jude D JD

Study Outcome 

Statistics
Citations : 
Authors :  7
Identifiers
Doi : 10.3934/mbe.2023711
ISSN : 1551-0018
Study Population
Male, Female
Mesh Terms
Humans
Other Terms
BERT; ELECTRA; LeViT; RoBERTa; Swin transformer; ViT; gender recognition; social media; transformers
Study Design
Cross Sectional Study
Study Approach
Country of Study
Publication Country
United States