Who's speaking? Predicting speaker profession from speech

Variations in speech can reveal the speaker's gender, birthplace, age, and socioeconomic level. In this paper, we show that even the speaker's profession can be recovered from a recording. To this end, we design a method that combines features from both the speech signal and its transcription. For the transcription features, we use pretrained language models. This allows us to train a model that predicts the speaker's profession from both sources. Our empirical results show that our model can narrow down the speaker's profession considerably.
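As a rough illustration of what combining the two feature sources could look like, the sketch below concatenates a speech feature vector with a text feature vector and ranks candidate professions with a linear scorer. It is a minimal sketch under stated assumptions, not the authors' implementation: the label set, feature values, weights, and the function names `fuse` and `rank_professions` are all hypothetical, and the abstract does not specify the fusion strategy or the classifier.

```python
import math

# Hypothetical label set, for illustration only; not the professions
# studied in the paper.
PROFESSIONS = ["politician", "journalist", "actor"]


def fuse(speech_features, text_features):
    """Concatenate the two feature vectors into one multimodal vector."""
    return list(speech_features) + list(text_features)


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rank_professions(fused, weights, biases, k=2):
    """Score each profession with a linear layer and keep the k most likely."""
    scores = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, biases)
    ]
    probs = softmax(scores)
    ranked = sorted(zip(PROFESSIONS, probs), key=lambda p: p[1], reverse=True)
    return ranked[:k]


# Made-up stand-ins: two acoustic features and a three-dimensional
# language-model embedding of the transcription.
speech = [0.2, 0.7]
text = [0.9, 0.1, 0.4]

# Made-up parameters; in practice these would be learned from labeled data.
WEIGHTS = [
    [0.5, -0.2, 1.0, 0.0, 0.3],
    [0.1, 0.4, -0.5, 0.8, 0.0],
    [-0.3, 0.9, 0.2, 0.1, -0.4],
]
BIASES = [0.0, 0.1, -0.1]

top = rank_professions(fuse(speech, text), WEIGHTS, BIASES, k=2)
for profession, prob in top:
    print(f"{profession}: {prob:.2f}")
```

Returning the top `k` candidates rather than a single label reflects the idea of narrowing down the profession; the choice of `k` here is arbitrary.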

Keywords

Knowledge base; Large corpora; Multimodal representation; Pre-trained language models

Domains

Computer Science [cs]

Main file

icphs-2023.pdf (120.47 KB)

Origin: Files produced by the author(s)

Contributor: Lihu Chen

https://hal.science/hal-04190126

Submitted on: Tuesday, August 29, 2023, 12:10:05

Last modified on: Tuesday, March 19, 2024, 11:02:05

Long-term archiving on: Thursday, November 30, 2023, 18:44:55

Dates and versions

hal-04190126, version 1 (29-08-2023)

License

Attribution

Identifiers

HAL Id: hal-04190126, version 1

Cite

Yaru Wu, Lihu Chen, Benjamin Elie, Fabian M. Suchanek, Ioana Vasilescu, et al. Who's speaking? Predicting speaker profession from speech. International Congress of Phonetic Sciences 2023, Aug 2023, Prague, Czech Republic. pp.3086-3090. ⟨hal-04190126⟩


Collections

INSTITUT-TELECOM CNRS INRIA UNIV-PARIS3 PARISTECH LPP LIMSI CENTRALESUPELEC COMUE-NORMANDIE CAMPUS-AAR AAI UNIV-PARIS-SACLAY UNICAEN CRISCO SORBONNE-UNIVERSITE LTCI INFRES DIG IP_PARIS LISN GS-COMPUTER-SCIENCE
