Exploring Portuguese Hate Speech Detection in Low-Resource Settings: Lightly Tuning Encoder Models or In-Context Learning of Large Models?
Explorando Técnicas de Aprendizado em Modelos de Linguagem para Classificação de Discurso de Ódio e Ofensivo em Português
This repository contains the code and results for the PROPOR 2024 paper "Exploring Portuguese Hate Speech Detection in Low-Resource Settings: Lightly Tuning Encoder Models or In-Context Learning of Large Models?" and for its extended version, "Explorando Técnicas de Aprendizado em Modelos de Linguagem para Classificação de Discurso de Ódio e Ofensivo em Português", published in Linguamática.
Gabriel Assis, Annie Amorim, Jonnathan Carvalho, Daniel de Oliveira, Daniela Vianna, and Aline Paes. 2024. Exploring Portuguese Hate Speech Detection in Low-Resource Settings: Lightly Tuning Encoder Models or In-Context Learning of Large Models?. In Proceedings of the 16th International Conference on Computational Processing of Portuguese, pages 301–311, Santiago de Compostela, Galicia/Spain. Association for Computational Linguistics.
Assis, G., Amorim, A., Carvalho, J., Ferro, M., de Oliveira, D., Vianna, D., & Paes, A. (2024). Explorando Técnicas de Aprendizado em Modelos de Linguagem para Classificação de Discurso de Ódio e Ofensivo em Português. Linguamática, 16(2). Retrieved from https://linguamatica.com/index.php/linguamatica/article/view/446
@inproceedings{assis-etal-2024-exploring,
title = "Exploring {P}ortuguese {H}ate {S}peech {D}etection in {L}ow-{R}esource {S}ettings:
{L}ightly {T}uning {E}ncoder {M}odels or {I}n-{C}ontext {L}earning of {L}arge {M}odels?",
author = "Assis, Gabriel and
Amorim, Annie and
Carvalho, Jonnathan and
de Oliveira, Daniel and
Vianna, Daniela and
Paes, Aline",
editor = "Gamallo, Pablo and
Claro, Daniela and
Teixeira, Ant{\'o}nio and
Real, Livy and
Garcia, Marcos and
Oliveira, Hugo Gon{\c{c}}alo and
Amaro, Raquel",
booktitle = "Proceedings of the 16th International Conference on Computational Processing of Portuguese",
month = mar,
year = "2024",
address = "Santiago de Compostela, Galicia/Spain",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.propor-1.31",
pages = "301--311"
}
@article{Assis_etal_2024,
title = "Explorando Técnicas de Aprendizado em Modelos de Linguagem
para Classificação de Discurso de Ódio e Ofensivo em Português",
author = "
Assis, Gabriel and
Amorim, Annie and
Carvalho, Jonnathan and
Ferro, Mariza and
de Oliveira, Daniel and
Vianna, Daniela and
Paes, Aline
",
journal = "Linguamática",
volume = "16",
number = "2",
year = "2024",
month = "December",
url = "https://linguamatica.com/index.php/linguamatica/article/view/446"
}
This research was financed by CNPq (National Council for Scientific and Technological Development), grants 307088/2023-5 and 315750/2021-9; by FAPERJ (Fundação Carlos Chagas Filho de Amparo à Pesquisa do Estado do Rio de Janeiro), processes SEI-260003/002930/2024, SEI-260003/000614/2023, and SEI-260003/006057/2024; and by Coordenação de Aperfeiçoamento de Pessoal de Nível Superior – Brasil (CAPES) – Finance Code 001. The research also benefited from the Google Cloud Research Credits program, under code GCP19980904, and from the Maritaca AI credits program.