DepreSym: A Depression Symptom Annotated Corpus and the Role of Large Language Models as Assessors of Psychological Markers Anxo Pérez, Marcos Fernández-Pichel, David E. Losada Original Paper Open access 03 May 2025
Detoxifying language model outputs: combining multi-agent debates and reinforcement learning for improved summarization G. Bharathi Mohan, M. Gayathri, R. Prasanna Kumar Original Paper 29 April 2025
Tropes and the EmotAix lexicon for evaluating the emotional tonality of French verbal association corpora in social representation studies Pascal Moliner, Patrick Rateau, Enola Guegan Original Paper 28 April 2025
Disfluency processing for cascaded speech translation involving English and Indian languages Vandan Mujadia, Pruthwik Mishra, Dipti Misra Sharma Original Paper 28 April 2025
Benchmarking Hindi-to-English direct speech-to-speech translation with synthetic data Mahendra Gupta, Maitreyee Dutta, Chandresh Kumar Maurya Original Paper 21 April 2025
ArmanEmo: a Persian dataset for text-based emotion detection Hossein Mirzaee, Javad Peymanfard, Hossein Zeinali Original Paper 20 April 2025
Multi-task learning for multi-dialect Arabic sentiment classification and sarcasm detection Mohammed Elsadiq Barmati, Bachir Said, Abdelghani Dahou Original Paper 20 April 2025
“But why??” Evaluation of user-suggested synonyms in the Thesaurus of Modern Slovene Magdalena Gapsa Original Paper Open access 13 April 2025
MEMD-ABSA: a multi-element multi-domain dataset for aspect-based sentiment analysis Hongjie Cai, Nan Song, Rui Xia Original Paper 03 April 2025
Semantic processing for Urdu: corpus creation, parsing, and generation Muhammad Saad Amin, Xiao Zhang, Johan Bos Original Paper Open access 27 March 2025
Using contrastive language-image pre-training for Thai recipe recommendation Thanatkorn Chuenbanluesuk, Voramate Plodprong, Thitirat Siriborvornratanakul Project Notes 21 March 2025
A new evaluation method: evaluation data and metrics for Chinese grammatical error correction Nankai Lin, Yingwen Fu, Shengyi Jiang Original Paper 17 March 2025
Utilizing phonetic similarity for cross-source and cross-language toponym matching: a benchmark and prototype Tomer Sagi, Moran Zaga, Katja Hose Research Article Open access 26 February 2025
The narratives of war (NoW) corpus of written testimonies of the Russia-Ukraine war Serhii Zasiekin, Larysa Zasiekina, Victor Kuperman Original Paper Open access 19 February 2025
Ngalawan Ujaran Sengit: hate speech detection in indonesian code-mixed social media data Endang Wahyu Pamungkas, Patricia Chiril Original Paper 19 February 2025
MedicalCare: building and annotating an empathy-rich corpus Yinglun Sun, Jose Zavala, Jeffrey Moore Original Paper 15 February 2025
Evaluation of end-to-end continuous spanish lipreading in different data conditions David Gimeno-Gómez, Carlos-D. Martínez-Hinarejos Original Paper Open access 15 February 2025
Improving irony speech spreaders profiling on social networks using clustering & transformer based models Leila Hazrati, Alireza Sokhandan, Leili Farzinvash Original Paper 14 February 2025
Automatic readability assessment for sentences: neural, hybrid and large language models Fengkai Liu, Tan Jin, John S. Y. Lee Original Paper 09 February 2025
DeepMine-multi-TTS: a Persian speech corpus for multi-speaker text-to-speech Majid Adibian, Hossein Zeinali, Soroush Barmaki Original Paper 01 February 2025
Umplc: the first longitudinal learner corpus of Portuguese Mu You, Jing Zhang, Kaixin Lan Project Note 01 February 2025
Sentiment analysis in low-resource contexts: BERT’s impact on Central Kurdish Kozhin Muhealddin Awlla, Hadi Veisi, Abdulhady Abas Abdullah Original Paper 27 January 2025
UFLA-FORMS: an academic forms dataset for information extraction in the Portuguese language Victor Gonçalves Lima, Denilson Alves Pereira Original Paper 22 January 2025
Correction to: Investigating droplet emission during speech interaction Francesca Carbone, Gilles Bouchet, Antoine Giovanni Correction Open access 16 January 2025
Introducing a Swahili social media sentiment analysis dataset for the telecom industry Mahadia Tunga, Davis David Original Paper 09 January 2025
Error annotation: a review and faceted taxonomy Gülşen Eryiğit, Anna Golynskaia, Tolgahan Türker Survey 06 January 2025
Exploring lexical factors in semantic annotation: insights from the classification of nouns in French Lucie Barque, Richard Huyghe, Martial Foegel Original Paper 06 January 2025
An integrated framework for emotion and sentiment analysis in Tamil and Malayalam visual content V. Jothi Prakash, S. Arul Antran Vijay Original Paper 05 January 2025
ParlaMint II: advancing comparable parliamentary corpora across Europe Tomaž Erjavec, Matyáš Kopp, Darja Fišer Original Paper Open access 28 December 2024
Constructing understanding: on the constructional information encoded in large language models Claire Bonial, Harish Tayyar Madabushi Original Paper Open access 20 December 2024
Stereohoax: a multilingual corpus of racial hoaxes and social media reactions annotated for stereotypes Wolfgang S. Schmeisser-Nieto, Alessandra Teresa Cignarella, Francesca D’Errico Original Paper Open access 19 December 2024
PinLID: a dataset for Pinglish language identification based on code-mixing sentence on unstructured resources Arash Ghafouri, Hasan Naderi, Mahdi Firouzmandi Original Paper 07 December 2024
Czech news dataset for semantic textual similarity Jakub Sido, Michal Seják, Václav Moravec Original Paper 07 December 2024
Rapidly developing NLP applications for content curation Julian Moreno-Schneider, Malte Ostendorff, Georg Rehm Project Note Open access 07 December 2024
A comparative analysis of encoder only and decoder only models in intent classification and sentiment analysis: navigating the trade-offs in model size and performance Alberto Benayas, Miguel Angel Sicilia, Marçal Mora-Cantallops Original Paper 07 December 2024
Detection of political hate speech in Korean language Hyo-sun Ryu, Jae Kook Lee Original Paper 03 December 2024
Investigating droplet emission during speech interaction Francesca Carbone, Gilles Bouchet, Antoine Giovanni Original Paper Open access 03 December 2024
The Mandarin Chinese speech database: a corpus of 18,820 auditory neutral nonsense sentences Anqi Zhou, Qiuhong Li, Chao Wu Project Note 30 November 2024
Strategies for managing time and costs in speech corpus creation: insights from the Slovenian ARTUR corpus Darinka Verdonik, Andreja Bizjak, Simon Dobrišek Original Paper Open access 30 November 2024
Evaluation of the morphological rules for the Tenyidie language: a low-resource language Teisovi Angami, Mimi Kevichüsa-Ezung, Themrichon Tuithung Original Paper 27 November 2024
Textflows: an open science NLP evaluation approach Matej Martinc, Matic Perovšek, Senja Pollak Original Paper Open access 27 November 2024
Sanitization of septic news sentences through hybrid approach in English Soma Das, Sanjay Chatterji Original Paper 27 November 2024
Fake news article detection datasets for Hindi language Sujit Kumar, Anant Shankhdhar, Sanasam Ranbir Singh Original Paper 22 November 2024
Human–robot dialogue annotation for multi-modal common ground Claire Bonial, Stephanie M. Lukin, Clare R. Voss Original Paper 16 November 2024
Uzbek news corpus for named entity recognition Aizihaierjiang Yusufu, Kamran Aziz, Donghong Ji Original Paper 11 November 2024
Disfluency annotated corpora for Indian English in technical domains Vandan Mujadia, Pruthwik Mishra, Dipti Misra Sharma Original Paper 26 October 2024
“You’ll be a nurse, my son!” Automatically assessing gender biases in autoregressive language models in French and Italian Fanny Ducel, Aurélie Névéol, Karën Fort Original Paper 24 October 2024
Open source platform for Estonian speech transcription Aivo Olev, Tanel Alumäe Original Paper Open access 16 October 2024
Exploratory Analysis of Rinconada Bikol Language-Nabua Text Corpus Joseph Jessie S. Oñate, Tiffany Lyn O. Pandes Project Notes 15 October 2024
From greatest simplicity to full power Luís Gomes, António Branco, Ruben Branco Original Paper Open access 12 October 2024