Title: Exploring the Effectiveness of LLM-Generated Context on Emotion Lexicon Word Vectorization: A Comparative Study on Turkish and English
Authors: Kumova Metin, Senem; Aka Uymaz, Hande
Date available: 2025-11-25
Date issued: 2025
ISSN: 1520-9202; 1941-045X (electronic)
DOI: https://doi.org/10.1109/MITP.2025.3572550
Handle: https://hdl.handle.net/20.500.14365/6597
Type: Article
Access rights: info:eu-repo/semantics/closedAccess
Language: en
Scopus ID: 2-s2.0-105020371802
Keywords: Performance Evaluation; Soft Sensors; Semantics; Lexicon; Bidirectional Control; Transformers; Encoding; Robustness; Natural Language Processing; Large Language Models; Pareto Optimization

Abstract: This study explores the impact of large language models (LLMs) on emotion lexicon word vectorization in Turkish and English. Emotion analysis involves extracting affective information from various data sources, with text being a primary medium. While traditional vectorization methods lack semantic meaning, contextual vectors, such as bidirectional encoder representations from transformers (BERT), aim to capture the context of words, leading to improved performance in natural language processing tasks. We investigate the efficacy of context sentences drawn from human-annotated datasets and sentences generated by the Gemini-Pro LLM in creating word vectors. Additionally, we introduce a manually annotated Turkish emotion and sentiment lexicon (TES-Lex). Performance is evaluated for both Turkish and English using BERT vectors with two approaches: cosine similarity and machine learning. Our findings indicate that LLM-generated context sentences significantly enhance the quality of word vectors, especially in Turkish, underscoring the potential of LLMs for augmenting emotion lexicon resources in low-resourced languages.
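The abstract's first evaluation approach, cosine similarity between contextual word vectors, can be sketched as follows. This is a minimal illustration only, not the paper's implementation: the toy 4-dimensional vectors (`vec_joy`, `vec_happiness`, `vec_anger`) are hypothetical stand-ins for the BERT embeddings the study uses, which would typically be 768-dimensional.

```python
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two word vectors, in [-1, 1]."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical low-dimensional stand-ins for BERT word vectors.
vec_joy = np.array([0.9, 0.1, 0.4, 0.2])
vec_happiness = np.array([0.8, 0.2, 0.5, 0.1])
vec_anger = np.array([-0.7, 0.9, -0.2, 0.6])

# Vectors of emotionally related words should score higher than
# vectors of emotionally opposed words.
print(cosine_similarity(vec_joy, vec_happiness))  # high (near 1)
print(cosine_similarity(vec_joy, vec_anger))      # low (negative here)
```

In the lexicon setting, a higher cosine score between a target word's vector and the vectors of known emotion words supports assigning that emotion label; the study compares how this behaves when the context sentences behind the vectors come from human-annotated data versus Gemini-Pro generations.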