DEVELOPMENT OF A GRAPHEME-TO-PHONEME CONVERSION MODEL FOR THE ARMENIAN LANGUAGE
Keywords:
Grapheme-to-Phoneme (G2P), Armenian language, neural networks, Text-to-Speech (TTS), Conformer-CTC, International Phonetic Alphabet (IPA).
Abstract
This paper presents the development of a grapheme-to-phoneme (G2P) conversion model for the Armenian language. G2P conversion is a crucial component of Text-to-Speech (TTS) systems and directly affects the quality of synthesized speech. Current approaches to Armenian G2P show limited accuracy: existing tools such as phonemizer exhibit error rates as high as 96.60% Word Error Rate (WER) and 36.15% Phoneme Error Rate (PER). This research addresses these shortcomings with a complete solution comprising a specialized dataset and a neural network model. We begin by analyzing the phonological characteristics of Armenian, including context-dependent pronunciation rules and language-specific sound-symbol relationships that complicate automatic transcription. To address the lack of publicly available resources, we created a dataset of 17,862 Armenian word-phoneme pairs by automatically collecting and processing Wiktionary data with a multi-layered analysis system and robust quality-control mechanisms. Analysis of this dataset revealed complex mappings between Armenian graphemes and phonemes, a frequency distribution following Zipf's law, and a wide variety of contextual dependencies. Using this dataset, we developed a Conformer-CTC neural network model with approximately 12.3 million trainable parameters, whose self-attention mechanisms and convolutional modules are designed to capture both global and local linguistic patterns. Evaluation shows that our model achieves a WER of 16.13% and a PER of 17.36%, reductions of 80.47 and 18.79 percentage points, respectively, relative to the baseline figures above (relative reductions of about 83.3% and 52.0%).
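The abstract reports WER and PER without defining them. The following is a minimal sketch, assuming the definitions commonly used in G2P evaluation (not confirmed by the paper): WER is the share of words whose predicted phoneme sequence differs in any way from the reference, and PER is the total Levenshtein distance (substitutions, insertions, deletions) divided by the total number of reference phonemes. The function names are hypothetical, and the toy phoneme sequences are hand-made for illustration, not taken from the 17,862-pair dataset. The final loop reproduces the baseline-versus-model comparison, showing that 80.47 and 18.79 are percentage-point differences.

```python
from typing import List, Sequence, Tuple

# Assumed metric definitions (common in G2P work); not taken from the paper.


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences (unit costs)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (cost 0 on a match)
            )
        prev = curr
    return prev[-1]


def wer(refs: List[List[str]], hyps: List[List[str]]) -> float:
    """Percentage of words whose predicted phoneme sequence is not an exact match."""
    assert len(refs) == len(hyps)
    wrong = sum(1 for r, h in zip(refs, hyps) if list(r) != list(h))
    return 100.0 * wrong / len(refs)


def per(refs: List[List[str]], hyps: List[List[str]]) -> float:
    """Total phoneme edit distance over total reference phonemes, in percent."""
    assert len(refs) == len(hyps)
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return 100.0 * errors / sum(len(r) for r in refs)


def reductions(baseline: float, model: float) -> Tuple[float, float]:
    """Absolute (percentage-point) and relative (%) reduction of an error rate."""
    absolute = baseline - model
    return absolute, 100.0 * absolute / baseline


# Toy, hand-made phoneme sequences (illustrative only, not from the dataset).
refs = [["b", "ɑ", "ɾ", "ɛ", "v"], ["m", "ɑ", "j", "ɾ"]]
hyps = [["b", "ɑ", "ɾ", "ɛ", "v"], ["m", "ɑ", "r"]]

print(f"WER = {wer(refs, hyps):.2f}%")  # WER = 50.00%
print(f"PER = {per(refs, hyps):.2f}%")  # PER = 22.22%

# Figures from the abstract: baseline vs. Conformer-CTC model.
for name, base, ours in [("WER", 96.60, 16.13), ("PER", 36.15, 17.36)]:
    pp, rel = reductions(base, ours)
    print(f"{name}: -{pp:.2f} pp, -{rel:.2f}% relative")
```

Under these definitions a single wrong phoneme makes the whole word count as a WER error, which is why WER is the stricter of the two metrics for most error distributions.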
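The abstract does not state how the Conformer-CTC outputs are turned into a phoneme sequence. A common choice for CTC models is greedy (best-path) decoding: take the most likely label at each input frame, merge consecutive repeats, then remove the blank symbol. The sketch below shows only that decoding step, assuming blank index 0, a hypothetical toy vocabulary, and hand-made frame scores; the Conformer encoder is omitted and this is not presented as the authors' implementation. Because of the merge rule, CTC emits at most one label per input frame, and two identical adjacent phonemes need a blank frame between them.

```python
from itertools import groupby
from typing import List, Sequence

BLANK = 0  # assumption: index 0 of the output vocabulary is the CTC blank


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], blank: int = BLANK) -> List[int]:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # Most likely label at each input frame (first index wins on ties).
    best_path = [max(range(len(s)), key=s.__getitem__) for s in frame_scores]
    # groupby merges runs of identical labels; blank symbols are then removed.
    return [label for label, _ in groupby(best_path) if label != blank]


def one_hot(label: int, size: int) -> List[float]:
    """Hypothetical helper producing a peaked score vector for the toy example."""
    return [1.0 if i == label else 0.0 for i in range(size)]


# Hypothetical toy vocabulary: index 0 is blank, the rest are phoneme symbols.
PHONES = ["<blank>", "m", "ɑ", "j", "ɾ"]

frames = [one_hot(label, len(PHONES)) for label in [1, 1, 0, 2, 2, 3, 0, 4]]
decoded = ctc_greedy_decode(frames)
print(" ".join(PHONES[i] for i in decoded))  # m ɑ j ɾ

# A blank between two identical labels keeps them as two separate outputs.
print(ctc_greedy_decode([one_hot(l, len(PHONES)) for l in [2, 0, 2]]))  # [2, 2]
```

Beam search over the same per-frame distributions is a frequent alternative when greedy decoding is not accurate enough; the merge-and-drop-blank rule stays the same.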



