THE SENTENCE-LEVEL CROSS-LINGUAL PLAGIARISM DETECTION METHOD FOR ARMENIAN-ENGLISH AND ARMENIAN-RUSSIAN LANGUAGE PAIRS

Authors

  • G.A. Petrosyan, National Polytechnic University of Armenia
  • R.R. Sahakyan, National Polytechnic University of Armenia

Keywords:

cross-lingual plagiarism detection, transformer, cross-lingual sentence embeddings, pre-trained models, POS tagging, paraphrase detection

Abstract

Recent advances in transformer-based models provide new opportunities for cross-lingual plagiarism detection by projecting sentences from different languages into a shared semantic space. Such models achieve state-of-the-art results even for low-resource languages. In this paper, we present a method for sentence-level cross-lingual plagiarism detection for the Armenian-English and Armenian-Russian language pairs. We describe both subtasks: source retrieval and sentence-level alignment based on a language-agnostic dual-encoder model. In the first subtask, suspicious Armenian texts are segmented and compared with English and Russian texts using a POS-tagging-based approach to obtain candidate sources. In the second subtask, we apply a transformer-based dual-encoder model to measure the semantic similarity between sentences. We also fine-tune the selected dual-encoder model on both parallel and paraphrased Armenian-English and Armenian-Russian sentence pairs to make it more sensitive to semantic alignment and paraphrasing. As sources of paraphrased pairs, we use two datasets: paraphrased pairs obtained from a parallel corpus, and English paraphrase datasets adapted to our language pairs. We apply the proposed method to two publicly available datasets, one of which we adapted to our language pairs. On both datasets, the fine-tuned model outperforms the original model in terms of F1 score. The results show that the proposed method performs well compared with existing methods.
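The two subtasks described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the POS-based retrieval criterion, the dual-encoder model, or the similarity threshold, so the POS-tag trigram Jaccard filter, the 0.8 cosine threshold, and all function names (`retrieve_candidates`, `align`, `f1`) are hypothetical. Sentence embeddings are passed in as plain vectors; in practice they would come from a language-agnostic dual encoder (LaBSE is one example, our assumption), which places an Armenian sentence and its English or Russian translation close together in the shared space.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two sentence embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pos_ngrams(tags: Sequence[str], n: int = 3) -> Set[Tuple[str, ...]]:
    """Set of POS-tag n-grams (hypothetical stage-1 fingerprint)."""
    return {tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)}


def jaccard(a: Set, b: Set) -> float:
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve_candidates(susp_tags: Sequence[str],
                        source_tags: Dict[str, Sequence[str]],
                        n: int = 3, min_overlap: float = 0.2) -> List[str]:
    """Stage 1 (hypothetical criterion): keep source documents whose
    language-independent POS n-gram profile overlaps the suspicious segment."""
    profile = pos_ngrams(susp_tags, n)
    return [doc_id for doc_id, tags in source_tags.items()
            if jaccard(profile, pos_ngrams(tags, n)) >= min_overlap]


def align(susp_emb: Sequence[Vector], src_emb: Sequence[Vector],
          threshold: float = 0.8) -> List[Tuple[int, int, float]]:
    """Stage 2: for each suspicious sentence, report the most similar
    source sentence if its cosine similarity reaches the threshold."""
    pairs = []
    for i, u in enumerate(susp_emb):
        scores = [cosine(u, v) for v in src_emb]
        if not scores:
            continue
        j = max(range(len(scores)), key=scores.__getitem__)
        if scores[j] >= threshold:
            pairs.append((i, j, scores[j]))
    return pairs


def f1(predicted: Set[Tuple[int, int]], gold: Set[Tuple[int, int]]) -> float:
    """F1 score of predicted (suspicious, source) sentence pairs."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)
```

For example, with toy vectors `susp = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]` and `src = [[0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]`, `align(susp, src)` keeps only the first suspicious sentence, matched to source sentence 0 with a cosine similarity of about 0.994. Taking only the best-scoring source sentence yields at most one source per suspicious sentence; a real system might also allow one-to-many matches for split or merged sentences.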


Published

21.02.2026


Section

Articles
