Authors:
S. Senthamizh Selvi, R. Anitha, O. Jeba Singh, S. Rubin Bose, Mohammad Ayaz Ahmad
Addresses:
Department of Computer Science and Engineering, Sri Venkateswara College of Engineering, Chennai, Tamil Nadu, India. Centre for Academic Research, Alliance University, Bengaluru, Karnataka, India. Department of Computer Science and Engineering, SRM Institute of Science and Technology, Ramapuram, Chennai, Tamil Nadu, India. Department of Mathematics, Physics and Statistics, University of Guyana, Georgetown, Guyana, South America.
Translating specific words from one language to another requires attention to several factors to ensure correctness, precision, quality, and contextual appropriateness. Context-aware translation systems face several challenges arising from the complexity of natural language, including contextual disambiguation, cultural and linguistic nuances, subject matter expertise, and maintaining translation consistency. These challenges are magnified in low-resource languages such as Tamil, which suffer from limited linguistic resources, high word-sense ambiguity, and flexible word order. To address these concerns, the proposed study aims to provide linguistic resources for pre-training Neural Machine Translation (NMT) systems by developing a translation system that is language-independent, document-specific, and context-based, making it suitable for languages with limited resources. The research presents a novel word vector format, Word2Line (W2L), which minimises the time required for the process. Using TF-IDF, the system recognises document-specific words in English, the source language, and then predicts context-based translations of those words in Tamil, the target language.
Keywords: Neural Machine Translation; Data Mining Confidence; Context-Aware Translation; Natural Language; Maximum Likelihood Estimate; Document-Specific Translation.
Received on: 22/09/2024, Revised on: 06/12/2024, Accepted on: 11/01/2025, Published on: 05/06/2025
DOI: 10.69888/FTSCS.2025.000434
FMDB Transactions on Sustainable Computing Systems, 2025 Vol. 3 No. 2, Pages: 101-113
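The TF-IDF step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`tfidf_scores`, `document_specific_terms`), the toy corpus, the whitespace tokenisation, and the plain `tf * log(N / df)` weighting are all assumptions chosen for illustration. It only shows how words that are frequent in one document but rare across the collection can be ranked as document-specific.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Return one {term: tf-idf} dict per tokenised document (hypothetical helper)."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            # Term frequency (relative count) times inverse document frequency.
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


def document_specific_terms(docs, doc_index, top_k=3):
    """Return up to top_k terms with the highest non-zero TF-IDF in one document."""
    scores = tfidf_scores(docs)[doc_index]
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, score in ranked[:top_k] if score > 0]


# Toy English corpus (assumed data): "bank" is ambiguous across documents.
corpus = [
    "the bank approved the loan".split(),
    "the river bank was flooded".split(),
    "the loan interest was high".split(),
]
print(document_specific_terms(corpus, 1))
```

In this toy corpus, "the" appears in every document and therefore receives a score of zero, while words unique to the second document rank highest. A context-based translation step, as described in the abstract, would then operate on the words selected this way.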