Improving the Accuracy of Regional-Language Translation through Parallel Corpus Optimization (Peningkatan Akurasi Penerjemah Bahasa Daerah dengan Optimasi Korpus Paralel)

Herry Sujaini

Abstract


The quality of statistical machine translation (SMT) is influenced by several factors. The most fundamental is the quantity of the corpus used as base material for building the translation and language models. Corpus quantity is a major factor in ensuring translation quality, but corpus quality cannot be ignored either. Manually checking the source and translated sentences in a parallel corpus is very difficult and requires substantial resources. This paper reports experimental results for a strategy that improves the quality of Indonesian-Malay and Indonesian-Javanese corpora without having to examine and correct the sentences in the corpus. The filter is a minimum score for each sentence, as measured by the Bilingual Evaluation Understudy (BLEU) method. The experiments show that parallel corpus optimization improves translation accuracy by 6.97% for Indonesian-Malay and by 5.55% for Indonesian-Javanese.
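The filtering idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how each sentence's BLEU score is obtained, so this example assumes that a baseline system has already translated every source sentence and that its output is scored against the target side of the corpus. The function names (`sentence_bleu`, `filter_corpus`), the add-one smoothing, the threshold value, and the placeholder tokens are all hypothetical choices made for illustration, not details taken from the paper.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference, hypothesis, max_n=4):
    """Smoothed sentence-level BLEU in [0, 1] for whitespace-tokenized strings.

    Add-one smoothing on orders above 1 is an assumption made here so that
    short sentences do not collapse to zero; it is not taken from the paper.
    """
    ref, hyp = reference.split(), hypothesis.split()
    if not ref or not hyp:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        # Clipped matches: each hypothesis n-gram counts at most as often
        # as it occurs in the reference.
        matched = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if n == 1:
            if matched == 0:
                return 0.0  # no word overlap at all
            precision = matched / total
        else:
            precision = (matched + 1) / (total + 1)
        log_precision += math.log(precision) / max_n
    # Brevity penalty for hypotheses shorter than the reference.
    brevity = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return brevity * math.exp(log_precision)


def filter_corpus(triples, min_bleu):
    """Keep (source, target) pairs whose baseline output reaches min_bleu.

    `triples` holds (source, target, baseline_hypothesis) strings.
    """
    return [(src, tgt) for src, tgt, hyp in triples
            if sentence_bleu(tgt, hyp) >= min_bleu]


# Placeholder tokens stand in for real sentences; the threshold is arbitrary.
corpus = [
    ("s1 s2 s3 s4", "a b c d", "a b c d"),  # baseline output matches target
    ("s5 s6 s7 s8", "a b c d", "a b x d"),  # partial match
    ("s9 s10 s11", "a b c d", "x y z"),     # no overlap, likely a bad pair
]
kept = filter_corpus(corpus, min_bleu=0.4)
```

With a threshold of this kind, pairs whose target side disagrees strongly with the baseline output are dropped before retraining, which is one plausible way to remove noisy pairs without inspecting them manually.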

Keywords


statistical machine translation, corpus optimization, parallel corpus, Indonesian-Malay


DOI: http://dx.doi.org/10.22146/jnteti.v7i1.394


Copyright (c) 2018 Jurnal Nasional Teknik Elektro dan Teknologi Informasi (JNTETI)

JNTETI (Jurnal Nasional Teknik Elektro dan Teknologi Informasi)

Departemen Teknik Elektro dan Teknologi Informasi, Fakultas Teknik Universitas Gadjah Mada
Jl. Grafika No 2. Kampus UGM Yogyakarta 55281
+62 274 552305
jnteti@ugm.ac.id