Normalization of Non-Abbreviated Non-Standard Words Using Edit Distance

I Gusti Bagus Baskara Nugraha, Rafi Dwi Rizqullah

Abstract


Voice assistant technology is growing rapidly, and it has begun to spread into everyday use. However, voice assistants are still limited to standard (formal) conversational language, whereas Indonesians habitually use informal language in daily conversation. This research offers a solution to the problem voice assistants face with informal words, that is, words that cannot be found in a dictionary of standard words. We propose text normalization using the Levenshtein distance. Test results show that normalization using the Levenshtein distance outperforms normalization using the Longest Common Subsequence (LCS) distance, with an accuracy difference of 8.34%.
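The abstract does not spell out the normalization procedure, so the following is only a minimal Python sketch under stated assumptions. It assumes a nearest-dictionary-word rule: a word already in the dictionary is kept, and any other word is replaced by the dictionary entry with the smallest distance, with ties going to the earlier entry. `DICTIONARY` is a hypothetical toy word list, not the dictionary used in the paper. `lcs_distance` assumes the common "number of unpaired characters" definition, |a| + |b| - 2·LCS(a, b), which is not confirmed by the abstract. The practical difference between the two measures is that Levenshtein charges 1 for a substitution, while the LCS distance allows only insertions and deletions, so a substitution effectively costs 2.

```python
from typing import Callable, Sequence


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions turning `a` into `b` (two-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of `a` and `b`."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if ca == cb else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_distance(a: str, b: str) -> int:
    """Assumed definition: number of unpaired characters (insertions and
    deletions only, so a substitution effectively costs 2)."""
    return len(a) + len(b) - 2 * lcs_length(a, b)


def normalize(word: str, dictionary: Sequence[str],
              distance: Callable[[str, str], int]) -> str:
    """Hypothetical rule: keep `word` if it is a dictionary entry, otherwise
    return the closest entry (min() keeps the first entry on ties)."""
    if word in dictionary:
        return word
    return min(dictionary, key=lambda entry: distance(word, entry))


# Hypothetical toy list of standard Indonesian words, for illustration only.
DICTIONARY = ["kalau", "sudah", "memang", "benar", "pakai", "cepat", "besar"]

if __name__ == "__main__":
    for informal in ["kalo", "udah", "emang", "bener", "pake", "cepet"]:
        print(informal,
              normalize(informal, DICTIONARY, levenshtein),
              normalize(informal, DICTIONARY, lcs_distance))
```

For example, "kalo" is 2 edits from "kalau" under Levenshtein (one substitution plus one insertion) but 3 under the LCS distance, because the LCS distance cannot substitute. This toy run is not meant to reproduce the 8.34% accuracy difference reported above.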

Keywords


voice assistant; dictionary; non-standard words; normalization; Levenshtein distance; Jaro-Winkler distance






DOI: http://dx.doi.org/10.22146/jnteti.v8i3.516



Copyright (c) 2019 Jurnal Nasional Teknik Elektro dan Teknologi Informasi (JNTETI)

Jurnal Nasional Teknik Elektro dan Teknologi Informasi (JNTETI)
