Keyphrase Extraction in Cluster Merging Based on Maximum Common Subgraph

Adhi Nurilham, Diana Purwitasari, Chastine Fatichah

Abstract


Clustering documents by topic similarity makes it easier to search large collections of scientific articles, and cluster labeling is needed to give an overview of the topics the articles discuss. Document groups that still share contextual or topical similarity need to be merged to produce better cluster labels. The relations among words within one context can be represented as a graph model. This paper proposes a method for labeling groups of scientific articles whose contribution is graph-based cluster merging, with the aim of producing more representative topic labels. The proposed method begins by grouping scientific articles by topic using K-Means++. Candidate phrases are then extracted from the resulting document clusters using Frequent Phrase Mining, an adaptation of the Apriori algorithm. Each cluster is given a graph representation built from its candidate phrases. From these graphs, an indicator value is computed with Maximum Common Subgraph (MCS) to signal the presence of a shared node structure. Clusters are merged when their representation graphs share a common structure. The topic labels are the keyphrases extracted from the graph of the merged cluster, based on topic distribution, using the TopicRank algorithm. The proposed method is evaluated by the topic coherence of the resulting cluster labels. Test results show that the proposed method can improve topic coherence in merged clusters, and that the influencing factor is the size of the MCS graph as a percentage of the cluster graph. Further observation shows that the labels of merged clusters are consistent with the contents of the MCS graph.
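The merging step can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: each cluster graph uses words as uniquely labeled nodes, adjacent words in a candidate phrase are linked by an edge, the MCS is approximated by the shared edges and their endpoints (which is sufficient only because node labels are unique), graph size is counted in nodes, and the merge threshold of 0.5 is purely hypothetical. The function names `phrase_graph`, `common_subgraph`, `mcs_ratio`, and `should_merge` are illustrative.

```python
def phrase_graph(phrases):
    """Build an undirected word graph from candidate phrases.

    Assumption: nodes are words, and adjacent words in a phrase share an edge.
    """
    nodes, edges = set(), set()
    for phrase in phrases:
        words = phrase.lower().split()
        nodes.update(words)
        for a, b in zip(words, words[1:]):
            if a != b:
                edges.add(frozenset((a, b)))
    return nodes, edges


def common_subgraph(g1, g2):
    """Approximate the MCS of two graphs with uniquely labeled nodes.

    With unique labels, the common structure is the set of shared edges
    together with the endpoints of those edges.
    """
    edges = g1[1] & g2[1]
    nodes = set().union(*edges)
    return nodes, edges


def mcs_ratio(graph, mcs):
    """Size of the MCS relative to a cluster graph, counted in nodes."""
    return len(mcs[0]) / len(graph[0]) if graph[0] else 0.0


def should_merge(g1, g2, threshold=0.5):
    """Hypothetical merge rule: compare the MCS against the smaller graph."""
    mcs = common_subgraph(g1, g2)
    smaller = g1 if len(g1[0]) <= len(g2[0]) else g2
    return mcs_ratio(smaller, mcs) >= threshold


# Two toy clusters described by their candidate phrases.
cluster_a = phrase_graph(["topic model", "cluster labeling", "document cluster"])
cluster_b = phrase_graph(["document cluster", "cluster labeling", "graph model"])
shared = common_subgraph(cluster_a, cluster_b)
ratio = mcs_ratio(cluster_a, shared)
merge = should_merge(cluster_a, cluster_b)
```

In this toy example the two clusters share the edges document-cluster and cluster-labeling, so the common subgraph has three nodes out of five in each cluster graph, and the hypothetical rule would merge them. A real pipeline would need a proper MCS algorithm if node labels were not unique.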

Keywords


Cluster Labeling; Cluster Merging; Frequent Phrase Mining; Maximum Common Subgraph; TopicRank



DOI: http://dx.doi.org/10.22146/jnteti.v7i3.432



Copyright (c) 2018 Jurnal Nasional Teknik Elektro dan Teknologi Informasi (JNTETI)

JNTETI (Jurnal Nasional Teknik Elektro dan Teknologi Informasi)

Departemen Teknik Elektro dan Teknologi Informasi, Fakultas Teknik Universitas Gadjah Mada
Jl. Grafika No 2. Kampus UGM Yogyakarta 55281
+62 274 552305
jnteti@ugm.ac.id