Analysis and Comparison of Document Clustering Quality Using the K-Means and K-Medians Methods (Analisis dan Perbandingan Kualitas Pengelompokan Dokumen Dengan Menggunakan Metode K-Means Dan K-Medians)
DOI:
https://doi.org/10.22373/ekw.v1i2.536
Keywords:
Data Mining, Clustering, K-means, K-medians
Abstract
Analyzing a large set of documents is not an easy task. The common stages are document filtering, document selection, and document clustering. Clustering is a data mining technique for finding groups in data that has no predefined grouping. Many clustering algorithms have been introduced, two of which are K-means and K-medians. Both methods group documents based on the proximity of their word weights. This study compares the quality of the clusters produced by K-means and K-medians. The results show that K-medians yields better cluster quality than K-means, but it takes more time to cluster.
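To make the difference between the two methods concrete, here is a minimal sketch rather than the study's implementation. It assumes toy two-dimensional word-weight vectors, squared Euclidean distance for K-means, and Manhattan distance for K-medians (a conventional pairing that the abstract does not confirm). The `cluster` helper, the sample `docs`, and the choice of the first `k` points as initial centers are all hypothetical.

```python
from statistics import mean, median


def squared_euclidean(a, b):
    # Distance conventionally paired with K-means (assumption).
    return sum((x - y) ** 2 for x, y in zip(a, b))


def manhattan(a, b):
    # Distance conventionally paired with K-medians (assumption).
    return sum(abs(x - y) for x, y in zip(a, b))


def cluster(points, k, center, distance, max_iter=100):
    """Alternate assignment and center updates until labels stop changing.

    `center` is applied per dimension: `mean` gives K-means,
    `median` gives K-medians. Initial centers are the first k points.
    """
    centers = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: distance(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(center(dim) for dim in zip(*members))
    return labels, centers


# Hypothetical word-weight vectors (weight of term A, weight of term B).
# The last document is an outlier with an unusually large weight for term A.
docs = [
    (0.9, 0.1), (0.1, 0.9), (1.0, 0.0), (0.0, 1.0),
    (0.8, 0.2), (0.2, 0.8), (3.0, 0.0),
]

mean_labels, mean_centers = cluster(docs, 2, mean, squared_euclidean)
median_labels, median_centers = cluster(docs, 2, median, manhattan)

print("K-means  :", mean_labels, mean_centers)
print("K-medians:", median_labels, median_centers)
```

On this toy data both variants produce the same grouping, but the outlier pulls the K-means center of the first cluster further along term A than the K-medians center. This only illustrates how the two update rules differ; it does not reproduce the quality or timing results reported in the study.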
License
Proposed Policy for Journals That Offer Open Access
Authors who publish with the Elkawnie journal agree to the following terms:
a. Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
b. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
c. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (see The Effect of Open Access).