Analysis and Comparison of Document Clustering Quality Using the K-Means and K-Medians Methods

Bustami Bustami

doi:10.22373/ekw.v1i2.536

Authors

Bustami Bustami, Faculty of Science and Technology, Universitas Islam Negeri Ar-Raniry, Banda Aceh

Keywords:

Data Mining, Clustering, K-means, K-medians

Abstract

Analyzing a large collection of documents is not an easy task. The common stages are document filtering, document selection, and document clustering. Clustering is a data mining technique for finding groups in data that has no predefined grouping. Many clustering algorithms have been introduced; two of them are K-means and K-medians. Both methods group documents based on the proximity of the word weights between documents. This study compares the quality of the clusters produced by K-means and K-medians. The results show that K-medians produces better-quality clusters than K-means, but takes more time to cluster.
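
The difference between the two methods can be illustrated with a minimal sketch. It is not the implementation used in the study: the abstract does not state the distance measure, the word weighting scheme, or the initialization, so the sketch assumes squared Euclidean distance with a coordinate-wise mean for K-means, Manhattan distance with a coordinate-wise median for K-medians, and the first k points as starting centers. The function names and the toy term-weight vectors are hypothetical.

```python
from statistics import mean, median


def cluster(points, k, center_fn, dist_fn, max_iters=100):
    """Lloyd-style loop; center_fn and dist_fn select the variant."""
    # Assumption: start from the first k points (a simplification for
    # reproducibility, not a recommended initialization).
    centers = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(k), key=lambda j: dist_fn(p, centers[j])) for p in points
        ]
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
        # Update step: recompute each center coordinate by coordinate.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [center_fn(col) for col in zip(*members)]
    return labels, centers


def sq_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def k_means(points, k):
    return cluster(points, k, mean, sq_euclidean)


def k_medians(points, k):
    return cluster(points, k, median, manhattan)


# Hypothetical term-weight vectors: three documents dominated by the
# first two terms, three dominated by the last two.
docs = [
    [1.0, 0.9, 0.0, 0.0],
    [0.9, 1.0, 0.1, 0.0],
    [0.8, 0.8, 0.0, 0.1],
    [0.0, 0.1, 1.0, 0.9],
    [0.1, 0.0, 0.9, 1.0],
    [0.0, 0.0, 0.8, 0.8],
]

km_labels, km_centers = k_means(docs, 2)
kmed_labels, kmed_centers = k_medians(docs, 2)
print(km_labels, kmed_labels)
```

On this toy data both variants separate the two groups of documents. The only differences are the center update and the distance, and the median update requires sorting each coordinate, which is one plausible reason for K-medians taking longer.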
