Comparison of Diagnosis Prediction Results on the Indian Liver Patient Dataset (ILPD) Using Supervised Learning Techniques in Orange Software
DOI: https://doi.org/10.61769/telematika.v16i2.402
Keywords: data mining, Orange, Supervised Learning, ILPD
Abstract
The volume of data grows every day, which creates a need for data mining to extract valuable and meaningful information. Many data mining tools have been developed, both free and paid. One free tool is Orange, which supports both supervised and unsupervised modeling. Orange also provides model evaluation measures such as accuracy, precision, specificity, and the time required for training and testing. Orange therefore makes data mining easier, particularly for users with a non-IT background, such as healthcare practitioners who want to predict the diagnosis of a disease. These users do not need to focus on programming syntax, and with Orange they can predict a diagnosis more easily and quickly. This study uses the Indian Liver Patient Dataset (ILPD) from the UCI Machine Learning Repository. The prediction target is whether or not a patient has a liver disorder. The methods compared are Decision Tree, Random Forest, SVM, Neural Network, Naïve Bayes, k-NN, and Logistic Regression. The models are evaluated using a confusion matrix, accuracy, precision, training time, and testing time. The results show that the time required for training and testing is relatively short. On the data used, the four best methods by accuracy are Logistic Regression, Neural Network, Random Forest, and Naïve Bayes.
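The evaluation measures named in the abstract (confusion matrix, accuracy, precision, specificity) can be illustrated with a minimal Python sketch. This is not the study's workflow, which was carried out in Orange's graphical interface. The names `ConfusionMatrix` and `confusion_matrix` are hypothetical helpers, the choice of `1` as the positive ("liver disorder") label is an assumption, and the label lists are made up for illustration rather than taken from ILPD or from the paper's results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier (hypothetical helper, not Orange's API)."""
    tp: int  # predicted positive, actually positive
    fp: int  # predicted positive, actually negative
    tn: int  # predicted negative, actually negative
    fn: int  # predicted negative, actually positive

    @property
    def accuracy(self) -> float:
        total = self.tp + self.fp + self.tn + self.fn
        return (self.tp + self.tn) / total if total else 0.0

    @property
    def precision(self) -> float:
        predicted_positive = self.tp + self.fp
        return self.tp / predicted_positive if predicted_positive else 0.0

    @property
    def specificity(self) -> float:
        actual_negative = self.tn + self.fp
        return self.tn / actual_negative if actual_negative else 0.0


def confusion_matrix(actual, predicted, positive=1) -> ConfusionMatrix:
    """Tally the four confusion-matrix cells from paired labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return ConfusionMatrix(tp=tp, fp=fp, tn=tn, fn=fn)


# Made-up labels for illustration only; NOT ILPD data or results from the study.
# Assumption: 1 = liver disorder (positive class), 0 = no liver disorder.
actual = [1, 1, 1, 1, 0, 0, 0, 1, 0, 1]
predicted = [1, 1, 0, 1, 0, 1, 0, 1, 0, 1]

cm = confusion_matrix(actual, predicted)
print(cm)
print(f"accuracy={cm.accuracy:.2f} "
      f"precision={cm.precision:.2f} "
      f"specificity={cm.specificity:.2f}")
```

Accuracy alone can hide how a model treats each class, which is presumably why the study reports precision and a confusion matrix alongside it.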