Evaluation of Data Mining Techniques using Feature Selection Methods for Pima Indian Diabetes Dataset


  • Raghavendra Srinivasaiah, Santosh Kumar Jankatti, Raghavendra B. K


Abstract: Medical data mining is mainly concerned with detecting patterns, and the relationships between those patterns, in medical data. Improving prediction accuracy requires better classification and fewer misclassifications. Feature selection is one of the crucial steps and is largely concerned with identifying and removing as much irrelevant and redundant data as possible. In this work, the data mining techniques Artificial Neural Networks, Logistic Regression, Random Forest and Support Vector Machines are assessed by applying them to the PIMA Indian diabetes dataset, with attributes chosen by feature selection methods according to the Information Gain value of each individual attribute and with Percentage Split as the test option. The capability of the various models is gauged in terms of classification accuracy. Based on the experimental results, it was identified that Random Forest achieves an accuracy of 87.5%.
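
To make the attribute-ranking step concrete, the following is a minimal sketch of how Information Gain could be computed for discretized attributes and how a percentage split could be drawn. It is not the implementation used in this work. The helper names (entropy, information_gain, rank_attributes, percentage_split), the 66% training fraction, and the eight toy records are all assumptions made purely for illustration; only the attribute names glucose and bmi correspond to attributes of the PIMA dataset, and their values here are invented.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(values, labels):
    """IG = H(class) - H(class | attribute) for one discrete attribute."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    total = len(labels)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_attributes(rows, labels, names):
    """Return (name, IG) pairs sorted from most to least informative."""
    scores = [(name, information_gain([row[i] for row in rows], labels))
              for i, name in enumerate(names)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def percentage_split(rows, labels, train_fraction=0.66):
    """Split records in order: the first part trains, the rest tests.

    The 0.66 default is an assumption, not a value stated in this work.
    """
    cut = int(len(rows) * train_fraction)
    return rows[:cut], labels[:cut], rows[cut:], labels[cut:]


# Invented toy records (binned glucose, binned bmi); 1 = diabetic.
names = ["glucose", "bmi"]
rows = [("high", "high"), ("high", "low"), ("low", "high"), ("low", "low"),
        ("high", "high"), ("low", "low"), ("high", "low"), ("low", "high")]
labels = [1, 1, 0, 0, 1, 0, 1, 0]

ranking = rank_attributes(rows, labels, names)
train_x, train_y, test_x, test_y = percentage_split(rows, labels)

for name, gain in ranking:
    print(f"{name}: {gain:.3f}")
```

In a full pipeline, the top-ranked attributes would be retained and each classifier would be trained on the training portion and scored on the held-out portion. Continuous attributes would first need to be discretized, a step this sketch omits.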