Classification of Breast Cancer Data for Recurrence Identification with Decision Tree Approach


  • Nur Atiqah Hamzah, Sabariah Saharan, Khuneswari A/P Gopal Pillay


Classification is a data mining method that groups data into classes based on similarities shared within the data.
In this research, a decision tree is used for classification. Decision trees are widely used across many fields and
industries because their computational simplicity makes them suitable for analysing many kinds of data. Here, the
decision tree is applied to breast cancer data and its performance is evaluated. Cancer is one of the leading causes
of human death in the world, and breast cancer ranks among the highest of all cancers. In Malaysia, about 17.3% of
cancer patients suffered from breast cancer, as stated by the World Health Organization in 2018. Breast cancer is the
top cancer in women in both developed and developing countries. The breast cancer data were classified into two
groups, recurrence and non-recurrence, and the results are discussed. The attributes used for classification are the
patient's age, breast, breast quadrant, menopause, tumor size, involved nodes, node caps, degree of malignancy, and
irradiation. The accuracy of the decision tree is calculated from the confusion matrix.
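
As an illustration of the last point, accuracy is the share of correctly classified cases, that is, the sum of the diagonal of the confusion matrix divided by the total number of cases. The following is a minimal sketch in Python, not the authors' implementation. The function names, the label strings, and the toy label lists are assumptions made for illustration only and are not data or results from this study.

```python
from collections import Counter

# Hypothetical label strings; the abstract only names the two groups.
LABELS = ("recurrence", "non-recurrence")


def confusion_matrix(actual, predicted, labels=LABELS):
    """Return counts[a][p]: rows are actual classes, columns are predicted classes."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    pairs = Counter(zip(actual, predicted))
    return {a: {p: pairs[(a, p)] for p in labels} for a in labels}


def accuracy(matrix):
    """Accuracy = correctly classified cases (the diagonal) / all cases."""
    total = sum(sum(row.values()) for row in matrix.values())
    if total == 0:
        raise ValueError("confusion matrix is empty")
    correct = sum(matrix[label][label] for label in matrix)
    return correct / total


# Toy labels invented for illustration only; not results from the study.
actual = ["recurrence", "non-recurrence", "non-recurrence",
          "recurrence", "non-recurrence"]
predicted = ["recurrence", "non-recurrence", "recurrence",
             "non-recurrence", "non-recurrence"]

cm = confusion_matrix(actual, predicted)
print(cm)
print(accuracy(cm))
```

With these toy labels, three of the five cases fall on the diagonal. In practice the predicted labels would come from a fitted decision tree evaluated on held-out patients.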




