The Hepatitis.arff data set contains information about patients affected by hepatitis. The task is to generate a classification model to predict hepatitis histology: Yes/No.
Submit a report based on the answers for the following questions:
a) Select a suitable decision tree model for predicting Histology.
– Which model evaluation method did you use: cross-validation (CV) or hold-out (H-O)? Provide an overview of this method and explain why it was preferred.
– Interpret the classification outputs: the tree topology, the accuracy rates.
b) Provide a detailed description of the classification model:
– The tree induction algorithm
– The attribute selection criteria
– The pruning method
c) Vary the model parameters and discuss the impact on the classification results:
– Set the REP parameter (Reduced Error Pruning) to TRUE. Explain this tree pruning method. What impact did it have on the outputs, and why?
– Set the unpruned parameter to TRUE. Report and explain any change in the accuracy of the results and in the tree structure.
– Change the confidence factor to 15%. Report the impact on the classification outputs and explain the causes of the change.
d) Visualise the tree and generate a set of rules along the subtree path: Varices – Ascites – Spiders – Bilirubin – Sex – Class No. If you were to generate association rules from the tree, how could you reduce the number of rules (hint: speculate about Support and Confidence)?
e) Perform predictions using two other classification models of your choice, e.g. ANN, SVM, or an ensemble learner. Report the accuracy metrics and discuss the superiority or inferiority of these models' performance compared to the decision tree.
f) Create ROC and Lift charts and interpret them.
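
As a reminder of what the charts in part f) plot, the following is a minimal Python sketch, independent of whichever tool is used to build the models. The labels and scores are made-up toy values, not taken from Hepatitis.arff, and the function names (roc_points, auc, lift_at) are hypothetical helpers introduced only for this illustration. It assumes that class 1 is the positive class and that all scores are distinct, so ties are not handled.

```python
# Toy data (hypothetical): 1 = positive class, scores = predicted probability
# of the positive class. These values are NOT from Hepatitis.arff.
labels = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]


def roc_points(labels, scores):
    """Return (FPR, TPR) points from sweeping the threshold down the ranking."""
    pos = sum(labels)
    neg = len(labels) - pos
    # Rank instances from most to least confident positive prediction.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, label in ranked:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


def lift_at(labels, scores, fraction):
    """Positive rate in the top `fraction` of ranked instances / overall rate."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    k = max(1, round(fraction * len(ranked)))
    top_rate = sum(label for _, label in ranked[:k]) / k
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate


if __name__ == "__main__":
    pts = roc_points(labels, scores)
    print("ROC points:", pts)
    print("AUC:", auc(pts))
    for f in (0.25, 0.5, 1.0):
        print(f"Lift at top {int(f * 100)}%:", lift_at(labels, scores, f))
```

When interpreting the real charts, a ROC curve that bows towards the top-left corner (AUC well above 0.5) indicates a ranking better than chance, and a lift above 1 in the top-ranked fraction means that fraction contains proportionally more positive cases than the data set as a whole.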