Machine Learning Project 1 - Naive Bayes

Description: I completed this project in a two-person team with another student at Montana State University. Our task was to code a Naive Bayes model from scratch and apply it to classification datasets from the UCI Machine Learning Repository: the Breast Cancer Wisconsin, Glass Identification, Congressional Voting Records, Iris, and Soybean (Small) datasets. We first established a baseline for each dataset using 10-fold cross-validation, with precision, recall, and accuracy as the metrics. We then added artificial noise by selecting 10% of the features in each dataset at random and shuffling the values within each selected feature. Finally, we repeated the experiment on the noisy data to see how our Naive Bayes model performed with the added noise.
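The noise step described above can be sketched roughly as follows. This is an illustrative reconstruction, not the project's actual code: the function name `add_feature_noise`, its parameters, and the choice to always shuffle at least one column (for example, 10% of Iris's four features would otherwise round to zero) are all assumptions.

```python
import numpy as np

def add_feature_noise(X, fraction=0.10, seed=None):
    """Return a copy of X with a random subset of feature columns shuffled.

    Hypothetical helper: name, signature, and rounding rule are assumptions.
    """
    rng = np.random.default_rng(seed)
    X_noisy = np.array(X, copy=True)  # leave the caller's data untouched
    n_features = X_noisy.shape[1]

    # Assumption: shuffle at least one column, even when 10% rounds to zero.
    n_noisy = max(1, round(fraction * n_features))
    noisy_cols = rng.choice(n_features, size=n_noisy, replace=False)

    # Permute the values within each selected column independently, which
    # breaks the link between that feature and the class label while
    # preserving the feature's overall distribution.
    for col in noisy_cols:
        X_noisy[:, col] = rng.permutation(X_noisy[:, col])

    return X_noisy, np.sort(noisy_cols)
```

Because only the order of values inside a column changes, each noisy feature keeps the same set of values it had before; it simply stops carrying information about the class.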


Results: Our Naive Bayes model performed well on all five datasets, with a slight decrease in performance once the simulated noise was added.


Technologies: Python, NumPy, Matplotlib, UML, LaTeX


Note: If you would like to see the full design document, code base, and research paper that accompany this project, please feel free to reach out to me by email.