Date on Honors Thesis

Spring 5-7-2021


Computer Science

Examining Committee Member

Dr. Saptarshi Sengupta, Advisor

Examining Committee Member

Dr. Matthew Tennyson, Committee Member

Examining Committee Member

Dr. Goutam Mylavarapu, Committee Member


Clustering is a widely used unsupervised learning technique in data mining and machine learning, with applications in fields as diverse as astronomy, medical imaging, search and optimization, geology, geophysics, and sentiment analysis. It is therefore important to verify the effectiveness of the clustering algorithms in question and to make reasonably strong arguments for accepting the end results reported by the validity indices that measure the compactness and separability of clusters. This work explores the successes and limitations of popular clustering mechanisms such as K-Means and Fuzzy C-Means by comparing their performance on publicly available benchmark datasets that vary in both datapoint distribution and number of features. The comparison is made especially from a computational point of view and incorporates techniques that alleviate some of the issues that plague these algorithms; in particular, sensitivity to initialization conditions and stagnation in local minima are explored. Further, a feed-forward neural network that uses Particle Swarm Optimization, a guided random search technique, as its weight optimization strategy is implemented to examine the same problem from a classification point of view. The algorithms implemented in this work are studied and their results compared, yielding insights into their suitability for particular datasets.