Murray State Theses and Dissertations

Abstract

Data modeling and algorithmic modeling are two different approaches used in predictive analytics. Predictive modeling is used frequently in statistics and data science to find the relationship between explanatory (input) variables and a response (output) variable, and each approach can be advantageous depending on the data set. In our case, the output variable is ordinal. Using grade records from Murray State University, the goal is to find, by means of data modeling and algorithmic modeling, the best predictive model for an ordinal output. The models considered from these two approaches are the proportional odds logit model (POLR), the vector generalized linear model (VGLM), the classification and regression tree model (CART), and the random forests model (RF). Patterns in the data were analyzed using trigonometric polynomial approximations and fast Fourier transforms. To train the models, k-fold cross-validation is used to select the optimal tuning parameters and to estimate the performance of each model. The logarithmic loss (logLoss) metric is used to determine which method has the best predictive accuracy. The statistical models are compared, and alternative methods are discussed.
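For readers unfamiliar with the logLoss metric, the sketch below computes the standard multi-class logarithmic loss: the mean of the negative log-probability each model assigns to the observed class. It is an illustration, not the thesis's own code. The function name `multiclass_log_loss`, the integer class coding, and the clipping constant `eps` are assumptions made here for the example.

```python
import math


def multiclass_log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-probability assigned to the observed class.

    y_true: list of integer class indices (hypothetical coding, e.g. 0..4
            for five grade categories)
    probs:  list of per-observation probability lists, one entry per class
    eps:    clipping constant (an assumption) so log(0) never occurs
    """
    if len(y_true) != len(probs):
        raise ValueError("y_true and probs must have the same length")
    total = 0.0
    for label, row in zip(y_true, probs):
        # Clip the probability of the observed class into [eps, 1 - eps].
        p = min(max(row[label], eps), 1 - eps)
        total += -math.log(p)
    return total / len(y_true)


# Two observations, three classes: the model gives 0.7 and 0.8 to the true classes.
example = multiclass_log_loss([0, 1], [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
print(round(example, 4))
```

Lower values are better. A model that spreads probability uniformly over five categories scores ln 5 ≈ 1.609, which gives a simple baseline. The metric treats the classes as unordered, scoring only the probability placed on the observed category.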

Year manuscript completed

2020

Year degree awarded

2020

Author's Keywords

machine learning, predictive modeling, student grades, ordinal output, ordinal, proportional odds logistic regression, POLR, vector generalized linear models, VGLM, CART, classification trees, classification and regression trees, random forests, random forest, RF, logLoss, logarithmic loss, multi-class, CV, cross validation, k-fold cross validation, training, testing, trigonometric interpolating polynomials, fast Fourier transforms, FFT, Fourier series

Thesis Advisor

Donald Adongo

Thesis Co-Advisor

Christopher Mecklin

Committee Chair

Edward Thome

Committee Member

Manoj Pathak

Document Type

Thesis
