I’m analyzing student performance data. In my dataset each row corresponds to a student, and the columns contain several performance metrics (continuous) and the student type (categorical, 4 types). The student type was computed in another analysis using Expectation-Maximization, based on how students were graded over time. I only have a small sample of 50 students.
I want to understand what characterizes each student type in terms of the performance features I have. I want to detect relationships like “the higher a student’s grade, the more likely they are to belong to a particular cluster”, if such relationships are present in my data at all.
I have three questions:
I believe that what I need is Multinomial Logistic Regression. Am I right or is there a better way to achieve this?
If yes, I’ve been exploring Multinomial Logistic Regression in R, using the multinom function of the nnet package, but I need help with the following:
Understanding if the model has a good fit. So far I have the percentage of correctly classified instances, but I know this is not a very good measure of fit.
How to assess how good each individual predictor is. I know how to obtain the exponentiated $\beta$ coefficients, but I don’t know how to assess their significance. I read that using the t-distribution to compute the p-value here is usually a mistake. I found a similar post here, but a clear answer was not provided.
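To make the two sub-questions concrete, here is a minimal sketch of checks that could be run on a multinom fit: a likelihood-ratio test against an intercept-only model and McFadden's pseudo-$R^2$ for overall fit, then a likelihood-ratio test for dropping one predictor and Wald z-tests (normal rather than t-distribution) for individual coefficients. The data are simulated, and the column names (type, grade_z, attendance_z) and the helper lr_test are hypothetical placeholders, not my real dataset. With only 50 students and 4 types, the chi-square and normal approximations are rough, so the p-values should be read with caution.

```r
library(nnet)

set.seed(42)
n <- 50

# Simulated placeholder data (NOT the real dataset):
# 4 roughly balanced student types and 2 standardized metrics.
students <- data.frame(
  type = factor(rep(c("A", "B", "C", "D"), length.out = n)),
  grade_z = rnorm(n),
  attendance_z = rnorm(n)
)

# Hypothetical helper: likelihood-ratio test between two nested multinom fits.
# Uses the deviance and effective degrees of freedom stored in the fit objects.
lr_test <- function(small, big) {
  stat <- small$deviance - big$deviance
  df <- big$edf - small$edf
  c(stat = stat, df = df, p = pchisq(stat, df = df, lower.tail = FALSE))
}

full_fit     <- multinom(type ~ grade_z + attendance_z, data = students, trace = FALSE)
null_fit     <- multinom(type ~ 1, data = students, trace = FALSE)
no_grade_fit <- multinom(type ~ attendance_z, data = students, trace = FALSE)

# Overall fit: full model versus intercept-only model.
overall_lr <- lr_test(null_fit, full_fit)
mcfadden_r2 <- 1 - as.numeric(logLik(full_fit)) / as.numeric(logLik(null_fit))

# One predictor at a time: does dropping grade_z worsen the fit?
grade_lr <- lr_test(no_grade_fit, full_fit)

# Wald z-tests for each coefficient (one row per non-reference type).
s <- summary(full_fit)
wald_z <- s$coefficients / s$standard.errors
wald_p <- 2 * pnorm(abs(wald_z), lower.tail = FALSE)

print(overall_lr)
print(mcfadden_r2)
print(grade_lr)
print(round(wald_p, 3))
```

The likelihood-ratio test answers "does this predictor matter for any of the types", while the Wald table answers "does it distinguish this type from the reference type", so the two can disagree.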