Crossing categorical features that are stored as integers

Question

Newbie here. I'm experimenting with the following dataset:

https://archive.ics.uci.edu/ml/datasets/Teaching+Assistant+Evaluation

Data Set Information: The data consist of evaluations of teaching performance over three regular semesters and two summer semesters of 151 teaching assistant (TA) assignments at the Statistics Department of the University of Wisconsin-Madison. The scores were divided into 3 roughly equal-sized categories ("low", "medium", and "high") to form the class variable.

Attribute Information:

Whether or not the TA is a native English speaker (binary); 1=English speaker, 2=non-English speaker
Course instructor (categorical, 25 categories)
Course (categorical, 26 categories)
Summer or regular semester (binary) 1=Summer, 2=Regular
Class size (numerical)
Class attribute (categorical) 1=Low, 2=Medium, 3=High

The data looks like this:

1,23,3,1,19,3
2,15,3,1,17,3
1,23,3,2,49,3
1,5,2,2,33,3
2,7,11,2,55,3
2,23,3,1,20,3
2,9,5,2,19,3
...
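
For reference, here is a minimal sketch of how rows like these can be loaded into pandas. The sample is inlined instead of read from the real file, and apart from engNativ and course (the names I use further down) the column names are placeholders I made up, since the file has no header row.

```python
import io
import pandas as pd

# First rows of the dataset, inlined so the snippet is self-contained.
SAMPLE = """1,23,3,1,19,3
2,15,3,1,17,3
1,23,3,2,49,3
1,5,2,2,33,3
2,7,11,2,55,3
2,23,3,1,20,3
2,9,5,2,19,3
"""

# engNativ and course are the names used later in this post;
# the other names are placeholders (assumption).
COLUMNS = ["engNativ", "instructor", "course", "semester", "classSize", "score"]

df = pd.read_csv(io.StringIO(SAMPLE), header=None, names=COLUMNS)

# Every column is parsed as an integer, including the ones that are category codes.
print(df.dtypes)

# Mark the coded columns as categorical so they are not treated as numbers.
categorical = ["engNativ", "instructor", "course", "semester", "score"]
df = df.astype({name: "category" for name in categorical})
print(df.dtypes)
```

Only classSize stays numeric here; the other columns hold codes whose numeric value has no meaning.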

I want to cross 2 features (engNativ and course):

dataset_bin['courseHasNativeTA'] = dataset_con['courseHasNativeTA'] = dataset_con['engNativ'] + dataset_con['course']

import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use('seaborn-whitegrid')
fig = plt.figure(figsize=(20,10))
sns.countplot(y="courseHasNativeTA", data=dataset_bin)

I get the following output:

The problem is that the result seems to make no sense: the courses are supposed to be numbered from 1 to 26, yet the crossed feature goes from 2 to 28. I suspect the problem comes from the fact that engNativ and course are treated as numerical features instead of categorical ones, so the + adds the two codes together.
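
To illustrate the suspicion, here is a minimal sketch on a few made-up rows (not the real data). Adding the two integer columns produces sums, so different (engNativ, course) pairs can land on the same value. Joining the values as strings is one possible way to get a distinct label per pair; I am not sure it is the recommended approach.

```python
import pandas as pd

# Toy rows, not taken from the dataset; codes follow the attribute list above.
toy = pd.DataFrame({"engNativ": [1, 2, 1, 2], "course": [3, 2, 26, 26]})

# What my code does: integer addition, so (1, 3) and (2, 2) both become 4.
toy["summed"] = toy["engNativ"] + toy["course"]

# Possible alternative: treat the codes as labels and join them with a separator.
toy["crossed"] = toy["engNativ"].astype(str) + "_" + toy["course"].astype(str)

print(toy)
```

With the separator, the pair (1, 26) stays distinct from (2, 26), whereas the sums only tell me the total of the two codes.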

I read this related question, but I'm not quite sure how to apply it to my problem.

Any insight on this? Thanks.
