I have the following set of qualitative data:
 ______________________________________________
| Observations | City    | Transportation mean |
|______________|_________|_____________________|
| Individual#1 | Paris   | Bicycle             |
|--------------|---------|---------------------|
| Individual#2 | London  | Car                 |
|--------------|---------|---------------------|
| Individual#3 | Paris   | (Bicycle, Car)      |
|______________|_________|_____________________|
To analyze them, I want to run a Multiple Correspondence Analysis (MCA). However, for some variables (here Transportation mean), the modalities are not mutually exclusive (e.g. Individual#3 uses both Bicycle and Car).
Questions:
- Is it valid to run an MCA on such data?
- If so, how should the complete disjunctive table (CDT) be constructed?
Issue
If so, I'm not sure which of the following two CDTs is the better one to use:
 _______________________________________________________
| Observations | C_Paris | C_London | T_Bicycle | T_Car |
|______________|_________|__________|___________|_______|
| Individual#1 | 1       | 0        | 1         | 0     |
|--------------|---------|----------|-----------|-------|
| Individual#2 | 0       | 1        | 0         | 1     |
|--------------|---------|----------|-----------|-------|
| Individual#3 | 1       | 0        | 1         | 1     |
|______________|_________|__________|___________|_______|
 _______________________________________________________
| Observations | C_Paris | C_London | T_Bicycle | T_Car |
|______________|_________|__________|___________|_______|
| Individual#1 | 1       | 0        | 1         | 0     |
|--------------|---------|----------|-----------|-------|
| Individual#2 | 0       | 1        | 0         | 1     |
|--------------|---------|----------|-----------|-------|
| Individual#3 | 1       | 0        | .5        | .5    |
|______________|_________|__________|___________|_______|
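To compare the two candidate tables concretely, one option is to run MCA on each of them. A common way to compute MCA is as a correspondence analysis (CA) of the indicator matrix; the sketch below assumes that formulation and uses plain NumPy rather than any particular MCA package. The function name `ca_of_indicator` is hypothetical, and the matrices are simply the first (binary) and second (fractional) tables above.

```python
import numpy as np

# The two candidate CDTs (rows: Individual#1..3;
# columns: C_Paris, C_London, T_Bicycle, T_Car).
BINARY = np.array([[1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [1, 0, 1, 1]], dtype=float)
FRACTIONAL = np.array([[1, 0, 1, 0],
                       [0, 1, 0, 1],
                       [1, 0, 0.5, 0.5]], dtype=float)


def ca_of_indicator(Z):
    """Hypothetical helper: correspondence analysis of an indicator matrix Z.

    Returns the principal inertias (squared singular values) and the
    row principal coordinates.
    """
    P = Z / Z.sum()                 # correspondence matrix
    r = P.sum(axis=1)               # row masses
    c = P.sum(axis=0)               # column masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    inertias = s ** 2
    row_coords = (U * s) / np.sqrt(r)[:, None]  # D_r^{-1/2} U diag(s)
    return inertias, row_coords


for name, Z in [("binary", BINARY), ("fractional", FRACTIONAL)]:
    inertias, F = ca_of_indicator(Z)
    print(name, "row masses:", np.round(Z.sum(axis=1) / Z.sum(), 4))
    print(name, "principal inertias:", np.round(inertias, 4))
```

One concrete difference shows up in the row masses: in the binary table Individual#3 gets a mass of 3/7 against 2/7 for the others, while in the fractional table every individual keeps a mass of 1/3, as in an ordinary CDT where each row sums to the number of variables.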
The difference is that in the first table the weights are strictly binary (0 or 1), whereas in the second the weight in a cell is:
weight = bool_selected? / (number of modalities selected for this variable by this observation)
where bool_selected? is a boolean that equals 1 if the modality under consideration has been selected, and 0 otherwise.
For example, since Individual#3 selected two modalities (Bicycle and Car) of the variable Transportation mean, the weight of each Transportation mean modality for Individual#3 equals 1/2 = .5 if the modality is selected and 0/2 = 0 if not.
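Both tables can be built from the raw data with the weighting rule described above. This is a minimal sketch, assuming the raw answers are stored as one set of modalities per variable and observation; the names `RAW`, `COLUMNS` and `build_cdt` are hypothetical. With `fractional=False` it produces the first table, with `fractional=True` the second.

```python
# Raw data from the first table; multi-valued answers are stored as sets.
# (Hypothetical layout: observation -> variable -> set of selected modalities.)
RAW = {
    "Individual#1": {"City": {"Paris"}, "Transportation mean": {"Bicycle"}},
    "Individual#2": {"City": {"London"}, "Transportation mean": {"Car"}},
    "Individual#3": {"City": {"Paris"}, "Transportation mean": {"Bicycle", "Car"}},
}

# Column layout of the CDT: (prefix, variable, ordered modalities).
COLUMNS = [
    ("C", "City", ["Paris", "London"]),
    ("T", "Transportation mean", ["Bicycle", "Car"]),
]


def build_cdt(raw, columns, fractional=False):
    """Return (header, rows) of the indicator table.

    fractional=False -> binary coding (0 or 1).
    fractional=True  -> weight = bool_selected? / number of modalities
                        selected for that variable by that observation.
    """
    header = [f"{prefix}_{m}" for prefix, _, mods in columns for m in mods]
    rows = {}
    for obs, answers in raw.items():
        row = []
        for _, var, mods in columns:
            selected = answers[var]
            if not selected:
                # The fractional weight would divide by zero.
                raise ValueError(f"{obs} selected no modality for {var}")
            for m in mods:
                bool_selected = 1 if m in selected else 0
                row.append(bool_selected / len(selected) if fractional else bool_selected)
        rows[obs] = row
    return header, rows


header, binary = build_cdt(RAW, COLUMNS)
_, fractional = build_cdt(RAW, COLUMNS, fractional=True)
print(header)                      # ['C_Paris', 'C_London', 'T_Bicycle', 'T_Car']
print(binary["Individual#3"])      # [1, 0, 1, 1]
print(fractional["Individual#3"])  # [1.0, 0.0, 0.5, 0.5]
```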