I have this dataset of flows:
Source entity | Dest entity | Traffic | Cost | Source location | Dest location | Direction | more independent variables (mostly nominal)
Each entity also has a unit price ($ per unit of traffic), which takes values from a discrete set {p1, p2, p3}
and can be treated as ordinal (or continuous). I want to model this unit price using regression analysis.
The problem I'm facing is that the price is assigned to an entity (which can appear as either source or destination in the table above), not to a flow, and each row above is a flow. I'm assuming that flows are independent of each other (i.i.d.).
Would it be wise to attribute an entity's unit price to a flow?
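To make the question concrete, this is one way attributing entity prices to flows could look: every flow row gets two extra columns, the price of its source entity and the price of its destination entity. This is a minimal plain-Python sketch; the entity names, the numbers, the 1/2/3 coding of p1 < p2 < p3, and the helper `attach_prices` are all hypothetical.

```python
# Hypothetical toy data: entity names, prices and flow values are invented for illustration.
# Prices are coded 1, 2, 3 for the ordered levels p1 < p2 < p3.
entity_price = {"A": 1, "B": 2, "C": 3}

flows = [
    {"src": "A", "dst": "B", "traffic": 120.0, "cost": 30.0},
    {"src": "B", "dst": "C", "traffic": 80.0, "cost": 25.0},
    {"src": "A", "dst": "C", "traffic": 50.0, "cost": 12.0},
]


def attach_prices(flows, entity_price):
    """Return new flow rows with the source and destination entity prices added."""
    enriched = []
    for row in flows:
        new_row = dict(row)  # copy so the input rows are left unchanged
        new_row["src_price"] = entity_price[row["src"]]
        new_row["dst_price"] = entity_price[row["dst"]]
        enriched.append(new_row)
    return enriched


enriched = attach_prices(flows, entity_price)
for row in enriched:
    print(row["src"], row["dst"], row["src_price"], row["dst_price"])
```

In this toy table entity A's price shows up in two rows, so rows that share an entity repeat the same price value.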
I know I could instead aggregate the data by entity, but I'm afraid that could be disadvantageous because:
- Information would likely be lost
- Dependencies are introduced among rows
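For comparison, a minimal sketch of what aggregating by entity might look like, in plain Python on a made-up flow table. Which summaries to keep (traffic and cost in and out, flow count) is an illustrative assumption, as are the entity names and numbers.

```python
from collections import defaultdict

# Hypothetical toy flow table: source, destination, traffic, cost.
flows = [
    {"src": "A", "dst": "B", "traffic": 120.0, "cost": 30.0},
    {"src": "B", "dst": "C", "traffic": 80.0, "cost": 25.0},
    {"src": "A", "dst": "C", "traffic": 50.0, "cost": 12.0},
]


def new_stats():
    """Empty per-entity summary; the chosen fields are an illustrative assumption."""
    return {"traffic_out": 0.0, "traffic_in": 0.0, "cost_out": 0.0, "cost_in": 0.0, "n_flows": 0}


def aggregate_by_entity(flows):
    """Collapse flow rows into one summary row per entity (as source and as destination)."""
    stats = defaultdict(new_stats)
    for row in flows:
        src = stats[row["src"]]
        src["traffic_out"] += row["traffic"]
        src["cost_out"] += row["cost"]
        src["n_flows"] += 1
        dst = stats[row["dst"]]
        dst["traffic_in"] += row["traffic"]
        dst["cost_in"] += row["cost"]
        dst["n_flows"] += 1  # each endpoint of a flow increments its entity's count
    return dict(stats)


per_entity = aggregate_by_entity(flows)
for entity, s in sorted(per_entity.items()):
    print(entity, s)
```

Three flow rows become three entity rows holding only sums and counts; per-flow details such as which partner each flow went to, or the location and direction columns, are no longer visible, which illustrates the information-loss concern above.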
Also, which models would make sense? I'm inclined towards regression because of its simplicity.
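Since the price takes one of three ordered values, one option sometimes used for such a target is ordinal (cumulative-logit, or proportional-odds) regression, where P(Y <= k | x) = sigmoid(theta_k - x·beta) with increasing cut points theta_k. The sketch below only shows how such a model turns a linear predictor into probabilities for p1, p2, p3; the coefficients, thresholds and predictor values are made-up numbers, not fitted values, and `proportional_odds_probs` is a hypothetical helper. Fitting is usually left to a library, for example `OrderedModel` in statsmodels or `polr` in R's MASS package.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def proportional_odds_probs(x, beta, thresholds):
    """Class probabilities for ordered levels under a cumulative-logit model.

    P(Y <= k | x) = sigmoid(thresholds[k] - x . beta) for k = 0..K-2,
    with thresholds strictly increasing; P(Y <= K-1 | x) = 1.
    """
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    eta = sum(xi * bi for xi, bi in zip(x, beta))  # linear predictor
    cumulative = [sigmoid(t - eta) for t in thresholds] + [1.0]
    # Difference consecutive cumulative probabilities to get P(Y = k | x).
    return [cumulative[0]] + [cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))]


# Hypothetical coefficients for two numeric predictors (e.g. scaled traffic and cost).
beta = [1.2, -0.4]
thresholds = [-0.5, 1.0]  # cut points between p1|p2 and p2|p3
probs = proportional_odds_probs([0.5, 1.0], beta, thresholds)
print([round(p, 3) for p in probs])
```

Because all three levels share one beta, a larger linear predictor shifts probability mass toward the higher price levels, which is how this model respects the ordering p1 < p2 < p3.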
I'd appreciate any help or references. Thanks!
PS: I'm already confused about how to deal with this many nominal variables.
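For the nominal columns (locations, direction, and so on), a common starting point is dummy (one-hot) encoding with one reference level dropped per variable, so that a regression with an intercept stays identifiable. A minimal plain-Python sketch; the helper `one_hot`, the column name `src_location` and its values are hypothetical.

```python
def one_hot(rows, column, drop_first=True):
    """Dummy-encode one nominal column; returns encoded rows and the sorted level list.

    With drop_first=True the alphabetically first level is the reference
    category and is represented by all-zero dummies.
    """
    levels = sorted({row[column] for row in rows})
    kept = levels[1:] if drop_first else levels
    encoded = [
        {f"{column}={level}": int(row[column] == level) for level in kept}
        for row in rows
    ]
    return encoded, levels


# Hypothetical nominal values, for illustration only.
rows = [{"src_location": "EU"}, {"src_location": "US"}, {"src_location": "APAC"}]
encoded, levels = one_hot(rows, "src_location")
for original, dummies in zip(rows, encoded):
    print(original["src_location"], dummies)
```

Each nominal variable with L levels adds L - 1 columns this way, which is one reason many high-cardinality nominal variables become unwieldy quickly.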