Analyses in which more than one variable is analyzed together, where these variables are either dependent (response) variables or the only variables in the analysis. This contrasts with "multiple" or "multivariable" analysis, which implies more than one predictor (independent) variable.
Questions tagged [multivariate-analysis]
2304 questions
121
votes
4 answers
Is it possible to have a pair of Gaussian random variables for which the joint distribution is not Gaussian?
Somebody asked me this question in a job interview and I replied that their joint distribution is always Gaussian. I thought that I could always write a bivariate Gaussian with their means, variances, and covariance. I am wondering if there can be a…

MarkSAlen
114
votes
5 answers
What skills are required to perform large scale statistical analyses?
Many statistical jobs ask for experience with large scale data. What sorts of statistical and computational skills are needed for working with large data sets? For example, how about building regression models given a data set with…

bit-question
98
votes
13 answers
What is the best way to identify outliers in multivariate data?
Suppose I have a large set of multivariate data with at least three variables. How can I find the outliers? Pairwise scatterplots won't work as it is possible for an outlier to exist in 3 dimensions that is not an outlier in any of the 2 dimensional…

Rob Hyndman
76
votes
2 answers
Multivariate multiple regression in R
I have 2 dependent variables (DVs), each of whose scores may be influenced by the set of 7 independent variables (IVs). DVs are continuous, while the set of IVs consists of a mix of continuous and binary coded variables. (In code below continuous…

Andrej
69
votes
2 answers
What is the relationship between independent component analysis and factor analysis?
I am new to Independent Component Analysis (ICA) and have just a rudimentary understanding of the method. It seems to me that ICA is similar to Factor Analysis (FA) with one exception: ICA assumes that the observed random variables are a linear…

stats_student
60
votes
5 answers
Is adjusting p-values in a multiple regression for multiple comparisons a good idea?
Let's assume you are a social science researcher/econometrician trying to find relevant predictors of demand for a service. You have 2 outcome/dependent variables describing the demand (using the service yes/no, and the number of occasions). You have…

Mikael M
58
votes
3 answers
What is the intuition behind conditional Gaussian distributions?
Suppose that $\mathbf{X} \sim N_{2}(\mathbf{\mu}, \mathbf{\Sigma})$. Then the conditional distribution of $X_1$ given that $X_2 = x_2$ is multivariate normally distributed with mean:
$$ E[P(X_1 | X_2 = x_2)] =…

eroeijr
52
votes
5 answers
How are propensity scores different from adding covariates in a regression, and when are they preferred to the latter?
I admit I'm relatively new to propensity scores and causal analysis.
One thing that's not obvious to me as a newcomer is how the "balancing" using propensity scores is mathematically different from what happens when we add covariates in a…

Frank Barry
43
votes
1 answer
PCA and Correspondence analysis in their relation to Biplot
Biplot is often used to display results of principal component analysis (and of related techniques). It is a dual or overlay scatterplot showing component loadings and component scores simultaneously. I was informed by @amoeba today that he has…

ttnphns
38
votes
7 answers
Is there an accepted definition for the median of a sample on the plane, or higher ordered spaces?
If so, what?
If not, why not?
For a sample on the line, the median minimizes the total absolute deviation. It would seem natural to extend the definition to $\mathbb{R}^2$, etc., but I've never seen it. But then, I've been out in left field for a long time.

phv3773
36
votes
4 answers
What test can I use to compare slopes from two or more regression models?
I would like to test the difference in response of two variables to one predictor. Here is a minimal reproducible example.
library(nlme)
## gls is used in the application; lm would suffice for this example
m.set <- gls(Sepal.Length ~ Petal.Width,…

Abe
35
votes
5 answers
Measuring the "distance" between two multivariate distributions
I'm looking for some good terminology to describe what I'm trying to do, to make it easier to look for resources.
So, say I have two clusters of points A and B, each associated with two values, X and Y, and I want to measure the "distance" between A…

Emile
31
votes
3 answers
What's in a name: Precision (inverse of variance)
Intuitively, the mean is just the average of observations. The variance is how much these observations vary from the mean.
I would like to know why the inverse of the variance is known as the precision. What intuition can we make from this? And why…

cgo
30
votes
6 answers
Variable selection procedure for binary classification
What variable/feature selection methods do you prefer for binary classification when there are many more variables/features than observations in the learning set? The aim here is to discuss which feature selection procedure reduces the…

robin girard
30
votes
1 answer
SVD of correlated matrix should be additive but doesn't appear to be
I'm just trying to replicate a claim made in the following paper, Finding Correlated Biclusters from Gene Expression Data, which is:
Proposition 4. If $X_{IJ}=R_{I}C^{T}_{J}$, then we have:
i. If $R_{I}$ is a perfect bicluster with additive model,…

zzk