Questions tagged [underdetermined]

Analyses are underdetermined (cf. Wikipedia: underdetermined system) when the number of parameters to be estimated is greater than the number of data points. This problem is also referred to as 'p >> n'. A practical example is a genome-wide association study, which attempts to determine whether any of a large number of genetic variants predict a disease.
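
A minimal numerical sketch of the p >> n situation (assuming Python with numpy and scikit-learn; the data are simulated, not from a real study): with more coefficients than observations, ordinary least squares has infinitely many exact solutions, whereas a penalized estimator such as the lasso picks out a single sparse one.

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    n, p = 50, 1000                     # far fewer observations than parameters
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:5] = 2.0                      # only five coefficients are truly nonzero
    y = X @ beta + 0.1 * rng.standard_normal(n)

    # Unpenalized least squares: lstsq returns the minimum-norm solution,
    # one of infinitely many coefficient vectors that fit the data exactly.
    ols = np.linalg.lstsq(X, y, rcond=None)[0]
    print("OLS nonzero coefficients:", np.sum(np.abs(ols) > 1e-8))

    # The lasso's L1 penalty makes the problem well posed and the estimate sparse.
    lasso = Lasso(alpha=0.1).fit(X, y)
    print("lasso nonzero coefficients:", np.sum(lasso.coef_ != 0))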

23 questions
31 votes, 1 answer

Feature selection & model with glmnet on Methylation data (p>>N)

I would like to use GLM and Elastic Net to select the relevant features and build a linear regression model (i.e., both prediction and understanding, so it would be better to be left with relatively few parameters). The output is continuous. It's…
PGreen
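
The question above is about R's glmnet; as a rough analogue only (a sketch on simulated placeholder data, not the asker's pipeline), here is the same idea in Python with scikit-learn's ElasticNetCV: cross-validate the penalty, keep the features with nonzero coefficients, and refit a plain linear model on that subset for interpretation (l1_ratio plays the role of glmnet's alpha).

    import numpy as np
    from sklearn.linear_model import ElasticNetCV, LinearRegression

    rng = np.random.default_rng(1)
    n, p = 100, 2000                     # placeholder sizes standing in for p >> N methylation data
    X = rng.standard_normal((n, p))
    y = X[:, :3] @ np.array([1.0, -2.0, 0.5]) + rng.standard_normal(n)

    # Elastic net with the penalty strength chosen by 5-fold cross-validation.
    enet = ElasticNetCV(l1_ratio=0.5, cv=5).fit(X, y)
    selected = np.flatnonzero(enet.coef_)
    print("selected features:", selected)

    # Refit an ordinary linear model on the selected columns only; this is easier
    # to interpret, though post-selection inference caveats apply.
    refit = LinearRegression().fit(X[:, selected], y)
    print("refit coefficients:", refit.coef_)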
31 votes, 5 answers

Detecting significant predictors out of many independent variables

In a dataset of two non-overlapping populations (patients & healthy, total $n=60$) I would like to find (out of $300$ independent variables) significant predictors for a continuous dependent variable. Correlation between predictors is present. I am…
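
One common first pass for a question like this (a sketch on simulated data, not the asker's dataset, and only a screening step rather than a joint model): univariate tests of each predictor against the continuous outcome with a false-discovery-rate correction, here via scikit-learn's SelectFdr. With 300 correlated predictors and n = 60, anything surviving the screen still needs to be checked in a multivariable model.

    import numpy as np
    from sklearn.feature_selection import SelectFdr, f_regression

    rng = np.random.default_rng(2)
    n, p = 60, 300                          # sizes taken from the question
    X = rng.standard_normal((n, p))
    y = 2 * X[:, 0] - X[:, 1] + rng.standard_normal(n)

    # Univariate F-tests of each predictor against the continuous outcome,
    # keeping those below a Benjamini-Hochberg FDR threshold of 5%.
    screen = SelectFdr(score_func=f_regression, alpha=0.05).fit(X, y)
    print("predictors passing the FDR screen:", np.flatnonzero(screen.get_support()))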
14 votes, 2 answers

Can one (theoretically) train a neural network with fewer training samples than weights?

First of all: I know there is no general rule for the sample size required to train a neural network. It depends on far too many factors, such as the complexity of the task, the noise in the data, and so on. And the more training samples I have, the better will be…
Hobbit
13 votes, 1 answer

Applying ridge regression for an underdetermined system of equations?

When $y = X\beta + e$, the least squares problem which imposes a spherical restriction $\delta$ on the value of $\beta$ can be written as $\min_\beta \; \| y - X\beta \|^2_2$ subject to $\| \beta \|^2_2 \le \delta$ …
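
In the penalized (Lagrangian) form of that constrained problem, $\min_\beta \|y - X\beta\|_2^2 + \lambda \|\beta\|_2^2$, the solution $\hat\beta = (X^\top X + \lambda I)^{-1} X^\top y$ is well defined even when $X^\top X$ is singular because $n < p$. A small numerical sketch (simulated data and an arbitrary $\lambda$, assuming Python with numpy):

    import numpy as np

    rng = np.random.default_rng(3)
    n, p = 20, 100                    # underdetermined: fewer equations than unknowns
    X = rng.standard_normal((n, p))
    y = rng.standard_normal(n)
    lam = 1.0                         # arbitrary ridge penalty for illustration

    # X'X is p x p with rank at most n, hence singular, but adding lam*I fixes that.
    beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

    # Equivalent dual form using the n x n matrix XX' + lam*I (cheaper when p >> n).
    beta_dual = X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)
    print(np.allclose(beta_ridge, beta_dual))     # True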
6 votes, 4 answers

Solving a practical machine learning problem

I am currently doing my PhD in computational biology at Stanford. I get the data I need to answer the questions I am interested in. The data sets are sometimes "large", and these large problems take a long time to solve (a couple of days…
Sid
5 votes, 3 answers

SVM has relatively low classification rate for high-dimensional data even though 2-D projections show they are separable

I have another problem, with 14000 features and 500 training samples. It is a binary classification problem, and the data are approximately in the form of an ellipse. My classification accuracy using a 2nd-degree polynomial kernel, estimated via CV, is ~80%. However,…
user27525
5 votes, 2 answers

Why is it bad if number of dimensions / factors > sample size?

I've been told (read) this many times, but I never understood why it's bad for the number of dimensions in your data, or the number of explanatory variables in your model, to be higher than your number of samples. Why is this the case?
tmakino
5 votes, 3 answers

Why is $n < p$ a problem for OLS regression?

I realize I can't invert the $X'X$ matrix but I can use gradient descent on the quadratic loss function and get a solution. I can then use those estimates to calculate standard errors and residuals. Am I going to encounter any problems doing this?
badmax
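
Gradient descent will indeed drive the quadratic loss to zero here, but with $n < p$ the minimizer is not unique, which is the core problem: the individual coefficients (and hence standard errors for them) are not identified. A small sketch showing two different coefficient vectors with identical fitted values (simulated data, assuming Python with numpy):

    import numpy as np

    rng = np.random.default_rng(4)
    n, p = 30, 80
    X = rng.standard_normal((n, p))
    y = rng.standard_normal(n)

    # Minimum-norm exact solution (what gradient descent started at zero converges to).
    beta1 = np.linalg.lstsq(X, y, rcond=None)[0]

    # Adding any vector from the null space of X gives another exact solution.
    _, _, Vt = np.linalg.svd(X)
    null_vec = Vt[-1]                       # satisfies X @ null_vec ~= 0
    beta2 = beta1 + 10 * null_vec

    print(np.allclose(X @ beta1, y), np.allclose(X @ beta2, y))   # True True
    print(np.allclose(beta1, beta2))                              # False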
5 votes, 0 answers

How to identify a SEM with formative dependent variable (with R's lavaan package)?

I have a formative construct in a structural equation model (SEM) which I would like to estimate with the function sem in the lavaan package in R. Currently, the model is underidentified. I know about four different approaches for identifying the…
jhg
4 votes, 0 answers

What is scikit-learn's LinearRegression doing when there are more features than observations?

I'm trying to understand what sklearn's LinearRegression (which should be using ordinary least squares) is doing when there are more features than observations.

    import numpy as np
    from sklearn.linear_model import LinearRegression
    X =…
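
As far as I can tell from the scikit-learn source, LinearRegression fits dense data with a least-squares solver (scipy.linalg.lstsq), which returns the minimum-norm solution when the system is underdetermined, so with more features than observations it interpolates the training data. A quick check on tiny simulated data (a sketch, not the asker's example):

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(5)
    X = rng.standard_normal((5, 20))        # more features than observations
    y = rng.standard_normal(5)

    lr = LinearRegression().fit(X, y)

    # Compare with the minimum-norm least-squares solution of the centered problem
    # (centering mirrors what fit_intercept=True does internally).
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    min_norm = np.linalg.lstsq(Xc, yc, rcond=None)[0]

    print(np.allclose(lr.coef_, min_norm))   # True: same coefficients
    print(np.allclose(lr.predict(X), y))     # True: training data reproduced exactly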
4 votes, 1 answer

Fitting least squares when the number of predictors is larger than the number of instances

A statement from the book Introduction to Statistical Learning with Applications in R didn't quite make sense to me. It says, "In cases where the number of predictors is greater than the number of instances, we cannot even fit the multiple linear regression…
3 votes, 1 answer

Linear discriminant analysis with $p\gg n$

I am studying Linear Discriminant Analysis (LDA). According to the formula for LDA, we are supposed to take the inverse of the within-group covariance matrix. However, if $p\gg n$ (i.e., the dimension is much larger than the number of samples), what should I…
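
With $p \gg n$ the pooled within-class covariance is singular, so its inverse does not exist; common workarounds are to regularize (shrink) the covariance estimate or to reduce the dimension first (e.g., with PCA). A sketch of shrinkage LDA on simulated data (assuming Python with scikit-learn; not a statement about the asker's data):

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    rng = np.random.default_rng(6)
    n, p = 40, 500                       # p >> n: the within-class covariance is singular
    X = rng.standard_normal((n, p))
    y = np.repeat([0, 1], n // 2)
    X[y == 1, :3] += 2.0                 # shift a few dimensions so the classes differ

    # Plain LDA needs the inverse of the pooled covariance, which does not exist here;
    # the 'lsqr' solver with Ledoit-Wolf shrinkage uses a regularized estimate instead.
    lda = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto").fit(X, y)
    print("training accuracy:", lda.score(X, y))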
3 votes, 0 answers

LASSO prediction model question

I am trying to create a prediction model with 33 predictors (brain metabolite levels in various regions) and 8 observations (cognitive test scores), a p >> n problem, using LASSO in MATLAB (the lassoglm function). When I run LASSO 100 times with 5-fold…
2 votes, 1 answer

Dealing with underdetermination in Bayesian models

Bayesian models are supposedly well equipped to deal with high-dimensional problems, and can handle sparse data well, too. But suppose I've created a model that estimates more parameters than there are data points. Are there tricks to deal with…
Brash Equilibrium
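
One way to see why proper priors help (a toy numpy sketch with made-up variances, not a full Bayesian workflow): with a Gaussian likelihood and a proper Gaussian prior on the coefficients, the posterior mean/mode is unique even when there are more parameters than data points, because the prior precision makes the relevant matrix invertible.

    import numpy as np

    rng = np.random.default_rng(7)
    n, p = 15, 60                    # more coefficients than data points
    X = rng.standard_normal((n, p))
    y = rng.standard_normal(n)
    sigma2, tau2 = 1.0, 1.0          # assumed noise and prior variances (illustrative)

    # For beta ~ N(0, tau2 I) and y | beta ~ N(X beta, sigma2 I), the posterior mean is
    # (X'X / sigma2 + I / tau2)^{-1} X'y / sigma2 -- unique despite n < p.
    A = X.T @ X / sigma2 + np.eye(p) / tau2
    post_mean = np.linalg.solve(A, X.T @ y / sigma2)
    print(post_mean.shape)           # (60,): a single well-defined estimate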
2 votes, 1 answer

Dataset for Least Angle Regression

I have read that least angle regression is good for high-dimensional data. I didn't actually understand the meaning of high-dimensional data, so does this mean the $p \gg n$ case? And does anyone know any good dataset with such properties on which we can…
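
Yes, "high-dimensional" here usually means the $p \gg n$ case. If no real dataset is at hand, one option (a sketch, not a recommendation of a particular benchmark) is to simulate such data and run a LARS-based fit, e.g. with scikit-learn:

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import LassoLarsCV

    # Simulated p >> n data: 50 samples, 1000 features, 10 of them informative.
    X, y = make_regression(n_samples=50, n_features=1000, n_informative=10,
                           noise=1.0, random_state=0)

    # Lasso solved by the LARS algorithm, with the penalty chosen by cross-validation.
    model = LassoLarsCV(cv=5).fit(X, y)
    print("nonzero coefficients:", np.count_nonzero(model.coef_))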