Tags - Statistical Analysis Stack Exchange

r

Use this tag for any *on-topic* question that (a) involves `R` either as a critical part of the question or expected answer, & (b) is not *just* about how to use `R`.

26743 questions

regression

Techniques for analyzing the relationship between one (or more) "dependent" variables and "independent" variables.

26055 questions

machine-learning

Machine learning algorithms build a model of the training data. The term "machine learning" is vaguely defined; it includes what is also called statistical learning, reinforcement learning, unsupervised learning, etc. ALWAYS ADD A MORE SPECIFIC TAG.

18068 questions

time-series

Time series are data observed over time (either in continuous time or at discrete time periods).

12965 questions

probability

A probability provides a quantitative description of the likely occurrence of a particular event.

11025 questions

hypothesis-testing

Hypothesis testing assesses whether data are inconsistent with a given hypothesis rather than being an effect of random fluctuations.

9227 questions

distributions

A distribution is a mathematical description of probabilities or frequencies.

8590 questions

self-study

A routine exercise designed to test one's knowledge; often from a textbook, course, or test used for a class or self-study. This community's policy is to "provide helpful hints" for such questions rather than complete answers.

7595 questions

neural-networks

Artificial neural networks (ANNs) are a broad class of computational models loosely based on biological neural networks. They encompass feedforward NNs (including "deep" NNs), convolutional NNs, recurrent NNs, etc.

7277 questions

bayesian

Bayesian inference is a method of statistical inference that relies on treating the model parameters as random variables and applying Bayes' theorem to deduce subjective probability statements about the parameters or hypotheses, conditional on the observed dataset.

7099 questions

logistic

Refers generally to statistical procedures that utilize the logistic function, most commonly various forms of logistic regression.

7015 questions
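
As a small illustration of the logistic function named in this tag, the sketch below maps a real-valued linear predictor to a probability and back again. The function names `logistic` and `logit` are chosen here for illustration and are not taken from any particular library.

```python
from math import exp, log

def logistic(x):
    """Map a real number to (0, 1); branches avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + exp(-x))
    z = exp(x)
    return z / (1.0 + z)

def logit(p):
    """Inverse of the logistic function: the log-odds of probability p."""
    return log(p / (1.0 - p))

# A linear predictor of 0 corresponds to a probability of one half.
half = logistic(0.0)
# Applying logit to a logistic output recovers the linear predictor.
round_trip = logit(logistic(1.3))
```

In logistic regression the argument of `logistic` would be a linear combination of the predictors, so the fitted coefficients are read on the log-odds scale.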

mathematical-statistics

Mathematical theory of statistics, concerned with formal definitions and general results.

6815 questions

classification

Statistical classification is the problem of identifying the sub-population to which new observations belong, where the identity of the sub-population is unknown, on the basis of a training set of data containing observations whose sub-population is known. These classifications therefore exhibit variability that can be studied statistically.

6303 questions

correlation

A measure of the degree of linear association between a pair of variables.

5702 questions
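
To make the word "linear" in the correlation description concrete, here is a minimal sketch of the Pearson coefficient, assuming that is the measure intended. The helper name `pearson_r` and the toy data are illustrative assumptions.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
linear = [2 * v + 1 for v in x]      # exact straight-line relationship
curved = [(v - 3) ** 2 for v in x]   # exact but symmetric, non-linear relationship

r_linear = pearson_r(x, linear)
r_curved = pearson_r(x, curved)
```

Here `curved` is completely determined by `x`, yet its coefficient is zero, which is why the coefficient should be read as a measure of linear association only.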

statistical-significance

Statistical significance is a characteristic of a statistic viewed in light of a null hypothesis and a given significance level. It reflects whether the statistic belongs to the rejection region (is statistically significant) or the acceptance region (is not statistically significant).

5649 questions

normal-distribution

The normal, or Gaussian, distribution has a density function that is a symmetrical bell-shaped curve. It is one of the most important distributions in statistics. Use the [normality] tag for asking about testing for normality.

5424 questions

mixed-model

Mixed (aka multilevel or hierarchical) models are linear models that include both fixed effects and random effects. They are used to model longitudinal or nested data.

5314 questions

multiple-regression

Regression that includes two or more non-constant independent variables. Also known as multivariable regression.

4906 questions

anova

ANOVA stands for ANalysis Of VAriance, a statistical model and set of procedures for comparing multiple group means. The independent variables in an ANOVA model are categorical, but an ANOVA table can be used to test continuous variables as well.

4844 questions

python

Python is a programming language commonly used for machine learning. Use this tag for any *on-topic* question that (a) involves `Python` either as a critical part of the question or expected answer, & (b) is not *just* about how to use `Python`.

4198 questions

confidence-interval

A confidence interval is an interval that covers an unknown parameter with $100(1-\alpha)\%$ confidence. Confidence intervals are a frequentist concept. They are often confused with credible intervals, which are the Bayesian analog.

4058 questions
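
As a rough sketch of how such an interval is computed, the code below builds an approximate interval for a mean with confidence level 1 − α. It assumes a normal critical value; for small samples a Student t critical value is the usual choice, but the Python standard library has no t quantile function. The function name `normal_mean_ci` and the sample values are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def normal_mean_ci(sample, alpha=0.05):
    """Approximate CI for the mean at confidence level 1 - alpha (normal critical value)."""
    n = len(sample)
    z = NormalDist().inv_cdf(1 - alpha / 2)       # about 1.96 when alpha = 0.05
    half_width = z * stdev(sample) / sqrt(n)      # critical value times standard error
    centre = mean(sample)
    return centre - half_width, centre + half_width

sample = [4.8, 5.1, 5.3, 4.9, 5.4, 5.0, 5.2, 5.1]  # hypothetical measurements
low, high = normal_mean_ci(sample)                  # 95% interval
low99, high99 = normal_mean_ci(sample, alpha=0.01)  # 99% interval is wider
```

The confidence level describes the long-run coverage of the procedure over repeated samples, not the probability that this particular interval contains the parameter.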

generalized-linear-model

A generalization of linear regression allowing for nonlinear relationships via a "link function" and for the variance of the response to depend on the predicted value. (Not to be confused with "general linear model" which extends the ordinary linear model to general covariance structure and multivariate response.)

3987 questions

variance

The expected squared deviation of a random variable from its mean; or, the average squared deviation of data about their mean.

3779 questions
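
The second half of the variance definition (the average squared deviation of data about their mean) can be sketched directly. The helper name `mean_squared_deviation` and the data are illustrative; `pvariance` and `variance` are the standard-library functions for the population and sample versions.

```python
from statistics import mean, pvariance, variance

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

def mean_squared_deviation(values):
    """Average squared deviation of the values about their mean (divides by n)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

population_var = mean_squared_deviation(data)  # matches statistics.pvariance
sample_var = variance(data)                    # divides by n - 1 instead of n
```

The sample version divides by n − 1, which makes it an unbiased estimator of the variance of the population the data were drawn from.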

clustering

Cluster analysis is the task of partitioning data into subsets of objects according to their mutual "similarity," without using preexisting knowledge such as class labels. [Clustered-standard-errors and/or cluster-samples should be tagged as such; do NOT use the "clustering" tag for them.]

3772 questions

forecasting

Prediction of future events. It is a special case of [prediction], in the context of [time-series].

3542 questions

categorical-data

Categorical (also called nominal) data can take on a limited number of possible values called categories. Categorical values "label"; they do not "measure". Please use the [ordinal-data] tag for discrete but ordered data types.

3266 questions

t-test

A test for comparing the means of two samples, or the mean of one sample (or even parameter estimates) with a specified value; also known as the "Student t-test" after the pseudonym of its inventor.

3254 questions
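
For the one-sample case described in the t-test entry, the test statistic is the difference between the sample mean and the specified value, divided by the estimated standard error. The sketch below computes only the statistic and its degrees of freedom; a p-value would need the t distribution, which the standard library does not provide. The name `one_sample_t` and the data are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    """Return the t statistic and degrees of freedom for H0: mean == mu0."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)   # stdev uses the n - 1 denominator
    t_stat = (mean(sample) - mu0) / standard_error
    return t_stat, n - 1

sample = [4.0, 6.0, 5.0, 7.0, 8.0]   # hypothetical observations, mean 6
t_stat, df = one_sample_t(sample, mu0=5.0)
```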

cross-validation

Repeatedly withholding subsets of the data during model fitting in order to quantify the model performance on the withheld data subsets.

3195 questions
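
The withholding procedure in the cross-validation entry can be sketched as k-fold cross-validation, one common variant. The "model" here is deliberately trivial (predict the mean of the training data), and the function names are illustrative. Real applications usually shuffle the data before splitting, which this sketch omits to stay deterministic.

```python
from statistics import mean

def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validated_mse(y, k):
    """Average held-out squared error of a predict-the-training-mean model."""
    fold_errors = []
    for held_out in k_fold_indices(len(y), k):
        held = set(held_out)
        train = [y[i] for i in range(len(y)) if i not in held]
        prediction = mean(train)                  # fit on training folds only
        errors = [(y[i] - prediction) ** 2 for i in held_out]
        fold_errors.append(mean(errors))          # score on the withheld fold
    return mean(fold_errors)

y = [3.0, 5.0, 4.0, 6.0, 8.0, 7.0]
cv_mse = cross_validated_mse(y, k=3)
```

The key point is that each prediction is scored only on observations the model did not see during fitting.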

pca

Principal component analysis (PCA) is a linear dimensionality reduction technique. It reduces a multivariate dataset to a smaller set of constructed variables preserving as much information (as much variance) as possible. These variables, called principal components, are linear combinations of the input variables.

3190 questions

estimation

This tag is too general; please provide a more specific tag. For questions about the properties of specific estimators, use the [estimators] tag instead.

3043 questions

maximum-likelihood

A method of estimating parameters of a statistical model by choosing the parameter value that maximizes the probability (likelihood) of observing the given sample.

2972 questions
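
As a minimal sketch of the idea, the code below estimates the success probability of a Bernoulli model by searching a grid for the value with the highest log-likelihood. The grid search is used only for transparency; for this model the maximizer is known in closed form to be the sample proportion. The data and names are illustrative.

```python
from math import log

def bernoulli_log_likelihood(p, outcomes):
    """Log-probability of the observed 0/1 outcomes under success probability p."""
    successes = sum(outcomes)
    failures = len(outcomes) - successes
    return successes * log(p) + failures * log(1 - p)

outcomes = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]    # hypothetical data: 7 successes in 10
grid = [i / 1000 for i in range(1, 1000)]    # candidate values strictly inside (0, 1)
p_hat = max(grid, key=lambda p: bernoulli_log_likelihood(p, outcomes))
```

Maximizing the log-likelihood gives the same estimate as maximizing the likelihood itself, because the logarithm is strictly increasing.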

lme4-nlme

lme4 and nlme are R packages used for fitting linear, generalized linear, and nonlinear mixed-effects models. For general questions about mixed models, use the [mixed-model] tag.

2903 questions

sampling

Creating samples from a well-specified population using a probabilistic method and/or producing random numbers from a specified distribution. As this tag is ambiguous, please consider [survey-sampling] for the former and [monte-carlo] or [simulation] for the latter. For questions regarding creating random samples from known distributions, please consider using the [random-generation] tag.

2894 questions

data-visualization

Constructing meaningful and useful graphical representations of data. (If your question is only about how to get particular software to produce a specific effect, then it is likely not on topic here.)

2831 questions

predictive-models

Predictive models are statistical models whose primary purpose is to predict other observations of a system optimally, as opposed to models whose purpose is to test a particular hypothesis or explain a phenomenon mechanistically. As such, predictive models place less emphasis on interpretability and more emphasis on performance.

2756 questions

arima

Refers to the AutoRegressive Integrated Moving Average model used in time series modeling both for data description and for forecasting. This model generalizes the ARMA model by including a term for differencing, which is useful for removing trends and handling some types of non-stationarity.

2671 questions
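
The differencing step mentioned in the ARIMA entry can be illustrated on its own. The sketch below shows only the "integrated" part: first differences remove a linear trend, and second differences remove a quadratic one. It does not fit an ARIMA model, and the function name `difference` is illustrative.

```python
def difference(series, order=1):
    """Apply first differencing `order` times: x[t] - x[t-1]."""
    for _ in range(order):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

linear_trend = [3 + 2 * t for t in range(6)]   # 3, 5, 7, 9, 11, 13
quadratic_trend = [t * t for t in range(6)]    # 0, 1, 4, 9, 16, 25

d1 = difference(linear_trend)                  # constant: the trend is gone
d2 = difference(quadratic_trend, order=2)      # constant after two differences
```

Each round of differencing shortens the series by one observation, so the order is normally kept as small as is needed to make the series look stationary.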