Tags
A tag is a keyword or label that categorizes your question with other, similar questions.
Use this tag for any *on-topic* question that (a) involves `R` either as a critical part of the question or expected answer, and (b) is not *just* about how to use `R`.
26743 questions
Techniques for analyzing the relationship between one (or more) "dependent" variables and "independent" variables.
26055 questions
Machine learning algorithms build a model of the training data. The term "machine learning" is vaguely defined; it includes what is also called statistical learning, reinforcement learning, unsupervised learning, etc. ALWAYS ADD A MORE SPECIFIC TAG.
18068 questions
Time series are data observed over time (either in continuous time or at discrete time periods).
12965 questions
A probability provides a quantitative description of the likely occurrence of a particular event.
11025 questions
Hypothesis testing assesses whether data are inconsistent with a given hypothesis rather than being an effect of random fluctuations.
9227 questions
A distribution is a mathematical description of probabilities or frequencies.
8590 questions
A routine exercise designed to test one's knowledge; often from a textbook, course, or test used for a class or self-study. This community's policy is to "provide helpful hints" for such questions rather than complete answers.
7595 questions
Artificial neural networks (ANNs) are a broad class of computational models loosely based on biological neural networks. They encompass feedforward NNs (including "deep" NNs), convolutional NNs, recurrent NNs, etc.
7277 questions
Bayesian inference is a method of statistical inference that relies on treating the model parameters as random variables and applying Bayes' theorem to deduce subjective probability statements about the parameters or hypotheses, conditional on the observed dataset.
7099 questions
Refers generally to statistical procedures that utilize the logistic function, most commonly various forms of logistic regression.
7015 questions
Mathematical theory of statistics, concerned with formal definitions and general results.
6815 questions
Statistical classification is the problem of identifying the sub-population to which new observations belong, where the identity of the sub-population is unknown, on the basis of a training set of data containing observations whose sub-population is known. Because such classifications vary from one observation to the next, their behavior can be studied statistically.
6303 questions
Statistical significance is a characteristic of a statistic viewed in light of a null hypothesis and a given significance level. It reflects whether the statistic belongs to the rejection region (is statistically significant) or the acceptance region (is not statistically significant).
5649 questions
The normal, or Gaussian, distribution has a density function that is a symmetrical bell-shaped curve. It is one of the most important distributions in statistics. Use the [normality] tag for asking about testing for normality.
5424 questions
Mixed (aka multilevel or hierarchical) models are linear models that include both fixed effects and random effects. They are used to model longitudinal or nested data.
5314 questions
Regression that includes two or more non-constant independent variables. Also known as multivariable regression.
4906 questions
ANOVA stands for ANalysis Of VAriance, a statistical model and set of procedures for comparing multiple group means. The independent variables in an ANOVA model are categorical, but an ANOVA table can be used to test continuous variables as well.
4844 questions
Python is a programming language commonly used for machine learning. Use this tag for any *on-topic* question that (a) involves `Python` either as a critical part of the question or expected answer, and (b) is not *just* about how to use `Python`.
4198 questions
A confidence interval is an interval that covers an unknown parameter with $100(1-\alpha)\%$ confidence. Confidence intervals are a frequentist concept. They are often confused with credible intervals, which are the Bayesian analog.
4058 questions
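As a rough illustration of the confidence interval description above, here is a minimal sketch of an approximate $100(1-\alpha)\%$ interval for a mean. It assumes a normal approximation (standard normal quantiles and the sample standard deviation); the function name `normal_ci_for_mean` and the data are hypothetical, and for small samples a t quantile would usually be preferred.

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev

def normal_ci_for_mean(data, alpha=0.05):
    """Approximate 100(1 - alpha)% interval for the mean (normal approximation)."""
    n = len(data)
    center = fmean(data)
    standard_error = stdev(data) / sqrt(n)   # sample sd (n - 1 denominator)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided standard normal quantile
    return center - z * standard_error, center + z * standard_error

# Hypothetical measurements, used only for illustration.
data = [4.1, 5.0, 5.3, 4.7, 5.9, 4.4, 5.2, 4.8]
lower, upper = normal_ci_for_mean(data, alpha=0.05)
```

A smaller `alpha` (for example 0.01) gives a wider interval, reflecting the higher confidence level.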
A generalization of linear regression allowing for nonlinear relationships via a "link function" and for the variance of the response to depend on the predicted value. (Not to be confused with "general linear model" which extends the ordinary linear model to general covariance structure and multivariate response.)
3987 questions
The expected squared deviation of a random variable from its mean; or, the average squared deviation of data about their mean.
3779 questions
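The second reading of the variance description above (the average squared deviation of data about their mean) can be sketched as follows. The helper name `mean_squared_deviation` and the data are illustrative assumptions; the standard library's `statistics.pvariance` computes the same quantity, while `statistics.variance` divides by n - 1 instead of n.

```python
from statistics import fmean, pvariance

def mean_squared_deviation(data):
    """Average squared deviation of the data about their mean (divides by n)."""
    m = fmean(data)
    return fmean((x - m) ** 2 for x in data)

# Hypothetical data with mean 5.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
by_hand = mean_squared_deviation(data)
from_library = pvariance(data)  # population variance, same definition
```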
Cluster analysis is the task of partitioning data into subsets of objects according to their mutual "similarity," without using preexisting knowledge such as class labels. [Clustered-standard-errors and/or cluster-samples should be tagged as such; do NOT use the "clustering" tag for them.]
3772 questions
Prediction of future events. It is a special case of [prediction], in the context of [time-series].
3542 questions
Categorical (also called nominal) data can take on a limited number of possible values called categories. Categorical values "label"; they do not "measure". Please use the [ordinal-data] tag for discrete but ordered data types.
3266 questions
A test for comparing the means of two samples, or the mean of one sample (or even parameter estimates) with a specified value; also known as the "Student t-test" after the pseudonym of its inventor.
3254 questions
Repeatedly withholding subsets of the data during model fitting in order to quantify the model performance on the withheld data subsets.
3195 questions
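The withholding procedure described above can be sketched as a k-fold scheme. This is only one common variant, written under stated assumptions: the helper names `k_fold_indices` and `cv_mse_of_mean` are hypothetical, the "model" is simply the training mean, and performance is measured by mean squared error on the withheld folds.

```python
import random
from statistics import fmean

def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists; each index is withheld exactly once."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test

def cv_mse_of_mean(y, k=5):
    """Cross-validated squared error of predicting with the training mean."""
    errors = []
    for train, test in k_fold_indices(len(y), k):
        prediction = fmean(y[j] for j in train)  # fit on the retained data only
        errors.extend((y[j] - prediction) ** 2 for j in test)
    return fmean(errors)
```

The key point is that each prediction is scored only on observations that were not used to fit it.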
Principal component analysis (PCA) is a linear dimensionality reduction technique. It reduces a multivariate dataset to a smaller set of constructed variables preserving as much information (as much variance) as possible. These variables, called principal components, are linear combinations of the input variables.
3190 questions
This tag is too general; please provide a more specific tag. For questions about the properties of specific estimators, use the [estimators] tag instead.
3043 questions
A method of estimating the parameters of a statistical model by choosing the parameter value that maximizes the probability (the likelihood) of observing the given sample.
2972 questions
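To make the estimation idea above concrete, here is a minimal sketch for a Bernoulli model: the parameter value that makes the observed sample most probable is found by a crude grid search. The data, the grid resolution, and the function name `bernoulli_log_likelihood` are illustrative assumptions; in this simple model the answer is also known in closed form as the sample proportion.

```python
from math import log

def bernoulli_log_likelihood(p, data):
    """Log-probability of the observed 0/1 sample when P(success) = p."""
    successes = sum(data)
    failures = len(data) - successes
    return successes * log(p) + failures * log(1 - p)

# Hypothetical sample: 7 successes in 10 trials.
data = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]

# Grid over the open interval (0, 1), avoiding log(0) at the endpoints.
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=lambda p: bernoulli_log_likelihood(p, data))

closed_form = sum(data) / len(data)  # sample proportion
```

Here the grid search and the closed form agree, with both giving 0.7.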
lme4 and nlme are R packages used for fitting linear, generalized linear, and nonlinear mixed effects models. For general questions about mixed models, use the [mixed-model] tag.
2903 questions
Creating samples from a well-specified population using a probabilistic method and/or producing random numbers from a specified distribution. As this tag is ambiguous, please consider [survey-sampling] for the former and [monte-carlo] or [simulation] for the latter. For questions regarding creating random samples from known distributions, please consider using the [random-generation] tag.
2894 questions
Constructing meaningful and useful graphical representations of data. (If your question is only about how to get particular software to produce a specific effect, then it is likely not on topic here.)
2831 questions
Predictive models are statistical models whose primary purpose is to predict other observations of a system optimally, as opposed to models whose purpose is to test a particular hypothesis or explain a phenomenon mechanistically. As such, predictive models place less emphasis on interpretability and more emphasis on performance.
2756 questions
Refers to the AutoRegressive Integrated Moving Average model used in time series modeling both for data description and for forecasting. This model generalizes the ARMA model by including a term for differencing, which is useful for removing trends and handling some types of non-stationarity.
2671 questions
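The differencing step mentioned above (the "Integrated" part of the model) can be illustrated with a small sketch. It shows only the differencing operation on deterministic toy series, not a full model fit; the function name `difference` and the example series are hypothetical.

```python
def difference(series, d=1):
    """Apply first differencing d times: y[t] - y[t-1]."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

# A linear trend becomes constant after one difference.
linear = [3 + 2 * t for t in range(6)]    # 3, 5, 7, 9, 11, 13
once = difference(linear)                 # [2, 2, 2, 2, 2]

# A quadratic trend needs two differences to become constant.
quadratic = [t * t for t in range(6)]     # 0, 1, 4, 9, 16, 25
twice = difference(quadratic, d=2)        # [2, 2, 2, 2]
```

Each round of differencing shortens the series by one observation.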