Most Popular

1500 questions
47
votes
8 answers

Would a Bayesian admit that there is one fixed parameter value?

In Bayesian data analysis, parameters are treated as random variables. This stems from the Bayesian subjective conceptualization of probability. But do Bayesians theoretically acknowledge that there is one true fixed parameter value out in the 'real…
ATJ
47
votes
4 answers

Taking the expectation of Taylor series (especially the remainder)

My question concerns justifying a widely used method, namely taking the expected value of a Taylor series. Assume we have a random variable $X$ with positive mean $\mu$ and variance $\sigma^2$. Additionally, we have a function, say,…
agronskiy
47
votes
1 answer

Neural Networks: weight change momentum and weight decay

Momentum $\alpha$ is used to diminish the fluctuations in weight changes over consecutive iterations: $$\Delta w_i(t+1) = - \eta\frac{\partial E}{\partial w_i} + \alpha \Delta w_i(t),$$ where $E({\bf w})$ is the error function, ${\bf w}$ -…
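The momentum update quoted in this excerpt can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `momentum_step` and `grad_quadratic`, the toy error $E({\bf w}) = \frac{1}{2}\sum_i w_i^2$, and the values of $\eta$ and $\alpha$ are all hypothetical and not taken from the question.

```python
def momentum_step(w, delta_prev, grad, eta=0.1, alpha=0.9):
    """Apply one update: delta(t+1) = -eta * dE/dw + alpha * delta(t)."""
    delta = [-eta * g + alpha * d for g, d in zip(grad, delta_prev)]
    w_new = [wi + di for wi, di in zip(w, delta)]
    return w_new, delta


def grad_quadratic(w):
    # Gradient of the toy error E(w) = 0.5 * sum(w_i^2).
    # This error function is an assumption chosen only for illustration.
    return list(w)


w = [1.0, -2.0]       # initial weights (arbitrary)
delta = [0.0, 0.0]    # no previous weight change at t = 0
for _ in range(200):
    w, delta = momentum_step(w, delta, grad_quadratic(w))

print(w)  # weights should have moved close to the minimum at the origin
```

Setting `alpha=0.0` in the same loop recovers plain gradient descent, which makes it easy to compare the two trajectories.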
47
votes
4 answers

Difference between forecast and prediction?

I was wondering what the difference and the relation between forecast and prediction are, especially in time series and regression. For example, am I correct that: In time series, forecasting seems to mean estimating future values given past values of a…
Tim
47
votes
3 answers

Intuitive difference between hidden Markov models and conditional random fields

I understand that HMMs (Hidden Markov Models) are generative models, and CRFs (Conditional Random Fields) are discriminative models. I also understand how CRFs are designed and used. What I do not understand is how they differ from HMMs. I…
47
votes
7 answers

Features for time series classification

I consider the problem of (multiclass) classification based on time series of variable length $T$, that is, to find a function $$f(X_T) = y \in [1..K]\\ \text{for } X_T = (x_1, \dots, x_T)\\ \text{with } x_t \in \mathbb{R}^d ~,$$ via a global…
Emile
47
votes
2 answers

How well can multiple regression really "control for" covariates?

We’re all familiar with observational studies that attempt to establish a causal link between a nonrandomized predictor X and an outcome by including every imaginable potential confounder in a multiple regression model. By thus “controlling for” all…
half-pass
47
votes
4 answers

Where does $\sqrt{n}$ come from in central limit theorem (CLT)?

A very simple version of the central limit theorem is given below: $$ \sqrt{n}\bigg(\bigg(\frac{1}{n}\sum_{i=1}^n X_i\bigg) - \mu\bigg)\ \xrightarrow{d}\ \mathcal{N}(0,\;\sigma^2) $$ This is the Lindeberg–Lévy CLT. I do not understand why there is a $\sqrt{n}$…
Flying pig
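One way to look at the $\sqrt{n}$ scaling in the statement above is a small simulation. The sketch below is an illustration, not part of the question: it assumes $X_i \sim \text{Uniform}(0, 1)$ (so $\mu = 1/2$ and $\sigma^2 = 1/12$), and the function name `scaled_mean_errors`, the sample sizes, and the number of replications are hypothetical choices.

```python
import random
import statistics


def scaled_mean_errors(n, reps=2000, seed=0):
    """Return reps draws of sqrt(n) * (sample mean - mu) for Uniform(0, 1) data."""
    rng = random.Random(seed)
    mu = 0.5  # mean of Uniform(0, 1); the distribution is an assumption
    errors = []
    for _ in range(reps):
        xbar = sum(rng.random() for _ in range(n)) / n
        errors.append((n ** 0.5) * (xbar - mu))
    return errors


# If the sqrt(n) scaling is the right one, the variance of the scaled error
# should stay near sigma^2 = 1/12 instead of shrinking or growing with n.
for n in (10, 100, 1000):
    errs = scaled_mean_errors(n)
    print(n, statistics.pvariance(errs))
```

Replacing `n ** 0.5` with `n` or with `1` in the same function shows the variance blowing up or collapsing as `n` grows, which is the contrast the scaling is meant to avoid.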
47
votes
3 answers

SVM, Overfitting, curse of dimensionality

My dataset is small (120 samples); however, the number of features is large, varying from 1,000 to 200,000. Although I'm doing feature selection to pick a subset of features, it might still overfit. My first question is, how does SVM handle…
user13420
47
votes
1 answer

Rank in R - descending order

I am looking to rank data so that, in some cases, the larger value has the rank of 1. I am relatively new to R, but I don't see how I can adjust this setting in the rank function. x <- c(23,45,12,67,34,89); rank(x) generates: [1] 2 4 1 5 3 6 when I…
Btibert3
47
votes
8 answers

Bayesian vs frequentist Interpretations of Probability

Can someone give a good rundown of the differences between the Bayesian and the frequentist approach to probability? From what I understand: The frequentist view is that the data is a repeatable random sample (random variable) with a specific…
BYS2
47
votes
1 answer

Explanation of min_child_weight in xgboost algorithm

The definition of the min_child_weight parameter in xgboost is given as: minimum sum of instance weight (hessian) needed in a child. If the tree partition step results in a leaf node with the sum of instance weight less than…
User123456789
47
votes
6 answers

Why is softmax output not a good uncertainty measure for Deep Learning models?

I've been working with Convolutional Neural Networks (CNNs) for some time now, mostly on image data for semantic segmentation/instance segmentation. I've often visualized the softmax of the network output as a "heat map" to see how high per pixel…
47
votes
3 answers

Are CDFs more fundamental than PDFs?

My stat prof basically said, if given one of the following three, you can find the other two: cumulative distribution function, moment generating function, probability density function. But my econometrics professor said CDFs are more fundamental…
47
votes
7 answers

Choosing variables to include in a multiple linear regression model

I am currently working to build a model using a multiple linear regression. After fiddling around with my model, I am unsure how to best determine which variables to keep and which to remove. My model started with 10 predictors for the DV. When…