Most Popular

1500 questions
114
votes
5 answers

What skills are required to perform large scale statistical analyses?

Many statistical jobs ask for experience with large-scale data. What sorts of statistical and computational skills would be needed for working with large data sets? For example, how about building regression models given a data set with…
114
votes
4 answers

Why does the Lasso provide Variable Selection?

I've been reading Elements of Statistical Learning, and I would like to know why the Lasso provides variable selection and ridge regression doesn't. Both methods minimize the residual sum of squares and have a constraint on the possible values of…
Zhi Zhao
114
votes
4 answers

What does a "closed-form solution" mean?

I have come across the term "closed-form solution" quite often. What does a closed-form solution mean? How does one determine if a closed-form solution exists for a given problem? Searching online, I found some information, but nothing in the context…
114
votes
16 answers

What misused statistical terms are worth correcting?

Statistics is everywhere; common usage of statistical terms is, however, often unclear. The terms probability and odds are used interchangeably in lay English despite their well-defined and different mathematical expressions. Not separating the term…
Antoni Parellada
114
votes
2 answers

tanh activation function vs sigmoid activation function

The tanh activation function is: $$\tanh \left( x \right) = 2 \cdot \sigma \left( 2 x \right) - 1$$ where $\sigma(x)$, the sigmoid function, is defined as: $$\sigma(x) = \frac{e^x}{1 + e^x}$$ Questions: Does it really matter between using those…
satya
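The identity quoted in the excerpt above can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names `sigmoid` and `tanh_via_sigmoid` are illustrative and do not come from the question.

```python
import math


def sigmoid(x: float) -> float:
    # sigma(x) = e^x / (1 + e^x), written in the equivalent form 1 / (1 + e^(-x))
    return 1.0 / (1.0 + math.exp(-x))


def tanh_via_sigmoid(x: float) -> float:
    # tanh(x) = 2 * sigma(2x) - 1, the identity stated in the question
    return 2.0 * sigmoid(2.0 * x) - 1.0


# Compare against the library tanh at a few sample points.
for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
    assert math.isclose(tanh_via_sigmoid(x), math.tanh(x), abs_tol=1e-12)
```

In other words, tanh is a rescaled and shifted sigmoid: its output range is (-1, 1) rather than (0, 1).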
113
votes
6 answers

What loss function for multi-class, multi-label classification tasks in neural networks?

I'm training a neural network to classify a set of objects into n classes. Each object can belong to multiple classes at the same time (multi-class, multi-label). I read that for multi-class problems it is generally recommended to use softmax and…
aKzenT
112
votes
6 answers

Is it possible to train a neural network without backpropagation?

Many neural network books and tutorials spend a lot of time on the backpropagation algorithm, which is essentially a tool to compute the gradient. Let's assume we are building a model with ~10K parameters / weights. Is it possible to run the…
Haitao Du
111
votes
11 answers

Calculating optimal number of bins in a histogram

I'm interested in finding as optimal a method as I can for determining how many bins I should use in a histogram. My data should range from 30 to 350 objects at most, and in particular I'm trying to apply thresholding (like Otsu's method) where…
Tony Stark
111
votes
7 answers

Why use gradient descent for linear regression, when a closed-form math solution is available?

I am taking a Machine Learning course online and learned about gradient descent for calculating the optimal values in the hypothesis h(x) = B0 + B1X. Why do we need to use gradient descent if we can easily find the values with the formula below?…
Purus
111
votes
2 answers

What is an embedding layer in a neural network?

In many neural network libraries, there are 'embedding layers', like in Keras or Lasagne. I am not sure I understand its function, despite reading the documentation. For example, in the Keras documentation it says: Turn positive integers (indexes)…
Francesco
111
votes
5 answers

Using k-fold cross-validation for time-series model selection

Question: I want to be sure of something: is the use of k-fold cross-validation with time series straightforward, or does one need to pay special attention before using it? Background: I'm modeling a time series of 6 years (with semi-markov…
Mickaël S
110
votes
4 answers

How do you calculate precision and recall for multiclass classification using confusion matrix?

I wonder how to compute precision and recall using a confusion matrix for a multi-class classification problem. Specifically, an observation can only be assigned to its most probable class / label. I would like to compute: Precision = TP / (TP+FP)…
daiyue
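As a rough illustration of the per-class calculation asked about above, here is a minimal sketch in plain Python. It assumes a confusion matrix whose rows are true classes and whose columns are predicted classes (a convention the excerpt does not state). The function name `per_class_precision_recall` and the sample matrix are hypothetical. Recall is taken as TP / (TP+FN), the standard definition, since the excerpt is cut off after the precision formula.

```python
def per_class_precision_recall(cm):
    """Return a list of (precision, recall) pairs, one per class.

    Assumes cm[i][j] counts observations with true class i predicted as class j.
    """
    n = len(cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        # Column k holds everything predicted as class k.
        fp = sum(cm[i][k] for i in range(n)) - tp
        # Row k holds everything whose true class is k.
        fn = sum(cm[k]) - tp
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        results.append((precision, recall))
    return results


# Made-up 3-class example, for illustration only.
example_cm = [
    [5, 1, 0],
    [2, 3, 1],
    [0, 0, 4],
]
scores = per_class_precision_recall(example_cm)
```

Each class is treated one-vs-rest: the diagonal entry is TP, the rest of its column is FP, and the rest of its row is FN.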
109
votes
7 answers

Detecting a given face in a database of facial images

I'm working on a little project involving the faces of Twitter users via their profile pictures. A problem I've encountered is that after I filter out all but the images that are clear portrait photos, a small but significant percentage of Twitter…
ʞɔıu
109
votes
4 answers

Softmax vs Sigmoid function in Logistic classifier?

What decides the choice of function (Softmax vs Sigmoid) in a Logistic classifier? Suppose there are 4 output classes. Each of the above functions gives the probabilities of each class being the correct output. So which one to take for a…
mach
108
votes
4 answers

Difference between standard error and standard deviation

I'm struggling to understand the difference between the standard error and the standard deviation. How are they different and why do you need to measure the standard error?
louis xie