Highest Voted Questions - Statistical Analysis Stack Exchange

56

votes

5 answers

Training a decision tree against unbalanced data

I'm new to data mining and I'm trying to train a decision tree against a data set which is highly unbalanced. However, I'm having problems with poor predictive accuracy. The data consists of students studying courses, and the class variable is the…

classification cart unbalanced-classes accuracy

asked May 08 '12 at 16:13

chrisb

715
1
7
8

56

votes

4 answers

What is the proper usage of scale_pos_weight in xgboost for imbalanced datasets?

I have a very imbalanced dataset. I'm trying to follow the tuning advice and use scale_pos_weight, but I'm not sure how I should tune it. I can see that RegLossObj.GetGradient does: if (info.labels[i] == 1.0f) w *= param_.scale_pos_weight so a gradient…

unbalanced-classes boosting

asked Oct 30 '16 at 13:59

ihadanny

2,596
3
19
31
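The weighting step quoted in the excerpt above can be illustrated with a minimal sketch. It assumes a plain logistic loss on a raw margin; the function name `weighted_logistic_grad_hess` and its arguments are hypothetical and this is not xgboost's actual source, only an illustration of what multiplying `w` by `scale_pos_weight` for positive labels does to the per-instance gradient and Hessian.

```python
import math


def weighted_logistic_grad_hess(margin, label, scale_pos_weight=1.0):
    """Gradient and Hessian of the logistic loss for one instance.

    Hypothetical helper: the instance weight is multiplied by
    scale_pos_weight only when the label is the positive class.
    """
    p = 1.0 / (1.0 + math.exp(-margin))  # predicted probability
    w = scale_pos_weight if label == 1.0 else 1.0
    grad = (p - label) * w               # first derivative w.r.t. margin
    hess = p * (1.0 - p) * w             # second derivative w.r.t. margin
    return grad, hess


# A positive and a negative instance at the same margin (0.0, so p = 0.5).
pos_grad, pos_hess = weighted_logistic_grad_hess(0.0, 1.0, scale_pos_weight=3.0)
neg_grad, neg_hess = weighted_logistic_grad_hess(0.0, 0.0, scale_pos_weight=3.0)
print(pos_grad, pos_hess)  # positive instance counts three times as much
print(neg_grad, neg_hess)  # negative instance is unaffected
```

Under this sketch, a value of 3.0 behaves like giving every positive example three times the weight in each boosting step, which is why the parameter shifts predictions toward the positive class.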

56

votes

9 answers

How do R and Python complement each other in data science?

In many tutorials or manuals the narrative seems to imply that R and Python coexist as complementary components of the analysis process. To my untrained eye, however, it seems that both languages sort of do the same thing. So my question is if there…

r python software

asked Oct 06 '16 at 08:57

BioHazZzZard

319
1
4
5

56

votes

2 answers

What is the difference between random-effects, fixed-effects, and marginal models?

I am trying to expand my knowledge of statistics. I come from a physical sciences background with a "recipe based" approach to statistical testing, where we say is it continuous, is it normally distributed -- OLS regression. In my reading I have…

random-effects-model fixed-effects-model marginal-distribution

asked Jan 26 '12 at 12:56

N26

1,705
3
18
22

56

votes

2 answers

Neural Network: For Binary Classification use 1 or 2 output neurons?

Assume I want to do binary classification (something belongs to class A or class B). There are several ways to do this in the output layer of a neural network: Use 1 output node. Output 0 (<0.5) is considered class A and 1 (>=0.5) is…

machine-learning classification neural-networks

asked Apr 13 '16 at 08:23

robert

881
1
9
12
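For the one-node versus two-node question above, a small sketch can show how the two setups relate. It assumes a sigmoid on the single node and a softmax over the two nodes (the excerpt is truncated, so the activation choices here are assumptions), and the function names are hypothetical. Under those assumptions, the softmax probability of class B equals a sigmoid applied to the difference of the two logits.

```python
import math


def sigmoid(z):
    """Single output node: probability of class B."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax2(z_a, z_b):
    """Two output nodes: probabilities of class A and class B."""
    m = max(z_a, z_b)  # subtract the max for numerical stability
    e_a, e_b = math.exp(z_a - m), math.exp(z_b - m)
    total = e_a + e_b
    return e_a / total, e_b / total


z_a, z_b = 0.3, 1.7  # arbitrary example logits
p_a, p_b = softmax2(z_a, z_b)
p_b_single = sigmoid(z_b - z_a)
print(p_b, p_b_single)  # the two values agree up to rounding
```

So, in this sketch, the two-node softmax carries one redundant degree of freedom: only the difference between the logits affects the predicted probabilities.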

56

votes

7 answers

Effect of switching response and explanatory variable in simple linear regression

Let's say that there exists some "true" relationship between $y$ and $x$ such that $y = ax + b + \epsilon$, where $a$ and $b$ are constants and $\epsilon$ is i.i.d. normal noise. When I randomly generate data with this R code: x <- 1:100; y <- ax + b…

regression

asked Jan 03 '12 at 19:24

Greg Aponte

663
1
6
6
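The setup in the question above can be sketched in Python to show what happens when the roles of $x$ and $y$ are swapped. The values of `a`, `b`, and the noise standard deviation are hypothetical, since the original R code is truncated. The sketch relies on a standard fact: the product of the two least-squares slopes equals the squared correlation, so the slope of $x$ on $y$ is generally not $1/a$.

```python
import random


def ols_slope(u, v):
    """Slope of the least-squares line that predicts v from u."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    s_uv = sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))
    s_uu = sum((ui - mean_u) ** 2 for ui in u)
    return s_uv / s_uu


random.seed(0)
a, b, noise_sd = 2.0, 1.0, 20.0  # hypothetical values for illustration
x = list(range(1, 101))
y = [a * xi + b + random.gauss(0.0, noise_sd) for xi in x]

slope_y_on_x = ols_slope(x, y)  # estimates a
slope_x_on_y = ols_slope(y, x)  # not an estimate of 1 / a in general
r_squared = slope_y_on_x * slope_x_on_y

print(slope_y_on_x, slope_x_on_y, 1.0 / slope_y_on_x, r_squared)
```

Because the squared correlation is at most 1, the slope of $x$ on $y$ is pulled toward zero relative to the reciprocal of the slope of $y$ on $x$ whenever there is noise.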

56

votes

3 answers

Logistic Regression: Scikit Learn vs Statsmodels

I am trying to understand why the output from logistic regression of these two libraries gives different results. I am using the dataset from the UCLA IDRE tutorial, predicting admit based on gre, gpa and rank. rank is treated as a categorical variable,…

regression logistic python scikit-learn statsmodels

asked Mar 25 '16 at 22:01

hurrikale

853
1
8
7

56

votes

2 answers

Cross-Entropy or Log Likelihood in Output layer

I read this page: http://neuralnetworksanddeeplearning.com/chap3.html and it said that a sigmoid output layer with cross-entropy is quite similar to a softmax output layer with log-likelihood. What happens if I use sigmoid with log-likelihood or…

neural-networks maximum-likelihood softmax

asked Feb 23 '16 at 05:37

malioboro

851
1
11
19

56

votes

16 answers

Recommended books on experiment design?

What are the panel's recommendations for books on design of experiments? Ideally, books should be still in print or available electronically, although that may not always be feasible. If you feel moved to add a few words on what's so good about the…

references experiment-design

asked Aug 18 '10 at 08:54

walkytalky

1,857
2
22
24

56

votes

2 answers

A/B tests: z-test vs t-test vs chi-square vs Fisher's exact test

I'm trying to understand the reasoning behind choosing a specific test approach when dealing with a simple A/B test (i.e. two variations/groups with a binary response: converted or not). As an example I will be using the data below Version Visits …

statistical-significance chi-squared-test p-value fishers-exact-test z-statistic

asked Oct 27 '15 at 12:44

L Xandor

1,119
2
9
13
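One relationship between the tests named in the question above can be checked directly: for a 2x2 table, the squared pooled two-proportion z statistic equals the Pearson chi-square statistic without continuity correction. The sketch below uses hypothetical visit and conversion counts, because the question's data table is truncated, and the function names are illustrative only.

```python
import math


def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Pooled two-proportion z statistic."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1.0 - p_pool) * (1.0 / n_a + 1.0 / n_b))
    return (p_a - p_b) / se


def pearson_chi2_2x2(conv_a, n_a, conv_b, n_b):
    """Pearson chi-square statistic, no continuity correction."""
    observed = [[conv_a, n_a - conv_a], [conv_b, n_b - conv_b]]
    row_totals = [n_a, n_b]
    col_totals = [conv_a + conv_b, (n_a - conv_a) + (n_b - conv_b)]
    total = n_a + n_b
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


# Hypothetical counts: conversions and visits for versions A and B.
z = two_proportion_z(150, 7000, 200, 8000)
chi2 = pearson_chi2_2x2(150, 7000, 200, 8000)
p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))  # normal two-sided p-value

print(z, z ** 2, chi2, p_two_sided)
```

In this sketch the two tests give the same two-sided p-value, so the choice between them matters mainly when corrections or exact methods (such as Fisher's exact test for small counts) come into play.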

56

votes

13 answers

Software for drawing Bayesian networks (graphical models)

I am searching for [free] software that can produce nice-looking graphical models. Any suggestions would be appreciated.

graphical-model software

asked Oct 09 '11 at 17:43

C. Reed

537
1
8
14

56

votes

4 answers

Can a random forest be used for feature selection in multiple linear regression?

Since RF can handle non-linearity but can't provide coefficients, would it be wise to use random forest to gather the most important features and then plug those features into a multiple linear regression model in order to obtain their coefficients?…

regression machine-learning feature-selection random-forest regression-strategies

asked Jul 30 '15 at 21:52

Hidden Markov Model

938
1
8
16

56

votes

6 answers

Practical hyperparameter optimization: Random vs. grid search

I'm currently going through Bergstra and Bengio's Random Search for Hyper-Parameter Optimization [1], where the authors claim random search is more efficient than grid search in achieving approximately equal performance. My question is: Do people…

machine-learning hyperparameter optimization

asked Jul 08 '15 at 14:25

Bar

2,492
3
19
31
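One intuition often given for the efficiency claim in the question above is that, for the same trial budget, random search tries more distinct values of each individual hyperparameter than a grid does. The sketch below assumes two hyperparameters on the unit interval and a budget of nine trials; these choices and the helper name `distinct_per_axis` are hypothetical and only illustrate the counting argument, not any benchmark result.

```python
import itertools
import random

budget = 9  # same number of trials for both strategies

# Grid search: 3 values per axis, so 3 x 3 = 9 trials.
axis_values = [0.1, 0.5, 0.9]
grid_trials = list(itertools.product(axis_values, axis_values))

# Random search: 9 trials drawn uniformly from the unit square.
rng = random.Random(0)
random_trials = [(rng.random(), rng.random()) for _ in range(budget)]


def distinct_per_axis(trials, axis):
    """Number of distinct values tried for one hyperparameter."""
    return len({trial[axis] for trial in trials})


print(distinct_per_axis(grid_trials, 0))    # values tried by the grid
print(distinct_per_axis(random_trials, 0))  # values tried by random search
```

If only one of the two hyperparameters really matters, the grid in this sketch effectively explores three settings of it while random search explores nine.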

56

votes

9 answers

Are we exaggerating the importance of model assumptions and evaluation in an era when analyses are often carried out by laymen?

Bottom line, the more I learn about statistics, the less I trust published papers in my field; I simply believe that researchers are not doing their statistics well enough. I'm a layman, so to speak. I'm trained in biology but I have no formal…

mathematical-statistics multiple-regression modeling

asked May 07 '15 at 11:28

Adam Robinsson

2,083
3
19
39

56

votes

4 answers

Logistic Regression - Error Term and its Distribution

On whether an error term exists in logistic regression (and its assumed distribution), I have read in various places that: no error term exists; the error term has a binomial distribution (in accordance with the distribution of the response…

logistic binomial-distribution bernoulli-distribution logistic-distribution

asked Nov 20 '14 at 10:57

user61124

563
1
5
4
