Questions tagged [dispersion]

This tag is very ambiguous. Dispersion is a general term for how spread out values are. For questions about the spread of disease, use the epidemiology tag. For other meanings of dispersion, consider using a related tag or creating a new one.

Tag usage

110 questions
16
votes
6 answers

How to detect polarized user opinions (high and low star ratings)

If I have a star rating system where users can express their preference for a product or item, how can I detect statistically whether the votes are highly "divided"? That is, even if the average is 3 out of 5 for a given product, how can I detect if…
David Williams
  • 323
  • 2
  • 7
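
One possible way to make "divided" measurable, offered as a sketch and not as the accepted answer: compare the observed variance of the ratings with the largest variance that is possible for that mean. For values bounded in [low, high] with mean m, the variance cannot exceed (m - low)(high - m), and that bound is reached only when every vote sits at one of the two extremes. The function name `polarization_index` and the sample ratings are hypothetical.

```python
import numpy as np

def polarization_index(ratings, low=1, high=5):
    """Observed variance divided by the maximum variance possible for this mean.

    Returns a value in [0, 1]: 0 when all ratings are identical,
    1 when every rating is at one of the two extremes.
    """
    r = np.asarray(ratings, dtype=float)
    m = r.mean()
    max_var = (m - low) * (high - m)  # upper bound on the variance for mean m
    if max_var == 0:
        return 0.0  # all ratings at a single extreme: no spread at all
    return float(r.var() / max_var)  # population variance (ddof=0)

# Hypothetical products, each with an average rating of 3
split_votes = [1, 1, 5, 5]
spread_votes = [1, 2, 3, 4, 5]
agreed_votes = [3, 3, 3, 3]

for name, votes in [("split", split_votes),
                    ("spread", spread_votes),
                    ("agreed", agreed_votes)]:
    print(name, polarization_index(votes))
```

All three hypothetical products share the same mean, so the index separates them purely by how the votes are distributed around it.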
11
votes
1 answer

Use Cases For Coefficient of Variation vs Index of Dispersion

I am attempting to algorithmically estimate the burstiness of a dataset and have found two comparable metrics. The coefficient of variation is the ratio of the standard deviation to the mean. The index of dispersion is the ratio of the variance to…
Joshua Bambrick
  • 213
  • 2
  • 5
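
The two ratios named in the excerpt can be computed side by side. This is a minimal sketch using the standard definitions (standard deviation over mean, and variance over mean); the function names and the count data are hypothetical.

```python
import numpy as np

def coefficient_of_variation(x):
    """Sample standard deviation divided by the mean (dimensionless)."""
    x = np.asarray(x, dtype=float)
    return x.std(ddof=1) / x.mean()

def index_of_dispersion(x):
    """Sample variance divided by the mean (carries the units of x)."""
    x = np.asarray(x, dtype=float)
    return x.var(ddof=1) / x.mean()

# Hypothetical event counts per time window, both with mean 5
steady = [4, 5, 5, 6, 5, 4, 6, 5]
bursty = [0, 0, 19, 0, 1, 0, 20, 0]

for name, data in [("steady", steady), ("bursty", bursty)]:
    print(name, coefficient_of_variation(data), index_of_dispersion(data))
```

For count data, an index of dispersion near 1 is what a Poisson process would give, so values well above 1 suggest burstiness. The coefficient of variation is unit-free, which makes it easier to compare across datasets measured on different scales.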
11
votes
4 answers

Why Are Measures of Dispersion Less Intuitive Than Centrality?

There seems to be something in our human understanding that creates difficulties in grasping intuitively the idea of variance. In a narrow sense the answer is immediate: squaring throws us off from our reflexive understanding. But, is it just…
Antoni Parellada
  • 23,430
  • 15
  • 100
  • 197
11
votes
4 answers

The role of variance in the Central Limit Theorem

I've read somewhere that the reason we square the differences instead of taking absolute values when calculating variance is that variance defined in the usual way, with squares in the numerator, plays a unique role in the Central Limit Theorem. Well,…
user4205580
  • 471
  • 1
  • 5
  • 13
10
votes
5 answers

How to measure dispersion in word frequency data?

How can I quantify the amount of dispersion in a vector of word counts? I'm looking for a statistic that will be high for document A, because it contains many different words that occur infrequently, and low for document B, because it contains one…
dB'
  • 225
  • 3
  • 15
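
One candidate statistic for this kind of question is the Shannon entropy of the word distribution, here rescaled by its maximum so that it lies in [0, 1]. This is a sketch of one option under that assumption, not necessarily what the answers recommend; `normalized_entropy` and the two count vectors are hypothetical.

```python
import numpy as np

def normalized_entropy(counts):
    """Shannon entropy of the word distribution divided by log(number of distinct words).

    Close to 1 when the words are used about equally often,
    close to 0 when a single word dominates.
    """
    c = np.asarray(counts, dtype=float)
    c = c[c > 0]            # words that never occur carry no information
    if c.size <= 1:
        return 0.0          # a single distinct word has no dispersion
    p = c / c.sum()
    return float(-(p * np.log(p)).sum() / np.log(c.size))

# Hypothetical word-count vectors
doc_a = [5, 5, 5, 5]        # several words, each used equally often
doc_b = [97, 1, 1, 1]       # one dominant word

print(normalized_entropy(doc_a), normalized_entropy(doc_b))
```

Because of the rescaling, this version ignores vocabulary size. If a larger vocabulary should itself count as more dispersion, dropping the division by `np.log(c.size)` gives the unnormalized entropy.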
9
votes
2 answers

Why are Pearson's residuals from a negative binomial regression smaller than those from a Poisson regression?

I have these data: set.seed(1); predictor <- rnorm(20); set.seed(1); counts <- c(sample(1:1000, 20)); df <- data.frame(counts, predictor). I ran a Poisson regression: poisson_counts <- glm(counts ~ predictor, data = df, family = "poisson"). And a…
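
A Pearson residual is the raw residual divided by the square root of the model's variance function, so the comparison in the title can be illustrated directly. This sketch assumes the usual variance functions, V(mu) = mu for Poisson and V(mu) = mu + mu^2 / theta for the negative binomial, and for simplicity gives both models the same fitted mean, which real fits would not share exactly. All data values and `theta` are hypothetical.

```python
import numpy as np

def pearson_residuals(y, mu, variance):
    """(y - mu) / sqrt(V(mu)) for a given variance function V."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    return (y - mu) / np.sqrt(variance(mu))

def poisson_variance(mu):
    return mu

def negbin_variance(mu, theta):
    # NB2 parameterisation: the extra term mu^2 / theta models overdispersion
    return mu + mu ** 2 / theta

# Hypothetical counts and a common fitted mean (an assumption for illustration)
y = np.array([12.0, 480.0, 950.0])
mu = np.full(3, 500.0)
theta = 2.0

r_pois = pearson_residuals(y, mu, poisson_variance)
r_nb = pearson_residuals(y, mu, lambda m: negbin_variance(m, theta))
print(r_pois)
print(r_nb)
```

Since the negative binomial variance is never smaller than the Poisson variance at the same mean, the denominator is larger and the residuals shrink in absolute value.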
6
votes
1 answer

Definition of exponential family with dispersion parameter

I was recently reading a discussion of generalized linear models that considered the response to come from an exponential family with a dispersion parameter so $$ f(y|\theta,\phi) = \exp\left(\frac{y\theta - b(\theta)}{a(\phi)} + c(y,…
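
For orientation, the standard consequences of this form can be stated briefly. Assuming the truncated term in the density does not depend on $\theta$ (as in the usual definition), differentiating the normalization condition $\int f(y|\theta,\phi)\,dy = 1$ once and twice with respect to $\theta$ gives

$$ \mathbb{E}[Y] = b'(\theta), \qquad \operatorname{Var}(Y) = a(\phi)\, b''(\theta). $$

As a check, the Poisson distribution with mean $\mu$ corresponds to $\theta = \log\mu$, $b(\theta) = e^{\theta}$ and $a(\phi) = 1$, so that $\mathbb{E}[Y] = \operatorname{Var}(Y) = e^{\theta} = \mu$.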
6
votes
1 answer

Why iterative estimation of dispersion in negative binomial GLM

If we do a negative binomial GLM with unknown dispersion, a frequent strategy (used for example by glm.nb in package MASS in R) is to use an alternating iteration: hold the dispersion fixed and estimate the mean, then hold the mean fixed and estimate the dispersion. Isn't the…
5
votes
1 answer

How does R find the dispersion parameter in a GLM?

I'm working on a problem involving fitting a GLM to data and I'm curious about how R calculates the dispersion parameter. For example, I have this output for the summary of my GLM. glm(formula = Lifespan ~ glucose + Temperature, family = …
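
As far as I know, for families with a free dispersion parameter, R's summary of a GLM reports the Pearson chi-square statistic divided by the residual degrees of freedom. The sketch below reproduces that calculation in Python under that assumption; the family in the excerpt is truncated, so the variance functions, data and names used here are hypothetical.

```python
import numpy as np

def pearson_dispersion(y, mu, variance, n_params):
    """Pearson chi-square statistic divided by residual degrees of freedom."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    chi2 = np.sum((y - mu) ** 2 / variance(mu))
    return chi2 / (y.size - n_params)

# Hypothetical data with a straight-line fit (intercept + slope = 2 parameters)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])
slope, intercept = np.polyfit(x, y, 1)
mu = intercept + slope * x

# Gaussian family: V(mu) = 1, so the estimate is the residual mean square
phi_gaussian = pearson_dispersion(y, mu, np.ones_like, n_params=2)

# Gamma variance function V(mu) = mu^2, reusing the same fitted means
# purely for illustration (a real gamma fit would give different means)
phi_gamma = pearson_dispersion(y, mu, lambda m: m ** 2, n_params=2)

print(phi_gaussian, phi_gamma)
```

For Poisson and binomial families the dispersion is fixed at 1 by default instead of being estimated, which is why this calculation matters mainly for Gaussian, gamma and quasi-likelihood fits.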
5
votes
1 answer

Why would someone plot variance normalized by the mean?

I'm reading a scientific paper where they plot the variance of particle intensity normalized by the mean of particle intensity. I'm a bit confused and don't have an intuition for how this should be helping me. I'm used to seeing standard deviation…
user391339
  • 151
  • 1
  • 1
  • 5
4
votes
2 answers

Vector-valued estimators, intuitively why $var(\widetilde{\beta})-var(\widehat{\beta})$ being p.s.d. means $\widehat{\beta}$ more efficient?

For two scalar unbiased estimators $\widehat{\alpha}$ and $\widetilde{\alpha}$, we know that if one has smaller variance, then we say it is more efficient, which intuitively means that this estimator is more concentrated around true value (or has…
T34driver
  • 1,608
  • 5
  • 11
4
votes
1 answer

Concept of a z-score for a gamma distribution

This is a somewhat general question. For the normal distribution we have the handy concept of the z-score that can be used to measure distance from the mean in a standardized way. I've encountered a situation where I'm working with more of a gamma…
pabz
  • 41
  • 3
4
votes
3 answers

Which measure of dispersion is this function related to?

Consider a sample $x=\{x_1,...,x_n\}$. Define the average as $\bar x$. Consider the following formula: $$ \dfrac{\sum_{i=1}^n\left(\dfrac{x_i}{\bar x} \right)^c}{n} $$ or equivalently: $$ \dfrac{\sum_{i=1}^n\left(1 + \dfrac{\epsilon_i}{\bar x}…
luchonacho
  • 2,568
  • 3
  • 21
  • 38
4
votes
1 answer

Measures of clustering vs dispersal

I'm looking for metrics of the degree of grouping/clustering of spatial data. I'm not looking to formally cluster the data, i.e. classify points within groupings, but rather for an index ranging, say, from 0 for a uniform distribution to 1 for a set of…
geotheory
  • 547
  • 2
  • 4
  • 14
3
votes
0 answers

Estimating the mean from knowing the first n largest values

There is a sample of n values that are the n largest values of a population. Is there a way of getting any statistic, such as the mean or the dispersion, from such a piece of information, provided that the population is normally distributed with its size…
Germaniawerks
  • 1,027
  • 1
  • 10
  • 15