Questions tagged [discrete-data]

Refers to data generated from a distribution that has a countable sample space. The discrete-data tag may encompass categorical data, whether nominal (e.g. the distribution of race in a sample of individuals) or ordinal (e.g. socio-economic status), or an actual discrete random variate, such as a set of event counts (e.g. the number of errors on a page of text). Discrete data need not be integer-valued, however.

From Mood et al. (page 57, 1974):

"A random variable $X$ will be defined to be discrete if the range of $X$ is countable. If a random variable $X$ is discrete, then its corresponding cumulative distribution function $F_X(\cdot)$ will be defined to be discrete."

Mood, A. M., Graybill, F. A., & Boes, D. C. (1974). Introduction to the Theory of Statistics. (B. C. Harrinson & M. Eichberg, Eds.) (3rd ed., p. 564). McGraw-Hill, Inc.
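
As a small illustration of the definition above, the following sketch builds the empirical probability mass function and the step-shaped cumulative distribution function of a discrete sample whose values are not integers. The sample values and the names `sample`, `pmf`, and `ecdf` are made up for this example and do not come from Mood et al.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical sample: discrete (finitely many distinct values) but not integer-valued.
sample = [0.5, 1.5, 0.5, 2.5, 1.5, 0.5]

counts = Counter(sample)
n = len(sample)
support = sorted(counts)  # the countable set of observed values

# Empirical probability mass function, kept exact with Fraction.
pmf = {x: Fraction(counts[x], n) for x in support}

def ecdf(t):
    """Empirical CDF: total mass at support points <= t (a step function)."""
    return sum((p for x, p in pmf.items() if x <= t), Fraction(0))

for x in support:
    print(x, pmf[x], ecdf(x))
```

The CDF only changes at the support points and is flat in between, which is what makes it a discrete distribution function in the sense quoted above.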

541 questions
73 votes, 10 answers

What is the difference between discrete data and continuous data?
Albort
40 votes, 3 answers

Is Kolmogorov-Smirnov test valid with discrete distributions?

I'm comparing a sample and checking whether it follows some discrete distribution. However, I'm not entirely sure that Kolmogorov-Smirnov applies. Wikipedia seems to imply it does not. If it does not, how can I test the sample's…
38 votes, 5 answers

Clustering a dataset with both discrete and continuous variables

I have a dataset X which has 10 dimensions, 4 of which are discrete values. In fact, those 4 discrete variables are ordinal, i.e. a higher value implies a higher/better semantic. 2 of these discrete variables are categorical in the sense that for…
35 votes, 2 answers

Dropping one of the columns when using one-hot encoding

My understanding is that in machine learning it can be a problem if your dataset has highly correlated features, as they effectively encode the same information. Recently someone pointed out that when you do one-hot encoding on a categorical…
31 votes, 4 answers

Predicting with both continuous and categorical features

Some predictive modeling techniques are more designed for handling continuous predictors, while others are better for handling categorical or discrete variables. Of course there exist techniques to transform one type to another (discretization,…
28 votes, 1 answer

Kolmogorov-Smirnov with discrete data: What is proper use of dgof::ks.test in R?

Beginner questions: I want to test whether two discrete data sets come from the same distribution. A Kolmogorov-Smirnov test was suggested to me. Conover (Practical Nonparametric Statistics, 3d) seems to say that the Kolmogorov-Smirnov Test can be…
Mars
21 votes, 2 answers

Does this discrete distribution have a name?

Does this discrete distribution have a name? For $i \in \{1, \dots, N\}$, $f(i) = \frac{1}{N} \sum_{j = i}^N \frac{1}{j}$. I came across this distribution from the following: I have a list of $N$ items ranked by some utility function. I want to randomly select…
Tom
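
For the formula in the question above, $f(i) = \frac{1}{N} \sum_{j=i}^N \frac{1}{j}$, a quick check that it defines a valid probability mass function can be sketched as follows. The function name `pmf` and the choice $N = 5$ are assumptions made only for this illustration; the sketch does not try to reconstruct the selection procedure that the truncated excerpt goes on to describe.

```python
from fractions import Fraction

def pmf(i, n):
    """f(i) = (1/n) * sum_{j=i}^{n} 1/j, computed exactly."""
    return Fraction(1, n) * sum(Fraction(1, j) for j in range(i, n + 1))

n = 5  # illustrative choice of N
probs = [pmf(i, n) for i in range(1, n + 1)]
total = sum(probs)

for i, p in enumerate(probs, start=1):
    print(i, p)
print("total:", total)
```

The probabilities sum to one because each term $\frac{1}{j}$ appears in exactly $j$ of the inner sums, so the double sum equals $\frac{1}{N} \cdot N$. The mass is also strictly decreasing in $i$, so higher-ranked items are more likely.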
20 votes, 1 answer

Basic questions about discrete time survival analysis

I am attempting to carry out a discrete time survival analysis using a logistic regression model, and I'm not sure I completely understand the process. I would greatly appreciate assistance with a few basic questions. Here is the set up: I'm…
Talbot Katz
19 votes, 2 answers

Anomaly Detection with Dummy Features (and other Discrete/Categorical Features)

tl;dr What is the recommended way to deal with discrete data when performing anomaly detection? What is the recommended way to deal with categorical data when performing anomaly detection? This answer suggests using discrete data to just filter the…
17 votes, 4 answers

Probability formula for a multivariate-bernoulli distribution

I need a formula for the probability of an event in an n-variate Bernoulli distribution $X\in\{0,1\}^n$ with given $P(X_i=1)=p_i$ probabilities for a single element and for pairs of elements $P(X_i=1 \wedge X_j=1)=p_{ij}$. Equivalently I could give…
user3152
17 votes, 2 answers

How to fit a discrete distribution to count data?

I have the following histogram of count data. And I would like to fit a discrete distribution to it. I am not sure how I should go about this. Should I first superimpose a discrete distribution, say Negative Binomial distribution, on the histogram…
15 votes, 2 answers

Classification with ordered classes?

Say I want to train a classifier that assigns an image of a person as young, middle-aged, or old. A simple way would be to treat the classes as independent categories and train a classifier. But apparently there's some relationship between the…
dontloo
14 votes, 1 answer

Hamiltonian Monte Carlo and discrete parameter spaces

I've just started building models in Stan; to build familiarity with the tool, I'm working through some of the exercises in Bayesian Data Analysis (2nd ed.). The Waterbuck exercise supposes that the data $n \sim \text{binomial}(N, \theta)$, with…
Sycorax
14 votes, 1 answer

How to test if my data is discrete or continuous?

It seems to me that to choose the right statistical tools, I first have to identify whether my dataset is discrete or continuous. Could you show me how to test whether the data is discrete or continuous with R?
evdstat
13 votes, 3 answers

Properties of a discrete random variable

My stats course just taught me that a discrete random variable has a finite number of options ... I hadn't realized that. I would have thought, like a set of integers, it could be infinite. Googling and checking several web pages, including a few…
James