Questions tagged [subset]

A set is any collection of elements. Sets can be related to each other in various ways; one important relation is that of subset. A set $A$ is a subset of a set $B$ if and only if every element of $A$ is also an element of $B$; this is denoted $A \subseteq B$. Notably, $A$ is a subset of $B$ even when $B$ contains no elements beyond those of $A$ (i.e., when $A = B$). When $B$ does contain at least one element that is not in $A$, $A$ is called a "proper subset" of $B$, denoted $A \subset B$ here (some authors write $A \subsetneq B$ to avoid ambiguity). The same relationships can be expressed by calling $B$ a "superset" or "proper superset" of $A$ ($B \supseteq A$ or $B \supset A$, respectively), but referring to the potentially smaller set as a subset is more common.
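As a small illustration of these definitions, here is a minimal sketch using Python's built-in `set` type, where `<=` tests for subset and `<` for proper subset. The sets `A`, `B`, and `C` are made-up examples and not taken from any question under this tag.

```python
# Hypothetical example sets, chosen only to illustrate the definitions.
A = {1, 2}
B = {1, 2, 3}
C = {2, 1}  # same elements as A; order does not matter in a set

# A is a subset of B: every element of A is also in B.
a_subset_of_b = A <= B

# A is a proper subset of B: B has an element (3) that A lacks.
a_proper_subset_of_b = A < B

# Equal sets are subsets of each other, but not proper subsets.
a_subset_of_c = A <= C
a_proper_subset_of_c = A < C

# The superset operators mirror the subset ones.
b_superset_of_a = B >= A
b_proper_superset_of_a = B > A

print(a_subset_of_b, a_proper_subset_of_b, a_subset_of_c, a_proper_subset_of_c)
```

The named methods `A.issubset(B)` and `B.issuperset(A)` are equivalent to `A <= B` and `B >= A`.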

119 questions
11 votes, 2 answers

Bias in average age for grandmaster title qualification by age groups?

It has been known for quite some time that the youngest age at which chess players managed to qualify for the grandmaster title has significantly decreased since the 1950s, and there are currently almost 30 players who became grandmaster before…
Tsundoku
10 votes, 3 answers

How to calculate number of sets in Sigma Algebra

Example 1.2.2 of the book Statistical Inference by Casella and Berger states: if $S$ has $n$ elements, there are $2^n$ sets...(please see attached). Could you please explain how the authors derived that formula? Thank you.
Nemo
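The count in this question can be checked by brute force. The sketch below assumes the sigma algebra in question is the collection of all subsets of a finite set; the helper `power_set` and the sample set `S` are hypothetical names introduced here. Each element is either in a given subset or not, which gives two choices per element and hence $2^n$ subsets.

```python
from itertools import combinations


def power_set(elements):
    """Return every subset of `elements` as a list of Python sets."""
    items = list(elements)
    return [
        set(combo)
        for size in range(len(items) + 1)      # subset sizes 0..n
        for combo in combinations(items, size)  # all subsets of that size
    ]


# Hypothetical three-element sample space.
S = ["a", "b", "c"]
subsets = power_set(S)

# With n = 3 there are 2**3 = 8 subsets, including the empty set and S itself.
print(len(subsets), 2 ** len(S))
```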
7 votes, 1 answer

Is there a name for the increase in variance upon remeasurement after subsetting with a cut-off value?

Context: My problem relates to estimating effect sizes, such as Cohen's d, when looking at a subset of the population defined by a cut-off threshold. This effect size is the difference in two population means divided by the (assumed equal)…
David Luke Thiessen
7 votes, 1 answer

Is the mean of equal length subsets always equal to the mean of the set?

I've been fooling around with some numbers while learning R and would like to know if the following is generalizable. When I calculated the mean of numbers 1 through 100, I got this: > mean(1:100) [1] 50.5 Then when I calculated the mean of equal…
user60305
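The excerpt above is truncated, so the following is only one reading of the question: split the numbers 1 through 100 into non-overlapping chunks of equal length and compare the mean of the chunk means with the overall mean. This sketch uses Python with exact fractions rather than R; the names `mean`, `chunk_size`, and `chunks` are assumptions made for illustration.

```python
from fractions import Fraction


def mean(values):
    """Exact arithmetic mean, returned as a Fraction."""
    values = list(values)
    return Fraction(sum(values), len(values))


data = list(range(1, 101))  # the numbers 1 through 100
chunk_size = 10             # assumed to divide len(data) evenly

# Partition the data into non-overlapping, equal-length chunks.
chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
chunk_means = [mean(chunk) for chunk in chunks]

overall_mean = mean(data)
mean_of_chunk_means = mean(chunk_means)

# With equal-length chunks that cover the data, the two values agree,
# even though an individual chunk mean generally differs from the overall mean.
print(overall_mean, mean_of_chunk_means, chunk_means[0])

# With unequal chunk lengths the unweighted mean of means can differ.
unequal_means = [mean(data[:1]), mean(data[1:])]
print(mean(unequal_means))
```

The agreement holds because every chunk carries the same weight; with unequal chunk lengths, each chunk mean would need to be weighted by its chunk size.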
6 votes, 1 answer

Foundational sufficient statistics

I've been reading through Casella and Berger's Statistical Inference and am having a little trouble understanding something in their explanation of sufficient statistics. Here is the passage from pages 272-273 before I proceed (note that…
6 votes, 1 answer

Estimate mean & standard deviation of set S if I know stats of an inner set and outer set

Suppose I have sets $S_1\subset S_2\subset S_3$. I know exactly the size, mean, and standard deviation of both $S_1$ and $S_3$ and want to estimate the mean and standard deviation of $S_2$, where I only know its size. What's the best estimation I…
5 votes, 4 answers

Variance of set of subsets

First of all, sorry for the sloppy terminology, but I am right now looking for the name of a statistical concept. I was asked to calculate the "turnover" of the Facebook friends commenting on my posts, so I am looking for an indicator that has high…
MrTJ
5 votes, 3 answers

Selecting the most similar subset from an alternative dataset

Background: I have two different datasets coming from two different sources. Dataset A has m features (e.g. v1,v2,v3,v4) with, let us say, 1 million instances. The second dataset, B, has n features, some of which are the same as in A and then some (e.g.…
earthlink
4 votes, 1 answer

Testing for a drop in bookings

We're developing real-time alerts for fine-grained (every 5 minutes) time series bookings data, and I'm looking for the best approach to doing this. The idea is that if over the past 10–15 minutes (say) there's a big drop in bookings volume relative to…
user11284
4 votes, 1 answer

Applying linear regression to a data subset

I've just run a linear regression on an entire data set, but now I need to run the regression with data just from females within the data. Females are denoted under the female column of the data set by a 1. Males are denoted by a 0 under the same…
user41218
4 votes, 0 answers

Separate ANOVAs on subsets of data

I often read that, after obtaining complex (e.g., four-way) interactions in a factorial ANOVA, researchers decide to split their data by a factor (e.g., gender) and run separate ANOVAs for these two groups. I realize that this can be very helpful…
Statsguest
4 votes, 1 answer

Probability problem - Probability to pick a subset divisible by 3

I'm trying to solve this puzzle but I'm stuck. I thought about using the law of total probability to solve intermediate problems with subsets of size k, but it didn't help me much. Is anyone kind enough to give me the right approach…
Meliodas
4 votes, 1 answer

Unsupervised clustering of sequence of events to subsequences

I have a big dataset of M sequences of [1 - N] events, where each event has multiple properties (start date, end date, location, and more contextual features). For each sequence of [1-N] events I want to find up to K (1<=K<=N) subsequences…
4 votes, 0 answers

Dealing with many NA's in very large datasets for Lasso

I have a few very large and quite "dirty" (survey) datasets. Primarily, there are lots of NA's. These NA's are mostly the result of different questions being asked in different waves. It is perfectly possible that a question present in the dataset…
Tom
4 votes, 2 answers

Is it necessary to retrain the selected model on the entire data set before putting it into production?

When a dataset needs to be modeled, the usual process is to set part of it aside as a holdout set, which is "unseen" by the training method and is used to test the performance of models created using various techniques. Also while training, cross…