Questions tagged [partitioning]

A partition is an assignment of every element of a set to one and only one subset, with no empty subsets. That is, no element of the original set (the superset) is unassigned, no element is assigned to more than one subset, and there is no subset without any assigned elements. A common instance of partitioning in statistics is the partitioning of sums of squares for F-tests.
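The three conditions in the definition can be checked mechanically. The following is a minimal sketch in Python, not part of the tag description itself: the function names `is_partition` and `sums_of_squares` are hypothetical, chosen only for illustration. The second function illustrates the sums-of-squares case, where partitioning observations into groups splits the total sum of squares into a between-group part and a within-group part.

```python
from itertools import chain


def is_partition(subsets, universe):
    """Return True if `subsets` is a partition of `universe` (hypothetical helper)."""
    blocks = [set(b) for b in subsets]
    universe = set(universe)
    # Condition 3: there is no subset without any assigned elements.
    if any(len(b) == 0 for b in blocks):
        return False
    # Condition 1: no element is unassigned (and nothing foreign appears).
    if set(chain.from_iterable(blocks)) != universe:
        return False
    # Condition 2: no element is in more than one subset. Given that the
    # union equals the universe, the blocks are pairwise disjoint exactly
    # when their sizes add up to the size of the universe.
    return sum(len(b) for b in blocks) == len(universe)


def sums_of_squares(groups):
    """Return (total, between, within) sums of squares for grouped data."""
    values = [x for g in groups for x in g]
    grand_mean = sum(values) / len(values)
    total = sum((x - grand_mean) ** 2 for x in values)
    # Between-group part: squared distance of each group mean from the
    # grand mean, weighted by group size.
    between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group part: squared distance of each value from its group mean.
    within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return total, between, within


print(is_partition([{1, 2}, {3}], {1, 2, 3}))     # True
print(is_partition([{1, 2}, {2, 3}], {1, 2, 3}))  # False: 2 is assigned twice
print(is_partition([{1}, {3}], {1, 2, 3}))        # False: 2 is unassigned

total, between, within = sums_of_squares([[1, 2, 3], [4, 5, 6, 7]])
print(abs(total - (between + within)) < 1e-9)     # True
```

Up to floating-point error, the total always equals the between-group plus the within-group sum of squares, which is the decomposition an F-test relies on.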

120 questions
46
votes
8 answers

How to do community detection in a weighted social network/graph?

I'm wondering if someone could suggest what are good starting points when it comes to performing community detection/graph partitioning/clustering on a graph that has weighted, undirected edges. The graph in question has approximately 3 million…
34
votes
5 answers

How to split dataset for time-series prediction?

I have historic sales data from a bakery (daily, over 3 years). Now I want to build a model to predict future sales (using features like weekday, weather variables, etc.). How should I split the dataset for fitting and evaluating the models? Does…
tobip
  • 1,450
  • 4
  • 14
  • 11
18
votes
3 answers

Data partitioning for spatial data

I am constructing different configurations of a Random Forest in order to investigate the influence of well-design variables and location, on the first-year production volumes of shale oil wells, within a given area in the US. In the different model…
15
votes
2 answers

Partitioning trees in R: party vs. rpart

It's been a while since I looked at partitioning trees. Last time I did this sort of thing, I like party in R (created by Hothorn). The idea of conditional inference via sampling makes sense to me. But rpart also had appeal. In the current…
Peter Flom
  • 94,055
  • 35
  • 143
  • 276
12
votes
1 answer

Difference in implementation of binary splits in decision trees

I am curious about the practical implementation of a binary split in a decision tree - as it relates to levels of a categorical predictor $X_j$. Specifically, I often will utilize some sort of sampling scheme (e.g. bagging, oversampling etc) when…
B_Miner
  • 7,560
  • 20
  • 81
  • 144
11
votes
3 answers

Does Newman's network modularity work for signed, weighted graphs?

The modularity of a graph is defined on its Wikipedia page. In a different post, somebody explained that modularity can easily be computed (and maximized) for weighted networks because the adjacency matrix $A_{ij}$ can as well contain valued ties.…
11
votes
0 answers

What approaches use multiple eigenvectors in graph spectral clustering?

Background: In Newman's PNAS 2006 paper Modularity and community structure in networks, the first eigenvector splits the graph in two clusters, and then each cluster can be further divided by eigenvector of a modified Laplacian of the nodes within…
9
votes
2 answers

Is $R^2$ value valid for insignificant OLS regression model?

I am interested in stating that ___ % of the variance in Y is explained uniquely by $X_1$ and ___ % is explained uniquely by $X_2$. Is there some way to obtain this from a multiple regression model, or do I need to obtain adjusted $R^2$ values…
Patrick
  • 1,381
  • 1
  • 15
  • 21
8
votes
1 answer

Estimate the population variance from a set of means

I have a set of measurements which is partitioned into M partitions. However, I only have the partition sizes $N_i$ and the means $\bar{x}_i$ from each partition. Because all measurements are assumed to be from the same distribution, I believe I can…
8
votes
2 answers

Newman's modularity clustering for graphs

I am interested in running Newman's modularity clustering algorithm on a large graph. If you can point me to a library (or R package, etc) that implements it I would be most grateful.
laramichaels
  • 1,119
  • 3
  • 12
  • 12
6
votes
1 answer

Nested ANOVA: Unequal sample sizes? Variance components?

I am completely out of my depth on this, and all the reading I try to do just confuses me. I'm hoping you can explain things to me in a way that makes sense. (As always seems to be the case, "It shouldn't be this hard!") I'm trying to help a student…
Sam R
  • 395
  • 2
  • 10
6
votes
1 answer

Sampling uniformly from the set of partitions of a set?

In this blogpost, the writer states "It’s easy to sample uniformly from the set of partitions of a set: you pick a number of bins using an appropriate exponential distribution, then randomly i.i.d. toss each element of the set into one of those…
6
votes
1 answer

Interpreting output of igraph's fastgreedy.community clustering method

With the help of several people in this community I have been wetting my feet in clustering some social network data using igraph's implementation of modularity-based clustering. I am having some trouble interpreting the output of this routine and…
laramichaels
  • 1,119
  • 3
  • 12
  • 12
5
votes
2 answers

R procedure for comparing multiple categorical variables (similar to anova() followed by t.test() for continuous)?

Big Picture: How can I implement partitioned Chi Square in R? I understand how to perform the overall Chi square, and then how to get individual parameters (observed counts, expected counts, residuals, etc.). However, I don't understand how to get…
sudo make install
  • 242
  • 1
  • 3
  • 9
5
votes
0 answers

Variance partitioning - why be cautious?

I'm about to use variance partitioning to interpret my results of a given model and across models and have come across various criticisms of it most notably by Pedhazur (1982, 1997). Also, the criticisms are of both the approaches to VP -…
Ph8
  • 51
  • 2