Questions tagged [model-based-clustering]

60 questions
14 votes, 1 answer

Robust cluster method for mixed data in R

I'm looking to cluster a small data set (64 observations of 4 interval variables and a single three-level categorical variable). Now, I'm quite new to cluster analysis, but I am aware that there has been considerable progress since the days when…
fmark
13 votes, 1 answer

Mclust model selection

The R package mclust uses BIC as the criterion for cluster model selection. From my understanding, a model with the lowest BIC should be selected over other models (if you care only about BIC). However, when BIC values are all negative, the…
Jon
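For context on the sign question: mclust reports BIC as $2\log L - k\log n$ (larger is better), which is the negative of the textbook form $-2\log L + k\log n$ (smaller is better). The two conventions therefore rank models identically, and all-negative values under the mclust convention are normal. A minimal sketch, assuming hypothetical log-likelihoods and parameter counts invented for illustration (not mclust output):

```python
import math

def bic_mclust(loglik: float, n_params: int, n_obs: int) -> float:
    """BIC in the mclust convention: 2*logL - k*log(n); larger is better."""
    return 2.0 * loglik - n_params * math.log(n_obs)

def bic_classic(loglik: float, n_params: int, n_obs: int) -> float:
    """Textbook BIC: -2*logL + k*log(n); smaller is better."""
    return -2.0 * loglik + n_params * math.log(n_obs)

# Hypothetical fitted models: (label, log-likelihood, number of free parameters).
# These numbers are made up purely to illustrate the sign conventions.
models = [
    ("G=1", -520.0, 5),
    ("G=2", -480.0, 11),
    ("G=3", -476.0, 17),
]
n_obs = 150

# mclust convention: pick the maximum; textbook convention: pick the minimum.
best_mclust = max(models, key=lambda m: bic_mclust(m[1], m[2], n_obs))
best_classic = min(models, key=lambda m: bic_classic(m[1], m[2], n_obs))

for name, ll, k in models:
    print(f"{name}: mclust BIC = {bic_mclust(ll, k, n_obs):.1f}, "
          f"classic BIC = {bic_classic(ll, k, n_obs):.1f}")
print("mclust convention picks:", best_mclust[0])
print("classic convention picks:", best_classic[0])
```

Both rules select the same model because one criterion is exactly the negative of the other.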
8 votes, 2 answers

Cluster clickstream data

I've recently entered the realm of machine learning, and a project I am working on requires me to cluster users based on the order in which they visited webpages on a website. I have data in the form of: ['user_id', 1, 2, 4, 6, 3, 7, 3, 2, 4...] Where each…
8 votes, 1 answer

What are the primary differences between Taxometric analyses (e.g., MAXCOV, MAXEIG) and Latent Class analyses?

Recent research has attempted to determine if certain psychological constructs are latently dimensional or taxonic (i.e., including taxons or classes). For example, researchers may be interested in finding out if there is a certain "class" of people…
7 votes, 1 answer

Are the real and imaginary components of an FFT frequency element correlated?

I want to use model-based clustering to classify 1,225 time series (24 periods each). I have decomposed these time series using the fast Fourier transform and selected the harmonics that explain at least a threshold percentage of time series…
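The preprocessing described here (FFT of each length-24 series, then keeping the real and imaginary parts of selected harmonics as clustering features) can be sketched with NumPy. The selection rule below (keep harmonics whose share of non-DC one-sided spectral power is at least a threshold), the 0.05 threshold, and the helper name `harmonic_features` are assumptions for illustration; the question's own rule is truncated.

```python
import numpy as np

def harmonic_features(series: np.ndarray, threshold: float = 0.05) -> dict[int, tuple[float, float]]:
    """Return {harmonic index: (real, imag)} for harmonics whose share of the
    non-DC one-sided spectral power is at least `threshold` (hypothetical rule)."""
    coeffs = np.fft.rfft(series)                    # one-sided spectrum, length n//2 + 1
    power = np.abs(coeffs[1:]) ** 2                 # drop the DC term at index 0
    share = power / power.sum()
    keep = np.nonzero(share >= threshold)[0] + 1    # shift back to spectrum indices
    return {int(k): (float(coeffs[k].real), float(coeffs[k].imag)) for k in keep}

t = np.arange(24)
# Illustrative noise-free series: a strong harmonic 1 (cosine) plus a weaker harmonic 3 (sine).
x = 5.0 + 3.0 * np.cos(2 * np.pi * 1 * t / 24) + 1.0 * np.sin(2 * np.pi * 3 * t / 24)
feats = harmonic_features(x)
print(feats)
```

For a cosine of amplitude A the coefficient is real (A·N/2), and for a sine it is purely imaginary (−A·N/2), so which component carries the signal depends on the phase of each harmonic in a given series.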
5 votes, 2 answers

Are there any good papers comparing different philosophical views of cluster analysis?

Lots of people use cluster analysis. I've heard very few explicitly say why. I imagine this is because within a given field, most researchers seem to understand why clustering is used for the problems typical to that area - but uses vary between…
D L Dahly
5 votes, 1 answer

Suggestions for multi-dimensional clustering

I am working on a genomics project and have ended up with a large table of around 800 measurements (cases/rows), around 200 channels (columns/continuous variables), and 5 categories (one categorical column). I would like to do two things: Try to…
5 votes, 1 answer

Best clustering algorithm for real estate data

I want to cluster real estate data to determine average price patterns in urban and rural regions. My data set contains the size, number of bedrooms, number of bathrooms, and coordinates of the properties. Which would be the best clustering algorithm for this…
5 votes, 2 answers

LCA number of parameters & degrees of freedom

I have a series of physicians' claims submissions. I would like to perform cluster analysis as an exploratory tool to find patterns in how physicians bill based on things like Revenue Codes, Procedure Codes, etc. The data are all polytomous, and…
4 votes, 1 answer

Ratio estimation model in 2-stage cluster sampling

I've been reading about stratified sampling, 2-stage SRS sampling, and ratio estimation in finite populations and I have a question. When the ratio estimator is introduced, it seems that in order for it to perform well it is necessary that the…
jld
4 votes, 1 answer

Mclust: Data frame order affects solution

I've come across some unexpected behavior in mclust::Mclust: the order of the variables in the data passed to Mclust affects the solution it finds. In the example below, the first ordering of the variables…
Cody
4 votes, 1 answer

How to convert molecular categorical variables to dummy variables for cluster analysis?

I would like to use a clustering method, e.g. 'mclust', in R to classify each individual in my dataset into one of k groups. I have 7 continuous and 3 categorical variables. These and other hierarchical clustering methods do not allow for use of categorical…
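Dummy (one-hot) coding turns a categorical variable with L levels into L indicator columns, or L − 1 if one level is dropped as a reference category. A minimal plain-Python sketch; the helper `one_hot` and the example columns are hypothetical. It shows only the encoding step; whether 0/1 indicator columns are suitable input for a Gaussian mixture model such as mclust is a separate modelling question.

```python
def one_hot(rows, categorical, drop_first=False):
    """Replace each categorical column with 0/1 indicator columns.

    rows: list of dicts mapping column name -> value.
    categorical: names of the categorical columns to encode.
    drop_first: if True, omit the first (sorted) level as the reference category.
    """
    # Collect the observed levels of each categorical column, in sorted order.
    levels = {c: sorted({r[c] for r in rows}) for c in categorical}
    encoded = []
    for r in rows:
        # Keep the non-categorical (e.g. continuous) columns unchanged.
        out = {k: v for k, v in r.items() if k not in categorical}
        for c in categorical:
            lv = levels[c][1:] if drop_first else levels[c]
            for level in lv:
                out[f"{c}={level}"] = 1 if r[c] == level else 0
        encoded.append(out)
    return encoded

# Hypothetical data: one continuous and one three-level categorical variable.
data = [
    {"length": 4.2, "habitat": "forest"},
    {"length": 3.9, "habitat": "river"},
    {"length": 5.1, "habitat": "meadow"},
]
for row in one_hot(data, ["habitat"]):
    print(row)
```

Dropping a reference level avoids perfectly collinear indicator columns, which matters for some models but not for all clustering methods.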
4 votes, 1 answer

Looking for a hierarchical-clustering method for multiple data types

I would like to find a hierarchical-clustering method that assigns every individual in my dataset to one of k groups. I have considered several classic methods, such as the ordination methods PCA and NMDS, "mclust", etc., but three of my variables are…
3 votes, 1 answer

Estimating effects on membership in a cluster

Suppose you want to find clusters based on a set of variables $Y$, and that you want to estimate the effects of some variables $X$ on membership in those clusters. Here is how I am doing it now. Step 1: Perform model-based clustering on the…
Brash Equilibrium
3 votes, 1 answer

Can first-order Markov chain be considered a special case of a hidden Markov model?

I am trying to use the R package depmixS4 to cluster time series with model-based clustering. The model consists of K components, each a first-order Markov model. The Expectation-Maximization algorithm is then used to estimate model…
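In a mixture of first-order Markov chains, the per-component quantity EM needs is the log-likelihood of a sequence under that chain, $\log \pi(s_1) + \sum_{t \ge 2} \log P(s_t \mid s_{t-1})$, and the E-step turns these into posterior component memberships. A minimal plain-Python sketch of that E-step for one sequence; the transition matrices, mixing weights, and helper names are hypothetical, and this is not depmixS4's API.

```python
import math

def markov_loglik(seq, init, trans):
    """Log-likelihood of a state sequence under a first-order Markov chain.

    init[i]     = P(s_1 = i)
    trans[i][j] = P(s_t = j | s_{t-1} = i)
    """
    ll = math.log(init[seq[0]])
    for prev, cur in zip(seq, seq[1:]):
        ll += math.log(trans[prev][cur])
    return ll

def responsibilities(seq, weights, components):
    """E-step: posterior probability that `seq` was generated by each component."""
    logs = [math.log(w) + markov_loglik(seq, init, trans)
            for w, (init, trans) in zip(weights, components)]
    m = max(logs)                         # log-sum-exp trick for numerical stability
    unnorm = [math.exp(l - m) for l in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Two hypothetical 2-state components: one "sticky", one "alternating".
sticky = ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]])
alternating = ([0.5, 0.5], [[0.1, 0.9], [0.9, 0.1]])
seq = [0, 0, 0, 0, 1, 1, 1, 1]
r = responsibilities(seq, [0.5, 0.5], [sticky, alternating])
print(r)
```

The sequence has six "stay" transitions and one "switch", so almost all posterior weight goes to the sticky component; an M-step would then re-estimate each component's transition probabilities from responsibility-weighted transition counts.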