Questions tagged [gower-similarity]

Gower coefficient is a similarity measure for mixed type data (quantitative, binary, categorical). 'Gower distance' refers to this tag too.

26 questions
47
votes
2 answers

Hierarchical clustering with mixed type data - what distance/similarity to use?

In my dataset we have both continuous and naturally discrete variables. I want to know whether we can do hierarchical clustering using both type of variables. And if yes, what distance measure is appropriate?
13
votes
2 answers

How does the Gower distance calculate the difference between binary variables'?

I have 17 numeric and 5 binary (0-1) variables, with 73 samples in my dataset. I need to run a cluster analysis. I know that the Gower distance is a good metric for datasets with mixed variables. However, I couldn't understand how the Gower distance…
Emrah Bilgiç
  • 289
  • 2
  • 7
  • 14
9
votes
1 answer

Confidence intervals around a centroid with modified Gower similarity

I would like to obtain 95% confidence intervals for centroids based on Gower similarity between some mulivariate samples (community data from sediment cores). I have so far used the vegan{} package in R to obtain modified Gower similarity between…
Margaret
  • 205
  • 2
  • 5
6
votes
2 answers

Gower's (dis)similarity index

I would like to ask a question about Gower similarity/dissimilarity index. Is it ok to use the Gower dissimilarity measure with Ward linkage clustering? I was reading that the Gower similarity index should not be used with Ward linkage because the…
M. Tremmel
  • 61
  • 1
  • 1
  • 2
5
votes
1 answer

How to use Gower's Distance with clustering algorithms in Python

I am trying to cluster by dataset with mixed features using k-means. As a distance metric, I am using Gower's Dissimilarity. I want to ask 2 things: -Is k-means an appropriate algorithm that can accept Gower's matrix results as an input? or how can…
Beg
  • 151
  • 1
  • 3
5
votes
2 answers

Gower distance with R functions; "gower.dist" and "daisy"

I have 9 numeric and 5 binary (0-1) variables, with 73 samples in my dataset. I know that the Gower distance is a good metric for datasets with mixed variables. I tried both daisy(cluster) and gower.dist(StatMatch) functions. We can assign weights…
Emrah Bilgiç
  • 289
  • 2
  • 7
  • 14
4
votes
1 answer

In cluster analysis, can you use Gower's coefficient of similarity with a k-means clustering method?

I am researching cluster analysis, and I am interested in variables that are both categorical and continuous, for which I have read that a Gower's similarity coefficient is a good proximity measure. I have read that Gower's similarity coefficient is…
Laura
  • 61
  • 2
  • 3
3
votes
1 answer

Is one-hot encoding and standardization of data equivalent to Gower's distance?

For clustering and other techniques for mixed data (numerical and categorical), Gower's distance is usually more preferred than Euclidean distance because the former computes distance differently for numerical data and categorical data. For the…
3
votes
0 answers

Gower's dissimilarity measure and Ward's clustering method

I have read some threads on this website saying that it is not OK to use Gower's dissimilarity matrix for Ward's clustering algorithm. I have mixed type variables, first I had a dissimilarity matrix with Gower's formula in R (daisy function). I had…
2
votes
0 answers

Gower distance and MDS: How to determine which variables count?

I have morphological data from two different determined groups (It and Nd), where the variables are heterogeneous (continuous, semi-qualitative, binomial). I want to know if the groups differ morphologically and if so, which variables account most…
2
votes
1 answer

Cluster many thousands observations (mixed variable types). Cluster subsample and then classify the rest observations?

I'm trying to run a cluster analysis on a large dataset (70k+ observations to cluster) with mixed variables (numeric, ordinal, binary and nominal). I don't think I can create the distance matrix using SAS over the entire dataset. So, I have tried to…
2
votes
1 answer

Determining number of clusters with SSE scree plot with Gower's coefficient of similarity

I am researching cluster analysis, and I am interested in variables that are both categorical and continuous, for which I have read that a Gower's similarity coefficient is a good proximity measure. I am interested in first using an average linkage…
Laura
  • 61
  • 2
  • 3
2
votes
1 answer

Manual computation of Gower's similarity coefficient

There is an example on the computation of Gower's similarity coefficient on the page, Gower's similarity coefficient I am trying to work out the similarity manually between patient 1 and 2, however my computation is giving me completely different…
Nimonika
  • 23
  • 3
2
votes
2 answers

How to use Gower's Distance with DBSCAN algorithm in Python

I have been researching about using DBSCAN with sklearn in python but it doesn't have Gower's distance metric built in. All the other implementations are in R in this community. I'm using a dataset with categorical and continuous features and as far…
2
votes
1 answer

How Gower's dissimilarity handle missing values in numeric columns?

I would like to ask a question about Gower dissimilarity, I was wondering how Gower measure handle missing values in numeric columns, especially that Gower standardized each column based on the range of the same attribute ? I have read both details…
1
2