Highest Voted 'similarities' Questions - Statistical Analysis Stack Exchange

50

votes

6 answers

Percentage of overlapping regions of two normal distributions

Given two normal distributions with parameters $\sigma_1,\ \mu_1$ and $\sigma_2,\ \mu_2$, how can I calculate the percentage of overlap between the two distributions? I suppose this problem has a specific name; are you aware of any…

asked Jun 22 '11 at 07:59

Ali Salehi

603
1
6
5
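One common way to quantify the overlap asked about above is the overlapping coefficient, the integral of $\min(f_1(x), f_2(x))$ over the real line. The following is only a minimal sketch under that assumption; the function names, the integration range of 8 standard deviations, and the step count are illustrative choices, not something taken from the question or its answers.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def overlap_coefficient(mu1, sigma1, mu2, sigma2, steps=20000):
    """Approximate the integral of min(f1, f2) with the trapezoidal rule.

    Assumption: integrating over +/- 8 standard deviations around both means
    captures essentially all of the probability mass.
    """
    lo = min(mu1 - 8.0 * sigma1, mu2 - 8.0 * sigma2)
    hi = max(mu1 + 8.0 * sigma1, mu2 + 8.0 * sigma2)
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        y = min(normal_pdf(x, mu1, sigma1), normal_pdf(x, mu2, sigma2))
        # End points get half weight in the trapezoidal rule.
        total += (0.5 if i in (0, steps) else 1.0) * y
    return total * h


if __name__ == "__main__":
    # Identical distributions overlap completely; shifted ones overlap less.
    print(overlap_coefficient(0.0, 1.0, 0.0, 1.0))
    print(overlap_coefficient(0.0, 1.0, 1.0, 1.0))
```

When the two standard deviations are equal, the result can be checked against the closed form $2\,\Phi(-|\mu_1-\mu_2| / (2\sigma))$, where $\Phi$ is the standard normal CDF.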

47

votes

2 answers

Hierarchical clustering with mixed type data - what distance/similarity to use?

In my dataset we have both continuous and naturally discrete variables. I want to know whether we can do hierarchical clustering using both types of variables. If so, what distance measure is appropriate?

clustering similarities distance-functions mixed-type-data gower-similarity

asked Sep 07 '11 at 16:18

Beta

5,784
9
33
44

33

votes

1 answer

Comparing hierarchical clustering dendrograms obtained by different distances & methods

[The initial title "Measurement of similarity for hierarchical clustering trees" was later changed by @ttnphns to better reflect the topic] I am performing a number of hierarchical cluster analyses on a dataframe of patient records (e.g. similar to…

r clustering distance-functions similarities dendrogram

asked Jul 07 '13 at 07:57

Wouter

2,102
3
17
26

28

votes

1 answer

Converting similarity matrix to (euclidean) distance matrix

In the random forest algorithm, Breiman (the author) constructs a similarity matrix as follows: (1) send all learning examples down each tree in the forest; (2) if two examples land in the same leaf, increment the corresponding element in the similarity matrix by 1; (3) normalize…

random-forest distance similarities euclidean

asked Sep 12 '12 at 09:59

Uros K

467
1
6
9
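The counting procedure described in the random forest question above can be sketched without any forest library, provided the leaf reached by each example in each tree is already known. The input layout `leaf_ids[t][i]` and the function name are hypothetical, and dividing by the number of trees is an assumption about the normalization step, which the excerpt truncates.

```python
def proximity_matrix(leaf_ids):
    """Build a similarity (proximity) matrix from per-tree leaf assignments.

    leaf_ids[t][i] is the identifier of the leaf that example i reaches in
    tree t (hypothetical input layout for this sketch).
    """
    n_trees = len(leaf_ids)
    n_examples = len(leaf_ids[0])
    counts = [[0.0] * n_examples for _ in range(n_examples)]
    for tree in leaf_ids:
        for i in range(n_examples):
            for j in range(n_examples):
                # Two examples landing in the same leaf increments their entry by 1.
                if tree[i] == tree[j]:
                    counts[i][j] += 1.0
    # Assumed normalization: divide by the number of trees so entries lie in [0, 1].
    return [[value / n_trees for value in row] for row in counts]


if __name__ == "__main__":
    # Two trees, three examples.
    for row in proximity_matrix([[0, 0, 1], [0, 1, 1]]):
        print(row)
```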

26

votes

2 answers

Similarity Coefficients for binary data: Why choose Jaccard over Russell and Rao?

From the Encyclopedia of Statistical Sciences I understand that given $p$ dichotomous (binary: 1=present; 0=absent) attributes (variables), we can form a contingency table for any two objects $i$ and $j$ of a sample: j 1 0 ------- …

binary-data similarities association-measure

asked Jun 13 '13 at 21:24

wflynny

455
1
6
10
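To make the comparison in the question above concrete, here is a small sketch using the usual 2x2 contingency counts: $a$ (both 1), $b$ (1 in the first object only), $c$ (1 in the second object only), and $d$ (both 0). With those counts, Jaccard is $a/(a+b+c)$ and Russell and Rao is $a/(a+b+c+d)$. The function names and the choice to return 0.0 when the Jaccard denominator is zero are assumptions of this sketch.

```python
def contingency(x, y):
    """Return the counts (a, b, c, d) for two equal-length 0/1 vectors."""
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)
    d = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0)
    return a, b, c, d


def jaccard(x, y):
    """Jaccard similarity: joint absences (d) are ignored entirely."""
    a, b, c, _ = contingency(x, y)
    denom = a + b + c
    # Assumed convention: two all-zero vectors get similarity 0.0.
    return a / denom if denom else 0.0


def russell_rao(x, y):
    """Russell and Rao similarity: joint absences count in the denominator."""
    a, b, c, d = contingency(x, y)
    return a / (a + b + c + d)


if __name__ == "__main__":
    x = [1, 1, 0, 0, 0]
    y = [1, 0, 1, 0, 0]
    print(jaccard(x, y), russell_rao(x, y))
    # Appending attributes absent in both objects changes only Russell and Rao.
    print(jaccard(x + [0, 0], y + [0, 0]), russell_rao(x + [0, 0], y + [0, 0]))
```

The usage lines illustrate the practical difference: adding attributes that neither object has leaves Jaccard unchanged but lowers Russell and Rao.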

25

votes

5 answers

Compute a cosine dissimilarity matrix in R

I want to create heatmaps based upon cosine dissimilarity. I'm using R and have explored several packages, but cannot find a function to generate a standard cosine dissimilarity matrix. The built-in dist() function doesn't support cosine distances,…

r clustering similarities cosine-similarity cosine-distance

asked Jul 03 '12 at 12:30

Greg Slodkowicz

405
1
5
10
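The question above is about R, but the quantity itself is language independent: cosine dissimilarity is commonly taken as one minus the cosine similarity between two rows. The sketch below uses plain Python to show the computation; the function name is hypothetical, and it assumes every row has at least one nonzero entry (a zero row would cause a division by zero).

```python
import math


def cosine_dissimilarity_matrix(rows):
    """Pairwise 1 - cos(angle) between the rows of a list-of-lists matrix.

    Assumption: no row is the zero vector.
    """
    norms = [math.sqrt(sum(v * v for v in row)) for row in rows]
    n = len(rows)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dot = sum(a * b for a, b in zip(rows[i], rows[j]))
            out[i][j] = 1.0 - dot / (norms[i] * norms[j])
    return out


if __name__ == "__main__":
    data = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
    for row in cosine_dissimilarity_matrix(data):
        print(row)
```

Rows pointing in the same direction get dissimilarity 0 regardless of their length, and orthogonal rows get 1.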

22

votes

5 answers

Similarity measures between curves?

I would like to compute the measure of similarity between two ordered sets of points---the ones under User compared with the ones under Teacher: The points are curves in 3D space, but I was thinking that the problem is simplified if I plotted them…

multiple-comparisons similarities curves procrustes-analysis

asked May 05 '12 at 22:22

Alex

321
1
3
4

21

votes

4 answers

Euclidean distance score and similarity

I'm just working with the book Collective Intelligence (by Toby Segaran) and came across the Euclidean distance score. In the book the author shows how to calculate the similarity between two recommendation arrays (i.e. $\textrm{person} \times…

distance-functions similarities

asked Mar 23 '13 at 12:05

navige

325
1
2
6

17

votes

1 answer

What is the difference between Dice, Jaccard, and overlap coefficients?

I have come across three different statistical measures for comparing two sets, in particular for image segmentation (e.g., comparing the similarity between the ground truth and the segmented result). What are the differences between these measurements…

machine-learning similarities dice jaccard-similarity image-segmentation

asked Oct 05 '16 at 23:20

RockTheStar

11,277
31
63
89
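For reference, the three coefficients named in the question above have standard set definitions: Dice is $2|A \cap B| / (|A| + |B|)$, Jaccard is $|A \cap B| / |A \cup B|$, and the overlap coefficient is $|A \cap B| / \min(|A|, |B|)$. The sketch below applies them to Python sets; treating a segmentation as a set of pixel indices is an assumption made for illustration, and both sets are assumed non-empty.

```python
def dice(a, b):
    """Dice coefficient: twice the intersection over the sum of the sizes."""
    return 2.0 * len(a & b) / (len(a) + len(b))


def jaccard_sets(a, b):
    """Jaccard coefficient: intersection over union."""
    return len(a & b) / len(a | b)


def overlap(a, b):
    """Overlap coefficient: intersection over the size of the smaller set."""
    return len(a & b) / min(len(a), len(b))


if __name__ == "__main__":
    # Hypothetical pixel indices for a ground truth and a segmentation result.
    truth = {1, 2, 3, 4}
    result = {3, 4, 5}
    print(dice(truth, result), jaccard_sets(truth, result), overlap(truth, result))
```

Note that the overlap coefficient equals 1 whenever one set is contained in the other, while Dice and Jaccard equal 1 only when the sets are identical.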

17

votes

3 answers

Can someone please explain dynamic time warping for determining time series similarity?

I am trying to grasp the dynamic time warping measure for comparing time series. I have three time series datasets like this: T1 <- structure(c(0.000213652387565, 0.000535045478866, 0, 0, 0.000219346347883, 0.000359669104424,…

r time-series clustering similarities

asked Feb 03 '12 at 07:21

Legend

4,232
7
37
50
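As a rough illustration of what dynamic time warping computes, the textbook dynamic-programming formulation fills a table where each cell holds the cheapest cumulative cost of aligning two prefixes. The sketch below assumes an absolute-difference local cost and no window constraint; it is written in Python rather than the R used in the question, and the function name is hypothetical.

```python
def dtw_distance(s, t):
    """Dynamic time warping cost between two numeric sequences.

    Assumptions: local cost is |s[i] - t[j]|, and each step may advance
    the first sequence, the second sequence, or both.
    """
    n, m = len(s), len(t)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i points of s with the first j of t.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(s[i - 1] - t[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # s advances, t stays on the same point
                cost[i][j - 1],      # t advances, s stays on the same point
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


if __name__ == "__main__":
    # The second series repeats one value, so warping aligns them exactly.
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
    print(dtw_distance([0, 0], [1, 1]))
```

Unlike a point-by-point Euclidean comparison, this allows series of different lengths and tolerates local stretching along the time axis.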

16

votes

1 answer

What is the optimal distance function for individuals when attributes are nominal?

I do not know which distance function between individuals to use in case of nominal (unordered categorical) attributes. Some textbooks I have read suggest the simple matching function, but others suggest that I should change the nominal to…

distance-functions distance similarities association-measure categorical-data

asked Apr 11 '13 at 04:59

Jane Doe

311
1
2
6

15

votes

4 answers

What is the purpose of row normalization

I understand the reasoning behind column normalization, as it causes features to be weighted equally, even if they are not measured on the same scale - however, often in the nearest neighbour literature, both columns and rows are normalized. What is…

normalization distance similarities k-nearest-neighbour

asked Oct 04 '15 at 22:57

curiosity_delivers

173
1
1
8

15

votes

3 answers

Quantifying similarity between two data sets

Summary: Trying to find the best method to summarize the similarity between two aligned data sets using a single value. Details: My question is best explained with a diagram. The graphs below show two different data sets, each with values…

similarities

asked Mar 19 '15 at 00:43

Gabriel Southern

271
1
2
8

12

votes

3 answers

Distance Metrics For Binary Vectors

I have vectors of the same length consisting of 1s and 0s. I am trying to find out how similar they are. So far I am using Hamming distance: I calculate the sum of one vector, then the sum of the second vector, and the difference between these is the difference of…

categorical-data binary-data distance similarities

asked May 11 '13 at 11:12

totpiko

241
1
2
3
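For comparison with the procedure described in the question above, Hamming distance is usually defined as the number of positions at which two equal-length vectors differ, which is not the same as the difference of their sums. The sketch below contrasts the two; the function names are hypothetical.

```python
def hamming_distance(x, y):
    """Number of positions where two equal-length vectors differ."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(1 for xi, yi in zip(x, y) if xi != yi)


def sum_difference(x, y):
    """Absolute difference of the vector sums, as described in the question."""
    return abs(sum(x) - sum(y))


if __name__ == "__main__":
    x = [1, 0, 1, 0]
    y = [0, 1, 0, 1]
    # The vectors disagree everywhere, yet their sums are identical.
    print(hamming_distance(x, y), sum_difference(x, y))
```

The example shows why the sum-based approach can report zero difference for vectors that disagree in every position.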

12

votes

2 answers

Does Mercer's theorem work in reverse?

A colleague has a function $s$ and for our purposes it is a black-box. The function measures the similarity $s(a,b)$ of two objects. We know for sure that $s$ has these properties: The similarity scores are real numbers between 0 and 1,…

kernel-trick distance similarities rbf-kernel

asked May 08 '18 at 17:57

Sycorax

76,417
20
189
313
