2

I am trying to understand Proc Varclus. This page says

"2. If the second eigenvalue for the cluster is greater than the specified cutoff, then the initial cluster is split into two clusters."

What is the second eigenvalue? I couldn't find much documentation on the internet. Please explain or suggest links.

Has QUIT--Anony-Mousse
Srikanth Guhan
  • Your link is invalid--you pasted text in place of the link. A Google search suggests you intended to link to http://www.listendata.com/2015/03/proc-varclus-explained.html. That is a non-authoritative site. The SAS manual page at http://support.sas.com/documentation/cdl/en/statug/63347/HTML/default/viewer.htm#statug_varclus_sect004.htm provides a clearer explanation: it refers to "the largest eigenvalue associated with the second principal component." Principal components and their eigenvalues are extensively described and explained at http://stats.stackexchange.com/questions/2691. – whuber Jun 15 '15 at 15:57
  • @Whuber thanks. Why was this question voted down? Did I miss something? – Srikanth Guhan Jun 16 '15 at 07:59
  • I haven't any idea about the source of the downvote. (Moderators have no information about voting details, which are kept private.) – whuber Jun 16 '15 at 12:40

2 Answers

2

You should start by taking a look at Wikipedia.

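For a square matrix $A$, an eigenvector $v \neq 0$ and its eigenvalue $\lambda$ are defined by:
$Av = \lambda v$
Your data form an $n \times m$ matrix, where $n$ is the dimension and $m$ the size of your cluster. Since eigenvalues exist only for square matrices, $A$ here is the $n \times n$ correlation (or covariance) matrix computed from that data.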

Basically, you have an $n$-dimensional cluster: compute the eigenvectors and associated eigenvalues of its correlation matrix, and split it into two clusters if the second eigenvalue exceeds the threshold. I agree that "the second eigenvalue" is a little ambiguous, but as whuber said, the document most likely means "the largest eigenvalue associated with the second principal component."
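The split test described above can be sketched in a few lines of Python with NumPy. This is only an illustration of the eigenvalue check, not the full PROC VARCLUS procedure. The function names `second_eigenvalue` and `should_split`, the cutoff of 1.0, and the simulated data are all assumptions made for the example.

```python
import numpy as np

def second_eigenvalue(X):
    """Second-largest eigenvalue of the correlation matrix of X.

    X is an (observations x variables) array holding the variables
    that currently belong to one cluster.
    """
    corr = np.corrcoef(X, rowvar=False)   # square, symmetric: variables x variables
    eigvals = np.linalg.eigvalsh(corr)    # eigvalsh returns eigenvalues in ascending order
    return eigvals[::-1][1]               # sort descending, take the second one

def should_split(X, cutoff=1.0):
    """Hypothetical helper: split when the second eigenvalue exceeds the cutoff.

    The default cutoff of 1.0 is an illustrative assumption.
    """
    return bool(second_eigenvalue(X) > cutoff)

# Simulated data (assumption): two independent latent factors plus small noise.
rng = np.random.default_rng(0)
f1 = rng.normal(size=500)
f2 = rng.normal(size=500)

def noisy(factor):
    return factor + 0.1 * rng.normal(size=500)

# Four variables driven by one factor: a single principal component explains them.
one_group = np.column_stack([noisy(f1), noisy(f1), noisy(f1), noisy(f1)])
# Two variables per factor: a second principal component carries real structure.
two_groups = np.column_stack([noisy(f1), noisy(f1), noisy(f2), noisy(f2)])

print(should_split(one_group), should_split(two_groups))
```

In the first data set one component accounts for almost all of the variance, so the second eigenvalue is small. In the second data set the second component is nearly as large as the first, which is the situation in which the cluster would be split.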

NiziL
1

The "second" eigenvalue is either

  • the second largest eigenvalue
  • the second smallest eigenvalue

after performing eigenvalue decomposition (which yields a set of eigenvectors with associated eigenvalues, and this set can be sorted by the eigenvalues) depending on the exact context. Most often, it is the second largest (and if you go to the SAS documentation, this holds for varclus, too).
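As a small worked example of sorting eigenvalues, take two standardized variables with correlation $r$ (assuming $0 \le r < 1$ purely for illustration). Their correlation matrix $R$ and its eigenvalues are

$$R = \begin{pmatrix} 1 & r \\ r & 1 \end{pmatrix}, \qquad \det(R - \lambda I) = (1 - \lambda)^2 - r^2 = 0 \;\Longrightarrow\; \lambda_1 = 1 + r, \quad \lambda_2 = 1 - r.$$

Sorted in decreasing order, the second largest eigenvalue is $\lambda_2 = 1 - r$. The eigenvalues sum to the number of variables ($\lambda_1 + \lambda_2 = 2$), so the more strongly the variables are correlated, the less is left over for the second eigenvalue.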

In many applications, experience has shown that the first eigenvector (the one with the largest eigenvalue) captures uninteresting variation, such as the document size. It is then ignored, and the next eigenvectors are interpreted as topics. In other cases, the smallest or second-smallest (often the smallest non-zero) eigenvalue is used.

Has QUIT--Anony-Mousse