Cluelessness Disclaimer
I'm a statistics noob so if at all possible, please don't stone me. Also write slowly and in simple terms.
I'm wondering what the relation is between the internal consistency of a scale, as usually measured by Cronbach's $\alpha$, and unidimensionality.
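(To make sure we're all talking about the same $\alpha$: as far as I understand it, it is computed from the item variances and the variance of the total score,

$$\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma^2_{Y_i}}{\sigma^2_X}\right),$$

where $k$ is the number of items, $\sigma^2_{Y_i}$ is the variance of item $i$ and $\sigma^2_X$ is the variance of the sum score. This is also what the script at the end of this post computes, so please correct me if I already went wrong here.)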
The concrete reason I'm asking this fundamental question is that I tried to split a scale into three factors, which separated quite nicely using PCA with a direct oblimin rotation. There is almost no intercorrelation between the factors, and the worst loading in any of the resulting subscales is $.59$.
Now, while the first new scale that fell out of this process has a nice $\alpha$ of $.81$ (the total scale had $.5$), the second scale has a whoppingly bad $\alpha$ of less than frickin' $-6.5$! I've seen my fair share of negative alphas in my short statistical life, but this is unprecedented.
Now, my naïve understanding is that internal consistency measures whether there is an underlying construct at all, while unidimensionality of a scale means that this underlying construct has only one dimension. This would make internal consistency a prerequisite for unidimensionality. This SPSS FAQ item at least seems to imply this, and this section in Wikipedia, which I found while Googling, might imply anything because it's really poorly written, but it refers to this paper, which unfortunately goes over my head.
There is also this question, Assessing reliability of a questionnaire: dimensionality, problematic items, and whether to use alpha, lambda6 or some other index?, which I already found and which I feel might contain the answer I'm seeking, but it doesn't fully reveal itself to me.
Based on this understanding, however, I started with $\alpha$ and tried to work my way from there. What I did is somewhat … unorthodox (which probably translates to "wrong", but I hope we'll hear more on that in an answer to this post). I did a global optimization of $\alpha$: I went through all possible partitions of the items of my scale and computed the average $\alpha$ over the subscales for each partition. The funny thing is, the best solution for three factors (three and four factors had an identical best average $\alpha$ of $.71$) actually makes sense in terms of interpretability (the whole PCA shebang didn't). I'm pretty sure this is a coincidence, though, right?
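To illustrate what I mean by "all possible partitions" (with generic item numbers, not my actual items): for three items $\{1, 2, 3\}$ there are five partitions,

$$\{1,2,3\}, \qquad \{1\}\{2,3\}, \qquad \{2\}\{1,3\}, \qquad \{3\}\{1,2\}, \qquad \{1\}\{2\}\{3\},$$

and for each partition I compute $\alpha$ for every subset and then average those values. In the script below I skip partitions that contain a subset with fewer than two items, since $\alpha$ isn't even defined for a single item.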
Edit:
After StasK's comment, I decided to include my code for the $\alpha$ optimization in case others might find it useful. Just put your scale into a file named data.csv, with columns as variables and rows as cases.
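Just to show the layout (these numbers are completely made up), a scale with four items answered by three people would be a file like

4,3,5,4
2,2,1,3
5,4,4,5

with no header row, since the script below reads the file as plain numbers.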
This script's not pretty and it's not fast; it did the job I personally wanted it to do. The optimization of my 12 items took almost 12 minutes on my Core i3. The code could certainly be optimized, but that probably won't change the fundamental problem: optimizing a 30-item scale with this method would take not 30 minutes, as one might naïvely assume, but roughly 300 times the age of the universe. Blindly partitioning your 100-item questionnaire is therefore not a good idea with this method. The number of partitions generated by partitions() can be checked here: http://www.wolframalpha.com/input/?i=bell+number+of+12 (just replace 12 by your number of items).
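If you don't feel like going to Wolfram Alpha, here is a small sketch of my own (not part of the script) that computes the same Bell numbers via the Bell triangle:

def bell(n):
    # number of partitions of a set with n elements (Bell number B_n),
    # built row by row with the Bell triangle recurrence
    row = [1]
    for _ in range(n - 1):
        new_row = [row[-1]]  # each row starts with the last entry of the previous one
        for entry in row:
            new_row.append(new_row[-1] + entry)
        row = new_row
    return row[-1]  # B_n is the last entry of row n

print bell(12)  # 4213597 possible partitions of 12 items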
If you only want to generate partitions with a certain number of factors, replace for p in partitions(range(len(variances))): by for p in n_partitions(range(len(variances)), 3): for, say, 3 factors. This won't make it any faster, though (a considerable speedup would probably be possible here if the search were restricted to that number of subsets in the first place, instead of generating all partitions and filtering them afterwards).
In the long run, only something cleverer than testing all possible partitions will do for larger scales.
#!/usr/bin/env python
from numpy import *
import sys

def variance(data):
    """Population variance (denominator n) of a sequence of numbers."""
    n = len(data)
    variance = 0.0
    mean = float(sum(data))/n
    for x in data:
        variance += (x - mean)**2
    return variance/n

# from http://stackoverflow.com/q/2037327/1050373
def partitions(set_):
    """Generate every partition of the items in set_ as a list of sets."""
    if not set_:
        yield []
        return
    for i in xrange(2**len(set_)/2):
        # the bits of i decide which of two parts each item goes into;
        # parts[1] is then partitioned recursively
        parts = [set(), set()]
        for item in set_:
            parts[i&1].add(item)
            i >>= 1
        for b in partitions(parts[1]):
            yield [parts[0]]+b

def n_partitions(set_, n):
    """Generate only those partitions of set_ that consist of exactly n subsets."""
    for partition in partitions(set_):
        if len(partition) == n:
            yield partition

def alpha(data, variances, cols):
    """Cronbach's alpha of the subscale made up of the item columns in cols."""
    n = len(cols)
    cols = array(cols)
    data_cols = data.transpose()[cols]
    variances = variances[cols]
    data_rows = data_cols.transpose()
    row_sums = array([sum(row) for row in data_rows])  # sum score per case
    return n/float(n-1)*(1-sum(variances)/variance(row_sums))

data = genfromtxt('data.csv', delimiter=',')
print 'input data:'
print data

# pre-compute column (item) variances
variances = array([variance(col) for col in data.transpose()])

print 'overall alpha:'
print alpha(data, variances, range(len(variances)))

# brute force: walk through every partition of the items and keep track
# of the best average alpha over its subscales
best_alpha_average = float('-inf')
for p in partitions(range(len(variances))):
    lengths = [len(subset) for subset in p]
    if min(lengths) < 2:
        # skip partitions containing single-item subscales (alpha is undefined there)
        continue
    alphas = [alpha(data, variances, list(subset)) for subset in p]
    alpha_average = sum(alphas)/len(p)
    if alpha_average >= best_alpha_average:
        print 'new best:', alpha_average, alphas
        # +1 so that the printed item numbers are 1-based
        print 'partition of size ' + str(len(p)) + ': ' + str([list(array(list(subset))+1) for subset in p])
        best_alpha_average = alpha_average
        sys.stdout.flush()