Should I use a paired sample t-test to compare two methods of measuring absorption lines?


Background

I'm doing a research project in astronomy: I measure the equivalent widths of absorption lines in a star's spectrum, using Gaussian fits, in order to determine the star's chemical abundances. Each absorption line corresponds to a specific electronic transition in an atom of a specific element. One type of atom, say iron, produces many absorption lines that we observe at different wavelengths, and different elements produce different absorption lines. The equivalent width of a Gaussian fit quantifies the strength of an absorption line observed in a spectrum (i.e., how much of the continuum's light the line absorbs).
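For a Gaussian profile the equivalent width has a closed form. Assuming a continuum-normalized spectrum with fractional absorption $1 - F/F_c = d\,\exp[-(\lambda-\lambda_0)^2/(2\sigma^2)]$, the equivalent width is $W = \int (1 - F/F_c)\,d\lambda = d\,\sigma\sqrt{2\pi}$. The sketch below (plain Python; the depth and width are hypothetical, not taken from the tables) checks this against a numerical integral and shows that, at fixed depth, $W$ scales linearly with the fitted $\sigma$.

```python
import math


def gaussian_absorption(wl, centre, depth, sigma):
    """Fractional absorption 1 - F/Fc of a Gaussian line at wavelength wl."""
    return depth * math.exp(-((wl - centre) ** 2) / (2.0 * sigma ** 2))


def gaussian_equivalent_width(depth, sigma):
    """Analytic equivalent width, in the same units as sigma."""
    return depth * sigma * math.sqrt(2.0 * math.pi)


def numeric_equivalent_width(centre, depth, sigma, half_window=10.0, n=20001):
    """Trapezoid-rule integral of 1 - F/Fc over centre +/- half_window * sigma."""
    lo = centre - half_window * sigma
    step = 2.0 * half_window * sigma / (n - 1)
    vals = [gaussian_absorption(lo + i * step, centre, depth, sigma) for i in range(n)]
    return step * (sum(vals) - 0.5 * (vals[0] + vals[-1]))


# Hypothetical line: 30% deep, sigma = 0.05 A, centred at 4730.03 A for concreteness.
ew_analytic = gaussian_equivalent_width(0.3, 0.05)
ew_numeric = numeric_equivalent_width(4730.03, 0.3, 0.05)
print(f"{1000 * ew_analytic:.1f} mA vs {1000 * ew_numeric:.1f} mA")  # 37.6 mA vs 37.6 mA

# At fixed depth, EW is proportional to sigma: a fit twice as wide gives twice the EW.
print(gaussian_equivalent_width(0.3, 0.10) / ew_analytic)
```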

My data

I have measured the equivalent widths of 54 absorption lines with Methods A and B, shown in Tables 1 and 2 respectively. Each "Equivalent width" value in the tables is an average of two or three measurements of that line.

The measurement procedure is summarized in this animation: https://gitlab.com/evgenyneu/2018_summer_project_logbook/raw/master/a2019/a01/a23_measuring_lines/three_measurements.gif (the full method is described here). I fit a Gaussian profile to the observed data. For the first measurement, I fit a Gaussian curve with a wide base. For the second, I fit a narrower Gaussian to the peak region. If the observed line is good, I make an optional third measurement with what looks like a moderate fit, between the narrow and wide ones, close to the full width at half-maximum (actually half-minimum, since absorption lines are upside down).

The "Uncertainty" column is half of the range of the equivalent-width measurements. For example, if we have two measurements, 10 mA and 16 mA, the uncertainty is 3 mA, i.e., (16 - 10)/2.
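In code, this table convention amounts to the following trivial sketch; `summarize` is a hypothetical helper name, and the tabulated equivalent width is assumed to be the plain mean of the repeated measurements.

```python
def summarize(measurements):
    """Return (mean, half-range) of repeated equivalent-width measurements."""
    centre = sum(measurements) / len(measurements)
    half_range = (max(measurements) - min(measurements)) / 2
    return centre, half_range


print(summarize([10, 16]))      # (13.0, 3.0): the 10 mA / 16 mA example
print(summarize([10, 13, 16]))  # (13.0, 3.0)
```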

Table 1: Equivalent widths of absorption lines and their uncertainties measured with Method A. The units are angstrom (A) and milliangstrom (mA).

+----------------+-----------------------+------------------+
| Wavelength (A) | Equivalent width (mA) | Uncertainty (mA) |
+----------------+-----------------------+------------------+
| 4730.03        | 37                    | 8                |
| 4731.45        | 79                    | 9                |
| 4788.76        | 52                    | 5                |
| 4890.75        | 153                   | 20               |
| 4891.49        | 171                   | 19               |
| 5814.81        | 8                     | 2                |
...
... Skipped 47 rows
...
| 6703.57        | 22                    | 1                |
+----------------+-----------------------+------------------+

Table 2: Equivalent widths of absorption lines and their uncertainties measured with Method B.

+----------------+-----------------------+------------------+
| Wavelength (A) | Equivalent width (mA) | Uncertainty (mA) |
+----------------+-----------------------+------------------+
| 4730.03        | 26                    | 4                |
| 4731.45        | 52                    | 7                |
| 4788.76        | 39                    | 3                |
| 4890.75        | 116                   | 18               |
| 4891.49        | 139                   | 18               |
| 5814.81        | 6                     | 1                |
...
... Skipped 47 rows
...
| 6703.57        | 16                    | 2                |
+----------------+-----------------------+------------------+

Question

Should I compare methods A and B using $t$-testing or should I do something else?

Alternative methods

If a paired-sample t-test is not suitable, which alternative statistical tests should I use (e.g., the Wilcoxon signed-rank test or the chi-square test of independence)?
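A minimal sketch of the two candidate tests in plain Python (standard library only), using just the seven rows actually shown in Tables 1 and 2. The other 47 rows are elided, so these numbers are illustrative, not a result for the full data set. It computes the paired t statistic on raw differences and on log ratios (the latter suits methods that differ by a roughly constant factor rather than a constant offset), plus an exact Wilcoxon signed-rank p-value obtained by enumerating all sign patterns, which is feasible only for small n. Both tests treat the line pairs as independent, which the Independence section below calls into question.

```python
import itertools
import math
from statistics import mean, stdev

# Only the 7 rows visible in Tables 1 and 2 (47 rows are elided): illustrative only.
ew_a = [37, 79, 52, 153, 171, 8, 22]   # Method A, mA
ew_b = [26, 52, 39, 116, 139, 6, 16]   # Method B, mA


def paired_t(x, y):
    """Paired t statistic and degrees of freedom for the differences x - y."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return mean(d) / (stdev(d) / math.sqrt(n)), n - 1


def wilcoxon_exact(x, y):
    """Exact two-sided Wilcoxon signed-rank test.

    Assumes no zero differences and no ties among |x - y|; enumerates all
    2**n sign patterns, so it is only practical for small n.
    """
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    rank = {i: r + 1 for r, i in enumerate(order)}
    w_plus = sum(rank[i] for i in range(n) if d[i] > 0)
    centre = n * (n + 1) / 4  # mean of W+ under the null hypothesis
    extreme = 0
    for signs in itertools.product((False, True), repeat=n):
        w = sum(r for r, positive in zip(range(1, n + 1), signs) if positive)
        if abs(w - centre) >= abs(w_plus - centre):
            extreme += 1
    return w_plus, extreme / 2 ** n


t_raw, df = paired_t(ew_a, ew_b)
t_log, _ = paired_t([math.log(v) for v in ew_a], [math.log(v) for v in ew_b])
w_plus, p_exact = wilcoxon_exact(ew_a, ew_b)
print(f"paired t on differences: t = {t_raw:.2f}, df = {df}")  # t = 3.55, df = 6
print(f"paired t on log ratios:  t = {t_log:.2f}, df = {df}")
print(f"Wilcoxon: W+ = {w_plus}, exact two-sided p = {p_exact:.4f}")  # W+ = 28, p = 0.0156
```

For all 54 lines, `scipy.stats.ttest_rel` and `scipy.stats.wilcoxon` compute the same statistics together with p-values.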

Sample size

54 measurements.

Independence

Equivalent-width measurements of the 54 different lines are not independent. For example, a higher abundance of iron may result in larger equivalent widths for all of its absorption lines. In my data this means that, for example, if the 4730.03 A line has a larger equivalent width, the 4731.45 A line is likely to have a larger one as well, if the two lines are produced by the same element.

Do I understand correctly that each of your 54 measurements is measuring something different? Or are you measuring the exact same thing 54 times? — StatsStudent, Jan 27 '19 at 06:09
These are measurements of 54 different absorption lines of the same star. Each absorption line is located at specific wavelength. — Evgenii, Jan 27 '19 at 06:12
Do you only have one measurement for each method for each wavelength? — StatsStudent, Jan 27 '19 at 06:17
I have two or three measurements at each wavelength for each of the two methods. — Evgenii, Jan 27 '19 at 06:18
@kjetilbhalvorsen this is the uncertainty of the equivalent-width measurement. For example, for the line at 4730.03 A and Method A, the measurement is 37±4 mA, where 37 is the measured value and 4 is the uncertainty of this measurement. — Evgenii, Jan 27 '19 at 08:29
Ok, but how is that uncertainty calculated? Spectroscopy might have its own conventions, unknown to most readers here — kjetil b halvorsen, Jan 27 '19 at 09:27
@kjetilbhalvorsen you are right, sorry. The uncertainty is half of the range of the measurements. I have updated my question to include this. — Evgenii, Jan 27 '19 at 10:08
Why is half-width the right statistic to use to indicate spread? Also, we know iron emission lines pretty well from the lab, likely to a much higher precision and tolerance than we are going to see from a star. As long as the lines are from one and only one element, even though they are at different wavelengths, they all speak to the same physical reality: the concentration of iron in the star. Couldn't one use something like a Kalman filter (unscented?) to combine the physics with the many individual measurements to get a single estimate yielding the mean and spread of the concentration? — EngrStudent, Jan 27 '19 at 11:56
Good point, @EngrStudent. That method of estimating measurement uncertainty was chosen by my supervisor. What method would you suggest? — Evgenii, Jan 27 '19 at 12:59
@EngrStudent, as to your Kalman filter question, I have no idea, I'm not an expert, sorry. :) In our current workflow we first measure the equivalent widths of different absorption lines from the observed spectrum of a star. Then we use a spectral synthesis program called MOOG (Sneden, C. A. 1973), which estimates the abundances of chemical elements in the stellar atmosphere (and also the star's effective temperature and surface gravity). — Evgenii, Jan 27 '19 at 13:06
Why aren't you integrating the peaks? If that's not appropriate, then why not use standard methods to fit Gaussians to the peaks, such as Maximum Likelihood, which automatically produce reasonable uncertainty estimates? Finally, if you really must compare sets of estimates of half-widths, it's likely the appropriate way to express those data is with their logarithms. — whuber, Feb 03 '19 at 14:42
@whuber Yes (negative) integration is appropriate. No, it isn't Gaussian, see my answer. — Carl, Feb 06 '19 at 09:56

StatsStudent · Answer 1 · 2019-01-27T06:51:16.090


I would recommend using the multivariate paired Hotelling's $T^2$ statistic, since you have three measurements at each wavelength for each method. Essentially, this method tests simultaneously (thereby controlling the error rate) whether the measurements in total (i.e., at all 54 wavelengths) from Method A are equivalent to those from Method B. It is the multivariate equivalent of the paired-sample t-test.

There is a good description of this method in Penn State's Multivariate Statistics course. If you are using R, check out the Hotelling package. If you are using SAS, you can carry out this test using proc glm with the MANOVA statement. Pay careful attention to the listed assumptions before using this method: they must be met for the test to be valid.

An excellent textbook reference is the classic introductory multivariate statistics text by Johnson and Wichern, Applied Multivariate Statistical Analysis, 6th ed. (pp. 273-279).
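For reference, the paired Hotelling statistic in its usual textbook form (as in Johnson and Wichern) is given here. With $n$ paired observation vectors of $p$ variables each and differences $\mathbf{d}_j = \mathbf{x}_{A,j} - \mathbf{x}_{B,j}$, $j = 1, \dots, n$:

$$
\bar{\mathbf{d}} = \frac{1}{n}\sum_{j=1}^{n}\mathbf{d}_j, \qquad
\mathbf{S}_d = \frac{1}{n-1}\sum_{j=1}^{n}(\mathbf{d}_j-\bar{\mathbf{d}})(\mathbf{d}_j-\bar{\mathbf{d}})^{\top}, \qquad
T^2 = n\,\bar{\mathbf{d}}^{\top}\mathbf{S}_d^{-1}\bar{\mathbf{d}},
$$

and, if the differences are multivariate normal with zero mean,

$$
\frac{n-p}{p\,(n-1)}\,T^2 \sim F_{p,\;n-p}.
$$

This requires $n > p$ so that $\mathbf{S}_d$ is invertible. If each of the 54 wavelengths is treated as one variable ($p = 54$) and the two or three repeated fits per line as the observations ($n \le 3$), that condition is not met, so it is worth checking how the data would be arranged before applying the test.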

edited Jan 27 '19 at 06:51

answered Jan 27 '19 at 06:34

StatsStudent


I'm not sure whether my data violate the independence assumption. Measurements at different wavelengths may be linked if they are absorption lines produced by the same chemical element. – Evgenii Jan 27 '19 at 06:52
For example, if 10 lines are produced by iron, and there is a lot of iron, all these 10 lines will give high values. – Evgenii Jan 27 '19 at 06:58

It is not important that the variables (wavelengths) be independent of one another, so you should be okay with the example you just listed. After all, this test was developed so that one could examine differences in several "response variables" simultaneously. We would expect measurements to be correlated in many cases. – StatsStudent Jan 27 '19 at 06:59

A more complete list of the assumptions and some good online examples are provided here. You will want the paired sample approach on page 42 (assumptions on page 29): https://us.sagepub.com/sites/default/files/upm-binaries/70364_Schumacker_Chapter_3.pdf – StatsStudent Jan 27 '19 at 07:02
It would appear that you answered the question before figuring out what was really being asked. The meaning only became clear after a series of clarifying comments exchanged over many hours. The original question was equivalent to asking "Should I measure the size of an elephant from its trunk, its leg, or its tail length?" and the answer was "No, you should choose a model that best fits the whole elephant." I have not wasted points downvoting your answer. – Carl Feb 06 '19 at 01:33
@Carl, hmmm. Very interesting. I'm not sure how that would even be known, since the OP asked "I want to compare the two methods of measurements and find if they give same results." Given the original question, I'm not sure it would have even been appropriate to downvote! ;-) I hate to think that users expect hours of free consulting on a free website to have the responders ultimately read their minds. But I'm glad to see the OP might have an answer to his question - at least for now. – StatsStudent Feb 06 '19 at 01:39
Every statistical neophyte has the problem that they cannot phrase their questions in perfect statistical language. So, are you saying that they should not be encouraged to learn enough to be able to do so? Was there a time when you had trouble asking statistical questions? Sure, this is not a good example of how to ask a question; if you wish, I can rewrite it, and I will do so if it remains closed. However, the object lesson herein for those with statistical training is that teaching moments are not a waste of time, and consultants should not sniff their noses at those who are teachable. – Carl Feb 06 '19 at 01:46
No mind reading, link reading, see https://gitlab.com/evgenyneu/2018_summer_project_logbook/raw/master/a2019/a01/a23_measuring_lines/three_measurements.gif, and http://spiff.rit.edu/classes/ast601/paper/trombley.pdf – Carl Feb 06 '19 at 01:59
@Carl, I think you might have me mistaken. I'm not "sniff(ing) my nose at those who are teachable." If I did, would I really be providing general advice here and there to the point where I currently stand with a reputation in the top 0.87% this quarter? I'm simply stating that the "threat of a down-vote" seems a bit misplaced for an answer that was accurate with the information provided at the time (and indeed may still be). With regard to rewriting the question for the OP, I'm not sure that's totally in line with the accepted practice here: https://bit.ly/2RHnQiV. (cont.) – StatsStudent Feb 06 '19 at 04:41
After all, part of the teaching process is allowing students to learn how to write, think, rewrite themselves. But the decision to edit is ultimately your prerogative. – StatsStudent Feb 06 '19 at 04:42
I tire of symmetric misunderstanding of both questions and answers. My answer was downvoted without comment, obviously incorrectly. I did not threaten you. I might have downvoted your answer, because I believe it to be incorrect, but I refrain from downvoting on general principles, namely that communication is generally more effective than downvoting, and certainly more principled. I am disappointed that you accept neither my discretion nor my POV. Think about it. – Carl Feb 06 '19 at 09:48

Carl · Accepted Answer · 2019-02-06T02:41:59.240


The text (after the edits prompted by the comments) shows that a Gaussian is not a good fit to the spectral lines. This link shows the origin of the different widths of the spectral lines depending on how the lines are fit.

Half-ranges are not a common way of expressing the uncertainty of a location estimate. For two observations, the half-range equals the sample standard deviation divided by $\sqrt{2}$. However, for three observations it has a different, more variable relationship to the standard deviation. Regardless, the half-range is a biased estimate for small samples, and it would perhaps be better expressed in squared form to eliminate that bias, analogous to the variance being unbiased (for two observations, twice the squared half-range equals the sample variance). The reasoning behind this is perhaps not obvious, so see "Why are we using a biased and misleading standard deviation formula for $\sigma$ of a normal distribution?" for an explanation. In this case, a better measurement is the area under the curve, which (being an area) should be proportional to the absorbed luminous flux despite the variable width of the absorption lines, whatever its cause.
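A quick numerical check of these relationships in standard-library Python. Apart from the 10 mA / 16 mA example from the question, the measurement values are made up, and the Monte Carlo part assumes normally distributed errors purely for illustration.

```python
import math
import random
from statistics import stdev


def half_range(xs):
    """Half of the range of repeated measurements (the tables' "uncertainty")."""
    return (max(xs) - min(xs)) / 2


# Two observations: half-range = s / sqrt(2), whatever the values are.
pair = [10, 16]
print(half_range(pair), stdev(pair) / math.sqrt(2))  # both about 3.0

# Three observations: same half-range, different s, so no fixed conversion factor.
for triple in ([10, 13, 16], [10, 11, 16]):
    print(triple, half_range(triple), round(stdev(triple), 3))  # s = 3.0 vs about 3.215

# Monte Carlo for n = 2 draws from N(0, 1): E[half-range] = 1/sqrt(pi), about 0.564,
# well below sigma = 1, i.e. the half-range is biased low as an estimate of sigma.
rng = random.Random(0)
draws = [half_range([rng.gauss(0, 1), rng.gauss(0, 1)]) for _ in range(100_000)]
print(round(sum(draws) / len(draws), 3), round(1 / math.sqrt(math.pi), 3))
```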

The problem with fitting Gaussian curves is that the spectral lines are not Gaussian. That is, the first step in such problems is the numerical identification of the distribution type appropriate to the problem, and that is the first thing I would test in the original data. As it turns out, this obviates $t$-testing by eliminating the modelling uncertainty.

A shortcut that obviates the need for testing fit quality is a search of prior work. That search yielded, among other things, Modeling Stellar Absorption Lines: The FeI 6546.25 Å Line, which suggests better results from chi-squared fitting of a Voigt profile, the convolution of the Gaussian and Lorentz distributions (Bowers & Deeming 1984), which is often used to model spectral absorption line features. If you can implement this, you may also be able to reduce the uncertainty resulting from Gaussian-only modelling. That is, it would appear that there are three different criteria for Gaussian fitting because a Gaussian is not the best possible shape to fit to the data. I would encourage you to undertake further searches and consultations with potential coauthors on this approach, precisely because I am not an astrophysicist myself, although I should probably disclose that I have coauthored papers with two outstanding ones. Finally, it was not inappropriate to ask your question on Cross Validated: although there is a StackExchange companion site for physics, there is no astrophysics site per se, and many astro questions do appear on this site.
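To make the Voigt suggestion concrete, here is a minimal sketch in standard-library Python that builds the profile by direct numerical convolution; in practice `scipy.special.voigt_profile(x, sigma, gamma)` evaluates the same unit-area profile directly, and its amplitude would be a free parameter in a chi-squared fit (e.g., with `scipy.optimize.curve_fit`). Because the profile has unit area, the amplitude multiplying it is itself the equivalent width. The sketch also integrates a synthetic line over a narrow and a wide window: the Lorentzian wings carry area far from the line centre, which is one way fits or integrations of different widths can give different equivalent widths. The line parameters `SIGMA`, `GAMMA` and `TRUE_EW` are hypothetical.

```python
import math


def gaussian_pdf(x, sigma):
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def lorentzian_pdf(x, gamma):
    return gamma / (math.pi * (x * x + gamma * gamma))


def trapezoid(f, lo, hi, n):
    """Composite trapezoid rule for f on [lo, hi] with n sample points."""
    step = (hi - lo) / (n - 1)
    vals = [f(lo + i * step) for i in range(n)]
    return step * (sum(vals) - 0.5 * (vals[0] + vals[-1]))


def voigt(x, sigma, gamma):
    """Unit-area Voigt profile: Gaussian convolved with Lorentzian, done numerically."""
    return trapezoid(lambda t: gaussian_pdf(t, sigma) * lorentzian_pdf(x - t, gamma),
                     -8.0 * sigma, 8.0 * sigma, 401)


# Hypothetical line parameters in angstrom, not taken from the question's data.
SIGMA, GAMMA, TRUE_EW = 0.05, 0.02, 0.040


def absorption(delta):
    """Fractional absorption 1 - F/Fc at offset delta from the line centre."""
    return TRUE_EW * voigt(delta, SIGMA, GAMMA)


narrow = trapezoid(absorption, -0.25, 0.25, 1001)  # window of +/- 5 sigma
wide = trapezoid(absorption, -2.5, 2.5, 1001)      # window of +/- 50 sigma
print(f"true {1000 * TRUE_EW:.1f} mA, narrow {1000 * narrow:.1f} mA, wide {1000 * wide:.1f} mA")
```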

edited Feb 06 '19 at 02:41

answered Feb 02 '19 at 15:07

Carl


"The text is unclear as to whether or not method A is from two or three different observation sessions. " Both methods A and B are from the same single observation session. These are just different methods of measuring absorption lines **from the same data**. In both methods I make two or three measurements of the lines: (1) the overestimate, (2) the underestimate and (3) the optional "best" estimate. I have updated my question and described how I performed the measurements. – Evgenii Feb 02 '19 at 23:29
"Another point that needs clarification is that the distribution of errors of measurement is not specified. " I don't understand what you mean. Sorry, I'm not a statistician, could you explain how this is done? :) – Evgenii Feb 02 '19 at 23:30

Do you mean "how" this is done? If so, there are several approaches: (1) process multiple measurements using nonparametric methods; (2) if the measurements are not already normally distributed, transform them so that they are, and use $t$-testing (i.e., a parametric method) on the transformed measurements. What makes this more difficult than usual is that the number of measurements taken is not consistent, which frankly confuses me no end, because it introduces a potential selection bias that I cannot fathom correcting. – Carl Feb 02 '19 at 23:40

Yet another problem is the [discretization](https://en.wikipedia.org/wiki/Discretization) of the measurements into whole numbers of mA. That may be problematic for both nonparametric and parametric methods as in the former case it introduces ties, and in the latter, i.e., for parametric methods, real numbers would be more useful than whole numbers. – Carl Feb 02 '19 at 23:51
Noted, thanks. I will do three measurements for all lines then! Can I use the nonparametric Wilcoxon signed-rank test? Good point about "discretization": I am actually using all available significant figures in calculations; I've just rounded the values here for simplicity. – Evgenii Feb 03 '19 at 00:21

I would not fit a Gaussian to anything, but would use the actual triplicates to establish (1) how the measurements are distributed and (2) what that distribution's parameter values are. It is possible that the errors are approximately Gaussian, but they are not theoretically Gaussian in the sense that one cannot have a negative spectral frequency, and a perfectly Gaussian distribution has infinite support, which does not agree with the semi-infinite support of wavelengths; [see](https://en.wikipedia.org/wiki/List_of_probability_distributions) for an explanation of support. – Carl Feb 03 '19 at 01:13

However, the distribution may be close enough to Gaussian that it makes little difference whether one uses a Gaussian or a gamma distribution or something else with semi-infinite support. A theoretically perfect distribution is not strictly necessary if, in this case, a Gaussian distribution is not obviously ruled out by [testing](https://en.wikipedia.org/wiki/Normality_test). I leave it to you to figure out how to test for a distribution type using error propagation from three samples; it would be problematic with only two. – Carl Feb 03 '19 at 01:22

I did not initially catch the estimate type. I would suggest doing repeat measurements while only attempting to be accurate, not under- or overestimating, and having three *independent* observers make those three measurements to ensure a degree of inter-observer variability. One could then quantify inter-observer variability and determine whether it is significantly different or not, which becomes a useful characterization in its own right and might then feed into a workup of distribution type through error propagation, as some degree of independence of the observations is then assured. – Carl Feb 03 '19 at 03:06

By "Gaussian fit" I meant the curve with the shape of a Gaussian that I fit to the observed absorption line, as shown here: https://gitlab.com/evgenyneu/2018_summer_project_logbook/raw/master/a2019/a01/a23_measuring_lines/three_measurements.gif With this method, I make three measurements (Fat, Skinny and Intermediate). – Evgenii Feb 03 '19 at 04:42