
I'm pretty new to statistics and I need your help. I just installed the R software and I have no idea how to work with it. I have a small sample that looks like this:

Group A : 10, 12, 14, 19, 20, 23, 34, 41, 12, 13
Group B :  8, 12, 14, 15, 15, 16, 21, 36, 14, 19

I want to apply a t-test, but before that I would like to apply the Shapiro–Wilk test to know whether my sample comes from a normally distributed population. I know there is a function `shapiro.test()`, but how can I give my numbers as input to this function?

Can I simply enter `shapiro.test(10, 12, 14, 19, 20, 23, 34, 41, 12, 13, 8, 12, 14, 15, 15, 16, 21, 36, 14, 19)`?
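
Or do I first need to put each group into its own vector with `c()` and then test the groups separately? My guess (I am not sure this is right) would be something like:

```r
# My guess at the syntax: store each group in its own vector with c(),
# then pass one vector at a time to shapiro.test().
A <- c(10, 12, 14, 19, 20, 23, 34, 41, 12, 13)
B <- c(8, 12, 14, 15, 15, 16, 21, 36, 14, 19)

shapiro.test(A)   # normality test for group A only
shapiro.test(B)   # normality test for group B only

t.test(A, B)      # the t-test I plan to run afterwards (Welch by default)
```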

  • 1) the Shapiro–Wilk test doesn't tell you your data *is* normal; it sometimes tells you when it *isn't*. 2) Your data certainly won't be normal anyway (looks like they're positive integers for starters), so you're answering the wrong question with a test of normality. 3) The mixture distribution obtained from the combined samples is not assumed in the t-test to be normal, so even if it made sense to formally test the assumptions, you wouldn't test *that*. 4) Your R syntax is wrong, since `shapiro.test` takes a vector argument, and you're supplying a comma-separated collection of arguments. – Glen_b Aug 10 '14 at 02:07
  • Thanks. The data in groups A and B are not real; they are just examples. You just mentioned it is not correct, so what is the correct way? How can I check normality? There are many tutorials showing this is the way to check normality, and I am confused. Please see http://yatani.jp/teaching/doku.php?id=hcistats:datatransformation – Bahador Saket Aug 10 '14 at 02:16
  • A visual assessment of normality, such as the QQ plot at your link, at least is looking at a measure of effect size (how non-normal is it?). Indeed, the t-test is pretty robust to non-normality (increasingly so at large sample size), so a goodness of fit test will more often tend to reject when it matters *least*. A better option would be to examine how sensitive the test behaviour would be under similar conditions via simulation, as was done in the answer [here](http://stats.stackexchange.com/questions/110801/should-i-use-t-test-on-highly-skewed-data-scientific-proof-please), ... (ctd) – Glen_b Aug 10 '14 at 02:26
  • (ctd) ... or simply to avoid the assumption if you don't think it's reasonable to make it. You could always go to a permutation test, for example, unless sample sizes are especially small (whereupon the problem is lack of suitable significance levels to use). – Glen_b Aug 10 '14 at 02:26
  • Maybe it would be better to explain my problem more clearly, so you can suggest the best thing to do. I have two tools, Tool A and Tool B. I recruited 16 participants and asked them to perform some tasks using Tool A and then Tool B. I recorded their performance times. Then I applied a t-test to see whether the difference is significant, but my advisor asked me to check the normality of my data. That is why I am looking into checking normality. I'm pretty new to statistics. – Bahador Saket Aug 10 '14 at 02:39
  • My earlier advice covers this situation, except I'd add that I would assume performance *times* are quite right-skewed. (For myself I'd be inclined to use GLMs in this case, but I might consider a t-test on log-time. What you need depends on specific details of the particular questions you want to answer. Comments are not the place to deal with that.) – Glen_b Aug 10 '14 at 03:05
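
For reference, a minimal R sketch of the options raised in the comments above (a QQ plot as a visual check of the paired differences, a paired t-test on log-times, and a sign-flip permutation test), assuming the 16 participants' times with Tool A and Tool B are paired by participant. The vector names and the placeholder numbers are made up purely for illustration:

```r
# Placeholder data only, so the sketch runs; replace with the 16 real
# completion times, one per participant, in the same participant order.
set.seed(1)
timeA <- rlnorm(16, meanlog = 3.0, sdlog = 0.4)  # times with Tool A (hypothetical)
timeB <- rlnorm(16, meanlog = 2.8, sdlog = 0.4)  # times with Tool B (hypothetical)

d <- timeA - timeB   # paired differences, one per participant

## Visual check of the differences rather than a formal normality test
qqnorm(d); qqline(d)

## Paired t-test on raw times and on log-times
## (the log scale can help if the raw times are right-skewed)
t.test(timeA, timeB, paired = TRUE)
t.test(log(timeA), log(timeB), paired = TRUE)

## Sign-flip permutation test on the paired differences,
## which avoids the normality assumption altogether
obs  <- mean(d)
perm <- replicate(10000, mean(d * sample(c(-1, 1), length(d), replace = TRUE)))
mean(abs(perm) >= abs(obs))   # approximate two-sided p-value
```

With 16 paired differences there are 2^16 possible sign flips, so the 10,000 random flips above approximate the permutation distribution rather than enumerating it exactly.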

0 Answers