I have the following hypothesis:

"The higher the number of translation-stable words between a pair of countries, the higher the correlation between the happiness scores of their words".

note:

  • Translation-stable means that forward and back translation returns the original word.

  • I have a dataset of words in different languages, each rated on a happiness scale.
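To make the definition concrete, here is a minimal sketch of a translation-stability check. The two toy dictionaries and the helper name `is_translation_stable` are made up for illustration; real translations would come from whatever translation source is used.

```python
# Hypothetical toy dictionaries (assumption: one translation per word).
en_to_es = {"happy": "feliz", "glad": "feliz", "sad": "triste"}
es_to_en = {"feliz": "happy", "triste": "sad"}


def is_translation_stable(word, forward, backward):
    """Return True if translating forward and then back gives the original word."""
    translated = forward.get(word)
    if translated is None:
        # No forward translation available, so the word cannot be stable.
        return False
    return backward.get(translated) == word


# Keep only the words that survive the round trip.
stable_words = [w for w in en_to_es if is_translation_stable(w, en_to_es, es_to_en)]
print(stable_words)
```

Here "glad" is not stable because it maps to "feliz", which maps back to "happy".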

What is the best way to prove whether this hypothesis is true or not?

Here is my suggested answer:

Null hypothesis: "The higher the number of translation-stable words between a pair of countries, the higher the correlation between the happiness scores of their words".

Alternative hypothesis: "There is no correlation between the number of translation-stable words and the happiness scores of the words"

  • One way is to:
      o Count the number of translation-stable words between each pair of languages, N.
      o Calculate the Pearson correlation coefficient, r, between the happiness scores of those words for each pair.
      o Plot N against r in a scatter plot, and apply a linear regression fit to see if there is any relationship between the two.
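The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `happiness` scores and `stable_pairs` lists are invented toy data, the helper names are hypothetical, and the plot is left out. It computes N and r for each language pair and then fits a least-squares line of r on N; it does not compute a p-value.

```python
from math import sqrt

# Hypothetical toy data: happiness[language][word] = rating.
happiness = {
    "en": {"happy": 8.3, "sad": 2.4, "table": 5.9, "war": 1.8},
    "es": {"feliz": 8.5, "triste": 2.1, "mesa": 5.6, "guerra": 1.9},
    "de": {"froh": 8.0, "traurig": 2.6, "Tisch": 5.1},
}

# Hypothetical translation-stable word pairs for each language pair.
stable_pairs = {
    ("en", "es"): [("happy", "feliz"), ("sad", "triste"),
                   ("table", "mesa"), ("war", "guerra")],
    ("en", "de"): [("happy", "froh"), ("sad", "traurig"), ("table", "Tisch")],
    ("es", "de"): [("feliz", "froh"), ("triste", "traurig")],
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Steps 1 and 2: N and r for every language pair.
results = []
for (lang_a, lang_b), pairs in stable_pairs.items():
    scores_a = [happiness[lang_a][a] for a, _ in pairs]
    scores_b = [happiness[lang_b][b] for _, b in pairs]
    results.append((len(pairs), pearson_r(scores_a, scores_b)))

# Step 3 (without the plot): regress r on N across language pairs.
slope, intercept = fit_line([n for n, _ in results], [r for _, r in results])
print(results, slope, intercept)
```

With real data there would be many more language pairs, and each pair would need at least two stable words with non-constant scores for r to be defined.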

I feel that I need to show more. For example, do I need to do a hypothesis test and calculate a p-value? If so, which test?

Any help would be appreciated.

BKS
  • These questions reflect some misunderstandings that have been well addressed in other threads. Find the better ones by searching our site for [p-value](http://stats.stackexchange.com/questions/tagged/p-value) and reading the top-voted ones. – whuber May 12 '15 at 00:31
  • @whuber could you help direct me? I tried searching and couldn't find anything similar to what I am describing, where the number of common values between two sets is compared with a correlation between the two sets. – BKS May 12 '15 at 06:56
  • @BKS check the linked questions and answers, they cover a wide variety of topics on hypothesis testing. Search for [correlation p-value](http://stats.stackexchange.com/search?q=correlation+p-value) and you'll find multiple answers to similar questions. Also, you seem not to understand p-values, so start with checking this question: http://stats.stackexchange.com/questions/31/what-is-the-meaning-of-p-values-and-t-values-in-statistical-tests – Tim May 12 '15 at 07:11
