I have the following hypothesis:
"The higher the number of translation-stable words between a pair of countries, the higher the correlation between the happiness scores of their words".
Note: a word is translation-stable if translating it forward and then back returns the original word.
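A minimal sketch of the stability check, assuming the translations are available as plain dictionaries. The names `forward`, `backward`, `is_translation_stable` and `stable_pairs` are hypothetical; in practice the translations would come from a bilingual dictionary or a translation service.

```python
def is_translation_stable(word: str, forward: dict[str, str], backward: dict[str, str]) -> bool:
    """True if translating `word` forward and then back returns `word`."""
    translated = forward.get(word)
    if translated is None:
        return False  # no forward translation, so it cannot be stable
    return backward.get(translated) == word


def stable_pairs(words, forward, backward):
    """Return (source_word, translated_word) for every translation-stable word."""
    return [(w, forward[w]) for w in words if is_translation_stable(w, forward, backward)]


# Toy English -> Spanish example (hypothetical translations).
en_to_es = {"happy": "feliz", "house": "casa", "home": "casa", "sad": "triste"}
es_to_en = {"feliz": "happy", "casa": "house", "triste": "unhappy"}

pairs = stable_pairs(["happy", "house", "home", "sad", "dog"], en_to_es, es_to_en)
N = len(pairs)  # number of translation-stable words for this language pair
print(pairs, N)
```

Here "home" is not stable because it comes back as "house", and "sad" is not stable because it comes back as "unhappy".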
I have a dataset containing lists of words in several languages, each word rated on a happiness scale.
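One way to make the hypothesis precise (the notation is an assumption, not taken from the dataset): for languages $i$ and $j$, let $S_{ij}$ be the set of translation-stable word pairs $(w, v)$, let $N_{ij} = |S_{ij}|$, and let $h_i(w)$ be the happiness score of word $w$ in language $i$. The correlation between the happiness scores for that pair is then the Pearson coefficient over $S_{ij}$:

$$
r_{ij} = \frac{\sum_{(w,v)\in S_{ij}} \bigl(h_i(w)-\bar h_i\bigr)\bigl(h_j(v)-\bar h_j\bigr)}{\sqrt{\sum_{(w,v)\in S_{ij}} \bigl(h_i(w)-\bar h_i\bigr)^2}\,\sqrt{\sum_{(w,v)\in S_{ij}} \bigl(h_j(v)-\bar h_j\bigr)^2}}
$$

where $\bar h_i$ and $\bar h_j$ are the mean scores over the words in $S_{ij}$. The hypothesis then says that, across language pairs, $N_{ij}$ and $r_{ij}$ are positively correlated.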
What is the best way to test whether this hypothesis is true or not?
Here is my suggested answer:
Null hypothesis: "There is no correlation between the number of translation-stable words shared by a pair of countries and the correlation between the happiness scores of those words."
Alternative hypothesis: "The higher the number of translation-stable words between a pair of countries, the higher the correlation between the happiness scores of their words."
- One way is to:
  - For each pair of languages, count the number of translation-stable words, N.
  - For each pair, calculate the Pearson correlation coefficient, r, between the happiness scores of those translation-stable words.
  - Plot N against r in a scatter plot and fit a linear regression to see whether there is any correlation between the two.
I feel that I need to show more. For example, do I need to do a hypothesis test and calculate a p-value? If so, which one?
Any help would be appreciated.