This question is an extension of an earlier question.
The tensile strength of a synthetic fibre used to make cloth for men’s shirts is of interest to a manufacturer. It is suspected that the strength is affected by the percentage of cotton in the fibre (Table 1). Five levels of cotton percentage are of interest: 15%, 20%, 25%, 30% and 35%. Five observations were taken at each level of cotton percentage; the 25 observations were run in random order. Y denotes the response variable, that is, the tensile strength, measured as resistance to a fixed stress on a scale from 0 to 50. The means of Y for the different levels of cotton percentage, and the overall mean response, are also given.
Table 1
Problems to solve
The idea is to analyse the data with an analysis of variance. Let Yij be the ith replicate of the response variable taken at the jth level of the factor representing the percentage of cotton.
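For reference, the usual one-way (effects) model that goes with this notation can be written as below. This is my assumption about the intended model, since the problem statement does not spell it out; the symbols $\mu$, $\tau_j$, $\varepsilon_{ij}$ and $\sigma^2$ are introduced here only for illustration.

```latex
Y_{ij} = \mu + \tau_j + \varepsilon_{ij},
\qquad \varepsilon_{ij} \sim N(0, \sigma^2) \ \text{independently},
\qquad i = 1, \dots, 5, \quad j = 1, \dots, 5
```

Under this formulation the equal-variance assumption is the statement that the same $\sigma^2$ applies at every level $j$.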
I have read extensively, but I am still confused about the following points:
Is it reasonable to believe that the variance is the same at each factor level, given the boxplots below? The means and the spreads appear to differ markedly between factor levels. Would this be an example of heteroscedasticity, that is, unequal variability across factor levels?
Would it be reasonable to model the relationship between strength and percentage cotton (treated as a continuous variate) with a simple linear regression model, given the residual plots below? Would the answer change if the observations for 35% cotton were discarded?
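For the equal-variance question, one formal check I could run alongside the boxplots is a test of homogeneity of variance. The following is only a minimal sketch, assuming Python with NumPy and SciPy; `variance_check` is a hypothetical helper name, and the numbers are simulated placeholders, not the observations in Table 1.

```python
import numpy as np
from scipy import stats


def variance_check(groups):
    """Return per-level sample variances and Levene/Bartlett p-values.

    groups: dict mapping a level label to a sequence of observations.
    """
    samples = [np.asarray(v, dtype=float) for v in groups.values()]
    variances = {k: float(np.var(v, ddof=1)) for k, v in groups.items()}
    # Brown-Forsythe variant of Levene's test (centred on the median),
    # which is less sensitive to non-normality than Bartlett's test.
    levene = stats.levene(*samples, center="median")
    bartlett = stats.bartlett(*samples)
    return variances, float(levene.pvalue), float(bartlett.pvalue)


# HYPOTHETICAL numbers, only to show the call; not the data in Table 1.
rng = np.random.default_rng(0)
demo = {label: rng.normal(loc=15.0, scale=3.0, size=5) for label in "ABCDE"}
variances, p_levene, p_bartlett = variance_check(demo)
print(variances)
print(f"Levene p = {p_levene:.3f}, Bartlett p = {p_bartlett:.3f}")
```

With only five observations per level, such tests have little power, so a large p-value would not by itself establish that the variances are equal.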
Code for factor levels for the % of cotton in fibres:
- 15 % cotton = A
- 20 % cotton = B
- 25 % cotton = C
- 30 % cotton = D
- 35 % cotton = E
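With the codes above mapped back to numeric percentages, the straight-line question can also be framed as a lack-of-fit F-test: the simple linear regression is compared against the model with a separate mean for each level. This is a sketch under my own assumptions, using NumPy and SciPy; `lack_of_fit` is a hypothetical helper name, and the demo values are simulated, not the observations in Table 1.

```python
import numpy as np
from scipy import stats

# Factor codes mapped to cotton percentage (taken from the list above).
LEVELS = {"A": 15.0, "B": 20.0, "C": 25.0, "D": 30.0, "E": 35.0}


def lack_of_fit(codes, y):
    """F-test of a straight line against the separate-means model.

    Needs at least three distinct levels and replicates within levels.
    """
    x = np.array([LEVELS[c] for c in codes])
    y = np.asarray(y, dtype=float)
    n = y.size
    k = len(set(codes))

    # Residual sum of squares from the straight-line fit.
    slope, intercept = np.polyfit(x, y, 1)
    sse_line = float(np.sum((y - (intercept + slope * x)) ** 2))

    # Pure-error sum of squares: deviations from each level's own mean.
    sse_pure = 0.0
    for level in np.unique(x):
        y_level = y[x == level]
        sse_pure += float(np.sum((y_level - y_level.mean()) ** 2))

    df_lof = k - 2        # extra parameters in the separate-means model
    df_pure = n - k       # within-level degrees of freedom
    f_stat = ((sse_line - sse_pure) / df_lof) / (sse_pure / df_pure)
    p_value = float(stats.f.sf(f_stat, df_lof, df_pure))
    return f_stat, p_value


# HYPOTHETICAL responses, only to show the calls; not the data in Table 1.
rng = np.random.default_rng(1)
codes = [c for c in "ABCDE" for _ in range(5)]
y = rng.normal(loc=15.0, scale=3.0, size=len(codes))

print(lack_of_fit(codes, y))

# Same test with the 35% cotton observations (code E) discarded.
keep = [i for i, c in enumerate(codes) if c != "E"]
print(lack_of_fit([codes[i] for i in keep], y[keep]))
```

A small p-value would suggest that the level means do not fall on a straight line; running the test with and without level E is one way to see whether the 35% observations drive that conclusion.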
Boxplots and Residual Plots from a Simple Linear Regression