
My problem is that my p-values are always NA. I am fitting a multiple linear regression in R. I have panel data for the period 2008-2018: one dependent variable (y), 16 independent variables (x1-x16), and z, the year (2008-2018). Here are my inputs and results. I am using RStudio.

library(readxl)
Risiko_Tool_Copy <- read_excel("C:/Users/debtaba1/Desktop/Risikotool/Risiko-Tool - Copy.xlsx")
View(Risiko_Tool_Copy)
fit <- lm(y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10 + x11 + x12 + x13 + x14 + x15 + x16 + z, data=Risiko_Tool_Copy)
summary(fit) # show results

Call:
lm(formula = y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + 
    x10 + x11 + x12 + x13 + x14 + x15 + x16 + z, data = Risiko_Tool_Copy)

Residuals:
ALL 11 residuals are 0: no residual degrees of freedom!

Coefficients: (7 not defined because of singularities)
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -8.941e+02         NA      NA       NA
x1          -3.622e-01         NA      NA       NA
x2           9.772e-03         NA      NA       NA
x3          -4.864e+00         NA      NA       NA
x4           1.809e+01         NA      NA       NA
x5           2.086e+00         NA      NA       NA
x6           1.123e+00         NA      NA       NA
x7          -7.864e+00         NA      NA       NA
x8           1.323e+01         NA      NA       NA
x9          -9.386e-01         NA      NA       NA
x10          1.165e-02         NA      NA       NA
x11                 NA         NA      NA       NA
x12                 NA         NA      NA       NA
x13                 NA         NA      NA       NA
x14                 NA         NA      NA       NA
x15                 NA         NA      NA       NA
x16                 NA         NA      NA       NA
z                   NA         NA      NA       NA

Residual standard error: NaN on 0 degrees of freedom
Multiple R-squared:      1, Adjusted R-squared:    NaN 
F-statistic:   NaN on 10 and 0 DF,  p-value: NA

Can anyone please help me?

Here is my dataset:

structure(list(z = c(2008, 2009, 2010, 2011, 2012, 2013, 2014, 
2015, 2016, 2017, 2018), y = c(0.956, 1.463, 0.457, 0.42, 0.57, 
0.33, 0.2, 0.86, 1.14, 0.46, 0.67), x1 = c(2561.74, 2460.28, 
2580.06, 2703.12, 2758.26, 2826.24, 2938.59, 3048.86, 3159.75, 
3277.34, 3386), x2 = c(31719, 30569, 32137, 33673, 34296, 35045, 
36287, 37324, 38370, 39650, 40883), x3 = c(7.8, 8.1, 7.7, 7.1, 
6.8, 6.9, 6.7, 6.4, 6.1, 5.7, 5.2), x4 = c(3.88, 1.16, 1, 1.25, 
0.88, 0.55, 0.16, 0.05, 0.01, 0, 0), x5 = c(82.3, 83, 83.9, 86.8, 
89.8, 92.6, 95.5, 100, 106, 110.8, 116.3), x6 = c(92.5, 85.2, 
97.1, 100.4, 96.7, 97.9, 99.3, 100, 100.2, 103.2, 103), x7 = c(2.6, 
0.3, 1.1, 2.1, 2, 1.5, 0.9, 0.3, 0.5, 1.5, 1.8), x8 = c(97.8, 
98.8, 100, 101.3, 102.5, 103.8, 105.4, 106.7, 108, 109.7, 110.9
), x9 = c(80.76, 81.8, 81.75, 80.33, 80.52, 80.77, 81.2, 82.18, 
82.52, 82.79, 82.88), x10 = c(615.8, 790.68, 1059.19, 1201.63, 
1259.57, 867.14, 987.1, 971.9, 1095.28, 1084.83, 1119.42), x11 = c(95.98, 
44.6, 79.36, 91.38, 98.92, 90.8, 98.71, 53.81, 36.98, 53.72, 
60.42), x12 = c(95.98, 45.59, 77.93, 94.75, 107.24, 110.88, 110.91, 
57.56, 37.61, 56.82, 66.87), x13 = c(4810.2, 5957.43, 6914.19, 
5898.35, 7612.39, 9552.16, 9805.55, 10743, 11481.1, 12917.6, 
10559), x14 = c(5601.9, 7507.04, 10128.12, 8897.81, 11914.37, 
16574.45, 16934.85, 20774.62, 22188.94, 26200.77, 21588.09), 
    x15 = c(6052, 1070, 2848, 1107, 680, 699, 2277, 782, 478, 
    961, 1366), x16 = c(903.25, 1115.1, 1257.64, 1257.6, 1426.19, 
    1848.36, 2058.9, 2043.94, 2238.83, 2673.61, 2506.85)), row.names = c(NA, 
-11L), class = c("tbl_df", "tbl", "data.frame"))
Nick Cox
Besi

1 Answer


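You need at least $m$ observations to estimate $m$ parameters of a linear model. With exactly $m$ observations the fit is perfect and no residual degrees of freedom are left, so standard errors and p-values cannot be computed. To do it reliably, though, requires many more, e.g. $50m$ observations. In your case, $m$ is 18 (the intercept, x1 to x16, and z). So, technically, you need at least 18 observations just to estimate all coefficients; practically, maybe 900, depending on the goal of your analysis.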
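
The counting argument can be checked with a minimal sketch on simulated data. This is not the asker's dataset: the data frame `sim` and the objects `fit_full`, `fit_small`, and `p_small` are made-up names, and the numbers are random draws chosen only to mimic the shape of the problem (11 rows, 17 predictors). It compares the residual degrees of freedom when 18 parameters are requested with those left when only 3 are.

```r
# Simulated stand-in for the posted data: 11 rows, 17 predictors, 1 response.
# (Random numbers, assumed for illustration only.)
set.seed(1)
n <- 11
sim <- as.data.frame(matrix(rnorm(n * 17), nrow = n))
names(sim) <- paste0("x", 1:17)
sim$y <- rnorm(n)

# 18 parameters (intercept + 17 slopes) but only 11 observations.
fit_full <- lm(y ~ ., data = sim)
length(coef(fit_full))        # number of parameters requested
sum(is.na(coef(fit_full)))    # coefficients dropped as not estimable
df.residual(fit_full)         # residual degrees of freedom left

# 3 parameters (intercept + 2 slopes) leave 11 - 3 = 8 degrees of freedom.
fit_small <- lm(y ~ x1 + x2, data = sim)
df.residual(fit_small)
p_small <- coef(summary(fit_small))[, "Pr(>|t|)"]
p_small
```

The smaller model does return p-values, but with 11 observations that only means they can be computed, not that they are reliable.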

Nick Cox
Michael M
  • Hi Michael! Thanks for your answer. I have 17 data series for 11 years, so my n is 187, or not? I just want to show whether the correlation between the dependent and independent variables is significant. Is there any other method to show it? – Besi May 06 '19 at 09:08
  • Hmm. Your text says "11 residuals". Somewhere, you are losing the other lines. Maybe due to missing values? – Michael M May 06 '19 at 09:32
  • no, no values are missing – Besi May 06 '19 at 09:45
  • The dataset as shown (thanks) shows just 11 observations. If you have panel data then you only show one panel. The model you are fitting makes no sense for 11 observations. – Nick Cox May 06 '19 at 10:55
  • Oh, OK. Thanks. Can you suggest another model? – Besi May 06 '19 at 10:59
  • No; I really can't. For a start, I have no idea what your data are. Also, everything hinges on what other panels you may have. If you do have other panels then there are many possibilities. In terms of what you have shown us so far, you have 11 years and 17 variables. What you might do with what you have shown is too broad a question. – Nick Cox May 06 '19 at 11:06
  • OK. Thanks for your help so far. – Besi May 06 '19 at 11:10