
Say your data-generating process is given by the function $y=f(x|\theta)$, where $y$ and $x$ represent variables (data) and $\theta$ represents the parameter(s). For convergence reasons (e.g. $f(\cdot)$ is highly non-linear in the parameters and a GMM estimator does not converge), you decide to estimate a Taylor series expansion of $f(\cdot)$ around $\theta=\theta_0$. Let's denote this approximated function by $y \approx g(x|\theta)_{\theta_0}$.

Say you estimate $\theta$ in $g(\cdot)$ based on a random sample of $\{y,x\}$, obtaining $\hat\theta_1$. Then you recompute the Taylor series approximation around this point estimate (keeping the order of the expansion constant), producing $y \approx g(x|\theta)_{\hat\theta_1}$. Then you estimate again, yielding $\hat\theta_2$. You iterate until

$$ (\hat\theta_n - \hat\theta_{n+1})^2 < \epsilon $$

for an arbitrary threshold $\epsilon > 0$.
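
For concreteness, here is a minimal sketch of that loop, assuming a user-supplied approximation `g(x, theta, theta0)`; the function names and the use of `scipy.optimize.least_squares` as the solver are illustrative choices, not part of the method itself:

```python
import numpy as np
from scipy.optimize import least_squares

def iterate_expansion(y, x, g, theta0, eps=1e-8, max_iter=100):
    """Repeatedly fit the Taylor approximation g(x, theta, theta0) of f,
    re-centring the expansion at each new estimate of theta.
    `g` is a hypothetical function returning the approximated y given
    data x, parameter(s) theta, and the expansion point theta0."""
    theta_hat = np.atleast_1d(np.asarray(theta0, dtype=float))
    for _ in range(max_iter):
        centre = theta_hat.copy()
        # estimate theta in the current approximation by least squares
        fit = least_squares(lambda th: y - g(x, th, centre), x0=centre)
        theta_new = fit.x
        if np.sum((theta_new - theta_hat) ** 2) < eps:  # stopping rule above
            return theta_new
        theta_hat = theta_new
    return theta_hat
```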

Convergence (in terms of the optimisation criterion above) is of course of paramount importance. Notice that for an arbitrarily large $\epsilon$ there is always a solution, as long as $\hat\theta$ can be computed at all, which in turn depends on the properties of $g(\cdot)$, e.g. on the order of the Taylor expansion; a linear model is always estimable, beyond trivial issues like multicollinearity.

My question is: is the method above a thing? I've searched for "iterated estimation of Taylor series" on Google, in this forum and on Math.SE and cannot find anything about it. Maybe the method is just plain wrong, e.g. convergence is not assured by any known theorem.


More details on the method

For instance, consider a CES production function:

$$ Y = \left(\alpha K^\theta+ (1-\alpha)L^\theta\right)^{1/\theta} $$

where $Y$, $K$ and $L$ are variables, and $\alpha$ and $\theta$ are parameters. Assume we are particularly interested in estimating $\theta$.

So, you produce a first-order Taylor series expansion of the log of $Y$ around $\theta = \theta_0$. The new formula (which is equivalent to the so-called translog production function in the limit $\theta_0 \to 0$) is:

$$\ln(Y) \approx \frac{1}{\theta_0} \ln\left(\alpha K^{\theta_0}+ (1-\alpha)L^{\theta_0}\right) + (\theta - \theta_0)\left[-\frac{1}{\theta_0^2} \ln\left(\alpha K^{\theta_0}+ (1-\alpha)L^{\theta_0}\right) + \frac{1}{\theta_0}\frac{\alpha K^{\theta_0} \ln(K)+ (1-\alpha)L^{\theta_0} \ln(L)}{\alpha K^{\theta_0}+ (1-\alpha)L^{\theta_0}} \right] $$

So, you estimate the above equation with a random sample of $\{Y,L,K\}$, using e.g. non-linear least squares, for a given arbitrary $\theta_0$. Importantly, $\theta_0 \neq 0$, because at zero the expansion takes a completely different form (the translog case mentioned above). From this estimation you obtain an estimate of $\theta$, $\hat\theta_1$. Then you re-estimate the model setting $\theta_0 = \hat\theta_1$ (so, a new Taylor series around a different value), obtaining $\hat\theta_2$. Iterate until some convergence criterion is fulfilled.
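
A sketch of this iteration for the CES example follows. It fits both $\alpha$ and $\theta$ by non-linear least squares on the linearised model above; the `scipy` solver, the bounds keeping $\alpha$ in $(0,1)$, the starting values, and the simulated data are all illustrative assumptions, not part of the question:

```python
import numpy as np
from scipy.optimize import least_squares

def log_ces_lin(params, K, L, theta0):
    """First-order expansion in theta of ln(Y) for the CES function,
    centred at theta0 (theta0 != 0), as written above."""
    alpha, theta = params
    A0 = alpha * K**theta0 + (1 - alpha) * L**theta0
    dA0 = alpha * K**theta0 * np.log(K) + (1 - alpha) * L**theta0 * np.log(L)
    slope = -np.log(A0) / theta0**2 + dA0 / (theta0 * A0)
    return np.log(A0) / theta0 + (theta - theta0) * slope

def iterate_ces(lnY, K, L, theta0=0.5, alpha0=0.5, eps=1e-8, max_iter=50):
    """Fit (alpha, theta) by NLS on the linearised model, then re-centre
    the expansion at the new estimate of theta and repeat."""
    alpha_hat, theta_hat = alpha0, theta0
    for _ in range(max_iter):
        centre = theta_hat
        fit = least_squares(
            lambda p: lnY - log_ces_lin(p, K, L, centre),
            x0=[alpha_hat, centre],
            bounds=([1e-6, -np.inf], [1 - 1e-6, np.inf]),  # keep alpha in (0, 1)
        )
        alpha_hat, theta_new = fit.x
        if (theta_new - theta_hat) ** 2 < eps:  # stopping rule from above
            return alpha_hat, theta_new
        theta_hat = theta_new
    return alpha_hat, theta_hat

# illustrative check on simulated data (parameter values are made up)
rng = np.random.default_rng(0)
K, L = rng.lognormal(size=(2, 500))
alpha_true, theta_true = 0.4, 0.3
Y = (alpha_true * K**theta_true + (1 - alpha_true) * L**theta_true) ** (1 / theta_true)
print(iterate_ces(np.log(Y) + 0.01 * rng.normal(size=500), K, L))
```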

luchonacho
  • I am having difficulty understanding your procedure. Could you elaborate on what you mean by "estimate a Taylor series expansion of $f(\cdot)$ around $\theta=\theta_0$"? This notation seems explicitly to refer to an expansion in *both* arguments $x$ and $\theta,$ but by referencing only $\theta$ it appears to be a Taylor series only in the second argument, but AFAIK the only thing you can "estimate" is the behavior of $f$ in its *first* argument! – whuber Dec 07 '18 at 16:17
  • @whuber I added an example. I hope it's clearer. – luchonacho Dec 07 '18 at 16:50
  • I'm still confused, as I will explain. Please note that this is a *first* order series in $\theta.$ As such, I cannot see why it would require "non-linear" least squares, because ordinary least squares regression of $\log(Y)$ against $\alpha(1-\alpha)(\log(K/L))^2$ will fit it (assuming you know $\alpha$, which--although you refer to it as a "parameter"--seems to have been forgotten). And if you *do* use a higher order expansion in all parameters, it seems like your problem isn't any simpler than just performing a nonlinear fit to the desired model in the first place. – whuber Dec 07 '18 at 17:35
  • Is it correct that your goal is to optimize $\theta$ when the loss function is not linear in basis? – Chris Dec 07 '18 at 17:42
  • @whuber You are correct. This is a first-order approximation. I use a non-linear estimator because I'm estimating $\theta$ and $\alpha$ (I could use OLS and then the delta method, but I prefer NLS). Still, the iteration is for $\theta$, the parameter around which the Taylor series is performed. – luchonacho Dec 07 '18 at 18:17
  • @Chris you can see it that way. But I'm estimating a regression. So there is another optimisation problem there too. – luchonacho Dec 07 '18 at 18:20
  • If I follow all that correctly, it sounds like you are proposing a strategy to optimize a function of $\theta$ by locally linearizing it. Since that procedure is built into many solvers, there's likely nothing new here--but the good news is that your default solver might already be doing something close to what you're trying to describe. – whuber Dec 07 '18 at 18:42
  • @whuber I am indeed locally linearising and then estimating $\theta$. But the solution is optimal only for that particular linearisation. What I want is to "re-linearise" around that estimated value. The equation changes with every re-linearisation. – luchonacho Dec 07 '18 at 18:53
  • That's exactly how many optimizers work. – whuber Dec 07 '18 at 20:21
  • @whuber Ok, but is this true even when the initial Taylor expansion is made at zero (which is my case)? – luchonacho Dec 08 '18 at 09:20
  • I'm afraid I don't understand why zero--or indeed any other number--would be special. – whuber Dec 08 '18 at 14:55
  • Try "Newton's Method" (https://en.wikipedia.org/wiki/Newton%27s_method_in_optimization) – jbowman Dec 09 '18 at 18:34
  • @whuber I hope I made myself clearer now. – luchonacho Dec 11 '18 at 11:42
  • @whuber The crux of the issue for me is that, because of the particular nature of the function $Y$, the Taylor expansion around 0 is entirely different from that elsewhere (around zero you need to use L'Hôpital to get rid of terms). So, if I apply a Newton optimisation method to $\ln(Y)$, is that taken into account? A manual application of the method certainly would need to. – luchonacho Dec 11 '18 at 11:51
  • If the Taylor expansion around one point (such as $0$) is truly different than the expansion around any nearby points, then the expansion is probably invalid in the first place. You might want to check the radius of convergence and look at the magnitude of the error term. – whuber Dec 11 '18 at 18:24

1 Answer


If you take the $n^{\text{th}}$-order Taylor expansion in $\theta$ around a value, you have implicitly restricted your model to a degree-$n$ polynomial in $\theta$ and have set an initial value for $\theta$. A polynomial is linear in its basis, so if you use a convex loss function the fit can be solved exactly given your data. No further expansion can improve this result.

This is equivalent to Polynomial Regression.
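
A minimal illustration of the "linear in basis" point: a degree-$n$ polynomial model is fitted by a single ordinary least-squares solve, with nothing to iterate over (the toy data, names and degree here are illustrative assumptions only):

```python
import numpy as np

# A degree-n polynomial is linear in its monomial basis [1, x, ..., x^n],
# so the least-squares fit is one linear solve; no iteration is needed.
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, 200)
y = 1.0 - 2.0 * x + 0.5 * x**2 + 0.05 * rng.normal(size=200)

n = 2                                      # order of the expansion
X = np.vander(x, n + 1, increasing=True)   # design matrix of the basis
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)                                # exact OLS solution in one step
```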

Chris
  • But then I'm estimating $\theta$, so I could produce a new Taylor expansion around that new value and re-estimate. So the process is: expand around an arbitrary $\theta$, estimate $\theta$, expand around the new $\theta$, estimate $\theta$, etc. – luchonacho Dec 07 '18 at 18:56
  • If $g$ is a polynomial, when you estimate $\theta$ using a linear model (which you can do) you will get the correct (up to a small error) $\theta$ for which there is no improvement. – Chris Dec 07 '18 at 20:57
  • I added more details to the question. – luchonacho Dec 11 '18 at 11:42