Why is it not possible to generate an explicit formula for Newton's method?


Solution 1

First, note that such a formula would violate the "too good to be true" test: it would let you find the minimum of any (sufficiently smooth, strictly) convex function in a single step, no matter how complicated!

For intuition, consider what Newton's method is doing: you are starting from an initial guess, and taking a sequence of "downhill steps" until you reach the minimum, where each step uses local information (a quadratic local approximation) about your neighborhood to compute the next destination. Imagine a hiker climbing down a mountain, who first hikes to the lowest point she can see, then reorients herself and goes to the lowest point she can see from her new location, etc.

The only way to make a direct beeline to the lowest point ($n\to\infty$) is if the hiker somehow knows the entire shape of the landscape in advance. In mathematical terms, this means your formula would have to use either:

  • information about the value of your function at all points, or equivalently (for nicely-behaved functions)
  • the values of all derivatives of all orders at your starting point.

And of course neither is practical. Note, though, that sometimes you do have such complete information, e.g. when your function is quadratic, so that all derivatives of order three and higher are zero. And then you can easily make a beeline straight to the solution.
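As a quick illustration of the quadratic case (the function and starting point below are arbitrary choices, not from the question), one Newton step lands exactly on the minimizer:

```python
def newton_step(x, grad, hess):
    """One Newton step for minimization: x - f'(x) / f''(x)."""
    return x - grad(x) / hess(x)

# f(x) = 3x^2 - 12x + 7 has its unique minimum at x = 2
grad = lambda x: 6 * x - 12   # f'(x)
hess = lambda x: 6.0          # f''(x) is constant, so the local quadratic
                              # model is the function itself

x1 = newton_step(100.0, grad, hess)  # start far away, at x0 = 100
print(x1)  # 2.0 -- the exact minimizer after a single step
```

For any non-quadratic function, the Hessian changes as you move, so the single step only reaches the minimizer of the local model, not of the function.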

Also, you can write down a formula for the $n$th iteration, but as stated in the other answers, this is not useful: it gets messier and messier the larger $n$ gets (as it must, as explained above), doesn't converge to anything nice as $n\to \infty$, and amounts to nothing more than... running Newton's method for $n$ consecutive steps.

Solution 2

You wouldn't really gain much with a formula for the $n$th iteration.

The appeal of the Newton-Raphson method is that a single step:

  1. is conceptually easy to understand,
  2. is fast to compute, and
  3. can be checked from step to step for closeness to a stationary point.

Nesting $n$ of these computations into a single formula works against all three of these things.
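To make the three points concrete, here is a minimal sketch of a Newton minimization loop (the example function $f(x) = e^x - 2x$ is an arbitrary choice, not from the answer); the per-step gradient check implements point 3:

```python
import math

def newton_minimize(grad, hess, x0, tol=1e-10, max_iter=50):
    x = x0
    for _ in range(max_iter):
        g = grad(x)
        if abs(g) < tol:       # point 3: cheap per-step stationarity check
            break
        x = x - g / hess(x)    # points 1 and 2: one easy, fast update
    return x

# f(x) = e^x - 2x is strictly convex with minimizer x* = ln 2
x_star = newton_minimize(lambda x: math.exp(x) - 2, math.exp, 0.0)
print(x_star)  # converges to ln 2 = 0.693147...
```

A single nested formula for $x_n$ would have to commit to $n$ in advance, losing exactly this ability to stop as soon as the gradient is small.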

Solution 3

Given $y \in \mathbb R^+$, suppose we want to compute the square root of $y$. Newton's method produces the recurrence relation $x_{k+1} = g (x_k)$ where

$$g (x) := \frac 12 \left( x + \frac{y}{x}\right)$$

which Hero of Alexandria apparently knew of almost two millennia ago. Suppose that we want a formula for $x_3$ in terms of $y$ and an initial estimate $x_0$. Using SymPy, we define the symbols and $g$:

>>> from sympy import *
>>> x, y, x0 = symbols('x y x0')
>>> g = Lambda(x, 0.5 * (x + y / x))

Iterating $3$ times:

>>> g(g(g(x0)))
                         0.5*y                     0.25*y       0.125*y
0.125*x0 + --------------------------------- + -------------- + -------
                         0.5*y        0.25*y            0.5*y      x0  
           0.25*x0 + -------------- + ------   0.5*x0 + -----          
                              0.5*y     x0                x0           
                     0.5*x0 + -----                                    
                                x0                                     

Simplifying:

>>> simplify(g(g(g(x0))))
    /           8            6               4  2            2  3             4\
1.0*\0.015625*x0  + 0.4375*x0 *y + 1.09375*x0 *y  + 0.4375*x0 *y  + 0.015625*y /
--------------------------------------------------------------------------------
            /        6           4             2  2          3\             
         x0*\0.125*x0  + 0.875*x0 *y + 0.875*x0 *y  + 0.125*y /             

Is working with the ratio of bivariate polynomials of degrees $8$ and $7$ really less tedious than using the recurrence relation $x_{k+1} = g (x_k)$ thrice?
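Incidentally, the recurrence is exactly Newton's method applied to $f(x) = x^2 - y$:

$$x - \frac{f(x)}{f'(x)} = x - \frac{x^2 - y}{2x} = \frac{1}{2}\left(x + \frac{y}{x}\right) = g(x).$$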

Suppose now that we want a formula for $x_5$ in terms of $y$ and $x_0$. Iterating $5$ times:

>>> g(g(g(g(g(x0)))))
                                                                                    0.5*y                                                                                                           0.25*y                                                0.125*y                   0.0625*y      0.03125*y
0.03125*x0 + --------------------------------------------------------------------------------------------------------------------------------------------------- + ----------------------------------------------------------------------- + --------------------------------- + -------------- + ---------
                                                          0.5*y                                                  0.25*y                   0.125*y       0.0625*y                            0.5*y                     0.25*y       0.125*y                 0.5*y        0.25*y            0.5*y       x0   
             0.0625*x0 + ----------------------------------------------------------------------- + --------------------------------- + -------------- + --------   0.125*x0 + --------------------------------- + -------------- + -------   0.25*x0 + -------------- + ------   0.5*x0 + -----            
                                                  0.5*y                     0.25*y       0.125*y                 0.5*y        0.25*y            0.5*y      x0                               0.5*y        0.25*y            0.5*y      x0                        0.5*y     x0                x0             
                         0.125*x0 + --------------------------------- + -------------- + -------   0.25*x0 + -------------- + ------   0.5*x0 + -----                         0.25*x0 + -------------- + ------   0.5*x0 + -----                       0.5*x0 + -----                                      
                                                  0.5*y        0.25*y            0.5*y      x0                        0.5*y     x0                x0                                             0.5*y     x0                x0                                   x0                                       
                                    0.25*x0 + -------------- + ------   0.5*x0 + -----                       0.5*x0 + -----                                                             0.5*x0 + -----                                                                                                     
                                                       0.5*y     x0                x0                                   x0                                                                         x0                                                                                                      
                                              0.5*x0 + -----                                                                                                                                                                                                                                               
                                                         x0                                                                                                                                                                                                                                                

Simplifying:

>>> simplify(g(g(g(g(g(x0))))))
    /                       32                         30                           28  2                         26  3                         24  4                        22  5                       20  6                       18  7                       16  8                       14  9                       12  10                        10  11                         8  12                         6  13                         4  14                         2  15                         16\
1.0*\9.31322574615479e-10*x0   + 4.61935997009277e-7*x0  *y + 3.34903597831726e-5*x0  *y  + 0.00084395706653595*x0  *y  + 0.00979593023657799*x0  *y  + 0.0600817054510117*x0  *y  + 0.210285969078541*x0  *y  + 0.439058616757393*x0  *y  + 0.559799736365676*x0  *y  + 0.439058616757393*x0  *y  + 0.210285969078541*x0  *y   + 0.0600817054510117*x0  *y   + 0.00979593023657799*x0 *y   + 0.00084395706653595*x0 *y   + 3.34903597831726e-5*x0 *y   + 4.61935997009277e-7*x0 *y   + 9.31322574615479e-10*y  /
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                    /                      30                         28                            26  2                         24  3                        22  4                       20  5                       18  6                       16  7                       14  8                       12  9                       10  10                        8  11                         6  12                          4  13                         2  14                        15\                 
                 x0*\2.98023223876953e-8*x0   + 4.61935997009277e-6*x0  *y + 0.000187546014785767*x0  *y  + 0.00313469767570496*x0  *y  + 0.0261224806308746*x0  *y  + 0.120163410902023*x0  *y  + 0.323516875505447*x0  *y  + 0.526870340108871*x0  *y  + 0.526870340108871*x0  *y  + 0.323516875505447*x0  *y  + 0.120163410902023*x0  *y   + 0.0261224806308746*x0 *y   + 0.00313469767570496*x0 *y   + 0.000187546014785767*x0 *y   + 4.61935997009277e-6*x0 *y   + 2.98023223876953e-8*y  /                 

Is this formula useful? If we use floating-point arithmetic, almost certainly not.

However, we can use arbitrary-precision rational arithmetic instead:

from sympy import *

# declare symbols
x, y, x_0 = symbols('x y x_0')

# declare function g (exact rational coefficients, so no floating point creeps in)
g = Lambda(x, Rational(1, 2) * (x + y / x))

# iterate
x_1 = g(x_0)
x_2 = g(x_1)
x_3 = g(x_2)
x_4 = g(x_3)
x_5 = g(x_4)

print(latex(simplify(x_5)))

which produces the following monstrous formula for $x_5$:

$$\color{blue}{\frac{x_{0}^{32} + 496 x_{0}^{30} y + 35960 x_{0}^{28} y^{2} + 906192 x_{0}^{26} y^{3} + 10518300 x_{0}^{24} y^{4} + 64512240 x_{0}^{22} y^{5} + 225792840 x_{0}^{20} y^{6} + 471435600 x_{0}^{18} y^{7} + 601080390 x_{0}^{16} y^{8} + 471435600 x_{0}^{14} y^{9} + 225792840 x_{0}^{12} y^{10} + 64512240 x_{0}^{10} y^{11} + 10518300 x_{0}^{8} y^{12} + 906192 x_{0}^{6} y^{13} + 35960 x_{0}^{4} y^{14} + 496 x_{0}^{2} y^{15} + y^{16}}{32 x_{0} \left(x_{0}^{30} + 155 x_{0}^{28} y + 6293 x_{0}^{26} y^{2} + 105183 x_{0}^{24} y^{3} + 876525 x_{0}^{22} y^{4} + 4032015 x_{0}^{20} y^{5} + 10855425 x_{0}^{18} y^{6} + 17678835 x_{0}^{16} y^{7} + 17678835 x_{0}^{14} y^{8} + 10855425 x_{0}^{12} y^{9} + 4032015 x_{0}^{10} y^{10} + 876525 x_{0}^{8} y^{11} + 105183 x_{0}^{6} y^{12} + 6293 x_{0}^{4} y^{13} + 155 x_{0}^{2} y^{14} + y^{15}\right)}}$$

Suppose that we want to compute $\sqrt{10}$ using the initial estimate $x_0 = 3$:

>>> (x_5.subs(y,10)).subs(x_0,3)
9347391150304592810234881/2955904621546382351702304
>>> 9347391150304592810234881.0/2955904621546382351702304.0
3.1622776601683795
>>> 3.1622776601683795**2
10.000000000000002

which is good enough. Hence,

$$\sqrt{10} \approx \frac{9347391150304592810234881}{2955904621546382351702304} \approx 3.16227766$$
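For comparison, simply running the recurrence five times in exact rational arithmetic (a sketch using Python's standard `fractions` module) reproduces the same value without ever writing down the closed formula:

```python
from fractions import Fraction

y = Fraction(10)    # we want sqrt(10)
x = Fraction(3)     # initial estimate x0 = 3
for _ in range(5):  # five Heron/Newton steps
    x = (x + y / x) / 2

print(x)         # 9347391150304592810234881/2955904621546382351702304
print(float(x))  # 3.1622776601683795
```

Five cheap updates versus one degree-32 rational function: the recurrence wins.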

Solution 4

A tentative explanation:

As Newton's iterations converge to the same root from many different initial values, the function that maps the initial value to the $n$th iterate must be very "flat".

Here is an illustration with the square root function:

[Figure: Newton iterates for the square root function, plotted as functions of the initial value]

As the expression of the iterates is a rational fraction, the degree of the numerator and denominator must increase to achieve this flatness. I have no explanation why the number of terms increases, though.
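This degree growth is easy to check directly. A small sketch in SymPy (mirroring Solution 3's setup for $f(x) = x^2 - y$) shows the numerator and denominator of the $k$th iterate have degrees $2^k$ and $2^k - 1$ in $x_0$, doubling at every step:

```python
from sympy import symbols, Rational, cancel, fraction, degree

x0, y = symbols('x0 y')

def g(x):
    # one Newton/Heron step for the square root of y
    return Rational(1, 2) * (x + y / x)

expr = x0
for k in range(1, 5):
    expr = cancel(g(expr))  # combine into a single rational function
    num, den = fraction(expr)
    print(k, degree(num, x0), degree(den, x0))
# degrees double each step: (2, 1), (4, 3), (8, 7), (16, 15)
```

This matches the degrees $8/7$ and $32/31$ seen in Solution 3 for $x_3$ and $x_5$.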


Author: user2707299
Updated on August 01, 2022

Comments

  • user2707299
    user2707299 over 1 year

    Going through the recursive formula for approximating roots every time is extraordinarily tedious, so I was wondering why there was no formula that computed the $n$th iteration of Newton's method.

    • Max
      Max almost 7 years
In that case we would not need iterative methods anymore. Of course, trivially there is one: just recursively plug in all the intermediate values into your result, and you end up with a... very, very useless and ugly closed form that is also computationally disadvantageous like hell.
    • littleO
      littleO almost 7 years
Newton's method should be implemented as a computer program; then it is not tedious.
    • Winther
      Winther almost 7 years
      Try to compute some iterations to get a feel for how complicated the expression for $x_n$ in terms of $x_0$ really is. For example for $n = 3$ we have $$x_{3} = x_0 - \frac{f(x_0)}{f'(x_0)} - \frac{f\left(x_0- \frac{f(x_0)}{f'(x_0)}\right)}{f'\left(x_0 - \frac{f(x_0)}{f'(x_0)}\right)} - \frac{f\left(x_0 - \frac{f(x_0)}{f'(x_0)} - \frac{f\left(x_0- \frac{f(x_0)}{f'(x_0)}\right)}{f'\left(x_0 - \frac{f(x_0)}{f'(x_0)}\right)}\right)}{f'\left(x_0 - \frac{f(x_0)}{f'(x_0)} - \frac{f\left(x_0- \frac{f(x_0)}{f'(x_0)}\right)}{f'\left(x_0 - \frac{f(x_0)}{f'(x_0)}\right)}\right)}$$
    • Lutz Lehmann
      Lutz Lehmann almost 7 years
      Also search the internet for "Newton fractal". These are indicative of the shape of the level sets of large iterations of Newton's method.
    • DanielWainfleet
      DanielWainfleet almost 7 years
      There are some special cases. For example with $x= x_0>0$ and $A>0$ the $n$th iterate ($n>0$) approximation to $\sqrt A$ is $\sum_{j=0}^{2^{n-1}}\binom {2^n}{2j}x^{2j}A^j\;/$ $\sum_{j=1}^{2^{n-1}}\binom {2^n}{2j-1}x^{2j-1}A^{j-1}.$
    • J. M. ain't a mathematician
      J. M. ain't a mathematician almost 7 years
      @user254665, speaking of which...
    • Daniel Schepler
      Daniel Schepler almost 7 years
      I thought the $n$th iterate approximation to $\sqrt{A}$ was $\sqrt{A} \mathrm{coth}(C \cdot 2^n)$ for some $C$ determined by the initial condition.
    • DanielWainfleet
      DanielWainfleet almost 7 years
      @DanielSchepler.Yes. It's the same thing. E.g. $\coth 4c=(a^4+6a^2+1)/(4a^2+4a)$ where $a=\coth c$.... And $C=\coth^{-1}x_0 $ when $x_0>\sqrt A$.
    • leftaroundabout
      leftaroundabout almost 7 years
      The formula for Newton-Raphson is explicit. Recursive, yes, but every step (and thus any finite number of steps) is explicit. However, the algorithm can be used to compute functions that are only specified implicitly (i.e. you only have $F(x,y)=0$ given and want to compute $y$ as a function of $x$). Pretty good, eh? You might rather ask “how can Newton-Raphson manage to give an explicit formula for implicit problems”? (To which the answer is, it can generally only give an approximation, albeit a very good one.)
    • Jean Marie
      Jean Marie almost 7 years
      Have a look at this rather recent question (math.stackexchange.com/q/2114710) and its different answers (it is the iteration of the Newton's method for $f(x)=x^2-x+1$ as I explain it in my answer). You will find there explicit formulas.
  • Ian
    Ian almost 7 years
    I don't think this is such a great explanation because this property is just not present in a lot of other cases. It is in some sense true if, say, your function is monotone on a neighborhood of the root and you consider initial values on that neighborhood. But Newton's method can potentially have very wild dynamics in the presence of multiple extrema.
  • Admin
    Admin almost 7 years
@ian: on the contrary, this reinforces the explanation. In the case of multiple roots, the iterates can converge to a piecewise-constant function (sometimes extremely complicated), something hard to achieve with rational functions, and consuming more and more degrees of freedom. Thanks for the tip.
  • Matt Samuel
    Matt Samuel almost 7 years
    The reason I became a mathematician is because big numbers are funny! Well, not really, but they are funny sometimes.
  • Marc van Leeuwen
    Marc van Leeuwen almost 7 years
    Note that OP did not ask for a formula for the limit of the iterates given by Newton's method (just for the iterates themselves). Your answer, notably its opening sentence, appears to suppose that was the case.
  • user7530
    user7530 almost 7 years
    @Marc yes, that is true. I answered the question OP clearly intended to ask (from context and question title) rather than the literal question body.
  • Matthew Towers
    Matthew Towers almost 7 years
    Thanks for this really interesting example of using sympy, I'd love to see more questions answered like this
  • littleO
    littleO almost 7 years
    The first sentence is not quite right: Newton's method requires that the objective function be twice differentiable, but there are many important nondifferentiable convex functions.
  • user7530
    user7530 almost 7 years
    @littleO Yes, I didn't think such details would be useful given the level of the OP's question, but since this answer seems to be giving a lot of attention, I've added a parenthetical remark.