Taylor series | Essence of calculus, chapter 11

When I first learned about Taylor series,
I definitely didn’t appreciate how important they are.
But time and time again they come up in math, physics, and many fields of engineering because
they’re one of the most powerful tools that math has to offer for approximating functions. One of the first times this clicked for me
as a student was not in a calculus class, but in a physics class.
We were studying some problem that had to do with the potential energy of a pendulum,
and for that you need an expression for how high the weight of the pendulum is above its
lowest point, which works out to be proportional to one minus the cosine of the angle between
the pendulum and the vertical. The specifics of the problem we were trying
to solve are beside the point here, but I’ll just say that this cosine function made the
problem awkward and unwieldy. But by approximating cos(theta) as 1 – theta²/2,
of all things, everything fell into place much more easily.
If you’ve never seen anything like this before, an approximation like that might seem
completely out of left field. If you graph cos(theta) along with this function
1 – theta²/2, they do seem rather close to each other for small angles near 0, but how
would you even think to make this approximation? And how would you find this particular quadratic?
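A quick numerical check makes "rather close for small angles" concrete. This is a minimal Python sketch. The sample angles, in radians, are arbitrary choices and do not come from the video.

```python
import math

def quadratic_approx(theta: float) -> float:
    """The quadratic stand-in for cos(theta): 1 - theta^2 / 2."""
    return 1 - theta**2 / 2

# Compare the approximation with cos(theta) at a few angles (radians).
for theta in [0.0, 0.1, 0.5, 1.0, 2.0]:
    exact = math.cos(theta)
    approx = quadratic_approx(theta)
    print(f"theta={theta:4.1f}  cos={exact:+.6f}  approx={approx:+.6f}  "
          f"error={abs(exact - approx):.2e}")
```

The error is tiny near 0 and grows as theta moves away from 0, which is the behavior the rest of this chapter explains.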
The study of Taylor series is largely about taking non-polynomial functions, and finding
polynomials that approximate them near some input.
The motive is that polynomials tend to be much easier to deal with than other functions:
They’re easier to compute, easier to differentiate, easier to integrate… they’re
just all around friendly. So let’s look at the function cos(x), and
take a moment to think about how you might find a quadratic approximation near x=0.
That is, among all the polynomials that look like c₀ + c₁x + c₂x² for some choice of the constants
c₀, c₁ and c₂, find the one that most resembles cos(x) near x=0; whose graph kind of spoons
with the graph of cos(x) at that point. Well, first of all, at the input 0 the value
of cos(x) is 1, so if our approximation is going to be any good at all, it should also
equal 1 when you plug in 0. Plugging in 0 just results in whatever c₀ is, so we can
set that equal to 1. This leaves us free to choose the constants c₁
and c₂ to make this approximation as good as we can, but nothing we do to them will
change the fact that the polynomial equals 1 at x=0.
It would also be good if our approximation had the same tangent slope as cos(x) at
this point of interest. Otherwise, the approximation drifts away from the cosine graph even for
values of x very close to 0. The derivative of cos(x) is -sin(x), and at
x=0 that equals 0, meaning its tangent line is flat.
Working out the derivative of our quadratic, you get c₁ + 2c₂x. At x=0 that equals whatever
we choose for c₁. So this constant c₁ controls the derivative of our approximation around
x=0. Setting it equal to 0 ensures that our approximation has the same derivative as cos(x),
and hence the same tangent slope. This leaves us free to change c₂, but the
value and slope of our polynomial at x=0 are locked in place to match those of cos(x). The cosine graph curves downward above x=0;
it has a negative second derivative. Or in other words, even though the rate of change
is 0 at that point, the rate of change itself is decreasing around that point.
Specifically, since its derivative is -sin(x), its second derivative is -cos(x), so at x=0
its second derivative is -1. In the same way that we wanted the derivative
of our approximation to match that of cosine, so that their values wouldn’t drift apart
needlessly quickly, making sure that their second derivatives match will ensure that
they curve at the same rate; that the slope of our polynomial doesn’t drift away from
the slope of cos(x) any more quickly than it needs to.
Pulling out that same derivative we had before, then taking its derivative, we see that the
second derivative of this polynomial is exactly 2c₂, so to make sure this second derivative
also equals -1 at x=0, 2c₂ must equal -1, meaning c₂ itself has to be -½.
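Here are the three matching conditions in one place. P is simply a label introduced for the quadratic.

```latex
\begin{aligned}
P(x)   &= c_0 + c_1 x + c_2 x^2, &\quad P(0)   &= c_0  = \cos(0)  = 1,\\
P'(x)  &= c_1 + 2c_2 x,          &\quad P'(0)  &= c_1  = -\sin(0) = 0,\\
P''(x) &= 2c_2,                  &\quad P''(0) &= 2c_2 = -\cos(0) = -1 \;\Rightarrow\; c_2 = -\tfrac{1}{2}.
\end{aligned}
```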
This gives us the approximation 1 + 0x – ½x². To get a feel for how good this is, if you
estimated cos(0.1) with this polynomial, you’d get 0.995. The true value of cos(0.1) is about 0.9950042.
It’s a really good approximation. Take a moment to reflect on what just happened.
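The cos(0.1) comparison can be reproduced in a few lines. This is a minimal sketch using only Python's standard `math` module.

```python
import math

x = 0.1
estimate = 1 + 0 * x - 0.5 * x**2   # the quadratic approximation 1 + 0x - (1/2)x^2
true_value = math.cos(x)

print(f"estimate   = {estimate:.7f}")    # 0.9950000
print(f"cos(0.1)   = {true_value:.7f}")  # 0.9950042
print(f"difference = {true_value - estimate:.1e}")
```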
You had three degrees of freedom with a quadratic approximation, the constants c₀, c₁, and c₂.
c₀ was responsible for making sure that the output of the approximation matches that of
cos(x) at x=0, c₁ was in charge of making sure the derivatives match at that point,
and c₂ was responsible for making sure the second derivatives match up.
This ensures that the way your approximation changes as you move away from x=0, and the
way that the rate of change itself changes, is as similar as possible to the behavior of cos(x),
given the amount of control you have. You could give yourself more control by allowing
more terms in your polynomial, and matching higher order derivatives of cos(x).
For example, add on the term c₃x³ for some constant c₃.
If you take the third derivative of a cubic polynomial, anything quadratic or smaller
goes to 0. As for that last term, after three iterations
of the power rule it looks like 1*2*3*c₃. On the other hand, the third derivative of
cos(x) is sin(x), which equals 0 at x=0, so to make the third derivatives match, the constant
c₃ should be 0. In other words, not only is 1 – ½x² the
best possible quadratic approximation of cos(x) around x=0, it’s also the best possible
cubic approximation. You can actually make an improvement by adding
a fourth order term, c₄x⁴. The fourth derivative of cos(x) is itself, which equals 1 at x=0.
And what’s the fourth derivative of our polynomial with this new term? Well, when
you keep applying the power rule over and over, with those exponents all hopping down
front, you end up with 1*2*3*4*c₄, which is 24c₄.
So if we want this to match the fourth derivative of cos(x), which is 1, c₄ must be 1/24.
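A short Python sketch shows how much the fourth-order term helps. It compares the quadratic and quartic approximations at a few inputs chosen arbitrarily for illustration.

```python
import math

def quadratic(x: float) -> float:
    return 1 - x**2 / 2

def quartic(x: float) -> float:
    return 1 - x**2 / 2 + x**4 / 24   # c4 = 1/24 from matching the fourth derivative

for x in [0.1, 0.5, 1.0]:
    print(f"x={x}: |cos - quadratic| = {abs(math.cos(x) - quadratic(x)):.2e}, "
          f"|cos - quartic| = {abs(math.cos(x) - quartic(x)):.2e}")
```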
And indeed, the polynomial 1 – ½x² + (1/24)x⁴ is a very close
approximation for cos(x) around x=0. In any physics problem involving the cosine
of some small angle, for example, predictions would be almost unnoticeably different if
you substituted this polynomial for cos(x). Now, step back and notice a few things about
this process. First, factorial terms arise naturally.
When you take n derivatives of xⁿ, letting
the power rule just keep cascading, what you’re left with is 1*2*3 and on up to n.
So you don’t simply set the coefficients of the polynomial equal to whatever derivative
value you want, you have to divide by the appropriate factorial to cancel out this effect.
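The factorial can be checked mechanically. This is a minimal Python sketch that assumes a polynomial is stored as a list of coefficients `[a0, a1, a2, ...]`, where `a_k` multiplies xᵏ. The helper names are hypothetical and chosen only for this illustration. It applies the power rule n times to xⁿ and compares the leftover constant with n!.

```python
import math

def differentiate(coeffs):
    """Power rule on a polynomial stored as [a0, a1, a2, ...] (a_k multiplies x^k)."""
    return [k * coeffs[k] for k in range(1, len(coeffs))]

def nth_derivative_of_monomial(n):
    """Differentiate x^n exactly n times and return the constant that remains."""
    coeffs = [0] * n + [1]   # represents x^n
    for _ in range(n):
        coeffs = differentiate(coeffs)
    return coeffs[0]         # only the constant term survives

for n in range(1, 7):
    print(n, nth_derivative_of_monomial(n), math.factorial(n))
```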
For example, that x⁴ coefficient is the fourth derivative of cosine, 1, divided by 4 factorial,
24. The second thing to notice is that adding
new terms, like this c₄x⁴, doesn’t mess up what the old terms should be, and that’s
important. For example, the second derivative of this
polynomial at x=0 is still equal to 2 times the second coefficient, even after introducing
higher order terms to the polynomial. And it’s because we’re plugging in x=0,
so the second derivative of any higher order terms, which all include an x, will wash away.
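The claim that higher-order terms wash away at x=0 can be tested directly. This is a minimal sketch using the same hypothetical list-of-coefficients representation as before, with index k holding the coefficient of xᵏ. It compares each derivative at 0 of the quadratic and quartic approximations.

```python
def differentiate(coeffs):
    """Power rule on a polynomial stored as [a0, a1, a2, ...] (a_k multiplies x^k)."""
    return [k * coeffs[k] for k in range(1, len(coeffs))]

def derivative_at_zero(coeffs, order):
    """Take `order` derivatives, then plug in x=0, which keeps only the constant term."""
    for _ in range(order):
        coeffs = differentiate(coeffs)
    return coeffs[0] if coeffs else 0.0

quadratic = [1, 0, -1/2]          # 1 - x^2/2
quartic = [1, 0, -1/2, 0, 1/24]   # 1 - x^2/2 + x^4/24

for order in range(5):
    print(order, derivative_at_zero(quadratic, order), derivative_at_zero(quartic, order))
```

Orders 0 through 3 agree. Only the fourth derivative differs, because only the new x⁴ term can affect it.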
The same goes for any other derivative, which is why each derivative of a polynomial at
x=0 is controlled by one and only one coefficient. If instead you were approximating near an
input other than 0, like x=pi, in order to get the same effect you would have to write
your polynomial in terms of powers of (x – pi), or whatever input you’re looking at.
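Here is the cosine example recentered at π as a minimal Python sketch. The derivatives of cos(x) at π are cos(π) = -1, -sin(π) = 0, -cos(π) = 1, sin(π) = 0, cos(π) = -1. Dividing by the factorials gives coefficients -1, 0, 1/2, 0, -1/24 on powers of (x – π).

```python
import math

def cos_taylor_near_pi(x: float) -> float:
    """Quartic Taylor polynomial of cos(x), written in powers of (x - pi)."""
    u = x - math.pi                  # shifting makes the point pi behave like 0
    return -1 + u**2 / 2 - u**4 / 24

for x in [math.pi, math.pi + 0.1, math.pi - 0.3]:
    print(f"x={x:.4f}  cos={math.cos(x):+.8f}  approx={cos_taylor_near_pi(x):+.8f}")
```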
This makes it look notably more complicated, but all it’s doing is making the point pi
look like 0, so that plugging in x=pi will result in a lot of nice cancelation that leaves
only one constant. And finally, on a more philosophical level,
notice how what we’re doing here is essentially taking information about the higher order
derivatives of a function at a single point, and translating it into information about
the value of that function near that point. We can take as many derivatives of cos(x)
as we want; they follow this nice cyclic pattern: cos(x), -sin(x), -cos(x), sin(x), and repeat.
So the values of these derivatives at x=0 follow the cyclic pattern 1, 0, -1, 0, and repeat.
And knowing the values of all those higher-order derivatives is a lot of information about
cos(x), even though it only involved plugging in a single input, x=0.
That information is leveraged to get an approximation around this input by creating a polynomial
whose higher order derivatives match up with those of cos(x), following this same 1, 0,
-1, 0 cyclic pattern. To do that, make each coefficient of this
polynomial follow this same pattern, but divide each one by the appropriate factorial, like
I mentioned before, so as to cancel out the cascading effects of many power rule applications.
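The recipe translates almost line for line into code. This is a minimal Python sketch with hypothetical helper names. Each coefficient is the cyclic derivative value at 0 divided by n!.

```python
import math

def cos_taylor_coefficients(n_terms):
    """Coefficients c_0 .. c_(n_terms-1) of the Taylor polynomial of cos(x) around x=0."""
    cycle = [1, 0, -1, 0]   # cos, -sin, -cos, sin, each evaluated at 0
    return [cycle[n % 4] / math.factorial(n) for n in range(n_terms)]

def evaluate(coeffs, x):
    """Evaluate sum of c_n * x^n."""
    return sum(c * x**n for n, c in enumerate(coeffs))

coeffs = cos_taylor_coefficients(9)   # terms up to x^8
print(coeffs[:5])                     # [1.0, 0.0, -0.5, 0.0, 0.041666666666666664]
print(evaluate(coeffs, 1.0), math.cos(1.0))
```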
The polynomials you get by stopping this process at any point are called “Taylor polynomials”
for cos(x) around the input x=0. More generally, and hence more abstractly,
if we were dealing with some function other than cosine, you would compute its derivative,
second derivative, and so on, getting as many terms as you’d like, and you’d evaluate
each one at x=0. Then for your polynomial approximation, the
coefficient of each xⁿ term should be the value of the nth derivative of the function
at 0, divided by n!. This rather abstract formula is something
you’ll likely see in any text or course touching on Taylor polynomials.
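Here is the same formula in symbols. P is a label introduced here for the approximating polynomial with terms up to degree N, and f⁽ⁿ⁾ denotes the nth derivative.

```latex
P(x) = \sum_{n=0}^{N} \frac{f^{(n)}(0)}{n!}\,x^n
     = f(0) + f'(0)\,x + \frac{f''(0)}{2!}\,x^2 + \frac{f'''(0)}{3!}\,x^3 + \cdots
```

For an input a other than 0, each xⁿ becomes (x – a)ⁿ and each derivative is evaluated at a instead of 0.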
And when you see it, think to yourself that the constant term ensures that the value of
the polynomial matches that of f(x) at x=0, the next term ensures that the slope of the
polynomial matches that of the function, the next term ensures the rate at which that slope
changes is the same, and so on, depending on how many terms you want.
The more terms you choose, the closer the approximation, but the tradeoff is that your
polynomial is more complicated. And if you want to approximate near some input
a other than 0, you write the polynomial in terms of (x-a) instead, and evaluate all the
derivatives of f at that input a. This is what Taylor series look like in their
fullest generality. Changing the value of a changes where the approximation is hugging
the original function; where its higher order derivatives will be equal to those of the
original function. One of the simplest meaningful examples is
eˣ, around the input x=0. Computing its derivatives is nice, since the derivative of eˣ is itself,
so its second derivative is also eˣ, as is its third, and so on.
So at the point x=0, these are all 1. This means our polynomial approximation looks like
1 + x + ½x² + (1/3!)x³ + (1/4!)x⁴, and so on, depending on how many terms you want.
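Because every derivative of eˣ equals 1 at x=0, the coefficients are simply 1/n!. This is a minimal Python sketch with hypothetical helper names. It builds them and evaluates the degree-4 polynomial at an arbitrary nearby input.

```python
import math

def exp_taylor_coefficients(n_terms):
    """Every derivative of e^x is e^x, which is 1 at x=0, so c_n = 1/n!."""
    return [1 / math.factorial(n) for n in range(n_terms)]

def evaluate(coeffs, x):
    """Evaluate sum of c_n * x^n."""
    return sum(c * x**n for n, c in enumerate(coeffs))

coeffs = exp_taylor_coefficients(5)   # 1, 1, 1/2, 1/6, 1/24
print(coeffs)
print(evaluate(coeffs, 0.2), math.exp(0.2))
```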
These are the Taylor polynomials for eˣ. In the spirit of showing you just how connected
the topics of calculus are, let me turn to a completely different way to understand this
second order term geometrically. It’s related to the fundamental theorem of calculus, which
I talked about in chapters 1 and 8. Like we did in those videos, consider a function, call it f,
that gives the area under some graph between a fixed left point and a variable right point.
What we’re going to do is think about how to approximate this area function, not the
function for the graph like we were doing before. Focusing on that area is what will
make the second order term pop out. Remember, the fundamental theorem of calculus
is that this graph itself represents the derivative of the area function, and as a reminder it’s
because a slight nudge dx to the right bound on the area gives a new bit of area approximately
equal to the height of the graph times dx, in a way that’s increasingly accurate for
smaller choices of dx. So df over dx, the change in area divided
by that nudge dx, approaches the height of the graph as dx approaches 0.
But if you wanted to be more accurate about the change to the area given some change to
x that isn’t meant to approach 0, you would also take into account the sliver of area on top of that rectangle,
which is approximately a triangle. Let’s call the starting input a, and the
nudged input above it x, so that this change is (x-a).
The base of that little triangle is that change (x-a), and its height is the slope of the
graph times (x-a). Since this graph is the derivative of the area function, that slope
is the second derivative of the area function, evaluated at the input a.
So the area of that triangle, ½ base times height, is one half times the second derivative
of the area function, evaluated at a, multiplied by (x-a)².
And this is exactly what you see with Taylor polynomials. If you knew the various derivative
information about the area function at the point a, you would approximate this area at
x to be the area up to a, f(a), plus the area of this rectangle, which is the first derivative
times (x-a), plus the area of this triangle, which is ½ (the second derivative) * (x – a)².
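The approximation being described is f(x) ≈ f(a) + f′(a)(x – a) + ½f″(a)(x – a)². To see the triangle term pay off numerically, here is a minimal Python sketch. It assumes, purely for illustration, that the graph is g(t) = t² starting at t = 0. Then the exact area function is f(x) = x³/3, f′(a) = g(a), and f″(a) = g′(a) = 2a.

```python
def g(t: float) -> float:
    """The graph whose area we track (an arbitrary example): g(t) = t^2."""
    return t**2

def g_slope(t: float) -> float:
    """Slope of the graph, g'(t) = 2t, which is the area function's second derivative."""
    return 2 * t

def area(x: float) -> float:
    """Exact area under g from 0 to x: f(x) = x^3 / 3."""
    return x**3 / 3

a, x = 1.0, 1.1
dx = x - a
rectangle = g(a) * dx                 # f'(a) * (x - a)
triangle = 0.5 * g_slope(a) * dx**2   # (1/2) f''(a) * (x - a)^2

first_order = area(a) + rectangle
second_order = area(a) + rectangle + triangle
print(f"exact={area(x):.6f}  rectangle only={first_order:.6f}  with triangle={second_order:.6f}")
```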
I like this, because even though it looks a bit messy all written out, each term has
a clear meaning you can point to on the diagram. We could stop here, and you’d
have a phenomenally useful tool for approximations with these Taylor polynomials.
But if you’re thinking like a mathematician, one question you might ask is whether it makes
sense to never stop, and add up infinitely many terms.
In math, an infinite sum is called a “series”, so while an approximation with
finitely many terms is called a “Taylor polynomial” for your function, adding all
infinitely many terms gives what’s called a “Taylor series”.
Now you have to be careful with the idea of an infinite series, because it doesn’t actually
make sense to add infinitely many things; you can only hit the plus button on the calculator
so many times. But if you have a series where adding more
and more terms gets you increasingly close to some specific value, you say the series
converges to that value. Or, if you’re comfortable extending the definition of equality to include
this kind of series convergence, you’d say the series as a whole, this infinite sum,
equals the value it converges to. For example, look at the Taylor polynomials
for eˣ, and plug in some input like x=1. As you add more and more polynomial terms,
the total sum gets closer and closer to the value e, so we say that the infinite series
converges to the number e. Or, what’s saying the same thing, that it equals the number
e. In fact, it turns out that if you plug in
any other value of x, like x=2, and look at the value of higher and higher order Taylor
polynomials at this value, they will converge towards eˣ, in this case e².
This is true for any input, no matter how far away from 0 it is, even though these Taylor
polynomials are constructed only from derivative information gathered at the input 0.
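Convergence toward eˣ can be watched directly. This is a minimal Python sketch with a hypothetical helper name. It prints partial sums at x=1, x=2 and a farther input, x=10. The sample inputs and term counts are arbitrary choices.

```python
import math

def exp_partial_sum(x: float, n_terms: int) -> float:
    """Sum of x^n / n! for n = 0 .. n_terms-1."""
    return sum(x**n / math.factorial(n) for n in range(n_terms))

for x in [1.0, 2.0, 10.0]:
    for n_terms in [5, 10, 20, 40]:
        print(f"x={x:4.1f}  terms={n_terms:2d}  partial sum={exp_partial_sum(x, n_terms):.10f}")
    print(f"x={x:4.1f}  e^x = {math.exp(x):.10f}")
```

Inputs farther from 0 need more terms before the partial sums settle, but they do settle.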
In a case like this, we say eˣ equals its Taylor series at all inputs x, which is kind
of a magical thing to have happen. Although this is also true for some other
important functions, like sine and cosine, sometimes these series only converge within
a certain range around the input whose derivative information you’re using.
If you work out the Taylor series for the natural log of x around the input x=1, which
is built from evaluating the higher order derivatives of ln(x) at x=1, you get
(x – 1) – (x – 1)²/2 + (x – 1)³/3 – (x – 1)⁴/4 + …. When you plug in an input between 0 and 2,
adding more and more terms of this series will indeed get you closer and closer to the
natural log of that input. But outside that range, even by just a bit,
the series fails to approach anything. As you add more and more terms the sum bounces
back and forth wildly; it does not approach the natural log of that value, even though
the natural log of x is perfectly well defined for inputs above 2.
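Here is the contrast between an input inside the range and one outside it, as a minimal Python sketch. It uses the series (x – 1) – (x – 1)²/2 + (x – 1)³/3 – … from above. The inputs 1.5 and 2.5 are arbitrary picks on either side of 2.

```python
import math

def ln_taylor_partial_sum(x: float, n_terms: int) -> float:
    """Partial sum of the Taylor series of ln(x) around x=1:
    (x-1) - (x-1)^2/2 + (x-1)^3/3 - ..."""
    u = x - 1
    return sum((-1) ** (n + 1) * u**n / n for n in range(1, n_terms + 1))

for x in [1.5, 2.5]:
    sums = [ln_taylor_partial_sum(x, n) for n in (5, 10, 20, 40)]
    print(f"x={x}: ln(x)={math.log(x):.6f}, partial sums={[round(s, 6) for s in sums]}")
```

At x=1.5 the partial sums close in on ln(1.5). At x=2.5 they swing back and forth with growing size.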
In some sense, the derivative information of ln(x) at x=1 doesn’t propagate out that
far. In a case like this, where adding more terms
of the series doesn’t approach anything, you say the series diverges.
And that maximum distance between the input you’re approximating near and points where
the outputs of these polynomials actually do converge, is called the “radius of convergence”
for the Taylor series. There remains more to learn about Taylor series,
their many use cases, tactics for placing bounds on the error of these approximations,
and tests for understanding when these series do and don’t converge.
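One standard convergence test, the ratio test, is not covered in this video. When the limit exists, it estimates the radius of convergence as the limit of |cₙ/cₙ₊₁|. This minimal Python sketch applies that estimate to the two series above, with hypothetical helper names. It is an illustration, not a proof.

```python
import math

def ratio_estimates(coeff, ns):
    """Estimate the radius of convergence via |c_n / c_(n+1)| at a few values of n."""
    return [abs(coeff(n) / coeff(n + 1)) for n in ns]

def ln_coeff(n):
    """Coefficient of (x-1)^n in the series for ln(x) around x=1 (n >= 1)."""
    return (-1) ** (n + 1) / n

def exp_coeff(n):
    """Coefficient of x^n in the series for e^x around x=0."""
    return 1 / math.factorial(n)

print(ratio_estimates(ln_coeff, [10, 100, 1000]))  # tends to 1: radius 1, so 0 < x < 2
print(ratio_estimates(exp_coeff, [5, 10, 20]))     # grows without bound: converges for every x
```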
For that matter there remains more to learn about calculus as a whole, and the countless
topics not touched by this series. The goal with these videos is to give you
the fundamental intuitions that make you feel confident and efficient learning more on your
own, and potentially even rediscovering more of the topic for yourself.
In the case of Taylor series, the fundamental intuition to keep in mind as you explore more
is that they translate derivative information at a single point to approximation information
around that point. The next series like this will be on probability,
and if you want early access as those videos are made, you know where to go.


    Please turn on your moniker replies.
    Could I ask, what is the name of the mathematical graphic software you used for this?

    Marcus Hendriksen

    If you add as many terms as it takes for the terminal zeroes of your polynomial to occur at -2pi and 2pi, would that not be a perfectly accurate, yet finite, representation of cosine? (sticking with the example in this video). Assuming that it's even possible to have the zeroes line up in that way, of course.

    Larry Wen

    WOW ! The radius of convergence in engineering is a symmetry or an asymmetry. You hit an almost perfect symmetry/asymmetry with the optimal higher power derivatives. It is also the Taylor Polynomial ; such an amazing intuition. Such a hindsight has never been taught (at least to me) by my untalented engineering math tutors. How much engineering talents got wasted in the process ?

    Tushar Kushwaha

    Most lucid explanation even for a noob like me. And I am pretty sure now that the education system I came across was completely flawed.

    Technical Pragyan

    Sir, you are are not of this world.
    You explain everything that it is addition.
    It was pleasing and a convincing explanation.

    lomachenko k

    Anyone please explain 4:40 after the polynomial already has flat tangent at x=0(y axis) by putting c1=0, it drifts away like that, and the slope of its tangent is no longer equal to zero at x=0, this should not happen as we already put c1 as zero, according to how I understand. This is the best video on taylor series.

    Fernando Arnaldo Vilanculo

    I did not understand it. But I will keep this video for a future study because it seems to be perfect explanation seldom to get nowadays! What led me here was a search of a general formula to calculate sin of any angle. And I still did not find it.

    Key Chain

    After entering "taylor s" in the search bar, the first suggestion is "taylor series 3blue1brown"… I think this is a beautiful compliment to you, a great achievement and sth. to be proud of! Thank you so much and keep up the great work!

    This one of the most intellectual beautiful things that I have seen in my career as a student, math is awsome.

    Dhana Bharathi

    What a explain, I was carzzzzzzzy about this 😘😘😘😘😘😘😘😘😘😘😘😘😘😘 edit:- oh my god give me a another like button please 😭😭😭😭😭😭😭😭😭😭😭😭😭😭😭😭. Pls. God I need another like button 😖😖😖😖😘😘😘😘

    Before: I don't really get why Taylor series are the way they are and I'm not expecting that to change but I'll give the video a chance.

    After: Oh, that's where the n! term comes from, now this all makes sense.

    This is not a comment on Taylor series. Rather, it is about Harmonic series. Any comments are welcome.
    After much thought, I must disagree. It has been proved as n -> infinity, the sum of terms that equal 1/2 becomes infinitely large ( 2: 1/4, 4: 1/8, 8: 1/16, . . .), but the actual value becomes increasingly small (2x(1/4)+4x(1/8)+8x(1/16)+16x(1/32) . . .). The harmonic series starts at 1 and converges on 2. This result says absolutely nothing about the harmonic series diverging, as the new series is below the harmonic series and this new series converges to zero. I have always believed the harmonic series elements converge to zero as n -> infinity and still do. It must be remembered that the graph we see gives the actual value of each element, the behavior of the harmonic series starts at 1 and arches up towards 2 at infinity. 2 is its limiting number.

    Are you a computer scientist or software engineer. I am but I found it quite interesting that you used the term 'use case' to refer of the capabilities of the Taylor series. Perhaps that term is associated I'm maths too?

    Alex Mercer

    I'm in 12th standard in India. I just learnt about Maxima-Minima and problems related to that. I also learnt integrals and things. Including area under the curve.
    I learnt simple approximation method of a function: (dy/dx)∆x=∆y
    Now watching this is really interesting to me. I'm about to learn about differential equations in few days. (I solved few questions myself.)
    I just wanted to say, that your method to teaching is really cool, and that clear animations make it even better to understand. I could understand (about) everything.
    I'm your new subscriber.

    The taylor series kinda reminds me of the fourier series. Is there some sort of corrolation or am I just completely misunderstanding it?

    Keerthan Kumar

    If this had a 2nd like button and a 3rd, I would still hit it. This was so inspiring and understandable than any other method of teaching

    Deepjyoti Saha

    How beautiful these videos are!!!! 3b1b u could not have done anything better than this for math lovers. Please keep making such videos. Thank you.

    Karamjeet Pal

    Is that the mathematicians perspective towards a problem or concept , it is just amazing! Thank you so much to expand our tiny intelect to beyond even our imaginations …🙏🙏🙏

    Nik's Workshop

    So if would know that there is a number e, but not know its value, I could use the taylor series for e^x (Because the only thing I know is that its derivate is itself) to approximate its value? Thats cool

    Zaid Gharaybeh

    Weird that taylor series converges for e to the x but not for ln x, even though they are basically the exact same curves but swapped from y to x axes.

    Cooper Wharton

    At 17:00, wow that is just so amazing. I feel like wow. Nothing has helped me more than these videos. All i can say is thank you

    Engineers think that numbers approximate reality.
    Physicists think reality approximates numbers
    Mathematicians don't see a connection.

    Anwarul Bashir Shuaib

    The simplicity you use to convey complex concepts to us is something that every teacher in every schools need!

    Anshuman Singh

    Never learn anything that much deeper in just 20 minutes..thanks to 3blue1brown..your works are incredible

    Anjesh Kafle

    This feels like music. Like god is speaking about the rules that govern the universe. I love mathematics. I love you, 3blue1brown. You make mathematics intuitive like god always wanted it to be.

    Human Being

    At 13:22
    Those of you are thinking (like me) that, why are we expressing the polynomial in x-a instead of x? that is Because it would make it easier to evaluate as it when we put x=a at any point, the higher orders becomes 0 and thus making it like the earlier way in which we were approximating on 0 itself
    (btw I had written this so I could look it up if I get confused later) 😁

    Isaac Johnson

    This is fantastic. As a math and physics tutor I try to derive equations myself so I can better explain them. I would have never in a lifetime figured this out, I just told people to memorize it and that the Taylor series is useful. Now I'm exited for the next student I get who is coving this in calc ii. Thank you so much for this video!!! I will make sure to point people this way.

    Flavio Mancebo

    Man, this is a really good video. I wished I had this when I was back in my undergrad engineering classes. Just as education is generally powerful, making complex concepts within reach and pleasurable for a broader population has immeasurable consequences. Nicely done.

    opened this up when one of these showed up in my qm book, and this clarifies exactly what I needed to know :- ) so glad I ended up here.

    Jonas Manuel

    Thank you so much!! I had my first day of studying physics for bachelor today and the first real lecture was calculus 1. After saying hello the prof derived the taylor series without saying what it represents and why we need it using notation we have never seen in high school. Needless to say that no one understood a single thing. This video is brilliant!!!
