Q: I don't understand this sentence: The famous triangle inequality tells us that this will be maximized by the unit vector in the direction (nabla) f (x0, y0) To me, this doesn't seem obvious at all, can I find explanation perhaps somewhere else on the khan academy? I have looked at triangle inequality `||x + y|| <= ||x||+||y||` , but I don't understand how the two things are related, other than that both somehow talk about vectors. Directional derivative however works with dot products and not adding vectors together, as in the triangle inequality, so I don't immediately see the connection between the two.

I'm in the same boat and don't see how the triangle inequality can be applied, however using the slightly different Cauchy-Schwarz inequality works. The Cauchy-Scwarz inequality states that ```x·y <= ||x|| * ||y||``` for any two vectors x and y. Cauchy-Schwarz also says that the inequality can be turned to an equality ```x · y = ||x|| * ||y||``` if x and y are parallel.

Q: Hi, i've a question on the following point "Computing the directional derivative involves a dot product between the gradient and the vector v". when i look at the definition of dot product, it says "|a|.|b|.cosine theeta". but in this definition no cosine theeta is involved. is that mean, the gradient (which has the partial derivatives of x,y) not considered as a vector here? or is it? if yes,why cosine theeta is not involved here?

The gradient is vector here, see http://mathworld.wolfram.com/DotProduct.html You can figure out why `cosine` is omitted.

Q: Isn't it good to color "h" in black in the figure just below the subtitle of "Breaking down the nudge" for the consistency? Just a suggestion.

If you meant that we should bold h like 𝐡, then this is my answer: No, because h is a scalar value, and we're taking the limit as h approaches the (scalar value) zero. This corresponds to the nudge in the input, which is the vector h𝐯, approaching the vector value 𝟎. h𝐯 is equal to the input nudge direction 𝐯 scaled (multiplied) by the step size h.

Q: step 4. how does adding the change in the x direction and y direction give us the total change in the function? we are adding "perpendicular numbers", not vectors, so the adding should look more like pythagoras, shouldn't it?

I think The explanation in this article is not precise, as the reason for we to be able to ignore hv0 to add the 2 nadge is: The function must be locally be essentially linear, i.e., there must be a linear approximation L(x)=f(a)+Df(a)(x−a) That ignorance and directly add make sense for L(x). Detailed info, you can refer to : http://mathinsight.org/directional_derivative_gradient_derivation

Q: I think this part may be wrong (or at least I don't fully understand it): "However, this differs slightly from the result of our step-by-step argument, in which the partial derivative with respect to y is taken at the point (x_0 + hv_1, y_0) not at the point (x_0, y_0)" I would think we could treat the change in y just like we treated the change in x -- they are in essence happening at the same time -- and how would you chose the order anyway?

The thing is that lim ℎ→0 𝑥₀ + ℎ𝑣₁ = 𝑥₀ With 𝒗 = (𝑣₁, 𝑣₂) 𝛻𝒗 𝑓(𝑥₀, 𝑦₀) = lim ℎ→0 [𝑓(𝑥₀ + ℎ𝑣₁, 𝑦₀ + ℎ𝑣₂) − 𝑓(𝑥₀, 𝑦₀)]∕ℎ = lim ℎ→0 [𝑓(𝑥₀ + ℎ𝑣₁, 𝑦₀ + ℎ𝑣₂) − 𝑓(𝑥₀ + ℎ𝑣₁, 𝑦₀) + 𝑓(𝑥₀ + ℎ𝑣₁, 𝑦₀) − 𝑓(𝑥₀, 𝑦₀)]∕ℎ = lim ℎ→0 [𝑓(𝑥₀ + ℎ𝑣₁, 𝑦₀ + ℎ𝑣₂) − 𝑓(𝑥₀ + ℎ𝑣₁, 𝑦₀)]∕ℎ + lim ℎ→0 [𝑓(𝑥₀ + ℎ𝑣₁, 𝑦₀) − 𝑓(𝑥₀, 𝑦₀)]∕ℎ = lim ℎ→0 (𝜕∕𝜕𝑦[𝑓(𝑥₀ + ℎ𝑣₁, 𝑦₀)]) + 𝜕∕𝜕𝑥[𝑓(𝑥₀, 𝑦₀)] = 𝜕∕𝜕𝑦[𝑓(𝑥₀, 𝑦₀)]) + 𝜕∕𝜕𝑥[𝑓(𝑥₀, 𝑦₀)]

Question 1

I'm having trouble understanding the 3rd step under the formal argument. If we move hv1 in the x direction, how does this imply that the output will be hv1*fx(x0,y0)? (Sorry for the notation - I'm on my phone).

Accepted Answer

That is the definition of the derivative. Remember: 
`fₓ(x₀,y₀) = lim_Δx→0 [(f(x₀+Δx,y₀)-f(x₀,y₀))/Δx]`
Then, we can replace Δx with hv₁ because both Δx and h are very small, so we get: 
`fₓ(x₀,y₀) = (f(x₀+hv₁,y₀)-f(x₀,y₀))/hv₁`
We can then rearrange this equation to get: 
`f(x₀+hv₁,y₀) = hv₁ × fₓ(x₀,y₀) + f(x₀,y₀)`

Question 2

Can anyone help me understand how to solve the puzzle at the end of the article? I am having trouble understanding it.

Accepted Answer

I had trouble with this puzzle too, but then I thought about it in terms of vectors. We need to maximize 100A + 20B + 2C, right? By definition of the dot product, this expression is equal to the dot product of two vectors [100, 20, 2] * [A, B, C]. So we want to maximize the dot product. When does the dot product have the maximum value? It is maximum when two vectors are parallel, or, in other words, one vector is multiple of the other (this can be understood from the graphical interpretation of the dot product). Therefore, our vector [A, B, C] should be [100x, 20x, 2x], where x is some number.

The second insight is to express A^2 + B^2 + C^2 = 10404 equation in vector notation. Expression A^2 + B^2 + C^2 is equal to the dot product of vector [A,B,C] with itself:

[A,B,C]*[A,B,C] = 10404.

Here instead of [A,B,C] we substitute  [100x, 20x, 2x] vector and solve for x:

[100x, 20x, 2x]* [100x, 20x, 2x] = 10000x^2 + 400x^2 + 4x^2 = 10404x^4 = 10404.
x = 1.

Therefore, our vector [A,B,C] is  [100 * 1, 20 * 1, 2 * 1] =  [100, 20, 2].

So, we have our answer:
A = 100
B = 20
C = 2.

I hope it wasn't confusing. :)

Question 3

For a given gradient vector, I can understand that any unit vector in the direction of gradient vector will give maximum value of dot product between itself and gradient vector. 
But how does it proves the gradient vector is itself the direction of maximum ascent.

Accepted Answer

I don’t think that quite matters. If a unit vector in the direction of the gradient vector is the direction of greatest ascent, then moving in the direction of the gradient vector is also in the direction of the greatest ascent. One is just a multiple of another—they still pointin the same direction.

Question 4

I don't understand this sentence: The famous triangle inequality tells us that this will be maximized by the unit vector in the direction (nabla) f (x0, y0)

To me, this doesn't seem obvious at all, can I find explanation perhaps somewhere else on the khan academy? I have looked at triangle inequality `||x + y|| <= ||x||+||y||` , but I don't understand how the two things are related, other than that both somehow talk about vectors. Directional derivative however works with dot products and not adding vectors together, as in the triangle inequality, so I don't immediately see the connection between the two.

Accepted Answer

I'm in the same boat and don't see how the triangle inequality can be applied, however using the slightly different Cauchy-Schwarz inequality works. The Cauchy-Scwarz inequality states that ```x·y <= ||x|| * ||y||``` for any two vectors x and y. Cauchy-Schwarz also says that the inequality can be turned to an equality ```x · y =  ||x|| * ||y||``` if x and y are parallel.

Question 5

Hi, i've a question on the following point "Computing the directional derivative involves a dot product between the gradient and the vector v". when i look at the definition of dot product, it says "|a|.|b|.cosine theeta".
but in this definition no cosine theeta is involved. is that mean, the gradient (which has the partial derivatives of x,y) not considered as a vector here? or is it? if yes,why cosine theeta is not involved here?

Accepted Answer

The gradient is vector here, see http://mathworld.wolfram.com/DotProduct.html 
You can figure out why `cosine` is omitted.

Question 6

Isn't it good to color "h" in black in the figure just below the subtitle of "Breaking down the nudge" for the consistency? Just a suggestion.

Accepted Answer

If you meant that we should bold h like 𝐡, then this is my answer: No, because h is a scalar value, and we're taking the limit as h approaches the (scalar value) zero. This corresponds to the nudge in the input, which is the vector h𝐯, approaching the vector value 𝟎. h𝐯 is equal to the input nudge direction 𝐯 scaled (multiplied) by the step size h.

Question 7

step 4. how does adding the change in the x direction and y direction give us the total change in the function? we are adding "perpendicular numbers", not vectors, so the adding should look more like pythagoras, shouldn't it?

Accepted Answer

I think The explanation in this article is not precise, as the reason for we to be able to ignore hv0 to add the 2 nadge is: 
The function must be locally be essentially linear, i.e., there must be a linear approximation
L(x)=f(a)+Df(a)(x−a)        
That ignorance and directly add make sense for L(x).
Detailed info, you can refer to :
http://mathinsight.org/directional_derivative_gradient_derivation

Question 8

I think this part may be wrong (or at least I don't fully understand it):

"However, this differs slightly from the result of our step-by-step argument, in which the partial derivative with respect to y is taken at the point (x_0 + hv_1, y_0) not at the point (x_0, y_0)"

I would think we could treat the change in y just like we treated the change in x -- they are in essence happening at the same time -- and how would you chose the order anyway?

Accepted Answer

The thing is that lim ℎ→0 𝑥₀ + ℎ𝑣₁ = 𝑥₀

With 𝒗 = (𝑣₁, 𝑣₂)
𝛻𝒗 𝑓(𝑥₀, 𝑦₀) = lim ℎ→0 [𝑓(𝑥₀ + ℎ𝑣₁, 𝑦₀ + ℎ𝑣₂) − 𝑓(𝑥₀, 𝑦₀)]∕ℎ

= lim ℎ→0 [𝑓(𝑥₀ + ℎ𝑣₁, 𝑦₀ + ℎ𝑣₂) − 𝑓(𝑥₀ + ℎ𝑣₁, 𝑦₀) + 𝑓(𝑥₀ + ℎ𝑣₁, 𝑦₀) − 𝑓(𝑥₀, 𝑦₀)]∕ℎ

= lim ℎ→0 [𝑓(𝑥₀ + ℎ𝑣₁, 𝑦₀ + ℎ𝑣₂) − 𝑓(𝑥₀ + ℎ𝑣₁, 𝑦₀)]∕ℎ
+ lim ℎ→0 [𝑓(𝑥₀ + ℎ𝑣₁, 𝑦₀) − 𝑓(𝑥₀, 𝑦₀)]∕ℎ

= lim ℎ→0 (𝜕∕𝜕𝑦[𝑓(𝑥₀ + ℎ𝑣₁, 𝑦₀)]) + 𝜕∕𝜕𝑥[𝑓(𝑥₀, 𝑦₀)]

= 𝜕∕𝜕𝑦[𝑓(𝑥₀, 𝑦₀)]) + 𝜕∕𝜕𝑥[𝑓(𝑥₀, 𝑦₀)]

Symbol	Informal understanding	Formal understanding
$\partial x$ ‍	A tiny nudge in the $x$ ‍ direction.	A limiting variable $h$ ‍ which goes to $0$ ‍, and will be added to the first component of the function's input.
$\partial f$ ‍	The resulting change in the output of $f$ ‍ after the nudge.	The difference between $f (x_{0} + h, y_{0})$ ‍ and $f (x_{0}, y_{0})$ ‍, taken in the same limit as $h \to 0$ ‍.

Course: Multivariable calculus > Unit 2

Directional derivatives (going deeper)

Background:

Formal definition of the directional derivative

Seeking connection between the definition and computation

Breaking down the nudge

Why does the gradient point in the direction of steepest ascent?

Want to join the conversation?