Course: AP®︎/College Statistics > Unit 8

Lesson 2: Mean and standard deviation of random variables

Variance and standard deviation of a discrete random variable

Name: Variance and standard deviation of a discrete random variable
Uploaded: 2017-07-14T17:56:21Z
Description: We learn how to calculate the mean and standard deviation of a discrete random variable. The concept of a random variable is explained, along with methods to calculate its expected value (mean) and measure its spread (variance and standard deviation). A practical example makes the concept easier to understand.

Want to join the conversation?

ju lee
Posted 6 years ago.
Why didn't we divide by n or (n - 1)?
(18 votes)
Answer
- Gian
  Posted 6 years ago.
  Dividing by n or (n-1) applies to the standard deviation of a dataset. Dividing by n gives the standard deviation of a whole population of n values; dividing by (n-1) is used for a sample, in order to correct the bias of the variance estimate. Sal explained the reason why this is done in this video: https://www.khanacademy.org/math/statistics-probability/summarizing-quantitative-data/modal/v/another-simulation-giving-evidence-that-n-1-gives-us-an-unbiased-estimate-of-variance

  For a random variable, the probabilities already play the role of the 1/n weights: the variance is the probability-weighted average of the squared deviations from the mean, and the standard deviation is its square root. So, just as the relative frequency (probability) is used to calculate the expected value, it is also used to calculate the standard deviation.
  (14 votes)
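To see how the two formulas connect, here is a short worked equation. It assumes a random variable with n equally likely values x_1, ..., x_n (an assumption made only for illustration, not the example from the video), so that each P(x_i) = 1/n:

```latex
\begin{aligned}
\sigma^2 &= \sum_{i=1}^{n} (x_i - \mu)^2 \, P(x_i) \\
         &= \sum_{i=1}^{n} (x_i - \mu)^2 \cdot \frac{1}{n} \\
         &= \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2
\end{aligned}
```

In that special case the probability-weighted formula reduces to the familiar divide-by-n formula. When the probabilities differ, the weights P(x_i) simply take the place of 1/n.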
kisa
Posted 5 years ago.
I thought that the mean was the sum divided by the number of values. How come the mean of a discrete random variable is just the sum of x·p(x), instead of the sum of x·p(x) divided by the number of values of x? I hope that makes sense.
(6 votes)
Answer
- cossine
  Posted 5 years ago.
  The expected value is Σ x·p(x) by definition. Suppose there are three numbers, say 1, 5, and 10, and all three are equally likely to occur:

  then the expected value is (1+5+10)/3 = 16/3 = 5.33...

  If the probabilities of the values are different, then you have to use x·p(x). Say 1 now occurs with probability 0.5, 5 with probability 0.3, and 10 with probability 0.2. Then the expected value is 0.5(1)+0.3(5)+0.2(10) = 0.5+1.5+2.0 = 4.0.

  Note that mean and expected value are the same thing. The expected value simply extends the concept of the mean to probability mass functions where the outcomes do not all have the same chance of occurring. In the equally likely case each p(x) is 1/3, so the division by 3 is already built into the weights.
  (16 votes)
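A minimal Python sketch of the two cases in this answer, using the same numbers (1, 5, 10). The function name `expected_value` and the dictionary layout are illustrative choices, not something from the video. Working the unequal case through by hand gives 0.5 + 1.5 + 2.0 = 4.0, which the sketch reproduces.

```python
def expected_value(distribution):
    """Probability-weighted sum of outcomes: the sum of x * p(x)."""
    total_probability = sum(distribution.values())
    # A valid probability distribution must sum to 1 (allowing float rounding).
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in distribution.items())


# Equally likely outcomes: each weight is 1/3, so this matches (1 + 5 + 10) / 3.
equal = {1: 1 / 3, 5: 1 / 3, 10: 1 / 3}
# Unequal probabilities: 1 with 0.5, 5 with 0.3, 10 with 0.2.
unequal = {1: 0.5, 5: 0.3, 10: 0.2}

print(round(expected_value(equal), 2))    # 5.33
print(round(expected_value(unequal), 2))  # 4.0
```

No extra division by the number of values is needed, because the probabilities already sum to 1.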
katieleonard032
Posted 5 years ago.
What is the difference between variance and standard deviation? Why is standard deviation the square root of the variance? Thanks!
(6 votes)
Answer
- John Smith
  Posted 4 years ago.
  The variance measures dispersion, but it is expressed in squared units, so it is hard to interpret directly (for instance, how would you interpret a variance of 1.19 for one random variable in comparison with a variance of 2.34 for another r.v.?).

  Taking the square root gives the standard deviation, which is in the same units as the variable and its mean, so it can be read as a typical distance from the mean. If the distribution is approximately normal, you can go further: if your std is 1.09 and your mean is 2.1, about 68% of your values are expected to be between 2.1-1.09 and 2.1+1.09 (mean ± 1 std), for instance.

  Basically (and quite naively), std is a way to put the value given by the variance back on the original scale.
  (11 votes)
Seungho Choi
Posted 7 years ago.
At 6:22, how is the standard deviation intuitive? Given the way it is calculated, I think the standard deviation is similar to the mean of the deviations. But how can it be intuitively reasonable?
(10 votes)
Answer
- Daudi Majura
  Posted 6 years ago.
  https://www.khanacademy.org/math/statistics-probability/summarizing-quantitative-data/variance-standard-deviation-population/v/population-standard-deviation
  (0 votes)
alexiawpy
Posted 5 years ago.
Just a quick question: why do we have to multiply by P(X) again when calculating the standard deviation? This was done already when calculating E(X), so the mean 2.1 should be weighted already, no?
(5 votes)
Answer
- Dackid19
  Posted 5 years ago.
  I think a good way to understand 'weight' is with the concept of frequency. With the random variable X, 0 occurs 10% of the time. So, if there were 100 data points/experiments, we would expect about 10 of them to be zero. Of course, we do not have the actual data points, but we do know how frequently 0 is expected to occur.

  The weights in E(X) only account for how often each value contributes to the mean. The squared deviations (x - 2.1)^2 are new quantities, and each of them also has to be counted as often as its value occurs. Long story short, we cannot ignore how often a data point shows up, otherwise we are ignoring a big portion of the data.

  Forgive me if that was confusing. If you need clarification, let me know.
  (2 votes)
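The frequency idea can be sketched in Python. The tally below is hypothetical: it imagines 100 weeks whose counts match the video's probabilities exactly, then compares ordinary divide-by-n statistics on those 100 data points with the probability-weighted formulas.

```python
import statistics

# Hypothetical tally of 100 weeks that matches the video's probabilities exactly
# (10% zeros, 15% ones, 40% twos, 25% threes, 10% fours).
counts = {0: 10, 1: 15, 2: 40, 3: 25, 4: 10}

# Expand the tally into a flat list of 100 data points.
data = [x for x, count in counts.items() for _ in range(count)]

# Ordinary divide-by-n (population) statistics on the expanded data.
data_mean = statistics.mean(data)
data_variance = statistics.pvariance(data)

# Probability-weighted version: each relative frequency count / n acts as P(x).
n = len(data)
weighted_mean = sum(x * count / n for x, count in counts.items())
weighted_variance = sum(
    (x - weighted_mean) ** 2 * count / n for x, count in counts.items()
)

print(round(data_mean, 2), round(data_variance, 2))
print(round(weighted_mean, 2), round(weighted_variance, 2))
```

Both lines should show a mean of 2.1 and a variance of 1.19. Multiplying each squared deviation by P(x) is the same as counting it once for every data point that has that value and then dividing by n.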
JJ
Posted 7 months ago.
So basically, the formula for finding the variance of a discrete random variable is as follows.

X = random variable
P(X) = probability of each value of the random variable
Σ = sum
σ^2 = variance
µ = mean

σ^2 = Σ (X - µ)^2 ⋅ P(X)

The variance is the sum, over every value of X, of the squared difference between that value and the mean (µ), multiplied by that value's probability P(X).

To find the standard deviation (σ), we simply take the square root of both sides (usually after you have found the variance):

σ = √(σ^2) = √(Σ (X - µ)^2 ⋅ P(X))

Feel free to correct me if I have made any mistake here, since I'm just another learner as well and can't be 100% right.
(4 votes)
Samer Saber
Posted 6 years ago.
What I love about Sal is that he explains the concept behind every equation or method so we don't have to just memorize it.
But he didn't do so this time with the variance equation. Can anyone explain it?
(3 votes)
kailey.muroaguirre
Posted a year ago.
Can we apply z-scores to discrete random variables?
(2 votes)
Martin Schnuriger
Posted 6 years ago.
In general, when calculating the variance and standard deviation, we divide by N or n-1, respectively. Why does this not apply in the case of discrete random variables?
(2 votes)
N N
Posted 2 years ago.
Can we apply the concept of z-score to discrete random variables?
For example, if X=4, the z-score is (4-2.1)/1.09 ≈ 1.74, dividing by the standard deviation 1.09 rather than the variance 1.19.
So X=4 is about 1.74 standard deviations away from the expected value.
Is my understanding correct?
(1 vote)
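A z-score divides the distance from the mean by the standard deviation (about 1.09 here), not by the variance (1.19). A minimal sketch under that definition, using the mean and variance from the video; the helper name `z_score` is hypothetical:

```python
import math


def z_score(x, mean, std):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std


mean = 2.1                 # expected value from the video
variance = 1.19            # variance from the video
std = math.sqrt(variance)  # about 1.09

z = z_score(4, mean, std)
print(f"{z:.2f}")
```

This gives roughly 1.74, so X=4 sits about 1.74 standard deviations above the expected value.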

Video transcript

- [Instructor] In a previous video, we defined this random variable x. It's a discrete random variable. It can only take on a finite number of values, and I defined it as the number of workouts I might do in a week. And we calculated the expected value of our random variable x, which we could also denote as the mean of x, and we use the Greek letter mu, which we use for population mean. And all we did is, it's the probability-weighted sum of the various outcomes. And we got for this random variable with this probability distribution, we got an expected value or a mean of 2.1. What we're gonna do now is extend this idea to measuring spread. And so we're going to think about what is the variance of this random variable, and then we could take the square root of that to find what is the standard deviation. The way we are going to do this has parallels with the way that we've calculated variance in the past. So the variance of our random variable x, what we're going to do is take the difference between each outcome and the mean, square that difference, and then we're gonna multiply it by the probability of that outcome. So for example for this first data point, you're going to have zero minus 2.1 squared times the probability of getting zero, times 0.1. Then you're going to get plus one minus 2.1 squared times the probability that you get one, times 0.15. Then you're going to get plus two minus 2.1 squared times the probability that you get a two, times 0.4. Then you have plus three minus 2.1 squared times 0.25. And then last but not least you have plus four minus 2.1 squared times 0.1. So once again, the difference between each outcome and the mean, we square it and we multiply times the probability of that outcome. So this is going to be negative 2.1 squared, which is just 2.1 squared, so I'll just write this as 2.1 squared, times .1. That's the first term. 
And then we're going to have plus, one minus 2.1 is negative 1.1, and then we're going to square that, so that's just going to be the same thing as 1.1 squared, which is 1.21 but I'll just write it out, 1.1 squared times .15. And then this is going to be, two minus 2.1 is negative .1. When you square it, it is going to be equal to .01, so plus .01. If you have negative .1 times negative .1, it's .01, times .4. And then plus, this is going to be 0.9 squared, so that is .81, times .25. And then we're almost there. This is going to be plus 1.9 squared times .1. And we get 1.19. So this is all going to be equal to 1.19. And if we wanna get the standard deviation for this random variable, we would denote that with the Greek letter sigma. The standard deviation for the random variable x is going to be equal to the square root of the variance. Square root of 1.19, which is equal to, just get the calculator back here, so we are just going to take the square root of, let's type it again, 1.19. And that gives us approximately 1.09. So let's see if this makes sense. Let me put this all on a number line right over here. So you have the outcomes zero, one, two, three, and four. So you have a 10% chance of getting a zero. So I will draw that like this, let's just say this is a height of 10%. You have a 15% chance of getting one, so that would be 1 1/2 times higher. So it would look something like this. You have a 40% chance of getting a two. That's going to be like this. You have a 25% chance of getting a three. Like this. And then you have a 10% chance of getting a four. So like that. So this is a visualization of this discrete probability distribution where I didn't draw the vertical axis here, but this would be .1, this would be .15, this would be .25, and that is .4. And then we see that the mean is at 2.1, which makes sense. 
Even though this random variable only takes on integer values, you can have a mean that takes on a non-integer value. And then the standard deviation is 1.09. So 1.09 above the mean is going to get us close to 3.2, and 1.09 below the mean is gonna get us close to one. And so this all at least intuitively feels reasonable. This mean does seem to be indicative of the central tendency of this distribution. And the standard deviation does seem to be a decent measure of the spread.
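The calculation walked through in the transcript can be reproduced with a short Python sketch. The probabilities are the ones stated in the video; the function names `mean_of` and `variance_of` and the dictionary layout are illustrative choices, not part of the lesson.

```python
import math


def mean_of(pmf):
    """Expected value: the probability-weighted sum of the outcomes."""
    return sum(x * p for x, p in pmf.items())


def variance_of(pmf):
    """Probability-weighted sum of squared deviations from the mean."""
    mu = mean_of(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())


# Number of workouts in a week -> probability, as given in the video.
workouts = {0: 0.10, 1: 0.15, 2: 0.40, 3: 0.25, 4: 0.10}

mu = mean_of(workouts)
var = variance_of(workouts)
sigma = math.sqrt(var)

# Show each term of the variance sum, in the same order as the transcript.
for x, p in workouts.items():
    print(f"({x} - {mu:.1f})^2 * {p:.2f} = {(x - mu) ** 2 * p:.4f}")

print(f"mean = {mu:.2f}, variance = {var:.2f}, std = {sigma:.2f}")
```

The last line should report a mean of 2.10, a variance of 1.19, and a standard deviation of about 1.09, matching the values in the video.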