This guide outlines three methods used to summarise the variability in a dataset.
It will help you identify which measure is most appropriate to use for a particular set of data.
Examples are also given of how these measures are used and of how the standard deviation can be calculated using Excel.
Introduction
Measures of average such as the median and mean represent the typical value for a dataset. Within the dataset the actual values usually differ from one another and from the average value itself.
The extent to which the median and mean are good representatives of the values in the original dataset depends upon the variability or dispersion in the original data.
Datasets are said to have high dispersion when they contain values considerably higher and lower than the mean value.
Figure 1 presents the number of tutorial groups of each size in semester 1 and semester 2. In both semesters the mean and median tutorial group size is 5 students; however, the groups in semester 2 show more dispersion (greater variability in size) than those in semester 1.
Dispersion within a dataset can be measured or described in several ways including the range, interquartile range and standard deviation.
The Range
The range is the most obvious measure of dispersion and is the difference between the lowest and highest values in a dataset.
In figure 1, the size of the largest semester 1 tutorial group is 6 students and the size of the smallest group is 4 students, resulting in a range of 2 (6 − 4).
In semester 2, the largest tutorial group size is 7 students and the smallest tutorial group contains 3 students, therefore the range is 4 (7 − 3).
 The range is simple to compute and is useful when you wish to evaluate the whole of a dataset.
 The range is useful for showing the spread within a dataset and for comparing the spread between similar datasets.
An example of the use of the range to compare spread within datasets is provided in table 1. The scores of individual students in the examination and coursework component of a module are shown.
To find the range in marks the highest and lowest values need to be found from the table. The highest coursework mark was 48 and the lowest was 27 giving a range of 21. In the examination, the highest mark was 45 and the lowest 12 producing a range of 33. This indicates that there was wider variation in the students’ performance in the examination than in the coursework for this module.
Since the range is based solely on the two most extreme values within the dataset, if one of these is either exceptionally high or low (sometimes referred to as an outlier), the resulting range will not be typical of the variability within the dataset.
For example, imagine that in the example above one student failed to hand in any coursework and was awarded a mark of zero, but sat the exam and scored 40. The range for the coursework marks would now become 48 (48 − 0) rather than 21. However, this new range is not typical of the dataset as a whole; it is distorted by the outlier in the coursework marks.
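The range calculation and the effect of a single outlier can be sketched in a few lines of Python. The marks list is hypothetical. Only its extremes (27 and 48) come from the coursework example above, and the values in between are invented for illustration. The function name `value_range` is also just an illustrative choice.

```python
def value_range(values):
    """Return the range: the highest value minus the lowest value."""
    if not values:
        raise ValueError("the range is undefined for an empty dataset")
    return max(values) - min(values)

# Hypothetical coursework marks: only the extremes 27 and 48 match the text.
coursework = [27, 35, 41, 48, 33, 39]
print(value_range(coursework))    # 48 - 27 = 21

# One student who handed nothing in is awarded zero (an outlier).
with_outlier = coursework + [0]
print(value_range(with_outlier))  # 48 - 0 = 48
```

Because only `max` and `min` are involved, changing a single extreme value is enough to more than double the range.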
In order to reduce the problems caused by outliers in a dataset, the interquartile range is often calculated instead of the range.
The Interquartile Range
The interquartile range is a measure that indicates the extent to which the central 50% of values within the dataset are dispersed. It is based upon, and related to, the median.
In the same way that the median divides a dataset into two halves, the dataset can be further divided into quarters by identifying the upper and lower quartiles.
The lower quartile is found one quarter of the way along a dataset when the values have been arranged in order of magnitude; the upper quartile is found three quarters along the dataset.
Therefore, in terms of position in the ordered data, the upper quartile lies halfway between the median and the highest value, whilst the lower quartile lies halfway between the median and the lowest value. The interquartile range is found by subtracting the lower quartile from the upper quartile.
For example, the examination marks for 20 students following a particular module are arranged in order of magnitude.
 The median lies at the midpoint between the two central values (10th and 11th)
 = halfway between 60 and 62 = 61
 The lower quartile lies at the midpoint between the 5th and 6th values
 = halfway between 52 and 53 = 52.5
 The upper quartile lies at the midpoint between the 15th and 16th values
 = halfway between 70 and 71 = 70.5
The interquartile range for this dataset is therefore 70.5 – 52.5 = 18 whereas the range is: 80 – 43 = 37.
The interquartile range therefore gives a clearer picture of the spread of the bulk of the data, because it ignores the outlying values at either end.
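The 20 examination marks behind the example above are not reproduced here. The sketch below therefore applies the same "median of each half" approach to the nine observations 8, 25, 7, 5, 8, 3, 10, 12, 9, which are used later in this guide. For an even count it matches the positions described above: with 20 values, Q1 falls between the 5th and 6th values and Q3 between the 15th and 16th. For an odd count it assumes that the overall median is left out of both halves. Other conventions, such as spreadsheet quartile functions, can give slightly different quartiles.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the 'median of each half' approach.

    Assumption: for an odd number of values, the overall median is excluded
    from both halves.
    """
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = data[:half]   # values below the median position
    upper = data[-half:]  # values above the median position
    return median(lower), median(data), median(upper)

def interquartile_range(values):
    q1, _, q3 = quartiles(values)
    return q3 - q1

obs = [8, 25, 7, 5, 8, 3, 10, 12, 9]
print(quartiles(obs))            # (6.0, 8, 11.0)
print(interquartile_range(obs))  # 5.0
print(max(obs) - min(obs))       # 22, inflated by the outlier 25
```

In this dataset, the single large value 25 pushes the range up to 22. The interquartile range of 5 is unaffected by it.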
Measures of Variance
Range
The range is the difference between the high and low values. Since it uses only the extreme values, it is greatly affected by extreme values.
Procedure for finding
 Take the largest value and subtract the smallest value
Formula
$$\text{range} = \text{highest value} - \text{lowest value}$$
Variance
The variance is the average squared deviation from the mean. Its usefulness is limited because its units are squared and so differ from those of the original data. The sample variance is denoted by s², and it is an unbiased estimator of the population variance.
Procedure for finding
 Find the mean of the data
 Subtract the mean from each value to find the deviation from the mean
 Square the deviation from the mean
 Total the squares of the deviation from the mean
 Divide by the degrees of freedom (one less than the sample size)
Formula
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
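The five steps above translate almost line for line into Python. The function name `sample_variance` is illustrative. The data are the nine observations used later in this guide, and the standard-library `statistics.variance` is printed as a cross-check.

```python
import statistics

def sample_variance(values):
    """Sample variance s^2, computed step by step as in the procedure above."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    mean = sum(values) / n                  # step 1: mean of the data
    deviations = [x - mean for x in values] # step 2: deviation from the mean
    squares = [d * d for d in deviations]   # step 3: square each deviation
    total = sum(squares)                    # step 4: total the squares
    return total / (n - 1)                  # step 5: divide by degrees of freedom

data = [8, 25, 7, 5, 8, 3, 10, 12, 9]
s2 = sample_variance(data)
print(s2)                         # approximately 40
print(statistics.variance(data))  # standard-library equivalent
```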
Standard Deviation
The standard deviation describes the typical distance of the values from the mean. Strictly, it is not the average deviation; that is the mean absolute deviation described below. It is found by taking the square root of the variance, which solves the problem of the variance not having the same units as the original data. The sample standard deviation is denoted by s. It is not an unbiased estimator of the population standard deviation.
Procedure for finding
 Find the variance
 Take the square root
Formula
$$s = \sqrt{s^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
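Here is a minimal Python sketch that takes the square root of the sample variance (n − 1 in the denominator). It is compared with `statistics.stdev` from the standard library. The function name `sample_std_dev` is illustrative.

```python
import math
import statistics

def sample_std_dev(values):
    """Sample standard deviation s: the square root of the sample variance."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / (n - 1)  # s^2
    return math.sqrt(variance)                                 # s

data = [8, 25, 7, 5, 8, 3, 10, 12, 9]
print(round(sample_std_dev(data), 2))    # 6.32
print(round(statistics.stdev(data), 2))  # 6.32
```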
Less Common Measures of Variance
Mean Absolute Deviation
The sum of the deviations from the mean will always be zero. We need to make sure that none of the deviations are negative. We can do this by squaring each deviation (as we do in the variance or standard deviation) or by taking the absolute value (as we do in the mean absolute deviation).
Procedure for finding
 Find the mean of the data
 Subtract the mean from each data value to get the deviation from the mean
 Take the absolute value of each deviation from the mean
 Total the absolute values of the deviations from the mean
 Divide the total by the sample size.
Formula
$$\text{MAD} = \frac{\sum \lvert x - \bar{x} \rvert}{n}$$
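The procedure can be sketched in Python as follows. Taking `abs` of each deviation stops the positive and negative deviations from cancelling to zero. The data set 1, 3, 4, 7, 8 is the one used in the population standard deviation example later in this guide. The function name is illustrative.

```python
def mean_absolute_deviation(values):
    """Average absolute distance of the values from their mean."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n  # divide by n, not n - 1

data = [1, 3, 4, 7, 8]
print(mean_absolute_deviation(data))  # approximately 2.32 (11.6 / 5)
```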
Variation
The variation is the sum of the squares of the deviations from the mean. Its units are the square of the original data's units, and it does not take the sample size into account.
Procedure for finding
 Find the mean of the data
 Subtract the mean from each value to find the deviation from the mean
 Square the deviation from the mean
 Total the squares of the deviation from the mean
Formula
$$\text{variation} = \sum (x - \bar{x})^2$$
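The variation is the same quantity as steps 1 to 4 of the variance procedure. The sketch below makes that link explicit: dividing the variation by n − 1 gives the sample variance. The function name `variation` is illustrative.

```python
def variation(values):
    """Sum of squared deviations from the mean (no division by sample size)."""
    if not values:
        raise ValueError("need at least one value")
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)

data = [8, 25, 7, 5, 8, 3, 10, 12, 9]
ss = variation(data)
print(ss)                    # approximately 320
print(ss / (len(data) - 1))  # approximately 40, the sample variance
```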
Range Rule of Thumb
The range rule of thumb says that the range is approximately four times the standard deviation; equivalently, the standard deviation is approximately one-fourth of the range. The rule rests on the observation that, for roughly bell-shaped data, most values lie within two standard deviations of the mean, so the range spans about four standard deviations.
Procedure for finding
 Find the range
 Divide it by four
Formula
$$s \approx \frac{\text{range}}{4}$$
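This sketch compares the rule-of-thumb estimate with the actual sample standard deviation for the small data set 1, 3, 4, 7, 8 used elsewhere in this guide. It illustrates that the estimate is only rough and can be well off for small datasets like this one. The function name is illustrative.

```python
import statistics

def range_rule_of_thumb(values):
    """Rough estimate of the standard deviation: range / 4."""
    return (max(values) - min(values)) / 4

data = [1, 3, 4, 7, 8]
print(range_rule_of_thumb(data))         # (8 - 1) / 4 = 1.75
print(round(statistics.stdev(data), 2))  # 2.88, the sample standard deviation
```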
Pearson's Index of Skewness
Pearson's index of skewness can be used to determine whether the data are symmetric or skewed. If the index is between −1 and 1, the distribution is considered approximately symmetric. If the index is no more than −1, the distribution is skewed to the left, and if it is at least 1, it is skewed to the right.
Procedure for finding
 Find the mean, median, and standard deviation of the data.
 Subtract the median from the mean.
 Multiply by 3
 Divide by the standard deviation
Formula
$$\text{PI} = \frac{3(\bar{x} - \text{median})}{s}$$
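The four steps and the −1/1 cut-offs described above can be sketched in Python using the standard `statistics` module, with s taken as the sample standard deviation. The function names `pearson_skewness` and `describe` are illustrative.

```python
import statistics

def pearson_skewness(values):
    """Pearson's index of skewness: 3 * (mean - median) / s."""
    mean = statistics.mean(values)
    med = statistics.median(values)
    s = statistics.stdev(values)  # sample standard deviation
    return 3 * (mean - med) / s

def describe(index):
    """Classify the index using the cut-offs given in the text."""
    if index <= -1:
        return "skewed to the left"
    if index >= 1:
        return "skewed to the right"
    return "approximately symmetric"

data = [8, 25, 7, 5, 8, 3, 10, 12, 9]
index = pearson_skewness(data)
print(round(index, 2))  # 0.79
print(describe(index))  # approximately symmetric
```

The single large value 25 pulls the mean above the median, but not far enough to cross the threshold of 1.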
Coefficient of Variation
The coefficient of variation is expressed as a percent and describes the standard deviation relative to the mean. It can be used to compare variability when the units are different (the units will divide out, providing just a raw number).
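Procedure for finding
 Find the mean and standard deviation of the data
 Divide the standard deviation by the mean
 Multiply by 100%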
Formula
$$\text{CV} = \frac{s}{\bar{x}} \times 100\%$$
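The sketch below assumes the sample standard deviation s; some texts use the population form instead. It also checks that rescaling the data leaves the coefficient unchanged, because the units divide out. The function name is illustrative.

```python
import statistics

def coefficient_of_variation(values):
    """Standard deviation as a percentage of the mean (sample s assumed)."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined when the mean is zero")
    return statistics.stdev(values) / mean * 100

data = [8, 25, 7, 5, 8, 3, 10, 12, 9]
scaled = [10 * x for x in data]  # same data in different units

print(round(coefficient_of_variation(data), 1))    # 65.4
print(round(coefficient_of_variation(scaled), 1))  # 65.4, unchanged by rescaling
```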
Chebyshev's Rule
Descriptive Statistics
Statistical Indices of Data Variability
Measures of Dispersion
Range: The range gives you the most basic information about the spread of scores. It is calculated as the difference between the lowest and highest scores.
Interquartile Range: The difference between the score representing the 75th percentile and the score representing the 25th percentile is the interquartile range. This value gives you the range of the middle 50% of the values in the data set.
Variance and Standard Deviation: The standard deviation is the square root of the average squared deviation from the mean. The average squared deviation from the mean is also known as the variance.
Understanding and Calculating the Standard Deviation
Computers are used extensively for calculating the standard deviation and other statistics. However, calculating the standard deviation by hand once or twice can be helpful in developing an understanding of its meaning.
Calculating the variance and standard deviation
Consider the observations 8, 25, 7, 5, 8, 3, 10, 12, 9.
 First, determine n, which is the number of data values.
 Second, calculate the arithmetic mean, which is the sum of scores divided by n. For this example, the mean = (8+25+7+5+8+3+10+12+9) / 9 = 87 / 9 ≈ 9.67
 Then, subtract the mean from each individual score to find the individual deviations.
 Then, square the individual deviations.
 Then, find the sum of the squares of the deviations…can you see why we squared them before adding the values?
 Divide the sum of the squares of the deviations by n − 1. This is the variance!
 Take the square root of the variance to obtain the standard deviation, which has the same units as the original data.
Score | Mean | Deviation* | Squared deviation
8 | 9.67 | −1.67 | 2.79
25 | 9.67 | +15.33 | 235.01
7 | 9.67 | −2.67 | 7.13
5 | 9.67 | −4.67 | 21.81
8 | 9.67 | −1.67 | 2.79
3 | 9.67 | −6.67 | 44.49
10 | 9.67 | +0.33 | 0.11
12 | 9.67 | +2.33 | 5.43
9 | 9.67 | −0.67 | 0.45
Sum of squared deviations = 320.01
*Deviation = Score − Mean

$$s = \sqrt{\frac{\text{sum of squared deviations}}{N - 1}} = \sqrt{\frac{320.01}{9 - 1}} \approx \sqrt{40} \approx 6.32$$
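The table can be regenerated in Python by following the same steps: mean, deviation, squared deviation, sum, divide by N − 1, then square root. The sketch uses only the standard library.

```python
import math

scores = [8, 25, 7, 5, 8, 3, 10, 12, 9]
n = len(scores)
mean = sum(scores) / n  # 87 / 9, about 9.67

print("Score  Mean  Deviation  Squared deviation")
total = 0.0
for x in scores:
    deviation = x - mean  # Score - Mean
    squared = deviation ** 2
    total += squared
    print(f"{x:>5}  {mean:.2f}  {deviation:+9.2f}  {squared:17.2f}")

variance = total / (n - 1)  # divide by N - 1
sd = math.sqrt(variance)
print(f"Sum of squared deviations = {total:.2f}")  # 320.00
print(f"Standard deviation = {sd:.2f}")            # 6.32
```

The sum comes out as 320.00 rather than 320.01. The code squares the unrounded deviations, whereas the table squares deviations already rounded to two decimal places. In both cases the standard deviation rounds to 6.32.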
Raw score method for calculating standard deviation
Again, consider the observations 8, 25, 7, 5, 8, 3, 10, 12, 9.
 First, square each of the scores.
 Determine N, which is the number of scores.
 Compute the sum of X and the sum of X squared.
 Then, calculate the standard deviation as illustrated below.

Score (X) | X²
8 | 64
25 | 625
7 | 49
5 | 25
8 | 64
3 | 9
10 | 100
12 | 144
9 | 81
Total: 87 | 1161

N = 9, sum of X = 87, sum of X² = 1161

$$s = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N - 1}} = \sqrt{\frac{1161 - \frac{87 \times 87}{9}}{9 - 1}} = \sqrt{\frac{1161 - \frac{7569}{9}}{8}} = \sqrt{\frac{1161 - 841}{8}} = \sqrt{\frac{320}{8}} = \sqrt{40} \approx 6.32$$

Even simple statistics, such as the standard deviation, are tedious to calculate “by hand”.
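The raw-score formula needs only the two running totals, the sum of X and the sum of X². The sketch below implements it directly. The function name is illustrative.

```python
import math

def std_dev_raw_score(values):
    """Sample standard deviation via the raw-score formula:
    sqrt[(sum(X^2) - (sum(X))^2 / N) / (N - 1)]."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return math.sqrt((sum_x2 - sum_x * sum_x / n) / (n - 1))

scores = [8, 25, 7, 5, 8, 3, 10, 12, 9]
print(sum(scores), sum(x * x for x in scores))  # 87 1161
print(round(std_dev_raw_score(scores), 2))      # 6.32
```

This form is convenient by hand, but on a computer it can lose precision when the values are large and close together. The two large totals nearly cancel in that case, so the deviation method is the safer default in code.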
 Copyright © 1997 T. Lee Willoughby
Standard Deviation Calculator
Standard deviation in statistics, typically denoted by σ, is a measure of the variation or dispersion (the extent to which a distribution is stretched or squeezed) of the values in a set of data.
The lower the standard deviation, the closer the data points tend to be to the mean (or expected value), μ. Conversely, a higher standard deviation indicates a wider range of values.
Similarly to other mathematical and statistical concepts, there are many different situations in which standard deviation can be used, and thus many different equations. In addition to expressing population variability, the standard deviation is also often used to measure statistical results such as the margin of error.
When used in this manner, the relevant quantity is the standard error of the mean: the standard deviation of the sampling distribution of the mean, usually estimated as s/√n.
Population Standard Deviation
The population standard deviation, the standard definition of σ, is used when an entire population can be measured, and is the square root of the variance of a given data set. In cases where every member of a population can be sampled, the following equation can be used to find the standard deviation of the entire population:
$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$$
where $x_i$ is an individual value, $\mu$ is the mean/expected value, and $N$ is the total number of values.
For those unfamiliar with summation notation, the equation above may seem daunting, but when addressed through its individual components, this summation is not particularly complicated. The $i = 1$ in the summation indicates the starting index; for example, for the data set 1, 3, 4, 7, 8, $i = 1$ would be 1, $i = 2$ would be 3, and so on.
Hence the summation notation simply means: compute $(x_i - \mu)^2$ for each value from $i = 1$ through $N$ (here $N = 5$, since there are 5 values in this data set) and add the results.
EX:
$$\mu = \frac{1 + 3 + 4 + 7 + 8}{5} = 4.6$$
$$\sigma = \sqrt{\frac{(1 - 4.6)^2 + (3 - 4.6)^2 + \dots + (8 - 4.6)^2}{5}} = \sqrt{\frac{12.96 + 2.56 + 0.36 + 5.76 + 11.56}{5}} \approx 2.577$$
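The summation can be written directly as a loop over the values in Python. The sketch divides by N because every member of the population is included. The function name is illustrative.

```python
import math

def population_std_dev(values):
    """Population standard deviation: sqrt of the mean squared deviation (divide by N)."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    mu = sum(values) / n
    return math.sqrt(sum((x - mu) ** 2 for x in values) / n)

data = [1, 3, 4, 7, 8]
print(round(population_std_dev(data), 3))  # 2.577
```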
Sample Standard Deviation
In many cases it is not possible to sample every member of a population, so the equation above must be modified so that the standard deviation can be estimated from a random sample of the population being studied. A common estimator for σ is the sample standard deviation, typically denoted by s.
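The formula for s is cut off in this section. This comparison assumes the usual form with N − 1 in the denominator, which matches the variance procedure earlier in this guide. Python's standard library provides both versions: `statistics.pstdev` for the population and `statistics.stdev` for a sample.

```python
import statistics

data = [1, 3, 4, 7, 8]
print(round(statistics.pstdev(data), 3))  # 2.577, population: divide by N
print(round(statistics.stdev(data), 3))   # 2.881, sample: divide by N - 1
```

For the same numbers, the sample version is always somewhat larger, and the difference shrinks as N grows.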
Range and Standard Deviation – Magoosh Statistics Blog
When you start out with statistics, there are a lot of terms that can be super confusing. Take mean, median, and mode for example; they sound similar but mean completely different things.
But they are central to understanding how statistical models and methods work.
Another set of terms that are central to understanding statistical models are range and standard deviation.
Home on the Range
When we think about it in mathematical terms, range is a pretty straightforward term. It means the distance between the highest value and the lowest value. Let’s take a look at three data sets for an idea of their ranges.
The mean of each data set is the same, so we may be tempted to think that the data are the same. But a look at the range says otherwise. In the first dataset, X1, the range is 25 – 5 = 20. Dataset X3, on the other hand, has a range of 90 − (−60) = 150! This represents vast differences in the data that we have to account for in some way.
The range also represents the variability of the data. Datasets with a large range are said to have large variability, while datasets with smaller ranges are said to have small variability. Generally, smaller variability is better because it represents more precise measurements and yields more accurate analyses.
The range is a descriptive term that is useful for describing data. Its chief use is in calculating quartiles and interquartile range. But while range is a good gauge of the variability of the data, there is a more accurate and useful one: standard deviation.
Good Ol’ Standard Deviation
Standard deviation is the standard way that we understand and report variability. The most awesome thing about standard deviation is that we can use it not only to describe data but also to conduct further analyses such as ANOVA or multiple linear regression.
Standard deviation is a reliable method for determining how variable the data is for both a sample and a population. Of course, we cannot truly know the standard deviation for a population, but with the standard deviation of a sample, we can infer it.
The deviation is how much a score varies from the overall mean of the data. In the case of our example data, it would be how much each value differs from the mean of 15. We generally use s to represent the sample standard deviation. For our data the deviation is
How can I calculate SD from a mean sample, range, N?
 Eric Lim
An appreciation and understanding of statistics is important to all practising clinicians, not simply researchers. This is because mathematics is the fundamental basis on which we base clinical decisions, usually with reference to the benefit in relation to risk. Unless a clinician has a basic understanding of statistics, he or she will never be in a…
How to Find the Mean, Median, Mode, Range, and Standard Deviation
Updated May 14, 2018
By Karen G Blaettler
Simplify comparisons of sets of numbers, especially large sets of numbers, by calculating their central values using the mean, mode and median. Use the ranges and standard deviations of the sets to examine the variability of the data.
The mean identifies the average value of the set of numbers. For example, consider the data set containing the values 20, 24, 25, 36, 25, 22, 23.
To find the mean, use the formula: Mean equals the sum of the numbers in the data set divided by the number of values in the data set. In mathematical terms: Mean=(sum of all terms)÷(how many terms or values in the set).
Add the numbers in the example data set: 20+24+25+36+25+22+23=175.
Divide by the number of data points in the set. This set has seven values so divide by 7.
Insert the values into the formula to calculate the mean. The mean equals the sum of the values (175) divided by the number of data points (7). Since 175÷7=25, the mean of this data set equals 25. Not all mean values will equal a whole number.
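The mean calculation above takes a couple of lines of Python: divide the sum of the values by how many values there are.

```python
values = [20, 24, 25, 36, 25, 22, 23]
mean = sum(values) / len(values)  # sum of all terms / number of terms
print(sum(values), len(values), mean)  # 175 7 25.0
```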
The median identifies the midpoint or middle value of a set of numbers.
Put the numbers in order from smallest to largest. Use the example set of values: 20, 24, 25, 36, 25, 22, 23. Placed in order, the set becomes: 20, 22, 23, 24, 25, 25, 36.
Since this set of numbers has seven values, the median or value in the center is 24.
If the set of numbers has an even number of values, calculate the average of the two center values. For example, suppose the set of numbers contains the values 22, 23, 25, 26. The middle lies between 23 and 25. Adding 23 and 25 yields 48. Dividing 48 by two gives a median value of 24.
The mode identifies the most common value or values in the data set. Depending on the data, there might be one or more modes, or no mode at all.
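The median and mode steps can be sketched in Python. `statistics.median` already handles both the odd and even cases described above. The `modes` function is a hypothetical helper. It returns every most-common value, and it assumes the convention that a set in which every value occurs only once has no mode; that choice is an assumption, not something the text specifies.

```python
from collections import Counter
from statistics import median

def modes(values):
    """All most-common values; an empty list means 'no mode'.

    Assumed convention: if every value occurs exactly once, there is no mode.
    """
    if not values:
        raise ValueError("need at least one value")
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)

values = [20, 24, 25, 36, 25, 22, 23]
print(sorted(values))            # [20, 22, 23, 24, 25, 25, 36]
print(median(values))            # 24, odd count: the middle value
print(median([22, 23, 25, 26]))  # 24.0, even count: average of 23 and 25
print(modes(values))             # [25]
```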