Measures of Variability/Dispersion

A brief overview of the various measures of variability/dispersion used to interpret data sets in statistical analysis.

Edu Level: Unit1

Date: Dec 6 2025 - 3:39 AM


Continuing with statistical analysis, there are several measures that describe the spread or concentration of the values in a data set. These are known as the Measures of Variability/Dispersion.

These are:

Standard Deviation
Variance
Range
Inter-quartile Range ($IQR$)

You may be tempted to ask, "Why is this important?" To illustrate, consider the following:

The data sets below display the salaries for male and female employees:

Male Employees, x ($)

$$40000, 45000, 50000, 55000, 65000$$

And similarly,

Female Employees, y ($)

$$20000, 35000, 50000, 65000, 80000$$

From these data sets, the following results are obtained (in dollars):

$$ \text{Mean, }\bar{x} = 51000 \\ \text{Median } (Q_{\text{2}}) = 50000 $$

And similarly,

$$ \text{Mean, }\bar{y} = 50000 \\ \text{Median } (Q_{\text{2}}) = 50000 $$

We have stumbled upon two data sets whose values are strikingly different, yet whose Measures of Central Tendency are almost indistinguishable: the medians are identical and the means differ by only 1000 dollars. Hence, the mean and median alone do not let us make a proper inference about the actual data sets. This is where knowledge of the variability/dispersion of the data set comes into play.
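As a quick check of the salary example, the sketch below computes the centre and the spread of both data sets using Python's standard `statistics` module. The helper name `summarise` is made up for this illustration, and `pstdev` is used on the assumption that the divide-by-$n$ form of the standard deviation is wanted.

```python
from statistics import mean, median, pstdev

male = [40000, 45000, 50000, 55000, 65000]
female = [20000, 35000, 50000, 65000, 80000]


def summarise(values):
    """Return centre and spread statistics for a list of numbers."""
    return {
        "mean": mean(values),
        "median": median(values),
        "range": max(values) - min(values),
        "std_dev": pstdev(values),  # population form: divides by n
    }


male_summary = summarise(male)
female_summary = summarise(female)

print("male:  ", male_summary)
print("female:", female_summary)
```

The medians agree, but the range and standard deviation of the female salaries are more than twice those of the male salaries, which is exactly the difference the measures of central tendency hide.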

1. Standard Deviation

The standard deviation of a data set describes the spread or dispersion of the values around the mean of the data. Loosely, it is the typical distance of a data point from the mean (more precisely, the square root of the average squared distance). A small standard deviation therefore means that the data points are packed closely around the mean, whereas a large standard deviation means that they are spread further from it. The sample standard deviation is denoted by the symbol $s$, whereas the population standard deviation is denoted by the symbol $\sigma$. The formulas on this page divide by $n$ (or $\sum f$); some texts instead divide by $n-1$ when working with a sample.

Ungrouped Data

$$\text{Standard deviation, } s = \sqrt{\frac{\sum x^2}{n}-\bar{x}^2}$$

In which,

$$\begin{align*} n &= \text{total number of data values} \\ x^2 &= \text{the square of a single data value} \\ \sum x^2 &= \text{sum of all squared data values} \\ \bar{x}^2 &= \text{the squared mean of the data set} \\ \end{align*}$$

For example, given the data values: $25, 34, 10, 17, 124, 16, 108$.

In this case, the total number of data values is $n = 7$, as there are seven values in the raw data.

Firstly, we must determine the mean of the data set:

Recall: $\text{Mean, } \bar{x} = \frac{\sum x}{n}$ (for ungrouped data)

Therefore,

$$\begin{align*} \text{Mean, } \bar{x} &= \frac{334}{7} \\ \text{Mean, } \bar{x} &= 47.7 \text{ (to 3 s.f.)}\\ \end{align*}$$

Now, we must determine $\sum x^2$:

$$ \begin{array}{|c|c|} \hline x & x^2 \\ \hline 25 & 625 \\ \hline 34 & 1156 \\ \hline 10 & 100 \\ \hline 17 & 289 \\ \hline 124 & 15376 \\ \hline 16 & 256 \\ \hline 108 & 11664 \\ \hline & \sum x^2 = 29466 \\ \hline \end{array} $$

Finally, we can plug in these values:

$$\begin{align*} \text{Standard deviation, } s &= \sqrt{\frac{\sum x^2}{n}-\bar{x}^2} \\ \text{Standard deviation, } s &= \sqrt{\frac{(29466)}{(7)}-(47.7)^2} \\ \text{Standard deviation, } s &= 44.0 \text{ (to 3 s.f.)} \\ \end{align*}$$

Thereby, we can conclude that the data points lie, on average, relatively far from the mean, quantified as: $44.0$. We can verify this by re-checking the data set and seeing data points such as 10 and 124, which highlight a high level of variation/dispersion.
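The ungrouped formula can be checked with a few lines of Python. This is a minimal sketch: the function name `std_dev_ungrouped` is invented for the illustration, and it keeps the mean unrounded rather than rounding it to 3 s.f. first.

```python
from math import sqrt


def std_dev_ungrouped(values):
    """Standard deviation via sqrt(sum(x^2)/n - mean^2), dividing by n."""
    n = len(values)
    mean = sum(values) / n
    mean_of_squares = sum(x * x for x in values) / n
    return sqrt(mean_of_squares - mean ** 2)


data = [25, 34, 10, 17, 124, 16, 108]
sum_of_squares = sum(x * x for x in data)
s = std_dev_ungrouped(data)

print(sum_of_squares)
print(round(s, 1))
```

Python's built-in `statistics.pstdev(data)` divides by $n$ as well, so it should agree with this function up to floating-point error.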

Grouped Data

$$\text{Standard deviation, } s = \sqrt{\frac{\sum{fx^2}}{\sum{f}}-\bar{x}^2}$$

In which,

$$\begin{align*} \sum{f} &= \text{total frequency of the data values} \\ x &= \text{the midpoint of a class} \\ x^2 &= \text{the square of a class midpoint} \\ fx^2 &= \text{the square of a class midpoint multiplied by the class frequency} \\ \sum fx^2 &= \text{sum of all squared class midpoints multiplied by frequency} \\ \bar{x}^2 &= \text{the squared mean of the data set} \\ \end{align*}$$

For example, given:

$$ \begin{array}{|c|c|} \hline \text{Marks, }x & \text{Frequency, }f \\ \hline 0-10 & 4 \\ \hline 10-20 & 7 \\ \hline 20-30 & 5 \\ \hline 30-40 & 9 \\ \hline 40-50 & 6 \\ \hline 50-60 & 3 \\ \hline 60-70 & 8 \\ \hline 70-80 & 2 \\ \hline \end{array} $$

From the table above, the total frequency, $\sum{f} = 44$.

Now, we must also determine the mean, $\bar{x}$, and $\sum{fx^2}$.

Firstly,

Recall: $\text{Mean, } \bar{x} = \frac{\sum{fx}}{\sum{f}}$ (for grouped data)

In order to simplify the calculation, each class is represented by its midpoint, $x$ (for example, $x = 5$ for the class $0-10$), and the required totals are displayed in the table below:

$$ \begin{array}{|c|c|c|c|c|} \hline x & f & x^2 & f x & f x^2 \\ \hline 5 & 4 & 25 & 20 & 100 \\ \hline 15 & 7 & 225 & 105 & 1575 \\ \hline 25 & 5 & 625 & 125 & 3125 \\ \hline 35 & 9 & 1225 & 315 & 11025 \\ \hline 45 & 6 & 2025 & 270 & 12150 \\ \hline 55 & 3 & 3025 & 165 & 9075 \\ \hline 65 & 8 & 4225 & 520 & 33800 \\ \hline 75 & 2 & 5625 & 150 & 11250 \\ \hline \text{-} & \sum{f} = 44 & \text{-} & \sum{fx} = 1670 & \sum{fx^2} = 82100 \\ \hline \end{array} $$

Thereby, it can be calculated that:

$$\begin{align*} \text{Mean, } \bar{x} &= \frac{1670}{44} \\ \text{Mean, } \bar{x} &= 38.0 \text{ (to 3 s.f.)} \\ \end{align*}$$

And so, using the unrounded mean to avoid rounding error, it follows that:

$$\begin{align*} \text{Standard deviation, } s &= \sqrt{\frac{\sum fx^2}{\sum f}-\bar{x}^2} \\ \text{Standard deviation, } s &= \sqrt{\frac{(82100)}{(44)}-\left(\frac{1670}{44}\right)^2} \\ \text{Standard deviation, } s &= 20.6 \text{ (to 3 s.f.)} \\ \end{align*}$$
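The grouped calculation can be sketched in Python as follows. Two assumptions are made here: each class is represented by its midpoint (for example, 5 for the class $0-10$), and the mean is kept unrounded, so the last digit may differ from a hand calculation that rounds the mean first. The function name `grouped_mean_and_std_dev` is hypothetical.

```python
from math import sqrt


def grouped_mean_and_std_dev(classes, freqs):
    """Mean and standard deviation of grouped data, using class midpoints."""
    midpoints = [(lower + upper) / 2 for lower, upper in classes]
    total_f = sum(freqs)
    sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))
    mean = sum_fx / total_f
    std_dev = sqrt(sum_fx2 / total_f - mean ** 2)
    return mean, std_dev


classes = [(0, 10), (10, 20), (20, 30), (30, 40),
           (40, 50), (50, 60), (60, 70), (70, 80)]
freqs = [4, 7, 5, 9, 6, 3, 8, 2]

mean, s = grouped_mean_and_std_dev(classes, freqs)
print(round(mean, 1), round(s, 1))
```

Shifting every $x$ by the same constant changes the mean but not the standard deviation, so the choice of representative value within each class affects only the mean.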

2. Variance

The variance of a data set is defined as the average squared distance of the values from their mean. A small variance therefore means that the data points are packed closely around the mean, whereas a large variance means that they are spread further from it. Unlike the standard deviation, the variance is measured in squared units; the standard deviation is simply its square root. The sample variance is denoted by the symbol $s^2$, whereas the population variance is denoted by the symbol $\sigma^2$.

Ungrouped Data

$$\text{Variance, } s^2 = \frac{\sum x^2}{n}-\bar{x}^2$$

The symbols are as defined for the standard deviation of ungrouped data above.

Note: The calculation follows the same process as the standard deviation examples shown above, omitting the final square root.

Grouped Data

$$\text{Variance, } s^2 = \frac{\sum{fx^2}}{\sum{f}}-\bar{x}^2$$

The symbols are as defined for the standard deviation of grouped data above.

Note: The calculation follows the same process as the standard deviation examples shown above, omitting the final square root.
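To make the link between variance and standard deviation concrete, here is a small sketch that reuses the ungrouped data values from the standard deviation example. The function name `variance_ungrouped` is invented for the illustration; `statistics.pvariance` is the standard-library function that divides by $n$.

```python
from math import sqrt
from statistics import pvariance


def variance_ungrouped(values):
    """Variance via sum(x^2)/n - mean^2, dividing by n."""
    n = len(values)
    mean = sum(values) / n
    return sum(x * x for x in values) / n - mean ** 2


data = [25, 34, 10, 17, 124, 16, 108]
variance = variance_ungrouped(data)

print(round(variance, 1))        # variance, in squared units
print(round(sqrt(variance), 1))  # its square root is the standard deviation
print(round(pvariance(data), 1)) # library value for comparison
```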

3. Range

The range of a data set is the total spread of the data: the difference between the smallest value (minimum) and the largest value (maximum). The range gives the most basic idea of variability, as it shows only how far apart the extreme values of the data set are.

Ungrouped Data

$$\text{Range} = M_{v}-L_{v}$$

In which,

$$\begin{align*} M_{v} &= \text{maximum value of the data set} \\ L_{v} &= \text{minimum value of the data set} \\ \end{align*}$$

For example, given the data values: $15, 14, 15, 27, 51, 31, 141$.

It can be identified that,

$$\begin{align*} \text{maximum value of the data set, } M_{v} &= 141 \\ \text{minimum value of the data set, } L_{v} &= 14 \\ \end{align*}$$

Therefore, the range can be simply calculated as follows:

$$\begin{align*} \text{Range} &= M_{v}-L_{v} \\ \text{Range} &= (141)-(14) \\ \text{Range} &= 127 \\ \end{align*}$$

Grouped Data

$$\text{Range} = \text{UB}_H-\text{LB}_L$$

In which,

$$\begin{align*} \text{UB}_H &= \text{Upper Boundary of the Highest Class} \\ \text{LB}_L &= \text{Lower Boundary of the Lowest Class} \\ \end{align*}$$

For example, given:

$$ \begin{array}{|c|c|} \hline \text{Marks, }x & \text{Frequency, }f \\ \hline 0-10 & 4 \\ \hline 10-20 & 7 \\ \hline 20-30 & 5 \\ \hline 30-40 & 9 \\ \hline 40-50 & 6 \\ \hline \end{array} $$

It can be noted that the highest class is $40-50$ and the lowest class is $0-10$.

Therefore, it follows that:

the Upper Class Boundary of the highest class, $\text{UB}_H = 50$

the Lower Class Boundary of the lowest class, $\text{LB}_L = 0$

Hence, the range can be easily calculated as such:

$$\begin{align*} \text{Range} &= \text{UB}_H-\text{LB}_L \\ \text{Range} &= (50)-(0) \\ \text{Range} &= 50 \\ \end{align*}$$
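Both versions of the range are short enough to sketch in Python. The function names are made up for this illustration, and the grouped version assumes each class is given as a `(lower boundary, upper boundary)` pair.

```python
def range_ungrouped(values):
    """Maximum value minus minimum value."""
    return max(values) - min(values)


def range_grouped(classes):
    """Upper boundary of the highest class minus lower boundary of the lowest."""
    lowest_boundary = min(lower for lower, _ in classes)
    highest_boundary = max(upper for _, upper in classes)
    return highest_boundary - lowest_boundary


ungrouped_range = range_ungrouped([15, 14, 15, 27, 51, 31, 141])
grouped_range = range_grouped([(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)])

print(ungrouped_range)  # 127
print(grouped_range)    # 50
```

Note that the frequencies play no part in the grouped range; only the class boundaries matter.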

4. Inter-quartile Range (IQR)

Lastly, the inter-quartile range, $IQR$, measures the spread of the middle 50% of the distribution. It is often confused with the median, which is the middle value of the data set; the inter-quartile range is instead the distance between two quartiles: the Lower Quartile ($Q_1$) and the Upper Quartile ($Q_3$).

Ungrouped Data

$$\text{Inter-quartile range, } IQR = Q_3 - Q_1$$

In which,

$$\begin{align*} \text{Q}_3 &= \text{the upper quartile of the data set} \\ \text{Q}_1 &= \text{the lower quartile of the data set} \\ \end{align*}$$

It is important to note that the positions of the quartiles in the ordered data set are given by:

$$\text{Position of the Upper Quartile, } Q_3 = \frac{3}{4}(n+1) \text{ where $n$ is the number of data values}$$

$$\text{Position of the Lower Quartile, } Q_1 = \frac{1}{4}(n+1) \text{ where $n$ is the number of data values}$$

For example, given the data set: $98, 32, 21, 11, 73, 54, 121$.

We must first arrange the data set in ascending order: $11, 21, 32, 54, 73, 98, 121$. It can be seen that the number of data values is $n = 7$.

Now, we can calculate $Q_{3}$ and $Q_{1}$:

$$\begin{align*} \text{Q}_3 &= \frac{3}{4}(n+1) \\ \text{Q}_3 &= \frac{3}{4}(7+1) \\ \text{Q}_3 &= 6^{th} \text{ position} \\ \text{Q}_3 &= 98 \\ \end{align*}$$

and,

$$\begin{align*} \text{Q}_1 &= \frac{1}{4}(n+1) \\ \text{Q}_1 &= \frac{1}{4}(7+1) \\ \text{Q}_1 &= 2^{nd} \text{ position} \\ \text{Q}_1 &= 21 \\ \end{align*}$$

From which, the inter-quartile range $(IQR)$ can be easily calculated:

$$\begin{align*} \text{Inter-quartile range, } IQR &= Q_3 - Q_1 \\ \text{Inter-quartile range, } IQR &= 98-21 \\ \text{Inter-quartile range, } IQR &= 77 \\ \end{align*}$$
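The position method for ungrouped data can be sketched as below. The function names are hypothetical, and the handling of fractional positions (averaging the values at the two neighbouring positions) is an assumption modelled on the usual rule for the median; library routines such as `statistics.quantiles` use other interpolation rules and may give slightly different quartiles.

```python
from math import ceil, floor


def value_at_position(sorted_values, position):
    """Value at a 1-based position; a fractional position averages its two neighbours."""
    below = sorted_values[floor(position) - 1]
    above = sorted_values[ceil(position) - 1]
    return (below + above) / 2


def iqr_ungrouped(values):
    """Return (Q1, Q3, IQR) using the (n + 1)/4 and 3(n + 1)/4 positions."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 3:
        raise ValueError("need at least three data values")
    q1 = value_at_position(ordered, (n + 1) / 4)
    q3 = value_at_position(ordered, 3 * (n + 1) / 4)
    return q1, q3, q3 - q1


q1, q3, iqr = iqr_ungrouped([98, 32, 21, 11, 73, 54, 121])
print(q1, q3, iqr)  # 21.0 98.0 77.0
```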

Grouped Data

$$\text{Inter-quartile range, } IQR = Q_3 - Q_1$$

In which,

$$\begin{align*} \text{Q}_3 &= \text{the upper quartile of the data set} \\ \text{Q}_1 &= \text{the lower quartile of the data set} \\ \end{align*}$$

It is important to note the following:

Utilizing Linear Interpolation:

$$\text{Upper Quartile, } Q_3 = l_{Q_3} + \left( \frac{\frac{3n}{4} - F_{Q_3}}{f_{Q_3}} \right) \times h$$

$$\text{Lower Quartile, } Q_1 = l_{Q_1} + \left( \frac{\frac{n}{4} - F_{Q_1}}{f_{Q_1}} \right) \times h$$

In which,

$$\begin{align*} n &= \text{total number of data values} \\ l_{Q_{3}} &= \text{lower limit of the upper quartile class} \\ l_{Q_{1}} &= \text{lower limit of the lower quartile class} \\ F_{Q_{3}} &= \text{cumulative frequency of classes preceding the upper quartile class} \\ F_{Q_{1}} &= \text{cumulative frequency of classes preceding the lower quartile class} \\ f_{Q_{3}} &= \text{frequency of the upper quartile class} \\ f_{Q_{1}} &= \text{frequency of the lower quartile class} \\ h &= \text{class width} \end{align*}$$

For example, given the following table:

$$ \begin{array}{|c|c|} \hline \text{Mass, g} & \text{Frequency, } f \\ \hline 0-10 & 11 \\ \hline 10-20 & 31 \\ \hline 20-30 & 24 \\ \hline 30-40 & 18 \\ \hline \end{array} $$

Here the total frequency is $n = 84$. We must first identify the Upper Quartile class using the position formula from the ungrouped case:

$$\begin{align*} \text{Upper Quartile, } (Q_{\text{3}}) &= \frac{3}{4}(n+1) \\ \text{Upper Quartile, } (Q_{\text{3}}) &= \frac{3}{4}(84+1) \\ \text{Upper Quartile, } (Q_{\text{3}}) &= \frac{3}{4}(85) \\ \text{Upper Quartile, } (Q_{\text{3}}) &= 63.75^{\text{th}} \text { position} \end{align*}$$

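The cumulative frequencies of the four classes are $11, 42, 66$ and $84$, so the $63.75^{\text{th}}$ position falls in the third class. Hence, the upper quartile class is: $20-30$.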

and,

$$ \begin{align*} \text{Lower Quartile, } (Q_{1}) &= \frac{1}{4}(n+1) \\ \text{Lower Quartile, }(Q_{1})&= \frac{1}{4}(84+1) \\ \text{Lower Quartile, }(Q_{1})&= \frac{1}{4}(85) \\ \text{Lower Quartile, }(Q_{1})&= 21.25^{\text{th}} \text{ position} \end{align*} $$

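Since the cumulative frequencies are $11, 42, 66$ and $84$, the $21.25^{\text{th}}$ position falls in the second class. Hence, the lower quartile class is: $10-20$.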

It is important to note that, for these position calculations, the same rules apply as with the median. If the position of the $\text{Upper Quartile, } (Q_{\text{3}})$ or the $\text{Lower Quartile, } (Q_{\text{1}})$ is not a positive integer ($\notin \mathbb{Z}^+$), we find the values at the two positions on either side of it (the $x^{\text{th}}$ and $y^{\text{th}}$ positions) and take their average. If the position is a positive integer ($\in \mathbb{Z}^+$), the value at that position is taken.

Now, we can calculate the upper quartile:

$$ \begin{align*} \text{Upper Quartile, } (Q_{3}) &= l_{Q_{3}} + \left( \frac{\frac{3n}{4} - F_{Q_{3}}}{f_{Q_{3}}} \right) \times h \\ \text{Upper Quartile, } (Q_{3}) &= 20 + \left( \frac{\frac{3(84)}{4} - 42}{24} \right) \times 10 \\ \text{Upper Quartile, } (Q_{3}) &= 28.75 \\ \text{Upper Quartile, } (Q_{3}) &= 28.8 \text{ (to 3 s.f.)} \end{align*} $$

and, subsequently, the lower quartile:

$$ \begin{align*} \text{Lower Quartile, } (Q_{1}) &= l_{Q_{1}} + \left( \frac{\frac{n}{4} - F_{Q_{1}}}{f_{Q_{1}}} \right) \times h \\ \text{Lower Quartile, } (Q_{1}) &= 10 + \left( \frac{\frac{84}{4} - 11}{31} \right) \times 10 \\ \text{Lower Quartile, } (Q_{1}) &= 10 + \left( \frac{21 - 11}{31} \right) \times 10 \\ \text{Lower Quartile, } (Q_{1}) &= 10 + \frac{100}{31} \\ \text{Lower Quartile, } (Q_{1}) &= 13.225806\ldots \\ \text{Lower Quartile, } (Q_{1}) &= 13.2 \text{ (to 3 s.f.)} \end{align*} $$

Finally, the inter-quartile range can be calculated as follows, using the unrounded quartiles to avoid rounding error:

$$\begin{align*} \text{Inter-quartile range, } IQR &= Q_{3} - Q_{1} \\ \text{Inter-quartile range, } IQR &= 28.75-13.2258\ldots \\ \text{Inter-quartile range, } IQR &= 15.5 \text{ (to 3 s.f.)} \\ \end{align*}$$
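The linear interpolation formula can be sketched in Python as follows. The function name `grouped_quantile` is hypothetical, and the sketch assumes the quartile class is the first class whose cumulative frequency reaches $\frac{n}{4}$ or $\frac{3n}{4}$; for this table that picks the same classes as the position method used above.

```python
def grouped_quantile(classes, freqs, fraction):
    """Interpolated quantile: l + ((fraction * n - F) / f) * h."""
    n = sum(freqs)
    target = fraction * n
    cumulative = 0  # F: cumulative frequency of the preceding classes
    for (lower, upper), f in zip(classes, freqs):
        if cumulative + f >= target:
            width = upper - lower  # h: class width
            return lower + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("fraction must lie between 0 and 1")


classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
freqs = [11, 31, 24, 18]

q1 = grouped_quantile(classes, freqs, 0.25)
q3 = grouped_quantile(classes, freqs, 0.75)
iqr = q3 - q1

print(round(q1, 4), q3, round(iqr, 1))
```

Keeping `q1` and `q3` unrounded until the final subtraction avoids compounding rounding error in the inter-quartile range.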


About Kyle Patel