How to Calculate Mean, Median, Mode and Range in Python
When working with collections of data in Python we may want to find their mean, median or mode. These values can provide useful insights into what is happening inside a particular dataset, or help us compare it with other datasets.
In this tutorial, we will learn how to calculate the mean, median, mode and range of iterables such as lists and tuples in Python.
Calculating the Mean
The mean is the result of all the values added together divided by the number of values. This will give us a generalised average of the data, which won't necessarily be a value located in the data. Let's say we had the following list of numbers and we wanted to find their mean:
nums = [9, 4, 6, 6, 5, 2, 10, 12, 1]
We could approach this by adding the numbers together using a for loop and then dividing that by the length of the list. A better approach would be to use the built-in Python sum() function to get the sum of all the values in the list, then divide that by the list length using the len() function.
mean = sum(nums) / len(nums)
print(mean)
6.111111111111111
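For comparison, the for loop approach mentioned above could look something like the sketch below. The names total and loop_mean are ours, chosen for illustration only.

```python
nums = [9, 4, 6, 6, 5, 2, 10, 12, 1]

# Add the values together one at a time.
total = 0
for n in nums:
    total += n

# Divide the running total by the number of values.
loop_mean = total / len(nums)

# Both approaches give the same result.
print(loop_mean == sum(nums) / len(nums))  # True
```

The loop works, but sum() does the same job in one call and is easier to read.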
In the above example, the mean is returned as a floating-point number. To round it to the nearest integer, use the Python round() function.
print(round(mean))
6
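If a whole number is too coarse, round() also accepts a second argument giving the number of decimal places to keep. The short sketch below shows this, along with one behaviour that can be surprising: when a value sits exactly halfway between two integers, Python 3 rounds to the nearest even one.

```python
nums = [9, 4, 6, 6, 5, 2, 10, 12, 1]
mean = sum(nums) / len(nums)

# Keep two decimal places instead of rounding to an integer.
print(round(mean, 2))  # 6.11

# Exact halves are rounded to the nearest even integer.
print(round(2.5))  # 2
print(round(3.5))  # 4
```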
The Python mean() Function
If you need to calculate means often it might be worth importing the statistics module from the standard library. It has a mean() function which does all the work of calculating a mean from the data, allowing your code to be cleaner.
import statistics
nums = [9, 4, 6, 6, 5, 2, 10, 12, 1]
mean = statistics.mean(nums)
print(mean)
6.111111111111111
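As a small sketch of two details worth knowing: statistics.mean() works on other iterables such as tuples, and it raises statistics.StatisticsError when given empty data. The scores tuple below is made-up sample data.

```python
import statistics

# Hypothetical sample data, stored in a tuple rather than a list.
scores = (3, 5, 7)
print(statistics.mean(scores))

# An empty collection has no mean, so an exception is raised.
try:
    statistics.mean([])
except statistics.StatisticsError as err:
    print(f"Could not calculate the mean: {err}")
```

Catching the exception like this is one option; checking that the data is not empty before calling mean() would work too.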
Calculating the Median
The median is the middle number of a sorted collection of numbers. To get the median in Python we first have to sort the iterable and then find the index in the centre by dividing the length of the list in half. If the collection has an odd number of values, the median is the value at the centre index. If it has an even number of values there is no single middle value, so we have to get the average of the middle two numbers.
Let's create a function that will accept an iterable as an argument and return the median.
nums = [9, 4, 6, 6, 5, 2, 10, 12, 1]
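def get_median(items):
    # Sort a copy of the data so the original is left unchanged.
    sorted_list = sorted(items)
    length = len(sorted_list)
    # Centre index (the lower of the two middle indexes for even lengths).
    index = (length - 1) // 2
    if length % 2:
        # Odd number of values: return the single middle value.
        return sorted_list[index]
    else:
        # Even number of values: average the two middle values.
        return (sorted_list[index] + sorted_list[index + 1]) / 2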
print(get_median(nums))
6
Inside the get_median() function above we first sort the input using sorted() and get its length using len(). Then we get the centre index by subtracting 1 from the length (because indexes start at 0) and dividing that by 2 using the floor division operator (//) to ensure we get an integer.
Then, if the length is odd, the value at the centre index is returned. Otherwise the average of the two middle values is returned. This is done by adding the value at index to the value at index + 1 and dividing the result by 2.
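To see both branches in action, here is a self-contained sketch that repeats the function and runs it on two made-up lists, one with an odd number of values and one with an even number.

```python
def get_median(items):
    sorted_list = sorted(items)
    length = len(sorted_list)
    index = (length - 1) // 2
    if length % 2:
        return sorted_list[index]
    return (sorted_list[index] + sorted_list[index + 1]) / 2

# Hypothetical sample data for illustration.
odd_nums = [5, 1, 3]      # sorted: [1, 3, 5], centre index 1
even_nums = [8, 1, 5, 3]  # sorted: [1, 3, 5, 8], middle values 3 and 5

print(get_median(odd_nums))   # 3
print(get_median(even_nums))  # 4.0
```

Note that the even case returns a float because of the / division, even when the two middle values average to a whole number.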
The Python median() Function
We can get the median of an iterable on one line by using the statistics.median() function. This might be a better solution as the work of averaging the two middle values for even-length data is done for you.
import statistics
nums = [9, 4, 6, 6, 5, 2, 10, 12, 1]
median = statistics.median(nums)
print(median)
6
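When the data has an even number of values, statistics.median() averages the two middle values. If we would rather get a value that actually appears in the data, the same module also offers median_low() and median_high(). A brief sketch using a made-up list:

```python
import statistics

# Hypothetical even-length sample data.
even_nums = [1, 3, 5, 8]

print(statistics.median(even_nums))       # 4.0
print(statistics.median_low(even_nums))   # 3
print(statistics.median_high(even_nums))  # 5
```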
Calculating the Mode
The mode is the most frequently occurring value in a collection of data. This principle can be applied to both numbers and strings. The mode could be a single value, multiple values or nothing if all the values are used equally.
To get the number of times each value in a list occurred we can use the Counter class from the collections module. Let's test it out on a list of numbers and print the result.
from collections import Counter
nums = [9, 4, 6, 6, 5, 2, 10, 12, 1, 4, 4, 6]
data = Counter(nums)
items_count = data.most_common()
highest_count = data.most_common(1)
print(data)
print(items_count)
print(highest_count)
Counter({4: 3, 6: 3, 9: 1, 5: 1, 2: 1, 10: 1, 12: 1, 1: 1})
[(4, 3), (6, 3), (9, 1), (5, 1), (2, 1), (10, 1), (12, 1), (1, 1)]
[(4, 3)]
Counter() returns a dictionary subclass that maps each number to the number of times it occurred. We can then convert that into a list of tuples, ordered from most to least common, using the most_common() method. To get the highest occurring value select the first tuple using data.most_common(1).
This isn't particularly useful because in the above example we can see that the values 4 and 6 both occurred three times, yet only 4 was shown as being the mode. To get all the mode values we can use a list comprehension to build a new list containing every item that shares the highest count.
from collections import Counter
nums = [9, 4, 6, 6, 5, 2, 10, 12, 1, 4, 4, 6]
def get_modes(values):
    c = Counter(values)
    return [k for k, v in c.items() if v == c.most_common(1)[0][1]]
print(get_modes(nums))
[4, 6]
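Since the mode applies to strings as well as numbers, the same idea can be tried on a list of words. The sketch below is one possible variation rather than the only way to write it: it works out the highest count once with max() instead of calling most_common() for every item, and it returns an empty list for empty input. The colours list is made-up sample data.

```python
from collections import Counter

def get_modes(values):
    counts = Counter(values)
    if not counts:
        # No data means no modes.
        return []
    # Find the highest count once, then keep every value that matches it.
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]

# Hypothetical sample data for illustration.
colours = ["red", "blue", "red", "green", "blue"]

print(get_modes(colours))  # ['red', 'blue']
print(get_modes([]))       # []
```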
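The Python mode() Function
The statistics module provides a mode() function, though it will only show one mode. On Python 3.8 and later, when several values are tied it returns the first one encountered in the data.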
import statistics
nums = [9, 4, 6, 6, 5, 2, 10, 12, 1, 4, 4, 6]
mode = statistics.mode(nums)
print(mode)
4
The Python multimode() Function
The statistics.multimode() function, available from Python 3.8, will return a list of all the modes.
import statistics
nums = [9, 4, 6, 6, 5, 2, 10, 12, 1, 4, 4, 6]
mode = statistics.multimode(nums)
print(mode)
[4, 6]
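It can also help to see how multimode() behaves at the edges. The sketch below, which uses made-up data and assumes Python 3.8 or later, shows that it works on strings, returns every value when they all occur equally often, and returns an empty list for empty input.

```python
import statistics

# Hypothetical sample data for illustration.
letters = "aabbbbccddddeeffffgg"
unique = [1, 2, 3]

# Three letters are tied for the highest count.
print(statistics.multimode(letters))  # ['b', 'd', 'f']

# Every value occurs once, so all of them are returned.
print(statistics.multimode(unique))   # [1, 2, 3]

# Empty input gives an empty list rather than an error.
print(statistics.multimode([]))       # []
```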
Calculating the Range
To get the range of values from a list we can use the min() and max() functions to find the lowest and highest values.
nums = [9, 4, 6, 6, 5, 2, 10, 12, 1, 4, 4, 6]
minimum = min(nums)
maximum = max(nums)
print(f'The range is {minimum}, {maximum}')
The range is 1, 12
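In statistics the range is often reported as a single number: the difference between the highest and lowest values. A minimal sketch of that calculation, using the variable name value_range as our own choice:

```python
nums = [9, 4, 6, 6, 5, 2, 10, 12, 1, 4, 4, 6]

# Subtract the lowest value from the highest to get the spread.
value_range = max(nums) - min(nums)
print(f'The range is {value_range}')  # The range is 11
```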
Conclusion
You now know how to get the mean, median, mode and range of values in Python. The statistics module pretty much has all bases covered when it comes to getting averages, though it is good practice to know how to calculate them yourself first.