About the pandas version: this article is based on pandas 2.2.0.
Updates: this article will continue to be updated, improved and supplemented as new stable versions of pandas are released.
Contents of this section
- Pandas.DataFrame.mode()
  - Syntax
  - Return value
  - Parameter description
    - axis: specifies the calculation direction (row or column)
    - dropna: ignores missing values
    - numeric_only: only counts rows or columns of pure numeric type
  - Related methods
  - Examples
    - Example 1: Calculate the mode of each column or row
    - Example 2: Control whether missing values participate in the mode calculation
    - Example 3: Only count rows or columns whose data type is numeric
Pandas.DataFrame.mode()
The DataFrame.mode method returns the mode(s) of each row or column.
⚠️ Note:
The mode is the value that appears most frequently in a row or column. It can be a string.
There can be more than one mode.
If all elements appear the same number of times, all of them are returned as modes.
Integers and floating-point numbers with equal values are treated as the same element when calculating the mode. For example, if 1 and 1.0 both appear in the same row or column, this counts as the value 1 appearing twice.
Missing-value placeholder: the result is assembled from rows or columns that may have different numbers of modes. As when a DataFrame is constructed from rows or columns of unequal length, the empty positions are filled with missing values (NaN).
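DataFrame.mode works column by column (or row by row), applying the same counting rules as Series.mode, so the rules above can be tried out on small Series. A sketch with made-up values:

```python
import pandas as pd

# A string can be the mode.
words = pd.Series(["cat", "dog", "cat"])
print(words.mode().tolist())     # ['cat']

# Two values tie for the highest count, so both are returned (sorted).
tied = pd.Series(["a", "b", "a", "b", "c"])
print(tied.mode().tolist())      # ['a', 'b']

# Every value appears exactly once, so all of them are modes.
all_once = pd.Series([3, 1, 2])
print(all_once.mode().tolist())  # [1, 2, 3]

# In an object column, the integer 1 and the float 1.0 are counted as one value,
# so there is a single mode equal to 1.
mixed = pd.Series([1, 1.0, 2], dtype=object)
print(mixed.mode().tolist())
```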
Syntax:
```python
DataFrame.mode(axis=0, numeric_only=False, dropna=True)
```
Return value:
- DataFrame: the modes of each row or column, returned as a DataFrame.
- When rows or columns have different numbers of modes, the shorter ones are padded with missing values (NaN) as placeholders. See Example 1.
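Because the placeholders only fill the gaps, the number of non-missing cells in each column of the result equals that column's number of modes, as long as the default dropna=True is used (with dropna=False a NaN can itself be a mode). A sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2], "b": [5, 6, 7]})

result = df.mode()
# Column "a" has one mode (1); column "b" has three (5, 6, 7),
# so column "a" is padded with two NaN placeholders.
print(result)

# With dropna=True, the non-missing cells per column equal the number of modes.
n_modes = result.notna().sum()
print(n_modes)
```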
Parameter Description:
axis specifies the calculation direction (row or column)
- axis : {0 or 'index', 1 or 'columns'}, default 0
The axis parameter specifies the calculation direction, that is, whether to calculate the mode of each column or of each row. See Example 1.
- 0 or 'index': calculate the mode of each column.
- 1 or 'columns': calculate the mode of each row.
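A minimal sketch of the two directions on a small, made-up numeric DataFrame. With axis=1 the result has one row per original row, and its columns 0, 1, ... hold that row's modes:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 1, 3], "y": [1, 2, 2]})

# axis=0 (default): one mode per column -> x: 1, y: 2
by_column = df.mode()
print(by_column)

# axis=1: modes of each row; row 0 has one mode, rows 1 and 2 have two each,
# so row 0 is padded with NaN in column 1.
by_row = df.mode(axis=1)
print(by_row)
```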
dropna ignores missing values
- dropna : bool, default True
The dropna parameter controls whether missing values are ignored when calculating the mode. With the default dropna=True, missing values (NaN) do not participate in the mode calculation. See Example 2.
- True: missing values (NaN) do not participate in the mode calculation (default).
⚠️ Note:
dropna=True only means that missing values (NaN) are not counted when calculating the mode; the result may still contain NaN as placeholders.
- False: missing values (NaN) participate in the mode calculation.
⚠️ Note:
If NaN is the most frequent value, the mode returned is a missing value (NaN).
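The difference between the two settings shows up when NaN is the most frequent value. A sketch with a made-up single-column DataFrame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"v": [1.0, np.nan, np.nan, 2.0]})

# dropna=True (default): NaN is ignored, so 1.0 and 2.0 tie as modes.
print(df.mode())

# dropna=False: NaN is counted and appears most often (twice), so it is the mode.
kept = df.mode(dropna=False)
print(kept)
```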
numeric_only only calculates rows or columns of pure numeric type
- numeric_only : bool, default False
The numeric_only parameter controls whether only numeric data is included in the calculation. See Example 3.
- False: include columns of all data types (default).
- True: only include numeric columns. With axis=1, the mode of each row is then computed from the numeric columns only.
⚠️ Note:
Integer, floating-point, Boolean and complex columns all count as numeric and take part in the calculation.
For complex columns, a missing-value (NaN) placeholder in the returned DataFrame is displayed as NaN+0.0j.
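A sketch with made-up data showing which columns numeric_only=True keeps: the Boolean column is treated as numeric, while the string column is dropped:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "label": ["a", "b", "a"],     # object (string) column
        "count": [1, 2, 1],           # int64 column
        "flag": [True, True, False],  # bool column
    }
)

# numeric_only=True drops "label"; "count" and "flag" are kept.
numeric_modes = df.mode(numeric_only=True)
print(numeric_modes)
```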
Related methods:
- Series.mode: mode of a Series
- Series.value_counts: frequency (count of each distinct value)
- DataFrame.value_counts: frequency (count of each distinct row)
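Series.value_counts is closely related to mode: the modes are the values whose count equals the highest count. The sketch below only illustrates that relationship on made-up data; it is not meant to describe how pandas computes mode internally.

```python
import pandas as pd

s = pd.Series(["a", "b", "a", "c", "b"])

counts = s.value_counts()                   # a: 2, b: 2, c: 1
top = counts[counts == counts.max()].index  # values with the highest count
print(sorted(top))                          # ['a', 'b']
print(s.mode().tolist())                    # ['a', 'b']
```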
Examples:
Example 1: Calculate the mode of each column or row
Example 1-1. Create demo data
```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [("bird", 2, 2), ("mammal", 4, 1), ("arthropod", 8, 0), ("bird", 2, np.nan)],
    index=("falcon", "horse", "spider", "ostrich"),
    columns=("species", "legs", "wings"),
)
df
```
| | species | legs | wings |
|---|---|---|---|
| falcon | bird | 2 | 2.0 |
| horse | mammal | 4 | 1.0 |
| spider | arthropod | 8 | 0.0 |
| ostrich | bird | 2 | NaN |
Example 1-2. To calculate the mode of each column, pass axis=0 or keep the default (omit the axis parameter).
```python
df.mode()  # Equivalent to df.mode(axis=0)
```
| | species | legs | wings |
|---|---|---|---|
| 0 | bird | 2.0 | 0.0 |
| 1 | NaN | NaN | 1.0 |
| 2 | NaN | NaN | 2.0 |
Note the wings column: it has three modes (0.0, 1.0 and 2.0), so the species and legs columns, which have only one mode each, are padded with missing values (NaN). The padding is also why legs is shown as 2.0 rather than 2.
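When only one mode per column is needed, for example to fill missing values, a common pattern is to take the first row of the result with .iloc[0]. A sketch on the demo data; for wings, where three values tie, this simply picks the first of the sorted modes (0.0), which is an arbitrary choice:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [("bird", 2, 2), ("mammal", 4, 1), ("arthropod", 8, 0), ("bird", 2, np.nan)],
    index=("falcon", "horse", "spider", "ostrich"),
    columns=("species", "legs", "wings"),
)

# First row of df.mode(): one mode per column (ties resolved by taking the first).
first_modes = df.mode().iloc[0]
print(first_modes)

# Fill each column's missing values with that column's first mode.
filled = df.fillna(first_modes)
print(filled)
```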
Example 1-3. To calculate the mode of each row, pass axis=1.
```python
df.mode(axis=1)
```
```
C:\Users\Administrator\AppData\Local\Temp\ipykernel_11456\1210916842.py:1: UserWarning: Unable to sort modes: '<' not supported between instances of 'int' and 'str'
  df.mode(axis=1)
```
| | 0 | 1 | 2 |
|---|---|---|---|
| falcon | 2 | NaN | NaN |
| horse | mammal | 4.0 | 1.0 |
| spider | arthropod | 8.0 | 0.0 |
| ostrich | bird | 2.0 | NaN |
Look closely at this result: every NaN in it is a placeholder. The falcon row has a single mode (2 and 2.0 count as the same value), and the ostrich row has two modes, bird and 2, because its missing wings value is ignored under the default dropna=True. With dropna=False, a NaN in the result may instead be a genuine mode, and with larger data it can be hard to tell the two kinds of NaN apart, so choose the dropna setting with this in mind.
The UserWarning printed above (Unable to sort modes: '<' not supported between instances of 'int' and 'str') does not affect the correctness of the result. It only reports that the modes of a row could not be sorted because the row mixes strings and numbers.
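If row-wise modes are only needed for the numeric columns, one option (a sketch, not the only approach) is to combine axis=1 with numeric_only=True: each row then holds numbers only, so the modes can be sorted and the sorting warning should not appear. The warnings check is included only to demonstrate this.

```python
import warnings

import numpy as np
import pandas as pd

df = pd.DataFrame(
    [("bird", 2, 2), ("mammal", 4, 1), ("arthropod", 8, 0), ("bird", 2, np.nan)],
    index=("falcon", "horse", "spider", "ostrich"),
    columns=("species", "legs", "wings"),
)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Only legs and wings are used, so every row contains numbers only.
    row_modes = df.mode(axis=1, numeric_only=True)

print(row_modes)
sort_warnings = [w for w in caught if "Unable to sort modes" in str(w.message)]
print(len(sort_warnings))  # 0
```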
Example 2: Control whether missing values participate in the mode calculation
Example 2-1. Construct demo data
```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame(
    [
        ("string1", 2, 2),
        ("string2", 4, 1),
        ("string1", 8, np.nan),
        ("string4", 2, np.nan),
    ],
    index=("Row 1", "Row 2", "Row 3", "Row 4"),
    columns=("Column 1", "Column 2", "Column 3"),
)
df2
```
| | Column 1 | Column 2 | Column 3 |
|---|---|---|---|
| Row 1 | string1 | 2 | 2.0 |
| Row 2 | string2 | 4 | 1.0 |
| Row 3 | string1 | 8 | NaN |
| Row 4 | string4 | 2 | NaN |
Example 2-2. Look at the demo data. With the default dropna=True (missing values do not participate in the mode calculation), the mode of Column 1 is string1, the mode of Column 2 is 2, and Column 3 has two modes, 1.0 and 2.0:
```python
df2.mode()
```
| | Column 1 | Column 2 | Column 3 |
|---|---|---|---|
| 0 | string1 | 2.0 | 1.0 |
| 1 | NaN | NaN | 2.0 |
Note that the NaN values in this result are placeholders, not results of the mode calculation.
Example 2-3. With dropna=False, NaN participates in the mode calculation:
```python
df2.mode(dropna=False)
```
| | Column 1 | Column 2 | Column 3 |
|---|---|---|---|
| 0 | string1 | 2 | NaN |
In this result, the NaN in Column 3 is the mode itself (NaN appears twice, more often than any other value in that column), not a placeholder.
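With dropna=False, one way to check whether a NaN in the result is a genuine mode is to compare each column's NaN count with its highest value count. The helper nan_is_mode below is hypothetical (not part of pandas) and is written only for this sketch:

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame(
    [
        ("string1", 2, 2),
        ("string2", 4, 1),
        ("string1", 8, np.nan),
        ("string4", 2, np.nan),
    ],
    index=("Row 1", "Row 2", "Row 3", "Row 4"),
    columns=("Column 1", "Column 2", "Column 3"),
)


def nan_is_mode(column: pd.Series) -> bool:
    """Hypothetical helper: True if NaN is among the modes when dropna=False."""
    n_missing = int(column.isna().sum())
    if n_missing == 0:
        return False
    # value_counts(dropna=False) also counts NaN, so max() is the top frequency.
    return n_missing == column.value_counts(dropna=False).max()


for name in df2.columns:
    print(name, nan_is_mode(df2[name]))
```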
Example 3: Calculate only rows or columns whose data type is numeric
Example 3-1. Construct demo data
```python
import numpy as np
import pandas as pd

df3 = pd.DataFrame(
    [
        ("string1", 2, 2, True, 1 + 2j),
        ("string2", 4, 1, True, 1 + 2j),
        ("string1", 8, np.nan, False, 1 + 2j),
        ("string4", 2, np.nan, True, 1 + 3j),
    ],
    index=("Row 1", "Row 2", "Row 3", "Row 4"),
    columns=("String type", "Integer type", "Floating point type", "Boolean type", "Complex type"),
)
df3
```
| | String type | Integer type | Floating point type | Boolean type | Complex type |
|---|---|---|---|---|---|
| Row 1 | string1 | 2 | 2.0 | True | 1.0+2.0j |
| Row 2 | string2 | 4 | 1.0 | True | 1.0+2.0j |
| Row 3 | string1 | 8 | NaN | False | 1.0+2.0j |
| Row 4 | string4 | 2 | NaN | True | 1.0+3.0j |
Note the demo data: the values of the Floating point type column were entered as integers, but because the column contains missing values (NaN is a floating-point value), pandas stores the column as floating point, so 2 and 1 are shown as 2.0 and 1.0.
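The column types described in the note above can be checked with DataFrame.dtypes. A sketch that rebuilds the demo data; the listed types are what pandas 2.2 is expected to infer, with the default object dtype for strings:

```python
import numpy as np
import pandas as pd

df3 = pd.DataFrame(
    [
        ("string1", 2, 2, True, 1 + 2j),
        ("string2", 4, 1, True, 1 + 2j),
        ("string1", 8, np.nan, False, 1 + 2j),
        ("string4", 2, np.nan, True, 1 + 3j),
    ],
    index=("Row 1", "Row 2", "Row 3", "Row 4"),
    columns=("String type", "Integer type", "Floating point type", "Boolean type", "Complex type"),
)

# Expected: object, int64, float64 (because of NaN), bool, complex128
print(df3.dtypes)
```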
Example 3-2. Calculate the modes of the numeric columns only, ignoring missing values.
```python
df3.mode(numeric_only=True, dropna=True)
```
| | Integer type | Floating point type | Boolean type | Complex type |
|---|---|---|---|---|
| 0 | 2.0 | 1.0 | True | 1.0+2.0j |
| 1 | NaN | 2.0 | NaN | NaN+0.0j |
Note that every NaN in this result, including NaN+0.0j, is a placeholder, not a mode.
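The padding also changes some column types, which is why the complex placeholder is displayed as NaN+0.0j: the Complex type column stays complex, so its placeholder is a complex NaN, while the Integer type column becomes floating point and the Boolean type column falls back to object. A sketch that inspects this on the demo data:

```python
import numpy as np
import pandas as pd

df3 = pd.DataFrame(
    [
        ("string1", 2, 2, True, 1 + 2j),
        ("string2", 4, 1, True, 1 + 2j),
        ("string1", 8, np.nan, False, 1 + 2j),
        ("string4", 2, np.nan, True, 1 + 3j),
    ],
    index=("Row 1", "Row 2", "Row 3", "Row 4"),
    columns=("String type", "Integer type", "Floating point type", "Boolean type", "Complex type"),
)

result = df3.mode(numeric_only=True, dropna=True)

# Column types after padding with NaN placeholders
print(result.dtypes)

# With dropna=True, non-missing cells per column = number of genuine modes
print(result.notna().sum())
```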