Sunday, September 8, 2024
Pandas.DataFrame.mode(): Detailed explanation of the mode, including code and test data sets (continuously updated with the pandas version)

About the pandas version: This article is written against pandas 2.2.0.

Updates to this article: As new stable versions of pandas are released, this article will continue to be updated, improved, and supplemented.

Directory of this section

  • Pandas.DataFrame.mode()
  • Syntax:
  • Return value:
  • Parameter Description:
    • axis specifies the calculation direction (row or column)
    • dropna ignores missing values
    • numeric_only only calculates rows or columns of pure numeric type
  • Related methods:
  • Example:
    • Example 1: Calculate the mode of each column or row
    • Example 2: Control whether missing values participate in the mode calculation (dropna)
    • Example 3: Calculate only rows or columns whose data type is numeric

Pandas.DataFrame.mode()

The DataFrame.mode method returns the mode(s) of each column or each row.

⚠️ Note:

The mode is the value that appears most frequently in a row or column. It can be of any type, including string.

There can be multiple modes.

If all elements appear the same number of times, all of them are returned as modes.

Floating point numbers and integers with equal values are treated as the same element when calculating the mode. For example, if 1.0 and 1 appear in the same row or column, this counts as the number 1 appearing twice.

Missing-value placeholder: when the rows or columns that make up a DataFrame are not of equal length (the number of elements differs), the empty positions are filled with a missing value (NaN).
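The notes above can be checked with a minimal sketch. The frames `tied` and `mixed` are hypothetical demo data, not the data sets used in the examples of this article.

```python
import pandas as pd

# Every value appears exactly once, so all of them are returned as modes.
tied = pd.DataFrame({"a": [3, 1, 2]})
tied_modes = tied.mode()["a"].tolist()
print(tied_modes)

# 1 and 1.0 end up in one float column and count as the same value,
# so that value appears twice and is the only mode.
mixed = pd.DataFrame({"a": [1, 1.0, 2]})
mixed_modes = mixed.mode()["a"].tolist()
print(mixed_modes)
```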

Syntax:

DataFrame.mode(axis=0, numeric_only=False, dropna=True)

return value:

  • DataFrame

Returns the modes of each column or row as a DataFrame.

  • When columns (or rows) have different numbers of modes, the shorter ones are padded with missing values (NaN). See Example 1.
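The padding behaviour can be seen in a small sketch. The frame `df_pad` and its column names are made up for illustration.

```python
import pandas as pd

# "one_mode" has a single mode (1); "two_modes" has two modes (5 and 6).
df_pad = pd.DataFrame({"one_mode": [1, 1, 2, 3], "two_modes": [5, 5, 6, 6]})
padded = df_pad.mode()

# The result has as many rows as the column with the most modes.
print(padded.shape)
# True marks a NaN that is only a placeholder.
print(padded["one_mode"].isna().tolist())
```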

Parameter Description:

axis specifies the calculation direction (row or column)

  • axis : {0 or 'index', 1 or 'columns'}, default 0

The axis parameter specifies the calculation direction, that is, whether to calculate the mode of each column or of each row. See Example 1.

  • 0 or 'index': Calculate the mode of each column.
  • 1 or 'columns': Calculate the mode of each row.
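As a quick illustration of the two directions, here is a sketch on a purely numeric frame. The frame `df_axis` is hypothetical and was chosen so that every column and every row has exactly one mode.

```python
import pandas as pd

df_axis = pd.DataFrame(
    {"a": [5, 5, 7, 8], "b": [5, 2, 7, 7], "c": [1, 2, 2, 8]}
)

# axis=0: one result column per original column (a, b, c).
by_column = df_axis.mode(axis=0)
# axis=1: one result row per original row, labelled by the original index.
by_row = df_axis.mode(axis=1)

print(by_column)
print(by_row)
```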

dropna ignores missing values

  • dropna : bool, default True

The dropna parameter controls whether missing values are ignored when calculating the mode. With the default dropna=True, missing values (NaN) do not participate in the mode calculation. See Example 2.

  • True: Missing values (NaN) do not participate in the mode calculation (default).

⚠️ Note:

dropna=True only excludes missing values (NaN) from the counting. NaN can still appear in the result as a placeholder.

  • False: Missing values (NaN) participate in the mode calculation.

⚠️ Note:

If missing values are the most frequent, the mode returned is itself a missing value (NaN).
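A minimal sketch of the difference between the two settings, using a hypothetical one-column frame `df_na`:

```python
import numpy as np
import pandas as pd

df_na = pd.DataFrame({"x": [1.0, np.nan, np.nan]})

# dropna=True (default): NaN is not counted, so 1.0 is the only mode.
ignoring_nan = df_na.mode()
# dropna=False: NaN occurs twice and 1.0 once, so the mode is NaN.
counting_nan = df_na.mode(dropna=False)

print(ignoring_nan)
print(counting_nan)
```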

numeric_only only calculates rows or columns of pure numeric type

  • numeric_only : bool, default False

The numeric_only parameter controls whether only rows or columns of pure numeric type are calculated. See Example 3.

  • False: Calculate rows or columns of all data types (default).
  • True: Only calculate rows or columns of numeric type.

⚠️ Note:

With numeric_only=True, integers, floating point numbers, Boolean values, and complex numbers all count as numeric and participate in the calculation.

For complex-type columns, if a missing value (NaN) is needed as a placeholder in the returned DataFrame, it is displayed as NaN+0.0j.
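A short sketch of which columns survive numeric_only=True. The frame `df_types` and its column names are assumptions made for this illustration.

```python
import pandas as pd

df_types = pd.DataFrame(
    {"name": ["a", "a", "b"], "n": [1, 1, 2], "flag": [True, True, False]}
)

# The string column "name" is left out; the integer and Boolean columns remain.
numeric_modes = df_types.mode(numeric_only=True)
print(numeric_modes.columns.tolist())
print(numeric_modes)
```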

Related methods:

  • Series.mode: mode of a Series
  • Series.value_counts: frequency (count of each unique value)
  • DataFrame.value_counts: frequency (count of each unique row)
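The relationship between the mode and the frequency counts can be sketched as follows: the modes are the values whose count equals the highest count. The Series `species` is demo data introduced here for illustration only.

```python
import pandas as pd

species = pd.Series(["bird", "mammal", "arthropod", "bird"])

# Count how often each value occurs, then keep the values with the top count.
counts = species.value_counts()
most_frequent = sorted(counts[counts == counts.max()].index)

# Series.mode should report the same values.
modes = species.mode().tolist()
print(most_frequent, modes)
```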

Example:

Example 1: Calculate the mode of each column or row

Example 1-1. Create demo data

import numpy as np
import pandas as pd

df = pd.DataFrame(
    [("bird", 2, 2), ("mammal", 4, 1), ("arthropod", 8, 0), ("bird", 2, np.nan)],
    index=("falcon", "horse", "spider", "ostrich"),
    columns=("species", "legs", "wings"),
)
df 
           species  legs  wings
falcon        bird     2    2.0
horse       mammal     4    1.0
spider   arthropod     8    0.0
ostrich       bird     2    NaN

Example 1-2. To calculate the mode of each column, you can pass axis=0 or keep the default (without passing the axis parameter)

df.mode()  # Equivalent to df.mode(axis=0)
   species  legs  wings
0     bird   2.0    0.0
1      NaN   NaN    1.0
2      NaN   NaN    2.0

Note the wings column: because it has three modes while species and legs each have only one, the remaining rows of those two columns are filled with missing values (NaN).

Example 1-3. To calculate the mode of each row, pass axis=1.

df.mode(axis=1) 
C:\Users\Administrator\AppData\Local\Temp\ipykernel_11456\1210916842.py:1: UserWarning: Unable to sort modes: '<' not supported between instances of 'int' and 'str'
  df.mode(axis=1) 
                 0    1    2
falcon           2  NaN  NaN
horse       mammal  4.0  1.0
spider   arthropod  8.0  0.0
ostrich       bird  2.0  NaN

Look closely at the NaN values in this result. The falcon row has a single mode (2, because 2 and 2.0 count as the same value), so its two NaN values are placeholders. In the ostrich row the original data contains a NaN, but with the default dropna=True it is excluded from the calculation, so the NaN shown there is also a placeholder. When the amount of data is large, it can be hard to tell whether a returned NaN is a placeholder or an actual mode. Use the dropna parameter to control whether missing values are included when calculating the mode.

The UserWarning shown above ("Unable to sort modes: '<' not supported between instances of 'int' and 'str'") does not affect the correctness of the result. It only tells you that the modes could not be sorted because the values have different data types.
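If only the numeric columns matter, one possible way to keep each row free of mixed types is to combine axis=1 with numeric_only=True. This is a sketch under that assumption, rebuilding the Example 1 data so that it runs on its own; with strings excluded, the sorting warning should not be triggered.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [("bird", 2, 2), ("mammal", 4, 1), ("arthropod", 8, 0), ("bird", 2, np.nan)],
    index=("falcon", "horse", "spider", "ostrich"),
    columns=("species", "legs", "wings"),
)

# Only "legs" and "wings" take part, so every row holds comparable numbers.
row_modes = df.mode(axis=1, numeric_only=True)
print(row_modes)
```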

Example 2: Control whether missing values participate in the mode calculation (dropna)

Example 2-1. Create demo data

import numpy as np
import pandas as pd

df2 = pd.DataFrame(
    [
        ("string1", 2, 2),
        ("string2", 4, 1),
        ("string1", 8, np.nan),
        ("string4", 2, np.nan),
    ],
    index=("Row 1", "Row 2", "Row 3", "Row 4"),
    columns=("Column 1", "Column 2", "Column 3"),
)
df2
       Column 1  Column 2  Column 3
Row 1   string1         2       2.0
Row 2   string2         4       1.0
Row 3   string1         8       NaN
Row 4   string4         2       NaN

Example 2-2. With the default dropna=True (missing values do not participate in the mode calculation), the mode of column 1 is string1, the mode of column 2 is 2, and the modes of column 3 are 1.0 and 2.0, for example:

df2.mode() 
  Column 1  Column 2  Column 3
0  string1       2.0       1.0
1      NaN       NaN       2.0

Note that the NaN values in this result are placeholders, not results of the mode calculation.

Example 2-3. When dropna=False, NaN will participate in the mode calculation, for example:

df2.mode(dropna=False) 
  Column 1  Column 2  Column 3
0  string1         2       NaN

Note that this time the NaN in column 3 is the mode itself, not a placeholder.
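To decide whether NaN would be a real mode of a column rather than a placeholder, one option is to compare the NaN count with the highest frequency. The helper `nan_is_a_mode` below is a hypothetical function written for this article, not part of pandas.

```python
import numpy as np
import pandas as pd


def nan_is_a_mode(series: pd.Series) -> bool:
    """Return True if NaN is among the most frequent values of `series`."""
    nan_count = int(series.isna().sum())
    if nan_count == 0:
        return False
    # value_counts(dropna=False) counts NaN like any other value.
    top_count = int(series.value_counts(dropna=False).max())
    return nan_count == top_count


# Same values as column 3 of the Example 2 data: NaN appears twice.
col3 = pd.Series([2, 1, np.nan, np.nan])
print(nan_is_a_mode(col3))
```

If the helper returns True, a NaN returned by mode(dropna=False) for that column is a genuine mode.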

Example 3: Calculate only rows or columns whose data type is numeric

Example 3-1. Create demo data

import numpy as np
import pandas as pd

df3 = pd.DataFrame(
    [
        ("string1", 2, 2, True, 1 + 2j),
        ("string2", 4, 1, True, 1 + 2j),
        ("string1", 8, np.nan, False, 1 + 2j),
        ("string4", 2, np.nan, True, 1 + 3j),
    ],
    index=("Row 1", "Row 2", "Row 3", "Row 4"),
    columns=("string type", "integer type", "floating point type", "Boolean type", "complex type"),
)
df3
       string type  integer type  floating point type  Boolean type  complex type
Row 1      string1             2                  2.0          True      1.0+2.0j
Row 2      string2             4                  1.0          True      1.0+2.0j
Row 3      string1             8                  NaN         False      1.0+2.0j
Row 4      string4             2                  NaN          True      1.0+3.0j

Note that the values in the floating point type column were written as integers when the data was created. Because the column contains missing values, it is stored as floating-point numbers (NaN is itself a floating-point value), so 2 and 1 are displayed as 2.0 and 1.0.

Example 3-2. Calculate only the numeric columns, without counting missing values.
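The conversion described above can be observed directly on two small Series. The names `without_nan` and `with_nan` are chosen here for illustration only.

```python
import numpy as np
import pandas as pd

# Integers only: the Series keeps an integer dtype.
without_nan = pd.Series([2, 1])
# The same integers plus NaN: the Series is stored as floating point.
with_nan = pd.Series([2, 1, np.nan, np.nan])

print(without_nan.dtype)
print(with_nan.dtype)
```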

df3.mode(numeric_only=True, dropna=True)
   integer type  floating point type  Boolean type  complex type
0           2.0                  1.0          True      1.0+2.0j
1           NaN                  2.0           NaN      NaN+0.0j

Note that every NaN in this result is a placeholder, not a mode.
