Sunday, September 8, 2024
HomepandasPandas.DataFrame.loc Filtering data - Detailed explanation of label method including code and...

Pandas.DataFrame.loc[ ] Filtering data – Detailed explanation of label method including code and test data set will be continuously updated with the Pandas version

About Pandas version: This article is written based on pandas2.2.0.

Updates on the content of this article: As the stable version of pandas changes, this article continues to be updated and continuously improved and supplemented.

Portal: Pandas API Reference Directory

Portal: Pandas version updates and new features

Portal: Pandas tutorial series from easy to deep

Directory of this section

  • Pandas.DataFrame.loc[]
    • grammar:
  • return value:
  • Syntax description:
      1. Filter 1 row, filter 1 column, and filter cells
    1. Filter multiple rows and columns
    1. Scope screening
    1. Boolean filtering
    1. Callable filtering
  • Related methods:
  • Example:

Pandas.DataFrame.loc[]

The Pandas.DataFrame.loc[] method is used to filter DataFrame data by index, column name.

  • Notice! In this method, the number you pass is not interpreted as a natural index, but is passed as a string to DataFrame.loc as the value of the row index, or the value of the column name.
  • > ⚠️ Note:
    >
    > * In this method, the number you pass will not be understood as a natural index, but will only be passed as a string to DataFrame.loc as the value of the row index, or the value of the column name.
    > * If you perform range filtering on a DataFrame with multi-level index, it must be sorted first. It is recommended to use df.sort_index(inplace=True) to sort and then perform range filtering.
    > * Support assignment of original data after filtering Example 10

grammar:

DataFrame.loc ['row index','column name']

return value:

  • Series or DataFrame or Scalar
  • The filtering range, whether the DataFrame has multi-level indexes, etc. will all affect the specific return form.
  • If the filter result is Series or Scalar, the filter condition can be forced to be returned in DataFrame style by enclosing [ ] square brackets. example 1

Syntax description:

1. Filter 1 row, filter 1 column, and filter cells

  • DataFrame.loc[index, column name] Example 1

Index filter conditions and column name filter conditions are separated by commas.

  • Filter 1 row: DataFrame.loc[_'index'_,: ] When only passing index conditions, the red commas and colons can be omitted. Red colons indicate all columns.

  • Filter 1 column: DataFrame.loc[:, _'Column name'_] _The red colon must be present, indicating all rows. _

  • Filter cells: DataFrame.loc[_'index'_, _'column name'_]

If DataFrame has multiple levels of indexes and column names, when you want to filter non-top data, you need to use tuple to pass the level of index and column names. Example 2

2. Filter multiple rows and columns

  • DataFrame.loc[['Index 1', 'Index 2', …],['Column name 1', 'Column name 2', …]]Example 3

Multiple index filter conditions are wrapped in square brackets, and multiple column name filter conditions are wrapped in square brackets. The two conditions are separated by commas.

  • Filter multiple rows: DataFrame.loc[[_'Index 1', 'Index 2'_, …], ] _When only passing index conditions, the red comma can be omitted. _
  • Filter multiple columns: DataFrame.loc[, [_'Column Name 1', 'Column Name 2'_, …]]
  • Filter multiple rows and columns at the same time: DataFrame.loc[[_'index 1','index 2', …], ['column name 1','column name 2', … _]]
  • > ⚠️ Note:
    >
    > * Multiple conditions must be wrapped in [ ] square brackets!
    >
    > * Unlike Python slice, elements at the beginning and end positions wrapped by [ ] will be included in the filter condition.
    >
    > * If DataFrame has multiple levels of indexes and column names, when you want to filter non-top-level data, you need to use tuple to pass the level of index and column names. Example 4
    >

3. Scope filtering

  • DataFrame.loc[['Index1':'Index2'] Example 5

Supports line range filtering, and the starting and ending ranges are separated by English colons. Range filtering of columns is not supported.

  • Filter only the row range: DataFrame.loc[[_'Index 1':'Index 2'_], ] _When only passing index conditions, the red comma can be omitted. _

  • **Filter row range + filter 1 column: ** DataFrame.loc[[_'index 1':'index 2'_], _'column name 1'_]

  • **Filter row range + filter multiple columns: ** DataFrame.loc[[_'index 1':'index 2'], ['column name 1','column name 2', …_ ]]

  • ⚠️ Note:

    • The start and end ranges must be separated by: English colon!

    • Range must be wrapped in [ ] square brackets!

    • Unlike Python slice, the range, start and end positions wrapped by [ ] will be included in the filter condition.

If DataFrame has multiple levels of indexes, column names, and starting range, it must be accurate to the lowest index or column name. Because the top-level index and column name may represent multiple rows or columns, they cannot be used as start conditions. Example 6

If you perform range filtering on a DataFrame with a multi-level index, it must be sorted first. It is recommended to use df.sort_index(inplace=True) to sort and then perform range filtering.

4. Boolean filtering

  • DataFrame.loc[row conditions, column conditions]

  • Row filtering: You can pass a boolean list with the same length as the row index indicating which rows to keep and which rows to discard. Example 7

  • Row filtering: Rows can be filtered using Boolean operations. If the number of Boolean operations exceeds 3, it is recommended to use advanced indexing

  • ⚠️ Note:

    Boolean operations on rows are done through column names. Expressed in the form of df[column name]. Example 8

    For multiple conditions, you can use &, | to express union or, but you cannot use and, or. Example 8

  • Column filtering: Boolean operations are not supported.

5. Callable filtering

  • DataFrame.loc[Callable]

You can use Callable to filter, which is also a Boolean filter in principle. Example 9

Related methods:

➡️ Related methods


  • DataFrame.at

    Access a single value for a row/column label pair.

  • DataFrame.iloc

Filter data-natural index method

  • DataFrame.xs

    Returns a cross-section (row(s) or column(s)) from the Series/DataFrame.

  • Series.loc

    Access group of values using labels.

Example:

Example 1: DataFrame with only single-level index, filtering single data

    1. Filter 1 row and return Series by default. If you apply [ ] to the filtering conditions, you can force the return to DataFrame.
import pandas as pd

# ConstructDF
df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
     index=['cobra', 'viper', 'sidewinder'],
     columns=['max_speed', 'shield'])

# General single row filtering,returnSeries
df.loc['cobra']
# ... max_speed    1
# ... shield       2
# ... Name: cobra, dtype: int64

# Single row filter,Apply conditions[ ],Forced return DataFrame
df.loc[['cobra']]
# ...        max_speed  shield
# ... cobra          1       2 
    1. Filter 1 column and return Series by default. If you apply the filter condition to [ ], you can force the return to DataFrame.
# General single column filter'max_speed',returnSeries
df.loc[:,'max_speed']
# ... cobra         1
# ... viper         4
# ... sidewinder    7
# ... Name: max_speed, dtype: int64

# Single column filter,Apply conditions[ ],Forced return DataFrame
df.loc[:,['max_speed']]
# ...           max_speed
# ... cobra 1
# ... viper 4
# ... sidewinder    7 
    1. When filtering cells, the scalar value Scalar is returned by default. If the filtering condition is set to [ ], it can be forced to return DataFrame.
# General cell filtering,Returns a scalar value
df.loc['cobra', 'max_speed']
# ... 1

# Apply filter conditions [ ],Can force return DataFrame
df.loc[['cobra'], ['max_speed']]
# ...       max_speed
# ... cobra     1 

Example 2: Multi-level indexed DataFrame, filtering data

    1. Construct demonstration data and observe data content
import pandas as pd

# Build demo data
tuples = [
   ('shooter', 'Trolls'), ('shooter', 'Necrons'),
   ('mage', 'Trolls'), ('mage', 'Necrons'),
   ('warrior', 'Trolls'), ('warrior', 'Necrons')
]
index = pd.MultiIndex.from_tuples(tuples)
values = [[9, 20], [10, 18], [7, 23],
        [6, 25], [4, 30], [3, 35]]
df = pd.DataFrame(values, columns=[['Attributes1','Attributes2'], ['Attack speed','Attack power']], index=index)

# Observe data content
df 
    1. Filter 1 row or 1 column of top-level index and pass the conditions normally.
# Filter top row index
df.loc['shooter']
# ...        Attributes1     Attributes2
# ...        Attack speed   Attack power
# ... Trolls        9      20
# ... Necrons      10      18

# Filter on top column index
df.loc[:,'Attributes1']
# ...               Attack speed
# ... shooter   Trolls     9
# ...       Necrons   10
# ... mage  Trolls     7
# ...       Necrons    6
# ... warrior   Trolls     4
# ...       Necrons    3 
    1. To filter non-top-level indexes and column names, you need to use tuples to set the conditions.
# Filter the bottom row
df.loc[('shooter','Trolls')]
# ... Attributes1  Attack speed      9
# ... Attributes2  Attack power    20
# ... Name: (shooter, Trolls), dtype: int64

# Filter the underlying index,Top level column name
df.loc[('shooter','Trolls'),'Attributes1']
# ... Attack speed    9
# ... Name: (shooter, Trolls), dtype: int64

# Filter the underlying index,underlying column names
df.loc[('shooter','Trolls'),('Attributes1','Attack speed')]
# ... 9 
9 
    1. Multi-level index filtering, the conditions are set with [], and you can also force the return of DataFrame
df.loc[[('shooter','Trolls')],['Attributes1']]
# ...              Attributes1
# ...              Attack speed
# ... shooter   Trolls   9 

Example 3: Single-level indexed DataFrame, filtering multiple pieces of data

import pandas as pd

# ConstructDF
df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
     index=['cobra', 'viper', 'sidewinder'],
     columns=['max_speed', 'shield'])

# Filter multiple rows
df.loc[['cobra', 'viper']]
# ...       max_speed   shield
# ... cobra         1        2
# ... viper         4        5

# Filter multiple rows、single column
df.loc[['cobra', 'viper'],'max_speed']
# ... cobra    1
# ... viper    4
# ... Name: max_speed, dtype: int64

# Filter multiple rows、multiple columns
df.loc[['cobra', 'viper'],['max_speed', 'shield']]
# ...        max_speed  shield
# ... cobra          1       2
# ... viper          4       5 

Example 4: DataFrame with multi-level index, filtering multiple pieces of data

import pandas as pd

# Build demo data
tuples = [
   ('shooter', 'Trolls'), ('shooter', 'Necrons'),
   ('mage', 'Trolls'), ('mage', 'Necrons'),
   ('warrior', 'Trolls'), ('warrior', 'Necrons')
]
index = pd.MultiIndex.from_tuples(tuples)
values = [[9, 20], [10, 18], [7, 23],
        [6, 25], [4, 30], [3, 35]]
df = pd.DataFrame(values, columns=[['Attributes1','Attributes2'], ['Attack speed','Attack power']], index=index)

# Filter multiple top row indexes
df.loc[['shooter','mage']]
# ...               Attributes1 Attributes2
# ...               Attack speed    Attack power
# ... shooter   Trolls     9       20
# ...       Necrons   10       18
# ... mage  Trolls     7       23
# ...       Necrons    6       25

# Filter multiple underlying navigation indexes
df.loc[[('shooter','Trolls'),('mage','Necrons')]]
# ...              Attributes1  Attributes2
# ...               Attack speed    Attack power
# ... shooter   Trolls     9       20
# ... mage  Necrons    6       25

# OK、Column combination conditions
df.loc[('shooter','Trolls'),('Attributes2','Attack power')]
# ... 20

# Filter multiple rows at the same time、multiple columns
df.loc[[('shooter','Trolls'),('mage','Necrons')],[('Attributes1','Attack speed'),('Attributes2','Attack power')]]
# ...              Attributes1  Attributes2
# ...              Attack speed    Attack power
# ... shooter   Trolls    9       20
# ... mage  Necrons   6       25 

Example 5: DataFrame with single-level index, filtering row range

import pandas as pd

# ConstructDF
df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
     index=['cobra', 'viper', 'sidewinder'],
     columns=['max_speed', 'shield'])

# Filter only a range of rows
df.loc['viper':'sidewinder']
# ...           max_speed   shield
# ... viper             4        5
# ... sidewinder        7        8

# row range + 1List
df.loc['viper':'sidewinder','shield']
# ... viper         5
# ... sidewinder    8
# ... Name: shield, dtype: int64

# row range + multiple columns
df.loc['viper':'sidewinder',['max_speed','shield']]
# ...             max_speed shield
# ... viper               4      5
# ... sidewinder          7      8 

**Example 6: For a multi-level indexed DataFrame, the starting range of the filtered row range must be accurate to the bottom index or column name. **

import pandas as pd

# Build demo data
tuples = [
   ('shooter', 'Trolls'), ('shooter', 'Necrons'),
   ('warrior', 'Trolls'), ('warrior', 'Necrons'),
   ('mage', 'Trolls'), ('mage', 'Necrons')
]
index = pd.MultiIndex.from_tuples(tuples)
values = [[9, 20], [10, 18], [7, 23],
        [6, 25], [4, 30], [3, 35]]
df = pd.DataFrame(values, columns=[['Attributes1','Attributes2'], ['Attack speed','Attack power']], index=index)

# Observation data
# df
#                Attributes1    Attributes2
#                Attack speed   Attack power
# shooter   Trolls      9      20
#       Necrons    10      18
# warrior   Trolls      4      30
#       Necrons     3      35
# mage  Trolls      7      23
#       Necrons     6      25

# Filter from Archer to Mage,Even if they are all top-level indexes,The starting position of the range condition,It must also be accurate to the troll clan,Means to specify this line。Because the top index、List,May represent multiple rows or columns,This cannot be used as a start condition。
df.loc[('shooter','Trolls'):'warrior']
#                Attributes1    Attributes2
#                Attack speed   Attack power
# shooter   Trolls      9      20
#       Necrons    10      18
# warrior   Trolls      4      30
#       Necrons     3      35

# Filter row range + column range
df.loc[('shooter','Trolls'):('warrior', 'Trolls'),('Attributes1','Attack speed'):('Attributes2', 'Attack power')]
#                Attributes1    Attributes2
#                Attack speed   Attack power
# shooter   Trolls      9      20
#       Necrons    10      18
# warrior   Trolls      4      30 

Example 7: Pass a boolean list indicating which rows are left

import pandas as pd

# Build demo data
df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
     index=['cobra', 'viper', 'sidewinder'],
     columns=['max_speed', 'shield'])

# boolean list
list_bool = [False, False, True]

# Pass in boolean list,Only keep the first3OK'sidewinder'
df.loc[list_bool]
# ...             max_speed shield
# ... sidewinder          7      8 

Example 8: Filter rows with Boolean operations

import pandas as pd

# Build demo data
df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
     index=['cobra', 'viper', 'sidewinder'],
     columns=['Movement speed', 'Armor'])

# filter 'Movement speed column' > 6 Target
df.loc[df['Movement speed'] > 6]
# ...             Movement speed    Armor
# ... sidewinder    7     8

# filter 'Movement speed column' > 6 Target,at the same time,Only keep the armor column
df.loc[df['Movement speed'] > 6,['Armor']]  # Armor added in square brackets,is forDataFrameshow。
# ...             Armor
# ... sidewinder    8

# use  & express and
df.loc[(df['Movement speed'] > 1) & (df['Armor'] < 8)]
# ...       Movement speed  Armor
# ... viper   4     5

# use | express or
df.loc[(df['Movement speed'] > 4) | (df['Armor'] < 5)]
# ...            Movement speed Armor
# ... cobra        1      2
# ... sidewinder   7      8 

Example 9: Filtering using Callable

  • 1、lambda
import pandas as pd

# Build demo data
df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
     index=['cobra', 'viper', 'sidewinder'],
     columns=['Movement speed', 'Armor'])

# uselambda filter Armor column ==8 Target
df.loc[lambda df: df['Armor'] == 8]

# ...             Movement speed    Armor
# ... sidewinder    7     8 
    1. Custom function
import pandas as pd

# Build demo data
df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
     index=['cobra', 'viper', 'sidewinder'],
     columns=['Movement speed', 'Armor'])

# Define filter function
def slect_df(df):
    return df['Armor'] == 8

# Call functions
df.loc[slect_df] 

Example 10: Assign value after filtering and change data content

import pandas as pd

# Build demo data
df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
     index=['cobra', 'viper', 'sidewinder'],
     columns=['Movement speed', 'Armor'])

# After filtering,Modify data in batches
df.loc[df['Movement speed']>2] = 50
df
# ...             Movement speed    Armor
# ... cobra          1    2
# ... viper         50    50
# ... sidewinder    50    50

# After filtering,batch + 30
df.loc[df['Movement speed'] == 50] += 5
df
# ...             Movement speed    Armor
# ... cobra          1    2
# ... viper         55    55
# ... sidewinder    55    55 
RELATED ARTICLES

Most Popular

Recent Comments