df.fillna()
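value: The value used to fill missing entries. It can be a scalar (for example 0), a dict, a Series, or a DataFrame. A dict, Series, or DataFrame gives one fill value per column (or per index label for a Series); labels that are not listed are left unfilled. A list is not accepted.
method: The fill strategy to use when no value is passed. The options are ffill (alias pad), which propagates the last valid value forward, and bfill (alias backfill), which fills backward from the next valid value. Since pandas 2.1 this parameter is deprecated in favor of df.ffill() and df.bfill().
axis: The axis along which to fill. Accepts an int or a string: 0 or “index” fills down each column, 1 or “columns” fills across each row.
limit: An integer greater than 0. With method, it is the maximum number of consecutive NaN values to forward/backward fill, so a longer gap is only partially filled. With value, it is the maximum number of NaN entries filled along the axis.
downcast: A dict specifying which dtype each item should be downcast to where possible, or the string “infer” to downcast to an appropriate equal type (for example float64 to int64). Deprecated since pandas 2.2.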
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(6, 3))
df.loc[1:2, 1] = np.nan  # rows 1 and 2 of column 1 become NaN
df.loc[3, 2] = np.nan  # row 3 of column 2 becomes NaN
print(df)
tmp_df = df.fillna(value=1, inplace=False, limit=1)  # Replace NaN with 1, column by column; limit=1 fills at most one NaN per column
print(tmp_df)
tmp_df = df.fillna(inplace=False, method='ffill')  # Fill each NaN with the previous valid value in its column (method= is deprecated since pandas 2.1; df.ffill() is the replacement)
print(tmp_df)
tmp_df = df.fillna(inplace=False, method='ffill', axis=1)  # Fill each NaN with the previous valid value in its row
print(tmp_df)
df.fillna(inplace=True, method='bfill', axis=0)  # Fill each NaN with the next valid value in its column, modifying df in place
print(df)
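The value and method descriptions above can also be sketched on a small fixed frame, which makes the results easier to check than with random data. This is a minimal illustration, not the only way to do it: the column names "a" and "b" and the fill values are made up for the example, and df.ffill() / df.bfill() are used as the dedicated forward/backward fill methods in place of the method argument.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: two columns with gaps of different lengths.
df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan, 4.0],
    "b": [np.nan, 2.0, np.nan, np.nan],
})

# A dict gives one fill value per column: "a" gets 0.0, "b" gets -1.0.
filled = df.fillna(value={"a": 0.0, "b": -1.0})

# Forward fill: each NaN takes the previous valid value in its column.
forward = df.ffill()

# Backward fill: each NaN takes the next valid value in its column.
backward = df.bfill()

# limit=1 fills at most one consecutive NaN per gap.
forward_limited = df.ffill(limit=1)

print(filled)
print(forward)
print(backward)
print(forward_limited)
```

Note that a forward fill leaves a leading NaN untouched (there is no previous value to copy), and a backward fill leaves trailing NaNs untouched for the same reason.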
df.interpolate()
Used to fill missing values in a data frame or series, using interpolation techniques instead of hard-coded values.
method: The interpolation technique to use. Every option except linear, time, index, values and pad requires SciPy.
linear: Default value. Ignores the index and treats the values as equally spaced, so each missing value lies on the straight line between the two nearest valid points.
time: Interpolates according to the length of the time interval; for data whose index is a date/time index.
index, values: Use the actual numerical values of the index for interpolation.
pad (ffill): Fill NaN with the previous non-missing value.
nearest: Use the nearest non-NaN value.
zero, slinear: Spline interpolation of order zero and order one.
quadratic, cubic: Quadratic and cubic interpolation, suitable for nonlinear data.
polynomial: Polynomial interpolation. You must specify the order parameter, for example order=2 for quadratic polynomial interpolation.
spline: Spline interpolation. It also requires the order parameter.
barycentric: Barycentric interpolation, which evaluates the interpolating polynomial through the valid points in barycentric form.
krogh: Krogh interpolation.
piecewise_polynomial, from_derivatives, pchip, akima: Piecewise polynomial methods that wrap the SciPy routines of similar names.
axis: 0 fills column by column, 1 fills row by row.
limit: The maximum number of consecutive NaNs to fill. Must be greater than 0.
limit_direction: {‘forward’, ‘backward’, ‘both’}, default ‘forward’. The direction in which consecutive NaN values are filled.
limit_area: Restricts which NaN values are filled, depending on whether they lie between valid values or at the beginning or end of the sequence.
None: Default value. All NaN values can be filled without any restriction.
inside: Only fill NaN values surrounded by valid (non-NaN) observations, that is, interpolate. A run of NaN with non-NaN values both before and after it is filled; runs of NaN at the beginning and end of the sequence are not.
outside: Only fill NaN values at the beginning or end of the sequence, that is, extrapolate. A run of NaN in the middle, surrounded by non-NaN values, is not filled.
downcast: Optional, ‘infer’ or None (default). Downcast dtypes if possible.
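The difference between the inside and outside settings of limit_area is easiest to see on a short Series with NaN at both ends and in the middle. The following is a small sketch; the Series values are made up for illustration, and limit_direction='both' is chosen so that leading and trailing NaN are both eligible.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a leading NaN, a NaN between valid values, a trailing NaN.
s = pd.Series([np.nan, 1.0, np.nan, 3.0, np.nan])

# inside: only the NaN between 1.0 and 3.0 is filled (with 2.0).
inside = s.interpolate(method="linear", limit_direction="both", limit_area="inside")

# outside: only the leading and trailing NaN are filled; with linear
# interpolation they take the nearest valid value (1.0 and 3.0).
outside = s.interpolate(method="linear", limit_direction="both", limit_area="outside")

print(inside)
print(outside)
```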
import pandas as pd
df = pd.DataFrame({"A": [12, 4, 5, None, 1],
                   "B": [None, 2, 54, 3, None],
                   "C": [20, 16, None, 3, 8],
                   "D": [14, 3, None, None, 6]})
print(df)
print("=" * 30)
tmp_df = df.interpolate(method='linear', limit_direction='forward', inplace=False)  # Linear interpolation, filling forward; the leading NaN in column B stays NaN
print(tmp_df)
print('=' * 30)
tmp_df = df.interpolate(method='linear', limit_direction='backward', limit=1, inplace=False)  # Filling backward; limit=1 fills at most one consecutive NaN per gap
print(tmp_df)
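The choice of method matters most when the index is unevenly spaced. As a rough sketch with made-up data, the example below compares linear (which ignores the index) with index and time (which use it); the index values and dates are assumptions chosen so the gap sits one tenth of the way between the two valid points.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the missing point is at index 1, between index 0 and 10.
s = pd.Series([0.0, np.nan, 10.0], index=[0, 1, 10])

# linear treats the three values as equally spaced: halfway between 0 and 10.
by_position = s.interpolate(method="linear")

# index uses the numeric index: 1 is one tenth of the way from 0 to 10.
by_index = s.interpolate(method="index")

# The same idea with dates: Jan 2 is one day into a ten-day interval.
ts = pd.Series(
    [0.0, np.nan, 10.0],
    index=pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-11"]),
)
by_time = ts.interpolate(method="time")

print(by_position.iloc[1])  # 5.0
print(by_index.iloc[1])     # 1.0
print(by_time.iloc[1])      # approximately 1.0
```

For regularly sampled data the three methods agree, so the default linear method is usually enough; index or time is worth considering when the spacing between observations varies.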
reference:
Pandas – DataFrame.fillna() replaces null values in DataFrame | Geek Tutorial