Original text:
pandas.pydata.org/docs/
What's new in 1.3.5 (December 12, 2021)
Original text:
pandas.pydata.org/docs/whatsnew/v1.3.5.html
These are the changes in pandas 1.3.5. See the release notes for a full changelog including other versions of pandas.
## Fixed regressions
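A minimal sketch of two of the fixes listed in this section, using made-up data (all variable names are illustrative only). On pandas 1.3.5 or later both checks are expected to hold:

```python
import pandas as pd

# GH 44351: duplicated() and drop_duplicates() on a categorical Series
# whose categories are booleans.
flags = pd.Series([True, False, True, True], dtype="category")
dup_mask = flags.duplicated()      # True for every repeat after the first occurrence
deduped = flags.drop_duplicates()  # keeps the first True and the first False

# GH 42659: a grouped sum over timedelta64[ns] data skips NaT like any other NA.
df = pd.DataFrame(
    {
        "key": ["a", "a", "b"],
        "elapsed": pd.to_timedelta(["1 day", None, "2 days"]),
    }
)
totals = df.groupby("key")["elapsed"].sum()

print(dup_mask.tolist())  # [False, False, True, True]
print(deduped.tolist())
print(totals)
```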
- Fixed regression in `Series.equals()` when comparing floats with dtype object to `None` (GH 44190)
- Fixed regression in `merge_asof()` raising an error when an array was supplied as the join key (GH 42844)
- Fixed regression when resampling a `DataFrame` with a `DatetimeIndex` and empty groups, with `uint8`, `uint16` or `uint32` columns, incorrectly raising `RuntimeError` (GH 43329)
- Fixed regression in creating a `DataFrame` from a timezone-aware `Timestamp` scalar near a daylight saving time transition (GH 42505)
- Fixed performance regression in `read_csv()` (GH 44106)
- Fixed regression in `Series.duplicated()` and `Series.drop_duplicates()` when the Series has `Categorical` dtype with boolean categories (GH 44351)
- Fixed regression in `DataFrameGroupBy.sum()` and `SeriesGroupBy.sum()` with `timedelta64[ns]` dtype containing `NaT` failing to treat that value as NA (GH 42659)
- Fixed regression in `RollingGroupby.cov()` and `RollingGroupby.corr()` where extra groups were incorrectly returned in the result when `other` had the same shape as each group (GH 42915)

## Contributors
A total of 10 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.
- Ali McMaster
- Matthew Roeschke
- Matthew Zeitlin
- MeeseeksMachine
- pandas development team
- Patrick Hoefler
- Simon Hawkins
- Thomas Li
- Tobias Pitters
- jbrockmendel
What's new in 1.3.4 (October 17, 2021)
Original text:
pandas.pydata.org/docs/whatsnew/v1.3.4.html
These are the changes in pandas 1.3.4. See the release notes for a full changelog including other versions of pandas.
## Fixed regressions
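As a small illustration of the `DataFrame.explode()` fix (GH 43314) listed in this section, here is a sketch with an integer column label; the data and column names are invented for this example:

```python
import pandas as pd

# A DataFrame whose list-valued column has a non-string label (the integer 0).
df = pd.DataFrame({0: [[1, 2], [3]], "tag": ["x", "y"]})

# GH 43314: explode() accepts any scalar column label, not only strings.
# Each list element gets its own row; the original index label is repeated.
exploded = df.explode(0)
print(exploded)
```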
- Fixed regression in `DataFrame.convert_dtypes()` incorrectly converting byte strings to strings (GH 43183)
- Fixed regression in `DataFrameGroupBy.agg()` and `SeriesGroupBy.agg()` failing silently with mixed data types along `axis=1` and `MultiIndex` (GH 43209)
- Fixed regression in `merge()` with integer and `NaN` keys failing with an `outer` merge (GH 43550)
- Fixed regression in `DataFrame.corr()` raising `ValueError` with `method="spearman"` on 32-bit platforms (GH 43588)
- Fixed performance regression in `MultiIndex.equals()` (GH 43549)
- Fixed performance regression in `DataFrameGroupBy.first()`, `SeriesGroupBy.first()`, `DataFrameGroupBy.last()` and `SeriesGroupBy.last()` with `StringDtype` (GH 41596)
- Fixed regression in `Series.cat.reorder_categories()` failing to update the categories on the `Series` (GH 43232)
- Fixed regression in the `Series.cat.categories()` setter failing to update the categories on the `Series` (GH 43334)
- Fixed regression in `read_csv()` raising `UnicodeDecodeError` when `memory_map=True` (GH 43540)
- Fixed regression in `DataFrame.explode()` raising `AssertionError` when `column` is any scalar that is not a string (GH 43314)
- Fixed regression in `Series.aggregate()` attempting to pass `args` and `kwargs` multiple times to the user-supplied `func` in some cases (GH 43357)
- Fixed regression when iterating over a `DataFrame.groupby.rolling` object, causing the resulting DataFrames to have an incorrect index if the input groupings were not sorted (GH 43386)
- Fixed regression in `DataFrame.groupby.rolling.cov()` and `DataFrame.groupby.rolling.corr()` computing incorrect results if the input groupings were not sorted (GH 43386)

## Bug fixes

- Fixed bug in `pandas.DataFrame.groupby.rolling()` and `pandas.api.indexers.FixedForwardWindowIndexer` leading to segfaults and window endpoints being mixed across groups (GH 43267)
- Fixed bug in `DataFrameGroupBy.mean()` and `SeriesGroupBy.mean()` with datetimelike values including `NaT` values returning incorrect results (GH 43132)
- Fixed bug in `Series.aggregate()` not passing the first `args` to the user-supplied `func` in some cases (GH 43357)
- Fixed memory leaks in `Series.rolling.quantile()` and `Series.rolling.median()` (GH 43339)

## Other

- The minimum version of Cython needed to compile pandas is now `0.29.24` (GH 43729)

## Contributors
A total of 17 people contributed patches to this version. People with a “+” after their names contributed patches for the first time.
- Alexey Györi +
- DSM
- Irv Lustig
- Jeff Reback
- Julien de la Bruère-T +
- Matthew Zeitlin
- MeeseeksMachine
- pandas development team
- Patrick Hoefler
- Richard Shadrach
- Shoham Debnath
- Simon Hawkins
- Thomas Li
- stupid +
- jbrockmendel
- michael-gh +
- realead
What's new in 1.3.3 (September 12, 2021)
Original text:
pandas.pydata.org/docs/whatsnew/v1.3.3.html
These are the changes in pandas 1.3.3. See the release notes for a full changelog including other versions of pandas.
## Fixed regressions
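Two of the fixes listed in this section lend themselves to a quick check. The sketch below is an illustration rather than code from the release notes; `NotIterable` is a hypothetical class made up for this purpose:

```python
import pandas as pd
from pandas.api.types import is_list_like


class NotIterable:
    """Hypothetical class that opts out of iteration by setting __iter__ to None."""

    __iter__ = None


# GH 43373: an object whose __iter__ is None is not treated as list-like.
print(is_list_like(NotIterable()))  # False
print(is_list_like([1, 2, 3]))      # True

# GH 43240: where() and putmask() on a RangeIndex can produce values that
# no longer form a range, so the result cannot be a RangeIndex.
idx = pd.RangeIndex(5)
replaced = idx.where([True, False, True, True, True], 10)     # keep where True, else 10
masked = idx.putmask([False, True, False, False, False], 10)  # set 10 where True
print(replaced.tolist())  # [0, 10, 2, 3, 4]
print(masked.tolist())    # [0, 10, 2, 3, 4]
```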
- Fixed regression in the `DataFrame` constructor failing to broadcast for a defined `Index` and a length-one list of `Timestamp` (GH 42810)
- Fixed regression in `DataFrameGroupBy.agg()` and `SeriesGroupBy.agg()` incorrectly raising in some cases (GH 42390)
- Fixed regression in `DataFrameGroupBy.apply()` and `SeriesGroupBy.apply()` where `nan` values were dropped even with `dropna=False` (GH 43205)
- Fixed regression in `DataFrameGroupBy.quantile()` and `SeriesGroupBy.quantile()` failing with `pandas.NA` (GH 42849)
- Fixed regression in `merge()` where `on` columns with `ExtensionDtype` or `bool` data types were cast to `object` in `right` and `outer` merges (GH 40073)
- Fixed regression in `RangeIndex.where()` and `RangeIndex.putmask()` raising `AssertionError` when the result did not represent a `RangeIndex` (GH 43240)
- Fixed regression in `read_parquet()` where the `fastparquet` engine did not work properly with fastparquet 0.7.0 (GH 43075)
- Fixed regression in `DataFrame.loc.__setitem__()` raising `ValueError` when setting an array as a cell value (GH 43422)
- Fixed regression in `is_list_like()` where objects with `__iter__` set to `None` were identified as iterable (GH 43373)
- Fixed regression in `DataFrame.__getitem__()` raising an error for a slice of a `DatetimeIndex` when the index is non-monotonic (GH 43223)
- Fixed regression in `Resampler.aggregate()` when used after column selection, which raised an error if `func` was a list of aggregation functions (GH 42905)
- Fixed regression in `DataFrame.corr()` where Kendall correlation produced incorrect results for columns with repeated values (GH 43401)
- Fixed regression in `DataFrame.groupby()` where aggregating on columns with object types dropped the results for those columns (GH 42395, GH 43108)
- Fixed regression in `Series.fillna()` raising `TypeError` when filling a `float` `Series` with a fill value whose dtype cannot be cast losslessly (such as `float32` filled with `float64`) (GH 43424)
- Fixed regression in `read_csv()` raising `AttributeError` when the file handle is a `tempfile.SpooledTemporaryFile` object (GH 43439)
- Fixed performance regression in `core.window.ewm.ExponentialMovingWindow.mean()` (GH 42333)

## Performance improvements

- Performance improvement for `DataFrame.__setitem__()` when the key or value is not a `DataFrame`, or the key is not list-like (GH 43274)

## Bug fixes

- Fixed bug in `DataFrameGroupBy.agg()` and `DataFrameGroupBy.transform()` with `engine="numba"` where `index` data was not correctly passed into `func` (GH 43133)

## Contributors
A total of 18 people contributed patches to this release. People with a “+” after their names contributed patches for the first time.
- Ali McMaster
- Irv Lustig
- Matthew Roeschke
- Matthew Zeitlin
- MeeseeksMachine
- pandas development team
- Patrick Hoefler
- Prerana Chakraborty +
- Richard Shadrach
- Shoham Debnath
- Simon Hawkins
- Thomas Li
- Torsten Wörtwein
- Zach Rait +
- aiudirog +
- attack68
- jbrockmendel
- suoniq +
What's new in 1.3.2 (August 15, 2021)
Original text:
pandas.pydata.org/docs/whatsnew/v1.3.2.html
These are the changes in pandas 1.3.2. See the release notes for a full changelog including other versions of pandas.
## Fixed regressions
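The `groupby` fix for tuple-named grouping keys (GH 42731) listed in this section can be sketched as follows; the names `values` and `key` and the tuple `("level_0", "level_1")` are invented for illustration:

```python
import pandas as pd

values = pd.Series([1, 2, 3, 4])

# A grouping key whose *name* is a tuple, for example a column pulled out of
# a DataFrame with MultiIndex columns. It must be treated as a single key.
key = pd.Series(["a", "b", "a", "b"], name=("level_0", "level_1"))

totals = values.groupby(key).sum()  # the result index takes the key's name
print(totals)
```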
- Performance regression in `DataFrame.isin()` and `Series.isin()` for nullable data types (GH 42714)
- Regression in updating values of a `Series` using a boolean index created by `DataFrame.pop()` (GH 42530)
- Regression in `DataFrame.from_records()` with empty records (GH 42456)
- Fixed regression in `DataFrame.shift()` where a `TypeError` occurred when shifting a DataFrame created by concatenating slices and filling with values (GH 42719)
- Regression in `DataFrame.agg()` when the `func` argument returned lists and `axis=1` (GH 42727)
- Regression in `DataFrame.drop()` doing nothing if the `MultiIndex` has duplicates and the indexer is a tuple or a list of tuples (GH 42771)
- Fixed regression where `read_csv()` raised a `ValueError` when the parameters `names` and `prefix` were both set to `None` (GH 42387)
- Fixed regression in comparisons between `Timestamp` objects and `datetime64` objects outside the implementation bounds for nanosecond `datetime64` (GH 42794)
- Fixed regression in `Styler.highlight_min()` and `Styler.highlight_max()` where `pandas.NA` was not successfully ignored (GH 42650)
- Fixed regression in `concat()` where `copy=False` was not honored in `axis=1` Series concatenation (GH 42501)
- Regression in `Series.nlargest()` and `Series.nsmallest()` with nullable integer or float dtype (GH 42816)
- Fixed regression in `Series.quantile()` with `Int64Dtype` (GH 42626)
- Fixed regression in `Series.groupby()` and `DataFrame.groupby()` where passing a Series named with a tuple as the `by` argument would incorrectly raise (GH 42731)

## Bug fixes

- Bug in `read_excel()` modifying the dtypes dictionary when reading a file with duplicate columns (GH 42462)
- Bug where 1D slices over extension types turned into N-dimensional slices over ExtensionArrays (GH 42430)
- Fixed bug in `Series.rolling()` and `DataFrame.rolling()` not calculating window bounds correctly for the first row when `center=True` and `window` is an offset that covers all the rows (GH 42753)
- `Styler.hide_columns()` now hides the index name header row as well as column headers (GH 42101)
- `Styler.set_sticky()` has amended CSS to control the column/index names and ensure the correct sticky positions (GH 42537)
- Bug in deserializing datetime indexes in `PYTHONOPTIMIZED` mode (GH 42866)

## Contributors
A total of 16 people contributed patches to this release. People with a “+” after their names contributed patches for the first time.
- Alexander Gorodetsky +
- Fangchen Li
- Fred Reiss
- Justin McOmie +
- Matthew Zeitlin
- MeeseeksMachine
- pandas development team
- Patrick Hoefler
- Richard Shadrach
- Shoham Debnath
- Simon Hawkins
- Thomas Li
- Wenjun Si
- attack68
- dicristina +
- jbrockmendel
What's new in 1.3.1 (July 25, 2021)
Original text:
pandas.pydata.org/docs/whatsnew/v1.3.1.html
These are the changes in pandas 1.3.1. See the release notes for a full changelog including other versions of pandas.
## Fixed regressions
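A short sketch of two of the fixes listed in this section, assuming pandas 1.3.1 or later; `ColumnList` is a hypothetical `list` subclass introduced only for this example, and the data are made up:

```python
import pandas as pd


class ColumnList(list):
    """Hypothetical list subclass used only to illustrate GH 42433."""


df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# GH 42433 / GH 42461: indexing with a list subclass behaves like a plain list.
subset = df[ColumnList(["a", "c"])]

# GH 42618: value_counts() on a grouped Series that has only one row.
single = pd.Series([7], index=["x"])
counts = single.groupby(level=0).value_counts()

print(list(subset.columns))  # ['a', 'c']
print(counts.tolist())       # [1]
```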
- Pandas could not be built on PyPy (GH 42355)
- `DataFrame` objects constructed with an older version of pandas could not be unpickled (GH 42345)
- Performance regression in constructing a `DataFrame` from a dictionary of dictionaries (GH 42248)
- Fixed regression in `DataFrame.agg()` dropping values when the DataFrame had an extension array dtype, a duplicate index, and `axis=1` (GH 42380)
- Fixed regression in `DataFrame.astype()` changing the order of non-contiguous data (GH 42396)
- Performance regression in `DataFrame` reduction operations requiring casting, such as `DataFrame.mean()` on integer data (GH 38592)
- Performance regression in `DataFrame.to_dict()` and `Series.to_dict()` when the `orient` argument is one of "records", "dict", or "split" (GH 42352)
- Fixed regression in indexing with a `list` subclass incorrectly raising `TypeError` (GH 42433, GH 42461)
- Fixed regression in `DataFrame.isin()` and `Series.isin()` raising `TypeError` with nullable data containing at least one missing value (GH 42405)
- Regression in `concat()` between objects with bool dtype and integer dtype casting to object instead of to integer (GH 42092)
- Bug in the `Series` constructor not accepting a `dask.Array` (GH 38645)
- Fixed regression where `SettingWithCopyWarning` displayed an incorrect stacklevel (GH 42570)
- Fixed regression in `merge_asof()` raising `KeyError` when one of the `by` columns is in the index (GH 34488)
- Fixed regression in `to_datetime()` returning `pd.NaT` for inputs that produce duplicated values, when `cache=True` (GH 42259)
- Fixed regression in `SeriesGroupBy.value_counts()` that resulted in an `IndexError` when called on a Series with one row (GH 42618)

## Bug fixes

- Fixed bug in `DataFrame.transpose()` dropping values when the DataFrame had an extension array dtype and a duplicate index (GH 42380)
- Fixed bug in `DataFrame.to_xml()` raising `KeyError` when called with `index=False` and an offset index (GH 42458)
- Fixed bug in `Styler.set_sticky()` not handling index names correctly for the single index column case (GH 42537)
- Fixed bug in `DataFrame.copy()` failing to consolidate blocks in the result (GH 42579)

## Contributors
A total of 17 people contributed patches to this release. People with a “+” after their names contributed patches for the first time.
- Fangchen Li
- Live +
- Matthew Roeschke
- Matthew Zeitlin
- MeeseeksMachine
- pandas development team
- Patrick Hoefler
- Richard Shadrach
- Shoham Debnath +
- Simon Hawkins
- Stephan Heßelmann +
- Stephen +
- Thomas Li
- Zheyuan +
- attack68
- jbrockmendel
- neelmraman +
What's new in 1.3.0 (July 2, 2021)
Original text:
pandas.pydata.org/docs/whatsnew/v1.3.0.html
These are the changes in pandas 1.3.0. See the release notes for a full changelog including other versions of pandas.

Warning

When reading new Excel 2007+ (`.xlsx`) files, the default argument `engine=None` to `read_excel()` will now result in using the openpyxl engine in all cases when the option `io.excel.xlsx.reader` is set to `"auto"`. Previously, some cases would use the xlrd engine instead. See What's new 1.2.0 for background on this change.
## Enhancements

### Custom HTTP(s) headers when reading csv or json files

When reading from a remote URL that is not handled by fsspec (such as HTTP and HTTPS), the dictionary passed to `storage_options` will be used to create the headers included in the request. This can be used to control the User-Agent header or send other custom headers (GH 36688). For example:
```py
In [1]: headers = {"User-Agent": "pandas"}

In [2]: df = pd.read_csv(
   ...:     "https://download.bls.gov/pub/time.series/cu/cu.item",
   ...:     sep="\t",
   ...:     storage_options=headers
   ...: )
```

### Read and write XML documents
We added I/O support to read and render shallow versions of [XML](https://www.w3.org/standards/xml/core) documents with `read_xml()` and `DataFrame.to_xml()`. Using [lxml](https://lxml.de) as the parser, both XPath 1.0 and XSLT 1.0 are available ([GH 27554](https://github.com/pandas-dev/pandas/issues/27554)).
```py
In [1]: xml = """<?xml version='1.0' encoding='utf-8'?>
   ...: <data>
   ...:  <row>
   ...:     <shape>square</shape>
   ...:     <degrees>360</degrees>
   ...:     <sides>4.0</sides>
   ...:  </row>
   ...:  <row>
   ...:     <shape>circle</shape>
   ...:     <degrees>360</degrees>
   ...:     <sides/>
   ...:  </row>
   ...:  <row>
   ...:     <shape>triangle</shape>
   ...:     <degrees>180</degrees>
   ...:     <sides>3.0</sides>
   ...:  </row>
   ...: </data>"""

In [2]: df = pd.read_xml(xml)

In [3]: df
Out[3]:
      shape  degrees  sides
0    square      360    4.0
1    circle      360    NaN
2  triangle      180    3.0

In [4]: df.to_xml()
Out[4]:
<?xml version='1.0' encoding='utf-8'?>
<data>
  <row>
    <index>0</index>
    <shape>square</shape>
    <degrees>360</degrees>
    <sides>4.0</sides>
  </row>
  <row>
    <index>1</index>
    <shape>circle</shape>
    <degrees>360</degrees>
    <sides/>
  </row>
  <row>
    <index>2</index>
    <shape>triangle</shape>
    <degrees>180</degrees>
    <sides>3.0</sides>
  </row>
</data>
```
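The example above relies on lxml, the default parser. As a sketch for environments without lxml, both functions also accept `parser="etree"`, which uses the standard library's `xml.etree` module instead (XSLT stylesheets then are not available). Wrapping the literal XML in `StringIO` is a choice made here because later pandas versions deprecate passing literal XML strings to `read_xml()`:

```python
from io import StringIO

import pandas as pd

xml = """<?xml version='1.0' encoding='utf-8'?>
<data>
  <row>
    <shape>square</shape>
    <degrees>360</degrees>
    <sides>4.0</sides>
  </row>
  <row>
    <shape>circle</shape>
    <degrees>360</degrees>
    <sides/>
  </row>
</data>"""

# parser="etree" uses the standard-library backend, so lxml is not required.
# The empty <sides/> element becomes NaN in the resulting DataFrame.
df = pd.read_xml(StringIO(xml), parser="etree")

# Round-trip back to XML, again without lxml; index=False omits <index> tags.
out = df.to_xml(index=False, parser="etree")
print(df)
print(out)
```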
For more information, see Writing XML in the IO Tools User Guide.

### Styler enhancements

We've done some focused development on `Styler`. See also the revised and improved Styler documentation (GH 39720, GH 39317, GH 40493).

- The method `Styler.set_table_styles()` can now accept more natural CSS language for arguments, such as `'color:red;'` instead of `[('color', 'red')]` (GH 39563)
- The methods `Styler.highlight_null()`, `Styler.highlight_min()` and `Styler.highlight_max()` now allow custom CSS highlighting instead of the default background coloring (GH 40242)
- `Styler.apply()` now accepts functions that return an `ndarray` when `axis=None`, making it consistent with the `axis=0` and `axis=1` behavior (GH 39359)
- When incorrectly formatted CSS is given via `Styler.apply()` or `Styler.applymap()`, an error is now raised upon rendering (GH 39660)
- `Styler.format()` now accepts the keyword argument `escape` for optional HTML and LaTeX escaping (GH 40388, GH 41619)
- `Styler.background_gradient()` has gained the argument `gmap` to supply a specific gradient map for shading (GH 22727)
- `Styler.clear()` now clears `Styler.hidden_index` and `Styler.hidden_columns` as well (GH 40484)
- Added the method `Styler.highlight_between()` (GH 39821)
- Added the method `Styler.highlight_quantile()` (GH 40926)
- Added the method `Styler.text_gradient()` (GH 41098)
- Added the method `Styler.set_tooltips()` to allow hover tooltips; this can be used to enhance interactive displays (GH 21266, GH 40284)
- Added the parameter `precision` to the method `Styler.format()` to control the display of floating point numbers (GH 40134)
- HTML output rendered by `Styler` now follows the w3 HTML Style Guide (GH 39626)
- Many features of the `Styler` class are now either partially or fully usable on a DataFrame with non-unique indexes or columns (GH 41143)
- The display can be controlled more finely by sparsifying the index or columns separately using the new styler options, which are also usable via `option_context()` (GH 41142)
- Added the option `styler.render.max_elements` to avoid browser overload when styling large DataFrames (GH 40712)
- Added the method `Styler.to_latex()` (GH 21673, GH 42320), which also allows some limited CSS conversion (GH 40731)
- Added the method `Styler.to_html()` (GH 13379)
- Added the method `Styler.set_sticky()` to make index and column headers permanently visible in scrolling HTML frames (GH 29072)

### DataFrame constructor honors `copy=False`

Copying is no longer done when passing a dictionary to `DataFrame` with `copy=False` (GH 32960).
```py
In [1]: arr = np.array([1, 2, 3])

In [2]: df = pd.DataFrame({"A": arr, "B": arr.copy()}, copy=False)

In [3]: df
Out[3]:
   A  B
0  1  1
1  2  2
2  3  3
```
`df["A"]` remains a view on `arr`:
```py
In [4]: arr[0] = 0

In [5]: assert df.iloc[0, 0] == 0
```
When no `copy` argument is passed, the default behavior remains unchanged, i.e. a copy is made.

### PyArrow-backed string data type
We've enhanced `StringDtype`, an extension type dedicated to string data (GH 39908).

It is now possible to specify a `storage` keyword option to `StringDtype`. Use pandas options or specify the dtype using `dtype='string[pyarrow]'` to allow the StringArray to be backed by a PyArrow array instead of a NumPy array of Python objects.

The PyArrow-backed StringArray requires pyarrow 1.0.0 or greater to be installed.
Warning: <code>string[pyarrow]</code> is currently considered experimental. The implementation and parts of the API may change without warning.
<pre><code class="language-python line-numbers">In [6]: pd.Series(['abc', None, 'def'], dtype=pd.StringDtype(storage="pyarrow"))
Out[6]:
0 abc
1 <NA>
2 def
dtype: string
</code></pre>
You can also use the alias <code>"string[pyarrow]"</code>.
<pre><code class="language-python line-numbers">In [7]: s = pd.Series(['abc', None, 'def'], dtype="string[pyarrow]")
In [8]: s
Out[8]:
0 abc
1 <NA>
2 def
dtype: string
</code></pre>
You can also create PyArrow-based string arrays using the pandas option.
<pre><code class="language-python line-numbers">In [9]: with pd.option_context("string_storage", "pyarrow"):
   ...:     s = pd.Series(['abc', None, 'def'], dtype="string")
   ...:
In [10]: s
Out[10]:
0 abc
1 <NA>
2 def
dtype: string
</code></pre>
The usual string accessor methods work. Where appropriate, the return type of the Series, or of the columns of a DataFrame, will also have string dtype.
<pre><code class="language-python line-numbers">In [11]: s.str.upper()
Out[11]:
0 ABC
1 <NA>
2 DEF
dtype: string
In [12]: s.str.split('b', expand=True).dtypes
Out[12]:
0 string[pyarrow]
1 string[pyarrow]
dtype: object
</code></pre>
String accessor methods that return integers will return values with <code>Int64Dtype</code>.
<pre><code class="language-python line-numbers">In [13]: s.str.count("a")
Out[13]:
0 1
1 <NA>
2 0
dtype: Int64
</code></pre>
### Centered datetime-like rolling windows
When performing rolling calculations on DataFrame and Series objects with a datetime-like index, a centered datetime-like window can now be used ([GH 38780](https://github.com/pandas-dev/pandas/issues/38780)). For example:
```py
In [14]: df = pd.DataFrame(
....: {"A": [0, 1, 2, 3, 4]}, index=pd.date_range("2020", periods=5, freq="1D")
....: )
....:
In [15]: df
Out[15]:
            A
2020-01-01  0
2020-01-02  1
2020-01-03  2
2020-01-04  3
2020-01-05  4
In [16]: df.rolling("2D", center=True).mean()
Out[16]:
              A
2020-01-01  0.5
2020-01-02  1.5
2020-01-03  2.5
2020-01-04  3.5
2020-01-05  4.0
```
### Other enhancements
+ `DataFrame.rolling()`, `Series.rolling()`, `DataFrame.expanding()`, and `Series.expanding()` now support a `method` argument with a `'table'` option that performs the windowing operation over an entire `DataFrame`. See the window overview for performance and functional benefits ([GH 15095](https://github.com/pandas-dev/pandas/issues/15095), [GH 38995](https://github.com/pandas-dev/pandas/issues/38995))
+ `ExponentialMovingWindow` now supports an `online` method that can perform `mean` calculations in an online fashion. See the window overview ([GH 41673](https://github.com/pandas-dev/pandas/issues/41673))
+ Added `MultiIndex.dtypes()` ([GH 37062](https://github.com/pandas-dev/pandas/issues/37062))
+ Added `end` and `end_day` options to the `origin` parameter of `DataFrame.resample()` ([GH 37804](https://github.com/pandas-dev/pandas/issues/37804))
+ Improved error message when `usecols` and `names` do not match for `read_csv()` with `engine="c"` ([GH 29042](https://github.com/pandas-dev/pandas/issues/29042))
+ Improved consistency of error messages when passing invalid `win_type` parameters in window methods ([GH 15969](https://github.com/pandas-dev/pandas/issues/15969))
+ `read_sql_query()` now accepts a `dtype` argument to cast the columnar data from the SQL database based on user input ([GH 10285](https://github.com/pandas-dev/pandas/issues/10285))
+ `read_csv()` now raises a `ParserWarning` if the length of the header or the given names does not match the length of the data when `usecols` is not specified ([GH 21768](https://github.com/pandas-dev/pandas/issues/21768))
+ Improved pandas to SQLAlchemy integer type mapping when using `DataFrame.to_sql()` ([GH 35076](https://github.com/pandas-dev/pandas/issues/35076))
+ `to_numeric()` now supports downcasting of nullable `ExtensionDtype` objects ([GH 33013](https://github.com/pandas-dev/pandas/issues/33013))
+ Added support for dictionary-like names in `MultiIndex.set_names` and `MultiIndex.rename` ([GH 20421](https://github.com/pandas-dev/pandas/issues/20421))
+ `read_excel()` can now auto-detect .xlsb files and older .xls files ([GH 35416](https://github.com/pandas-dev/pandas/issues/35416), [GH 41225](https://github.com/pandas-dev/pandas/issues/41225))
+ `ExcelWriter` now accepts an `if_sheet_exists` parameter to control the behavior of append mode when writing to existing sheets ([GH 40230](https://github.com/pandas-dev/pandas/issues/40230))
+ `Rolling.sum()`, `Expanding.sum()`, `Rolling.mean()`, `Expanding.mean()`, `ExponentialMovingWindow.mean()`, `Rolling.median()`, `Expanding.median()`, `Rolling.max()`, `Expanding.max()`, `Rolling.min()`, and `Expanding.min()` now support [Numba](http://numba.pydata.org/) execution with the `engine` keyword ([GH 38895](https://github.com/pandas-dev/pandas/issues/38895), [GH 41267](https://github.com/pandas-dev/pandas/issues/41267))
+ `DataFrame.apply()` can now accept NumPy unary operators as strings, e.g. `df.apply("sqrt")`, which was already the case for `Series.apply()` ([GH 39116](https://github.com/pandas-dev/pandas/issues/39116))
+ `DataFrame.apply()` can now accept non-callable DataFrame properties as strings, e.g. `df.apply("size")`, which was already the case for `Series.apply()` ([GH 39116](https://github.com/pandas-dev/pandas/issues/39116))
+ `DataFrame.applymap()` can now accept kwargs to pass to a user-supplied `func` ([GH 39987](https://github.com/pandas-dev/pandas/issues/39987))
+ Passing a `DataFrame` indexer to `iloc` is now disallowed for `Series.__getitem__()` and `DataFrame.__getitem__()` ([GH 39004](https://github.com/pandas-dev/pandas/issues/39004))
+ `Series.apply()` can now accept list-like or dictionary-like arguments that are not lists or dictionaries, e.g. `ser.apply(np.array(["sum", "mean"]))`, which was already the case for `DataFrame.apply()` ([GH 39140](https://github.com/pandas-dev/pandas/issues/39140))
+ `DataFrame.plot.scatter()` can now accept a categorical column for the argument `c` ([GH 12380](https://github.com/pandas-dev/pandas/issues/12380), [GH 31357](https://github.com/pandas-dev/pandas/issues/31357))
+ `Series.loc()` now raises a helpful error message when the Series has a `MultiIndex` and the indexer has too many dimensions ([GH 35349](https://github.com/pandas-dev/pandas/issues/35349))
+ `read_stata()` now supports reading data from compressed files ([GH 26599](https://github.com/pandas-dev/pandas/issues/26599))
+ Added support for parsing `ISO 8601`-like timestamps with negative signs to `Timedelta` ([GH 37172](https://github.com/pandas-dev/pandas/issues/37172))
+ Added support for unary operators in `FloatingArray` ([GH 38749](https://github.com/pandas-dev/pandas/issues/38749))
+ `RangeIndex` can now be constructed by passing a `range` object directly, e.g. `pd.RangeIndex(range(3))` ([GH 12067](https://github.com/pandas-dev/pandas/issues/12067))
+ `Series.round()` and `DataFrame.round()` now work with nullable integer and floating dtypes ([GH 38844](https://github.com/pandas-dev/pandas/issues/38844))
+ `read_csv()` and `read_json()` provide parameter `encoding_errors` to control how encoding errors are handled ([GH 39450](https://github.com/pandas-dev/pandas/issues/39450))
+ `DataFrameGroupBy.any()`, `SeriesGroupBy.any()`, `DataFrameGroupBy.all()`, and `SeriesGroupBy.all()` now use Kleene logic with nullable data types ([GH 37506](https://github.com/pandas-dev/pandas/issues/37506))
+ `DataFrameGroupBy.any()`, `SeriesGroupBy.any()`, `DataFrameGroupBy.all()`, and `SeriesGroupBy.all()` return a `BooleanDtype` for columns with nullable data types ([GH 33449](https://github.com/pandas-dev/pandas/issues/33449))
+ Fixed `DataFrameGroupBy.any()`, `SeriesGroupBy.any()`, `DataFrameGroupBy.all()`, and `SeriesGroupBy.all()` raising on `object` data containing `pd.NA` even when `skipna=True` ([GH 37501](https://github.com/pandas-dev/pandas/issues/37501))
+ `DataFrameGroupBy.rank()` and `SeriesGroupBy.rank()` now support object dtype data ([GH 38278](https://github.com/pandas-dev/pandas/issues/38278))
+ Constructing a `DataFrame` or `Series` with the `data` argument being a Python iterable that is *not* a NumPy `ndarray` consisting of NumPy scalars will now result in a dtype with a precision equal to the maximum of the NumPy scalars; this was already the case when `data` is a NumPy `ndarray` ([GH 40908](https://github.com/pandas-dev/pandas/issues/40908))
+ Added keyword `sort` to `pivot_table()` to allow the result to remain unsorted ([GH 39143](https://github.com/pandas-dev/pandas/issues/39143))
+ Added keyword `dropna` to `DataFrame.value_counts()` to allow counting rows that include `NA` values ([GH 41325](https://github.com/pandas-dev/pandas/issues/41325))
+ `Series.replace()` will now cast results to `PeriodDtype` where possible instead of `object` dtype ([GH 41526](https://github.com/pandas-dev/pandas/issues/41526))
+ Improved error message in `corr` and `cov` methods on `Rolling`, `Expanding`, and `ExponentialMovingWindow` when `other` is not a `DataFrame` or `Series` ([GH 41741](https://github.com/pandas-dev/pandas/issues/41741))
+ `Series.between()` can now accept `left` or `right` as arguments to `inclusive` to include only the left or right boundary ([GH 40245](https://github.com/pandas-dev/pandas/issues/40245))
+ `DataFrame.explode()` now supports exploding multiple columns. Its `column` argument now also accepts a list of strings or tuples for exploding on multiple columns at the same time ([GH 39240](https://github.com/pandas-dev/pandas/issues/39240))
+ `DataFrame.sample()` now accepts the `ignore_index` argument to reset the index after sampling, similar to `DataFrame.drop_duplicates()` and `DataFrame.sort_values()` ([GH 38581](https://github.com/pandas-dev/pandas/issues/38581))
## Notable bug fixes
These are bug fixes that may have significant behavior changes.
### `Categorical.unique` now always maintains the same dtype as the original
Previously, when calling `Categorical.unique()` with categorical data, unused categories in the new array would be removed, making the dtype of the new array different from the original ([GH 18291](https://github.com/pandas-dev/pandas/issues/18291))
For example, given:
<pre><code class="language-python line-numbers">In [17]: dtype = pd.CategoricalDtype(['bad', 'neutral', 'good'], ordered=True)
In [18]: cat = pd.Categorical(['good', 'good', 'bad', 'bad'], dtype=dtype)
In [19]: original = pd.Series(cat)
In [20]: unique = original.unique()
</code></pre>
<em>Previous Behavior</em>:
<pre><code class="language-python line-numbers">In [1]: unique
['good', 'bad']
Categories (2, object): ['bad' < 'good']
In [2]: original.dtype == unique.dtype
False
</code></pre>
<em>New Behavior</em>:
<pre><code class="language-python line-numbers">In [21]: unique
Out[21]:
['good', 'bad']
Categories (3, object): ['bad' < 'neutral' < 'good']
In [22]: original.dtype == unique.dtype
Out[22]: True
</code></pre>
### Preserve dtypes in `DataFrame.combine_first()`
`DataFrame.combine_first()` now preserves data types ([GH 7509](https://github.com/pandas-dev/pandas/issues/7509))
<pre><code class="language-python line-numbers">In [23]: df1 = pd.DataFrame({"A": [1, 2, 3], "B": [1, 2, 3]}, index=[0, 1, 2])
In [24]: df1
Out[24]:
   A  B
0  1  1
1  2  2
2  3  3
In [25]: df2 = pd.DataFrame({"B": [4, 5, 6], "C": [1, 2, 3]}, index=[2, 3, 4])
In [26]: df2
Out[26]:
   B  C
2  4  1
3  5  2
4  6  3
In [27]: combined = df1.combine_first(df2)
</code></pre>
<em>Previous Behavior</em>:
<pre><code class="language-python line-numbers">In [1]: combined.dtypes
Out[1]:
A float64
B float64
C float64
dtype: object
</code></pre>
<em>New Behavior</em>:
<pre><code class="language-python line-numbers">In [28]: combined.dtypes
Out[28]:
A float64
B int64
C float64
dtype: object
</code></pre>
### Groupby methods agg and transform no longer change the return dtype for callables
Previously, the methods `DataFrameGroupBy.aggregate()`, `SeriesGroupBy.aggregate()`, `DataFrameGroupBy.transform()`, and `SeriesGroupBy.transform()` might cast the result dtype when the argument `func` was callable, possibly leading to undesirable results ([GH 21240](https://github.com/pandas-dev/pandas/issues/21240)). The cast occurred if the result was numeric and casting back to the input dtype did not change any values, as measured by `np.allclose`. No such casting occurs now.
<pre><code class="language-python line-numbers">In [29]: df = pd.DataFrame({'key': [1, 1], 'a': [True, False], 'b': [True, True]})
In [30]: df
Out[30]:
   key      a     b
0    1   True  True
1    1  False  True
</code></pre>
<em>Previous Behavior</em>:
<pre><code class="language-python line-numbers">In [5]: df.groupby('key').agg(lambda x: x.sum())
Out[5]:
        a  b
key
1    True  2
</code></pre>
<em>New Behavior</em>:
<pre><code class="language-python line-numbers">In [31]: df.groupby('key').agg(lambda x: x.sum())
Out[31]:
     a  b
key
1    1  2
</code></pre>
### Return float results from `DataFrameGroupBy.mean()`, `DataFrameGroupBy.median()`, `DataFrameGroupBy.var()`, `SeriesGroupBy.mean()`, `SeriesGroupBy.median()`, and `SeriesGroupBy.var()`
Previously, these methods could return different dtypes depending on the input values. Now, they always return a float dtype ([GH 41137](https://github.com/pandas-dev/pandas/issues/41137))
<pre><code class="language-python line-numbers">In [32]: df = pd.DataFrame({'a': [True], 'b': [1], 'c': [1.0]})
</code></pre>
<em>Previous Behavior</em>:
<pre><code class="language-python line-numbers">In [5]: df.groupby(df.index).mean()
Out[5]:
      a  b    c
0  True  1  1.0
</code></pre>
<em>New Behavior</em>:
<pre><code class="language-python line-numbers">In [33]: df.groupby(df.index).mean()
Out[33]:
     a    b    c
0  1.0  1.0  1.0
</code></pre>
### Try operating in place when setting values with `loc` and `iloc`
When setting an entire column using `loc` or `iloc`, pandas will try to insert the values into the existing data rather than create an entirely new array.
<pre><code class="language-python line-numbers">In [34]: df = pd.DataFrame(range(3), columns=["A"], dtype="float64")
In [35]: values = df.values
In [36]: new = np.array([5, 6, 7], dtype="int64")
In [37]: df.loc[[0, 1, 2], "A"] = new
</code></pre>
In the old and new behavior, the data in <code>values</code> is overwritten, but in the old behavior the data type of <code>df["A"]</code> becomes <code>int64</code>.
<em>Previous Behavior</em>:
<pre><code class="language-python line-numbers">In [1]: df.dtypes
Out[1]:
A int64
dtype: object
In [2]: np.shares_memory(df["A"].values, new)
Out[2]: False
In [3]: np.shares_memory(df["A"].values, values)
Out[3]: False
</code></pre>
In pandas 1.3.0, <code>df</code> continues to share data with <code>values</code>:
<em>New Behavior</em>:
<pre><code class="language-python line-numbers">In [38]: df.dtypes
Out[38]:
A float64
dtype: object
In [39]: np.shares_memory(df["A"], new)
Out[39]: False
In [40]: np.shares_memory(df["A"], values)
Out[40]: True
</code></pre>
### Never operate in place when setting `frame[keys] = values`
When setting multiple columns using `frame[keys] = values`, new arrays will replace the pre-existing arrays for these keys, which will *not* be overwritten ([GH 39510](https://github.com/pandas-dev/pandas/issues/39510)). As a result, the columns will retain the dtype(s) of `values`, never casting to the dtypes of the existing arrays.
<pre><code class="language-python line-numbers">In [41]: df = pd.DataFrame(range(3), columns=["A"], dtype="float64")
In [42]: df[["A"]] = 5
</code></pre>
In the old behavior, <code>5</code> was cast to <code>float64</code> and inserted into the existing array backing <code>df</code>:
<em>Previous Behavior</em>:
<pre><code class="language-python line-numbers">In [1]: df.dtypes
Out[1]:
A float64
</code></pre>
In the new behavior, we get a new array and retain an integer-dtyped <code>5</code>:
<em>New Behavior</em>:
<pre><code class="language-shell line-numbers">In [43]: df.dtypes
Out[43]:
A int64
dtype: object
</code></pre>
### Consistent casting when setting non-boolean values into a boolean Series
Setting non-boolean values into a `Series` with `dtype=bool` now consistently casts to `dtype=object` ([GH 38709](https://github.com/pandas-dev/pandas/issues/38709))
<pre><code class="language-python line-numbers">In [1]: orig = pd.Series([True, False])
In [2]: ser = orig.copy()
In [3]: ser.iloc[1] = np.nan
In [4]: ser2 = orig.copy()
In [5]: ser2.iloc[1] = 2.0
</code></pre>
<em>Previous Behavior</em>:
<pre><code class="language-shell line-numbers">In [1]: ser
Out[1]:
0    1.0
1    NaN
dtype: float64
In [2]: ser2
Out[2]:
0 True
1 2.0
dtype: object
</code></pre>
<em>New Behavior</em>:
<pre><code class="language-python line-numbers">In [1]: ser
Out[1]:
0    True
1     NaN
dtype: object
In [2]: ser2
Out[2]:
0 True
1 2.0
dtype: object
</code></pre>
### DataFrameGroupBy.rolling and SeriesGroupBy.rolling no longer return the grouped-by column in values
Grouping columns are now removed from the results of `groupby.rolling` operations ([GH 32262](https://github.com/pandas-dev/pandas/issues/32262))
<pre><code class="language-python line-numbers">In [44]: df = pd.DataFrame({"A": [1, 1, 2, 3], "B": [0, 1, 2, 3]})
In [45]: df
Out[45]:
   A  B
0  1  0
1  1  1
2  2  2
3  3  3
</code></pre>
<em>Previous Behavior</em>:
<pre><code class="language-python line-numbers">In [1]: df.groupby("A").rolling(2).sum()
Out[1]:
       A    B
A
1 0  NaN  NaN
  1  2.0  1.0
2 2  NaN  NaN
3 3  NaN  NaN
</code></pre>
<em>New Behavior</em>:
<pre><code class="language-python line-numbers">In [46]: df.groupby("A").rolling(2).sum()
Out[46]:
       B
A
1 0  NaN
  1  1.0
2 2  NaN
3 3  NaN
</code></pre>
### Removed artificial truncation in rolling variance and standard deviation
`Rolling.std()` and `Rolling.var()` will no longer artificially truncate results that are less than `~1e-8` and `~1e-15`, respectively, to zero ([GH 37051](https://github.com/pandas-dev/pandas/issues/37051), [GH 40448](https://github.com/pandas-dev/pandas/issues/40448), [GH 39872](https://github.com/pandas-dev/pandas/issues/39872)).
However, floating-point artifacts may now appear in the results when rolling over larger values.
```py
In [47]: s = pd.Series([7, 5, 5, 5])
In [48]: s.rolling(3).var()
Out[48]:
0 NaN
1 NaN
2 1.333333
3 0.000000
dtype: float64
```
### DataFrameGroupBy.rolling and SeriesGroupBy.rolling with MultiIndex no longer drop levels in the result
`DataFrameGroupBy.rolling()` and `SeriesGroupBy.rolling()` will no longer drop levels of a `DataFrame` with a `MultiIndex` in the result. This can lead to a perceived duplication of levels in the resulting `MultiIndex`, but this change restores the behavior that was present in version 1.1.3 ([GH 38787](https://github.com/pandas-dev/pandas/issues/38787), [GH 38523](https://github.com/pandas-dev/pandas/issues/38523)).
<pre><code class="language-python line-numbers">In [49]: index = pd.MultiIndex.from_tuples([('idx1', 'idx2')], names=['label1', 'label2'])
In [50]: df = pd.DataFrame({'a': [1], 'b': [2]}, index=index)
In [51]: df
Out[51]:
               a  b
label1 label2
idx1   idx2    1  2
</code></pre>
<em>Previous Behavior</em>:
<pre><code class="language-python line-numbers">In [1]: df.groupby('label1').rolling(1).sum()
Out[1]:
          a    b
label1
idx1    1.0  2.0
</code></pre>
<em>New Behavior</em>:
<pre><code class="language-python line-numbers">In [52]: df.groupby('label1').rolling(1).sum()
Out[52]:
                        a    b
label1 label1 label2
idx1   idx1   idx2    1.0  2.0
</code></pre>
## Backwards incompatible API changes
### Increased minimum versions for dependencies
The minimum supported versions of some dependencies were updated. If installed, we now require:
| Package | Minimum Version | Required | Changed |
| --- | --- | --- | --- |
| numpy | 1.17.3 | X | X |
| pytz | 2017.3 | X | |
| python-dateutil | 2.7.3 | X | |
| bottleneck | 1.2.1 | | |
| numexpr | 2.7.0 | | X |
| pytest (dev) | 6.0 | | X |
| mypy (dev) | 0.812 | | X |
| setuptools | 38.6.0 | | X |
For [optional libraries](https://pandas.pydata.org/docs/getting_started/install.html), the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.
| Package | Minimum Version | Changed |
| --- | --- | --- |
| beautifulsoup4 | 4.6.0 | |
| fastparquet | 0.4.0 | X |
| fsspec | 0.7.4 | |
| gcsfs | 0.6.0 | |
| lxml | 4.3.0 | |
| matplotlib | 2.2.3 | |
| numba | 0.46.0 | |
| openpyxl | 3.0.0 | X |
| pyarrow | 0.17.0 | X |
| pymysql | 0.8.1 | X |
| pytables | 3.5.1 | |
| s3fs | 0.4.0 | |
| scipy | 1.2.0 | |
| sqlalchemy | 1.3.0 | X |
| tabulate | 0.8.7 | X |
| xarray | 0.12.0 | |
| xlrd | 1.2.0 | |
| xlsxwriter | 1.0.2 | |
| xlwt | 1.3.0 | |
| pandas-gbq | 0.12.0 | |
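pandas itself provides `pd.show_versions()`, which prints the installed versions of pandas and its dependencies. If you instead want to compare installed versions against the minimums in the tables above programmatically, the following is a minimal sketch that uses only the standard library's `importlib.metadata`. The helper names (`version_tuple`, `meets_minimum`, `check_dependencies`) are hypothetical, only a few table rows are included, and the simplified parser ignores pre-release suffixes rather than implementing full PEP 440 ordering.

```python
from importlib import metadata

# A few minimum versions copied from the tables above; extend as needed.
# Keys are distribution (PyPI) names, which can differ from import names.
MINIMUMS = {
    "numpy": "1.17.3",
    "numexpr": "2.7.0",
    "openpyxl": "3.0.0",
    "pyarrow": "0.17.0",
}


def version_tuple(version: str) -> tuple:
    """Return the leading numeric components, e.g. '1.0.0rc1' -> (1, 0, 0)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            # Stop at the first non-numeric component, such as 'dev0'.
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two versions after padding both to equal length with zeros."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_dependencies(minimums: dict = MINIMUMS) -> dict:
    """Map each package name to 'ok', 'not installed', or a 'too old' note."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
            continue
        if meets_minimum(installed, minimum):
            report[name] = "ok"
        else:
            report[name] = f"too old ({installed} < {minimum})"
    return report


if __name__ == "__main__":
    for package, status in check_dependencies().items():
        print(f"{package}: {status}")
```

For anything beyond a quick check, the `packaging` library's `Version` class implements full PEP 440 comparison and is a more robust choice than the simplified parser here.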
For more information, see Dependencies and Optional Dependencies.
### Other API changes
+ Partially initialized `CategoricalDtype` objects (i.e. those with `categories=None`) will no longer compare as equal to fully initialized dtype objects ([GH 38516](https://github.com/pandas-dev/pandas/issues/38516))
+ Accessing `_constructor_expanddim` on a `DataFrame` and `_constructor_sliced` on a `Series` now raises an `AttributeError`. Previously, a `NotImplementedError` was raised ([GH 38782](https://github.com/pandas-dev/pandas/issues/38782))
+ Added new `engine` and `**engine_kwargs` parameters to `DataFrame.to_sql()` to support other future "SQL engines". Currently we still only use `SQLAlchemy` under the hood, but more engines are planned to be supported, such as [turbodbc](https://turbodbc.readthedocs.io/en/latest/) ([GH 36893](https://github.com/pandas-dev/pandas/issues/36893))
+ Removed redundant `freq` from `PeriodIndex` string representation ([GH 41653](https://github.com/pandas-dev/pandas/issues/41653))
+ `ExtensionDtype.construct_array_type()` is now a required method for `ExtensionDtype` subclasses, rather than an optional method ([GH 24860](https://github.com/pandas-dev/pandas/issues/24860))
+ Calling `hash` on non-hashable pandas objects will now raise `TypeError` with the built-in error message (e.g. `unhashable type: 'Series'`). Previously it would raise a custom message such as `'Series' objects are mutable, thus they cannot be hashed`. Furthermore, `isinstance(<Series>, collections.abc.Hashable)` will now return `False` ([GH 40013](https://github.com/pandas-dev/pandas/issues/40013))
+ `Styler.from_custom_template()` now has two new arguments for template names, and the old `name` argument was removed, because template inheritance was introduced for better parsing ([GH 42053](https://github.com/pandas-dev/pandas/issues/42053)). Subclasses may also need corresponding modifications to Styler attributes.
### Build
+ Documentation in `.pptx` and `.pdf` formats is no longer included in wheels or source distributions ([GH 30741](https://github.com/pandas-dev/pandas/issues/30741))
## Deprecations
### Deprecated dropping nuisance columns in DataFrame reductions and DataFrameGroupBy operations
When calling a reduction (e.g. `.min`, `.max`, `.sum`) on a `DataFrame` with `numeric_only=None` (the default), columns where the reduction raises a `TypeError` are silently ignored and dropped from the result.
This behavior is deprecated. In a future version, a `TypeError` will be raised and the user will need to select a valid column before calling the function.
For example:
<pre><code class="language-python line-numbers">In [53]: df = pd.DataFrame({"A": [1, 2, 3, 4], "B": pd.date_range("2016-01-01", periods=4)})
In [54]: df
Out[54]:
   A          B
0  1 2016-01-01
1  2 2016-01-02
2  3 2016-01-03
3  4 2016-01-04
</code></pre>
<em>Old Behavior</em>:
<pre><code class="language-python line-numbers">In [3]: df.prod()
Out[3]:
A 24
dtype: int64
</code></pre>
<em>Future Behavior</em>:
<pre><code class="language-python line-numbers">In [4]: df.prod()
...
TypeError: 'DatetimeArray' does not implement reduction 'prod'
In [5]: df[["A"]].prod()
Out[5]:
A 24
dtype: int64
</code></pre>
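Code that selects the valid columns explicitly behaves the same before and after this deprecation takes effect. The following is a minimal sketch, assuming pandas 1.3 or later, showing three explicit alternatives to relying on the `numeric_only=None` default: selecting columns by name, passing `numeric_only=True`, and selecting by dtype with `select_dtypes`. The frame matches the example above.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": pd.date_range("2016-01-01", periods=4)})

# 1. Select the valid columns by name before reducing.
by_name = df[["A"]].prod()

# 2. Ask pandas explicitly to consider only numeric columns.
explicit = df.prod(numeric_only=True)

# 3. Select columns by dtype, which scales to wide frames.
by_dtype = df.select_dtypes(include="number").prod()

print(by_name, explicit, by_dtype, sep="\n")
```

Options 2 and 3 keep working when more numeric columns are added later, while option 1 makes the intended columns explicit at the call site.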
Similarly, when applying a function to <code>DataFrameGroupBy</code>, columns on which the function raises a <code>TypeError</code> are currently silently ignored and dropped from the result.
This behavior is deprecated. In a future version, a <code>TypeError</code> will be raised and the user will need to select a valid column before calling the function.
For example:
<pre><code class="language-python line-numbers">In [55]: df = pd.DataFrame({"A": [1, 2, 3, 4], "B": pd.date_range("2016-01-01", periods=4)})
In [56]: gb = df.groupby([1, 1, 2, 2])
</code></pre>
<em>Old Behavior</em>:
<pre><code class="language-python line-numbers">In [4]: gb.prod(numeric_only=False)
Out[4]:
A
1 2
2 12
</code></pre>
<em>Future Behavior</em>:
<pre><code class="language-python line-numbers">In [5]: gb.prod(numeric_only=False)
...
TypeError: datetime64 type does not support prod operations
In [6]: gb[["A"]].prod(numeric_only=False)
Out[6]:
A
1 2
2 12
</code></pre>
### Other deprecations
+ Deprecated allowing scalars to be passed to the `Categorical` constructor ([GH 38433](https://github.com/pandas-dev/pandas/issues/38433))
+ Deprecated constructing `CategoricalIndex` without passing list-like data ([GH 38944](https://github.com/pandas-dev/pandas/issues/38944))
+ Deprecated allowing subclass-specific keyword arguments in the `Index` constructor; use the specific subclass directly instead ([GH 14093](https://github.com/pandas-dev/pandas/issues/14093), [GH 21311](https://github.com/pandas-dev/pandas/issues/21311), [GH 22315](https://github.com/pandas-dev/pandas/issues/22315), [GH 26974](https://github.com/pandas-dev/pandas/issues/26974))
+ Deprecated `astype()` of datetimelike dtypes (`timedelta64[ns]`, `datetime64[ns]`, `Datetime64TZDtype`, `PeriodDtype`) to integer dtypes; use `values.view(...)` instead ([GH 38544](https://github.com/pandas-dev/pandas/issues/38544)). This deprecation was reverted in pandas 1.4.0.
+ Deprecated `MultiIndex.is_lexsorted()` and `MultiIndex.lexsort_depth()`; use `MultiIndex.is_monotonic_increasing()` instead ([GH 32259](https://github.com/pandas-dev/pandas/issues/32259))
+ Deprecated keyword `try_cast` in `Series.where()`, `Series.mask()`, `DataFrame.where()`, and `DataFrame.mask()`; cast results manually if necessary ([GH 38836](https://github.com/pandas-dev/pandas/issues/38836))
+ Deprecated comparison of `Timestamp` objects with `datetime.date` objects. Instead of e.g. `ts <= mydate`, use `ts <= pd.Timestamp(mydate)` or `ts.date() <= mydate` ([GH 36131](https://github.com/pandas-dev/pandas/issues/36131))
+ Deprecated `Rolling.win_type` returning `"freq"` ([GH 38963](https://github.com/pandas-dev/pandas/issues/38963))
+ Deprecated `Rolling.is_datetimelike` ([GH 38963](https://github.com/pandas-dev/pandas/issues/38963))
+ Deprecated `DataFrame` indexer for `Series.__setitem__()` and `DataFrame.__setitem__()` ([GH 39004](https://github.com/pandas-dev/pandas/issues/39004))
+ Deprecated `ExponentialMovingWindow.vol()` ([GH 39220](https://github.com/pandas-dev/pandas/issues/39220))
+ Deprecated using `.astype` to convert between `datetime64[ns]` dtype and `DatetimeTZDtype`; use `obj.tz_localize` or `obj.dt.tz_localize` instead ([GH 38622](https://github.com/pandas-dev/pandas/issues/38622))
+ Deprecated casting `datetime.date` objects to `datetime64` when used as `fill_value` in `DataFrame.unstack()`, `DataFrame.shift()`, `Series.shift()`, and `DataFrame.reindex()`; pass `pd.Timestamp(dateobj)` instead ([GH 39767](https://github.com/pandas-dev/pandas/issues/39767))
+ Deprecated `Styler.set_na_rep()` and `Styler.set_precision()` in favor of `Styler.format()` with `na_rep` and `precision` as an existing and a new input argument, respectively ([GH 40134](https://github.com/pandas-dev/pandas/issues/40134), [GH 40425](https://github.com/pandas-dev/pandas/issues/40425))
+ Deprecated `Styler.where()` in favor of an alternative formulation using `Styler.applymap()` ([GH 40821](https://github.com/pandas-dev/pandas/issues/40821))
+ Deprecated allowing partial failure in `Series.transform()` and `DataFrame.transform()` when `func` is list-like or dict-like and raises anything other than a `TypeError`; a `func` raising anything other than a `TypeError` will raise in a future version ([GH 40211](https://github.com/pandas-dev/pandas/issues/40211))
+ Deprecated the `error_bad_lines` and `warn_bad_lines` arguments of `read_csv()` and `read_table()` in favor of the `on_bad_lines` argument ([GH 15122](https://github.com/pandas-dev/pandas/issues/15122))
+ Deprecated support for `np.ma.mrecords.MaskedRecords` in the `DataFrame` constructor; pass `{name: data[name] for name in data.dtype.names}` instead ([GH 40363](https://github.com/pandas-dev/pandas/issues/40363))
+ Deprecated using `merge()`, `DataFrame.merge()`, and `DataFrame.join()` on a different number of levels ([GH 34862](https://github.com/pandas-dev/pandas/issues/34862))
+ Deprecated the use of `**kwargs` in `ExcelWriter`; use the keyword argument `engine_kwargs` instead ([GH 40430](https://github.com/pandas-dev/pandas/issues/40430))
+ The `level` keyword in `DataFrame` and `Series` aggregations is deprecated; use groupby instead ([GH 39983](https://github.com/pandas-dev/pandas/issues/39983))
+ Deprecated the `inplace` parameter of `Categorical.remove_categories()`, `Categorical.add_categories()`, `Categorical.reorder_categories()`, `Categorical.rename_categories()`, and `Categorical.set_categories()`; it will be removed in a future version ([GH 37643](https://github.com/pandas-dev/pandas/issues/37643))
+ Deprecated `merge()` producing duplicated columns through the `suffixes` keyword when columns with those names already exist ([GH 22818](https://github.com/pandas-dev/pandas/issues/22818))
+ Deprecated setting `Categorical._codes`; create a new `Categorical` with the desired codes instead ([GH 40606](https://github.com/pandas-dev/pandas/issues/40606))
+ Deprecated the optional argument `convert_float` in `read_excel()` and `ExcelFile.parse()` ([GH 41127](https://github.com/pandas-dev/pandas/issues/41127))
+ Deprecated the behavior of `DatetimeIndex.union()` with mixed time zones; in a future version both will be cast to UTC instead of object dtype ([GH 39328](https://github.com/pandas-dev/pandas/issues/39328))
+ Deprecated special treatment of out-of-bounds indices in `usecols` for `read_csv()` with `engine="c"` ([GH 25623](https://github.com/pandas-dev/pandas/issues/25623))
+ Deprecated special treatment of lists whose first element is a Categorical in the `DataFrame` constructor; pass as `pd.DataFrame({col: categorical, ...})` instead ([GH 38845](https://github.com/pandas-dev/pandas/issues/38845))
+ Deprecated behavior of the `DataFrame` constructor when a `dtype` is passed and the data cannot be converted to that dtype. In a future release, this will raise an exception instead of being silently ignored ([GH 24435](https://github.com/pandas-dev/pandas/issues/24435))
+ Deprecated `Timestamp.freq` property. For properties that use it (`is_month_start`, `is_month_end`, `is_quarter_start`, `is_quarter_end`, `is_year_start`, `is_year_end`), when you have a `freq`, use e.g. `freq.is_month_start(ts)` ([GH 15146](https://github.com/pandas-dev/pandas/issues/15146))
+ Deprecated constructing a `Series` or `DataFrame` with `DatetimeTZDtype` data and a `datetime64[ns]` dtype; use `Series(data).dt.tz_localize(None)` instead ([GH 41555](https://github.com/pandas-dev/pandas/issues/41555), [GH 33401](https://github.com/pandas-dev/pandas/issues/33401))
+ Deprecated the behavior of `Series` construction with large-integer values and a small-integer dtype silently overflowing; use `Series(data).astype(dtype)` instead ([GH 41734](https://github.com/pandas-dev/pandas/issues/41734))
+ Deprecated the behavior of `DataFrame` construction with floating data and an integer dtype casting, even when lossy; in a future version this will remain floating, matching `Series` behavior ([GH 41770](https://github.com/pandas-dev/pandas/issues/41770))
+ Deprecated inference of `timedelta64[ns]`, `datetime64[ns]`, or `DatetimeTZDtype` dtypes in `Series` construction when data containing strings is passed and no `dtype` is passed ([GH 33558](https://github.com/pandas-dev/pandas/issues/33558))
+ In a future version, constructing a `Series` or `DataFrame` with `datetime64[ns]` data and a `DatetimeTZDtype` will treat the data as wall times instead of as UTC times (matching `DatetimeIndex` behavior). To treat the data as UTC times, use `pd.Series(data).dt.tz_localize("UTC").dt.tz_convert(dtype.tz)` or `pd.Series(data.view("int64"), dtype=dtype)` ([GH 33401](https://github.com/pandas-dev/pandas/issues/33401))
+ Deprecated passing lists as `key` to `DataFrame.xs()` and `Series.xs()` ([GH 41760](https://github.com/pandas-dev/pandas/issues/41760))
+ Deprecated boolean arguments for `inclusive` in `Series.between()` in favor of the standard argument values `{"left", "right", "neither", "both"}` ([GH 40628](https://github.com/pandas-dev/pandas/issues/40628))
+ Deprecated the use of positional parameter passing for all of the following, except in special cases ([GH 41485](https://github.com/pandas-dev/pandas/issues/41485)):
  + `concat()` (except `objs`)
  + `read_csv()` (except `filepath_or_buffer`)
  + `read_table()` (except `filepath_or_buffer`)
  + `DataFrame.clip()` and `Series.clip()` (except `upper` and `lower`)
  + `DataFrame.drop_duplicates()` (except `subset`), `Series.drop_duplicates()`, `Index.drop_duplicates()` and `MultiIndex.drop_duplicates()`
  + `DataFrame.drop()` (except `labels`) and `Series.drop()`
  + `DataFrame.dropna()` and `Series.dropna()`
  + `DataFrame.ffill()`, `Series.ffill()`, `DataFrame.bfill()` and `Series.bfill()`
  + `DataFrame.fillna()` and `Series.fillna()` (except `value`)
  + `DataFrame.interpolate()` and `Series.interpolate()` (except `method`)
  + `DataFrame.mask()` and `Series.mask()` (except `cond` and `other`)
  + `DataFrame.reset_index()` (except `level`) and `Series.reset_index()`
  + `DataFrame.set_axis()` and `Series.set_axis()` (except `labels`)
  + `DataFrame.set_index()` (except `keys`)
  + `DataFrame.sort_index()` and `Series.sort_index()`
  + `DataFrame.sort_values()` (except `by`) and `Series.sort_values()`
  + `DataFrame.where()` and `Series.where()` (except `cond` and `other`)
  + `Index.set_names()` and `MultiIndex.set_names()` (except `names`)
  + `MultiIndex.set_codes()` (except `codes`)
  + `MultiIndex.set_levels()` (except `levels`)
  + `Resampler.interpolate()` (except `method`)
## Performance improvements
+ Performance improvements for `IntervalIndex.isin()` ([GH 38353](https://github.com/pandas-dev/pandas/issues/38353))
+ Performance improvements in `Series.mean()` for nullable data types ([GH 34814](https://github.com/pandas-dev/pandas/issues/34814))
+ Performance improvements in `Series.isin()` for nullable data types ([GH 38340](https://github.com/pandas-dev/pandas/issues/38340))
+ Performance improvements in `DataFrame.fillna()` with `method="pad"` or `method="backfill"` for nullable floating and nullable integer dtypes ([GH 39953](https://github.com/pandas-dev/pandas/issues/39953))
+ Performance improvements in `DataFrame.corr()` for `method="kendall"` ([GH 28329](https://github.com/pandas-dev/pandas/issues/28329))
+ Performance improvements in `DataFrame.corr()` for `method="spearman"` ([GH 40956](https://github.com/pandas-dev/pandas/issues/40956), [GH 41885](https://github.com/pandas-dev/pandas/issues/41885))
+ Performance improvements for `Rolling.corr()` and `Rolling.cov()` ([GH 39388](https://github.com/pandas-dev/pandas/issues/39388))
+ Performance improvements in `RollingGroupby.corr()`, `ExpandingGroupby.corr()` and `ExpandingGroupby.cov()` ([GH 39591](https://github.com/pandas-dev/pandas/issues/39591))
+ Performance improvements in `unique()` for object data types ([GH 37615](https://github.com/pandas-dev/pandas/issues/37615))
+ Performance improvements in `json_normalize()` for basic cases (including separators) ([GH 40035](https://github.com/pandas-dev/pandas/issues/40035), [GH 15621](https://github.com/pandas-dev/pandas/issues/15621))
+ Performance improvements in `ExpandingGroupby` aggregation methods ([GH 39664](https://github.com/pandas-dev/pandas/issues/39664))
+ Performance improvements in `Styler`: rendering times are reduced by more than 50% and now match `DataFrame.to_html()` ([GH 39972](https://github.com/pandas-dev/pandas/issues/39972), [GH 39952](https://github.com/pandas-dev/pandas/issues/39952), [GH 40425](https://github.com/pandas-dev/pandas/issues/40425))
+ The method `Styler.set_td_classes()` is now as performant as `Styler.apply()` and `Styler.applymap()`, and even more so in some cases ([GH 40453](https://github.com/pandas-dev/pandas/issues/40453))
+ Performance improvements in `ExponentialMovingWindow.mean()` with `times` parameter ([GH 39784](https://github.com/pandas-dev/pandas/issues/39784))
+ Performance improvements in `DataFrameGroupBy.apply()` and `SeriesGroupBy.apply()` when the Python fallback implementation is required ([GH 40176](https://github.com/pandas-dev/pandas/issues/40176))
+ Performance improvements when converting PyArrow boolean arrays to pandas nullable boolean arrays ([GH 41051](https://github.com/pandas-dev/pandas/issues/41051))
+ Performance improvements when concatenating data with `CategoricalDtype` ([GH 40193](https://github.com/pandas-dev/pandas/issues/40193))
+ Performance improvements in `DataFrameGroupBy.cummin()`, `SeriesGroupBy.cummin()`, `DataFrameGroupBy.cummax()` and `SeriesGroupBy.cummax()` with nullable data types ([GH 37493](https://github.com/pandas-dev/pandas/issues/37493))
+ Performance improvements in `Series.nunique()` with NaN values ([GH 40865](https://github.com/pandas-dev/pandas/issues/40865))
+ Performance improvements in `DataFrame.transpose()` and `Series.unstack()` with `DatetimeTZDtype` ([GH 40149](https://github.com/pandas-dev/pandas/issues/40149))
+ Performance improvements in `Series.plot()` and `DataFrame.plot()` with entry point lazy loading ([GH 41492](https://github.com/pandas-dev/pandas/issues/41492))
## Bug fixes
### Categorical
+ Bug in `CategoricalIndex` incorrectly failing to raise `TypeError` when scalar data was passed ([GH 38614](https://github.com/pandas-dev/pandas/issues/38614))
+ Bug in `CategoricalIndex.reindex()` failing when the passed `Index` was not categorical but its values were all labels in the categories ([GH 28690](https://github.com/pandas-dev/pandas/issues/28690))
+ Bug where constructing a `Categorical` from an object-dtype array of `date` objects did not round-trip correctly with `astype` ([GH 38552](https://github.com/pandas-dev/pandas/issues/38552))
+ Bug in constructing `DataFrame` from `ndarray` and `CategoricalDtype` ([GH 38857](https://github.com/pandas-dev/pandas/issues/38857))
+ Bug in setting categorical values into an object-dtype column of a `DataFrame` ([GH 39136](https://github.com/pandas-dev/pandas/issues/39136))
+ Bug in `DataFrame.reindex()` raising `IndexError` when the new index contained duplicates and the old index was a `CategoricalIndex` ([GH 38906](https://github.com/pandas-dev/pandas/issues/38906))
+ Bug in `Categorical.fillna()` raising `NotImplementedError` instead of `ValueError` when filling with a tuple that is not a category ([GH 41914](https://github.com/pandas-dev/pandas/issues/41914))
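As a minimal sketch of the `CategoricalIndex` fix above (GH 38614), assuming pandas 1.3.0 or later is installed, passing a scalar instead of a collection should raise `TypeError`. The variable name `error` is illustrative only.

```python
import pandas as pd

# A scalar is not a collection of categorical values, so the constructor
# is expected to raise TypeError (pandas >= 1.3 assumed).
error = None
try:
    pd.CategoricalIndex(1)
except TypeError as exc:
    error = exc

print(type(error).__name__)  # TypeError
```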
### Datetimelike
+ Bug in `DataFrame` and `Series` constructors sometimes dropping nanoseconds from `Timestamp` (resp. `Timedelta`) `data` when `dtype` was `datetime64[ns]` (resp. `timedelta64[ns]`) ([GH 38032](https://github.com/pandas-dev/pandas/issues/38032))
+ Bug in `DataFrame.first()` and `Series.first()` with an offset of one month returning an incorrect result when the first day is the last day of a month ([GH 29623](https://github.com/pandas-dev/pandas/issues/29623))
+ Bug in constructing a `DataFrame` or `Series` with mismatched `datetime64` data and `timedelta64` dtype, or vice versa, failing to raise a `TypeError` ([GH 38575](https://github.com/pandas-dev/pandas/issues/38575), [GH 38764](https://github.com/pandas-dev/pandas/issues/38764), [GH 38792](https://github.com/pandas-dev/pandas/issues/38792))
+ Bug in constructing a `Series` or `DataFrame` with a `datetime` object that is out of bounds for `datetime64[ns]` dtype, or a `timedelta` object that is out of bounds for `timedelta64[ns]` dtype ([GH 38792](https://github.com/pandas-dev/pandas/issues/38792), [GH 38965](https://github.com/pandas-dev/pandas/issues/38965))
+ Bug in `DatetimeIndex.intersection()`, `DatetimeIndex.symmetric_difference()`, `PeriodIndex.intersection()` and `PeriodIndex.symmetric_difference()` always returning object dtype when operating with a `CategoricalIndex` ([GH 38741](https://github.com/pandas-dev/pandas/issues/38741))
+ Bug in `DatetimeIndex.intersection()` giving incorrect results with non-Tick frequencies when `n != 1` ([GH 42104](https://github.com/pandas-dev/pandas/issues/42104))
+ Bug in `Series.where()` incorrectly casting `datetime64` values to `int64` ([GH 37682](https://github.com/pandas-dev/pandas/issues/37682))
+ Bug in `Categorical` incorrectly casting `datetime` objects to `Timestamp` ([GH 38878](https://github.com/pandas-dev/pandas/issues/38878))
+ Bug in comparisons between `Timestamp` objects and `datetime64` objects just outside the implementation bounds of nanosecond `datetime64` ([GH 39221](https://github.com/pandas-dev/pandas/issues/39221))
+ Bug in `Timestamp.round()`, `Timestamp.floor()` and `Timestamp.ceil()` for values near the implementation bounds of `Timestamp` ([GH 39244](https://github.com/pandas-dev/pandas/issues/39244))
+ Bug in `Timedelta.round()`, `Timedelta.floor()` and `Timedelta.ceil()` for values near the implementation bounds of `Timedelta` ([GH 38964](https://github.com/pandas-dev/pandas/issues/38964))
+ Bug in `date_range()` incorrectly creating a `DatetimeIndex` containing `NaT` instead of raising `OutOfBoundsDatetime` in corner cases ([GH 24124](https://github.com/pandas-dev/pandas/issues/24124))
+ Bug in `infer_freq()` incorrectly failing to infer the 'H' frequency of a `DatetimeIndex` that has a time zone and crosses DST boundaries ([GH 39556](https://github.com/pandas-dev/pandas/issues/39556))
+ Bug in `Series` backed by `DatetimeArray` or `TimedeltaArray` sometimes failing to set the array's `freq` to `None` ([GH 41425](https://github.com/pandas-dev/pandas/issues/41425))
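The sketch below, assuming pandas 1.3.0 or later, masks a `datetime64` `Series` with `Series.where()` and checks that the result keeps a datetime dtype, with `NaT` in the masked slot, rather than being cast to `int64` (GH 37682). It is an illustrative input, not the exact case from the original report.

```python
import pandas as pd

s = pd.Series(pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"]))

# Keep only dates after January 1; the masked position becomes NaT.
result = s.where(s > pd.Timestamp("2021-01-01"))

print(result.dtype.kind)       # M (a datetime64 dtype)
print(result.isna().tolist())  # [True, False, False]
```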
### Timedelta
+ Bug in constructing `Timedelta` from `np.timedelta64` objects with non-nanosecond units that are out of bounds for `timedelta64[ns]` ([GH 38965](https://github.com/pandas-dev/pandas/issues/38965))
+ Bug in constructing a `TimedeltaIndex` incorrectly accepting `np.datetime64("NaT")` objects ([GH 39462](https://github.com/pandas-dev/pandas/issues/39462))
+ Bug in constructing `Timedelta` from an input string containing only symbols and no digits failing to raise an error ([GH 39710](https://github.com/pandas-dev/pandas/issues/39710))
+ Bug in `TimedeltaIndex` and `to_timedelta()` failing to raise when passed non-nanosecond `timedelta64` arrays that overflow when converted to `timedelta64[ns]` ([GH 40008](https://github.com/pandas-dev/pandas/issues/40008))
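A minimal sketch of the symbols-only string fix (GH 39710), assuming pandas 1.3.0 or later: a lone sign with no digits is expected to raise `ValueError`, while a well-formed duration string still parses. The names `raised` and `valid` are illustrative.

```python
import pandas as pd

# A string made only of a sign and no digits is not a valid duration.
raised = False
try:
    pd.Timedelta("-")
except ValueError:
    raised = True

# A normal negative duration string is still accepted.
valid = pd.Timedelta("-1 days")

print(raised)  # True
print(valid)   # -1 days +00:00:00
```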
### Time zone
+ Bug in different `tzinfo` objects representing UTC not being treated as equivalent ([GH 39216](https://github.com/pandas-dev/pandas/issues/39216))
+ Bug in `dateutil.tz.gettz("UTC")` not being recognized as equivalent to other UTC-representing `tzinfo` objects ([GH 39276](https://github.com/pandas-dev/pandas/issues/39276))
### Numeric
+ Bug in `DataFrame.quantile()` and `DataFrame.sort_values()` causing incorrect subsequent indexing behavior ([GH 38351](https://github.com/pandas-dev/pandas/issues/38351))
+ Bug in `DataFrame.sort_values()` raising an `IndexError` for an empty `by` ([GH 40258](https://github.com/pandas-dev/pandas/issues/40258))
+ Bug in `DataFrame.select_dtypes()` with `include=np.number` dropping numeric `ExtensionDtype` columns ([GH 35340](https://github.com/pandas-dev/pandas/issues/35340))
+ Bug in `DataFrame.mode()` and `Series.mode()` not keeping a consistent integer `Index` for empty input ([GH 33321](https://github.com/pandas-dev/pandas/issues/33321))
+ Bug in `DataFrame.rank()` when the DataFrame contained `np.inf` ([GH 32593](https://github.com/pandas-dev/pandas/issues/32593))
+ Bug in `DataFrame.rank()` raising an `IndexError` with `axis=0` and columns holding incomparable types ([GH 38932](https://github.com/pandas-dev/pandas/issues/38932))
+ Bug in `Series.rank()`, `DataFrame.rank()`, `DataFrameGroupBy.rank()` and `SeriesGroupBy.rank()` treating the most negative `int64` value as missing ([GH 32859](https://github.com/pandas-dev/pandas/issues/32859))
+ Bug in `DataFrame.select_dtypes()` behaving differently on Windows and Linux with `include="int"` ([GH 36596](https://github.com/pandas-dev/pandas/issues/36596))
+ Bug in `DataFrame.apply()` and `DataFrame.agg()` operating on the entire `DataFrame` instead of on rows or columns when passed `func="size"` ([GH 39934](https://github.com/pandas-dev/pandas/issues/39934))
+ Bug in `DataFrame.transform()` raising a `SpecificationError` when passed a dictionary and columns were missing; it now raises a `KeyError` instead ([GH 40004](https://github.com/pandas-dev/pandas/issues/40004))
+ Bug in `DataFrameGroupBy.rank()` and `SeriesGroupBy.rank()` giving incorrect results with `pct=True` and equal values between consecutive groups ([GH 40518](https://github.com/pandas-dev/pandas/issues/40518))
+ Bug in `Series.count()` returning an `int32` result on 32-bit platforms when called with `level=None` ([GH 40908](https://github.com/pandas-dev/pandas/issues/40908))
+ Bug in `Series` and `DataFrame` reductions with the methods `any` and `all` not returning Boolean results for object data ([GH 12863](https://github.com/pandas-dev/pandas/issues/12863), [GH 35450](https://github.com/pandas-dev/pandas/issues/35450), [GH 27709](https://github.com/pandas-dev/pandas/issues/27709))
+ Bug in `Series.clip()` failing if the Series contains NA values and has a nullable integer or float data type ([GH 40851](https://github.com/pandas-dev/pandas/issues/40851))
+ Bug in `UInt64Index.where()` and `UInt64Index.putmask()` with an `np.int64` dtype `other` incorrectly raising `TypeError` ([GH 41974](https://github.com/pandas-dev/pandas/issues/41974))
+ Bug in `DataFrame.agg()` not sorting the aggregated axis in the order of the provided aggregation functions when one or more aggregation functions fail to produce results ([GH 33634](https://github.com/pandas-dev/pandas/issues/33634))
+ Bug in `DataFrame.clip()` not interpreting missing values as no threshold ([GH 40420](https://github.com/pandas-dev/pandas/issues/40420))
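To illustrate the `DataFrame.clip()` fix (GH 40420), here is a small sketch assuming pandas 1.3.0 or later: a row-wise `lower` bound containing `NaN` leaves those rows unclipped instead of replacing them with `NaN`. The data and the name `lower` are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 5, 10]})

# Row-wise lower bounds aligned on the index; NaN means "no bound here".
lower = pd.Series([np.nan, 6, np.nan])

clipped = df.clip(lower=lower, axis=0)

# Only the middle row is raised to its bound; the result dtype may be
# upcast to float because the bound Series is float.
print(clipped["a"].tolist())  # values: 1, 6, 10
```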
### Conversion
+ Bug in `DataFrame.to_dict()` with `orient='records'`; it now returns Python native types ([GH 25969](https://github.com/pandas-dev/pandas/issues/25969))
+ Bug in `Series.view()` and `Index.view()` when converting between datetime-like (`datetime64[ns]`, `datetime64[ns, tz]`, `timedelta64`, `period`) dtypes ([GH 39788](https://github.com/pandas-dev/pandas/issues/39788))
+ Bug creating `DataFrame` from empty `np.recarray` not preserving original dtypes ([GH 40121](https://github.com/pandas-dev/pandas/issues/40121))
+ Bug in `DataFrame` failing to raise a `TypeError` when constructed from a `frozenset` ([GH 40163](https://github.com/pandas-dev/pandas/issues/40163))
+ Bug in `Index` construction silently ignoring a passed `dtype` when the data cannot be cast to that dtype ([GH 21311](https://github.com/pandas-dev/pandas/issues/21311))
+ Bug in `StringArray.astype()` falling back to NumPy and raising when converting to `dtype='categorical'` ([GH 40450](https://github.com/pandas-dev/pandas/issues/40450))
+ Bug in `factorize()` where, when given an array with a numeric NumPy dtype lower than int64, uint64 and float64, the unique values did not keep their original dtype ([GH 41132](https://github.com/pandas-dev/pandas/issues/41132))
+ Bug in `DataFrame` construction with a dictionary containing an array-like with `ExtensionDtype` and `copy=True` failing to make a copy ([GH 38939](https://github.com/pandas-dev/pandas/issues/38939))
+ Bug in `qcut()` raising an error when taking `Float64Dtype` as input ([GH 40730](https://github.com/pandas-dev/pandas/issues/40730))
+ Bug in `DataFrame` and `Series` construction with `datetime64[ns]` data and `dtype=object` resulting in `datetime` objects instead of `Timestamp` objects ([GH 41599](https://github.com/pandas-dev/pandas/issues/41599))
+ Bug in `DataFrame` and `Series` construction with `timedelta64[ns]` data and `dtype=object` resulting in `np.timedelta64` objects instead of `Timedelta` objects ([GH 41599](https://github.com/pandas-dev/pandas/issues/41599))
+ Bug in `DataFrame` construction when given a two-dimensional object-dtype `np.ndarray` of `Period` or `Interval` objects failing to cast to `PeriodDtype` or `IntervalDtype`, respectively ([GH 41812](https://github.com/pandas-dev/pandas/issues/41812))
+ Bug in constructing a `Series` from a list and a `PandasDtype` ([GH 39357](https://github.com/pandas-dev/pandas/issues/39357))
+ Bug in creating a `Series` from a `range` object that does not fit in the bounds of `int64` dtype ([GH 30173](https://github.com/pandas-dev/pandas/issues/30173))
+ Bug in creating a `Series` from a `dict` with all-tuple keys and an `Index` that requires reindexing ([GH 41707](https://github.com/pandas-dev/pandas/issues/41707))
+ Bug in `infer_dtype()` not recognizing Series, Index or array with a Period dtype ([GH 23553](https://github.com/pandas-dev/pandas/issues/23553))
+ Bug in `infer_dtype()` raising an error for general `ExtensionArray` objects; it now returns `"unknown-array"` instead of raising ([GH 37367](https://github.com/pandas-dev/pandas/issues/37367))
+ Bug in `DataFrame.convert_dtypes()` incorrectly raising a `ValueError` when called on an empty DataFrame ([GH 40393](https://github.com/pandas-dev/pandas/issues/40393))
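A short sketch of the `to_dict(orient="records")` fix (GH 25969), assuming pandas 1.3.0 or later. Note that `orient` is a parameter of `DataFrame.to_dict()`, so the example uses a `DataFrame`; the column names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [0.5, 1.5]})
records = df.to_dict(orient="records")

# Values are expected to be built-in Python scalars, not NumPy scalars.
print(records)                # [{'a': 1, 'b': 0.5}, {'a': 2, 'b': 1.5}]
print(type(records[0]["a"]))  # <class 'int'>
```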
### Strings
+ Bug in the conversion from `pyarrow.ChunkedArray` to `StringArray` when the original had zero chunks ([GH 41040](https://github.com/pandas-dev/pandas/issues/41040))
+ Bug in `Series.replace()` and `DataFrame.replace()` ignoring replacements with `regex=True` for `StringDtype` data ([GH 41333](https://github.com/pandas-dev/pandas/issues/41333), [GH 35977](https://github.com/pandas-dev/pandas/issues/35977))
+ Bug in `Series.str.extract()` with `StringArray` returning object dtype for an empty `DataFrame` ([GH 41441](https://github.com/pandas-dev/pandas/issues/41441))
+ Bug in `Series.str.replace()` where the `case` argument was ignored when `regex=False` ([GH 41602](https://github.com/pandas-dev/pandas/issues/41602))
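With the `Series.str.replace()` fix (GH 41602), `case=False` is honored even for literal patterns. A minimal sketch, assuming pandas 1.3.0 or later, with made-up data: the dot in the pattern is matched literally, and the match ignores case.

```python
import pandas as pd

s = pd.Series(["A.b", "aXb"])

# regex=False: "a." is a literal string, so "." is not a wildcard.
# case=False: the literal match is case-insensitive.
result = s.str.replace("a.", "_", case=False, regex=False)

print(result.tolist())  # ['_b', 'aXb']
```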
### Interval
+ Bug in `IntervalIndex.intersection()` and `IntervalIndex.symmetric_difference()` always returning object dtype when operating with a `CategoricalIndex` ([GH 38653](https://github.com/pandas-dev/pandas/issues/38653), [GH 38741](https://github.com/pandas-dev/pandas/issues/38741))
+ Bug in `IntervalIndex.intersection()` returning duplicates when at least one of the `Index` objects has duplicates that are present in the other ([GH 38743](https://github.com/pandas-dev/pandas/issues/38743))
+ `IntervalIndex.union()`, `IntervalIndex.intersection()`, `IntervalIndex.difference()` and `IntervalIndex.symmetric_difference()` now cast to the appropriate dtype instead of raising a `TypeError` when operating with another `IntervalIndex` of incompatible dtype ([GH 39267](https://github.com/pandas-dev/pandas/issues/39267))
+ `PeriodIndex.union()`, `PeriodIndex.intersection()`, `PeriodIndex.symmetric_difference()` and `PeriodIndex.difference()` now cast to object dtype instead of raising `IncompatibleFrequency` when operating with another `PeriodIndex` of incompatible dtype ([GH 39306](https://github.com/pandas-dev/pandas/issues/39306))
+ A bug ([GH 41831](https://github.com/pandas-dev/pandas/issues/41831))
### Indexing
+ Bug in `Index.union()` and `MultiIndex.union()` dropping duplicate `Index` values when the `Index` was not monotonic or `sort` was set to `False` ([GH 36289](https://github.com/pandas-dev/pandas/issues/36289), [GH 31326](https://github.com/pandas-dev/pandas/issues/31326), [GH 40862](https://github.com/pandas-dev/pandas/issues/40862))
+ Bug in `CategoricalIndex.get_indexer()` failing to raise `InvalidIndexError` when non-unique ([GH 38372](https://github.com/pandas-dev/pandas/issues/38372))
+ Bug in `IntervalIndex.get_indexer()` when `target` has `CategoricalDtype` and both the index and the target contain NA values ([GH 41934](https://github.com/pandas-dev/pandas/issues/41934))
+ Bug in `Series.loc()` raising a `ValueError` when the input was filtered with a Boolean list and the values to set were a list with lower dimension ([GH 20438](https://github.com/pandas-dev/pandas/issues/20438))
+ Bug in inserting many new columns into a `DataFrame`, causing incorrect subsequent indexing behavior ([GH 38380](https://github.com/pandas-dev/pandas/issues/38380))
+ Bug in `DataFrame.__setitem__()` raising `ValueError` when setting multiple values to duplicate columns ([GH 15695](https://github.com/pandas-dev/pandas/issues/15695))
+ Bug in `DataFrame.loc()`, `Series.loc()`, `DataFrame.__getitem__()` and `Series.__getitem__()` returning incorrect elements for string slices on a non-monotonic `DatetimeIndex` ([GH 33146](https://github.com/pandas-dev/pandas/issues/33146))
+ Bug in `DataFrame.reindex()` and `Series.reindex()` with time zone aware indexes raising a `TypeError` for `method="ffill"` and `method="bfill"` with a specified `tolerance` ([GH 38566](https://github.com/pandas-dev/pandas/issues/38566))
+ Bug in `DataFrame.reindex()` with `datetime64[ns]` or `timedelta64[ns]` incorrectly casting to integers when the `fill_value` requires casting to object dtype ([GH 39755](https://github.com/pandas-dev/pandas/issues/39755))
+ Bug in `DataFrame.__setitem__()` raising a `ValueError` when setting on an empty `DataFrame` using specified columns and a non-empty `DataFrame` value ([GH 38831](https://github.com/pandas-dev/pandas/issues/38831))
+ Bug in `DataFrame.loc.__setitem__()` raising a `ValueError` when operating on a unique column while the `DataFrame` has duplicate columns ([GH 38521](https://github.com/pandas-dev/pandas/issues/38521))
+ Bug in `DataFrame.iloc.__setitem__()` and `DataFrame.loc.__setitem__()` with mixed dtypes when setting with a dictionary value ([GH 38335](https://github.com/pandas-dev/pandas/issues/38335))
+ Bug in `Series.loc.__setitem__()` and `DataFrame.loc.__setitem__()` raising `KeyError` when provided a Boolean generator ([GH 39614](https://github.com/pandas-dev/pandas/issues/39614))
+ Bug in `Series.iloc()` and `DataFrame.iloc()` raising a `KeyError` when provided a generator ([GH 39614](https://github.com/pandas-dev/pandas/issues/39614))
+ Bug in `DataFrame.__setitem__()` not raising a `ValueError` when the right-hand side is a `DataFrame` with the wrong number of columns ([GH 38604](https://github.com/pandas-dev/pandas/issues/38604))
+ Bug in `Series.__setitem__()` raising a `ValueError` when setting a `Series` with a scalar indexer ([GH 38303](https://github.com/pandas-dev/pandas/issues/38303))
+ Bug in `DataFrame.loc()` dropping levels of a `MultiIndex` when the `DataFrame` used as input has only one row ([GH 10521](https://github.com/pandas-dev/pandas/issues/10521))
+ Bug in `DataFrame.__getitem__()` and `Series.__getitem__()` always raising `KeyError` when slicing with existing strings where the `Index` has milliseconds ([GH 33589](https://github.com/pandas-dev/pandas/issues/33589))
+ Bug in setting `timedelta64` or `datetime64` values into a numeric `Series` failing to cast to object dtype ([GH 39086](https://github.com/pandas-dev/pandas/issues/39086), [GH 39619](https://github.com/pandas-dev/pandas/issues/39619))
+ Bug in setting `Interval` values into a `Series` or `DataFrame` with mismatched `IntervalDtype` incorrectly casting the new values to the existing dtype ([GH 39120](https://github.com/pandas-dev/pandas/issues/39120))
+ Bug in setting `datetime64` values into a `Series` with integer dtype incorrectly casting the `datetime64` values to integers ([GH 39266](https://github.com/pandas-dev/pandas/issues/39266))
+ Bug in setting `np.datetime64("NaT")` into a `Series` with `Datetime64TZDtype` incorrectly treating the time zone naive value as time zone aware ([GH 39769](https://github.com/pandas-dev/pandas/issues/39769))
+ Bug in `Index.get_loc()` not raising `KeyError` when `key=NaN` and `method` is specified but `NaN` is not in the `Index` ([GH 39382](https://github.com/pandas-dev/pandas/issues/39382))
+ Bug in `DatetimeIndex.insert()` when inserting `np.datetime64("NaT")` into a time zone aware index incorrectly treating the time zone naive value as time zone aware ([GH 39769](https://github.com/pandas-dev/pandas/issues/39769))
+ Bug in `Index.insert()` when setting a new column that cannot be held in the existing `frame.columns`, or in `Series.reset_index()` or `DataFrame.reset_index()`, incorrectly raising instead of casting to a compatible dtype ([GH 39068](https://github.com/pandas-dev/pandas/issues/39068))
+ Bug in `RangeIndex.append()` where a single object of length 1 was concatenated incorrectly ([GH 39401](https://github.com/pandas-dev/pandas/issues/39401))
+ Bug in `RangeIndex.astype()` where, when converting to `CategoricalIndex`, the categories became an `Int64Index` instead of a `RangeIndex` ([GH 41263](https://github.com/pandas-dev/pandas/issues/41263))
+ Bug in setting `numpy.timedelta64` values into an object-dtype `Series` using a Boolean indexer ([GH 39488](https://github.com/pandas-dev/pandas/issues/39488))
+ Bug in setting numeric values into a Boolean-dtype `Series` using `at` or `iat` failing to cast to object dtype ([GH 39582](https://github.com/pandas-dev/pandas/issues/39582))
+ Bug in `DataFrame.__setitem__()` and `DataFrame.iloc.__setitem__()` raising a `ValueError` when trying to index with a row slice and set a list as values ([GH 40440](https://github.com/pandas-dev/pandas/issues/40440))
+ Bug in `DataFrame.loc()` not raising `KeyError` when the key was not found in the `MultiIndex` and the level was not fully specified ([GH 41170](https://github.com/pandas-dev/pandas/issues/41170))
+ Bug in `DataFrame.loc.__setitem__()` when setting with expansion incorrectly raising when the index on the expanding axis contained duplicates ([GH 40096](https://github.com/pandas-dev/pandas/issues/40096))
+ Bug in `DataFrame.loc.__getitem__()` with a `MultiIndex` casting to float when at least one index column has a float dtype and a scalar is retrieved ([GH 41369](https://github.com/pandas-dev/pandas/issues/41369))
+ Bug in `DataFrame.loc()` incorrectly matching non-Boolean index elements ([GH 20432](https://github.com/pandas-dev/pandas/issues/20432))
+ Bug in indexing with `np.nan` on a `Series` or `DataFrame` with a `CategoricalIndex` incorrectly raising `KeyError` when `np.nan` keys are present ([GH 41933](https://github.com/pandas-dev/pandas/issues/41933))
+ Bug in `Series.__delitem__()` with `ExtensionDtype` incorrectly casting to `ndarray` ([GH 40386](https://github.com/pandas-dev/pandas/issues/40386))
+ Bug in `DataFrame.at()` with a `CategoricalIndex` returning incorrect results when passed integer keys ([GH 41846](https://github.com/pandas-dev/pandas/issues/41846))
+ Bug in `DataFrame.loc()` returning a `MultiIndex` in the wrong order if an indexer has duplicates ([GH 40978](https://github.com/pandas-dev/pandas/issues/40978))
+ Bug in `DataFrame.__setitem__()` raising a `TypeError` when using a `str` subclass as the column name with a `DatetimeIndex` ([GH 37366](https://github.com/pandas-dev/pandas/issues/37366))
+ Bug in `PeriodIndex.get_loc()` failing to raise a `KeyError` when given a `Period` with a mismatched `freq` ([GH 41670](https://github.com/pandas-dev/pandas/issues/41670))
+ Bug in `.loc.__getitem__` with a `UInt64Index` and negative integer keys raising `OverflowError` instead of `KeyError` in some cases, and wrapping around to positive integers in others ([GH 41777](https://github.com/pandas-dev/pandas/issues/41777))
+ Bug in `Index.get_indexer()` failing to raise `ValueError` in some cases with invalid `method`, `limit` or `tolerance` arguments ([GH 41918](https://github.com/pandas-dev/pandas/issues/41918))
+ Bug in slicing a `Series` or `DataFrame` with a `TimedeltaIndex` when passing an invalid string raising `ValueError` instead of `TypeError` ([GH 41821](https://github.com/pandas-dev/pandas/issues/41821))
+ Bug in the `Index` constructor sometimes silently ignoring a specified `dtype` ([GH 38879](https://github.com/pandas-dev/pandas/issues/38879))
+ `Index.where()` now behaves the same as `Index.putmask()`: `index.where(mask, other)` matches `index.putmask(~mask, other)` ([GH 39412](https://github.com/pandas-dev/pandas/issues/39412))
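The `Index.where()` / `Index.putmask()` equivalence can be checked with a small sketch, assuming pandas 1.3.0 or later; the index values and mask are arbitrary.

```python
import numpy as np
import pandas as pd

idx = pd.Index([10, 20, 30])
mask = np.array([True, False, True])

# where keeps values where mask is True and fills the rest with `other`;
# putmask writes `other` where its own mask is True, hence the inversion.
via_where = idx.where(mask, 0)
via_putmask = idx.putmask(~mask, 0)

print(via_where.tolist())             # [10, 0, 30]
print(via_where.equals(via_putmask))  # True
```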
### Missing
+ Bug in `Grouper` not correctly propagating the `dropna` argument; `DataFrameGroupBy.transform()` now correctly handles missing values for `dropna=True` ([GH 35612](https://github.com/pandas-dev/pandas/issues/35612))
+ Bug in `isna()`, `Series.isna()`, `Index.isna()`, `DataFrame.isna()` and the corresponding `notna` functions not recognizing `Decimal("NaN")` objects ([GH 39409](https://github.com/pandas-dev/pandas/issues/39409))
+ Bug in `DataFrame.fillna()` not accepting a dictionary for the `downcast` keyword ([GH 40809](https://github.com/pandas-dev/pandas/issues/40809))
+ Bug in `isna()` not returning a copy of the mask for nullable types, causing any subsequent mask modification to change the original array ([GH 40935](https://github.com/pandas-dev/pandas/issues/40935))
+ Bug in `DataFrame` construction with floating-point data containing `NaN` and an integer `dtype` casting instead of retaining the `NaN` ([GH 26919](https://github.com/pandas-dev/pandas/issues/26919))
+ Bug in `Series.isin()` and `MultiIndex.isin()` not treating all NaNs as equivalent when they were in tuples ([GH 41836](https://github.com/pandas-dev/pandas/issues/41836))
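A minimal sketch of the `Decimal("NaN")` fix (GH 39409), assuming pandas 1.3.0 or later: both the scalar `isna()` function and the `Series` methods treat a decimal NaN as missing.

```python
from decimal import Decimal

import pandas as pd

values = pd.Series([Decimal("1.5"), Decimal("NaN")])

# Decimal("NaN") is expected to be detected as a missing value.
print(pd.isna(Decimal("NaN")))  # True
print(values.isna().tolist())   # [False, True]
print(values.notna().tolist())  # [True, False]
```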
### MultiIndex
+ Bug in `DataFrame.drop()` raising a `TypeError` when the `MultiIndex` is non-unique and `level` is not provided ([GH 36293](https://github.com/pandas-dev/pandas/issues/36293))
+ Bug in `MultiIndex.intersection()` duplicating `NaN` in the result ([GH 38623](https://github.com/pandas-dev/pandas/issues/38623))
+ Bug in `MultiIndex.equals()` incorrectly returning `True` when the `MultiIndex` contained `NaN` even when they were in a different order ([GH 38439](https://github.com/pandas-dev/pandas/issues/38439))
+ Bug in `MultiIndex.intersection()` always returning an empty result when intersecting with a `CategoricalIndex` ([GH 38653](https://github.com/pandas-dev/pandas/issues/38653))
+ Bug in `MultiIndex.difference()` incorrectly raising `TypeError` when the indexes contain non-sortable entries ([GH 41915](https://github.com/pandas-dev/pandas/issues/41915))
+ Bug in `MultiIndex.reindex()` raising a `ValueError` when used on an empty `MultiIndex` and indexing only a specific level ([GH 41170](https://github.com/pandas-dev/pandas/issues/41170))
+ Bug in `MultiIndex.reindex()` raising a `TypeError` when reindexing against a flat `Index` ([GH 41707](https://github.com/pandas-dev/pandas/issues/41707))
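The `MultiIndex.equals()` fix (GH 38439) can be illustrated with a single-level sketch, assuming pandas 1.3.0 or later: two indexes holding the same values but with `NaN` in different positions should not compare equal.

```python
import numpy as np
import pandas as pd

left = pd.MultiIndex.from_arrays([[np.nan, 1.0]])
right = pd.MultiIndex.from_arrays([[1.0, np.nan]])

# Same values, NaN in a different position: not equal.
print(left.equals(right))        # False
# An identical copy still compares equal.
print(left.equals(left.copy()))  # True
```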
### I/O
+ Bug in `Index.__repr__()` when `display.max_seq_items=1` ([GH 38415](https://github.com/pandas-dev/pandas/issues/38415))
+ Bug in `read_csv()` not recognizing scientific notation when the `decimal` argument is set and `engine="python"` ([GH 31920](https://github.com/pandas-dev/pandas/issues/31920))
+ Bug in `read_csv()` interpreting an `NA` value as a comment when the `NA` value contains the comment string; fixed for `engine="python"` ([GH 34002](https://github.com/pandas-dev/pandas/issues/34002))
+ Bug in `read_csv()` raising `IndexError` with multiple header columns and `index_col` specified when the file has no data rows ([GH 38292](https://github.com/pandas-dev/pandas/issues/38292))
+ Bug in `read_csv()` not accepting `usecols` with a different length than `names` for `engine="python"` ([GH 16469](https://github.com/pandas-dev/pandas/issues/16469))
+ Bug in `read_csv()` returning object dtype when `delimiter=","` with `usecols` and `parse_dates` specified for `engine="python"` ([GH 35873](https://github.com/pandas-dev/pandas/issues/35873))
+ Bug in `read_csv()` raising `TypeError` when `names` and `parse_dates` are specified for `engine="c"` ([GH 33699](https://github.com/pandas-dev/pandas/issues/33699))
+ Bug in `read_clipboard()` and `DataFrame.to_clipboard()` not working in WSL ([GH 38527](https://github.com/pandas-dev/pandas/issues/38527))
+ Allow custom error values for the `parse_dates` argument of `read_sql()`, `read_sql_query()` and `read_sql_table()` ([GH 35185](https://github.com/pandas-dev/pandas/issues/35185))
+ Bug in `DataFrame.to_hdf()` and `Series.to_hdf()` raising a `KeyError` when applied to subclasses of `DataFrame` or `Series` ([GH 33748](https://github.com/pandas-dev/pandas/issues/33748))
+ Bug in `HDFStore.put()` raising the wrong `TypeError` when saving a DataFrame with non-string dtype ([GH 34274](https://github.com/pandas-dev/pandas/issues/34274))
+ Bug in `json_normalize()` causing the first element of a generator object not to be included in the returned DataFrame ([GH 35923](https://github.com/pandas-dev/pandas/issues/35923))
+ Bug in `read_csv()` applying the thousands separator to date columns when the column should be parsed as dates and `usecols` is specified for `engine="python"` ([GH 39365](https://github.com/pandas-dev/pandas/issues/39365))
+ Bug in `read_excel()` forward-filling `MultiIndex` names when multiple header and index columns are specified ([GH 34673](https://github.com/pandas-dev/pandas/issues/34673))
+ Bug in `read_excel()` not respecting `set_option()` ([GH 34252](https://github.com/pandas-dev/pandas/issues/34252))
+ Bug in `read_csv()` not switching `true_values` and `false_values` for the nullable Boolean dtype ([GH 34655](https://github.com/pandas-dev/pandas/issues/34655))
+ Bug in `read_json()` not maintaining a numeric string index when `orient="split"` ([GH 28556](https://github.com/pandas-dev/pandas/issues/28556))
+ `read_sql()` returned an empty generator if `chunksize` was non-zero and the query returned no results. It now returns a generator with a single empty DataFrame ([GH 34411](https://github.com/pandas-dev/pandas/issues/34411))
+ Bug in `read_hdf()` returning unexpected records when filtering on categorical string columns using the `where` parameter ([GH 39189](https://github.com/pandas-dev/pandas/issues/39189))
+ Bug in `read_sas()` raising a `ValueError` when `datetimes` were null ([GH 39725](https://github.com/pandas-dev/pandas/issues/39725))
+ Bug in `read_excel()` dropping empty values from single-column spreadsheets ([GH 39808](https://github.com/pandas-dev/pandas/issues/39808))
+ Bug in `read_excel()` loading trailing empty rows/columns for some file types ([GH 41167](https://github.com/pandas-dev/pandas/issues/41167))
+ Bug in `read_excel()` raising an `AttributeError` when the Excel file had a `MultiIndex` header followed by two empty rows and no index ([GH 40442](https://github.com/pandas-dev/pandas/issues/40442))
+ Bug in `read_excel()`, `read_csv()`, `read_table()`, `read_fwf()` and `read_clipboard()` where a blank row after a `MultiIndex` header would be dropped ([GH 40442](https://github.com/pandas-dev/pandas/issues/40442))
+ Bug in `DataFrame.to_string()` misplacing the truncation column when `index=False` ([GH 40904](https://github.com/pandas-dev/pandas/issues/40904))
+ Bug in `DataFrame.to_string()` adding an extra dot and misaligning the truncation row when `index=False` ([GH 40904](https://github.com/pandas-dev/pandas/issues/40904))
+ Bug in `read_orc()` always raising an `AttributeError` ([GH 40918](https://github.com/pandas-dev/pandas/issues/40918))
+ Bug in `read_csv()` and `read_table()` silently ignoring `prefix` if both `names` and `prefix` are defined; this now raises a `ValueError` ([GH 39123](https://github.com/pandas-dev/pandas/issues/39123))
+ Bug in `read_csv()` and `read_excel()` not respecting the dtype of duplicated column names when `mangle_dupe_cols` is set to `True` ([GH 35211](https://github.com/pandas-dev/pandas/issues/35211))
+ Bug in `read_csv()` silently ignoring `sep` if both `delimiter` and `sep` are defined; this now raises a `ValueError` ([GH 39823](https://github.com/pandas-dev/pandas/issues/39823))
+ Bug in `read_csv()` and `read_table()` misinterpreting arguments when `sys.setprofile` had previously been called ([GH 41069](https://github.com/pandas-dev/pandas/issues/41069))
+ Bug in the conversion from PyArrow to pandas (e.g. for reading Parquet) with nullable dtypes and a PyArrow array whose data buffer size is not a multiple of the dtype size ([GH 40896](https://github.com/pandas-dev/pandas/issues/40896))
+ Bug in `read_excel()` raising an error when pandas could not determine the file type, even though the user specified the `engine` argument ([GH 41225](https://github.com/pandas-dev/pandas/issues/41225))
+ Bug in `read_clipboard()` shifting values into the wrong column when copying from an Excel file with null values in the first column ([GH 41108](https://github.com/pandas-dev/pandas/issues/41108))
+ Bug in `DataFrame.to_hdf()` and `Series.to_hdf()` raising a `TypeError` when trying to append an incompatible column to a string column ([GH 41897](https://github.com/pandas-dev/pandas/issues/41897))
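Two of the `read_csv()` fixes above can be checked with in-memory data. This is a sketch assuming pandas 1.3 or later, and the CSV snippets are invented for the example: nullable Boolean parsing with custom `true_values`/`false_values`, and the error raised when both `sep` and `delimiter` are passed.

```python
import io

import pandas as pd

data = "flag\nyes\nno\n"

# true_values/false_values are honoured for the nullable "boolean" dtype
df = pd.read_csv(io.StringIO(data), dtype="boolean",
                 true_values=["yes"], false_values=["no"])
print(df["flag"].tolist())  # [True, False]

# Passing both sep and delimiter raises instead of silently ignoring sep
raised = False
try:
    pd.read_csv(io.StringIO("a,b\n1,2\n"), sep=",", delimiter=",")
except ValueError as err:
    raised = True
    print("ValueError:", err)
```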
### Period
+ Comparisons of `Period` objects, or of `Index`, `Series` or `DataFrame` objects, with mismatched `PeriodDtype` now behave like comparisons of other mismatched types: they return `False` for `==`, return `True` for `!=`, and raise `TypeError` for inequality checks such as `<` ([GH 39274](https://github.com/pandas-dev/pandas/issues/39274))
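A short sketch of these comparison semantics, assuming pandas 1.3 or later and using two arbitrary `period_range()` indexes with monthly and daily frequencies:

```python
import pandas as pd

monthly = pd.period_range("2021-01", periods=3, freq="M")
daily = pd.period_range("2021-01-01", periods=3, freq="D")

# Equality checks across mismatched PeriodDtype return all-False / all-True
eq = monthly == daily
ne = monthly != daily
print(eq)  # [False False False]
print(ne)  # [ True  True  True]

# Ordering comparisons between different frequencies raise TypeError
raised = False
try:
    monthly < daily
except TypeError as err:
    raised = True
    print("TypeError:", err)
```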
### Plotting
+ Bug in `plotting.scatter_matrix()` raising an error when a 2D `ax` argument was passed ([GH 16253](https://github.com/pandas-dev/pandas/issues/16253))
+ Prevent warnings when Matplotlib's `constrained_layout` is enabled ([GH 25261](https://github.com/pandas-dev/pandas/issues/25261))
+ Bug in `DataFrame.plot()` showing the wrong colors in the legend if the function was called repeatedly and some calls used `yerr` while others did not ([GH 39522](https://github.com/pandas-dev/pandas/issues/39522))
+ Bug in `DataFrame.plot()` showing the wrong colors in the legend if the function was called repeatedly and some calls used `secondary_y` while others used `legend=False` ([GH 40044](https://github.com/pandas-dev/pandas/issues/40044))
+ Bug in `DataFrame.plot.box()` where the caps or min/max markers of the plot were not visible when the `dark_background` theme was selected ([GH 40769](https://github.com/pandas-dev/pandas/issues/40769))
### Groupby/resample/rolling
+ Bug in `DataFrameGroupBy.agg()` and `SeriesGroupBy.agg()` with `PeriodDtype` columns incorrectly casting results too aggressively ([GH 38254](https://github.com/pandas-dev/pandas/issues/38254))
+ Bug in `SeriesGroupBy.value_counts()` where unobserved categories in a grouped categorical Series were not tallied ([GH 38672](https://github.com/pandas-dev/pandas/issues/38672))
+ Bug in `SeriesGroupBy.value_counts()` where an error was raised on an empty Series ([GH 39172](https://github.com/pandas-dev/pandas/issues/39172))
+ Bug in `GroupBy.indices()` containing non-existent indices when null values were present in the groupby keys ([GH 9304](https://github.com/pandas-dev/pandas/issues/9304))
+ Fixed bug in `DataFrameGroupBy.sum()` and `SeriesGroupBy.sum()` causing a loss of precision; Kahan summation is now used ([GH 38778](https://github.com/pandas-dev/pandas/issues/38778))
+ Fixed bug in `DataFrameGroupBy.cumsum()`, `SeriesGroupBy.cumsum()`, `DataFrameGroupBy.mean()` and `SeriesGroupBy.mean()` causing a loss of precision; Kahan summation is now used ([GH 38934](https://github.com/pandas-dev/pandas/issues/38934))
+ Bug in `Resampler.aggregate()` and `DataFrame.transform()` raising a `TypeError` instead of `SpecificationError` when missing keys had mixed dtypes ([GH 39025](https://github.com/pandas-dev/pandas/issues/39025))
+ Bug in `DataFrameGroupBy.idxmin()` and `DataFrameGroupBy.idxmax()` with `ExtensionDtype` columns ([GH 38733](https://github.com/pandas-dev/pandas/issues/38733))
+ Bug in `Series.resample()` raising an exception when the index was a `PeriodIndex` consisting of `NaT` ([GH 39227](https://github.com/pandas-dev/pandas/issues/39227))
+ Bug in `RollingGroupby.corr()` and `ExpandingGroupby.corr()` where the groupby column would return `0` instead of `np.nan` when the supplied `other` was longer than each group ([GH 39591](https://github.com/pandas-dev/pandas/issues/39591))
+ Bug in `ExpandingGroupby.corr()` and `ExpandingGroupby.cov()` where `1` was returned instead of `np.nan` when the supplied `other` was longer than each group ([GH 39591](https://github.com/pandas-dev/pandas/issues/39591))
+ Bug in `DataFrameGroupBy.mean()`, `SeriesGroupBy.mean()`, `DataFrameGroupBy.median()`, `SeriesGroupBy.median()` and `DataFrame.pivot_table()` not propagating metadata ([GH 28283](https://github.com/pandas-dev/pandas/issues/28283))
+ Bug in `Series.rolling()` and `DataFrame.rolling()` not calculating window bounds correctly when the window was an offset and the dates were in descending order ([GH 40002](https://github.com/pandas-dev/pandas/issues/40002))
+ Bug in `Series.groupby()` and `DataFrame.groupby()` on an empty `Series` or `DataFrame` losing the index, columns and/or data types when the methods `idxmax`, `idxmin`, `mad`, `min`, `max`, `sum`, `prod` and `skew` were used directly or via `apply`, `aggregate` or `resample` ([GH 26411](https://github.com/pandas-dev/pandas/issues/26411))
+ Bug in `DataFrameGroupBy.apply()` and `SeriesGroupBy.apply()` creating a `MultiIndex` instead of an `Index` when used on a `RollingGroupby` object ([GH 39732](https://github.com/pandas-dev/pandas/issues/39732))
+ Bug in `DataFrameGroupBy.sample()` raising an error when `weights` was specified and the index was an `Int64Index` ([GH 39927](https://github.com/pandas-dev/pandas/issues/39927))
+ Bug in `DataFrameGroupBy.aggregate()` and `Resampler.aggregate()` sometimes raising a `SpecificationError` when passed a dictionary and columns were missing; they now always raise a `KeyError` instead ([GH 40004](https://github.com/pandas-dev/pandas/issues/40004))
+ Bug in `DataFrameGroupBy.sample()` where column selection was not applied before computing the result ([GH 39928](https://github.com/pandas-dev/pandas/issues/39928))
+ Bug in `ExponentialMovingWindow` incorrectly raising a `ValueError` when calling `__getitem__` if `times` was provided ([GH 40164](https://github.com/pandas-dev/pandas/issues/40164))
+ Bug in `ExponentialMovingWindow` not retaining the `com`, `span`, `alpha` or `halflife` attributes when calling `__getitem__` ([GH 40164](https://github.com/pandas-dev/pandas/issues/40164))
+ `ExponentialMovingWindow` now raises a `NotImplementedError` when specifying `times` with `adjust=False` due to an incorrect calculation ([GH 40098](https://github.com/pandas-dev/pandas/issues/40098))
+ Bug in `ExponentialMovingWindowGroupby.mean()` where the `times` argument was ignored when `engine='numba'` ([GH 40951](https://github.com/pandas-dev/pandas/issues/40951))
+ Bug in `ExponentialMovingWindowGroupby.mean()` where the wrong times were used in case of multiple groups ([GH 40951](https://github.com/pandas-dev/pandas/issues/40951))
+ Bug in `ExponentialMovingWindowGroupby` where the times vector and values became out of sync for non-trivial groups ([GH 40951](https://github.com/pandas-dev/pandas/issues/40951))
+ Bug in `Series.asfreq()` and `DataFrame.asfreq()` dropping rows when the index was not sorted ([GH 39805](https://github.com/pandas-dev/pandas/issues/39805))
+ Bug in aggregation functions for `DataFrame` not respecting the `numeric_only` argument when the `level` keyword was given ([GH 40660](https://github.com/pandas-dev/pandas/issues/40660))
+ Bug in `SeriesGroupBy.aggregate()` where using a user-defined function to aggregate a Series with an object-typed `Index` caused an incorrect `Index` shape ([GH 40014](https://github.com/pandas-dev/pandas/issues/40014))
+ Bug in `RollingGroupby` where the `as_index=False` argument in `groupby` was ignored ([GH 39433](https://github.com/pandas-dev/pandas/issues/39433))
+ Bug in `DataFrameGroupBy.any()`, `SeriesGroupBy.any()`, `DataFrameGroupBy.all()` and `SeriesGroupBy.all()` raising a `ValueError` when used on nullable-type columns holding `NA`, even with `skipna=True` ([GH 40585](https://github.com/pandas-dev/pandas/issues/40585))
+ Bug in `DataFrameGroupBy.cummin()`, `SeriesGroupBy.cummin()`, `DataFrameGroupBy.cummax()` and `SeriesGroupBy.cummax()` incorrectly rounding integer values near the `int64` implementation bounds ([GH 40767](https://github.com/pandas-dev/pandas/issues/40767))
+ Bug in `DataFrameGroupBy.rank()` and `SeriesGroupBy.rank()` with nullable dtypes incorrectly raising a `TypeError` ([GH 41010](https://github.com/pandas-dev/pandas/issues/41010))
+ Bug in `DataFrameGroupBy.cummin()`, `SeriesGroupBy.cummin()`, `DataFrameGroupBy.cummax()` and `SeriesGroupBy.cummax()` computing the wrong result with nullable data types too large to round-trip when cast to float ([GH 37493](https://github.com/pandas-dev/pandas/issues/37493))
+ Bug in `DataFrame.rolling()` returning a mean of zero for an all-`NaN` window with `min_periods=0` if the calculation was not numerically stable ([GH 41053](https://github.com/pandas-dev/pandas/issues/41053))
+ Bug in `DataFrame.rolling()` returning a non-zero sum for an all-`NaN` window with `min_periods=0` if the calculation was not numerically stable ([GH 41053](https://github.com/pandas-dev/pandas/issues/41053))
+ Bug in `SeriesGroupBy.agg()` failing to retain an ordered `CategoricalDtype` on order-preserving aggregations ([GH 41147](https://github.com/pandas-dev/pandas/issues/41147))
+ Bug in `DataFrameGroupBy.min()`, `SeriesGroupBy.min()`, `DataFrameGroupBy.max()` and `SeriesGroupBy.max()` incorrectly raising a `ValueError` with multiple object-dtype columns and `numeric_only=False` ([GH 41111](https://github.com/pandas-dev/pandas/issues/41111))
+ Bug in `DataFrameGroupBy.rank()` when the GroupBy object has `axis=0` and the `rank` method is called with the keyword `axis=1` ([GH 41320](https://github.com/pandas-dev/pandas/issues/41320))
+ Bug in `DataFrameGroupBy.__getitem__()` with non-unique columns incorrectly returning a malformed `SeriesGroupBy` instead of a `DataFrameGroupBy` ([GH 41427](https://github.com/pandas-dev/pandas/issues/41427))
+ Bug in `DataFrameGroupBy.transform()` with non-unique columns incorrectly raising an `AttributeError` ([GH 41427](https://github.com/pandas-dev/pandas/issues/41427))
+ Bug in `Resampler.apply()` with non-unique columns incorrectly dropping duplicated columns ([GH 41445](https://github.com/pandas-dev/pandas/issues/41445))
+ Bug in `Series.groupby()` aggregations incorrectly returning an empty `Series` instead of raising `TypeError` for aggregations that are invalid for its dtype, e.g. `.prod` with `datetime64[ns]` dtype ([GH 41342](https://github.com/pandas-dev/pandas/issues/41342))
+ Bug in `DataFrameGroupBy` aggregations incorrectly failing to drop columns with dtypes invalid for that aggregation when there are no valid columns ([GH 41291](https://github.com/pandas-dev/pandas/issues/41291))
+ Bug in `DataFrame.rolling.__iter__()` where `on` was not assigned to the index of the resulting objects ([GH 40373](https://github.com/pandas-dev/pandas/issues/40373))
+ Bug in `DataFrameGroupBy.transform()` and `DataFrameGroupBy.agg()` with `engine="numba"` where `*args` were being cached together with the user-passed function ([GH 41647](https://github.com/pandas-dev/pandas/issues/41647))
+ Bug in the `DataFrameGroupBy` methods `agg`, `transform`, `sum`, `bfill`, `ffill`, `pad`, `pct_change`, `shift` and `ohlc` dropping `.columns.names` ([GH 41497](https://github.com/pandas-dev/pandas/issues/41497))
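As one example from this list, the nullable-`NA` fix for `SeriesGroupBy.any()` can be sketched as follows, assuming pandas 1.3 or later; the `key`/`flag` frame is hypothetical:

```python
import pandas as pd

# Hypothetical frame with a nullable boolean column containing NA
df = pd.DataFrame({
    "key": ["a", "a", "b"],
    "flag": pd.array([True, None, False], dtype="boolean"),
})

# With skipna=True the NA is ignored instead of raising ValueError
result = df.groupby("key")["flag"].any(skipna=True)
print(result.tolist())  # [True, False]
```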
### Reshaping
+ Bug in `merge()` raising an error when performing an inner join with a partial index and `right_index=True` when there was no overlap between the indices ([GH 33814](https://github.com/pandas-dev/pandas/issues/33814))
+ Bug in `DataFrame.unstack()` with missing levels leading to incorrect index names ([GH 37510](https://github.com/pandas-dev/pandas/issues/37510))
+ Bug in `merge_asof()` propagating the right index instead of the left index with `left_index=True` and a `right_on` specification ([GH 33463](https://github.com/pandas-dev/pandas/issues/33463))
+ Bug in `DataFrame.join()` on a DataFrame with a `MultiIndex` returning the wrong result when one or both indexes had only one level ([GH 36909](https://github.com/pandas-dev/pandas/issues/36909))
+ `merge_asof()` now raises a `ValueError` instead of an obscure `TypeError` in case of non-numeric merge columns ([GH 29130](https://github.com/pandas-dev/pandas/issues/29130))
+ Bug in `DataFrame.join()` not assigning values correctly when the DataFrame had a `MultiIndex` where at least one dimension had dtype `Categorical` with non-alphabetically sorted categories ([GH 38502](https://github.com/pandas-dev/pandas/issues/38502))
+ `Series.value_counts()` and `Series.mode()` now return consistent keys in original order ([GH 12679](https://github.com/pandas-dev/pandas/issues/12679), [GH 11227](https://github.com/pandas-dev/pandas/issues/11227) and [GH 39007](https://github.com/pandas-dev/pandas/issues/39007))
+ Bug in `DataFrame.stack()` not handling `NaN` in `MultiIndex` columns correctly ([GH 39481](https://github.com/pandas-dev/pandas/issues/39481))
+ Bug in `DataFrame.apply()` giving incorrect results when the argument `func` was a string, `axis=1`, and the axis argument was not supported; it now raises a `ValueError` instead ([GH 39211](https://github.com/pandas-dev/pandas/issues/39211))
+ Bug in `DataFrame.sort_values()` not reshaping the index correctly after sorting on columns when `ignore_index=True` ([GH 39464](https://github.com/pandas-dev/pandas/issues/39464))
+ Bug in `DataFrame.append()` returning incorrect dtypes with combinations of `ExtensionDtype` dtypes ([GH 39454](https://github.com/pandas-dev/pandas/issues/39454))
+ Bug in `DataFrame.append()` returning incorrect dtypes when used with combinations of `datetime64` and `timedelta64` dtypes ([GH 39574](https://github.com/pandas-dev/pandas/issues/39574))
+ Bug in `DataFrame.append()` with a `DataFrame` with a `MultiIndex` when appending a `Series` whose `Index` is not a `MultiIndex` ([GH 41707](https://github.com/pandas-dev/pandas/issues/41707))
+ Bug in `DataFrame.pivot_table()` returning a `MultiIndex` for a single value when operating on an empty DataFrame ([GH 13483](https://github.com/pandas-dev/pandas/issues/13483))
+ `Index` can now be passed to the [`numpy.all()`](https://numpy.org/doc/stable/reference/generated/numpy.all.html#numpy.all "(in NumPy v1.26)") function ([GH 40180](https://github.com/pandas-dev/pandas/issues/40180))
+ Bug in `DataFrame.stack()` not preserving `CategoricalDtype` in a `MultiIndex` ([GH 36991](https://github.com/pandas-dev/pandas/issues/36991))
+ Bug in `to_datetime()` raising an error when the input sequence contained unhashable items ([GH 39756](https://github.com/pandas-dev/pandas/issues/39756))
+ Bug in `Series.explode()` preserving the index when `ignore_index` was `True` and the values were scalars ([GH 40487](https://github.com/pandas-dev/pandas/issues/40487))
+ Bug in `to_datetime()` raising a `ValueError` when a `Series` contains `None` and `NaT` and has more than 50 elements ([GH 39882](https://github.com/pandas-dev/pandas/issues/39882))
+ Bug in `Series.unstack()` and `DataFrame.unstack()` with object-dtype values containing timezone-aware datetime objects incorrectly raising `TypeError` ([GH 41875](https://github.com/pandas-dev/pandas/issues/41875))
+ Bug in `DataFrame.melt()` raising `InvalidIndexError` when the `DataFrame` has duplicate columns used as `value_vars` ([GH 41951](https://github.com/pandas-dev/pandas/issues/41951))
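The `Series.explode()` fix for scalar values can be illustrated with a sketch like this, assuming pandas 1.3 or later and a made-up three-element Series:

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["x", "y", "z"])

# With ignore_index=True the result gets a fresh 0..n-1 index, even when
# every element is a scalar and there is nothing to explode
result = s.explode(ignore_index=True)
print(result.index.tolist())  # [0, 1, 2]
```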
### Sparse
+ Bug in `DataFrame.sparse.to_coo()` raising a `KeyError` when the columns are a numeric `Index` without a `0` ([GH 18414](https://github.com/pandas-dev/pandas/issues/18414))
+ Bug in `SparseArray.astype()` with `copy=False` producing incorrect results when converting from integer dtype to floating dtype ([GH 34456](https://github.com/pandas-dev/pandas/issues/34456))
+ Bug in `SparseArray.max()` and `SparseArray.min()` always returning an empty result ([GH 40921](https://github.com/pandas-dev/pandas/issues/40921))
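A quick check of the `SparseArray.max()`/`SparseArray.min()` fix, assuming pandas 1.3 or later. Note that the fill value (here the default `0` for integer data) takes part in the result when the array has gaps:

```python
import pandas as pd

# Sparse integer array; the zeros are stored as fill values (gaps)
arr = pd.arrays.SparseArray([0, 1, 5, 0])

print(arr.max())  # 5
print(arr.min())  # 0, contributed by the fill value
```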
### ExtensionArray
+ Bug in `DataFrame.where()` when `other` is a Series with an `ExtensionDtype` ([GH 38729](https://github.com/pandas-dev/pandas/issues/38729))
+ Fixed bug where `Series.idxmax()`, `Series.idxmin()`, `Series.argmax()` and `Series.argmin()` would fail when the underlying data is an `ExtensionArray` ([GH 32749](https://github.com/pandas-dev/pandas/issues/32749), [GH 33719](https://github.com/pandas-dev/pandas/issues/33719), [GH 36566](https://github.com/pandas-dev/pandas/issues/36566))
+ Fixed bug where some properties of subclasses of `PandasExtensionDtype` were cached improperly ([GH 40329](https://github.com/pandas-dev/pandas/issues/40329))
+ Bug in `DataFrame.mask()` where masking a DataFrame with an `ExtensionDtype` raised a `ValueError` ([GH 40941](https://github.com/pandas-dev/pandas/issues/40941))
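A sketch of the `idxmax()`/`argmin()` fix for extension-backed data, assuming pandas 1.3 or later. The nullable `Int64` values and labels are arbitrary, and the missing value is skipped by the default `skipna=True`:

```python
import pandas as pd

# Series backed by a nullable Int64 ExtensionArray with one missing value
s = pd.Series(pd.array([3, None, 7, 1], dtype="Int64"), index=list("abcd"))

print(s.idxmax())  # c  (label of the largest value, 7)
print(s.argmin())  # 3  (position of the smallest value, 1)
```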
### Styler
+ Bug in `Styler` where the `subset` argument in methods raised an error for some valid `MultiIndex` slices ([GH 33562](https://github.com/pandas-dev/pandas/issues/33562))
+ `Styler` rendered HTML output has been slightly altered to follow w3 good-code standards ([GH 39626](https://github.com/pandas-dev/pandas/issues/39626))
+ Bug in `Styler` where the rendered HTML was missing a column class identifier for certain header cells ([GH 39716](https://github.com/pandas-dev/pandas/issues/39716))
+ Bug in `Styler.background_gradient()` where the text color was not determined correctly ([GH 39888](https://github.com/pandas-dev/pandas/issues/39888))
+ Bug in `Styler.set_table_styles()` where multiple elements in the CSS selectors of the `table_styles` argument were not added correctly ([GH 34061](https://github.com/pandas-dev/pandas/issues/34061))
+ Bug in `Styler` where copying from Jupyter dropped the top-left cell and misaligned headers ([GH 12147](https://github.com/pandas-dev/pandas/issues/12147))
+ Bug in `Styler.where` where `kwargs` were not passed to the applicable callable ([GH 40845](https://github.com/pandas-dev/pandas/issues/40845))
+ Bug in `Styler` causing CSS to be duplicated on multiple renders ([GH 39395](https://github.com/pandas-dev/pandas/issues/39395), [GH 40334](https://github.com/pandas-dev/pandas/issues/40334))
### Other
+ `inspect.getmembers(Series)` no longer raises `AbstractMethodError` ([GH 38782](https://github.com/pandas-dev/pandas/issues/38782))
+ Bug in `Series.where()` with numeric dtype and `other=None` not casting to `nan` ([GH 39761](https://github.com/pandas-dev/pandas/issues/39761))
+ Bug in `assert_series_equal()`, `assert_frame_equal()`, `assert_index_equal()` and `assert_extension_array_equal()` incorrectly raising when an attribute had an unrecognized NA type ([GH 39461](https://github.com/pandas-dev/pandas/issues/39461))
+ Bug in `assert_index_equal()` with `exact=True` not raising when comparing `CategoricalIndex` instances with `Int64Index` and `RangeIndex` categories ([GH 41263](https://github.com/pandas-dev/pandas/issues/41263))
+ Bug in `DataFrame.equals()`, `Series.equals()` and `Index.equals()` with object dtype containing `np.datetime64("NaT")` or `np.timedelta64("NaT")` ([GH 39650](https://github.com/pandas-dev/pandas/issues/39650))
+ Bug in `show_versions()` where the console JSON output was not proper JSON ([GH 39701](https://github.com/pandas-dev/pandas/issues/39701))
+ pandas can now compile on z/OS when using [xlc](https://www.ibm.com/products/xl-cpp-compiler-zos) ([GH 35826](https://github.com/pandas-dev/pandas/issues/35826))
+ Bug in `pandas.util.hash_pandas_object()` not recognizing `hash_key`, `encoding` and `categorize` when the input object type is a `DataFrame` ([GH 41404](https://github.com/pandas-dev/pandas/issues/41404))

## Contributors
A total of 251 people contributed patches to this release. People with a "+" next to their names contributed a patch for the first time.
+ Abhishek R +
+ Ada Draginda
+ Adam J. Stewart
+ Adam Turner +
+ Aidan Feldman +
+ Ajitesh Singh +
+ Akshat Jain +
+ Albert Villanova del Moral
+ Alexandre Prince-Levasseur +
+ Andrew Hawyrluk +
+ Andrew Wieteska
+ AnglinaBhambra +
+ Ankush Dua +
+ Anna Daglis
+ Ashlan Parker +
+ Ashwani +
+ Avinash Pancham
+ Ayushman Kumar +
+ Women
+ Benoît Vinot
+ Bharat Raghunathan
+ Bijay Regmi +
+ Bobin Mathew +
+ Bogdan Pilyavets +
+ Brian Hulette +
+ Brian Sun +
+ Brock +
+ Bryan Cutler
+ Caleb +
+ Calvin Ho +
+ Chathura Widanage +
+ Chinmay Rane +
+ Chris Lynch
+ Chris Withers
+ Christos Petropoulos
+ Corentin Girard +
+ DaPy15 +
+ Damodara Puddu +
+ Daniel Hrisca
+ Daniel Saxton
+ DanielFEvans
+ Dare Adewumi +
+ Dave Willmer
+ David Schlachter +
+ David-dmh +
+ Deepang Raval +
+ Doris Lee +
+ Dr. Jan-Philip Gehrcke +
+ DriesS +
+ Dylan Percy
+ Erfan Nariman
+ Eric Leung
+ EricLeer +
+ Eve
+ Fangchen Li
+ Felix Divo
+ Florian Jetter
+ Fred Reiss
+ GFJ138 +
+ Gaurav Sheni +
+ Geoffrey B. Eisenbarth +
+ Gesa Stupperich +
+ Griffin Ansel +
+ Gustavo C. Maciel +
+ Heidi +
+ Henry +
+ Hung-Yi Wu +
+ Ian Ozsvald +
+ Irv Lustig
+ Isaac Chung +
+ Isaac Virshup
+ JHM Darbyshire (MBP) +
+ JHM Darbyshire (iMac) +
+ Jack Liu +
+ James Lamb +
+ Jeet Parekh
+ Jeff Reback
+ Jiezheng2018 +
+ Jody Klymak
+ Johan Kåhrström +
+ John McGuigan
+ Joris Van den Bossche
+ Jose
+ JoseNavy
+ Josh Dimarsky
+ Josh Friedlander
+ Joshua Klein +
+ Julia Signell
+ Julian Schnitzler +
+ Kaiqi Dong
+ Kasim Panjri +
+ Katie Smith +
+ Kelly +
+ Kenil +
+ Keppler, Kyle +
+ Kevin Sheppard
+ Khor Chean Wei +
+ Kiley Hewitt +
+ Larry Wong +
+ Lightyears +
+ Lucas Holtz +
+ Lucas Rodés-Guirao
+ Lucky Sivagurunathan +
+ Luis Pinto
+ Maciej Kos +
+ Marc Garcia
+ Marco Edward Gorelli +
+ Marco Gorelli
+ Marco Gorelli +
+ Mark Graham
+ Martin Dengler +
+ Martin Grigorov +
+ Marty Rudolf +
+ Matt Roeschke
+ Matthew Roeschke
+ Matthew Zeitlin
+ Max Bolingbroke
+ Maxim Ivanov
+ Maxim Kupfer +
+ Mayur +
+ MeeseeksMachine
+ Michael Jarniac
+ Michael Hsieh +
+ Michel de Ruiter +
+ Mike Roberts +
+ Miroslav Šedivý
+ Mohammad Jafar Mashhadi
+ Morisa Manzella +
+ Mortada Mehyar
+ Muktan +
+ Naveen Agrawal +
+ Noah
+ Nofar Mishraki +
+ Oleh Kozynets
+ Olga Matoula +
+ Was +
+ Omar Afifi
+ Omer Ozarslan +
+ Owen Lamont +
+ Ozan Öğreden +
+ pandas development team
+ Paolo Lammens
+ Parfait Gasana +
+ Patrick Hoefler
+ Paul McCarthy +
+ Paulo S. Costa +
+ Pav A
+ Peter
+ Pradyumna Rahul +
+ Recharges +
+ QP Hou +
+ Rahul Chauhan
+ Rahul Sathanapalli
+ Richard Shadrach
+ Robert Bradshaw
+ Robin to Roxel
+ Rohit Gupta
+ Sam Purkis +
+ Samuel GIFFARD +
+ Sean M. Law +
+ Shahar Naveh +
+ ShaharNaveh +
+ Shiv Gupta +
+ Dixit Series +
+ Shudong Yang +
+ Simon Boehm +
+ Simon Hawkins
+ Sioned Baker +
+ Stefan Mejlgaard +
+ Steven Pitman +
+ Steven Schaerer +
+ Stéphane Guillou +
+ TLouf +
+ Tegar D Pratama +
+ Terje Petersen
+ Theodoros Nikolaou +
+ Thomas Dickson
+ Thomas Li
+ Thomas Smith
+ Thomas Yu +
+ ThomasBlauthQC +
+ Tim Hoffmann
+ Tom Augspurger
+ Torsten Wörtwein
+ Tyler Reddy
+ UrielMaD
+ Uwe L. Korn
+ Venaturum
+ VirosaLi
+ Vladimir Podolski
+ Vyom Pathak +
+ MONEY Aiyong
+ Walter Koskinen +
+ Wenjun Si +
+ William Ayd
+ Yeshwanth N +
+ Yuanhao Geng
+ Zito Relova +
+ aflah02 +
+ arredond +
+ attack68
+ cdknox +
+ chinggg +
+ fathomer +
+ ftrihardjo +
+ github-actions[bot] +
+ gunjan-solanki +
+ guru kiran
+ hasan-yaman
+ i-aki-y +
+ jbrockmendel
+ jmholzer +
+ jordi-crespo +
+ something +
+ jreback
+ juliansmidek +
+ cooling keppler
+ lrepiton +
+ lucasrodes
+ maroth96 +
+ mikeronayne +
+ mlondschien
+ moink +
+ morrme
+ mschmookler +
+ mzeitlin11
+ na2 +
+ nofarmishraki +
+ partev
+ patrick
+ ptype
+ realead
+ rhshadrach
+ rlukevie +
+ rosagold +
+ saucoide +
+ sdements +
+ shawnbrown
+ sstiijn +
+ stphnlyd +
+ fallen1 +
+ taytzehao
+ theOehrly +
+ theodorju +
+ thordisstella +
+ tonyyyyip +
+ tsinggggg +
+ tushushu +
+ they just love +
+ the government +
+ wertha +

## Enhancements
### Customize HTTP(s) headers when reading csv or json files
When reading from remote URLs that are not handled by fsspec (such as HTTP and HTTPS), the dictionary passed to `storage_options` will be used to create headers included in the request. This can be used to control the User-Agent header or send other custom headers ([GH 36688](https://github.com/pandas-dev/pandas/issues/36688)). For example:
```py
In [1]: headers = {"User-Agent": "pandas"}
In [2]: df = pd.read_csv(
   ...:     "https://download.bls.gov/pub/time.series/cu/cu.item",
   ...:     sep="\t",
   ...:     storage_options=headers
   ...: )
```

### Read and write XML documents
We added I/O support to read and render shallow versions of [XML](https://www.w3.org/standards/xml/core) documents with `read_xml()` and `DataFrame.to_xml()`. Using [lxml](https://lxml.de) as the parser, both XPath 1.0 and XSLT 1.0 are available ([GH 27554](https://github.com/pandas-dev/pandas/issues/27554)).
```py
In [1]: xml = """<?xml version='1.0' encoding='utf-8'?>
   ...: <data>
   ...:   <row>
   ...:     <shape>square</shape>
   ...:     <degrees>360</degrees>
   ...:     <sides>4.0</sides>
   ...:   </row>
   ...:   <row>
   ...:     <shape>circle</shape>
   ...:     <degrees>360</degrees>
   ...:     <sides/>
   ...:   </row>
   ...:   <row>
   ...:     <shape>triangle</shape>
   ...:     <degrees>180</degrees>
   ...:     <sides>3.0</sides>
   ...:   </row>
   ...: </data>"""
In [2]: df = pd.read_xml(xml)
In [3]: df
Out[3]:
      shape  degrees  sides
0    square      360    4.0
1    circle      360    NaN
2  triangle      180    3.0
In [4]: df.to_xml()
Out[4]:
<?xml version='1.0' encoding='utf-8'?>
<data>
  <row>
    <index>0</index>
    <shape>square</shape>
    <degrees>360</degrees>
    <sides>4.0</sides>
  </row>
  <row>
    <index>1</index>
    <shape>circle</shape>
    <degrees>360</degrees>
    <sides/>
  </row>
  <row>
    <index>2</index>
    <shape>triangle</shape>
    <degrees>180</degrees>
    <sides>3.0</sides>
  </row>
</data>
```
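The round trip can also be sketched without lxml. This assumes pandas 1.3 or later and uses the standard-library `parser="etree"` backend accepted by both `read_xml()` and `DataFrame.to_xml()`; the two-row frame is made up for illustration:

```python
from io import StringIO

import pandas as pd

df = pd.DataFrame({"shape": ["square", "triangle"], "degrees": [360, 180]})

# Serialize without the index using the standard-library etree backend
xml = df.to_xml(index=False, parser="etree")

# Read it back; wrapping the string in StringIO avoids treating it as a path
roundtrip = pd.read_xml(StringIO(xml), parser="etree")
print(roundtrip)
```

Wrapping literal XML in `StringIO` keeps the call unambiguous across pandas versions, since later releases discourage passing raw XML strings directly.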
For more information, see Writing XML in the IO Tools User Guide.

### Styler enhancements
We've done some focused development on <code>Styler</code>. See also the revised and improved Styler documentation (GH 39720, GH 39317, GH 40493).
<blockquote>
<ul>
<li><p>The method <code>Styler.set_table_styles()</code> now accepts more natural CSS language as an argument, such as <code>'color:red;'</code> instead of <code>[('color', 'red')]</code> (GH 39563)</p></li>
<li><p>Methods <code>Styler.highlight_null()</code>, <code>Styler.highlight_min()</code> and <code>Styler.highlight_max()</code> now allow custom CSS highlighting instead of the default background coloring (GH 40242)</p></li>
<li><p><code>Styler.apply()</code> now accepts functions that return <code>ndarray</code> when <code>axis=None</code>, making it consistent with the behavior of <code>axis=0</code> and <code>axis=1</code> (GH 39359)</p></li>
<li><p>When providing malformed CSS via <code>Styler.apply()</code> or <code>Styler.applymap()</code>, an error is now thrown at render time (GH 39660)</p></li>
<li><p><code>Styler.format()</code> now accepts keyword argument <code>escape</code> for optional HTML and LaTeX escaping (GH 40388, GH 41619)</p></li>
<li><p><code>Styler.background_gradient()</code> now has argument <code>gmap</code> for providing a specific gradient map for shading (GH 22727)</p></li>
<li><p><code>Styler.clear()</code> now also clears <code>Styler.hidden_index</code> and <code>Styler.hidden_columns</code> (GH 40484)</p></li>
<li><p>Added method <code>Styler.highlight_between()</code> (GH 39821)</p></li>
<li><p>Added method <code>Styler.highlight_quantile()</code> (GH 40926)</p></li>
<li><p>Added method <code>Styler.text_gradient()</code> (GH 41098)</p></li>
<li><p>Added method <code>Styler.set_tooltips()</code> to allow hover tips; this can be used to enhance interactive displays (GH 21266, GH 40284)</p></li>
<li><p>Added parameter <code>precision</code> to method <code>Styler.format()</code> to control the display of floating point numbers (GH 40134)</p></li>
<li><p>HTML output rendered by <code>Styler</code> now follows the w3 HTML style guide (GH 39626)</p></li>
<li><p>Many features of the <code>Styler</code> class are now partially or fully available on DataFrames with non-unique indexes or columns (GH 41143)</p></li>
<li><p>Better control over display via independent sparsification of indexes or columns using new style options, also available via <code>option_context()</code> (GH 41142)</p></li>
<li><p>Added option <code>styler.render.max_elements</code> to avoid browser overload when styling large DataFrames (GH 40712)</p></li>
<li><p>Added method <code>Styler.to_latex()</code> (GH 21673, GH 42320), which also allows some limited CSS transformations (GH 40731)</p></li>
<li><p>Added method <code>Styler.to_html()</code> (GH 13379)</p></li>
<li><p>Added method <code>Styler.set_sticky()</code> to make index and column headers permanently visible in scrolling HTML frames (GH 29072) ### DataFrame constructor follows <code>copy=False</code></p></li>
</ul>
</blockquote>
When passing a dictionary to `DataFrame` with `copy=False`, a copy is no longer made (GH 32960).
```py
In [1]: arr = np.array([1, 2, 3])
In [2]: df = pd.DataFrame({"A": arr, "B": arr.copy()}, copy=False)
In [3]: df
Out[3]:
   A  B
0  1  1
1  2  2
2  3  3
```
`df["A"]` remains a view on `arr`:
```py
In [4]: arr[0] = 0
In [5]: assert df.iloc[0, 0] == 0
```
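To contrast the two modes side by side, here is a minimal sketch (the variable names are illustrative): mutating the source array after construction shows up in the frame built with `copy=False`, but not in the one built with the default.

```python
import numpy as np
import pandas as pd

arr = np.array([1, 2, 3])

# Default for dict input: the column data is copied into the new DataFrame.
df_default = pd.DataFrame({"A": arr})

# copy=False: the column can reuse arr's memory instead of copying it.
df_nocopy = pd.DataFrame({"A": arr}, copy=False)

arr[0] = 0  # mutate the source array in place

print(df_default["A"].tolist())  # unaffected by the mutation
print(df_nocopy["A"].tolist())   # reflects the mutation
```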
When `copy` is not passed, the default behavior is unchanged: a copy is made.
### PyArrow-backed string data type
We've enhanced `StringDtype`, an extension type dedicated to string data (GH 39908).
It is now possible to pass a `storage` keyword option to `StringDtype`. Use the pandas option `string_storage` or specify the dtype as `dtype='string[pyarrow]'` to have the `StringArray` backed by a PyArrow array instead of a NumPy array of Python objects.
The PyArrow-backed `StringArray` requires pyarrow 1.0.0 or later to be installed.
Warning: `string[pyarrow]` is currently considered experimental. The implementation and parts of the API may change without warning.
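Because the PyArrow backend is optional, code that must also run without pyarrow can choose the storage at runtime. This is a minimal sketch, assuming only that pyarrow may or may not be importable (it does not check the pyarrow version); `storage="python"` is the NumPy-object-backed alternative.

```python
import importlib.util

import pandas as pd

# Use PyArrow-backed storage only when pyarrow can be imported; otherwise fall
# back to the NumPy/object-backed "python" storage.
storage = "pyarrow" if importlib.util.find_spec("pyarrow") else "python"
dtype = pd.StringDtype(storage=storage)

ser = pd.Series(["abc", None, "def"], dtype=dtype)

counts = ser.str.count("a")  # integer-returning accessor -> Int64 dtype
upper = ser.str.upper()      # string-returning accessor keeps a string dtype
```

Either way the accessor results have the same dtypes, so downstream code does not need to know which storage was picked.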
```py
In [6]: pd.Series(['abc', None, 'def'], dtype=pd.StringDtype(storage="pyarrow"))
Out[6]:
0     abc
1    <NA>
2     def
dtype: string
```
You can also use the alias <code>"string[pyarrow]"</code>.
```py
In [7]: s = pd.Series(['abc', None, 'def'], dtype="string[pyarrow]")
In [8]: s
Out[8]:
0     abc
1    <NA>
2     def
dtype: string
```
You can also create a PyArrow-backed string array using the pandas option.
```py
In [9]: with pd.option_context("string_storage", "pyarrow"):
   ...:     s = pd.Series(['abc', None, 'def'], dtype="string")
   ...:
In [10]: s
Out[10]:
0     abc
1    <NA>
2     def
dtype: string
```
The usual string accessor methods work. Where appropriate, the return type of the Series or of the DataFrame's columns will also have string dtype.
```py
In [11]: s.str.upper()
Out[11]:
0     ABC
1    <NA>
2     DEF
dtype: string
In [12]: s.str.split('b', expand=True).dtypes
Out[12]:
0    string[pyarrow]
1    string[pyarrow]
dtype: object
```
String accessor methods that return integers will return a value with `Int64Dtype`:
```py
In [13]: s.str.count("a")
Out[13]:
0       1
1    <NA>
2       0
dtype: Int64
```
### Centered datetime-like rolling windows
When performing rolling calculations on DataFrame and Series objects with a datetime-like index, centered datetime-like windows can now be used ([GH 38780](https://github.com/pandas-dev/pandas/issues/38780)). For example:
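A minimal sketch, assuming only pandas, that contrasts the centered window with the default trailing window over the same daily data:

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [0, 1, 2, 3, 4]}, index=pd.date_range("2020", periods=5, freq="1D")
)

# Default: each 2-day window ends at (and includes) its own timestamp.
trailing = df.rolling("2D").mean()

# center=True: each 2-day window is positioned around its own timestamp instead.
centered = df.rolling("2D", center=True).mean()
```

With daily data, the centered result is the trailing result shifted one day earlier, except at the last timestamp, whose window contains only itself.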
```py
In [14]: df = pd.DataFrame(
   ....:     {"A": [0, 1, 2, 3, 4]}, index=pd.date_range("2020", periods=5, freq="1D")
   ....: )
   ....:
In [15]: df
Out[15]:
            A
2020-01-01  0
2020-01-02  1
2020-01-03  2
2020-01-04  3
2020-01-05  4
In [16]: df.rolling("2D", center=True).mean()
Out[16]:
              A
2020-01-01  0.5
2020-01-02  1.5
2020-01-03  2.5
2020-01-04  3.5
2020-01-05  4.0
```
### Other enhancements
+ `DataFrame.rolling()`, `Series.rolling()`, `DataFrame.expanding()`, and `Series.expanding()` now support a `method` argument with a `'table'` option that performs the windowing operation over an entire `DataFrame`. See Window Overview for the performance and functional benefits ([GH 15095](https://github.com/pandas-dev/pandas/issues/15095), [GH 38995](https://github.com/pandas-dev/pandas/issues/38995))
+ `ExponentialMovingWindow` now supports an `online` method to perform `mean` calculations online. See Window Overview ([GH 41673](https://github.com/pandas-dev/pandas/issues/41673))
+ Added `MultiIndex.dtypes()` ([GH 37062](https://github.com/pandas-dev/pandas/issues/37062))
+ Added `end` and `end_day` options to the `origin` parameter of `DataFrame.resample()` ([GH 37804](https://github.com/pandas-dev/pandas/issues/37804))
+ Improved the error message when `usecols` and `names` do not match for `read_csv()` with `engine="c"` ([GH 29042](https://github.com/pandas-dev/pandas/issues/29042))
+ Improved consistency of error messages when passing invalid `win_type` parameters in window methods ([GH 15969](https://github.com/pandas-dev/pandas/issues/15969))
+ `read_sql_query()` now accepts a `dtype` parameter to convert columnar data from a SQL database based on user input ([GH 10285](https://github.com/pandas-dev/pandas/issues/10285))
+ When `usecols` is not specified, `read_csv()` now raises a `ParserWarning` if the length of the header or the given names does not match the length of the data ([GH 21768](https://github.com/pandas-dev/pandas/issues/21768))
+ Integer type mapping from pandas to SQLAlchemy has been improved when using `DataFrame.to_sql()` ([GH 35076](https://github.com/pandas-dev/pandas/issues/35076))
+ `to_numeric()` now supports downcasting of nullable `ExtensionDtype` objects ([GH 33013](https://github.com/pandas-dev/pandas/issues/33013))
+ Added support for dictionary-like names in `MultiIndex.set_names` and `MultiIndex.rename` ([GH 20421](https://github.com/pandas-dev/pandas/issues/20421))
+ `read_excel()` can now auto-detect .xlsb files and older .xls files ([GH 35416](https://github.com/pandas-dev/pandas/issues/35416), [GH 41225](https://github.com/pandas-dev/pandas/issues/41225))
+ `ExcelWriter` now accepts an `if_sheet_exists` parameter to control the behavior of append mode when writing to existing sheets ([GH 40230](https://github.com/pandas-dev/pandas/issues/40230))
+ `Rolling.sum()`, `Expanding.sum()`, `Rolling.mean()`, `Expanding.mean()`, `ExponentialMovingWindow.mean()`, `Rolling.median()`, `Expanding.median()`, `Rolling.max()`, `Expanding.max()`, `Rolling.min()`, and `Expanding.min()` now support [Numba](http://numba.pydata.org/) execution with the `engine` keyword ([GH 38895](https://github.com/pandas-dev/pandas/issues/38895), [GH 41267](https://github.com/pandas-dev/pandas/issues/41267))
+ `DataFrame.apply()` now accepts NumPy's unary operators as strings, such as `df.apply("sqrt")`, which was already the case for `Series.apply()` ([GH 39116](https://github.com/pandas-dev/pandas/issues/39116))
+ `DataFrame.apply()` now accepts non-callable DataFrame properties as strings, such as `df.apply("size")`, which was already the case for `Series.apply()` ([GH 39116](https://github.com/pandas-dev/pandas/issues/39116))
+ `DataFrame.applymap()` now accepts keyword arguments passed to a user-supplied `func` ([GH 39987](https://github.com/pandas-dev/pandas/issues/39987))
+ Passing a `DataFrame` indexer to `iloc` is now disallowed for `Series.__getitem__()` and `DataFrame.__getitem__()` ([GH 39004](https://github.com/pandas-dev/pandas/issues/39004))
+ `Series.apply()` can now accept list-like or dictionary-like arguments that aren't lists or dictionaries, e.g. `ser.apply(np.array(["sum", "mean"]))`, which was already the case for `DataFrame.apply()` ([GH 39140](https://github.com/pandas-dev/pandas/issues/39140))
+ `DataFrame.plot.scatter()` can now accept a categorical column for the argument `c` ([GH 12380](https://github.com/pandas-dev/pandas/issues/12380), [GH 31357](https://github.com/pandas-dev/pandas/issues/31357))
+ `Series.loc()` now raises a helpful error message when the Series has a `MultiIndex` and the indexer has too many dimensions ([GH 35349](https://github.com/pandas-dev/pandas/issues/35349))
+ `read_stata()` now supports reading data from compressed files ([GH 26599](https://github.com/pandas-dev/pandas/issues/26599))
+ Added support for parsing `ISO 8601`-like timestamps with negative signs to `Timedelta` ([GH 37172](https://github.com/pandas-dev/pandas/issues/37172))
+ Added support for unary operators in `FloatingArray` ([GH 38749](https://github.com/pandas-dev/pandas/issues/38749))
+ It is now possible to construct a `RangeIndex` by passing a `range` object directly, e.g. `pd.RangeIndex(range(3))` ([GH 12067](https://github.com/pandas-dev/pandas/issues/12067))
+ `Series.round()` and `DataFrame.round()` now work with nullable integer and floating data types ([GH 38844](https://github.com/pandas-dev/pandas/issues/38844))
+ `read_csv()` and `read_json()` provide parameter `encoding_errors` to control how encoding errors are handled ([GH 39450](https://github.com/pandas-dev/pandas/issues/39450))
+ `DataFrameGroupBy.any()`, `SeriesGroupBy.any()`, `DataFrameGroupBy.all()`, and `SeriesGroupBy.all()` now use Kleene logic with nullable data types ([GH 37506](https://github.com/pandas-dev/pandas/issues/37506))
+ `DataFrameGroupBy.any()`, `SeriesGroupBy.any()`, `DataFrameGroupBy.all()`, and `SeriesGroupBy.all()` now return a `BooleanDtype` for columns with nullable data types ([GH 33449](https://github.com/pandas-dev/pandas/issues/33449))
+ `DataFrameGroupBy.any()`, `SeriesGroupBy.any()`, `DataFrameGroupBy.all()`, and `SeriesGroupBy.all()` now raise for `object` data containing `pd.NA`, even when `skipna=True` ([GH 37501](https://github.com/pandas-dev/pandas/issues/37501))
+ `DataFrameGroupBy.rank()` and `SeriesGroupBy.rank()` now support object-dtype data ([GH 38278](https://github.com/pandas-dev/pandas/issues/38278))
+ Constructing a `DataFrame` or `Series` with the `data` argument being a Python iterable that is *not* a NumPy `ndarray` consisting of NumPy scalars now results in a dtype with a precision equal to the maximum precision of the NumPy scalars; this was already the case when `data` is a NumPy `ndarray` ([GH 40908](https://github.com/pandas-dev/pandas/issues/40908))
+ Add keyword `sort` to `pivot_table()` to allow results to be unsorted ([GH 39143](https://github.com/pandas-dev/pandas/issues/39143))
+ Add keyword `dropna` to `DataFrame.value_counts()` to allow counting rows that include `NA` values ([GH 41325](https://github.com/pandas-dev/pandas/issues/41325))
+ `Series.replace()` now casts results to `PeriodDtype` where possible instead of `object` dtype ([GH 41526](https://github.com/pandas-dev/pandas/issues/41526))
+ Improved the error message in the `corr` and `cov` methods of `Rolling`, `Expanding`, and `ExponentialMovingWindow` when `other` is not a `DataFrame` or `Series` ([GH 41741](https://github.com/pandas-dev/pandas/issues/41741))
+ `Series.between()` can now accept `left` or `right` as arguments to `inclusive` to include only the left or right boundary ([GH 40245](https://github.com/pandas-dev/pandas/issues/40245))
+ `DataFrame.explode()` now supports exploding multiple columns. Its `column` argument now also accepts a list of str or tuples for exploding on multiple columns at the same time ([GH 39240](https://github.com/pandas-dev/pandas/issues/39240))
+ `DataFrame.sample()` now accepts an `ignore_index` argument to reset the index after sampling, similar to `DataFrame.drop_duplicates()` and `DataFrame.sort_values()` ([GH 38581](https://github.com/pandas-dev/pandas/issues/38581))
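A few of the smaller additions above can be exercised together. This is a minimal sketch with made-up data (`random_state=0` is arbitrary), touching `Series.between(inclusive=...)`, multi-column `DataFrame.explode()`, `DataFrame.value_counts(dropna=...)`, `DataFrame.sample(ignore_index=...)`, and building a `RangeIndex` from a `range`.

```python
import pandas as pd

# Series.between: inclusive="left" keeps the left bound and excludes the right one
ser = pd.Series([1, 2, 3, 4])
left_only = ser.between(2, 4, inclusive="left")

# DataFrame.explode with a list of columns: list cells must have equal length per row
df = pd.DataFrame({"k": [1, 2], "a": [[10, 11], [12]], "b": [["x", "y"], ["z"]]})
exploded = df.explode(["a", "b"])

# DataFrame.value_counts(dropna=False) also counts rows containing NA
counts_df = pd.DataFrame({"g": ["x", "x", None]})
with_na = counts_df.value_counts(dropna=False)
without_na = counts_df.value_counts()

# DataFrame.sample(ignore_index=True) returns a fresh 0..n-1 index
shuffled = df.sample(frac=1, random_state=0, ignore_index=True)

# RangeIndex accepts a range object directly
idx = pd.RangeIndex(range(3))
```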
## Notable bug fixes
These are bug fixes that might have notable behavior changes.
### `Categorical.unique` now always maintains the same dtype as the original
Previously, when calling `Categorical.unique()` with categorical data, unused categories in the new array were removed, so the dtype of the new array differed from that of the original (GH 18291).
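A minimal sketch of checking this with an ordered dtype that has an unused category. If code depended on the old, narrower result, `Categorical.remove_unused_categories()` is one way to drop the unused categories explicitly.

```python
import pandas as pd

dtype = pd.CategoricalDtype(["bad", "neutral", "good"], ordered=True)
ser = pd.Series(pd.Categorical(["good", "good", "bad"], dtype=dtype))

uniq = ser.unique()  # a Categorical; the unused "neutral" stays among its categories
same_dtype = uniq.dtype == ser.dtype

# Dropping unused categories explicitly gives the narrower, pre-1.3-style result
trimmed = uniq.remove_unused_categories()
```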
For example, given:
```py
In [17]: dtype = pd.CategoricalDtype(['bad', 'neutral', 'good'], ordered=True)
In [18]: cat = pd.Categorical(['good', 'good', 'bad', 'bad'], dtype=dtype)
In [19]: original = pd.Series(cat)
In [20]: unique = original.unique()
```
*Previous behavior*:
```py
In [1]: unique
['good', 'bad']
Categories (2, object): ['bad' < 'good']
In [2]: original.dtype == unique.dtype
False
```
*New behavior*:
```py
In [21]: unique
Out[21]:
['good', 'bad']
Categories (3, object): ['bad' < 'neutral' < 'good']
In [22]: original.dtype == unique.dtype
Out[22]: True
```
### Preserve dtypes in `DataFrame.combine_first()`
`DataFrame.combine_first()` now preserves dtype ([GH 7509](https://github.com/pandas-dev/pandas/issues/7509))
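As a complementary sketch with made-up data: when both frames share the same index, alignment introduces no missing values, so integer columns should come out as `int64`. Rows present in only one frame, by contrast, leave NaN gaps that force a float dtype.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2]}, index=[0, 1])
df2 = pd.DataFrame({"A": [10, 20], "B": [3, 4]}, index=[0, 1])

# Same index on both sides: no NaN is introduced, so the integer dtypes survive.
combined = df1.combine_first(df2)
```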
```py
In [23]: df1 = pd.DataFrame({"A": [1, 2, 3], "B": [1, 2, 3]}, index=[0, 1, 2])
In [24]: df1
Out[24]:
   A  B
0  1  1
1  2  2
2  3  3
In [25]: df2 = pd.DataFrame({"B": [4, 5, 6], "C": [1, 2, 3]}, index=[2, 3, 4])
In [26]: df2
Out[26]:
   B  C
2  4  1
3  5  2
4  6  3
In [27]: combined = df1.combine_first(df2)
```
*Previous behavior*:
```py
In [1]: combined.dtypes
Out[1]:
A    float64
B    float64
C    float64
dtype: object
```
*New behavior*:
```py
In [28]: combined.dtypes
Out[28]:
A    float64
B      int64
C    float64
dtype: object
```
### Group by methods `agg` and `transform` no longer change the return dtype for callables
Previously, the methods `DataFrameGroupBy.aggregate()`, `SeriesGroupBy.aggregate()`, `DataFrameGroupBy.transform()`, and `SeriesGroupBy.transform()` might cast the result dtype when the argument `func` is callable, possibly leading to undesirable results ([GH 21240](https://github.com/pandas-dev/pandas/issues/21240)). The cast occurred if the result was numeric and casting back to the input dtype did not change any values, as measured by `np.allclose`. Now no such casting occurs.
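A minimal sketch of the new behavior, reusing the boolean data from the example that follows and adding a `transform` call with the same callable: the per-group sums stay integers instead of being cast back to `bool`.

```python
import pandas as pd

df = pd.DataFrame({"key": [1, 1], "a": [True, False], "b": [True, True]})

# agg with a callable: the integer sums are not cast back to bool
agg_result = df.groupby("key").agg(lambda x: x.sum())

# transform with a callable: the broadcast group sums also stay integers
transformed = df.groupby("key")["a"].transform(lambda x: x.sum())
```

The tests check the dtype kind rather than `== 1`, because `True == 1` in Python and would not distinguish the two behaviors.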
```py
In [29]: df = pd.DataFrame({'key': [1, 1], 'a': [True, False], 'b': [True, True]})
In [30]: df
Out[30]:
   key      a     b
0    1   True  True
1    1  False  True
```
*Previous behavior*:
```py
In [5]: df.groupby('key').agg(lambda x: x.sum())
Out[5]:
        a  b
key
1    True  2
```
*New behavior*:
```py
In [31]: df.groupby('key').agg(lambda x: x.sum())
Out[31]:
     a  b
key
1    1  2
```
### Consistent float result for `DataFrameGroupBy.mean()`, `DataFrameGroupBy.median()`, `DataFrameGroupBy.var()`, `SeriesGroupBy.mean()`, `SeriesGroupBy.median()`, and `SeriesGroupBy.var()`
Previously, these methods might produce different dtypes depending on the input value. These methods will now always return a floating point dtype. ([GH 41137](https://github.com/pandas-dev/pandas/issues/41137))
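The following sketch checks the result dtypes directly. It assumes pandas 1.3 or later and starts from the frame in the example that follows. The median call is limited to the integer and float columns to keep the check simple.

```py
import pandas as pd

df = pd.DataFrame({"a": [True], "b": [1], "c": [1.0]})

# Boolean, integer, and float inputs all produce float64 means.
means = df.groupby(df.index).mean()

# The median of an integer column is float64 as well.
medians = df.groupby(df.index)[["b", "c"]].median()

print(means.dtypes)
print(medians.dtypes)
```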
```py
In [32]: df = pd.DataFrame({'a': [True], 'b': [1], 'c': [1.0]})
```

*Previous behavior*:

```py
In [5]: df.groupby(df.index).mean()
Out[5]:
      a  b    c
0  True  1  1.0
```

*New behavior*:

```py
In [33]: df.groupby(df.index).mean()
Out[33]:
     a    b    c
0  1.0  1.0  1.0
```
### Try operating in place when setting values with `loc` and `iloc`
When setting an entire column using `loc` or `iloc`, pandas will try to insert the values into the existing data instead of creating an entirely new array.
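Here is a small sketch of the dtype side of this change, assuming pandas 1.3 or later. It only asserts the column dtype and values. Memory sharing is deliberately left out because it can depend on the pandas version and its copy settings.

```py
import numpy as np
import pandas as pd

df = pd.DataFrame(range(3), columns=["A"], dtype="float64")
new = np.array([5, 6, 7], dtype="int64")

# Setting a whole column through .loc tries to write into the existing
# float64 data, so the integers are stored as floats and the column
# is not replaced by an int64 array.
df.loc[[0, 1, 2], "A"] = new

print(df.dtypes)
```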
```py
In [34]: df = pd.DataFrame(range(3), columns=["A"], dtype="float64")

In [35]: values = df.values

In [36]: new = np.array([5, 6, 7], dtype="int64")

In [37]: df.loc[[0, 1, 2], "A"] = new
```

In both the new and old behavior, the data in `values` is overwritten, but in the old behavior the dtype of `df["A"]` changed to `int64`.

*Previous behavior*:

```py
In [1]: df.dtypes
Out[1]:
A    int64
dtype: object

In [2]: np.shares_memory(df["A"].values, new)
Out[2]: False

In [3]: np.shares_memory(df["A"].values, values)
Out[3]: False
```

In pandas 1.3.0, `df` continues to share data with `values`:

*New behavior*:

```py
In [38]: df.dtypes
Out[38]:
A    float64
dtype: object

In [39]: np.shares_memory(df["A"], new)
Out[39]: False

In [40]: np.shares_memory(df["A"], values)
Out[40]: True
```
### Never operate in place when setting `frame[keys] = values`

When setting multiple columns using `frame[keys] = values`, new arrays will replace the pre-existing arrays for these keys, which will *not* be overwritten ([GH 39510](https://github.com/pandas-dev/pandas/issues/39510)). As a result, the columns will retain the dtype(s) of `values` and will never be cast to the dtype(s) of the existing arrays.
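The following runnable sketch covers the scalar case from the example that follows. It assumes pandas 1.3 or later. Assigning through a list of keys swaps in a new column array, so the column takes the dtype of the assigned value.

```py
import pandas as pd

df = pd.DataFrame(range(3), columns=["A"], dtype="float64")

# Setting through a list of keys replaces the column with a new array,
# so it takes the dtype of the value (int64 for the integer 5).
df[["A"]] = 5

print(df.dtypes)
```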
```py
In [41]: df = pd.DataFrame(range(3), columns=["A"], dtype="float64")

In [42]: df[["A"]] = 5
```

In the old behavior, `5` was cast to `float64` and inserted into the existing array backing `df`:

*Previous behavior*:

```py
In [1]: df.dtypes
Out[1]:
A    float64
dtype: object
```

In the new behavior, we get a new array and retain an integer-dtyped `5`:

*New behavior*:

```py
In [43]: df.dtypes
Out[43]:
A    int64
dtype: object
```
### Consistent casting when setting into a boolean Series

Setting non-boolean values into a `Series` with `dtype=bool` now consistently casts to `dtype=object` ([GH 38709](https://github.com/pandas-dev/pandas/issues/38709))
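If downstream code depends on the resulting dtype, one option is to cast to `object` explicitly before setting. This is a suggestion, not something this release note prescribes. It produces the same `object` result as the new behavior without relying on implicit upcasting:

```py
import numpy as np
import pandas as pd

orig = pd.Series([True, False])

# Cast explicitly so the Series can hold both booleans and other values.
ser = orig.astype(object)
ser.iloc[1] = np.nan

ser2 = orig.astype(object)
ser2.iloc[1] = 2.0

print(ser.dtype, ser2.dtype)
```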
```py
In [1]: orig = pd.Series([True, False])

In [2]: ser = orig.copy()

In [3]: ser.iloc[1] = np.nan

In [4]: ser2 = orig.copy()

In [5]: ser2.iloc[1] = 2.0
```

*Previous behavior*:

```py
In [1]: ser
Out[1]:
0    1.0
1    NaN
dtype: float64

In [2]: ser2
Out[2]:
0    True
1     2.0
dtype: object
```

*New behavior*:

```py
In [1]: ser
Out[1]:
0    True
1     NaN
dtype: object

In [2]: ser2
Out[2]:
0    True
1     2.0
dtype: object
```
### DataFrameGroupBy.rolling and SeriesGroupBy.rolling no longer return the grouped-by column in values

The group-by column is now dropped from the result of a `groupby.rolling` operation ([GH 32262](https://github.com/pandas-dev/pandas/issues/32262))
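The sketch below checks this with the data from the example that follows. It assumes pandas 1.3 or later. The grouping key survives only as the outer index level.

```py
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 2, 3], "B": [0, 1, 2, 3]})

# Group on "A" and take a rolling sum over windows of two rows.
result = df.groupby("A").rolling(2).sum()

# "A" is only the outer index level; it is no longer a value column.
print(result)
```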
```py
In [44]: df = pd.DataFrame({"A": [1, 1, 2, 3], "B": [0, 1, 2, 3]})

In [45]: df
Out[45]:
   A  B
0  1  0
1  1  1
2  2  2
3  3  3
```

*Previous behavior*:

```py
In [1]: df.groupby("A").rolling(2).sum()
Out[1]:
       A    B
A
1 0  NaN  NaN
  1  2.0  1.0
2 2  NaN  NaN
3 3  NaN  NaN
```

*New behavior*:

```py
In [46]: df.groupby("A").rolling(2).sum()
Out[46]:
       B
A
1 0  NaN
  1  1.0
2 2  NaN
3 3  NaN
```
### Removed artificial truncation in rolling variance and standard deviation

`Rolling.std()` and `Rolling.var()` will no longer artificially truncate results that are less than `~1e-8` and `~1e-15`, respectively, to zero ([GH 37051](https://github.com/pandas-dev/pandas/issues/37051), [GH 40448](https://github.com/pandas-dev/pandas/issues/40448), [GH 39872](https://github.com/pandas-dev/pandas/issues/39872)).
However, floating point artifacts may now exist in the results when rolling over larger values.
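A sketch with made-up values illustrates the change. The middle element differs by about 1e-9, so the true window variance is roughly 3.3e-19, which is below the old ~1e-15 cutoff. The standard deviation is roughly 5.8e-10, which is below the old ~1e-8 cutoff. Assuming pandas 1.3 or later, both come back as small non-zero numbers rather than being truncated to zero.

```py
import pandas as pd

s = pd.Series([1.0, 1.0 + 1e-9, 1.0])

var = s.rolling(3).var()
std = s.rolling(3).std()

# Only the last window holds three values; the first two results are NaN.
print(var.iloc[-1], std.iloc[-1])
```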
```py
In [47]: s = pd.Series([7, 5, 5, 5])

In [48]: s.rolling(3).var()
Out[48]:
0         NaN
1         NaN
2    1.333333
3    0.000000
dtype: float64
```
### DataFrameGroupBy.rolling and SeriesGroupBy.rolling with MultiIndex no longer drop levels in the result

`DataFrameGroupBy.rolling()` and `SeriesGroupBy.rolling()` will no longer drop levels of a `DataFrame` with a `MultiIndex` in the result. This can lead to a perceived duplication of levels in the resulting `MultiIndex`, but this change restores the behavior that was present in version 1.1.3 ([GH 38787](https://github.com/pandas-dev/pandas/issues/38787), [GH 38523](https://github.com/pandas-dev/pandas/issues/38523)).
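The following sketch inspects the index of the result, using the frame from the example that follows. It assumes pandas 1.3 or later. The group key is prepended and both original levels are kept, so `label1` appears twice.

```py
import pandas as pd

index = pd.MultiIndex.from_tuples([("idx1", "idx2")], names=["label1", "label2"])
df = pd.DataFrame({"a": [1], "b": [2]}, index=index)

result = df.groupby("label1").rolling(1).sum()

# Group key level first, followed by all levels of the original index.
print(result.index.names)
```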
```py
In [49]: index = pd.MultiIndex.from_tuples([('idx1', 'idx2')], names=['label1', 'label2'])

In [50]: df = pd.DataFrame({'a': [1], 'b': [2]}, index=index)

In [51]: df
Out[51]:
               a  b
label1 label2
idx1   idx2    1  2
```

*Previous behavior*:

```py
In [1]: df.groupby('label1').rolling(1).sum()
Out[1]:
          a    b
label1
idx1    1.0  2.0
```

*New behavior*:

```py
In [52]: df.groupby('label1').rolling(1).sum()
Out[52]:
                      a    b
label1 label1 label2
idx1   idx1   idx2  1.0  2.0
```
### `Categorical.unique` now always maintains the same dtype as the original

Previously, when calling `Categorical.unique()` with categorical data, unused categories in the new array would be removed, making the dtype of the new array different from the original ([GH 18291](https://github.com/pandas-dev/pandas/issues/18291))
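Here is a quick runnable check, assuming pandas 1.3 or later and the same ordered dtype as the illustration that follows. The unused category `"neutral"` is kept, so the dtypes compare equal.

```py
import pandas as pd

dtype = pd.CategoricalDtype(["bad", "neutral", "good"], ordered=True)
original = pd.Series(pd.Categorical(["good", "good", "bad", "bad"], dtype=dtype))

unique = original.unique()

# All declared categories survive, including the unused "neutral".
print(unique.categories.tolist())
```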
To illustrate, given:
```py
In [17]: dtype = pd.CategoricalDtype(['bad', 'neutral', 'good'], ordered=True)

In [18]: cat = pd.Categorical(['good', 'good', 'bad', 'bad'], dtype=dtype)

In [19]: original = pd.Series(cat)

In [20]: unique = original.unique()
```

*Previous behavior*:

```py
In [1]: unique
['good', 'bad']
Categories (2, object): ['bad' < 'good']

In [2]: original.dtype == unique.dtype
False
```

*New behavior*:

```py
In [21]: unique
Out[21]:
['good', 'bad']
Categories (3, object): ['bad' < 'neutral' < 'good']

In [22]: original.dtype == unique.dtype
Out[22]: True
```
## Backward incompatible API changes

### Increased minimum versions for dependencies

Some minimum supported versions of dependencies were updated. If installed, we now require:

Package | Minimum Version | Required | Changed |
---|---|---|---|
numpy | 1.17.3 | X | X |
pytz | 2017.3 | X | |
python-dateutil | 2.7.3 | X | |
bottleneck | 1.2.1 | | |
numexpr | 2.7.0 | | X |
pytest (dev) | 6.0 | | X |
mypy (dev) | 0.812 | | X |
setuptools | 38.6.0 | | X |

For optional libraries, the general recommendation is to use the latest version. The following table lists the lowest version of each library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.

Package | Minimum Version | Changed |
---|---|---|
beautifulsoup4 | 4.6.0 | |
fastparquet | 0.4.0 | X |
fsspec | 0.7.4 | |
gcsfs | 0.6.0 | |
lxml | 4.3.0 | |
matplotlib | 2.2.3 | |
numba | 0.46.0 | |
openpyxl | 3.0.0 | X |
pyarrow | 0.17.0 | X |
pymysql | 0.8.1 | X |
pytables | 3.5.1 | |
s3fs | 0.4.0 | |
scipy | 1.2.0 | |
sqlalchemy | 1.3.0 | X |
tabulate | 0.8.7 | X |
xarray | 0.12.0 | |
xlrd | 1.2.0 | |
xlsxwriter | 1.0.2 | |
xlwt | 1.3.0 | |
pandas-gbq | 0.12.0 | |

For more information, see Dependencies and Optional dependencies.

### Other API changes
- Partially initialized `CategoricalDtype` objects (i.e. those with `categories=None`) will no longer compare as equal to fully initialized dtype objects (GH 38516)
- Accessing `_constructor_expanddim` on a `DataFrame` and `_constructor_sliced` on a `Series` now raises an `AttributeError`. Previously a `NotImplementedError` was raised (GH 38782)
- Added new `engine` and `**engine_kwargs` parameters to `DataFrame.to_sql()` to support other future "SQL engines". Currently we still only use `SQLAlchemy` under the hood, but more engines are planned to be supported, such as turbodbc (GH 36893)
- Removed redundant `freq` from the `PeriodIndex` string representation (GH 41653)
- `ExtensionDtype.construct_array_type()` is now a required method for `ExtensionDtype` subclasses instead of an optional one (GH 24860)
- Calling `hash` on a non-hashable pandas object now raises `TypeError` with the built-in error message (e.g. `unhashable type: 'Series'`). Previously it raised a custom message such as `'Series' objects are mutable, thus they cannot be hashed`. Furthermore, `isinstance(<Series>, collections.abc.Hashable)` now returns `False` (GH 40013)
- `Styler.from_custom_template()` now has two new arguments for template names, and the old `name` argument was removed, because template inheritance was introduced for better parsing (GH 42053). Subclassing modifications to `Styler` attributes are also needed.

### Build

- Documentation in `.pptx` and `.pdf` formats is no longer included in wheels or source distributions (GH 30741)
## Deprecations

### Deprecated dropping nuisance columns in DataFrame reductions and DataFrameGroupBy operations

When calling a reduction (e.g. `.min`, `.max`, `.sum`) on a `DataFrame` with `numeric_only=None` (the default), columns where the reduction raises a `TypeError` are silently ignored and dropped from the result.

This behavior is deprecated. In a future version, the `TypeError` will be raised, and users will need to select only valid columns before calling the function.

For example:

```py
In [53]: df = pd.DataFrame({"A": [1, 2, 3, 4], "B": pd.date_range("2016-01-01", periods=4)})

In [54]: df
Out[54]:
   A          B
0  1 2016-01-01
1  2 2016-01-02
2  3 2016-01-03
3  4 2016-01-04
```

*Old behavior*:

```py
In [3]: df.prod()
Out[3]:
A    24
dtype: int64
```

*Future behavior*:

```py
In [4]: df.prod()
...
TypeError: 'DatetimeArray' does not implement reduction 'prod'

In [5]: df[["A"]].prod()
Out[5]:
A    24
dtype: int64
```
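A forward-compatible approach is to choose the valid columns explicitly before reducing. This sketch assumes pandas 1.3 or later and shows two existing options, `numeric_only=True` and `select_dtypes`. Which one fits best depends on the data.

```py
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": pd.date_range("2016-01-01", periods=4)})

# Option 1: ask the reduction to consider numeric columns only.
prod_numeric = df.prod(numeric_only=True)

# Option 2: select the columns up front, so nothing is dropped silently.
prod_selected = df.select_dtypes("number").prod()

print(prod_numeric)
print(prod_selected)
```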
Similarly, when applying a function to `DataFrameGroupBy`, columns on which the function raises `TypeError` are currently silently ignored and dropped from the result.

This behavior is deprecated. In a future version, a `TypeError` will be raised, and users will need to select only valid columns before calling the function.

For example:

```py
In [55]: df = pd.DataFrame({"A": [1, 2, 3, 4], "B": pd.date_range("2016-01-01", periods=4)})

In [56]: gb = df.groupby([1, 1, 2, 2])
```

*Old behavior*:

```py
In [4]: gb.prod(numeric_only=False)
Out[4]:
    A
1   2
2  12
```

*Future behavior*:

```py
In [5]: gb.prod(numeric_only=False)
...
TypeError: datetime64 type does not support prod operations

In [6]: gb[["A"]].prod(numeric_only=False)
Out[6]:
    A
1   2
2  12
```
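The same idea applies to groupby. This sketch assumes pandas 1.3 or later. It selects the valid column before aggregating, or alternatively passes `numeric_only=True`, so the datetime column is never silently dropped.

```py
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": pd.date_range("2016-01-01", periods=4)})
gb = df.groupby([1, 1, 2, 2])

# Select the valid column explicitly before aggregating.
selected = gb[["A"]].prod()

# Alternatively, restrict the aggregation to numeric columns.
numeric = gb.prod(numeric_only=True)

print(selected)
```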
### Other deprecations
+ Deprecated allowing scalars to be passed to the `Categorical` constructor ([GH 38433](https://github.com/pandas-dev/pandas/issues/38433))
+ Deprecated constructing `CategoricalIndex` without passing list-like data ([GH 38944](https://github.com/pandas-dev/pandas/issues/38944))
+ Deprecated allowing subclass-specific keyword arguments in the `Index` constructor; use the specific subclass directly instead ([GH 14093](https://github.com/pandas-dev/pandas/issues/14093), [GH 21311](https://github.com/pandas-dev/pandas/issues/21311), [GH 22315](https://github.com/pandas-dev/pandas/issues/22315), [GH 26974](https://github.com/pandas-dev/pandas/issues/26974))
+ Deprecated the `astype()` method of datetimelike (`timedelta64[ns]`, `datetime64[ns]`, `DatetimeTZDtype`, `PeriodDtype`) to convert to integer dtypes; use `values.view(...)` instead ([GH 38544](https://github.com/pandas-dev/pandas/issues/38544)). This deprecation was later reverted in pandas 1.4.0.
+ Deprecated `MultiIndex.is_lexsorted()` and `MultiIndex.lexsort_depth()`; use `MultiIndex.is_monotonic_increasing()` instead ([GH 32259](https://github.com/pandas-dev/pandas/issues/32259))
+ Deprecated keyword `try_cast` in `Series.where()`, `Series.mask()`, `DataFrame.where()`, `DataFrame.mask()`; cast results manually if desired ([GH 38836](https://github.com/pandas-dev/pandas/issues/38836))
+ Deprecated comparison of `Timestamp` objects with `datetime.date` objects. Instead of e.g. `ts <= mydate`, use `ts <= pd.Timestamp(mydate)` or `ts.date() <= mydate` ([GH 36131](https://github.com/pandas-dev/pandas/issues/36131))
+ Deprecated `Rolling.win_type` returning `"freq"` ([GH 38963](https://github.com/pandas-dev/pandas/issues/38963))
+ Deprecated `Rolling.is_datetimelike` ([GH 38963](https://github.com/pandas-dev/pandas/issues/38963))
+ Deprecated `DataFrame` indexer for `Series.__setitem__()` and `DataFrame.__setitem__()` ([GH 39004](https://github.com/pandas-dev/pandas/issues/39004))
+ Deprecated `ExponentialMovingWindow.vol()` ([GH 39220](https://github.com/pandas-dev/pandas/issues/39220))
+ Using `.astype` to convert between `datetime64[ns]` dtype and `DatetimeTZDtype` is deprecated and will raise in a future version; use `obj.tz_localize` or `obj.dt.tz_localize` instead ([GH 38622](https://github.com/pandas-dev/pandas/issues/38622))
+ Deprecated casting `datetime.date` objects to `datetime64` when used as `fill_value` in `DataFrame.unstack()`, `DataFrame.shift()`, `Series.shift()`, and `DataFrame.reindex()`; pass `pd.Timestamp(dateobj)` instead ([GH 39767](https://github.com/pandas-dev/pandas/issues/39767))
+ Deprecated `Styler.set_na_rep()` and `Styler.set_precision()` in favor of `Styler.format()` with `na_rep` and `precision` as an existing and a new input argument, respectively ([GH 40134](https://github.com/pandas-dev/pandas/issues/40134), [GH 40425](https://github.com/pandas-dev/pandas/issues/40425))
+ Deprecated `Styler.where()` in favor of an alternative formulation using `Styler.applymap()` ([GH 40821](https://github.com/pandas-dev/pandas/issues/40821))
+ Deprecated allowing partial failure in `Series.transform()` and `DataFrame.transform()` when `func` is list-like or dict-like and raises anything other than `TypeError`; a `func` raising anything other than `TypeError` will raise in a future version ([GH 40211](https://github.com/pandas-dev/pandas/issues/40211))
+ Deprecated the arguments `error_bad_lines` and `warn_bad_lines` in `read_csv()` and `read_table()` in favor of the argument `on_bad_lines` ([GH 15122](https://github.com/pandas-dev/pandas/issues/15122))
+ Deprecated support for `np.ma.mrecords.MaskedRecords` in the `DataFrame` constructor; pass `{name: data[name] for name in data.dtype.names}` instead ([GH 40363](https://github.com/pandas-dev/pandas/issues/40363))
+ Deprecated using `merge()`, `DataFrame.merge()`, and `DataFrame.join()` on a different number of levels ([GH 34862](https://github.com/pandas-dev/pandas/issues/34862))
+ Deprecated the use of `**kwargs` in `ExcelWriter`; use the keyword argument `engine_kwargs` instead ([GH 40430](https://github.com/pandas-dev/pandas/issues/40430))
+ Deprecated the `level` keyword for `DataFrame` and `Series` aggregations; use groupby instead ([GH 39983](https://github.com/pandas-dev/pandas/issues/39983))
+ Deprecated the `inplace` parameter of `Categorical.remove_categories()`, `Categorical.add_categories()`, `Categorical.reorder_categories()`, `Categorical.rename_categories()`, and `Categorical.set_categories()`; it will be removed in a future version ([GH 37643](https://github.com/pandas-dev/pandas/issues/37643))
+ Deprecated `merge()` producing duplicated columns through the `suffixes` keyword and already existing columns ([GH 22818](https://github.com/pandas-dev/pandas/issues/22818))
+ Deprecated setting `Categorical._codes`; create a new `Categorical` with the desired codes instead ([GH 40606](https://github.com/pandas-dev/pandas/issues/40606))
+ Deprecated the `convert_float` optional argument in `read_excel()` and `ExcelFile.parse()` ([GH 41127](https://github.com/pandas-dev/pandas/issues/41127))
+ Deprecated the behavior of `DatetimeIndex.union()` with mixed timezones; in a future version both will be cast to UTC instead of object dtype ([GH 39328](https://github.com/pandas-dev/pandas/issues/39328))
+ Deprecated using `usecols` with out-of-bounds indices for `read_csv()` with `engine="c"` ([GH 25623](https://github.com/pandas-dev/pandas/issues/25623))
+ Deprecated special treatment of lists whose first element is a `Categorical` in the `DataFrame` constructor; pass `pd.DataFrame({col: categorical, ...})` instead ([GH 38845](https://github.com/pandas-dev/pandas/issues/38845))
+ Deprecated behavior of the `DataFrame` constructor when a `dtype` is passed and the data cannot be cast to that dtype. In a future version, this will raise instead of being silently ignored ([GH 24435](https://github.com/pandas-dev/pandas/issues/24435))
+ Deprecated the `Timestamp.freq` attribute. For the properties that use it (`is_month_start`, `is_month_end`, `is_quarter_start`, `is_quarter_end`, `is_year_start`, `is_year_end`), when you have a `freq`, use e.g. `freq.is_month_start(ts)` ([GH 15146](https://github.com/pandas-dev/pandas/issues/15146))
+ Deprecated construction of `Series` or `DataFrame` with `DatetimeTZDtype` data and `datetime64[ns]` dtype; use `Series(data).dt.tz_localize(None)` instead ([GH 41555](https://github.com/pandas-dev/pandas/issues/41555), [GH 33401](https://github.com/pandas-dev/pandas/issues/33401))
+ Deprecated behavior of `Series` construction with large-integer values and a small-integer dtype silently overflowing; use `Series(data).astype(dtype)` instead ([GH 41734](https://github.com/pandas-dev/pandas/issues/41734))
+ Deprecated behavior of `DataFrame` construction with floating data and an integer dtype casting even when lossy; in a future version this will remain floating, matching `Series` behavior ([GH 41770](https://github.com/pandas-dev/pandas/issues/41770))
+ Deprecated inference of `timedelta64[ns]`, `datetime64[ns]`, or `DatetimeTZDtype` dtypes in `Series` construction when data containing strings is passed and no `dtype` is passed ([GH 33558](https://github.com/pandas-dev/pandas/issues/33558))
+ In a future version, constructing a `Series` or `DataFrame` with `datetime64[ns]` data and a `DatetimeTZDtype` will treat the data as wall times instead of as UTC times (matching `DatetimeIndex` behavior). To treat the data as UTC times, use `pd.Series(data).dt.tz_localize("UTC").dt.tz_convert(dtype.tz)` or `pd.Series(data.view("int64"), dtype=dtype)` ([GH 33401](https://github.com/pandas-dev/pandas/issues/33401))
+ Deprecated passing lists as `key` to `DataFrame.xs()` and `Series.xs()` ([GH 41760](https://github.com/pandas-dev/pandas/issues/41760))
+ Deprecated boolean arguments of `inclusive` in `Series.between()`; `{"left", "right", "neither", "both"}` are now the standard argument values ([GH 40628](https://github.com/pandas-dev/pandas/issues/40628))
+ Deprecated passing arguments as positional for all of the following, with exceptions noted ([GH 41485](https://github.com/pandas-dev/pandas/issues/41485)):
  + `concat()` (other than `objs`)
  + `read_csv()` (other than `filepath_or_buffer`)
  + `read_table()` (other than `filepath_or_buffer`)
  + `DataFrame.clip()` and `Series.clip()` (other than `upper` and `lower`)
  + `DataFrame.drop_duplicates()` (other than `subset`), `Series.drop_duplicates()`, `Index.drop_duplicates()`, and `MultiIndex.drop_duplicates()`
  + `DataFrame.drop()` (other than `labels`) and `Series.drop()`
  + `DataFrame.dropna()` and `Series.dropna()`
  + `DataFrame.ffill()`, `Series.ffill()`, `DataFrame.bfill()`, and `Series.bfill()`
  + `DataFrame.fillna()` and `Series.fillna()` (other than `value`)
  + `DataFrame.interpolate()` and `Series.interpolate()` (other than `method`)
  + `DataFrame.mask()` and `Series.mask()` (other than `cond` and `other`)
  + `DataFrame.reset_index()` (other than `level`) and `Series.reset_index()`
  + `DataFrame.set_axis()` and `Series.set_axis()` (other than `labels`)
  + `DataFrame.set_index()` (other than `keys`)
  + `DataFrame.sort_index()` and `Series.sort_index()`
  + `DataFrame.sort_values()` (other than `by`) and `Series.sort_values()`
  + `DataFrame.where()` and `Series.where()` (other than `cond` and `other`)
  + `Index.set_names()` and `MultiIndex.set_names()` (other than `names`)
  + `MultiIndex.set_codes()` (other than `codes`)
  + `MultiIndex.set_levels()` (other than `levels`)
  + `Resampler.interpolate()` (other than `method`)
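Several of the items above name a replacement spelling. The sketch below exercises three of them, assuming pandas 1.3 or later: string values for `inclusive` in `Series.between()`, `on_bad_lines` in `read_csv()`, and comparing a `Timestamp` against a `datetime.date` by converting the date first. The CSV content and dates are made-up sample data.

```py
import datetime
import io

import pandas as pd

# String values for `inclusive` replace the deprecated booleans.
s = pd.Series([1, 2, 3, 4, 5])
both = s.between(2, 4, inclusive="both")        # 2 <= x <= 4
neither = s.between(2, 4, inclusive="neither")  # 2 < x < 4

# `on_bad_lines` replaces `error_bad_lines` / `warn_bad_lines`;
# "skip" drops the row that has too many fields.
data = io.StringIO("a,b\n1,2\n3,4,5\n6,7\n")
df = pd.read_csv(data, on_bad_lines="skip")

# Convert the date explicitly instead of comparing Timestamp to date.
ts = pd.Timestamp("2021-01-01 12:00")
mydate = datetime.date(2021, 1, 1)
after_or_on = ts >= pd.Timestamp(mydate)
same_day = ts.date() <= mydate

print(both.tolist(), neither.tolist())
print(df)
print(after_or_on, same_day)
```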
When reducing a `DataFrame` with `numeric_only=None` (the default) (e.g. `.min`, `.max`, `.sum`), columns that raise a `TypeError` are silently ignored if reduced and removed from the results.
This behavior is deprecated. In a future version, `TypeError` will be raised and the user will need to select only valid columns before calling the function.
For example:
```py
In [53]: df = pd.DataFrame({"A": [1, 2, 3, 4], "B": pd.date_range("2016-01-01", periods=4)})
In [54]: df
Out[54]:
A B
0 1 2016-01-01
1 2 2016-01-02
2 3 2016-01-03
3 4 2016-01-04
Old Behavior:
In [3]: df.prod()
Out[3]:
Out[3]:
A 24
dtype: int64
Future Behavior:
In [4]: df.prod()
...
TypeError: 'DatetimeArray' does not implement reduction 'prod'
In [5]: df[["A"]].prod()
Out[5]:
A 24
dtype: int64
Similarly, when applying a function to DataFrameGroupBy
, columns where the function raises a TypeError
are now silently ignored and removed from the result.
This behavior is deprecated. In a future version, TypeError
will be raised and the user will need to select only valid columns before calling the function.
For example:
In [55]: df = pd.DataFrame({"A": [1, 2, 3, 4], "B": pd.date_range("2016-01-01", periods=4)})
In [56]: gb = df.groupby([1, 1, 2, 2])
Old Behavior:
In [4]: gb.prod(numeric_only=False)
Out[4]:
A
1 2
2 12
Future Behavior:
In [5]: gb.prod(numeric_only=False)
...
TypeError: datetime64 type does not support prod operations
In [6]: gb[["A"]].prod(numeric_only=False)
Out[6]:
A
1 2
2 12
Other deprecations
- Deprecated allowing scalars to be passed to the `Categorical` constructor (GH 38433)
- Deprecated constructing a `CategoricalIndex` without passing list-like data (GH 38944)
- Deprecated allowing subclass-specific keyword arguments in the `Index` constructor; use the specific subclass directly instead (GH 14093, GH 21311, GH 22315, GH 26974)
- Deprecated using the `astype()` method to convert datetimelike data (`timedelta64[ns]`, `datetime64[ns]`, `Datetime64TZDtype`, `PeriodDtype`) to integer dtypes; use `values.view(...)` instead (GH 38544). This deprecation was reversed in pandas 1.4.0.
- Deprecated `MultiIndex.is_lexsorted()` and `MultiIndex.lexsort_depth()`; use `MultiIndex.is_monotonic_increasing()` instead (GH 32259)
- Deprecated the keyword `try_cast` in `Series.where()`, `Series.mask()`, `DataFrame.where()` and `DataFrame.mask()`; cast results manually if needed (GH 38836)
- Deprecated comparison of `Timestamp` objects with `datetime.date` objects. For example, instead of `ts <= mydate`, use `ts <= pd.Timestamp(mydate)` or `ts.date() <= mydate` (GH 36131)
- Deprecated `Rolling.win_type` returning `"freq"` (GH 38963)
- Deprecated `Rolling.is_datetimelike` (GH 38963)
- Deprecated using a `DataFrame` as the indexer in `Series.__setitem__()` and `DataFrame.__setitem__()` (GH 39004)
- Deprecated `ExponentialMovingWindow.vol()` (GH 39220)
- Using `.astype` to convert between `datetime64[ns]` and `DatetimeTZDtype` is deprecated and will raise in a future version; use `obj.tz_localize` or `obj.dt.tz_localize` instead (GH 38622)
- Deprecated casting `datetime.date` objects to `datetime64` when they are used as `fill_value` in `DataFrame.unstack()`, `DataFrame.shift()`, `Series.shift()` and `DataFrame.reindex()`; pass `pd.Timestamp(dateobj)` instead (GH 39767)
- Deprecated `Styler.set_na_rep()` and `Styler.set_precision()` in favor of `Styler.format()`, with `na_rep` and `precision` as existing and new input parameters respectively (GH 40134, GH 40425)
- Deprecated `Styler.where()` in favor of an alternative formulation with `Styler.applymap()` (GH 40821)
- Deprecated allowing partial failure in `Series.transform()` and `DataFrame.transform()` when `func` is list-like or dict-like and raises anything other than `TypeError`; a `func` raising anything other than `TypeError` will raise in a future version (GH 40211)
- Deprecated the `error_bad_lines` and `warn_bad_lines` parameters of `read_csv()` and `read_table()` in favor of the `on_bad_lines` parameter (GH 15122)
- Deprecated support for `np.ma.mrecords.MaskedRecords` in the `DataFrame` constructor; pass `{name: data[name] for name in data.dtype.names}` instead (GH 40363)
- Deprecated using `merge()`, `DataFrame.merge()` and `DataFrame.join()` on a different number of levels (GH 34862)
- Deprecated the use of `**kwargs` in `ExcelWriter`; use the keyword argument `engine_kwargs` instead (GH 40430)
- Deprecated the `level` keyword in `DataFrame` and `Series` aggregations; use groupby instead (GH 39983)
- Deprecated the `inplace` parameter of `Categorical.remove_categories()`, `Categorical.add_categories()`, `Categorical.reorder_categories()`, `Categorical.rename_categories()` and `Categorical.set_categories()`; it will be removed in a future version (GH 37643)
- Deprecated `merge()` producing duplicated columns through the `suffixes` keyword and already existing columns (GH 22818)
- Deprecated setting `Categorical._codes`; create a new `Categorical` with the desired codes instead (GH 40606)
- Deprecated the optional `convert_float` parameter of `read_excel()` and `ExcelFile.parse()` (GH 41127)
- Deprecated the behavior of `DatetimeIndex.union()` with mixed time zones; in a future version both will be cast to UTC instead of object dtype (GH 39328)
- Deprecated using `usecols` with out-of-bounds indices in `read_csv()` with `engine="c"` (GH 25623)
- Deprecated special handling of lists whose first element is a `Categorical` in the `DataFrame` constructor; pass `pd.DataFrame({col: categorical, ...})` instead (GH 38845)
- Deprecated the behavior of the `DataFrame` constructor when a `dtype` is passed and the data cannot be cast to that dtype. In a future version, this will raise instead of being silently ignored (GH 24435)
- Deprecated the `Timestamp.freq` attribute. For the properties that use it (`is_month_start`, `is_month_end`, `is_quarter_start`, `is_quarter_end`, `is_year_start`, `is_year_end`), when you have a `freq`, use e.g. `freq.is_month_start(ts)` (GH 15146)
- Deprecated constructing a `Series` or `DataFrame` with `DatetimeTZDtype` data and `datetime64[ns]` dtype; use `Series(data).dt.tz_localize(None)` instead (GH 41555, GH 33401)
- Deprecated the behavior of `Series` construction with large integer values and a small integer dtype silently overflowing; use `Series(data).astype(dtype)` instead (GH 41734)
- Deprecated the behavior of `DataFrame` construction with floating-point data and an integer dtype casting even when lossy; in a future version this will remain floating point, matching the behavior of `Series` (GH 41770)
- Deprecated inference of `timedelta64[ns]`, `datetime64[ns]` or `DatetimeTZDtype` dtypes in `Series` construction when data containing strings is passed and no `dtype` is passed (GH 33558)
- In a future version, constructing a `Series` or `DataFrame` with `datetime64[ns]` data and `DatetimeTZDtype` will treat the data as wall times instead of UTC times (matching `DatetimeIndex` behavior). To treat the data as UTC times, use `pd.Series(data).dt.tz_localize("UTC").dt.tz_convert(dtype.tz)` or `pd.Series(data.view("int64"), dtype=dtype)` (GH 33401)
- Deprecated passing lists as `key` to `DataFrame.xs()` and `Series.xs()` (GH 41760)
- Deprecated boolean arguments for `inclusive` in `Series.between()`; use one of `{"left", "right", "neither", "both"}` as the standard argument value instead (GH 40628)
- Deprecated passing arguments as positional for all of the following, with exceptions noted (GH 41485):
  - `concat()` (other than `objs`)
  - `read_csv()` (other than `filepath_or_buffer`)
  - `read_table()` (other than `filepath_or_buffer`)
  - `DataFrame.clip()` and `Series.clip()` (other than `upper` and `lower`)
  - `DataFrame.drop_duplicates()` (other than `subset`), `Series.drop_duplicates()`, `Index.drop_duplicates()` and `MultiIndex.drop_duplicates()`
  - `DataFrame.drop()` (other than `labels`) and `Series.drop()`
  - `DataFrame.dropna()` and `Series.dropna()`
  - `DataFrame.ffill()`, `Series.ffill()`, `DataFrame.bfill()` and `Series.bfill()`
  - `DataFrame.fillna()` and `Series.fillna()` (other than `value`)
  - `DataFrame.interpolate()` and `Series.interpolate()` (other than `method`)
  - `DataFrame.mask()` and `Series.mask()` (other than `cond` and `other`)
  - `DataFrame.reset_index()` (other than `level`) and `Series.reset_index()`
  - `DataFrame.set_axis()` and `Series.set_axis()` (other than `labels`)
  - `DataFrame.set_index()` (other than `keys`)
  - `DataFrame.sort_index()` and `Series.sort_index()`
  - `DataFrame.sort_values()` (other than `by`) and `Series.sort_values()`
  - `DataFrame.where()` and `Series.where()` (other than `cond` and `other`)
  - `Index.set_names()` and `MultiIndex.set_names()` (other than `names`)
  - `MultiIndex.set_codes()` (other than `codes`)
  - `MultiIndex.set_levels()` (other than `levels`)
  - `Resampler.interpolate()` (other than `method`)
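
Several items above name a replacement API. The sketch below exercises a few of those replacements on small, made-up inputs. It assumes pandas 1.3 or later (the release that added `on_bad_lines`) and is illustrative rather than exhaustive:

```python
import datetime
import io

import pandas as pd

# GH 36131: compare a Timestamp with a datetime.date explicitly.
ts = pd.Timestamp("2021-03-15 12:00")
mydate = datetime.date(2021, 3, 15)
as_timestamp = ts <= pd.Timestamp(mydate)  # mydate becomes midnight, so False
as_date = ts.date() <= mydate              # compares calendar days, so True

# GH 40628: pass a string, not a boolean, as ``inclusive``.
s = pd.Series([1, 2, 3, 4])
left_closed = s.between(1, 3, inclusive="left")  # 1 <= x < 3

# GH 39983: use groupby(level=...) instead of the ``level`` keyword.
mi = pd.MultiIndex.from_tuples([("a", 1), ("a", 2), ("b", 1)])
per_level = pd.Series([10, 20, 30], index=mi).groupby(level=0).sum()

# GH 32259: is_monotonic_increasing instead of is_lexsorted().
mi_sorted = mi.is_monotonic_increasing

# GH 37643: reassign the result instead of passing inplace=True.
cat = pd.Categorical(["x", "y"])
cat = cat.add_categories(["z"])

# GH 15122: on_bad_lines replaces error_bad_lines / warn_bad_lines.
csv_text = "a,b\n1,2\n3,4,5\n6,7\n"
parsed = pd.read_csv(io.StringIO(csv_text), on_bad_lines="skip")  # drops "3,4,5"

# GH 41485: pass optional arguments by keyword.
dropped = pd.DataFrame({"a": [1.0, None]}).dropna(axis=0, how="any")

print(left_closed.tolist())  # [True, True, False, False]
print(parsed["a"].tolist())  # [1, 6]
```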
Performance improvements
- Performance improvement in `IntervalIndex.isin()` (GH 38353)
- Performance improvement in `Series.mean()` for nullable data types (GH 34814)
- Performance improvement in `Series.isin()` for nullable data types (GH 38340)
- Performance improvement in `DataFrame.fillna()` with `method="pad"` or `method="backfill"` for nullable floating and nullable integer data types (GH 39953)
- Performance improvement in `DataFrame.corr()` for `method=kendall` (GH 28329)
- Performance improvement in `DataFrame.corr()` for `method=spearman` (GH 40956, GH 41885)
- Performance improvement in `Rolling.corr()` and `Rolling.cov()` (GH 39388)
- Performance improvement in `RollingGroupby.corr()`, `ExpandingGroupby.corr()` and `ExpandingGroupby.cov()` (GH 39591)
- Performance improvement in `unique()` for object data types (GH 37615)
- Performance improvement in `json_normalize()` for basic cases (including separators) (GH 40035, GH 15621)
- Performance improvement in `ExpandingGroupby` aggregation methods (GH 39664)
- Performance improvement in `Styler`: rendering time is reduced by more than 50% and now matches `DataFrame.to_html()` (GH 39972, GH 39952, GH 40425)
- The method `Styler.set_td_classes()` is now as efficient as `Styler.apply()` and `Styler.applymap()`, and in some cases even more efficient (GH 40453)
- Performance improvement in `ExponentialMovingWindow.mean()` with `times` (GH 39784)
- Performance improvement in `DataFrameGroupBy.apply()` and `SeriesGroupBy.apply()` when the Python fallback implementation is required (GH 40176)
- Performance improvement in converting PyArrow boolean arrays to pandas nullable boolean arrays (GH 41051)
- Performance improvement in concatenating data with type `CategoricalDtype` (GH 40193)
- Performance improvement in `DataFrameGroupBy.cummin()`, `SeriesGroupBy.cummin()`, `DataFrameGroupBy.cummax()` and `SeriesGroupBy.cummax()` with nullable data types (GH 37493)
- Performance improvement in `Series.nunique()` with NaN values (GH 40865)
- Performance improvement in `DataFrame.transpose()` and `Series.unstack()` with `DatetimeTZDtype` (GH 40149)
- Performance improvement in `Series.plot()` and `DataFrame.plot()` through lazy loading of entry points (GH 41492)
Bug fixes
Categorical
- Bug in `CategoricalIndex` incorrectly failing to raise `TypeError` when scalar data is passed (GH 38614)
- Bug in `CategoricalIndex.reindex` failing when the `Index` passed was not categorical but all of its values were labels in the categories (GH 28690)
- Bug where constructing a `Categorical` from an object-dtype array of `date` objects did not round-trip correctly with `astype` (GH 38552)
- Bug in constructing a `DataFrame` from an `ndarray` and a `CategoricalDtype` (GH 38857)
- Bug in setting categorical values into an object-dtype column in a `DataFrame` (GH 39136)
- Bug in `DataFrame.reindex()` raising an `IndexError` when the new index contained duplicates and the old index was a `CategoricalIndex` (GH 38906)
- Bug in `Categorical.fillna()` with a tuple-like category raising `NotImplementedError` instead of `ValueError` when filling with a non-category tuple (GH 41914)
Datetimelike
- Bug in the `DataFrame` and `Series` constructors sometimes dropping nanoseconds from `Timestamp` (resp. `Timedelta`) data with `dtype=datetime64[ns]` (resp. `timedelta64[ns]`) (GH 38032)
- Bug in `DataFrame.first()` and `Series.first()` with a one-month offset returning an incorrect result when the first day is the last day of a month (GH 29623)
- Bug in constructing a `DataFrame` or `Series` with mismatched `datetime64` data and `timedelta64` dtype, or vice versa, failing to raise a `TypeError` (GH 38575, GH 38764, GH 38792)
- Bug in constructing a `Series` or `DataFrame` with a `datetime` object out of bounds for the `datetime64[ns]` dtype or a `timedelta` object out of bounds for the `timedelta64[ns]` dtype (GH 38792, GH 38965)
- Bug in `DatetimeIndex.intersection()`, `DatetimeIndex.symmetric_difference()`, `PeriodIndex.intersection()` and `PeriodIndex.symmetric_difference()` always returning object dtype when operating with a `CategoricalIndex` (GH 38741)
- Bug in `DatetimeIndex.intersection()` giving incorrect results with non-Tick frequencies with `n != 1` (GH 42104)
- Bug in `Series.where()` incorrectly casting `datetime64` values to `int64` (GH 37682)
- Bug in `Categorical` incorrectly typecasting `datetime` objects to `Timestamp` (GH 38878)
- Bug in comparisons between `Timestamp` objects and `datetime64` objects just outside the implementation bounds of nanosecond `datetime64` (GH 39221)
- Bug in `Timestamp.round()`, `Timestamp.floor()` and `Timestamp.ceil()` for values near the implementation bounds of `Timestamp` (GH 39244)
- Bug in `Timedelta.round()`, `Timedelta.floor()` and `Timedelta.ceil()` for values near the implementation bounds of `Timedelta` (GH 38964)
- Bug in `date_range()` incorrectly creating a `DatetimeIndex` containing `NaT` instead of raising `OutOfBoundsDatetime` in corner cases (GH 24124)
- Bug in `infer_freq()` failing to infer the `'H'` frequency of a `DatetimeIndex` that has a time zone and crosses a daylight saving time boundary (GH 39556)
- Bug in a `Series` backed by a `DatetimeArray` or `TimedeltaArray` sometimes failing to set the array's `freq` to `None` (GH 41425)
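
The constructor fix (GH 38032) and the `Series.where()` fix (GH 37682) are easy to spot-check. This is a minimal sketch, assuming pandas 1.3 or later. The timestamps are made-up sample values chosen so that a dropped nanosecond or an integer cast would be visible:

```python
import pandas as pd

# One nanosecond past midnight, and a one-nanosecond duration.
ts = pd.Timestamp("2021-01-01 00:00:00.000000001")
td = pd.Timedelta(1, unit="ns")

# GH 38032: the nanosecond component survives construction.
s_dt = pd.Series([ts], dtype="datetime64[ns]")
s_td = pd.Series([td], dtype="timedelta64[ns]")
df_dt = pd.DataFrame({"t": [ts]}, dtype="datetime64[ns]")

print(s_dt.iloc[0].nanosecond)  # 1
print(s_td.iloc[0] == td)       # True

# GH 37682: where() keeps datetime64 values instead of casting to int64.
dates = pd.Series(pd.date_range("2021-01-01", periods=3))
kept = dates.where(dates > dates.iloc[0])  # the first value becomes NaT
print(kept.dtype.kind)  # M
```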
Timedelta
- Bug in constructing a `Timedelta` from `np.timedelta64` objects with non-nanosecond units that are out of bounds for `timedelta64[ns]` (GH 38965)
- Bug in constructing a `TimedeltaIndex` incorrectly accepting `np.datetime64("NaT")` objects (GH 39462)
- Bug in constructing a `Timedelta` from an input string containing only symbols and no digits failing to raise an error (GH 39710)
- Bug in `TimedeltaIndex` and `to_timedelta()` failing to raise when passed non-nanosecond `timedelta64` arrays that overflow when converting to `timedelta64[ns]` (GH 40008)
Timezones
- Bug in different `tzinfo` objects representing UTC not being treated as equivalent (GH 39216)
- Bug in `dateutil.tz.gettz("UTC")` not being recognized as equivalent to other `tzinfo` objects representing UTC (GH 39276)
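
The practical effect of treating different UTC objects as equivalent is that data labelled with either one combines without falling back to object dtype. This is a minimal sketch, assuming pandas 1.3 or later. The choice of the string `"UTC"` and `datetime.timezone.utc` as the two representations is ours:

```python
import datetime

import pandas as pd

# The same instants, labelled with two different UTC tzinfo representations.
a = pd.Series(pd.date_range("2021-01-01", periods=2, tz="UTC"))
b = pd.Series(pd.date_range("2021-01-01", periods=2, tz=datetime.timezone.utc))

# With the UTC objects treated as equivalent, concatenation keeps a
# timezone-aware datetime dtype rather than falling back to object.
combined = pd.concat([a, b], ignore_index=True)
print(isinstance(combined.dtype, pd.DatetimeTZDtype))  # True
print(bool((a == b).all()))                             # True
```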
Numeric
- Bug in `DataFrame.quantile()` and `DataFrame.sort_values()` causing incorrect subsequent indexing behavior (GH 38351)
- Bug in `DataFrame.sort_values()` raising an `IndexError` for empty `by` (GH 40258)
- Bug in `DataFrame.select_dtypes()` with `include=np.number` dropping numeric `ExtensionDtype` columns (GH 35340)
- Bug in `DataFrame.mode()` and `Series.mode()` not keeping a consistent integer `Index` for empty input (GH 33321)
- Bug in `DataFrame.rank()` when the `DataFrame` contained `np.inf` (GH 32593)
- Bug in `DataFrame.rank()` with `axis=0` and columns holding incomparable types raising an `IndexError` (GH 38932)
- Bug in `Series.rank()`, `DataFrame.rank()`, `DataFrameGroupBy.rank()` and `SeriesGroupBy.rank()` treating the most negative `int64` value as missing (GH 32859)
- Bug in `DataFrame.select_dtypes()` behaving differently on Windows and Linux with `include="int"` (GH 36596)
- Bug in `DataFrame.apply()` and `DataFrame.agg()` where passing `func="size"` operated on the entire `DataFrame` instead of on rows or columns (GH 39934)
- Bug in `DataFrame.transform()` raising a `SpecificationError` when passed a dictionary and columns were missing; it now raises a `KeyError` instead (GH 40004)
- Bug in `DataFrameGroupBy.rank()` and `SeriesGroupBy.rank()` giving incorrect results with `pct=True` and equal values between consecutive groups (GH 40518)
- Bug in `Series.count()` returning an `int32` result on 32-bit platforms when `level=None` (GH 40908)
- Bug in `Series` and `DataFrame` reductions with the methods `any` and `all` not returning boolean results for object data (GH 12863, GH 35450, GH 27709)
- Bug in `Series.clip()` failing if the `Series` contains NA values and has a nullable int or float dtype (GH 40851)
- Bug in `UInt64Index.where()` and `UInt64Index.putmask()` incorrectly raising `TypeError` when `other` was of type `np.int64` (GH 41974)
- Bug in `DataFrame.agg()` not sorting the aggregated axis in the order of the provided aggregation functions when one or more aggregation functions failed to produce results (GH 33634)
- Bug in `DataFrame.clip()` not interpreting missing values as no threshold (GH 40420)
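
Several numeric fixes above describe concrete input/output behavior. This is a minimal sketch of four of them, assuming pandas 1.3 or later. The sample values are made up to hit each case:

```python
import numpy as np
import pandas as pd

# GH 32593: infinities are ranked like any other value.
inf_ranks = pd.DataFrame({"a": [np.inf, 1.0, -np.inf, 2.0]}).rank()["a"]
print(inf_ranks.tolist())  # [4.0, 2.0, 1.0, 3.0]

# GH 32859: the smallest int64 is a real value, not a missing one.
int_ranks = pd.Series([np.iinfo(np.int64).min, 0, 1]).rank()
print(int_ranks.tolist())  # [1.0, 2.0, 3.0]

# GH 35340: include=np.number keeps nullable numeric extension columns.
mixed = pd.DataFrame({"n": pd.array([1, 2], dtype="Int64"), "s": ["x", "y"]})
numeric_cols = list(mixed.select_dtypes(include=np.number).columns)
print(numeric_cols)  # ['n']

# GH 40851: clip() works on a nullable dtype that contains NA.
clipped = pd.Series([1, None, 5], dtype="Int64").clip(2, 4)
print(clipped.isna().tolist())  # [False, True, False]
```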
Conversion
- Bug in `Series.to_dict()` with `orient='records'`; it now returns Python native types (GH 25969)
- Bug in `Series.view()` and `Index.view()` when converting between datetime-like dtypes (`datetime64[ns]`, `datetime64[ns, tz]`, `timedelta64`, `period`) (GH 39788)
- Bug in creating a `DataFrame` from an empty `np.recarray` not retaining the original dtypes (GH 40121)
- Bug in `DataFrame` failing to raise a `TypeError` when constructed from a `frozenset` (GH 40163)
- Bug in `Index` construction silently ignoring a passed `dtype` when the data cannot be cast to that dtype (GH 21311)
- Bug in `StringArray.astype()` falling back to NumPy and raising when converting to `dtype='categorical'` (GH 40450)
- Bug in `factorize()` where, when given an array with a numeric NumPy dtype lower than int64, uint64 and float64, the unique values did not keep their original dtype (GH 41132)
- Bug in `DataFrame` construction with a dictionary containing an array-like with `ExtensionDtype` and `copy=True` failing to make a copy (GH 38939)
- Bug in `qcut()` raising an error when taking `Float64DType` as input (GH 40730)
- Bug in `DataFrame` and `Series` construction with `datetime64[ns]` data and `dtype=object` resulting in `datetime` objects instead of `Timestamp` objects (GH 41599)
- Bug in `DataFrame` and `Series` construction with `timedelta64[ns]` data and `dtype=object` resulting in `np.timedelta64` objects instead of `Timedelta` objects (GH 41599)
- Bug in `DataFrame` construction when given a two-dimensional object-dtype `np.ndarray` of `Period` or `Interval` objects failing to cast to `PeriodDtype` or `IntervalDtype`, respectively (GH 41812)
- Bug in constructing a `Series` from a list and a `PandasDtype` (GH 39357)
- Bug in creating a `Series` from a `range` object that does not fit in the bounds of the `int64` dtype (GH 30173)
- Bug in creating a `Series` from a `dict` with all-tuple keys and an `Index` that requires reindexing (GH 41707)
- Bug in `infer_dtype()` not recognizing a `Series`, `Index` or array with a `Period` dtype (GH 23553)
- Bug in `infer_dtype()` raising an error for general `ExtensionArray` objects; it now returns `"unknown-array"` instead (GH 37367)
- Bug in `DataFrame.convert_dtypes()` incorrectly raising a `ValueError` when called on an empty `DataFrame` (GH 40393)
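
Three of the conversion fixes can be observed on tiny inputs. This is a minimal sketch, assuming pandas 1.3 or later. The dates and durations are made-up sample values:

```python
import numpy as np
import pandas as pd

# GH 41599: with dtype=object, datetimelike values are boxed as pandas scalars.
dt_obj = pd.Series(np.array(["2021-01-01"], dtype="datetime64[ns]"), dtype=object)
td_obj = pd.Series(np.array([1], dtype="timedelta64[ns]"), dtype=object)
print(type(dt_obj.iloc[0]).__name__)  # Timestamp
print(type(td_obj.iloc[0]).__name__)  # Timedelta

# GH 23553: infer_dtype recognizes a Series with a Period dtype.
periods = pd.Series(pd.period_range("2021-01-01", periods=2, freq="D"))
inferred = pd.api.types.infer_dtype(periods)
print(inferred)  # period

# GH 40393: convert_dtypes() on an empty DataFrame no longer raises.
empty = pd.DataFrame().convert_dtypes()
print(empty.shape)  # (0, 0)
```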
Strings
- Bug in the conversion from `pyarrow.ChunkedArray` to `StringArray` when the original had zero chunks (GH 41040)
- Bug in `Series.replace()` and `DataFrame.replace()` ignoring replacements with `regex=True` for `StringDType` data (GH 41333, GH 35977)
- Bug in `Series.str.extract()` with `StringArray` returning object dtype for an empty `DataFrame` (GH 41441)
- Bug in `Series.str.replace()` where the `case` argument was ignored when `regex=False` (GH 41602)
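
The string fixes can be checked on small sample data (made up here). This is a minimal sketch, assuming pandas 1.3 or later:

```python
import pandas as pd

# GH 41602: case=False is honored when regex=False.
fruit = pd.Series(["Apple", "apple"])
replaced = fruit.str.replace("apple", "pear", case=False, regex=False)
print(replaced.tolist())  # ['pear', 'pear']

# GH 41333: regex replacement applies to StringDtype data.
strings = pd.Series(["abc", "xyz"], dtype="string")
swapped = strings.replace(r"^a", "z", regex=True)
print(swapped.tolist())  # ['zbc', 'xyz']

# GH 41441: str.extract on a StringArray produces "string" columns.
parts = strings.str.extract(r"([a-z])([a-z]+)")
print(parts.iloc[0].tolist())  # ['a', 'bc']
```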
Interval
- Bug in `IntervalIndex.intersection()` and `IntervalIndex.symmetric_difference()` always returning object dtype when operating with a `CategoricalIndex` (GH 38653, GH 38741)
- Bug in `IntervalIndex.intersection()` returning duplicates when at least one of the `Index` objects has duplicates that are present in the other (GH 38743)
- `IntervalIndex.union()`, `IntervalIndex.intersection()`, `IntervalIndex.difference()` and `IntervalIndex.symmetric_difference()` now cast to the appropriate dtype instead of raising a `TypeError` when operating with another `IntervalIndex` of an incompatible dtype (GH 39267)
- `PeriodIndex.union()`, `PeriodIndex.intersection()`, `PeriodIndex.symmetric_difference()` and `PeriodIndex.difference()` now cast to object dtype instead of raising `IncompatibleFrequency` when operating with another `PeriodIndex` of an incompatible dtype (GH 39306)
- Bug in `IntervalIndex.is_monotonic()`, `IntervalIndex.get_loc()`, `IntervalIndex.get_indexer_for()` and `IntervalIndex.__contains__()` when NA values are present (GH 41831)
Indexing
- Bug in `Index.union()` and `MultiIndex.union()` dropping duplicate `Index` values when the `Index` was not monotonic or `sort` was set to `False` (GH 36289, GH 31326, GH 40862)
- Bug in `CategoricalIndex.get_indexer()` failing to raise `InvalidIndexError` when non-unique (GH 38372)
- Bug in `IntervalIndex.get_indexer()` when `target` has `CategoricalDtype` and both the index and the target contain NA values (GH 41934)
- Bug in `Series.loc()` raising a `ValueError` when the input was filtered with a boolean list and the values to set were a list with a lower dimension (GH 20438)
- Bug in inserting many new columns into a `DataFrame` causing incorrect subsequent indexing behavior (GH 38380)
- Bug in `DataFrame.__setitem__()` raising a `ValueError` when setting multiple values to duplicate columns (GH 15695)
- Bug in `DataFrame.loc()`, `Series.loc()`, `DataFrame.__getitem__()` and `Series.__getitem__()` returning incorrect elements for string slices on a non-monotonic `DatetimeIndex` (GH 33146)
- Bug in `DataFrame.reindex()` and `Series.reindex()` raising `TypeError` (GH 38566)
- Bug in `DataFrame.reindex()` incorrectly casting `datetime64[ns]` or `timedelta64[ns]` values to integers when the `fill_value` requires casting to object dtype (GH 39755)
- Bug in `DataFrame.__setitem__()` raising a `ValueError` when setting on an empty `DataFrame` using specified columns and a non-empty `DataFrame` value (GH 38831)
- Bug in `DataFrame.loc.__setitem__()` raising a `ValueError` when operating on a unique column of a `DataFrame` with duplicate columns (GH 38521)
- Bug in `DataFrame.iloc.__setitem__()` and `DataFrame.loc.__setitem__()` with mixed dtypes when setting with a dictionary value (GH 38335)
- Bug in `Series.loc.__setitem__()` and `DataFrame.loc.__setitem__()` raising `KeyError` when provided a boolean generator (GH 39614)
- Bug in `Series.iloc()` and `DataFrame.iloc()` raising a `KeyError` when provided a generator (GH 39614)
- Bug in `DataFrame.__setitem__()` not raising a `ValueError` when the right-hand side is a `DataFrame` with the wrong number of columns (GH 38604)
- Bug in `Series.__setitem__()` raising a `ValueError` when setting a `Series` with a scalar indexer (GH 38303)
- Bug in `DataFrame.loc()` dropping levels of a `MultiIndex` when the `DataFrame` used as input has only one row (GH 10521)
- Bug in `DataFrame.__getitem__()` and `Series.__getitem__()` always raising `KeyError` when slicing with existing strings where the `Index` has milliseconds (GH 33589)
- Bug in setting `timedelta64` or `datetime64` values into a numeric `Series` failing to cast to object dtype (GH 39086, GH 39619)
- Bug in setting `Interval` values into a `Series` or `DataFrame` with a mismatched `IntervalDtype` incorrectly casting the new values to the existing dtype (GH 39120)
- Bug in setting `datetime64` values into a `Series` with integer dtype incorrectly casting the `datetime64` values to integers (GH 39266)
- Bug in setting `np.datetime64("NaT")` into a `Series` with `Datetime64TZDtype` incorrectly treating the timezone-naive value as timezone-aware (GH 39769)
- Bug in `Index.get_loc()` not raising `KeyError` when `key=NaN` and `method` is specified but `NaN` is not in the `Index` (GH 39382)
- Bug in `DatetimeIndex.insert()` incorrectly treating the timezone-naive value as timezone-aware when inserting `np.datetime64("NaT")` into a timezone-aware index (GH 39769)
- Bug in `Index.insert()` incorrectly raising when setting a new column that cannot be held in the existing `frame.columns`, and in `Series.reset_index()` or `DataFrame.reset_index()`, instead of casting to a compatible dtype (GH 39068)
- Bug in `RangeIndex.append()` where a single object of length 1 was concatenated incorrectly (GH 39401)
- Bug in `RangeIndex.astype()` where, when converting to `CategoricalIndex`, the categories became an `Int64Index` instead of a `RangeIndex` (GH 41263)
- Bug in setting `numpy.timedelta64` values into an object-dtype `Series` using a boolean indexer (GH 39488)
- Bug in setting numeric values into a boolean-dtype `Series` using `at` or `iat` failing to cast to object dtype (GH 39582)
- Bug in `DataFrame.__setitem__()` and `DataFrame.iloc.__setitem__()` raising `ValueError` when trying to index with a row slice and setting a list as values (GH 40440)
- Bug in `DataFrame.loc()` not raising `KeyError` when the key was not found in a `MultiIndex` and the levels were not fully specified (GH 41170)
- Bug in `DataFrame.loc.__setitem__()` when setting with enlargement incorrectly raising when the index of the enlarged axis contained duplicates (GH 40096)
- Bug in `DataFrame.loc.__getitem__()` with a `MultiIndex` casting to float when at least one index column has a float dtype and a scalar is retrieved (GH 41369)
- Bug in `DataFrame.loc()` incorrectly matching non-boolean index elements (GH 20432)
- Bug in indexing with `np.nan` on a `Series` or `DataFrame` with a `CategoricalIndex` incorrectly raising `KeyError` when `np.nan` keys are present (GH 41933)
- Bug in `Series.__delitem__()` with `ExtensionDtype` incorrectly casting to `ndarray` (GH 40386)
- Bug in `DataFrame.at()` with a `CategoricalIndex` returning incorrect results when passed integer keys (GH 41846)
- Bug in `DataFrame.loc()` returning a `MultiIndex` in the wrong order if an indexer has duplicates (GH 40978)
- Bug in `DataFrame.__setitem__()` raising a `TypeError` when using a `str` subclass as the column name with a `DatetimeIndex` (GH 37366)
- Bug in `PeriodIndex.get_loc()` failing to raise a `KeyError` when given a `Period` with a mismatched `freq` (GH 41670)
- Bug in `.loc.__getitem__` with a `UInt64Index` and negative integer keys raising `OverflowError` instead of `KeyError` in some cases and wrapping around to positive integers in others (GH 41777)
- Bug in `Index.get_indexer()` failing to raise `ValueError` in some cases with invalid `method`, `limit` or `tolerance` arguments (GH 41918)
- Bug when slicing a `Series` or `DataFrame` with an invalid string raising `ValueError` instead of `TypeError` (GH 41821)
- Bug in the `Index` constructor sometimes silently ignoring a specified `dtype` (GH 38879)
- The behavior of `Index.where()` now mirrors `Index.putmask()`, i.e. `index.where(mask, other)` matches `index.putmask(~mask, other)` (GH 39412)
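
The last item states an equivalence that can be checked directly. `where` keeps values where the mask is `True`, while `putmask` replaces values where its mask is `True`, so the two masks are complements. This is a minimal sketch, assuming pandas 1.3 or later. The index values and mask are made up:

```python
import numpy as np
import pandas as pd

idx = pd.Index([1, 2, 3, 4])
mask = np.array([True, False, True, False])

kept = idx.where(mask, 0)        # keep where mask is True, else 0
patched = idx.putmask(~mask, 0)  # replace where ~mask is True with 0

print(kept.equals(patched))  # True
print(kept.tolist())         # [1, 0, 3, 0]
```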
Missing
- Bug in `Grouper` not correctly propagating the `dropna` argument; `DataFrameGroupBy.transform()` now correctly handles missing values for `dropna=True` (GH 35612)
- Bug in `isna()`, `Series.isna()`, `Index.isna()`, `DataFrame.isna()` and the corresponding `notna` functions not recognizing `Decimal("NaN")` objects (GH 39409)
- Bug in `DataFrame.fillna()` not accepting a dictionary for the `downcast` keyword (GH 40809)
- Bug in `isna()` not returning a copy of the mask for nullable types, causing any subsequent modification of the mask to change the original array (GH 40935)
- Bug in `DataFrame` construction with floating-point data containing `NaN` and an integer `dtype` casting instead of retaining the `NaN` (GH 26919)
- Bug in `Series.isin()` and `MultiIndex.isin()` not treating all `NaN` values as equivalent when they were in tuples (GH 41836)
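
Two of the missing-value fixes can be checked directly. This is a minimal sketch, assuming pandas 1.3 or later. The values are made up:

```python
import decimal

import pandas as pd

# GH 39409: Decimal("NaN") counts as missing.
print(pd.isna(decimal.Decimal("NaN")))  # True
dec = pd.Series([decimal.Decimal("1.5"), decimal.Decimal("NaN")])
print(dec.isna().tolist())  # [False, True]

# GH 40935: the mask returned for a nullable array is a copy.
arr = pd.array([1, None], dtype="Int64")
mask = arr.isna()
mask[0] = True              # modify the returned mask only
print(arr.isna().tolist())  # [False, True], so the array is unchanged
```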
MultiIndex
- Bug in `DataFrame.drop()` raising a `TypeError` when the `MultiIndex` is non-unique and `level` is not provided (GH 36293)
- Bug in `MultiIndex.intersection()` duplicating `NaN` in the result (GH 38623)
- Bug in `MultiIndex.equals()` incorrectly returning `True` when the `MultiIndex` contains `NaN`, even when the values are in a different order (GH 38439)
- Bug in `MultiIndex.intersection()` always returning an empty result when intersecting with a `CategoricalIndex` (GH 38653)
- Bug in `MultiIndex.difference()` incorrectly raising `TypeError` when the indexes contain non-sortable entries (GH 41915)
- Bug in `MultiIndex.reindex()` raising a `ValueError` when used on an empty `MultiIndex` and indexing only a specific level (GH 41170)
- Bug in `MultiIndex.reindex()` raising a `TypeError` when reindexing against a flat `Index` (GH 41707)
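
The `MultiIndex.equals()` fix (GH 38439) can be checked with two indexes that hold the same labels, including a `NaN`, in a different order. This is a minimal sketch, assuming pandas 1.3 or later. The labels are made up:

```python
import numpy as np
import pandas as pd

mi1 = pd.MultiIndex.from_arrays([[np.nan, 1.0], ["a", "b"]])
mi2 = pd.MultiIndex.from_arrays([[1.0, np.nan], ["b", "a"]])

print(mi1.equals(mi2))         # False: same labels, different order
print(mi1.equals(mi1.copy()))  # True
```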
I/O
- Bug in `Index.__repr__()` when `display.max_seq_items=1` (GH 38415)
- Bug in `read_csv()` not recognizing scientific notation when the `decimal` parameter is set and `engine="python"` (GH 31920)
- Bug in `read_csv()` interpreting an `NA` value as a comment when `NA` contains the comment string; fixed for `engine="python"` (GH 34002)
- Bug in `read_csv()` raising an `IndexError` with multiple header columns and `index_col` specified when the file has no data rows (GH 38292)
- Bug in `read_csv()` not accepting `usecols` with a different length than `names` for `engine="python"` (GH 16469)
- Bug in `read_csv()` returning object dtype when `delimiter=","` with `usecols` and `parse_dates` specified for `engine="python"` (GH 35873)
- Bug in `read_csv()` raising a `TypeError` when `names` and `parse_dates` are specified for `engine="c"` (GH 33699)
- Bug in `read_clipboard()` and `DataFrame.to_clipboard()` not working in WSL (GH 38527)
- Allow custom error values for the `parse_dates` argument of `read_sql()`, `read_sql_query()` and `read_sql_table()` (GH 35185)
- Bug in `DataFrame.to_hdf()` and `Series.to_hdf()` raising a `KeyError` when applied to a subclass of `DataFrame` or `Series` (GH 33748)
- Bug in `HDFStore.put()` raising the wrong `TypeError` when saving a `DataFrame` with a non-string dtype (GH 34274)
- Bug in `json_normalize()` causing the first element of a generator object not to be included in the returned `DataFrame` (GH 35923)
- Bug in `read_csv()` applying the thousands separator to date columns when the column should be parsed as dates and `usecols` is specified for `engine="python"` (GH 39365)
- Bug in `read_excel()` forward-filling `MultiIndex` names when multiple header and index columns are specified (GH 34673)
- Bug in `read_excel()` not respecting `set_option()` (GH 34252)
- Bug in `read_csv()` not switching `true_values` and `false_values` for nullable boolean dtypes (GH 34655)
- Bug in `read_json()` not maintaining a numeric string index when `orient="split"` (GH 28556)
- `read_sql()` returned an empty generator if `chunksize` was non-zero and the query returned no results. It now returns a generator with a single empty `DataFrame` (GH 34411)
- Bug in `read_hdf()` returning unexpected records when filtering on categorical string columns using the `where` parameter (GH 39189)
- Bug in `read_sas()` raising a `ValueError` when `datetimes` were null (GH 39725)
- Bug in `read_excel()` dropping empty values from single-column spreadsheets (GH 39808)
- Bug in `read_excel()` loading trailing empty rows/columns for some file types (GH 41167)
- Bug in `read_excel()` raising an `AttributeError` when the Excel file had a `MultiIndex` header followed by two empty rows and no index (GH 40442)
- Bug in `read_excel()`, `read_csv()`, `read_table()`, `read_fwf()` and `read_clipboard()` where one blank row after a `MultiIndex` header with no index would be dropped (GH 40442)
- Bug in `DataFrame.to_string()` misplacing the truncation column when `index=False` (GH 40904)
- Bug in `DataFrame.to_string()` adding an extra dot and misaligning the truncation row when `index=False` (GH 40904)
- Bug in `read_orc()` always raising an `AttributeError` (GH 40918)
- Bug in `read_csv()` and `read_table()` silently ignoring `prefix` when both `names` and `prefix` were defined; a `ValueError` is now raised (GH 39123)
- Bug in `read_csv()` and `read_excel()` not respecting the dtype of duplicated column names when `mangle_dupe_cols` is set to `True` (GH 35211)
- Bug in `read_csv()` silently ignoring `sep` when both `delimiter` and `sep` were defined; a `ValueError` is now raised (GH 39823)
- Bug in `read_csv()` and `read_table()` misinterpreting arguments when `sys.setprofile` had previously been called (GH 41069)
- Bug in the conversion from PyArrow to pandas (e.g. when reading a Parquet file) with nullable dtypes and a PyArrow array whose data buffer size is not a multiple of the dtype size (GH 40896)
- Bug in `read_excel()` raising an error when pandas could not determine the file type, even though the user specified the `engine` parameter (GH 41225)
- Bug in `read_clipboard()` placing values in the wrong columns when copying from an Excel file with null values in the first column (GH 41108)
- Bug in `DataFrame.to_hdf()` and `Series.to_hdf()` raising a `TypeError` when trying to append a string column to an incompatible column (GH 41897)
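
Two of the `read_csv()` fixes can be reproduced with in-memory text. This is a minimal sketch, assuming pandas 1.3 or later. The CSV contents are made-up sample data:

```python
import io

import pandas as pd

# GH 31920: a comma decimal mark combined with scientific notation.
data = "x;y\n1,5e3;2\n"
df = pd.read_csv(io.StringIO(data), sep=";", decimal=",", engine="python")
print(df["x"].iloc[0])  # 1500.0

# GH 39823: giving both sep and delimiter is now an error.
try:
    pd.read_csv(io.StringIO("a,b\n1,2\n"), sep=",", delimiter=",")
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```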
Period
- Comparisons of `Period` objects, or of `Index`, `Series` or `DataFrame` objects with mismatched `PeriodDtype`, now behave like other mismatched-type comparisons: they return `False` for equality, `True` for not-equal, and raise `TypeError` for ordering comparisons such as `<` (GH 39274)
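
This change can be seen by comparing a daily and a monthly `PeriodDtype` `Series`. This is a minimal sketch, assuming pandas 1.3 or later. The period ranges are made up:

```python
import pandas as pd

daily = pd.Series(pd.period_range("2021-01-01", periods=2, freq="D"))
monthly = pd.Series(pd.period_range("2021-01", periods=2, freq="M"))

eq_result = (daily == monthly).tolist()  # [False, False]
ne_result = (daily != monthly).tolist()  # [True, True]

# Ordering comparisons between mismatched PeriodDtypes raise TypeError.
try:
    daily < monthly
    ordering_raised = False
except TypeError:
    ordering_raised = True
print(eq_result, ne_result, ordering_raised)
```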
Plotting
- Bug in `plotting.scatter_matrix()` raising an error when a 2D `ax` argument is passed (GH 16253)
- Prevent warnings when Matplotlib's `constrained_layout` is enabled (GH 25261)
- Bug in `DataFrame.plot()` showing the wrong colors in the legend if the function was called repeatedly and some calls used `yerr` while others did not (GH 39522)
- Bug in `DataFrame.plot()` showing the wrong colors in the legend if the function was called repeatedly and some calls used `secondary_y` while others used `legend=False` (GH 40044)
- Bug in `DataFrame.plot.box()` where the caps or min/max markers of the plot were not visible when the `dark_background` theme was selected (GH 40769)
Groupby/resample/rolling
- Bug in DataFrameGroupBy.agg() and SeriesGroupBy.agg() where results were cast too aggressively for PeriodDtype columns (GH 38254)
- Bug in SeriesGroupBy.value_counts() where unobserved categories in a grouped categorical Series were not counted (GH 38672)
- Bug in SeriesGroupBy.value_counts() raising an error on an empty Series (GH 39172)
- Bug in GroupBy.indices() containing non-existent indices when the group keys contained null values (GH 9304)
- Fixed a bug in DataFrameGroupBy.sum() and SeriesGroupBy.sum() that caused a loss of precision; they now use Kahan summation (GH 38778)
- Fixed a bug in DataFrameGroupBy.cumsum(), SeriesGroupBy.cumsum(), DataFrameGroupBy.mean() and SeriesGroupBy.mean() that caused a loss of precision; they now use Kahan summation (GH 38934)
- Bug in Resampler.aggregate() and DataFrame.transform() raising TypeError instead of SpecificationError when missing keys had mixed dtypes (GH 39025)
- Bug in DataFrameGroupBy.idxmin() and DataFrameGroupBy.idxmax() with ExtensionDtype columns (GH 38733)
- Bug in Series.resample() raising an error when the index was a PeriodIndex consisting of NaT (GH 39227)
- Bug in RollingGroupby.corr() and ExpandingGroupby.corr() where the groupby column returned 0 instead of np.nan when other was longer than each group (GH 39591)
- Bug in ExpandingGroupby.corr() and ExpandingGroupby.cov() where 1 was returned instead of np.nan when other was longer than each group (GH 39591)
- Bug in DataFrameGroupBy.mean(), SeriesGroupBy.mean(), DataFrameGroupBy.median(), SeriesGroupBy.median() and DataFrame.pivot_table() not propagating metadata (GH 28283)
- Bug in Series.rolling() and DataFrame.rolling() not calculating the window bounds correctly when the window was an offset and the dates were in descending order (GH 40002)
- Bug in Series.groupby() and DataFrame.groupby() on an empty Series or DataFrame losing the index, columns and/or data types when idxmax, idxmin, mad, min, max, sum, prod or skew were called directly or through apply, aggregate or resample (GH 26411)
- Bug in DataFrameGroupBy.apply() and SeriesGroupBy.apply() creating a MultiIndex instead of an Index when used on a RollingGroupby object (GH 39732)
- Bug in DataFrameGroupBy.sample() raising an error when weights was specified and the index was an Int64Index (GH 39927)
- Bug in DataFrameGroupBy.aggregate() and Resampler.aggregate() sometimes raising SpecificationError when a dictionary was passed and columns were missing; they now always raise KeyError (GH 40004)
- Bug in DataFrameGroupBy.sample() where column selection was not applied before the result was computed (GH 39928)
- Bug in ExponentialMovingWindow where calling __getitem__ incorrectly raised ValueError when times was provided (GH 40164)
- Bug in ExponentialMovingWindow where calling __getitem__ did not preserve the com, span, alpha or halflife attributes (GH 40164)
- ExponentialMovingWindow now raises NotImplementedError when times is specified together with adjust=False, because the calculation was incorrect (GH 40098)
- Bug in ExponentialMovingWindowGroupby.mean() where the times argument was ignored when engine='numba' (GH 40951)
- Bug in ExponentialMovingWindowGroupby.mean() where the wrong times were used when there were multiple groups (GH 40951)
- Bug in ExponentialMovingWindowGroupby where the times vector and the values got out of sync for non-trivial groupings (GH 40951)
- Bug in Series.asfreq() and DataFrame.asfreq() dropping rows when the index was not sorted (GH 39805)
- Bug in DataFrame aggregation functions not respecting the numeric_only argument when the level keyword was given (GH 40660)
- Bug in SeriesGroupBy.aggregate() where aggregating a Series with an object-dtype Index using a user-defined function produced an Index with the wrong shape (GH 40014)
- Bug in RollingGroupby where the as_index=False argument to groupby was ignored (GH 39433)
- Bug in DataFrameGroupBy.any(), SeriesGroupBy.any(), DataFrameGroupBy.all() and SeriesGroupBy.all() raising ValueError for nullable-type columns containing NA, even with skipna=True (GH 40585)
- Bug in DataFrameGroupBy.cummin(), SeriesGroupBy.cummin(), DataFrameGroupBy.cummax() and SeriesGroupBy.cummax() incorrectly rounding integer values near the int64 implementation bounds (GH 40767)
- Bug in DataFrameGroupBy.rank() and SeriesGroupBy.rank() incorrectly raising TypeError for nullable data types (GH 41010)
- Bug in DataFrameGroupBy.cummin(), SeriesGroupBy.cummin(), DataFrameGroupBy.cummax() and SeriesGroupBy.cummax() computing wrong results for nullable data types whose values are too large to round-trip through a cast to float (GH 37493)
- Bug in DataFrame.rolling() computing an incorrect mean for all-NaN windows with min_periods=0 when the calculation was numerically unstable (GH 41053)
- Bug in DataFrame.rolling() returning a non-zero sum for all-NaN windows with min_periods=0 when the calculation was numerically unstable (GH 41053)
- Bug in SeriesGroupBy.agg() failing to retain an ordered CategoricalDtype on order-preserving aggregations (GH 41147)
- Bug in DataFrameGroupBy.min(), SeriesGroupBy.min(), DataFrameGroupBy.max() and SeriesGroupBy.max() incorrectly raising ValueError with multiple object-dtype columns and numeric_only=False (GH 41111)
- Bug in DataFrameGroupBy.rank() when the GroupBy object used axis=0 and the rank method was called with axis=1 (GH 41320)
- Bug in DataFrameGroupBy.__getitem__() incorrectly returning a malformed SeriesGroupBy instead of a DataFrameGroupBy when the columns were non-unique (GH 41427)
- Bug in DataFrameGroupBy.transform() incorrectly raising AttributeError when the columns were non-unique (GH 41427)
- Bug in Resampler.apply() incorrectly dropping duplicated columns when the columns were non-unique (GH 41445)
- Bug in Series.groupby() aggregations incorrectly returning an empty Series instead of raising TypeError for aggregations that are invalid for the dtype, such as .prod with datetime64[ns] dtype (GH 41342)
- Bug in DataFrameGroupBy aggregations incorrectly failing to drop columns whose dtype is invalid for the aggregation when there are no valid columns (GH 41291)
- Bug in DataFrame.rolling.__iter__() where on was not assigned to the index of the resulting objects (GH 40373)
- Bug in DataFrameGroupBy.transform() and DataFrameGroupBy.agg() with engine="numba" where *args were cached together with the user-passed function (GH 41647)
- Bug in the DataFrameGroupBy methods agg, transform, sum, bfill, ffill, pad, pct_change, shift and ohlc dropping .columns.names (GH 41497)
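Two of the entries above say that the group sums, cumulative sums and means now use Kahan (compensated) summation. The following is a minimal stdlib-only sketch of that technique. It is not pandas' Cython implementation, and the function names naive_sum, kahan_cumsum and kahan_sum are hypothetical.

```python
from typing import Iterable, List


def naive_sum(values: Iterable[float]) -> float:
    # Plain left-to-right summation; rounding error accumulates silently
    total = 0.0
    for x in values:
        total += x
    return total


def kahan_cumsum(values: Iterable[float]) -> List[float]:
    """Running totals computed with Kahan (compensated) summation."""
    out: List[float] = []
    total = 0.0
    compensation = 0.0  # low-order bits lost by the previous addition
    for x in values:
        y = x - compensation            # re-inject what was lost last time
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # algebraically zero; captures the rounding error
        total = t
        out.append(total)
    return out


def kahan_sum(values: Iterable[float]) -> float:
    totals = kahan_cumsum(values)
    return totals[-1] if totals else 0.0


data = [1.0] + [1e-16] * 10
print(naive_sum(data))  # 1.0: each 1e-16 is below half an ulp of 1.0 and is rounded away
print(kahan_sum(data))  # close to the exact value 1.000000000000001
```

The sketch uses an explicit loop instead of the built-in sum(), because recent Python versions already apply compensated summation inside sum() for floats.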
Reshaping
- Bug in merge() raising an error when performing an inner join with a partial index and right_index=True when there was no overlap between the indices (GH 33814)
- Bug in DataFrame.unstack() with missing levels producing incorrect index names (GH 37510)
- Bug in merge_asof() propagating the right index instead of the left index when left_index=True and right_on were specified (GH 33463)
- Bug in DataFrame.join() returning incorrect results on a DataFrame with a MultiIndex when one of the indexes had only one level (GH 36909)
- merge_asof() now raises ValueError instead of an obscure TypeError for non-numeric merge columns (GH 29130)
- Bug in DataFrame.join() not assigning values correctly when the DataFrame had a MultiIndex in which at least one level was a Categorical with non-alphabetically sorted categories (GH 38502)
- Series.value_counts() and Series.mode() now return consistent keys in original order (GH 12679, GH 11227 and GH 39007)
- Bug in DataFrame.stack() not handling NaN in MultiIndex columns correctly (GH 39481)
- Bug in DataFrame.apply() giving incorrect results when func was a string, axis=1 and the axis argument was not supported; it now raises ValueError (GH 39211)
- Bug in DataFrame.sort_values() not reshaping the index correctly after sorting on columns with ignore_index=True (GH 39464)
- Bug in DataFrame.append() returning wrong dtypes with combinations of ExtensionDtype dtypes (GH 39454)
- Bug in DataFrame.append() returning wrong dtypes with combinations of datetime64 and timedelta64 dtypes (GH 39574)
- Bug in DataFrame.append() when appending a Series whose Index is not a MultiIndex to a DataFrame with a MultiIndex (GH 41707)
- Bug in DataFrame.pivot_table() returning a MultiIndex for a single value when operating on an empty DataFrame (GH 13483)
- Index can now be passed to the numpy.all() function (GH 40180)
- Bug in DataFrame.stack() not preserving CategoricalDtype in a MultiIndex (GH 36991)
- Bug in to_datetime() raising an error when the input sequence contained unhashable items (GH 39756)
- Bug in Series.explode() preserving the index when ignore_index was True and the values were scalars (GH 40487)
- Bug in to_datetime() raising ValueError when a Series contained None and NaT and had more than 50 elements (GH 39882)
- Bug in Series.unstack() and DataFrame.unstack() incorrectly raising TypeError for object-dtype values containing timezone-aware datetime objects (GH 41875)
- Bug in DataFrame.melt() raising InvalidIndexError when the DataFrame had duplicate columns used as value_vars (GH 41951)
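The value_counts() entry concerns the order of keys that have equal counts. One way to get a deterministic order is to count in first-appearance order and then apply a stable sort by descending count, so tied keys keep their original order. The following stdlib-only sketch shows that idea. It is an assumption about the intended ordering rather than pandas' hash-table code, and stable_value_counts is a hypothetical name.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Tuple


def stable_value_counts(values: Iterable[Hashable]) -> List[Tuple[Hashable, int]]:
    # Counter is a dict subclass, so it remembers keys in first-appearance order
    counts = Counter(values)
    # sorted() is stable: keys with equal counts keep their first-appearance order
    return sorted(counts.items(), key=lambda item: -item[1])


print(stable_value_counts(["b", "a", "c", "a", "b"]))
# [('b', 2), ('a', 2), ('c', 1)]
```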
Sparse
- Bug in DataFrame.sparse.to_coo() raising KeyError when the columns were a numeric Index without 0 (GH 18414)
- Bug in SparseArray.astype() with copy=False producing incorrect results when converting from an integer dtype to a floating point dtype (GH 34456)
- Bug in SparseArray.max() and SparseArray.min() always returning an empty result (GH 40921)
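The SparseArray.max()/min() entry is easier to follow with the storage model in mind. Only values that differ from fill_value are stored, so a reduction must also consider fill_value whenever some positions are not stored. The following is a minimal sketch built on a hypothetical SimpleSparse class, not pandas' SparseArray. It ignores missing values, and raising on empty input is a choice made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SimpleSparse:
    """Hypothetical sparse vector: only positions that differ from fill_value are stored."""

    length: int
    fill_value: float = 0.0
    stored: Dict[int, float] = field(default_factory=dict)  # position -> value

    def _candidates(self) -> List[float]:
        if self.length == 0:
            raise ValueError("reduction of an empty array")  # this sketch's choice
        values = list(self.stored.values())
        # Positions that are not stored implicitly hold fill_value
        if len(self.stored) < self.length:
            values.append(self.fill_value)
        return values

    def max(self) -> float:
        return max(self._candidates())

    def min(self) -> float:
        return min(self._candidates())


arr = SimpleSparse(length=5, stored={1: -3.0, 4: 2.5})
print(arr.max(), arr.min())  # 2.5 -3.0
```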
ExtensionArray
- Bug in DataFrame.where() when other was a Series with an ExtensionDtype (GH 38729)
- Fixed a bug where Series.idxmax(), Series.idxmin(), Series.argmax() and Series.argmin() failed when the underlying data was an ExtensionArray (GH 32749, GH 33719, GH 36566)
- Fixed a bug where some properties of PandasExtensionDtype subclasses were cached incorrectly (GH 40329)
- Bug in DataFrame.mask() raising ValueError when masking a DataFrame with an ExtensionDtype (GH 40941)
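For the idxmax()/argmax() entry, an argmax over array data that can hold missing entries (represented here by None) has to skip the missing positions and return the first position of the maximum. The following stdlib-only sketch shows that logic. The names argmax_skipna and argmin_skipna are hypothetical, and raising ValueError when every value is missing is a choice made for this sketch.

```python
from typing import List, Optional, Sequence


def _valid_positions(values: Sequence[Optional[float]]) -> List[int]:
    positions = [i for i, v in enumerate(values) if v is not None]
    if not positions:
        raise ValueError("all values are missing")  # this sketch's choice
    return positions


def argmax_skipna(values: Sequence[Optional[float]]) -> int:
    # max() returns the first maximal item, so ties resolve to the lowest position
    return max(_valid_positions(values), key=lambda i: values[i])


def argmin_skipna(values: Sequence[Optional[float]]) -> int:
    return min(_valid_positions(values), key=lambda i: values[i])


print(argmax_skipna([1.0, None, 5.0, 5.0]))  # 2
```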
Styler
- Bug in Styler where the subset argument of its methods raised an error for certain valid MultiIndex slices (GH 33562)
- The HTML output rendered by Styler has been slightly modified to follow W3C good coding standards (GH 39626)
- Bug in Styler where the rendered HTML was missing a column class identifier for some header cells (GH 39716)
- Bug in Styler.background_gradient() where the text color was not determined correctly (GH 39888)
- Bug in Styler.set_table_styles() where multiple elements in the CSS selectors of the table_styles argument were not added correctly (GH 34061)
- Bug in Styler where copying from Jupyter dropped the top-left cell and misaligned the headers (GH 12147)
- Bug in Styler.where where kwargs were not passed to the applicable callable (GH 40845)
- Bug in Styler causing CSS to be duplicated on multiple renders (GH 39395, GH 40334)
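The Styler.where entry is about forwarding keyword arguments to the user's condition callable. The following stdlib-only sketch shows that contract with a hypothetical where_styles helper. It is not the Styler API itself: extra **kwargs are passed on to cond instead of being dropped.

```python
from typing import Any, Callable, Iterable, List


def where_styles(values: Iterable[Any], cond: Callable[..., bool], value: str,
                 other: str = "", **kwargs: Any) -> List[str]:
    # Choose a CSS string per element; kwargs are forwarded to cond
    return [value if cond(v, **kwargs) else other for v in values]


def above(x: float, threshold: float) -> bool:
    return x > threshold


print(where_styles([1, 5, 10], above, "color: red;", threshold=4))
# ['', 'color: red;', 'color: red;']
```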
Other
- inspect.getmembers(Series) no longer raises AbstractMethodError (GH 38782)
- Bug in Series.where() with numeric dtype and other=None not casting to nan (GH 39761)
- Bug in assert_series_equal(), assert_frame_equal(), assert_index_equal() and assert_extension_array_equal() incorrectly raising when an attribute had an unrecognized NA type (GH 39461)
- Bug in assert_index_equal() with exact=True not raising when comparing CategoricalIndex instances with Int64Index and RangeIndex categories (GH 41263)
- Bug in DataFrame.equals(), Series.equals() and Index.equals() (GH 39650)
- Bug in show_versions() where the console JSON output was not valid JSON (GH 39701)
- pandas now compiles on z/OS when using xlc (GH 35826)
- Bug in pandas.util.hash_pandas_object() not recognizing hash_key, encoding and categorize when the input object was a DataFrame (GH 41404)
Categorical
- Bug in CategoricalIndex incorrectly failing to raise TypeError when scalar data was passed (GH 38614)
- Bug in CategoricalIndex.reindex failing when the passed Index was not categorical but all of its values were labels in the category (GH 28690)
- Bug where constructing a Categorical from an object-dtype array of date objects did not round-trip correctly with astype (GH 38552)
- Bug in constructing a DataFrame from an ndarray and a CategoricalDtype (GH 38857)
- Bug in setting categorical values into an object-dtype column of a DataFrame (GH 39136)
- Bug in DataFrame.reindex() raising IndexError when the new index contained duplicates and the old index was a CategoricalIndex (GH 38906)
- Bug in Categorical.fillna() with tuple-like categories raising NotImplementedError instead of ValueError when filling with a tuple that is not a category (GH 41914)
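The Categorical.fillna() entry concerns validating the fill value against the categories, including tuple-valued categories. The following stdlib-only sketch works on a hypothetical codes/categories pair, where -1 marks a missing code as in pandas' categorical codes. It raises ValueError for a fill value that is not a category. The fill_codes helper is illustrative only.

```python
from typing import Hashable, List, Sequence

MISSING_CODE = -1  # categorical codes use -1 for missing values


def fill_codes(codes: Sequence[int], categories: Sequence[Hashable], value: Hashable) -> List[int]:
    """Replace missing codes with the code of `value`, which must be a category."""
    # A tuple is compared as one whole value, never unpacked into several fill values
    if value not in categories:
        raise ValueError(f"Cannot fill with {value!r}: not one of the categories")
    fill_code = list(categories).index(value)
    return [fill_code if c == MISSING_CODE else c for c in codes]


cats = [(1, 2), (3, 4)]
print(fill_codes([0, -1, 1], cats, (3, 4)))  # [0, 1, 1]
```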
Datetimelike
- Bug in the DataFrame and Series constructors sometimes dropping nanoseconds from Timestamp (resp. Timedelta) data with dtype datetime64[ns] (resp. timedelta64[ns]) (GH 38032)
- Bug in DataFrame.first() and Series.first() with an offset of one month returning incorrect results when the first day was the last day of a month (GH 29623)
- Bug where TypeError was not raised when constructing a DataFrame or Series with mismatched datetime64 data and timedelta64 dtype, or vice versa (GH 38575, GH 38764, GH 38792)
- Bug in constructing a Series or DataFrame with datetime objects out of bounds for datetime64[ns] dtype, or timedelta objects out of bounds for timedelta64[ns] dtype (GH 38792, GH 38965)
- Bug in DatetimeIndex.intersection(), DatetimeIndex.symmetric_difference(), PeriodIndex.intersection() and PeriodIndex.symmetric_difference() always returning object dtype when operating with a CategoricalIndex (GH 38741)
- Bug in DatetimeIndex.intersection() giving incorrect results with non-Tick frequencies with n != 1 (GH 42104)
- Bug in Series.where() incorrectly casting datetime64 values to int64 (GH 37682)
- Bug in Categorical incorrectly typecasting datetime objects to Timestamp (GH 38878)
- Bug in comparisons between Timestamp objects and datetime64 objects just outside the implementation bounds of nanosecond-resolution datetime64 (GH 39221)
- Bug in Timestamp.round(), Timestamp.floor() and Timestamp.ceil() for values near the implementation bounds of Timestamp (GH 39244)
- Bug in Timedelta.round(), Timedelta.floor() and Timedelta.ceil() for values near the implementation bounds of Timedelta (GH 38964)
- Bug in date_range() incorrectly creating a DatetimeIndex containing NaT instead of raising OutOfBoundsDatetime in corner cases (GH 24124)
- Bug in infer_freq() incorrectly failing to infer the 'H' frequency of a DatetimeIndex that has a time zone and crosses a DST boundary (GH 39556)
- Bug in Series backed by DatetimeArray or TimedeltaArray sometimes failing to set the array's freq to None (GH 41425)
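Several entries above refer to the implementation bounds of datetime64[ns]. These bounds follow from storing timestamps as signed 64-bit counts of nanoseconds since the Unix epoch, with the most negative int64 reserved for NaT. The following stdlib-only calculation derives the bounds. The results are truncated to microseconds because Python's datetime cannot represent nanoseconds, and the variable names are illustrative.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
INT64_MAX = 2**63 - 1
# The most negative int64 is reserved for NaT, so the smallest valid value is -(2**63 - 1)
INT64_MIN_VALID = -(2**63 - 1)

# Convert the nanosecond limits to whole microseconds (truncating toward zero)
upper = EPOCH + timedelta(microseconds=INT64_MAX // 1000)
lower = EPOCH - timedelta(microseconds=(-INT64_MIN_VALID) // 1000)

print(lower)  # 1677-09-21 00:12:43.145225
print(upper)  # 2262-04-11 23:47:16.854775
```

Any datetime outside this window cannot be stored as datetime64[ns], which is the situation where an OutOfBoundsDatetime error is expected.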
Timedelta
- Bug in constructing Timedelta from np.timedelta64 objects with non-nanosecond units that are out of bounds for timedelta64[ns] (GH 38965)
- Bug in constructing a TimedeltaIndex incorrectly accepting np.datetime64("NaT") objects (GH 39462)
- Bug in constructing Timedelta from an input string containing only symbols and no digits failing to raise an error (GH 39710)
- Bug in TimedeltaIndex and to_timedelta() failing to raise when passed non-nanosecond timedelta64 arrays that overflow when converted to timedelta64[ns] (GH 40008)
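The Timedelta entries concern values in non-nanosecond units that overflow once they are converted to nanoseconds. The following stdlib-only sketch shows the range check with a hypothetical to_nanoseconds helper. pandas performs this check in compiled code and raises its own exception types, so raising OverflowError here is an assumption of the sketch.

```python
from typing import Dict

# Nanoseconds per unit for a few common timedelta64 units
NS_PER_UNIT: Dict[str, int] = {
    "D": 86_400 * 10**9,
    "h": 3_600 * 10**9,
    "m": 60 * 10**9,
    "s": 10**9,
    "ms": 10**6,
    "us": 10**3,
    "ns": 1,
}
INT64_MAX = 2**63 - 1
INT64_MIN_VALID = -INT64_MAX  # the most negative int64 is reserved for NaT


def to_nanoseconds(value: int, unit: str) -> int:
    """Convert an integer count of `unit` to nanoseconds, raising instead of wrapping."""
    ns = value * NS_PER_UNIT[unit]  # Python ints never overflow, so the check below is exact
    if not INT64_MIN_VALID <= ns <= INT64_MAX:
        raise OverflowError(f"{value} {unit} does not fit in timedelta64[ns]")
    return ns


print(to_nanoseconds(106_751, "D"))  # fits: about 292 years
```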
Timezones
- Bug where different tzinfo objects representing UTC were not treated as equivalent (GH 39216)
- Bug in dateutil.tz.gettz("UTC") not being recognized as equivalent to other tzinfos representing UTC (GH 39276)
Numeric
- Bug in DataFrame.rank() raising IndexError with axis=0 when the columns held incomparable types (GH 38932)
- Bug in DataFrame.quantile() and DataFrame.sort_values() causing incorrect subsequent indexing behavior (GH 38351)
- Bug in DataFrame.sort_values() raising IndexError for an empty by (GH 40258)
- Bug in DataFrame.select_dtypes() with include=np.number dropping numeric ExtensionDtype columns (GH 35340)
- Bug in DataFrame.mode() and Series.mode() not keeping a consistent integer Index for empty input (GH 33321)
- Bug in DataFrame.rank() when the DataFrame contained np.inf (GH 32593)
- Bug in Series.rank(), DataFrame.rank(), DataFrameGroupBy.rank() and SeriesGroupBy.rank() treating the most negative int64 value as missing (GH 32859)
- Bug in DataFrame.select_dtypes() behaving differently on Windows and Linux with include="int" (GH 36596)
- Bug in DataFrame.apply() and DataFrame.agg() operating on the entire DataFrame instead of on rows or columns when passed func="size" (GH 39934)
- Bug in DataFrame.transform() raising SpecificationError when a dictionary was passed and columns were missing; it now raises KeyError (GH 40004)
- Bug in DataFrameGroupBy.rank() and SeriesGroupBy.rank() giving incorrect results with pct=True when equal values appeared in consecutive groups (GH 40518)
- Bug in Series.count() producing an int32 result on 32-bit platforms when level=None (GH 40908)
- Bug in Series and DataFrame reductions with the any and all methods not returning Boolean results for object data (GH 12863, GH 35450, GH 27709)
- Bug in Series.clip() failing if the Series contained NA values and had a nullable integer or float data type (GH 40851)
- Bug in UInt64Index.where() and UInt64Index.putmask() incorrectly raising TypeError when other had np.int64 dtype (GH 41974)
- Bug in DataFrame.agg() not sorting the aggregated axis in the order of the supplied aggregation functions when one or more of them failed to produce a result (GH 33634)
- Bug in DataFrame.clip() not interpreting missing values as no threshold (GH 40420)
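Two of the ranking entries come down to how missing values are tracked and what pct=True divides by: the most negative int64 being treated as missing, and the pct=True results. The following stdlib-only sketch computes average ranks with missing values marked by None (an explicit mask) instead of a sentinel integer. With pct=True it divides by the number of non-missing values. It illustrates the idea rather than pandas' implementation, and average_rank is a hypothetical name.

```python
from typing import List, Optional, Sequence


def average_rank(values: Sequence[Optional[int]], pct: bool = False) -> List[Optional[float]]:
    """1-based average ranks (ties share the mean rank); None marks a missing value."""
    # Missing entries are tracked explicitly, so every integer (even -2**63) is a real value
    valid = sorted((v, i) for i, v in enumerate(values) if v is not None)
    n = len(valid)
    ranks: List[Optional[float]] = [None] * len(values)
    start = 0
    while start < n:
        end = start
        # Extend the tie group while the next sorted value is equal
        while end + 1 < n and valid[end + 1][0] == valid[start][0]:
            end += 1
        rank = (start + end) / 2 + 1  # mean of the 1-based positions start+1 .. end+1
        for _, position in valid[start:end + 1]:
            ranks[position] = rank / n if pct else rank
        start = end + 1
    return ranks


print(average_rank([3, 1, 3, None, -2**63]))  # [3.5, 2.0, 3.5, None, 1.0]
```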
Conversion
- Bug in Series.to_dict() with orient='records', which now returns Python native types (GH 25969)
- Bug in Series.view() and Index.view() when converting between datetime-like dtypes (datetime64[ns], datetime64[ns, tz], timedelta64, period) (GH 39788)
- Bug in creating a DataFrame from an empty np.recarray not preserving the original dtypes (GH 40121)
- Bug in DataFrame failing to raise TypeError when constructed from a frozenset (GH 40163)
- Bug in Index construction silently ignoring a passed dtype when the data could not be cast to that dtype (GH 21311)
- Bug in StringArray.astype() falling back to NumPy and raising when converting to dtype='categorical' (GH 40450)
- Bug in factorize() where, given an array with a numeric NumPy dtype lower than int64, uint64 and float64, the unique values did not keep their original dtype (GH 41132)
- Bug in DataFrame construction from a dictionary containing an array-like with ExtensionDtype and copy=True failing to make a copy (GH 38939)
- Bug in qcut() raising an error when taking Float64Dtype as input (GH 40730)
- Bug in DataFrame and Series construction with datetime64[ns] data and dtype=object producing datetime objects instead of Timestamp objects (GH 41599)
- Bug in DataFrame and Series construction with timedelta64[ns] data and dtype=object producing np.timedelta64 objects instead of Timedelta objects (GH 41599)
- Bug in DataFrame construction from a two-dimensional object-dtype np.ndarray of Period or Interval objects failing to cast to PeriodDtype or IntervalDtype, respectively (GH 41812)
- Bug in constructing a Series from a list and a PandasDtype (GH 39357)
- Bug in creating a Series from a range object that does not fit within the bounds of int64 dtype (GH 30173)
- Bug in creating a Series from a dict with all-tuple keys and an Index that requires reindexing (GH 41707)
- Bug in infer_dtype() not recognizing a Series, Index or array with a Period dtype (GH 23553)
- Bug in infer_dtype() raising an error for general ExtensionArray objects; it now returns "unknown-array" instead (GH 37367)
- Bug in DataFrame.convert_dtypes() incorrectly raising ValueError when called on an empty DataFrame (GH 40393)
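For the factorize() entry, factorization returns integer codes plus the unique values in order of first appearance, with -1 as the code for missing values (the same sentinel pandas uses). The following is a stdlib-only sketch of that output. The uniques are taken directly from the input, so they keep their original types, which is the property the fix restores for small NumPy dtypes. The name simple_factorize is hypothetical.

```python
from typing import Dict, Hashable, List, Optional, Sequence, Tuple


def simple_factorize(values: Sequence[Optional[Hashable]]) -> Tuple[List[int], List[Hashable]]:
    codes: List[int] = []
    uniques: List[Hashable] = []
    positions: Dict[Hashable, int] = {}  # value -> code, assigned in first-appearance order
    for v in values:
        if v is None:
            codes.append(-1)  # missing values get the -1 sentinel code
            continue
        if v not in positions:
            positions[v] = len(uniques)
            uniques.append(v)
        codes.append(positions[v])
    return codes, uniques


print(simple_factorize(["b", "a", "b", None, "c"]))
# ([0, 1, 0, -1, 2], ['b', 'a', 'c'])
```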
Strings
- Bug in the conversion from pyarrow.ChunkedArray to StringArray when the original had zero chunks (GH 41040)
- Bug in Series.replace() and DataFrame.replace() ignoring replacements with regex=True for StringDtype data (GH 41333, GH 35977)
- Bug in Series.str.extract() with StringArray returning object dtype for an empty DataFrame (GH 41441)
- Bug in Series.str.replace() where the case argument was ignored when regex=False (GH 41602)
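For the Series.str.replace() entry, a case-insensitive literal replacement can be built on the re module. The pattern is escaped so it matches literally, and re.IGNORECASE is added. The following sketch works on a single string under that assumption and is not pandas' code. str_replace is a hypothetical name.

```python
import re


def str_replace(text: str, pat: str, repl: str, case: bool = True, regex: bool = False) -> str:
    """Hypothetical single-string analogue of a replace with case/regex switches."""
    flags = 0 if case else re.IGNORECASE
    if regex:
        return re.sub(pat, repl, text, flags=flags)
    if case:
        # Literal and case-sensitive: plain str.replace is enough
        return text.replace(pat, repl)
    # Literal but case-insensitive: escape the pattern so "." and similar match themselves,
    # and use a function so backslashes in repl are inserted verbatim
    return re.sub(re.escape(pat), lambda _match: repl, text, flags=flags)


print(str_replace("Foo foo FOO", "foo", "bar", case=False))  # bar bar bar
```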
Interval
- Bug in IntervalIndex.intersection() and IntervalIndex.symmetric_difference() always returning object dtype when operating with a CategoricalIndex (GH 38653, GH 38741)
- Bug in IntervalIndex.intersection() returning duplicates when at least one of the Index objects had duplicates that were present in the other (GH 38743)
- IntervalIndex.union(), IntervalIndex.intersection(), IntervalIndex.difference() and IntervalIndex.symmetric_difference() now cast to the appropriate dtype instead of raising TypeError when operating with another IntervalIndex with an incompatible dtype (GH 39267)
- PeriodIndex.union(), PeriodIndex.intersection(), PeriodIndex.symmetric_difference() and PeriodIndex.difference() now cast to object dtype instead of raising IncompatibleFrequency when operating with another PeriodIndex with an incompatible dtype (GH 39306)
- Bug in IntervalIndex.is_monotonic(), IntervalIndex.get_loc(), IntervalIndex.get_indexer_for() and IntervalIndex.__contains__() when NA values were present (GH 41831)
Indexing
- Bug in Index.union() and MultiIndex.union() dropping duplicate Index values when the Index was not monotonic or sort was set to False (GH 36289, GH 31326, GH 40862)
- Bug in CategoricalIndex.get_indexer() not raising InvalidIndexError when the index was non-unique (GH 38372)
- Bug in IntervalIndex.get_indexer() when target had a CategoricalDtype and both the index and the target contained NA values (GH 41934)
- Bug in Series.loc() raising ValueError when the input was filtered with a Boolean list and the values to set were a list of lower dimension (GH 20438)
- Bug in inserting many new columns into a DataFrame causing incorrect subsequent indexing behavior (GH 38380)
- Bug in DataFrame.__setitem__() raising ValueError when setting multiple values to duplicate columns (GH 15695)
- Bug in DataFrame.loc(), Series.loc(), DataFrame.__getitem__() and Series.__getitem__() returning incorrect elements for string slices of a non-monotonic DatetimeIndex (GH 33146)
- Bug in DataFrame.reindex() and Series.reindex() with timezone-aware indexes raising TypeError for method="ffill" and method="bfill" when tolerance was specified (GH 38566)
- Bug in DataFrame.reindex() with datetime64[ns] or timedelta64[ns] incorrectly casting to integers when the fill_value required casting to object dtype (GH 39755)
- Bug in DataFrame.__setitem__() raising ValueError when setting on an empty DataFrame using specified columns and a non-empty DataFrame value (GH 38831)
- Bug in DataFrame.loc.__setitem__() raising ValueError when operating on a unique column while the DataFrame had duplicate columns (GH 38521)
- Bug in DataFrame.iloc.__setitem__() and DataFrame.loc.__setitem__() raising ValueError with mixed dtypes when setting with a dictionary value (GH 38335)
- Bug in Series.loc.__setitem__() and DataFrame.loc.__setitem__() raising KeyError when provided a Boolean generator (GH 39614)
- Bug in Series.iloc() and DataFrame.iloc() raising KeyError when provided a generator (GH 39614)
- Bug in DataFrame.__setitem__() not raising ValueError when the right-hand side was a DataFrame with the wrong number of columns (GH 38604)
- Bug in Series.__setitem__() raising ValueError when setting a Series with a scalar indexer (GH 38303)
- Bug in DataFrame.loc() dropping levels of a MultiIndex when the DataFrame used as input had only one row (GH 10521)
- Bug in DataFrame.__getitem__() and Series.__getitem__() always raising KeyError when slicing with an existing string that includes milliseconds (GH 33589)
- Bug in setting timedelta64 or datetime64 values into a numeric Series failing to cast to object dtype (GH 39086, GH 39619)
- Bug in setting Interval values into a Series or DataFrame with mismatched IntervalDtype incorrectly casting the new values to the existing dtype (GH 39120)
- Bug in setting datetime64 values into a Series with integer dtype incorrectly casting the datetime64 values to integers (GH 39266)
- Bug in setting np.datetime64("NaT") into a Series with Datetime64TZDtype incorrectly treating the timezone-naive value as timezone-aware (GH 39769)
- Bug in Index.get_loc() not raising KeyError when key=NaN and method was specified but NaN was not in the Index (GH 39382)
- Bug in DatetimeIndex.insert() when inserting np.datetime64("NaT") into a timezone-aware index incorrectly treating the timezone-naive value as timezone-aware (GH 39769)
- Bug in Index.insert() incorrectly raising when setting a new column that cannot be held in the existing frame.columns, and in Series.reset_index() or DataFrame.reset_index(), instead of casting to a compatible dtype (GH 39068)
- Bug in RangeIndex.append() where a single object of length 1 was concatenated incorrectly (GH 39401)
- Bug in RangeIndex.astype() where, when converting to CategoricalIndex, the categories became an Int64Index instead of a RangeIndex (GH 41263)
- Bug in setting numpy.timedelta64 values into an object-dtype Series using a Boolean indexer (GH 39488)
- Bug in setting numeric values into a Boolean-dtype Series using at or iat failing to cast to object dtype (GH 39582)
- Bug in DataFrame.__setitem__() and DataFrame.iloc.__setitem__() raising ValueError when indexing with a row slice and setting a list as values (GH 40440)
- Bug in DataFrame.loc() not raising KeyError when the key was not found in a MultiIndex and the levels were not fully specified (GH 41170)
- Bug in DataFrame.loc.__setitem__() when setting with enlargement incorrectly raising when the index on the expanded axis contained duplicates (GH 40096)
- Bug in DataFrame.loc.__getitem__() with a MultiIndex casting to float when at least one index column had a floating point dtype and a scalar was retrieved (GH 41369)
- Bug in DataFrame.loc() incorrectly matching non-Boolean index elements (GH 20432)
- Bug in indexing with np.nan on a Series or DataFrame with a CategoricalIndex incorrectly raising KeyError when np.nan keys were present (GH 41933)
- Bug in Series.__delitem__() with ExtensionDtype incorrectly casting to ndarray (GH 40386)
- Bug in DataFrame.at() returning incorrect results when passed integer keys (GH 41846)
- Bug in DataFrame.loc() returning a MultiIndex in the wrong order when the indexer had duplicates (GH 40978)
- Bug in DataFrame.__setitem__() raising TypeError when using a str subclass as the column name with a DatetimeIndex (GH 37366)
- Bug in PeriodIndex.get_loc() failing to raise KeyError when given a Period with a mismatched freq (GH 41670)
- Bug in .loc.__getitem__ with a UInt64Index and negative-integer keys raising OverflowError instead of KeyError in some cases and wrapping around to positive integers in others (GH 41777)
- Bug in Index.get_indexer() failing to raise ValueError in some cases with invalid method, limit or tolerance arguments (GH 41918)
- Bug when slicing a Series or DataFrame with a TimedeltaIndex where passing an invalid string raised ValueError instead of TypeError (GH 41821)
- Bug in the Index constructor sometimes silently ignoring a specified dtype (GH 38879)
- Index.where() behavior now mirrors Index.putmask() behavior, i.e. index.where(mask, other) matches index.putmask(~mask, other) (GH 39412)
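The last Indexing entry states that index.where(mask, other) now matches index.putmask(~mask, other). The following stdlib-only sketch checks that relationship on plain lists. The helper names are hypothetical, and pandas' versions return new Index objects and handle dtype casting.

```python
from typing import Any, List, Sequence


def where(values: Sequence[Any], mask: Sequence[bool], other: Any) -> List[Any]:
    # Keep values where mask is True; use other elsewhere
    return [v if m else other for v, m in zip(values, mask)]


def putmask(values: Sequence[Any], mask: Sequence[bool], value: Any) -> List[Any]:
    # Replace values where mask is True; keep them elsewhere
    return [value if m else v for v, m in zip(values, mask)]


values = [10, 20, 30, 40]
mask = [True, False, True, False]
inverted = [not m for m in mask]
assert where(values, mask, 0) == putmask(values, inverted, 0) == [10, 0, 30, 0]
```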
Missing
- Bug in Grouper not propagating the dropna argument correctly; DataFrameGroupBy.transform() now handles missing values correctly for dropna=True (GH 35612)
- Bug in isna(), Series.isna(), Index.isna(), DataFrame.isna() and the corresponding notna functions not recognizing Decimal("NaN") objects (GH 39409)
- Bug in DataFrame.fillna() not accepting a dictionary for the downcast keyword (GH 40809)
- Bug in isna() not returning a copy of the mask for nullable types, so that any subsequent modification of the mask changed the original array (GH 40935)
- Bug in DataFrame construction with floating point data containing NaN and an integer dtype casting the data instead of retaining the NaN (GH 26919)
- Bug in Series.isin() and MultiIndex.isin() not treating all NaNs as equivalent when they were inside tuples (GH 41836)
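The isna() entry describes an aliasing bug. If an array returns its internal mask object, a caller who edits that result also changes the array. The following stdlib-only sketch uses a hypothetical MaskedList class (not pandas' masked arrays) to show the fix, which is to return a copy.

```python
from typing import List, Optional


class MaskedList:
    """Hypothetical nullable integer container: data plus an explicit missing-value mask."""

    def __init__(self, values: List[Optional[int]]) -> None:
        self._data = [0 if v is None else v for v in values]
        self._mask = [v is None for v in values]

    def isna(self) -> List[bool]:
        # Return a copy so that editing the result cannot change self._mask
        return list(self._mask)

    def to_list(self) -> List[Optional[int]]:
        return [None if m else d for d, m in zip(self._data, self._mask)]


arr = MaskedList([1, None, 3])
mask = arr.isna()
mask[0] = True  # modifies only the caller's copy
print(arr.to_list())  # [1, None, 3]
```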
MultiIndex
- Bug in DataFrame.drop() raising TypeError when the MultiIndex was non-unique and level was not provided (GH 36293)
- Bug in MultiIndex.intersection() duplicating NaN in the result (GH 38623)
- Bug in MultiIndex.equals() incorrectly returning True when the MultiIndex contained NaN even when the values were ordered differently (GH 38439)
- Bug in MultiIndex.intersection() always returning an empty result when intersecting with a CategoricalIndex (GH 38653)
- Bug in MultiIndex.difference() incorrectly raising TypeError when the index contained unsortable entries (GH 41915)
- Bug in MultiIndex.reindex() raising ValueError when used on an empty MultiIndex and indexing only a specific level (GH 41170)
- Bug in MultiIndex.reindex() raising TypeError when reindexing against a flat Index (GH 41707)
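The MultiIndex.equals() entry depends on two rules. Comparison is positional, so order matters, and NaN must compare equal to NaN even though nan != nan in Python. The following stdlib-only sketch applies both rules to lists of tuples. The helper names are hypothetical.

```python
import math
from typing import Any, Sequence, Tuple


def _same(a: Any, b: Any) -> bool:
    # Treat two float NaNs as equal; otherwise fall back to ==
    if isinstance(a, float) and isinstance(b, float) and math.isnan(a) and math.isnan(b):
        return True
    return a == b


def tuples_equal(left: Sequence[Tuple[Any, ...]], right: Sequence[Tuple[Any, ...]]) -> bool:
    if len(left) != len(right):
        return False
    # Compare position by position, so the same labels in another order are not equal
    return all(
        len(lt) == len(rt) and all(_same(x, y) for x, y in zip(lt, rt))
        for lt, rt in zip(left, right)
    )


nan = float("nan")
a = [(1, nan), (2, "x")]
print(tuples_equal(a, [(1, nan), (2, "x")]))  # True
print(tuples_equal(a, [(2, "x"), (1, nan)]))  # False
```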
I/O
- Bug in Index.__repr__() when display.max_seq_items=1 (GH 38415)
- Bug in read_csv() not recognizing scientific notation when the decimal argument was set and engine="python" (GH 31920)
- Bug in read_csv() interpreting NA values as comments with engine="python" (GH 34002)
- Bug in read_csv() raising IndexError when the file had no data rows, multiple header columns and index_col specified (GH 38292)
- Bug in read_csv() not accepting usecols with a different length than names for engine="python" (GH 16469)
- Bug in read_csv() returning object dtype when delimiter="," with usecols and parse_dates specified for engine="python" (GH 35873)
- Bug in read_csv() raising TypeError when names and parse_dates were specified for engine="c" (GH 33699)
- Bug in read_clipboard() and DataFrame.to_clipboard() not working in WSL (GH 38527)
- Allow custom error values for the parse_dates argument of read_sql(), read_sql_query() and read_sql_table() (GH 35185)
- Bug in DataFrame.to_hdf() and Series.to_hdf() raising KeyError when applied to subclasses of DataFrame or Series (GH 33748)
- Bug in HDFStore.put() raising the wrong TypeError when saving a DataFrame with a non-string dtype (GH 34274)
- Bug in json_normalize() where the first element of a generator object was not included in the returned DataFrame (GH 35923)
- Bug in read_csv() applying the thousands separator to date columns that should be parsed as dates when usecols was specified for engine="python" (GH 39365)
- Bug in read_excel() forward-filling MultiIndex names when multiple header and index columns were specified (GH 34673)
- Bug in read_excel() not respecting set_option() (GH 34252)
- Bug in read_csv() not switching true_values and false_values for nullable Boolean dtypes (GH 34655)
- Bug in read_json() not maintaining a numeric string index when orient="split" (GH 28556)
- read_sql() returned an empty generator when chunksize was non-zero and the query returned no results; it now returns a generator containing a single empty DataFrame (GH 34411)
- Bug in read_hdf() returning unexpected records when filtering on categorical string columns using the where parameter (GH 39189)
- Bug in read_sas() raising ValueError when datetime values were null (GH 39725)
- Bug in read_excel() dropping empty values from single-column spreadsheets (GH 39808)
- Bug in read_excel() loading trailing empty rows/columns for some file types (GH 41167)
- Bug in read_excel() raising AttributeError when the Excel file had a MultiIndex header followed by two empty rows and no index (GH 40442)
- Bug in read_excel(), read_csv(), read_table(), read_fwf() and read_clipboard() where a blank row after a MultiIndex header with no index was dropped (GH 40442)
- Bug in DataFrame.to_string() misplacing the truncation column when index=False (GH 40904)
- Bug in DataFrame.to_string() adding an extra dot and misaligning the truncation row when index=False (GH 40904)
- Bug in read_orc() always raising AttributeError (GH 40918)
- Bug in read_csv() and read_table() silently ignoring prefix when both names and prefix were defined; it now raises ValueError (GH 39123)
- Bug in read_csv() and read_excel() not respecting the dtype of duplicated column names when mangle_dupe_cols was set to True (GH 35211)
- Bug in read_csv() silently ignoring sep when both delimiter and sep were defined; it now raises ValueError (GH 39823)
- Bug in read_csv() and read_table() misinterpreting arguments when sys.setprofile had been called previously (GH 41069)
- Bug in the conversion from PyArrow to pandas (e.g. when reading Parquet files) with nullable data types and PyArrow arrays whose data buffer size is not a multiple of the dtype size (GH 40896)
- Bug in read_excel() raising an error when pandas could not determine the file type even though the user specified the engine argument (GH 41225)
- Bug in read_clipboard() shifting values into the wrong column when copying from an Excel file whose first column contained null values (GH 41108)
- Bug in DataFrame.to_hdf() and Series.to_hdf() raising TypeError when trying to append a string column to an incompatible column (GH 41897)
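The read_sql() entry changes what an empty chunked query yields: a generator containing one empty DataFrame instead of an empty generator. The following stdlib-only sketch shows that contract over plain rows. It uses a hypothetical iter_chunks helper, with lists standing in for DataFrames.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_chunks(rows: Iterable[T], chunksize: int) -> Iterator[List[T]]:
    if chunksize <= 0:
        raise ValueError("chunksize must be positive")
    it = iter(rows)
    yielded = False
    while True:
        chunk = list(islice(it, chunksize))
        if not chunk:
            break
        yielded = True
        yield chunk
    if not yielded:
        # An empty result still yields exactly one (empty) chunk
        yield []


print(list(iter_chunks([], 2)))        # [[]]
print(list(iter_chunks(range(5), 2)))  # [[0, 1], [2, 3], [4]]
```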
Period
- Comparisons of Period objects, or of an Index, Series or DataFrame, with a mismatched PeriodDtype now behave like other mismatched-type comparisons: they return False for equality (==), True for inequality (!=), and raise TypeError for ordering comparisons (GH 39274)
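The Period entry defines comparison rules for mismatched PeriodDtype. Equality gives False, != gives True, and ordering comparisons raise TypeError. The following stdlib-only sketch applies those rules with a hypothetical MiniPeriod class. It is not pandas' Period, and representing a period as an ordinal plus a freq string is an assumption of the sketch.

```python
class MiniPeriod:
    """Hypothetical period: an ordinal position counted in units of freq."""

    def __init__(self, ordinal: int, freq: str) -> None:
        self.ordinal = ordinal
        self.freq = freq  # e.g. "M" for monthly, "D" for daily

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, MiniPeriod):
            return NotImplemented
        # Mismatched frequencies are simply "not equal"
        return self.freq == other.freq and self.ordinal == other.ordinal

    # Python 3 derives != from __eq__, so mismatched frequencies give True

    def __lt__(self, other: object) -> bool:
        if not isinstance(other, MiniPeriod):
            return NotImplemented
        if other.freq != self.freq:
            # Ordering across frequencies is undefined, so refuse to compare
            raise TypeError(f"cannot order periods with freq {self.freq!r} and {other.freq!r}")
        return self.ordinal < other.ordinal


monthly, daily = MiniPeriod(5, "M"), MiniPeriod(5, "D")
print(monthly == daily, monthly != daily)  # False True
```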
Plotting
- Fixed a bug in plotting.scatter_matrix() when a 2D ax argument was passed (GH 16253)
- Prevent warnings when Matplotlib's constrained_layout is enabled (GH 25261)
- Fixed a bug in DataFrame.plot() that showed the wrong colors in the legend when the function was called repeatedly and some calls used yerr while others did not (GH 39522)
- Fixed a bug in DataFrame.plot() that showed the wrong colors in the legend when the function was called repeatedly and some calls used secondary_y while others used legend=False (GH 40044)
- Fixed a bug in DataFrame.plot.box() where the caps or min/max markers were not visible when the dark_background theme was selected (GH 40769)
Grouping/Resampling/Rolling
- Fixed bug in DataFrameGroupBy.agg() and SeriesGroupBy.agg() where casting of the result was incorrectly too aggressive for PeriodDtype columns (GH 38254)
- Fixed bug in SeriesGroupBy.value_counts() where unobserved categories in a grouped categorical Series were not counted (GH 38672)
- Bug in SeriesGroupBy.value_counts() raising an error on an empty Series (GH 39172)
- Bug in GroupBy.indices() containing nonexistent indices when there were null values in the grouping keys (GH 9304)
- Fixed a precision loss bug in DataFrameGroupBy.sum() and SeriesGroupBy.sum() by using Kahan summation (GH 38778)
- Fixed a precision loss bug in DataFrameGroupBy.cumsum(), SeriesGroupBy.cumsum(), DataFrameGroupBy.mean() and SeriesGroupBy.mean() by using Kahan summation (GH 38934)
- Bug in Resampler.aggregate() and DataFrame.transform() raising TypeError instead of SpecificationError when missing keys had mixed dtypes (GH 39025)
- Bug in DataFrameGroupBy.idxmin() and DataFrameGroupBy.idxmax() with ExtensionDtype columns (GH 38733)
- Bug in Series.resample() raising an error when the index was a PeriodIndex consisting of NaT (GH 39227)
- Bug in RollingGroupby.corr() and ExpandingGroupby.corr() where the groupby column would return 0 instead of np.nan when providing an other that was longer than each group (GH 39591)
- Bug in ExpandingGroupby.corr() and ExpandingGroupby.cov() where 1 would be returned instead of np.nan when providing an other that was longer than each group (GH 39591)
- Bug in DataFrameGroupBy.mean(), SeriesGroupBy.mean(), DataFrameGroupBy.median(), SeriesGroupBy.median() and DataFrame.pivot_table() not propagating metadata (GH 28283)
- Bug in Series.rolling() and DataFrame.rolling() not calculating window bounds correctly when the window is an offset and the dates are in descending order (GH 40002)
- Bug in Series.groupby() and DataFrame.groupby() on an empty Series or DataFrame losing the index, columns, and/or data types when the methods idxmax, idxmin, mad, min, max, sum, prod and skew were called directly or through apply, aggregate or resample (GH 26411)
- Bug in DataFrameGroupBy.apply() and SeriesGroupBy.apply() creating a MultiIndex instead of an Index when used on a RollingGroupby object (GH 39732)
- Bug in DataFrameGroupBy.sample() raising an error when weights was specified and the index was an Int64Index (GH 39927)
- Bug in DataFrameGroupBy.aggregate() and Resampler.aggregate() sometimes raising SpecificationError when passed a dictionary with missing columns; they now always raise KeyError (GH 40004)
- Bug in DataFrameGroupBy.sample() where column selection was not applied before computing the result (GH 39928)
- Bug in ExponentialMovingWindow where calling __getitem__ incorrectly raised ValueError when times was provided (GH 40164)
- Bug in ExponentialMovingWindow where calling __getitem__ did not retain the com, span, alpha or halflife attributes (GH 40164)
- ExponentialMovingWindow now raises NotImplementedError when times is specified with adjust=False, because the calculation was incorrect (GH 40098)
- Bug in ExponentialMovingWindowGroupby.mean() where the times argument was ignored when engine='numba' (GH 40951)
- Bug in ExponentialMovingWindowGroupby.mean() where the wrong times were used when there were multiple groups (GH 40951)
- Bug in ExponentialMovingWindowGroupby where the times vector and the values became out of sync for non-trivial groups (GH 40951)
- Bug in Series.asfreq() and DataFrame.asfreq() dropping rows when the index was not sorted (GH 39805)
- Bug in the aggregation functions of DataFrame not respecting the numeric_only argument when the level keyword was given (GH 40660)
- Bug in SeriesGroupBy.aggregate() where using a user-defined function to aggregate a Series with an object-typed Index produced an Index with the wrong shape (GH 40014)
- Bug in RollingGroupby where the as_index=False argument of groupby was ignored (GH 39433)
- Bug in DataFrameGroupBy.any(), SeriesGroupBy.any(), DataFrameGroupBy.all() and SeriesGroupBy.all() raising ValueError on nullable-dtype columns even with skipna=True (GH 40585)
- Bug in DataFrameGroupBy.cummin(), SeriesGroupBy.cummin(), DataFrameGroupBy.cummax() and SeriesGroupBy.cummax() incorrectly rounding integer values near the int64 implementation bounds (GH 40767)
- Bug in DataFrameGroupBy.rank() and SeriesGroupBy.rank() with nullable dtypes incorrectly raising TypeError (GH 41010)
- Bug in DataFrameGroupBy.cummin(), SeriesGroupBy.cummin(), DataFrameGroupBy.cummax() and SeriesGroupBy.cummax() computing wrong results with nullable data types too large to roundtrip when cast to float (GH 37493)
- Bug in DataFrame.rolling() returning a mean of zero for an all-NaN window with min_periods=0 if the calculation was not numerically stable (GH 41053)
- Bug in DataFrame.rolling() returning a sum that was not zero for an all-NaN window with min_periods=0 if the calculation was not numerically stable (GH 41053)
- Bug in SeriesGroupBy.agg() failing to retain an ordered CategoricalDtype in order-preserving aggregations (GH 41147)
- Bug in DataFrameGroupBy.min(), SeriesGroupBy.min(), DataFrameGroupBy.max() and SeriesGroupBy.max() incorrectly raising ValueError with multiple object-dtype columns and numeric_only=False (GH 41111)
- Bug in DataFrameGroupBy.rank() when the GroupBy object used axis=0 and the rank method was called with the keyword axis=1 (GH 41320)
- Bug in DataFrameGroupBy.__getitem__() with non-unique columns incorrectly returning a malformed SeriesGroupBy instead of a DataFrameGroupBy (GH 41427)
- Bug in DataFrameGroupBy.transform() with non-unique columns incorrectly raising AttributeError (GH 41427)
- Bug in Resampler.apply() with non-unique columns incorrectly dropping duplicated columns (GH 41445)
- Bug in Series.groupby() aggregations incorrectly returning an empty Series instead of raising TypeError for aggregations that are invalid for the dtype, e.g. .prod with datetime64[ns] dtype (GH 41342)
- Bug in DataFrameGroupBy aggregations not dropping columns with dtypes invalid for that aggregation when there were no valid columns, resulting in an error (GH 41291)
- Bug in DataFrame.rolling.__iter__() where on was not assigned to the index of the resulting objects (GH 40373)
- Bug in DataFrameGroupBy.transform() and DataFrameGroupBy.agg() with engine="numba" where *args were cached together with the user-passed function (GH 41647)
- Bug in the DataFrameGroupBy methods agg, transform, sum, bfill, ffill, pad, pct_change, shift and ohlc dropping .columns.names (GH 41497)
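Two of the fixes above replace plain accumulation with Kahan (compensated) summation (GH 38778, GH 38934). The pure-Python sketch below illustrates the technique itself on a made-up list of floats; it is not pandas' Cython implementation, and the function names are hypothetical.

```python
import math


def naive_sum(values):
    """Left-to-right floating-point summation with no error correction."""
    total = 0.0
    for v in values:
        total += v
    return total


def kahan_sum(values):
    """Kahan (compensated) summation: carry each addition's rounding error forward."""
    total = 0.0
    compensation = 0.0  # negated rounding error of the previous addition
    for v in values:
        y = v - compensation            # re-inject the error lost last time
        t = total + y                   # low-order bits of y may be rounded away here
        compensation = (t - total) - y  # algebraically 0; in floats, the lost part
        total = t
    return total


# One large value followed by many tiny ones: each 1e-16 is below half an ulp of 1.0.
values = [1.0] + [1e-16] * 10_000

print(naive_sum(values))  # 1.0 (every tiny term is rounded away)
print(kahan_sum(values))  # close to the exact sum of roughly 1 + 1e-12
print(math.fsum(values))  # correctly rounded reference result
```

The naive loop loses every small term because adding 1e-16 to 1.0 rounds back to 1.0, while the compensation variable accumulates those lost pieces until they are large enough to register in the total.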
Reshaping
- Bug in merge() raising an incorrect error when performing an inner join with a partial index and right_index=True when there was no overlap between the indices (GH 33814)
- Bug in DataFrame.unstack() with missing levels leading to incorrect index names (GH 37510)
- Bug in merge_asof() propagating the right index instead of the left index when left_index=True and right_on were specified (GH 33463)
- Bug in DataFrame.join() on a DataFrame with a MultiIndex returning incorrect results when one of the indexes had only one level (GH 36909)
- merge_asof() now raises ValueError instead of an obscure TypeError in the case of non-numeric merge columns (GH 29130)
- Bug in DataFrame.join() not assigning values correctly when the DataFrame had a MultiIndex where at least one dimension had non-alphabetically sorted categories (GH 38502)
- Series.value_counts() and Series.mode() now return consistent keys in their original order (GH 12679, GH 11227 and GH 39007)
- Bug in DataFrame.stack() not handling NaN in MultiIndex columns correctly (GH 39481)
- Bug in DataFrame.apply() giving incorrect results when the func argument was a string and the axis argument was not supported; this now raises ValueError (GH 39211)
- Bug in DataFrame.sort_values() not reshaping the index correctly after sorting on columns when ignore_index=True (GH 39464)
- Bug in DataFrame.append() returning incorrect dtypes when combining ExtensionDtype dtypes (GH 39454)
- Bug in DataFrame.append() returning incorrect dtypes when used with combinations of datetime64 and timedelta64 dtypes (GH 39574)
- Bug in DataFrame.append() when appending a Series whose index is not a MultiIndex to a DataFrame with a MultiIndex (GH 41707)
- Bug in DataFrame.pivot_table() returning a MultiIndex for a single value when operating on an empty DataFrame (GH 13483)
- Index can now be passed to the numpy.all() function (GH 40180)
- Bug in DataFrame.stack() not preserving CategoricalDtype in a MultiIndex (GH 36991)
- Bug in to_datetime() raising an error when the input sequence contained unhashable items (GH 39756)
- Bug in Series.explode() preserving the index when ignore_index was True and the values were scalars (GH 40487)
- Bug in to_datetime() raising ValueError when a Series contained None and NaT and had more than 50 elements (GH 39882)
- Bug in Series.unstack() and DataFrame.unstack() incorrectly raising TypeError for object-dtype values containing timezone-aware datetime objects (GH 41875)
- Bug in DataFrame.melt() raising InvalidIndexError when the DataFrame had duplicate columns used as value_vars (GH 41951)
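The Series.explode() fix (GH 40487) is easy to observe. This sketch assumes pandas 1.3 or later; the Series contents and index labels are made up for illustration.

```python
import pandas as pd

# A Series holding only scalars: explode() has nothing to expand.
scalars = pd.Series([1, 2], index=[10, 20])
print(scalars.explode().index.tolist())                   # [10, 20]
print(scalars.explode(ignore_index=True).index.tolist())  # [0, 1]

# For contrast, a Series with a list-like element.
mixed = pd.Series([[1, 2], 3], index=["a", "b"])
exploded = mixed.explode(ignore_index=True)
print(exploded.tolist(), exploded.index.tolist())  # [1, 2, 3] [0, 1, 2]
```

With the fix, ignore_index=True yields a fresh 0-based index in both cases, whether or not any element was actually expanded.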
Sparse
- Bug in DataFrame.sparse.to_coo() raising KeyError when the columns are a numeric Index without 0 (GH 18414)
- Bug in SparseArray.astype() with copy=False producing incorrect results when converting from an integer dtype to a floating point dtype (GH 34456)
- Bug in SparseArray.max() and SparseArray.min() always returned empty results ([GH 40921](https://github.com/pandas-dev/pa