Sunday, September 8, 2024

Pandas 2.2 documentation

Original text: pandas.pydata.org/docs/

What's new in 1.3.5 (December 12, 2021)

Original text: pandas.pydata.org/docs/whatsnew/v1.3.5.html

These are changes in pandas 1.3.5. Check out the release notes for the full changelog including other releases.

Fixed regressions

  • Fixed regression in Series.equals() when comparing floats with dtype object to None (GH 44190)

  • Fixed regression where merge_asof() raised an error when providing an array as join key (GH 42844)

  • Fixed regression when resampling a DataFrame with a DatetimeIndex, incorrectly raising RuntimeError when there were empty groups and uint8, uint16 or uint32 columns (GH 43329)

  • Fixed regression near daylight saving time transitions when creating a DataFrame from a timezone-aware Timestamp scalar (GH 42505)

  • Fixed performance regression in read_csv() (GH 44106)

  • Fixed regression in Series.duplicated() and Series.drop_duplicates() when the Series has Categorical dtype with boolean categories (GH 44351)

  • Fixed regression in DataFrameGroupBy.sum() and SeriesGroupBy.sum() with timedelta64[ns] dtype containing NaT, failing to treat that value as NA (GH 42659)

  • Fixed regression in RollingGroupby.cov() and RollingGroupby.corr() where extra groups would be incorrectly returned in the result when other had the same shape as each group (GH 42915)
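
As an illustration of the Categorical fix above (GH 44351), the following minimal sketch shows the behavior one would expect on a fixed version. The variable names are hypothetical and the data is made up for the example.

```python
import pandas as pd

# A Series with Categorical dtype whose categories are booleans.
flags = pd.Series([True, False, True], dtype="category")

# duplicated() marks values that repeat an earlier value.
dup_mask = flags.duplicated()

# drop_duplicates() keeps the first occurrence of each value.
deduped = flags.drop_duplicates()

print(dup_mask.tolist())  # [False, False, True]
print(deduped.tolist())   # [True, False]
```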

Contributors

A total of 10 people contributed to this version. People with a “+” by their names contributed for the first time.

  • Ali McMaster

  • Matthew Roeschke

  • Matthew Zeitlin

  • MeeseeksMachine

  • pandas development team

  • Patrick Hoefler

  • Simon Hawkins

  • Thomas Li

  • Tobias Pitters

  • jbrockmendel

What's new in 1.3.4 (October 17, 2021)

Original text: pandas.pydata.org/docs/whatsnew/v1.3.4.html

These are changes in pandas version 1.3.4. Check out the release notes for the full changelog including other versions.

Fixed regressions

  • Fixed regression in DataFrame.convert_dtypes() incorrectly converting byte strings to strings (GH 43183)

  • Fixed regression where DataFrameGroupBy.agg() and SeriesGroupBy.agg() failed silently with mixed data types along axis=1 and a MultiIndex (GH 43209)

  • Fixed regression where merge() with integer and NaN keys failed on outer merge (GH 43550)

  • Fixed regression in DataFrame.corr() raising ValueError when using method="spearman" on 32-bit platforms (GH 43588)

  • Fixed performance regression in MultiIndex.equals() (GH 43549)

  • Fixed performance regression in DataFrameGroupBy.first(), SeriesGroupBy.first(), DataFrameGroupBy.last() and SeriesGroupBy.last() with StringDtype (GH 41596)

  • Fixed regression in Series.cat.reorder_categories() failing to update categories on Series (GH 43232)

  • Fixed regression in setter of Series.cat.categories() failing to update categories on Series (GH 43334)

  • Fixed regression in read_csv() raising UnicodeDecodeError exception when memory_map=True (GH 43540)

  • Fixed regression in DataFrame.explode() raising AssertionError when column was any scalar that was not a string (GH 43314)

  • Fixed regression in Series.aggregate() attempting to pass args and kwargs multiple times to the user-supplied func in some cases (GH 43357)

  • Fixed a regression when iterating over DataFrame.groupby.rolling objects, causing the resulting DataFrame to have incorrect indexes if the input grouping was not sorted (GH 43386)

  • Fixed regression in DataFrame.groupby.rolling.cov() and DataFrame.groupby.rolling.corr(), calculating incorrect results when the input grouping was unsorted (GH 43386)
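
To make the DataFrame.explode() entry above (GH 43314) concrete, here is a small sketch in which the exploded column is labeled with an integer rather than a string. The frame and its labels are invented for illustration.

```python
import pandas as pd

# The column to explode is labeled with the integer 0, a non-string scalar.
df = pd.DataFrame({0: [[1, 2], [3]], "label": ["x", "y"]})

# Each list element becomes its own row; the original index is repeated.
exploded = df.explode(0)

print(exploded[0].tolist())        # [1, 2, 3]
print(exploded["label"].tolist())  # ['x', 'x', 'y']
print(exploded.index.tolist())     # [0, 0, 1]
```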

Bug fixes

  • Fixed bug in pandas.DataFrame.groupby.rolling() and pandas.api.indexers.FixedForwardWindowIndexer causing segfaults and window endpoints to be mixed between groups (GH 43267)

  • Fixed bug in DataFrameGroupBy.mean() and SeriesGroupBy.mean() where datetimelike values including NaT values returned incorrect results (GH 43132)

  • Fixed a bug in Series.aggregate() where the first args was not passed to the user-supplied func in some cases (GH 43357)

  • Fixed memory leak in Series.rolling.quantile() and Series.rolling.median() (GH 43339)
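
The datetimelike mean fix above (GH 43132) can be sketched as follows. This is only an illustrative example with made-up data; it assumes NaT entries are skipped when averaging, so a group containing only NaT yields NaT.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "group": ["a", "a", "b"],
        # Group "b" contains only a missing timestamp (NaT).
        "when": pd.to_datetime(["2021-01-01", "2021-01-03", None]),
    }
)

# The mean of datetimes is the midpoint in time for group "a".
means = df.groupby("group")["when"].mean()

print(means["a"])           # 2021-01-02 00:00:00
print(pd.isna(means["b"]))  # True
```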

Other

  • The minimum Cython version required to compile pandas is now 0.29.24 (GH 43729)

Contributors

A total of 17 people contributed patches to this version. People with a “+” by their names contributed a patch for the first time.

  • Alexey Györi +

  • DSM

  • Irv Lustig

  • Jeff Reback

  • Julien de la Bruère-T +

  • Matthew Zeitlin

  • MeeseeksMachine

  • pandas development team

  • Patrick Hoefler

  • Richard Shadrach

  • Shoham Debnath

  • Simon Hawkins

  • Thomas Li

  • aptalca +

  • jbrockmendel

  • michael-gh +

  • realead

What's new in 1.3.3 (September 12, 2021)

Original text: pandas.pydata.org/docs/whatsnew/v1.3.3.html

These are changes in pandas 1.3.3. See the release notes for the full changelog including other versions of pandas.

Fixed regressions

  • Fixed regression in the DataFrame constructor failing to broadcast for a defined Index and a length-one list of Timestamp (GH 42810)

  • Fixed regression in DataFrameGroupBy.agg() and SeriesGroupBy.agg() incorrectly raising in some cases (GH 42390)

  • Fixed regression in DataFrameGroupBy.apply() and SeriesGroupBy.apply() where nan values were dropped even with dropna=False (GH 43205)

  • Fixed regression where DataFrameGroupBy.quantile() and SeriesGroupBy.quantile() failed when using pandas.NA (GH 42849)

  • Fixed regression in merge() where on columns with ExtensionDtype or bool data types were converted to object in right and outer merges (GH 40073)

  • Fixed regression in RangeIndex.where() and RangeIndex.putmask() raising AssertionError when the result did not represent a RangeIndex (GH 43240)

  • Fixed a regression in read_parquet() where the fastparquet engine did not work properly in fastparquet 0.7.0 (GH 43075)

  • Fixed regression in DataFrame.loc.__setitem__() raising ValueError when setting an array to a cell value (GH 43422)

  • Fixed regression in is_list_like() where objects with __iter__ set to None were recognized as iterables (GH 43373)

  • Fixed regression in DataFrame.__getitem__() where slicing on DatetimeIndex raised an error when the index was non-monotonic (GH 43223)

  • Fixed regression in Resampler.aggregate() which, when used after column selection, would raise an error if func was a list of aggregation functions (GH 42905)

  • Fixed regression in DataFrame.corr() where Kendall correlation would produce incorrect results for columns with repeated values (GH 43401)

  • Fixed a regression in DataFrame.groupby() where aggregating columns with object types would lose results for those columns (GH 42395, GH 43108)

  • Fixed a regression in Series.fillna() that raised a TypeError when filling a float Series with a dtype that cannot be losslessly converted (such as float32 filled with float64) (GH 43424)

  • Fixed regression in read_csv() raising AttributeError when the file handle was a tempfile.SpooledTemporaryFile object (GH 43439)

  • Fixed performance regression in core.window.ewm.ExponentialMovingWindow.mean() (GH 42333)
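
The is_list_like() entry above (GH 43373) concerns objects that opt out of iteration by setting `__iter__` to None. A minimal sketch, using a hypothetical class defined only for this example:

```python
from pandas.api.types import is_list_like


class NotIterable:
    """Hypothetical class that explicitly disables iteration."""

    __iter__ = None


# An object whose __iter__ is None should not count as list-like.
print(is_list_like(NotIterable()))  # False

# Ordinary containers are list-like; strings are not.
print(is_list_like([1, 2, 3]))      # True
print(is_list_like("abc"))          # False
```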

Performance improvements

  • Improved performance of DataFrame.__setitem__() when the key or value is not a DataFrame, or when the key is not list-like (GH 43274)

Bug fixes

  • Fixed bug in DataFrameGroupBy.agg() and DataFrameGroupBy.transform() with engine="numba", where index data was not correctly passed to func (GH 43133)

Contributors

A total of 18 people contributed patches to this release. People with a “+” after their names contributed patches for the first time.

  • Ali McMaster

  • Irv Lustig

  • Matthew Roeschke

  • Matthew Zeitlin

  • MeeseeksMachine

  • pandas development team

  • Patrick Hoefler

  • Prerana Chakraborty +

  • Richard Shadrach

  • Shoham Debnath

  • Simon Hawkins

  • Thomas Li

  • Torsten Wörtwein

  • Zach Rait +

  • aiudirog +

  • attack68

  • jbrockmendel

  • suoniq +

What's new in 1.3.2 (August 15, 2021)

Original text: pandas.pydata.org/docs/whatsnew/v1.3.2.html

These are changes in pandas version 1.3.2. Check out the release notes for the full changelog including other versions of pandas.

Fixed regressions

  • Performance regression in DataFrame.isin() and Series.isin() for nullable data types (GH 42714)

  • Regression in updating values of a Series using a boolean index created with DataFrame.pop() (GH 42530)

  • Regression in DataFrame.from_records() with empty records (GH 42456)

  • Fixed regression in DataFrame.shift() raising TypeError when shifting a DataFrame created by concatenating slices and filling with values (GH 42719)

  • Regression in DataFrame.agg() when the func argument returned lists and axis=1 (GH 42727)

  • Regression in DataFrame.drop() not working when there are duplicates in MultiIndex and the indexer is a tuple or a list of tuples (GH 42771)

  • Fixed regression in read_csv() raising ValueError when parameters names and prefix are both set to None (GH 42387)

  • Fixed regression in comparisons between Timestamp objects and datetime64 objects outside the implementation bounds for nanosecond datetime64 (GH 42794)

  • Fixed a regression in Styler.highlight_min() and Styler.highlight_max() where pandas.NA was not successfully ignored (GH 42650)

  • Fixed regression in concat() where copy=False was not respected when concatenating Series along axis=1 (GH 42501)

  • Regression in Series.nlargest() and Series.nsmallest() with nullable integer or float dtype (GH 42816)

  • Fixed regression in Series.quantile() with Int64Dtype (GH 42626)

  • Fixed regression in Series.groupby() and DataFrame.groupby() where using a tuple-named Series as the by parameter would incorrectly raise an exception (GH 42731)
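
The nullable-dtype entries above (GH 42816, GH 42626) can be illustrated with a short sketch. The data is made up, and the example assumes that missing values are skipped by both methods.

```python
import pandas as pd

# A nullable integer Series with one missing value.
values = pd.Series([3, 1, pd.NA, 2], dtype="Int64")

# nlargest() returns the largest entries, ignoring the missing value.
top_two = values.nlargest(2)

# quantile() on Int64 data; the median of 1, 2, 3 is 2.
median = values.quantile(0.5)

print(top_two.tolist())        # [3, 2]
print(top_two.index.tolist())  # [0, 3]
print(median)
```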

Bug fixes

  • Bug in read_excel() modifying the dtypes dictionary when reading a file with duplicate columns (GH 42462)

  • 1D slices over extension types turn into N-dimensional slices over ExtensionArrays (GH 42430)

  • Fixed a bug in Series.rolling() and DataFrame.rolling() where the window bounds were not calculated correctly for the first row when center=True and window was an offset that covered all rows (GH 42753)

  • Styler.hide_columns() now hides index name header rows and column headers (GH 42101)

  • Styler.set_sticky() has modified CSS to control column/index names and ensure correct sticky positioning (GH 42537)

  • Bug in deserializing datetime indexes in PYTHONOPTIMIZED mode (GH 42866)
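
The rolling fix above (GH 42753) covers the case where a centered, offset-based window spans every row. A minimal sketch with made-up daily data, where a 10-day centered window covers all five observations so each row, including the first, should see the full sum:

```python
import pandas as pd

# Five daily observations: 0, 1, 2, 3, 4.
index = pd.date_range("2021-01-01", periods=5, freq="D")
series = pd.Series(range(5), index=index)

# A centered 10-day window reaches 5 days to each side of every row,
# so it covers the whole series for every row.
sums = series.rolling("10D", center=True).sum()

print(sums.tolist())  # [10.0, 10.0, 10.0, 10.0, 10.0]
```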

Contributors

A total of 16 people contributed patches to this version. People with a “+” after their names contributed patches for the first time.

  • Alexander Gorodetsky +

  • Fangchen Li

  • Fred Reiss

  • Justin McOmie +

  • Matthew Zeitlin

  • MeeseeksMachine

  • pandas development team

  • Patrick Hoefler

  • Richard Shadrach

  • Shoham Debnath

  • Simon Hawkins

  • Thomas Li

  • Wenjun Si

  • attack68

  • dicristina +

  • jbrockmendel

What's new in 1.3.1 (July 25, 2021)

Original text: pandas.pydata.org/docs/whatsnew/v1.3.1.html

These are changes in pandas 1.3.1. Check out the release notes for the full changelog including other versions of pandas.

Fixed regressions

  • pandas could not be built on PyPy (GH 42355)

  • A DataFrame constructed with an older version of pandas could not be unpickled (GH 42345)

  • Performance regression in constructing a DataFrame from a dictionary of dictionaries (GH 42248)

  • Fixed regression in DataFrame.agg() dropping values when the DataFrame had an extension array dtype, a duplicate index, and axis=1 (GH 42380)

  • Fixed regression in DataFrame.astype() changing the order of non-contiguous data (GH 42396)

  • Performance regression in DataFrame in reduction operations that require casting, such as DataFrame.mean() on integer data (GH 38592)

  • Performance regression in DataFrame.to_dict() and Series.to_dict() when the orient argument is one of "records", "dict", or "split" (GH 42352)

  • Fixed regression that raised TypeError when indexing using list subclasses (GH 42433, GH 42461)

  • Fixed regression in DataFrame.isin() and Series.isin() raising TypeError with nullable data containing at least one missing value (GH 42405)

  • Regression in concat() between objects with bool dtype and integer dtype casting to object instead of to integer (GH 42092)

  • Bug in Series constructor not accepting a dask.Array (GH 38645)

  • Fixed regression in SettingWithCopyWarning showing incorrect stacklevel (GH 42570)

  • Fixed regression in merge_asof() raising KeyError when one of the by columns was in the index (GH 34488)

  • Fixed regression in to_datetime() returning pd.NaT for inputs that produce duplicated values when cache=True (GH 42259)

  • Fixed a regression in SeriesGroupBy.value_counts() that caused an IndexError when called on a Series with only one row (GH 42618)
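
For the isin() entry above (GH 42405), the following sketch shows a nullable Series that contains a missing value. The data is invented for the example; it assumes the missing entry is reported as not matching when NA is not among the looked-up values.

```python
import pandas as pd

# Nullable integer data with at least one missing value.
nullable = pd.Series([1, 2, pd.NA], dtype="Int64")

# isin() should return a mask instead of raising TypeError.
mask = nullable.isin([1, 2])

print(mask.tolist())  # [True, True, False]
```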

Bug fixes

  • Fixed bug where DataFrame.transpose() dropped values when the DataFrame had an extension array dtype and a duplicate index (GH 42380)

  • Fixed a bug in DataFrame.to_xml() that raised a KeyError when called with index=False and an offset index (GH 42458)

  • Fixed a bug in Styler.set_sticky(), which did not correctly handle the index name in the case of a single index column (GH 42537)

  • Fixed bug in DataFrame.copy() failing to consolidate blocks in the result (GH 42579)
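
The transpose fix above (GH 42380) can be sketched with a small frame that combines a nullable extension dtype with a duplicated index label. The frame is hypothetical and exists only to show that no values are lost.

```python
import pandas as pd

# Both columns use the nullable Int64 extension dtype;
# the index label "x" appears twice.
df = pd.DataFrame(
    {
        "a": pd.array([1, 2], dtype="Int64"),
        "b": pd.array([3, 4], dtype="Int64"),
    },
    index=["x", "x"],
)

transposed = df.T

print(transposed.shape)             # (2, 2)
print(transposed.columns.tolist())  # ['x', 'x']
print(transposed.values.tolist())   # [[1, 2], [3, 4]]
```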

Contributors

A total of 17 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.

  • Fangchen Li

  • Live +

  • Matthew Roeschke

  • Matthew Zeitlin

  • MeeseeksMachine

  • pandas development team

  • Patrick Hoefler

  • Richard Shadrach

  • Shoham Debnath +

  • Simon Hawkins

  • Stephan Heßelmann +

  • Stephen +

  • Thomas Li

  • Zheyuan +

  • attack68

  • jbrockmendel

  • neelmraman +

What's new in 1.3.0 (July 2, 2021)

Original text: pandas.pydata.org/docs/whatsnew/v1.3.0.html

These are changes in pandas 1.3.0. See the Release notes for a complete changelog including other versions of pandas.

Warning

When reading new Excel 2007+ (.xlsx) files, the default argument engine=None to read_excel() now results in using the openpyxl engine in all cases when the option io.excel.xlsx.reader is set to "auto". Previously, the xlrd engine would be used in some cases. For background on this change, see What's new 1.2.0.

Enhancements

Customize HTTP(s) headers when reading csv or json files

When reading from remote URLs that are not handled by fsspec (such as HTTP and HTTPS), the dictionary passed to storage_options will be used to create the headers included in the request. This can be used to control the User-Agent header or send other custom headers (GH 36688). For example:

```py
In [1]: headers = {"User-Agent": "pandas"}
In [2]: df = pd.read_csv(
   ...:     "https://download.bls.gov/pub/time.series/cu/cu.item",
   ...:     sep="\t",
   ...:     storage_options=headers
   ...: )
```

### Read and write XML documents

We added I/O support to read and render shallow versions of [XML](https://www.w3.org/standards/xml/core) documents with `read_xml()` and `DataFrame.to_xml()`. Using [lxml](https://lxml.de) as the parser, both XPath 1.0 and XSLT 1.0 are available ([GH 27554](https://github.com/pandas-dev/pandas/issues/27554)).

```py
In [1]: xml = """<?xml version='1.0' encoding='utf-8'?>
 ...: <data>
 ...: <row>
 ...:    <shape>square</shape>
 ...:    <degrees>360</degrees>
 ...:    <sides>4.0</sides>
 ...: </row>
 ...: <row>
 ...:    <shape>circle</shape>
 ...:    <degrees>360</degrees>
 ...:    <sides/>
 ...: </row>
 ...: <row>
 ...:    <shape>triangle</shape>
 ...:    <degrees>180</degrees>
 ...:    <sides>3.0</sides>
 ...: </row>
 ...: </data>"""

In [2]: df = pd.read_xml(xml)
In [3]: df
Out[3]:
      shape  degrees  sides
0    square      360    4.0
1    circle      360    NaN
2  triangle      180    3.0

In [4]: df.to_xml()
Out[4]:
<?xml version='1.0' encoding='utf-8'?>
<data>
  <row>
    <index>0</index>
    <shape>square</shape>
    <degrees>360</degrees>
    <sides>4.0</sides>
  </row>
  <row>
    <index>1</index>
    <shape>circle</shape>
    <degrees>360</degrees>
    <sides/>
  </row>
  <row>
    <index>2</index>
    <shape>triangle</shape>
    <degrees>180</degrees>
    <sides>3.0</sides>
  </row>
</data>
```

For more information, see Writing XML in the IO Tools User Guide.

### Styler enhancements

We've done some focused development on <code>Styler</code>. See the revised and improved Styler documentation (GH 39720, GH 39317, GH 40493).

<blockquote>
  <ul>
  <li><p>The method <code>Styler.set_table_styles()</code> now accepts more natural CSS language for arguments, such as <code>'color:red;'</code> instead of <code>[('color', 'red')]</code> (GH 39563)</p></li>
  <li><p>Methods <code>Styler.highlight_null()</code>, <code>Styler.highlight_min()</code> and <code>Styler.highlight_max()</code> now allow custom CSS highlighting instead of the default background coloring (GH 40242)</p></li>
  <li><p><code>Styler.apply()</code> now accepts functions that return <code>ndarray</code> when <code>axis=None</code>, making it consistent with the behavior of <code>axis=0</code> and <code>axis=1</code> (GH 39359)</p></li>
  <li><p>When providing malformed CSS via <code>Styler.apply()</code> or <code>Styler.applymap()</code>, an error is now thrown when rendering (GH 39660)</p></li>
  <li><p><code>Styler.format()</code> now accepts keyword argument <code>escape</code> for optional HTML and LaTeX escaping (GH 40388, GH 41619)</p></li>
  <li><p><code>Styler.background_gradient()</code> added parameter <code>gmap</code> to provide a specific gradient map for coloring (GH 22727)</p></li>
  <li><p><code>Styler.clear()</code> now also clears <code>Styler.hidden_index</code> and <code>Styler.hidden_columns</code> (GH 40484)</p></li>
  <li><p>Added method <code>Styler.highlight_between()</code> (GH 39821)</p></li>
  <li><p>Added method <code>Styler.highlight_quantile()</code> (GH 40926)</p></li>
  <li><p>Added method <code>Styler.text_gradient()</code> (GH 41098)</p></li>
  <li><p>Added method <code>Styler.set_tooltips()</code> to allow hover tooltips; this can enhance interactive display (GH 21266, GH 40284)</p></li>
  <li><p>Added parameter <code>precision</code> to method <code>Styler.format()</code> to control the display of floating point numbers (GH 40134)</p></li>
  <li><p>HTML output rendered by <code>Styler</code> now follows the w3 HTML Style Guide (GH 39626)</p></li>
  <li><p>Many features of the <code>Styler</code> class are now partially or fully available for DataFrames with non-unique indexes or columns (GH 41143)</p></li>
  <li><p>Better control over display by sparsifying indexes or columns individually using new styler options, which are also available via <code>option_context()</code> (GH 41142)</p></li>
  <li><p>Added option <code>styler.render.max_elements</code> to avoid browser overload when styling large DataFrames (GH 40712)</p></li>
  <li><p>Added method <code>Styler.to_latex()</code> (GH 21673, GH 42320), which also allows some limited CSS transformations (GH 40731)</p></li>
  <li><p>Added method <code>Styler.to_html()</code> (GH 13379)</p></li>
  <li><p>Added method <code>Styler.set_sticky()</code> to make index and column headers permanently visible in scrolling HTML frames (GH 29072)</p></li>
  </ul>
</blockquote>

### DataFrame constructor respects <code>copy=False</code>

<p>When passing a dictionary to <code>DataFrame</code> with <code>copy=False</code>, a copy will no longer be made (GH 32960).</p>

<pre><code class="language-python line-numbers">In [1]: arr = np.array([1, 2, 3])

In [2]: df = pd.DataFrame({"A": arr, "B": arr.copy()}, copy=False)

In [3]: df
Out[3]: 
   A  B
0  1  1
1  2  2
2  3  3 
</code></pre>

<code>df["A"]</code> is still a view of <code>arr</code>:

<pre><code class="language-python line-numbers">In [4]: arr[0] = 0

In [5]: assert df.iloc[0, 0] == 0 
</code></pre>

When no <code>copy</code> argument is passed, the default behavior remains unchanged, i.e. a copy is made.

### PyArrow backed string data type

We've enhanced <code>StringDtype</code>, an extension type dedicated to string data. (GH 39908)

It is now possible to specify a <code>storage</code> keyword option to <code>StringDtype</code>. Use pandas options or specify the dtype using <code>dtype='string[pyarrow]'</code> to allow the StringArray to be backed by a PyArrow array instead of a NumPy array of Python objects.

The PyArrow-backed StringArray requires pyarrow 1.0.0 or higher to be installed.

Warning: <code>string[pyarrow]</code> is currently considered experimental. The implementation and parts of the API may change without warning.
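
Because the PyArrow-backed storage depends on an optional package, code that must run with or without pyarrow may want to choose the storage at runtime. The following is a minimal sketch, not pandas behavior: the fallback to <code>"python"</code> storage is an illustrative policy, and an installed but too-old pyarrow would still raise when the array is built.

```python
import importlib.util

import pandas as pd

# Pick PyArrow storage only when pyarrow can be imported; otherwise fall back
# to the default Python-object storage. (Illustrative policy, not a pandas rule.)
has_pyarrow = importlib.util.find_spec("pyarrow") is not None
storage = "pyarrow" if has_pyarrow else "python"

s = pd.Series(["abc", None, "def"], dtype=pd.StringDtype(storage=storage))

print(s.dtype.storage)    # the storage that was requested
print(s.isna().tolist())  # the None entry is stored as a missing value
```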

<pre><code class="language-python line-numbers">In [6]: pd.Series(['abc', None, 'def'], dtype=pd.StringDtype(storage="pyarrow"))
Out[6]: 
0     abc
1    <NA>
2     def
dtype: string 
</code></pre>

You can also use the alias <code>"string[pyarrow]"</code>.

<pre><code class="language-python line-numbers">In [7]: s = pd.Series(['abc', None, 'def'], dtype="string[pyarrow]")

In [8]: s
Out[8]: 
0     abc
1    <NA>
2     def
dtype: string 
</code></pre>

You can also create a PyArrow-backed string array using pandas options.

<pre><code class="language-python line-numbers">In [9]: with pd.option_context("string_storage", "pyarrow"):
 ...:    s = pd.Series(['abc', None, 'def'], dtype="string")
 ...: 

In [10]: s
Out[10]: 
0     abc
1    <NA>
2     def
dtype: string 
</code></pre>

The usual string accessor methods work. Where appropriate, the return type of the Series or columns of a DataFrame will also have string dtype.

<pre><code class="language-python line-numbers">In [11]: s.str.upper()
Out[11]: 
0     ABC
1    <NA>
2     DEF
dtype: string

In [12]: s.str.split('b', expand=True).dtypes
Out[12]: 
0    string[pyarrow]
1    string[pyarrow]
dtype: object 
</code></pre>

String accessor methods returning integers will return a value with <code>Int64Dtype</code>.

<pre><code class="language-python line-numbers">In [13]: s.str.count("a")
Out[13]: 
0       1
1    <NA>
2       0
dtype: Int64 
</code></pre>

### Centered datetime-like rolling windows

Centered datetime windows are now available when performing rolling calculations on DataFrame and Series objects ([GH 38780](https://github.com/pandas-dev/pandas/issues/38780)). For example:

```py
In [14]: df = pd.DataFrame(
 ....:    {"A": [0, 1, 2, 3, 4]}, index=pd.date_range("2020", periods=5, freq="1D")
 ....: )
 ....: 

In [15]: df
Out[15]: 
            A
2020-01-01  0
2020-01-02  1
2020-01-03  2
2020-01-04  3
2020-01-05  4

In [16]: df.rolling("2D", center=True).mean()
Out[16]: 
              A
2020-01-01  0.5
2020-01-02  1.5
2020-01-03  2.5
2020-01-04  3.5
2020-01-05  4.0 
```

### Other enhancements

+ `DataFrame.rolling()`, `Series.rolling()`, `DataFrame.expanding()` and `Series.expanding()` now support a `method` argument with a `'table'` option that performs the windowing operation over an entire `DataFrame`. See the window overview for performance and functionality benefits ([GH 15095](https://github.com/pandas-dev/pandas/issues/15095), [GH 38995](https://github.com/pandas-dev/pandas/issues/38995))

+ `ExponentialMovingWindow` now supports an `online` method that can perform `mean` calculations in an online fashion. See the window overview ([GH 41673](https://github.com/pandas-dev/pandas/issues/41673))

+ Added `MultiIndex.dtypes()` ([GH 37062](https://github.com/pandas-dev/pandas/issues/37062))

+ Added `end` and `end_day` options to the `origin` parameter of `DataFrame.resample()` ([GH 37804](https://github.com/pandas-dev/pandas/issues/37804))

+ Improved error message when `usecols` and `names` do not match for `read_csv()` with `engine="c"` ([GH 29042](https://github.com/pandas-dev/pandas/issues/29042))

+ Improved consistency of error messages when passing invalid `win_type` parameters in window methods ([GH 15969](https://github.com/pandas-dev/pandas/issues/15969))

+ `read_sql_query()` now accepts a `dtype` parameter to convert column data in the SQL database based on user input ([GH 10285](https://github.com/pandas-dev/pandas/issues/10285))

+ `read_csv()` now raises a `ParserWarning` if the length of the header or the given names does not match the length of the data when `usecols` is not specified ([GH 21768](https://github.com/pandas-dev/pandas/issues/21768))

+ Improved pandas to SQLAlchemy integer type mapping when using `DataFrame.to_sql()` ([GH 35076](https://github.com/pandas-dev/pandas/issues/35076))

+ `to_numeric()` now supports downcasting of nullable `ExtensionDtype` objects ([GH 33013](https://github.com/pandas-dev/pandas/issues/33013))

+ Added support for dictionary-like names in `MultiIndex.set_names` and `MultiIndex.rename` ([GH 20421](https://github.com/pandas-dev/pandas/issues/20421))

+ `read_excel()` can now auto-detect .xlsb files and older .xls files ([GH 35416](https://github.com/pandas-dev/pandas/issues/35416), [GH 41225](https://github.com/pandas-dev/pandas/issues/41225))

+ `ExcelWriter` now accepts an `if_sheet_exists` parameter to control the behavior of append mode when writing to existing sheets ([GH 40230](https://github.com/pandas-dev/pandas/issues/40230))

+ `Rolling.sum()`, `Expanding.sum()`, `Rolling.mean()`, `Expanding.mean()`, `ExponentialMovingWindow.mean()`, `Rolling.median()`, `Expanding.median()`, `Rolling.max()`, `Expanding.max()`, `Rolling.min()` and `Expanding.min()` now support [Numba](http://numba.pydata.org/) execution with the `engine` keyword ([GH 38895](https://github.com/pandas-dev/pandas/issues/38895), [GH 41267](https://github.com/pandas-dev/pandas/issues/41267))

+ `DataFrame.apply()` can now accept NumPy unary operators as strings, e.g. `df.apply("sqrt")`, which was already the case for `Series.apply()` ([GH 39116](https://github.com/pandas-dev/pandas/issues/39116))

+ `DataFrame.apply()` can now accept non-callable DataFrame properties as strings, e.g. `df.apply("size")`, which was already the case for `Series.apply()` ([GH 39116](https://github.com/pandas-dev/pandas/issues/39116))

+ `DataFrame.applymap()` can now accept kwargs to pass to a user-supplied `func` ([GH 39987](https://github.com/pandas-dev/pandas/issues/39987))

+ Passing a `DataFrame` as an indexer to `iloc` is no longer allowed in `Series.__getitem__()` and `DataFrame.__getitem__()` ([GH 39004](https://github.com/pandas-dev/pandas/issues/39004))

+ `Series.apply()` can now accept list-like or dictionary-like arguments that aren't lists or dictionaries, e.g. `ser.apply(np.array(["sum", "mean"]))`, which was already the case for `DataFrame.apply()` ([GH 39140](https://github.com/pandas-dev/pandas/issues/39140))

+ `DataFrame.plot.scatter()` can now accept a categorical column as parameter `c` ([GH 12380](https://github.com/pandas-dev/pandas/issues/12380), [GH 31357](https://github.com/pandas-dev/pandas/issues/31357))

+ `Series.loc()` now raises a helpful error message when the Series has a `MultiIndex` and the indexer has too many dimensions ([GH 35349](https://github.com/pandas-dev/pandas/issues/35349))

+ `read_stata()` now supports reading data from compressed files ([GH 26599](https://github.com/pandas-dev/pandas/issues/26599))

+ Added support for parsing `ISO 8601`-like timestamps with negative signs to `Timedelta` ([GH 37172](https://github.com/pandas-dev/pandas/issues/37172))

+ Added support for unary operators in `FloatingArray` ([GH 38749](https://github.com/pandas-dev/pandas/issues/38749))

+ `RangeIndex` can now be constructed directly by passing a `range` object, e.g. `pd.RangeIndex(range(3))` ([GH 12067](https://github.com/pandas-dev/pandas/issues/12067))

+ `Series.round()` and `DataFrame.round()` now handle nullable integer and floating point types ([GH 38844](https://github.com/pandas-dev/pandas/issues/38844))

+ `read_csv()` and `read_json()` provide parameter `encoding_errors` to control how encoding errors are handled ([GH 39450](https://github.com/pandas-dev/pandas/issues/39450))

+ `DataFrameGroupBy.any()`, `SeriesGroupBy.any()`, `DataFrameGroupBy.all()`, and `SeriesGroupBy.all()` now use Kleene logic with nullable data types ([GH 37506](https://github.com/pandas-dev/pandas/issues/37506))

+ `DataFrameGroupBy.any()`, `SeriesGroupBy.any()`, `DataFrameGroupBy.all()`, and `SeriesGroupBy.all()` now return a `BooleanDtype` for columns with nullable data types ([GH 33449](https://github.com/pandas-dev/pandas/issues/33449))

+ Fixed `DataFrameGroupBy.any()`, `SeriesGroupBy.any()`, `DataFrameGroupBy.all()`, and `SeriesGroupBy.all()` raising with `object` data containing `pd.NA` even when `skipna=True` ([GH 37501](https://github.com/pandas-dev/pandas/issues/37501))

+ `DataFrameGroupBy.rank()` and `SeriesGroupBy.rank()` now support object dtype data ([GH 38278](https://github.com/pandas-dev/pandas/issues/38278))

+ Constructing a `DataFrame` or `Series` with the `data` argument being a Python iterable that is *not* a NumPy `ndarray` consisting of NumPy scalars will now result in a dtype with a precision the maximum of the NumPy scalars; this was already the case when `data` is a NumPy `ndarray` ([GH 40908](https://github.com/pandas-dev/pandas/issues/40908))

+ Added keyword `sort` to `pivot_table()` to allow non-sorting of the result ([GH 39143](https://github.com/pandas-dev/pandas/issues/39143))

+ Added keyword `dropna` to `DataFrame.value_counts()` to allow counting rows that include `NA` values ([GH 41325](https://github.com/pandas-dev/pandas/issues/41325))

+ `Series.replace()` will now cast results to `PeriodDtype` where possible instead of `object` dtype ([GH 41526](https://github.com/pandas-dev/pandas/issues/41526))

+ Improved error messages in the `corr` and `cov` methods of `Rolling`, `Expanding` and `ExponentialMovingWindow` when `other` is not a `DataFrame` or `Series` ([GH 41741](https://github.com/pandas-dev/pandas/issues/41741))

+ `Series.between()` can now accept `left` or `right` as arguments to `inclusive` to include only the left or right boundary ([GH 40245](https://github.com/pandas-dev/pandas/issues/40245))

+ `DataFrame.explode()` now supports exploding multiple columns. Its `column` parameter now also accepts a list of str or tuples for exploding on multiple columns at the same time ([GH 39240](https://github.com/pandas-dev/pandas/issues/39240))

+ `DataFrame.sample()` now accepts the `ignore_index` parameter to reset the index after sampling, similar to `DataFrame.drop_duplicates()` and `DataFrame.sort_values()` ([GH 38581](https://github.com/pandas-dev/pandas/issues/38581))

## Notable bug fixes

These are bug fixes that may have significant behavior changes.

### `Categorical.unique` now always maintains same dtype as original

Previously, when calling `Categorical.unique()` with categorical data, unused categories in the new array would be removed, making the dtype of the new array different from the original ([GH 18291](https://github.com/pandas-dev/pandas/issues/18291))

For example, given:

<pre><code class="language-python line-numbers">
In [17]: dtype = pd.CategoricalDtype(['bad', 'neutral', 'good'], ordered=True)

In [18]: cat = pd.Categorical(['good', 'good', 'bad', 'bad'], dtype=dtype)

In [19]: original = pd.Series(cat)

In [20]: unique = original.unique() 
</code></pre>

<em>Previous Behavior</em>:

<pre><code class="language-python line-numbers">In [1]: unique
['good', 'bad']
Categories (2, object): ['bad' < 'good']
In [2]: original.dtype == unique.dtype
False 
</code></pre>

<em>New Behavior</em>:

<pre><code class="language-python line-numbers">In [21]: unique
Out[21]: 
['good', 'bad']
Categories (3, object): ['bad' < 'neutral' < 'good']

In [22]: original.dtype == unique.dtype
Out[22]: True 
</code></pre>

### Preserve dtypes in `DataFrame.combine_first()`

`DataFrame.combine_first()` now preserves data types ([GH 7509](https://github.com/pandas-dev/pandas/issues/7509))

<pre><code class="language-python line-numbers">
In [23]: df1 = pd.DataFrame({"A": [1, 2, 3], "B": [1, 2, 3]}, index=[0, 1, 2])

In [24]: df1
Out[24]: 
   A  B
0  1  1
1  2  2
2  3  3

In [25]: df2 = pd.DataFrame({"B": [4, 5, 6], "C": [1, 2, 3]}, index=[2, 3, 4])

In [26]: df2
Out[26]: 
   B  C
2  4  1
3  5  2
4  6  3

In [27]: combined = df1.combine_first(df2) 
</code></pre>

<em>Previous Behavior</em>:

<pre><code class="language-python line-numbers">In [1]: combined.dtypes
Out[1]:
A    float64
B    float64
C    float64
dtype: object 
</code></pre>

<em>New Behavior</em>:

<pre><code class="language-python line-numbers">In [28]: combined.dtypes
Out[28]: 
A    float64
B      int64
C    float64
dtype: object 
</code></pre>

### Groupby methods agg and transform no longer change return dtype for callables

Previously, the methods `DataFrameGroupBy.aggregate()`, `SeriesGroupBy.aggregate()`, `DataFrameGroupBy.transform()` and `SeriesGroupBy.transform()` might cast the result dtype when the argument `func` is callable, possibly leading to undesirable results ([GH 21240](https://github.com/pandas-dev/pandas/issues/21240)). The cast would occur if the result is numeric and casting back to the input dtype does not change any values, as measured by `np.allclose`. Now no such casting occurs.

<pre><code class="language-python line-numbers">
In [29]: df = pd.DataFrame({'key': [1, 1], 'a': [True, False], 'b': [True, True]})

In [30]: df
Out[30]: 
   key      a     b
0    1   True  True
1    1  False  True 
</code></pre>

<em>Previous Behavior</em>:

<pre><code class="language-python line-numbers">In [5]: df.groupby('key').agg(lambda x: x.sum())
Out[5]:
        a  b
key
1    True  2
</code></pre>

<em>New Behavior</em>:

<pre><code class="language-python line-numbers">In [31]: df.groupby('key').agg(lambda x: x.sum())
Out[31]: 
     a  b
key
1    1  2
</code></pre>

### `float` result for `DataFrameGroupBy.mean()`, `DataFrameGroupBy.median()`, and `DataFrameGroupBy.var()`, `SeriesGroupBy.mean()`, `SeriesGroupBy.median()`, and `SeriesGroupBy.var()`

Previously, these methods could produce different data types depending on the input value. These methods will now always return floating point data types. ([GH 41137](https://github.com/pandas-dev/pandas/issues/41137))

<pre><code class="language-python line-numbers">
In [32]: df = pd.DataFrame({'a': [True], 'b': [1], 'c': [1.0]}) 
</code></pre>

<em>Previous Behavior</em>:

<pre><code class="language-python line-numbers">In [5]: df.groupby(df.index).mean()
Out[5]:
        a  b    c
0    True  1  1.0
</code></pre>

<em>New Behavior</em>:

<pre><code class="language-python line-numbers">In [33]: df.groupby(df.index).mean()
Out[33]: 
     a    b    c
0  1.0  1.0  1.0
</code></pre>

### Try operating inplace when setting values with `loc` and `iloc`

When setting an entire column using `loc` or `iloc`, pandas will try to insert the values into the existing data rather than create an entirely new array.

<pre><code class="language-python line-numbers">
In [34]: df = pd.DataFrame(range(3), columns=["A"], dtype="float64")

In [35]: values = df.values

In [36]: new = np.array([5, 6, 7], dtype="int64")

In [37]: df.loc[[0, 1, 2], "A"] = new 
</code></pre>

In both the new and old behavior, the data in <code>values</code> is overwritten, but in the old behavior the dtype of <code>df["A"]</code> changed to <code>int64</code>.

<em>Previous Behavior</em>:

<pre><code class="language-python line-numbers">In [1]: df.dtypes
Out[1]:
A    int64
dtype: object
In [2]: np.shares_memory(df["A"].values, new)
Out[2]: False
In [3]: np.shares_memory(df["A"].values, values)
Out[3]: False 
</code></pre>

In pandas 1.3.0, <code>df</code> still shares data with <code>values</code>

<em>New Behavior</em>:

<pre><code class="language-python line-numbers">In [38]: df.dtypes
Out[38]: 
A    float64
dtype: object

In [39]: np.shares_memory(df["A"], new)
Out[39]: False

In [40]: np.shares_memory(df["A"], values)
Out[40]: True 
</code></pre>

### Never operate inplace when setting `frame[keys] = values`

When setting multiple columns using `frame[keys] = values`, new arrays will replace the pre-existing arrays for these keys, which will *not* be overwritten ([GH 39510](https://github.com/pandas-dev/pandas/issues/39510)). As a result, the columns will retain the dtype(s) of `values`, never casting to the dtypes of the existing arrays.

<pre><code class="language-python line-numbers">
In [41]: df = pd.DataFrame(range(3), columns=["A"], dtype="float64")

In [42]: df[["A"]] = 5 
</code></pre>

In the old behavior, <code>5</code> was cast to <code>float64</code> and inserted into the existing array backing <code>df</code>:

<em>Previous Behavior</em>:

<pre><code class="language-python line-numbers">In [1]: df.dtypes
Out[1]:
A    float64 
</code></pre>

In the new behavior, we get a new array and retain an integer <code>5</code>:

<em>New Behavior</em>:

<pre><code class="language-shell line-numbers">In [43]: df.dtypes
Out[43]: 
A    int64
dtype: object 
</code></pre>

### Consistent casting when setting into a boolean Series

Setting non-boolean values into a `Series` with `dtype=bool` now consistently casts to `dtype=object` ([GH 38709](https://github.com/pandas-dev/pandas/issues/38709))

<pre><code class="language-python line-numbers">
In [1]: orig = pd.Series([True, False])

In [2]: ser = orig.copy()

In [3]: ser.iloc[1] = np.nan

In [4]: ser2 = orig.copy()

In [5]: ser2.iloc[1] = 2.0 
</code></pre>

<em>Previous Behavior</em>:

<pre><code class="language-shell line-numbers">In [1]: ser
Out[1]:
0    1.0
1    NaN
dtype: float64

In [2]: ser2
Out[2]:
0    True
1     2.0
dtype: object 
</code></pre>

<em>New Behavior</em>:

<pre><code class="language-python line-numbers">In [1]: ser
Out[1]:
0    True
1     NaN
dtype: object

In [2]: ser2
Out[2]:
0    True
1     2.0
dtype: object 
</code></pre>

### DataFrameGroupBy.rolling and SeriesGroupBy.rolling no longer return grouped-by column in values

Grouping columns are now removed from the results of `groupby.rolling` operations ([GH 32262](https://github.com/pandas-dev/pandas/issues/32262))

<pre><code class="language-python line-numbers">
In [44]: df = pd.DataFrame({"A": [1, 1, 2, 3], "B": [0, 1, 2, 3]})

In [45]: df
Out[45]: 
   A  B
0  1  0
1  1  1
2  2  2
3  3  3 
</code></pre>

<em>Previous Behavior</em>:

<pre><code class="language-python line-numbers">In [1]: df.groupby("A").rolling(2).sum()
Out[1]:
         A    B
A
1 0    NaN  NaN
  1    2.0  1.0
2 2    NaN  NaN
3 3    NaN  NaN
</code></pre>

<em>New Behavior</em>:

<pre><code class="language-python line-numbers">In [46]: df.groupby("A").rolling(2).sum()
Out[46]: 
       B
A
1 0  NaN
  1  1.0
2 2  NaN
3 3  NaN
</code></pre>

### Removed artificial truncation in rolling variance and standard deviation

`Rolling.std()` and `Rolling.var()` will no longer artificially truncate results that are less than `~1e-8` and `~1e-15` respectively to zero ([GH 37051](https://github.com/pandas-dev/pandas/issues/37051), [GH 40448](https://github.com/pandas-dev/pandas/issues/40448), [GH 39872](https://github.com/pandas-dev/pandas/issues/39872)).

However, floating point artifacts may now exist in the results when rolling over larger values.
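
To see the effect of removing the cutoff, the sketch below rolls over values small enough that their variance falls far below the old thresholds. The input values are made up for illustration; on pandas 1.3 or later the last entry should be close to `1e-18` rather than being truncated to zero.

```python
import pandas as pd

# Three tiny values whose sample variance (about 1e-18) lies far below the
# old ~1e-15 cutoff described above. The numbers are illustrative only.
tiny = pd.Series([1e-9, 2e-9, 3e-9])

variance = tiny.rolling(3).var()
last = variance.iloc[-1]

print(variance)
print(last > 0)  # no longer forced to zero
```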

```py
In [47]: s = pd.Series([7, 5, 5, 5])

In [48]: s.rolling(3).var()
Out[48]: 
0         NaN
1         NaN
2    1.333333
3    0.000000
dtype: float64 
```

### DataFrameGroupBy.rolling and SeriesGroupBy.rolling with MultiIndex no longer drop levels in the result

`DataFrameGroupBy.rolling()` and `SeriesGroupBy.rolling()` will no longer drop levels of a `DataFrame` with a `MultiIndex` in the result. This can lead to a perceived duplication of levels in the resulting `MultiIndex`, but this change restores the behavior that was present in version 1.1.3 ([GH 38787](https://github.com/pandas-dev/pandas/issues/38787), [GH 38523](https://github.com/pandas-dev/pandas/issues/38523)).

<pre><code class="language-python line-numbers">
In [49]: index = pd.MultiIndex.from_tuples([('idx1', 'idx2')], names=['label1', 'label2'])

In [50]: df = pd.DataFrame({'a': [1], 'b': [2]}, index=index)

In [51]: df
Out[51]: 
               a  b
label1 label2
idx1   idx2    1  2
</code></pre>

<em>Previous Behavior</em>:

<pre><code class="language-python line-numbers">In [1]: df.groupby('label1').rolling(1).sum()
Out[1]:
          a    b
label1
idx1    1.0  2.0
</code></pre>

<em>New Behavior</em>:

<pre><code class="language-python line-numbers">In [52]: df.groupby('label1').rolling(1).sum()
Out[52]: 
                        a    b
label1 label1 label2
idx1   idx1   idx2    1.0  2.0
</code></pre>

## Backwards incompatible API changes

### Increased minimum versions for dependencies

Some minimum supported versions of dependencies were updated. If installed, we now require:

| Package | Minimum Version | Required | Changed |
| --- | --- | --- | --- |
| numpy | 1.17.3 | X | X |
| pytz | 2017.3 | X |  |
| python-dateutil | 2.7.3 | X |  |
| bottleneck | 1.2.1 |  |  |
| numexpr | 2.7.0 |  | X |
| pytest (dev) | 6.0 |  | X |
| mypy (dev) | 0.812 |  | X |
| setuptools | 38.6.0 |  | X |
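
As a rough sketch (not part of pandas), the required minimums in the table above can be compared against installed versions using only the standard library. The helper names `parse_version`, `meets_minimum` and `check_installed` are hypothetical, and the simple numeric parsing ignores pre-release suffixes; `pd.show_versions()` remains the way to see what pandas itself detects.

```python
from importlib import metadata

# Minimums copied from the table above (required dependencies only).
MINIMUM_VERSIONS = {
    "numpy": "1.17.3",
    "pytz": "2017.3",
    "python-dateutil": "2.7.3",
}


def parse_version(text):
    """Turn '1.17.3' into (1, 17, 3), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True when the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def check_installed(minimums=MINIMUM_VERSIONS):
    """Map each package to True/False, or None when it is not installed."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
            continue
        report[name] = meets_minimum(installed, minimum)
    return report


if __name__ == "__main__":
    for package, ok in check_installed().items():
        print(package, ok)
```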

For [optional libraries](https://pandas.pydata.org/docs/getting_started/install.html), the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.

| Package | Minimum Version | Changed |
| --- | --- | --- |
| beautifulsoup4 | 4.6.0 |  |
| fastparquet | 0.4.0 | X |
| fsspec | 0.7.4 |  |
| gcsfs | 0.6.0 |  |
| lxml | 4.3.0 | |
| matplotlib | 2.2.3 |  |
| numba | 0.46.0 |  |
| openpyxl | 3.0.0 | X |
| pyarrow | 0.17.0 | X |
| pymysql | 0.8.1 | X |
| pytables | 3.5.1 |  |
| s3fs | 0.4.0 |  |
| scipy | 1.2.0 |  |
| sqlalchemy | 1.3.0 | X |
| tabulate | 0.8.7 | X |
| xarray | 0.12.0 |  |
| xlrd | 1.2.0 |  |
| xlsxwriter | 1.0.2 |  |
| xlwt | 1.3.0 |  |
| pandas-gbq | 0.12.0 | |

For more information, see Dependencies and Optional Dependencies.

### Other API changes

+ Partially initialized `CategoricalDtype` objects (i.e. those with `categories=None`) will no longer compare as equal to fully initialized dtype objects ([GH 38516](https://github.com/pandas-dev/pandas/issues/38516))

+ Accessing `_constructor_expanddim` on a `DataFrame` and `_constructor_sliced` on a `Series` now raises an `AttributeError`. Previously a `NotImplementedError` was raised ([GH 38782](https://github.com/pandas-dev/pandas/issues/38782))

+ Added new `engine` and `**engine_kwargs` parameters to `DataFrame.to_sql()` to support other future "SQL engines". Currently we still only use `SQLAlchemy` under the hood, but more engines are planned to be supported, such as [turbodbc](https://turbodbc.readthedocs.io/en/latest/) ([GH 36893](https://github.com/pandas-dev/pandas/issues/36893))

+ Removed redundant `freq` from `PeriodIndex` string representation ([GH 41653](https://github.com/pandas-dev/pandas/issues/41653))

+ `ExtensionDtype.construct_array_type()` is now a required method for `ExtensionDtype` subclasses, rather than an optional method ([GH 24860](https://github.com/pandas-dev/pandas/issues/24860))

+ Calling `hash` on non-hashable pandas objects will now raise `TypeError` with the built-in error message (e.g. `unhashable type: 'Series'`). Previously it would raise a custom message such as `'Series' objects are mutable, thus they cannot be hashed`. Additionally, `isinstance(<Series>, abc.collections.Hashable)` will now return `False` ([GH 40013](https://github.com/pandas-dev/pandas/issues/40013))

+ `Styler.from_custom_template()` now has two new parameters for template names, and the old `name` parameter has been removed, because template inheritance was introduced for better parsing ([GH 42053](https://github.com/pandas-dev/pandas/issues/42053)). Subclassing modifications to Styler attributes are also needed.

### Build

+ Documentation in `.pptx` and `.pdf` formats is no longer included in wheels or source distributions. ([GH 30741](https://github.com/pandas-dev/pandas/issues/30741))

## Deprecations

### Deprecated dropping nuisance columns in DataFrame reductions and DataFrameGroupBy operations

When calling a reduction (e.g. `.min`, `.max`, `.sum`) on a `DataFrame` with `numeric_only=None` (the default), columns where the reduction raises a `TypeError` are silently ignored and dropped from the result.

This behavior is deprecated. In a future version, the `TypeError` will be raised, and users will need to select only valid columns before calling the function.

For example:

<pre><code class="language-python line-numbers">
In [53]: df = pd.DataFrame({"A": [1, 2, 3, 4], "B": pd.date_range("2016-01-01", periods=4)})

In [54]: df
Out[54]: 
   A          B
0  1 2016-01-01
1  2 2016-01-02
2  3 2016-01-03
3  4 2016-01-04 
</code></pre>

<em>Old Behavior</em>:

<pre><code class="language-python line-numbers">In [3]: df.prod()
Out[3]:
A    24
dtype: int64 
</code></pre>

<em>Future Behavior</em>:

<pre><code class="language-python line-numbers">In [4]: df.prod()
...
TypeError: 'DatetimeArray' does not implement reduction 'prod'

In [5]: df[["A"]].prod()
Out[5]:
A    24
dtype: int64 
</code></pre>
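
One way to select valid columns ahead of the reduction, sketched here under the assumption that only numeric columns are wanted, is <code>DataFrame.select_dtypes()</code>. The frame is the same one used in the example above; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": pd.date_range("2016-01-01", periods=4)})

# Keep only numeric columns so the reduction never sees the datetime column.
numeric = df.select_dtypes(include="number")
result = numeric.prod()

print(list(result.index))  # only the numeric column remains
print(result["A"])         # product of 1 * 2 * 3 * 4
```

Selecting by dtype avoids hard-coding column names, at the cost of silently skipping any column whose dtype was not what the caller expected.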

Similarly, when applying a function to <code>DataFrameGroupBy</code>, columns on which the function raises a <code>TypeError</code> are currently silently ignored and dropped from the result.

This behavior is deprecated. In a future version, a <code>TypeError</code> will be raised and the user will need to select a valid column before calling the function.

For example:

<pre><code class="language-python line-numbers">In [55]: df = pd.DataFrame({"A": [1, 2, 3, 4], "B": pd.date_range("2016-01-01", periods=4)})

In [56]: gb = df.groupby([1, 1, 2, 2]) 
</code></pre>

<em>Old Behavior</em>:

<pre><code class="language-python line-numbers">In [4]: gb.prod(numeric_only=False)
Out[4]:
    A
1   2
2  12
</code></pre>

<em>Future Behavior</em>:

<pre><code class="language-python line-numbers">In [5]: gb.prod(numeric_only=False)
...
TypeError: datetime64 type does not support prod operations

In [6]: gb[["A"]].prod(numeric_only=False)
Out[6]:
    A
1   2
2  12
</code></pre>

### Other Deprecations

+ Deprecated allowing scalars to be passed to the `Categorical` constructor ([GH 38433](https://github.com/pandas-dev/pandas/issues/38433))

+ Deprecated constructing `CategoricalIndex` without passing list-like data ([GH 38944](https://github.com/pandas-dev/pandas/issues/38944))

+ Deprecated allowing subclass-specific keyword arguments in the `Index` constructor; use the specific subclass directly instead ([GH 14093](https://github.com/pandas-dev/pandas/issues/14093), [GH 21311](https://github.com/pandas-dev/pandas/issues/21311), [GH 22315](https://github.com/pandas-dev/pandas/issues/22315), [GH 26974](https://github.com/pandas-dev/pandas/issues/26974))

+ Deprecated the `astype()` method of datetimelike (`timedelta64[ns]`, `datetime64[ns]`, `Datetime64TZDtype`, `PeriodDtype`) to convert to integer dtypes; use `values.view(...)` instead ([GH 38544](https://github.com/pandas-dev/pandas/issues/38544)). This deprecation was later reverted in pandas 1.4.0.

+ Deprecated `MultiIndex.is_lexsorted()` and `MultiIndex.lexsort_depth()`; use `MultiIndex.is_monotonic_increasing()` instead ([GH 32259](https://github.com/pandas-dev/pandas/issues/32259))

+ Deprecated keyword `try_cast` in `Series.where()`, `Series.mask()`, `DataFrame.where()`, `DataFrame.mask()`; cast results manually if necessary ([GH 38836](https://github.com/pandas-dev/pandas/issues/38836))

+ Deprecated comparison of `Timestamp` objects with `datetime.date` objects. Instead of e.g. `ts <= mydate`, use `ts <= pd.Timestamp(mydate)` or `ts.date() <= mydate` ([GH 36131](https://github.com/pandas-dev/pandas/issues/36131))

+ Deprecated `Rolling.win_type` returning `"freq"` ([GH 38963](https://github.com/pandas-dev/pandas/issues/38963))

+ Deprecated `Rolling.is_datetimelike` ([GH 38963](https://github.com/pandas-dev/pandas/issues/38963))

+ Deprecated `DataFrame` indexer for `Series.__setitem__()` and `DataFrame.__setitem__()` ([GH 39004](https://github.com/pandas-dev/pandas/issues/39004))

+ Deprecated `ExponentialMovingWindow.vol()` ([GH 39220](https://github.com/pandas-dev/pandas/issues/39220))

+ Using `.astype` to convert between `datetime64[ns]` dtype and `DatetimeTZDtype` is deprecated and will raise in a future version; use `obj.tz_localize` or `obj.dt.tz_localize` instead ([GH 38622](https://github.com/pandas-dev/pandas/issues/38622))

+ Deprecated casting `datetime.date` objects to `datetime64` when used as `fill_value` in `DataFrame.unstack()`, `DataFrame.shift()`, `Series.shift()` and `DataFrame.reindex()`; pass `pd.Timestamp(dateobj)` instead ([GH 39767](https://github.com/pandas-dev/pandas/issues/39767))

+ Deprecated `Styler.set_na_rep()` and `Styler.set_precision()` in favor of `Styler.format()` with `na_rep` and `precision` as existing and new input parameters respectively ([GH 40134](https://github.com/pandas-dev/pandas/issues/40134), [GH 40425](https://github.com/pandas-dev/pandas/issues/40425))

+ Deprecated `Styler.where()` in favor of using an alternative formulation with `Styler.applymap()` ([GH 40821](https://github.com/pandas-dev/pandas/issues/40821))

+ Deprecated allowing partial failure in `Series.transform()` and `DataFrame.transform()` when `func` is list-like or dict-like and raises anything but `TypeError`; `func` raising anything but a `TypeError` will raise in a future version ([GH 40211](https://github.com/pandas-dev/pandas/issues/40211))

+ Deprecated the `error_bad_lines` and `warn_bad_lines` parameters in `read_csv()` and `read_table()`, and adopted the `on_bad_lines` parameter ([GH 15122](https://github.com/pandas-dev/pandas /issues/15122))

+ Deprecated support for `np.ma.mrecords.MaskedRecords` in the `DataFrame` constructor, instead passing in `{name: data[name] for name in data.dtype.names}` ([GH 40363] (https://github.com/pandas-dev/pandas/issues/40363))

+ Deprecated the use of `merge()`, `DataFrame.merge()` and `DataFrame.join()` on different levels, resulting in different numbers of different levels ([GH 34862](https://github.com /pandas-dev/pandas/issues/34862))

+ Deprecated the use of `**kwargs` in `ExcelWriter`; use keyword argument `engine_kwargs` instead ([GH 40430](https://github.com/pandas-dev/pandas/issues/40430 ))

+ The `level` keyword in `DataFrame` and `Series` aggregations is deprecated; use groupby instead ([GH 39983](https://github.com/pandas-dev/pandas/issues/39983))

+ Deprecated the `inplace` parameter of `Categorical.remove_categories()`, `Categorical.add_categories()`, `Categorical.reorder_categories()`, `Categorical.rename_categories()` and `Categorical.set_categories()`; it will be removed in a future version ([GH 37643](https://github.com/pandas-dev/pandas/issues/37643))

+ Deprecated `merge()` producing duplicated columns through the `suffixes` keyword and already existing columns ([GH 22818](https://github.com/pandas-dev/pandas/issues/22818))

+ Deprecated setting `Categorical._codes`; create a new `Categorical` with the desired codes instead ([GH 40606](https://github.com/pandas-dev/pandas/issues/40606))

+ Deprecated the `convert_float` optional argument in `read_excel()` and `ExcelFile.parse()` ([GH 41127](https://github.com/pandas-dev/pandas/issues/41127))

+ Deprecated the behavior of `DatetimeIndex.union()` with mixed time zones; in a future version, both will be cast to UTC instead of object dtype ([GH 39328](https://github.com/pandas-dev/pandas/issues/39328))

+ Deprecated using `usecols` with out-of-bounds indices for `read_csv()` with `engine="c"` ([GH 25623](https://github.com/pandas-dev/pandas/issues/25623))

+ Deprecated special treatment of lists whose first element is a `Categorical` in the `DataFrame` constructor; pass `pd.DataFrame({col: categorical, ...})` instead ([GH 38845](https://github.com/pandas-dev/pandas/issues/38845))

+ Deprecated behavior of the `DataFrame` constructor when a `dtype` is passed and the data cannot be converted to that dtype. In a future release, this will raise an exception instead of being silently ignored ([GH 24435](https://github.com/pandas-dev/pandas/issues/24435))

+ Deprecated `Timestamp.freq` property. For properties that use it (`is_month_start`, `is_month_end`, `is_quarter_start`, `is_quarter_end`, `is_year_start`, `is_year_end`), when you have a `freq`, use e.g. `freq.is_month_start(ts)` ([GH 15146](https://github.com/pandas-dev/pandas/issues/15146))

+ Deprecated construction of `Series` or `DataFrame` with `DatetimeTZDtype` data and `datetime64[ns]` dtype. Use `Series(data).dt.tz_localize(None)` instead ([GH 41555](https://github.com/pandas-dev/pandas/issues/41555), [GH 33401](https://github.com/pandas-dev/pandas/issues/33401))

+ Deprecated the behavior of `Series` construction with large-integer values and a small-integer dtype silently overflowing; use `Series(data).astype(dtype)` instead ([GH 41734](https://github.com/pandas-dev/pandas/issues/41734))

+ Deprecated the behavior of `DataFrame` construction with floating-point data and an integer dtype casting even when lossy; in a future version this will remain floating point, matching `Series` behavior ([GH 41770](https://github.com/pandas-dev/pandas/issues/41770))

+ Deprecated inference of `timedelta64[ns]`, `datetime64[ns]` or `DatetimeTZDtype` dtypes in `Series` construction when data containing strings is passed and no `dtype` is passed ([GH 33558](https://github.com/pandas-dev/pandas/issues/33558))

+ In a future version, constructing a `Series` or `DataFrame` with `datetime64[ns]` data and `DatetimeTZDtype` will treat the data as wall-clock times instead of UTC times (matching `DatetimeIndex` behavior). To treat the data as UTC times, use `pd.Series(data).dt.tz_localize("UTC").dt.tz_convert(dtype.tz)` or `pd.Series(data.view("int64"), dtype=dtype)` ([GH 33401](https://github.com/pandas-dev/pandas/issues/33401))

+ Deprecated passing lists as `key` to `DataFrame.xs()` and `Series.xs()` ([GH 41760](https://github.com/pandas-dev/pandas/issues/41760))

+ Deprecated Boolean arguments for `inclusive` in `Series.between()`; use the standard argument values `{"left", "right", "neither", "both"}` instead ([GH 40628](https://github.com/pandas-dev/pandas/issues/40628))

+ Deprecated passing arguments as positional for all of the following, with the exceptions noted ([GH 41485](https://github.com/pandas-dev/pandas/issues/41485)):

  + `concat()` (except `objs`)

  + `read_csv()` (except `filepath_or_buffer`)

  + `read_table()` (except `filepath_or_buffer`)

  + `DataFrame.clip()` and `Series.clip()` (except `upper` and `lower`)

  + `DataFrame.drop_duplicates()` (except `subset`), `Series.drop_duplicates()`, `Index.drop_duplicates()` and `MultiIndex.drop_duplicates()`

  + `DataFrame.drop()` (except `labels`) and `Series.drop()`

  + `DataFrame.dropna()` and `Series.dropna()`

  + `DataFrame.ffill()`, `Series.ffill()`, `DataFrame.bfill()` and `Series.bfill()`

  + `DataFrame.fillna()` and `Series.fillna()` (except `value`)

  + `DataFrame.interpolate()` and `Series.interpolate()` (except `method`)

  + `DataFrame.mask()` and `Series.mask()` (except `cond` and `other`)

  + `DataFrame.reset_index()` (except `level`) and `Series.reset_index()`

  + `DataFrame.set_axis()` and `Series.set_axis()` (except `labels`)

  + `DataFrame.set_index()` (except `keys`)

  + `DataFrame.sort_index()` and `Series.sort_index()`

  + `DataFrame.sort_values()` (except `by`) and `Series.sort_values()`

  + `DataFrame.where()` and `Series.where()` (except `cond` and `other`)

  + `Index.set_names()` and `MultiIndex.set_names()` (except `names`)

  + `MultiIndex.set_codes()` (except `codes`)

  + `MultiIndex.set_levels()` (except `levels`)

  + `Resampler.interpolate()` (except `method`)

## Performance improvements

+ Performance improvements for `IntervalIndex.isin()` ([GH 38353](https://github.com/pandas-dev/pandas/issues/38353))

+ Performance improvement in `Series.mean()` for nullable data types ([GH 34814](https://github.com/pandas-dev/pandas/issues/34814))

+ Performance improvement in `Series.isin()` for nullable data types ([GH 38340](https://github.com/pandas-dev/pandas/issues/38340))

+ Performance improvement in `DataFrame.fillna()` with `method="pad"` or `method="backfill"` for nullable floating-point and nullable integer dtypes ([GH 39953](https://github.com/pandas-dev/pandas/issues/39953))

+ Performance improvement in `DataFrame.corr()` for `method="kendall"` ([GH 28329](https://github.com/pandas-dev/pandas/issues/28329))

+ Performance improvement in `DataFrame.corr()` for `method="spearman"` ([GH 40956](https://github.com/pandas-dev/pandas/issues/40956), [GH 41885](https://github.com/pandas-dev/pandas/issues/41885))

+ Performance improvements for `Rolling.corr()` and `Rolling.cov()` ([GH 39388](https://github.com/pandas-dev/pandas/issues/39388))

+ Performance improvement in `RollingGroupby.corr()`, `ExpandingGroupby.corr()`, `ExpandingGroupby.corr()` and `ExpandingGroupby.cov()` ([GH 39591](https://github.com/pandas-dev/pandas/issues/39591))

+ Performance improvement in `unique()` for object data type ([GH 37615](https://github.com/pandas-dev/pandas/issues/37615))

+ Performance improvement in `json_normalize()` for basic cases (including separators) ([GH 40035](https://github.com/pandas-dev/pandas/issues/40035), [GH 15621](https://github.com/pandas-dev/pandas/issues/15621))

+ Performance improvement in `ExpandingGroupby` aggregation methods ([GH 39664](https://github.com/pandas-dev/pandas/issues/39664))

+ Performance improvement in `Styler`, where render times are reduced by more than 50% and now match `DataFrame.to_html()` ([GH 39972](https://github.com/pandas-dev/pandas/issues/39972), [GH 39952](https://github.com/pandas-dev/pandas/issues/39952), [GH 40425](https://github.com/pandas-dev/pandas/issues/40425))

+ The method `Styler.set_td_classes()` is now as performant as `Styler.apply()` and `Styler.applymap()`, and even more so in some cases ([GH 40453](https://github.com/pandas-dev/pandas/issues/40453))

+ Performance improvements in `ExponentialMovingWindow.mean()` with `times` parameter ([GH 39784](https://github.com/pandas-dev/pandas/issues/39784))

+ Performance improvement in `DataFrameGroupBy.apply()` and `SeriesGroupBy.apply()` when requiring the Python fallback implementation ([GH 40176](https://github.com/pandas-dev/pandas/issues/40176))

+ Performance improvement in the conversion of a PyArrow Boolean array to a pandas nullable Boolean array ([GH 41051](https://github.com/pandas-dev/pandas/issues/41051))

+ Performance improvement in the concatenation of data with type `CategoricalDtype` ([GH 40193](https://github.com/pandas-dev/pandas/issues/40193))

+ Performance improvement in `DataFrameGroupBy.cummin()`, `SeriesGroupBy.cummin()`, `DataFrameGroupBy.cummax()` and `SeriesGroupBy.cummax()` with nullable data types ([GH 37493](https://github.com/pandas-dev/pandas/issues/37493))

+ Performance improvement in `Series.nunique()` with NaN values ([GH 40865](https://github.com/pandas-dev/pandas/issues/40865))

+ Performance improvement in `DataFrame.transpose()` and `Series.unstack()` with `DatetimeTZDtype` ([GH 40149](https://github.com/pandas-dev/pandas/issues/40149))

+ Performance improvement in `Series.plot()` and `DataFrame.plot()` with entry point lazy loading ([GH 41492](https://github.com/pandas-dev/pandas/issues/41492))

## Bug fixes

### Categorical

+ Bug in `CategoricalIndex` incorrectly failing to raise `TypeError` when scalar data is passed ([GH 38614](https://github.com/pandas-dev/pandas/issues/38614))

+ Bug in `CategoricalIndex.reindex` failing when the `Index` passed was not categorical but its values were all labels in the categories ([GH 28690](https://github.com/pandas-dev/pandas/issues/28690))

+ Bug where constructing a `Categorical` from an object-dtype array of `date` objects did not round-trip correctly with `astype` ([GH 38552](https://github.com/pandas-dev/pandas/issues/38552))

+ Bug in constructing a `DataFrame` from an `ndarray` and a `CategoricalDtype` ([GH 38857](https://github.com/pandas-dev/pandas/issues/38857))

+ Bug in setting categorical values into an object-dtype column in a `DataFrame` ([GH 39136](https://github.com/pandas-dev/pandas/issues/39136))

+ Bug in `DataFrame.reindex()` raising an `IndexError` when the new index contained duplicates and the old index was a `CategoricalIndex` ([GH 38906](https://github.com/pandas-dev/pandas/issues/38906))

+ Bug in `Categorical.fillna()` raising a `NotImplementedError` instead of a `ValueError` when filling with a non-category tuple ([GH 41914](https://github.com/pandas-dev/pandas/issues/41914))
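
As a small illustration of the scalar-data entry above, the sketch below checks that `pd.CategoricalIndex` rejects a scalar while accepting a list. The helper name `try_categorical_index` is hypothetical and only the exception type named in the entry is assumed, not the error message.

```python
import pandas as pd


def try_categorical_index(data):
    """Return a CategoricalIndex, or None if pandas rejects the input with TypeError."""
    try:
        return pd.CategoricalIndex(data)
    except TypeError:
        # A scalar is not a collection of category values, so pandas refuses it.
        return None


from_list = try_categorical_index(["a", "b", "a"])
from_scalar = try_categorical_index("a")
```

With a fixed pandas version, `from_list` should be a `CategoricalIndex` and `from_scalar` should be `None`.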

### Datetimelike

+ Bug in `DataFrame` and `Series` constructors sometimes dropping nanoseconds from `Timestamp` (or `Timedelta`) `data` with `dtype=datetime64[ns]` (or `timedelta64[ns]`) ([GH 38032](https://github.com/pandas-dev/pandas/issues/38032))

+ Bug in `DataFrame.first()` and `Series.first()` with an offset of one month returning an incorrect result when the first day is the last day of a month ([GH 29623](https://github.com/pandas-dev/pandas/issues/29623))

+ Bug in constructing a `DataFrame` or `Series` with mismatched `datetime64` data and `timedelta64` dtype, or vice versa, failing to raise a `TypeError` ([GH 38575](https://github.com/pandas-dev/pandas/issues/38575), [GH 38764](https://github.com/pandas-dev/pandas/issues/38764), [GH 38792](https://github.com/pandas-dev/pandas/issues/38792))

+ Bug in constructing a `Series` or `DataFrame` with a `datetime` object out of bounds for `datetime64[ns]` dtype or a `timedelta` object out of bounds for `timedelta64[ns]` dtype ([GH 38792](https://github.com/pandas-dev/pandas/issues/38792), [GH 38965](https://github.com/pandas-dev/pandas/issues/38965))

+ Bug in `DatetimeIndex.intersection()`, `DatetimeIndex.symmetric_difference()`, `PeriodIndex.intersection()` and `PeriodIndex.symmetric_difference()` always returning object dtype when operating with `CategoricalIndex` ([GH 38741](https://github.com/pandas-dev/pandas/issues/38741))

+ Bug in `DatetimeIndex.intersection()` giving incorrect results with non-Tick frequencies when `n != 1` ([GH 42104](https://github.com/pandas-dev/pandas/issues/42104))

+ Bug in `Series.where()` incorrectly casting `datetime64` values to `int64` ([GH 37682](https://github.com/pandas-dev/pandas/issues/37682))

+ Bug in `Categorical` incorrectly casting `datetime` objects to `Timestamp` ([GH 38878](https://github.com/pandas-dev/pandas/issues/38878))

+ Bug in comparisons between `Timestamp` objects and `datetime64` objects just outside the implementation bounds for nanosecond `datetime64` ([GH 39221](https://github.com/pandas-dev/pandas/issues/39221))

+ Bug in `Timestamp.round()`, `Timestamp.floor()` and `Timestamp.ceil()` for values near the implementation bounds of `Timestamp` ([GH 39244](https://github.com/pandas-dev/pandas/issues/39244))

+ Bug in `Timedelta.round()`, `Timedelta.floor()` and `Timedelta.ceil()` for values near the implementation bounds of `Timedelta` ([GH 38964](https://github.com/pandas-dev/pandas/issues/38964))

+ Bug in `date_range()` incorrectly creating a `DatetimeIndex` containing `NaT` instead of raising `OutOfBoundsDatetime` in corner cases ([GH 24124](https://github.com/pandas-dev/pandas/issues/24124))

+ Bug in `infer_freq()` incorrectly failing to infer the `'H'` frequency of a `DatetimeIndex` that has a time zone and crosses DST boundaries ([GH 39556](https://github.com/pandas-dev/pandas/issues/39556))

+ Bug in `Series` backed by `DatetimeArray` or `TimedeltaArray` sometimes failing to set the array's `freq` to `None` ([GH 41425](https://github.com/pandas-dev/pandas/issues/41425))
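
The `Series.where()` entry above can be checked with a short sketch like the following. The variable names are illustrative, and the check only assumes a pandas version that includes the fix: masked-out slots become `NaT` and the dtype stays datetime-like rather than turning into `int64`.

```python
import pandas as pd

# Two timestamps; the mask keeps the first and blanks out the second.
stamps = pd.Series(pd.to_datetime(["2021-01-01", "2021-01-02"]))
masked = stamps.where(pd.Series([True, False]))

# The result should still be datetime-like, with NaT in the masked slot.
still_datetime = pd.api.types.is_datetime64_any_dtype(masked.dtype)
second_is_missing = pd.isna(masked.iloc[1])
```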

### Timedelta

+ Bug in constructing a `Timedelta` from `np.timedelta64` objects with non-nanosecond units that are out of bounds for `timedelta64[ns]` ([GH 38965](https://github.com/pandas-dev/pandas/issues/38965))

+ Bug in constructing a `TimedeltaIndex` incorrectly accepting `np.datetime64("NaT")` objects ([GH 39462](https://github.com/pandas-dev/pandas/issues/39462))

+ Bug in constructing a `Timedelta` from an input string with only symbols and no digits failing to raise an error ([GH 39710](https://github.com/pandas-dev/pandas/issues/39710))

+ Bug in `TimedeltaIndex` construction and `to_timedelta()` failing to raise when passed non-nanosecond `timedelta64` arrays that overflow when converting to `timedelta64[ns]` ([GH 40008](https://github.com/pandas-dev/pandas/issues/40008))

### Timezones

+ Bug in different `tzinfo` objects representing UTC not being treated as equivalent ([GH 39216](https://github.com/pandas-dev/pandas/issues/39216))

+ Bug in `dateutil.tz.gettz("UTC")` not being recognized as equivalent to other `tzinfo` objects representing UTC ([GH 39276](https://github.com/pandas-dev/pandas/issues/39276))

### Numeric

+ Bug in `DataFrame.quantile()` and `DataFrame.sort_values()` causing incorrect subsequent indexing behavior ([GH 38351](https://github.com/pandas-dev/pandas/issues/38351))

+ Bug in `DataFrame.sort_values()` raising an [`IndexError`](https://docs.python.org/3/library/exceptions.html#IndexError) when `by` is empty ([GH 40258](https://github.com/pandas-dev/pandas/issues/40258))

+ Bug in `DataFrame.select_dtypes()` dropping numeric `ExtensionDtype` columns when `include=np.number` ([GH 35340](https://github.com/pandas-dev/pandas/issues/35340))

+ Bug in `DataFrame.mode()` and `Series.mode()` not keeping a consistent integer `Index` for empty input ([GH 33321](https://github.com/pandas-dev/pandas/issues/33321))

+ Bug in `DataFrame.rank()` when the DataFrame contained `np.inf` ([GH 32593](https://github.com/pandas-dev/pandas/issues/32593))

+ Bug in `DataFrame.rank()` raising an `IndexError` with `axis=0` and columns holding incomparable types ([GH 38932](https://github.com/pandas-dev/pandas/issues/38932))

+ Bug in `Series.rank()`, `DataFrame.rank()`, `DataFrameGroupBy.rank()` and `SeriesGroupBy.rank()` treating the most negative `int64` value as missing ([GH 32859](https://github.com/pandas-dev/pandas/issues/32859))

+ Bug in `DataFrame.select_dtypes()` behaving differently between Windows and Linux with `include="int"` ([GH 36596](https://github.com/pandas-dev/pandas/issues/36596))

+ Bug in `DataFrame.apply()` and `DataFrame.agg()` when passed the argument `func="size"`, operating on the entire `DataFrame` instead of rows or columns ([GH 39934](https://github.com/pandas-dev/pandas/issues/39934))

+ Bug in `DataFrame.transform()` raising a `SpecificationError` when passed a dictionary and columns are missing; it now raises a `KeyError` instead ([GH 40004](https://github.com/pandas-dev/pandas/issues/40004))

+ Bug in `DataFrameGroupBy.rank()` and `SeriesGroupBy.rank()` giving incorrect results with `pct=True` and equal values between consecutive groups ([GH 40518](https://github.com/pandas-dev/pandas/issues/40518))

+ Bug in `Series.count()` returning an `int32` result on 32-bit platforms when `level=None` ([GH 40908](https://github.com/pandas-dev/pandas/issues/40908))

+ Bug in `Series` and `DataFrame` reductions with the methods `any` and `all` not returning Boolean results for object data ([GH 12863](https://github.com/pandas-dev/pandas/issues/12863), [GH 35450](https://github.com/pandas-dev/pandas/issues/35450), [GH 27709](https://github.com/pandas-dev/pandas/issues/27709))

+ Bug in `Series.clip()` failing if the Series contains NA values and has a nullable integer or float dtype ([GH 40851](https://github.com/pandas-dev/pandas/issues/40851))

+ Bug in `UInt64Index.where()` and `UInt64Index.putmask()` incorrectly raising `TypeError` when `other` has `np.int64` dtype ([GH 41974](https://github.com/pandas-dev/pandas/issues/41974))

+ Bug in `DataFrame.agg()` not sorting the aggregated axis in the order of the provided aggregation functions when one or more of them fails to produce results ([GH 33634](https://github.com/pandas-dev/pandas/issues/33634))

+ Bug in `DataFrame.clip()` not interpreting missing values as no threshold ([GH 40420](https://github.com/pandas-dev/pandas/issues/40420))
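
To make the `select_dtypes()` entry above concrete, here is a minimal sketch. The column names are made up for illustration, and it assumes a pandas version that includes the fix, so a nullable `Int64` column is selected alongside an ordinary float column.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    {
        "nullable_int": pd.array([1, None, 3], dtype="Int64"),  # numeric extension dtype
        "plain_float": [0.5, 1.5, 2.5],  # ordinary NumPy float column
        "label": ["a", "b", "c"],  # non-numeric column
    }
)

# Both numeric columns should be selected; the string column is dropped.
numeric_columns = list(frame.select_dtypes(include=np.number).columns)
```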

### Conversion

+ Bug in `Series.to_dict()` with `orient='records'`, which now returns Python native types ([GH 25969](https://github.com/pandas-dev/pandas/issues/25969))

+ Bug in `Series.view()` and `Index.view()` when converting between datetime-like (`datetime64[ns]`, `datetime64[ns, tz]`, `timedelta64`, `period`) dtypes ([GH 39788](https://github.com/pandas-dev/pandas/issues/39788))

+ Bug in creating a `DataFrame` from an empty `np.recarray` not retaining the original dtypes ([GH 40121](https://github.com/pandas-dev/pandas/issues/40121))

+ Bug in `DataFrame` failing to raise a `TypeError` when constructing from a `frozenset` ([GH 40163](https://github.com/pandas-dev/pandas/issues/40163))

+ Bug in `Index` construction silently ignoring a passed `dtype` when the data cannot be cast to that dtype ([GH 21311](https://github.com/pandas-dev/pandas/issues/21311))

+ Bug in `StringArray.astype()` falling back to NumPy and raising when converting to `dtype='categorical'` ([GH 40450](https://github.com/pandas-dev/pandas/issues/40450))

+ Bug in `factorize()` where, given an array with a numeric NumPy dtype smaller than int64, uint64 or float64, the unique values did not keep their original dtype ([GH 41132](https://github.com/pandas-dev/pandas/issues/41132))

+ Bug in `DataFrame` construction with a dictionary containing an `ExtensionDtype` and `copy=True` failing to make a copy ([GH 38939](https://github.com/pandas-dev/pandas/issues/38939))

+ Bug in `qcut()` raising an error when given `Float64DType` input ([GH 40730](https://github.com/pandas-dev/pandas/issues/40730))

+ Bug in `DataFrame` and `Series` construction with `datetime64[ns]` data and `dtype=object` resulting in `datetime` objects instead of `Timestamp` objects ([GH 41599](https://github.com/pandas-dev/pandas/issues/41599))

+ Bug in `DataFrame` and `Series` construction with `timedelta64[ns]` data and `dtype=object` resulting in `np.timedelta64` objects instead of `Timedelta` objects ([GH 41599](https://github.com/pandas-dev/pandas/issues/41599))

+ Bug in `DataFrame` construction when given a two-dimensional object-dtype `np.ndarray` of `Period` or `Interval` objects failing to cast to `PeriodDtype` or `IntervalDtype`, respectively ([GH 41812](https://github.com/pandas-dev/pandas/issues/41812))

+ Bug in constructing a `Series` from a list and a `PandasDtype` ([GH 39357](https://github.com/pandas-dev/pandas/issues/39357))

+ Bug in creating a `Series` from a `range` object that does not fit in the bounds of `int64` dtype ([GH 30173](https://github.com/pandas-dev/pandas/issues/30173))

+ Bug in creating a `Series` from a `dict` with all-tuple keys and an `Index` that requires reindexing ([GH 41707](https://github.com/pandas-dev/pandas/issues/41707))

+ Bug in `infer_dtype()` not recognizing Series, Index or arrays with a Period dtype ([GH 23553](https://github.com/pandas-dev/pandas/issues/23553))

+ Bug in `infer_dtype()` raising an error for general `ExtensionArray` objects; it now returns `"unknown-array"` instead of raising ([GH 37367](https://github.com/pandas-dev/pandas/issues/37367))

+ Bug in `DataFrame.convert_dtypes()` incorrectly raising a `ValueError` when called on an empty DataFrame ([GH 40393](https://github.com/pandas-dev/pandas/issues/40393))
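
The `factorize()` entry above can be illustrated with a small sketch. The input array is made up, and the check assumes a pandas version that includes the fix: the unique values keep the `int8` dtype of the input instead of being widened to `int64`.

```python
import numpy as np
import pandas as pd

# A small-integer array with a repeated value.
small_ints = np.array([3, 1, 3, 7], dtype=np.int8)

# codes index into uniques; uniques are listed in order of first appearance.
codes, uniques = pd.factorize(small_ints)
```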

### Strings

+ Bug in the conversion from `pyarrow.ChunkedArray` to `StringArray` when the original had zero chunks ([GH 41040](https://github.com/pandas-dev/pandas/issues/41040))

+ Bug in `Series.replace()` and `DataFrame.replace()` ignoring replacements with `regex=True` for `StringDType` data ([GH 41333](https://github.com/pandas-dev/pandas/issues/41333), [GH 35977](https://github.com/pandas-dev/pandas/issues/35977))

+ Bug in `Series.str.extract()` with `StringArray` returning object dtype for an empty `DataFrame` ([GH 41441](https://github.com/pandas-dev/pandas/issues/41441))

+ Bug in `Series.str.replace()` where the `case` argument was ignored when `regex=False` ([GH 41602](https://github.com/pandas-dev/pandas/issues/41602))
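
For the `Series.str.replace()` entry above, the following sketch shows the behavior one would expect once `case` is honored with `regex=False`. The sample strings are invented, and a pandas version containing the fix is assumed. The pattern is treated as a literal, so the dot only matches a real dot.

```python
import pandas as pd

names = pd.Series(["Foo.bar", "FOO.BAR", "fooXbar"])

# Literal (non-regex) replacement that ignores case.
# "fooXbar" is left alone because "." is not a wildcard here.
replaced = names.str.replace("foo.", "x-", case=False, regex=False)
```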

### Interval

+ Bug in `IntervalIndex.intersection()` and `IntervalIndex.symmetric_difference()` always returning object dtype when operating with `CategoricalIndex` ([GH 38653](https://github.com/pandas-dev/pandas/issues/38653), [GH 38741](https://github.com/pandas-dev/pandas/issues/38741))

+ Bug in `IntervalIndex.intersection()` returning duplicates when at least one of the `Index` objects has duplicates that are present in the other ([GH 38743](https://github.com/pandas-dev/pandas/issues/38743))

+ `IntervalIndex.union()`, `IntervalIndex.intersection()`, `IntervalIndex.difference()` and `IntervalIndex.symmetric_difference()` now cast to the appropriate dtype instead of raising a `TypeError` when operating with another `IntervalIndex` with an incompatible dtype ([GH 39267](https://github.com/pandas-dev/pandas/issues/39267))

+ `PeriodIndex.union()`, `PeriodIndex.intersection()`, `PeriodIndex.symmetric_difference()` and `PeriodIndex.difference()` now cast to object dtype instead of raising `IncompatibleFrequency` when operating with another `PeriodIndex` with an incompatible dtype ([GH 39306](https://github.com/pandas-dev/pandas/issues/39306))

+ A bug ([GH 41831](https://github.com/pandas-dev/pandas/issues/41831))

### Indexing

+ Bug in `Index.union()` and `MultiIndex.union()` dropping duplicate `Index` values when the `Index` was not monotonic or `sort` was set to `False` ([GH 36289](https://github.com/pandas-dev/pandas/issues/36289), [GH 31326](https://github.com/pandas-dev/pandas/issues/31326), [GH 40862](https://github.com/pandas-dev/pandas/issues/40862))

+ Bug in `CategoricalIndex.get_indexer()` failing to raise `InvalidIndexError` when non-unique ([GH 38372](https://github.com/pandas-dev/pandas/issues/38372))

+ Bug in `IntervalIndex.get_indexer()` when `target` has `CategoricalDtype` and both the index and the target contain NA values ([GH 41934](https://github.com/pandas-dev/pandas/issues/41934))

+ Bug in `Series.loc()` raising a `ValueError` when the input was filtered with a Boolean list and the values to set were a list with lower dimension ([GH 20438](https://github.com/pandas-dev/pandas/issues/20438))

+ Bug in inserting many new columns into a `DataFrame` causing incorrect subsequent indexing behavior ([GH 38380](https://github.com/pandas-dev/pandas/issues/38380))

+ Bug in `DataFrame.__setitem__()` raising a `ValueError` when setting multiple values to duplicate columns ([GH 15695](https://github.com/pandas-dev/pandas/issues/15695))

+ Bug in `DataFrame.loc()`, `Series.loc()`, `DataFrame.__getitem__()` and `Series.__getitem__()` returning incorrect elements for string slices of a non-monotonic `DatetimeIndex` ([GH 33146](https://github.com/pandas-dev/pandas/issues/33146))

+ Bug in `DataFrame.reindex()` and `Series.reindex()` with time zone aware indexes raising a `TypeError` for `method="ffill"` and `method="bfill"` with a specified `tolerance` ([GH 38566](https://github.com/pandas-dev/pandas/issues/38566))

+ Bug in `DataFrame.reindex()` with `datetime64[ns]` or `timedelta64[ns]` incorrectly casting to integers when the `fill_value` requires casting to object dtype ([GH 39755](https://github.com/pandas-dev/pandas/issues/39755))

+ Bug in `DataFrame.__setitem__()` raising a `ValueError` when setting on an empty `DataFrame` using specified columns and a non-empty `DataFrame` value ([GH 38831](https://github.com/pandas-dev/pandas/issues/38831))

+ Bug in `DataFrame.loc.__setitem__()` raising a `ValueError` when operating on a unique column when the `DataFrame` has duplicate columns ([GH 38521](https://github.com/pandas-dev/pandas/issues/38521))

+ Bug in `DataFrame.iloc.__setitem__()` and `DataFrame.loc.__setitem__()` with mixed dtypes when setting with a dictionary value ([GH 38335](https://github.com/pandas-dev/pandas/issues/38335))

+ Bug in `Series.loc.__setitem__()` and `DataFrame.loc.__setitem__()` raising a `KeyError` when provided a Boolean generator ([GH 39614](https://github.com/pandas-dev/pandas/issues/39614))

+ Bug in `Series.iloc()` and `DataFrame.iloc()` raising a `KeyError` when provided a generator ([GH 39614](https://github.com/pandas-dev/pandas/issues/39614))

+ Bug in `DataFrame.__setitem__()` not raising a `ValueError` when the right-hand side is a `DataFrame` with the wrong number of columns ([GH 38604](https://github.com/pandas-dev/pandas/issues/38604))

+ Bug in `Series.__setitem__()` raising a `ValueError` when setting a `Series` with a scalar indexer ([GH 38303](https://github.com/pandas-dev/pandas/issues/38303))

+ Bug in `DataFrame.loc()` dropping levels of a `MultiIndex` when the `DataFrame` used as input has only one row ([GH 10521](https://github.com/pandas-dev/pandas/issues/10521))

+ Bug in `DataFrame.__getitem__()` and `Series.__getitem__()` always raising a `KeyError` when slicing with existing strings where the `Index` has millisecond values ([GH 33589](https://github.com/pandas-dev/pandas/issues/33589))

+ Bug in setting `timedelta64` or `datetime64` values into a numeric `Series` failing to cast to object dtype ([GH 39086](https://github.com/pandas-dev/pandas/issues/39086), [GH 39619](https://github.com/pandas-dev/pandas/issues/39619))

+ Bug in setting `Interval` values into a `Series` or `DataFrame` with a mismatched `IntervalDtype` incorrectly casting the new values to the existing dtype ([GH 39120](https://github.com/pandas-dev/pandas/issues/39120))

+ Bug in setting `datetime64` values into a `Series` with integer dtype incorrectly casting the `datetime64` values to integers ([GH 39266](https://github.com/pandas-dev/pandas/issues/39266))

+ Bug in setting `np.datetime64("NaT")` into a `Series` with `Datetime64TZDtype` incorrectly treating the time zone naive value as time zone aware ([GH 39769](https://github.com/pandas-dev/pandas/issues/39769))

+ Bug in `Index.get_loc()` not raising a `KeyError` when `key=NaN` and `method` is specified but `NaN` is not in the `Index` ([GH 39382](https://github.com/pandas-dev/pandas/issues/39382))

+ Bug in `DatetimeIndex.insert()` when inserting `np.datetime64("NaT")` into a time zone aware index incorrectly treating the time zone naive value as time zone aware ([GH 39769](https://github.com/pandas-dev/pandas/issues/39769))

+ Bug in `Index.insert()` incorrectly raising, instead of casting to a compatible dtype, when setting a new column that cannot be held in the existing `frame.columns`, or in `Series.reset_index()` or `DataFrame.reset_index()` ([GH 39068](https://github.com/pandas-dev/pandas/issues/39068))

+ Bug in `RangeIndex.append()` where a single object of length 1 was concatenated incorrectly ([GH 39401](https://github.com/pandas-dev/pandas/issues/39401))

+ Bug in `RangeIndex.astype()` where, when converting to `CategoricalIndex`, the categories became an `Int64Index` instead of a `RangeIndex` ([GH 41263](https://github.com/pandas-dev/pandas/issues/41263))

+ Bug in setting `numpy.timedelta64` values into an object-dtype `Series` using a Boolean indexer ([GH 39488](https://github.com/pandas-dev/pandas/issues/39488))

+ Bug in setting numeric values into a Boolean-dtype `Series` using `at` or `iat` failing to cast to object dtype ([GH 39582](https://github.com/pandas-dev/pandas/issues/39582))

+ Bug in `DataFrame.__setitem__()` and `DataFrame.iloc.__setitem__()` raising a `ValueError` when trying to index with a row slice and setting a list as values ([GH 40440](https://github.com/pandas-dev/pandas/issues/40440))

+ Bug in `DataFrame.loc()` not raising a `KeyError` when the key was not found in the `MultiIndex` and the levels were not fully specified ([GH 41170](https://github.com/pandas-dev/pandas/issues/41170))

+ Bug in `DataFrame.loc.__setitem__()` when setting with expansion incorrectly raising when the index in the expanding axis contained duplicates ([GH 40096](https://github.com/pandas-dev/pandas/issues/40096))

+ Bug in `DataFrame.loc.__getitem__()` with a `MultiIndex` casting to float when at least one index column has float dtype and a scalar is retrieved ([GH 41369](https://github.com/pandas-dev/pandas/issues/41369))

+ Bug in `DataFrame.loc()` incorrectly matching non-Boolean index elements ([GH 20432](https://github.com/pandas-dev/pandas/issues/20432))

+ Bug in indexing with `np.nan` on a `Series` or `DataFrame` with a `CategoricalIndex` incorrectly raising a `KeyError` when `np.nan` keys are present ([GH 41933](https://github.com/pandas-dev/pandas/issues/41933))

+ Bug in `Series.__delitem__()` with `ExtensionDtype` incorrectly casting to `ndarray` ([GH 40386](https://github.com/pandas-dev/pandas/issues/40386))

+ Bug in `DataFrame.at()` with a `CategoricalIndex` returning incorrect results when passed integer keys ([GH 41846](https://github.com/pandas-dev/pandas/issues/41846))

+ Bug in `DataFrame.loc()` returning a `MultiIndex` in the wrong order if an indexer has duplicates ([GH 40978](https://github.com/pandas-dev/pandas/issues/40978))

+ Bug in `DataFrame.__setitem__()` raising a `TypeError` when using a `str` subclass as the column name with a `DatetimeIndex` ([GH 37366](https://github.com/pandas-dev/pandas/issues/37366))

+ Bug in `PeriodIndex.get_loc()` failing to raise a `KeyError` when given a `Period` with a mismatched `freq` ([GH 41670](https://github.com/pandas-dev/pandas/issues/41670))

+ Bug in `.loc.__getitem__` with a `UInt64Index` and negative integer keys raising `OverflowError` instead of `KeyError` in some cases and wrapping around to positive integers in others ([GH 41777](https://github.com/pandas-dev/pandas/issues/41777))

+ Bug in `Index.get_indexer()` failing to raise a `ValueError` in some cases with invalid `method`, `limit` or `tolerance` arguments ([GH 41918](https://github.com/pandas-dev/pandas/issues/41918))

+ Bug when slicing a `Series` or `DataFrame` with a `TimedeltaIndex` and an invalid string raising `ValueError` instead of `TypeError` ([GH 41821](https://github.com/pandas-dev/pandas/issues/41821))

+ Bug in the `Index` constructor sometimes silently ignoring a specified `dtype` ([GH 38879](https://github.com/pandas-dev/pandas/issues/38879))

+ `Index.where()` behavior now mirrors `Index.putmask()` behavior, i.e. `index.where(mask, other)` matches `index.putmask(~mask, other)` ([GH 39412](https://github.com/pandas-dev/pandas/issues/39412))
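
The equivalence stated in the last entry, `index.where(mask, other)` versus `index.putmask(~mask, other)`, can be sketched as follows. The index values and mask are arbitrary examples chosen for illustration.

```python
import numpy as np
import pandas as pd

index = pd.Index([10, 20, 30, 40])
mask = np.array([True, False, True, False])

# where() keeps entries where mask is True and fills the rest with `other`.
kept_where_true = index.where(mask, 0)

# putmask() overwrites entries where its mask is True, so the inverted mask is passed.
replaced_where_false = index.putmask(~mask, 0)
```

Both calls should produce the same index, with the second and fourth entries replaced by `0`.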

### Missing

+ Bug in `Grouper` not correctly propagating the `dropna` argument; `DataFrameGroupBy.transform()` now correctly handles missing values for `dropna=True` ([GH 35612](https://github.com/pandas-dev/pandas/issues/35612))

+ Bug in `isna()`, `Series.isna()`, `Index.isna()`, `DataFrame.isna()`, and the corresponding `notna` functions not recognizing `Decimal("NaN")` objects ([GH 39409](https://github.com/pandas-dev/pandas/issues/39409))

+ Bug in `DataFrame.fillna()` not accepting a dictionary for the `downcast` keyword ([GH 40809](https://github.com/pandas-dev/pandas/issues/40809))

+ Bug in `isna()` not returning a copy of the mask for nullable types, causing any subsequent mask modification to change the original array ([GH 40935](https://github.com/pandas-dev/pandas/issues/40935))

+ Bug in `DataFrame` construction with floating point data containing `NaN` and an integer `dtype` casting the values instead of retaining the `NaN` ([GH 26919](https://github.com/pandas-dev/pandas/issues/26919))

+ Bug in `Series.isin()` and `MultiIndex.isin()` not treating all NaNs as equivalent if they were in tuples ([GH 41836](https://github.com/pandas-dev/pandas/issues/41836))
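
As a minimal sketch of the `Decimal("NaN")` entry above, the following assumes pandas 1.3 or later. The sample values are invented for illustration.

```python
from decimal import Decimal

import pandas as pd

# Hypothetical object-dtype data mixing a Decimal NaN with a regular Decimal.
values = pd.Series([Decimal("NaN"), Decimal("1.5")], dtype=object)

# Scalar check and element-wise check.
scalar_is_na = pd.isna(Decimal("NaN"))
series_is_na = values.isna().tolist()

print(scalar_is_na, series_is_na)
```

With the fix in place, the `Decimal("NaN")` entries should be reported as missing and the regular `Decimal` value should not.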

### MultiIndex

+ Bug in `DataFrame.drop()` raising a `TypeError` when the `MultiIndex` is non-unique and `level` is not provided ([GH 36293](https://github.com/pandas-dev/pandas/issues/36293))

+ Bug in `MultiIndex.intersection()` duplicating `NaN` in the result ([GH 38623](https://github.com/pandas-dev/pandas/issues/38623))

+ Bug in `MultiIndex.equals()` incorrectly returning `True` when the `MultiIndex` contained `NaN`, even when the entries were ordered differently ([GH 38439](https://github.com/pandas-dev/pandas/issues/38439))

+ Bug in `MultiIndex.intersection()` always returning an empty result when intersecting with a `CategoricalIndex` ([GH 38653](https://github.com/pandas-dev/pandas/issues/38653))

+ Bug in `MultiIndex.difference()` incorrectly raising `TypeError` when the indexes contained unsortable entries ([GH 41915](https://github.com/pandas-dev/pandas/issues/41915))

+ Bug in `MultiIndex.reindex()` raising a `ValueError` when used on an empty `MultiIndex` and indexing only a specific level ([GH 41170](https://github.com/pandas-dev/pandas/issues/41170))

+ Bug in `MultiIndex.reindex()` raising a `TypeError` when reindexing against a flat `Index` ([GH 41707](https://github.com/pandas-dev/pandas/issues/41707))

### I/O

+ Bug in `Index.__repr__()` when `display.max_seq_items=1` ([GH 38415](https://github.com/pandas-dev/pandas/issues/38415))

+ Bug in `read_csv()` not recognizing scientific notation if the argument `decimal` is set and `engine="python"` ([GH 31920](https://github.com/pandas-dev/pandas/issues/31920))

+ Bug in `read_csv()` interpreting an `NA` value as a comment when the `NA` value contains the comment string; fixed for `engine="python"` ([GH 34002](https://github.com/pandas-dev/pandas/issues/34002))

+ Bug in `read_csv()` raising an `IndexError` with multiple header columns and `index_col` specified when the file has no data rows ([GH 38292](https://github.com/pandas-dev/pandas/issues/38292))

+ Bug in `read_csv()` not accepting `usecols` with a different length than `names` for `engine="python"` ([GH 16469](https://github.com/pandas-dev/pandas/issues/16469))

+ Bug in `read_csv()` returning object dtype when `delimiter=","` with `usecols` and `parse_dates` specified for `engine="python"` ([GH 35873](https://github.com/pandas-dev/pandas/issues/35873))

+ Bug in `read_csv()` raising a `TypeError` when `names` and `parse_dates` are specified for `engine="c"` ([GH 33699](https://github.com/pandas-dev/pandas/issues/33699))

+ Bug where `read_clipboard()` and `DataFrame.to_clipboard()` do not work properly in WSL ([GH 38527](https://github.com/pandas-dev/pandas/issues/38527))

+ Allow custom error values for the `parse_dates` argument of `read_sql()`, `read_sql_query()` and `read_sql_table()` ([GH 35185](https://github.com/pandas-dev/pandas/issues/35185))

+ Bug in `DataFrame.to_hdf()` and `Series.to_hdf()` raising a `KeyError` ([GH 33748](https://github.com/pandas-dev/pandas/issues/33748))

+ Bug in `HDFStore.put()` raising a wrong `TypeError` when saving a DataFrame with a non-string dtype ([GH 34274](https://github.com/pandas-dev/pandas/issues/34274))

+ Bug in `json_normalize()` resulting in the first element of a generator object not being included in the returned DataFrame ([GH 35923](https://github.com/pandas-dev/pandas/issues/35923))

+ Bug in `read_csv()` applying the thousands separator to date columns when the column should be parsed for dates and `usecols` is specified for `engine="python"` ([GH 39365](https://github.com/pandas-dev/pandas/issues/39365))

+ Bug in `read_excel()` forward-filling `MultiIndex` names when multiple header and index columns are specified ([GH 34673](https://github.com/pandas-dev/pandas/issues/34673))

+ There is a bug in `read_excel()` that does not respect `set_option()` ([GH 34252](https://github.com/pandas-dev/pandas/issues/34252))

+ Bug in `read_csv()` not switching `true_values` and `false_values` for nullable Boolean dtype ([GH 34655](https://github.com/pandas-dev/pandas/issues/34655))

+ Bug in `read_json()` failing to maintain a numeric string index when `orient="split"` ([GH 28556](https://github.com/pandas-dev/pandas/issues/28556))

+ `read_sql()` returned an empty generator if `chunksize` was non-zero and the query returned no results. It now returns a generator containing a single empty DataFrame ([GH 34411](https://github.com/pandas-dev/pandas/issues/34411))

+ Bug in `read_hdf()` returning unexpected records when filtering on categorical string columns using the `where` parameter ([GH 39189](https://github.com/pandas-dev/pandas/issues/39189))

+ Bug in `read_sas()` raising a `ValueError` when `datetimes` were null ([GH 39725](https://github.com/pandas-dev/pandas/issues/39725))

+ Bug in `read_excel()` dropping empty values from single-column spreadsheets ([GH 39808](https://github.com/pandas-dev/pandas/issues/39808))

+ Bug in `read_excel()` loading trailing empty rows/columns for certain file types ([GH 41167](https://github.com/pandas-dev/pandas/issues/41167))

+ Bug in `read_excel()` raising an `AttributeError` when the Excel file had a `MultiIndex` header followed by two empty rows and no index ([GH 40442](https://github.com/pandas-dev/pandas/issues/40442))

+ Bug in `read_excel()`, `read_csv()`, `read_table()`, `read_fwf()` and `read_clipboard()` where a blank line after a `MultiIndex` header would be dropped ([GH 40442](https://github.com/pandas-dev/pandas/issues/40442))

+ Bug in `DataFrame.to_string()` misaligned truncated columns when `index=False` ([GH 40904](https://github.com/pandas-dev/pandas/issues/40904))

+ Bug in `DataFrame.to_string()` adding an extra dot and misaligning the truncation row when `index=False` ([GH 40904](https://github.com/pandas-dev/pandas/issues/40904))

+ Bug in `read_orc()` always raising an `AttributeError` ([GH 40918](https://github.com/pandas-dev/pandas/issues/40918))

+ Bug in `read_csv()` and `read_table()` silently ignoring `prefix` if `names` and `prefix` are defined; this now raises a `ValueError` ([GH 39123](https://github.com/pandas-dev/pandas/issues/39123))

+ Bug in `read_csv()` and `read_excel()` not respecting the dtype for a duplicated column name when `mangle_dupe_cols` is set to `True` ([GH 35211](https://github.com/pandas-dev/pandas/issues/35211))

+ Bug in `read_csv()` silently ignoring `sep` if `delimiter` and `sep` are defined; this now raises a `ValueError` ([GH 39823](https://github.com/pandas-dev/pandas/issues/39823))

+ Bug in `read_csv()` and `read_table()` misinterpreting arguments when `sys.setprofile` had been previously called ([GH 41069](https://github.com/pandas-dev/pandas/issues/41069))

+ Bug in the conversion from PyArrow to pandas (e.g. for reading Parquet) with nullable dtypes and a PyArrow array whose data buffer size is not a multiple of the dtype size ([GH 40896](https://github.com/pandas-dev/pandas/issues/40896))

+ Bug in `read_excel()` raising an error when pandas could not determine the file type even though the user specified the `engine` argument ([GH 41225](https://github.com/pandas-dev/pandas/issues/41225))

+ Bug in `read_clipboard()` shifting values into the wrong column when copying from an Excel file with null values in the first column ([GH 41108](https://github.com/pandas-dev/pandas/issues/41108))

+ Bug in `DataFrame.to_hdf()` and `Series.to_hdf()` raising a `TypeError` when trying to append an incompatible column to a string column ([GH 41897](https://github.com/pandas-dev/pandas/issues/41897))
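
The `sep`/`delimiter` conflict described above can be sketched as follows. The CSV text is a made-up example, and the exact error message is not asserted here, only the exception type named in the note.

```python
import io

import pandas as pd

# Hypothetical semicolon-separated input used only for illustration.
csv_text = "a;b\n1;2\n"

try:
    # Passing both `sep` and `delimiter` is ambiguous.
    pd.read_csv(io.StringIO(csv_text), sep=",", delimiter=";")
    outcome = "no error"
except ValueError:
    outcome = "ValueError"

print(outcome)
```

Passing only one of the two keywords avoids the error.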

### Period

+ Comparisons of `Period` objects, or of `Index`, `Series`, or `DataFrame` objects, with mismatched `PeriodDtype` now behave like other mismatched-type comparisons: they return `False` for equality, return `True` for `!=`, and raise `TypeError` for ordering comparisons such as `<` ([GH 39274](https://github.com/pandas-dev/pandas/issues/39274))
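
A minimal sketch of this behavior for scalar `Period` objects, using made-up dates. The note names `TypeError` for ordering comparisons; as an assumption, the sketch also catches `ValueError`, since the exact exception class raised for scalars may differ between versions.

```python
import pandas as pd

# Two hypothetical periods with different frequencies.
monthly = pd.Period("2021-01", freq="M")
daily = pd.Period("2021-01-01", freq="D")

is_equal = monthly == daily        # mismatched freq compares as not equal
is_not_equal = monthly != daily

try:
    monthly < daily
    ordering = "no error"
except (TypeError, ValueError) as err:
    # Assumption: the concrete exception class is not checked here.
    ordering = type(err).__name__

print(is_equal, is_not_equal, ordering)
```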

### Plotting

+ Bug in `plotting.scatter_matrix()` raising an error when a 2D `ax` argument was passed ([GH 16253](https://github.com/pandas-dev/pandas/issues/16253))

+ Prevent warnings when Matplotlib's `constrained_layout` is enabled ([GH 25261](https://github.com/pandas-dev/pandas/issues/25261))

+ Bug in `DataFrame.plot()` showing the wrong colors in the legend if the function was called repeatedly and some calls used `yerr` while others did not ([GH 39522](https://github.com/pandas-dev/pandas/issues/39522))

+ Bug in `DataFrame.plot()` showing the wrong colors in the legend if the function was called repeatedly and some calls used `secondary_y` while others used `legend=False` ([GH 40044](https://github.com/pandas-dev/pandas/issues/40044))

+ Bug in `DataFrame.plot.box()` where the caps or min/max markers were not visible when the `dark_background` theme was selected ([GH 40769](https://github.com/pandas-dev/pandas/issues/40769))

### Groupby/resample/rolling

+ Bug in `DataFrameGroupBy.agg()` and `SeriesGroupBy.agg()` with `PeriodDtype` columns incorrectly casting results too aggressively ([GH 38254](https://github.com/pandas-dev/pandas/issues/38254))

+ Bug in `SeriesGroupBy.value_counts()` where unobserved categories in a grouped categorical Series were not tallied ([GH 38672](https://github.com/pandas-dev/pandas/issues/38672))

+ A bug in `SeriesGroupBy.value_counts()` where an error was raised on empty series ([GH 39172](https://github.com/pandas-dev/pandas/issues/39172))

+ Bug in `GroupBy.indices()` where non-existent indices were included when there was a null value in the grouping key ([GH 9304](https://github.com/pandas-dev/pandas/issues/9304))

+ Fixed bug in `DataFrameGroupBy.sum()` and `SeriesGroupBy.sum()` causing a loss of precision; Kahan summation is now used ([GH 38778](https://github.com/pandas-dev/pandas/issues/38778))

+ Fixed bug in `DataFrameGroupBy.cumsum()`, `SeriesGroupBy.cumsum()`, `DataFrameGroupBy.mean()` and `SeriesGroupBy.mean()` causing a loss of precision; Kahan summation is now used ([GH 38934](https://github.com/pandas-dev/pandas/issues/38934))

+ Bug in `Resampler.aggregate()` and `DataFrame.transform()` raising a `TypeError` instead of `SpecificationError` when missing keys had mixed data types ([GH 39025](https://github.com/pandas-dev/pandas/issues/39025))

+ Bug in `DataFrameGroupBy.idxmin()` and `DataFrameGroupBy.idxmax()` involving `ExtensionDtype` column ([GH 38733](https://github.com/pandas-dev/pandas/issues/38733))

+ Bug in `Series.resample()` raising an exception when the index was a `PeriodIndex` consisting of `NaT` ([GH 39227](https://github.com/pandas-dev/pandas/issues/39227))

+ Bug in `RollingGroupby.corr()` and `ExpandingGroupby.corr()` where the groupby column would return `0` instead of `np.nan` when the supplied `other` was longer than each group ([GH 39591](https://github.com/pandas-dev/pandas/issues/39591))

+ Bug in `ExpandingGroupby.corr()` and `ExpandingGroupby.cov()` where `1` was returned instead of `np.nan` when the supplied `other` was longer than each group ([GH 39591](https://github.com/pandas-dev/pandas/issues/39591))

+ Bug in `DataFrameGroupBy.mean()`, `SeriesGroupBy.mean()`, `DataFrameGroupBy.median()`, `SeriesGroupBy.median()` and `DataFrame.pivot_table()` not propagating metadata ([GH 28283](https://github.com/pandas-dev/pandas/issues/28283))

+ Bug in `Series.rolling()` and `DataFrame.rolling()` not calculating window bounds correctly when the window was an offset and the dates were in descending order ([GH 40002](https://github.com/pandas-dev/pandas/issues/40002))

+ Bug in `Series.groupby()` and `DataFrame.groupby()` on an empty `Series` or `DataFrame` losing the index, columns, and/or data types when directly using the methods `idxmax`, `idxmin`, `mad`, `min`, `max`, `sum`, `prod` and `skew`, or when using them through `apply`, `aggregate` or `resample` ([GH 26411](https://github.com/pandas-dev/pandas/issues/26411))

+ Bug in `DataFrameGroupBy.apply()` and `SeriesGroupBy.apply()` where a `MultiIndex` would be created instead of an `Index` when used on a `RollingGroupby` object ([GH 39732](https://github.com/pandas-dev/pandas/issues/39732))

+ Bug in `DataFrameGroupBy.sample()` where an error was raised when `weights` was specified and the index was an `Int64Index` ([GH 39927](https://github.com/pandas-dev/pandas/issues/39927))

+ Bug in `DataFrameGroupBy.aggregate()` and `Resampler.aggregate()`, which would sometimes raise a `SpecificationError` when passed a dictionary and columns were missing; a `KeyError` is now always raised instead ([GH 40004](https://github.com/pandas-dev/pandas/issues/40004))

+ A bug in `DataFrameGroupBy.sample()` where column selection was not applied before calculating the result ([GH 39928](https://github.com/pandas-dev/pandas/issues/39928))

+ Bug in `ExponentialMovingWindow` where calling `__getitem__` would incorrectly raise a `ValueError` when `times` was provided ([GH 40164](https://github.com/pandas-dev/pandas/issues/40164))

+ Bug in `ExponentialMovingWindow` where calling `__getitem__` would not retain the `com`, `span`, `alpha` or `halflife` attributes ([GH 40164](https://github.com/pandas-dev/pandas/issues/40164))

+ `ExponentialMovingWindow` now raises a `NotImplementedError` when specifying `times` with `adjust=False`, because the calculation was incorrect ([GH 40098](https://github.com/pandas-dev/pandas/issues/40098))

+ Bug in `ExponentialMovingWindowGroupby.mean()` where the `times` argument was ignored when `engine='numba'` ([GH 40951](https://github.com/pandas-dev/pandas/issues/40951))

+ A bug in `ExponentialMovingWindowGroupby.mean()`, using the wrong time when there are multiple groups ([GH 40951](https://github.com/pandas-dev/pandas/issues/40951))

+ Bug in `ExponentialMovingWindowGroupby` where the times vector and the values became out of sync for non-trivial groups ([GH 40951](https://github.com/pandas-dev/pandas/issues/40951))

+ Bug in `Series.asfreq()` and `DataFrame.asfreq()` dropping rows when the index was not sorted ([GH 39805](https://github.com/pandas-dev/pandas/issues/39805))

+ Bug in aggregation functions for `DataFrame` not respecting the `numeric_only` argument when the `level` keyword was given ([GH 40660](https://github.com/pandas-dev/pandas/issues/40660))

+ Bug in `SeriesGroupBy.aggregate()` where using a user-defined function to aggregate a Series with an object-typed `Index` caused an incorrect `Index` shape ([GH 40014](https://github.com/pandas-dev/pandas/issues/40014))

+ Bug in `RollingGroupby` where the `as_index=False` argument in `groupby` was ignored ([GH 39433](https://github.com/pandas-dev/pandas/issues/39433))

+ Bug in `DataFrameGroupBy.any()`, `SeriesGroupBy.any()`, `DataFrameGroupBy.all()` and `SeriesGroupBy.all()` raising a `ValueError` when used with nullable type columns holding `NA`, even with `skipna=True` ([GH 40585](https://github.com/pandas-dev/pandas/issues/40585))

+ Bug in `DataFrameGroupBy.cummin()`, `SeriesGroupBy.cummin()`, `DataFrameGroupBy.cummax()` and `SeriesGroupBy.cummax()` incorrectly rounding integer values near the `int64` implementation bounds ([GH 40767](https://github.com/pandas-dev/pandas/issues/40767))

+ Bug in `DataFrameGroupBy.rank()` and `SeriesGroupBy.rank()` incorrectly raising a `TypeError` with nullable dtypes ([GH 41010](https://github.com/pandas-dev/pandas/issues/41010))

+ Bug in `DataFrameGroupBy.cummin()`, `SeriesGroupBy.cummin()`, `DataFrameGroupBy.cummax()` and `SeriesGroupBy.cummax()` computing the wrong result with nullable data types whose values are too large to round-trip when cast to float ([GH 37493](https://github.com/pandas-dev/pandas/issues/37493))

+ Bug in `DataFrame.rolling()` returning a mean of zero for an all-`NaN` window with `min_periods=0` if the calculation is not numerically stable ([GH 41053](https://github.com/pandas-dev/pandas/issues/41053))

+ Bug in `DataFrame.rolling()` returning a non-zero sum for an all-`NaN` window with `min_periods=0` if the calculation is not numerically stable ([GH 41053](https://github.com/pandas-dev/pandas/issues/41053))

+ Bug in `SeriesGroupBy.agg()` failing to retain an ordered `CategoricalDtype` on order-preserving aggregations ([GH 41147](https://github.com/pandas-dev/pandas/issues/41147))

+ Bug in `DataFrameGroupBy.min()`, `SeriesGroupBy.min()`, `DataFrameGroupBy.max()` and `SeriesGroupBy.max()` incorrectly raising a `ValueError` with multiple object-dtype columns and `numeric_only=False` ([GH 41111](https://github.com/pandas-dev/pandas/issues/41111))

+ Bug in `DataFrameGroupBy.rank()` with the GroupBy object's `axis=0` and the `rank` method's keyword `axis=1` ([GH 41320](https://github.com/pandas-dev/pandas/issues/41320))

+ Bug in `DataFrameGroupBy.__getitem__()` with non-unique columns incorrectly returning a malformed `SeriesGroupBy` instead of a `DataFrameGroupBy` ([GH 41427](https://github.com/pandas-dev/pandas/issues/41427))

+ Bug in `DataFrameGroupBy.transform()` with non-unique columns incorrectly raising an `AttributeError` ([GH 41427](https://github.com/pandas-dev/pandas/issues/41427))

+ Bug in `Resampler.apply()` with non-unique columns incorrectly dropping duplicated columns ([GH 41445](https://github.com/pandas-dev/pandas/issues/41445))

+ Bug in `Series.groupby()` aggregations incorrectly returning an empty `Series` instead of raising `TypeError` on aggregations that are invalid for the dtype, e.g. `.prod` with `datetime64[ns]` dtype ([GH 41342](https://github.com/pandas-dev/pandas/issues/41342))

+ Bug in `DataFrameGroupBy` aggregations incorrectly failing to drop columns with invalid dtypes for that aggregation when there are no valid columns ([GH 41291](https://github.com/pandas-dev/pandas/issues/41291))

+ Bug in `DataFrame.rolling.__iter__()` where `on` was not assigned to the index of the resulting objects ([GH 40373](https://github.com/pandas-dev/pandas/issues/40373))

+ Bug in `DataFrameGroupBy.transform()` and `DataFrameGroupBy.agg()` with `engine="numba"` where `*args` were being cached with the user-passed function ([GH 41647](https://github.com/pandas-dev/pandas/issues/41647))

+ Bug in the `DataFrameGroupBy` methods `agg`, `transform`, `sum`, `bfill`, `ffill`, `pad`, `pct_change`, `shift` and `ohlc` dropping `.columns.names` ([GH 41497](https://github.com/pandas-dev/pandas/issues/41497))
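
To illustrate the `aggregate()` entry in this section, where a dictionary naming a missing column now raises `KeyError`, here is a minimal sketch. The frame and the column name `missing` are hypothetical.

```python
import pandas as pd

# Hypothetical frame; "missing" is deliberately not one of its columns.
df = pd.DataFrame({"key": ["a", "a", "b"], "x": [1, 2, 3]})

try:
    df.groupby("key").agg({"x": "sum", "missing": "sum"})
    outcome = "no error"
except KeyError:
    outcome = "KeyError"

print(outcome)
```

Catching `KeyError` is therefore enough when validating user-supplied aggregation dictionaries on pandas 1.3 or later.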

### Reshaping

+ Bug in `merge()` raising an error when performing an inner join with a partial index and `right_index=True` when there was no overlap between indices ([GH 33814](https://github.com/pandas-dev/pandas/issues/33814))

+ Bug in `DataFrame.unstack()` caused incorrect index names when levels were missing ([GH 37510](https://github.com/pandas-dev/pandas/issues/37510))

+ Bug in `merge_asof()` propagating the right index instead of the left index when `left_index=True` and `right_on` were specified ([GH 33463](https://github.com/pandas-dev/pandas/issues/33463))

+ Bug in `DataFrame.join()` on a DataFrame with a `MultiIndex` returning the wrong result when one or both indexes had only one level ([GH 36909](https://github.com/pandas-dev/pandas/issues/36909))

+ `merge_asof()` now raises a `ValueError` instead of a cryptic `TypeError` in case of non-numerical merge columns ([GH 29130](https://github.com/pandas-dev/pandas/issues/29130))

+ Bug in `DataFrame.join()` not assigning values correctly when the DataFrame had a `MultiIndex` where at least one dimension had dtype `Categorical` with non-alphabetically sorted categories ([GH 38502](https://github.com/pandas-dev/pandas/issues/38502))

+ `Series.value_counts()` and `Series.mode()` now return consistent keys in original order ([GH 12679](https://github.com/pandas-dev/pandas/issues/12679), [GH 11227](https://github.com/pandas-dev/pandas/issues/11227) and [GH 39007](https://github.com/pandas-dev/pandas/issues/39007))

+ Bug in `DataFrame.stack()` not properly handling `NaN` in `MultiIndex` column ([GH 39481](https://github.com/pandas-dev/pandas/issues/39481))

+ Bug in `DataFrame.apply()` giving incorrect results when the argument `func` was a string, `axis=1`, and the axis argument was not supported; it now raises a `ValueError` instead ([GH 39211](https://github.com/pandas-dev/pandas/issues/39211))

+ Bug in `DataFrame.sort_values()` not reshaping the index correctly after sorting on columns when `ignore_index=True` ([GH 39464](https://github.com/pandas-dev/pandas/issues/39464))

+ Bug in `DataFrame.append()` returning incorrect dtypes with combinations of `ExtensionDtype` dtypes ([GH 39454](https://github.com/pandas-dev/pandas/issues/39454))

+ Bug in `DataFrame.append()` returning incorrect dtypes when used with combinations of `datetime64` and `timedelta64` dtypes ([GH 39574](https://github.com/pandas-dev/pandas/issues/39574))

+ Bug in `DataFrame.append()` with a `DataFrame` with a `MultiIndex` when appending a `Series` whose `Index` is not a `MultiIndex` ([GH 41707](https://github.com/pandas-dev/pandas/issues/41707))

+ Bug in `DataFrame.pivot_table()` returning a `MultiIndex` for a single value when operating on an empty DataFrame ([GH 13483](https://github.com/pandas-dev/pandas/issues/13483))

+ `Index` can now be passed to the [`numpy.all()`](https://numpy.org/doc/stable/reference/generated/numpy.all.html#numpy.all "(in NumPy v1.26)") function ([GH 40180](https://github.com/pandas-dev/pandas/issues/40180))

+ Bug in `DataFrame.stack()` not preserving `CategoricalDtype` in a `MultiIndex` ([GH 36991](https://github.com/pandas-dev/pandas/issues/36991))

+ Bug in `to_datetime()`, which raised an error when the input sequence contained unhashable items ([GH 39756](https://github.com/pandas-dev/pandas/issues/39756))

+ Bug in `Series.explode()` preserving the index when `ignore_index` was `True` and the values were scalars ([GH 40487](https://github.com/pandas-dev/pandas/issues/40487))

+ Bug in `to_datetime()` raising a `ValueError` when a `Series` contains `None` and `NaT` and has more than 50 elements ([GH 39882](https://github.com/pandas-dev/pandas/issues/39882))

+ Bug in `Series.unstack()` and `DataFrame.unstack()` ([GH 41875](https://github.com/pandas-dev/pandas/issues/41875))

+ Bug in `DataFrame.melt()` raising `InvalidIndexError` when the `DataFrame` has duplicate columns used as `value_vars` ([GH 41951](https://github.com/pandas-dev/pandas/issues/41951))
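
The entry in this section about passing an `Index` to `numpy.all()` can be sketched as below. The index contents are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical indexes: one with only truthy values, one containing a zero.
all_truthy = pd.Index([1, 2, 3])
has_zero = pd.Index([1, 0, 3])

result_truthy = bool(np.all(all_truthy))
result_zero = bool(np.all(has_zero))

print(result_truthy, result_zero)
```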

### Sparse

+ Bug in `DataFrame.sparse.to_coo()` raising a `KeyError` with columns that are a numeric `Index` without a `0` ([GH 18414](https://github.com/pandas-dev/pandas/issues/18414))

+ Bug in `SparseArray.astype()` with `copy=False` producing incorrect results when going from integer dtype to floating point dtype ([GH 34456](https://github.com/pandas-dev/pandas/issues/34456))

+ Bug in `SparseArray.max()` and `SparseArray.min()` always returning an empty result ([GH 40921](https://github.com/pandas-dev/pandas/issues/40921))

### ExtensionArray

+ Bug in `DataFrame.where()` when `other` is a Series with an `ExtensionDtype` ([GH 38729](https://github.com/pandas-dev/pandas/issues/38729))

+ Fixed bug where `Series.idxmax()`, `Series.idxmin()`, `Series.argmax()` and `Series.argmin()` would fail when the underlying data is an `ExtensionArray` ([GH 32749](https://github.com/pandas-dev/pandas/issues/32749), [GH 33719](https://github.com/pandas-dev/pandas/issues/33719), [GH 36566](https://github.com/pandas-dev/pandas/issues/36566))

+ Fixed bug where some properties of subclasses of `PandasExtensionDtype` were cached incorrectly ([GH 40329](https://github.com/pandas-dev/pandas/issues/40329))

+ Bug in `DataFrame.mask()` where masking a DataFrame with an `ExtensionDtype` raised a `ValueError` ([GH 40941](https://github.com/pandas-dev/pandas/issues/40941))

### Styler

+ Bug in `Styler` where the `subset` argument raised an error for some valid MultiIndex slices ([GH 33562](https://github.com/pandas-dev/pandas/issues/33562))

+ `Styler` rendered HTML output has seen minor alterations to support w3 good code standards ([GH 39626](https://github.com/pandas-dev/pandas/issues/39626))

+ Bug in `Styler` where the rendered HTML was missing a column class identifier for certain header cells ([GH 39716](https://github.com/pandas-dev/pandas/issues/39716))

+ Bug in `Styler.background_gradient()` where the text color was not determined correctly ([GH 39888](https://github.com/pandas-dev/pandas/issues/39888))

+ Bug in `Styler.set_table_styles()` where multiple elements in the CSS selectors of the `table_styles` argument were not added correctly ([GH 34061](https://github.com/pandas-dev/pandas/issues/34061))

+ Bug in `Styler` where copying from Jupyter dropped the top left cell and misaligned headers ([GH 12147](https://github.com/pandas-dev/pandas/issues/12147))

+ Bug in `Styler.where` where `kwargs` were not passed to the applicable callable ([GH 40845](https://github.com/pandas-dev/pandas/issues/40845))

+ Bug in `Styler` causing CSS to duplicate on multiple renders ([GH 39395](https://github.com/pandas-dev/pandas/issues/39395), [GH 40334](https://github.com/pandas-dev/pandas/issues/40334))

### Other

+ `inspect.getmembers(Series)` no longer raises an `AbstractMethodError` ([GH 38782](https://github.com/pandas-dev/pandas/issues/38782))

+ Bug in `Series.where()` with numeric dtype and `other=None` not casting to `nan` ([GH 39761](https://github.com/pandas-dev/pandas/issues/39761))

+ Bug in `assert_series_equal()`, `assert_frame_equal()`, `assert_index_equal()` and `assert_extension_array_equal()` incorrectly raising when an attribute has an unrecognized NA type ([GH 39461](https://github.com/pandas-dev/pandas/issues/39461))

+ Bug in `assert_index_equal()` with `exact=True` not raising when comparing `CategoricalIndex` instances with `Int64Index` and `RangeIndex` categories ([GH 41263](https://github.com/pandas-dev/pandas/issues/41263))

+ Bug in `DataFrame.equals()`, `Series.equals()` and `Index.equals()` with object dtype containing `np.datetime64("NaT")` or `np.timedelta64("NaT")` ([GH 39650](https://github.com/pandas-dev/pandas/issues/39650))

+ Bug in `show_versions()` where console JSON output was not proper JSON ([GH 39701](https://github.com/pandas-dev/pandas/issues/39701))

+ pandas can now compile on z/OS when using [xlc](https://www.ibm.com/products/xl-cpp-compiler-zos) ([GH 35826](https://github.com/pandas-dev/pandas/issues/35826))

+ Bug in `pandas.util.hash_pandas_object()` not recognizing `hash_key`, `encoding` and `categorize` when the input object type is a `DataFrame` ([GH 41404](https://github.com/pandas-dev/pandas/issues/41404))

## Contributors

A total of 251 people contributed patches to this version. People with a "+" next to their name are contributing patches for the first time.

+   Abhishek R +

+ Ada Draginda

+   Adam J. Stewart

+   Adam Turner +

+   Aidan Feldman +

+   Ajitesh Singh +

+ Akshat Jain +

+ Albert Villanova del Moral

+ Alexandre Prince-Levasseur +

+   Andrew Hawyrluk +

+ Andrew Wieteska

+   AnglinaBhambra +

+ Ankush Dua +

+ Anna Daglis

+   Ashlan Parker +

+ Ashwani +

+ Avinash Pancham

+ Ayushman Kumar +

+ Women

+   Benoît Vinot

+ Bharat Raghunathan

+ Bijay Regmi +

+   Bobin Mathew +

+ Bogdan Pilyavets +

+ Brian Hulette +

+   Brian Sun +

+   Brock +

+   Bryan Cutler

+   Caleb +

+   Calvin Ho +

+ Chathura Widanage +

+ Chinmay Rane +

+   Chris Lynch

+   Chris Withers

+ Christos Petropoulos

+ Corentin Girard +

+   DaPy15 +

+ Damodara Puddu +

+ Daniel Hrisca

+   Daniel Saxton

+ DanielFEvans

+ Dare Adewumi +

+   Dave Willmer

+ David Schlachter +

+ David-dmh +

+ Deepang Raval +

+   Doris Lee +

+   Dr. Jan-Philip Gehrcke +

+   DriesS +

+   Dylan Percy

+ Erfan Nariman

+   Eric Leung

+   EricLeer +

+   Eve

+ Fangchen Li

+ Felix Divo

+ Florian Jetter

+   Fred Reiss

+   GFJ138 +

+   Gaurav Sheni +

+   Geoffrey B. Eisenbarth +

+ Gesa Stupperich +

+   Griffin Ansel +

+ Gustavo C. Maciel +

+   Heidi +

+   Henry +

+ Hung-Yi Wu +

+   Ian Ozsvald +

+ Irv Lustig

+ Isaac Chung +

+   Isaac Virshup

+   JHM Darbyshire (MBP) +

+   JHM Darbyshire (iMac) +

+   Jack Liu +

+   James Lamb +

+ Jeet Parekh

+   Jeff Reback

+ Jiezheng2018 +

+   Jody Klymak

+ Johan Kåhrström +

+   John McGuigan

+ Joris Van den Bossche

+   Jose

+ JoseNavy

+   Josh Dimarsky

+   Josh Friedlander

+   Joshua Klein +

+ Julia Signell

+ Julian Schnitzler +

+ Kaiqi Dong

+ Kasim Panjri +

+   Katie Smith +

+   Kelly +

+ Kenil +

+ Keppler, Kyle +

+   Kevin Sheppard

+ Khor Chean Wei +

+   Kiley Hewitt +

+ Larry Wong +

+   Lightyears +

+   Lucas Holtz +

+ Lucas Rodés-Guirao

+   Lucky Sivagurunathan +

+ Luis Pinto

+ Maciej Kos +

+ Marc Garcia

+   Marco Edward Gorelli +

+ Marco Gorelli

+ Marco Gorelli +

+   Mark Graham

+ Martin Dengler +

+ Martin Grigorov +

+   Marty Rudolf +

+   Matt Roeschke

+   Matthew Roeschke

+   Matthew Zeitlin

+   Max Bolingbroke

+   Maxim Ivanov

+ Maxim Kupfer +

+   Mayur +

+   MeeseeksMachine

+ Micael Jarniac

+ Michael Hsieh +

+ Michel de Ruiter +

+   Mike Roberts +

+ Miroslav Šedivý

+ Mohammad Jafar Mashhadi

+ Morisa Manzella +

+ Mortada Mehyar

+   Muktan +

+   Naveen Agrawal +

+   Noah

+   Nofar Mishraki +

+ Oleh Kozynets

+ Olga Matoula +

+ Was +

+ Omar Afifi

+   Omer Ozarslan +

+   Owen Lamont +

+ Ozan Öğreden +

+ Pandas development team

+ Paolo Lammens

+ Parfait Gasana +

+   Patrick Hoefler

+   Paul McCarthy +

+ Paulo S. Costa +

+ Pav A

+   Peter

+ Pradyumna Rahul +

+ Recharges +

+ QP Hou +

+   Rahul Chauhan

+ Rahul Sathanapalli

+   Richard Shadrach

+   Robert Bradshaw

+   Robin to Roxel

+   Rohit Gupta

+ Sam Purkis +

+ Samuel GIFFARD +

+   Sean M. Law +

+   Shahar Naveh +

+   ShaharNaveh +

+   Shiv Gupta +

+ Shrey Dixit +

+ Shudong Yang +

+   Simon Boehm +

+   Simon Hawkins

+   Sioned Baker +

+ Stefan Mejlgaard +

+   Steven Pitman +

+   Steven Schaerer +

+ Stéphane Guillou +

+   TLouf +

+ Tegar D Pratama +

+ Terje Petersen

+   Theodoros Nikolaou +

+   Thomas Dickson

+ Thomas Li

+   Thomas Smith

+   Thomas Yu +

+   ThomasBlauthQC +

+ Tim Hoffmann

+   Tom Augspurger

+ Torsten Wörtwein

+   Tyler Reddy

+   UrielMaD

+ Uwe L. Korn

+ Venaturum

+   VirosaLi

+ Vladimir Podolski

+ Vyom Pathak +

+ WANG Aiyong

+ Walter Koskinen +

+ Wenjun Si +

+ William Ayd

+   Yeshwanth N +

+ Yuanhao Geng

+ Zito Relova +

+   aflah02 +

+   arredond +

+   attack68

+   cdknox +

+ chinggg +

+   fathomer +

+ ftrihardjo +

+   github-actions[bot] +

+ gunjan-solanki +

+ guru kiran

+ hasan-yaman

+ i-aki-y +

+   jbrockmendel

+   jmholzer +

+ jordi-crespo +

+ something +

+   jreback

+   juliansmidek +

+ cooling keppler

+   lrepiton +

+   lucasrodes

+   maroth96 +

+   mikeronayne +

+ mlondschien

+   moink +

+   morrme

+   mschmookler +

+ mzeitlin11

+   na2 +

+ nofarmishraki +

+ partev

+   patrick

+   ptype

+   realead

+   rhshadrach

+   rlukevie +

+   rosagold +

+   saucoide +

+ sdements +

+   shawnbrown

+   sstiijn +

+   stphnlyd +

+ sukriti1 +

+   taytzehao

+   theOehrly +

+   theodorju +

+   thordisstella +

+ tonyyyyip +

+ tsinggggg +

+ tushushu +

+ vangorade +

+ vladu +

+ wertha +

## Enhancements

### Customize HTTP(s) headers when reading csv or json files

When reading from remote URLs that are not handled by fsspec (such as HTTP and HTTPS), the dictionary passed to `storage_options` will be used to create headers included in the request. This can be used to control the User-Agent header or send other custom headers ([GH 36688](https://github.com/pandas-dev/pandas/issues/36688)). For example:

```py
In [1]: headers = {"User-Agent": "pandas"}
In [2]: df = pd.read_csv(
 ...:    "https://download.bls.gov/pub/time.series/cu/cu.item",
 ...:    sep="\t",
 ...:    storage_options=headers
 ...: ) 
```

### Read and write XML documents

We've added I/O support to read and render shallow versions of [XML](https://www.w3.org/standards/xml/core) documents with `read_xml()` and `DataFrame.to_xml()`. With [lxml](https://lxml.de) as the parser, both XPath 1.0 and XSLT 1.0 are available ([GH 27554](https://github.com/pandas-dev/pandas/issues/27554)).

```py
In [1]: xml = """<?xml version='1.0' encoding='utf-8'?>
 ...: <data>
 ...: <row>
 ...:    <shape>square</shape>
 ...:    <degrees>360</degrees>
 ...:    <sides>4.0</sides>
 ...: </row>
 ...: <row>
 ...:    <shape>circle</shape>
 ...:    <degrees>360</degrees>
 ...:    <sides/>
 ...: </row>
 ...: <row>
 ...:    <shape>triangle</shape>
 ...:    <degrees>180</degrees>
 ...:    <sides>3.0</sides>
 ...: </row>
 ...: </data>"""

In [2]: df = pd.read_xml(xml)
In [3]: df
Out[3]:
      shape  degrees  sides
0    square      360    4.0
1    circle      360    NaN
2  triangle      180    3.0

In [4]: df.to_xml()
Out[4]:
<?xml version='1.0' encoding='utf-8'?>
<data>
  <row>
    <index>0</index>
    <shape>square</shape>
    <degrees>360</degrees>
    <sides>4.0</sides>
  </row>
  <row>
    <index>1</index>
    <shape>circle</shape>
    <degrees>360</degrees>
    <sides/>
  </row>
  <row>
    <index>2</index>
    <shape>triangle</shape>
    <degrees>180</degrees>
    <sides>3.0</sides>
  </row>
</data>
```
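As a small illustrative sketch (not taken from the release notes), the same round trip can be run without lxml by selecting the standard-library parser with `parser="etree"`. The tag names `shapes` and `item` are arbitrary choices for this example, and the input is wrapped in `StringIO` because recent pandas versions prefer file-like objects over literal XML strings.

```python
from io import StringIO

import pandas as pd

# Sample document; the empty <sides/> element is read back as a missing value.
xml = """<data>
  <row><shape>square</shape><degrees>360</degrees><sides>4.0</sides></row>
  <row><shape>circle</shape><degrees>360</degrees><sides/></row>
</data>"""

# parser="etree" uses Python's built-in ElementTree instead of lxml.
df = pd.read_xml(StringIO(xml), parser="etree")

# index=False omits the <index> element; root_name and row_name rename the tags.
out = df.to_xml(index=False, root_name="shapes", row_name="item", parser="etree")

# Read the generated XML back into a new DataFrame.
roundtrip = pd.read_xml(StringIO(out), parser="etree")

print(out)
print(roundtrip)
```

Passing `index=False` is often what you want when the index is just the default row counter, since otherwise every row carries an extra `<index>` element.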

For more information, see Writing XML in the user guide on IO tools.

### Styler enhancements

We've done some focused development on <code>Styler</code>. See also the revised and improved Styler documentation (GH 39720, GH 39317, GH 40493).

<blockquote>
  <ul>
  <li><p>The method <code>Styler.set_table_styles()</code> now accepts more natural CSS language for arguments, such as <code>'color:red;'</code> instead of <code>[('color', 'red')]</code> (GH 39563)</p></li>
  <li><p>Methods <code>Styler.highlight_null()</code>, <code>Styler.highlight_min()</code> and <code>Styler.highlight_max()</code> now allow custom CSS highlighting instead of the default background coloring (GH 40242)</p></li>
  <li><p><code>Styler.apply()</code> now accepts functions that return <code>ndarray</code> when <code>axis=None</code>, making it consistent with the behavior of <code>axis=0</code> and <code>axis=1</code> (GH 39359)</p></li>
  <li><p>When providing malformed CSS via <code>Styler.apply()</code> or <code>Styler.applymap()</code>, an error is now thrown at render time (GH 39660)</p></li>
  <li><p><code>Styler.format()</code> now accepts keyword argument <code>escape</code> for optional HTML and LaTeX escaping (GH 40388, GH 41619)</p></li>
  <li><p><code>Styler.background_gradient()</code> now has argument <code>gmap</code> for providing a specific gradient map for shading (GH 22727)</p></li>
  <li><p><code>Styler.clear()</code> now also clears <code>Styler.hidden_index</code> and <code>Styler.hidden_columns</code> (GH 40484)</p></li>
  <li><p>Added method <code>Styler.highlight_between()</code> (GH 39821)</p></li>
  <li><p>Added method <code>Styler.highlight_quantile()</code> (GH 40926)</p></li>
  <li><p>Added method <code>Styler.text_gradient()</code> (GH 41098)</p></li>
  <li><p>Added method <code>Styler.set_tooltips()</code> to allow hover tips; this can be used to enhance interactive displays (GH 21266, GH 40284)</p></li>
  <li><p>Added parameter <code>precision</code> to method <code>Styler.format()</code> to control the display of floating point numbers (GH 40134)</p></li>
  <li><p>HTML output rendered by <code>Styler</code> now follows the w3 HTML style guide (GH 39626)</p></li>
  <li><p>Many features of the <code>Styler</code> class are now partially or fully available on DataFrames with non-unique indexes or columns (GH 41143)</p></li>
  <li><p>Better control over display via independent sparsification of indexes or columns using new style options, also available via <code>option_context()</code> (GH 41142)</p></li>
  <li><p>Added option <code>styler.render.max_elements</code> to avoid browser overload when styling large DataFrames (GH 40712)</p></li>
  <li><p>Added method <code>Styler.to_latex()</code> (GH 21673, GH 42320), which also allows some limited CSS transformations (GH 40731)</p></li>
  <li><p>Added method <code>Styler.to_html()</code> (GH 13379)</p></li>
  <li><p>Added method <code>Styler.set_sticky()</code> to make index and column headers permanently visible in scrolling HTML frames (GH 29072)</p></li>
  </ul>
</blockquote>

### DataFrame constructor honors <code>copy=False</code> with dict

<p>When passing a dictionary to <code>DataFrame</code> with <code>copy=False</code>, a copy will no longer be made (GH 32960).</p>

<pre><code class="language-python line-numbers">In [1]: arr = np.array([1, 2, 3])

In [2]: df = pd.DataFrame({"A": arr, "B": arr.copy()}, copy=False)

In [3]: df
Out[3]: 
   A  B
0  1  1
1  2  2
2  3  3 
</code></pre>

<code>df["A"]</code> is still a view on <code>arr</code>:

<pre><code class="language-python line-numbers">In [4]: arr[0] = 0

In [5]: assert df.iloc[0, 0] == 0 
</code></pre>
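The two modes can also be compared with `np.shares_memory`. This is only a sketch: the variable names are illustrative, and whether memory is actually shared with `copy=False` may depend on the pandas version and its copy-on-write settings.

```python
import numpy as np
import pandas as pd

arr = np.array([1, 2, 3])

# copy=False: the column is expected to reference arr's buffer.
df_view = pd.DataFrame({"A": arr}, copy=False)

# copy not passed: the dict constructor copies the input by default.
df_copy = pd.DataFrame({"A": arr})

# Check whether each column's underlying data overlaps with arr.
shares_view = np.shares_memory(df_view["A"].to_numpy(), arr)
shares_copy = np.shares_memory(df_copy["A"].to_numpy(), arr)

print(shares_view, shares_copy)
```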

The default behavior when <code>copy</code> is not passed remains unchanged: a copy is made.

### PyArrow backed string data type

We've enhanced <code>StringDtype</code>, an extension type dedicated to string data (GH 39908).

It is now possible to specify a <code>storage</code> keyword option to <code>StringDtype</code>. Use pandas options or specify the dtype using <code>dtype='string[pyarrow]'</code> to allow the StringArray to be backed by a PyArrow array instead of a NumPy array of Python objects.

The PyArrow backed StringArray requires pyarrow 1.0.0 or greater to be installed.

Warning

<code>string[pyarrow]</code> is currently considered experimental. The implementation and parts of the API may change without warning.

<pre><code class="language-python line-numbers">In [6]: pd.Series(['abc', None, 'def'], dtype=pd.StringDtype(storage="pyarrow"))
Out[6]: 
0     abc
1    <NA>
2     def
dtype: string 
</code></pre>

You can also use the alias <code>"string[pyarrow]"</code>.

<pre><code class="language-python line-numbers">In [7]: s = pd.Series(['abc', None, 'def'], dtype="string[pyarrow]")

In [8]: s
Out[8]: 
0     abc
1    <NA>
2     def
dtype: string 
</code></pre>

You can also create a PyArrow backed string array using pandas options.

<pre><code class="language-python line-numbers">In [9]: with pd.option_context("string_storage", "pyarrow"):
 ...:    s = pd.Series(['abc', None, 'def'], dtype="string")
 ...: 

In [10]: s
Out[10]: 
0     abc
1    <NA>
2     def
dtype: string 
</code></pre>
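To see which backend a given string Series ended up with, the dtype exposes a `storage` attribute. The sketch below is illustrative only; it falls back to the default Python-object storage when pyarrow is not installed, so it should run either way.

```python
import importlib.util

import pandas as pd

# Use the PyArrow backend only if pyarrow can be found (assumption: an
# installed pyarrow is recent enough for the running pandas version).
storage = "pyarrow" if importlib.util.find_spec("pyarrow") else "python"

s = pd.Series(["abc", None, "def"], dtype=pd.StringDtype(storage=storage))

# The dtype records which storage backs the array.
print(s.dtype.storage)

# Results of string accessor methods keep the same string dtype.
upper = s.str.upper()
print(upper.dtype == s.dtype)
```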

The usual string accessor methods work. Where appropriate, the return type of the Series or columns of a DataFrame will also have string dtype.

<pre><code class="language-python line-numbers">In [11]: s.str.upper()
Out[11]: 
0     ABC
1    <NA>
2     DEF
dtype: string

In [12]: s.str.split('b', expand=True).dtypes
Out[12]: 
0    string[pyarrow]
1    string[pyarrow]
dtype: object 
</code></pre>

String accessor methods returning integers will return a value with <code>Int64Dtype</code>.

<pre><code class="language-python line-numbers">In [13]: s.str.count("a")
Out[13]: 
0       1
1    <NA>
2       0
dtype: Int64 
</code></pre>

### Centered datetime-like rolling windows

When performing rolling calculations on DataFrame and Series objects with a datetime-like index, a centered datetime-like window can now be used ([GH 38780](https://github.com/pandas-dev/pandas/issues/38780)). For example:

```py
In [14]: df = pd.DataFrame(
 ....:    {"A": [0, 1, 2, 3, 4]}, index=pd.date_range("2020", periods=5, freq="1D")
 ....: )
 ....: 

In [15]: df
Out[15]: 
            A
2020-01-01  0
2020-01-02  1
2020-01-03  2
2020-01-04  3
2020-01-05  4

In [16]: df.rolling("2D", center=True).mean()
Out[16]: 
              A
2020-01-01  0.5
2020-01-02  1.5
2020-01-03  2.5
2020-01-04  3.5
2020-01-05  4.0
```

### Other enhancements

+ `DataFrame.rolling()`, `Series.rolling()`, `DataFrame.expanding()`, and `Series.expanding()` now support a `method` argument with a `'table'` option that performs the windowing operation over an entire `DataFrame`. See the window overview for performance and functional benefits ([GH 15095](https://github.com/pandas-dev/pandas/issues/15095), [GH 38995](https://github.com/pandas-dev/pandas/issues/38995))

+ `ExponentialMovingWindow` now supports an `online` method to perform `mean` calculations online. See Window Overview ([GH 41673](https://github.com/pandas-dev/pandas/issues/41673))

+ Added `MultiIndex.dtypes()` ([GH 37062](https://github.com/pandas-dev/pandas/issues/37062))

+ Added `end` and `end_day` options to the `origin` parameter of `DataFrame.resample()` ([GH 37804](https://github.com/pandas-dev/pandas/issues/37804))

+ Improved the error message when `usecols` and `names` do not match for `read_csv()` with `engine="c"` ([GH 29042](https://github.com/pandas-dev/pandas/issues/29042))

+ Improved consistency of error messages when passing an invalid `win_type` argument in window methods ([GH 15969](https://github.com/pandas-dev/pandas/issues/15969))

+ `read_sql_query()` now accepts a `dtype` argument to cast the columnar data from the SQL database based on user input ([GH 10285](https://github.com/pandas-dev/pandas/issues/10285))

+ `read_csv()` now raises `ParserWarning` if the length of the header or the given names does not match the length of the data when `usecols` is not specified ([GH 21768](https://github.com/pandas-dev/pandas/issues/21768))

+ Improved integer type mapping from pandas to SQLAlchemy when using `DataFrame.to_sql()` ([GH 35076](https://github.com/pandas-dev/pandas/issues/35076))

+ `to_numeric()` now supports downcasting of nullable `ExtensionDtype` objects ([GH 33013](https://github.com/pandas-dev/pandas/issues/33013))

+ Added support for dictionary-like names in `MultiIndex.set_names` and `MultiIndex.rename` ([GH 20421](https://github.com/pandas-dev/pandas/issues/20421))

+ `read_excel()` can now auto-detect .xlsb files and older .xls files ([GH 35416](https://github.com/pandas-dev/pandas/issues/35416), [GH 41225](https://github.com/pandas-dev/pandas/issues/41225))

+ `ExcelWriter` now accepts an `if_sheet_exists` parameter to control the behavior of append mode when writing to existing sheets ([GH 40230](https://github.com/pandas-dev/pandas/issues/40230))

+ `Rolling.sum()`, `Expanding.sum()`, `Rolling.mean()`, `Expanding.mean()`, `ExponentialMovingWindow.mean()`, `Rolling.median()`, `Expanding.median()`, `Rolling.max()`, `Expanding.max()`, `Rolling.min()`, and `Expanding.min()` now support [Numba](http://numba.pydata.org/) execution with the `engine` keyword ([GH 38895](https://github.com/pandas-dev/pandas/issues/38895), [GH 41267](https://github.com/pandas-dev/pandas/issues/41267))

+ `DataFrame.apply()` can now accept NumPy unary operators as strings, e.g. `df.apply("sqrt")`, which was already the case for `Series.apply()` ([GH 39116](https://github.com/pandas-dev/pandas/issues/39116))

+ `DataFrame.apply()` can now accept non-callable DataFrame properties as strings, e.g. `df.apply("size")`, which was already the case for `Series.apply()` ([GH 39116](https://github.com/pandas-dev/pandas/issues/39116))

+ `DataFrame.applymap()` can now accept kwargs to pass on to the user-provided `func` ([GH 39987](https://github.com/pandas-dev/pandas/issues/39987))

+ Passing a `DataFrame` indexer to `iloc` is now disallowed for `Series.__getitem__()` and `DataFrame.__getitem__()` ([GH 39004](https://github.com/pandas-dev/pandas/issues/39004))

+ `Series.apply()` can now accept list-like or dictionary-like arguments that aren't lists or dictionaries, e.g. `ser.apply(np.array(["sum", "mean"]))`, which was already the case for `DataFrame.apply()` ([GH 39140](https://github.com/pandas-dev/pandas/issues/39140))

+ `DataFrame.plot.scatter()` can now accept a categorical column for the argument `c` ([GH 12380](https://github.com/pandas-dev/pandas/issues/12380), [GH 31357](https://github.com/pandas-dev/pandas/issues/31357))

+ `Series.loc()` now raises a helpful error message when the Series has a `MultiIndex` and the indexer has too many dimensions ([GH 35349](https://github.com/pandas-dev/pandas/issues/35349))

+ `read_stata()` now supports reading data from compressed files ([GH 26599](https://github.com/pandas-dev/pandas/issues/26599))

+ Added support for parsing `ISO 8601`-like timestamps with negative signs to `Timedelta` ([GH 37172](https://github.com/pandas-dev/pandas/issues/37172))

+ Added support for unary operators in `FloatingArray` ([GH 38749](https://github.com/pandas-dev/pandas/issues/38749))

+ `RangeIndex` can now be constructed by passing a `range` object directly, e.g. `pd.RangeIndex(range(3))` ([GH 12067](https://github.com/pandas-dev/pandas/issues/12067))

+ `Series.round()` and `DataFrame.round()` now work with nullable integer and floating dtypes ([GH 38844](https://github.com/pandas-dev/pandas/issues/38844))

+ `read_csv()` and `read_json()` expose the argument `encoding_errors` to control how encoding errors are handled ([GH 39450](https://github.com/pandas-dev/pandas/issues/39450))

+ `DataFrameGroupBy.any()`, `SeriesGroupBy.any()`, `DataFrameGroupBy.all()`, and `SeriesGroupBy.all()` use Kleene logic with nullable data types ([GH 37506](https://github.com/pandas-dev/pandas/issues/37506))

+ `DataFrameGroupBy.any()`, `SeriesGroupBy.any()`, `DataFrameGroupBy.all()`, and `SeriesGroupBy.all()` return a `BooleanDtype` for columns with nullable data types ([GH 33449](https://github.com/pandas-dev/pandas/issues/33449))

+ `DataFrameGroupBy.any()`, `SeriesGroupBy.any()`, `DataFrameGroupBy.all()`, and `SeriesGroupBy.all()` raising with `object` data containing `pd.NA` even when `skipna=True` ([GH 37501](https://github.com/pandas-dev/pandas/issues/37501))

+ `DataFrameGroupBy.rank()` and `SeriesGroupBy.rank()` now support object-dtype data ([GH 38278](https://github.com/pandas-dev/pandas/issues/38278))

+ Constructing a `DataFrame` or `Series` with the `data` argument being a Python iterable that is *not* a NumPy `ndarray` consisting of NumPy scalars will now result in a dtype with a precision the maximum of the NumPy scalars; this was already the case when `data` is a NumPy `ndarray` ([GH 40908](https://github.com/pandas-dev/pandas/issues/40908))

+ Added keyword `sort` to `pivot_table()` to allow unsorted results ([GH 39143](https://github.com/pandas-dev/pandas/issues/39143))

+ Added keyword `dropna` to `DataFrame.value_counts()` to allow counting rows that include `NA` values ([GH 41325](https://github.com/pandas-dev/pandas/issues/41325))

+ `Series.replace()` will now cast results to `PeriodDtype` where possible instead of `object` dtype ([GH 41526](https://github.com/pandas-dev/pandas/issues/41526))

+ Improved the error message in the `corr` and `cov` methods on `Rolling`, `Expanding`, and `ExponentialMovingWindow` when `other` is not a `DataFrame` or `Series` ([GH 41741](https://github.com/pandas-dev/pandas/issues/41741))

+ `Series.between()` can now accept `left` or `right` as arguments to `inclusive` to include only the left or right boundary ([GH 40245](https://github.com/pandas-dev/pandas/issues/40245))

+ `DataFrame.explode()` now supports expanding multiple columns simultaneously. Its `column` parameter now also accepts a list or tuple of strings to expand on multiple columns simultaneously ([GH 39240](https://github.com/pandas-dev/pandas/issues/39240))

+ `DataFrame.sample()` now accepts an `ignore_index` parameter to reset the index after sampling, similar to `DataFrame.drop_duplicates()` and `DataFrame.sort_values()` ([GH 38581](https:// github.com/pandas-dev/pandas/issues/38581)). ### Customize HTTP(s) headers when reading csv or json files

When reading from remote URLs that are not handled by fsspec (such as HTTP and HTTPS), the dictionary passed in `storage_options` will be used to create the headers included in the request. This can be used to control the User-Agent header or send other custom headers ([GH 36688](https://github.com/pandas-dev/pandas/issues/36688)). For example:

```py
In [1]: headers = {"User-Agent": "pandas"}
In [2]: df = pd.read_csv(
 ...:    "https://download.bls.gov/pub/time.series/cu/cu.item",
 ...:    sep="\t",
 ...:    storage_options=headers
 ...: ) 
</code></pre>

<h4>Read and write XML documents</h4>

We've added I/O support for reading and rendering shallow versions of XML documents using <code>read_xml()</code> and <code>DataFrame.to_xml()</code>. Uses lxml as the parser, supporting both XPath 1.0 and XSLT 1.0. (GH 27554)

<pre data-language=XML><code class="language-markup line-numbers">In [1]: xml = """<?xml version='1.0' encoding='utf-8'?>
 ...: <data>
 ...: <row>
 ...:    <shape>square</shape>
 ...:    <degrees>360</degrees>
 ...:    <sides>4.0</sides>
 ...: </row>
 ...: <row>
 ...:    <shape>circle</shape>
 ...:    <degrees>360</degrees>
 ...:    <sides/>
 ...: </row>
 ...: <row>
 ...:    <shape>triangle</shape>
 ...:    <degrees>180</degrees>
 ...:    <sides>3.0</sides>
 ...: </row>
 ...: </data>"""

In [2]: df = pd.read_xml(xml)
In [3]: df
Out[3]:
 shape  degrees  sides
0    square      360    4.0
1    circle      360    NaN
2  triangle      180    3.0

In [4]: df.to_xml()
Out[4]:
<?xml version='1.0' encoding='utf-8'?>
<data>
 <row>
 <index>0</index>
 <shape>square</shape>
 <degrees>360</degrees>
 <sides>4.0</sides>
 </row>
 <row>
 <index>1</index>
 <shape>circle</shape>
 <degrees>360</degrees>
 <sides/>
 </row>
 <row>
 <index>2</index>
 <shape>triangle</shape>
 <degrees>180</degrees>
 <sides>3.0</sides>
 </row>
</data> 
</code></pre>

For more information, see the Writing XML section in the User Guide in the IO Tools.

<h4>Styler enhancement</h4>

We've done some focused development on <code>Styler</code>. See the revised and improved Styler documentation (GH 39720, GH 39317, GH 40493).

<blockquote>
  <ul>
  <li>The method <code>Styler.set_table_styles()</code> now accepts a more natural CSS language as argument, such as <code>'color:red;'</code> instead of <code>[('color', 'red')]</code> (GH 39563)</p></li>
  <li><p>Methods <code>Styler.highlight_null()</code>, <code>Styler.highlight_min()</code> and <code>Styler.highlight_max()</code> now allow custom CSS highlighting instead of the default background color (GH 40242)</p></li>
  <li><p><code>Styler.apply()</code> now accepts functions returning <code>ndarray</code> when <code>axis=None</code>, making it consistent with the behavior of <code>axis=0</code> and <code>axis=1</code> (GH 39359)</p></li>
  <li><p>When providing malformed CSS via <code>Styler.apply()</code> or <code>Styler.applymap()</code>, an error is thrown when rendering (GH 39660)</p></li>
  <li><p><code>Styler.format()</code> now accepts the keyword argument <code>escape</code> for optional HTML and LaTeX escaping (GH 40388, GH 41619)</p></li>
  <li><p><code>Styler.background_gradient()</code> now adds parameter <code>gmap</code> to provide a specific gradient map for shading (GH 22727)</p></li>
  <li><p><code>Styler.clear()</code> now also clears <code>Styler.hidden_index</code> and <code>Styler.hidden_columns</code> (GH 40484)</p></li>
  <li><p>Added method <code>Styler.highlight_between()</code> (GH 39821)</p></li>
  <li><p>Added method <code>Styler.highlight_quantile()</code> (GH 40926)</p></li>
  <li><p>Added method <code>Styler.text_gradient()</code> (GH 41098)</p></li>
  <li><p>Added method <code>Styler.set_tooltips()</code> to allow hover tips; this can be used to enhance interactive displays (GH 21266, GH 40284)</p></li>
  <li><p>Added parameter <code>precision</code> to method <code>Styler.format()</code> to control the display of floating point numbers (GH 40134)</p></li>
  <li><p>HTML output rendered by <code>Styler</code> now follows the w3 HTML Style Guide (GH 39626)</p></li>
  <li><p>Many features of the <code>Styler</code> class are now partially or fully available for DataFrames with non-unique indexes or columns (GH 41143)</p></li>
  <li><p>Better control over display via new styler options by sparsifying indexes or columns individually, also available via <code>option_context()</code> (GH 41142)</p></li>
  <li><p>Added option <code>styler.render.max_elements</code> to avoid browser overload when styling large dataframes (GH 40712)</p></li>
  <li><p>Added method <code>Styler.to_latex()</code> (GH 21673, GH 42320), which also allows some limited CSS transformations (GH 40731)</p></li>
  <li><p>Added method <code>Styler.to_html()</code> (GH 13379)</p></li>
  <li><p>Added method <code>Styler.set_sticky()</code> to make index and column headers permanently visible in scrolling HTML frames (GH 29072).</p></li>
  </ul>
</blockquote>

<h4>DataFrame constructor follows <code>copy=False</code> and dictionary</h4>

<p>Copying is no longer done when passing a dictionary to <code>DataFrame</code> with <code>copy=False</code> (GH 32960).

<pre><code class="language-python line-numbers">In [1]: arr = np.array([1, 2, 3])

In [2]: df = pd.DataFrame({"A": arr, "B": arr.copy()}, copy=False)

In [3]: df
Out[3]: 
 A  B
0  1  1
1  2  2
2  3  3 
</code></pre>

<code>df["A"]</code> is still a view of <code>arr</code>:

<pre><code class="language-python line-numbers">In [4]: arr[0] = 0

In [5]: assert df.iloc[0, 0] == 0 
</code></pre>

When <code>copy</code> is not passed, the default behavior remains unchanged, which is to copy.

<h4>Use string data types supported by PyArrow</h4>

We enhanced <code>StringDtype</code>, an extended type specifically for string data (GH 39908).

Storage can now be specified via the <code>storage</code> keyword option of <code>StringDtype</code>. Use the pandas option or specify a dtype using <code>dtype='string[pyarrow]'</code> to allow StringArray to be backed by PyArrow arrays rather than by Python objects of NumPy arrays.

StringArray supported by PyArrow requires pyarrow 1.0.0 or higher to be installed.

warn

<code>string[pyarrow]</code> is currently considered an experimental feature. Implementation and parts of the API may change without warning.

<pre><code class="language-python line-numbers">In [6]: pd.Series(['abc', None, 'def'], dtype=pd.StringDtype(storage="pyarrow"))
Out[6]: 
0     abc
1    <NA>
2     def
dtype: string 
</code></pre>

You can also use the alias <code>"string[pyarrow]"</code>.

<pre><code class="language-python line-numbers">In [7]: s = pd.Series(['abc', None, 'def'], dtype="string[pyarrow]")

In [8]: s
Out[8]: 
0     abc
1    <NA>
2     def
dtype: string 
</code></pre>

You can also create PyArrow-enabled string arrays using the pandas option.

<pre><code class="language-python line-numbers">In [9]: with pd.option_context("string_storage", "pyarrow"):
 ...:    s = pd.Series(['abc', None, 'def'], dtype="string")
 ...: 

In [10]: s
Out[10]: 
0     abc
1    <NA>
2     def
dtype: string 
</code></pre>

Regular string accessor methods work. Where appropriate, the DataFrame's Series or column return type will also have a string dtype.

<pre><code class="language-python line-numbers">In [11]: s.str.upper()
Out[11]: 
0     ABC
1    <NA>
2     DEF
dtype: string

In [12]: s.str.split('b', expand=True).dtypes
Out[12]: 
0    string[pyarrow]
1    string[pyarrow]
dtype: object 
</code></pre>

String accessor methods that return integers will return values ​​with <code>Int64Dtype</code>.

<pre><code class="language-python line-numbers">In [13]: s.str.count("a")
Out[13]: 
0       1
1    <NA>
2       0
dtype: Int64 
</code></pre>

<h4>Centered scrolling window similar to date and time</h4>

Centered datetime-like windows are now available when performing rolling calculations on DataFrame and Series objects with datetime-like indexes (GH 38780). For example:

<pre><code class="language-python line-numbers">In [14]: df = pd.DataFrame(
 ....:    {"A": [0, 1, 2, 3, 4]}, index=pd.date_range("2020", periods=5, freq="1D")
 ....: )
 ....: 

In [15]: df
Out[15]: 
 A
2020-01-01  0
2020-01-02  1
2020-01-03  2
2020-01-04  3
2020-01-05  4

In [16]: df.rolling("2D", center=True).mean()
Out[16]: 
 A
2020-01-01  0.5
2020-01-02  1.5
2020-01-03  2.5
2020-01-04  3.5
2020-01-05  4.0 
</code></pre>

<h4>Other enhancements</h4>

<ul>
<li><code>DataFrame.rolling()</code>, <code>Series.rolling()</code>, <code>DataFrame.expanding()</code>, and <code>Series.expanding()</code> now support a <code>method</code> argument with the <code>'table'</code> option, which option performs windowing operations on the entire <code>DataFrame</code>. Check out the Window Overview to see the performance and functionality benefits (GH 15095, GH 38995)</p></li>
<li><p><code>ExponentialMovingWindow</code> now supports an <code>online</code> method to perform <code>mean</code> calculations online. View Window Overview (GH 41673)</p></li>
<li><p>Added <code>MultiIndex.dtypes()</code> (GH 37062)</p></li>
<li><p>Added <code>end</code> and <code>end_day</code> options to <code>origin</code> parameter in <code>DataFrame.resample()</code> (GH 37804)</p></li>
<li><p>In <code>read_csv()</code>, improved error message when <code>usecols</code> and <code>names</code> do not match, and <code>engine="c"</code> (GH 29042)</p></li>
<li><p>Improved consistency of error messages when passing invalid <code>win_type</code> parameters in Window methods (GH 15969)</p></li>
<li><p><code>read_sql_query()</code> now accepts a <code>dtype</code> argument to transform column data from a SQL database based on user input (GH 10285)</p></li>
<li><p>When <code>usecols</code> is not specified, <code>read_csv()</code> now raises <code>ParserWarning</code> if the length of the header or given name does not match the length of the data (GH 21768)</p></li>
<li><p>Improved integer type mapping from pandas to SQLAlchemy when using <code>DataFrame.to_sql()</code> (GH 35076)</p></li>
<li><p><code>to_numeric()</code> now supports downcasting of nullable <code>ExtensionDtype</code> objects (GH 33013)</p></li>
<li><p>Added support for dictionary-like names in <code>MultiIndex.set_names</code> and <code>MultiIndex.rename</code> (GH 20421)</p></li>
<li><p><code>read_excel()</code> now automatically detects .xlsb files and legacy .xls files (GH 35416, GH 41225)</p></li>
<li><p><code>ExcelWriter</code> now accepts an <code>if_sheet_exists</code> parameter for controlling the behavior of append modes when writing to existing sheets (GH 40230)</p></li>
<li><p><code>Rolling.sum()</code>, <code>Expanding.sum()</code>, <code>Rolling.mean()</code>, <code>Expanding.mean()</code>, <code>ExponentialMovingWindow.mean()</code>, <code>Rolling.median()</code>, <code>Expanding .median()</code>, <code>Rolling.max()</code>, <code>Expanding.max()</code>, <code>Rolling.min()</code> and <code>Expanding.min()</code> now support Numba execution, using the <code>engine</code> keyword (GH 38895, GH 41267)</p></li>
<li><p><code>DataFrame.apply()</code> now accepts NumPy unary operators as strings, such as <code>df.apply("sqrt")</code>, which already exists in <code>Series.apply()</code> (GH 39116)</p></li>
<li><p><code>DataFrame.apply()</code> can now accept non-callable DataFrame properties as strings, such as <code>df.apply("size")</code>, which already exists in <code>Series.apply()</code> (GH 39116)</p></li>
<li><p><code>DataFrame.applymap()</code> now accepts kwargs passed to a user-supplied <code>func</code> (GH 39987)</p></li>
<li><p>Passing <code>DataFrame</code> indexers to <code>iloc</code> for <code>Series.__getitem__()</code> and <code>DataFrame.__getitem__()</code> is now not allowed (GH 39004)</p></li>
<li><p><code>Series.apply()</code> can now accept a list or dictionary-like argument instead of a list or dictionary, e.g. <code>ser.apply(np.array(["sum", "mean"]))</code>, which is used in <code>DataFrame Already exists in .apply()</code> (GH 39140)</p></li>
<li><p><code>DataFrame.plot.scatter()</code> now accepts a categorical column as parameter <code>c</code> (GH 12380, GH 31357)</p></li>
<li><p><code>Series.loc()</code> now raises a useful error message when the Series has a <code>MultiIndex</code> and the indexer has too many dimensions (GH 35349)</p></li>
<li><p><code>read_stata()</code> now supports reading data from compressed files (GH 26599)</p></li>
<li><p>Added support for parsing <code>ISO 8601</code>-like timestamps with negative signs into <code>Timedelta</code> (GH 37172)</p></li>
<li><p>Added support for unary operators in <code>FloatingArray</code> (GH 38749)</p></li>
<li><p><code>RangeIndex</code> can now be constructed by passing a <code>range</code> object directly, e.g. <code>pd.RangeIndex(range(3))</code> (GH 12067)</p></li>
<li><p><code>Series.round()</code> and <code>DataFrame.round()</code> now handle nullable integer and floating point data types (GH 38844)</p></li>
<li><p><code>read_csv()</code> and <code>read_json()</code> provide parameter <code>encoding_errors</code> to control how encoding errors are handled (GH 39450)</p></li>
<li><p><code>DataFrameGroupBy.any()</code>, <code>SeriesGroupBy.any()</code>, <code>DataFrameGroupBy.all()</code>, and <code>SeriesGroupBy.all()</code> use Kleene logic with nullable data types (GH 37506)</p></li>
<li><p><code>DataFrameGroupBy.any()</code>, <code>SeriesGroupBy.any()</code>, <code>DataFrameGroupBy.all()</code>, and <code>SeriesGroupBy.all()</code> return a <code>BooleanDtype</code> for columns with nullable data types (GH 33449)</p></li>
<li><p><code>DataFrameGroupBy.any()</code>, <code>SeriesGroupBy.any()</code>, <code>DataFrameGroupBy.all()</code>, and <code>SeriesGroupBy.all()</code> raise <code>object</code> even if <code>skipna=True</code> and the data contains <code>pd.NA</code> (GH 37501)</p></li>
<li><p><code>DataFrameGroupBy.rank()</code> and <code>SeriesGroupBy.rank()</code> now support object dtype data (GH 38278)</p></li>
<li><p>When constructing a <code>DataFrame</code> or <code>Series</code> using a Python iterable object, if the <code>data</code> parameter is not a NumPy scalar consisting of a NumPy <code>ndarray</code>, the dtype will have the maximum precision of a NumPy scalar; when <code>data</code> is a NumPy <code>ndarray</code> This is already the case (GH 40908)</p></li>
<li><p>Add keyword <code>sort</code> in <code>pivot_table()</code> to allow results to be unsorted (GH 39143)</p></li>
<li><p>Add keyword <code>dropna</code> in <code>DataFrame.value_counts()</code> to allow counting of rows containing <code>NA</code> values ​​(GH 41325)</p></li>
<li><p><code>Series.replace()</code> now converts results to <code>PeriodDtype</code> when possible, instead of <code>object</code> dtype (GH 41526)</p></li>
<li><p>Improved error messages in <code>corr</code> and <code>cov</code> methods of <code>Rolling</code>, <code>Expanding</code> and <code>ExponentialMovingWindow</code> when <code>other</code> is not <code>DataFrame</code> or <code>Series</code> (GH 41741)</p></li>
<li><p><code>Series.between()</code> can now accept <code>left</code> or <code>right</code> as arguments to <code>inclusive</code> to include only the left or right boundary (GH 40245)</p></li>
<li><p><code>DataFrame.explode()</code> now supports exploding multiple columns. Its <code>column</code> argument now also accepts a list of str or tuples for exploding on multiple columns at the same time (GH 39240)</p></li>
<li><p><code>DataFrame.sample()</code> now accepts an <code>ignore_index</code> parameter to reset the index after sampling, similar to <code>DataFrame.drop_duplicates()</code> and <code>DataFrame.sort_values()</code> (GH 38581)</p></li>
</ul>
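
As a rough illustration of three of the enhancements listed above (`inclusive="left"` in `Series.between()`, multi-column `DataFrame.explode()`, and `ignore_index` in `DataFrame.sample()`), the sketch below uses small made-up data. The variable names and values are assumptions chosen for the example and do not come from the release notes.

```python
import pandas as pd

# Series.between with inclusive="left": the left boundary is included,
# the right boundary is excluded.
ser = pd.Series([1, 2, 3, 4])
left_closed = ser.between(2, 4, inclusive="left")

# DataFrame.explode on several columns at once. The lists in each row of
# the exploded columns must have matching lengths; other columns repeat.
df = pd.DataFrame(
    {"A": [[1, 2], [3]], "B": [["x", "y"], ["z"]], "C": ["p", "q"]}
)
exploded = df.explode(["A", "B"], ignore_index=True)

# DataFrame.sample with ignore_index=True relabels the sampled rows 0..n-1.
sampled = df.sample(n=2, random_state=0, ignore_index=True)

print(left_closed.tolist())
print(exploded)
print(sampled.index.tolist())
```

Passing `ignore_index=True` avoids a separate `reset_index(drop=True)` call when the original row labels are not needed.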

<h3>Notable bug fixes</h3>

<p>These are bug fixes that might have notable behavior changes.</p>

<h4><code>Categorical.unique</code> now always maintains the same dtype as the original</h4>

Previously, when calling <code>Categorical.unique()</code> with categorical data, unused categories in the new array were removed, making the new array a different dtype than the original array (GH 18291)

For example, given:

<pre><code class="language-shell line-numbers">In [17]: dtype = pd.CategoricalDtype(['bad', 'neutral', 'good'], ordered=True)

In [18]: cat = pd.Categorical(['good', 'good', 'bad', 'bad'], dtype=dtype)

In [19]: original = pd.Series(cat)

In [20]: unique = original.unique() 
</code></pre>

<em>Previous Behavior</em>:

<pre><code class="language-python line-numbers">In [1]: unique
['good', 'bad']
Categories (2, object): ['bad' < 'good']
In [2]: original.dtype == unique.dtype
False 
</code></pre>

<em>New Behavior</em>:

<pre><code class="language-python line-numbers">In [21]: unique
Out[21]: 
['good', 'bad']
Categories (3, object): ['bad' < 'neutral' < 'good']

In [22]: original.dtype == unique.dtype
Out[22]: True 
</code></pre>

<h4><code>DataFrame.combine_first()</code> now preserves dtypes</h4>

<code>DataFrame.combine_first()</code> now preserves dtypes (<a href="https://github.com/pandas-dev/pandas/issues/7509">GH 7509</a>)

<pre><code class="language-python line-numbers">In [23]: df1 = pd.DataFrame({"A": [1, 2, 3], "B": [1, 2, 3]}, index=[0, 1, 2])

In [24]: df1
Out[24]: 
   A  B
0  1  1
1  2  2
2  3  3

In [25]: df2 = pd.DataFrame({"B": [4, 5, 6], "C": [1, 2, 3]}, index=[2, 3, 4])

In [26]: df2
Out[26]: 
   B  C
2  4  1
3  5  2
4  6  3

In [27]: combined = df1.combine_first(df2) 
</code></pre>

<em>Previous Behavior</em>:

<pre><code class="language-python line-numbers">In [1]: combined.dtypes
Out[1]:
A    float64
B    float64
C    float64
dtype: object 
</code></pre>

<em>New Behavior</em>:

<pre><code class="language-python line-numbers">In [28]: combined.dtypes
Out[28]: 
A    float64
B      int64
C    float64
dtype: object 
</code></pre>

<h4>Groupby methods <code>agg</code> and <code>transform</code> no longer change the return dtype for callables</h4>

Previously, the methods <code>DataFrameGroupBy.aggregate()</code>, <code>SeriesGroupBy.aggregate()</code>, <code>DataFrameGroupBy.transform()</code>, and <code>SeriesGroupBy.transform()</code> might cast the result dtype when the argument <code>func</code> is callable, possibly leading to undesirable results (<a href="https://github.com/pandas-dev/pandas/issues/21240">GH 21240</a>). The cast would occur if the result is numeric and casting back to the input dtype does not change any values, as measured by <code>np.allclose</code>. Now no such casting occurs.

<pre><code class="language-python line-numbers">In [29]: df = pd.DataFrame({'key': [1, 1], 'a': [True, False], 'b': [True, True]})

In [30]: df
Out[30]: 
   key      a     b
0    1   True  True
1    1  False  True 
</code></pre>

<em>Previous Behavior</em>:

<pre><code class="language-python line-numbers">In [5]: df.groupby('key').agg(lambda x: x.sum())
Out[5]:
        a  b
key
1    True  2 
</code></pre>

<em>New Behavior</em>:

<pre><code class="language-python line-numbers">In [31]: df.groupby('key').agg(lambda x: x.sum())
Out[31]: 
     a  b
key 
1    1  2 
</code></pre>

<h4><code>float</code> result for <code>DataFrameGroupBy.mean()</code>, <code>DataFrameGroupBy.median()</code>, <code>DataFrameGroupBy.var()</code>, <code>SeriesGroupBy.mean()</code>, <code>SeriesGroupBy.median()</code>, and <code>SeriesGroupBy.var()</code></h4>

Previously, these methods could result in different dtypes depending on the input values. Now, these methods will always return a float dtype. (<a href="https://github.com/pandas-dev/pandas/issues/41137">GH 41137</a>)

<pre><code class="language-python line-numbers">In [32]: df = pd.DataFrame({'a': [True], 'b': [1], 'c': [1.0]})
</code></pre>

<em>Previous Behavior</em>:

<pre><code class="language-python line-numbers">In [5]: df.groupby(df.index).mean()
Out[5]:
      a  b    c
0  True  1  1.0
</code></pre>

<em>New Behavior</em>:

<pre><code class="language-python line-numbers">In [33]: df.groupby(df.index).mean()
Out[33]: 
     a    b    c
0  1.0  1.0  1.0 
</code></pre>

<h4>Try operating inplace when setting values with <code>loc</code> and <code>iloc</code></h4>

When setting an entire column using <code>loc</code> or <code>iloc</code>, pandas will try to insert the values into the existing data rather than create an entirely new array.

<pre><code class="language-python line-numbers">In [34]: df = pd.DataFrame(range(3), columns=["A"], dtype="float64")

In [35]: values = df.values

In [36]: new = np.array([5, 6, 7], dtype="int64")

In [37]: df.loc[[0, 1, 2], "A"] = new 
</code></pre>

In both the new and old behavior, the data in <code>values</code> is overwritten, but in the old behavior the dtype of <code>df["A"]</code> changed to <code>int64</code>.

<em>Previous Behavior</em>:

<pre><code class="language-python line-numbers">In [1]: df.dtypes
Out[1]:
A    int64
dtype: object
In [2]: np.shares_memory(df["A"].values, new)
Out[2]: False
In [3]: np.shares_memory(df["A"].values, values)
Out[3]: False 
</code></pre>

In pandas 1.3.0, <code>df</code> continues to share data with <code>values</code>.

<em>New Behavior</em>:

<pre><code class="language-python line-numbers">In [38]: df.dtypes
Out[38]: 
A    float64
dtype: object

In [39]: np.shares_memory(df["A"], new)
Out[39]: False

In [40]: np.shares_memory(df["A"], values)
Out[40]: True 
</code></pre>

<h4>Never operate inplace when setting <code>frame[keys] = values</code></h4>

When setting multiple columns using <code>frame[keys] = values</code>, new arrays will replace the pre-existing arrays for those keys, which will <em>not</em> be overwritten (<a href="https://github.com/pandas-dev/pandas/issues/39510">GH 39510</a>). As a result, the columns will retain the dtype(s) of <code>values</code>, never casting to the dtypes of the existing arrays.

<pre><code class="language-python line-numbers">In [41]: df = pd.DataFrame(range(3), columns=["A"], dtype="float64")

In [42]: df[["A"]] = 5 
</code></pre>

In the old behavior, <code>5</code> was cast to <code>float64</code> and inserted into the existing array backing <code>df</code>:

<em>Previous Behavior</em>:

<pre><code class="language-python line-numbers">In [1]: df.dtypes
Out[1]:
A    float64 
</code></pre>

In the new behavior, we get a new array and retain an integer-dtyped <code>5</code>:

<em>New Behavior</em>:

<pre><code class="language-shell line-numbers">In [43]: df.dtypes
Out[43]: 
A    int64
dtype: object 
</code></pre>

<h4>Consistent casting when setting into a boolean Series</h4>

Setting non-boolean values into a <code>Series</code> with <code>dtype=bool</code> now consistently casts to <code>dtype=object</code> (<a href="https://github.com/pandas-dev/pandas/issues/38709">GH 38709</a>)

<pre><code class="language-python line-numbers">In [1]: orig = pd.Series([True, False])

In [2]: ser = orig.copy()

In [3]: ser.iloc[1] = np.nan

In [4]: ser2 = orig.copy()

In [5]: ser2.iloc[1] = 2.0 
</code></pre>

<em>Previous Behavior</em>:

<pre><code class="language-shell line-numbers">In [1]: ser
Out[1]:
0    1.0
1    NaN
dtype: float64

In [2]: ser2
Out[2]:
0    True
1     2.0
dtype: object 
</code></pre>

<em>New Behavior</em>:

<pre><code class="language-python line-numbers">In [1]: ser
Out[1]:
0    True
1     NaN
dtype: object

In [2]: ser2
Out[2]:
0    True
1     2.0
dtype: object 
</code></pre>

<h4><code>DataFrameGroupBy.rolling</code> and <code>SeriesGroupBy.rolling</code> no longer return the grouped-by column in values</h4>

The group-by column will now be dropped from the result of a <code>groupby.rolling</code> operation (<a href="https://github.com/pandas-dev/pandas/issues/32262">GH 32262</a>)

<pre><code class="language-python line-numbers">In [44]: df = pd.DataFrame({"A": [1, 1, 2, 3], "B": [0, 1, 2, 3]})

In [45]: df
Out[45]: 
   A  B
0  1  0
1  1  1
2  2  2
3  3  3 
</code></pre>

<em>Previous Behavior</em>:

<pre><code class="language-python line-numbers">In [1]: df.groupby("A").rolling(2).sum()
Out[1]:
       A    B
A
1 0  NaN  NaN
  1  2.0  1.0
2 2  NaN  NaN
3 3  NaN  NaN
</code></pre>

<em>New Behavior</em>:

<pre><code class="language-python line-numbers">In [46]: df.groupby("A").rolling(2).sum()
Out[46]: 
       B
A
1 0  NaN
  1  1.0
2 2  NaN
3 3  NaN
</code></pre>

<h4>Removed artificial truncation in rolling variance and standard deviation</h4>

<code>Rolling.std()</code> and <code>Rolling.var()</code> will no longer artificially truncate results that are less than <code>~1e-8</code> and <code>~1e-15</code> respectively to zero (<a href="https://github.com/pandas-dev/pandas/issues/37051">GH 37051</a>, <a href="https://github.com/pandas-dev/pandas/issues/40448">GH 40448</a>, <a href="https://github.com/pandas-dev/pandas/issues/39872">GH 39872</a>).

However, floating point artifacts may now exist in the results when rolling over larger values.

<pre><code class="language-python line-numbers">In [47]: s = pd.Series([7, 5, 5, 5])

In [48]: s.rolling(3).var()
Out[48]: 
0         NaN
1         NaN
2    1.333333
3    0.000000
dtype: float64
</code></pre>

<h4><code>DataFrameGroupBy.rolling</code> and <code>SeriesGroupBy.rolling</code> with MultiIndex no longer drop levels in the result</h4>

<code>DataFrameGroupBy.rolling()</code> and <code>SeriesGroupBy.rolling()</code> will no longer drop levels of a <code>DataFrame</code> with a <code>MultiIndex</code> in the result. This can lead to a perceived duplication of levels in the resulting <code>MultiIndex</code>, but this change restores the behavior that was present in version 1.1.3 (<a href="https://github.com/pandas-dev/pandas/issues/38787">GH 38787</a>, <a href="https://github.com/pandas-dev/pandas/issues/38523">GH 38523</a>).

<pre><code class="language-python line-numbers">In [49]: index = pd.MultiIndex.from_tuples([('idx1', 'idx2')], names=['label1', 'label2'])

In [50]: df = pd.DataFrame({'a': [1], 'b': [2]}, index=index)

In [51]: df
Out[51]: 
               a  b
label1 label2
idx1   idx2    1  2
</code></pre>

<em>Previous Behavior</em>:

<pre><code class="language-python line-numbers">In [1]: df.groupby('label1').rolling(1).sum()
Out[1]:
          a    b
label1
idx1    1.0  2.0
</code></pre>

<em>New Behavior</em>:

<pre><code class="language-python line-numbers">In [52]: df.groupby('label1').rolling(1).sum()
Out[52]: 
                        a    b
label1 label1 label2
idx1   idx1   idx2    1.0  2.0
</code></pre>

## Backward-incompatible API changes

### Increased minimum versions for dependencies

The minimum supported versions of some dependencies have been updated. If installed, we now require:

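| Package | Minimum Version | Required | Changed |
|---|---|---|---|
| numpy | 1.17.3 | X | X |
| pytz | 2017.3 | X | |
| python-dateutil | 2.7.3 | X | |
| bottleneck | 1.2.1 | | |
| numexpr | 2.7.0 | | X |
| pytest (dev) | 6.0 | | X |
| mypy (dev) | 0.812 | | X |
| setuptools | 38.6.0 | | X |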

For optional libraries, the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.

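| Package | Minimum Version | Changed |
|---|---|---|
| beautifulsoup4 | 4.6.0 | |
| fastparquet | 0.4.0 | X |
| fsspec | 0.7.4 | |
| gcsfs | 0.6.0 | |
| lxml | 4.3.0 | |
| matplotlib | 2.2.3 | |
| numba | 0.46.0 | |
| openpyxl | 3.0.0 | X |
| pyarrow | 0.17.0 | X |
| pymysql | 0.8.1 | X |
| pytables | 3.5.1 | |
| s3fs | 0.4.0 | |
| scipy | 1.2.0 | |
| sqlalchemy | 1.3.0 | X |
| tabulate | 0.8.7 | X |
| xarray | 0.12.0 | |
| xlrd | 1.2.0 | |
| xlsxwriter | 1.0.2 | |
| xlwt | 1.3.0 | |
| pandas-gbq | 0.12.0 | |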

See Dependencies and Optional Dependencies for more information.
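
One way to compare an environment against the required minimums listed above is sketched below. It is only an illustration: `MINIMUM_VERSIONS`, `as_tuple`, `meets_minimum`, and `check_installed` are hypothetical names introduced here, and the version parsing is deliberately simplistic (it reads only the leading digits of each dot-separated component).

```python
from importlib import metadata

# Minimum versions copied from the required-dependency table above.
MINIMUM_VERSIONS = {
    "numpy": "1.17.3",
    "pytz": "2017.3",
    "python-dateutil": "2.7.3",
}


def as_tuple(version):
    """Turn a release string such as '1.17.3' into (1, 17, 3).

    Hypothetical helper: only the leading digits of each component are
    read, so suffixes such as 'rc1' are ignored.
    """
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum."""
    return as_tuple(installed) >= as_tuple(minimum)


def check_installed(minimums=MINIMUM_VERSIONS):
    """Map each package to True/False, or None if it is not installed."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
            continue
        report[name] = meets_minimum(installed, minimum)
    return report


print(check_installed())
```

For a full report of the installed dependency versions, pandas also provides `pd.show_versions()`.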

### Other API changes

  • Partially initialized CategoricalDtype objects (i.e. objects with categories=None) will no longer be equivalent to fully initialized dtype objects (GH 38516)

  • Accessing _constructor_expanddim on DataFrame and _constructor_sliced on Series now raises AttributeError. Previously raised NotImplementedError (GH 38782)

  • Added new engine and **engine_kwargs arguments to DataFrame.to_sql() to support other future “SQL engines”. Currently we are still only using SQLAlchemy under the hood, but plan to support more engines, such as turbodbc (GH 36893)

  • Removed redundant freq from PeriodIndex string representation (GH 41653)

  • ExtensionDtype.construct_array_type() is now a required method for ExtensionDtype subclasses, rather than an optional method (GH 24860)

  • Calling hash on an unhashable pandas object now raises TypeError with the built-in error message (e.g. unhashable type: 'Series'). Previously it would raise a custom message such as 'Series' objects are mutable, thus they cannot be hashed. Furthermore, isinstance(<Series>, collections.abc.Hashable) will now return False (GH 40013)

  • Styler.from_custom_template() now has two new arguments for template names, and the old name has been removed, because template inheritance was introduced for better parsing (GH 42053). Subclasses that modify Styler attributes also need to be updated accordingly.
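
Two of the API changes listed above can be observed directly. The sketch below assumes pandas 1.3 or later; the variable names are made up for the illustration.

```python
from collections.abc import Hashable

import pandas as pd

# A partially initialized CategoricalDtype (categories=None) no longer
# compares equal to a fully initialized one.
partial = pd.CategoricalDtype()
full = pd.CategoricalDtype(["a", "b"])
dtypes_equal = partial == full

# Hashing a Series raises TypeError with the built-in message, and the
# Series is no longer reported as an instance of Hashable.
ser = pd.Series([1, 2, 3])
try:
    hash(ser)
    hash_error = None
except TypeError as exc:
    hash_error = str(exc)
series_is_hashable = isinstance(ser, Hashable)

print(dtypes_equal, hash_error, series_is_hashable)
```

Code that relied on the old custom error message when catching `TypeError` from `hash()` may need to match on the built-in wording instead.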

### Build

  • Documentation in .pptx and .pdf formats is no longer included in wheels or source distributions. (GH 30741)

## Deprecations

### Deprecated dropping nuisance columns in DataFrame reductions and DataFrameGroupBy operations

When calling a reduction (e.g. .min, .max, .sum) on a DataFrame with numeric_only=None (the default), columns where the reduction raises a TypeError are silently ignored and dropped from the result.

This behavior is deprecated. In a future version, the TypeError will be raised, and users will need to select only valid columns before calling the function.

For example:

```py
In [53]: df = pd.DataFrame({"A": [1, 2, 3, 4], "B": pd.date_range("2016-01-01", periods=4)})

In [54]: df
Out[54]: 
   A          B
0  1 2016-01-01
1  2 2016-01-02
2  3 2016-01-03
3  4 2016-01-04
```

Old Behavior:

```py
In [3]: df.prod()
Out[3]:
A    24
dtype: int64
```

Future Behavior:

```py
In [4]: df.prod()
...
TypeError: 'DatetimeArray' does not implement reduction 'prod'

In [5]: df[["A"]].prod()
Out[5]:
A    24
dtype: int64
```

Similarly, when applying a function to DataFrameGroupBy, columns on which the function raises TypeError are currently silently ignored and dropped from the result.

This behavior is deprecated. In a future version, the TypeError will be raised, and users will need to select only valid columns before calling the function.

For example:

```py
In [55]: df = pd.DataFrame({"A": [1, 2, 3, 4], "B": pd.date_range("2016-01-01", periods=4)})

In [56]: gb = df.groupby([1, 1, 2, 2])
```

Old Behavior:

```py
In [4]: gb.prod(numeric_only=False)
Out[4]:
    A
1   2
2  12
```

Future Behavior:

```py
In [5]: gb.prod(numeric_only=False)
...
TypeError: datetime64 type does not support prod operations

In [6]: gb[["A"]].prod(numeric_only=False)
Out[6]:
    A
1   2
2  12
```

### Other Deprecations

+ Deprecated allowing scalars to be passed to `Categorical` constructor ([GH 38433](https://github.com/pandas-dev/pandas/issues/38433))

+ Deprecated constructing `CategoricalIndex` without passing list-like data ([GH 38944](https://github.com/pandas-dev/pandas/issues/38944))

+ Deprecated allowing subclass-specific keyword arguments in the `Index` constructor; use the specific subclass directly instead ([GH 14093](https://github.com/pandas-dev/pandas/issues/14093), [GH 21311](https://github.com/pandas-dev/pandas/issues/21311), [GH 22315](https://github.com/pandas-dev/pandas/issues/22315), [GH 26974](https://github.com/pandas-dev/pandas/issues/26974))

+ Deprecated the `astype()` method of datetimelike (`timedelta64[ns]`, `datetime64[ns]`, `DatetimeTZDtype`, `PeriodDtype`) to convert to integer dtypes; use `values.view(...)` instead ([GH 38544](https://github.com/pandas-dev/pandas/issues/38544)). This deprecation was later reverted in pandas 1.4.0.

+ Deprecated `MultiIndex.is_lexsorted()` and `MultiIndex.lexsort_depth()`; use `MultiIndex.is_monotonic_increasing` instead ([GH 32259](https://github.com/pandas-dev/pandas/issues/32259))

+ Deprecated keyword `try_cast` in `Series.where()`, `Series.mask()`, `DataFrame.where()`, and `DataFrame.mask()`; cast results manually if desired ([GH 38836](https://github.com/pandas-dev/pandas/issues/38836))

+ Deprecated comparison of `Timestamp` objects with `datetime.date` objects. Instead of e.g. `ts <= mydate`, use `ts <= pd.Timestamp(mydate)` or `ts.date() <= mydate` ([GH 36131](https://github.com/pandas-dev/pandas/issues/36131))

+ Deprecated `Rolling.win_type` returning `"freq"` ([GH 38963](https://github.com/pandas-dev/pandas/issues/38963))

+ Deprecated `Rolling.is_datetimelike` ([GH 38963](https://github.com/pandas-dev/pandas/issues/38963))

+ Deprecated `DataFrame` indexer for `Series.__setitem__()` and `DataFrame.__setitem__()` ([GH 39004](https://github.com/pandas-dev/pandas/issues/39004))

+ Deprecated `ExponentialMovingWindow.vol()` ([GH 39220](https://github.com/pandas-dev/pandas/issues/39220))

+ Using `.astype` to convert between `datetime64[ns]` dtype and `DatetimeTZDtype` is deprecated and will raise in a future version; use `obj.tz_localize` or `obj.dt.tz_localize` instead ([GH 38622](https://github.com/pandas-dev/pandas/issues/38622))

+ Deprecated casting `datetime.date` objects to `datetime64` when used as `fill_value` in `DataFrame.unstack()`, `DataFrame.shift()`, `Series.shift()`, and `DataFrame.reindex()`; pass `pd.Timestamp(dateobj)` instead ([GH 39767](https://github.com/pandas-dev/pandas/issues/39767))

+ Deprecated `Styler.set_na_rep()` and `Styler.set_precision()` in favor of `Styler.format()` with `na_rep` and `precision` as existing and new input arguments respectively ([GH 40134](https://github.com/pandas-dev/pandas/issues/40134), [GH 40425](https://github.com/pandas-dev/pandas/issues/40425))

+ Deprecated `Styler.where()` in favor of using an alternative formulation with `Styler.applymap()` ([GH 40821](https://github.com/pandas-dev/pandas/issues/40821))

+ Deprecated allowing partial failure in `Series.transform()` and `DataFrame.transform()` when `func` is list-like or dict-like and raises anything but `TypeError`; `func` raising anything but a `TypeError` will raise in a future version ([GH 40211](https://github.com/pandas-dev/pandas/issues/40211))

+ Deprecated arguments `error_bad_lines` and `warn_bad_lines` in `read_csv()` and `read_table()` in favor of argument `on_bad_lines` ([GH 15122](https://github.com/pandas-dev/pandas/issues/15122))

+ Deprecated support for `np.ma.mrecords.MaskedRecords` in the `DataFrame` constructor; pass `{name: data[name] for name in data.dtype.names}` instead ([GH 40363](https://github.com/pandas-dev/pandas/issues/40363))

+ Deprecated using `merge()`, `DataFrame.merge()`, and `DataFrame.join()` on a different number of levels ([GH 34862](https://github.com/pandas-dev/pandas/issues/34862))

+ Deprecated use of `**kwargs` in `ExcelWriter`; use keyword argument `engine_kwargs` instead ([GH 40430](https://github.com/pandas-dev/pandas/issues/40430))

+ Deprecated `level` keyword for `DataFrame` and `Series` aggregation; use groupby instead ([GH 39983](https://github.com/pandas-dev/pandas/issues/39983))

+ Deprecated the `inplace` parameter of `Categorical.remove_categories()`, `Categorical.add_categories()`, `Categorical.reorder_categories()`, `Categorical.rename_categories()`, `Categorical.set_categories()`, and Will be removed in a future release ([GH 37643](https://github.com/pandas-dev/pandas/issues/37643))

+ Deprecated the behavior of `merge()` when generating duplicate columns and existing columns via the `suffixes` keyword ([GH 22818](https://github.com/pandas-dev/pandas/issues/22818) )

+ Setting `Categorical._codes` is deprecated, please create a new `Categorical` with the required codes ([GH 40606](https://github.com/pandas-dev/pandas/issues/40606))

+ Deprecated `convert_float` optional parameter in `read_excel()` and `ExcelFile.parse()` ([GH 41127](https://github.com/pandas-dev/pandas/issues/41127))

+ The mixed time zone behavior of `DatetimeIndex.union()` has been deprecated; in a future version, both will be converted to UTC instead of object types ([GH 39328](https://github.com/pandas- dev/pandas/issues/39328))

+ Deprecated `usecols` for out-of-range indexes in `read_csv()` using `engine="c"` ([GH 25623](https://github.com/pandas-dev/pandas/issues/25623) )

+ Deprecated behavior in `DataFrame` constructor to treat lists whose first element is categorical; pass `pd.DataFrame({col: categorical, ...})` instead ([GH 38845]( https://github.com/pandas-dev/pandas/issues/38845))

+ Deprecated behavior of the `DataFrame` constructor when a `dtype` is passed and the data cannot be converted to that dtype. In a future release, this will be raised instead of being silently ignored ([GH 24435](https://github.com/pandas-dev/pandas/issues/24435))

+ The `Timestamp.freq` property has been deprecated. For properties that use it (`is_month_start`, `is_month_end`, `is_quarter_start`, `is_quarter_end`, `is_year_start`, `is_year_end`), when you have a `freq`, use e.g. `freq.is_month_start(ts)` ([GH 15146](https://github.com/pandas-dev/pandas/issues/15146))

+ Deprecated behavior for constructing `Series` or `DataFrame` with `DatetimeTZDtype` data and `datetime64[ns]` dtype. Use `Series(data).dt.tz_localize(None)` instead ([GH 41555](https://github.com/pandas-dev/pandas/issues/41555), [GH 33401](https:// github.com/pandas-dev/pandas/issues/33401))

+ Deprecated behavior of `Series` construct when large integer values ​​silently overflow with small integer dtype; use `Series(data).astype(dtype)` instead ([GH 41734](https://github.com /pandas-dev/pandas/issues/41734))

+ Deprecated behavior of `DataFrame` constructs when converting floating point data to integer dtype even if there is a loss; in a future version this will remain floating point, matching the behavior of `Series` ([GH 41770] (https://github.com/pandas-dev/pandas/issues/41770))

+ Inference behavior for `timedelta64[ns]`, `datetime64[ns]` or `DatetimeTZDtype` dtypes has been deprecated in the `Series` constructor when passing string data and no `dtype` is passed ([ GH 33558](https://github.com/pandas-dev/pandas/issues/33558))

+ In a future release, when constructing a `Series` or `DataFrame` with `datetime64[ns]` data and `DatetimeTZDtype`, the data will be treated as wall time instead of UTC time (matching DatetimeIndex behavior). To view the data as UTC time, use `pd.Series(data).dt.tz_localize("UTC").dt.tz_convert(dtype.tz)` or `pd.Series(data.view("int64") , dtype=dtype)` ([GH 33401](https://github.com/pandas-dev/pandas/issues/33401))

+ Deprecated passing lists as `key` to `DataFrame.xs()` and `Series.xs()` ([GH 41760](https://github.com/pandas-dev/pandas/issues/41760))

+ Deprecated passing boolean values for `inclusive` in `Series.between()`; the standard argument values are now `{"left", "right", "neither", "both"}` ([GH 40628](https://github.com/pandas-dev/pandas/issues/40628))

+ Deprecated passing arguments as positional for all of the following, with exceptions noted ([GH 41485](https://github.com/pandas-dev/pandas/issues/41485)):

  + `concat()` (except `objs`)

  + `read_csv()` (except `filepath_or_buffer`)

  + `read_table()` (except `filepath_or_buffer`)

  + `DataFrame.clip()` and `Series.clip()` (except `upper` and `lower`)

  + `DataFrame.drop_duplicates()` (except `subset`), `Series.drop_duplicates()`, `Index.drop_duplicates()` and `MultiIndex.drop_duplicates()`

  + `DataFrame.drop()` (except `labels`) and `Series.drop()`

  + `DataFrame.dropna()` and `Series.dropna()`

  + `DataFrame.ffill()`, `Series.ffill()`, `DataFrame.bfill()` and `Series.bfill()`

  + `DataFrame.fillna()` and `Series.fillna()` (except `value`)

  + `DataFrame.interpolate()` and `Series.interpolate()` (except `method`)

  + `DataFrame.mask()` and `Series.mask()` (except `cond` and `other`)

  + `DataFrame.reset_index()` (except `level`) and `Series.reset_index()`

  + `DataFrame.set_axis()` and `Series.set_axis()` (except `labels`)

  + `DataFrame.set_index()` (except `keys`)

  + `DataFrame.sort_index()` and `Series.sort_index()`

  + `DataFrame.sort_values()` (except `by`) and `Series.sort_values()`

  + `DataFrame.where()` and `Series.where()` (except `cond` and `other`)

  + `Index.set_names()` and `MultiIndex.set_names()` (except `names`)

  + `MultiIndex.codes()` (except `codes`)

  + `MultiIndex.set_levels()` (except `levels`)

  + `Resampler.interpolate()` (except `method`)

### Deprecated dropping nuisance columns in DataFrame reductions and DataFrameGroupBy operations

When calling a reduction (e.g. `.min`, `.max`, `.sum`) on a `DataFrame` with `numeric_only=None` (the default), columns where the reduction raises a `TypeError` are silently ignored and dropped from the result.

This behavior is deprecated. In a future version, `TypeError` will be raised and the user will need to select only valid columns before calling the function.

For example:

```py
In [53]: df = pd.DataFrame({"A": [1, 2, 3, 4], "B": pd.date_range("2016-01-01", periods=4)})

In [54]: df
Out[54]: 
   A          B
0  1 2016-01-01
1  2 2016-01-02
2  3 2016-01-03
3  4 2016-01-04
```

Old Behavior:

```py
In [3]: df.prod()
Out[3]:
A    24
dtype: int64
```

Future Behavior:

```py
In [4]: df.prod()
...
TypeError: 'DatetimeArray' does not implement reduction 'prod'

In [5]: df[["A"]].prod()
Out[5]:
A    24
dtype: int64
```

Similarly, when applying a function to `DataFrameGroupBy`, columns on which the function raises `TypeError` are currently silently ignored and dropped from the result.

This behavior is deprecated. In a future version, the `TypeError` will be raised and the user will need to select only valid columns before calling the function.

For example:

```py
In [55]: df = pd.DataFrame({"A": [1, 2, 3, 4], "B": pd.date_range("2016-01-01", periods=4)})

In [56]: gb = df.groupby([1, 1, 2, 2])
```

Old Behavior:

```py
In [4]: gb.prod(numeric_only=False)
Out[4]:
    A
1   2
2  12
```

Future Behavior:

```py
In [5]: gb.prod(numeric_only=False)
...
TypeError: datetime64 type does not support prod operations

In [6]: gb[["A"]].prod(numeric_only=False)
Out[6]:
    A
1   2
2  12
```
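When the valid columns are not known by name, one possible way to "select only valid columns" is to filter by dtype before reducing. The snippet below is a minimal sketch of that idea, not the approach prescribed by the release notes; it assumes that "valid" means numeric, and the variable names `numeric` and `result` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": pd.date_range("2016-01-01", periods=4)})

# Keep only the numeric columns, so the reduction never sees the datetime column.
numeric = df.select_dtypes(include="number")

# The product is now computed over column "A" only.
result = numeric.prod()
print(result)
```

Selecting columns explicitly, as in `df[["A"]].prod()` above, remains the more transparent choice when the column names are known.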

Other deprecations

  • Deprecated allowing scalars to be passed to the Categorical constructor (GH 38433)

  • Deprecated constructing CategoricalIndex without passing list-like data (GH 38944)

  • Deprecated allowing subclass-specific keyword arguments in the Index constructor; use the specific subclass directly instead (GH 14093, GH 21311, GH 22315, GH 26974)

  • Deprecated the astype() method of datetimelike (timedelta64[ns], datetime64[ns], Datetime64TZDtype, PeriodDtype) for converting to integer dtypes; use values.view(...) instead (GH 38544). This deprecation was later reverted in pandas 1.4.0.

  • Deprecated MultiIndex.is_lexsorted() and MultiIndex.lexsort_depth(); use MultiIndex.is_monotonic_increasing() instead (GH 32259)

  • Deprecated keyword try_cast in Series.where(), Series.mask(), DataFrame.where(), DataFrame.mask(); cast results manually if needed (GH 38836)

  • Deprecated comparison of Timestamp objects with datetime.date objects. For example, instead of using ts <= mydate, use ts <= pd.Timestamp(mydate) or ts.date() <= mydate (GH 36131)

  • Deprecated Rolling.win_type returning "freq" (GH 38963)

  • Deprecated Rolling.is_datetimelike (GH 38963)

  • Deprecated using a DataFrame as the indexer for Series.__setitem__() and DataFrame.__setitem__() (GH 39004)

  • Deprecated ExponentialMovingWindow.vol() (GH 39220)

  • Using .astype to convert between datetime64[ns] dtype and DatetimeTZDtype is deprecated and will raise in a future version; use obj.tz_localize or obj.dt.tz_localize instead (GH 38622)

  • Deprecated casting datetime.date objects to datetime64 when used as fill_value in DataFrame.unstack(), DataFrame.shift(), Series.shift(), and DataFrame.reindex(); pass pd.Timestamp(dateobj) instead (GH 39767)

  • Styler.set_na_rep() and Styler.set_precision() are deprecated in favor of Styler.format(), with na_rep and precision as existing and new input parameters respectively (GH 40134, GH 40425)

  • Deprecated Styler.where() in favor of an alternative form of Styler.applymap() (GH 40821)

  • Deprecated allowing partial failure in Series.transform() and DataFrame.transform() when func is list-like or dict-like and raises anything but TypeError; func raising anything but a TypeError will raise in a future version (GH 40211)

  • Deprecated error_bad_lines and warn_bad_lines parameters in read_csv() and read_table() in favor of on_bad_lines parameter (GH 15122)

  • Support for np.ma.mrecords.MaskedRecords is deprecated in the DataFrame constructor, please use {name: data[name] for name in data.dtype.names} instead (GH 40363)

  • Deprecated using merge(), DataFrame.merge(), and DataFrame.join() on a different number of levels (GH 34862)

  • Use of **kwargs is deprecated in ExcelWriter; use the keyword argument engine_kwargs instead (GH 40430)

  • The level keyword argument is deprecated in DataFrame and Series aggregations; use groupby instead (GH 39983)

  • The inplace parameter of Categorical.remove_categories(), Categorical.add_categories(), Categorical.reorder_categories(), Categorical.rename_categories(), Categorical.set_categories() is deprecated and will be removed in a future version (GH 37643)

  • Deprecated merge() producing duplicated columns through the suffixes keyword and already existing columns (GH 22818)

  • Deprecated setting Categorical._codes; create a new Categorical with the desired codes instead (GH 40606)

  • The convert_float optional parameter is deprecated in read_excel() and ExcelFile.parse() (GH 41127)

  • The behavior of DatetimeIndex.union() in mixed time zones has been deprecated; in a future release, both will be converted to UTC instead of object dtype (GH 39328)

  • For read_csv() using engine="c", usage of usecols with out-of-bounds indexes has been deprecated (GH 25623)

  • In the DataFrame constructor, special handling of lists whose first element is categorical has been deprecated; use pd.DataFrame({col: categorical, ...}) instead (GH 38845)

  • Deprecated behavior of the DataFrame constructor when a dtype is passed and the data cannot be converted to that dtype. In a future release, this will raise an exception instead of being silently ignored (GH 24435)

  • The Timestamp.freq property is deprecated. For properties that use it (is_month_start, is_month_end, is_quarter_start, is_quarter_end, is_year_start, is_year_end), when you have a freq, use e.g. freq.is_month_start(ts) (GH 15146)

  • Deprecated behavior for constructing Series or DataFrame using DatetimeTZDtype data and datetime64[ns] dtype. Use Series(data).dt.tz_localize(None) instead (GH 41555, GH 33401)

  • Deprecated behavior of Series construction with large-integer values and small-integer dtype silently overflowing; use Series(data).astype(dtype) instead (GH 41734)

  • Deprecated behavior of DataFrame construction with floating-point data and integer dtype casting even when lossy; in a future version this will remain floating point, matching Series behavior (GH 41770)

  • Deprecated inference of timedelta64[ns], datetime64[ns], or DatetimeTZDtype dtypes in Series construction when data containing strings is passed and no dtype is passed (GH 33558)

  • In a future version, constructing a Series or DataFrame with datetime64[ns] data and DatetimeTZDtype will treat the data as wall times instead of UTC times (matching DatetimeIndex behavior). To treat the data as UTC times, use pd.Series(data).dt.tz_localize("UTC").dt.tz_convert(dtype.tz) or pd.Series(data.view("int64"), dtype=dtype) (GH 33401)

  • Deprecated passing lists as key to DataFrame.xs() and Series.xs() (GH 41760)

  • Deprecated passing boolean values for inclusive in Series.between(); the standard argument values are now {"left", "right", "neither", "both"} (GH 40628)

  • Arguments passed as positional arguments are deprecated for all of the following cases, except where noted (GH 41485):

  • concat() (except objs)

  • read_csv() (except filepath_or_buffer)

  • read_table() (except filepath_or_buffer)

  • DataFrame.clip() and Series.clip() (except upper and lower)

  • DataFrame.drop_duplicates() (except subset), Series.drop_duplicates(), Index.drop_duplicates() and MultiIndex.drop_duplicates()

  • DataFrame.drop() (except labels) and Series.drop()

  • DataFrame.dropna() and Series.dropna()

  • DataFrame.ffill(), Series.ffill(), DataFrame.bfill() and Series.bfill()

  • DataFrame.fillna() and Series.fillna() (except value)

  • DataFrame.interpolate() and Series.interpolate() (except method)

  • DataFrame.mask() and Series.mask() (except cond and other)

  • DataFrame.reset_index() (except level) and Series.reset_index()

  • DataFrame.set_axis() and Series.set_axis() (except labels)

  • DataFrame.set_index() (except keys)

  • DataFrame.sort_index() and Series.sort_index()

  • DataFrame.sort_values() (except by) and Series.sort_values()

  • DataFrame.where() and Series.where() (except cond and other)

  • Index.set_names() and MultiIndex.set_names() (except names)

  • MultiIndex.codes() (except codes)

  • MultiIndex.set_levels() (except levels)

  • Resampler.interpolate() (except method)
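Several of the deprecations above come with a straightforward replacement. The following is a small, hedged sketch of three of them: passing non-exempt arguments by keyword, converting a datetime.date before comparing it with a Timestamp, and using the string form of inclusive. The frame, the dates and the variable names are made up for illustration only.

```python
import datetime

import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2], "b": [3, 5, 4]})

# Pass everything except the exempt first argument by keyword.
deduped = df.drop_duplicates(subset=["a"], keep="last")
ordered = df.sort_values(by="b", ascending=False)

# Convert the datetime.date to a Timestamp before comparing.
ts = pd.Timestamp("2021-07-02")
mydate = datetime.date(2021, 7, 2)
not_after = ts <= pd.Timestamp(mydate)

# Use a string value for `inclusive` instead of a boolean.
s = pd.Series([1, 2, 3, 4])
inside = s.between(2, 4, inclusive="left")
```

Written this way, the calls should behave the same before and after the deprecated forms are removed.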

Performance improvements

  • Performance improved for IntervalIndex.isin() (GH 38353)

  • Performance improvements for Series.mean() for nullable data types (GH 34814)

  • Performance improvements for Series.isin() for nullable data types (GH 38340)

  • For nullable float and nullable integer data types, DataFrame.fillna() performance improves when using method="pad" or method="backfill" (GH 39953)

  • Performance of DataFrame.corr() has been improved for method=kendall (GH 28329)

  • Performance improvements for DataFrame.corr() for method=spearman (GH 40956, GH 41885)

  • Performance improvements for Rolling.corr() and Rolling.cov() (GH 39388)

  • Performance improvements for RollingGroupby.corr(), ExpandingGroupby.corr(), ExpandingGroupby.corr() and ExpandingGroupby.cov() (GH 39591)

  • Performance improvements for unique() for object data types (GH 37615)

  • Performance improvement in json_normalize() for basic cases (including separators) (GH 40035, GH 15621)

  • Performance improvement in ExpandingGroupby aggregation methods (GH 39664)

  • Performance improvement in Styler, where render times are reduced by more than 50% and now match DataFrame.to_html() (GH 39972, GH 39952, GH 40425)

  • Method Styler.set_td_classes() is now as efficient as Styler.apply() and Styler.applymap(), and in some cases even more efficient (GH 40453)

  • Performance improvements in ExponentialMovingWindow.mean(), using times (GH 39784)

  • Performance improvements in DataFrameGroupBy.apply() and SeriesGroupBy.apply() when Python fallback implementation is required (GH 40176)

  • Performance improvements for converting PyArrow boolean arrays to pandas nullable boolean arrays (GH 41051)

  • Performance improvements for joining data with type CategoricalDtype (GH 40193)

  • Performance improvements in DataFrameGroupBy.cummin(), SeriesGroupBy.cummin(), DataFrameGroupBy.cummax() and SeriesGroupBy.cummax() when using nullable data types (GH 37493)

  • Performance improvement in Series.nunique() with nan values (GH 40865)

  • Performance improvements in DataFrame.transpose() and Series.unstack() when using DatetimeTZDtype (GH 40149)

  • Performance improvements to Series.plot() and DataFrame.plot() when lazily loading entry points (GH 41492)

Bug fixes

Categorical

  • Bug in CategoricalIndex incorrectly failing to raise TypeError when scalar data is passed (GH 38614)

  • Bug in CategoricalIndex.reindex failing when the Index passed was not categorical but all of its values were labels in the categories (GH 28690)

  • Bug where constructing a Categorical from an object-dtype array of date objects did not round-trip correctly with astype (GH 38552)

  • Bug in constructing a DataFrame from an ndarray and a CategoricalDtype (GH 38857)

  • Bug in setting categorical values into an object-dtype column in a DataFrame (GH 39136)

  • Bug in DataFrame.reindex() that raised IndexError when the new index contained duplicates and the old index was a CategoricalIndex (GH 38906)

  • Bug in Categorical.fillna() with a tuple-like category raising NotImplementedError instead of ValueError when filling with a non-category tuple (GH 41914)

Datetimelike

  • Bug in DataFrame and Series constructors sometimes dropping nanoseconds from Timestamp (or Timedelta) data with dtype=datetime64[ns] (or timedelta64[ns]) (GH 38032)

  • Bug in DataFrame.first() and Series.first() with a month offset returning incorrect results when the first day is the end of the month (GH 29623)

  • Bug in constructing a DataFrame or Series with mismatched datetime64 data and timedelta64 dtype, or vice versa, failing to raise TypeError (GH 38575, GH 38764, GH 38792)

  • Bug in constructing a Series or DataFrame with a datetime object out of bounds for datetime64[ns] dtype or a timedelta object out of bounds for timedelta64[ns] dtype (GH 38792, GH 38965)

  • Bug in DatetimeIndex.intersection(), DatetimeIndex.symmetric_difference(), PeriodIndex.intersection(), PeriodIndex.symmetric_difference() always returning object dtype when operating with CategoricalIndex (GH 38741)

  • Bug in DatetimeIndex.intersection() giving incorrect results when using non-Tick frequencies and n != 1 (GH 42104)

  • Bug in Series.where() incorrectly casting datetime64 values to int64 (GH 37682)

  • Bug in Categorical incorrectly typecasting datetime objects to Timestamp (GH 38878)

  • Bug in comparison between Timestamp objects and datetime64 objects outside the boundaries of the nanosecond datetime64 implementation (GH 39221)

  • Bug in Timestamp.round(), Timestamp.floor(), Timestamp.ceil() for values close to the Timestamp implementation boundary (GH 39244)

  • Bug in Timedelta.round(), Timedelta.floor(), Timedelta.ceil() for values close to the Timedelta implementation boundary (GH 38964)

  • Bug in date_range() incorrectly creating DatetimeIndex containing NaT instead of raising OutOfBoundsDatetime in corner cases (GH 24124)

  • Bug in infer_freq() incorrectly failing to infer the 'H' frequency of a DatetimeIndex that has a time zone and crosses a daylight saving time boundary (GH 39556)

  • Bug in Series backed by DatetimeArray or TimedeltaArray sometimes failing to set the array's freq to None (GH 41425)

Timedelta

  • Bug in constructing Timedelta from np.timedelta64 objects with non-nanosecond units that are out of bounds for timedelta64[ns] failing to raise an error (GH 38965)

  • Bug in constructing a TimedeltaIndex incorrectly accepting np.datetime64("NaT") objects (GH 39462)

  • Bug in constructing Timedelta from an input string with only symbols and no digits failing to raise an error (GH 39710)

  • Bug in TimedeltaIndex and to_timedelta() failing to raise when passed non-nanosecond timedelta64 arrays that overflow when converting to timedelta64[ns] (GH 40008)

Timezones

  • UTC represented by different tzinfo objects are not considered equivalent (GH 39216)

  • dateutil.tz.gettz("UTC") is not recognized as an equivalent for other tzinfo representing UTC (GH 39276)

Numeric

  • Bug in DataFrame.quantile() and DataFrame.sort_values() caused incorrect subsequent indexing behavior (GH 38351)

  • Bug in DataFrame.sort_values(), raising IndexError for empty by (GH 40258)

  • Bug in DataFrame.select_dtypes() removes numeric ExtensionDtype columns when include=np.number (GH 35340)

  • Bug in DataFrame.mode() and Series.mode() not maintaining consistent integer Index on empty input (GH 33321)

  • Bug in DataFrame.rank() when DataFrame contains np.inf (GH 32593)

  • Bug in DataFrame.rank() raising IndexError when column holds incomparable type and axis=0 (GH 38932)

  • Bug in Series.rank(), DataFrame.rank(), DataFrameGroupBy.rank() and SeriesGroupBy.rank() treats the most negative int64 value as missing (GH 32859)

  • Bug in DataFrame.select_dtypes() behaves differently when include="int" is used in Windows and Linux (GH 36596)

  • Bug in DataFrame.apply() and DataFrame.agg() when passing parameter func="size" operates on the entire DataFrame instead of rows or columns (GH 39934)

  • Bug in DataFrame.transform(), which raised SpecificationError when passing a dictionary and missing columns, now raises KeyError (GH 40004)

  • Bug in DataFrameGroupBy.rank() and SeriesGroupBy.rank() giving incorrect results when pct=True and there are equal values between consecutive groups (GH 40518)

  • Bug in Series.count() that would result in int32 results when parameter level=None was used on 32-bit platforms (GH 40908)

  • Bug in Series and DataFrame that does not return boolean results for object data when using any and all methods for reduction (GH 12863, GH 35450, GH 27709)

  • Bug in Series.clip() failing if the Series contains NA values and has a nullable int or float dtype (GH 40851)

  • Bug in UInt64Index.where() and UInt64Index.putmask() where TypeError was incorrectly raised if other was of type np.int64 (GH 41974)

  • Bug in DataFrame.agg() where the axes of the aggregate were not sorted in the order of the provided aggregate functions, when one or more aggregate functions failed to produce results (GH 33634)

  • Bug in DataFrame.clip() not interpreting missing values as no threshold (GH 40420)
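As an illustration of the select_dtypes() fix listed above (GH 35340), the sketch below builds a small frame with a NumPy integer column, a nullable Int64 extension column and a string column, then selects with include=np.number. The frame and the name numeric_cols are assumptions made for this example; on a pandas version that includes the fix, the extension column is expected to be kept alongside the NumPy one.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "a": np.array([1, 2, 3], dtype="int64"),    # plain NumPy integers
        "b": pd.array([1, None, 3], dtype="Int64"),  # nullable extension dtype
        "c": ["x", "y", "z"],                        # non-numeric strings
    }
)

# Select every column that pandas regards as numeric.
numeric_cols = df.select_dtypes(include=np.number).columns.tolist()
print(numeric_cols)
```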

Conversion

  • Bug in Series.to_dict() with orient='records' now returns Python native types (GH 25969)

  • Bug in Series.view() and Index.view() when converting between datetime-like (datetime64[ns], datetime64[ns, tz], timedelta64, period) dtypes (GH 39788)

  • Original data type not preserved when creating DataFrame from empty np.recarray (GH 40121)

  • Failed to raise TypeError when building DataFrame from frozenset (GH 40163)

  • Bug in Index construction silently ignoring a passed dtype when the data cannot be cast to that dtype (GH 21311)

  • Bug in StringArray.astype() falling back to NumPy and raising when converting to dtype='categorical' (GH 40450)

  • Bug in factorize() where, when given an array with a numeric NumPy dtype lower than int64, uint64 and float64, the unique values did not keep their original dtype (GH 41132)

  • Bug in DataFrame construction with a dictionary containing an array-like with ExtensionDtype and copy=True failing to make a copy (GH 38939)

  • qcut() throws error when taking Float64DType as input (GH 40730)

  • When building DataFrame and Series with datetime64[ns] data and dtype=object, the result is a datetime object instead of a Timestamp object (GH 41599)

  • When building DataFrame and Series with timedelta64[ns] data and dtype=object, the result is an np.timedelta64 object instead of a Timedelta object (GH 41599)

  • Bug in DataFrame construction when given a two-dimensional object-dtype np.ndarray of Period or Interval objects failing to cast to PeriodDtype or IntervalDtype, respectively (GH 41812)

  • Bug when constructing Series from lists and PandasDtype (GH 39357)

  • There is a bug when creating a Series from a range object that does not fit within the boundaries of the int64 data type (GH 30173)

  • Bug in creating a Series from a dict with all-tuple keys and an Index that requires reindexing (GH 41707)

  • Bug in infer_dtype() not recognizing Series, Index, or arrays with a Period dtype (GH 23553)

  • Bug in infer_dtype() raising an error for general ExtensionArray objects; it now returns "unknown-array" instead of raising (GH 37367)

  • A bug exists when calling DataFrame.convert_dtypes() on an empty DataFrame, incorrectly raising ValueError (GH 40393)

Strings

  • Bug in the conversion from pyarrow.ChunkedArray to StringArray when the original had zero chunks (GH 41040)

  • Bug in Series.replace() and DataFrame.replace() ignoring replacements with regex=True for StringDType data (GH 41333, GH 35977)

  • Bug in Series.str.extract() with StringArray returning object dtype for an empty DataFrame (GH 41441)

  • Bug in Series.str.replace() where the case argument was ignored when regex=False (GH 41602)
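To make the last item concrete (GH 41602), here is a minimal sketch of a literal, case-insensitive replacement. The sample strings and the name replaced are invented for illustration; on a version containing the fix, both entries are expected to have "apple" replaced regardless of capitalisation.

```python
import pandas as pd

s = pd.Series(["Apple pie", "apple tart"])

# Literal (non-regex) replacement that should still honour case=False.
replaced = s.str.replace("apple", "pear", case=False, regex=False)
print(replaced.tolist())
```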

Interval

  • IntervalIndex.intersection() and IntervalIndex.symmetric_difference() always return object dtype when operating with CategoricalIndex (GH 38653, GH 38741)

  • Bug in IntervalIndex.intersection() returning duplicates when at least one of the Index objects has duplicates that are present in the other (GH 38743)

  • IntervalIndex.union(), IntervalIndex.intersection(), IntervalIndex.difference() and IntervalIndex.symmetric_difference() now cast to the appropriate dtype instead of raising TypeError when operating with another IntervalIndex with an incompatible dtype (GH 39267)

  • PeriodIndex.union(), PeriodIndex.intersection(), PeriodIndex.symmetric_difference() and PeriodIndex.difference() now cast to object dtype instead of raising IncompatibleFrequency when operating with another PeriodIndex with an incompatible dtype (GH 39306)

  • Bug in IntervalIndex.is_monotonic(), IntervalIndex.get_loc(), IntervalIndex.get_indexer_for() and IntervalIndex.__contains__() when NA values are present (GH 41831)

Indexing

  • Bug in Index.union() and MultiIndex.union() dropping duplicate Index values when the Index is not monotonic or sort is set to False (GH 36289, GH 31326, GH 40862)

  • Bug in CategoricalIndex.get_indexer(), failing to raise InvalidIndexError when non-unique (GH 38372)

  • Bug in IntervalIndex.get_indexer() when target has CategoricalDtype and both index and target contain NA values (GH 41934)

  • Bug in Series.loc(), raising ValueError when filtering with a boolean list and the value to be set is a lower dimension list (GH 20438)

  • Bug in inserting many new columns into a DataFrame causing incorrect subsequent indexing behavior (GH 38380)

  • Bug in DataFrame.__setitem__() raising ValueError when setting multiple values to duplicate columns (GH 15695)

  • Bug in DataFrame.loc(), Series.loc(), DataFrame.__getitem__() and Series.__getitem__() returning incorrect elements for string slices on a non-monotonic DatetimeIndex (GH 33146)

  • Bug in DataFrame.reindex() and Series.reindex() raising TypeError (GH 38566)

  • Bug in DataFrame.reindex() with datetime64[ns] or timedelta64[ns] incorrectly casting to integers when the fill_value requires casting to object dtype (GH 39755)

  • A bug exists in DataFrame.__setitem__() that raises ValueError when setting on an empty DataFrame with a specified column and a non-empty DataFrame value (GH 38831)

  • A bug exists in DataFrame.loc.__setitem__() that raises ValueError when operating on a unique column on a DataFrame with duplicate columns (GH 38521)

  • Bug in DataFrame.iloc.__setitem__() and DataFrame.loc.__setitem__() with mixed dtypes when setting with a dictionary value (GH 38335)

  • A bug exists in Series.loc.__setitem__() and DataFrame.loc.__setitem__() that raises KeyError when a boolean generator is provided (GH 39614)

  • A bug exists in Series.iloc() and DataFrame.iloc() that raises KeyError when a generator is provided (GH 39614)

  • DataFrame.__setitem__() does not raise ValueError when the right side is a DataFrame with a mismatched number of columns (GH 38604)

  • A bug exists in Series.__setitem__() that raises ValueError when setting a Series using a scalar indexer (GH 38303)

  • A bug in DataFrame.loc() reduces the level of MultiIndex when the input DataFrame has only one row (GH 10521)

  • DataFrame.__getitem__() and Series.__getitem__() always raise KeyError when slicing an existing string where Index has milliseconds (GH 33589)

  • Bug in setting timedelta64 or datetime64 values into a numeric Series failing to cast to object dtype (GH 39086, GH 39619)

  • Bug in setting Interval values into a Series or DataFrame with a mismatched IntervalDtype incorrectly casting the new values to the existing dtype (GH 39120)

  • Bug in setting datetime64 values into a Series with integer dtype incorrectly casting the datetime64 values to integers (GH 39266)

  • Bug in setting np.datetime64("NaT") into a Series with Datetime64TZDtype incorrectly treating the timezone-naive value as timezone-aware (GH 39769)

  • Bug in Index.get_loc() not raising KeyError when key=NaN and method is specified but NaN is not in the Index (GH 39382)

  • Bug in DatetimeIndex.insert() when inserting np.datetime64("NaT") into a timezone-aware index incorrectly treating the timezone-naive value as timezone-aware (GH 39769)

  • Bug in incorrectly raising in Index.insert() when setting a new column that cannot be held in the existing frame.columns, or in Series.reset_index() or DataFrame.reset_index(), instead of casting to a compatible dtype (GH 39068)

  • Bug in RangeIndex.append() where a single object of length 1 was concatenated incorrectly (GH 39401)

  • Bug in RangeIndex.astype() when converting to CategoricalIndex, category becomes Int64Index instead of RangeIndex (GH 41263)

  • Bug in setting numpy.timedelta64 value to Series of object dtype when using boolean indexer (GH 39488)

  • Bug in setting numeric values into a boolean-dtype Series using at or iat failing to cast to object dtype (GH 39582)

  • Bug in DataFrame.__setitem__() and DataFrame.iloc.__setitem__() raising ValueError when trying to index with a row slice and setting a list as values (GH 40440)

  • Bug in DataFrame.loc() not raising KeyError when the key was not found in MultiIndex and the levels were not fully specified (GH 41170)

  • Bug in DataFrame.loc.__setitem__() when setting-with-expansion incorrectly raising when the index in the expanding axis contained duplicates (GH 40096)

  • Bug in DataFrame.loc.__getitem__() with MultiIndex incorrectly converting to float when at least one index column has a float type and we retrieve a scalar (GH 41369)

  • Bug in DataFrame.loc() incorrectly matches non-boolean indexed elements (GH 20432)

  • Bug where KeyError was incorrectly raised when using np.nan for indexing on Series or DataFrame with CategoricalIndex (GH 41933)

  • Bug in Series.__delitem__() with ExtensionDtype incorrectly converted to ndarray (GH 40386)

  • Bug in DataFrame.at() with CategoricalIndex returning incorrect results when passing integer keys (GH 41846)

  • If there are duplicate indexers in DataFrame.loc(), the returned MultiIndex will be in the wrong order (GH 40978)

  • DataFrame.__setitem__() raises TypeError when using DatetimeIndex, using str subclass as column name (GH 37366)

  • PeriodIndex.get_loc() fails to raise KeyError when given a Period that does not match freq (GH 41670)

  • Bug in .loc.__getitem__ with a UInt64Index and negative-integer keys raising OverflowError instead of KeyError in some cases and wrapping around to positive integers in others (GH 41777)

  • Bug in Index.get_indexer(), failing to raise ValueError when using invalid method, limit or tolerance parameters in some cases (GH 41918)

  • Error when slicing Series or DataFrame, raising ValueError instead of TypeError when passing an invalid string (GH 41821)

  • Bug in Index constructor where specified dtype was sometimes silently ignored (GH 38879)

  • The behavior of Index.where() now matches the behavior of Index.putmask(), i.e. index.where(mask, other) matches index.putmask(~mask, other) (GH 39412)
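The equivalence stated in the last item, index.where(mask, other) versus index.putmask(~mask, other), can be checked with a short sketch. The index values, the mask and the fill value -1 are arbitrary choices made for this illustration.

```python
import numpy as np
import pandas as pd

idx = pd.Index([10, 20, 30, 40])
mask = np.array([True, False, True, False])

# where() keeps entries where mask is True and fills the rest.
kept = idx.where(mask, -1)

# putmask() overwrites entries where its mask is True, so it takes the inverse.
replaced = idx.putmask(~mask, -1)

print(kept.tolist(), replaced.tolist())
```

Both calls are expected to fill the same positions, namely those where mask is False.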

Missing

  • Bug in Grouper not correctly propagating the dropna argument; DataFrameGroupBy.transform() now correctly handles missing values for dropna=True (GH 35612)

  • Bug in isna(), Series.isna(), Index.isna(), DataFrame.isna(), and the corresponding notna functions not recognizing Decimal("NaN") objects (GH 39409)

  • Bug in DataFrame.fillna() does not accept dictionary as downcast keyword argument (GH 40809)

  • Bug in isna() does not return a copy of the mask for nullable types, causing any subsequent modification of the mask to alter the original array (GH 40935)

  • Bug in DataFrame construction with float data containing NaN and an integer dtype casting instead of retaining the NaN (GH 26919)

  • Bug in Series.isin() and MultiIndex.isin() not treating all NaNs as equivalent if they were in tuples (GH 41836)
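For the Decimal("NaN") item above (GH 39409), a brief sketch of the behavior on a version that includes the fix follows. The Series contents and the names values and mask are illustrative assumptions.

```python
from decimal import Decimal

import pandas as pd

values = pd.Series([Decimal("NaN"), Decimal("1.5")], dtype=object)

# With the fix, the Decimal NaN is reported as missing and the finite value is not.
mask = values.isna()
scalar_is_na = pd.isna(Decimal("NaN"))
print(mask.tolist(), scalar_is_na)
```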

MultiIndex

  • Bug in DataFrame.drop() raises TypeError when MultiIndex is non-unique and level is not provided (GH 36293)

  • Bug in MultiIndex.intersection() repeating NaN in result (GH 38623)

  • Bug in MultiIndex.equals() incorrectly returns True when MultiIndex contains NaN, even if they are in different order (GH 38439)

  • Bug in MultiIndex.intersection() always returns empty result when intersecting with CategoricalIndex (GH 38653)

  • Bug in MultiIndex.difference() incorrectly raises TypeError when index contains unsortable entries (GH 41915)

  • When using MultiIndex.reindex() on an empty MultiIndex, a ValueError is raised when only a specific level is indexed (GH 41170)

  • When reindexing MultiIndex, TypeError is raised when reindexing against a flat Index (GH 41707)

I/O

  • There is a bug in Index.__repr__() when display.max_seq_items=1 (GH 38415)

  • Bug in read_csv() when setting parameter decimal and engine="python", scientific notation is not recognized (GH 31920)

  • Bug in read_csv() interpreting an NA value as a comment when the NA value contains the comment string; fixed for engine="python" (GH 34002)

  • Bug in read_csv() raises IndexError when multiple header columns and index_col are specified, but the file has no data rows (GH 38292)

  • Bug in read_csv() does not accept the case where usecols and names are of different lengths in engine="python" (GH 16469)

  • Bug in read_csv() returns object dtype when delimiter="," while usecols and parse_dates are specified for engine="python" (GH 35873)

  • Bug in read_csv() raising TypeError when specifying names and parse_dates for engine="c" (GH 33699)

  • Bug in read_clipboard() and DataFrame.to_clipboard() not working in WSL (GH 38527)

  • Allow custom error values for the parse_dates argument of read_sql(), read_sql_query() and read_sql_table() (GH 35185)

  • Bug in DataFrame.to_hdf() and Series.to_hdf() raise KeyError when trying to apply to a subclass of DataFrame or Series (GH 33748)

  • Bug in HDFStore.put() raises wrong TypeError when saving DataFrame with non-string dtype (GH 34274)

  • Bug in json_normalize() causes the first element of the generator object not to be included in the returned DataFrame (GH 35923)

  • Bug in read_csv() Issue when applying thousand separators to date columns when dates should be parsed and usecols is specified for engine="python" (GH 39365)

  • Bug in read_excel() forward-filling MultiIndex names when specifying multiple header and index columns (GH 34673)

  • Bug in read_excel() not respecting set_option() (GH 34252)

  • Bug in read_csv() not switching true_values and false_values for nullable Boolean dtype (GH 34655)

  • Bug in read_json() not maintaining a numeric string index when orient="split" (GH 28556)

  • read_sql() returned an empty generator if chunksize was non-zero and the query returned no results. It now returns a generator with a single empty DataFrame (GH 34411)

  • Bug in read_hdf() returning unexpected records when filtering on categorical string columns using the where parameter (GH 39189)

  • Bug in read_sas() raising a ValueError when datetimes were null (GH 39725)

  • Bug in read_excel() dropping empty values from single-column spreadsheets (GH 39808)

  • Bug in read_excel() loading trailing empty rows/columns for some file types (GH 41167)

  • Bug in read_excel() raising an AttributeError when the Excel file had a MultiIndex header followed by two empty rows and no index (GH 40442)

  • Bug in read_excel(), read_csv(), read_table(), read_fwf() and read_clipboard() where a blank row following a MultiIndex header with no index would be dropped (GH 40442)

  • Bug in DataFrame.to_string() misplacing the truncation column when index=False (GH 40904)

  • Bug in DataFrame.to_string() adding an extra dot and misaligning the truncation row when index=False (GH 40904)

  • Bug in read_orc() always raising an AttributeError (GH 40918)

  • Bug in read_csv() and read_table() silently ignoring prefix if both names and prefix are defined; now raises a ValueError (GH 39123)

  • Bug in read_csv() and read_excel() not respecting the dtype for a duplicated column name when mangle_dupe_cols is set to True (GH 35211)

  • Bug in read_csv() silently ignoring sep if both delimiter and sep are defined; now raises a ValueError (GH 39823)

  • Bug in read_csv() and read_table() misinterpreting arguments when sys.setprofile had been previously called (GH 41069)

  • Bug in the conversion from PyArrow to pandas (e.g. when reading a Parquet file) with nullable dtypes and a PyArrow array whose data buffer size is not a multiple of the dtype size (GH 40896)

  • Bug in read_excel() raising an error when pandas could not determine the file type even though the user specified the engine argument (GH 41225)

  • Bug in read_clipboard() shifting values into the wrong column when copying from an Excel file with null values in the first column (GH 41108)

  • Bug in DataFrame.to_hdf() and Series.to_hdf() raising TypeError when trying to append a string column to an incompatible column (GH 41897)
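
To illustrate the sep/delimiter entry in the list above, the following minimal sketch passes both arguments to read_csv() and records whether a ValueError is raised. The inline CSV text and variable names are made up for the example.

```python
import io

import pandas as pd

csv_text = "a;b\n1;2\n"  # made-up sample data

# Passing only one of sep / delimiter parses normally.
df = pd.read_csv(io.StringIO(csv_text), sep=";")

# Passing both is ambiguous; pandas 1.3 and later raise ValueError
# instead of silently ignoring sep.
try:
    pd.read_csv(io.StringIO(csv_text), sep=";", delimiter=",")
    raised = False
except ValueError:
    raised = True
```

On versions that include this fix, raised should end up True while the single-argument call still parses the two columns.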

Period

  • Comparisons of Period objects, or of Index, Series or DataFrame objects with mismatched PeriodDtype, now behave like other mismatched-type comparisons: they return False for equals, True for not-equal, and raise TypeError for inequality (ordering) checks (GH 39274)
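
A minimal sketch of the comparison behavior described above, using two scalar Period objects with different frequencies. The variable names are illustrative, and the exact exception class raised by the ordering comparison is treated as an assumption, so it is caught broadly.

```python
import pandas as pd

monthly = pd.Period("2021-01", freq="M")
daily = pd.Period("2021-01-01", freq="D")

is_equal = monthly == daily       # mismatched freq compares as not equal
is_not_equal = monthly != daily

# Ordering has no meaningful answer across frequencies, so it raises.
# Assumption: the error is a TypeError or a ValueError subclass,
# depending on the installed pandas version.
try:
    monthly < daily
    ordering_raised = False
except (TypeError, ValueError):
    ordering_raised = True
```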

Plotting

  • Bug in plotting.scatter_matrix() raising an error when a 2D ax argument is passed (GH 16253)

  • Prevent warnings when Matplotlib's constrained_layout is enabled (GH 25261)

  • Bug in DataFrame.plot() where the wrong colors were displayed in the legend if the function was called repeatedly and some calls used yerr but others did not (GH 39522)

  • Bug in DataFrame.plot() where the wrong colors were displayed in the legend if the function was called repeatedly and some calls used secondary_y and others used legend=False (GH 40044)

  • Bug in DataFrame.plot.box() where the caps or min/max markers of the plot were not visible when the dark_background theme was selected (GH 40769)

Groupby/resample/rolling

  • Bug in DataFrameGroupBy.agg() and SeriesGroupBy.agg() with PeriodDtype columns incorrectly casting results too aggressively (GH 38254)

  • Bug in SeriesGroupBy.value_counts() where unobserved categories in a grouped categorical Series were not counted (GH 38672)

  • Bug in SeriesGroupBy.value_counts() where an error was raised on an empty Series (GH 39172)

  • Bug in GroupBy.indices() which could contain non-existent indices when null values were present in the group keys (GH 9304)

  • Fixed a loss of precision in DataFrameGroupBy.sum() and SeriesGroupBy.sum() by using Kahan summation (GH 38778)

  • Fixed a loss of precision in DataFrameGroupBy.cumsum(), SeriesGroupBy.cumsum(), DataFrameGroupBy.mean() and SeriesGroupBy.mean() by using Kahan summation (GH 38934)

  • Bug in Resampler.aggregate() and DataFrame.transform() raising a TypeError instead of SpecificationError when missing keys had mixed dtypes (GH 39025)

  • Bug in DataFrameGroupBy.idxmin() and DataFrameGroupBy.idxmax() with ExtensionDtype columns (GH 38733)

  • Bug in Series.resample() raising an error when the index was a PeriodIndex consisting of NaT (GH 39227)

  • Bug in RollingGroupby.corr() and ExpandingGroupby.corr() where the groupby column would return 0 instead of np.nan when other was longer than each group (GH 39591)

  • Bug in ExpandingGroupby.corr() and ExpandingGroupby.cov() where 1 would be returned instead of np.nan when other was longer than each group (GH 39591)

  • Bug in DataFrameGroupBy.mean(), SeriesGroupBy.mean(), DataFrameGroupBy.median(), SeriesGroupBy.median() and DataFrame.pivot_table() not propagating metadata (GH 28283)

  • Bug in Series.rolling() and DataFrame.rolling() where the window bounds were not calculated correctly when the window was an offset and the dates were in descending order (GH 40002)

  • Bug in Series.groupby() and DataFrame.groupby() on an empty Series or DataFrame losing the index, columns and/or data types when directly using the methods idxmax, idxmin, mad, min, max, sum, prod and skew, or when using them through apply, aggregate or resample (GH 26411)

  • Bug in DataFrameGroupBy.apply() and SeriesGroupBy.apply() where a MultiIndex would be created instead of an Index when used on a RollingGroupby object (GH 39732)

  • Bug in DataFrameGroupBy.sample() where an error was raised when weights was specified and the index was an Int64Index (GH 39927)

  • Bug in DataFrameGroupBy.aggregate() and Resampler.aggregate() which would sometimes raise a SpecificationError when passed a dictionary and columns were missing; they now always raise a KeyError instead (GH 40004)

  • Bug in DataFrameGroupBy.sample() where column selection was not applied before calculating the result (GH 39928)

  • Bug in ExponentialMovingWindow where calling __getitem__ incorrectly raised ValueError when times was provided (GH 40164)

  • Bug in ExponentialMovingWindow where calling __getitem__ would not retain the com, span, alpha or halflife attributes (GH 40164)

  • ExponentialMovingWindow now raises a NotImplementedError when specifying times with adjust=False, because the calculation was incorrect (GH 40098)

  • Bug in ExponentialMovingWindowGroupby.mean() where the times argument was ignored when engine='numba' (GH 40951)

  • Bug in ExponentialMovingWindowGroupby.mean() where the wrong times were used when there were multiple groups (GH 40951)

  • Bug in ExponentialMovingWindowGroupby where the times vector and the values became out of sync for non-trivial groups (GH 40951)

  • Bug in Series.asfreq() and DataFrame.asfreq() that dropped rows when the index was not sorted (GH 39805)

  • Bug in aggregate functions in DataFrame that did not respect the numeric_only parameter when the level keyword was given (GH 40660)

  • Bug in SeriesGroupBy.aggregate() where using a user-defined function to aggregate a Series with an object-typed Index produced an incorrect Index shape (GH 40014)

  • Bug in RollingGroupby where the as_index=False argument in groupby was ignored (GH 39433)

  • Bug in DataFrameGroupBy.any(), SeriesGroupBy.any(), DataFrameGroupBy.all() and SeriesGroupBy.all() raising a ValueError when used with nullable type columns holding NA, even with skipna=True (GH 40585)

  • Bug in DataFrameGroupBy.cummin(), SeriesGroupBy.cummin(), DataFrameGroupBy.cummax() and SeriesGroupBy.cummax() incorrectly rounding integer values near the int64 implementation bounds (GH 40767)

  • Bug in DataFrameGroupBy.rank() and SeriesGroupBy.rank() with nullable dtypes incorrectly raising a TypeError (GH 41010)

  • Bug in DataFrameGroupBy.cummin(), SeriesGroupBy.cummin(), DataFrameGroupBy.cummax() and SeriesGroupBy.cummax() computing wrong results with nullable data types holding values too large to round-trip when cast to float (GH 37493)

  • Bug in DataFrame.rolling() returning a mean of zero for an all-NaN window with min_periods=0 when the calculation is not numerically stable (GH 41053)

  • Bug in DataFrame.rolling() returning a non-zero sum for an all-NaN window with min_periods=0 when the calculation is not numerically stable (GH 41053)

  • Bug in SeriesGroupBy.agg() failing to retain an ordered CategoricalDtype on order-preserving aggregations (GH 41147)

  • Bug in DataFrameGroupBy.min(), SeriesGroupBy.min(), DataFrameGroupBy.max() and SeriesGroupBy.max() with multiple object-dtype columns and numeric_only=False incorrectly raising a ValueError (GH 41111)

  • Bug in DataFrameGroupBy.rank() when the GroupBy object uses axis=0 and the rank method is called with the keyword axis=1 (GH 41320)

  • Bug in DataFrameGroupBy.__getitem__() incorrectly returned a malformed SeriesGroupBy instead of DataFrameGroupBy on non-unique columns (GH 41427)

  • Bug in DataFrameGroupBy.transform() incorrectly raises AttributeError on non-unique columns (GH 41427)

  • Bug in Resampler.apply() incorrectly removed duplicate columns on non-unique columns (GH 41445)

  • Bug in Series.groupby() aggregations incorrectly returning an empty Series instead of raising TypeError on aggregations that are invalid for its dtype, e.g. .prod with datetime64[ns] dtype (GH 41342)

  • Bug in DataFrameGroupBy reductions incorrectly failing to drop columns with invalid dtypes for that reduction when there are no valid columns (GH 41291)

  • Bug in DataFrame.rolling.__iter__() where on was not assigned to the index of the resulting objects (GH 40373)

  • Bug in DataFrameGroupBy.transform() and DataFrameGroupBy.agg() with engine="numba" where *args were being cached with the user-passed function (GH 41647)

  • Bug in DataFrameGroupBy methods agg, transform, sum, bfill, ffill, pad, pct_change, shift and ohlc dropping .columns.names (GH 41497)
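
The sum, cumsum and mean entries in the list above refer to Kahan summation. As a rough, pure-Python sketch of that technique (not the pandas implementation, which lives in compiled code), the helper names naive_sum and kahan_sum below are hypothetical:

```python
import math


def naive_sum(values):
    """Plain left-to-right accumulation."""
    total = 0.0
    for v in values:
        total += v
    return total


def kahan_sum(values):
    """Compensated (Kahan) summation of an iterable of floats."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for v in values:
        y = v - compensation            # re-apply the error from the last step
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover what was lost
        total = t
    return total


values = [0.1] * 1000
reference = math.fsum(values)  # correctly rounded sum, used as ground truth
naive_error = abs(naive_sum(values) - reference)
kahan_error = abs(kahan_sum(values) - reference)
```

Comparing naive_error with kahan_error shows how the compensation term keeps the accumulated rounding error from growing with the number of terms.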

Reshape

  • Bug in merge() raising an error when performing an inner join with a partial index and right_index=True when there was no overlap between the indices (GH 33814)

  • Bug in DataFrame.unstack() with missing levels leading to incorrect index names (GH 37510)

  • Bug in merge_asof() propagating the right Index instead of the left Index when left_index=True and right_on were specified (GH 33463)

  • Bug in DataFrame.join() returning incorrect results on a DataFrame with a MultiIndex when one of the two indexes had only one level (GH 36909)

  • merge_asof() now raises a ValueError instead of a cryptic TypeError in the case of non-numerical merge columns (GH 29130)

  • Bug in DataFrame.join() not assigning values correctly when the DataFrame had a MultiIndex where at least one dimension had Categorical dtype with non-alphabetically sorted categories (GH 38502)

  • Series.value_counts() and Series.mode() now return consistent keys in original order (GH 12679, GH 11227 and GH 39007)

  • Bug in DataFrame.stack() not handling NaN in MultiIndex columns correctly (GH 39481)

  • Bug in DataFrame.apply() giving incorrect results when the argument func was a string, axis=1 was passed, and the axis argument was not supported; it now raises a ValueError instead (GH 39211)

  • Bug in DataFrame.sort_values() not reshaping the index correctly after sorting on columns when ignore_index=True (GH 39464)

  • Bug in DataFrame.append() returning incorrect dtypes with combinations of ExtensionDtype dtypes (GH 39454)

  • Bug in DataFrame.append() returning incorrect dtypes when used with combinations of datetime64 and timedelta64 dtypes (GH 39574)

  • Bug in DataFrame.append() when appending a Series whose Index is not a MultiIndex to a DataFrame with a MultiIndex (GH 41707)

  • Bug in DataFrame.pivot_table() returning a MultiIndex for a single value when operating on an empty DataFrame (GH 13483)

  • Index can now be passed to the numpy.all() function (GH 40180)

  • Bug in DataFrame.stack() not retaining CategoricalDtype in MultiIndex (GH 36991)

  • Bug in to_datetime() raising an error when the input sequence contained unhashable items (GH 39756)

  • Bug in Series.explode() preserving the index when ignore_index was True and the values were scalars (GH 40487)

  • Bug in to_datetime() raising a ValueError when a Series contains None and NaT and has more than 50 elements (GH 39882)

  • Bug in Series.unstack() and DataFrame.unstack() with object-dtype values containing timezone-aware datetime objects incorrectly raising TypeError (GH 41875)

  • Bug in DataFrame.melt() raising InvalidIndexError when the DataFrame has duplicate columns used as value_vars (GH 41951)
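
The Series.explode() entry in the list above can be illustrated with a short sketch. The data and variable names are invented for the example; the point is that ignore_index=True renumbers the result even when every value is a scalar.

```python
import pandas as pd

scalars = pd.Series([10, 20], index=["a", "b"])

kept = scalars.explode()                         # original labels are kept
renumbered = scalars.explode(ignore_index=True)  # labels reset to 0..n-1

# The same option applied to a Series that actually holds a list.
nested = pd.Series([[1, 2], 3], index=["x", "y"])
flat = nested.explode(ignore_index=True)
```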

Sparse

  • Bug in DataFrame.sparse.to_coo() raising a KeyError when the columns are a numeric Index without a 0 (GH 18414)

  • Bug in SparseArray.astype() with copy=False producing incorrect results when converting from an integer dtype to a floating point dtype (GH 34456)

  • Fixed bug in SparseArray.max() and SparseArray.min() always returning empty results (GH 40921)
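
As a small sanity check for the SparseArray.max() and SparseArray.min() entry above, the sketch below builds a sparse integer array (the values are arbitrary) and reads back its extremes. The fill value takes part in the comparison as well as the stored values.

```python
import pandas as pd

# The fill value defaults to 0 for integer data, so only 5 and -1 are stored.
sparse = pd.arrays.SparseArray([0, 0, 5, -1, 0])

largest = sparse.max()
smallest = sparse.min()
```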

ExtensionArray

  • Fixed bug in DataFrame.where() when other is a Series with ExtensionDtype (GH 38729)

  • Fixed bug where Series.idxmax(), Series.idxmin(), Series.argmax() and Series.argmin() failed when the underlying data was an ExtensionArray (GH 32749, GH 33719, GH 36566)

  • Fixed a bug where properties of some PandasExtensionDtype subclasses were cached incorrectly (GH 40329)

  • Fixed bug in DataFrame.mask() that raised ValueError when masking a DataFrame using ExtensionDtype (GH 40941)
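
The idxmax/idxmin/argmax/argmin entry in the list above concerns Series backed by an ExtensionArray. A minimal sketch, assuming the nullable "Int64" dtype as the extension type and made-up labels:

```python
import pandas as pd

s = pd.Series([1, 3, 2], index=["a", "b", "c"], dtype="Int64")

label_of_max = s.idxmax()  # index label of the largest value
label_of_min = s.idxmin()  # index label of the smallest value
pos_of_max = s.argmax()    # integer position of the largest value
pos_of_min = s.argmin()    # integer position of the smallest value
```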

Styler

  • Bug in Styler where the subset argument in methods raised an error for some valid MultiIndex slices (GH 33562)

  • The HTML output rendered by Styler has been slightly modified to follow w3 good code standards (GH 39626)

  • Bug in Styler where the rendered HTML was missing a column class identifier for certain header cells (GH 39716)

  • Bug in Styler.background_gradient() where the text color was not determined correctly (GH 39888)

  • Bug in Styler.set_table_styles() where multiple elements in the CSS selectors of the table_styles argument were not added correctly (GH 34061)

  • Bug in Styler where copying from Jupyter dropped the top left cell and misaligned the headers (GH 12147)

  • Bug in Styler.where where kwargs were not passed to the applicable callable (GH 40845)

  • Bug in Styler causes CSS to be repeated on multiple renders (GH 39395, GH 40334)

Other

  • inspect.getmembers(Series) no longer raises AbstractMethodError (GH 38782)

  • Bug in Series.where() with numeric dtype and other=None not converting to nan (GH 39761)

  • Bug in assert_series_equal(), assert_frame_equal(), assert_index_equal() and assert_extension_array_equal() incorrectly raising when an attribute has an unrecognized NA type (GH 39461)

  • Bug in assert_index_equal() with exact=True not raising when comparing CategoricalIndex instances with Int64Index and RangeIndex categories (GH 41263)

  • Bug in DataFrame.equals(), Series.equals() and Index.equals() (GH 39650)

  • Bug in show_versions() where the console JSON output was not valid JSON (GH 39701)

  • pandas now compiles on z/OS when using xlc (GH 35826)

  • Bug in pandas.util.hash_pandas_object() not recognizing hash_key, encoding and categorize when the input object type is a DataFrame (GH 41404)

Categorical

  • Bug in CategoricalIndex incorrectly failing to raise TypeError when scalar data is passed (GH 38614)

  • Bug in CategoricalIndex.reindex failing when the passed Index was not categorical but all of its values were labels in the category (GH 28690)

  • Bug where constructing a Categorical from an object-dtype array of date objects did not round-trip correctly with astype (GH 38552)

  • Bug in constructing a DataFrame from an ndarray and a CategoricalDtype (GH 38857)

  • Bug in setting categorical values into an object-dtype column in a DataFrame (GH 39136)

  • Bug in DataFrame.reindex() raising an IndexError when the new index contained duplicates and the old index was a CategoricalIndex (GH 38906)

  • Bug in Categorical.fillna() with tuple-like categories raising NotImplementedError instead of ValueError when filling with a non-category tuple (GH 41914)

Datetimelike

  • Bug in DataFrame and Series constructors sometimes dropping nanoseconds from Timestamp (resp. Timedelta) data with dtype=datetime64[ns] (resp. timedelta64[ns]) (GH 38032)

  • Bug in DataFrame.first() and Series.first() with an offset of one month returning an incorrect result when the first day is the last day of a month (GH 29623)

  • Bug in constructing a DataFrame or Series with mismatched datetime64 data and timedelta64 dtype, or vice versa, failing to raise a TypeError (GH 38575, GH 38764, GH 38792)

  • Bug in constructing a Series or DataFrame with a datetime object out of bounds for datetime64[ns] dtype or a timedelta object out of bounds for timedelta64[ns] dtype (GH 38792, GH 38965)

  • Bug in DatetimeIndex.intersection(), DatetimeIndex.symmetric_difference(), PeriodIndex.intersection() and PeriodIndex.symmetric_difference() always returning object dtype when operating with a CategoricalIndex (GH 38741)

  • Bug in DatetimeIndex.intersection() giving incorrect results with non-Tick frequencies with n != 1 (GH 42104)

  • Bug in Series.where() incorrectly casting datetime64 values to int64 (GH 37682)

  • Bug in Categorical incorrectly casting datetime objects to Timestamp (GH 38878)

  • Bug in comparisons between Timestamp objects and datetime64 objects just outside the implementation bounds for nanosecond datetime64 (GH 39221)

  • Bug in Timestamp.round(), Timestamp.floor() and Timestamp.ceil() for values near the implementation bounds of Timestamp (GH 39244)

  • Bug in Timedelta.round(), Timedelta.floor() and Timedelta.ceil() for values near the implementation bounds of Timedelta (GH 38964)

  • Bug in date_range() incorrectly creating a DatetimeIndex containing NaT instead of raising OutOfBoundsDatetime in corner cases (GH 24124)

  • Bug in infer_freq() incorrectly failing to infer the 'H' frequency if the DatetimeIndex has a time zone and crosses a DST boundary (GH 39556)

  • Bug in Series backed by DatetimeArray or TimedeltaArray sometimes failing to set the array's freq to None (GH 41425)

Timedelta

  • Bug in constructing a Timedelta from np.timedelta64 objects with non-nanosecond units that are out of bounds for timedelta64[ns] (GH 38965)

  • Bug in constructing a TimedeltaIndex incorrectly accepting np.datetime64("NaT") objects (GH 39462)

  • Bug in constructing a Timedelta from an input string containing only symbols and no digits failing to raise an error (GH 39710)

  • Bug in TimedeltaIndex and to_timedelta() failing to raise when passed non-nanosecond timedelta64 arrays that overflow when converted to timedelta64[ns] (GH 40008)

Timezones

  • Bug in different tzinfo objects representing UTC not being treated as equivalent (GH 39216)

  • Bug in dateutil.tz.gettz("UTC") not being recognized as equivalent to other tzinfo objects representing UTC (GH 39276)

Numeric

  • Bug in DataFrame.quantile(), DataFrame.sort_values() causing incorrect subsequent indexing behavior (GH 38351)

  • Bug in DataFrame.sort_values() raising an IndexError for an empty by (GH 40258)

  • Bug in DataFrame.select_dtypes() dropping numeric ExtensionDtype columns when include=np.number is used (GH 35340)

  • Bug in DataFrame.mode() and Series.mode() not keeping a consistent integer Index for empty input (GH 33321)

  • Bug in DataFrame.rank() with axis=0 and columns holding incomparable types raising an IndexError (GH 38932)

  • Bug in DataFrame.rank() when the DataFrame contained np.inf (GH 32593)

  • Bug in Series.rank(), DataFrame.rank(), DataFrameGroupBy.rank() and SeriesGroupBy.rank() treating the most negative int64 value as missing (GH 32859)

  • Bug in DataFrame.select_dtypes() behaving differently on Windows and Linux with include="int" (GH 36596)

  • Bug in DataFrame.apply() and DataFrame.agg() operating on the entire DataFrame instead of on rows or columns when passed the argument func="size" (GH 39934)

  • Bug in DataFrame.transform(), which raised SpecificationError when a dictionary was passed and a column was missing; it now raises KeyError (GH 40004)

  • Bug in DataFrameGroupBy.rank() and SeriesGroupBy.rank() giving incorrect results with pct=True and equal values between consecutive groups (GH 40518)

  • Bug in Series.count() producing an int32 result on 32-bit platforms when the argument level=None (GH 40908)

  • Bug in Series and DataFrame reductions with the methods any and all not returning Boolean results for object data (GH 12863, GH 35450, GH 27709)

  • Bug in Series.clip() failing if the Series contains NA values and has a nullable integer or float data type (GH 40851)

  • Bug in UInt64Index.where() and UInt64Index.putmask() incorrectly raising TypeError when other has np.int64 dtype (GH 41974)

  • Bug in DataFrame.agg() not sorting the aggregated axis in the order of the provided aggregation functions when one or more aggregation functions fail to produce results (GH 33634)

  • Bug in DataFrame.clip() not interpreting missing values as no threshold (GH 40420)
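
The DataFrame.clip() entry above can be sketched as follows: a per-row lower bound is supplied as a Series, and the NaN in that Series is expected to mean "no bound for this row". The frame and threshold values are invented for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, 5.0, 10.0]})
lower = pd.Series([2.0, np.nan, 2.0])  # NaN: no lower bound for that row

# axis=0 aligns the thresholds with the rows of df.
clipped = df.clip(lower=lower, axis=0)
```

With the fix in place, only the first row is raised to its bound; the second row passes through unchanged.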

Conversion

  • Bug in Series.to_dict() with orient='records' now returning Python native types (GH 25969)

  • Bug in Series.view() and Index.view() when converting between datetime-like (datetime64[ns], datetime64[ns, tz], timedelta64, period) dtypes (GH 39788)

  • Bug in creating a DataFrame from an empty np.recarray not retaining the original dtypes (GH 40121)

  • Bug in DataFrame failing to raise a TypeError when constructed from a frozenset (GH 40163)

  • Bug in Index construction silently ignoring a passed dtype when the data cannot be cast to that dtype (GH 21311)

  • Bug in StringArray.astype() falling back to NumPy and raising when converting to dtype='categorical' (GH 40450)

  • Bug in factorize() where the unique values did not keep their original dtype when given an array with a numeric NumPy dtype lower than int64, uint64 and float64 (GH 41132)

  • Bug in DataFrame construction with a dictionary containing an array-like with ExtensionDtype and copy=True failing to make a copy (GH 38939)

  • Bug in qcut() raising an error when taking Float64DType as input (GH 40730)

  • Bug in DataFrame and Series construction with datetime64[ns] data and dtype=object resulting in datetime objects instead of Timestamp objects (GH 41599)

  • Bug in DataFrame and Series construction with timedelta64[ns] data and dtype=object resulting in np.timedelta64 objects instead of Timedelta objects (GH 41599)

  • Bug in DataFrame construction when given a two-dimensional object-dtype np.ndarray of Period or Interval objects failing to cast to PeriodDtype or IntervalDtype, respectively (GH 41812)

  • Bug in constructing a Series from a list and a PandasDtype (GH 39357)

  • Bug in creating a Series from a range object that does not fit in the bounds of int64 dtype (GH 30173)

  • Bug in creating a Series from a dict with all-tuple keys and an Index that requires reindexing (GH 41707)

  • Bug in infer_dtype() not recognizing a Series, Index or array with a Period dtype (GH 23553)

  • Bug in infer_dtype() raising an error for general ExtensionArray objects. It now returns "unknown-array" instead of raising (GH 37367)

  • Bug in DataFrame.convert_dtypes() where a ValueError was incorrectly raised when called on an empty DataFrame (GH 40393)

Strings

  • Bug in the conversion from pyarrow.ChunkedArray to StringArray when the original had zero chunks (GH 41040)

  • Bug in Series.replace() and DataFrame.replace() ignoring replacements with regex=True for StringDType data (GH 41333, GH 35977)

  • Bug in Series.str.extract() with StringArray returning object dtype for an empty DataFrame (GH 41441)

  • Bug in Series.str.replace() where the case argument was ignored when regex=False (GH 41602)

Interval

  • Bug in IntervalIndex.intersection() and IntervalIndex.symmetric_difference() always returning object dtype when operating with a CategoricalIndex (GH 38653, GH 38741)

  • Bug in IntervalIndex.intersection() returning duplicates when at least one of the Index objects has duplicates that are present in the other (GH 38743)

  • IntervalIndex.union(), IntervalIndex.intersection(), IntervalIndex.difference() and IntervalIndex.symmetric_difference() now cast to the appropriate dtype instead of raising a TypeError when operating with another IntervalIndex with an incompatible dtype (GH 39267)

  • PeriodIndex.union(), PeriodIndex.intersection(), PeriodIndex.symmetric_difference() and PeriodIndex.difference() now cast to object dtype instead of raising IncompatibleFrequency when operating with another PeriodIndex with an incompatible dtype (GH 39306)

  • Bug in IntervalIndex.is_monotonic(), IntervalIndex.get_loc(), IntervalIndex.get_indexer_for() and IntervalIndex.__contains__() when NA values are present (GH 41831)

Indexing

  • Bug in Index.union() and MultiIndex.union() dropping duplicate Index values when the Index was not monotonic or sort was set to False (GH 36289, GH 31326, GH 40862)

  • Bug in CategoricalIndex.get_indexer() failing to raise InvalidIndexError when non-unique (GH 38372)

  • Bug in IntervalIndex.get_indexer() when target has CategoricalDtype and both the index and the target contain NA values (GH 41934)

  • Bug in Series.loc() raising a ValueError when the input was filtered with a Boolean list and the values to set were a list with lower dimension (GH 20438)

  • Bug in inserting many new columns into a DataFrame causing incorrect subsequent indexing behavior (GH 38380)

  • Bug in DataFrame.__setitem__() raising a ValueError when setting multiple values to duplicate columns (GH 15695)

  • Bug where DataFrame.loc(), Series.loc(), DataFrame.__getitem__() and Series.__getitem__() returned incorrect elements for string slices of non-monotonic DatetimeIndex (GH 33146)

  • Bug in DataFrame.reindex() and Series.reindex() with timezone-aware indexes raising a TypeError for method="ffill" and method="bfill" when tolerance is specified (GH 38566)

  • Bug in DataFrame.reindex() with datetime64[ns] or timedelta64[ns] incorrectly casting to integers when the fill_value requires casting to object dtype (GH 39755)

  • Bug in DataFrame.__setitem__() raising a ValueError when setting on an empty DataFrame using specified columns and a non-empty DataFrame value (GH 38831)

  • Bug in DataFrame.loc.__setitem__() raising a ValueError when operating on a unique column while the DataFrame has duplicate columns (GH 38521)

  • Bug in DataFrame.iloc.__setitem__() and DataFrame.loc.__setitem__() raising a ValueError with mixed dtypes when setting with a dictionary value (GH 38335)

  • Bug in Series.loc.__setitem__() and DataFrame.loc.__setitem__() raising KeyError when provided a Boolean generator (GH 39614)

  • Bug in Series.iloc() and DataFrame.iloc() raising a KeyError when provided a generator (GH 39614)

  • Bug in DataFrame.__setitem__() not raising a ValueError when the right-hand side is a DataFrame with the wrong number of columns (GH 38604)

  • Bug in Series.__setitem__() raising a ValueError when setting a Series with a scalar indexer (GH 38303)

  • Bug in DataFrame.loc() dropping levels of a MultiIndex when the DataFrame used as input has only one row (GH 10521)

  • Bug in DataFrame.__getitem__() and Series.__getitem__() always raising KeyError when slicing with existing strings where the Index has milliseconds (GH 33589)

  • Bug in setting timedelta64 or datetime64 values into a numeric Series failing to cast to object dtype (GH 39086, GH 39619)

  • Bug in setting Interval values into a Series or DataFrame with mismatched IntervalDtype incorrectly casting the new values to the existing dtype (GH 39120)

  • Bug in setting datetime64 values into a Series with integer dtype incorrectly casting the datetime64 values to integers (GH 39266)

  • Bug in setting np.datetime64("NaT") into a Series with Datetime64TZDtype incorrectly treating the timezone-naive value as timezone-aware (GH 39769)

  • Bug in Index.get_loc() not raising KeyError when key=NaN and method is specified but NaN is not in the Index (GH 39382)

  • Bug in DatetimeIndex.insert() when inserting np.datetime64("NaT") into a timezone-aware index incorrectly treating the timezone-naive value as timezone-aware (GH 39769)

  • Bug in incorrectly raising in Index.insert(), when setting a new column that cannot be held in the existing frame.columns, or in Series.reset_index() or DataFrame.reset_index(), instead of casting to a compatible dtype (GH 39068)

  • Bug in RangeIndex.append() where a single object of length 1 was concatenated incorrectly (GH 39401)

  • Bug in RangeIndex.astype() where the categories became an Int64Index instead of a RangeIndex when converting to CategoricalIndex (GH 41263)

  • Bug in setting numpy.timedelta64 values into an object-dtype Series using a Boolean indexer (GH 39488)

  • Bug in setting numeric values into a boolean-dtype Series using at or iat failing to cast to object dtype (GH 39582)

  • Bug in DataFrame.__setitem__() and DataFrame.iloc.__setitem__() raising ValueError when trying to index with a row slice and setting a list as values (GH 40440)

  • Bug in DataFrame.loc() not raising KeyError when the key was not found in the MultiIndex and the levels were not fully specified (GH 41170)

  • Bug in DataFrame.loc.__setitem__() when setting with expansion incorrectly raising when the index in the expanding axis contained duplicates (GH 40096)

  • Bug in DataFrame.loc.__getitem__() with a MultiIndex casting to float when at least one index column has float dtype and a scalar is retrieved (GH 41369)

  • Bug in DataFrame.loc() incorrectly matching non-Boolean index elements (GH 20432)

  • Bug in indexing with np.nan on a Series or DataFrame with a CategoricalIndex incorrectly raising KeyError when np.nan keys are present (GH 41933)

  • Bug in Series.__delitem__() with ExtensionDtype incorrectly casting to ndarray (GH 40386)

  • Bug in DataFrame.at() returning incorrect results when passed integer keys (GH 41846)

  • Bug in DataFrame.loc() returning a MultiIndex in the wrong order if an indexer has duplicates (GH 40978)

  • Bug in DataFrame.__setitem__() raising a TypeError when using a str subclass as the column name with a DatetimeIndex (GH 37366)

  • Bug in PeriodIndex.get_loc() failing to raise a KeyError when given a Period with a mismatched freq (GH 41670)

  • Bug in .loc.__getitem__ with a UInt64Index and negative integer keys raising OverflowError instead of KeyError in some cases, and wrapping around to positive integers in others (GH 41777)

  • Bug in Index.get_indexer() failing to raise ValueError in some cases with invalid method, limit or tolerance arguments (GH 41918)

  • Bug when slicing a Series or DataFrame with a TimedeltaIndex where passing an invalid string raised ValueError instead of TypeError (GH 41821)

  • Bug in the Index constructor sometimes silently ignoring a specified dtype (GH 38879)

  • Index.where() behavior now mirrors Index.putmask() behavior, i.e. index.where(mask, other) matches index.putmask(~mask, other) (GH 39412)
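
The last entry above states an equivalence between Index.where() and Index.putmask(). A minimal sketch of that equivalence, with an arbitrary integer Index and Boolean mask chosen for illustration:

```python
import numpy as np
import pandas as pd

index = pd.Index([1, 2, 3, 4])
mask = np.array([True, False, True, False])

# where() keeps entries where mask is True and fills the rest with `other`.
kept_where_true = index.where(mask, 0)

# putmask() replaces entries where its mask is True, hence the inverted mask.
replaced_where_false = index.putmask(~mask, 0)
```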

Missing

  • Bug in Grouper not propagating the dropna parameter correctly; DataFrameGroupBy.transform() now correctly handles missing values for dropna=True (GH 35612)

  • Bug in isna(), Series.isna(), Index.isna(), DataFrame.isna() and the corresponding notna functions not recognizing Decimal("NaN") objects (GH 39409)

  • Bug in DataFrame.fillna() not accepting a dictionary for the downcast keyword (GH 40809)

  • Bug in isna() not returning a copy of the mask for nullable types, causing any subsequent modification of the mask to change the original array (GH 40935)

  • Bug in DataFrame construction with floating point data containing NaN and an integer dtype, casting instead of preserving the NaN (GH 26919)

  • Bug in Series.isin() and MultiIndex.isin() not treating all nans as equivalent if they were in a tuple (GH 41836)
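
As a rough illustration of the Decimal("NaN") item above, the following sketch checks a scalar and an object-dtype Series. It assumes pandas 1.3 or later (where the fix is described as applied); the sample values are invented.

```python
from decimal import Decimal

import pandas as pd

# An object-dtype Series mixing a regular Decimal, a Decimal NaN and None.
values = pd.Series([Decimal("1.5"), Decimal("NaN"), None], dtype=object)

scalar_is_na = pd.isna(Decimal("NaN"))
series_is_na = values.isna().tolist()
series_not_na = values.notna().tolist()

print(scalar_is_na)
print(series_is_na)
print(series_not_na)
```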

MultiIndex

  • Bug in DataFrame.drop() raising TypeError when the MultiIndex is not unique and level is not provided (GH 36293)

  • Bug in MultiIndex.intersection() duplicating NaN in the result (GH 38623)

  • Bug in MultiIndex.equals() incorrectly returning True when MultiIndex contains NaN and the order is different (GH 38439)

  • Bug in MultiIndex.intersection() always returning an empty result when intersecting with CategoricalIndex (GH 38653)

  • Bug in MultiIndex.difference() incorrectly raising TypeError when the index contained unsortable entries (GH 41915)

  • Bug in MultiIndex.reindex() raising ValueError when used on an empty MultiIndex and indexing only a specific level (GH 41170)

  • Bug in MultiIndex.reindex() raising TypeError when reindexing to a flat Index (GH 41707)

I/O

  • Bug in Index.__repr__() when display.max_seq_items=1 (GH 38415)

  • Bug in read_csv() where scientific notation is not recognized if parameter decimal is set and engine="python" (GH 31920)

  • Bug in read_csv() interpreting NA values as comments in case of engine="python" (GH 34002)

  • Bug in read_csv() raising IndexError when file has no data rows, has multiple header columns and index_col is specified (GH 38292)

  • Bug in read_csv() not accepting usecols of different lengths than names under engine="python" (GH 16469)

  • Bug in read_csv() returning object dtype when delimiter="," and usecols and parse_dates are specified with engine="python" (GH 35873)

  • Fixed a bug where read_csv() raised TypeError when names and parse_dates were specified in engine="c" mode (GH 33699)

  • Fixed bug where read_clipboard() and DataFrame.to_clipboard() were not working in WSL (GH 38527)

  • Allow setting custom error values for the parse_dates parameter of read_sql(), read_sql_query() and read_sql_table() (GH 35185)

  • Fixed bug where DataFrame.to_hdf() and Series.to_hdf() raised KeyError when trying to be applied to a subclass of DataFrame or Series (GH 33748)

  • Fixed bug where HDFStore.put() raised incorrect TypeError when saving a DataFrame with non-string dtype (GH 34274)

  • Fixed a bug in json_normalize() that caused the first element of the generator object not to be included in the returned DataFrame (GH 35923)

  • Fixed bug where read_csv() applied the thousands separator to date columns when the column should be parsed as dates and usecols was specified with engine="python" (GH 39365)

  • Fixed bug in read_excel() forward-filling MultiIndex names when multiple header and index columns were specified (GH 34673)

  • Fixed bug where read_excel() did not respect set_option() (GH 34252)

  • Bug in read_csv() not switching true_values and false_values for nullable boolean data types (GH 34655)

  • Bug in read_json() not maintaining a numeric string index when orient="split" (GH 28556)

  • read_sql() returned an empty generator if chunksize was non-zero and the query returned no results; it now returns a generator containing a single empty DataFrame (GH 34411)

  • Bug in read_hdf() returning unexpected records when filtering categorical string columns using the where parameter (GH 39189)

  • Bug in read_sas() raising ValueError when datetimes were null (GH 39725)

  • Bug in read_excel() dropping empty values from single-column spreadsheets (GH 39808)

  • Bug in read_excel() loading trailing empty rows/columns for some file types (GH 41167)

  • Bug in read_excel() raising AttributeError when an Excel file had a MultiIndex header followed by two empty rows and no index (GH 40442)

  • Bug in read_excel(), read_csv(), read_table(), read_fwf() and read_clipboard() where one blank row after a MultiIndex header with no index would be dropped (GH 40442)

  • Bug in DataFrame.to_string() misplacing the truncation column when index=False (GH 40904)

  • Bug in DataFrame.to_string() adding an extra dot and misaligning the truncation row when index=False (GH 40904)

  • Bug in read_orc() always raising AttributeError (GH 40918)

  • Bug in read_csv() and read_table() silently ignoring prefix if both names and prefix are defined; this now raises ValueError (GH 39123)

  • Bug in read_csv() and read_excel() not respecting the dtype of a duplicated column name when mangle_dupe_cols is set to True (GH 35211)

  • Bug in read_csv() silently ignoring sep if both delimiter and sep are defined; this now raises ValueError (GH 39823)

  • Bug in read_csv() and read_table() misinterpreting arguments when sys.setprofile had previously been called (GH 41069)

  • Bug in the conversion from PyArrow to pandas (e.g. for reading Parquet files) with nullable data types and a PyArrow array whose data buffer size is not a multiple of the dtype size (GH 40896)

  • Bug in read_excel() raising an error when pandas could not determine the file type, even though the user specified the engine argument (GH 41225)

  • Bug in read_clipboard() where copying from an Excel file shifted values into the wrong column if the first column contained null values (GH 41108)

  • Bug in DataFrame.to_hdf() and Series.to_hdf() raising TypeError when trying to append a string column to an incompatible column (GH 41897)
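
The sep/delimiter item above (GH 39823) can be sketched as follows. The CSV text is invented for illustration, and the snippet assumes a pandas version that includes this change (1.3 or later).

```python
import io

import pandas as pd

csv_text = "a;b\n1;2\n"

# Passing only one of sep/delimiter parses normally.
frame = pd.read_csv(io.StringIO(csv_text), sep=";")

# Passing both is ambiguous; per the note above this raises ValueError
# rather than silently ignoring sep.
try:
    pd.read_csv(io.StringIO(csv_text), sep=",", delimiter=";")
    raised = False
except ValueError as exc:
    raised = True
    print(f"ValueError: {exc}")

print(frame)
```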

Period

  • Comparisons of Period objects or Index, Series or DataFrame with mismatched PeriodDtype now behave like other mismatched-type comparisons, returning False for equality, True for not-equal, and raising TypeError for inequality checks (GH 39274)
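
A minimal sketch of the equality part of this change, assuming pandas 1.3 or later; the two Period values are arbitrary examples with different frequencies. The ordering comparisons (such as <) are left out here.

```python
import pandas as pd

monthly = pd.Period("2021-01", freq="M")
daily = pd.Period("2021-01-01", freq="D")

# Mismatched frequencies compare as "not equal" instead of raising.
is_equal = monthly == daily
is_not_equal = monthly != daily

print(is_equal)
print(is_not_equal)
```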

Plotting

  • Fixed bug in plotting.scatter_matrix() raising an error when a 2D ax argument is passed (GH 16253)

  • Prevent warnings from appearing when Matplotlib's constrained_layout is enabled (GH 25261)

  • Fixed a bug in DataFrame.plot() that showed wrong colors in the legend when the function was called repeatedly and some calls used yerr but others did not (GH 39522)

  • Fixed a bug in DataFrame.plot() that showed wrong colors in the legend when the function was called repeatedly and some calls used secondary_y and others used legend=False (GH 40044)

  • Fixed bug in DataFrame.plot.box() where the caps or min/max markers in the plot were not visible when the dark_background theme was selected (GH 40769)

Grouping/Resampling/Rolling

  • Fixed bug in DataFrameGroupBy.agg() and SeriesGroupBy.agg() with PeriodDtype columns incorrectly casting results too aggressively (GH 38254)

  • Fixed bug in SeriesGroupBy.value_counts() where unobserved categories in a grouped categorical Series were not tallied (GH 38672)

  • Bug in SeriesGroupBy.value_counts() where an error was raised on an empty Series (GH 39172)

  • Bug in GroupBy.indices() containing non-existent indices when null values were present in the groupby keys (GH 9304)

  • Fixed a precision loss bug in DataFrameGroupBy.sum() and SeriesGroupBy.sum() by now using Kahan summation (GH 38778)

  • Fixed precision loss bug in DataFrameGroupBy.cumsum(), SeriesGroupBy.cumsum(), DataFrameGroupBy.mean() and SeriesGroupBy.mean() by using Kahan summation (GH 38934)

  • Bug in Resampler.aggregate() and DataFrame.transform() raising TypeError instead of SpecificationError when missing keys had mixed dtypes (GH 39025)

  • Bug in DataFrameGroupBy.idxmin() and DataFrameGroupBy.idxmax() with ExtensionDtype columns (GH 38733)

  • Bug in Series.resample() raising an error when the index was a PeriodIndex consisting of NaT (GH 39227)

  • Bug in RollingGroupby.corr() and ExpandingGroupby.corr() where the groupby column would return 0 instead of np.nan when providing other that was longer than each group (GH 39591)

  • Bug in ExpandingGroupby.corr() and ExpandingGroupby.cov() where 1 would be returned instead of np.nan when providing other that was longer than each group (GH 39591)

  • Bug in DataFrameGroupBy.mean(), SeriesGroupBy.mean(), DataFrameGroupBy.median(), SeriesGroupBy.median() and DataFrame.pivot_table() not propagating metadata (GH 28283)

  • Bug in Series.rolling() and DataFrame.rolling() where the window bounds were not calculated correctly when the window was an offset and the date was in descending order (GH 40002)

  • Bug in Series.groupby() and DataFrame.groupby() on an empty Series or DataFrame losing the index, columns and/or data types when directly using the methods idxmax, idxmin, mad, min, max, sum, prod and skew, or when using them through apply, aggregate or resample (GH 26411)

  • Bug in DataFrameGroupBy.apply() and SeriesGroupBy.apply() where a MultiIndex would be created instead of an Index when used on a RollingGroupby object (GH 39732)

  • Bug in DataFrameGroupBy.sample() where an error was raised when weights was specified and the index was an Int64Index (GH 39927)

  • Bug in DataFrameGroupBy.aggregate() and Resampler.aggregate() sometimes raising SpecificationError when passed a dictionary and columns were missing; they now always raise KeyError instead (GH 40004)

  • Bug in DataFrameGroupBy.sample() where column selection was not applied before computing the result (GH 39928)

  • Bug in ExponentialMovingWindow when calling __getitem__ incorrectly raising ValueError when times was provided (GH 40164)

  • Bug in ExponentialMovingWindow when calling __getitem__ not retaining the com, span, alpha or halflife attributes (GH 40164)

  • ExponentialMovingWindow now raises NotImplementedError when specifying times with adjust=False, because the calculation was incorrect (GH 40098)

  • Bug in ExponentialMovingWindowGroupby.mean() where the times parameter was ignored when engine='numba' (GH 40951)

  • Bug in ExponentialMovingWindowGroupby.mean() where the wrong times were used when there were multiple groups (GH 40951)

  • Bug in ExponentialMovingWindowGroupby where the times vector and values became out of sync for non-trivial groups (GH 40951)

  • Bug in Series.asfreq() and DataFrame.asfreq() dropping rows when the index was not sorted (GH 39805)

  • Bug in aggregation functions for DataFrame not respecting the numeric_only parameter when the level keyword was given (GH 40660)

  • Bug in SeriesGroupBy.aggregate() where using a user-defined function to aggregate a Series with an object-typed Index caused an incorrect Index shape (GH 40014)

  • Bug in RollingGroupby where the as_index=False parameter in groupby was ignored (GH 39433)

  • Bug in DataFrameGroupBy.any(), SeriesGroupBy.any(), DataFrameGroupBy.all() and SeriesGroupBy.all() raising ValueError when used with nullable type columns, even with skipna=True (GH 40585)

  • Bug in DataFrameGroupBy.cummin(), SeriesGroupBy.cummin(), DataFrameGroupBy.cummax() and SeriesGroupBy.cummax() incorrectly rounding integer values near the int64 implementation bounds (GH 40767)

  • Bug in DataFrameGroupBy.rank() and SeriesGroupBy.rank() with nullable dtypes incorrectly raising a TypeError (GH 41010)

  • Bug in DataFrameGroupBy.cummin(), SeriesGroupBy.cummin(), DataFrameGroupBy.cummax() and SeriesGroupBy.cummax() computing wrong result with nullable data types too large to roundtrip when casting to float (GH 37493)

  • Bug in DataFrame.rolling() returning a mean of zero for an all-NaN window with min_periods=0 if the calculation is not numerically stable (GH 41053)

  • Bug in DataFrame.rolling() returning a non-zero sum for an all-NaN window with min_periods=0 if the calculation is not numerically stable (GH 41053)

  • Bug in SeriesGroupBy.agg() failing to preserve ordered CategoricalDtype in order-preserving aggregations (GH 41147)

  • Bug in DataFrameGroupBy.min(), SeriesGroupBy.min(), DataFrameGroupBy.max() and SeriesGroupBy.max() with multiple object-dtype columns and numeric_only=False incorrectly raising ValueError (GH 41111)

  • Bug in DataFrameGroupBy.rank() with the GroupBy object's axis=0 and the rank method's keyword axis=1 (GH 41320)

  • Bug in DataFrameGroupBy.__getitem__() with non-unique columns incorrectly returning a malformed SeriesGroupBy instead of DataFrameGroupBy (GH 41427)

  • Bug in DataFrameGroupBy.transform() with non-unique columns incorrectly raising AttributeError (GH 41427)

  • Bug in Resampler.apply() with non-unique columns incorrectly dropping duplicate columns (GH 41445)

  • Bug in Series.groupby() where aggregation incorrectly returned an empty Series instead of raising TypeError for aggregation operations that are invalid for its dtype, e.g. .prod with datetime64[ns] dtype (GH 41342)

  • Bug in DataFrameGroupBy aggregations incorrectly failing to drop columns with invalid dtypes for that aggregation when there are no valid columns (GH 41291)

  • Bug in DataFrame.rolling.__iter__() where on was not assigned to the index of the resulting objects (GH 40373)

  • Bug in DataFrameGroupBy.transform() and DataFrameGroupBy.agg() with engine="numba" where *args were being cached with the user-passed function (GH 41647)

  • Bug in DataFrameGroupBy methods agg, transform, sum, bfill, ffill, pad, pct_change, shift and ohlc dropping .columns.names (GH 41497)
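
Several items above (GH 38778, GH 38934) mention Kahan summation as the remedy for precision loss. The pure-Python sketch below illustrates the general technique only; it is not pandas' internal implementation, and the function names naive_sum and kahan_sum as well as the sample values are assumptions made for this example.

```python
import math


def naive_sum(values):
    """Add values left to right with no error compensation."""
    total = 0.0
    for x in values:
        total += x
    return total


def kahan_sum(values):
    """Add values while carrying each step's rounding error forward."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for x in values:
        y = x - compensation            # apply the correction to the next term
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover what was lost in this step
        total = t
    return total


# One large value followed by small ones that individually round away.
values = [1e16] + [1.0] * 8

print(naive_sum(values))  # the small terms are lost
print(kahan_sum(values))  # the small terms are retained
print(math.fsum(values))  # exactly rounded reference from the standard library
```

An explicit loop is used for the naive version because the built-in sum() applies its own compensation for floats in recent Python versions, which would hide the effect.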

Reshaping

  • Bug in merge() raising an error when performing an inner join with a partial index and right_index=True when there was no overlap between indices (GH 33814)

  • Bug in DataFrame.unstack() with missing levels leading to incorrect index names (GH 37510)

  • Bug in merge_asof() propagating the right index instead of the left index when using left_index=True and a right_on specification (GH 33463)

  • Bug in DataFrame.join() on a DataFrame with a MultiIndex returning incorrect results when one of the indexes has only one level (GH 36909)

  • merge_asof() now raises ValueError instead of a cryptic TypeError in the case of non-numeric merge columns (GH 29130)

  • Bug in DataFrame.join() where values were not assigned correctly when the DataFrame had a MultiIndex and at least one dimension was Categorical with non-alphabetically sorted categories (GH 38502)

  • Series.value_counts() and Series.mode() now return consistent keys in original order (GH 12679, GH 11227 and GH 39007)

  • Bug in DataFrame.stack() not properly handling NaN in MultiIndex columns (GH 39481)

  • Bug in DataFrame.apply() giving incorrect results when argument func is a string and axis is not supported; now raises ValueError (GH 39211)

  • Bug in DataFrame.sort_values() not reshaping the index correctly after sorting on columns when ignore_index=True (GH 39464)

  • Bug in DataFrame.append() returning incorrect dtypes when combining ExtensionDtype dtypes (GH 39454)

  • Bug in DataFrame.append() returning incorrect dtypes when used with combinations of datetime64 and timedelta64 dtypes (GH 39574)

  • Bug in DataFrame.append() when using a DataFrame with a MultiIndex and appending a Series whose Index is not a MultiIndex (GH 41707)

  • Bug in DataFrame.pivot_table() returning a MultiIndex for a single value when operating on an empty DataFrame (GH 13483)

  • Index can now be passed to the numpy.all() function (GH 40180)

  • Bug in DataFrame.stack() not preserving CategoricalDtype in a MultiIndex (GH 36991)

  • Bug in to_datetime() raising an error when the input sequence contained unhashable items (GH 39756)

  • Bug in Series.explode() preserving the index when ignore_index was True and the values were scalars (GH 40487)

  • Bug in to_datetime() raising ValueError when the Series contains None and NaT and has more than 50 elements (GH 39882)

  • Bug in Series.unstack() and DataFrame.unstack() with object dtype values containing timezone-aware datetime objects incorrectly raising TypeError (GH 41875)

  • Bug in DataFrame.melt() raising InvalidIndexError when the DataFrame has duplicate columns used as value_vars (GH 41951)
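
As a small illustration of the Series.explode() item above (GH 40487), the sketch below shows the index being reset when ignore_index=True, both for scalar-only values and for a mix of lists and scalars. The sample data is invented and the behavior assumes pandas 1.3 or later.

```python
import pandas as pd

# Scalar-only values: nothing to explode, but the index should still be reset.
scalars = pd.Series([1, 2, 3], index=["a", "b", "c"])
exploded = scalars.explode(ignore_index=True)

# A mix of list-like and scalar values.
mixed = pd.Series([[1, 2], 3], index=["x", "y"])
mixed_exploded = mixed.explode(ignore_index=True)

print(exploded.index.tolist())
print(mixed_exploded.tolist())
print(mixed_exploded.index.tolist())
```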

Sparse

  • Bug in DataFrame.sparse.to_coo() raising KeyError when the columns are a numeric Index without a 0 (GH 18414)

  • Bug in SparseArray.astype() with copy=False producing incorrect results when converting from an integer dtype to a floating point dtype (GH 34456)

  • Bug in SparseArray.max() and SparseArray.min() always returned empty results ([GH 40921](https://github.com/pandas-dev/pa
