pandas: an extension library for Python, built on NumPy, for data analysis
Install
pip install pandas
Version
import pandas
pandas.__version__ # 0.25.2
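Series: a one-dimensional labeled array
A Series holds an ordered sequence of values, each paired with an index label.
From a list
- By default, an integer index starting from 0 is created, and values can be looked up by that index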
var = pandas.Series([1, 2, 3])
var[1] # 2
0 1
1 2
2 3
dtype: int64
- Specify index
var = pandas.Series([1, 2, 3], index=["x", "y", "z"])
var["x"] # 1
x 1
y 2
z 3
dtype: int64
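A minimal sketch of both index options above, assuming a recent pandas (the notes use 0.25.2; the same calls work there, though the exact error message differs between versions). `labeled` and `mismatch_error` are illustrative names, not part of pandas:

```python
import pandas

# No index argument: pandas builds a RangeIndex 0, 1, 2
var = pandas.Series([1, 2, 3])
print(var.index)  # RangeIndex(start=0, stop=3, step=1)
print(var[1])     # 2

# Explicit labels replace the default integer index
labeled = pandas.Series([1, 2, 3], index=["x", "y", "z"])
print(labeled["y"])  # 2

# The index needs exactly one label per value; a length mismatch is rejected
mismatch_error = None
try:
    pandas.Series([1, 2, 3], index=["x", "y"])
except ValueError as err:
    mismatch_error = err
    print("ValueError:", err)
```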
From a dictionary: the keys become the index labels
info = {"name": "Tom", "age": 15, "sex": "male", "email": "123@qq.com"}
var = pandas.Series(info)
name Tom
age 15
sex male
email 123@qq.com
dtype: object
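To see how the dictionary is converted, this sketch checks the resulting index and dtype; `scores` is a hypothetical second example that holds integer values only:

```python
import pandas

info = {"name": "Tom", "age": 15, "sex": "male", "email": "123@qq.com"}
var = pandas.Series(info)

# Dictionary keys become the index labels, in insertion order
print(list(var.index))  # ['name', 'age', 'sex', 'email']
# Mixed str and int values force the generic object dtype
print(var.dtype)        # object

# A dict holding only integers gets an integer dtype instead
scores = pandas.Series({"Tom": 15, "Mei": 21})
print(scores.dtype)     # int64
```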
- Select entries with index and set a name: only the listed keys are kept
pandas.Series(info, index=["name", "age"], name="student information")
name Tom
age 15
Name: student information, dtype: object
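When `index` lists a key that the dictionary does not contain, pandas fills that position with NaN instead of raising an error. A small sketch, where `phone` is a hypothetical key that is absent from `info`:

```python
import pandas

info = {"name": "Tom", "age": 15}

# "phone" is not a key of info, so its value is missing (NaN)
var = pandas.Series(info, index=["name", "phone"], name="student information")
print(var)
print(var.isna().tolist())  # [False, True]
print(var.name)             # student information
```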
Series operations
Indexing and slicing
var = pandas.Series({"name": "Tom", "age": 15, "sex": "male", "email": "123@qq.com"})
# Each operation below starts from the original data defined above
var["name"] # Tom
var.iloc[0] # Tom (by position; plain var[0] works in 0.25 but is deprecated for label indexes in newer pandas)
# Slicing returns a new Series
var["name":"email"] # label slice: both end labels are included
var[:"email"] # label slice: the end label is included
var[:3] # positional slice: start included, end excluded
var.drop(["sex"]) # returns a new Series without the label "sex"; var itself is unchanged
for index, value in var.items():
    print(f"Index: {index}, Value: {value}") # Index: name, Value: Tom ...
# Modify the original Series in place: add / delete / update
var["height"] = 178
del var["email"]
var["age"] = 18
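The slicing rules and the difference between `drop()` and `del` can be checked directly. This sketch uses `.loc` (label-based) and `.iloc` (position-based), which make the intent explicit; `label_slice`, `pos_slice` and `dropped` are illustrative names:

```python
import pandas

var = pandas.Series({"name": "Tom", "age": 15, "sex": "male", "email": "123@qq.com"})

# .loc selects by label: both slice endpoints are included
label_slice = var.loc["name":"sex"]
print(label_slice.index.tolist())  # ['name', 'age', 'sex']
# .iloc selects by position: the end position is excluded
pos_slice = var.iloc[0:2]
print(pos_slice.index.tolist())    # ['name', 'age']

# drop() returns a new Series and leaves var unchanged ...
dropped = var.drop(["sex"])
print("sex" in var, "sex" in dropped)  # True False

# ... while del and item assignment modify var itself
del var["email"]
var["height"] = 178
print(var.index.tolist())  # ['name', 'age', 'sex', 'height']
```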
Basic operations
- Arithmetic operations (applied element-wise; multiplying a string repeats it)
var = pandas.Series({"name": "Mei", "age": 15, "sex": "M"})
var * 2
# pandas.Series({"name": "MeiMei", "age": 30, "sex": "MM"})
name MeiMei
age 30
sex MM
dtype: object
- Filtering with a boolean condition
Values of incompatible types (e.g. str and int) cannot be compared: a TypeError is raised
var = pandas.Series({"name": "Tom", "sex": "M"})
var[var > 'M']
# pandas.Series({"name": "Tom"})
var = pandas.Series({"Mei": 21, "Lily": 15, "Tom": 19})
var[var > 18]
# pandas.Series({"Mei": 21, "Tom": 19})
- Math functions
import numpy
var = pandas.Series({"Mei": 21, "Lily": 15, "Tom": 19})
numpy.sum(var) # 55
numpy.max(var) # 21
numpy.sqrt(var)
# pandas.Series({"Mei": 4.582576, "Lily": 3.872983, "Tom": 4.358899})
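A few behaviours of these basic operations are easy to miss: arithmetic between two Series matches entries by index label, conditions can be combined, and NumPy functions keep the labels. A sketch with hypothetical example data (`a`, `b`, `ages`):

```python
import numpy
import pandas

# Arithmetic between two Series aligns on index labels; a label present in
# only one operand yields NaN, so the result dtype becomes float64
a = pandas.Series({"x": 1, "y": 2})
b = pandas.Series({"y": 10, "z": 20})
aligned = a + b
print(aligned.to_dict())                # x and z are NaN, y is 12.0
# add() with fill_value treats a missing label as 0 instead
print(a.add(b, fill_value=0).tolist())  # [1.0, 12.0, 20.0]

# Boolean conditions combine with & (and) / | (or); each needs parentheses
ages = pandas.Series({"Mei": 21, "Lily": 15, "Tom": 19})
print(ages[(ages > 16) & (ages < 20)].to_dict())  # {'Tom': 19}

# Comparing str with int inside one object Series raises TypeError
comparison_error = None
try:
    pandas.Series({"name": "Tom", "age": 15}) > 10
except TypeError as err:
    comparison_error = err

# NumPy ufuncs keep the index; reductions return a single scalar
roots = numpy.sqrt(ages)
print(roots.index.tolist())  # ['Mei', 'Lily', 'Tom']
print(numpy.sum(ages))       # 55
```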
Series methods
These methods compute statistics and return results without modifying the source Series
var = pandas.Series({"noodle": 21, "rice": 2, "bread": 6, "coke": 15})
var.sum() # 44
var.max() # 21
var.min() # 2
var.mean() # mean: 11.0
var.std() # sample standard deviation (ddof=1): 8.602325267042627
var.idxmax() # label of the maximum value: noodle
var.idxmin() # label of the minimum value: rice
var.head(2) # first n entries (default n=5): pandas.Series({"noodle": 21, "rice": 2})
var.tail(2) # last n entries (default n=5): pandas.Series({"bread": 6, "coke": 15})
var.astype('float64')
# pandas.Series({"noodle": 21.0, "rice": 2.0, "bread": 6.0, "coke": 15.0})
var.describe() # Descriptive statistics
count 4.000000
mean 11.000000
std 8.602325
min 2.000000
25% 5.000000
50% 10.500000
75% 16.500000
max 21.000000
dtype: float64
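The `std` and percentile figures above can be reproduced by hand. This sketch assumes pandas defaults (ddof=1 for `std`, linear interpolation for `quantile` and `describe`) and recomputes them with the standard library and NumPy:

```python
import math

import numpy
import pandas

var = pandas.Series({"noodle": 21, "rice": 2, "bread": 6, "coke": 15})

# Series.std() defaults to the sample standard deviation (ddof=1):
# squared deviations from the mean 11 are 100, 81, 25, 16 (sum 222)
print(var.std(), math.sqrt(222 / 3))        # both about 8.6023
# ddof=0 gives the population version, which numpy.std uses by default
print(var.std(ddof=0), math.sqrt(222 / 4))  # both about 7.4498
print(numpy.std(var.to_numpy()))            # about 7.4498

# Percentiles interpolate linearly between the sorted values [2, 6, 15, 21]:
# 25% sits at position 0.25 * (4 - 1) = 0.75, so 2 + 0.75 * (6 - 2) = 5.0
print(var.quantile(0.25))  # 5.0
```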
Series attributes
var = pandas.Series({"noodle": 21, "rice": 2, "bread": 6, "coke": 15})
var.size # number of elements: 4
var.shape # shape: (4,)
var.dtype # data type: int64
var.values # the values as a numpy.ndarray: array([21,  2,  6, 15])
var.index # the index labels: Index(['noodle', 'rice', 'bread', 'coke'], dtype='object')
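A short check of these attributes. `to_numpy()` (available since pandas 0.24) returns the same values as `.values` and is what the pandas documentation recommends for new code; `arr` is an illustrative name:

```python
import pandas

var = pandas.Series({"noodle": 21, "rice": 2, "bread": 6, "coke": 15})

# For a one-dimensional Series, size equals len() and shape is a 1-tuple
print(var.size, len(var), var.shape)  # 4 4 (4,)

# to_numpy() gives the underlying values as a NumPy array
arr = var.to_numpy()
print(type(arr).__name__, arr.tolist())  # ndarray [21, 2, 6, 15]

# The index labels can be listed as a plain Python list
print(var.index.tolist())  # ['noodle', 'rice', 'bread', 'coke']
```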