Series詳解---pandas

本文轉載自查看原文 2021-10-08 16:19 160 Python

從csv文件中讀取數據生成DataFrame

import pandas as pd
#從csv文件中讀取數據，生成DataFrame
fandango = pd.read_csv('G:\\python\\庫應用（4個）\\3-可視化庫matpltlib\\fandango_scores.csv')
#讀取'FILM'列的[0~5）的值
fandango['FILM'][0:5]
#訪問'RottenTomatoes'列的[0~5）的值
fandango['RottenTomatoes'][0:5]

查看類型

fandango[['FILM','RottenTomatoes','RottenTomatoes_User']]
type(fandango[['FILM','RottenTomatoes','RottenTomatoes_User']])       # pandas.core.frame.DataFrame
type(fandango['RottenTomatoes'])                                      # pandas.core.series.Series
#fandango['RottenTomatoes'].index                                     # RangeIndex(start=0, stop=146, step=1)

1、Series的生成：從DataFrame中獲取Series

from pandas import Series
# 從DataFrame中獲取series：“FILM”列，得到Series
type(fandango['FILM'])       # pandas.core.series.Series
film_series = fandango['FILM']

2、構建Series，值為rt_scores,索引為film_names

# series.values屬性，獲取所有值列表
film_names = film_series.values         
type(film_names)                        # type(film_names) 返回numpy.ndarray

rt_series = fandango['RottenTomatoes']
rt_scores = rt_series.values           
type(rt_scores)                         # type(rt_scores) 返回numpy.ndarray

# 構建Series，值為rt_scores,索引為film_names
custom_series = Series(rt_scores, index=film_names)
custom_series.index                    # Index([...],dtype='object', length=146)

3、Series中元素的訪問

# 通過數字進行訪問
custom_series[[3,5,8]]
# 通過索引名進行訪問
custom_series[['Minions (2015)', 'Leviathan (2014)']]

4、series.index屬性

# series.index屬性，獲取所有索引
type(custom_series.index)               # pandas.core.indexes.base.Index
type(custom_series.index.tolist())      # list
original_index = custom_series.index.tolist()

5、sorted(iterable)是python內置函數，對list進行排序

# sorted(iterable)是python內置函數，對list進行排序
sorted_index = sorted(original_index)

6、series.reindex(index_arr_like)重置series的索引

#help(custom_series.reindex)
# series.reindex(index_arr_like)重置series的索引
sorted_by_index = custom_series.reindex(sorted_index)

7、series按索引排序Series.sort_index()、按值排序Series.sort_values()

# series按索引排序Series.sort_index()、按值排序Series.sort_values()
custom_series.sort_index()
custom_series.sort_values()

8、numpy的add/sin/max運算

#numpy的add/sin/max運算
np.add(custom_series, custom_series)   # 等同於 custom_series + custom_series
np.sin(custom_series)
np.max(custom_series)

9、Series條件判斷

custom_series > 98
greater_than_98_series = custom_series[custom_series > 98]

condition_one = custom_series > 60
condition_two = custom_series < 66
custom_series[condition_one & condition_two]

10、兩個Series的運算：每部電影，影評員與用戶的平均評分

# 每部電影，影評員與用戶的平均評分
rt_critics = Series(fandango['RottenTomatoes'].values, index=fandango['FILM'])
rt_users = Series(fandango['RottenTomatoes_User'].values, index=fandango['FILM'].values)
(rt_critics + rt_users)/2

type(fandango['RottenTomatoes'])       # pandas.core.series.Series
#fandango['RottenTomatoes'].index      # RangeIndex(start=0, stop=146, step=1)
#rt_users.index                        # Index([...],dtype='object', length=146)

11、Series.value_counts(): 統計每個值有在該列中有多少重復值。

import pandas as pd
import matplotlib.pyplot as plt
reviews = pd.read_csv('fandango_scores.csv')
cols = ['FILM', 'RT_user_norm', 'Metacritic_user_nom', 'IMDB_norm', 'Fandango_Ratingvalue']
norm_reviews = reviews[cols]
norm_reviews[:5]

# Series.value_counts(): 每個值有在該列中有多少重復值。
fandango_distribution = norm_reviews['Fandango_Ratingvalue'].value_counts()
# Series.sort_index() 按索引排序，默認升序
fandango_distribution = fandango_distribution.sort_index()

imdb_distribution = norm_reviews['IMDB_norm'].value_counts()
imdb_distribution = imdb_distribution.sort_index()

print(fandango_distribution)
print(imdb_distribution)

2.7     2
2.8     2
2.9     5
3.0     4
3.1     3
3.2     5
3.3     4
3.4     9
3.5     9
3.6     8
3.7     9
3.8     5
3.9    12
4.0     7
4.1    16
4.2    12
4.3    11
4.4     7
4.5     9
4.6     4
4.8     3
Name: Fandango_Ratingvalue, dtype: int64
2.00     1
2.10     1
2.15     1
2.20     1
2.30     2
2.45     2
2.50     1
2.55     1
2.60     2
2.70     4
2.75     5
2.80     2
2.85     1
2.90     1
2.95     3
3.00     2
3.05     4
3.10     1
3.15     9
3.20     6
3.25     4
3.30     9
3.35     7
3.40     1
3.45     7
3.50     4
3.55     7
3.60    10
3.65     5
3.70     8
3.75     6
3.80     3
3.85     4
3.90     9
3.95     2
4.00     1
4.05     1
4.10     4
4.15     1
4.20     2
4.30     1
Name: IMDB_norm, dtype: int64


13、

免責聲明！

本站轉載的文章為個人學習借鑒使用，本站對版權不負任何法律責任。如果侵犯了您的隱私權益，請聯系本站郵箱yoyou2525@163.com刪除。

猜您在找 Pandas：Series和DataFrame數據結構詳解 pandas之Series pandas的Series Pandas之Series pandas數組(pandas Series)-(1) pandas數組(pandas Series)-(2) pandas基礎(1)_Series和DataFrame Pandas -----簡述 Series和DataFrame Pandas庫的使用--Series pandas入門之Series