Python Data Analysis: The Relationship Between City Climate and the Sea, with Machine Learning [Worked Example]


A Study of the Relationship Between City Climate and the Sea

 

Import packages

In [2]:
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
import matplotlib as mpl
import matplotlib.pyplot as plt

mpl.rcParams['font.sans-serif'] = ['FangSong']  # set the default font (one that can render Chinese labels)
mpl.rcParams['axes.unicode_minus'] = False  # keep the minus sign '-' from showing as a square in saved figures
 

Load the weather data for each city

In [4]:
# ignore_index=True discards each file's row index and renumbers the combined rows from 0
ferrara1 = pd.read_csv('./ferrara_150715.csv')
ferrara2 = pd.read_csv('./ferrara_250715.csv')
ferrara3 = pd.read_csv('./ferrara_270615.csv')
ferrara = pd.concat([ferrara1, ferrara2, ferrara3], ignore_index=True)

torino1 = pd.read_csv('./torino_150715.csv')
torino2 = pd.read_csv('./torino_250715.csv')
torino3 = pd.read_csv('./torino_270615.csv')
torino = pd.concat([torino1, torino2, torino3], ignore_index=True)

mantova1 = pd.read_csv('./mantova_150715.csv')
mantova2 = pd.read_csv('./mantova_250715.csv')
mantova3 = pd.read_csv('./mantova_270615.csv')
mantova = pd.concat([mantova1, mantova2, mantova3], ignore_index=True)

milano1 = pd.read_csv('./milano_150715.csv')
milano2 = pd.read_csv('./milano_250715.csv')
milano3 = pd.read_csv('./milano_270615.csv')
milano = pd.concat([milano1, milano2, milano3], ignore_index=True)

ravenna1 = pd.read_csv('./ravenna_150715.csv')
ravenna2 = pd.read_csv('./ravenna_250715.csv')
ravenna3 = pd.read_csv('./ravenna_270615.csv')
ravenna = pd.concat([ravenna1, ravenna2, ravenna3], ignore_index=True)

asti1 = pd.read_csv('./asti_150715.csv')
asti2 = pd.read_csv('./asti_250715.csv')
asti3 = pd.read_csv('./asti_270615.csv')
asti = pd.concat([asti1, asti2, asti3], ignore_index=True)

bologna1 = pd.read_csv('./bologna_150715.csv')
bologna2 = pd.read_csv('./bologna_250715.csv')
bologna3 = pd.read_csv('./bologna_270615.csv')
bologna = pd.concat([bologna1, bologna2, bologna3], ignore_index=True)

piacenza1 = pd.read_csv('./piacenza_150715.csv')
piacenza2 = pd.read_csv('./piacenza_250715.csv')
piacenza3 = pd.read_csv('./piacenza_270615.csv')
piacenza = pd.concat([piacenza1, piacenza2, piacenza3], ignore_index=True)

cesena1 = pd.read_csv('./cesena_150715.csv')
cesena2 = pd.read_csv('./cesena_250715.csv')
cesena3 = pd.read_csv('./cesena_270615.csv')
cesena = pd.concat([cesena1, cesena2, cesena3], ignore_index=True)

faenza1 = pd.read_csv('./faenza_150715.csv')
faenza2 = pd.read_csv('./faenza_250715.csv')
faenza3 = pd.read_csv('./faenza_270615.csv')
faenza = pd.concat([faenza1, faenza2, faenza3], ignore_index=True)
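The thirty `read_csv` calls above all follow one pattern, so they could be generated in a loop. The sketch below assumes every file is named `<city>_<date>.csv` with the three date suffixes seen above. `load_city`, `DATES` and `data_dir` are names introduced here for illustration and are not part of the original notebook.

```python
from pathlib import Path

import pandas as pd

# Date suffixes taken from the file names used above.
DATES = ["150715", "250715", "270615"]


def load_city(name, data_dir=".", dates=DATES):
    """Read every CSV for one city and stack them into a single DataFrame."""
    frames = [pd.read_csv(Path(data_dir) / f"{name}_{date}.csv") for date in dates]
    # ignore_index=True renumbers the rows 0..n-1 across the stacked files.
    return pd.concat(frames, ignore_index=True)
```

For example, `cesena = load_city('cesena')` would build the same frame as the three explicit reads, provided the files sit in the working directory.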
 

Drop the unused column

In [9]:
cesena.head(5) 
Out[9]:
    temp  humidity  pressure    description          dt  wind_speed  wind_deg    city                  day  dist
0  29.15        83      1015  moderate rain  1436863101        3.62    94.001  Cesena  2015-07-14 10:38:21    14
1  29.37        74      1015  moderate rain  1436866691        3.60    20.000  Cesena  2015-07-14 11:38:11    14
2  29.51        78      1015  moderate rain  1436870392        3.60    70.000  Cesena  2015-07-14 12:39:52    14
3  29.88        70      1016  moderate rain  1436874000        4.60    60.000  Cesena  2015-07-14 13:40:00    14
4  30.12        70      1016  moderate rain  1436877549        4.10    70.000  Cesena  2015-07-14 14:39:09    14
In [7]:
city_list = [ferrara, torino, mantova, milano, ravenna, asti, bologna, piacenza, cesena, faenza]
for city in city_list:
    # 'Unnamed: 0' is the leftover index column from the CSV files; drop it in place
    city.drop(labels='Unnamed: 0', axis=1, inplace=True)
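An `Unnamed: 0` column typically appears when a DataFrame was saved together with its index and then read back with a plain `read_csv`. As an alternative to dropping it afterwards, the sketch below uses a small in-memory CSV (a few values copied from the `cesena` rows shown above; the column selection is illustrative) to show that `index_col=0` makes pandas treat that column as the index.

```python
import io

import pandas as pd

# A tiny CSV whose first column is a saved index with an empty header,
# which is what produces 'Unnamed: 0' on a plain read.
csv_text = ",temp,humidity,dist\n0,29.15,83,14\n1,29.37,74,14\n"

plain = pd.read_csv(io.StringIO(csv_text))
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(plain.columns))    # includes 'Unnamed: 0'
print(list(indexed.columns))  # only the data columns
```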
 

Show the relationship between maximum temperature and distance from the sea (looking at several cities)

In [10]:
city_max_temp = []
city_dist = []
for city in city_list:
    max_temp = city['temp'].max()
    city_max_temp.append(max_temp)
    dist = city['dist'][0]
    city_dist.append(dist)
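The same two lists can also be collected into one table, which keeps each maximum next to its distance. This is a sketch on two toy frames whose numbers are made up for illustration and are not the real data. It uses `.iloc[0]` for the first row, which does not depend on the index starting at 0.

```python
import pandas as pd

# Toy stand-ins for the per-city frames; values are invented for illustration.
toy_a = pd.DataFrame({"city": ["A"] * 3, "temp": [29.0, 31.5, 30.2], "dist": [10] * 3})
toy_b = pd.DataFrame({"city": ["B"] * 3, "temp": [30.1, 33.0, 32.4], "dist": [200] * 3})

summary = pd.DataFrame(
    {
        "city": [c["city"].iloc[0] for c in (toy_a, toy_b)],
        "dist": [c["dist"].iloc[0] for c in (toy_a, toy_b)],
        "max_temp": [c["temp"].max() for c in (toy_a, toy_b)],
    }
)
print(summary)
```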
In [11]:
city_max_temp
Out[11]:
[33.43000000000001,
 34.69,
 34.18000000000001,
 34.81,
 32.79000000000002,
 34.31,
 33.850000000000016,
 33.920000000000016,
 32.81,
 32.74000000000001]
In [12]:
city_dist
Out[12]:
[47, 357, 121, 250, 8, 315, 71, 200, 14, 37]
In [14]:
plt.scatter(city_dist, city_max_temp)
plt.xlabel('Distance')
plt.ylabel('Maximum temperature')
plt.title('Relationship between distance and temperature')
Out[14]:
Text(0.5,1,'Relationship between distance and temperature')
 
 

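Observation: the cities close to the sea fall roughly along one straight line, and the cities far from the sea fall roughly along another.

- Use 100 km and 50 km as the two cut-offs and split the data into a near-sea group and a far-from-sea group (near: distance < 100 km; far: distance > 50 km). The two ranges overlap, so a city between 50 and 100 km falls into both groups.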
In [16]:
# Find all near-sea cities (temperature and distance)
# Convert the lists to NumPy arrays, which support boolean masking and reshape
np_city_dist = np.array(city_dist)
np_city_max_temp = np.array(city_max_temp)
In [20]:
near_condition = np_city_dist < 100
near_city_dist = np_city_dist[near_condition]
near_city_max_temp = np_city_max_temp[near_condition]
In [21]:
plt.scatter(near_city_dist,near_city_max_temp) 
Out[21]:
<matplotlib.collections.PathCollection at 0x8950320>
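The far-from-sea group can be selected the same way with the 50 km cut-off. The sketch below re-enters the distances and maximum temperatures printed above (temperatures rounded to two decimals) so that it runs on its own. The `far_*` names are introduced here.

```python
import numpy as np

# Values copied from the Out[12] and Out[11] listings above (temperatures rounded).
np_city_dist = np.array([47, 357, 121, 250, 8, 315, 71, 200, 14, 37])
np_city_max_temp = np.array([33.43, 34.69, 34.18, 34.81, 32.79,
                             34.31, 33.85, 33.92, 32.81, 32.74])

# Boolean mask: True for cities more than 50 km from the sea.
far_condition = np_city_dist > 50
far_city_dist = np_city_dist[far_condition]
far_city_max_temp = np_city_max_temp[far_condition]

print(far_city_dist)  # the city at 71 km also belongs to the near group (< 100)
```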
 
 

Machine learning

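- Model object: a special object that already has an equation built into it (an equation whose parameters have not been solved yet).
- What a model object is for: using that equation to predict or to classify.
- Sample data (DataFrame or NumPy array):
    - Feature data: the independent variables
    - Target (label) data: the dependent variable
- Kinds of model objects:
    - Supervised learning: the sample data the model needs contains both features and targets
    - Unsupervised learning: the sample data the model needs contains only features
    - Semi-supervised learning: part of the sample data needs both features and targets, and the rest needs only features
- The sklearn module: packages many model objects that can be used directly.
- Example of sample data, where area, daylight ratio and floor are the features and sale price is the target (w = 10,000):

  Area  Daylight ratio  Floor  Sale price
   100             30%     18         33w
    80             80%      3        133w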
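In code, the housing example above would be split into a two-dimensional feature array and a one-dimensional target array. This is a minimal sketch with NumPy, where `X` and `y` are conventional names chosen here and the price is stored in units of 10,000 (so 33 stands for 330,000).

```python
import numpy as np

# One row per sample, one column per feature: area, daylight ratio, floor.
X = np.array([
    [100, 0.30, 18],
    [80, 0.80, 3],
])

# One target value per sample: sale price in units of 10,000.
y = np.array([33, 133])

print(X.shape)  # (2, 3): 2 samples, 3 features
print(y.shape)  # (2,): 2 targets
```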
 

Import sklearn and create a linear regression model object

In [22]:
# 1. Import the model class
from sklearn.linear_model import LinearRegression 
In [23]:
# 2. Instantiate the model object
linner = LinearRegression() 
In [ ]:
# 3. Extract the sample data (features: near_city_dist, target: near_city_max_temp, both prepared above)
In [25]:
# 4. Train the model; reshape(-1, 1) turns the array into n rows by 1 column (n samples, one feature each)
linner.fit(near_city_dist.reshape(-1,1),near_city_max_temp) 
Out[25]:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
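After `fit`, the solved equation is stored on the model as `coef_` (the slope) and `intercept_`. The sketch below refits on the five near-sea points implied by the listings above (distance < 100, temperatures rounded to two decimals) so that it is runnable by itself. `model` is a name chosen here.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Near-sea cities (distance < 100 km), copied from the outputs above.
near_city_dist = np.array([47, 8, 71, 14, 37])
near_city_max_temp = np.array([33.43, 32.79, 33.85, 32.81, 32.74])

model = LinearRegression()
# Features must be 2-D: 5 samples, 1 feature.
model.fit(near_city_dist.reshape(-1, 1), near_city_max_temp)

slope = model.coef_[0]
intercept = model.intercept_
print(f"max_temp = {slope:.4f} * dist + {intercept:.4f}")
```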
In [26]:
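# 5. Predict; the input must be 2-D (samples x features), so a single distance of 38 is passed as [[38]]
linner.predict([[38]])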
Out[26]:
array([33.16842645])
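For a single feature, the prediction above can be reproduced by hand from the ordinary least squares formulas: slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x). This NumPy sketch uses the same five near-sea points (rounded to two decimals) and does not call sklearn.

```python
import numpy as np

x = np.array([47, 8, 71, 14, 37], dtype=float)      # distance
y = np.array([33.43, 32.79, 33.85, 32.81, 32.74])   # maximum temperature

# Ordinary least squares for one feature.
dx = x - x.mean()
dy = y - y.mean()
slope = (dx * dy).sum() / (dx ** 2).sum()
intercept = y.mean() - slope * x.mean()

prediction = slope * 38 + intercept
print(round(prediction, 4))  # 33.1684, in line with the model output above
```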
In [27]:
# Score the model: for a regressor, score() returns R^2, here computed on the training data
linner.score(near_city_dist.reshape(-1,1),near_city_max_temp) 
Out[27]:
0.77988083971852
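For a regressor, `score` returns the coefficient of determination, R² = 1 - SS_res / SS_tot. The sketch below recomputes it with NumPy on the same five points, using `np.polyfit` for the line rather than sklearn. Because it is measured on the training data itself, it says little about how well the line would fit unseen cities.

```python
import numpy as np

x = np.array([47, 8, 71, 14, 37], dtype=float)      # distance
y = np.array([33.43, 32.79, 33.85, 32.81, 32.74])   # maximum temperature

slope, intercept = np.polyfit(x, y, deg=1)  # least-squares straight line
y_hat = slope * x + intercept

ss_res = ((y - y_hat) ** 2).sum()     # residual sum of squares
ss_tot = ((y - y.mean()) ** 2).sum()  # total sum of squares
r2 = 1 - ss_res / ss_tot
print(round(r2, 4))  # 0.7799
```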
In [28]:
# Draw the regression line
x = np.linspace(10, 70, num=100)  # linspace: 100 evenly spaced values from 10 to 70
y = linner.predict(x.reshape(-1, 1))
In [33]:
plt.scatter(near_city_dist, near_city_max_temp)
plt.scatter(x, y, marker=1)  # marker sets the marker style; the integer 1 is matplotlib's tick-right marker
Out[33]:
<matplotlib.collections.PathCollection at 0xaaf7940>
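The same procedure could be repeated for the far-from-sea group (distance > 50 km). The sketch below fits that line with `np.polyfit` on the values listed earlier (temperatures rounded to two decimals). It illustrates a possible next step and is not output from the original notebook. The `far_*`, `x_far` and `y_far` names are introduced here.

```python
import numpy as np

# Cities more than 50 km from the sea, copied from the listings above.
far_city_dist = np.array([357, 121, 250, 315, 71, 200], dtype=float)
far_city_max_temp = np.array([34.69, 34.18, 34.81, 34.31, 33.85, 33.92])

# Least-squares straight line through the far-from-sea points.
far_slope, far_intercept = np.polyfit(far_city_dist, far_city_max_temp, deg=1)

# Points along the fitted line, ready for plt.plot(x_far, y_far).
x_far = np.linspace(far_city_dist.min(), far_city_dist.max(), num=100)
y_far = far_slope * x_far + far_intercept
print(far_slope, far_intercept)
```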
 

