Python Data Mining - Regression - Neural Networks


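Concepts:

Neural network: short for artificial neural network (ANN), a mathematical or computational model that imitates the structure and function of biological neural networks (the central nervous system of animals, especially the brain).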

Biological neural network: nerve cells are the basic units of the nervous system; they are called biological neurons, or simply neurons.
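The artificial counterpart of a neuron computes a weighted sum of its inputs plus a bias and passes the result through an activation function, "firing" only when the sum is large enough. A minimal sketch, assuming made-up inputs, weights and bias; the function name `neuron` is hypothetical:

```python
import numpy as np

def neuron(x, w, b):
    """One artificial neuron: weighted sum of inputs plus bias, then ReLU."""
    z = np.dot(w, x) + b        # weighted sum of the inputs
    return max(0.0, z)          # ReLU activation: output 0 unless z > 0

x = np.array([1.0, 2.0, 3.0])    # example inputs (made up)
w = np.array([0.5, -0.25, 0.1])  # example weights (made up)
b = 0.2                          # example bias (made up)
print(neuron(x, w, b))
```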

In practice, a network generally has three to five layers.
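Counting layers: one input layer, one hidden layer and one output layer make three layers, and each extra hidden layer adds one. The sketch below pushes a batch through such a three-layer network in NumPy. The layer sizes and the random, untrained weights are assumptions purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Layer sizes: 4 inputs -> 5 hidden neurons -> 1 output (3 layers in total)
sizes = [4, 5, 1]
weights = [rng.normal(size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def forward(X):
    """Propagate a batch X (n_samples x 4) through the network."""
    a = X
    for W, b in zip(weights[:-1], biases[:-1]):
        a = relu(a @ W + b)                       # hidden layers use ReLU
    return sigmoid(a @ weights[-1] + biases[-1])  # output squashed into (0, 1)

X = rng.normal(size=(3, 4))  # three made-up samples
print(forward(X).shape)
```

Adding a second hidden layer is just one more entry in `sizes`, e.g. `[4, 5, 3, 1]` for a four-layer network.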

 

First, load the data and build the independent variables (inputs) and the dependent variable (output):

import pandas
from pandas import read_csv

# Load the survey data and drop rows with missing values
data = read_csv(
    "C:\\Users\\Jw\\Desktop\\python_work\\Python數據挖掘實戰課程課件\\4.5\\data.csv",
    encoding='utf8'
)
data = data.dropna()

# Nominal (unordered) variables: convert to categorical, then to dummy columns
dummyColumns = [
    'Gender', 'Home Ownership', 'Internet Connection', 'Marital Status',
    'Movie Selector', 'Prerec Format', 'TV Signal']

for column in dummyColumns:
    data[column] = data[column].astype('category')

dummiesData = pandas.get_dummies(
    data,
    columns=dummyColumns,
    prefix=dummyColumns,
    prefix_sep=" ",
    drop_first=True    # drop the first level of each variable (it is redundant)
)

# Education Level is ordinal: map it to integers, from Grade School (1)
# up to Post-Doc (9)
educationLevelDict = {
    'Post-Doc': 9,
    'Doctorate': 8,
    'Master\'s Degree': 7,
    'Bachelor\'s Degree': 6,
    'Associate\'s Degree': 5,
    'Some College': 4,
    'Trade School': 3,
    'High School': 2,
    'Grade School': 1
}

dummiesData['Education Level Map'] = dummiesData['Education Level'].map(educationLevelDict)

# The frequency answers are ordinal too: Never < Rarely < Monthly < Weekly < Daily
freqMap = {
    'Never': 0,
    'Rarely': 1,
    'Monthly': 2,
    'Weekly': 3,
    'Daily': 4
}
dummiesData['PPV Freq Map'] = dummiesData['PPV Freq'].map(freqMap)
dummiesData['Theater Freq Map'] = dummiesData['Theater Freq'].map(freqMap)
dummiesData['TV Movie Freq Map'] = dummiesData['TV Movie Freq'].map(freqMap)
dummiesData['Prerec Buying Freq Map'] = dummiesData['Prerec Buying Freq'].map(freqMap)
dummiesData['Prerec Renting Freq Map'] = dummiesData['Prerec Renting Freq'].map(freqMap)
dummiesData['Prerec Viewing Freq Map'] = dummiesData['Prerec Viewing Freq'].map(freqMap)

# Input columns: numeric, ordinal-mapped and dummy variables
dummiesSelect = [
    'Age', 'Num Bathrooms', 'Num Bedrooms', 'Num Cars', 'Num Children', 'Num TVs',
    'Education Level Map', 'PPV Freq Map', 'Theater Freq Map', 'TV Movie Freq Map',
    'Prerec Buying Freq Map', 'Prerec Renting Freq Map', 'Prerec Viewing Freq Map',
    'Gender Male',
    'Internet Connection DSL', 'Internet Connection Dial-Up',
    'Internet Connection IDSN', 'Internet Connection No Internet Connection',
    'Internet Connection Other',
    'Marital Status Married', 'Marital Status Never Married',
    'Marital Status Other', 'Marital Status Separated',
    'Movie Selector Me', 'Movie Selector Other', 'Movie Selector Spouse/Partner',
    'Prerec Format DVD', 'Prerec Format Laserdisk', 'Prerec Format Other',
    'Prerec Format VHS', 'Prerec Format Video CD',
    'TV Signal Analog antennae', 'TV Signal Cable',
    'TV Signal Digital Satellite', 'TV Signal Don\'t watch TV'
]

inputData = dummiesData[dummiesSelect]

# Target: 1 if the respondent rents, 0 otherwise. Selecting a single column
# (not a one-column DataFrame) gives the 1-D target scikit-learn expects.
outputData = dummiesData['Home Ownership Rent']
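The two encodings used above, dummy columns for nominal variables and an ordered integer map for ordinal ones, can be seen on a tiny frame. The rows below are invented for illustration; only the column names and the frequency map mirror the article:

```python
import pandas

toy = pandas.DataFrame({
    'Gender': ['Male', 'Female', 'Male'],
    'PPV Freq': ['Never', 'Weekly', 'Daily'],
})

freqMap = {'Never': 0, 'Rarely': 1, 'Monthly': 2, 'Weekly': 3, 'Daily': 4}

# Ordinal variable: keep its ordering by mapping to integers
toy['PPV Freq Map'] = toy['PPV Freq'].map(freqMap)

# Nominal variable: categories are sorted, so 'Female' comes first and
# drop_first removes it, leaving a single 0/1 column named 'Gender Male'
toy['Gender'] = toy['Gender'].astype('category')
encoded = pandas.get_dummies(toy, columns=['Gender'], prefix=['Gender'],
                             prefix_sep=" ", drop_first=True)
print(encoded)
```

Dropping the first level loses no information: a respondent with 'Gender Male' equal to 0 is the dropped category.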

 

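Import the MLPClassifier class from sklearn.neural_network, then fit and score the model several times with different hidden-layer sizes.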

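activation='relu' sets the activation function ('relu' is also the default). It plays the same role as an S-shaped (sigmoid) function: it makes each neuron's output nonlinear. hidden_layer_sizes is not the number of hidden layers: it is a tuple whose i-th element is the number of neurons in the i-th hidden layer, so passing a single integer l builds one hidden layer with l neurons, and (10, 5) would build two hidden layers with 10 and 5 neurons.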
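One way to check what hidden_layer_sizes builds is to inspect a fitted model's `n_layers_` attribute and the shapes of its weight matrices in `coefs_`. A sketch on random synthetic data (the data, sizes and short `max_iter` are assumptions chosen only to keep it quick):

```python
import warnings
import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))       # 50 made-up samples, 6 features
y = (X[:, 0] > 0).astype(int)      # made-up binary target

results = {}
for sizes in [3, (3,), (10, 5)]:
    model = MLPClassifier(hidden_layer_sizes=sizes, max_iter=50, random_state=0)
    with warnings.catch_warnings():
        # max_iter is deliberately tiny; we only care about the architecture
        warnings.simplefilter("ignore", ConvergenceWarning)
        model.fit(X, y)
    shapes = [w.shape for w in model.coefs_]
    results[sizes] = (model.n_layers_, shapes)
    print(sizes, "-> n_layers_ =", model.n_layers_, "weight shapes:", shapes)
```

An integer and a one-element tuple give the same network, and `n_layers_` counts the input and output layers as well, so (10, 5) yields four layers.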

 

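activation (the activation function) accepts, among others:

  √ relu      rectified linear unit, f(x) = max(0, x). It usually works better than logistic and tanh, arguably because it is closer to how biological neurons behave (either inactive, or, once active, responding gradually rather than saturating)

  √ logistic  logistic (sigmoid) function, f(x) = 1 / (1 + exp(-x))

  √ tanh      hyperbolic tangent, f(x) = tanh(x)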
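The three activation options can be written directly in NumPy to compare their ranges. This is a minimal sketch of the formulas listed above, not scikit-learn's internal code:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)        # 0 for z < 0, z itself for z >= 0

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))  # squashes any input into (0, 1)

def tanh(z):
    return np.tanh(z)                # squashes any input into (-1, 1)

z = np.array([-2.0, 0.0, 2.0])
for name, f in [('relu', relu), ('logistic', logistic), ('tanh', tanh)]:
    print(name, np.round(f(z), 3))
```

logistic and tanh flatten out for large |z|, which slows gradient-based training; relu keeps a constant slope for positive inputs.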

from sklearn.neural_network import MLPClassifier

# Fit networks with one hidden layer of 1 to 10 neurons and score each one
for l in range(1, 11):
    ANNModel = MLPClassifier(
        activation='relu',     # activation function (the default)
        hidden_layer_sizes=l   # one hidden layer with l neurons
    )

    ANNModel.fit(inputData, outputData)

    # Mean accuracy on the training data itself
    score = ANNModel.score(inputData, outputData)
    print(f"{l}, {score}")
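The loop above scores each model on the same rows it was trained on, which tends to overstate accuracy. A fairer comparison scores on held-out data, for example with 5-fold cross-validation. In this sketch, synthetic data from `make_classification` stands in for inputData and outputData (an assumption; substitute the real frames):

```python
import warnings
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

# Stand-in data; replace with inputData and outputData
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

cvScores = {}
for l in range(1, 11):
    model = MLPClassifier(activation='relu', hidden_layer_sizes=l,
                          max_iter=500, random_state=0)
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", ConvergenceWarning)
        scores = cross_val_score(model, X, y, cv=5)  # accuracy on 5 held-out folds
    cvScores[l] = scores.mean()
    print(l, round(cvScores[l], 3))
```

The scikit-learn user guide also recommends standardizing the inputs of a multilayer perceptron, e.g. by passing `make_pipeline(StandardScaler(), MLPClassifier(...))` to `cross_val_score` instead of the bare model.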

Predicting new data

# Load the new data to be predicted
newData = read_csv(
    "C:\\Users\\Jw\\Desktop\\python_work\\Python數據挖掘實戰課程課件\\4.4\\newData.csv",
    encoding='utf-8'
)

# Reuse the training categories so get_dummies produces the same columns;
# values never seen in training become NaN and are dropped below
for column in dummyColumns:
    newData[column] = newData[column].astype(
        pandas.CategoricalDtype(categories=data[column].cat.categories)
    )

newData = newData.dropna()

newData['Education Level Map'] = newData['Education Level'].map(educationLevelDict)
newData['PPV Freq Map'] = newData['PPV Freq'].map(freqMap)
newData['Theater Freq Map'] = newData['Theater Freq'].map(freqMap)
newData['TV Movie Freq Map'] = newData['TV Movie Freq'].map(freqMap)
newData['Prerec Buying Freq Map'] = newData['Prerec Buying Freq'].map(freqMap)
newData['Prerec Renting Freq Map'] = newData['Prerec Renting Freq'].map(freqMap)
newData['Prerec Viewing Freq Map'] = newData['Prerec Viewing Freq'].map(freqMap)

dummiesNewData = pandas.get_dummies(
    newData,
    columns=dummyColumns,
    prefix=dummyColumns,
    prefix_sep=" ",
    drop_first=True
)

inputNewData = dummiesNewData[dummiesSelect]

# Predict the new rows with the last model fitted in the loop (10 hidden neurons)
predictions = ANNModel.predict(inputNewData)
print(predictions)
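Imposing the training categories on newData is what keeps the dummy columns aligned with dummiesSelect. With drop_first=True, get_dummies drops whichever category comes first, so if the new file happened to contain only some categories, a different column would be dropped and others would be missing. A tiny made-up example:

```python
import pandas

# Made-up training column with three categories
train = pandas.Series(['DSL', 'Cable', 'Dial-Up'], name='Internet Connection').astype('category')
new = pandas.DataFrame({'Internet Connection': ['DSL', 'DSL']})

# Without shared categories only 'DSL' is seen, and drop_first removes it,
# so no 'Internet Connection ...' column is produced at all
naive = pandas.get_dummies(new, columns=['Internet Connection'],
                           prefix_sep=" ", drop_first=True)

# With the training categories imposed, the training-time dummy columns appear
new['Internet Connection'] = new['Internet Connection'].astype(
    pandas.CategoricalDtype(categories=train.cat.categories))
aligned = pandas.get_dummies(new, columns=['Internet Connection'],
                             prefix_sep=" ", drop_first=True)

print(list(naive.columns))
print(list(aligned.columns))
```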

 

