AI - H2O - A First Example


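1 - The Iris data set

The Iris data set is a widely used benchmark for machine-learning classification experiments. It is very small, so models train quickly.
It contains 150 samples in 3 classes of 50 samples each, and every sample has 4 attributes.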

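  • Sepal.Length: sepal length, in cm
  • Sepal.Width: sepal width, in cm
  • Petal.Length: petal length, in cm
  • Petal.Width: petal width, in cm

These 4 attributes can be used to predict which of the following three species an iris flower belongs to: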

  • Iris Setosa
  • Iris Versicolor
  • Iris Virginica
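
As a minimal sketch of this layout, the snippet below parses a few rows into feature values (x) and a class label (y) using only the Python standard library. The three sample rows are the first record of each class in the classic Iris data. The column names are illustrative assumptions and may differ from the header of the CSV file you actually load, and `load_rows` is a hypothetical helper.

```python
import csv
import io

# Illustrative header plus three rows (one per class). The column names
# are assumptions, not necessarily those used in a given CSV file.
SAMPLE_CSV = """sepal_length,sepal_width,petal_length,petal_width,class
5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""


def load_rows(text, label_column="class"):
    """Split each CSV row into a feature dict (x) and a label (y)."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        label = record.pop(label_column)
        features = {name: float(value) for name, value in record.items()}
        rows.append((features, label))
    return rows


rows = load_rows(SAMPLE_CSV)
for features, label in rows:
    print(label, features)
```

Each row ends up as four numeric features plus one label, which is the same x/y separation used in the H2O code later in this article.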

2 - Deep learning on the Iris data set in Python

2.1 - Code

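# coding=utf-8
import h2o

h2o.init()  # By default the H2O instance may use all CPU cores and typically takes about 25% of system memory

# Prepare the data
datasets = "https://raw.githubusercontent.com/DarrenCook/h2o/bk/datasets/"
data = h2o.import_file(datasets + "iris_wheader.csv")  # Load the input data
y = "class"  # y is the name of the column to predict; not needed for unsupervised learning
x = data.names  # Names of the columns to learn from; here, all the other columns
x.remove(y)
train, test = data.split_frame([0.8])  # Split into training and test data: roughly 80% for training, the rest for testing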

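# Train the model
m = h2o.estimators.deeplearning.H2ODeepLearningEstimator()  # Create a deep learning estimator with default settings
m.train(x, y, train)  # Start training on the training frame
print("# MSE:", m.mse())  # Show the MSE (mean squared error)
print("# Confusion Matrix: \n", m.confusion_matrix(train))  # Show the confusion matrix (how many rows of each class were right, and which class was chosen for the wrong ones)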

# Use the model to make predictions
p = m.predict(test)
print("# Predict: \n", p)  # Only the first 10 rows are shown by default
print("# as_data_frame : \n", p.as_data_frame())  # Show all rows
print("# mean: ", (p["predict"] == test["class"]).mean())  # Show the fraction of correct predictions
print("# cbind: \n", p["predict"].cbind(test["class"]).as_data_frame())  # Show predicted and actual class side by side

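# Some naming conventions
# - y: the column to be predicted (not needed for unsupervised learning)
# - x: the columns, some or all of the others, that the model learns from
# - data: the complete data set
# - train: the subset (frame) used for training
# - valid: the subset used for validation
# - test: the subset used for testing
# Clearer, more meaningful names are recommended in real projects.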
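
One detail worth noting: the run shown in section 2.2 ends up with 117 training rows and 33 test rows rather than exactly 120 and 30. That is consistent with a split that assigns each row at random with probability 0.8, instead of cutting at a fixed row count. The sketch below imitates that idea in plain Python. It illustrates the concept under that assumption and is not H2O's actual implementation; `random_split` is a hypothetical helper.

```python
import random


def random_split(rows, ratio=0.8, seed=None):
    """Assign each row to train with probability `ratio`, else to test."""
    rng = random.Random(seed)
    train, test = [], []
    for row in rows:
        (train if rng.random() < ratio else test).append(row)
    return train, test


rows = list(range(150))  # stand-in for the 150 Iris rows
train, test = random_split(rows, ratio=0.8, seed=42)
print(len(train), len(test))  # close to 120 / 30, but usually not exactly that
```

Fixing a seed makes the split repeatable, which helps when comparing models across runs.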

2.2 - Output

D:\Temp\Anaconda3\envs\h2o\python.exe D:/Anliven/Anliven-Code/PycharmProjects/TempTest/TempTest_1.py
Checking whether there is an H2O instance running at http://localhost:54321 ..... not found.
Attempting to start a local H2O server...
; Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
  Starting server from D:\Temp\Anaconda3\envs\h2o\lib\site-packages\h2o\backend\bin\h2o.jar
  Ice root: C:\Users\anliven\AppData\Local\Temp\tmptafn6xd_
  JVM stdout: C:\Users\anliven\AppData\Local\Temp\tmptafn6xd_\h2o_anliven_started_from_python.out
  JVM stderr: C:\Users\anliven\AppData\Local\Temp\tmptafn6xd_\h2o_anliven_started_from_python.err
  Server is running at http://127.0.0.1:54321
Connecting to H2O server at http://127.0.0.1:54321 ... successful.
--------------------------  ------------------------------------------
H2O cluster uptime:         02 secs
H2O cluster timezone:       +08:00
H2O data parsing timezone:  UTC
H2O cluster version:        3.24.0.5
H2O cluster version age:    6 days
H2O cluster name:           H2O_from_python_anliven_be1ik6
H2O cluster total nodes:    1
H2O cluster free memory:    10.64 Gb
H2O cluster total cores:    8
H2O cluster allowed cores:  8
H2O cluster status:         accepting new members, healthy
H2O connection url:         http://127.0.0.1:54321
H2O connection proxy:
H2O internal security:      False
H2O API Extensions:         Amazon S3, Algos, AutoML, Core V3, Core V4
Python version:             3.6.2 final
--------------------------  ------------------------------------------
Parse progress: |█████████████████████████████████████████████████████████| 100%
deeplearning Model Build progress: |██████████████████████████████████████| 100%
# MSE: 0.039118900961189924
# Confusion Matrix: 
 Confusion Matrix: Row labels: Actual class; Column labels: Predicted class

Iris-setosa    Iris-versicolor    Iris-virginica    Error     Rate
-------------  -----------------  ----------------  --------  -------
40             0                  0                 0         0 / 40
0              34                 5                 0.128205  5 / 39
0              0                  38                0         0 / 38
40             34                 43                0.042735  5 / 117

deeplearning prediction progress: |███████████████████████████████████████| 100%
# Predict: 
 predict        Iris-setosa    Iris-versicolor    Iris-virginica
-----------  -------------  -----------------  ----------------
Iris-setosa       0.999995        5.26512e-06       1.22522e-23
Iris-setosa       0.999998        2.10502e-06       2.36894e-24
Iris-setosa       0.999996        4.30403e-06       1.68815e-23
Iris-setosa       0.99995         5.0415e-05        4.90541e-23
Iris-setosa       0.999999        1.23285e-06       4.16845e-24
Iris-setosa       0.999997        3.05992e-06       4.10819e-23
Iris-setosa       0.999946        5.44824e-05       5.15226e-22
Iris-setosa       0.999999        8.97722e-07       2.31546e-23
Iris-setosa       0.99999         9.56155e-06       1.59912e-23
Iris-setosa       1               3.44765e-07       4.95222e-24

[33 rows x 4 columns]

# as_data_frame : 
             predict   Iris-setosa  Iris-versicolor  Iris-virginica
0       Iris-setosa  9.999947e-01     5.265116e-06    1.225220e-23
1       Iris-setosa  9.999979e-01     2.105018e-06    2.368935e-24
2       Iris-setosa  9.999957e-01     4.304033e-06    1.688151e-23
3       Iris-setosa  9.999496e-01     5.041504e-05    4.905406e-23
4       Iris-setosa  9.999988e-01     1.232852e-06    4.168452e-24
5       Iris-setosa  9.999969e-01     3.059924e-06    4.108188e-23
6       Iris-setosa  9.999455e-01     5.448235e-05    5.152261e-22
7       Iris-setosa  9.999991e-01     8.977222e-07    2.315463e-23
8       Iris-setosa  9.999904e-01     9.561553e-06    1.599121e-23
9       Iris-setosa  9.999997e-01     3.447651e-07    4.952222e-24
10  Iris-versicolor  1.285173e-07     9.774696e-01    2.253031e-02
11  Iris-versicolor  8.456613e-05     9.979772e-01    1.938266e-03
12  Iris-versicolor  4.829308e-02     9.517061e-01    8.497348e-07
13  Iris-versicolor  4.169988e-07     9.999681e-01    3.150848e-05
14  Iris-versicolor  1.805217e-06     9.998308e-01    1.673994e-04
15  Iris-versicolor  8.759536e-05     9.999115e-01    8.606799e-07
16  Iris-versicolor  2.206746e-05     9.999167e-01    6.120105e-05
17  Iris-versicolor  3.302204e-06     9.998997e-01    9.695184e-05
18  Iris-versicolor  3.622209e-08     9.389008e-01    6.109913e-02
19  Iris-versicolor  9.407188e-03     9.905912e-01    1.631313e-06
20  Iris-versicolor  1.332645e-03     9.986596e-01    7.739634e-06
21   Iris-virginica  5.299107e-16     7.827116e-07    9.999992e-01
22   Iris-virginica  9.149237e-16     4.476949e-09    1.000000e+00
23   Iris-virginica  4.123180e-13     1.779434e-07    9.999998e-01
24   Iris-virginica  7.280032e-08     6.898109e-03    9.931018e-01
25   Iris-virginica  5.853220e-17     9.229382e-07    9.999991e-01
26   Iris-virginica  1.171212e-12     2.643036e-04    9.997357e-01
27   Iris-virginica  2.345086e-16     2.944686e-09    1.000000e+00
28   Iris-virginica  8.742579e-08     2.479772e-01    7.520227e-01
29   Iris-virginica  1.258946e-09     1.586186e-02    9.841381e-01
30   Iris-virginica  2.918212e-07     1.127815e-02    9.887216e-01
31   Iris-virginica  1.635366e-13     3.913354e-06    9.999961e-01
32   Iris-virginica  1.160129e-11     2.099658e-07    9.999998e-01
# mean:  [1.0]
# cbind: 
             predict            class
0       Iris-setosa      Iris-setosa
1       Iris-setosa      Iris-setosa
2       Iris-setosa      Iris-setosa
3       Iris-setosa      Iris-setosa
4       Iris-setosa      Iris-setosa
5       Iris-setosa      Iris-setosa
6       Iris-setosa      Iris-setosa
7       Iris-setosa      Iris-setosa
8       Iris-setosa      Iris-setosa
9       Iris-setosa      Iris-setosa
10  Iris-versicolor  Iris-versicolor
11  Iris-versicolor  Iris-versicolor
12  Iris-versicolor  Iris-versicolor
13  Iris-versicolor  Iris-versicolor
14  Iris-versicolor  Iris-versicolor
15  Iris-versicolor  Iris-versicolor
16  Iris-versicolor  Iris-versicolor
17  Iris-versicolor  Iris-versicolor
18  Iris-versicolor  Iris-versicolor
19  Iris-versicolor  Iris-versicolor
20  Iris-versicolor  Iris-versicolor
21   Iris-virginica   Iris-virginica
22   Iris-virginica   Iris-virginica
23   Iris-virginica   Iris-virginica
24   Iris-virginica   Iris-virginica
25   Iris-virginica   Iris-virginica
26   Iris-virginica   Iris-virginica
27   Iris-virginica   Iris-virginica
28   Iris-virginica   Iris-virginica
29   Iris-virginica   Iris-virginica
30   Iris-virginica   Iris-virginica
31   Iris-virginica   Iris-virginica
32   Iris-virginica   Iris-virginica
H2O session _sid_aa65 closed.

Process finished with exit code 0
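
The Error and Rate columns of the confusion matrix in the output above can be reproduced by hand. The following sketch recomputes them from the counts printed in this run; the helper name `error_rates` is hypothetical.

```python
# Counts copied from the confusion matrix printed above
# (rows = actual class, columns = predicted class).
labels = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
matrix = [
    [40, 0, 0],
    [0, 34, 5],
    [0, 0, 38],
]


def error_rates(matrix):
    """Return per-class (errors, total) pairs plus the overall pair."""
    per_class = []
    for i, row in enumerate(matrix):
        total = sum(row)
        per_class.append((total - row[i], total))
    overall = (sum(e for e, _ in per_class), sum(t for _, t in per_class))
    return per_class, overall


per_class, overall = error_rates(matrix)
for label, (errors, total) in zip(labels, per_class):
    print(f"{label}: {errors} / {total} = {errors / total:.6f}")
print(f"overall: {overall[0]} / {overall[1]} = {overall[0] / overall[1]:.6f}")
```

All five misclassified rows are versicolor flowers predicted as virginica. Note that this matrix was computed on the training frame, so it says nothing about performance on unseen data.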

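3 - Deep learning on the Iris data set in Flow

Flow: http://docs.h2o.ai/h2o/latest-stable/h2o-docs/flow.html#
Flow is the name of the web interface that ships as part of H2O (no extra installation steps are needed). It can be used to: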

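  • View data uploaded through a client
  • Upload data directly
  • View models created through a client (including models still being built)
  • Create models directly
  • View predictions generated through a client
  • Make predictions directly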

3.1 - Startup

Run the jar file directly to start H2O Flow:

[Anliven@localhost Downloads]$ pwd
/home/Anliven/Downloads
[Anliven@localhost Downloads]$ ls -l
total 402984
drwxr-xr-x 5 Anliven Anliven        60 Jun 19 08:19 h2o-3.24.0.5
-rw-rw-r-- 1 Anliven Anliven 368257676 Jun 19 21:57 h2o-3.24.0.5.zip
drwxr-xr-x 5 Anliven Anliven        84 Dec 22  2017 h2o-bk
-rw-rw-rw- 1 Anliven Anliven  44392957 Jun 23 22:25 基於H2O的機器學習實用方法.zip
[Anliven@localhost Downloads]$ 
[Anliven@localhost Downloads]$ cd h2o-3.24.0.5/
[Anliven@localhost h2o-3.24.0.5]$ java -jar h2o.jar -ip 192.168.16.101 -port 54321
06-27 22:32:49.845 192.168.16.101:54321  3486   main      INFO: ----- H2O started  -----
06-27 22:32:49.864 192.168.16.101:54321  3486   main      INFO: Build git branch: rel-yates
06-27 22:32:49.864 192.168.16.101:54321  3486   main      INFO: Build git hash: b9cd4d5bcd44a4949ca8c677c5e54c10ee72c968
06-27 22:32:49.864 192.168.16.101:54321  3486   main      INFO: Build git describe: jenkins-3.24.0.4-66-gb9cd4d5
06-27 22:32:49.864 192.168.16.101:54321  3486   main      INFO: Build project version: 3.24.0.5
06-27 22:32:49.864 192.168.16.101:54321  3486   main      INFO: Build age: 8 days
06-27 22:32:49.865 192.168.16.101:54321  3486   main      INFO: Built by: 'jenkins'
06-27 22:32:49.865 192.168.16.101:54321  3486   main      INFO: Built on: '2019-06-18 23:52:14'
06-27 22:32:49.865 192.168.16.101:54321  3486   main      INFO: Found H2O Core extensions: [Watchdog, XGBoost, KrbStandalone]
06-27 22:32:49.865 192.168.16.101:54321  3486   main      INFO: Processed H2O arguments: [-ip, 192.168.16.101, -port, 54321]
06-27 22:32:49.865 192.168.16.101:54321  3486   main      INFO: Java availableProcessors: 2
06-27 22:32:49.865 192.168.16.101:54321  3486   main      INFO: Java heap totalMemory: 240.0 MB
06-27 22:32:49.865 192.168.16.101:54321  3486   main      INFO: Java heap maxMemory: 3.45 GB
06-27 22:32:49.866 192.168.16.101:54321  3486   main      INFO: Java version: Java 1.8.0_161 (from Oracle Corporation)
06-27 22:32:49.866 192.168.16.101:54321  3486   main      INFO: JVM launch parameters: []
06-27 22:32:49.866 192.168.16.101:54321  3486   main      INFO: OS version: Linux 3.10.0-957.el7.x86_64 (amd64)
06-27 22:32:49.866 192.168.16.101:54321  3486   main      INFO: Machine physical memory: 15.51 GB
06-27 22:32:49.866 192.168.16.101:54321  3486   main      INFO: Machine locale: en_US
06-27 22:32:49.866 192.168.16.101:54321  3486   main      INFO: X-h2o-cluster-id: 1561645969069
06-27 22:32:49.866 192.168.16.101:54321  3486   main      INFO: User name: 'Anliven'
06-27 22:32:49.866 192.168.16.101:54321  3486   main      INFO: IPv6 stack selected: false
06-27 22:32:49.867 192.168.16.101:54321  3486   main      INFO: Network interface is down: name:virbr0 (virbr0)
06-27 22:32:49.867 192.168.16.101:54321  3486   main      INFO: Possible IP Address: enp0s8 (enp0s8), fe80:0:0:0:cfdd:6281:f738:fba%enp0s8
06-27 22:32:49.867 192.168.16.101:54321  3486   main      INFO: Possible IP Address: enp0s8 (enp0s8), 192.168.16.101
06-27 22:32:49.867 192.168.16.101:54321  3486   main      INFO: Possible IP Address: enp0s3 (enp0s3), fe80:0:0:0:c48f:c289:276:2308%enp0s3
06-27 22:32:49.867 192.168.16.101:54321  3486   main      INFO: Possible IP Address: enp0s3 (enp0s3), 10.0.2.15
06-27 22:32:49.867 192.168.16.101:54321  3486   main      INFO: Possible IP Address: lo (lo), 0:0:0:0:0:0:0:1%lo
06-27 22:32:49.868 192.168.16.101:54321  3486   main      INFO: Possible IP Address: lo (lo), 127.0.0.1
06-27 22:32:49.868 192.168.16.101:54321  3486   main      INFO: H2O node running in unencrypted mode.
06-27 22:32:49.869 192.168.16.101:54321  3486   main      INFO: Internal communication uses port: 54322
06-27 22:32:49.869 192.168.16.101:54321  3486   main      INFO: Listening for HTTP and REST traffic on http://192.168.16.101:54321/
06-27 22:32:49.870 192.168.16.101:54321  3486   main      INFO: H2O cloud name: 'Anliven' on /192.168.16.101:54321, static configuration based on -flatfile null
06-27 22:32:49.870 192.168.16.101:54321  3486   main      INFO: If you have trouble connecting, try SSH tunneling from your local machine (e.g., via port 55555):
06-27 22:32:49.870 192.168.16.101:54321  3486   main      INFO:   1. Open a terminal and run 'ssh -L 55555:localhost:54321 Anliven@192.168.16.101'
06-27 22:32:49.870 192.168.16.101:54321  3486   main      INFO:   2. Point your browser to http://localhost:55555
06-27 22:32:50.627 192.168.16.101:54321  3486   main      INFO: Log dir: '/tmp/h2o-Anliven/h2ologs'
06-27 22:32:50.627 192.168.16.101:54321  3486   main      INFO: Cur dir: '/home/Anliven/Downloads/h2o-3.24.0.5'
06-27 22:32:50.641 192.168.16.101:54321  3486   main      INFO: Subsystem for distributed import from HTTP/HTTPS successfully initialized
06-27 22:32:50.641 192.168.16.101:54321  3486   main      INFO: HDFS subsystem successfully initialized
06-27 22:32:50.645 192.168.16.101:54321  3486   main      INFO: S3 subsystem successfully initialized
06-27 22:32:50.663 192.168.16.101:54321  3486   main      INFO: GCS subsystem successfully initialized
06-27 22:32:50.663 192.168.16.101:54321  3486   main      INFO: Flow dir: '/home/Anliven/h2oflows'
06-27 22:32:50.681 192.168.16.101:54321  3486   main      INFO: Cloud of size 1 formed [/192.168.16.101:54321]
06-27 22:32:50.690 192.168.16.101:54321  3486   main      INFO: Registered parsers: [GUESS, ARFF, XLS, SVMLight, AVRO, PARQUET, CSV]
06-27 22:32:50.691 192.168.16.101:54321  3486   main      INFO: Watchdog extension initialized
06-27 22:32:50.692 192.168.16.101:54321  3486   main      INFO: XGBoost extension initialized
06-27 22:32:50.692 192.168.16.101:54321  3486   main      INFO: KrbStandalone extension initialized
06-27 22:32:50.692 192.168.16.101:54321  3486   main      INFO: Registered 3 core extensions in: 318ms
06-27 22:32:50.692 192.168.16.101:54321  3486   main      INFO: Registered H2O core extensions: [Watchdog, XGBoost, KrbStandalone]
06-27 22:32:51.041 192.168.16.101:54321  3486   main      INFO: Found XGBoost backend with library: xgboost4j_gpu
06-27 22:32:51.041 192.168.16.101:54321  3486   main      INFO: XGBoost supported backends: [WITH_GPU, WITH_OMP]
06-27 22:32:51.229 192.168.16.101:54321  3486   main      INFO: Registered: 174 REST APIs in: 537ms
06-27 22:32:51.229 192.168.16.101:54321  3486   main      INFO: Registered REST API extensions: [Amazon S3, XGBoost, Algos, AutoML, Core V3, Core V4]
06-27 22:32:51.492 192.168.16.101:54321  3486   main      INFO: Registered: 249 schemas in 263ms
06-27 22:32:51.493 192.168.16.101:54321  3486   main      INFO: H2O started in 2407ms
06-27 22:32:51.493 192.168.16.101:54321  3486   main      INFO: 
06-27 22:32:51.493 192.168.16.101:54321  3486   main      INFO: Open H2O Flow in your web browser: http://192.168.16.101:54321
06-27 22:32:51.493 192.168.16.101:54321  3486   main      INFO: 
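
If you want to launch the server from a script rather than typing the command, the same invocation can be assembled programmatically. This is a minimal sketch: `build_h2o_command` and `flow_url` are hypothetical helpers, and only the `-ip` and `-port` options shown in the session above are used. The command is built but not executed here.

```python
def build_h2o_command(jar_path, ip, port):
    """Build the argument list for starting H2O from its jar file."""
    return ["java", "-jar", jar_path, "-ip", ip, "-port", str(port)]


def flow_url(ip, port):
    """Return the address where Flow should be reachable in a browser."""
    return f"http://{ip}:{port}"


command = build_h2o_command("h2o.jar", "192.168.16.101", 54321)
print(" ".join(command))
print(flow_url("192.168.16.101", 54321))
```

Passing the list to `subprocess.Popen(command)` would start the server, assuming Java is installed and h2o.jar is in the working directory.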

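3.2 - Data

On the start screen click importFiles, or choose Data --> Import Files from the top menu.
In the Import Files dialog that appears, enter a path in Search and click the search button (magnifying glass icon), then pick the data file from Search Results. Selected Files shows your selection.
Note: the Search path can be the absolute path of the data file, or a path relative to the h2o.jar file, for example ../h2o-bk/datasets.

Click the Import button to see the result of the file import.

Click Parse these files to customize how the data file is imported. In most cases it is best to keep the defaults and simply click "Parse".

Click View or iris_wheader1.hex to see the details.

Under Actions, click the Split... button and set how the data is divided into train and test sets.

Click the Create button.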

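3.3 - Model

Click "train", then click "Build Model..."; the algorithm selection screen appears.

Select Deep Learning and set the response_column parameter to class; leave all other parameters at their defaults.

Then click the "Build Model" button at the bottom of the dialog to start training.

When training finishes, click the View button to inspect the model's parameters and build process.

If a model was built earlier, choose Model --> List All Models from the start screen and click the model you want, to inspect its parameters and build process.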

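3.4 - Prediction

From the model view click Predict..., then specify a name and a data set.

Alternatively, choose Score --> Predict from the start screen, then specify a name, a model, and a data set.

After setting the parameters, click Predict to see the prediction results.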
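
For a classification model like this one, the prediction output holds a predicted label plus one probability column per class, as in the Python output in section 2.2. In that output the predicted label matches the class with the highest probability. The sketch below shows that relationship for one row copied from that output (row 28). It illustrates how to read the result and is not a description of H2O's internal code; `most_likely_class` is a hypothetical helper.

```python
# Class probabilities for row 28 of the prediction output in section 2.2.
probabilities = {
    "Iris-setosa": 8.742579e-08,
    "Iris-versicolor": 2.479772e-01,
    "Iris-virginica": 7.520227e-01,
}


def most_likely_class(probabilities):
    """Return the class label with the highest probability."""
    return max(probabilities, key=probabilities.get)


predicted = most_likely_class(probabilities)
print(predicted, round(probabilities[predicted], 4))
```

Rows like this one, where the winning probability is well below 1, are the ones worth a closer look when judging how confident the model is.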

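4 - Miscellaneous

  • Compared with Python, Flow can do most of the same things, but some data manipulation is not possible in Flow.
  • Data loaded from Python can be inspected in Flow, and data loaded in Flow can be inspected from Python.
  • The Water Meter item in the Admin menu shows how busy each CPU core in the cluster is.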

