Contents
Principles of Isotonic Regression
Isotonic Regression Code (Spark Python)
Principles of Isotonic Regression
To be continued...
Isotonic Regression Code (Spark Python)
Data used in the code: https://pan.baidu.com/s/1jHWKG4I Password: acq1
# -*- coding: utf-8 -*-
import math

from pyspark import SparkContext
from pyspark.mllib.regression import IsotonicRegression, IsotonicRegressionModel
from pyspark.mllib.util import MLUtils

sc = SparkContext('local')


# Load and parse the data.
def parsePoint(labeledData):
    # Return a (label, feature, weight) tuple with the weight fixed at 1.0.
    return (labeledData.label, labeledData.features[0], 1.0)


data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_isotonic_regression_libsvm_data.txt")

# Create (label, feature, weight) tuples from the input data,
# with the weight set to the default value 1.0.
parsedData = data.map(parsePoint)

# Split the data into training (60%) and test (40%) sets, using seed 11.
training, test = parsedData.randomSplit([0.6, 0.4], 11)

# Create the isotonic regression model from the training data.
# The isotonic parameter defaults to True, so it is not passed here.
model = IsotonicRegression.train(training)

# Create (prediction, real label) tuples for the test set.
predictionAndLabel = test.map(lambda p: (model.predict(p[1]), p[0]))

# Calculate the mean squared error between predicted and real labels.
meanSquaredError = predictionAndLabel.map(lambda pl: math.pow((pl[0] - pl[1]), 2)).mean()
print("Mean Squared Error = " + str(meanSquaredError))
# Output from the original run: Mean Squared Error = 0.00863040529956

# Save and load the model.
model.save(sc, "myIsotonicRegressionModel")
sameModel = IsotonicRegressionModel.load(sc, "myIsotonicRegressionModel")

# Predict for the single feature value of the first record.
print(sameModel.predict(data.first().features[0]))
# Output from the original run: 0.14987251
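To give a feel for what the training step above does, here is a minimal pure-Python sketch of the pool adjacent violators idea, the approach the Spark MLlib guide names for isotonic regression. It is not Spark's implementation. The function names `pava` and `predict` are made up for illustration, the input is assumed to be already sorted by feature, and the feature values are assumed to be distinct. The prediction helper interpolates linearly between fitted points and clamps outside the fitted range, which is how the MLlib guide describes prediction.

```python
def pava(labels, weights=None):
    """Weighted pool-adjacent-violators fit (illustrative sketch).

    `labels` must already be ordered by their feature value.
    Returns one fitted value per label; the result is non-decreasing.
    """
    if weights is None:
        weights = [1.0] * len(labels)
    # Each block stores [weighted mean, total weight, number of pooled points].
    blocks = []
    for y, w in zip(labels, weights):
        blocks.append([float(y), float(w), 1])
        # Pool the last two blocks while they violate the ordering.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            mean2, w2, n2 = blocks.pop()
            mean1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(mean1 * w1 + mean2 * w2) / total, total, n1 + n2])
    # Expand every block back to one fitted value per original point.
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


def predict(x, boundaries, predictions):
    """Piecewise-linear prediction, clamped at both ends.

    Assumes `boundaries` is strictly increasing (distinct feature values).
    """
    if x <= boundaries[0]:
        return predictions[0]
    if x >= boundaries[-1]:
        return predictions[-1]
    for i in range(1, len(boundaries)):
        if x <= boundaries[i]:
            x0, x1 = boundaries[i - 1], boundaries[i]
            y0, y1 = predictions[i - 1], predictions[i]
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Toy data (made up for this sketch): the labels 3.0 and 2.0 are out of order.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 2.0, 4.0]
fitted = pava(ys)
print(fitted)                    # [1.0, 2.5, 2.5, 4.0]
print(predict(2.5, xs, fitted))  # 2.5
```

The two out-of-order labels are pooled into their average, so the fitted sequence never decreases. A fully decreasing input such as `[3.0, 2.0, 1.0]` collapses to a single constant block.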