The Pipeline Approach in Spark MLlib

Reposted article, 2015-07-09 17:24. Tags: spark, ML

A pipeline in Spark MLlib chains several machine learning algorithms into a single workflow and runs them one after another.

Each algorithm in a Pipeline is called a “PipelineStage”. There are two kinds of PipelineStage, Estimator and Transformer:

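A Transformer converts data into another form (for example, by changing its format) so that a later Estimator can use it; every Transformer exposes the same conversion method, transform.
An Estimator produces a Model from data (a Model is itself a Transformer); every Estimator exposes the same trigger method, fit.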
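The difference between the two stage types can be illustrated with a minimal sketch. It assumes Spark 2.x or later with spark-mllib on the classpath and a local master; the toy data and the choice of Tokenizer and LogisticRegression are illustrative assumptions, not something the original article prescribes.

```scala
import org.apache.spark.ml.classification.{LogisticRegression, LogisticRegressionModel}
import org.apache.spark.ml.feature.Tokenizer
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

object TransformerVsEstimator {
  // A Transformer: there is no training step, only transform().
  val tokenizer: Tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")

  // An Estimator: it must be fit() on data before anything can be transformed.
  val lr: LogisticRegression = new LogisticRegression().setMaxIter(10)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("TransformerVsEstimator")
      .master("local[*]") // assumption: run locally for illustration
      .getOrCreate()

    // transform() appends a "words" column to the input DataFrame.
    val sentences = spark.createDataFrame(Seq(
      (0L, "spark ml pipeline"),
      (1L, "estimator and transformer")
    )).toDF("id", "text")
    tokenizer.transform(sentences).show(false)

    // Made-up training data using the default "label" and "features" columns.
    val training = spark.createDataFrame(Seq(
      (1.0, Vectors.dense(0.0, 1.1, 0.1)),
      (0.0, Vectors.dense(2.0, 1.0, -1.0)),
      (0.0, Vectors.dense(2.0, 1.3, 1.0)),
      (1.0, Vectors.dense(0.0, 1.2, -0.5))
    )).toDF("label", "features")

    // fit() returns a Model, and the Model is itself a Transformer.
    val model: LogisticRegressionModel = lr.fit(training)
    model.transform(training).select("label", "prediction").show()

    spark.stop()
  }
}
```

Note that the fitted model is used through transform, exactly like the Tokenizer; this is what allows fitted stages to be chained.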

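A “composite” algorithm can then be wrapped up as a pipeline. The benefit is that individual algorithms become easy to swap. In an application we often want only a general capability such as “classification” or “fitting”, without knowing in advance which classification or fitting algorithm performs best. With the pipeline mechanism it is easy to substitute different classification and fitting algorithms and keep the one that gives the best results.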
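As a rough illustration of swapping algorithms, the sketch below keeps the feature stages fixed and changes only the final classifier. It assumes Spark 2.x or later with spark-mllib on the classpath; the helper name buildPipeline, the toy data, and the two candidate classifiers are hypothetical choices made for this example.

```scala
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.classification.{LogisticRegression, NaiveBayes}
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.sql.SparkSession

object SwapClassifier {
  // Hypothetical helper: the feature stages are fixed, only the last stage varies.
  def buildPipeline(classifier: PipelineStage): Pipeline = {
    val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
    val hashingTF = new HashingTF()
      .setInputCol("words")
      .setOutputCol("features")
      .setNumFeatures(1000)
    new Pipeline().setStages(Array[PipelineStage](tokenizer, hashingTF, classifier))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SwapClassifier")
      .master("local[*]") // assumption: run locally for illustration
      .getOrCreate()

    // Made-up toy data, far too small for a meaningful comparison.
    val training = spark.createDataFrame(Seq(
      (0L, "a b c d e spark", 1.0),
      (1L, "b d", 0.0),
      (2L, "spark f g h", 1.0),
      (3L, "hadoop mapreduce", 0.0)
    )).toDF("id", "text", "label")

    val candidates: Seq[(String, PipelineStage)] = Seq(
      "logistic regression" -> new LogisticRegression().setMaxIter(10),
      "naive Bayes" -> new NaiveBayes()
    )

    // The surrounding code does not change when the classifier does.
    for ((name, classifier) <- candidates) {
      val model = buildPipeline(classifier).fit(training)
      println(s"Predictions with $name:")
      model.transform(training).select("text", "label", "prediction").show(false)
    }

    spark.stop()
  }
}
```

In a real comparison the candidates would be scored on held-out data rather than inspected on the training set as done here.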

Details: https://spark.apache.org/docs/latest/ml-guide.html

/**
 * :: Experimental ::
 * A simple pipeline, which acts as an estimator. A Pipeline consists of a sequence of stages, each
 * of which is either an [[Estimator]] or a [[Transformer]]. When [[Pipeline#fit]] is called, the
 * stages are executed in order. If a stage is an [[Estimator]], its [[Estimator#fit]] method will
 * be called on the input dataset to fit a model. Then the model, which is a transformer, will be
 * used to transform the dataset as the input to the next stage. If a stage is a [[Transformer]],
 * its [[Transformer#transform]] method will be called to produce the dataset for the next stage.
 * The fitted model from a [[Pipeline]] is an [[PipelineModel]], which consists of fitted models and
 * transformers, corresponding to the pipeline stages. If there are no stages, the pipeline acts as
 * an identity transformer.
 */
@Experimental
class Pipeline(override val uid: String) extends Estimator[PipelineModel] {
  // ... (class body omitted in this excerpt)
}
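The fitting behaviour described in the Scaladoc above can be sketched in plain Scala without any Spark dependency. This is a simplified toy model, not Spark's actual implementation: a "dataset" is just a Vector[Double], and the names Stage, Doubler, and MeanCenterer are invented for the illustration.

```scala
object PipelineFitSketch {
  // Toy stand-in for a dataset (an assumption made for this sketch).
  type Data = Vector[Double]

  sealed trait Stage
  trait Transformer extends Stage { def transform(data: Data): Data }
  trait Estimator extends Stage { def fit(data: Data): Transformer }

  // A Transformer stage: doubles every value.
  object Doubler extends Transformer {
    def transform(data: Data): Data = data.map(_ * 2)
  }

  // An Estimator stage: learns the mean and returns a model that subtracts it.
  object MeanCenterer extends Estimator {
    def fit(data: Data): Transformer = {
      val mean = if (data.isEmpty) 0.0 else data.sum / data.size
      new Transformer { def transform(d: Data): Data = d.map(_ - mean) }
    }
  }

  // The fitted pipeline: applies its transformers in order.
  // With no stages it returns the input unchanged (identity transformer).
  final class PipelineModel(val stages: Seq[Transformer]) extends Transformer {
    def transform(data: Data): Data = stages.foldLeft(data)((d, t) => t.transform(d))
  }

  final class Pipeline(stages: Seq[Stage]) extends Estimator {
    def fit(data: Data): PipelineModel = {
      // Carry the current dataset and the fitted stages through the fold.
      val (_, fitted) = stages.foldLeft((data, Vector.empty[Transformer])) {
        case ((current, acc), e: Estimator) =>
          val model = e.fit(current)               // fit on the current data
          (model.transform(current), acc :+ model) // the model feeds the next stage
        case ((current, acc), t: Transformer) =>
          (t.transform(current), acc :+ t)         // transformers are used as-is
      }
      new PipelineModel(fitted)
    }
  }

  def demo(): Data = {
    val model = new Pipeline(Seq(Doubler, MeanCenterer)).fit(Vector(1.0, 2.0, 3.0))
    model.transform(Vector(1.0, 2.0, 3.0))
  }

  def main(args: Array[String]): Unit =
    println(demo()) // prints Vector(-2.0, 0.0, 2.0)
}
```

Here the doubled data is (2, 4, 6), the learned mean is 4, and the fitted PipelineModel replays both steps on new input.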
