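Each algorithm in a Pipeline is called a "PipelineStage". There are two kinds of PipelineStage, Estimator and Transformer:
- A Transformer converts data into another form (for example, by changing its format) for use by a later Estimator. All Transformers share a single conversion function, transform;
- An Estimator derives a Model from data (Model itself extends Transformer). All Estimators share a single trigger function, fit.
A composite algorithm can then be wrapped up as a pipeline. The benefit is that individual algorithms become easy to swap. In an application we often only know that we want a generic capability such as "classification" or "regression", without knowing in advance which particular classification or regression algorithm will perform best. With the pipeline mechanism, we can substitute different classification and regression algorithms with little effort and keep whichever one performs best.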
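To make the fit/transform contract and the "swap one stage" idea concrete, here is a minimal sketch in plain Scala. It does not use Spark: the `Stage`, `Transformer` and `Estimator` traits below are simplified stand-ins, and `Scale`, `MeanCenter`, `MinShift`, `fitPipeline` and `transformAll` are hypothetical names invented for this illustration. It only assumes that data is a `Seq[Double]`.

```scala
// Toy sketch in plain Scala. These traits are simplified stand-ins,
// NOT the real org.apache.spark.ml classes.
object PipelineSketch {
  type Data = Seq[Double]

  // A pipeline stage is either a Transformer or an Estimator.
  sealed trait Stage

  // Transformer: converts data into another form through transform.
  trait Transformer extends Stage {
    def transform(data: Data): Data
  }

  // Estimator: learns a model from data through fit.
  // The returned model is itself a Transformer.
  trait Estimator extends Stage {
    def fit(data: Data): Transformer
  }

  // Hypothetical Transformer: multiplies every value by a constant.
  final case class Scale(factor: Double) extends Transformer {
    def transform(data: Data): Data = data.map(_ * factor)
  }

  // Hypothetical Estimator: learns the mean and returns a model that subtracts it.
  object MeanCenter extends Estimator {
    def fit(data: Data): Transformer = {
      val mean = if (data.isEmpty) 0.0 else data.sum / data.size
      new Transformer {
        def transform(d: Data): Data = d.map(_ - mean)
      }
    }
  }

  // Hypothetical alternative Estimator: learns the minimum and subtracts it.
  object MinShift extends Estimator {
    def fit(data: Data): Transformer = {
      val min = if (data.isEmpty) 0.0 else data.min
      new Transformer {
        def transform(d: Data): Data = d.map(_ - min)
      }
    }
  }

  // Fits the stages in order. An Estimator is fitted on the current data,
  // a Transformer is used as-is; either way the result transforms the data
  // that is passed on to the next stage.
  def fitPipeline(stages: Seq[Stage], data: Data): Seq[Transformer] = {
    var current = data
    stages.map { stage =>
      val fitted = stage match {
        case t: Transformer => t
        case e: Estimator   => e.fit(current)
      }
      current = fitted.transform(current)
      fitted
    }
  }

  // Applies the fitted transformers in order. With no stages this is the identity.
  def transformAll(model: Seq[Transformer], data: Data): Data =
    model.foldLeft(data)((d, t) => t.transform(d))
}
```

In this sketch, switching from `Seq(Scale(2.0), MeanCenter)` to `Seq(Scale(2.0), MinShift)` replaces the learning step while the calls to `fitPipeline` and `transformAll` stay the same, which is the kind of substitution the pipeline mechanism is meant to make cheap.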
/**
* :: Experimental ::
* A simple pipeline, which acts as an estimator. A Pipeline consists of a sequence of stages, each
* of which is either an [[Estimator]] or a [[Transformer]]. When [[Pipeline#fit]] is called, the
* stages are executed in order. If a stage is an [[Estimator]], its [[Estimator#fit]] method will
* be called on the input dataset to fit a model. Then the model, which is a transformer, will be
* used to transform the dataset as the input to the next stage. If a stage is a [[Transformer]],
* its [[Transformer#transform]] method will be called to produce the dataset for the next stage.
* The fitted model from a [[Pipeline]] is an [[PipelineModel]], which consists of fitted models and
* transformers, corresponding to the pipeline stages. If there are no stages, the pipeline acts as
* an identity transformer.
*/
@Experimental
class Pipeline(override val uid: String) extends Estimator[PipelineModel] {