Configuring Carte for Cluster Sorting in Kettle (on Windows)

This article is a repost. 2013-08-15 17:00, tagged Kettle.

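This article is about using Kettle's graphical interface, Spoon, to run an experiment that sorts the data in a database table on a cluster.

It also covers the configuration files for the Carte services that must be running during the experiment, and the related Carte commands under Windows cmd.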

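The article is divided into six parts:

1. Introduction to Carte

2. Setting up the Carte configuration files

3. Commands for starting the Carte services

4. Configuring the cluster in Kettle's graphical interface

5. Sorting data in Kettle's cluster mode

6. Java source code for invoking slave servers through the cluster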

1. Introduction to Carte

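Carte is a web server program that ships with Kettle. Carte is also called a slave server: when Kettle uses a cluster to distribute and process work, you can start several Carte processes, one of which distributes the ETL tasks (the master) while the others receive, run, and report back on them (the slaves).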

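As the book Pentaho Kettle Solutions defines it:

"Carte a lightweight server process allows for remote monitoring and enables the transformation clustering capabilities".

In other words, Carte is a lightweight server process that allows remote monitoring and makes it possible to run transformations on a cluster.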

2. Setting up the Carte configuration files

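Much like setting up Hadoop nodes, this experiment runs four Carte services on a single host, one acting as the master and the other three as slaves, so that Spoon can read a database table and then sort it in cluster mode.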

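Every Carte service you start shows the same kind of command window, so how can you tell which one is the master and which ones are slaves?

To explain how masters and slaves are designated, let us again quote Pentaho Kettle Solutions (it is an authoritative source):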

"A cluster schema consists of one master server that is being used as a controller for the cluster, and a number of non-master slave servers. In short, we refer to the controlling Carte server as the master and the other Carte servers as slaves"

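Without parsing the sentence too strictly, this is how I understand the passage:

"A cluster consists of one master node that controls the whole cluster, plus several slave servers that are not the master (that is, every node other than the master, whose configuration file sets <master>N</master>). In short, the controlling Carte server is called the master node and the other Carte servers are called slave nodes."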
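The rule "one master, any number of slaves" can be expressed as a small data model. The sketch below is a hypothetical illustration, not Kettle's own classes: `CarteNode` and `hasExactlyOneMaster` are names invented here, and the slave names `slave2-8082` and `slave3-8083` are assumed by analogy with the article's `slave1-8081`.

```java
import java.util.List;

// Hypothetical model of a cluster schema, for illustration only;
// these are NOT Kettle's own classes.
public class ClusterSchemaSketch {

    // One Carte instance: name, host, port, and whether <master> is Y.
    record CarteNode(String name, String hostname, int port, boolean master) {}

    // A schema is valid when exactly one node is the master; all others are slaves.
    static boolean hasExactlyOneMaster(List<CarteNode> nodes) {
        long masters = nodes.stream().filter(CarteNode::master).count();
        return masters == 1;
    }

    public static void main(String[] args) {
        // The setup in this article: one master on 8080 and three slaves on one host.
        // slave2/slave3 names are hypothetical, modeled on "slave1-8081".
        List<CarteNode> nodes = List.of(
                new CarteNode("master1", "localhost", 8080, true),
                new CarteNode("slave1-8081", "localhost", 8081, false),
                new CarteNode("slave2-8082", "localhost", 8082, false),
                new CarteNode("slave3-8083", "localhost", 8083, false));
        System.out.println(hasExactlyOneMaster(nodes)); // true
    }
}
```

Counting master flags, rather than checking the port number, mirrors the point that the role comes from the `<master>` setting alone.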

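Whether a Carte server is a master or a slave is determined by its configuration file, carte-config.xml: the <master></master> element is set to either "Y" or "N".

This closely resembles the way Hadoop uses XML configuration files to designate master and slave nodes.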
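To make the rule concrete, here is a minimal sketch, assuming the file layout shown in the examples in this article, that reads a carte-config XML string with the JDK's DOM parser and reports whether the server it describes is the master. The class and method names are hypothetical. One subtlety: a slave's file also lists its master inside `<masters>`, with `<master>Y</master>`, so only the `<slaveserver>` that is a direct child of `<slave_config>` describes the server itself.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Kettle.
public class CarteRoleSketch {

    // True if the <slaveserver> directly under <slave_config> has <master>Y</master>.
    // Entries nested inside <masters> describe OTHER servers, so they are skipped.
    static boolean isMaster(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Element e && e.getTagName().equals("slaveserver")) {
                NodeList flags = e.getElementsByTagName("master");
                return flags.getLength() > 0
                        && "Y".equals(flags.item(0).getTextContent().trim());
            }
        }
        return false; // no top-level <slaveserver> entry found
    }

    public static void main(String[] args) throws Exception {
        String master = "<slave_config><slaveserver><name>master1</name>"
                + "<master>Y</master></slaveserver></slave_config>";
        String slave = "<slave_config><masters><slaveserver><name>master1</name>"
                + "<master>Y</master></slaveserver></masters>"
                + "<slaveserver><name>slave1-8081</name><master>N</master></slaveserver>"
                + "</slave_config>";
        System.out.println(isMaster(master)); // true
        System.out.println(isMaster(slave));  // false
    }
}
```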

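Configuration details vary widely from machine to machine, depending on each computer's environment variables, so I will mainly describe the setup process on my own machine.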

Before Carte can run successfully, its configuration files have to be set up. They are located in the directory shown in the screenshot below:

(Screenshot: carte-config.xml)

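When I started Carte with this configuration, the cmd window reported an error saying that the pwd folder could not be found under kokia/Acer/user/acer/ (kokia is my computer's name).

Following the message, I copied the pwd folder from the directory where Kettle was unpacked to the path given in the error, and after that Carte ran normally. I cannot say exactly why this works; perhaps Carte looks for its configuration files under that path by default when it starts...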

By default, the pwd folder holds Carte's configuration files as well as the login user names and passwords. In the Kettle distribution it is located at ./data-integration/pwd.
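The manual workaround described above, copying the pwd folder to the directory named in Carte's error message, can be scripted. This is a minimal sketch using only `java.nio.file`; the class name is hypothetical, and both paths are placeholders you would adjust to your own Kettle install and to the path printed in the error.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.stream.Stream;

// Hypothetical utility: recursively copy Kettle's pwd folder to another directory.
public class CopyPwdFolder {

    // Copies the whole tree under source into target. Files.walk visits a
    // directory before its contents, so parent directories exist before files
    // are copied. Existing files in the target are overwritten.
    static void copyTree(Path source, Path target) throws IOException {
        try (Stream<Path> paths = Files.walk(source)) {
            for (Path p : (Iterable<Path>) paths::iterator) {
                Path dest = target.resolve(source.relativize(p).toString());
                if (Files.isDirectory(p)) {
                    Files.createDirectories(dest);
                } else {
                    Files.copy(p, dest, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder paths: the source assumes you run this from the folder that
        // contains data-integration; pass the target directory as the first argument.
        Path source = Paths.get("data-integration", "pwd");
        Path target = Paths.get(args.length > 0 ? args[0] : "copied-pwd");
        copyTree(source, target);
        System.out.println("Copied " + source.toAbsolutePath() + " -> " + target.toAbsolutePath());
    }
}
```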

Below is the configuration file for the master server (master: carte-config-master-8080.xml), followed by annotations:

<slave_config>
  <slaveserver>
    <name>master1</name>
    <hostname>localhost</hostname>
    <port>8080</port>
    <master>Y</master>
  </slaveserver>
</slave_config>

<!--
Even though it is called the master node,
it is still an instance of slaveserver.

<name> defines the name of the slave server.
<hostname> is localhost in this file, which is equivalent
to the IP address 127.0.0.1.

On Linux, host names are mapped to IP addresses in a configuration
file (/etc/hosts); on Windows the equivalent file is
%SystemRoot%\System32\drivers\etc\hosts.
If the cluster nodes do not all run on one host, replace the value
of <hostname> with the IP address of the host the node runs on.

<port> 8080: the port this Carte instance listens on. In this setup
the master uses 8080; what actually makes a node the master is
the <master> element, not the port number.

<master> Y: as discussed above, the value Y means that this
slave server acts as the master node of the cluster.
-->
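Since `localhost` only works while every Carte node runs on the same machine, it can help to check whether a `<hostname>` value refers to the local loopback interface. This is a hypothetical helper built on `java.net.InetAddress`; note that `localhost` may resolve to `127.0.0.1` or, on IPv6-enabled systems, `::1`, and both count as loopback.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper: does a <hostname> value point at this machine's loopback?
public class HostnameCheck {

    // True if the name resolves to a loopback address, meaning the setting is
    // only usable when all cluster nodes run on one host.
    static boolean isLoopback(String hostname) throws UnknownHostException {
        return InetAddress.getByName(hostname).isLoopbackAddress();
    }

    public static void main(String[] args) throws UnknownHostException {
        for (String host : new String[] {"localhost", "127.0.0.1"}) {
            System.out.println(host + " -> " + InetAddress.getByName(host).getHostAddress()
                    + " loopback=" + isLoopback(host));
        }
    }
}
```

A node whose configuration will be read by Carte servers on other machines should use a non-loopback IP address, as the annotations recommend.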

Below is the configuration file for a slave server (slave), with annotations:

<slave_config>
  <!-- The master(s) this slave registers with -->
  <masters>
    <slaveserver>
      <name>master1</name>
      <hostname>localhost</hostname>
      <port>8080</port>
      <username>cluster</username>
      <password>cluster</password>
      <master>Y</master>
    </slaveserver>
  </masters>

  <!-- Y: this slave reports (registers) itself to the masters listed above -->
  <report_to_masters>Y</report_to_masters>

  <!-- This Carte instance itself: a non-master node listening on port 8081 -->
  <slaveserver>
    <name>slave1-8081</name>
    <hostname>localhost</hostname>
    <port>8081</port>
    <username>cluster</username>
    <password>cluster</password>
    <master>N</master>
  </slaveserver>
</slave_config>
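The experiment needs three slaves, and their files differ only in the node's name and port. A minimal sketch that prints one config per slave from the example above; the class name, the output file names such as `carte-config-8082.xml`, and the `slaveN-PORT` naming beyond `slave1-8081` are assumptions for illustration.

```java
// Hypothetical generator for the three slave configs used in this experiment.
public class SlaveConfigGenerator {

    // Builds a slave config mirroring the example: it registers with master1 on
    // localhost:8080 and describes itself as a non-master node on the given port.
    static String slaveConfig(int index, int port) {
        return """
                <slave_config>
                  <masters>
                    <slaveserver>
                      <name>master1</name>
                      <hostname>localhost</hostname>
                      <port>8080</port>
                      <username>cluster</username>
                      <password>cluster</password>
                      <master>Y</master>
                    </slaveserver>
                  </masters>
                  <report_to_masters>Y</report_to_masters>
                  <slaveserver>
                    <name>slave%d-%d</name>
                    <hostname>localhost</hostname>
                    <port>%d</port>
                    <username>cluster</username>
                    <password>cluster</password>
                    <master>N</master>
                  </slaveserver>
                </slave_config>
                """.formatted(index, port, port);
    }

    public static void main(String[] args) {
        // Three slaves on ports 8081-8083, next to the master on 8080.
        for (int i = 1; i <= 3; i++) {
            int port = 8080 + i;
            System.out.println("---- carte-config-" + port + ".xml ----");
            System.out.print(slaveConfig(i, port));
        }
    }
}
```

Generating the files from one template keeps the `<masters>` block identical across slaves, so all of them register with the same master.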