1, At first I simply called HttpMethod's getResponseBody() and getResponseBodyAsString(), but doing so always produced a warning in the log about buffering a response body of large or unknown size.
The warning appears because the whole body is read into memory at once; when the response is large, HttpClient complains about it. The data should instead be read through a stream and a buffered reader:
String response = null;
InputStream resStream = null;
BufferedReader resBufferReader = null;
try {
    httpClient.executeMethod(httpMethod);
    // read the body as a stream instead of loading it all into memory at once
    resStream = httpMethod.getResponseBodyAsStream();
    resBufferReader = new BufferedReader(new InputStreamReader(resStream));
    StringBuilder resBuffer = new StringBuilder();
    String resTemp;
    while ((resTemp = resBufferReader.readLine()) != null) {
        resBuffer.append(resTemp);
    }
    response = resBuffer.toString();
} catch (Exception e) {
    // handle or log the failure instead of swallowing it
    e.printStackTrace();
} finally {
    if (resBufferReader != null) {
        try { resBufferReader.close(); } catch (IOException ignored) {}
    }
    httpMethod.releaseConnection();
}
2, Next, a problem that came up while initializing an HBaseAdmin client with the default HBase configuration:

public static Configuration hBaseConfiguration = null;
public static HBaseAdmin hBaseAdmin = null;

public static void init() {
    hBaseConfiguration = HBaseConfiguration.create();
    try {
        hBaseAdmin = new HBaseAdmin(hBaseConfiguration);
    } catch (Exception e) {
        throw new HbaseRuntimeException(e);
    }
}
HBaseConfiguration.create() actually adds the two default configuration files; when the same configuration item appears in both, the later resource overrides the earlier one:

conf.addResource("hbase-default.xml");
conf.addResource("hbase-site.xml");
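To make the override ordering concrete, here is a minimal sketch (the extra file my-overrides.xml and the property lookup are only for illustration): resources added later win when the same key is defined in more than one file.

Configuration conf = HBaseConfiguration.create();   // loads hbase-default.xml, then hbase-site.xml
conf.addResource("my-overrides.xml");                // hypothetical extra file; added last, so it wins on conflicts
// If hbase.zookeeper.quorum is defined in several of these files,
// the value from the last resource that defines it is returned here.
String quorum = conf.get("hbase.zookeeper.quorum");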
But it kept reporting the following error:
Caused by: java.io.IOException: Unable to determine ZooKeeper ensemble
Stepping through the source with a debugger showed that the exception is thrown from the connect method of the ZKUtil class:
public static ZooKeeper connect(Configuration conf, String ensemble,
    Watcher watcher, final String descriptor)
throws IOException {
    if (ensemble == null) {
        throw new IOException("Unable to determine ZooKeeper ensemble");
    }
    int timeout = conf.getInt("zookeeper.session.timeout", 180 * 1000);
    LOG.debug(descriptor + " opening connection to ZooKeeper with ensemble (" +
        ensemble + ")");
    return new ZooKeeper(ensemble, timeout, watcher);
}
The code above shows that the ZooKeeper ensemble address was never resolved. That address is read in the ZooKeeperWatcher constructor.
Following it further, the configuration turns out to be produced by the makeZKProps method:
// First check if there is a zoo.cfg in the CLASSPATH. If so, simply read
// it and grab its configuration properties.
ClassLoader cl = HQuorumPeer.class.getClassLoader();
final InputStream inputStream =
    cl.getResourceAsStream(HConstants.ZOOKEEPER_CONFIG_NAME);
if (inputStream != null) {
    try {
        return parseZooCfg(conf, inputStream);
    } catch (IOException e) {
        LOG.warn("Cannot read " + HConstants.ZOOKEEPER_CONFIG_NAME +
            ", loading from XML files", e);
    }
}
Seeing this, everything suddenly made sense: HBase first checks whether there is a zoo.cfg file on the CLASSPATH. If there is, its entries are used as the ZooKeeper configuration, and at that point hbase-default.xml and hbase-site.xml are ignored entirely!
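In this situation the cure is either to remove the stray zoo.cfg from the CLASSPATH, or to spell the ensemble out on the Configuration itself so the lookup no longer depends on what happens to be on the classpath. A minimal sketch (the host names and port below are placeholders):

Configuration conf = HBaseConfiguration.create();
// Point the client at the ZooKeeper ensemble explicitly.
conf.set("hbase.zookeeper.quorum", "zk1.example.com,zk2.example.com,zk3.example.com");
conf.set("hbase.zookeeper.property.clientPort", "2181");
HBaseAdmin admin = new HBaseAdmin(conf);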
3, There are two ZooKeeper exceptions that deserve especially careful handling.
1) The first case is connection loss. While the connection is down, your operations do not take effect, which means any delete, setData, or makePath calls you issue are simply lost. The exception to handle here is
KeeperException.ConnectionLossException, and the remedy is straightforward: introduce a retry mechanism with a maximum retry count and a delay between attempts.
@SuppressWarnings("unchecked")
public <T> T retryOperation(ZkOperation operation)
        throws KeeperException, InterruptedException {
    KeeperException exception = null;
    for (int i = 0; i < retryCount; i++) {
        try {
            // attempt the operation; return immediately on success
            return (T) operation.execute();
        } catch (KeeperException.ConnectionLossException e) {
            // remember the first connection-loss exception for the final throw
            if (exception == null) {
                exception = e;
            }
            // bail out promptly if this thread has been interrupted
            if (Thread.currentThread().isInterrupted()) {
                Thread.currentThread().interrupt();
                throw new InterruptedException();
            }
            // wait before the next attempt
            retryDelay(i);
        }
    }
    throw exception;
}
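retryDelay(i) is referenced above but not shown; a minimal sketch of what it could look like, assuming a simple linear back-off and a retryDelayMillis field that is not part of the original snippet:

private long retryDelayMillis = 1500L; // assumed base delay between attempts

// Sleep a little longer after each failed attempt (linear back-off).
private void retryDelay(int attemptCount) throws InterruptedException {
    if (attemptCount > 0) {
        Thread.sleep(attemptCount * retryDelayMillis);
    }
}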
2) The second case is session expiration. When you first connect to ZooKeeper you can register a Watcher, whose job is to react to the connection being established and to the session expiring.
When the latter happens, you must try to reconnect to the ZooKeeper server, and once the reconnect succeeds you can redo any application-level initialization. Here that is done through onReconnect.command(): the OnReconnect interface is a hook that is called back after the reconnect completes so that initialization can be performed (a sketch of such a hook follows the snippet below).
public synchronized void process(WatchedEvent event) {
    if (log.isInfoEnabled()) {
        log.info("Watcher " + this + " name:" + name + " got event " + event + " path:" + event.getPath() + " type:" + event.getType());
    }

    state = event.getState();
    if (state == KeeperState.SyncConnected) {
        connected = true;
        clientConnected.countDown();
    } else if (state == KeeperState.Expired) {
        connected = false;
        log.info("Attempting to reconnect to recover relationship with ZooKeeper...");
        // try to reconnect to the ZooKeeper server
        try {
            connectionStrategy.reconnect(zkServerAddress, zkClientTimeout, this,
                new ZkClientConnectionStrategy.ZkUpdate() {
                    @Override
                    public void update(SolrZooKeeper keeper) throws InterruptedException, TimeoutException, IOException {
                        synchronized (connectionStrategy) {
                            waitForConnected(SolrZkClient.DEFAULT_CLIENT_CONNECT_TIMEOUT);
                            client.updateKeeper(keeper);
                            // application-level re-initialization hook
                            if (onReconnect != null) {
                                onReconnect.command();
                            }
                            synchronized (ConnectionManager.this) {
                                ConnectionManager.this.connected = true;
                            }
                        }
                    }
                });
        } catch (Exception e) {
            SolrException.log(log, "", e);
        }
        log.info("Connected:" + connected);
    } else if (state == KeeperState.Disconnected) {
        connected = false;
    } else {
        connected = false;
    }
    notifyAll();
}
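For completeness, a minimal sketch of what an application-side OnReconnect hook might look like (the class name and its body are illustrative only; Solr's OnReconnect interface exposes a single command() method):

// Hypothetical hook: once the ZooKeeper session has been re-established,
// redo whatever ephemeral state the old session was holding.
public class ReRegisterOnReconnect implements OnReconnect {
    @Override
    public void command() {
        // e.g. re-create ephemeral nodes, re-set watches, refresh local caches
        registerEphemeralNode();
    }

    private void registerEphemeralNode() {
        // application-specific re-initialization goes here
    }
}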
4, Today, while working on Solr master/slave failover, I ran into a puzzling problem.
Scenario:
A cluster of 3 Solr nodes: 1 master node named m1 and 2 slave nodes, s1 and s2. Each Solr node registers an EPHEMERAL_SEQUENTIAL node under the same Znode in the ZooKeeper cluster and thereby obtains a sequence number; master election follows the "lowest sequence number wins" strategy (a minimal election sketch follows the list below). If m1 goes down, the slave with the next-lowest sequence number automatically takes over as the new master. Assuming that slave is s1, three things then have to happen:
1) In the solrconfig.xml of the Solr core on s1, the replication section must be changed from the slave configuration to the master configuration, and the corresponding core must be reloaded.
2) The other slave nodes (here, s2) must change the replication section of their configuration so that the masterUrl that used to point to m1 now points to s1, and reload their corresponding cores.
3) If the original m1 node later rejoins the cluster, it re-creates an EPHEMERAL_SEQUENTIAL node under the Znode mentioned above, and its sequence number will necessarily be larger than those of s1 and s2. m1 will then see that a new master, s1, already exists, recognize that its own role is now slave, switch its core to the slave configuration section, and point to the new masterUrl on s1.
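For reference, a minimal sketch of the "lowest sequence number wins" election described above, using the raw ZooKeeper client (the znode path /solr_election and the method name are made up for illustration, and the parent znode is assumed to exist already):

import java.util.Collections;
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class LeaderElectionSketch {
    // Register this node under the election znode and report whether it is currently the master.
    public static boolean registerAndCheckMaster(ZooKeeper zk, String nodeName)
            throws KeeperException, InterruptedException {
        // EPHEMERAL_SEQUENTIAL: the node disappears when the session dies,
        // and ZooKeeper appends a monotonically increasing sequence number.
        String myPath = zk.create("/solr_election/node-", nodeName.getBytes(),
            ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);

        List<String> children = zk.getChildren("/solr_election", false);
        Collections.sort(children);

        // The child with the smallest sequence number is the master.
        String smallest = children.get(0);
        return myPath.endsWith(smallest);
    }
}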
Problem:
What I am seeing now is this: s1 changes its configuration from slave to master, and after the reload the index directory changes from index to index.<timestamp>. When the slave node s2 then replicates from s1, it looks in the index directory by default, so it cannot find the index files, and the indexversion reported by s1 is 0.
I am stuck here for now; tomorrow I will dig into the real cause...