1. Can't get Kerberos realm
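Root cause analysis:
The original code was:
org.apache.hadoop.security.UserGroupInformation.setConfiguration(conf);
sun.security.krb5.Config.refresh();
The first call configures UserGroupInformation (UGI) from the Hadoop configuration conf that is passed in. The call chain is as follows (unrelated code removed):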
public static void setConfiguration(Configuration conf) {
initialize(conf, true);
}
The initialize method is as follows:
private static synchronized void initialize(Configuration conf, boolean overrideNameRules) {
authenticationMethod = SecurityUtil.getAuthenticationMethod(conf);
if (overrideNameRules || !HadoopKerberosName.hasRulesBeenSet()) {
try {
HadoopKerberosName.setConfiguration(conf);
} catch (IOException ioe) {
throw new RuntimeException(
"Problem with Kerberos auth_to_local name configuration", ioe);
}
}
......
}
HadoopKerberosName.setConfiguration is as follows:
public static void setConfiguration(Configuration conf) throws IOException {
final String defaultRule;
switch (SecurityUtil.getAuthenticationMethod(conf)) {
case KERBEROS:
case KERBEROS_SSL:
try {
KerberosUtil.getDefaultRealm();
} catch (Exception ke) {
throw new IllegalArgumentException("Can't get Kerberos realm", ke);
}
......
}
......
}
getDefaultRealm uses reflection in order to support two JDKs: IBM (com.ibm.security.krb5.internal.Config) and Oracle (sun.security.krb5.Config).
public static String getDefaultRealm()
throws ClassNotFoundException, NoSuchMethodException,
IllegalArgumentException, IllegalAccessException,
InvocationTargetException {
Object kerbConf;
Class<?> classRef;
Method getInstanceMethod;
Method getDefaultRealmMethod;
if (System.getProperty("java.vendor").contains("IBM")) {
classRef = Class.forName("com.ibm.security.krb5.internal.Config"); // class reference for the IBM JDK
} else {
classRef = Class.forName("sun.security.krb5.Config"); // class reference for the Oracle JDK
}
getInstanceMethod = classRef.getMethod("getInstance", new Class[0]);
kerbConf = getInstanceMethod.invoke(classRef, new Object[0]);
getDefaultRealmMethod = classRef.getDeclaredMethod("getDefaultRealm", new Class[0]);
return (String)getDefaultRealmMethod.invoke(kerbConf, new Object[0]);
}
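One consequence of the reflective call is worth spelling out: when the invoked method throws, the caller sees an InvocationTargetException and the real failure is only available through getCause(). The following is a minimal sketch of that behaviour, not the Hadoop code itself. FakeKrbConfig is a hypothetical stand-in for the vendor Config class, and the exception message it throws is made up for illustration.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

/** Illustrative only: shows how a failure inside a reflectively invoked method is wrapped. */
final class ReflectiveRealmLookup {

    /** Hypothetical stand-in exposing the two method names the reflection code looks up. */
    public static final class FakeKrbConfig {
        private final String realm;

        private FakeKrbConfig(String realm) {
            this.realm = realm;
        }

        public static FakeKrbConfig getInstance() {
            return new FakeKrbConfig(null); // no realm configured
        }

        public String getDefaultRealm() {
            if (realm == null) {
                throw new IllegalStateException("Cannot locate default realm");
            }
            return realm;
        }
    }

    /** Same shape as the reflective lookup above: getInstance(), then getDefaultRealm(). */
    static String lookupRealm(Class<?> configClass) throws ReflectiveOperationException {
        Method getInstance = configClass.getMethod("getInstance");
        Object config = getInstance.invoke(null); // static method, so no receiver
        Method getDefaultRealm = configClass.getDeclaredMethod("getDefaultRealm");
        return (String) getDefaultRealm.invoke(config);
    }

    /** Unwraps InvocationTargetException so the underlying failure becomes visible. */
    static String describeFailure(Class<?> configClass) {
        try {
            return "realm=" + lookupRealm(configClass);
        } catch (InvocationTargetException e) {
            Throwable cause = e.getCause();
            return cause.getClass().getSimpleName() + ": " + cause.getMessage();
        } catch (ReflectiveOperationException e) {
            return "reflection failed: " + e;
        }
    }
}
```

When reading a "Can't get Kerberos realm" stack trace, the same applies: the "Caused by" entries below the InvocationTargetException are the ones that explain what actually went wrong.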
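From the code above: it first obtains the Config class reference, then getInstanceMethod looks up getInstance and invokes it to obtain the Config object, and finally getDefaultRealmMethod looks up getDefaultRealm and invokes it on that object.
So, assuming we are on the Oracle JDK, the call that is finally made is sun.security.krb5.Config.getDefaultRealm(). Let's look at how sun.security.krb5.Config.getDefaultRealm() is implemented.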
public String getDefaultRealm() throws KrbException {
if(this.defaultRealm != null) { // defaultRealm is already set, return it directly
return this.defaultRealm;
} else { // defaultRealm is null, so try to determine it
KrbException var1 = null;
String var2 = this.getDefault("default_realm", "libdefaults");
if(var2 == null && this.useDNS_Realm()) {
try {
var2 = this.getRealmFromDNS();
} catch (KrbException var4) {
var1 = var4;
}
}
......
}
}
Assume defaultRealm is null, and look at how var2 = this.getRealmFromDNS(); tries to obtain the default realm:
private String getRealmFromDNS() throws KrbException {
String var1 = null;
String var2 = null;
try {
var2 = InetAddress.getLocalHost().getCanonicalHostName(); // 1. get the local host name
} catch (UnknownHostException var7) {
KrbException var4 = new KrbException(60, "Unable to locate Kerberos realm: " + var7.getMessage());
var4.initCause(var7);
throw var4;
}
String var3 = PrincipalName.mapHostToRealm(var2); // 2. map the local host name to a realm
....
}
The mapHostToRealm() method is as follows:
static String mapHostToRealm(String var0) {
String var1 = null;
try {
String var2 = null;
Config var3 = Config.getInstance(); // get the Config singleton
if((var1 = var3.getDefault(var0, "domain_realm")) != null) {
return var1;
}
.......
} catch (KrbException var5) {
;
}
return var1;
}
This obtains the Config singleton:
public static synchronized Config getInstance() throws KrbException {
if(singleton == null) {
singleton = new Config();
}
return singleton;
}
Config.getInstance() simply checks whether the singleton is null: if it is not null, the existing object is returned; if it is null, a new Config is constructed.
The Config class also has a refresh method:
public static synchronized void refresh() throws KrbException {
singleton = new Config();
KdcComm.initStatic();
}
As the code shows, every call to refresh() creates a new Config singleton. This refresh() method is also the one our own code is supposed to call.
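The difference between getInstance() and refresh() can be reproduced with a few lines. This is a minimal sketch of the caching pattern only. CachedConfig and its setSource hook are hypothetical names, and the Supplier stands in for reading krb5.conf.

```java
import java.util.function.Supplier;

/** Illustrative only: a lazily created singleton that snapshots its source once. */
final class CachedConfig {
    private static CachedConfig singleton;
    // Hypothetical stand-in for "read the realm from krb5.conf".
    private static Supplier<String> source = () -> null;

    private final String realm;

    private CachedConfig() {
        this.realm = source.get(); // the source is read once, at construction time
    }

    static synchronized void setSource(Supplier<String> newSource) {
        source = newSource;
    }

    /** Returns the cached instance; the source is NOT re-read if one already exists. */
    static synchronized CachedConfig getInstance() {
        if (singleton == null) {
            singleton = new CachedConfig();
        }
        return singleton;
    }

    /** Always builds a new instance, picking up whatever the source returns now. */
    static synchronized void refresh() {
        singleton = new CachedConfig();
    }

    String realm() {
        return realm;
    }
}
```

If the first instance is built while the source yields nothing, getInstance() keeps returning that empty instance even after the source has recovered. Only refresh() picks up the new state.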
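Recall our original code:
org.apache.hadoop.security.UserGroupInformation.setConfiguration(conf);
sun.security.krb5.Config.refresh();
Back to getInstance(): if singleton is null, a Config singleton is created. From then on, every call to getInstance returns that same object and there is no further chance to construct a new one. One might object: is there really no chance to construct a new Config? Doesn't calling Config.refresh() do exactly that? It does, but if UserGroupInformation.setConfiguration(conf) throws an exception, Config.refresh() is never called. That is exactly where our error comes from. How UserGroupInformation.setConfiguration(conf) comes to throw is analyzed below.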
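The effect of the statement order can be shown without any Kerberos classes. The sketch below is only an illustration: Step, configureThenRefresh and refreshThenConfigure are hypothetical names, and calling refresh first is one possible rearrangement, not something the original code or the libraries prescribe.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: records which of two steps actually ran. */
final class CallOrderSketch {

    /** A step that may fail, standing in for setConfiguration(conf) or Config.refresh(). */
    interface Step {
        void run() throws Exception;
    }

    /** Same order as the original code: refresh is skipped when setConfiguration throws. */
    static List<String> configureThenRefresh(Step setConfiguration, Step refresh) {
        List<String> ran = new ArrayList<>();
        try {
            setConfiguration.run();
            ran.add("setConfiguration");
            refresh.run();
            ran.add("refresh");
        } catch (Exception e) {
            ran.add("failed: " + e.getMessage());
        }
        return ran;
    }

    /** Possible alternative: refresh first, so a stale singleton is rebuilt before it is used. */
    static List<String> refreshThenConfigure(Step setConfiguration, Step refresh) {
        List<String> ran = new ArrayList<>();
        try {
            refresh.run();
            ran.add("refresh");
            setConfiguration.run();
            ran.add("setConfiguration");
        } catch (Exception e) {
            ran.add("failed: " + e.getMessage());
        }
        return ran;
    }
}
```

With a setConfiguration step that always throws, the first variant never records "refresh", while the second records it before the failure. Whether reordering is acceptable in a real service depends on what else relies on the existing Config instance.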
Now let's look at what new Config() actually does.
private Config() throws KrbException {
String var1 = getProperty("java.security.krb5.kdc"); // read the KDC address from the system property; assume it was not set when the JVM was started
if(var1 != null) {
this.defaultKDC = var1.replace(':', ' ');
} else {
this.defaultKDC = null;
}
this.defaultRealm = getProperty("java.security.krb5.realm"); // read the realm from the system property; assume it was not set either
if((this.defaultKDC != null || this.defaultRealm == null) && (this.defaultRealm != null || this.defaultKDC == null)) {
try {
String var3 = this.getJavaFileName(); // looks for krb5.conf via the JVM parameter java.security.krb5.conf and then <java-home>/lib/security/krb5.conf
Vector var2;
if(var3 != null) {
var2 = this.loadConfigFile(var3);
this.stanzaTable = this.parseStanzaTable(var2);
if(DEBUG) {
System.out.println("Loaded from Java config");
}
} else { // assume neither java.security.krb5.conf nor <java-home>/lib/security/krb5.conf yielded a krb5.conf file
boolean var4 = false;
if(isMacosLionOrBetter()) {
try {
this.stanzaTable = SCDynamicStoreConfig.getConfig();
if(DEBUG) {
System.out.println("Loaded from SCDynamicStoreConfig");
}
var4 = true;
} catch (IOException var6) {
;
}
}
if(!var4) {
var3 = this.getNativeFileName(); // we are on a CentOS machine, so this returns /etc/krb5.conf
var2 = this.loadConfigFile(var3); // load /etc/krb5.conf
this.stanzaTable = this.parseStanzaTable(var2);
if(DEBUG) {
System.out.println("Loaded from native config");
}
}
}
} catch (IOException var7) {
;
}
} else {
throw new KrbException("System property java.security.krb5.kdc and java.security.krb5.realm both must be set or neither must be set.");
}
}
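Our problem lies at var2 = this.loadConfigFile(var3);. When /etc/krb5.conf was being loaded, the file happened not to exist: we replace /etc/krb5.conf with a modified krb5.conf, and loadConfigFile() ran inside the window in which the replacement was in progress, so it threw a FileNotFoundException. Note that the constructor above catches this IOException and discards it, so the Config singleton is still created, only without any configuration loaded. getDefaultRealm() then cannot find a realm, the failure surfaces at the UserGroupInformation.setConfiguration(conf) call as "Can't get Kerberos realm", and the Config.refresh() on the following line is never reached. Because the empty singleton stays cached, every later attempt fails the same way.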
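Since the failure window is the moment during which /etc/krb5.conf does not exist, one way to avoid it is to never remove the file and instead swap the new content in with an atomic rename. The sketch below is an assumption about how the deployment step could be written, not a description of what our tooling does. AtomicConfigReplace and replaceAtomically are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Illustrative only: replace a config file without a window in which it is missing. */
final class AtomicConfigReplace {

    static void replaceAtomically(Path target, String newContent) throws IOException {
        // The temporary file must live in the same directory (same file system),
        // otherwise an atomic move is not possible.
        Path dir = target.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "krb5", ".tmp");
        try {
            Files.write(tmp, newContent.getBytes(StandardCharsets.UTF_8));
            // On POSIX systems this maps to rename(2), which replaces an existing target.
            // The Files.move documentation calls this behaviour implementation specific.
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            Files.deleteIfExists(tmp); // no-op after a successful move
        }
    }
}
```

A reader that opens the file at any moment sees either the old or the new content. Note that createTempFile creates the file with restrictive permissions, so a real deployment would probably need to set the permissions krb5.conf is expected to have before the move.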
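2. Error: com.google.common.util.concurrent.UncheckedTimeoutException: java.util.concurrent.TimeoutException
Root cause analysis: this exception turned up while debugging the error above, so its cause is analyzed here as well.
The error above was Can't get Kerberos realm. A quick search online suggested that it is roughly because the KDC and realm cannot be obtained.
So I added the following three parameters to the JVM startup options: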
-Djava.security.krb5.conf=/etc/krb5.conf \
-Djava.security.krb5.kdc=node1:8080 \
-Djava.security.krb5.realm=KFC.com \
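These specify the krb5.conf file, the KDC address, and the realm. After restarting, the program worked normally, and I then deleted /etc/krb5.conf (I had already guessed that the previous error was caused by krb5.conf being unreadable).
Unexpectedly, the program then reported java.util.concurrent.TimeoutException, so I took a thread dump with jstack.
The jstack output for the TimeoutException is as follows: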
"builtin-checker-serviceId-58" prio=10 tid=0x00007f678800e800 nid=0x4084 waiting for monitor entry [0x00007f672fffe000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytabAndReturnUGI(UserGroupInformation.java:1074)
- waiting to lock <0x00000000a8b940d0> (a java.lang.Class for org.apache.hadoop.security.UserGroupInformation)
......
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
The call to UserGroupInformation.loginUserFromKeytabAndReturnUGI is blocked.
Looking further up in the jstack output:
"builtin-checker-serviceId-59" prio=10 tid=0x00007f67680b3800 nid=0x4097 runnable [0x00007f672f2ee000]
java.lang.Thread.State: RUNNABLE
at java.net.PlainDatagramSocketImpl.receive0(Native Method)
- locked <0x000000009a0076e0> (a java.net.PlainDatagramSocketImpl)
at java.net.AbstractPlainDatagramSocketImpl.receive(AbstractPlainDatagramSocketImpl.java:146)
- locked <0x000000009a0076e0> (a java.net.PlainDatagramSocketImpl)
at java.net.DatagramSocket.receive(DatagramSocket.java:816)
- locked <0x000000009a017848> (a java.net.DatagramPacket)
- locked <0x000000009a0076a0> (a java.net.DatagramSocket)
at sun.security.krb5.internal.UDPClient.receive(NetClient.java:207) // stuck here
at sun.security.krb5.KdcComm$KdcCommunication.run(KdcComm.java:390)
at sun.security.krb5.KdcComm$KdcCommunication.run(KdcComm.java:343)
at java.security.AccessController.doPrivileged(Native Method)
at sun.security.krb5.KdcComm.send(KdcComm.java:327)
at sun.security.krb5.KdcComm.send(KdcComm.java:219)
at sun.security.krb5.KdcComm.send(KdcComm.java:191)
at sun.security.krb5.KrbAsReqBuilder.send(KrbAsReqBuilder.java:319)
at sun.security.krb5.KrbAsReqBuilder.action(KrbAsReqBuilder.java:364)
at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:735)
at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:584)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at javax.security.auth.login.LoginContext.invoke(LoginContext.java:762)
at javax.security.auth.login.LoginContext.access$000(LoginContext.java:203)
at javax.security.auth.login.LoginContext$4.run(LoginContext.java:690)
at javax.security.auth.login.LoginContext$4.run(LoginContext.java:688)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:687)
at javax.security.auth.login.LoginContext.login(LoginContext.java:595)
at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytabAndReturnUGI(UserGroupInformation.java:1092)
- locked <0x00000000a8b940d0> (a java.lang.Class for org.apache.hadoop.security.UserGroupInformation)
........
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
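The two thread dumps fit together: serviceId-59 holds the monitor of the UserGroupInformation class while waiting on the network, and serviceId-58 is BLOCKED waiting for that same monitor (0x00000000a8b940d0). A caller that only waits a limited time for the blocked task then sees a TimeoutException. Here is a minimal sketch of that situation. ClassLockContention and login are hypothetical names, and a CountDownLatch stands in for the network reply that never arrives.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Illustrative only: one slow holder of a class monitor blocks every other caller. */
final class ClassLockContention {

    /** Stand-in for a static synchronized login method that waits for a network reply. */
    static synchronized String login(Runnable onEnter, CountDownLatch networkReply)
            throws InterruptedException {
        onEnter.run();
        networkReply.await(); // simulates waiting in UDPClient.receive
        return "logged in";
    }

    /** Returns true if the second caller timed out while the first held the class lock. */
    static boolean secondCallerTimesOut(long timeoutMillis) throws Exception {
        CountDownLatch entered = new CountDownLatch(1);
        CountDownLatch reply = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<String> first = pool.submit(() -> login(entered::countDown, reply));
            entered.await(); // the first thread now owns the class monitor
            Future<String> second = pool.submit(() -> login(() -> { }, new CountDownLatch(0)));
            try {
                second.get(timeoutMillis, TimeUnit.MILLISECONDS);
                return false;
            } catch (TimeoutException e) {
                return true; // the second thread is BLOCKED on the monitor
            } finally {
                reply.countDown(); // let the first call finish so both tasks complete
                first.get();
                second.get();
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

The second task does no slow work of its own. It times out purely because it cannot enter the synchronized method, which matches what the first jstack entry shows.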
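The jstack output shows that UDPClient.receive is stuck, but not why. A more experienced colleague suggested adding the JVM debug parameter -Dsun.security.krb5.debug=true, which prints Kerberos debug logs to the console. The console then showed the following: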
Ordering keys wrt default_tkt_enctypes list
default etypes for default_tkt_enctypes: 3 1 16.
default etypes for default_tkt_enctypes: 3 1 16.
>>> KrbAsReq creating message
>>> KrbKdcReq send: kdc=node1 UDP:88, timeout=30000, number of retries =3, #bytes=134
>>> KDCCommunication: kdc=node1 UDP:88, timeout=30000,Attempt =1, #bytes=134
SocketTimeOutException with attempt: 1
>>> KDCCommunication: kdc=node1 UDP:88, timeout=30000,Attempt =2, #bytes=134
SocketTimeOutException with attempt: 2
>>> KDCCommunication: kdc=node1 UDP:88, timeout=30000,Attempt =3, #bytes=134
SocketTimeOutException with attempt: 3
>>> KrbKdcReq send: error trying node1
java.net.SocketTimeoutException: Receive timed out
at java.net.PlainDatagramSocketImpl.receive0(Native Method)
at java.net.AbstractPlainDatagramSocketImpl.receive(AbstractPlainDatagramSocketImpl.java:146)
at java.net.DatagramSocket.receive(DatagramSocket.java:816)
at sun.security.krb5.internal.UDPClient.receive(NetClient.java:207)
at sun.security.krb5.KdcComm$KdcCommunication.run(KdcComm.java:390)
at sun.security.krb5.KdcComm$KdcCommunication.run(KdcComm.java:343)
at java.security.AccessController.doPrivileged(Native Method)
at sun.security.krb5.KdcComm.send(KdcComm.java:327)
at sun.security.krb5.KdcComm.send(KdcComm.java:219)
at sun.security.krb5.KdcComm.send(KdcComm.java:191)
at sun.security.krb5.KrbAsReqBuilder.send(KrbAsReqBuilder.java:319)
at sun.security.krb5.KrbAsReqBuilder.action(KrbAsReqBuilder.java:364)
at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:735)
at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:584)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at javax.security.auth.login.LoginContext.invoke(LoginContext.java:762)
at javax.security.auth.login.LoginContext.access$000(LoginContext.java:203)
at javax.security.auth.login.LoginContext$4.run(LoginContext.java:690)
at javax.security.auth.login.LoginContext$4.run(LoginContext.java:688)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:687)
at javax.security.auth.login.LoginContext.login(LoginContext.java:595)
at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytabAndReturnUGI(UserGroupInformation.java:1092)
........
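The log shows three attempts with a 30000 ms timeout each, so a single login can hold the lock for about 90 seconds before giving up on this KDC. The retry pattern itself is easy to reproduce with a plain UDP socket. The sketch below is an illustration under that assumption and is not the JDK's KdcComm code. UdpRetrySketch and its method names are hypothetical.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.SocketTimeoutException;

/** Illustrative only: send a datagram and wait for a reply, with a timeout per attempt. */
final class UdpRetrySketch {

    /** Returns how many attempts timed out before a reply arrived or retries ran out. */
    static int countTimeouts(InetAddress host, int port, int timeoutMillis, int retries)
            throws IOException {
        int timeouts = 0;
        byte[] request = {0};
        for (int attempt = 1; attempt <= retries; attempt++) {
            try (DatagramSocket socket = new DatagramSocket()) {
                socket.setSoTimeout(timeoutMillis); // receive() gives up after this long
                socket.send(new DatagramPacket(request, request.length, host, port));
                socket.receive(new DatagramPacket(new byte[512], 512));
                return timeouts; // a reply arrived
            } catch (SocketTimeoutException e) {
                timeouts++; // nothing answered on that port within the timeout
            }
        }
        return timeouts;
    }

    /** Upper bound on the time spent waiting for one unresponsive endpoint. */
    static long worstCaseMillis(int timeoutMillis, int retries) {
        return (long) timeoutMillis * retries;
    }
}
```

With the values from the log, worstCaseMillis(30000, 3) is 90000 ms, which is much longer than a typical time limit on the calling side.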
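The log shows that the client connected to the KDC on the default port 88, but our KDC's port had been changed to 1088, so the connection failed and timed out. I have heard that there is no parameter for setting the KDC port, though I am not sure whether that is true. In any case, specifying a port in -Djava.security.krb5.kdc had no effect. A likely reason is visible in the Config constructor shown above: it calls replace(':', ' ') on the property value, so a host:port pair is split apart instead of being kept as a port.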
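What the constructor's replace(':', ' ') step does to the value from the JVM options can be checked directly. The sketch below mirrors only that one line. Treating the result as a whitespace-separated list of KDC entries is an assumption about how the string is consumed later, and KdcPropertySketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

/** Illustrative only: what happens to a host:port value in java.security.krb5.kdc. */
final class KdcPropertySketch {

    static List<String> kdcEntries(String kdcProperty) {
        List<String> entries = new ArrayList<>();
        if (kdcProperty == null) {
            return entries;
        }
        // Same transformation as in the Config constructor shown earlier.
        String defaultKdc = kdcProperty.replace(':', ' ');
        // Assumption: the resulting string is read as a whitespace-separated list.
        StringTokenizer tokens = new StringTokenizer(defaultKdc);
        while (tokens.hasMoreTokens()) {
            entries.add(tokens.nextToken());
        }
        return entries;
    }
}
```

Under that assumption, node1:8080 turns into two separate entries, node1 and 8080, and neither carries a port. This would be consistent with the log line kdc=node1 UDP:88. If a non-default port is needed, the kdc entry under [realms] in krb5.conf may be the place to try, although this was not verified here.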
References: https://docs.oracle.com/javase/7/docs/technotes/guides/security/jgss/tutorials/KerberosReq.html and the source code
