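I. Basic function of DNS
On the Internet, translating a domain name into an IP address is a basic function. I had long wanted to look at the server-side configuration with the popular DNS server bind, and recently found the time to write up a summary. For an application (that is, the client), the name service is used mainly through the C library's gethostbyname function. Its implementation is fairly complex: the glibc source root has a dedicated resolv directory for this resolution logic, and the most important part of it is gethostbyname and its related family of functions.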
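II. How a DNS server works
Anyone with some knowledge of how DNS works has probably heard that a lookup may take several rounds of queries. Take www.gnu.org as an example: the canonical lookup first asks a root server for the address of the org name servers, then asks an org name server for the gnu.org name servers, and finally asks a gnu.org name server for the IP address of the host www.gnu.org. This description leaves out some important details. Who actually performs these recursive/iterative queries: the client's gethostbyname function, or a DNS server? If a name server does it, which one? Why do some name servers only hand back the address of another name server, while others keep digging until they find the final IP address? And if gethostbyname did the work on the client, would a packet capture on the client show many round trips?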
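1. Who performs the iteration
A rough analysis: having each client issue the successive queries itself is wasteful, because the queries are scattered across clients and no DNS server ever sees the final result, which makes caching hard. The better strategy is to let a DNS server perform the iterative queries and cache the results so that other clients can reuse them.
2. Who will not iterate on the client's behalf
The article quoted below makes an important point: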
STEP 3: As the answer for the query is not available with the DNS server 172.16.200.30, this server sends a query to one of the DNS root server,for the answer. Now an important fact to note here is that root server's are always iterative servers.
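In other words, root servers are always iterative: for www.gnu.org, a root server only returns the name servers for the org domain, and whoever sent the query must then issue a new query to one of the returned servers.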
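The referral chain described above can be sketched as a toy simulation. This is only an illustration: the delegation table, the server names and the address 192.0.2.10 are invented, and no network traffic is involved.

```python
# Toy model of iterative resolution: every "server" answers from its own
# local data only and never queries anyone else.
# All server names and addresses here are made up for illustration.
SERVERS = {
    "root": {"referrals": {"org.": "org-server"}, "answers": {}},
    "org-server": {"referrals": {"gnu.org.": "gnu-server"}, "answers": {}},
    "gnu-server": {"referrals": {}, "answers": {"www.gnu.org.": "192.0.2.10"}},
}


def ask(server, qname):
    """Answer from local data only: a final address or a referral."""
    data = SERVERS[server]
    if qname in data["answers"]:
        return ("answer", data["answers"][qname])
    # Pick the longest delegated zone that is a suffix of the query name.
    matches = [zone for zone in data["referrals"]
               if qname == zone or qname.endswith("." + zone)]
    if matches:
        return ("referral", data["referrals"][max(matches, key=len)])
    return ("nxdomain", None)


def resolve_iteratively(qname, start="root", max_steps=10):
    """Follow referrals from the root until an answer (or a failure)."""
    server, path = start, []
    for _ in range(max_steps):
        path.append(server)
        kind, value = ask(server, qname)
        if kind == "answer":
            return value, path
        if kind == "nxdomain":
            return None, path
        server = value  # a referral: repeat the same query at the next server
    return None, path


address, path = resolve_iteratively("www.gnu.org.")
print(address, path)
```

Each call to ask() stands for one query/response round trip; whoever runs resolve_iteratively() is the party doing the iteration.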
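3. The cost of a recursive server
When recursion is needed, the DNS server has to run an asynchronous flow: when a client request arrives, the server must record the request information (client address, port, requested name, and so on) while it keeps issuing iterative queries to other DNS servers. This is a significant cost for the server, because asynchronous handling requires per-request state, and state takes memory.
Now look at the root servers again. They only ever provide iterative service, which means they keep no such asynchronous state: when a query arrives, they only need to reply with which server to ask next, and that reply comes entirely from local data. Given the query volume the root servers handle, this design is reasonable.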
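The per-request state mentioned above can be pictured as a table keyed by the ID of the outgoing query. The sketch below is only a mental model: the class and field names are hypothetical and are not taken from bind.

```python
from dataclasses import dataclass


@dataclass
class PendingQuery:
    """State a recursive server must keep while a lookup is in flight."""
    client_addr: str
    client_port: int
    client_query_id: int
    qname: str


class RecursiveServerState:
    """Hypothetical bookkeeping for outstanding recursive lookups."""

    def __init__(self):
        self.pending = {}   # upstream query id -> PendingQuery
        self.next_id = 1

    def on_client_request(self, addr, port, query_id, qname):
        """Remember who asked, then return the id used for the upstream query."""
        upstream_id = self.next_id
        self.next_id += 1
        self.pending[upstream_id] = PendingQuery(addr, port, query_id, qname)
        return upstream_id

    def on_upstream_reply(self, upstream_id, answer):
        """Match a reply to the waiting client and release its state."""
        waiting = self.pending.pop(upstream_id)
        return (waiting.client_addr, waiting.client_port,
                waiting.client_query_id, waiting.qname, answer)


state = RecursiveServerState()
uid = state.on_client_request("10.53.0.2", 40000, 64638, "www.gnu.org.")
in_flight = len(state.pending)
reply = state.on_upstream_reply(uid, "192.0.2.10")
print(in_flight, reply, len(state.pending))
```

A purely iterative server needs no such table: it can build each reply from local data and forget the request immediately.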
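III. Protocol message format
A description of the message format is available online; RFC 1035, section 4.1, defines the same layout.
1. Request message
Whether to recurse is controlled by a dedicated flag in the request header, RD (Recursion Desired). If the client sets it, the client is asking the DNS server to recurse, that is, to return the final IP address for the name. This is only a request from the client; whether the server honors it depends on the server's configuration. In current bind, the recursion setting in the options block of the configuration file decides whether recursion is performed. If recursion is disabled, the server will not iterate even when the request has RD set. Conversely, if the client does not set RD, the server will not iterate on its behalf regardless of its configuration.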
options {
    query-source address 9.9.9.9;
    port 53;
    pid-file "named.pid";
    listen-on {9.9.9.9;};
    listen-on-v6 {none;};
    recursion yes;
    notify yes;
};
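To make the RD flag concrete, here is a minimal sketch that builds a query message by hand, following the header layout in RFC 1035 section 4.1.1. The function name build_query and the query ID 0x1234 are chosen for illustration only; the sketch handles plain ASCII names and a single A/IN question.

```python
import struct

RD_FLAG = 0x0100  # "Recursion Desired" bit in the 16-bit flags field


def build_query(qname, query_id, recursion_desired=True):
    """Build a minimal DNS query for an A record in class IN."""
    flags = RD_FLAG if recursion_desired else 0
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    question = b""
    for label in qname.rstrip(".").split("."):
        question += bytes([len(label)]) + label.encode("ascii")
    question += b"\x00"                     # root label terminates the name
    question += struct.pack("!HH", 1, 1)    # QTYPE=A, QCLASS=IN
    return header + question


with_rd = build_query("www.gnu.org", 0x1234, recursion_desired=True)
without_rd = build_query("www.gnu.org", 0x1234, recursion_desired=False)
print(with_rd[:4].hex(), without_rd[:4].hex())
```

The two messages differ only in the flags field, which corresponds to the difference between dig +recurse and dig +norecurse.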
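2. Response message
In the message format, every response carries an "Authority count" field and an "Authority" section. The section names the authoritative servers for the query, which in practice are the servers the client should query next. If you query a DNS server with recursion disabled (RD cleared), the Authority section may be the only one with useful content in the response: it tells the requester where to send the next query.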
Response section is empty in general,
Authority section contains NS records describing "a better zone server" for next iteration. (Worst case is root zone)
Additional section contains if known addresses for servers described in authority section.
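A referral can be recognized from the fixed 12-byte header alone: no answers, but a non-empty authority section. The sketch below decodes a hand-built header; its values mirror the dig output for the +norecurse test later in this article (id 20342, flags qr ra ad, 13 authority and 14 additional records). The function name and the dictionary keys are my own choice.

```python
import struct


def parse_header(message):
    """Decode the fixed 12-byte DNS header (RFC 1035 section 4.1.1)."""
    ident, flags, qd, an, ns, ar = struct.unpack("!HHHHHH", message[:12])
    return {
        "id": ident,
        "qr": bool(flags & 0x8000),   # 1 = response
        "rd": bool(flags & 0x0100),   # recursion desired
        "ra": bool(flags & 0x0080),   # recursion available
        "rcode": flags & 0x000F,      # 0 = NOERROR
        "question": qd,
        "answer": an,
        "authority": ns,
        "additional": ar,
    }


# Hand-built header: flags 0x80A0 = QR | RA | AD.
sample = struct.pack("!HHHHHH", 20342, 0x80A0, 1, 0, 13, 14)
header = parse_header(sample)
is_referral = header["answer"] == 0 and header["authority"] > 0
print(header, is_referral)
```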
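IV. Single-machine test
You can start a named server on the local machine and use the dig command to see how the server handles requests in different situations.
0. Base configuration files
/etc/named.conf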
options {
    query-source address 9.9.9.9;
    port 53;
    pid-file "named.pid";
    listen-on {127.0.0.1;};
    listen-on-v6 {none;};
    recursion yes;
    notify yes;
};
view "internal" {
    match-clients { 10.53.0.2; 10.53.0.3; };
    zone "." {
        type hint;
        file "root.hint";
    };
    zone "example" {
        type master;
        file "internal.db";
        allow-update { any; };
    };
};
view "external" {
    match-clients { any; };
    zone "." {
        type hint;
        file "root.hint";
    };
    zone "example" {
        type master;
        file "example.db";
    };
};
/etc/root.hint
; Copyright (C) 2000 Internet Software Consortium.
;
; Permission to use, copy, modify, and distribute this software for any
; purpose with or without fee is hereby granted, provided that the above
; copyright notice and this permission notice appear in all copies.
;
; THE SOFTWARE IS PROVIDED "AS IS" AND INTERNET SOFTWARE CONSORTIUM DISCLAIMS
; ALL WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES
; OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL INTERNET SOFTWARE
; CONSORTIUM BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL
; DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR
; PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS
; ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS
; SOFTWARE.
; $Id: root.hint,v 1.3 2000/06/22 21:52:55 tale Exp $
$TTL 999999
. IN NS a.root-servers.nil.
a.root-servers.nil. IN A 10.12.216.180
1. Server recursion enabled, client sets RD
Run the dig command:
tsecer@harry: dig @127.0.0.1 www.no.exist.test
; <<>> DiG 9.9.4-RedHat-9.9.4-29.el7_2.2 <<>> @127.0.0.1 www.no.exist.test
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 64638
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 2048
;; QUESTION SECTION:
;www.no.exist.test. IN A
;; Query time: 38 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Fri Jul 03 15:14:57 CST 2020
;; MSG SIZE rcvd: 46
Packet capture output:
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes
15:14:57.735988 IP (tos 0x0, ttl 64, id 34993, offset 0, flags [none], proto UDP (17), length 74)
VM_15_187_centos.52918 > VM_15_187_centos.domain: [bad udp cksum 0xfe49 -> 0x0b9e!] 64638+ [1au] A? www.no.exist.test. ar: . OPT UDPsize=4096 (46)
15:14:57.736249 IP (tos 0x0, ttl 64, id 55832, offset 0, flags [DF], proto UDP (17), length 74)
9.9.9.9.55234 > 10.12.216.180.domain: [udp sum ok] 52155% [1au] A? www.no.exist.test. ar: . OPT UDPsize=2048 (46)
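As the capture shows, the local server sent a query to 10.12.216.180, the address configured in bind's root hint file.
2. Server recursion enabled, client clears RD
The AUTHORITY SECTION and ADDITIONAL SECTION of the response contain the names and IP addresses of the authoritative servers:
tsecer@harry: dig +norecurse @127.0.0.1 www.no.exist.test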
; <<>> DiG 9.9.4-RedHat-9.9.4-29.el7_2.2 <<>> +norecurse @127.0.0.1 www.no.exist.test
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 20342
;; flags: qr ra ad; QUERY: 1, ANSWER: 0, AUTHORITY: 13, ADDITIONAL: 14
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 2048
;; QUESTION SECTION:
;www.no.exist.test. IN A
;; AUTHORITY SECTION:
. 60095 IN NS k.root-servers.net.
. 60095 IN NS h.root-servers.net.
. 60095 IN NS j.root-servers.net.
. 60095 IN NS i.root-servers.net.
. 60095 IN NS e.root-servers.net.
. 60095 IN NS c.root-servers.net.
. 60095 IN NS a.root-servers.net.
. 60095 IN NS m.root-servers.net.
. 60095 IN NS l.root-servers.net.
. 60095 IN NS b.root-servers.net.
. 60095 IN NS g.root-servers.net.
. 60095 IN NS d.root-servers.net.
. 60095 IN NS f.root-servers.net.
;; ADDITIONAL SECTION:
f.root-servers.net. 31823 IN A 192.5.5.241
k.root-servers.net. 28825 IN A 193.0.14.129
h.root-servers.net. 29235 IN A 198.97.190.53
j.root-servers.net. 33803 IN A 192.58.128.30
i.root-servers.net. 39413 IN A 192.36.148.17
e.root-servers.net. 55941 IN A 192.203.230.10
c.root-servers.net. 38867 IN A 192.33.4.12
a.root-servers.net. 63886 IN A 198.41.0.4
m.root-servers.net. 28815 IN A 202.12.27.33
l.root-servers.net. 59789 IN A 199.7.83.42
b.root-servers.net. 36953 IN A 199.9.14.201
g.root-servers.net. 60093 IN A 192.112.36.4
d.root-servers.net. 71961 IN A 199.7.91.13
;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Fri Jul 03 15:19:22 CST 2020
;; MSG SIZE rcvd: 465
tsecer@harry:
3. Server recursion disabled, client sets RD
In the options block of named.conf, set recursion no;
tsecer@harry: dig +recurse @127.0.0.1 www.no.exist.test
; <<>> DiG 9.9.4-RedHat-9.9.4-29.el7_2.2 <<>> +recurse @127.0.0.1 www.no.exist.test
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 42574
;; flags: qr rd ad; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 2048
;; QUESTION SECTION:
;www.no.exist.test. IN A
;; AUTHORITY SECTION:
. 999999 IN NS a.root-servers.nil.
;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Fri Jul 03 15:22:53 CST 2020
;; MSG SIZE rcvd: 77
tsecer@harry:
4. Server recursion disabled, client clears RD
tsecer@harry: dig +norecurse @127.0.0.1 www.no.exist.test
; <<>> DiG 9.9.4-RedHat-9.9.4-29.el7_2.2 <<>> +norecurse @127.0.0.1 www.no.exist.test
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 2502
;; flags: qr ad; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 2048
;; QUESTION SECTION:
;www.no.exist.test. IN A
;; AUTHORITY SECTION:
. 999999 IN NS a.root-servers.nil.
;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Fri Jul 03 15:23:23 CST 2020
;; MSG SIZE rcvd: 77
tsecer@harry:
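V. Notes on the bind source code
This part only records the key flow I traced in a quick debugging session; it is a memo and does not analyze the code in detail.
1. An object representing an asynchronous client
To handle requests asynchronously, the server needs to create a corresponding ns_client_t object when a client query arrives.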
bind9-9_0\bin\named\client.c
/*
* Handle an incoming request event from the dispatch (UDP case)
* or tcpmsg (TCP case).
*/
static void
client_request(isc_task_t *task, isc_event_t *event) {
……
}
2. Client request handling
In DNS terms, a request arriving from a client is called a "query".
bind9-9_0\bin\named\query.c
static void
query_find(ns_client_t *client, dns_fetchevent_t *event) {
……
/*
* Now look for an answer in the database.
*/
result = dns_db_find(db, client->query.qname, version, type,
client->query.dboptions, client->now,
&node, fname, rdataset, sigrdataset);
……
}
3. The DNS server querying a remote server
In bind, a lookup sent to another DNS server is called a fetch. One call chain is:
(gdb) bt
#0 dns_resolver_createfetch (res=0x7ffff7ed90e0, name=0x7ffff7eafb00, type=1, domain=0x7ffff7eafab0, nameservers=0x7fffe80cc310, forwarders=0x0, options=0, task=0x7ffff7ea1c90,
action=0x40f8db <query_resume>, arg=0x7ffff7ec93f0, rdataset=0x7fffe80cc1b0, sigrdataset=0x7ffff7ea7f88, fetchp=0x7ffff7ec95e8) at resolver.c:4720
#1 0x000000000040fe98 in query_recurse (client=0x7ffff7ec93f0, qtype=1, qdomain=0x7ffff7eafab0, nameservers=0x7fffe80cc310) at query.c:1919
#2 0x0000000000410bf3 in query_find (client=0x7ffff7ec93f0, event=0x0) at query.c:2408
#3 0x0000000000411f38 in ns_query_start (client=0x7ffff7ec93f0) at query.c:3055
#4 0x0000000000405bba in client_request (task=0x7ffff7ea1c90, event=0x7fffe80c41b0) at client.c:1106
#5 0x000000000056555d in run (uap=0x7ffff7e9f010) at task.c:799
#6 0x00007ffff79ade25 in start_thread () from /lib64/libpthread.so.0
#7 0x00007ffff76da35d in clone () from /lib64/libc.so.6
(gdb)
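4. How bind matches domain names from back to front
Domain names are special in that they are matched in reverse, so a client's query and the local database entries have to be compared from the last label toward the first. The server does not actually reverse the name; it records the offset of each label in advance and uses those offsets to walk the labels backwards. The code below compares the labels one by one, from back to front: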
bind9-9_0\lib\dns\name.c
dns_namereln_t
dns_name_fullcompare(const dns_name_t *name1, const dns_name_t *name2,
                     int *orderp,
                     unsigned int *nlabelsp, unsigned int *nbitsp)
{
    ……
    l1 = name1->labels;
    l2 = name2->labels;
    ldiff = (int)l1 - (int)l2;
    if (ldiff < 0)
        l = l1;
    else
        l = l2;
    while (l > 0) {
        l--;
        l1--;
        l2--;
        label1 = &name1->ndata[offsets1[l1]];
        label2 = &name2->ndata[offsets2[l2]];
        count1 = *label1++;
        count2 = *label2++;
        ……
    }
    ……
}
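The same right-to-left walk can be sketched in a few lines of Python. This is a simplified model, not a port of dns_name_fullcompare: it works on text labels instead of wire-format offsets, ignores the root label, and the helper names are my own.

```python
def split_labels(name):
    """Return the labels of a domain name, lower-cased, without the root."""
    return [label.lower() for label in name.rstrip(".").split(".") if label]


def common_suffix_labels(name1, name2):
    """Count how many trailing labels two names share, right to left."""
    labels1, labels2 = split_labels(name1), split_labels(name2)
    i1, i2 = len(labels1), len(labels2)
    shared = 0
    # Step both indexes backwards together, like l1-- and l2-- in the C loop.
    while i1 > 0 and i2 > 0:
        i1 -= 1
        i2 -= 1
        if labels1[i1] != labels2[i2]:
            break
        shared += 1
    return shared


def is_subdomain(name, zone):
    """True when every label of zone matches the tail of name."""
    return common_suffix_labels(name, zone) == len(split_labels(zone))


print(common_suffix_labels("www.gnu.org", "ftp.gnu.org"))
print(is_subdomain("www.gnu.org.", "GNU.org"))
```

Comparing from the last label first means a mismatch in the top-level domain is detected immediately, which suits the hierarchical structure of zones.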
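5. Iteration in recursive mode
a. Handling responses from other DNS servers. If the authority section of the response contains records, the function returns DNS_R_DELEGATION: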
static isc_result_t
noanswer_response(fetchctx_t *fctx, dns_name_t *oqname) {
……
/*
* If the current qname is not a subdomain of the query
* domain, there's no point in looking at the authority
* section without doing DNSSEC validation.
*
* Until we do that validation, we'll just return success
* in this case.
*/
if (!dns_name_issubdomain(qname, &fctx->domain))
return (ISC_R_SUCCESS);
……
/*
* Set the current query domain to the referral name.
*
* XXXRTH We should check if we're in forward-only mode, and
* if so we should bail out.
*/
INSIST(dns_name_countlabels(&fctx->domain) > 0);
dns_name_free(&fctx->domain, fctx->res->mctx);
if (dns_rdataset_isassociated(&fctx->nameservers))
dns_rdataset_disassociate(&fctx->nameservers);
dns_name_init(&fctx->domain, NULL);
result = dns_name_dup(ns_name, fctx->res->mctx, &fctx->domain);
if (result != ISC_R_SUCCESS)
return (result);
fctx->attributes |= FCTX_ATTR_WANTCACHE;
return (DNS_R_DELEGATION);
}
……
}
b. The iteration process
The call to fctx_try triggers the response handling shown earlier again, which forms an indirect recursive call:
static void
resquery_response(isc_task_t *task, isc_event_t *event) {
……
result = noanswer_response(fctx, NULL);
if (result == DNS_R_DELEGATION) {
/*
* We don't have the answer, but we know a better
* place to look.
*/
get_nameservers = ISC_TRUE;
keep_trying = ISC_TRUE;
result = ISC_R_SUCCESS;
}
……
if (keep_trying) {
……
/*
* Try again.
*/
fctx_try(fctx);
……
}
VI. Related code in glibc
1. How to inspect the __res_state currently used by the C library
tsecer@harry: cat gethost.cpp
#include <netdb.h>
#include <resolv.h>

// Keep a global pointer to the resolver state so that gdb can print it.
struct __res_state *pstste = &_res;

int main()
{
    // Trigger a lookup so that the resolver state gets initialised.
    gethostbyname("www.ridicu.org.not.exit");
    return 0;
}
tsecer@harry: g++ gethost.cpp
tsecer@harry: ./a.out
(gdb) p *pstste
$2 = {retrans = 5, retry = 2, options = 524993, nscount = 3, nsaddr_list = {{sin_family = 2, sin_port = 13568, sin_addr = {s_addr = 1651997450}, sin_zero = "\000\000\000\000\000\000\000"}, {
sin_family = 2, sin_port = 13568, sin_addr = {s_addr = 1844972554}, sin_zero = "\000\000\000\000\000\000\000"}, {sin_family = 2, sin_port = 13568, sin_addr = {s_addr = 1853389578},
sin_zero = "\000\000\000\000\000\000\000"}}, id = 13280, dnsrch = {0x7ffff75b8ac0 <_res@GLIBC_2.2.5+128> "not.exist", 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
defdname = "not.exist", '\000' <repeats 246 times>, pfcode = 0, ndots = 1, nsort = 0, ipv6_unavail = 0, unused = 0, sort_list = {{addr = {s_addr = 0}, mask = 0}, {addr = {s_addr = 0},
mask = 0}, {addr = {s_addr = 0}, mask = 0}, {addr = {s_addr = 0}, mask = 0}, {addr = {s_addr = 0}, mask = 0}, {addr = {s_addr = 0}, mask = 0}, {addr = {s_addr = 0}, mask = 0}, {addr = {
s_addr = 0}, mask = 0}, {addr = {s_addr = 0}, mask = 0}, {addr = {s_addr = 0}, mask = 0}}, qhook = 0x0, rhook = 0x0, res_h_errno = 0, _vcsock = -1, _flags = 0, _u = {
pad = "\000\000\003\000\003\000\003", '\000' <repeats 44 times>, _ext = {nscount = 0, nsmap = {3, 3, 3}, nssocks = {0, 0, 0}, nscount6 = 0, nsinit = 0, nsaddrs = {0x0, 0x0, 0x0},
_initstamp = {0, 0}}}}
(gdb)
2. The effect of search in /etc/resolv.conf
Next, add the following line to /etc/resolv.conf:
search not.exit.foo
The capture now contains one more query, for "www.ridicu.org.not.exit.foo.": the resolver appends a suffix from the search list to the original name and queries again.
20:28:29.840008 IP 9.9.9.9.51252 > 10.123.119.98.domain: 30130+ A? www.ridicu.org.not.exit. (41)
20:28:29.932262 IP 10.123.119.98.domain > 9.9.9.9.51252: 30130 NXDomain 0/1/0 (116)
20:28:29.932375 IP 9.9.9.9.59114 > 10.123.119.98.domain: 6007+ A? www.ridicu.org.not.exit.foo. (45)
20:28:29.980252 IP 10.123.119.98.domain > 9.9.9.9.59114: 6007 NXDomain 0/1/0 (143)
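The order of the two queries in the capture can be modelled with the ndots rule documented for resolv.conf: a name with at least ndots dots is tried as-is first, and the search suffixes are appended afterwards. The sketch below is a simplified model of that rule, not glibc's code; the function name is hypothetical, and the search list ["foo"] is an assumption chosen so that the result matches the names seen in the capture.

```python
def candidate_names(name, search_list, ndots=1):
    """Return the fully qualified names to try, in order."""
    if name.endswith("."):
        return [name]                      # already absolute: no search list
    as_is = name + "."
    with_search = [f"{name}.{domain.rstrip('.')}." for domain in search_list]
    if name.count(".") >= ndots:
        return [as_is] + with_search       # enough dots: try the name first
    return with_search + [as_is]           # short name: search list first


for candidate in candidate_names("www.ridicu.org.not.exit", ["foo"]):
    print(candidate)
```

The default ndots=1 agrees with the ndots = 1 value in the gdb dump of __res_state above.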
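VII. A small detail
In the capture, each name ends with an extra dot. This comes from how tcpdump prints these packets: the name sent in the packet is not the original text form of the domain name but an encoded form. The passage quoted below describes the encoding.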
For example, “www.xyzindustries.com” would be encoded as:
“[3] w w w [13] x y z i n d u s t r i e s [3] c o m [0]”
I have shown the label lengths in square brackets to distinguish them. Remember that these label lengths are binary encoded numbers, so a single byte can hold a value from 0 to 255; that “[13]” is one byte and not two, as you can see in Figure 252. Labels are actually limited to a maximum of 63 characters, and we'll see shortly why this is significant.
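That is, each label starts with a length byte, and the name ends with a zero byte; tools usually display this trailing zero as '.'. RFC 1035 describes the same format for QNAME in section 4.1.2, "Question section format".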
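The encoding is easy to reproduce. The following sketch encodes and decodes the example name from the quotation; it assumes plain ASCII labels and does not handle name compression or the bare root name, and the function names are my own.

```python
def encode_name(name):
    """Encode a dotted domain name as length-prefixed labels."""
    encoded = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 1 <= len(raw) <= 63:
            raise ValueError(f"label length out of range: {label!r}")
        encoded += bytes([len(raw)]) + raw
    return encoded + b"\x00"   # the zero-length root label ends the name


def decode_name(data):
    """Decode an uncompressed wire-format name back to dotted text."""
    labels, pos = [], 0
    while data[pos] != 0:
        length = data[pos]
        labels.append(data[pos + 1:pos + 1 + length].decode("ascii"))
        pos += 1 + length
    return ".".join(labels) + "."


wire = encode_name("www.xyzindustries.com")
print(wire, decode_name(wire))
```

Decoding always yields a trailing dot, because the terminating zero byte is the (empty) root label; this is the dot that shows up in the tcpdump output.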
This encoding also limits the length of a domain name. The post quoted below discusses the limits:
253 characters is the maximum length of full domain name, including dots: e.g. www.example.com = 15 characters.
63 characters in the maximum length of a "label" (part of domain name separated by dot). Labels for www.example.com are com, example and www.
This is an example of the domain with longest possible label (a fully working website BTW): http://www.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijk.com/. The domain name length = 71 characters.
This will be an example of longest domain name: abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcde.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijk.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijk.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijk.com
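Both limits follow from the wire format: a label's length byte cannot exceed 63, and RFC 1035 caps the whole encoded name at 255 octets. Each label costs one length byte and the name ends with one zero byte, so a 253-character text name occupies exactly 255 octets. The sketch below checks this arithmetic; the helper names are hypothetical, and the long name is rebuilt from the alphabet rather than copied from the quotation.

```python
MAX_LABEL = 63    # longest single label, in octets
MAX_WIRE = 255    # longest encoded name, in octets (RFC 1035)


def wire_length(name):
    """Octets needed to encode name: length byte + label each, plus the final 0."""
    labels = name.rstrip(".").split(".")
    return sum(1 + len(label) for label in labels) + 1


def is_valid_length(name):
    """Check the per-label and whole-name length limits."""
    labels = name.rstrip(".").split(".")
    if any(len(label) == 0 or len(label) > MAX_LABEL for label in labels):
        return False
    return wire_length(name) <= MAX_WIRE


alphabet = "abcdefghijklmnopqrstuvwxyz"
label63 = (alphabet * 3)[:63]
label57 = (alphabet * 3)[:57]
longest = ".".join([label57, label63, label63, label63, "com"])
print(len(longest), wire_length(longest), is_valid_length(longest))
```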