When I first learned NIO I only learned how to use it; about its components Selector, Channel and Buffer I only knew that they are three important classes, and I never looked into why it is called NIO or what its advantages are. This post records the details.
1. Basic concepts
Kernel mode: runs kernel code. In kernel mode the code has full control over the hardware: it can execute any CPU instruction and access any memory address. Kernel mode serves the lowest-level, most trusted functions of the operating system; any exception in kernel mode is catastrophic and brings the whole machine down.
User mode: runs user programs. In user mode the code has no direct control over the hardware and can only access its own user-space addresses; programs reach the hardware and memory through system interfaces (system APIs). Under this protection, even if a program crashes it can be recovered. Most programs on your machine run in user mode.
When a program needs to call into the kernel, a mode switch happens: the CPU first saves the user thread's context, switches to kernel mode to run the kernel code, and finally switches back to the application according to the saved context.
File descriptor (fd): a file descriptor is a computer-science term for an abstract handle that refers to a file. Formally it is a non-negative integer; in practice it is an index into the per-process table of open files that the kernel maintains for every process. When a program opens an existing file or creates a new one, the kernel returns a file descriptor to the process. Low-level programming often revolves around file descriptors, but the concept really only applies to UNIX-like operating systems such as Linux.
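To make the fd concept concrete, here is a small sketch of my own (Linux-only, since it relies on /proc/self/fd; the path /tmp/fd-demo.txt is just a placeholder) that opens a regular file plus a listening socket and then lists the process's open file descriptors:

import java.io.FileOutputStream;
import java.net.ServerSocket;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class FdDemo {
    public static void main(String[] args) throws Exception {
        // open a regular file and a listening socket; the kernel assigns an fd to each
        FileOutputStream file = new FileOutputStream("/tmp/fd-demo.txt"); // placeholder path
        ServerSocket server = new ServerSocket(8088);

        // on Linux, /proc/self/fd contains one symlink per open fd of the current process
        try (DirectoryStream<Path> fds = Files.newDirectoryStream(Paths.get("/proc/self/fd"))) {
            for (Path fd : fds) {
                System.out.println("fd " + fd.getFileName() + " -> " + Files.readSymbolicLink(fd));
            }
        }

        file.close();
        server.close();
    }
}

Each entry under /proc/self/fd is a symlink named after the fd number; fds 0, 1 and 2 are stdin, stdout and stderr, and the socket shows up as something like socket:[inode].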
2. BIO test
The BIO test code:
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SocketServer {

    private static final ExecutorService executorService = Executors.newFixedThreadPool(5);

    public static void main(String[] args) throws Exception {
        ServerSocket serverSocket = new ServerSocket(8088);
        System.out.println("serverSocket 8088 start");
        while (true) {
            Socket socket = serverSocket.accept();
            System.out.println("socket.getInetAddress(): " + socket.getInetAddress());
            executorService.execute(new MyThread(socket));
        }
    }

    static class MyThread extends Thread {

        private Socket socket;

        public MyThread(Socket socket) {
            this.socket = socket;
        }

        @Override
        public void run() {
            try {
                InputStream inputStream = socket.getInputStream();
                BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(inputStream));
                while (true) {
                    String s = bufferedReader.readLine();
                    System.out.println(Thread.currentThread().getId() + " 收到的消息\t" + s);
                }
            } catch (Exception exception) {
                // ignore
            } finally {
            }
        }
    }
}
Compile it with JDK 6 and then trace the system calls it makes:
[root@localhost jdk6]# ./jdk1.6.0_06/bin/javac SocketServer.java
[root@localhost jdk6]# strace -ff -o out ./jdk1.6.0_06/bin/java SocketServer
serverSocket 8088 start
strace is a Linux user-space tracer for diagnostics, debugging and teaching. We use it to observe the interaction between a user-space process and the kernel: system calls, signal deliveries, process state changes, and so on.
Several out files are generated, as shown below (one file per thread; by default the JVM starts a number of daemon threads, e.g. for GC or for receiving commands such as jmap):
[root@localhost jdk6]# ll total 64092 drwxr-xr-x. 9 10 143 204 Jul 23 2008 jdk1.6.0_06 -rw-r--r--. 1 root root 64885867 Jul 20 03:33 jdk-6u6-p-linux-x64.tar.gz -rw-r--r--. 1 root root 21049 Jul 20 07:04 out.10685 -rw-r--r--. 1 root root 139145 Jul 20 07:04 out.10686 -rw-r--r--. 1 root root 21470 Jul 20 07:06 out.10687 -rw-r--r--. 1 root root 941 Jul 20 07:04 out.10688 -rw-r--r--. 1 root root 906 Jul 20 07:04 out.10689 -rw-r--r--. 1 root root 985 Jul 20 07:04 out.10690 -rw-r--r--. 1 root root 941 Jul 20 07:04 out.10691 -rw-r--r--. 1 root root 906 Jul 20 07:04 out.10692 -rw-r--r--. 1 root root 941 Jul 20 07:04 out.10693 -rw-r--r--. 1 root root 388112 Jul 20 07:06 out.10694 -rw-r--r--. 1 root root 1433 Jul 20 07:04 SocketServer.class -rw-r--r--. 1 root root 1626 Jul 20 07:03 SocketServer.java -rw-r--r--. 1 root root 1297 Jul 20 07:04 SocketServer$MyThread.class [root@localhost jdk6]# ll | grep out | wc -l 10
1. Find the files containing the socket keyword
[root@localhost jdk6]# grep socket out.*
out.10686:socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 3
out.10686:connect(3, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
out.10686:socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 3
out.10686:connect(3, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
out.10686:socket(AF_INET6, SOCK_STREAM, IPPROTO_IP) = 4
out.10686:getsockname(0, 0x7f64c9083350, [28]) = -1 ENOTSOCK (Socket operation on non-socket)
out.10686:socket(AF_INET6, SOCK_STREAM, IPPROTO_IP) = 4
out.10686:socket(AF_INET6, SOCK_STREAM, IPPROTO_IP) = 5
out.10686:socket(AF_INET6, SOCK_STREAM, IPPROTO_IP) = 4
We can see that out.10686 contains the socket creation, so let's analyse out.10686.
(1) Jump to the end of the file:
We can see that it is blocked on the accept call, which has not returned yet.
(2) Trace back further to see how the socket is created and the bind and listen steps.
The main flow of starting a ServerSocket is:
socket() => 4        (file descriptor)
bind(4, 8088)
listen(4)
accept(4,            blocked, no return value yet
The accept man page is shown below (it accepts a connection on a socket and, on success, returns a non-negative integer for the new connection):
[root@localhost jdk6]# man 2 accept ACCEPT(2) Linux Programmer's Manual ACCEPT(2) NAME accept, accept4 - accept a connection on a socket SYNOPSIS #include <sys/types.h> /* See NOTES */ #include <sys/socket.h> int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen); #define _GNU_SOURCE /* See feature_test_macros(7) */ #include <sys/socket.h> int accept4(int sockfd, struct sockaddr *addr, socklen_t *addrlen, int flags); DESCRIPTION The accept() system call is used with connection-based socket types (SOCK_STREAM, SOCK_SEQPACKET). It extracts the first connection request on the queue of pending connections for the listening socket, sockfd, creates a new connected socket, and returns a new file descriptor referring to that socket. The newly created socket is not in the listening state. The original socket sockfd is unaffected by this call. The argument sockfd is a socket that has been created with socket(2), bound to a local address with bind(2), and is listening for connec‐ tions after a listen(2) RETURN VALUE On success, these system calls return a nonnegative integer that is a descriptor for the accepted socket. On error, -1 is returned, and errno is set appropriately.
2. Use the nc command to simulate a client connection
[root@localhost jdk6]# nc localhost 8088
(1) On the server side one more out file has been generated:
[root@localhost jdk6]# ll total 72712 drwxr-xr-x. 9 10 143 204 Jul 23 2008 jdk1.6.0_06 -rw-r--r--. 1 root root 64885867 Jul 20 03:33 jdk-6u6-p-linux-x64.tar.gz -rw-r--r--. 1 root root 21049 Jul 20 07:04 out.10685 -rw-r--r--. 1 root root 141155 Jul 20 07:32 out.10686 -rw-r--r--. 1 root root 369445 Jul 20 07:33 out.10687 -rw-r--r--. 1 root root 941 Jul 20 07:04 out.10688 -rw-r--r--. 1 root root 906 Jul 20 07:04 out.10689 -rw-r--r--. 1 root root 985 Jul 20 07:04 out.10690 -rw-r--r--. 1 root root 941 Jul 20 07:04 out.10691 -rw-r--r--. 1 root root 906 Jul 20 07:04 out.10692 -rw-r--r--. 1 root root 941 Jul 20 07:04 out.10693 -rw-r--r--. 1 root root 7103157 Jul 20 07:33 out.10694 -rw-r--r--. 1 root root 1266 Jul 20 07:32 out.10866 -rw-r--r--. 1 root root 1433 Jul 20 07:04 SocketServer.class -rw-r--r--. 1 root root 1626 Jul 20 07:03 SocketServer.java -rw-r--r--. 1 root root 1297 Jul 20 07:04 SocketServer$MyThread.class [root@localhost jdk6]# ll | grep out | wc -l 11
(2) Check the accept return value in out.10686 (we can see that once the connection is accepted it returns a new file descriptor, 6 here, which will then be used by recvfrom to read data).
We can also see that clone creates the thread that handles this connection; in other words, at the kernel level threads are created with the clone call. Linux has no threads in the strict sense: the kernel has no dedicated structure for threads, so its threads are simulated with processes, i.e. multiple processes sharing one address space.
The clone man page is shown below (it is similar to fork, which creates a child process):
man 2 clone CLONE(2) Linux Programmer's Manual CLONE(2) NAME clone, __clone2 - create a child process SYNOPSIS /* Prototype for the glibc wrapper function */ #include <sched.h> int clone(int (*fn)(void *), void *child_stack, int flags, void *arg, ... /* pid_t *ptid, struct user_desc *tls, pid_t *ctid */ ); /* Prototype for the raw system call */ long clone(unsigned long flags, void *child_stack, void *ptid, void *ctid, struct pt_regs *regs); Feature Test Macro Requirements for glibc wrapper function (see feature_test_macros(7)): clone(): Since glibc 2.14: _GNU_SOURCE Before glibc 2.14: _BSD_SOURCE || _SVID_SOURCE /* _GNU_SOURCE also suffices */ DESCRIPTION clone() creates a new process, in a manner similar to fork(2).
(3) Look at file out.10866:
We can see this thread is blocked in the recvfrom call. The recvfrom man page:
man 2 recvfrom RECV(2) Linux Programmer's Manual RECV(2) NAME recv, recvfrom, recvmsg - receive a message from a socket SYNOPSIS #include <sys/types.h> #include <sys/socket.h> ssize_t recv(int sockfd, void *buf, size_t len, int flags); ssize_t recvfrom(int sockfd, void *buf, size_t len, int flags, struct sockaddr *src_addr, socklen_t *addrlen); ssize_t recvmsg(int sockfd, struct msghdr *msg, int flags); DESCRIPTION The recvfrom() and recvmsg() calls are used to receive messages from a socket, and may be used to receive data on a socket whether or not it is connection-oriented. If src_addr is not NULL, and the underlying protocol provides the source address, this source address is filled in. When src_addr is NULL, nothing is filled in; in this case, addrlen is not used, and should also be NULL. The argument addrlen is a value-result argument, which the caller should initialize before the call to the size of the buffer associated with src_addr, and modified on return to indicate the actual size of the source address. The returned address is truncated if the buffer provided is too small; in this case, addrlen will return a value greater than was supplied to the call. The recv() call is normally used only on a connected socket (see connect(2)) and is identical to recvfrom() with a NULL src_addr argument. ... RETURN VALUE These calls return the number of bytes received, or -1 if an error occurred. In the event of an error, errno is set to indicate the error. The return value will be 0 when the peer has performed an orderly shutdown.
So recvfrom reads data from the socket connection and blocks while doing so.
(4) Send a message from the nc client:
[root@localhost jdk6]# nc localhost 8088
test
1> The main (server) window prints:
[root@localhost jdk6]# strace -ff -o out ./jdk1.6.0_06/bin/java SocketServer
serverSocket 8088 start
socket.getInetAddress(): /0:0:0:0:0:0:0:1
9 收到的消息	test
2> The calls printed in out.10866 are as follows:
We can see that after the message is received the thread immediately calls recvfrom again and blocks.
Summary: the problems of BIO
1. One thread per connection, which costs thread memory and CPU scheduling.
2. The root cause is blocking: the accept and recvfrom kernel calls block. The fix is for the kernel to provide a NONBLOCKING mode.
3. The socket man page shows a SOCK_NONBLOCK flag for creating non-blocking sockets (when nothing is available the call returns -1, which shows up as null on the Java side):
SOCK_NONBLOCK Set the O_NONBLOCK file status flag on the new open file description. Using this flag saves extra calls to fcntl(2) to achieve the same result.
3. NIO test
In Java, NIO is called "new IO"; at the operating-system level it corresponds to nonblocking IO. The tests below use JDK 8.
The code is as follows:
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.LinkedList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class NIOSocket {

    private static final ExecutorService executorService = Executors.newFixedThreadPool(5);

    public static void main(String[] args) throws Exception {
        LinkedList<SocketChannel> clients = new LinkedList<>();
        ServerSocketChannel serverSocketChannel = ServerSocketChannel.open();
        serverSocketChannel.bind(new InetSocketAddress(8088));
        serverSocketChannel.configureBlocking(false); // corresponds to NONBLOCKING at the OS level
        while (true) {
            Thread.sleep(500);
            /**
             * accept calls into the kernel.
             * With BIO it blocks until a client connects and then returns the client's fd.
             * With NONBLOCKING it returns immediately; the return value is -1 when there is no connection.
             **/
            SocketChannel client = serverSocketChannel.accept(); // does not block: the OS returns -1, Java returns null
            if (client != null) {
                client.configureBlocking(false);
                int port = client.socket().getPort();
                System.out.println("client.socket().getPort(): " + port);
                clients.add(client);
            }
            ByteBuffer buffer = ByteBuffer.allocate(4096); // allocate memory, either on-heap or off-heap (direct memory)
            // iterate over the clients and read their messages
            for (SocketChannel c : clients) {
                int read = c.read(buffer); // returns >0 or -1, does not block
                if (read > 0) {
                    buffer.flip();
                    byte[] bytes = new byte[buffer.limit()];
                    buffer.get(bytes);
                    String string = new String(bytes);
                    System.out.println("client.socket().getPort(): " + c.socket().getPort() + " 收到的消息: " + string);
                    buffer.clear();
                }
            }
        }
    }
}
0. The NIO startup flow is:
socket() => 4        (file descriptor)
bind(4, 8088)
listen(4)
4.nonblocking        (mark the fd as non-blocking for kernel system calls; 4 is the file descriptor)
accept(4, xxx) => -1 | 6
So the fd is set to non-blocking: the kernel call no longer blocks; it returns a file descriptor when there is a connection and -1 when there is none, which maps to null or -1 inside the application.
1. Compile with JDK 8
2. Trace with strace
[root@localhost jdk8]# strace -ff -o out ./jdk1.8.0_291/bin/java NIOSocket
3. Look at the generated out files
[root@localhost jdk8]# ll total 143724 drwxr-xr-x. 8 10143 10143 273 Apr 7 15:14 jdk1.8.0_291 -rw-r--r--. 1 root root 144616467 Jul 20 03:42 jdk-8u291-linux-i586.tar.gz -rw-r--r--. 1 root root 2358 Jul 20 08:18 NIOSocket.class -rw-r--r--. 1 root root 2286 Jul 20 08:18 NIOSocket.java -rw-r--r--. 1 root root 12822 Jul 20 08:20 out.11117 -rw-r--r--. 1 root root 1489453 Jul 20 08:20 out.11118 -rw-r--r--. 1 root root 10315 Jul 20 08:20 out.11119 -rw-r--r--. 1 root root 1445 Jul 20 08:20 out.11120 -rw-r--r--. 1 root root 1424 Jul 20 08:20 out.11121 -rw-r--r--. 1 root root 884 Jul 20 08:20 out.11122 -rw-r--r--. 1 root root 11113 Jul 20 08:20 out.11123 -rw-r--r--. 1 root root 884 Jul 20 08:20 out.11124 -rw-r--r--. 1 root root 269113 Jul 20 08:20 out.11125 [root@localhost jdk8]# ll | grep out | wc -l 9
A socket server must go through the socket, bind, listen, accept sequence described above; let's look at each step.
(1) socket
socket(AF_INET6, SOCK_STREAM, IPPROTO_IP) = 4
(2) bind and listen
bind(4, {sa_family=AF_INET6, sin6_port=htons(8088), inet_pton(AF_INET6, "::", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, 28) = 0
listen(4, 50)
(3) Check accept (we can see it runs in non-blocking mode and returns -1 by default):
4. Establish a connection with nc
nc localhost 8088
5. Look at file out.11118
There is now one accept call whose return value is not -1:
accept(4, {sa_family=AF_INET6, sin6_port=htons(59238), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, [28]) = 5
6. Send the message HELLO over the established connection
7. The main thread prints the message
[root@localhost jdk8]# strace -ff -o out ./jdk1.8.0_291/bin/java NIOSocket
client.socket().getPort(): 59238
client.socket().getPort(): 59238 收到的消息: HELLO
8. Check what out.11118 read: we can see the read system call is used and the message is read back.
Pros and cons of NIO:
Pros: avoids the problems of one thread per connection.
Cons: suppose there are 10,000 connections and only one of them sends data. Every loop iteration still issues 10,000 read system calls to the kernel, 9,999 of which are pointless and waste time and resources (the user-space loop keeps crossing into kernel space; the cost sits in the system calls).
Solution: the kernel evolved further and introduced multiplexers: select, poll, epoll.
Note: the code above sets non-blocking mode; the default is blocking. If we remove the configureBlocking(false) calls, the result is as follows.
1. Code:
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.LinkedList;

public class NIOSocket {

    public static void main(String[] args) throws Exception {
        LinkedList<SocketChannel> clients = new LinkedList<>();
        ServerSocketChannel serverSocketChannel = ServerSocketChannel.open();
        serverSocketChannel.bind(new InetSocketAddress(8088));
        // serverSocketChannel.configureBlocking(false); // corresponds to NONBLOCKING at the OS level
        while (true) {
            Thread.sleep(500);
            /**
             * accept calls into the kernel.
             * With BIO it blocks until a client connects and then returns the client's fd.
             * With NONBLOCKING it returns immediately; the return value is -1 when there is no connection.
             **/
            SocketChannel client = serverSocketChannel.accept(); // with the line above commented out, this blocks again
            if (client != null) {
                // client.configureBlocking(false);
                int port = client.socket().getPort();
                System.out.println("client.socket().getPort(): " + port);
                clients.add(client);
            }
            ByteBuffer buffer = ByteBuffer.allocate(4096); // allocate memory, either on-heap or off-heap (direct memory)
            // iterate over the clients and read their messages
            for (SocketChannel c : clients) {
                int read = c.read(buffer); // blocking again, since the channel stays in blocking mode
                if (read > 0) {
                    buffer.flip();
                    byte[] bytes = new byte[buffer.limit()];
                    buffer.get(bytes);
                    String string = new String(bytes);
                    System.out.println("client.socket().getPort(): " + c.socket().getPort() + " 收到的消息: " + string);
                    buffer.clear();
                }
            }
        }
    }
}
2. Check the blocking behaviour with strace:
(1) accept blocks:
socket(AF_INET6, SOCK_STREAM, IPPROTO_IP) = 4
setsockopt(4, SOL_IPV6, IPV6_V6ONLY, [0], 4) = 0
setsockopt(4, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
bind(4, {sa_family=AF_INET6, sin6_port=htons(8088), inet_pton(AF_INET6, "::", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, 28) = 0
listen(4, 50)
...
accept(4,
(2) After connecting with nc, read blocks as well.
4. Introducing multiplexers
The NIO model above has this drawback: with 10,000 connections of which only one sends data, every loop iteration must issue 10,000 read calls to the kernel, 9,999 of them pointless, wasting time and resources (the user-space loop over kernel calls; the cost lies in the system calls).
Solution: the kernel evolved further and introduced multiplexers: select, poll, epoll.
socket() => 4        (file descriptor)
bind(4, 8088)
listen(4)
4.nonblocking        (mark the fd as non-blocking for kernel system calls; 4 is the file descriptor)
while(true) {
    select(fds)      // one system call hands all fds to the kernel, which scans them; the limit is 1024
    read(fd)         // read data only from the fds reported ready
}
As long as the application reads the IO itself, the model is synchronous IO, no matter whether it is BIO, NIO or multiplexing: the multiplexer can only report the state of the fds, it cannot hand over the data, so the user program still has to call into the kernel to copy the data from kernel space into the program's memory. With Windows' IOCP, by contrast, the kernel has threads that copy the data into user space.
The select and poll multiplexers
Advantage: a single system call passes all the fds to the kernel and the kernel does the traversal, which reduces the number of system calls.
Drawbacks:
1. The fds are passed in again on every call. Fix: have the kernel keep space to remember the fds.
2. On every select/poll call the kernel still traverses the full set of fds. Fix: use lower-level mechanisms (interrupts and callbacks, i.e. computer-architecture fundamentals) to improve on the scan.
3. select supports too few file descriptors, 1024 by default.
Hence epoll was born.
5. Understanding epoll
Advantages:
1. No limit on the number of fds (admittedly poll already solved this).
2. The bitmap array is replaced by a new structure that can store multiple event types.
3. No repeated copying of fds: add them as needed, drop them when done.
4. Event-driven, so there is no polling to discover readable/writable fds.
On Linux the epoll man page reads:
man epoll

NAME
       epoll - I/O event notification facility

SYNOPSIS
       #include <sys/epoll.h>

DESCRIPTION
       The epoll API performs a similar task to poll(2): monitoring multiple file descriptors to see if I/O is possible on any of them.
       The epoll API can be used either as an edge-triggered or a level-triggered interface and scales well to large numbers of watched file descriptors.

       The following system calls are provided to create and manage an epoll instance:

       *  epoll_create(2) creates an epoll instance and returns a file descriptor referring to that instance.  (The more recent epoll_create1(2) extends the functionality of epoll_create(2).)

       *  Interest in particular file descriptors is then registered via epoll_ctl(2).  The set of file descriptors currently registered on an epoll instance is sometimes called an epoll set.

       *  epoll_wait(2) waits for I/O events, blocking the calling thread if no events are currently available.
epoll consists of three system calls: epoll_create, epoll_ctl and epoll_wait. epoll_create creates an epoll handle, i.e. an epoll instance, and initializes its data structures; epoll_ctl registers the event types to listen for; epoll_wait waits for events to occur.
For the first drawback of select/poll, epoll's answer is in epoll_ctl: every time a new event is registered on the epoll handle (EPOLL_CTL_ADD), the fd is copied into the kernel once, instead of being copied again on every epoll_wait. epoll guarantees each fd is copied only once in the whole process.
For the second drawback, epoll does not add `current` to every fd's device wait queue over and over like select/poll do. It hangs `current` there only once, at epoll_ctl time (this one time is unavoidable), and registers a callback for each fd. When a device becomes ready and wakes up the waiters on its queue, the callback runs and puts the ready fd on a ready list. All epoll_wait does is check whether that ready list contains any ready fd (sleeping for a while and checking for a while via schedule_timeout(), similar to step 7 of the select implementation).
For the third drawback, epoll has no such limit: the number of FDs it supports equals the maximum number of files that can be opened, which is generally far larger than 2048. On a machine with 1 GB of memory it is roughly 100,000; the exact figure can be read with cat /proc/sys/fs/file-max and depends mostly on the amount of system memory.
The section-2 man pages for these system calls:
(1) epoll_create: creates an epoll instance and initializes its data structures
EPOLL_CREATE(2) Linux Programmer's Manual EPOLL_CREATE(2) NAME epoll_create, epoll_create1 - open an epoll file descriptor SYNOPSIS #include <sys/epoll.h> int epoll_create(int size); int epoll_create1(int flags); DESCRIPTION epoll_create() creates an epoll(7) instance. Since Linux 2.6.8, the size argument is ignored, but must be greater than zero; see NOTES below. epoll_create() returns a file descriptor referring to the new epoll instance. This file descriptor is used for all the subsequent calls to the epoll interface. When no longer required, the file descriptor returned by epoll_create() should be closed by using close(2). When all file descriptors referring to an epoll instance have been closed, the kernel destroys the instance and releases the associated resources for reuse. epoll_create1() If flags is 0, then, other than the fact that the obsolete size argument is dropped, epoll_create1() is the same as epoll_create(). The following value can be included in flags to obtain different behavior: EPOLL_CLOEXEC Set the close-on-exec (FD_CLOEXEC) flag on the new file descriptor. See the description of the O_CLOEXEC flag in open(2) for reasons why this may be useful. RETURN VALUE On success, these system calls return a nonnegative file descriptor. On error, -1 is returned, and errno is set to indicate the error.
(2) epoll_ctl: adds an fd to / removes an fd from the epfd returned by epoll_create
EPOLL_CTL(2) Linux Programmer's Manual EPOLL_CTL(2) NAME epoll_ctl - control interface for an epoll descriptor SYNOPSIS #include <sys/epoll.h> int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event); DESCRIPTION This system call performs control operations on the epoll(7) instance referred to by the file descriptor epfd. It requests that the opera‐ tion op be performed for the target file descriptor, fd. Valid values for the op argument are : EPOLL_CTL_ADD Register the target file descriptor fd on the epoll instance referred to by the file descriptor epfd and associate the event event with the internal file linked to fd. EPOLL_CTL_MOD Change the event event associated with the target file descriptor fd. EPOLL_CTL_DEL Remove (deregister) the target file descriptor fd from the epoll instance referred to by epfd. The event is ignored and can be NULL (but see BUGS below). The event argument describes the object linked to the file descriptor fd. The struct epoll_event is defined as : typedef union epoll_data { void *ptr; int fd; uint32_t u32; uint64_t u64; } epoll_data_t; struct epoll_event { uint32_t events; /* Epoll events */ epoll_data_t data; /* User data variable */ }; The events member is a bit set composed using the following available event types: EPOLLIN The associated file is available for read(2) operations. EPOLLOUT The associated file is available for write(2) operations. EPOLLRDHUP (since Linux 2.6.17) Stream socket peer closed connection, or shut down writing half of connection. (This flag is especially useful for writing simple code to detect peer shutdown when using Edge Triggered monitoring.) EPOLLPRI There is urgent data available for read(2) operations. EPOLLERR Error condition happened on the associated file descriptor. epoll_wait(2) will always wait for this event; it is not necessary to set it in events. EPOLLHUP Hang up happened on the associated file descriptor. epoll_wait(2) will always wait for this event; it is not necessary to set it in events. EPOLLET Sets the Edge Triggered behavior for the associated file descriptor. The default behavior for epoll is Level Triggered. See epoll(7) for more detailed information about Edge and Level Triggered event distribution architectures. EPOLLONESHOT (since Linux 2.6.2) Sets the one-shot behavior for the associated file descriptor. This means that after an event is pulled out with epoll_wait(2) the associated file descriptor is internally disabled and no other events will be reported by the epoll interface. The user must call epoll_ctl() with EPOLL_CTL_MOD to rearm the file descriptor with a new event mask. RETURN VALUE When successful, epoll_ctl() returns zero. When an error occurs, epoll_ctl() returns -1 and errno is set appropriately.
(3) epoll_wait: blocks waiting for the readable/writable events returned by the kernel; epfd is again the value returned by epoll_create, and events is a pointer to an array of epoll_event structures in which the kernel stores all the events that need handling
EPOLL_WAIT(2) Linux Programmer's Manual EPOLL_WAIT(2) NAME epoll_wait, epoll_pwait - wait for an I/O event on an epoll file descriptor SYNOPSIS #include <sys/epoll.h> int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout); int epoll_pwait(int epfd, struct epoll_event *events, int maxevents, int timeout, const sigset_t *sigmask); DESCRIPTION The epoll_wait() system call waits for events on the epoll(7) instance referred to by the file descriptor epfd. The memory area pointed to by events will contain the events that will be available for the caller. Up to maxevents are returned by epoll_wait(). The maxevents argu‐ ment must be greater than zero. The timeout argument specifies the minimum number of milliseconds that epoll_wait() will block. (This interval will be rounded up to the system clock granularity, and kernel scheduling delays mean that the blocking interval may overrun by a small amount.) Specifying a timeout of -1 causes epoll_wait() to block indefinitely, while specifying a timeout equal to zero cause epoll_wait() to return immediately, even if no events are available. The struct epoll_event is defined as : typedef union epoll_data { void *ptr; int fd; uint32_t u32; uint64_t u64; } epoll_data_t; struct epoll_event { uint32_t events; /* Epoll events */ epoll_data_t data; /* User data variable */ }; The data of each returned structure will contain the same data the user set with an epoll_ctl(2) (EPOLL_CTL_ADD,EPOLL_CTL_MOD) while the events member will contain the returned event bit field. epoll_pwait() The relationship between epoll_wait() and epoll_pwait() is analogous to the relationship between select(2) and pselect(2): like pselect(2), epoll_pwait() allows an application to safely wait until either a file descriptor becomes ready or until a signal is caught. The following epoll_pwait() call: ready = epoll_pwait(epfd, &events, maxevents, timeout, &sigmask); is equivalent to atomically executing the following calls: sigset_t origmask; sigprocmask(SIG_SETMASK, &sigmask, &origmask); ready = epoll_wait(epfd, &events, maxevents, timeout); sigprocmask(SIG_SETMASK, &origmask, NULL); The sigmask argument may be specified as NULL, in which case epoll_pwait() is equivalent to epoll_wait(). RETURN VALUE When successful, epoll_wait() returns the number of file descriptors ready for the requested I/O, or zero if no file descriptor became ready during the requested timeout milliseconds. When an error occurs, epoll_wait() returns -1 and errno is set appropriately.
You can see the epoll_event structure that describes an event.
1. The official epoll demo
#define MAX_EVENTS 10
struct epoll_event ev, events[MAX_EVENTS];
int listen_sock, conn_sock, nfds, epollfd;

/* Set up listening socket, 'listen_sock' (socket(), bind(), listen()) */

epollfd = epoll_create(10);
if (epollfd == -1) {
    perror("epoll_create");
    exit(EXIT_FAILURE);
}

ev.events = EPOLLIN;
ev.data.fd = listen_sock;
if (epoll_ctl(epollfd, EPOLL_CTL_ADD, listen_sock, &ev) == -1) {
    perror("epoll_ctl: listen_sock");
    exit(EXIT_FAILURE);
}

for (;;) {
    nfds = epoll_wait(epollfd, events, MAX_EVENTS, -1);
    if (nfds == -1) {
        perror("epoll_pwait");
        exit(EXIT_FAILURE);
    }

    for (n = 0; n < nfds; ++n) {
        if (events[n].data.fd == listen_sock) {
            conn_sock = accept(listen_sock,
                               (struct sockaddr *) &local, &addrlen);
            if (conn_sock == -1) {
                perror("accept");
                exit(EXIT_FAILURE);
            }
            setnonblocking(conn_sock);
            ev.events = EPOLLIN | EPOLLET;
            ev.data.fd = conn_sock;
            if (epoll_ctl(epollfd, EPOLL_CTL_ADD, conn_sock, &ev) == -1) {
                perror("epoll_ctl: conn_sock");
                exit(EXIT_FAILURE);
            }
        } else {
            do_use_fd(events[n].data.fd);
        }
    }
}
2. Event trigger modes
"The epoll event distribution interface is able to behave both as edge-triggered (ET) and as level-triggered (LT)." In other words, epoll can deliver events in two modes, edge-triggered and level-triggered.
1. Level-triggered (LT): the default mode
As long as the kernel read buffer associated with the fd is non-empty, i.e. there is data to read, a readable notification keeps being raised.
As long as the kernel write buffer associated with the fd is not full, i.e. there is room to write, a writable notification keeps being raised.
2. Edge-triggered (ET):
A readable notification is raised only when the kernel read buffer associated with the fd goes from empty to non-empty.
A writable notification is raised only when the kernel write buffer associated with the fd goes from full to not full.
The difference between the two:
Level-triggered keeps signalling "readable" as long as the read buffer contains data, whereas edge-triggered notifies only once, when the buffer goes from empty to non-empty. For example:
1. The read buffer starts out empty.
2. 2 KB of data is written into the read buffer.
3. Both level-triggered and edge-triggered modes raise a readable notification at this point.
4. After receiving the notification, 1 KB is read; 1 KB remains in the buffer.
5. Level-triggered will notify again; edge-triggered will not.
So with edge-triggering you must drain the buffer in one go, i.e. keep reading until you hit EAGAIN (EAGAIN means the buffer is empty). Because of this, the file handle must be set to non-blocking when edge-triggering is used; a small Java sketch of such a draining loop follows.
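Java NIO does not expose an ET/LT switch directly (the JDK selector on Linux behaves like level-triggered epoll), but the "read until EAGAIN" rule maps to "keep reading until read() returns 0 or -1". A minimal sketch of such a draining loop, assuming a non-blocking SocketChannel that was just reported readable; the class and method names are only for illustration:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;

public class DrainExample {
    // reads everything currently available on a non-blocking channel;
    // with a non-blocking fd, read() returning 0 plays the role of EAGAIN
    static void drain(SocketChannel channel) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(1024);
        int read;
        while ((read = channel.read(buffer)) > 0) {
            buffer.flip();
            // ... process buffer.remaining() bytes here ...
            buffer.clear();
        }
        if (read == -1) {
            channel.close(); // the peer performed an orderly shutdown
        }
    }
}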
An interview question: with the LT (level-triggered) mode of the Linux epoll model, the "socket writable" event keeps firing as long as the socket is writable. How do you handle that?
The plain approach:
When you need to write data to a socket, add the socket to epoll and wait for the writable event. When the writable event arrives, call write or send; once all the data has been written, remove the socket descriptor from the epoll list. This means adding and removing the fd over and over.
The improved approach:
When you need to write, call send directly. Only when send returns the error EAGAIN do you add the socket to epoll and wait for the writable event before sending the rest; once all data has been sent, remove it from epoll again. The improved approach essentially assumes the socket is writable most of the time and only asks epoll to watch it when it is not. A sketch follows.
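A small Java sketch of this improved approach (my own illustration; key and buffer are assumed to come from the surrounding selector loop): write first, and only turn on OP_WRITE when write() could not take all the data, then turn it off again once the buffer is drained:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.SocketChannel;

public class WriteExample {
    // try to send immediately; fall back to OP_WRITE only when the kernel buffer is full
    static void send(SelectionKey key, ByteBuffer buffer) throws IOException {
        SocketChannel channel = (SocketChannel) key.channel();
        channel.write(buffer); // may write only part of the data
        if (buffer.hasRemaining()) {
            // could not write everything: ask the selector to tell us when the socket is writable again
            key.interestOps(key.interestOps() | SelectionKey.OP_WRITE);
        } else {
            // everything written: stop watching OP_WRITE so the selector does not keep firing
            key.interestOps(key.interestOps() & ~SelectionKey.OP_WRITE);
        }
    }
}

When the selector later reports isWritable(), call send() again with the same buffer until nothing remains.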
3. The epoll model
It can be summed up roughly as follows:
(1) Call epoll_create to create an epoll instance (initializing the related data structures); it returns an fd.
(2) Call epoll_ctl to register events on that fd, i.e. add an fd plus the events to watch into kernel space (kept in a red-black tree).
When an event occurs, the kernel moves the corresponding event structure onto a separate ready list.
(3) Call epoll_wait to fetch events from the ready list (each event carries the fd, the event type, and so on).
4. Test
The code is as follows:
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;
import java.util.Set;

public class NIOSocket {

    public static void main(String[] args) throws Exception {
        // Create a ServerSocketChannel -> ServerSocket.
        // ServerSocketChannel in Java NIO is a channel that can listen for incoming TCP connections,
        // just like ServerSocket in classic IO. It lives in java.nio.channels and is opened
        // with ServerSocketChannel.open():
        ServerSocketChannel serverSocketChannel = ServerSocketChannel.open();
        serverSocketChannel.socket().bind(new InetSocketAddress(8088));
        serverSocketChannel.configureBlocking(false);

        // Get a Selector object (sun.nio.ch.WindowsSelectorImpl on Windows)
        Selector selector = Selector.open();

        // Register serverSocketChannel with the selector, interested in OP_ACCEPT.
        // The 4 event types defined in SelectionKey:
        // SelectionKey.OP_ACCEPT  - accept event: the server has detected a client connection and may accept it
        // SelectionKey.OP_CONNECT - connect event: the connection to the server has been established
        // SelectionKey.OP_READ    - read event: the channel has readable data, a read can be performed
        // SelectionKey.OP_WRITE   - write event: data can now be written to the channel
        serverSocketChannel.register(selector, SelectionKey.OP_ACCEPT);
        System.out.println("注冊后的selectionkey 數量=" + selector.keys().size()); // 1

        // loop waiting for client connections
        while (true) {
            // wait up to 1 second; if no event happened, continue
            if (selector.select(1000) == 0) { // no event
                // System.out.println("服務器等待了1秒,無連接");
                continue;
            }

            // If the return value is > 0, events of interest have occurred.
            // selector.selectedKeys() returns the set of keys for those events,
            // and through the SelectionKeys we can get back to the channels.
            Set<SelectionKey> selectionKeys = selector.selectedKeys();
            System.out.println("selectionKeys 數量 = " + selectionKeys.size());

            // Iterate over Set<SelectionKey> with an iterator:
            // hasNext() checks whether there is a next element,
            // next() moves to and returns the next element,
            // remove() removes the last element returned by the iterator.
            Iterator<SelectionKey> keyIterator = selectionKeys.iterator();
            while (keyIterator.hasNext()) {
                // get the SelectionKey
                SelectionKey key = keyIterator.next();
                // handle the event that occurred on the channel behind this key
                if (key.isAcceptable()) { // OP_ACCEPT: a new client connection
                    // create a SocketChannel for this client
                    SocketChannel socketChannel = serverSocketChannel.accept();
                    System.out.println("客戶端連接成功 生成了一個 socketChannel " + socketChannel.hashCode());
                    // set the SocketChannel to non-blocking
                    socketChannel.configureBlocking(false);
                    // register the socketChannel with the selector for OP_READ and attach a Buffer to it
                    socketChannel.register(selector, SelectionKey.OP_READ, ByteBuffer.allocate(1024));
                    System.out.println("客戶端連接后 ,注冊的selectionkey 數量=" + selector.keys().size()); // 2,3,4..
                }
                if (key.isReadable()) { // OP_READ
                    // get the channel back from the key
                    SocketChannel channel = (SocketChannel) key.channel();
                    // get the buffer attached to this channel
                    ByteBuffer buffer = (ByteBuffer) key.attachment();
                    channel.read(buffer);
                    System.out.println("from 客戶端: " + new String(buffer.array()));
                }
                // remove the current selectionKey from the set manually to avoid handling it twice
                keyIterator.remove();
            }
        }
    }
}
(1) Start the program under strace
[root@localhost jdk8]# strace -ff -o out ./jdk1.8.0_291/bin/java NIOSocket
(2) Check the generated out files
[root@localhost jdk8]# ll total 143780 -rw-r--r--. 1 root root 1033 Jul 20 23:11 Client.class -rw-r--r--. 1 root root 206 Jul 20 23:10 Client.java drwxr-xr-x. 8 10143 10143 273 Apr 7 15:14 jdk1.8.0_291 -rw-r--r--. 1 root root 144616467 Jul 20 03:42 jdk-8u291-linux-i586.tar.gz -rw-r--r--. 1 root root 2705 Jul 21 05:54 NIOSocket.class -rw-r--r--. 1 root root 5004 Jul 21 05:44 NIOSocket.java -rw-r--r--. 1 root root 13093 Jul 21 05:54 out.29779 -rw-r--r--. 1 root root 2305003 Jul 21 05:54 out.29780 -rw-r--r--. 1 root root 12951 Jul 21 05:54 out.29781 -rw-r--r--. 1 root root 2101 Jul 21 05:54 out.29782 -rw-r--r--. 1 root root 1784 Jul 21 05:54 out.29783 -rw-r--r--. 1 root root 5016 Jul 21 05:54 out.29784 -rw-r--r--. 1 root root 99615 Jul 21 05:54 out.29785 -rw-r--r--. 1 root root 914 Jul 21 05:54 out.29786 -rw-r--r--. 1 root root 119854 Jul 21 05:54 out.29787 -rw-r--r--. 1 root root 7308 Jul 21 05:54 out.29789
(3) Connect to port 8088 with nc and send the message "hello"
[root@localhost jdk8]# nc localhost 8088
hello
(4) The important lines in out.29780
socket(AF_INET6, SOCK_STREAM, IPPROTO_IP) = 4
setsockopt(4, SOL_IPV6, IPV6_V6ONLY, [0], 4) = 0
setsockopt(4, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
...
bind(4, {sa_family=AF_INET6, sin6_port=htons(8088), inet_pton(AF_INET6, "::", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, 28) = 0
listen(4, 50)
...
epoll_create(256) = 7
...
epoll_ctl(7, EPOLL_CTL_ADD, 5, {EPOLLIN, {u32=5, u64=17757820874070687749}}) = 0
...
epoll_ctl(7, EPOLL_CTL_ADD, 4, {EPOLLIN, {u32=4, u64=17757820874070687748}}) = 0
gettimeofday({tv_sec=1626861254, tv_usec=513203}, NULL) = 0
epoll_wait(7, [], 4096, 1000) = 0
gettimeofday({tv_sec=1626861255, tv_usec=513652}, NULL) = 0
epoll_wait(7, [], 4096, 1000) = 0
gettimeofday({tv_sec=1626861256, tv_usec=515602}, NULL) = 0
epoll_wait(7, [], 4096, 1000) = 0
gettimeofday({tv_sec=1626861257, tv_usec=518045}, NULL) = 0
epoll_wait(7, [], 4096, 1000) = 0
gettimeofday({tv_sec=1626861258, tv_usec=520289}, NULL) = 0
epoll_wait(7, [], 4096, 1000) = 0
gettimeofday({tv_sec=1626861259, tv_usec=521552}, NULL) = 0
epoll_wait(7, [], 4096, 1000) = 0
...
accept(4, {sa_family=AF_INET6, sin6_port=htons(59252), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, [28]) = 9
...
epoll_ctl(7, EPOLL_CTL_ADD, 9, {EPOLLIN, {u32=9, u64=17757980303256715273}}) = 0
gettimeofday({tv_sec=1626861260, tv_usec=952780}, NULL) = 0
epoll_wait(7, [], 4096, 1000) = 0
...
epoll_wait(7, [{EPOLLIN, {u32=9, u64=17757980303256715273}}], 4096, 1000) = 1
write(1, "selectionKeys \346\225\260\351\207\217 = 1", 24) = 24
write(1, "\n", 1) = 1
...
read(9, "hello\n", 1024) = 6
The overall flow is:
1> create the socket
2> bind the port
3> listen on the port
4> epoll_create(256) = 7 creates the epoll instance
5> register events (the first fd is an internal one used by the JVM; the second registers the serverSocketChannel's fd with the epfd)
epoll_ctl(7, EPOLL_CTL_ADD, 5, {EPOLLIN, {u32=5, u64=17757820874070687749}}) = 0
epoll_ctl(7, EPOLL_CTL_ADD, 4, {EPOLLIN, {u32=4, u64=17757820874070687748}}) = 0
6> epoll_wait fetches events
7> a connection event arrives
8> accept returns 9, the fd of the client socket
9> fd 9 is registered with the epfd for read events
10> epoll_wait returns one event; we can see it is a readable event on fd 9
11> read(9, ...) reads the data
This confirms the flow described above: epoll_create -> epoll_ctl -> epoll_wait.
5. Test 2
A simpler example to check the flow:
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;

public class NIOSocket {

    public static void main(String[] args) throws Exception {
        ServerSocketChannel serverSocketChannel = ServerSocketChannel.open();
        serverSocketChannel.socket().bind(new InetSocketAddress(8088));
        serverSocketChannel.configureBlocking(false);
        System.out.println("serverSocketChannel init 8088");

        Selector selector = Selector.open();
        serverSocketChannel.register(selector, SelectionKey.OP_ACCEPT);
        System.out.println("Selector.open() = 8088");

        int select = selector.select(1000);
        System.out.println("select: " + select);
    }
}
strace shows the following (socket/bind/listen skipped):
...
epoll_create(256) = 8
...
epoll_ctl(8, EPOLL_CTL_ADD, 6, {EPOLLIN, {u32=6, u64=17757820874070687750}}) = 0
...
epoll_ctl(8, EPOLL_CTL_ADD, 4, {EPOLLIN, {u32=4, u64=17757820874070687748}}) = 0
gettimeofday({tv_sec=1626858133, tv_usec=975699}, NULL) = 0
epoll_wait(8, [], 4096, 1000) = 0
We can see that:
(1) epoll_create creates an epoll instance and returns an fd.
(2) epoll_ctl registers events and fds with the epfd just returned.
(3) epoll_wait fetches the event list of that epfd.
6. Test 3
Selector selector = Selector.open();
For the line above, the kernel calls it triggers are:
epoll_create(256) = 6
...
epoll_ctl(6, EPOLL_CTL_ADD, 4, {EPOLLIN, {u32=4, u64=17762324473698058244}}) = 0
Extra: the differences between select, poll and epoll
(1) select => time complexity O(n)
It only tells you that some I/O event happened, not on which streams (one, several, or all of them), so you have to scan all streams indiscriminately to find the ones that can be read or written. Hence select has O(n) scanning complexity, and the more streams you handle, the longer the scan. The maximum number of fds is 1024.
(2) poll => time complexity O(n)
poll is essentially the same as select: it copies the user-supplied array into kernel space and then polls the state of the device behind each fd. It has no limit on the maximum number of connections, though, because it stores the fds in a linked list.
(3) epoll => time complexity O(1)
epoll can be read as "event poll". Unlike busy polling and indiscriminate scanning, epoll tells us which stream had which I/O event, so it is truly event-driven (each event is associated with an fd), and every operation we perform on those streams is meaningful. (The complexity drops to O(1).)
select, poll and epoll are all I/O multiplexing mechanisms: a mechanism through which one can monitor many descriptors and be notified as soon as one of them becomes ready (readable or writable), so the program can do the corresponding read or write. But select, poll and epoll are all synchronous I/O in essence, because once the readiness event arrives the application itself is responsible for the read/write, and that read/write blocks. Asynchronous I/O, by contrast, does not require the application to do the read/write itself; the asynchronous I/O implementation copies the data from the kernel into user space.
Both epoll and select provide multiplexed I/O and both are supported by current Linux kernels; epoll is Linux-specific, while select is specified by POSIX and implemented by most operating systems.
When a Java program uses a Selector, a different multiplexer may be used on different operating systems; on my CentOS 7 machine it is epoll.
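A quick way to confirm which implementation your JVM picked is to print the provider and selector classes (a tiny sketch; the class names are JDK implementation details — on Linux with JDK 8 they are typically the sun.nio.ch.EPoll* variants, on Windows the WindowsSelector* ones):

import java.nio.channels.Selector;
import java.nio.channels.spi.SelectorProvider;

public class WhichSelector {
    public static void main(String[] args) throws Exception {
        // the provider is chosen per platform; the selector class reveals the underlying mechanism
        System.out.println(SelectorProvider.provider().getClass().getName());
        try (Selector selector = Selector.open()) {
            System.out.println(selector.getClass().getName());
        }
    }
}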
Extra: use man to read the manual pages; if a page is missing, install the full set with yum install -y man-pages.
man 2 <cmd> shows section 2, the system calls:
[root@localhost jdk8]# man man 1 Executable programs or shell commands 2 System calls (functions provided by the kernel) 3 Library calls (functions within program libraries) 4 Special files (usually found in /dev) 5 File formats and conventions eg /etc/passwd 6 Games 7 Miscellaneous (including macro packages and conventions), e.g. man(7), groff(7) 8 System administration commands (usually only for root) 9 Kernel routines [Non standard]
Extra: the C10K problem
The earliest servers used the process/thread-per-connection model: each new TCP connection gets its own process. With C10K — ten thousand concurrent connections — that means creating 10,000 processes, which a single machine clearly cannot sustain. How to push past single-machine limits is the question high-performance network programming has to face, and these limits and problems are collectively known as the C10K problem.
Since Linux is the operating system most used by internet companies, epoll became synonymous with "C10K killer", high concurrency, high performance and asynchronous non-blocking. FreeBSD offers kqueue, Linux offers epoll, Windows offers IOCP, and Solaris offers /dev/poll; these OS facilities were created to solve the C10K problem. The programming model around epoll is asynchronous non-blocking callbacks, also called Reactor, event-driven, or an event loop (EventLoop). Nginx, libevent and node.js are all products of the epoll era.
Extra: observing how redis uses multiplexing
1. Download and install redis
2. Trace the redis startup with strace
[root@localhost test]# strace -ff -o redisout ../redis-5.0.4/src/redis-server 34127:C 21 Jul 2021 21:57:26.281 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo 34127:C 21 Jul 2021 21:57:26.281 # Redis version=5.0.4, bits=64, commit=00000000, modified=0, pid=34127, just started 34127:C 21 Jul 2021 21:57:26.282 # Warning: no config file specified, using the default config. In order to specify a config file use ../redis-5.0.4/src/redis-server /path/to/redis.conf 34127:M 21 Jul 2021 21:57:26.284 * Increased maximum number of open files to 10032 (it was originally set to 1024). _._ _.-``__ ''-._ _.-`` `. `_. ''-._ Redis 5.0.4 (00000000/0) 64 bit .-`` .-```. ```\/ _.,_ ''-._ ( ' , .-` | `, ) Running in standalone mode |`-._`-...-` __...-.``-._|'` _.-'| Port: 6379 | `-._ `._ / _.-' | PID: 34127 `-._ `-._ `-./ _.-' _.-' |`-._`-._ `-.__.-' _.-'_.-'| | `-._`-._ _.-'_.-' | http://redis.io `-._ `-._`-.__.-'_.-' _.-' |`-._`-._ `-.__.-' _.-'_.-'| | `-._`-._ _.-'_.-' | `-._ `-._`-.__.-'_.-' _.-' `-._ `-.__.-' _.-' `-._ _.-' `-.__.-' 34127:M 21 Jul 2021 21:57:26.294 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128. 34127:M 21 Jul 2021 21:57:26.294 # Server initialized 34127:M 21 Jul 2021 21:57:26.294 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect. 34127:M 21 Jul 2021 21:57:26.296 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled. 34127:M 21 Jul 2021 21:57:26.296 * Ready to accept connections
3. Look at the out files
[root@localhost test]# ll total 48 -rw-r--r--. 1 root root 34219 Jul 21 21:57 redisout.34127 -rw-r--r--. 1 root root 134 Jul 21 21:57 redisout.34128 -rw-r--r--. 1 root root 134 Jul 21 21:57 redisout.34129 -rw-r--r--. 1 root root 134 Jul 21 21:57 redisout.34130
4. We know that starting a server requires socket, bind and listen, so search for bind:
[root@localhost test]# grep bind ./*
./redisout.34127:bind(6, {sa_family=AF_INET6, sin6_port=htons(6379), inet_pton(AF_INET6, "::", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, 28) = 0
./redisout.34127:bind(7, {sa_family=AF_INET, sin_port=htons(6379), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
5. Look at redisout.34127
...
socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 7
setsockopt(7, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
bind(7, {sa_family=AF_INET, sin_port=htons(6379), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
listen(7, 511) = 0
...
epoll_create(1024) = 5
...
epoll_ctl(5, EPOLL_CTL_ADD, 6, {EPOLLIN, {u32=6, u64=6}}) = 0
epoll_ctl(5, EPOLL_CTL_ADD, 7, {EPOLLIN, {u32=7, u64=7}}) = 0
epoll_ctl(5, EPOLL_CTL_ADD, 3, {EPOLLIN, {u32=3, u64=3}}) = 0
...
epoll_wait(5, [], 10128, 0) = 0
open("/proc/34127/stat", O_RDONLY) = 8
read(8, "34127 (redis-server) R 34125 341"..., 4096) = 341
close(8) = 0
read(3, 0x7ffd0d4c055f, 1) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(5, [], 10128, 100) = 0
open("/proc/34127/stat", O_RDONLY) = 8
read(8, "34127 (redis-server) R 34125 341"..., 4096) = 341
close(8) = 0
read(3, 0x7ffd0d4c055f, 1) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(5, [], 10128, 100) = 0
...
6. Start a client and set a key
[root@localhost test]# ../redis-5.0.4/src/redis-cli
127.0.0.1:6379> set testkey testvalue
OK
7. Continue with file redisout.34127
...
accept(7, {sa_family=AF_INET, sin_port=htons(48084), sin_addr=inet_addr("127.0.0.1")}, [128->16]) = 8
...
epoll_ctl(5, EPOLL_CTL_ADD, 8, {EPOLLIN, {u32=8, u64=8}}) = 0
...
epoll_wait(5, [{EPOLLIN, {u32=8, u64=8}}], 10128, 6) = 1
read(8, "*1\r\n$7\r\nCOMMAND\r\n", 16384) = 17
...
read(8, "*3\r\n$3\r\nset\r\n$7\r\ntestkey\r\n$9\r\nte"..., 16384) = 41
read(3, 0x7ffd0d4c055f, 1) = -1 EAGAIN (Resource temporarily unavailable)
write(8, "+OK\r\n", 5) = 5
...
We can see that read receives the data sent by the client and write sends the reply back, and both follow the redis protocol for request data and response data.
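The request "*3\r\n$3\r\nset\r\n$7\r\ntestkey\r\n$9\r\ntestvalue\r\n" seen in the trace is simply the RESP encoding of set testkey testvalue. A minimal sketch that speaks the protocol over a plain socket (assuming a local redis listening on 127.0.0.1:6379):

import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class RespDemo {
    public static void main(String[] args) throws Exception {
        try (Socket socket = new Socket("127.0.0.1", 6379)) {
            OutputStream out = socket.getOutputStream();
            // RESP: an array of 3 bulk strings -> set testkey testvalue
            String cmd = "*3\r\n$3\r\nset\r\n$7\r\ntestkey\r\n$9\r\ntestvalue\r\n";
            out.write(cmd.getBytes(StandardCharsets.US_ASCII));
            out.flush();

            // the server answers with a simple string: +OK\r\n
            byte[] reply = new byte[64];
            InputStream in = socket.getInputStream();
            int n = in.read(reply);
            System.out.println(new String(reply, 0, n, StandardCharsets.US_ASCII));
        }
    }
}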
8. The client sends a get request
127.0.0.1:6379> get testkey
"testvalue"
9. Look at the out file again
...
epoll_wait(5, [{EPOLLIN, {u32=8, u64=8}}], 10128, 100) = 1
read(8, "*2\r\n$3\r\nget\r\n$7\r\ntestkey\r\n", 16384) = 26
read(3, 0x7ffd0d4c055f, 1) = -1 EAGAIN (Resource temporarily unavailable)
write(8, "$9\r\ntestvalue\r\n", 15) = 15
...
10. The generated out files also show that redis started 4 threads; this can also be checked with top.
(1) Find the PID
[root@localhost test]# netstat -nltp | grep 6379
tcp        0      0 0.0.0.0:6379        0.0.0.0:*        LISTEN      34127/../redis-5.0.
tcp6       0      0 :::6379             :::*             LISTEN      34127/../redis-5.0.
(2) Look at the threads
[root@localhost test]# top -Hp 34127
Redis being "single-threaded" means that the core work — accepting requests, reading and writing data — is done in one single thread; the other threads handle things such as AOF and deleting expired keys.
For the protocol redis uses to receive and send data, see https://www.cnblogs.com/qlqwjy/p/8560052.html
Extra: observing nginx's single-threaded multiplexing and its epoll flow
1. Start nginx under strace
[root@localhost sbin]# strace -ff -o out ./nginx
2. Look at the generated out files
[root@localhost sbin]# ll
total 3796
-rwxr-xr-x. 1 root root 3851552 Jul 22 01:02 nginx
-rw-r--r--. 1 root root   20027 Jul 22 03:56 out.47227
-rw-r--r--. 1 root root    1100 Jul 22 03:56 out.47228
-rw-r--r--. 1 root root    5512 Jul 22 03:56 out.47229
Three out files are generated.
3. Check the processes with ps
[root@localhost sbin]# ps -ef | grep nginx | grep -v 'grep'
root      47225  38323  0 03:56 pts/1    00:00:00 strace -ff -o out ./nginx
root      47228      1  0 03:56 ?        00:00:00 nginx: master process ./nginx
nobody    47229  47228  0 03:56 ?        00:00:00 nginx: worker process
We can see one master process and one worker process. The master handles restarts, configuration syntax checks and so on; the worker receives the requests.
4. Look at the out files
(1) The master process, out.47228
set_robust_list(0x7fec5129da20, 24) = 0
setsid() = 47228
umask(000) = 022
open("/dev/null", O_RDWR) = 7
dup2(7, 0) = 0
dup2(7, 1) = 1
close(7) = 0
open("/usr/local/nginx/logs/nginx.pid", O_RDWR|O_CREAT|O_TRUNC, 0644) = 7
pwrite64(7, "47228\n", 6, 0) = 6
close(7) = 0
dup2(5, 2) = 2
close(3) = 0
rt_sigprocmask(SIG_BLOCK, [HUP INT QUIT USR1 USR2 ALRM TERM CHLD WINCH IO], NULL, 8) = 0
socketpair(AF_UNIX, SOCK_STREAM, 0, [3, 7]) = 0
ioctl(3, FIONBIO, [1]) = 0
ioctl(7, FIONBIO, [1]) = 0
ioctl(3, FIOASYNC, [1]) = 0
fcntl(3, F_SETOWN, 47228) = 0
fcntl(3, F_SETFD, FD_CLOEXEC) = 0
fcntl(7, F_SETFD, FD_CLOEXEC) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fec5129da10) = 47229
rt_sigsuspend([], 8
We can see that the master has no epoll-related calls; it mainly receives signals, performs hot reload / hot deployment and monitors the worker's state. At the end it clones the worker child process 47229.
(2) The worker process, out.47229
...
epoll_create(512) = 8
eventfd2(0, 0) = 9
epoll_ctl(8, EPOLL_CTL_ADD, 9, {EPOLLIN|EPOLLET, {u32=7088384, u64=7088384}}) = 0
socketpair(AF_UNIX, SOCK_STREAM, 0, [10, 11]) = 0
epoll_ctl(8, EPOLL_CTL_ADD, 10, {EPOLLIN|EPOLLRDHUP|EPOLLET, {u32=7088384, u64=7088384}}) = 0
close(11) = 0
epoll_wait(8, [{EPOLLIN|EPOLLHUP|EPOLLRDHUP, {u32=7088384, u64=7088384}}], 1, 5000) = 1
close(10) = 0
mmap(NULL, 225280, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fec51266000
brk(NULL) = 0x20ba000
brk(0x20f1000) = 0x20f1000
epoll_ctl(8, EPOLL_CTL_ADD, 6, {EPOLLIN|EPOLLRDHUP, {u32=1361469456, u64=140652950478864}}) = 0
close(3) = 0
epoll_ctl(8, EPOLL_CTL_ADD, 7, {EPOLLIN|EPOLLRDHUP, {u32=1361469672, u64=140652950479080}}) = 0
epoll_wait(8,
...
(3) Test with curl
[root@localhost test3]# curl http://localhost:80 <!DOCTYPE html> <html> <head> <title>Welcome to nginx!</title> <style> body { width: 35em; margin: 0 auto; font-family: Tahoma, Verdana, Arial, sans-serif; } </style> </head> <body> <h1>Welcome to nginx!</h1> <p>If you see this page, the nginx web server is successfully installed and working. Further configuration is required.</p> <p>For online documentation and support please refer to <a href="http://nginx.org/">nginx.org</a>.<br/> Commercial support is available at <a href="http://nginx.com/">nginx.com</a>.</p> <p><em>Thank you for using nginx.</em></p> </body> </html>
(4) Continue with out.47229
epoll_ctl(8, EPOLL_CTL_ADD, 7, {EPOLLIN|EPOLLRDHUP, {u32=1361469672, u64=140652950479080}}) = 0
epoll_wait(8, [{EPOLLIN, {u32=1361469456, u64=140652950478864}}], 512, -1) = 1
accept4(6, {sa_family=AF_INET, sin_port=htons(40704), sin_addr=inet_addr("127.0.0.1")}, [112->16], SOCK_NONBLOCK) = 3
epoll_ctl(8, EPOLL_CTL_ADD, 3, {EPOLLIN|EPOLLRDHUP|EPOLLET, {u32=1361469888, u64=140652950479296}}) = 0
epoll_wait(8, [{EPOLLIN, {u32=1361469888, u64=140652950479296}}], 512, 60000) = 1
recvfrom(3, "GET / HTTP/1.1\r\nUser-Agent: curl"..., 1024, 0, NULL, NULL) = 73
stat("/usr/local/nginx/html/index.html", {st_mode=S_IFREG|0644, st_size=612, ...}) = 0
open("/usr/local/nginx/html/index.html", O_RDONLY|O_NONBLOCK) = 10
fstat(10, {st_mode=S_IFREG|0644, st_size=612, ...}) = 0
writev(3, [{iov_base="HTTP/1.1 200 OK\r\nServer: nginx/1"..., iov_len=238}], 1) = 238
sendfile(3, 10, [0] => [612], 612) = 612
write(4, "127.0.0.1 - - [22/Jul/2021:04:29"..., 86) = 86
close(10) = 0
setsockopt(3, SOL_TCP, TCP_NODELAY, [1], 4) = 0
epoll_wait(8, [{EPOLLIN|EPOLLRDHUP, {u32=1361469888, u64=140652950479296}}], 512, 65000) = 1
recvfrom(3, "", 1024, 0, NULL, NULL) = 0
close(3) = 0
epoll_wait(8,
(5) Grep again for the socket- and epoll-related calls
[root@localhost sbin]# grep socket ./* Binary file ./nginx matches ./out.47227:socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 4 ./out.47227:connect(4, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory) ./out.47227:socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 4 ./out.47227:connect(4, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory) ./out.47227:socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 4 ./out.47227:connect(4, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory) ./out.47227:socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 4 ./out.47227:connect(4, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory) ./out.47227:socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 6 ./out.47228:socketpair(AF_UNIX, SOCK_STREAM, 0, [3, 7]) = 0 ./out.47229:socketpair(AF_UNIX, SOCK_STREAM, 0, [10, 11]) = 0 [root@localhost sbin]# grep bind ./* Binary file ./nginx matches ./out.47227:bind(6, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("0.0.0.0")}, 16) = 0 [root@localhost sbin]# grep listen ./* Binary file ./nginx matches ./out.47227:listen(6, 511) = 0 ./out.47227:listen(6, 511) = 0 [root@localhost sbin]# grep epoll_create ./* Binary file ./nginx matches ./out.47227:epoll_create(100) = 5 ./out.47229:epoll_create(512) = 8 [root@localhost sbin]# grep epoll_ctl ./* Binary file ./nginx matches ./out.47229:epoll_ctl(8, EPOLL_CTL_ADD, 9, {EPOLLIN|EPOLLET, {u32=7088384, u64=7088384}}) = 0 ./out.47229:epoll_ctl(8, EPOLL_CTL_ADD, 10, {EPOLLIN|EPOLLRDHUP|EPOLLET, {u32=7088384, u64=7088384}}) = 0 ./out.47229:epoll_ctl(8, EPOLL_CTL_ADD, 6, {EPOLLIN|EPOLLRDHUP, {u32=1361469456, u64=140652950478864}}) = 0 ./out.47229:epoll_ctl(8, EPOLL_CTL_ADD, 7, {EPOLLIN|EPOLLRDHUP, {u32=1361469672, u64=140652950479080}}) = 0 ./out.47229:epoll_ctl(8, EPOLL_CTL_ADD, 3, {EPOLLIN|EPOLLRDHUP|EPOLLET, {u32=1361469888, u64=140652950479296}}) = 0 [root@localhost sbin]# grep epoll_wait ./* Binary file ./nginx matches ./out.47229:epoll_wait(8, [{EPOLLIN|EPOLLHUP|EPOLLRDHUP, {u32=7088384, u64=7088384}}], 1, 5000) = 1 ./out.47229:epoll_wait(8, [{EPOLLIN, {u32=1361469456, u64=140652950478864}}], 512, -1) = 1 ./out.47229:epoll_wait(8, [{EPOLLIN, {u32=1361469888, u64=140652950479296}}], 512, 60000) = 1 ./out.47229:epoll_wait(8, [{EPOLLIN|EPOLLRDHUP, {u32=1361469888, u64=140652950479296}}], 512, 65000) = 1 ./out.47229:epoll_wait(8,