OpenvSwitch 2.4.0 Source Code Walkthrough

Reposted article, 2016-01-05 17:10. Tags: OpenvSwitch / openvswitch / sdn / OVS

The original was published on my blog home page; please credit the source when reposting.

1. Preface

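OpenvSwitch (OVS for short) is a virtual switch and a very important open-source switch in cloud computing and SDN. Anyone who wants to study the data plane of cloud computing and SDN in depth needs to be able to read the OVS source. The existing material on OVS all covers the OpenvSwitch 2.3.* releases; Ubuntu 14.04 has been out for quite a while, yet I could not find any source analysis for the OVS 2.4.0+ releases that it supports. Drawing on a large amount of reference material, this article gives a simple analysis of the OVS 2.4.0 source from a beginner's point of view (with an emphasis on how the OpenFlow protocol is implemented), following the data flow.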

2. Overview

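For an overview of OVS, see another post on my blog.
For reading the code, I recommend Source Insight and Sublime Text 3.
Common modification points:
In practice, the kernel code is usually modified in one of the following places. The first is inside ovs_dp_process_received_packet(struct vport *p, struct sk_buff *skb) in datapath.c, because every packet has to pass through this function (that is its 2.3.x signature; in the 2.4.0 tree the equivalent entry point in datapath.c appears to be ovs_dp_process_packet(), so check your version). The second is to design your own flow table. The third is tied to the second: design your own actions on top of that flow table to get the behavior you want.
OpenFlow modification points:
Focus on the files under ofproto, such as ofproto.c and connmgr.c. The handle_openflow function in ofproto.c is the main place to modify for SDN-related work.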

3. Source Code Analysis

Start with the main() function:

int
main(int argc, char *argv[])
{
    char *unixctl_path = NULL;
    struct unixctl_server *unixctl;
    char *remote;
    bool exiting;
    int retval;

    set_program_name(argv[0]);         // set the program name, version, build date, etc.
    retval = dpdk_init(argc,argv);
    if (retval < 0) {
        return retval;
    }

    argc -= retval;
    argv += retval;

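    // Copy the argument list into new storage and point argv at it. This
    // prepares for proctitle_set(), which is called from
    // daemonize_start() -> monitor_daemon() and may overwrite the original
    // argv storage.
    ovs_cmdl_proctitle_init(argc, argv);
    service_start(&argc, &argv);
    // Parse the arguments. unixctl_path receives the name of the unixctl
    // socket, the channel for external control commands; remote receives
    // the connection info for ovsdb, i.e. the socket name of the
    // configuration database.
    remote = parse_options(argc, argv, &unixctl_path);
    // Ignore SIGPIPE, so that writing to a closed pipe or socket returns an
    // error instead of killing the process.
    fatal_ignore_sigpipe();
    // Initialize the data structures for the tables of the vswitch database
    // schema (Open_vSwitch, Bridge, Port, Interface, and so on).
    ovsrec_init();

    daemonize_start();                        // turn the process into a daemon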

    if (want_mlockall) {
#ifdef HAVE_MLOCKALL
        if (mlockall(MCL_CURRENT | MCL_FUTURE)) {
            VLOG_ERR("mlockall failed: %s", ovs_strerror(errno));
        }
#else
        VLOG_ERR("mlockall not supported on this system");
#endif
    }

    // Create a unixctl server (returned in 'unixctl') that listens on the
    // punix path given by unixctl_path.
    retval = unixctl_server_create(unixctl_path, &unixctl);
    if (retval) {
        exit(EXIT_FAILURE);
    }
    // Register the "exit" unixctl command.
    unixctl_command_register("exit", "", 0, 0, ovs_vswitchd_exit, &exiting);

    bridge_init(remote);                           // read the database and do some initialization
    free(remote);

    exiting = false;
    while (!exiting) {
        memory_run();
        if (memory_should_report()) {
            struct simap usage;

            simap_init(&usage);
            bridge_get_memory_usage(&usage);
            memory_report(&usage);
            simap_destroy(&usage);
        }
        bridge_run();
        // Receive data on the unixctl server and execute the corresponding
        // control commands.
        unixctl_server_run(unixctl);
        // Call run() on every netdev_class registered in netdev_classes.
        netdev_run();

        memory_wait();
        bridge_wait();
        unixctl_server_wait(unixctl);
        netdev_wait();
        if (exiting) {
            poll_immediate_wake();
        }
        // Block until an event registered earlier with poll_fd_wait() occurs
        // or the wait timeout expires.
        poll_block();
        if (should_service_stop()) {
            exiting = true;
        }
    }
    bridge_exit();
    unixctl_server_destroy(unixctl);
    service_stop();

    return 0;
}
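
The loop in main() follows a run / wait / block pattern: every subsystem first does its pending work (the *_run() calls), then registers what should wake it up (the *_wait() calls), and finally poll_block() sleeps once for all of them. Below is a minimal sketch of that pattern on top of POSIX poll(). The names struct poll_loop, loop_fd_wait(), loop_timer_wait(), loop_immediate_wake() and loop_block() are hypothetical stand-ins and are not the OVS poll-loop API; the sketch only assumes that registrations are one-shot and that the shortest timer wins.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>

#define MAX_WAITERS 16

/* Hypothetical stand-in for the state that the wait calls accumulate
 * between two blocking calls. */
struct poll_loop {
    struct pollfd fds[MAX_WAITERS];
    nfds_t n_fds;
    int timeout_ms;                 /* -1 means "no timer registered". */
};

static void
loop_init(struct poll_loop *loop)
{
    loop->n_fds = 0;
    loop->timeout_ms = -1;
}

/* Wake the next loop_block() when 'fd' reports 'events'.
 * Returns 0 on success, -1 if the table is full. */
static int
loop_fd_wait(struct poll_loop *loop, int fd, short events)
{
    if (loop->n_fds >= MAX_WAITERS) {
        return -1;
    }
    loop->fds[loop->n_fds].fd = fd;
    loop->fds[loop->n_fds].events = events;
    loop->fds[loop->n_fds].revents = 0;
    loop->n_fds++;
    return 0;
}

/* Wake the next loop_block() after at most 'msec' milliseconds.  When
 * several timers are registered, the shortest one is kept. */
static void
loop_timer_wait(struct poll_loop *loop, int msec)
{
    if (msec < 0) {
        msec = 0;
    }
    if (loop->timeout_ms < 0 || msec < loop->timeout_ms) {
        loop->timeout_ms = msec;
    }
}

/* Make the next loop_block() return without sleeping. */
static void
loop_immediate_wake(struct poll_loop *loop)
{
    loop_timer_wait(loop, 0);
}

/* Sleep until a registered fd is ready or the shortest timer expires, then
 * forget every registration.  Returns poll()'s result: the number of ready
 * fds, 0 on timeout, -1 on error. */
static int
loop_block(struct poll_loop *loop)
{
    int n_ready;

    assert(loop->n_fds <= MAX_WAITERS);
    n_ready = poll(loop->fds, loop->n_fds, loop->timeout_ms);
    loop_init(loop);
    return n_ready;
}
```

In a loop built this way, each iteration would call every subsystem's run function, then every wait function (which in turn call loop_fd_wait() or loop_timer_wait()), and finally loop_block(). Because registrations are dropped after each block, a subsystem that still cares about an event has to register it again on the next iteration, which is why main() calls the *_wait() functions on every pass.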

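Next, step into bridge_run(), which lives in bridge.c. The available ofproto_class implementations are kept in the ofproto_classes[] array. That array is filled in by ofproto_init() in ofproto.c (reached here through bridge_init_ofproto(cfg)), which goes on to call ofproto_class_register(). After initialization the array holds a single entry: ofproto_dpif_class. This class is defined in ofproto-dpif.c, where its member variables and operation functions are set up.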
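
The registration scheme described above is a table of "classes", each of which is a named struct of function pointers. The following is a minimal sketch of that idea, not the OVS code: struct provider_class, provider_class_register(), provider_class_find(), providers_init() and the toy dpif_like_class are all hypothetical names, and the return values (0 / -1) are an assumption made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, much-reduced stand-in for a provider class: a name plus a
 * table of operations (only one operation here). */
struct provider_class {
    const char *name;
    int (*run)(void *state);
};

#define MAX_CLASSES 8
static const struct provider_class *classes[MAX_CLASSES];
static size_t n_classes;

/* Adds 'new_class' to the registry.  Returns 0 on success, -1 if a class
 * with the same name is already registered or the table is full. */
static int
provider_class_register(const struct provider_class *new_class)
{
    size_t i;

    assert(new_class != NULL && new_class->name != NULL);
    for (i = 0; i < n_classes; i++) {
        if (!strcmp(classes[i]->name, new_class->name)) {
            return -1;
        }
    }
    if (n_classes >= MAX_CLASSES) {
        return -1;
    }
    classes[n_classes++] = new_class;
    return 0;
}

/* Looks up a registered class by name; returns NULL if there is none. */
static const struct provider_class *
provider_class_find(const char *name)
{
    size_t i;

    for (i = 0; i < n_classes; i++) {
        if (!strcmp(classes[i]->name, name)) {
            return classes[i];
        }
    }
    return NULL;
}

/* Toy provider whose run() just counts how often it was called. */
static int
dpif_like_run(void *state)
{
    int *counter = state;

    (*counter)++;
    return 0;
}

static const struct provider_class dpif_like_class = {
    "dpif-like", dpif_like_run
};

/* One-time initialization that registers the single built-in class;
 * later calls return immediately. */
static void
providers_init(void)
{
    static int done;

    if (!done) {
        done = 1;
        provider_class_register(&dpif_like_class);
    }
}
```

Callers never name the implementation directly: they look a class up and invoke it through its function pointers, in the same way ofproto_run() goes through p->ofproto_class->run(p).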

void
bridge_run(void)
{
    static struct ovsrec_open_vswitch null_cfg;
    const struct ovsrec_open_vswitch *cfg;

    bool vlan_splinters_changed;

    ovsrec_open_vswitch_init(&null_cfg);

    ovsdb_idl_run(idl);

    if (ovsdb_idl_is_lock_contended(idl)) {
        static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
        struct bridge *br, *next_br;

        VLOG_ERR_RL(&rl, "another ovs-vswitchd process is running, "
                    "disabling this process (pid %ld) until it goes away",
                    (long int) getpid());

        HMAP_FOR_EACH_SAFE (br, next_br, node, &all_bridges) {
            bridge_destroy(br);
        }
        /* Since we will not be running system_stats_run() in this process
         * with the current situation of multiple ovs-vswitchd daemons,
         * disable system stats collection. */
        system_stats_enable(false);
        return;
    } else if (!ovsdb_idl_has_lock(idl)
               || !ovsdb_idl_has_ever_connected(idl)) {
        /* Returns if not holding the lock or not done retrieving db
         * contents. */
        return;
    }
    cfg = ovsrec_open_vswitch_first(idl);

    /* Initialize the ofproto library.  This only needs to run once, but
     * it must be done after the configuration is set.  If the
     * initialization has already occurred, bridge_init_ofproto()
     * returns immediately. */
    bridge_init_ofproto(cfg);

    /* Once the value of flow-restore-wait is false, we no longer should
     * check its value from the database. */
    if (cfg && ofproto_get_flow_restore_wait()) {
        ofproto_set_flow_restore_wait(smap_get_bool(&cfg->other_config,
                                        "flow-restore-wait", false));
    }

    bridge_run__();

    /* Re-configure SSL.  We do this on every trip through the main loop,
     * instead of just when the database changes, because the contents of the
     * key and certificate files can change without the database changing.
     *
     * We do this before bridge_reconfigure() because that function might
     * initiate SSL connections and thus requires SSL to be configured. */
    if (cfg && cfg->ssl) {
        const struct ovsrec_ssl *ssl = cfg->ssl;

        stream_ssl_set_key_and_cert(ssl->private_key, ssl->certificate);
        stream_ssl_set_ca_cert_file(ssl->ca_cert, ssl->bootstrap_ca_cert);
    }

    /* If VLAN splinters are in use, then we need to reconfigure if VLAN
     * usage has changed. */
    vlan_splinters_changed = false;
    if (vlan_splinters_enabled_anywhere) {
        struct bridge *br;

        HMAP_FOR_EACH (br, node, &all_bridges) {
            if (ofproto_has_vlan_usage_changed(br->ofproto)) {
                vlan_splinters_changed = true;
                break;
            }
        }
    }

    if (ovsdb_idl_get_seqno(idl) != idl_seqno || vlan_splinters_changed) {
        struct ovsdb_idl_txn *txn;

        idl_seqno = ovsdb_idl_get_seqno(idl);
        txn = ovsdb_idl_txn_create(idl);
        bridge_reconfigure(cfg ? cfg : &null_cfg);

        if (cfg) {
            ovsrec_open_vswitch_set_cur_cfg(cfg, cfg->next_cfg);
            discover_types(cfg);
        }

        /* If we are completing our initial configuration for this run
         * of ovs-vswitchd, then keep the transaction around to monitor
         * it for completion. */
        if (initial_config_done) {
            /* Always sets the 'status_txn_try_again' to check again,
             * in case that this transaction fails. */
            status_txn_try_again = true;
            ovsdb_idl_txn_commit(txn);
            ovsdb_idl_txn_destroy(txn);
        } else {
            initial_config_done = true;
            daemonize_txn = txn;
        }
    }

    if (daemonize_txn) {
        enum ovsdb_idl_txn_status status = ovsdb_idl_txn_commit(daemonize_txn);
        if (status != TXN_INCOMPLETE) {
            ovsdb_idl_txn_destroy(daemonize_txn);
            daemonize_txn = NULL;

            /* ovs-vswitchd has completed initialization, so allow the
             * process that forked us to exit successfully. */
            daemonize_complete();

            vlog_enable_async();

            VLOG_INFO_ONCE("%s (Open vSwitch) %s", program_name, VERSION);
        }
    }

    run_stats_update();
    run_status_update();
    run_system_stats();
}
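
bridge_run() is called on every pass through the main loop, but it only reconfigures when ovsdb_idl_get_seqno(idl) differs from the remembered idl_seqno. Here is a minimal sketch of that sequence-number check, under the assumption that the source bumps a counter on every change. struct config_source, struct consumer, source_set() and consumer_run() are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical configuration source.  Every change bumps 'seqno'. */
struct config_source {
    unsigned int seqno;
    int value;
};

/* Hypothetical consumer that remembers the last seqno it acted on. */
struct consumer {
    unsigned int seen_seqno;
    int applied_value;
    unsigned int n_reconfigures;
};

/* Changes the configuration and records that a change happened. */
static void
source_set(struct config_source *src, int value)
{
    src->value = value;
    src->seqno++;
}

/* One pass of the main loop for this consumer.  Does the (potentially
 * expensive) reconfiguration only if the source changed since the last
 * pass.  Returns true if it reconfigured. */
static bool
consumer_run(struct consumer *c, const struct config_source *src)
{
    assert(c != NULL && src != NULL);
    if (src->seqno == c->seen_seqno) {
        return false;               /* Nothing changed: cheap early exit. */
    }
    c->seen_seqno = src->seqno;
    c->applied_value = src->value;
    c->n_reconfigures++;
    return true;
}
```

Several changes between two passes collapse into one reconfiguration, because only the latest sequence number is compared.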

bridge_run() then calls bridge_run__(), which first calls ofproto_type_run(type) and then ofproto_run(br->ofproto). We look at them one at a time.

static void
bridge_run__(void)
{
    struct bridge *br;
    struct sset types;
    const char *type;

    /* Let each datapath type do the work that it needs to do. */
    sset_init(&types);
    ofproto_enumerate_types(&types);
    SSET_FOR_EACH (type, &types) {
        ofproto_type_run(type);
    }
    sset_destroy(&types);

    /* Let each bridge do the work that it needs to do. */
    HMAP_FOR_EACH (br, node, &all_bridges) {
        ofproto_run(br->ofproto);                                  // process every bridge in all_bridges
    }
}

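First, ofproto_type_run(type). It calls the class's type_run() function, which is the type_run() defined in ofproto-dpif.c. In that function, if the upper layer has enabled receiving packets from the datapath, udpif_set_threads(backer->udpif, n_handlers, n_revalidators) is called to tell the udpif how many threads it needs for handling upcalls. That leads to udpif_start_threads(udpif, n_handlers, n_revalidators), which starts the handler threads running udpif_upcall_handler(). Each handler thread reads upcalls from the dpif (datapath interface), processes them, and installs the corresponding flows. It does this through recv_upcalls(handler), which in turn calls process_upcall() to handle each upcall.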
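
The upcall path described above can be pictured as a slow path filling a fast-path cache: a packet that misses in the datapath is handed up, the handler decides what to do with it, and the result is installed so that later packets of the same flow no longer need an upcall. The sketch below is a single-threaded toy model of that idea only. It uses an integer as the flow key, treats "no matching rule" as drop, and every name in it (struct toy_switch, handle_miss_upcall(), switch_receive(), and so on) is hypothetical rather than taken from OVS.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FLOWS 32
#define ACTION_DROP (-1)

struct flow_entry {
    int key;                    /* Toy flow key, e.g. a destination id. */
    int out_port;               /* Output port, or ACTION_DROP. */
};

struct flow_table {
    struct flow_entry entries[MAX_FLOWS];
    size_t n;
};

/* Linear lookup; returns NULL on a miss. */
static const struct flow_entry *
table_lookup(const struct flow_table *t, int key)
{
    size_t i;

    for (i = 0; i < t->n; i++) {
        if (t->entries[i].key == key) {
            return &t->entries[i];
        }
    }
    return NULL;
}

/* Appends an entry.  Returns 0 on success, -1 if the table is full. */
static int
table_insert(struct flow_table *t, int key, int out_port)
{
    if (t->n >= MAX_FLOWS) {
        return -1;
    }
    t->entries[t->n].key = key;
    t->entries[t->n].out_port = out_port;
    t->n++;
    return 0;
}

struct toy_switch {
    struct flow_table rules;     /* Slow path: stand-in for the rule tables. */
    struct flow_table dp_cache;  /* Fast path: stand-in for datapath flows. */
    unsigned int n_upcalls;
};

/* What a handler does with one miss: consult the slow-path rules, install
 * the decision in the fast path (best effort), and return the action. */
static int
handle_miss_upcall(struct toy_switch *sw, int key)
{
    const struct flow_entry *rule = table_lookup(&sw->rules, key);
    int out_port = rule ? rule->out_port : ACTION_DROP;

    sw->n_upcalls++;
    table_insert(&sw->dp_cache, key, out_port);
    return out_port;
}

/* Fast path: a cache hit is answered directly, a miss becomes an upcall. */
static int
switch_receive(struct toy_switch *sw, int key)
{
    const struct flow_entry *hit;

    assert(sw != NULL);
    hit = table_lookup(&sw->dp_cache, key);
    return hit ? hit->out_port : handle_miss_upcall(sw, key);
}
```

Only the first packet of a flow pays for the upcall; the second packet with the same key is answered from dp_cache. The real implementation spreads this work over several handler threads and uses revalidator threads to keep installed flows consistent, which the toy model leaves out.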
ofproto_run() is in ofproto.c. It calls ofproto_class->run(p), which, following the analysis above, is the run function of ofproto_dpif_class in ofproto-dpif.c. It also calls connmgr_run(p->connmgr, handle_openflow) to process OpenFlow messages coming from the controller:

int
ofproto_run(struct ofproto *p)
{
    int error;
    uint64_t new_seq;

    error = p->ofproto_class->run(p);
    if (error && error != EAGAIN) {
        VLOG_ERR_RL(&rl, "%s: run failed (%s)", p->name, ovs_strerror(error));
    }

    run_rule_executes(p);

    /* Restore the eviction group heap invariant occasionally. */
    if (p->eviction_group_timer < time_msec()) {
        size_t i;

        p->eviction_group_timer = time_msec() + 1000;

        for (i = 0; i < p->n_tables; i++) {
            struct oftable *table = &p->tables[i];
            struct eviction_group *evg;
            struct rule *rule;

            if (!table->eviction_fields) {
                continue;
            }

            if (table->n_flows > 100000) {
                static struct vlog_rate_limit count_rl =
                    VLOG_RATE_LIMIT_INIT(1, 1);
                VLOG_WARN_RL(&count_rl, "Table %"PRIuSIZE" has an excessive"
                             " number of rules: %d", i, table->n_flows);
            }

            ovs_mutex_lock(&ofproto_mutex);
            CLS_FOR_EACH (rule, cr, &table->cls) {
                if (rule->idle_timeout || rule->hard_timeout) {
                    if (!rule->eviction_group) {
                        eviction_group_add_rule(rule);
                    } else {
                        heap_raw_change(&rule->evg_node,
                                        rule_eviction_priority(p, rule));
                    }
                }
            }

            HEAP_FOR_EACH (evg, size_node, &table->eviction_groups_by_size) {
                heap_rebuild(&evg->rules);
            }
            ovs_mutex_unlock(&ofproto_mutex);
        }
    }

    if (p->ofproto_class->port_poll) {
        char *devname;

        while ((error = p->ofproto_class->port_poll(p, &devname)) != EAGAIN) {
            process_port_change(p, error, devname);
        }
    }

    new_seq = seq_read(connectivity_seq_get());
    if (new_seq != p->change_seq) {
        struct sset devnames;
        const char *devname;
        struct ofport *ofport;

        /* Update OpenFlow port status for any port whose netdev has changed.
         *
         * Refreshing a given 'ofport' can cause an arbitrary ofport to be
         * destroyed, so it's not safe to update ports directly from the
         * HMAP_FOR_EACH loop, or even to use HMAP_FOR_EACH_SAFE.  Instead, we
         * need this two-phase approach. */
        sset_init(&devnames);
        HMAP_FOR_EACH (ofport, hmap_node, &p->ports) {
            uint64_t port_change_seq;

            port_change_seq = netdev_get_change_seq(ofport->netdev);
            if (ofport->change_seq != port_change_seq) {
                ofport->change_seq = port_change_seq;
                sset_add(&devnames, netdev_get_name(ofport->netdev));
            }
        }
        SSET_FOR_EACH (devname, &devnames) {
            update_port(p, devname);
        }
        sset_destroy(&devnames);

        p->change_seq = new_seq;
    }

    connmgr_run(p->connmgr, handle_openflow);

    return error;
}

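ofproto_run() calls the run() function in ofproto-dpif.c through ofproto_class->run(p).
Inside run(), connmgr_send_packet_in() is called to send an OFPT_PACKET_IN message to each controller; that function calls schedule_packet_in() to schedule the transmission.
netflow_run() and sflow_run() are called optionally, to support NetFlow and sFlow.
At the end of ofproto_run(), connmgr_run() is called, which in turn calls ofconn_run(). Inside that function, rconn_run() takes care of connecting to the controller, rconn_recv() receives data from the controller, and handle_openflow() (in ofproto.c) processes the messages received from the controller.
Finally, back in ovs-vswitchd.c:
unixctl_server_run(unixctl): receives data on the unixctl server and executes the corresponding control commands.
netdev_run(): calls run() on every netdev_class registered in netdev_classes.
Next come the wait calls for memory, bridge, unixctl_server and netdev, which register the events to wait for through poll_fd_wait().
poll_block(): blocks until an event registered earlier with poll_fd_wait() occurs, or until the shortest timeout registered with poll_timer_wait() expires.
After the loop exits, bridge_exit() shuts down the bridges, unixctl_server_destroy() closes the unixctl connection, and service_stop() stops the service.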
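
The connmgr_run(p->connmgr, handle_openflow) call covered above passes the message handler in as a callback, so the connection manager only moves messages and the protocol logic stays in ofproto.c. Here is a minimal sketch of that split. The message types, struct connection, connection_enqueue(), manager_run() and handle_message() are hypothetical simplifications and do not reflect the real connmgr or OpenFlow data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical message types; real OpenFlow defines many more. */
enum msg_type { MSG_ECHO_REQUEST, MSG_FLOW_MOD, MSG_PACKET_OUT };

struct message {
    enum msg_type type;
    int payload;
};

#define QUEUE_LEN 8

/* Stand-in for one controller connection with a queue of received
 * messages, plus a little state for the handler to update. */
struct connection {
    struct message rx[QUEUE_LEN];
    size_t head, tail;          /* Free-running counters, index mod QUEUE_LEN. */
    unsigned int n_flow_mods;
    int last_echo;
};

typedef void message_handler(struct connection *, const struct message *);

/* Queues a received message.  Returns 0 on success, -1 if the queue is full. */
static int
connection_enqueue(struct connection *conn, struct message msg)
{
    if (conn->tail - conn->head >= QUEUE_LEN) {
        return -1;
    }
    conn->rx[conn->tail++ % QUEUE_LEN] = msg;
    return 0;
}

/* Drains every connection and hands each message to 'handler'.  This layer
 * knows nothing about what the messages mean.  Returns the number of
 * messages handled. */
static size_t
manager_run(struct connection *conns, size_t n_conns,
            message_handler *handler)
{
    size_t n_handled = 0;
    size_t i;

    assert(handler != NULL);
    for (i = 0; i < n_conns; i++) {
        struct connection *conn = &conns[i];

        while (conn->head != conn->tail) {
            handler(conn, &conn->rx[conn->head++ % QUEUE_LEN]);
            n_handled++;
        }
    }
    return n_handled;
}

/* Protocol logic: dispatch on the message type. */
static void
handle_message(struct connection *conn, const struct message *msg)
{
    switch (msg->type) {
    case MSG_ECHO_REQUEST:
        conn->last_echo = msg->payload;
        break;
    case MSG_FLOW_MOD:
        conn->n_flow_mods++;
        break;
    case MSG_PACKET_OUT:
        break;
    }
}
```

With this split, custom SDN behavior goes into the handler's switch statement and the connection handling stays untouched, which matches the earlier advice to concentrate changes in handle_openflow.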

4. Summary

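The sections above analyzed the OVS 2.4.0 source from a beginner's point of view, following the path that packets take. For anyone working on SDN, the ofproto module is especially important and worth reading in more detail.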
