A Single-Cycle CPU Design in Verilog HDL (Complete Code with Comments)

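Before we start: this blog post is my original work, and reposting it in any form is strictly prohibited! It may only be hosted on cnblogs (.cnblogs.com). If you are reading it on any other site, please follow the single legitimate link below to the original.

The only legitimate URL for this post: http://www.cnblogs.com/acm-icpcer/p/9289857.html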

(1) shift:

module shift (d,sa,right,arith,sh);
input  [31:0]  d;
input  [4:0]     sa;
input  right,arith;
output [31:0] sh;
reg  [31:0] sh;
always  @*  begin
    if   (!right)  begin                        //shift left
        sh = d << sa;
    end else  if   (!arith)  begin              //shift right logical
        sh =  d  >>  sa;
    end else begin                              //shift right arithmetic
        sh =  $signed(d)  >>>  sa;
    end
end
endmodule

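The shifter. << shifts left, >> is a logical right shift (zero fill), and >>> applied to a signed operand is an arithmetic right shift (the sign bit is replicated), not an unsigned shift. sa is the shift amount in either direction. A 32-bit value is shifted by at most 31 bits, so sa is a 5-bit input.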
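To make the difference between the three operators concrete, here is a minimal simulation sketch. The module name shift_ops_demo and the test value are my own choices for illustration and are not part of the original design; it only assumes a Verilog-2001 simulator.

```verilog
// Hypothetical testbench (not from the original post): compares <<, >> and >>>.
module shift_ops_demo;
    reg  [31:0] d;
    reg  [4:0]  sa;
    wire [31:0] left    = d << sa;             // logical left: zeros enter on the right
    wire [31:0] right_l = d >> sa;             // logical right: zeros enter on the left
    wire [31:0] right_a = $signed(d) >>> sa;   // arithmetic right: sign bit is replicated

    initial begin
        d  = 32'h800000F0;   // MSB set, so the two right shifts give different results
        sa = 5'd4;
        #1;
        $display("<<  : %h", left);      // 00000f00
        $display(">>  : %h", right_l);   // 0800000f
        $display(">>> : %h", right_a);   // f800000f
        $finish;
    end
endmodule
```

Note that >>> only behaves arithmetically because the operand is wrapped in $signed; on an unsigned operand it would give the same result as >>.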

Figure 1

(2) scinstmem (the left blue circle in Figure 1):

module scinstmem (a,inst); 
    input [31:0] a; 
    output [31:0] inst; 
    wire [31:0] rom [0:31];
    assign    rom[5'h00] = 32'h3c010000;    //    (00)    main: lui r1,0
    assign    rom[5'h01] = 32'h34240050;    //    (04)        ori r4,r1,80
    assign    rom[5'h02] = 32'h20050004;    //    (08)        addi r5,r0, 4
    assign    rom[5'h03] = 32'h0c000018;    //    (0c)    call: jal sum
    assign    rom[5'h04] = 32'hac820000;    //    (10)        sw r2,0(r4)
    assign    rom[5'h05] = 32'h8c890000;    //    (14)        lw    r9,    0(r4)
    assign    rom[5'h06] = 32'h01244022;    //    (18)        sub    r8,    r9,    r4
    assign    rom[5'h07] = 32'h20050003;    //    (1c)        addi    r5,    r0,    3
    assign    rom[5'h08] = 32'h20a5ffff;    //    (20)    loop2:    addi    r5,    r5,    -1
    assign    rom[5'h09] = 32'h34a8ffff;    //    (24)        ori    r8,    r5,    0xffff
    assign    rom[5'h0A] = 32'h39085555;    //    (28)        xori    r8,    r8,    0x5555
    assign    rom[5'h0B] = 32'h2009ffff;    //    (2c)        addi    r9,    r0,    -1
    assign    rom[5'h0C] = 32'h312affff;    //    (30)        andi    r10,    r9,    0xffff
    assign    rom[5'h0D] = 32'h01493025;    //    (34)        or    r6,    r10,    r9
    assign    rom[5'h0E] = 32'h01494026;    //    (38)        xor    r8,    r10,    r9
    assign    rom[5'h0F] = 32'h01463824;    //    (3c)        and    r7,    r10,    r6
    assign    rom[5'h10] = 32'h10a00001;    //    (40)        beq    r5,    r0,    shift
    assign    rom[5'h11] = 32'h08000008;    //    (44)        j    loop2    
    assign    rom[5'h12] = 32'h2005ffff;    //    (48)    shift:    addi    r5,    r0,    -1
    assign    rom[5'h13] = 32'h000543c0;    //    (4c)        sll    r8,    r5,    15
    assign    rom[5'h14] = 32'h00084400;    //    (50)        sll    r8,    r8,    16
    assign    rom[5'h15] = 32'h00084403;    //    (54)        sra    r8,    r8,    16
    assign    rom[5'h16] = 32'h000843c2;    //    (58)        srl    r8,    r8,    15
    assign    rom[5'h17] = 32'h08000017;    //    (5c)    finish:    j    finish    
    assign    rom[5'h18] = 32'h00004020;    //    (60)    sum:    add    r8,    r0,    r0
    assign    rom[5'h19] = 32'h8c890000;    //    (64)    loop:    lw    r9,    (r4)
    assign    rom[5'h1A] = 32'h20840004;    //    (68)        addi    r4,    r4,    4
    assign    rom[5'h1B] = 32'h01094020;    //    (6c)        add    r8,    r8,    r9
    assign    rom[5'h1C] = 32'h20a5ffff;    //    (70)        addi    r5,    r5,    -1
    assign    rom[5'h1D] = 32'h14a0fffb;    //    (74)        bne    r5,    r0,    loop
    assign    rom[5'h1E] = 32'h00081000;    //    (78)        sll    r2,    r8,    0
    assign    rom[5'h1F] = 32'h03e00008;    //    (7c)        jr    r31        
    assign    inst = rom[a[6:2]];

endmodule

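The read-only instruction memory holds the stored program as 32 words of 32 bits each. Each word holds one instruction in machine-code form (8 hex digits for the 32 bits), and the comment after each word gives its assembly form. The input a carries the current value of pc, and the output inst is the instruction word that pc points to (the instruction itself, not a register number). That is why the module ends with assign inst = rom[a[6:2]]; pc is a byte address and every instruction is 4 bytes long, so the low two bits are dropped and a[6:2] is the 5-bit index of the word inside rom.

(3) scdatamem (the right blue circle in Figure 1):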
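The address-to-index mapping can be checked with a small simulation sketch. The module name pc_index_demo and the chosen addresses are assumptions for illustration only; the slice pc[6:2] is the one used by the instruction memory.

```verilog
// Hypothetical testbench: maps byte addresses to 5-bit ROM word indices.
module pc_index_demo;
    reg  [31:0] pc;
    wire [4:0]  index = pc[6:2];   // drop the 2 byte-offset bits, keep 5 index bits

    initial begin
        pc = 32'h00000000; #1 $display("pc=%h index=%h", pc, index);   // index 00
        pc = 32'h00000004; #1 $display("pc=%h index=%h", pc, index);   // index 01
        pc = 32'h00000060; #1 $display("pc=%h index=%h", pc, index);   // index 18 (label sum)
        pc = 32'h00000080; #1 $display("pc=%h index=%h", pc, index);   // index 00 again
        $finish;
    end
endmodule
```

The last case shows that addresses above 0x7c wrap around, because bits 7 and up are ignored by the slice.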

module scdatamem (clk,dataout,datain,addr,we,inclk,outclk);
input    [31:0]    datain;
input    [31:0]    addr ;
input        clk, we, inclk, outclk;
output    [31:0]    dataout;
reg [31:0] ram    [0:31];
assign    dataout    =ram[addr[6:2]];
always @ (posedge clk) begin
    if (we) ram[addr[6:2]] <= datain;    // nonblocking assignment for a clocked write
end
integer i;
initial begin
    for (i = 0;i < 32;i = i + 1)
        ram[i] = 0;
    ram[5'h14] = 32'h000000a3;
    ram[5'h15] = 32'h00000027;
    ram[5'h16] = 32'h00000079;
    ram[5'h17] = 32'h00000115;
end
endmodule

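we is the write enable. As Figure 1 shows, the control unit decodes it and sends it over. The if (we) statement therefore writes the input datain into the word of ram selected by addr[6:2] whenever we is asserted, and always @ (posedge clk) means the write only happens on a rising clock edge. Reads are combinational: dataout always shows the word selected by addr[6:2]. The block that follows:

initial begin
    for (i = 0;i < 32;i = i + 1)
        ram[i] = 0;
    ram[5'h14] = 32'h000000a3;
    ram[5'h15] = 32'h00000027;
    ram[5'h16] = 32'h00000079;
    ram[5'h17] = 32'h00000115;
end

clears the memory and then preloads four words whose meaning depends on the assembly program stored in the instruction memory. Indices 5'h14 to 5'h17 correspond to byte addresses 0x50 to 0x5c; the program puts 80 (0x50) into r4 and 4 into r5 before jal sum, so these are the four words that the sum routine adds up. I won't analyze the program any further here.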
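As a rough cross-check of what the sum routine should produce from these four words, the following sketch adds them in a testbench using the same addr[6:2] indexing. It is not the CPU running the program; the module name ram_sum_demo and the loop are my own illustration.

```verilog
// Hypothetical testbench: adds the four preloaded words the way the sum loop walks them.
module ram_sum_demo;
    reg [31:0] ram [0:31];
    reg [31:0] addr, sum;
    integer i;

    initial begin
        for (i = 0; i < 32; i = i + 1)
            ram[i] = 0;
        ram[5'h14] = 32'h000000a3;
        ram[5'h15] = 32'h00000027;
        ram[5'h16] = 32'h00000079;
        ram[5'h17] = 32'h00000115;

        sum = 0;
        for (i = 0; i < 4; i = i + 1) begin
            addr = 32'h50 + 4 * i;        // byte addresses 0x50, 0x54, 0x58, 0x5c
            sum  = sum + ram[addr[6:2]];  // same word indexing as scdatamem
        end
        $display("sum = %h", sum);        // 00000258
        $finish;
    end
endmodule
```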

(4) sccu_dataflow (control unit decode module):

module sccu_dataflow (op,func,z,wmem,wreg,regrt,m2reg,aluc,shift,aluimm,pcsource,jal,sext);
    input [5:0] op,func;
    input z;
    output wreg,regrt,jal,m2reg,shift,aluimm,sext,wmem;
    output [3:0] aluc;
    output [1:0] pcsource;
    
    wire r_type = ~|op;
    
    wire i_add = r_type&func[5]&~func[4]&~func[3]&~func[2]&~func[1]&~func[0];
    wire i_sub = r_type&func[5]&~func[4]&~func[3]&~func[2]&func[1]&~func[0];
    wire i_and = r_type&func[5]&~func[4]&~func[3]&func[2]&~func[1]&~func[0];
    wire i_or = r_type&func[5]&~func[4]&~func[3]&func[2]&~func[1]&func[0];
    wire i_xor = r_type&func[5]&~func[4]&~func[3]&func[2]&func[1]&~func[0];
    wire i_sll = r_type&~func[5]&~func[4]&~func[3]&~func[2]&~func[1]&~func[0];
    wire i_srl = r_type&~func[5]&~func[4]&~func[3]&~func[2]&func[1]&~func[0];
    wire i_sra = r_type&~func[5]&~func[4]&~func[3]&~func[2]&func[1]&func[0];
    wire i_jr = r_type&~func[5]&~func[4]&func[3]&~func[2]&~func[1]&~func[0];
    wire i_addi = ~op[5]&~op[4]&op[3]&~op[2]&~op[1]&~op[0];
    wire i_andi = ~op[5]&~op[4]&op[3]&op[2]&~op[1]&~op[0];
    wire i_ori = ~op[5]&~op[4]&op[3]&op[2]&~op[1]&op[0];
    wire i_xori = ~op[5]&~op[4]&op[3]&op[2]&op[1]&~op[0];
    wire i_lw = op[5]&~op[4]&~op[3]&~op[2]&op[1]&op[0];
    wire i_sw = op[5]&~op[4]&op[3]&~op[2]&op[1]&op[0];
    wire i_beq = ~op[5]&~op[4]&~op[3]&op[2]&~op[1]&~op[0];
    wire i_bne = ~op[5]&~op[4]&~op[3]&op[2]&~op[1]&op[0];
    wire i_lui = ~op[5]&~op[4]&op[3]&op[2]&op[1]&op[0];
    wire i_j = ~op[5]&~op[4]&~op[3]&~op[2]&op[1]&~op[0];
    wire i_jal = ~op[5]&~op[4]&~op[3]&~op[2]&op[1]&op[0];
    
    assign wreg = i_add|i_sub|i_and|i_or|i_xor|i_sll|i_srl|i_sra|i_addi|i_andi|i_ori|i_xori|i_lw|i_lui|i_jal;
    assign regrt= i_addi|i_andi|i_ori|i_xori|i_lw|i_lui;
    assign jal= i_jal;
    assign m2reg= i_lw;
    assign shift=i_sll|i_srl|i_sra;
    assign aluimm=i_addi|i_andi|i_ori|i_xori|i_lw|i_lui|i_sw;
    assign sext =i_addi|i_lw|i_sw|i_beq|i_bne;
    assign aluc[3]=i_sra;
    assign aluc[2]=i_sub|i_or|i_srl|i_sra|i_ori|i_lui;
    assign aluc[1]=i_xor|i_sll|i_srl|i_sra|i_xori|i_beq|i_bne|i_lui;    // i_srl added: SRL is 0111, so aluc[1] must be 1
    assign aluc[0]=i_and|i_or|i_sll|i_srl|i_sra|i_andi|i_ori;
    assign wmem = i_sw;
    assign pcsource[1]=i_jr|i_j|i_jal;
    assign pcsource[0]=i_beq&z|i_bne&~z|i_j|i_jal;
endmodule

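Figure 2

This is the decode part of the control unit. The wire statements cover the 20 MIPS instructions handled here. The first wire, r_type, is high when op is all zeros. The next 9 wires are the R-type instructions: each one first checks r_type and then uses the func field at the end of the instruction to tell which R-type instruction it is. The remaining 11 wires identify their instruction directly from the op field. Once the instruction is known it has to be turned into control signals, and that is what the pile of assign statements at the end does.

Figure 3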
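The bit-by-bit AND terms can also be read as plain equality tests on op and func. The sketch below shows that reading for four of the instructions only; the module names decode_sketch and decode_sketch_tb are hypothetical, and this is an alternative way of writing the decode, not the code the original uses.

```verilog
// Sketch: equality-based decode for a few instructions (not the original module).
module decode_sketch (op, func, i_add, i_sub, i_lw, i_sw);
    input  [5:0] op, func;
    output i_add, i_sub, i_lw, i_sw;
    wire   r_type = (op == 6'b000000);            // same test as ~|op
    assign i_add  = r_type & (func == 6'b100000);
    assign i_sub  = r_type & (func == 6'b100010);
    assign i_lw   = (op == 6'b100011);
    assign i_sw   = (op == 6'b101011);
endmodule

// Hypothetical testbench: feeds three words from the program in scinstmem.
module decode_sketch_tb;
    reg  [31:0] inst;
    wire i_add, i_sub, i_lw, i_sw;
    decode_sketch dut (inst[31:26], inst[5:0], i_add, i_sub, i_lw, i_sw);

    initial begin
        inst = 32'h01244022;   // sub r8, r9, r4
        #1 $display("add=%b sub=%b lw=%b sw=%b", i_add, i_sub, i_lw, i_sw);
        inst = 32'h8c890000;   // lw r9, 0(r4)
        #1 $display("add=%b sub=%b lw=%b sw=%b", i_add, i_sub, i_lw, i_sw);
        inst = 32'hac820000;   // sw r2, 0(r4)
        #1 $display("add=%b sub=%b lw=%b sw=%b", i_add, i_sub, i_lw, i_sw);
        $finish;
    end
endmodule
```

Exactly one of the four outputs should be high for each of the three words.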

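Here are a few of the assign statements, to show how they work:

1)

2)

assign pcsource[1]=i_jr|i_j|i_jal;

assign pcsource[0]=i_beq&z|i_bne&~z|i_j|i_jal;

The two pcsource assignments tell the 4-to-1 multiplexer which source supplies the next pc value (the address of the next instruction to execute). Going by the nextpc instance in module (5): 00 selects p4 (pc + 4), 01 selects the branch target adr (a taken beq or bne), 10 selects ra, the register read for jr, and 11 selects jpc, the target of j and jal. So for a jump or a taken branch, the next address comes either from the register file or from an address computed out of the instruction fields.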
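The selection that pcsource drives can be written out as a case statement. This is only a sketch of the same 4-to-1 choice under the port order of the nextpc instance in module (5); the names nextpc_sketch and nextpc_sketch_tb and the sample addresses are assumptions for illustration.

```verilog
// Sketch: next-pc selection written as a case statement (illustrative only).
module nextpc_sketch (p4, adr, ra, jpc, pcsource, npc);
    input  [31:0] p4, adr, ra, jpc;
    input  [1:0]  pcsource;
    output [31:0] npc;
    reg    [31:0] npc;
    always @* begin
        case (pcsource)
            2'b00: npc = p4;    // sequential: pc + 4
            2'b01: npc = adr;   // taken beq / bne
            2'b10: npc = ra;    // jr: address read from a register
            2'b11: npc = jpc;   // j / jal
        endcase
    end
endmodule

// Hypothetical testbench: steps through the four pcsource values.
module nextpc_sketch_tb;
    reg  [1:0]  pcsource;
    wire [31:0] npc;
    integer k;
    nextpc_sketch dut (32'h00000044, 32'h00000048, 32'h00000010, 32'h00000020, pcsource, npc);

    initial begin
        for (k = 0; k < 4; k = k + 1) begin
            pcsource = k;   // truncated to the low 2 bits
            #1 $display("pcsource=%b npc=%h", pcsource, npc);
        end
        $finish;
    end
endmodule
```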

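3)

assign aluc[3]=i_sra;

When the current instruction uses the alu, the control unit has to tell the alu which operation to perform, so it generates the aluc control signals that are passed to the alu. aluc[3] is set only for sra; the shifter uses it to tell an arithmetic right shift from a logical one.

The remaining assign statements follow the same idea, so I won't go through them.

(5) sccpu_dataflow: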

module  sccpu_dataflow(clock, resetn, inst,mem,pc, wmem,alu,data);
input     [31:0]   inst,mem;
input         clock,resetn;
output   [31:0]  pc,alu,data;
 
output wmem;
wire  [31:0] p4 , bpc, npc, adr, ra, alua, alub, res, alu_mem;
wire  [3:0] aluc;
wire  [4:0] reg_dest, wn;
wire  [1:0] pcsource;
wire  zero, wmem, wreg, regrt, m2reg, shift, aluimm, jal, sext;
wire  [31:0]  sa  =  {27'b0,inst[10:6]};
sccu_dataflow cu  (inst[31:26] , inst[5:0] , zero, wmem,wreg,regrt,m2reg, aluc, shift, aluimm,pcsource, jal, sext);
wire   e  =  sext  &  inst[15];
wire   [15:0]       imm =  {16{e}};
wire  [31:0]       immediate  =  {imm,inst[15:0]};
wire  [31:0]  offset  =  {imm[13:0],inst[15:0],2'b00};    // moved below imm so that imm is declared before use
dff32  ip  (npc,clock,resetn,pc);
cla32  pcplus4   (pc,32'h4,1'b0,p4);
cla32  br_adr     (p4,offset,1'b0, adr);
wire  [31:0]        jpc =  {p4[31:28],inst[25:0],2'b00};
mux2x32  alu_b  (data, immediate,aluimm, alub) ;
mux2x32  alu_a  (ra,sa,shift,alua);
mux2x32  result   (alu,mem,m2reg,alu_mem);
mux2x32  link (alu_mem,p4,jal,res);
mux2x5  reg_wn   (inst[15:11], inst[20: 16] , regrt, reg_dest);
assign wn = reg_dest   |   {5{jal}};                                 //jal:  r31  <-- p4;
mux4x32  nextpc  (p4,adr,ra, jpc,pcsource,npc);
regfile  rf   (inst[25:21] ,inst[20:16] ,res,wn,wreg,clock,resetn,ra,data);
alu  al_unit   (alua,alub,aluc,alu, zero); 
endmodule

This module instantiates all the components shown in Figure 3 and wires them together, so it describes how data is passed between them.

(6) sccomp_dataflow:
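The immediate extension and branch offset wires are easy to misread, so here is a small sketch that evaluates them for the bne instruction at address 0x74 of the program. The module name imm_demo is hypothetical, and a plain + stands in for the cla32 adder used by the original.

```verilog
// Hypothetical testbench: sign extension and branch target for bne r5, r0, loop.
module imm_demo;
    reg  [31:0] inst, p4;
    reg         sext;
    wire        e         = sext & inst[15];               // sign bit, only when sext is set
    wire [15:0] imm       = {16{e}};                       // 16 copies of the sign bit
    wire [31:0] immediate = {imm, inst[15:0]};             // extended 32-bit immediate
    wire [31:0] offset    = {imm[13:0], inst[15:0], 2'b00}; // immediate * 4
    wire [31:0] adr       = p4 + offset;                   // stand-in for cla32 br_adr

    initial begin
        inst = 32'h14a0fffb;   // bne r5, r0, loop  (immediate = -5)
        p4   = 32'h00000078;   // pc + 4 for the instruction at 0x74
        sext = 1'b1;
        #1 $display("immediate=%h offset=%h adr=%h", immediate, offset, adr);
        // immediate=fffffffb offset=ffffffec adr=00000064 (the label loop)
        $finish;
    end
endmodule
```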

module  sccomp_dataflow(clock, resetn, inst, pc, aluout, memout,mem_clk);
input  clock, resetn,mem_clk;
output   [31:0]  inst,pc, aluout,memout;
wire [31:0]   data;
wire   wmem;
sccpu_dataflow s (clock, resetn, inst,memout,pc, wmem, aluout, data);
scinstmem imem (pc,inst);
scdatamem dmem (clock, memout, data, aluout, wmem, mem_clk, mem_clk);
endmodule

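The top-level module. As the code structure shows, it connects the CPU datapath (sccpu_dataflow), the instruction memory holding the stored program, and the data memory.

(7) regfile: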

module regfile  (rna, rnb, d, wn,we, clk, clrn, qa, qb);
input       [4:0]  rna,rnb,wn;
input     [31:0]  d;
input     we, clk, clrn;
output  [31:0]  qa,qb;
reg     [31:0]  register  [1:31];  //  31  x  32-bit  regs


//  2  read ports
assign qa  =   (rna ==  0) ? 0 : register[rna]; 
assign qb  =   (rnb ==  0) ? 0 : register[rnb];
 


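//  1  write port
integer i;  // loop index, declared at module level (an unnamed begin-end block cannot hold declarations in Verilog-2001)
always @(posedge clk or negedge clrn)
begin 
if  (clrn==0)  
begin
    for(i=1;i<32;i=i+1)
    register[i] <= 0;
end 
else  if((wn!=0)&&we)
register[wn]  <= d;
end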
endmodule

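This is the register file at the center of Figure 3. The first part declares the registers, and the rest implements the read ports and the write port.

For example, assign qa = (rna == 0) ? 0 : register[rna]; reads the register selected by rna and drives its value onto the output qa; register 0 always reads as 0.

(8) mux4x32: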

module mux4x32 (a0,a1,a2,a3,s,y);
    input [31:0] a0,a1,a2,a3;
    input [1:0] s;
    output [31:0] y;
    function [31:0] select;
        input [31:0] a0,a1,a2,a3;
        input [1:0] s;
        case (s)
            2'b00: select = a0;
            2'b01: select = a1;
            2'b10: select = a2;
            2'b11: select = a3;
        endcase
    endfunction
    assign  y = select(a0,a1,a2,a3,s);
endmodule

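A 4-to-1 multiplexer: it picks which of its four inputs is passed to the output. With only four sources, a 2-bit s is enough to select among them. The code routes one of a0, a1, a2, a3 to y according to s, and s is normally driven by module (4), the control unit decode module.

(9) The remaining multiplexers: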

module mux2x5 (a0,a1,s,y);
    input [4:0] a0,a1;
    input s;
    output [4:0] y;
    assign y = s?a1:a0;
endmodule

module mux2x32 (a0,a1,s,y);
    input [31:0] a0,a1;
    input s;
    output [31:0] y;
    assign y = s?a1:a0;
endmodule

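These do the same job as (8) but are much simpler. One is a 2-to-1 multiplexer for 5-bit data and the other is a 2-to-1 multiplexer for 32-bit data.

(10) dff32 (32-bit register):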

module dff32(d,clk,clrn,q);
    input [31:0] d;
    input     clk,clrn;
    output [31:0] q;
    reg  [31:0] q;
    always @ (negedge clrn or posedge clk)
        if (clrn == 0) begin 
            q <= 0;
        end else begin
            q <= d;
        end
endmodule

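An ordinary 32-bit register. When the active-low clear signal clrn goes low, the register is cleared asynchronously; otherwise it stores the 32-bit input d on each rising clock edge, and q is its output.

Figure 4

(11) Carry-lookahead adder (CLA) implementation: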

module cla32 (a,b,ci,s,co);
    input   [31:0]  a,b;
    input  ci;
    output   [31:0]   s;
    output co;
    wire  g_out,p_out;
    cla_32  cla   (a,b, ci,g_out,p_out, s); 
    assign  co  =  g_out| p_out &  ci;
endmodule




module add(a,b,c,g,p,s);
    input a,b,c;
    output g,p,s;
    assign s = a^b^c;
    assign g = a & b;
    assign p = a | b;
endmodule



module g_p  (g,p,c_in,g_out,p_out,c_out);
input  [1:0]  g,p;
input  c_in;
output g_out, p_out, c_out;
assign g_out = g[1]|p[1] & g[0];
assign p_out = p[1]  & p[0];
assign c_out = g[0]   |  p[0]  &  c_in;
endmodule



module cla_2 (a,b,c_in,g_out,p_out,s) ;
input  [1:0]  a,b;
input c_in;
output g_out, p_out;
output  [1:0]  s;
wire  [1:0]  g,p;
wire c_out;
add add0 (a[0],b[0],c_in, g[0],p[0],s[0]);
add add1 (a[1],b[1],c_out, g[1],p[1],s[1]);
g_p g_p0 (g,p,c_in,  g_out,p_out,c_out);
endmodule

module cla_4 (a,b, c_in,g_out,p_out,s);
input  [3:0]  a,b;
input  c_in;
output g_out, p_out;
output  [3:0]  s;
wire  [1:0]  g,p;
wire c_out;
cla_2 cla0 (a[1:0],b[1:0],c_in, g[0],p[0],s[1:0]);
cla_2 cla1 (a[3:2],b[3:2],c_out,g[1],p[1],s[3:2]);
g_p    g_p0  (g,p,c_in, g_out,p_out,c_out);
endmodule

module  cla_8   (a,b, c_in,g_out,p_out, s);
input   [7:0]  a,b;
input  c_in;
output  g_out, p_out;
output   [7:0]   s;
wire   [1:0]   g,p;
wire  c_out;
cla_4  cla0  (a[3:0],b[3:0],c_in, g[0],p[0],s[3:0]);
cla_4  cla1  (a[7:4],b[7:4],c_out,g[1],p[1],s[7:4]);
g_p   g_p0  (g,p,c_in,  g_out,p_out,c_out);
endmodule


module cla_16 (a,b, c_in,g_out,p_out, s);
input   [15:0]  a,b;
input  c_in;
output  g_out, p_out;
output   [15:0]  s;
wire  [1:0]  g,p;
wire  c_out;
cla_8  cla0   (a[7:0],b[7:0],c_in,g[0],p[0],s[7:0]);
cla_8  cla1   (a[15:8],b[15:8],c_out,g[1],p[1],s[15:8]);
g_p    g_p0  (g,p,c_in,  g_out,p_out,c_out);
endmodule


module cla_32  (a,b,c_in,g_out,p_out, s);
input  [31:0]  a,b;
input c_in;
output  g_out, p_out;
output  [31:0]  s;
wire  [1:0]  g,p;
wire c_out;
cla_16 cla0 (a[15:0],b[15:0],c_in,g[0],p[0],s[15:0]);
cla_16 cla1 (a[31:16],b[31:16],c_out,g[1],p[1],s[31:16]);
g_p    g_p0  (g,p,c_in, g_out,p_out,c_out);
endmodule

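This code uses carry-lookahead addition. It starts from the basic cell add, a 1-bit full adder that also outputs a generate signal g and a propagate signal p, builds the 2-bit adder cla_2 from two of them, and then combines pairs of blocks step by step into the 4-bit cla_4, the 8-bit cla_8, the 16-bit cla_16, and the 32-bit cla_32 (see Yuan Chunfeng of Nanjing University, "Computer Organization and System Architecture (2nd edition)", Tsinghua University Press, p. 72). Every level has the same structure, two half-width blocks plus one g_p unit, so the construction is recursive in spirit: each level simply doubles the width of the one below it.

(12) alu (arithmetic logic unit):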
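The logic in add, g_p, and cla32 can be summarized as equations. The symbols G, P, and c_mid are my own shorthand (they stand for g_out, p_out, and the c_out wire of g_p that feeds the upper block); index 0 is the lower half and index 1 the upper half.

```latex
\begin{aligned}
g_i &= a_i \land b_i, \qquad p_i = a_i \lor b_i, \qquad s_i = a_i \oplus b_i \oplus c_i \\
G &= g_1 \lor (p_1 \land g_0) \\
P &= p_1 \land p_0 \\
c_{\text{mid}} &= g_0 \lor (p_0 \land c_{\text{in}}) \\
c_{\text{out}} &= G \lor (P \land c_{\text{in}})
\end{aligned}
```

The first line is the add cell, the next three are the g_p unit, and the last line is the carry-out computed in cla32. Because G and P do not depend on the carry-in, each level can compute them before the incoming carry arrives.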

module alu (a,b,aluc,r,z);
input [31:0] a,b;
input [3:0] aluc;              // aluc[3:0]  operation
output  [31:0]  r;             // x 0 0 0    ADD
output z;                      // x 1 0 0    SUB
wire  [31:0]  d_and = a & b;   // x 0 0 1    AND
wire  [31:0] d_or = a | b;     // x 1 0 1    OR
wire  [31:0] d_xor = a ^ b;    // x 0 1 0    XOR
wire  [31:0]  d_lui = {b[15:0],16'h0};           // x 1 1 0    LUI
wire  [31:0]  d_and_or = aluc[2]?d_or : d_and;   // 0 0 1 1    SLL
wire  [31:0]  d_xor_lui= aluc[2]?d_lui : d_xor;  // 0 1 1 1    SRL
wire  [31:0]  d_as,d_sh;                         // 1 1 1 1    SRA
addsub32 as32  (a,b,aluc[2],d_as);
shift shifter  (b,a[4:0],aluc[2],aluc[3],d_sh) ;
mux4x32 select  (d_as,d_and_or, d_xor_lui, d_sh, aluc[1:0],r);
assign z = ~|r;
endmodule

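Besides basic arithmetic, the alu provides logic operations such as XOR, bitwise AND, and bitwise OR. The 4-bit aluc decides which operation is performed (4 bits could encode up to 16 operations). In detail:

1) wire [31:0] d_and = a & b; computes the bitwise AND.

2) wire [31:0] d_or = a | b; computes the bitwise OR.

3) wire [31:0] d_xor = a ^ b; computes the bitwise XOR.

4) wire [31:0] d_lui = {b[15:0],16'h0}; puts the low 16 bits of b in the upper half, fills the lower half with zeros, and concatenates the two into a 32-bit value.

5) wire [31:0] d_and_or = aluc[2]?d_or : d_and; chooses between the AND result and the OR result.

6) The next wire works the same way as 5): aluc[2] chooses between d_lui and d_xor.

7) addsub32 as32(a,b,aluc[2],d_as); calls module (13) below to add or subtract.

8) shift shifter (b,a[4:0],aluc[2],aluc[3],d_sh) ; calls module (1) to shift left or right.

9) The mux4x32 instance makes the final 4-to-1 choice among the add/subtract result, the AND/OR result, the XOR/LUI result, and the shift result using aluc[1:0], and drives r.

(13) addsub32 (32-bit add/subtract module):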

module addsub32(a,b,sub,s);
    input [31:0] a,b;
    input          sub;
    output [31:0] s;
    cla32 as32 (a,b^{32{sub}},sub,s);
endmodule

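The addsub32 module adds or subtracts 32-bit data, depending on sub. When sub is 0 it computes a + b. When sub is 1, b is XORed with all ones (that is, inverted) and sub is also fed in as the carry-in, so the adder computes a + ~b + 1, which is a - b in two's complement.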
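The invert-and-add-one trick can be tried out in isolation. In this sketch the module name addsub_demo is hypothetical and the built-in + operator stands in for cla32; the expression otherwise mirrors what addsub32 passes to the adder.

```verilog
// Hypothetical testbench: a + (b ^ {32{sub}}) + sub gives a+b or a-b.
module addsub_demo;
    reg  [31:0] a, b;
    reg         sub;
    wire [31:0] s = a + (b ^ {32{sub}}) + sub;   // sub doubles as the carry-in

    initial begin
        a = 32'd10;
        b = 32'd3;
        sub = 1'b0; #1 $display("sub=%b s=%0d", sub, s);   // 13
        sub = 1'b1; #1 $display("sub=%b s=%0d", sub, s);   // 7
        $finish;
    end
endmodule
```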

tz@COI HZAU

2018/7/10
