Greatest Common Divisor (GCD)

eesti

Algorithm

  while ( x != y ) {
    if ( x < y ) y = y - x;
    else x = x - y;
  }

Interface

Input (x, y) and output signals (xo), and control signals (rst - reset, rdy - ready). The I/O protocol is based on the following - when rst=='0', inputs are latched into internal registers and rdy<='0' indicates that the algorithm has started. The test bench sets rst back to '1' and waits until rdy gets '1' - the algorithm has finished and the result is on the output. Of course, the polarity if control signals can be changed but the the test bench must be modified accordingly.


Control Data Flow Graph VHDL


Behavioral model - not synthesizable, internal signal are missing.
Suitable only for a quick check of the initial idea.


library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;

entity gcd is 
  port (xi, yi : in unsigned(15 downto 0);
        rst    : in bit;
        xo     : out unsigned(15 downto 0);
        rdy    : out bit := '1';
        clk    : in bit);
end gcd;

library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;

architecture experiments of gcd is 
begin 

  process
    variable x, y: unsigned(15 downto 0);
  begin
    -- Wait for the new input data
    wait on clk until clk='1' and rst='0';

    x := xi;    y := yi;    rdy <= '0';
    wait on clk until clk='1';

    -- Calculate
    while  x /= y  loop
      if  x < y  then y := y - x;
      else            x := x - y;    end if;
    end loop;

    -- Ready
    xo <= x;    rdy <= '1';
    wait on clk until clk='1';
  end process;

end experiments;


Simulation results of the behavioral model (the test bench is below):


Test bench VHDL


Can be used with all models. The test bench stops when all input values have been processed.


library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;

entity test is    end test;

architecture bench of test is
  signal clk, rst, rdy, hlt: bit := '1';
  signal x, y, res: unsigned(15 downto 0);

  component gcd
    port (xi, yi : in unsigned(15 downto 0);
          rst    : in bit;
          xo     : out unsigned(15 downto 0);
          rdy    : out bit;
          clk    : in bit);
  end component;
begin
  clk <= not clk after 10 ns when hlt='1';

  U1: gcd port map (x, y, rst, res, rdy, clk);

  process
    type int_array is array (0 to 7) of integer;
    constant a: int_array := (27, 33, 245, 121, 52, 333, 125, 422);
    constant b: int_array := (18, 256, 45, 11, 452, 121, 625, 312);
    -- expected results (GCD)  9,  1,  5,  11,  4,   1,  125,   2  
  begin
    wait on clk until clk='0';
    for  i in a'range  loop
      x <= conv_unsigned(a(i),16);
      y <= conv_unsigned(b(i),16);
      rst <= '0';
      wait on clk until clk='0';
      rst <= '1';
      --wait on clk until clk='0';
      while  rdy = '0'  loop
        wait on clk until clk='0';
      end loop;
      wait on clk until clk='0';
    end loop;
    wait on clk until clk='0';
    hlt <= '0';    wait;
  end process;
end bench;


Clocked behavioral model VHDL


Architecture only.
The behavior of internal signals can be observed at every clock step,
data and control parts can not be distinguished,
synthesizable by some tools only (see structure).
Such a style is useful to validate the algorithm step-by-step.


library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;

architecture experiments of gcd is 
  signal x, y: unsigned(15 downto 0);
begin 

  process begin
    -- Wait for the new input data
    while  rst = '1'  loop
      wait on clk until clk='1';
    end loop;

    x <= xi;    y <= yi;    rdy <= '0';
    wait on clk until clk='1';

    -- Calculate
    while  x /= y  loop
      if  x < y  then y <= y - x;
      else            x <= x - y;    end if;
      wait on clk until clk='1';
    end loop;
    
    -- Ready
    xo <= x;    rdy <= '1';
    wait on clk until clk='1';
  end process;

end experiments;


Simulation results of the clocked behavioral model:


Behavioral state machine VHDL


Architecture only.
The behavior of internal signals can be observed at every clock step,
data and control parts can not be distinguished, but the states are clearly defined.
Synthesizable by most of the synthesis tools, for FPGA-s included (see structure).


library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;

architecture experiments of gcd is
  type state_type is (S_wait, S_start, S_ready);
  signal state: state_type;
  signal x, y: unsigned(15 downto 0);
begin 

  process begin
    wait on clk until clk='1';
    case state  is
    -- Wait for the new input data
    when S_wait =>
      if  rst='0'  then
        x <= xi;    y <= yi;    rdy <= '0';    state <= S_start;
      end if;
    -- Calculate
    when S_start =>
      if  x /= y  then
        if  x < y  then    y <= y - x;
        else               x <= x - y;    end if;
        state <= S_start;
      else
        xo <= x;    rdy <= '1';    state <= S_ready;
      end if;
    -- Ready
    when S_ready =>    state <= S_wait;
    end case;
  end process;

end experiments;

Simulation results of the behavioral state machine model:

Beginning of the same simulation - changing of states is visible:


Hand-outs about the algorithm description and different implementations (2 slides per page).

The other VHDL files

The next five models fully match the register-transfer level (RTL) abstractions - there is a clear state machine and the data-part consists of components (register, multiplexers, subtracters, etc). Also, internal control signals exist. All models are synthesizable.

RTL #1 - single ALU, 3 clock cycles per iteration (see structure).

RTL #2 - single ALU, 2 clock cycles per iteration (see structure).

RTL #3 - comparator (less than) controls subtraction, 1 clock cycle per iteration – small but slow (sequential) data-part (see structure).

RTL #4 and RTL #5 - out-of-order execution – both subtractions are calculated first then the decision is made. The difference is how deciders/comparators are implemented (see structure #4 and structure #5).

Synthesis results and RTL schematics of different versions.


VHDL files to compare components - subtracter and two comparators, universal ALU, and test bench for checking.

SystemVerilog files of different GCD models - behavioral model, test bench, clocked behavioral model, behavioral state machine, RTL #1, RTL #2, RTL #3, RTL #4 and RTL #5.

SystemVerilog files to compare components - subtracter and two comparators, universal ALU, and test bench for checking.