Greatest Common Divisor (GCD) |
![]() |
while ( x != y ) {
if ( x < y ) y = y - x;
else x = x - y;
}
Input (x, y) and output signals (xo), and control signals (rst - reset, rdy - ready). The I/O protocol is based on the following - when rst=='0', inputs are latched into internal registers and rdy<='0' indicates that the algorithm has started. The test bench sets rst back to '1' and waits until rdy gets '1' - the algorithm has finished and the result is on the output. Of course, the polarity if control signals can be changed but the the test bench must be modified accordingly.
| Control Data Flow Graph | VHDL |
|
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
entity gcd is
port (xi, yi : in unsigned(15 downto 0);
rst : in bit;
xo : out unsigned(15 downto 0);
rdy : out bit := '1';
clk : in bit);
end gcd;
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
architecture experiments of gcd is
begin
process
variable x, y: unsigned(15 downto 0);
begin
-- Wait for the new input data
wait on clk until clk='1' and rst='0';
x := xi; y := yi; rdy <= '0';
wait on clk until clk='1';
-- Calculate
while x /= y loop
if x < y then y := y - x;
else x := x - y; end if;
end loop;
-- Ready
xo <= x; rdy <= '1';
wait on clk until clk='1';
end process;
end experiments; |
Simulation results of the behavioral model (the test bench is below):
| Test bench | VHDL |
|
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
entity test is end test;
architecture bench of test is
signal clk, rst, rdy, hlt: bit := '1';
signal x, y, res: unsigned(15 downto 0);
component gcd
port (xi, yi : in unsigned(15 downto 0);
rst : in bit;
xo : out unsigned(15 downto 0);
rdy : out bit;
clk : in bit);
end component;
begin
clk <= not clk after 10 ns when hlt='1';
U1: gcd port map (x, y, rst, res, rdy, clk);
process
type int_array is array (0 to 7) of integer;
constant a: int_array := (27, 33, 245, 121, 52, 333, 125, 422);
constant b: int_array := (18, 256, 45, 11, 452, 121, 625, 312);
-- expected results (GCD) 9, 1, 5, 11, 4, 1, 125, 2
begin
wait on clk until clk='0';
for i in a'range loop
x <= conv_unsigned(a(i),16);
y <= conv_unsigned(b(i),16);
rst <= '0';
wait on clk until clk='0';
rst <= '1';
--wait on clk until clk='0';
while rdy = '0' loop
wait on clk until clk='0';
end loop;
wait on clk until clk='0';
end loop;
wait on clk until clk='0';
hlt <= '0'; wait;
end process;
end bench; |
| Clocked behavioral model | VHDL |
|
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
architecture experiments of gcd is
signal x, y: unsigned(15 downto 0);
begin
process begin
-- Wait for the new input data
while rst = '1' loop
wait on clk until clk='1';
end loop;
x <= xi; y <= yi; rdy <= '0';
wait on clk until clk='1';
-- Calculate
while x /= y loop
if x < y then y <= y - x;
else x <= x - y; end if;
wait on clk until clk='1';
end loop;
-- Ready
xo <= x; rdy <= '1';
wait on clk until clk='1';
end process;
end experiments; |
Simulation results of the clocked behavioral model:
| Behavioral state machine | VHDL |
|
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
architecture experiments of gcd is
type state_type is (S_wait, S_start, S_ready);
signal state: state_type;
signal x, y: unsigned(15 downto 0);
begin
process begin
wait on clk until clk='1';
case state is
-- Wait for the new input data
when S_wait =>
if rst='0' then
x <= xi; y <= yi; rdy <= '0'; state <= S_start;
end if;
-- Calculate
when S_start =>
if x /= y then
if x < y then y <= y - x;
else x <= x - y; end if;
state <= S_start;
else
xo <= x; rdy <= '1'; state <= S_ready;
end if;
-- Ready
when S_ready => state <= S_wait;
end case;
end process;
end experiments; |
Simulation results of the behavioral state machine model:
Beginning of the same simulation - changing of states is visible:
Hand-outs about the algorithm description and different implementations (2 slides per page).
The next five models fully match the register-transfer level (RTL) abstractions - there is a clear state machine and the data-part consists of components (register, multiplexers, subtracters, etc). Also, internal control signals exist. All models are synthesizable.
RTL #1 - single ALU, 3 clock cycles per iteration (see structure).
RTL #2 - single ALU, 2 clock cycles per iteration (see structure).
RTL #3 - comparator (less than) controls subtraction, 1 clock cycle per iteration – small but slow (sequential) data-part (see structure).
RTL #4 and RTL #5 - out-of-order execution – both subtractions are calculated first then the decision is made. The difference is how deciders/comparators are implemented (see structure #4 and structure #5).
Synthesis results and RTL schematics of different versions.
VHDL files to compare components - subtracter and two comparators, universal ALU, and test bench for checking.
SystemVerilog files of different GCD models - behavioral model, test bench, clocked behavioral model, behavioral state machine, RTL #1, RTL #2, RTL #3, RTL #4 and RTL #5.
SystemVerilog files to compare components - subtracter and two comparators, universal ALU, and test bench for checking.