9. Pipelined design

  • Pipeline is an important technique to increase the performance of a system.
  • The basic idea is to overlap the processing of several tasks so that more tasks can be completed in the same amount of time.

9.1. Delay and throughput

  • Delay and throughput are the two criteria used to examine the performance of a system

    • Delay: the time required to complete one task.
    • Throughput: the number of tasks that can be completed per unit time.
  • Adding pipeline to a combinational circuit can increase the throughput but not reduce the delay.

9.2. Overview on pipelined design

  • The pipelining technique can be applied to a task that is processed in stages.

    • An example: Pipelined laundry.
    • A complete laundry includes the stages of washing, drying and folding, for example.
../_images/c9_non_pipelined.jpg
  • For non-pipelined process, a new load cannot start until the previous load is completed.

    • It takes 240 minutes to complete the four loads.
    • The delay of processing one load is 60 minutes.
    • The throughput is 1/60 load per minute.
../_images/c9_pipelined.jpg
  • For pipelined process,

    • It takes 120 minutes to complete the four loads.
    • The delay in processing one load remains 60 minutes.
    • The throughput increases to 2/60 load per minute.
    • To process k loads, it will take 40+20k minutes.
    • The throughput becomes k/(40+20k) load per minute -> 1/20 load per minute when k is large.

Pipelined combinational circuit

  • A combinational circuit can be divided into stages so that the processing of different tasks can be overlapped.
  • To ensure that the signals in each stage flow in the correct order and prevent any potential race, registers must be added between successive stages.
  • The registers ensures that the signals can be passed to the next stage only at predetermined points.
../_images/c9_circuit.jpg

Assume: propagation delay for each stage: T1, T2, T3 and T4. Tmax = max(T1, T2, T3, T4); Thus, the minimum clock period has to accommodate the longest delay plus the overhead introduced by the buffer register in each stage: Tc = Tmax + Tr;

The effectiveness of the circuit

  • Propagation delay:

    • non-pipelined circuit: T comb = T1+T2+T3+T4
    • pipelined circuit: Tpipe = 4Tc = 4Tmax + 4Tr
  • Throughput:

    • non-pipelined circuit: 1/T comb;
    • pipelined circuit: k/(3Tc+kTc) -> 1/Tc.

Ideally, for an N-stage circuit

  • The propagation delay of each stage is identical (i.e., Tmax = Tcomb/N)

  • The register overhead (Tr) is comparably small

    • T pipe = NTc = NTmax = T comb
    • Throughput: 1/Tc = 1/Tmax = N/ T comb
  • Ideally, it is desirable to have more stages in the pipeline. However, when N becomes large,

    • the propagation delay of each stage becomes smaller, but Tr remains the same; its effect cannot be ignored.
    • In reality, it is difficult to keep dividing the original combinational circuit into smaller and smaller stages.

9.3. Adding pipeline to a combinational circuit

  • The candidate circuits for effective pipeline design should include the following characteristics:

    • There is enough input data to feed the pipeline circuit.
    • The throughput is the main performance criterion.
    • The combinational circuit can be divided into stages with similar propagation delay.
    • The propagation delay of a stage is much longer than the delay incurred due to the register.

The procedure to convert a combinational circuit to a pipelined design

  • Derive the block diagram of the original combinational circuit and arrange it as a cascading chain.
  • Identify the major components and estimate the relative propagation delays of these components.
  • Divide the chain into stages of similar propagation delays.
  • identify the signals that cross the boundary of the chain.
  • Insert registers for these signals in the boundary.

Examples

Simple pipelined adder-based multiplier

../_images/c9_chart.jpg ../_images/c9_multiplier.jpg

Non-pipelined multiplier in cascading stages

../_images/c9_Non_pipelined_multiplier.jpg

Non-pipelined circuit:

-- stage 2
pp2 <= pp1 + bp2;
-- stage 3
pp3 <= pp2 + bp3;

pipelined circuit:

../_images/c9_pipelined_circuit.jpg
-- register
if (reset = ‘1’) then
    pp2_reg <= (others => ‘0’);
    pp3_reg <= (others => ‘0’);
elsif (clk’event and clk=‘1’) then
    pp2_reg <= pp2_next;
    pp3_reg <= pp3_next;
end if;
…
-- stage 2
pp2_next <= pp1_reg + bp2;
-- stage 3
pp3_next <= pp2_reg + bp3;

Examples

Pipelined multiplier

More efficient Pipelined multiplier

  • Use a smaller (n+1)-bit adder to replace the 2n-bit adder in an n-bit multiplier.
  • Reduce the size of the partial-product register
  • Reduce the size of the registers that hold the b signal.