High-Level Synthesis (HLS) is a technology of the 21st century. Much of the industry is working in this area, yet unfortunately you will find very little information about it. We at VLSI Expert always try to fill such gaps. This series of articles aims to give you in-depth knowledge, from basic to advanced. The author of this article is Mr. Rishabh Jain (Senior Member Technical Staff, Mentor Graphics Pvt. Ltd.), who agreed to share his experience at our request. This is the first of several articles in the pipeline.
Table of Contents
Proclaiming the Requirement
Diving into the History
Defining the rescuer
Understanding the Use Case
References
Proclaiming the Requirement
As a hardware designer, which step do you think demands the most brainstorming when you convert a given specification into the corresponding implementation? Is it the HDL modelling style you are going to choose? Or maybe the technology library you are going to use? Or perhaps the clock frequency that gives the best balance of performance, area and power?
The aim is to build a design with the architecture best suited to meet the specification.
Let’s consider the following design:
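module top(clk, arst, in, out);
    input             clk;   // clock
    input             arst;  // asynchronous, active-high reset
    input      [31:0] in;    // 32-bit input data bus
    output reg [31:0] out;   // 32-bit registered output data bus

    always @(posedge clk or posedge arst)
    begin
        if (arst == 1'b1)
            out <= 32'b0;    // reset clears the register
        else
            out <= in;       // capture the input on the rising clock edge
    end
endmodule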
What you see above is a Verilog design of a 32-bit D flip-flop (DFF) register.
For experts in the domain, it is evident that it is, more precisely, a D flip-flop with an asynchronous reset.
For newcomers, let me explain a bit here:
INPUT pins are clk, the clock pin; arst, an asynchronous reset pin; and in, a 32-bit input data bus.
OUTPUT pin is out, a 32-bit output data bus.
On every positive clock edge, the value of in is transferred to the out data bus as long as arst is 0. As soon as arst rises to 1, out is cleared to a 32-bit 0 without waiting for a clock edge.
The design above implements an active-high reset. Consider a scenario where you wish to reuse this DFF design with an active-low reset, or with a synchronous reset. You would have to rewrite the HDL code, because the architectural requirement has changed even though the functionality is the same.
To overcome the problem, let’s consider the same design with High Level Synthesis (HLS):
void top(int in, int& out)
{
    out = in;
}
Yep! That’s pretty much it: a few lines of C++ code, with no need to specify a reset or a clock. Those can easily be added later, during the synthesis process. The key is to focus on the functionality and to give the designer the freedom to experiment with multiple architectures to fulfil the requirements.
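Because the description above is ordinary C++, it can be compiled with any host compiler and checked as software before a clock or reset is ever mentioned. The sketch below only illustrates that idea: `top` is copied from the listing above, while `check_top` is a hypothetical helper name introduced here and is not part of any HLS tool.

```cpp
#include <cassert>

// Design under test, copied from the article: no clock, no reset,
// only the intended data movement from input to output.
void top(int in, int& out)
{
    out = in;
}

// Hypothetical software testbench (the name is an assumption of this sketch).
// It drives a few values through the function and asserts that the
// output always mirrors the input.
void check_top()
{
    const int stimuli[] = {0, 1, -1, 42, 2147483647};
    for (int value : stimuli) {
        int result = 0;
        top(value, result);
        assert(result == value);
    }
}
```

Calling `check_top()` from a small `main()` is enough to exercise the function on a workstation, long before any hardware exists.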
But now you might be asking: what exactly is HLS? What happened to the traditional HDL design methodologies? Are there any advantages beyond what was explained above? Where are the inputs, the outputs and the data bus width defined in the HLS design above?
Let’s see the answers to all of these questions.
Diving into the History
In the VLSI design process, synthesis has been a significant step in the initial stages of both the ASIC and FPGA design flows. Synthesis can be simply defined as the process of transforming your design from a higher level of abstraction to a lower level of abstraction.
In terms of hardware description languages (HDLs), we define synthesis as converting an HDL model of the hardware (higher abstraction) into the corresponding gate-level implementation (lower abstraction).
The process of designing hardware has changed a lot over the years, from handwritten netlists to CAD tools to HDLs, and it is still evolving to make design easier as complexity increases. The table below shows the growth of design complexity, in terms of the number of transistors involved, over the past few decades.
With increasing design complexity, the hardware design process adopted HDLs such as Verilog and VHDL, which brought a revolution in the VLSI industry with the following key advantages:
- Large designs are easy to express, with flexibility
- Abstraction hides the complexity of the design
- Time to market is reduced
- Optimization is easier, with trade-off capabilities (area vs. speed)
As the algorithms being used evolved, especially large and complex machine learning algorithms, the HDL modelling methodology started to look unlikely to meet designers’ requirements. Consider the following limitations of using HDLs:
- Complex algorithmic designs, such as computer vision and image processing, are hard to implement.
- A faster time to market with good quality of results is very challenging.
- Verification cost and debug time are high.
- There is little flexibility for handling frequent changes in specifications.
- When the technology library changes, the design has to be modified, e.g. when moving from FPGA to ASIC.
Defining the rescuer
As we have discussed so far, there is a need for a methodology that can be used for designing complex algorithms and that focuses on functionality more than on timing. High-level synthesis addresses this need.
High-Level Synthesis (HLS) can be defined as the automated design process that transforms a behavioral or functional description of a design into a digital hardware implementation. Some may argue that the term high-level synthesis can be used interchangeably with behavioral synthesis. Among the different modelling styles that HDLs offer, behavioral modelling and its synthesis did seem to fit the definition of HLS. The problem was the lack of a methodology in that process.
To begin with, the design entry language was unfamiliar, as not all designers were comfortable with behavioral Verilog or VHDL. Moreover, having to use synthesizable constructs such as the “always” block or “@(posedge clk)” distracted designers from the algorithm itself.
To establish a methodology, high-level languages were introduced into the hardware design process. HLS thus adopted C, C++ and SystemC as design languages, which in turn made writing complex algorithms easier. Therefore, we define High-Level Synthesis as the process that takes an algorithmic description of the design in C/C++/SystemC (a high-level language) as input and produces a corresponding FSM and data path as output, which is then processed further into an HDL-based RTL netlist.
Please note that while designing with HLS, the functionality or algorithmic behavior of the design is the priority, not its timing or architecture. Now that we have understood what HLS is, let me illustrate its power with an example in the next section.
Understanding the Use Case
In the first section, while explaining the requirement and making the case for HLS, I showed how easy the design becomes using a DFF with asynchronous reset, where neither the clock nor the reset is specified at this stage of the HLS design.
The port width is defined simply by the native C++ “int” datatype, which is 32 bits wide on typical platforms, and the “&” (reference) declarator marks the registered output port. Additionally, the HLS tool automatically adds the interface protocol details.
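One caveat on the width: the C++ standard only guarantees that `int` is at least 16 bits wide. It is 32 bits on most current compilers, which is what the text assumes. A minimal sketch of how the width could be stated explicitly with the standard `<cstdint>` types follows. The name `top_fixed` is hypothetical, and whether a given HLS tool prefers `std::int32_t` or its own bit-accurate types is outside the scope of this sketch.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// The width of int32_t is fixed by the type itself, not by the platform.
static_assert(sizeof(std::int32_t) * CHAR_BIT == 32, "int32_t must be 32 bits");

// Hypothetical variant of the article's top(): 'in' is passed by value
// (a read-only input), 'out' is a reference that the function writes to
// (an output).
void top_fixed(std::int32_t in, std::int32_t& out)
{
    out = in;
}
```

The pass-by-value versus pass-by-reference distinction is what carries the port direction here: the function can only read `in`, and the only way it can return data is by writing through `out`.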
Let us look at another example: designing a 32-bit, 4-input adder with HLS.
void acc(int din[4], int &dout)
{
    int acc = 0;
    for (int i = 0; i < 4; i++)
    {
        acc += din[i];
    }
    dout = acc;
}
The design has four 32-bit inputs, din[0] through din[3]. The output of the design, declared with the “&” reference, is dout, a 32-bit-wide data bus.
So far, we have only specified the functionality: the sum of all 4 inputs is transferred to the output data bus.
Please note that no architectural information has been provided. We can easily obtain multiple hardware implementations from the same design.
For instance, we can have a fully parallel implementation with a registered output. That implementation is not optimized in terms of resources, since the resulting hardware uses 3 adders.
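To make the three-adder structure concrete, the data flow of a fully parallel implementation can be written out by hand. This only illustrates what such hardware computes: in an HLS flow the source stays the rolled loop and the tool derives the structure from constraints. The names `acc_loop` and `acc_parallel` are hypothetical (the article's function is called `acc`), and the balanced-tree arrangement is one possible way to place three adders, an assumption rather than a statement about any tool's output.

```cpp
#include <cassert>

// The article's accumulator, with names changed for readability.
void acc_loop(const int din[4], int& dout)
{
    int sum = 0;
    for (int i = 0; i < 4; i++) {
        sum += din[i];
    }
    dout = sum;
}

// Hand-unrolled view of the same function: three additions in total,
// two of which do not depend on each other and could run side by side.
void acc_parallel(const int din[4], int& dout)
{
    int s01 = din[0] + din[1];   // adder 1
    int s23 = din[2] + din[3];   // adder 2 (independent of adder 1)
    dout = s01 + s23;            // adder 3
}
```

Both functions return the same sum for the same inputs; only the shape of the computation differs.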
For minimum area, we can apply different constraints and force the tool to use a single adder resource, which is then shared across all the additions. Similarly, a fully pipelined version of the adder can be achieved by providing a different set of architectural constraints.
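The single-adder variant can be pictured as a small state machine that reuses one adder over several steps. The sketch below is a hand-written software model of that idea, not tool output: the struct `SharedAdderModel`, its fields, the helper `run_shared_adder` and the one-addition-per-step schedule are all assumptions made for illustration.

```cpp
#include <cassert>

// Hypothetical step-by-step model of an accumulator built around ONE adder.
// Each call to step() stands for one clock cycle.
struct SharedAdderModel {
    int index = 0;      // state register: which input is added next
    int sum = 0;        // accumulator register
    bool done = false;  // set once all four inputs have been added

    void step(const int din[4])
    {
        if (done) {
            return;              // result already available, hold the state
        }
        sum = sum + din[index];  // the only data addition in this step
        index++;
        if (index == 4) {
            done = true;
        }
    }
};

// Runs the model to completion and reports how many steps it needed.
int run_shared_adder(const int din[4], int& dout)
{
    SharedAdderModel model;
    int steps = 0;
    while (!model.done) {
        model.step(din);
        steps++;
    }
    dout = model.sum;
    return steps;
}
```

With this schedule the result takes four steps instead of one, which is the latency paid for the smaller area.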
Now imagine having to do the same with RTL design. The designer would probably have to write separate Verilog RTL code for each of the 3 adder architectures discussed above. This is the power and flexibility of HLS.
From the above example, the following advantages of HLS can be easily observed:
- Higher level of abstraction, and therefore a focus on algorithms
- Flexible trade-off between area and timing, as per requirement
- Much easier reusability
- Technology-library-independent design
- Late specification changes can be incorporated with minimal effort
Brace yourself and stay tuned for the upcoming articles in this series, where I’ll answer all the questions raised above and highlight the power of the HLS design methodology.
References:
https://en.wikipedia.org/wiki/High-level_synthesis
https://semiengineering.com/whats-the-real-benefit-of-high-level-synthesis/
https://www.cse.usf.edu/~haozheng/teach/cda4253/doc/hls/hls_bluebook_uv.pdf