# Enhancing hardware design through domain-specific language tools for higher abstraction

Luca De Santis / Micron Technology Italy



Seville 17-19 October, 2024 https://langdevcon.org





## The context

- For many decades, hardware design has been driven by the processor model: addressing problems at the software level and executing code on general-purpose hardware.
- With the end of Dennard scaling, the focus shifted toward multi-core and multi-processor architectures.
- To achieve better performance, reduce costs, and lower power consumption, we have now entered the era of heterogeneous computing. In this approach, complex digital systems are composed of a blend of standard processors, customized processors, storage media, complex state machines, reconfigurable hardware and adhoc accelerators.



Hennessy and Patterson, Turing Lecture 2018, overlaid over "42 Years of Processors Data" https://www.karlrupp.net/2018/02/42-years-of-microprocessor-trend-data/;"First Wave" added by Les Wilson, Frank Schirrmeister

Original data up to the year 2010 collected and plotted by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, and C. Batten New plot and data collected for 2010-2017 by K. Rupp

#### Turing Lecture 2018

Innovations like domain-specific hardware, enhanced security, open instruction sets, and agile chip development will lead the way.

BY JOHN L. HENNESSY AND DAVID A. PATTERSON

A New Golden Age for Computer Architecture



# The problem

- Hardware design relies on the concepts of Hardware Description Language (HDL) and automatic synthesis.
- In the 80s, two major HDLs were introduced: Verilog and VHDL, and today all industrial-level chip development flows are based on these ones.
- However, as system complexity grew, the limitations of these older languages became evident. This
  highlighted the need to:
  - Increase the level of abstraction in hardware design.
  - Introduce "stronger typing for wires" in HDLs, preventing arbitrary connections between components. Some researchers have pointed out that hardware designers are now facing challenges comparable to the "goto" problem that software engineers encountered during the early development of programming languages.



# Some solutions: the EDA landscape

Over the years, several attempts have been made to move beyond the digital abstraction in hardware design:

- High-level synthesis: Focused on algorithms described using programming languages. It has been highly
  successful in the DSP domain and data-intensive systems but is not as well suited for control-intensive
  systems.
- Processor synthesis: Centered on defining instruction sets. This approach involves expensive tools, and having the compiler integrated in the process adds significant value.
- Hardware-software co-design and co-synthesis: Relies on heuristics, it is not yet a fully developed or standardized methodology.

| High-level synthesis               | Processor<br>Synthesis | Hardware/Software Co-<br>design |  |
|------------------------------------|------------------------|---------------------------------|--|
| «Beyond Verilog» abstractions (?)  |                        |                                 |  |
| Logic Synthesis (RTL Verilog/VHDL) |                        |                                 |  |
| Physical Design                    |                        | Configuration map               |  |
| ASIC                               |                        | FPGA                            |  |



# The real problem

«Beyond Verilog» abstractions are evolving toward incorporating software engineering concepts into hardware design, but:

- Hardware engineers generally do not have the same mind-set as software engineers.
- Hardware design requires strict adherence to fundamental constraints:
  - Area/performance/power trade-off
  - Fine tuning of timing (low level synchronizations, physical delays)
  - Strict control of «states», to avoid undesired hidden ones and deadlocks
  - Practical tools for definition of customized instruction sets and communication protocols
  - Readability of synthesized code for debug

A Golden Age of Hardware Description Languages: Applying Programming Language Techniques to Improve Design Productivity

Lenny Truong Stanford University, USA lenny@cs.stanford.edu

Pat Hanrahan Stanford University, USA hanrahan@cs.stanford.edu

3rd Summit on Advances in Programming Languages (SNAPL 2019)



# The challenge

- There is a common feeling that the right way to face digital systems complexity is by increasing the level
  of abstraction of hardware description languages to stimulate a «correct-by-design» approach
- A common objection is that abstraction means losing details, but looking at the famous Dijkstra's citation: "The purpose of abstraction is not to be vague, but to create a new semantic level in which one can be absolutely precise"
- Can we adopt concepts and tools from the domain-specific language (DSL) field to meet this need?
- Key point is: to capture what is the «semantics» of digital systems representations at any level of description



# One example

- The simplest useful object in hardware design is the D-type flip-flop: it samples a signal when a clock event occurs; on the right is reported a description in synthesizable Verilog
- The problem here is that clock, reset and data can be everything; constraints on these signals are not defined at language level but on post-processing tools
- Here is how Chisel (one of the most promising languages aimed to increase the level of abstraction of hardware) defines a D-type flip-flop

```
class ExampleModule extends Module {
  val io = IO(new Bundle {
    val d = Input(Bool()) // Data input
    val q = Output(Bool()) // Output
  })
  // Define the register with reset behavior
  val reg = withReset(reset.asBool) {
    RegNext(io.d, init = false.B) // Initialize the register to 'false'
  (reset state)
    }
  // Connect the output
  io.q := reg
}
```



 Here is what a hardware designer would like to have: explicit declaration of clock and reset domains

Domain clock, reset {

Reg out = data {init(0)};



# Another example

 A system composed of a memory and a counter that flushes data according to the value of an input signal «go»



out

Here is what a hardware designer would like to have: actions instead of wiring



 This is an example of a very basic memory driver, made simple by introducing the concept of an «action» over a block, overtaking the practice of just «wiring» blocks



# One more example

- Finite state machine (FSM) is one the fundamental concepts in hardware design. Defining real-world state machines can be highly challenging due to the need for exhaustive coverage of "state/input/next state/output" tuples and the difficulty of identifying and managing errors.
- This is an example of a simple state machine: the classical exercise of a recognizer of strings composed of initial symbol 'a' followed by an arbitrary number of symbols 'b' and a final symbol 'c'



 Here is what a hardware designer would like to have: tables instead of code with special symbols like 'any' and 'hold';

```
Table(state, in -> state, out) {
    idle, start -> S1 ;
    S1 , `a' -> S2 ;
    S2 , `b' -> hold ;
    S2 , `c' -> S3 , done;
    S3 , any -> hold ;
    any, any -> error ;
}
```



# The proposal

#### Explicit definition of clock and reset domains

- Easier and cleaner definition of registers and sequential circuits
- Automatic synchronization between domains

#### Actions

- Easier and cleaner definition of interfaces between some standard building blocks like memories, stacks and queues
- High-level description of protocols

#### Enhanced state transition tables

- Compact definition of complex state machines and customized instruction sets
- Easier error detection
- Helps to avoid unwanted hidden states



# Design Criteria for a language

- One file One module (easy prototyping, forces engineers to be organized)
- Common HDL features:
  - Hierarchical instances, in/out ports, parameterization
  - Basic digital types (bit strings, bin/hex literals)
  - Typical digital operators (wires concatenation and slicing)
- Strong typing of wires, registers and memories
- Explicit clock and reset domains
- Clear differentiation between «stateless» vs. «stateful» objects
- Standard storage media (memories, stacks, queues...) and access protocols
- Multidimensional array access (for ML/AI accelerators)
- Enhanced transition tables with actions
- Iterators for indexed declarations



# A possible flow (ANTLR+Python)





# Typing (ANTLR grammar)

| topItem:                                                                  |                            |
|---------------------------------------------------------------------------|----------------------------|
| 'Parameter' ID (sz = size)? ('=' e = expr)? ';'                           | # parDecl                  |
| 'Wire' ID (sz = size)? (a = attrs)? ';'                                   | # wireDecl                 |
| <pre>'Input' ID (sz = size)? (a = attrs)? ';'</pre>                       | <pre># inpDecl</pre>       |
| <pre>'Clock' ID (sz = size)? ';'</pre>                                    | <pre># clkDecl</pre>       |
| 'Reset' ID (sz = size)? ';'                                               | <pre># rstDecl</pre>       |
| <pre>'Index' ID ('=' e = expr)? ';'</pre>                                 | <pre># iterDecl</pre>      |
| <pre>  'Logic' ID (sz = size)? (a = attrs)? ('=' e = expr)? ';'</pre>     | <pre># logicDecl</pre>     |
| <pre>  'State' ID (sz = size)? (a = attrs)? ('=' e = expr)? ';'</pre>     | <pre># stateDecl</pre>     |
| <pre>  'Reg' ID (sz = size)? (a = attrs)? ('=' e = expr)? ';'</pre>       | <pre># regDecl</pre>       |
| <pre>'Instance' n = ID (sz = size)? 'of' m = ID (msz = size)? (</pre>     |                            |
| a = attrs                                                                 |                            |
| )? (c = conns)? ';'                                                       | <pre># instDecl</pre>      |
| 'Action' ID ':' a = action ';'                                            | <pre># actDecl // TO</pre> |
| <pre>  'Action' ID ':' '{' alst += action (';' alst += action)? '}'</pre> | <pre># multiActDecl</pre>  |
| 'Table' ID h = tableHead b = tableBody                                    | <pre># tableDecl</pre>     |
| <pre>'Domain' c = ID ',' r = ID '{' (t += topItem)* '}'</pre>             | <pre># domDecl</pre>       |
| <pre>'assign' lhs = lhsexpr '=' rhs = expr ';'</pre>                      | <pre># assignment;</pre>   |



# A simple processor as a first example



#### Instruction Set :

- NOP , HALT , JMP , JMPR, JMPZ, JMPNZ, IJMP, IJMPR, IJMPZ, IJMPNZ, ERR , LOAD, STORE, SETV,
- ADD, SUB, EQ, NEQ, LT, LTE, GT, GTE, AND, NAND, OR, NOR, XOR, XNOR, NOT



# Source code entry

= 8; Parameter DW Parameter CODE SIZE = 1024 ; Parameter CODE\_WIDTH = 21; Parameter CODE\_ADDR\_SIZE = 10 ; Parameter DATA SIZE = 128 ; Parameter REG\_SIZE = 32; Input go ; Clock clk ; Reset rst\_ ; Domain clk, rst\_ { State state [4] { enum(IDLE , EXE , FETCH, ERROR, DONE, EXE\_ALU, EXE ALU FLAG, STALL LOAD, STALL ALU, STALL FLAG) } ; [CODE\_ADDR\_SIZE]; Reg pc ir [CODE WIDTH]; Reg Reg flag [8]; acc [DW]; Reg Reg acc2 [DW]; Instance ram of mem [DW,DATA\_SIZE] {init("ram\_init.bin")}; Instance rom of bank [CODE WIDTH,CODE SIZE] {init("rom code.bin")}; Instance rf of bank [DW,REG\_SIZE] ;

| // combinationals       |                                                           |
|-------------------------|-----------------------------------------------------------|
| Logic icode [5] { enum( | NOP , HALT , JMP , JMPR, JMPZ, JMPNZ, IJMP, IJMPR, IJMPZ, |
|                         | IJMPNZ, ERR , LOAD, STORE, SETV,                          |
|                         | ADD, SUB, EQ, NEQ, LT, LTE, GT, GTE, AND, NAND, OR,       |
|                         | NOR, XOR, XNOR, NOT )}                                    |
| = ir[20:16:-1] ;        |                                                           |

Logic flagz [1] = flag[0];

// instruction fields
Logic f\_loc [CODE\_ADDR\_SIZE] = ir[15:0:-1];
Logic f\_add [8] = ir[7:0:-1];
Logic f\_val [8] = ir[7:0:-1];



## Dynamic section (Actions+Table)

Action incrpc : pc := pc + 1; Action wrram : ram[f add] := acc ; Action rdrom : ir := rom[pc]; Action setacc : acc := f val ; Action calc : acc := case icode of { ADD : acc + acc2 ; SUB : acc - acc2 ; AND : acc & acc2 ; NAND : not(acc & acc2); OR : acc | acc2 ; NOR : not( acc | acc2); XOR : acc ^ acc2 ; XNOR : not( acc ^ acc2); NOT : not acc ; Action calc flag := case icode of { EQ : acc == acc2; NEQ : acc != acc2 ; LT : acc < acc2 ; LTE : acc <= acc2 ; GT : acc > acc2 ;GTE : acc  $\geq$  acc2 ; };

Table table(state, icode -> state ) { IDLE, any, go -> FETCH : IDLE, any, not go -> IDLE; FETCH, any -> EXE, rdrom ; EXE, NOP -> FETCH, incrpc ; EXE, LOAD -> STALL LOAD, ram.read(f add) ; STALL LOAD, any -> FETCH, acc := ram.dout, incrpc ; -> FETCH, wrram, incrpc ; EXE, STORE EXE, SETV -> FETCH, setacc, incrpc ; EXE, [ADD, SUB, AND, NAND, OR, NOR, XOR, XNOR] -> STALL\_ALU, ram.read(f\_add); STALL\_ALU, any -> EXE\_ALU, acc2 := ram.dout ; EXE ALU, any -> FETCH, calc , incrpc ; EXE, NOT -> FETCH, calc , incrpc ; EXE, [EQ, NEQ, LT, LTE, GT, GTE] -> STALL FLAG, ram.read(f add); STALL FLAG , any -> EXE ALU FLAG , acc2 := ram.dout ; EXE\_ALU\_FLAG, any -> FETCH, calc\_flag, incrpc ; EXE, JMP -> FETCH, pc := f loc ; EXE, JMPR -> FETCH, pc := pc+f val ; EXE, JMPZ, flagz -> FETCH, pc := f\_loc -> FETCH, incrpc ; //EXE, JMPZ, not flagz -> FETCH, incrpc ; EXE, JMPNZ, not flagz -> FETCH, pc := f loc -> FETCH, incrpc ; //EXE, JMPNZ, flagz -> FETCH, incrpc; EXE, IJMPR -> FETCH, pc := pc + acc ; EXE, IJMP -> FETCH, pc := acc ; EXE, HALT -> DONE ; EXE, any -> ERROR ; DONE, any -> DONE; ERROR, any -> ERROR



# Conclusion

- «Beyond Verilog» abstraction is moving towards higher level software concepts, but hardware engineers are uncomfortable on that.
- An intermediate level of representation for digital systems is needed to allow designers to maintain strict control over hardware fundamentals.
- A possible approach is to capture the correct hardware semantics using a domain-specific language methodology.
- "Big opportunities for software engineers to help hardware designers in embracing a growing complexity" (Truong, Hanrahan)





# Closing

Idesantis@micron.com

https://www.linkedin.com/in/luca-de-santis-42a87a5/



Seville 17-19 October, 2024 https://langdevcon.org