Tuesday, August 27, 2019

Digital Design Verification Subsystem Lessons Learned


  1. Randomize the bus during invalid cycles to catch DUT bugs where the design fails to check the valid signal. The constraint should still give the all-zero bus value some weight, so that 0 can also appear in invalid cycles, in case the bus is gated to 0 by an upper-layer module in the real chip.
  2. Bugs in features that cross two subsystems are difficult to catch. If a value is generated in one module and consumed in the next, a bug is more likely to escape, because catching it requires the model to behave exactly like the real hardware. Solution: a) where possible, implement the feature in a single module instead of splitting it across modules. An example is the idle insertion/deletion that accommodates AM (alignment marker) insertion/deletion in the PCS: it is better to implement it in the PCS than in the MAC.
  3. Bugs in features that are performance-related AND cross two subsystems are extremely difficult to catch. Possible solutions: a) at subsystem level, force counters to count faster, increase packet counts, and so on, so that the scenario becomes more likely; however, this needs very specific test scenarios; b) use hardware acceleration or emulation, e.g. Palladium or Synopsys ZeBu; c) use chip-top tests to simulate the cross-subsystem behavior.
  4. Simplify the architecture based on the real application. Sometimes smart design means simple brute force. This depends on the architect's sensitivity to industry use scenarios. I have two examples: a) initially we designed our switch to support all kinds of protocols and features, both to be compatible with legacy devices and to include new features. This led to overdesign and many bugs (insufficient manpower and verification time). However, data center Ethernet switches are focused on feeds and speeds: they need high throughput and care less about protocols. As a result, many features are never used, and many hot new architectures (like SDN) are never used. This is a big waste of resources. b) ...?
  5. Review every register with DE in review meetings. For every register, decide: a) what it means and what its actual use case is (e.g. how the software team configures it, how SA tests it on the real chip); b) whether it can be randomized in the base test or should be covered in a specific test; c) whether some values are often used by SA and customers on the real chip, and therefore deserve a weighted distribution.
  6. Review the base config randomization constraints for the base test. This is related to 5), as some of the configs are registers. It needs input from the designer and SA to confirm.
  7. Simulation time vs. packet count: do NOT trade packet count for simulation time, i.e. do not try to save compute resources. For verification, the first priority is functional correctness, and the more packets are sent, the more likely a bug is to be hit. CPU resources are cheap; real-chip bugs are expensive!
  8. Run random noise register/memory accesses during every test, but make sure that the noise and the actual traffic can hit the same register/memory, so that corner-case bugs are triggered.
  9. If a feature cannot be checked accurately in every scenario, it should have 2 types of checker: a) a specific test that accurately checks its function; b) a general checker that is enabled in every testcase and acts as a sanity check, in case some fundamental bug in a corner case was not found by a).
  10. Choose constraint ranges and random values carefully. Speed mode, packet count, and event timing are all related and must be considered together when setting a constraint or randomization range.
  11. Keep status variables and counters for monitoring and controlling the TB in the cfg or virtual interface. For example, a counter for the received packet count, or a status variable that tracks whether the DUT is transmitting or idle.
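
As an illustration of lesson 1, here is a minimal Python sketch of a weighted choice between an all-zero bus and random garbage during invalid cycles. A real testbench would express this as a constraint in its own verification language; the function name `drive_bus`, the 32-bit width and the 1:9 weighting are assumptions made up for this example, not values from a real project.

```python
import random

def drive_bus(valid, payload, width=32, zero_weight=1, random_weight=9, rng=None):
    """Return the value to drive on the bus for one cycle.

    Hypothetical helper: valid cycles carry the real payload, invalid
    cycles carry either 0 or a random value, chosen by weight.
    """
    rng = rng or random
    if valid:
        return payload
    # Invalid cycle: mostly random garbage, but keep 0 reachable so the
    # "bus gated to 0 by an upper layer" case is still exercised.
    kind = rng.choices(["zero", "random"], weights=[zero_weight, random_weight])[0]
    if kind == "zero":
        return 0
    return rng.getrandbits(width)
```

With these assumed weights roughly one invalid cycle in ten drives 0; the ratio should be tuned to how the real chip is expected to behave.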

Tuesday, August 20, 2019

Protocols

1. IIC or I²C (Inter-Integrated Circuit)

– Bus topology / routing / resources:

From this point of view, I²C is a clear winner over SPI in sparing pins, board routing and how easy it is to build an I²C network.

– Throughput / Speed:

If data must be transferred at ‘high speed’, SPI is clearly the protocol of choice over I²C. SPI is full-duplex; I²C is not. SPI does not define any speed limit, and implementations often go over 10 Mbps. I²C is limited to 1 Mbps in Fast-mode Plus and to 3.4 Mbps in High-speed mode, the latter requiring specific I/O buffers that are not always easily available.
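
To put rough numbers on the speed difference, the Python sketch below estimates the wire time of a block transfer on both buses. It is a back-of-the-envelope model under stated assumptions: I²C costs 9 bit times per byte (8 data bits plus the ACK/NACK bit) plus one address byte, START/STOP overhead is ignored, and SPI costs 8 clock cycles per byte with no gaps. The function names and the 10 MHz SPI clock are chosen for illustration only.

```python
def i2c_transfer_seconds(n_bytes, bit_rate_hz):
    """Approximate I2C wire time: one address byte plus n data bytes,
    each followed by an ACK/NACK bit (9 bit times per byte).
    START/STOP conditions and clock stretching are ignored."""
    return (n_bytes + 1) * 9 / bit_rate_hz

def spi_transfer_seconds(n_bytes, sclk_hz):
    """Approximate SPI wire time: 8 clock cycles per byte, no gaps."""
    return n_bytes * 8 / sclk_hz

if __name__ == "__main__":
    n = 256
    print("I2C Fast-mode Plus:", i2c_transfer_seconds(n, 1_000_000), "s")
    print("SPI at 10 MHz     :", spi_transfer_seconds(n, 10_000_000), "s")
```

Under these assumptions a 256-byte block takes about 2.3 ms on I²C at 1 Mbps and about 0.2 ms on SPI at 10 MHz.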

– Elegance:

Both SPI and I²C offer good support for communication with low-speed devices, but SPI is better suited to applications in which devices transfer data streams, while I²C is better at multi-master ‘register access’ applications.

Conclusions.

In the world of communication protocols, I²C and SPI are often considered ‘little’ communication protocols compared to Ethernet, USB, SATA, PCI-Express and others, which offer throughput in the range of hundreds of megabits per second, if not gigabits per second. However, one must not forget what each protocol is meant for. Ethernet, USB and SATA are meant for ‘outside the box’ communication and data exchange between whole systems. When communication is needed between integrated circuits, such as a microcontroller and a set of relatively slow peripherals, there is no point in using an excessively complex protocol. There, I²C and SPI perfectly fit the bill, and they have become so popular that any embedded system engineer is very likely to use them during his/her career.

2. RTC (Real-Time Clock)
3. UART (Universal Asynchronous Receiver/Transmitter)
FreeBSD Serial and UART Tutorial
    The Start bit always has a value of 0 (a Space). The Stop bit always has a value of 1 (a Mark). This means that there is always a Mark (1) to Space (0) transition on the line at the start of every word, even when multiple words are transmitted back to back. This guarantees that sender and receiver can resynchronize their clocks regardless of the content of the data bits being transmitted. See also the STM32F103 reference manual, S.27.3.3: 16x oversampling is used to detect noise errors.
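
The framing rule above can be sketched in a few lines of Python. The example assumes the common 8N1 format (8 data bits, no parity, 1 stop bit) with the least significant bit sent first; the function names `uart_frame` and `line_levels` are made up for this illustration.

```python
def uart_frame(byte):
    """Build one 8N1 frame: start bit (0), 8 data bits LSB first, stop bit (1)."""
    data_bits = [(byte >> i) & 1 for i in range(8)]
    return [0] + data_bits + [1]

def line_levels(data, idle_bits=1):
    """Concatenate frames back to back, starting from an idle (Mark) line."""
    line = [1] * idle_bits
    for byte in data:
        line += uart_frame(byte)
    return line

if __name__ == "__main__":
    line = line_levels(b"\x00\xff")
    # Each frame is 10 bits long; its start bit is always preceded by a 1
    # (idle line or previous stop bit), so a 1 -> 0 edge marks every word.
    for k in range(2):
        start = 1 + 10 * k
        print("word", k, "edge:", line[start - 1], "->", line[start])
```

Whatever the data bytes are, the bit just before each start bit is a Mark, which is the edge the receiver resynchronizes on.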

4. ARM AMBA

Read Transaction:
To start the transaction off, the master places the slave's address on the ARADDR line and asserts that there is a valid address (ARVALID). Following time T1, the slave asserts the ready signal (ARREADY). Remember the source of data asserts the valid signal when information is available, while the receiver asserts the ready signal when it is able to consume that information. For a transfer to occur both READY and VALID must be asserted. All of this happens on the read address channel, with the address transfer completing on the rising edge of time T2.


From here, the rest of the transaction occurs on the read data channel. When the master is ready for data it asserts its RREADY signal. The slave then places data on the RDATA line and asserts that there is valid data (RVALID). In this case, the slave is the source and the master is the receiver. Recall that VALID and READY can be asserted in any order so long as VALID does not depend on READY. This read represents a single burst transaction made up of 4 beats or data transfers. Notice the slave asserts RLAST when the final beat is transferred.
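
The VALID/READY rule on the read data channel can be modeled cycle by cycle. The Python sketch below is a toy model, not a bus functional model: `simulate_read_data_channel` and its arguments are hypothetical names, and the only protocol rules it encodes are that a beat transfers when RVALID and RREADY are both high in the same cycle, and that RVALID, once asserted, stays high until that beat is accepted.

```python
def simulate_read_data_channel(beats, slave_offers, rready_pattern):
    """Return a list of (cycle, data, rlast) for every transferred beat.

    beats          : data values of one read burst, in order
    slave_offers   : per cycle, 1 if the slave has the next beat available
    rready_pattern : per cycle, 1 if the master can accept data
    """
    transfers = []
    idx = 0
    rvalid = False
    for cycle, (offer, rready) in enumerate(zip(slave_offers, rready_pattern)):
        if idx == len(beats):
            break
        # Once asserted, RVALID is held until the handshake completes.
        rvalid = rvalid or bool(offer)
        if rvalid and rready:
            rlast = idx == len(beats) - 1  # RLAST marks the final beat
            transfers.append((cycle, beats[idx], rlast))
            idx += 1
            rvalid = False
    return transfers

if __name__ == "__main__":
    result = simulate_read_data_channel(
        beats=[0xA0, 0xA1, 0xA2, 0xA3],
        slave_offers=[0, 1, 1, 0, 1, 1, 1],
        rready_pattern=[1, 1, 0, 1, 1, 0, 1],
    )
    for cycle, data, rlast in result:
        print(cycle, hex(data), "RLAST" if rlast else "")
```

In this run the 4 beats transfer in cycles 1, 3, 4 and 6; the stalls in cycles 2 and 5 come from the master deasserting RREADY.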

Write Transaction:
What about writes? Figure 3 shows a timing diagram of an AXI write transaction. The addressing phase is similar to a read. A master places an address on the AWADDR line and asserts a valid signal. The slave asserts that it's ready to receive the address and the address is transferred. 
Next, on the Write Data Channel, the master places data on the bus and asserts the valid signal (WVALID). When the slave is ready, it asserts WREADY and data transfer begins. This transfer is again 4 beats for a single burst. The master asserts WLAST while the last beat of data is being transferred.
In contrast to reads, writes include a Write Response Channel where the slave can assert that the write transaction has completed successfully.


AXI Interconnect:
This is where AXI provides the most flexibility. Instead of prescribing how multi-master and multi-slave systems work, the AXI standard only defines the interfaces and leaves the rest up to the designer. If the system has multiple masters attempting to communicate with a single slave, then the AXI Interconnect may contain an arbiter that routes data between the master and slave interfaces. This arbiter could be implemented using simple priorities, a round-robin architecture, or whatever suits the designer's needs.
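
As one example of the arbitration schemes mentioned above, here is a minimal round-robin arbiter in Python. It is only a sketch of the policy: the class name `RoundRobinArbiter` is hypothetical, and a real interconnect would implement this in RTL, typically granting per transaction rather than per call.

```python
class RoundRobinArbiter:
    """Grant one requesting master per call, rotating priority after each grant."""

    def __init__(self, n_masters):
        self.n = n_masters
        # Start so that master 0 has the highest priority on the first grant.
        self.last_grant = n_masters - 1

    def grant(self, requests):
        """requests: sequence of booleans, one per master.
        Returns the index of the granted master, or None if nobody requests."""
        for offset in range(1, self.n + 1):
            idx = (self.last_grant + offset) % self.n
            if requests[idx]:
                self.last_grant = idx
                return idx
        return None

if __name__ == "__main__":
    arb = RoundRobinArbiter(3)
    print([arb.grant([True, True, True]) for _ in range(4)])
```

When all three masters keep requesting, grants rotate 0, 1, 2, 0, so no master is starved; a fixed-priority arbiter would keep granting master 0.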

Systems that use multiple masters and multiple slaves could have interconnects containing arbiters, decoders, multiplexers, and whatever else is needed to successfully process transactions. This might include logic to translate between AXI3, AXI4, and AXI4-Lite protocols. 
Additionally, interconnects can perform bus-width conversion, use data FIFOs, contain register slices to break timing paths, and even convert between two different clock domains.

Burst len, size, type:

The burst length is 1~16 for AXI3; for AXI4 it is 1~256 (INCR) and 1~16 (other burst types).

Burst size: the maximum number of bytes to transfer in each data transfer, or beat, in a burst.
If the AXI bus is wider than the burst size, the AXI interface must determine from the transfer address which byte lanes of the data bus to use for each transfer. See "Data read and write structure" in the AXI spec.

The size of any transfer must not exceed the data bus width of either agent in the transaction. 
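
The limits above can be turned into a small checker. The Python sketch below validates a burst against the length and size rules from these notes and computes which byte lanes an aligned narrow transfer uses. The function names are made up for this example, and unaligned addresses are deliberately rejected rather than modeled.

```python
def burst_bytes(length, size_bytes, bus_bytes, burst_type="INCR", protocol="AXI4"):
    """Validate a burst and return the total number of bytes it moves."""
    # AXI4 allows up to 256 beats for INCR bursts only; everything else is 1..16.
    max_len = 256 if (protocol == "AXI4" and burst_type == "INCR") else 16
    if not 1 <= length <= max_len:
        raise ValueError(f"burst length {length} outside 1..{max_len}")
    if size_bytes < 1 or size_bytes & (size_bytes - 1):
        raise ValueError("burst size must be a power of two")
    if size_bytes > bus_bytes:
        raise ValueError("burst size exceeds the data bus width")
    return length * size_bytes

def active_byte_lanes(address, size_bytes, bus_bytes):
    """Byte lanes used by one aligned transfer narrower than (or equal to) the bus."""
    if address % size_bytes:
        raise ValueError("unaligned addresses are not modeled in this sketch")
    first = address % bus_bytes
    return list(range(first, first + size_bytes))

if __name__ == "__main__":
    print(burst_bytes(4, 4, 8))              # 4 beats of 4 bytes on a 64-bit bus
    print(active_byte_lanes(0x04, 4, 8))     # upper half of the bus
    print(active_byte_lanes(0x08, 4, 8))     # lower half of the bus
```

For a 4-byte transfer on an 8-byte bus, consecutive aligned addresses alternate between lanes 0..3 and lanes 4..7.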


Out of order

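Read data for transactions with different ARID values can return in any order. Read data for transactions with the same ARID must return in the order in which the addresses were issued.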

Interleave:

Read data of transactions with different ARID values can be interleaved.
Write interleaving: AXI3 supports it, but the first item of write data of each transaction must be issued in the same order as the write addresses. AXI4 does not support write data interleaving.
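
A monitor or scoreboard has to undo interleaving before it can compare read data. The Python sketch below groups interleaved beats by ID and reports each burst when its last beat arrives. The tuple layout `(rid, data, rlast)` and the function name `reassemble_read_data` are assumptions for this illustration.

```python
from collections import defaultdict

def reassemble_read_data(beats):
    """Group interleaved read beats by ID.

    beats: iterable of (rid, data, rlast) in the order seen on the bus.
    Returns a list of (rid, [data, ...]) in the order the bursts completed.
    Beats with the same ID are kept in arrival order.
    """
    in_flight = defaultdict(list)
    completed = []
    for rid, data, rlast in beats:
        in_flight[rid].append(data)
        if rlast:
            completed.append((rid, in_flight.pop(rid)))
    return completed

if __name__ == "__main__":
    bus = [(1, "a0", False), (2, "b0", False), (2, "b1", True), (1, "a1", True)]
    print(reassemble_read_data(bus))
```

Here ID 2 completes before ID 1 even though ID 1 returned data first, which is the out-of-order and interleaved behavior described above.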

Unaligned Address:

TODO

AXI response


5. USB2.0
USB2 made simple (better and low level)
USB in a nutshell (introductory)

6. SPI

C Programming

Header Files and Includes https://cplusplus.com/forum/articles/10627/ https://stackoverflow.com/questions/2762568/c-c-include-header-file-or...