The last 10% of the chip design that sometimes seems to take "the other 90% of the time", that is. Certainly it takes away the other 90% of your patience.


Things that can make going to layout, and passing static timing analysis and gate level simulations with backannotation, much more of a pain than you'd like.

by Andy "I'm frustrated as hell, and I have to take it some more!" Fingerhut

  1. Use strong drivers for signals between major blocks that are synthesized separately.

    For the IPP chip, we synthesized the major blocks separately, and when it came time to glue them all together, no final synthesis run was done. We just read in the .db files of the blocks and wrote out another .db file for the whole chip. We had *many* overloaded outputs for block-to-block signals, because flip-flop outputs just couldn't drive wires going halfway across the chip very well.

    We also had problems with hold time violations between major blocks, but this was partially due to our klunky method of inserting clock trees -- we created clock tree netlists by hand, and hooked drivers up to FF clock pins without regard to where things would be laid out. Many layout turns were burned.

    For the OPP chip, the synthesis scripts of all major blocks used the set_load command on all outputs of the block, making Synopsys believe it had to drive a 2 pF load. It put in nice strong drivers. They were overkill in many cases, I'm sure, but it didn't hurt anything that I know of, except perhaps an unnoticeable increase in power.

  2. Be careful of timing to asynchronous set and reset pins of FF's that have them.

    Synopsys static timing analysis did not check setup and hold times of such pins, at least not for ES2's ECLP07 process. When I asked one engineer at ES2 to add this feature to their Synopsys libraries (it seemed like it should be easy), he claimed, after talking with a Synopsys engineer about it, that Synopsys could not do it.

    If this problem exists for your synthesis libraries (or Synopsys in general), then it is possible to have a laid out design that passes the "vanilla" static timing analysis perfectly (by which I mean you specify the clocks of the system, and multicycle and false paths, and then ask for longest and shortest paths on the whole chip), but when you do a gate level simulation with backannotated timing, FF's go undefined because the reset pins are deasserting too late or too early. This doesn't hurt so much for assertion edges of reset as long as the asynchronous pins are only used for a chip-wide reset, and not for simply clearing a register during normal operation. For deasserting edges of reset, this is a problem that you most likely want to fix before sending the chip off to fab. The chip won't come out of reset correctly during gate level backannotated simulations.

    For ES2's ECLP07 process, I had a dc_shell script that found all FF's in a hierarchical design that had asynchronous set or reset pins, one cell type at a time, and did explicit timing reports for the longest and shortest paths to those pins from anywhere else. These paths didn't include accurate clock skew backannotation info like most timing reports between FF's do, but it was good enough to find problems, or verify that everything was fine if the margin was at least equal to the clock skew in the design.

  3. Similarly, the timing of access to ES2 RAM's was not checked during static timing analysis, so the only way to find out whether the timing was good, other than tedious hand-checking or writing scripts that might be more complicated than I'd care to do, was to do a Verilog simulation with backannotated timing. Aspec synchronous RAM's might fare better in static timing analysis. It sure would be nice for us, too, if they did (us meaning those who will be using the Aspec/TSMC process for the second versions of IPP and SE chips).

  4. Another "gotcha" with Synopsys static timing analysis that bit us in the OPP chip is that when you do a command like:

    report_timing -max_paths 100 .....
    
    it only reports one long path per endpoint, even though it reports up to 100 long paths in the whole design. If there are multiple long paths through combinational logic that end at the same endpoint, the command above only shows one of them.

    Add the report_timing options "-nworst m", where m is a number bigger than 1, to get more than one path per endpoint.

    Because of this subtlety, a long path that we should have known about went unfixed in an OPP chip layout turn, and we then left it in to avoid yet another turn of layout.

  5. Anticipate that you will need to tune delays and drive strengths between your standard cells and the pads, and between your standard cells and the RAM's. You can avoid this hassle if you are *very* careful about it before the first layout, or the timing constraints are loose (e.g., accessing RAM much more slowly than it is capable of, or the input pin is a static value except perhaps when resetting the chip).
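
The difference between -max_paths and -nworst in item 4 can be seen in a toy model. The Python sketch below is not Synopsys code and makes no attempt to reproduce its path search. It only assumes, as described above, that -max_paths caps the total number of reported paths while -nworst caps the number reported per endpoint (one by default). The function name select_paths, the tuple layout, and the sample pin names and slack values are all made up for illustration.

```python
from collections import defaultdict


def select_paths(paths, max_paths, nworst=1):
    """Toy model of path selection (an assumption, not the tool's algorithm).

    paths: list of (endpoint, startpoint, slack) tuples; smaller slack is worse.
    Keeps at most `nworst` paths per endpoint, then at most `max_paths` overall.
    """
    by_endpoint = defaultdict(list)
    for path in paths:
        by_endpoint[path[0]].append(path)

    kept = []
    for endpoint_paths in by_endpoint.values():
        endpoint_paths.sort(key=lambda p: p[2])  # worst slack first
        kept.extend(endpoint_paths[:nworst])

    kept.sort(key=lambda p: p[2])
    return kept[:max_paths]


# Hypothetical data: three slow paths converge on ff_a/D, one ends at ff_b/D.
PATHS = [
    ("ff_a/D", "reg1/Q", -0.9),
    ("ff_a/D", "reg2/Q", -0.7),
    ("ff_a/D", "reg3/Q", -0.5),
    ("ff_b/D", "reg4/Q", -0.2),
]

default_report = select_paths(PATHS, max_paths=100)           # nworst = 1
deeper_report = select_paths(PATHS, max_paths=100, nworst=3)  # up to 3 per endpoint

for endpoint, startpoint, slack in default_report:
    print("default :", startpoint, "->", endpoint, slack)
for endpoint, startpoint, slack in deeper_report:
    print("nworst=3:", startpoint, "->", endpoint, slack)
```

In this model, fixing only the reg1/Q path after reading the default report would still leave two violating paths into ff_a/D, which is the trap described in item 4.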

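The margin check described in item 2 for asynchronous set and reset pins is simple arithmetic once the longest and shortest path delays to the pin have been reported. The Python sketch below is only an illustration under stated assumptions: the reset deassertion is launched by a clock edge at time 0, it must settle before the next edge one period later, and the reported delays contain no clock skew. The function names, parameter names, and numbers are hypothetical and are not taken from any ES2 or Synopsys library.

```python
def reset_pin_margins(longest_ns, shortest_ns, period_ns, setup_ns, hold_ns):
    """Return (setup_margin, hold_margin) in ns for a reset deassertion edge.

    Assumes the deassertion is launched at time 0 and the capturing clock
    edges are at 0 and `period_ns`. Clock skew is NOT included here.
    """
    setup_margin = period_ns - setup_ns - longest_ns  # deasserts early enough
    hold_margin = shortest_ns - hold_ns               # deasserts late enough
    return setup_margin, hold_margin


def reset_pin_is_safe(longest_ns, shortest_ns, period_ns, setup_ns, hold_ns, skew_ns):
    """True if both margins are at least the clock skew in the design."""
    setup_margin, hold_margin = reset_pin_margins(
        longest_ns, shortest_ns, period_ns, setup_ns, hold_ns
    )
    return setup_margin >= skew_ns and hold_margin >= skew_ns


# Hypothetical numbers: 20 ns clock, 2 ns setup, 0.5 ns hold, 1 ns skew.
print(reset_pin_margins(14.0, 1.5, 20.0, 2.0, 0.5))
print(reset_pin_is_safe(14.0, 1.5, 20.0, 2.0, 0.5, skew_ns=1.0))
print(reset_pin_is_safe(14.0, 1.2, 20.0, 2.0, 0.5, skew_ns=1.0))
```

Comparing the margins against the skew, rather than against zero, stands in for the skew information that the explicit timing reports lacked.
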
Here are some other comments, independent of layout:
  1. Check that all multicycle paths are safe, in the sense that the source registers of the combinational logic are stable for the desired number of clock periods before the result of the combinational logic is sampled.

    They should also be checked to be safe in the sense that the receiving FF's have transitions from the multicycle path logic masked out. Otherwise, gate-level simulations could show FF's going undefined because of transitions on their inputs one cycle after the input registers change. These undefined values could propagate and mess up the circuit's behavior. In the real physical circuit, the FF's could go metastable, with the same effect.

    RAM's that require more than one clock period to access are effectively multicycle paths for this purpose, and any FF's sampling the data outputs of the RAM should safely mask out the data outputs on all except those clock periods when you know the RAM outputs will be ready to sample.
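
The masking requirement for multicycle paths can be illustrated with a small cycle-based model. The Python sketch below is a toy, not a gate-level simulator: it assumes the slow logic output reads as undefined ("x") unless the source register has held the same value for `latency` consecutive cycles, and that the designer's load enable is asserted only on the last cycle of each group of `latency` cycles. The function names and the doubling function standing in for the slow logic are hypothetical.

```python
X = "x"  # stands for an undefined value, as seen in a gate-level simulation


def comb_output(history, latency):
    """Logic output at the end of the current cycle.

    history: value held by the source register in each cycle so far.
    Defined only if the source held one value for the last `latency` cycles.
    """
    recent = history[-latency:]
    if len(recent) < latency or any(v != recent[-1] for v in recent):
        return X
    return recent[-1] * 2  # hypothetical slow function: doubling


def simulate(source_values, latency, masked):
    """Return the receiving FF's value after each clock edge."""
    ff = 0  # value after reset
    history = []
    trace = []
    for cycle, value in enumerate(source_values):
        history.append(value)
        # Masked: load only when the result is known to be ready.
        enable = (cycle % latency == latency - 1) if masked else True
        if enable:
            ff = comb_output(history, latency)
        trace.append(ff)
    return trace


# Source register changes every 2 cycles; the logic needs 2 cycles.
SOURCE = [3, 3, 5, 5]
print(simulate(SOURCE, latency=2, masked=False))
print(simulate(SOURCE, latency=2, masked=True))
```

Without the mask, the receiving FF captures "x" on every cycle where the logic is still settling. With the mask, it holds its previous value on those cycles and never sees an undefined input.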


Last updated on November 18, 1997.
andy_fingerhut at-the-machine alum SPOT wustl SPOT edu