The cells are characterised by running a total of 60 Spice simulations
using different values of input transition and output load. 10 values
of input transition are used, ranging from 20ps to 1500ps.
6 values of output load are used, ranging from 2fF to 260fF for an x1 gate.
For cells with bigger or smaller drive strengths, the output loads are
scaled correspondingly. The characterisation conditions are nominal,
with Vdd=1.2V, T=27C and typical process parameters.
Derating to best and worst case conditions has not been done yet.
The entire library of 121 cells takes 10 hours to characterise on a 1500MHz Pentium M running Suse Linux 8.2. The current library has 209 cells, which would take about 17 hours to characterise.
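The 60 simulations per cell are simply the 10 x 6 grid of input transitions and output loads. The sketch below enumerates that grid. The input transition values are the row labels of the iv1v0x2 table given later in this document. The intermediate x1 load values are an assumption: only the 2fF and 260fF end points are stated for x1, and the middle values here are the x2 table columns halved. The function name sweep_points and the linear scaling with drive strength are likewise illustrative.

```python
from itertools import product

# Input transitions in ps: the ten row labels of the iv1v0x2 table in this document.
INPUT_TRANSITIONS_PS = [20, 60, 90, 130, 200, 300, 450, 670, 1000, 1500]

# Output loads in fF for an x1 gate.
# ASSUMPTION: only the end points (2fF, 260fF) are stated for x1; the middle
# values are the x2 table columns (4, 16, 48, 120, 250, 520) halved.
X1_LOADS_FF = [2, 8, 24, 60, 125, 260]


def sweep_points(drive_strength=1.0):
    """Return the (input_transition_ps, load_fF) pairs simulated for one cell.

    ASSUMPTION: loads scale linearly with drive strength, so an x2 gate
    sees twice the x1 loads.
    """
    loads = [c * drive_strength for c in X1_LOADS_FF]
    return list(product(INPUT_TRANSITIONS_PS, loads))


points = sweep_points(drive_strength=2.0)
print(len(points))            # number of Spice runs for this cell
print(points[0], points[-1])  # first and last corner of the grid
```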
For each simulation, the rise delay tR is the time from the input transition
passing the 50% point to the rising output passing the 50% point, and the fall
delay tF is the corresponding delay when the output is falling.
The rise and fall transitions are the times the output takes between the 10% and 90% points,
scaled to a full 0-100% transition. This means that the 10-90% time is divided
by (0.9-0.1) or 0.8 to give the equivalent 0-100% transition.
The transition times have been scaled to 0-100% because the Spice simulations use full swing inputs for the characterisation. The 10%/90% thresholds were determined empirically as the ones which minimise the differences between the timing in the Spice simulations and the logic timing obtained by interpolating the LookUp Tables (LUTs). This determination has to be done for each technology; the values won't necessarily be the same for each one.
For some gates, typically those where an N-transistor strongly helps the output to rise,
the rise transition thresholds are different, because the assisting N-transistor
changes the shape of the output curve. In order to minimise timing discrepancies,
the thresholds are chosen on a per cell basis. For example, the or2v2x2 measures the
rise transition between the 6% and 94% points, values which minimise timing
discrepancies between Spice and
the timing LUT.
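The measurement arithmetic described above can be sketched as follows. The function names and the crossing times in the usage lines are hypothetical; only the thresholds (10%/90% by default, 6%/94% for the or2v2x2 rise) and the division by the threshold span come from the text.

```python
def delay_50(t_in_50_ps, t_out_50_ps):
    """Delay from the input passing its 50% point to the output passing its 50% point."""
    return t_out_50_ps - t_in_50_ps


def scaled_transition(t_lo_ps, t_hi_ps, lo=0.10, hi=0.90):
    """Scale the time between two threshold crossings to an equivalent 0-100% transition.

    t_lo_ps and t_hi_ps are the times at which the output crosses the lo and hi
    fractions of the supply. The measured time is divided by (hi - lo).
    """
    if not 0.0 <= lo < hi <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= lo < hi <= 1")
    return (t_hi_ps - t_lo_ps) / (hi - lo)


# Hypothetical crossing times in ps, for illustration only.
print(delay_50(65.0, 131.0))                                # 50%-to-50% delay
print(scaled_transition(100.0, 180.0))                      # 80ps measured, 10%-90%
print(scaled_transition(100.0, 188.0, lo=0.06, hi=0.94))    # 88ps measured, 6%-94%
```

With the default thresholds a measured 10-90% time of 80ps scales to about 100ps; with 6%/94% thresholds the divisor becomes 0.88 instead of 0.8.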
The iv1v0x2 is used as a reference to calculate the simplified Prop-Ramp model
timings which have been put into the .LIB file as comments.
The input transition time used for the Prop delays is that of an iv1v0x2
inverter driving 4 more iv1v0x2's.
The iv1v0x2 has an input pin capacitance of 5.13fF, so a fanout of 4 is a load cap of 20.52fF. The iv1v0x2 timing LUT for the output rising transitions is:
                  <---------- Load capacitance in fF ---------->
                      4      16      48     120     250     520
I/P transition    +------+-------+-------+-------+-------+-------+
    20ps            42.3,  107.6,  282.0,  674.4, 1382.8, 2854.2
    60ps            48.0,  108.2,  282.0,  674.4, 1382.8, 2854.2
    90ps            55.2,  111.6,  282.0,  674.4, 1382.8, 2854.2
   130ps            66.0,  118.5,  282.5,  674.4, 1382.8, 2854.2
   200ps            80.8,  133.7,  288.5,  674.4, 1382.8, 2854.2
   300ps            98.7,  158.6,  303.8,  675.4, 1382.8, 2854.2
   450ps           123.0,  193.0,  333.9,  687.4, 1382.8, 2854.2
   670ps           155.9,  235.8,  385.5,  719.1, 1387.8, 2854.2
  1000ps           201.4,  292.5,  466.0,  782.6, 1419.2, 2854.2
  1500ps           265.5,  370.6,  570.5,  896.9, 1497.1, 2872.5
Taking an input transition of 130ps, we interpolate between the 16fF and 48fF loads to find that with a load of 20.52fF, the output transition is 141.7ps. Repeating for the output falling transition, again with an input transition of 130ps, the output transition is 111.2ps. The average output transition is then 126.5ps, and from this value we confirm the choice of 130ps as the transition to use for the Prop-Ramp model.
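The interpolation step can be reproduced with a short script. This is a minimal sketch which assumes plain bilinear interpolation between the four surrounding table entries; the text only says that the values are interpolated, so the exact scheme used by a given timing tool may differ. The function names are illustrative. The table data is the iv1v0x2 output rising transition LUT given above.

```python
from bisect import bisect_right

LOADS_FF = [4, 16, 48, 120, 250, 520]
TRANSITIONS_PS = [20, 60, 90, 130, 200, 300, 450, 670, 1000, 1500]

# iv1v0x2 output rising transition in ps; rows follow TRANSITIONS_PS,
# columns follow LOADS_FF.
RISE_TRANSITION_PS = [
    [42.3, 107.6, 282.0, 674.4, 1382.8, 2854.2],
    [48.0, 108.2, 282.0, 674.4, 1382.8, 2854.2],
    [55.2, 111.6, 282.0, 674.4, 1382.8, 2854.2],
    [66.0, 118.5, 282.5, 674.4, 1382.8, 2854.2],
    [80.8, 133.7, 288.5, 674.4, 1382.8, 2854.2],
    [98.7, 158.6, 303.8, 675.4, 1382.8, 2854.2],
    [123.0, 193.0, 333.9, 687.4, 1382.8, 2854.2],
    [155.9, 235.8, 385.5, 719.1, 1387.8, 2854.2],
    [201.4, 292.5, 466.0, 782.6, 1419.2, 2854.2],
    [265.5, 370.6, 570.5, 896.9, 1497.1, 2872.5],
]


def _bracket(axis, x):
    """Return (i, f) such that x = axis[i] + f * (axis[i + 1] - axis[i])."""
    if not axis[0] <= x <= axis[-1]:
        raise ValueError(f"{x} lies outside the table range; extrapolation refused")
    i = min(bisect_right(axis, x) - 1, len(axis) - 2)
    f = (x - axis[i]) / (axis[i + 1] - axis[i])
    return i, f


def lookup(table, in_tran_ps, load_ff):
    """Bilinear interpolation of a LUT at (input transition, output load)."""
    r, fr = _bracket(TRANSITIONS_PS, in_tran_ps)
    c, fc = _bracket(LOADS_FF, load_ff)
    top = table[r][c] + fc * (table[r][c + 1] - table[r][c])
    bot = table[r + 1][c] + fc * (table[r + 1][c + 1] - table[r + 1][c])
    return top + fr * (bot - top)


load_ff = 4 * 5.13  # fanout of 4 iv1v0x2 inputs at 5.13fF each
rise_ps = lookup(RISE_TRANSITION_PS, 130, load_ff)
print(round(rise_ps, 1))  # 141.7
```

Because 130ps is itself a row of the table, only the load axis is actually interpolated here: 118.5 + (20.52 - 16)/(48 - 16) x (282.5 - 118.5) gives the 141.7ps quoted in the text.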
The logical effort of each gate is referenced instead to an inverter whose P to N transistor width ratio is 2.25, the standard value used for the mobility ratio. The mobility ratio expresses how much more conductive an N-transistor is than a P-transistor of the same size, and varies between 2 and 3 depending on the process. The reference inverter is the iv1v2x2, which has a 27λ P-transistor and a 12λ N-transistor. Its Logical Effort has been set to 1, and the other gates and their pins are compared to this value.
According to the Logical Effort theory, we will have an optimally designed circuit if all cell transistor sizes are adjusted so that the transition times are all 130ps. This ignores the fact that rise and fall transitions will be different, and only considers averages. It also ignores the fact that fixed wire capacitances vary between different nets, and are particularly low on internal nets on non-inverting gates.
I recommend keeping the transition times below 1200ps, and below 600ps for signals which are used as clocks. The LUT extends beyond this so that for any reasonable value of input transition and output load, the timing is interpolated between values in the table. If the timing has to be extrapolated beyond the table values, then significant timing inaccuracies will occur. A max_transition of 1500ps has been set on each cell input so that input transitions which are bigger than the LUT will generate a warning.
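The limits in this paragraph can be turned into a simple check on a net's transition time. This is only an illustrative sketch: the function name and the returned status strings are hypothetical, and a real timing tool reports max_transition problems in its own way.

```python
RECOMMENDED_MAX_PS = 1200        # recommended limit for ordinary signals
RECOMMENDED_MAX_CLOCK_PS = 600   # recommended limit for signals used as clocks


def check_transition(t_ps, is_clock=False, max_transition_ps=1500):
    """Classify a transition time against the recommended and LUT limits.

    max_transition_ps is the max_transition set on the cell input
    (1500ps by default, matching the largest input transition in the LUT).
    """
    if t_ps > max_transition_ps:
        return "violation"   # beyond the LUT: timing would be extrapolated
    limit = RECOMMENDED_MAX_CLOCK_PS if is_clock else RECOMMENDED_MAX_PS
    if t_ps > limit:
        return "warning"     # inside the LUT but above the recommended limit
    return "ok"


print(check_transition(500))                 # ordinary signal, well inside limits
print(check_transition(800, is_clock=True))  # clock above 600ps
print(check_transition(1600))                # larger than the LUT range
```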
It is possible to get negative delays in the LUT. This happens when the input transition is very slow, the output load is very small and the switching threshold of the gate is below Vdd/2. For these cell inputs, the max_transition has been reduced to 1000ps.
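A first-order picture of why the delay can go negative is sketched below. It is an illustration under loudly stated assumptions, not the model used for characterisation: the input is taken as a rising linear 0-100% ramp, and the output is assumed to pass its 50% point a fixed time after the input crosses the gate switching threshold. The function name and all numbers are hypothetical.

```python
def first_order_delay(in_tran_ps, vsw_fraction, intrinsic_ps):
    """Approximate 50%-to-50% delay for a rising linear input ramp.

    ASSUMPTIONS (illustrative only):
      - the input ramps linearly from 0 to Vdd in in_tran_ps;
      - the gate switches when the input reaches vsw_fraction * Vdd;
      - the output passes its 50% point intrinsic_ps after that moment
        (small when the output load is small).
    """
    t_in_threshold = vsw_fraction * in_tran_ps  # input crosses switching threshold
    t_in_50 = 0.5 * in_tran_ps                  # input crosses its 50% point
    return (t_in_threshold + intrinsic_ps) - t_in_50


# Hypothetical gate: threshold at 40% of Vdd, 30ps intrinsic delay.
print(first_order_delay(100, 0.4, 30))    # fast input: delay stays positive
print(first_order_delay(1500, 0.4, 30))   # very slow input: delay goes negative
```

With a threshold below Vdd/2 the output can finish switching before the slow input has reached its own 50% point, so the measured delay is negative.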
A 6x10 LUT is larger than that used by most standard cell libraries. For this reason, one can consider the vsclib and vxlib timing to be more accurate than that of libraries which use smaller tables.