VXLIB Cell Characterisation Methodology

The cells are characterised by running a total of 60 Spice simulations, one for each combination of input transition and output load. 10 values of input transition are used, ranging from 20ps to 1500ps. 6 values of output load are used, ranging from 2.6fF to 338fF for an x1 gate. For cells with bigger or smaller drive strengths, the output loads are scaled correspondingly. The characterisation conditions are nominal, with Vdd=1.2V, T=27C and typical process parameters. Derating to best and worst case conditions has not been done yet.
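As a rough sketch of how the 60 simulation points arise, the grid can be enumerated as below. The input transition values are those of the iv1v0x2 table further down this page. The load points are borrowed from the same table purely for illustration, since the actual vxlib load points are not listed here, and the names simulation_grid and drive_scale are hypothetical.

```python
from itertools import product

# Input transitions in ps, as listed in the iv1v0x2 table on this page.
TRANSITIONS_PS = [20, 60, 90, 130, 200, 300, 450, 670, 1000, 1500]

# ASSUMPTION: load points in fF borrowed from the iv1v0x2 table axis,
# used here only to illustrate the shape of the grid.
BASE_LOADS_FF = [4, 16, 48, 120, 250, 520]


def simulation_grid(drive_scale=1.0):
    """Return every (input transition, output load) pair to simulate.

    drive_scale is a hypothetical factor that scales the loads for a
    cell with a bigger or smaller drive strength.
    """
    loads = [load * drive_scale for load in BASE_LOADS_FF]
    return list(product(TRANSITIONS_PS, loads))


grid = simulation_grid()
print(len(grid))  # 10 transitions x 6 loads = 60 simulations
```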

The entire library of 96 cells takes 11.5 hours to characterise on a 1500MHz Pentium M running Suse Linux 8.2.

For each simulation, the rise delay tR is the time from the input transition passing the 50% point to the rising output passing the 50% point, and the fall delay tF is the corresponding delay when the output is falling. The rise and fall transitions are the times the output takes to go between the 10% and 90% points, scaled to a full 0-100% transition. This means that the 10-90% time is divided by (0.9-0.1), or 0.8, to give the equivalent 0-100% transition.
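The scaling described above can be sketched as a small helper. The function name and its threshold parameters are illustrative and not part of the characterisation scripts.

```python
def to_full_swing(measured_ps, lo=0.1, hi=0.9):
    """Scale a transition measured between two thresholds to 0-100%.

    lo and hi are the measurement thresholds as fractions of the
    supply; the defaults are the 10% and 90% points.
    """
    if not 0.0 <= lo < hi <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= lo < hi <= 1")
    return measured_ps / (hi - lo)


# A 10-90% time of 80ps is equivalent to a 100ps full swing transition.
print(round(to_full_swing(80.0), 3))
```

Passing different lo and hi values would cover cells whose thresholds are varied on a per cell basis.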

The transition times have been scaled to 0-100% because the Spice simulations use full swing inputs for the characterisation. The 10/90 point has been determined empirically as the one that minimises differences between the timing in the Spice simulations and the logic timing obtained by interpolating the LookUp Tables (LUTs). This determination has to be done for each technology, because the values won't be the same for each one.

For some gates, typically where an N-transistor strongly helps the output to rise, the rise transition thresholds are different. The presence of an N-transistor assisting in the pull up changes the shape of the output curve, so in order to minimise timing discrepancies, the thresholds are varied on a per cell basis.

The iv1_y2 is used as a reference to calculate the simplified Prop-Ramp model timings which have been put into the .LIB file as comments. The input transition time used for the Prop delays is that of an iv1v0x2 inverter from the vsclib driving 4 more iv1v0x2's.

The iv1v0x2 has an input pin capacitance of 5.13fF, so a fanout of 4 is a load cap of 20.52fF. The iv1v0x2 timing LUT for the output rising transitions is:

                <---------- Load capacitance in fF ---------->
                   4      16      48     120     250     520
I/P transition +------+-------+-------+-------+-------+-------+
    20ps          42.3,  107.6,  282.0,  674.4, 1382.8, 2854.2
    60ps          48.0,  108.2,  282.0,  674.4, 1382.8, 2854.2
    90ps          55.2,  111.6,  282.0,  674.4, 1382.8, 2854.2
   130ps          66.0,  118.5,  282.5,  674.4, 1382.8, 2854.2
   200ps          80.8,  133.7,  288.5,  674.4, 1382.8, 2854.2
   300ps          98.7,  158.6,  303.8,  675.4, 1382.8, 2854.2
   450ps         123.0,  193.0,  333.9,  687.4, 1382.8, 2854.2
   670ps         155.9,  235.8,  385.5,  719.1, 1387.8, 2854.2
  1000ps         201.4,  292.5,  466.0,  782.6, 1419.2, 2854.2
  1500ps         265.5,  370.6,  570.5,  896.9, 1497.1, 2872.5

Taking an input transition of 130ps, we interpolate between the 16fF and 48fF loads to find that with a load of 20.52fF, the output transition is 141.7ps. Repeating for the output falling transition, again with an input transition of 130ps, the output transition is 111.2ps. The average output transition is then 126.5ps, and from this value we confirm the choice of 130ps as the transition to use for the Prop-Ramp model.
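The interpolation for the rising output can be reproduced with a few lines of Python. This is a minimal sketch using only the 130ps row of the table above. The helper name interpolate is hypothetical, and it refuses to extrapolate beyond the table.

```python
from bisect import bisect_right

# Load axis in fF and the 130ps row of the iv1v0x2 rising transition LUT.
LOADS_FF = [4, 16, 48, 120, 250, 520]
RISE_AT_130PS = [66.0, 118.5, 282.5, 674.4, 1382.8, 2854.2]

PIN_CAP_FF = 5.13  # iv1v0x2 input pin capacitance


def interpolate(x, xs, ys):
    """Linearly interpolate ys over the sorted axis xs at the point x."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x is outside the table; extrapolation not supported")
    # Index of the upper neighbour, clamped so x == xs[-1] still works.
    i = min(bisect_right(xs, x), len(xs) - 1)
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)


fo4_load = 4 * PIN_CAP_FF  # fanout of 4
rise = interpolate(fo4_load, LOADS_FF, RISE_AT_130PS)
print(f"{fo4_load:.2f} fF -> {rise:.1f} ps")  # 20.52 fF -> 141.7 ps
```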

The logical effort of each gate is referenced instead to the iv1_y2, an inverter with a 36λ P transistor and a 16λ N transistor. The ratio between its P and N transistors is therefore 2.25, which is the standard value of mobility ratio used. The mobility ratio is the ratio of how much more conductive an N-transistor is than a P-transistor, and varies between 2 and 3 depending on the process. The Logical Effort of the iv1_y2 has been set to 1, and the other gates and their pins are compared to this value.

According to the Logical Effort theory, we will have an optimally designed circuit if all cell transistor sizes are adjusted so that the transition times are all 130ps. This ignores the fact that rise and fall transitions will be different, and only considers averages. It also ignores the fact that fixed wire capacitances vary between different nets, and are particularly low on internal nets on non-inverting gates.

I recommend keeping the transition times below 1200ps, and below 600ps for signals which are used as clocks. The LUT extends beyond this so that for any reasonable value of input transition and output load, the timing is interpolated between values in the table. If the timing has to be extrapolated beyond the table values, then significant timing inaccuracies will occur. A max_transition of 1500ps has been set on each cell input so that input transitions which are bigger than the LUT will generate a warning.
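A simple check against these limits might look like the sketch below. The function and dictionary names are hypothetical; only the 1200ps, 600ps and 1500ps figures come from the text, and the max_transition_ps argument allows for inputs that carry a lower limit.

```python
# Recommended upper limits on transition time in ps, from the text above.
RECOMMENDED_MAX_PS = {"data": 1200, "clock": 600}


def check_transition(transition_ps, kind="data", max_transition_ps=1500):
    """Return a list of messages for a transition that is too slow.

    kind is "data" or "clock"; max_transition_ps is the limit set on
    the cell input, beyond which the LUT would have to be extrapolated.
    """
    messages = []
    if transition_ps > max_transition_ps:
        messages.append(
            f"{transition_ps}ps exceeds max_transition of {max_transition_ps}ps"
        )
    limit = RECOMMENDED_MAX_PS[kind]
    if transition_ps > limit:
        messages.append(
            f"{transition_ps}ps exceeds the recommended {limit}ps for {kind} signals"
        )
    return messages


print(check_transition(700, "clock"))
```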

It is possible to get negative delays in the LUT. This happens when the input transition is very slow, the output load is very small and the switching threshold is below Vdd/2. For these cell inputs, the max_transition has been reduced to 1000ps.

A 6x10 LUT is larger than that used by most standard cell libraries. For this reason, one can consider the vsclib and vxlib timing to be more accurate than other libraries.