Patent-7

7. Mapping to library cells using an alternative area algorithm

Initial circuit

Two coefficient area mapping

The patent refers to a single variable C_S derived from the library data for each function and used to link the delay of a function to its area. As already seen, this leads to significant inaccuracies in the area estimate.

A better estimate of the area can be made by using two coefficients for the area approximation. This isn't described in the patent, but is an extension of the ideas given there. We replace the expression for the delay d:

d = τ × (p + C_OUT /(C_S×A))
with
d = τ × (p + C_OUT /(C_S0+C_S1×A))

The two fitted curves for pin a of the 2-NAND gate are shown in the graph to the right. The curve labelled Est.Area2 uses two coefficients and is a better fit than the curve labelled Est.Area, which uses a single coefficient.

The equation linking the estimated cell area to the delay d using two area coefficients is then:

A = {C_OUT /( d/τ -p)-C_S0 }/C_S1
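The two expressions above are inverses of each other, which can be checked numerically. The following is a minimal sketch, not the author's code: the function names and the values of tau, p and C_OUT are hypothetical and chosen only for illustration, while C_S0 and C_S1 are the nd2 pin a values from the tables further down. Setting C_S0 = 0 recovers the single-coefficient form.

```python
def delay(area, c_out, tau, p, c_s0, c_s1):
    """Delay d = tau * (p + C_OUT / (C_S0 + C_S1 * A))."""
    drive = c_s0 + c_s1 * area
    if drive <= 0:
        raise ValueError("area too small: C_S0 + C_S1*A must be positive")
    return tau * (p + c_out / drive)


def area_from_delay(d, c_out, tau, p, c_s0, c_s1):
    """Inverse relation: A = (C_OUT / (d/tau - p) - C_S0) / C_S1."""
    effort_delay = d / tau - p
    if effort_delay <= 0:
        raise ValueError("delay must exceed the parasitic delay tau*p")
    return (c_out / effort_delay - c_s0) / c_s1


# Hypothetical operating point; only C_S0 and C_S1 come from the tables.
TAU, P, C_OUT = 10.0, 2.0, 30.0
C_S0, C_S1 = -5.52, 2.01  # nd2, pin a

a = 12.0
d = delay(a, C_OUT, TAU, P, C_S0, C_S1)
a_back = area_from_delay(d, C_OUT, TAU, P, C_S0, C_S1)
print(f"d = {d:.2f}, recovered area = {a_back:.2f}")
```

The guards reflect the fact that C_S0 is negative in the fitted tables, so very small areas would give a non-positive drive term.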

Values for C_S0 and C_S1 are averaged over all the cells for a function. The constant timing model sets each cell's delay; from this delay and the load capacitance C_OUT, the equation above gives an estimate of the cell's area. The area is then mapped to a drive strength, as shown in the example below for pin a of a 2-XOR gate.

Deriving the drive strength from the estimated cell area for pin a of a 2-XOR gate (C_S0 = -3.14, C_S1 = 0.67, g = 1.65)
		x05		x1		x2		x3		x4
C_IN		3.34		5.34		10.30		15.07		20.29
Adjusted C_IN		2.97		5.34		10.22		16.23		21.32
Actual width		8		8		15		19		25
Est. width A=C_IN/(gC_S1) - C_S0 /C_S1		7.4		9.6		14.0		19.5		24.1
Mapping	0	x05	8.4	x1	11.6	x2	16.5	x3	21.7	x4

The adjusted C_IN compensates for differences in logical effort between the different cells. If, for example, the estimated area is 8 tracks, which lies between 0 and 8.4, then the mapped cell is the xor2v0x05.
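The 2-XOR table can be reproduced approximately with a few lines of code. This is a sketch under stated assumptions: the function names are hypothetical, the handling of a width that falls exactly on a boundary is a guess, and the observation that the Mapping thresholds equal the geometric means of neighbouring estimated widths is inferred from the numbers rather than stated in the text. Because the published coefficients are rounded, the recomputed widths differ from the table by up to about 0.15 tracks.

```python
import bisect
import math

C_S0, C_S1, G = -3.14, 0.67, 1.65           # from the table caption
DRIVES = ["x05", "x1", "x2", "x3", "x4"]
ADJ_C_IN = [2.97, 5.34, 10.22, 16.23, 21.32]  # "Adjusted C_IN" row
EST_WIDTH = [7.4, 9.6, 14.0, 19.5, 24.1]      # "Est. width" row
THRESHOLDS = [8.4, 11.6, 16.5, 21.7]          # boundaries from the "Mapping" row


def est_width(c_in):
    """A = C_IN/(g*C_S1) - C_S0/C_S1, using the adjusted C_IN."""
    return c_in / (G * C_S1) - C_S0 / C_S1


def map_drive(width):
    """Return the drive strength whose interval contains the estimated width."""
    return DRIVES[bisect.bisect_right(THRESHOLDS, width)]


# Widths recomputed from the rounded coefficients.
widths = [est_width(c) for c in ADJ_C_IN]
# Geometric means of neighbouring table widths, for comparison with THRESHOLDS.
geo = [round(math.sqrt(a * b), 1) for a, b in zip(EST_WIDTH, EST_WIDTH[1:])]

print([round(w, 2) for w in widths])
print(geo)
print(map_drive(8.0))  # the 8-track example from the text
```

Using `bisect` keeps the lookup independent of the number of drive strengths, so the same function works for cells with fewer or more variants once their thresholds are known.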

The graph below for an inverter shows that the fit between actual and estimated areas is much better than before, especially for the weaker drive strengths.

inv area

The values of the single coefficient C_S are listed in the table below. C_S is the drive per unit area, so is higher for an inverter and lower for more complex gates.

Values of C_S by function and pin
	bf1	cgi2	iv1	nd2	nr2	xnr2	xor2		aoi21	oai21
a	2.05	0.39	2.87	1.37	1.08	0.41	0.47	a1	0.55	0.81
b		0.38		1.38	1.04	0.43	0.48	a2	0.55	0.81
c		0.39						b	0.57	0.85

The values of C_S0 and C_S1 are determined by a library analysis and are shown in the tables below. These are used to estimate the area of each cell in the 4-bit adder using the area equation above.

Values of C_S0 by function and pin
	bf1	cgi2	iv1	nd2	nr2	xnr2	xor2		aoi21	oai21
a	-7.30	-1.51	-8.94	-5.52	-2.38	-2.94	-3.14	a1	-2.48	-2.84
b		-1.49		-5.57	-2.30	-3.11	-3.18	a2	-2.49	-2.84
c		-1.53						b	-2.10	-2.98
Values of C_S1 by function and pin
	bf1	cgi2	iv1	nd2	nr2	xnr2	xor2		aoi21	oai21
a	2.89	0.48	4.30	2.01	1.26	0.58	0.67	a1	0.89	0.98
b		0.48		2.03	1.22	0.61	0.67	a2	0.89	0.98
c		0.49						b	0.86	1.03

A problem with the methodology shows up when the coefficients for the aoi21 and the oai21 cells are compared. The two sets of values should be similar, but the fit for the aoi21 is worse. This is because there are only three drive strengths for the aoi21 and two of these have the same width. The coefficient values are quite sensitive to the number of drive strengths available for the fit.

Netlist comparison

For the adder with unbuffered inputs, mapping with one area coefficient and mapping with two produce the same drive strengths. For this circuit, the more accurate mapping has no benefit.

For the adder with buffered inputs there is an improvement in the final schematic. Fig 7a on the right shows the schematic with drive strengths mapped using one area coefficient; its critical path with vsclib timing is 371ps, 6% more than the spec. Fig 7b shows the schematic with drive strengths mapped using two area coefficients and idealised timing, so that all critical paths are 350ps. Instance b0n has an x2 drive strength instead of x1 (Fig 6b). Fig 7c shows the vsclib timing using the drive strengths from Fig 7b. The critical path is 354ps, 1.1% above the spec.
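The two percentages quoted above follow from simple arithmetic, sketched here. The spec value of 350ps is an assumption taken from the idealised critical path of Fig 7b, and the function name is hypothetical.

```python
SPEC_PS = 350.0  # assumed spec: the idealised critical path of Fig 7b


def overshoot_percent(path_ps, spec_ps=SPEC_PS):
    """Percentage by which a critical path exceeds the spec."""
    return 100.0 * (path_ps - spec_ps) / spec_ps


for label, path_ps in [("one coefficient, Fig 7a", 371.0),
                       ("two coefficients, Fig 7c", 354.0)]:
    print(f"{label}: {overshoot_percent(path_ps):.1f}% over spec")
```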

The area estimates produced with either one or two coefficients are good for this circuit, as shown in the table below.

	Est. (C_S)	Actual (C_S)	Est. (C_S0, C_S1)	Actual (C_S0, C_S1)
adder gates, area	74.4	75.7	73.7	75.7
adder gates, schematic		Fig 5c		Fig 5c
estimation error		1.7%		2.6%
buffered adder gates, area	87.5	86.7	86.8	86.7
buffered adder gates, schematic	Fig 6b	Fig 7a	Fig 7b	Fig 7c
estimation error		1.0%		0.1%
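The estimation errors in the table can be recomputed from the areas, as in the sketch below. Taking the error relative to the actual area is an assumption, and the dictionary keys are hypothetical labels. With the rounded table entries this reproduces 1.7%, 2.6% and 0.1%; the remaining entry comes out at about 0.9% rather than the 1.0% listed, possibly because the original calculation used unrounded areas.

```python
def estimation_error(estimated, actual):
    """Relative error of the area estimate, as a percentage of the actual area."""
    return 100.0 * abs(estimated - actual) / actual


# (estimated area, actual area) pairs copied from the table.
AREAS = {
    "adder, C_S": (74.4, 75.7),
    "adder, C_S0 C_S1": (73.7, 75.7),
    "buffered adder, C_S": (87.5, 86.7),
    "buffered adder, C_S0 C_S1": (86.8, 86.7),
}

errors = {name: estimation_error(est, act) for name, (est, act) in AREAS.items()}
for name, err in errors.items():
    print(f"{name}: {err:.1f}%")
```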

The area estimate can give values which are larger than the largest cell or smaller than the smallest cell. In this case, the actual largest or smallest area is used instead.
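The clamping step described above can be sketched as follows. The function name is hypothetical, and the example areas are the actual widths of the 2-XOR cells from the earlier table.

```python
def clamp_area(estimate, cell_areas):
    """Limit an area estimate to the range covered by the real library cells."""
    smallest, largest = min(cell_areas), max(cell_areas)
    return max(smallest, min(largest, estimate))


XOR2_WIDTHS = [8, 8, 15, 19, 25]  # "Actual width" row of the 2-XOR table

print(clamp_area(30.0, XOR2_WIDTHS))  # above the largest cell
print(clamp_area(5.0, XOR2_WIDTHS))   # below the smallest cell
print(clamp_area(12.0, XOR2_WIDTHS))  # inside the range, left unchanged
```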

Fitting single and double area coefficients for pin a of a 2-NAND gate
2-NAND area

Fig 7a. Buffered adder delays with (i) fixed stage effort of f=3.6 used for initial timing; (ii) each gate delay compressed or stretched to meet critical path; (iii) wireload of 6fF per fanout; (iv) gain limit of 5 used for non-inverting gates; (v) single area coefficient C_S to map drive strength; (vi) timing from vsclib cells.
vsclib matched critical paths

Fig 7b. Buffered adder delays with (i) fixed stage effort of f=3.6 used for initial timing; (ii) each gate delay compressed or stretched to meet critical path; (iii) wireload of 6fF per fanout; (iv) two area coefficients C_S0,C_S1 to map drive strength; (v) timing using library averages.
ideal matched critical paths

Fig 7c. Buffered adder delays with (i) fixed stage effort of f=3.6 used for initial timing; (ii) each gate delay compressed or stretched to meet critical path; (iii) wireload of 6fF per fanout; (iv) gain limit of 5 used for non-inverting gates; (v) two area coefficients C_S0,C_S1 to map drive strength; (vi) timing from vsclib cells.
vsclib matched critical paths

5-AUG-05