
| parameter | value |
|---|---|
| gate count | 1751 |
| number of cells | 568 |
| number of library cells | 92 |
| number of used cells | 50 |
| max fanin | 4 |
| max input capacitance | 94 |
| max internal fanout | 34 |
| critical path 0fF | 2123 |
| critical path 6fF | 2462 |

By selectively removing cells, the library size can be more than halved with only a 0.9% loss in performance.
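As a quick check of these two figures, the sketch below recomputes them from the numbers reported on this page (188 cells and 2441ps for the full library, 92 cells and 2462ps for the reduced one). The variable names are illustrative only.

```python
# Figures taken from the synthesis results reported on this page.
full_cells, full_path_ps = 188, 2441        # full library
reduced_cells, reduced_path_ps = 92, 2462   # reduced library

# Fraction of the full library that remains after removing cells.
size_ratio = reduced_cells / full_cells

# Relative increase of the critical path delay, in percent.
slowdown_pct = 100.0 * (reduced_path_ps - full_path_ps) / full_path_ps

print(f"library size ratio: {size_ratio:.2f}")
print(f"critical path increase: {slowdown_pct:.1f}%")
```

A size ratio below 0.5 corresponds to "more than halved", and the slowdown rounds to the 0.9% quoted above.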

The interesting observation here is that removing the **x1**
drive strength cells does not worsen performance significantly.
For most functions, these cells are the ones with the largest
transistor sizes that fit in the smallest area before folding.
The conclusion, however, is that cells with half sized transistors
are better for driving non critical outputs, because they load the
critical path less, while **x2** or stronger drive strengths are
chosen for the critical path itself.

The critical path itself is shown below on the left, with the full library critical path on the right.

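| # | 92 cell library critical path | | arc | time (ps) | delay (ps) | 188 cell library critical path | | arc | time (ps) | delay (ps) |
|---|---|---|---|---|---|---|---|---|---|---|
| | x(1) | 3 | | 51 | | x(1) | 3 | | 61 | |
| 1 | bf1v0x12 | 15 | a->z | 191 | 140 | bf1v0x12 | 15 | a->z | 208 | 147 |
| 2 | nd4v0x3 | 1 | d->z | 294 | 103 | nd4v0x3 | 1 | d->z | 311 | 103 |
| 3 | oai21v0x8 | 4 | b->z | 375 | 81 | oai21v0x8 | 4 | b->z | 392 | 81 |
| 4 | iv1v0x12 | 1 | a->z | 424 | 49 | xor2v0x4 | 1 | b->z | 473 | 81 |
| 5 | oai21v0x8 | 4 | a2->z | 513 | 89 | cgi2v0x3 | 3 | c->z | 578 | 105 |
| 6 | xor2v0x4 | 1 | b->z | 610 | 97 | iv1v0x6 | 1 | a->z | 627 | 49 |
| 7 | cgi2v0x3 | 3 | a->z | 729 | 119 | cgi2v0x3 | 3 | c->z | 730 | 103 |
| 8 | iv1v0x4 | 1 | a->z | 784 | 55 | iv1v0x6 | 1 | a->z | 779 | 49 |
| 9 | cgi2v0x3 | 3 | c->z | 888 | 104 | cgi2v0x3 | 3 | c->z | 889 | 110 |
| 10 | iv1v0x4 | 1 | a->z | 943 | 55 | iv1v0x6 | 1 | a->z | 938 | 49 |
| 11 | cgi2v0x3 | 3 | c->z | 1039 | 96 | cgi2v0x3 | 3 | c->z | 1055 | 117 |
| 12 | iv1v0x4 | 1 | a->z | 1094 | 55 | iv1v0x6 | 1 | a->z | 1104 | 49 |
| 13 | cgi2v0x3 | 3 | c->z | 1203 | 109 | cgi2v0x3 | 3 | c->z | 1209 | 105 |
| 14 | iv1v0x4 | 1 | a->z | 1257 | 54 | iv1v0x6 | 1 | a->z | 1258 | 49 |
| 15 | cgi2v0x3 | 3 | c->z | 1362 | 105 | cgi2v0x3 | 3 | c->z | 1361 | 103 |
| 16 | iv1v0x4 | 1 | a->z | 1416 | 54 | iv1v0x6 | 1 | a->z | 1410 | 49 |
| 17 | cgi2v0x3 | 4 | c->z | 1534 | 118 | cgi2v0x3 | 4 | c->z | 1529 | 119 |
| 18 | xnr2v0x3 | 1 | a->z | 1638 | 104 | xnr2v0x3 | 1 | a->z | 1633 | 104 |
| 19 | xor2v0x4 | 1 | b->z | 1727 | 89 | xor2v0x4 | 1 | b->z | 1722 | 89 |
| 20 | cgi2v0x3 | 2 | a->z | 1822 | 95 | cgi2v0x3 | 2 | a->z | 1817 | 95 |
| 21 | iv1v0x4 | 1 | a->z | 1877 | 55 | iv1v0x4 | 1 | a->z | 1871 | 54 |
| 22 | cgi2v0x3 | 2 | c->z | 1960 | 83 | cgi2v0x3 | 2 | c->z | 1960 | 89 |
| 23 | iv1v0x4 | 1 | a->z | 2010 | 50 | iv1v0x6 | 1 | a->z | 2009 | 49 |
| 24 | cgi2v0x2 | 2 | c->z | 2096 | 86 | cgi2v0x3 | 2 | c->z | 2090 | 81 |
| 25 | an2v0x4 | 2 | b->z | 2205 | 109 | an2v0x8 | 2 | b->z | 2194 | 104 |
| 26 | an2v0x8 | 2 | b->z | 2315 | 110 | an2v0x8 | 2 | b->z | 2304 | 110 |
| 27 | xor2v0x2 | 0 | b->z | 2462 | 147 | xaon21v0x3 | 0 | a2->z | 2441 | 137 |
| | r(14) | | | | | r(15) | | | | |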

These two critical paths are nearly the same. Apart from changes in drive strength, they differ only in the logic around gates 3 to 6 and in gate 27. It is useful to analyse the differences and see where the loss in speed occurs.

- The path from cell #7 to cell #16 is a succession of inverting carry
  generators and inverters. The largest carry generators are chosen, and
  optimally the inverters should have an **x6** drive strength. In the
  92 cell library these don't exist and an **x4** drive strength inverter
  is used instead. This increases the delay of these 10 cells by 2.9%,
  from 783ps to 806ps.

  This is not a lot. In fact, the book on Logical Effort by Sutherland,
  Sproull and Harris predicts this (see for example Figure 3.7). It is
  however more than the overall increase in delay, which is only 0.9%.
- The delay to the output of cell #1 is actually 17ps faster in the
  92 cell library than in the 188 cell library. This is because the
  multiplier is made up of many parallel critical paths. In order to keep
  all of these faster than 2441ps in the 188 cell library, the loading on
  the input pin x(1) and the input buffer is greater, and this also slows
  down the path which finally turns out to be critical.
- The output r(14) in the 188 cell library circuit is driven by an
  **xor2v0x3**. This cell doesn't exist in the 92 cell library and is
  replaced by the **xor2v0x2**, which is slower. This makes the r(14)
  output the critical path (3ps slower than the r(15) output) for the
  92 cell library.
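The figures for cells #7 to #16 can be rechecked with a short sketch. The stage delays are copied from the two critical path listings; the split into carry generators and inverters relies on the two cell types alternating along this part of the path, and the variable names are illustrative only.

```python
# Stage delays (ps) of cells #7 to #16, copied from the critical path listings.
delays_92 = [119, 55, 104, 55, 96, 55, 109, 54, 105, 54]
delays_188 = [103, 49, 110, 49, 117, 49, 105, 49, 103, 49]

total_92 = sum(delays_92)
total_188 = sum(delays_188)
increase_pct = 100.0 * (total_92 - total_188) / total_188

# Cells alternate carry generator (even index) and inverter (odd index).
cgi_92, inv_92 = sum(delays_92[0::2]), sum(delays_92[1::2])
cgi_188, inv_188 = sum(delays_188[0::2]), sum(delays_188[1::2])

print(f"cells 7-16: {total_188}ps -> {total_92}ps (+{increase_pct:.1f}%)")
print(f"carry generators: {cgi_188}ps -> {cgi_92}ps")
print(f"inverters:        {inv_188}ps -> {inv_92}ps")
```

Splitting the totals this way shows how much of the extra delay sits in the inverters themselves and how much in the carry generators that drive them.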

If further cells are now removed, they will either be
the high drive cells needed for the critical path, or
cells like the **xaon21v0x3** which can both appear
on the critical path and significantly reduce the cell count.
So from this analysis, the minimum set of combinatorial cells
which gives the best performance is 92 cells. Increasing the
library to 188 cells gives a slight performance
benefit, 0.9% measured with the multiplier. Including the extra
cells is a choice for the library developer.

Table of synthesis results

| run | critical path (ps) | gate count | cell count | porosity | library cells | used cells | notes |
|---|---|---|---|---|---|---|---|
| synthesis 1 | 4279 | 1561 | 923 | 43% | 9 | 8 | basic inverters, NAND & NOR gates |
| synthesis 2 | 4236 | 1472 | 792 | 45% | 15 | 12 | AND & OR gates |
| synthesis 3 | 4157 | 1357 | 696 | 46% | 19 | 16 | AOI & OAI gates, 2/1 and 2/2 |
| synthesis 4 | 4157 | 1357 | 696 | 46% | 20 | 16 | mxi2 2-way inverting mux |
| synthesis 5 | 3983 | 1343 | 668 | 48% | 21 | 16 | cgi2 carry generator inverting |
| synthesis 6 | 3948 | 1352 | 668 | 48% | 28 | 18 | inverters with multiple drive strengths |
| synthesis 7 | 3061 | 1433 | 666 | 51% | 70 | 27 | x2 drive strengths for all functions |
| synthesis 8 | 3056 | 1456 | 666 | 52% | 70 | 30 | BOOG with x1 drive strengths |
| synthesis 9 | 2960 | 1476 | 666 | 53% | 70 | 32 | BOOG with x05 drive strengths |
| synthesis 10 | 2963 | 1480 | 666 | 53% | 76 | 34 | nd2a and nr2a cells |
| synthesis 11 | 2963 | 1480 | 666 | 53% | 79 | 34 | nd2ab type of 2-OR |
| CyHP library | 3778 | 1539 | 832 | 46% | 18 | 17 | Minimum size library |
| synthesis 12 | 2908 | 1362 | 553 | 54% | 91 | 38 | AND/OR into XOR/XNOR |
| synthesis 13 | 2893 | 1378 | 551 | 55% | 103 | 39 | aoi211, aoi31, oai211 & oai31 |
| synthesis 14 | 2931 | 1400 | 562 | 55% | 104 | 38 | 3-XOR gate, 1/2 stage delays |
| synthesis 15 | 2886 | 1390 | 536 | 56% | 109 | 40 | 3-XOR/XNOR gates as 2×2-I/P gates |
| synthesis 16 | 2665 | 1514 | 538 | 60% | 136 | 46 | x3 drive strength cells |
| synthesis 17 | 2567 | 1571 | 540 | 61% | 155 | 49 | x4 drive strength cells |
| synthesis 18 | 2523 | 1611 | 540 | 62% | 167 | 49 | x6 drive strength cells |
| synthesis 19 | 2497 | 1625 | 538 | 62% | 179 | 54 | x8 drive strength cells |
| synthesis 20 | 2493 | 1628 | 541 | 62% | 188 | 55 | buffers to decouple non-critical paths |
| synthesis 21 | 2441 | 1758 | 563 | 64% | 188 | 55 | input buffers |
| synthesis 22 | 2550 | 1717 | 535 | 64% | 188 | 55 | optimised Alliance flow |
| synthesis 23 | 2439 | 1695 | 560 | 63% | 188 | 58 | current 209 cell vsclib |
| synthesis 24 | 2462 | 1751 | 568 | 64% | 92 | 50 | reduced 92 cell library |
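One way to read the table is to look at how much each synthesis run changes the critical path relative to the previous one. The sketch below does this for synthesis 1 to 24, with the critical path values copied from the table (the CyHP library row is left out because it is not part of the sequence). The names are illustrative only.

```python
# Critical path (ps) for synthesis 1..24, copied from the table above.
paths_ps = [
    4279, 4236, 4157, 4157, 3983, 3948, 3061, 3056,
    2960, 2963, 2963, 2908, 2893, 2931, 2886, 2665,
    2567, 2523, 2497, 2493, 2441, 2550, 2439, 2462,
]

# Gain in ps of each run over the previous one, keyed by synthesis number.
gains = [(i + 2, paths_ps[i] - paths_ps[i + 1]) for i in range(len(paths_ps) - 1)]

# Run with the largest single improvement.
best_step, best_gain = max(gains, key=lambda g: g[1])

# Overall reduction from the first to the last run, in percent.
overall_pct = 100.0 * (paths_ps[0] - paths_ps[-1]) / paths_ps[0]

for step, gain in gains:
    print(f"synthesis {step:2d}: {gain:+5d}ps")
print(f"largest gain: synthesis {best_step} ({best_gain}ps)")
print(f"overall reduction: {overall_pct:.1f}%")
```

A positive value means the run is faster than the one before it; negative values mark runs where the critical path got longer.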
