Yeah, on an old, slow AGP card it hurt the fps quite a bit. But it's still there even on a PCI Express card, on almost the same scale, because now it's not a matter of throughput but of overhead.
Technically it isn't too difficult to make the code open-air friendly, but in practice it would get stuck on the sheer overhead between the CPU and the graphics hardware. Too much would have to be redone in the design and the concept...
But basically the idea is to learn how to draw lots of polygons in an extremely cheap way and to reduce the throughput problems. The Torque engine is one good example.
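
To put rough numbers on the overhead vs. throughput point, here is a small Python sketch. It is only a toy model, not how Torque or any real engine does it: the function names, the per-call cost, the triangle rate and the scene are all made up for illustration. It just shows that when every draw call carries a fixed CPU-side cost, merging meshes that share a material cuts the frame time even though the polygon count stays the same.

```python
from collections import defaultdict

def batch_by_material(meshes):
    """Merge (material, triangle_count) pairs so each material needs one draw call."""
    batches = defaultdict(int)
    for material, triangles in meshes:
        batches[material] += triangles
    return dict(batches)

def frame_cost_ms(draw_calls, triangles, per_call_ms=0.02, tris_per_ms=100_000):
    """Toy cost model (assumed numbers): fixed overhead per call plus raw throughput."""
    return draw_calls * per_call_ms + triangles / tris_per_ms

# Hypothetical outdoor scene: lots of small objects sharing two materials.
meshes = [("grass", 200)] * 3000 + [("rock", 500)] * 1000
total_triangles = sum(triangles for _, triangles in meshes)

# One draw call per object vs. one draw call per material.
unbatched_ms = frame_cost_ms(len(meshes), total_triangles)
batches = batch_by_material(meshes)
batched_ms = frame_cost_ms(len(batches), total_triangles)

print(f"unbatched: {unbatched_ms:.2f} ms, batched: {batched_ms:.2f} ms")
```

With these made-up numbers almost all of the unbatched frame time is call overhead, which is why a faster bus alone doesn't help much.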