RETHINKING THE VALUE OF NETWORK PRUNING Paper Reading Notes
Published:
RETHINKING THE VALUE OF NETWORK PRUNING
A typical three-stage network pruning pipeline is mentioned.
Difference between predefined and auto- matically discovered target architectures, in channel pruning as an example.
While They have shown that, for structured pruning, the inherited weights in the pruned architecture are not better than random, the pruned architecture itself turns out to be what brings the efficiency benefits.
They have also done serveral experiments on the Lottery Ticket Hypothesis and it shows that random initialization is enough for the pruned model to achieve competitive performance. Using the winning ticket as initialization only brings improvement when the learning rate is small (0.01), however such small learning rate leads to a lower accuracy than the widely used large learning rate (0.1).