"GShard: Scaling Giant Models with Conditional Computation and Automatic ..."

Dmitry Lepikhin et al. (2020)