Author of the publication

The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study.

ICML, volume 97 of Proceedings of Machine Learning Research, pages 5042-5051. PMLR, (2019)


Other publications of authors with the same name

The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study. ICML, volume 97 of Proceedings of Machine Learning Research, pages 5042-5051. PMLR, (2019)

Unlocking Accuracy and Fairness in Differentially Private Image Classification. CoRR, (2023)

Drawing Multiple Augmentation Samples Per Image During Training Efficiently Decreases Test Error. CoRR, (2021)

Resurrecting Recurrent Neural Networks for Long Sequences. ICML, volume 202 of Proceedings of Machine Learning Research, pages 26670-26698. PMLR, (2023)

On the Origin of Implicit Regularization in Stochastic Gradient Descent. ICLR, OpenReview.net, (2021)

Deep Transformers without Shortcuts: Modifying Self-attention for Faithful Signal Propagation. ICLR, OpenReview.net, (2023)

Monte Carlo Sort for unreliable human comparisons. CoRR, (2016)

Batch Normalization Biases Deep Residual Networks Towards Shallow Paths. CoRR, (2020)

A study on the plasticity of neural networks. CoRR, (2021)

Differentially Private Diffusion Models Generate Useful Synthetic Images. CoRR, (2023)