Author of the publication

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

N. Keskar, D. Mudigere, J. Nocedal, M. Smelyanskiy, and P. Tang. (2016). cite arxiv:1609.04836. Comment: Accepted as a conference paper at ICLR 2017.

Please choose a person to relate this publication to

To distinguish between persons with the same name, the academic degree and the title of an important publication are displayed. You can also use the button next to a name to display some publications already assigned to that person.

Nitish Chooramun

Prajakta Shirish Oak

Nitish Narayan Roy

Nitish Chandra Chakraborty

Other publications of authors with the same name

The Natural Language Decathlon: Multitask Learning as Question Answering. B. McCann, N. Keskar, C. Xiong, and R. Socher. (2018). cite arxiv:1806.08730.

A Closer Look at Deep Learning Heuristics: Learning rate restarts, Warmup and Distillation. A. Gotmare, N. Keskar, C. Xiong, and R. Socher. CoRR, (2018)

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima. N. Keskar, D. Mudigere, J. Nocedal, M. Smelyanskiy, and P. Tang. ICLR, OpenReview.net, (2017)

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima. N. Keskar, D. Mudigere, J. Nocedal, M. Smelyanskiy, and P. Tang. (2016). cite arxiv:1609.04836. Comment: Accepted as a conference paper at ICLR 2017.

Regularizing and Optimizing LSTM Language Models. S. Merity, N. Keskar, and R. Socher. ICLR (Poster), OpenReview.net, (2018)

adaQN: An Adaptive Quasi-Newton Algorithm for Training RNNs. N. Keskar, and A. Berahas. ECML/PKDD (1), volume 9851 of Lecture Notes in Computer Science, page 1-16. Springer, (2016)

Char2Subword: Extending the Subword Embedding Space Using Robust Character Compositionality. G. Aguilar, B. McCann, T. Niu, N. Rajani, N. Keskar, and T. Solorio. EMNLP (Findings), page 1640-1651. Association for Computational Linguistics, (2021)

Global Capacity Measures for Deep ReLU Networks via Path Sampling. R. Theisen, J. Klusowski, H. Wang, N. Keskar, C. Xiong, and R. Socher. CoRR, (2019)

Unsupervised Paraphrase Generation via Dynamic Blocking. T. Niu, S. Yavuz, Y. Zhou, H. Wang, N. Keskar, and C. Xiong. CoRR, (2020)

A second-order method for convex l1-regularized optimization with active-set prediction. N. Keskar, J. Nocedal, F. Öztoprak, and A. Wächter. Optim. Methods Softw., 31 (3): 605-621 (2016)

BibSonomy

Disambiguation of "Keskar, Nitish Shirish"

