What Language Model Architecture and Pretraining Objective Works Best for Zero-Shot Generalization?

ICML, volume 162 of Proceedings of Machine Learning Research, pages 22964-22984. PMLR, 2022.


Other publications of authors with the same name

What Language Model Architecture and Pretraining Objective Works Best for Zero-Shot Generalization? ICML, volume 162 of Proceedings of Machine Learning Research, pages 22964-22984. PMLR, 2022.

The Falcon Series of Open Language Models. CoRR, 2023.

The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only. CoRR, 2023.

What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization? CoRR, 2022.

Is the Number of Trainable Parameters All That Actually Matters? ICBINB@NeurIPS, volume 163 of Proceedings of Machine Learning Research, pages 27-32. PMLR, 2021.

What Language Model to Train if You Have One Million GPU Hours? CoRR, 2022.

RITA: a Study on Scaling Up Generative Protein Sequence Models. CoRR, 2022.

LightOn Optical Processing Unit: Scaling-up AI and HPC with a Non von Neumann co-processor. HCS, pages 1-11. IEEE, 2021.

What Language Model to Train if You Have One Million GPU Hours? EMNLP (Findings), pages 765-782. Association for Computational Linguistics, 2022.

Is the Number of Trainable Parameters All That Actually Matters? CoRR, 2021.