Abstract
We continue the investigation into the power of smaller Transformer-based
language models as initiated by TinyStories -- a 10 million parameter
model that can produce coherent English -- and the follow-up work on
phi-1, a 1.3 billion parameter model with Python coding performance
close to the state-of-the-art. The latter work proposed to use existing Large
Language Models (LLMs) to generate "textbook quality" data as a way to enhance
the learning process compared to traditional web data. We follow the
"Textbooks Are All You Need" approach, focusing this time on common sense
reasoning in natural language, and create a new 1.3 billion parameter model
named phi-1.5, with performance on natural language tasks comparable
to models 5x larger, and surpassing most non-frontier LLMs on more complex
reasoning tasks such as grade-school mathematics and basic coding. More
generally, phi-1.5 exhibits many of the traits of much larger LLMs,
both good -- such as the ability to "think step by step" or perform some
rudimentary in-context learning -- and bad, including hallucinations and the
potential for toxic and biased generations. Encouragingly, we are seeing
improvement on that front thanks to the absence of web data. We
open-source phi-1.5 to promote further research on these urgent
topics.