While working on my blog series on the LLaMA family of models, I have also put together a curated reading list of papers that chart the evolution of large language models. These papers provide crucial context for understanding the foundations and current landscape of the LLMs that power systems like meta.ai, ChatGPT, and Claude, among others.
In this post, I attempt to create a comprehensive reading list, including some papers that are also featured in the series above.
Key research papers on deep learning architectures
Sequence to Sequence Learning with Neural Networks. Sutskever et al., 2014. Google
Attention is All You Need. Vaswani et al., 2017. Google (the core attention formula is sketched after this list)
Improving Language Understanding by Generative Pre-Training. Radford et al., 2018. OpenAI - GPT-1
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Devlin et al., 2018. Google
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. Lewis et al., 2019. Facebook
Language Models are Unsupervised Multitask Learners. Radford et al., 2019. OpenAI - GPT-2
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Raffel et al., 2019. Google - T5
Language Models are Few-Shot Learners. Brown et al., 2020. OpenAI - GPT-3
On Layer Normalization in the Transformer Architecture. Xiong et al., 2020.
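As a quick refresher while working through the list above: the core operation introduced in Attention is All You Need (Vaswani et al., 2017) is scaled dot-product attention, which, in the paper's notation (queries Q, keys K, values V, key dimension d_k), is

\[
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V
\]

Multi-head attention simply runs several of these in parallel over learned projections of Q, K, and V and concatenates the results.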
Survey Papers
A Survey of Large Language Models. Zhao et al., 2023. [Github]
Multilingual Large Language Model: A Survey of Resources, Taxonomy and Frontiers. Qin et al., 2024.
Large Language Models: A Survey. Minaee et al., 2024.
Efficient pre-training and scaling laws
Scaling Laws for Neural Language Models. Kaplan et al., 2020. OpenAI
Scaling Language Models: Methods, Analysis & Insights from Training Gopher. Rae et al., 2021. DeepMind
Training Compute-Optimal Large Language Models. Hoffmann et al., 2022. DeepMind - Chinchilla (the loss model is sketched after this list)
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness. Dao et al., 2022. Stanford University
PaLM: Scaling Language Modeling with Pathways. Chowdhery et al., 2022. Google
Cramming: Training a Language Model on a Single GPU in One Day. Geiping et al., 2022. University of Maryland, College Park
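If there is one result to keep in mind from this section, it is the parametric loss model fitted in the Chinchilla paper (Hoffmann et al., 2022), which relates pretraining loss to parameter count N and the number of training tokens D (symbols follow the paper; I am omitting the fitted constants here):

\[
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
\]

Minimizing this under a fixed compute budget gives the paper's compute-optimal recipe of scaling parameters and data roughly in proportion, often summarized as on the order of 20 training tokens per parameter.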
Survey Papers
Efficient Transformers: A Survey. Tay et al., 2020 (Revised 2022). Google
A Survey on Efficient Training of Transformers. Zhuang et al., 2023.
Fine-tuning and parameter-efficient transfer learning
Parameter-Efficient Transfer Learning for NLP. Houlsby et al., 2019. Google [Github]
Finetuned Language Models are Zero-Shot Learners. Wei et al., 2021. Google
LoRA: Low-Rank Adaptation of Large Language Models. Hu et al., 2021. Microsoft [Github] [Video] (see the code sketch after this list)
QLoRA: Efficient Finetuning of Quantized LLMs. Dettmers et al., 2023. University of Washington [Github]
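To make the LoRA idea concrete, here is a minimal PyTorch sketch. It is my own illustration, not the authors' implementation (see the official Github linked above for that); the class name, initialization scale, and default hyperparameters are placeholders. The pretrained weight stays frozen, and a trainable low-rank update scaled by alpha/r is added on top.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen pretrained linear layer plus a trainable low-rank update: W x + (alpha/r) * B A x."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pretrained weights
        # A gets a small random init, B starts at zero, so training begins exactly at the pretrained model
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling
```

QLoRA (Dettmers et al., 2023) keeps the same adapter structure but stores the frozen base weights in 4-bit precision, which is what makes finetuning very large models on a single GPU practical.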
Survey Papers
A Survey of Quantization Methods for Efficient Neural Network Inference. Gholami et al., 2021. UC Berkeley
Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning. Lialin et al., 2022. UMass Lowell
Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond. Yang et al., 2023. [Github]
The Efficiency Spectrum of Large Language Models: An Algorithmic Survey. Ding et al., 2024. Microsoft
Aligning LLMs
Deep Reinforcement Learning from Human Preferences. Christiano et al., 2017 (Revised 2023). OpenAI & DeepMind
Fine-Tuning Language Models from Human Preferences. Ziegler et al., 2019 (Revised 2020). OpenAI
Training Language Models to Follow Instructions with Human Feedback. Ouyang et al., 2022. OpenAI - InstructGPT (the shared RLHF objective is sketched after this list)
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback. Bai et al., 2022. Anthropic
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned. Ganguli et al., 2022. Anthropic
Constitutional AI: Harmlessness from AI Feedback. Bai et al., 2022. Anthropic
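The papers in this section share one core recipe that is worth having in mind before reading them: finetune a policy \(\pi_\phi\) to maximize a learned reward model \(r_\theta\) while staying close, in KL divergence, to a reference model \(\pi_{\mathrm{ref}}\) (typically the supervised finetuned model). Omitting the extra pretraining-mix term used in InstructGPT, the objective is roughly

\[
\max_{\phi}\; \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_\phi(\cdot\mid x)}\left[\, r_\theta(x, y) - \beta \log \frac{\pi_\phi(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)} \,\right]
\]

Constitutional AI replaces much of the human preference data with AI feedback guided by a written set of principles, but the RL step has the same shape.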
Survey Papers
Aligning Large Language Models with Human: A Survey. Wang et al., 2023. Huawei Noah’s Ark Lab
Large Language Model Alignment: A Survey. Shen et al., 2023. Tianjin University
Note that this is not an exhaustive reading list; I will keep updating it as I come across new papers while working on #icodeformyभाषा! There will be more reading lists like this in the future, covering other areas within natural language processing.