Large language models are powerful, but they can be slow, expensive and too general for specific tasks. Many businesses need smaller and faster models that solve their unique problems without high operational costs. To achieve this, you need to go beyond standard fine-tuning. This book shows you how to change the structure of large models to make them simpler, more efficient and specialised.
- Learn to reduce model size with effective pruning techniques.
- Use knowledge distillation to train smaller but capable models.
- Apply fine-tuning to adapt models for specific tasks.
- Work with popular open models like Llama 3, Gemma and Qwen.
- Understand the balance between a model's size, speed and cost.
- Explore how to make your models more understandable and explainable.
Rearchitecting LLMs is a practical guide to making language models smaller, faster and cheaper to run. Rather than stopping at fine-tuning, it teaches structural techniques such as pruning and knowledge distillation that alter a model's core architecture. You will apply these techniques to popular open models and adapt them to real-world business needs.
After reading this book, you will be able to create capable and cost-effective models customised to your goals. You will have the skills to build efficient AI systems that deliver value without breaking the budget. This book is for practising AI, ML and data engineers who know Python.