- "Crazy Challenge: Run Llama 405B on an 8GB VRAM GPU" by Gavin Li, in AI Advances (Aug 1, 2024). I'm taking on the challenge of running the Llama 3.1 405B model on a GPU with only 8GB of VRAM.
- "Stop Guessing! Here's How Much GPU Memory You REALLY Need for LLMs!" by Muhammad Saad Uddin, in AI Advances (Sep 20, 2024). Techniques to calculate and reduce the memory footprint in LLM serving (a back-of-the-envelope sketch follows this list).
- "Llama-Bitnet | Training a 1.58 bit LLM" by Zain ul Abideen (Apr 4, 2024). What is a 1-bit LLM, and how do you train the 70M Llama-Bitnet?
- "Binary Magic: Building BitNet 1.58bit Using PyTorch from Scratch" by Chidhambararajan R, in TheSeriousProgrammer (Mar 12, 2024). Spoiler Alert:
- "What are 1-bit LLMs?" by Mehul Gupta, in Data Science in your pocket (Mar 3, 2024). The era of 1-bit LLMs with BitNet b1.58 (a minimal ternary-quantization sketch follows this list).
- "No more Floating Points, The Era of 1.58-bit Large Language Models" by azhar, in azhar labs (Feb 29, 2024). The world of Large Language Models (LLMs) is witnessing a paradigm shift, one that could redefine the very fundamentals of how these models…
- "Run Llama 2 70B on Your GPU with ExLlamaV2" by Benjamin Marie, in TDS Archive (Sep 29, 2023). Finding the optimal mixed-precision quantization for your hardware.
- "QA-LoRA: Fine-Tune a Quantized Large Language Model on Your GPU" by Benjamin Marie, in TDS Archive (Oct 14, 2023). Quantization-aware fine-tuning.
- "Quantize LLMs with GPTQ Using Hugging Face Transformers" by Benjamin Marie (Sep 2, 2023). GPTQ is now much easier to use (a minimal GPTQ sketch follows this list).
- "GPTQ Quantization on a Llama 2 7B Fine-Tuned Model With HuggingFace" by Eduardo Muñoz, in Towards AI (Sep 7, 2023). An easy-to-follow guide to quantizing an LLM.
- "LLM Quantization Techniques - GPTQ" by Rajesh K, in Towards AI (Feb 18, 2024). Recent advances in neural network technology have dramatically increased the scale of the model, resulting in greater sophistication and…
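
As a companion to the GPU-memory article above, here is a minimal back-of-the-envelope sketch of the kind of estimate it discusses. The formula (weights plus KV cache, times a fixed overhead factor) and all the parameter names are my own illustration, not taken from the article itself:

```python
def estimate_vram_gb(
    n_params_b: float,       # model size in billions of parameters
    bytes_per_param: float,  # 2 for FP16/BF16, 1 for INT8, 0.5 for 4-bit
    n_layers: int,
    d_model: int,
    context_len: int,
    batch_size: int = 1,
    overhead: float = 1.2,   # rough allowance for activations/fragmentation
) -> float:
    """Back-of-the-envelope VRAM estimate: weights + KV cache, x overhead."""
    weights = n_params_b * 1e9 * bytes_per_param
    # KV cache: 2 tensors (K and V) per layer, each [context_len, d_model],
    # stored here in FP16 (2 bytes). Grouped-query attention would shrink this.
    kv_cache = 2 * n_layers * context_len * d_model * 2 * batch_size
    return (weights + kv_cache) * overhead / 1024**3

# A Llama-2-7B-like shape in FP16 at a 4k context: roughly 18 GB.
print(estimate_vram_gb(7, 2, n_layers=32, d_model=4096, context_len=4096))
```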
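
Several of the BitNet pieces above revolve around one core operation: quantizing weights to the ternary set {-1, 0, +1}. A minimal PyTorch sketch of the absmean quantizer described in the BitNet b1.58 paper (the function name is mine):

```python
import torch

def absmean_ternary(w: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """Quantize a weight tensor to {-1, 0, +1} using absmean scaling."""
    # Scale by the mean absolute value of the tensor...
    scale = w.abs().mean().clamp(min=1e-5)
    # ...then round to the nearest integer and clip to the ternary range.
    w_q = (w / scale).round().clamp(-1, 1)
    return w_q, scale  # dequantize as w_q * scale

w = torch.randn(4, 4)
w_q, scale = absmean_ternary(w)
print(w_q)  # entries are only -1.0, 0.0, or 1.0
```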
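
And for the GPTQ articles, the workflow in Hugging Face Transformers boils down to passing a GPTQConfig to from_pretrained, which quantizes the model as it loads. A minimal sketch, with an illustrative model name and settings; it assumes the optimum, auto-gptq, and accelerate packages are installed:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "meta-llama/Llama-2-7b-hf"  # illustrative; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 4-bit GPTQ quantization, calibrated on the built-in "c4" dataset.
gptq_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)

# Quantization happens during loading; the result can be saved and
# reloaded like any other Transformers model.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=gptq_config,
)
model.save_pretrained("llama-2-7b-gptq-4bit")
```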