- "Crazy Challenge: Run Llama 405B on an 8GB VRAM GPU" by Gavin Li, in AI Advances (Aug 1, 2024). I'm taking on the challenge of running the Llama 3.1 405B model on a GPU with only 8GB of VRAM.
- "Stop Guessing! Here's How Much GPU Memory You REALLY Need for LLMs!" by Muhammad Saad Uddin, in AI Advances (Sep 20, 2024). Techniques to calculate and reduce the memory footprint in LLM serving (a back-of-the-envelope sketch follows this list).
- "Llama-Bitnet | Training a 1.58 bit LLM" by Zain ul Abideen (Apr 4, 2024). What is a 1-bit LLM, and how do you train the 70M Llama-Bitnet?
- "Binary Magic: Building BitNet 1.58bit Using PyTorch from Scratch" by Chidhambararajan R, in TheSeriousProgrammer (Mar 12, 2024). Spoiler Alert:
- "What are 1-bit LLMs?" by Mehul Gupta, in Data Science in your pocket (Mar 3, 2024). The era of 1-bit LLMs with BitNet b1.58 (a minimal ternary-quantization sketch follows this list).
- "No more Floating Points, The Era of 1.58-bit Large Language Models" by azhar, in azhar labs (Feb 29, 2024). The world of Large Language Models (LLMs) is witnessing a paradigm shift, one that could redefine the very fundamentals of how these models…
- "Run Llama 2 70B on Your GPU with ExLlamaV2" by Benjamin Marie, in TDS Archive (Sep 29, 2023). Finding the optimal mixed-precision quantization for your hardware.
- "QA-LoRA: Fine-Tune a Quantized Large Language Model on Your GPU" by Benjamin Marie, in TDS Archive (Oct 14, 2023). Quantization-aware fine-tuning.
- "Quantize LLMs with GPTQ Using Hugging Face Transformers" by Benjamin Marie (Sep 2, 2023). GPTQ is now much easier to use (a minimal GPTQ sketch follows this list).
- "GPTQ Quantization on a Llama 2 7B Fine-Tuned Model With HuggingFace" by Eduardo Muñoz, in Towards AI (Sep 7, 2023). An easy-to-follow guide to quantizing an LLM.
- "LLM Quantization Techniques - GPTQ" by Rajesh K, in Towards AI (Feb 18, 2024). Recent advances in neural network technology have dramatically increased the scale of the model, resulting in greater sophistication and…
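
As a companion to the GPU-memory article above, here is a minimal back-of-the-envelope sketch of the kind of estimate it discusses. The formula (weights plus KV cache, times a fixed overhead factor) and all the parameter names are my own illustration, not taken from the article itself:

```python
def estimate_vram_gb(
    n_params_b: float,       # model size in billions of parameters
    bytes_per_param: float,  # 2 for FP16/BF16, 1 for INT8, 0.5 for 4-bit
    n_layers: int,
    d_model: int,
    context_len: int,
    batch_size: int = 1,
    overhead: float = 1.2,   # rough allowance for activations/fragmentation
) -> float:
    """Back-of-the-envelope VRAM estimate: weights + KV cache, x overhead."""
    weights = n_params_b * 1e9 * bytes_per_param
    # KV cache: 2 tensors (K and V) per layer, each [context_len, d_model],
    # stored here in FP16 (2 bytes). Grouped-query attention would shrink this.
    kv_cache = 2 * n_layers * context_len * d_model * 2 * batch_size
    return (weights + kv_cache) * overhead / 1024**3

# A Llama-2-7B-like shape in FP16 at a 4k context: roughly 18 GB.
print(estimate_vram_gb(7, 2, n_layers=32, d_model=4096, context_len=4096))
```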
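
Several of the BitNet pieces above revolve around one core operation: quantizing weights to the ternary set {-1, 0, +1}. A minimal PyTorch sketch of the absmean quantizer described in the BitNet b1.58 paper (the function name is mine):

```python
import torch

def absmean_ternary(w: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """Quantize a weight tensor to {-1, 0, +1} using absmean scaling."""
    # Scale by the mean absolute value of the tensor...
    scale = w.abs().mean().clamp(min=1e-5)
    # ...then round to the nearest integer and clip to the ternary range.
    w_q = (w / scale).round().clamp(-1, 1)
    return w_q, scale  # dequantize as w_q * scale

w = torch.randn(4, 4)
w_q, scale = absmean_ternary(w)
print(w_q)  # entries are only -1.0, 0.0, or 1.0
```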
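
And for the GPTQ articles, the workflow in Hugging Face Transformers boils down to passing a GPTQConfig to from_pretrained, which quantizes the model as it loads. A minimal sketch, with an illustrative model name and settings; it assumes the optimum, auto-gptq, and accelerate packages are installed:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "meta-llama/Llama-2-7b-hf"  # illustrative; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 4-bit GPTQ quantization, calibrated on the built-in "c4" dataset.
gptq_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)

# Quantization happens during loading; the result can be saved and
# reloaded like any other Transformers model.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=gptq_config,
)
model.save_pretrained("llama-2-7b-gptq-4bit")
```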