"COCONUT: Redefining Reasoning in Large Language Models" by Kaushik Rajan in Generative AI (Dec 17, 2024). Revolutionizing reasoning in large language models through latent space.
"Exploring Medusa and Multi-Token Prediction" by Matthew Gunton in TDS Archive (Jul 10, 2024). A detailed look at the paper "MEDUSA: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads".
"Torch Compile: 2x Faster Llama 3.2 with Low Effort" by Benjamin Marie in TDS Archive (Nov 13, 2024). But it will depend on your GPU.
"Crazy Challenge: Run Llama 405B on an 8GB VRAM GPU" by Gavin Li in AI Advances (Aug 1, 2024). I'm taking on the challenge of running the Llama 3.1 405B model on a GPU with only 8GB of VRAM.
"Stop Guessing! Here's How Much GPU Memory You REALLY Need for LLMs!" by Muhammad Saad Uddin in AI Advances (Sep 20, 2024). Techniques to calculate and reduce the memory footprint in LLM serving.
"My LLM's outputs got 1000% better with this simple trick." by Nikhil Anand in AI Advances (Dec 2, 2024). I wish I had known this trick sooner.
"Reduce LLM Footprint with OpenVINO™ Toolkit Weight Compression" by OpenVINO™ toolkit in OpenVINO-toolkit (Jul 2, 2024). Create lean LLMs using weight compression with the OpenVINO™ toolkit. Reduce LLM size, memory footprint, and GPU requirements.
"Best LLM Inference Engine? TensorRT vs vLLM vs LMDeploy vs MLC-LLM" by Zain ul Abideen (Jul 6, 2024). Benchmarking various LLM inference engines.
"Apple MLX vs Llama.cpp vs Hugging Face Candle Rust for Lightning-Fast LLMs Locally" by Zain ul Abideen (Jan 31, 2024). Experimenting with Mistral-7B and Phi-2 to find the fastest inference/generation speed across libraries.
"How to Run 70B LLMs on a Single 4GB GPU" by Simone Tedeschi in Generative AI (Jan 21, 2024). Have you ever dreamed of using the state-of-the-art large language models (LLMs) for your natural language processing (NLP) tasks, but felt…
"Run Llama 2 70B on Your GPU with ExLlamaV2" by Benjamin Marie in TDS Archive (Sep 29, 2023). Finding the optimal mixed-precision quantization for your hardware.