Merging LoRA Adapters and Serving Locally
In the SFT and DPO posts, I trained LoRA adapters using pure PyTorch. The adapters are tiny (~4MB), but at inference time you still need to load the base model, inject the LoRA wrappers, and load the adapter weights. What if you just want a single, standalone model you can run anywhere?
Merging folds the adapter back into the base weights permanently: for each adapted layer, the merged weight is W' = W + (α/r)·BA, where A and B are the low-rank adapter matrices and α/r is the usual LoRA scaling factor. The result is a standard model file with no adapter machinery required.
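As a minimal sketch of the merge math in PyTorch (names and shapes are illustrative; this assumes the standard LoRA convention where the update is scaled by α/r):

```python
import torch

def merge_lora(base_weight: torch.Tensor,
               lora_A: torch.Tensor,
               lora_B: torch.Tensor,
               alpha: float,
               r: int) -> torch.Tensor:
    """Fold a LoRA adapter into a base weight: W' = W + (alpha / r) * B @ A."""
    return base_weight + (alpha / r) * (lora_B @ lora_A)

# Toy example: a 16x16 linear layer with a rank-4 adapter.
torch.manual_seed(0)
W = torch.randn(16, 16)
A = torch.randn(4, 16)   # down-projection (r x in_features)
B = torch.zeros(16, 4)   # up-projection (out_features x r), zero-init as in LoRA
B.normal_()              # pretend the adapter was trained
merged = merge_lora(W, A, B, alpha=8, r=4)
```

After this, `merged` is an ordinary weight tensor: you can write it into the base model's `state_dict` and save a single checkpoint with no LoRA wrappers at all. One caveat worth knowing: merging is lossless in exact arithmetic, but if the base weights are stored in a low-precision dtype (e.g. bf16), the addition rounds to that dtype, so a merged model can differ very slightly from running base + adapter separately.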