For the last couple of months, we’ve been refining our inpainting technique for black-and-white Spitz fur. Our research involved comparing Stable Diffusion models, fine-tuning with LoRA, and experimenting with Textual Inversion.
In this article, we explain why combining LoRA and Textual Inversion proved to be the most effective approach for our specific needs. We’ll also discuss the metric we used to measure success and show how the combined method compares with each technique on its own.

Original image and its evaluation score


Original image for our first upscale iteration and its evaluation score

The combined approach: LoRA and Textual Inversion
By combining LoRA and Textual Inversion, we leverage the strengths of both:
- LoRA enables us to fine-tune the model’s attention mechanisms to better capture the intricate details of Spitz fur.
- Textual Inversion allows us to introduce a custom token whose learned embedding captures the unique style of black-and-white fur.
This synergy results in inpainting outputs that are not only structurally accurate but also stylistically consistent with our target aesthetic.
LoRA adds small low-rank update matrices to the model’s attention layers while the original weights stay frozen, injecting domain-specific knowledge at a low training cost. This is crucial for adapting the model to the unique characteristics of Spitz fur. Textual Inversion, on the other hand, leaves the model weights untouched and teaches the model a new concept by learning the embedding of a custom token.
By introducing a token like <dog_fur_style>, we guide the model to generate or inpaint fur in a way that’s consistent with our desired style. Together, these methods provide a comprehensive solution for our inpainting challenges.
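To make the two mechanisms concrete, here is a minimal NumPy sketch rather than the actual training code. It assumes a single linear projection standing in for an attention layer, a toy four-word vocabulary, and hypothetical helper names (`lora_linear`, `add_token`). Initialising the new token from the embedding of "dog" is also an assumption, chosen because starting from a related word is a common practice.

```python
import numpy as np

rng = np.random.default_rng(0)

def lora_linear(x, W, A, B, alpha):
    """Apply a frozen projection W plus a low-rank LoRA update.

    Shapes: x (n, d_in), W (d_out, d_in), A (r, d_in), B (d_out, r).
    The effective weight is W + (alpha / r) * B @ A; only A and B are trained.
    """
    r = A.shape[0]
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

def add_token(vocab, embeddings, token, init_vector):
    """Textual Inversion step: register a new token with its own embedding row.

    During training only this new row would be updated; the existing rows
    and the rest of the model stay frozen.
    """
    if token in vocab:
        raise ValueError(f"{token} already in vocabulary")
    new_vocab = {**vocab, token: len(vocab)}
    new_embeddings = np.vstack([embeddings, init_vector[None, :]])
    return new_vocab, new_embeddings

# Toy dimensions (hypothetical, for illustration only).
d_in, d_out, rank = 8, 8, 2
W = rng.normal(size=(d_out, d_in))         # frozen base weight
A = rng.normal(size=(rank, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, rank))                # trainable up-projection, zero-initialised
x = rng.normal(size=(3, d_in))

# With B = 0 the adapter is a no-op, so fine-tuning starts from the base model.
base_out = x @ W.T
lora_out = lora_linear(x, W, A, B, alpha=4.0)

# Toy vocabulary; the new token starts from the "dog" embedding (an assumption).
vocab = {"a": 0, "photo": 1, "of": 2, "dog": 3}
embeddings = rng.normal(size=(len(vocab), d_in))
vocab, embeddings = add_token(
    vocab, embeddings, "<dog_fur_style>", embeddings[3].copy()
)
```

The two pieces are independent: LoRA changes how the layers transform their inputs, while the new embedding row changes what the prompt feeds into them, which is why they can be trained and applied together.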

Image version with just SDXL (without LoRA or Textual Inversion)

Measuring success: Our custom metric
To objectively evaluate the performance of our combined approach, we developed a custom machine learning-based metric that integrates a comprehensive set of image quality assessments.
Rather than relying on a single traditional metric such as SSIM or a perceptual similarity score, our approach combines multiple indicators (including variance of Laplacian, image entropy, gradient contrast, PSNR, SSIM, VIF, LPIPS, and more) into a unified score.
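As an illustration of the no-reference indicators in that list, the sketch below computes three of them with plain NumPy. The function names are hypothetical, the Laplacian uses a simple 4-neighbour stencil, and "gradient contrast" is interpreted here as the mean gradient magnitude, which is an assumption since the exact definition is not given.

```python
import numpy as np

def variance_of_laplacian(gray):
    """Sharpness proxy: variance of a 4-neighbour Laplacian over interior pixels."""
    g = gray.astype(np.float64)
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())

def shannon_entropy(gray_u8):
    """Entropy in bits of the 8-bit intensity histogram."""
    counts = np.bincount(gray_u8.ravel(), minlength=256)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())

def mean_gradient_magnitude(gray):
    """Assumed stand-in for 'gradient contrast': mean finite-difference magnitude."""
    gy, gx = np.gradient(gray.astype(np.float64))
    return float(np.hypot(gx, gy).mean())

# Synthetic test images: a flat patch versus a noisy, texture-like patch.
rng = np.random.default_rng(0)
flat = np.full((32, 32), 128, dtype=np.uint8)
textured = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)

flat_sharpness = variance_of_laplacian(flat)
textured_sharpness = variance_of_laplacian(textured)
```

A flat patch scores zero on all three indicators, while a detailed texture such as fur should score higher on each, which is what makes them useful for detecting blurry or washed-out inpainting.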
This custom metric checks two things:
- How accurately the AI recreates the fur’s structure – sharp edges and detailed texture
- How naturally the new fur blends in – smooth transitions and fewer errors
It’s designed specifically to evaluate how well we’re inpainting Spitz dog fur.
Our evaluation pipeline preprocesses each image into multiple formats (integer-based for sharpness metrics and float-based for reference comparisons) before computing the individual metrics.
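A rough sketch of that preprocessing step might look like the following. It assumes RGB input as a `uint8` array of shape `(H, W, 3)`, uses BT.601 luma weights for the grayscale conversion, and picks PSNR as the example reference comparison. The helper names are hypothetical and the real pipeline may differ in all of these choices.

```python
import numpy as np

def to_gray_uint8(rgb):
    """Integer view for sharpness-style metrics: 8-bit luma (BT.601 weights)."""
    weights = np.array([0.299, 0.587, 0.114])
    gray = rgb.astype(np.float64) @ weights
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)

def to_float01(rgb):
    """Float view for reference comparisons: values scaled to [0, 1]."""
    return rgb.astype(np.float32) / 255.0

def psnr(reference, candidate, data_range=1.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    diff = reference.astype(np.float64) - candidate.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / mse))

# Synthetic stand-ins for an original image and an inpainted result.
rng = np.random.default_rng(0)
original = rng.integers(0, 256, size=(16, 16, 3), dtype=np.uint8)
inpainted = original.copy()
inpainted[4:8, 4:8] = 0  # pretend this region was repainted badly

gray_original = to_gray_uint8(original)  # input for sharpness metrics
score_db = psnr(to_float01(original), to_float01(inpainted))
```

Keeping the two views separate avoids rounding float images back to integers before reference comparisons, and avoids feeding float data to histogram-based metrics that expect 8-bit values.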
These results are then aggregated into a single, weighted score that reflects the overall quality of the inpainting. This holistic approach ensures our assessment captures the full spectrum of fur restoration quality, from technical precision to aesthetic appeal.
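The aggregation step can be sketched as a weighted average of normalised metrics. The weights, value ranges, and the choice of just three metrics below are hypothetical placeholders, not the values used in our metric. The point is only to show how metrics with different scales and directions (higher is better for SSIM and PSNR, lower is better for LPIPS) can be folded into one number.

```python
def normalise(value, lo, hi, higher_is_better=True):
    """Map a raw metric onto [0, 1], flipping metrics where lower is better."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    scaled = min(max((value - lo) / (hi - lo), 0.0), 1.0)
    return scaled if higher_is_better else 1.0 - scaled

def weighted_score(metrics, specs):
    """Combine normalised metrics into one score in [0, 1].

    specs maps metric name -> (weight, lo, hi, higher_is_better).
    """
    total_weight = sum(spec[0] for spec in specs.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    acc = 0.0
    for name, (weight, lo, hi, higher) in specs.items():
        acc += weight * normalise(metrics[name], lo, hi, higher)
    return acc / total_weight

# Hypothetical weights and ranges, for illustration only.
SPECS = {
    "ssim":  (0.4, 0.0, 1.0, True),
    "psnr":  (0.3, 10.0, 40.0, True),
    "lpips": (0.3, 0.0, 1.0, False),  # LPIPS is a distance: lower is better
}

example = {"ssim": 0.5, "psnr": 25.0, "lpips": 0.5}
example_score = weighted_score(example, SPECS)
```

Clamping each metric to a fixed range before weighting keeps a single outlier, such as an unusually high PSNR, from dominating the final score.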

Image edited with LoRA and its evaluation score

Results and analysis: Superior performance
Our experiments showed that the combined LoRA and Textual Inversion approach significantly outperformed standalone methods. Using our custom metric, we recorded an average quality score improvement of 4% over LoRA alone and 7% over Textual Inversion alone.
In blind visual assessments, the combined method was favored in 87% of cases, with evaluators praising the crisp fur details, seamless blending, and authentic texture. These findings confirm that our combined approach achieves superior inpainting quality for our specific domain.

Image edited with Textual Inversion and its evaluation score

Conclusion: The optimal solution
Our tests show that using LoRA and Textual Inversion together works best for adding realistic black-and-white Spitz fur to images.

This is how the image looks with our combined method

This approach delivers the best balance of structural accuracy and stylistic fidelity among the methods we tested, making it our choice for high-quality, domain-specific inpainting. With this stack finalized, we’re confident in its ability to handle demanding fur restoration tasks.
