Learning Goals
a. What is upscaling?
b. What was the interpolation used when resizing?
c. How is pixel upscaling done?
d. What is the difference between pixel upscaling and latent upscaling?
e. How is latent upscaling done?
Workflow
| Index | Workflow Name | Actionable Link | Summary |
| 4.5.1 | Resize | | Understanding the interpolation used when resizing is helpful groundwork for upscaling. |
| 4.5.2 | Pixel Upscale | | Pixel-based upscaling is possible. |
| 4.5.3 | Upscale Model Comparison | | Even within pixel-based upscaling, results can vary significantly depending on the model used. |
| 4.5.4 | Ultimate SD Upscaler | | There is a node that compensates for the drawbacks of pixel-based upscaling. |
| 4.5.5 | Ultimate SD Upscaler (2) | | Upscaling works for both generated images and externally created images. |
| 4.5.6 | Latent Upscale | | Leveraging the strengths of generative AI, upscaling can be done in latent space rather than pixel space. |
| 4.5.7 | Latent Upscale (2) | | Latent upscaling changes the details, but some of its drawbacks can be mitigated. |
a. What is upscaling?
SD1.5 produces images at 512x512, while SDXL produces images at 1024x1024.
However, we rarely use 512x512 outputs as-is in practical applications; real-world scenarios often call for higher-resolution or higher-quality images.
Upscaling refers to increasing the size of an image from 512x512 to 1024x1024, 2048x2048, 4096x4096, and so on.
Traditionally, non-generative AI methods have been used for upscaling, and they work quite well.
However, there are certain limitations, and generative AI can excel in these areas.
b. What was the interpolation used when resizing?
Facing so many new concepts at once can be daunting.
If you just want to improve image quality, you don't need to master all of these complexities.
Let's focus on just two slightly unfamiliar concepts: interpolation and the latent space.
Interpolation
Let’s start with a fundamental question: How do you scale a 512x512 image up to a 1024x1024 image?
(If you’re not curious, you can just choose bilinear or bicubic interpolation instead of nearest neighbor when resizing and move on.)
Before AI, with just computers, let’s simplify how it was done.
Images are made up of pixels. Let’s examine how to expand a 3x3 image with values ranging from 0 to 10 into a 5x5 image.
If positioned like this, how should the white spaces be filled? Let’s look at the space labeled a between the 0 and 10 values.
• a is next to 0, so a can be filled with 0. ⇒ 0 (nearest)
• a is between 0 and 10, so take the average of the two, which is 5. ⇒ 5 (bilinear)
  ◦ Using this method, b could be 3 and c could be 4.
Further explanation would involve details about computer science and algorithms, which are not crucial here and will be omitted.
When stretching an image to enlarge or transform it, you must decide which method will fill in the new values.
For resizing tasks, therefore, bilinear or bicubic is a better choice than nearest neighbor.
Bicubic improves on simply averaging two values by fitting a smooth curve through neighboring pixels, in a way similar to using derivatives.
(Note that for the pixel image currently being input, the differences between these three methods may not be visually obvious.)
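The "a = 5" example above can be sketched directly in code. This is a minimal illustration, not any library's API: `nearest` and `linear` are hypothetical helper names, and `linear` is the one-dimensional core of bilinear interpolation.

```python
# Minimal sketch of nearest vs. linear interpolation between two pixel values.
# "nearest" and "linear" are hypothetical helpers for illustration only;
# bilinear interpolation applies this 1-D linear blend along both axes.

def nearest(p, q, t):
    # t in [0, 1] is the fractional position between neighbors p and q;
    # ties at t == 0.5 go to the second neighbor here.
    return p if t < 0.5 else q

def linear(p, q, t):
    # Weighted average of the two neighbors.
    return p * (1 - t) + q * t

# A new pixel exactly halfway between a 0-pixel and a 10-pixel:
print(nearest(0, 10, 0.5))  # 10
print(linear(0, 10, 0.5))   # 5.0, matching the "a = 5" example above
```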
c. How is pixel upscaling done?
Pixel upscaling uses AI models designed for pixel-based upscaling. The process is straightforward: choose an upscaling model and run the upscale. The results depend heavily on the model used; this is a traditional AI approach rather than generative AI.
Accordingly, a variety of upscaling models are available. Some excel at enhancing faces, others perform well universally, and some are weak on photos but excel on animation. It is worth comparing the results yourself and collecting the tips that work best for you.
Ultimate SD Upscaler
I will explain one of the nodes, the Ultimate SD Upscaler node.
Although the workflow includes a basic generation node, that does not mean the upscaling itself is done by generation; this is a common misunderstanding. The node still uses a pixel upscaler model to perform the upscaling.
Why use this node, then? Pixel-based models handle moderate upscales well, such as 512x512 to 1024x1024 (2x per side, 4x the pixel count). However, for larger jumps, such as 8x or 16x, or from 4096x4096 to 8192x8192, quality drops because these models weren't trained with such large images in mind.
The idea behind this node is to split the image into 512x512 tiles, upscale them, and smooth the tile boundaries with seam fixing; the seam fix runs KSampler generation with light denoising to minimize changes to the original image.
So the current workflow generates an image with your checkpoint and then performs pixel upscaling with the same options, applying seam fixing with the checkpoint that created the image. This is referred to as "upscale after generation."
By modifying the workflow this way, if you input an external image instead of one generated within Stable Diffusion, you can apply tiling to an image that wasn’t created by Stable Diffusion.
(It's a mistake not to change the prompt: you should either clear it or set it to describe the image you are upscaling.
Be cautious about assuming others' workflows are error-free; mistakes happen.)
Since you are currently upscaling a 512x512 image, you might wonder, "Do I really need to go through the trouble of tiling?"
In this case, no: using just the pixel upscaler is likely more advantageous.
However, tiling is useful when upscaling images that are already large, or when scaling your own image up to sizes like 4096x4096.
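The tile-splitting step behind Ultimate SD Upscaler can be sketched as follows. This is a conceptual sketch only: `split_into_tiles` is a hypothetical helper, and the real node additionally overlaps tiles, runs the light-denoise pass on each, and seam-fixes the boundaries.

```python
# Hypothetical sketch of the tiling idea behind Ultimate SD Upscaler:
# cover the image with fixed-size tiles that can be processed one at a time.
# The real node also overlaps tiles and seam-fixes the boundaries.

def split_into_tiles(width, height, tile=512):
    """Return (x, y, w, h) boxes covering the image; edge tiles may be smaller."""
    boxes = []
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            boxes.append((x, y, min(tile, width - x), min(tile, height - y)))
    return boxes

# A 1024x1024 image splits into four full 512x512 tiles:
for box in split_into_tiles(1024, 1024):
    print(box)
```

Because each tile stays at a size the model was trained on, very large targets like 8192x8192 remain feasible.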
d. What is the difference between pixel upscaling and latent upscaling?
In section 2.2, while studying the KSampler, you also encountered the concept of the latent space.
Latent upscaling refers to increasing the size of an image in the latent space rather than in the pixel space.
e. How is latent upscaling done?
Using SD1.5, you created a 512x512 image of a cute cat. Now, let's upscale this image.
If you want to make this image 1024x1024, pixel upscaling is one option.
However, let's try generating it directly at 1024x1024.
Since we already know that SD1.5 produces good results at 512x512, we can anticipate that it might not perform as well at 1024x1024 without further adjustments.
Indeed, two cats appeared! Still, you can see the model made an effort to fill the 1024x1024 canvas.
Instead of trying to generate the image directly at 1024x1024, let's upscale it in the latent space to reach 1024x1024.
Connect the Latent Upscale By node, KSampler, and VAE Decode node after the basic workflow.
For reference, use bicubic interpolation as explained earlier and set the denoise value to 0.56.
The details have slightly changed, but a great 1024x1024 image was successfully created with SD1.5.
(Try applying the same technique to create a 2048x2048 image using SDXL.)
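Conceptually, the Latent Upscale By step resizes the latent tensor itself before the second KSampler pass. Below is a toy sketch on a plain Python 2x2 grid, using nearest neighbor purely for brevity; the real node operates on multi-channel torch tensors, and bicubic (as recommended above) is the better choice in practice. `upscale_latent_nearest` is a hypothetical name.

```python
# Toy sketch of latent upscaling: enlarge the latent grid, not the pixels.
# Nearest neighbor is used only for brevity; the workflow above recommends
# bicubic. upscale_latent_nearest is a hypothetical helper name.

def upscale_latent_nearest(latent, factor):
    h, w = len(latent), len(latent[0])
    return [[latent[y // factor][x // factor]
             for x in range(w * factor)]
            for y in range(h * factor)]

lat = [[0.1, 0.2],
       [0.3, 0.4]]
for row in upscale_latent_nearest(lat, 2):
    print(row)
```

After this resize, the KSampler's light denoise (e.g. 0.56 as above) regenerates plausible detail at the new resolution, which is why latent upscaling adds detail but can also change it.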
In situations where the exact appearance or details of the product must remain unchanged, pixel upscaling is the only option.
However, if the shape does not need to match 100% like the actual product, the latent upscaling method can also be useful.
I will now introduce three more techniques:
1. Deep Shrink
2. ControlNet
3. HighRes
Deep Shrink = PatchModelAddDownscale(Kohya deep shrink)
The names of these nodes are quite complex; you only need to understand their basic function.
By analogy, it's like asking a model that excels at 512 to also perform well at 1024, even if that confuses it at first.
Using with ControlNet
During the latent upscaling process, some small details might inevitably change. To preserve as much detail as possible, you can use multi-ControlNet alongside the upscaling.
There's no complex principle behind it; just understand it as using ControlNet in conjunction with the upscaling process.
This concludes the introduction to several upscaling techniques. Additionally, the Comfy ecosystem supports various custom nodes for upscaling. I recommend exploring these to find nodes that best suit your needs.
That said, in 90% of cases, making good use of the upscalers covered here will let you produce excellent images of 4096x4096 or larger from Comfy-generated outputs. Compare them for your situation and aim to turn well-generated images into valuable results.