Learning Goals
You don't need to read all the text! Follow along at your own pace and feel free to practice as you go. :)
- Let's explore how to create images using Text 2 Image. We'll create a cat image once more.
- Learn how to perform parameter testing to find good parameter combinations. ← Important!
(We explain the principles and details of Stable Diffusion models in Chapters 1-3 and 4-1. Understanding and applying those principles will strengthen your skills, but you don't need to grasp every element to do practical work. In other words, this chapter summarizes the practical steps centered on image creation; if anything is unclear or you have questions, refer to Chapters 1-1 and 4-1. Understanding the principles is important for becoming proficient. This chapter focuses on Basic Generation with the KSampler and Model; how to use the other Core elements is explained in Chapter 5. Those other Core nodes matter, but Text2Image is the central element in any situation, often referred to as Basic Generation, Basic Pipe, or 1st pass. Let's build a strong foundation.)
Creating Images with Text 2 Image
01. Setting Goals & Choosing Methods
First, spend some time imagining what kind of image you want to create and think about the methods to implement it. We will be creating images from text. (This is particularly important for larger projects, but for now, we’ll keep it simple.)
02. Writing the Prompt
Write the prompt for your image.
There are various methods to write prompts:
• Type it in English yourself.
• Use translation tools like Papago, Google Translate, or DeepL along the way.
• Chapter 3-3: Use ChatGPT as a prompt generator (though it's not a common practice within our team, I will provide details on other methods as well).
• Chapter 3-3: Use tools like CLIP Interrogator or an sLLM to convert images to text (sLLM will be covered in future updates).
• Chapter 3-4: Refer to generation data shared by others on platforms like CivitAI, OpenArt, Midjourney, or ImageTab and adapt it to your needs.
While various tools are available, the principle is simple.
Positive Prompt
Include what you want in the positive prompt (e.g., cat, nature background, minimalism).
Negative Prompt
Include what you want to exclude in the negative prompt (e.g., worst quality, nsfw, text, watermark).
(Unlike some services like Midjourney, you don’t need to enter negative prompts separately here.)
Gradual Improvement
You don’t need to enter all prompts from the start. Gradually add and refine them. Remember to add what you want in the positive prompt and what you want to exclude in the negative prompt.
03. Understanding How It Works
While it’s common to think “the model or checkpoint generates the image” in Stable Diffusion, it’s more accurate to understand that the KSampler uses the checkpoint to generate the image.
Output = KSampler(Model, Prompt)
Let’s explain each component.
04. Models (= Checkpoints)
There are many models available, divided into checkpoints and LoRAs. We will use DreamShaperXL (dreamshaperXL_alpha2Xl10) for practice initially.
Keep the model fixed and experiment with the prompts and KSampler as described above to create a few images.
Model Versions
Stable Diffusion includes models like SD1.5, SD2.1, SDXL, and SD3.0 (and also Flux and Pony models).
We will focus on SD1.5 and SDXL, as they are mainstream.
Remember this one line:
“SD1.5 is trained on 512 and performs well on 512 images, while SDXL is trained on 1024 and performs well on 1024 images. SD1.5 works well with CFG 8, and SDXL works well with CFG 4.”
In other words, if you switch to SD1.5, you need to adjust [1] the model, [2] the image size, and [3] the CFG.
Always know whether you're using an XL model or a 1.5 model. For example, switching to an SD1.5 model (dreamshaper_8) calls for a different workflow with different settings.
05. KSampler & SeedFix & Parameter Testing
(Important! Make sure to understand how to fix seeds and perform parameter testing.)
If the generated image is not satisfactory, let’s fix the issue.
First, check if there’s an error in the prompt.
If not, fix the seed to prevent random results and continue from the current image.
Understand the meaning of each parameter in KSampler and learn parameter testing. This is crucial, and even if other sources don't cover it, this tutorial will.
Set the seed to -1 and change control_after_generate from increment to fix. ← This is a common practice! Get used to it.
(Setting the seed to increment/randomize causes different results every time. Fixing it produces the same image.)
The reason for setting the seed to -1 is to continue from the previous image.
(The value you saw earlier means "the previous image was created with seed=1117041082083158, so the next image will be created with seed=1117041082083159." ComfyUI can be a bit fiddly here, but please understand that parameter testing is necessary, and leaving the seed on randomize makes that testing more cumbersome.)
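To make the seed behaviour concrete, here is a tiny conceptual sketch of what control_after_generate does between runs; it is an illustration, not ComfyUI's actual implementation.

```python
import random

# Conceptual illustration of control_after_generate (not ComfyUI source code).
def next_seed(current_seed: int, mode: str) -> int:
    if mode == "fixed":       # reuse the same seed -> the same image every run
        return current_seed
    if mode == "increment":   # continue from the previous image: seed + 1
        return current_seed + 1
    if mode == "randomize":   # a different, unpredictable image every run
        return random.randint(0, 2**64 - 1)
    raise ValueError(f"unknown mode: {mode}")

print(next_seed(1117041082083158, "increment"))  # 1117041082083159
```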
If the seed is fixed correctly, you should see the same image.
Change steps from 20 to 30, then back to 20, and adjust cfg from 8 to 4 and then to 2. I found the image slightly improved, so set steps to 30 and cfg to 2.
The SDXL model itself may deserve some of the credit, but the image quality has improved.
Now change control_after_generate back to increment and generate multiple images. Right after switching from fix to increment, the first image generated is the same one again, for the same reason as the seed behaviour described above. (Does that make sense? If not, review the -1 seed setting.)
Now, the images are more satisfactory. However, it’s not a 100% perfect result. This simple parameter testing process helps in determining what values work best for your prompt and model.
The goal is to change one factor at a time and observe the effect while keeping everything else constant. Testing multiple factors at once makes it hard to pinpoint which change had the effect.
There’s no absolute ‘best’ parameter; it’s about experimenting and finding what works for you.
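If it helps, you can think of (or script) this process as generating a small set of parameter combinations that differ from a baseline in exactly one place. The sketch below only prints those combinations; the baseline values are just examples.

```python
# One-factor-at-a-time parameter testing: vary a single value while the seed
# and every other setting stay fixed, then compare the resulting images.
baseline = {"seed": 1117041082083158, "steps": 20, "cfg": 8.0}

def one_factor_runs(name, values):
    """Yield parameter sets that differ from the baseline in exactly one parameter."""
    for value in values:
        yield {**baseline, name: value}

for params in one_factor_runs("cfg", [8.0, 4.0, 2.0]):
    print(params)   # render each of these (manually or via the ComfyUI API) and compare

for params in one_factor_runs("steps", [20, 30]):
    print(params)
```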
We've discussed seed and control_after_generate above; now let's go through the remaining parameters.
steps | KSampler creates images starting from the seed, and steps determines how many iterations are used to generate the image. With steps set to 20 the image is refined over 20 iterations; with 30, over 30 iterations. Note that increasing steps also increases generation time: steps at 30 takes roughly 1.5 times longer than steps at 20. The other KSampler parameters have little effect on generation time. |
cfg | Controls how strongly the prompt influences the image. A higher cfg value means the prompt has a stronger influence, producing images that follow the prompt more closely. A lower cfg value reduces the prompt's influence, giving the model more freedom, so the image may drift from the prompt. Higher cfg values also tend to produce images with stronger contrast. |
sampler_name | More technically, when steps is 20 you can think of KSampler as performing denoising and sampling 20 times; sampler_name chooses the specific function (algorithm) used for that sampling process. You don't need to understand the details of these functions right now, but it will help to grasp their principles later on. |
scheduler | The scheduler controls how the noise is distributed across the steps. In simpler terms, you can think of the sampler and scheduler as having a sort of compatibility with each other. |
denoise | For text2image you don't need to adjust this value; keep it fixed at 1. Lowering it results in less refined images. |
So, based on the descriptions above, we can tune steps and cfg simply by raising or lowering the numbers (denoise stays at 1 for text2image). But how do we decide which sampler and scheduler to use? We recommend the three combinations below. Play around with them and see what the differences are. (As you get better, you'll find other values that work well for you.)
sampler | scheduler |
euler | normal |
dpmpp_2m_sde | karras |
dpmpp_3m_sde | exponential |
If you don't want to experiment, we recommend the dpmpp_2m_sde and karras combination for now, and change it later if you need to improve the image. I wish I could summarise the cases more clearly, but there are too many of them to list; in practice, this combination works for us over 90% of the time.
For reference, samplers come in families (series) with similar names, and each variant behaves slightly differently with each model, which gives you room to experiment to improve quality. For example, a model that works well with the recommended dpmpp_2m_sde will sometimes also work well with dpmpp_sde, or with dpmpp_2m_sde_gpu.
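Putting the parameters from this section together, a typical starting configuration for an SDXL text2image pass might look like the following; these values are only the suggestions discussed above, not definitive settings.

```python
# A typical KSampler starting point for an SDXL text2image pass
# (values taken from the suggestions in this chapter, not hard rules).
ksampler_settings = {
    "seed": 1117041082083158,         # fix this while parameter testing
    "control_after_generate": "fixed",
    "steps": 30,                      # ~1.5x the time of 20 steps
    "cfg": 4.0,                       # SDXL tends to prefer lower CFG than SD1.5
    "sampler_name": "dpmpp_2m_sde",   # recommended default combination
    "scheduler": "karras",
    "denoise": 1.0,                   # keep at 1 for text2image
}
```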
07. Latent
Image Size
So far we've only created square images (1:1 ratio), but in real work we'll create images that are 3:4, 4:3, 9:16, 16:9, and so on. In that case we only need to change the latent size. (How to determine the image size is explained in 3-4 Image Size.)
Additionally, it's worth having at least a basic understanding of the concept of a latent, so I'll add some explanation here. This builds on what was discussed in '03. Understanding How It Works'; the explanation given there was actually a simplification that isn't quite correct.
Output = KSampler(Model, Prompt)
If we develop the function relationship more accurately, it looks like this:
OutputLatent = KSampler(Model, Prompt, Latent)
OutputImage = VAE_Decode(OutputLatent)
(Some technical terms will come up for a moment. Don't worry, it's simple. After reading the explanation below you'll be able to understand these three expressions, and although we've only discussed text2image so far, you'll also be able to roughly predict how image2image works.)
"The image we see is in pixels, and pixels and latents are different. Humans view images through pixels, but AI needs to convert them to latents for processing. KSampler works with latents." Remember this!
So, in text2image you create an empty 1024x1024 latent and pass it to KSampler; if you were starting from an existing pixel image instead, the VAE would first encode it into latent form. After KSampler processes the latent, the VAE decodes it back to pixel form so that you can view it.
Here, understand that the role of VAE is simply to convert between latents and pixels. For the purposes of this document (excluding section 1-3), VAE is only used for this conversion.
(Encoding and decoding may sound like complicated terms from computer science books…! For simplicity, you can think of them as compression/decompression. Compressing to latent and then decompressing to image, or converting in this context.)
You should understand that you cannot input images directly into KSampler; they need to be converted to latents via VAE first. Likewise, the latent output from KSampler must be converted back to pixel form via VAE.
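As a rough sketch of this flow, written with hypothetical helper names rather than ComfyUI's actual API, the whole text2image pass looks like this:

```python
# Conceptual text2image flow (hypothetical helpers, not ComfyUI's real API).
# KSampler only ever sees latents; the VAE converts between pixels and latents.
latent = empty_latent(width=1024, height=1024)        # Empty Latent Image node
output_latent = ksampler(model, positive, negative, latent,
                         seed=1117041082083158, steps=30, cfg=4.0,
                         sampler_name="dpmpp_2m_sde", scheduler="karras",
                         denoise=1.0)
image = vae_decode(vae, output_latent)                # latent -> pixels
save_image(image, "cat.png")

# For image2image, only the input side changes:
# latent = vae_encode(vae, load_image("photo.png"))   # pixels -> latent
```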
Model (again)
Great job on tackling the difficult concepts! Now that you understand how to create images using text in section 2-2, it's a good idea to move on to Chapter 4 and learn how to use Core nodes. However, before jumping straight into Chapter 4, it would be beneficial to skim through Chapter 3 as needed, so you can utilize it as a reference.
(It's also recommended to practice a bit with text2image first. Try experimenting with the different models and LoRAs listed below to see how they can be applied. Although Chapter 3-4 covers using external platforms effectively, a key point to remember is to read the documentation carefully. Even for a checkpoint rather than a custom node, the documentation often explains how the checkpoint was created, how to use it effectively, and shows example projects along with the settings and prompts used. You don't always need to follow this procedure, but it's always a good idea to refer to the documentation if things aren't working as expected or if you want to improve your results.)
Checkpoint Usage
As explained above, you can switch to a different checkpoint by clicking "Select Checkpoint" and choosing another one.
(Currently, depending on whether you are using SD1.5 or SDXL, the settings will vary. For now, SDXL is recommended.)
Using LoRA
To use LoRA, follow these steps:
1. Double-click on an empty space in ComfyUI, search for "Load LoRA," and add it.
2. Connect the Load LoRA node between the checkpoint, the prompts, and the KSampler, routing both the model and clip connections through it.
(Note: If you are using an SD1.5 checkpoint, you should use an SD1.5 LoRA. Similarly, if you are using an SDXL checkpoint, use an SDXL LoRA.)
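In the same pseudocode style as before (hypothetical helper names, not ComfyUI's actual API), the Load LoRA node simply sits between the checkpoint and everything downstream:

```python
# Conceptual LoRA wiring (hypothetical helpers, not ComfyUI's real API).
model, clip, vae = load_checkpoint("dreamshaperXL_alpha2Xl10.safetensors")

# Load LoRA takes the checkpoint's model and clip and returns patched versions;
# the LoRA file name and strengths below are placeholders.
model, clip = load_lora(model, clip,
                        lora_name="some_sdxl_lora.safetensors",  # must match the model family (SDXL here)
                        strength_model=1.0, strength_clip=1.0)

positive = clip_text_encode(clip, "cat, nature background, minimalism")
negative = clip_text_encode(clip, "worst quality, nsfw, text, watermark")
# The patched model and these conditionings then go into KSampler exactly as before.
```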
Recommended Checkpoints/LoRAs
(As of July 31, 2024)
Feel free to explore these recommendations and experiment with different prompts and parameter values! These suggestions are not absolute and are based on what was available on Civitai at the time of writing or models I commonly use. Always perform parameter testing as there is no one-size-fits-all setting.
Remember, always be aware of whether you are using an XL model or a 1.5 model. You cannot mix them.
SD Version | Ckpt/LoRA | Style | Name | Nordy URL | CivitAI URL |
SDXL | Checkpoint | photorealistic | epiCRealism XL | | |
SDXL | Checkpoint | photorealistic | Juggernaut XL | | |
SDXL | Checkpoint | anime | 万象熔炉 | Anything XL | | |
SDXL | Checkpoint | anime | Animagine XL V3.1 | | |
SDXL | Checkpoint | 3D | DynaVision XL | | |
SDXL | LoRA | photorealistic | SDXL Film Photography Style | | |
SDXL | LoRA | photorealistic | Perfect Eyes XL | | |
SDXL | LoRA | anime | LineAniRedmond- Linear Manga Style for SD XL | | |
SDXL | LoRA | anime | Aesthetic Anime LoRA | | |
SDXL | LoRA | 3D | Samaritan 3d Cartoon SDXL | | |
SDXL | LoRA | Vector | Vector Cartoon Illustration | | |
SD 1.5 | Checkpoint | photorealistic | Realistic Vision V6.0 B1 | | |
SD 1.5 | Checkpoint | photorealistic | epiCRealism | | |
SD 1.5 | Checkpoint | anime | ReV Animated | | |
SD 1.5 | Checkpoint | anime | MeinaMix | | |
SD 1.5 | Checkpoint | 3D | NeverEnding Dream (NED) | | |
SD 1.5 | Checkpoint | 3D | Disney Pixar Cartoon Type A | | |
SD 1.5 | LoRA | anime | Anime Lineart / Manga-like (线稿/線画/マンガ風/漫画风) Style | | |
SD 1.5 | LoRA | anime | Studio Ghibli Style LoRA | | |
SD 1.5 | LoRA | 3D | blindbox/大概是盲盒 | | |
SD 1.5 | LoRA | 3D | 3D rendering style | | |