Learning Goals
You don't need to read all the text! Follow along at your own pace and feel free to practice as you go. :)
- Let's explore how to create images using Text 2 Image. We'll create a cat image once more.
- Learn how to perform parameter testing to find good parameter combinations. ← Important!
(We explain the principles and details of Stable Diffusion models in Chapters 1-3 and 4-1. Understanding and applying those principles will strengthen your skills, but you don't need to grasp every element to do practical work. In other words, this chapter summarizes the practical steps centered on image creation; if anything is unclear or you have questions, refer to Chapters 1-1 and 4-1. Understanding the principles is important for becoming proficient. This chapter focuses on Basic Generation with the KSampler and Model; how to use the other Core elements is explained in Chapter 5. Those other Core nodes matter, but Text2Image is the central element in any situation, often referred to as Basic Generation, Basic Pipe, or 1st pass. Let's build a strong foundation.)
Creating Images with Text 2 Image
01. Setting Goals & Choosing Methods
First, spend some time imagining what kind of image you want to create and think about the methods to implement it. We will be creating images from text. (This is particularly important for larger projects, but for now, we’ll keep it simple.)
02. Writing the Prompt
Write the prompt for your image.
There are various methods to write prompts:
• Type it in English yourself.
• Use translation tools like Papago, Google Translate, or DeepL along the way.
• Chapter 3-3: Use ChatGPT as a prompt generator (though it's not a common practice within our team, I will provide details on other methods as well).
• Chapter 3-3: Use tools like CLIP Interrogator or an sLLM to convert images to text (sLLM will be covered in future updates).
• Chapter 3-4: Refer to generation data shared by others on platforms like CivitAI, OpenArt, Midjourney, or ImageTab and adapt it to your needs.
While various tools are available, the principle is simple.
Positive Prompt
Include what you want in the positive prompt (e.g., cat, nature background, minimalism).
Negative Prompt
Include what you want to exclude in the negative prompt (e.g., worst quality, nsfw, text, watermark).
(Unlike some services like Midjourney, you don’t need to enter negative prompts separately here.)
Gradual Improvement
You don’t need to enter all prompts from the start. Gradually add and refine them. Remember to add what you want in the positive prompt and what you want to exclude in the negative prompt.
03. Understanding How It Works
While it’s common to think “the model or checkpoint generates the image” in Stable Diffusion, it’s more accurate to understand that the KSampler uses the checkpoint to generate the image.
Output = KSampler(Model, Prompt)
Let’s explain each component.
04. Models (= Checkpoints)
There are many models available, divided into checkpoints and LoRAs. We will use DreamShaperXL (dreamshaperXL_alpha2Xl10) for practice initially.
Keep the model fixed and experiment with the prompts and KSampler as described above to create a few images.
Model Versions
Stable Diffusion includes models like SD1.5, SD2.1, SDXL, and SD3.0 (and also Flux and Pony models).
We will focus on SD1.5 and SDXL, as they are mainstream.
Remember this one line:
“SD1.5 is trained on 512 and performs well on 512 images, while SDXL is trained on 1024 and performs well on 1024 images. SD1.5 works well with CFG 8, and SDXL works well with CFG 4.”
In other words, if you switch to SD1.5, you need to adjust [1] the model, [2] the image size, and [3] the CFG.
Always know whether you're using an XL model or a 1.5 model. For example, switching to an SD1.5 model (dreamshaper_8) calls for a different workflow with different settings.
05. KSampler & SeedFix & Parameter Testing
(Important! Make sure to understand how to fix seeds and perform parameter testing.)
If the generated image is not satisfactory, let’s fix the issue.
First, check if there’s an error in the prompt.
If not, fix the seed to prevent random results and continue from the current image.
Understand the meaning of each parameter in KSampler and learn parameter testing. This is crucial, and even if other sources don't cover it, this tutorial will.
Set the seed to -1 and change control_after_generate from increment to fix. ← This is a common practice! Get used to it.
(Setting the seed to increment/randomize causes different results every time. Fixing it produces the same image.)
The reason for setting the seed to -1 is to continue from the previous image.
(The value you saw earlier means "the previous image was created with seed=1117041082083158, so the next image will be created with seed=1117041082083159." ComfyUI can be a bit fiddly here, but please understand that parameter testing is necessary, and leaving the seed on randomize makes that testing more cumbersome.)
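To make the seed behaviour concrete, here is a tiny conceptual sketch of what control_after_generate does between runs; it is an illustration, not ComfyUI's actual implementation.

```python
import random

# Conceptual illustration of control_after_generate (not ComfyUI source code).
def next_seed(current_seed: int, mode: str) -> int:
    if mode == "fixed":       # reuse the same seed -> the same image every run
        return current_seed
    if mode == "increment":   # continue from the previous image: seed + 1
        return current_seed + 1
    if mode == "randomize":   # a different, unpredictable image every run
        return random.randint(0, 2**64 - 1)
    raise ValueError(f"unknown mode: {mode}")

print(next_seed(1117041082083158, "increment"))  # 1117041082083159
```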
If the seed is fixed correctly, you should see the same image.
Change steps from 20 to 30, then back to 20, and adjust cfg from 8 to 4 and then to 2. I found the image slightly improved, so set steps to 30 and cfg to 2.
The SDXL model itself may deserve some of the credit, but the image quality has improved.
Now change control_after_generate back to increment and generate multiple images. Right after switching from fix to increment, the first image generated is the same one again, for the same reason as the seed behaviour described above. (Does that make sense? If not, review the -1 seed setting.)
Now, the images are more satisfactory. However, it’s not a 100% perfect result. This simple parameter testing process helps in determining what values work best for your prompt and model.
The goal is to change one factor at a time and observe the effect while keeping everything else constant. Testing multiple factors at once makes it hard to pinpoint which change had the effect.
There’s no absolute ‘best’ parameter; it’s about experimenting and finding what works for you.
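If it helps, you can think of (or script) this process as generating a small set of parameter combinations that differ from a baseline in exactly one place. The sketch below only prints those combinations; the baseline values are just examples.

```python
# One-factor-at-a-time parameter testing: vary a single value while the seed
# and every other setting stay fixed, then compare the resulting images.
baseline = {"seed": 1117041082083158, "steps": 20, "cfg": 8.0}

def one_factor_runs(name, values):
    """Yield parameter sets that differ from the baseline in exactly one parameter."""
    for value in values:
        yield {**baseline, name: value}

for params in one_factor_runs("cfg", [8.0, 4.0, 2.0]):
    print(params)   # render each of these (manually or via the ComfyUI API) and compare

for params in one_factor_runs("steps", [20, 30]):
    print(params)
```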
We've discussed seed and control_after_generate above; now let's go through the remaining parameters.
steps | KSampler creates images starting from the seed, and steps determines how many iterations are used to generate the image. With steps set to 20 the image is refined over 20 iterations; with 30, over 30 iterations. Note that increasing steps also increases generation time: steps at 30 takes roughly 1.5 times longer than steps at 20. The other KSampler parameters have little effect on generation time. |
cfg | Controls how strongly the prompt influences the image. A higher cfg value means the prompt has a stronger influence, producing images that follow the prompt more closely. A lower cfg value reduces the prompt's influence, giving the model more freedom, so the image may drift from the prompt. Higher cfg values also tend to produce images with stronger contrast. |
sampler_name | More technically, when steps is 20 you can think of KSampler as performing denoising and sampling 20 times; sampler_name chooses the specific function (algorithm) used for that sampling process. You don't need to understand the details of these functions right now, but it will help to grasp their principles later on. |
scheduler | The scheduler controls how the noise is distributed across the steps. In simpler terms, you can think of the sampler and scheduler as having a sort of compatibility with each other. |
denoise | For text2image you don't need to adjust this value; keep it fixed at 1. Lowering it results in less refined images. |
So, based on the descriptions above, we can tune steps and cfg simply by raising or lowering the numbers (denoise stays at 1 for text2image). But how do we decide which sampler and scheduler to use? We recommend the three combinations below. Play around with them and see what the differences are. (As you get better, you'll find other values that work well for you.)
sampler | scheduler |
euler | normal |
dpmpp_2m_sde | karras |
dpmpp_3m_sde | exponential |
If you don't want to experiment, we recommend the dpmpp_2m_sde and karras combination for now, and change it later if you need to improve the image. I wish I could summarise the cases more clearly, but there are too many of them to list; in practice, this combination works for us over 90% of the time.
For reference, samplers come in families (series) with similar names, and each variant behaves slightly differently with each model, which gives you room to experiment to improve quality. For example, a model that works well with the recommended dpmpp_2m_sde will sometimes also work well with dpmpp_sde, or with dpmpp_2m_sde_gpu.
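Putting the parameters from this section together, a typical starting configuration for an SDXL text2image pass might look like the following; these values are only the suggestions discussed above, not definitive settings.

```python
# A typical KSampler starting point for an SDXL text2image pass
# (values taken from the suggestions in this chapter, not hard rules).
ksampler_settings = {
    "seed": 1117041082083158,         # fix this while parameter testing
    "control_after_generate": "fixed",
    "steps": 30,                      # ~1.5x the time of 20 steps
    "cfg": 4.0,                       # SDXL tends to prefer lower CFG than SD1.5
    "sampler_name": "dpmpp_2m_sde",   # recommended default combination
    "scheduler": "karras",
    "denoise": 1.0,                   # keep at 1 for text2image
}
```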
07. Latent
Image Size
So far we've only created square images (1:1 ratio), but in real work we'll create images that are 3:4, 4:3, 9:16, 16:9, and so on. In that case we only need to change the latent size. (How to determine the image size is explained in 3-4 Image Size.)
Additionally, it's worth having at least a basic understanding of the concept of a latent, so I'll add some explanation here. This builds on what was discussed in '03. Understanding How It Works'; the explanation given there was actually a simplification that isn't quite correct.
Output = KSampler(Model, Prompt)
If we develop the function relationship more accurately, it looks like this:
OutputLatent = KSampler(Model, Prompt, Latent)
OutputImage = VAE_Decode(OutputLatent)
(Some technical terms will come up for a moment. Don't worry, it's simple. After reading the explanation below you'll be able to understand these three expressions, and although we've only discussed text2image so far, you'll also be able to roughly predict how image2image works.)
"The image we see is in pixels, and pixels and latents are different. Humans view images through pixels, but AI needs to convert them to latents for processing. KSampler works with latents." Remember this!
So, in text2image you create an empty 1024x1024 latent and pass it to KSampler; if you were starting from an existing pixel image instead, the VAE would first encode it into latent form. After KSampler processes the latent, the VAE decodes it back to pixel form so that you can view it.
Here, understand that the role of VAE is simply to convert between latents and pixels. For the purposes of this document (excluding section 1-3), VAE is only used for this conversion.
(Encoding and decoding may sound like complicated terms from computer science books…! For simplicity, you can think of them as compression/decompression. Compressing to latent and then decompressing to image, or converting in this context.)
You should understand that you cannot input images directly into KSampler; they need to be converted to latents via VAE first. Likewise, the latent output from KSampler must be converted back to pixel form via VAE.
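As a rough sketch of this flow, written with hypothetical helper names rather than ComfyUI's actual API, the whole text2image pass looks like this:

```python
# Conceptual text2image flow (hypothetical helpers, not ComfyUI's real API).
# KSampler only ever sees latents; the VAE converts between pixels and latents.
latent = empty_latent(width=1024, height=1024)        # Empty Latent Image node
output_latent = ksampler(model, positive, negative, latent,
                         seed=1117041082083158, steps=30, cfg=4.0,
                         sampler_name="dpmpp_2m_sde", scheduler="karras",
                         denoise=1.0)
image = vae_decode(vae, output_latent)                # latent -> pixels
save_image(image, "cat.png")

# For image2image, only the input side changes:
# latent = vae_encode(vae, load_image("photo.png"))   # pixels -> latent
```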
Model (again)
Great job on tackling the difficult concepts! Now that you understand how to create images using text in section 2-2, it's a good idea to move on to Chapter 4 and learn how to use Core nodes. However, before jumping straight into Chapter 4, it would be beneficial to skim through Chapter 3 as needed, so you can utilize it as a reference.
(It's also recommended to practice a bit with text2image first. Try experimenting with the different models and LoRAs listed below to see how they can be applied. Although Chapter 3-4 covers using external platforms effectively, a key point to remember is to read the documentation carefully. Even for a checkpoint rather than a custom node, the documentation often explains how the checkpoint was created, how to use it effectively, and shows example projects along with the settings and prompts used. You don't always need to follow this procedure, but it's always a good idea to refer to the documentation if things aren't working as expected or if you want to improve your results.)
Checkpoint Usage
As explained above, you can switch to a different checkpoint by clicking "Select Checkpoint" and choosing another one.
(Currently, depending on whether you are using SD1.5 or SDXL, the settings will vary. For now, SDXL is recommended.)
Using LoRA
To use LoRA, follow these steps:
1. Double-click on an empty space in ComfyUI, search for "Load LoRA," and add it.
2. Connect the Load LoRA node between the checkpoint, the prompts, and the KSampler, routing both the model and clip connections through it.
(Note: If you are using an SD1.5 checkpoint, you should use an SD1.5 LoRA. Similarly, if you are using an SDXL checkpoint, use an SDXL LoRA.)
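In the same pseudocode style as before (hypothetical helper names, not ComfyUI's actual API), the Load LoRA node simply sits between the checkpoint and everything downstream:

```python
# Conceptual LoRA wiring (hypothetical helpers, not ComfyUI's real API).
model, clip, vae = load_checkpoint("dreamshaperXL_alpha2Xl10.safetensors")

# Load LoRA takes the checkpoint's model and clip and returns patched versions;
# the LoRA file name and strengths below are placeholders.
model, clip = load_lora(model, clip,
                        lora_name="some_sdxl_lora.safetensors",  # must match the model family (SDXL here)
                        strength_model=1.0, strength_clip=1.0)

positive = clip_text_encode(clip, "cat, nature background, minimalism")
negative = clip_text_encode(clip, "worst quality, nsfw, text, watermark")
# The patched model and these conditionings then go into KSampler exactly as before.
```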
Recommended Checkpoints/LoRAs
(As of July 31, 2024)
Feel free to explore these recommendations and experiment with different prompts and parameter values! These suggestions are not absolute and are based on what was available on Civitai at the time of writing or models I commonly use. Always perform parameter testing as there is no one-size-fits-all setting.
Remember, always be aware of whether you are using an XL model or a 1.5 model. You cannot mix them.
SD Version | Ckpt/LoRA | Style | Name | Nordy URL | CivitAI URL |
SDXL | Checkpoint | photorealistic | epiCRealism XL | | |
SDXL | Checkpoint | photorealistic | Juggernaut XL | | |
SDXL | Checkpoint | anime | 万象熔炉 | Anything XL | | |
SDXL | Checkpoint | anime | Animagine XL V3.1 | | |
SDXL | Checkpoint | 3D | DynaVision XL | | |
SDXL | LoRA | photorealistic | SDXL Film Photography Style | | |
SDXL | LoRA | photorealistic | Perfect Eyes XL | | |
SDXL | LoRA | anime | LineAniRedmond- Linear Manga Style for SD XL | | |
SDXL | LoRA | anime | Aesthetic Anime LoRA | | |
SDXL | LoRA | 3D | Samaritan 3d Cartoon SDXL | | |
SDXL | LoRA | Vector | Vector Cartoon Illustration | | |
SD 1.5 | Checkpoint | photorealistic | Realistic Vision V6.0 B1 | | |
SD 1.5 | Checkpoint | photorealistic | epiCRealism | | |
SD 1.5 | Checkpoint | anime | ReV Animated | | |
SD 1.5 | Checkpoint | anime | MeinaMix | | |
SD 1.5 | Checkpoint | 3D | NeverEnding Dream (NED) | | |
SD 1.5 | Checkpoint | 3D | Disney Pixar Cartoon Type A | | |
SD 1.5 | LoRA | anime | Anime Lineart / Manga-like (线稿/線画/マンガ風/漫画风) Style | | |
SD 1.5 | LoRA | anime | Studio Ghibli Style LoRA | | |
SD 1.5 | LoRA | 3D | blindbox/大概是盲盒 | | |
SD 1.5 | LoRA | 3D | 3D rendering style | | |