Trying LTX-2.3 (NVFP4 model on a GeForce RTX 5060 Ti 16GB)

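In the Stability Matrix build of ComfyUI, no folder corresponding to latent_upscale_models exists by default under <Stability Matrix install folder>\Data\Models\. You therefore need to edit extra_model_paths.yaml and define it there.

For details, see the steps in the earlier LTX-2 article.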

Reference

Running LTX-2 locally (video generation AI with audio / GeForce RTX 5060Ti 16GB)
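
As a rough illustration of the kind of entry this involves, the sketch below renders a YAML fragment that maps ComfyUI's latent_upscale_models folder type to a LatentUpscaleModels folder under the models root. The section name `ltx_extra` and the function name are hypothetical, and the file that Stability Matrix generates may be laid out differently, so treat this as a starting point to compare against your own extra_model_paths.yaml rather than something to paste in blindly.

```python
from pathlib import PureWindowsPath


def render_extra_paths(models_root: str, section: str = "ltx_extra") -> str:
    """Render a YAML fragment for extra_model_paths.yaml.

    `section` is an arbitrary label (hypothetical here). `base_path` is the
    models root, and each remaining key is a ComfyUI folder type whose value
    is a path relative to `base_path`.
    """
    # Forward slashes avoid backslash-escaping questions in YAML.
    root = PureWindowsPath(models_root).as_posix()
    lines = [
        f"{section}:",
        f"    base_path: '{root}'",
        "    latent_upscale_models: LatentUpscaleModels",
    ]
    return "\n".join(lines) + "\n"


# Example: print the fragment for an install under C:\StabilityMatrix.
print(render_extra_paths(r"C:\StabilityMatrix\Data\Models"))
```

After editing the file, ComfyUI needs a restart before the new folder is picked up.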


Downloading the required files

LTX-2.3 is already integrated in ComfyUI v0.19.3, so the basic models can be fetched from the UI.

This time, I downloaded the following manually:

NVFP4 model
LoRA v1.1

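(1) Launch the ComfyUI Web UI and click [Templates] in the left-hand menu.

(2) Search for "LTX-2.3".
Select [LTX-2.3: Text to Video].

(3) The workflow opens. If any required model files are missing, an error is displayed.
Click the [Toggle properties panel] button at the top right and download the missing model files.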

checkpoints / ltx-2.3-22b-dev-fp8.safetensors
https://huggingface.co/Lightricks/LTX-2.3-fp8/resolve/main/ltx-2.3-22b-dev-fp8.safetensors
text_encoders / gemma_3_12B_it_fp4_mixed.safetensors
https://huggingface.co/Comfy-Org/ltx-2/resolve/main/split_files/text_encoders/gemma_3_12B_it_fp4_mixed.safetensors
latent_upscale_models / ltx-2.3-spatial-upscaler-x2-1.1.safetensors
https://huggingface.co/Lightricks/LTX-2.3/resolve/main/ltx-2.3-spatial-upscaler-x2-1.1.safetensors
loras / ltx-2.3-22b-distilled-lora-384.safetensors
https://huggingface.co/Lightricks/LTX-2.3/resolve/main/ltx-2.3-22b-distilled-lora-384.safetensors

(4) Download the NVFP4 model and the version 1.1 LoRA individually from the links below.

ltx-2.3-22b-dev-nvfp4.safetensors
https://huggingface.co/Lightricks/LTX-2.3-nvfp4
https://huggingface.co/Lightricks/LTX-2.3-nvfp4/resolve/main/ltx-2.3-22b-dev-nvfp4.safetensors

ltx-2.3-22b-distilled-lora-384-1.1.safetensors
https://huggingface.co/Lightricks/LTX-2.3
https://huggingface.co/Lightricks/LTX-2.3/resolve/main/ltx-2.3-22b-distilled-lora-384-1.1.safetensors
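
If you would rather script these two manual downloads than use a browser, a minimal sketch using only the Python standard library could look like the following. It assumes the files can be downloaded without authentication and that the destination folders are the StableDiffusion and Lora folders of the Stability Matrix models directory; the function names are made up for this example.

```python
from pathlib import Path
from urllib.request import urlretrieve

# (destination subfolder, URL) pairs. The URLs are the ones listed above;
# the subfolder names are assumptions based on the Stability Matrix layout.
DOWNLOADS = [
    ("StableDiffusion",
     "https://huggingface.co/Lightricks/LTX-2.3-nvfp4/resolve/main/"
     "ltx-2.3-22b-dev-nvfp4.safetensors"),
    ("Lora",
     "https://huggingface.co/Lightricks/LTX-2.3/resolve/main/"
     "ltx-2.3-22b-distilled-lora-384-1.1.safetensors"),
]


def plan_downloads(models_root):
    """Return (url, destination path) pairs without touching the network."""
    root = Path(models_root)
    return [(url, root / sub / url.rsplit("/", 1)[-1]) for sub, url in DOWNLOADS]


def download_missing(models_root):
    """Download each file that is not already present at its destination."""
    for url, dest in plan_downloads(models_root):
        if dest.exists():
            continue  # already downloaded; skip
        dest.parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(url, dest)  # large files: this can take a long time


# Usage (not run here because the files are very large):
# download_missing(r"C:\StabilityMatrix\Data\Models")
```

Because `plan_downloads` is separate from the actual transfer, you can print the plan first and confirm the destinations before starting a multi-gigabyte download.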

Placing the files

Place the downloaded model files as follows.

(1) Diffusion models
ltx-2.3-22b-dev-fp8.safetensors, ltx-2.3-22b-dev-nvfp4.safetensors
　→ <Stability Matrix install folder>\Data\Models\StableDiffusion

(2) Text encoder
gemma_3_12B_it_fp4_mixed.safetensors
　→ <Stability Matrix install folder>\Data\Models\TextEncoders

(3) LoRA
ltx-2.3-22b-distilled-lora-384.safetensors, ltx-2.3-22b-distilled-lora-384-1.1.safetensors
　→ <Stability Matrix install folder>\Data\Models\Lora

(4) Latent upscale model
ltx-2.3-spatial-upscaler-x2-1.1.safetensors
　→ <Stability Matrix install folder>\Data\Models\LatentUpscaleModels

If Stability Matrix is installed in C:\StabilityMatrix, the layout looks like this:

C:\STABILITYMATRIX\DATA\MODELS
├─LatentUpscaleModels
│      ltx-2.3-spatial-upscaler-x2-1.1.safetensors
├─Lora
│      ltx-2.3-22b-distilled-lora-384-1.1.safetensors
│      ltx-2.3-22b-distilled-lora-384.safetensors
├─StableDiffusion
│      ltx-2.3-22b-dev-fp8.safetensors
│      ltx-2.3-22b-dev-nvfp4.safetensors
└─TextEncoders
       gemma_3_12B_it_fp4_mixed.safetensors
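
Before launching ComfyUI, it can be handy to confirm that every file landed in the folder shown in the tree above. The following is a small sketch of such a check; the folder and file names are taken from that tree, while the function name is made up for this example.

```python
from pathlib import Path

# Expected files per folder, copied from the directory tree above.
EXPECTED = {
    "LatentUpscaleModels": ["ltx-2.3-spatial-upscaler-x2-1.1.safetensors"],
    "Lora": [
        "ltx-2.3-22b-distilled-lora-384-1.1.safetensors",
        "ltx-2.3-22b-distilled-lora-384.safetensors",
    ],
    "StableDiffusion": [
        "ltx-2.3-22b-dev-fp8.safetensors",
        "ltx-2.3-22b-dev-nvfp4.safetensors",
    ],
    "TextEncoders": ["gemma_3_12B_it_fp4_mixed.safetensors"],
}


def missing_files(models_root):
    """Return 'folder/file' entries that are not present under models_root."""
    root = Path(models_root)
    return [
        f"{folder}/{name}"
        for folder, names in EXPECTED.items()
        for name in names
        if not (root / folder / name).is_file()
    ]


# Usage:
# for entry in missing_files(r"C:\StabilityMatrix\Data\Models"):
#     print("missing:", entry)
```

An empty result means all six files are where the workflow expects them.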

Sample image

Download the sample image used by [LTX-2.3: Image to Video] from Comfy Cloud.
https://docs.comfy.org/ja/tutorials/video/ltx/ltx-2-3

(1) Under "Image to Video", click "Run on Comfy Cloud".

(2) In the ComfyUI Cloud page that opens, right-click the image in the Load Image node and save it with "Save image as".
(I saved it as egyptian_queen.png.)

Generating videos

I tested the following two configurations:

Default configuration
NVFP4 + LoRA 1.1

About NVFP4

NVFP4 is a 4-bit floating-point format introduced with the NVIDIA Blackwell GPU architecture. If you have a Blackwell GPU available, it is well worth trying. (Reference)
(The NVIDIA GeForce RTX 50 series uses Blackwell GPUs.)
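
To get a feel for why a 4-bit format matters on a 16GB card, here is a back-of-the-envelope estimate of weight storage. It assumes that "22b" in the file names means roughly 22 billion parameters and that every weight is stored at the stated width. Real checkpoints usually keep some tensors at higher precision and carry scaling metadata, so actual file sizes and VRAM use will differ.

```python
def weight_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


PARAMS = 22e9  # assumption: "22b" = about 22 billion parameters

for label, bits in [("fp8", 8), ("nvfp4 (4-bit)", 4)]:
    print(f"{label}: about {weight_gb(PARAMS, bits):.0f} GB of weights")
```

Under these assumptions, the 4-bit weights take half the space of the fp8 weights, which is the main reason the format is attractive when VRAM is limited.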

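Text to Video (defaults unchanged)

Open the workflow:
ComfyUI Web UI > [Templates] in the left-hand menu > [LTX-2.3: Text to Video]

Run the sample workflow as is.
Generation took about 211 seconds.

Generated video below (note: audio plays when you start playback)

Video generated with the [LTX-2.3: Text to Video] workflow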

Text to Video (NVFP4 + LoRA 1.1)

Next, I tried the NVFP4 model.
This time I also switched the LoRA to version 1.1. (It is not required for NVFP4; it was simply a good opportunity to try it.)

Open the [LTX-2.3: Text to Video] workflow.

Change the following settings on the [Text to Video (LTX-2.3)] node:

ckpt_name : [ltx-2.3-22b-dev-nvfp4.safetensors]
distilled_lora : [ltx-2.3-22b-distilled-lora-384-1.1.safetensors]

Changing ckpt_name and distilled_lora in the [LTX-2.3: Text to Video] workflow

Run the workflow.
Generation took about 179 seconds.

After generation with ckpt_name and distilled_lora changed in the [LTX-2.3: Text to Video] workflow
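
For a quick sense of the difference between the two runs (about 211 seconds with the defaults, about 179 seconds with NVFP4 + LoRA 1.1), the arithmetic can be sketched as below. These are single-run timings from one machine, so treat the result as a rough indication rather than a benchmark; the function name is made up for this example.

```python
def compare_runs(baseline_s: float, candidate_s: float):
    """Return (fraction of time saved, speedup factor) versus the baseline."""
    reduction = (baseline_s - candidate_s) / baseline_s
    speedup = baseline_s / candidate_s
    return reduction, speedup


reduction, speedup = compare_runs(211, 179)
print(f"time saved: {reduction:.1%}, speedup: {speedup:.2f}x")
```

Note that the two runs differ in both the checkpoint and the LoRA version, so the difference cannot be attributed to NVFP4 alone.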

The prompt is shown below. In this run, the rendered "LTX-2.3" text was partly inaccurate (the decimal point was not reproduced).

Dynamic cinematic close-up of high-tech modular machinery self-assembling in midair, precision robotic parts, magnetic connectors, and glowing circuits clicking together, subtle smoke and light flares, extremely detailed titanium textures. The final product displays a clean, clear surface with large glowing engraved text “LTX-2.3” centered and unobstructed, dramatic lighting, photorealism, 8K, sharp focus.

Generated video below (note: audio plays when you start playback)