
Spatial reasoning
Understand the scene before touching a single pixel.

JoyAI-Image-Edit / JD Open Source
JoyAI-Image-Edit unifies image understanding and generation so that object movement, camera control, long-text rendering, and scene consistency can all be handled as grounded transformations.
Current status
The product surface is not open yet. Join the waiting list to get first access when the editing experience is ready.
- 8B multimodal reasoning model
- 16B diffusion transformer for pixel-level generation
- Unified interface for understanding, generation, and editing
Project
Checkpoint
License
Core architecture
JoyAI-Image couples an 8B multimodal language model with a 16B multimodal diffusion transformer. The shared interface keeps spatial intent attached as the pipeline moves from instruction parsing to image manipulation.
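To make the hand-off between the two stages concrete, here is a minimal, hypothetical sketch of that shape of pipeline. Every name in it (`EditOp`, `parse_instruction`, `apply_ops`, the toy keyword parsing) is an assumption for illustration only, not the JoyAI-Image API: the reasoning model is stood in for by a toy parser that decomposes an instruction into grounded operations, and the diffusion stage by a stub that consumes them.

```python
from dataclasses import dataclass, field

# Hypothetical illustration only -- none of these names come from the
# actual JoyAI-Image interface. The point is the shape of the hand-off:
# instruction -> grounded edit operations -> pixel-level stage.

@dataclass
class EditOp:
    """One grounded spatial operation passed between the two stages."""
    kind: str                       # e.g. "move", "rotate", "camera"
    target: str                     # object or region the op is grounded to
    params: dict = field(default_factory=dict)

def parse_instruction(instruction: str) -> list[EditOp]:
    """Toy stand-in for the 8B reasoning stage: decompose an
    instruction into region-grounded operations."""
    text = instruction.lower()
    ops = []
    if "move" in text:
        ops.append(EditOp("move", target="cup", params={"dx": 40, "dy": 0}))
    if "rotate" in text:
        ops.append(EditOp("rotate", target="cup", params={"degrees": 90}))
    return ops

def apply_ops(ops: list[EditOp]) -> list[str]:
    """Toy stand-in for the 16B generation stage: acknowledge each op
    instead of actually rendering pixels."""
    return [f"{op.kind}:{op.target}" for op in ops]

ops = parse_instruction("Move the cup to the right, then rotate it.")
print(apply_ops(ops))
```

The design point the sketch tries to capture is that spatial intent survives the hand-off as structured data (kind, target, parameters) rather than being flattened back into a free-text prompt.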

Capabilities

Spatial reasoning
Instruction decomposition, relational grounding, and region-level understanding let edits follow the actual structure of the image.

Spatial editing
Object movement, object rotation, and camera control are treated as grounded spatial operations, not vague style prompts.
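"Grounded" here means an edit is expressed against an explicit region rather than as a style prompt. A minimal sketch of what that implies geometrically, with purely illustrative helpers (box format and function names are assumptions, not the JoyAI-Image-Edit API):

```python
import math

# Illustrative only: a grounded "object move" is a transform of an
# explicit region, here an (x0, y0, x1, y1) box in pixel coordinates.

def move_box(box, dx, dy):
    """Translate a region by (dx, dy) -- the geometric core of a
    grounded object-move operation."""
    x0, y0, x1, y1 = box
    return (x0 + dx, y0 + dy, x1 + dx, y1 + dy)

def rotate_point(p, center, degrees):
    """Rotate a point about a center; used to re-ground a region's
    corners after an object-rotation operation."""
    theta = math.radians(degrees)
    x, y = p[0] - center[0], p[1] - center[1]
    return (center[0] + x * math.cos(theta) - y * math.sin(theta),
            center[1] + x * math.sin(theta) + y * math.cos(theta))

print(move_box((10, 10, 50, 50), dx=40, dy=0))  # region shifted right
```

Because the operation carries explicit coordinates, a follow-up instruction ("now rotate it") can refer to the same region instead of re-describing it in words.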

Long-text rendering
JoyAI-Image-Edit is designed for multilingual text, structured poster layouts, and scenes where typography is part of the composition.
System strengths
The interface stays quiet on purpose. The proof comes from the research visuals, the architecture, and the operational claims.
Awakened spatial intelligence
Scene parsing and spatial relations stay attached to the edit pipeline, so language can refer to structure precisely.
Unified understanding + generation
An 8B MLLM and 16B MMDiT operate through one shared interface instead of behaving like disconnected tools.
Controllable visual transformations
Camera movement, object manipulation, and multi-view consistency are built into the model family from the start.
Research-driven training data
OpenSpatial, SpatialEdit, and long-text rendering data push the model toward grounded image editing rather than generic generation.
Training recipe
OpenSpatial supports scene reasoning, SpatialEdit sharpens instruction-guided manipulation, and long-text rendering data increases layout fidelity in image generation and editing tasks.

| Model | Primary task | Status |
|---|---|---|
| JoyAI-Image-Und | Multimodal understanding | Released |
| JoyAI-Image-Edit | Instruction-guided editing | Released |
| JoyAI-Image-Edit-Plus | Multi-image editing | Upcoming |
| JoyAI-Image | Text-to-image generation | Upcoming |
Waiting list
This site is meant to be a clear product signal, not a half-ready editor. Leave your email and we will contact you when the experience is available.
Request access