JoyAI-Image spatial editing showcase

JoyAI-Image-Edit / JD Open Source

Spatially-aware image editing, built for controllable change.

JoyAI-Image-Edit unifies image understanding and generation so that object movement, camera control, long-text rendering, and scene consistency can all be handled as grounded transformations.

Current status

The product surface is not open yet. Join the waiting list to get first access when the editing experience is ready.

8B multimodal reasoning model

16B diffusion transformer for pixel-level generation

Unified interface for understanding, generation, and editing

Project

JD Open Source

Checkpoint

JoyAI-Image-Edit

License

Apache 2.0

Core architecture

A dual-model stack where reasoning and generation are part of one editing system.

JoyAI-Image couples an 8B multimodal language model with a 16B multimodal diffusion transformer. The shared interface keeps spatial intent attached as the pipeline moves from instruction parsing to image manipulation.
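One way to picture this split of responsibilities is as a two-stage pipeline: the reasoning model turns an instruction into a structured, spatially grounded plan, and the diffusion transformer renders pixels conditioned on that plan. The class and function names below are illustrative assumptions for this sketch, not the released API:

```python
from dataclasses import dataclass, field

@dataclass
class EditPlan:
    """Grounded edit intent produced by the reasoning stage (hypothetical schema)."""
    operation: str                 # e.g. "move", "rotate", "camera_pan"
    target_region: tuple           # bounding box (x0, y0, x1, y1) of the object
    parameters: dict = field(default_factory=dict)  # operation-specific arguments

def parse_instruction(instruction: str) -> EditPlan:
    # Stand-in for the 8B multimodal reasoning model: map language to a
    # structured plan whose spatial intent survives into the next stage.
    # A real system would ground the region against the actual image.
    return EditPlan(operation="move",
                    target_region=(120, 80, 300, 240),
                    parameters={"dx": 50, "dy": -20})

def apply_edit(image, plan: EditPlan):
    # Stand-in for the 16B diffusion transformer: render pixels conditioned
    # on the grounded plan rather than on raw text alone.
    ...
```

The key design point is that the `EditPlan` interface carries spatial intent between the two models, so the generator never sees an instruction stripped of its grounding.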

JoyAI-Image architecture diagram

Capabilities

Research assets that show what the model is actually good at.

Spatial reasoning

Understand the scene before touching a single pixel.

Instruction decomposition, relational grounding, and region-level understanding let edits follow the actual structure of the image.

Spatial editing

Move, rotate, and reframe with explicit control.

Object movement, object rotation, and camera control are treated as grounded spatial operations, not vague style prompts.
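The difference between a vague style prompt and a grounded spatial operation is easiest to see as data. The field names below are hypothetical, chosen purely to illustrate the contrast:

```python
# A vague style prompt leaves the model to guess what to change and where.
style_prompt = "make the car look like it moved to the left"

# A grounded spatial operation names the object, its region, and the exact
# transform, so the edit can be checked against the image's structure
# (hypothetical schema).
grounded_edit = {
    "operation": "object_move",
    "target": {"label": "car", "bbox": [410, 220, 760, 430]},
    "transform": {"dx": -150, "dy": 0},
    "preserve": ["background", "lighting", "shadow_direction"],
}
```

Because the transform is explicit, a leftward move of 150 pixels is a verifiable fact about the edit rather than something inferred from wording.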

Long-text rendering

Handle layout-heavy images and dense typography.

JoyAI-Image-Edit is designed for multilingual text, structured poster layouts, and scenes where typography is part of the composition.

System strengths

The differentiation is controllability, not visual noise.

The interface stays quiet on purpose. The proof comes from the research visuals, the architecture, and the operational claims.

Awakened spatial intelligence

Scene parsing and spatial relations stay attached to the edit pipeline, so language can refer to structure precisely.

Unified understanding + generation

An 8B MLLM and 16B MMDiT operate through one shared interface instead of behaving like disconnected tools.

Controllable visual transformations

Camera movement, object manipulation, and multi-view consistency are built into the model family from the start.

Research-driven training data

OpenSpatial, SpatialEdit, and long-text rendering data push the model toward grounded image editing rather than generic generation.

Training recipe

Spatial understanding is trained in before it is demonstrated in editing.

OpenSpatial supports scene reasoning, SpatialEdit sharpens instruction-guided manipulation, and long-text rendering data increases layout fidelity in image generation and editing tasks.

JoyAI-Image capability radar

Model                    Primary task                  Status
JoyAI-Image-Und          Multimodal understanding      Released
JoyAI-Image-Edit         Instruction-guided editing    Released
JoyAI-Image-Edit-Plus    Multi-image editing           Upcoming
JoyAI-Image              Text-to-image generation      Upcoming

Waiting list

Get notified when JoyAI-Image-Edit opens access.

The current site is meant to be a clear product signal, not a half-ready editor. Leave your email and we will contact you when the experience is available.

Request access