OmniGPT Draw
OmniGPT Draw combines best practices from DALL-E 2 and Latent Diffusion and adds ideas of its own. It uses the CLIP model to encode both text and images, and a diffusion image prior to map between the latent spaces of the two CLIP modalities. This approach improves the model's visual performance and enables new ways of blending and manipulating images through text. The diffusion mapping between the latent spaces is performed by a transformer with 20 layers, 32 heads, and a hidden size of 2048.
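To make the transformer dimensions above more concrete, here is a minimal Python sketch that derives the per-head width and a rough weight count from those three numbers. The class name `PriorTransformerConfig` and its methods are hypothetical and not part of any OmniGPT API. The MLP expansion ratio of 4 is an assumption borrowed from common transformer designs, since the text does not state it. Biases, layer norms, and embeddings are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriorTransformerConfig:
    """Hypothetical config holder; only the first three defaults come from the text."""

    num_layers: int = 20
    num_heads: int = 32
    hidden_size: int = 2048
    mlp_ratio: int = 4  # assumption: not stated in the text

    @property
    def head_dim(self) -> int:
        # Multi-head attention splits the hidden size evenly across heads.
        if self.hidden_size % self.num_heads != 0:
            raise ValueError("hidden_size must be divisible by num_heads")
        return self.hidden_size // self.num_heads

    def approx_block_params(self) -> int:
        # Weight matrices only: biases and layer norms are ignored.
        d = self.hidden_size
        attention = 4 * d * d             # Q, K, V, and output projections
        mlp = 2 * self.mlp_ratio * d * d  # up and down projections
        return attention + mlp

    def approx_total_params(self) -> int:
        # All blocks together; embeddings and output heads are not counted.
        return self.num_layers * self.approx_block_params()


config = PriorTransformerConfig()
print(config.head_dim)               # 64
print(config.approx_total_params())  # 1006632960
```

Under these assumptions each attention head works on 64 dimensions, and the 20 blocks together hold roughly one billion weights. The real figure depends on details the text does not give.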