After taking the summer by storm with a series of powerful, freely available open source language and coding AI models that matched, and in some cases beat, closed/proprietary rivals, the crack AI research group Qwen Team returns today with the release of a new AI image generator model, also open source.

Qwen-Image stands out in the crowded field of generative image models because of its emphasis on accurate text rendering as part of the visual output, an area where many rivals still struggle.

By supporting both alphabetic and logographic scripts, the model is particularly adept at handling complex typography, multi-line layouts, paragraph-level semantics and bilingual content (e.g. English-Chinese).

In practice, this lets users generate content such as movie posters, presentation slides, storefront scenes, handwritten poetry and stylized infographics, all with clear text that follows their prompts.
Qwen-Image's example use cases span a wide range of real-world applications:

- Marketing and branding: Bilingual posters with brand logos, stylized calligraphy and consistent design motifs
- Presentation design: Layout-aware slide decks with title hierarchies and theme-appropriate visuals
- Education: Classroom materials containing diagrams and precisely rendered instructional text
- Retail and e-commerce: Storefront scenes where product labels, signage and environmental context must stay legible
- Creative content: Handwritten poetry, narrative scenes, anime-style illustrations with embedded text
Users can interact with the model on the Qwen Chat website by selecting the "Image Generation" mode from the buttons below the prompt field.

However, my brief initial tests revealed that its text rendering and prompt adherence were not noticeably better than Midjourney, a popular proprietary AI image generator from the U.S. company of the same name. My session on Qwen Chat produced many errors in prompt understanding and text fidelity, to my disappointment, even after repeated attempts and re-prompting:


However, Midjourney offers only a limited number of free generations and requires a paid subscription beyond that, whereas Qwen-Image, thanks to its open source licensing and weights posted on Hugging Face, can be adopted by any enterprise or third-party provider free of charge.
Licensing and availability
Qwen-Image is distributed under the Apache 2.0 license, permitting commercial and non-commercial use, redistribution and modification, though attribution and inclusion of the license text are required for derivative works.

This may appeal to enterprises looking for open source image generation tools to create internal or external assets such as flyers, ads, newsletters and other digital communications.

But the fact that the model's training data remains a closely guarded secret, as with most other leading AI image generators, may give some enterprises pause about using it.

Unlike Adobe Firefly or OpenAI's native GPT-4o image generation, for example, Qwen does not offer indemnification for commercial use of its product (that is, a pledge to defend users in court if they are sued for copyright infringement over generated output, as Adobe and OpenAI do).

The model and related resources, including demonstration notebooks, evaluation tools and fine-tuning scripts, are available through multiple repositories:

In addition, a live evaluation portal called AI Arena lets users compare image generations in pairwise rounds, contributing to a public Elo-style leaderboard.
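AI Arena's exact rating formula is not published, but Elo-style leaderboards of this kind typically use the standard Elo update after each pairwise vote. A minimal sketch, with an illustrative K-factor of 32 (an assumption, not AI Arena's documented value):

```python
def elo_update(r_winner: float, r_loser: float, k: float = 32.0) -> tuple[float, float]:
    """Standard Elo update after one pairwise comparison.

    The winner's expected score is derived from the rating gap;
    both ratings move by at most k points per match.
    """
    expected_winner = 1.0 / (1.0 + 10 ** ((r_loser - r_winner) / 400.0))
    delta = k * (1.0 - expected_winner)
    return r_winner + delta, r_loser - delta

# Two models start at 1200; model A wins one head-to-head round.
a, b = elo_update(1200.0, 1200.0)  # a rises to 1216, b falls to 1184
```

Repeated over thousands of human votes, these per-match updates converge toward a stable ranking like the one Qwen-Image appears on.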
Training and development
Behind Qwen-Image's performance lies an extensive training process built on progressive learning, multimodal task alignment and aggressive data curation, according to the technical paper the research team published today.

The training corpus includes billions of image-text pairs drawn from four domains: natural images, human portraits, artistic and design content (such as posters and user interfaces) and synthetic text data. The Qwen team did not specify the size of the training corpus beyond "billions of image-text pairs," but it did provide an approximate percentage breakdown by content category:
- Nature: ~55%
- Design (UI, posters, art): ~27%
- People (portraits, human activities): ~13%
- Synthetic text rendering data: ~5%
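A data loader could reproduce this mix by sampling domains with the reported weights. A minimal sketch, using only the rounded percentages above (Qwen's actual sampling pipeline is not public):

```python
import random

# Approximate category mix reported by the Qwen team.
CATEGORY_WEIGHTS = {
    "nature": 0.55,
    "design": 0.27,
    "people": 0.13,
    "synthetic_text": 0.05,
}

def sample_category(rng: random.Random) -> str:
    """Draw one training-data category according to the reported mix."""
    categories = list(CATEGORY_WEIGHTS)
    weights = [CATEGORY_WEIGHTS[c] for c in categories]
    return rng.choices(categories, weights=weights, k=1)[0]

rng = random.Random(0)
draws = [sample_category(rng) for _ in range(10_000)]
share_nature = draws.count("nature") / len(draws)  # close to 0.55
```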
Notably, Qwen emphasizes that all synthetic data was generated in-house and that no images created by other AI models were used. Despite the detailed curation and filtering stages, the documentation does not clarify whether any of the data was licensed or drawn from public or proprietary datasets.

Unlike many generative models that exclude synthetic text because of the risk of noise, Qwen-Image uses tightly controlled synthetic pipelines to improve character coverage, especially for low-frequency Chinese characters.

A curriculum-style approach is used: the model starts with simple captioned images and non-text content, then advances to layout-sensitive text scenarios, mixed-language rendering and dense paragraphs. This gradual exposure has been shown to help the model generalize across scripts and formatting types.
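The staged progression described above can be sketched as a simple schedule that maps training progress to a curriculum stage. The stage names follow the paper's description; the equal stage durations are an illustrative assumption, since the real schedule is not published:

```python
# Curriculum stages in the order the paper describes them.
STAGES = [
    "simple_captions",        # non-text and simply captioned images
    "layout_sensitive_text",  # text whose rendering depends on layout
    "mixed_language",         # e.g. English-Chinese rendering
    "dense_paragraphs",       # paragraph-level text rendering
]

def stage_for_progress(progress: float) -> str:
    """Map training progress in [0, 1] to a curriculum stage.

    Stages get equal shares of training here purely for illustration.
    """
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must be in [0, 1]")
    index = min(int(progress * len(STAGES)), len(STAGES) - 1)
    return STAGES[index]
```

In practice, a scheduler like this would gate which data buckets the loader draws from at each step, so hard text-rendering cases only appear once the model handles simpler content.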
Qwen-Image integrates three key modules:

- Qwen2.5-VL, a multimodal language model, extracts contextual meaning and guides generation through system prompts.
- A VAE encoder/decoder, trained on high-resolution documents and real-world layouts, supports detailed visual representations, especially small or dense text.
- MMDiT, the diffusion model backbone, coordinates joint learning between image and text modalities. A novel MSRoPE (Multimodal Scalable Rotary Position Encoding) system improves spatial alignment between tokens.

Together, these components make Qwen-Image effective across image understanding, generation and precise editing tasks.
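The paper's MSRoPE scheme is not reproduced here, but the rotary position encoding (RoPE) it builds on can be illustrated in a few lines: each pair of feature channels is rotated by a position-dependent angle, so attention dot products end up depending on relative rather than absolute token positions. A minimal illustrative sketch, not Qwen's implementation:

```python
import math

def rotate_pair(x: float, y: float, position: int,
                dim_index: int, dim: int) -> tuple[float, float]:
    """Rotate one (x, y) channel pair by a position-dependent angle.

    This is the core of rotary position embedding: the angle is
    position * theta, where theta shrinks with channel index. MSRoPE,
    per the paper, extends this idea to align image and text tokens.
    """
    theta = position * (10000 ** (-2 * dim_index / dim))
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return x * cos_t - y * sin_t, x * sin_t + y * cos_t

# Rotation preserves each pair's norm, so encoding positions this way
# never changes a token's feature magnitudes.
x, y = rotate_pair(1.0, 0.0, position=5, dim_index=0, dim=64)
norm = math.hypot(x, y)
```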
Benchmark performance
Qwen-Image was evaluated against several public benchmarks:

- GenEval and DPG for prompt adherence and object-attribute consistency
- OneIG-Bench and TIIF for compositional reasoning and layout fidelity
- CVTG-2K, ChineseWord and LongText-Bench for text rendering, especially in multilingual contexts
In nearly every case, Qwen-Image matches or exceeds existing closed models such as GPT Image 1 [High], Seedream 3.0 and FLUX.1 Kontext [Pro]. Notably, its performance on Chinese text rendering was far stronger than every compared system.
On AI Arena's public leaderboard, based on more than 10,000 pairwise human comparisons, Qwen-Image ranks third overall and is the top open source model.
Implications for enterprise technical decision-makers

For enterprise AI teams managing complex multimodal workflows, Qwen-Image introduces several practical benefits aligned with the operational needs of different roles.

Those managing the lifecycle of vision-language models, from training through deployment, will find value in Qwen-Image's consistent output quality and integration-ready components. Its open source nature removes licensing costs, while the modular architecture (Qwen2.5-VL + VAE + MMDiT) makes it easier to adapt to custom datasets or fine-tune for domain-specific results.

The curriculum-style training regimen and published benchmark results help teams assess fitness for purpose. Whether producing marketing visuals, document renderings or e-commerce product graphics, Qwen-Image enables rapid experimentation without proprietary restrictions.

Engineers tasked with building AI pipelines or deploying models across distributed systems will appreciate the detailed infrastructure documentation. The model was trained on a producer-consumer architecture, supports scalable multi-resolution processing (256p to 1328p) and is built to work with Megatron-LM and tensor parallelism, making Qwen-Image a candidate for deployment in hybrid cloud environments where reliability and throughput matter.
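The producer-consumer layout the paper describes, with data workers feeding trainers through a shared queue, can be sketched with Python's standard library. The resolution buckets below are illustrative, chosen only to reflect the reported 256p to 1328p range:

```python
import queue
import threading

RESOLUTIONS = [256, 512, 1024, 1328]  # illustrative buckets in the reported range

def producer(q: queue.Queue, n_batches: int) -> None:
    """Simulate data workers emitting (batch_id, resolution) work items."""
    for i in range(n_batches):
        q.put((i, RESOLUTIONS[i % len(RESOLUTIONS)]))
    q.put(None)  # sentinel: no more work

def consumer(q: queue.Queue, processed: list) -> None:
    """Simulate a trainer consuming batches until the sentinel arrives."""
    while True:
        item = q.get()
        if item is None:
            break
        processed.append(item)

q = queue.Queue(maxsize=4)  # bounded queue applies backpressure on producers
processed: list = []
t_prod = threading.Thread(target=producer, args=(q, 8))
t_cons = threading.Thread(target=consumer, args=(q, processed))
t_prod.start(); t_cons.start()
t_prod.join(); t_cons.join()
```

The bounded queue is the key design point: data workers block when trainers fall behind, which keeps memory use flat regardless of how fast either side runs.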
In addition, support for text-image-to-image (TI2I) editing workflows and task-specific prompts makes it suitable for real-time or interactive applications.
Specialists focused on data ingestion, validation and transformation can use Qwen-Image to generate synthetic datasets for training or augmenting computer vision models. Its ability to produce high-resolution images with embedded, multilingual annotations can improve performance on OCR, object detection or layout-parsing tasks.

Because Qwen-Image was also trained to avoid artifacts such as QR codes, distorted text and watermarks, it offers higher synthetic data quality than many public models, helping enterprise teams maintain training-set integrity.
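A synthetic OCR-style dataset built around such a generator typically pairs each rendered string with a ground-truth bounding box. A minimal sketch of the annotation side, purely illustrative (Qwen's in-house synthetic pipeline is not public, and the phrases and canvas size are made up):

```python
import random

def make_synthetic_sample(rng: random.Random,
                          canvas: tuple[int, int] = (1024, 1024)) -> dict:
    """Produce one annotation record: a text string plus a bounding
    box guaranteed to lie fully inside the canvas."""
    phrases = ["GRAND OPENING", "50% OFF", "欢迎光临", "Open 24 Hours"]
    text = rng.choice(phrases)
    w, h = rng.randint(100, 400), rng.randint(32, 96)
    x = rng.randint(0, canvas[0] - w)  # box cannot overrun the right edge
    y = rng.randint(0, canvas[1] - h)  # box cannot overrun the bottom edge
    return {"text": text, "bbox": (x, y, w, h)}

rng = random.Random(42)
samples = [make_synthetic_sample(rng) for _ in range(100)]
```

An image generator like Qwen-Image would then render each record's text into its box, giving perfectly labeled training pairs for OCR or detection models.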
Seeking feedback and collaboration opportunities

The Qwen team emphasizes openness and community collaboration in the model's release.

Developers are encouraged to test and adapt Qwen-Image, submit pull requests and participate in the evaluation leaderboard. Feedback on text rendering, editing fidelity and multilingual use cases will shape future iterations.

The team hopes Qwen-Image will serve not just to "lower the technical barriers of visual content creation," but as a foundation for further research and practical deployment across industries.
