An AI transforms words into fully photorealistic images!

If DALL-E impressed you, wait until you see what Imagen is capable of.

Last April, we introduced you to the latest version of DALL-E, an incredible AI-based text-to-image generator. The system has proven to be astoundingly powerful, versatile and accurate; but Google may have just relegated it to the second division with Imagen, a new generator of the same type that achieves feats we have only very rarely seen from a computer.

“Any sufficiently advanced technology is indistinguishable from magic,” wrote the legendary Arthur C. Clarke; a saying that has aged particularly well in the age of AI. After all, what other description would be more fitting for a program capable of such artistic prowess?

As with OpenAI’s DALL-E, Imagen works on a disarmingly simple concept: you give it a sentence written in plain language, and the program spits out an image that perfectly matches the caption in question, at least in theory. And while DALL-E had particularly impressed us (see our article), we must admit that it has met its match; browsing the Imagen gallery is enough to leave you floored.

Indeed, Google’s program has managed the feat of outdoing the incredible DALL-E at almost every level. Whether in terms of precision, versatility, interpretation of the sentence or even consistency of the result, it is a real digital tour de force on the part of the firm.

In these images, we see that Imagen has managed to interpret the scenarios, however wildly fanciful they may be. And it’s not just about understanding the instructions; the final composition is also flawlessly consistent in each of these examples.

There’s also plenty to be impressed by in terms of pure computer graphics, regardless of the fact that it comes from an AI. Special mention goes to the handling of shadows and light, which is simply breathtaking. The result is particularly impressive in the subtle reflections on the duck and the marbles, scenarios that are traditionally not easy to handle.

All subjects, in all styles, and with impeccable technique

And Imagen does not only know how to represent real objects; it can also give birth to the most abstract compositions, without sacrificing the coherence of the final result. The objects below could all have been made by a talented 3D artist, so thorough is the attention to detail. The choice of colors is also devilishly effective and greatly contributes to the visual impact of these compositions.

Juggling styles is no problem for Google’s AI either. This is evident in the examples below, which all feature objects with a very pronounced cartoon identity. Seeing the result, one could almost begin to imagine the first animated films produced entirely by AI!

The most impressive point is also probably the most subtle. When we look at the images below, we see that Imagen is not only good at producing an evocative and immediately identifiable image; it also seems to have a fine understanding of many abstract patterns and concepts, and even of the basic rules of photo composition. Mind-blowing.

The most advanced system of its type to date

To assess the performance of Imagen more accurately, Google has devised a test called DrawBench. The concept is very simple: several systems of this kind are asked to produce images from the same sentences, then humans are asked which results are the most successful. And at this little game, Imagen simply trampled the competition, including poor DALL-E, as evidenced by the graph below.

©Google
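
For the curious, the kind of comparison described above can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions: the function name `preference_rates`, the vote format and the placeholder system names are all hypothetical, and the votes are made up; none of this reflects Google’s actual protocol or results.

```python
from collections import Counter


def preference_rates(votes):
    """Return the share of human votes each system received.

    `votes` is a list of system names, one entry per human judgment.
    This format is an assumption made for illustration only.
    """
    counts = Counter(votes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {system: n / total for system, n in counts.items()}


# Made-up votes with placeholder names; not real DrawBench data.
example_votes = ["system_a", "system_b", "system_a", "system_a"]
rates = preference_rates(example_votes)
```

The idea is simply that each human judgment counts as one vote, and the system with the highest share of votes comes out on top.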

Unfortunately, like OpenAI, Google does not offer open access to its system. The reasoning is the same: as it stands, this technology is a powerful tool that malicious actors could use to drive disinformation on a large scale.

In addition, Google wants to avoid very embarrassing scenarios. To train an AI on so many scenarios, the researchers could not just give it a mouthful; they had to force-feed it a huge amount of data, most of which was harvested from the Internet without supervision or prior validation.

A magic wand not to put in all hands

This means that the AI could also have swallowed the worst of the human species; it is therefore not out of the question that, if offered a slightly contentious caption, it will produce results that would make a sane human cry out in horror, such as content with racist elements, pornography or extreme violence. “Garbage in, garbage out,” as the specialists say.

Note, moreover, that the system has already spat out some atrocities; it is no coincidence that Google presents only a restricted sample of images. They have been carefully selected beforehand to leave no room for improvisation. It is very important to keep in mind that this is essentially a “best of” and not a representative sample of the system’s image production as a whole.

Still, there is plenty to be impressed by in this fabulous program. Google is once again showing that it is one of the world’s AI heavyweights, and this fun app is just the tip of the iceberg.

Recall that its affiliate DeepMind, which recently snatched a luminary of the discipline from Apple, is working on the development of a so-called “generalist” or “strong” AI, capable of competing with humans (see our article). We are still very far from seeing the first system of this kind appear; but in the meantime, there is no doubt that the road will be littered with works that are both technically very impressive and playful and entertaining, like Imagen. We are decidedly living in a fascinating era!
