Stable Diffusion: First experiments with open source text-to-image model

The AI Apocalypse is coming! The end of us humans is upon us! Hide your wives! Hide your daughters! The machines have won; we are no longer needed! Sobs...

Nah, not even close.

So, StabilityAI is a research company that aims to democratize the recent developments in natural language processing and image-generation AI. They released a neat open-source model called Stable Diffusion, a worthy contender to the popular closed-source Midjourney and OpenAI's DALL-E 2, technologies that have shocked the design and illustration community. These developments are less than a year old.

I'll discuss the history and trajectory of the technology in a later piece. For now, it's time to showcase some of my experiments in generating never-before-seen images using the skill of the future: prompt engineering.

prompt: an elderly black male chilling in the sun, realistic
prompt: bomber jacket coloured black, assassin's creed hood, red superman cape at the back
prompt: hi-top sneakers, blue and red colors, yellow highlights, straps and mandala decals plastered
prompt: a concept car shaped like a donut
prompt: a rabbit dressed in a tracksuit sitting outside a coffee shop, sipping a cup of cold coffee, studio ghibli style, dull colours
prompt: one hungry shark dressed in a suit, sitting in an office, glitters and lots of gold, hyperrealist style
prompt: cape town as a futuristic cyberpunk world; realistic render; high rising buildings, neon lights
prompt: cape town as a futuristic cyberpunk world; table mountain visible, realistic render; high rising buildings, neon lights
prompt: meerkat dressed as a mafia boss, grand theft auto v cover style

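If you want to try these yourself, here is a minimal sketch of how a prompt like the ones above can be run against the open-source weights. I'm assuming the Hugging Face diffusers library and the public v1.4 checkpoint here; the step count, guidance scale, and seed are illustrative defaults, not my exact settings.

```python
# Minimal text-to-image sketch using Hugging Face diffusers.
# Assumes: `pip install diffusers transformers torch` and a CUDA GPU.
import torch
from diffusers import StableDiffusionPipeline

# The original open-source Stable Diffusion v1.4 checkpoint.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

prompt = "a concept car shaped like a donut"

# A fixed seed makes a run reproducible.
generator = torch.Generator("cuda").manual_seed(42)
image = pipe(
    prompt,
    num_inference_steps=50,  # illustrative defaults, not my exact settings
    guidance_scale=7.5,
    generator=generator,
).images[0]

image.save("donut_car.png")
```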
These are images I generated in fewer than three iterations per prompt, using the same settings for each image. I took the best of the three images for each prompt, just to show the skill and curation needed in prompt design and image generation; it could take many more iterations to get cleaner results. Still, I must say I'm quite happy with these quick attempts, and with more tweaking of the controls I could get better results. Some objects are a bit distorted, text in the final image is certainly an issue, and the face of the old man could use a run through a face-fixing GAN model. You will notice that my Cape Town images needed Table Mountain to be specified, and the difference between the two images is quite stark despite changing only one aspect. It remains to be seen how much consistency of style and form I can get across multiple variations. There is always Photoshop available to touch up an image. This guide looks like the prompt design bible.
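That "best of three" workflow is easy to script on top of the sketch above: keep the pipeline settings fixed, vary only the seed, save everything, and curate by eye. The seeds and filenames below are arbitrary.

```python
# Best-of-three curation: same prompt, same settings, different seeds.
# Continues from the `pipe` set up in the previous sketch.
prompts = [
    "an elderly black male chilling in the sun, realistic",
    "meerkat dressed as a mafia boss, grand theft auto v cover style",
]

for prompt in prompts:
    for seed in (1, 2, 3):  # three iterations per prompt
        generator = torch.Generator("cuda").manual_seed(seed)
        image = pipe(prompt, guidance_scale=7.5, generator=generator).images[0]
        # Save every candidate; the curation step is just picking by eye.
        image.save(f"{prompt[:30].replace(' ', '_')}_seed{seed}.png")
```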

But WOW! The concept art you can push out of this bad boy is crazy, and I have only scratched the surface. Look at some other works with various text-to-image technologies: here and here. I can see this eating the stock-photo market's lunch. It's another tool in the designer's toolbox, like a paintbrush that gets you a really good starting point, a great concept builder that you can throw more layers on top of. I already did fashion design, automobile design, album/game artwork, and even made a fake person! Logos next. The possibilities are endless. The debate will continue over the meaning of art, the ethics involved, and the even greater developments to come. Regardless, those on the frontier get to shape this world.