Mike's Picture Of The Week

August 28th, 2022

At work I got to see some internal presentations about "Imagen", Google's AI image generator. Unfortunately, at the time Imagen was limited to business use only, so I couldn't try it out for myself. However, I learned that an open source AI image generator called "Stable Diffusion" was being released by a company called "Stability AI".

Stable Diffusion uses techniques similar to the ones that really impressed me in Google's Imagen, and even better: Stable Diffusion can run on a consumer video card and doesn't require expensive cloud GPUs or TPUs.

After I downloaded it, I wanted to see how difficult or easy it would be to have it generate an image that I had in my head. The first thing that came to mind was a childhood memory: We were sitting around in the dining room at the cottage in Ompah discussing photography and art, specifically thinking about the question "when is a photograph art, and when is it just a photograph?"

Just as we were discussing this, my cat climbed into an old pedestal sink that was lying sideways on the dining room floor and sat perfectly upright. The way the cat was framed by the sink just made everyone in the room gasp, and we all agreed that if we had a camera handy, a photograph of that scene would be a "work of art". We did not have a camera handy, and the cat moved on after a few seconds, so all I have is the memory of the scene burned into my mind.

So, I decided to try to have Stable Diffusion recreate the scene. I ran into my first stumbling block right away: Stable Diffusion apparently hasn't been trained on sideways pedestal sinks. No matter how nicely you ask it, the sinks it generates are always upright. The second stumbling block is that while Stable Diffusion seems to be fairly good at human faces (and when it fails, there are AI GANs available that are specifically trained to "fix" distorted human faces), it has not been trained very well on cat faces.

After a bunch of prompting, drawing sketches in paint for inspiration, and general messing around, I got the picture on the left. Not really what I had in my mind, but... close?

For the second image, I wanted "a brave knight being burned by a dragon breathing fire". After some experimenting, I found that I had to generate the burning knight in one pass, generate the dragon in another pass, photoshop them together, and then run the photoshopped image through Stable Diffusion again to make one coherent image. That seems to be the best way to do things if you want more than one subject in your image.

Well, that is a lot of text. It is a long way of saying that I'm having fun playing with Stable Diffusion, and if anyone wants me to try generating an image for them, let me know and I'll see what it can do.

Technical Details: These images were created using Stable Diffusion.
