November 21, 2024

AI slop is stylistically schizo and contradictory, human art is coherent and unified in style

A recent post from Scott Alexander on AI slop vs. human art asked respondents to guess whether certain images were made by AI or a person. If AI could fool enough respondents, was it not real art, a la the Turing test for judging whether a machine was intelligent?

Well, they did not fool respondents -- the average and median scores were both 60% correct, compared to 50% if they had flipped a coin. We just had an election -- 60 to 40 is not a close election. This contradicts the title of section 1, which says people had a hard time identifying AI art, apparently just cuz the score wasn't near 100% correct. Having a hard time would mean they did no better than a coin-flip.

And he admits that he put a massive thumb on the scale by screening out AI images that had the telltale signs of being AI -- e.g., disfigured hands and fingers, garbled text, "wrestling" poses where there's a lot of interaction between two bodies, and the entire style of the recognizable DALL-E model. Which is to say, the telltale signs that are already known about. This experiment revealed, to me at least, other telltale signs, but more on that later. Even with this thumb on the scale, people were still not fooled.

Of course, when it comes to computer models, the question is not can a computer program do something, but how much complexity does it cost, and how good is the output? If it takes a 137-degree polynomial to connect the dots between 7 data points, that's over-fitting the data. How many prompts, with what degree of specificity, does an AI generator require before it gives sufficiently passable results -- plural, as in reliably replicated, not just a fluke success?
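
To make the over-fitting point concrete, here's a minimal sketch in Python, using toy numbers made up for illustration (not data from the survey). It assumes 7 points that roughly follow y = x, and compares a plain least-squares line against the degree-6 polynomial that threads all 7 points exactly (degree 6 is already enough for 7 points, never mind 137). The helper names `lagrange_interpolate` and `fit_line` are hypothetical, just for this example.

```python
def lagrange_interpolate(xs, ys, x):
    """Evaluate the unique polynomial of degree <= len(xs) - 1
    passing through every (xs[i], ys[i]), at the point x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up data (an assumption for illustration): roughly y = x, with some wobble.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0.2, 0.8, 2.3, 2.7, 4.2, 4.9, 6.1]

slope, intercept = fit_line(xs, ys)

# Worst-case miss on the 7 points each model was fitted to.
poly_train_err = max(abs(lagrange_interpolate(xs, ys, x) - y) for x, y in zip(xs, ys))
line_train_err = max(abs(slope * x + intercept - y) for x, y in zip(xs, ys))

# One step outside the data, where the underlying trend says "about 7".
poly_at_7 = lagrange_interpolate(xs, ys, 7.0)
line_at_7 = slope * 7.0 + intercept

print("worst error on the 7 points: polynomial", poly_train_err, "line", line_train_err)
print("prediction at x = 7: polynomial", poly_at_7, "line", line_at_7)
```

The polynomial wins on the 7 points it was tuned to, then goes off the rails one step outside them, while the dumb straight line stays near the trend. That's the sense of over-fitting meant here: more complexity buys a perfect score on what you already had, not a better model.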

The more complexity that needs to be built into the model by these prompts, the less smart and talented it is. The real comparison is, how many prompts or constraints, and with what degree of specificity, would you have to tell a human artist before they gave sufficiently passable results? Not many at all! The computer is a massive downgrade, if you're telling someone or something else what you have in mind.

The real fascination with AI slop is that the turnaround time for results is relatively fast, compared to the labor-intensive work of human hands. And so even though the results are slop, they're at least 20% real-ish, so you're OK with that trade-off -- crappy quality, but fast results for AI, instead of high quality results that take much longer for a person.

So then AI is not superior or equal to a person, it's a different point along a trade-off continuum, and nothing to gawk at as though it were a higher form of intelligence or existence than our own.

* * *


My main interest, however, is in further analyzing what gives away AI art, beyond the already well-known signs like mangled hands, garbled text, and interactive bodies that turn into Mr. Potato Head abominations.

Those are specific to the subject matter -- what is being portrayed. But scrolling through the images -- and I was not fooled by more than a few (more on why they can fool, later) -- I discovered a more fundamental and stylistic giveaway of AI art, which gets to its very nature, or perhaps lack of a nature, compared to human nature.

Namely, AI -- being a program without a mind or spirit of its own -- can easily be of two minds, even in contradictory ways (not just divergent), at a stylistic level. Not what subject matter is portrayed, but the manner in which it is portrayed. Human beings, possessing a single mind of their own, are of one mind about the manner in which they portray the subject matter.

Consider image 4, which I instantly felt was AI (and it is). There is a clear main subject, close to the viewer, and it's a person. I don't think it would matter if the subject were instead a non-human animal, plant, inanimate object like a boulder, or human artifact like a chair or door -- what matters is that something is the focus of attention, in the foreground, near the viewer. Then there's a background environment in which the subject is embedded -- not a portrait in a vacuum.

The subject being a person means it takes the form of a portrait, while the environment takes the form of a landscape.

Yet the styles of these two forms are different and contradictory. The landscape is Impressionist, although who cares exactly what period it's mimicking -- the point is, the level of detail is low-resolution, blurry, with blobs and patches and planes of color more than crisply delineated and complex shapes. This applies not only in the distance, where things are naturally more blurry, but right up in the foreground -- look at the flowers directly around the girl, their stalks look like single thick brushstrokes, and the petals are thick daubs of color. Low-detail, blurry.

Then all of a sudden, the girl in the portrait section is rendered in fairly high detail, in focus rather than blurred. It's not 100% photorealistic, but it's far more in that direction than the highly stylized rendering of detail for the landscape section. You can see multiple folds on the fabric of her clothes, with light / shading for sculpting purposes -- which is NOT used on the grass, flowers, dirt, trees, etc. in the landscape. You can make out individual wisps of hair on her head, each tiny curving line inside her ear (with shading-for-sculpting again), and so on.

This detailed focus gets more blurry and Impressionist as you look toward the bottom of her dress and shoes, and I notice in the other images that the trigger for photorealism seems to be a human face or other exposed parts of the human anatomy. So even just her dress -- which is a single garment, not a separate top and bottom -- looks schizo stylistically, with a more photorealist upper region and a blurry Impressionist bottom region, further from the trigger of exposed human anatomy.

The machine doesn't understand that a single self-contained work of art is supposed to be coherent and unified in style or presentation. It has clearly been trained on photorealistic portraits and Impressionist landscapes, one not-so-stylized and the other highly stylized. And so when asked to combine a portrait within a landscape, it figures why not combine the best of both worlds? -- a high-detail portrait in a landscape that is blurry immediately surrounding her, not to mention farther away as well.

This is not just shallow focus from photography or cinematography -- at the exact same distance from the "camera," there are simultaneously a sharp-focus object and blurry objects. That's not physically or technologically possible -- and could only be done by deliberate choice of the artist, in some warped form of artistic license.

But artists never use that license, cuz it violates the fundamental requirement to present the subject matter in a coherent unified style -- all blurry and Impressionist, or all sharp-focus and photorealistic, but not some of one and some of the other in the same work.

To give a pity point to the machine, it at least does the sensible contradiction instead of the wacko contradiction -- it renders subjects in sharp detail (as though we're giving them our attention), while leaving environments in blurry detail (as though they're in our peripheral vision, not as important), rather than an Impressionist portrait set within a photorealistic landscape (akin to animated figures superimposed on a photographed real-world environment, like Who Framed Roger Rabbit).

This schizo clash of styles within a single work is how I identified most of the other AI images.

Photorealist portrait in blurry landscape also told me the following were AI: 7, 10 (again, not a portrait of a person, but with a clear subject taking up much space), 13 (cartoon head, realistic water), 16 (the background being just a fairly uniform color plane), 21 (the environment's flowers are blurrier than the decorative flowers on her clothes, despite both being close to the camera), 23 (background looks like an Abstract Expressionist painting, and even within the mother's clothing, the colored pieces are blurrier than the white pieces), 27, 33, 40, 46, 49 (the wacko contradiction, where the close-up buildings are blurry while the distant water is in sharper focus)...

And the most insane is 26, whose subject looks like he was photographed under pristine studio conditions -- while the landscape outside the window is a highly stylized Venetian type portrayal. Is it supposed to be a painting within the artwork, hanging on the wall of this room? To me it looked like a landscape shown through a window, that ol' trick. There's what could be a decorative frame just below it, but not running up the left side of the landscape... so it's a bit schizo in its subject matter, but also in the style, with totally opposite styles for the landscape and the portrait.

Related, there are some whose subject matter is a bunch of abstract geometric shapes, with no 3D depth cues, no lighting variation, etc. -- and then a single human face or body, with multiple features (eye outline, iris, pupil, lips, individual teeth, etc.), sometimes with shading-for-sculpting. The dum-dum AI doesn't understand that a single work has to be entirely abstract or entirely representational. This gave away 6, 17, 24, and 50.

I could tell that 19 was by a person cuz although there are geometric shapes and a stylized human head, the geometric shapes are not separate abstract objects from the representational head -- they're used to form the lines around the head and its features, or to fill up volume within these features, suggest texture of the features, etc. They are building blocks to render a representational object -- not a separate array of abstract shapes, plus a representational head in their midst.

The Impressionist landscapes with no dominating subject are less obviously AI, cuz the contradictory rendering of subject and background cannot happen. Still, their subject matter or compositions look more like photographs, which were then passed through a blurry / stylized / Impressionist filter. The point-of-view, angle, perspective, cropping objects at the frame's edges, etc. Very photographic in composition, if not photorealistic in detail. And painters or illustrators rarely do this -- they create more of a staged array of figures or natural elements if there's no dominant subject.

This gave away 11, 20, 31, and 43. I could tell 22 was by a person cuz there's a semi-prominent human and plant subject, and they're both rendered Impressionistically along with the landscape. However, 38 and 45 do not look like photographs in composition, and have the same approach to detail throughout. 38 is a little wacky in subject matter, with fairly crisply intact ruins amidst a sprawling pasture, and maybe the level of detail on that building is a bit too much compared to the landscape and figures, but it's not as obvious, and the figures are pretty blurry.

44 was the only one that really got me, glad to know it got everyone else too. Kinda photographic in composition, but could easily be by a painter as well. Everything rendered in blurry brushstroke blobs, nothing is contradicting that with sharp focus. The presence of multiple people is not triggering the high-detail tendency for portraits. And the arrangement of them looks somewhat staged for dramatic effect, not a typical photograph. Very consistent and coherent stylistically.

Well, one getting through is just a fluke, as far as I'm concerned. By random chance the algo didn't do the many wrong things it is tempted to do. And if you could somehow spell out what is different about this one, to try to replicate it, it'd need so much more complexity in its instructions, that it wouldn't be worth it -- over-fitting the data.

I also missed the most commonly misidentified human picture -- 25. It has that wacko subject matter that makes you think it's AI. And it has the common contradictory styles of AI: an insane level of detail on the front of the ship but far less on the bottom, plus blurry / misty right and left sides of the landscape (including the smaller ships on the right).

I wonder if this one was purposely made to resemble AI. If so, that still proves the larger point -- humans are better at imitating AI slop, than computers are at imitating human art. We are superior to them, so we can understand them and imitate them better than vice versa. Their output is a subset of ours, so it's interpolation and valid when we imitate them. Our output is a superset of theirs, so it would be extrapolation and invalid for them to imitate us.

The only human one I was fairly convinced was AI, was 30 -- there's such insane photorealistic detail on her dress, far less detail shown on her face, almost none on the walls of the room, and fairly low-detail on the scene outside the window. I don't think this painter from the turn of the 19th C was trying to imitate AI -- she was just obsessed with painting the details of a dress, and the rest of the composition was an afterthought. Not a very coherent portrait or mini-landscape through the window.

* * *


So, the main points remain. People are much better at identifying AI from human art than just coin-flipping -- even when the really egregious examples are removed.

And crucially, AI models do not have a single mind of their own, like people do, so they frequently violate the fundamental rule to maintain coherence of style within a single work. It's so fundamental that most of us probably didn't even consider it necessary to spell it out explicitly -- like, what other approach would you take, clashing and disjointed styles? Computers are too analytical and slicing-up and zooming-in, not holistic and gestalt-oriented enough to appreciate what coherence, unity, and harmony among parts are.

Presumably they would do the same with a verbal medium -- parts of it would be verse with a strict meter and rhyme scheme, while other parts were dull drab prose. Or where entire paragraphs are dull drab terse prose, then others are highly ornate and full of figures of speech with sentence diagrams that look like someone smashed your windshield with a tire iron. You'd wonder whether the person had a schizo episode while writing a single chapter / story.

But verbal media are more serial, not as all-at-once parallel in processing. So harmony among elements isn't quite as salient a property of speech as it is of images. IDK what AI story-slop reads like, but at least on the visual side, its overly analytical schizo nature really comes through, and accounts for why we reject so much of it as decent or good art. It didn't even fulfill one of the most basic requirements -- stylistic coherence!

And again, I don't care how many trillions of parameters they add to these models to make them less ridiculously off-putting. That's over-fitting the data. And it's certainly a worse model to choose than "give prompts to a human artist" -- way less explicit detail needed there, cuz so much is already built in to human nature, as well as picked up during their training.

But something like stylistic coherence is too obvious and universal and unspoken to be picked up during training. It's part of innate human nature, and machines will never possess that, without ever more risible degrees of complexity-explosion. Sad!
