November 21, 2024

AI slop is stylistically schizo and contradictory, human art is coherent and unified in style

A recent post from Scott Alexander on AI slop vs. human art asked respondents to guess whether certain images were made by AI or a person. If AI could fool enough respondents, was it not real art, a la the Turing test for judging whether a machine was intelligent?

Well, they did not fool respondents -- the average and median scores were both 60% correct, compared to 50% if they flipped a coin. We just had an election -- 60 to 40 is not a close election. This contradicts the title of section 1, which claims that people had a hard time identifying AI art, just cuz the score wasn't near 100% correct. Having a hard time would mean they did no better than a coin-flip.

And he admits that he put a massive thumb on the scale by screening out AI images that had the telltale signs of being AI -- which is to say, the telltale signs that are already known about, e.g., disfigured hands and fingers, garbled text, "wrestling" poses where there's a lot of interaction between two bodies, and the entire style of the recognizable DALL-E model. This experiment revealed, to me at least, other telltale signs, but more on that later. Even with this thumb on the scale, people were still not fooled.

Of course, when it comes to computer models, the question is not can a computer program do something, but how much complexity does it cost, and how good is the output? If it has a 137-degree polynomial to connect a line between 7 data points, that's over-fitting the data. How many prompts, with what degree of specificity, does an AI generator require before it gives sufficiently passable results -- plural, as in reliably replicated, not just a fluke success?
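To make the over-fitting point concrete, here is a minimal sketch in Python. The numbers are made up for illustration: 7 points taken from a straight line with small hand-picked wobbles added. It uses a degree-6 polynomial rather than the rhetorical 137, since degree 6 is already enough to pass through 7 points exactly. The helper names (`fit_line`, `interpolate`, `true_line`) are hypothetical, not from any library.

```python
# Toy illustration of over-fitting. All data here are invented for the example.

def true_line(x):
    """The simple underlying relationship (an assumption of this sketch)."""
    return 2 * x + 1

def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def interpolate(xs, ys, x):
    """Lagrange form of the one degree-(n-1) polynomial through all n points."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# 7 data points: the true line plus small, hand-picked noise.
xs = [0, 1, 2, 3, 4, 5, 6]
noise = [0.3, -0.4, 0.5, -0.2, 0.4, -0.5, 0.1]
ys = [true_line(x) + e for x, e in zip(xs, noise)]

# The complex model hits every training point; the simple one does not.
worst_train_miss = max(abs(interpolate(xs, ys, x) - y) for x, y in zip(xs, ys))

# Now ask both models about a point they have not seen.
x_new = 7
slope, intercept = fit_line(xs, ys)
line_error = abs(slope * x_new + intercept - true_line(x_new))
poly_error = abs(interpolate(xs, ys, x_new) - true_line(x_new))

print("worst miss on training points (polynomial):", worst_train_miss)
print("error at unseen point, straight line:", line_error)
print("error at unseen point, degree-6 polynomial:", poly_error)
```

The polynomial matches all 7 points it was shown, then misses badly on the one it was not shown, while the plain line stays close. The extra complexity bought a perfect fit to the noise and nothing else.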

The more complexity that needs to be built into the model by these prompts, the less smart and talented it is. The real comparison is, how many prompts or constraints, and with what degree of specificity, would you have to tell a human artist before they gave sufficiently passable results? Not many at all! The computer is a massive downgrade, if you're telling someone or something else what you have in mind.

The real fascination with AI slop is that the turnaround time for results is relatively fast, compared to the labor-intensive work of human hands. And so even though the results are slop, they're at least 20% real-ish, so you're OK with that trade-off -- crappy quality, but fast results for AI, instead of high quality results that take much longer for a person.

So then AI is not superior or equal to a person, it's a different point along a trade-off continuum, and nothing to gawk at as though it were a higher form of intelligence or existence than our own.

* * *


My main interest, however, is in further analyzing what gives away AI art, beyond the already well-known signs like mangled hands, garbled text, and interactive bodies that turn into Mr. Potatohead abominations.

Those are specific to the subject matter -- what is being portrayed. But scrolling through the images -- and I was not fooled by more than a few (more on why they can fool, later) -- I discovered a more fundamental and stylistic giveaway of AI art, which gets to its very nature, or perhaps lack of a nature, compared to human nature.

Namely, AI -- being a program without a mind or spirit of its own -- can easily be of two minds, even in contradictory ways (not just divergent), at a stylistic level. Not what subject matter is portrayed, but the manner in which it is portrayed. Human beings, possessing a single mind of their own, are of one mind about the manner in which they portray the subject matter.

Consider image 4, which I instantly felt was AI (and it is). There is a clear main subject, close to the viewer, and it's a person. I don't think it matters if the subject is a non-human animal, plant, inanimate object like a boulder, or human artifact like a chair or door -- something that is the focus of attention, in the foreground, near the viewer. Then there's a background environment in which the subject is embedded -- not a portrait in a vacuum.

The subject being a person means it takes the form of a portrait, while the environment takes the form of a landscape.

Yet the styles of these two forms are different and contradictory. The landscape is Impressionist, although who cares exactly what period it's mimicking -- the point is, the level of detail is low-resolution, blurry, with blobs and patches and planes of color more than crisply delineated and complex shapes. This applies not only in the distance, where things are naturally more blurry, but right up in the foreground -- look at the flowers directly around the girl, their stalks look like single thick brushstrokes, and the petals are thick daubs of color. Low-detail, blurry.

Then all of a sudden, the girl in the portrait section is rendered in fairly high detail, in focus rather than blurred. It's not 100% photorealistic, but it's far more in that direction than the highly stylized rendering of detail for the landscape section. You can see multiple folds on the fabric of her clothes, with light / shading for sculpting purposes -- which is NOT used on the grass, flowers, dirt, trees, etc. in the landscape. You can make out individual wisps of hair on her head, each tiny curving line inside her ear (with shading-for-sculpting again), and so on.

This detailed focus gets more blurry and Impressionist as you look toward the bottom of her dress and shoes, and I notice in the other images that the trigger for photorealism seems to be a human face or other exposed parts of the human anatomy. So even just her dress -- which is a single garment, not a separate top and bottom -- looks schizo stylistically, with a more photorealist upper region and a blurry Impressionist bottom region, further from the trigger of exposed human anatomy.

The machine doesn't understand that a single self-contained work of art is supposed to be coherent and unified in style or presentation. It has clearly been trained on photorealistic portraits and Impressionist landscapes, one not-so-stylized and the other highly stylized. And so when asked to combine a portrait within a landscape, it figures why not combine the best of both worlds? -- a high-detail portrait in a landscape that is blurry immediately surrounding her, not to mention farther away as well.

This is not just shallow focus from photography or cinematography -- at the exact same distance from the "camera," there are simultaneously a sharp-focus object and blurry objects. That's not physically or technologically possible -- and could only be done by deliberate choice of the artist, in some warped form of artistic license.

But artists never use that license, cuz it violates the fundamental requirement to present the subject matter in a coherent unified style -- all blurry and Impressionist, or all sharp-focus and photorealistic, but not some of one and some of the other in the same work.

To give a pity point to the machine, it at least does the sensible contradiction instead of the wacko contradiction -- it renders subjects in sharp detail (as though we're giving them our attention), while leaving environments in blurry detail (as though they're in our peripheral vision, not as important), rather than an Impressionist portrait set within a photorealistic landscape (akin to animated figures superimposed on a photographed real-world environment, like Who Framed Roger Rabbit?).

This schizo clash of styles within a single work is how I identified most of the other AI images.

Photorealist portrait in blurry landscape also told me the following were AI: 7, 10 (again, not a portrait of a person, but with a clear subject taking up much space), 13 (cartoon head, realistic water), 16 (the background being just a fairly uniform color plane), 21 (the environment's flowers are blurrier than the decorative flowers on her clothes, despite both being close to the camera), 23 (background looks like an Abstract Expressionist painting, and even within the mother's clothing, the colored pieces are blurrier than the white pieces), 27, 33, 40, 46, 49 (the wacko contradiction, where the close-up buildings are blurry while the distant water is in sharper focus)...

And the most insane is 26, whose subject looks like he was photographed under pristine studio conditions -- while the landscape outside the window is a highly stylized Venetian type portrayal. Is it supposed to be a painting within the artwork, hanging on the wall of this room? To me it looked like a landscape shown through a window, that ol' trick. There's what could be a decorative frame just below it, but not running up the left side of the landscape... so it's a bit schizo in its subject matter, but also in the style, with totally opposite styles for the landscape and the portrait.

Related, there are some whose subject matter is a bunch of abstract geometric shapes, with no 3D depth cues, no lighting variation, etc. -- and then a single human face or body, with multiple features (eye outline, iris, pupil, lips, individual teeth, etc.), sometimes with shading-for-sculpting. The dum-dum AI doesn't understand that a single work has to be entirely abstract or entirely representational. This gave away 6, 17, 24, and 50.

I could tell that 19 was by a person cuz although there are geometric shapes and a stylized human head, the geometric shapes are not separate abstract objects from the representational head -- they're used to form the lines around the head and its features, or to fill up volume within these features, suggest texture of the features, etc. They are building blocks to render a representational object -- not a separate array of abstract shapes, plus a representational head in their midst.

The Impressionist landscapes with no dominating subject are less obviously AI, cuz the contradictory rendering of subject and background cannot happen. Still, their subject matter or compositions look more like photographs, which were then passed through a blurry / stylized / Impressionist filter. The point-of-view, angle, perspective, cropping objects at the frame's edges, etc. Very photographic in composition, if not photorealistic in detail. And painters or illustrators rarely did this -- they create more of a staged array of figures or natural elements if there's no dominant subject.

This gave away 11, 20, 31, and 43. I could tell 22 was by a person cuz there's a semi-prominent human and plant subject, and they're both rendered Impressionistically along with the landscape. However, 38 and 45 do not look like photographs in composition, and have the same approach to detail throughout. 38 is a little wacky in subject matter, with fairly crisply intact ruins amidst a sprawling pasture, and maybe the level of detail on that building is a bit too much compared to the landscape and figures, but it's not as obvious, and the figures are pretty blurry.

44 was the only one that really got me, glad to know it got everyone else too. Kinda photographic in composition, but could easily be a painter as well. Everything rendered in blurry brushstroke blobs, nothing is contradicting that with sharp focus. The presence of multiple people is not triggering the high-detail tendency for portraits. And the arrangement of them looks somewhat staged for dramatic effect, not a typical photograph. Very consistent and coherent stylistically.

Well, one getting through is just a fluke, as far as I'm concerned. By random chance the algo didn't do the many wrong things it is tempted to do. And if you could somehow spell out what is different about this one, to try to replicate it, it'd need so much more complexity in its instructions, that it wouldn't be worth it -- over-fitting the data.

I also missed the most commonly misidentified human picture -- 25. It has that wacko subject matter that makes you think it's AI. And the insane level of detail on the front of the ship, but far less on the bottom, the blurry / misty right and left sides of the landscape (including the smaller ships on the right), are the common contradictory styles of AI.

I wonder if this one was purposely made to resemble AI. If so, that still proves the larger point -- humans are better at imitating AI slop, than computers are at imitating human art. We are superior to them, so we can understand them and imitate them better than vice versa. Their output is a subset of ours, so it's interpolation and valid when we imitate them. Our output is a superset of theirs, so it would be extrapolation and invalid for them to imitate us.

The only human one I was fairly convinced was AI, was 30 -- there's such insane photorealistic detail on her dress, far less detail shown on her face, almost none on the walls of the room, and fairly low-detail on the scene outside the window. I don't think this painter from the turn of the 19th C was trying to imitate AI -- she was just obsessed with painting the details of a dress, and the rest of the composition was an afterthought. Not a very coherent portrait or mini-landscape through the window.

* * *


So, the main points remain. People are much better at identifying AI from human art than just coin-flipping -- even when the really egregious examples are removed.

And crucially, AI models do not have a single mind of their own, like people do, so they frequently violate the fundamental rule to maintain coherence of style within a single work. It's so fundamental that most of us probably didn't even consider it necessary to spell it out explicitly -- like, what other approach would you take, clashing and disjointed styles? Computers are too analytical and slicing-up and zooming-in, not holistic and gestalt-oriented enough to appreciate what coherence, unity, and harmony among parts are.

Presumably they would do the same with a verbal medium -- parts of it would be verse with a strict meter and rhyme scheme, while other parts were dull drab prose. Or where entire paragraphs are dull drab terse prose, then others are highly ornate and full of figures of speech with sentence diagrams that look like someone smashed your windshield with a tire iron. You'd wonder whether the person had a schizo episode while writing a single chapter / story.

But verbal media are more serial, not as all-at-once parallel in processing. So harmony among elements isn't quite as salient a property of speech as it is of images. IDK what AI story-slop reads like, but at least on the visual side, its overly analytical schizo nature really comes through, and accounts for why we reject so much of it as decent or good art. It didn't even fulfill one of the most basic requirements -- stylistic coherence!

And again, I don't care how many trillions of parameters they add to these models to make them less ridiculously off-putting. That's over-fitting the data. And it's certainly a worse model to choose than "give prompts to a human artist" -- way less explicit detail needed there, cuz so much is already built into human nature, or picked up during their training.

But something like stylistic coherence is too obvious and universal and unspoken to be picked up during training. It's part of innate human nature, and machines will never possess that, without ever more risible degrees of complexity-explosion. Sad!

November 7, 2024

Unstolen election mega-thread

Just re-posting two initial comments here for now to get the ball rolling, will add to it in the comments as usual.

* * *


Why didn't Dems steal it this time? Well, Dems were promising to steal it -- the state election boards in battleground states, the media, and Obama himself on the campaign trail.

Why didn't they this time? Perhaps the election steal of 2020 was part of the broader civic breakdown of 2014-2020 -- most of which was marked by political violence, hostile rhetoric, etc. Stealing an election is not physical violence, or even heated rhetoric, but it is hyper-competitive, antagonistic, anti-social, etc.

It was also part of the broader hostile crusade by woketards, like censoring and deplatforming everyone during the 2014-2020 abyss. That's also hostile, anti-social, war-like, etc., but not physically violent.

This is part of the Peter Turchin 50-year cycle in civic breakdown, whose last peak was the late '60s and early '70s, then the late 1910s and early '20s, late 1860s and early '70s, a missing explosion circa the late 1810s and early '20s (which was instead the Era of Good Feelings), and another burst around the Revolutionary War of circa 1770.

It's a kind of energy that builds up, and then dissipates, over a cycle lasting 50 years, or 25 years in either direction.

By 2024, it was already clear that the violent symptoms of this pattern had abated -- BLM and Antifa did not burn down half the country in '24, there were no roving executions of cops caught on camera like in the mid-late 2010s, Democrats didn't roam around assassinating Trump supporters for no reason and getting off with no bail, etc. Although there were 2 assassination attempts on Trump himself -- the violence hasn't gone to 0, but it's only 5% of what it was during the 2014-2020 abyss.

Libtards didn't even hold marches when the Supreme Court over-turned their sacred cow of Roe v. Wade in '22. There will be no pussy hat marches when Trump is re-inaugurated.

Twitter allowed itself to be bought out and taken over by Musk, which would not have been allowed in 2014-2020, and they submitted to the new orders about no more crazy censorship and ban waves.

So, the failure or unwillingness of Dems to carry out the steal this time must be part of that general dissipation of politicized zeal from its 2014-2020 peak (abyss). There will be no Russiagate, #MeToo, Resistance, etc. bullshit like there was during Trump's first term, during the peak of politicized zealotry.

I thought since stealing an election wasn't violent or confrontational, they'd still do it -- especially since that's what they were promising for the past few months, right up through most of election night, with Philadelphia halting their vote count early in the evening, waiting for the rest of the state to return their numbers, anticipating a steal. Who am I to second-guess the same message, from the same top-level figures, that was followed up on by a successful insane steal in the very last election?

The energy level declining across all dimensions -- violence, censorship, stealing elections -- is also bipartisan. There was WAY less zeal on the Trump side this cycle, compared to 2015-'16, and even 2020. No one is sincerely posting God-Emperor memes anymore, no one is champing at the bit to lay the first bricks in that Big Beeyooteeful Wall, which never got built the last time. And there's just been far less trolling and teabagging this time than in 2016, and certainly 2020 when it got stolen, preventing the teabagging.

Politicized zeal overall in American society has fallen off of its 2014-2020 explosive peak, and will reach a minimum circa 2045, which will be as non-partisan as the mid-1990s were 50 years earlier. Then the next explosion will happen in the 2060s and early '70s, and the cycle will keep on repeating...

* * *


Also a quick dunk on tech determinist dum-dums, who blamed / credited the explosive zeitgeist of 2014-2020 on newfangled tech (social media, smartphones, "meme magic," online in general).

Well, Americans are even more online than they were in 2016, yet the zealotry has fallen off a cliff after 2020, and will continue plummeting toward a minimum in 2045 -- all while Americans continue to be as online, or even more online, than they were in the 2014-2020 period.

That's the cross-temporal proof. Then there's the cross-sectional proof -- Japanese people have become more and more online since they first adopted the internet. Yet they have experienced no such explosion of politicized zealotry -- whether leading to violence, censorship, heated rhetoric, stolen elections, or whatever else.

All technologies are mere tools, indifferent to how they're used, and impotent to shape, channel, or nudge human societal systems or individual behavior. Rather, the dynamics of society and individual psychology lead to some people using some tech for some purpose in some state of affairs, and some others using some other tech (or even the same tech) for some other purpose when they're in some other state of affairs.

Americans didn't need social media or the internet or online anonymity to carry out an equally explosive bout of zealotry in the late 1960s and early '70s, or the late 1910s and early '20s, or the Civil War or the Revolution. Nor did the Romans need it for the civic breakdown of the 60s AD during the Roman Empire -- most of them weren't even literate, let alone employing a communicative medium other than speech sounds coming out of the mouth.

When the cycle enters a crazy zealous phase, they use whatever means / media they have at their disposal, and when the cycle leaves the crazy zealous phase, they either use different media that have no stain of the zealous-associated media, or they use the same ol' media for a different purpose.

Technologies are utterly indifferent to how they're used, and they have no deterministic or even probabilistic influence, stemming inherently from themselves, on human behavior, at any scale (person, group, society, etc.).