Lyngby e07: who cares what the computer thinks?

Stop half-assing your illustrations with AI, I’m allergic

I don’t care about AI in the sense that I don’t find the tech interesting and I haven’t found that much use for it in my life. It’s more of a hindrance whenever I encounter it in the wild. Yet I spend a lot of time thinking about it? You can’t escape other people’s use of it, for one. I got kind of worked up about how every article on the web and every presentation anywhere will now have an absolutely repulsive AI-generated illustration somewhere. So I wrote a little bit about it (202403071104).

You don’t always need the computer to sanctify your judgments

I’m working with a PhD student on some data analyses and they mentioned that they did a clustering to support the patterns they were seeing in a figure. I wasn’t sure what to think about it. The below is slightly edited from our email exchanges and is basically me trying to work it out.

So you did your dimensionality-reducing visualization (ordination some call it) and you notice that almost all of the data points with property X are in a certain region of the drawing. You’re welcome to point to that part of the drawing and say these points are all Xs. This will be true regardless whether an algorithm identifies same pattern. If you color all points with property X in red, and all the red is all gathered (dare I say clustered?) in the same area, that kind of speaks for itself. Nobody can dispute the observation that all the red is over here. I’m not sure what the clustering adds in this case; if the computer didn’t pick up on it I’d say you had the wrong model/algorithm or something.

On the occasion of Tukey’s retiring from Bell Labs and from teaching he reflects on the field of statistics and relates a cute story (Tukey 1986):

[In] 1937, when Ralph Boas was a postdoctoral student with Salomon Bochner […] Ralph reported getting one of two answers whenever he told Bochner his latest results: “But zees is trivial!” or “But zees is impossible!”

Tukey was making a point about the practice of statistics: with reasonable data an effect of size \(5\sigma\) is trivial to find, while something of size \(.05\sigma\) might be nearly impossible. (And statistics is useful somewhere in between.) The clustering of obvious clusters in my opinion falls under But zees is trivial! If you can eyeball it, anyone should be able to find it. Who cares what the computer thinks?

Backlinks:

Tukey, John W. 1986. “Sunset Salvo.” Doi: 10.1080/00031305.1986.10475361. The American Statistician 40 (1): 72–76. https://doi.org/10.1080/00031305.1986.10475361.
this file last touched 2024.03.08