I don’t care about AI in the sense that I don’t find the tech interesting and I haven’t found that much use for it in my life. It’s more of a hindrance whenever I encounter it in the wild. Yet I spend a lot of time thinking about it? You can’t escape other people’s use of it, for one. I got kind of worked up about how every article on the web and every presentation anywhere will now have an absolutely repulsive AI-generated illustration somewhere. So I wrote a little bit about it (202403071104).

I’m working with a PhD student on some data analyses and they mentioned that they did a clustering to support the patterns they were seeing in a figure. I wasn’t sure what to think about it. The below is slightly edited from our email exchanges and is basically me trying to work it out.

So you did your dimensionality-reducing visualization (ordination
some call it) and you notice that almost all of the data points with
property X are in a certain region of the drawing. You’re welcome to
point to that part of the drawing and say these points are all Xs. This
will be true regardless whether an algorithm identifies same pattern. If
you color all points with property X in red, and all the red is all
gathered (dare I say clustered?) in the same area, that kind of speaks
for itself. Nobody can dispute the observation that *all the red is
over here*. I’m not sure what the clustering adds in this case; if
the computer didn’t pick up on it I’d say you had the wrong
model/algorithm or something.

On the occasion of Tukey’s retiring from Bell Labs and from teaching he reflects on the field of statistics and relates a cute story (Tukey 1986):

[In] 1937, when Ralph Boas was a postdoctoral student with Salomon Bochner […] Ralph reported getting one of two answers whenever he told Bochner his latest results: “But zees is trivial!” or “But zees is impossible!”

Tukey was making a point about the practice of statistics: with
reasonable data an effect of size \(5\sigma\) is trivial to find, while
something of size \(.05\sigma\) might
be nearly impossible. (And statistics is useful somewhere in between.)
The clustering of obvious clusters in my opinion falls under *But
zees is trivial!* If you can eyeball it, anyone should be able to
find it. Who cares what the computer thinks?

**Backlinks:**

- (202401171334) Letters from Our Lyngby Correspondent

Tukey, John W. 1986. “Sunset Salvo.” Doi:
10.1080/00031305.1986.10475361. *The American Statistician* 40
(1): 72–76. https://doi.org/10.1080/00031305.1986.10475361.