# Lyngby e05: Unstable variances

Just some scattered notes this week! Not really sticking to my plan
of posting on Wednesdays but I’m happy if I stick to more-or-less
weekly.

## Notes on variance stabilization

Data transformation is becoming a recurring nightmare on the blog (202401312031). I’m looking into a
single-cell RNA-seq project and trying to figure out what typical
preprocessing looks like. Not being particularly familiar with the
terrain, I picked some random tutorial and started following the steps.
The instructions suggest two separate transformations, both of them in
the name of variance stabilization. It is not clear to me why I would
want to focus on preemptive variance stabilization. Some half-baked
thoughts/observations:

- According to the accepted
answer to this CV Q, taking the square root of Poisson data (the
classical variance stabilizing transformation) before doing a t-test is
kind of pointless.
- For a regression, if you want to rely on the usual iid normal
assumptions, variance stabilization is necessary for more accurate
p-values and confidence intervals.
- But it is possible to model unequal variances
(*heteroskedasticity* if you have a personal speech coach)
directly, so it’s not automatically the case that any variance should be
stabilized.
- See also the
accepted answer to this related CV Q, which concludes that what kind
of transformation you want really depends on what you want to do. “It
depends” is a common answer to statistical questions.
- I often take logs of data that span several orders of magnitude to
make more readable visualizations. But then I’m not really doing it to
stabilize any variances.
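
To convince myself of what the square root in the first bullet actually does, here is a minimal simulation sketch. For Poisson data the variance equals the mean, and the delta-method approximation says the variance of the square root should sit near 1/4 regardless of the mean (less accurately for small means). The function names `sample_poisson` and `variance_table`, the means 5, 20 and 80, and the sample size are all my own arbitrary choices for illustration, not anything from the tutorial.

```python
import math
import random
import statistics


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method.

    Fine for the modest means used here; not meant for large lam.
    """
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def variance_table(means, n=20_000, seed=0):
    """Return (mean, var of raw counts, var of sqrt counts) per mean."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    rows = []
    for lam in means:
        draws = [sample_poisson(lam, rng) for _ in range(n)]
        raw_var = statistics.variance(draws)
        sqrt_var = statistics.variance([math.sqrt(x) for x in draws])
        rows.append((lam, raw_var, sqrt_var))
    return rows


if __name__ == "__main__":
    # Illustrative means only: raw variance should track the mean,
    # while the square-root variance should stay roughly constant.
    for lam, raw_var, sqrt_var in variance_table([5, 20, 80]):
        print(f"mean={lam:>3}  var(x)={raw_var:7.2f}  var(sqrt(x))={sqrt_var:.3f}")
```

The raw variances should grow with the mean while the transformed ones stay roughly flat, which is all "stabilization" means here. Whether that flatness buys you anything still depends on what you do next.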

Naturally a tutorial is an outline of common techniques and you
always have to think carefully about how it applies to your own
case.

## A stupid research assistant

Over lunch on campus we talked a bit about large language models, as
you do. I’ve seen an observation online somewhere that ChatGPT is like
having a very naive research assistant. This feels accurate to me: it
just brings you whatever it has completely uncritically but it can be a
nice starting point if you know what you’re doing.

My students don’t know what they are doing, or they wouldn’t be
taking my course. I get the feeling that to some of them, the LLM is a
knowledgeable research supervisor and not a stupid research assistant.
This is bad for a variety of reasons, one of which is that the LLM has
no doubt, no concept of being wrong. Probably because it was trained on
Reddit comments.

this file last touched 2024.02.25