I posted this on Less Wrong 10 days ago. At the end, I summarize the answer they gave.
What counts as a hypothesis for Solomonoff induction? The general impression I’ve gotten in various places is “a hypothesis can be anything (that you could write down)”. But I don’t think that’s quite it. E.g. evidence can be written down but is treated separately. I think a hypothesis is more like a computer program that outputs predictions about what evidence will or will not be observed.
If X and Y are hypotheses, then is “X and Y” a hypothesis? “not X”? “X or Y?” If not, why not, and where can I read a clear explanation of the rules and exclusions for Solomonoff hypotheses?
If using logic operators with hypotheses does yield other hypotheses, then I’m curious about a potential problem. When hypotheses are related, we can consider what their probabilities should be in more than one way. The results should always be consistent.
For example, suppose you have no evidence yet. And suppose X and Y are independent. Then you can calculate the probability of P(X or Y) in terms of the probability of P(X) and P(Y). You can also calculate the probability of all three based on their length (that’s the Solomonoff prior). These should always match but I don’t think they do.
The non-normalized probability of X is 1/2^len(X).
So you get:
P(X or Y) = 1/2^len(X) + 1/2^len(Y) - 1/2^(len(X)+len(Y))
and we also know:
P(X or Y) = 1/2^len(X or Y)
since the left hand sides are the same, that means the right hand sides should be equal, by simple substitution:
1/2^len(X or Y) = 1/2^len(X) + 1/2^len(Y) - 1/2^(len(X)+len(Y))
Which has to hold for any X and Y.
We can select X and Y to be the same length and to minimize compression gains when they’re both present, so len(X or Y) should be approximately 2len(X). I’m assuming a basis, or choice of X and Y, such that “or” is very cheap relative to X and Y, hence I approximated it to zero. Then we have:
1/2^2len(X) = 1/2^len(X) + 1/2^len(X) - 1/2^2len(X)
which simplifies to:
1/2^2len(X) = 1/2^len(X)
Which is false (since len(X) isn’t 0). And using a different approximation of len(X or Y) like 1.5len(X), 2.5len(X) or even len(X) wouldn’t make the math work.
So Solomonoff induction is inconsistent. So I assume there’s something I don’t know. What? (My best guess so far, mentioned above, is limits on what is a hypothesis.)
Also here’s a quick intuitive explanation to help explain what’s going on with the math: P(X) is both shorter and less probable than P(X or Y). Think about what you’re doing when you craft a hypotheses. You can add bits (length) to a hypothesis to exclude stuff. In that case, more bits (more length) means lower prior probability, and that makes sense, because the hypothesis is compatible with fewer things from the set of all logically possible things. But you can also add bits (length) to a hypothesis to add alternatives. It could be this or that or a third thing. That makes hypotheses longer but more likely rather than less likely. Also, speaking more generally, the Solomonoff prior probabilities are assigned according to length with no regard for consistency amongst themselves, so its unsurprising that they’re inconsistent unless the hypotheses are limited in such a way that they have no significant relationships with each other that would have to be consistent, which sounds hard to achieve and I haven’t seen any rules specified for achieving that (note that there are other ways to find relationships between hypotheses besides the one I used above, e.g. looking for subsets).
Less Wrong's answer, in my understanding, is that in Solomonoff Induction a "hypothesis" must make positive predictions like "X will happen". Probabilistic positive predictions – assigning probabilities to different specific outcomes – can also work. Saying X or Y will happen is not a valid hypothesis, nor is saying X won't happen.
This is a very standard trick by so-called scholars. They take a regular English word (here "hypothesis") and define it as a technical term with a drastically different meaning. This isn't clearly explained anywhere and lots of people are misled. It's also done with e.g. "heritability".
Solomonoff Induction is just sequence prediction. Take a data sequence as input, then predict the next thing in the sequence via some algorithm. (And do it with all the algorithms and see which do better and are shorter.) It's aspiring to be the oracle in The Fabric of Reality but worse.