This text is an informal critique of the choice of input representation in RF4 (Saito and Nakano, 1993), as it pertains to Bongard problems, the author's domain of research. A general introduction is given first, followed by the main subject.

RF4 is *"a concept learning algorithm"*
presented in a 1993 paper (see link, above), which *"adaptively
improves its concept learning efficiency."* RF4 works in
a space of first-order logic formulas, which it searches by
performing *"a depth-first search on the basis of five
criteria for pruning undesirable formulae and five transformation
rules for combining formulae"* (from the abstract of the
paper above). I am not concerned here with the particular details
of RF4, not even with its merits, which might be important in
certain areas of computer science and artificial intelligence.
What concerns me is the choice of Bongard problems by Saito and
Nakano, as a domain which exemplifies the virtues of RF4. In
particular, S&N claim that RF4, as implemented on a personal
computer in C, *"solved 41 out of 100 Bongard problems
within a few seconds for each problem."* This claim, by
itself, is hardly interesting. After all, there is the trivial
"algorithm", which, when presented with *any*
Bongard problem (henceforth, BP) "solves" it in a
matter of a few nanoseconds, depending on the computer used: it
is a 2x100 table, the first column of which contains the
incremental number of the BP, while the second column contains
the corresponding solution, in plain English. You enter the
BP-number in this "program", and -- voilà! -- you get the
solution. What more could we expect from an automation of
intelligence?

RF4, of course, does not work that way. In RF4, BP's are given
not as incremental numbers of a table, but as first-order logic
formulas. For example, the first box on the left side of BP #6 (a single outlined triangle) *might*
be given by something like:

*polygon*(X) & *outlined*(X)
& *angles*(X) = 3

I say "*might* be given" as above because,
unfortunately, S&N never give us a complete example of input
representation in their 1993 paper. I had to make an educated
guess for the above formula from the solution to BP #6, which
they *do* give:

**forall** B **in** *boxes,*
**forall** X **in** B, *angles*(X)
= 3 --> *class1*

So, in RF4, the input "BP #6" (for example) is not given as "location 6 of the 2x100 table", as in our trivial algorithm above, but as a collection of 12 formulas (6 for the left and 6 for the right side) similar to the first one, above.
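To make this concrete, here is a hedged sketch in Python of the spirit of such an input encoding. It is *not* S&N's actual format, which they never give in full; the predicate names and per-box fact sets are my guesses, reconstructed from the solution formula they print for BP #6:

```python
# A guessed encoding, NOT S&N's actual one: each box is a set of ground
# facts about the single figure it contains, and a candidate solution
# formula is checked against every box of each side.

# Hypothetical facts for the 12 boxes of BP #6: triangles on the left,
# quadrilaterals on the right.
left_boxes = [{"polygon": True, "outlined": True, "angles": 3}] * 6
right_boxes = [{"polygon": True, "outlined": True, "angles": 4}] * 6

def solution_holds(box):
    """The guessed rule for BP #6: the figure in the box has three angles."""
    return box.get("angles") == 3

# "forall B in boxes, forall X in B, angles(X) = 3 --> class1":
# true of every left box, false of every right box.
assert all(solution_holds(b) for b in left_boxes)
assert not any(solution_holds(b) for b in right_boxes)
```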

Cases like BP #6 are misleading, because of their simplicity.
When we see a single outlined triangle in a box it is difficult
to think that a description like "*polygon*(X) & *outlined*(X)
& *angles*(X) = 3" does not describe accurately
what we see in the box. In this particular problem it happens
that it does, because nothing else is relevant in its solution.
Suppose we do not know the solution beforehand, however (which is
the natural thing to expect for BP's). Why should the location
of the triangle -- in coordinates -- not be specified? After all, we
might be dealing with BP #8, where the solution has to do *precisely*
with the absolute locations of the outlined areas within their
boxes. Why should the size of the triangle not be relevant?
(Actually, the length of each of its sides.) Problem #2 has to do
with size. And the slope of each of its sides? How about the *quality*
of the sides, which is "plain straight line", as
opposed to the qualities of lines in BP #9
and BP #107? And how about the
line-thickness, which seems to be around 2 pixels (depending on
resolution)?

We thus see that, even in cases of simple triangles, the
specification in terms of a first-order logic formula has to be
quite long, *if we want to be honest and not exclude a priori
some feature,* an exclusion which would certainly guide the
program towards the solution, if the latter does not contain such
a feature. Else, if we include only the relevant features in the
input-description, we clearly perform an act of *cheating*.
The *degree* of cheating is inversely proportional to the
length of the first-order formula. For example, the formula
"*polygon*(X) & *outlined*(X) & *angles*(X)
= 3" cheats to a certain degree because it contains only
three predicates, excluding most of the others mentioned in the
previous paragraph; the formula "*outlined*(X) & *angles*(X)
= 3" cheats more; the formula "*angles*(X) =
3", even more; finally, the "formula" *"6th
problem, 1st box -- now tell me the solution"* (taken
from our earlier 2x100 table-example) cheats as much as
possible.
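The gradient of cheating just described can be made concrete with a toy sketch. Every feature name and value below is hypothetical; the point is only that each dropped feature pre-prunes the search on the program's behalf:

```python
# An "honest" description keeps every feature a human did not know to be
# irrelevant; cheating means dropping features before the program sees them.
# All names and values here are made up for illustration.
honest_description = {
    "polygon": True, "outlined": True, "angles": 3,
    "position": (38, 52),            # coordinates within the box
    "side_lengths": (30, 31, 29),    # one length per side
    "side_slopes": (0.0, 1.7, -1.7), # one slope per side
    "line_quality": "plain_straight",
    "line_thickness_px": 2,
}

def cheat(description, keep):
    """Drop every feature not in `keep` -- i.e., pre-prune for the program."""
    return {k: v for k, v in description.items() if k in keep}

# Progressively heavier cheating, as in the text:
d1 = cheat(honest_description, {"polygon", "outlined", "angles"})
d2 = cheat(honest_description, {"outlined", "angles"})
d3 = cheat(honest_description, {"angles"})
assert len(d1) > len(d2) > len(d3)   # shorter formula, heavier cheating
```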

Yet, all the above discussion presupposes that *we know*
what the possible features are, in order to include them in the
(long) formula. Even this, however, is not always possible.
Consider, for example, BP #29 (actually shown in S&N's
paper):

What could the description of the top-left-most box on the
left side be? How do we specify an irregular curve through a
first-order formula? Could it be that we do not need to be very
precise about its exact shape, describing it simply as a
"closed curve"? But how do we know *a priori*
whether its shape is irrelevant, *without knowing the solution*?
In this particular problem (#29) it happens that the shape *is*
irrelevant. But look now at BP #20:

Is shape still irrelevant? Would it be honest to specify
something like "*closed_curve*(X) & *bulges*(X)
= 2 &..." for the first box (top-left, left side)? But *the
whole point* in this problem is to realize precisely that
there are two bulges on each curve. Once that is done, finding
out about the placement of the two dots in relation to the two
bulges is trivial. So if we want to be honest, we should avoid
using predicates like "*bulges*(X)", and let the
program discover those bulges... but from what? Predicates like
"*closed_curve*(X)" do not seem to be very
helpful. Should not the specification of the curve be given in
terms of a very long sequence of short straight lines, each
meeting its previous and next line at its two ends? This seems to
be a fully honest approach, if all the coordinates, lengths, and
slopes of the lines in this sequence are given. But are we
serious? *How on earth do we do this with a first-order
formula*?! How long does this formula have to be, and how can
it be manipulated to infer the existence of the two
"bulges" from it?

The conclusion that I draw from the previous paragraphs is
that first-order formulas are the wrong kind of representation
regarding Bongard problems. *If they are short enough to be
manageable, the representation is cheating; and if they are long
enough to be honest, the representation is unmanageable.*
Consequently, raw (unprocessed) bit-maps with black-and-white
pixels are superior as input specifications, because everything
that there is to be discovered about the contents of the box is
there, in front of the program's eye, waiting to be discovered by
the program's mind, un-preprocessed by human helpers.

This last observation apparently generalizes beyond Bongard problems, applying equally to other domains in cognitive science. Whenever the input in a cognitive domain is complex, its initial specification should avoid all manual (human-guided) preprocessing and pruning, allowing the automated system to arrive by itself at any pithy, concise description that possibly exists. Raw, unordered information comes first; the crystallized, highly ordered re-description of it comes much later, after laborious processing. Most of the initial efforts in 20th-century artificial intelligence ("GOFAI") had this principle backwards: they started the automated processing from the crystallized form, facing the insurmountable difficulties of the lost raw information only later, in the form of non-scalability and impracticability. RF4 is nothing but a genuine product of that age and philosophy in artificial intelligence.

A related link on the web is Alexandre Linhares's critique of RF4 (in PDF format).

Last update: 09/14/03