Introduction to Cyc® Inferencing
Return to Table of ContentsThe Cyc knowledge base contains over 1 million assertions. Many approaches commonly taken by other inference engines (such as frames, RETE match, Prolog, etc.) just don't scale well to KBs of this size. As a result, the Cyc team has been forced to develop other techniques.
Cyc also includes several special-purpose inferencing modules for handling a few specific classes of inference. One set of modules handles reasoning concerning collection membership, subsethood, and disjointness. Another handles equality reasoning. Others implement fast reasoning with #$genlPreds and #$genlAttributes. Still others implement symmetry, transitivity and reflexivity reasoning.
(implies
(and
(isa ?afp AdultFemalePerson)
(residesInRegion ?afp Guam))
(and
(acquaintedWith Zippy ?afp)
(likesAsFriend Zippy ?afp)))
In Cyc-10, formulas asserted to the KB are stored internally, and
reasoned with, in
conjunctive normal form (CNF).
When converted to CNF, a formula gets rewritten as a conjunction of
disjunctions of negated and non-negated
literals.
So, for example, the formula above would be written in CNF as:
(and
(or
(not (isa ?afp AdultFemalePerson))
(not (residesInRegion ?afp Guam))
(acquaintedWith Zippy ?afp))
(or
(not (isa ?afp AdultFemalePerson))
(not (residesInRegion ?afp Guam))
(likesAsFriend Zippy ?afp)))
Each of the conjuncts would become a separate assertion.
Converting to CNF is part of the job of the Cyc-10 canonicalizer. The canonicalizer turns CycL formulas into canonical form, so that that they can be added to the KB as assertions, looked up in the KB, etc. Some of the other things the canonicalizer does are outlined below.
In Cyc-10, as well as in earlier versions of Cyc, universal quantification is handled trivially by leaving universally quantified variables unbound, while existential quantification is handled through the use of Skolem functions. Thus an assertion which is originally entered as:
(forAll ?A
(implies
(isa ?A Animal)
(thereExists ?M
(and
(mother ?A ?M)
(isa ?M FemaleAnimal)))))
will be converted to something like:
(implies
(isa ?A Animal)
(and
(mother ?A (SkolemFunctionFn (?A) ?M))
(isa (SkolemFunctionFn (?A) ?M) FemaleAnimal)))
and then further converted to CNF:
(and
(or
(not (isa ?A Animal))
(mother ?A (SkolemFunctionFn (?A) ?M)))
(or
(not (isa ?A Animal))
(isa (SkolemFunctionFn (?A) ?M) FemaleAnimal)))
Skolem functions are handled exactly like all other functions (except that Cyc
creates and names the function term for you). You are free to rename
the function term, or do anything else with it that you might do with
a function such as MotherOf. In fact, in Cyc-10 the Skolem
function itself is
reified,
so that the formula above becomes the following, and would be added to
the KB as 6 separate assertions:
(and
(or
(isa SKF-8675309 SkolemFunction))
(or
(arity SKF-8675309 1))
(or
(arg1Isa SKF-8675309 Animal))
(or
(resultIsa SKF-8675309 FemaleAnimal))
(or
(not (isa ?U Animal))
(mother ?U (SKF-8675309 ?U)))
(or
(not (isa ?U Animal))
(isa (SKF-8675309 ?U) FemaleAnimal)))
Another task the Cyc-10 canonicalizer takes care of is the ordering of
literals in assertions. The advantage of a canonical literal order is
that it simplifies KB lookup; only a test for structural equality is
required to determine that 2 formulas are the same.
The major advantage to using CNF as an internal representation is that it greatly simplifies the conceptual scheme used in inferencing, because all axioms have a uniform structure. For example, using CNF as a canonical form means that you get modus tollens inferencing "for free". When you add
P and Q => Rto the system, it gets canonicalized to the CNF form
not(P) or not(Q) or RNote that both of the following would be canonicalized to the same CNF form:
P and not(R) => not(Q) Q and not(R) => not(P)Since the internal CNF can be used to attempt to prove any of three literals of which it is constituted, it's as if all three forms of the rule were virtually present in the KB.
There is only one potential downside to using CNF, which is that certain types of assertions which can be expressed quite compactly in conditional form become somewhat unwieldy when converted to CNF. Specifically, an assertion of the form:
(implies
(or P1 P2 P3 ... Pm)
(and Q1 Q2 Q3 ... Qn))
will result in a CNF with m times n conjuncts. However, we do not
regard this as a significant problem, since asserting formulas of this
form constitutes bad KE style, and thus is unlikely to occur very
often. In particular, a knowledge enterer who writes such a formula
is probably attempting to use the P1 ... Pm literals as an exhaustive
list of cases in which the consequent should hold, when he or she
should have been shooting for a single meaningful generalization.
For example, let's say I ask for bindings for the formula (likesObject ?x ?y). That formula will constitute the root node of an inference search tree. What I am looking for is any assertion which will help provide bindings for ?x and ?y. The KB may contain some ground assertions involving likesObject, such as
(likesObject Keith BillM)and also some if-then rules, such as
(implies
(possesses ?x ?y)
(likesObject ?x ?y))
Each of these provides a way to expand the root node. That is, each
constitutes a link to a new node with a different formula to satisfy;
these new nodes will be the leaf nodes of the search. In the first
case, the formula to satisfy in the new node is simply
#$TrueIn the second case, using the if-then rule takes us to a new node that now needs to satisfy the formula
(possesses ?x ?y)The search procedure may now recurse on this new node, because if we can find bindings for this formula, the if-then rule will give us bindings for our original formula. Perhaps the KB also contains a rule which says,
(implies
(objectFoundInLocation ?x KeithsHouse)
(possesses Keith ?x))
This assertion can take us to yet another node with the goal formula
(objectFoundInLocation ?x KeithsHouse)That is, we are now looking for things found in Keith's house, because if something is found in Keith's house, then Keith possesses it, and if someone possesses something, then he or she likes it. Note that this new node has one less free variable than its parent, which is probably desirable.
Alternatively, there may be another rule which states
(implies
(and
(isa ?x Agent)
(owns ?x ?y))
(possesses ?x ?y))
This rule could take us to a new node whose formula to satisfy is
(and
(isa ?x Agent)
(owns ?x ?y))
But this is probably not a happy development: we are now three nodes
down, and the problem is getting more complex, rather than simpler.
Thus, the primary issue to be addressed in designing an inference procedure is the algorithm to be used for searching the tree. How do we decide which leaf nodes to expand next? Another important issue is to determine how a node is expanded. That is, how do we find the axioms in the knowledge base which are likely to provide links to new leaf nodes?
(and
(isa ?x Agent)
(owns ?x ?y))
If we choose to expand this node, which of these two literals should
we expand first? Should we look for elements of Agent, or
should we look for owns pairs?
The solution used in Cyc-10 is to require that each HL module which applies to a literal must be able to provide a heuristic estimate for how expensive it would be to apply that module to the literal. These heuristics are linearly summed, and the literal which has the lowest heuristic cost is favored most strongly.
Go to Top