Oxford University

Oxford University Computing Services

carthage

   Author: (revised rahtz ) Date: February 2001 (revised 07/02/2001)

Contents

1. What is it?

Carthage is a yacc/lex-based parser for SGML DTDs which can delete references to undeclared elements. It can also do a few other things, depending on the run-time flags you give it.

Carthage is unsupported software; it may be used freely without further permission or royalty, but users who improve it or fix errors are requested to notify the author so he can also fix them.

2. How do I use it?

To clean a DTD in the current directory, just invoke the command 'carthage' with two arguments: the filename of the input DTD and the file name for the output DTD. For example:

  carthage dirty.dtd clean.dtd

The DTD parser itself is called 'carthago' (that's Latin for 'Carthage') and can be invoked on its own if you like. To get a list of the options it understands, try invoking it with the option '-?'. When I did this just now, this was the result:

  $ carthago -?
  carthago:  expected usage is
    carthago [options] < dtd-file > output-file
    where options are these (* means default):
    --msglevel 5*|n (or --trace --debug --verbose)
    --msdrop* --mskeep (drop/keep marked sections, -s0 -s1)
    --dupentquiet* --dupentwarn (warn if entity declared twice? -e0 -e1)
    --pedrop* --pekeep (drop/keep parameter entity declarations -p0 -p1)
    --commdrop* --commkeep (drop/keep parameter entity declarations -c0 -c1)
    --undeclared* --unused --declared --used (include in report file, any or all)
    --delete <gi> <gi> ... (list of GIs to delete from content models)
    --delenda <filename> (name of delete-list file)
    --output <filename> (name of report file, may reuse as delenda file)
What this means is that carthago reads from stdin and writes to stdout, and takes the following options:
  --msglevel <integer>
  --trace               (same as --msglevel 0)
  --debug               (same as --msglevel 1)
  --verbose             (same as --msglevel 3)
These control the emission of error and warning messages. Trace, debugging, and 'verbose' messages are enabled when the message level is set to 0, 1, or 3 respectively. The default (--msglevel 5) enables informative messages; to allow warnings only, set --msglevel 7. For error messages only, use --msglevel 10. Values higher than 10 have no effect.

The default is --msglevel 5, and message levels above 10 have no effect.

  --msdrop*
  --mskeep
Control whether marked sections marked IGNORE are suppressed silently in the output or kept. By default they are suppressed.
  --dupentquiet* 
  --dupentwarn 
Control whether a warning is issued if an entity is declared twice. The default is to say nothing (because the TEI uses multiple declarations frequently, so double declarations are seldom errors in my own work.)
  --pedrop* 
  --pekeep 
Control whether to drop or keep parameter entity declarations in the output. By default they're dropped, because all references to such entities are expanded and there seems to reason to keep the declarations around if the references are expanded.
  --commdrop 
  --commkeep*
Control whether to drop or keep comments in the output. The message says that by default they are dropped but I've just checked the code and by default they are kept. Sorry about that.
  --undeclared* 
  --unused 
  --declared 
  --used 
Control what lists of elements should be included in the report file, if any. By default undeclared elements (i.e. element which are used but not declared) are included in the list. Unused elements (elements declared but not appearing in content models), declared elements, and elements referred to in content models (--used) may also be listed. Comments in the output file should make clear which is which.
  --delete <gi> <gi> ... 
  --delenda <filename>
Indicate a list of generic identifiers which should be deleted from content models before element declarations are written out. The --delete option allows you to specify generic identifiers on the command line; I use this for testing, but it might also be handy for real work in some cases. The --delenda option takes the name of a file containing the generic identifiers to be deleted. ('Delenda' is the Latin word meaning 'things to be deleted'. In a famous speech Cicero used the word to describe Rome's enemy Carthage -- Carthago delenda est -- which gives this program its name.)

If the program encounters a reference to a generic identifier that cannot safely be deleted (e.g. because it's a required subelement), an error message is issued and the reference is left undeleted. If

  --output <filename> 
Specify the name of the report file to be generated; the file thus created may be reused as the delenda file.

3. How do I fetch and compile it?

If you want a simple check that carthago compiled correctly, just invoke it and type some declarations at it. Here's a transcript of such an interactive session; I typed the lines labeled u, carthago typed those labeled c:

u    $ carthago --delete a e i o u --verbose
u    <!-- test of carthago --
u    >
c    <!-- test of carthago -->
u    <!element a - - (#pcData) >
c    
c    <!ELEMENT a - -  (#PCDATA) >
c    
c    ! ! ! <stdin>:2  Element a is marked for deletion, but is declared.! ! !
u    <!element b - o (a?, c, d, e) >
c    ! ! ! <stdin>:3  Model requires hand work:  (c, d, e)! ! !
c    
c    <!ELEMENT b - O  (c, d, e) >
c    
u    <!element c - o (a?, b, c?, d, e*, f+) >
c    
c    <!ELEMENT c - O  (b, c?, d, f+) >
c    
u    <!element d - - ( a | b | c | d | e | f | g | h | i | j)* >
c    
c    <!ELEMENT d - -  (b | c | d | f | g | h | j)* >
c    
    $ 

To test that carthage (the shell script) also works, invoke it on some existing DTD, e.g. the dirty.dtd included in the carthage directory. Check that your output looks like the clean.dtd also in the directory:

   carthage dirty.dtd new.dtd
   diff clean.dtd new.dtd



Date: February 2001 (revised 07/02/2001)  Author: (revised rahtz ) .
© Oxford University Computing Services.