From lou@ermine.ox.ac.uk Wed Sep  5 15:39:24 2001
Date: Tue, 4 Sep 2001 22:18:18 +0100 (GMT Daylight Time)
From: Lou Burnard <lou@ermine.ox.ac.uk>
Reply-To: Lou.Burnard@oucs.ox.ac.uk
To: Tomaz Erjavec <Tomaz.Erjavec@ijs.si>
Cc: editors@tei-c.org
Subject: Re: Comments on P4 15 (Simple Analytic Mechanisms)

Dear Tomaz

Many thanks for your comments! I have interlarded them with my comments
and actions below...

On Tue, 4 Sep 2001, Tomaz Erjavec wrote:

> Dear Eds,
> here follow my comments on the chapter. Each comment is preceded by the
> relevant text in double quotes. At the risk of further decreasing
> legibility I've marked each such quote with:
>
> * - the complaint - mostly to do with conversion to PDF/HTML - is
>     likely also to occur elsewhere in P4
>
> ! - a true blue corrigible error
>
> ? - me complaining
>
>
> And here are the comments:
>
> *"These `interpretative' elements (span and interp) are described in detail..."
> In the P4 PDF/HTML element names are not typographically distinguished
> from other text; I really hope you will reconsider this - I don't see
> much wrong with P3 styling, where they were in <> + bold.

This is an error in the formatting. The P3 style should be restored.

>
>
> *"<!--Text Encoding Initiative:
>    Guidelines for Electronic Text Encoding and Interchange.
>    Document TEI P3, 1994.-->"
> Here and with most of the verbatim text you have serious problems with
> spurious wrapping of the text. These headers and examples were really
> nice in P3, but now they are horrible to read. But I guess you know this!

Yes. We need to work a bit harder on that....

>
>
> *"<!DOCTYPE TEI.2 PUBLIC "-//TEI P3//DTD Main Document Type//EN" "tei2.dtd" ["
> I've mentioned this already, but still: "TEI P3" should probably go to
> "TEI P4". And (I vote) "tei2" to "tei".
>

OK. Your vote is noted! But I think the "no breaking existing dox" rule
means that we have to stick with TEI.2 as the root element name for this
version. The filename is comparatively unimportant, and the FPI even less
so... :-)


>
> *"<w> represents a grammatical (not necessarily orthographic) word.
>      lemma identifies the word's lemma (dictionary entry form).
>   <m> represents a grammatical morpheme.
>      baseform identifies the morpheme's base form."
> The attributes "lemma" and "baseform" are not typographically marked in the
> PDF, although they should be small caps. In the HTML they are bold, although
> they should be <tt> as well. (this was ok in P3)

Agreed that the formatting needs attention. Not sure about small caps.


>
>
> *"As members of the seg class, these elements share the following attributes:
>  type characterises the type of segment.
>  function characterises the function of the segment."
> The "type" and "function" attributes are typographically marked in neither
> HTML nor PDF. (this was ok in P3)

This is an error in the formatting. The P3 style should be restored.

>
>
> !"The smay be thought of as providing an abbreviated version"
> The "s" is missing a trailing space. (this was already wrong in P3 but
> not as obvious)

Hoorah! a genuine error! Now fixed.

>
>
> !*"<seg type=clause> and <seg type=phrase>, respectively."
> Attribute values are missing quotes for XML. (this was already 'wrong' in P3)

Actually caused by inept tagging in the source. Now fixed.
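
For what it is worth, this class of error is easy to catch mechanically. What
follows is only a sketch in Python, using the standard library XML parser; the
function name is_well_formed and the two sample fragments are invented for
illustration and are not part of our conversion tooling.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the fragment parses as XML, False otherwise."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# Illustrative fragments only: SGML tolerated the first form, XML does not.
UNQUOTED = '<seg type=clause>text</seg>'
QUOTED = '<seg type="clause">text</seg>'

print(is_well_formed(UNQUOTED))  # False
print(is_well_formed(QUOTED))    # True
```

Running each extracted example through a check like this would flag unquoted
attribute values before they reach the formatted output.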

>
> !?"<div type="stanza">
>       <l><cl part="i">Tweedledum and Tweedledee</cl>
> ...
>  <div type="stanza">
>       <l><cl part="i">Just then flew down a monstrous crow,</cl>
> ...
>  </div>
>  </div>"
> Is it correct that the second stanza is nested inside the first? I wouldn't
> think so.

Well spotted. Our auto-xmlification procedure came unstuck. Fixed.
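
A well-formedness check would not have caught this one, since the nested
version is perfectly legal XML. A rough sketch of a structural check, in
Python, might look like the following. The function name nested_stanzas, the
enclosing div of type "poem", and the two simplified samples are assumptions
made for the example and do not reproduce the chapter's actual markup.

```python
import xml.etree.ElementTree as ET


def nested_stanzas(xml_text):
    """Count stanza divs that occur inside another stanza div."""
    root = ET.fromstring(xml_text)
    count = 0
    for outer in root.iter("div"):
        if outer.get("type") != "stanza":
            continue
        # iter() also yields the element itself, so skip that case.
        for inner in outer.iter("div"):
            if inner is not outer and inner.get("type") == "stanza":
                count += 1
    return count


# Hypothetical, simplified samples: the faulty nesting and the intended layout.
NESTED = """<div type="poem">
  <div type="stanza"><l>Tweedledum and Tweedledee</l>
    <div type="stanza"><l>Just then flew down a monstrous crow,</l></div>
  </div>
</div>"""

SIBLINGS = """<div type="poem">
  <div type="stanza"><l>Tweedledum and Tweedledee</l></div>
  <div type="stanza"><l>Just then flew down a monstrous crow,</l></div>
</div>"""

print(nested_stanzas(NESTED))    # 1
print(nested_stanzas(SIBLINGS))  # 0
```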


>
> ?"<div type="stanza">
>       <l>...</l>"
> In the previous example, stanzas were missing from the poem, but this wasn't
> marked at all. Why then is the omission of lines marked in this one? It would
> be more elegant to just start the example without the first two lines,
> i.e. with "<l><cl next="c5" id="c3"...

OK

>
>
> ?"To make such encodings easier to read, the segmentation tags can begin new
> lines, and be indented according to their degree of nesting, thus: "
> This is a bit of a laugh with the current formatting of examples. Also, I
> can't but mention that I find this advice a bit off-topic in any case. Nice
> nesting helps you with any SGML / XML / TEI document, why should it be
> mentioned esp. for linguistic analysis? (also the comments on
> preserving line breaks and font shifts after the example)
>

Certainly the comment on font shifts is daft in this context and has been
removed. The other comment I am less sure about. It might be thought of as
more relevant here, because the markup is actually quite difficult to read
unless it is laid out in some meaningful way. You will say "but no one
should have to read the markup" of course. I will see what Steve thinks,
if he's awake.

>
> ?"<p>
>       <cl type="finite declarative" function="independent">
>          <phr type="NP" function="subject">Nineteen fifty-four,"
> I think an <s> should come after <p>.
>

And you are right! Fixed.


>
> !*"      <phr type="NP" function="complement"> <!-- ? -->"
> This comment is in the PDF rendered as "<!- ? ->" which is
> illegal SGML. Looks like a general bug that PDF gobbles up one - in
> each string of hyphens.
> And what is the comment "?" doing here anyway?
>

Dunno. But it's gone now!

>
> ?"                  </phr>
>      ---
>    <phr type="PP" function="appositive postmodifier">for
>    <phr type="NP" function="prep.obj.">the U.S.A.
>    <phr type="PP" function="postmodifier">
>           as a whole</phr>
>                         </phr>
>                      </phr>
>                   </phr>
>     ---"
>
> The "---" are yet a third way of marking omitted material in this
> chapter. Again, one hyphen is missing in PDF.

Actually, I think these are part of the example text, not editorial.
Have changed them both to &mdash; accordingly.

>
>
> ?"The w, m and c elements are also identical in meaning to the seg element
> with a type attribute of `w' , `m' , or `c' , and may occur wherever seg is
> permitted to occur"
> I'd just point out that this naming is inconsistent with the one for s, cl,
> and phr, where the type attribute is given not the name of the element, but
> the name of the description of the element. So it should be either <seg
> type='s'>, <seg type='cl'> <seg type='phr'> or <seg type='word'>, <seg
> type='morpheme'> <seg type='character'>. OK, I'll admit that words et al. are
> more frequent than sentences, so it makes sense to keep this short and the
> sentence-level ones descriptive, so this is just nit-picking here....

Hmm.
>
>
> ?"the c element can only contain parsed character data, and should in
> fact only contain a single character."  But current practice (as also
> seen in this very chapter) uses <c> to mark punctuation, which can
> have more than one character, e.g. "...".
>

Definitely wrong. "should in fact" -> "will often"

>
> !*"<s id="MQP1S2P114S3">There was a slow integration,"
> !*"<s id="MQP1S2P114S5">Not for one second longer (if the"
> "<span resp="DTL"
>          value='the moment'
>          from="MQp1s2p114s3"
>          to="MQp1s2p114s5"/>"
> and everywhere else in this section:
> ID case mismatch in XML. This was ok in P3; something - a bad SGML
> declaration? - made all your IDs UC

Whoops. This is definitely an error. Fixed (all IDs and IDREFs now uc)
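
Since XML compares IDs case-sensitively, a mismatch like this can be found by
comparing every pointer value against the set of declared IDs. The following
is a minimal sketch in Python, assuming targets carry an id attribute and
pointers use from and to as in the quoted span; the function name
dangling_refs and the shortened sample are illustrative only.

```python
import xml.etree.ElementTree as ET


def dangling_refs(xml_text, ref_attrs=("from", "to")):
    """Return sorted (attribute, value) pairs whose value matches no id exactly."""
    root = ET.fromstring(xml_text)
    ids = {el.get("id") for el in root.iter() if el.get("id") is not None}
    missing = set()
    for el in root.iter():
        for name in ref_attrs:
            value = el.get(name)
            # Exact string comparison: case differences count as mismatches.
            if value is not None and value not in ids:
                missing.add((name, value))
    return sorted(missing)


# Hypothetical sample modelled on the quoted example:
# upper-case IDs, mixed-case references.
SAMPLE = """<text>
  <s id="MQP1S2P114S3">There was a slow integration,</s>
  <s id="MQP1S2P114S5">Not for one second longer</s>
  <span resp="DTL" value="the moment" from="MQp1s2p114s3" to="MQp1s2p114s5"/>
</text>"""

print(dangling_refs(SAMPLE))
# [('from', 'MQp1s2p114s3'), ('to', 'MQp1s2p114s5')]
```

An empty list means every from and to value has an exactly matching id. A
validating parser would report the same thing given a DTD that declares the
attributes as ID and IDREF, which this sketch does not attempt.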

>
> !"<span resp="TMA" type="structural" unit" value="introduction"
>          from="S1" to="S3"                                             />
>    <span resp="TMA" type="structural" unit" value="conflict"
>          from="S4a"                                                  />
>    <span resp=""TMA"" type="structural" unit" value="climax"
>          from="S4b"                                                  />
>    <span resp="TMA" type="structural" unit" value="revenge"
>          from="S5" to="S17"                                            />
>    <span resp="TMA" type="structural" unit" value="reconciliation"
>          from="nil1"                                                 />
>    <span resp="TMA" type="structural" unit" value="aftermath"
>          from="P2" to="P4"                                             />"
>
> "structural"" has an end-quote but it shouldn't; this was ok in P3.

Weird. Should have been caught during xmlification. Now fixed.

>
> ?*"NOTE 98: See G. N. Leech and R. G. Garside,: Running a Grammar Factory, in English
>      Computer Corpora: Selected Papers and Research Guide, ed. S. Johansson and A.-B.
>      Stenstr?m (ed):"
> The &oslash; in Stenstr&oslash;m is in HTML a ? and a crossed 0 in
> PDF. Was OK in printed P3.

Something went wrong during the XMLification. Have changed &#8856; back to
&oslash;

>
> And I find the very long discussion on why <w ana= is good rather
> dated - the battle has been, as it were, won.

Will review it.


>
> That's it. I hope it helps.
> Best and good luck with P4!
> Tomaz
>
>

