What I did:
** rearranged
** removed material that was specific to input (rather than markup), and put links to the
input guide.
** expanded the markup instructions
** did basic copy editing, corrected some info, put in some hyperlinks for additional
resources
Issues to deal with:
1. The style Sa bcad,sc does not appear in the template I downloaded
2. Cataloging the Structure of a Tibetan Text needs to be updated so that the terms for
“front/body/back” read “klad/gzhung/mjug” instead of “mdun/lus/rgyab.”
3. I cut the following two sections, as they seemed to be intended for the input manual.
However, they don’t seem to appear there. Maybe they should be put in the input manual.
Illegible script
The key stroke Ctrl+F2, or the corresponding menu item in the THDL menu, inserts the
markup for illegible script as "{ILLEGIBLE}." This is used for any portion of a text that
is illegible, or where a glyph is undecipherable. In such cases, the page and line number
should be noted within the braces to indicate the position of the illegible section. For
example, "{ILLEGIBLE[12-3]}". Then, a scanned image should be made of just the
illegible portion of the line and this image should be named using the edition sigla, dash,
the letters "ILL", dash, the pagination as above. Thus, the illustration for the above
example would be called "Ab-ILL-12-3.jpg", if the text's sigla was "Ab". Should more
than one illegible section occur in the same line, they would be differentiated by using
lower case letters, "a", "b", "c", .... Thus, the illegible sections would be marked:
"{ILLEGIBLE[12-3a]}", "{ILLEGIBLE[12-3b]}", and so forth, while their
corresponding images would be: "Ab-ILL-12-3a.jpg", "Ab-ILL-12-3b.jpg", etc.
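The naming convention above can be sketched in a few lines of Python. This is only an
illustration of the pattern described in this section; the function name and its
parameters are hypothetical and are not part of any THDL tool.

```python
import string


def illegible_markers(sigla, page, line, count=1):
    """Return (markup, image filename) pairs for the illegible spans on one line.

    Hypothetical helper: it only restates the convention described above.
    """
    if count < 1 or count > 26:
        raise ValueError("count must be between 1 and 26")
    pairs = []
    for i in range(count):
        # A single illegible span takes no letter; several take a, b, c, ...
        suffix = string.ascii_lowercase[i] if count > 1 else ""
        position = f"{page}-{line}{suffix}"
        markup = f"{{ILLEGIBLE[{position}]}}"
        image = f"{sigla}-ILL-{position}.jpg"
        pairs.append((markup, image))
    return pairs


print(illegible_markers("Ab", 12, 3))
print(illegible_markers("Ab", 12, 3, count=2))
```

The first call gives the single-span case from the example above; the second gives the
lettered "a" and "b" variants for two illegible sections on the same line.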
Submitting Tibetan Texts
When a text has been fully entered, all the parts of the text along with any scanned
images of illegible or unclear parts should be zipped together into a single .zip file and
sent to: thdl @ virginia.edu (removing extra spaces in that e-mail address!), with the
subject line "Tibetan Text Submission - NAME OF TEXT".
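For those who prefer to script the packaging step, the following is a minimal sketch using
Python's standard zipfile module. The function name and arguments are hypothetical; it only
bundles the files and builds the subject line described above, and does not send any
e-mail.

```python
import zipfile
from pathlib import Path


def bundle_submission(files, archive_path, text_name):
    """Zip the text files and scanned images together; return (archive, subject)."""
    archive_path = Path(archive_path)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for item in files:
            item = Path(item)
            # Store each file under its bare name, without folder paths.
            archive.write(item, arcname=item.name)
    subject = f"Tibetan Text Submission - {text_name}"
    return archive_path, subject
```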
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Download the Tibetan Language Template
Overview
This page describes how to format Tibetan texts using Microsoft Word’s “styles” feature.
To aid in the markup of Tibetan texts in Word, we have created a Tibetan-language
template. This template contains the same Word styles as the English-language
template used for THDL documents. In the Tibetan-language template, however, the font
is specified as Tibetan Machine Uni and the language is specified as Tibetan (PRC). The
Tibetan-language template also contains some simple formatting conventions to ensure
that the Tibetan font displays nicely; in particular, it sets the font-size and paragraph
spacing, and includes an option that makes lines break properly at the edges of the page.
Basic Markup Goals
The basic goal of text markup is, first, to create an electronic edition of a text that is easy
to read, search, and navigate. Here, you will add luxuries that aren’t there in the original
text, like subheads, paragraph formatting, clearly identifiable text-titles and personal
names, and so forth. Secondly, when this markup is done with a standardized set of
styles, it makes it easy for the text to be converted to XML and put on the web.
Additional Resources: For more on Word styles, see our document Using Microsoft
Word Styles; a detailed description of text markup principles can be found in Using Word
Styles for THDL Markup. Some technical documentation about the XML language is
here.
Steps for the Basic Markup of a Tibetan Text
The first step in marking up a Tibetan text using Word styles is to make sure that the
basic input has been done correctly. At minimum, the text should:
be based on the Tibetan-language template
use a Unicode font
have page numbers entered
have a proper filename
These steps should have been taken care of by the text inputter, and are described in
Inputting A Tibetan Text.
Once the basic input has been done, the markup process involves applying styles to
different elements of the text. The basic steps here are to (a) structure the text with
subheads, (b) mark paragraph and verse styles, (c) mark citations, (d) mark the text’s
topical outline (sa bcad), and (e) mark names and text titles.
(a) Structure the text with subheads
Topic headings are often implicit in a Tibetan text, but given the format of a traditional
Tibetan book, they are difficult to identify: they are not marked in any special font, are
not numbered, and are often barely distinguished from the body text that surrounds them.
A first read-through of a difficult text might involve hours (or days) of trying to identify
the basic chapter and topic divisions around which the work is organized. Thus, having
an electronic edition with clearly marked sections, chapters, and subject divisions is a
major benefit. Adding subheads also makes lengthy texts navigable; by using features
like Word’s “document map,” a list of chapters and subheads can be easily browsed, and
clicking on one can take you to that section in the document.
Subheads are material that you add to the text. Thus, no part of the text itself (none of the
author’s words) is marked as a subhead. Even though an author might identify a topic
heading, saying for instance “Now, the third topic: a detailed etymology,” you would not
mark this as a subhead. Rather, you add a subhead to the text and give it a sensible
wording (in Tibetan): “Topic Three: A Detailed Etymology.” The subhead text that you
create should end with a final shad.
The first subhead that you should add to the text is the title of the work itself. (This may
have already been done by the inputter.) In keeping with the above rule, this is something
that you add to the text; it is not the title that was typed in when the title page was input.
After typing the text title, mark it as Heading 1.
Next, add subheads to separate out the three most basic divisions of the text, which are its
(1) front, (2) body, and (3) back. These sections of a Tibetan text are described in detail
here. Mark these as Heading 2, and give them each a unique number in brackets. These
three subheads will look like:
[1] ཀླད།
[2] གཞུང་།
[3] མཇུག
The remainder of the process of adding subheads is essentially marking the divisions of
each of these three sections, applying the proper subhead style to them, and giving them a
unique number and title. For a lengthy, complicated text this might be days of work,
while a short unstructured text might not have any more subheads than these three.
Below is an example of the subheads for a simple, hypothetical text, which consists of a
front section (containing a title page, and the author’s statement of intent), a body
(containing three chapters), and a back section (containing a colophon and a closing
invocation).
[1] ཀླད།
[1.1] ཁ་བྱང་།
[1.2] དམ་བཅའ།
[2] གཞུང་།
[2.1] ལེའུ་དང་པོ་གཞི་བསྟན་པ།
[2.2] ལེའུ་གཉིས་པ་ལམ་བསྟན་པ།
[2.3] ལེའུ་གསུམ་པ་འབྲས་བུ་བསྟན་པ།
[3] མཇུག
[3.1] མཇུག་བྱང་།
[3.2] ཤིས་བརྗོད།
Here, the front, body, and back headings would be marked with the style Heading 2, and
the divisions of them would be marked as Heading 3. To create further divisions, for
instance to create three internal divisions of the first chapter [2.1 above], you could make
subheads numbered [2.1.1], [2.1.2], and [2.1.3], each marked with the Heading 4 style.
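The relationship between the bracketed number and the heading style can be stated as a
small rule: the heading level is one more than the number of parts in the number. The
Python sketch below restates that rule; the function name is hypothetical, and treating an
unnumbered subhead as the text title (Heading 1) is an assumption made for the example.

```python
import re

# Matches a leading bracketed number such as [2] or [2.1.3].
_NUMBER = re.compile(r"^\[(\d+(?:\.\d+)*)\]")


def heading_style(subhead):
    """Return the Word heading style implied by a subhead's bracketed number."""
    match = _NUMBER.match(subhead.strip())
    if match is None:
        # Assumption: an unnumbered subhead is the title of the work.
        return "Heading 1"
    depth = len(match.group(1).split("."))
    if depth + 1 > 9:
        # Word's built-in heading styles stop at Heading 9.
        raise ValueError("outline is too deep for Word's built-in headings")
    return f"Heading {depth + 1}"


for number in ("[1]", "[2.1]", "[2.1.1]"):
    print(number, heading_style(number))
```

A check like this can help when proofing a long outline, since a subhead whose style does
not match its number usually indicates a mistake in one or the other.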
Note that the enumerations included in the subheads currently need to be marked
"added by editor," as they may need to be removed later, and having them marked will
make them easy to remove. Be sure to mark the brackets and the space following the
closing bracket as well. You may want to leave this markup until the very end: once a
number is marked "added by editor," clicking on a heading in the document map makes the
"style" window display "added by editor" rather than the level of subhead, and not being
able to easily view the level of subhead makes it difficult to structure and proof the
outline.
Additional Resources:
Cataloging the Structure of a Tibetan Text provides detailed
information on the sections typically found in Tibetan texts.
It is usually easy to come up with the names for chapter-level
subheads, as these are written in the text itself. However, many
divisions won’t be labeled in the text (such as “title page,”
“colophon,” “invocation,” and so forth). For the names of these
in Tibetan, see our Tibetan Text Cataloging Glossary.
It might also help to refer to a lengthy fully-marked text as a
model. Two such texts available on THDL are Longchenpa’s Tshig
don mdzod, and Vimalamitra’s Mu tig phreng ba brgyus pa.
(b) Mark paragraph and verse styles
The text that is contained within the subheads should be marked so that it will display
properly. Prose should be marked with the style Paragraph, while lines of verse should
be broken up and marked with Verse 1 or Verse 2. (Verse 1 is used for the initial line,
while the remaining lines are marked as Verse 2). Insert a carriage return after each line
of verse; if a line is followed by two shads, the return is inserted after the second shad.
Lines of verse can also be separated into stanzas, by marking the first line of each stanza
as Verse 1 (note that this keeps you from having to enter an empty space between
stanzas).
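The Verse 1 / Verse 2 rule can be sketched as follows. This is an illustrative Python
helper with a hypothetical name; it assumes the verse has already been broken into lines
and grouped into stanzas.

```python
def verse_styles(stanzas):
    """Pair each verse line with its style.

    `stanzas` is a list of stanzas, each of which is a list of verse lines.
    The first line of every stanza gets Verse 1; the rest get Verse 2.
    """
    styled = []
    for stanza in stanzas:
        for position, line in enumerate(stanza):
            style = "Verse 1" if position == 0 else "Verse 2"
            styled.append((style, line))
    return styled


# Placeholder strings stand in for lines of Tibetan verse.
print(verse_styles([["line 1", "line 2"], ["line 3", "line 4"]]))
```

Because each stanza opens with Verse 1, the stanza boundaries remain recoverable from the
styles alone, which is why no empty line is needed between stanzas.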
(c) Mark citations
Citations from other works are a common feature of many Tibetan texts. While Tibetans
of course have conventions for distinguishing quoted material from the author’s own
words, these are sometimes imperfectly implemented, leaving the reader to struggle to
decipher what is intended to be a quote and what isn’t. Having quotations clearly
delineated (formatting them like “inset quotations” in Western typesetting) thus adds
major value to an edition. The process is much like that described in step (b), above.
Prose citations are marked with the style Citation Prose 1; if you want to break a prose
citation into paragraphs, paragraphs following the first one are marked with the style
Citation Prose 2.
Verse citations are marked with the styles Citation Verse 1 (for the first line of a stanza)
and Citation Verse 2 for any subsequent lines.
In both cases, these are separated from the author’s text by carriage returns at the
beginning and end of the citation. Citations should (but unfortunately don’t always) end
with some sort of “close quote marker” in Tibetan, such as ces so, or zhes so. These
markers should not be included in the quotation, but appear on the following line. Note
that the style Paragraph Continued is used following a quote, to indicate that there is no
change in topic following the quote.
Following is an (abbreviated) example from Longchenpa’s Tshig don mdzod that will
illustrate how to mark up quotes:
དང་པོ་ནི། ཀློང་དྲུག་པ་ལས།
ཡེ་ཤེས་ཉིད་ནི་རྣམ་གསུམ་གྱིས། །
གཞི་ཡི་ཁྱད་པར་ཚིག་ཏུ་བསྟན། །
ཞེས་པ་དང༌། རྡོ་རྗེ་སེམས་དཔའ་སྙིང་གི་མེ་ལོང་ལས།
གཞིའི་ཆོས་ཐམས་ཅད་ངོ་བོ་རང་བཞིན་
ཐུགས་རྗེ་གསུམ་དུ་ཤེས་པར་གྱིས་ཤིག་
ཅེས་སོ། །
Here, the author gives a brief topical heading, and then states the source of his first
citation. This is in the Paragraph style. Following are two lines of verse, in Citation
Verse 1 and Citation Verse 2 styles. The close quote marker appears on the next line,
which is in Paragraph Continued style. The author then gives a prose citation, which is
marked as Citation Prose 1. Note that for this prose citation, there is no closing shad; a
carriage return is made after the final tsheg in the citation, and the close quote marker
appears on the next line.
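The same example can be written out as a list of (style, text) pairs, which makes the
sequence of styles explicit. The Python sketch below does this and adds a small check on
the order of styles. The function name is hypothetical, the two lines of the prose citation
are joined on the assumption that the break between them is only a line wrap, and the
check produces notes for review rather than definite errors (a citation that closes a
section, for instance, may legitimately be followed by a subhead).

```python
# One entry per Word paragraph in the example above.
EXAMPLE = [
    ("Paragraph", "དང་པོ་ནི། ཀློང་དྲུག་པ་ལས།"),
    ("Citation Verse 1", "ཡེ་ཤེས་ཉིད་ནི་རྣམ་གསུམ་གྱིས། །"),
    ("Citation Verse 2", "གཞི་ཡི་ཁྱད་པར་ཚིག་ཏུ་བསྟན། །"),
    ("Paragraph Continued", "ཞེས་པ་དང༌། རྡོ་རྗེ་སེམས་དཔའ་སྙིང་གི་མེ་ལོང་ལས།"),
    # Assumption: the two source lines form a single prose citation.
    ("Citation Prose 1", "གཞིའི་ཆོས་ཐམས་ཅད་ངོ་བོ་རང་བཞིན་ཐུགས་རྗེ་གསུམ་དུ་ཤེས་པར་གྱིས་ཤིག་"),
    ("Paragraph Continued", "ཅེས་སོ། །"),
]

CITATION_STYLES = {
    "Citation Prose 1", "Citation Prose 2",
    "Citation Verse 1", "Citation Verse 2",
}
# Each "2" style continues a citation opened by the matching "1" style.
CONTINUES = {
    "Citation Prose 2": "Citation Prose 1",
    "Citation Verse 2": "Citation Verse 1",
}


def review_citation_flow(paragraphs):
    """Return (index, note) pairs for style sequences worth a second look."""
    notes = []
    previous = None
    for index, (style, _text) in enumerate(paragraphs):
        opener = CONTINUES.get(style)
        if opener is not None and previous not in (opener, style):
            notes.append((index, f"{style} does not follow {opener}"))
        if (previous in CITATION_STYLES and style not in CITATION_STYLES
                and style != "Paragraph Continued"):
            notes.append((index, "text after a citation is not Paragraph Continued"))
        previous = style
    return notes


print(review_citation_flow(EXAMPLE))
```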
Sometimes it is not entirely clear whether a citation is verse or prose. For a lengthy text
that cites the same sources repeatedly, we recommend consulting the texts being cited and
drawing up a short list of their titles, noting for each whether it is in verse or
prose.
(d) Mark the text’s topical outline (sa bcad)
A text’s topical outline should be marked in the style named Sa bcad. This is a character
style rather than a paragraph style.
The sa bcad will usually come right after a subhead, but occasionally appears within the
body of a section. The sa bcad may be a brief statement of what the topic of the section is
(“Now an etymology will be given”), or it may simply be an enumeration (“Now,
first...”). If the sa bcad ends with a closing shad, also mark that shad in the Sa bcad style.
(See the example in step (c), above, where dang po ni is in the Sa bcad style.)
(e) Mark personal names, as well as the names of texts and chapters
The Tibetan-language template contains several styles for marking personal names. In
basic markup, you should apply the style Author to the author’s name when it appears in
colophons. Also mark other names that appear in colophons, such as translators, treasure
revealers, scribes, and so forth. More advanced markup might involve marking the
names of deities, places, historical figures, clans, and so forth. If there is no style
appropriate for the names you need to mark, you could either create a new one in
conjunction with the director of your project, or you could use a generic style like Name
Personal Human.
Mark any names of texts with the style Text Title. (See the example above in step (c),
where Klong drug pa is marked as a text title.) When text titles appear in colophons,
mark them with the style Colophon Text Title. Similarly, chapter titles that appear in
colophons should be marked as Colophon Chapter Title.
Authors often refer to texts without actually giving their names, making oblique
statements like “as it says in sutra” (mdo las), “the root tantra states,” (rtsa rgyud las), or
simply “the same source [mentioned above] says” (de nyid las). Mark these as text titles
as well (but as with actual text titles, don’t include the particle las in the Text Title style).
Advanced Markup
The markup of simple texts may just involve creating a few subheads and marking names
and colophons. But depending on the project, much more detailed markup can be done.
For complicated works, it might be appropriate to apply styles to historical events, dates,
religious practices, place names, and so forth. Commentaries that have a root text
embedded in them can also have the root text marked, which makes for much easier
reading. The Tibetan-language template already contains a wide variety of styles for such
purposes, but if a particular project requires styles that have not been created yet, we can
easily add these to the template.
Detailed Guidelines
Below are some guidelines that should help with the finer details of text markup.
(a) When inserting carriage returns (such as at the end of a paragraph), make sure you
insert the carriage return after the shad+white space, and not after the shad but before the
white space. It is also important to leave the space there: do not delete it! (See the
example above in step (c), where a carriage return follows Klong drug pa las/, and note
how there is still a space left after the las/.) The idea here is that an electronic edition
should be able to be converted into traditional pecha formatting, without all of the
international formatting. As this space is intrinsic to the text, if you remove it, the pecha
formatting will not appear correctly.
(b) When you have two shad marks after a verse line, insert the carriage return after the
second shad so that both shads appear at the end of the line, and the next line begins
freshly with no shad in that line at the beginning. (See the example above in step (c).)
(c) When applying character styles to something that is not a whole sentence (such as for
personal names), make sure you highlight the full term including the final tsheg.
However, do not highlight a final shad at the end of a term. For character styles that are
used to indicate whole phrases or sentences (such as sa bcad), do include the final shad in
the style.
(d) Perform all special formatting (such as creating inset quotes, lists, and so forth) by
using styles. Any formatting that does not use styles will be lost when the text is
converted to XML. If you change the display attributes of particular styles to your own
preferences, do so in the styles, but leave the style names the same.
(e) Occasionally you may want to add something to the text to make reading more clear,
such as adding numbers before elements in a list. If you do so, mark these with the style
Added by Editor. This makes it clear that your addition is not in the actual text, and
makes it easy to find additions if you want to remove them.
(f) Note that Unicode Tibetan does not always display properly in Windows XP.
Microsoft Word’s built-in subheads also will sometimes display oddly. The markup
process in Word is primarily about applying styles, rather than worrying about how those
styles look. As long as any element of text has the proper style applied to it, it will
convert properly to XML, and display properties can be set at that time.
(g) As you read through the text, you may well see problem areas that stem from
misinput, conversion mistakes, or other issues. We suggest you use the Word
highlighting function to color those areas yellow so that you can easily find them when
reviewing the text. Likewise, you may have questions about where a citation begins,
whether a shad has been mistakenly omitted, and so forth. These can also be marked so
they can be found later.
(h) The above steps for “basic markup” can be done in any order. It may be easiest to first
mark your whole text in Paragraph style, then to put in the subheads. Sa bcad style is
often easiest to apply as you insert the subheads. Marking text titles and citations at the
same time is another obvious way to save time.
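Guidelines (a) through (c) above concern exactly where a break or a character style should
end relative to the tsheg and shad. The Python sketch below restates them; the function
names are hypothetical, and the first function is meant for a run of verse (or the end of
a paragraph), not for splitting prose at every shad.

```python
import re

TSHEG = "\u0f0b"  # Tibetan tsheg (syllable delimiter)
SHAD = "\u0f0d"   # Tibetan shad

# One or more shads, each followed by optional spaces, so that a double
# shad with a space in between is treated as a single unit.
_LINE_END = re.compile(f"(?:{SHAD} *)+")


def split_after_shads(text):
    """Break text after each shad group and its trailing space (guidelines a, b).

    Nothing is deleted: joining the pieces gives back the original text.
    """
    pieces, start = [], 0
    for match in _LINE_END.finditer(text):
        pieces.append(text[start:match.end()])
        start = match.end()
    if start < len(text):
        pieces.append(text[start:])  # remainder with no closing shad
    return pieces


def styled_end(text, end, whole_phrase=False):
    """Extend a selection ending at index `end` (exclusive) per guideline (c).

    A final tsheg is always included. A final shad is included only for
    styles that mark whole phrases or sentences, such as sa bcad.
    """
    if end < len(text) and text[end] == TSHEG:
        end += 1
    if whole_phrase and end < len(text) and text[end] == SHAD:
        end += 1
    return end
```

For instance, a selection for a personal name that is followed directly by a shad stops
before the shad, while a sa bcad selection in the same position takes the shad with it.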
Style Sheet
This contains a list of official THDL markup styles and procedures, which will be
updated as more are formalized. Try to follow these styles on your markup project,
though not all of them will be applicable to every project.
Introductory Scenes (gleng gzhi)
Introductory scenes go in the main body of the outline (i.e. in [2] gzhung); they are not
part of the front matter (i.e. they are not part of [1] klad).
There appear to be two ways that introductory scenes are conceptualized by Tibetan
authors. (1) They are given their own chapter, for instance “chapter 1” of a tantra being the
introductory scenes. (2) They are not distinguished from the first chapter, although the
first chapter really has its own subject. In other words, they are not really part of the
stated subject matter of the first chapter, but there is no sa bcad that distinguishes them
from the first chapter.
The first case presents no problem for markup: just make the first chapter the
introductory scene, as it is in the work you are marking up. In the second case, we break
the introductory scene out of the first chapter and give it its own subhead (even though
the work itself doesn’t do this). Here, if there are two introductory scenes, you will want
to give them an overarching subhead “introductory scene” (gleng gzhi), and then within that
make subheads for the individual introductory scenes.
Names
It is quite helpful to have names identified in the markup. The NormalTib template has a
selection of name-styles (Name Buddhist Deity, Name Personal Human, etc.). There are
some special cases for names that should be kept in mind:
Speakers
If the name is the name of someone who is speaking (“Then Dorje Chang said...”), mark
it with a “speaker” style. In XML, a name could be marked as a “name” and then have a
qualifier attached indicating that it is also a “speaker,” a “deity,” and so forth. This is not
possible in Word, so we have created a group of specialized names like “Speaker
Buddhist Deity” to deal with the problem.
Epithets
Sometimes it may be helpful to mark “epithets.” For example in “Then the great Teacher
said...,” the word “Teacher” is an epithet rather than a name. These can be marked with
an “Epithet” style. Note that there are also “Speaker Epithet” styles.
Collective Names
It is fairly common to have collective speakers, such as “Then the host of deities spoke
with a single voice.” Such collective names are marked with an “epithet” style.
Shads
Rin chen spungs shad
In projects (like the 17 Tantras project) that intend to create an exact copy of the paper
version of a text, this shad should be used. In other projects, it can simply be replaced
with an ordinary shad, as the rin chen spungs shad is simply an artifact of where the line-
breaks occur in a paper text.
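In Unicode, the rin chen spungs shad and the ordinary shad are separate characters
(U+0F11 and U+0F0D), so the replacement can be done mechanically. The following Python
sketch shows one way to do it; the function name and the exact_copy flag are hypothetical
and simply mirror the two kinds of project described above.

```python
import unicodedata

RIN_CHEN_SPUNGS_SHAD = "\u0f11"
SHAD = "\u0f0d"

# Sanity check that the code points are the intended characters.
assert unicodedata.name(RIN_CHEN_SPUNGS_SHAD) == "TIBETAN MARK RIN CHEN SPUNGS SHAD"
assert unicodedata.name(SHAD) == "TIBETAN MARK SHAD"


def normalize_shads(text, exact_copy=False):
    """Replace rin chen spungs shad with an ordinary shad.

    Projects that reproduce the paper edition exactly pass exact_copy=True,
    in which case the text is returned unchanged.
    """
    if exact_copy:
        return text
    return text.replace(RIN_CHEN_SPUNGS_SHAD, SHAD)
```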