Total Textual Analysis: TEI
What is TEI?
The Text Encoding Initiative (TEI) is an established standard used by libraries, museums, and scholars in the humanities and social sciences to represent literary and linguistic texts electronically. It works like a language for encoding texts: markup identifies features that matter to these disciplines, such as character or place names, literary themes, paragraphs, or stanzas, so that texts can be searched by those features. As the TEI Consortium states on its website, the Guidelines are valuable for research, teaching, and the preservation of texts (The Text Encoding Initiative, home page, ¶ 1).
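To make the idea of searching "by element" concrete, here is a minimal sketch in Python using the standard library's ElementTree. It assumes a tiny, hypothetical TEI Lite-style fragment: the root is `TEI.2` as in P4, there is no XML namespace, and the `teiHeader` that a real file requires is left out. The helper names `names_of_type` and `count_stanzas` are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified TEI Lite-style fragment (P4-era: no namespace,
# and no teiHeader, which a real TEI file would need).
SAMPLE = """
<TEI.2>
  <text>
    <body>
      <lg type="stanza">
        <l>When <name type="person">Lucy</name> walked by <name type="place">Grasmere</name>,</l>
        <l>the lake lay still.</l>
      </lg>
      <p><name type="person">Dorothy</name> wrote from <name type="place">Grasmere</name>.</p>
    </body>
  </text>
</TEI.2>
"""

def names_of_type(xml_text, name_type):
    """Return the text of every <name> element whose type attribute matches."""
    root = ET.fromstring(xml_text.strip())
    return [el.text for el in root.iter("name") if el.get("type") == name_type]

def count_stanzas(xml_text):
    """Count <lg> (line group) elements marked as stanzas."""
    root = ET.fromstring(xml_text.strip())
    return sum(1 for lg in root.iter("lg") if lg.get("type") == "stanza")

print(names_of_type(SAMPLE, "place"))   # ['Grasmere', 'Grasmere']
print(names_of_type(SAMPLE, "person"))  # ['Lucy', 'Dorothy']
print(count_stanzas(SAMPLE))            # 1
```

Because the markup says *what* each piece of text is, the search is for "places" or "stanzas" rather than for a particular string of letters.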
TEI began in 1987 with the aim of developing standards for encoding machine-readable texts. In 1994 a “public release” version of the Guidelines, called P3, was made available; it has since become the de facto standard for encoding texts and other materials (The Text Encoding Initiative). The TEI project was re-established in 2001 as a member-funded, non-profit Consortium of institutions and research projects, whose members collectively maintain and develop the standard for representing texts electronically (The Text Encoding Initiative; Wikipedia, 2005). The current version of the TEI Guidelines, P4, is a revision of P3 that converts it from SGML to XML; the newest version, P5, is still in development.
The Guidelines define some 400 textual components and concepts, which are expressed in a markup language (SGML or XML). TEI is a modular scheme designed for customization to fit particular research projects and production tasks, and this flexibility makes many different applications of TEI possible (Wikipedia, 2005). However, the same modularity can detract from its usability, because customizing TEI to one’s own requirements demands a thorough knowledge and understanding of the schema. One solution to the difficulty of learning the entire standard is TEI Lite, a “best of” distillation of the full TEI that contains only the most commonly used elements; as a result, it is the most widely used version (The Text Encoding Initiative).
TEI in the Library
It would be wise to incorporate TEI into the library and eventually encode all the texts and materials in the collection. Judging from the ongoing projects that use TEI, the standard is highly relevant to a rare book library such as ours: a primary benefit of TEI (aside from the search functions it enables) is that it preserves texts in electronic form for posterity. This would not only help ensure the perpetuity of the collection but would also make its more delicate and rare items more accessible to library users. Establishing ourselves in the electronic library community would also let us work more closely and cooperatively with other libraries that have digitised their collections, and the resources we share would enrich the collective pool of knowledge that such collaboration creates.
A further advantage of encoding the texts in our collection is that it would greatly facilitate the research of students and scholars at the library. Members of the TEI Consortium are exploring ways to encode the elements and concepts in a text so that materials can be sorted and collated by particular themes and ideas quickly and thoroughly. An excellent example of this endeavour is The Orlando Project, based at the University of Alberta, whose aim is “to provide the first integrated history of British women’s writing” (The Orlando project, 2002, ¶ 1). The project uses TEI to encode not only identifying elements (title, author, etc.) but also what it calls “intellectual structures,” that is, the thematic and analytical elements that matter when evaluating texts. Orlando models possible uses for our own collection, particularly for special exhibits: we could sort the collection by theme, quickly gathering all our items on 19th-century French Naturalist literature or Early Modern plays, for example. Scholars visiting the library could do the same and quickly find the items that interest them; and if we stored electronic representations of our texts in a digital repository, our materials could also be compared with the holdings of other libraries.
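Sorting a collection by theme could look something like the following minimal sketch. It assumes each item carries a heavily simplified, hypothetical `teiHeader` whose `<keywords>` element holds a `<list>` of `<item>` terms; the item IDs, the themes, and the helper `index_by_theme` are all invented for illustration, and the headers are not validated against any TEI schema.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, heavily simplified teiHeaders, one per catalogued item.
# A real header also needs <fileDesc>; only theme keywords are shown here.
HEADERS = {
    "item-001": """
<teiHeader><profileDesc><textClass>
  <keywords><list><item>French Naturalism</item></list></keywords>
</textClass></profileDesc></teiHeader>""",
    "item-002": """
<teiHeader><profileDesc><textClass>
  <keywords><list><item>Early Modern drama</item></list></keywords>
</textClass></profileDesc></teiHeader>""",
    "item-003": """
<teiHeader><profileDesc><textClass>
  <keywords><list>
    <item>French Naturalism</item><item>Women writers</item>
  </list></keywords>
</textClass></profileDesc></teiHeader>""",
}

def index_by_theme(headers):
    """Map each keyword in <keywords>/<list>/<item> to the IDs of items carrying it."""
    index = defaultdict(list)
    for item_id, xml_text in headers.items():
        root = ET.fromstring(xml_text.strip())
        for kw in root.iterfind(".//keywords/list/item"):
            index[kw.text.strip()].append(item_id)
    return dict(index)

themes = index_by_theme(HEADERS)
print(themes["French Naturalism"])  # ['item-001', 'item-003']
```

An index like this is only as good as the keywords assigned during encoding, which is why consistent, agreed-upon theme terms would matter for exhibits and collection-wide searches.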
TEI would also be a useful resource for collection development: encoding would enable librarians to sort efficiently through the existing collection by theme or subject, assess what we own, and determine what would enhance the collection. Another advantage of TEI is its application in textual analysis and analytical bibliography, since it supports close study of a text’s physical form as well as its content. Moreover, computer-assisted comparison of encoded texts should make cross-checking easier and more accurate (provided that the texts have been encoded correctly). For example, a known edition of a text could be compared with an unknown one, looking for characteristic typographical marks that would help identify the unknown edition. In the same way, we could compare texts in our collection with each other and with copies of the same texts held in other libraries and institutions.
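The edition comparison described above might be sketched as follows, using Python's `difflib` on the verse lines of two encodings. The stanza and its spelling variant are invented, and the helpers `line_texts` and `differing_lines` are hypothetical. Note that flattening each `<l>` to plain text discards attributes such as `rend`, so a real comparison of typography would need to compare those too.

```python
import difflib
import xml.etree.ElementTree as ET

# Two hypothetical encodings of the same invented stanza; the second
# differs only in the spelling of one italicised word.
KNOWN = """
<lg type="stanza">
  <l>Upon the hill the old tower stands,</l>
  <l>Her <hi rend="italic">mirrour</hi> shone upon the wall,</l>
  <l>And evening settles on the lands.</l>
</lg>"""

UNKNOWN = """
<lg type="stanza">
  <l>Upon the hill the old tower stands,</l>
  <l>Her <hi rend="italic">mirror</hi> shone upon the wall,</l>
  <l>And evening settles on the lands.</l>
</lg>"""

def line_texts(xml_text):
    """Flatten each verse line <l> to plain text, including text inside <hi>."""
    root = ET.fromstring(xml_text.strip())
    return ["".join(line.itertext()) for line in root.iter("l")]

def differing_lines(known_xml, unknown_xml):
    """Return (operation, known_lines, unknown_lines) for each block that differs."""
    known, unknown = line_texts(known_xml), line_texts(unknown_xml)
    matcher = difflib.SequenceMatcher(a=known, b=unknown)
    return [
        (op, known[i1:i2], unknown[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]

for diff in differing_lines(KNOWN, UNKNOWN):
    print(diff)
# ('replace', ['Her mirrour shone upon the wall,'], ['Her mirror shone upon the wall,'])
```

Comparing line by line keeps the output readable for long texts; a finer, word-level comparison could be run on just the lines reported as different.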
The possible introduction of TEI into our library raises some important considerations, the most immediate of which are money and time. The TEI Guidelines cost US$90 (or US$60 if we become a member of the Consortium), which is reasonable given the advantages the standard would bring to our library. Training our librarians and other qualified staff to use the Guidelines, and then patiently encoding every item in the collection, would require a considerable investment of time. I also noticed wrong syntax, spelling errors, and inaccurate XML markup in materials encoded by many of the projects using TEI (see http://www.tei-c.org/Applications/ for examples of projects). These mistakes emphasise the importance of accuracy in TEI, since incorrectly encoded texts are not useful to anyone; however, the challenge of correct encoding should not deter us from adopting the TEI Guidelines. Plenty of help is available in the form of case studies, tutorials, presentations, and software. In addition, we are welcome to join any of the special interest groups (there is one for libraries) as well as the mailing list, which would help integrate us into the TEI community.
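Some of the syntax errors mentioned above can be caught automatically before a file is published. The minimal sketch below uses Python's standard library to check only that a file is well-formed XML (tags properly nested and closed, attributes quoted); it does not validate against the TEI DTD or schema, and it cannot catch spelling errors, which still need proofreading. The helper `well_formedness_error` and both sample strings are hypothetical.

```python
import xml.etree.ElementTree as ET

def well_formedness_error(xml_text):
    """Return None if xml_text parses as well-formed XML, else the parser's message.

    Syntax only: this does not check validity against the TEI DTD or schema,
    and it cannot detect spelling mistakes.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None

GOOD = '<p>Letters from <name type="place">Grasmere</name>.</p>'
BAD = '<p>Letters from <name type="place">Grasmere</p>.</name>'  # tags cross

print(well_formedness_error(GOOD))  # None
print(well_formedness_error(BAD))   # prints the parser's error, with line and column
```

Running a check like this on every file as it is encoded is cheap, and it catches exactly the kind of broken nesting that makes a text unusable for searching.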
Introducing TEI into our library will bring the demands of learning a new “language” and implementing it in our infrastructure. However, given a suitable time frame in which to devise strategies for incorporating TEI, the benefits of a descriptive standard tailored to the humanities and social sciences would be seen both in the maintenance and preservation of a rare books collection and in the knowledge developed through easier, more efficient access to and manipulation of the texts in our library.
References
The Orlando project: An integrated history of women’s writing in the British Isles. (2002, January 23). Retrieved February 10, 2006, from http://www.ualberta.ca/ORLANDO/
The Text Encoding Initiative. (n.d.). TEI: Yesterday’s information tomorrow. Retrieved February 10, 2006, from http://www.tei-c.org/
The Text Encoding Initiative. (n.d.). What is TEI Lite? Retrieved February 10, 2006, from http://www.tei-c.org/Faq/index.xml.ID=body.1_div.4
The Text Encoding Initiative. (n.d.). What is the TEI? Retrieved February 10, 2006, from http://www.tei-c.org/Faq/index.xml.ID=body.1_div.5
Wikipedia. (2005, November 16). Text Encoding Initiative. Retrieved February 10, 2006, from http://en.wikipedia.org/wiki/Text_Encoding_Initiative

2 Comments:
So far, this is the only report about the TEI that I've seen other than mine. Oh, hooray! Forgive me, though, if I've missed one on another blog.
I worked using TEI standards at UVic on the Robert Graves Project and had the pleasure of taking two intensive encoding workshops with some of the TEI muckimucks out there over the last couple of years. What can I say, I'm smitten. But I'm also intimidated by the agility with which some of those folks talk on the listserv about tagging and nesting, the way they describe the nuances and conventions of "div"s and "id type"s, never mind translations, "xref"s and other complexities. [On a side note, the blog wouldn't let me publish this comment with stray angle brackets sitting around, hence the quotation marks!]
I really do believe what I said in my own report, but was also happy to see you mention the advantage of joining the larger community of libraries who are encoding their collections, a point I had noted beside my computer but never worked into the thousand words. Thanks for bringing that up, Natalie.
BTW, I hope that when you *are* missing from the Inforum, you are doing something you really want to be doing. How about during reading week?!
I was very happy to read your report on TEI; I found the project information so interesting. Unfortunately, it did not fit in with my own assignment, so I did not use it.
I agree, though, that it does sound like a programme that is useful for archiving materials. The users who already have sites are very diverse and show just what can be done with the programme.
Pat