Standards for Encoding Linguistic Data.

by Martin Kay

Language processing is a special art that has accumulated considerable experience and technique, but few programmers are aware of it. Encoding conventions should be designed for the typist's or keypuncher's convenience. Given a simple formalized description of the input conventions used, computer programs of the Rand-Grenoble Catalog Input/Output System automatically convert the text into a standard internal coding scheme. The researcher can have his output printed in one style on a local machine and then have the same material printed for permanent use with other conventions on a more sophisticated machine. The change from six- to eight-bit magnetic tape with the IBM System 360 allows for 15 different alphabets with separate upper and lower case letters, and enough codes (128) to represent most syllabaries. Global symbols, the same in all alphabets, include numerals, punctuation marks, diacritics, and indicators of italics, boldface, etc. (Prepared for publication in Computers and the Humanities.) 15 pp. Ref.
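The conversion idea in the abstract, in which keypunch-friendly input conventions are mapped onto a standard internal coding with alphabet-specific letters and alphabet-independent global symbols, can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper or from the Rand-Grenoble system: the `$R` and `$G` alphabet-shift markers, the `*` capital marker, the alphabet names, and the function `to_internal` are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical input convention (not from the paper): the typist keys
# everything in upper case, "*" before a letter marks a capital, and
# "$R" / "$G" switch the current alphabet to roman / greek.
ALPHABET_SHIFTS: Dict[str, str] = {"$R": "roman", "$G": "greek"}
CAPITAL_MARK = "*"


def to_internal(text: str) -> List[Tuple[str, str]]:
    """Convert keypunched text to (alphabet, character) pairs.

    Letters carry the current alphabet; numerals, punctuation and
    spaces are tagged "global" because they are shared by all alphabets.
    """
    out: List[Tuple[str, str]] = []
    alphabet = "roman"  # assumed default alphabet
    i = 0
    while i < len(text):
        # Two-character alphabet-shift markers change state, emit nothing.
        if text[i:i + 2] in ALPHABET_SHIFTS:
            alphabet = ALPHABET_SHIFTS[text[i:i + 2]]
            i += 2
            continue
        ch = text[i]
        # A capital mark applies to the next character only.
        if ch == CAPITAL_MARK and i + 1 < len(text):
            out.append((alphabet, text[i + 1].upper()))
            i += 2
            continue
        if ch.isalpha():
            out.append((alphabet, ch.lower()))
        else:
            out.append(("global", ch))
        i += 1
    return out


if __name__ == "__main__":
    for pair in to_internal("*KAY, $GAB"):
        print(pair)
```

Because the conventions live in small tables rather than in the conversion logic, a different keying style could in principle be supported by swapping the tables, which is the spirit of the "simple formalized description" the abstract mentions.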

This report is part of the RAND Corporation Paper series. The paper was a product of the RAND Corporation from 1948 to 2003 that captured speeches, memorials, and derivative research, usually prepared on authors' own time and meant to be the scholarly or scientific contribution of individual authors to their professional fields. Papers were less formal than reports and did not require rigorous peer review.

The RAND Corporation is a nonprofit institution that helps improve policy and decisionmaking through research and analysis. RAND's publications do not necessarily reflect the opinions of its research clients and sponsors.