A Multilingual Electronic Text Collection of Folk Tales
for Casual Users Using Off-the-Shelf Browsers

Myriam Dartois*, Akira Maeda**, Tetsuo Sakaguchi*, Takehisa Fujita***
Shigeo Sugimoto*, Koichi Tabata*

* University of Library and Information Science
1-2 Kasuga, Tsukuba, Ibaraki 305, Japan
** Nara Institute of Science and Technology
8916-5 Takayama, Ikoma, Nara 630-01 Japan
*** Kyoritsu Women's University
3-27 Kanda-Jinbocho, Chiyoda-ku, Tokyo 101, Japan

{myriam, saka, sugimoto, tabata}@ulis.ac.jp,
[email protected],
[email protected]

http://www.DL.ulis.ac.jp/oldtales

D-Lib Magazine, October 1997

ISSN 1082-9873

Abstract

Folk tales are part of every nation's cultural heritage. They are a way to share material, discover another culture, and learn its language. This article discusses a multilingual collection of Japanese old tales built on a multilingual browsing tool for HTML texts. Every tale in the collection is written in English, French, and Japanese. The texts are encoded in HTML and provided to users via the Internet. A user can browse a tale in all three languages in parallel using only an off-the-shelf browser. We also discuss the lessons learned from building this multilingual e-text collection.


Table of contents

1. Introduction
2. Electronic Text Collection of Japanese Old Tales
3. Multilingual HTML
4. Lessons Learned from the Experiment
5. Conclusion


1. Introduction

The World Wide Web (WWW) provides access to documents written in many languages. We often find parallel versions of WWW pages with the same content written in two languages. In most cases, one is a local language, for readers who understand it, and the other is English, the common language of the global network. Pages in a local language are also, in effect, aimed at WWW browsers localized to display text in that language, so users who wish to read such pages have to adjust their browsers accordingly. An ideal multilingual browsing environment would let users browse WWW documents regardless of the language or languages in which they are written. However, off-the-shelf browsers such as Netscape Navigator and Internet Explorer can display only documents encoded in a character code set accepted as a regional standard, which usually includes the ASCII characters. As a result, we cannot display truly multilingual documents in parallel formats, in the way that a traveler's guidebook with small dictionaries in six languages does, or the Rosetta Stone, for that matter. Nor can we browse, for example, a paper in English on cultural interaction between Europe and Asia that includes words and phrases written in European and Asian scripts. This kind of "real" multilingual e-text offers many opportunities. Web documents and e-texts are useful information resources for researchers working on international and multicultural issues. Multilinguality is also important for countries where two or more languages are in common use and for organizations like the European Community, which have to handle documents written in several languages.

Another important aspect of multilingual e-texts is access to foreign languages and cultures. With the expansion of the WWW, the general public and even children have become an important category of users. For example, we can find a number of home pages that provide folk tales for children as well as grown-ups. Folk tales can serve as an introduction to the culture of a country and as enjoyable reading that develops children's interest in foreign cultures. Since every nation has tales, they are rich material for a multilingual collection aimed at users world-wide. Indeed, some folk tale home pages provide multilingual access [1][2]. However, they contain distinct versions of the same stories in different languages.

Web pages that provide multilingual texts have to make those texts accessible through off-the-shelf browsers. However, browsers cannot display all languages, because of limits on the character code sets they accept and the font sets available to them. For example, if you are in Europe or in the United States, your browser may not be able to display documents written in an Asian language that does not employ the Roman script, e.g., Japanese, Thai, and Korean. One solution is for users to install additional font sets on their machines. However, it is not practical to install fonts for every language used on the Internet.

Unicode will ease the multilinguality problem, but users will still have to load fonts for the complete Unicode character set. We believe that the key to multilingual information distribution is technology that is simple and inexpensive for users, especially those who access multilingual information only casually. To that end, we have developed a multilingual browser, called the Multilingual HTML Browser (MHTML Browser), which allows users to view documents written in foreign languages without installing any fonts on their machines in advance. The MHTML Browser is implemented in Java, so the user needs only an off-the-shelf WWW browser capable of running Java applets.

This article presents a multilingual e-text collection of Japanese old tales, which is provided to users by means of the MHTML technology. The collection contains e-texts written in English, French, and Japanese. This article describes the MHTML technology only briefly; details are given elsewhere [3][4][5].


2. Electronic Text Collection of Japanese Old Tales

2.1 Folk Tales as Multilingual Information Resource

Folk tales are a world-wide cultural heritage. Each nation has its own folk tales, which have been transmitted from generation to generation. Tales were transmitted orally and were written down only rather recently in our history. A folk tale has a number of variations and shows us the rich cultural background of a nation. Folk tales are regional, but they are also universal: many of their themes are common among nations. They tell us about humankind and the rules of the human community; for example, that one must not judge a man by his appearance or his size. This is the theme of the Japanese tale "Issunboshi" and of the French tale "Le Petit Poucet". Both heroes are small, but both are courageous and clever. Another common feature is the diversity of protagonists and leading characters, e.g., human beings, animals, imaginary beings, and everyday objects. Folk tales also reveal cultural diversity, e.g., in their descriptions of everyday life and habits, the characteristics of their leading characters, and the background of a story. Thus, folk tales are very rich material for sharing cultural heritage across the world and for understanding cultural diversity.

Folk tales have been transmitted from parents to children, and on to grandchildren, as an enjoyable way to learn what they need in order to live in their community. In some cases there are several variants of a tale, which differ region by region or even village by village. Thus, folk tales are an important part of the culture of a nation, a region, or a village, and they give people a common background as members of a community. For the people of a community, which may be nation-wide, region-wide, or village-wide, the leading character of a tale stands for a certain quality, e.g., bravery, gentleness, or beauty. For example, the leading character of the Japanese folk tale "Momotaro" is a young boy called Momotaro, who is a symbol of strength and bravery for the Japanese people.

Folk tales have been transmitted orally from one generation to the next within a community. Such a community is, in a sense, quite different from the communities on the WWW and the Internet. The traditional community is built on geographical proximity, but geographical distance is meaningless on the Internet, which provides us with a global communications infrastructure. Because users can access foreign information very easily, cross-lingual and cross-cultural information is crucial on the Internet. In addition to researchers and students, members of the general public and children are an important part of the user communities on the Internet. We have been working on the multilingual folk tale collection in order to create a shareable information resource, based on multiple cultures, for users with different cultural backgrounds.

2.2 Building the Folk Tales Collection

The multilingual e-text collection of Japanese old tales contains ten tales chosen from among well-known Japanese tales. Every tale is written in Japanese, English, and French. The collection has three parallel entrance pages (i.e., home pages), one in each of these languages, linked from the primary entrance [http://www.DL.ulis.ac.jp/oldtales/]. Providing an entrance page in each language is important for users, but it makes our maintenance procedure a little more complicated. All three home pages are organized in the same structure, which we explain in this section using the English pages. Three pages are linked from the home page: a multilingual page, a monolingual page with MHTML support, and a monolingual page without MHTML support.

The multilingual page contains a multilingual text viewer implemented as a Java applet based on the MHTML technology. The first page contains a table of contents written in English, French, and Japanese. The table of contents is displayed in parallel as illustrated in Figure 1.

Figure 1 : Table of contents

This multilingual table of contents is implemented as a table in which each cell is an applet that displays multilingual text. Since font glyphs are supplied by the folk tales server, a user need not install fonts in advance. A click on the title of a story, in whichever language, leads the user to the chosen tale. The texts of the tale in the three languages are then displayed in parallel, as shown in Figure 2.

Figure 2 : Trilingual page of a folk tale

Illustrations are added to the texts as an important component to make the texts attractive for readers, especially for children. Every document linked from the table of contents is organized in the same way. This use of multilingual e-text is reminiscent of bilingual books, which allow the reader to switch immediately from one language to the other. The advantage of this system is its flexibility; it is possible to present the same text in three or even more languages in parallel. Moreover, the system has the potential to offer more flexible services, such as letting a user choose the set of languages in which to read the texts.

In the monolingual pages, readers can get a text written in English, French, or Japanese. The monolingual page without MHTML provides the texts in plain HTML; readers are assumed to have Japanese and Latin-1 fonts to read them. The monolingual page with MHTML offers another presentation of the multilingual texts, for users who do not want to browse several languages in parallel. This access point to the Japanese old tales collection provides the user with the text of a tale in one language only. The text is displayed using the same applet as in the multilingual page; the difference is that, in this case, only one panel is displayed on the screen. Figure 3 shows a page displaying one text.

Figure 3 : Monolingual page of a folk tale

As shown in the next section, the multilingual document panel implemented as an applet receives an object which contains a source text string and the minimum set of font glyphs required to display the text. Since the object is automatically created from the source HTML text by an MHTML server, the tales are encoded purely in HTML. The same HTML text is used for these three different interfaces.


3. Multilingual HTML

The basic concept of the MHTML technology is to send a client an object composed of a source HTML text and the minimum set of font glyphs required to display the text. The object, called the MHTML encapsulated document object, is displayed on the client by a viewer implemented as an applet. Figure 4 shows the structure of the MHTML document object. Since all the required font glyphs are sent to the client on demand, the client need not load fonts in advance.

Figure 4 : MHTML document object
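
The structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation (which is a Java server and applet): the names `lookup_glyph` and `build_document_object` are hypothetical, and each "glyph" is a dummy 32-byte placeholder standing in for real bitmap font data.

```python
# Hypothetical sketch: bundle a source text with only the glyphs it needs.
# A real server would read bitmap fonts; here every glyph is a dummy
# 32-byte placeholder derived from the character's code point.

def lookup_glyph(ch: str) -> bytes:
    """Stand-in for a font lookup (assumption: 16x16 1-bit bitmap = 32 bytes)."""
    return ord(ch).to_bytes(4, "big") * 8


def build_document_object(source_text: str) -> dict:
    """Return the source text plus the minimum glyph set required to draw it."""
    needed = sorted(set(source_text))  # each distinct character exactly once
    glyphs = {ch: lookup_glyph(ch) for ch in needed}
    return {"source": source_text, "glyphs": glyphs}


# "むかしむかし" has six characters but only three distinct ones,
# so only three glyphs travel with the text.
doc = build_document_object("むかしむかし")
```

The point of the sketch is that the glyph table is keyed by distinct characters, so a viewer on the client side needs nothing beyond the object itself to draw the text.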

The MHTML browser system is composed of two components: an MHTML server and an MHTML viewer applet. The MHTML server converts an HTML document into an MHTML document object on the fly and sends the object and the applet to a client. The applet running on the client receives the object and displays the text encapsulated in it. Figure 5 shows an overview of the MHTML browser system. (The details of MHTML are described in [3][4][5].)

Figure 5 : Overview of the MHTML Browser system

The MHTML technology offers the following advantages for browsing multilingual documents.

  1. Users need nothing except an off-the-shelf WWW browser which supports Java applets.

  2. Since characters recur more often in longer texts, the ratio of the size of a document object to the size of its source HTML text decreases as the source text grows.

  3. MHTML service providers can install their own glyphs on their servers, in addition to standard fonts, in order to make locally defined characters visible on a remote client. This feature is crucial for e-texts of classic materials in, for example, Japanese and Chinese, which often include characters not defined in any industrial standard character code set.

  4. An MHTML server can be implemented as an intermediary server that receives a client's request for a foreign-language document held on a WWW server, fetches the document, and sends the client the MHTML document object created from the source, together with the MHTML viewer applet. (We call this intermediary service the MHTML Gateway. See [http://mhtml.ulis.ac.jp/].)

  5. Input functions for foreign texts can be implemented based on the MHTML technology. A text input function is a mapping from a key input sequence to a character code or to a character code string. The mapping function can be located in a remote server as well as the MHTML server which makes a character code string in a foreign language visible to the user.
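
The size argument in item 2 can be illustrated with a small calculation. The model below is a simplifying assumption rather than a measurement of the actual system: it counts the text as UTF-8 bytes, charges a fixed 32 bytes per distinct character for its glyph, and ignores any other overhead in the object. The names `GLYPH_BYTES` and `size_ratio` are hypothetical.

```python
# Hypothetical size model: object size = text bytes + one glyph per distinct character.
GLYPH_BYTES = 32  # assumption: one 16x16 monochrome bitmap per glyph


def size_ratio(source_text: str) -> float:
    """Ratio of (text + glyph bytes) to text bytes under the model above."""
    text_bytes = len(source_text.encode("utf-8"))
    glyph_bytes = GLYPH_BYTES * len(set(source_text))
    return (text_bytes + glyph_bytes) / text_bytes


short_text = "むかしむかし"      # 6 characters, 3 distinct
long_text = short_text * 50     # 300 characters, still 3 distinct

print(round(size_ratio(short_text), 2))
print(round(size_ratio(long_text), 2))
```

Repeating one phrase is an artificial extreme, since a real tale introduces new characters as it grows. The trend is the same, though, as long as new characters appear more slowly than the text lengthens.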

4. Lessons Learned from the Experiment

4.1 Issues in Adaptation of Folk Tales

Building a multilingual folk tales collection raises other problems as well, such as collecting the tales and translating them into other languages. First, let us describe our strategy for collecting tales. We chose ten famous folk tales. Those tales are very old and have no copyright restrictions. (Books containing those tales are subject to copyright, but the tales themselves are in the public domain.) We wrote our own versions of the tales in Japanese and translated them into French and English.

This adaptation was necessary not only to resolve the copyright problems but also to cope with variants of the tales. Each tale has several variants; some were created by the authors of books and some are regional variants. The range of variation within a tale is quite large, from the characters' names to key elements of the story. For example, the Japanese tale "Cracking Mountain" has what we could call a "hard" and a "soft" version. In the "hard" one, the wicked badger kills an old woman, cooks her, and, after taking on the appearance of the old woman, makes the husband eat the stew made out of his own wife. In the "soft" version, there is no human stew. So we had to cope with the variants of each tale and, just as other authors did, produce our own adaptation. For this purpose, we examined and compared at least four or five variants, and also read documents on the sources of the tale. We then selected elements and wrote our own version of the folk tale.

Adaptation was also required to deal with the characters that appear in a tale and what they represent. Quite a few characters are common to the tales of different countries, but they carry different meanings. For example, the tortoise is a symbol of longevity and mutual love in Japanese tales (e.g., "Urashimataro") but a symbol of tenacity in Europe (e.g., "The Hare and the Tortoise"). This point is quite important for translation from Japanese into English and French.

4.2 Issues in Translation of Folk Tales

We first wrote a Japanese text of each tale, which is regarded as the source text in our collection. Then we translated it into English and French. At this point, we had to deal with the translation of concepts. The basic concept of an ogre is common to both Japan and Europe, but there are several differences. In Europe, an ogre is usually a nearly human character, a man who eats little children (e.g., "Le Petit Poucet", a French tale). Sometimes it may be an old woman, as in "Hansel and Gretel", but this is not common. In Japan, an ogre ("Oni" in Japanese) is not a human being; it is much more a monster, which behaves like a robber and usually does not eat people (e.g., "Momotaro the Peach-boy", "Issunboshi"). It was therefore quite difficult to choose a term for the Japanese "Oni", because the word "ogre" conveys a different meaning in Europe and there is no appropriate word for "Oni" in the European cultural context. The term "ogre" seems more appropriate as a translation of the Japanese "Yamamba", a character who is usually an old woman who eats people. But "ogre" generally refers to a man, and an old woman like "Yamamba" would be a witch. Translation thus raises many problems, which have to be solved by more or less free interpretation in order to preserve the integrity of the original tales.

4.3 Issues of Human Resources

Human resources, i.e., native speakers and/or language specialists, are crucial to developing a multilingual system. The translated texts of the e-text collection have been checked by native speakers, which is necessary to guarantee their quality. This check addresses not only grammatical accuracy but also the choice of words and phrases, which requires cultural sensitivity. The MHTML technology is advantageous in this respect: MHTML servers can be placed wherever the human resources are available to build a collection in the local language, and the distributed collections together form a multilingual collection.

4.4 Issues Specific to Japanese e-texts: Multiple Character Sets

With respect to the Japanese texts, we had to deal with the problem of Chinese characters, which are called "Kanji" in Japanese. The number and difficulty of the Kanji used largely determine the difficulty of a text. Since we wanted the texts to be easy for children to read, we had to restrict ourselves to "Hiragana", a set of about 50 phonetic characters, and the limited number of Kanji that children learn in the lower grades of elementary school. However, a text written mainly in Hiragana is not easy for an adult to read. The usual solution in Japanese books is to write the text with Kanji wherever an adult reader would expect them and to add a Hiragana transcription, called "Furigana", as a small superscript to the Kanji for children. This additional text would be useful not only for children but also for foreign readers. However, it is difficult to add Furigana because HTML has no function for adding such superscripts.
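
One possible plain-text fallback, shown here only as a hedged sketch and not as something the collection uses, is to put the phonetic reading in parentheses directly after each Kanji word. The function name `annotate` and the small `readings` table are hypothetical; a real text would need a reading for every Kanji word it contains.

```python
import re


def annotate(text: str, readings: dict) -> str:
    """Append each word's phonetic reading in parentheses (hypothetical fallback)."""
    if not readings:
        return text
    # Longest words first, so the pattern prefers "桃太郎" over a shorter entry like "桃".
    words = sorted(readings, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(w) for w in words))
    # A single pass over the text, so inserted readings are never re-annotated.
    return pattern.sub(lambda m: f"{m.group(0)}({readings[m.group(0)]})", text)


readings = {"桃太郎": "ももたろう", "鬼": "おに"}
annotated = annotate("桃太郎は鬼をたいじしました。", readings)
print(annotated)  # 桃太郎(ももたろう)は鬼(おに)をたいじしました。
```

Inline parentheses lengthen the line and are less pleasant to read than a true superscript, which is why this can only be a stopgap.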


5. Conclusion

The multilingual e-text collection of Japanese old tales provides a multilingual environment in which users can read old tales written in several languages. The collection currently has ten famous Japanese tales written in English, French, and Japanese, and can be expanded to other languages and more tales. The purpose of the collection is to help users discover a foreign culture and use multilingual e-text as a tool for learning.

We hope to extend the collection to other languages in order to fully use the multilingual environment and its capacities. We also plan to extend it to the tales of other countries in the future.

In addition to the e-texts, our future work also includes defining metadata for multilingual texts and creating a flexible user interface for children. We believe that metadata for coexisting multilingual texts have to be provided to extend the collection. Since the old tales collection is created not only for adult readers but also for children, we are working on a user interface designed for children making use of images and animations.


Acknowledgements

The authors would like to thank Frances Marr, who was an English teacher at ULIS. We could not have translated the tales into English without her help.


References

[1] Korean Old Tales (http://www.lg.co.kr/public_html/)

[2] The Fairy Tales of Ika Bremer (http://www.ika.com/stories/)

[3] Akira Maeda, Takehisa Fujita, Lee Swee Choo, Tetsuo Sakaguchi, Shigeo Sugimoto, Koichi Tabata: A Multilingual Browser for WWW without Preloaded Fonts. Proceedings of the International Symposium on Digital Libraries 1995 (ISDL95), p.269-270. Aug. 1995.

[4] Tetsuo Sakaguchi, Akira Maeda, Takehisa Fujita, Shigeo Sugimoto, Koichi Tabata: A Browsing Tool for Multi-lingual Documents for Users without Multi-lingual Fonts. Proceedings of the 1st ACM International Conference on Digital Libraries, p.63-71. Mar. 1996.

[5] Dartois M. et al.: Building a Multilingual Electronic Text Collection of Folk Tales as a Set of Encapsulated Document Objects: An Approach for Casual Users to Browse Multilingual Documents on the Fly. Proceedings of ECDL'97 (Lecture Notes in Computer Science 1324, Springer), p.215-231. 1997.

Copyright © 1997 Myriam Dartois, Akira Maeda, Tetsuo Sakaguchi, Takehisa Fujita, Shigeo Sugimoto, Koichi Tabata


hdl:cnri.dlib/october97-sugimoto