Creating the Greek CELEX database, technical or managerial challenge?
1. INTRODUCTION

CELEX is the interinstitutional computerized documentation system for European Community law. It is a database that contains acts of Community legislation and case law with their full text, as well as bibliographical data on preparatory acts and parliamentary questions (see ref. 1).

CELEX has been established as a multilingual system because the Treaties provide for nine official working languages. This multilingual aspect of CELEX is not a mere luxury but a matter of primary importance for the European citizen: Community law in many cases supersedes national law, which is why CELEX, the main distribution channel for Community law, is considered a pillar of the united Europe of 1993.

CELEX already exists in French, English, German, Dutch and Italian. The Danish and Greek versions are under preparation, while the Spanish and Portuguese versions are to follow.

2. THE TECHNICAL CHALLENGE OF MULTILINGUALISM

The existing versions of CELEX do not contain special Latin characters (i.e. accented French or German letters); the code used for these versions is the basic alphabet (ISO 646). Texts with special characters are transposed into standard ASCII and then further transposed into upper case (“appauvrissement”, i.e. impoverishment). Some characters are transliterated as one character and some as two (see Table 1).

TABLE 1: TRANSLITERATION OF SPECIAL LATIN CHARACTERS IN CELEX DOCUMENTS (appauvrissement).
(Upper-case letters are similarly transliterated as one or two capital letters, or as one capital and one lower-case letter if they are at the beginning of a word.)

The creation of a Greek CELEX database in such a “poor” environment is possible. One way would be to replace lower-case Latin letters with upper-case Greek ones; another would be to replace the Latin alphabet with the Greek. Serious problems would arise if either solution were adopted. First, legal texts written only in upper-case letters are not recognized as binding in Greece. Second, the Community’s legal texts in Greek do contain words in Latin script (“sui generis” decisions, “ad hoc” committees, the “ESPRIT” programme) that cannot be translated or transliterated into Greek and can in fact constitute useful search terms in a Greek database. Last but not least, the accents can in no way be omitted from Greek texts. In Greek, accents mark the stressed syllable and can play an important role in making the meaning of a word clear: words may be spelled the same but have different meanings depending on which syllable is stressed (see Table 2, ref. 2).

TABLE 2: POLYSEMY IN GREEK WORDS (table adapted from ref. 4)
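The effect of “appauvrissement” on Latin and on Greek text can be illustrated with a short sketch. It is only an illustration under stated assumptions: the transliteration pairs in `LATIN_MAP` are hypothetical (the actual pairs of Table 1 are not reproduced here), the function names are invented for this example, and the Greek word pair is a commonly cited one rather than an entry taken from Table 2. The sketch also folds everything to upper case and ignores the special rule for word-initial capitals.

```python
import unicodedata

# Illustrative mapping only: Table 1 is not reproduced here, so these
# pairs are assumptions, not the actual CELEX transliteration rules.
LATIN_MAP = {
    "é": "e", "è": "e", "ê": "e", "à": "a", "ç": "c",   # one character
    "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss",          # two characters
}


def impoverish_latin(text: str) -> str:
    """Transliterate special Latin characters, then fold to upper case."""
    lowered = unicodedata.normalize("NFC", text).lower()
    return "".join(LATIN_MAP.get(ch, ch) for ch in lowered).upper()


def impoverish_greek(text: str) -> str:
    """Apply the same idea to Greek: drop the accents, fold to upper case."""
    decomposed = unicodedata.normalize("NFD", text)
    bare = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return bare.upper()


# Latin text survives impoverishment in a still readable form.
latin_result = impoverish_latin("décision")

# Two different Greek words (assumed example pair, not taken from Table 2):
# "νόμος" (law) and "νομός" (prefecture) differ only in the stressed syllable.
law, prefecture = "νόμος", "νομός"
collapsed = impoverish_greek(law) == impoverish_greek(prefecture)
```

In this sketch the impoverished Latin word remains recognizable, whereas the two Greek words become indistinguishable once the accent is removed, which is the difficulty described above.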
All these considerations led to the conclusion that an extended character set was needed for the Greek base. This set would provide for the basic Latin characters plus the Greek accented and non-accented letters. What was needed, in fact, was an 8-bit standard for the Greek-Latin alphabet similar to the one used for western European languages (ISO 8859/1). The standard would have to be international, because that is what the Commission is expected to prefer (see ref. 3). At the same time, the standard would have to be implemented in real-world products, particularly in:
The Commission’s Informatics environment made the challenge still greater by stipulating conformity with the then newly established Informatics Architecture (see ref. 4). That was quite logical. It would be unthinkable to introduce a special multilingual terminal (including Greek) into the Commission, and especially into the Translation divisions, in order to interrogate CELEX if that terminal could not also be connected to the word-processing system on the departmental computer, to EURODICAUTOM or to other bases outside the Commission. The existing situation, where three different terminals are used for these tasks (VT 100s for accessing internal and external databases, ETS 2010s for word processing and SIEMENS 9751s for the Greek EURODICAUTOM), is not very rational and could not serve as an example.

3. MEETING THE CHALLENGE

In January 1986, when an official was given the task of setting up a Greek version of CELEX,
It was originally thought that the Greek CELEX base could be created with the same procedures used for the existing language versions. By mid-1986, however, it was realized that a special project and additional resources would be needed.

Before the project could advance, standards had to be drawn up. Special contacts were established between ELOT (Hellenic Standards Organization) and the convenor of the corresponding ISO Working Group (who was working at the time as an expert at the Commission’s DG XIII). As a result, an 8-bit Greek standard was established jointly by ELOT and ISO in June 1986 (ELOT 928), and it soon became an ECMA and ISO standard (ISO 8859/7).

Meanwhile, contacts were established with industry (see ref. 5) to provide for terminals incorporating Greek characters according to the standard while conforming with the Informatics Architecture (VT 220 compatibility). An interim solution via DRCS (dynamically redefinable character sets) was also envisaged but was not put into practice at the time because of lack of resources.

Following established procedures (see ref. 5), a new feasibility study was adopted by Commission departments in January 1987.
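What the 8-bit Greek-Latin standard mentioned above (ELOT 928, later ISO 8859/7) makes possible can be sketched in a few lines. The example assumes Python’s built-in `iso8859_7` codec, which reflects the standard as maintained today and may include later additions to the 1980s edition; the sample phrase and the helper names `to_8bit` and `fits` are illustrative and do not come from the CELEX project.

```python
import unicodedata

GREEK_LATIN = "iso8859_7"  # Python's codec name for ISO 8859-7


def to_8bit(text: str) -> bytes:
    """Encode mixed Greek-Latin text, one byte per character."""
    return unicodedata.normalize("NFC", text).encode(GREEK_LATIN)


def fits(text: str, codec: str) -> bool:
    """Report whether every character of the text exists in the codec."""
    try:
        unicodedata.normalize("NFC", text).encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Assumed sample: an accented Greek word followed by a Latin expression.
sample = "Απόφαση sui generis"
data = to_8bit(sample)

# Latin letters keep their 7-bit codes; Greek letters use the upper half.
latin_part = bytes(b for b in data if b < 0x80)
greek_part = bytes(b for b in data if b >= 0x80)
```

Under these assumptions the same byte stream carries accented Greek and unmodified Latin text side by side, which the basic 7-bit alphabet alone cannot do (`fits(sample, "ascii")` is false).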
Progress of the project was conditioned by the availability of terminals on the list of approved hardware and software products for use by the Commission, and by the availability of a DBMS (MISTRAL V5) supporting a multilingual environment (including Greek).

By autumn 1987, a multilingual terminal was submitted for testing in the Commission’s Informatics Workshop (the SCRIBEL terminal by CREL; the company was later renamed TIL Technologies and resubmitted the product under the name ALTAIR). Although the layout of the Greek keyboard was not ergonomically correct, a series of tests with the DBMS proved conclusive; MISTRAL V5 was accepted in February 1988 as capable of supporting a multilingual environment. The acceptance was based on the creation of a test base in Greek containing 5 documents; it was possible to interrogate that base successfully from Greece (using 8-bit transparent access).

In parallel, in response to ever-increasing political interest in the proper introduction of Greek into the Commission’s computer systems, and looking ahead to the Greek presidency (July 1988), special attention was given to the implementation of a correct version of Q-one, the Commission’s standard word-processing package under UNIX. Until then, Q-one had been using an ambiguous coding for Greek, which had caused serious problems, e.g. in file transfer and alphabetical ordering. A new table for Greek was implemented, and the appropriate keycaps, printcaps, termcaps and collate files were created. It thus became possible by June 1988 to produce and print Greek texts on Q-one, using standard VT 220 terminals (WYSE 65) and laser printers. This was, of course, done by means of a DRCS solution. Some problems that still remain will be solved through the use of true Greek-Latin VT 220 terminals and the implementation of special Greek laser printer fonts. A special program (still to be developed) will allow data transfer from the database to Q-one (downloading) as well as from Q-one to the database (uploading, i.e. data input). It is hoped that the upgraded Q-one will go into general use by October 1988. It will be accompanied by an updated version of ILS, the Commission’s pivot transcodification system. ILS (INSEM Local Server) permits the transfer of texts between a series of
word-processing systems (Olivetti ETS 2010, Philips, etc.) and Q-one.

Other products properly supporting multilingualism have also been submitted for testing in the Informatics Workshop (EURO-PC by SIEMENS) and are expected to figure on the Commission’s lists of approved products soon.

4. THE GREEK CELEX ITSELF

CELEX documents consist of two main parts: the analytical part (keywords classifying documents by type or subject matter, relevant dates, names of authors, relations to other documents, etc.) and the textual part (title, full text). Data of the analytical part are fed in coded form into a special file (called ARCHIVE). By the use of special multilingual tables, the ARCHIVE codes are machine-translated into the respective languages before being introduced into the corresponding CELEX bases. Titles in all languages are also introduced into the ARCHIVE in order to offset the fact that texts arrive later.

Therefore, for a Greek database, three types of data are to be considered:
The CELEX tables can be completed for Greek very easily through the use of the THELEM data entry package.

The titles of CELEX documents are to be introduced via Q-one or by the use of data already existing on magnetic media (e.g. the tapes used to produce the Greek version of the “Directory of Community Legislation in force”).

A large proportion of the texts (legislation, case law) is available on magnetic media of different formats, mainly from the Office for Official Publications. Simple transcodification and formatting programs are to be developed in order to feed these data into the base.

Now that the basic building blocks (terminals, DBMS, data entry software) are almost in place and (almost) linked together, writing the above-mentioned programs that feed the data into the base can be considered a rather routine task. If the necessary resources (6 programmer-months) are allocated as expected in July 1988, the infrastructure should be completed by early 1989; the base will then be loaded with analytical data and texts and will be opened to the public in the course of 1989.

5. SOME INTERESTING PROSPECTS

When a general problem is solved, minor specific
sub-problems become easier to solve too. The introduction of the Greek script into the Commission’s Informatics Architecture through the Greek CELEX project means that all the Latin-alphabet languages can be treated correctly too. The “Latin” versions of CELEX can now contain special characters (“rich” versions), and these versions can be incorporated into the Commission’s office automation environment. The special transfer programs developed for Greek merely need adapting. The new modernization plan for CELEX, the adoption of which is pending, provides for the introduction of the full texts of Commission proposals into CELEX. When the final text of these proposals is adopted by the Council, translators will need only to download the proposal into their word processor, introduce amendments as necessary and send off the final text to the relevant authorities (electronically or on paper).

As far as the Greek CELEX itself is concerned, there are further prospects. It will be the first full-text database to be available on-line in Greece. This means that it will serve as a pilot for the opening up of an information market in Greece, in line with the Community programme for the development of a specialized information market in Europe (see ref. 6).

In addition, the Greek CELEX can serve as a benchmark for a more advanced morphological study of the Greek language. Last but not least, all language versions of CELEX can serve as test beds for new machine translation packages, because a CELEX document is identified by a single document number that is the same in all its language versions. So, if one finds a decision or a regulation in the English base, one can find the German version in seconds using its document number.

July 1988

ACKNOWLEDGEMENTS
Special thanks are due to the system supplier, Mr. Jose MARIN-NAVARRO of the Service for the Development of Applications; valuable advice was provided by Mr. Marios RAISSIS of the Computing Center, Ms Georgia EFTHYMIOPOULOU of DG XIII and Messrs. Michael BALTSAVIAS and Nikolas PAPADIMITRIOU of the Translation Directorate. Q-one was adapted by Ms Monique VINCENT of TER/Brussels. Thanks are also due to all persons in and outside the Commission who supported this project in any way (critical, moral or political).

REFERENCES
SUMMARY

The creation of the Greek CELEX base is a challenge linked to the wider introduction of multilingualism into the Communities’ informatics systems. The establishment of standards for the Greek alphabet, their incorporation into tangible products (terminals, software packages) and the linking of all the building blocks into a unified system: this is what will have been achieved by 1989, when the Greek base opens to the public. The positive effects of this project include the possibility of introducing the special Latin characters into the system, the promotion of an information market in Greece and the provision of a test bed for a first, more substantial study of the morphology of the Greek language. There are also possibilities of using the system for trials of prototype machine translation systems. Above all, however, the Greek CELEX will help keep lawyers, business people and executives in the public and private sectors better informed about Community law.