Computer Aided Processing for Natural Language - PIAF System.
Introduction
A set of programs oriented to natural language processing
has been designed and implemented. These programs
were first applied to French language analysis and led to
the creation of the PIAF system (Programmes Interactifs
d'Analyse du Français).
Some applications are operational, mainly in automatic
information retrieval, where the PIAF system performs two
tasks in parallel:
1. Data acquisition and control,
2. Production of simple or compound key words.
Following a request from the Centre National des Arts et
Métiers, we have developed contracted braille translation
and editing of French texts, with respect to typographic
codes, from the programs and basic algorithms of the
PIAF system.
General Principles of the PIAF System
We offer the average user a set of tools to
define and handle textual data, under the control of an
editor:
(i) The definition tools are facilities to create
and interactively update the dictionaries
and models of the selected language, in a simple
command language.
(ii) The processing tools allow the analysis
and the generation of textual data with respect
to the previous linguistic definitions (which are
parameters of the running programs).
(iii) The morphological analyser, which includes
a dictionary and a finite-state automaton for
processing sequential data (strings).
(iv) The syntactical analyser, built upon a
set of dependency relations and context-free
rules, to build and control tree structures.
Research is under way to implement modules
for the transformation and the evaluation of tree
structures.
Because the linguistic information can be changed,
the system allows:
(i) the choice of source language (French,
Spanish, English...)
(ii) the choice of the transduction (grammatical
analysis, key words, phonetics, contracted braille,
program translation...).
Application to Data Acquisition Control of Texts in Natural Language
The purpose of this program is to detect encoding
errors (editing, syntactic or morphological errors)
with respect to a language model during the input process
(in real time). When an error is found, the user is notified
automatically and can converse interactively with the
computer in order to correct the text, to add the new word
to the dictionary or to complete the model of the language.
(a) Morphological analyser
The dictionary contains invariable forms and expressions,
prefixes, radices, suffixes and terminal items.
These elements are related to the morphological models
(400 models for French). A finite-state automaton
controls the application of the concatenation rules (180
rules for French) to the various elements composing a
form (radix + suffix(es) + terminal item) and provides the
required transduction, in the form of categories and
grammatical variables (verb, 1st person, plural, future...).
When no rule leads to a final state, a backtracking
technique is used to look for another solution (e.g. a
different splitting). If no splitting reaches a final
state, an orthographical error is pinpointed.
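The interplay of dictionary, automaton and backtracking described above can be sketched as follows. This is a minimal illustration in Python, not the PIAF implementation: the five dictionary entries, the state names and the function `analyse` are all assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple

# Toy dictionary invented for this example (not PIAF data): element -> kind.
LEXICON: Dict[str, str] = {
    "marche": "radix",   # noun stem
    "march": "radix",    # verb stem
    "er": "suffix",
    "ons": "ending",
    "s": "ending",
}

# Hypothetical finite-state automaton: (state, element kind) -> next state.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("START", "radix"): "RADIX",
    ("RADIX", "suffix"): "SUFFIX",
    ("SUFFIX", "suffix"): "SUFFIX",
    ("RADIX", "ending"): "FINAL",
    ("SUFFIX", "ending"): "FINAL",
}
FINAL_STATES = {"FINAL"}


def analyse(form: str, state: str = "START") -> Optional[List[Tuple[str, str]]]:
    """Return one splitting of `form` accepted by the automaton, or None."""
    if not form:
        return [] if state in FINAL_STATES else None
    # Try the longest dictionary element first, then shorter ones.
    for end in range(len(form), 0, -1):
        element = form[:end]
        kind = LEXICON.get(element)
        if kind is None:
            continue
        next_state = TRANSITIONS.get((state, kind))
        if next_state is None:
            continue
        rest = analyse(form[end:], next_state)
        if rest is not None:
            return [(element, kind)] + rest
        # Dead end: the loop goes on to a shorter element (backtracking).
    return None  # no splitting reaches a final state: spelling error
```

With this toy data, `analyse("marcherons")` first tries the radix "marche", finds no continuation for "rons", and backtracks to "march" + "er" + "ons". A form such as "marchez", for which no splitting succeeds, yields `None` and would be reported as an orthographical error.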
(b) Syntactical analyser
The dependency relations between the categories induce
the construction of one (or several) sentence structures.
At each vertex, a context-free rule can be applied
to check the agreement of grammatical variables (for
example, gender and number agreement between a definite
article and a substantive). It is therefore possible to
exhibit and detail an agreement error.
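The agreement check can be pictured with the short Python sketch below. The `Node` record, its field names and the single article/noun rule are assumptions chosen for illustration; the paper does not specify how PIAF represents vertices or rules.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """A vertex of the dependency structure (hypothetical layout)."""
    form: str
    category: str                      # e.g. "article", "noun"
    gender: str                        # "m" or "f"
    number: str                        # "sg" or "pl"
    governor: Optional["Node"] = None  # the vertex this one depends on


def agreement_errors(nodes: List[Node]) -> List[str]:
    """Report gender/number mismatches between an article and its noun."""
    errors = []
    for node in nodes:
        head = node.governor
        if head is None:
            continue
        if node.category == "article" and head.category == "noun":
            for variable in ("gender", "number"):
                left, right = getattr(node, variable), getattr(head, variable)
                if left != right:
                    errors.append(
                        f"{node.form} / {head.form}: {variable} mismatch "
                        f"({left} vs {right})"
                    )
    return errors


# Example: "le maison" (masculine article, feminine noun).
maison = Node("maison", "noun", "f", "sg")
le = Node("le", "article", "m", "sg", governor=maison)
print(agreement_errors([le, maison]))
```

Because the check names the variable and the two conflicting values, the error can be detailed to the user and not merely signalled.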
Application to the Contracted Braille Translation
The purpose of this program is to provide, for each
basic string, its contracted braille representation,
according to the rules defined by the braille library,
edition 64, Association Valentin Haüy. To perform this
translation, we only need the morphological analyser
described above, together with an appropriate
dictionary and automaton.
The dictionary contains the invariable forms and
expressions, prefixes, radices, suffixes and terminal
items, as well as basic strings starting with a consonant
or a vowel. Each of these elements must be associated with
its braille translation and a reference to a model (30
models for French braille). The automaton verifies the
application of the rules (about 30 rules). Generally these
rules have two functions:
(i) Concatenation of the different components
of a form,
(ii) Some production constraints (identical
codes, lower codes).
The specific techniques of interactive language
processing previously written for data acquisition
control made the contracted braille translation much
easier, in particular on the following points:
(i) the longest string can be identified first
because of the organisation of the
dictionary.
(ii) these strings may contain blank characters;
this makes it possible to identify specific
expressions in the input text without special marks.
(iii) each element (string) leading to a solution
(a final state of the automaton) cannot be split.
But whenever a final state is not reached, a
backtracking technique allows a look-up for shorter
elements and a solution.
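The three points above can be illustrated by the following Python sketch. It is deliberately simplified: the automaton is left out, the function `translate` is hypothetical, and the bracketed codes in `TABLE` are invented labels standing in for braille output, not real contractions.

```python
from typing import Dict, List, Optional

# Placeholder table: the bracketed codes are invented labels,
# not actual contracted braille.
TABLE: Dict[str, str] = {
    "parce que": "<PARCE_QUE>",  # an entry may contain a blank
    "par": "<PAR>",
    "ce": "<CE>",
    "que": "<QUE>",
    "qu": "<QU>",
    "el": "<EL>",
    " ": " ",
}
MAX_LEN = max(len(key) for key in TABLE)


def translate(text: str) -> Optional[List[str]]:
    """Cover `text` with table entries, longest entry first."""
    if not text:
        return []
    for end in range(min(len(text), MAX_LEN), 0, -1):
        code = TABLE.get(text[:end])
        if code is None:
            continue
        rest = translate(text[end:])
        if rest is not None:
            return [code] + rest
        # No solution after this entry: try a shorter one (backtracking).
    return None
```

Here "parce que" is recognised as a single expression without any special mark, "par ce" is translated element by element, and "quel" shows the backtracking: the longest entry "que" leaves "l" untranslatable, so the shorter "qu" followed by "el" is used instead.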
Interactive Editor
A user can run the various modules of the PIAF system
in an interactive mode.
(i) At the definitional level - the finite-state
grammar and the dictionary (or a partition
of it) can be created while the coherence
of the information is checked automatically.
(ii) At the translation level - the interrogation
mode provides a dialogue with the computer
(IBM 360 CP/CMS) via a terminal (SAGEM type
TEM 8BR, keyboard with accented vowels), either
by direct login or through the CYCLADES BUS
network. This procedure is particularly well
suited to debugging.
Given an incorrect result (or no result at all),
it is possible, if need be:
(a) to activate directly the definition programs
to update the models and the dictionary,
(b) to correct a typing error,
(c) to restart the current sentence.
The working mode is intended to activate the same
functions as the interrogation mode, but the user must
specify the physical files for input/output and the length
of an output line (up to 31 or 40 characters).
(iii) At the editing level
Two main typographic functions are included:
(a) "COMMENT" prevents contractions,
(b) "JUSTIFY" writes at the beginning of a line
or a paragraph.
It is possible to define new typographic functions.
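As an illustration of the output line length that the user specifies in the working mode, the sketch below breaks a translated text into lines of bounded length using Python's standard `textwrap` module. The function name `format_output` and the choice to break at word boundaries are assumptions; the paper does not describe how PIAF fills its output lines.

```python
import textwrap
from typing import List


def format_output(text: str, line_length: int) -> List[str]:
    """Split `text` into output lines of at most `line_length` characters.

    Lines are broken at blanks (an assumption made for this sketch).
    """
    return textwrap.wrap(text, width=line_length)


sample = "la traduction en braille abrege est ecrite ligne par ligne"
for line in format_output(sample, 31):
    print(line)
```

Calling the same function with a length of 40 gives the wider of the two formats mentioned above.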
Conclusion
The PIAF system was designed to be interactive,
parameterised and modular. In the area of computer-aided
design for natural language, some software problems have
to be worked out before any application.
The translation into contracted braille is a good
example of a software tool that non-expert users can tailor.
For the user, debugging is easy thanks to the interactive
mode, which is designed to make the set of programs
"transparent".
An application to other natural languages is
possible without any deep modification.
When a language has no definite contracted braille
rules, the PIAF system serves as a simulator;
this leads to a formalisation of the processing and hence
of the rules. In addition, the PIAF system can maintain
word frequency counters to estimate the gain of
contracted braille.
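One way such counters could be used is sketched below in Python. The reading of "gain" as the fraction of cells saved, the function `estimate_gain` and the cell counts in the example are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict


def estimate_gain(text: str, contracted_cells: Dict[str, int]) -> float:
    """Fraction of cells saved by contraction, from word frequencies.

    `contracted_cells` maps a word to the number of cells of its contracted
    form (hypothetical values); other words keep one cell per letter.
    """
    counts = Counter(text.lower().split())
    full = sum(len(word) * n for word, n in counts.items())
    short = sum(contracted_cells.get(word, len(word)) * n
                for word, n in counts.items())
    return 0.0 if full == 0 else (full - short) / full


# Invented cell counts: "le" and "et" are assumed to contract to one cell.
gain = estimate_gain("le chat et le chien", {"le": 1, "et": 1})
print(f"estimated gain: {gain:.0%}")
```

Run over a representative corpus, the same counters show which candidate contractions contribute most to the gain, which is useful when the rules are still being formalised.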
Translation speed is about 230 words per second on
an IBM 360 under CP/CMS.