Computer Aided Processing for Natural Language - PIAF System.

 

BRAILLE RESEARCH


Introduction 


A set of programs for natural language processing has been designed and implemented. These programs were first applied to the analysis of French and led to the creation of the PIAF system (Programmes Interactifs d'Analyse du Français). Some applications are operational, mainly in automatic information retrieval, where the PIAF system performs two tasks in parallel: 1. data acquisition and control, 2. production of simple or compound key words. Following a request from the Centre National des Arts et Métiers, we have developed the contracted braille translation and editing of French texts, with respect to typographic codes, from the programs and basic algorithms of the PIAF system.



General Principles of the PIAF System


 We offer the average user a set of tools to define and handle textual data, under the control of an editor:

(i) The definition tools are facilities to create and interactively update the dictionaries and models of the selected language, using a simple command language.
(ii) The processing tools allow the analysis and generation of textual data with respect to the previous linguistic definitions (which are parameters of the running programs).
(iii) The morphological analyser includes a dictionary and a finite-state automaton for processing sequential data (strings).
(iv) The syntactical analyser is built upon a set of dependency relations and context-free rules, to build and check tree structures.

Research is under way to implement modules for the transformation and evaluation of tree structures. The ability to change the linguistic information allows:
(i) the choice of source language (French, Spanish, English...),
(ii) the choice of transduction (grammatical analysis, key words, phonetics, contracted braille, program translation...).

Application to Data Acquisition Control of Texts in Natural Language 


The purpose of this program is to detect encoding errors (editing, syntactic and morphological errors) with respect to a language model during the input process (in real time). When an error is detected, the user is notified automatically and can interact with the computer to correct the text, to add the new word to the dictionary or to complete the language model.


(a) Morphological analyser

 The dictionary contains invariable forms and expressions, prefixes, radices, suffixes and terminal items. These elements are related to the morphological models (400 models for French), and a finite-state automaton controls the application of the concatenation rules (180 rules for French) to the various elements composing a form (radix + suffix(es) + terminal item) and provides the required transduction, in the form of categories and grammatical variables (verb, 1st person, plural, future...). When no rule leads to a final state, a backtracking technique is used to look for another solution (e.g. a different splitting). If none is found, a spelling error is pinpointed.
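The interplay between dictionary, automaton and backtracking can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy dictionary entries, the element classes, the state names and the function `analyse` are hypothetical and do not come from the PIAF system, whose real models and rules are far larger.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy dictionary: string -> (element class, grammatical information).
# "chante" is included only to create a dead end that forces backtracking.
DICTIONARY: Dict[str, Tuple[str, str]] = {
    "chante": ("RADIX", "verb"),
    "chant": ("RADIX", "verb"),
    "er": ("SUFFIX", "future"),
    "ons": ("TERMINAL", "1st person plural"),
}

# Hypothetical concatenation rules: (state, element class) -> next state.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("START", "RADIX"): "AFTER_RADIX",
    ("AFTER_RADIX", "SUFFIX"): "AFTER_SUFFIX",
    ("AFTER_SUFFIX", "SUFFIX"): "AFTER_SUFFIX",
    ("AFTER_RADIX", "TERMINAL"): "FINAL",
    ("AFTER_SUFFIX", "TERMINAL"): "FINAL",
}


def analyse(word: str) -> Optional[List[Tuple[str, str]]]:
    """Split `word` into dictionary elements accepted by the automaton.

    Returns the list of (element, grammatical information) pairs,
    or None when no splitting reaches the final state.
    """

    def step(pos: int, state: str) -> Optional[List[Tuple[str, str]]]:
        if pos == len(word):
            return [] if state == "FINAL" else None
        # Try the longest candidate first, then shorter ones (backtracking).
        for end in range(len(word), pos, -1):
            piece = word[pos:end]
            entry = DICTIONARY.get(piece)
            if entry is None:
                continue
            element_class, info = entry
            next_state = TRANSITIONS.get((state, element_class))
            if next_state is None:
                continue
            rest = step(end, next_state)
            if rest is not None:
                return [(piece, info)] + rest
        return None  # dead end: the caller will try a shorter element

    return step(0, "START")
```

With this toy data, `analyse("chanterons")` first tries the radix "chante", fails on the remainder "rons", and then backtracks to "chant" + "er" + "ons". A form such as "chantxx" yields `None`, which corresponds to the case where a spelling error would be reported.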

 (b) Syntactical analyser 

The dependency relations between the categories induce the construction of one or several sentence structures. At each vertex, a context-free rule can be applied to check the agreement of grammatical variables (e.g. gender and number agreement between a definite article and a noun). It is therefore possible to report an agreement error in detail.
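As a rough illustration of such an agreement check, the sketch below attaches each article to the nearest following noun and compares gender and number. The `Token` type, the single dependency relation and the attachment strategy are simplifying assumptions for this example only; they are not the PIAF rule format.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Token:
    form: str
    category: str
    gender: str
    number: str

# Hypothetical dependency relation: (dependent category, governor category).
DEPENDENCIES = {("article", "noun")}
# Grammatical variables that must agree along a dependency.
AGREEING_VARIABLES = ("gender", "number")


def agreement_errors(tokens: List[Token]) -> List[str]:
    """Return one message per grammatical variable that fails to agree."""
    errors: List[str] = []
    for i, dependent in enumerate(tokens):
        # Toy attachment: link the dependent to the nearest following governor.
        for governor in tokens[i + 1:]:
            if (dependent.category, governor.category) not in DEPENDENCIES:
                continue
            for variable in AGREEING_VARIABLES:
                left = getattr(dependent, variable)
                right = getattr(governor, variable)
                if left != right:
                    errors.append(
                        f"{dependent.form} / {governor.form}: "
                        f"{variable} mismatch ({left} vs {right})"
                    )
            break  # only the nearest governor is checked
    return errors


wrong = [Token("le", "article", "masculine", "singular"),
         Token("maison", "noun", "feminine", "singular")]
right = [Token("la", "article", "feminine", "singular"),
         Token("maison", "noun", "feminine", "singular")]
```

Here `agreement_errors(wrong)` reports a gender mismatch between "le" and "maison", while `agreement_errors(right)` returns an empty list.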


Application to the Contracted Braille Translation 

The purpose of this program is to provide, for each basic string, its contracted braille representation, according to the rules defined by the braille library edition 64, Association Valentin Haüy. To perform this translation, we only need the morphological analyser described above, together with an appropriate dictionary and automaton. The dictionary contains the invariable forms and expressions, prefixes, radices, suffixes and terminal items, as well as basic strings starting with a consonant or a vowel. Each of these elements must be associated with its braille translation and a reference to a model (30 models for French braille). The automaton verifies the application of the rules (about 30 rules). Generally these rules have two functions:
(i) Concatenation of the different components of a form, 
(ii) Some production constraints (identical codes, lower codes). 

The specific techniques for interactive language processing previously written for data acquisition control made the contracted braille translation considerably easier, in particular on the following points:
(i) priority can be given to identifying the longest string, thanks to the organisation of the dictionary;
(ii) these strings may contain blank characters, which makes it possible to identify specific expressions in the input text without any special mark;
(iii) an element (string) leading to a solution (a final state of the automaton) cannot be split, but whenever a final state is not reached, a backtracking technique looks up shorter elements to find a solution.
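The three points above can be sketched as a longest-match dictionary lookup with backtracking. This is only an illustration under stated assumptions: the entries and the output codes such as `<PARCE_QUE>` are placeholders, not real contractions, the automaton states are omitted for brevity, and `segment` and `translate` are hypothetical names.

```python
from typing import Dict, List, Optional

# Hypothetical entries; the values are placeholder codes, NOT real braille
# contractions. Note the expression "parce que", which contains a blank.
BRAILLE: Dict[str, str] = {
    "parce que": "<PARCE_QUE>",
    "pare": "<PARE>",
    "par": "<PAR>",
    "ent": "<ENT>",
    " ": " ",
}
MAX_LEN = max(len(key) for key in BRAILLE)


def segment(text: str, pos: int = 0) -> Optional[List[str]]:
    """Split `text` into dictionary entries, longest entry first.

    Returns None when no complete segmentation exists.
    """
    if pos == len(text):
        return []
    longest_end = min(len(text), pos + MAX_LEN)
    for end in range(longest_end, pos, -1):
        piece = text[pos:end]
        if piece not in BRAILLE:
            continue
        rest = segment(text, end)
        if rest is not None:
            return [piece] + rest
        # Otherwise backtrack and try a shorter entry at this position.
    return None


def translate(text: str) -> Optional[str]:
    """Concatenate the placeholder codes of the segmented entries."""
    pieces = segment(text)
    if pieces is None:
        return None
    return "".join(BRAILLE[piece] for piece in pieces)
```

For the input "parce que parent", the expression "parce que" is recognised as a single entry despite its blank. For "parent", the longest entry "pare" is tried first, leaves the untranslatable remainder "nt", and the search falls back to "par" + "ent".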

Interactive Editor

 A user can run the various modules of the PIAF system in an interactive mode.
(i) At the definitional level - the finite-state grammar and the dictionary (or a partition of it) can be created while the coherence of the information is checked automatically.

(ii) At the translation level - the interrogation mode provides a dialogue with the computer (IBM 360 CP/CMS) via a terminal (SAGEM type TEM 8BR, keyboard with accented vowels), connected by direct login or through the CYCLADES BUS network. This procedure is particularly well suited to debugging.
Given an incorrect result (or no result at all), it is possible, as needed:
(a) to activate the definition programs directly to update the models and the dictionary,
(b) to correct a typing error,
(c) to restart the current sentence.
The working mode activates the same functions as the interrogation mode, but the user must specify the physical files for input/output and the length of an output line (up to 31 or 40 characters).

(iii) At the editing level - two main typographic functions are included:
(a) "COMMENT" prevents contractions,
(b) "JUSTIFY" writes at the beginning of a line or a paragraph.
It is possible to define new typographic functions.


Conclusion 

The PIAF system was designed to be interactive, parameterised and modular. In the area of computer-aided design for natural language, some software problems have to be worked out before any application. The translation into contracted braille is a good example of a software tool that can be tailored by non-expert users. For a user, debugging is easy thanks to the interactive mode, which was designed to make the set of programs "transparent". Application to other natural languages is possible without any deep modification. When no definite contracted braille rules exist for a language, the PIAF system acts as a simulator; this leads to a formalisation of the processing and hence of the rules. In addition, the PIAF system can keep word frequency counters to estimate the gain of contracted braille. Translation speed is about 230 words per second on an IBM 360 under CP/CMS.
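One way word frequency counters could feed such an estimate is sketched below. The definition of "gain" used here (relative reduction in the number of braille cells), the assumption of one cell per letter for uncontracted text, and the cell counts in the example are all hypothetical; the source does not specify how the gain is computed.

```python
from collections import Counter
from typing import Dict, Iterable


def contraction_gain(words: Iterable[str], contracted_cells: Dict[str, int]) -> float:
    """Estimate the relative saving in braille cells due to contraction.

    Assumption: an uncontracted word takes one cell per letter, and a word
    missing from `contracted_cells` is not contracted at all.
    """
    frequencies = Counter(words)
    full = sum(count * len(word) for word, count in frequencies.items())
    short = sum(
        count * contracted_cells.get(word, len(word))
        for word, count in frequencies.items()
    )
    return 1 - short / full if full else 0.0


# Hypothetical cell counts, for illustration only.
sample = ["parce", "que", "parce", "que", "le"]
cells = {"parce": 2, "que": 1}
gain = contraction_gain(sample, cells)
```

In this sample the uncontracted text takes 18 cells and the contracted text 8, so the estimated gain is 10/18, or about 56%.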
