© 1975 by British Computer Society
A dynamic database which automatically removes unwanted generalisation for the efficient analysis of language features that exhibit a disparate frequency distribution
Computing Centre Department, University of Nairobi, PO Box 30197, Nairobi, Kenya
A self-organising database was developed as part of a general language analysis system (Partridge, 1972). The periodic, automatic reorganisation of the database was intended to increase the efficiency of analysing a language whose constituent features exhibit a Zipfian-type rank-frequency relationship. Such a distribution means that a small number of features account for a large proportion of the information, while a large number of possible features are seldom encountered and are therefore seldom accessed within the database.
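The practical force of a Zipfian distribution is easiest to see with a little arithmetic. The sketch below assumes the classic form of the law, in which the frequency of the feature of rank r is proportional to 1/r (exponent s = 1); the function name `zipf_share` and the vocabulary size of 1,000 features are illustrative assumptions, not figures taken from the study.

```python
def zipf_share(n_features: int, top_k: int, s: float = 1.0) -> float:
    """Fraction of all occurrences accounted for by the top_k most frequent
    features, ASSUMING the frequency of rank r is proportional to 1 / r**s."""
    # Unnormalised Zipf weights for ranks 1..n_features.
    weights = [1.0 / r ** s for r in range(1, n_features + 1)]
    # Share of the total mass held by the top_k ranks.
    return sum(weights[:top_k]) / sum(weights)


if __name__ == "__main__":
    # Hypothetical vocabulary of 1,000 distinct features.
    for k in (10, 100):
        print(f"top {k:>3} of 1000 features: {zipf_share(1000, k):.1%} of occurrences")
```

Under these assumptions the 10 most frequent features (1% of the vocabulary) cover roughly 39% of all occurrences, and the top 100 cover roughly 69%, which is why giving the few common features special treatment can pay off.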
The mechanism described aims to reconcile two conflicting procedures: condensation by generalisation of language features, to minimise the total size of the database, and particularisation of the few commonly occurring features, to minimise the average analysis time. Results are presented for the application of this mechanism to the analysis of batches of FORTRAN programs drawn from the normal workload of computers in five different environments.
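The trade-off between generalisation and particularisation can be illustrated with a small two-tier lookup. This is a minimal sketch under stated assumptions, not the mechanism of the paper: a single slow, general analysis routine (here the hypothetical `classify`) stands in for the condensed, generalised part of the database, while a bounded table holds particular entries for the most frequently accessed features. The class `SelfOrganisingStore`, its `capacity` and `period` parameters, and the token stream are all hypothetical.

```python
from collections import Counter
from typing import Callable, Dict


class SelfOrganisingStore:
    """Hypothetical two-tier store: a small table of particularised
    (frequently seen) features backed by a slower, general analyser."""

    def __init__(self, general: Callable[[str], str], capacity: int, period: int):
        self.general = general           # generalised (slow) analysis routine
        self.capacity = capacity         # max number of particularised entries
        self.period = period             # reorganise after this many lookups
        self.fast: Dict[str, str] = {}   # particularised entries
        self.counts: Counter = Counter() # access counts per feature
        self.lookups = 0
        self.general_calls = 0           # lookups that fell back to `general`

    def analyse(self, feature: str) -> str:
        self.counts[feature] += 1
        self.lookups += 1
        if feature in self.fast:
            result = self.fast[feature]       # fast path: particular entry
        else:
            self.general_calls += 1
            result = self.general(feature)    # slow path: general rule
        if self.lookups % self.period == 0:
            self.reorganise()                 # periodic, automatic reorganisation
        return result

    def reorganise(self) -> None:
        # Keep particular entries only for the `capacity` most frequently
        # accessed features; all other features revert to the general rule.
        top = [f for f, _ in self.counts.most_common(self.capacity)]
        self.fast = {f: self.fast[f] if f in self.fast else self.general(f)
                     for f in top}


def classify(token: str) -> str:
    # Hypothetical stand-in for a general analysis procedure.
    return "keyword" if token in {"DO", "IF", "GOTO"} else "identifier"


store = SelfOrganisingStore(general=classify, capacity=2, period=10)
for tok in ["DO", "IF", "DO", "X1", "DO", "IF", "GOTO", "DO", "IF", "Y"]:
    store.analyse(tok)
```

After the tenth lookup the store reorganises, and only `DO` and `IF`, the two most frequent tokens in this made-up stream, receive particular entries; further lookups of either bypass the general routine. A real system would likely also age or reset the counts so that the particularised set can track shifts in the workload.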
Received February 1973.