Publisher's Synopsis
The rapid improvements in computer technology that have taken place since computers were first developed in the 1940s have brought about drastic changes in the way that information of all kinds can be stored and processed; indeed, the efficient processing of extremely large, machine-readable databases is one of the most obvious results of these developments. In this book, the authors consider how the advent of parallelism in computer hardware can further increase the efficiency of two types of database processing; specifically, they look at ways in which algorithms for text retrieval from serial document files and for cluster analysis (or automatic classification) can be implemented on one particular type of parallel computer, the ICL Distributed Array Processor (DAP).
The book is accordingly addressed to three rather different types of reader. The first of these, and the one that fits most closely with the background of the two authors, consists of those who are responsible for the storage and retrieval of files of textual documents. Traditionally, this has been the job of the librarian or the information scientist, but the advent of word processing on a large scale now means that organizations and individuals of all kinds need rapid and effective access to textual data. As will be seen, the DAP, and machines with similar architectures, have the potential to provide facilities that are additional to those available in current text retrieval systems.
The second audience consists of those with an interest in the use of cluster analysis methods on large datasets. Cluster analysis is one of the most important techniques available for the analysis of multivariate data but is very demanding of computational resources when large numbers of records need to be processed: the DAP offers a realistic prospect for substantial reductions in these demands.
Finally, the book is aimed at computer scientists, particularly those with an interest in parallel computing or in database systems, who wish to discover two rather novel application areas for highly parallel computing systems. Given this varied audience, the authors have elected to include a fair amount of introductory and background material in the book so as to provide a way into the literatures of these three disciplines.