Developed at the Centre for Literary and Linguistic Computing, University of Newcastle, Australia
Director: Hugh Craig
Software developers: R Whipp, Michael Ralston, Jack Eliott, Bill Pascoe



What is the Intelligent Archive?

The Intelligent Archive (IA) is a Java-based program used for text analysis within the University of Newcastle's Centre for Literary and Linguistic Computing (CLLC). Centre researchers use it in different ways, depending on which aspect of text analysis they are working on. Professor Hugh Craig is the chief architect of the IA's development; much of the core functionality required for Prof. Craig's work also applies to others working within the CLLC.

The typical CLLC project involves preparing a set of texts for computational stylistics operations, with the ultimate purpose of determining the authorship of a disputed literary work, or analysing the style of a work or group of works. The IA serves these projects by organising sets of texts and producing word counts, which can be exported for analysis in an external spreadsheet or statistics program. It is an interface to an archive of texts and incorporates a range of counting functions that the user can configure, hence the name 'intelligent archive'. While most text-processing programs focus on more linguistic outputs, such as concordances or lists of the commonest collocates of a given word, the IA's primary function is more statistical, centred on producing frequency counts of words.
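As an illustration of the kind of output described above, the following is a minimal sketch in Python of counting chosen words in several texts and writing the counts as comma-separated values that a spreadsheet or statistics program can open. It is not the IA's own code: the function names, the sample texts and the very simple tokenising rule are all assumptions made for the example.

```python
import csv
import io
import re
from collections import Counter


def tokenize(text):
    # Assumed rule: lowercase the text and keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def count_table(texts, words):
    """Return one row per text: the text name, then the count of each listed word."""
    rows = []
    for name, text in texts.items():
        counts = Counter(tokenize(text))
        rows.append([name] + [counts[w] for w in words])
    return rows


def to_csv(texts, words):
    """Render the count table as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["text"] + words)
    writer.writerows(count_table(texts, words))
    return buffer.getvalue()


# Hypothetical sample texts, standing in for texts held in the archive.
texts = {
    "sample_a": "The king and the queen spoke of the war.",
    "sample_b": "Of love and of loss the poet sang.",
}
print(to_csv(texts, ["the", "and", "of"]))
```

Each row of the resulting table is one text and each column one word, which is a convenient shape for loading into an external statistics package.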


The software is available in three versions: Corella, Budgerigar and Galah. Corella is Intelligent Archive version 2 and is the latest release (26/06/2015). Budgerigar and Galah are previous versions. All provide the following core facilities:

  • Management of individual texts of different formats within a virtual library or repository
  • Management of text sets, which are user-created groups of these texts
  • Word frequency analysis on individual texts, tagged sections within texts, text
    sets, contiguous block segments of a specified size within texts, etc.
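Counting within contiguous block segments, as mentioned in the list above, can be sketched in Python as follows. This is only an illustration under stated assumptions, not the IA's implementation: the function names are hypothetical, the tokenising rule is deliberately simple, and dropping a short final block is an assumed policy rather than documented behaviour.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumed rule: lowercase the text and keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def block_counts(text, block_size, words):
    """Split a text into consecutive blocks of block_size words and count
    the listed words in each block."""
    tokens = tokenize(text)
    results = []
    for start in range(0, len(tokens), block_size):
        block = tokens[start:start + block_size]
        if len(block) < block_size:
            break  # assumption: a short trailing block is discarded
        counts = Counter(block)
        results.append({w: counts[w] for w in words})
    return results


text = "the cat sat on the mat and the dog sat on the log"
for number, counts in enumerate(block_counts(text, 4, ["the", "sat"]), start=1):
    print(number, counts)
```

Using blocks of equal size keeps the counts comparable between segments, which matters when the segments are later treated as samples in a statistical analysis.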

Intelligent Archive Galah also includes functionality for 'experiments', namely Jensen-Shannon Divergence and Burrows' Zeta (incorporating Burrows' Iota), for both single words and word pairs. Documentation is available for IA Budgerigar but not yet for IA Galah. We are currently working on documentation for IA Corella.
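For readers unfamiliar with Jensen-Shannon Divergence, the following Python sketch shows the standard textbook calculation applied to the relative frequencies of a chosen word list in two texts. It is not taken from IA Galah, whose exact procedure is not described here; the function names and the choice of base-2 logarithms are assumptions for the example.

```python
import math
from collections import Counter


def relative_frequencies(tokens, vocabulary):
    """Return the share of each vocabulary word among all vocabulary words found."""
    counts = Counter(tokens)
    total = sum(counts[w] for w in vocabulary)
    if total == 0:
        raise ValueError("none of the vocabulary words occur in the text")
    return [counts[w] / total for w in vocabulary]


def kl_divergence(p, q):
    # Terms with p_i == 0 contribute nothing and are skipped.
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q):
    """Average the divergence of p and of q from their midpoint distribution."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical token lists standing in for two texts.
text_a = "the the of and".split()
text_b = "of of and the".split()
vocabulary = ["the", "of", "and"]

p = relative_frequencies(text_a, vocabulary)
q = relative_frequencies(text_b, vocabulary)
print(jensen_shannon_divergence(p, q))
```

With base-2 logarithms the result lies between 0 (identical distributions) and 1 (no words in common), and it is symmetric, so the order of the two texts does not matter.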

System Requirements

The Intelligent Archive is written for the Java platform, so it runs on any operating system that supports Java. It requires a Java Runtime Environment to be installed; this can be downloaded free of charge.

The computer specifications required depend on which features of the software you wish to use. The core functionality requires only a very basic system with at least 512MB of memory. The software does not require a fast CPU, although it will produce results more quickly on one. It does not currently benefit from running on a system with multiple CPUs or CPU cores.

The software itself uses less than 6MB of disk space. You will also need enough disk space to store all texts added to the text repository.