Some Statistics: as at February 2008
Background of Corpus
Some years ago Dr. Ellen Jordan – a social historian with a special interest in the Victorian era – came to the CLLC with an attribution problem. She wanted to know whether the computational stylistics techniques being developed at the Centre could determine the probability that her "strong hunch" was correct: that Anne Mozley had written a number of anonymous articles of interest to her. She was told that she would need to build up a corpus of well-attributed articles by comparable authors, similar in date and genre to her "mystery" articles. Ellen decided to choose articles written by well-regarded female journalists in "high class" literary journals around the 1850s and 60s.
Thus, the corpus began to take shape. The first articles were transcribed from photocopies Ellen had taken from the periodical journals themselves. The authors, apart from Anne Mozley, were Frances Power Cobbe, George Eliot, Harriet Martineau, Margaret Oliphant and Elizabeth (Lady Eastlake) Rigby.
Ellen was then approached by Eileen Curran to test what Eileen suspected were Wellesley misattributions of the two Scottish writers John Stuart Blackie and John Hill Burton, both of whom were born in 1809 in Edinburgh, attended the same college and wrote for Tait's Edinburgh Magazine. In order to do this testing, it was necessary to begin adding male authors to the corpus. The authors added at this stage were those whose articles could be downloaded as electronic texts from the online Gutenberg collection.
The Centre's research assistant, Alexis Antonia, had assisted Ellen in building this initial corpus, and found that she very much enjoyed working with these periodical articles. When the opportunity arose for working on a Research Higher Degree, Alexis chose a topic which would involve using and expanding the existing Victorian Periodical Corpus.
Acquisition of Electronic Texts for the Corpus
Setting the boundaries and obtaining the texts:
Quarterlies:
Edinburgh Review
Quarterly Review
Westminster Review
Bentley's Quarterly Review
National Review
Monthlies:
Blackwood's Edinburgh Magazine
Cornhill Magazine
Fortnightly Review (which became monthly)
Fraser's Magazine
Macmillan's Magazine
Tait's Edinburgh Magazine
Men:
Walter Bagehot (1826-1877)
John Stuart Blackie (1809-1895)
John Hill Burton (1809-1881)
Thomas Carlyle (1795-1881)
Lord Robert Cecil (1830-1903)
John Wilson Croker (1780-1857)
James Anthony Froude (1819-1894)
William Rathbone Greg (1809-1881)
Abraham Hayward (1801-1884)
Thomas Henry Huxley (1825-1895)
Charles Kingsley (1819-1875)
George Henry Lewes (1817-1878)
Thomas Babington Macaulay (1800-1859)
Sir Leslie Stephen (1832-1904)
Women:
Frances Power Cobbe (1822-1904)
Caroline Frances Cornwallis (1786-1858)
George Eliot (1819-1880)
Harriet Martineau (1802-1876)
Anne Mozley (1809-1891)
Margaret Oliphant (1828-1897)
Elizabeth Rigby, Lady Eastlake (1809-1893)
The first method used was to transcribe the text onto the computer from a photocopy of the journal article. This method was also used later for a number of Macmillan's articles, which the librarian in the University's Rare Books Collections photocopied for us.
The second method was to find public domain electronic texts available in online collections. Of these, only Gutenberg allowed downloading of text in editable form.
Other online collections, such as the ILEJ (Bodley, Oxford) site for Blackwood's, provided only photo image copies of texts; these could be printed for subsequent scanning or transcribed from the photo image into editable electronic text.
Newcastle University has a number of Victorian periodical journals (Westminster, Edinburgh, Fraser's, Tait's and an incomplete Macmillan's) available on microfilm. Microfilm printouts were obtained from these for many articles. Most of these were transcribed onto the computer; occasionally a microfilm printout was considered clear enough to permit scanning.
Where published editions of periodical articles existed in authorial collections of writings, these were photocopied and scanned. Sometimes, if the photocopy was not suitable for scanning, the article was transcribed.
For journals which the Newcastle Library does not hold, such as the Quarterly Review and the Fortnightly, articles were obtained through Inter-Library loan requests as files in TIFF format. These articles were printed and either transcribed or scanned, depending on the quality of the copy.
Editing of the electronic texts in preparation for the Centre's counting programs.