travelogues/travelogues-corpus


Travelogues Corpus

A corpus of German-language travelogues from the period 1500-1876, drawn from the Austrian Books Online project of the Austrian National Library. The corpus was compiled by the domain experts of the Travelogues Project, using the library's administration system (ALMA). Full texts and manifests with metadata were retrieved using the SACHA infrastructure. The texts are the output of Optical Character Recognition (OCR) and have not been manually corrected. Travelogues is funded through grant I 3795 of the Austrian Science Fund (FWF) and grant 398697847 of the German Research Foundation (DFG).


Repository Contents

- 16th_century
  |- 16c-books.zip (14 MB, 66 files)
  |- 16c-metadata.zip (68 KB, 66 files)
- 17th_century
  |- 17c-books.zip (49 MB, 204 files)
  |- 17c-metadata.zip (202 KB, 204 files)
- 18th_century
  |- 18c-books.zip (214 MB, 949 files)
  |- 18c-metadata.zip (814 KB, 949 files)
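
The archives can be inspected without unpacking them. The following is a minimal sketch using Python's standard `zipfile` module. It assumes the repository was cloned with Git LFS, so that the `.zip` files are real archives. The internal layout of the archives is not described here, so the helper (the name `list_members` is purely illustrative) only lists member names.

```python
import zipfile
from pathlib import Path


def list_members(archive_path):
    """Return the names of all regular files stored in a zip archive."""
    with zipfile.ZipFile(archive_path) as archive:
        # Skip directory entries so that only actual files are returned.
        return [info.filename for info in archive.infolist() if not info.is_dir()]


if __name__ == "__main__":
    # Path taken from the listing above; adjust it to your local clone.
    path = Path("16th_century") / "16c-books.zip"
    if path.exists():
        members = list_members(path)
        print(f"{path}: {len(members)} files")
```

The reported count can be compared with the file counts given in the listing above.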

IMPORTANT! Git LFS must be installed on your system in order to clone this repository correctly.
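
If the repository is cloned without Git LFS, each `.zip` file is replaced by a small text pointer instead of the real archive. The sketch below shows one way to detect this, relying on the fact that Git LFS pointer files begin with the line `version https://git-lfs.github.com/spec/v1`. The helper name `is_lfs_pointer` is an illustrative choice and not part of this repository.

```python
from pathlib import Path

# First bytes of every Git LFS pointer file (pointer format, spec v1).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file is a Git LFS pointer rather than real content."""
    with open(path, "rb") as handle:
        return handle.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


if __name__ == "__main__":
    # Check every zip file below the current directory (assumed to be the clone).
    for zip_path in sorted(Path(".").rglob("*.zip")):
        status = "LFS pointer (content not fetched)" if is_lfs_pointer(zip_path) else "ok"
        print(f"{zip_path}: {status}")
```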


Accessing Digital Objects Online

Book and metadata files are named according to their barcode identifiers in the Austrian National Library. The permanent URLs to the digital objects can be constructed by prefixing the barcode with http://data.onb.ac.at/ABO/+, e.g. for barcode Z180627808: http://data.onb.ac.at/ABO/+Z180627808.
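
The URL construction can be sketched in a few lines of Python. The prefix and the example barcode are taken from the paragraph above. The helper names and the `.txt` extension in the example file name are assumptions made for illustration, since the file extensions inside the archives are not stated here.

```python
from pathlib import Path

# Prefix for permanent URLs, as given above.
ABO_PREFIX = "http://data.onb.ac.at/ABO/+"


def barcode_from_filename(filename):
    """Derive the barcode from a file name by dropping directory and extension."""
    return Path(filename).stem


def permanent_url(barcode):
    """Build the permanent URL of a digital object from its barcode."""
    return ABO_PREFIX + barcode


if __name__ == "__main__":
    # "Z180627808.txt" is a hypothetical file name; only the barcode is from the text.
    barcode = barcode_from_filename("Z180627808.txt")
    print(permanent_url(barcode))  # http://data.onb.ac.at/ABO/+Z180627808
```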


Use of the Corpus for Machine Learning

This corpus was used to train an automatic classifier in this publication:

Jan Rörden, Doris Gruber, Martin Krickl, Bernhard Haslhofer (2019) Identifying Historical Travelogues in Large Text Corpora Using Machine Learning (accepted for publication), arXiv:2001.01673 [cs.DL]

More information and the source code are available in this repository: Travelogues/identifying-travelogues.


License
