
ImportingVocabularies

Google Code Exporter edited this page Apr 30, 2015 · 1 revision

HIVE can import any vocabulary from an RDF/SKOS file. A vocabulary in any other format must be converted to SKOS before it can be imported.

To import a new vocabulary:

  1. Create a configuration file with the paths to the input files and to the indexes that the HIVE import tools will generate.
  2. Run the AdminVocabularies class to perform the import.

How to create your configuration file

Each vocabulary has its own configuration file, with the following format:

#Vocabulary data
name = NBII
longName = CSA/NBII Biocomplexity Thesaurus
uri = http://thesaurus.nbii.gov
rdf_file = /usr/local/hive/hive-data/nbii/nbii3.rdf

#Sesame Store
store = /usr/local/hive/hive-data/nbii/store

#Lucene Inverted Index 
index = /usr/local/hive/hive-data/nbii/index

#Autocomplete index
autocomplete = /usr/local/hive/hive-data/nbii/autocomplete

#H2 index
h2 = /usr/local/hive/hive-data/nbii/nbiiH2

#Dummy tagger data files
lingpipe_model = /usr/local/hive/hive-data/lingpipe/postagger/models/medtagModel


#KEA and Maui data files          
stopwords = /usr/local/hive/hive-data/nbii/KEA/data/stopwords/stopwords_en.txt
kea_training_set = /usr/local/hive/hive-data/nbii/KEA/train
kea_test_set = /usr/local/hive/hive-data/nbii/KEA/test
kea_model = /usr/local/hive/hive-data/nbii/KEA/nbii
maui_model = /usr/local/hive/hive-data/nbii/KEA/maui
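Because the configuration file uses the Java properties syntax (`key = value`, with `#` starting a comment), it can be sanity-checked before running the import. The sketch below is not part of HIVE: the class `VocabularyConfigCheck` and its methods are hypothetical, and it only checks that the keys from the NBII example above are present and non-empty. It trims values because `java.util.Properties` keeps trailing whitespace, so a stray space after a path or URI would otherwise become part of the value; whether HIVE itself trims values is not stated here.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper, not part of HIVE: sanity-checks a vocabulary configuration file.
class VocabularyConfigCheck {

    // Keys copied from the NBII example configuration.
    static final List<String> EXPECTED_KEYS = List.of(
            "name", "longName", "uri", "rdf_file", "store", "index",
            "autocomplete", "h2", "lingpipe_model", "stopwords",
            "kea_training_set", "kea_test_set", "kea_model", "maui_model");

    // Reads key = value pairs; lines starting with '#' are comments.
    static Properties load(Reader in) throws IOException {
        Properties props = new Properties();
        props.load(in);
        return props;
    }

    // Convenience wrapper for in-memory text.
    static Properties parse(String text) {
        try {
            return load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the value with surrounding whitespace removed, or null if the key is absent.
    static String get(Properties props, String key) {
        String value = props.getProperty(key);
        return value == null ? null : value.trim();
    }

    // Lists expected keys that are missing or have an empty value.
    static List<String> missingKeys(Properties props) {
        List<String> missing = new ArrayList<>();
        for (String key : EXPECTED_KEYS) {
            String value = get(props, key);
            if (value == null || value.isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java VocabularyConfigCheck.java <path to conf>/nbii.properties
        try (Reader in = Files.newBufferedReader(Path.of(args[0]))) {
            List<String> missing = missingKeys(load(in));
            System.out.println(missing.isEmpty() ? "all keys present" : "missing: " + missing);
        }
    }
}
```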

Place the configuration file in the same directory as the "hive.properties" file, which SKOSServer uses to identify which vocabularies to open.

The HIVE configuration directory may look like:

conf/ 
  agrovoc.properties 
  hive.properties
  lcsh.properties
  mesh.properties
  nbii.properties
  tgn.properties
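The naming convention in this listing (one `<vocabulary>.properties` file per vocabulary, next to `hive.properties`) can be inventoried with a short helper, for example to compare against the vocabularies configured in `hive.properties`. This is a hypothetical sketch, not HIVE code: it assumes every `*.properties` file other than `hive.properties` is a vocabulary configuration, and it only lists files. The set of vocabularies SKOSServer actually opens still comes from `hive.properties`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of HIVE.
class ConfDirListing {
    private static final String SUFFIX = ".properties";

    // Returns vocabulary names (file name minus ".properties"), skipping hive.properties.
    static List<String> vocabularyNames(Path confDir) {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(confDir, "*" + SUFFIX)) {
            for (Path file : files) {
                String fileName = file.getFileName().toString();
                if (!fileName.equals("hive.properties")) {
                    names.add(fileName.substring(0, fileName.length() - SUFFIX.length()));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        // Usage: java ConfDirListing.java <path to conf directory>
        System.out.println(vocabularyNames(Path.of(args[0])));
    }
}
```

For the directory shown above, this would print `[agrovoc, lcsh, mesh, nbii, tgn]`.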

AdminVocabularies

Before running AdminVocabularies, make sure Tomcat is not running: only one process at a time can access the HIVE index files.

AdminVocabularies takes these parameters:

  1. Path to configuration directory
  2. Name of the vocabulary
  3. Activate the training option for the KEA algorithm (optional; if you do not train your system, you cannot use the automatic indexing classes)

The command has the following form (training is performed when -a, -t, or -m is given):

java -Djava.ext.dirs=<path to HIVE lib dir> edu.unc.ils.mrc.hive.admin.AdminVocabularies -c <path to directory with hive.properties> -v <vocabulary name> [-a | -sldktmx]
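If you prefer to drive the import from Java (for example, from a deployment script), the command can be assembled as an argument list and handed to `ProcessBuilder`. This is a sketch under stated assumptions: the class `AdminVocabulariesCommand` and the paths in `main` are hypothetical placeholders, and `java` is assumed to be on the PATH. The `-Djava.ext.dirs` option is copied from the command above; it is only honored by Java 8 and earlier, because the extension mechanism was removed in Java 9, where the HIVE jars would have to go on the classpath instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical launcher, not part of HIVE.
class AdminVocabulariesCommand {

    // Builds the argument vector for the AdminVocabularies command; nothing is validated.
    static List<String> build(String libDir, String confDir, String vocabulary, String flags) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-Djava.ext.dirs=" + libDir);   // Java 8 and earlier only
        cmd.add("edu.unc.ils.mrc.hive.admin.AdminVocabularies");
        cmd.add("-c");
        cmd.add(confDir);                        // directory containing hive.properties
        cmd.add("-v");
        cmd.add(vocabulary);                     // e.g. "nbii"
        cmd.add(flags);                          // e.g. "-a" or "-sldktmx"
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder paths; substitute your own HIVE lib and conf directories.
        List<String> cmd = build("/usr/local/hive/lib", "/usr/local/hive/conf", "nbii", "-a");
        // inheritIO() forwards the import's console output to this process.
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        System.exit(process.waitFor());
    }
}
```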

Flags:

 -c <path>  Path to directory that contains hive.properties
 -v <name>  Name of vocabulary to be initialized (e.g., agrovoc)
 -s  Initialize Sesame index
 -l  Initialize Lucene index
 -d  Initialize H2 database
 -k  Initialize KEA database
 -t  Train KEA
 -m  Train Maui
 -x  Initialize autocomplete
 -a  Initialize everything (equivalent to -sldktmx)
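The single-letter flags can be combined into one group, as in `-sldktmx`. The sketch below models the flag table to show how a group expands into initialization steps, and that `-a` selects the same steps as listing all seven letters. It is a hypothetical illustration, not the parser inside AdminVocabularies; the enum names are invented, and `-c` and `-v` are left out because they take arguments.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical model of the AdminVocabularies flag table (not HIVE code).
class AdminFlags {

    // One initialization step per single-letter flag.
    enum Step {
        SESAME('s'), LUCENE('l'), H2('d'), KEA_DB('k'),
        TRAIN_KEA('t'), TRAIN_MAUI('m'), AUTOCOMPLETE('x');

        final char flag;

        Step(char flag) {
            this.flag = flag;
        }
    }

    // Expands a flag group such as "-sldx" or "-a" into the steps it requests.
    static Set<Step> parse(String group) {
        if (!group.startsWith("-") || group.length() < 2) {
            throw new IllegalArgumentException("expected a flag group like -sl, got: " + group);
        }
        Set<Step> steps = EnumSet.noneOf(Step.class);
        for (char c : group.substring(1).toCharArray()) {
            if (c == 'a') {
                steps.addAll(EnumSet.allOf(Step.class)); // -a selects every step
                continue;
            }
            Step match = null;
            for (Step s : Step.values()) {
                if (s.flag == c) {
                    match = s;
                    break;
                }
            }
            if (match == null) {
                throw new IllegalArgumentException("unknown flag: -" + c);
            }
            steps.add(match);
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(parse("-sl"));                          // [SESAME, LUCENE]
        System.out.println(parse("-a").equals(parse("-sldktmx"))); // true
    }
}
```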

Once the vocabulary has been loaded, you may start Tomcat and test to make sure the vocabulary is working properly.

Effects of AdminVocabularies

AdminVocabularies creates the following databases, indexes, and models:

  • H2 database containing administrative tables for the HIVE service. If the -k flag is specified, tables are also created to support the KEA++ indexing algorithm.
  • Lucene inverted index for searching. HIVE uses a document-centric approach to representing concepts in the inverted index: each concept is represented as a document with multiple fields (e.g., preferred term, alternate terms, scope notes).
  • Sesame database to store the SKOS/RDF data. HIVE uses a Sesame NativeStore, so vocabularies are stored on the file system.
  • Lucene autocomplete index (if the -x flag is specified).
  • KEA++ (-t) and Maui (-m) statistical models used for automatic indexing.
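The document-centric layout of the Lucene index can be illustrated without Lucene itself: each concept becomes one document with several named fields, and the inverted index maps each term to the concepts, per field, that contain it. This is a simplified in-memory sketch, not HIVE's indexing code; the field names `prefLabel` and `altLabel`, the naive tokenizer (lowercase, split on non-word characters), and the `urn:example:` concept URIs are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Simplified, hypothetical model of a document-centric concept index (not HIVE code).
class ConceptIndexSketch {

    // One concept = one document with several named, multi-valued fields.
    record ConceptDoc(String uri, Map<String, List<String>> fields) {}

    // term -> field name -> URIs of concepts whose field contains the term
    private final Map<String, Map<String, Set<String>>> postings = new HashMap<>();

    void add(ConceptDoc doc) {
        doc.fields().forEach((field, values) -> {
            for (String value : values) {
                // Naive tokenizer: lowercase, split on runs of non-word characters.
                for (String token : value.toLowerCase(Locale.ROOT).split("\\W+")) {
                    if (!token.isEmpty()) {
                        postings.computeIfAbsent(token, t -> new HashMap<>())
                                .computeIfAbsent(field, f -> new TreeSet<>())
                                .add(doc.uri());
                    }
                }
            }
        });
    }

    // Returns the URIs of concepts whose given field contains the term.
    Set<String> search(String field, String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Map.of())
                .getOrDefault(field, Set.of());
    }

    public static void main(String[] args) {
        ConceptIndexSketch index = new ConceptIndexSketch();
        index.add(new ConceptDoc("urn:example:c1", Map.of(
                "prefLabel", List.of("Water quality"),
                "altLabel", List.of("Aquatic pollution"))));
        index.add(new ConceptDoc("urn:example:c2", Map.of("prefLabel", List.of("Water"))));
        System.out.println(index.search("prefLabel", "water")); // [urn:example:c1, urn:example:c2]
        System.out.println(index.search("altLabel", "water"));  // []
    }
}
```

Keeping postings per field is what lets a query such as "preferred term contains *water*" be answered separately from matches in alternate terms or scope notes.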

Indexes and databases can be stored anywhere on your file system. The location of each one is defined in the vocabulary's properties file in the conf directory.
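Since every location comes from the properties file, a pre-flight check can confirm that the directories meant to hold the generated indexes and models exist before a long import starts. This is a hypothetical sketch: treating `store`, `index`, `autocomplete`, `h2`, `kea_model`, and `maui_model` as output locations is an inference from the comments in the example configuration, and whether AdminVocabularies creates missing parent directories on its own is not stated here, so the check only reports them.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical pre-flight check, not part of HIVE.
class OutputLocationCheck {

    // Keys assumed (from the example file's comments) to name generated outputs,
    // as opposed to inputs such as rdf_file or stopwords.
    static final List<String> OUTPUT_KEYS =
            List.of("store", "index", "autocomplete", "h2", "kea_model", "maui_model");

    // Returns one message per output key that is unset or whose parent directory is missing.
    static List<String> problems(Properties props) {
        List<String> problems = new ArrayList<>();
        for (String key : OUTPUT_KEYS) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                problems.add(key + ": not set");
                continue;
            }
            Path parent = Paths.get(value.trim()).toAbsolutePath().getParent();
            if (parent == null || !Files.isDirectory(parent)) {
                problems.add(key + ": parent directory missing for " + value.trim());
            }
        }
        return problems;
    }
}
```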
