Merge pull request #67 from rinigus/nominatim
Import data through Nominatim
rinigus authored Aug 8, 2022
2 parents 822fc52 + 5081ded commit 6e1644d
Showing 24 changed files with 1,436 additions and 3,868 deletions.
14 changes: 14 additions & 0 deletions .clang-format
@@ -0,0 +1,14 @@
---
# We'll use defaults from the GNU style, but with a 100-column limit.
BasedOnStyle: GNU
ColumnLimit: 100
---
Language: Cpp
AllowShortFunctionsOnASingleLine: Inline
AlwaysBreakAfterDefinitionReturnType: None
AlwaysBreakAfterReturnType: None
SpaceBeforeParens: ControlStatements

AlignConsecutiveAssignments: Consecutive
AlignConsecutiveDeclarations: Consecutive
AlignConsecutiveDeclarations: Consecutive
4 changes: 4 additions & 0 deletions .gitignore
@@ -15,3 +15,7 @@
*~

/*.pro.user
build/
.vscode
*.code-workspace
CMakeLists.txt.user
52 changes: 47 additions & 5 deletions CMakeLists.txt
@@ -5,19 +5,22 @@ project(geocoder-nlp
DESCRIPTION "Geocoder NLP")

set(CMAKE_INCLUDE_CURRENT_DIR ON)
set(CMAKE_CXX_STANDARD 11)
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED True)

include(FindPkgConfig)
include(FeatureSummary)
include(GNUInstallDirs)

find_package(PkgConfig REQUIRED)
find_package(nlohmann_json 3.2.0 REQUIRED)
find_package(Boost 1.30 COMPONENTS program_options REQUIRED)

pkg_check_modules(MARISA marisa IMPORTED_TARGET)
pkg_check_modules(KYOTOCABINET kyotocabinet IMPORTED_TARGET)
pkg_check_modules(POSTAL postal IMPORTED_TARGET)
pkg_check_modules(POSTAL libpostal IMPORTED_TARGET)
pkg_check_modules(SQLITE3 sqlite3 IMPORTED_TARGET)
pkg_check_modules(LIBPQXX libpqxx IMPORTED_TARGET)

set(SRC
src/geocoder.cpp
@@ -31,31 +34,70 @@ set(HEAD
include_directories(thirdparty/sqlite3pp/headeronly_src)
include_directories(src)

# boost
include_directories(${Boost_INCLUDE_DIR})

# importer
set(IMPSRC
importer/src/config.h
importer/src/main.cpp
importer/src/hierarchy.cpp
importer/src/hierarchy.h
importer/src/hierarchyitem.cpp
importer/src/hierarchyitem.h
importer/src/normalization.cpp
importer/src/normalization.h
importer/src/utils.cpp
importer/src/utils.h
)
add_executable(geocoder-importer ${SRC} ${HEAD} ${IMPSRC})
target_link_libraries(geocoder-importer
PkgConfig::MARISA
PkgConfig::KYOTOCABINET
PkgConfig::POSTAL
PkgConfig::SQLITE3
PkgConfig::LIBPQXX
nlohmann_json::nlohmann_json
${Boost_LIBRARIES})

# demo codes
add_executable(geocoder-nlp
demo/geocoder-nlp.cpp
${SRC}
${HEAD})

target_link_libraries(geocoder-nlp
-lmarisa -lkyotocabinet -lpostal -lsqlite3)
PkgConfig::MARISA
PkgConfig::KYOTOCABINET
PkgConfig::POSTAL
PkgConfig::SQLITE3)

add_executable(nearby-line
demo/nearby-line.cpp
${SRC}
${HEAD})

target_link_libraries(nearby-line
-lmarisa -lkyotocabinet -lpostal -lsqlite3)
PkgConfig::MARISA
PkgConfig::KYOTOCABINET
PkgConfig::POSTAL
PkgConfig::SQLITE3)

add_executable(nearby-point
demo/nearby-point.cpp
${SRC}
${HEAD})

target_link_libraries(nearby-point
-lmarisa -lkyotocabinet -lpostal -lsqlite3)
PkgConfig::MARISA
PkgConfig::KYOTOCABINET
PkgConfig::POSTAL
PkgConfig::SQLITE3)

# install
install(TARGETS geocoder-importer
DESTINATION ${CMAKE_INSTALL_BINDIR})

# summary
feature_summary(WHAT ALL FATAL_ON_MISSING_REQUIRED_PACKAGES)

49 changes: 49 additions & 0 deletions Database.md
@@ -0,0 +1,49 @@
# Geocoder NLP database format

The geocoder database consists of several files which are expected to be in the
same directory. All locations are described using a single coordinate to keep
the files as small as possible.

The files composing a database are:

1. geonlp-primary.sqlite: SQLite database with location descriptions and coordinates
2. geonlp-normalized.trie: MARISA database with normalized strings
3. geonlp-normalized-id.kch: Kyoto Cabinet database for linking MARISA and primary IDs

## geonlp-primary.sqlite

The SQLite database contains location descriptions and their organization into
a hierarchy of objects.

Table `object_primary` keeps the location descriptions. In this table, objects
are stored sequentially (in terms of their `id`) according to their position in
the object hierarchy, with children stored after their parents. Table
`hierarchy` has a record for each item that has children, consisting of the
parent ID (`prim_id`, the `id` from `object_primary`) and the ID of the last
child (`last_subobject`).
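
The parent/last-child layout suggests that the objects below a parent form a
contiguous block of IDs. A minimal sketch of that idea follows; it assumes the
block is the range from `prim_id` (exclusive) to `last_subobject` (inclusive),
which is an interpretation of the description above and not confirmed by it.
`HierarchyRow` and `is_descendant` are hypothetical names and not part of the
library.

```cpp
#include <cstdint>

// Hypothetical in-memory mirror of one row of the `hierarchy` table.
struct HierarchyRow
{
    std::uint32_t prim_id;        // ID of the parent object
    std::uint32_t last_subobject; // ID of the last child stored after the parent
};

// Returns true if `id` lies in the block of objects stored after the parent.
// Assumption: descendants occupy the contiguous range (prim_id, last_subobject].
bool is_descendant(const HierarchyRow &row, std::uint32_t id)
{
    return id > row.prim_id && id <= row.last_subobject;
}
```

With such a layout, restricting a search to a region would need only two
integer comparisons per candidate instead of a walk through the hierarchy.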

Object types are stored separately in `type` table with the type ID used in
`object_primary`.

Spatial queries are indexed using an R-Tree, with `box_id` used as a reference
in `object_primary`. Namely, as all objects are stored as points, for storage
efficiency, objects next to each other are set to have the same `box_id` and
are found through `-rtree` tables.

Table `meta` keeps database format version and is used to check version
compatibility.

## geonlp-normalized.trie

All normalized strings are stored in a MARISA database
(https://github.com/s-yata/marisa-trie). Normalized strings are formed from
`name` and other similar fields of the `object_primary` table in
`geonlp-primary.sqlite`. All strings are pushed into the MARISA database, which
assigns its own internal ID to each of the strings.

## geonlp-normalized-id.kch

Kyoto Cabinet (https://dbmx.net/kyotocabinet/) database for linking MARISA and
primary IDs. The hash database variant is used, where the key is an ID provided
by MARISA for a search string and the value is an array of bytes consisting of
`object_primary` IDs stored as `uint32_t` one after another. The array is
stored using `std::string`.
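
The value layout described above can be sketched as follows. This is an
illustration only: `pack_ids` and `unpack_ids` are hypothetical helper names,
not functions from the library, and the sketch assumes the IDs are written in
host byte order and read back on a machine with the same byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Pack primary IDs into a byte string, one uint32_t after another.
// Assumption: host byte order is used for both writing and reading.
std::string pack_ids(const std::vector<std::uint32_t> &ids)
{
    std::string bytes(ids.size() * sizeof(std::uint32_t), '\0');
    if (!ids.empty())
        std::memcpy(&bytes[0], ids.data(), bytes.size());
    return bytes;
}

// Recover the IDs from a byte string produced by pack_ids.
std::vector<std::uint32_t> unpack_ids(const std::string &bytes)
{
    // A valid value holds a whole number of 4-byte IDs.
    assert(bytes.size() % sizeof(std::uint32_t) == 0);
    std::vector<std::uint32_t> ids(bytes.size() / sizeof(std::uint32_t));
    if (!ids.empty())
        std::memcpy(ids.data(), bytes.data(), bytes.size());
    return ids;
}
```

Using `std::memcpy` avoids alignment and aliasing problems that a direct cast
of the string buffer to `uint32_t *` could cause.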
100 changes: 0 additions & 100 deletions Makefile

This file was deleted.

17 changes: 11 additions & 6 deletions README.md
@@ -1,8 +1,11 @@
# geocoder-nlp
# Geocoder NLP

This is a geocoder C++ library that uses libpostal to parse the user
This is a geocoder C++ library that targets offline use by mobile
applications. It is able to perform forward and reverse geocoding.
For forward geocoding, it uses libpostal to parse the user
request, normalize the parsed result, and search for the match in
geocoder database.
geocoder database. In addition to traditional reverse geocoding, it is
able to find points of interest close to the reference point or line.

The library includes a demo program showing how to use it. It's also used
as one of the geocoders in OSM Scout Server
@@ -29,7 +32,7 @@ libraries mentioned above.
## Databases

At present, the datasets required for the geocoder to function are distributed
as a part of OSM Scout Server datasets .
as a part of OSM Scout Server datasets.

If you use the geocoder with the full libpostal installation, you don't need to
get the libpostal datasets from that location, but can use the datasets
@@ -43,8 +46,10 @@ To use country-specific datasets, you would have to get:
In addition, the prepared geocoder databases are available at
geocoder/SELECT THE NEEDED ONES.

The database format is described in a [separate document](Database.md).

## Acknowledgments

libpostal: https://github.com/openvenues/libpostal
libpostal: Used for input parsing; https://github.com/openvenues/libpostal

libosmscout: http://libosmscout.sourceforge.net
Nominatim: Used for data import; https://nominatim.org/
74 changes: 0 additions & 74 deletions importer/Makefile

This file was deleted.
