Faster than you can snatch the pebble from our hand, we will return a location.
This repo solves the problem of finding a location for geographic text, in particular postal address input. This process, often called geocoding, returns a latitude and longitude (y and x) value for an entered postal address.
Using Elasticsearch and a fabric of high-value data, this project offers an API built on microservices. These services receive entered text, parse it for postal address attributes, search authoritative local, state, and national data on those attributes, and then return the best-fit location for that text. The intent of this project is a high-availability, high-volume, high-use geocoding service. Other projects contain the data-loading and user-interface functions; this project is the back-end code for the search algorithm and API services.
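As a rough sketch of that workflow, a client sends free-form address text to the service and gets back a best-fit location with coordinates. The endpoint path, query parameter, and response fields below are illustrative placeholders rather than the documented API:

```sh
# Hypothetical request -- the endpoint and parameters are placeholders, not the real API
curl "http://localhost:31010/geocode?address=1311+30th+St+NW+Washington+DC"

# A response might look roughly like:
# { "input": "1311 30th St NW Washington DC",
#   "latitude": 38.90, "longitude": -77.06, "source": "state-address-points" }
```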
Our goal is to reduce the burden on financial institutions that need to report location information. This project was built to establish a federal authoritative function for mortgage market needs. In particular, the Consumer Financial Protection Bureau has elected to provide a geocoding service for financial institutions that need to establish location attributes in order to meet regulatory requirements such as the Qualified Mortgage and Home Mortgage Disclosure Act rules. These rules require financial institutions to report data on their mortgage activities, and this service offers an authoritative function to meet that need.
We also noticed a gap in approaches to traditional geocoding and wanted to allow an opportunity for growth in the technology around this area. Many federal, state, and local entities have generic needs for geocoding, which this service may help provide. Many traditional geocoding services hamper government use with a) inflexible terms and conditions (e.g. share-alike clauses), b) proprietary technology requiring continuous licensing, and/or c) the inability to use local, more relevant data in the search.
We encourage forking, adding to the code base and/or general use of the service.
Grasshopper's service layer runs on the Java Virtual Machine (JVM) and requires the Java 8 JDK to build and run the project. This project is currently being built and tested on Oracle JDK 8. See Oracle's JDK Install Overview for install instructions.
Grasshopper should also run on OpenJDK 8.
Grasshopper's service layer is written in Scala. To build it, you will need to download and install Scala 2.11.x.
In addition, you'll need Scala's interactive build tool sbt. Please refer to the installation instructions to get going.
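To confirm the toolchain is installed before building, you can check the versions from a shell (the exact output will vary by machine):

```sh
java -version   # should report a 1.8.x JDK
scala -version  # should report Scala 2.11.x
sbt about       # prints the sbt and Scala versions the build uses
```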
Grasshopper uses Elasticsearch as a backend to store data for geocoding. For dev and test purposes, grasshopper includes an in-memory ElasticsearchServer. For non-dev environments, you'll want a dedicated Elasticsearch instance.
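For non-dev environments with a dedicated Elasticsearch instance, you can confirm it is reachable by hitting its root endpoint (the host and port below assume a default local install):

```sh
# Elasticsearch answers on port 9200 by default; adjust for your environment
curl http://localhost:9200
# Returns a small JSON document with the node name and version if the cluster is up
```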
Grasshopper uses sbt's multi-project builds, with each project representing a specific task and usually a microservice.
- Start sbt:

  ```sh
  $ sbt
  ```

- Select the project to build and run:

  ```
  > projects
  [info] In file:/Users/keelerh/Projects/grasshopper/
  [info]    client
  [info]    elasticsearch
  [info]    geocoder
  [info]  * grasshopper
  [info]    metrics
  [info]    model
  > project geocoder
  [info] Set current project to geocoder (in build file: /path/to/geocoder/)
  ```

- Start the service. This will retrieve all necessary dependencies, compile the Scala source, and start a local server. It also listens for changes to the underlying source code and auto-deploys to the local server.

  ```
  > ~re-start
  ```

- Confirm the service is up by browsing to http://localhost:31010.
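If you'd rather check from the command line, a simple curl of that same URL confirms the service is listening:

```sh
# Quick sanity check that the geocoder service is responding
curl -i http://localhost:31010
```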
All grasshopper services and apps can be built as Docker images. Docker Compose is also used to simplify local development.
Docker is a Linux-only tool. If you are developing on Mac or Windows, you will need a VM to run Docker. Below are the steps for setting up a VirtualBox-based VM using Docker Machine.
- Install necessary dependencies (Mac-specific):

  ```sh
  brew install docker docker-compose docker-machine
  ```
- Create a Docker VM using Docker Machine, and point the Docker client to it:

  ```sh
  docker-machine create -d virtualbox docker-vm
  eval "$(docker-machine env docker-vm)"
  ```
- Discover the Docker host's IP:

  ```sh
  docker-machine ip docker-vm
  ```

  Note: This is referred to as `{{docker-host-ip}}` throughout this doc.
- Check out all other grasshopper-related repos into the same directory as grasshopper. This currently includes the `grasshopper-parser`, `grasshopper-ui`, and `grasshopper-loader` repos referenced below.
- Assemble the Scala projects into Java artifacts:

  ```sh
  cd grasshopper
  sbt clean assembly
  ```

  Note: This is necessary because the `geocoder` Docker image is purely Java, so the Scala code must first be compiled and packaged to run in that environment. This step must be repeated with each change to the grasshopper project.
- Start all projects:

  ```sh
  docker-compose up -d
  ```
- Browse to the web-based containers to confirm they're working:

  | Container     | URL                             |
  |---------------|---------------------------------|
  | geocoder      | http://{{docker-host-ip}}:31010 |
  | parser        | http://{{docker-host-ip}}:5000  |
  | ui            | http://{{docker-host-ip}}       |
  | elasticsearch | http://{{docker-host-ip}}:9200  |
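If you prefer the command line, one way to spot-check the stack is to capture the Docker host IP in a shell variable and curl a couple of the ports from the table above; the variable name is just a local convenience:

```sh
# Capture the Docker host IP (same value as `docker-machine ip docker-vm`)
DOCKER_HOST_IP=$(docker-machine ip docker-vm)

# List the containers started by docker-compose
docker-compose ps

# Spot-check two of the services
curl "http://${DOCKER_HOST_IP}:31010"   # geocoder
curl "http://${DOCKER_HOST_IP}:9200"    # elasticsearch
```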
The `grasshopper-ui` and `grasshopper-parser` projects support auto-reload of code, so you don't have to rebuild their respective images with each code change. `grasshopper-ui` even has a Docker-specific Grunt task for further dev-friendliness. This means you can make UI changes and just refresh the browser to view them.

```sh
cd ../grasshopper-ui
grunt docker
```
The default Compose setup also mounts the local `grasshopper-loader/test/data` directory into the `grasshopper-loader` container, so you can place files there and load them without having to rebuild.
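For example, to stage an extra data file for loading without rebuilding the container (the file name below is just a placeholder):

```sh
# Copy a local JSON data file into the mounted test-data directory (file name is illustrative)
cp my-addresses.json ../grasshopper-loader/test/data/
```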
The `grasshopper-loader` project comes with some small test data files. You can load state address point and Census TIGER line data as follows:

```sh
docker-compose run loader ./index.js -f path/to/data.json
docker-compose run loader ./tiger.js -d path/to/tiger
```

For further details on loading data, see grasshopper-loader.
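After a load finishes, one way to confirm that documents actually landed in Elasticsearch is to list its indices; this uses Elasticsearch's standard cat API rather than anything grasshopper-specific:

```sh
# List Elasticsearch indices and document counts after loading
curl "http://{{docker-host-ip}}:9200/_cat/indices?v"
```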
If you'd like to see the "full stack", which adds several logging and monitoring services, just point `docker-compose` at the "full" setup. This will start a lot of containers, so there's no need to run this setup during day-to-day development.

```sh
docker-compose -f docker-compose-full.yml up -d
```
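When you're done, the same compose file can be used to check on or stop those containers; these are standard docker-compose subcommands:

```sh
# See what the full stack started
docker-compose -f docker-compose-full.yml ps

# Stop the full-stack containers when you're finished
docker-compose -f docker-compose-full.yml stop
```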
To run the tests, from the project directory:

```sh
$ sbt
> test
```

This will run unit and integration tests. The integration tests stand up a temporary Elasticsearch node, so no additional dependencies are needed.
In addition to regular testing, some projects (e.g. client, geocoder) also have integration tests that can be run against a live system. To run these, first make sure that the underlying dependencies have been deployed and are running (the addresspoints, parser, and census services). The underlying services also need to have the necessary data to pass the tests.
```sh
$ sbt
> project geocoder
> it:test
```
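Equivalently, you can run the same integration suite non-interactively from the shell; this is just standard sbt command syntax rather than anything project-specific:

```sh
# Run the geocoder project's integration tests in a single sbt invocation
sbt "project geocoder" it:test
```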
The tests will occasionally print out a stack trace because the in-memory Elasticsearch node doesn't load all libraries. So far this is not an issue for the purposes of testing.
For details on how to get involved, please first read our CONTRIBUTING guidelines.