
Ways of Using Social Harvest

tmaiaroto edited this page Sep 14, 2014 · 2 revisions

There are two main ways to use Social Harvest. It is designed to be a very flexible platform, but the vast majority of users will be end users rather than developers, so emphasis is put on an API that serves their needs. You can, however, forgo a front-end altogether and use Social Harvest purely as a raw data mining machine.

Scenario 1: The Dashboard

The first and most common scenario is visualizing harvested data in a dashboard. The dashboard lives in a separate codebase, and the harvester server is designed to work with it by exposing an API.

For this use case, a database is required. Social Harvest natively supports MySQL, Postgres, and MongoDB. Additional databases may be supported in the future (a package that abstracts the database queries is being used, so adding adapters is theoretically feasible). The API server must also be reachable by the front-end dashboard. The harvester API has basic authentication and CORS support, but any security concerns are currently outside the scope of the project.
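As a rough illustration of what a client of the harvester API has to do, here is a minimal Python sketch that builds an authenticated request. It assumes that "basic authentication" means HTTP Basic auth, which the text above does not confirm. The host, port, path, and credentials are hypothetical placeholders, not documented Social Harvest values.

```python
import base64
import urllib.request


def build_api_request(base_url: str, path: str, username: str, password: str) -> urllib.request.Request:
    """Build a GET request that carries an HTTP Basic Authorization header."""
    # HTTP Basic auth sends "username:password" encoded as base64.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    # Join the base URL and path without doubling or dropping the slash.
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Accept", "application/json")
    return request


# Hypothetical host, path, and credentials, used only for illustration.
req = build_api_request("http://localhost:2345", "/example/endpoint", "user", "secret")

# Sending it would be: urllib.request.urlopen(req)
# (not done here, since it needs a running harvester).
```

A browser-based dashboard would send the same header from JavaScript, which is where the harvester's CORS support comes in.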

After configuring the harvester, a web server must be set up (again, outside the scope of Social Harvest) to host the front-end dashboard code. The dashboard is entirely JavaScript and HTML, so the web hosting requirements are low. Theoretically, one could host the dashboard on Amazon S3, provided CORS is configured properly.

Users can then open the dashboard in their browser (or the dashboard could be packaged as a desktop application with the proper tools) and visualize the data. The dashboard also offers various settings that control the harvester through the API.

This is pretty much Social Harvest used wholesale, which means some control is given up.

Scenario 2: Raw Harvest

Social Harvest recognizes that some users will have very custom needs. Someone may wish to use their own dashboard or feed harvested data into their other projects. Maybe a database won't be used at all. All of this is fine and accounted for in some way.

The harvester can be configured not to expose an API server. It can also, optionally, skip storing data in a database. Instead (or additionally), harvested data can be written to log files. These log files can then be processed by something like Fluentd, which sends the data wherever the user wants. This could be a different database (maybe one not supported by the harvester natively) or even multiple databases.

Fluentd allows for some pretty sophisticated workflows. It is possible, for example, to read the data coming from the harvester and then alter it before sending it along to be stored elsewhere. This is the most flexible the harvester gets: the user has complete control over where the data goes and how it eventually gets used.
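The read, alter, and forward idea can also be sketched outside Fluentd. The Python below is a stand-in for such a pipeline step, not a Fluentd configuration. It assumes the harvester writes one JSON object per log line, which this page does not specify, and the field names (`message`, `network`, `pipeline`) are hypothetical.

```python
import json
from typing import Callable, Dict, Iterable, Iterator


def transform_records(lines: Iterable[str], alter: Callable[[Dict], Dict]) -> Iterator[Dict]:
    """Parse log lines as JSON objects and yield each one after altering it."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if isinstance(record, dict):
            yield alter(record)


def add_pipeline_tag(record: Dict) -> Dict:
    """Example alteration: copy the record and tag it before forwarding."""
    altered = dict(record)
    altered["pipeline"] = "custom"  # hypothetical field added for illustration
    return altered


# Stand-in for lines read from a harvester log file.
sample_lines = [
    '{"message": "hello", "network": "example"}',
    "",
    "not json",
]
results = list(transform_records(sample_lines, add_pipeline_tag))
```

Each altered record in `results` could then be written to whatever store the user chooses, which is the role Fluentd's output plugins would play in a real setup.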

Again, database storage (into MySQL, Postgres, or MongoDB) is independent of logging; either can be used alone, or both at the same time. Some users won't want Social Harvest's own dashboard and will just want the data as is. Others may want their own dashboard and their own API. The option is there.
