CloudI HtmlUnit Service

WHY?

Scraping modern websites requires the rendered result, after JavaScript has modified the page contents. The cloudi_service_htmlunit CloudI service provides the rendered result as XML while isolating any problems that may exist in the HtmlUnit source code. Browser source code is often unstable, and HtmlUnit has had memory consumption problems in the past. Despite that, cloudi_service_htmlunit provides reliable HtmlUnit processing while tolerating transient HtmlUnit bugs.

BUILD

Use Maven and JDK 11 or later to build:

mvn clean package

RUNNING

Start the cloudi_service_htmlunit Java service:

export JAVA=`which java`
export PWD=`pwd`
export USER=`whoami`
cat << EOF > htmlunit.conf
[[{prefix, "/browser/"},
  {file_path, "$JAVA"},
  {args, "-Dfile.encoding=UTF-8 "
         "-server "
         "-ea:org.cloudi... "
         "-Xms1g -Xmx1g "
         "-jar $PWD/target/cloudi_service_htmlunit-2.0.7-jar-with-dependencies.jar "
         "-browser default"},
  {count_thread, 4},
  {options,
   [{owner, [{user, "$USER"}]},
    {directory, "$PWD"}]}]]
EOF
curl -X POST -d @htmlunit.conf http://localhost:6464/cloudi/api/rpc/services_add.erl

To extract the Java quickstart from https://cloudi.org using an XPath query:

curl -G --data-urlencode "url=https://cloudi.org" --data-urlencode "xpath=//div[@id='Java']" http://localhost:6464/browser/render
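
The same request can also be made from Java. The following is a minimal sketch, not part of this repository: the class name RenderClient and its helper methods are hypothetical, and it assumes the service is reachable at http://localhost:6464/browser/render as configured above and accepts the same url and xpath query parameters the curl command sends. It uses only the JDK 11 java.net.http client.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;

// Hypothetical client for illustration; not part of cloudi_service_htmlunit.
class RenderClient {

    // Build the GET request URI with both parameters URL-encoded,
    // mirroring "curl -G --data-urlencode ...".
    static URI buildRenderUri(String endpoint, String url, String xpath) {
        String query =
            "url=" + URLEncoder.encode(url, StandardCharsets.UTF_8) +
            "&xpath=" + URLEncoder.encode(xpath, StandardCharsets.UTF_8);
        return URI.create(endpoint + "?" + query);
    }

    // Send the request and return the response body (the rendered XML).
    static String render(URI uri) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri)
            .timeout(Duration.ofSeconds(60)) // assumed limit; rendering can be slow
            .GET()
            .build();
        HttpResponse<String> response =
            client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws InterruptedException {
        URI uri = buildRenderUri(
            "http://localhost:6464/browser/render",
            "https://cloudi.org",
            "//div[@id='Java']");
        System.out.println("GET " + uri);
        try {
            System.out.println(render(uri));
        } catch (IOException e) {
            // Typically means CloudI or the service is not running locally.
            System.err.println("request failed: " + e.getMessage());
        }
    }
}
```

Keeping the URI construction separate from the network call makes the encoding easy to check without a running CloudI node.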
