wiki-talk-parser

This little program can:

  • Parse Wikipedia dump files (XML) into wiki-talk networks. The original Wikipedia UIDs are preserved.
  • "Shrink" the resulting network into an unweighted directed network without loops, as in the SNAP wiki-Talk dataset.
  • Group users according to their roles.
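
To illustrate what "shrinking" means, here is a minimal shell sketch. It is not the actual shrinker.jar implementation. It assumes a hypothetical whitespace-separated edge list with the source user in the first column and the target user in the second; the parser's real output format may differ. The function name shrink_edges is made up for this example.

```shell
# Sketch only: assumes each input line is "source target [extra columns...]".
# This column layout is an assumption, not the documented parser output.
shrink_edges() {
  # Keep only the two endpoint columns, dropping weights, timestamps, etc.,
  # skip loops (source == target), then collapse repeated edges into one.
  awk 'NF >= 2 && $1 != $2 { print $1, $2 }' | sort -u
}
```

For example, `printf '1 2 100\n1 2 200\n2 2 300\n' | shrink_edges` would keep the single edge `1 2`: the duplicate is merged and the loop `2 2` is dropped.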

Usage with stu

Using stu is the easiest way to run everything. The only file you need is main.stu. Simply run stu, or:

$ nohup stu -k -j 3 &

Stu will automatically download this program and the datasets, then parse them. The -j parameter sets the number of jobs to run in parallel. For downloading, more than 3 is not recommended.

Usage without stu

Installation

Manually download the latest jar files.

Parse

$ java -jar parser.jar *input-file* *lang* > *output-file*

Shrink

$ java -jar shrinker.jar *input-file* > *output-file*

Group users

$ java -jar grouper.jar *input-file* > *output-file*

Compilation

$ lein with-profile parser:shrinker:grouper uberjar

License

Copyright © 2023 Yfiua

Distributed under the Eclipse Public License, either version 1.0 or any later version.