wiki-talk-parser

This little program can:

Parse Wikipedia dump files (XML) into wiki-talk networks. The original Wikipedia user IDs are retained.
"Shrink" the resulting network into an unweighted directed network without loops, as in the SNAP wiki-Talk dataset.
Group users according to their roles.

Usage with stu

The easiest way to run everything is with stu. The only file you need is main.stu. Simply run stu, or:

$ nohup stu -k -j 3 &

Stu automatically downloads this program and the datasets, then parses them. The -j parameter sets the number of jobs that run in parallel. For downloading, more than 3 is not recommended.

Usage without stu

Installation

Manually download the latest jar files.

Parse

$ java -jar parser.jar *input-file* *lang* > *output-file*

Shrink

$ java -jar shrinker.jar *input-file* > *output-file*
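
To illustrate what shrinking means (an unweighted directed network without loops), here is a minimal Python sketch. It is not the shrinker's actual implementation, and it does not read the parser's output format, which this README does not describe. The function name `shrink` and the in-memory list of `(source, target)` pairs are assumptions made only for this example.

```python
def shrink(edges):
    """Collapse a directed multigraph edge list into a simple directed graph.

    Hypothetical helper for illustration only: `edges` is assumed to be an
    iterable of (source, target) user-ID pairs, one per talk-page message.
    """
    seen = set()
    result = []
    for source, target in edges:
        # Drop loops: a user writing on their own talk page.
        if source == target:
            continue
        # Drop repeated edges so the network becomes unweighted.
        if (source, target) in seen:
            continue
        seen.add((source, target))
        result.append((source, target))
    return result


if __name__ == "__main__":
    raw_edges = [(1, 2), (1, 2), (2, 1), (3, 3), (2, 3)]
    print(shrink(raw_edges))  # prints [(1, 2), (2, 1), (2, 3)]
```

Direction is preserved, so `(1, 2)` and `(2, 1)` remain distinct edges; only exact duplicates and self-loops are removed.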

Group users

$ java -jar grouper.jar *input-file* > *output-file*

Compilation

$ lein with-profile parser:shrinker:grouper uberjar

License

Distributed under the Eclipse Public License, either version 1.0 or any later version.
