Andrew Roberts @ School of Computing

Software

There's not much here yet, but I will eventually start putting some of my little apps here.

 

aConCorde
aConCorde is a multilingual concordance tool. Originally developed for native Arabic concordance, it possesses basic concordance functionality, as well as English and Arabic interfaces. It is written in Java, so it will run on any platform that has the Java Runtime Environment installed. Go to the aConCorde project homepage for full details.
Project page
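To illustrate what "basic concordance functionality" usually means, here is a minimal keyword-in-context (KWIC) sketch in Java. It is not aConCorde's code: the class name `KwicSketch`, the method `concordance`, and the whitespace-only tokenisation are assumptions made for illustration. Each hit is shown with up to `window` tokens on either side.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class KwicSketch {
    // Hypothetical helper: builds a keyword-in-context (KWIC) line for every
    // occurrence of `keyword`, with up to `window` tokens either side.
    // Tokens are split on whitespace only, which is an illustrative simplification.
    static List<String> concordance(String text, String keyword, int window) {
        String[] tokens = text.trim().split("\\s+");
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (!tokens[i].equalsIgnoreCase(keyword)) {
                continue;
            }
            int from = Math.max(0, i - window);
            int to = Math.min(tokens.length, i + window + 1);
            String left = String.join(" ", Arrays.copyOfRange(tokens, from, i));
            String right = String.join(" ", Arrays.copyOfRange(tokens, i + 1, to));
            // Bracket the keyword; trim in case one side of the context is empty.
            lines.add((left + " [" + tokens[i] + "] " + right).trim());
        }
        return lines;
    }

    public static void main(String[] args) {
        String text = "the cat sat on the mat and the dog sat on the cat";
        for (String line : concordance(text, "sat", 2)) {
            System.out.println(line);
        }
        // prints:
        // the cat [sat] on the
        // the dog [sat] on the
    }
}
```

Because the split is on whitespace rather than on Latin letters, the same sketch works unchanged on Arabic text; a real tool would also need to handle punctuation and clitics.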
jTokeniser
jTokeniser is a set of classes that provide a variety of tokenisers for your Java projects. Simple tokenisers such as WhiteSpaceTokeniser or StringTokeniser provide basic token extraction, whereas RegexTokeniser and BreakIteratorTokeniser offer more advanced options for more thorough tokenisation, such as discarding punctuation. Recent additions include RegexSeparatorTokeniser, which allows complex definitions of token delimiters. A SentenceTokeniser is also provided for segmenting text into a set of sentences. See the jTokeniser project homepage for full details.
Project page
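The difference between the simple and the more thorough tokenisers can be sketched with standard JDK classes. This does not use jTokeniser's own API: the class `TokeniserSketch` and its three methods are hypothetical stand-ins that only approximate the behaviour described above, using `String.split`, `java.util.regex`, and `java.text.BreakIterator`.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokeniserSketch {
    // Whitespace splitting: punctuation stays attached to the neighbouring word.
    static List<String> whitespaceTokens(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? List.of() : Arrays.asList(trimmed.split("\\s+"));
    }

    // Regex matching: keep only runs of Unicode letters/digits, discarding punctuation.
    static List<String> regexTokens(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile("[\\p{L}\\p{N}]+").matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // Sentence segmentation with the JDK's locale-aware BreakIterator.
    static List<String> sentences(String text) {
        List<String> result = new ArrayList<>();
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String s = text.substring(start, end).trim();
            if (!s.isEmpty()) {
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Hello, world! This is a test.";
        System.out.println(whitespaceTokens(text)); // [Hello,, world!, This, is, a, test.]
        System.out.println(regexTokens(text));      // [Hello, world, This, is, a, test]
        System.out.println(sentences(text));        // [Hello, world!, This is a test.]
    }
}
```

The regex variant matches tokens rather than splitting on delimiters; splitting on a delimiter pattern instead is closer to what the RegexSeparatorTokeniser description suggests.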
JBootCat
JBootCat is a Java implementation of the BootCat scripts written by Marco Baroni et al. for generating corpora from the Internet. The main goal is to encapsulate the BootCat functionality within a user-friendly desktop application. The advantage of using the Java platform is that JBootCat can be run easily on most major operating systems.
Project page
Jacman
Jacman is a GUI frontend to the excellent pacman package manager that comes with the equally excellent ArchLinux.
Project page
buckwalter2unicode.py
A fairly simple Python script designed to convert Arabic text written in Buckwalter's transliteration system to a Unicode encoding. It also supports the reverse direction, i.e., Unicode to Buckwalter. Requires Python. Released under the GPL.
Download: buckwalter2unicode.py | buckwalter2unicode-0.2.zip | README | Changelog
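The conversion is essentially a one-to-one character table applied in either direction. The script itself is Python; the Java sketch below only illustrates the idea and is not a port of it. It covers a subset of Buckwalter's table (hamza forms, the Arabic letters, tatweel, tanween and the short-vowel diacritics) and exploits the fact that these Arabic characters occupy two contiguous Unicode runs. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class BuckwalterSketch {
    private static final Map<Character, Character> TO_ARABIC = new HashMap<>();
    private static final Map<Character, Character> TO_BUCKWALTER = new HashMap<>();

    // Registers a run of ASCII symbols whose Arabic counterparts are
    // consecutive Unicode code points starting at `firstArabic`.
    private static void addRun(String ascii, char firstArabic) {
        for (int i = 0; i < ascii.length(); i++) {
            char latin = ascii.charAt(i);
            char arabic = (char) (firstArabic + i);
            TO_ARABIC.put(latin, arabic);
            TO_BUCKWALTER.put(arabic, latin);
        }
    }

    static {
        // Subset only: hamza forms and letters, U+0621 (hamza) to U+063A (ghain).
        addRun("'|>&<}AbptvjHxd*rzs$SDTZEg", '\u0621');
        // Tatweel, remaining letters, tanween and short vowels, U+0640 to U+0652 (sukun).
        addRun("_fqklmnhwYyFNKaui~o", '\u0640');
    }

    // Characters without a mapping (spaces, digits, punctuation) pass through unchanged.
    private static String convert(String text, Map<Character, Character> table) {
        StringBuilder out = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            char mapped = table.getOrDefault(c, c);
            out.append(mapped);
        }
        return out.toString();
    }

    static String toUnicode(String buckwalter) {
        return convert(buckwalter, TO_ARABIC);
    }

    static String toBuckwalter(String arabic) {
        return convert(arabic, TO_BUCKWALTER);
    }

    public static void main(String[] args) {
        String arabic = toUnicode("ktb");         // kaf, teh, beh
        System.out.println(arabic);
        System.out.println(toBuckwalter(arabic)); // ktb
    }
}
```

Since the table is a bijection over the covered characters, converting to Unicode and back returns the original Buckwalter string.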
LAPD - Language Analysis for Plagiarism Detection
This is a rather crude Java implementation, written by Alex Morrison and me for a piece of coursework during our degrees. Its purpose is to detect cheating between two pieces of natural language text. It compares trigrams from the two source files, and if there is enough overlap, the pair is considered plagiarism. The software also computes many stylistic statistics, although we ran out of time before we could factor these into the detection procedure.
Download: lapd.tar.gz | LAPD report | README
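The trigram comparison can be sketched as follows. The description above does not say how "enough overlap" is measured, so this sketch assumes word trigrams, a Jaccard score (shared trigrams divided by all distinct trigrams), and a hypothetical threshold of 0.5; LAPD's actual measure and cut-off may differ, and the class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class TrigramOverlapSketch {
    // Hypothetical cut-off: LAPD's real threshold is not documented here.
    static final double THRESHOLD = 0.5;

    // Collects word trigrams (three consecutive lower-cased words) from a text.
    static Set<String> trigrams(String text) {
        List<String> words = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + 2 < words.size(); i++) {
            grams.add(words.get(i) + " " + words.get(i + 1) + " " + words.get(i + 2));
        }
        return grams;
    }

    // Assumed similarity measure: Jaccard index over the two trigram sets.
    static double overlap(String a, String b) {
        Set<String> ta = trigrams(a);
        Set<String> tb = trigrams(b);
        Set<String> union = new HashSet<>(ta);
        union.addAll(tb);
        if (union.isEmpty()) {
            return 0.0;
        }
        Set<String> shared = new HashSet<>(ta);
        shared.retainAll(tb);
        return (double) shared.size() / union.size();
    }

    static boolean looksPlagiarised(String a, String b) {
        return overlap(a, b) >= THRESHOLD;
    }

    public static void main(String[] args) {
        String a = "the cat sat on the mat";
        String b = "the cat sat on a rug";
        System.out.printf(Locale.ROOT, "overlap = %.3f, flagged = %b%n",
                overlap(a, b), looksPlagiarised(a, b));
        // prints: overlap = 0.333, flagged = false
    }
}
```

Here the two texts share 2 of 6 distinct trigrams. Normalising by the union keeps the score between 0 and 1, though a measure normalised by the shorter text would be more sensitive to a short passage copied into a long essay.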