Text categorization using lexical chains

Lexical chaining is the process of reading a document and grouping its individual words into chains of words with similar meaning. It has been applied successfully in a variety of text retrieval applications. I have made an initial study of applying it to dynamic text categorization. The prototype I have developed is available for download. Do whatever you want with it!
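To make the idea concrete, here is a minimal sketch of greedy lexical chaining in Python. It is not the algorithm used in the Prolog prototype: the RELATED table is a hand-written, hypothetical stand-in for the relations a real chainer would look up in WordNet, and the function names are chosen only for illustration.

```python
# Hypothetical stand-in for WordNet: unordered word pairs treated as related.
RELATED = {
    frozenset({"car", "vehicle"}),
    frozenset({"car", "wheel"}),
    frozenset({"bank", "money"}),
    frozenset({"money", "loan"}),
}


def related(a, b):
    """Two words are related if they are identical or listed in RELATED."""
    return a == b or frozenset({a, b}) in RELATED


def build_chains(words):
    """Greedily place each word in the first chain holding a related word."""
    chains = []
    for word in words:
        for chain in chains:
            if any(related(word, member) for member in chain):
                chain.append(word)
                break
        else:
            # No existing chain accepts the word, so it starts a new chain.
            chains.append([word])
    return chains
```

With this toy table, build_chains(["car", "bank", "wheel", "money", "vehicle", "loan"]) yields one chain about vehicles and one about finance. A real chainer would also have to choose between word senses, which this sketch ignores.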


Report At DIKU I carried out a small project on this subject. The result is a report describing the developed prototype, along with a survey of similar research.
Download The system is distributed as a tar.gz file containing the Prolog source. You should also download WordNet in two forms: the native version, which contains the *.exc files, and the Prolog version. The system has been tested with SICStus Prolog 3.7 and WordNet 1.6.
Installation The .exc files (adj.exc, adv.exc, cousin.exc, noun.exc, verb.exc) are used for looking up irregular word forms. Each line in them has the form:
[derivation] [base form]
and should be converted so that each line reads
exc([syntactic form],[derivation],[base form]).
where [syntactic form] is the name of the current file.
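The conversion can be done with any text-processing tool. Below is a minimal Python sketch, not part of the distributed system. It rests on a few assumptions: the [syntactic form] is taken to be the file name without its .exc extension (for example noun), words are written as single-quoted Prolog atoms, and a line listing more than one base form produces one fact per base form. The function names are hypothetical.

```python
from pathlib import Path


def quote_atom(text):
    """Write text as a single-quoted Prolog atom, escaping \\ and '."""
    escaped = text.replace("\\", "\\\\").replace("'", "''")
    return f"'{escaped}'"


def convert_line(syntactic_form, line):
    """Turn one '[derivation] [base form]' line into exc/3 facts."""
    fields = line.split()
    if len(fields) < 2:
        return []  # skip blank or malformed lines
    derivation, base_forms = fields[0], fields[1:]
    # Assumption: each extra base form on the line gets its own fact.
    return [
        f"exc({syntactic_form},{quote_atom(derivation)},{quote_atom(base)})."
        for base in base_forms
    ]


def convert_file(source, target):
    """Convert a whole .exc file, writing one Prolog fact per line."""
    source, target = Path(source), Path(target)
    syntactic_form = source.stem  # assumption: noun.exc -> noun
    # latin-1 decodes any byte sequence, so unusual bytes cannot abort the run.
    with source.open(encoding="latin-1") as src, \
            target.open("w", encoding="latin-1") as dst:
        for line in src:
            for fact in convert_line(syntactic_form, line):
                dst.write(fact + "\n")
```

For example, convert_line("noun", "geese goose") returns the single fact exc(noun,'geese','goose'). The name of the output file passed to convert_file is up to you; check what the prototype's source expects before choosing one.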

Copy all the converted .exc files, along with the wn_*.pl files from the Prolog distribution, into the wordnet directory that was created when the textcat.tar.gz file was extracted.

Next, run the convert.pl and stopword.pl programs to create Prolog database files from the WordNet database. This creates a directory for each database file; the total size is about 80 MB. Copy these directories to the lexchain directory and run the main.pl file, which demonstrates the lexical chainer on the words listed in the article.pl file.

Links Cycorp. A company building a huge electronic ontology.
Threads of meaning in documents.
About Scatter/Gather.
WordNet. The home page of WordNet.
An Adaptive Approach to Text Categorization and Understanding.
Last updated 2000-03-03