Z-News SE News: Automatic Classification in Indexing and Searching

Recent developments in Indexing, Searching and Information Retrieval Technologies

This page is updated regularly. Please send your suggestions to: demchenko@terena.nl

Automatic Classification

IKEM Toolkit
http://bikit.rug.ac.be:80/ikem/
IKEM Toolkit is a hybrid knowledge-based platform for thesaurus-oriented electronic document management. The project was sponsored by IWT. IKEM Toolkit contains various tools to manage your hybrid documents in an intelligent and user-oriented way.

Willpower Information. Information Management Consultants
www.willpower.demon.co.uk
Thesauri and vocabulary control: Principles and practice
http://www.willpower.demon.co.uk/thesprin.htm
Software for building and editing thesauri
http://www.willpower.demon.co.uk/thessoft.htm

CMU Text Learning Group
http://www.cs.cmu.edu/afs/cs/project/theo-4/text-learning/www/index.html
The goal is to develop new machine learning algorithms for text and hypertext data. Applications of these algorithms include information filtering systems for the Internet and software agents that make decisions based on textual information.

CMU World Wide Knowledge Base (WebKB) project
http://www.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/
The goal is to develop a probabilistic, symbolic knowledge base that mirrors the content of the World Wide Web. If successful, this will make textual information on the web available in computer-understandable form, enabling much more sophisticated information retrieval and problem solving.

Bow: A Toolkit for Statistical Language Modeling, Text Retrieval, Classification and Clustering
Bow (or libbow) is a library of C code useful for writing statistical text analysis, language modeling and information retrieval programs. The current distribution includes the library, as well as front-ends for document classification (rainbow), document retrieval (arrow) and document clustering (crossbow).
The library and its front-ends were designed and written by Andrew McCallum.
http://www.cs.cmu.edu/~mccallum/bow/rainbow/
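
To give a feel for the kind of statistical document classification a front-end such as rainbow is built for, here is a minimal Python sketch of a multinomial naive Bayes classifier with Laplace smoothing. It is not taken from Bow and does not use its API. The function names, the whitespace tokenizer and the toy training set are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def train(labeled_docs):
    """Collect per-class document counts and word counts from (label, text) pairs."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for label, text in labeled_docs:
        class_docs[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return class_docs, word_counts, vocab


def classify(model, text):
    """Return the class with the highest log posterior under Laplace smoothing."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_docs):
        # Log prior: fraction of training documents in this class.
        score = math.log(class_docs[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing keeps unseen (class, word) pairs from zeroing the score.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training set, invented for this sketch.
TRAINING = [
    ("retrieval", "index search query ranking documents"),
    ("retrieval", "search engine query index"),
    ("learning", "training classifier labels features"),
    ("learning", "learning algorithm training data"),
]

model = train(TRAINING)
print(classify(model, "query the search index"))
print(classify(model, "training a classifier"))
```

On this toy data the first query is assigned to "retrieval" and the second to "learning". A real toolkit would add stop-word handling, feature selection and held-out evaluation, none of which is attempted here.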

Homepage of Andrew McCallum
http://www.cs.cmu.edu/~mccallum/

Contains extensive material on learning algorithms for text classification, including the papers listed below.

Multi-Label Text Classification with a Mixture Model Trained by EM. Andrew McCallum. NIPS'99. - http://www.cs.cmu.edu/~mccallum/papers/multilabel-nips99s.ps.gz
Building Domain-Specific Search Engines with Machine Learning Techniques. Andrew McCallum, Kamal Nigam, Jason Rennie and Kristie Seymore. AAAI-99 Spring Symposium. A related paper was also accepted to IJCAI'99. - http://www.cs.cmu.edu/~mccallum/papers/cora-aaaiss99.ps.gz
Using Reinforcement Learning to Spider the Web Efficiently. Jason Rennie and Andrew McCallum. Draft accepted to ICML'99. - http://www.cs.cmu.edu/~mccallum/papers/rlspider-icml99s.ps.gz
Improving Text Classification by Shrinkage in a Hierarchy of Classes. Andrew McCallum, Ronald Rosenfeld, Tom Mitchell and Andrew Ng. ICML-98 - http://www.cs.cmu.edu/~mccallum/papers/hier-icml98.ps.gz
Learning to Extract Knowledge from the World Wide Web. Mark Craven, Dan DiPasquo, Dayne Freitag, Andrew McCallum, Tom Mitchell, Kamal Nigam, Sean Slattery. AAAI-98. - http://www.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/overview-aaai98.ps.gz

Reinforcement Learning with Selective Perception and Hidden State. PhD Thesis, by Andrew Kachites McCallum
http://www.cs.rochester.edu/u/mccallum/phd-thesis/
The method uses memory-based learning and a robust statistical test on reward to learn a structured policy representation that makes perceptual and memory distinctions only where the task at hand needs them. It can also be understood as a method of value function approximation. The model learned is an order-n partially observable Markov decision process. It handles noisy observations, actions and rewards.

WWW -- Wealth, Weariness or Waste: Controlled vocabulary and thesauri in support of online information access
David Batty
http://www.dlib.org/dlib/november98/11contents.html

Using Automated Classification for Summarizing and Selecting Heterogeneous Information Sources
R. Dolin, D. Agrawal, A. El Abbadi, J. Pearlman
http://www.dlib.org/dlib/january98/dolin/01dolin.html
