/research/software/webutils/default.htm

originally: http://www.oclc.org/research/software/webutils/default.htm

Webutils

The Webutils Open Source project offers Perl utilities to support web harvesting and metadata extraction.

License

Distribution

The Webutils code in the CVS repository is divided into modules for ease of retrieval.

The modules are listed below, each with links to its documentation and source. The Webutils code may be downloaded for use or evaluation without using CVS.

WWW::Harvester (v 1.15) Documentation Source
This module provides an extensible mechanism for harvesting web pages, i.e., for acting as a spider or robot.
HTML::Normalizer (v 1.04) Documentation Source
This module extracts and normalizes the text of an HTML page.
HTML::MetaExtor (v 1.08) Documentation Source
This module extracts metadata from the META elements of an HTML page. If supplied with a list of index terms, it also reports which of those terms appear in the page. (Note: MetaExtor depends on Normalizer.)
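
To make the module descriptions above more concrete, here is a minimal sketch of the general technique they describe: collecting META name/content pairs, normalizing the page text, and reporting which index terms occur in it. This is not the Webutils code and does not use the HTML::Normalizer or HTML::MetaExtor APIs, which are not shown on this page. It is written in Python with the standard library only, and the names MetaAndTextParser, normalize, and extract are hypothetical. The normalization rule (lowercasing and collapsing whitespace) and the plain substring match for index terms are likewise assumptions made for illustration.

```python
from html.parser import HTMLParser


class MetaAndTextParser(HTMLParser):
    """Hypothetical helper: collects META name/content pairs and page text."""

    # Elements whose character data is not visible page text.
    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.meta = {}        # lowercased META name -> content
        self.chunks = []      # pieces of visible text, in document order
        self._skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attributes = dict(attrs)
            name = attributes.get("name")
            content = attributes.get("content")
            if name and content is not None:
                self.meta[name.lower()] = content
        elif tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def normalize(text):
    """Assumed normalization: lowercase and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def extract(html, index_terms=()):
    """Return (meta dict, normalized text, index terms found in the text)."""
    parser = MetaAndTextParser()
    parser.feed(html)
    parser.close()
    text = normalize(" ".join(parser.chunks))
    # Plain substring match: "web" would also match inside "website".
    found = [term for term in index_terms if normalize(term) in text]
    return parser.meta, text, found
```

Calling extract(page_source, ["web harvesting", "robot"]) returns the META dictionary, the normalized text, and the subset of the supplied terms found in that text. Term matching runs on the normalized text, which mirrors the dependency of MetaExtor on Normalizer noted above.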

Contact