last update: 02/10/2004
Anthracite Example: Searching, Excerpting and Summarizing Legislation
Anthracite's processing tools can be used to support legal research, for example by searching legislation.
In this case, the starting point was a list of eight Colorado state laws in PDF format that had been identified as potentially related to the topic of voting.
These are the original PDF files: VotingLegislationPDFs
An Anthracite UNIX Command Source Processor, used in conjunction with the freely available (GPL) "pdftotext" program, reads each of the PDF files containing legislation and performs four operations on its text.
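The text extraction step can be approximated outside Anthracite. The following is a minimal sketch in Python, assuming "pdftotext" is installed and on the PATH; the function names are hypothetical and this is not Anthracite's own mechanism.

```python
import subprocess


def pdftotext_command(pdf_path):
    """Build the pdftotext command line for one PDF (hypothetical helper)."""
    # Passing "-" as the output file asks pdftotext to write to standard output.
    return ["pdftotext", pdf_path, "-"]


def pdf_to_text(pdf_path):
    """Run pdftotext and return the extracted text as a string."""
    result = subprocess.run(
        pdftotext_command(pdf_path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    return result.stdout
```

Calling pdf_to_text once per PDF file would give the plain text that the later steps operate on.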

First, all of the documents' text is searched for the term "vote", and each occurrence is replaced with the same word surrounded by HTML tags for bold.
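As an illustration of this tagging step, here is a small Python sketch. It assumes a case-insensitive substring match (the original does not say how Anthracite matches the term), and the function name bold_term is hypothetical.

```python
import re


def bold_term(text, term):
    """Wrap every occurrence of term in <b>...</b>, keeping its original case."""
    pattern = re.compile(re.escape(term), flags=re.IGNORECASE)  # assumption: ignore case
    return pattern.sub(lambda m: "<b>" + m.group(0) + "</b>", text)
```

Because this matches substrings, words such as "voters" are tagged as well, with only the "vote" portion in bold.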

Then, that tagged text is sent to three different processors:
"Extract", which pulls the first 1024 characters from the text of each PDF file.
"Summarize", which uses Apple's Text Summarization Engine to condense the text into 5 relevant sentences.
"Text Near", which is set to extract 1024 characters from the text wherever it finds the word "vote".
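The "Extract" and "Text Near" operations can be sketched in Python as follows. This is only an approximation under stated assumptions: the function names are hypothetical, and the sketch assumes the 1024-character window is roughly centered on each match, which the original does not specify. "Summarize" is omitted because it relies on Apple's Text Summarization Engine.

```python
def extract_head(text, length=1024):
    """Return the first `length` characters of the text."""
    return text[:length]


def text_near(text, term, window=1024):
    """Return one excerpt of at most `window` characters per occurrence of term."""
    excerpts = []
    half = window // 2
    pos = text.find(term)
    while pos != -1:
        # Assumption: center the window on the start of the match.
        start = max(0, pos - half)
        end = min(len(text), start + window)
        excerpts.append(text[start:end])
        pos = text.find(term, pos + len(term))
    return excerpts
```

Occurrences that sit close together produce overlapping excerpts in this sketch; merging them is left out for brevity.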
The results of these three processors are then appended to three different output files in raw HTML format. These are the output files:
VotingLegislationExcerpt.html
VotingLegislationTextNear.html
VotingLegislationSummary.html
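The append step can be pictured with a short Python sketch. The helper name append_html is hypothetical, and the sketch assumes each result is written as a raw HTML fragment followed by a newline.

```python
def append_html(path, fragment):
    """Append one raw HTML fragment to the output file, creating it if needed."""
    with open(path, "a", encoding="utf-8") as out:
        out.write(fragment + "\n")
```

Opening the file in append mode means the results for each PDF accumulate in the same output file, for example VotingLegislationExcerpt.html, rather than overwriting earlier results.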
