last update: 3/23/03
anthracite processors: regex
Regular Expressions (REGEX) are a powerful way to match patterns in text, even when the actual text to be matched varies.
For example, regular expressions can match all of the phone numbers in a particular format on a web page.
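As a rough illustration of the phone-number case, here is a minimal sketch in Python. It uses Python's standard "re" module, not the Anthracite REGEX Processor Object, so behavior inside Anthracite may differ. The pattern is the one listed under Phone Numbers in this document. The function name and the sample page text are hypothetical, made up for the example.

```python
import re

# Pattern from this document's Phone Numbers example:
# three digits, a hyphen, three digits, a hyphen, four digits.
PHONE_PATTERN = re.compile(r"[0-9]{3}-[0-9]{3}-[0-9]{4}")


def find_phone_numbers(text):
    """Return every NNN-NNN-NNNN match found in text, in order of appearance."""
    return PHONE_PATTERN.findall(text)


# Hypothetical page content, invented for illustration only.
sample_page = "<p>Sales: 800-555-1212</p><p>Support: 800-555-1313 (weekdays)</p>"

print(find_phone_numbers(sample_page))
# prints ['800-555-1212', '800-555-1313']
```

Note that the pattern matches only this one format. A number written as (800) 555-1212 or 800.555.1212 would be skipped, and the pattern can also match inside a longer run of digits and hyphens.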
Regular Expressions have been around for a while, and there are some excellent resources to help you learn how to make the most productive use of them (see below for links and recommendations). This document cannot and does not replace those resources; it simply provides a starting point for using the REGEX Processor Object in Anthracite.
NOTE: This document does not teach you how to construct your own Regular Expressions; it simply shows some examples of Regular Expressions that have been tested in Anthracite. To learn how to write your own, please see any of the references listed below.
Phone Numbers:
Sample text: 800-555-1212
Regular Expression: [0-9]{3}-[0-9]{3}-[0-9]{4}
Resources for Learning Regular Expressions
O'Reilly "Mastering Regular Expressions"
O'Reilly "sed & awk"
There is a brief discussion of Regular Expressions toward the end of the "grep" man page. Use the Anthracite "Find Unix Command" option or type "man grep" at a command prompt in the Mac OS X Terminal program.