
Anthracite Sources

Anthracite currently offers eleven types of Source Object tools for working with remote or local data.

[ URL List ]
[ URL List File ]
[ CURL ]
[ Directory of Files ]
[ URL Generator ]
[ Google Search ]
[ UNIX Command ]
[ Links On Page ]
[ Static Text ]
[ AppleScript ]
[ MySQL Query ]

1. URL List

This is the most basic type of source. It allows you to use a list of URLs to load, so the output of this source can be data from many different documents on the web or on your local hard drive. You can add URLs to the list of sources to load in several ways: by clicking the "+" button and typing in the URL directly; by loading them from a text file of URLs on your hard drive; or by dragging and dropping them from your web browser as you are surfing.

Using the drag and drop method makes it very easy to build and keep a collection of the URLs you'd like to process while you are browsing, so you will most likely use this feature frequently.

To make it even more convenient, you can drag a URL from your web browser (either from the "location" field or a draggable hyperlink in a document) and drop it anywhere within your Anthracite document's window and a new URL List Source Object will be created for you with the dropped URL pre-loaded.

NOTE: Not all URLs will work with the basic URL List; those that don't will result in "0" data if you try to load them. In particular, URLs that require you to type in a name and password will not work. For those and other cases, use the Expert URL Source Object tool.

You can also use the URL List object to load files on your local hard drive (or theoretically over the network, although this has yet to be fully tested). Try using URLs like:

file://localhost/IEDownloads/10203040.html

file:///Users/joe/Documents/iTunes/iTunes\ Music\ Library.xml


You can even use foreign-language file names.
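If you build file URLs from paths in a script, the following minimal Python sketch shows the standard percent-encoding of spaces and non-ASCII characters. The helper name path_to_file_url is hypothetical, and whether Anthracite prefers this form over the backslash-escaped form shown above is not stated here, so treat it as an assumption to test.

```python
from urllib.parse import quote


def path_to_file_url(path: str) -> str:
    """Percent-encode an absolute POSIX path and prefix it with file://."""
    if not path.startswith("/"):
        raise ValueError("expected an absolute path such as /Users/joe/file.txt")
    # quote() leaves "/" untouched and encodes spaces and non-ASCII
    # characters as percent-escaped UTF-8 bytes.
    return "file://" + quote(path)


print(path_to_file_url("/Users/joe/Documents/iTunes/iTunes Music Library.xml"))
```

The same helper handles foreign-language file names, since each non-ASCII character is encoded as its UTF-8 bytes.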

 

2. URL List File


Use the URL List File Source Object when you have a list of the URLs you'd like to load. Click the "Choose URL List File" button and select a text file that contains the URLs, one on each line.
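To illustrate the file format, here is a small Python sketch that reads the contents of such a list, one URL per line. The function name read_url_list is hypothetical, and skipping blank lines is an assumption; Anthracite's own handling of blank lines is not documented here.

```python
from typing import List


def read_url_list(text: str) -> List[str]:
    """Return the URLs in a list file's contents, one per non-blank line."""
    # Assumption: surrounding whitespace and empty lines are ignored.
    return [line.strip() for line in text.splitlines() if line.strip()]


sample = "http://www.metafy.com/\n\nfile://localhost/IEDownloads/10203040.html\n"
print(read_url_list(sample))
```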

3. CURL

Anthracite makes use of the built-in "CURL" facility on your Mac OS X computer for URL transactions that are more complex than the standard URL object can handle, such as: accessing HTTPS (Secure Socket Layer) servers; accessing web servers that require a username and password; accessing the header information for a webpage; sending custom headers, such as a cookie or referrer, to a web server; and sending GET or POST form arguments.

A separate page is available detailing how to configure the CURL Source Object.
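To make the kinds of transaction listed above concrete, here is a rough Python sketch that builds (but does not send) one request combining them. It is not how Anthracite or CURL is implemented; the function name build_request, the example URL, and the use of HTTP Basic authentication are all assumptions for illustration.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request


def build_request(url, username, password, cookie, referrer, form_args):
    """Build an HTTPS POST request with credentials and custom headers."""
    # POST form arguments are sent URL-encoded in the request body.
    body = urlencode(form_args).encode("ascii")
    request = Request(url, data=body, method="POST")
    # Assumption: the server uses HTTP Basic authentication.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    request.add_header("Authorization", "Basic " + token.decode("ascii"))
    request.add_header("Cookie", cookie)
    # The HTTP header that carries the referrer is spelled "Referer".
    request.add_header("Referer", referrer)
    return request


req = build_request(
    "https://example.com/login",  # hypothetical URL
    "joe", "secret", "session=abc", "https://example.com/",
    {"q": "coal", "page": "2"},
)
print(req.get_method(), req.full_url)
```

Nothing is transmitted until the request is passed to urllib.request.urlopen, so you can inspect the headers and body first.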

 

4. Directory of Files

To process files stored in a folder on your hard drive, instead of using files on the network, use the "Directory of Files" Source Object tool. To specify the directory of files, click on "Choose Directory" and a standard Macintosh file navigation window will open. Choose the directory you wish to use. Click on the "Also use files in subfolders" checkbox if you also want to look in folders within your chosen directory.

NOTE: Currently, Anthracite only loads text files with ".txt", ".csv" and ".htm" or ".html" extensions.
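If you want to preview which files in a folder would qualify, the selection rule can be approximated with the Python sketch below. The function names are hypothetical, and matching extensions without regard to case is an assumption rather than documented Anthracite behavior.

```python
from pathlib import Path
from typing import List

# The extensions named in the note above.
ALLOWED_EXTENSIONS = {".txt", ".csv", ".htm", ".html"}


def is_loadable(name: str) -> bool:
    """Return True if the file name ends in one of the allowed extensions."""
    # Assumption: the comparison ignores upper/lower case.
    return Path(name).suffix.lower() in ALLOWED_EXTENSIONS


def find_loadable_files(directory: str, include_subfolders: bool = False) -> List[Path]:
    """List loadable files in a directory, optionally descending into subfolders."""
    root = Path(directory)
    candidates = root.rglob("*") if include_subfolders else root.glob("*")
    return sorted(p for p in candidates if p.is_file() and is_loadable(p.name))
```

The include_subfolders argument plays the role of the "Also use files in subfolders" checkbox.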

 

5. URL Generator

The URL Generator Source Object tool gives you the ability to load lists of URLs based on known variations in the URL, such as a series of documents with sequential numbers in their URLs, or a URL that varies in one component, such as a stock ticker symbol.

There are two steps to using the URL Generator tool. First, you must get a sample of one of the base URLs in the sequence you'd like to load. In the example pictured here, the sample base URL is in a subdirectory on metafy.com and is named "Report-East.html." Once the URL is entered into the Sample Base URL field, select the portion of the URL that changes; in this case, it's the word "East." Then click the "Set" button to specify the range of the URL that will change. To confirm your selection, the varying portion is shown in red in the field just below the Base URL entry field.

The second step is to specify how the URL changes. There are two methods to choose from: "Numeric" or "From List." Use "Numeric" when there is a numbered sequence of URLs to load, for example, "Report-001.html", "Report-002.html", etc. Enter the starting number and the ending number, and indicate whether you want the URL Generator to "pad" the number with zeroes (for example, inserting number 1 as "001" instead of just "1"). If you have a list of the parts of the URL that change, such as a list of stock ticker symbols that you want to load from the same web source, use the "From List" method. Specify the list file by clicking the "Set List File" button and navigating to the file containing the list of elements that change.

In the example shown above, the file "NEWS-List.txt" contains only the four words "North", "South", "East" and "West", each on a separate line. When processed, the URL Generator will create and load four URLs, one for each of the four words. So, with the settings described, the four URLs generated and loaded will be:

http://www.metafy.com/anthracite/samples/seq/Report-North.html
http://www.metafy.com/anthracite/samples/seq/Report-South.html
http://www.metafy.com/anthracite/samples/seq/Report-East.html
http://www.metafy.com/anthracite/samples/seq/Report-West.html
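The two methods can be sketched in Python as follows. This is only an illustration of the substitution idea, not Anthracite's implementation: the function names are hypothetical, the varying portion is located by searching for its last occurrence rather than by a selection range, and the numeric example URL is made up.

```python
from typing import List, Tuple


def split_sample(sample_url: str, varying: str) -> Tuple[str, str]:
    """Split the sample URL into the text before and after the varying portion."""
    # Assumption: the last occurrence of the text is the part that changes.
    start = sample_url.rindex(varying)
    return sample_url[:start], sample_url[start + len(varying):]


def urls_from_list(sample_url: str, varying: str, values: List[str]) -> List[str]:
    """'From List' method: substitute each list entry for the varying portion."""
    prefix, suffix = split_sample(sample_url, varying)
    return [prefix + value + suffix for value in values]


def urls_numeric(sample_url: str, varying: str, first: int, last: int,
                 pad_width: int = 0) -> List[str]:
    """'Numeric' method: substitute first..last, zero-padded to pad_width digits."""
    prefix, suffix = split_sample(sample_url, varying)
    return [prefix + str(n).zfill(pad_width) + suffix
            for n in range(first, last + 1)]


sample = "http://www.metafy.com/anthracite/samples/seq/Report-East.html"
for url in urls_from_list(sample, "East", ["North", "South", "East", "West"]):
    print(url)
```

With a pad width of 3, urls_numeric inserts 1 as "001", matching the padding option described above.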

 

6. Google Search

The Google Search Source Object tool uses Google's SOAP API to perform a query on their database and then load the result URLs.

In other words, it is similar to doing a search on Google via their web interface. But instead of returning a list of URLs for matching pages that you then click on to view, this tool actually loads all of the pages so you can process them.

To use the Google Search Source Object tool, you must have an account with Google's API server.

Click this link to Apply for a Google API Account

For complete information on how to use the Restrict and Language fields, be sure to review the Google API Reference online:

http://www.google.com/apis/reference.html

An expanded description of the Google API Source Object is available.

NOTE: There may be limitations on your use of Google's API, such as a limit of 10 results per query and 1000 queries per day. See Google's documentation for info on the settings.

 

7. UNIX Command

The UNIX Command Source Tool integrates the output from command-line programs that write to STDOUT.
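As a rough picture of what "integrating the output" means, this Python sketch runs a command and captures everything it writes to STDOUT as text. The function name command_output is hypothetical, and the example launches the Python interpreter itself only so that it runs on any system; it says nothing about how Anthracite invokes commands.

```python
import subprocess
import sys
from typing import List


def command_output(argv: List[str]) -> str:
    """Run a command and return whatever it wrote to STDOUT as text."""
    # check=True raises CalledProcessError if the command exits with an error.
    completed = subprocess.run(argv, capture_output=True, text=True, check=True)
    return completed.stdout


# Stand-in command: any program that prints to STDOUT would do.
text = command_output([sys.executable, "-c", "print('hello from STDOUT')"])
print(text, end="")
```

Anything the command writes to STDERR is kept separate and is not part of the returned text.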

[Enter args, examples page]

Please also see the Additional Documentation page for using UNIX Commands in Anthracite.

[UNIX COMMAND WARNING/DISCLAIMER]

 

8. Links on Page

Links On Page allows you to specify a URL for a webpage from which you'd like every linked page to be retrieved. Note that although this source object is capable of producing a large quantity of output, it does NOT currently provide recursive link traversal, or fully automated spidering.
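The single-level behavior described above can be sketched in Python like this: collect every hyperlink on one page and resolve it against the page's URL, without following links found on the linked pages. The class and function names are hypothetical, and this is not Anthracite's own parser.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against the page URL."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links into absolute URLs.
                self.links.append(urljoin(self.base_url, value))


def links_on_page(html: str, base_url: str) -> List[str]:
    """Return the absolute URLs linked from one page (no recursion)."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links


page = '<a href="/a.html">A</a> <a href="c.html">C</a>'
print(links_on_page(page, "http://example.com/dir/page.html"))
```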

9. Static Text

The Static Text object allows you to use a fixed piece of text as a source. This can be useful when you are developing your processes and wish to work with a snapshot of a webpage instead of loading the page from the server repeatedly. Hint: You can drag and drop text into your Anthracite documents and it will automatically be converted into a Static Text source object. Try this from the Results Window with selected text to quickly set up a results snapshot to process.


10. AppleScript

AppleScript is a powerful scripting language developed by Apple to enable inter-application workflow and full-fledged software development. Anthracite allows you to "Attach" an AppleScript to your application and will use the result of the script as the source for a process chain. Simply enter the AppleScript in the text field and it will be compiled and executed when you run your Anthracite document. Here's a simple script that retrieves the source of the currently selected mail message in Apple's built-in Mail.app:

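	-- Ask Mail for the messages that are currently selected
	tell application "Mail"
	   set theMail to selection
	   -- The script's result is the raw source of the first selected message
	   source of (item 1 of theMail)
	end tell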
	

11. MySQL Query

If you have the MySQL server and client software installed on your system in the standard way, the MySQL Query Source Tool should be enabled automatically for you. This tool allows you to use the output of SQL queries run against MySQL servers by the MySQL client command-line tool. The output of your MySQL query can be treated as either a block of text or as an array of results.
[More info on MySQL for MacOSX]
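To show the difference between the two treatments, the Python sketch below takes the same query output first as one block of text and then as an array of rows. It assumes the tab-separated format that the MySQL command-line client produces in batch mode, with column names on the first line; the sample data and the function name are made up, and how Anthracite itself splits the results is not described here.

```python
from typing import List


def rows_from_batch_output(output: str) -> List[List[str]]:
    """Split tab-separated client output into a list of rows of fields."""
    # Assumption: one row per line, fields separated by tab characters.
    return [line.split("\t") for line in output.splitlines() if line]


# Hypothetical output for a query returning two columns and two rows.
block_of_text = "id\tname\n1\tNorth\n2\tSouth\n"

rows = rows_from_batch_output(block_of_text)
header, records = rows[0], rows[1:]
print(header)
print(records)
```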

 



Last Updated: 10/21/2004
Last Updated: 8/26/2004
Last Updated: 7/16/2004
Last Updated: 6/14/2004
Last Updated: 2/06/2004
Last Updated: 3/23/2003