

Anthracite Quickstart Guide


Be sure to read the Installation Instructions if you have not already installed the Software.

When you first run Anthracite, you'll see the Startup Screen, which displays messages about the initialization status of the software. After the Startup Screen, two other windows open. The Log Window displays messages about what the program is doing, and a new, empty Document Window opens with a toolbar of icons at the top and a one-line status message at the bottom.

[Image: The initial empty Document Window and Toolbar]



The Toolbar has icons for running your process, saving your document, creating the five basic Object Types you will work with, and using presets of your own.




The five object-creation buttons on the Toolbar are:
Red Sphere: Source; Orange 3-D Plus: Processor; Blue Box: Results; Green Cone: Export; Black Slate: Report.

Clicking once on any of the five object-creation buttons adds an object of that type to the document.

Click once on the round red Source Object icon now. It should create a new object named "New Source Object" in the middle of the Document Window.

[Image: The Document with a new Source Object added]


Before creating any more objects, let's stop to explain what each of them does.

There are currently five types of objects you can use in your Anthracite documents:

 

Source Object (Red Round Sphere)
Source objects are how data is brought into the program at the beginning of an operation. There are several different source types, which we'll cover in a moment.


Processor Object (Orange 3-D Plus)
Processor objects do the work of manipulating data loaded from sources. The power of the program is in configuring the many types of processors to achieve the results you want.


Report Object (Black Slate)
Report objects combine output from processes and format it into different types of documents, using basic or template layouts from web-based sources.


Export Object (Green Cone)
Export Objects enable you to get data out of the Anthracite program for use in other applications, such as spreadsheets, databases, reports or websites.


Results Object (Blue Box)
Results objects hold the output of other objects and save that data along with your Anthracite document for later review or export. One use of Results Objects is to hold snapshots of each stage of a data transformation while you develop a complex processing chain. Results are covered in more detail below.



Now that we've covered what the objects do, let's move on to creating more of them and learning how to work with them.

But first, to stay organized as we add more objects, let's quickly cover how to position the object we created earlier and give it a name.


Click once on the new object to select it, hold the mouse button down, and drag the object around within the window.



If you double-click on the object, its Edit View will "drop down" from the title bar of the window as a Mac OS X "Sheet".




We'll cover the various object types later. For now, just type a new name in the Name field at the top of the Edit View. In this case, we're going to call it "Homepage".



Click "OK" at the bottom of the edit sheet.

Using the Processor (Orange 3-D Plus) and Results (Blue Box) Icon Buttons on the Toolbar, repeat this procedure so that you have one each of Source, Processor and Results Objects (Red, Orange and Blue) arranged in a diagonal line, and named "Homepage", "Links" and "LinkListResult" respectively.


If you create too many objects, click once on an extra object to select it, and then press the "Delete" key. You will be prompted to confirm that you want to delete the object. You can also "Cut" the object to the Pasteboard with Command-X.


SAVE YOUR DOCUMENT (always important, no matter what program you are using)! Choose "Save" from the File Menu, or press Command-S (Apple/Cloverleaf key + S key). Be sure to save the file somewhere you will be able to find it again. While exploring and building a process, you may wish to put the Anthracite file on your desktop for easy access. Once saved, your recent Anthracite documents are also listed under "Open Recent" in the File Menu.


For this explanation, I saved the file as "AnthraciteTesting" on the desktop.

Now that we've created some objects, we need to connect them to make them do their work.

Anthracite uses a simple yet powerful model of "Object Connections" to visually define how process operations will take place.

The idea behind these connections is that you will often want to send the data from one source, or from a collection of sources, through a single process, or you will want (or need) to run one process after another before you get exactly (or approximately) the results you desire.

Thus, it is important to understand that objects are connected in a "top down" fashion. At the top of a process chain there is at least one Source Object that loads the data and then passes it "downstream" to the objects it is connected to. Every connection has a "source" of data and a "target" for that data; often, the "source" is the output of one processor going into another processor object.

Anthracite enforces several rules about these Object Connections: you cannot connect objects so that they form a circular loop (an object feeding its own input, even indirectly through another object); you cannot connect the output of any object to a Source Object; and you cannot connect the output of a Results Object to another object.
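One way to picture these three rules is as checks on a directed graph, where each connection is an edge pointing downstream. The sketch below is a hypothetical Python model, not Anthracite's own code: the class name, the type strings, and the `connect` method are illustrative assumptions. A new connection from A to B closes a loop exactly when B can already reach A, so that is the test used for the circular-loop rule.

```python
# Hypothetical model of the connection rules; NOT Anthracite's internal code.
SOURCE, PROCESSOR, RESULTS, EXPORT, REPORT = (
    "source", "processor", "results", "export", "report")


class ProcessDocument:
    """Named objects plus directed, downstream connections between them."""

    def __init__(self) -> None:
        self.kinds: dict[str, str] = {}          # object name -> object type
        self.targets: dict[str, set[str]] = {}   # object name -> downstream names

    def add_object(self, name: str, kind: str) -> None:
        self.kinds[name] = kind
        self.targets[name] = set()

    def _reaches(self, start: str, goal: str) -> bool:
        """True if data leaving `start` can already flow to `goal`."""
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == goal:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(self.targets[node])
        return False

    def connect(self, source: str, target: str) -> None:
        """Add a source -> target connection, enforcing the three rules."""
        if self.kinds[target] == SOURCE:
            raise ValueError("cannot connect any object's output to a Source object")
        if self.kinds[source] == RESULTS:
            raise ValueError("cannot connect a Results object's output to another object")
        # The new edge closes a loop exactly when target can already reach
        # source; this also rejects connecting an object to itself.
        if self._reaches(target, source):
            raise ValueError("connection would create a circular loop")
        self.targets[source].add(target)
```

With the tutorial's "Homepage", "Links" and "LinkListResult" objects, connecting them top to bottom succeeds, while connecting "Links" back to "Homepage" or "LinkListResult" onward to another object raises `ValueError`.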

To create a connection between two objects, first, select the object that will be the source.

In the sample document we've created so far, the "Homepage" Source Object (the red sphere) will be the first source of data in the processing chain.

Click once on the source to select it.

While holding down the Command (Apple/Cloverleaf) key, click inside the red Source Object and drag toward the orange Processor Object. A green line follows the pointer as you drag, and when you drag the line into the Processor Object (orange 3-D plus), the Processor Object highlights to indicate that a connection will be made if you release the mouse button.



Go ahead and release the mouse so that the connection is made.


Repeat this procedure to connect the Processor Object to the Results Object (blue box).



The lines between objects show which objects are connected, and arrows indicate the direction of data flow between them.


To delete a connection from one object to another, you must use a context menu. Hold down "Control" ("CTRL") while clicking on an object to bring up its context menu. The context menu changes to reflect the connections each object has. From the same menu you can also bring up the Edit View for the object, delete the object itself, or disable the object, which you may want to do when testing your process. (Note: If you cut or delete an object, its connections to other objects are deleted too.)


When an object is disabled, it is shown as transparent, instead of its normal opacity. The context menu changes when an object is disabled to allow you to enable it again.


Now that we've created a set of objects and connected them together, let's configure what they're going to do.

Double click on the "Homepage" Object (red sphere), bringing up the Edit View we used earlier.

The "Type" pop-up selector lets you choose what type of Source Object this will be. We're going to use the default, "URL List", which is already selected.


Click on the white box area in the center of the sheet where the text "URLs" appears, below the title that says "URL List".

Click on the "+" button.

Then, double click just below the word "URLs"

A rectangular area will be highlighted, and a blinking insertion bar will appear at its left edge.

Type in the following text exactly, followed by Tab or Return (HINT: you can drag and drop URLs from your browser into Anthracite documents):

http://metafy.com/anthracite/samples/sources/

When you press Tab or Return, the text should become selected, as in this image:




For now, it is important that you see that your text is highlighted before continuing.

Click in the area just below the URL text you typed in, so that the URL is no longer highlighted.

Click "OK" and the sheet closes.

What we've done is set the Source Object to retrieve the data from the document at the URL we entered. This is just like what your favorite web browser does when it loads HTML from a web server, except that Anthracite doesn't format the HTML for display; it simply retrieves the text data from the web server so you can process it in other ways.
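To make "retrieves the text data without formatting it" concrete, here is a rough Python analogue using the standard library's `urllib.request`. It is an illustration under assumptions, not how Anthracite fetches pages: the function names are hypothetical, and falling back to UTF-8 when the server declares no charset is a choice made for this sketch.

```python
import urllib.request


def load_url_text(url: str, timeout: float = 30.0) -> str:
    """Return the raw, unrendered text at `url` (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Use the server's declared charset if present; otherwise assume UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def load_url_list(urls: list[str]) -> dict[str, str]:
    """Fetch every URL in a list, keyed by URL (hypothetical helper)."""
    return {url: load_url_text(url) for url in urls}
```

Calling `load_url_text("http://metafy.com/anthracite/samples/sources/")` would return the page's HTML source as a plain string, tags and all, which is the kind of raw text the next step processes.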


Next, let's set up an operation to perform on the text.

For this quick start tutorial, we're going to extract all the links from the homepage URL we've entered.

Double-click on the Processor Object ("Links", orange 3-D plus) to invoke its edit sheet, and select "Text Between" from the Type selector.



This will change the edit sheet's view to show the settings for the "Text Between" process.

As you saw from the pop up selector, there are a variety of types of processes available to you in Anthracite, all of which are explained in the User Manual.

For this tutorial, we're using "Text Between", which searches the source data for runs of text that fall between the "Start" and "End" match text you enter in its settings.



In this case, we're using "<a " (less-than bracket, lower case a, and a space) as the start field, and "</a>" (the standard HTML anchor close tag) as the end field, and we're going to include that search text in the results. We're also going to ignore the case of the match text, so "</A>" and "</a>" will both match.

If the Text Between finds more than one match, it will return them all as an Array (collection) of results.
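The "Text Between" settings described above (start text, end text, include the match text, ignore case, return every match) can be approximated with a regular expression. This Python sketch is an assumption-laden illustration, not Anthracite's implementation: the function and parameter names are hypothetical, and letting a match span line breaks (`re.DOTALL`) is an assumption about how the processor treats multi-line HTML.

```python
import re


def text_between(data: str, start: str, end: str,
                 include_match_text: bool = True,
                 ignore_case: bool = True) -> list[str]:
    """Return every run of text from `start` to the next `end`, in order.

    An empty list means nothing matched, so nothing is passed downstream.
    """
    flags = re.DOTALL  # assumption: a match may span line breaks
    if ignore_case:
        flags |= re.IGNORECASE
    # re.escape makes the match text literal; "(.*?)" stops at the FIRST end text.
    pattern = re.compile(re.escape(start) + "(.*?)" + re.escape(end), flags)
    group = 0 if include_match_text else 1  # whole match vs. inner text only
    return [m.group(group) for m in pattern.finditer(data)]
```

With the tutorial's settings, `text_between(page_text, "<a ", "</a>")` returns a list of complete anchor tags, comparable to the Array the processor hands to the Results Object. The trailing space in "<a " is what keeps it from also matching tags such as `<abbr>`.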

The Results Object ("LinkListResult", blue box) is able to accept results from processors in both Text and Array form. We'll show you how to see these different results in a moment.

It's also important to note that if the "Links" processor does not find any results, it will pass nothing along. In this case, if the page we load with the Source Object doesn't have any links on it, we'll get empty values in the Results Object. Building processors that extract just the data you want (and none of what you don't want) is your challenge, and is beyond the scope of this quick introduction. Techniques for getting the best results are spread throughout the User Manual.

For more information on strings and arrays (an important concept to understand when using Anthracite), please see the Terminology Page.

Now that we've configured the Source to load data (from the homepage URL we entered), configured the Processor to perform an operation on that data, and connected the Source's output to the Processor and the Processor's output to the Results Object, we're ready to run the process.

Select "Run Process" from the File Menu, or type Command-R, or click the "Run" icon in the Toolbar.

Each object will highlight as it is actively working on its assigned task.

Each time a run finishes loading and processing all the objects, the program will sound a chime.
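The top-down run described above behaves like processing objects in dependency (topological) order: an object works only after every object connected into it has finished. The following sketch is a hypothetical Python model built on the standard library's `graphlib` (Python 3.9+). The object names come from this tutorial, but the step functions are stand-ins, and nothing here describes how Anthracite actually schedules a run.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable


def run_process(steps: dict[str, Callable[[list[Any]], Any]],
                upstream: dict[str, list[str]]) -> tuple[list[str], dict[str, Any]]:
    """Run each object only after every object feeding it has finished.

    steps:    object name -> function taking the list of its inputs' outputs
    upstream: object name -> names of the objects connected into it
    Returns the order the objects ran in and each object's output.
    """
    graph = {name: upstream.get(name, []) for name in steps}
    # static_order() lists predecessors first and raises CycleError on loops.
    order = list(TopologicalSorter(graph).static_order())
    outputs: dict[str, Any] = {}
    for name in order:
        inputs = [outputs[u] for u in upstream.get(name, [])]
        outputs[name] = steps[name](inputs)
    return order, outputs
```

For the tutorial's chain, `upstream` would be `{"Links": ["Homepage"], "LinkListResult": ["Links"]}`, so the objects run in the order "Homepage", "Links", "LinkListResult", matching the order in which they highlight during a run.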

When a Results Object gets new data at the end of a process run, a plus badge is added to indicate it.



If there are any problems with your process configuration, you are likely to see warning badges on some of your objects, and will need to review the Log Window for information.



Now, to view the results we've created by loading the page and extracting the links, double-click on the Results Object ("LinkListResult", blue box).

This will bring up the Results View Sheet.



The Results View organizes the output of your processes as a hierarchy of results, the "Results Outline View". In our example, the "Process Results" entry holds all the data from the process run we just completed. It is shown collapsed, with a small expand arrow just to its left.

Click on the expand arrow to reveal the results from the run. Since one object is connected to the Results input, there is one entry, "Homepage-Links-" shown selected in this image:





As mentioned earlier, "Text Between" will return as many matching strings from the input as it can find.

So, presuming there was more than one link on the page we loaded, this result should have a collection of links from the page.

Click on the reveal arrow just to the left of the results entry "Homepage-Links-" to show the source URL for these results, shown selected in this image:




You can widen the view of results by clicking and dragging on the lower right corner of the sheet.




Now, double click on the URL entry in the Results Outline View.


A new window opens with a Spreadsheet-style collection of the links extracted from the page!


Congratulations, you've just completed your first Anthracite process!

If you'd rather view your results in a standard browser window, simply click on "Preview" and you should automatically be taken to a browser showing your results.


Save your document!

Then, as a final introduction to how the program works, note that you can close and re-open your document, double-click on the Results Object ("LinkListResult", blue box), and find the collection of results saved there.

 







last update: 6/14/04
last update: 5/20/04
last update: 3/02/04
last update: 2/20/04
last update: 2/05/04
last update: 3/27/03