How to Generate Pages from Apache Templates with Anthracite
Anthracite offers two primary subtypes of output with the Report Object.
The "Basic Report" is covered elsewhere. In this example we're going to use a Custom Snippet Report and the Apache (HTML Comments) Template Report.
This example should be relevant to any Google user. It shows you how to load more than just 10 results at a time from Google, create real-time summaries of what's on the pages right now, and put those results into a pre-existing template.
First take a look at the final document overview:
[Image: final document overview]
...and then be sure to check out the results at the end of this document (anth_apache_tmpl.html) showing what this Anthracite process will produce.
Here's how it works:
First, we use a Google API Source Object (with our required key from Google) to search for "Regular Expressions" and request 25 results. Other documents cover the Google API Source Object in more detail, but in short, we're performing a Google query just as you normally would, except that Anthracite loads all of the pages returned from the query and then processes the source of those pages.
In this case, we're going to summarize all those pages into 8 sentences, and then make a table with a link to the page using its title. That table is then inserted into a pre-defined template and exported to an HTML file for use on a webserver.
Pretty straightforward, right?
Two Key Benefits:
This approach provides two key benefits over "normal" web surfing. The first is that you get more results at once: a traditional Google search limits you to 10 results per page, each with about a two-sentence summary. The second is that the summary length is up to you (we chose 8 sentences), and it's a real-time summary of the web page that was just loaded, not a cached copy in Googlespace.
Detailed Instructions:
First, make sure you've got a template to work with. For this example, start with our very simple template, available at this URL (you can use this in the Anthracite Report Object setting):
http://www.metafy.com/anthracite/samples/templates/sample_apache_template.html
Note that if you click on the template link, you'll see an "empty" version of the template without information filled in.
"View Source" on this page and you'll see the Anthracite tags that have been inserted, such as this one to insert some named results:
<!--#anthracite name="GoogleTable"-->
or this tag that puts in the current date:
<!--#anthracite name="__DATE__"-->
You can revise your template and re-run your Anthracite process anytime (or simply make updates to a template that's used in an automated process and see the changes integrated on the next execution).
Once you have your template built and saved, we can move on to constructing the Anthracite document.
We're going to use the results from a Google query as the source for this process chain, so we start with a Google API Source Object.
Detailed configuration of the Google API Source Object is covered elsewhere, but a key detail is that you must have a Google API Key from Google to make use of this tool. We've configured ours to do a query for "Regular Expressions" and to get 25 results.
Next, the output of that source object is passed into two separate process chains: one on the left to extract the title, and one on the right to produce the summary of the page.
As explained above, the output of the Google API Source Object is going to be an array of 25 web pages, and the processes we're going to perform will operate on this data.
On the left, we use a single "Text Between" Processor Object to extract the titles of each page by pulling out what's between the <title> tags. This will produce an array of 25 page titles.
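To make the "Text Between" step concrete, here is a minimal Python sketch of the same idea. It is not Anthracite's implementation: the function name text_between, the case-insensitive matching, and the two sample pages are our own assumptions for illustration.

```python
import re


def text_between(text, start, end):
    """Return the text between the first `start` marker and the next `end` marker.

    Matching is case-insensitive here (an assumption), so <TITLE> and <title>
    are treated the same. Returns an empty string if no pair is found.
    """
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    match = re.search(pattern, text, flags=re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else ""


# Hypothetical stand-ins for the array of pages loaded by the source object.
pages = [
    "<html><head><title>Regex Tutorial</title></head><body>...</body></html>",
    "<HTML><HEAD><TITLE> Regular Expressions FAQ </TITLE></HEAD></HTML>",
]

# One title per page, in the same order as the pages.
titles = [text_between(page, "<title>", "</title>") for page in pages]
print(titles)
```

With 25 pages in the list, the same list comprehension would give an array of 25 titles.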
On the right, it's a little more complicated, requiring four Processor Objects to get just the results we want.
These four Processor Objects are arranged in clockwise order, starting at the upper left with the process that gets the output from the source. In order of operation, they are:
StripTags - A "Strip" Processor Object that removes all the HTML from the documents so that we're just summarizing plain text.
SummarizeIn8 - A "Summarize" Processor Object based on Apple's Text Summarization Engine, configured to produce an 8-sentence summary of the plain text. You can set the number of sentences higher or lower as you prefer; we chose 8 sentences to get a good feel for what the website is about.
BRLF - The Apple Text Summarization Engine produces output that uses returns to put space between paragraphs, but browsers usually ignore plain text returns. So we use this "Unify Linefeeds" Processor Object to make all the returns into HTML tags for linebreaks instead.
summary - Here, a "Find/Replace" Processor Object makes all occurrences of the search term ("Regular Expressions") bold: wherever we find the term, we replace it with the term inside HTML bold tags (<b>Search Term</b>).
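The four steps above can be sketched as a small Python pipeline. This is only an illustration of the order of operations, not Anthracite's code. In particular, summarize() is a naive stand-in that keeps the first few sentences; it does not reproduce Apple's Text Summarization Engine. All function names and the sample page are assumptions.

```python
import re


def strip_tags(html):
    """StripTags: remove anything that looks like an HTML tag."""
    return re.sub(r"<[^>]+>", "", html)


def summarize(text, sentences=8):
    """SummarizeIn8 stand-in: keep the first `sentences` sentences.

    A real summarizer ranks sentences by importance; this one does not.
    """
    parts = re.findall(r"[^.!?]+[.!?]+\s*", text)
    if not parts:
        return text.strip()
    return "".join(parts[:sentences]).strip()


def linefeeds_to_br(text):
    """BRLF: turn every kind of return into an HTML line break."""
    return re.sub(r"\r\n|\r|\n", "<br>", text)


def bold_term(text, term):
    """summary: wrap each exact occurrence of the search term in bold tags."""
    return text.replace(term, "<b>" + term + "</b>")


# Hypothetical page source standing in for one loaded search result.
page = (
    "<html><body><p>Regular Expressions match text.</p>\n"
    "<p>Many tools support Regular Expressions.</p></body></html>"
)

plain = strip_tags(page)
summary = bold_term(linefeeds_to_br(summarize(plain, 8)), "Regular Expressions")
print(summary)
```

The order matters: tags are stripped before summarizing so the summarizer only sees plain text, and the bold tags are added last so they are not stripped out again.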
Now, we've got 25 titles and 25 summaries, and we're ready to bring them together in an HTML table that will be the centerpiece of our final output.
To do this, we first generate the individual rows for the table from the 25 sets of results. The Processor Objects named "title" and "summary" are now connected to a Report Object.
We use a Custom Snippet Template Report Object to build the rows, using only a few HTML tags that define the row contents from the connected inputs. Here's what it looks like configured in Anthracite:
And here's just the HTML snippet broken up by the cells:
<tr>
<td> <a href="__SOURCE__">__title__</a> </td>
<td> <font size="1">__summary__</font> </td>
</tr>\n
After the closing table row tag we add a "backslash-n" (\n) to insert a linefeed character. This puts a return at the end of each row in the final output so the HTML is more readable.
You can see in the snippet that we insert data from our inputs into each row by putting in the name of the input surrounded by two underscore characters.
We also use a special built-in Anthracite tag called "SOURCE", which stores the URL of the current page. Inserting it into the report here means that clicking on the title of a page sends you to the appropriate website.
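As a rough illustration of how double-underscore names in a snippet get filled in, here is a short Python sketch. The fill_snippet function, the placeholder pattern, and the two sample results are assumptions of ours, not Anthracite internals. Note that the Python string uses a real newline where the Anthracite snippet has a typed backslash-n.

```python
import re

# The row snippet from this example, with the newline already expanded.
ROW_SNIPPET = (
    '<tr> <td> <a href="__SOURCE__">__title__</a> </td> '
    '<td> <font size="1">__summary__</font> </td> </tr>\n'
)

# A name surrounded by two underscores on each side, e.g. __title__.
PLACEHOLDER = re.compile(r"__(\w+?)__")


def fill_snippet(snippet, values):
    """Replace each __name__ with values[name]; unknown names are left as-is."""
    return PLACEHOLDER.sub(lambda m: values.get(m.group(1), m.group(0)), snippet)


# Hypothetical results; in the real process there would be 25 of these.
results = [
    {"SOURCE": "http://example.com/a", "title": "Page A", "summary": "Summary A."},
    {"SOURCE": "http://example.com/b", "title": "Page B", "summary": "Summary B."},
]

# One filled-in row per result, joined into a single block of rows.
rows = "".join(fill_snippet(ROW_SNIPPET, result) for result in results)
print(rows)
```

Doing the replacement in a single pass means text inserted for one placeholder is never re-scanned for other placeholders.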
After this Report Object generates the 25 rows for the pages we got back from our Google query, its output is passed to a "Wrap" Processor Object, which we use to place the required HTML table tags around the rows. We also set the table parameters to have some padding, no spacing, and the thinnest border.
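The wrapping step is simple enough to show in a few lines of Python. This is a sketch only: the wrap function is hypothetical, and the cellpadding value of 4 is our guess at "some padding". A cellspacing of 0 and a border of 1 correspond to "no spacing" and "the thinnest border".

```python
def wrap(text, before, after):
    """Place `before` in front of the text and `after` behind it."""
    return before + text + after


# A single hypothetical row standing in for the 25 generated rows.
rows = "<tr> <td>Page A</td> <td>Summary A.</td> </tr>\n"

table = wrap(
    rows,
    '<table border="1" cellpadding="4" cellspacing="0">\n',
    "</table>\n",
)
print(table)
```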
Finally, we're ready to place this table of Google results into our Template.
The output of the Wrap Processor Object, named "GoogleTable" in this example, is connected to another Template Report Object, this time one of the Apache (HTML Comments) subtype.
Here's where we use the template we generated above.
Enter this URL in the "Template URL or Snippet:" field of the Template Report Object:
http://www.metafy.com/anthracite/samples/templates/sample_apache_template.html
Like so:
[Image: Template Report Object settings]
You may recall from viewing the source of that template that we use an HTML comment tag with the name set to "GoogleTable" where we want the table inserted:
<!--#anthracite name="GoogleTable"-->
Note that the name of the input in the HTML comment tag ("GoogleTable") is NOT surrounded by double underscores. A couple of other things to note: the title of the Report Object is used as the page title via the special __OBJECT_TITLE__ tag, and the date, time, and Anthracite version are included at the bottom.
After the Template Report, we branch to the left and the right again. On the left we keep a stored copy of our results, and on the right we export the results to an HTML file saved in our /Library/WebServer/Documents folder so that we can access it via the web.
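For readers who want to see the comment-tag substitution spelled out, here is a minimal Python sketch. It is not how Anthracite is implemented. The fill_template function, the tiny template, and the choice to leave unknown tags untouched are all assumptions; the real sample template at the URL above may look different.

```python
import re
from datetime import date

# Matches tags of the form <!--#anthracite name="GoogleTable"-->
TAG = re.compile(r'<!--#anthracite\s+name="([^"]+)"\s*-->')


def fill_template(template, values):
    """Replace each Anthracite comment tag with the value of the named input.

    Tags whose name is not in `values` are left in place (an assumption).
    """
    return TAG.sub(lambda m: values.get(m.group(1), m.group(0)), template)


# A tiny hypothetical template using the tag names from this example.
template = (
    '<html><head><title><!--#anthracite name="__OBJECT_TITLE__"--></title></head>\n'
    '<body>\n<!--#anthracite name="GoogleTable"-->\n'
    '<p>Generated <!--#anthracite name="__DATE__"--></p>\n</body></html>\n'
)

values = {
    "GoogleTable": "<table><tr><td>Page A</td></tr></table>",
    "__OBJECT_TITLE__": "Google Results",
    "__DATE__": date.today().isoformat(),
}

page = fill_template(template, values)
print(page)
```

Because the tags are ordinary HTML comments, an unfilled template still displays in a browser as an "empty" page, which matches what you see when you open the sample template directly.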
In case you have read this far without playing along, here's a sample of the output of this process (compare this to the empty template above):
Anthracite Google Search to Apache Template Sample Output (anth_apache_tmpl.html)
For more information about Anthracite's tools, such as the Google API Source Object and the Template Report Object, check out the Anthracite Tools Documentation.
Last Updated: 8/27/2004
Copyright © 2004, All Rights Reserved, Metafy LLC