Filters and plugins are simple Unix pipes. Input comes in stdin, parameters come from the config file, and output goes to stdout. Anything written to stderr is logged as an ERROR message. If no stdout is produced, the entry is not written to the cache or processed further; in fact, if the entry had previously been written to the cache, it will be removed.
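The pipe contract described above can be sketched in a few lines of Python. This is a minimal illustration, not a filter shipped with Venus: the function names and the "sponsored" rule are hypothetical, chosen only to show an entry being passed through or dropped.

```python
import sys


def run_filter(text):
    """Return the text to emit, or an empty string to drop the entry."""
    # Hypothetical rule for illustration: drop entries mentioning "sponsored".
    if "sponsored" in text.lower():
        return ""
    return text


def main(stdin, stdout):
    entry = stdin.read()        # the entry arrives on stdin
    output = run_filter(entry)
    if output:
        stdout.write(output)    # emitted text continues down the pipe
    # Writing nothing to stdout means the entry is dropped. Anything
    # written to sys.stderr would be logged as an ERROR message.


# A real filter script would end with: main(sys.stdin, sys.stdout)
```

Keeping the decision in a plain function makes the rule easy to test without going through a pipe.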
There are two types of filters supported by Venus, input and template.
Input to an input filter is an aggressively normalized entry. For example, if a feed is RSS 1.0 with 10 items, the filter will be called ten times, each with a single Atom 1.0 entry, with all textConstructs expressed as XHTML, and everything encoded as UTF-8.
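Because each call receives one Atom entry with XHTML content, an input filter can work on it with an ordinary XML parser. The following is a sketch under that assumption; the sample entry and the strip_images function are made up for illustration and are not part of Venus.

```python
from xml.dom import minidom

XHTML_NS = "http://www.w3.org/1999/xhtml"

# A hypothetical normalized entry, similar in shape to what a filter might see.
SAMPLE = (
    '<entry xmlns="http://www.w3.org/2005/Atom">'
    "<title>Example</title>"
    '<content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml">'
    '<p>Hi <img src="a.png"/></p></div></content>'
    "</entry>"
)


def strip_images(entry_xml):
    """Remove every XHTML img element from a single Atom entry."""
    doc = minidom.parseString(entry_xml)
    # Copy the node list first, since nodes are removed while iterating.
    for img in list(doc.getElementsByTagNameNS(XHTML_NS, "img")):
        img.parentNode.removeChild(img)
    return doc.toxml()
```

Since the content is already namespaced XHTML, the lookup can match on namespace and local name rather than on raw text.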
Input to a template filter will be the output produced by the template.
You will find a small set of example filters in the filters directory. The coral cdn filter will change links to images in the entry itself. The filters in the stripAd subdirectory will strip specific types of advertisements that you may find in feeds.
The excerpt filter adds metadata (in the form of a planet:excerpt element) to the feed itself. You can see examples of how parameters are passed to this program in either excerpt-images or opml-top100.ini. Alternatively, parameters may be passed URI style, for example: excerpt-images2.
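To make "URI style" concrete, here is one way a filter name with a query string could be split into a name and its parameters. This is only a sketch: the function name and the example spec string are hypothetical, and Venus's own parsing may differ.

```python
from urllib.parse import parse_qsl


def split_filter_spec(spec):
    """Split 'name?key=value&...' into (name, {key: value})."""
    name, _, query = spec.partition("?")
    # parse_qsl decodes '+' as a space and percent-escapes as characters.
    return name, dict(parse_qsl(query))


# Hypothetical spec, for illustration only.
name, params = split_filter_spec("excerpt.py?omit=img+p+br&width=500")
```

All values come back as strings, so a filter that expects a number (such as a width) would still need to convert it.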
The xpath sifter is a variation of the above, including or excluding feeds based on the presence (or absence) of data specified by xpath expressions. Again, parameters can be passed as config options or URI style.
The regexp sifter operates just like the xpath sifter, except it uses regular expressions instead of XPath expressions.
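The sifting idea combines naturally with the rule that empty output drops an entry. The sketch below is not the sifter bundled with Venus; the function and its require and exclude parameters are assumptions made for illustration.

```python
import re


def sift(entry_text, require=None, exclude=None):
    """Return the entry unchanged if it passes, or '' to drop it."""
    # Drop the entry if an excluded pattern is present.
    if exclude and re.search(exclude, entry_text):
        return ""
    # Drop the entry if a required pattern is absent.
    if require and not re.search(require, entry_text):
        return ""
    return entry_text
```

An XPath-based variant would follow the same shape, with the pattern test replaced by an XPath query against the parsed entry.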
Filters listed in the [planet] section of your config.ini will be invoked on all feeds. Filters listed in individual [feed] sections will only be invoked on those feeds. Filters listed in [template] sections will be invoked on the output of that template.
The file extension determines how a filter is run: .py invokes python and .xslt invokes XSLT. .sed and .tmpl (a.k.a. htmltmpl) are also options. Other languages, like perl or ruby or class/jar (java), aren't supported at the moment, but these
would be easy to add.
If the filter name contains a redirection character (>), then the output stream is teed; one branch flows through the specified filter and the output is placed into the named file; the other unmodified branch continues onto the next filter, if any.
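The tee behavior can be pictured as follows. This is a rough sketch, not Venus's implementation: apply_filter_spec, the filters lookup table, and the in-memory string handling are all hypothetical stand-ins for the real process plumbing.

```python
def apply_filter_spec(spec, stream, filters):
    """Apply one filter spec, written as 'name' or 'name>path', to a stream.

    `filters` maps a filter name to a callable taking and returning text
    (a hypothetical stand-in for running the real filter program).
    """
    name, redirect, path = spec.partition(">")
    output = filters[name](stream)
    if redirect:
        # Teed branch: the filtered output is placed into the named file...
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(output)
        # ...while the unmodified stream continues to the next filter.
        return stream
    # No redirection: the filtered output itself continues down the pipe.
    return output
```

Returning the original stream in the redirected case is what lets a single source feed both the file and the rest of the filter chain.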
One use case for this function is to use xhtml2html to produce both an XHTML and an HTML output stream from one source.
os.abort() can't be recovered from.