Diffstat (limited to 'libjava/gnu/xml/pipeline')
| -rw-r--r-- | libjava/gnu/xml/pipeline/CallFilter.java | 250 |
| -rw-r--r-- | libjava/gnu/xml/pipeline/DomConsumer.java | 969 |
| -rw-r--r-- | libjava/gnu/xml/pipeline/EventConsumer.java | 95 |
| -rw-r--r-- | libjava/gnu/xml/pipeline/EventFilter.java | 809 |
| -rw-r--r-- | libjava/gnu/xml/pipeline/LinkFilter.java | 243 |
| -rw-r--r-- | libjava/gnu/xml/pipeline/NSFilter.java | 340 |
| -rw-r--r-- | libjava/gnu/xml/pipeline/PipelineFactory.java | 723 |
| -rw-r--r-- | libjava/gnu/xml/pipeline/TeeConsumer.java | 413 |
| -rw-r--r-- | libjava/gnu/xml/pipeline/TextConsumer.java | 117 |
| -rw-r--r-- | libjava/gnu/xml/pipeline/ValidationConsumer.java | 1922 |
| -rw-r--r-- | libjava/gnu/xml/pipeline/WellFormednessFilter.java | 362 |
| -rw-r--r-- | libjava/gnu/xml/pipeline/XIncludeFilter.java | 580 |
| -rw-r--r-- | libjava/gnu/xml/pipeline/XsltFilter.java | 131 |
| -rw-r--r-- | libjava/gnu/xml/pipeline/package.html | 255 |
14 files changed, 7209 insertions, 0 deletions
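The DomConsumer.Handler code in this patch rebuilds a DOM tree from SAX2 events and re-creates the xmlns attributes that SAX2 parsers drop unless the "namespace-prefixes" feature is set. The following is a minimal, JDK-only sketch of that technique, not the library's implementation. The class MiniDomBuilder and its parse helper are hypothetical names. The sketch assumes a namespace-aware JAXP parser with default feature settings and always uses the DOM Level 2 createElementNS/setAttributeNS calls (DomConsumer falls back to createElement on Level 1 DOMs). It also leaves out the CDATA, comment, whitespace and entity-reference handling that DomConsumer provides.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.Text;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Hypothetical, minimal SAX-to-DOM builder (not part of gnu.xml.pipeline).
 * Like DomConsumer.Handler, it re-creates the xmlns attributes that a
 * SAX2 parser drops when "namespace-prefixes" is false.
 */
public class MiniDomBuilder extends DefaultHandler {
    public static final String XMLNS_URI = "http://www.w3.org/2000/xmlns/";

    private final Document document;
    private Node top;
    // Declarations from startPrefixMapping, applied at the next startElement
    private final AttributesImpl pendingDecls = new AttributesImpl();

    public MiniDomBuilder() throws Exception {
        document = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().newDocument();
        top = document;
    }

    public Document getDocument() { return document; }

    @Override
    public void startPrefixMapping(String prefix, String uri) {
        String qName = prefix.isEmpty() ? "xmlns" : "xmlns:" + prefix;
        pendingDecls.addAttribute(XMLNS_URI, prefix, qName, "CDATA", uri);
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) {
        // Fall back to the local name if the parser omitted the qName
        String name = qName.isEmpty() ? localName : qName;
        Element element =
            document.createElementNS(uri.isEmpty() ? null : uri, name);

        // Re-created namespace declarations first ...
        for (int i = 0; i < pendingDecls.getLength(); i++) {
            element.setAttributeNS(XMLNS_URI, pendingDecls.getQName(i),
                                   pendingDecls.getValue(i));
        }
        pendingDecls.clear();

        // ... then the ordinary attributes the parser reported
        for (int i = 0; i < atts.getLength(); i++) {
            String attName = atts.getQName(i).isEmpty()
                ? atts.getLocalName(i) : atts.getQName(i);
            if (attName.equals("xmlns") || attName.startsWith("xmlns:")) {
                continue; // already added from startPrefixMapping
            }
            String attUri = atts.getURI(i);
            element.setAttributeNS(attUri.isEmpty() ? null : attUri,
                                   attName, atts.getValue(i));
        }
        top.appendChild(element);
        top = element;
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        top = top.getParentNode();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        String text = new String(ch, start, length);
        Node last = top.getLastChild();
        if (last instanceof Text) {
            // Merge adjacent character chunks into a single Text node
            ((Text) last).appendData(text);
        } else {
            top.appendChild(document.createTextNode(text));
        }
    }

    /** Parses an XML string with a namespace-aware JAXP SAX parser. */
    public static Document parse(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        MiniDomBuilder builder = new MiniDomBuilder();
        factory.newSAXParser()
            .parse(new InputSource(new StringReader(xml)), builder);
        return builder.getDocument();
    }
}
```

For example, parsing `<a xmlns='urn:x' xmlns:p='urn:p' p:k='v'><b>hi</b></a>` with MiniDomBuilder.parse should yield a root element in namespace urn:x that again carries its xmlns and xmlns:p attributes. As in DomConsumer, the declarations are buffered in an AttributesImpl because startPrefixMapping fires before the element they belong to exists.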
diff --git a/libjava/gnu/xml/pipeline/CallFilter.java b/libjava/gnu/xml/pipeline/CallFilter.java new file mode 100644 index 00000000000..0d8585991df --- /dev/null +++ b/libjava/gnu/xml/pipeline/CallFilter.java @@ -0,0 +1,250 @@ +/* CallFilter.java --  +   Copyright (C) 1999,2000,2001 Free Software Foundation, Inc. + +This file is part of GNU Classpath. + +GNU Classpath is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 2, or (at your option) +any later version. + +GNU Classpath is distributed in the hope that it will be useful, but +WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU +General Public License for more details. + +You should have received a copy of the GNU General Public License +along with GNU Classpath; see the file COPYING.  If not, write to the +Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA +02111-1307 USA. + +Linking this library statically or dynamically with other modules is +making a combined work based on this library.  Thus, the terms and +conditions of the GNU General Public License cover the whole +combination. + +As a special exception, the copyright holders of this library give you +permission to link this library with independent modules to produce an +executable, regardless of the license terms of these independent +modules, and to copy and distribute the resulting executable under +terms of your choice, provided that you also meet, for each linked +independent module, the terms and conditions of the license of that +module.  An independent module is a module which is not derived from +or based on this library.  If you modify this library, you may extend +this exception to your version of the library, but you are not +obligated to do so.  If you do not wish to do so, delete this +exception statement from your version. 
*/ +package gnu.xml.pipeline; + +import java.io.*; +import java.net.*; + +import org.xml.sax.*; +import org.xml.sax.ext.*; +import org.xml.sax.helpers.XMLReaderFactory; + +import gnu.xml.util.Resolver; +import gnu.xml.util.XMLWriter; + + +/** + * Input is sent as an XML request to the given URI, and the output of this + * filter is the parsed response to that request. + * A connection is opened to the remote URI when the startDocument call is + * issued through this filter, and the request is finished when the + * endDocument call is issued.  Events should be written quickly enough to + * prevent the remote HTTP server from aborting the connection due to + * inactivity; you may want to buffer text in an earlier pipeline stage. + * If your application requires validity checking of such + * outputs, have the output pipeline include a validation stage. + * + * <p>In effect, this makes a remote procedure call to the URI, with the + * request and response document syntax as chosen by the application. + * <em>Note that all the input events must be seen, and sent to the URI, + * before the first output event can be seen. </em>  Clients are delayed + * at least by waiting for the server to respond, constraining concurrency. + * Services can thus be used to synchronize concurrent activities, and + * even to prioritize service among different clients. + * + * <p> You are advised to avoid restricting yourself to an "RPC" model + * for distributed computation.  On the World Wide Web, network latencies + * and failures (e.g. non-availability) + * are significant; adopting a "procedure" model, rather than a workflow + * model where bulk requests are sent and worked on asynchronously, is not + * generally an optimal system-wide architecture.  When the messages may + * need authentication, such as with an OpenPGP signature, or when server + * loads don't argue in favor of immediate responses, non-RPC models can + * be advantageous.  
(So-called "peer to peer" computing models are one + * additional type of model, though too often that term is applied to + * systems that still have a centralized control structure.) + * + * <p> <em>Be strict in what you send, liberal in what you accept,</em> as + * the Internet tradition goes.  Strictly conformant data should never cause + * problems to its receiver; make your request pipeline very strict, and + * don't compromise on that.  Make your response pipeline strict as well, + * but be ready to tolerate specific mild, temporary, and well-documented + * variations from specific communications peers. + * + * @see XmlServlet + * + * @author David Brownell + */ +final public class CallFilter implements EventConsumer +{ +    private Requestor			req; +    private EventConsumer		next; +    private URL				target; +    private URLConnection		conn; +    private ErrorHandler		errHandler; + + +    /** +     * Initializes a call filter so that its inputs are sent to the +     * specified URI, and its outputs are sent to the next consumer +     * provided. +     * +     * @exception IOException if the URI isn't accepted as a URL +     */ +	// constructor used by PipelineFactory +    public CallFilter (String uri, EventConsumer next) +    throws IOException +    { +	this.next = next; +	req = new Requestor (); +	setCallTarget (uri); +    } + +    /** +     * Assigns the URI of the call target to be used. +     * Does not affect calls currently being made. +     */ +    final public void setCallTarget (String uri) +    throws IOException +    { +	target = new URL (uri); +    } + +    /** +     * Assigns the error handler to be used to present most fatal +     * errors. +     */ +    public void setErrorHandler (ErrorHandler handler) +    { +	errHandler = handler; +	req.setErrorHandler (handler); +    } + + +    /** +     * Returns the call target's URI.
+     */ +    final public String getCallTarget () +    { +	return target.toString (); +    } + +    /** Returns the content handler currently in use. */ +    final public org.xml.sax.ContentHandler getContentHandler () +    { +	return req; +    } + +    /** Returns the DTD handler currently in use. */ +    final public DTDHandler getDTDHandler () +    { +	return req; +    } + + +    /** +     * Returns the declaration or lexical handler currently in +     * use, or throws an exception for other properties. +     */ +    final public Object getProperty (String id) +    throws SAXNotRecognizedException +    { +	if (EventFilter.DECL_HANDLER.equals (id)) +	    return req; +	if (EventFilter.LEXICAL_HANDLER.equals (id)) +	    return req; +	throw new SAXNotRecognizedException (id); +    } + + +    // JDK 1.1 seems to need it to be done this way, sigh +    ErrorHandler getErrorHandler () { return errHandler; } + +    // +    // Takes input and echoes to server as POST input. +    // Then sends the POST reply to the next pipeline element. +    // +    final class Requestor extends XMLWriter +    { +	Requestor () +	{ +	    super ((Writer)null); +	} + +	public synchronized void startDocument () throws SAXException +	{ +	    // Connect to remote object and set up to send it XML text +	    try { +		if (conn != null) +		    throw new IllegalStateException ("call is being made"); + +		conn = target.openConnection (); +		conn.setDoOutput (true); +		conn.setRequestProperty ("Content-Type", +			    "application/xml;charset=UTF-8"); + +		setWriter (new OutputStreamWriter ( +			conn.getOutputStream (), +			"UTF8"), "UTF-8"); + +	    } catch (IOException e) { +		fatal ("can't write (POST) to URI: " + target, e); +	    } + +	    // NOW base class can safely write that text! +	    super.startDocument (); +	} + +	public void endDocument () throws SAXException +	{ +	    // +	    // Finish writing the request (for HTTP, a POST); +	    // this closes the output stream. 
+	    // +	    super.endDocument (); + +	    // +	    // Receive the response. +	    // Produce events for the next stage. +	    // +	    InputSource	source; +	    XMLReader	producer; +	    String	encoding; + +	    try { + +		source = new InputSource (conn.getInputStream ()); + +// FIXME if status is anything but success, report it!!  It'd be good to +// save the request data just in case we need to deal with a forward. + +		encoding = Resolver.getEncoding (conn.getContentType ()); +		if (encoding != null) +		    source.setEncoding (encoding); + +		producer = XMLReaderFactory.createXMLReader (); +		producer.setErrorHandler (getErrorHandler ()); +		EventFilter.bind (producer, next); +		producer.parse (source); +		conn = null; + +	    } catch (IOException e) { +		fatal ("I/O Exception reading response, " + e.getMessage (), e); +	    } +	} +    } +} diff --git a/libjava/gnu/xml/pipeline/DomConsumer.java b/libjava/gnu/xml/pipeline/DomConsumer.java new file mode 100644 index 00000000000..17fdeeb3453 --- /dev/null +++ b/libjava/gnu/xml/pipeline/DomConsumer.java @@ -0,0 +1,969 @@ +/* DomConsumer.java --  +   Copyright (C) 1999,2000,2001 Free Software Foundation, Inc. + +This file is part of GNU Classpath. + +GNU Classpath is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 2, or (at your option) +any later version. + +GNU Classpath is distributed in the hope that it will be useful, but +WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU +General Public License for more details. + +You should have received a copy of the GNU General Public License +along with GNU Classpath; see the file COPYING.  If not, write to the +Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA +02111-1307 USA. 
+ +Linking this library statically or dynamically with other modules is +making a combined work based on this library.  Thus, the terms and +conditions of the GNU General Public License cover the whole +combination. + +As a special exception, the copyright holders of this library give you +permission to link this library with independent modules to produce an +executable, regardless of the license terms of these independent +modules, and to copy and distribute the resulting executable under +terms of your choice, provided that you also meet, for each linked +independent module, the terms and conditions of the license of that +module.  An independent module is a module which is not derived from +or based on this library.  If you modify this library, you may extend +this exception to your version of the library, but you are not +obligated to do so.  If you do not wish to do so, delete this +exception statement from your version. */ + +package gnu.xml.pipeline; + +import java.util.Hashtable; + +import org.w3c.dom.*; +import org.xml.sax.*; +import org.xml.sax.ext.DeclHandler; +import org.xml.sax.ext.LexicalHandler; +import org.xml.sax.helpers.AttributesImpl; + +import gnu.xml.aelfred2.ContentHandler2; +import gnu.xml.util.DomParser; + + +/** + * This consumer builds a DOM Document from its input, acting either as a + * pipeline terminus or as an intermediate buffer.  When a document's worth + * of events has been delivered to this consumer, that document is read with + * a {@link DomParser} and sent to the next consumer.  It is also available + * as a read-once property. + * + * <p>The DOM tree is constructed as faithfully as possible.  There are some + * complications since a DOM should expose behaviors that can't be implemented + * without API backdoors into that DOM, and because some SAX parsers don't + * report all the information that DOM permits to be exposed.  The general + * problem areas involve information from the Document Type Declaration (DTD). 
+ * DOM only represents a limited subset, but has some behaviors that depend + * on much deeper knowledge of a document's DTD.  You shouldn't have much to + * worry about unless you change handling of "noise" nodes from its default + * setting (which ignores them all); note that if you use JAXP to populate your + * DOM trees, it wants to save "noise" nodes by default.  (Such nodes include + * ignorable whitespace, comments, entity references and CDATA boundaries.) + * Otherwise, your + * main worry will be if you use a SAX parser that doesn't flag ignorable + * whitespace unless it's validating (few don't). + * + * <p> The SAX2 events used as input must contain XML Names for elements + * and attributes, with original prefixes.  In SAX2, + * this is optional unless the "namespace-prefixes" parser feature is set. + * Moreover, many application components won't provide completely correct + * structures anyway.  <em>Before you convert a DOM to an output document, + * you should plan to postprocess it to create or repair such namespace + * information.</em> The {@link NSFilter} pipeline stage does such work. + * + * <p> <em>Note:  changes late in the DOM L2 process made it impractical to + * attempt to create the DocumentType node in any implementation-neutral way, + * much less to populate it (L1 didn't even support creating such nodes). + * To create and populate such a node, subclass the inner + * {@link DomConsumer.Handler} class and teach it about the backdoors into + * whatever DOM implementation you want.  It's possible that some revised + * DOM API (L3?) will make this problem solvable again.
</em> + * + * @see DomParser + * + * @author David Brownell + */ +public class DomConsumer implements EventConsumer +{ +    private Class		domImpl; + +    private boolean		hidingCDATA = true; +    private boolean		hidingComments = true; +    private boolean		hidingWhitespace = true; +    private boolean		hidingReferences = true; + +    private Handler		handler; +    private ErrorHandler	errHandler; + +    private EventConsumer	next; + +    // FIXME:  this can't be a generic pipeline stage just now, +    // since its input became a Class not a String (to be turned +    // into a class, using the right class loader) + + +    /** +     * Configures this pipeline terminus to use the specified implementation +     * of DOM when constructing its result value. +     * +     * @param impl class implementing {@link org.w3c.dom.Document Document} +     *	which publicly exposes a default constructor +     * +     * @exception SAXException when there is a problem creating an +     *	empty DOM document using the specified implementation +     */ +    public DomConsumer (Class impl) +    throws SAXException +    { +	domImpl = impl; +	handler = new Handler (this); +    } + +    /** +     * This is the hook through which a subclass provides a handler +     * which knows how to access DOM extensions, specific to some +     * implementation, to record additional data in a DOM. +     * Treat this as part of construction; don't call it except +     * before (or between) parses. 
+     */ +    protected void setHandler (Handler h) +    { +	handler = h; +    } + + +    private Document emptyDocument () +    throws SAXException +    { +	try { +	    return (Document) domImpl.newInstance (); +	} catch (IllegalAccessException e) { +	    throw new SAXException ("can't access constructor: " +		    + e.getMessage ()); +	} catch (InstantiationException e) { +	    throw new SAXException ("can't instantiate Document: " +		    + e.getMessage ()); +	} +    } + + +    /** +     * Configures this consumer as a buffer/filter, using the specified +     * DOM implementation when constructing its result value. +     * +     * <p> This event consumer acts as a buffer and filter, in that it +     * builds a DOM tree and then writes it out when <em>endDocument</em> +     * is invoked.  Because of the limitations of DOM, much information +     * will as a rule not be seen in that replay.  To get a full-fidelity +     * copy of the input event stream, use a {@link TeeConsumer}. +     * +     * @param impl class implementing {@link org.w3c.dom.Document Document} +     *	which publicly exposes a default constructor +     * @param n receives a "replayed" sequence of parse events when +     *	the <em>endDocument</em> method is invoked. +     * +     * @exception SAXException when there is a problem creating an +     *	empty DOM document using the specified DOM implementation +     */ +    public DomConsumer (Class impl, EventConsumer n) +    throws SAXException +    { +	this (impl); +	next = n; +    } + + +    /** +     * Returns the document constructed from the preceding +     * sequence of events.  This method should not be +     * used again until another sequence of events has been +     * given to this EventConsumer.   
+     */ +    final public Document getDocument () +    { +	return handler.clearDocument (); +    } + +    public void setErrorHandler (ErrorHandler handler) +    { +	errHandler = handler; +    } + + +    /** +     * Returns true if the consumer is hiding entity reference nodes +     * (the default), and false if EntityReference nodes should +     * instead be created.  Such EntityReference nodes will normally be +     * empty, unless an implementation arranges to populate them and then +     * turn them back into readonly objects. +     * +     * @see #setHidingReferences +     */ +    final public boolean	isHidingReferences () +	{ return hidingReferences; } + +    /** +     * Controls whether the consumer will hide entity expansions, +     * or will instead mark them with entity reference nodes. +     * +     * @see #isHidingReferences +     * @param flag False if entity reference nodes will appear +     */ +    final public void		setHidingReferences (boolean flag) +	{ hidingReferences = flag; } +     + +    /** +     * Returns true if the consumer is hiding comments (the default), +     * and false if they should be placed into the output document. +     * +     * @see #setHidingComments +     */ +    public final boolean isHidingComments () +	{ return hidingComments; } + +    /** +     * Controls whether the consumer is hiding comments. +     * +     * @see #isHidingComments +     */ +    public final void setHidingComments (boolean flag) +	{ hidingComments = flag; } + + +    /** +     * Returns true if the consumer is hiding ignorable whitespace +     * (the default), and false if such whitespace should be placed +     * into the output document as children of element nodes.
+     * +     * @see #setHidingWhitespace +     */ +    public final boolean isHidingWhitespace () +	{ return hidingWhitespace; } + +    /** +     * Controls whether the consumer hides ignorable whitespace. +     * +     * @see #isHidingWhitespace +     */ +    public final void setHidingWhitespace (boolean flag) +	{ hidingWhitespace = flag; } + + +    /** +     * Returns true if the consumer is hiding CDATA boundaries (the +     * default), and false if it saves them as CDATASection nodes. +     * +     * @see #setHidingCDATA +     */ +    final public boolean	isHidingCDATA () +	{ return hidingCDATA; } + +    /** +     * Controls whether the consumer will hide CDATA boundaries. +     * +     * @see #isHidingCDATA +     * @param flag False to treat CDATA text differently from other +     *	text nodes +     */ +    final public void		setHidingCDATA (boolean flag) +	{ hidingCDATA = flag; } +     + + +    /** Returns the document handler being used. */ +    final public ContentHandler getContentHandler () +	{ return handler; } + +    /** Returns the DTD handler being used. */ +    final public DTDHandler getDTDHandler () +	{ return handler; } + +    /** +     * Returns the lexical or declaration handler being used, or +     * throws an exception for other properties. +     * (DOM construction can't really use declaration handlers.) +     */ +    final public Object getProperty (String id) +    throws SAXNotRecognizedException +    { +	if ("http://xml.org/sax/properties/lexical-handler".equals (id)) +	    return handler; +	if ("http://xml.org/sax/properties/declaration-handler".equals (id)) +	    return handler; +	throw new SAXNotRecognizedException (id); +    } + +    EventConsumer getNext () { return next; } + +    ErrorHandler getErrorHandler () { return errHandler; } + +    /** +     * Class used to intercept various parsing events and use them to +     * populate a DOM document.  Subclasses would typically know and use +     * backdoors into specific DOM implementations, used to implement  +     * DTD-related functionality.
+     * +     * <p> Note that if this ever throws a DOMException (a runtime exception), +     * that indicates a bug in the DOM (e.g. doesn't support something +     * per specification) or the parser (e.g. emitted an illegal name, or +     * accepted illegal input data). </p> +     */ +    public static class Handler +	implements ContentHandler2, LexicalHandler, +	    DTDHandler, DeclHandler +    { +	protected DomConsumer		consumer; + +	private DOMImplementation	impl; +	private Document 		document; +	private boolean		isL2; + +	private Locator		locator; +	private Node		top; +	private boolean		inCDATA; +	private boolean		mergeCDATA; +	private boolean		inDTD; +	private String		currentEntity; + +	private boolean		recreatedAttrs; +	private AttributesImpl	attributes = new AttributesImpl (); + +	/** +	 * Subclasses may use SAX2 events to provide additional +	 * behaviors in the resulting DOM. +	 */ +	protected Handler (DomConsumer consumer) +	throws SAXException +	{ +	    this.consumer = consumer; +	    document = consumer.emptyDocument (); +	    impl = document.getImplementation (); +	    isL2 = impl.hasFeature ("XML", "2.0"); +	} + +	private void fatal (String message, Exception x) +	throws SAXException +	{ +	    SAXParseException	e; +	    ErrorHandler	errHandler = consumer.getErrorHandler (); + +	    if (locator == null) +		e = new SAXParseException (message, null, null, -1, -1, x); +	    else +		e = new SAXParseException (message, locator, x); +	    if (errHandler != null) +		errHandler.fatalError (e); +	    throw e; +	} + +	/** +	 * Returns and forgets the document produced.  If the handler is +	 * reused, a new document may be created. +	 */ +	Document clearDocument () +	{ +	    Document retval = document; +	    document = null; +	    locator = null; +	    return retval; +	} + +	/** +	 * Returns the document under construction. +	 */ +	protected Document getDocument () +	    { return document; } +	 +	/** +	 * Returns the current node being populated.  
This is usually +	 * an Element or Document, but it might be an EntityReference +	 * node if some implementation-specific code knows how to put +	 * those into the result tree and later mark them as readonly. +	 */ +	protected Node getTop () +	    { return top; } + + +	// SAX1 +	public void setDocumentLocator (Locator locator) +	{ +	    this.locator = locator; +	} + +	// SAX1 +	public void startDocument () +	throws SAXException +	{ +	    if (document == null) +		try { +		    if (isL2) { +			// couple to original implementation +			document = impl.createDocument (null, "foo", null); +			document.removeChild (document.getFirstChild ()); +		    } else { +			document = consumer.emptyDocument (); +		    } +		} catch (Exception e) { +		    fatal ("DOM create document", e); +		} +	    top = document; +	} + +        // ContentHandler2 +        public void xmlDecl(String version, +                            String encoding, +                            boolean standalone, +                            String inputEncoding) +          throws SAXException +        { +          if (document != null) +            { +              document.setXmlVersion(version); +              document.setXmlStandalone(standalone); +            } +        } + +	// SAX1 +	public void endDocument () +	throws SAXException +	{ +	    try { +		if (consumer.getNext () != null && document != null) { +		    DomParser	parser = new DomParser (document); + +		    EventFilter.bind (parser, consumer.getNext ()); +		    parser.parse ("ignored"); +		} +	    } finally { +		top = null; +	    } +	} + +	// SAX1 +	public void processingInstruction (String target, String data) +	throws SAXException +	{ +	    // we can't create populated entity ref nodes using +	    // only public DOM APIs (they've got to be readonly) +	    if (currentEntity != null) +		return; + +	    ProcessingInstruction	pi; + +	    if (isL2 +		    // && consumer.isUsingNamespaces () +		    && target.indexOf (':') != -1) +		namespaceError ( +		    
 "PI target name is namespace nonconformant: " +			+ target); +	    if (inDTD) +		return; +	    pi = document.createProcessingInstruction (target, data); +	    top.appendChild (pi); +	} + +	/** +	 * Subclasses may override this method to provide a more efficient +	 * way to construct text nodes. +	 * Typically, copying the text into a single character array will +	 * be more efficient than doing that as well as allocating the other +	 * objects needed for a String, including an internal StringBuffer. +	 * Those additional memory and CPU costs can be incurred later, +	 * if ever needed. +	 * Unfortunately the standard DOM factory APIs encourage those costs +	 * to be incurred early. +	 */ +	protected Text createText ( +	    boolean	isCDATA, +	    char	ch [], +	    int		start, +	    int		length +	) { +	    String	value = new String (ch, start, length); + +	    if (isCDATA) +		return document.createCDATASection (value); +	    else +		return document.createTextNode (value); +	} + +	// SAX1 +	public void characters (char ch [], int start, int length) +	throws SAXException +	{ +	    // we can't create populated entity ref nodes using +	    // only public DOM APIs (they've got to be readonly +	    // at creation time) +	    if (currentEntity != null) +		return; + +	    Node	lastChild = top.getLastChild (); + +	    // merge consecutive text or CDATA nodes if appropriate. +	    if (lastChild instanceof Text) { +		if (consumer.isHidingCDATA () +			// consecutive Text content ... always merge +			|| (!inCDATA +			    && !(lastChild instanceof CDATASection)) +			// consecutive CDATASection content ... 
don't +			// merge between sections, only within them +			|| (inCDATA && mergeCDATA +			    && lastChild instanceof CDATASection) +			    ) { +		    CharacterData	last = (CharacterData) lastChild; +		    String		value = new String (ch, start, length); +		     +		    last.appendData (value); +		    return; +		} +	    } +	    if (inCDATA && !consumer.isHidingCDATA ()) { +		top.appendChild (createText (true, ch, start, length)); +		mergeCDATA = true; +	    } else +		top.appendChild (createText (false, ch, start, length)); +	} + +	// SAX2 +	public void skippedEntity (String name) +	throws SAXException +	{ +	    // this callback is useless except to report errors, since +	    // we can't know if the ref was in content, within an +	    // attribute, within a declaration ... only one of those +	    // cases supports more intelligent action than a panic. +	    fatal ("skipped entity: " + name, null); +	} + +	// SAX2 +	public void startPrefixMapping (String prefix, String uri) +	throws SAXException +	{ +	    // reconstruct "xmlns" attributes deleted by all +	    // SAX2 parsers without "namespace-prefixes" = true +	    if ("".equals (prefix)) +		attributes.addAttribute ("", "", "xmlns", +			"CDATA", uri); +	    else +		attributes.addAttribute ("", "", "xmlns:" + prefix, +			"CDATA", uri); +	    recreatedAttrs = true; +	} + +	// SAX2 +	public void endPrefixMapping (String prefix) +	throws SAXException +	    { } + +	// SAX2 +	public void startElement ( +	    String uri, +	    String localName, +	    String qName, +	    Attributes atts +	) throws SAXException +	{ +	    // we can't create populated entity ref nodes using +	    // only public DOM APIs (they've got to be readonly) +	    if (currentEntity != null) +		return; + +	    // parser discarded basic information; DOM tree isn't writable +	    // without massaging to assign prefixes to all nodes. +	    // the "NSFilter" class does that massaging. 
+	    if (qName.length () == 0) +		qName = localName; + + +	    Element	element; +	    int		length = atts.getLength (); + +	    if (!isL2) { +		element = document.createElement (qName); + +		// first the explicit attributes ... +		length = atts.getLength (); +		for (int i = 0; i < length; i++) +		    element.setAttribute (atts.getQName (i), +					    atts.getValue (i)); +		// ... then any recreated ones (DOM deletes duplicates) +		if (recreatedAttrs) { +		    recreatedAttrs = false; +		    length = attributes.getLength (); +		    for (int i = 0; i < length; i++) +			element.setAttribute (attributes.getQName (i), +						attributes.getValue (i)); +		    attributes.clear (); +		} + +		top.appendChild (element); +		top = element; +		return; +	    } + +	    // For an L2 DOM when namespace use is enabled, use +	    // createElementNS/createAttributeNS except when +	    // (a) it's an element in the default namespace, or +	    // (b) it's an attribute with no prefix +	    String	namespace; +	     +	    if (localName.length () != 0) +		namespace = (uri.length () == 0) ? null : uri; +	    else +		namespace = getNamespace (getPrefix (qName), atts); + +	    if (namespace == null) +		element = document.createElement (qName); +	    else +		element = document.createElementNS (namespace, qName); + +	    populateAttributes (element, atts); +	    if (recreatedAttrs) { +		recreatedAttrs = false; +		// ... 
DOM deletes any duplicates +		populateAttributes (element, attributes); +		attributes.clear (); +	    } + +	    top.appendChild (element); +	    top = element; +	} + +	final static String	xmlnsURI = "http://www.w3.org/2000/xmlns/"; + +	private void populateAttributes (Element element, Attributes attrs) +	throws SAXParseException +	{ +	    int		length = attrs.getLength (); + +	    for (int i = 0; i < length; i++) { +		String	type = attrs.getType (i); +		String	value = attrs.getValue (i); +		String	name = attrs.getQName (i); +		String	local = attrs.getLocalName (i); +		String	uri = attrs.getURI (i); + +		// parser discarded basic information, DOM tree isn't writable +		if (name.length () == 0) +		    name = local; + +		// all attribute types other than these three may not +		// contain scoped names... enumerated attributes get +		// reported as NMTOKEN, except for NOTATION values +		if (!("CDATA".equals (type) +			|| "NMTOKEN".equals (type) +			|| "NMTOKENS".equals (type))) { +		    if (value.indexOf (':') != -1) { +			namespaceError ( +				"namespace nonconformant attribute value: " +				    + "<" + element.getNodeName () +				    + " " + name + "='" + value + "' ...>"); +		    } +		} + +		// xmlns="" is legal (undoes default NS) +		// xmlns:foo="" is illegal +		String prefix = getPrefix (name); +		String namespace; + +		if ("xmlns".equals (prefix)) { +		    if ("".equals (value)) +			namespaceError ("illegal null namespace decl, " + name); +		    namespace = xmlnsURI; +		} else if ("xmlns".equals (name)) +		    namespace = xmlnsURI; + +		else if (prefix == null) +		    namespace = null; +		else if (!"".equals(uri) && uri.length () != 0) +		    namespace = uri; +		else +		    namespace = getNamespace (prefix, attrs); + +		if (namespace == null) +		    element.setAttribute (name, value); +		else +		    element.setAttributeNS (namespace, name, value); +	    } +	} + +	private String getPrefix (String name) +	{ +	    int		temp; + +	    if ((temp = name.indexOf (':')) > 
0) +		return name.substring (0, temp); +	    return null; +	} + +	// used with SAX1-level parser output  +	private String getNamespace (String prefix, Attributes attrs) +	throws SAXParseException +	{ +	    String namespace; +	    String decl; + +	    // defaulting  +	    if (prefix == null) { +		decl = "xmlns"; +		namespace = attrs.getValue (decl); +		if ("".equals (namespace)) +		    return null; +		else if (namespace != null) +		    return namespace; + +	    // "xmlns" is like a keyword +	    // ... according to the Namespace REC, but DOM L2 CR2+ +	    // and Infoset violate that by assigning a namespace. +	    // that conflict is resolved elsewhere. +	    } else if ("xmlns".equals (prefix)) +		return null; + +	    // "xml" prefix is fixed +	    else if ("xml".equals (prefix)) +		return "http://www.w3.org/XML/1998/namespace"; + +	    // otherwise, expect a declaration +	    else { +		decl = "xmlns:" + prefix; +		namespace = attrs.getValue (decl); +	    } +	     +	    // if we found a local declaration, great +	    if (namespace != null) +		return namespace; + + +	    // ELSE ... 
search up the tree we've been building +	    for (Node n = top; +		    n != null && n.getNodeType () != Node.DOCUMENT_NODE; +		    n = (Node) n.getParentNode ()) { +		if (n.getNodeType () == Node.ENTITY_REFERENCE_NODE) +		    continue; +		Element e = (Element) n; +		Attr attr = e.getAttributeNode (decl); +		if (attr != null) +		    return attr.getNodeValue (); +	    } +	    // see above re "xmlns" as keyword +	    if ("xmlns".equals (decl)) +		return null; + +	    namespaceError ("Undeclared namespace prefix: " + prefix); +	    return null; +	} + +	// SAX2 +	public void endElement (String uri, String localName, String qName) +	throws SAXException +	{ +	    // we can't create populated entity ref nodes using +	    // only public DOM APIs (they've got to be readonly) +	    if (currentEntity != null) +		return; + +	    top = top.getParentNode (); +	} + +	// SAX1 (mandatory reporting if validating) +	public void ignorableWhitespace (char ch [], int start, int length) +	throws SAXException +	{ +	    if (consumer.isHidingWhitespace ()) +		return; +	    characters (ch, start, length); +	} + +	// SAX2 lexical event +	public void startCDATA () +	throws SAXException +	{ +	    inCDATA = true; +	    // true except for the first fragment of a cdata section +	    mergeCDATA = false; +	} +	 +	// SAX2 lexical event +	public void endCDATA () +	throws SAXException +	{ +	    inCDATA = false; +	} +	 +	// SAX2 lexical event +	// +	// this SAX2 callback merges two unrelated things: +	//	- Declaration of the root element type ... belongs with +	//    the other DTD declaration methods, NOT HERE. +	//	- IDs for the optional external subset ... belongs here +	//    with other lexical information. 
+	// +	// ...and it doesn't include the internal DTD subset, desired +	// both to support DOM L2 and to enable "pass through" processing +	// +	public void startDTD (String name, String publicId, String SystemId) +	throws SAXException +	{ +	    // need to filter out comments and PIs within the DTD +	    inDTD = true; +	} +	 +	// SAX2 lexical event +	public void endDTD () +	throws SAXException +	{ +	    inDTD = false; +	} +	 +	// SAX2 lexical event +	public void comment (char ch [], int start, int length) +	throws SAXException +	{ +	    Node	comment; + +	    // we can't create populated entity ref nodes using +	    // only public DOM APIs (they've got to be readonly) +	    if (consumer.isHidingComments () +		    || inDTD +		    || currentEntity != null) +		return; +	    comment = document.createComment (new String (ch, start, length)); +	    top.appendChild (comment); +	} + +	/** +	 * May be overridden by subclasses to return true, indicating +	 * that entity reference nodes can be populated and then made +	 * read-only. +	 */ +	public boolean canPopulateEntityRefs () +	    { return false; } + +	// SAX2 lexical event +	public void startEntity (String name) +	throws SAXException +	{ +	    // are we ignoring what would be contents of an +	    // entity ref, since we can't populate it? +	    if (currentEntity != null) +		return; + +	    // Are we hiding all entity boundaries? +	    if (consumer.isHidingReferences ()) +		return; + +	    // SAX2 shows parameter entities; DOM hides them +	    if (name.charAt (0) == '%' || "[dtd]".equals (name)) +		return; + +	    // Since we can't create a populated entity ref node in any +	    // standard way, we create an unpopulated one. +	    EntityReference ref = document.createEntityReference (name); +	    top.appendChild (ref); +	    top = ref; + +	    // ... 
allowing subclasses to populate them +	    if (!canPopulateEntityRefs ()) +		currentEntity = name; +	} + +	// SAX2 lexical event +	public void endEntity (String name) +	throws SAXException +	{ +	    if (name.charAt (0) == '%' || "[dtd]".equals (name)) +		return; +	    if (name.equals (currentEntity)) +		currentEntity = null; +	    if (!consumer.isHidingReferences ()) +		top = top.getParentNode (); +	} + + +	// SAX1 DTD event +	public void notationDecl ( +	    String name, +	    String publicId, String SystemId +	) throws SAXException +	{ +	    /* IGNORE -- no public DOM API lets us store these +	     * into the doctype node +	     */ +	} + +	// SAX1 DTD event +	public void unparsedEntityDecl ( +	    String name, +	    String publicId, String SystemId, +	    String notationName +	) throws SAXException +	{ +	    /* IGNORE -- no public DOM API lets us store these +	     * into the doctype node +	     */ +	} + +	// SAX2 declaration event +	public void elementDecl (String name, String model) +	throws SAXException +	{ +	    /* IGNORE -- no content model support in DOM L2 */ +	} + +	// SAX2 declaration event +	public void attributeDecl ( +	    String eName, +	    String aName, +	    String type, +	    String mode, +	    String value +	) throws SAXException +	{ +	    /* IGNORE -- no attribute model support in DOM L2 */ +	} + +	// SAX2 declaration event +	public void internalEntityDecl (String name, String value) +	throws SAXException +	{ +	    /* IGNORE -- no public DOM API lets us store these +	     * into the doctype node +	     */ +	} + +	// SAX2 declaration event +	public void externalEntityDecl ( +	    String name, +	    String publicId, +	    String SystemId +	) throws SAXException +	{ +	    /* IGNORE -- no public DOM API lets us store these +	     * into the doctype node +	     */ +	} + +	// +	// These really should offer the option of nonfatal handling, +	// like other validity errors, though that would cause major +	// chaos in the DOM data structures.  
DOM is already spec'd +	// to treat many of these as fatal, so this is consistent. +	// +	private void namespaceError (String description) +	throws SAXParseException +	{ +	    SAXParseException err; +	     +	    err = new SAXParseException (description, locator); +	    throw err; +	} +    } +} diff --git a/libjava/gnu/xml/pipeline/EventConsumer.java b/libjava/gnu/xml/pipeline/EventConsumer.java new file mode 100644 index 00000000000..5f9737314de --- /dev/null +++ b/libjava/gnu/xml/pipeline/EventConsumer.java @@ -0,0 +1,95 @@ +/* EventConsumer.java --  +   Copyright (C) 1999,2000,2001 Free Software Foundation, Inc. + +This file is part of GNU Classpath. + +GNU Classpath is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 2, or (at your option) +any later version. + +GNU Classpath is distributed in the hope that it will be useful, but +WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU +General Public License for more details. + +You should have received a copy of the GNU General Public License +along with GNU Classpath; see the file COPYING.  If not, write to the +Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA +02111-1307 USA. + +Linking this library statically or dynamically with other modules is +making a combined work based on this library.  Thus, the terms and +conditions of the GNU General Public License cover the whole +combination. 
+ +As a special exception, the copyright holders of this library give you +permission to link this library with independent modules to produce an +executable, regardless of the license terms of these independent +modules, and to copy and distribute the resulting executable under +terms of your choice, provided that you also meet, for each linked +independent module, the terms and conditions of the license of that +module.  An independent module is a module which is not derived from +or based on this library.  If you modify this library, you may extend +this exception to your version of the library, but you are not +obligated to do so.  If you do not wish to do so, delete this +exception statement from your version. */ + +package gnu.xml.pipeline; + +import org.xml.sax.*; + + +/** + * Collects the event consumption apparatus of a SAX pipeline stage. + * Consumers which permit some handlers or other characteristics to be + * configured will provide methods to support that configuration. + * + * <p> Two important categories of consumers include <em>filters</em>, which + * process events and pass them on to other consumers, and <em>terminus</em> + * (or <em>terminal</em>) stages, which don't pass events on.  Filters are not + * necessarily derived from the {@link EventFilter} class, although that + * class can substantially simplify their construction by automating the + * most common activities. + * + * <p> Event consumers which follow certain conventions for the signatures + * of their constructors can be automatically assembled into pipelines + * by the {@link PipelineFactory} class. + * + * @author David Brownell + */ +public interface EventConsumer +{ +    /** Most stages process these core SAX callbacks. */ +    public ContentHandler getContentHandler (); + +    /** Few stages will use unparsed entities. 
*/ +    public DTDHandler getDTDHandler (); + +    /** +     * This method works like the SAX2 XMLReader method of the same name, +     * and is used to retrieve the optional lexical and declaration handlers +     * in a pipeline. +     * +     * @param id This is a URI identifying the type of property desired. +     * @return The value of that property, if it is defined. +     * +     * @exception SAXNotRecognizedException Thrown if the particular +     *	pipeline stage does not understand the specified identifier. +     */ +    public Object getProperty (String id) +    throws SAXNotRecognizedException; + +    /** +     * This method provides a filter stage with a handler that abstracts +     * presentation of warnings and both recoverable and fatal errors. +     * Most pipeline stages should share a single policy and mechanism +     * for such reports, since application components require consistency +     * in such activities.  Accordingly, typical responses to this method +     * invocation involve saving the handler for use; filters will pass +     * it on to any other consumers they use. +     * +     * @param handler encapsulates error handling policy for this stage +     */ +    public void setErrorHandler (ErrorHandler handler); +} diff --git a/libjava/gnu/xml/pipeline/EventFilter.java b/libjava/gnu/xml/pipeline/EventFilter.java new file mode 100644 index 00000000000..8587808f399 --- /dev/null +++ b/libjava/gnu/xml/pipeline/EventFilter.java @@ -0,0 +1,809 @@ +/* EventFilter.java --  +   Copyright (C) 1999,2000,2001 Free Software Foundation, Inc. + +This file is part of GNU Classpath. + +GNU Classpath is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 2, or (at your option) +any later version. 
+ +GNU Classpath is distributed in the hope that it will be useful, but +WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU +General Public License for more details. + +You should have received a copy of the GNU General Public License +along with GNU Classpath; see the file COPYING.  If not, write to the +Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA +02111-1307 USA. + +Linking this library statically or dynamically with other modules is +making a combined work based on this library.  Thus, the terms and +conditions of the GNU General Public License cover the whole +combination. + +As a special exception, the copyright holders of this library give you +permission to link this library with independent modules to produce an +executable, regardless of the license terms of these independent +modules, and to copy and distribute the resulting executable under +terms of your choice, provided that you also meet, for each linked +independent module, the terms and conditions of the license of that +module.  An independent module is a module which is not derived from +or based on this library.  If you modify this library, you may extend +this exception to your version of the library, but you are not +obligated to do so.  If you do not wish to do so, delete this +exception statement from your version. */ + +package gnu.xml.pipeline; + +import java.lang.reflect.InvocationTargetException; +import java.lang.reflect.Method; + +import org.xml.sax.*; +import org.xml.sax.ext.*; +import org.xml.sax.helpers.XMLFilterImpl; + +import gnu.xml.aelfred2.ContentHandler2; + +/** + * A customizable event consumer, used to assemble various kinds of filters + * using SAX handlers and an optional second consumer.  It can be constructed + * in two ways: <ul> + * + *  <li> To serve as a passthrough, sending all events to a second consumer. 
+ *  The second consumer may be identified through {@link #getNext}. + * + *  <li> To serve as a dead end, with all handlers null; + *  {@link #getNext} returns null. + * + * </ul> + * + * <p> Additionally, SAX handlers may be assigned, which completely replace + * the "upstream" view (through {@link EventConsumer}) of handlers, initially + * null or the "next" consumer provided to the constructor.  To make + * it easier to build specialized filter classes, this class implements + * all the standard SAX consumer handlers, and those implementations + * delegate "downstream" to the consumer accessed by {@link #getNext}. + * + * <p> The simplest way to create a custom filter class is to create a + * subclass which overrides one or more handler interface methods.  The + * constructor for that subclass then registers itself as a handler for + * those interfaces using a call such as <em>setContentHandler(this)</em>, + * so the "upstream" view of event delivery is modified from the state + * established in the base class constructor.  That way, + * the overridden methods intercept those event callbacks + * as they go "downstream", and + * all other event callbacks will pass events to any next consumer. + * Overridden methods may invoke superclass methods (perhaps after modifying + * parameters) if they wish to delegate such calls.  Such subclasses + * should use {@link #getErrorHandler} to report errors using the + * common error reporting mechanism. + * + * <p> Another important technique is to construct a filter consisting + * of only a few specific types of handler.  For example, one could easily + * prune out lexical events or various declarations by providing handlers + * which don't pass those events downstream, or by providing null handlers. + * + * <hr /> + * + * <p> This may be viewed as the consumer-oriented analogue of the SAX2 + * {@link org.xml.sax.helpers.XMLFilterImpl XMLFilterImpl} class.
+ * Key differences include: <ul> + * + *	<li> This fully separates consumer and producer roles:  it + *	does not implement the producer side <em>XMLReader</em> or + *	<em>EntityResolver</em> interfaces, so it can only be used + *	in "push" mode (it has no <em>parse()</em> methods). + * + *	<li> "Extension" handlers are fully supported, enabling a + *	richer set of application requirements. + *	And it implements {@link EventConsumer}, which groups related + *	consumer methods together, rather than leaving them separated. + * + *	<li> The chaining which is visible is "downstream" to the next + *	consumer, not "upstream" to the preceding producer. + *	It supports "fan-in", where + *	a consumer can be fed by several producers.  (For "fan-out", + *	see the {@link TeeConsumer} class.) + * + *	<li> Event chaining is set up differently.  It is intended to + *	work "upstream" from terminus towards producer, during filter + *	construction, as described above. + *	This is part of an early binding model: + *	events don't need to pass through stages which ignore them. + * + *	<li> ErrorHandler support is separated, on the grounds that + *	pipeline stages need to share the same error handling policy. + *	For the same reason, error handler setup goes "downstream": + *	when error handlers get set, they are passed to subsequent + *	consumers. + * + *	</ul> + * + * <p> The {@link #chainTo chainTo()} convenience routine supports chaining to + * an XMLFilterImpl, in its role as a limited functionality event + * consumer.  Its event producer role ({@link XMLFilter}) is ignored. + * + * <hr /> + * + * <p> The {@link #bind bind()} routine may be used to associate event pipelines + * with any kind of {@link XMLReader} that will produce the events. + * Such pipelines don't necessarily need to have any members which are + * implemented using this class.
That routine has some intelligence + * which supports automatic changes to parser feature flags, letting + * event pipelines become largely independent of the particular feature + * sets of parsers. + * + * @author David Brownell + */ +public class EventFilter +    implements EventConsumer, ContentHandler2, DTDHandler, +	    LexicalHandler, DeclHandler +{ +    // SAX handlers +    private ContentHandler		docHandler, docNext; +    private DTDHandler			dtdHandler, dtdNext; +    private LexicalHandler		lexHandler, lexNext; +    private DeclHandler			declHandler, declNext; +    // and ideally, one more for the stuff SAX2 doesn't show + +    private Locator			locator; +    private EventConsumer		next; +    private ErrorHandler		errHandler; + +     +    /** SAX2 URI prefix for standard feature flags. */ +    public static final String		FEATURE_URI +	= "http://xml.org/sax/features/"; +    /** SAX2 URI prefix for standard properties (mostly for handlers). */ +    public static final String		PROPERTY_URI +	= "http://xml.org/sax/properties/"; + +    /** SAX2 property identifier for {@link DeclHandler} events */ +    public static final String		DECL_HANDLER +	= PROPERTY_URI + "declaration-handler"; +    /** SAX2 property identifier for {@link LexicalHandler} events */ +    public static final String		LEXICAL_HANDLER +	= PROPERTY_URI + "lexical-handler"; +     +    // +    // These class objects will be null if the relevant class isn't linked. +    // Small configurations (pJava and some kinds of embedded systems) need +    // to facilitate smaller executables.  So "instanceof" is undesirable +    // when bind() sees if it can remove some stages. +    // +    // SECURITY NOTE:  assuming all these classes are part of the same sealed +    // package, there's no problem saving these in the instance of this class +    // that's associated with "this" class loader.  But that wouldn't be true +    // for classes in another package.
+    // +    private static boolean		loaded; +    private static Class		nsClass; +    private static Class		validClass; +    private static Class		wfClass; +    private static Class		xincClass; + +    static ClassLoader getClassLoader () +    { +        Method m = null; + +        try { +            m = Thread.class.getMethod("getContextClassLoader", null); +        } catch (NoSuchMethodException e) { +            // Assume that we are running JDK 1.1, use the current ClassLoader +            return EventFilter.class.getClassLoader(); +        } + +        try { +            return (ClassLoader) m.invoke(Thread.currentThread(), null); +        } catch (IllegalAccessException e) { +            // assert(false) +            throw new UnknownError(e.getMessage()); +        } catch (InvocationTargetException e) { +            // assert(e.getTargetException() instanceof SecurityException) +            throw new UnknownError(e.getMessage()); +        } +    } + +    static Class loadClass (ClassLoader classLoader, String className) +    { +	try { +	    if (classLoader == null) +		return Class.forName(className); +	    else +		return classLoader.loadClass(className); +	} catch (Exception e) { +	    return null; +	} +    } + +    static private void loadClasses () +    { +	ClassLoader	loader = getClassLoader (); + +	nsClass = loadClass (loader, "gnu.xml.pipeline.NSFilter"); +	validClass = loadClass (loader, "gnu.xml.pipeline.ValidationConsumer"); +	wfClass = loadClass (loader, "gnu.xml.pipeline.WellFormednessFilter"); +	xincClass = loadClass (loader, "gnu.xml.pipeline.XIncludeFilter"); +	loaded = true; +    } + + +    /** +     * Binds the standard SAX2 handlers from the specified consumer +     * pipeline to the specified producer.  These handlers include the core +     * {@link ContentHandler} and {@link DTDHandler}, plus the extension +     * {@link DeclHandler} and {@link LexicalHandler}.  
Any additional +     * application-specific handlers need to be bound separately. +     * The {@link ErrorHandler} is handled differently:  the producer's +     * error handler is passed through to the consumer pipeline. +     * The producer is told to include namespace prefix information if it +     * can, since many pipeline stages need that Infoset information to +     * work well. +     * +     * <p> At the head of the pipeline, certain standard event filters are +     * recognized and handled specially.  This facilitates construction +     * of processing pipelines that work regardless of the capabilities +     * of the XMLReader implementation in use; for example, it permits +     * validating output of a {@link gnu.xml.util.DomParser}. <ul> +     * +     *	<li> {@link NSFilter} will be removed if the producer can be +     *	told not to discard namespace data, using the "namespace-prefixes" +     *	feature flag. +     * +     *	<li> {@link ValidationConsumer} will be removed if the producer +     *	can be told to validate, using the "validation" feature flag. +     * +     *	<li> {@link WellFormednessFilter} is always removed, on the +     *	grounds that no XMLReader is permitted to produce malformed +     *	event streams and this would just be processing overhead. +     * +     *	<li> {@link XIncludeFilter} stops the special handling, except +     *	that it's told about the "namespace-prefixes" feature of the +     *	event producer so that the event stream is internally consistent. +     * +     *	<li> The first consumer which is not one of those classes stops +     *	such special handling.  This means that if you want to force +     *	one of those filters to be used, you could just precede it with +     *	an instance of {@link EventFilter} configured as a pass-through. +     *	You might need to do that if you are using an {@link NSFilter} +     *	subclass to fix names found in attributes or character data.
+     * +     *	</ul> +     * +     * <p> Other than that, this method works with any kind of event consumer, +     * not just event filters.  Note that in all cases, the standard handlers +     * are assigned; any previous handler assignments for the handler will +     * be overridden. +     * +     * @param producer will deliver events to the specified consumer  +     * @param consumer pipeline supplying event handlers to be associated +     *	with the producer (may not be null) +     */ +    public static void bind (XMLReader producer, EventConsumer consumer) +    { +	Class	klass = null; +	boolean	prefixes; + +	if (!loaded) +	    loadClasses (); + +	// DOM building, printing, layered validation, and other +	// things don't work well when prefix info is discarded. +	// Include it by default, whenever possible. +	try { +	    producer.setFeature (FEATURE_URI + "namespace-prefixes", +		true); +	    prefixes = true; +	} catch (SAXException e) { +	    prefixes = false; +	} + +	// NOTE:  This loop doesn't use "instanceof", since that +	// would prevent compiling/linking without those classes +	// being present. +	while (consumer != null) { +	    klass = consumer.getClass (); + +	    // we might have already changed this problematic SAX2 default. +	    if (nsClass != null && nsClass.isAssignableFrom (klass)) { +		if (!prefixes) +		    break; +		consumer = ((EventFilter)consumer).getNext (); + +	    // the parser _might_ do DTD validation by default ... +	    // if not, maybe we can change this setting. 
+	    } else if (validClass != null +		    && validClass.isAssignableFrom (klass)) { +		try { +		    producer.setFeature (FEATURE_URI + "validation", +			true); +		    consumer = ((ValidationConsumer)consumer).getNext (); +		} catch (SAXException e) { +		    break; +		} + +	    // parsers are required not to have such bugs +	    } else if (wfClass != null && wfClass.isAssignableFrom (klass)) { +		consumer = ((WellFormednessFilter)consumer).getNext (); + +	    // stop on the first pipeline stage we can't remove +	    } else +		break; +	     +	    if (consumer == null) +		klass = null; +	} + +	// the actual setting here doesn't matter as much +	// as that producer and consumer agree +	if (xincClass != null && klass != null +		&& xincClass.isAssignableFrom (klass)) +	    ((XIncludeFilter)consumer).setSavingPrefixes (prefixes); + +	// Some SAX parsers can't handle null handlers -- bleech +	DefaultHandler2	h = new DefaultHandler2 (); + +	if (consumer != null && consumer.getContentHandler () != null) +	    producer.setContentHandler (consumer.getContentHandler ()); +	else +	    producer.setContentHandler (h); +	if (consumer != null && consumer.getDTDHandler () != null) +	    producer.setDTDHandler (consumer.getDTDHandler ()); +	else +	    producer.setDTDHandler (h); + +	try { +	    Object	dh; +	     +	    if (consumer != null) +		dh = consumer.getProperty (DECL_HANDLER); +	    else +		dh = null; +	    if (dh == null) +		dh = h; +	    producer.setProperty (DECL_HANDLER, dh); +	} catch (Exception e) { /* ignore */ } +	try { +	    Object	lh; +	     +	    if (consumer != null) +		lh = consumer.getProperty (LEXICAL_HANDLER); +	    else +		lh = null; +	    if (lh == null) +		lh = h; +	    producer.setProperty (LEXICAL_HANDLER, lh); +	} catch (Exception e) { /* ignore */ } + +	// this binding goes the other way around +	if (producer.getErrorHandler () == null) +	    producer.setErrorHandler (h); +	if (consumer != null) +	    consumer.setErrorHandler (producer.getErrorHandler 
()); +    } +     +    /** +     * Initializes all handlers to null. +     */ +	// constructor used by PipelineFactory +    public EventFilter () { } + + +    /** +     * Handlers that are not otherwise set will default to those from +     * the specified consumer, making it easy to pass events through. +     * If the consumer is null, all handlers are initialized to null. +     */ +	// constructor used by PipelineFactory +    public EventFilter (EventConsumer consumer) +    { +	if (consumer == null) +	    return; + +	next = consumer; + +	// We delegate through the "xxNext" handlers, and +	// report the "xxHandler" ones on our input side. + +	// Normally a subclass would both override handler +	// methods and register itself as the "xxHandler". + +	docHandler = docNext = consumer.getContentHandler (); +	dtdHandler = dtdNext = consumer.getDTDHandler (); +	try { +	    declHandler = declNext = (DeclHandler) +		    consumer.getProperty (DECL_HANDLER); +	} catch (SAXException e) { /* leave value null */ } +	try { +	    lexHandler = lexNext = (LexicalHandler) +		    consumer.getProperty (LEXICAL_HANDLER); +	} catch (SAXException e) { /* leave value null */ } +    } + +    /** +     * Treats the XMLFilterImpl as a limited functionality event consumer, +     * by arranging to deliver events to it; this lets such classes be +     * "wrapped" as pipeline stages. +     * +     * <p> <em>Upstream Event Setup:</em> +     * If no handlers have been assigned to this EventFilter, then the +     * handlers from the specified XMLFilterImpl are returned from this +     * {@link EventConsumer}: the XMLFilterImpl is just "wrapped". +     * Otherwise the specified handlers will be returned. 
+     * +     * <p> <em>Downstream Event Setup:</em> +     * Subclasses may chain event delivery to the specified XMLFilterImpl +     * by invoking the appropriate superclass methods, +     * as if their constructor passed a "next" EventConsumer to the +     * constructor for this class.
+     * If this EventFilter has an ErrorHandler, it is assigned as +     * the error handler for the XMLFilterImpl, just as would be +     * done for a next stage implementing {@link EventConsumer}. +     * +     * @param next the next downstream component of the pipeline. +     * @exception IllegalStateException if the "next" consumer has +     *	already been set through the constructor. +     */ +    public void chainTo (XMLFilterImpl next) +    { +	if (this.next != null) +	    throw new IllegalStateException (); + +	docNext = next.getContentHandler (); +	if (docHandler == null) +	    docHandler = docNext; +	dtdNext = next.getDTDHandler (); +	if (dtdHandler == null) +	    dtdHandler = dtdNext; + +	try { +	    declNext = (DeclHandler) next.getProperty (DECL_HANDLER); +	    if (declHandler == null) +		declHandler = declNext; +	} catch (SAXException e) { /* leave value null */ } +	try { +	    lexNext = (LexicalHandler) next.getProperty (LEXICAL_HANDLER); +	    if (lexHandler == null) +		lexHandler = lexNext; +	} catch (SAXException e) { /* leave value null */ } + +	if (errHandler != null) +	    next.setErrorHandler (errHandler); +    } + +    /** +     * Records the error handler that should be used by this stage, and +     * passes it "downstream" to any subsequent stage. +     */ +    final public void setErrorHandler (ErrorHandler handler) +    { +	errHandler = handler; +	if (next != null) +	    next.setErrorHandler (handler); +    } + +    /** +     * Returns the error handler assigned to this filter stage, or null +     * if no such assignment has been made. +     */ +    final public ErrorHandler getErrorHandler () +    { +	return errHandler; +    } + + +    /** +     * Returns the next event consumer in sequence, or null if there +     * is no such consumer. 
+     */ +    final public EventConsumer getNext () +	{ return next; } + + +    /** +     * Assigns the content handler to use; a null handler indicates +     * that these events will not be forwarded.
+     * This overrides the previous setting for this handler, which was +     * probably pointed to the next consumer by the base class constructor. +     */ +    final public void setContentHandler (ContentHandler h) +    { +	docHandler = h; +    } + +    /** Returns the content handler being used. */ +    final public ContentHandler getContentHandler () +    { +	return docHandler; +    } + +    /** +     * Assigns the DTD handler to use; a null handler indicates +     * that these events will not be forwarded. +     * This overrides the previous setting for this handler, which was +     * probably pointed to the next consumer by the base class constructor. +     */ +    final public void setDTDHandler (DTDHandler h) +	{ dtdHandler = h; } + +    /** Returns the DTD handler being used. */ +    final public DTDHandler getDTDHandler () +    { +	return dtdHandler; +    } + +    /** +     * Stores the property, normally a handler; a null handler indicates +     * that these events will not be forwarded. +     * This overrides the previous handler setting, which was probably +     * pointed to the next consumer by the base class constructor.
+     */ +    final public void setProperty (String id, Object o) +    throws SAXNotRecognizedException, SAXNotSupportedException +    { +	try { +	    Object	value = getProperty (id); + +	    if (value == o) +		return; +	    if (DECL_HANDLER.equals (id)) { +		declHandler = (DeclHandler) o; +		return; +	    } +	    if (LEXICAL_HANDLER.equals (id)) { +		lexHandler = (LexicalHandler) o; +		return; +	    } +	    throw new SAXNotSupportedException (id); + +	} catch (ClassCastException e) { +	    throw new SAXNotSupportedException (id); +	} +    } + +    /** Retrieves a property of unknown intent (usually a handler) */ +    final public Object getProperty (String id) +    throws SAXNotRecognizedException +    { +	if (DECL_HANDLER.equals (id))  +	    return declHandler; +	if (LEXICAL_HANDLER.equals (id)) +	    return lexHandler; + +	throw new SAXNotRecognizedException (id); +    } + +    /** +     * Returns any locator provided to the next consumer, if this class +     * (or a subclass) is handling {@link ContentHandler } events. 
+     */ +    public Locator getDocumentLocator () +	{ return locator; } + + +    // CONTENT HANDLER DELEGATIONS + +    /** <b>SAX2:</b> passes this callback to the next consumer, if any */ +    public void setDocumentLocator (Locator locator) +    { +	this.locator = locator; +	if (docNext != null) +	    docNext.setDocumentLocator (locator); +    } + +    /** <b>SAX2:</b> passes this callback to the next consumer, if any */ +    public void startDocument () throws SAXException +    { +	if (docNext != null) +	    docNext.startDocument (); +    } + +    public void xmlDecl(String version, String encoding, boolean standalone, +                        String inputEncoding) +      throws SAXException +    { +      if (docNext != null && docNext instanceof ContentHandler2) +        { +          ((ContentHandler2) docNext).xmlDecl(version, encoding, standalone, +                                              inputEncoding); +        } +    } + +    /** <b>SAX2:</b> passes this callback to the next consumer, if any */ +    public void skippedEntity (String name) throws SAXException +    { +	if (docNext != null) +	    docNext.skippedEntity (name); +    } + +    /** <b>SAX2:</b> passes this callback to the next consumer, if any */ +    public void processingInstruction (String target, String data) +    throws SAXException +    { +	if (docNext != null) +	    docNext.processingInstruction (target, data); +    } + +    /** <b>SAX2:</b> passes this callback to the next consumer, if any */ +    public void characters (char ch [], int start, int length) +    throws SAXException +    { +	if (docNext != null) +	    docNext.characters (ch, start, length); +    } + +    /** <b>SAX2:</b> passes this callback to the next consumer, if any */ +    public void ignorableWhitespace (char ch [], int start, int length) +    throws SAXException +    { +	if (docNext != null) +	    docNext.ignorableWhitespace (ch, start, length); +    } + +    /** <b>SAX2:</b> passes this callback to the next 
consumer, if any */ +    public void startPrefixMapping (String prefix, String uri) +    throws SAXException +    { +	if (docNext != null) +	    docNext.startPrefixMapping (prefix, uri); +    } + +    /** <b>SAX2:</b> passes this callback to the next consumer, if any */ +    public void startElement ( +	String uri, String localName, +	String qName, Attributes atts +    ) throws SAXException +    { +	if (docNext != null) +	    docNext.startElement (uri, localName, qName, atts); +    } + +    /** <b>SAX2:</b> passes this callback to the next consumer, if any */ +    public void endElement (String uri, String localName, String qName) +    throws SAXException +    { +	if (docNext != null) +	    docNext.endElement (uri, localName, qName); +    } + +    /** <b>SAX2:</b> passes this callback to the next consumer, if any */ +    public void endPrefixMapping (String prefix) throws SAXException +    { +	if (docNext != null) +	    docNext.endPrefixMapping (prefix); +    } + +    /** <b>SAX2:</b> passes this callback to the next consumer, if any */ +    public void endDocument () throws SAXException +    { +	if (docNext != null) +	    docNext.endDocument (); +	locator = null; +    } + + +    // DTD HANDLER DELEGATIONS +     +    /** <b>SAX1:</b> passes this callback to the next consumer, if any */ +    public void unparsedEntityDecl ( +	String name, +	String publicId, +	String systemId, +	String notationName +    ) throws SAXException +    { +	if (dtdNext != null) +	    dtdNext.unparsedEntityDecl (name, publicId, systemId, notationName); +    } +     +    /** <b>SAX1:</b> passes this callback to the next consumer, if any */ +    public void notationDecl (String name, String publicId, String systemId) +    throws SAXException +    { +	if (dtdNext != null) +	    dtdNext.notationDecl (name, publicId, systemId); +    } +     + +    // LEXICAL HANDLER DELEGATIONS + +    /** <b>SAX2:</b> passes this callback to the next consumer, if any */ +    public void startDTD (String name, 
String publicId, String systemId) +    throws SAXException +    { +	if (lexNext != null) +	    lexNext.startDTD (name, publicId, systemId); +    } + +    /** <b>SAX2:</b> passes this callback to the next consumer, if any */ +    public void endDTD () +    throws SAXException +    { +	if (lexNext != null) +	    lexNext.endDTD (); +    } + +    /** <b>SAX2:</b> passes this callback to the next consumer, if any */ +    public void comment (char ch [], int start, int length) +    throws SAXException +    { +	if (lexNext != null) +	    lexNext.comment (ch, start, length); +    } + +    /** <b>SAX2:</b> passes this callback to the next consumer, if any */ +    public void startCDATA () +    throws SAXException +    { +	if (lexNext != null) +	    lexNext.startCDATA (); +    } + +    /** <b>SAX2:</b> passes this callback to the next consumer, if any */ +    public void endCDATA () +    throws SAXException +    { +	if (lexNext != null) +	    lexNext.endCDATA (); +    } + +    /** +     * <b>SAX2:</b> passes this callback to the next consumer, if any. +     */ +    public void startEntity (String name) +    throws SAXException +    { +	if (lexNext != null) +	    lexNext.startEntity (name); +    } + +    /** +     * <b>SAX2:</b> passes this callback to the next consumer, if any. 
+     */ +    public void endEntity (String name) +    throws SAXException +    { +	if (lexNext != null) +	    lexNext.endEntity (name); +    } +     + +    // DECLARATION HANDLER DELEGATIONS + + +    /** <b>SAX2:</b> passes this callback to the next consumer, if any */ +    public void elementDecl (String name, String model) +    throws SAXException +    { +	if (declNext != null) +	    declNext.elementDecl (name, model); +    } + +    /** <b>SAX2:</b> passes this callback to the next consumer, if any */ +    public void attributeDecl (String eName, String aName, +	    String type, String mode, String value) +    throws SAXException +    { +	if (declNext != null) +	    declNext.attributeDecl (eName, aName, type, mode, value); +    } + +    /** <b>SAX2:</b> passes this callback to the next consumer, if any */ +    public void externalEntityDecl (String name, +    	String publicId, String systemId) +    throws SAXException +    { +	if (declNext != null) +	    declNext.externalEntityDecl (name, publicId, systemId); +    } + +    /** <b>SAX2:</b> passes this callback to the next consumer, if any */ +    public void internalEntityDecl (String name, String value) +    throws SAXException +    { +	if (declNext != null) +	    declNext.internalEntityDecl (name, value); +    } +} diff --git a/libjava/gnu/xml/pipeline/LinkFilter.java b/libjava/gnu/xml/pipeline/LinkFilter.java new file mode 100644 index 00000000000..28a45017046 --- /dev/null +++ b/libjava/gnu/xml/pipeline/LinkFilter.java @@ -0,0 +1,243 @@ +/* LinkFilter.java --  +   Copyright (C) 1999,2000,2001 Free Software Foundation, Inc. + +This file is part of GNU Classpath. + +GNU Classpath is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 2, or (at your option) +any later version. 
+ +GNU Classpath is distributed in the hope that it will be useful, but +WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU +General Public License for more details. + +You should have received a copy of the GNU General Public License +along with GNU Classpath; see the file COPYING.  If not, write to the +Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA +02111-1307 USA. + +Linking this library statically or dynamically with other modules is +making a combined work based on this library.  Thus, the terms and +conditions of the GNU General Public License cover the whole +combination. + +As a special exception, the copyright holders of this library give you +permission to link this library with independent modules to produce an +executable, regardless of the license terms of these independent +modules, and to copy and distribute the resulting executable under +terms of your choice, provided that you also meet, for each linked +independent module, the terms and conditions of the license of that +module.  An independent module is a module which is not derived from +or based on this library.  If you modify this library, you may extend +this exception to your version of the library, but you are not +obligated to do so.  If you do not wish to do so, delete this +exception statement from your version. */ + +package gnu.xml.pipeline; + +import java.io.IOException; +import java.net.URL;  +import java.util.Enumeration; +import java.util.Vector; + +import org.xml.sax.Attributes; +import org.xml.sax.Locator; +import org.xml.sax.SAXException; + + +/** + * Pipeline filter to remember XHTML links found in a document, + * so they can later be crawled.  Fragments are not counted, and duplicates + * are ignored.  Callers are responsible for filtering out URLs they aren't + * interested in.  Events are passed through unmodified. 
+ * + * <p> Input MUST include a setDocumentLocator() call, as it's used to + * resolve relative links in the absence of a "base" element.  Input MUST + * also include namespace identifiers, since it is the XHTML namespace + * identifier which is used to identify the relevant elements. + * + * <p><em>FIXME:</em> handle xml:base attribute ... in association with + * a stack of base URIs.  Similarly, recognize/support XLink data. + * + * @author David Brownell + */ +public class LinkFilter extends EventFilter +{ +    // for storing URIs +    private Vector		vector = new Vector (); + +	// struct for "full" link record (tbd) +	// these for troubleshooting original source: +	//	original uri +	//	uri as resolved (base, relative, etc) +	//	URI of originating doc +	//	line # +	//	original element + attrs (img src, desc, etc) + +	// XLink model of the link ... for inter-site pairups ? + +    private String		baseURI; + +    private boolean		siteRestricted = false; + +    // +    // XXX leverage blacklist info (like robots.txt) +    // +    // XXX constructor w/param ... pipeline for sending link data +    // probably XHTML --> XLink, providing info as sketched above +    // + + +    /** +     * Constructs a new event filter, which collects links in a private data +     * structure for later enumeration. +     */ +	// constructor used by PipelineFactory +    public LinkFilter () +    { +	super.setContentHandler (this); +    } + + +    /** +     * Constructs a new event filter, which collects links in a private data +     * structure for later enumeration and passes all events, unmodified, +     * to the next consumer. +     */ +	// constructor used by PipelineFactory +    public LinkFilter (EventConsumer next) +    { +	super (next); +	super.setContentHandler (this); +    } + + +    /** +     * Returns an enumeration of the links found since the filter +     * was constructed, or since removeAllLinks() was called. +     * +     * @return enumeration of strings.
+     */ +    public Enumeration getLinks () +    { +	return vector.elements (); +    } + +    /** +     * Removes records about all links reported to the event +     * stream, as if the filter were newly created. +     */ +    public void removeAllLinks () +    { +	vector = new Vector (); +    } + + +    /** +     * Collects URIs for (X)HTML content from elements which hold them. +     */ +    public void startElement ( +	String		uri, +	String		localName, +	String		qName, +	Attributes	atts +    ) throws SAXException +    { +	String	link; + +	// Recognize XHTML links. +	if ("http://www.w3.org/1999/xhtml".equals (uri)) { + +	    if ("a".equals (localName) || "base".equals (localName) +		    || "area".equals (localName)) +		link = atts.getValue ("href"); +	    else if ("iframe".equals (localName) || "frame".equals (localName)) +		link = atts.getValue ("src"); +	    else if ("blockquote".equals (localName) || "q".equals (localName) +		    || "ins".equals (localName) || "del".equals (localName)) +		link = atts.getValue ("cite"); +	    else +		link = null; +	    link = maybeAddLink (link); + +	    // "base" modifies designated baseURI +	    if ("base".equals (localName) && link != null) +		baseURI = link; + +	    if ("iframe".equals (localName) || "img".equals (localName)) +		maybeAddLink (atts.getValue ("longdesc")); +	} +	 +	super.startElement (uri, localName, qName, atts); +    } + +    private String maybeAddLink (String link) +    { +	int		index; + +	// ignore empty links and fragments inside docs +	if (link == null) +	    return null; +	if ((index = link.indexOf ("#")) >= 0) +	    link = link.substring (0, index); +	if (link.equals ("")) +	    return null; + +	try { +	    // get the real URI +	    URL		base = new URL ((baseURI != null) +				    ? 
baseURI +				    : getDocumentLocator ().getSystemId ()); +	    URL		url = new URL (base, link); + +	    link = url.toString (); + +	    // ignore duplicates +	    if (vector.contains (link)) +		return link; + +	    // other than what "base" does, stick to original site: +	    if (siteRestricted) { +		// don't switch protocols +		if (!base.getProtocol ().equals (url.getProtocol ())) +		    return link; +		// don't switch servers +		if (base.getHost () != null +			&& !base.getHost ().equals (url.getHost ())) +		    return link; +	    } + +	    vector.addElement (link); + +	    return link; +	     +	} catch (IOException e) { +	    // bad URLs we don't want +	} +	return null; +    } + +    /** +     * Reports an error if no Locator has been made available; otherwise forwards the event. +     */ +    public void startDocument () throws SAXException +    { +	if (getDocumentLocator () == null) +	    throw new SAXException ("no Locator!"); +	super.startDocument (); +    } + +    /** +     * Forgets about any base URI information that may be recorded. +     * Applications will often want to call removeAllLinks(), likely +     * after examining the links which were reported. +     */ +    public void endDocument () +    throws SAXException +    { +	baseURI = null; +	super.endDocument (); +    } +} diff --git a/libjava/gnu/xml/pipeline/NSFilter.java b/libjava/gnu/xml/pipeline/NSFilter.java new file mode 100644 index 00000000000..9e8a6436503 --- /dev/null +++ b/libjava/gnu/xml/pipeline/NSFilter.java @@ -0,0 +1,340 @@ +/* NSFilter.java --  +   Copyright (C) 1999,2000,2001 Free Software Foundation, Inc. + +This file is part of GNU Classpath. + +GNU Classpath is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 2, or (at your option) +any later version.
+ +GNU Classpath is distributed in the hope that it will be useful, but +WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU +General Public License for more details. + +You should have received a copy of the GNU General Public License +along with GNU Classpath; see the file COPYING.  If not, write to the +Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA +02111-1307 USA. + +Linking this library statically or dynamically with other modules is +making a combined work based on this library.  Thus, the terms and +conditions of the GNU General Public License cover the whole +combination. + +As a special exception, the copyright holders of this library give you +permission to link this library with independent modules to produce an +executable, regardless of the license terms of these independent +modules, and to copy and distribute the resulting executable under +terms of your choice, provided that you also meet, for each linked +independent module, the terms and conditions of the license of that +module.  An independent module is a module which is not derived from +or based on this library.  If you modify this library, you may extend +this exception to your version of the library, but you are not +obligated to do so.  If you do not wish to do so, delete this +exception statement from your version. */ + +package gnu.xml.pipeline; + +import java.util.EmptyStackException; +import java.util.Enumeration; +import java.util.Stack; + +import org.xml.sax.*; +import org.xml.sax.ext.*; +import org.xml.sax.helpers.AttributesImpl; +import org.xml.sax.helpers.NamespaceSupport; + + +/** + * This filter ensures that element and attribute names are properly prefixed, + * and that such prefixes are declared.  
Such data is critical for operations + * like writing XML text, and validating against DTDs:  names or their prefixes + * may have been discarded, although they are essential to the exchange of + * information using XML.  There are various common ways that such data + * gets discarded: <ul> + * + *	<li> By default, SAX2 parsers must discard the "xmlns*" + *	attributes, and may also choose not to report properly prefixed + *	names for elements or attributes.  (Some parsers may support + *	changing the <em>namespace-prefixes</em> value from the default + *	to <em>true</em>, effectively eliminating the need to use this + *	filter on their output.) + * + *	<li> When event streams are generated from a DOM tree, they may + *	never have had prefixes or declarations for namespaces; or + *	the existing prefixes or declarations may have been invalidated + *	by structural modifications to that DOM tree. + * + *	<li> Other software writing SAX event streams won't necessarily + *	be worrying about prefix management, and so they will need to + *	have a transparent solution for managing them. + * + *	</ul> + * + * <p> This filter uses a heuristic to choose the prefix to assign to any + * particular name which wasn't already correctly prefixed.  The associated + * namespace will be correct, and the prefix will be declared.  Original + * structures facilitating text editing, such as conventions about use of + * mnemonic prefix names or the scoping of prefixes, can't always be + * reconstructed after they are discarded, as strongly encouraged by the + * current SAX2 defaults. + * + * <p> Note that this filter can't possibly know whether attribute values + * or document content involve prefixed names.  If your application + * requires using prefixed names in such locations you'll need to add some + * appropriate logic (perhaps adding additional heuristics in a subclass).
+ * + * @author David Brownell + */ +public class NSFilter extends EventFilter +{ +    private NamespaceSupport	nsStack = new NamespaceSupport (); +    private Stack		elementStack = new Stack (); + +    private boolean		pushedContext; +    private String		nsTemp [] = new String [3]; +    private AttributesImpl	attributes = new AttributesImpl (); +    private boolean		usedDefault; + +    // gensymmed prefixes use this root name +    private static final String	prefixRoot = "prefix-"; + +     +    /** +     * Passes events through to the specified consumer, after first +     * processing them. +     * +     * @param next the next event consumer to receive events. +     */ +	// constructor used by PipelineFactory +    public NSFilter (EventConsumer next) +    { +	super (next); + +	setContentHandler (this); +    } + +    private void fatalError (String message) +    throws SAXException +    { +	SAXParseException	e; +	ErrorHandler		handler = getErrorHandler (); +	Locator			locator = getDocumentLocator (); + +	if (locator == null) +	    e = new SAXParseException (message, null, null, -1, -1); +	else +	    e = new SAXParseException (message, locator); +	if (handler != null) +	    handler.fatalError (e); +	throw e; +    } + + +    public void startDocument () throws SAXException +    { +	elementStack.removeAllElements (); +	nsStack.reset (); +	pushedContext = false; +	super.startDocument (); +    } + +    /** +     * This call is not passed to the next consumer in the chain. +     * Prefix declarations and scopes are only exposed in the form +     * of attributes; this callback just records a declaration that +     * will be exposed as an attribute. 
+     */ +    public void startPrefixMapping (String prefix, String uri) +    throws SAXException +    { +	if (pushedContext == false) { +	    nsStack.pushContext (); +	    pushedContext = true; +	} + +	// this check is awkward, but the paranoia prevents big trouble +	for (Enumeration e = nsStack.getDeclaredPrefixes (); +		e.hasMoreElements (); +		/* NOP */ ) { +	    String	declared = (String) e.nextElement (); + +	    if (!declared.equals (prefix)) +		continue; +	    if (uri.equals (nsStack.getURI (prefix))) +		return; +	    fatalError ("inconsistent binding for prefix '" + prefix +		+ "' ... " + uri + " (was " + nsStack.getURI (prefix) + ")"); +	} + +	if (!nsStack.declarePrefix (prefix, uri)) +	    fatalError ("illegal prefix declared: " + prefix); +    } + +    private String fixName (String ns, String l, String name, boolean isAttr) +    throws SAXException +    { +	if ("".equals (name) || name == null) { +	    name = l; +	    if ("".equals (name) || name == null) +		fatalError ("empty/null name"); +	} + +	// can we correctly process the name as-is? +	// handles "element scope" attribute names here. +	if (nsStack.processName (name, nsTemp, isAttr) != null +		&& nsTemp [0].equals (ns) +		) { +	    return nsTemp [2]; +	} + +	// nope, gotta modify the name or declare a default mapping +	int	temp; + +	// get rid of any current prefix +	if ((temp = name.indexOf (':')) >= 0) { +	    name = name.substring (temp + 1); + +	    // ... maybe that's enough (use/prefer default namespace) ... +	    if (!isAttr && nsStack.processName (name, nsTemp, false) != null +		    && nsTemp [0].equals (ns) +		    ) { +		return nsTemp [2]; +	    } +	} + +	// must we define and use the default/undefined prefix? 
+	if ("".equals (ns)) { +	    if (isAttr) +		fatalError ("processName bug"); +	    if (attributes.getIndex ("xmlns") != -1) +		fatalError ("need to undefine default NS, but it's bound: " +			+ attributes.getValue ("xmlns")); +	     +	    nsStack.declarePrefix ("", ""); +	    attributes.addAttribute ("", "", "xmlns", "CDATA", ""); +	    return name; +	} + +	// is there at least one non-null prefix we can use? +	for (Enumeration e = nsStack.getDeclaredPrefixes (); +		e.hasMoreElements (); +		/* NOP */) { +	    String prefix = (String) e.nextElement (); +	    String uri = nsStack.getURI (prefix); + +	    if (uri == null || !uri.equals (ns)) +		continue; +	    return prefix + ":" + name; +	} + +	// no such luck.  create a prefix name, declare it, use it. +	for (temp = 0; temp >= 0; temp++) { +	    String	prefix = prefixRoot + temp; + +	    if (nsStack.getURI (prefix) == null) { +		nsStack.declarePrefix (prefix, ns); +		attributes.addAttribute ("", "", "xmlns:" + prefix, +			"CDATA", ns); +		return prefix + ":" + name; +	    } +	} +	fatalError ("too many prefixes genned"); +	// NOTREACHED +	return null; +    } + +    public void startElement ( +	String uri, String localName, +	String qName, Attributes atts +    ) throws SAXException +    { +	if (!pushedContext) +	    nsStack.pushContext (); +	pushedContext = false; + +	// make sure we have all NS declarations handy before we start +	int	length = atts.getLength (); + +	for (int i = 0; i < length; i++) { +	    String	aName = atts.getQName (i); + +	    if (!aName.startsWith ("xmlns")) +		continue; + +	    String	prefix; + +	    if ("xmlns".equals (aName)) +		prefix = ""; +	    else if (aName.indexOf (':') == 5) +		prefix = aName.substring (6); +	    else	// "xmlnsfoo" etc. 
+		continue; +	    startPrefixMapping (prefix, atts.getValue (i)); +	} + +	// put namespace decls at the start of our regenned attlist +	attributes.clear (); +	for (Enumeration e = nsStack.getDeclaredPrefixes (); +		e.hasMoreElements (); +		/* NOP */) { +	    String prefix = (String) e.nextElement (); + +	    attributes.addAttribute ("", "", +		    ("".equals (prefix) +			? "xmlns" +			: "xmlns:" + prefix), +		    "CDATA", +		    nsStack.getURI (prefix)); +	} + +	// name fixups:  element, then attributes. +	// fixName may declare a new prefix or, for the element, +	// redeclare the default (if element name needs it). +	qName = fixName (uri, localName, qName, false); + +	for (int i = 0; i < length; i++) { +	    String	aName = atts.getQName (i); +	    String	aNS = atts.getURI (i); +	    String	aLocal = atts.getLocalName (i); +	    String	aType = atts.getType (i); +	    String	aValue = atts.getValue (i); + +	    if (aName.startsWith ("xmlns")) +		continue; +	    aName = fixName (aNS, aLocal, aName, true); +	    attributes.addAttribute (aNS, aLocal, aName, aType, aValue); +	} + +	elementStack.push (qName); + +	// pass event along, with cleaned-up names and decls. +	super.startElement (uri, localName, qName, attributes); +    } + +    public void endElement (String uri, String localName, String qName) +    throws SAXException +    { +	nsStack.popContext (); +	qName = (String) elementStack.pop (); +	super.endElement (uri, localName, qName); +    } + +    /** +     * This call is not passed to the next consumer in the chain. +     * Prefix declarations and scopes are only exposed in their +     * attribute form. 
+     */ +    public void endPrefixMapping (String prefix) +    throws SAXException +	{ } + +    public void endDocument () throws SAXException +    { +	elementStack.removeAllElements (); +	nsStack.reset (); +	super.endDocument (); +    } +} diff --git a/libjava/gnu/xml/pipeline/PipelineFactory.java b/libjava/gnu/xml/pipeline/PipelineFactory.java new file mode 100644 index 00000000000..5edca73e683 --- /dev/null +++ b/libjava/gnu/xml/pipeline/PipelineFactory.java @@ -0,0 +1,723 @@ +/* PipelineFactory.java --  +   Copyright (C) 1999,2000,2001 Free Software Foundation, Inc. + +This file is part of GNU Classpath. + +GNU Classpath is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 2, or (at your option) +any later version. + +GNU Classpath is distributed in the hope that it will be useful, but +WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU +General Public License for more details. + +You should have received a copy of the GNU General Public License +along with GNU Classpath; see the file COPYING.  If not, write to the +Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA +02111-1307 USA. + +Linking this library statically or dynamically with other modules is +making a combined work based on this library.  Thus, the terms and +conditions of the GNU General Public License cover the whole +combination. + +As a special exception, the copyright holders of this library give you +permission to link this library with independent modules to produce an +executable, regardless of the license terms of these independent +modules, and to copy and distribute the resulting executable under +terms of your choice, provided that you also meet, for each linked +independent module, the terms and conditions of the license of that +module.  
An independent module is a module which is not derived from +or based on this library.  If you modify this library, you may extend +this exception to your version of the library, but you are not +obligated to do so.  If you do not wish to do so, delete this +exception statement from your version. */ + +package gnu.xml.pipeline; + +import java.io.File; +import java.io.FileOutputStream; +import java.io.IOException; +import java.io.OutputStream; +import java.io.OutputStreamWriter; +import java.lang.reflect.Constructor; +import java.util.StringTokenizer; + +import org.xml.sax.*; +import org.xml.sax.ext.*; + + +/** + * This provides static factory methods for creating simple event pipelines. + * These pipelines are specified by strings, suitable for passing on + * command lines or embedding in element attributes.  For example, one way + * to write a pipeline that restores namespace syntax, validates (stopping + * the pipeline on validity errors) and then writes valid data to standard + * output is this: <pre> + *      nsfix | validate | write ( stdout )</pre> + * + * <p> In this syntax, the tokens are always separated by whitespace, and each + * stage of the pipeline may optionally have a parameter (which can be a + * pipeline) in parentheses.  Interior stages are called filters, and the + * rightmost end of a pipeline is called a terminus. + * + * <p> Stages are usually implemented by a single class, which may not be + * able to act as both a filter and a terminus; but any terminus can be + * automatically turned into a filter, through use of a {@link TeeConsumer}. + * The stage identifiers are either class names, or are one of the following + * short identifiers built into this class.  (Most of these identifiers are + * no more than aliases for classes.)  
The built-in identifiers include:</p> +  + <table border="1" cellpadding="3" cellspacing="0"> +    <tr bgcolor="#ccccff" class="TableHeadingColor"> +	<th align="center" width="5%">Stage</th> +	<th align="center" width="9%">Parameter</th> +	<th align="center" width="1%">Terminus</th> +	<th align="center">Description</th> +    </tr> + +    <tr valign="top" align="center"> +	<td><a href="../dom/Consumer.html">dom</a></td> +	<td><em>none</em></td> +	<td> yes </td> +	<td align="left"> Application code can access a DOM Document built +	from the input event stream.  When used as a filter, this buffers +	data up to an <em>endDocument</em> call, and then uses a DOM parser +	to report everything that has been recorded (which can easily be +	less than what was reported to it).  </td> +    </tr> +    <tr valign="top" align="center"> +	<td><a href="NSFilter.html">nsfix</a></td> +	<td><em>none</em></td> +	<td>no</td> +	<td align="left">This stage ensures that the XML element and attribute +	names in its output use namespace prefixes and declarations correctly. +	That is, the names match the "Namespace plus LocalName" naming data +	with which each XML element and attribute is already associated.  </td> +    </tr> +    <tr valign="top" align="center"> +	<td><a href="EventFilter.html">null</a></td> +	<td><em>none</em></td> +	<td>yes</td> +	<td align="left">This stage ignores all input event data.</td> +    </tr> +    <tr valign="top" align="center"> +	<td><a href="CallFilter.html">server</a></td> +	<td><em>required</em><br> server URL </td> +	<td>no</td> +	<td align="left">Sends its input as an XML request to a remote server, +	normally a web application server using the HTTP or HTTPS protocols.
+	The output of this stage is the parsed response from that server.</td> +    </tr> +    <tr valign="top" align="center"> +	<td><a href="TeeConsumer.html">tee</a></td> +	<td><em>required</em><br> first pipeline</td> +	<td>no</td> +	<td align="left">This sends its events down two paths; its parameter +	is a pipeline descriptor for the first path, and the second path +	is the output of this stage.</td> +    </tr> + +    <tr valign="top" align="center"> +	<td><a href="ValidationConsumer.html">validate</a></td> +	<td><em>none</em></td> +	<td>yes</td> +	<td align="left">This checks for validity errors, and reports them +	through its error handler.  The input must include declaration events +	and some lexical events.  </td> +    </tr> +    <tr valign="top" align="center"> +	<td><a href="WellFormednessFilter.html">wf</a></td> +	<td><em>none</em></td> +	<td>yes</td> +	<td align="left"> This class provides some basic "well formedness" +	tests on the input event stream, and reports a fatal error if any +	of them fail.  One example: start/end calls for elements must match. +	No SAX parser is permitted to produce malformed output, but other +	components can easily do so.</td> +    </tr> +    <tr valign="top" align="center"> +	<td>write</td> +	<td><em>required</em><br> "stdout", "stderr", or filename</td> +	<td>yes</td> +	<td align="left"> Writes its input to the specified output, as pretty +	printed XML text encoded using UTF-8.  Input events must be well +	formed and "namespace fixed", else the output won't be XML (or possibly +	namespace) conformant.  The symbolic names represent +	<em>System.out</em> and <em>System.err</em> respectively; names must +	correspond to files which don't yet exist.</td> +    </tr> +    <tr valign="top" align="center"> +	<td>xhtml</td> +	<td><em>required</em><br> "stdout", "stderr", or filename</td> +	<td>yes</td> +	<td align="left"> Like <em>write</em> (above), except that XHTML rules +	are followed.  
The XHTML 1.0 Transitional document type is declared, +	and only ASCII characters are written (for interoperability).  Other +	characters are written as entity or character references; the text is +	pretty printed.</td> +    </tr> +    <tr valign="top" align="center"> +	<td><a href="XIncludeFilter.html">xinclude</a></td> +	<td><em>none</em></td> +	<td>no</td> +	<td align="left">This stage handles XInclude processing. +	This is like entity inclusion, except that the included content +	is declared in-line rather than in the DTD at the beginning of +	a document. +	</td> +    </tr> +    <tr valign="top" align="center"> +	<td><a href="XsltFilter.html">xslt</a></td> +	<td><em>required</em><br> XSLT stylesheet URI</td> +	<td>no</td> +	<td align="left">This stage handles XSLT transformation +	according to a stylesheet. +	The implementation of the transformation may not actually +	stream data, although if such an XSLT engine is in use +	then that can happen. +	</td> +    </tr> + + </table> +  + * <p> Note that {@link EventFilter#bind} can automatically eliminate + * some filters by setting SAX2 parser features appropriately.  This means + * that you can routinely put filters like "nsfix", "validate", or "wf" at the + * front of a pipeline (for components that need inputs conditioned to match + * that level of correctness), and know that it won't actually be used unless + * it's absolutely necessary. + * + * @author David Brownell + */ +public class PipelineFactory +{ +    /** +     * Creates a simple pipeline according to the description string passed in. +     */ +    public static EventConsumer createPipeline (String description) +    throws IOException +    { +	return createPipeline (description, null); +    } + +    /** +     * Extends an existing pipeline by prepending the filter pipeline to the +     * specified consumer.  Some pipelines need more customization than can +     * be done through this simplified syntax.  
 When they are set up with +     * direct API calls, use this method to merge more complex pipeline +     * segments with easily configured ones. +     */ +    public static EventConsumer createPipeline ( +	String		description, +	EventConsumer	next +    ) throws IOException +    { +	// tokens are (for now) what's separated by whitespace; +	// very easy to parse, but IDs never have spaces. + +	StringTokenizer		tokenizer; +	String			tokens []; + +	tokenizer = new StringTokenizer (description); +	tokens = new String [tokenizer.countTokens ()]; +	for (int i = 0; i < tokens.length; i++) +	    tokens [i] = tokenizer.nextToken (); + +	PipelineFactory		factory = new PipelineFactory (); +	Pipeline		pipeline = factory.parsePipeline (tokens, next); + +	return pipeline.createPipeline (); +    } + + +    private PipelineFactory () { /* NYET */ } + + +    /** +     * Extends an existing pipeline by prepending a pre-tokenized filter +     * pipeline to the specified consumer.  Tokens are class names (or the +     * predefined aliases), left and right parentheses, and the vertical bar. +     */ +    public static EventConsumer createPipeline ( +	String		tokens [], +	EventConsumer	next +    ) throws IOException +    { +	PipelineFactory		factory = new PipelineFactory (); +	Pipeline		pipeline = factory.parsePipeline (tokens, next); + +	return pipeline.createPipeline (); +    } + + +    private String		tokens []; +    private int			index; + +    private Pipeline parsePipeline (String toks [], EventConsumer next) +    { +	tokens = toks; +	index = 0; +	 +	Pipeline retval = parsePipeline (next); + +	if (index != toks.length) +	    throw new ArrayIndexOutOfBoundsException ( +		    "extra token: " + tokens [index]); +	return retval; +    } + +    // pipeline  ::= stage | stage '|' pipeline +    private Pipeline parsePipeline (EventConsumer next) +    { +	Pipeline	retval = new Pipeline (parseStage ()); + +	// minimal pipelines:  "stage" and "...
| id" +	if (index > (tokens.length - 2) +		|| !"|".equals (tokens [index]) +		) { +	    retval.next = next; +	    return retval; +	} +	index++; +	retval.rest = parsePipeline (next); +	return retval; +    } + +    // stage     ::= id    | id '(' pipeline ')' +    private Stage parseStage () +    { +	Stage		retval = new Stage (tokens [index++]); + +	// minimal stages:  "id" and "id ( id )" +	if (index > (tokens.length - 2) +		|| !"(".equals (tokens [index]) /*)*/ +		) +	    return retval; +	 +	index++; +	retval.param = parsePipeline (null); +	if (index >= tokens.length) +	    throw new ArrayIndexOutOfBoundsException ( +		    "missing right paren"); +	if (/*(*/ !")".equals (tokens [index++])) +	    throw new ArrayIndexOutOfBoundsException ( +		    "required right paren, not: " + tokens [index - 1]); +	return retval; +    } + + +    // +    // these classes obey the conventions for constructors, so they're +    // only built in to this table of shortnames +    // +    //	- filter (one or two types of arglist) +    //	   * last constructor is 'next' element +    //	   * optional (first) string parameter +    // +    //	- terminus (one or two types of arglist) +    //	   * optional (only) string parameter +    // +    // terminus stages are transformed into filters if needed, by +    // creating a "tee".  filter stages aren't turned to terminus +    // stages though; either eliminate such stages, or add some +    // terminus explicitly.
+    // +    private static final String builtinStages [][] = { +	{ "dom",	"gnu.xml.dom.Consumer" }, +	{ "nsfix",	"gnu.xml.pipeline.NSFilter" }, +	{ "null",	"gnu.xml.pipeline.EventFilter" }, +	{ "server",	"gnu.xml.pipeline.CallFilter" }, +	{ "tee",	"gnu.xml.pipeline.TeeConsumer" }, +	{ "validate",	"gnu.xml.pipeline.ValidationConsumer" }, +	{ "wf",		"gnu.xml.pipeline.WellFormednessFilter" }, +	{ "xinclude",	"gnu.xml.pipeline.XIncludeFilter" }, +	{ "xslt",	"gnu.xml.pipeline.XsltFilter" }, + +// XXX want:  option for validate, to preload external part of a DTD + +	    //	xhtml, write ... nyet generic-ready +    }; + +    private static class Stage +    { +	String		id; +	Pipeline	param; + +	Stage (String name) +	    {  id = name; } + +	public String toString () +	{ +	    if (param == null) +		return id; +	    return id + " ( " + param + " )"; +	} + +	private void fail (String message) +	throws IOException +	{ +	    throw new IOException ("in '" + id +		    + "' stage of pipeline, " + message); +	} + +	EventConsumer createStage (EventConsumer next) +	throws IOException +	{ +	    String	 name = id; + +	    // most builtins are just class aliases +	    for (int i = 0; i < builtinStages.length; i++) { +		if (id.equals (builtinStages [i][0])) { +		    name = builtinStages [i][1]; +		    break; +		} +	    } + +	    // Save output as XML or XHTML text +	    if ("write".equals (name) || "xhtml".equals (name)) { +		String		filename; +		boolean		isXhtml = "xhtml".equals (name); +		OutputStream	out = null; +		TextConsumer	consumer; + +		if (param == null) +		    fail ("parameter is required"); + +		filename = param.toString (); +		if ("stdout".equals (filename)) +		    out = System.out; +		else if ("stderr".equals (filename)) +		    out = System.err; +		else { +		    File f = new File (filename); + +/* +		    if (!f.isAbsolute ()) +			fail ("require absolute file paths"); + */ +		    if (f.exists ()) +			fail ("file already exists: " + f.getName ()); + +// XXX this races against 
the existence test +		    out = new FileOutputStream (f); +		} +		 +		if (!isXhtml) +		    consumer = new TextConsumer (out); +		else +		    consumer = new TextConsumer ( +			new OutputStreamWriter (out, "8859_1"), +			true); +		 +		consumer.setPrettyPrinting (true); +		if (next == null) +		    return consumer; +		return new TeeConsumer (consumer, next); + +	    } else { +		// +		// Here go all the builtins that are just aliases for +		// classes, and all stage IDs that started out as such +		// class names.  The following logic relies on several +		// documented conventions for constructor invocation. +		// +		String		msg = null; + +		try { +		    Class	klass = Class.forName (name); +		    Class	argTypes [] = null; +		    Constructor	constructor = null; +		    boolean	filter = false; +		    Object	params [] = null; +		    Object	obj = null; + +		    // do we need a filter stage? +		    if (next != null) { +			// "next" consumer is always passed, with +			// or without the optional string param +			if (param == null) { +			    argTypes = new Class [1]; +			    argTypes [0] = EventConsumer.class; + +			    params = new Object [1]; +			    params [0] = next; + +			    msg = "no-param filter"; +			} else { +			    argTypes = new Class [2]; +			    argTypes [0] = String.class; +			    argTypes [1] = EventConsumer.class; + +			    params = new Object [2]; +			    params [0] = param.toString (); +			    params [1] = next; + +			    msg = "one-param filter"; +			} + + +			try { +			    constructor = klass.getConstructor (argTypes); +			} catch (NoSuchMethodException e) { +			    // try creating a filter from a +			    // terminus and a tee +			    filter = true; +			    msg += " built from "; +			} +		    } + +		    // build from a terminus stage, with or +		    // without the optional string param +		    if (constructor == null) { +			String	tmp; + +			if (param == null) { +			    argTypes = new Class [0]; +			    params = new Object [0]; + +			    tmp = "no-param 
terminus"; +			} else { +			    argTypes = new Class [1]; +			    argTypes [0] = String.class; + +			    params = new Object [1]; +			    params [0] = param.toString (); + +			    tmp = "one-param terminus"; +			} +			if (msg == null) +			    msg = tmp; +			else +			    msg += tmp; +			constructor = klass.getConstructor (argTypes); +			    // NOT creating terminus by dead-ending +			    // filters ... users should think about +			    // that one, something's likely wrong +		    } +		     +		    obj = constructor.newInstance (params); + +		    // return EventConsumers directly, perhaps after +		    // turning them into a filter +		    if (obj instanceof EventConsumer) { +			if (filter) +			    return new TeeConsumer ((EventConsumer) obj, next); +			return (EventConsumer) obj; +		    } +		     +		    // if it's not a handler, it's an error +		    // we can wrap handlers in a filter +		    EventFilter		retval = new EventFilter (); +		    boolean		updated = false; + +		    if (obj instanceof ContentHandler) { +			retval.setContentHandler ((ContentHandler) obj); +			updated = true; +		    } +		    if (obj instanceof DTDHandler) { +			retval.setDTDHandler ((DTDHandler) obj); +			updated = true; +		    } +		    if (obj instanceof LexicalHandler) { +			retval.setProperty ( +			    EventFilter.PROPERTY_URI + "lexical-handler", +			    obj); +			updated = true; +		    } +		    if (obj instanceof DeclHandler) { +			retval.setProperty ( +			    EventFilter.PROPERTY_URI + "declaration-handler", +			    obj); +			updated = true; +		    } + +		    if (!updated) +			fail ("class is neither Consumer nor Handler"); +		     +		    if (filter) +			return new TeeConsumer (retval, next); +		    return retval; + +		} catch (IOException e) { +		    throw e; + +		} catch (NoSuchMethodException e) { +		    fail (name + " constructor missing -- " + msg); + +		} catch (ClassNotFoundException e) { +		    fail (name + " class not found"); + +		} catch (Exception e) { +		    // e.printStackTrace 
(); +		    fail ("stage not available: " + e.getMessage ()); +		} +	    } +	    // NOTREACHED +	    return null; +	} +    } + +    private static class Pipeline +    { +	Stage		stage; + +	// rest may be null +	Pipeline	rest; +	EventConsumer	next; + +	Pipeline (Stage s) +	    { stage = s; } + +	public String toString () +	{ +	    if (rest == null && next == null) +		return stage.toString (); +	    if (rest != null) +		return stage + " | " + rest; +	    throw new IllegalArgumentException ("next"); +	} + +	EventConsumer createPipeline () +	throws IOException +	{ +	    if (next == null) { +		if (rest == null) +		    next = stage.createStage (null); +		else +		    next = stage.createStage (rest.createPipeline ()); +	    } +	    return next; +	} +    } + +/* +    public static void main (String argv []) +    { +	try { +	    // three basic terminus cases +	    createPipeline ("null"); +	    createPipeline ("validate"); +	    createPipeline ("write ( stdout )"); + +	    // four basic filters +	    createPipeline ("nsfix | write ( stderr )"); +	    createPipeline ("wf | null"); +	    createPipeline ("null | null"); +	    createPipeline ( +"call ( http://www.example.com/services/xml-1a ) | xhtml ( stdout )"); + +	    // tee junctions +	    createPipeline ("tee ( validate ) | write ( stdout )"); +	    createPipeline ("tee ( nsfix | write ( stdout ) ) | validate"); + +	    // longer pipeline +	    createPipeline ("nsfix | tee ( validate ) | write ( stdout )"); +	    createPipeline ( +		"null | wf | nsfix | tee ( validate ) | write ( stdout )"); + +	    // try some parsing error cases +	    try { +		createPipeline ("null (");		// extra token '(' +		System.err.println ("** didn't report error"); +	    } catch (Exception e) { +		System.err.println ("== err: " + e.getMessage ()); } + +	    try { +		createPipeline ("nsfix |");		// extra token '|' +		System.err.println ("** didn't report error"); +	    } catch (Exception e) { +		System.err.println ("== err: " + e.getMessage ()); } + 
+	    try { +		createPipeline ("xhtml ( foo");		// missing right paren +		System.err.println ("** didn't report error"); +	    } catch (Exception e) { +		System.err.println ("== err: " + e.getMessage ()); } + +	    try { +		createPipeline ("xhtml ( foo bar");	// required right paren +		System.err.println ("** didn't report error"); +	    } catch (Exception e) { +		System.err.println ("== err: " + e.getMessage ()); } + +	    try { +		createPipeline ("tee ( nsfix | validate");// missing right paren +		System.err.println ("** didn't report error"); +	    } catch (Exception e) { +		System.err.println ("== err: " + e.getMessage ()); } + +	    // try some construction error cases + +	    try { +		createPipeline ("call");		// missing param +		System.err.println ("** didn't report error"); +	    } catch (Exception e) { +		System.err.println ("== err: " + e.getMessage ()); } +	    try { +		createPipeline ("call ( foobar )");	// broken param +		System.err.println ("** didn't report error"); +	    } catch (Exception e) { +		System.err.println ("== err: " + e.getMessage ()); } +	    try { +		createPipeline ("nsfix ( foobar )");	// illegal param +		System.err.println ("** didn't report error"); +	    } catch (Exception e) { +		System.err.println ("== err: " + e.getMessage ()); } +	    try { +		createPipeline ("null ( foobar )");	// illegal param +		System.err.println ("** didn't report error"); +	    } catch (Exception e) { +		System.err.println ("== err: " + e.getMessage ()); } +	    try { +		createPipeline ("wf ( foobar )");	// illegal param +		System.err.println ("** didn't report error"); +	    } catch (Exception e) { +		System.err.println ("== err: " + e.getMessage ()); } +	    try { +		createPipeline ("xhtml ( foobar.html )"); +		new File ("foobar.html").delete (); +		// now supported +	    } catch (Exception e) { +		System.err.println ("** err: " + e.getMessage ()); } +	    try { +		createPipeline ("xhtml");		// missing param +		System.err.println ("** didn't report 
error"); +	    } catch (Exception e) { +		System.err.println ("== err: " + e.getMessage ()); } +	    try { +		createPipeline ("write ( stdout ) | null");	// nonterminal +		System.err.println ("** didn't report error"); +	    } catch (Exception e) { +		System.err.println ("== err: " + e.getMessage ()); } +	    try { +		createPipeline ("validate | null"); +		// now supported +	    } catch (Exception e) { +		System.err.println ("** err: " + e.getMessage ()); } +	    try { +		createPipeline ("validate ( foo )");	// illegal param +		System.err.println ("** didn't report error"); +	    } catch (Exception e) { +		System.err.println ("== err: " + e.getMessage ()); } +	    try { +		createPipeline ("tee");			// missing param +		System.err.println ("** didn't report error"); +	    } catch (Exception e) { +		System.err.println ("== err: " + e.getMessage ()); } +	    try { +		    // only builtins so far +		createPipeline ("com.example.xml.FilterClass"); +		System.err.println ("** didn't report error"); +	    } catch (Exception e) { +		System.err.println ("== err: " + e.getMessage ()); } + +	} catch (Exception e) { +	    e.printStackTrace (); +	} +    } +/**/ + +} diff --git a/libjava/gnu/xml/pipeline/TeeConsumer.java b/libjava/gnu/xml/pipeline/TeeConsumer.java new file mode 100644 index 00000000000..6d3227eda11 --- /dev/null +++ b/libjava/gnu/xml/pipeline/TeeConsumer.java @@ -0,0 +1,413 @@ +/* TeeConsumer.java --  +   Copyright (C) 1999,2000,2001 Free Software Foundation, Inc. + +This file is part of GNU Classpath. + +GNU Classpath is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 2, or (at your option) +any later version. + +GNU Classpath is distributed in the hope that it will be useful, but +WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  
 See the GNU +General Public License for more details. + +You should have received a copy of the GNU General Public License +along with GNU Classpath; see the file COPYING.  If not, write to the +Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA +02111-1307 USA. + +Linking this library statically or dynamically with other modules is +making a combined work based on this library.  Thus, the terms and +conditions of the GNU General Public License cover the whole +combination. + +As a special exception, the copyright holders of this library give you +permission to link this library with independent modules to produce an +executable, regardless of the license terms of these independent +modules, and to copy and distribute the resulting executable under +terms of your choice, provided that you also meet, for each linked +independent module, the terms and conditions of the license of that +module.  An independent module is a module which is not derived from +or based on this library.  If you modify this library, you may extend +this exception to your version of the library, but you are not +obligated to do so.  If you do not wish to do so, delete this +exception statement from your version. */ + +package gnu.xml.pipeline; + +import java.io.IOException; +import org.xml.sax.*; +import org.xml.sax.ext.*; +//import gnu.xml.util; + + +/** + * Fans its events out to two other consumers, a "tee" filter stage in an + * event pipeline.  Networks can be assembled with multiple output points. + * + * <p> Error handling should be simple if you remember that exceptions + * you throw will cancel later stages in that callback's pipeline, and + * generally the producer will stop if it sees such an exception.  You + * may want to protect your pipeline against such backflows, making a + * kind of reverse filter (or valve?) so that certain exceptions thrown by + * your pipeline will be caught and handled before the producer sees them.
+ * Just use a "try/catch" block, remembering that really important + * cleanup tasks should be in "finally" clauses. + * + * <p> That issue isn't unique to "tee" consumers, but tee consumers have + * the additional twist that exceptions thrown by the first consumer + * will cause the second consumer not to see the callback (except for + * the endDocument callback, which signals state cleanup). + * + * @author David Brownell + */ +final public class TeeConsumer +	implements EventConsumer, +		ContentHandler, DTDHandler, +		LexicalHandler,DeclHandler +{ +    private EventConsumer	first, rest; + +    // cached to minimize time overhead +    private ContentHandler	docFirst, docRest; +    private DeclHandler		declFirst, declRest; +    private LexicalHandler	lexFirst, lexRest; + + +    /** +     * Constructs a consumer which sends all its events to the first +     * consumer, and then the second one.  If the first consumer throws +     * an exception, the second one will not see the event which +     * caused that exception to be reported. +     * +     * @param car The first consumer to get the events +     * @param cdr The second consumer to get the events +     */ +    public TeeConsumer (EventConsumer car, EventConsumer cdr) +    { +	if (car == null || cdr == null) +	    throw new NullPointerException (); +	first = car; +	rest = cdr; + +	// +	// Cache the handlers.
+	// +	docFirst = first.getContentHandler (); +	docRest = rest.getContentHandler (); +	// DTD handler isn't cached (rarely needed) + +	try { +	    declFirst = null; +	    declFirst = (DeclHandler) first.getProperty ( +			EventFilter.DECL_HANDLER); +	} catch (SAXException e) {} +	try { +	    declRest = null; +	    declRest = (DeclHandler) rest.getProperty ( +			EventFilter.DECL_HANDLER); +	} catch (SAXException e) {} + +	try { +	    lexFirst = null; +	    lexFirst = (LexicalHandler) first.getProperty ( +			EventFilter.LEXICAL_HANDLER); +	} catch (SAXException e) {} +	try { +	    lexRest = null; +	    lexRest = (LexicalHandler) rest.getProperty ( +			EventFilter.LEXICAL_HANDLER); +	} catch (SAXException e) {} +    } + +/* FIXME +    /** +     * Constructs a pipeline, and is otherwise a shorthand for the +     * two-consumer constructor for this class. +     * +     * @param first Description of the first pipeline to get events, +     *	which will be passed to {@link PipelineFactory#createPipeline} +     * @param rest The second pipeline to get the events +     * / +	// constructor used by PipelineFactory +    public TeeConsumer (String first, EventConsumer rest) +    throws IOException +    { +	this (PipelineFactory.createPipeline (first), rest); +    } +*/ + +    /** Returns the first pipeline to get event calls. */ +    public EventConsumer getFirst () +	{ return first; } + +    /** Returns the second pipeline to get event calls. */ +    public EventConsumer getRest () +	{ return rest; } + +    /** Returns the content handler being used. */ +    final public ContentHandler getContentHandler () +    { +	if (docRest == null) +	    return docFirst; +	if (docFirst == null) +	    return docRest; +	return this; +    } + +    /** Returns the dtd handler being used. 
*/ +    final public DTDHandler getDTDHandler () +    { +	// not cached (hardly used) +	if (rest.getDTDHandler () == null) +	    return first.getDTDHandler (); +	if (first.getDTDHandler () == null) +	    return rest.getDTDHandler (); +	return this; +    } + +    /** Returns the declaration or lexical handler being used. */ +    final public Object getProperty (String id) +    throws SAXNotRecognizedException +    { +	// +	// in degenerate cases, we have no work to do. +	// +	Object	firstProp = null, restProp = null; + +	try { firstProp = first.getProperty (id); } +	catch (SAXNotRecognizedException e) { /* ignore */ } +	try { restProp = rest.getProperty (id); } +	catch (SAXNotRecognizedException e) { /* ignore */ } + +	if (restProp == null) +	    return firstProp; +	if (firstProp == null) +	    return restProp; + +	// +	// we've got work to do; handle two builtin cases. +	// +	if (EventFilter.DECL_HANDLER.equals (id)) +	    return this; +	if (EventFilter.LEXICAL_HANDLER.equals (id)) +	    return this; + +	// +	// non-degenerate, handled by both consumers, but we don't know +	// how to handle this. +	// +	throw new SAXNotRecognizedException ("can't tee: " + id); +    } + +    /** +     * Provides the error handler to both subsequent nodes of +     * this filter stage. 
+     */ +    public void setErrorHandler (ErrorHandler handler) +    { +	first.setErrorHandler (handler); +	rest.setErrorHandler (handler); +    } + + +    // +    // ContentHandler +    // +    public void setDocumentLocator (Locator locator) +    { +	// this call is not made by all parsers +	docFirst.setDocumentLocator (locator); +	docRest.setDocumentLocator (locator); +    } + +    public void startDocument () +    throws SAXException +    { +	docFirst.startDocument (); +	docRest.startDocument (); +    } + +    public void endDocument () +    throws SAXException +    { +	try { +	    docFirst.endDocument (); +	} finally { +	    docRest.endDocument (); +	} +    } + +    public void startPrefixMapping (String prefix, String uri) +    throws SAXException +    { +	docFirst.startPrefixMapping (prefix, uri); +	docRest.startPrefixMapping (prefix, uri); +    } + +    public void endPrefixMapping (String prefix) +    throws SAXException +    { +	docFirst.endPrefixMapping (prefix); +	docRest.endPrefixMapping (prefix); +    } + +    public void skippedEntity (String name) +    throws SAXException +    { +	docFirst.skippedEntity (name); +	docRest.skippedEntity (name); +    } + +    public void startElement (String uri, String localName, +	    String qName, Attributes atts) +    throws SAXException +    { +	docFirst.startElement (uri, localName, qName, atts); +	docRest.startElement (uri, localName, qName, atts); +    } + +    public void endElement (String uri, String localName, String qName) +    throws SAXException +    { +	docFirst.endElement (uri, localName, qName); +	docRest.endElement (uri, localName, qName); +    } + +    public void processingInstruction (String target, String data) +    throws SAXException +    { +	docFirst.processingInstruction (target, data); +	docRest.processingInstruction (target, data); +    } + +    public void characters (char ch [], int start, int length) +    throws SAXException +    { +	docFirst.characters (ch, start, length); +	
docRest.characters (ch, start, length); +    } + +    public void ignorableWhitespace (char ch [], int start, int length) +    throws SAXException +    { +	docFirst.ignorableWhitespace (ch, start, length); +	docRest.ignorableWhitespace (ch, start, length); +    } + + +    // +    // DTDHandler +    // +    public void notationDecl (String name, String publicId, String systemId) +    throws SAXException +    { +	DTDHandler	l1 = first.getDTDHandler (); +	DTDHandler	l2 = rest.getDTDHandler (); + +	l1.notationDecl (name, publicId, systemId); +	l2.notationDecl (name, publicId, systemId); +    } + +    public void unparsedEntityDecl (String name, +	    String publicId, String systemId, +	    String notationName +    ) throws SAXException +    { +	DTDHandler	l1 = first.getDTDHandler (); +	DTDHandler	l2 = rest.getDTDHandler (); + +	l1.unparsedEntityDecl (name, publicId, systemId, notationName); +	l2.unparsedEntityDecl (name, publicId, systemId, notationName); +    } + + +    // +    // DeclHandler +    // +    public void attributeDecl (String eName, String aName, +	String type, +	String mode, String value) +    throws SAXException +    { +	declFirst.attributeDecl (eName, aName, type, mode, value); +	declRest.attributeDecl (eName, aName, type, mode, value); +    } + +    public void elementDecl (String name, String model) +    throws SAXException +    { +	declFirst.elementDecl (name, model); +	declRest.elementDecl (name, model); +    } + +    public void externalEntityDecl (String name, +	String publicId, String systemId) +    throws SAXException +    { +	declFirst.externalEntityDecl (name, publicId, systemId); +	declRest.externalEntityDecl (name, publicId, systemId); +    } + +    public void internalEntityDecl (String name, String value) +    throws SAXException +    { +	declFirst.internalEntityDecl (name, value); +	declRest.internalEntityDecl (name, value); +    } + + +    // +    // LexicalHandler +    // +    public void comment (char ch [], int start, int length) +   
 throws SAXException +    { +	lexFirst.comment (ch, start, length); +	lexRest.comment (ch, start, length); +    } +     +    public void startCDATA () +    throws SAXException +    { +	lexFirst.startCDATA (); +	lexRest.startCDATA (); +    } +     +    public void endCDATA () +    throws SAXException +    { +	lexFirst.endCDATA (); +	lexRest.endCDATA (); +    } +     +    public void startEntity (String name) +    throws SAXException +    { +	lexFirst.startEntity (name); +	lexRest.startEntity (name); +    } +     +    public void endEntity (String name) +    throws SAXException +    { +	lexFirst.endEntity (name); +	lexRest.endEntity (name); +    } +     +    public void startDTD (String name, String publicId, String systemId) +    throws SAXException +    { +	lexFirst.startDTD (name, publicId, systemId); +	lexRest.startDTD (name, publicId, systemId); +    } +     +    public void endDTD () +    throws SAXException +    { +	lexFirst.endDTD (); +	lexRest.endDTD (); +    } +} diff --git a/libjava/gnu/xml/pipeline/TextConsumer.java b/libjava/gnu/xml/pipeline/TextConsumer.java new file mode 100644 index 00000000000..1039b3b8cf0 --- /dev/null +++ b/libjava/gnu/xml/pipeline/TextConsumer.java @@ -0,0 +1,117 @@ +/* TextConsumer.java --  +   Copyright (C) 1999,2000,2001 Free Software Foundation, Inc. + +This file is part of GNU Classpath. + +GNU Classpath is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 2, or (at your option) +any later version. + +GNU Classpath is distributed in the hope that it will be useful, but +WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU +General Public License for more details. + +You should have received a copy of the GNU General Public License +along with GNU Classpath; see the file COPYING.  
If not, write to the +Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA +02111-1307 USA. + +Linking this library statically or dynamically with other modules is +making a combined work based on this library.  Thus, the terms and +conditions of the GNU General Public License cover the whole +combination. + +As a special exception, the copyright holders of this library give you +permission to link this library with independent modules to produce an +executable, regardless of the license terms of these independent +modules, and to copy and distribute the resulting executable under +terms of your choice, provided that you also meet, for each linked +independent module, the terms and conditions of the license of that +module.  An independent module is a module which is not derived from +or based on this library.  If you modify this library, you may extend +this exception to your version of the library, but you are not +obligated to do so.  If you do not wish to do so, delete this +exception statement from your version. */ + +package gnu.xml.pipeline; + +import java.io.*; + +import org.xml.sax.*; + +import gnu.xml.util.XMLWriter; + + +/** + * Terminates a pipeline, consuming events to print them as well formed + * XML (or XHTML) text. + * + * <p> Input must be well formed, and must include XML names (e.g. the + * prefixes and prefix declarations must be present), or the output of + * this class is undefined. + * + * @see NSFilter + * @see WellFormednessFilter + * + * @author David Brownell + */ +public class TextConsumer extends XMLWriter implements EventConsumer +{ +    /** +     * Constructs an event consumer which echoes its input as text, +     * optionally adhering to some basic XHTML formatting options +     * which increase interoperability with old (v3) browsers. 
+     * +     * <p> For the best interoperability, when writing as XHTML only +     * ASCII characters are emitted; other characters are turned to +     * entity or character references as needed, and no XML declaration +     * is provided in the document. +     */ +    public TextConsumer (Writer w, boolean isXhtml) +    throws IOException +    { +	super (w, isXhtml ? "US-ASCII" : null); +	setXhtml (isXhtml); +    } + +    /** +     * Constructs a consumer that writes its input as XML text. +     * XHTML rules are not followed. +     */ +    public TextConsumer (Writer w) +    throws IOException +    { +	this (w, false); +    } +	 +    /** +     * Constructs a consumer that writes its input as XML text, +     * encoded in UTF-8.  XHTML rules are not followed. +     */ +    public TextConsumer (OutputStream out) +    throws IOException +    { +	this (new OutputStreamWriter (out, "UTF8"), false); +    } + +    /** <b>EventConsumer</b> Returns the document handler being used. */ +    public ContentHandler getContentHandler () +	{ return this; } + +    /** <b>EventConsumer</b> Returns the dtd handler being used. */ +    public DTDHandler getDTDHandler () +	{ return this; } + +    /** <b>XMLReader</b>Retrieves a property (lexical and decl handlers) */ +    public Object getProperty (String propertyId) +    throws SAXNotRecognizedException +    { +	if (EventFilter.LEXICAL_HANDLER.equals (propertyId)) +	    return this; +	if (EventFilter.DECL_HANDLER.equals (propertyId)) +	    return this; +	throw new SAXNotRecognizedException (propertyId); +    } +} diff --git a/libjava/gnu/xml/pipeline/ValidationConsumer.java b/libjava/gnu/xml/pipeline/ValidationConsumer.java new file mode 100644 index 00000000000..e73c0ffe21c --- /dev/null +++ b/libjava/gnu/xml/pipeline/ValidationConsumer.java @@ -0,0 +1,1922 @@ +/* ValidationConsumer.java --  +   Copyright (C) 1999,2000,2001 Free Software Foundation, Inc. + +This file is part of GNU Classpath. 
+ +GNU Classpath is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 2, or (at your option) +any later version. + +GNU Classpath is distributed in the hope that it will be useful, but +WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU +General Public License for more details. + +You should have received a copy of the GNU General Public License +along with GNU Classpath; see the file COPYING.  If not, write to the +Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA +02111-1307 USA. + +Linking this library statically or dynamically with other modules is +making a combined work based on this library.  Thus, the terms and +conditions of the GNU General Public License cover the whole +combination. + +As a special exception, the copyright holders of this library give you +permission to link this library with independent modules to produce an +executable, regardless of the license terms of these independent +modules, and to copy and distribute the resulting executable under +terms of your choice, provided that you also meet, for each linked +independent module, the terms and conditions of the license of that +module.  An independent module is a module which is not derived from +or based on this library.  If you modify this library, you may extend +this exception to your version of the library, but you are not +obligated to do so.  If you do not wish to do so, delete this +exception statement from your version. 
 */ + +package gnu.xml.pipeline; + +import java.io.*; + +import java.util.EmptyStackException; +import java.util.Enumeration; +import java.util.Hashtable; +import java.util.Stack; +import java.util.StringTokenizer; +import java.util.Vector; + +import org.xml.sax.*; +import org.xml.sax.ext.*; +import org.xml.sax.helpers.XMLReaderFactory; + + +/** + * This class checks SAX2 events to report validity errors; it works as + * both a filter and a terminus on an event pipeline.  It relies on the + * producer of SAX events to:  </p> <ol> + * + *	<li> Conform to the specification of a non-validating XML parser that + *	reads all external entities, reported using SAX2 events. </li> + * + *	<li> Report ignorable whitespace as such (through the ContentHandler + *	interface).  This is, strictly speaking, optional for nonvalidating + *	XML processors.  </li> + * + *	<li> Make SAX2 DeclHandler callbacks, with default + *	attribute values already normalized (and without "<").</li> + * + *	<li> Make SAX2 LexicalHandler startDTD() and endDTD () + *	callbacks. </li> + * + *	<li> Act as if the <em>(URI)/namespace-prefixes</em> property were + *	set to true, by providing XML 1.0 names and all <code>xmlns*</code> + *	attributes (rather than omitting either or both). </li> + * + *	</ol> + * + * <p> At this writing, the major SAX2 parsers (such as Ælfred2, + * Crimson, and Xerces) meet these requirements, and this validation + * module is used by the optional Ælfred2 validation support. + * </p> + * + * <p> Note that because this is a layered validator, it has to duplicate some + * work that the parser is doing; there are also other costs to layering. + * However, <em>because of layering it doesn't need a parser</em> in order + * to work!
You can use it with anything that generates SAX events, such + * as an application component that wants to detect invalid content in + * a changed area without validating an entire document, or which wants to + * ensure that it doesn't write invalid data to a communications partner.</p> + * + * <p> Also, note that because this is a layered validator, the line numbers + * reported for some errors may seem strange.  For example, if an element does + * not permit character content, the validator + * will use the locator provided to it. + * That might reflect the last character of a <em>characters</em> event + * callback, rather than the first non-whitespace character. </p> + * + * <hr /> + * + * <!-- + * <p> Of interest is the fact that unlike most currently known XML validators, + * this one can report some cases of non-determinism in element content models. + * It is a compile-time option, enabled by default.  This will only report + * such XML errors if they relate to content actually appearing in a document; + * content models aren't aggressively scanned for non-deterministic structure. + * Documents which trigger such non-deterministic transitions may be handled + * differently by different validating parsers, without losing conformance + * to the XML specification. </p> + * --> + * + * <p> Current limitations of the validation performed are in roughly three + * categories.  </p> + * + * <p> The first category represents constraints which demand violations + * of software layering:  exposing lexical details, one of the first things + * that <em>application</em> programming interfaces (APIs) hide.  These + * invariably relate to XML entity handling, and to historical oddities + * of the XML validation semantics.  Curiously, + * recent (Autumn 1999) conformance testing showed that these constraints are + * among those handled worst by existing XML validating parsers.  
Arguments + * have been made that each of these VCs should be turned into WFCs (most + * of them) or discarded (popular for the standalone declaration); in short, + * that these are bugs in the XML specification (not all via SGML): </p><ul> + * + *	<li> The <em>Proper Declaration/PE Nesting</em> and + *	<em>Proper Group/PE Nesting</em> VCs can't be tested because they + *	require access to particularly low level lexical level information. + *	In essence, the reason XML isn't a simple thing to parse is that + *	it's not a context free grammar, and these constraints elevate that + *	SGML-derived context sensitivity to the level of a semantic rule. + * + *	<li> The <em>Standalone Document Declaration</em> VC can't be + *	tested.  This is for two reasons.  First, this flag isn't made + *	available through SAX2.  Second, it also requires breaking that + *	lexical layering boundary.  (If you ever wondered why classes + *	in compiler construction or language design barely mention the + *	existence of context-sensitive grammars, it's because of messy + *	issues like these.) + * + *	<li> The <em>Entity Declared</em> VC can't be tested, because it + *	also requires breaking that lexical layering boundary!  There's also + *	another issue: the VC wording (and seemingly intent) is ambiguous. + *	(This is still true in the "Second edition" XML spec.) + *	Since there is a WFC of the same name, everyone's life would be + *	easier if references to undeclared parsed entities were always well + *	formedness errors, regardless of whether they're parameter entities + *	or not.  (Note that nonvalidating parsers are not required + *	to report all such well formedness errors if they don't read external + *	parameter entities, although currently most XML parsers read them + *	in an attempt to avoid problems from inconsistent parser behavior.) 
+ * + *	</ul> + * + * <p> The second category of limitations on this validation represents + * constraints associated with information that is not guaranteed to be + * available (or in one case, <em>is guaranteed not to be available</em>), + * through the SAX2 API: </p><ul> + * + *	<li> The <em>Unique Element Type Declaration</em> VC may not be + *	reportable, if the underlying parser happens not to expose + *	multiple declarations.   (Ælfred2 reports these validity + *	errors directly.)</li> + * + *	<li> Similarly, the <em>Unique Notation Name</em> VC, added in the + *	14-January-2000 XML spec errata to restrict typing models used by + *	elements, may not be reportable.  (Ælfred reports these + *	validity errors directly.) </li> + * + *	</ul> + * + * <p> A third category relates to ease of implementation.  (Think of this + * as "bugs".)  The most notable issue here is character handling.  Rather + * than attempting to implement the voluminous character tables in the XML + * specification (Appendix B), Unicode rules are used directly from + * the java.lang.Character class.  Recent JVMs have begun to diverge from + * the original specification for that class (Unicode 2.0), meaning that + * different JVMs may handle that aspect of conformance differently. + * </p> + * + * <p> Note that for some of the validity errors that SAX2 does not + * expose, a nonvalidating parser is permitted (by the XML specification) + * to report validity errors.  When used with a parser that does so for + * the validity constraints mentioned above (or any other SAX2 event + * stream producer that does the same thing), overall conformance is + * substantially improved. + * + * @see gnu.xml.aelfred2.SAXDriver + * @see gnu.xml.aelfred2.XmlReader + * + * @author David Brownell + */ +public final class ValidationConsumer extends EventFilter +{ +    // report error if we happen to notice a non-deterministic choice?
+    // we won't report buggy content models; just buggy instances +    private static final boolean	warnNonDeterministic = false; + +    // for tracking active content models +    private String		rootName; +    private Stack		contentStack = new Stack (); + +    // flags for "saved DTD" processing +    private boolean		disableDeclarations; +    private boolean		disableReset; + +    // +    // most VCs get tested when we see element start tags.  the per-element +    // info (including attributes) recorded here duplicates that found inside +    // many nonvalidating parsers, hence dual lookups etc ... that's why a +    // layered validator isn't going to be as fast as a non-layered one. +    // + +    // key = element name; value = ElementInfo +    private Hashtable		elements = new Hashtable (); + +    // some VCs relate to ID/IDREF/IDREFS attributes +    // key = id; value = boolean true (defd) or false (refd) +    private Hashtable		ids = new Hashtable (); + +    // we just record declared notation and unparsed entity names. +    // the implementation here is simple/slow; these features +    // are seldom used, one hopes they'll wither away soon +    private Vector		notations = new Vector (5, 5); +    private Vector		nDeferred = new Vector (5, 5); +    private Vector		unparsed = new Vector (5, 5); +    private Vector		uDeferred = new Vector (5, 5); +	 +	// note: DocBk 3.1.7 XML defines over 2 dozen notations, +	// used when defining unparsed entities for graphics +	// (and maybe in other places) + +     + +    /** +     * Creates a pipeline terminus which consumes all events passed to +     * it; this will report validity errors as if they were fatal errors, +     * unless an error handler is assigned. +     * +     * @see #setErrorHandler +     */ +	// constructor used by PipelineFactory +	    // ... 
and want one taking system ID of an external subset +    public ValidationConsumer () +    { +	this (null); +    } + +    /** +     * Creates a pipeline filter which reports validity errors and then +     * passes events on to the next consumer if they were not fatal. +     * +     * @see #setErrorHandler +     */ +	// constructor used by PipelineFactory +	    // ... and want one taking system ID of an external subset +	    // (which won't send declaration events) +    public ValidationConsumer (EventConsumer next) +    { +	super (next); + +	setContentHandler (this); +	setDTDHandler (this); +	try { setProperty (DECL_HANDLER, this); } +	catch (Exception e) { /* "can't happen" */ } +	try { setProperty (LEXICAL_HANDLER, this); } +	catch (Exception e) { /* "can't happen" */ } +    } + +     +    private static final String	fakeRootName +	= ":Nobody:in:their_Right.Mind_would:use:this-name:1x:"; +     +    /** +     * Creates a validation consumer which is preloaded with the DTD provided. +     * It does this by constructing a document with that DTD, then parsing +     * that document and recording its DTD declarations.  Then it arranges +     * not to modify that information. +     * +     * <p> The resulting validation consumer will only validate against +     * the specified DTD, regardless of whether some other DTD is found +     * in a document being parsed. +     * +     * @param rootName The name of the required root element; if this is +     *	null, any root element name will be accepted. +     * @param publicId If non-null and there is a non-null systemId, this +     *	identifier provides an alternate access identifier for the DTD's +     *	external subset. +     * @param systemId If non-null, this is a URI (normally URL) that +     *	may be used to access the DTD's external subset. +     * @param internalSubset If non-null, holds literal markup declarations +     *	comprising the DTD's internal subset. 
+     * @param resolver If non-null, this will be provided to the parser for +     *	use when resolving parameter entities (including any external subset). +     * @param minimalDocument If non-null, a minimal valid document. +     * +     * @exception SAXNotSupportedException If the default SAX parser does +     *	not support the standard lexical or declaration handlers. +     * @exception SAXParseException If the specified DTD has either +     *	well-formedness or validity errors +     * @exception IOException If the specified DTD can't be read for +     *	some reason +     */ +    public ValidationConsumer ( +	String		rootName, +	String		publicId, +	String		systemId, +	String		internalSubset, +	EntityResolver	resolver, +	String		minimalDocument +    ) throws SAXException, IOException +    { +	this (null); + +	disableReset = true; +	if (rootName == null) +	    rootName = fakeRootName; + +	// +	// Synthesize document with that DTD; is it possible to do +	// better for the declaration of the root element? +	// +	// NOTE:  can't use SAX2 to write internal subsets.
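+	//
+	// Illustrative example (hypothetical arguments): with rootName "doc",
+	// systemId "doc.dtd", and every other argument null, the code below
+	// builds this text (trailing whitespace omitted):
+	//
+	//	<!DOCTYPE doc
+	//	  SYSTEM 'doc.dtd' [
+	//	 ]> <doc/>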
+	// +	StringWriter	writer = new StringWriter (); + +	writer.write ("<!DOCTYPE "); +	writer.write (rootName); +	if (systemId != null) { +	    writer.write ("\n  "); +	    if (publicId != null) { +		writer.write ("PUBLIC '"); +		writer.write (publicId); +		writer.write ("'\n\t'"); +	    } else +		writer.write ("SYSTEM '"); +	    writer.write (systemId); +	    writer.write ("'"); +	} +	writer.write (" [ "); +	if (rootName == fakeRootName) { +	    writer.write ("\n<!ELEMENT "); +	    writer.write (rootName); +	    writer.write (" EMPTY>"); +	} +	if (internalSubset != null) +	    writer.write (internalSubset); +	writer.write ("\n ]>"); + +	if (minimalDocument != null) { +	    writer.write ("\n"); +	    writer.write (minimalDocument); +	    writer.write ("\n"); +	} else { +	    writer.write (" <"); +	    writer.write (rootName); +	    writer.write ("/>\n"); +	} +	minimalDocument = writer.toString (); + +	// +	// OK, load it +	// +	XMLReader	producer; + +	producer = XMLReaderFactory.createXMLReader (); +	bind (producer, this); + +	if (resolver != null) +	    producer.setEntityResolver (resolver); + +	InputSource	in; +	 +	in = new InputSource (new StringReader (minimalDocument)); +	producer.parse (in); + +	disableDeclarations = true; +	if (rootName == fakeRootName) +	    this.rootName = null; +    } + +    private void resetState () +    { +	if (!disableReset) { +	    rootName = null; +	    contentStack.removeAllElements (); +	    elements.clear (); +	    ids.clear (); + +	    notations.removeAllElements (); +	    nDeferred.removeAllElements (); +	    unparsed.removeAllElements (); +	    uDeferred.removeAllElements (); +	} +    } + + +    private void warning (String description) +    throws SAXException +    { +	ErrorHandler		errHandler = getErrorHandler (); +	Locator			locator = getDocumentLocator (); +	SAXParseException	err; + +	if (errHandler == null) +	    return; + +	if (locator == null) +	    err = new SAXParseException (description, null, null, -1, -1); +	else +	  
  err = new SAXParseException (description, locator); +	errHandler.warning (err); +    } + +    // package private (for ChildrenRecognizer) +    private void error (String description) +    throws SAXException +    { +	ErrorHandler		errHandler = getErrorHandler (); +	Locator			locator = getDocumentLocator (); +	SAXParseException	err; + +	if (locator == null) +	    err = new SAXParseException (description, null, null, -1, -1); +	else +	    err = new SAXParseException (description, locator); +	if (errHandler != null) +	    errHandler.error (err); +	else	// else we always treat it as fatal! +	    throw err; +    } + +    private void fatalError (String description) +    throws SAXException +    { +	ErrorHandler		errHandler = getErrorHandler (); +	Locator			locator = getDocumentLocator (); +	SAXParseException	err; + +	if (locator != null) +	    err = new SAXParseException (description, locator); +	else +	    err = new SAXParseException (description, null, null, -1, -1); +	if (errHandler != null) +	    errHandler.fatalError (err); +	// we always treat this as fatal, regardless of the handler +	throw err; +    } + + +    private static boolean isExtender (char c) +    { +	// [88] Extender ::= ... 
+	return c == 0x00b7 || c == 0x02d0 || c == 0x02d1 || c == 0x0387 +	       || c == 0x0640 || c == 0x0e46 || c == 0x0ec6 || c == 0x3005 +	       || (c >= 0x3031 && c <= 0x3035) +	       || (c >= 0x309d && c <= 0x309e) +	       || (c >= 0x30fc && c <= 0x30fe); +    } + + +    // use augmented Unicode rules, not full XML rules +    private boolean isName (String name, String context, String id) +    throws SAXException +    { +	char	buf [] = name.toCharArray (); +	boolean	pass = true; + +	if (!Character.isUnicodeIdentifierStart (buf [0]) +		&& ":_".indexOf (buf [0]) == -1) +	    pass = false; +	else { +	    int max = buf.length; +	    for (int i = 1; pass && i < max; i++) { +		char c = buf [i]; +		if (!Character.isUnicodeIdentifierPart (c) +			&& ":-_.".indexOf (c) == -1 +			&& !isExtender (c)) +		    pass = false; +	    } +	} + +	if (!pass) +	    error ("In " + context + " for " + id +		+ ", '" + name + "' is not a name"); +	return pass;	// true == OK +    } + +    // use augmented Unicode rules, not full XML rules +    private boolean isNmtoken (String nmtoken, String context, String id) +    throws SAXException +    { +	char	buf [] = nmtoken.toCharArray (); +	boolean	pass = true; +	int	max = buf.length; + +	// XXX make this share code with isName + +	for (int i = 0; pass && i < max; i++) { +		char c = buf [i]; +	    if (!Character.isUnicodeIdentifierPart (c) +		    && ":-_.".indexOf (c) == -1 +		    && !isExtender (c)) +		pass = false; +	} + +	if (!pass) +	    error ("In " + context + " for " + id +		+ ", '" + nmtoken + "' is not a name token"); +	return pass;	// true == OK +    } + +    private void checkEnumeration (String value, String type, String name) +    throws SAXException +    { +	if (!hasMatch (value, type)) +	    // VC: Enumeration +	    error ("Value '" + value +		+ "' for attribute '" + name +		+ "' is not permitted: " + type); +    } + +    // used to test enumerated attributes and mixed content models +    // package private +    static boolean 
hasMatch (String value, String orList) +    { +	int len = value.length (); +	int max = orList.length () - len; + +	for (int start = 0; +		(start = orList.indexOf (value, start)) != -1; +		start++) { +	    char c; + +	    if (start > max) +		break; +	    c = orList.charAt (start - 1); +	    if (c != '|' && c != '('/*)*/) +		continue; +	    c = orList.charAt (start + len); +	    if (c != '|' && /*(*/ c != ')') +		continue; +	    return true; +	} +	return false; +    } + +    /** +     * <b>LexicalHandler</b> Records the declaration of the root +     * element, so it can be verified later. +     * Passed to the next consumer, unless this one was +     * preloaded with a particular DTD. +     */ +    public void startDTD (String name, String publicId, String systemId) +    throws SAXException +    { +	if (disableDeclarations) +	    return; + +	rootName = name; +	super.startDTD (name, publicId, systemId); +    } + +    /** +     * <b>LexicalHandler</b> Verifies that all referenced notations +     * and unparsed entities have been declared. +     * Passed to the next consumer, unless this one was +     * preloaded with a particular DTD. +     */ +    public void endDTD () +    throws SAXException +    { +	if (disableDeclarations) +	    return; + +	// this is a convenient hook for end-of-dtd checks, but we +	// could also trigger it in the first startElement call. +	// locator info is more appropriate here though. 
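+	//
+	// Illustrative example (hypothetical names): a declaration such as
+	// <!ENTITY logo SYSTEM "logo.gif" NDATA gif> that appears before
+	// <!NOTATION gif SYSTEM "gif"> only puts "gif" into nDeferred; the
+	// loop below reports it if no notation of that name was declared
+	// by the time the DTD ends.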
+ +	// VC: Notation Declared (NDATA can refer to them before decls, +	//	as can NOTATION attribute enumerations and defaults) +	int length = nDeferred.size (); +	for (int i = 0; i < length; i++) { +	    String notation = (String) nDeferred.elementAt (i); +	    if (!notations.contains (notation)) { +		error ("A declaration referred to notation '" + notation +			+ "' which was never declared"); +	    } +	} +	nDeferred.removeAllElements (); + +	// VC: Entity Name (attribute values can refer to them +	//	before they're declared); VC: Attribute Default Legal +	length = uDeferred.size (); +	for (int i = 0; i < length; i++) { +	    String entity = (String) uDeferred.elementAt (i); +	    if (!unparsed.contains (entity)) { +		error ("An attribute default referred to entity '" + entity +			+ "' which was never declared"); +	    } +	} +	uDeferred.removeAllElements (); +	super.endDTD (); +    } + + +    // These are interned, so we can rely on "==" to find the type of +    // all attributes except enumerations ... +    // "(this|or|that|...)" and "NOTATION (this|or|that|...)" +    static final String types [] = { +	"CDATA", +	"ID", "IDREF", "IDREFS", +	"NMTOKEN", "NMTOKENS", +	"ENTITY", "ENTITIES" +    }; + + +    /** +     * <b>DeclHandler</b> Records the attribute declaration for later use +     * in validating document content, and checks validity constraints +     * that are applicable to attribute declarations. +     * Passed to the next consumer, unless this one was +     * preloaded with a particular DTD.
+     */ +    public void attributeDecl ( +	String eName, +	String aName, +	String type, +	String mode, +	String value +    ) throws SAXException +    { +	if (disableDeclarations) +	    return; + +	ElementInfo	info = (ElementInfo) elements.get (eName); +	AttributeInfo	ainfo = new AttributeInfo (); +	boolean		checkOne = false; +	boolean		interned = false; + +	// cheap interning of type names and #FIXED, #REQUIRED +	// for faster startElement (we can use "==") +	for (int i = 0; i < types.length; i++) { +	    if (types [i].equals (type)) { +		type = types [i]; +		interned = true; +		break; +	    } +	} +	if ("#FIXED".equals (mode)) +	    mode = "#FIXED"; +	else if ("#REQUIRED".equals (mode)) +	    mode = "#REQUIRED"; + +	ainfo.type = type; +	ainfo.mode = mode; +	ainfo.value = value; + +	// we might not have seen the content model yet +	if (info == null) { +	    info = new ElementInfo (eName); +	    elements.put (eName, info); +	} +	if ("ID" == type) { +	    checkOne = true; +	    if (!("#REQUIRED" == mode || "#IMPLIED".equals (mode))) { +		// VC: ID Attribute Default +		error ("ID attribute '" + aName +		    + "' must be #IMPLIED or #REQUIRED"); +	    } + +	} else if (!interned && type.startsWith ("NOTATION ")) { +	    checkOne = true; + +	    // VC: Notation Attributes (notations must be declared) +	    StringTokenizer	tokens = new StringTokenizer ( +		type.substring (10, type.lastIndexOf (')')), +		"|"); +	    while (tokens.hasMoreTokens ()) { +		String	token = tokens.nextToken (); +		if (!notations.contains (token)) +		    nDeferred.addElement (token); +	    } +	} +	if (checkOne) { +	    for (Enumeration e = info.attributes.keys (); +		    e.hasMoreElements (); +		    /* NOP */) { +		String		name; +		AttributeInfo	ainfo2; + +		name = (String) e.nextElement (); +		ainfo2 = (AttributeInfo) info.attributes.get (name); +		// interned here means ID; otherwise this is a NOTATION type +		if (interned ? type == ainfo2.type +			: ainfo2.type.startsWith ("NOTATION ")) { +		    // VC: One ID per Element Type +		    // VC: One Notation per Element Type +		    error
 ("Element '" + eName +			+ "' already has an attribute of type " +			+ (interned ? type : "NOTATION") +			+ " ('" + name +			+ "') so '" + aName  +			+ "' is a validity error"); +		} +	    } +	} + +	// VC: Attribute Default Legal +	if (value != null) { + +	    if ("CDATA" == type) { +		// event source rejected '<' + +	    } else if ("NMTOKEN" == type) { +		// VC: Name Token (is a nmtoken) +		isNmtoken (value, "attribute default", aName); + +	    } else if ("NMTOKENS" == type) { +		// VC: Name Token (is a nmtoken; at least one value) +		StringTokenizer	tokens = new StringTokenizer (value); +		if (!tokens.hasMoreTokens ()) +		    error ("Default for attribute '" + aName +			+ "' must have at least one name token."); +		else do { +		    String token = tokens.nextToken (); +		    isNmtoken (token, "attribute default", aName); +		} while (tokens.hasMoreTokens ()); + +	    } else if ("IDREF" == type || "ENTITY" == type) { +		// VC: Entity Name (is a name) +		// VC: IDREF (is a name) (is declared) +		isName (value, "attribute default", aName); +		if ("ENTITY" == type && !unparsed.contains (value)) +		    uDeferred.addElement (value); + +	    } else if ("IDREFS" == type || "ENTITIES" == type) { +		// VC: Entity Name (is a name; at least one value) +		// VC: IDREF (is a name; at least one value) +		StringTokenizer	names = new StringTokenizer (value); +		if (!names.hasMoreTokens ()) +		    error ("Default for attribute '" + aName +			+ "' must have at least one name."); +		else do { +		    String name = names.nextToken (); +		    isName (name, "attribute default", aName); +		    if ("ENTITIES" == type && !unparsed.contains (name)) +			uDeferred.addElement (name); +		} while (names.hasMoreTokens ()); +	     +	    } else if (type.charAt (0) == '(' /*)*/ ) { +		// VC: Enumeration (must match) +		checkEnumeration (value, type, aName); + +	    } else if (!interned && checkOne) {	/* NOTATION */ +		// VC: Notation attributes (must be names) +		isName (value, "attribute default",
 aName); + +		// VC: Notation attributes (must be declared) +		if (!notations.contains (value)) +		    nDeferred.addElement (value); +		 +		// VC: Enumeration (must match) +		checkEnumeration (value, type, aName); + +	    } else if ("ID" != type) +		throw new RuntimeException ("illegal attribute type: " + type); +	} + +	if (info.attributes.get (aName) == null) +	    info.attributes.put (aName, ainfo); +	/* +	else +	    warning ("Element '" + eName +		+ "' already has an attribute named '" + aName + "'"); +	*/ + +	if ("xml:space".equals (aName)) { +	    if (!("(default|preserve)".equals (type) +		    || "(preserve|default)".equals (type) +			// these next two are arguable; XHTML's DTD doesn't +			// deserve errors.  After all, it's not like any +			// illegal _value_ could pass ... +		    || "(preserve)".equals (type) +		    || "(default)".equals (type) +		    )) +		error ( +		    "xml:space attribute type must be like '(default|preserve)'" +		    + " not '" + type + "'" +		    ); + +	} +	super.attributeDecl (eName, aName, type, mode, value); +    } + +    /** +     * <b>DeclHandler</b> Records the element declaration for later use +     * when checking document content, and checks validity constraints that +     * apply to element declarations.  Passed to the next consumer, unless +     * this one was preloaded with a particular DTD. +     */ +    public void elementDecl (String name, String model) +    throws SAXException +    { +	if (disableDeclarations) +	    return; + +	ElementInfo	info = (ElementInfo) elements.get (name); + +	// we might have seen an attribute decl already +	if (info == null) { +	    info = new ElementInfo (name); +	    elements.put (name, info); +	} +	if (info.model != null) { +	    // NOTE:  not all parsers can report such duplicates.
+	    // VC: Unique Element Type Declaration +	    error ("Element type '" + name +		+ "' was already declared."); +	} else { +	    info.model = model; + +	    // VC: No Duplicate Types (in mixed content models) +	    if (model.charAt (1) == '#') 	// (#PCDATA... +		info.getRecognizer (this); +	} +	super.elementDecl (name, model); +    } + +    /** +     * <b>DeclHandler</b> Passed to the next consumer, unless this +     * one was preloaded with a particular DTD. +     */ +    public void internalEntityDecl (String name, String value) +    throws SAXException +    { +	if (!disableDeclarations) +	    super.internalEntityDecl (name, value); +    } + +    /** +     * <b>DeclHandler</b> Passed to the next consumer, unless this +     * one was preloaded with a particular DTD. +     */ +    public void externalEntityDecl (String name, +    	String publicId, String systemId) +    throws SAXException +    { +	if (!disableDeclarations) +	    super.externalEntityDecl (name, publicId, systemId); +    } + + +    /** +     * <b>DTDHandler</b> Records the notation name, for checking +     * NOTATION attribute values and declarations of unparsed +     * entities.  Passed to the next consumer, unless this one was +     * preloaded with a particular DTD. +     */ +    public void notationDecl (String name, String publicId, String systemId) +    throws SAXException +    { +	if (disableDeclarations) +	    return; + +	notations.addElement (name); +	super.notationDecl (name, publicId, systemId); +    } + +    /** +     * <b>DTDHandler</b> Records the entity name, for checking +     * ENTITY and ENTITIES attribute values; records the notation +     * name if it hasn't yet been declared.  Passed to the next consumer, +     * unless this one was preloaded with a particular DTD.
+     */ +    public void unparsedEntityDecl ( +	String name, +	String publicId, +	String systemId, +	String notationName +    ) throws SAXException +    { +	if (disableDeclarations) +	    return; + +	unparsed.addElement (name); +	if (!notations.contains (notationName)) +	    nDeferred.addElement (notationName); +	super.unparsedEntityDecl (name, publicId, systemId, notationName); +    } +     +     +    /** +     * <b>ContentHandler</b> Ensures that state from any previous parse +     * has been deleted. +     * Passed to the next consumer. +     */ +    public void startDocument () +    throws SAXException +    { +	resetState (); +	super.startDocument (); +    } + + +    private static boolean isAsciiLetter (char c) +    { +	return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'); +    } + + +    /** +     * <b>ContentHandler</b> Reports a fatal exception.  Validating +     * XML processors may not skip any entities. +     */ +    public void skippedEntity (String name) +    throws SAXException +    { +	fatalError ("may not skip entities"); +    } + +    /* +     * SAX2 doesn't expand non-PE refs in attribute defaults... +     */ +    private String expandDefaultRefs (String s) +    throws SAXException +    { +	if (s.indexOf ('&') < 0) +	    return s; +	 +// FIXME: handle &#nn; &#xnn; &name; +	String message = "Can't expand refs in attribute default: " + s; +	warning (message); + +	return s; +    } + +    /** +     * <b>ContentHandler</b> Performs validity checks against element +     * (and document) content models, and attribute values. +     * Passed to the next consumer. +     */ +    public void startElement ( +	String		uri, +	String		localName, +	String		qName, +	Attributes	atts +    ) throws SAXException +    { +	// +	// First check content model for the enclosing scope. 
+	// +	if (contentStack.isEmpty ()) { +	    // VC:  Root Element Type +	    if (!qName.equals (rootName)) { +		if (rootName == null) +		    warning ("This document has no DTD, can't be valid"); +		else +		    error ("Root element type '" + qName +			+ "' was declared to be '" + rootName + "'"); +	    } +	} else { +	    Recognizer state = (Recognizer) contentStack.peek (); + +	    if (state != null) { +		Recognizer newstate = state.acceptElement (qName); + +		if (newstate == null) +		    error ("Element type '" + qName +			+ "' in element '" + state.type.name +			+ "' violates content model " + state.type.model +			); +		if (newstate != state) { +		    contentStack.pop (); +		    contentStack.push (newstate); +		} +	    } +	} + +	// +	// Then check that this element was declared, and push the +	// object used to validate its content model onto our stack. +	// +	// This is where the recognizer gets created, if needed; if +	// it's a "children" (elements) content model, an NDFA is +	// created.  (One recognizer is used per content type, no +	// matter how complex that recognizer is.) +	// +	ElementInfo		info; + +	info = (ElementInfo) elements.get (qName); +	if (info == null || info.model == null) { +	    // VC: Element Valid (base clause) +	    error ("Element type '" + qName + "' was not declared"); +	    contentStack.push (null); + +	    // for less diagnostic noise, fake a declaration. 
+	    elementDecl (qName, "ANY"); +	} else +	    contentStack.push (info.getRecognizer (this)); + +	// +	// Then check each attribute present +	// +	int			len; +	String			aname; +	AttributeInfo		ainfo; + +	if (atts != null) +	    len = atts.getLength (); +	else +	    len = 0; +	 +	for (int i = 0; i < len; i++) { +	    aname = atts.getQName (i); + +	    if (info == null +		    || (ainfo = (AttributeInfo) info.attributes.get (aname)) +			    == null) { +		// VC: Attribute Value Type +		error ("Attribute '" + aname +		    + "' was not declared for element type " + qName); +		continue; +	    } + +	    String value = atts.getValue (i); + +	    // note that "==" for type names and "#FIXED" is correct +	    // (and fast) since we've interned those literals. + +	    if ("#FIXED" == ainfo.mode) { +		String expanded = expandDefaultRefs (ainfo.value); + +		// VC: Fixed Attribute Default +		if (!value.equals (expanded)) { +		    error ("Attribute '" + aname +			+ "' must match " + expanded +			); +		    continue; +		} +	    } + +	    if ("CDATA" == ainfo.type) +		continue; +	     +	    // +	    // For all other attribute types, there are various +	    // rules to follow. 
+	    // +	     +	    if ("ID" == ainfo.type) { +		// VC: ID (must be a name) +		if (isName (value, "ID attribute", aname)) { +		    if (Boolean.TRUE == ids.get (value)) +			// VC: ID (appears once) +			error ("ID attribute " + aname +			    + " uses an ID value '" + value +			    + "' which was already declared."); +		    else +			// any forward refs are no longer problems +			ids.put (value, Boolean.TRUE); +		} +		continue; +	    }  + +	    if ("IDREF" == ainfo.type) { +		// VC: IDREF (value must be a name) +		if (isName (value, "IDREF attribute", aname)) { +		    // VC: IDREF (must match some ID attribute) +		    if (ids.get (value) == null) +			// new -- assume it's a forward ref +			ids.put (value, Boolean.FALSE); +		} +		continue; +	    }  + +	    if ("IDREFS" == ainfo.type) { +		StringTokenizer	tokens = new StringTokenizer (value, " "); + +		if (!tokens.hasMoreTokens ()) { +		    // VC: IDREF (one or more values) +		    error ("IDREFS attribute " + aname +			+ " must have at least one ID ref"); +		} else do { +		    String id = tokens.nextToken (); + +		    // VC: IDREF (value must be a name) +		    if (isName (id, "IDREFS attribute", aname)) { +			// VC: IDREF (must match some ID attribute) +			if (ids.get (id) == null) +			    // new -- assume it's a forward ref +			    ids.put (id, Boolean.FALSE); +		    } +		} while (tokens.hasMoreTokens ()); +		continue; +	    } + +	    if ("NMTOKEN" == ainfo.type) { +		// VC: Name Token (is a name token) +		isNmtoken (value, "NMTOKEN attribute", aname); +		continue; +	    } + +	    if ("NMTOKENS" == ainfo.type) { +		StringTokenizer	tokens = new StringTokenizer (value, " "); + +		if (!tokens.hasMoreTokens ()) { +		    // VC: Name Token (one or more values) +		    error ("NMTOKENS attribute " + aname +			+ " must have at least one name token"); +		} else do { +		    String token = tokens.nextToken (); + +		    // VC: Name Token (is a name token) +		    isNmtoken (token, "NMTOKENS attribute", aname); +		} while 
(tokens.hasMoreTokens ()); +		continue; +	    } + +	    if ("ENTITY" == ainfo.type) { +		if (!unparsed.contains (value)) +		    // VC: Entity Name +		    error ("Value of attribute '" + aname +			+ "' refers to unparsed entity '" + value +			+ "' which was not declared."); +		continue; +	    } + +	    if ("ENTITIES" == ainfo.type) { +		StringTokenizer	tokens = new StringTokenizer (value, " "); + +		if (!tokens.hasMoreTokens ()) { +		    // VC: Entity Name (one or more values) +		    error ("ENTITIES attribute " + aname +			+ " must have at least one name token"); +		} else do { +		    String entity = tokens.nextToken (); + +		    if (!unparsed.contains (entity)) +			// VC: Entity Name +			error ("Value of attribute '" + aname +			    + "' refers to unparsed entity '" + entity +			    + "' which was not declared."); +		} while (tokens.hasMoreTokens ()); +		continue; +	    } + +	    // +	    // check for enumerations last; more expensive +	    // +	    if (ainfo.type.charAt (0) == '(' /*)*/	 +		    || ainfo.type.startsWith ("NOTATION ") +		    ) { +		// VC: Enumeration (value must be defined) +		checkEnumeration (value, ainfo.type, aname); +		continue; +	    } +	} + +	// +	// Last, check that all #REQUIRED attributes were provided +	// +	if (info != null) { +	    Hashtable	table = info.attributes; + +	    if (table.size () != 0) { +		Enumeration	e = table.keys (); + +		// XXX table.keys uses the heap, bleech -- slows things + +		while (e.hasMoreElements ()) { +		    aname = (String) e.nextElement (); +		    ainfo = (AttributeInfo) table.get (aname); + +		    // "#REQUIRED" mode was interned in attributeDecl +		    if ("#REQUIRED" == ainfo.mode +			    && atts.getValue (aname) == null) { +			// VC: Required Attribute +			error ("Attribute '" + aname + "' must be specified " +			    + "for element type " + qName); +		    } +		} +	    } +	} +	super.startElement (uri, localName, qName, atts); +    } + +    /** +     * <b>ContentHandler</b> Reports a validity error if the 
element's content +     * model does not permit character data. +     * Passed to the next consumer. +     */ +    public void characters (char ch [], int start, int length) +    throws SAXException +    { +	Recognizer state; + +	if (contentStack.empty ()) +	    state = null; +	else +	    state = (Recognizer) contentStack.peek (); + +	// NOTE:  if this is ever used with SAX parsers that don't +	// report ignorable whitespace as such (only XP?), this class +	// needs to morph it into ignorableWhitespace() as needed ... + +	if (state != null && !state.acceptCharacters ()) +	    // VC: Element Valid (clauses three, four -- see recognizer) +	    error ("Character content not allowed in element " +		+ state.type.name); +	 +	super.characters (ch, start, length); +    } +	 + +    /** +     * <b>ContentHandler</b> Reports a validity error if the element's content +     * model does not permit end-of-element yet, or a well formedness error +     * if there was no matching startElement call. +     * Passed to the next consumer. +     */ +    public void endElement (String uri, String localName, String qName) +    throws SAXException +    { +	try { +	    Recognizer state = (Recognizer) contentStack.pop (); + +	    if (state != null && !state.completed ()) +		// VC: Element Valid (clauses two, three, four; see Recognizer) +		error ("Premature end for element '" +		    + state.type.name +		    + "', content model " +		    + state.type.model); +	     +	    // could insist on match of start element, but that's +	    // something the input stream must guarantee. + +	} catch (EmptyStackException e) { +	    fatalError ("endElement without startElement: " + qName +		+ ((uri == null) +		    ? "" +		    : ( " { '" + uri + "', " + localName + " }"))); +	} +	super.endElement (uri, localName, qName); +    } + +    /** +     * <b>ContentHandler</b> Checks whether all ID values that were +     * referenced have been declared, and releases all resources.
+     * Passed to the next consumer. +     *  +     * @see #setDocumentLocator +     */ +    public void endDocument () +    throws SAXException +    { +	for (Enumeration idNames = ids.keys (); +		idNames.hasMoreElements (); +		/* NOP */) { +	    String id = (String) idNames.nextElement (); +	     +	    if (Boolean.FALSE == ids.get (id)) { +		// VC: IDREF (must match ID) +		error ("Undeclared ID value '" + id +		    + "' was referred to by an IDREF/IDREFS attribute"); +	    } +	} + +	resetState (); +	super.endDocument (); +    } + + +    /** Holds per-element declarations */ +    static private final class ElementInfo +    { +	String			name; +	String			model; + +	// key = attribute name; value = AttributeInfo +	Hashtable		attributes = new Hashtable (11); + +	ElementInfo (String n) { name = n; } + +	private Recognizer	recognizer; + +	// for validating content models:  one per type, shared, +	// and constructed only on demand ... so unused elements do +	// not need to consume resources. +	Recognizer	getRecognizer (ValidationConsumer consumer) +	throws SAXException +	{ +	    if (recognizer == null) { +		if ("ANY".equals (model)) +		    recognizer = ANY; +		else if ("EMPTY".equals (model)) +		    recognizer = new EmptyRecognizer (this); +		else if ('#' == model.charAt (1)) +		    // n.b. 
this constructor does a validity check +		    recognizer = new MixedRecognizer (this, consumer); +		else +		    recognizer = new ChildrenRecognizer (this, consumer); +	    } +	    return recognizer; +	} +    } + +    /** Holds per-attribute declarations */ +    static private final class AttributeInfo +    { +	String	type; +	String	mode;		// #REQUIRED, etc (or null) +	String	value;		// or null +    } + + +    // +    // Content model validation +    // + +    static private final Recognizer	ANY = new Recognizer (null); + + +    // Base class defines the calls used to validate content, +    // and supports the "ANY" content model +    static private class Recognizer +    { +	final ElementInfo	type; + +	Recognizer (ElementInfo t) { type = t; } + +	// return true iff character data is legal here +	boolean acceptCharacters () +	throws SAXException +	    // VC: Element Valid (third and fourth clauses) +	    { return true; } + +	// null return = failure +	// otherwise, next state (like an FSM) +	// prerequisite: tested that name was declared +	Recognizer acceptElement (String name) +	throws SAXException +	    // VC: Element Valid (fourth clause) +	    { return this; } + +	// return true iff model is completed, can finish +	boolean completed () +	throws SAXException +	    // VC: Element Valid (fourth clause) +	    { return true; } +	 +	public String toString () +	    // n.b. "children" is the interesting case! +	    { return (type == null) ? 
"ANY" : type.model; } +    } + +    // "EMPTY" content model -- no characters or elements +    private static final class EmptyRecognizer extends Recognizer +    { +	public EmptyRecognizer (ElementInfo type) +	    { super (type); } + +	// VC: Element Valid (first clause) +	boolean acceptCharacters () +	    { return false; } + +	// VC: Element Valid (first clause) +	Recognizer acceptElement (String name) +	    { return null; } +    } + +    // "Mixed" content model -- ANY, but restricts elements +    private static final class MixedRecognizer extends Recognizer +    { +	private String	permitted []; + +	// N.B. constructor tests for duplicated element names (VC) +	public MixedRecognizer (ElementInfo t, ValidationConsumer v) +	throws SAXException +	{ +	    super (t); + +	    // (#PCDATA...)* or (#PCDATA) ==> ... or empty +	    // with the "..." being "|elname|..." +	    StringTokenizer	tokens = new StringTokenizer ( +		t.model.substring (8, t.model.lastIndexOf (')')), +		"|"); +	    Vector		vec = new Vector (); + +	    while (tokens.hasMoreTokens ()) { +		String token = tokens.nextToken (); + +		if (vec.contains (token)) +		    v.error ("element " + token +			+ " is repeated in mixed content model: " +			+ t.model); +		else +		    vec.addElement (token.intern ()); +	    } +	    permitted = new String [vec.size ()]; +	    for (int i = 0; i < permitted.length; i++) +		permitted [i] = (String) vec.elementAt (i); +	     +	    // in one large machine-derived DTD sample, most of about +	    // 250 mixed content models were empty, and 25 had ten or +	    // more entries.  2 had over a hundred elements.  Linear +	    // search isn't obviously wrong. +	} + +	// VC: Element Valid (third clause) +	Recognizer acceptElement (String name) +	{ +	    int		length = permitted.length; + +	    // first pass -- optimistic w.r.t. 
event source interning +	    // (and document validity) +	    for (int i = 0; i < length; i++) +		if (permitted [i] == name) +		    return this; +	    // second pass -- pessimistic w.r.t. event source interning +	    for (int i = 0; i < length; i++) +		if (permitted [i].equals (name)) +		    return this; +	    return null; +	} +    } + + +    // recognizer loop flags, see later +    private static final int		F_LOOPHEAD = 0x01; +    private static final int		F_LOOPNEXT = 0x02; + +    // for debugging -- used to label/count nodes in toString() +    private static int			nodeCount; + +    /** +     * "Children" content model -- these are nodes in NDFA state graphs. +     * They work in fixed space.  Note that these graphs commonly have +     * cycles, handling features such as zero-or-more and one-or-more. +     * +     * <p>It's readonly, so only one copy is ever needed.  The content model +     * stack may have any number of pointers into each graph, when a model +     * happens to be needed more than once due to element nesting.  Since +     * traversing the graph just moves to another node, and never changes +     * it, traversals never interfere with each other. +     * +     * <p>There is an option to report non-deterministic models.  These are +     * always XML errors, but ones which are not often reported despite the +     * fact that they can lead to different validating parsers giving +     * different results for the same input.  (The XML spec doesn't require +     * them to be reported.) +     * +     * <p><b>FIXME</b> There's currently at least one known bug here, in that +     * it's not actually detecting the non-determinism it tries to detect. +     * (Of the "optional.xml" test, the once-or-twice-2* tests are all non-D; +     * maybe some others.)  This may relate to the issue flagged below as +     * "should not" happen (but it was), which showed up when patching the +     * graph to have one exit node (or more EMPTY nodes). 
+     */ +    private static final class ChildrenRecognizer extends Recognizer +	implements Cloneable +    { +	// for reporting non-deterministic content models +	// ... a waste of space if we're not reporting those! +	// ... along with the 'model' member (in base class) +	private ValidationConsumer	consumer; + +	// for CHOICE nodes -- each component is an arc that +	// accepts a different NAME (or is EMPTY indicating +	// NDFA termination). +	private Recognizer		components []; + +	// for NAME/SEQUENCE nodes -- accepts that NAME and +	// then goes to the next node (CHOICE, NAME, EMPTY). +	private String			name; +	private Recognizer		next; + +	// loops always point back to a CHOICE node. we mark such choice +	// nodes (F_LOOPHEAD) for diagnostics and faster deep cloning. +	// We also mark nodes before back pointers (F_LOOPNEXT), to ensure +	// termination when we patch sequences and loops. +	private int			flags; + + +	// prevent a needless indirection between 'this' and 'node' +	private void copyIn (ChildrenRecognizer node) +	{ +	    // model & consumer are already set +	    components = node.components; +	    name = node.name; +	    next = node.next; +	    flags = node.flags; +	} + +	// used to construct top level "children" content models, +	public ChildrenRecognizer (ElementInfo type, ValidationConsumer vc) +	{ +	    this (vc, type); +	    populate (type.model.toCharArray (), 0); +	    patchNext (new EmptyRecognizer (type), null); +	} + +	// used internally; populating is separate +	private ChildrenRecognizer (ValidationConsumer vc, ElementInfo type) +	{ +	    super (type); +	    consumer = vc; +	} + + +	// +	// When rewriting some graph nodes we need deep clones in one case; +	// mostly shallow clones (what the JVM handles for us) are fine. 
+	// +	private ChildrenRecognizer shallowClone () +	{ +	    try { +		return (ChildrenRecognizer) clone (); +	    } catch (CloneNotSupportedException e) { +		throw new Error ("clone"); +	    } +	} + +	private ChildrenRecognizer deepClone () +	{ +	    return deepClone (new Hashtable (37)); +	} + +	private ChildrenRecognizer deepClone (Hashtable table) +	{ +	    ChildrenRecognizer retval; + +	    if ((flags & F_LOOPHEAD) != 0) { +		retval = (ChildrenRecognizer) table.get (this); +		if (retval != null) +		    return retval;	// keep the copy's back-edges inside the copy + +		retval = shallowClone (); +		table.put (this, retval); +	    } else +		retval = shallowClone (); + +	    if (next != null) { +		if (next instanceof ChildrenRecognizer) +		    retval.next = ((ChildrenRecognizer)next) +			    .deepClone (table); +		else if (!(next instanceof EmptyRecognizer)) +		    throw new RuntimeException ("deepClone"); +	    } + +	    if (components != null) { +		retval.components = new Recognizer [components.length]; +		for (int i = 0; i < components.length; i++) { +		    Recognizer temp = components [i]; + +		    if (temp == null) +			retval.components [i] = null; +		    else if (temp instanceof ChildrenRecognizer) +			retval.components [i] = ((ChildrenRecognizer)temp) +				.deepClone (table); +		    else if (!(temp instanceof EmptyRecognizer)) +			throw new RuntimeException ("deepClone"); +		} +	    } + +	    return retval; +	} + +	// connect subgraphs, first to next (sequencing) +	private void patchNext (Recognizer theNext, Hashtable table) +	{ +	    // backpointers must not be repatched or followed +	    if ((flags & F_LOOPNEXT) != 0) +		return; + +	    // XXX this table "shouldn't" be needed, right? +	    // but some choice nodes looped if it isn't there.
+	    if (table != null && table.get (this) != null) +		return; +	    if (table == null) +		table = new Hashtable (); + +	    // NAME/SEQUENCE +	    if (name != null) { +		if (next == null) +		    next = theNext; +		else if (next instanceof ChildrenRecognizer) { +		    ((ChildrenRecognizer)next).patchNext (theNext, table); +		} else if (!(next instanceof EmptyRecognizer)) +		    throw new RuntimeException ("patchNext"); +		return; +	    } + +	    // CHOICE +	    for (int i = 0; i < components.length; i++) { +		if (components [i] == null) +		    components [i] = theNext; +		else if (components [i] instanceof ChildrenRecognizer) { +		    ((ChildrenRecognizer)components [i]) +			    .patchNext (theNext, table); +		} else if (!(components [i] instanceof EmptyRecognizer)) +		    throw new RuntimeException ("patchNext"); +	    } + +	    if (table != null && (flags | F_LOOPHEAD) != 0) +		table.put (this, this); +	} + +	/** +	 * Parses a 'children' spec (or recursively 'cp') and makes this +	 * become a regular graph node. +	 * +	 * @return index after this particle +	 */ +	private int populate (char parseBuf [], int startPos) +	{ +	    int		nextPos = startPos + 1; +	    char	c; + +	    if (nextPos < 0 || nextPos >= parseBuf.length) +		throw new IndexOutOfBoundsException (); + +	    // Grammar of the string is from the XML spec, but +	    // with whitespace removed by the SAX parser. + +	    // children ::= (choice | seq) ('?' | '*' | '+')? +	    // cp ::= (Name | choice | seq) ('?' | '*' | '+')? +	    // choice ::= '(' cp ('|' cp)+ ')' +	    // seq ::= '(' cp (',' cp)* ')' + +	    // interior nodes only +	    //   cp ::= name ...
+	    if (parseBuf [startPos] != '('/*)*/) { +		boolean		done = false; +		do { +		    switch (c = parseBuf [nextPos]) { +			case '?': case '*': case '+': +			case '|': case ',': +			case /*(*/ ')': +			    done = true; +			    continue; +			default: +			    nextPos++; +			    continue; +		    } +		} while (!done); +		name = new String (parseBuf, startPos, nextPos - startPos); + +	    // interior OR toplevel nodes +	    //   cp ::= choice .. +	    //   cp ::= seq .. +	    } else { +		// collect everything as a separate list, and merge it +		// into "this" later if we can (SEQUENCE or singleton) +		ChildrenRecognizer	first; +	        +		first = new ChildrenRecognizer (consumer, type); +		nextPos = first.populate (parseBuf, nextPos); +		c = parseBuf [nextPos++]; + +		if (c == ',' || c == '|') { +		    ChildrenRecognizer	current = first; +		    char		separator = c; +		    Vector		v = null; + +		    if (separator == '|') { +			v = new Vector (); +			v.addElement (first); +		    } + +		    do { +			ChildrenRecognizer link; + +			link = new ChildrenRecognizer (consumer, type); +			nextPos = link.populate (parseBuf, nextPos); + +			if (separator == ',') { +			    current.patchNext (link, null); +			    current = link; +			} else +			    v.addElement (link); + +			c = parseBuf [nextPos++]; +		    } while (c == separator); + +		    // choice ... collect everything into one array. +		    if (separator == '|') { +			// assert v.size() > 1 +			components = new Recognizer [v.size ()]; +			for (int i = 0; i < components.length; i++) { +			    components [i] = (Recognizer) +				    v.elementAt (i); +			} +			// assert flags == 0 + +		    // sequence ... merge into "this" to be smaller. +		    } else +			copyIn (first); + +		// treat singletons like one-node sequences. +		} else +		    copyIn (first); + +		if (c != /*(*/ ')') +		    throw new RuntimeException ("corrupt content model"); +	    } + +	    // +	    // Arity is optional, and the root of all fun.  
We keep the +	    // FSM state graph simple by only having NAME/SEQUENCE and +	    // CHOICE nodes (or EMPTY to terminate a model), easily +	    // evaluated.  So we rewrite each node that has arity, using +	    // those primitives.  We create loops here, if needed. +	    // +	    if (nextPos < parseBuf.length) { +		c = parseBuf [nextPos]; +		if (c == '?' || c == '*' || c == '+') { +		    nextPos++; + +		    // Rewrite 'zero-or-one' "?" arity to a CHOICE: +		    //   - SEQUENCE (clone, what's next) +		    //   - or, what's next +		    // Size cost: N --> N + 1 +		    if (c == '?') { +			Recognizer		once = shallowClone (); + +			components = new Recognizer [2]; +			components [0] = once; +			// components [1] initted to null +			name = null; +			next = null; +			flags = 0; + +			     +		    // Rewrite 'zero-or-more' "*" arity to a CHOICE. +		    //   - LOOP (clone, back to this CHOICE) +		    //   - or, what's next +		    // Size cost: N --> N + 1 +		    } else if (c == '*') { +			ChildrenRecognizer	loop = shallowClone (); + +			loop.patchNext (this, null); +			loop.flags |= F_LOOPNEXT; +			flags = F_LOOPHEAD; + +			components = new Recognizer [2]; +			components [0] = loop; +			// components [1] initted to null +			name = null; +			next = null; + + +		    // Rewrite 'one-or-more' "+" arity to a SEQUENCE. +		    // Basically (a)+ --> ((a),(a)*). 
+		    //   - this +		    //   - CHOICE +		    //	    * LOOP (clone, back to the CHOICE) +		    //	    * or, whatever's next +		    // Size cost: N --> 2N + 1 +		    } else if (c == '+') { +			ChildrenRecognizer loop = deepClone (); +			ChildrenRecognizer choice; + +			choice = new ChildrenRecognizer (consumer, type); +			loop.patchNext (choice, null); +			loop.flags |= F_LOOPNEXT; +			choice.flags = F_LOOPHEAD; + +			choice.components = new Recognizer [2]; +			choice.components [0] = loop; +			// choice.components [1] initted to null +			// choice.name, choice.next initted to null + +			patchNext (choice, null); +		    } +		} +	    } + +	    return nextPos; +	} + +	// VC: Element Valid (second clause) +	boolean acceptCharacters () +	    { return false; } + +	// VC: Element Valid (second clause) +	Recognizer acceptElement (String type) +	throws SAXException +	{ +	    // NAME/SEQUENCE +	    if (name != null) { +		if (name.equals (type)) +		    return next; +		return null; +	    } + +	    // CHOICE ... optionally reporting nondeterminism we +	    // run across.  we won't check out every transition +	    // for nondeterminism; only the ones we follow. 
+	    Recognizer	retval = null; + +	    for (int i = 0; i < components.length; i++) { +		Recognizer temp = components [i].acceptElement (type); + +		if (temp == null) +		    continue; +		else if (!warnNonDeterministic) +		    return temp; +		else if (retval == null) +		    retval = temp; +		else if (retval != temp) +		    consumer.error ("Content model " + this.type.model +			+ " is non-deterministic for " + type); +	    } +	    return retval; +	} + +	// VC: Element Valid (second clause) +	boolean completed () +	throws SAXException +	{ +	    // expecting a specific element +	    if (name != null) +		return false; +	     +	    // choice, some sequences +	    for (int i = 0; i < components.length; i++) { +		if (components [i].completed ()) +		    return true; +	    } + +	    return false; +	} + +/** / +	// FOR DEBUGGING ... flattens the graph for printing. + +	public String toString () +	{ +	    StringBuffer buf = new StringBuffer (); + +	    // only one set of loop labels can be generated +	    // at a time... +	    synchronized (ANY) { +		nodeCount = 0; + +		toString (buf, new Hashtable ()); +		return buf.toString (); +	    } +	} + +	private void toString (StringBuffer buf, Hashtable table) +	{ +	    // When we visit a node, label and count it. +	    // Nodes are never visited/counted more than once. +	    // For small models labels waste space, but if arity +	    // mappings were used the savings are substantial. +	    // (Plus, the output can be more readily understood.) 
+	    String temp = (String) table.get (this); + +	    if (temp != null) { +		buf.append ('{'); +		buf.append (temp); +		buf.append ('}'); +		return; +	    } else { +		StringBuffer scratch = new StringBuffer (15); + +		if ((flags & F_LOOPHEAD) != 0) +		    scratch.append ("loop"); +		else +		    scratch.append ("node"); +		scratch.append ('-'); +		scratch.append (++nodeCount); +		temp = scratch.toString (); + +		table.put (this, temp); +		buf.append ('['); +		buf.append (temp); +		buf.append (']'); +		buf.append (':'); +	    } + +	    // NAME/SEQUENCE +	    if (name != null) { +		// n.b. some output encodings turn some name chars into '?' +		// e.g. with Japanese names and ASCII output +		buf.append (name); +		if (components != null)		// bug! +		    buf.append ('$'); +		if (next == null) +		    buf.append (",*"); +		else if (next instanceof EmptyRecognizer) // patch-to-next +		    buf.append (",{}"); +		else if (next instanceof ChildrenRecognizer) { +		    buf.append (','); +		    ((ChildrenRecognizer)next).toString (buf, table); +		} else				// bug! +		    buf.append (",+"); +		return; +	    } + +	    // CHOICE +	    buf.append ("<"); +	    for (int i = 0; i < components.length; i++) { +		if (i != 0) +		    buf.append ("|"); +		if (components [i] instanceof EmptyRecognizer) { +		    buf.append ("{}"); +		} else if (components [i] == null) {	// patch-to-next +		    buf.append ('*'); +		} else { +		    ChildrenRecognizer r; + +		    r = (ChildrenRecognizer) components [i]; +		    r.toString (buf, table); +		} +	    } +	    buf.append (">"); +	} +/**/ +    } +} diff --git a/libjava/gnu/xml/pipeline/WellFormednessFilter.java b/libjava/gnu/xml/pipeline/WellFormednessFilter.java new file mode 100644 index 00000000000..3047ae3567f --- /dev/null +++ b/libjava/gnu/xml/pipeline/WellFormednessFilter.java @@ -0,0 +1,362 @@ +/* WellFormednessFilter.java --  +   Copyright (C) 1999,2000,2001 Free Software Foundation, Inc. + +This file is part of GNU Classpath. 
+ +GNU Classpath is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 2, or (at your option) +any later version. + +GNU Classpath is distributed in the hope that it will be useful, but +WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU +General Public License for more details. + +You should have received a copy of the GNU General Public License +along with GNU Classpath; see the file COPYING.  If not, write to the +Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA +02111-1307 USA. + +Linking this library statically or dynamically with other modules is +making a combined work based on this library.  Thus, the terms and +conditions of the GNU General Public License cover the whole +combination. + +As a special exception, the copyright holders of this library give you +permission to link this library with independent modules to produce an +executable, regardless of the license terms of these independent +modules, and to copy and distribute the resulting executable under +terms of your choice, provided that you also meet, for each linked +independent module, the terms and conditions of the license of that +module.  An independent module is a module which is not derived from +or based on this library.  If you modify this library, you may extend +this exception to your version of the library, but you are not +obligated to do so.  If you do not wish to do so, delete this +exception statement from your version. */ + +package gnu.xml.pipeline; + +import java.util.EmptyStackException; +import java.util.Stack; + +import gnu.xml.aelfred2.SAXDriver; +import org.xml.sax.*; +import org.xml.sax.ext.*; + + +/** + * This filter reports fatal exceptions in the case of event streams that + * are not well formed.  
The rules currently tested include: <ul> + * + *	<li>setDocumentLocator ... may be called only before startDocument + * + *	<li>startDocument/endDocument ... must be paired, and all other + *	calls (except setDocumentLocator) must be nested within these. + * + *	<li>startElement/endElement ... must be correctly paired, and + *	may never appear within CDATA sections. + * + *	<li>comment ... can't contain "--" + * + *	<li>character data ... can't contain "]]>" + * + *	<li>whitespace ... can't contain CR + * + *	<li>whitespace and character data must be within an element + * + *	<li>processing instruction ... can't contain "?>" or CR + * + *	<li>startCDATA/endCDATA ... must be correctly paired. + * + *	</ul> + * + * <p> Other checks for event stream correctness may be provided in + * the future.  For example, insisting that + * entity boundaries nest correctly, + * namespace scopes nest correctly, + * namespace values never contain relative URIs, + * attributes don't have "<" characters; + * and more. + * + * @author David Brownell + */ +public final class WellFormednessFilter extends EventFilter +{ +    private boolean		startedDoc; +    private Stack		elementStack = new Stack (); +    private boolean		startedCDATA; +    private String		dtdState = "before"; + +     +    /** +     * Swallows all events after performing well formedness checks. +     */ +	// constructor used by PipelineFactory +    public WellFormednessFilter () +	{ this (null); } + + +    /** +     * Passes events through to the specified consumer, after first +     * processing them. +     */ +	// constructor used by PipelineFactory +    public WellFormednessFilter (EventConsumer consumer) +    { +	super (consumer); + +	setContentHandler (this); +	setDTDHandler (this); +	 +	try { +	    setProperty (LEXICAL_HANDLER, this); +	} catch (SAXException e) { /* can't happen */ } +    } + +    /** +     * Resets state as if any preceding event stream was well formed. 
+     * Particularly useful if it ended through some sort of error, +     * and the endDocument call wasn't made. +     */ +    public void reset () +    { +	startedDoc = false; +	startedCDATA = false; +	elementStack.removeAllElements (); +    } + + +    private SAXParseException getException (String message) +    { +	SAXParseException	e; +	Locator			locator = getDocumentLocator (); + +	if (locator == null) +	    return new SAXParseException (message, null, null, -1, -1); +	else +	    return new SAXParseException (message, locator); +    } + +    private void fatalError (String message) +    throws SAXException +    { +	SAXParseException	e = getException (message); +	ErrorHandler		handler = getErrorHandler (); + +	if (handler != null) +	    handler.fatalError (e); +	throw e; +    } + +    /** +     * Throws an exception when called after startDocument. +     * +     * @param locator the locator, to be used in error reporting or relative +     *	URI resolution. +     * +     * @exception IllegalStateException when called after the document +     *	has already been started +     */ +    public void setDocumentLocator (Locator locator) +    { +	if (startedDoc) +	    throw new IllegalStateException ( +		    "setDocumentLocator called after startDocument"); +	super.setDocumentLocator (locator); +    } + +    public void startDocument () throws SAXException +    { +	if (startedDoc) +	    fatalError ("startDocument called more than once"); +	startedDoc = true; +	startedCDATA = false; +	elementStack.removeAllElements (); +	super.startDocument (); +    } + +    public void startElement ( +	String uri, String localName, +	String qName, Attributes atts +    ) throws SAXException +    { +	if (!startedDoc) +	    fatalError ("callback outside of document?"); +	if ("inside".equals (dtdState)) +	    fatalError ("element inside DTD?"); +	else +	    dtdState = "after"; +	if (startedCDATA) +	    fatalError ("element inside CDATA section"); +	if (qName == null || "".equals (qName)) +	 
   fatalError ("startElement name missing"); +	elementStack.push (qName); +	super.startElement (uri, localName, qName, atts); +    } + +    public void endElement (String uri, String localName, String qName) +    throws SAXException +    { +	if (!startedDoc) +	    fatalError ("callback outside of document?"); +	if (startedCDATA) +	    fatalError ("element inside CDATA section"); +	if (qName == null || "".equals (qName)) +	    fatalError ("endElement name missing"); +	 +	try { +	    String	top = (String) elementStack.pop (); + +	    if (!qName.equals (top)) +		fatalError ("<" + top + " ...>...</" + qName + ">"); +	    // XXX could record/test namespace info +	} catch (EmptyStackException e) { +	    fatalError ("endElement without startElement:  </" + qName + ">"); +	} +	super.endElement (uri, localName, qName); +    } + +    public void endDocument () throws SAXException +    { +	if (!startedDoc) +	    fatalError ("callback outside of document?"); +	dtdState = "before"; +	startedDoc = false; +	super.endDocument (); +    } + + +    public void startDTD (String root, String publicId, String systemId) +    throws SAXException +    { +	if (!startedDoc) +	    fatalError ("callback outside of document?"); +    if ("before" != dtdState) +	    fatalError ("two DTDs?"); +	if (!elementStack.empty ()) +	    fatalError ("DTD must precede root element"); +	dtdState = "inside"; +	super.startDTD (root, publicId, systemId); +    } + +    public void notationDecl (String name, String publicId, String systemId) +    throws SAXException +    { +// FIXME: not all parsers will report startDTD() ... +// we'd rather insist we're "inside". +    if ("after" == dtdState) +	    fatalError ("not inside DTD"); +	super.notationDecl (name, publicId, systemId); +    } + +    public void unparsedEntityDecl (String name, +    	String publicId, String systemId, String notationName) +    throws SAXException +    { +// FIXME: not all parsers will report startDTD() ... 
+// we'd rather insist we're "inside". +    if ("after" == dtdState) +	    fatalError ("not inside DTD"); +	super.unparsedEntityDecl (name, publicId, systemId, notationName); +    } + +    // FIXME:  add the four DeclHandler calls too + +    public void endDTD () +    throws SAXException +    { +	if (!startedDoc) +	    fatalError ("callback outside of document?"); +	if ("inside" != dtdState) +	    fatalError ("DTD ends without start?"); +	dtdState = "after"; +	super.endDTD (); +    } + +    public void characters (char ch [], int start, int length) +    throws SAXException +    { +	int here = start, end = start + length; +	if (elementStack.empty ()) +	    fatalError ("characters must be in an element"); +	while (here < end) { +	    if (ch [here++] != ']') +		continue; +	    if (here == end)	// potential problem ... +		continue; +	    if (ch [here++] != ']') +		continue; +	    if (here == end)	// potential problem ... +		continue; +	    if (ch [here++] == '>') +		fatalError ("character data can't contain \"]]>\""); +	} +	super.characters (ch, start, length); +    } + +    public void ignorableWhitespace (char ch [], int start, int length) +    throws SAXException +    { +	int here = start, end = start + length; +	if (elementStack.empty ()) +	    fatalError ("characters must be in an element"); +	while (here < end) { +	    if (ch [here++] == '\r') +		fatalError ("whitespace can't contain CR"); +	} +	super.ignorableWhitespace (ch, start, length); +    } + +    public void processingInstruction (String target, String data) +    throws SAXException +    { +	if (data.indexOf ('\r') > 0) +	    fatalError ("PIs can't contain CR"); +	if (data.indexOf ("?>") > 0) +	    fatalError ("PIs can't contain \"?>\""); +    } + +    public void comment (char ch [], int start, int length) +    throws SAXException +    { +	if (!startedDoc) +	    fatalError ("callback outside of document?"); +	if (startedCDATA) +	    fatalError ("comments can't nest in CDATA"); +	int here = start, end = 
start + length; +	while (here < end) { +	    if (ch [here] == '\r') +		fatalError ("comments can't contain CR"); +	    if (ch [here++] != '-') +		continue; +	    if (here == end) +		fatalError ("comments can't end with \"--->\""); +	    if (ch [here++] == '-') +		fatalError ("comments can't contain \"--\""); +	} +	super.comment (ch, start, length); +    } + +    public void startCDATA () +    throws SAXException +    { +	if (!startedDoc) +	    fatalError ("callback outside of document?"); +	if (startedCDATA) +	    fatalError ("CDATA starts can't nest"); +	startedCDATA = true; +	super.startCDATA (); +    } + +    public void endCDATA () +    throws SAXException +    { +	if (!startedDoc) +	    fatalError ("callback outside of document?"); +	if (!startedCDATA) +	    fatalError ("CDATA end without start?"); +	startedCDATA = false; +	super.endCDATA (); +    } +} diff --git a/libjava/gnu/xml/pipeline/XIncludeFilter.java b/libjava/gnu/xml/pipeline/XIncludeFilter.java new file mode 100644 index 00000000000..efa05d942f9 --- /dev/null +++ b/libjava/gnu/xml/pipeline/XIncludeFilter.java @@ -0,0 +1,580 @@ +/* XIncludeFilter.java --  +   Copyright (C) 2001,2002 Free Software Foundation, Inc. + +This file is part of GNU Classpath. + +GNU Classpath is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 2, or (at your option) +any later version. + +GNU Classpath is distributed in the hope that it will be useful, but +WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU +General Public License for more details. + +You should have received a copy of the GNU General Public License +along with GNU Classpath; see the file COPYING.  If not, write to the +Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA +02111-1307 USA. 
+ +Linking this library statically or dynamically with other modules is +making a combined work based on this library.  Thus, the terms and +conditions of the GNU General Public License cover the whole +combination. + +As a special exception, the copyright holders of this library give you +permission to link this library with independent modules to produce an +executable, regardless of the license terms of these independent +modules, and to copy and distribute the resulting executable under +terms of your choice, provided that you also meet, for each linked +independent module, the terms and conditions of the license of that +module.  An independent module is a module which is not derived from +or based on this library.  If you modify this library, you may extend +this exception to your version of the library, but you are not +obligated to do so.  If you do not wish to do so, delete this +exception statement from your version. */ + +package gnu.xml.pipeline; + +import java.io.IOException; +import java.io.InputStream; +import java.io.InputStreamReader; +import java.net.URL;  +import java.net.URLConnection;  +import java.util.Enumeration; +import java.util.Hashtable; +import java.util.Stack; +import java.util.Vector; + +import org.xml.sax.Attributes; +import org.xml.sax.ErrorHandler; +import org.xml.sax.InputSource; +import org.xml.sax.Locator; +import org.xml.sax.SAXException; +import org.xml.sax.SAXParseException; +import org.xml.sax.XMLReader; +import org.xml.sax.helpers.XMLReaderFactory; + +import gnu.xml.util.Resolver; + + + +/** + * Filter to process an XPointer-free subset of + * <a href="http://www.w3.org/TR/xinclude">XInclude</a>, supporting its + * use as a kind of replacement for parsed general entities. + * XInclude works much like the <code>#include</code> of C/C++ but + * works for XML documents as well as unparsed text files. 
+ * Restrictions from the 17-Sept-2002 CR draft of XInclude are as follows: + * + * <ul> + * + * <li> URIs must not include fragment identifiers. + * The CR specifies support for XPointer <em>element()</em> fragment IDs, + * which is not currently implemented here. + * + * <li> <em>xi:fallback</em> handling of resource errors is not + * currently supported. + * + * <li> DTDs are not supported in included files, since the SAX DTD events + * must have completely preceded any included file.  + * The CR explicitly allows the DTD related portions of the infoset to + * grow as an effect of including XML documents. + * + * <li> <em>xml:base</em> fixup isn't done. + * + * </ul> + * + * <p> XML documents that are included will normally be processed using + * the default SAX namespace rules, meaning that prefix information may + * be discarded.  This may be changed with {@link #setSavingPrefixes + * setSavingPrefixes()}.  <em>You are strongly advised to do this.</em> + * + * <p> Note that XInclude allows highly incompatible implementations, which + * are specialized to handle application-specific infoset extensions.  Some + * such implementations can be implemented by subclassing this one, but + * they may only be substituted in applications at "user option". + * + * <p>TBD: "IURI" handling. 
+ * + * @author David Brownell + */ +public class XIncludeFilter extends EventFilter implements Locator +{ +    private Hashtable		extEntities = new Hashtable (5, 5); +    private int			ignoreCount; +    private Stack		uris = new Stack (); +    private Locator		locator; +    private Vector		inclusions = new Vector (5, 5); +    private boolean		savingPrefixes; + +    /** +     */ +    public XIncludeFilter (EventConsumer next) +    throws SAXException +    { +	super (next); +	setContentHandler (this); +	// DTDHandler callbacks pass straight through +	setProperty (DECL_HANDLER, this); +	setProperty (LEXICAL_HANDLER, this); +    } + +    private void fatal (SAXParseException e) throws SAXException +    { +	ErrorHandler		eh; +	 +	eh = getErrorHandler (); +	if (eh != null) +	    eh.fatalError (e); +	throw e; +    } + +    /** +     * Passes "this" down the filter chain as a proxy locator. +     */ +    public void setDocumentLocator (Locator locator) +    { +	this.locator = locator; +	super.setDocumentLocator (this); +    } + +    /** Used for proxy locator; do not call directly. */ +    public String getSystemId () +	{ return (locator == null) ? null : locator.getSystemId (); } +    /** Used for proxy locator; do not call directly. */ +    public String getPublicId () +	{ return (locator == null) ? null : locator.getPublicId (); } +    /** Used for proxy locator; do not call directly. */ +    public int getLineNumber () +	{ return (locator == null) ? -1 : locator.getLineNumber (); } +    /** Used for proxy locator; do not call directly. */ +    public int getColumnNumber () +	{ return (locator == null) ? -1 : locator.getColumnNumber (); } + +    /** +     * Assigns the flag controlling the setting of the SAX2 +     * <em>namespace-prefixes</em> flag. 
+     */ +    public void setSavingPrefixes (boolean flag) +	{ savingPrefixes = flag; } + +    /** +     * Returns the flag controlling the setting of the SAX2 +     * <em>namespace-prefixes</em> flag when parsing included documents. +     * The default value is the SAX2 default (false), which discards +     * information that can be useful. +     */ +    public boolean isSavingPrefixes () +	{ return savingPrefixes; } + +    // +    // Two mechanisms are interacting here. +    //  +    //	- XML Base implies a stack of base URIs, updated both by +    //	  "real entity" boundaries and element boundaries. +    // +    //	- Active "Real Entities" (for document and general entities, +    //	  and by xincluded files) are tracked to prevent circular +    //	  inclusions. +    // +    private String addMarker (String uri) +    throws SAXException +    { +	if (locator != null && locator.getSystemId () != null) +	    uri = locator.getSystemId (); + +	// guard against InputSource objects without system IDs +	if (uri == null) +	    fatal (new SAXParseException ("Entity URI is unknown", locator)); + +	try { +	    URL	url = new URL (uri); + +	    uri = url.toString (); +	    if (inclusions.contains (uri)) +		fatal (new SAXParseException ( +			"XInclude, circular inclusion", locator)); +	    inclusions.addElement (uri); +	    uris.push (url); +	} catch (IOException e) { +	    // guard against illegal relative URIs (Xerces) +	    fatal (new SAXParseException ("parser bug: relative URI", +		locator, e)); +	} +	return uri; +    } + +    private void pop (String uri) +    { +	inclusions.removeElement (uri); +	uris.pop (); +    } + +    // +    // Document entity boundaries get both treatments. 
+    // +    public void startDocument () throws SAXException +    { +	ignoreCount = 0; +	addMarker (null); +	super.startDocument (); +    } + +    public void endDocument () throws SAXException +    { +	inclusions.setSize (0); +	extEntities.clear (); +	uris.setSize (0); +	super.endDocument (); +    } + +    // +    // External general entity boundaries get both treatments. +    // +    public void externalEntityDecl (String name, +    	String publicId, String systemId) +    throws SAXException +    { +	if (name.charAt (0) == '%') +	    return; +	try { +	    URL	url = new URL (locator.getSystemId ()); +	    systemId = new URL (url, systemId).toString (); +	} catch (IOException e) { +	    // what could we do? +	} +	extEntities.put (name, systemId); +    } + +    public void startEntity (String name) +    throws SAXException +    { +	if (ignoreCount != 0) { +	    ignoreCount++; +	    return; +	} + +	String	uri = (String) extEntities.get (name); +	if (uri != null) +	    addMarker (uri); +	super.startEntity (name); +    } + +    public void endEntity (String name) +    throws SAXException +    { +	if (ignoreCount != 0) { +	    if (--ignoreCount != 0) +		return; +	} + +	String	uri = (String) extEntities.get (name); + +	if (uri != null) +	    pop (uri); +	super.endEntity (name); +    } +     +    // +    // element boundaries only affect the base URI stack, +    // unless they're XInclude elements. 
+    // +    public void +    startElement (String uri, String localName, String qName, Attributes atts) +    throws SAXException +    { +	if (ignoreCount != 0) { +	    ignoreCount++; +	    return; +	} + +	URL	baseURI = (URL) uris.peek (); +	String	base; + +	base = atts.getValue ("http://www.w3.org/XML/1998/namespace", "base"); +	if (base == null) +	    uris.push (baseURI); +	else { +	    URL		url; + +	    if (base.indexOf ('#') != -1) +		fatal (new SAXParseException ( +		    "xml:base with fragment: " + base, +		    locator)); + +	    try { +		baseURI = new URL (baseURI, base); +		uris.push (baseURI); +	    } catch (Exception e) { +		fatal (new SAXParseException ( +		    "xml:base with illegal uri: " + base, +		    locator, e)); +	    } +	} + +	if (!"http://www.w3.org/2001/XInclude".equals (uri)) { +	    super.startElement (uri, localName, qName, atts); +	    return; +	} + +	if ("include".equals (localName)) { +	    String	href = atts.getValue ("href"); +	    String	parse = atts.getValue ("parse"); +	    String	encoding = atts.getValue ("encoding"); +	    URL		url = (URL) uris.peek (); +	    SAXParseException	x = null; + +	    if (href == null) +		fatal (new SAXParseException ( +		    "XInclude missing href", +		    locator)); +	    if (href.indexOf ('#') != -1) +		fatal (new SAXParseException ( +		    "XInclude with fragment: " + href, +		    locator)); + +	    if (parse == null || "xml".equals (parse)) +		x = xinclude (url, href); +	    else if ("text".equals (parse)) +		x = readText (url, href, encoding); +	    else +		fatal (new SAXParseException ( +		    "unknown XInclude parsing mode: " + parse, +		    locator)); +	    if (x == null) { +		// strip out all child content +		ignoreCount++; +		return; +	    } + +	    // FIXME the 17-Sept-2002 CR of XInclude says we "must" +	    // use xi:fallback elements to handle resource errors, +	    // if they exist. 
+	    fatal (x); + +	} else if ("fallback".equals (localName)) { +	    fatal (new SAXParseException ( +		"illegal top level XInclude 'fallback' element", +		locator)); +	} else { +	    ErrorHandler	eh = getErrorHandler (); + +	    // CR doesn't say this is an error +	    if (eh != null) +		eh.warning (new SAXParseException ( +		    "unrecognized toplevel XInclude element: " + localName, +		    locator)); +	    super.startElement (uri, localName, qName, atts); +	} +    } + +    public void endElement (String uri, String localName, String qName) +    throws SAXException +    { +	if (ignoreCount != 0) { +	    if (--ignoreCount != 0) +		return; +	} + +	uris.pop (); +	if (!("http://www.w3.org/2001/XInclude".equals (uri) +		&& "include".equals (localName))) +	    super.endElement (uri, localName, qName); +    } + +    // +    // ignore all content within non-empty xi:include elements +    // +    public void characters (char ch [], int start, int length) +    throws SAXException +    { +	if (ignoreCount == 0) +	    super.characters (ch, start, length); +    } + +    public void processingInstruction (String target, String value) +    throws SAXException +    { +	if (ignoreCount == 0) +	    super.processingInstruction (target, value); +    } + +    public void ignorableWhitespace (char ch [], int start, int length) +    throws SAXException +    { +	if (ignoreCount == 0) +	    super.ignorableWhitespace (ch, start, length); +    } + +    public void comment (char ch [], int start, int length) +    throws SAXException +    { +	if (ignoreCount == 0) +	    super.comment (ch, start, length); +    } + +    public void startCDATA () throws SAXException +    { +	if (ignoreCount == 0) +	    super.startCDATA (); +    } + +    public void endCDATA () throws SAXException +    { +	if (ignoreCount == 0) +	    super.endCDATA (); +    } + +    public void startPrefixMapping (String prefix, String uri) +    throws SAXException +    { +	if (ignoreCount == 0) +	    super.startPrefixMapping 
(prefix, uri); +    } + +    public void endPrefixMapping (String prefix) throws SAXException +    { +	if (ignoreCount == 0) +	    super.endPrefixMapping (prefix); +    } + +    public void skippedEntity (String name) throws SAXException +    { +	if (ignoreCount == 0) +	    super.skippedEntity (name); +    } + +    // JDK 1.1 seems to need it to be done this way, sigh +    void setLocator (Locator l) { locator = l; } +    Locator getLocator () { return locator; } +     + +    // +    // for XIncluded entities, manage the current locator and +    // filter out events that would be incorrect to report +    // +    private class Scrubber extends EventFilter +    { +	Scrubber (EventFilter f) +	throws SAXException +	{ +	    // delegation passes to next in chain +	    super (f); + +	    // process all content events +	    super.setContentHandler (this); +	    super.setProperty (LEXICAL_HANDLER, this); + +	    // drop all DTD events +	    super.setDTDHandler (null); +	    super.setProperty (DECL_HANDLER, null); +	} + +	// maintain proxy locator +	// only one startDocument()/endDocument() pair per event stream +	public void setDocumentLocator (Locator l) +	    { setLocator (l); } +	public void startDocument () +	    { } +	public void endDocument () +	    { } +	 +	private void reject (String message) throws SAXException +	    { fatal (new SAXParseException (message, getLocator ())); } +	 +	// only the DTD from the "base document" gets reported +	public void startDTD (String root, String publicId, String systemId) +	throws SAXException +	    { reject ("XIncluded DTD: " + systemId); } +	public void endDTD () +	throws SAXException +	    { reject ("XIncluded DTD"); } +	// ... 
so this should never happen +	public void skippedEntity (String name) throws SAXException +	    { reject ("XInclude skipped entity: " + name); } + +	// since we rejected DTDs, only builtin entities can be reported +    } + +    // <xi:include parse='xml' ...> +    // relative to the base URI passed +    private SAXParseException xinclude (URL url, String href) +    throws SAXException +    { +	XMLReader	helper; +	Scrubber	scrubber; +	Locator		savedLocator = locator; + +	// start with a parser acting just like our input +	// modulo DTD-ish stuff (validation flag, entity resolver) +	helper = XMLReaderFactory.createXMLReader (); +	helper.setErrorHandler (getErrorHandler ()); +	helper.setFeature (FEATURE_URI + "namespace-prefixes", true); + +	// Set up the proxy locator and event filter. +	scrubber = new Scrubber (this); +	locator = null; +	bind (helper, scrubber); + +	// Merge the included document, except its DTD +	try { +	    url = new URL (url, href); +	    href = url.toString (); + +	    if (inclusions.contains (href)) +		fatal (new SAXParseException ( +			"XInclude, circular inclusion", locator)); + +	    inclusions.addElement (href); +	    uris.push (url); +	    helper.parse (new InputSource (href)); +	    return null; +	} catch (java.io.IOException e) { +	    return new SAXParseException (href, locator, e); +	} finally { +	    pop (href); +	    locator = savedLocator; +	} +    } + +    // <xi:include parse='text' ...> +    // relative to the base URI passed +    private SAXParseException readText (URL url, String href, String encoding) +    throws SAXException +    { +	InputStream	in = null; + +	try { +	    URLConnection	conn; +	    InputStreamReader	reader; +	    char		buf [] = new char [4096]; +	    int			count; + +	    url = new URL (url, href); +	    conn = url.openConnection (); +	    in = conn.getInputStream (); +	    if (encoding == null) +		encoding = Resolver.getEncoding (conn.getContentType ()); +	    if (encoding == null) { +		ErrorHandler	eh = 
getErrorHandler (); +		if (eh != null) +		    eh.warning (new SAXParseException ( +			"guessing text encoding for URL: " + url, +			locator)); +		reader = new InputStreamReader (in); +	    } else +		reader = new InputStreamReader (in, encoding); + +	    while ((count = reader.read (buf, 0, buf.length)) != -1) +		super.characters (buf, 0, count); +	    in.close (); +	    return null; +	} catch (IOException e) { +	    return new SAXParseException ( +		"can't XInclude text", +		locator, e); +	} +    } +} diff --git a/libjava/gnu/xml/pipeline/XsltFilter.java b/libjava/gnu/xml/pipeline/XsltFilter.java new file mode 100644 index 00000000000..b1bebbe98a5 --- /dev/null +++ b/libjava/gnu/xml/pipeline/XsltFilter.java @@ -0,0 +1,131 @@ +/* XsltFilter.java --  +   Copyright (C) 2001 Free Software Foundation, Inc. + +This file is part of GNU Classpath. + +GNU Classpath is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 2, or (at your option) +any later version. + +GNU Classpath is distributed in the hope that it will be useful, but +WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU +General Public License for more details. + +You should have received a copy of the GNU General Public License +along with GNU Classpath; see the file COPYING.  If not, write to the +Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA +02111-1307 USA. + +Linking this library statically or dynamically with other modules is +making a combined work based on this library.  Thus, the terms and +conditions of the GNU General Public License cover the whole +combination. 
+ +As a special exception, the copyright holders of this library give you +permission to link this library with independent modules to produce an +executable, regardless of the license terms of these independent +modules, and to copy and distribute the resulting executable under +terms of your choice, provided that you also meet, for each linked +independent module, the terms and conditions of the license of that +module.  An independent module is a module which is not derived from +or based on this library.  If you modify this library, you may extend +this exception to your version of the library, but you are not +obligated to do so.  If you do not wish to do so, delete this +exception statement from your version. */ + +package gnu.xml.pipeline; + +import java.io.IOException; + +import javax.xml.transform.TransformerFactory; +import javax.xml.transform.TransformerConfigurationException; +import javax.xml.transform.sax.*; +import javax.xml.transform.stream.StreamSource; + +import org.xml.sax.ContentHandler; +import org.xml.sax.SAXException; +import org.xml.sax.ext.LexicalHandler; + + +/** + * Packages an XSLT transform as a pipeline component.  + * Note that all DTD events (callbacks to DeclHandler and DTDHandler  + * interfaces) are discarded, although XSLT transforms may be set up to + * use the LexicalHandler to write DTDs with only an external subset. + * Not every XSLT engine will necessarily be usable with this filter, + * but current versions of + * <a href="http://saxon.sourceforge.net">SAXON</a> and + * <a href="http://xml.apache.org/xalan-j">Xalan</a> should work well. + * + * @see TransformerFactory + * + * @author David Brownell + */ +final public class XsltFilter extends EventFilter +{ +    /** +     * Creates a filter that performs the specified transform. 
+     * Uses the JAXP 1.1 interfaces to access the default XSLT +     * engine configured for in the current execution context, +     * and parses the stylesheet without custom EntityResolver +     * or ErrorHandler support. +     * +     * @param stylesheet URI for the stylesheet specifying the +     *	XSLT transform +     * @param next provides the ContentHandler and LexicalHandler +     *	to receive XSLT output. +     * @exception SAXException if the stylesheet can't be parsed +     * @exception IOException if there are difficulties +     *	bootstrapping the XSLT engine, such as it not supporting +     *	SAX well enough to use this way. +     */ +    public XsltFilter (String stylesheet, EventConsumer next) +    throws SAXException, IOException +    { +	// First, get a transformer with the stylesheet preloaded +	TransformerFactory	tf = null; +	TransformerHandler	th; + +	try { +	    SAXTransformerFactory	stf; + +	    tf = TransformerFactory.newInstance (); +	    if (!tf.getFeature (SAXTransformerFactory.FEATURE)	// sax inputs +		    || !tf.getFeature (SAXResult.FEATURE)	// sax outputs +		    || !tf.getFeature (StreamSource.FEATURE)	// stylesheet +		    ) +		throw new IOException ("XSLT factory (" +		    + tf.getClass ().getName () +		    + ") does not support SAX"); +	    stf = (SAXTransformerFactory) tf; +	    th = stf.newTransformerHandler (new StreamSource (stylesheet)); +	} catch (TransformerConfigurationException e) { +	    throw new IOException ("XSLT factory (" +		+ (tf == null +			? "none available" +			: tf.getClass ().getName ()) +		+ ") configuration error, " +		+ e.getMessage () +		); +	} + +	// Hook its outputs up to the pipeline ... +	SAXResult		out = new SAXResult (); + +	out.setHandler (next.getContentHandler ()); +	try { +	    LexicalHandler	lh; +	    lh = (LexicalHandler) next.getProperty (LEXICAL_HANDLER); +	    out.setLexicalHandler (lh); +	} catch (Exception e) { +	    // ignore +	} +	th.setResult (out); + +	// ... 
and make sure its inputs look like ours. +	setContentHandler (th); +	setProperty (LEXICAL_HANDLER, th); +    } +} diff --git a/libjava/gnu/xml/pipeline/package.html b/libjava/gnu/xml/pipeline/package.html new file mode 100644 index 00000000000..352f4c87c2c --- /dev/null +++ b/libjava/gnu/xml/pipeline/package.html @@ -0,0 +1,255 @@ +<html><head><title> +blah +<!-- +/* + * Copyright (C) 1999-2001 The Free Software Foundation, Inc. + */ +--> +</title></head><body> + +<p>This package exposes a kind of XML processing pipeline, based on sending +SAX events, which can be used as components of application architectures. +Pipelines are used to convey streams of processing events from a producer +to one or more consumers, and to let each consumer control the data seen by +later consumers. + +<p> There is a <a href="PipelineFactory.html">PipelineFactory</a> class which +accepts a syntax describing how to construct some simple pipelines.  Strings +describing such pipelines can be used in command line tools (see the +<a href="../util/DoParse.html">DoParse</a> class) +and in other places that it is +useful to let processing be easily reconfigured.  Pipelines can of course +be constructed programmatically, providing access to options that the +factory won't. + +<p> Web applications are supported by making it easy for servlets (or +non-Java web application components) to be part of a pipeline.  They can +originate XML (or XHTML) data through an <em>InputSource</em> or in +response to XML messages sent from clients using <em>CallFilter</em> +pipeline stages.  Such facilities are available using the simple syntax +for pipeline construction. + + +<h2> Programming Models </h2> + +<p> Pipelines should be simple to understand. + +<ul> +    <li> XML content, typically entire documents, +    is pushed through consumers by producers. 
+ +    <li> Pipelines are basically about consuming SAX2 callback events, +    where the events encapsulate XML infoset-level data.<ul> + +	<li> Pipelines are constructed by taking one or more consumer +	stages and combining them to produce a composite consumer. + +	<li> A pipeline is presumed to have pending tasks and state from +	the beginning of its ContentHandler.startDocument() callback until +	it has returned from its ContentHandler.endDocument() callback. + +	<li> Pipelines may have multiple output stages ("fan-out") +	or multiple input stages ("fan-in") when appropriate. + +	<li> Pipelines may be long-lived, but need not be. + +	</ul> + +    <li> There is flexibility about event production. <ul> + +	<li> SAX2 XMLReader objects are producers, which +	provide a high level "pull" model: documents (text or DOM) are parsed, +	and the parser pushes individual events through the pipeline. + +	<li> Events can be pushed directly to event consumer components +	by application modules, if they invoke SAX2 callbacks directly. +	That is, application modules use the XML Infoset as exposed +	through SAX2 event callbacks. + +	</ul> +     +    <li> Multiple producer threads may concurrently access a pipeline, +    if they coordinate appropriately. + +    <li> Pipeline processing is not the only framework applications +    will use. + +    </ul> + + +<h3> Producers: XMLReader or Custom </h3> + +<p> Many producers will be SAX2 XMLReader objects, and +will read (pull) data which is then written (pushed) as events. +Typically these will parse XML text (using a parser acquired from +<code>org.xml.sax.helpers.XMLReaderFactory</code>) or a DOM tree +(using a <code><a href="../util/DomParser.html">DomParser</a></code>). +These may be bound to an event consumer using a convenience routine, +<em><a href="EventFilter.html">EventFilter</a>.bind()</em>. +Once bound, these producers may be given additional documents to +send through that pipeline. + +<p> In other cases, you will write producers yourself.  For example, some
For example, some +data structures might know how to write themselves out using one or +more XML models, expressed as sequences of SAX2 event callbacks. +An application module might +itself be a producer, issuing startDocument and endDocument events +and then asking those data structures to write themselves out to a +given EventConsumer, or walking data structures (such as JDBC query +results) and applying its own conversion rules.  WAP format XML +(WBMXL) can be directly converted to producer output. + +<p> SAX2 introduced an "XMLFilter" interface, which is a kind of XMLReader. +It is most useful in conjunction with its XMLFilterImpl helper class; +see the <em><a href="EventFilter.html">EventFilter</a></em> javadoc +for information contrasting that XMLFilterImpl approach with the +relevant parts of this pipeline framework.  Briefly, such XMLFilterImpl +children can be either producers or consumers, and are more limited in +configuration flexibility.  In this framework, the focus of filters is +on the EventConsumer side; see the section on +<a href="#fitting">pipe fitting</a> below. + + +<h3> Consume to Standard or Custom Data Representations </h3> + +<p> Many consumers will be used to create standard representations of XML +data.  The <a href="TextConsumer.html">TextConsumer</a> takes its events +and writes them as text for a single XML document, +using an internal <a href="../util/XMLWriter.html">XMLWriter</a>. +The <a href="DomConsumer.html">DomConsumer</a> takes its events and uses +them to create and populate a DOM Document. + +<p> In other cases, you will write consumers yourself.  For example, +you might use a particular unmarshaling filter to produce objects +that fit your application's requirements, instead of using DOM. +Such consumers work at the level of XML data models, rather than with +specific representations such as XML text or a DOM tree.  You could +convert your output directly to WAP format data (WBXML). 
+
+
+<h3><a name="fitting">Pipe Fitting</a></h3>
+
+<p> Pipelines are composite event consumers, with each stage having
+the opportunity to transform the data before delivering it to any
+subsequent stages.
+
+<p> The <a href="PipelineFactory.html">PipelineFactory</a> class
+provides access to much of this functionality through a simple syntax.
+See the table in that class's javadoc describing a number of standard
+components.  Direct API calls are still needed for many of the most
+interesting pipeline configurations, including ones leveraging actual
+or logical concurrency.
+
+<p> Four basic types of pipe fitting are directly supported.  These may
+be used to construct complex pipeline networks.  <ul>
+
+    <li> <a href="TeeConsumer.html">TeeConsumer</a> objects split event
+    flow so it goes to two different consumers, one before the other.
+    This is a basic form of event fan-out; you can use this class to
+    copy events to any number of output pipelines.
+
+    <li> Clients can call remote components through HTTP or HTTPS using
+    the <a href="CallFilter.html">CallFilter</a> component, and Servlets
+    can implement such components by extending the
+    <a href="XmlServlet.html">XmlServlet</a> component.  Java is not
+    required on either end, and transport protocols other than HTTP may
+    also be used.
+
+    <li> <a href="EventFilter.html">EventFilter</a> objects selectively
+    provide handling for callbacks, and can pass unhandled ones to a
+    subsequent stage.  They are often subclassed, since much of the
+    basic filtering machinery is already in place in the base class.
+
+    <li> Applications can merge two event flows by just using the same
+    consumer in each one.  If multiple threads are in use, synchronization
+    needs to be addressed by the appropriate application-level policy.
+
+    </ul>
+
+<p> Note that filters can be as complex as
+<a href="XsltFilter.html">XSLT transforms</a>
+(where available) on input data, or as simple as removing simple syntax data
+such as ignorable whitespace, comments, and CDATA delimiters.
+Some simple "built-in" filters are part of this package.
+
+
+<h3> Coding Conventions:  Filter and Terminus Stages</h3>
+
+<p> If you follow these coding conventions, your classes may be used
+directly (by giving the full class name) in pipeline descriptions as understood
+by the PipelineFactory.  There are four constructors the factory may
+try to use; in order of decreasing numbers of parameters, these are: <ul>
+
+    <li> Filters that need a single String setup parameter should have
+    a public constructor with two parameters:  that string, then the
+    EventConsumer holding the "next" consumer to get events.
+
+    <li> Filters that don't need setup parameters should have a public
+    constructor that accepts a single EventConsumer holding the "next"
+    consumer to get events when they are done.
+
+    <li> Terminus stages may have a public constructor taking a single
+    parameter:  the string value of the stage's setup parameter.
+
+    <li> Terminus stages may have a public no-parameters constructor.
+
+    </ul>
+
+<p> Of course, classes may support more than one such usage convention;
+if they do, they can automatically be used in multiple modes.  If you
+try to use a terminus class as a filter, and that terminus has a constructor
+with the appropriate number of arguments, it is automatically wrapped in
+a "tee" filter.
+
+
+<h2> Debugging Tip: "Tee" Joints can Snapshot Data</h2>
+
+<p> It can sometimes be hard to see what's happening when something
+goes wrong.  That's easily fixed:  just snapshot the data.  Then you can find
+out where things start to go wrong.
+
+<p> If you're using pipeline descriptors so that they're easily
+administered, just stick a <em>write ( filename )</em>
+filter into the pipeline at an appropriate point.
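<p> The tee-and-snapshot idea can be sketched with plain SAX2 interfaces. This is not the package's TeeConsumer: SimpleTee and Recorder are hypothetical names, and SimpleTee forwards only ContentHandler callbacks (a fuller tee would presumably also forward lexical, DTD, and error callbacks). Each event goes to the snapshot stage first and then to the rest of the pipeline, so the recorded trace shows what downstream stages received.

```java
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class TeeSketch {

    // Hypothetical two-way tee: each ContentHandler callback is delivered
    // to "first" (here, a snapshot stage) and then to "rest".
    static final class SimpleTee implements ContentHandler {
        private final ContentHandler first;
        private final ContentHandler rest;

        SimpleTee(ContentHandler first, ContentHandler rest) {
            this.first = first;
            this.rest = rest;
        }

        public void setDocumentLocator(Locator locator) {
            first.setDocumentLocator(locator); rest.setDocumentLocator(locator);
        }
        public void startDocument() throws SAXException {
            first.startDocument(); rest.startDocument();
        }
        public void endDocument() throws SAXException {
            first.endDocument(); rest.endDocument();
        }
        public void startPrefixMapping(String prefix, String uri) throws SAXException {
            first.startPrefixMapping(prefix, uri); rest.startPrefixMapping(prefix, uri);
        }
        public void endPrefixMapping(String prefix) throws SAXException {
            first.endPrefixMapping(prefix); rest.endPrefixMapping(prefix);
        }
        public void startElement(String uri, String local, String qName, Attributes atts)
                throws SAXException {
            first.startElement(uri, local, qName, atts); rest.startElement(uri, local, qName, atts);
        }
        public void endElement(String uri, String local, String qName) throws SAXException {
            first.endElement(uri, local, qName); rest.endElement(uri, local, qName);
        }
        public void characters(char[] ch, int start, int length) throws SAXException {
            first.characters(ch, start, length); rest.characters(ch, start, length);
        }
        public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException {
            first.ignorableWhitespace(ch, start, length); rest.ignorableWhitespace(ch, start, length);
        }
        public void processingInstruction(String target, String data) throws SAXException {
            first.processingInstruction(target, data); rest.processingInstruction(target, data);
        }
        public void skippedEntity(String name) throws SAXException {
            first.skippedEntity(name); rest.skippedEntity(name);
        }
    }

    // Hypothetical snapshot stage: records element and text events as a trace.
    static final class Recorder extends DefaultHandler {
        final StringBuilder trace = new StringBuilder();

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            trace.append('<').append(qName).append('>');
        }
        @Override
        public void endElement(String uri, String local, String qName) {
            trace.append("</").append(qName).append('>');
        }
        @Override
        public void characters(char[] ch, int start, int length) {
            trace.append(ch, start, length);
        }
    }

    // A tiny producer that pushes one document's events into a handler.
    static void feed(ContentHandler handler) throws SAXException {
        handler.startDocument();
        handler.startElement("", "note", "note", new AttributesImpl());
        char[] text = "hi".toCharArray();
        handler.characters(text, 0, text.length);
        handler.endElement("", "note", "note");
        handler.endDocument();
    }

    public static void main(String[] args) throws SAXException {
        Recorder snapshot = new Recorder();
        Recorder downstream = new Recorder();
        feed(new SimpleTee(snapshot, downstream));
        System.out.println(snapshot.trace);   // prints <note>hi</note>
        System.out.println(downstream.trace); // prints <note>hi</note>
    }
}
```

In a real pipeline, the snapshot branch would more likely be a text-writing consumer backed by a file or StringWriter, and the second branch would be the rest of the pipeline rather than another recorder.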
+
+<p> Inside your programs, you can do the same thing directly: perhaps
+by saving a Writer (such as a StringWriter) in a variable, using that
+to create a TextConsumer, and making that the first part of a tee,
+then splicing that tee into your pipeline at a convenient location.
+
+<p> You can also use a DomConsumer to buffer the data, but remember
+that DOM doesn't save all the information that XML provides, so
+DOM snapshots are relatively low-fidelity.  They also are substantially more
+expensive in terms of memory than a StringWriter holding similar data.
+
+<h2> Debugging Tip: Non-XML Producers</h2>
+
+<p> Producers in pipelines don't need to start from XML
+data structures, such as text in XML syntax (likely coming
+from some <em>XMLReader</em> that parses XML) or a
+DOM representation (perhaps with a
+<a href="../util/DomParser.html">DomParser</a>).
+
+<p> One common type of event producer will instead make
+direct calls to SAX event handlers returned from an
+<a href="EventConsumer.html">EventConsumer</a>,
+for example making <em>ContentHandler.startElement</em>
+calls and matching <em>ContentHandler.endElement</em> calls.
+
+<p> Applications making such calls can catch certain
+common "syntax errors" by using a
+<a href="WellFormednessFilter.html">WellFormednessFilter</a>.
+That filter will detect (and report) erroneous input data
+such as mismatched document, element, or CDATA start/end calls.
+Use such a filter near the head of the pipeline that your
+producer feeds, at least while debugging, to help ensure that
+you're providing legal XML Infoset data.
+
+<p> You can also arrange to validate data on the fly.
+For DTD validation, you can configure a
+<a href="ValidationConsumer.html">ValidationConsumer</a>
+to work as a filter, using any DTD you choose.
+Other validation schemes can be handled with other
+validation filters.
+
+</body></html>

