Changes from 1.2 to 1.2.1
|
=========================
|
Match DOCTYPE case-blind
|
Extend PushbackReader's size for oddball cases like & followed by CR
|
Leo Sutic's 2x-4x speedup by precompiling HTMLScanner table
|
|
Changes from 1.1.3 to 1.2
|
=========================
|
Changed license to Apache 2.0
|
Bogon default model is now ANY, not EMPTY
|
Support new DOCTYPE output switches --doctype-system and --doctype-public
|
Support new XML declaration output switches --standalone and --version
|
New --norootbogons switch makes bogons children of the root
|
Don't resolve entity references in attribute values unless semicolon-terminated
|
Support character entities above U+FFFF
|
Add character entities from the 2007-12-14 draft of xml-entity-names
|
Call SAX events startPrefixMapping and endPrefixMapping to report prefixes
|
Clean up newline processing, shrinking html.stml considerably
|
Allow link elements in the body as well as the head, to avoid excess bodies
|
Allow tables inside paragraphs
|
Allow cells and forms in thead and tfoot elements without intervening tr element
|
The span element is no longer restartable
|
Support non-standard elements bgsound, blink, canvas, comment, listing,
|
marquee, nobr, ruby, rbc, rtc, rb, rt, rp, wbr, xmp
|
In HTML mode, boolean attributes like checked are output in minimized form
|
Correctly handle runs of less-than characters
|
Suppress all but the first DOCTYPE declaration
|
Modify PI targets containing colons to have underscores instead
|
The case of element tags is now canonicalized to the schema
|
PI targets are no longer forced to lower case
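
For readers unfamiliar with the prefix-mapping callbacks mentioned in this release, here is a minimal sketch of a handler that records them. It runs against the JDK's built-in SAX parser rather than TagSoup itself, and the names `PrefixMappingSketch` and `collect` are hypothetical. Only `startPrefixMapping` and `endPrefixMapping` are standard SAX2 `ContentHandler` methods.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Illustration only: uses the JDK parser, not TagSoup's Parser class.
final class PrefixMappingSketch {

    // Parses the given XML text and returns the prefix-mapping events seen, in order.
    static List<String> collect(String xml) {
        final List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startPrefixMapping(String prefix, String uri) {
                events.add("start " + prefix + "=" + uri);
            }

            @Override
            public void endPrefixMapping(String prefix) {
                events.add("end " + prefix);
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // required for prefix-mapping events
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(collect("<a xmlns:x='urn:x'><x:b/></a>"));
    }
}
```

The same handler could in principle be passed to any SAX2 `XMLReader`, which is how an application would see the events this release starts reporting.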
|
|
Changes from 1.1.2 to 1.1.3
|
===========================
|
Allow Parser.set* methods to accept null
|
Allow setting the LexicalHandler feature to be null
|
In both cases, null means "use default behavior"
|
|
Changes from 1.1.1 to 1.1.2
|
===========================
|
Setting CDATAElementsFeature didn't really set CDATAElements instance variable
|
|
Changes from 1.1 to 1.1.1
|
=========================
|
Removed lexical handler calls to startCDATA/endCDATA from CDATA element handling
|
Added lexical handler calls to startCDATA/endCDATA from CDATA section handling
|
Added CDATAElementsFeature, the programmatic equivalent of the --nocdata switch
|
|
Changes from 1.0.5 to 1.1
|
=========================
|
Add Tatu Saloranta's JAXP support package
|
|
Changes from 1.0.4 to 1.0.5
|
===========================
|
Major repairs to comment scanning
|
Skip leading BOM
|
Comment out debugging code in PYXWriter
|
Allow &#X as well as &#x
|
Add net.sf.saxon to list of supported XSLT engines
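
The `&#X` entry can be illustrated with a small sketch of numeric character reference decoding that accepts either case of the hex marker. This is not TagSoup's scanner code. The class `CharRefSketch` and its `decode` method are hypothetical, and the input is assumed to be the text between `&#` and `;`.

```java
// Illustration only: a hypothetical helper, not part of TagSoup.
final class CharRefSketch {

    // Decodes the body of a numeric character reference, for example
    // "x41", "X41" (hexadecimal) or "65" (decimal).
    static String decode(String body) {
        int codePoint;
        if (body.startsWith("x") || body.startsWith("X")) {
            codePoint = Integer.parseInt(body.substring(1), 16);
        } else {
            codePoint = Integer.parseInt(body, 10);
        }
        // Character.toChars yields a surrogate pair for code points above U+FFFF.
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        System.out.println(decode("x41"));
        System.out.println(decode("X41"));
        System.out.println(decode("65"));
    }
}
```

Malformed input makes `Integer.parseInt` or `Character.toChars` throw. A tolerant parser would presumably catch that and pass the text through unchanged.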
|
|
Changes from 1.0.3 to 1.0.4
|
===========================
|
Certain options were mutually exclusive that should not have been
|
Blocked XML declaration from specifying an encoding of ""
|
--method=html was not doing the right thing
|
|
Changes from 1.0.2 to 1.0.3
|
===========================
|
Fixed build file to use Java target version 1.4
|
Fixed --version switch to print the right thing
|
|
Changes from 1.0.1 to 1.0.2
|
===========================
|
Version attribute default value removed from html element
|
Leading and trailing hyphens now trimmed properly from comments
|
Added --output-encoding switch to control encoding
|
If output encoding is Unicode, don't generate character references
|
Whitespace compressed and junk stripped from public identifiers
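
The output-encoding entries amount to this rule: emit a character reference only when the chosen encoding cannot represent the character. A minimal sketch of that rule follows. The class `EncodingSketch` and its `escape` method are hypothetical and do not reflect how XMLWriter is actually written.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Illustration only: a hypothetical helper, not TagSoup's XMLWriter.
final class EncodingSketch {

    // Replaces each code point the charset cannot encode with a decimal
    // character reference, and leaves everything else untouched.
    static String escape(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            String piece = new String(Character.toChars(codePoint));
            if (encoder.canEncode(piece)) {
                out.append(piece);
            } else {
                out.append("&#").append(codePoint).append(';');
            }
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("caf\u00e9", StandardCharsets.US_ASCII));
        System.out.println(escape("caf\u00e9", StandardCharsets.UTF_8));
    }
}
```

With a Unicode encoding such as UTF-8 every code point is encodable, so no references are generated, which matches the behavior the entry describes.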
|
|
Changes from 1.0 to 1.0.1
|
=========================
|
Added ignorableWhitespaceFeature and --ignorable to report ignorable whitespace
|
Patch due to David Pashley
|
Insert spaces to break up -- in comments
|
Change bogus chars in publicids to spaces
|
--lexical switch now outputs DOCTYPE if there is one
|
Remove unnecessary blank line after XML declaration
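
XML forbids "--" inside a comment, which is why this release breaks such runs up with spaces. One way to do that is sketched below. The class `CommentSketch` and its `breakUpHyphens` method are hypothetical, and the actual TagSoup implementation may differ.

```java
// Illustration only: a hypothetical helper, not TagSoup's writer code.
final class CommentSketch {

    // Inserts a space between adjacent hyphens so that the result
    // never contains the sequence "--".
    static String breakUpHyphens(String comment) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < comment.length(); i++) {
            char c = comment.charAt(i);
            boolean previousIsHyphen = out.length() > 0 && out.charAt(out.length() - 1) == '-';
            if (c == '-' && previousIsHyphen) {
                out.append(' ');
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(breakUpHyphens("a--b"));
        System.out.println(breakUpHyphens("---"));
    }
}
```

This sketch does not handle a comment that ends in a single hyphen, which would still collide with the closing `-->` delimiter.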
|
|
Changes from 1.0rc9 to 1.0
|
==========================
|
Added feature to control restartability
|
Patch due to Nikita Zhuk
|
Added corresponding --norestart switch in CommandLine
|
Made translate-colons feature actually work
|
|
Changes from 1.0rc8 to 1.0rc9
|
=============================
|
If there is a publicid but no systemid, set systemid to ""
|
|
Changes from 1.0rc7 to 1.0rc8
|
=============================
|
Fixed paper-bag bug (source didn't match binary in release)
|
|
Changes from 1.0rc6 to 1.0rc7
|
=============================
|
LexicalHandler now gets DOCTYPE information (publicid and systemid)
|
Patch due to Mike Bremford
|
HTMLScanner now reports more useful debug output when not commented out
|
Patch due to Mike Bremford
|
Change "<memberOfAny>" to exclude "<root>" pseudo-element
|
This prevents "script" from being output as a root
|
The shared HTMLParser object has been eliminated
|
|
Changes from 1.0rc5 to 1.0rc6
|
=============================
|
If namespaceFeature is false, uri and localname are passed as empty strings
|
The namespacePrefixesFeature is now always false
|
Command line switch --nons no longer affects namespacePrefixesFeature
|
Command line switch --html now implies --nons
|
XMLWriter is now told directly to use the schema's URI as default namespace
|
XMLWriter now takes the element name from the qname if localname is empty
|
|
Changes from 1.0rc4 to 1.0rc5
|
=============================
|
The --nodefaults switch now removes only default attributes, not all of them
|
Added --nocolons switch and translate-colons feature to convert ":"
|
in names to "_" (thus suppressing namespaces other than the basic one)
|
The root element can be unknown without problem
|
Empty <script/> and <style/> tags now work
|
Added all standard SAX2 features to feature hashtable
|
Reimplemented namespacePrefixes feature (broken since 1.0rc3)
|
|
Changes from 1.0rc3 to 1.0rc4
|
=============================
|
Remove trailing ? from processing instructions (in case the input is XHTML)
|
Added Javadocs for all SAX standard and TagSoup-specific features and properties
|
Fixed termination conditions for entity/character references
|
Fixed EOF-pushback bug that was generating bogus &#x65535; references
|
Added Parser feature and --nodefaults switch to ignore default attribute values
|
Added support for SAX Locator
|
Updated AFL license to version 3.0
|
Scanner buffer size increases as needed, allowing large attribute values
|
Look for various XSLT implementations as available (still fails in raw 5.0)
|
Clean up handling of XML empty tags and SGML minimized end-tags
|
Support proper options and help message internally
|
Use Hashtable in CommandLine class instead of HashMap
|
Do proper buffering of InputStream and Reader
|
Clean up content model of noframes element
|
Removed htmlMode in XMLWriter
|
Added support for XSLT output options METHOD=html and OMIT_XML_DECLARATION=yes
|
Command line option --html sets both of these
|
Wrote simple validator for TSSL schemas (tssl/tssl-validator.xslt)
|
Removed various validity problems in html.tssl
|
When processing a start-tag, don't restart elements that aren't in the new
|
element's content model
|
Remove bogus double param in tssl.xslt
|
|
Changes from 1.0rc2 to 1.0rc3
|
=============================
|
Convert CR and CRLF to LF in comments and PIs
|
Force empty elements to close immediately
|
Match close tags of CDATA elements more precisely (but case-blind)
|
Process switches on the command line
|
Man page available
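
The newline conversion in this release follows the usual XML line-end rule: CRLF and a lone CR both become LF. A minimal sketch, with the hypothetical names `NewlineSketch` and `normalize`, and not taken from the TagSoup sources:

```java
// Illustration only: a hypothetical helper, not TagSoup's scanner code.
final class NewlineSketch {

    // Converts CRLF pairs first, then any remaining lone CR, to LF.
    static String normalize(String text) {
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }

    public static void main(String[] args) {
        String normalized = normalize("a\r\nb\rc\nd");
        System.out.println(normalized.equals("a\nb\nc\nd"));
    }
}
```

Handling CRLF before lone CR matters: doing it in the other order would turn each CRLF into two line feeds.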
|
|
Changes from 1.0rc1 to 1.0rc2
|
=============================
|
Isolated & and &# now don't crash parser
|
TagSoup no longer depends on /dev/stdin existing
|
Refactored Parser class, removing main method to new CommandLine class
|
Changes to content models of form, button, table, and tr elements in html.tssl
|
'</scr' + 'ipt>' in a script element no longer terminates it
|
Introduced "uncloseability" of form and table elements
|
"pyxin" property specifies that input is in PYX format
|
Correctly cope with unexpected characters around colons, also with multiple colons
|
Correctly output comments with "--" in them (by adding a space)
|
|
Changes from 0.10.2 to 1.0rc1
|
=============================
|
Script can now appear anywhere
|
Switch -nocdata correctly implemented
|
Eliminated useless M_n constants in Schema
|
Introduced <memberOfAny> and <isRoot> as alternatives to
|
<memberOf> in TSSL
|
Allow prefixes in element names
|
Attributes are now normalized
|
Expanded public API for Element and ElementType
|
Javadoc improved
|
|
Changes from 0.10.1 to 0.10.2
|
=============================
|
Removed misfeature whereby > terminated a tag even inside quotes
|
Added licensing language to XSLT scripts, RELAX NG schemas
|
Removed long-standing mishandling of entity references in attributes
|
Cleaned up logic for converting junky strings to proper XML Names
|
Correctly handle empty tag that has no whitespace or attributes
|
Restore correct 0.9.3 handling of an apparent end-tag in a CDATA element
|
Added script element to content model of head element
|
|
Changes from 0.9.7 to 0.10.1 (there is no 0.10.0):
|
==================================================
|
Convert to XSLT configuration exclusively;
|
Perl code and tab-separated tables are gone
|
Remove xmlns:* attributes
|
Append "_" to attribute names ending in ":"
|
Don't prepend "_" to an attribute name starting in "_"
|
Handle namespace prefixes in attributes:
|
"xml" prefix is handled correctly
|
other prefixes are mapped to "urn:x-prefix:foo"
|
Ignore XML declarations
|
-Dnocdata=true turns off F_CDATA on script and style elements
|
Fixed off-by-one errors in character references that made them uninterpreted
|
Start-tags ending in a minimized attribute are no longer being dropped
|
XML empty tags are now supported (though slashes are still allowed in
|
unquoted attribute values)
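
The attribute-name and prefix rules in this release can be sketched as two small functions. Everything below is a hypothetical illustration rather than TagSoup's code. It also folds in the 0.9.6 rule that "_" is prepended to names not beginning with a letter, and it assumes that handling the "xml" prefix correctly means mapping it to the standard XML namespace URI.

```java
// Illustration only: hypothetical helpers, not TagSoup's Parser code.
final class AttributeNameSketch {

    // Assumption: "xml" maps to the standard XML namespace;
    // every other prefix maps to "urn:x-prefix:" plus the prefix.
    static String uriForPrefix(String prefix) {
        if (prefix.equals("xml")) {
            return "http://www.w3.org/XML/1998/namespace";
        }
        return "urn:x-prefix:" + prefix;
    }

    // Repairs an attribute name: appends "_" after a trailing ":" and
    // prepends "_" unless the name already starts with a letter or "_".
    static String fixName(String name) {
        if (name.isEmpty()) {
            return "_";
        }
        StringBuilder out = new StringBuilder(name);
        if (name.endsWith(":")) {
            out.append('_');
        }
        char first = name.charAt(0);
        if (!Character.isLetter(first) && first != '_') {
            out.insert(0, '_');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(uriForPrefix("foo"));
        System.out.println(fixName("foo:"));
        System.out.println(fixName("1a"));
        System.out.println(fixName("_x"));
    }
}
```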
|
|
Changes from 0.9.6 to 0.9.7:
|
============================
|
Upgraded AFL to version 2.1
|
Passed through newlines in character content (very old bug)
|
|
Changes from 0.9.5 to 0.9.6:
|
============================
|
Script element can appear directly in body
|
">" terminates a start-tag even inside a quoted attribute,
|
to protect against unbalanced quotes
|
"_" is prepended to attributes that don't begin with a letter
|
Remove "xmlns" attributes from the input
|
All standard features can now be set
|
(although there is no effect from doing so)
|
New "bogons-empty" feature can be set to false to give bogons
|
content model of ANY rather than EMPTY;
|
-Dany switch sets this feature to false
|
TSSL now has an explicit group element to declare an element group
|
STML is a new XML format for modeling state-table changes
|
License updated to AFL 2.1
|
|
Changes from 0.9.4 to 0.9.5:
|
============================
|
S in the statetable now means \r and \n and \t as well as space
|
(as was always intended; brain fart!)
|
Ins and del elements are now allowed everywhere
|
TSSL now correctly supports attributes that are legal on all elements
|
|
Changes from 0.9.3 to 0.9.4:
|
============================
|
Fixed paper-bag bug that revealed attribute type BOOLEAN to applications.
|
Obsolete ABSTRACT removed in favor of README.
|
Improved implementation of CDATA restart after bogus end-tag.
|
Allowed hyphen, underscore, and period in names as well as colon.
|
First cut at TagSoup Schema Language -- doesn't do anything yet.
|
Support CDATA sections on input.
|
Don't generate built-in entities within CDATA elements.
|
|
Changes from 0.9.2 to 0.9.3:
|
============================
|
Convenience main program "tagsoup" in bin directory.
|
Begin to integrate tests.
|
Introduced BOOLEAN type (currently just converted to NMTOKEN).
|
Features that actually work are now named constants in Parser.
|
Double root elements are really gone now.
|
ID attributes weren't being removed from restarted elements.
|
Fixed a bug that made unknown elements disappear in some cases.
|
Parser is now safely reusable.
|
PYXWriter and XMLWriter now implement LexicalHandler.
|
Parser reports comments, startCDATA, and endCDATA events to a LexicalHandler.
|
ScanHandler methods now throw only SAXException, not also IOException.
|
-Dlexical=true switch sets the ContentHandler as a LexicalHandler as well
|
(XMLWriter prints comments, ignores CDATA sections; PYXWriter ignores all).
|
-Dreuse=true switch reuses a single Parser object (no great speed gain).
|
We now disallow an a element as the child of another a element.
|
An empty input is now treated as zero-length character content.
|
HTMLWriter is gone in favor of an extended XMLWriter with get/setHTMLMode methods.
|
CDATA elements only terminate with matching end-tags (thanks to Sebastien Bardoux).
|
|
Changes from 0.9.1 to 0.9.2:
|
============================
|
No longer inserts bogus ; after unknown entity reference without ;.
|
Consecutive entity references now work correctly.
|
Setting namespaces and namespace-prefixes methods now works.
|
-Dnons=true option turns off namespace and prefix.
|
New feature "http://www.ccil.org/~cowan/tagsoup/features/ignore-bogons"
|
suppresses unknown start-tags (any end-tag will be automatically ignored).
|
-Dnobogons=true option turns ignore-bogons on.
|
Suppress unknown and/or empty initial start-tag always
|
(prevents double root element).
|
Schema now allows style as an inline element, like script.
|
Schema now allows tr as a child of table to avoid problems with embedded tables.
|
Clear Parser instance variables to make Parsers properly reusable.
|
|
Changes from 0.9 to 0.9.1:
|
==========================
|
Incorporated patch for -jar support by Joseph Walton.
|
Incorporated patch for Megginson XMLWriter support by Joseph Walton.
|
Changed existing XMLWriter to HTMLWriter.
|
Rewrote Parser.main for better features, removed Tester class.
|
-Dnewline=true removed, now implied by -DHTML=true.
|
-Dfiles=true now used to generate separate outputs (old Tester behavior)
|
with extension xhtml (removing any old extension).
|
Fixed nasty bug in HTMLScanner that was failing to fix unusual entities.
|
Don't attempt to smash whitespace to spaces any more.
|
|
Changes from 0.8 to 0.9:
|
========================
|
Ant-ified by Martin Rademacher.
|
Don't suppress colons in element names.
|
Entity problems fixed (I hope).
|
Can now set namespace and namespace-prefixes features (without effect).
|
Properly templatize HTMLModels.java.
|
Attributes are no longer in the HTML namespace.
|