Releases: iipc/jwarc
v0.10.2
v0.10.1
v0.10.0
New features
- WarcParser, HttpParser and ChunkedBody now report the context of parse errors, making them much easier to debug. (Sebastian Nagel)
- HttpParser now has a lenient parsing mode which copes with various deviations from the HTTP standards, including:
  - LF as a line separator rather than CRLF
  - spaces between field names and the colon separator
  - normally disallowed characters in field values and the request target
  - a varying number of spaces in the request-line and status-line
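
To illustrate the first two deviations listed above, here is a minimal, self-contained sketch of tolerant header splitting. It does not use jwarc and is not how HttpParser is implemented; the class `LenientHeaderSketch` and the method `parseHeaders` are hypothetical names chosen for this example.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: not part of jwarc.
public class LenientHeaderSketch {
    /**
     * Splits a raw header block into name/value pairs, accepting either CRLF or
     * bare LF line endings and tolerating spaces between the field name and the colon.
     */
    public static List<Map.Entry<String, String>> parseHeaders(String raw) {
        List<Map.Entry<String, String>> fields = new ArrayList<>();
        // "\r?\n" matches both CRLF and a bare LF.
        for (String line : raw.split("\r?\n")) {
            if (line.isEmpty()) break; // a blank line ends the header block
            int colon = line.indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("missing colon in: " + line);
            }
            String name = line.substring(0, colon).trim();   // accepts "Name : value"
            String value = line.substring(colon + 1).trim();
            fields.add(new SimpleEntry<>(name, value));
        }
        return fields;
    }

    public static void main(String[] args) {
        // First line ends in a bare LF and has a space before the colon.
        String raw = "Content-Type : text/html\nContent-Length:42\r\n\r\n";
        for (Map.Entry<String, String> field : parseHeaders(raw)) {
            System.out.println(field.getKey() + " = " + field.getValue());
        }
    }
}
```

A strict parser would reject the first header line twice over (bare LF, space before the colon); the sketch accepts both lines and yields `Content-Type` and `Content-Length` as field names.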
Bugs fixed
- The chunked encoding parser now handles a last-chunk written with multiple zeroes (reported by Sebastian Nagel)
- WarcTargetRecord.target() and targetURI() now trim angle brackets from WARC-Target-URI for compatibility with implementations that followed the WARC 1.0 grammar.
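
The angle-bracket trimming described in the last item can be sketched as a small standalone helper. This is an assumption about the general approach, not jwarc's actual code; `TargetUriSketch` and `trimAngleBrackets` are hypothetical names.

```java
// Hypothetical illustration only: not part of jwarc.
public class TargetUriSketch {
    /**
     * Returns the value without a surrounding "<" ... ">" pair.
     * Values that are not wrapped on both sides are returned unchanged.
     */
    public static String trimAngleBrackets(String value) {
        if (value != null && value.length() >= 2
                && value.startsWith("<") && value.endsWith(">")) {
            return value.substring(1, value.length() - 1);
        }
        return value;
    }

    public static void main(String[] args) {
        // Header value as written by a tool that wraps the URI in angle brackets.
        System.out.println(trimAngleBrackets("<http://example.org/>"));
        // Header value written without brackets passes through untouched.
        System.out.println(trimAngleBrackets("http://example.org/"));
    }
}
```

Both calls in `main` print `http://example.org/`, so callers see the same target whichever form the writing tool used.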
v0.9.0
v0.8.4
v0.8.3
v0.8.2
v0.8.1
v0.8.0
New features
- Added accessor methods and toString() to WarcDigest
Bugs fixed
- WarcReader: Cope with channels that aren't actually seekable despite advertising it
Changes
- Moved network services into a new package
- Split WarcTool into separate files in a new package
v0.7.0
New features
- jwarc now includes a simple filter language for selecting matching WARC records.
jwarc filter 'warc-type != "request"'
jwarc filter ':status == 200 && http:content-type =~ "image/.*"'
long errors = reader.records().filter(WarcFilter.compile(":status >= 400")).count();
- Native binary builds of the jwarc CLI tool are now available for Linux and macOS. These are built using GraalVM and do not require Java to be installed. (The cross-platform .jar is still the recommended version, though.)
Changes
- Calling record.http() no longer invalidates record.body() although care must still be taken.
- Removed the HttpParser.Handler interface