Releases: iipc/jwarc

v0.10.2

23 Apr 04:35

Bugs fixed

  • ChunkedBody could read over end of chunk if destination buffer had higher capacity #35 (Sebastian Nagel)
  • Payload body had size 0 when the HTTP Content-Length header was missing #36 (Sebastian Nagel)
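The ChunkedBody fix comes down to never copying more than the bytes left in the current chunk, whatever the destination buffer's capacity. Below is a minimal sketch of that bounding step using plain java.nio. The class and method names are hypothetical and do not reflect jwarc's internals.

```java
import java.nio.ByteBuffer;

// Hypothetical helper (not jwarc's ChunkedBody): copies at most chunkRemaining
// bytes from src into dst, however much room dst has left.
class ChunkCopy {
    static int copyChunk(ByteBuffer src, ByteBuffer dst, long chunkRemaining) {
        int n = (int) Math.min(chunkRemaining, Math.min(src.remaining(), dst.remaining()));
        ByteBuffer view = src.duplicate();   // shares content, independent position/limit
        view.limit(view.position() + n);     // cap the view at the chunk boundary
        dst.put(view);
        src.position(src.position() + n);    // consume exactly n bytes from src
        return n;
    }
}
```

Without the cap, `dst.put(src)` into a large buffer would also copy the CRLF after the chunk data and the next chunk's size line.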

v0.10.1

17 Apr 03:15

Bugs fixed

  • GunzipChannel did not update the input position properly while reading the gzip extra field (since v0.8.2). #32 (Sebastian Nagel)
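The extra field (FEXTRA) is an optional part of each gzip member header, and a reader must advance past both its 2-byte length and its data before any later header fields. Here is a minimal sketch of computing the header length from the RFC 1952 layout. It is a hypothetical helper, not GunzipChannel's actual code.

```java
// Hypothetical helper: returns the length of a gzip member header (RFC 1952),
// skipping the optional FEXTRA, FNAME, FCOMMENT and FHCRC fields in that order.
class GzipHeader {
    static final int FHCRC = 2, FEXTRA = 4, FNAME = 8, FCOMMENT = 16;

    static int headerLength(byte[] b) {
        if ((b[0] & 0xff) != 0x1f || (b[1] & 0xff) != 0x8b) {
            throw new IllegalArgumentException("not a gzip member");
        }
        int flags = b[3] & 0xff;
        int pos = 10; // ID1 ID2 CM FLG MTIME(4) XFL OS
        if ((flags & FEXTRA) != 0) {
            int xlen = (b[pos] & 0xff) | ((b[pos + 1] & 0xff) << 8); // little-endian
            pos += 2 + xlen; // advance past XLEN itself and the extra data
        }
        if ((flags & FNAME) != 0) {
            while (b[pos++] != 0) { } // zero-terminated original file name
        }
        if ((flags & FCOMMENT) != 0) {
            while (b[pos++] != 0) { } // zero-terminated comment
        }
        if ((flags & FHCRC) != 0) {
            pos += 2; // CRC16 of the header
        }
        return pos;
    }
}
```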

v0.10.0

30 Mar 13:47

New features

  • WarcParser, HttpParser and ChunkedBody now report the context of parse errors, making them much easier to debug. (Sebastian Nagel)
  • HttpParser now has a lenient parsing mode which copes with various deviations from the HTTP standards including:
    • LF as a separator rather than CRLF
    • spaces between field names and the colon separator
    • normally disallowed characters in field values and the request target
    • variation of the number of spaces in the request-line and status-line
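To show what several of these deviations look like in practice, here is a minimal plain-Java sketch of lenient header and status-line handling. The class is hypothetical and far simpler than jwarc's HttpParser. It accepts bare LF line endings, whitespace before the colon, and runs of spaces in the status-line.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical lenient parser for illustration only, not jwarc's HttpParser.
class LenientHeaders {
    // Accepts CRLF or bare LF line endings and whitespace before the colon.
    static List<Map.Entry<String, String>> parse(String block) {
        List<Map.Entry<String, String>> fields = new ArrayList<>();
        for (String line : block.split("\n")) {
            if (line.endsWith("\r")) line = line.substring(0, line.length() - 1);
            if (line.isEmpty()) break;   // blank line ends the header block
            int colon = line.indexOf(':');
            if (colon < 0) continue;     // skip lines that are not fields
            String name = line.substring(0, colon).trim();   // "Name :" -> "Name"
            String value = line.substring(colon + 1).trim();
            fields.add(new SimpleEntry<>(name, value));
        }
        return fields;
    }

    // Tolerates runs of spaces in a status-line such as "HTTP/1.1  200  OK".
    static int status(String statusLine) {
        String[] parts = statusLine.trim().split(" +", 3);
        return Integer.parseInt(parts[1]);
    }
}
```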

Bugs fixed

  • The chunked encoding parser now handles a last-chunk with multiple zeroes (reported by Sebastian Nagel)
  • WarcTargetRecord.target() and targetURI() now trim angle brackets from WARC-Target-URI for compatibility with implementations that followed the WARC 1.0 grammar.
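Both fixes are easy to picture in isolation. The sketch below uses hypothetical helpers, not jwarc's code. It detects the last-chunk by parsing the hex chunk size rather than comparing the line against "0", and it strips one pair of enclosing angle brackets from a target URI.

```java
// Hypothetical helpers illustrating the two fixes above.
class WarcFixes {
    // A chunk-size line is hex digits optionally followed by ";extensions".
    // Parsing the number treats "0", "00" and "000" alike as the last-chunk.
    static boolean isLastChunk(String sizeLine) {
        int semi = sizeLine.indexOf(';');
        String hex = (semi >= 0 ? sizeLine.substring(0, semi) : sizeLine).trim();
        return Long.parseLong(hex, 16) == 0;
    }

    // Some writers followed the WARC 1.0 grammar and wrapped WARC-Target-URI
    // in angle brackets; remove them if both are present.
    static String trimAngleBrackets(String target) {
        if (target.length() >= 2 && target.startsWith("<") && target.endsWith(">")) {
            return target.substring(1, target.length() - 1);
        }
        return target;
    }
}
```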

v0.9.0

02 Mar 06:09

New features

  • WarcWriter can now produce WARC files with per-record compression #20 (Sebastian Nagel)
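Per-record compression means each record is written as its own gzip member and the members are concatenated, so a reader can seek to a record's offset and decompress from there. WarcWriter handles this internally. The sketch below is an assumption-level illustration that uses only java.util.zip to show the byte-level layout, with hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: every record becomes a separate gzip member.
class PerRecordGzip {
    // Returns the concatenated members; offsets[i] is set to where record i starts.
    static byte[] write(List<String> records, long[] offsets) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            for (int i = 0; i < records.size(); i++) {
                offsets[i] = out.size();
                ByteArrayOutputStream member = new ByteArrayOutputStream();
                try (GZIPOutputStream gz = new GZIPOutputStream(member)) {
                    gz.write(records.get(i).getBytes(StandardCharsets.UTF_8));
                } // close() writes this member's gzip trailer
                member.writeTo(out);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Decompresses starting at a record's offset. GZIPInputStream also reads any
    // members that follow, so this returns everything from that record onward.
    static String readFrom(byte[] data, long offset) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        in.skip(offset);
        try (GZIPInputStream gz = new GZIPInputStream(in)) {
            return new String(gz.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```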

Bugs fixed

  • Improved documentation and error checking of the ByteBuffer argument to the WarcReader and GunzipReader constructors #22 (Sebastian Nagel)

v0.8.4

06 Feb 11:47
@ato ato

Bugs fixed

  • The concurrentTo(URI) builder method now correctly wraps URIs in angle brackets.
  • The body(contentType, ...) builder methods now accept a null contentType to allow construction of messages without any Content-Type header.
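The header values behind these two fixes can be shown without the builder itself. In this hypothetical plain-Java sketch (not jwarc's builder API), WARC-Concurrent-To gets an angle-bracketed URI, and a null content type omits the Content-Type header entirely instead of writing a placeholder.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helpers mirroring the two fixes above.
class HeaderSketch {
    // WARC-Concurrent-To takes a URI enclosed in angle brackets, e.g. <urn:uuid:...>.
    static String concurrentTo(URI recordId) {
        return "<" + recordId + ">";
    }

    // A null contentType means "no Content-Type header at all".
    static Map<String, String> bodyHeaders(String contentType, int length) {
        Map<String, String> headers = new LinkedHashMap<>();
        if (contentType != null) headers.put("Content-Type", contentType);
        headers.put("Content-Length", Integer.toString(length));
        return headers;
    }
}
```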

v0.8.3

20 Jan 04:42
@ato ato

Bugs fixed

  • WarcReader could hang on truncated gzipped WARCs #17 (Sebastian Nagel)
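A truncated gzip stream should end in an error rather than a wait for input that never arrives. The JDK's GZIPInputStream reports truncation as an EOFException. This minimal sketch (hypothetical names, not necessarily how WarcReader was fixed) surfaces that condition.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helpers for detecting a truncated gzip stream.
class TruncationCheck {
    static byte[] gzip(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(s.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // True if decompression hit end of input before the member was complete.
    static boolean isTruncated(byte[] gz) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            in.readAllBytes();
            return false;
        } catch (EOFException e) {
            return true; // "Unexpected end of ZLIB input stream" or a short trailer
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```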

v0.8.2

20 Jan 04:35

Bugs fixed

  • Fixed IOException reading the gzip 'extra' field (Sebastian Nagel)

v0.8.1

15 Jan 15:34

Bugs fixed

  • HttpParser: Fix parsing of non-ASCII characters
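A common way to keep non-ASCII header bytes intact is to decode them as ISO-8859-1, which maps every byte 0x00 to 0xFF to the char of the same value. The sketch below illustrates that general technique with hypothetical names. It is not necessarily how HttpParser fixed the bug.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helpers: a lossless byte <-> char mapping for raw header text.
class HeaderBytes {
    // Every byte, including raw UTF-8 bytes >= 0x80, becomes exactly one char.
    static String decode(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Inverse of decode: each char 0x00-0xFF becomes the original byte again.
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }
}
```

Decoding the same bytes as US-ASCII instead would replace every byte above 0x7F with U+FFFD, losing the original value.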

v0.8.0

15 Jan 15:33

New features

  • Added accessor methods and toString() to WarcDigest
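WARC digest headers such as WARC-Payload-Digest use the form algorithm:value. The stand-in class below is hypothetical, and its method names are illustrative rather than WarcDigest's actual API. It shows the split into accessors and the matching toString() round trip.

```java
// Hypothetical stand-in for a WARC digest value such as "sha1:<base32 value>".
final class DigestSketch {
    private final String algorithm;
    private final String value;

    DigestSketch(String algorithm, String value) {
        this.algorithm = algorithm;
        this.value = value;
    }

    // Splits at the first colon: the label before it, the encoded digest after it.
    static DigestSketch parse(String s) {
        int colon = s.indexOf(':');
        if (colon <= 0) throw new IllegalArgumentException("expected algorithm:value, got " + s);
        return new DigestSketch(s.substring(0, colon), s.substring(colon + 1));
    }

    String algorithm() { return algorithm; }
    String value() { return value; }

    @Override
    public String toString() { return algorithm + ":" + value; }
}
```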

Bugs fixed

  • WarcReader: Cope with channels that aren't actually seekable despite advertising it
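One way to cope with such channels is to attempt the seek and, if position() fails, fall back to reading and discarding bytes. This minimal sketch assumes a blocking channel and uses a hypothetical helper name. It does not reproduce WarcReader's code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.SeekableByteChannel;

// Hypothetical helper: skips n bytes, seeking when possible and reading otherwise.
// Assumes a blocking channel (read() never returns 0 indefinitely).
class SkipFallback {
    static long skip(ReadableByteChannel ch, long n) {
        try {
            if (ch instanceof SeekableByteChannel) {
                SeekableByteChannel sc = (SeekableByteChannel) ch;
                try {
                    sc.position(sc.position() + n);
                    return n;
                } catch (IOException | UnsupportedOperationException e) {
                    // Advertised as seekable but isn't: fall through to reading.
                }
            }
            ByteBuffer scratch = ByteBuffer.allocate(8192);
            long skipped = 0;
            while (skipped < n) {
                scratch.clear();
                scratch.limit((int) Math.min(scratch.capacity(), n - skipped));
                int r = ch.read(scratch);
                if (r < 0) break; // end of stream before n bytes
                skipped += r;
            }
            return skipped;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```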

Changes

  • Moved network services into a new package
  • Split WarcTool into separate files in a new package

v0.7.0

11 Mar 04:07
@ato ato

New features

  • jwarc now includes a simple filter language for selecting matching WARC records.
    • jwarc filter 'warc-type != "request"'
    • jwarc filter ':status == 200 && http:content-type =~ "image/.*"'
    • long errors = reader.records().filter(WarcFilter.compile(":status >= 400")).count();
  • Native binary builds of the jwarc CLI tool are now available for Linux and macOS. These are built using GraalVM and do not require Java to be installed. (The cross-platform .jar is still the recommended version, though.)

Changes

  • Calling record.http() no longer invalidates record.body(), although care must still be taken.
  • Removed the HttpParser.Handler interface