This repository has been archived by the owner on Jan 6, 2024. It is now read-only.

CHANGELOG.md

RECENT CHANGES

26 Feb 2022: v1.8.1

  • Added a configurable poller interval
  • Added service recovery when directory to monitor is not writable
  • Fixed upgrades with newer configuration files
  • Fixed sporadic errors with preprocessed images being detected by poller

23 Feb 2022: v1.8.0

  • Added internal inotifywait emulation that can deal with events on NFS / SMB shares where inotify events won't happen
  • Greatly sped up OCR by bypassing checks on unmodified files
  • Sped up OCR_Dispatch by checking for already OCRed PDFs before launching the OCR function
  • Inclusions and exclusions are now case insensitive so that they also play well with Windows naming rules
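
As an illustration of the polling idea behind the inotifywait emulation in v1.8.0, here is a minimal Python sketch. It is not pmocr's actual code (pmocr is a bash script); the function names `snapshot` and `changed_files` are hypothetical, and comparing modification time and size is an assumption about how changes might be detected on shares where inotify events never arrive.

```python
import os

def snapshot(directory):
    """Map every regular file under `directory` to its (mtime_ns, size)."""
    state = {}
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished between listing and stat
            state[path] = (st.st_mtime_ns, st.st_size)
    return state

def changed_files(previous, current):
    """Return the paths that are new or whose (mtime, size) signature differs."""
    return sorted(p for p, sig in current.items() if previous.get(p) != sig)
```

A poller would call `snapshot` once per interval and pass the previous and current results to `changed_files`; only the returned paths need to be handed to OCR, which also matches the idea of skipping unmodified files.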

29 Dec 2021: v1.7.0 (never released)

  • Tested Tesseract 5.X engine
  • Improved optional preprocessor commandline
    • Added antialiasing
    • Added text sharpening
  • Removed earlier ghostscript dependency
  • Fixed installer message when no wget is present
  • Updated ofunctions

11 Jul 2019: v1.6.1

  • Tested Tesseract 4.x engine
    • Renamed "tesseract3" engine to "tesseract" since we work with 3.02+ / 4.x
    • Added TESSERACT_OPTIONAL_ARGS in config file
  • Improved handling of open files being deferred for later OCR
  • Fixed automatic service shutdown in RHEL 6/7 (automatic /tmp directory cleanup removing service file)
  • Updated ofunctions
    • Moved from yes/no parameters to bash booleans
    • Compatibility with older config files is preserved
    • Better cleanup
  • Fixed installer typos

21 Dec 2018: v1.6.0

  • Simplified config file syntax for OCR_ENGINE selection
  • Added config file revision check
  • Fixed logs not writing correctly in service mode and batch mode (OCR_Dispatch and lower function Logger doesn't work in)
  • Fixed --no-text argument
  • Added --failed-suffix and --no-failed-suffix batch options
  • Skipping files currently being written to (workaround for slow file transfers), leaving them for next run
  • Added nanoseconds to filename if output file already exists on move
  • Clearer preflight error messages
  • Updated ofunctions
    • RFC822 email compliance checks
    • New more complete ExecTasks function to replace ParallelExec
    • Fix log sending with double compressed extensions
    • Minor fixes
  • Fixed return code for initV style service file
  • Upgraded shunit2 test framework to v2.1.8pre (git commit 07bb329)
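
The v1.6.0 change that adds nanoseconds to the filename when the output file already exists can be pictured with the following Python sketch. The helper name `unique_destination`, its parameters, and the position of the timestamp (before the extension) are assumptions made for illustration; the changelog does not say where pmocr places it.

```python
import os
import time

def unique_destination(dest, exists=os.path.exists, now_ns=time.time_ns):
    """Return `dest`, or a timestamped variant if `dest` is already taken.

    `exists` and `now_ns` are injectable so the behavior can be checked
    without touching the filesystem or the clock.
    """
    if not exists(dest):
        return dest
    stem, ext = os.path.splitext(dest)
    # Assumed layout: <stem>.<nanoseconds><ext>
    return f"{stem}.{now_ns()}{ext}"
```

Using a nanosecond timestamp instead of a counter avoids having to scan the destination directory for the next free number.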

21 Apr 2017: v1.5.7

  • Fixed a bug cleaning the SERVICE_MONITOR file after each run

20 Apr 2017: v1.5.6

  • Added tesseract version preflight checks
  • Added unit test framework (basic functionality only so far)
    • Added batch tests
      • File suffixes & no suffixes
      • File text / date additions
      • Skip searchable pdf tests
      • Delete original upon successful processing
    • Added service tests
      • Basic PDF / TXT / CSV tests
      • File moves on success & failure
  • Fixed SERVICE-MONITOR file (run file) created in root
  • Fixed CSV transformation not working
  • Fixed a low severity security issue where log & run files are world readable
  • Fixed some installer strings
  • Tmp files are now cleaned on the fly after each dispatch

13 Mar 2017: v1.5.4

  • Support for moving files after processing
    • Failing to move files will automatically rename them
  • Better installer with --remove support
  • Mail alerts can now be encoded differently than UTF-8
  • Updated ofunctions from obackup / osync

06 Feb 2017: v1.5.2

  • Service improvements
    • A forced run is done every MAX_WAIT seconds
    • OCR is run on service start
    • Moved files now also trigger an OCR run
  • Prevent overwriting multiple failed files with same source filename
  • Updated ofunctions from the osync & obackup projects, which addresses multiple issues
    • Improved mail function
    • Improved ParallelExec function
    • Improved logging functionality

21 Oct 2016: v1.5

  • Added ownership preservation option
  • Added optional file permission mask to replace default new file permissions
  • Added the possibility to use an image preprocessor (ImageMagick is preconfigured but not enabled by default)
  • Corrected an issue where a failed service run may end up in an infinite loop by adding a failed OCR file suffix
  • Made a workaround for Tesseract throwing an error when OSD data is missing but not exiting with a failure code
  • Fixed intermediary PDF2TIFF transformation used with Tesseract
  • Fixed --suffix option being ignored
  • Recoded service execution asynchronously
    • Fixed a bug where a file added while the OCR process was already running would not be processed until another file was added
  • Changed Unix process signals to be POSIX compliant
  • Fixed file suffix exclusion also excluding files that contained the suffix anywhere in the filename
  • Enhanced parallel execution for huge file sets
  • Improved cpu usage on idle
  • Changed the way pmocr works
    • Split pmocr.sh config into separate config files so updates no longer overwrite the current config
    • Updated service files to run multiple instances
    • Updated install script to handle config files
  • Added parallel execution for multicore systems
  • Improved tesseract 3 support
    • Added text output format
    • Added csv output format (with csv hack)
    • Removed intermediary txt files produced by tesseract
  • Improved logging
  • Improved code compliance
  • Various minor fixes from ofunctions updates
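
The suffix exclusion fix in v1.5 (files were excluded when the suffix appeared anywhere in the filename) comes down to anchoring the match at the end of the name. A minimal Python sketch follows; the function name `is_already_processed` and the default suffix value `_OCR` are assumptions for illustration, not taken from pmocr's code.

```python
import os

def is_already_processed(filename, suffix="_OCR"):
    """True only if the name, minus its extension, ends with `suffix`."""
    stem, _ext = os.path.splitext(os.path.basename(filename))
    # A substring test (suffix in stem) would reproduce the old bug.
    return stem.endswith(suffix)
```

With this check, `scan_OCR.pdf` is skipped while `my_OCR_notes.pdf` is still processed.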

15 Aug 2016: v1.4.2

  • Removed keep logging statement from WaitForTaskCompletion function
  • Fixed rare bug where the original PDF file gets deleted without a successful transformation
  • Removed NO_DELETE_SUFFIX that is not used anymore
  • More debug logs
  • Updated ofunctions from other projects

06 Aug 2016: v1.4.1

  • Fixed mail alerts not sent
  • Improved debugging and logging
  • Merged dev builder with other projects
  • Cleaned code (a bit)

04 Aug 2016: v1.4

  • Merged more recent common function set
  • Improved logging
  • Improved installer
  • Added a systemd unit file
  • Added pdf2tiff intermediary transformation for tesseract3 to support pdf input (thanks to mhelff, https://github.com/mhelff)
  • Set pdf conversion as default choice in batch mode
  • Added preflight checks for tesseract3 engine
  • Refactored code that had become totally unreadable for human beings :)
  • Improved sub process terminate code
  • Improved daemon logging
  • Improved mail alert support in daemon mode

03 Mar 2016: v1.3

  • Merged function codebase with osync and obackup
  • Fixed file extension changing when DELETE_ORIGINAL=no
  • Added a suffix to original files for recognition
  • Fixed detection of PDFs already containing text (pdffonts should output more than 2 lines if embedded fonts are found)
  • Added minimal email alerts
  • Ported some code from osync/obackup
  • Added LSB info to init script for Debian based distros
  • Check for service directories before launching service
  • Added better KillChilds function on exit in service mode
  • Changed code to be code style V2 compliant
  • Added support for tesseract 3.x
  • Added options to suppress suffix and text in batch process
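
The pdffonts heuristic mentioned in v1.3 relies on `pdffonts` printing a two-line header (column names and a dashed rule) followed by one line per font. A Python sketch of that line-count test is shown below; the function name `has_embedded_text` is hypothetical, and the caller is assumed to supply the captured output of `pdffonts <file>`.

```python
def has_embedded_text(pdffonts_output):
    """Treat a PDF as already containing text if pdffonts lists any font.

    The first two non-empty lines are the header, so more than two
    lines means at least one font entry was reported.
    """
    lines = [line for line in pdffonts_output.splitlines() if line.strip()]
    return len(lines) > 2
```

A PDF made only of scanned images typically yields just the header, so it would still be sent to the OCR engine.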

31 Aug 2015: v1.2

  • Added all input file formats that abbyyocr11 supports
  • Fixed find command to allow case insensitive input extensions
  • Minor improvements in logging, and code readability
  • Added full commandline batch mode
  • Added option to delete input file after successful processing
  • Added option to suppress OCRed filename suffix
  • New option to avoid passing PDFs already containing text to the OCR engine
  • New option to add a trivial value to the output filename (like a date)

23 Aug 2015: v1.04

  • Fixed multiple problems with spaces in filenames and exclusion patterns
  • Minor fixes for logging
  • Renamed all pmOCR instances to pmocr