RECENT CHANGES

26 Feb 2022: v1.8.1

23 Feb 2022: v1.8.0

Added internal inotifywait emulation that can deal with events on NFS / SMB shares where inotify events won't happen
Greatly sped up OCR by skipping checks on unmodified files
Speed up OCR_Dispatch by checking already OCRed PDFs before launching OCR function
Inclusions and exclusions are now case-insensitive so that they also play nicely with Windows rules
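
The idea behind polling for changes where inotify events are not delivered (NFS / SMB shares) can be sketched as below. This is only an illustration and not pmocr's actual emulation: the function names snapshot_dir and dir_changed and the state file are hypothetical, and the sketch assumes GNU find (for -printf). It records each file's path, size and modification time, and reports a change whenever that listing differs from the previous poll.

```shell
#!/usr/bin/env bash
# Hypothetical polling sketch, not the pmocr implementation.

# snapshot_dir DIR: print one sorted line per regular file (path, size, mtime).
# Assumes GNU find, which provides -printf.
snapshot_dir() {
    find "$1" -type f -printf '%p\t%s\t%T@\n' | sort
}

# dir_changed DIR STATE_FILE: return 0 if DIR differs from the snapshot stored
# in STATE_FILE (or if no snapshot exists yet), 1 otherwise.
# STATE_FILE must live outside DIR, otherwise it would show up in the snapshot.
dir_changed() {
    local dir="$1" state_file="$2" current
    current="$(snapshot_dir "$dir")"
    if [ -f "$state_file" ] && [ "$current" = "$(cat "$state_file")" ]; then
        return 1
    fi
    # Remember the new listing for the next poll.
    printf '%s\n' "$current" > "$state_file"
    return 0
}
```

A caller would typically run dir_changed in a loop with a sleep between polls and start an OCR run each time it returns 0.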

29 Dec 2021: v1.7.0 (never released)

Tested Tesseract 5.X engine
Improved optional preprocessor commandline
- Added antialiasing
- Added text sharpening
Removed earlier ghostscript dependency
Fixed installer message when no wget is present
Updated ofunctions

11 Jul 2019: v1.6.1

Tested Tesseract 4.x engine
- Renamed "tesseract3" engine to "tesseract" since we work with 3.02+ / 4.x
- Added TESSERACT_OPTIONAL_ARGS in config file
Improved handling of open files being deferred for later OCR
Fixed automatic service shutdown in RHEL 6/7 (automatic /tmp directory cleanup removing service file)
Updated ofunctions
- Moved from yes/no parameters to bash booleans
- Compatibility with older config files is preserved
- Better cleanup
Fixed installer typos

21 Dec 2018: v1.6.0

Simplified config file syntax for OCR_ENGINE selection
Added config file revision check
Fixed logs not writing correctly in service mode and batch mode (OCR_Dispatch and lower function Logger doesn't work in)
Fixed --no-text argument
Added --failed-suffix and --no-failed-suffix batch options
Skipping files currently being written to (workaround for slow file transfers), leaving them for next run
Added nanoseconds to filename if output file already exists on move
Clearer preflight error messages
Updated ofunctions
- RFC822 email compliance checks
- New more complete ExecTasks function to replace ParallelExec
- Fix log sending with double compressed extensions
- Minor fixes
Fixed return code for initV style service file
Upgraded shunit2 test framework to v2.1.8pre (git commit 07bb329)
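
The "nanoseconds in filename" behaviour listed above can be illustrated roughly as follows. This is a sketch under assumptions, not the code shipped in pmocr: move_no_clobber is a hypothetical helper name, the naming scheme (underscore plus timestamp before the extension) is invented for the example, and %N requires GNU date.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the pmocr implementation.

# move_no_clobber SRC DEST_DIR: move SRC into DEST_DIR. If a file with the same
# name already exists there, insert a seconds+nanoseconds timestamp before the
# extension instead of overwriting. Prints the final path on success.
move_no_clobber() {
    local src="$1" dest_dir="$2" name target stamp
    name="$(basename "$src")"
    target="$dest_dir/$name"
    if [ -e "$target" ]; then
        stamp="$(date +%s%N)"   # %N (nanoseconds) is a GNU date extension
        if [[ "$name" == *.* ]]; then
            target="$dest_dir/${name%.*}_${stamp}.${name##*.}"
        else
            target="$dest_dir/${name}_${stamp}"
        fi
    fi
    mv -- "$src" "$target" && printf '%s\n' "$target"
}
```

Calling move_no_clobber twice with the same source filename leaves both files in the destination directory, the second one under a timestamped name.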

21 Apr 2017: v1.5.7

20 Apr 2017: v1.5.6

13 Mar 2017: v1.5.4

Support for moving files after processing
- Failing to move files will automatically rename them
Better installer with --remove support
Mail alerts can now be encoded differently than UTF-8
Updated ofunctions from obackup / osync

06 Feb 2017: v1.5.2

Service improvements
- A forced run is done every MAX_WAIT seconds
- OCR is run on service start
- Moved files now also trigger an OCR run
Prevent overwriting multiple failed files with same source filename
Updated ofunctions from the osync and obackup projects, addressing multiple issues
- Improved mail function
- Improved ParallelExec function
- Improved logging functionality

21 Oct 2016: v1.5

Added ownership preservation option
Added optional file permission mask to replace default new file permissions
Added the possibility to use an image preprocessor (ImageMagick is preconfigured but not enabled by default)
Corrected an issue where a failed service run may end up in an infinite loop by adding a failed OCR file suffix
Made a workaround for Tesseract throwing an error when OSD data is missing but not exiting with a failure code
Fixed intermediary PDF2TIFF transformation used with Tesseract
Fixed --suffix option being ignored
Recoded service execution asynchronously
- Fixed a bug where a file added while the OCR process was already running would not be processed until another file was added
Changed Unix process signals to be POSIX compliant
Fixed file suffix exclusion also excluding files that contained the suffix anywhere in the filename
Enhanced parallel execution for huge file sets
Improved CPU usage when idle
Changed the way pmocr works
- Split pmocr.sh config into separate config files so updates no longer overwrite the current config
- Updated service files to run multiple instances
- Updated install script to handle config files
Added parallel execution for multicore systems
Improved tesseract 3 support
- Added text output format
- Added csv output format (with csv hack)
- Remove intermediary txt files produced by tesseract
Improved logging
Improved code compliance
Various minor fixes from ofunctions updates
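
The suffix exclusion fix in this release comes down to matching the suffix only at the end of the filename rather than anywhere in it. A minimal sketch of that distinction follows; has_suffix is a hypothetical helper, and the assumption that the suffix sits directly before the file extension (for example "_OCR" in "report_OCR.pdf") is made for illustration only.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the pmocr implementation.

# has_suffix FILE SUFFIX: return 0 only if the filename, with its last
# extension stripped, ends with SUFFIX. A suffix that merely appears
# somewhere in the middle of the name does not count.
has_suffix() {
    local name stem
    name="$(basename "$1")"
    stem="${name%.*}"          # drop the last extension, if any
    [[ "$stem" == *"$2" ]]     # quoted right-hand side is matched literally
}
```

With this check, "report_OCR.pdf" is excluded for the suffix "_OCR", while "my_OCR_notes.pdf" is still processed.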

15 Aug 2016: v1.4.2

Removed keep logging statement from WaitForTaskCompletion function
Fixed rare bug where the original PDF file got deleted without a successful transformation
Removed NO_DELETE_SUFFIX that is not used anymore
More debug logs
Updated ofunctions from other projects

06 Aug 2016: v1.4.1

04 Aug 2016: v1.4

Merged more recent common function set
Improved logging
Improved installer
Added a systemd unit file
Added pdf2tiff intermediary transformation for tesseract3 to support pdf input (thanks to mhelff, https://github.com/mhelff)
Set pdf conversion as default choice in batch mode
Added preflight checks for tesseract3 engine
Refactored code that had become totally unreadable for human beings :)
Improved sub process terminate code
Improved daemon logging
Improved mail alert support in daemon mode

03 Mar 2016: v1.3

Merged function codebase with osync and obackup
Fixed file extension changing when DELETE_ORIGINAL=no
Added a suffix to original files for recognition
Fixed detection of PDFs already containing text (pdffonts should output more than 2 lines if embedded fonts are found)
Added minimal email alerts
Ported some code from osync/obackup
Added LSB info to init script for Debian based distros
Check for service directories before launching service
Added better KillChilds function on exit in service mode
Changed code to be code style V2 compliant
Added support for tesseract 3.x
Added options to suppress suffix and text in batch process
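
The pdffonts check mentioned in this release relies on pdffonts printing a two-line header (column names and a dashed rule) followed by one line per font, so more than two lines of output means fonts were found. A rough sketch of that test is shown below; the function names pdffonts_lists_fonts and pdf_has_text are hypothetical, and this is not necessarily how pmocr performs the check.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the pmocr implementation.

# pdffonts_lists_fonts: read pdffonts output on stdin and return 0 if it has
# more than the two header lines, i.e. at least one font entry.
pdffonts_lists_fonts() {
    awk 'END { exit (NR > 2) ? 0 : 1 }'
}

# pdf_has_text FILE: return 0 if pdffonts reports at least one font for FILE,
# which suggests the PDF already contains text. Requires pdffonts (poppler-utils).
pdf_has_text() {
    pdffonts "$1" 2>/dev/null | pdffonts_lists_fonts
}
```

Separating the line count from the pdffonts call keeps the decision logic testable without a PDF at hand.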

31 Aug 2015: v1.2

23 Aug 2015: v1.04
