PyGlossary 4.6.0

@ilius ilius released this 07 Mar 11:27
4b7ae78

Changes since 4.5.0

Dependency change

We now require Python 3.9 or a later version.

Bug fixes

  • Fix exception in scripts/plugin-index.py: 8a94b8c

  • StarDict: Fix writing to a .zip file producing an empty zip, and fix a bad test

  • dictunformat: fix #367: add option headword_separator, default to ;

  • Fixes in ui_gtk, #380 #382 #403

  • AppleDict source: fix #407 missing quotes for title, and refactor duplicated code

  • DictionaryForMIDs: remove | from word when normalizing, fix punctuation regex, use Unix newlines

  • StarDict: use Unix newline when reading and writing .ifo file on Windows

  • Fix bug of glos.addEntryObj(dataEntry) adding empty file because tmpDataDir is not set until glos.read()

    • Set and create tmpDataDir on glos.tmpDataDir access, and add test, #424
  • Fix scripts/wiki-formats.py, #428

  • Dictd / Dict.org: fix exception on Windows
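
The tmpDataDir fix listed above (#424) amounts to creating the directory lazily, on first access, instead of waiting for glos.read(). A minimal sketch of that pattern follows; LazyTmpDir and the "res" subdirectory name are hypothetical stand-ins for illustration, not PyGlossary's actual Glossary code.

```python
import os
import tempfile


class LazyTmpDir:
    """Hypothetical stand-in for illustration; not PyGlossary's Glossary class."""

    def __init__(self, base_dir: str) -> None:
        self._base_dir = base_dir
        self._tmp_data_dir = ""  # nothing is created at construction time

    @property
    def tmpDataDir(self) -> str:
        # Set and create the directory the first time it is requested,
        # so callers that never call read() still get a usable path.
        if not self._tmp_data_dir:
            self._tmp_data_dir = os.path.join(self._base_dir, "res")
            os.makedirs(self._tmp_data_dir, exist_ok=True)
        return self._tmp_data_dir


base = tempfile.mkdtemp()
glos = LazyTmpDir(base)
print(os.path.isdir(glos.tmpDataDir))
```

With this shape, code that adds a data entry before any read can write into the directory, because the property guarantees it exists.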

Features

  • Support sorting by an ICU locale, see Sorting section of README

  • Add Gtk4 interface --ui=gtk4 / --gtk4

    • still buggy and not as functional as Gtk3 or Tkinter interfaces
  • Add flag --optimize-memory, config key optimize_memory

    • To enable entry compression on --indirect
    • Not enabled by default (it was previously always compressed)
  • Allow plugin's reader.open() to return an Iterator for progress bar

    • Implement for Tabfile (reading info/metadata)
    • Implement for AppleDict Binary (reading KeyText.data)
  • Add read and write support for StarDict Textual File (.xml), #348

  • Add support for writing Yomichan dictionary files, #395 by @tomtung

  • StarDict reader: support .syn.dz file, #410

  • StarDict writer: add write option large_file, #392 #422

  • StarDict reader: support dxoffsetbits=64 on read, #392 #422

  • JMDict: support examples, #383

  • Add read support for JMnedict, #386

  • Add flag --skip-duplicate-headword, config skip_duplicate_headword, #365

    • Zim reader: remove option skip_duplicate_words, #365
  • Add flag --trim-arabic-diacritics, config trim_arabic_diacritics, #366

  • Add read support for IUPAC goldbook (.xml), #355

  • Add write support for DIKT JSON

  • StarDict writer: limit memory usage by using SQLite for idx and syn data, #409

  • CSV: add newline option, defaulting to Unix-style

  • Aard2 Slob writer: add option file_size_approx_check_num_entries

  • Add scripts/diff-glossary and scripts/view-glossary
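
To illustrate the feature above that lets a plugin's reader.open() return an Iterator for the progress bar, here is a rough sketch of the idea: open() becomes a generator that yields progress while it does its loading work. SketchReader, the (position, total) tuple shape, and open_with_progress are assumptions made for this example and may not match PyGlossary's real plugin interface.

```python
from collections.abc import Iterator


class SketchReader:
    """Hypothetical reader for illustration; not a real PyGlossary plugin."""

    def __init__(self) -> None:
        self.keys: list[str] = []

    def open(self, lines: list[str]) -> Iterator[tuple[int, int]]:
        # Yield (position, total) after each unit of loading work,
        # so the caller can update a progress bar while open() runs.
        total = len(lines)
        for index, line in enumerate(lines, start=1):
            self.keys.append(line.strip())
            yield index, total


def open_with_progress(reader: SketchReader, lines: list[str]) -> list[str]:
    # Stand-in for a UI: collect progress labels instead of drawing a bar.
    steps = []
    for pos, total in reader.open(lines):
        steps.append(f"{pos}/{total}")
    return steps


print(open_with_progress(SketchReader(), ["a\n", "b\n", "c\n"]))
```

Because open() is a generator here, no loading happens until the caller iterates over it, which is what lets the caller interleave work and progress updates.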

Improvements

  • When removing HTML tags, also replace <div> with \n, #394 by @tomtung

    • Treat <div> the same way <p> is treated.
  • Mobi: add mobi7-forcing switch to kindlegen command, #374 by @holyspiritomb

  • Octopus MDict: ignore directories with same_dir_data_files, #362

  • StarDict reader: handle definitions with mixed types/formats

  • Dictfile: strip whitespace from word and defi before going through entry filters

  • BGL: strip whitespace from word and defi before going through entry filters

  • Improvement in glos.write: avoid printing exception for invalid encoding

  • Remove empty logs in glos.convert

  • StarDict reader: fix validating sametypesequence, and add test

  • glos.convert: Allow an existing empty directory as output path

  • TextGlossaryReader: replace nextPair method with nextBlock which returns resource files as third item

  • ui_cmd_interactive: allow converting several times before exiting

  • Change title tag for Greek from <big> to <b>

  • Update language data set (langs.json)

  • ui/main.py: print 1-line error instead of full exception on ImportError

  • ui/main.py: Windows: try Tkinter before Gtk

  • ebook_base.py: avoid shutil.move on Windows, #368

  • TextGlossaryReader: fix loading info and some refactoring, #370 36b9cd8

  • Entry: Allow word to be tuple in Entry(word=...)

  • glos.iterInfo() returns an Iterator rather than an Iterable

  • Zim: change dependency to libzim>=1.0, and some comments

  • Mobi: work with kindlegen executable in PATH directories, #401

  • ui: limit the length of option comments in Format Options dialog

  • ui_gtk: improvement: show (last) critical error on status bar

  • ui_gtk: set initial focus

  • ui_gtk: improvements in About tab

  • ui_tk: revert most ttk widgets to tk because the theme doesn't match

  • Add SVG icon, #414 by @proletarius101

  • Prevent exception/traceback on Ctrl+C

  • Optimize progress bar

  • Aard2 slob: show info log before and after slobWriter.finalize(), #437
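
The first improvement in this list (#394) treats <div> like <p> when HTML tags are removed, turning both into newlines. The sketch below shows that behavior with plain regular expressions; the function name strip_html and the patterns are assumptions for this example, not the code PyGlossary uses, and real-world HTML may need a proper parser.

```python
import re

# Assumed patterns for this sketch only.
_BLOCK_TAG = re.compile(r"</?(?:p|div)\b[^>]*>", re.IGNORECASE)
_ANY_TAG = re.compile(r"<[^>]+>")


def strip_html(text: str) -> str:
    # Block-level tags (<p>, <div>) become line breaks...
    text = _BLOCK_TAG.sub("\n", text)
    # ...every other tag is simply dropped.
    text = _ANY_TAG.sub("", text)
    # Collapse the runs of newlines left by adjacent block tags.
    text = re.sub(r"\n{2,}", "\n", text)
    return text.strip()


print(strip_html("<div>one</div><div>two <b>bold</b></div>"))
```

Without the <div> rule, the two blocks in the usage line would run together as "onetwo bold" instead of landing on separate lines.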

Removed features

  • Remove read support for Wiktionary Dump, #48

  • Remove support for Sdictionary Binary and Source

Octopus MDict MDX: features and improvements

  • Support MDict V3 format by updating readmdict, #385 by @xiaoqiangwang

  • Fix files created without UUID in header, #387 by @xiaoqiangwang

    • MdxBuilder 4.0 RC2 and before creates files without UUID header
  • Decode mdict title & description if they're bytes, #393 by @tomtung

  • readmdict: Skip zlib decompress exceptions, #384

  • readmdict: Use __name__ as logger name, and add 2 debug logs, #384

  • readmdict: improve exception msg for xxhash, #385

XDXF: fixes / improvements, issue #376

  • Support <categ>
  • Support embedded tags in <iref>
  • Fix ignoring <mrkd>
  • Fix extra newlines
  • Get rid of warning for <etm>
  • Fix/improve newline and space issues
  • Fix and improve tests
  • Update url for format description
  • Support any tag/string in <ex>, #396
  • Support reading compressed files directly (.xdxf.gz, .xdxf.bz2, .xdxf.lzma)
  • Allow using XSL via --write-options=xsl=True
  • Update XSL
  • Other improvements in XDXF to HTML transformation
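
Reading .xdxf.gz, .xdxf.bz2 and .xdxf.lzma directly, as listed above, can be pictured as choosing an opener from the file extension. The following is a sketch using only the standard library; open_maybe_compressed and the _OPENERS table are hypothetical names, and PyGlossary's own implementation may be organized differently.

```python
import bz2
import gzip
import lzma
from typing import IO

# Assumed mapping for this sketch: file suffix -> stdlib opener.
_OPENERS = {
    ".gz": gzip.open,
    ".bz2": bz2.open,
    ".lzma": lzma.open,
}


def open_maybe_compressed(path: str) -> IO[bytes]:
    # Pick a decompressing opener when the suffix matches,
    # otherwise fall back to a plain binary file.
    for suffix, opener in _OPENERS.items():
        if path.endswith(suffix):
            return opener(path, "rb")
    return open(path, "rb")
```

A caller would use it like the built-in open, for example `with open_maybe_compressed("dict.xdxf.gz") as f: data = f.read()`, and receive decompressed bytes either way.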

AppleDict Binary: features, bug fixes, improvements, refactoring

  • Fix css name on html_full=True

  • Fix using self._encoding where utf-8 should be used

  • Fix internal links, #343

    • Remove x-dictionary:d: prefix from href
    • First fix for x-dictionary:r:: use title if present
    • Add bword:// prefix to href (unless it points to http/https)
    • Read entry IDs on open and fix links with x-dictionary:r:
  • Add plistlib to dependencies

  • Add tests

  • Replace <entry ...> with <div>

  • Fix bad exception formatting

  • Fixes from PR #436

  • Support morphology (alternates): #434 by @soshial

  • Support different AppleDict offsets, #417 by @soshial

  • Extract AppleDict meta-info (langs, title, author), #418 by @soshial

  • Progress Bar on open() / loading KeyText.data

  • Improve memory usage of loading KeyText.data

  • Replace appledict_bin.py with appledict_bin directory and more refactoring
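
Two of the internal-link fixes above (#343), removing the x-dictionary:d: prefix and adding bword:// unless the link points to http/https, can be sketched as a small href rewrite. The function name fix_href is hypothetical, and the sketch deliberately leaves out x-dictionary:r: links, which per the notes need entry IDs read at open time.

```python
def fix_href(href: str) -> str:
    # External links are left untouched.
    if href.startswith(("http://", "https://")):
        return href
    # Already-converted links are not prefixed twice (assumption of this sketch).
    if href.startswith("bword://"):
        return href
    # Drop the AppleDict-specific prefix, then mark the link as internal.
    href = href.removeprefix("x-dictionary:d:")
    return "bword://" + href


print(fix_href("x-dictionary:d:apple"))
```

str.removeprefix requires Python 3.9, which matches the minimum version this release sets.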

Glossary class (glossary.py)

  • Lots of refactoring in glossary.py

    • Improve the design and readability
    • Reduce complexity of methods
    • Move some code into new classes that Glossary inherits from
    • Improve error messages
  • Introduce glossary_v2.py, and maintain API backward-compatibility for glossary.py (as far as documented)

Refactoring

  • Fix style errors using ruff based on pyproject.toml configuration

  • Remove all usages of pyglossary.plugins.formats_common

  • Use str.startswith(tuple) and str.endswith(tuple)

  • Reduce complexity of Glossary methods

  • Rename entry filter strip to trim_whitespaces

  • Some refactoring in StarDict reader

  • Use f-string equal syntax added in Python 3.8

  • Use str.removeprefix and str.removesuffix added in Python 3.9

  • langs/writing_system.py:

    • Change iso field to list
    • Add new scripts
    • Add getAllWritingSystemsFromText
    • More refactoring
  • Split up TextGlossaryReader.loadInfo method

  • plugin_manager.py: make some methods private
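
For readers unfamiliar with the three Python idioms named in this list (tuple arguments to str.startswith / str.endswith, the f-string = syntax from Python 3.8, and str.removeprefix / str.removesuffix from Python 3.9), here is a small illustration. The describe function and the file names are made up for the example and do not come from the PyGlossary code base.

```python
def describe(filename: str) -> str:
    # str.endswith accepts a tuple, replacing a chain of "or" checks.
    compressed = filename.endswith((".gz", ".bz2", ".dz"))
    # str.removesuffix (Python 3.9) replaces manual slicing by length.
    stem = filename.removesuffix(".dz")
    # The f-string "=" syntax (Python 3.8) prints name and repr together.
    return f"{stem=} {compressed=}"


print(describe("test.dict.dz"))  # stem='test.dict' compressed=True
print("bword://apple".removeprefix("bword://"))  # apple
```

Both removeprefix and removesuffix return the string unchanged when the affix is absent, which avoids the off-by-one mistakes that slicing invites.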

Documentation

  • Update plugins' documentation

  • Glossary: add comments about entryFilters

  • Update config.rst

  • Update doc/entry-filters.md

  • Update README.md

  • Update doc/sort-key.md

  • Update doc/pyicu.md

  • Update plugins/testformat.py

  • Add types for arguments and result of all functions/methods

  • Add types for r/w options in reader/writer classes

  • Fix a few incorrect type annotations

  • README.md: Add document for adding data entries, #412

  • README.md: Fix -> nixos command, #400 by @srghma

  • Update bgl_info.md and move it from pyglossary/plugins/babylon_bgl/ to doc/babylon/

Testing

  • Add test for DSL -> Tabfile conversion

  • dsl_test.py: fix method names not starting with test_

  • StarDict reader: better testing for handling definitions with mixed types

  • StarDict writer: much better testing, coverage of stardict.py: from 62% to 83%

  • Refactoring and improvements in tests of Glossary, along with new tests

  • Add test for dictunformat -> Tabfile

  • AppleDict (source) tests: validate plist file contents

  • Allow forking and branching pyglossary-test repo

  • Fix some failing tests on Windows

  • Slob: test file_size_approx

  • Test Tabfile -> SQL conversion

  • Test StarDict error/warning for sortKeyName with and without locale

  • Print useful messages for unhandled warnings

  • Improve logs

  • Add showDiff=False arg to compareTextFiles and convert

Packaging

  • Update and refactor Dockerfile and run-with-docker.sh

    • Dockerfile: change WORKDIR to /root/home which is mapped to host's home dir
    • run-with-docker.sh: create confDir before docker build (to check the owner later)
    • run-with-docker.sh: accept version (image tag) as argument
    • Use host's (non-root) user in docker run
    • Map host user's $HOME to docker's user home
    • Re-use existing docker image with same tag
  • Update setup.py