
Fix some errors when executing dumps #210

Open · wants to merge 1 commit into master
Conversation

pdelboca
Member

@pdelboca pdelboca commented Nov 21, 2023

Hello!

I'm trying to do a dump of an instance, but the package is throwing some errors. This PR fixes the errors as they appear.

Problems when logging errors

TODO: See #209

KeyError: 'format'

Traceback (most recent call last):
  File "/home/pdelboca/Repos/ckanapi/.venv/bin/ckanapi", line 33, in <module>
    sys.exit(load_entry_point('ckanapi', 'console_scripts', 'ckanapi')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pdelboca/Repos/ckanapi/ckanapi/cli/main.py", line 156, in main
    return dump_things(ckan, thing[0], arguments)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pdelboca/Repos/ckanapi/ckanapi/cli/dump.py", line 110, in dump_things
    create_datapackage(record, datapackages_path, stderr, apikey)
  File "/home/pdelboca/Repos/ckanapi/ckanapi/datapackage.py", line 67, in create_datapackage
    filename = resource_filename(dres)
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pdelboca/Repos/ckanapi/ckanapi/datapackage.py", line 87, in resource_filename
    ext = slugify.slugify(dres['format'])
                          ~~~~^^^^^^^^^^
KeyError: 'format'
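The traceback above shows `resource_filename` indexing `dres['format']` on a resource that has no `format` key. A minimal sketch of a defensive fix (the `_slug` helper is an illustrative stand-in for `slugify.slugify`, and the fallback behavior is an assumption, not this PR's actual diff):

```python
import re


def _slug(text):
    # minimal stand-in for slugify.slugify as used in ckanapi/datapackage.py
    return re.sub(r'[^a-z0-9]+', '-', text.lower()).strip('-')


def resource_filename(dres):
    """Build a filename for a dumped resource, tolerating resources
    that lack a 'format' key instead of raising KeyError."""
    name = _slug(dres.get('name') or 'resource')
    # dres['format'] raised KeyError in the traceback; .get() avoids it
    ext = _slug(dres.get('format') or '')
    return name + ('.' + ext if ext else '')
```

With this change a resource like `{'name': 'My Data'}` yields `my-data` rather than crashing the whole dump.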

@pdelboca
Member Author

pdelboca commented Nov 22, 2023

@wardi have you ever used ckanapi to do a dump of a portal? I'm trying to dump https://datos.gob.ar/, but it is extremely slow and it also gets "blocked" after 250 datasets (blocked = it writes no output and makes no progress; nothing is happening).

I'm trying to do:
ckanapi dump datasets --all --datapackages=./output_directory/ -r https://datos.gob.ar

@wardi
Contributor

wardi commented Nov 24, 2023

@pdelboca we use it daily to create a history of our metadata for ~30k datasets. It's possible you're being throttled on the server side. dump datasets makes a separate package_show query for every dataset; you could try using search datasets instead, which paginates over package_search and makes far fewer requests.

It's possible to resume an interrupted load, but not the dump command at the moment; maybe that's needed if you are being throttled.
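The pagination wardi describes can be sketched as a loop over a package_search-style call; `search_fn`, `rows`, and `start` are illustrative names, not ckanapi's actual internals, and the resume-from-`start` behavior is the feature the CLI currently lacks:

```python
def paginate_search(search_fn, rows=100, start=0):
    """Yield datasets page by page via a package_search-style callable.

    search_fn stands in for CKAN's package_search action; passing a
    nonzero 'start' would let a dump resume after an interruption.
    """
    while True:
        result = search_fn(start=start, rows=rows)
        batch = result['results']
        if not batch:
            return  # past the last page
        yield from batch
        start += len(batch)
```

Each page of `rows` datasets costs one request, so a 30k-dataset portal takes ~300 requests instead of 30k separate package_show calls.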
