Bulk indexing via Catmandu #14

Open
netsensei opened this issue Oct 6, 2017 · 1 comment
Comments

@netsensei

The current importer doesn't seem to support adding data to a Solr index in bulk via the DataImportHandler. Pushing data record by record is slow and error-prone, since it seems to re-trigger indexing each time a new record is pushed and committed to the index. The DataImportHandler method avoids this.

We've implemented this method of indexing in the Datahub::Factory application (which is heavily based on the Catmandu architecture).

Would it be viable to reuse this code in this module as a separate importer?

See: https://github.com/thedatahub/Datahub-Factory/blob/master/lib/Datahub/Factory/Indexer/Solr.pm

The above module expects two inputs:

  • The local location of the JSON file which contains data to be uploaded.
  • The URL defined by the DataImportHandler in the Solr configuration.

Implementation looks like this:

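use Datahub::Factory;

# Local JSON file holding the records to upload
my $filename = "/tmp/bulk.json";
# URL of the request handler defined in the Solr configuration
my $requestHandler = "http://localhost:8983/solr/blacklight-core/update/json";

my $indexer = Datahub::Factory->indexer('Solr')->new(
    'file_name'       => $filename,
    'request_handler' => $requestHandler
);
$indexer->import();   # upload the file in one request
$indexer->commit();   # make the uploaded records searchable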

Both methods return the response of the handler API as a Perl hash.
Both methods currently throw a Catmandu::HTTPError if something goes wrong.
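
For anyone who wants to see what such a bulk upload amounts to at the HTTP level, here is a rough sketch using only the core HTTP::Tiny module. This is not the Datahub::Factory code, and whether that module does exactly this internally is not confirmed here. The helper names `post_json_file` and `commit_index` are made up for illustration, and the sketch assumes the handler accepts a JSON array of documents and Solr's JSON `commit` command.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Hypothetical helper: POST the whole JSON file to the update handler
# in a single request instead of pushing record by record.
sub post_json_file {
    my ($url, $file, $http) = @_;
    $http //= HTTP::Tiny->new(timeout => 60);

    open my $fh, '<:raw', $file or die "Cannot open $file: $!\n";
    my $body = do { local $/; <$fh> };   # slurp the entire file
    close $fh;

    my $res = $http->post($url, {
        headers => { 'Content-Type' => 'application/json' },
        content => $body,
    });
    die "Upload failed: $res->{status} $res->{reason}\n" unless $res->{success};
    return $res;
}

# Hypothetical helper: send an explicit commit once the upload is done
# (assumes the handler understands the JSON commit command).
sub commit_index {
    my ($url, $http) = @_;
    $http //= HTTP::Tiny->new(timeout => 60);

    my $res = $http->post($url, {
        headers => { 'Content-Type' => 'application/json' },
        content => '{"commit": {}}',
    });
    die "Commit failed: $res->{status} $res->{reason}\n" unless $res->{success};
    return $res;
}
```

With the variables from the snippet above, this would be called as `post_json_file($requestHandler, $filename)` followed by `commit_index($requestHandler)`. The point is that there is one upload and one commit, no matter how many records the file contains.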

@nicolasfranck
Contributor

The store also has a method called "transaction":

$bag->store->transaction(sub{

  $bag->add_many($importer);

});

All bags of the store are committed (or rolled back) at the end of the transaction.

So this would require an option "transaction" in the CLI.

Warning: there are no real transactions in Solr, because another process can commit your pending changes for you.
