Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Importing existing entities re-adds statements #22

Open
lucaswerkmeister opened this issue Apr 21, 2017 · 3 comments · Fixed by Wikidata/WikibaseImport#1 · May be fixed by #23
Open

Importing existing entities re-adds statements #22

lucaswerkmeister opened this issue Apr 21, 2017 · 3 comments · Fixed by Wikidata/WikibaseImport#1 · May be fixed by #23

Comments

@lucaswerkmeister
Copy link
Contributor

lucaswerkmeister commented Apr 21, 2017

Depending on local setup, if you re-import entities that you’ve already imported before, the importer will add all the statements again, leaving you with duplicate statements.

This query in PagePropsStatementCountLookup::getStatementCount is the culprit:

     $res = $db->selectRow(
         array( 'page_props', 'page' ),
         array( 'pp_value' ),
         array(
             'page_namespace' => 0, // <-- HERE
             'page_title' => $entityId->getSerialization(),
             'pp_propname' => 'wb-claims'
         ),
         __METHOD__,
         array(),
         array( 'page' => array( 'LEFT JOIN', 'page_id=pp_page' ) )
     );

In a default Wikibase setup, Item is namespace #120, and Property is namespace #122 (see Wikibase/repo/config/Wikibase.example.php), so the query returns no result, which is interpreted as a statement count of 0:

     if ( $res === false ) {
         return 0;
     }

Even if you configure your Wikibase to put items in the main namespace, as on Wikidata, I think the importer would still duplicate statements on properties.

I’m not sure what the best way to solve this is, but perhaps it would be a good idea to throw an exception instead of returning 0 if the query returns no results: if we can’t find the entity, there must be something wrong, right?

Workaround: change the query to select, e. g., 'page_namespace' => [ 120, 122 ].

@lucaswerkmeister
Copy link
Contributor Author

Wait, I forgot that this is a MediaWiki extension, so it has access to the same config that Wikibase uses, and can look up the entity namespaces :) fix incoming.

lucaswerkmeister added a commit to lucaswerkmeister/WikibaseImport that referenced this issue Apr 21, 2017
Instead of assuming that all entities are in namespace 0 (which is only
true in a Wikidata-like installation, but not in a default Wikibase
installation, and even then only for items, not properties), use all
entity namespaces from the local configuration. (Use array_values to
make the keys numeric to make sure the database wrapper doesn’t try to
interpret the string keys ('item', 'property').)

Also, if we can’t find the entity, throw an exception instead of
returning count 0: it’s better to fail the import than to duplicate
statements.

Fixes filbertkm#22.
lucaswerkmeister added a commit to lucaswerkmeister/WikibaseImport that referenced this issue Apr 21, 2017
Instead of assuming that all entities are in namespace 0 (which is only
true in a Wikidata-like installation, but not in a default Wikibase
installation, and even then only for items, not properties), use all
entity namespaces from the local configuration. (Use array_values to
make the keys numeric to make sure the database wrapper doesn’t try to
interpret the string keys ('item', 'property').)

Also, if we can’t find the entity, throw an exception instead of
returning count 0: it’s better to fail the import than to duplicate
statements.

Fixes filbertkm#22.
@lucaswerkmeister lucaswerkmeister linked a pull request Apr 21, 2017 that will close this issue
lucaswerkmeister added a commit to lucaswerkmeister/WikibaseImport that referenced this issue Apr 24, 2017
Instead of assuming that all entities are in namespace 0 (which is only
true in a Wikidata-like installation, but not in a default Wikibase
installation, and even then only for items, not properties), use all
entity namespaces from the local configuration. (Use array_values to
make the keys numeric to make sure the database wrapper doesn’t try to
interpret the string keys ('item', 'property').)

Also, if we can’t find the entity, throw an exception instead of
returning count 0: it’s better to fail the import than to duplicate
statements.

Fixes filbertkm#22.
lucaswerkmeister added a commit to lucaswerkmeister/WikibaseImport that referenced this issue May 10, 2017
Instead of assuming that all entities are in namespace 0 (which is only
true in a Wikidata-like installation, but not in a default Wikibase
installation, and even then only for items, not properties), take the
actual namespace of the requested entity from an injected
EntityNamespaceLookup.

Also, if we can’t find the entity, throw an exception instead of
returning count 0: it’s better to fail the import than to duplicate
statements.

Fixes filbertkm#22.
lucaswerkmeister added a commit to lucaswerkmeister/WikibaseImport that referenced this issue Apr 12, 2019
Instead of assuming that all entities are in namespace 0 (which is only
true in a Wikidata-like installation, but not in a default Wikibase
installation, and even then only for items, not properties), take the
actual namespace of the requested entity from an injected
EntityNamespaceLookup.

Also, if we can’t find the entity, throw an exception instead of
returning count 0: it’s better to fail the import than to duplicate
statements.

Fixes filbertkm#22.
@D063520
Copy link

D063520 commented Jun 18, 2019

I encountered the same problem, took me a morning to debug. I just dropped this line:
'page_namespace' => 0,
Why is the pull request not merged yet?
How can I help accelerate this?

@jjkoehorst
Copy link

Any clue when / if this can be fixed? I am running into this sync issue as well...Would love to help but php is not my strongest point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants