ExtractDocumentLengths: prints out sum of doclengths, both lossy and lossless #1040

lintool · 2020-03-20T01:26:05Z

Adds check in ExtractDocumentLengths per above issue.

codecov · 2020-03-20T01:30:54Z

Codecov Report

Merging #1040 into master will increase coverage by 0.34%.
The diff coverage is 100.00%.

@@             Coverage Diff              @@
##             master    #1040      +/-   ##
============================================
+ Coverage     43.18%   43.53%   +0.34%     
- Complexity      610      622      +12     
============================================
  Files           128      128              
  Lines          7750     7759       +9     
  Branches       1131     1131              
============================================
+ Hits           3347     3378      +31     
+ Misses         4082     4062      -20     
+ Partials        321      319       -2

| Impacted Files | Coverage Δ | Complexity Δ |
|---|---|---|
| .../java/io/anserini/util/ExtractDocumentLengths.java | `86.84% <100.00%> (+10.98%)` | `3.00 <0.00> (+1.00)` |
| ...anserini/ltr/feature/base/PMIFeatureExtractor.java | `86.53% <0.00%> (+1.92%)` | `13.00% <0.00%> (+1.00%)` |
| ...java/io/anserini/ltr/feature/CountBigramPairs.java | `89.61% <0.00%> (+24.67%)` | `33.00% <0.00%> (+10.00%)` |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1b07219...a5bd731.

chriskamphuis

Code looks good, but it does not work for core18.
core18 contains documents of length 0. The index stores no document vector for such documents, so the following line returns null, which then causes an error to be thrown.

Terms terms = reader.getTermVector(i, "contents");
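A minimal sketch of the kind of guard that avoids this, not the actual patch. It deliberately does not use Lucene: a plain `Map` stands in for the index, and a missing entry plays the role of the null term vector. The class `DocLengthSketch`, its methods, and the map layout are all hypothetical names introduced only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the index: docid -> (term -> term frequency).
// A document with no terms has no entry at all, mimicking a lookup that
// returns null for an empty document.
class DocLengthSketch {

    // Exact length of one document: the sum of its term frequencies.
    // A missing term vector is treated as length 0 instead of being dereferenced.
    static long docLength(Map<Integer, Map<String, Long>> termVectors, int docid) {
        Map<String, Long> terms = termVectors.get(docid);
        if (terms == null) {
            return 0L; // empty document: nothing was stored for it
        }
        long sum = 0L;
        for (long tf : terms.values()) {
            sum += tf;
        }
        return sum;
    }

    // Sum of exact document lengths over docids 0..maxDoc-1.
    static long totalLength(Map<Integer, Map<String, Long>> termVectors, int maxDoc) {
        long total = 0L;
        for (int i = 0; i < maxDoc; i++) {
            total += docLength(termVectors, i);
        }
        return total;
    }

    public static void main(String[] args) {
        Map<Integer, Map<String, Long>> termVectors = new HashMap<>();

        Map<String, Long> doc0 = new HashMap<>();
        doc0.put("lossy", 2L);
        doc0.put("exact", 1L);
        termVectors.put(0, doc0);

        // docid 1 is an empty document: intentionally no entry.

        Map<String, Long> doc2 = new HashMap<>();
        doc2.put("length", 4L);
        termVectors.put(2, doc2);

        System.out.println(totalLength(termVectors, 3));
    }
}
```

Treating a missing vector as length 0 keeps the loop over all docids intact, so empty documents still count toward the number of documents without contributing to the sum.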

chriskamphuis · 2020-03-20T10:28:21Z

(I suppose this problem already existed)

lintool · 2020-03-20T11:01:08Z

Fixed the issue you mentioned while I was at it...

lintool added 3 commits March 18, 2020 14:09

Compares lossy vs. exact terms.

05b8065

Merge branch 'master' into doclength

8e2dba0

Tweaked tests.

e1f5139

lintool requested a review from chriskamphuis March 20, 2020 01:26

chriskamphuis reviewed Mar 20, 2020

Addressed CR.

a5bd731

chriskamphuis approved these changes Mar 20, 2020

lintool merged commit deae4b1 into master Mar 20, 2020

lintool deleted the doclength branch March 20, 2020 11:29

crystina-z pushed a commit to crystina-z/anserini that referenced this pull request Oct 28, 2022

Remove SimpleNearestNeighborSearcher and related code (castorini#1040)

fb103d3
