word_stem() decapitalizes words in Velox, but Java preserves capitalization. #11012

Open
amitkdutta opened this issue Sep 16, 2024 · 1 comment
Labels: bug (Something isn't working), triage (Newly created issue that needs attention)

Comments

@amitkdutta
Bug description

@spershin found that word_stem() in Velox does not preserve the capitalization of words, whereas Presto Java does.

In Presto Java:

SELECT word_stem(c0) from (values ('Reston'), ('Earnings'), ('Q2'), ('PID'), ('RuntimeError'), ('PRIZE')) t(c0);
    _col0             
--------------
 Reston       
 Earn         
 Q2           
 PID          
 RuntimeError 
 PRIZE        
(6 rows)

In Velox/Presto C++, the same query returns:

SELECT word_stem(c0) from (values ('Reston'), ('Earnings'), ('Q2'), ('PID'), ('RuntimeError'), ('PRIZE')) t(c0);
    _col0             
--------------
 reston       
 earn         
 q2           
 pid          
 runtimeerror 
 prize        
(6 rows)
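
To make the mismatch concrete, here is a minimal sketch that compares the two result sets above row by row. The lists are copied from the outputs in this issue; the `compare` helper is hypothetical and exists only for illustration, it is not part of Velox or Presto.

```python
# Result rows copied verbatim from the two query outputs in this issue.
java_out = ["Reston", "Earn", "Q2", "PID", "RuntimeError", "PRIZE"]
velox_out = ["reston", "earn", "q2", "pid", "runtimeerror", "prize"]


def compare(java_rows, velox_rows):
    """Classify each row pair as 'same', 'case-only', or 'different'.

    Hypothetical helper for this illustration only.
    """
    result = []
    for j, v in zip(java_rows, velox_rows):
        if j == v:
            kind = "same"
        elif j.lower() == v.lower():
            kind = "case-only"
        else:
            kind = "different"
        result.append((j, v, kind))
    return result


report = compare(java_out, velox_out)
for j, v, kind in report:
    print(f"{j!r:16} {v!r:16} {kind}")
```

For this sample, every row falls into the case-only bucket: the stems themselves agree, and the Velox result is the lowercased form of the Java result.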

System information

Any platform

Relevant logs

No specific logs
@amitkdutta added the bug and triage labels on Sep 16, 2024
@amitkdutta

CC: @yhwang @kgpai @kagamiori
