Script to perform dictionary-based n-gram text tagging efficiently in Apache Spark

dhwajraj/spark-text-tagger

spark-text-tagger

Script to perform dictionary-based n-gram text tagging efficiently in Apache Spark

  • Method 1: for a large, distributed dictionary, using IndexedRDD. (Slightly slower; use it only when memory is constrained and the dictionary is too big to fit in worker memory.)
  • Method 2: for a dictionary of manageable size that fits in worker memory, using broadcast variables (shared by multiple workers per machine).
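
As a rough illustration of the broadcast approach in Method 2, the sketch below tags a token list against an in-memory dictionary using greedy longest-match n-gram lookup. It is not the repository's code: the names `tag_ngrams` and `tag_rdd`, the `max_n` parameter, the lowercase whitespace tokenization, and the choice of Python are all assumptions made for the example. The only Spark calls used are `SparkContext.broadcast` and `RDD.map`.

```python
def tag_ngrams(tokens, dictionary, max_n=3):
    """Greedy longest-match tagging (hypothetical helper).

    Returns a list of (start, end, phrase, tag) tuples, where
    tokens[start:end] joined by spaces equals phrase.
    """
    matches = []
    i = 0
    while i < len(tokens):
        hit = None
        # Try the longest n-gram first so a longer phrase wins over its prefix.
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in dictionary:
                hit = (i, i + n, phrase, dictionary[phrase])
                break
        if hit:
            matches.append(hit)
            i = hit[1]  # skip past the matched tokens
        else:
            i += 1
    return matches


def tag_rdd(sc, lines, dictionary, max_n=3):
    """Tag every line of an RDD of strings (hypothetical helper).

    The dictionary is broadcast once, so tasks read a shared local copy
    instead of receiving it with every task. Tokenization here is a plain
    lowercase whitespace split, which is an assumption.
    """
    bc = sc.broadcast(dictionary)
    return lines.map(
        lambda line: (line, tag_ngrams(line.lower().split(), bc.value, max_n))
    )
```

With a live `SparkContext` named `sc`, a call might look like `tag_rdd(sc, sc.textFile(path), {"new york": "CITY"})`, where `path` is a placeholder. When the dictionary is too large to hold on each worker, this lookup would have to go against a distributed structure instead, which is the case Method 1 addresses.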
