teanalab/dbpedia2fields

Spark code for converting DBpedia Turtle files into the TrecText format used in our runs for https://github.com/iai-group/DBpedia-Entity
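The README does not describe the output layout beyond "TrecText". As a rough illustration, here is a minimal sketch of how one entity could be rendered as a TrecText document, meaning a `<DOC>` with a `<DOCNO>` and a `<TEXT>` body. The object `TrecTextSketch`, the field tags such as `names` and `categories`, and the escaping step are illustrative assumptions. This is not the layout that TriplesToTrec actually writes.

```scala
// Hypothetical sketch: render one entity as a TrecText document.
// Field tags ("names", "categories", ...) are illustrative assumptions,
// not necessarily the fields TriplesToTrec produces.
object TrecTextSketch {

  // Escape characters that a TrecText parser would otherwise read as markup.
  def escape(s: String): String =
    s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")

  // Build a <DOC> with a <DOCNO> and one tag per field inside <TEXT>.
  def toTrecText(docNo: String, fields: Seq[(String, Seq[String])]): String = {
    val sb = new StringBuilder
    sb.append("<DOC>\n")
    sb.append(s"<DOCNO>${escape(docNo)}</DOCNO>\n")
    sb.append("<TEXT>\n")
    for ((tag, values) <- fields) {
      // All values of a field are joined with spaces into a single element.
      sb.append(s"<$tag>${values.map(escape).mkString(" ")}</$tag>\n")
    }
    sb.append("</TEXT>\n")
    sb.append("</DOC>\n")
    sb.toString
  }

  def main(args: Array[String]): Unit =
    print(toTrecText("Audi", Seq(
      "names" -> Seq("Audi", "Audi AG"),
      "categories" -> Seq("Car manufacturers of Germany"))))
}
```

Escaping `&`, `<` and `>` prevents literal values from being mistaken for tags. The source does not say whether the real pipeline does this.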


Prerequisites:

Run TriplesToTrec with the following 13 files from DBpedia 2015-10 as input:

  • anchor_text_en.ttl
  • article_categories_en.ttl
  • category_labels_en.ttl
  • infobox_properties_en.ttl
  • infobox_property_definitions_en.ttl
  • instance_types_transitive_en.ttl
  • labels_en.ttl
  • long_abstracts_en.ttl
  • mappingbased_literals_en.ttl
  • mappingbased_objects_en.ttl
  • page_links_en.ttl
  • persondata_en.ttl
  • short_abstracts_en.ttl
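The job expects exactly these 13 dumps, so a quick pre-flight check can prevent a failed run. The sketch below is hypothetical and is not part of the repository. It assumes that all files sit directly in one directory, which defaults to `dbpedia-2015-10-subset` as in the example command. It uses plain Scala, so Spark does not need to be on the classpath.

```scala
import java.io.File

// Hypothetical pre-flight check (not part of dbpedia2fields): verify that the
// input directory holds all 13 DBpedia 2015-10 dumps before running spark-submit.
object CheckInputs {

  val Required: Seq[String] = Seq(
    "anchor_text_en.ttl",
    "article_categories_en.ttl",
    "category_labels_en.ttl",
    "infobox_properties_en.ttl",
    "infobox_property_definitions_en.ttl",
    "instance_types_transitive_en.ttl",
    "labels_en.ttl",
    "long_abstracts_en.ttl",
    "mappingbased_literals_en.ttl",
    "mappingbased_objects_en.ttl",
    "page_links_en.ttl",
    "persondata_en.ttl",
    "short_abstracts_en.ttl")

  // Required file names that do not appear in `present`, in list order.
  def missing(present: Set[String]): Seq[String] =
    Required.filterNot(present.contains)

  // Names of regular files in `dir`; empty if the directory cannot be listed.
  def filesIn(dir: File): Set[String] =
    Option(dir.listFiles())
      .map(_.filter(_.isFile).map(_.getName).toSet)
      .getOrElse(Set.empty[String])

  def main(args: Array[String]): Unit = {
    val dir = new File(if (args.nonEmpty) args(0) else "dbpedia-2015-10-subset")
    val absent = missing(filesIn(dir))
    if (absent.isEmpty) println(s"All ${Required.size} input files found in $dir")
    else {
      absent.foreach(f => println(s"missing: $f"))
      sys.exit(1)
    }
  }
}
```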

Some Spark parameter tuning is required to run it successfully, for example --executor-memory 22g --driver-memory 6g --conf spark.yarn.executor.memoryOverhead=1g.
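On YARN, these memory flags add up in a simple way. Each executor container requests `spark.executor.memory` plus `spark.yarn.executor.memoryOverhead`, before any rounding by the YARN scheduler. The values above therefore ask for 22g + 1g per executor. The small calculator below makes that arithmetic explicit. `ContainerMemory` and `toMiB` are hypothetical helpers, and the sketch handles only a few size suffixes. With `--master 'local[*]'`, as in the example command, no YARN containers are requested, so the overhead setting has no effect there.

```scala
// Hypothetical helper: per-executor YARN container request implied by the
// tuning flags (executor memory + spark.yarn.executor.memoryOverhead).
object ContainerMemory {

  // Parse a size such as "22g", "1g" or "512m" into MiB.
  // A bare number is read as MiB, matching how the overhead setting treats
  // unitless values. Only k/m/g/t suffixes are handled in this sketch.
  def toMiB(size: String): Long = {
    val (digits, unit) = size.trim.toLowerCase.span(_.isDigit)
    val n = digits.toLong
    unit match {
      case "" | "m" => n
      case "k"      => n / 1024
      case "g"      => n * 1024
      case "t"      => n * 1024 * 1024
      case other    => throw new IllegalArgumentException(s"unsupported unit: $other")
    }
  }

  def main(args: Array[String]): Unit = {
    val executorMiB = toMiB("22g") // --executor-memory 22g
    val overheadMiB = toMiB("1g")  // spark.yarn.executor.memoryOverhead=1g
    println(s"per-executor container request: ${executorMiB + overheadMiB} MiB")
  }
}
```

Running `main` prints `per-executor container request: 23552 MiB` (22528 + 1024).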

Example command to run:

$ sbt assembly
$ $SPARK_HOME/bin/spark-submit --class 'edu.wayne.dbpedia2fields.TriplesToTrec' --master 'local[*]' --executor-memory 22g --driver-memory 6g --conf spark.yarn.executor.memoryOverhead=1g target/scala-2.10/dbpedia2fields-assembly-1.0.jar 'dbpedia-2015-10-subset/*.ttl' triples-to-trec
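The command passes an input glob ('dbpedia-2015-10-subset/*.ttl') followed by what appears to be an output directory (triples-to-trec). The sketch below shows what reading those dumps involves. It is a hypothetical plain-Scala example that assumes one complete triple per line, N-Triples style, with IRI subjects. It also groups the parsed triples by subject, because a per-entity TrecText document needs one group per entity. The regex, `TripleLine` and `groupBySubject` are illustrative assumptions and are not taken from TriplesToTrec. In Spark, the grouping step would typically be a `groupByKey` or `reduceByKey` over an RDD.

```scala
// Hypothetical sketch (not TriplesToTrec's code): parse N-Triples-style lines
// and group them by subject, one group per future TrecText document.
object TripleLine {

  // <subject> <predicate> object .   where object is an IRI, literal or blank node.
  // Assumes IRI subjects and no multi-line statements.
  private val Triple = """(<[^>]*>)\s+(<[^>]*>)\s+(.*\S)\s*\.\s*""".r

  def parse(line: String): Option[(String, String, String)] = {
    val t = line.trim
    if (t.isEmpty || t.startsWith("#")) None // skip blank and comment lines
    else t match {
      case Triple(s, p, o) => Some((s, p, o))
      case _               => None // not a simple one-line triple
    }
  }

  // Plain-Scala analogue of grouping an RDD of triples by subject.
  def groupBySubject(lines: Seq[String]): Map[String, Seq[(String, String)]] =
    lines.flatMap(l => parse(l).toList)
      .groupBy(_._1)
      .map { case (s, ts) => s -> ts.map(t => (t._2, t._3)) }

  def main(args: Array[String]): Unit = {
    val sample = Seq(
      "<http://dbpedia.org/resource/Audi> <http://www.w3.org/2000/01/rdf-schema#label> \"Audi\"@en .",
      "# a comment line")
    sample.map(parse).foreach(println)
  }
}
```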
