Skip to content

Ingest the umbra data dumps to elasticsearch compare to LJ names load to mongodb

License

Notifications You must be signed in to change notification settings

linkedjazz/umbra-ingest

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

umbra-ingest

Ingest the umbra data dumps to elasticsearch

This process requires elasticsearch and mongodb to be installed and running locally on the normal ports. You also need a newish >4 node and npm installed.

The data needs to be in the data/ directory, can be found here: https://www.dropbox.com/s/lr51wonmidl5bmn/umbra-ingest-data.zip?dl=0

Install the npm modules

npm install

Then run the scripts in this order

node index.js ingest
node index.js compare
node index.js load

To ingest the umbra data into elastic, compare them against LJ names, and load the results into the mongodb.

The data will be in lj database umbraMatches collections and will have these fields:

{
	"_id" : ObjectId("570d0c5bfc7a44727ef1444f"),
	"name" : "Eddie South",
	"uri" : "http://dbpedia.org/resource/Eddie_South",
	"hash" : "103152c56433118773a8228926010155c4dc834b",
	"object" : "//www.loc.gov/pictures/cdn/service/pnp/cph/3c00000/3c03000/3c03200/3c03244_150px.jpg",
	"isShownAt" : "//www.loc.gov/pictures/item/91732088/",
	"title" : "[African American musician, <em>Eddie</em> <em>South</em>, plays violin to white patrons at tables in New York night club]",
	"description" : null,
	"subjects" : "African Americans | Employment | Music halls | New York (State) | Photographic prints | <em>South</em>, <em>Eddie</em> | Violins",
	"complete" : false,
	"random" : 1217
}

hash is the umbra id

object is the direct link to the preview image

isShownAt is the link to the contributor

About

Ingest the umbra data dumps to elasticsearch compare to LJ names load to mongodb

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published