Project for Systems and Methods for Big and Unstructured Data.
The purpose of the project is to create a bibliographic database: a system that stores and manages data about scientific articles, their meaningful characteristics, and their relations with authors and with the venues where they are published. The project comprises three implementations:
- implementation in Neo4j
- implementation in MongoDB
- implementation in PySpark
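To illustrate how the relations between papers, authors and publication venues described above could be expressed as a graph (as in the Neo4j implementation), here is a minimal sketch that decomposes one paper record into (node, relationship, node) triples. The record keys (id, authors[].name, venue, references) and the relationship types AUTHORED, PUBLISHED_IN and CITES are hypothetical and chosen for illustration only; they are not taken from the project's actual schema.

```python
from typing import Any

# Hypothetical relationship types; the project's real Neo4j schema may differ.
AUTHORED = "AUTHORED"
PUBLISHED_IN = "PUBLISHED_IN"
CITES = "CITES"

Node = tuple[str, str]  # (label, key), e.g. ("Author", "Ada Lovelace")
Triple = tuple[Node, str, Node]


def paper_to_triples(paper: dict[str, Any]) -> list[Triple]:
    """Decompose one (hypothetical) paper record into graph triples."""
    paper_node: Node = ("Paper", paper["id"])
    triples: list[Triple] = []
    for author in paper.get("authors", []):
        triples.append((("Author", author["name"]), AUTHORED, paper_node))
    venue = paper.get("venue")
    if venue:
        triples.append((paper_node, PUBLISHED_IN, ("Venue", venue)))
    for ref_id in paper.get("references", []):
        triples.append((paper_node, CITES, ("Paper", ref_id)))
    return triples


def collect_nodes(papers: list[dict[str, Any]]) -> set[Node]:
    """Deduplicate nodes across papers, as a Cypher MERGE would."""
    nodes: set[Node] = set()
    for paper in papers:
        nodes.add(("Paper", paper["id"]))  # keep papers with no relations too
        for src, _, dst in paper_to_triples(paper):
            nodes.add(src)
            nodes.add(dst)
    return nodes
```

Keying nodes by (label, key) means an author who appears in several papers becomes a single node, which mirrors what MERGE does in Cypher; a document store such as MongoDB could instead embed the same author data inside each paper.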
The main dataset is DBLP-Citation-network V13, the latest version available at the time, which can be downloaded from www.aminer.org. The full dataset was too large for our machines, which have limited computing power, so only a subset containing information about 2315 papers was imported into Neo4j.
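Because the full dump does not fit comfortably in memory, a subset has to be extracted without parsing the whole file at once. The following is a minimal sketch of one way to do that, assuming the dump is a single top-level JSON array of paper objects; it is not necessarily how the 2315-paper subset was produced. It also assumes, as is commonly reported for the V13 file, that some integers may be wrapped as MongoDB shell literals such as NumberInt(2010), which standard JSON parsers reject. The names iter_papers and take_papers are hypothetical.

```python
import itertools
import json
import re
from typing import Iterable, Iterator

# Assumption: integers may appear as NumberInt(2010), which is not valid JSON.
# Unwrap them to plain ints. Caveat: this would also rewrite the same pattern
# if it occurred inside a string value.
_NUMBER_INT = re.compile(r"NumberInt\((-?\d+)\)")


def iter_papers(chunks: Iterable[str]) -> Iterator[dict]:
    """Yield the objects of a top-level JSON array one at a time.

    `chunks` is any iterable of text pieces (e.g. the lines of an open file),
    so only a small buffer is held in memory instead of the whole dump.
    """
    decoder = json.JSONDecoder()
    buf = ""
    for chunk in chunks:
        buf += chunk
        while True:
            # Unwrap NumberInt(...) and skip whitespace, the opening '['
            # and the ',' separators between array elements.
            buf = _NUMBER_INT.sub(r"\1", buf).lstrip().lstrip("[,").lstrip()
            if buf.startswith("]"):
                return  # end of the array
            try:
                obj, end = decoder.raw_decode(buf)
            except json.JSONDecodeError:
                break  # next object is still incomplete: read more input
            yield obj
            buf = buf[end:]
    if buf.strip():
        raise ValueError("input ended in the middle of a JSON value")


def take_papers(chunks: Iterable[str], n: int) -> list[dict]:
    """Return the first n papers, stopping as soon as they have been read."""
    return list(itertools.islice(iter_papers(chunks), n))
```

Since a text file iterates over its lines, an open file can be passed directly, e.g. take_papers(f, 2315) on the extracted file (its name, e.g. dblpv13.json, is an assumption here); if the file happens to contain very long lines, iter(lambda: f.read(65536), "") yields fixed-size chunks instead. This simply keeps the first n records; the criterion actually used to select the project's subset is not stated here.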
The queries described in the main document are also collected in a single file, available at this link.
Each scientific paper is described by its full document structure and content:
- Title, Abstract, Authors (with affiliations, email, bio)
- Metadata (keywords)
- Publication details (journal, volume, number, date, pages)
- Sections: title, text (by paragraph), subsections, figures (image URL and caption)
- Bibliography (set of refs.)
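The record structure listed above can be sketched as nested Python dataclasses. This is only an illustration: all class and field names are hypothetical and mirror the list item by item, and the real project may store these fields under different keys. The to_document method turns a record into a nested dict, the shape a document database such as MongoDB would store.

```python
from dataclasses import asdict, dataclass, field


# Hypothetical names mirroring the structure listed above.
@dataclass
class Author:
    name: str
    affiliation: str = ""
    email: str = ""
    bio: str = ""


@dataclass
class Figure:
    url: str  # image URL
    caption: str = ""


@dataclass
class Section:
    title: str
    paragraphs: list[str] = field(default_factory=list)  # text by paragraph
    subsections: list["Section"] = field(default_factory=list)  # recursive
    figures: list[Figure] = field(default_factory=list)


@dataclass
class Publication:
    journal: str
    volume: str = ""
    number: str = ""
    date: str = ""  # e.g. "2021-05-01"
    pages: str = ""  # e.g. "12-34"


@dataclass
class Paper:
    title: str
    abstract: str
    authors: list[Author]
    keywords: list[str]  # metadata
    publication: Publication
    sections: list[Section] = field(default_factory=list)
    bibliography: list[str] = field(default_factory=list)  # set of refs.

    def to_document(self) -> dict:
        """Return a nested dict; asdict recurses into lists of dataclasses."""
        return asdict(self)
```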
To get a local copy up and running, follow these steps.
- Install Neo4j
./install-neo4j.sh
- Install MongoDB
./install-mongodb.sh
- Run the MongoDB shell from the terminal:
mongosh
- Create an administrative MongoDB user (replace username and password with your own credentials):
use admin
db.createUser({ user: "username", pwd: "password", roles: [{ role: "userAdminAnyDatabase", db: "admin" }] })
quit()
- To test the changes, access the mongo shell using the created administrative user.
mongosh -u username -p --authenticationDatabase admin
- Download and extract the dataset
wget https://originalstatic.aminer.cn/misc/dblp.v13.7z
7z x dblp.v13.7z
- Clone the repo
git clone https://github.com/IrfEazy/smbud-project.git
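Once the administrative user from the steps above exists, client code needs a connection string carrying the same credentials. The sketch below builds one, assuming a local server on MongoDB's default port 27017; the helper name mongo_uri is hypothetical. The authSource=admin option is the URI counterpart of --authenticationDatabase admin in the mongosh command, and the credentials are percent-encoded because characters such as @, : or / would otherwise break the URI.

```python
from urllib.parse import quote_plus


def mongo_uri(user: str, password: str, host: str = "localhost",
              port: int = 27017, auth_db: str = "admin") -> str:
    """Build a MongoDB URI for a user defined in the `auth_db` database."""
    # Percent-encode the credentials so reserved characters stay unambiguous.
    return (
        f"mongodb://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/?authSource={quote_plus(auth_db)}"
    )
```

Assuming the PyMongo driver is installed, the resulting string can be passed to pymongo.MongoClient to open a connection.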
Distributed under the MIT License. See LICENSE for more information.
Irfan Cela - LinkedIn - irfan.cela@mail.polimi.it
Fabio Lusha - LinkedIn - fabio.lusha@mail.polimi.it
Alberto Sandri - LinkedIn - alberto.sandri@mail.polimi.it
Bianca Christiana Savoiu Marinas - LinkedIn - biancachristiana.savoiu@mail.polimi.it
Enrico Simionato - LinkedIn - enrico.simionato@mail.polimi.it
Project Link: https://github.com/IrfEazy/smbud-project