Project - Systems and Methods for Big and Unstructured Data [PoliMi]

BiancaSavoiu/smbud-project

 
 

Systems and Methods for Big and Unstructured Data

Course project for Systems and Methods for Big and Unstructured Data at Politecnico di Milano (PoliMi).
Explore the docs »

View Demo · Report Bug · Request Feature

Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. License
  5. Contact

About The Project

The project builds a bibliographic database: a system that stores and manages data about scientific articles, their main attributes, and their relations to authors and publication venues. The same database was implemented on three technologies:

  • implementation in Neo4j
  • implementation in MongoDB
  • implementation in PySpark

Neo4j

The main dataset can be downloaded from www.aminer.org. We used the latest available version, DBLP-Citation network V13. The full dataset was too large for our machines to import into Neo4j, so we loaded only a subset containing 2315 papers.
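The README does not say how the 2315-paper subset was selected, so the following is only a minimal sketch of one possible approach: keep the first N records and drop citations that point outside the subset, so that every citation edge imported into Neo4j has both endpoints. The function name `take_subset` and the field names `_id` and `references` are assumptions made for illustration; check them against the actual dataset before reuse.

```python
from typing import Any


def take_subset(papers: list[dict[str, Any]], limit: int) -> list[dict[str, Any]]:
    """Keep the first `limit` papers and drop citations that leave the subset.

    Assumes each paper is a dict with an "_id" key and an optional
    "references" list of paper ids (hypothetical field names).
    """
    subset = papers[:limit]
    kept_ids = {paper["_id"] for paper in subset}
    trimmed = []
    for paper in subset:
        copy = dict(paper)  # shallow copy so the input list is left untouched
        # Keep only references whose target paper is also in the subset.
        copy["references"] = [
            ref for ref in paper.get("references", []) if ref in kept_ids
        ]
        trimmed.append(copy)
    return trimmed


# Tiny hand-made sample standing in for the real dataset.
sample = [
    {"_id": "p1", "title": "Paper one", "references": ["p2", "p9"]},
    {"_id": "p2", "title": "Paper two", "references": []},
    {"_id": "p3", "title": "Paper three", "references": ["p1"]},
]
subset = take_subset(sample, limit=2)
```

With `limit=2`, the reference from `p1` to `p9` is dropped because `p9` is not part of the subset, while the reference to `p2` is kept.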

The queries presented in the main document are also collected in a single linked file.

MongoDB

Each scientific paper is stored as a document that fully describes its structure and content:

  • Title, Abstract, Authors (with affiliations, email, bio)
  • Metadata (keywords)
  • Publication details (journal, volume, number, date, pages)
  • Sections: title, text (by paragraph), subsections, figures (image URL and caption)
  • Bibliography (set of refs.)
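To make the structure listed above concrete, here is a hedged sketch of what one such paper document could look like, written as a Python dictionary. Every key name (`publication_details`, `paragraphs`, and so on) and every value is a placeholder chosen for illustration; the actual schema used in the project may differ.

```python
# Placeholder document mirroring the fields listed above.
# All key names and values are illustrative assumptions.
paper = {
    "title": "Placeholder title",
    "abstract": "Placeholder abstract.",
    "authors": [
        {
            "name": "Placeholder Author",
            "affiliation": "Placeholder University",
            "email": "author@example.org",
            "bio": "Placeholder biography.",
        }
    ],
    "metadata": {"keywords": ["placeholder", "keywords"]},
    "publication_details": {
        "journal": "Placeholder Journal",
        "volume": 1,
        "number": 1,
        "date": "2000-01-01",
        "pages": "1-10",
    },
    "sections": [
        {
            "title": "Introduction",
            "paragraphs": ["First paragraph.", "Second paragraph."],
            "subsections": [],
            "figures": [
                {
                    "url": "https://example.org/figure1.png",
                    "caption": "Placeholder caption.",
                }
            ],
        }
    ],
    "bibliography": [],  # references to other papers would go here
}

# Simple sanity check: report any top-level field that is absent.
REQUIRED_FIELDS = {
    "title", "abstract", "authors", "metadata",
    "publication_details", "sections", "bibliography",
}
missing = REQUIRED_FIELDS - set(paper)
```

Nesting authors, sections, and figures inside the paper keeps everything needed to render one article in a single document, which is the usual reason for choosing a document store for this kind of data.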

(back to top)

Built With

The project relies on the following languages and tools:

  • LaTeX
  • MongoDB
  • Neo4j
  • Python
  • Shell Script

(back to top)

Getting Started

To get a local copy up and running, follow the steps below.

Prerequisites

Set up the databases and download the dataset before using the project:

  • Install Neo4j
    ./install-neo4j.sh
  • Install MongoDB
    ./install-mongodb.sh
  • Start the MongoDB shell by running mongosh in a terminal, then create an administrative MongoDB user
    use admin
    db.createUser({
      user: "username",
      pwd: "password",
      roles: [{
        role: "userAdminAnyDatabase",
        db: "admin"
      }]
    })
    quit()
    • To verify the new user, open the MongoDB shell with its credentials (you will be prompted for the password).
      mongosh -u username -p --authenticationDatabase admin
  • Download and extract the dataset
    wget https://originalstatic.aminer.cn/misc/dblp.v13.7z
    7z x dblp.v13.7z
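Once the administrative user exists, client code usually connects through a MongoDB connection string rather than the shell flags shown above. The sketch below builds such a string with the standard library only. The helper name `admin_uri` and the `localhost` host are assumptions for illustration; 27017 is MongoDB's default port, and `authSource=admin` plays the same role as `--authenticationDatabase admin`.

```python
from urllib.parse import quote_plus


def admin_uri(username: str, password: str,
              host: str = "localhost", port: int = 27017) -> str:
    """Build a MongoDB connection string that authenticates against `admin`.

    Credentials are percent-encoded because characters such as '@', ':'
    and '/' would otherwise be read as URI delimiters.
    """
    user = quote_plus(username)
    pwd = quote_plus(password)
    return f"mongodb://{user}:{pwd}@{host}:{port}/?authSource=admin"


# Hypothetical credentials; the password contains characters that need escaping.
uri = admin_uri("username", "p@ss:word/1")
```

The resulting string can be passed to any MongoDB driver that accepts connection URIs. Avoid hard-coding real passwords; read them from the environment or a prompt instead.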

Installation

  1. Clone the repo
    git clone https://github.com/IrfEazy/smbud-project.git

(back to top)

Usage

For usage examples, please refer to the Documentation.

(back to top)

License

Distributed under the MIT License. See LICENSE for more information.

(back to top)

Contact

Irfan Cela - LinkedIn - irfan.cela@mail.polimi.it

Fabio Lusha - LinkedIn - fabio.lusha@mail.polimi.it

Alberto Sandri - LinkedIn - alberto.sandri@mail.polimi.it

Bianca Christiana Savoiu Marinas - LinkedIn - biancachristiana.savoiu@mail.polimi.it

Enrico Simionato - LinkedIn - enrico.simionato@mail.polimi.it

Project Link: https://github.com/IrfEazy/smbud-project

(back to top)

Languages

  • TeX 64.3%
  • Jupyter Notebook 17.7%
  • Python 13.5%
  • Cypher 3.6%
  • Shell 0.9%