GitHub - BiancaSavoiu/smbud-project: Project - System and Methods for Big and Unstructured Data [PoliMi]

Systems and Methods for Big and Unstructured Data

Project for the Systems and Methods for Big and Unstructured Data course (PoliMi).

Table of Contents

About The Project
- Built With
Getting Started
- Prerequisites
- Installation
Usage
License
Contact

About The Project

The project builds a bibliographic database: a system that stores and manages data about scientific articles, their meaningful characteristics, and their relations with authors and with the venues where they are published.

The project provides three implementations:

- implementation in Neo4j
- implementation in MongoDB
- implementation in PySpark

Neo4j

The main dataset can be downloaded from www.aminer.org. The latest version of the data was used, namely DBLP-Citation network V13. The full dataset was too large for our machines, which have limited computing power, so only a part of it was imported into Neo4j, containing information about 2315 papers.
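The partial import described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the project's actual cleaning script: it assumes the papers are available as one JSON object per line (the raw dump may need preprocessing to reach that form), and the names `take_subset` and `limit` are hypothetical.

```python
import itertools
import json
from typing import Dict, Iterable, Iterator


def take_subset(lines: Iterable[str], limit: int) -> Iterator[Dict]:
    """Yield at most `limit` paper records from JSON Lines input.

    The input is consumed lazily, so the full dataset never has to
    fit in memory at once.
    """
    # Skip blank lines so they do not count toward the limit.
    non_empty = (line for line in lines if line.strip())
    for line in itertools.islice(non_empty, limit):
        yield json.loads(line)


# Hypothetical sample standing in for the real file contents.
sample = [
    '{"_id": "a1", "title": "First paper"}',
    "",
    '{"_id": "b2", "title": "Second paper"}',
    '{"_id": "c3", "title": "Third paper"}',
]
subset = list(take_subset(sample, limit=2))
print([paper["_id"] for paper in subset])
```

With a real file, an open file handle can be passed as `lines` and `limit` set to the number of papers to keep, for example 2315.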

The queries from the main document are also collected in a single linked file.

MongoDB

Each scientific paper is stored with a full description of its structure and content:

- Title, Abstract, Authors (with affiliations, email, bio)
- Metadata (keywords)
- Publication details (journal, volume, number, date, pages)
- Sections: title, text (by paragraph), subsections, figures (image URL and caption)
- Bibliography (set of references)
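To make the list above concrete, here is a sketch of how one such paper could look as a Python dictionary before being inserted into MongoDB. The field names and all values are assumptions chosen for illustration, not the project's actual schema, and `missing_fields` is a hypothetical helper.

```python
from typing import Dict, List

# Illustrative document: every value is made up, and the field names are
# assumptions derived from the list above, not the project's real schema.
paper = {
    "title": "An Example Paper",
    "abstract": "A short summary.",
    "authors": [
        {
            "name": "Jane Doe",
            "affiliation": "Example University",
            "email": "jane.doe@example.org",
            "bio": "Researcher.",
        }
    ],
    "metadata": {"keywords": ["graphs", "databases"]},
    "publication_details": {
        "journal": "Example Journal",
        "volume": 1,
        "number": 2,
        "date": "2021-01-01",
        "pages": "10-20",
    },
    "sections": [
        {
            "title": "Introduction",
            "paragraphs": ["First paragraph.", "Second paragraph."],
            "subsections": [],
            "figures": [
                {"image_url": "https://example.org/fig1.png", "caption": "Figure 1"}
            ],
        }
    ],
    "bibliography": ["ref-1", "ref-2"],
}

# Top-level fields expected in every paper document (assumed names).
REQUIRED_FIELDS = (
    "title",
    "abstract",
    "authors",
    "metadata",
    "publication_details",
    "sections",
    "bibliography",
)


def missing_fields(document: Dict) -> List[str]:
    """Return the required top-level fields absent from `document`."""
    return [field for field in REQUIRED_FIELDS if field not in document]


print(missing_fields(paper))
```

A check like this can run before insertion so that incomplete documents are caught early; nested fields would need their own checks.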

Built With

The project is built with the following technologies:

- Neo4j
- MongoDB
- PySpark

Getting Started

To get a local copy up and running, follow these steps.

Prerequisites

The project requires Neo4j and MongoDB. The install scripts below are part of this repository, so run them from the repository root.

Install Neo4j
```
./install-neo4j.sh
```
Install MongoDB
```
./install-mongodb.sh
```

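Start the MongoDB shell by running the mongosh command in a terminal, then create an administrative MongoDB user:
```
// Switch to the admin database, where the user is stored.
use admin

// Replace the placeholder credentials before running.
db.createUser({
  user: "username",
  pwd: "password",
  roles: [{
    role: "userAdminAnyDatabase",
    db: "admin"
  }]
})

quit()
```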

To test the changes, access the mongo shell using the created administrative user.
```
mongosh -u username -p --authenticationDatabase admin
```
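The same authenticated login can be expressed as a MongoDB connection string for use from Python. The sketch below builds one with the standard library only; `authSource` is the connection string option that corresponds to `--authenticationDatabase`. The helper name `build_mongo_uri` and the default host and port (`localhost`, `27017`) are assumptions.

```python
from urllib.parse import quote_plus


def build_mongo_uri(user, password, host="localhost", port=27017, auth_db="admin"):
    """Build a MongoDB connection string for an authenticated user.

    The user name and password are percent-encoded so that characters
    such as '@' or '/' do not break the URI.
    """
    return "mongodb://{}:{}@{}:{}/?authSource={}".format(
        quote_plus(user),
        quote_plus(password),
        host,
        port,
        quote_plus(auth_db),
    )


# Placeholder credentials, matching the placeholders used in the shell steps.
uri = build_mongo_uri("username", "p@ss/word")
print(uri)
```

The resulting string can be passed to a driver such as PyMongo (`MongoClient(uri)`), which has to be installed separately and is not part of this sketch.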

Download and extract the dataset

```
wget https://originalstatic.aminer.cn/misc/dblp.v13.7z
7z x dblp.v13.7z
```

Installation

Clone the repo

```
git clone https://github.com/IrfEazy/smbud-project.git
```

Usage

For more examples, please refer to the Documentation.

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Irfan Cela - LinkedIn - irfan.cela@mail.polimi.it

Fabio Lusha - LinkedIn - fabio.lusha@mail.polimi.it

Alberto Sandri - LinkedIn - alberto.sandri@mail.polimi.it

Bianca Christiana Savoiu Marinas - LinkedIn - biancachristiana.savoiu@mail.polimi.it

Enrico Simionato - LinkedIn - enrico.simionato@mail.polimi.it

Project Link: https://github.com/IrfEazy/smbud-project
