This program produces the responses to the challenges asked in the final exercise of Module 1 of IronHack's Data Analytics Bootcamp [PT2020]
This is an exercise on constructing a DATA PIPELINE, showcasing the programming skills and tools acquired in the first module of the program:
"A data pipeline views all data as streaming data and it allows for flexible schemas. Regardless of whether it comes from static sources or from real-time sources, the data pipeline divides each data stream into smaller chunks that it processes in parallel, conferring extra computing power."
- Tables (.db). In the following link you can find the .db file with the main dataset:
- API. The project uses the Open Skills Project API.
- Web Scraping. The project retrieves the ISO 3166 alpha-2 codes from Wikipedia and the list of countries inside Europe from the World Health Organization.
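As an illustration of the scraping step, here is a minimal sketch of extracting ISO 3166-1 alpha-2 codes with Beautiful Soup. The HTML snippet and the function name are hypothetical stand-ins that mimic a Wikipedia-style `wikitable`; the real pipeline fetches the live page with `requests`.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet mimicking the Wikipedia ISO 3166-1 alpha-2 table layout.
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th>Code</th><th>Country name</th></tr>
  <tr><td>ES</td><td>Spain</td></tr>
  <tr><td>GB</td><td>United Kingdom</td></tr>
</table>
"""

def extract_alpha2_codes(html):
    """Map ISO 3166-1 alpha-2 codes to country names from a wikitable."""
    soup = BeautifulSoup(html, "html.parser")
    codes = {}
    # Skip the header row, then read the two cells of each data row.
    for row in soup.find("table", class_="wikitable").find_all("tr")[1:]:
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if len(cells) == 2:
            codes[cells[0]] = cells[1]
    return codes

print(extract_alpha2_codes(SAMPLE_HTML))
# {'ES': 'Spain', 'GB': 'United Kingdom'}
```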
You need to create a Data Pipeline that retrieves the following table:
Country | Job Title | Gender | Quantity | Percentage |
---|---|---|---|---|
Spain | Data Scientist | Male | 25 | 5% |
Spain | Data Scientist | Female | 25 | 5% |
... | ... | ... | ... | ... |
** Percentages are in proportion to each gender in each job category for each country
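One plausible way to build such a table with pandas is a group-and-count followed by a transform. This is a sketch with toy data: the column names match the table above, and the percentage rule (each count relative to the gender's total within the country) is an assumption based on the note.

```python
import pandas as pd

# Toy respondent data; the real data comes from the .db dataset.
df = pd.DataFrame({
    "Country":   ["Spain"] * 4,
    "Job Title": ["Data Scientist"] * 4,
    "Gender":    ["Male", "Male", "Female", "Female"],
})

# Count respondents per (country, job title, gender).
counts = (df.groupby(["Country", "Job Title", "Gender"])
            .size()
            .reset_index(name="Quantity"))

# Assumed rule: percentage of each gender's total within the country.
gender_totals = counts.groupby(["Country", "Gender"])["Quantity"].transform("sum")
counts["Percentage"] = (
    (100 * counts["Quantity"] / gender_totals).round(0).astype(int).astype(str) + "%"
)
print(counts)
```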
The main purpose of this challenge was to work with in-favor/against polls, which were complex to clean from the raw data.
My interpretation of the second challenge was to visually represent how the different genders responded to the basic income polls and whether there was a significant difference.
Position | Pro Arguments for Male | Pro Arguments for Female |
---|---|---|
Responses | | |
Position | Against for Male | Against for Female |
---|---|---|
Responses | | |
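A minimal sketch of how the stance-by-gender summary could be produced with pandas; the column names and toy responses are hypothetical, standing in for the cleaned poll data.

```python
import pandas as pd

# Hypothetical cleaned poll data: one stance on basic income per respondent.
polls = pd.DataFrame({
    "Gender":   ["Male", "Female", "Male", "Female", "Male"],
    "Position": ["in favor", "in favor", "against", "against", "in favor"],
})

# Cross-tabulate stance by gender, matching the tables above.
summary = pd.crosstab(polls["Position"], polls["Gender"])
print(summary)
```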
The main purpose of this challenge was to work with the education level table, which uses a discrete qualitative classification of levels.
I framed the challenge to continue working with the collected data; the final table shows the top 5 skills for each education level, using matplotlib to visualize quantities.
Education Level | Top 5 Skills |
---|---|
high | |
medium | |
low | |
no education | |
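The top-skills table can be sketched with a count-sort-head pattern in pandas. This is a toy example: the column names and data are assumptions, standing in for the cleaned ddbb tables.

```python
import pandas as pd

# Toy skills data; the real input comes from the cleaned database tables.
skills = pd.DataFrame({
    "education_level": ["high", "high", "high", "low", "low"],
    "skill":           ["Python", "SQL", "Python", "Excel", "Excel"],
})

# Count skill mentions per education level, then keep the 5 most frequent
# skills within each level.
top5 = (skills.groupby(["education_level", "skill"])
              .size()
              .reset_index(name="count")
              .sort_values(["education_level", "count"], ascending=[True, False])
              .groupby("education_level")
              .head(5))
print(top5)
```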
As a prerequisite, the programming language of this repository is Python 3.7.3, so you must have Python 3 installed. The native packages in use are:
Furthermore, the following libraries need to be installed in the working environment:
- SQL Alchemy (v.1.3.17)
- Pandas (v.0.24.2)
- Numpy (v.1.18.1)
- Requests (v.2.23.0)
- Beautiful Soup (v.4.9.1)
- Scikit-learn (v.0.23.1)
- Matplotlib (v.3.1.2)
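Assuming dependencies are managed with pip, the pinned versions above can be installed in one command (note that the PyPI package names differ slightly from the display names):

```shell
pip install SQLAlchemy==1.3.17 pandas==0.24.2 numpy==1.18.1 \
            requests==2.23.0 beautifulsoup4==4.9.1 \
            scikit-learn==0.23.1 matplotlib==3.1.2
```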
Version 1.0 [04.07.2020] > First version done for class presentation
Version 1.0.1 [08.07.2020] > Post presentation corrections
- Download the repo (make sure you have fulfilled the prerequisites).
- Run the script `main_script.py` with a valid argument. The only option is a country argument. If the country is not in the information retrieved from the web sources, the program will exit at the beginning.
- Possible inputs:
- 3.1. View all countries contained in the database:
$ python main_script.py -c All
You will get:
· Parsing argument: ['All']
··· Fetching European countries from web scrapping
··· Validating country argparse
>> getting all countries
- 3.2. View a specific country in the ddbb:
$ python main_script.py -c United Kingdom
You will get:
· Parsing argument: ['United', 'Kingdom']
··· Fetching European countries from web scrapping
··· Validating country argparse
·· country_argument found in ddbb
- 3.3. Wrong entries:
$ python main_script.py -c
You will get:
· Parsing argument: []
··· Fetching European countries from web scrapping
··· Validating country argparse
>> country_argument not found.
>> proceeding to exit
**Help from argparse can always be called when in doubt:
$ python main_script.py -h
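The argument handling shown above could be sketched with argparse as follows. The flag name `-c` comes from the usage examples; `nargs="*"` (to accept multi-word names like "United Kingdom" and an empty argument list) and the validation set are assumptions.

```python
import argparse

# Minimal sketch of the country-argument parsing described above.
parser = argparse.ArgumentParser(description="Data pipeline entry point")
parser.add_argument("-c", "--country", nargs="*", default=[],
                    help="country to filter, or 'All' for every country")

# Simulate: $ python main_script.py -c United Kingdom
args = parser.parse_args(["-c", "United", "Kingdom"])
country = " ".join(args.country)          # -> "United Kingdom"

# Hypothetical validation against the scraped list of European countries.
european_countries = {"Spain", "United Kingdom", "France"}
if country != "All" and country not in european_countries:
    raise SystemExit(">> country_argument not found.\n>> proceeding to exit")
print(country)
```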
linkedin.com/in/lauramielgo for inquiries.
Big thanks to TAs and teachers for the help and support in the development of this project:
@github/potacho
@github/TheGurus
The folder structure follows the template given in class, generating as many files as necessary inside each package.
└── project
├── __trash__
├── .gitignore
├── .env
├── requeriments.txt
├── README.md
├── main_script.py
├── notebooks
│ ├── acquisition.ipynb
│ └── wrangling.ipynb
├── package_acquisition
│ ├── module_acquisition.py
│ └── module_cleaning.py
├── package_wrangling
│ └── module_awrangling.py
├── package_analysis
│ └── module_analysis.py
├── package_reporting
│ └── module_reporting.py
└── data
    ├── raw
    │   └── ddbb
    ├── processed
    │   └── (here you will find each ddbb table cleaned)
└── results
├── df_percentage_by_job_and_gender.csv
├── df_top_skills.csv
├── viz_distribution_top_skills.png
└── viz_distribution_basic_income.png