Data Analysis on Motor Vehicle Collisions using MapReduce
This repository contains code for analyzing the NYPD Motor Vehicle Collisions dataset, provided by NYC Open Data, using the MapReduce paradigm. The dataset contains all reports of vehicular incidents in New York City. More information about the dataset is available here. The dataset is updated regularly and has many attributes. The aim of the project is to perform exploratory data analysis on the dataset and to learn more about the MapReduce paradigm. I used the Hadoop Streaming API and wrote the mapper and reducer in Python.
Note that the mapper and reducer read their data from standard input, so the scripts are not restricted to any one way of supplying the data.
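As a rough illustration of this stdin-based approach, here is a minimal sketch of a Hadoop Streaming mapper that emits one line of the form vehicle type, tab, 1 for every vehicle involved in a collision. It is not the repository's actual mapper: the column positions in VEHICLE_COLS and the header text in HEADER_MARKER are hypothetical and should be checked against the header of the CSV you download, and uppercasing the vehicle type is a normalization choice of this sketch.

```python
#!/usr/bin/env python3
"""Hadoop Streaming mapper sketch: emit "<vehicle type>\t1" per vehicle."""
import csv
import sys

# Hypothetical positions of the five vehicle-type columns; verify them
# against the header row of the actual CSV file before using this.
VEHICLE_COLS = (24, 25, 26, 27, 28)
# Assumed header text found at VEHICLE_COLS[0] on the header row.
HEADER_MARKER = "VEHICLE TYPE CODE 1"


def map_records(lines):
    """Yield (vehicle_type, 1) pairs from raw CSV lines."""
    for row in csv.reader(lines):
        # Skip malformed rows too short to hold the vehicle columns.
        if len(row) <= max(VEHICLE_COLS):
            continue
        # Only the input split that starts the file contains the header.
        if row[VEHICLE_COLS[0]].strip() == HEADER_MARKER:
            continue
        for col in VEHICLE_COLS:
            vehicle = row[col].strip().upper()
            if vehicle:  # empty cells: fewer than five vehicles involved
                yield vehicle, 1


def main():
    # Hadoop Streaming feeds the input split on stdin, one line at a time.
    for vehicle, count in map_records(sys.stdin):
        sys.stdout.write(f"{vehicle}\t{count}\n")


if __name__ == "__main__":
    main()
```

Parsing each line with csv.reader instead of splitting on commas keeps quoted fields that contain commas intact, so the column positions stay correct.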
I counted how many times each type of vehicle was involved in a collision over the entire period covered by the data:
VEHICLE TYPE                        COUNT
AMBULANCE                            3713
BICYCLE                             24153
BUS                                 25871
FIRE TRUCK                           1333
LARGE COM VEH(6 OR MORE TIRES)      27981
LIVERY VEHICLE                      17775
MOTORCYCLE                          10029
OTHER                               51360
PASSENGER VEHICLE                 1005160
PEDICAB                               123
PICK-UP TRUCK                       26281
SCOOTER                               534
SMALL COM VEH(4 TIRES)              30048
SPORT UTILITY / STATION WAGON      363209
TAXI                                63892
UNKNOWN                            105481
VAN                                 51666
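Totals like the ones above come out of the reduce step. The following is a minimal sketch of such a reducer, not the repository's code. It assumes each input line is a key, a tab, and an integer count, and that Hadoop Streaming has already sorted the lines by key; that sorting is what lets itertools.groupby treat each run of equal keys as one group.

```python
#!/usr/bin/env python3
"""Hadoop Streaming reducer sketch: sum the counts emitted for each key."""
import sys
from itertools import groupby


def parse(lines):
    """Split "<key>\t<count>" lines into (key, int(count)) pairs."""
    for line in lines:
        key, _, value = line.rstrip("\n").partition("\t")
        if value:  # ignore lines without a tab-separated value
            yield key, int(value)


def reduce_counts(lines):
    """Yield (key, total); correct only if the input is sorted by key."""
    for key, group in groupby(parse(lines), key=lambda kv: kv[0]):
        yield key, sum(count for _, count in group)


def main():
    for key, total in reduce_counts(sys.stdin):
        sys.stdout.write(f"{key}\t{total}\n")


if __name__ == "__main__":
    main()
```

Because both steps read standard input, a job like this can also be tried without Hadoop through a shell pipeline such as cat collisions.csv | python3 mapper.py | sort | python3 reducer.py, where the file and script names are placeholders and sort stands in for Hadoop's shuffle-and-sort phase.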
If you would like to discuss anything, please feel free to reach out on LinkedIn. If you find any problems, please report them in this repository's issue tracker.