Skip to content

Deepayanbasu07/Zomato-EDA

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Zomato Exploratory Data Analysis (EDA) Project

This project focuses on exploratory data analysis of the Zomato dataset. The dataset provides various details about restaurants, including location, cuisines, pricing, ratings, and more. Through this analysis, we aim to uncover key insights that can inform business decisions or form the basis for predictive models.

Project Overview

  1. Data Cleaning and Preprocessing:

    • Loaded and cleaned the Zomato dataset to handle missing values, remove duplicates, and standardize formats.
    • Key columns analyzed: City, Cuisines, Average Cost for two, Aggregate rating, Votes, Price range, and more.
  2. Geographical Insights:

    • City-Level Analysis: Analyzed the distribution of restaurants across cities and identified top cities by restaurant count and average ratings.
    • Locality Analysis: Explored restaurant distribution and ratings by localities within cities.
  3. Cuisine Analysis:

    • Popular Cuisines: Identified the most popular cuisines in the dataset based on the frequency of occurrences.
    • Cuisine vs Ratings: Investigated how different cuisines correlate with average ratings.
  4. Price and Cost Analysis:

    • Price Range Distribution: Visualized the distribution of restaurants across different price ranges.
    • Cost Analysis by City: Compared the average cost for two across different cities.
    • Price Range vs Rating: Analyzed the relationship between price range and aggregate ratings.
  5. Booking and Delivery Services:

    • Table Booking: Analyzed the impact of table booking availability on restaurant ratings.
    • Online Delivery: Explored how the availability of online delivery services affects restaurant ratings.
  6. Votes and Ratings Relationship:

    • Votes vs Ratings: Investigated the relationship between the number of votes a restaurant receives and its aggregate rating.

Key Insights

1. Geographical Insights

  • Top Cities by Restaurant Count: Cities like Delhi, Bengaluru, and Mumbai have the highest number of restaurants in the dataset.
  • Top Cities by Average Ratings: Smaller cities like Pune and Hyderabad tend to have higher average ratings compared to larger metro areas.
  • Locality-Based Analysis: Certain localities within cities stand out for their high concentration of highly-rated restaurants.

2. Cuisine Analysis

  • Most Popular Cuisines: North Indian, Chinese, and Italian are among the most popular cuisines offered by restaurants in the dataset.
  • Cuisine and Ratings: Fine-dining cuisines such as Italian and Mediterranean tend to have higher average ratings compared to fast food or street food options.

3. Price and Cost Analysis

  • Price Range Distribution: Most restaurants fall into the mid-range price bracket, with a few high-end and low-end outliers.
  • Cost by City: Cities like Mumbai and Delhi have higher average costs for two, whereas smaller cities like Pune have relatively lower costs.
  • Price vs Rating: There is a noticeable trend where higher-priced restaurants tend to have better ratings, indicating that price may be a reflection of quality.

4. Booking and Delivery Services

  • Table Booking vs Ratings: Restaurants that offer table booking tend to have higher average ratings compared to those that don't.
  • Online Delivery vs Ratings: Restaurants offering online delivery also show a trend of higher ratings, possibly indicating customer preference for convenience.

5. Votes and Ratings Relationship

  • Votes Correlation: There is a positive correlation between the number of votes and ratings, suggesting that more popular restaurants (in terms of customer engagement) tend to have higher ratings.

Visualizations

The analysis includes several visualizations to help uncover patterns and trends:

  • Bar charts showing restaurant distribution by city and locality.
  • Scatter plots and pair plots to visualize relationships between variables like cost, votes, and ratings.
  • Heatmaps to show correlations between numerical variables.

Conclusion

This exploratory data analysis of the Zomato dataset has provided valuable insights into the relationship between location, cuisine, pricing, and customer feedback. These insights can be used for strategic decision-making, marketing initiatives, or as features for machine learning models in predictive tasks such as rating prediction or customer segmentation.

Future Work

  • Sentiment Analysis: If customer reviews are available, sentiment analysis could provide further insights into customer satisfaction.
  • Recommendation System: Build a recommendation system based on user preferences and restaurant attributes.
  • Clustering Analysis: Use clustering techniques to group similar restaurants and understand key characteristics of different clusters.

Tools and Technologies

  • Python: For data analysis and visualization.
  • Pandas: For data manipulation and aggregation.
  • Seaborn & Matplotlib: For data visualization.
  • Jupyter Notebook: For conducting and documenting the analysis.

File Structure

  • Zomato_EDA.ipynb: Jupyter notebook containing the entire analysis and code implementation.
  • README.md: Project overview, insights, and future directions.

Feel free to explore the notebook for detailed analysis and visualizations!

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published