Merge pull request #75 from Maykkkk/Maykkkk-patch-1
Added Other Materials
Swastyy authored Oct 16, 2023
2 parents 98fcbc6 + 3682043 commit 1ead78c
Showing 20 changed files with 47,004 additions and 0 deletions.
2,061 changes: 2,061 additions & 0 deletions Mathematics/Other material/Housing Price Prediction/Housing-Prediction.ipynb


25 changes: 25 additions & 0 deletions Mathematics/Other material/Housing Price Prediction/README.md
# Housing Price Prediction Model

Welcome to the Housing Price Prediction model. This machine learning model predicts housing prices from a dataset of housing-related features. Whether you're a beginner or an experienced practitioner, this project is a good starting point for applying machine learning to real estate.

## Usage Guide
Once the setup is complete, you can use the Housing Price Prediction model as follows:

- Data Preparation: Prepare your dataset in the same format as the example data, and make sure its feature columns match the columns used during training.
- Model Loading: To use a pre-trained model, set the `model_path` variable in `predict_prices.py` to the location of your saved model.
- Prediction: Run the prediction script with the command given in the setup instructions. The model outputs predicted housing prices for the input features.
- Interpret Results: Analyze the predicted prices and assess the model's performance. To improve accuracy, you can fine-tune the model parameters or adjust the features.
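The last step, assessing performance, is commonly done for regression with the root mean squared error (RMSE), which is expressed in the same units as the prices. The sketch below computes it in plain Python; the helper name `rmse` is hypothetical and not taken from the notebook, which may use a different metric.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between actual and predicted prices.

    Hypothetical helper for illustration; not part of the notebook.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one value is required")
    # Square each prediction error, average them, then take the square root.
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    # Toy values, not real model output: every prediction is off by 10,000.
    actual = [200000.0, 150000.0, 300000.0]
    predicted = [210000.0, 140000.0, 290000.0]
    print(rmse(actual, predicted))
```

A lower RMSE means predictions are closer to the actual prices on average; because errors are squared, a few large misses raise it noticeably.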

## Model Details
The Housing Price Prediction model is built with the scikit-learn library and uses regression techniques to predict housing prices from features such as location (longitude and latitude), housing median age, total rooms and bedrooms, population, number of households, median income, and ocean proximity. It has been trained on a real-world dataset derived from the 1990 California census, so its predictions reflect the housing market patterns captured in that data.
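To illustrate the kind of regression involved, the sketch below fits a one-feature ordinary least squares line, for example predicting `median_house_value` from `median_income`. It is a simplified stand-in written in plain Python, not the model used in the notebook; the function names `fit_simple_ols` and `predict` are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_simple_ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Fit y ~ intercept + slope * x by ordinary least squares.

    Returns (intercept, slope). Hypothetical helper for illustration.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept: float, slope: float, x: Sequence[float]) -> List[float]:
    """Apply the fitted line to new feature values."""
    return [intercept + slope * xi for xi in x]
```

A real model would typically use several features at once (multiple regression) and be evaluated on data held out from training.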

## Dataset
The dataset used for training and testing this model is not included in this repository due to its size. You can find the dataset and its description in Chapter X of "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow" by Aurélien Géron. Please download the dataset from the provided source and ensure it's appropriately formatted before use.
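Once downloaded, the file can be checked before use. The sketch below reads CSV lines with Python's standard `csv` module, verifies that the header matches the ten columns of the California Housing dataset (in the order reported by `housing.info()`), and converts numeric cells to floats, mapping empty cells to `None`. The expected column order and the helper name `load_housing_csv` are assumptions, not taken from the notebook.

```python
import csv
from typing import Dict, Iterable, List, Union

# Assumed CSV header, in the column order reported by housing.info().
EXPECTED_COLUMNS = [
    "longitude", "latitude", "housing_median_age", "total_rooms",
    "total_bedrooms", "population", "households", "median_income",
    "median_house_value", "ocean_proximity",
]
TEXT_COLUMNS = {"ocean_proximity"}

Row = Dict[str, Union[float, str, None]]


def load_housing_csv(lines: Iterable[str]) -> List[Row]:
    """Parse housing CSV lines into dicts, validating the header first.

    Numeric cells become floats; empty numeric cells (such as the removed
    total_bedrooms values) become None.
    """
    reader = csv.DictReader(lines)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    rows: List[Row] = []
    for raw in reader:
        row: Row = {}
        for name in EXPECTED_COLUMNS:
            # Short rows yield None for missing fields; treat them as empty.
            cell = (raw[name] or "").strip()
            if name in TEXT_COLUMNS:
                row[name] = cell
            else:
                row[name] = float(cell) if cell else None
        rows.append(row)
    return rows
```

Because `csv.DictReader` accepts any iterable of lines, a file object opened with `open("housing.csv", newline="")` can be passed directly; the file name here is a placeholder.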

## Contributing
Contributions to this project are more than welcome! If you find any issues, have suggestions, or want to add new features, please feel free to open an issue or submit a pull request.

Reg. no.: 22BCE10275
# California Housing

## Source
This dataset is a modified version of the California Housing dataset available from [Luís Torgo's page](http://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.html) (University of Porto). Luís Torgo obtained it from the StatLib repository (which is now closed). The dataset may also be downloaded from StatLib mirrors.

This dataset appeared in a 1997 paper titled *Sparse Spatial Autoregressions* by Pace, R. Kelley and Ronald Barry, published in the *Statistics and Probability Letters* journal. They built it using the 1990 California census data. It contains one row per census block group. A block group is the smallest geographical unit for which the U.S. Census Bureau publishes sample data (a block group typically has a population of 600 to 3,000 people).

## Tweaks
The dataset in this directory is almost identical to the original, with two differences:

* 207 values were randomly removed from the `total_bedrooms` column, so we can discuss what to do with missing data.
* An additional categorical attribute called `ocean_proximity` was added, indicating (very roughly) whether each block group is near the ocean, near the Bay Area, inland, or on an island. This allows discussing what to do with categorical data.
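One common way to handle the missing `total_bedrooms` values is to replace each one with the median of the values that are present; dropping the affected rows or the whole column are alternatives. The sketch below shows median imputation in plain Python; the helper name `impute_median` is hypothetical. It returns the median it used so the same value can be reapplied to other data, such as a test split, instead of being recomputed there.

```python
import statistics
from typing import List, Optional, Sequence, Tuple


def impute_median(values: Sequence[Optional[float]]) -> Tuple[List[float], float]:
    """Replace None entries with the median of the non-missing values.

    Returns the filled list and the median that was used.
    """
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("cannot impute: every value is missing")
    median = statistics.median(present)
    filled = [median if v is None else v for v in values]
    return filled, median
```

The median is often preferred over the mean here because it is less affected by the very large `total_bedrooms` values in some districts.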

Note that the block groups are called "districts" in the Jupyter notebooks, simply because in some contexts the name "block group" was confusing.

## Data description

```
>>> housing.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20640 entries, 0 to 20639
Data columns (total 10 columns):
longitude             20640 non-null float64
latitude              20640 non-null float64
housing_median_age    20640 non-null float64
total_rooms           20640 non-null float64
total_bedrooms        20433 non-null float64
population            20640 non-null float64
households            20640 non-null float64
median_income         20640 non-null float64
median_house_value    20640 non-null float64
ocean_proximity       20640 non-null object
dtypes: float64(9), object(1)
memory usage: 1.6+ MB
```

```
>>> housing["ocean_proximity"].value_counts()
<1H OCEAN     9136
INLAND        6551
NEAR OCEAN    2658
NEAR BAY      2290
ISLAND           5
Name: ocean_proximity, dtype: int64
```
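Because `ocean_proximity` takes only the five text values listed above, a regression model usually needs it converted to numbers first. One standard approach is one-hot encoding: one 0/1 column per category, with a single 1 per row. The plain-Python sketch below assumes the categories in sorted order and uses a hypothetical helper name; scikit-learn's `OneHotEncoder` and pandas' `get_dummies` perform the same transformation.

```python
from typing import Dict, List, Sequence

# The five categories reported by value_counts(), in sorted order (an
# arbitrary but reproducible choice).
OCEAN_PROXIMITY_CATEGORIES = ("<1H OCEAN", "INLAND", "ISLAND", "NEAR BAY", "NEAR OCEAN")


def one_hot(values: Sequence[str],
            categories: Sequence[str] = OCEAN_PROXIMITY_CATEGORIES) -> List[List[int]]:
    """Encode each category string as a 0/1 vector with a single 1."""
    index: Dict[str, int] = {c: i for i, c in enumerate(categories)}
    encoded: List[List[int]] = []
    for value in values:
        if value not in index:
            raise ValueError(f"unknown category: {value!r}")
        row = [0] * len(categories)
        row[index[value]] = 1
        encoded.append(row)
    return encoded
```

One-hot encoding avoids implying an order between categories, which mapping them to 0, 1, 2, ... would do. Note that `ISLAND` has only 5 rows, so a random train/test split may leave it out of one side.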

Note that the `describe()` statistics below report a count of 16,513 rather than the 20,640 rows listed by `housing.info()`, so they appear to have been computed on a subset of the districts (for example, a training split) rather than on the full dataset.

```
>>> housing.describe()
          longitude      latitude  housing_median_age   total_rooms  \
count  16513.000000  16513.000000        16513.000000  16513.000000
mean    -119.575972     35.639693           28.652335   2622.347605
std        2.002048      2.138279           12.576306   2138.559393
min     -124.350000     32.540000            1.000000      6.000000
25%     -121.800000     33.940000           18.000000   1442.000000
50%     -118.510000     34.260000           29.000000   2119.000000
75%     -118.010000     37.720000           37.000000   3141.000000
max     -114.310000     41.950000           52.000000  39320.000000

       total_bedrooms    population    households  median_income
count    16355.000000  16513.000000  16513.000000   16513.000000
mean       534.885112   1419.525465    496.975050       3.875651
std        412.716467   1115.715084    375.737945       1.905088
min          2.000000      3.000000      2.000000       0.499900
25%        295.000000    784.000000    278.000000       2.566800
50%        433.000000   1164.000000    408.000000       3.541400
75%        644.000000   1718.000000    602.000000       4.745000
max       6210.000000  35682.000000   5358.000000      15.000100
```
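For reference, each column of the `describe()` output can be reproduced in plain Python: `count` excludes missing values (which is why `total_bedrooms` has a lower count than the other columns), `std` is the sample standard deviation (dividing by n - 1), and the percentiles use linear interpolation between sorted values, which is pandas' default. The helper below is a hypothetical sketch of that computation for a single column, not part of the notebook.

```python
import math
import statistics
from typing import Dict, Optional, Sequence


def _percentile(sorted_vals: Sequence[float], q: float) -> float:
    """Linearly interpolate between the two closest ranks (pandas' default)."""
    pos = (len(sorted_vals) - 1) * q
    lower = math.floor(pos)
    upper = min(lower + 1, len(sorted_vals) - 1)
    frac = pos - lower
    return sorted_vals[lower] + (sorted_vals[upper] - sorted_vals[lower]) * frac


def describe_column(values: Sequence[Optional[float]]) -> Dict[str, float]:
    """Summary statistics for one numeric column, skipping missing (None) values."""
    present = sorted(v for v in values if v is not None)
    if len(present) < 2:
        raise ValueError("need at least two non-missing values")
    return {
        "count": float(len(present)),
        "mean": statistics.mean(present),
        "std": statistics.stdev(present),  # sample std, n - 1 denominator
        "min": present[0],
        "25%": _percentile(present, 0.25),
        "50%": _percentile(present, 0.50),
        "75%": _percentile(present, 0.75),
        "max": present[-1],
    }
```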
