HotelPricePrediction Using Machine Learning

This project develops a machine learning model that predicts hotel prices from factors such as star rating, review rating, and the amenities a hotel offers. Using data collected from https://www.trivago.com/, the model explores the relationships and correlations among these factors to forecast hotel room prices.

Tech Stack

Tool	Description
Selenium	Automated web scraping tool used to extract data from the Trivago website by locating elements using ID, class name, and XPath, and extracting text and attribute values.
PyCharm	IDE used for developing Python scripts for data extraction, cleaning, and analysis.
Microsoft Azure	Cloud platform utilized to establish a data pipeline for effectively managing and securely storing scraped data in Azure Blob Storage.
REST API Postman	Tool used to create and test REST API endpoints, facilitating practical, real-time integration and application of the machine learning model.

Sample DataSet

Data Pipeline - Microsoft Azure

Hotel_Data: The starting point of the pipeline: the input dataset of hotel information loaded from scrapped_HotelsDataset.csv.
Clean Missing Data (clean_missing_data1): Rows with a missing value in the Hotel Rating column are removed, so that only complete cases are used in the subsequent steps.
Summarize Data: Descriptive statistics such as the mean, median, and mode are computed to understand the dataset's characteristics before further processing.
Clean Missing Data (clean_missing_data_2): Null values in the Review Rating column are replaced with the column median.
Split Data: The dataset is split into two parts, 70% for training the model and 30% for testing it, so that the model's performance can be validated on unseen data.
Poisson Regression: Sets up the algorithm to be trained. Poisson regression is suited to count data and is often used for tasks such as predicting the number of bookings or events.
Train Model: The model is trained using the training dataset. This step involves adjusting the model parameters to minimize error and improve predictions.
Select Columns in Dataset: Selects the features (columns) that are relevant for model training, which keeps the model focused on the informative data points.
Score Model: The trained model generates price predictions for the test dataset; each prediction is stored in a "Scored Labels" column.
Evaluate Model: The scored predictions are compared with the actual prices using regression metrics such as MAE, RMSE, and R². This indicates how well the model is likely to perform in real-world scenarios.
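The cleaning and splitting steps above can be sketched in plain Python. This is only an illustration of the logic, not the Azure modules themselves: the rows are made-up sample values (only the 3.0 / 7.1 / 4479 row echoes figures from this README), and the helper names `drop_missing`, `impute_median`, and `split_rows` are hypothetical.

```python
import random
import statistics

# Illustrative rows only; None marks a missing value.
rows = [
    {"Hotel Rating": 3.0, "Review Rating": 7.1, "Price": 4479},
    {"Hotel Rating": None, "Review Rating": 8.0, "Price": 5200},
    {"Hotel Rating": 4.0, "Review Rating": None, "Price": 6100},
    {"Hotel Rating": 5.0, "Review Rating": 9.0, "Price": 9800},
    {"Hotel Rating": 2.0, "Review Rating": 6.5, "Price": 2500},
    {"Hotel Rating": 4.0, "Review Rating": 8.4, "Price": 7000},
    {"Hotel Rating": 3.0, "Review Rating": 7.8, "Price": 4100},
    {"Hotel Rating": 5.0, "Review Rating": 9.2, "Price": 11000},
    {"Hotel Rating": 3.0, "Review Rating": None, "Price": 3900},
    {"Hotel Rating": 4.0, "Review Rating": 8.1, "Price": 6500},
    {"Hotel Rating": 2.0, "Review Rating": 6.9, "Price": 2800},
]


def drop_missing(rows, column):
    """Remove every row whose value in `column` is missing."""
    return [r for r in rows if r[column] is not None]


def impute_median(rows, column):
    """Replace missing values in `column` with the median of the observed values."""
    observed = [r[column] for r in rows if r[column] is not None]
    median = statistics.median(observed)
    return [{**r, column: median if r[column] is None else r[column]} for r in rows]


def split_rows(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


cleaned = impute_median(drop_missing(rows, "Hotel Rating"), "Review Rating")
train, test = split_rows(cleaned)
print(len(cleaned), len(train), len(test))
```

The order matters: rows are dropped on Hotel Rating first, so the Review Rating median is computed only from rows that remain in the dataset.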

Features found to have an impact on the target variable

Selected Features: Columns up to "Review Rating" that showed significant correlations were retained.
Correlations: Price, Hotel Rating, Pool, Hotel bar, Spa, Restaurant, Parking, Free WiFi, A/C, Review Rating.
Dropped Features: Columns with low or no correlation were removed: WiFi in lobby, Pets, Hotel Name.
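A correlation-based filter of this kind can be sketched as follows. The sample values and the 0.3 cutoff are assumptions for illustration (the README does not state the threshold that was used), and `pearson` and `select_features` are hypothetical helper names.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, target, threshold=0.3):
    """Keep the feature names whose |r| with the target meets the threshold."""
    return [
        name
        for name, values in columns.items()
        if abs(pearson(values, target)) >= threshold
    ]


# Made-up sample values; 1/0 encodes whether an amenity is offered.
price = [2500, 4479, 6100, 9800, 7000]
columns = {
    "Hotel Rating": [2, 3, 4, 5, 4],
    "Pets": [1, 0, 0, 1, 0],
}
print(select_features(columns, price))
```

Using the absolute value keeps strongly negative correlations as well, since a feature that lowers the price is as informative as one that raises it.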

Evaluation Metrics - Poisson Regression Algorithm

Evaluation Metrics for the different ML Algorithms

How to Interpret

Lower values of MAE and RMSE indicate better model performance.
Higher values of R² indicate a better fit of the model to the data.
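For reference, the three metrics can be computed as in the minimal sketch below. The numbers are made up for illustration and are not results from this project.

```python
import math


def mae(actual, predicted):
    """Mean absolute error: average size of the prediction errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: like MAE, but penalizes large errors more."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def r_squared(actual, predicted):
    """Share of the target's variance that the predictions explain."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


actual = [100, 200, 300]
predicted = [110, 190, 310]
print(mae(actual, predicted), rmse(actual, predicted), r_squared(actual, predicted))
```

MAE and RMSE are expressed in the same unit as the price, while R² is unitless and equals 1 for perfect predictions.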

Model Accuracies and Fine Tuning

Cleaning the Data: Removing or imputing missing values and correcting inconsistencies in the data.
Feature Selection: Selecting the most relevant features using techniques such as Pearson correlation analysis, and dropping irrelevant features.
Model Selection: Experimenting with different algorithms (e.g., Linear Regression, Boosted Decision Tree, Decision Tree, and Poisson Regression) to find the best-performing one.
Poisson Regression showed the best performance, with a notable increase in accuracy (62%).
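To make concrete what a Poisson regression fits, the sketch below estimates a one-feature model E[y | x] = exp(b0 + b1 * x) with Newton's method. It is an illustration under stated assumptions, not the Azure module's implementation: the data are made up, a single feature is used, and `fit_poisson` is a hypothetical helper that requires at least two distinct x values.

```python
import math


def fit_poisson(xs, ys, iterations=25):
    """Fit E[y | x] = exp(b0 + b1 * x) by Newton's method on the Poisson log-likelihood."""
    b0 = math.log(sum(ys) / len(ys))  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(iterations):
        mus = [math.exp(b0 + b1 * x) for x in xs]
        # Gradient of the log-likelihood with respect to (b0, b1).
        g0 = sum(y - mu for y, mu in zip(ys, mus))
        g1 = sum((y - mu) * x for x, y, mu in zip(xs, ys, mus))
        # Entries of the 2x2 information matrix (the negative Hessian).
        h00 = sum(mus)
        h01 = sum(mu * x for x, mu in zip(xs, mus))
        h11 = sum(mu * x * x for x, mu in zip(xs, mus))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * delta = g in closed form.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
    return b0, b1


# Made-up example: a non-negative target that grows with the feature.
xs = [1, 2, 3, 4, 5]
ys = [2, 3, 6, 7, 12]
b0, b1 = fit_poisson(xs, ys)
predicted = [math.exp(b0 + b1 * x) for x in xs]
print(b0, b1)
```

Because of the exponential link, predictions are always positive, and each unit increase in the feature multiplies the predicted value by exp(b1) rather than adding a fixed amount.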

Inference and Model Deployment - Postman

The JSON input sent via Postman includes the hotel name, hotel rating (3.0), review rating (7.1), and available amenities (WiFi, restaurant, etc.). The actual price of this hotel is 4479.

The JSON output received from the REST API contains the same hotel details and amenities, with an additional "Scored Labels" field showing the model-predicted price of 4473.08, which can be compared with the actual price.
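The comparison between the two prices can be reproduced with a short script. The JSON layout below is an assumption made for illustration: apart from "Scored Labels" and the values 3.0, 7.1, 4479, and 4473.08 quoted above, the field names and the hotel name are placeholders, and the real request schema of the deployed endpoint may differ. No network call is made.

```python
import json

# Hypothetical request body; field names are placeholders.
request_body = {
    "Hotel Name": "Example Hotel",
    "Hotel Rating": 3.0,
    "Review Rating": 7.1,
    "Free WiFi": 1,
    "Restaurant": 1,
    "Price": 4479,
}

# Hypothetical response: the same fields plus the model's prediction.
response_text = json.dumps({**request_body, "Scored Labels": 4473.08})


def compare(response_text, actual_key="Price", predicted_key="Scored Labels"):
    """Return the absolute and percentage error between actual and predicted price."""
    record = json.loads(response_text)
    actual = float(record[actual_key])
    predicted = float(record[predicted_key])
    abs_error = abs(actual - predicted)
    return abs_error, abs_error / actual * 100


abs_error, pct_error = compare(response_text)
print(f"absolute error: {abs_error:.2f}, relative error: {pct_error:.2f}%")
```

For this example the prediction is off by 5.92, or about 0.13% of the actual price. A single request says little about overall model quality, which is what the evaluation metrics on the test set are for.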

siri-chandana-macha/Hotel-Price-Prediction---Machine-Learning
