Analysis of the Impact of an Event on Rental Prices on Booking

Project Description

This project investigates the effect of a major annual event in Barcelona on rental prices on Booking. Data is collected for at least two different weeks for Barcelona and another city (Milan) as a control group, aiming to analyze price variations using a difference-in-differences (DiD) model. Additionally, a text analysis is performed on accommodation descriptions to identify patterns in texts associated with prices.

Project Structure

textmining_booking/
|-- booking/
|   |-- packages/
|   |   |-- __pycache__/
|   |   |-- __init__.py         # Package initialization file
|   |   |-- dataloading.py      # Data loading and cleaning
|   |   |-- processing.py       # Data processing
|   |   |-- scraper.py          # Web scraper from Booking
|   |   |-- selenium_setup.py   # Selenium setup for scraping
|   |-- Barcelona_MWC.csv       # Data extracted from Barcelona
|   |-- Milan_MWC.csv           # Data extracted from Milan
|   |-- geckodriver.exe         # Selenium driver for Firefox
|-- ITM_HW1.ipynb               # Principal Notebook
|-- hw1.pdf                     # Document with project requirements
|-- README.md                   # Description the project structure
|-- requirements.txt            # Dependencies required to run
|-- setup.py                    # Installation and setup script

Installation and Setup

Requirements

To install the required dependencies, run:

pip install -r requirements.txt

Usage

Selenium Setup:
- Download geckodriver.exe for Firefox or use the appropriate driver for Chrome.
- Place it in the project folder.
Run the Files:
```
python packages/scraper.py
python packages/dataloading.py
python packages/processing.py
```
These files generate searches on Booking webpages, extract data according to our delimitations, and preprocess the description of each hotel.
Data Analysis:
- Run the ITM_HW1.ipynb notebook in Jupyter Notebook to visualize exploratory analysis, data cleaning, and the DiD regression. In this notebook, we use pipelines to call all the functions from the .py files.

Methodology

1. Web Scraping

Rental price data is collected from Booking for Barcelona and Milan.
Navigation through multiple result pages is automated.
Accommodation descriptions are also extracted for text analysis.

2. Text Analysis

Text preprocessing is performed by removing stopwords and applying stemming.
Wordclouds are generated before and after preprocessing.
Terms associated with higher prices are explored.

3. DiD Regression

The impact of the event on prices is estimated using a difference-in-differences model.
Additional controls based on text descriptions are included.
Heterogeneous effects are explored according to accommodation quality.

Contributions

This project was developed as part of an academic assignment. It is recommended to follow good practices in web scraping and respect the terms of service of the platforms used.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Analysis of the Impact of an Event on Rental Prices on Booking

Project Description

Project Structure

Installation and Setup

Requirements

Usage

Methodology

1. Web Scraping

2. Text Analysis

3. DiD Regression

Contributions

About

Uh oh!

Releases

Packages

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
booking		booking
README.md		README.md
hw1.pdf		hw1.pdf
requirements.txt		requirements.txt
setup.py		setup.py

Enzo280100/textmining_booking

Folders and files

Latest commit

History

Repository files navigation

Analysis of the Impact of an Event on Rental Prices on Booking

Project Description

Project Structure

Installation and Setup

Requirements

Usage

Methodology

1. Web Scraping

2. Text Analysis

3. DiD Regression

Contributions

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages