GGSIPU Alert Bot is a web scraping and notification system designed to fetch and store notices from the Guru Gobind Singh Indraprastha University (GGSIPU) website. It provides an efficient way to keep track of the latest announcements and updates from the university. The project includes an API that serves these notices, which is used by Telegram and WhatsApp bots to deliver notifications to users.
An automated job, built from Node.js and Express scripts, runs every hour on Linux servers. It uploads new notices and sends them to the supported platforms.
Telegram channel: @ggsipunotices
The project is not affiliated with GGSIPU or any other Government entity.
- Features
- How to Contribute
- Prerequisites
- Installation
- Configuration
- Usage
- Project Structure
- Key Components
- Database
- Scripts
- API Documentation
- Deployment
- Related Projects
- Troubleshooting
- License
- Scrapes notices from the GGSIPU website automatically
- Stores notices in a PostgreSQL database for efficient retrieval
- Provides API endpoints to access the latest notices
- Handles URL encoding for consistent data storage
- Includes scripts for database maintenance and updates
- Hosted on Azure for reliable access
- Integrates with Telegram and WhatsApp bots for user notifications
- Runs automated jobs every hour to check and distribute new notices
We welcome contributions to improve the GGSIPU Alert Bot! There are several areas where the project needs improvement, and your expertise could make a significant difference.
- Date Extraction Logic
  One of the main challenges we face is extracting accurate dates from the notices. The GGSIPU website doesn't provide explicit dates for notices, so we rely on extracting dates from the notice URLs. However, our current pattern recognition sometimes fails to identify the correct date.
  Current Approach: We extract dates from URLs using patterns. Here are some examples:
  - http://www.ipu.ac.in/Pubinfo2024/nt200724401.pdf → Date: 20/07/2024
  - http://www.ipu.ac.in/pubinfo/circflag250118.pdf → Date: 25/01/2018
  - http://www.ipu.ac.in/Pubinfo2024/formhost2425210724.pdf → Date: 21/07/2024
  The Challenge: Some URL patterns are not recognized by our current logic. For example:
  - http://www.ipu.ac.in/Pubinfo2024/nt200724401 (9).pdf
  - http://www.ipu.ac.in/Pubinfo2024/cnnt1107249p (4).pdf
  We need to improve our pattern recognition to handle these varied URL formats and extract dates accurately.
- URL Encoding
  We've encountered issues with spaces in URLs. While we've implemented a solution to encode spaces as %20, there might be other special characters that need proper encoding.
- Performance Optimization
  As the number of notices grows, we need to ensure our database queries and API responses remain efficient. Contributions to optimize database operations and API performance are welcome.
- User Interface for Bots
  While we have functional bots for Telegram and WhatsApp, there's room for improvement in their user interfaces and interaction patterns.
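As a starting point for the date extraction and URL encoding items above, here is a minimal sketch of one possible approach. It is not the project's current logic: `normalizeNoticeUrl` and `extractNoticeDate` are hypothetical names, and the heuristic (scan the file name for any six-digit DDMMYY window that forms a real calendar date, preferring one whose year matches the `PubinfoYYYY` folder) is an assumption that happens to fit the example URLs listed above.

```typescript
// Hypothetical helpers for illustration only; not the project's actual code.

/** Percent-encode unsafe characters (spaces and others) while leaving existing %XX escapes alone. */
function normalizeNoticeUrl(rawUrl: string): string {
  // Match a full surrogate pair first so non-BMP characters are encoded as one unit.
  // Note: a lone (unpaired) surrogate would make encodeURIComponent throw.
  const unsafe = /[\uD800-\uDBFF][\uDC00-\uDFFF]|[^A-Za-z0-9\-._~:\/?#\[\]@!$&'()*+,;=%]/g;
  return rawUrl.replace(unsafe, (ch) => encodeURIComponent(ch));
}

/** Return the notice date as DD/MM/YYYY, or null when no plausible date is found. */
function extractNoticeDate(url: string): string | null {
  // Last path segment, with encoded spaces restored so the suffix pattern can match.
  const fileName = (url.split("/").pop() || "").replace(/%20/g, " ");
  // Drop the extension, then a trailing copy suffix such as " (9)".
  const stem = fileName.replace(/\.[A-Za-z0-9]+$/, "").replace(/\s*\(\d+\)$/, "");
  // Optional year hint taken from the folder name, e.g. "Pubinfo2024".
  const folderMatch = /pubinfo(\d{4})/i.exec(url);
  const folderYear = folderMatch ? folderMatch[1] : undefined;

  const pad = (n: number): string => ("0" + n).slice(-2);
  const candidates: string[] = [];

  for (const run of stem.match(/\d{6,}/g) || []) {
    // Slide a six-digit window across each run of digits.
    for (let i = 0; i + 6 <= run.length; i++) {
      const dd = Number(run.slice(i, i + 2));
      const mm = Number(run.slice(i + 2, i + 4));
      const yyyy = 2000 + Number(run.slice(i + 4, i + 6)); // assumes 20xx
      // Round-trip through Date to reject impossible days and months.
      const d = new Date(Date.UTC(yyyy, mm - 1, dd));
      const valid =
        d.getUTCFullYear() === yyyy &&
        d.getUTCMonth() === mm - 1 &&
        d.getUTCDate() === dd;
      if (valid) candidates.push(`${pad(dd)}/${pad(mm)}/${yyyy}`);
    }
  }

  // Prefer a candidate that agrees with the folder year; otherwise take the first one.
  const inFolderYear = folderYear
    ? candidates.filter((c) => c.slice(-4) === folderYear)
    : [];
  return inFolderYear[0] ?? candidates[0] ?? null;
}
```

A file name can still contain more than one valid six-digit window, so this heuristic can pick the wrong date. Any replacement logic should be tested against a larger sample of real notice URLs before it is trusted.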
- Code Contributions
  - Fork the repository
  - Create a new branch for your feature or bug fix
  - Make your changes and submit a pull request
  - Ensure your code follows the project's coding standards and includes appropriate tests
- Issue Reporting
  - If you notice any bugs or have feature suggestions, please open an issue on GitHub
  - Provide as much detail as possible, including steps to reproduce for bugs
- Documentation
  - Help improve our documentation, including this README
  - Write tutorials or guides for using the GGSIPU Notice Tracker
- Testing
  - Help test the application, especially the date extraction logic with various URL patterns
  - Report any inconsistencies or errors you find
- Check the Issues page for existing problems or feature requests
- Comment on an issue if you want to work on it, or open a new issue to discuss your ideas
- Follow the Installation and Configuration steps in this README to set up your development environment
We appreciate all contributions, big or small. Together, we can make the GGSIPU Alert Bot more robust and useful for the entire community!
- Node.js (v14 or later)
- npm (v6 or later)
- PostgreSQL (v12 or later)
- TypeScript (v4 or later)
- Azure account (for deployment)
- Clone the repository:
    git clone https://github.com/shubhsardana29/notice-scraper.git
    cd notice-scraper
- Install dependencies:
    npm install
- Set up the database (instructions in the Database section)
- Build the project:
    npm run build
- Create a .env file in the root directory with the following content:
    DATABASE_URL="postgresql://username:password@localhost:5432/ggsipu_notices?schema=public"
    PORT=3000
  Replace username, password, and other details as per your PostgreSQL setup.
- Update prisma/schema.prisma if you need to make any changes to the database schema.
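To catch a missing or malformed `.env` value early, the two settings above could be validated at startup. The following is a minimal sketch under stated assumptions: `AppConfig` and `loadConfig` are hypothetical names, not part of the project, and the function takes the environment as a parameter (for example `process.env`, once the `.env` file has been loaded by whatever mechanism the project uses).

```typescript
// Hypothetical startup check for the two variables shown in the .env example.
interface AppConfig {
  databaseUrl: string;
  port: number;
}

function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const databaseUrl = env.DATABASE_URL;
  if (!databaseUrl) {
    throw new Error("DATABASE_URL is not set; add it to the .env file");
  }
  // Accept both the postgresql:// and postgres:// schemes.
  if (!/^postgres(ql)?:\/\//.test(databaseUrl)) {
    throw new Error("DATABASE_URL must be a postgresql:// connection string");
  }

  // PORT is optional here and falls back to 3000, the value used in the example.
  const port = Number(env.PORT === undefined ? "3000" : env.PORT);
  const isValidPort = port >= 1 && port <= 65535 && port % 1 === 0;
  if (!isValidPort) {
    throw new Error(`Invalid PORT value: ${env.PORT}`);
  }

  return { databaseUrl, port };
}

// Possible usage in a Node.js entry point: const config = loadConfig(process.env);
```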
- Start the server:
    npm start
- To access the latest notices, send a GET request to:
    http://localhost:3000/api/notices/latest
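The GET request above can also be made from code. This is a minimal client sketch, not part of the project: `fetchLatestNotices` and `FetchLike` are hypothetical names, and the `Notice` fields are taken from the sample response in the API Documentation section. The fetch function is passed in so the sketch does not depend on a particular runtime. The global `fetch` is only built into Node.js 18 and later, which is newer than the v14 minimum listed under Prerequisites.

```typescript
// Shape assumed from the sample response in the API Documentation section.
interface Notice {
  id: number;
  date: string;
  title: string;
  url: string;
}

// Minimal structural type so the sketch needs neither DOM nor Node typings.
type FetchLike = (url: string) => Promise<{
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}>;

async function fetchLatestNotices(
  baseUrl: string,
  fetchImpl: FetchLike
): Promise<Notice[]> {
  // Trim trailing slashes so "http://host/" and "http://host" behave the same.
  const endpoint = `${baseUrl.replace(/\/+$/, "")}/api/notices/latest`;
  const response = await fetchImpl(endpoint);
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  const body = await response.json();
  if (!Array.isArray(body)) {
    throw new Error("Expected a JSON array of notices");
  }
  return body as Notice[];
}

// Possible usage on Node.js 18+:
//   fetchLatestNotices("http://localhost:3000", fetch).then(console.log);
```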
ggsipu-notice-tracker/
├── src/
│ ├── config/
│ ├── controllers/
│ ├── models/
│ ├── services/
│ ├── utils/
│ ├── scripts/
│ └── app.ts
├── prisma/
│ └── schema.prisma
├── dist/
├── node_modules/
├── .env
├── .gitignore
├── package.json
├── tsconfig.json
└── README.md
- src/utils/scraper.ts: Handles web scraping of notices from the GGSIPU website.
- src/services/notice.service.ts: Manages database operations for notices.
- src/controllers/notice.controller.ts: Handles HTTP requests and responses for notice-related operations.
- src/app.ts: The main application file that sets up the Express server and routes.
The project uses PostgreSQL with Prisma as the ORM. To set up the database:
- Ensure PostgreSQL is installed and running.
- Create a new database named ggsipu_notices.
- Run Prisma migrations:
    npx prisma migrate dev
- npm start: Starts the server
- npm run build: Compiles TypeScript to JavaScript
The GGSIPU Notice Tracker API provides the following endpoints:
- Get Latest Notices
  - Endpoint: GET /api/notices/latest
  - Description: Retrieves the most recent notices
  - Response:
      [
        {
          "id": 1,
          "date": "2024-07-21",
          "title": "Notice Title",
          "url": "http://www.ipu.ac.in/notices/example.pdf"
        },
        ...
      ]
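Consumers of this endpoint may want to check the response shape before using it. The sketch below assumes the four fields shown in the sample response (`id`, `date`, `title`, `url`) and nothing more. `isNotice` and `parseNotices` are hypothetical helper names, not part of the API.

```typescript
// Field names and types assumed from the sample response above.
interface Notice {
  id: number;
  date: string; // e.g. "2024-07-21"
  title: string;
  url: string;
}

/** Type guard: true only when the value has all four expected fields. */
function isNotice(value: unknown): value is Notice {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.id === "number" &&
    typeof v.date === "string" &&
    typeof v.title === "string" &&
    typeof v.url === "string"
  );
}

/** Parse a raw response body and keep only well-formed notices. */
function parseNotices(body: string): Notice[] {
  const parsed: unknown = JSON.parse(body);
  if (!Array.isArray(parsed)) {
    throw new Error("Expected a JSON array of notices");
  }
  return parsed.filter(isNotice);
}
```

Dropping malformed entries silently is a design choice for this sketch. A stricter client could throw instead, so that schema changes are noticed sooner.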
The GGSIPU Alert Bot API is hosted on Azure App Service. To deploy updates:
- Ensure you have the Azure CLI installed and are logged in.
- Build the project:
    npm run build
- Deploy to Azure:
    az webapp up --name GGSIPUAlertBot --resource-group YourResourceGroup
Replace GGSIPUAlertBot and YourResourceGroup with your actual Azure App Service name and resource group.
This API serves as the backend for two bot projects:
- Telegram Bot: GGSIPU Alert Telegram Bot
  - Delivers notice updates to users via Telegram
  - Uses this API to fetch the latest notices
  - Join the Telegram channel: @ggsipunotices
- WhatsApp Bot: GGSIPU Alert WhatsApp Bot
  - Sends notice updates to users on WhatsApp
Both bots use the /api/notices/latest endpoint to fetch recent notices and notify users of updates.
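How the bots decide which notices count as new is not described here. One simple way to do it, shown purely as an illustration, is to remember the ids that have already been sent and filter each fresh response against that set. `selectNewNotices` is a hypothetical name, and the sketch assumes that `id` values are unique and increase as notices are added.

```typescript
// Shape assumed from the sample response in the API Documentation section.
interface Notice {
  id: number;
  date: string;
  title: string;
  url: string;
}

/**
 * Return the notices whose ids have not been seen yet, oldest first.
 * The function does not modify seenIds; the caller records ids after a successful send.
 */
function selectNewNotices(latest: Notice[], seenIds: Set<number>): Notice[] {
  const fresh = latest.filter((notice) => !seenIds.has(notice.id));
  // Sort ascending by id so users receive notices in the order they were stored
  // (assumes ids grow over time).
  return fresh.sort((a, b) => a.id - b.id);
}

// Possible usage inside an hourly job:
//   const toSend = selectNewNotices(latestFromApi, seenIds);
//   for (const notice of toSend) { /* send message, then */ seenIds.add(notice.id); }
```

Recording an id only after the message has actually been sent means a failed delivery is retried on the next run rather than lost.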
For API-related issues:
- Check the Azure App Service logs for any error messages.
- Ensure the database connection string in Azure App Service configuration is correct.
- Verify that the API endpoints are accessible and returning expected data.
For bot-related issues:
- Check the respective bot's logs for any connection errors to the API.
- Ensure the bot is using the correct API URL and any required authentication.
This project is licensed under the MIT License.