Feat rework docs #119
Summary of Changes
Hello @vibhatha, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly re-works the website's data presentation and underlying data processing logic. The primary goal is to enhance how datasets are organized and discovered by introducing a robust categorization system. This not only refines the interactive data browsing experience but also adds crucial documentation for data citation and source transparency, making the platform more user-friendly and academically sound.
Code Review
This pull request significantly improves the data browsing experience by introducing a categorization system, replacing the previous ministry-based filtering. The changes are well-implemented across the data generation script and the frontend components. I appreciate the addition of new documentation for data sources and citations, which enhances the project's usability for researchers. I have one suggestion to improve the maintainability of the new categorization logic in the Python script.
def categorize_dataset(dataset_name: str) -> str:
    """Categorize a dataset based on its name"""
    name_lower = dataset_name.lower()

    # Tourism datasets
    tourism_keywords = [
        'tourist', 'tourism', 'accommodation', 'occupancy',
        'top 10 source', 'location vs revenue', 'tourist attraction'
    ]
    if any(keyword in name_lower for keyword in tourism_keywords):
        return 'Tourism'

    # Foreign Employment datasets
    employment_keywords = [
        'slbfe', 'remittance', 'foreign exchange', 'local arrival',
        'local departure', 'legal division', 'complaints', 'raids'
    ]
    if any(keyword in name_lower for keyword in employment_keywords):
        return 'Foreign Employment'

    # Immigration datasets
    immigration_keywords = [
        'asylum', 'deportation', 'refugee', 'refused entry',
        'fake passport', 'fraudulent visa'
    ]
    if any(keyword in name_lower for keyword in immigration_keywords):
        return 'Immigration'

    # Budget datasets
    budget_keywords = [
        'capital expenditure', 'recurrent expenditure', 'budget'
    ]
    if any(keyword in name_lower for keyword in budget_keywords):
        return 'Budget'

    # Foreign Affairs datasets
    foreign_affairs_keywords = [
        'ministry news', 'mission news', 'staff of mission',
        'staff of the ministry', 'special notice', 'news from other',
        'cadre management'
    ]
    if any(keyword in name_lower for keyword in foreign_affairs_keywords):
        return 'Foreign Affairs'

    return 'Other'
For better maintainability and readability, you could refactor this function to be more data-driven. Instead of a series of if statements, consider using a dictionary to map categories to their keywords. This makes it easier to add or modify categories in the future without changing the logic of the function.
def categorize_dataset(dataset_name: str) -> str:
    """Categorize a dataset based on its name"""
    name_lower = dataset_name.lower()
    category_keywords = {
        'Tourism': [
            'tourist', 'tourism', 'accommodation', 'occupancy',
            'top 10 source', 'location vs revenue', 'tourist attraction'
        ],
        'Foreign Employment': [
            'slbfe', 'remittance', 'foreign exchange', 'local arrival',
            'local departure', 'legal division', 'complaints', 'raids'
        ],
        'Immigration': [
            'asylum', 'deportation', 'refugee', 'refused entry',
            'fake passport', 'fraudulent visa'
        ],
        'Budget': [
            'capital expenditure', 'recurrent expenditure', 'budget'
        ],
        'Foreign Affairs': [
            'ministry news', 'mission news', 'staff of mission',
            'staff of the ministry', 'special notice', 'news from other',
            'cadre management'
        ]
    }
    for category, keywords in category_keywords.items():
        if any(keyword in name_lower for keyword in keywords):
            return category
    return 'Other'

* fix : fix some typo errors and column rows mismatch in data
* feat : new version of data
* feat : complete 2021 & 2022 data
* change the data structure
* feat : adding data contribution page
* feat : changing some data
* fix: minor newline removal
* fix: remove faulty dir
- Update index.md: year range 2019-2025, add budget data mention
- Add Data Sources page with SLTDA, SLBFE, and Treasury reports
- Add Cite This Dataset page with copyable BibTeX/APA/Chicago/IEEE
- Add Department of National Budget section to data-matrix.md
- Add OpenGin Explore visualization tip to interactive browser
- Update data index to scan only data/statistics (published data)
- Add automatic dataset categorization (Tourism, Foreign Employment, Immigration, Budget, Foreign Affairs)
- Replace ministry filter with category filter in DataBrowser
- Add CopyableCode component for citation copying
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
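One behaviour of the automatic categorization worth keeping in mind during review: matching is by substring and the first matching category wins, so the order of the categories decides ties. The sketch below reuses the keyword table from the review suggestion above, hoisted to a module-level constant (`CATEGORY_KEYWORDS` is a name assumed here, not taken from the PR), and runs it on a few dataset names that are invented purely for illustration.

```python
# Keyword table copied from the review suggestion. Dicts preserve insertion
# order (Python 3.7+), so earlier categories take precedence on ties.
CATEGORY_KEYWORDS = {
    'Tourism': [
        'tourist', 'tourism', 'accommodation', 'occupancy',
        'top 10 source', 'location vs revenue', 'tourist attraction'
    ],
    'Foreign Employment': [
        'slbfe', 'remittance', 'foreign exchange', 'local arrival',
        'local departure', 'legal division', 'complaints', 'raids'
    ],
    'Immigration': [
        'asylum', 'deportation', 'refugee', 'refused entry',
        'fake passport', 'fraudulent visa'
    ],
    'Budget': [
        'capital expenditure', 'recurrent expenditure', 'budget'
    ],
    'Foreign Affairs': [
        'ministry news', 'mission news', 'staff of mission',
        'staff of the ministry', 'special notice', 'news from other',
        'cadre management'
    ],
}


def categorize_dataset(dataset_name: str) -> str:
    """Return the first category whose keyword occurs in the dataset name."""
    name_lower = dataset_name.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in name_lower for keyword in keywords):
            return category
    return 'Other'


# Hypothetical dataset names, made up for this example only.
SAMPLES = [
    'Tourist Arrivals by Country',
    'SLBFE Remittance Summary',
    'Asylum Seekers',
    'Recurrent Expenditure',
    'Staff of Mission',
    'Tourism Budget',   # matches both Tourism and Budget keywords
    'Rainfall',         # matches nothing
]

for name in SAMPLES:
    print(f'{name!r} -> {categorize_dataset(name)}')
```

A name such as "Tourism Budget" lands in Tourism because that category is listed first; if a different precedence is wanted, reordering the table is enough.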
Deploy preview builds to Surge.sh when PRs are opened or updated. Posts a comment with the preview URL on each PR. Requires SURGE_TOKEN secret to be configured in repository settings.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The archive_Data.zip, sources_Data.zip, and statistics_Data.zip files are generated by prebuild but not used by the website. Only year-based zip files (2019_Data.zip, etc.) are used.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
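As a rough illustration of the distinction this commit draws, the year-based archives can be told apart from the unused ones with a simple filename pattern. This is only a sketch: the helper name `used_zip_files` and the pattern are assumptions, and the actual prebuild step may select files differently.

```python
import re

# Assumed naming convention: four-digit year followed by "_Data.zip".
YEAR_ZIP = re.compile(r'\d{4}_Data\.zip')


def used_zip_files(names):
    """Keep only the year-based archives, e.g. 2019_Data.zip."""
    return [name for name in names if YEAR_ZIP.fullmatch(name)]


# File names taken from the commit message, plus one more year for illustration.
generated = [
    '2019_Data.zip',
    'archive_Data.zip',
    'sources_Data.zip',
    'statistics_Data.zip',
    '2022_Data.zip',
]

print(used_zip_files(generated))
```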
Use rossjrw/pr-preview-action to deploy PR previews directly to
the gh-pages branch under /pr-preview/pr-{number}/ directory.
This eliminates the need for SURGE_TOKEN secret configuration.
Previews are automatically cleaned up when PRs are closed.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Use pull_request_target to run with base repo permissions
- Explicitly checkout PR head SHA for building
Repository settings also need:
- Settings > Actions > General > Workflow permissions
- Enable "Read and write permissions"
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- build-docs.yml: Runs on pull_request, shows as CI check, uploads build artifact for verification
- preview-docs.yml: Runs on pull_request_target (after merge to main), deploys previews to gh-pages
The preview workflow requires being in main branch first to work. After this PR is merged, future PRs will get automatic previews.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Posts a comment with link to download the built docs artifact. Includes instructions to preview locally with npx serve. For live PR previews without download, consider connecting the repo to Netlify (free tier supports deploy previews).
Split into two workflows:
- build-docs.yml: Builds and uploads artifact (runs on PR)
- comment-docs-preview.yml: Posts comment with link (runs after build)
workflow_run runs in base repo context with write permissions, allowing comments on PRs from forks.
Paths were relative to data/statistics/ but static files are served from data/. Added statistics/ prefix so URLs resolve correctly.
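The path fix described here can be pictured with `pathlib`: a file's URL path has to be computed relative to the directory the site actually serves, not relative to the directory that was scanned. The sketch below is an assumption about the shape of the fix, not the PR's code; the helper `static_url_path` and the example file `2021/example.csv` are hypothetical.

```python
from pathlib import Path


def static_url_path(file_path: Path, served_root: Path) -> str:
    """Return file_path relative to the directory exposed as static files."""
    return file_path.relative_to(served_root).as_posix()


served_root = Path('data')                  # static files are served from here
scan_root = served_root / 'statistics'      # only published data is scanned
csv_file = scan_root / '2021' / 'example.csv'  # hypothetical file

# Relative to the scan root, the "statistics/" segment is lost.
broken = csv_file.relative_to(scan_root).as_posix()
# Relative to the served root, the prefix is kept and the URL resolves.
fixed = static_url_path(csv_file, served_root)

print(broken)  # 2021/example.csv
print(fixed)   # statistics/2021/example.csv
```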
c076282 to f914720
This PR re-works the website based on the updated data. It also adds minor improvements to data visualization and citation.