@vibhatha
Member

This PR reworks the website based on the updated data.
It also adds minor improvements to data visualization and citation.

@vibhatha requested a review from @zaeema-n on January 30, 2026 00:01
@gemini-code-assist
Contributor

Summary of Changes

Hello @vibhatha, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly re-works the website's data presentation and underlying data processing logic. The primary goal is to enhance how datasets are organized and discovered by introducing a robust categorization system. This not only refines the interactive data browsing experience but also adds crucial documentation for data citation and source transparency, making the platform more user-friendly and academically sound.

Highlights

  • Enhanced Data Categorization: Implemented a new Python function to automatically categorize datasets (e.g., Tourism, Foreign Employment, Immigration, Budget, Foreign Affairs) based on their names, improving data organization and discoverability.
  • Improved Interactive Data Browser: The interactive data browser now utilizes the new categorization system, allowing users to filter datasets by these defined categories instead of ministries, and includes updated statistics and search capabilities.
  • New Documentation Pages: Added dedicated documentation pages for 'Cite This Dataset' (providing various academic citation formats) and 'Data Sources' (listing original source documents with archive links), enhancing the usability and credibility of the data.
  • Website Content and UI Updates: The website's overview, data matrix, and missing datasets report have been updated to reflect the latest data and categorization. A new 'CopyableCode' React component was introduced to facilitate easy copying of code snippets, particularly for citations.
  • Refined Data Ingestion Process: The data generation script now specifically targets the 'statistics' folder for data scanning and includes the new category information in the generated dataset index.
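The ingestion change in the last highlight can be pictured roughly as follows. This is a minimal sketch, not the script from this PR: the function name `build_dataset_index`, the index fields `name` and `category`, and the assumption that a dataset is any folder directly containing a `.json` file are all hypothetical. The categorizer is passed in as a callable so the sketch stays self-contained.

```python
from pathlib import Path
from typing import Callable, Dict, List


def build_dataset_index(
    data_root: Path, categorize: Callable[[str], str]
) -> List[Dict[str, str]]:
    """Index datasets under <data_root>/statistics only, tagging each with a category."""
    # Only the 'statistics' folder is scanned; sibling folders are never visited.
    statistics_dir = data_root / "statistics"

    # Hypothetical layout: a dataset is any folder that directly contains a .json file.
    dataset_dirs = sorted({path.parent for path in statistics_dir.rglob("*.json")})

    index = []
    for dataset_dir in dataset_dirs:
        index.append({
            "name": dataset_dir.name,
            "category": categorize(dataset_dir.name),
        })
    return index
```

Calling `build_dataset_index(Path("data"), categorize_dataset)` would then yield one entry per published dataset, each carrying the category the browser filters on.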
Ignored Files
  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/preview-docs.yml

Contributor

@gemini-code-assist (bot) left a comment

Code Review

This pull request significantly improves the data browsing experience by introducing a categorization system, replacing the previous ministry-based filtering. The changes are well-implemented across the data generation script and the frontend components. I appreciate the addition of new documentation for data sources and citations, which enhances the project's usability for researchers. I have one suggestion to improve the maintainability of the new categorization logic in the Python script.

Comment on lines +25 to +69
def categorize_dataset(dataset_name: str) -> str:
    """Categorize a dataset based on its name"""
    name_lower = dataset_name.lower()

    # Tourism datasets
    tourism_keywords = [
        'tourist', 'tourism', 'accommodation', 'occupancy',
        'top 10 source', 'location vs revenue', 'tourist attraction'
    ]
    if any(keyword in name_lower for keyword in tourism_keywords):
        return 'Tourism'

    # Foreign Employment datasets
    employment_keywords = [
        'slbfe', 'remittance', 'foreign exchange', 'local arrival',
        'local departure', 'legal division', 'complaints', 'raids'
    ]
    if any(keyword in name_lower for keyword in employment_keywords):
        return 'Foreign Employment'

    # Immigration datasets
    immigration_keywords = [
        'asylum', 'deportation', 'refugee', 'refused entry',
        'fake passport', 'fraudulent visa'
    ]
    if any(keyword in name_lower for keyword in immigration_keywords):
        return 'Immigration'

    # Budget datasets
    budget_keywords = [
        'capital expenditure', 'recurrent expenditure', 'budget'
    ]
    if any(keyword in name_lower for keyword in budget_keywords):
        return 'Budget'

    # Foreign Affairs datasets
    foreign_affairs_keywords = [
        'ministry news', 'mission news', 'staff of mission',
        'staff of the ministry', 'special notice', 'news from other',
        'cadre management'
    ]
    if any(keyword in name_lower for keyword in foreign_affairs_keywords):
        return 'Foreign Affairs'

    return 'Other'

medium

For better maintainability and readability, you could refactor this function to be more data-driven. Instead of a series of if statements, consider using a dictionary to map categories to their keywords. This makes it easier to add or modify categories in the future without changing the logic of the function.

def categorize_dataset(dataset_name: str) -> str:
    """Categorize a dataset based on its name"""
    name_lower = dataset_name.lower()

    category_keywords = {
        'Tourism': [
            'tourist', 'tourism', 'accommodation', 'occupancy',
            'top 10 source', 'location vs revenue', 'tourist attraction'
        ],
        'Foreign Employment': [
            'slbfe', 'remittance', 'foreign exchange', 'local arrival',
            'local departure', 'legal division', 'complaints', 'raids'
        ],
        'Immigration': [
            'asylum', 'deportation', 'refugee', 'refused entry',
            'fake passport', 'fraudulent visa'
        ],
        'Budget': [
            'capital expenditure', 'recurrent expenditure', 'budget'
        ],
        'Foreign Affairs': [
            'ministry news', 'mission news', 'staff of mission',
            'staff of the ministry', 'special notice', 'news from other',
            'cadre management'
        ]
    }

    for category, keywords in category_keywords.items():
        if any(keyword in name_lower for keyword in keywords):
            return category

    return 'Other'
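
One property worth checking in a dictionary-driven version is that match precedence stays the same as in the original chain of if statements. Python dictionaries preserve insertion order (guaranteed since Python 3.7), so the first category listed still wins when a name matches keywords from more than one category. The sketch below illustrates this with a trimmed keyword table; the module-level constant `CATEGORY_KEYWORDS`, the optional `category_keywords` parameter, and the sample dataset names are assumptions made for illustration and are not part of the PR.

```python
from typing import Dict, List

# Trimmed, illustrative subset of the keyword table. Order matters:
# the first category with a matching keyword wins.
CATEGORY_KEYWORDS: Dict[str, List[str]] = {
    "Tourism": ["tourist", "tourism"],
    "Foreign Employment": ["slbfe", "remittance"],
    "Budget": ["budget"],
}


def categorize_dataset(
    dataset_name: str,
    category_keywords: Dict[str, List[str]] = CATEGORY_KEYWORDS,
) -> str:
    """Return the first category whose keyword occurs in the lowercased name."""
    name_lower = dataset_name.lower()
    for category, keywords in category_keywords.items():
        if any(keyword in name_lower for keyword in keywords):
            return category
    return "Other"


# Hypothetical dataset names, chosen only to exercise the matching rules.
print(categorize_dataset("Tourism Budget 2022"))   # Tourism (listed before Budget)
print(categorize_dataset("Workers Remittances"))   # Foreign Employment
print(categorize_dataset("Population Census"))     # Other
```

Keeping the table at module level also avoids rebuilding it on every call. Because matching is by substring, a short keyword can match inside a longer word, so the table order doubles as a tie-breaking rule.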

yasandu0505 and others added 11 commits January 30, 2026 11:35
* fix : fix some typo errors and column rows mismatch in data

* feat : new version of data

* feat : complete 2021 & 2022 data

* change the data structure

* feat : adding data contribution page

* feat : changing some data

* fix: minor newline removal

* fix: remove faulty dir
- Update index.md: year range 2019-2025, add budget data mention
- Add Data Sources page with SLTDA, SLBFE, and Treasury reports
- Add Cite This Dataset page with copyable BibTeX/APA/Chicago/IEEE
- Add Department of National Budget section to data-matrix.md
- Add OpenGin Explore visualization tip to interactive browser
- Update data index to scan only data/statistics (published data)
- Add automatic dataset categorization (Tourism, Foreign Employment,
  Immigration, Budget, Foreign Affairs)
- Replace ministry filter with category filter in DataBrowser
- Add CopyableCode component for citation copying

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Deploy preview builds to Surge.sh when PRs are opened or updated.
Posts a comment with the preview URL on each PR.

Requires SURGE_TOKEN secret to be configured in repository settings.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The archive_Data.zip, sources_Data.zip, and statistics_Data.zip
files are generated by prebuild but not used by the website.
Only year-based zip files (2019_Data.zip, etc.) are used.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Use rossjrw/pr-preview-action to deploy PR previews directly to
the gh-pages branch under /pr-preview/pr-{number}/ directory.

This eliminates the need for SURGE_TOKEN secret configuration.
Previews are automatically cleaned up when PRs are closed.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Use pull_request_target to run with base repo permissions
- Explicitly checkout PR head SHA for building

Repository settings also need:
- Settings > Actions > General > Workflow permissions
- Enable "Read and write permissions"

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- build-docs.yml: Runs on pull_request, shows as CI check, uploads
  build artifact for verification
- preview-docs.yml: Runs on pull_request_target (after merge to main),
  deploys previews to gh-pages

The preview workflow requires being in main branch first to work.
After this PR is merged, future PRs will get automatic previews.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Posts a comment with link to download the built docs artifact.
Includes instructions to preview locally with npx serve.

For live PR previews without download, consider connecting
the repo to Netlify (free tier supports deploy previews).
Split into two workflows:
- build-docs.yml: Builds and uploads artifact (runs on PR)
- comment-docs-preview.yml: Posts comment with link (runs after build)

workflow_run runs in base repo context with write permissions,
allowing comments on PRs from forks.
Paths were relative to data/statistics/ but static files are
served from data/. Added statistics/ prefix so URLs resolve correctly.
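
The path fix described in this commit message amounts to re-rooting each indexed path from data/statistics/ to data/. A minimal sketch of that idea is below; the helper name `to_served_path` and the sample path are hypothetical, and the actual change in the PR may have been made elsewhere (for example in the frontend).

```python
import posixpath

# Assumed name of the published-data folder under the served data/ root.
STATISTICS_PREFIX = "statistics"


def to_served_path(path_in_statistics: str) -> str:
    """Convert a path relative to data/statistics/ into one relative to data/."""
    # URLs always use forward slashes, so posixpath is used regardless of the OS.
    return posixpath.join(STATISTICS_PREFIX, path_in_statistics.lstrip("/"))


# Hypothetical index entry path.
print(to_served_path("2021/Tourist Arrivals/data.json"))
# statistics/2021/Tourist Arrivals/data.json
```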
@zaeema-n merged commit 0ed21c1 into LDFLK:main on Jan 30, 2026
1 check passed