Feat rework docs #119

vibhatha · 2026-01-30T00:01:24Z

This PR reworks the website based on the updated data.
It also adds minor improvements to data visualization and citation.

gemini-code-assist · 2026-01-30T00:01:42Z

Summary of Changes

Hello @vibhatha, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly re-works the website's data presentation and underlying data processing logic. The primary goal is to enhance how datasets are organized and discovered by introducing a robust categorization system. This not only refines the interactive data browsing experience but also adds crucial documentation for data citation and source transparency, making the platform more user-friendly and academically sound.

Highlights

Enhanced Data Categorization: Implemented a new Python function to automatically categorize datasets (e.g., Tourism, Foreign Employment, Immigration, Budget, Foreign Affairs) based on their names, improving data organization and discoverability.
Improved Interactive Data Browser: The interactive data browser now utilizes the new categorization system, allowing users to filter datasets by these defined categories instead of ministries, and includes updated statistics and search capabilities.
New Documentation Pages: Added dedicated documentation pages for 'Cite This Dataset' (providing various academic citation formats) and 'Data Sources' (listing original source documents with archive links), enhancing the usability and credibility of the data.
Website Content and UI Updates: The website's overview, data matrix, and missing datasets report have been updated to reflect the latest data and categorization. A new 'CopyableCode' React component was introduced to facilitate easy copying of code snippets, particularly for citations.
Refined Data Ingestion Process: The data generation script now specifically targets the 'statistics' folder for data scanning and includes the new category information in the generated dataset index.
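
A minimal sketch of the ingestion change described in the last highlight, assuming the index is a list of records built by walking the data folder. The names `build_index` and `categorize`, the `.json` file extension, and the record fields are hypothetical and are not taken from `scripts/generate_data_index.py`.

```python
from pathlib import Path
from typing import Dict, List


def categorize(name: str) -> str:
    """Hypothetical stand-in for the PR's categorization function."""
    lowered = name.lower()
    if "tourist" in lowered or "tourism" in lowered:
        return "Tourism"
    if "budget" in lowered:
        return "Budget"
    return "Other"


def build_index(data_root: Path) -> List[Dict[str, str]]:
    """Scan only data_root/statistics and attach a category to each entry."""
    stats_dir = data_root / "statistics"
    entries = []
    # Sibling folders of statistics/ (for example sources/) are never visited.
    for path in sorted(stats_dir.rglob("*.json")):
        entries.append({
            "name": path.stem,
            # Path is kept relative to data_root, so it starts with "statistics/".
            "path": path.relative_to(data_root).as_posix(),
            "category": categorize(path.stem),
        })
    return entries
```

Restricting the walk to one subfolder keeps unpublished or archival files out of the index without needing an exclusion list.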

Ignored Files

Ignored by pattern: .github/workflows/** (1)
- .github/workflows/preview-docs.yml


¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, and Terms of Service, and learn how to configure Gemini Code Assist in GitHub. Gemini can make mistakes, so double-check it and use code with caution.

gemini-code-assist

Code Review

This pull request significantly improves the data browsing experience by introducing a categorization system, replacing the previous ministry-based filtering. The changes are well-implemented across the data generation script and the frontend components. I appreciate the addition of new documentation for data sources and citations, which enhances the project's usability for researchers. I have one suggestion to improve the maintainability of the new categorization logic in the Python script.

gemini-code-assist · 2026-01-30T00:02:42Z

scripts/generate_data_index.py

+def categorize_dataset(dataset_name: str) -> str:
+    """Categorize a dataset based on its name"""
+    name_lower = dataset_name.lower()
+
+    # Tourism datasets
+    tourism_keywords = [
+        'tourist', 'tourism', 'accommodation', 'occupancy',
+        'top 10 source', 'location vs revenue', 'tourist attraction'
+    ]
+    if any(keyword in name_lower for keyword in tourism_keywords):
+        return 'Tourism'
+
+    # Foreign Employment datasets
+    employment_keywords = [
+        'slbfe', 'remittance', 'foreign exchange', 'local arrival',
+        'local departure', 'legal division', 'complaints', 'raids'
+    ]
+    if any(keyword in name_lower for keyword in employment_keywords):
+        return 'Foreign Employment'
+
+    # Immigration datasets
+    immigration_keywords = [
+        'asylum', 'deportation', 'refugee', 'refused entry',
+        'fake passport', 'fraudulent visa'
+    ]
+    if any(keyword in name_lower for keyword in immigration_keywords):
+        return 'Immigration'
+
+    # Budget datasets
+    budget_keywords = [
+        'capital expenditure', 'recurrent expenditure', 'budget'
+    ]
+    if any(keyword in name_lower for keyword in budget_keywords):
+        return 'Budget'
+
+    # Foreign Affairs datasets
+    foreign_affairs_keywords = [
+        'ministry news', 'mission news', 'staff of mission',
+        'staff of the ministry', 'special notice', 'news from other',
+        'cadre management'
+    ]
+    if any(keyword in name_lower for keyword in foreign_affairs_keywords):
+        return 'Foreign Affairs'
+
+    return 'Other'


For better maintainability and readability, you could refactor this function to be more data-driven. Instead of a series of if statements, consider using a dictionary to map categories to their keywords. This makes it easier to add or modify categories in the future without changing the logic of the function.

def categorize_dataset(dataset_name: str) -> str:
    """Categorize a dataset based on its name"""
    name_lower = dataset_name.lower()

    category_keywords = {
        'Tourism': [
            'tourist', 'tourism', 'accommodation', 'occupancy',
            'top 10 source', 'location vs revenue', 'tourist attraction'
        ],
        'Foreign Employment': [
            'slbfe', 'remittance', 'foreign exchange', 'local arrival',
            'local departure', 'legal division', 'complaints', 'raids'
        ],
        'Immigration': [
            'asylum', 'deportation', 'refugee', 'refused entry',
            'fake passport', 'fraudulent visa'
        ],
        'Budget': [
            'capital expenditure', 'recurrent expenditure', 'budget'
        ],
        'Foreign Affairs': [
            'ministry news', 'mission news', 'staff of mission',
            'staff of the ministry', 'special notice', 'news from other',
            'cadre management'
        ]
    }

    for category, keywords in category_keywords.items():
        if any(keyword in name_lower for keyword in keywords):
            return category

    return 'Other'
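
One property of the dictionary-based version is worth spelling out: Python dictionaries preserve insertion order (guaranteed since Python 3.7), so the first category whose keyword matches still wins, just as with the original chain of if statements. The sketch below illustrates this with a reduced keyword table. Passing the table as a parameter and the sample dataset names are assumptions made for the illustration, not part of the PR.

```python
# Reduced keyword table for illustration; the keywords are a subset of those in the review.
CATEGORY_KEYWORDS = {
    "Tourism": ["tourist", "tourism"],
    "Foreign Employment": ["remittance", "complaints"],
    "Budget": ["budget"],
}


def categorize_dataset(dataset_name: str, category_keywords=CATEGORY_KEYWORDS) -> str:
    """Return the first category (in insertion order) with a matching keyword."""
    name_lower = dataset_name.lower()
    for category, keywords in category_keywords.items():
        if any(keyword in name_lower for keyword in keywords):
            return category
    return "Other"


# A hypothetical name that matches both 'tourist' and 'complaints'.
ambiguous = "Tourist Complaints 2022"
print(categorize_dataset(ambiguous))  # Tourism: it is listed first

# Reordering the table changes the result for ambiguous names.
reordered = {
    "Foreign Employment": CATEGORY_KEYWORDS["Foreign Employment"],
    "Tourism": CATEGORY_KEYWORDS["Tourism"],
}
print(categorize_dataset(ambiguous, reordered))  # Foreign Employment
```

If the table is ever moved to a config file or reordered alphabetically, names that match keywords from several categories may silently change category, so the ordering is part of the logic.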

* fix : fix some typo errors and column rows mismatch in data
* feat : new version of data
* feat : complete 2021 & 2022 data
* change the data structure
* feat : adding data contribution page
* feat : changing some data
* fix: minor newline removal
* fix: remove falty dir

- Update index.md: year range 2019-2025, add budget data mention
- Add Data Sources page with SLTDA, SLBFE, and Treasury reports
- Add Cite This Dataset page with copyable BibTeX/APA/Chicago/IEEE
- Add Department of National Budget section to data-matrix.md
- Add OpenGin Explore visualization tip to interactive browser
- Update data index to scan only data/statistics (published data)
- Add automatic dataset categorization (Tourism, Foreign Employment, Immigration, Budget, Foreign Affairs)
- Replace ministry filter with category filter in DataBrowser
- Add CopyableCode component for citation copying

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Deploy preview builds to Surge.sh when PRs are opened or updated. Posts a comment with the preview URL on each PR. Requires SURGE_TOKEN secret to be configured in repository settings.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The archive_Data.zip, sources_Data.zip, and statistics_Data.zip files are generated by prebuild but not used by the website. Only year-based zip files (2019_Data.zip, etc.) are used.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Use rossjrw/pr-preview-action to deploy PR previews directly to the gh-pages branch under /pr-preview/pr-{number}/ directory. This eliminates the need for SURGE_TOKEN secret configuration. Previews are automatically cleaned up when PRs are closed.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Use pull_request_target to run with base repo permissions
- Explicitly checkout PR head SHA for building

Repository settings also need:
- Settings > Actions > General > Workflow permissions
- Enable "Read and write permissions"

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- build-docs.yml: Runs on pull_request, shows as CI check, uploads build artifact for verification
- preview-docs.yml: Runs on pull_request_target (after merge to main), deploys previews to gh-pages

The preview workflow requires being in main branch first to work. After this PR is merged, future PRs will get automatic previews.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Posts a comment with link to download the built docs artifact. Includes instructions to preview locally with npx serve. For live PR previews without download, consider connecting the repo to Netlify (free tier supports deploy previews).

Split into two workflows:
- build-docs.yml: Builds and uploads artifact (runs on PR)
- comment-docs-preview.yml: Posts comment with link (runs after build)

workflow_run runs in base repo context with write permissions, allowing comments on PRs from forks.

Paths were relative to data/statistics/ but static files are served from data/. Added statistics/ prefix so URLs resolve correctly.
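
The path fix above can be sketched as a small helper, assuming index entries store POSIX-style paths relative to `data/statistics/` while the site serves files from `data/`. The function name `to_served_path` and the sample file name are hypothetical and do not come from the PR.

```python
from pathlib import PurePosixPath


def to_served_path(relative_path: str, subfolder: str = "statistics") -> str:
    """Prefix a path relative to data/<subfolder>/ so it resolves from data/."""
    # Drop leading slashes so the join below cannot discard the prefix.
    path = PurePosixPath(relative_path.lstrip("/"))
    if path.parts and path.parts[0] == subfolder:
        return path.as_posix()  # already prefixed; keep the helper idempotent
    return (PurePosixPath(subfolder) / path).as_posix()


# Hypothetical dataset path, relative to data/statistics/.
print(to_served_path("2021/tourism/arrivals.json"))
# statistics/2021/tourism/arrivals.json
```

Making the helper idempotent means it can be applied to both old and already-corrected index entries without producing a doubled `statistics/statistics/` prefix.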

vibhatha requested a review from zaeema-n January 30, 2026 00:01

gemini-code-assist bot reviewed Jan 30, 2026


yasandu0505 and others added 11 commits January 30, 2026 11:35

Add unused generated zip files to .gitignore

ea534db


Add artifact download link comment to PR

ba0d96a


Fix data paths to include statistics/ prefix

1f2edf8


fix: minor

f914720

vibhatha force-pushed the feat-rework-docs branch from c076282 to f914720 · January 30, 2026 06:08

Remove missing-datasets page from documentation

155abe9

zaeema-n merged commit 0ed21c1 into LDFLK:main Jan 30, 2026
1 check passed
