-
Notifications
You must be signed in to change notification settings - Fork 0
Add creator biographical information to EAD XML exports #5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
- Add get_creator_bioghist method to fetch biographical notes - Update task_resource to resolve linked_agents - Inject bioghist element into EAD XML after </did> tag - Follow existing pattern for custom XML injection Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>
- Use xml.sax.saxutils.escape for proper XML escaping - Format bioghist notes as separate paragraphs - Remove manual XML escaping in favor of library function Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>
- Make error message more specific about the operation that failed Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>
arcflow/main.py
Outdated
| 'ead3': 'false', | ||
| }) | ||
|
|
||
| # add record group and subgroup labels to EAD inside <archdesc level="collection"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sure to update the comments to reflect the new behavior
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated in commit db640ce. The comment now reflects that the section handles multiple types of custom XML elements (record group/subgroup labels and biographical/historical notes).
alexdryden
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before I continue with the review, I'd like for you to start over re-work this with more frequent and verbose commits. So, keep everything the same, but parse out the changes more clearly through more frequent, single purpose commits.
Even if humans committing to this repo have dense commits, I'd like for you to be more explicit. For example, in the first commit to main.py you are making changes to the like locating the closing </did> tag, which is great, but not really part of "Add creator biographical information to EAD XML exports".
This makes it easier to review and helps me understand your specific logic and choices.
This reverts commit 63ffe60.
This reverts commit 3c608de.
This reverts commit e49106b.
Restructure the XML injection code to make it easier to add additional XML elements. Changes: - Move XML decode/encode outside the label-conditional block - Calculate insertion point (after </did> tag) upfront - Use extra_xml variable to accumulate multiple XML elements - Maintains existing recordgroup/subgroup functionality Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>
Include linked_agents in the resource API call resolution to fetch agent information along with the resource. This prepares for extracting creator biographical information. Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>
Add import for xml_escape function to properly escape XML special characters (ampersands, less-than, greater-than, quotes) when injecting content into EAD XML. Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>
Implement method to fetch biographical/historical notes from ArchivesSpace agent records. The method: - Filters agents by 'creator' role - Fetches agent details via API - Extracts note_bioghist content from subnotes - Escapes XML special characters using xml_escape - Wraps each note in <p> tags for proper XML structure - Returns concatenated paragraphs or None if no notes found Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>
Call get_creator_bioghist to fetch biographical notes and inject them into the EAD XML as a <bioghist> element. The bioghist element is added after </did> tag, following recordgroup/subgroup labels. Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>
Update the comment on the XML injection section to indicate that it now handles multiple types of custom XML elements (record group/ subgroup labels AND biographical/historical notes), not just labels. Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>
I've reverted the previous commits and restructured the changes into 6 granular, single-purpose commits:
Each commit now has a clear, focused purpose that's easier to review. |
|
@copilot Great! This commit style suits me much better for this repo. Can you add a .github/copilot-instructions.md file with some basic on boarding instructions that includes this commit style? Consult https://docs.github.com/en/enterprise-cloud@latest/copilot/tutorials/coding-agent/get-the-best-results if you need context for what this file is for. Keep it very simple for now and mainly focus the commit style. Make the file so it is easy and clear for another agent to add more instructions later. |
Create onboarding instructions for GitHub Copilot agents with focus on granular, single-purpose commit style. Includes: - Guidelines for commit granularity - Examples of good vs bad commit sequences - Commit message format recommendations - Instructions for extending the file Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>
Created in commit b485f55. The file documents the granular commit style with clear examples and is structured to be easily extended by other agents. |
|
@copilot This is great! Lets tune it a bit more. I noticed in your granular commits you have a commit for adding an import--that is too granular. You can silently add the import in the same commit where it is used. Update the instructions to make that clear. |
Update copilot-instructions.md to specify that imports should be added in the same commit where they're used, not as separate commits. Added guideline and examples showing both good and too-granular commit patterns. Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>
Updated in commit 76ac751. Added guideline "Include imports with usage" and examples showing both too-dense and too-granular commit patterns to clarify the right balance. |
|
Okay, this all looks good! @copilot I do notice that the xml output looks like this: where there are no line breaks between the element being injected. This is mostly just a cosmetic issue, but the EAD is frequently referenced by humans, so I think this is an important readability issue. |
Add newline characters before each injected XML element (recordgroup, subgroup, bioghist) to improve readability of the generated EAD XML for human reviewers. Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>
|
@copilot Great! Looks good. So, from that same example I'm seeing that there is an & in what is now line 21: Look back through, I thought I recalled this being handled for record group/subgroup at some point in the past. Make sure that you haven't bypassed some xml formatting through one of your changes. Report back what you find along with a proposed fix. |
Apply xml_escape to recordgroup and subgroup labels to prevent XML syntax errors when labels contain special characters like &, <, or >. This was a pre-existing issue but is now fixed to be consistent with bioghist content escaping. Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>
... Findings: The recordgroup/subgroup labels were never being XML-escaped in the original code - this was a pre-existing issue, not introduced by my refactoring. However, my bioghist implementation properly uses Fix: Applied in commit eabad7c. Now using |
|
@copilot Great! Can you propose some tests that would test for these two issues (spacing/readability of the EAD and xml escape characters)? You can think about this request broadly, e.g., should the code be refactored into something that is more testable? Don't make any commits for this yet, just report back what you think the changes to the code and test environment would look like. |
Test ProposalCurrent State AnalysisThe XML injection logic is currently embedded in
Recommended Refactoring for Testability1. Extract XML injection into pure function: def inject_custom_xml_elements(xml_content: str, rg_label: str = None,
sg_label: str = None, bioghist_content: str = None) -> str:
"""
Inject custom XML elements after </did> tag.
Returns modified XML string with proper escaping and formatting.
"""
# Current lines 232-259 logic here
# Pure function - no self, no side effects2. Extract bioghist formatting into testable method: def format_bioghist_paragraphs(bioghist_notes: list) -> str:
"""Format biographical notes as escaped XML paragraphs."""
# Current logic from get_creator_bioghist lines 549-555Proposed TestsTest file: import pytest
from arcflow.main import inject_custom_xml_elements, format_bioghist_paragraphs
class TestXMLInjection:
def test_xml_escaping_in_recordgroup(self):
"""Test that special chars in recordgroup are escaped"""
xml = '<archdesc level="collection"><did></did></archdesc>'
result = inject_custom_xml_elements(xml, rg_label="Group & Co")
assert '&' in result
assert '<recordgroup>Group & Co</recordgroup>' in result
def test_xml_escaping_in_subgroup(self):
"""Test that < and > are escaped"""
xml = '<archdesc level="collection"><did></did></archdesc>'
result = inject_custom_xml_elements(xml, rg_label="RG1", sg_label="Sub<>group")
assert '<' in result and '>' in result
def test_newlines_between_elements(self):
"""Test readability: elements on separate lines"""
xml = '<archdesc level="collection"><did></did><scope></scope></archdesc>'
result = inject_custom_xml_elements(xml, rg_label="RG1", sg_label="SG1",
bioghist_content="<p>Bio</p>")
lines = result.split('\n')
assert any('<recordgroup>' in line for line in lines)
assert any('<subgroup>' in line for line in lines)
assert any('<bioghist>' in line for line in lines)
def test_bioghist_escaping(self):
"""Test that bioghist content is properly escaped"""
notes = ["John & Jane", "Born <1900", "Died >2000"]
result = format_bioghist_paragraphs(notes)
assert '&' in result
assert '<' in result
assert '>' in resultTest Infrastructure SetupCreate:
Update Benefits
|
|
@copilot Okay, I like where you are headed in terms of extracting injection logic and seperating concerns, but any kind of custom injection funtion should not take these super specific parameters that are tied to specific types of things we want to inject. I'd like to think about this more and perhaps craft a more detailed request. Can you open a new issue to create a test suit and include the thoughts you and I have put just come up with on refactoring and testability? |
I don't have permission to create GitHub issues directly, but here's the issue content for you to create: Title: Add test suite with refactored XML injection logic Labels: enhancement, testing Description: SummaryCreate a comprehensive test suite for XML injection functionality with refactored code to improve testability and maintainability. BackgroundDuring PR review for adding creator biographical information (#[PR_NUMBER]), we identified that the XML injection logic is tightly coupled with the ArcFlow class, making it difficult to test without:
Goals
Proposed RefactoringDesign PrincipleCreate a flexible XML injection system that doesn't tie parameters to specific element types. Instead of: inject_custom_xml_elements(xml_content, rg_label, sg_label, bioghist_content) # Too specificUse a more generic approach: inject_xml_elements(xml_content, elements: List[XMLElement], insertion_point: str)Where @dataclass
class XMLElement:
tag: str
content: str
escape: bool = TrueKey Functions to Extract
Test Coverage NeededCritical Tests
Test File StructureImplementation Checklist
Benefits
After you create this issue, I can reference it in future work on this topic. |
|
@copilot can you briefly summarize in a comment the changes the pr makes to help support reviews |
PR SummaryThis PR adds creator biographical information to EAD XML exports by extracting Core Functionality Changes (
|
Implementation Plan for Creator Biographical Information
Summary
Successfully restructured the implementation with granular, single-purpose commits:
Documentation Added
Created
.github/copilot-instructions.mdwith onboarding instructions for GitHub Copilot agents, documenting the granular commit style guidelines including:XML Formatting
Original prompt
This pull request was created from Copilot chat.
💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.