Implement support for duplicated identifiers #186

main-- · 2025-11-22T14:40:51Z

Here is my attempt to fix #156.

I'm not sure if this is the best approach, and I'm not sure that I handled all the code paths correctly. At least the tests are passing now, so that's a good sign -- though I'm still not sure why some of the ID numbers changed on the schema tests.

Making the two maps private is of course an unfortunate breaking change, but otherwise I see a high risk of bugs caused by calling .get(ident) without a fallback in case the ident schema doesn't match.

I'm curious to see what you think about this.

While it is obviously not legal to have different element types with the exact same identifiers in an XML schema, it is possible to tolerate this in xsd-parser. The basic idea is to add a SchemaId to every Ident, therefore allowing otherwise identical Idents to coexist if they were identified in different schema files. This sounds simple, but creates the problem that if schema 1 defines foo:bar and then schema 2 references it, the Ident from the reference would be tagged as schema 2 and would therefore not resolve to the ident from schema 1. So most of the work then boils down to adding fallback logic everywhere so that if an Ident is not found with the same SchemaId it correctly resolves to an Ident defined elsewhere, just like before.

Bergmann89 · 2025-11-24T07:57:53Z

Hey @main--,

looks promising, I like the idea of adding the schema id to the identifier, but I do not like the invasive change to the MetaTypes structure. I also see some other smaller potential problems, but I think we can resolve all of these issues step by step.

Here is a first list of points I see:

As said above I would like to keep the structure of MetaTypes more or less unchanged, because it is part of the public API. Instead of changing MetaTypes directly, we should use a custom map implementation for MetaTypes::items that mimics the API of BTreeMap then we'll be API compatible to the current implementation.
In the interpreter you add the schema id of the current processed schema, even if the type that is currently interpreted is from a different schema. You should implement a similar function to get the schema of the current type, like it is already done for the namespace (see State::current_ns).
Using the schema also in the identifier for the elements is not a good idea, because then you have to implement the same logic for looking up the elements as you did for the types. Actually even the namespace and the type is not needed to identify an element, the name should be enough. But changing this would be also too invasive for now. Nevertheless, maybe we should introduce a new TypeIdent that contains the normal Ident and the schema, and keep the rest of the code as it is.

Bergmann89 · 2025-11-28T19:44:39Z

Hey @main--,

I'm preparing a new release, I just wanted to sync with you, if we should aim to get this PR into the new release also, or if we postpone it to a later one? Time wise I aim to make the release end of next week - Dec-07 (latest in two weeks - Dec-14).

main-- · 2025-11-30T20:44:24Z

Hi, thank you for asking. From my side there is no rush on this and I'm currently tied up in other projects. I intend to finish this, but it will probably take a few weeks until I can get back to it so no need to wait for me.

Bergmann89 · 2025-11-30T22:02:09Z

Ok, then we will put this into the next release, not the current one 👍

main-- · 2025-12-18T22:16:22Z

I implemented your suggestion with the custom map. What made it a bit awkward is the fact that it needs to handle both BTreeMap and IndexMap but I think I found a decent solution.

I also added a State::current_schema method in 9d5d048 but I'm not sure if I understood your comment on this correctly. It doesn't appear to make any difference? To be quite honest, I don't fully grasp all the details of that code.

I considered how to avoid adding the schema to Ident but ultimately didn't find a good way. Many places in the code assume that Ident can be used as a map key, so in all of those we would lose track of the duplicated types. With the custom map this is already a much cleaner change.

Unfortunately this still has a small breaking change regarding Ident::partial_eq: a01d3cd#diff-86c0c093340c621f7a270b8d5db87c2b2f25442cd200e0b5c010053042ddbbc2

Bergmann89 · 2025-12-19T08:14:08Z

Hey @main--,

Thanks for all the work, I'll have a look over the weekend and report back.

I also added a State::current_schema method in 9d5d048 but I'm not sure if I understood your comment on this correctly. It doesn't appear to make any difference? To be quite honest, I don't fully grasp all the details of that code.

The Problem I pointed out before is caused by the recursive evaluation of the types. If for example a type schema1:MyType depends on schema2:BaseType, processing of MyType will trigger the creation of BaseType (if it was not already created), even if the actual processed schema is schema1 a type for schema2 is interpreted. That's why you have to pick the schema ID from the stack, and not the one of the current active schema. The loop over the schemas and root elements in that schema is just to not miss any type, but the actual processing of a type is recursive.

I considered how to avoid adding the schema to Ident but ultimately didn't find a good way. Many places in the code assume that Ident can be used as a map key, so in all of those we would lose track of the duplicated types. With the custom map this is already a much cleaner change.

Ok, this was just an idea of mine. I'll have a look into the code and re-evaluate that idea again.

Bergmann89 · 2025-12-30T22:57:06Z

Hey @main--,

sorry for the late reply, this took longer than expected. I reviewed your changes and started simplifying a few things. While doing that, I explored my idea around the new TypeIdent, which turned out to be more complex than expected and resulted in a larger refactoring.

Instead of resolving the schema every time an entry is accessed from a map, the Interpreter now handles this centrally. It assigns an appropriate SchemaId to each identifier, allowing the later pipeline stages to work with identifiers in their usual form. This also enables us to implement support for xs:redefine and xs:override (as requested here #192) directly in the Interpreter.

Since GitHub doesn’t allow changing the source branch of an existing PR, I opened a new one here: #210. Please have a look and if you’re okay with it, I would close this PR and merge the new one.

Bergmann89 · 2026-01-19T22:36:09Z

Closed as duplicate to #210

main-- added 4 commits November 22, 2025 15:10

Add unit tests for duplicate types

7d371f1

clippy

1acb464

cargo fmt

cd64330

Bergmann89 mentioned this pull request Dec 11, 2025

Support for redefine in the schema #193

Draft

main-- force-pushed the support-duplicate-types branch from f2cd529 to 9af9b15 Compare December 18, 2025 22:01

main-- added 6 commits December 18, 2025 23:03

Merge support-duplicate-types v1 minus the private maps

3ba3a11

use generic IdentMap

cd57930

Don't render redundant copies of a type

be5aedf

fix more tests

a01d3cd

Fix namespace in test

64edd15

Use current_schema()

9d5d048

main-- force-pushed the support-duplicate-types branch from 9af9b15 to 9d5d048 Compare December 18, 2025 22:04

main-- added 2 commits December 18, 2025 23:06

cargo clippy

3940405

cargo fmt

497e5b4

Bergmann89 mentioned this pull request Dec 30, 2025

Support duplicated type identifiers across schema files #210

Merged

Bergmann89 closed this Jan 19, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement support for duplicated identifiers #186

Implement support for duplicated identifiers #186

Uh oh!

main-- commented Nov 22, 2025

Uh oh!

Bergmann89 commented Nov 24, 2025

Uh oh!

Bergmann89 commented Nov 28, 2025

Uh oh!

main-- commented Nov 30, 2025

Uh oh!

Bergmann89 commented Nov 30, 2025

Uh oh!

main-- commented Dec 18, 2025 •

edited

Loading

Uh oh!

Bergmann89 commented Dec 19, 2025

Uh oh!

Bergmann89 commented Dec 30, 2025

Uh oh!

Bergmann89 commented Jan 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Implement support for duplicated identifiers #186

Implement support for duplicated identifiers #186

Uh oh!

Conversation

main-- commented Nov 22, 2025

Uh oh!

Bergmann89 commented Nov 24, 2025

Uh oh!

Bergmann89 commented Nov 28, 2025

Uh oh!

main-- commented Nov 30, 2025

Uh oh!

Bergmann89 commented Nov 30, 2025

Uh oh!

main-- commented Dec 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Bergmann89 commented Dec 19, 2025

Uh oh!

Bergmann89 commented Dec 30, 2025

Uh oh!

Bergmann89 commented Jan 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

main-- commented Dec 18, 2025 •

edited

Loading