Skip to content

Conversation

@main--
Copy link
Contributor

@main-- main-- commented Nov 22, 2025

Here is my attempt to fix #156.

I'm not sure if this is the best approach, and I'm not sure that I handled all the code paths correctly. At least the tests are passing now, so that's a good sign -- though I'm still not sure why some of the ID numbers changed on the schema tests.

Making the two maps private is of course an unfortunate breaking change, but otherwise I see a high risk of bugs caused by calling .get(ident) without a fallback in case the ident schema doesn't match.

I'm curious to see what you think about this.

While it is obviously not legal to have different element types with the exact same identifiers in an XML schema, it is possible to tolerate this in xsd-parser. The basic idea is to add a SchemaId to every Ident, therefore allowing otherwise identical Idents to coexist if they were identified in different schema files. This sounds simple, but creates the problem that if schema 1 defines foo:bar and then schema 2 references it, the Ident from the reference would be tagged as schema 2 and would therefore not resolve to the ident from schema 1. So most of the work then boils down to adding fallback logic everywhere so that if an Ident is not found with the same SchemaId it correctly resolves to an Ident defined elsewhere, just like before.
@Bergmann89
Copy link
Owner

Hey @main--,

looks promising, I like the idea of adding the schema id to the identifier, but I do not like the invasive change to the MetaTypes structure. I also see some other smaller potential problems, but I think we can resolve all of these issues step by step.

Here is a first list of points I see:

  • As said above I would like to keep the structure of MetaTypes more or less unchanged, because it is part of the public API. Instead of changing MetaTypes directly, we should use a custom map implementation for MetaTypes::items that mimics the API of BTreeMap then we'll be API compatible to the current implementation.
  • In the interpreter you add the schema id of the current processed schema, even if the type that is currently interpreted is from a different schema. You should implement a similar function to get the schema of the current type, like it is already done for the namespace (see State::current_ns).
  • Using the schema also in the identifier for the elements is not a good idea, because then you have to implement the same logic for looking up the elements as you did for the types. Actually even the namespace and the type is not needed to identify an element, the name should be enough. But changing this would be also too invasive for now. Nevertheless, maybe we should introduce a new TypeIdent that contains the normal Ident and the schema, and keep the rest of the code as it is.

@Bergmann89
Copy link
Owner

Hey @main--,

I'm preparing a new release, I just wanted to sync with you, if we should aim to get this PR into the new release also, or if we postpone it to a later one? Time wise I aim to make the release end of next week - Dec-07 (latest in two weeks - Dec-14).

@main--
Copy link
Contributor Author

main-- commented Nov 30, 2025

Hi, thank you for asking. From my side there is no rush on this and I'm currently tied up in other projects. I intend to finish this, but it will probably take a few weeks until I can get back to it so no need to wait for me.

@Bergmann89
Copy link
Owner

Ok, then we will put this into the next release, not the current one 👍

@main-- main-- force-pushed the support-duplicate-types branch from 9af9b15 to 9d5d048 Compare December 18, 2025 22:04
@main--
Copy link
Contributor Author

main-- commented Dec 18, 2025

I implemented your suggestion with the custom map. What made it a bit awkward is the fact that it needs to handle both BTreeMap and IndexMap but I think I found a decent solution.

I also added a State::current_schema method in 9d5d048 but I'm not sure if I understood your comment on this correctly. It doesn't appear to make any difference? To be quite honest, I don't fully grasp all the details of that code.

I considered how to avoid adding the schema to Ident but ultimately didn't find a good way. Many places in the code assume that Ident can be used as a map key, so in all of those we would lose track of the duplicated types. With the custom map this is already a much cleaner change.

Unfortunately this still has a small breaking change regarding Ident::partial_eq: a01d3cd#diff-86c0c093340c621f7a270b8d5db87c2b2f25442cd200e0b5c010053042ddbbc2

@Bergmann89
Copy link
Owner

Hey @main--,

Thanks for all the work, I'll have a look over the weekend and report back.

I also added a State::current_schema method in 9d5d048 but I'm not sure if I understood your comment on this correctly. It doesn't appear to make any difference? To be quite honest, I don't fully grasp all the details of that code.

The Problem I pointed out before is caused by the recursive evaluation of the types. If for example a type schema1:MyType depends on schema2:BaseType, processing of MyType will trigger the creation of BaseType (if it was not already created), even if the actual processed schema is schema1 a type for schema2 is interpreted. That's why you have to pick the schema ID from the stack, and not the one of the current active schema. The loop over the schemas and root elements in that schema is just to not miss any type, but the actual processing of a type is recursive.

I considered how to avoid adding the schema to Ident but ultimately didn't find a good way. Many places in the code assume that Ident can be used as a map key, so in all of those we would lose track of the duplicated types. With the custom map this is already a much cleaner change.

Ok, this was just an idea of mine. I'll have a look into the code and re-evaluate that idea again.

@Bergmann89
Copy link
Owner

Hey @main--,

sorry for the late reply, this took longer than expected. I reviewed your changes and started simplifying a few things. While doing that, I explored my idea around the new TypeIdent, which turned out to be more complex than expected and resulted in a larger refactoring.

Instead of resolving the schema every time an entry is accessed from a map, the Interpreter now handles this centrally. It assigns an appropriate SchemaId to each identifier, allowing the later pipeline stages to work with identifiers in their usual form. This also enables us to implement support for xs:redefine and xs:override (as requested here #192) directly in the Interpreter.

Since GitHub doesn’t allow changing the source branch of an existing PR, I opened a new one here: #210. Please have a look and if you’re okay with it, I would close this PR and merge the new one.

@Bergmann89
Copy link
Owner

Closed as duplicate to #210

@Bergmann89 Bergmann89 closed this Jan 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Duplicate elements are silently ignored

2 participants