-
Notifications
You must be signed in to change notification settings - Fork 10
Feature: RDF/XML parser #410
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
# Conflicts: # conanfile.py # tests/CMakeLists.txt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
first review about architecture and design (not logic).
In general: please split the xml parse into multiple files. One file per state would be good. And please add comments to explain why stuff is happening, what each state is good for and so on. This is very hard to follow even if you know what rdf-xml roughly looks like.
I very much like the idea with the states. Just needs to be organized a little.
|
If you split it into multiple files just keep the definitions inline in the structs that makes it easier to read I think |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the changes.
As I said before, I do like the idea of the states. However, the concrete implementation is not optimal.
The main concern here is the circular nature of the relationships of the states and the parser. The states can just arbitrarily modify the parser through the given ImplXML &. This is not good for general understandability and correctness.
I suggest a more functional approach with state transitions and secondary actions (something like fn(CurrentState, Input) -> Transition).
The state transitions determine which state the parser will enter next (or that it should pop a state). The secondary actions are emitting triples or errors.
Something like this:
struct Transition {
using Action = std::variant<StatePop, StatePush, EmitStatement, EmitError>;
std::vector<Action> actions_;
};So instead of directly mutating the parser, the states tell the parser what to do (functional/data oriented design). There is a closed set of possible StateTransition actions that each have a clear purpose. (Rust can model this much cleaner than C++, but it still works). Performance is mostly a secondary concern here; maintainability is more important for rdf/xml (due to the complexity of the specification and since there are basically no large rdf/xml datasets).
The whole goal is to get rid of the circular mutability that is currently present and limit what states can do to the parser. That should make it both safer and easier to follow.
For this it makes sense to move some of the functionality out of the ImplXML (e.g. the static functions).
Note: I have not fully thought this through (since it would require me to do the refactor myself), but if you have questions or concerns, feel free to ask.
liss-h
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove the uses of nested structs (you can keep the ones where ImplXYZ is nested in IStreamQuadIterator for now)
liss-h
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the changes. They really make the code much easier to read. Here is another review, I expect that I will write another (probably final one) after these comments are addressed.
This one is mostly a request for some more example driven documentation and some smaller issues.
liss-h
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for all the changes. While this is still complex (because rdf-xml just is complex), I think it is very well structured.
I am just missing a few comments here an there to explain things.
Additionally, you use std::in_place_type_t<XYZ>{} very often, please just use std::in_place_type<XYZ> (without _t and curly braces).
| } | ||
|
|
||
| StateTransition CollectionState::on_start_element(XMLOutputQueue &out, std::string_view const local_name, std::string_view const uri, std::span<XMLAttribute> const attributes, XMLStateInfo const &info) { | ||
| auto [transition, obj] = DescriptionState::enter(out, local_name, uri, attributes, info); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if I put something in the collection that is not a Description?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see added comment
|
@nkaralis Do you still have the time to implement the tests for this or should the task be delegated? |
|
I will implement them. |
|
@mcb5637 Have a look at the commented out tests. In the negative tests, there are multiple that are failing |
liss-h
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the work you put in @mcb5637. I think this is ready to be merged.
TODO: