Skip to content

Conversation

@BuZZ-dEE
Copy link

@xzel23
Copy link

xzel23 commented Nov 14, 2025

I think that's not enough. When another thread modifies the table while rehash() is running, rehash will see the updated values in the loop, seeing a mix of old and new values. This does not seem right. OldTable would need to be copied, not just share a reference to the original array, but this can not be done atomically.

The patch fixes concurrent rehashes, but all other methods that modify the table will still mess up the data.

I don't see how this can be solved without explicit synchronization.

@BuZZ-dEE
Copy link
Author

@xzel23 Would you synchronize the entire put method?

 public synchronized Value put(char c, Value value) { ... }

@xzel23
Copy link

xzel23 commented Nov 16, 2025

Caveat: I just stumbled over this because I use BATIK in some projects and follow commits to the repository. I have no deeper knowledge of this project's codebase, and I might have become a little rusty with Java 8, so I hope I didn't accidentally use any post Java 8 feature . These are just some observations from looking at the commit and the files through the GitHub Web interface.

It's much worse than that. Looking at the code for this class, it is inherently not thread safe. Methods that need synchronization are at least:

  • put(), rehash(), removeClearedEntries(), clear(): all these methods change the data and the changes need to be atomic
  • get(): this method only reads. But it iterates over the entries; another thread might replace the table during iteration, and get would iterate over invalidated data.

Possible Solutions

I think there are to ways to make this threadsafe:

  1. Use proper synchronization. Declaring all methods mentioned above synchronized would solve this, But:

    • concurrent get() calls would need to wait, which might drastically impact performance
    • declaring methods synchronized is IMHO bad practice since it exposes implementation details (later more on that)
  2. Reimplement the whole class designed for thread safety from the start.

    • I might get this wrong, but this looks like a Cache holding weak (or rather soft) references on the values where the value class is defined within this class itself. I think users of this class should not store references of type Value anyway but instead just use them to extract values. This means that practically all values should be eligible for GC at any time. If that's true, the simplest solution would be to use a soft reference to a Map holding strong references instead of a Map holding soft references. So the simple, easy to maintain and threadsafe version would be to make this class a thin wrapper around a SoftReference<ConcurrentHashMap<Character, Value>>. This would get rid of the custom ReferenceQueue, the Entry class, and all the complicated housekeeping. It can be done in a couple of lines of easy to maintain code and should be fully threadsafe and performant.
    • The biggest problem will be the protected methods exposing implementation detail.

Suggestion

  1. Implement proper synchronization using read/write locks to fix the current issue.
  2. Mark all protected methods and members as well as the Entry class as deprecated for removal.
  3. Include these changes in the next minor/patch release.
  4. In the next (or second to next, depending on how strict semver is followed) major release, remove the deprecated methods and members and switch to the SoftReference<ConcurrentHashMap<Value>> implementation.

Some more things to consider

  • The Value class is not immutable as it contains non final protected members. This is a bad thing for cache entries. If the class is not intended for inheritance, it should be declared final, all members final. I also suggest adding alternative getters not starting with "get" and delegating the existing getters the the new ones. This prepares for replacing the Value class with a one-line record class once the projects raises the minimum Java version to 17 (which I think would be a good thing for the next major release).
  • I have not checked if similar classes are contained in the codebase. In that case, either make the class take a generic parameter, or introduce a generic base class and use that one instead of repeating implementing the same pattern again and again.

@BuZZ-dEE
Copy link
Author

Other proposal using ConcurrentHashMap #67

@BuZZ-dEE
Copy link
Author

Closing this in favour of #67

@BuZZ-dEE BuZZ-dEE closed this Nov 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants