Skip to content

vSomeIP 3.6.2 Release#1005

Merged
fcmonteiro merged 52 commits intoCOVESA:masterfrom
vvcarvalho-csw:master
Feb 23, 2026
Merged

vSomeIP 3.6.2 Release#1005
fcmonteiro merged 52 commits intoCOVESA:masterfrom
vvcarvalho-csw:master

Conversation

@vvcarvalho-csw
Copy link

No description provided.

lutzbichler and others added 7 commits February 18, 2026 18:02
Enable caching of unknown events.
This behavior could be switched on by setting ENABLE_DEFAULT_EVENT_CACHING at compile time.
Now it is always on.
Adapt wireshark plugin for newer wireshark versions
Wireshark 4.6.0 already decodes vsomeip internal messages.
https://www.wireshark.org/docs/dfref/v/vsomeip.html
https://gitlab.com/wireshark/wireshark/-/merge_requests/18760
As both protocols have the same name, wireshark complains of our plugin
having the same name as the official one.
This just changes the protocol name to vsomeip3 to allow having both plugins.
Improve logs for registration process
Logs during registration process could be a bit misleading.
Clarify when a client is being queued for registration/deregistration,
and when the client is actually being processed
Example:
Current logs

// app is requesting to register

Application/Client 1235 @ 160.10.4.1:31400 is registering

// we do not know when the daemon is actually processing the registration

Proposed logs

// register request from the app

Queueing a register request for application/client 1235 @ 160.10.4.1:31400

// actual log when daemon processes the registration

Application/Client 1235 is registering
It is a somewhat psychotic test, where first host offers a service, and
a second host offers the same service again, to check that collisions do
not break communication for the first offer
That is fine, however the second host would.. trigger a shutdown
method.. through the service that has a collision? Which makes no sense,
and was not deterministic. It is a leftover from b69aac5
Instead, have the high-level test logic stop the second host
Rework "Routing Ready" message

introduce internal "Routing Ready" message
(configurable by setting INTERNAL_ROUTING_READY_MESSAGE flag during compilation)
log external "Routing Ready" also after resume
If an application (or libvsomeip itself) accidentally closes a
descriptor, following syscalls receive EBADF
There are plenty of libraries (e.g., libsystemd, libudev, ...) that will
immediately SIGABRT on such a case, to signal application misbehavior
Follow the same principle, and add a SIGABRT on EBADF. Furthermore, add
a helper that can be used in the codebase to trigger SIGABRTs
Fix pending subscription tests return codes and add documentation

Fixed return codes.
Fixed send_alternating_subscribe_unsubscribe_same_port and send_subscribe_stop_subscribe_subscribe.
Added documentation.
Felipe Kellermann and others added 22 commits February 23, 2026 14:00
Update documentation based on recent commits.
No functional change other than updating documentation.

Remove VSOMEIP_ABORT_ON_CRIT_SYSCALL_ERROR: missing 93a0ffd.
While at it add a comment that non-JSON files are now ignored.
Skip processing for non-JSON extension files showing a warning when
it is routing manager.
While iterating directory paths for non-mandatory files the routing
manager ends up opening/reading files that are not even JSON. Parser
fails causing a warning "Configuration may be incomplete" after the
costs of opening & reading the file.
Projects have files (e.g. vsomeip security policies) that ended up
deciding for the same paths used by vsomeip JSON files.
This commit restructures the local communication endpoints,
replacing an inheritance-based design with a composition-based approach.
Key Changes

New Abstraction Layers

local_socket: Protocol-agnostic interface for bidirectional socket communication

Implementations: local_socket_tcp_impl and local_socket_uds_impl
Handles async read/write operations and connection management

local_acceptor: Protocol-agnostic server socket interface

Implementations: local_acceptor_tcp_impl and local_acceptor_uds_impl
Manages incoming connection acceptance

local_receive_buffer: Message parsing and buffer management

Handles protocol framing (start/end tags removed in favor of direct command parsing)
Implements buffer growth/shrink strategies

Unified Endpoint Classes

local_endpoint: Single class replacing 4 endpoint types:

local_tcp_client_endpoint_impl
local_tcp_server_endpoint_impl
local_uds_client_endpoint_impl
local_uds_server_endpoint_impl

local_server: Manages server-side connection lifecycle

Handles client handshake (ASSIGN_CLIENT/CONFIG commands)
Creates and manages local_endpoint instances for each client
Uses tmp_connection helper for initial handshake

State Management Changes

Explicit state machine in local_endpoint: STOPPED → CONNECTING → CONNECTED
Non-restartable design (endpoints cannot be restarted after stop)

Type Safety Changes

endpoint_manager_base methods now return specific types:

find_local() returns std::shared_ptr<local_endpoint> (previously endpoint)
create_routing_root() returns std::shared_ptr<local_server> (previously endpoint)

Protocol Handling Changes

Magic cookie tags (0x67 0x37 0x6d 0x07 / 0x07 0x6d 0x37 0x67) removed from local communication
Handshake uses protocol commands directly (instead of magic packet for uds)
Client assignment moved to local_server::tmp_connection
Messages are not send separately

Code Organization

Socket I/O, message parsing, and endpoint logic separated into distinct components
Protocol-specific behavior isolated in socket implementations

Implementation Changes

~2000 lines of duplicated client/server TCP/UDS endpoint code consolidated into local_endpoint
routing_manager_impl, routing_manager_client, and routing_manager_stub updated to use new types
unit tests were introduced for the endpoint layer to catch simple mistakes quicker

Migration Impact

Internal API changes (public vsomeip API unchanged)
Endpoint restart behavior changed: restart() now stops and recreates endpoints rather than reusing them
Socket options configured in stream_socket implementations rather than endpoint classes

This refactoring consolidates local communication code while maintaining protocol-specific behavior in
dedicated socket implementations.
There is Android-specific "fallback loading" of security policies,
introduced in 94f113c, which adds the following logic: if given client
host is unknown (e.g., ""), load own host (e.g., "android-idc")
The loading is not idempotent, so every time an Android libvsomeip
client would receive a RIE_ADD_CLIENT, it would re-load
its' policy, due to add_known_client(.., "")
94f113c then massively worsens the situation, by adding another
add_known_client(..., "") at every RIE_ADD_SERVICE_INSTANCE
This is what causes the security policy to be loaded countless times in
boardnetsettings HAL  in , causing it to consume immense
amounts of CPU - it is a HAL that creates 15 libvsomeip applications and
requests >100 services
Fix this by making sure security policy loading is idempotent,
including the Android-specific fallback
Bump minimum version of Boost to 1.74
If a user either closes or messes with the descriptors that the
boost::asio::io_context uses (in Linux, a epollfd), it is possible that
the io threads exit
This is a terrible scenario - libvsomeip application no longer works,
few signs aside the threads exit (and application::start terminating,
which libCommonAPI does nothing about)
Therefore: if the io threads exit, and the user did not
invoke application::stop, do SIGABRT
There is an unfortunately long history to the topic, see 71cd2d5,
, , ...
It is better to do std::abort from a library point-of-view, because it
guarantees a SIGABRT, whereas std::terminate may still invoke user
code
See for example While at it, fix a double-evaluation in VSOMEIP_TERMINATE macro
Reset is sending flag for UDP server endpoint.
An issue has been identified after STR where the the unicast socket for SD stopped
sending data to only one of its targets. Moreover, the dangerous ESQ trace that
depicts when a sending queue surpasses the warning limit was being logged
for that endpoint, this indicated that the data for that target reached the internal queue
but never the network, leading to believe that could be related with the is_sending flag
not being reset if an asynchronous operation was somehow halted/corrupted during STR.
This PR introduces some further traces to track targets unable to send data for the SD
endpoint and resets the is_sending_ flag for all UDP server endpoint on resume.
Check config for app name and id
If an app with clientID X and name Y de-registers and another Y name
app comes up and takes its clientID, if the original app re-registers
again, routing manager will accept the connection believing it's the
exact same application. This will then lead to issues where the second
app will detect lost of connection due to the replacing of connection
and try to register again, leading to a loop of register->de-register
between the two apps that share the same clientID X and name Y.
To fix this, check the config for configured app names, if it exists,
then it's OK, otherwise, continue with the clientID assignment
Note: there are several apps that do not specify a name, so it will
default to "" which can lead to the issue described above.
Change epoll_wait to check EINVAL in addition to EBADF.
Change react to support an errno (0 means use global) and add an
additional check for epoll_wait checking for EINVAL.
While at it, support epoll_pwait in addition to epoll_wait.
As a cleanup, fix return type for several other wrappers.
Keep original strings unchanged because of existing tools/parsers.
,
Minor documentation update
Silly regression from 7b3dedc, was causing local endpoints to
re-connect forever
When using TCP, it can cause "Cannot assign requested address", due to
old-endpoints-connecting-forever mixed with new endpoint using the same
address
vsomeip_sec_sync_client is what allows mapping an ip/port to a uid,
when using libvsomeip-sec
Unfortunately there is a gap in the configuration, and trying to map the
ip/port of the outgoing connection to vsomeipd to the uid of vsomeipd
results in an error
(this mapping now happens due to 7b3dedc)
Workaround this by "pretending" that the outgoing connection uses
the remote address of the incoming connection - libvsomeip-sec does have a
configuration for that one
Do not send STOP OFFER when registering an application
Fixes an issue where a second registration attempt made by an already registering application
would cause services that where offered to be stop offered and not offered again
Steps leading to the issue:

Daemon receives an error on connection with server and triggers DEREGISTER_ON_ERROR
Server simultaneously starts a registration attempt (ST_REGISTERING)
Daemon is still processing deregistration and queues registration request from server
Daemon finishes deregistration, closes connection with the server, and starts
processing pending registration request
Connection closed on step 4. triggers error handling on server and a new registration attempt
(ST_REGISTERING again)
Daemon queues second registration attempt, while processing the first one
Daemon finishes processing first registration, services are offered, server changes to ST_REGISTERED
Daemon starts processing second registration, services are stop offered
Services are not offered again due to server already being on ST_REGISTERED

This partially reverts 9abdb5f, now STOP OFFERs are only sent when an application is
being deregistered (The revert was already tested and no regression was found)
Consistency across VSOMEIP logging at RMC
This PR does not include any functional change.
It makes analysis and troubleshooting easier if log messages are standardized.
Changes are motivated to promote consistency across logging
and support at log analysis.
Note: vSomeIP Security logs remain untouched as I'm afraid it may
break some BAT Tests which rely somehow on these messages.
Fix missing unavailability on suspend_resume_test_initial
Sporadically, the test fails due to a logic limitation.
Root Cause:

The client waits to receive unavailable followed by available after sending the STR message to the service.
In the first iteration (network down → network up), the unavailable/available state transitions sporadically occur
instantaneously. Although is_available_ correctly becomes false, the test thread cannot capture this state change
because it immediately becomes true again before the condition variable check.

Solution:
Introduced a new flag was_unavailable_ that acts as a "sticky bit" to track state transitions:

Set to true when the service becomes available and the previous state was unavailable.
This ensures that the test knows the service went through the unavailable state.
The flag is only cleared after sending the STR (Suspend/Resume) message to the service, ensuring proper
synchronization
le::status acquires mutex_, same as what is held in le::start,
which results in a deadlock if this error happens
Follow-up to a9b4e9f, it is the lower-byte, not the top-byte that
needs to be modified
(otherwise there are still libvsomeip-sec issues)
local_routing_test_service would receive requests, immediately
respond, and immediately stop on the last one
There is no guarantee of message delivery on stop, and so test would
timeout as local_routing_test_client would never receive responses
Ensure stop_offer/offer completes before processing next command
Sporadically, even though STOP_OFFERs and OFFERs are sent in the correct order, the routing manager takes
longer to process the STOP_OFFER than the OFFER and ends up sending the availability out of order, that is, it sends
the ON_AVAILABLE first and only then the ON_UNAVAILABLE.
This issue seems to be due to the premature removal of offer commands from the queue.

Early Command Removal: The erase_offer_command() was being called before the asynchronous
on_availability() notification completed.
Queue Processing Gap: When erase_offer_command() removed the STOP_OFFER from the queue, the
next OFFER command would immediately start processing.
Async Callback Delay: The STOP_OFFER's asynchronous endpoint cleanup callbacks would still be pending
and could execute after the OFFER completed, sending ON_UNAVAILABLE out of order.

In order to ensure the correct order, the erase_offer_command() call must be made after all notifications have
been sent, only allowing the next command in the queue to be processed after the previous one has finished
processing.
Key Improvement:

Guaranteed Notification Order: By keeping the command in the queue until all notifications complete,
we prevent the next OFFER from processing prematurely.
Fix and optimize offer_test_external
Test updated to remove the sleeps. Additionally, it was verified that, the way the test was being initialized, the
external service (offer_test_service_external) never got to run because when the 3 seconds passed and the slave
container was started, the client had already finished execution and immediately sent the kill signal to the slave.
Changes:
offer_test_external_master_starter.sh.in:
-> Remove all sleeps
offer_test_external_sd_msg_sender.cpp:
->Mechanism added to wait for the master service to start offering the service
offer_test_service_external.cpp:
-> Wait until master service is available
Remove the server_endpoint interface
This interface is no longer needed after the refactoring of the local communication.
ricardolleite and others added 23 commits February 23, 2026 14:00
Description
Same as 80341ae, but for a bajillion other tests
It is always the same pattern, either:

client sends a shutdown request and waits before stop, or
server sends a shutdown response and waits before stop
Fix E2E tests return codes and add documentation

Fixed return codes.
Removed some magic numbers from source code.
Added documentation.
Enables existent logging
These logs are very useful for day-to-day analysis.
They allow us to see exactly when the boardnet service is
being offered and when it is stopped.
Improve logging for initial event tests
For some test cases logs like this would appear when launching initial_event_test_service

Unknown argument: CLIENT_SUBSCRIBES_TWICE

This is not an error but could be a bit confusing when analyzing the test.
Change the logs so that Unknown argument is only logged is a wrong argument is passed
Optimize external_local_payload_test_client_external
The test starter uses a function to verify if the service sockets are open, but unlike other starters, it checks after
launching the client instead of checking after launching the service.
Additionally, to verify if the sockets are open, a 2-second sleep is performed, but I don't think it makes sense to
always wait 2 seconds because in the case of sanitizers, it is almost instantaneous.
Therefore, to optimize execution times, a while loop was created that checks the socket status every 250ms with a
maximum limit of 2 seconds. This way, as soon as the sockets are open, the test can start, avoiding the static 2-second
wait.
Eliminate the risk of desynchronisation of the list of servers to be restarted
Since 4ef740c,
the lock on endpoint_mutex_ isn't hold while restarting the server.
So there is a risk that the list of active servers is modified while
the endpoint_manager_impl restarts them during a resume.
The consequence is that the old server occupies an endpoint
and prevents its replacement from taking its place.
We can reduce the likelihood of such an event occurring. And
when it occurs, we can correct the consequences.
Secondly, this problem should not occur because it is not normal
for the old server not to have self-destructed, as it should no longer
be referenced.
Remove initial/end tag check
With 7b3dedc, start and end tags were removed so vsomeip-dissector
had to be updated to no longer count on this 8 bytes.
Note: this breaks compatibility with pcaps that had versions previous
to the introduction of the PR mentioned above, you can use the
official wireshark vsomeip dissector for those.
If a user does application::start, then stop, then start, there is
no executor_work_guard anymore, so the io threads could exit due to
lack of outstanding work
It is theoretical, caught in code analysis
Add missing #include <unordered_map>
Remove unused internal.hpp

Android build is complaining,
local_server.hpp:187:10: error: no template named 'unordered_map' in namespace 'std'; did you mean 'unordered_set'?
And the missing internal.hpp on Android side.
Replaced all usages of std::lock_guard with std::scoped_lock across the codebase.
This aligns the project with modern C++17 best practices and
provides cleaner, more flexible mutex handling.
The main motivation for this PR is to ensure consistent
mutex-handling practices
In case of an error, a lock was incorrectly held, leading to a deadlock
In short:

lati::accept_cbk receives connection
lati::accept_cbk errors with ENOTCONN when reading remote address
(could happen if connection is already kaputt)
lati::accept_cbk calls ls::accept_cbk.. while incorrectly holding mtx
ls::accept_cbk calls lati::async_accept, which tries to acquire mtx
deadlock
Otherwise causes failures specifically with builds that only do
vsomeip_ctrl no def for std::scoped_lock
Change type for getsockopt
Android is complaining,
local_socket_uds_impl.cpp:114:101: error: cannot initialize a parameter of type 'socklen_t *' (aka 'int *') with an rvalue of type 'unsigned int *'
After every single kill, the test script must wait, because
otherwise the process might receive:

signal from kill and then
signal from shell termination

Unfortunately 2) can cause valgrind issues, but even 1) is problem, as
we can lose logs of the process exiting
Change all tests to wait after every single kill, check for exit
code. Furthermore, fix a tsan issue uncovered by these changes in
routingmanagerd
Ensure memory keeps alive until callback is invoked
keep a shared_ptr of the local_receive_buffer and copy it into the callback.
Because the callback does not keep alive the owner of the buffer as in the other
places, the callback needs to ensure that at least the memory used by asio is
kept alive. This avoids circular ownership problems, while protecting against
memory corruption.
The following (somewhat unlikely) sequence can happen:

le is created to connect to vsomeipd, from a guest
vsomeipd is unreachable
connection attempts timeout, leads to le::connection_cbk
connection attempts are exhausted, so le::connection_cbk does
le::escalate, calls error handler
however, before le::escalate, it does socket_->stop(), which
concurrently calls le::connection_cbk with "operation canceled"
concurrent le::connection_cbk sets state_e::STOPPED
due to state_e::STOPPED, le::escalate does nothing

This is possible because we're no longer using strands, and because
le::mutex_ is released between socket_->stop() and le::escalate
Rework the code to solve the last point, that alone eliminates the problem
Ensure lazy load before using security
During the refactoring in: 7b3dedc
the add_known_client was moved after the creation of a local_endpoint.
But the creation method of the local_endpoint internally uses the security model to ensure
this endpoint is allowed to be used. This was done because:

It was not understood that add_known_client needs to be called strictly before the check,
there was no remove_client method in the rh interface to remove the client
when the security check failed.

Both of these points are now understood and adjusted in this PR.
This bug did not surface, because the lazy load + security check needs to be used for
a client-server relation across containers.
As stands, vsomeip_sec_client_t::host is uninitialized with unix
sockets, which leads to valgrind issues when using CommonAPI
Trigger on_stop_offer_service coherently after on_unavailable
everywhere. It is a consistency fix, first and foremost
Which gives visibility into what steps the bash scripts do, without any
other changes. We could even remove quite a few logical "print"
statements
Without something like this, it can be very difficult to understand
which executable exited with an error
Avoid wrong security checks on the sender endpoints of the router and harden the local_server
Sequence that revealed all these problems addressed:

A container client that did not load the security policy (lazy loaded) tried to connect to the router
The router accepted the connection and assigned a client id
The client received the client id
The client started the receiver (triggered a async_accept)
The client dispatched the register_application command + the config command
The router enqueued the registration answer process on some other thread
The router started to load the security policy for the received config command
The routers endpoint was able to connect to the client
The router executed a "client" security check which failed
The router dropped the sender towards the client
The clients error handler is invoked
The client invoked rmc::reconnect but a credentials check within fails
The client does not stop the receiver
The register_application timer expires, not stopping the receiver, but restarting the sender
The loop starts from the beginning

Eventually the router is no longer rejecting the client (the policy loading was quick enough),
but now the pilled up async_accepts fail with "Bad file descriptor".
Problems address in this PR:

The sender of the router should not perform the security check (be aware it is a router endpoint)
The local_server should be able to handle subsequent start, start invocations,
Sockets used in the async_accept callback should not be replaced, before the callback is invoked.
remove the credential check when re-attempting to connect to the router
vSomeIP-Lib 3.6.2 Release
Enable caching of unknown events
Adapt wireshark plugin for newer wireshark versions
Improve logs for registration process
tests: fix offer_test_external
Rework "Routing Ready" message
misc: SIGABRT on EBADF
Fix network_tests/pending_subscription_tests
Fix for removed env and tweaks information on JSON files
Skip non-JSON files iterating config paths
restructure local endpoints
conf: fix lazy loading of security policies
Bump minimum version of Boost to 1.74
app: SIGABRT on unexpected io exit
misc: use std::abort instead of std::terminate
Reset sending flag for UDP server endpoint
check config for app name and id
epoll_wait check for EINVAL and react supporting errno
Update documentation
le: fix max-connection-attempts
lsti: fix policy check
do not send STOP OFFER when registering application
rmc: Unified log message structure
suspend_resume_test_initial – fix missing unavailability
le: fix silly deadlock
lsti: fix byte order
tests: fix local_routing fragility
Ensure stop_offer/offer completes before next command
Optimize offer_test_external
Remove the server_endpoint interface
tests: fix more fragility
Fix E2E return codes
Set log as info for boardnet OFFER/STOP OFFER
improve logging for initial event tests
Optimize external_local_payload_test_client_external
workaround to race condition with restart list
dissector: remove initial/end tag check
app: fix missing executor guard
Add missing include for unordered_map
tests: include unit-tests in network-tests
replaces std::lock_guard with std::scoped_lock
lati: fix accept deadlock
build: fix missing headers
Change type for getsockopt
tests: wait after kill
Ensure buffer is not cleaned up while asio might use it
le: fix escalate race
Ensure lazy load of client, before using security
le: fix uninitialized sec_client
rmi: fix order of on_*offer_service/on_available
tests: add bash -x
Avoid security failures on the daemon side
@fcmonteiro fcmonteiro merged commit 305aa99 into COVESA:master Feb 23, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants