Graceful PVS failover - a network view
The setup
Our environment has the following servers:
- Citrix Provisioning Server (version 2206):
  - ctxpvs1 (192.168.0.30)
  - ctxpvs2 (192.168.0.31)
- Target device:
  - ctxvda1master (192.168.0.103)
Failover process
We want to look at how a target device fails over from one PVS server to the other.
To simulate the failover, we stop the Citrix PVS Stream Service via services.msc.
Requirements
As general information: PVS HA - docs.citrix.com.
For a successful failover, the following is necessary:
- The vDisk must be exactly the same on the PVS servers (different timestamps are already problematic).
- The vDisk in the PVS console must be set to “Use the load balancing algorithm”. “Best Effort” is also not a problem and allows failover across subnet boundaries. “Fixed” prohibits failover across subnet boundaries. Reference: CTX138933
- The PVS servers and the target devices must be network reachable.
For network connectivity, we check the port matrix of Citrix.
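Because the traffic discussed here is UDP, a ping or a TCP connect test says little about whether a given UDP port actually passes a host firewall. The following is a minimal sketch of a generic UDP echo probe, not a Citrix tool. The function names `udp_echo_once` and `udp_probe` are made up for illustration. It assumes you can run the small listener on the far side on a free port, so it will not work on a port that the PVS software has already bound.

```python
import socket
import threading


def udp_echo_once(sock: socket.socket) -> None:
    """Wait for one datagram on an already bound socket and send it back."""
    data, addr = sock.recvfrom(2048)
    sock.sendto(data, addr)


def udp_probe(host: str, port: int, payload: bytes = b"probe", timeout: float = 2.0) -> bool:
    """Send one datagram and report whether the same payload comes back in time."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        try:
            s.sendto(payload, (host, port))
            data, _ = s.recvfrom(2048)
        except OSError:
            # Timeout or an ICMP "port unreachable" reported by the OS.
            return False
        return data == payload


if __name__ == "__main__":
    # Loopback demonstration: the listener takes any free port (port 0).
    listener = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    listener.bind(("127.0.0.1", 0))
    port = listener.getsockname()[1]
    worker = threading.Thread(target=udp_echo_once, args=(listener,), daemon=True)
    worker.start()
    print("reply received:", udp_probe("127.0.0.1", port))
    worker.join(timeout=1.0)
    listener.close()
```

A missing reply only shows that the datagram or its answer was lost somewhere on the path. It does not tell you which firewall dropped it, so a network trace remains the more reliable tool.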
Failover
In the test lab, the target device (ctxvda1master - 192.168.0.103) is connected to the PVS server (ctxpvs2 - 192.168.0.31). So we stop the Citrix PVS Stream Service, and at the point where the failover should happen, nothing happens. The target device hangs.
Troubleshooting
PVS is a network product, so it makes sense to take a network trace. One way to do this is as follows (Windows Server 2019 and later):
network tracing:
pktmon start --capture
{reproduce the issue}
pktmon stop
pktmon etl2pcap PktMon.etl --out PktMon.pcapng
Theoretically, a CDF trace (using CDFControl) would also be useful, but Citrix does not provide public symbols for StreamProcess.exe (it does for SoapServer.exe, though!). I assume in advance that CDF traces of SoapServer.exe do not help here.
How it works
To troubleshoot the problem at all, we first need to know which components communicate and how. The port matrix reference shows the following communication between the target device and the provisioning server:
Source | Destination | Type | Port | Details |
---|---|---|---|---|
Target Device | PVS Server | UDP | 6910-6930 | vDisk Streaming |
Target Device | PVS Server | UDP | 6901,6902,6905 | ?? |
In the default configuration (which can be changed), the UDP ports 6910-6930 are responsible for the “content” streaming (i.e. the content from the vDisk to the target device).
But there are still the ports 6901, 6902 and 6905. I’m not aware of any publicly available documentation that describes what exactly these ports are used for.
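When reading a trace, it helps to have the two port matrix rows at hand as a lookup. The following minimal sketch encodes only what the table above states. The function name `describe_port` is made up for illustration, and the default streaming range applies only if it has not been changed in the PVS configuration.

```python
# Port roles as listed in the table above (target device -> PVS server, UDP).
# These are the article's observations, not official protocol documentation.
STREAMING_PORTS = range(6910, 6931)          # default range 6910-6930, configurable
UNDOCUMENTED_PORTS = {6901, 6902, 6905}      # listed, but purpose not documented


def describe_port(port: int) -> str:
    """Return a short description of a UDP port based on the two table rows."""
    if port in STREAMING_PORTS:
        return "vDisk streaming (default range 6910-6930)"
    if port in UNDOCUMENTED_PORTS:
        return "listed in the port matrix, purpose not publicly documented"
    return "not covered by the two rows discussed here"


if __name__ == "__main__":
    for port in (6901, 6902, 6905, 6910, 6930, 6931):
        print(port, "->", describe_port(port))
```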
Analysis
The “normal” streaming activity from the vDisk to the target device looks like this in the network traffic:
ctxpvs2 sends data from port 6930 to port 6905 on the target device. Port 6905 on the target device belongs to the service that processes the vDisk data.
What actually happens on ctxpvs1 (the PVS server to which the target device should fail over):
Two ports are used here: 6903 as well as 6895. Port 6895 is listed in the port matrix under “Inter-server communication”, so we can match that. This was actually all the communication between the two PVS servers.
If we look at the network traffic from ctxpvs2, we see the following regarding the failover:
Packet 9013 is the last packet sent as a “normal” streaming packet. After that, we see a new UDP stream in which the PVS server tries to contact the target device on port 6902. This port is blocked on the target device (because it is not specified in the port matrix).
We also see the packet on the target device:
There is no response to the request from the target device.
Whatever port 6902 is responsible for, it looks like ctxpvs2 wants to tell the target device ctxvda1master: “My Citrix PVS stream service is stopped, please switch to the other PVS server”.
If port 6902 is allowed in the target device firewall, the target device network trace looks like this:
So again we see a UDP packet from ctxpvs2 to ctxvda1master on port 6902, but this time with an important difference: the target device now contacts the other PVS server (ctxpvs1) on port 6910.
After that, we see some more communication between ctxvda1master (port 6901) and ctxpvs1 (port 6930). At the end of the network trace, we see the already known pattern between ports 6905 and 6930.
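The reading of the trace described above can be sketched in code: look for the first packet from the old PVS server to port 6902 on the target device, then for the first later packet the target device sends to a different server. This is a minimal sketch over hand-written packet records, not a pcapng parser. The `Packet` type and the function `analyse_failover` are made up for illustration, and any values not named in the article (for example the source ports and the packet numbers after 9013) are hypothetical.

```python
from typing import Iterable, NamedTuple, Optional, Tuple


class Packet(NamedTuple):
    number: int
    src: str
    sport: int
    dst: str
    dport: int


def analyse_failover(
    packets: Iterable[Packet], target: str, old_server: str, notify_port: int = 6902
) -> Tuple[Optional[Packet], Optional[Packet]]:
    """Return (notification packet, first packet from target to another server)."""
    ordered = sorted(packets, key=lambda p: p.number)
    notify = next(
        (p for p in ordered
         if p.src == old_server and p.dst == target and p.dport == notify_port),
        None,
    )
    if notify is None:
        return None, None
    switch = next(
        (p for p in ordered
         if p.number > notify.number and p.src == target and p.dst != old_server),
        None,
    )
    return notify, switch


if __name__ == "__main__":
    target, pvs1, pvs2 = "192.168.0.103", "192.168.0.30", "192.168.0.31"
    trace = [
        Packet(9013, pvs2, 6930, target, 6905),   # last "normal" streaming packet
        Packet(9014, pvs2, 50000, target, 6902),  # source port is hypothetical
        Packet(9015, target, 50001, pvs1, 6910),  # source port is hypothetical
    ]
    notify, switch = analyse_failover(trace, target, pvs2)
    print("notification:", notify)
    print("switch:", switch)
```

If the second value is `None`, the target device never contacted another server after the notification, which matches the hanging target device seen with port 6902 blocked.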
Wait a minute, the failover works for me…
Obviously, the Citrix port matrix is missing an entry that is needed for a graceful failover. But most people who install the target device software will not run into the problem. Why? Because the setup creates a firewall rule automatically, namely this one:
Summary
In summary, based on the above analysis, it seems that port 6902 must be allowed in the target device firewall to guarantee a graceful failover.
The target device setup creates this firewall rule.
However, it is missing in the Citrix documentation.
Additionally, the system requirements mention that port 6901 should be allowed on the target device. The setup does not create a local firewall rule for this port, nor is it listed in the port matrix.
Probably a good way would be to open all three ports (6901, 6902, 6905) between the PVS server and the target device in the firewall to avoid current and future problems.
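For a target device where the setup-created rule is missing, such an inbound rule could be added with `netsh advfirewall`. The following minimal sketch only builds and prints the command line, it does not run it. The rule name and the function `build_netsh_rule` are made up for illustration, and restricting the rule to the two lab PVS servers via `remoteip` is my own assumption rather than something the setup is known to do.

```python
from typing import Iterable, List, Optional


def build_netsh_rule(
    ports: Iterable[int],
    name: str = "PVS-target-UDP-example",   # hypothetical rule name, no spaces
    remote_ips: Optional[Iterable[str]] = None,
) -> List[str]:
    """Build a netsh command that allows inbound UDP on the given local ports."""
    port_list = ",".join(str(p) for p in sorted(set(ports)))
    cmd = [
        "netsh", "advfirewall", "firewall", "add", "rule",
        f"name={name}", "dir=in", "action=allow",
        "protocol=UDP", f"localport={port_list}",
    ]
    if remote_ips:
        # Optional: only accept the traffic from the listed PVS servers.
        cmd.append("remoteip=" + ",".join(remote_ips))
    return cmd


if __name__ == "__main__":
    cmd = build_netsh_rule(
        [6901, 6902, 6905],
        remote_ips=["192.168.0.30", "192.168.0.31"],
    )
    # Print only; run the command from an elevated prompt after reviewing it.
    print(" ".join(cmd))
```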
I wrote about this about a year ago, but since the port matrix has still not been updated, I have now rebuilt the scenario with a recent version to check whether the behavior is still there:
“PVS failover doesn't work or only very slowly? Thanks to a Citrix case we finally found the solution. It's necessary to add an inbound firewall rule for the corresponding ports. #citrix #citrixpvs #pvs #provisioning #failover #slow”
— Patrick Matula (@p_matula) October 15, 2021
Finally, I would like to say that this cannot be the only failover mechanism. We have looked at graceful failover here, but if a PVS server dies from one moment to the next, communication can no longer take place. So there’s obviously still a “plan b” there. That would probably be a topic for another blog post.
Happy troubleshooting.