There are several Windows 10 machines, each with a 1 Gbps and a 40 Gbps network card, and both cards are connected to the same router. The router has four networks: WAN, C, A, and B, where A and B correspond to the two network cards on the Windows 10 machines. Routing is possible between A, B, and C, and A is configured as the gateway.
From network C, there are issues communicating with network A. It seems that packets are initially routed correctly, but then they are routed through the faster network B, which causes problems for the Windows 10 operating system when it receives responses from an address it did not connect to. These responses are dropped and communication hangs.
It’s worth noting that even though the communication started on network A, the machines on networks A and B are switching routes back to network C through network B. This is not a routing problem, as everything can ping everything, and it is not a connectivity problem either. For example, it is possible to RDP from network C into networks A and B, but only the connections to network B are stable. Connections to network A initially work, but then hang after a few frames (packets?).
CIFS file copies have no issues, but it’s important to note that Windows is designed to route copies across the fastest routes. Only Windows 10 machines seem to have this issue, as Windows Server expects multi-homing and does not behave in this way.
It’s clear that there is a configuration issue causing the machines to switch interfaces. Is there a way to fix this, while still allowing the option to use the faster route on network B if needed for improved performance on CIFS?
3 Answers
Introduction
Remote Desktop Protocol (RDP) is widely used to access remote machines. However, black screens, locked sessions, and connection timeouts are common complaints, and one possible cause is how a multihomed Windows 10 machine routes its traffic. This answer looks at that cause in detail and lists possible solutions.
Understanding Windows 10 Multihomed Routing
A multihomed machine has more than one network interface and can use them at the same time. Windows 10 supports this, and it can be beneficial in some scenarios. For example, a machine with a 1 Gbps and a 40 Gbps network card can use the faster interface for data-intensive tasks like file transfers.
However, multihomed routing can also cause issues, especially when the machine is connected to multiple networks. In such cases, the machine may switch interfaces, causing communication problems. For example, a packet may be sent through one interface, but the response may come through another interface, causing the operating system to drop the response and hang the communication.
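To make the drop concrete, here is a minimal Python sketch of how a TCP stack matches incoming segments to open connections. The addresses and port numbers are invented for illustration (none of them come from the question), and a real stack does far more than this single lookup.

```python
from typing import NamedTuple


class FourTuple(NamedTuple):
    local_ip: str
    local_port: int
    remote_ip: str
    remote_port: int


class ClientStack:
    """Toy model of a client's TCP connection table."""

    def __init__(self):
        self.connections = set()

    def connect(self, local_ip, local_port, remote_ip, remote_port):
        # Record the connection under the exact addresses used to open it.
        self.connections.add(FourTuple(local_ip, local_port, remote_ip, remote_port))

    def accepts(self, src_ip, src_port, dst_ip, dst_port):
        # An incoming segment belongs to a connection only if all four
        # values match; a reply from a different source address does not.
        return FourTuple(dst_ip, dst_port, src_ip, src_port) in self.connections


# Hypothetical addresses: client on network C, server reachable on A and B.
client = ClientStack()
client.connect("10.0.3.50", 50000, "10.0.1.10", 3389)

print(client.accepts("10.0.1.10", 3389, "10.0.3.50", 50000))  # reply from the A address
print(client.accepts("10.0.2.10", 3389, "10.0.3.50", 50000))  # reply from the B address
```

The second lookup fails because the reply's source address is the server's network B address, which the client never connected to.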
The Problem with Windows 10 Multihomed Routing
The setup in the question is exactly this case. Each Windows 10 machine has a 1 Gbps card on network A and a 40 Gbps card on network B, both attached to a router that also serves network C and the WAN, with A configured as the gateway. A connection opened from network C to a machine's address on network A starts out correctly, but the replies then leave through the faster network B. The other end receives responses from an address it did not connect to, drops them, and the communication hangs.
Ping works in every direction, so this is neither a plain routing problem nor a connectivity problem. RDP sessions from network C to network B are stable, while sessions to network A work at first and then hang after a few frames. CIFS file copies are unaffected (Windows is designed to route copies across the fastest route), and only Windows 10 shows the behavior, as Windows Server expects multihoming and does not behave in this way.
Possible Solutions
There are several possible solutions to this issue. Let’s discuss them one by one.
Disable Multihomed Routing
The simplest solution is to stop the machine from being multihomed by disabling one of its network interfaces. For example, on a machine with a 1 Gbps and a 40 Gbps network card, disabling the 40 Gbps card leaves a single path, so replies can no longer leave through a different interface.
However, this solution may not be ideal if the machine needs to use both network interfaces. In such cases, other solutions may be more appropriate.
Change Interface Metrics
Interface metrics are values assigned to network interfaces that determine the priority of the interface. The lower the metric, the higher the priority. By default, Windows assigns a metric based on the speed of the interface. However, this may not always be appropriate, especially when the machine is connected to multiple networks.
Changing the interface metrics can help prioritize the interfaces correctly. For example, if the machine needs to communicate with network A, setting the metric of the interface connected to network A to a lower value than the interface connected to network B can ensure that the packets are sent through the correct interface.
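As a rough illustration of why the metric matters, the following Python sketch models route selection as "longest matching prefix first, then lowest metric". The subnets and metric values are assumptions made up for the example, and it assumes, purely for illustration, that both interfaces end up with a default route; the real routing table on the affected machines may look different.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str      # destination network in CIDR form
    interface: str   # label for the outgoing interface
    metric: int      # effective metric, lower is preferred


def pick_route(routes, destination):
    """Choose a route: the longest matching prefix wins, ties go to the lowest metric."""
    dest = ipaddress.ip_address(destination)
    candidates = [r for r in routes if dest in ipaddress.ip_network(r.prefix)]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda r: (-ipaddress.ip_network(r.prefix).prefixlen, r.metric),
    )


def table(metric_a, metric_b):
    # Hypothetical table: one on-link route and one default route per interface.
    return [
        Route("10.0.1.0/24", "A", metric_a),
        Route("10.0.2.0/24", "B", metric_b),
        Route("0.0.0.0/0", "A", metric_a),
        Route("0.0.0.0/0", "B", metric_b),
    ]


client_on_c = "10.0.3.50"  # hypothetical client address on network C

# Speed-based metrics: the faster interface B gets the lower value.
print(pick_route(table(metric_a=25, metric_b=10), client_on_c).interface)

# Manual metrics: A is given the lower value and is now preferred.
print(pick_route(table(metric_a=5, metric_b=10), client_on_c).interface)
```

Traffic addressed to the network B subnet itself still uses interface B in both tables, because its on-link route is more specific than either default route.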
Use Static Routes
Static routes are manually configured entries in the routing table. Because the most specific matching route is chosen before metrics are compared, a static route can ensure that packets are sent through the intended interface. For example, a static route can make packets for a particular destination network (such as network C) always leave through the interface connected to network A, even when the interface connected to network B has the lower metric.
However, configuring static routes can be a complex task, especially if the network topology is complex. It is important to ensure that the static routes are configured correctly, or else they can cause more problems than they solve.
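One way to reduce mistakes is to generate the command instead of typing it. The sketch below builds a `route ADD` command line for the Windows `route` tool from a CIDR prefix. The network, gateway, metric, and interface index shown are hypothetical placeholders; the real interface index has to be read from the output of `route print` on each machine.

```python
import ipaddress


def route_add_command(network, gateway, metric, if_index, persistent=True):
    """Build a Windows 'route ADD' command line for an IPv4 static route."""
    net = ipaddress.ip_network(network)
    if net.version != 4:
        raise ValueError("this sketch only handles IPv4 networks")
    ipaddress.IPv4Address(gateway)  # raises ValueError if the gateway is malformed

    parts = ["route"]
    if persistent:
        parts.append("-p")  # keep the route across reboots
    parts += [
        "ADD", str(net.network_address),
        "MASK", str(net.netmask),
        gateway,
        "METRIC", str(metric),
        "IF", str(if_index),
    ]
    return " ".join(parts)


# Hypothetical values: reach network C via the network A gateway.
print(route_add_command("10.0.3.0/24", "10.0.1.1", metric=5, if_index=12))
```

Validating the prefix and gateway with `ipaddress` before building the string catches typos such as a host address used where a network address is expected.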
Use Policy-Based Routing
Policy-based routing is a feature that allows routing decisions to be made based on policies rather than the default routing table. By using policy-based routing, it is possible to ensure that packets are sent through the correct interface based on the source or destination IP address, protocol, or other criteria.
Policy-based routing can be more flexible than static routes, as it allows more granular control over the routing decisions. However, it is complex to configure, requires a good understanding of the network topology and the policies that need to be enforced, and is typically a feature of the router rather than of the Windows 10 hosts themselves.
Use Virtual LANs (VLANs)
Virtual LANs (VLANs) are a way to create virtual networks within a physical network. By using VLANs, it is possible to isolate traffic between different parts of the network and ensure that packets are sent through the correct interface.
For example, if the machine needs to communicate with network A, a VLAN can be created for network A, and the interface connected to network A can be assigned to that VLAN. Note that a VLAN only separates traffic at layer 2. The machine still picks its outgoing interface from its own routing table, so VLANs complement the metric and static-route approaches rather than replace them.
However, VLANs require additional configuration, and not all network devices support them. It is important to ensure that the network devices support VLANs before implementing this solution.
Conclusion
Multihoming on Windows 10 can cause communication problems, especially when the machine is connected to multiple networks. However, there are several possible solutions to this issue, including disabling one of the interfaces, changing interface metrics, using static routes, using policy-based routing, and using VLANs. It is important to choose the solution that is appropriate for the network topology and the policies that need to be enforced.
It sounds like the issue you are experiencing is due to the Windows 10 machines switching to the faster network B while maintaining connections with the machines on network A. This is causing communication to hang as the responses from the machines on network A are being dropped.
To fix this issue, you can try disabling the automatic metric feature in Windows 10. With this feature enabled, Windows assigns each interface a metric based on its link speed, so the faster interface is preferred automatically. By disabling it and entering the metrics by hand, you decide which interface is preferred.
To disable the automatic metric feature, you can use the following steps:
- Open the Network Connections window by going to Start > Control Panel > Network and Internet > Network and Sharing Center > Change adapter settings (or by running ncpa.cpl).
- Right-click on the connection that you want to configure and select Properties.
- Click the Networking tab, select Internet Protocol Version 4 (TCP/IPv4) or Internet Protocol Version 6 (TCP/IPv6) depending on which protocol you want to configure, and then click Properties.
- Click the Advanced button.
- In the Advanced TCP/IP Settings dialog box, stay on the IP Settings tab.
- Uncheck the “Automatic metric” checkbox and enter a value in the “Interface metric” field. For the network A connection, this value should be lower than the metric of the network B connection, because the interface with the lower metric is preferred. Repeat these steps for the network B connection with a higher value.
- Click OK to save the changes and close the dialog boxes.
After disabling the automatic metric feature and setting the metrics by hand, the Windows 10 machines should prefer network A whenever both interfaces offer a route to the same destination. This should stop the connections from hanging. Traffic addressed directly to network B still leaves through the faster interface, so CIFS copies can keep using it if needed.
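Since several machines are involved, the same change can be scripted rather than clicked through. The Python sketch below only builds `netsh interface ipv4 set interface` command lines; it does not run them. The connection names "Ethernet A" and "Ethernet B" and the metric values are placeholders, not names taken from the question, so substitute the names shown in Network Connections.

```python
def netsh_metric_commands(metrics):
    """Build one netsh command per connection to set a fixed IPv4 interface metric.

    metrics maps a connection name (as shown in Network Connections)
    to the metric it should get; lower values are preferred.
    """
    commands = []
    for alias, metric in metrics.items():
        if not isinstance(metric, int) or metric < 1:
            raise ValueError(f"metric for {alias!r} must be a positive integer")
        commands.append(
            f'netsh interface ipv4 set interface "{alias}" metric={metric}'
        )
    return commands


# Placeholder names and values: network A preferred over network B.
for line in netsh_metric_commands({"Ethernet A": 5, "Ethernet B": 50}):
    print(line)
```

The printed lines would need to be run from an elevated command prompt on each machine.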
I was able to confirm my guess, an educated one after many years of technical experience. It seems that RDP servers (the host machines being logged into) try to detect the fastest route by sending packets back to clients on all interfaces, using both UDP and TCP. However, one of the machines on networks A and B consistently favors network B for returning calls that came in on network A. TCP does not tolerate replies arriving from a different address, although the RDP protocol itself supports this.
To fix this issue, the following steps can be taken:
On clients:
- Open gpedit
- Navigate to: Computer configuration -> Administrative Templates -> Windows Components -> Remote Desktop Services -> Remote Desktop Connection Client
- Enable the “Turn Off UDP On Client” policy
On servers/hosts (in this case, all four headless Windows 10 “servers”):
- Open gpedit
- Navigate to: Computer configuration -> Administrative Templates -> Windows Components -> Remote Desktop Services -> Remote Desktop Session Host -> Connections
- Select Network Detection on the server -> Enabled, with Turn Off Connect Time Detect and Continuous Network Detect
- Select RDP Transport Protocols -> Enabled, with Use TCP Only
This should prevent the servers from trying to select paths on the 40 Gbps network (which is reserved for use with Kubernetes) during a conversation. Note that this solution worked for RDP connections on Windows 10 version 1803, though with occasional problems, and did not work at all on version 1809.
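After forcing TCP only, a quick sanity check is to confirm that each host address still accepts TCP connections on the RDP port. The Python sketch below is one way to do that; the host addresses in the loop are hypothetical, and a successful handshake only shows that the port is reachable, not that a session will stay stable.

```python
import socket


def tcp_port_open(host, port=3389, timeout=3.0):
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_hosts(hosts, port=3389, timeout=3.0):
    """Map each host address to whether its TCP port answered."""
    return {host: tcp_port_open(host, port, timeout) for host in hosts}


if __name__ == "__main__":
    # Hypothetical addresses of one machine on network A and network B.
    for host, ok in check_hosts(["10.0.1.10", "10.0.2.10"]).items():
        print(host, "reachable" if ok else "not reachable")
```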