Q2067 Faster change over to standby Servers

Summary:

When I have done redundancy setups I have mostly played around with parameters trying to get the switch time I required. But I must admit that mostly I'm fumbling in the dark as each redundancy setup has different demands. I have been searching high and low for something that can guide the engineer in setting the correct parameters to get the required results.

1) What I'm looking for is: What parameters affect:

IO Device redundancy
Task redundancy (alarm, report and trend)
LAN redundancy

Sometimes the switch time is not crucial and then the above is not an issue, but other times it is and the customer needs a switch time of down to 5 seconds between tasks and LAN.

2) Is the above time realistic? If not, what is?

Solution:

LAN SendTimeout

The major impact when a server or network fails is the time-out on the network protocols. This time-out has no effect when an I/O device fails on an I/O server. This time-out should be controlled by the [LAN]SendTimeout parameter in the CITECT.INI file. The default setting for this parameter is 15 seconds. This default is set high so that false time-outs are not generated under unusual network conditions, and it has been found to suit most local area networks running Windows for Workgroups. Note the words 'should be': Citect passes this time-out value to the network protocol, but some network protocols ignore it and use their own time-out values (this seems to be the case with Microsoft's TCP/IP protocol).

If you are running Citect over a LAN which has fast response times and little peak loading traffic then you can safely reduce the time-out values. If you are running Citect over a WAN with slow bridges/routers, or on a LAN which has peaky loading, then reducing this time-out may start to cause false time-outs. When a false time-out occurs you will get the hardware error "Session Timeout" or "Session Closed" and a redundancy change over will occur. The I/O data may go into #COM for several seconds before the standby switches in. After a few more seconds the computer will re-attach to the primary server. Peak loading on your network is sometimes caused by network backup procedures. For example, backing up a computer over the network can generate a very large amount of network traffic. For backup procedures you should try to use local backup devices. Switched networks also help to reduce peak load and improve the overall network performance.

Due to the more robust nature of Windows NT networking you can reduce these time-outs below those used on a similar system running under Windows for Workgroups or Windows 95. To improve the change over time try a SendTimeout of 2000 milliseconds by setting the parameter in the CITECT.INI file:

[LAN]
SendTimeout=2000

If the network protocol ignores this value you must adjust the time-outs for the network protocol itself. See Q1949 for details on Microsoft TCP/IP. For other protocols check your network documentation.

Protocol Timeout

Each PLC protocol has unique time-out values which are documented in its online help. This time-out only affects the changeover time when an I/O device fails. When a PLC fails or the communication cable is broken, the protocol driver must wait for the protocol time-out value and the number of retries before it detects the failure. The time-out values have been set up to suit the most common Citect installations. You may be able to reduce the time-outs and retries for your particular case. You must however be very careful, as some PLCs can produce highly variable response times, causing false time-outs if the value is set too low. This can typically occur when you modify the program in the PLC, causing a very long one-off response time.

For example, to reduce the time-out to 500 ms when you are using the MODBUS protocol, set the following parameter:

[MODBUS]
Timeout=500
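
As a rough illustration of how the protocol time-out and retries combine, here is a minimal Python sketch. It assumes the driver makes one initial attempt plus each retry, and waits the full time-out every time; that formula and the function name io_device_detection_ms are assumptions for illustration, not taken from any driver's documentation. Check the online help for your protocol for the real behaviour.

```python
def io_device_detection_ms(timeout_ms: int, retries: int) -> int:
    """Estimate how long a driver waits before declaring an I/O device failed.

    Assumption (not from the driver documentation): one initial attempt
    plus `retries` further attempts, each waiting the full time-out.
    """
    if timeout_ms < 0 or retries < 0:
        raise ValueError("timeout_ms and retries must not be negative")
    return timeout_ms * (1 + retries)


if __name__ == "__main__":
    # Hypothetical values: a 500 ms time-out with 3 retries.
    print(io_device_detection_ms(500, 3), "ms before the failure is detected")
```

Under this model, halving the time-out or the number of attempts halves the detection time, which is why both values are worth reviewing together.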

LAN Timeout

The LAN Timeout parameter controls the time-out between a Citect client and the I/O servers for I/O data. This time-out only affects the changeover between I/O servers and between redundant I/O Devices. When an I/O Server or an I/O Device fails, the data attached to that device will go into #COM after this time-out. The I/O requests will then wait for the IOSERVER HeartTime before being issued to the standby I/O Server. So between the LAN Timeout and the IOSERVER HeartTime you will get #COM displayed. This is only true for Citect versions earlier than 5.0.

In versions 5.0 and later all pending I/O data requests are re-started immediately after an I/O server connection is lost or an I/O Device goes off line. In this case the switch over may be so quick that the data will never go into the #COM condition. So if you are running version 5.0 or later you don't need to adjust this value. The default for this parameter is 8000 milliseconds and in most cases you can reduce it to 2000 ms. This does depend on how fast your I/O device protocol is.

LAN Watch time

The LAN Watch time parameter controls how often Citect searches for a failed or redundant server. This time-out affects the changeover time of Citect Alarm/Trend/Report servers and of dual LANs. It does not affect the change over time for I/O Servers, as Citect communicates with all I/O Servers at the same time. The default for this parameter is 30 seconds in versions 1.x to 3.30/4.10. In version 3.40/4.20 and later the default has been changed to 2 seconds. The watch time is not exactly a time-out value, but how often Citect will watch for failure. Because of this, the effective time-out is on average half of the watch time.

When a server is lost by shutdown or failure, Citect will wait this time before searching for the failed server. When the failed server does not respond (controlled by the LAN SendTimeout parameter) Citect will again wait this watch time before searching for the redundant server. So if the network time-out is 15 seconds, Citect will detect that the server has failed after 15 seconds. Citect will then wait on average 15 seconds (half of the 30 second default in version 4.10) and then retry to connect to the failed server.
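
The arithmetic in the example above can be sketched in Python. This is a simplified model, assuming the delay is just the network time-out plus one watch period (half a period on average, a full period at worst); the function name estimate_server_changeover_s is hypothetical and real systems will add other delays.

```python
def estimate_server_changeover_s(send_timeout_s: float, watch_time_s: float) -> dict:
    """Rough delay estimate for Alarm/Trend/Report server changeover.

    Simplified model (an assumption): the failure is detected after the
    network time-out, then Citect waits for the next watch period.
    """
    detect_s = send_timeout_s            # network time-out before the failure is noticed
    average_wait_s = watch_time_s / 2.0  # on average half a watch period remains
    worst_wait_s = watch_time_s          # worst case: a full watch period remains
    return {
        "average_s": detect_s + average_wait_s,
        "worst_s": detect_s + worst_wait_s,
    }


if __name__ == "__main__":
    # Defaults described for version 4.10: 15 s time-out, 30 s watch time.
    print(estimate_server_changeover_s(15, 30))
    # Reduced values: 2 s time-out, 2 s watch time.
    print(estimate_server_changeover_s(2, 2))
```

Comparing the two calls shows why both parameters have to be reduced together: a short watch time does little if the network time-out still takes 15 seconds.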

IOSERVER HeartTime

The IOSERVER HeartTime parameter controls how often a Citect I/O server sends out heartbeats on the status of its I/O Devices. In versions before 5.0 a Citect client was required to receive a heartbeat from the I/O server before it would switch over to a standby I/O device, so reducing this time would reduce the time for change over. In version 5.0 and later this is no longer required, so you should not adjust this parameter. When you reduce this parameter the I/O server will send out heartbeats more often, which generates more network traffic and loading on the I/O server, so you should be careful not to reduce it too much. The default for this parameter is 30,000 milliseconds, and in versions before 5.0 you should be able to reduce it to 5000 ms to reduce the change over time.

Results

We recently set up a system running Citect version 5.0 under Windows NT 4.0 using the TCP/IP protocol. The registry settings documented in Q1949 were made to reduce the network time-outs in TCP/IP. With this setup we were able to get a 3 second change over time when the primary server failed. This failure was simulated by disconnecting the primary server from the network. When the connection to the I/O Device was broken (by removing the communication cable), the I/O data switched to the standby device after the protocol time-out (1 second in this case) and did not go into #COM.

So to get the minimum change over times you should check your network protocol and reduce its time-outs. Then make the following changes in the CITECT.INI file depending on your version of Citect.

Version 5.0 or later

[LAN]
SendTimeout=2000

Version 3.40/4.20

[LAN]
SendTimeout=2000

[IOSERVER]
HeartTime=5000

Versions 1.0 to 3.30/4.10

[LAN]
WatchTime=2
SendTimeout=2000
TimeOut=2000

[IOSERVER]
HeartTime=5000
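
To check an existing CITECT.INI against the values listed above, something like the following Python sketch could be used. It assumes the file is plain enough for Python's standard configparser to read, which may not hold for every real CITECT.INI. The version labels, the RECOMMENDED table layout and the find_mismatches function are hypothetical names introduced for this example; only the parameter values come from this article.

```python
import configparser

# Suggested values from this article, keyed by hypothetical version labels.
RECOMMENDED = {
    "5.0+": {"LAN": {"SendTimeout": "2000"}},
    "3.40/4.20": {
        "LAN": {"SendTimeout": "2000"},
        "IOSERVER": {"HeartTime": "5000"},
    },
    "1.0-3.30/4.10": {
        "LAN": {"WatchTime": "2", "SendTimeout": "2000", "TimeOut": "2000"},
        "IOSERVER": {"HeartTime": "5000"},
    },
}


def find_mismatches(ini_text: str, version_label: str) -> list:
    """List parameters that are missing or differ from the suggested values."""
    parser = configparser.ConfigParser(
        interpolation=None, strict=False, allow_no_value=True
    )
    parser.read_string(ini_text)
    # Section names are case sensitive in configparser, so match them by hand.
    sections = {name.upper(): name for name in parser.sections()}
    problems = []
    for section, params in RECOMMENDED[version_label].items():
        actual_section = sections.get(section)
        for key, wanted in params.items():
            actual = None
            if actual_section is not None:
                # Option names are matched case-insensitively by default.
                actual = parser.get(actual_section, key, fallback=None)
            if actual != wanted:
                problems.append(
                    "[%s] %s: found %r, suggested %r" % (section, key, actual, wanted)
                )
    return problems


if __name__ == "__main__":
    sample = "[LAN]\nSendTimeout=15000\n"
    for line in find_mismatches(sample, "3.40/4.20"):
        print(line)
```

The check only reports differences; whether a reduced value is safe still depends on the network and protocol behaviour described earlier.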
