Applies To:
  • CitectSCADA 1.00, 1.01, 1.10, 1.11, 1.20, 2.00, 2.01, 2.10, 3.00

Summary:
Question: How long does it take for a Standby Citect I/O Server to take over from a Primary I/O Server when the Primary has lost communications with the PLCs?

Solution:
The exact time depends on the type of failure and the type of PLC protocol. If the Primary I/O Server continues to run but cannot talk to the PLC, the following sequence will occur:

1). The PLC protocol must detect that the PLC communication has been lost (by returning the error 'unit offline error'). How the PLC protocol driver does this depends on the nature of the protocol. With serial protocols, Citect assumes that the PLC is offline if it gets 3 timeouts in a row. If the protocol has a timeout of 1 second and 1 retry, it will take 1 second x (1 initial attempt + 1 retry) x 3 timeouts = 6 seconds to detect the failure. You can adjust the retries and timeout to reduce this time if required; however, you should not make the timeouts so small that they cause timeout errors when the PLC is actually responding. With protocols that use special interface cards, this time is normally shorter because the cards detect failure more quickly.
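The detection-time arithmetic in step 1 can be sketched as a small helper. This is only an illustration of the formula quoted above; the function name and its parameters are hypothetical and are not Citect settings.

```python
def serial_detect_seconds(timeout_s: float, retries: int, timeouts_in_a_row: int = 3) -> float:
    """Estimate how long a serial driver takes to report 'unit offline error'.

    Each request is attempted once plus `retries` more times, and the unit
    is declared offline after `timeouts_in_a_row` failed requests.
    """
    attempts_per_request = 1 + retries
    return timeout_s * attempts_per_request * timeouts_in_a_row


# The example from the text: 1 second timeout, 1 retry, 3 timeouts in a row.
print(serial_detect_seconds(1.0, 1))  # 6.0
```

Halving the timeout or removing the retry shortens the estimate proportionally, subject to the warning above about making the timeout too small.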

2). When the protocol returns the 'unit offline error' to the I/O Server, the I/O Server will tell all its Citect clients that the PLC is now offline and that they should re-send their PLC requests to the Standby I/O Servers. The Citect clients will wait for their LAN timeout period before resending their pending I/O requests to the Standby I/O Server. The Citect clients will send any new PLC requests directly to the Standby I/O Server. The LAN timeout period is normally 8 seconds and can be adjusted by the option [LAN] Timeout. You can reduce this timeout to get quicker changeovers.

In the case of a PLC failure, therefore, the changeover time will be, say, 6 seconds from the PLC protocol driver plus 8 seconds from the Citect client (a total of 14 seconds). Note that the client timeout will most likely run partly in parallel with the driver detection, so the real time will average around 8 seconds.

If the primary I/O server computer fails, a different failure sequence will occur. Under this condition, the primary server cannot inform the Citect clients that it has failed. They will only know it has failed after the LAN send timeout period.

1). The Citect clients must detect that the Primary server has failed. When they try to send data to the I/O Server, they will get a timeout on their network connections. This timeout defaults to 15 seconds and is controlled by the option [LAN] SendTimeout. You can reduce this timeout to get quicker changeovers; however, if you make it too small, you may get extra NetBIOS timeout errors.

2). When the Citect clients detect failure of the network connection to the primary I/O Server, they will mark all I/O Devices associated with that I/O Server as offline and re-send their PLC requests to the Standby I/O Servers. The Citect clients will wait for their LAN timeout period before resending their pending I/O requests to the Standby I/O Server. The Citect clients will send any new PLC requests directly to the Standby I/O Server. The LAN timeout period is normally 8 seconds and can be adjusted by the option [LAN] Timeout. You can reduce this timeout to get quicker changeovers.

In the case of a computer failure, therefore, the changeover time will be, say, 15 seconds from the network timeout plus 8 seconds from the Citect client, giving a total of 23 seconds.

If the primary I/O server is shut down, the changeover is quicker because, during its normal shutdown procedure, it tells all connected clients that it is shutting down so that they switch across to the Standby I/O Server. The clients will only have to wait for the LAN timeout period of 8 seconds.
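The three I/O Server cases described so far can be compared side by side. The sketch below simply adds the default figures quoted above; the scenario names and the function are hypothetical, and the results are additive worst cases rather than measured values (the text notes that some waits overlap in practice).

```python
LAN_TIMEOUT_S = 8.0        # [LAN] Timeout default quoted in the text
LAN_SEND_TIMEOUT_S = 15.0  # [LAN] SendTimeout default quoted in the text


def io_changeover_worst_case(scenario: str, driver_detect_s: float = 6.0) -> float:
    """Additive worst-case changeover time, in seconds, for an I/O Server."""
    if scenario == "plc_comms_lost":
        # Driver detects the offline unit, then clients wait the LAN timeout.
        return driver_detect_s + LAN_TIMEOUT_S
    if scenario == "computer_failure":
        # Clients time out on the network send, then wait the LAN timeout.
        return LAN_SEND_TIMEOUT_S + LAN_TIMEOUT_S
    if scenario == "orderly_shutdown":
        # The server announces its shutdown, so only the LAN timeout applies.
        return LAN_TIMEOUT_S
    raise ValueError(f"unknown scenario: {scenario}")


for name in ("plc_comms_lost", "computer_failure", "orderly_shutdown"):
    print(name, io_changeover_worst_case(name))
# plc_comms_lost 14.0
# computer_failure 23.0
# orderly_shutdown 8.0
```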

Alarm, Trend and Report servers will change over only if their computer fails. This will be detected as a network timeout by the Citect clients. The sequence will be as follows:

1). The Citect clients must detect that the Primary server has failed. When they try to send data to the Server, they will get a timeout on their network connections. This timeout defaults to 15 seconds and is controlled by the option [LAN] SendTimeout. You can reduce this timeout to get quicker changeovers; however, if you make it too small, you may get extra NetBIOS timeout errors.

2). When the Citect clients detect failure of the network connection to the primary servers, they will close the network connection and try to connect to the Standby servers. The task that tries to re-connect network sessions runs every 30 seconds, so after a maximum of 30 seconds the network session will be established to the Standby server and the changeover will be complete. You can reduce this time with the option [LAN] WatchTime=seconds; however, decreasing this time will create extra overhead for the Citect client. (Note: in versions 3.40 / 4.20 and later, the default value has been reduced to 2 seconds.)

In the case of a computer failure, the changeover for the other servers will take 15 seconds from the network timeout plus 30 seconds for the re-connect time (a maximum of 45 seconds). Note that the re-connect time will normally run in parallel, so the client will most likely re-connect in about 30 seconds.
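As a rough check on these figures, the worst case for Alarm, Trend and Report servers is the send timeout plus one full re-connect interval. The helper below is a hypothetical sketch of that sum, using the defaults quoted above and the shorter 2 second WatchTime default mentioned for versions 3.40 / 4.20 and later.

```python
def server_reconnect_worst_case(send_timeout_s: float = 15.0, watch_time_s: float = 30.0) -> float:
    """Additive worst case: network send timeout plus one re-connect interval."""
    return send_timeout_s + watch_time_s


print(server_reconnect_worst_case())                  # 45.0
print(server_reconnect_worst_case(watch_time_s=2.0))  # 17.0
```

The second call suggests why the reduced WatchTime default matters: the re-connect interval stops dominating and the send timeout becomes the larger term.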

To sum up, Citect should switch across from a Primary to Standby server in less than 30 seconds. You can adjust the above parameters to reduce this time. However, these parameters have been set at these default values by the Citect development team to work correctly under the majority of configurations. Setting these parameters to incorrect values can cause problems with your system and so you should be careful when changing them.

Getting faster change over times

If you want to reduce the time taken to change over, you would modify the [LAN] WatchTime, SendTimeout and Timeout parameters. For example, if you wanted to reduce the time to around 3 seconds, set the parameters as follows:

[LAN]
WatchTime=1
SendTimeout=2000
Timeout=4000
[IOSERVER]
HeartTime=5000
! Note: do not set HeartTime in version 5 or greater
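A fragment like the one above can be sanity-checked before it is deployed. The following is a minimal sketch using Python's standard configparser; it assumes that '!' starts a comment line, as the note in the fragment suggests, and the function name and version argument are hypothetical.

```python
import configparser

FRAGMENT = """
[LAN]
WatchTime=1
SendTimeout=2000
Timeout=4000
[IOSERVER]
HeartTime=5000
! Note: do not set HeartTime in version 5 or greater
"""


def check_fragment(text: str, citect_major_version: int) -> list:
    """Return warnings for settings the article advises against."""
    # Assumption: lines starting with '!' or ';' are comments.
    parser = configparser.ConfigParser(comment_prefixes=("!", ";"))
    parser.read_string(text)
    warnings = []
    if citect_major_version >= 5 and parser.has_option("IOSERVER", "HeartTime"):
        warnings.append("[IOSERVER] HeartTime should not be set in version 5.0 or greater")
    return warnings


print(check_fragment(FRAGMENT, 3))  # []
print(check_fragment(FRAGMENT, 5))
```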

Reducing the WatchTime will only cause a very slight increase in CPU loading on client computers and extra network loading if a Citect server has failed. It is OK to set this watch time down to 1 second for fastest possible changeover.

If you have a fast network, you can reduce the SendTimeout to below 1 second. If you set this timeout too low, you will start to get NetBIOS timeout errors and you should increase it. You may need to experiment with your network to find the optimum value of this parameter. You may also have to tune the timeout values of the network protocol you are using. See Q1949 for details of the TCP/IP protocol with Windows 95 and Windows NT.

The Timeout parameter will depend on the response time of data from the I/O Server. You normally set this value to be twice as large as the average response time. So if the I/O Server can send the PLC data back in 2 seconds, you can set this to 4 seconds. If this is set too low, Citect clients will send extra requests to the I/O Server and you may get the 'no response from I/O Server' hardware error. If you get these errors, increase the timeout. If you have a fast-responding PLC and a fast network, you can decrease the timeout to get the fastest changeover time.
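The rule of thumb above (twice the average response time) can be turned into a starting value for [LAN] Timeout. This is a sketch only: the function name is hypothetical, and the conversion to milliseconds is inferred from the example fragment, where Timeout=4000 corresponds to 4 seconds.

```python
from statistics import mean


def suggest_lan_timeout_ms(response_times_s, safety_factor=2.0):
    """Return a candidate [LAN] Timeout in milliseconds.

    response_times_s: measured I/O Server response times, in seconds.
    safety_factor: multiple of the average to allow (2.0 per the rule of thumb).
    """
    if not response_times_s:
        raise ValueError("need at least one measured response time")
    return round(mean(response_times_s) * safety_factor * 1000)


# Responses averaging 2 seconds give the 4 second timeout used in the example.
print(suggest_lan_timeout_ms([1.5, 2.5, 2.0]))  # 4000
```

Treat the result as a first guess to be raised if the 'no response from I/O Server' hardware error starts to appear.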

Reducing the HeartTime parameter will increase the CPU loading on the I/O Server and cause extra packets to be sent over the network. You should avoid setting this parameter too low, as it can add a large amount of network traffic. This parameter should not be set if you are using Citect version 5.0 or greater, as it will degrade performance and has no effect on the changeover period.

If you tune these parameters correctly with a very fast network and a very fast PLC, it is possible to reduce the changeover time to around 3 seconds. See also Q1949.

