Q3832 I Think the Trend Server is Leaking Memory

Summary:

My trend server runs for only a few hours before it stops responding. In Task Manager I can see its memory usage increasing steadily while it runs, until memory is exhausted.

Solution:

The trend system stores the trend sample data in memory before writing it to disk. This memory caching is necessary to ensure that trending can be carried out consistently, reliably and quickly. Storing the data in memory and subsequently writing it from memory to disk files are carried out in separate threads, which do not affect each other. However, if the flushing of data to disk cannot keep pace with the gathering of new data, the memory allocations will continue to grow. When that happens you will observe an increase in memory usage just as if there were a memory leak, and the server will eventually run out of resources and fail.
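The effect described above can be illustrated with a small simulation. This is only a sketch of the general gather/flush behaviour, not Citect's actual implementation; the function name and the rates used are hypothetical.

```python
def simulate_backlog(gather_rate, flush_rate, seconds):
    """Return the backlog (samples held in memory) after each second.

    gather_rate: samples collected per second (hypothetical figure)
    flush_rate:  samples the writer can flush per second (hypothetical figure)
    """
    backlog = 0
    history = []
    for _ in range(seconds):
        backlog += gather_rate               # new samples arrive in memory
        backlog -= min(backlog, flush_rate)  # the writer drains what it can
        history.append(backlog)
    return history


keeping_up = simulate_backlog(gather_rate=100, flush_rate=120, seconds=5)
falling_behind = simulate_backlog(gather_rate=100, flush_rate=80, seconds=5)
print(keeping_up)      # [0, 0, 0, 0, 0]
print(falling_behind)  # [20, 40, 60, 80, 100]
```

When the writer keeps up, the backlog returns to zero every cycle. When it falls even slightly behind, the backlog grows without limit, which looks exactly like a leak from the outside.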

The trend system will not perform as well if it has to fall back on virtual memory to store data. The faster the trend sample rate, the more data there is to store and the greater the demand on memory. Physical memory is fast, whereas virtual memory is limited by disk access speed. Fast trends (those with sub-second sample rates, expressed in milliseconds) need more memory, so it is best to ensure there is enough physical memory to sustain the throughput.

If your trend server's memory use is increasing over time, the following guide may help you to analyse the problem so that you can take steps to address it.

The analysis needs to be done on the running trend server in question. You can view memory usage in Windows Task Manager, or use Windows Performance Monitor to chart and log Citect32 private bytes. You can also use the Citect Kernel to look under the hood at the collection and flushing of the trend data, which will tell you whether you are going to have a problem.
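Once you have a handful of memory readings, a rough growth rate can help separate steady growth from normal fluctuation. The following is a minimal sketch that fits a straight line to readings entered by hand; the function name and the sample figures are hypothetical, and it does not read Performance Monitor logs directly.

```python
def growth_per_hour(samples):
    """Least-squares slope of memory usage, in bytes per hour.

    samples: list of (elapsed_seconds, private_bytes) pairs, for example
             copied by hand from a Performance Monitor log.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_b = sum(b for _, b in samples) / n
    num = sum((t - mean_t) * (b - mean_b) for t, b in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must span more than one point in time")
    return num / den * 3600  # slope is per second; convert to per hour


# Hypothetical readings taken every 30 minutes.
readings = [(0, 200_000_000), (1800, 210_000_000), (3600, 220_000_000)]
print(f"{growth_per_hour(readings) / 1_000_000:.1f} MB per hour")
```

A slope that stays clearly positive over several hours suggests the pattern described in this article; a slope near zero suggests memory use has levelled off.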

Open the Citect Kernel on the trend server and go to the trend queues page by typing "Page TrendQueues" on the Kernel Main page and pressing Enter.

Let us start with a simple example. A 1 second trend has a default cache size of 764 samples (the [Trend]CacheSize parameter), so each 1 second trend holds 12 minutes and 44 seconds of data in memory before writing to disk. If this were a 10 second trend it would be 3800 samples and 1 hour 3 minutes and 20 seconds.

On the TrendQueues page, the Trend Queue Length column shows the total number of caches waiting to be written to disk. In our example, every 12 minutes 44 seconds the trend queue length increases by the number of caches and then slowly decreases as the trend data are written to disk. As mentioned previously, the writing is done in a separate thread and so runs independently of the gathering of the trend data. On a small system like this you should not even notice this activity in the Kernel. The Max Queue Length column shows the peak value of the queue length, so in this example it should go from 0 to 1 and hopefully remain at 1. If there were a problem and the samples could not all be written to disk before the next 12 minutes 44 seconds elapsed, you would see the trend queue length, and possibly the max queue length, increment.

In an operational project we would expect to have many more trends, probably with different sample rates and therefore different trend cache sizes and flush frequencies. In this situation you need to observe the trend queue length and max queue length over a longer time to determine the true pattern. If they continue to grow over time, the memory allocations will continue to grow and you will observe a "leak" if you are monitoring memory usage.

If you observe this problem, there are various options available. These include adjusting appropriate trend parameters, but in some cases you may not obtain substantial gains this way.
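The cache arithmetic for the 1 second example, and a rough check on queue length readings, can be sketched as follows. Both helper names are hypothetical, the readings are made up for illustration, and the "rising floor" test is only a heuristic for readings noted by hand from the TrendQueues page.

```python
from datetime import timedelta


def cache_fill_time(cache_size_samples, sample_period_seconds):
    """Time for one trend cache to fill before it is queued for writing."""
    return timedelta(seconds=cache_size_samples * sample_period_seconds)


def queue_floor_rising(readings):
    """Heuristic: True if the lowest queue length seen in the second half
    of the readings is higher than the lowest seen in the first half."""
    if len(readings) < 2:
        raise ValueError("need at least two readings")
    half = len(readings) // 2
    return min(readings[half:]) > min(readings[:half])


print(cache_fill_time(764, 1))                         # 0:12:44
print(queue_floor_rising([0, 3, 1, 0, 3, 2, 0, 0]))    # False: drains to zero
print(queue_floor_rising([0, 3, 2, 4, 5, 7, 6, 9]))    # True: never recovers
```

A queue that keeps returning to zero is healthy even if it spikes; a queue whose low points keep climbing is the one to worry about.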

The following parameters can be set in your Citect.ini file and are described in Citect Help:

[Trend]BytesWrittenBeforeSleep

This parameter specifies how many bytes of trend data are written to file by the TrendWriteTask before it sleeps for the amount of time specified by the WriteWatchTime parameter. Increasing this parameter trades higher CPU usage for increased throughput. To start, I would recommend setting this parameter to 32768, which is eight times the default value of 4096 in versions up to 6.00. This should not cause an adverse CPU escalation. Check the results, then double the value on each further iteration if necessary.

[Trend]WriteWatchTime

This is the number of milliseconds the trend write task sleeps between writing [Trend]BytesWrittenBeforeSleep bytes to the trend archives. Decreasing this parameter will increase throughput at the expense of CPU usage. You will probably be able to reduce this successfully from the current default of 100 ms to about 20 ms.
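To see how these two parameters combine, the sketch below estimates an upper bound on write throughput. It assumes the write itself takes no time, so the sleep is the only cost per cycle; real throughput will be lower. The function name is hypothetical.

```python
def max_write_rate(bytes_before_sleep, write_watch_ms):
    """Upper bound on trend write throughput in bytes per second,
    assuming each write completes instantly and only the sleep costs time."""
    if write_watch_ms <= 0:
        raise ValueError("write_watch_ms must be positive")
    return bytes_before_sleep * 1000 / write_watch_ms


default_rate = max_write_rate(4096, 100)  # defaults quoted in this article
tuned_rate = max_write_rate(32768, 20)    # suggested starting values
print(default_rate)               # 40960.0
print(tuned_rate)                 # 1638400.0
print(tuned_rate / default_rate)  # 40.0
```

Under this assumption the suggested starting values raise the ceiling by a factor of 40, which is why both parameters are worth adjusting before considering hardware changes.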

[Trend]CacheSize

Determines the number of samples that can be stored in the cache of a trend, according to the range its sample period falls in. The defaults have been set to optimise performance, but by increasing the cache size you may be able to achieve more throughput by reducing overhead, at the expense of memory usage. You may be able to double or quadruple the cache size for the ranges where you have a lot of trends.

You can use trial and error with all of these parameters to ensure that your trend queues are returning to zero and that your CPU and memory usage are satisfactory. In extreme cases the resolution may also include a combination of adding physical RAM, increasing virtual memory, reducing the number of trends and sampling some trends at a slower rate.

Refer also to Q3088 about memory leaks.
