A common question we get is whether there is a way to improve PI Data Archive read performance. The are multiple ways how to do it, however it is usually necessary to determine where the current bottleneck is first. The typical bottlenecks are memory and the disk, less typical but possible is CPU. This blog post focuses on a situation where the disk is the bottleneck and you have enough of available memory on the system and want to find a solution without replacing the current disk.
When a user asks for data, the requested data needs to be loaded from archive files located on the disk into memory first. There are high chances that this or another user will request the same data many times and hitting the disk every time the data is requested would be inefficient and slow. Windows operating system addresses this problem with File System Cache (FSC), which takes care of caching files in memory. Access to data in memory is much faster than access to data on a disk. The first access to data (which hits the disk) might take significantly longer than subsequent access (which hits memory - FSC). So from performance point of view it would be nice, if data was prefetched into FSC before it is actually needed.
PI Archive subsystem automatically loads primary archive into FSC if there is enough available memory during startup. However users often ask for longer time periods of data than is covered by the primary archive subsystem. There is no out of box solution which would prefetch configurable amount of data into FSC (there is an enhancement request for this - https://feedback.osisoft.com/forums/555148-pi-server/suggestions/17475247-pi-data-archive-improve-piarchss-performance). Also once data is loaded into FSC, there is no control on how long data will stay in FSC. FSC is managed by Windows Operating System. It would be necessary to access the data frequently enough to ensure that your data stays in FSC all the time or at least during the time period when it is needed.
The PowerShell script attached to this blog post is an attempt to implement a tool which prefetches PI Data in advance and tries to keep data in FSC by accessing it periodically. The scheduling information (when a tag should be prefetched, what time range of data should be prefetched) can be specified in a simple csv file. Here is an example:
tag, starttime, endtime, hour from, hour to, days of week
sinusoid, *-30d, *, 8, 17, mon,wed,fri
sinusoidu, *-60d, *, 0, 12, tue,wed
cdt158, *-7d, *, 4, 8, all
cdt158, *-7d, *, 13, 17, all
The tag column defines the tag which should be prefetched.
The starttime/endtime columns define the start/end time of the period which should be prefetched. You can specify either absolute time in PI format (1-Jan-2018 01:00:00) or a relative time (*-30d)
The hour from/to columns define the hours of the day during which the tag should be prefetched in FSC. If you need a tag to be prefetched at two different time periods of the day, define each of the period of the day on a separate line as in example for cdt158
The days of week define the days of week during which the tag should be prefetched in FSC. Specify the days using their abbreviation (sun, mon, tue, wed, thu, fri, sat) separated by a comma or use "all" keyword to specify that the tag should be prefetched every day.
The columns needs to be defined in the same order as shown in the example above. I would recommend to use Excel to create the scheduling information and then save the file as csv. Attached is an example Excel file. Please note that the first line in the csv file is always skipped during processing (it is assumed that it contains header).
The script should be run on the PI Data Archive machine. Before running the script it is necessary to define 3 variables at the top of the PowerShell script:
$Global:archiveDisks - list of disks where PI Archives are stored
$Global:configFileName - path to the csv file with prefetch scheduling information (tags and times during which data should be prefetched)
$Global:logOutputFolder - folder where log files should be kept
The script prefetches data periodically based on the schedule information defined in the configuration csv file and based on the counter "\Memory\Long-Term Average Standby Cache Lifetime (s)". This counter shows how long data stay in FSC in average and gives an idea how often it is necessary to access the data to keep it in FSC.
Data of a tag is prefetched in smaller chunks to make sure that we don't overload archive subsystem - by default the script loads the data in ~500k event chunks for numeric tags and in 5k event chunks for string and blobg tags. The script also checks whether any of the physical disks where PI Archives are stored is busy with some other tasks. If % Idle Time for any of the disks goes below 40%, the script pauses prefetching until the disk becomes less busy. Reading physical disk performance counter value takes significant time (~1 second) in PowerShell and therefore it cannot be done very frequently - by default it is done every 5 seconds.
The script should be used to prefetch data of a small subset of tags only. Keep in mind that if you configure the script to load more data than can fit into available memory, Windows operating will start trimming the oldest data from the FSC.
This is an initial version of the script. The current version of the script cannot be run as a service. I plan to create another version which would be possible to run as a service (as in the example from Jean-François Larvoire: https://msdn.microsoft.com/en-us/magazine/mt703436.aspx). Do you have any other suggestions for improvement or found a bug? Do you have your own script/tool which prefetches data into FSC? Please share your thoughts in the discussion.