Does the AFErrors collection come back non-null and non-empty? If so, can you share some details about the errors it contains?
The first thing to do is add logging, which will let you investigate further. If the same set of items always returns errors, that may indicate a systematic error or a configuration issue. A retry mechanism will not help in that case.
There are at least 2 approaches here:
Approach 1 (simpler):
Inspect AFDataLossException (see the AFDataLossException Class reference).
This will be returned in the PIServerErrors collection of AFErrors. This exception is returned whenever possible data loss is detected (expired consumer, server-side overflow, exception). It provides a Start and EndTime property that the client can keep track of per PI Data Archive. Persist these error periods and have a batch job or thread process the time ranges to recover the data.
Note that the server will persist the consumer's update queue for about 10 minutes in the event of disconnection. AF SDK re-establishes the connection automatically, and if that happens before the expiration time, you should be able to recover the data automatically (i.e. no data loss exception is returned).
Approach 2 (more complex but offers more control):
1) Maintain a checkpoint timestamp per attribute and persist this state externally. When a new event comes in, move the checkpoint forward to the new time. Assume that all data prior to the checkpoint, excluding explicitly tracked error periods, has been received through the pipe successfully.
2) If AFErrors returns an error for an attribute that suggests data loss*, start an error period denoting a potential data loss range, and set the start time to the current checkpoint. Persist the error period state externally. For server-level errors, start the error period for all attributes under that server.
3) When the next event comes in, set the end time for the error period and update the checkpoint. A batch job can later make historical calls passing in the error period time intervals to recover the data.
4) When we receive a new event but can’t process it successfully (e.g. the 3rd party DB we are writing to is down), place the unprocessed event in a persisted queue to try again later, but still update the checkpoint.
5) On startup, create error periods from each persisted checkpoint to the first new value received (on a per attribute basis) so we can backfill over the period when the application was not running.
*If we only require at-least-once processing, we can be fairly broad in defining what constitutes an error period.
This assumes we are receiving in-order data. If we miss an event due to an exception, we assume the missed events are timestamped within that exception time window, so missed out-of-order events timestamped outside the window cannot be recovered easily.
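The checkpoint and error period bookkeeping in steps 1, 2, 3 and 5 could be sketched roughly as below. This is only an illustration under stated assumptions: CheckpointTracker, ErrorPeriod and the string attribute IDs are hypothetical names, not AF SDK types, and state is held in memory, so persisting it externally (and the retry queue from step 4) is left to the caller.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type: one potential data loss range for one attribute.
public sealed class ErrorPeriod
{
    public string AttributeId { get; }
    public DateTime Start { get; }
    public DateTime? End { get; private set; }

    public ErrorPeriod(string attributeId, DateTime start)
    {
        AttributeId = attributeId;
        Start = start;
    }

    public void Close(DateTime end) => End = end;
}

// Hypothetical type: tracks checkpoints and error periods per attribute.
public sealed class CheckpointTracker
{
    private readonly Dictionary<string, DateTime> _checkpoints = new Dictionary<string, DateTime>();
    private readonly Dictionary<string, ErrorPeriod> _open = new Dictionary<string, ErrorPeriod>();
    private readonly List<ErrorPeriod> _closed = new List<ErrorPeriod>();

    // Closed periods are what a batch job would later backfill.
    public IReadOnlyList<ErrorPeriod> ClosedPeriods => _closed;
    public IReadOnlyDictionary<string, DateTime> Checkpoints => _checkpoints;

    // Step 5: on startup, open an error period from each persisted checkpoint.
    public CheckpointTracker(IDictionary<string, DateTime> persisted = null)
    {
        if (persisted == null) return;
        foreach (var pair in persisted)
        {
            _checkpoints[pair.Key] = pair.Value;
            _open[pair.Key] = new ErrorPeriod(pair.Key, pair.Value);
        }
    }

    // Step 2: an error suggesting data loss starts a period at the checkpoint.
    // For a server-level error, call this for every attribute on that server.
    public void OnError(string attributeId)
    {
        if (_open.ContainsKey(attributeId)) return; // already tracking a period
        // With no checkpoint yet there is no known start time, so nothing is opened.
        if (!_checkpoints.TryGetValue(attributeId, out var checkpoint)) return;
        _open[attributeId] = new ErrorPeriod(attributeId, checkpoint);
    }

    // Steps 1 and 3: close any open period, then move the checkpoint forward.
    public void OnEvent(string attributeId, DateTime timestamp)
    {
        if (_open.TryGetValue(attributeId, out var period))
        {
            period.Close(timestamp);
            _closed.Add(period);
            _open.Remove(attributeId);
        }
        if (!_checkpoints.TryGetValue(attributeId, out var current) || timestamp > current)
        {
            _checkpoints[attributeId] = timestamp;
        }
    }
}
```

In use, the data pipe observer would call OnEvent for each received value and OnError for each attribute reported in AFErrors, then write Checkpoints and ClosedPeriods to whatever external store is chosen.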
Thanks for the reply, Barry!
In Approach 1, how can I catch this AFDataLossException? I am iterating through the errors like this:
private void writeErrorLogs(AFErrors<AFAttribute> errors)
{
    if (errors.PIServerErrors != null && errors.PIServerErrors.Count > 0)
        foreach (var item in errors.PIServerErrors)
        {
            // save to db
        }
}
Also, please confirm that AFDataLossException is only available through PIServerErrors. How would you recover data if the error is not a PISystem or PIServer error?
Thanks again for your help.
PIServerErrors is an IDictionary<PIServer, Exception>, so you can inspect the Exception for each PIServer. You can check whether an Exception is an instance of AFDataLossException using the C# is or as operators. For other errors, you would need to use Approach 2.
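As a rough sketch of that check, the loop from the question could be extended along these lines. It needs a reference to the AF SDK assembly to compile. The StartTime and EndTime property names are an assumption based on the description in Approach 1, so verify them against the AFDataLossException reference; SaveDataLossPeriod and SaveError are hypothetical placeholders for the "save to db" step.

```csharp
using System;
using System.Collections.Generic;
using OSIsoft.AF;
using OSIsoft.AF.Asset;
using OSIsoft.AF.Data;
using OSIsoft.AF.PI;

public static class DataPipeErrorLogger
{
    public static void WriteErrorLogs(AFErrors<AFAttribute> errors)
    {
        if (errors == null || errors.PIServerErrors == null) return;

        foreach (KeyValuePair<PIServer, Exception> item in errors.PIServerErrors)
        {
            // Pattern matching with "is" both tests and casts the exception.
            if (item.Value is AFDataLossException dataLoss)
            {
                // Assumption: StartTime/EndTime are the property names for the
                // data loss range; check the AF SDK reference before relying on them.
                SaveDataLossPeriod(item.Key.Name, $"{dataLoss.StartTime} to {dataLoss.EndTime}");
            }
            else
            {
                SaveError(item.Key.Name, item.Value);
            }
        }
    }

    // Hypothetical placeholder: persist the range so a batch job can backfill it.
    private static void SaveDataLossPeriod(string serverName, string range) =>
        Console.WriteLine($"Possible data loss on {serverName}: {range}");

    // Hypothetical placeholder: persist any other server-level error.
    private static void SaveError(string serverName, Exception ex) =>
        Console.WriteLine($"Error on {serverName}: {ex.Message}");
}
```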