Our backups are failing with the error message "backup freeze timed out while pausing archive flush threads".
The -16858 error occurs when the archive subsystem is very busy and doesn't have the resources to respond to the backup subsystem before timing out.
If this occurs infrequently, it is not really a cause for concern. If it happens regularly, you should reschedule the backup task to a time when you expect the archive to be less busy.
Thanks for your reply
Archive backups are scheduled daily at 2:15 AM (server time). I re-ran the backup at 11:53 PM and it is now running. Do we need to re-run it manually every day, or will the backups run automatically at 2:15 AM?
Kindly answer my questions.
The best time of day to run your backups will depend on the specifics of your environment. If you are regularly experiencing the -16858 error, then that would be an indication that you would want to look into why the PI Data Archive is so busy at that particular point in time and whether there is a better time to run your backups. If you are not regularly experiencing this issue then there is probably no problem with the 2:15 am time.
If the issue is occurring repeatedly, Gavin has posted some good first troubleshooting steps to gather more information.
As Luke mentioned, seeing this error indicates that the archive flush threads are likely very busy, so PI Backup Subsystem times out waiting for them to become free. To determine why the flush threads are so busy, you could capture the output of 'piartool -thread piarchss -info' at the time the error occurs. This should show the state of the flush threads and which point they are flushing.
Out of curiosity, what is the version of PI Data Archive in this case?
We frequently see "backup freeze" issues, at least once a week, in both collectives. Which parameters should we analyze to get more information on the issue? Could you please share your post on troubleshooting?
When you see the backup freeze messages, are they exactly the same as the OP's? If they specifically mention failing while waiting for the archive flush threads, then we should investigate why the archive flush threads are taking so long to finish their tasks. The easiest way to do so is to run 'piartool -thread piarchss -info' at the time PI Backup Subsystem reports these errors. This can be done manually from a command prompt or, if the backup is scheduled for the early hours of the morning, with a small script that runs the command at backup time. Since a single run may not capture the flush thread activity, it may be worthwhile to have the script loop a few times. An example of setting up this command to loop a specified number of times can be found in KB 3223OSI8. Once we have this output, we can cross-reference it with your backup freeze messages to see whether certain points are taking an inordinate amount of time to flush. Most commonly, long flush tasks are the result of writing many out-of-order events, or a large number of duplicate or string events. Since you mention that it happens at least once a week, I wonder whether it always happens at the same time each week. Perhaps there is a UFL, RDBMS, or other interface that sends data in batches and delivers a large number of events once a week?
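The looped capture described above could be sketched roughly as follows. This is not the script from KB 3223OSI8, only a minimal Python illustration. The function name `capture_thread_info`, the log file name, the iteration count, and the delay are all hypothetical choices, and it assumes `piartool` is on the PATH (or that the script is started from the folder containing it).

```python
import datetime
import subprocess
import time

# Command quoted in this thread. Assumption: piartool is reachable via PATH
# or the script is started from the folder that contains it.
PIARTOOL_CMD = ["piartool", "-thread", "piarchss", "-info"]


def capture_thread_info(cmd=PIARTOOL_CMD, iterations=5, delay_seconds=10.0,
                        out_path="piarchss_threads.log"):
    """Run cmd `iterations` times and append timestamped output to out_path.

    Returns the number of runs in which the command could be started and
    finished before the timeout. Names and defaults are illustrative only.
    """
    captured = 0
    with open(out_path, "a", encoding="utf-8") as log:
        for i in range(iterations):
            stamp = datetime.datetime.now().isoformat(timespec="seconds")
            try:
                result = subprocess.run(cmd, capture_output=True,
                                        text=True, timeout=60)
                body = result.stdout + result.stderr
                captured += 1
            except (OSError, subprocess.TimeoutExpired) as exc:
                # Record the failure instead of aborting the whole capture.
                body = f"command failed: {exc}\n"
            # The timestamp header makes it possible to line each capture up
            # with the backup freeze messages afterwards.
            log.write(f"=== {stamp} (run {i + 1}/{iterations}) ===\n{body}\n")
            log.flush()
            if i < iterations - 1:
                time.sleep(delay_seconds)
    return captured
```

Calling `capture_thread_info()` from a scheduled task that starts shortly before the 2:15 AM backup would leave a timestamped log to compare against the backup freeze messages. The iteration count and delay would need to be tuned so that the captures span the backup window.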
I highly recommend that you open a Tech Support case to investigate this, as it will make communication on resolving this issue easier.