PI Server Crash - Troubleshooting Ideas

Question asked by jeff_denz on Apr 14, 2015
Latest reply on Apr 16, 2015 by Melinda Stivers

Over the last couple of years we've had several very mysterious PI Server meltdowns. We are currently running PI Server 2012, but the same issue hit us on PI Server 2010. We have contacted tech support about it a few times, but so far we haven't been able to identify a solid cause or solution.


Every few weeks the PI Server suddenly runs out of memory, even though under normal conditions it has plenty of horsepower. We typically run with about 7 GB of free memory. Then something causes the PI Archive subsystem to eat up all the available memory, usually within a minute or two. This of course leads to the system becoming unresponsive. If we can get to it quickly enough, we can shut down the PI Server gracefully and start it back up, and then we are running clean again for the next few weeks.


I can't find any suspicious errors in the PI message log or the Windows Event logs. I basically have nothing to go on except the Windows performance counters, and all they tell me is that the server ran out of memory and that piarchss.exe was the process using it all.


We do know for sure that certain batch queries using PI OLEDB can crash the PI Server, but these types of queries usually aren't something a typical end user would even know how to write. Furthermore, the PI message log doesn't show any users connecting with PI OLEDB near the time of failure.


What I'm hoping is that somebody has an idea of a good way to isolate this problem for troubleshooting.  Since the process is not crashing but rather just running very low on resources, I don't think a crash dump is the answer.  Is there any way to pinpoint exactly what would make the PI Archive subsystem use all available memory?
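
To illustrate the kind of isolation I have in mind, here is a minimal sketch (not something we run today) that scans logged memory samples for piarchss.exe and reports the first moment usage jumps sharply within a short window. It assumes the samples come from a performance counter log such as `\Process(piarchss)\Private Bytes` exported as (timestamp in seconds, bytes) pairs; the function name, the 60-second window, and the 1 GiB threshold are all hypothetical choices.

```python
from collections import deque

GIB = 1024 ** 3


def find_runaway_start(samples, window_s=60.0, growth_limit_bytes=1 * GIB):
    """Return the timestamp of the first sample whose memory use exceeds the
    lowest value seen in the preceding window_s seconds by more than
    growth_limit_bytes. Return None if no such jump occurs.

    samples: iterable of (timestamp_seconds, used_bytes), in time order.
    The defaults are illustrative assumptions, not tuned values.
    """
    window = deque()
    for ts, used in samples:
        window.append((ts, used))
        # Drop samples that are older than the look-back window.
        while ts - window[0][0] > window_s:
            window.popleft()
        lowest = min(u for _, u in window)
        if used - lowest > growth_limit_bytes:
            return ts
    return None


if __name__ == "__main__":
    # Synthetic data: flat at 2 GiB, then growing 0.5 GiB every 10 seconds.
    steady = [(i * 10, 2 * GIB) for i in range(10)]
    spike = [(100 + i * 10, 2 * GIB + (i + 1) * GIB // 2) for i in range(6)]
    print(find_runaway_start(steady + spike))
```

The timestamp this returns could then be lined up against connection entries in the PI message log, or used as the trigger point for capturing more detailed diagnostics while the growth is still in progress.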


Note: I remember reading that PI Server 2012 had better protections against runaway queries, which is one of the reasons we upgraded, but that doesn't seem to have helped in our case.


Thanks for any great troubleshooting ideas!