I need a magic formula for determining, at run time, the ideal way to retrieve data of varying profiles.
Let us set some background:
- PI Server can be a PI Collective, or an individual PI Server.
- PI Server hardware will vary in available resources.
- PI Server is being monitored so real time resource consumption is known. However, I don't have the per subsystem thread statistics.
- PI Server is already tuned for the basic tuning parameters: Maximum Events per Call, RPC timeout, etc. There is scope here for some advanced tuning, but OSIsoft doesn't do a good job of documenting these parameters together in one place. Why hide the tuning parameters in SMT until you type the name in exactly right?
- Data retrieval will be for anywhere from 1 to many (read: 100,000s of) PI Points.
- Data retrieval will typically be grouped by function, e.g. Snapshot, Recorded Values, Interpolated Values.
- Data retrieval requests will need to know about each other, e.g. I need to make a RecordedValues call over a 7-day period for 100,000 PI Points and make Interpolated Values calls for 50,000 PI Points. How do I slice up the requests most efficiently, and without degrading the performance of the PI Server for other connected users?
- Data density per PI Point will vary from sub-second to minutes.
- Data compression will vary, although typically compression deviation is 0.
- 15+ years of data retrieval is possible.
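To make the slicing question concrete, here is a rough sketch of the kind of planner I have in mind. It is plain Python and does not touch the PI SDK at all; `plan_slices`, `events_per_day` and `event_budget` are hypothetical names, and the per-point density estimates are assumed to have been worked out beforehand.

```python
from datetime import datetime, timedelta

SECONDS_PER_DAY = 86400.0


def plan_slices(events_per_day, start, end, event_budget):
    """Split one large request into (points, slice_start, slice_end) calls.

    events_per_day: dict of point name -> estimated events per day
                    (hypothetical, pre-calculated outside this function).
    event_budget:   rough cap on events returned by one call (an assumption,
                    not a real PI tuning parameter).
    """
    if event_budget <= 0:
        raise ValueError("event_budget must be positive")
    total_seconds = (end - start).total_seconds()
    if total_seconds <= 0 or not events_per_day:
        return []

    # Size the time window so the densest point alone fits in the budget.
    densest = max(events_per_day.values())
    window_seconds = total_seconds
    if densest > 0:
        window_seconds = min(total_seconds,
                             event_budget / densest * SECONDS_PER_DAY)
    window_seconds = max(window_seconds, 1.0)  # never plan sub-second windows
    window_days = window_seconds / SECONDS_PER_DAY

    # Greedily pack points (densest first) into batches under the budget.
    batches, current, current_events = [], [], 0.0
    ordered = sorted(events_per_day.items(), key=lambda kv: -kv[1])
    for name, density in ordered:
        estimate = density * window_days
        if current and current_events + estimate > event_budget:
            batches.append(current)
            current, current_events = [], 0.0
        current.append(name)
        current_events += estimate
    if current:
        batches.append(current)

    # Cross every point batch with every time window.
    slices = []
    step = timedelta(seconds=window_seconds)
    cursor = start
    while cursor < end:
        upper = min(cursor + step, end)
        for batch in batches:
            slices.append((batch, cursor, upper))
        cursor = upper
    return slices
```

With one per-second point, two per-minute points, a 7-day range and a budget of 172,800 events, this plans four time windows (2, 2, 2 and 1 days) and two point batches, so eight calls in total. What it does not do is the hard part: it knows nothing about the current state of the server or about other requests in flight.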
It is quite simple to craft some calls to the PI Server to make it unresponsive for a period of time.
It is possible to put some safeguards in place via tuning parameters, but even then it is possible to make the PI Server unresponsive again (sometimes unintentionally).
So how does a connected SDK application make the decision of when to make bulk calls, or parallelise, or page, or slice up large requests?
Note, I am not looking for basic recommendations here (don't point me at the "how to optimise your SDK application" webinar), I really want to know in real-time what decision my code should take given the connected member's current state. As we don't have any load balancing across a PI Collective I need to move that type of functionality into code.
I do not want to simply add a PI Collective member just for this application to connect to.
I have some thoughts around knowing the typical data density of assets (pre-calculated using the Event Count summary and stored outside of PI) to rank the likely impact of a summary call that forces piarchss to trawl through all events, then a multiplier for number of PI Points, input from the current state of resource consumption, the current PI Server configuration (e.g. number of subsystem threads), the number of connected consumers, the state of other members, ...
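As a very rough illustration of that idea, the inputs could be combined along these lines. Everything here is an assumption for discussion: the names (`ServerState`, `estimated_events`, `choose_strategy`), the way headroom is calculated, and especially the thresholds are made up rather than measured, and nothing in it calls the PI SDK.

```python
from dataclasses import dataclass


@dataclass
class ServerState:
    cpu_fraction: float       # 0.0 to 1.0, taken from existing monitoring
    archive_threads: int      # configured subsystem thread count
    connected_consumers: int  # other clients on this collective member


def estimated_events(avg_events_per_day, point_count, days):
    """Crude event estimate: typical density x number of points x time span."""
    return avg_events_per_day * point_count * days


def choose_strategy(events, state, bulk_limit=1_000_000, busy_cpu=0.8):
    """Pick 'defer', 'bulk', 'parallel' or 'sliced' for one request.

    bulk_limit and busy_cpu are placeholder thresholds that would need to be
    calibrated against the real server.
    """
    if state.cpu_fraction >= busy_cpu:
        return "defer"  # the member is already busy, so back off or try another

    # Headroom shrinks as CPU load rises and as more consumers share threads.
    headroom = ((1.0 - state.cpu_fraction) * state.archive_threads
                / max(1, state.connected_consumers))
    weighted = events / max(headroom, 1e-9)

    if weighted <= bulk_limit:
        return "bulk"
    if state.archive_threads > 1 and weighted <= 10 * bulk_limit:
        return "parallel"
    return "sliced"
```

For example, with a member at 50% CPU, 8 threads and 4 connected consumers, a one-day request for 100 per-minute points comes out as "bulk", while a 7-day request for 1,000 of them comes out as "sliced". The open question is what the real weighting should look like.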
Right on the edge of this, I'm also considering checking whether the archive file covering a query's time period is already loaded into memory. However, I am very interested in the community's viewpoint, and of course OSIsoft's.
A possible alternative is to stick all my data into a distributed in-memory database, but I don't want to do that, right?