
Best practice for high performance / throughput reading

Question asked by oysteintorget on Oct 5, 2018
Latest reply on Jan 25, 2019 by Roger Palmen

What is the best practice to achieve high performance for a single read query in AF SDK and/or PI Web API?

 

My use case is that I want to fetch large amounts of data out of PI for machine learning purposes. The machine learning models will be created in Python. At the moment I do not know which tags are useful, and for the tags that I do read I want all the data stored in PI so that I can train the models on it.

 

I have tried reading the data both with the AF SDK in C# and through the PI Web API. In both cases the number of values returned per second is lower than I expected: about 50,000 values per second for the PI Web API and about 200,000 values per second for un-cached AF SDK reads.
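
For reference, this is roughly how a values-per-second figure like the ones above can be measured. It is only a minimal sketch: `values_per_second` and `fake_read` are hypothetical names, and `fake_read` is a stand-in for whatever AF SDK or PI Web API call actually returns the values.

```python
import time
from typing import Callable, Sequence, Tuple


def values_per_second(read: Callable[[], Sequence],
                      clock: Callable[[], float] = time.perf_counter) -> Tuple[int, float]:
    """Run one read and return (number of values, values per second)."""
    start = clock()
    values = read()
    elapsed = clock() - start
    count = len(values)
    # Guard against a zero-length interval on very fast (e.g. cached) reads.
    rate = count / elapsed if elapsed > 0 else float("inf")
    return count, rate


def fake_read():
    # Hypothetical stand-in for a real recorded-values query.
    return list(range(100_000))


if __name__ == "__main__":
    count, rate = values_per_second(fake_read)
    print(f"{count} values at {rate:,.0f} values/s")
```

Timing the whole call, including deserialization on the client, matters here because that is the rate the Python side actually sees.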

 

I have several questions related to this.

  1. Do the performance numbers I am seeing look like what I can expect from a PI Data Archive hosted on a physical machine with 24 cores, 32 GB RAM, and data stored on NVMe SSDs? Or should I expect a lot more, meaning that our setup is wrong?
  2. If this performance is what I can expect, what is the recommended way to achieve better performance?
    1. Run several parallel queries for different tags?
    2. Duplicate data from the PI data archive into another storage solution like Parquet files and use that for analysis?
    3. Something else?
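
To make the first option concrete, here is a minimal sketch of what "several parallel queries for different tags" could look like on the Python side. It assumes the per-tag read is I/O-bound, so a thread pool is enough. `read_tags_in_parallel`, `read_tag`, and `fake_read_tag` are hypothetical names, and the stub stands in for a real PI Web API or AF SDK call. Whether this actually raises throughput depends on the server, which is exactly what I am asking about.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List


def read_tags_in_parallel(tags: Iterable[str],
                          read_tag: Callable[[str], List],
                          max_workers: int = 8) -> Dict[str, List]:
    """Call read_tag once per tag on a thread pool and collect results by tag."""
    tag_list = list(tags)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so zip pairs each tag
        # with its own result; an exception in any read is re-raised here.
        return dict(zip(tag_list, pool.map(read_tag, tag_list)))


def fake_read_tag(tag: str) -> List:
    # Hypothetical stand-in: returns (index, value) pairs instead of PI data.
    return [(i, float(len(tag) + i)) for i in range(3)]


if __name__ == "__main__":
    data = read_tags_in_parallel(["TAG_A", "TAG_BB"], fake_read_tag, max_workers=4)
    for tag, values in data.items():
        print(tag, len(values))
```

The `max_workers` value is a tuning knob and not a recommendation; too many concurrent queries could just as well overload the archive.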
