What is the best practice to achieve high performance for a single read query in AF SDK and/or PI Web API?
My use case is that I want to fetch large amounts of data out of PI for machine learning purposes. The machine learning models will be created in Python. At the moment I do not know which tags are useful, so for each tag that I read I want all the data stored in PI to train the models.
I have tried to read the data using both AF SDK in C# and the PI Web API, and in both cases the number of values returned per second is lower than I expected (about 50 000 values per second for PI Web API and about 200 000 values per second for AF SDK un-cached reads).
I have several questions related to this.
- Do the performance numbers I am seeing look like what I can expect for a PI Data Archive hosted on a physical machine with 24 cores, 32 GB RAM and data stored on NVMe SSD drives? Or should I expect a lot more, meaning that our setup is wrong?
- If the performance is what I can expect, what is the recommended way to achieve better performance?
  - Run several parallel queries for different tags?
  - Duplicate data from the PI Data Archive into another storage solution, such as Parquet files, and use that for analysis?
  - Something else?
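
To make the "several parallel queries" option concrete, here is a minimal Python sketch of the idea: split the full time range into windows and issue one read per (tag, window) from a thread pool. It is only an illustration under stated assumptions. `fetch_recorded_values` is a hypothetical placeholder that returns fake hourly values so the sketch runs offline; it is not an AF SDK or PI Web API call and would need to be replaced with a real read. The 30-day window and the 8 workers are arbitrary starting values, not recommendations.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timedelta


def split_time_range(start, end, chunk):
    """Split [start, end) into consecutive (window_start, window_end) pairs."""
    windows = []
    cursor = start
    while cursor < end:
        upper = min(cursor + chunk, end)
        windows.append((cursor, upper))
        cursor = upper
    return windows


def fetch_recorded_values(tag, start, end):
    """Hypothetical placeholder for one read of `tag` over [start, end).

    Replace the body with a real AF SDK or PI Web API request. Here it
    returns one fake (timestamp, value) pair per hour so the sketch runs
    without a PI server.
    """
    values = []
    ts = start
    while ts < end:
        values.append((ts, 0.0))
        ts += timedelta(hours=1)
    return values


def read_tags_in_parallel(tags, start, end, chunk=timedelta(days=30), max_workers=8):
    """Read every tag over [start, end), one task per (tag, window)."""
    tasks = [
        (tag, lo, hi)
        for tag in tags
        for lo, hi in split_time_range(start, end, chunk)
    ]
    results = {tag: [] for tag in tags}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in task order, so each tag's windows
        # are appended chronologically even though they run concurrently.
        fetched = pool.map(lambda task: fetch_recorded_values(*task), tasks)
        for (tag, _, _), values in zip(tasks, fetched):
            results[tag].extend(values)
    return results


if __name__ == "__main__":
    data = read_tags_in_parallel(
        ["tag_a", "tag_b"], datetime(2020, 1, 1), datetime(2020, 3, 1)
    )
    for tag, values in data.items():
        print(tag, len(values))
```

Whether this actually raises throughput depends on where the bottleneck is (server, network, or client-side deserialization), so the window size and worker count are things to measure rather than assume.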