22 Replies Latest reply on Apr 21, 2016 4:26 AM by Paurav Joshi

    maxCount restriction for streamsets call in PI Web API

    Paurav Joshi

      While retrieving a large amount of data for a PI tag, we encountered a similar error, and the solution also seems to lie in some of the tuning parameters in PI Data Archive. To get data from a PI tag, we use streams in the PI Web API call.

       

      The problem statement is that we want to fetch data from all elements below a specified root in the AF hierarchy for a duration of 30 days. For example, we have 35 turbines and each turbine has 43 attributes. We want to retrieve 30 days of data for all attributes of each turbine, which is around 15-20 million events.

       

      So we think the best way is to use stream sets, since by definition they are meant for bulk data retrieval.

      We have some doubts here:

      • According to the documentation, we can specify maxCount when calling GET streamsets/{webId}/recorded. What value of maxCount should be set to get all the events? 0 is definitely not the value; I checked that:

      maxCount

      The maximum number of values to be returned. The default is 1000. A value of '0' means that all of the events within the requested time range will be returned.

      • When fetching data via a stream in PI Web API, the maxCount value depends on ArcMaxCollect. Here, the error is:

      "Parameter 'maxCount' is greater than the maximum allowed (2000)."

      How was this limit decided? Is it based on any tuning parameter?

       

      Kindly let us know how we can retrieve this data via PI Web API.

        • Re: maxCount restriction for streamsets call in PI Web API
          Marcos Vainer Loeff

          Hi Paurav,

           

          Could you explain what exactly you are developing? Is it a web application? Why do you need to retrieve this much data? Is performance important to you?

          • Re: maxCount restriction for streamsets call in PI Web API
            bshang

            You are probably running into the upper limit for maxReturnedItemsPerCall (it "overrides" the ArcMaxCollect on the server in this case). By default, it is 150,000. The 2,000 comes from 150,000 divided by the number of attributes in the streamsets call.
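            As a rough illustration of that arithmetic, the sketch below divides the per-call item budget by the number of streams in the request. The function name and the stream count of 75 are assumptions, chosen only because 150,000 / 75 reproduces the 2,000 in the error message; the thread does not state how many attributes were in the failing call, and the rounding PI Web API actually applies is assumed to be a simple floor.

```python
def max_count_per_stream(max_returned_items_per_call: int, stream_count: int) -> int:
    """Largest per-stream maxCount that keeps the whole call within the budget.

    Assumes the budget is split evenly across streams and rounded down;
    the real server-side rounding is not described in this thread.
    """
    if stream_count <= 0:
        raise ValueError("stream_count must be positive")
    return max_returned_items_per_call // stream_count


if __name__ == "__main__":
    # 75 streams is a hypothetical count that matches the 2,000 in the error.
    print(max_count_per_stream(150_000, 75))
    # A single turbine with 43 attributes would get a larger allowance per stream.
    print(max_count_per_stream(150_000, 43))
```

            Under these assumptions, requesting fewer attributes per call leaves a larger maxCount for each attribute.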

             

            You can raise the limit via a PSE attribute, as explained here: PI Web API Max Returned Items Per Call

             

            However, IMO, it is risky to request too much data from PI Web API all at once. One reason is that it severely limits performance. Serialization/deserialization tends to be a bottleneck in most environments in which a large amount of data is passed through the network. I would break the query into chunks and optionally process the results on a separate thread as they arrive. This at least maintains some responsiveness for the client and gives you the opportunity to resume a task without starting over by checkpointing during retrieval.
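            A minimal sketch of the chunking idea, assuming the query is split along the time axis: the generator below cuts a long time range into consecutive windows, each of which could become one smaller request. The function name and the one-day and seven-day window sizes are illustrative assumptions, not values recommended anywhere in this thread.

```python
from datetime import datetime, timedelta
from typing import Iterator, Tuple


def time_windows(start: datetime, end: datetime,
                 step: timedelta) -> Iterator[Tuple[datetime, datetime]]:
    """Yield consecutive (window_start, window_end) pairs covering [start, end)."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    cursor = start
    while cursor < end:
        window_end = min(cursor + step, end)  # the last window may be shorter
        yield cursor, window_end
        cursor = window_end


if __name__ == "__main__":
    # Hypothetical 30-day range, split into one-day windows.
    for window_start, window_end in time_windows(
            datetime(2016, 3, 1), datetime(2016, 3, 31), timedelta(days=1)):
        print(window_start.isoformat(), "->", window_end.isoformat())
```

            Persisting the end of the last completed window is one simple way to checkpoint: a restarted job can call time_windows again with that timestamp as the new start. Events that fall exactly on a window boundary may be returned by two adjacent requests, so it is worth de-duplicating by timestamp when merging results.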

             

            Another issue with expensive batch queries is that they can harm the server and other clients. In PI Web API 2015+, we have set a limit for maxCount. Having some limit is a best practice. Returning a potentially unbounded number of items in a single request is a pattern that leads to denial of service, either on the server or on the client. Especially on the server side, allowing massive requests to proceed from a single consumer can degrade the service for other consumers. To that end, we designed and added the limiting scheme. That said, the upper limit of 150,000 is configurable, so that higher limits can be established in environments where callers are trusted or performance has been proven.

              • Re: maxCount restriction for streamsets call in PI Web API
                Paurav Joshi

                Thanks Barry, that was very informative.

                "I would break the query into chunks and optionally process the results on a separate thread as they arrive"

                Can you please explain this chunking in more detail?

                "However, IMO, it is risky to request too much data from PI Web API all at once."

                Actually, the client is asking for only one month of data for a turbine, which is a fairly common requirement.

                  • Re: maxCount restriction for streamsets call in PI Web API
                    gregor

                    Hello Paurav,

                     

                    You are dealing with 35 turbines with 43 attributes each and are retrieving a month of history for the ~1,500 attributes. What are you doing with this data? Are you rendering trends, or how is the data processed further?

                    If you are trending the data, it would likely make sense to query Plot Values instead of Recorded Values, because Plot Values returns far less data while still being sufficient to render a meaningful trend.

                    As a starting point, and until we have a better understanding of what you would like to accomplish, you can go turbine by turbine. Instead of retrieving data for ~1,500 attributes, you would query just 43 at once. If you do need archive values over the complete period, it may be wise to go attribute by attribute, even if this means ~1,500 round trips to the server.
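                    To make the per-turbine approach concrete, here is a hedged sketch that only builds the request URLs for one element at a time, either for recorded values or for plot values. The base URL and the WebId are placeholders, the helper name is made up for illustration, and the query parameters (startTime, endTime, maxCount, intervals) should be checked against the PI Web API help for the installed version.

```python
from urllib.parse import quote, urlencode

# Hypothetical server address; replace with the real PI Web API base URL.
BASE_URL = "https://piwebapi.example.com/piwebapi"


def streamset_url(element_web_id: str, resource: str,
                  start_time: str, end_time: str, **extra: object) -> str:
    """Build a GET streamsets/{webId}/<resource> URL for one element (turbine)."""
    if resource not in ("recorded", "plot"):
        raise ValueError("resource must be 'recorded' or 'plot'")
    params = {"startTime": start_time, "endTime": end_time, **extra}
    web_id = quote(element_web_id, safe="")
    return f"{BASE_URL}/streamsets/{web_id}/{resource}?{urlencode(params)}"


if __name__ == "__main__":
    # "TURBINE_01_WEBID" is a placeholder, not a real WebId.
    print(streamset_url("TURBINE_01_WEBID", "recorded", "*-30d", "*", maxCount=2000))
    print(streamset_url("TURBINE_01_WEBID", "plot", "*-30d", "*", intervals=600))
```

                    Looping over the 35 turbine WebIds with this helper gives 35 requests of 43 attributes each, instead of one request for ~1,500 attributes.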

                      • Re: maxCount restriction for streamsets call in PI Web API
                        Paurav Joshi

                        Hi Gregor,

                         

                        I fully understand that retrieving a month of history for the ~1,500 attributes will be an expensive query. The catch here is that the client needs archive data over the complete period.

                        "As a starting point, and until we have a better understanding of what you would like to accomplish, you can go turbine by turbine. Instead of retrieving data for ~1,500 attributes, you would query just 43 at once. If you do need archive values over the complete period, it may be wise to go attribute by attribute, even if this means ~1,500 round trips to the server."

                        Sorry, but I am a bit confused here: are you suggesting going turbine by turbine or attribute by attribute?

                        Kindly explain why it is wise to go attribute by attribute. My concern is exactly that it will take ~1,500 round trips, so how will that impact the machine on which PI Web API is installed?

                          • Re: maxCount restriction for streamsets call in PI Web API
                            gregor

                            Hello Paurav,

                             

                            What are you doing with that huge package of data as soon as you receive it? Processing will likely take time too. Can you share information about how the data will be processed?

                             

                            It wasn't my intention to confuse you. Please accept my apologies. With "may" I intended to say that it depends on the use case. I find it hard to imagine that some downstream data processing would be able to "swallow" the huge lump at once.

                            Bulk calls are great because they help avoid network latencies that would otherwise add up to a noticeable amount of time. Nevertheless, we recommend against executing expensive queries, as Barry already explained. The strategy is to request the data in smaller chunks and to process those chunks on a parallel thread, as Barry suggested. This is a wise approach for many reasons, starting with avoiding expensive queries. It will also allow you to optimize your application's performance. There is, however, a new challenge to consider: you need to find the optimal chunk size. I expect this to be somewhere between ~1,500 and 1 attribute per query, but it's also possible that one needs to execute multiple queries per attribute if there are larger periods to cover.
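                            One way to experiment with chunk size is sketched below, under the assumption that the attribute list is split into fixed-size groups and each group is fetched on a small thread pool. The helper names, the chunk size of 43, and max_workers=4 are illustrative assumptions; the fetch callable is a stand-in for whatever function actually calls PI Web API.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Split items into consecutive groups of at most `size` entries."""
    if size <= 0:
        raise ValueError("size must be positive")
    for offset in range(0, len(items), size):
        yield list(items[offset:offset + size])


def fetch_in_parallel(chunks: Sequence[List[str]],
                      fetch: Callable[[List[str]], T],
                      max_workers: int = 4) -> Dict[int, T]:
    """Run fetch(chunk) on a thread pool and return results keyed by chunk index."""
    results: Dict[int, T] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, chunk): index
                   for index, chunk in enumerate(chunks)}
        for future in as_completed(futures):
            # Results are handled as they arrive, not in submission order.
            results[futures[future]] = future.result()
    return results


if __name__ == "__main__":
    # Hypothetical attribute names: 35 turbines x 43 attributes = 1,505.
    attributes = [f"attribute_{n}" for n in range(35 * 43)]
    groups = list(chunked(attributes, 43))
    # len stands in for a real request; it just reports the chunk size.
    print(fetch_in_parallel(groups, len))
```

                            Keeping max_workers small limits how many requests hit the server at once; timing a few runs with different chunk sizes is probably the most reliable way to find a value that suits a given environment.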

                              • Re: maxCount restriction for streamsets call in PI Web API
                                Paurav Joshi

                                Hello Gregor,

                                 

                                Thanks for the reply; it has cleared up almost all of my confusion. Pardon me for not specifying how the data will be processed. As mentioned in the comment below, the data will be dumped into an Excel sheet for the customer's use.

                                "There is, however, a new challenge to consider: you need to find the optimal chunk size. I expect this to be somewhere between ~1,500 and 1 attribute per query, but it's also possible that one needs to execute multiple queries per attribute if there are larger periods to cover."

                                While finding this optimal chunk size, I think we also have to consider the __MaxThreadsPerClientQuery tuning parameter, because all these multi-threaded calls will fall into the "same request" category. Please correct me if I have misunderstood.

                          • Re: maxCount restriction for streamsets call in PI Web API
                            Marcos Vainer Loeff

                            Hi Paurav,

                             

                            Thanks for replying. I understand that you need to retrieve a month of history for the ~1,500 attributes. But do you need all this data as soon as the page loads? If so, why? Are you plotting the data in many charts? Are you making calculations? Isn’t it possible to get a summary for each turbine in order to retrieve less data? Then, once the user selects a specific turbine, the web app would retrieve one month of history for its 43 attributes.

                            Sometimes developers want to retrieve a huge amount of data as soon as their app loads so that they do not have to retrieve anything else later. This is not a good approach. You need to retrieve only what is needed to achieve better performance and user experience.

                             

                            In some situations you might want to use your own custom ASP.NET Web API with PI AF SDK. This will allow you to have more control of the server side, which means you can optimize your models, views, controllers and actions in order to reduce the amount of data transferred.

                             

                            If you are making calculations, make sure they are made on the server layer if possible. You can use PI Analysis Service (saving your results on PI Points) or write functions on the controllers/actions in case you decide to write a custom ASP.NET Web API with PI AF SDK.

                             

                            As already mentioned, if you are plotting charts, do not forget to use PlotValues so that less data is returned.

                            Finally, if your data is displayed on a long scrolling web page where the user needs to keep scrolling down to view the next chart, you can use a JavaScript library to load data only for the charts the user is currently viewing.

                             

                            Retrieving only the data that the user is navigating/exploring in your app is the key to achieving better performance and user experience.

                              • Re: maxCount restriction for streamsets call in PI Web API
                                Paurav Joshi

                                Hi Marcos,

                                 

                                Thank you. Your reply is useful for building any app, and I will definitely consider these remarks whenever I develop an application.

                                Pardon me for not explaining how the data will be consumed. The end user needs the PI archive data from the application in an Excel sheet, as they don't have direct access to PI data.

                                How do you suggest we use PI Web API here?

                                  • Re: maxCount restriction for streamsets call in PI Web API
                                    Marcos Vainer Loeff

                                    Hi Paurav,

                                     

                                    I am a bit confused now. Are you developing a web app, an Excel spreadsheet, or both?

                                     

                                    Actually, it doesn't matter, since the idea is the same. For the spreadsheet, once it is loaded, you should show the general state of each turbine. This could be snapshots or calculated data. As soon as the user selects a turbine, a new window opens, and the app makes an HTTP request against PI Web API, retrieves all data for that specific turbine, and finally loads it into the chart.

                                     

                                    PI System Explorer is a good example. It doesn't retrieve all AF objects once it starts, but just the minimum needed. As soon as you click on an element on the AF tree, it will make an AF SDK call and retrieve the data of the selected element.

                                     

                                    Let me know if this works for you!

                                      • Re: maxCount restriction for streamsets call in PI Web API
                                        Paurav Joshi

                                        Hi Marcos,

                                         

                                        Thanks for the prompt reply. I am developing both.

                                         

                                        Please find a screenshot of the application below:

                                        In this application, if you want the data for the whole farm, you just select it and you get the whole month of data in either CSV or XLS format.

                                        The approach you mentioned is great and very effective for data querying. Unfortunately, I think it cannot be used for the current problem.

                                          • Re: maxCount restriction for streamsets call in PI Web API
                                            Marcos Vainer Loeff

                                            Hi Paurav,

                                             

                                            PI Web API is not suitable for this use case. I don't know much about your software architecture, but I strongly suggest that you use AF SDK instead of PI Web API, taking advantage of the AFCache to improve performance. If you provide more information about your software architecture, we can help you further.

                                            • Re: maxCount restriction for streamsets call in PI Web API
                                              gregor

                                              Hello Paurav,

                                               

                                              Why don't the users have access to the PI System?

                                              Is this possibly a use case for PI Cloud Connect?

                                              At OSIsoft we are interested in learning about the problems people face when using our products. It's not always possible to serve everybody, partly because interests can conflict. The better we understand the use case, the higher the chance that we can offer something, if not now then maybe in the future. Making PI data available to users who don't have access to PI sounds more like working around some restriction than finding a valid solution. Where does the restriction originate? I assume there's a network that allows the files, either XLS or CSV, to be transferred. Why can't this network be used with PI Clients?

                                               

                                              Working around something usually means creating an island solution and island solutions can turn out to be very expensive with regards to operational and maintenance costs.

                                               

                                              Patrice Thivierge recently posted about How to use PI Web API with VBA - Introduction and I believe this could be a useful resource for you to make PI System Data available to users through PI Web API.

                                              • Re: maxCount restriction for streamsets call in PI Web API
                                                bshang

                                                Thanks for explaining the use case. PI Web API was designed to service relatively small queries during each request/response cycle. REST frameworks are usually designed assuming the "state" in REST is relatively small. It doesn't tend to work well when it's thought of as an FTP service (of course you can make an HTTP call to download a file, but I'm referring to thinking of the response JSON as the file).

                                                 

                                                If you need to export a large amount of data to csv via a web front-end, here is a high-level design.

                                                 

                                                1) When the user clicks "Submit", queue this task on your web server and return a "ticket" to the user. This ticket may contain a link that allows the user to monitor the progress of the csv file generation. On the monitoring page, you can add a link that's enabled once the csv file is ready. If you've created resources on Azure or AWS, it works similarly (i.e. long running requests should be asynchronous).

                                                 

                                                2) This queued task can use either AF SDK or PI Web API. The former is preferable for performance. This task will break the query down into chunks if it deems it expensive. As the results arrive from the PI System, it will write them to the csv file.

                                                 

                                                3) The task monitoring page can periodically poll the web server to interrogate the progress. You can get clever and store the batch task's state in a database so the user can pause and recover.
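                                                The three steps above can be sketched roughly as follows, assuming a single-process web server that keeps job state in memory. The class name, the state names, and the fetch_rows callable are all hypothetical; a real implementation would call AF SDK or PI Web API inside fetch_rows, write to a file rather than an in-memory buffer, and keep job state in a database as suggested in step 3.

```python
import csv
import io
import threading
import uuid
from typing import Callable, Dict, Iterable, Optional, Sequence, TextIO


class ExportJobs:
    """In-memory registry of export jobs; a real service would persist this."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._jobs: Dict[str, dict] = {}
        self._threads: Dict[str, threading.Thread] = {}

    def submit(self, chunks: Iterable[object],
               fetch_rows: Callable[[object], Iterable[Sequence[object]]],
               out: TextIO) -> str:
        """Step 1: queue the export and return a ticket the caller can poll with."""
        work = list(chunks)
        ticket = uuid.uuid4().hex
        worker = threading.Thread(target=self._run,
                                  args=(ticket, work, fetch_rows, out), daemon=True)
        with self._lock:
            self._jobs[ticket] = {"state": "queued", "done": 0, "total": len(work)}
            self._threads[ticket] = worker
        worker.start()
        return ticket

    def _run(self, ticket: str, work: list, fetch_rows: Callable, out: TextIO) -> None:
        """Step 2: fetch chunk by chunk and append each result to the CSV."""
        writer = csv.writer(out)
        self._update(ticket, state="running")
        try:
            for chunk in work:
                writer.writerows(fetch_rows(chunk))
                with self._lock:
                    self._jobs[ticket]["done"] += 1
            self._update(ticket, state="ready")
        except Exception as exc:  # record the failure so the poller can see it
            self._update(ticket, state="failed", error=str(exc))

    def _update(self, ticket: str, **fields: object) -> None:
        with self._lock:
            self._jobs[ticket].update(fields)

    def status(self, ticket: str) -> dict:
        """Step 3: snapshot of the job's progress for the monitoring page."""
        with self._lock:
            return dict(self._jobs[ticket])

    def wait(self, ticket: str, timeout: Optional[float] = None) -> None:
        """Block until the job's worker thread finishes (handy in tests)."""
        self._threads[ticket].join(timeout)


if __name__ == "__main__":
    jobs = ExportJobs()
    buffer = io.StringIO()
    # The lambda fakes one CSV row per chunk instead of querying the PI System.
    ticket = jobs.submit(["turbine_1", "turbine_2"], lambda name: [[name, 0.0]], buffer)
    jobs.wait(ticket)
    print(jobs.status(ticket))
    print(buffer.getvalue())
```

                                                The monitoring page would call status(ticket) periodically and enable the download link once the state is "ready".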

                                                 

                                                A lot of this is not trivial to design well. PI Integrator for BA may help here as it will allow you to pre-fetch/pre-format the data you want, so the web server task's job is somewhat simplified.

                                                  • Re: maxCount restriction for streamsets call in PI Web API
                                                    Paurav Joshi

                                                    "REST frameworks are usually designed assuming the "state" in REST is relatively small. It typically doesn't tend to work well when it's thought of as an FTP service (of course you can make an HTTP call to download a file, but I'm referring to thinking of the response JSON as the file.)"

                                                    That was some RESTful explanation.

                                                    "1) When the user clicks "Submit", queue this task on your web server and return a "ticket" to the user. This ticket may contain a link that allows the user to monitor the progress of the csv file generation. On the monitoring page, you can add a link that's enabled once the csv file is ready. If you've created resources on Azure or AWS, it works similarly (i.e. long running requests should be asynchronous)."

                                                    How can you be so spot on? This is similar to the design we use. Here, we notify the customer whenever the Excel sheet is available to download.

                                                    "PI Integrator for BA may help here as it will allow you to pre-fetch/pre-format the data you want, so the web server task's job is somewhat simplified."

                                                    It would be great if you could explain this in more detail, so I can understand it in the current context.