12 Replies Latest reply on Jun 14, 2018 6:06 PM by jsorrick

    Extracting really huge amounts of data from PI, outside-the-box ideas?

    Roger Palmen

      Hi All,

       

      We all know there are several ways to extract large amounts of data from PI, using PI Config or a custom tool built on the PI AFSDK. (The options are basically covered here: How to write PI Data to files in XML format? )

      But even these have their limitations. I've built some tools for that, but I typically don't get near 1M events/second for a history extraction. And with millions of tags and several years of data to process, that is too slow. Setting aside the causes of that, I started thinking outside the box.

       

      We currently have a process in place where we restore the PI Archives to a separate PI Server, and extract the data from that into other files before sending them off to another platform. To maximize performance, the first idea is to remove as many components as possible from the solution, and here that would be PI itself. Why restore the archive files, and then use two applications to read those files (PI Data Historian) and write them to other files (a custom AFSDK application)?

       

      In other words: would it be possible to read the PI Archive files directly and transform the data into a different format?

        • Re: Extracting really huge amounts of data from PI, outside-the-box ideas?
          Roger Palmen

          Thinking a bit further: PIArchss and PIArtool both read the data in archive files, so they would be great candidates to do some transcoding and simply dump the contents of an archive file into a public format.

          Not sure if OSI wants to go down that road, but it could be one of the fastest ways to pull all the data out of an archive file.

          • Re: Extracting really huge amounts of data from PI, outside-the-box ideas?
            vkaufmann

            Hi Roger,

             

            What's the end goal here? You quote a data rate of 1M events/sec, which seems outrageous for any system. Where does this number come from? In my opinion, no application is going to read the archive files faster than the Archive Subsystem does, since everything there is built and optimized for that specific data flow. I don't think there is any performance to be gained by going down the route of a public archive format. The fastest reads are going to be had by getting your data into memory, which can be prohibitive for obvious reasons.

             

            --Vince

              • Re: Extracting really huge amounts of data from PI, outside-the-box ideas?
                Roger Palmen

                Hi Vincent,

                I think 1M ev/s is not an outrageous scenario.

                For a streaming read from a PI Server, we recently worked on a scenario to continuously receive all updates for >1M PI Points, and that worked fine, reaching speeds of over 20M events/minute on my laptop.

                Now history is a different beast than snapshot, I agree. But still, these numbers are not that far out. If we have 1M PI Points, 10 years of data, and expect on average 1 event per minute, that equates to roughly 5.25M events per PI Point. To extract that amount of data within, say, 1 month, you already need to read about 2M events per second from the archives.
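                The back-of-envelope arithmetic above can be checked in a few lines. This is just the estimate from the post restated as a calculation; the point counts and event rates are the post's assumptions, not measurements:

                ```python
                # Sanity check of the extraction-rate estimate above.
                # Assumptions (from the post): 1M PI Points, 10 years of history,
                # one event per point per minute on average, one month to extract it all.

                POINTS = 1_000_000
                YEARS = 10
                MINUTES_PER_YEAR = 60 * 24 * 365        # ignoring leap years

                events_per_point = YEARS * MINUTES_PER_YEAR   # ~5.26M events per point
                total_events = POINTS * events_per_point      # ~5.26 trillion events

                SECONDS_PER_MONTH = 30 * 24 * 3600            # ~2.59M seconds

                required_rate = total_events / SECONDS_PER_MONTH
                print(f"{events_per_point:,} events per point")
                print(f"{required_rate:,.0f} events/second required")
                ```

                So "2M events per second" is what the one-month target implies, before any overhead.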

                Now we all agree that the main thing we are doing here is shuffling data from A to B, and that it would be very feasible to restore a PI Server of this size from A to B within a few days at most. So that sets the stage...

                 

                I fully agree that the Archive Subsystem should be the most optimized way to extract data from the archive files. The main question then is how to configure and tune an archive server to do just that single job very well.

                Ideally I'd like some specialized tooling on the backend, but I am aware that OSI may have its reasons for not wanting to do that. But if I had access to the proprietary binary format of the archive files, I would do just that.

                 

                Now back to the more feasible scenario. I already built a tool that segments the data calls so it reads all data from the PI Points in time order, e.g. all data for one day, then the next. That allows PI and Windows to pull one specific archive file into memory and read it in the most optimized way. But there's a lot more tuning to do...
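                The segmentation idea above is independent of the SDK used. The actual AF SDK is .NET, so the following is just a minimal language-agnostic sketch in Python of the windowing logic: split the full extraction range into fixed-size windows and iterate them in time order, so consecutive bulk reads stay within one archive file before moving on. The window size and date range here are illustrative assumptions:

                ```python
                from datetime import datetime, timedelta
                from typing import Iterator, Tuple

                def time_segments(start: datetime, end: datetime,
                                  window: timedelta) -> Iterator[Tuple[datetime, datetime]]:
                    """Yield consecutive (segment_start, segment_end) windows covering [start, end)."""
                    cursor = start
                    while cursor < end:
                        seg_end = min(cursor + window, end)
                        yield cursor, seg_end
                        cursor = seg_end

                # Example: split one week into one-day windows. For each window you
                # would issue the bulk recorded-values call for all PI Points before
                # advancing, keeping reads localized to one archive file at a time.
                segments = list(time_segments(datetime(2018, 1, 1),
                                              datetime(2018, 1, 8),
                                              timedelta(days=1)))
                print(len(segments))  # 7 one-day windows
                ```

                Tuning the window size to roughly match the archive file span is the knob to experiment with.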

              • Re: Extracting really huge amounts of data from PI, outside-the-box ideas?
                jsorrick

                Not sure how you are using the data, but I started playing around with the "Filter Expression" option and adding multiple tags to the filter. The way our site "validated" performance was to pull 1-minute data (which is interpolated) and then filter the data in Excel: huge sheets, and really slow. I started using PI "filter expressions" and "calculated data" instead of doing the integration inside Excel, and it really sped things up for us.