8 Replies Latest reply on Mar 9, 2012 1:49 PM by cguimaraes

    Best way to check for differences in timeseries using the PISDK.

    cguimaraes

      We have to merge the data from 19 PI Servers to a single PI server. I'm creating a small PISDK application to validate that the data merged successfully.

       

       

       

      In another topic I already found that using the secondary interface to PIValues called IPIValues2. This seems to be the fasted way to get all the data of a specific time range into the code.

       

       

       

      I'm using something similar to this to get the values.      

       

      var v1arr1 = Array.CreateInstance(typeof(object), iValues1Count);
      var v1arr2 = Array.CreateInstance(typeof(object), iValues1Count);
      var v1arr3 = Array.CreateInstance(typeof(object), iValues1Count);
      iValues1.GetValueArrays(out v1arr1, out v1arr2, out v1arr3);
          
      var iValues2Count = values2.Count;
      
      var v2arr1 = Array.CreateInstance(typeof(object), iValues2Count);
      var v2arr2 = Array.CreateInstance(typeof(object), iValues2Count);
      var v2arr3 = Array.CreateInstance(typeof(object), iValues2Count);
      
      iValues2.GetValueArrays(out v2arr1, out v2arr2, out v2arr3);
      

       

      Now I am left with 2 x 3 arrays representing the 2 time series. One array with the timestamps, one with values, and one with the value attributes per timeserie.

       

       My question now is: what is the best possible way to compare these two time series and find any differences?

       

       

        • Re: Best way to check for differences in timeseries using the PISDK.
          Ahmad Fattahi

          (Unfortunately your graphic or code insertion doesn't show up for me. I think I get your point though.)

           

          This can be a tricky question. The reason is that different PI Servers usually have slight differences in the events they record coming from the same source. This can be due to time differences, compression settings, and other different environmental factors on various PI Servers. Therefore, you may need to adopt "softer" criteria to compare the two trends.

           

          You can create an array of intrepolated values of the two arrays at constant intervals. You can then consider a couple measures with some error tolerance. With this in hand I would check the Mean Square Error between the two arrays as well as the maximum absolute difference between the two arrays. You can compare each measure with a threshold of yours and call it a match or fail.

            • Re: Best way to check for differences in timeseries using the PISDK.

              Ahmad

              This can be a tricky question. The reason is that different PI Servers usually have slight differences in the events they record coming from the same source. This can be due to time differences, compression settings, and other different environmental factors on various PI Servers. Therefore, you may need to adopt "softer" criteria to compare the two trends.

               

              You can create an array of intrepolated values of the two arrays at constant intervals. You can then consider a couple measures with some error tolerance. With this in hand I would check the Mean Square Error between the two arrays as well as the maximum absolute difference between the two arrays. You can compare each measure with a threshold of yours and call it a match or fail.

               

               

              +1

                • Re: Best way to check for differences in timeseries using the PISDK.
                  mhamel

                  Hi Carlos,

                   

                  Definitely you can go with the PI SDK approach but writing efficient comparison algorithms in code can take time mostly consisting of debugging and it should not be your focus. Your goal is to get a successful merge project. I would use a different tool to make these comparisons easy, I would use the PI OLEDB provider and SQL Server (2005 and later) to store intermediary values from the comparison. A SQL-type database was built to make comparison of datasets of information easy in terms or quantity and quality. Scripting these queries will not take long and is easily reusable for other unit testing.

                   

                  I would like to recall that the success of your project sits more on the comparison protocol or methodology you are using then how to implement your comparison tests. As an example, if each of the 19 PI Servers has 10K points each that you would like to merge into another PI Server of 190K points, I would define my methodology of comparison with these rules:

                  • Select 1000 points from each PI Servers equally distributed in the primary archive file. It means that the record number (recno) attribute has a value covering the whole space of an archive file. This will avoid not detecting damaged archive files. These points will be used for counting the exact number of events stored for each. They will represent your quantity test lot.
                  • Select 100 points different from the quantity test lot. These points will be used for comparing the exact values from the original PI Server and the destination one. They will represent your quality test lot. You will choose pre-defined short time ranges (between a few days to weeks) to compare these values for the whole period of merge.

                  Once this is defined, you can write SQL queries against both source and destination PI Servers to compare quantity and quality aspects over all the points and all the time ranges. If your merge process went well, you should get a perfect fit with the events, otherwise you know what you should redo in your merge process. By the laws of statistics, if you take a valid sample of your population (here the events) and get identical results for samples, you could assume with minimal error that the whole set of data is identical.

                   

                  I provided these examples below. The first query sends the number of values for each pre-determined tags from the source PI Server into a local table, the second does the same but with the destination PI Server and the third return a list of events before and after.

                   

                   

                   
                  INSERT INTO #LOCALTABLE1
                  SELECT * FROM OPENQUERY(PISOURCE,
                  'SELECT TAG, COUNT(*) FROM PIARCHIVE..PICOMP2
                  WHERE TAG IN (''TAG1'', ''TAG2'', ''TAG3'')
                  AND TIME BETWEEN ''T1'' AND ''T2''
                  GROUP BY TAG')
                  
                  --...
                  
                  INSERT INTO #LOCALTABLE2
                  SELECT * FROM OPENQUERY(PIDESTINATION,
                  'SELECT TAG, COUNT(*) FROM PIARCHIVE..PICOMP2
                  WHERE TAG IN (''TAG1'', ''TAG2'', ''TAG3'')
                  AND TIME BETWEEN ''T1'' AND ''T2''
                  GROUP BY TAG')
                  
                  --...
                  
                  SELECT
                       TAG,
                       'Source',
                       T1.VALUE,
                       'Destination'
                       T2.VALUE,
                  FROM
                       LOCALTABLE1 T1
                       INNER JOIN
                       LOCALTABLE2 T2
                            ON T2.TAG = T1.TAG
                  

                   

                    • Re: Best way to check for differences in timeseries using the PISDK.
                      cguimaraes

                      Hi Folks, thanks for all the answers.

                       

                      The main problem is that I have to test at least one site merge  with 100% sure of all values, which means read and check all the values from all the tags at both sites and make sure to not miss, create or duplicate any data from any tag.

                       

                      Until now we have found, doing some sample tests using excel spreadsheets, that some values where missed and some where duplicated at destiny. All those values where we found such errors are tags of laboratory.

                  • Re: Best way to check for differences in timeseries using the PISDK.
                    cguimaraes

                    Hi Ahmad, I fixed the graphic of the code, I hope you can see now.