cleroux

How to filter out the 30 most compromising values out of a monthly average

Discussion created by cleroux on Jun 17, 2011
Latest reply on Jun 17, 2011 by MichaelvdV@Atos

Use case

Because of environmental legislation, some administration will ask a company to report environmental air/water waste production monthly.
It is the responsability of the company to provide the administration with the relevant data.
To prevent any "bad" data to be part of the values sent, some administration request that a fixed number of values that affects the most the raw average be removed from the monthly average calculation.
If we set that fixed data to 30, then they will want to retrieve the time weighted monthly average of waste production without these 30 values to interfere in the average calculation.
This post-analysis requires filtering of the values that are the farthest above/below the raw average, and then recalculate the average without the filtered values.

Background

Since no native PI calculation engine like PE or Totalizer can perform this, we will need to use PI ACE. We could do this stricly in PI ACE using .NET functions and the PI-SDK, but it would be simpler to use the PI OLEDB Provider and use it in SQL linked Server to take advantage of its ordering capabilities.

Solution

The first step is to get the raw average of the current month up to midnight of the current day.
To do this we will use a linked server in SQL Server through the PI OLEDB Provider and use a pass-through query to be able to use PI syntax natively:

 
SELECT * FROM OPENQUERY(PI_LOCAL,
 'SELECT TOP 1 TAG, TIME - TIMESTEP AS STARTTIME, TIME AS ENDTIME, CAST(VALUE AS FLOAT32) AS AVERAGE
 FROM PIARCHIVE..PIAVG WHERE TAG = ''cdt158''
 AND TIME BETWEEN ''1'' AND ''*''
 AND TIMESTEP = ''15d'' 
 ORDER BY TIME DESC')

 

 

This will return the raw average we need from the begining of the month:

tag starttime endtime average
cdt158 2011-06-01 00:00:00 2011-06-16 00:00:00 231.6505

Let's extend the previous pass-through query to substract each raw value from this average, regardless if its below or above by using ABS().
We will then order the result by acending order from the distance of separation of the average:

 
SELECT * FROM OPENQUERY(PI_LOCAL,
 'SELECT T1.TIME, T2.AVERAGE, T1.ORIGINALVALUE,
 (CASE WHEN T1.ISBADVAL = 1 THEN NULL ELSE ABS(T1.ORIGINALVALUE - T2.AVERAGE) END) AS DISTANCE
 FROM (SELECT TAG, TIME, (CASE WHEN C.STATUS = 0 THEN 0 ELSE 1 END) AS ISBADVAL, CAST(C.VALUE AS FLOAT32) AS ORIGINALVALUE
 FROM PIARCHIVE..PICOMP2 C) T1
 INNER JOIN 
 (SELECT TOP 1 TAG, TIME - TIMESTEP AS STARTTIME, TIME AS ENDTIME, CAST(VALUE AS FLOAT32) AS AVERAGE
 FROM PIARCHIVE..PIAVG WHERE TAG = ''cdt158''
 AND TIME BETWEEN ''1'' AND ''*''
 AND TIMESTEP = ''15d'' 
 ORDER BY TIME DESC) T2
 ON T1.TAG = T2.TAG AND T1.TIME BETWEEN T2.STARTTIME AND T2.ENDTIME
 ORDER BY DISTANCE')

 

 

This will return the raw values ordered by distance ascending:

time AVERAGE ORIGINALVALUE DISTANCE
2011-06-04 00:00:00 225.3047 NULL NULL
2011-06-09 12:02:08 225.3047 NULL NULL
2011-06-01 10:51:21 225.3047 225.2845 0.02020264
2011-06-09 15:59:57 225.3047 225.2841 0.02056885
2011-06-03 13:35:20 225.3047 225.3275 0.02279663
... ... ... ...

For the query to perform in memory with high performance, let's output the result in a temporary table and packaged the whole thing in a Stored Procedure.
We can filter out the last 30 values of the returned result at this point in the final output.
The advantage of doing this in a Stored Procedure enables you to use this function with variables and therefore change the inputs like tag, starttime, endtime and number of values to filter:

 
USE [master]
GO
/****** Object:  StoredProcedure [dbo].[sp_FilterValues]    Script Date: 06/06/2011 16:25:33 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

-- To modify the SP use the ALTER command instead
--ALTER PROCEDURE [dbo].[sp_FilterValues]
CREATE PROCEDURE [dbo].[sp_FilterValues]
      @chvTagName varchar(255) = 'cdt158',
      @chvStartTime varchar(255) = '1',
      @chvEndTime varchar(255) = 't',
      @intFilterValues integer = 30
AS
BEGIN
      -- SET NOCOUNT ON added to prevent extra result sets from
      -- interfering with SELECT statements.
      SET NOCOUNT ON;
      
      DECLARE @Q AS VARCHAR(4)
      DECLARE @nchQuery AS NVARCHAR(4000)
      DECLARE @chvTemp AS VARCHAR(255)
      DECLARE @nchParams AS NVARCHAR(4000)
      DECLARE @intMaxRows integer   
      DECLARE @intMaxIDtoKeep integer
      DECLARE @intLocalCurrentDay integer
      DECLARE @DateLocalBeginningDay DateTime
      DECLARE @chvTimeStep VARCHAR(255)
      
      -- Initialize.
      SET @Q = ''''  
         
      -- If tag name give is empty, quit the procedure.
      IF @chvtagname = '' RETURN
      
      CREATE TABLE #TMP_PIDATA (
            ID                int         identity,
            TIME              datetime    not null,
            AVERAGE           float       null,
            ORIGINALVALUE     float       null,
            DISTANCE          float       null
      )
      
 --Set @chvTimeStep
 SET @nchQuery = N'SELECT @CurrentDay = DAY FROM OPENQUERY(PI_LOCAL,' + @Q + 'SELECT DAY(%ENDTIME%)-1 AS DAY' + @Q + ')'
 SET @nchParams = N'@CurrentDay int OUT'

 -- Replace tokens from the query template...
 SET @chvTemp = @Q + @Q + @chvEndTime + @Q + @Q
 SET @nchQuery = REPLACE(@nchQuery, '%ENDTIME%', @chvTemp)

 PRINT 'The query is... ' + @nchQuery
 
    -- Insert statements for procedure here
    EXEC sp_executesql @nchQuery, @nchParams, @intLocalCurrentDay OUT
 SET @chvTimeStep = CAST(@intLocalCurrentDay AS VARCHAR) + 'd'

 -- Proceed with the data extraction.
 SET @nchQuery = N'INSERT INTO #TMP_PIDATA (TIME, AVERAGE, ORIGINALVALUE, DISTANCE)' + 
 ' SELECT * FROM OPENQUERY(PI_LOCAL,' + @Q +
 ' SELECT T1.TIME, T2.AVERAGE, T1.ORIGINALVALUE,' +
 ' (CASE WHEN T1.ISBADVAL = 1 THEN NULL ELSE ABS(T1.ORIGINALVALUE - T2.AVERAGE) END) AS DISTANCE' +
 ' FROM (SELECT TAG, TIME, (CASE WHEN C.STATUS = 0 THEN 0 ELSE 1 END) AS ISBADVAL, CAST(C.VALUE AS FLOAT32) AS ORIGINALVALUE' + 
 ' FROM PIARCHIVE..PICOMP2 C) T1' +
 ' INNER JOIN ' + 
 ' (SELECT TOP 1 TAG, TIME - TIMESTEP AS STARTTIME, TIME AS ENDTIME, CAST(VALUE AS FLOAT32) AS AVERAGE' +
 ' FROM PIARCHIVE..PIAVG WHERE TAG = %TAG%' + 
 ' AND TIME BETWEEN %STARTTIME% AND %ENDTIME%' +
 ' AND TIMESTEP = %TIMESTEP%' + 
 ' ORDER BY TIME DESC) T2' +
 ' ON T1.TAG = T2.TAG AND T1.TIME BETWEEN T2.STARTTIME AND T2.ENDTIME' +
 ' ORDER BY DISTANCE ASC' 
 + @Q + ')'
 
 -- Replace tokens from the query template...
 SET @chvTemp = @Q + @Q + @chvTagName + @Q + @Q
 SET @nchQuery = REPLACE(@nchQuery, '%TAG%', @chvTemp)
 
 SET @chvTemp = @Q + @Q + @chvStartTime + @Q + @Q
 SET @nchQuery = REPLACE(@nchQuery, '%STARTTIME%', @chvTemp)

 SET @chvTemp = @Q + @Q + @chvEndTime + @Q + @Q
 SET @nchQuery = REPLACE(@nchQuery, '%ENDTIME%', @chvTemp)
 
 SET @chvTemp = @Q + @Q + @chvTimeStep + @Q + @Q
 SET @nchQuery = REPLACE(@nchQuery, '%TIMESTEP%', @chvTemp)
 
 PRINT 'The query is... ' + @nchQuery
 
    -- Insert statements for procedure here
    EXEC sp_executesql @nchQuery
 
 -- Calculate the max. ID to search.
 SET @intMaxRows = (SELECT MAX(ID) FROM #TMP_PIDATA)
 SET @intMaxIDtoKeep =  @intMaxRows - @intFilterValues
 
 -- Return desired results without the filtered value.
 SELECT TIME, AVERAGE, ORIGINALVALUE, DISTANCE FROM #TMP_PIDATA WHERE ID <= @intMaxIDtoKeep
 --SELECT AVG(ORIGINALVALUE) FROM #TMP_PIDATA WHERE ID <= @intMaxIDtoKeep
 
END

 

 

You just need to call the Stored Procedure now and pass as arguments the tag, starttime, endtime and number of values to filter:

 
EXEC sp_FilterValues 'cdt158','1','*', 30 

This will return the same table than above, except with the 30 values filtered.
Now we can call this Stored Procedure in PI-ACE using ADO code to the SQL Server OLEDB Provider:

 
Imports OSIsoft.PI.ACE
Imports OSIsoft.PI.ACE.MessageLevel
Imports System
Imports System.Threading
Imports PISDK
Imports PITimeServer
Imports PISDKCommon
Imports ADODB

Public Class Module_1
    Inherits PIACENetClassModule
    Private Media_CO_1month As PIACEPoint
    Private cdt158 As PIACEPoint
    '
    '      Tag Name/VB Variable Name Correspondence Table
    ' Tag Name                                VB Variable Name
    ' ------------------------------------------------------------
 ' cdt158                                  cdt158
 ' Media_CO_1month                         Media_CO_1month
    '
    Public Overrides Sub ACECalculations()

        Dim myPISDK As New PISDK.PISDK
        Dim MyServer As Server
        Dim MyQuery As String
        Dim MyPoint As PIPoint
        Dim MyADO As ADODB.Connection
        Dim pvValueToAdd As New PIValue
        Dim ptmTimestamp As New PITimeFormat

        'This query executes the stored procedure sp_FilterValues
        MyQuery = "EXEC sp_FilterValues 'cdt158','1','*', 30"
        MyServer = myPISDK.Servers.DefaultServer

        ' create a new ADO connection using the SQL Server OLEDB Provider
        MyADO = New ADODB.Connection
        MyADO.ConnectionString = "Provider=SQLOLEDB.1;Integrated Security=SSPI;Persist Security Info=True;Data Source=PIDEMOVM;Use Procedure for Prepare=1;Auto Translate=True;Packet Size=4096;Workstation ID=PIDEMOVM;Use Encryption for Data=False;Tag with column collation when possible=False"
        MyADO.Open()

        'Prepare the execution of the query
        Dim MyCommand As ADODB.Command
        MyCommand = New ADODB.Command
        MyCommand.ActiveConnection = MyADO
        MyCommand.CommandText = MyQuery
        MyCommand.CommandType = CommandTypeEnum.adCmdText

        'Get the record set of the result of the query
        Dim MyRS As ADODB.Recordset
        MyRS = New ADODB.Recordset
        MyRS.CursorType = CursorTypeEnum.adOpenStatic
        MyRS.CursorLocation = CursorLocationEnum.adUseServer

        'Executes the query
        MyRS.Open(MyCommand)

        'Initialize output PI tag
        MyPoint = MyServer.PIPoints(Media_CO_1month.Tag)

        Dim i As Integer
        Dim MyCount As Long = 0

        'Count the number of rows the query returns since a command of type 'adCmdText' does not return the RecordCount property
        While Not MyRS.EOF

            MyCount = MyCount + 1
            MyRS.MoveNext()

        End While

        'go back to the first record
        MyRS.MoveFirst()

        'Parse through each record set
        For i = 1 To MyCount

            'Bypass if this record is set to NULL because of "Bad" values
            If MyRS.Fields("ORIGINALVALUE").Value.ToString IsNot "" Then

                'Capture the timestamp of this current record set and store it in the PIValue object
                ptmTimestamp.InputString = MyRS.Fields("TIME").Value
                pvValueToAdd.TimeStamp = ptmTimestamp

                'Capture the value of this current record set and store it in the PIValue object
                pvValueToAdd.Value = MyRS.Fields("ORIGINALVALUE").Value

                'Write to the PI tag the current PIValue
                MyPoint.Data.UpdateValue(pvValueToAdd, 0)

            End If

            'Use next record set
            MyRS.MoveNext()

        Next

    End Sub

 

 

This code will write to the output tag all values contained in the record sets (all rows) except the NULL values.
Now since it was filtered out of its most 'compromising' values, the output tag can be used to perform a time weighted average and be sent to the relevant administration.

 


 

Outcomes