# How to filter out the 30 most compromising values out of a monthly average

Discussion created by cleroux on Jun 17, 2011
Latest reply on Jun 17, 2011 by MichaelvdV@Atos

# Use case

Because of environmental legislation, some administration will ask a company to report environmental air/water waste production monthly.
It is the responsability of the company to provide the administration with the relevant data.
To prevent any "bad" data to be part of the values sent, some administration request that a fixed number of values that affects the most the raw average be removed from the monthly average calculation.
If we set that fixed data to 30, then they will want to retrieve the time weighted monthly average of waste production without these 30 values to interfere in the average calculation.
This post-analysis requires filtering of the values that are the farthest above/below the raw average, and then recalculate the average without the filtered values.

# Background

Since no native PI calculation engine like PE or Totalizer can perform this, we will need to use PI ACE. We could do this stricly in PI ACE using .NET functions and the PI-SDK, but it would be simpler to use the PI OLEDB Provider and use it in SQL linked Server to take advantage of its ordering capabilities.

# Solution

The first step is to get the raw average of the current month up to midnight of the current day.
To do this we will use a linked server in SQL Server through the PI OLEDB Provider and use a pass-through query to be able to use PI syntax natively:

```SELECT * FROM OPENQUERY(PI_LOCAL,
'SELECT TOP 1 TAG, TIME - TIMESTEP AS STARTTIME, TIME AS ENDTIME, CAST(VALUE AS FLOAT32) AS AVERAGE
FROM PIARCHIVE..PIAVG WHERE TAG = ''cdt158''
AND TIME BETWEEN ''1'' AND ''*''
AND TIMESTEP = ''15d''
ORDER BY TIME DESC')
```

This will return the raw average we need from the begining of the month:

 tag starttime endtime average cdt158 2011-06-01 00:00:00 2011-06-16 00:00:00 231.6505

Let's extend the previous pass-through query to substract each raw value from this average, regardless if its below or above by using ABS().
We will then order the result by acending order from the distance of separation of the average:

```SELECT * FROM OPENQUERY(PI_LOCAL,
'SELECT T1.TIME, T2.AVERAGE, T1.ORIGINALVALUE,
(CASE WHEN T1.ISBADVAL = 1 THEN NULL ELSE ABS(T1.ORIGINALVALUE - T2.AVERAGE) END) AS DISTANCE
FROM (SELECT TAG, TIME, (CASE WHEN C.STATUS = 0 THEN 0 ELSE 1 END) AS ISBADVAL, CAST(C.VALUE AS FLOAT32) AS ORIGINALVALUE
FROM PIARCHIVE..PICOMP2 C) T1
INNER JOIN
(SELECT TOP 1 TAG, TIME - TIMESTEP AS STARTTIME, TIME AS ENDTIME, CAST(VALUE AS FLOAT32) AS AVERAGE
FROM PIARCHIVE..PIAVG WHERE TAG = ''cdt158''
AND TIME BETWEEN ''1'' AND ''*''
AND TIMESTEP = ''15d''
ORDER BY TIME DESC) T2
ON T1.TAG = T2.TAG AND T1.TIME BETWEEN T2.STARTTIME AND T2.ENDTIME
ORDER BY DISTANCE')
```

This will return the raw values ordered by distance ascending:

 time AVERAGE ORIGINALVALUE DISTANCE 2011-06-04 00:00:00 225.3047 NULL NULL 2011-06-09 12:02:08 225.3047 NULL NULL 2011-06-01 10:51:21 225.3047 225.2845 0.02020264 2011-06-09 15:59:57 225.3047 225.2841 0.02056885 2011-06-03 13:35:20 225.3047 225.3275 0.02279663 ... ... ... ...

For the query to perform in memory with high performance, let's output the result in a temporary table and packaged the whole thing in a Stored Procedure.
We can filter out the last 30 values of the returned result at this point in the final output.
The advantage of doing this in a Stored Procedure enables you to use this function with variables and therefore change the inputs like tag, starttime, endtime and number of values to filter:

```USE [master]
GO
/****** Object:  StoredProcedure [dbo].[sp_FilterValues]    Script Date: 06/06/2011 16:25:33 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

-- To modify the SP use the ALTER command instead
--ALTER PROCEDURE [dbo].[sp_FilterValues]
CREATE PROCEDURE [dbo].[sp_FilterValues]
@chvTagName varchar(255) = 'cdt158',
@chvStartTime varchar(255) = '1',
@chvEndTime varchar(255) = 't',
@intFilterValues integer = 30
AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;

DECLARE @Q AS VARCHAR(4)
DECLARE @nchQuery AS NVARCHAR(4000)
DECLARE @chvTemp AS VARCHAR(255)
DECLARE @nchParams AS NVARCHAR(4000)
DECLARE @intMaxRows integer
DECLARE @intMaxIDtoKeep integer
DECLARE @intLocalCurrentDay integer
DECLARE @DateLocalBeginningDay DateTime
DECLARE @chvTimeStep VARCHAR(255)

-- Initialize.
SET @Q = ''''

-- If tag name give is empty, quit the procedure.
IF @chvtagname = '' RETURN

CREATE TABLE #TMP_PIDATA (
ID                int         identity,
TIME              datetime    not null,
AVERAGE           float       null,
ORIGINALVALUE     float       null,
DISTANCE          float       null
)

--Set @chvTimeStep
SET @nchQuery = N'SELECT @CurrentDay = DAY FROM OPENQUERY(PI_LOCAL,' + @Q + 'SELECT DAY(%ENDTIME%)-1 AS DAY' + @Q + ')'
SET @nchParams = N'@CurrentDay int OUT'

-- Replace tokens from the query template...
SET @chvTemp = @Q + @Q + @chvEndTime + @Q + @Q
SET @nchQuery = REPLACE(@nchQuery, '%ENDTIME%', @chvTemp)

PRINT 'The query is... ' + @nchQuery

-- Insert statements for procedure here
EXEC sp_executesql @nchQuery, @nchParams, @intLocalCurrentDay OUT
SET @chvTimeStep = CAST(@intLocalCurrentDay AS VARCHAR) + 'd'

-- Proceed with the data extraction.
SET @nchQuery = N'INSERT INTO #TMP_PIDATA (TIME, AVERAGE, ORIGINALVALUE, DISTANCE)' +
' SELECT * FROM OPENQUERY(PI_LOCAL,' + @Q +
' SELECT T1.TIME, T2.AVERAGE, T1.ORIGINALVALUE,' +
' (CASE WHEN T1.ISBADVAL = 1 THEN NULL ELSE ABS(T1.ORIGINALVALUE - T2.AVERAGE) END) AS DISTANCE' +
' FROM (SELECT TAG, TIME, (CASE WHEN C.STATUS = 0 THEN 0 ELSE 1 END) AS ISBADVAL, CAST(C.VALUE AS FLOAT32) AS ORIGINALVALUE' +
' FROM PIARCHIVE..PICOMP2 C) T1' +
' INNER JOIN ' +
' (SELECT TOP 1 TAG, TIME - TIMESTEP AS STARTTIME, TIME AS ENDTIME, CAST(VALUE AS FLOAT32) AS AVERAGE' +
' FROM PIARCHIVE..PIAVG WHERE TAG = %TAG%' +
' AND TIME BETWEEN %STARTTIME% AND %ENDTIME%' +
' AND TIMESTEP = %TIMESTEP%' +
' ORDER BY TIME DESC) T2' +
' ON T1.TAG = T2.TAG AND T1.TIME BETWEEN T2.STARTTIME AND T2.ENDTIME' +
' ORDER BY DISTANCE ASC'
+ @Q + ')'

-- Replace tokens from the query template...
SET @chvTemp = @Q + @Q + @chvTagName + @Q + @Q
SET @nchQuery = REPLACE(@nchQuery, '%TAG%', @chvTemp)

SET @chvTemp = @Q + @Q + @chvStartTime + @Q + @Q
SET @nchQuery = REPLACE(@nchQuery, '%STARTTIME%', @chvTemp)

SET @chvTemp = @Q + @Q + @chvEndTime + @Q + @Q
SET @nchQuery = REPLACE(@nchQuery, '%ENDTIME%', @chvTemp)

SET @chvTemp = @Q + @Q + @chvTimeStep + @Q + @Q
SET @nchQuery = REPLACE(@nchQuery, '%TIMESTEP%', @chvTemp)

PRINT 'The query is... ' + @nchQuery

-- Insert statements for procedure here
EXEC sp_executesql @nchQuery

-- Calculate the max. ID to search.
SET @intMaxRows = (SELECT MAX(ID) FROM #TMP_PIDATA)
SET @intMaxIDtoKeep =  @intMaxRows - @intFilterValues

-- Return desired results without the filtered value.
SELECT TIME, AVERAGE, ORIGINALVALUE, DISTANCE FROM #TMP_PIDATA WHERE ID <= @intMaxIDtoKeep
--SELECT AVG(ORIGINALVALUE) FROM #TMP_PIDATA WHERE ID <= @intMaxIDtoKeep

END
```

You just need to call the Stored Procedure now and pass as arguments the tag, starttime, endtime and number of values to filter:

```EXEC sp_FilterValues 'cdt158','1','*', 30
```

This will return the same table than above, except with the 30 values filtered.
Now we can call this Stored Procedure in PI-ACE using ADO code to the SQL Server OLEDB Provider:

```Imports OSIsoft.PI.ACE
Imports OSIsoft.PI.ACE.MessageLevel
Imports System
Imports PISDK
Imports PITimeServer
Imports PISDKCommon

Public Class Module_1
Inherits PIACENetClassModule
Private Media_CO_1month As PIACEPoint
Private cdt158 As PIACEPoint
'
'      Tag Name/VB Variable Name Correspondence Table
' Tag Name                                VB Variable Name
' ------------------------------------------------------------
' cdt158                                  cdt158
' Media_CO_1month                         Media_CO_1month
'
Public Overrides Sub ACECalculations()

Dim myPISDK As New PISDK.PISDK
Dim MyServer As Server
Dim MyQuery As String
Dim MyPoint As PIPoint
Dim ptmTimestamp As New PITimeFormat

'This query executes the stored procedure sp_FilterValues
MyQuery = "EXEC sp_FilterValues 'cdt158','1','*', 30"
MyServer = myPISDK.Servers.DefaultServer

' create a new ADO connection using the SQL Server OLEDB Provider
MyADO.ConnectionString = "Provider=SQLOLEDB.1;Integrated Security=SSPI;Persist Security Info=True;Data Source=PIDEMOVM;Use Procedure for Prepare=1;Auto Translate=True;Packet Size=4096;Workstation ID=PIDEMOVM;Use Encryption for Data=False;Tag with column collation when possible=False"

'Prepare the execution of the query
MyCommand.CommandText = MyQuery

'Get the record set of the result of the query

'Executes the query
MyRS.Open(MyCommand)

'Initialize output PI tag
MyPoint = MyServer.PIPoints(Media_CO_1month.Tag)

Dim i As Integer
Dim MyCount As Long = 0

'Count the number of rows the query returns since a command of type 'adCmdText' does not return the RecordCount property
While Not MyRS.EOF

MyCount = MyCount + 1
MyRS.MoveNext()

End While

'go back to the first record
MyRS.MoveFirst()

'Parse through each record set
For i = 1 To MyCount

'Bypass if this record is set to NULL because of "Bad" values
If MyRS.Fields("ORIGINALVALUE").Value.ToString IsNot "" Then

'Capture the timestamp of this current record set and store it in the PIValue object
ptmTimestamp.InputString = MyRS.Fields("TIME").Value

'Capture the value of this current record set and store it in the PIValue object

'Write to the PI tag the current PIValue

End If

'Use next record set
MyRS.MoveNext()

Next

End Sub
```

This code will write to the output tag all values contained in the record sets (all rows) except the NULL values.
Now since it was filtered out of its most 'compromising' values, the output tag can be used to perform a time weighted average and be sent to the relevant administration.