Why do I get repeated values in my tag data if Compression is on?

Blog Post created by avanfosson on Dec 24, 2015

First, if you haven't seen OSIsoft: Exception and Compression Full Details - YouTube, then go watch it now because the rest of this post assumes you have seen it.


I have gotten this question a few times from customers: "I see repeated values in my archive even though I have compression turned on. I thought compression prevented repeated values from occurring. What is wrong with my compression settings?" Chances are, there isn't anything wrong with your compression settings.


There are two possible reasons you are seeing repeated values:

  1. Your compression maximum value is taking effect.
  2. The values in the archive are necessary to avoid losing information.


The CompMax value for a tag determines the maximum amount of time between data points in the archive. By default it is set to eight hours. If no events pass the compression test for eight hours, then the current snapshot will be archived as soon as the next value is received. This means that the time between values in the archive will always be less than or equal to eight hours, because the timestamp of the current snapshot must be at or before the eight-hour mark. If you are looking in your archive and seeing identical values whose timestamps are almost but not quite eight hours apart, then you are probably hitting the CompMax limit.
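
To make the CompMax rule concrete, here is a minimal Python sketch. It is not how the archive is actually implemented; it is a simplified model that assumes every incoming event fails the compression test (for example, the value never changes), so CompMax is the only thing that forces a write. The function name, the seven-minute scan rate, and the start date are all made up for illustration.

```python
from datetime import datetime, timedelta

COMP_MAX = timedelta(hours=8)  # default CompMax described above


def archive_flat_signal(events, comp_max=COMP_MAX):
    """Simplified model: every event fails the compression test,
    so only CompMax forces a value into the archive.

    events: list of (timestamp, value) tuples in time order.
    """
    archived = [events[0]]  # assumption: the first event is archived
    snapshot = events[0]    # most recent event received so far
    for event in events[1:]:
        timestamp, _ = event
        # The new event arrived more than CompMax after the last archived
        # value, so the current snapshot (not the new event) is archived.
        too_long = timestamp - archived[-1][0] > comp_max
        if too_long and snapshot != archived[-1]:
            archived.append(snapshot)
        snapshot = event
    return archived


# Hypothetical tag: a constant value of 1 every seven minutes for ~24 hours.
start = datetime(2015, 12, 24)
events = [(start + timedelta(minutes=7 * k), 1) for k in range(206)]
archived = archive_flat_signal(events)
gaps = [later[0] - earlier[0] for earlier, later in zip(archived, archived[1:])]

for timestamp, value in archived:
    print(timestamp, value)
```

With this made-up scan rate, the archived points are identical values spaced 7 hours 56 minutes apart: almost, but not quite, eight hours.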


If the spacing between repeated values is much less than your CompMax setting, then you are probably running into the second scenario. As an example, let's say you have a tag writing a value of 1 or 0 once a second, and your archive data looks like this:


And your question is, "Why is the data 1,1,0,0,1,1 instead of 1,0,1?" If we plot the first data set that has the repeated values, and the Step attribute for the tag is set to OFF, then we should see a plot that looks like this:

Note that I have altered the spacing on the x-axis to highlight the transitions between 1's and 0's. If we plot the second data set that does not have repeated values, then we should see a plot that looks like this:

Now we have lost information. The first plot shows that the value held steady for an hour, changed suddenly, held steady at the new value, and then changed suddenly back. In the second plot, because values between archived points are interpolated when Step is OFF, the value appears to change gradually over the course of an hour and then gradually change back.
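
The difference between the two plots can be checked with a few lines of Python. This is only a sketch: the timestamps (in seconds) are hypothetical stand-ins for the data sets above, and `interpolate` is a plain linear interpolation written for this example, assuming that is how values between archived points are drawn when Step is OFF.

```python
def interpolate(points, t):
    """Linearly interpolate a value at time t from (time, value) points."""
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    raise ValueError("t is outside the archived range")


# Hypothetical archive contents, times in seconds.
with_repeats = [(0, 1), (3600, 1), (3601, 0), (7200, 0), (7201, 1), (10800, 1)]
without_repeats = [(0, 1), (3601, 0), (7201, 1)]

# What was the value half an hour in?
half_hour = 1800
value_with = interpolate(with_repeats, half_hour)
value_without = interpolate(without_repeats, half_hour)

print(value_with)     # the value is still 1
print(value_without)  # roughly halfway between 1 and 0
```

Half an hour in, the archive with repeated values still reports 1, while the archive without them reports a value about halfway between 1 and 0 that the tag never actually had.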


Many users think that setting compression deviation to 0 and leaving compression on will eliminate all repeated values. This is not without cause: at 10:03 in the video mentioned at the beginning of the article, the narrator states, "[Setting] compression to on [and setting] the compression deviation to zero... means that successive identical values will not be archived." This statement is true the vast majority of the time, but not always. In our first plot above, only two values in an entire hour's worth of data were archived; the majority of repeated values were discarded. We had to keep the two values at the start and end of each horizontal line in order to show that the value did not change for the whole hour. As we saw in the second plot, discarding the end point of each horizontal section would completely alter the information stored in the archive.
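
Here is a rough Python sketch of that idea. It is not the actual compression algorithm; it assumes a simplified zero-deviation rule in which a point may be discarded only if it lies exactly on the straight line between the last archived point and the next incoming point. The function name and the one-value-per-second test signal are made up for illustration, and the final point is kept for simplicity.

```python
def compress_zero_deviation(points):
    """Drop every point that lies exactly on the line between the last
    kept point and the point that follows it (simplified model)."""
    if len(points) < 3:
        return list(points)
    kept = [points[0]]
    for current, following in zip(points[1:], points[2:]):
        (ta, va), (tc, vc), (tn, vn) = kept[-1], current, following
        # Collinearity test by cross-multiplication (exact for integers).
        on_line = (vc - va) * (tn - ta) == (vn - va) * (tc - ta)
        if not on_line:
            kept.append(current)
    kept.append(points[-1])  # simplification: always keep the final point
    return kept


# Hypothetical tag: one value per second, 1 for an hour, 0 for an hour, 1 again.
raw = (
    [(t, 1) for t in range(0, 3601)]
    + [(t, 0) for t in range(3601, 7201)]
    + [(t, 1) for t in range(7201, 10801)]
)
kept = compress_zero_deviation(raw)
print(len(raw), "raw points ->", len(kept), "archived points")
print([value for _, value in kept])
```

Out of more than ten thousand raw points, only six survive, and their values are 1, 1, 0, 0, 1, 1: the start and end of each horizontal section.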


And that is why you see repeated values in your archive even when compression deviation is set to zero.