Note: Development and Testing purposes only. Not supported in production environments.

 

Link to other containerization articles

Containerization Hub

 

Introduction

In my previous blog post on the AF Server container health check, I talked about implementing a health check for the AF Server container. Naturally, we also have to discuss such a check for the PI Data Archive container. For an introduction to what a health check is and how you can integrate one with Docker, please refer to the previous blog post, as I won't be repeating it here.

 

In part 1, I will be covering the definition of the health tests that we can do for the PI Data Archive and then we will hook them up in the Dockerfile.

In part 2, we will do something interesting with these health-check-enabled containers: we will use another container that I wrote to notify us by email whenever their health status changes, so that we are aware when things fail.

 

Without further ado, let's jump into the definition of the health tests for the PI Data Archive container!

 

Define health tests

We will perform 2 tests. The first test checks port 5450 to determine whether any service is listening on it. The second test uses piartool to block on some essential subsystems of the PI Data Archive with a fixed timeout, so that the test fails if a subsystem does not respond within that timeout.

 

The PowerShell cmdlet Get-NetTCPConnection can accomplish the first check for us. A return value of null means that no service is listening on port 5450.

The relevant code is below.

$val = Get-NetTCPConnection -LocalPort 5450 -State Listen -ErrorAction SilentlyContinue
# $null goes on the left so the comparison also behaves correctly when several listeners are returned
if ($null -eq $val)
{
    # return 1: unhealthy - the container is not working correctly
    Write-Host "Failed: No TCP Listener found on 5450"
    exit 1
}

 

Next, piartool is a utility located in the adm folder of the PI Data Archive home directory. It has an option called "block", which waits for the specified subsystem to respond. This command is also used in the PI Data Archive start scripts to pause the script until the subsystem is available. The subsystems that we are going to check are listed below.

$SubsystemList = @(
   @("pibasess", "PI Base Subsystem"),
   @("pisnapss", "PI Snapshot Subsystem"),
   @("piarchss", "PI Archive Subsystem"),
   @("piupdmgr", "PI Update Manager")
)

 

We are going to set the amount of time that we allow for each check to 10 seconds so that we do not have to wait 1 hour for it to complete. We will also record the start and end times so that we can provide detailed logging for troubleshooting purposes. The code for this is below.

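function Block-Subsystem
{
    Param ([string]$Name, [string]$DisplayName, [int]$TimeoutSeconds = 10)

    # Record start and end times so that a failure can be traced in the health log
    $StartDate = Get-Date
    # piartool -block waits for the subsystem to respond, up to the given timeout
    $rc = Start-Process -FilePath "${env:PISERVER}\adm\piartool.exe" -ArgumentList @("-block", $Name, $TimeoutSeconds) -Wait -PassThru -NoNewWindow
    $EndDate = Get-Date

    if ($rc.ExitCode -ne 0)
    {
        # return 1: unhealthy - the block on this subsystem did not succeed
        Write-Output ("Block failed for {0} with exit code {1}, block started: {2}, block ended: {3}" -f $DisplayName, $rc.ExitCode, $StartDate, $EndDate)
        exit 1
    }
}

# Check each subsystem in turn; the first failure ends the script with exit code 1
ForEach ($Subsystem in $SubsystemList) { Block-Subsystem -Name $Subsystem[0] -DisplayName $Subsystem[1] -TimeoutSeconds 10 }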

 

Integrate into Docker

We will add this line of code to our Dockerfile to make Docker start performing health checks.

HEALTHCHECK --start-period=60s --timeout=60s --retries=1 CMD powershell .\check.ps1

 

The start period is 60 seconds, which gives the PI Data Archive time to start up and initialize before failed health checks count against it. The timeout of 60 seconds applies to the entire health check; if the check takes longer than that, it is deemed to have failed. I also set retries to 1, which means that the container is marked unhealthy as soon as a single check fails. There is no second chance!

 

Build the image

As usual, you will have to supply the PI Server 2018 installer and pilicense.dat yourself. The rest of the files can be found here.

elee3/PI-Data-Archive-container-build

 

Put all the files into the same folder and run the build.bat file.

Once your image is built, you can create a container.

docker run -h pi --name pi -e trust=%computername% pidax:18

 

Now check docker ps. The health status should be starting.

 

After about 1 minute, which is the start period we configured, run docker ps again. The health status should now be healthy.
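If you want more detail than the one-word status in docker ps, the same information can be read with docker inspect. The snippet below is only a minimal sketch: it assumes the container is named pi, as in the docker run command above, and Get-LastHealthResult is a hypothetical helper name that is not part of the build files.

```powershell
# Hypothetical helper: summarizes the JSON that Docker stores under .State.Health.
function Get-LastHealthResult {
    param([string]$HealthJson)

    $health = $HealthJson | ConvertFrom-Json
    # Docker keeps a short log of recent check runs; take the newest entry.
    $last = $health.Log | Select-Object -Last 1

    [pscustomobject]@{
        Status        = $health.Status          # starting, healthy or unhealthy
        FailingStreak = $health.FailingStreak   # number of consecutive failed checks
        ExitCode      = $last.ExitCode          # exit code returned by the check command
        Output        = $last.Output            # text written by the check command
    }
}

# Usage (assumes a container named "pi" is running):
# Get-LastHealthResult (docker inspect --format '{{json .State.Health}}' pi)
```

The Output field is where the messages written by check.ps1 end up, so it is the first place to look when the status is unhealthy.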

 

Health monitoring

Now that we have a health-check-enabled container up and running, we can start to do some wonderful things with it. If you are a PI administrator, don't you wish there was some way to keep tabs on your PI Data Archive's health, so that an email is sent to notify you when it becomes unhealthy? This way, you won't get a shock the next time you check on your PI Data Archive and realize that it has been down for a week!

 

I have written an application that can help you monitor ANY health-check-enabled container (i.e. not only the PI Data Archive container and the AF Server container, but any container that has a health check enabled) and send you an email when it becomes unhealthy. We can start the monitoring with just one simple command. Before running it, replace the following placeholders with your own values:

Name of your SMTP server: <mysmtp>

Source email: <admin@osisoft.com>

Destination email: <operator@osisoft.com>

 

docker run --rm -id -h test --name test -e smtp=<mysmtp> -e from=<admin@osisoft.com> -e to=<operator@osisoft.com> elee3/health

 

Once the application is running, we can test it by trying to break our PI Data Archive container. I will do so by stopping the PI Snapshot Subsystem since it is one of the services that is monitored by our health check. After a short while, I received an email in my inbox.

 

Let me check docker ps again.

 

The health status shown by docker ps corresponds to what the email has indicated. Notice that the email even provides us with the health logs so that we know exactly what went wrong. This is so useful. Now let me go back and start the PI Snapshot Subsystem again. The monitoring application will inform me that my container is healthy again.

 

The latest log entry at 2:30:47 PM has no output, which indicates that there are no errors. The health log normally holds the 5 most recent events.

 

With the health monitoring application in place, we can now sleep in peace and not worry about container failures that go unnoticed.
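For readers curious about how such a monitor can work in principle, here is a minimal polling sketch. It is not the code of the elee3/health image; the function names and the 30-second interval are assumptions made for illustration, and the email step is left as a comment.

```powershell
# Compare two snapshots (container name -> health status) and return what changed.
# Containers that disappeared between snapshots are not reported in this sketch.
function Get-HealthChange {
    param([hashtable]$Previous, [hashtable]$Current)

    foreach ($name in $Current.Keys) {
        if ($Previous[$name] -ne $Current[$name]) {
            [pscustomobject]@{ Name = $name; From = $Previous[$name]; To = $Current[$name] }
        }
    }
}

# Read the health status of every running container that defines a health check.
function Get-HealthSnapshot {
    $snapshot = @{}
    foreach ($name in (docker ps --format '{{.Names}}')) {
        # The template prints nothing for containers without a health check.
        $status = docker inspect --format '{{if .State.Health}}{{.State.Health.Status}}{{end}}' $name
        if ($status) { $snapshot[$name] = $status }
    }
    $snapshot
}

# Poll forever and report every change in health status.
function Watch-ContainerHealth {
    param([int]$IntervalSeconds = 30)

    $previous = @{}
    while ($true) {
        $current = Get-HealthSnapshot
        Get-HealthChange -Previous $previous -Current $current | ForEach-Object {
            Write-Host ("{0}: {1} -> {2}" -f $_.Name, $_.From, $_.To)
            # An email notification (for example with Send-MailMessage) could be sent here.
        }
        $previous = $current
        Start-Sleep -Seconds $IntervalSeconds
    }
}
```

Calling Watch-ContainerHealth on the Docker host starts the loop. Keeping the comparison in its own function means it can be tried with plain hashtables, without Docker running.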

 

Conclusion

In addition to what I have shown here, I want to mention that you can define the health tests yourself. You do not have to use the implementation that I have provided. This level of flexibility is very important since health is a subjective topic. One man's trash is another man's treasure. You might think a BMI of 25 is OK, but the official recommendation from the health hub is 23 and below. Therefore, the ability to define your own tests and thresholds will help you receive the notifications that are appropriate to your own environment. You can hook them up during docker run. Here is more information if you are interested.
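As a rough sketch of what hooking up your own test during docker run can look like, Docker accepts health check options on the command line that override the HEALTHCHECK instruction baked into the image. The script name mycheck.ps1 is hypothetical; the example assumes that you have already placed your own script in the container's working directory, and it reuses the thresholds from the Dockerfile above.

```powershell
# Assumption: mycheck.ps1 is your own script and already exists in the
# container's working directory (it is not part of the provided build files).
docker run -h pi --name pi -e trust=$env:COMPUTERNAME `
    --health-cmd "powershell .\mycheck.ps1" `
    --health-start-period 60s `
    --health-timeout 60s `
    --health-retries 1 `
    pidax:18
```

As with check.ps1, the script should exit with 0 when the container is healthy and 1 when it is not.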

 

The source code for the health monitoring application is here.

elee3/Health-Monitor