John Messinger

Analysing PI logs with Splunk

Blog post created by John Messinger on Dec 30, 2017

This past year I've been working with Splunk alongside PI. Mainly this has involved developing an integration between the two platforms so that PI System data can be used in Splunk, especially for machine learning (Splunk has an excellent Machine Learning Toolkit). As an ancillary project, I've also looked at Splunk as a means of analysing PI logs. For those of you who haven't come across Splunk, it's often used by IT departments to aggregate, store and analyse data from multiple sources and systems (sounds a bit like PI). One of Splunk's greatest strengths is what you can do with log file data.


Some types of log file data are pretty straightforward to ingest into Splunk, such as syslog, Windows Event Logs and text-based log files. The standard PI Message Log is a bit of a different beast though. Thankfully, with the help of PowerShell Tools for the PI System, it's become a little easier.


The approach to ingesting PI Message Log data into Splunk uses the PowerShell v3 Modular Input to execute a simple script based on the PowerShell Tools for the PI System.


A quick introduction

If you already have Splunk within your organisation, and have some familiarity with using it, you can skip to the next section.


Splunk is a browser-based application - almost all configuration, and all data searching and analysis, is performed through the browser. If you don't have Splunk available within your organisation, you can download the free version of Splunk, which allows you to ingest up to 500MB of data each day. This version can be used in a production environment. You can also sign up for the developer program and get a free license that will allow you to ingest up to 50GB per day, but this is only for use within a development environment.


Create the Source Type

Firstly, before configuring the modular input and writing the script, we want to define a new sourcetype in Splunk. Because the source type controls how Splunk formats incoming data, it is important to assign the correct source type to your data. That way, the indexed version of the data (the event data) looks the way you want, with appropriate timestamps and event breaks, which makes searching the data easier later on.


To create a new sourcetype, select the Source Types option from the Settings menu in Splunk:


In the Source Types page, click the New Source Type button in the top right corner of the page.


In the dialog that opens, fill in the fields with the following values:


Make sure to target 'system' as the destination app, and set the Category to 'Structured'. As PowerShell returns the timestamps of PI messages in UTC (see this blog post), I've found that setting the Time zone to UTC is the best way to handle them, as Splunk will then convert to your local timezone. The timestamp format field in the example above is specific to the way timestamps are returned by PowerShell with my regional settings. If timestamps from PowerShell are formatted differently on your system, refer to the Splunk documentation here for the specific time and date variables you will need.


At the bottom of the dialog, expand the 'Advanced' section before saving, and fill in the following fields:


Some of these settings (FIELD_NAMES and TIMESTAMP_FIELDS) will need to be manually added via the 'New Setting' option at the bottom of the field list. Once these are completed, click on the Save button.
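Since the screenshots aren't reproduced here, the sourcetype definition can be sketched as the equivalent props.conf stanza. Note this is an illustrative assumption: the sourcetype name pi_messagelog is my own placeholder, and the values elided in the screenshots are left as '...' - only the settings named in the text (TZ, FIELD_NAMES, TIMESTAMP_FIELDS) are shown.

```ini
# Hypothetical props.conf equivalent of the sourcetype created in the UI.
# Destination app: system; Category: Structured.
[pi_messagelog]
# PowerShell returns PI message timestamps in UTC
TZ = UTC
# Depends on your regional settings - see the Splunk date/time variables docs
TIME_FORMAT = ...
# Added manually via the 'New Setting' option:
FIELD_NAMES = ...
TIMESTAMP_FIELDS = ...
```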


Create a new Splunk app

To proceed further, you will need to create a new app in your Splunk instance. To do so will require admin level access within Splunk. From the main window, click on the gear icon next to the Apps label:


In the Apps window, click the 'Create app' button near the top left corner. Complete the app fields with similar values to the following:


Make note of the folder name you create, as you will need to navigate to this app folder later. Save the new app.


Create a new index

In Splunk, all ingested data is stored in one or more indexes. We will create a new index to store our PI log data. To do so, select Indexes from the Settings menu:


In the New Index dialog, give your index a meaningful name, and set the App to the name of the app you created in the previous step. All other fields can be left with default values:


Click Save to create the index.
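For reference, creating an index through the UI writes a stanza into an indexes.conf file under the app you selected. Assuming the index is named pi_logs (the name used in the searches later in this post), a minimal equivalent stanza would look like the following sketch:

```ini
# Hypothetical indexes.conf stanza, e.g. under etc/apps/PIMsgLogApp/local/
[pi_logs]
homePath   = $SPLUNK_DB/pi_logs/db
coldPath   = $SPLUNK_DB/pi_logs/colddb
thawedPath = $SPLUNK_DB/pi_logs/thaweddb
```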


Configure the data input

Before we configure the modular input, let's go ahead and create the PowerShell script to be executed. Note that you will need to be able to access the file system of your Splunk server in order to place the script where it can be executed. If you are using a Splunk Universal Forwarder, you will need access to the file system of the server running the forwarder.


The actual PowerShell script is fairly basic, and uses the PowerShell Tools for the PI System, which have been installed as part of the PI System Management Tools package since the 2015 release. The script I use is as follows:


$DataSource = "JMPI"
# Connect to the PI Data Archive using a saved connection configuration
$con = Connect-PIDataArchive -PIDataArchiveConnectionConfiguration (Get-PIDataArchiveConnectionConfiguration -Name $DataSource) -ErrorAction Stop
# Query the last 5 minutes of messages, matching the input's execution schedule
$startTime = (Get-Date).AddMinutes(-5)
$endTime = Get-Date
# Add a 'SplunkTime' calculated property holding the message timestamp as Unix epoch seconds
Get-PIMessage -Connection $con -StartTime $startTime -EndTime $endTime | Select-Object *, @{ Name = "SplunkTime"; Expression = { Get-Date -Date $_.TimeStamp -UFormat %s } }


Note that on the last line of this script, the Select-Object cmdlet includes an additional calculated property called 'SplunkTime'. The purpose of this expression is to set the Splunk event timestamp to the same value as the message log entry's TimeStamp field; otherwise, the log events in Splunk would be timestamped with the script's execution time. This corresponds to the _time field in a Splunk event, and must be passed as epoch (Unix) time.


Save this as a script file. Copy the script file to the app's bin folder on your Splunk server. In my case above, this would translate to 'C:\Program Files\Splunk\etc\apps\PIMsgLogApp\bin'. Note that on a Windows installation, the $SPLUNK_HOME environment variable mentioned in the screenshot above would actually translate to %SPLUNK_HOME%.


To create the new data input, select Data inputs from the Splunk Settings menu:


Look for the 'PowerShell v3 Modular Input' item in the list, and then click 'Add new' under the Actions column:


You will then see a screen that looks like the following:


Fill in the fields with the following values:


There are a few things to note here. Firstly, the 'Command or Script path' field must start with a period (.), and the path is defined as a Unix-style path (including the environment variable), regardless of whether the server is Windows or Linux. In the screenshot above, my script path is given as . "$SplunkHome/etc/apps/PIMsgLogApp/bin/SplunkPIMessages.ps1". Note that there is also a space between the leading period and the start of the path.


For those who aren't familiar with the cron daemon from the Unix/Linux world, the Cron schedule field defines the task execution schedule. In the example above, the schedule string configures the script to execute every 5 minutes.
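The screenshot showing the schedule isn't reproduced here, but a cron expression that fires every 5 minutes is conventionally written as follows (the five fields are minute, hour, day of month, month, and day of week):

```text
# min  hour  day-of-month  month  day-of-week
*/5    *     *             *      *
```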


Under the 'More settings' section, use the 'From list' option for Set sourcetype, and then select the sourcetype you created earlier for the PI Message Log. The Host will be the name of the PI Data Archive server that is the source of these log messages. Lastly, the Index is the name of the Splunk index where your message log data will be stored - set this to the name of the index you created in the previous step.
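Behind the scenes, these UI settings are saved as an inputs.conf stanza in the app. A sketch of the resulting stanza, assuming the names used earlier in this post (the input name PIMessageLog and the sourcetype name are my own placeholders):

```ini
# Hypothetical etc/apps/PIMsgLogApp/local/inputs.conf
[powershell://PIMessageLog]
script = . "$SplunkHome/etc/apps/PIMsgLogApp/bin/SplunkPIMessages.ps1"
schedule = */5 * * * *
sourcetype = pi_messagelog
host = JMPI
index = pi_logs
```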


That's it! You should now be able to collect PI Message Log data into your Splunk instance.


Analysing the data

Now that we have PI Message Log data in Splunk, how do we analyse it? All data in Splunk is accessed and analysed by searches:


The above search will request all events in the 'pi_logs' index for the past 15 minutes. The raw results will look something like the following:
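That search is simply the index name plus a relative time range; written out explicitly in SPL (rather than set via the time range picker), it would be:

```spl
index=pi_logs earliest=-15m
```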


These are the 'raw' events extracted from the index. Note that along the left side of the results is the list of fields that Splunk has identified - most come from our sourcetype definition, and others have been automatically extracted. These fields can be used in our search to filter results. For example, expanding the search query to index=pi_logs ID=7138 returns the following filtered events:


We could then output this in a tabular format using the search command index=pi_logs ID=7138 | table _time,Message to get the following result:


There is much more you could do with this data in Splunk, using SPL (Splunk's Search Processing Language) to generate specific reports and dashboards showing the kinds of metrics you might be interested in. Hopefully this post is enough to get you started with analysing PI Message Log data with Splunk.
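As one starting point for such a dashboard, a timechart of message volume broken down by severity could be sketched as follows - assuming Severity is one of the fields that comes through from the Get-PIMessage output (field names may differ in your sourcetype):

```spl
index=pi_logs | timechart span=1h count by Severity
```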