Machine Learning Pipeline 1: Importing PI System Data into Python

Blog Post created by mfoerster Employee on Mar 7, 2019

With this blog post series I want to enable data scientists to quickly get started doing Data Science in Python, without worrying about how to get the data out of the PI System.


Specifically, I want to highlight two options to get PI System data into Python for use in data science:


  1. Writing PI System Data into a .csv file and using the .csv file as data source in Python.
  2. Directly accessing the PI System using HTTP requests in Python.


Approach 1: Extracting PI System Data into a .csv file

Please check out these 3 ways to extract PI System data into .csv files:


Extracting PI System data in C# with AFSDK:

Extracting PI System Data to file using AFSDK in .NET


Extracting PI System data in C# using PI SQL Client OLEDB

Extracting PI System Data to file using PI SQL Client OLEDB via PI SQL DAS RTQP in .NET


Extracting PI System Data in Python using PI Web API

Extracting PI System Data to file using PI Web API in Python


In each of the above approaches, all events for the requested PI Points are extracted, no matter how far apart the events are in time.

This can be undesirable, especially when using the data for time series prediction. In that case you would have to exchange the "RecordedValues" method for the "Interpolated" method to be able to define a sampling frequency:



GetInterpolated GET streams/{webId}/interpolated



AFData.InterpolatedValues Method


  • PI DataLink can also be used to create the .csv file, but the focus here is on programmatic approaches.


Reading data from a .csv file in Python

Sample .csv file:

The events are stripped of their timestamps: since the events have a fixed sampling frequency, the timestamps are redundant.
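A hypothetical file of this shape (two PI Points, one header row, no timestamp column; tag names and values are made up for illustration) might look like:

```
Temperature,Pressure
71.3,1024.5
70.9,1023.8
71.6,1025.1
```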



import numpy as np

# Load all numeric columns from the .csv file
dataset = np.loadtxt('filepath_csv', delimiter=",", skiprows=1)


skiprows=1 skips the first row of the .csv file. This is useful when the file starts with a header row containing column descriptions.

The columns of the .csv file are stored in a numpy array, which can be further used for machine learning.
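As a minimal sketch of that step (writing a tiny sample file first so the snippet is self-contained; the file name, column names, and values are hypothetical), the loaded array can then be split into feature and target columns:

```python
import numpy as np

# Write a small sample file so the snippet is self-contained
with open('data.csv', 'w') as f:
    f.write("tag1,tag2\n71.3,1.0\n70.9,0.0\n71.6,1.0\n")

# Load the .csv, skipping the header row
dataset = np.loadtxt('data.csv', delimiter=",", skiprows=1)

X = dataset[:, :-1]  # all columns except the last -> features
y = dataset[:, -1]   # last column -> target values
```

How the columns map to features and targets depends entirely on your use case; the slicing above is just one common layout.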


Approach 2: Directly accessing the PI System using HTTP requests in Python

For this approach we make use of the requests library in Python.

Requests: HTTP for Humans™ — Requests 2.21.0 documentation


The PI Web API GetInterpolated method is used to extract constantly sampled values of a desired PI Point:

GetInterpolated GET streams/{webId}/interpolated


In order to retrieve data for a certain PI Point, we need its WebID as a reference. It can be retrieved via the built-in search of PI Web API, which returns the WebID as part of the search result.
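As a sketch, the lookup can also be done programmatically via the PI Web API points controller, which returns the WebId of a PI Point addressed by its path (the host, server, and tag names below are placeholders):

```python
# Build the PI Web API URL that looks up a PI Point by its path;
# the JSON response of a GET to this URL contains a "WebId" field.
def point_lookup_url(host, pi_server, tag):
    return "https://%s/piwebapi/points?path=\\\\%s\\%s" % (host, pi_server, tag)

url = point_lookup_url("<PIWebAPI_host>", "<PIServer>", "<PIPoint_name>")
# requests.get(url, headers={...}, verify=True).json()["WebId"]
# would then yield the WebID (the HTTP call is not executed here)
```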




Using the requests library of Python and the GetInterpolated method of PI Web API, we retrieve the sampled events of the desired PI Point as a JSON HTTP response:


import requests

# GetInterpolated: events from the last 10 days, sampled at 1-hour intervals
response = requests.get(
    'https://<PIWebAPI_host>/piwebapi/streams/<webID_of_PIPoint>/interpolated'
    '?startTime=T-10d&endTime=T&Interval=1h',
    headers={"Authorization": "Basic %s" % b64Val},
    verify=True)
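The b64Val used above is the Base64-encoded "username:password" string for HTTP Basic authentication. A minimal sketch of how it can be built (the credentials are placeholders; in production, prefer Kerberos or a secrets store over hard-coded passwords):

```python
import base64

# Encode "username:password" for the HTTP Basic Authorization header
# (placeholder credentials for illustration only)
user = "myuser"
password = "mypassword"
b64Val = base64.b64encode(("%s:%s" % (user, password)).encode("ascii")).decode("ascii")

auth_header = {"Authorization": "Basic %s" % b64Val}
```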


The response is in JSON format and will look something like this:



Parsing the JSON HTTP response:

We only need the values of the events. As they are interpolated, we do not care about the quality attributes. The timestamp information is implied by the sampling interval that we specified earlier in the GetInterpolated method of PI Web API.

We assume that we have two JSON responses r1 and r2 for two different PI Points, both generated with the GetInterpolated method, with the same sampling interval and over the same time range.



import numpy as np

json1_data = r1.json()
json2_data = r2.json()

data_list_1 = list()

for j_object in json1_data["Items"]:
    value = j_object["Value"]
    # skip items whose "Value" is not a number, e.g. error entries of type dict
    if isinstance(value, float):
        data_list_1 = np.append(data_list_1, value)

data_list_2 = list()

for j_object in json2_data["Items"]:
    value = j_object["Value"]
    if isinstance(value, float):
        data_list_2 = np.append(data_list_2, value)

# Stack both 1-D lists into a 2-D array:
array_request_values = np.column_stack((data_list_1, data_list_2))


This Python code parses the JSON HTTP responses and writes the values into two separate lists, which are then stacked into a numpy array.





This numpy array can be used as input for machine learning.
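As a sketch of that final step (with random placeholder data standing in for the stacked PI Point values), a simple least-squares fit in plain numpy could look like:

```python
import numpy as np

# Placeholder for the stacked PI Point values (n_samples x 2);
# in practice this would be array_request_values from above
rng = np.random.default_rng(0)
array_request_values = rng.random((240, 2))

# Example: predict the second PI Point from the first via least squares
X = np.column_stack((array_request_values[:, 0], np.ones(240)))  # add intercept column
y = array_request_values[:, 1]
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
predictions = X @ coef
```

Any other library that accepts numpy arrays (scikit-learn, Keras, etc.) can consume the array in the same way.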


Please check out Machine Learning Pipeline 2 for an easy way to write machine learning output back to the PI System.