3 Replies Latest reply on May 24, 2018 9:04 PM by tramachandran

    Apache Spark's Structured Streaming with PI Web API Channels


      Has anyone successfully connected PI Web API Channels to Apache Spark's Structured Streaming (e.g. on Databricks)? (Structured Streaming Programming Guide - Spark 2.3.0 Documentation )

      We're thinking of using Databricks for our entire PI data ETL process, which is currently done by a Python program, but I can't figure out how to feed PI data (via PI Web API Channels) into a Databricks streaming job.

      I found that OSIsoft has published a Python client for the PI Web API (GitHub - osimloeff/PI-Web-API-Client-Python: PI Web API client library for Python generated using the Swagger specificat… ). I'm not sure whether I could leverage this library with Databricks' streaming.


      I'd much appreciate it if anyone could shed some light on this topic.

        • Re: Apache Spark's Structured Streaming with PI Web API Channels

          I am going to take a jab at this. I am not familiar with Apache Spark. The recommended method for interfacing PI System data with business intelligence tools is PI Integrator for Business Analytics.

          This is the Reference Architecture for Streaming Analytics that was presented at EMEA USERS CONFERENCE 2017 LONDON.


          I noticed this in the Spark documentation, in the "Creating streaming DataFrames and streaming Datasets" section:

          Socket source (for testing) - Reads UTF8 text data from a socket connection. The listening server socket is at the driver. Note that this should be used only for testing as this does not provide end-to-end fault-tolerance guarantees.

          File source - Reads files written in a directory as a stream of data. Supported file formats are text, csv, json, orc, parquet. See the docs of the DataStreamReader interface for a more up-to-date list, and supported options for each file format. Note that the files must be atomically placed in the given directory, which in most file systems, can be achieved by file move operations.
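          Of the two, the socket source maps most naturally onto streaming PI data: it simply reads newline-delimited UTF-8 text from a TCP connection, so anything that writes one record per line to a listening socket can feed it. Here is a minimal sketch of such a server using only the Python standard library (the host, port, and record shape are placeholder choices, not anything PI- or Spark-specific):

          ```python
          import json
          import socket
          import threading
          import time

          HOST, PORT = "127.0.0.1", 9999  # placeholder endpoint Spark would connect to

          def serve_records(records, host=HOST, port=PORT):
              """Accept one connection and write each record as one JSON line,
              which is the framing Spark's socket source expects."""
              srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
              srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
              srv.bind((host, port))
              srv.listen(1)
              conn, _ = srv.accept()
              with conn:
                  for rec in records:
                      conn.sendall((json.dumps(rec) + "\n").encode("utf-8"))
              srv.close()

          # Demo: pretend these records came from PI. Spark would instead connect with
          # spark.readStream.format("socket").option("host", HOST).option("port", PORT)
          sample = [{"tag": "sinusoid", "value": 42.0}, {"tag": "sinusoid", "value": 43.5}]
          t = threading.Thread(target=serve_records, args=(sample,))
          t.start()

          # Stand-in consumer: connect (retrying until the server is listening)
          # and read lines until the server closes the connection.
          cli = None
          for _ in range(50):
              try:
                  cli = socket.create_connection((HOST, PORT), timeout=2)
                  break
              except OSError:
                  time.sleep(0.1)
          lines = cli.makefile("r", encoding="utf-8").read().splitlines()
          cli.close()
          t.join()
          print(lines)
          ```

          On the Spark side the consumer would be `spark.readStream.format("socket")` pointed at the same host and port; the demo client above just stands in for it to show the wire format.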


          Having said the above, if you are planning to leverage PI Web API, here is what I would try.

          You can use the Python client library to set up a web socket to receive PI System data. This data (in a chosen format) can then be written to the desired TCP socket and port to be consumed by Spark.

          Python3 Socket Programming
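          To sketch the relay idea: a PI Web API channel delivers JSON messages carrying per-stream value updates, and flattening each message into one JSON line per value gives Spark's socket source something easy to parse. The field names below (Items, Name, Timestamp, Value, Good) reflect the typical channel payload shape but should be verified against your PI Web API version, and the websocket receive loop is shown only as a comment since it needs a live endpoint:

          ```python
          import json

          def flatten_channel_message(msg: str):
              """Turn one PI Web API channel message into flat records, one per
              value update. The payload shape assumed here (Items -> Name/Items ->
              Timestamp/Value/Good) is typical but should be checked against what
              your endpoint actually sends."""
              doc = json.loads(msg)
              records = []
              for stream in doc.get("Items", []):
                  for item in stream.get("Items", []):
                      records.append({
                          "tag": stream.get("Name"),
                          "timestamp": item.get("Timestamp"),
                          "value": item.get("Value"),
                          "good": item.get("Good", True),
                      })
              return records

          # In the real relay, `msg` would arrive from the channel websocket
          # (wss://<server>/piwebapi/...), and each flattened record would be
          # written as one JSON line to the TCP socket Spark reads from.
          sample_msg = json.dumps({
              "Items": [{
                  "Name": "SINUSOID",
                  "Items": [
                      {"Timestamp": "2018-05-24T12:00:00Z", "Value": 42.0, "Good": True},
                      {"Timestamp": "2018-05-24T12:00:05Z", "Value": 43.5, "Good": True},
                  ],
              }]
          })
          for rec in flatten_channel_message(sample_msg):
              print(json.dumps(rec))
          ```

          One JSON record per line keeps the Spark side trivial: `from_json` over the socket source's `value` column recovers the fields without any custom parsing.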


          Again, this is just a suggestion, and there may be cleaner ways to accomplish your goal. I will let the rest of the PI community help you with that.

          I am also tagging Marcos Vainer Loeff, who created the client library, for his input.