Marcos Vainer Loeff

Creating an AF Elements Tree using web page content

Blog Post created by Marcos Vainer Loeff on Mar 19, 2014

In this blog post, I will show you how to develop a console application that automatically creates an AF Element Tree from the content of a web page using PI AF SDK. The data from the web page can be sent to the PI Server if it is time-series data, or to the AF Server if it represents a characteristic of the AF element.

 

There are some interesting use cases for getting data from web pages:

  • Get the current temperature of a city every hour. The data is taken directly from a web page whose values update frequently.
  • Get the weather forecast of a city, as the new PI Server version will support future data.
  • Get the current coastal water temperature.
  • Get current values from the stock market.
  • Get the structure and data from a dashboard or portal on an enterprise page exposing sensitive data.

Of course, this approach should be used only if the company that owns the website doesn't provide a web service for its customers to get the data.

OSIsoft Worldwide Office Locations

If you go to the OSIsoft website and click on Regional and then Worldwide Locations, the browser opens a web page displaying all the office locations around the world on a map and in a table. Please refer to Figures 1 and 2.

 

 

 

6471.fig1.jpg

 

Figure 1 – Map showing OSIsoft office locations around the world.

 

 

 

1122.fig2.jpg

 

Figure 2 – Table of the web page showing the offices information.

 

 

 

The idea of this blog post is to teach you how to use the content of the HTML page shown in the previous table to create an AF Element Tree with the available information. Nevertheless, the scope of this blog post is purely technical. If you want to follow the same procedure with another web page, you first need to check that you have legal permission to do so, either by checking the terms and conditions of the web site or by contacting the appropriate company.

AF Element Template

Before developing the custom console application in Visual Studio, an AF Element Template called “OSIsoft Office” should be created. This template represents a generic OSIsoft office location with four attributes: Address, Contact Information, Location and Regional WebSite. The value type of all attributes is “String”.

 

All the elements should be created using this element template. The name of each new element should be the name of the office. For instance, the element for the Brazilian office should be named “OSIsoft do Brasil Sistemas Ltda.” and should have the following attributes and values:

| Attribute | Value |
| --- | --- |
| Location | Brazil |
| Address | Alameda Lorena 131, Conj. 135 a 138, Sao Paulo, SP 01424-000, Brazil |
| Contact Information | +55 11.3053.5030 (Main); +55 11.3053.5039 (Fax); (+55 11) 3053-5040 (Tech) |
| Regional WebSite | http://www.osisoft.com/Brazil.aspx |

 

 

Table 1 – Example of the attributes from an element which represents an OSIsoft office.

 

 

 

As the template is created only once, it can be created using PI System Explorer or programmatically with PI AF SDK. The elements, however, should be created from this template programmatically, as that is faster and more convenient.
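
For the programmatic route, a minimal sketch could look like the following. It assumes the server name MARC-PI2014 and the database name OSIsoftOffices used by the console application later in this post, and it uses the attribute names that application writes to. The class name CreateOfficeTemplate is made up for illustration. Adjust all of these to your environment.

```csharp
using System;
using OSIsoft.AF;
using OSIsoft.AF.Asset;

class CreateOfficeTemplate
{
    static void Main()
    {
        // Assumed server and database names; replace them with your own
        PISystems myPISystems = new PISystems();
        AFDatabase myDb = myPISystems["MARC-PI2014"].Databases["OSIsoftOffices"];

        // Create the template only if it does not exist yet
        AFElementTemplate template = myDb.ElementTemplates["OSIsoft Office"];
        if (template == null)
        {
            template = myDb.ElementTemplates.Add("OSIsoft Office");

            // One string attribute per piece of office information
            string[] attributeNames = { "Location", "Address", "Contact Information", "Regional WebSite" };
            foreach (string attributeName in attributeNames)
            {
                AFAttributeTemplate attribute = template.AttributeTemplates.Add(attributeName);
                attribute.Type = typeof(string);
            }

            // Persist the new template on the AF Server
            myDb.CheckIn();
        }
        Console.WriteLine("Template ready: " + template.Name);
    }
}
```

Checking for an existing template first means the sketch can be run more than once without trying to add a duplicate.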

Html Agility Pack 

To read and process the web page, a library called Html Agility Pack is used, which can be downloaded from CodePlex. The website of this library can be accessed here.

 

 

What is Html Agility Pack? 

The website describes the library as an agile HTML parser that builds a read/write DOM and supports plain XPath or XSLT. It is a .NET code library that allows you to parse "out of the web" HTML files. The parser is very tolerant of "real world" malformed HTML. The object model is very similar to what System.Xml proposes, but for HTML documents (or streams).

 

 

Methods and Properties

 

 

The two tables below show the names and descriptions of the most important methods and properties of some classes of this library.

 

 

| Property or method | Description |
| --- | --- |
| GetElementbyId | Gets the HTML node with the specified 'id' attribute value. |
| Load | Loads an HTML document from a stream. |
| LoadHtml | Loads the HTML document from the specified string. |
| Save | Saves the HTML document to the specified stream. |

 Table 2 – Main properties and methods of the HtmlDocument class.

 

  

| Property or method | Description |
| --- | --- |
| ChildNodes | Gets all children of the node. |
| HasChildNodes | Gets a bool value indicating whether this node has any child nodes. |
| Id | Gets or sets the value of the 'id' HTML attribute. |
| Name | Gets or sets the name of the node. |
| NextSibling | Gets the HTML node immediately following this element. |
| SelectNodes | Selects a list of nodes matching the XPath expression. |
| SelectSingleNode | Selects the first node that matches the XPath expression. |
| XPath | Gets a valid XPath string that points to this node. |

Table 3 – Main properties and methods of the HtmlNode class. 

Example

To get started with this library, an example is presented in this section. The content of an HTML file named Examples.html is shown below:

 

  

 
<!DOCTYPE html>
<html lang="en" xmlns="http://www.w3.org/1999/xhtml">
<body>
<table>
    <tr class="menu">
        <td class="name">Volcano Name</td>
        <td>Location</td>
        <td>Last Major Eruption</td>      
        <td>Type of Eruption</td>
    </tr>
    <tr class="data">
        <td class="name">Mt. Lassen</td>
        <td><a href="california.html">California</a></td>
        <td>1914-17</td>
        <td>Explosive Eruption</td>
    </tr>
    <tr class="data">
        <td id="oregonid" class="name">Mt. Hood</td>
          <td><a href="oregon.html">Oregon</a></td>
        <td>1790s</td>
        <td>Pyroclastic flows and Mudflows</td>
    </tr>
    <tr class="data">
        <td class="name">Mt. St. Helens</td>
        <td><a href="washington.html">Washington</a></td>
        <td>1980</td>
        <td>Explosive Eruption</td>
    </tr>
</table>
</body>
</html>

 

 

The code snippet below reads the HTML file and selects nodes in several different ways. The comments explain what each step does.

 

  

 
            //Initialize HtmlAgilityPack
            var doc = new HtmlAgilityPack.HtmlDocument();

            //Load document from file
            doc.Load("Examples.html");

            //Create the top level node from the doc object
            HtmlNode root = doc.DocumentNode;           

            //Find all nodes that are links
            HtmlNodeCollection AllNodesLink = root.SelectNodes("//a");

            //A way to iterate over HtmlNodeCollection object
            foreach (HtmlNode NodeLink in AllNodesLink)
            {
                Console.WriteLine("Text: " + NodeLink.InnerHtml);
                Console.WriteLine("Link: " + NodeLink.Attributes["href"].Value);
            }

            //Find all <tr> nodes
            HtmlNodeCollection AllTrNodes = root.SelectNodes("//tr");

            //Find the first link within the second row of the table
            HtmlNode LinkFromTheSecondTrNode = AllTrNodes[1].SelectSingleNode(".//a");

            //Find all <tr> nodes with the class data
            var AllTrNodesFromDataClass = root.SelectNodes("//tr[@class='data']");

            //Find the link node whose link address is washington.html
            var WashingtonLink = root.SelectSingleNode("//a[@href='washington.html']");

            //Using LINQ to find all the links from the document
            var linksOnPage = from lnks in doc.DocumentNode.Descendants()
                              where lnks.Name == "a" &&
                                   lnks.Attributes["href"] != null &&
                                   lnks.InnerText.Trim().Length > 0
                              select new
                              {
                                  Url = lnks.Attributes["href"].Value,
                                  Text = lnks.InnerText
                              };
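
The snippet above does not use GetElementbyId or NextSibling from the tables. A minimal sketch of both is shown below, assuming the Html Agility Pack library is referenced. The class and method names (SiblingLinkExample, HrefNextTo) are hypothetical. The HTML string is a one-row excerpt of Examples.html written without whitespace between the cells, so that NextSibling lands on the next <td> rather than on a whitespace text node. Note also that SelectNodes returns null, not an empty collection, when nothing matches, so its result should be checked before iterating.

```csharp
using System;
using HtmlAgilityPack;

public static class SiblingLinkExample
{
    // Returns the href of the first link inside the node that follows the node
    // with the given id, or null when the id, the sibling or the link is missing.
    public static string HrefNextTo(string html, string id)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode cell = doc.GetElementbyId(id);
        if (cell == null || cell.NextSibling == null)
        {
            return null;
        }

        HtmlNode link = cell.NextSibling.SelectSingleNode(".//a");
        return link == null ? null : link.GetAttributeValue("href", "");
    }

    public static void Main()
    {
        const string row = "<table><tr><td id=\"oregonid\" class=\"name\">Mt. Hood</td>" +
                           "<td><a href=\"oregon.html\">Oregon</a></td></tr></table>";
        Console.WriteLine(HrefNextTo(row, "oregonid"));

        // SelectNodes yields null when no node matches the XPath expression
        var doc = new HtmlDocument();
        doc.LoadHtml(row);
        HtmlNodeCollection images = doc.DocumentNode.SelectNodes("//img");
        Console.WriteLine(images == null ? "No <img> nodes found" : images.Count + " <img> nodes found");
    }
}
```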
            

 

 

For more information about this library, please refer to this article.

Custom Console Application

 

 

Finally, the custom console application was developed using PI AF SDK and the Html Agility Pack library, with the objective of migrating the content of the OSIsoft worldwide office locations web page to an AF Database by creating an AF Element Tree.

 

The data flow of the application is described below:

  1. Connect to an AF Server and AF Database.
  2. Get the HTTP response from the internet and store the web page content in a string.
  3. Convert the string into an HtmlDocument object with the Html Agility Pack library.
  4. Find the table node.
  5. Iterate through the entire table, finding the relevant information for each OSIsoft office.
  6. Process the information into C# strings through the functions ConvertInnerHtmlToString and ProcessString.
  7. Send the information to the PI System.

The code of the console application is shown below:

 

 

 

 

 
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text;
using System.Threading.Tasks;
using System.IO;
using HtmlAgilityPack;
using OSIsoft.AF;
using OSIsoft.AF.Asset;

namespace DownloadWebPageContent
{
    class Program
    {
        //This function helps convert HTML text into C# strings
        private static string ConvertInnerHtmlToString(string InnerHtml)
        {
            InnerHtml = InnerHtml.Replace("\n", String.Empty);
            InnerHtml = InnerHtml.Replace("&nbsp;", " ");
            InnerHtml = InnerHtml.Replace("&nbsp", " ");
            InnerHtml = InnerHtml.Replace("<br>", "\n");
            return (InnerHtml);
        }


        //This function eliminates spaces and comments improving the esthetics
        private static string ProcessString(string OriginalString)
        {
            string[] SplittedString = OriginalString.Split('\n');
            for (int n = 0; n < SplittedString.Length; n++)
            {
                SplittedString[n] = SplittedString[n].Trim();
                if ((SplittedString[n].Contains("<!--")) && (SplittedString[n].Contains("-->")))
                {
                    int StartRemoveIndex = SplittedString[n].IndexOf("<!--");
                    int EndRemoveIndex = SplittedString[n].IndexOf("-->") + 3;
                    SplittedString[n] = SplittedString[n].Remove(StartRemoveIndex, EndRemoveIndex - StartRemoveIndex);
                }
            }
            return (string.Join("\n", SplittedString)).Trim();
        }

        //Main function
        static void Main(string[] args)
        {
            try
            {
                //Initialize PI System objects
                PISystems myPISystems = new PISystems();
                AFDatabase myDb = myPISystems["MARC-PI2014"].Databases["OSIsoftOffices"];

                //Get HTTP response
                string url = "http://www.osisoft.com/company/Worldwide_Locations.aspx";
                WebRequest request = WebRequest.Create(url);
                WebResponse response = request.GetResponse();
                StreamReader sw = new StreamReader(response.GetResponseStream());
                string uricontent = sw.ReadToEnd();

                //Initialize HtmlAgilityPack objects
                var doc = new HtmlAgilityPack.HtmlDocument();
                doc.LoadHtml(uricontent);
                HtmlNode root = doc.DocumentNode;

                //Get HtmlNode table which includes all the information needed
                HtmlNode TableNode = root.SelectSingleNode("//table[@class='table1']");

                //Get all the <tr> child nodes
                HtmlNodeCollection TableNodeTrChildren = TableNode.SelectNodes(".//tr");


                //Process the content of every <tr> node except the first one
                for (int i = 1; i < TableNodeTrChildren.Count; i++)
                {
                    try
                    {
                        HtmlNode OfficeNode = TableNodeTrChildren[i];

                        //Get HtmlNode objects
                        HtmlNode CountryNode = OfficeNode.SelectSingleNode(".//td[2]");
                        HtmlNode AddressNode = OfficeNode.SelectSingleNode(".//td[3]");
                        HtmlNode TelephoneNode = OfficeNode.SelectSingleNode(".//td[4]");
                        HtmlNode RegionalWebSiteNode = OfficeNode.SelectSingleNode(".//td[5]");
                        List<HtmlNode> RegionalWebSiteLinkNodes = RegionalWebSiteNode.Descendants("a").ToList();


                        //Convert HtmlNode content into strings
                        string Country = ConvertInnerHtmlToString(CountryNode.InnerHtml.Trim());
                        string Telephone = ProcessString(ConvertInnerHtmlToString(TelephoneNode.InnerHtml.Trim()));
                        string CompleteAddress = ProcessString(ConvertInnerHtmlToString(AddressNode.InnerHtml.Trim()));
                        string Name = CompleteAddress.Substring(0, CompleteAddress.IndexOf("\n"));
                        string Address = (CompleteAddress.Substring(CompleteAddress.IndexOf("\n"), CompleteAddress.Length - CompleteAddress.IndexOf("\n"))).Trim();
                        string RegionalLink = String.Empty;
                        if (RegionalWebSiteLinkNodes.Count > 0)
                        {
                            RegionalLink = RegionalWebSiteLinkNodes[0].GetAttributeValue("href", "");
                        }
                        else
                        {
                            RegionalLink = "No link found.";
                        }
                        Console.WriteLine("Office: " + Name);

                        //Get or create the current office element
                        AFElement OfficeElement = myDb.Elements[Name];
                        if (OfficeElement == null)
                        {
                            OfficeElement = myDb.Elements.Add(Name, myDb.ElementTemplates["OSIsoft Office"]);
                        }
               

                        //Send the values to the PI System
                        OfficeElement.Attributes["Location"].SetValue(new AFValue(Country));
                        OfficeElement.Attributes["Address"].SetValue(new AFValue(Address));
                        OfficeElement.Attributes["Contact Information"].SetValue(new AFValue(Telephone));
                        OfficeElement.Attributes["Regional WebSite"].SetValue(new AFValue("http://www.osisoft.com/" + RegionalLink));
                        myDb.CheckIn();
                    }

                    catch (Exception ex)
                    {
                        Console.WriteLine("Error: " + ex.Message);
                    }
                    

                }
            }
            catch (Exception ex)
            {
                Console.WriteLine("Error: " + ex.Message);
            }
            Console.WriteLine("Finished");
            Console.ReadKey();
        }
    }

}
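
To make the effect of ConvertInnerHtmlToString and ProcessString concrete, the sketch below applies the same steps to a made-up table cell and then splits the result into the office name and address, as the main loop does. The cell content, the class name HelperDemo and the extra guard against a comment terminator appearing before its opener are assumptions for illustration, and the first replacement assumes the intended search string is the &nbsp; entity.

```csharp
using System;

public static class HelperDemo
{
    // Same steps as ConvertInnerHtmlToString: drop raw line breaks,
    // turn non-breaking space entities into spaces, then map <br> to line breaks.
    public static string ConvertInnerHtmlToString(string innerHtml)
    {
        return innerHtml.Replace("\n", String.Empty)
                        .Replace("&nbsp;", " ")
                        .Replace("&nbsp", " ")
                        .Replace("<br>", "\n");
    }

    // Same steps as ProcessString: trim every line and cut out one HTML comment per line.
    // The "end > start" check is an added guard, not part of the original function.
    public static string ProcessString(string original)
    {
        string[] lines = original.Split('\n');
        for (int n = 0; n < lines.Length; n++)
        {
            lines[n] = lines[n].Trim();
            int start = lines[n].IndexOf("<!--", StringComparison.Ordinal);
            int end = lines[n].IndexOf("-->", StringComparison.Ordinal);
            if (start >= 0 && end > start)
            {
                lines[n] = lines[n].Remove(start, end + 3 - start);
            }
        }
        return string.Join("\n", lines).Trim();
    }

    public static void Main()
    {
        // Hypothetical cell content in the style of the address column
        string cell = "OSIsoft do Brasil Sistemas Ltda.<br>\nAlameda Lorena 131,<!-- suite --><br>\nSao&nbsp;Paulo";
        string complete = ProcessString(ConvertInnerHtmlToString(cell));

        // The first line is the office name; the remaining lines are the address
        int firstBreak = complete.IndexOf('\n');
        string name = complete.Substring(0, firstBreak);
        string address = complete.Substring(firstBreak).Trim();

        Console.WriteLine("Name: " + name);
        Console.WriteLine("Address: " + address.Replace("\n", " | "));
    }
}
```

Because the name is taken from the text before the first line break, a cell without any <br> tag makes Substring throw. In the console application above, that case is handled by the try/catch inside the loop.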

 

 

After the program finishes its execution, you will see the AF hierarchy shown in Figure 3 in PI System Explorer (assuming that no exception is thrown while the console application runs).

 

 

 

4466.fig3.jpg

 

Figure 3 – AF Hierarchy after running the custom console application.

 

 

 

 

Conclusion

 

 

The objective of this blog post is to show that the PI System can be used with every application you can think of. Even if you are working with non-time-series data, storing it on the AF Server is still an interesting alternative.

 

Remember to check that you have legal permission to store web page content in your PI System before following this procedure.

 

Stay tuned because my next blog posts will be about integrating the PI System and Google APIs.

 

 

 

 
