18 Replies Latest reply on Nov 16, 2011 10:59 AM by MvanderVeeken

    Hadoop and PI

      Has anyone played with Hadoop and PI: Hadoop + PI data or perhaps Hadoop + PI logs?

        • Re: Hadoop and PI
          andreas

          I had to search for Hadoop first and read about it, and it sounds interesting. But while the amount of data in a PI system seems large to common users today, Hadoop seems to target even larger amounts of data.

           

          Doing analysis on PI data usually does not require a distributed approach to get enough memory and computing power, at least today, IMHO.

            • Re: Hadoop and PI
              ldieffenbach

              Hi Luis,

               

              As it happens, there is a group within OSIsoft currently interested in Hadoop with regard to searching algorithms.

               

              Can you tell me more about your interest and/or experience?

               

              Regards,

               

              Laurie

               

              OSIsoft Product Manager

                • Re: Hadoop and PI

                  Laurie, I suppose you lot have seen the news about Yahoo's Hortonworks spinoff? gigaom.com/.../exclusive-yahoo-launching-hadoop-spinoff-this-week

                    • Re: Hadoop and PI
                      ldieffenbach

                      Thanks for the link, Rhys. I had heard the rumors, but wasn't sure what would happen next.

                       

                      Laurie

                        • Re: Hadoop and PI

                          Laurie... how is the OSIsoft research into Hadoop going? Anything you can share? Are you targeting PI tags, AF Elements/Attributes, etc. for the "Enterprise Search", or also expanding to searching values?

                            • Re: Hadoop and PI
                              ldieffenbach

                              Hi Rhys,

                               

                              Thanks for asking. The technical folks (that is, not me) have done some prototyping and settled on some technologies (I don't have the list handy... I've had a technology-challenged day today). The team prototyped searching PI Points, Elements, Attributes and some displays, including ProcessBook files. The conceptual architecture would essentially have modules that push or pull data from content sources (data servers and content applications) and index it in a meaningful way, plus modules that accept requests from client products and return a matching list of results in the requested format.

                               

                              The work they've done so far has focused on string matching, not value conditions.
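A minimal sketch of the push/pull, index, and query layers described above, assuming a simple keyword-based inverted index. Everything here (`SearchItem`, `StaticSource`, `SearchIndex`, `render`, the item names and sources) is hypothetical and only illustrates the general shape; it is not OSIsoft's prototype. In keeping with the string-matching focus, it matches names only and does not evaluate value conditions.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, Iterator, Protocol


@dataclass(frozen=True)
class SearchItem:
    kind: str    # hypothetical labels, e.g. "PIPoint", "Element", "Attribute", "Display"
    name: str    # the string that gets matched against keywords
    source: str  # which content source (data server, content application) it came from


class ContentSource(Protocol):
    """Hypothetical connector interface: anything the index can pull items from."""
    def pull(self) -> Iterable[SearchItem]: ...


@dataclass
class StaticSource:
    """Stand-in for a real connector; yields a fixed list of (kind, name) pairs."""
    name: str
    entries: list

    def pull(self) -> Iterator[SearchItem]:
        for kind, item_name in self.entries:
            yield SearchItem(kind, item_name, self.name)


def tokenize(text: str) -> list:
    """Lower-case and split on anything that is not a letter or digit."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


class SearchIndex:
    """Inverted index: token -> set of items whose name contains that token."""

    def __init__(self) -> None:
        self._postings = defaultdict(set)

    def add(self, item: SearchItem) -> None:
        # "Push" path: a content source hands items to the index directly.
        for token in tokenize(item.name):
            self._postings[token].add(item)

    def ingest(self, source: ContentSource) -> None:
        # "Pull" path: the index asks the source for its current items.
        for item in source.pull():
            self.add(item)

    def query(self, text: str) -> list:
        # String matching only (no value conditions): every query token must match.
        tokens = tokenize(text)
        if not tokens:
            return []
        hits = set.intersection(*(self._postings.get(t, set()) for t in tokens))
        return sorted(hits, key=lambda i: (i.kind, i.name, i.source))


def render(results: list, fmt: str = "dict"):
    """Return results in the format a client asked for (two example formats)."""
    if fmt == "dict":
        return [{"kind": i.kind, "name": i.name, "source": i.source} for i in results]
    if fmt == "text":
        return "\n".join(f"{i.kind}\t{i.name}\t{i.source}" for i in results)
    raise ValueError(f"unsupported format: {fmt!r}")


if __name__ == "__main__":
    index = SearchIndex()
    index.ingest(StaticSource("PIServer01", [("PIPoint", "Reactor1.Temperature"),
                                             ("PIPoint", "Reactor1.Pressure")]))
    index.ingest(StaticSource("AF", [("Attribute", "Reactor1|Temperature"),
                                     ("Element", "Reactor2")]))
    print([i.name for i in index.query("reactor1 temperature")])
    # → ['Reactor1|Temperature', 'Reactor1.Temperature']
```

Keeping ingestion (`add`/`ingest`), indexing, and querying/formatting (`query`/`render`) separate mirrors the module split described above: new content sources or output formats can be added without touching the index itself.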

                               

                              Work continues and plans are forming.

                               

                              Regards,

                               

                              Laurie

                                • Re: Hadoop and PI

                                  Thanks for the update Laurie.

                                   

                                  I heard that Microsoft was going to adopt Hadoop too, and I saw this article that confirms it:

                                   

                                  blogs.msdn.com/.../cross-post-microsoft-announces-big-data-roadmap-adopts-apache-hadoop-on-windows-azure.aspx

                                   

                                  An interesting topic that has my attention for now.

                                   

                                  When you say ProcessBook files are being indexed, are you indexing the data points used within the displays? For example, if you want to delete a PI Point or AF Attribute, could you then get a better picture of the impact of such an action? Do you think it will spread to Coresight too? And what about Event Frames? Surely when customers start hitting millions or tens of millions of events, you are going to need to start indexing them too?

                                    • Re: Hadoop and PI
                                      ldieffenbach

                                      So, when I say "indexing files" I mean enabling the sorts of interaction you can currently see in the PI Coresight Beta (both 1 and 2), where a keyword search will return data items (Elements, Attributes, PI Points) as well as displays that have matching data items. This is already available in Coresight, so we thought we should capture and index the same type of information from PI ProcessBook files and probably also PI DataLink files. I'm also hoping we can do something similar for PI WebParts (capturing the data items configured for a particular web part along with the URL to the page where that web part is used). This last one may be the trickiest, but we'll see.
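The display indexing described above (a keyword search returns matching data items plus the displays that use them) can be sketched as a map from each data item to the displays that reference it. The same map also answers the impact question raised earlier about deleting a PI Point. This is an illustrative sketch only: `Display`, `DisplayReferenceIndex`, and the file names and tag names are hypothetical, and it assumes the data item references have already been extracted from each ProcessBook/DataLink file or web part.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Display:
    path: str  # hypothetical: a ProcessBook/DataLink file path or a web part page URL
    kind: str  # e.g. "ProcessBook", "DataLink", "WebPart"


class DisplayReferenceIndex:
    """Maps each data item to the displays that reference it.

    Assumes item references were already extracted from each display;
    parsing the display file formats themselves is out of scope here.
    """

    def __init__(self) -> None:
        self._items = set()                        # every known data item path
        self._displays_by_item = defaultdict(set)  # item path -> displays using it

    def register_item(self, item_path: str) -> None:
        # Items from data servers that may not appear on any display yet.
        self._items.add(item_path)

    def register_display(self, display: Display, item_paths) -> None:
        for path in item_paths:
            self._items.add(path)
            self._displays_by_item[path].add(display)

    def displays_using(self, item_path: str) -> list:
        # Reverse lookup, e.g. "which displays are affected if I delete this PI Point?"
        return sorted(self._displays_by_item.get(item_path, set()), key=lambda d: d.path)

    def search(self, keyword: str):
        # Case-insensitive substring match over item paths (a linear scan, for clarity),
        # then collect every display that uses at least one matching item.
        kw = keyword.lower()
        items = sorted(p for p in self._items if kw in p.lower())
        displays = set()
        for p in items:
            displays |= self._displays_by_item.get(p, set())
        return items, sorted(displays, key=lambda d: d.path)


if __name__ == "__main__":
    idx = DisplayReferenceIndex()
    idx.register_display(Display("Unit1_Overview.pdi", "ProcessBook"),
                         ["Reactor1.Temperature", "Reactor1.Pressure"])
    idx.register_display(Display("Energy.xlsx", "DataLink"),
                         ["Boiler.SteamFlow", "Reactor1.Temperature"])
    idx.register_item("Reactor2.Temperature")
    items, displays = idx.search("temperature")
    print(items)                       # → ['Reactor1.Temperature', 'Reactor2.Temperature']
    print([d.path for d in displays])  # → ['Energy.xlsx', 'Unit1_Overview.pdi']
```

A production index would use a real inverted index rather than a linear scan, but the item-to-display mapping is the part that makes both the "displays with matching data items" results and the deletion impact check possible.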

                                       

                                      I also have ambitions to index the data items used in calculations (PI PE tags, Totalizers, etc.), PI ProcessBook datasets, PI Notifications, PI AF attributes, and eventually PI Event Frames.

                                       

                                      Essentially, I can't imagine any kind of content created by or stored in the PI System that folks wouldn't want to have available for searching. Naturally, I would expect that not everything will be part of a first release, but the plan isn't far enough along yet to say in what order things might come.

                                       

                                      Laurie