Has anyone played with Hadoop and PI: Hadoop + PI data or perhaps Hadoop + PI logs?
I had to search for Hadoop first and read about it; it sounds interesting. But while the amount of data in a PI System is large by the standards of most users today, Hadoop seems to target even larger amounts of data.
Doing analysis on PI data usually does not require a distributed approach to get enough memory and computing power, at least today, IMHO.
As it happens, there is a group within OSIsoft currently interested in Hadoop with regard to searching algorithms.
Can you tell me more about your interest and/or experience?
OSIsoft Product Manager
Laurie, I suppose you lot have seen the news about Yahoo's Hortonworks? gigaom.com/.../exclusive-yahoo-launching-hadoop-spinoff-this-week
Thanks for the link, Rhys. I had heard the rumors, but wasn't sure what would happen next.
Laurie... how is the OSIsoft research into Hadoop going? Anything you can share? Are you targeting PI tags, AF Elements/Attributes, etc. for the "Enterprise Search", or also expanding to searching values?
Thanks for asking. The technical folks (that is, not me) have done some prototyping and settled on some technologies (I don't have the list handy... I've had a technology-challenged day today). The team prototyped searching PI Points, Elements, Attributes and some displays, including ProcessBook files. The conceptual architecture would essentially have two kinds of modules: ones that push or pull data from content sources (data servers and content applications) and index it in a meaningful way, and ones that accept requests from client products and return a matching list of results in the requested format.
The work they've done so far has focused on string matching, not value conditions.
Work continues and plans are forming.
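To make the conceptual architecture described above a little more concrete, here is a minimal Python sketch, not anything from the actual prototype. It assumes a hypothetical connector function that pulls (kind, name) records from a content source, an in-memory index, and a plain case-insensitive substring match standing in for "string matching". All class, function, source and item names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedItem:
    kind: str    # e.g. "PI Point", "Element", "Attribute", "Display"
    name: str    # the string that searches are matched against
    source: str  # hypothetical label for the content source it came from


def pull_from_source(source_name, records):
    """Stand-in for a connector module: turn (kind, name) records into items."""
    return [IndexedItem(kind, name, source_name) for kind, name in records]


class SearchIndex:
    """Toy index: stores items and answers string-matching requests."""

    def __init__(self):
        self._items = []

    def ingest(self, items):
        # Content can be pushed by a source or pulled by a connector;
        # either way it ends up here.
        self._items.extend(items)

    def search(self, text):
        # String matching only: no value conditions are evaluated.
        needle = text.casefold()
        return [item for item in self._items if needle in item.name.casefold()]


index = SearchIndex()
index.ingest(pull_from_source("data-server", [("PI Point", "Boiler1.Temperature")]))
index.ingest(pull_from_source("af-server", [("Element", "Pump P-101"),
                                            ("Attribute", "Pump P-101|Flow")]))
index.ingest(pull_from_source("displays", [("Display", "Pump Overview")]))

results = index.search("pump")
for item in results:
    print(item.kind, item.name, item.source)
```

A real implementation would presumably use a proper full-text index rather than a linear scan; the point here is only the split between modules that feed the index and modules that answer requests.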
Thanks for the update Laurie.
I heard that Microsoft was going to adopt Hadoop too, and I saw this article that confirms it:
An interesting topic that has my attention for now.
When you say ProcessBook files are being indexed, are you indexing the data points used within the displays? For example, if you want to delete a PI Point or AF Attribute, could you then get a better picture of the impact of such an action? Do you think it will spread to Coresight too? What about Event Frames? Surely when customers start hitting millions or tens of millions of events you are going to need to start indexing them too?
So, when I say "indexing files" I mean enabling the sorts of interaction you can currently see in the PI Coresight Beta (both 1 and 2), where a keyword search will return data items (Elements, Attributes, PI Points) as well as displays that have matching data items. This is already available in Coresight, so we thought we should capture and index the same type of information from PI ProcessBook files and probably also PI DataLink files. I'm also hoping we can do something similar for PI WebParts (capturing the data items configured for a particular web part along with the URL to the page where that web part is used). This last one may be the trickiest, but we'll see.
I also have ambitions to index the data items used in calculations (PI PE tags, Totalizers, etc.), PI ProcessBook datasets, PI Notifications, PI AF attributes, and eventually PI Event Frames.
Essentially, I can't imagine any kind of content created by or stored in the PI System that folks wouldn't want to have available for searching. Naturally, I would expect that not everything will be part of a first release, but there isn't enough of a plan to describe what order things might come in.
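As a rough illustration of the display indexing discussed above (a keyword search returning data items together with the displays that use them, and the earlier question about seeing the impact of deleting a PI Point), here is a small Python sketch of a reverse index from data items to displays. It is an assumption about how this could be modelled, not a description of what Coresight or the prototype does, and the display and data item names are invented.

```python
from collections import defaultdict


class DisplayIndex:
    """Maps each data item to the set of displays that reference it."""

    def __init__(self):
        self._displays_by_item = defaultdict(set)

    def add_display(self, display_name, data_items):
        # Record every data item configured on the display.
        for item in data_items:
            self._displays_by_item[item].add(display_name)

    def displays_using(self, data_item):
        # Impact check: which displays would be affected if this item went away?
        return sorted(self._displays_by_item.get(data_item, ()))

    def search(self, keyword):
        # Keyword search: matching data items plus the displays that use them.
        needle = keyword.casefold()
        return {
            item: sorted(displays)
            for item, displays in self._displays_by_item.items()
            if needle in item.casefold()
        }


display_index = DisplayIndex()
display_index.add_display("Boiler Overview", ["Boiler1.Temperature", "Boiler1.Pressure"])
display_index.add_display("Plant Summary", ["Boiler1.Temperature", "Pump7.Flow"])

affected = display_index.displays_using("Boiler1.Temperature")
pump_hits = display_index.search("pump")
print(affected)
print(pump_hits)
```

The same structure would apply to other content that references data items (DataLink files, web parts, calculations), with the display name swapped for whatever identifies that content.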
Has anyone seen this presentation?
Page 30 and following ... Facebook's version of PI ;-)
Nice find, Wolfgang. Some of the data rates for the messages and e-mails that they process are huge.
Here is the corresponding paper to go with the presentation.
I do like the distributed manner of these systems, especially when considering nodes that have a different purpose to serve. It is along the lines of what I was getting at in this thread (vcampus.osisoft.com/.../12282.aspx) by having members of a PI collective that serve a particular purpose (administration, data storage, ...). If only we had load management within a PI collective.
Laurie, thanks for the update. Appreciate the time you take to keep us all updated on these types of projects! I agree that there shouldn't be anything within the realm of the PI System that you wouldn't want to index and then search. I would even go as far as wanting to search interface nodes for configuration details.
So, what is the ETA on PI System Enterprise Search?
ETA <> soon enough. I'm sure you'll agree.
On the other hand, I'm also confident that its arrival will be welcomed, when it happens.
Indeed it will, now hurry up and get it finished...
Some interesting concepts from Google's recent search changes that could be applied to a PI Enterprise Search, especially around events and freshness of data...
Would be nice for an Enterprise Search to have quick links like "Assets Of Interest" (i.e. what asset is everyone currently searching for / using), "Most Recent Assets", "Most Referenced Assets", "Recurring Event Frames", etc.
Thanks for sharing, Rhys.
I agree that exposing "relevance" in terms of how frequently data from PI is being used is important. Our prototype discussions identified ways to track how often each indexed item is selected from search results, so that we could use that count as an indication of relevance. Sadly, I have no other news to report at this time.
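The idea of counting how often each indexed item is selected from search results could look something like the Python sketch below. This is only a guess at the general shape, assuming an in-memory counter and a simple "most selected first" ordering; the class and item names are hypothetical.

```python
from collections import Counter


class RelevanceTracker:
    """Counts result selections and ranks matches by that count."""

    def __init__(self):
        self._selections = Counter()

    def record_selection(self, item_name):
        # Call this whenever a user picks an item from a result list.
        self._selections[item_name] += 1

    def rank(self, matches):
        # Most-selected first. sorted() is stable, so items with equal
        # counts keep the order in which the matcher returned them.
        return sorted(matches, key=lambda name: -self._selections[name])


tracker = RelevanceTracker()
tracker.record_selection("Pump7.Flow")
tracker.record_selection("Pump7.Flow")
tracker.record_selection("Boiler1.Temperature")

ranked = tracker.rank(["Boiler1.Pressure", "Boiler1.Temperature", "Pump7.Flow"])
print(ranked)
```

A production version would likely need to persist the counts and decay them over time so that old popularity does not dominate fresh activity, but that goes beyond what was described.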
Relevance... if you could throw a role/job based layer into the mix, then I will be happy. For example, "what Assets have other Rotating Equipment Engineers from Plant Area A been accessing, in rank order, within the Event Frame 'Plant Area A Instability 12345'?". Then if you build an iOS application that hooks into Siri, we could literally ask the question.
That's all for now - I've set a reminder to ask some more questions in a few weeks' time. Hmmm, enterprise search is becoming the new SSB for me...
*unsure if should feel relieved or terrified*
Ha ha ha. Be afraid...be very afraid...
Nice interview on the subject: