19 Replies Latest reply on Apr 17, 2010 6:22 PM by spilon

    TagRank (another one of those "ideas"...)

    merighm

      I often wonder how many tags are ever retrieved from the archive of a PI Server containing, say, 10000 tags.  Most users rely on a handful of DL reports and PB schematics to monitor their processes, and these documents are based on a small fraction of the total number of tags.  So perhaps 9 out of 10 tags never get trended or displayed in a report or schematic, and they never contribute to operations.  I do not mean to imply that archiving 10000 tags is silly; actually, the more tags one archives the better, because one never knows when something will be needed.

       

      But given data from a 10000-tag PI Server, can one identify which 1000 tags are most important without knowing what they really represent?  This is done for web search, where Google ranks pages first based not on the semantic meaning of the information they contain but on the topological structure of the web (which pages link to which).  Some other criteria would have to be devised to assign an importance score to each tag.

       

      Similar to PageRank for ranking webpages, it would be really interesting to come up with a TagRank which quantitatively rates the importance of each tag on the server without knowing what it really represents.  Why?  The system could use this to say "this tag must be important, and I must notify the user when I notice an unusual pattern involving this tag."  So you come in the morning, and the system suggests that you check out an interesting drop in the value of a tag.  You know the tag represents a temperature, but the system flagged it simply because its behavior was not "in synch" with the rest of the system.  The idea is that the system should not flag 100 tags a day, but just a few important ones (and give you links to tags with similar behavior).

       

      Such a system can be used to automate the analysis, highlight the exceptions and guide the user directly to the area that needs attention. This gives the user an alert about the problem, not a report to find the problem.  A potential application for StreamInsight.

       

      Moh&

       

       

        • Re: TagRank (another one of those "ideas"...)
          pcombellick

          Interesting! 

           

          Imagine that you and I are flying in an airplane over the Atlas Mtns.  On the instrument panel there are several unlabeled trends.  Airspeed, altitude, position, angle of attack, pitch, vertical speed, fuel quantity, fuel consumption rate, engine temperature, heading, etc. 

           

          How do you determine which trend is the most important at any given moment, and which values are in a "good" or "not good" range, without domain knowledge of what each trend represents and without knowledge of the specific phase of our flight?

           

           

            • Re: TagRank (another one of those "ideas"...)
              merighm

              When I referred to a tag not being "in synch" with others, I wanted to say that as long as tag values look "consistent" with some known pattern, you are OK.  If altitude, pitch, and speed (in this order) are important during landing, the system should use the first two important parameters to identify that you are landing and warn you when your speed is not right. The point is that by observing the system over a period of time, the system can identify some "phases"...

               

              [The following is shamelessly plagiarized from another source] Since I am not a pilot, let us take the more mundane case of car driving, which we want to categorize into city driving, highway driving, heavy traffic, etc. Let us say we want to continuously monitor the system at various levels to detect surprising situations. Situations that deviate from the normal behavior could be indicative of a component failure. For example, if the monitoring system believed it was in one driving environment (highway) but observed the engine behaving as it would in another environment (city), it might conclude something is wrong and respond accordingly (warning message to the driver, etc).  To make this characterization, our sources of data for this model are temperature, vibration and rotational speed sensors. The speed sensors on the rear wheels tell us both how fast the car is going and also how much it is turning. The average of the two wheel speeds is proportional to the overall speed of the car. The difference of the two wheel speeds tells how much it is turning: on a left turn, the right wheel will rotate faster than the left, and on a right turn, the left wheel will rotate faster than the right.   The information gathered from these sensors is useful to give instantaneous values for speed and turning, but is even more powerful when observed over time. For example, we might observe smooth accelerations or rapid accelerations. Or we might observe a small speed differential over a long period of time (gradual turning), or a large speed differential over a short period of time (sharp turning)....well...you can read all about it here: http://www.numenta.com/for-developers/education/ProblemsThatFitHTMs.pdf
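
The wheel-speed reasoning in the passage above can be sketched in a few lines of Python. This is only an illustration: the function name, the units, and the `turn_threshold` dead band are my own assumptions and are not taken from the Numenta paper.

```python
def speed_and_turn(left_wheel, right_wheel, turn_threshold=0.5):
    """Derive overall speed and turning direction from two rear-wheel speeds.

    The average of the two wheel speeds is proportional to the car's speed.
    The difference indicates turning: the right wheel is faster on a left
    turn, the left wheel is faster on a right turn.
    turn_threshold is a hypothetical dead band, in the same units as the inputs.
    """
    speed = (left_wheel + right_wheel) / 2.0
    diff = right_wheel - left_wheel
    if diff > turn_threshold:
        turn = "left"
    elif diff < -turn_threshold:
        turn = "right"
    else:
        turn = "straight"
    return speed, diff, turn
```

Applied to every incoming sample, this gives the instantaneous speed and turning; watching how `speed` and `diff` evolve over time is what would separate, say, gradual from sharp turning.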

               

              ...

               

              In other words, there is no right or wrong... but there are normal patterns and abnormal patterns.

               

              Moh&

                • Re: TagRank (another one of those "ideas"...)
                  cescamilla

                  This all sounds great but, I believe, it falls more in the alarm category than in any other.

                   

                  Why? Because what you are describing is an alarm raised when something is not as expected, which is the main reason behind alarms. Alarms should not be set on simple values ("if speed is greater than 50 miles then alert" will not make sense for 92% of a trip in an airplane); they should be defined as deviations from the expected behaviour ("if speed is greater than 50 miles while landing then alert" is a better-put alarm). Or (and I do believe it is implemented this way): "if landing gear is not (landing gear desired position) then alert".
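
To make the difference between a plain limit and a context-aware alarm concrete, here is a minimal Python sketch. The phase names, the `limits` table, and the function name are hypothetical; the value 50 is just the figure used in the example above.

```python
def speed_alarm(speed, phase, limits):
    """Return True when speed exceeds the limit defined for the current phase.

    limits maps a phase name to a maximum speed. Phases without an entry
    have no limit, so no alarm is raised for them.
    """
    limit = limits.get(phase)
    return limit is not None and speed > limit


# Hypothetical configuration: only the landing phase is constrained.
LIMITS = {"landing": 50.0}
```

With this table, `speed_alarm(60, "cruise", LIMITS)` stays quiet while `speed_alarm(60, "landing", LIMITS)` fires. The hard part, as noted above, is knowing the phase in the first place.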

                   

                  So, the only way to make sense of something is context. Sure, having alarms and alerts would be great, but you do need to provide some context for everything to make sense. Once you provide this context (*1), you can use the system the way you intend to, or let a new guy try it out so that they could learn about the alarms.

                   

                  *1 Having a system that learns is not that hard, but doing so means that someone will have to teach it how to behave, and that is a complex matter altogether. Setting those rules explicitly is preferred. What happens if you miss an alarm while training the system? What will happen if you take the wrong action? Ending up with a misbehaving system that way is really easy.

                   

                  This topic does invite for a lot of discussion.

                   

                  I read the PDF you linked, and (as expected) it starts by defining the realms you want to divide, and then it sets a couple of variables on which you rely to build the tree. The reasoning behind the document seems to be somewhat the inverse of what you propose: you propose to have a lot of variables, let the server decide which ones are important, and then set the realms. It sounds nice, but it seems to me you are going in the opposite direction.

                   

                  Then again do not take my word as final, each time I have heard that it can't be done I managed to find a way around it and make it possible, why couldn't you do the same?

                    • Re: TagRank (another one of those "ideas"...)
                      merighm

                      Hello Cristobal,

                       

                      You made very good points that will certainly put some order in my thoughts.

                       

                      In the meantime, check out this demo  http://www.vitamindinc.com/ which is based on software from the company whose website I sent yesterday.  The demo shows some cool things you can do using a simple webcam.  Just like a webcam with this software can be used for more than security applications based on motion detection, alarming is only one of the potential uses of what I am trying to say in a PI context.

                       

                      Moh&

                       

                       

                       

                       

                        • Re: TagRank (another one of those "ideas"...)
                          pcombellick

                          I think there is a lot of difference between "teaching" a software tool how to recognize "normal" and "abnormal" conditions and a generic software tool that you can connect to any PI System and have it figure out how to recognize "normal" and "abnormal" conditions.

                           

                           

                            • Re: TagRank (another one of those "ideas"...)
                              merighm

                              Well, I do not dream yet of a system that will just read PI archives and tell you, out of the box, if your process was operating normally or abnormally at a given time.  However, what I envision the system being able to do is:

                              1. Lets you select a period of time of your choice, for which it will suggest a reduced list of tags that capture most of the variability in the process.
                              2. Lets you add more tags to that list if you so choose.
                              3. Asks you to label this period using some meaningful name.  For instance, you can call this "Startup" because you had selected a period when your process was starting up.
                              4. You can choose to have the system try to identify other startup periods.  Somehow, the system needs to compare the "major features" it associated with "Startup" with the "features" observed at other time periods.  You can use this to fine tune the system's pattern identification for "Startup".
                              5. You can redo steps 1 to 4 for other special events like "pump breakdown", "instrument failure", etc...

                              When you are done, you tell the system to monitor incoming data and tell you when it identifies any of the patterns you labelled.
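
A rough Python sketch of steps 1, 3 and 4 follows. Everything in it is an assumption made for illustration: the coefficient of variation is only a crude stand-in for "captures most of the variability" (a real tool might use something like principal component analysis), and the per-tag (mean, standard deviation) "signature" is the simplest feature set I could think of.

```python
from statistics import mean, pstdev


def rank_by_variability(window, top_n=3):
    """Step 1: suggest the tags that vary the most in the selected period.

    window maps a tag name to its list of values over the period.
    The score is the coefficient of variation (std / |mean|).
    """
    scores = {}
    for tag, values in window.items():
        m = mean(values)
        scale = abs(m) if m != 0 else 1.0
        scores[tag] = pstdev(values) / scale
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


def signature(window, tags):
    """Step 3: summarise a labelled period as (mean, std) per selected tag."""
    return [(mean(window[t]), pstdev(window[t])) for t in tags]


def distance(sig_a, sig_b):
    """Euclidean distance between two signatures."""
    total = 0.0
    for (mean_a, std_a), (mean_b, std_b) in zip(sig_a, sig_b):
        total += (mean_a - mean_b) ** 2 + (std_a - std_b) ** 2
    return total ** 0.5


def best_label(window, tags, labelled):
    """Step 4: return the label whose stored signature is closest to window."""
    sig = signature(window, tags)
    return min(labelled, key=lambda name: distance(sig, labelled[name]))
```

Usage would look like: pick a period, call `rank_by_variability` to get a short tag list, store `signature(...)` under the name "Startup", and later call `best_label` on incoming windows. In practice the tags would need to be normalised to comparable scales before distances mean much.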

                               

                              Moh& 

                                • Re: TagRank (another one of those "ideas"...)
                                  hanyong

                                  MOH M ERIGH

                                  You can choose to have the system try to identify other startup periods.  Somehow, the system needs to compare the "major features" it associated with "Startup" with the "features" observed at other time periods.  You can use this to fine tune the system's pattern identification for "Startup".

                                  This sounds like what Event Frames can be used for: identifying event occurrences and storing the information so that users can analyse similar occurrences. In fact, that's what some users are using PI-Batch for at the moment. It is just that PI-Batch is designed quite specifically for batch processing, which somewhat limits its functionality for use cases outside of a batch context. Event Frames is designed with extended use cases like "start-ups" or "instrument failure" in mind, so that more users can benefit from it.

                                   

                                  Since "Event Frames Generator" is not out yet, we can only imagine how it is going to work, but Event Frames does seem like a suitable candidate for backend storage of event periods like the ones mentioned.

                                    • Re: TagRank (another one of those "ideas"...)
                                      pcombellick

                                      I agree that a set of EventFrames could be marked as "abnormal condition" to define a problem that could not be identified with a set of process alarms.  It is likely that a domain expert would have to classify specific EventFrames instances as "normal" or "abnormal".  A comparison engine could compare "abnormal" EventFrames with new EventFrames and attempt to classify them as "normal" or "abnormal" and raise an alarm when a new "abnormal" EventFrame is identified.

                                       

                                      In S88 terms, this might mean comparing a currently active Batch, to a historical reference "Golden Batch" and raising an alarm if the current batch is deviating from the target "Golden Batch".
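
As a minimal illustration of the "Golden Batch" comparison, assuming both profiles are sampled at the same elapsed times since batch start (the function name and the fixed `tolerance` band are my own assumptions, not how any PI product does it):

```python
def golden_batch_deviations(current, golden, tolerance):
    """Return the sample indices where the active batch leaves the golden band.

    current and golden are value lists aligned by elapsed time. Only the
    overlapping part is compared, so a still-running batch can be checked.
    """
    return [
        i
        for i, (c, g) in enumerate(zip(current, golden))
        if abs(c - g) > tolerance
    ]
```

An alarm would be raised whenever the returned list is non-empty. Choosing the reference batch and the tolerance is exactly where the domain expert comes in.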

                                       

                                      I still think that this involves a Domain Expert to define the conditions and not simply a generic software tool that can connect to any PI System and infer "correctness" or "incorrectness" without domain knowledge of the process and the specific system configuration.  Unless, of course, AF is involved.

                                       

                                       

                                        • Re: TagRank (another one of those "ideas"...)

                                          I think it's an awesome idea, Moh! And, just like all awesome ideas, it's got certain drawbacks...

                                           

                                          I can easily think of a system where all appropriate users of a plant/mill/facility of any kind would have the ability to identify those patterns and define those normal/abnormal statuses. A big site- or company-wide collaborative effort (pretty much like Moh intended this project, I think) that would somehow "teach" the system about operations and alarms to raise.

                                           

                                          From a technology standpoint, Paul referred to comparing a currently active Batch to a historical reference (i.e. a Golden Batch), and this is pretty much what PI ProcessTemplates was meant for.

                                           

                                          But now this can be taken to the next level. And I think this "next level" might imply some human-driven artificial intelligence (AI). And as Han Yong and Paul pointed out, Event Frames and AF (and ultimately, PI Notifications) represent the perfect building blocks for such an initiative.

                                          With that said, my suggestion is that you guys keep cogitating on this idea (both in your head and in this discussion thread) and look out for news on handling collaborative projects. Yes, you read correctly: given the interest demonstrated by the community relative to "Collaborative Projects", we are seriously thinking about facilitating this in the OSIsoft vCampus community, and are fleshing out the details of if and when as we speak.

                                            • Re: TagRank (another one of those "ideas"...)

                                              Steve Pilon

                                              With that said, my suggestion is that you guys keep cogitating on this idea (both in your head and in this discussion thread) and look out for news on handling collaborative projects. Yes, you read correctly: given the interest demonstrated by the community relative to "Collaborative Projects", we are seriously thinking about facilitating this in the OSIsoft vCampus community, and are fleshing out the details of if and when as we speak.

                                               

                                               

                                              When you are going through the details, don't forget the share options for the collaborators

                                               

                                               

                                            • Re: TagRank (another one of those "ideas"...)
                                              merighm

                                              Hi Paul,

                                               

                                              From the (very) little I know about EventFrames, there are things in common between TagRank (for lack of a better name) and the concept of EventFrames.  My guess is that EventFrames will make it easier to *implement* this but "they cannot be it" since they lack the Analytics part.

                                               

                                              The ProcessTemplates and the "Golden Batch" scenario are particular cases of "TagRank"-based analytics.  In these particular cases, it is the end-user manually selecting what is worth monitoring and comparing across batch runs.  In "TagRank", it is the "system" suggesting which tags seem to be important during your batch based on some tag-ranking criteria.  So the main difference is not in "direction"; it is in the degree of automation of Analytics.

                                               

                                              I agree that the "system" could tell you that a tag representing the CPU loading on the PI Server was important for the quality of your batch, simply because there was a lot of variability in it.  But the system can be smart enough to ignore spammer-tags.  For instance, tags referenced within AF will be deemed extremely important while those coming from certain data sources are not.
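
One simple way to picture the "ignore spammer-tags" idea is to scale a raw variability score by a weight that depends on where the tag comes from. The sketch below is purely illustrative: the source names, the weights, and the `af_boost` factor are made-up assumptions.

```python
def weighted_score(variability, in_af, source, source_weights, af_boost=10.0):
    """Scale a raw variability score by hypothetical trust weights.

    source_weights maps a data source name to a multiplier (0.0 silences
    the source entirely); unknown sources get a neutral weight of 1.0.
    Tags referenced in AF are boosted by af_boost.
    """
    weight = source_weights.get(source, 1.0)
    if in_af:
        weight *= af_boost
    return variability * weight
```

With `source_weights = {"perfmon": 0.0}`, a noisy CPU-load tag scores zero no matter how much it varies, while a quiet tag that is referenced in AF still ranks high.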

                                               

                                              Yesterday, I googled +osisoft +"pattern recognition" and found this article by Bascur (OSIsoft) et al.  I just believe they did great work.  There were some manual steps in their analysis, and the question is how much of what they did can be automated.  For instance, why do you need to take data into DataLink in order to perform statistical analysis?

                                               

                                              Moh&

                                  • Re: TagRank (another one of those "ideas"...)
                                    pcombellick

                                    What algorithm does a generic software tool, without domain expertise, use to differentiate between "normal" and "abnormal" patterns ?

                                      • Re: TagRank (another one of those "ideas"...)
                                        merighm

                                        Let us consider an image processing tool that takes an image and tells you that "this picture contains two oranges, one apple, and a candle".  At some point, the tool was taught how an orange, an apple, and a candle look.  It was not given pictures of all possible oranges, apples, and candles in the world.  Also, it was not taught about the essence of an orange; it was only told that English speakers refer to this object as an orange.  The tool needs only enough pictures to allow it to determine a few key features associated with each object it needs to identify.  And then the tool can pick out those objects in pictures containing those objects in widely different shapes.  The challenge is how to process image bits into "features" that you associate with meaningful names.

                                         

                                        So what I mean by "abnormal" is when the tool can tell you "this orange has an abnormal texture," and you tell the system to remember this as "stale, dry orange."   The system has no idea what orange, stale, or dry mean.  But it can distinguish between a fresh orange and a stale, dry one.  So it is not about normality being more desirable than abnormality; it is how you define each and can distinguish between the two.

                                         

                                        In the context of our discussion, the challenge is how to process a set of data streams into "features".  My hunch is that if humans keep track of their processes by monitoring a relatively small set of data streams, an automated tool can conceptually do the same. 

                                         

                                        Moh&

                                         

                                         

                                          • Re: TagRank (another one of those "ideas"...)
                                            richard

                                            This discussion thread reminds me of the work being done by Numenta building HTM Systems to "conceptualize" spatial-temporal data (like pictures). Once a "concept" is learned then it is able to identify those patterns (concepts) and deviations. I believe that we will see a lot more help from computers in searching and pattern matching but more importantly in the identification of relevant states of information.

                                              • Re: TagRank (another one of those "ideas"...)
                                                merighm

                                                Glad you did not catch me plagiarizing without admitting I was doing so: the link I gave in my Dec 13 message was to a paper on Numenta.  Ever since I read the Numenta founder's book "On Intelligence" back in 2004, I have seen his pattern of thinking appear in various forms in other people's work.  In the power/process industry domain, the only company I know of that seeks solutions like this is SmartSignal (very likely using different algorithms).

                                                 

                                                I am currently reading an article written by somebody who took a different approach: he first uses a DFT (Discrete Fourier Transform) or DWT (Discrete Wavelet Transform) on the data and then searches for similarity between the results of these transformations.  It looks like this approach yields much better results than trying to compare series of data in the time domain.  Looks like this is a field where a lot has been done in academia since the early 80s but has not seen many commercial applications yet.
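
To give a feel for the frequency-domain approach, here is a small pure-Python sketch that compares two series by the magnitudes of their first few DFT coefficients. I do not know the exact method used in the article; the choice of `k`, skipping the DC term, and ignoring phase are my own simplifying assumptions.

```python
import cmath
import math


def dft_magnitudes(series, k):
    """Magnitudes of DFT coefficients 1..k of series, normalised by length.

    Coefficient 0 (the mean) is skipped so a constant offset is ignored.
    """
    n = len(series)
    mags = []
    for f in range(1, k + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * f * t / n)
            for t, x in enumerate(series)
        )
        mags.append(abs(coeff) / n)
    return mags


def dft_distance(a, b, k=4):
    """Euclidean distance between the first k DFT magnitudes of a and b."""
    mags_a = dft_magnitudes(a, k)
    mags_b = dft_magnitudes(b, k)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(mags_a, mags_b)))
```

Because only magnitudes are compared, two copies of the same oscillation that are shifted in time come out as near-identical, which a point-by-point comparison in the time domain would miss. A production version would use an FFT library rather than this O(n·k) loop.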

                                                 

                                                 

                                      • Re: TagRank (another one of those "ideas"...)
                                        MichaelvdV@Atos

                                        Hi Moh,

                                         

                                        As I mentioned in the PM, this is also one of those ideas that has been crossing my mind for the past few months. I was thinking about what we could do with all this information stored in a PI system: could we come up with some 'datamining' algorithm/method of the kind you would normally see in a relational environment?

                                         

                                        If this is the case, we could have some semi-intelligence in the system, giving out notifications on abnormal behaviour, just like you said. I'm not a hardcore mathematician, but I could think of some ways this could be done.

                                         

                                        Given the enormous amount of historic data in a normal PI system, you could have some sort of datamining/analysis tool that tries to find references and relations between the information when you run it, and that keeps updating the references/information as the system runs. For instance: if the values of one tag/timeline always increase relative to another timeline, and a situation occurs where this is not the case, this could be a reason for a notification.
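
A very simple version of such a relation check could look like the sketch below. It interprets "always increase relative to another timeline" as "the two tags normally move in the same direction", which is my own reading; the function names and both thresholds are hypothetical.

```python
def direction_agreement(a, b):
    """Fraction of steps in which series a and b move in the same direction.

    Steps where either series does not change are ignored. Returns None
    when there is no usable step.
    """
    steps = min(len(a), len(b)) - 1
    moves = []
    for i in range(steps):
        da, db = a[i + 1] - a[i], b[i + 1] - b[i]
        if da != 0 and db != 0:
            moves.append(da * db > 0)
    if not moves:
        return None
    return sum(moves) / len(moves)


def breaks_relation(history_a, history_b, recent_a, recent_b, min_agreement=0.9):
    """True when two tags that historically moved together stop doing so."""
    historic = direction_agreement(history_a, history_b)
    recent = direction_agreement(recent_a, recent_b)
    if historic is None or recent is None:
        return False
    return historic >= min_agreement and recent < 0.5
```

The historic agreement would be recomputed periodically so that the learned relation keeps up with the system as it runs.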

                                         

                                        You mentioned PageRank: in its simplest form, we could build an algorithm that sorts out the most important tags in a system by looking at how often each one is referenced in Performance Equations/ACE/AF (just like PageRank, which looks at the number of references to a certain document on the web). I don't know if this will give you much information about the operation, but it will certainly give you insight into which tags are most important for your calculated values. Using PI OLEDB, this shouldn't be really hard to create.
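
Here is a minimal Python sketch of that reference-counting idea. It assumes the expression texts have already been fetched somehow (for example through PI OLEDB, which is not shown) into a plain dictionary, and that tags appear in the expressions as single-quoted names. Note this is plain reference counting, not the full iterative PageRank algorithm.

```python
import re
from collections import Counter


def count_tag_references(expressions, known_tags):
    """Rank tags by the number of expressions that reference them.

    expressions maps a calculation name to its expression text.
    Only single-quoted strings that match a known tag name are counted,
    so quoted time strings such as '*-1h' are ignored. A tag mentioned
    several times in one expression is counted once for that expression.
    """
    counts = Counter({tag: 0 for tag in known_tags})
    for text in expressions.values():
        for name in set(re.findall(r"'([^']+)'", text)):
            if name in counts:
                counts[name] += 1
    return counts.most_common()
```

For example, with three hypothetical calculations that all use `'FIC101.PV'`, that tag comes out on top with a count of 3, while a tag that no calculation references stays at 0.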