16 Replies Latest reply on Sep 28, 2015 8:11 PM by bshang

    "BIG DATA"; reality or hype?

    Ahmad Fattahi

      As I was reading this article in Forbes, it made me ponder what our PI Users Community thinks about the topic. What's Big Data to you? Does it carry a very specific connotation?

       

      PI Server has been around for 3 decades. Back then the size of datasets was several orders of magnitude smaller than today. However, the same will likely be true of today's datasets compared with those of a decade from now. So what is it about today's data that makes us coin the phrase Big Data? Is it not just smarter ways of handling (collecting, historizing, analyzing, visualizing, cleansing, detecting) whatever data we have? I know some mention volume, frequency, or heterogeneity of data as litmus tests for Big Data; however, such factors have always been evolving, with data becoming ever more complex and copious.

       

      We use our PI Systems, along with other data infrastructures, to create a smart and effective "digital nervous system" for the world. This has to be accomplished regardless of the evolution of the data sources around us. That's the bottom line, no matter if we call it Big Data or not.

       

      What do you think?

        • Re: "BIG DATA"; reality or hype?
          Lonnie Bowling

          Hi Ahmad,

           

          I think this article brings up a good point.  I often find myself cringing when I hear someone say "big data" because, to my mind, it is usually misused.  To give you an example of what I think real big data is: the 1000's of cameras in London are recording activity on the streets, and someone then wants to find a face in all that data.  I think of it as a lot of data, not well organized, and hard to analyze. I think larger PI systems with years' worth of data that no one ever looks at are a big data opportunity.  Often I see people talk about big data as just a large database, or even a small database, but they, or even we, have no real point of reference.

           

          I feel that the phrase is more hype and a marketing term at this point.  I do believe that big data is real, but really not understood by most.  I must admit, I'm more at a loss with what it really means to us right now.  How is it changing my life?  I can see that mobile is really changing things, but can the same be said for big data?

           

          Good question!!

           

          Lonnie

            • Re: "BIG DATA"; reality or hype?
              Ahmad Fattahi

              Great points and examples Lonnie! Looks like the term is starting to be misused/overused/abused quite often these days. Also, multiple communities and organizations are trying to define it for themselves. This has created some kind of fatigue. As far as I am concerned we have always been handling large amounts of data (relative to the time being) in our PI Systems. That's Big data to me anyway.

               

              To your point, mobile adds a great deal of heterogeneity in data types. Maybe that's a significant development. I am curious to see what others in the community think.

                • Re: "BIG DATA"; reality or hype?

                  I agree with what has been written so far: Big Data can be misunderstood.  People believe that big data means pockets of their data that have high volume, but to me big data means ALL of their data, not just the highly visible data.  Also, big data is relative, right?  One company's big data is another company's "small data".  Some people are also guilty of believing that big data means you have to store it in Hadoop, which is not true; you need to remove any technology from your mind to see that it doesn't matter how your data is stored.

                   

                  I read once that someone said "big data is lazy", you know, because once it is stored on disk it just sits there.  It's true: if your data just sits there and you don't exercise it, you're not going to trim it down to the nonobvious information you didn't know you were looking for.  It means companies of old that had specialities in machine learning algorithms, predictive analytics ... will be popular again.  So in summary the PI Server is lazy, it just eats lots of data all day long.

                    • Re: "BIG DATA"; reality or hype?
                      Ahmad Fattahi

                      Great points Rhys. What would you see or suggest doing in order to make PI Data more nimble and therefore extract more value out of it for the end customer? At the end of the day what matters is "I was able to do X with my PI Data"; what are those valuable X's in your mind? What Lonnie does with his mobile platform is a prime example.

                        • Re: "BIG DATA"; reality or hype?
                          hamort

                          This is a great topic.

                           

                          The following is an excellent article about Big Data:

                           

                          http://www.mckinsey.com/Insights/MGI/Research/Technology_and_Innovation/Big_data_The_next_frontier_for_innovation

                           

                          In my opinion the key point is that stored data is a quantifiable asset to a company. This of course also applies to PI, which is a library of processing conditions that can hold the key for operational improvements. So it is important to store and structure PI data properly so it can be data mined in the future. Poorly configured tags or a lack of documentation can diminish the value and make future analysis impossible.

                            • Re: "BIG DATA"; reality or hype?
                              mhalhead

                              Holger Amort

                              stored data is a quantifiable asset to a company

                               

                              I would disagree that stored data is a quantifiable asset. It would be impossible for me to say that x terabytes of data equates to y dollars. It is extremely difficult to attribute a value to information. Don't get me wrong, storing the data is potentially useful because without the data you're blind (you can't control what you don't measure). The value of data only becomes apparent when you analyse it (distil it to meaningful information) and then act upon that information.

                               

                              I find the whole "Big Data" discussion somewhat irritating. There is no real definition of what "Big Data" is. Stephen Few has a perspective (there is a link to his article from the post Ahmad did originally) which I tend to subscribe to. To me "Big Data" is merely an evolution of what we have been doing. What has changed is that computer processing power has increased, allowing us to collect, store and analyse more data. The bulk of the focus till now has been on the collection and storage. OSI has excelled at this; unfortunately the analysis side (I'm including visualisation in this) has been lacking. I'm not having a dig at OSI; they provide some of the best products on the market for the analysis and visualisation of process data. IMNHO the focus must now shift to analysis/visualisation; the good news is that I'm starting to see this happening, although slowly.

                               

                              As a side comment on Stephen Few: his books should be mandatory reading for anyone building a visualisation product or a dashboard.

                                • Re: "BIG DATA"; reality or hype?
                                  Ahmad Fattahi

                                  Holger Amort

                                  So it is important to store and structure PI data properly so it can be data mined in the future.

                                   

                                  Michael Halhead

                                  IMNHO the focus must now shift to analysis/visualisation; the good news is that I'm starting to see this happening, although slowly.

                                   

                                  When you guys talk about "so it can be data mined" or "analysis" can you be more specific? These are both very generic terms which can mean lots of different things. What in your fields of operation is the biggest source of need for such mining or analysis?

                                    • Re: "BIG DATA"; reality or hype?
                                      mhalhead

                                      Hi Ahmad,

                                       

                                      How long is a piece of string? I could literally provide a thesis on this (one of my staff has just submitted his PhD on the subject)

                                       

                                      It really does depend on the type of question that you are trying to answer. Analysis could range from a simple review of the time series trends to something more sophisticated, such as determining whether there is a correlation between the weather conditions and the plant performance for the past 5 years. The latter example is something that would be done offline by someone versed in exploratory data mining techniques.
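To make the weather-versus-performance example a bit more concrete, here is a minimal sketch of the simplest version of that question: a Pearson correlation between two aligned series. The variable names and the numbers are made up for illustration; in practice the two series would first have to be pulled from their sources and aligned to the same timestamps, which is usually the harder part.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of co-deviations and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Made-up daily averages, for illustration only (not real plant data).
ambient_temp_c = [12.0, 15.5, 19.0, 23.5, 28.0, 31.0]
plant_efficiency_pct = [91.2, 90.8, 90.1, 89.0, 88.1, 87.5]

r = pearson(ambient_temp_c, plant_efficiency_pct)
print(f"r = {r:.3f}")
```

A value of r near -1 or +1 only says the two series move together linearly; it says nothing about cause, which is where the exploratory work by someone who knows the process comes in.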

                                       

                                      What I do see as a more immediate trend is to take some of the exploratory techniques from an offline process to more of an online process. An example that we are looking at is implementing a change point detection algorithm. The idea is that it would be beneficial to create events (using EF) when we detect a change in the performance of the operation, with the aim of shortening the time between the event occurring and its detection and resolution.
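For readers unfamiliar with change point detection, the idea can be sketched with a two-sided CUSUM, one of the simpler techniques in that family. This is not necessarily the algorithm referred to above; the function name, the parameters (`baseline_len`, `drift`, `threshold`) and the synthetic signal are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def first_change_point(values, baseline_len=20, drift=0.5, threshold=5.0):
    """Return the index where a two-sided CUSUM first flags a mean shift.

    The first `baseline_len` samples define "normal" (mean and std dev).
    `drift` and `threshold` are expressed in baseline standard deviations.
    Returns None if no shift is flagged.
    """
    baseline = values[:baseline_len]
    if len(baseline) < 2:
        raise ValueError("need at least 2 baseline samples")
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        raise ValueError("baseline is constant; cannot normalise")
    high = low = 0.0  # cumulative evidence of an upward / downward shift
    for i, x in enumerate(values):
        z = (x - mu) / sigma
        # Accumulate deviations beyond the allowed drift; never go below zero.
        high = max(0.0, high + z - drift)
        low = max(0.0, low - z - drift)
        if high > threshold or low > threshold:
            return i
    return None


# Synthetic signal: steady around 10.1, then a step up to around 12.1.
signal = [10.0, 10.2] * 15 + [12.0, 12.2] * 15
idx = first_change_point(signal)
print(idx)
```

In an online setting the returned index would be mapped back to a timestamp and used to open an event; that step is left out here because it depends entirely on the system doing the eventing.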

                                       

                                      What we are finding is that we typically need data from more than one data source; PI, LIMS, environmental monitoring, energy management systems, ... And you need a good mechanism to display the information to those that must act on it. There will always be an element of offline data analysis done with a variety of tools ranging from Excel to Matlab (or any other heavy numerical analysis package like R, Statistica, ... ).

                                       

                                      Starting with the integration between sources: one way is to pull all this data into PI. Another would be to use something like AF to surface the data. Both have benefits. Personally I like the latter, leaving the data in place and surfacing it. Unfortunately, we tend to do the former due to limitations of things like ACE, PE, ...

                                       

                                      Visualisation is a big issue. You can do all the brilliant maths detecting the problem, but if the users don't see this information in a meaningful way they won't react. Visualisation of EFs and making the data from them available is something that I see as a first big step in the right direction. It gives you the ability to say to the user: hey, something interesting happened here, there was a shift in your performance, maybe you should go and look at this in more detail; you can ignore all the other stuff because that was the same as normal.

                                        • Re: "BIG DATA"; reality or hype?
                                          Ahmad Fattahi

                                          Great feedback Michael. I appreciate the fact that the needs and wants abound. My take from your piece above is that you would like better "surfacing" tools and integration features available at your fingertips. Creating such features natively in PI System wouldn't be as important, since you prefer keeping data where they are. Also, better visualization would be key to the quality of delivery to the end users. Please correct me if I missed anything major.

                                           

                                          I am curious what other folks think on the subject as well.

                                            • Re: "BIG DATA"; reality or hype?
                                              mhalhead

                                              Hi Ahmad,

                                               

                                              You have it about right.

                                               

                                              Project Trident is an interesting project that fits quite nicely into this discussion. I would love to see something similar for our field.

                                                • Re: "BIG DATA"; reality or hype?
                                                  Ahmad Fattahi

                                                  I find the following description of Project Trident of interest. Do you guys have so many services that you would need a registry of such services? How many would you say exist across your enterprise?

                                                   

                                                  "An increasing number of tools and databases in the sciences are available as Web services. As a result, researchers face not only a data deluge, but also a service deluge, and need a tool to organize, curate, and search for services of value to their research. Project Trident provides a registry that enables the scientist to include services from his or her particular domain. The registry enables a researcher to search on tags, keywords, and annotations to determine which services and workflows—and even which data sets—are available."

                                                    • Re: "BIG DATA"; reality or hype?
                                                      Lonnie Bowling

                                                      Just to keep the conversation going, I listened to this DotNetRocks podcast recently and it is along the lines of this thread.  There are some interesting points about what Big Data is and what it means, with some history on where it came from and where it is headed.  They also talk about BI and how it relates to "big data".   Good stuff!

                                                       

                                                      www.dotnetrocks.com/default.aspx

                                                       

                                                      Lonnie

                                                        • Re: "BIG DATA"; reality or hype?
                                                          mhalhead

                                                          Ahmad

                                                          Do you guys have so many services that you would need a registry of such services?

                                                           

                                                          I wouldn't say that we have a huge number of services, but what we do have is a discovery issue in the company; the company also has the corporate memory of a goldfish. This is unfortunately not abnormal for big companies. Although the service repository is interesting, what caught my attention was the workflow setup and data provenance (something auditors like).

                                                           

                                                          Lonnie, I'm a regular .NET Rocks (and Tablet Show) listener. This was an interesting show. Andrew Brust's definition of big data isn't bad: Volume, Velocity and Variety. We certainly have Velocity and Variety, and you could argue that we have Volume. The volume portion is an interesting argument: what constitutes big, given that most process data is actually a sparse data set?

                                  • Re: "BIG DATA"; reality or hype?
                                    bala

                                    Hi All,

                                     

                                    Thanks for sharing such wonderful content.

                                     

                                    I just went through Gartner's Hype Cycle of Emerging Technologies 2015, which shows the Internet of Things being hyped more than Big Data.

                                     

                                    Also, Top 10 Strategic Technology Trends for 2015 explains that the top trends for 2015 cover three themes: the merging of the real and virtual worlds, the advent of intelligence everywhere, and the technology impact of the digital business shift.

                                     

                                    Big data remains an important enabler for the Advanced, Pervasive and Invisible Analytics trend, but the focus needs to shift to thinking about big questions and big answers first and big data second; the value is in the answers, not the data.