All Places > PI Developers Club > Blog > Author: Ahmad Fattahi

[If you are already familiar with Hadoop, you can skip to the last two paragraphs.] There is a strong trend in the Big Data community around Hadoop and its different layers and implementations. Google started the MapReduce software framework along with GFS as the accompanying distributed file system. Then came the open source version, expanded by Yahoo and others, delivering MapReduce and HDFS as the software and file system frameworks respectively. Nowadays a lot of Big Data organizations have their own Hadoop clusters capable of handling many terabytes and petabytes of data with a good level of fault tolerance.


The main advantages of Hadoop are its cheap implementation, its tolerance of unstructured data, and its fault tolerance. The justification is that Hadoop can provide enough redundancy to stay available in the face of normal hardware and software failures. That redundancy is of course possible because of low hardware cost. Also, we don't need to process all the incoming data on the spot; Hadoop can store unstructured data for future processing. The downside is that sometimes you get incoherence among data nodes. Also, for smaller amounts of data, traditional database systems or Parallel Data Warehouses (PDW) can get the job done with far fewer nodes. To show the trade-off, consider this example: eBay handles roughly a half or a third as much data as Facebook. eBay runs on PDW while Facebook runs Hadoop. Guess how much bigger Facebook's cluster is? About 10 times! Also, you sometimes see incoherence in Facebook updates, which is a direct result of multiple copies and cheaper hardware. All in all, both paradigms have merits and can co-exist to serve different purposes. In fact, Microsoft adopted Hadoop in 2011. For the same reason some organizations (including Microsoft) embrace Sqoop (SQL to Hadoop) to bridge the gap.


Now the question is what you think of the future of Hadoop and the PI System. Do you see any feasible integration or use there? For one thing, we cannot break the archive files at arbitrary places into the 64 MB chunks that Hadoop requires. What if we make archive shards of 64 MB each and replicate whole copies of each on Hadoop data nodes? What operations do you think fit well in the MapReduce paradigm? Many of the more advanced analytics can very well be broken down (MapReduced) into parallel operations. How about data collection in the first place?


Do you see any downside to that? Do you see any major obstacle that would keep the PI System from fitting well into the Hadoop platform?
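
To make the MapReduce question a bit more concrete, here is a minimal sketch in plain Python of how a simple statistic could be split into a per-shard map step and a merge (reduce) step. It does not use Hadoop or any PI System API: the shard layout (lists of timestamp/value pairs) and the names `map_shard`, `combine`, and `summarize` are assumptions made up for illustration, and every shard is assumed to be non-empty.

```python
from functools import reduce

# Hypothetical stand-in for archive shards: each shard is a non-empty list of
# (timestamp, value) events. Real PI archive files are not laid out like this.


def map_shard(shard):
    """Map step: boil one shard down to a small partial summary."""
    values = [value for _, value in shard]
    return {
        "count": len(values),
        "total": sum(values),
        "min": min(values),
        "max": max(values),
    }


def combine(a, b):
    """Reduce step: merge two partial summaries; order does not matter."""
    return {
        "count": a["count"] + b["count"],
        "total": a["total"] + b["total"],
        "min": min(a["min"], b["min"]),
        "max": max(a["max"], b["max"]),
    }


def summarize(shards):
    """Run the map step per shard, then fold the partial results together."""
    partials = map(map_shard, shards)  # each call could run on a different node
    merged = reduce(combine, partials)
    merged["mean"] = merged["total"] / merged["count"]
    return merged


if __name__ == "__main__":
    shards = [
        [(0, 10.0), (1, 12.0)],
        [(2, 8.0), (3, 14.0), (4, 16.0)],
    ]
    print(summarize(shards))
```

Because the map step only needs its own shard and the merge step only needs the small summaries, whole shards could be replicated on data nodes without ever being split at arbitrary byte offsets.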

Thursday was Day 2 and the final day of the conference. Very interesting was the talk by Twitter on their Storm platform for distributed, fault-tolerant real-time computation. Storm is an open source platform made for Big Data analytics such as Twitter-scale word counting. The whole structure is "divide and conquer" and takes advantage of Hadoop architecture.
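
As a rough illustration of the "divide and conquer" idea behind streaming word counts, here is a small sketch in plain Python. It is not Storm's API: the function names (`split_sentence`, `partition`, `count_stream`, `merge`) and the choice of a CRC32 hash for routing are assumptions for this example. The point is only that every occurrence of a word is routed to the same counter, so the counters can run independently.

```python
import zlib
from collections import Counter


def split_sentence(sentence):
    """Break an incoming sentence into lowercase words."""
    return sentence.lower().split()


def partition(word, n_workers):
    """Route a word to a worker; the same word always maps to the same worker."""
    return zlib.crc32(word.encode("utf-8")) % n_workers


def count_stream(sentences, n_workers=3):
    """Consume sentences one at a time and keep running counts per worker."""
    workers = [Counter() for _ in range(n_workers)]
    for sentence in sentences:
        for word in split_sentence(sentence):
            workers[partition(word, n_workers)][word] += 1
    return workers


def merge(workers):
    """Combine the per-worker counts into one overall view."""
    total = Counter()
    for counts in workers:
        total.update(counts)
    return total


if __name__ == "__main__":
    stream = ["big data big", "data work"]
    print(merge(count_stream(stream)))
```

In a real deployment each counter would live on a different machine and the stream would never end; here everything runs in one process to keep the sketch self-contained.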


Also very interesting was a talk by Skytree on Machine Learning and big data (my favorite!) They offer a suite of advanced Machine Learning algorithms in the Cloud. The claim was that on major existing Machine Learning algorithms they outperform others by orders of magnitude. The front end can be MATLAB, R, or command line, among others.


There was also a very interesting talk by CrowdControl who offer crowd sourcing services. The mechanics are handled by Amazon Mechanical Turk. It was interesting to see that in fact a marketplace for crowd sourcing big tasks exists out there. Microtasks are priced at cents apiece; pretty cheap!


In general the energy was very high and the buzz words were power of data, Hadoop, R, and analytics. It was my first appearance at Strata but I heard from several people that it is the biggest and most energetic one so far. To give you an idea of the boom in this market, here is the picture of one of the several job boards at the conference:



Wednesday was the official opening of the event. There were 8 parallel sessions going on all day long, so it was like drinking from a fire hose. The keynotes came from a diverse group, from hard-core data analysts to computer organizations such as Microsoft, as well as a doctor who used big data to analyze different drugs and detect deceptive marketing practices.


Among other sessions, Microsoft Hadoop was a great session I attended. It was presented by Alexander Stojanovic, General Manager for Cloud Analytics and Computation at Microsoft. He emphasized how Microsoft recently adopted and embraced the open source Hadoop platform aimed at cheap, distributed storage and processing of Big Data (petabytes). Also, being open source makes the platform more vibrant and open to the public. They have an end-to-end approach: data management, data enrichment, and analytics with an emphasis on self-service. Microsoft Hadoop is now offered for Windows Server and Azure. It also comes with a Hive add-in for Microsoft Excel for direct transfer of data between Excel and Hadoop clusters. A couple of partners who have adopted the CTP of the product (Hortonworks and Karmasphere) were showcased.


Another very interesting talk was offered by Netflix. They went over the algorithms and some details of how they rank their movies in a personalized fashion. They deal with 2 billion ratings right now and add millions every day. You can imagine how difficult the problem can get.


The Exhibit Hall was very successful with tens of organizations, large and small, offering Big Data and analytical services. Several smaller firms focus on doing analysis on Big Data for you. Interestingly enough, many of them offer their unsupported solutions in an open-source fashion for free.

I attended the O'Reilly Strata 2012 today (Tuesday). It was the day for workshops; the conference will officially start on Wednesday morning and will last two days.


It has been great so far and much bigger than I expected. The motto is "Making data Work", which is so close to the heart of all of us here. I attended two workshops today. The first one was on "R" (the open source version) and "Revolution R" (the commercial version), presented by Revolution Analytics. It was a great review of the power of R in general as well as what the commercial version, Revolution R, can offer in statistical analysis and machine learning.


The second workshop was titled "The two most important algorithms in predictive modeling today".  The speakers, Jeremy Howard and Mike Bowles, talked about two major machine learning approaches today: random forests and glmnet. It was a very exciting and interactive session with some hands-on as well as theoretical discussions (Mike Bowles is a former MIT professor).


Overall, so far it has been a very well-attended (sold out!) event. On the algorithmic side there is a lot of buzz around R. On the implementation side, you hear a lot about Hadoop and MapReduce. There are big company names around, both as sponsors and presenters, such as Microsoft, IBM, Netflix, EMC, LinkedIn, VMware, Oracle, and Amazon, to name a few. Will post more updates soon!



Today marks the 3rd anniversary of our beloved Community! On this day in 2008, OSIsoft vCampus was officially born paving the way for numerous collaboration opportunities, discussions, webinars, and other PI System development as well as social activities. After 3 years we are proud to have entertained:

  • 1800 members
  • 1800 discussion threads - 11000 posts
  • 250 blog posts
  • 40 exclusive white papers and tutorials
  • 33 webinars
  • Thousands of software downloads
  • Two Live! events with the third one approaching in 2 weeks
  • the vCampus All-Star program since 2010

Many thanks to all of you and everyone who contributed to the success of our community.


Our next vCampus Live! event is happening on Nov 30 - Dec 1, 2011 in San Francisco. It should be a great opportunity for all of us to come together, discuss geeky as well as nontechnical matters, learn, and have fun! Also, expect a number of new announcements from OSIsoft as well as the vCampus community during the event. Whether or not you are planning to attend, you can stay tuned by following us on Twitter @OSIsoftvCampus and following the hashtag #VCL11.


We are, as always, eager to hear from you how and what we can do better. Please share your ideas with us on the forums or contact us. Looking forward to another bright and fruitful year for our community!



It sounds like a very difficult question. There are so many factors involved, such as laptops, desktops, servers, smart phones, etc. Also, don't forget the routers and the energy to produce all such devices. A pair of researchers from University of California, Berkeley tackled the question and published their results.


Answer: 2% of the whole world's energy (somewhere between 107 and 370 GW)!


It might sound like a big number, but the fact is that it offsets several more power-intensive activities. For example, the researchers say, attending a meeting physically consumes 100 times more energy than attending it virtually! This is, by the way, aligned with our mentality in the PI Community: we want to use our infrastructure to make things work more efficiently and get higher value out of our investments.


Where do you think the energy share of the Internet will be in 10 or 20 years from now?

Ahmad Fattahi

Steve Jobs

Posted by Ahmad Fattahi Employee Oct 10, 2011

Everybody was deeply saddened and shocked when the news of Steve Jobs' death was announced last week. The real impact of his absence is yet to be seen over the coming years on Apple as well as the tech world.


It is true that he was a great, exceptional visionary who shaped several amazing technologies, mindsets, and products. However, how high you place him in the all-time ranking is another question. These days many people compare him to Thomas Edison and Henry Ford. Some even say he had what all those folks had collectively. Where would you place him? How would you compare him with the likes of Henry Ford and Thomas Edison?


Here are a couple of pictures I took the other night at the Apple store in Palo Alto, near where he lived and where he used to show up frequently. Countless fans and enthusiasts posted Post-it notes to cherish his legacy.





OSIsoft vCampus Live! 2011 is around the corner (Nov 30). With so many valuable presentations and hands-on sessions, along with the developers lounge and learning labs, there is something for everyone (hint: registration is open).


I will be presenting on "Machine Learning for Prediction Purposes on PI System Data". You will learn what machine learning means and how you can apply it to your existing PI data in order to make predictions, do preemptive maintenance, and make strategic decisions with invaluable insight into the future trajectory of your data streams. We will talk about the PI System along with SQL Server Analysis Services and MATLAB. You will see some real-world examples as well. Here is a short video I made to explain what my presentation is about:



See this piece and take security seriously on the declared "Day of Vengeance"!

Here is a picture of the Windows 8 Samsung tablet given out on Tuesday. It looks very smooth for the most part. However, given that this is a developers' prerelease version there are inevitably some rough edges here and there.




A very cool feature of Windows 8 is that when you are not using an application, all of its processes go into a suspended mode. So, it stays in memory but doesn't use any CPU. See my task manager a few seconds after I moved away from (didn't close) my IE and piano apps. This should save a lot of energy and also make visible apps run a lot faster.





Today (Tuesday) is the opening day of Microsoft BUILD, which is revealing some super cool features of the upcoming Windows 8. The morning has been packed with keynotes and very exciting demos. Some of the key and very cool features of the upcoming Windows are:

  • Totally touch-friendly
  • Whatever runs on Windows 7 will run on Windows 8
  • You can keep using mouse and keyboard if you want
  • The start-up time from sleep is less than 10 seconds!
  • The OS is very power-savvy, shutting down all the processes when the corresponding application is not visible or in use
  • Great Cloud and sharing integration
  • Easy-to-create app environment
  • Publish your applications directly from Visual Studio!

Over the next few days more and more details will be revealed.



OSIsoft vCampus Live! 2011 is right around the corner! Our annual event is scheduled for Nov 30 - Dec 1 2011 in San Francisco. It is the best place to gain insight and in-depth knowledge on PI System technology and meet with fellow PI System professionals as well as OSIsoft developers and staff. There will be something of great and immediate value for anyone working with the PI System in this year's vCampus Live! event! The event is specifically designed for:

  • Technology executives in the manufacturing, process and IT industries
  • PI System developers
  • System architects (especially those with enterprise-level roles)
  • OEM and VAR software engineering staff
  • System integrators and ISVs
  • Enterprise PI System managers
  • PI System administrators and PI System power users
  • Anyone using PI System on a regular basis



Based on the feedback from the OSIsoft vCampus community, we are providing a unique experience this year with lots of hands-on sessions where you can try and learn existing and cutting edge PI Technology with an OSIsoft developer or specialized engineer. We will also have regular presentations filled with new and exciting developments around the PI System from fellow vCampus Community members and OSIsoft staff!


On top of that, there will be a Developers Lounge to give attendees a one-of-a-kind opportunity to meet, network, and talk with other OSIsoft developers in a relaxed environment.


 It is the best opportunity to:

  • Network and talk to fellow users of PI System from a technical perspective
  • Learn more about PI System from fellow users of PI System as well as OSIsoft Product Managers and Developers
  • Experience the development of applications using the PI System through the hands-on sessions

Register today to take advantage of the reduced rates for the Early Birds!

This is a real burden for many computer-savvy as well as ordinary users of multiple computers. It is common practice these days to have a work and a personal computer, not to mention tablets and other types of personal gadgets. How do you keep the files in sync?


The following article nicely enumerates a number of different ways to tackle the challenge. It might be useful for personal users as well as system administrators to be aware of the existing options.



"When I helped design the PC, I didn't think I'd live long enough to witness its decline. But, while PCs will continue to be much-used devices, they're no longer at the leading edge of computing. They're going the way of the vacuum tube, typewriter, vinyl records, CRT and incandescent light bulbs," said Mark Dean, one of the original designers of the IBM PC and current CTO for IBM in the Middle East and Africa. Of the 9 patents for the IBM PC, he holds 3.


What do you make of this prediction? The typewriters and vinyl records have almost vanished everywhere. Do you see it coming for PCs in the next decade? To read more see here.



Posted by Ahmad Fattahi Employee Aug 10, 2011

