This post will contain an overview of an internship project designed to showcase the value of the PI System in a Smart City context. The project was undertaken at the Montreal OSIsoft office, by two interns – Georges Khairallah and Zachary Zoldan  – both of whom are studying Mechanical Engineering at Concordia University in Montreal.

As a proof of concept for the PI System as a tool for the management of smart cities, we chose to bring in data from public bike sharing systems. We then set about collecting data from other sources that may affect bike share usage – data such as weather, festival and events, public transit disruptions and traffic data. In undertaking this project, we hope to better understand the loading patterns of bike sharing systems, analyze how they react to changes in their environment, and be able to predict future loading.

 

 

 

What is a Smart City?

 

A smart city implements technology to continuously improve its citizen’s daily lives. Through analyzing data in real time, through the movement of people in traffic to noise and pollution, a Smart City is able to understand its environment and quickly adjust and act according to ensure it maximizes the usage of its resources. A smart city’s assets can all be monitored in real time, something which is becoming easier with the emergence of IoT technologies.

 

We chose to monitor bike sharing systems due to the abundance of live and historical open data, and the numerous analyses which can be run on the system. Bike sharing systems represent one aspect of a smart city, and by demonstrating what can be done with just this data, we can show the potential value of the PI System in a Smart City context.

 

 

 

What is a bike sharing system?

 

A bike sharing system is a service which allows users to rent bikes on a short term basis. It consists of a network of stations which are distributed around a city. Each station consists of a pay station, bike docks, and bikes that the user can take out, and drop off at any other station.  Bikes are designed to be durable, resistant to inclement weather, and to fit the majority of riders.

Users can be classified as “members” or “casual users”. Members are those who pay an annual, semi-annual, or monthly fee for unlimited access to the bike sharing network. Members access a network with a personal key fob, allowing the bike sharing system to keep a record of a members’ trips. Casual users pay for a one-way, 24-hour or 72-hour pass using their credit card at the pay station.

 

o1.png

 

Every bike is equipped with unique identifier in the form of a RFID tag, which keeps track of the station it was taken out from and returned to. Coupled with a member’s key fob identification, or a 4 digit-code generated from the pay station for an occasional user, we can track each trip and determine whether it was taken by a member or a casual user.

 

 

What datasets were available to us?

 

We had access to live open data from numerous bike sharing systems , from cities such as Montreal, New York City, Philadelphia, Boston and Toronto. Each bike sharing system posts data in the form of a JSON document, updated every 10-15 minutes, containing data pertaining to individual bike stations. It reported back with the number of docks available at a station, number of bikes available at a station as well as station metadata such as station name, longitude/latitude positioning and station ID.

Example Live Data.png

We also had access to historical trip data from different bike sharing systems. This data came in the form of CSV files, with each row entry representing a single bike trip. Each individual trip is tracked using an RFID chip embedded in the bike, and can tell us the start time/end time of a trip, start station/end station and whether the trip was taken by a member or a casual user.

The specifics of the historical data vary from system to system. Montreal’s BIXI system only offers the trip duration and account type on top of the trip information, whereas New York City’s CITIBike also offers data on member’s gender and age.

 

o99.png

 

We then brought in data sets which we thought might affect bike usage – data such as weather data, festival and events data, traffic data, and public transit disruptions. Weather data was collected from an open weather API, and public transit data was collected using a web scraper and the PI Connector for UFL.

Importing data into the PI System:

Much of our data was text-based, so we chose to use the PI Connector for UFL to import the data into our PI System. The Connector was set up using the REST endpoint configuration for most of our data sources. It was also set up to create PI AF elements and our PI AF hierarchy.

We stored most of our live data as PI Points, while the historical trip data was stores as Event Frames.

 

To learn more about using the PI Connector for UFL to create event frames, click here.

To learn more about using the PI Connector for UFL as a REST endpoint, click here.

To learn more about creating PI AF elements with the PI Connector for UFL, click here.

 

Historical Data Analyses:

 

The historical bike trips have a designated start and end time – to us, it seemed like the best way to store this data was as Event Frames, with an Event Frame for each bike trip. We used the PI Connector for UFL to create Event Frames from our CSV files, we loaded more than 5 million event frames in our PI System!

We used the PI integrator for BA to publish various “Event Views” and then used Power BI to visualize the data. We can then access this data using a Business Intelligence tool such as Microsoft Power BI.

 

June 2015 Trips vs Rain.png

Above we can clearly see that the days which experienced the highest amount of usage had 0 rainy events in other words it was a sunny day (as shown by the variable “temp_numerical”), whereas the days that held the least amount of trips had the most amount of rainy events.

 

o5.png

Above, we have a dashboard that represents the trips from June 2015. It holds a total of 556,000 trips.

On the top left, we can observe bubbles that represent the location of the stations. Furthermore, the color of the bubble shows to which neighborhood it belongs to, and finally the size of the bubble indicates the amount of trips made from that bubble (or station).

On the bottom right, we can determine that “Le Plateau” neighborhood is the most popular, with “Downtown” coming in second place.

Next, we can slice through “Le Plateau” using the donut chart on the top right, to determine whether all of the stations were equally used, or some stations outperformed others?

 

o56.png

Looks like many stations outperform other, why is that? Although we observe two stations nearby, one of them holds 7140 trips whereas the other has only 2378 trips, the station that is nearest to metro stations experience the most trips because it is the path of least resistance.

 

o7.png

 

This was the same case for “Downtown” neighborhood!

Since the bike share companies have information regarding the revenue each bike station is approximately generating, the city can make better financial decisions regarding the possibility of withdrawing stations under-performing because at some point maintenance costs outweigh profit.

 

 

 

Live Data Analyses:

 

Using AF analyses and the live data available, we created several KPIs in order to keep track of usage at each station throughout a city’s bike sharing network. These include station utilization (how empty a station is), and 4/8/12/24 hour combined in/out traffic events.

 

Let’s take a look at the 4 hour combined in/our traffic events in Montreal, over one weekday:

 

 

4 Hr total traffic - Montreal only.png

We see that there are two characteristic peaks, one at 11 am (representing traffic from 7 am - 11 am) and another at 8 pm (representing traffic from 4 pm - 8 pm). As expected, this corresponds to commuting times for the standard 9-5 workday. But is this trend repeated on the weekends?

Let’s look at the same metric, but over one week. Different colors indicate different days. (Saturday-Friday, Left-Right)

We can see the two characteristic weekday peaks which were present in the previous graph, but the usage on weekends tell a different story. There is only one usage peak on the weekend, since users are not using the bikes to commute to work.

 


Can we use the PI system to explain why the midday usage on Monday has dropped off compared to other weekdays? Let’s bring in our weather data and see if it had any effect on the usage.

o8.png

The added purple line which varies from 0 to 1, acts as a marker to when it was raining in Montreal, with "1" representing rainy conditions. Now, we have a clear explanation for why Monday July 18th experience a drop in usage during the day, it was raining!

 

o9.png

 

 

 

Predicting future demand:

 

A key issue which bike share system managers must deal with is the need to ensure bikes are available to users at all times. This means they often need to redistribute bikes from one station to another to ensure bike availability. System administrators need to be able to predict usage at a station level and on the system level to account for this.

Using data collected with the PI system, we can predict the future usage at the station level and at the network level. We looked at the value at a given time, say 3:00 PM on a Monday, and then averaged values from the past three weeks. We were able to accurately predict future usage within a margin of error of 2%-5% . We store the forecasted future usage using AF analyses and PI System’s “future data” option.

o43.png
Mapping Smart City assets with the PI Integrator for ESRI ArcGIS and ESRI Online:

To learn more about mapping smart city assets with the PI Integrator for ESRI ArcGIS, click here.

 

From the beginning of the project, we knew that the geographical location of our assets would be incredibly important for our analysis. The usage of any bike station is directly tied to its location – those located closer to major office buildings, pedestrian malls, and points of interest will experience an increase in usage. By mapping our assets (along with all associated PI points and KPIs), we can view any geographical trends that might emerge.

 

We started off by mapping our AF assets (Bike stations in Montreal) using the ESRI online. The examples below show how we can view our elements in a much more intuitive way, and combine secondary data sets such as the location of bike paths, subway stations and bike accidents. We set the marker size to be tied to a specific station attribute, the 4-hour total in/out traffic events.

 

We can click on any circle and bring up a configured pop-up showing all relevant AF attributes and KPIs.

 

o87.pngo88.png

We can then further enrich this data by adding other map layers. For example, we can see that most bike stations coincide with the location of bicycle paths (left) and the location of subway stations (right).

 

o90.pngo91.png

 

 

Conclusion:

 

In this post, we were able to show the value of the PI System in just analyzing bike sharing systems, which is just one small aspect of a smart city. The PI System excelled at data collection, time-based analysis, and helping us visualize numerous datasets. We could have performed a much more in depth analysis if we had access to other closed smart city datasets such as power consumption, waste management and more. The potential for further analysis is there, all we need is access to the data.

 

 

 

Demo Video