Although developing models and algorithms in environments such as R and Python is straightforward, integrating these scripts is not, especially in Windows environments. The main reason is that the scripts require a runtime environment to execute, and any other system needs to communicate with these engines. Or in .NET terms: you have to call unmanaged code from managed code.
The other main hurdle is scalability. I believe R and Python can be set up to be thread safe, in that they provide a lock so that they can be called from different threads. But since script execution is single threaded, adding more calculations requires separate engines, cores, or threads. The challenge then is to distribute calculations across a core pool in order to minimize latency and take full advantage of parallel execution.
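To make the core pool idea more concrete, here is a minimal sketch in Python. It is not the service's actual implementation: the `Core` and `CorePool` names are hypothetical, each "core" is just a worker thread with its own input queue, and a plain Python function stands in for a call into an R or Python engine. New work goes to the core with the shortest queue.

```python
import queue
import threading
from concurrent.futures import Future


class Core:
    """One single-threaded worker with its own input buffer (hypothetical name)."""

    def __init__(self, name):
        self.name = name
        self.inbox = queue.Queue()
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self):
        # Process jobs one at a time, in arrival order, until a None sentinel arrives.
        while True:
            item = self.inbox.get()
            if item is None:
                break
            func, args, future = item
            try:
                future.set_result(func(*args))
            except Exception as exc:
                future.set_exception(exc)

    def submit(self, func, *args):
        future = Future()
        self.inbox.put((func, args, future))
        return future

    def pending(self):
        # Approximate number of queued jobs; the job currently running is not counted.
        return self.inbox.qsize()

    def stop(self):
        self.inbox.put(None)
        self.thread.join()


class CorePool:
    """Sends each job to the core with the fewest queued jobs (hypothetical name)."""

    def __init__(self, size):
        self.cores = [Core(f"core-{i}") for i in range(size)]

    def submit(self, func, *args):
        core = min(self.cores, key=Core.pending)
        return core.submit(func, *args)

    def stop(self):
        for core in self.cores:
            core.stop()
```

A caller would create the pool once, for example `pool = CorePool(3)`, submit work with `pool.submit(pow, 2, 10)`, and read the value with `.result()` on the returned future. Queue length is only a rough measure of load; a real balancer would probably also weigh how long each script is expected to run.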
Another aspect is that R and Python scripts are very different from C# or VB.NET code: they typically run slower, and the calculations are also more complex. Tuning a system of tens or hundreds of models requires a lot of metrics, such as execution time and client/server latency. Ideally, these metrics could also be used to load balance the system automatically.
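As a rough illustration of the kind of metrics meant here, the following Python sketch records execution times per model and summarizes them. The `Metrics` class and its field names are assumptions made for this example, not part of the service.

```python
import time
from collections import defaultdict
from statistics import mean


class Metrics:
    """Collects execution times in milliseconds per model (hypothetical helper)."""

    def __init__(self):
        self.samples = defaultdict(list)

    def record(self, model, elapsed_ms):
        self.samples[model].append(elapsed_ms)

    def timed(self, model, func, *args):
        # Run func, record how long it took, and pass its result through.
        start = time.perf_counter()
        result = func(*args)
        self.record(model, (time.perf_counter() - start) * 1000.0)
        return result

    def summary(self, model):
        data = self.samples[model]
        return {"count": len(data), "mean_ms": mean(data), "max_ms": max(data)}
```

A load balancer could, for instance, compare `summary(...)["mean_ms"]` across models to decide how many cores each one should get.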
The idea was to create a web service that provides a common interface and can communicate with different scripting languages. The web service can be configured to spin up several cores, each with its own communication channels and internal buffering. The cores will load standard and custom libraries on start-up based on configuration settings.
The service is pretty simple and just has GET commands for each language, for example:
For each call, the returned JSON contains the time taken for core processing and for execution.
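To give a feel for what such a response could look like, here is a small Python sketch that wraps a call and reports both timings. The field names `result`, `executionTimeMs` and `coreTimeMs` are invented for this example and are not the service's actual JSON schema.

```python
import json
import time


def run_with_timing(func, *args):
    """Run func and return a JSON string with the result and two timings."""
    received = time.perf_counter()
    # In a real service, queueing and data marshalling would happen around here.
    exec_start = time.perf_counter()
    value = func(*args)
    exec_end = time.perf_counter()
    done = time.perf_counter()
    return json.dumps({
        "result": value,
        # Time spent inside the script itself.
        "executionTimeMs": round((exec_end - exec_start) * 1000, 3),
        # Total time spent in the core, including the execution time.
        "coreTimeMs": round((done - received) * 1000, 3),
    })
```

The difference between the two numbers is the overhead the core adds on top of the script itself.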
The data model is based on the sequence of operations necessary to execute a command or script. There are Push, Pull and Execute operations, and an equation such as a + b = c would be split up into the following four operations:
- Push a
- Push b
- Execute: a + b = c
- Pull c
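The four operations above can be sketched in Python as follows. This is only an illustration of the idea: the `Operation` class and `run_sequence` function are hypothetical, and a local Python dictionary stands in for the workspace of a remote R or Python engine.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Operation:
    kind: str                   # "push", "execute" or "pull"
    name: Optional[str] = None  # variable name for push and pull
    value: Any = None           # value to push
    code: Optional[str] = None  # script text for execute


def run_sequence(operations):
    """Apply the operations in order and return the pulled variables."""
    workspace = {}  # stands in for the engine's variables
    pulled = {}
    for op in operations:
        if op.kind == "push":
            workspace[op.name] = op.value
        elif op.kind == "execute":
            # Run the script with the workspace as its local namespace.
            exec(op.code, {}, workspace)
        elif op.kind == "pull":
            pulled[op.name] = workspace[op.name]
        else:
            raise ValueError(f"unknown operation: {op.kind}")
    return pulled


ops = [
    Operation("push", name="a", value=2),
    Operation("push", name="b", value=3),
    Operation("execute", code="c = a + b"),
    Operation("pull", name="c"),
]
print(run_sequence(ops))  # prints {'c': 5}
```

Because the whole sequence travels as one list, the client needs a single round trip rather than one call per operation.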
The advantage of this approach is that it minimizes the number of calls to the server. It also makes it possible to provide a common interface to both languages, which makes client-side programming easier and allows flexibility in how models or scripts are called and recalled.
Currently the data model supports the following data types:
| Data Frame (only for R) | ✓ | ✓ | ✓ |
Here are just a few examples:
Push double value:
Push and pull same value:
Sleep for 1 second:
Note: All times are in milliseconds.
In the next blog post I will provide some more examples and also share some load balancing metrics.