The power of S4, Yahoo’s distributed stream computing platform, in telco?
In October 2010 Yahoo made another internal system open source: S4. S4 is a general-purpose, distributed, scalable, partially fault-tolerant, pluggable platform that allows programmers to easily develop applications for processing continuous unbounded streams of data.
After looking at the current code and documentation, it is clear that this is an alpha project. Yahoo seems to have stripped out everything useful and is rebuilding the product almost from scratch. However, that does not rule out S4 becoming as important as Hadoop in the near future.
Hadoop is becoming the de facto standard for extremely large batch data processing. The word batch is important, because low-latency systems get little benefit from the Hadoop framework. In the telecom domain, low-latency processing is exactly what matters: you don't want voice or video to arrive late or out of sync.
S4 promises to focus on real-time, high-volume data streams. It is unfortunate that the current code is not better documented and that Yahoo decided not to open source some examples around machine learning and similar applications.
The S4 framework should excel at making rapid computational decisions in event-driven systems. This makes it a possible candidate for a long list of telecom domains: network routing decisions, real-time billing, policy control, voice recognition, natural language processing, advertising, and more.
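To make the real-time billing case a bit more concrete, here is a minimal Python sketch of the idea: each event carries a key (the subscriber), and a small stateful element per key takes a decision as soon as the event arrives. This is not S4's API. The names UsageEvent, QuotaElement and KeyedDispatcher, and the 100 MB quota, are made up for illustration only.

```python
from dataclasses import dataclass


@dataclass
class UsageEvent:
    subscriber: str    # key used to route the event
    megabytes: float   # data used since the previous event


class QuotaElement:
    """Hypothetical per-subscriber element: keeps a running total in memory."""

    def __init__(self, quota_mb: float):
        self.quota_mb = quota_mb
        self.used_mb = 0.0

    def on_event(self, event: UsageEvent) -> str:
        # Update the state and decide immediately, without a batch job.
        self.used_mb += event.megabytes
        return "throttle" if self.used_mb > self.quota_mb else "allow"


class KeyedDispatcher:
    """Hypothetical router: one element instance per distinct key."""

    def __init__(self, factory):
        self.factory = factory
        self.elements = {}

    def dispatch(self, event: UsageEvent) -> str:
        element = self.elements.get(event.subscriber)
        if element is None:
            # First event for this key: create its element lazily.
            element = self.elements[event.subscriber] = self.factory()
        return element.on_event(event)


dispatcher = KeyedDispatcher(lambda: QuotaElement(quota_mb=100))
stream = [
    UsageEvent("alice", 60),
    UsageEvent("bob", 30),
    UsageEvent("alice", 50),
]
decisions = [dispatcher.dispatch(event) for event in stream]
print(decisions)
```

In a distributed setting the dispatcher would hash the key to a node in the cluster instead of looking it up in a local dictionary, but the per-key state and the immediate decision stay the same.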
Of course, the S4 design is not new in the industry. Erlang and Scala both offer actor frameworks that can be seen as more basic versions of S4, and some Java implementations exist as well.
The power of mixing in ZooKeeper and a pluggable architecture can set S4 apart from previous frameworks. However, it will need more developers, more documentation and, most importantly, a reusable library of processing elements. Such a library would allow new applications to be built by configuring processing elements instead of writing code.
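What "configuration instead of code" could look like is easiest to show with a toy example. The Python sketch below assumes a small registry of reusable elements and builds an application purely from a list of settings. The element names (drop_below, tag), the registry and the config layout are all hypothetical and not taken from S4.

```python
def drop_below(threshold):
    """Reusable element: pass on only events whose value reaches the threshold."""
    def step(events):
        for event in events:
            if event["value"] >= threshold:
                yield event
    return step


def tag(label):
    """Reusable element: attach a label to every event passing through."""
    def step(events):
        for event in events:
            yield {**event, "label": label}
    return step


# Hypothetical library of processing elements, looked up by name.
REGISTRY = {"drop_below": drop_below, "tag": tag}


def build_pipeline(config):
    """Chain the elements named in the config; no application code needed."""
    steps = [REGISTRY[item["type"]](**item.get("params", {})) for item in config]

    def run(events):
        for step in steps:
            events = step(events)
        return list(events)

    return run


# The "application" is just this configuration.
config = [
    {"type": "drop_below", "params": {"threshold": 10}},
    {"type": "tag", "params": {"label": "billable"}},
]
pipeline = build_pipeline(config)
result = pipeline([{"value": 5}, {"value": 20}])
print(result)
```

Changing the threshold or adding another element then becomes a change to the config list, which is the kind of reuse such a library would enable.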
Although S4 is still in its infancy, it has the potential to become a core component of a future telco 2.0 architecture…