Posts Tagged ‘nosql’

Open Source Big Data Reporting & ETL show promise

March 16, 2012 1 comment

With Hadoop/HBase/Hive, Cassandra, etc. you can store and manipulate petabytes of data. But what if you want to get nice-looking reports or compare data held in a NoSQL solution with data held elsewhere? Two market leaders in the Open Source business intelligence space are now putting all their firepower onto Big Data.

Pentaho Big Data seems to be a bit further ahead. They offer a graphical ETL tool, a report designer and a business intelligence server. These are existing tools, but support for Hadoop HDFS, MapReduce, HBase, Hive, Pig, Cassandra, etc. has been added.

Jaspersoft’s Open Source Big Data strategy is a little bit behind because the connectors are not yet included in the main product, and several are still of beta quality and lack documentation.

Both companies will accelerate the adoption of Big Data, since the main problem with Big Data is the lack of easy reporting. Unstructured data is harder to format into a very structured report than structured data. Any solution that makes this possible and is also Open Source is very welcome in times of cost cutting…


NextGen Hadoop, beyond MapReduce

Hadoop has run into architectural limitations and the community has started working on the Next Generation Hadoop [NGN Hadoop]. NGN Hadoop has some new management features, of which multi-tenant application management is the major one. However, the key change is that MapReduce is no longer entangled with the rest of Hadoop. This will allow Hadoop to be used for MPI, Machine Learning, Master-Worker, Iterative Processing, Graph Processing, etc. New tools to better manage Hadoop are also being incubated, e.g. Ambari and HCatalog.

Why is this important for telecom?
Having one platform that allows massive data storage, petabyte data analytics, complex parallel computations, large-scale machine learning, big data MapReduce processing, etc. all in one multi-tenant set-up means that telecom operators could see massive reductions in their architecture costs together with faster go-to-market, better data intelligence, etc.

Telecom applications that are redesigned around this new paradigm can all use one shared back-office architecture. Having data centralized in one large Hadoop cluster, instead of tens or hundreds of application-specific databases, will enable unprecedented data analytics possibilities and bring much-needed efficiencies.

Is this shared-architecture paradigm new? Not at all. Google has been using it since at least 2004, when they published the MapReduce paper; the BigTable paper followed in 2006.

What is needed is for several large operators to define this approach as their standard architecture, so that telecom solution providers start incorporating it into their solutions. Commercial support can easily be acquired from companies like Hortonworks, Cloudera, etc.

Having one shared data architecture and multi-tenant application virtualization in the form of a Telco PaaS would allow third parties to launch new services quickly and cheaply: think days instead of years…

Changing from Telco Grade to Web 2.0 Grade by fighting telecom myths

Most telecom operators still believe the following: software should be upgraded at most twice a year. Oracle RAC is the only valid database solution. RFQs bring innovation. If you pay more for software licenses, the software will have more features and as such will be better.

All of these myths will have to be dispelled in the coming 12 months if operators want to stay on top of the game.

Upgrade twice a year

For telecom network equipment, two upgrades a year are fine. However, for everything related to services offered to consumers or businesses, it means that operators are 180 times less competitive than their direct competition. The large dotcoms like Facebook and Google make software upgrades on a daily basis. 50% of all the files that contain Google software code change every month. Even if “a revolution” happened and software upgrades came every month, it would still mean a 30 times lag.
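The 180 and 30 figures follow from comparing releases per year. A minimal sketch of that arithmetic, assuming the competitor ships once per calendar day (the function name `lag_factor` is made up for illustration):

```python
DAYS_PER_YEAR = 365  # assumption: the competitor releases once per calendar day


def lag_factor(own_releases_per_year: float,
               rival_releases_per_year: float = DAYS_PER_YEAR) -> float:
    """Number of rival releases that ship for each release of our own."""
    return rival_releases_per_year / own_releases_per_year


twice_a_year = lag_factor(2)   # 182.5, quoted as "180 times" in the text
monthly = lag_factor(12)       # roughly 30.4, quoted as "30 times"
print(twice_a_year, monthly)
```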

Operators need to start using cloud computing, even if they are private clouds, to deploy their back-office systems. The business needs software solutions to move at market speed. That means that if a new social networking site is hot, then it should be integrated into telecom solution offerings in days. Not in months or a year.

There are many techniques to make deployments more predictable, more frequent and more reliable. Extra features or integrations can be offered quickly via plugins. You can let a group of early adopters try new features and give feedback. If the features don’t survive this feedback, kill them. If they do, scale up quickly.
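The early-adopter loop described above can be sketched as a simple feature flag. This is only an illustration: the `Feature` class, the vote counting and the 60% approval threshold are assumptions, not something the post prescribes.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """A plugin or feature that is first exposed to early adopters only (hypothetical)."""
    name: str
    early_adopters: set = field(default_factory=set)
    released_to_all: bool = False
    votes: list = field(default_factory=list)

    def enabled_for(self, user: str) -> bool:
        return self.released_to_all or user in self.early_adopters

    def record_feedback(self, liked: bool) -> None:
        self.votes.append(liked)

    def decide(self, min_votes: int = 3, threshold: float = 0.6) -> str:
        """Scale up or kill the feature once enough feedback has arrived."""
        if len(self.votes) < min_votes:
            return "keep testing"
        approval = sum(self.votes) / len(self.votes)
        if approval >= threshold:
            self.released_to_all = True   # survived the feedback: scale up
            return "scale up"
        self.early_adopters.clear()       # did not survive: switch it off
        return "kill"


feature = Feature("social-feed-plugin", early_adopters={"alice", "bob"})
for liked in (True, True, False):
    feature.record_feedback(liked)
print(feature.decide())
```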

Oracle RAC

There is nothing wrong with the quality of Oracle RAC, but it is a very expensive solution that needs a lot of manpower to keep running smoothly. Operators often pay a premium for services that could run equally well on cheaper or Open Source alternatives. NoSQL should also be embraced.

If the cost of deploying a new service is millions, then only a couple of them will be deployed. By lowering hardware and software costs, innovative projects are more likely to see daylight.

RFQ’s and Innovation

It takes 3 months from idea to finalizing an RFQ document, 1.5 months to get a reply and 1.5 months to do procurement. That is half a year in total, not counting the deployment time, which is likely to be another 6 months. The result is that the operator takes 12 months for any “new” system.
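Adding up the phases makes the 12-month figure explicit. A small sketch using the durations quoted above (the phase labels are paraphrased):

```python
# Durations in months, taken from the paragraph above.
rfq_phases_months = {
    "idea to final RFQ document": 3.0,
    "waiting for vendor replies": 1.5,
    "procurement": 1.5,
}
deployment_months = 6.0

rfq_total = sum(rfq_phases_months.values())    # the "half a year"
time_to_market = rfq_total + deployment_months  # the full cycle
print(rfq_total, time_to_market)
```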

Now the question is if that system is really new. Because if an operator was able to define in detail what they want and how they want it, then the technology was probably quite mature to begin with. So operators spend fortunes installing yesterday’s technology 12 months late. Can anybody explain what innovation this is going to bring?

First of all operators should not organize multi-million RFQs for business or end-user solutions. These are likely to come late to market and can only be focused on mass markets.

Instead, operators should let customers decide what they want, by giving a large open ecosystem of partners the possibility to offer a very large list of competing services to those customers. The operator should offer open APIs to key assets (charging, numbering plans, call control, network QoS, etc.), as well as revenue sharing and extra services like common marketplaces and support 2.0 (social CRM, helpdesk as a service, etc.). This is called Telecom Platform-as-a-Service or Telco PaaS.

High licenses, more features, better

More features do not mean better. Most people want simplicity, not a long list of features. Ease of use comes at a premium price. Look at Apple’s stock price if you don’t believe it.

It is better to have basic systems that are extremely easy to use, with open APIs and plug-ins. A feature-by-feature comparison will make you choose the most expensive one. However, it is hard to write down “easy to use” as a feature requirement.

In telecom, there is a natural tendency to make things hard. In Web 2.0 the tendency is the opposite. You can see the difference between Nokia and Apple. The Nokia phone would win every feature-by-feature comparison, but the iPhone is winning the market battle…

Instead of organizing an RFP, let end-users and employees play around with early betas or proofs of concept. No training, no documentation. See which solution makes them more productive: the feature-rich one or the more straightforward one. Just ask for open APIs and a plugin mechanism and you will be set…

Looking for the right hypervisor for my private cloud or IaaS is the wrong question

February 18, 2011 2 comments

If you are trying to find out what the right hypervisor is for your private cloud or IaaS then you might be asking the wrong question…

A better question is: do most applications really need an OS and a hypervisor?

One of the companies exploring this area is Joyent. Their SmartOS is like a mix between a virtual machine and a combined OS + hypervisor. Instead of installing a hypervisor, an operating system on top of it, and an application server or database on top of that, the Joyent team thought it would be more efficient to remove as many layers as possible between the application/data and the hardware.

According to publicly available videos and material, their SmartOS is based on a telecom technology for highly scalable, low-latency application operations. Unfortunately, a Google search does not reveal which telecom technology it is. So if you know the answer, please leave a comment.

The idea of running applications as close to the hardware as possible and being able to scale an application over multiple servers is the ultimate goal of many cloud architects. Joyent claims that their SmartOS runs directly on the hardware. On top of SmartOS you are able to install virtualization but ideally you run applications and data stores directly.

The next step would be to combine the operating system with the virtual machine/application server or database server into one. Removing more layers will greatly improve performance, as Joyent’s performance tests suggest.

So the real question is: do we need so many extra layers?

A distributed storage system, a virtualized webserver, a virtualized app server, and a distributed SQL-accessible database or NoSQL solution, all running straight on hardware with a minimal extension to distribute load over multiple machines, would be the ideal IaaS/PaaS architecture. It would give customers what they really need: performance, scalability, low latency, etc. Why add a large set of OS and hypervisor functions that in the end are not strictly necessary?

Virtual Telecom Applications and an innovation architecture

December 22, 2010 Leave a comment

I have been looking into virtualization, but what I find are mainly operating-system-based virtualizations. What I am looking for are application, integration and datastore virtualization solutions. Google’s App Engine and Oracle’s JRockit Virtual Edition come close to what I am looking for in application virtualization. Why do you need an operating system if you could virtualize your application directly? It would save resources and would be more secure.

My ideal solution allows developers to write applications and run them on a virtual application server. This virtual app server can scale applications horizontally over multiple machines. Each application runs in a sandbox, so badly written or insecure applications will run out of resources and cannot impact other applications.

We would need a similar solution for integration. Both would need out-of-the-box support for multi-tenancy, in which either each tenant gets a separate instance or multiple tenants share one instance, if supported by the software. Integration should be separated from the application logic, and so should data storage.

Integration is key because the virtual applications could be running on a public cloud but would have to be able to interact with on-site systems. Very high throughput, security, multi-tenancy and resistance to failure are key. One API can be linked to multiple back-office systems or to different versions of them. Different versions of an API can be linked to the same back-office system to prepare applications before a major back-office upgrade.
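One way to picture this decoupling is a routing table from (API, version) to a back-office handler. The sketch below is purely illustrative: `IntegrationLayer`, the billing functions and the returned fields are all hypothetical.

```python
from typing import Callable

Backend = Callable[[dict], dict]


def billing_v1(request: dict) -> dict:
    """Stand-in for the current back-office system (hypothetical)."""
    return {"backend": "billing-v1", "balance_cents": 1250}


def billing_v2(request: dict) -> dict:
    """Stand-in for the upgraded back-office system (hypothetical)."""
    return {"backend": "billing-v2", "balance": {"amount": 12.5, "currency": "EUR"}}


def v2_api_on_old_billing(request: dict) -> dict:
    """Adapter: serves the new API shape from the old back office."""
    old = billing_v1(request)
    return {"backend": old["backend"],
            "balance": {"amount": old["balance_cents"] / 100, "currency": "EUR"}}


class IntegrationLayer:
    """Maps (api, version) to whichever back-office handler is currently linked."""

    def __init__(self) -> None:
        self._routes = {}

    def link(self, api: str, version: str, backend: Backend) -> None:
        self._routes[(api, version)] = backend

    def call(self, api: str, version: str, request: dict) -> dict:
        return self._routes[(api, version)](request)


layer = IntegrationLayer()
layer.link("balance", "v1", billing_v1)
layer.link("balance", "v2", v2_api_on_old_billing)  # apps can adopt v2 before the upgrade
before = layer.call("balance", "v2", {"msisdn": "0000"})
layer.link("balance", "v2", billing_v2)             # back-office upgrade: relink only
after = layer.call("balance", "v2", {"msisdn": "0000"})
```

Applications written against version v2 see the same response shape before and after the back-office switch; only the link inside the integration layer changes.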

A distributed multi-tenant data store should hold all the end-user and application data. Ideally in a schema-less manner that avoids having to migrate data for data schema changes.
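To illustrate why a schema-less store avoids migrations, here is a minimal in-memory sketch. The `SchemaLessStore` class and the record fields are assumptions for illustration; a real deployment would use a distributed NoSQL store.

```python
from typing import Optional


class SchemaLessStore:
    """In-memory stand-in for a multi-tenant, schema-less data store (hypothetical)."""

    def __init__(self) -> None:
        self._records = {}

    def put(self, tenant: str, key: str, record: dict) -> None:
        # Records are stored as-is; no table definition constrains their fields.
        self._records[(tenant, key)] = dict(record)

    def get(self, tenant: str, key: str) -> Optional[dict]:
        record = self._records.get((tenant, key))
        return dict(record) if record is not None else None


store = SchemaLessStore()
# An "old" record, written before the application knew about preferred_language.
store.put("operator-a", "user:1", {"name": "Alice"})
# A "new" record with the extra field; no migration of the old record is needed.
store.put("operator-a", "user:2", {"name": "Bob", "preferred_language": "nl"})

# Readers supply a default for fields that older records do not have.
languages = [
    store.get("operator-a", key).get("preferred_language", "en")
    for key in ("user:1", "user:2")
]
```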

All these virtual elements should be managed by an automated, highly distributed administration layer that can let applications grow or shrink based on demand, ensure that integration links are always up and get re-established if they fail, store data without practical limits, etc. But there is more. The administration layer should allow different versions of the same application or integration to be deployed side by side, and allow for step-wise migration to new versions and fast roll-backs.
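Step-wise migration and fast roll-back can be sketched as percentage-based routing between two versions. This is one possible approach under stated assumptions: the `Rollout` class is hypothetical, and hashing the user id is just a simple way to keep each user on a stable version.

```python
import hashlib


class Rollout:
    """Routes a growing share of users to a candidate version (hypothetical sketch)."""

    def __init__(self, stable: str, candidate: str) -> None:
        self.stable = stable
        self.candidate = candidate
        self.candidate_percent = 0

    def set_percent(self, percent: int) -> None:
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self.candidate_percent = percent

    def rollback(self) -> None:
        # Fast roll-back: send everyone to the stable version again.
        self.candidate_percent = 0

    def version_for(self, user_id: str) -> str:
        # Hash the user id into a bucket 0..99 so a user's assignment is deterministic.
        digest = hashlib.sha256(user_id.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:2], "big") % 100
        return self.candidate if bucket < self.candidate_percent else self.stable


rollout = Rollout(stable="app-1.4", candidate="app-1.5")
rollout.set_percent(10)   # step 1: a small share of users
rollout.set_percent(50)   # step 2: half of the users
rollout.rollback()        # something went wrong: back to stable for everyone
```

Because the bucket depends only on the user id, users who received the candidate at 10% keep it at 50%, which keeps the migration step-wise rather than random per request.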

Why do we need all this?

The first company that will have such elements at its disposal will have enormous competitive advantages in delivering innovative services quickly. They can launch new applications quickly and scale them to millions of users in hours. They can integrate diverse sources and make them universally available to be re-used by multiple applications. They can store data without having an army of DBAs for every application. They can try out new features and quickly scale them up or kill them. In short they can innovate on a daily basis.

The Googles of this world understood years ago that a good architecture is a very powerful competitive weapon. There is a valid trend to offshore technical work. However, technical work should be separated into extremely high-value work and routine work. Never offshore high-value work. Also never assume that because the resources are expensive, the work must be high-value. Defining and implementing this innovation architecture is extremely high-value. Writing applications on top of it is routine, at least starting from number 5.

The power of binary SIP

December 10, 2010 Leave a comment

With the world looking more at XML, SOAP and REST these days, it perhaps feels unnatural to think binary again. However, with Protocol Buffers [Protobuf], Thrift, Avro and BSON being used by the large dotcoms, thinking binary feels modern again…

How can we apply binary to telecom? Binary SIP?

SIP is a protocol for handling sessions for voice, video and instant messaging. It is a text-based protocol modeled on HTTP. Setting up a SIP session requires a lot of communication between different parties. What if that communication were replaced by a binary protocol based, for instance, on protocol buffers? Google’s protocol buffers can dramatically reduce network load and parsing time, by as much as 10 to 100 times compared to regular XML.
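To get a feel for the size difference, the sketch below compares a heavily simplified text request with a hand-rolled binary framing of the same fields. It is not protocol buffers and not a valid SIP message (mandatory headers such as Via are left out); the method codes and the layout are assumptions made up for this example.

```python
import struct

# Simplified text request carrying five pieces of information.
TEXT_INVITE = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "From: <sip:alice@example.com>\r\n"
    "To: <sip:bob@example.com>\r\n"
    "Call-ID: a84b4c76e66710\r\n"
    "CSeq: 314159 INVITE\r\n"
    "\r\n"
).encode("ascii")

METHODS = {"INVITE": 1, "ACK": 2, "BYE": 3}  # hypothetical one-byte method codes
HEADER = "!BI"  # method code (1 byte) + CSeq (4 bytes), network byte order


def encode_binary(method: str, from_uri: str, to_uri: str,
                  call_id: str, cseq: int) -> bytes:
    out = struct.pack(HEADER, METHODS[method], cseq)
    for text in (from_uri, to_uri, call_id):
        raw = text.encode("utf-8")
        out += struct.pack("!B", len(raw)) + raw  # one-byte length prefix
    return out


def decode_binary(data: bytes) -> tuple:
    method_code, cseq = struct.unpack_from(HEADER, data, 0)
    offset = struct.calcsize(HEADER)
    fields = []
    for _ in range(3):
        (length,) = struct.unpack_from("!B", data, offset)
        offset += 1
        fields.append(data[offset:offset + length].decode("utf-8"))
        offset += length
    method = next(name for name, code in METHODS.items() if code == method_code)
    return (method, fields[0], fields[1], fields[2], cseq)


binary_invite = encode_binary("INVITE", "sip:alice@example.com",
                              "sip:bob@example.com", "a84b4c76e66710", 314159)
print(len(TEXT_INVITE), len(binary_invite))
```

The decoder needs no text scanning or header-name matching, only fixed-size reads and length-prefixed slices, which is where most of the parsing gain would come from.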

What would be the advantages:

  • Latency – faster parsing and smaller network traffic reduces latency which is key in real-time communication.
  • Performance – faster parsing and lower load means that more can be done for less. One server can handle more clients.
  • Scalability – distributing the handling of SIP sessions over more machines becomes easier if each transaction can be handled faster.

What would be the disadvantages:


  • No easy debugging – SIP is human-readable, hence debugging is “easier”. However, in practice tools could be written that allow binary debugging.
  • Syncing client & server – client and server libraries need to be in sync, otherwise messages cannot be parsed. Protocol buffers ignore fields that are unknown, so there is some freedom for an old client to connect to a newer server or vice versa.
  • Firewalls/Existing equipment – a new binary protocol cannot interoperate with existing equipment. A SIP to binary SIP proxy is necessary.
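The point about unknown fields can be illustrated with a tiny tag-length-value (TLV) format. This only mimics the idea behind protocol buffers; it is not their wire format, and the tag numbers and field names are assumptions.

```python
import struct

TLV_HEADER = "!BH"  # tag (1 byte) + value length (2 bytes), network byte order

# Tags an "old" client knows about (hypothetical numbering).
KNOWN_TAGS_V1 = {1: "call_id", 2: "from_uri"}


def encode_tlv(fields: dict) -> bytes:
    """Encode {tag: bytes_value} pairs as consecutive TLV records."""
    return b"".join(
        struct.pack(TLV_HEADER, tag, len(value)) + value
        for tag, value in fields.items()
    )


def decode_tlv(data: bytes, known_tags: dict) -> dict:
    """Decode TLV records, silently skipping tags the reader does not know."""
    result = {}
    offset = 0
    header_size = struct.calcsize(TLV_HEADER)
    while offset < len(data):
        tag, length = struct.unpack_from(TLV_HEADER, data, offset)
        offset += header_size
        value = data[offset:offset + length]
        offset += length  # the length prefix lets us step over unknown records
        if tag in known_tags:
            result[known_tags[tag]] = value
    return result


# A newer server adds tag 3, which the old client has never heard of.
newer_message = encode_tlv({
    1: b"a84b4c76e66710",
    2: b"sip:alice@example.com",
    3: b"video-codec=h264",
})
seen_by_old_client = decode_tlv(newer_message, KNOWN_TAGS_V1)
```

The old client still reads the fields it understands, because every record carries its own length and can be skipped without knowing its meaning.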

It would be interesting to see if a binary SIP prototype combined with the latest NoSQL data stores can compete with commercial SIP/IMS equipment in scalability, latency and performance.
