UPDATE: There is a new social graph player that implements Pregel on Hadoop: Giraph
Lately there has been a lot of talk about graph databases and their main applications, such as social graphs. Google’s Pregel and the bulk synchronous parallel model are also important hints. Building on the mobile social graph idea, I am evaluating different graph databases. For revenue sharing engagements, cost is critical. As such, real “open source” solutions are preferable to expensive licenses.
What open source graph databases are available?
On paper the most promising one was Neo4j. After running some tests with it, however, I discovered a quite important limitation: there is no remote thread-safe API. This means that a multi-threaded solution runs into problems when updating relationships between nodes. Under stress, one thread is likely to try to update a relationship while another thread holds a lock on it, and that is where the problems start.
Sones has a very restrictive open source version, so not really useful.
OrientDB looks very promising for some applications but is not really built to execute complex graph algorithms like large-scale PageRank.
InfoGrid is extremely complex, with a lot of individual components that are all in different stages of development. However, there are some promising aspects.
Technology-wise, Hama is one of the most promising, but until you can actually store data in Hadoop and quickly manipulate large sets of matrices, it is unusable for the moment. However, the backing of a group like Apache and, more importantly, an Apache license should make it the best option, especially for businesses that want to evaluate graph databases and don’t want to spend fortunes on licenses or open-source their complete solution when the graph database is only a minor part of it.
FlockDB is (still) very rough around the edges. It might fit Twitter’s needs, but most other people would want partitioning over multiple servers to be transparent and would want to be able to traverse a graph.
In short, there is no real solution yet; instead there are a lot of promises. Although commercial options exist, there are too few big ongoing graph projects in telecom to justify expensive licenses. Telecom is not a mature graph market yet: it is just starting, or graph databases are used on side projects only. Since graph databases are an infrastructure element, an open-source, business-friendly license is preferable. Money can still be made via consultancy, support, administrative tools and a revenue sharing marketplace for reusable algorithms. Right now it is more important to be the market leader in this developing market than to have the highest sales volume in a niche market.
Why is a graph database important to telecom?
If I call you and you call me then we have a relationship. If I am the key “connector”, “maven” or “salesman” (see The Tipping Point) among my friends or business contacts, then I would be the perfect marketing target. Unfortunately, relational databases (RDBMSs) are not good at finding those profiles among millions of subscribers.
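The "I call you and you call me" rule can be sketched in a few lines. This is only an illustration on an in-memory list, not something that scales to millions of subscribers. The record format (caller, callee), the function names and the idea of ranking "connectors" by their number of mutual contacts are all assumptions made for this example.

```python
from collections import defaultdict

def mutual_contacts(call_records):
    """Map each subscriber to the set of people they both called and were called by."""
    called = defaultdict(set)
    for caller, callee in call_records:
        if caller != callee:
            called[caller].add(callee)
    mutual = defaultdict(set)
    for caller, callees in called.items():
        for callee in callees:
            # A relationship exists only when calls went in both directions.
            if caller in called.get(callee, ()):
                mutual[caller].add(callee)
    return dict(mutual)

def top_connectors(call_records, limit=3):
    """Rank subscribers by number of mutual contacts (ties broken by name)."""
    mutual = mutual_contacts(call_records)
    ranked = sorted(mutual.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(subscriber, len(friends)) for subscriber, friends in ranked[:limit]]

# Hypothetical call records: (caller, callee)
calls = [
    ("ann", "bob"), ("bob", "ann"),
    ("ann", "cid"), ("cid", "ann"),
    ("ann", "dee"), ("dee", "ann"),
    ("bob", "cid"),  # one-way only, so it does not count as a relationship
]
print(top_connectors(calls, limit=2))
```

Counting mutual contacts is the crudest possible measure. A real deployment would more likely run something like PageRank over the full call graph, which is exactly the kind of workload a relational database struggles with.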
This is an open invitation for people to join forces and build tomorrow’s architecture, preferably with an Apache License, extremely scalable (billions not thousands) and with support for complex algorithms.
Facebook is rolling out seamless messaging which allows people to focus on what they want to communicate and not on how to communicate. This is again an example of using the social graph to communicate better.
Under the hood Facebook is using HBase and Hadoop, so there is no reason why telecom operators could not have launched a unified communication system. True, the operators don’t have an advanced social networking platform, but they can use the user’s mobile social graph as a substitute. If I call you and you call me then we are friends. In the operator’s systems (CRM, HLR, etc.) there is information about who is who. This information is not perfect, so operators would need to add a social address book in which users can update their own information and get other people’s updates, much like Plaxo. Add SMS, instant messaging and email to voice calls, store it all in the Cloud, and we would have a seamless messaging solution.
The problem is not how hard it would be to implement but why operators are not focusing on this type of solution. The focus is on market segmentation to find the right tariff plan and device to sell. However, operators that want to be around when their call and SMS revenues start to seriously decline will have to make a big change of mindset: “Focus on why people want to communicate and not how!”. Find the why and you are likely to come up with alternative hows that are currently not available. A lot of buzz is being generated around Unified Communications suites, but they are the telecom answer to the how, not the why. Facebook is definitely shooting in the right direction. Let’s see if operators can do so as well…
Any operator that has not started a project on Cloud Computing is late. The typical data center at an operator is filled with servers that are underutilized, e.g. application servers and database servers running at 30% of memory, disk and CPU. Just by taking the first step towards Cloud Computing, virtualization, operators are able to save substantially on the cost of hardware, electricity, maintenance, etc. Virtualization means decoupling software from hardware. This allows multiple operating systems to run on one server.
However this would only be focusing on the tip of the iceberg. Cloud Computing is so much more…
Let’s first focus on the internal systems of an operator. Once solutions have been virtualized, you are able to scale them to more or fewer servers. The first step is to automate this process. If you have an application server cluster, do you need 8 nodes all the time? You probably only need them the week before Christmas or during some other peak period. So the ideal is to measure the load and to automate the deployment of more or fewer cluster nodes based on that load. The same can be done with the database. During the night you have 2 nodes. In the morning 3. During the day 4. During peak moments 8. In the evening 3 again. You could save massive amounts of money if application servers and databases can be scaled in this way. Ideally you are also able to pay for licenses based on what you really use and not on your maximum number of nodes during a yearly peak.
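A minimal sketch of such a load-based sizing rule, assuming load is measured in requests per second and each node has a known capacity. The function name, the 25% headroom, the capacity of 1,000 requests per second per node and the sample loads are all invented for illustration; the 2-node minimum and 8-node maximum follow the example above.

```python
import math

def nodes_needed(requests_per_second, capacity_per_node,
                 min_nodes=2, max_nodes=8, headroom=0.25):
    """Return how many cluster nodes to run for the measured load.

    headroom adds spare capacity so a small spike does not overload the cluster.
    """
    if capacity_per_node <= 0:
        raise ValueError("capacity_per_node must be positive")
    required = math.ceil(requests_per_second * (1 + headroom) / capacity_per_node)
    # Never go below the minimum (availability) or above what is provisioned.
    return max(min_nodes, min(max_nodes, required))

# Hypothetical measurements over one day (requests per second).
load_by_period = {
    "night": 500,
    "morning": 2000,
    "day": 3000,
    "peak": 9000,
    "evening": 2400,
}
schedule = {period: nodes_needed(load, capacity_per_node=1000)
            for period, load in load_by_period.items()}
print(schedule)
```

In practice the rule would be evaluated continuously against live measurements, with some damping so that the cluster does not flap between sizes.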
Redesigning Applications and Data
Both Amazon and Google found out that if they redesign their applications, they can get even more gains than from pure virtualization. Amazon’s S3 service is a clear example. However, internally they started with services like Dynamo, on which S3 is built. The first step is to build general data stores. Multiple applications should use a common data store instead of each needing a separate database cluster.
Contrary to popular belief in the IT world, the dotcoms are not filling their data centers with Oracle RAC clusters. The dotcoms are designing special-purpose data stores. The data volumes any market-leading dotcom has to deal with are so massive that a SQL database cannot keep up. SQL databases are very good at running efficient queries on structured data or making sure transactions are consistent. However, they fail when data is unstructured, write operations are massive or data volumes grow by terabytes every day.
So for all low-volume applications that need transactional data and read more than they write, you could still use a unified Oracle RAC cluster to serve multiple applications. An alternative approach is to use the data stores that have been built by Amazon (Relational Database Service or SimpleDB) or Google’s App Engine (Datastore with JDO).
What other alternatives are there?
Read Mostly Data
Data that is read a lot and not updated frequently can get an enormous performance and scalability boost from an in-memory data store. The dotcom standard is memcached. Facebook (800 servers and 28 TB) and Twitter are addicted to memcached.
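The usual way to apply this is to look in the in-memory store first and go to the slow store only on a miss. The sketch below uses a plain Python dictionary as a stand-in for memcached, so it shows the pattern rather than a memcached client. The class name, the time-to-live value and the `load_profile` function are assumptions for this example.

```python
import time

class ReadThroughCache:
    """Tiny in-process cache; a dict stands in for a memcached cluster."""

    def __init__(self, loader, ttl_seconds=60.0, clock=time.monotonic):
        self._loader = loader      # function that reads from the slow store
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}         # key -> (value, expiry time)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        entry = self._entries.get(key)
        now = self._clock()
        if entry is not None and entry[1] > now:
            self.hits += 1
            return entry[0]
        # Miss or expired: read from the slow store and remember the result.
        self.misses += 1
        value = self._loader(key)
        self._entries[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key):
        """Call this after an update so readers do not see stale data."""
        self._entries.pop(key, None)

def load_profile(user_id):
    # Hypothetical slow lookup, e.g. a database query.
    return {"id": user_id, "name": f"user-{user_id}"}

profiles = ReadThroughCache(load_profile, ttl_seconds=300)
profiles.get(42)   # miss: goes to the slow store
profiles.get(42)   # hit: served from memory
print(profiles.hits, profiles.misses)
```

The pattern pays off precisely because the data is read mostly: every update costs an invalidation, so write-heavy data gains little from it.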
Documents, Images & Videos
Binary and media files are best stored outside of a database. In small numbers they are often stored on a file system. However, they occupy a lot of disk space and consume a lot of network bandwidth when moved around. The ideal is a document store with a content delivery network (CDN) as a front-end. Amazon’s S3 and CloudFront are examples. Storing files in a compressed format, e.g. LZO, can save valuable space. Transcoding into different formats, e.g. thumbnails or previews, can also help save network bandwidth.
Unstructured Real-Time Data
Data that is unstructured and needs to be stored and accessed in real time in high volumes is best stored in special-purpose data stores. You could write a book about the latest NoSQL solutions. Write an email to maarten at telruptive dot com if you are interested.
Twitter has described most extensively how they use all the unstructured data they get from their logs and other sources. They use technology from Facebook to stream it into a highly available file system from Yahoo. There they run massively parallel map-reduce operations to learn a lot more about what users are doing, who is influencing whom, etc.
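To make the map-reduce idea concrete, here is a single-process sketch of the three phases (map, shuffle, reduce) applied to a "who is influencing whom" question. It is not Twitter's actual pipeline. The log format `<user> retweeted <author>` and all function names are invented for this example, and a real job would run the same phases spread over many machines.

```python
from collections import defaultdict

def map_phase(log_line):
    """Emit (author, 1) for every retweet found in a log line."""
    parts = log_line.split()
    # Hypothetical log format: "<user> retweeted <author>"
    if len(parts) == 3 and parts[1] == "retweeted":
        yield parts[2], 1

def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Sum the retweet counts for one author."""
    return key, sum(values)

def run_job(lines):
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())

log = [
    "ann retweeted zoe",
    "bob retweeted zoe",
    "cid retweeted yan",
    "malformed line that is ignored by the mapper",
]
print(run_job(log))
```

Because each log line is mapped independently and each key is reduced independently, both phases parallelize naturally, which is what makes the approach workable on very large logs.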
The social graph is about who knows who and what kind of relationship you have. This data is best stored in graph data stores.
Again a chapter by itself, but dotcoms are also heavy users of collective intelligence, which often means dedicated systems.
Instead of keeping data in stovepipes, the dotcoms make it accessible to all their applications, either via search interfaces, web technology for accessing data (e.g. REST and JSON) or efficient binary interfaces (Thrift and Protocol Buffers).
Messaging and Notification
If applications have access to all the above services, then the architecture of an application is simplified enormously. Most of the famous dotcoms don’t use middleware. They prefer the SOA principle. However, unlike the IT SOA solutions, a dotcom would take an application and turn it into a chain of reusable services. Let’s take an IVR application as an example. There would be a service to do voice recognition. Another one for voice transcription. Another one for text-to-speech. A transcoding service to convert between different media formats (e.g. high-quality voice and low, phone-quality voice). And so on. Each service has independent load balancing and can be scaled separately. Services can be reused between applications. An application is very short because it just needs to define which services work together and how.
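The "application as a short chain of services" idea could look roughly like this. The three services are stubs that only manipulate a dictionary; in a real system each would be a separately deployed, load-balanced service behind a network interface. All names and the payload fields are hypothetical.

```python
from functools import reduce

def chain(*services):
    """Build an application by feeding each service's output into the next one."""
    def application(request):
        return reduce(lambda payload, service: service(payload), services, request)
    return application

# Stub services: each takes a payload dict and returns an enriched copy.
def transcode_to_phone_quality(payload):
    return {**payload, "quality": "phone"}

def recognize_speech(payload):
    # Placeholder for a voice recognition service.
    return {**payload, "text": payload["spoken"].lower()}

def text_to_speech(payload):
    # Placeholder for a text-to-speech service.
    return {**payload, "reply": "You said: " + payload["text"]}

# The whole "IVR application" is just the definition of the chain.
ivr = chain(transcode_to_phone_quality, recognize_speech, text_to_speech)

result = ivr({"spoken": "BALANCE", "quality": "high"})
print(result["reply"])
```

Another application could reuse `recognize_speech` or `text_to_speech` in a different chain without touching the IVR.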
The dotcoms deploy new features on a daily and even hourly basis. This means that all application deployment is fully automated. When a new feature is deployed, it does not necessarily overwrite an existing feature. It is possible that a new piece of functionality has been implemented in 5 different ways. Dotcoms would split the total user base and let small groups of users try out the different approaches. Depending on the users’ feedback, they would take the preferred approach and slowly scale it up from 1% to 100%. If they detect that the feature has a performance problem or a bug, they are able to roll back or decrease the load, fix it and deploy gradually again.
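One common way to implement such a gradual rollout is to hash each user into a stable bucket between 0 and 99 and enable the feature for buckets below the current rollout percentage. The sketch below assumes this approach; the text above does not say how any particular dotcom does it, and the function names and the feature name are invented.

```python
import hashlib

def bucket(user_id, feature):
    """Map a user to a stable bucket in the range 0..99 for a given feature."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100

def is_enabled(user_id, feature, rollout_percent):
    """True if the feature is switched on for this user at the current rollout level."""
    return bucket(user_id, feature) < rollout_percent

# Scaling up only ever adds users: anyone enabled at 1% is still enabled at 50%.
users = range(10_000)
for percent in (1, 10, 50, 100):
    enabled = sum(is_enabled(user, "seamless-messaging", percent) for user in users)
    print(percent, enabled)
```

Because the bucket depends only on the user and the feature name, a user keeps the same experience between visits, and rolling back is just a matter of lowering the percentage.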
The Network, OSS and BSS
A substantial effort is needed to redesign a network to be cloud-aware. Some components need latencies below 10 milliseconds (e.g. antennas), so most of that logic will have to be processed locally. However, all systems that can live with latencies of 100 milliseconds benefit from a cloud makeover.
Especially in the area of OSS and BSS there is room for optimizing applications and making them cloud-aware. Global services like a network inventory service, a user profile service, a device profile service, etc. would mean simpler applications and less data duplication.
Opening the Cloud
So the network and IT infrastructure are being redesigned to allow for faster innovation and lower costs. However, Cloud Computing can also be used to increase revenues.
Being a Cloud Infrastructure Provider
Many IT consultancies and software/hardware vendors will tell an operator that it could be a Cloud infrastructure provider. On slides this looks really nice. However, unless an operator is already applying the cloud computing principles described in the first part to its own systems, it lacks substantial knowledge about how to manage such an infrastructure. Without this knowledge it would be hard to build a highly optimized solution and, as such, to be price competitive with the existing players.
Being a Cloud Platform Provider
Although it is closer to the operator’s core competencies, being a cloud platform provider is still only for those operators that are Cloud experts. A Cloud platform provider allows others to use the infrastructure services to create applications on top. The complexity lies in the fact that malicious users will try to break the platform, which could have a very negative effect on the infrastructure if not handled correctly.
Being a Cloud Service Provider
This is the default option most operators should explore first before moving into the other areas. Being a service provider also has a roadmap:
The easiest step is to be the storefront and to resell IT applications from others, e.g. cloud backup storage, security solutions, etc.
Offering Telco SaaS
The next step would be to offer specific telecom applications: applications that are built for the operator or, even better, applications that can be built by others based on the operator’s assets. An example would be a PBX in the Cloud.
Open Market for SaaS
Building all telecom applications yourself is hard. Attracting others to do it for you is easier. However, just putting a “Net App Store” and an SDK on the web will not get you to dominate the market. Only an open market with a large ecosystem of companies and developers can generate large quantities of “Net Apps”. If you are thinking about building an open market, why don’t we talk first? Send an email to maarten at telruptive dot com.