Archive for the ‘Big Data’ Category

10 world changing technology trends

February 13, 2015 Leave a comment

1. Block chain
The block chain is the heart of digital currencies like Bitcoin. What most don’t realise yet is that the block chain will be used for managing everything from domain names, artist royalties, escrow contracts, auctions, lotteries, etc. You can do away with middlemen whose only reason of being is making sure they keep on getting a large cut in the value chain. Unless a middlemen or governmental institution adds real value, they are in danger of being block chained into the past.
2. Biometric security
A good example is the Nymi, a wearable that listens to your unique heart beat patterns and creates a unique identity. Even if people steal your Nymi, it is of no use since they need your heart to go with it.
3. Deep belief networks
Deep belief networks are the reason why Google’s voice recognition is surprisingly accurate, Facebook can tag photos automagically, self-driven cars, etc.
4. Smart labels
They are 1 to 3 millimetres small. They harvest electricity from their environment. They can detect people approaching within half a metre, sometimes even identify them and each product you will buy. Your microwave will not longer have to be told how to warm up a frozen meal.
5. Micro-servers
A $35 Raspberry Pi 2 or Odroid is many multiples more powerful than the first Google server but the size of a credit card. Parallella is $99, same size, and almost ten times more coresP then the first Google server.
6. Apps and App Stores for Smart Devices
Snappy Ubuntu Core allows developers to create apps like mobile apps but to put them on any smart device from robots & drones to wifi, hubs, industrial gateways, switches, dishwashers, sprinkler controls, etc. Software developers will be able to innovate faster and hardware can be totally repurposed in seconds. A switch can become a robot controller.
7. Edge/proximity/fog clouds
Public clouds often have too much latency for certain use cases. Often connectivity loss is not tolerable. Think about security cameras. In a world where 4K quality IP cameras will become extremely cheap, you want machine learning imagine recognition to be done locally and not on the other side of the world.
8. Containers and micro-services orchestration
Docker is not new but orchestrating millions of containers and handling super small micro services is still on the bleeding edge.
9. Cheap personalised robots and drones
£35 buys you a robot arm in Maplin in the UK. Not really useful for major things except for educating the next generation robot makers. Robots and drones will have apps (point 6) for which personalised robots and drones are happening this year.
10. Smart watches and hubs
Smart hubs know who is in the house, where they are (if you wear a phone, health wearable or smart watch), what their physical state is (heartbeat via smart watch), what your face looks like and your voice. Your smart watch will know more about you then you want relatives to know. Today Google knows a husband is getting a divorce before they do [wife searches and uses google maps]. Tomorrow your smart watch will know you are going to have a divorce before you do [heart jumped when you looked at that girl, her heartbeat went wild when you came closer].

Proximity Cloud, what is it and why you should care…

November 29, 2014 Leave a comment

Many people are still getting their head around public and private clouds. Even less know about Internet of Things. However the real revolution will be bringing both together and for that you will need a proximity cloud.

What is a proximity cloud?
In five years time there will be billions of sensors everywhere. You will be wearing them on your body. They will be in your house, at your job, in your hospital, in your city, in your car/bus/plane, etc.

Now in a world where 1 billion people will have access to 10’s or even 100’s of connected devices around them, it is easy to understand that you don’t want to send all data they generate to a public or private cloud. There is just not enough mobile spectrum or fiber capacity to cater for this.

What we need is put intelligence close to the data generators to determine if your heartbeat, that video camara stream, the electricity consumption of your boiler, the brake capacity of your car, etc. are within normal limits or are outliers that deserve some more attention. Especially if video is involved you don’t want to send it over a public Internet connection if it has nothing interesting on it, e.g. the cat of the neighbours just got onto your lawn.

So houses, cars, companies, telecoms, hospitals, manufacturers, etc. will need some new type of equipment close to the data avalanches in order to filter out the 99.999999% of useless data.

An example. If you are a security company that manages tens of thousands of video cameras in thousands of companies then you want to know when a thief walks by or an aggression happens. You will train machines to make decisions on what the difference is between a human walking past or a cat. However burglars will soon find out that computer vision has a fault and when they wear a cat disguise they can get past. It is this type of events that will trigger a central cloud platform to request all videos of a local business in the last 24 hours and to train its visual model that humans can wear animal suits and then push this to all proximity clouds in all its customer premises. The alternative is storing all video streams in the cloud which would require enormous bandwidth or even worse, not knowing what happened and being in the press the next week for being the “cat-suit” flop.

A Layman’s Guide to the Big Data Ecosystem

November 19, 2014 Leave a comment

Charles – Chuck – Butler, a colleague at Canonical, wrote a very nice blog post explaining the basics of Big Data. It does not only explain them but it also allows anybody to set up Big Data solutions in minutes via Juju. Really recommended reading:

This is a good example of the power of cloud orchestration. Some expert creates charms and bundles them with Juju and afterwards anybody can easily deploy, integrate and scale this Big Data solution in minutes.

Samuel Cozannet, another colleague, used some of these components to create an open source Tweet sentiment analysis solution that can be deployed in 14 minutes and includes autoscaling, a dashboard, Hadoop, Storm, Kafka, etc. He presented it on the OpenStack Developer Summit in Paris and will be providing instructions for everybody to set it up shortly.

IoT and personal health

November 9, 2014 Leave a comment

I just saw Eric Dishman’s TED session on “Health care should be a team sport“. I love the idea of providing people with chronicle illness the means to be diagnosed and treated remotely and use big data to learn of a large group of patients with similar issues. Personally this would mean that when my sons have breathing problems we would not have to drag them in the middle of the night to a hospital where they are exposed to many viruses. Instead by measuring their oxygen level and listening to their longs a personalized remote diagnose could be made and some nebulizers or other things administered. At scale all equipment would probably cost less than £200 because Maplin already sells the nebulizer and oxygen level meter for a combined £110. Add another £90 at worst for a stethoscope that can be connected via bluetooth to a smartphone. Now via Hangout a doctor could remotely diagnose the results and even in the future a computer programme. All results of millions of patients would be collected in order to improve treatment. So no need for an expensive hospital in London with a receptionist, nurse and doctor dedicating 2 hours. By just avoiding one hospital night, the whole system would be enormously profitable. Additionally Ubuntu’s Juju can be used to set up all the big data and diagnostic software in minutes in any cloud or server on any place in the world. If other open source solutions are used then the total solution would be in reach for any developing country. There are probably more than one developer whose kids are asthmatic, and would happily contribute time. It sounds like an ideal Gates Foundation or Kickstarter project. If you think you can help please reach out to me because this is not work for me, this is personal engagement.

Instant Big Data Stream Processing = Instant Storm

September 2, 2014 1 comment

Every 6 months at Canonical, the company behind Ubuntu, I work on something technical to test our tools first hand and to show others new ideas. This time around I created an Instant Big Data solution, more concretely “Instant Storm”.

Storm is now part of the Apache Foundation but previously Storm was build by Nathan Marz during his time at Twitter. Storm is a stream processing engine for real-time and distributed computation. You can use Storm to aggregate real-time flows of events, to do machine learning, for analytics, for distributed ETL, etc.

Storm is build out of several services and requires Zookeeper. It is a complex solution and non-trivial to deploy, integrate and scale.  The first technical project I did at Canonical was to create a Storm Juju charm. Although I was able to automate the deployment of Storm, there were still problems because users still had to read about how to actually use Storm.

Instant Storm is the first effort to resolve this problem. I created a StormDeployer charm that can read a yaml file in which a developer can specify multiple topologies. For each you specify the name of the topology, the jar file, the location in Github, how to package the jar file, etc. Afterwards by uploading the yaml file to Github or any public web server and giving it the extension .storm anybody in the world is able to reuse the topologies instantly in two steps:

1. Deploy the Storm bundle that comes with Storm + Zookeeper + StormDeployer via a simple drag and drop in Juju:

Screen Shot 2014-09-02 at 11.16.442. Get a URL to a storm file and put it into the deploy field of the service settings of the StormDeployer :

Screen Shot 2014-09-02 at 11.20.41


Alternatively you can use the Juju command line: 

juju set stormdeployer "deploy=http://somedomain/somefile.storm"

There are several examples already available on Github but here is one that for sure works:

Screen Shot 2014-09-02 at 11.18.44The StormDeployer will download the project from Github, package the jar with Maven and upload the jar to Storm.  You can check progress in the logs (/opt/storm/latest/log/deploy.log).

This is the easiest way to deploy Storm on any public cloud, private cloud or if Ubuntu’s Metal-as-a-Service / MaaS is used on any bare metal server (X86, ARM64, Power 8). See here for Juju installation instructions.

This is a first version with some limitations. One of the really nice things to add would be to use Juju to make integrations between a topology and other charms dynamic. You can for instance create a spout or bolt that connects to the Kafka or Cassandra charms. Juju can automatically tell the topology the connection information and make updates to the running topologies should anything change. This would make it a lot more robust to run long running Storm topologies.

I am happy to donate my work to the Apache Foundation and guide anybody who wants to take ownership…

Commoditizing Big Data via Instant Big Data Solutions

September 1, 2014 Leave a comment

In 1999 you could easily spend $1M on having a company build a static web site. A few years later any student could make you a web site. HTML became a commodity. The same commodity effect needs to happen to Big Data.

The past: build your own petabyte solution

A few years back only the happy few extremely technically gifted companies were able to create solutions to store TBs and even PBs of data. Google started to write papers. Yahoo and Facebook started to release open source solutions. Shortly after Big Data became a buzz word and anybody that was somebody in the IT consultancy space was talking about Hadoop.

Now: open source solutions and lots of handholding

In 2014 it is possible to download Hadoop, Spark, Storm, etc. You can even find prepackaged solutions from Hortonworks, Cloudera, MapR, Pivotal, IBM, etc. But still Big Data projects are hard. You need very bright people or spend quite a lot to get anywhere. Many projects run over budget and under deliver.

Future: instant Big Data solutions

We are ready for the next step and convert Big Data in a commodity. Several startups are launching Big Data solutions as a service. Unfortunately for many SaaS providers, having a Big Data SaaS solution is not enough. Big Data means lots of data. Data that can hold sensitive information. Data that can grow with GBs a day. This is the reason why if any SaaS Big Data solution ought to be successful, it also needs an on-premise alternative.

We are also missing a portable Big Data logic container. The industry is raving about Docker. Several startups are working on making Docker containers the way to share your map-reduce logic. I predict that many more Big Data logic can be containerised and made portable. Any data scientist should be able to reuse Deep Belief or Random Forest algorithms by just reusing a container.

The other part of the puzzle that is still missing is data visualisation and manipulation tools. There are many Big Data key-value stores and map-reduce engines. However the data visualisation and reporting space is still wide open. The Apache Foundation does not [yet] provide a drag-and-drop tool to setup dashboards, generate reports, schedule notifications, run workflows, automate data imports, etc. 

Industry specific reusable assets is another part that is missing. Nobody wants to go and reinvent eCommerce recommendation algorithms every time a new Big Data platform becomes available. 

However all of this is coming at enormous speeds. As soon as all the pieces of the puzzle are coming together then cloud orchestration solutions like Juju, ServiceMesh, Brooklyn, etc. will allow enterprises to start consuming Big Data solutions as a commodity. Instant Big Data solutions are 6-36 months away depending on your requirements. 

Categories: Big Data, Big Data Future

The next IT revolution: micro-servers and local cloud

Have you ever counted the number of Linux devices at home or work that haven’t been updated since they came out of the factory? Your cable/fibre/ADSL modem, your WiFi point, television sets, NAS storage, routers/bridges, media centres, etc. Typically this class of devices hosts a proprietary hardware platform, an embedded proprietary Linux and a proprietary application. If you are lucky you are able to log into a web GUI often using the admin/admin credentials and upload a new firmware blob. This firmware blob is frequently hard to locate on hardware supplier’s websites. No wonder the NSA and others love to look into potential firmware bugs. They are the ideal source of undetected wiretapping.

The next IT revolution: micro-servers
The next IT revolution is about to happen however. Those proprietary hardware platforms will soon give room for commodity multi-core processors from ARM, Intel, etc. General purpose operating systems will replace legacy proprietary and embedded predecessors. Proprietary and static single purpose apps will be replaced by marketplaces and multiple apps running on one device. Security updates will be sent regularly. Devices and apps will be easy to manage remotely. The next revolution will be around managing millions of micro-servers and the apps on top of them. These micro-servers will behave like a mix of phone apps, Docker containers, and cloud servers. Managing them will be like managing a “local cloud” sometimes also called fog computing.

Micro-servers and IoT?
Are micro-servers some form of Internet of Things. Yes they can be but not all the time. If you have a smarthub that controls your home or office then it is pure IoT. However if you have a router, firewall, fibre modem, micro-antenna station, etc. then the micro-server will just be an improved version of its predecessor.

Why should you care about micro-servers?
If you are a mobile app developer then the micro-servers revolution will be your next battlefield. Local clouds need “Angry Bird”-like successes.
If you are a telecom or network developer then the next-generation of micro-servers will give you unseen potentials to combine traffic shaping with parental control with QoS with security with …
If you are a VC then micro-server solution providers is the type of startups you want to invest in.
If you are a hardware vendor then this is the type of devices or SoCs you want to build.
If you are a Big Data expert then imagine the new data tsunami these devices will generate.
If you are a machine learning expert then you might want to look at algorithms and models that are easy to execute on constraint devices once they have been trained on potentially thousands of cloud servers and petabytes of data.
If you are a Devop then your next challenge will be managing and operating millions of constraint servers.
If you are a cloud innovator then you are likely to want to look into SaaS and PaaS management solutions for micro-servers.
If you are a service provider then this is the type of solutions you want to have the capabilities to manage at scale and easily integrate with.
If you are a security expert then you should start to think about micro-firewalls, anti-micro-viruses, etc.
If you are a business manager then you should think about how new “mega micro-revenue” streams can be obtained or how disruptive “micro- innovations” can give you a competitive advantage.
If you are an analyst or consultant then you can start predicting the next IT revolution and the billions the market will be worth in 2020.

The next steps…
It is still early days but expect some major announcements around micro-servers in the next months…

%d bloggers like this: