Bloomberg and Gartner finally are seeing what the rest of the IT industry already knew for some years, Oracle software is no longer attractive for new businesses. There is no amount of cloud washing that Oracle can do to hide the fact that open source is good enough now and often better than costly Oracle solutions. Smart companies have realized that Oracle databases are very complex to use and maintain and that they are not build for the webscale and horizontal cloud scale-out era. What is Oracle going to do next? It is going to do what it does best in those cases. When it sees that customers no longer buy its products, it’s buys its competitors. Expect Oracle to buy Datastax, Cloudera and/or Databricks. Afterwards it will do some more IoT washing until that does not work any longer and a competitor needs to be bought. The IT industry tends to become repetitive after some time…
1. Block chain
The block chain is the heart of digital currencies like Bitcoin. What most don’t realise yet is that the block chain will be used for managing everything from domain names, artist royalties, escrow contracts, auctions, lotteries, etc. You can do away with middlemen whose only reason of being is making sure they keep on getting a large cut in the value chain. Unless a middlemen or governmental institution adds real value, they are in danger of being block chained into the past.
2. Biometric security
A good example is the Nymi, a wearable that listens to your unique heart beat patterns and creates a unique identity. Even if people steal your Nymi, it is of no use since they need your heart to go with it.
3. Deep belief networks
Deep belief networks are the reason why Google’s voice recognition is surprisingly accurate, Facebook can tag photos automagically, self-driven cars, etc.
4. Smart labels
They are 1 to 3 millimetres small. They harvest electricity from their environment. They can detect people approaching within half a metre, sometimes even identify them and each product you will buy. Your microwave will not longer have to be told how to warm up a frozen meal.
A $35 Raspberry Pi 2 or Odroid is many multiples more powerful than the first Google server but the size of a credit card. Parallella is $99, same size, and almost ten times more coresP then the first Google server.
6. Apps and App Stores for Smart Devices
Snappy Ubuntu Core allows developers to create apps like mobile apps but to put them on any smart device from robots & drones to wifi, hubs, industrial gateways, switches, dishwashers, sprinkler controls, etc. Software developers will be able to innovate faster and hardware can be totally repurposed in seconds. A switch can become a robot controller.
7. Edge/proximity/fog clouds
Public clouds often have too much latency for certain use cases. Often connectivity loss is not tolerable. Think about security cameras. In a world where 4K quality IP cameras will become extremely cheap, you want machine learning imagine recognition to be done locally and not on the other side of the world.
8. Containers and micro-services orchestration
Docker is not new but orchestrating millions of containers and handling super small micro services is still on the bleeding edge.
9. Cheap personalised robots and drones
£35 buys you a robot arm in Maplin in the UK. Not really useful for major things except for educating the next generation robot makers. Robots and drones will have apps (point 6) for which personalised robots and drones are happening this year.
10. Smart watches and hubs
Smart hubs know who is in the house, where they are (if you wear a phone, health wearable or smart watch), what their physical state is (heartbeat via smart watch), what your face looks like and your voice. Your smart watch will know more about you then you want relatives to know. Today Google knows a husband is getting a divorce before they do [wife searches and uses google maps]. Tomorrow your smart watch will know you are going to have a divorce before you do [heart jumped when you looked at that girl, her heartbeat went wild when you came closer].
Charles – Chuck – Butler, a colleague at Canonical, wrote a very nice blog post explaining the basics of Big Data. It does not only explain them but it also allows anybody to set up Big Data solutions in minutes via Juju. Really recommended reading:
This is a good example of the power of cloud orchestration. Some expert creates charms and bundles them with Juju and afterwards anybody can easily deploy, integrate and scale this Big Data solution in minutes.
Samuel Cozannet, another colleague, used some of these components to create an open source Tweet sentiment analysis solution that can be deployed in 14 minutes and includes autoscaling, a dashboard, Hadoop, Storm, Kafka, etc. He presented it on the OpenStack Developer Summit in Paris and will be providing instructions for everybody to set it up shortly.
In 1999 you could easily spend $1M on having a company build a static web site. A few years later any student could make you a web site. HTML became a commodity. The same commodity effect needs to happen to Big Data.
The past: build your own petabyte solution
A few years back only the happy few extremely technically gifted companies were able to create solutions to store TBs and even PBs of data. Google started to write papers. Yahoo and Facebook started to release open source solutions. Shortly after Big Data became a buzz word and anybody that was somebody in the IT consultancy space was talking about Hadoop.
Now: open source solutions and lots of handholding
In 2014 it is possible to download Hadoop, Spark, Storm, etc. You can even find prepackaged solutions from Hortonworks, Cloudera, MapR, Pivotal, IBM, etc. But still Big Data projects are hard. You need very bright people or spend quite a lot to get anywhere. Many projects run over budget and under deliver.
Future: instant Big Data solutions
We are ready for the next step and convert Big Data in a commodity. Several startups are launching Big Data solutions as a service. Unfortunately for many SaaS providers, having a Big Data SaaS solution is not enough. Big Data means lots of data. Data that can hold sensitive information. Data that can grow with GBs a day. This is the reason why if any SaaS Big Data solution ought to be successful, it also needs an on-premise alternative.
We are also missing a portable Big Data logic container. The industry is raving about Docker. Several startups are working on making Docker containers the way to share your map-reduce logic. I predict that many more Big Data logic can be containerised and made portable. Any data scientist should be able to reuse Deep Belief or Random Forest algorithms by just reusing a container.
The other part of the puzzle that is still missing is data visualisation and manipulation tools. There are many Big Data key-value stores and map-reduce engines. However the data visualisation and reporting space is still wide open. The Apache Foundation does not [yet] provide a drag-and-drop tool to setup dashboards, generate reports, schedule notifications, run workflows, automate data imports, etc.
Industry specific reusable assets is another part that is missing. Nobody wants to go and reinvent eCommerce recommendation algorithms every time a new Big Data platform becomes available.
However all of this is coming at enormous speeds. As soon as all the pieces of the puzzle are coming together then cloud orchestration solutions like Juju, ServiceMesh, Brooklyn, etc. will allow enterprises to start consuming Big Data solutions as a commodity. Instant Big Data solutions are 6-36 months away depending on your requirements.
Have you ever counted the number of Linux devices at home or work that haven’t been updated since they came out of the factory? Your cable/fibre/ADSL modem, your WiFi point, television sets, NAS storage, routers/bridges, media centres, etc. Typically this class of devices hosts a proprietary hardware platform, an embedded proprietary Linux and a proprietary application. If you are lucky you are able to log into a web GUI often using the admin/admin credentials and upload a new firmware blob. This firmware blob is frequently hard to locate on hardware supplier’s websites. No wonder the NSA and others love to look into potential firmware bugs. They are the ideal source of undetected wiretapping.
The next IT revolution: micro-servers
The next IT revolution is about to happen however. Those proprietary hardware platforms will soon give room for commodity multi-core processors from ARM, Intel, etc. General purpose operating systems will replace legacy proprietary and embedded predecessors. Proprietary and static single purpose apps will be replaced by marketplaces and multiple apps running on one device. Security updates will be sent regularly. Devices and apps will be easy to manage remotely. The next revolution will be around managing millions of micro-servers and the apps on top of them. These micro-servers will behave like a mix of phone apps, Docker containers, and cloud servers. Managing them will be like managing a “local cloud” sometimes also called fog computing.
Micro-servers and IoT?
Are micro-servers some form of Internet of Things. Yes they can be but not all the time. If you have a smarthub that controls your home or office then it is pure IoT. However if you have a router, firewall, fibre modem, micro-antenna station, etc. then the micro-server will just be an improved version of its predecessor.
Why should you care about micro-servers?
If you are a mobile app developer then the micro-servers revolution will be your next battlefield. Local clouds need “Angry Bird”-like successes.
If you are a telecom or network developer then the next-generation of micro-servers will give you unseen potentials to combine traffic shaping with parental control with QoS with security with …
If you are a VC then micro-server solution providers is the type of startups you want to invest in.
If you are a hardware vendor then this is the type of devices or SoCs you want to build.
If you are a Big Data expert then imagine the new data tsunami these devices will generate.
If you are a machine learning expert then you might want to look at algorithms and models that are easy to execute on constraint devices once they have been trained on potentially thousands of cloud servers and petabytes of data.
If you are a Devop then your next challenge will be managing and operating millions of constraint servers.
If you are a cloud innovator then you are likely to want to look into SaaS and PaaS management solutions for micro-servers.
If you are a service provider then this is the type of solutions you want to have the capabilities to manage at scale and easily integrate with.
If you are a security expert then you should start to think about micro-firewalls, anti-micro-viruses, etc.
If you are a business manager then you should think about how new “mega micro-revenue” streams can be obtained or how disruptive “micro- innovations” can give you a competitive advantage.
If you are an analyst or consultant then you can start predicting the next IT revolution and the billions the market will be worth in 2020.
The next steps…
It is still early days but expect some major announcements around micro-servers in the next months…
Normally I write blog posts in which I answer questions. This time I would like to have somebody else provide the answer. Why is IT solving problems nobody has experienced yet?
I attend a lot of professional events around cloud, big data, IoT, etc. Hardly do I meet customers there. Mostly I meet suppliers that show me the solution to a problem that perhaps Google will experience in 5 years. I am overreacting but most IT problems are about scaling beyond terabytes. The problem is that most enterprises can’t find a quick way to setup a sub domain or to provision a new user in a central identity management system. Most enterprises need weeks if not months to do tasks that IT companies solved 5 or even 10 years ago in minutes. So why is it that trivial problems seem to capture enterprise attention? Just look at what is currently hot! Tableau software, Amazon Redshift and Dotcloud Docker. You would say that SAS, IBM, Teradata, Canonical, RedHat, Solaris/Sun/Oracle, etc. would have solved reporting, data storage for analytics and packaging Linux software. The market does not seem to agree. Can it be that the initial problems where aimed at early adopters and more and more features where added? The result is that by the time the majority started to use the “solution” it was already to complex?
Why do companies like complex solutions? Why are early adopters the drivers of people’s roadmap and not the majority? What does the IT industry need to do to better understand its enterprise customers? What are enterprise customers telling the IT industry? Are they saying one thing and doing another?
Data volumes are growing exponentially. Unstructured data from Twitter, LinkedIn, Mailling Lists, etc. has the potential to transform many industries if it could be combined with structured data. Machine learning, natural language processing, sentiment analysis, etc. everybody talks about them, hardly anybody is really using them at scale. Too many people when they talk about Big Data unfortunately start with the answer and then ask what the problem it. The answer seems to be Hadoop. News flash: Hadoop is not the answer and if you start from the answer to look for problems then you are doing it wrong.
What are Common Data Problems?
Most Big Data problems are about storage and reporting. How do I store all the exponentially growing data in such a way that business managers can get to in seconds when they need it? Ad-hoc reporting, adequate prediction, and making sense of the exponentially growing data stream are the key problems.
Big Data Storage?
Do you have relational data, unstructured data, graph data, etc.? How do you store different types of data and make it available inside an enterprise? The basics for big data storage is cloud storage technology. You want to store any type of data and be able to quickly scale up storage. RedHat did not buy Inktank for $175M because traditional storage has solved all of today’s problems. Premium SAN and other storage technologies are old school. They are too expensive for Big Data. They were designed with the idea that each byte of data is critical for an enterprise. Unfortunately this is no longer the case. You mind loosing transactional sales data. You don’t mind so much loosing sample tweets you bought from Datasift or Apache log files from an internal low-impact server. This is where cloud storage solutions like Inktank’s Ceph allow commodity storage to be built that is reliable, scalable and extremely cost effective. Does this mean you don’t need SANs any more? Wrong again. TV did not kill Radio. Same here.
Cloud storage technologies are needed because each type of data behaves differently. If you have log data that only is appended then HDFS is fine. If you have read-mostly data then a relational database is ideal. If you have write-mostly data then you need to look at NoSQL. If you need heavy read-and-write then you need strong Big Data architecture skills. What is more important: short latency, consistency, reliability, cheap storage, etc.? Each of these means that the solution is different. No latency means in-memory or SSD. Consistency means transactional. Reliability means replication. You can even now find inconsistent databases like BlinkDB. There is no longer one size fits all. Oracle is no longer the answer to everybody’s data questions.
What will companies need? Companies need cloud storage solutions that offer these different storage capabilities like a service. Amazon’s RDS, DynamoDB, S3 and Redshift are examples of what companies need. However companies need more flexibility. They need to be able to migrate their data between public cloud providers to optimise their costs and have added security. They also need to be able to store data in private local clouds or nearby hosted private clouds for latency or regulatory reasons.
The future of ETL & BI
Traditional ETL will see a revolution. ETL never worked. Business managers don’t want to go and ask their IT department to make a change in a star schema in order to import some extra data from the Internet followed by updates to reports and dashboards. Business managers want an easy to use tool that can answer their ad-hoc queries. This is the reason why Tableau Software + Amazon Redshift are growing like crazy. However if your organisation is starting to pump terabytes of data into Redshift, please be warned: The day will come that Amazon sends you a bill that your CxO will not want to pay and he/she will want you to move out of Amazon. What will you do then? Do you have an exit strategy?
The future of ETL and BI will be web tools that any business manager can use to create ad-hoc reports. The Office generation wants to see dynamic HTML5 GUIs that allow them to drag-and-drop data queries into ad-hoc reports and dashboards. If you need training then the tool is too difficult.
These next-generation BI tools will need dynamic back-office solutions that allow storing real-time, graph, blob, historical relational, unstructured, etc. data into a commonly accessible cloud storage solution. Each one will be hosted by a different cloud service but they will all be an API away. Software will be packaged in such a way that it knows how to export its own data. Why do you need to know where Apache stores the access and error logs and in which format? Apache should be able to export whatever interesting information it contains in a standardised way into some deep storage. Machine learning should be used to make decisions on how best to store that data for ad-hoc reporting afterwards. Humans should no longer be involved in this process.
Talking about machine learning. With the volumes of data growing from gigabytes into petabytes, traditional data scientists will not scale. In many companies a data scientist is similar to a report monkey: “Find out why in region X we sold Y% less”, etc. Data scientist should not be synonymous for dynamic report generators. Data scientists should be machine learning experts. They should tell the computer what they want, not how they want it. Today’s data scientists pride themselves they know R, Python, etc. These tools are too low-level to be usable at scale. There are just not enough people in the world to learn R. Data is growing exponentially, R experts at best can grow linear. What we need are machine learning GUI solutions like RapidMiner Studio but supported by Petabyte cloud solutions. A short term solution could be an HTML5 GUI version of RapidMiner Studio that connects to a back-end set of cloud services that use some of the nice Apache Spark extensions for machine learning, streaming, Big Data warehousing/SQL, graph retrieval, etc. or solutions based on Druid.io. For sure there are other solutions possible.
What is important is that companies start realising that data is becoming a strategic weapon. Those companies that are able to collect more of it and convert it into valuable knowledge and wisdom will be tomorrow’s giants. Most average machine learning algorithms become substantially better just by throwing more and more data at them. This means that having a Big Data architecture is not as critical as having the best trained models in the industry and continue to train them. There will be a data divide between the have’s and have-not’s. Google, Facebook, Microsoft and others have been buying any startup that smells like Deep Belief Networks. They have done this with a good reason. They know that tomorrow’s algorithms and models will be more valuable than diamonds and gold. If you want to be one of the have’s then you need to invest in cloud storage now. You need to have massive historical data volumes to train tomorrow’s algorithms and start building the foundations today…