A Big Data-Base that is fast but inaccurate: BlinkDB
The idea might sound strange at first: why would you want a database that delivers inaccurate data? The answer is that BlinkDB trades accuracy for speed. When you query data, you can specify either how long you are willing to wait for the answer, e.g. within 2 seconds, or how accurate the answer must be, e.g. within 1% error at 95% confidence.
So if you have very large amounts of data (tens to hundreds of terabytes, or even petabytes) and you want quick, good-enough answers, then BlinkDB is for you. An early adopter is Facebook. Would you rather have Justin Bieber's follower count exactly right after minutes of waiting, or 99% right with a page that loads almost instantly? If you prefer fast, reasonably accurate answers over slow, exact ones, BlinkDB is worth checking out.
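To make the "1% error with 95% confidence" idea concrete, here is a minimal Python sketch of an error-bounded average computed from a growing random sample. It is an illustration of the general sampling idea only, not BlinkDB's actual implementation or API: the function name `approximate_mean`, its parameters, and the synthetic data are all assumptions made for this example, and the confidence interval uses a simple normal approximation.

```python
import math
import random

Z_95 = 1.96  # normal quantile for a two-sided 95% confidence interval


def approximate_mean(values, max_relative_error=0.01, batch_size=1000, seed=0):
    """Estimate the mean of `values` from a growing random sample.

    Hypothetical helper for illustration. Stops as soon as the 95%
    confidence half-width is at most max_relative_error * |estimate|,
    or when every row has been read.
    Returns (estimate, half_width, rows_read).
    """
    if not values:
        raise ValueError("values must not be empty")

    rng = random.Random(seed)
    order = list(range(len(values)))
    rng.shuffle(order)  # any prefix of a random permutation is a uniform sample

    n = 0
    total = 0.0
    total_sq = 0.0
    estimate = 0.0
    half_width = float("inf")
    for start in range(0, len(order), batch_size):
        # Read one more batch of sampled rows and update running sums.
        for i in order[start:start + batch_size]:
            x = values[i]
            n += 1
            total += x
            total_sq += x * x
        estimate = total / n
        if n > 1:
            # Sample variance from the running sums (clamped at zero).
            variance = max((total_sq - n * estimate * estimate) / (n - 1), 0.0)
            half_width = Z_95 * math.sqrt(variance / n)
            if half_width <= max_relative_error * abs(estimate):
                break

    if n == len(values):
        half_width = 0.0  # every row was read, so the mean is exact
    return estimate, half_width, n


# Usage on synthetic data (assumed for the example): 200,000 noisy values.
data_rng = random.Random(42)
data = [data_rng.gauss(100.0, 20.0) for _ in range(200_000)]
estimate, half_width, rows_read = approximate_mean(data, max_relative_error=0.01)
print(f"mean ~ {estimate:.2f} +/- {half_width:.2f} "
      f"after reading {rows_read} of {len(data)} rows")
```

The point of the sketch is that the loop can stop after reading only a small fraction of the rows once the estimated error is small enough. A time-bounded variant would follow the same pattern but stop when a deadline passes and report whatever error bound has been reached by then.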
What can you use BlinkDB for?
- The obvious use case is real-time reporting. If you need to make decisions in the blink of an eye, as day traders do, and a 5-10% error is acceptable, you could ask, for example, for the average change of all commodity prices over the last 2 seconds.
- Real-time bookings or price comparisons in which users want to know the best possible offer but accept a small error margin, e.g. mobile bar-code scanners that deliver product price comparisons in 1 second instead of 10 will dominate the App Store.
- Any counter on a large website: visitors, friends, tweets, total search results, etc.
- Any Power Law or Long Tail data in which there are some extremely popular cases, e.g. Justin Bieber's followers, or a very large set of infrequent cases, e.g. the number of blogs that have under 1000 visitors per month.
- Machine Learning solutions and recommendation engines that use Collaborative Filtering and other algorithms that compare an item or user with large groups of other items and users.
- and many other use cases…
Categories: Big Data, Big Data Future, Disrup. Technology, High Scalability
Tags: Berkeley, big data queries, BlinkDB, hadoop, hive, Justin Bieber queries, long tail queries, optimize large queries, petabyte queries, power law queries, real-time big data reporting, real-time collaborative filtering, real-time machine learning, real-time queries, shark, spark, terabyte queries