Hadoop is widely treated as the de facto choice for a DataLake, on the assumption that we can throw any data at it. In reality, we cannot. Most of our data is convertible to plain text, and Hadoop accepts only plain text, so we convert all our data into plain text and then throw it into Hadoop.
But does plain-text storage in Hadoop solve any problem? Yes: it lets you store all your organisational data in one place. However, because your JSON and XML went in as plain text, analysing it within Hadoop now requires extra code to parse it first.
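As a rough illustration of that extra code, the sketch below sums order totals from JSON documents that were stored as plain text, one document per line. The records, field names and the `total_by_customer` helper are hypothetical, chosen only for this example; the point is that every analysis has to start with its own parsing and error-handling step.

```python
import json

# Hypothetical records as they would sit in a plain-text file:
# one JSON document per line, with no schema enforced by the store.
raw_lines = [
    '{"customer": "acme", "order": {"total": 120.5, "currency": "USD"}}',
    '{"customer": "globex", "order": {"total": 80.0, "currency": "USD"}}',
    'not valid json',
    '{"customer": "acme", "order": {"total": 40.0, "currency": "USD"}}',
]


def total_by_customer(lines):
    """Parse each text line as JSON and sum order totals per customer."""
    totals = {}
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # Plain text carries no format guarantee, so malformed
            # lines have to be detected and skipped by hand.
            continue
        customer = record["customer"]
        totals[customer] = totals.get(customer, 0.0) + record["order"]["total"]
    return totals


print(total_by_customer(raw_lines))
```

None of the parsing logic here is specific to the question being asked, which is the overhead the paragraph above refers to.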
|PDF, Word, Excel data||No||Yes|
(Java + Scala)
(Possible with Spark)
Using BlobCity as a DataLake future-proofs your DataLake infrastructure. With native support for 17 different data formats and the option of moving part of the data into memory for faster analytics, BlobCity strikes the right balance between features, performance, customer needs and cost effectiveness.
New systems may bring in data in newer formats that are not currently anticipated, and BlobCity will most likely accept those formats readily. This means new systems can report data to your DataLake with minimal integration effort.
If some queries perform slowly because of disk I/O limits, the corresponding data can be instantly moved into memory to allow high-speed, real-time analytics over that data.