How checksum smartly manages data integrity in HDFS (Hadoop Distributed File System).

Ensuring data integrity is a basic necessity, the backbone of any big data processing environment, for achieving accurate outcomes. Of course, the same applies when executing any data-moving operations with […]

Network Topology to create Multi Node Hybrid cluster for Hadoop Installation

This article outlines how to create a network topology for a Hadoop installation on a multi-node hybrid cluster with limited hardware resources. This cluster would […]

Data Analytics in Sports

How Data Analytics is transforming Sports Performance – Captivating

The latest buzzword in the business world is data analytics, a science in which data is used to build models that support better decisions. Business […]

Data Governance

Data governance and security mechanism in distributed data storage system

We are well aware that traditional data storage mechanisms are incapable of holding the massive volume of data generated at lightning speed for further utilization, even if we perform vertical […]

A Credible approach of Big Data processing and subsequent analysis on telecom data to minimize crime, combat terrorism, unsocial activities, etc.

Telecom providers have a treasure trove of captive data – customer data, CDRs (call detail records), call center interactions, tower logs, etc. – and are metaphorically “sitting on a gold mine”. […]


Deleting Solr log files/folder from Standby NameNode could be the disaster when Primary NameNode is active in the HDP (Hortonworks Data Platform) Hadoop Cluster

Most of us know that Apache Ambari is used for managing, provisioning, and monitoring the different components of a Hortonworks Hadoop cluster. We also know that Apache Ranger can be used […]


Fault tolerance enhancement on Apache Hadoop 3.0.0-alpha2 by supporting more than 2 NameNodes.

The NameNode is the most critical resource in a Hadoop core cluster. Once very large files are loaded into the Hadoop Distributed File System (HDFS), they are broken into block-sized chunks as […]

Streaming by Apache Flink

Basic Understanding of Stateful data Streaming supported by Apache Flink

Big data processing technologies are maturing in order to handle streaming data efficiently, which is becoming a major focal point for taking business […]

Apache Flink

Apache Flink – A 4G Data Processing Engine

Analyzing streaming data in large-scale systems is increasingly becoming a focal point for making accurate business decisions, owing to the mushrooming of digital data sources around the globe […]


Steering number of mapper (MapReduce) in sqoop for parallelism of data ingestion into Hadoop Distributed File System (HDFS)

To import data from most data sources, such as an RDBMS, Sqoop internally uses mappers. Before delegating responsibility to the mappers, Sqoop performs a few initial operations in sequence once […]