Google News uses the clustering machine learning technique to group similar news stories or articles together. Interestingly, they do not employ thousands of news editors; instead, clustering forms groups of similar data based on common characteristics. Mahout is a machine learning library from the Apache community that applications leverage to analyse large data sets. Before Mahout, analysing large data sets was far more complex. Mahout extensively utilizes Apache Hadoop to harness the power of parallel and distributed computing. Mahout offers three machine learning techniques.
Clustering does not group data into an existing set of known categories. This is particularly useful when we are not sure how to organize the data in the first place. Google News uses this powerful technique to manage the ever-changing stream of news and articles from around the world, keeping us up to date with the latest events around the globe.
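The idea can be sketched with a minimal k-means implementation in plain Python. This is only an illustration of the clustering concept, not Mahout's API; Mahout provides distributed implementations of algorithms like this on top of Hadoop. The sample points are hypothetical "articles" embedded as 2-D feature vectors.

```python
# Illustrative k-means clustering sketch (not the Mahout API):
# group points by iteratively refining cluster centroids.
import math
import random

def kmeans(points, k, iterations=20, seed=42):
    """Group 2-D points into k clusters around k centroids."""
    random.seed(seed)
    centroids = random.sample(points, k)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its assigned points.
        for i, cluster in enumerate(clusters):
            if cluster:
                centroids[i] = (sum(p[0] for p in cluster) / len(cluster),
                                sum(p[1] for p in cluster) / len(cluster))
    return centroids, clusters

# Two obvious groups of hypothetical "articles" as 2-D feature vectors.
points = [(1.0, 1.1), (1.2, 0.9), (0.8, 1.0),
          (9.0, 9.2), (9.1, 8.8), (8.9, 9.0)]
centroids, clusters = kmeans(points, k=2)
print(sorted(len(c) for c in clusters))  # each of the two clusters gets 3 points
```

Note that no category labels are supplied anywhere: the algorithm discovers the two groups purely from the common characteristics of the data, which is exactly what distinguishes clustering from classification.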
The recommendation technique combines user information with community information to predict which products or services we are likely to prefer, typically while we browse e-commerce sites.
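A minimal sketch of this idea is user-based collaborative filtering: score items a user has not rated by the ratings of users with similar tastes. This plain-Python example only illustrates the concept; it is not Mahout's recommender API, and the user names and ratings are invented for the example.

```python
# Illustrative user-based recommendation sketch (not Mahout's recommender API).
def cosine(a, b):
    """Cosine similarity between two sparse rating dicts."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    na = sum(v * v for v in a.values()) ** 0.5
    nb = sum(v * v for v in b.values()) ** 0.5
    return dot / (na * nb)

def recommend(ratings, user):
    """Rank unseen items for `user` by similarity-weighted neighbour ratings."""
    scores = {}
    for other, their in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], their)
        if sim <= 0:
            continue  # ignore users with no taste overlap
        for item, rating in their.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical user-item ratings, as an e-commerce site might collect.
ratings = {
    "alice": {"laptop": 5, "mouse": 4},
    "bob":   {"laptop": 5, "mouse": 5, "keyboard": 4},
    "carol": {"phone": 5, "case": 4},
}
print(recommend(ratings, "alice"))  # → ['keyboard']
```

Here alice's ratings overlap with bob's, so bob's keyboard is recommended, while carol's unrelated purchases are ignored; the "community information" is exactly these neighbour ratings.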
The classification technique, or classification engine, is generally used to segregate emails into spam and non-spam based on known, labeled training data.
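A common algorithm for this is naive Bayes, which Mahout also ships in distributed form. The plain-Python sketch below only illustrates the idea of learning from labeled examples; the training messages are invented for the example.

```python
# Illustrative naive Bayes spam classifier (concept sketch, not Mahout's API).
import math
from collections import Counter

def train(examples):
    """examples: list of (text, label). Returns word counts and label counts."""
    counts = {"spam": Counter(), "ham": Counter()}
    totals = Counter()
    for text, label in examples:
        counts[label].update(text.lower().split())
        totals[label] += 1
    return counts, totals

def classify(text, counts, totals):
    """Pick the label maximizing log prior + log likelihood of the words."""
    vocab = set(counts["spam"]) | set(counts["ham"])
    best, best_score = None, float("-inf")
    for label in counts:
        score = math.log(totals[label] / sum(totals.values()))
        denom = sum(counts[label].values()) + len(vocab)
        for word in text.lower().split():
            # Add-one (Laplace) smoothing for unseen words.
            score += math.log((counts[label][word] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best

examples = [
    ("win free money now", "spam"),
    ("cheap meds free offer", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch with the project team", "ham"),
]
counts, totals = train(examples)
print(classify("free money offer", counts, totals))  # → spam
```

Unlike clustering, the categories here (spam and non-spam) are known in advance, and the engine learns their word statistics from the labeled examples.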
Gautam can be contacted for real-time POC development, hands-on technical training, and to develop or support any Hadoop-related project. Email: [email protected]. Gautam is a consultant as well as an educator. Previously, he worked as a Senior Technical Architect across multiple technologies and business domains. Currently, he specializes in Big Data processing and analysis, data lake creation, and architecture using HDFS. He is also involved in HDFS maintenance, loading multiple types of data from different sources, and the design and development of real-time use cases on customer demand to demonstrate how data can be leveraged for business transformation and profitability. He is passionate about sharing knowledge through blogs, seminars, and presentations on various Big Data technologies, methodologies, and real-time projects, including their architecture and design, procedures for high-volume data ingestion, and basic data lake creation.