There are many stories in the market about how data scientists are actually using Hadoop to analyze customer data, website click traffic, financial data, and more. A large majority of Strata+Hadoop World 2015 attendees saw value in Hadoop, while only a small number were skeptical. "Data scientist" was among the most common titles at the show, which suggests we also need security specialists to help big data Hadoop developers get into the security market.
At the show, it became clear that securing identities for Hadoop matters to everyone planning to move their Hadoop deployments into production. Many people at the event were responsible for Hadoop applications tied to the business, and some did not even know that their own company was already a Hadoop customer. There are many cases like that in the market, where Hadoop was started on the development side without IT being aware of it. This indicates that it is important to secure development to the same extent that you would secure production.
Security is a very broad term, and "secure Hadoop" means different things to different users. The most commonly asked question is what is so different about Hadoop security and tools like Knox, Ranger, or Sentry. The answer is that leading IT vendors offer Active Directory-based identities for the users and groups these Hadoop tools need, so the actual user can be identified and authorized before accessing stored data.
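To make the Active Directory point concrete: Hadoop can resolve user-to-group mappings against AD over LDAP via its built-in LdapGroupsMapping class. A minimal sketch of the relevant core-site.xml properties follows; the hostname, bind user, and search base shown are illustrative placeholders, not values from any specific deployment.

```
<!-- core-site.xml: resolve Hadoop group membership from Active Directory
     over LDAP. The URL, bind DN, and base DN below are placeholders. -->
<property>
  <name>hadoop.security.group.mapping</name>
  <value>org.apache.hadoop.security.LdapGroupsMapping</value>
</property>
<property>
  <name>hadoop.security.group.mapping.ldap.url</name>
  <value>ldap://ad.example.com:389</value>
</property>
<property>
  <name>hadoop.security.group.mapping.ldap.bind.user</name>
  <value>cn=hadoop-bind,ou=ServiceAccounts,dc=example,dc=com</value>
</property>
<property>
  <name>hadoop.security.group.mapping.ldap.base</name>
  <value>dc=example,dc=com</value>
</property>
```

With group resolution delegated to AD, tools like Ranger or Sentry can write authorization policies against the same groups the rest of the organization already manages.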
They offer system-level controls that allow authorized users to log in, and they set up each user's Kerberos identity, which is needed to access Hadoop when it runs in secure mode. Beyond this, they also offer privilege management on almost every node in the cluster, so that IT staff can easily log in to perform their work (start, stop, or restart certain Hadoop services, or modify and improve configuration files) while still being denied access to the data held within the cluster.
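For context, "secure mode" is switched on through two core-site.xml properties, after which users must hold a Kerberos ticket before Hadoop commands will succeed. This is a minimal sketch; the realm and principal in the usage note are made-up examples.

```
<!-- core-site.xml: enable Kerberos authentication and service-level
     authorization (the two switches that put Hadoop in secure mode). -->
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>
```

With this in place, a user would typically run something like `kinit alice@EXAMPLE.COM` to obtain a ticket before a command such as `hdfs dfs -ls /` is allowed to talk to the cluster.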
In other words, you grant them specific rights to OS management commands, not directly to Hadoop commands. Session auditing then provides, in effect, a video recording of the user's actions on the nodes within the cluster, as opposed to the cryptic events found in syslog.
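The split between OS management rights and data access described above can be expressed with ordinary sudo rules. A minimal sketch, assuming systemd-managed services; the group name and service unit names are assumptions and will differ across distributions and deployments.

```
# /etc/sudoers.d/hadoop-ops -- illustrative only; group and unit names are assumed.
# Members of hadoop-ops may start, stop, or restart the NameNode service...
%hadoop-ops ALL=(root) NOPASSWD: /bin/systemctl start hadoop-hdfs-namenode, \
                                 /bin/systemctl stop hadoop-hdfs-namenode, \
                                 /bin/systemctl restart hadoop-hdfs-namenode
# ...but they receive no general root shell and no HDFS superuser identity,
# so these rights do not let them read data stored in the cluster.
```

The key design point is that the rules whitelist specific OS commands rather than granting blanket root, which is exactly the distinction the paragraph above draws between OS management commands and Hadoop data access.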
Apache Hadoop, a core component of any modern data architecture, allows companies to gather, store, analyze, and transform data at any scale on their own terms, regardless of the data's source, storage, format, or age.