Multiple group mapping providers are now possible

User group mapping is a basic but important piece of functionality in Hadoop. It is used to look up group information for Hadoop services and users. Hadoop-related components such as HDFS, MapReduce, HBase, Hive and Oozie all implement their authorization features on a user + group model. The user comes from the client request; the groups are then resolved by a group mapping provider. Currently Hadoop supports configuring one such provider, and it is pluggable. There are two kinds of provider to choose from: ShellBasedUnixGroupsMapping (and the like), and LdapGroupsMapping. The former gets groups from the *nix OS via the “id -Gn” command, and the latter queries AD/LDAP for group entries. Generally speaking, ShellBasedUnixGroupsMapping is efficient and reliable and suits Hadoop service users such as hdfs, mapred, hbase, hive and oozie, while the LDAP provider suits Hadoop end users, since it avoids having to add group entries for large numbers of end users to the *nix OS.
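For reference, the single-provider setup described above is selected through the hadoop.security.group.mapping property in core-site.xml. A minimal sketch, assuming the shell-based provider is the one in use:

```xml
<!-- Sketch of today's single-provider setup; only one class can be named here. -->
<property>
<name>hadoop.security.group.mapping</name>
<value>org.apache.hadoop.security.ShellBasedUnixGroupsMapping</value>
<description>
Class for user to group mapping (get groups for a given user) for ACL.
</description>
</property>
```

Switching the value to org.apache.hadoop.security.LdapGroupsMapping would send every lookup to LDAP instead; either way, all users share the one provider.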

So what’s the problem? Hadoop currently supports only ONE provider at a time, and we can’t configure more. You may wonder why we would need more than one, so let’s look at a typical use case. Big organizations often use AD as the identity store for their users. When a Hadoop cluster is deployed with Kerberos authentication in such an organization, the best practice is to use MIT Kerberos plus AD, where MIT Kerberos trusts the AD realm. Hadoop service principals authenticate with MIT Kerberos, while end users still authenticate with AD as before. For user group mapping, two user sources may well mean two group sources. If only one provider can be used, the two group sources have to be merged into a single place, which can be a big overhead or headache. It does not make much sense to add group entries to AD for service principals just for this, nor to add group entries to the *nix OS for end users.

To resolve this problem, one possible solution is to implement another provider that handles the more complex situation of multiple group sources. That is direct and the most flexible option since you can do whatever you want, but obviously you have to develop it. Of course, you don’t have to if one already exists that does the job. CompositeGroupsMapping is such a provider, since it was written for exactly this.

CompositeGroupsMapping combines multiple existing group mapping provider implementations into a virtual one that handles more than one group source. With it, one can use
ShellBasedUnixGroupsMapping + LdapGroupsMapping,
or even more complex,
ShellBasedUnixGroupsMapping + LdapGroupsMapping for domain X + LdapGroupsMapping for domain Y.

So how do we use it? Let’s illustrate with the configuration for the latter, more complex case:
ShellBasedUnixGroupsMapping for service principals + LdapGroupsMapping for domain X + LdapGroupsMapping for domain Y.

First, configure Hadoop to use the CompositeGroupsMapping provider:

<property>
<name>hadoop.security.group.mapping</name>
<value>org.apache.hadoop.security.CompositeGroupsMapping</value>
<description>
Class for user to group mapping (get groups for a given user) for ACL, which
makes use of multiple other providers to provide the service.
</description>
</property>

Then configure which providers to combine, giving each one a name:

<property>
<name>hadoop.security.group.mapping.providers</name>
<value>shell4services,ad4usersX,ad4usersY</value>
<description>
Comma-separated list of names of the other providers that provide user to group mapping.
</description>
</property>

<property>
<name>hadoop.security.group.mapping.provider.shell4services</name>
<value>org.apache.hadoop.security.ShellBasedUnixGroupsMapping</value>
<description>
Class for group mapping provider named by 'shell4services'. The name can then be referenced
by hadoop.security.group.mapping.providers property.
</description>
</property>

<property>
<name>hadoop.security.group.mapping.provider.ad4usersX</name>
<value>org.apache.hadoop.security.LdapGroupsMapping</value>
<description>
Class for group mapping provider named by 'ad4usersX'. The name can then be referenced
by hadoop.security.group.mapping.providers property.
</description>
</property>

<property>
<name>hadoop.security.group.mapping.provider.ad4usersY</name>
<value>org.apache.hadoop.security.LdapGroupsMapping</value>
<description>
Class for group mapping provider named by 'ad4usersY'. The name can then be referenced
by hadoop.security.group.mapping.providers property.
</description>
</property>

Next, configure which provider each user should go to for group info, based on the user's domain:

<property>
<name>hadoop.security.group.mapping.provider.ad4usersX.domain</name>
<value>EXAMPLE-X.COM</value>
<description>
Domain or realm of the users who should go to the provider named 'ad4usersX' for group mapping.
</description>
</property>

<property>
<name>hadoop.security.group.mapping.provider.ad4usersY.domain</name>
<value>EXAMPLE-Y.COM</value>
<description>
Domain or realm of the users who should go to the provider named 'ad4usersY' for group mapping.
</description>
</property>

We also need to set the AD-specific configuration for each LdapGroupsMapping instance:

<property>
<name>hadoop.security.group.mapping.provider.ad4usersX.ldap.url</name>
<value>ldap://ad-host-for-users-X:389</value>
<description>
LDAP URL for the provider named 'ad4usersX'. Note this property is derived from
'hadoop.security.group.mapping.ldap.url'.
</description>
</property>

<property>
<name>hadoop.security.group.mapping.provider.ad4usersY.ldap.url</name>
<value>ldap://ad-host-for-users-Y:389</value>
<description>
LDAP URL for the provider named 'ad4usersY'. Note this property is derived from
'hadoop.security.group.mapping.ldap.url'.
</description>
</property>

more here omitted ...
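For comparison, the simpler ShellBasedUnixGroupsMapping + LdapGroupsMapping combination mentioned earlier could be configured the same way. This is only a sketch: it assumes the property naming scheme stays as shown above in the patch under review, and the provider name 'ad4users', the domain EXAMPLE.COM and the host 'ad-host-for-users' are hypothetical placeholders.

```xml
<!-- Sketch only: property names follow the scheme used in this post. -->
<property>
<name>hadoop.security.group.mapping</name>
<value>org.apache.hadoop.security.CompositeGroupsMapping</value>
</property>

<!-- Two named providers: 'shell4services' as above, 'ad4users' is a hypothetical name. -->
<property>
<name>hadoop.security.group.mapping.providers</name>
<value>shell4services,ad4users</value>
</property>

<property>
<name>hadoop.security.group.mapping.provider.shell4services</name>
<value>org.apache.hadoop.security.ShellBasedUnixGroupsMapping</value>
</property>

<property>
<name>hadoop.security.group.mapping.provider.ad4users</name>
<value>org.apache.hadoop.security.LdapGroupsMapping</value>
</property>

<!-- Hypothetical domain and host; replace with your own AD realm and server. -->
<property>
<name>hadoop.security.group.mapping.provider.ad4users.domain</name>
<value>EXAMPLE.COM</value>
</property>

<property>
<name>hadoop.security.group.mapping.provider.ad4users.ldap.url</name>
<value>ldap://ad-host-for-users:389</value>
</property>
```

As in the larger example, the remaining LDAP settings for the 'ad4users' provider would still need to be supplied.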

As I write this post, the patch for the CompositeGroupsMapping provider has just become ready for review. If you need it right now, you can download the patch, apply it and build Hadoop yourself. Believe me, or the tests: it works.

Your feedback is welcome. Thanks.
