Wednesday, June 3, 2015

Connecting to Hadoop CDH with a Windows client

Random notes and links on getting Windows clients to work with CDH-5.3.1:

  1. Use the CDH releases of the Hadoop libraries, see http://blog.cloudera.com/blog/2012/08/developing-cdh-applications-with-maven-and-eclipse/ (for other Hadoop distributions, this just means: use the same versions on the client as in the cluster)
  2. Set the necessary properties for cross-OS job submission (why, oh why is such a thing necessary?), and get winutils.exe, see https://github.com/spring-projects/spring-hadoop/wiki/Using-a-Windows-client-together-with-a-Linux-cluster
  3. Set the environment variable HADOOP_USER_NAME (note the underscore between USER and NAME) to the user the job should run as on the cluster, see http://stackoverflow.com/a/11062529/1319284
  4. If you are using HDFS (which you probably are), add hadoop-hdfs to your classpath if it is not there already, see http://stackoverflow.com/a/24492225/1319284
  5. Check the firewall rules of the nodes. On CentOS you can do this with system-config-firewall, see https://www.centos.org/docs/5/html/Deployment_Guide-en-US/ch-fw.html. For a list of the ports used by the various CDH components, see http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_ig_ports_cdh5.html
  6. Make sure all configured host names can be resolved from the Windows client, and edit the hosts file if necessary. (It is located at C:\Windows\System32\drivers\etc\hosts)
  7. Make sure your compiler compliance level targets the Java version that runs on the cluster. Right now this is 1.7. Failing to do so generates errors like Unsupported major.minor version 52.0 (52.0 is the class file version produced by a Java 8 compiler). See for example http://stackoverflow.com/questions/22489398/unsupported-major-minor-version-52-0
  8. When using Eclipse, make sure to export a jar (or build with Maven), and then add it to the classpath of the launch configuration. That way, Job.setJarByClass will find the jar, which can then be uploaded to the cluster. Granted, this is a little hacky, but it works.
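
Items 2 and 3 can also be done from code instead of the environment. The following is only a sketch of how that might look, not the exact setup I used. The property key mapreduce.app-submission.cross-platform and the key fs.defaultFS are my assumption of what "the necessary properties" come down to on Hadoop 2.4 and later; the path C:\hadoop, the user name and the NameNode address are placeholders. To keep it runnable without Hadoop on the classpath, the job properties are returned as a plain map that you would copy into your Configuration with conf.set(key, value).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch of the client-side settings from items 2 and 3 (class name is hypothetical). */
class WindowsHadoopClientSetup {

    /**
     * Sets the JVM system properties that Hadoop looks at on a Windows client.
     * Call this before the first Hadoop class is loaded.
     */
    static void applySystemProperties(String hadoopHome, String userName) {
        // Directory that contains bin\winutils.exe (item 2).
        System.setProperty("hadoop.home.dir", hadoopHome);
        // Alternative to the HADOOP_USER_NAME environment variable (item 3).
        // Assumption: the cluster uses simple (non-Kerberos) authentication.
        System.setProperty("HADOOP_USER_NAME", userName);
    }

    /** Key/value pairs to copy into the job's Configuration via conf.set(key, value). */
    static Map<String, String> jobProperties(String nameNodeUri) {
        Map<String, String> props = new LinkedHashMap<>();
        // Where the client finds HDFS; the URI is a placeholder for your NameNode.
        props.put("fs.defaultFS", nameNodeUri);
        // Assumed key: lets a Windows client submit jobs to a Linux cluster.
        props.put("mapreduce.app-submission.cross-platform", "true");
        return props;
    }

    public static void main(String[] args) {
        applySystemProperties("C:\\hadoop", "hdfs");
        Map<String, String> props = jobProperties("hdfs://namenode.example.com:8020");
        for (Map.Entry<String, String> e : props.entrySet()) {
            // With Hadoop on the classpath: conf.set(e.getKey(), e.getValue());
            System.out.println(e.getKey() + "=" + e.getValue());
        }
    }
}
```

Check the spring-hadoop wiki page linked in item 2 for the full list of properties that apply to your Hadoop version.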

After doing all this, I successfully ran my MapReduce job from Eclipse.
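
If job submission hangs or fails with connection errors, items 5 and 6 are the usual suspects. A small check like the one below can narrow it down before you start digging through Hadoop logs. It is a rough sketch using only the JDK; the class name is made up, and the host:port pairs you pass in have to come from your own cluster configuration and the Cloudera port list linked in item 5.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.UnknownHostException;

/** Checks name resolution (item 6) and open ports (item 5) from the client side. */
class ClusterReachabilityCheck {

    /** True if the client can resolve the host name (DNS or hosts file). */
    static boolean resolves(String host) {
        try {
            InetAddress.getByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    /** True if a TCP connection to host:port succeeds within the timeout. */
    static boolean portOpen(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, filtered by a firewall, or host unknown.
            return false;
        }
    }

    /** Usage: java ClusterReachabilityCheck host1:port1 host2:port2 ... */
    public static void main(String[] args) {
        for (String arg : args) {
            int colon = arg.lastIndexOf(':');
            if (colon < 1) {
                System.out.println("skipping '" + arg + "', expected host:port");
                continue;
            }
            String host = arg.substring(0, colon);
            int port = Integer.parseInt(arg.substring(colon + 1));
            System.out.println(host + " resolves: " + resolves(host)
                    + ", port " + port + " open: " + portOpen(host, port, 2000));
        }
    }
}
```

Run it with every node the client has to talk to, not only the master: the client needs to reach the worker nodes for HDFS reads and writes as well.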

[Update]
For CDH 5.5.0 (Hadoop 2.6.0), a binary build with winutils.exe can be downloaded from http://www.barik.net/archive/2015/01/19/172716/
In addition to setting hadoop.home.dir, java.library.path must be set to the bin directory of that build.
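
Note that java.library.path is read when the JVM starts, so it generally has to be passed as a VM argument (-Djava.library.path=...) in the launch configuration; setting it later with System.setProperty usually has no effect on library loading. To catch a wrong setup early, something along these lines can be called at the top of main. This is a sketch with a made-up class name, and it only checks that winutils.exe exists under hadoop.home.dir and that the same bin directory appears in java.library.path.

```java
import java.io.File;

/** Sanity check for hadoop.home.dir and java.library.path on a Windows client. */
class WinutilsCheck {

    /** Returns null if the setup looks fine, otherwise a description of the first problem found. */
    static String findProblem(String hadoopHome, String libraryPath) {
        if (hadoopHome == null) {
            return "hadoop.home.dir is not set";
        }
        File bin = new File(hadoopHome, "bin").getAbsoluteFile();
        if (!new File(bin, "winutils.exe").isFile()) {
            return "winutils.exe not found in " + bin;
        }
        if (libraryPath != null) {
            // java.library.path is a list separated by ';' on Windows (':' elsewhere).
            for (String entry : libraryPath.split(File.pathSeparator)) {
                if (!entry.isEmpty() && new File(entry).getAbsoluteFile().equals(bin)) {
                    return null;
                }
            }
        }
        return "java.library.path does not contain " + bin;
    }

    public static void main(String[] args) {
        String problem = findProblem(System.getProperty("hadoop.home.dir"),
                                     System.getProperty("java.library.path"));
        System.out.println(problem == null ? "Windows client setup looks OK" : problem);
    }
}
```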
