Support for HDFS File Name Globbing
Earlier this year, Syncsort announced plans for a contribution intended to make Apache open source even more open. Specifically, this contribution opens up the sort that ships with Apache Hadoop so that users can seamlessly “plug in” their own enterprise-class sorting solutions. We continue to work closely with the community to move this ticket (MAPREDUCE-2454) forward and get it committed.
During our development work, we found that globbing was not supported in libhdfs (the C/C++ API for accessing HDFS). As a result, we opened a ticket (HDFS-2461) to add a new libhdfs API that returns the HDFS file names matching a specified wildcard pattern.
Why is this important? Because Hadoop is written in Java, all of its APIs are readily available to Java programs, while C/C++ programmers have to go through JNI to reach them. libhdfs provides a convenient way to access a subset of the HDFS-related APIs from a C/C++ program, so that programmers do not need to resort to JNI themselves. HDFS-2461 enhances libhdfs so that file name globbing, which is already available in Java, is also available to C/C++ programmers.
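To give a feel for what wildcard matching on file names involves, here is a minimal sketch in C of the kind of client-side filtering a program might otherwise do by hand after listing a directory. It is not the API proposed in HDFS-2461 and it does not call libhdfs at all: filter_by_glob is a hypothetical helper written for this illustration, and it relies on POSIX fnmatch, whose pattern syntax may differ from Hadoop's own glob syntax.

```c
#include <fnmatch.h>
#include <stddef.h>

/* Hypothetical helper for illustration only (not part of libhdfs).
 * Copies pointers to the entries of `names` that match `pattern`
 * into `out`, stopping once `out_cap` entries have been stored.
 * Returns the number of entries written to `out`. */
size_t filter_by_glob(const char *pattern,
                      const char *const *names, size_t n_names,
                      const char **out, size_t out_cap)
{
    size_t n_out = 0;
    for (size_t i = 0; i < n_names && n_out < out_cap; i++) {
        /* FNM_PATHNAME keeps '*' and '?' from matching a '/' so a
         * pattern stays within a single directory level. */
        if (fnmatch(pattern, names[i], FNM_PATHNAME) == 0) {
            out[n_out++] = names[i];
        }
    }
    return n_out;
}
```

Given a list of full path names, a call such as filter_by_glob("/logs/*.log", names, n, out, n) keeps only the .log files directly under /logs. Doing the matching inside the library instead means every C/C++ program gets the same behavior as the Java API without having to write and maintain a helper like this one.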
Stay tuned for more posts on the Syncsort blog as we make progress on these contributions and continue to look for new ways to improve Hadoop and make it easier to use. Also, don’t hesitate to leave us a comment if you have questions about Syncsort’s Hadoop capabilities or ideas for other ways we can give back to the community.