pyarrow.hdfs.connect

pyarrow.hdfs.connect(host='default', port=0, user=None, kerb_ticket=None, driver='libhdfs', extra_conf=None)

Connect to an HDFS cluster. All parameters are optional and should only be set if the defaults need to be overridden.

Authentication should be automatic if the HDFS cluster uses Kerberos. However, if a username is specified, then the ticket cache will likely be required.

Parameters:
  • host (str, default 'default') – NameNode host. Set to "default" to use fs.defaultFS from core-site.xml.
  • port (int, default 0) – NameNode's port. Set to 0 for the default port or for logical (HA) nodes.
  • user (str, default None) – Username to use when connecting to HDFS; None implies the login user.
  • kerb_ticket (str, default None) – Path to the Kerberos ticket cache.
  • driver ({'libhdfs', 'libhdfs3'}, default 'libhdfs') – Connect using libhdfs (JNI-based) or libhdfs3 (third-party C++ library from Apache HAWQ (incubating)).
  • extra_conf (dict, default None) – Extra key/value pairs for configuration; these override any hdfs-site.xml properties.
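As an illustration of "only set what needs overriding", the following is a minimal sketch rather than part of pyarrow. The helper names `build_connect_kwargs` and `connect_with_overrides` are hypothetical, and the host name and port in the usage note are placeholder values. The import of `pyarrow.hdfs` is deferred into the function because this legacy module may not be present in newer pyarrow releases.

```python
# Documented defaults, copied from the signature of pyarrow.hdfs.connect.
DEFAULTS = {
    "host": "default",
    "port": 0,
    "user": None,
    "kerb_ticket": None,
    "driver": "libhdfs",
    "extra_conf": None,
}


def build_connect_kwargs(**overrides):
    """Return only the arguments that differ from the documented defaults.

    Hypothetical helper: rejects unknown names so that typos surface early.
    """
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise TypeError("unknown connect() arguments: %s" % sorted(unknown))
    return {k: v for k, v in overrides.items() if v != DEFAULTS[k]}


def connect_with_overrides(**overrides):
    """Hypothetical wrapper that forwards the non-default arguments."""
    # Imported lazily: the legacy hdfs module may be missing in newer pyarrow.
    import pyarrow.hdfs as hdfs
    return hdfs.connect(**build_connect_kwargs(**overrides))
```

Calling `connect_with_overrides()` with no arguments relies entirely on the cluster configuration, while something like `connect_with_overrides(host="namenode.example.com", port=8020, extra_conf={"dfs.replication": "2"})` overrides just those three settings; the values shown are assumptions for illustration only.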

Notes

The first call to this function takes longer than usual because of JNI spin-up time.
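To see the startup cost in practice, one option is to time the first call against a later one. The sketch below is an assumption-laden illustration: `timed_call` is a hypothetical helper, not a pyarrow function, and it works with any callable.

```python
import time


def timed_call(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds).

    Hypothetical helper for comparing the first connect() call with later ones.
    """
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed
```

With a reachable cluster, `fs, first = timed_call(pyarrow.hdfs.connect)` followed by a second `timed_call(pyarrow.hdfs.connect)` in the same process would be expected to show the first call taking longer when the JNI-based libhdfs driver is used.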

Returns: filesystem (HadoopFileSystem)