Class AbstractHadoopProcessor

All Implemented Interfaces:
ClassloaderIsolationKeyProvider, ConfigurableComponent, Processor

@RequiresInstanceClassLoading(cloneAncestorResources=true) public abstract class AbstractHadoopProcessor extends AbstractProcessor implements ClassloaderIsolationKeyProvider
Base class for processors that interact with HDFS.

As of Apache NiFi 1.5.0, the Relogin Period property is no longer used in the configuration of a Hadoop processor. Due to changes made to SecurityUtil.loginKerberos(Configuration, String, String), which is used by this class to authenticate a principal with Kerberos, Hadoop components no longer attempt relogins explicitly. For more information, please read the documentation for SecurityUtil.loginKerberos(Configuration, String, String).

  • Field Details

  • Constructor Details

    • AbstractHadoopProcessor

      public AbstractHadoopProcessor()
  • Method Details

    • init

      protected void init(ProcessorInitializationContext context)
      Overrides:
      init in class AbstractSessionFactoryProcessor
    • getKerberosProperties

      protected KerberosProperties getKerberosProperties(File kerberosConfigFile)
    • getSupportedPropertyDescriptors

      protected List<PropertyDescriptor> getSupportedPropertyDescriptors()
      Overrides:
      getSupportedPropertyDescriptors in class AbstractConfigurableComponent
    • getClassloaderIsolationKey

      public String getClassloaderIsolationKey(PropertyContext context)
      Specified by:
      getClassloaderIsolationKey in interface ClassloaderIsolationKeyProvider
    • customValidate

      protected Collection<ValidationResult> customValidate(ValidationContext validationContext)
      Overrides:
      customValidate in class AbstractConfigurableComponent
    • validateFileSystem

      protected Collection<ValidationResult> validateFileSystem(org.apache.hadoop.conf.Configuration configuration)
    • getHadoopConfigurationForValidation

      protected org.apache.hadoop.conf.Configuration getHadoopConfigurationForValidation(List<String> locations) throws IOException
      Throws:
      IOException
    • abstractOnScheduled

      @OnScheduled public final void abstractOnScheduled(ProcessContext context) throws IOException
      If your subclass also has an @OnScheduled-annotated method that needs hdfsResources, be sure to call super.abstractOnScheduled(context) in that method before using them.
      Throws:
      IOException
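      The note above concerns call ordering, which a stand-alone sketch can illustrate. The classes below are hypothetical stand-ins that use plain strings instead of ProcessContext and the real resources holder. They are not the NiFi implementation, and the null check inside the base method is an assumption of this sketch.

```java
public class OnScheduledOrderSketch {

    // Hypothetical stand-in for the base processor.
    static class BaseSketch {
        private String hdfsResources; // stand-in for the real resources holder

        // Stand-in for the final lifecycle method; initializes resources once.
        public final void abstractOnScheduled(String context) {
            if (hdfsResources == null) { // assumption: repeated calls are harmless
                hdfsResources = "resources-for-" + context;
            }
        }

        protected String getResources() {
            return hdfsResources;
        }
    }

    // Hypothetical subclass with its own scheduling method.
    static class MyProcessorSketch extends BaseSketch {
        String prepared;

        // Stand-in for the subclass's own @OnScheduled method.
        public void onScheduled(String context) {
            // Initialize the shared resources before reading them.
            super.abstractOnScheduled(context);
            prepared = "prepared with " + getResources();
        }
    }

    public static void main(String[] args) {
        MyProcessorSketch processor = new MyProcessorSketch();
        processor.onScheduled("ctx");
        System.out.println(processor.prepared);
    }
}
```

      Without the explicit call, the subclass method in this sketch would read a null resources value whenever it ran before the base method.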
    • getConfigLocations

      protected List<String> getConfigLocations(PropertyContext context)
    • abstractOnStopped

      @OnStopped public final void abstractOnStopped()
    • interruptStatisticsThread

      private void interruptStatisticsThread(org.apache.hadoop.fs.FileSystem fileSystem) throws NoSuchFieldException, IllegalAccessException
      Throws:
      NoSuchFieldException
      IllegalAccessException
    • getConfigurationFromResources

      private static org.apache.hadoop.conf.Configuration getConfigurationFromResources(org.apache.hadoop.conf.Configuration config, List<String> locations) throws IOException
      Throws:
      IOException
    • resetHDFSResources

      HdfsResources resetHDFSResources(List<String> resourceLocations, ProcessContext context) throws IOException
      Throws:
      IOException
    • getKerberosUser

      private KerberosUser getKerberosUser(ProcessContext context)
    • preProcessConfiguration

      protected void preProcessConfiguration(org.apache.hadoop.conf.Configuration config, ProcessContext context)
      This method will be called after the Configuration has been created, but before the FileSystem is created, allowing sub-classes to take further action on the Configuration before creating the FileSystem.
      Parameters:
      config - the Configuration that will be used to create the FileSystem
      context - the context that can be used to retrieve additional values
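      The hook order described above (Configuration created, hook invoked, FileSystem created) can be sketched without Hadoop on the classpath. In this minimal illustration a Map stands in for the Configuration and a String stands in for the FileSystem. The class names, the createResources method, and the property values are all hypothetical, and the no-op default for the hook is an assumption of the sketch.

```java
import java.util.HashMap;
import java.util.Map;

public class PreProcessHookSketch {

    // Hypothetical stand-in for the base class; not the NiFi implementation.
    static class BaseSketch {

        // Follows the documented order: build config, run the hook, create the file system.
        final String createResources() {
            Map<String, String> config = new HashMap<>();
            config.put("fs.defaultFS", "hdfs://example:8020"); // illustrative value
            preProcessConfiguration(config);
            return createFileSystem(config);
        }

        // Hook for subclasses; this sketch assumes a no-op default.
        protected void preProcessConfiguration(Map<String, String> config) {
        }

        // Stand-in for FileSystem creation: reports the settings it received.
        private String createFileSystem(Map<String, String> config) {
            return "fs[" + config.get("fs.defaultFS")
                    + ", replication=" + config.getOrDefault("dfs.replication", "default") + "]";
        }
    }

    // Hypothetical subclass that adjusts the config before the file system exists.
    static class CustomSketch extends BaseSketch {
        @Override
        protected void preProcessConfiguration(Map<String, String> config) {
            config.put("dfs.replication", "2");
        }
    }

    public static void main(String[] args) {
        System.out.println(new BaseSketch().createResources());
        System.out.println(new CustomSketch().createResources());
    }
}
```

      Because the hook runs before creation, the setting added by the subclass is visible to the stand-in file system, which is the point of the documented ordering.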
    • getFileSystem

      protected org.apache.hadoop.fs.FileSystem getFileSystem(org.apache.hadoop.conf.Configuration config) throws IOException
      This method exists so that unit tests can override it and avoid waiting several minutes for UDP packets to be received.
      Parameters:
      config - the configuration to use
      Returns:
      the FileSystem that is created for the given Configuration
      Throws:
      IOException - if unable to create the FileSystem
    • getFileSystemAsUser

      protected org.apache.hadoop.fs.FileSystem getFileSystemAsUser(org.apache.hadoop.conf.Configuration config, org.apache.hadoop.security.UserGroupInformation ugi) throws IOException
      Throws:
      IOException
    • checkHdfsUriForTimeout

      protected void checkHdfsUriForTimeout(org.apache.hadoop.conf.Configuration config) throws IOException
      Throws:
      IOException
    • getCompressionCodec

      protected org.apache.hadoop.io.compress.CompressionCodec getCompressionCodec(ProcessContext context, org.apache.hadoop.conf.Configuration configuration)
      Returns the configured CompressionCodec, or null if none is configured.
      Parameters:
      context - the ProcessContext
      configuration - the Hadoop Configuration
      Returns:
      CompressionCodec or null
    • getPathDifference

      public static String getPathDifference(org.apache.hadoop.fs.Path root, org.apache.hadoop.fs.Path child)
      Returns the relative path of the child that does not include the filename or the root path.
      Parameters:
      root - the path to relativize from
      child - the path to relativize
      Returns:
      the relative path
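      As an illustration of the description above, the following stand-alone sketch reproduces the documented result using plain strings rather than org.apache.hadoop.fs.Path. The class PathDifferenceSketch and its methods are hypothetical, and the logic only approximates the documented behavior; it is not the NiFi implementation. It assumes both paths are '/'-separated and that child lies beneath root.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class PathDifferenceSketch {

    // Hypothetical helper: splits a '/'-separated path into its non-empty segments.
    private static List<String> segments(String path) {
        return Arrays.stream(path.split("/"))
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    // Returns the directories between root and the child's file name,
    // or an empty string when the child sits directly under root.
    // Assumes child is located beneath root.
    public static String pathDifference(String root, String child) {
        List<String> rootParts = segments(root);
        List<String> childParts = segments(child);
        if (childParts.size() - rootParts.size() <= 1) {
            return "";
        }
        // Drop the root prefix and the trailing file name.
        return String.join("/", childParts.subList(rootParts.size(), childParts.size() - 1));
    }

    public static void main(String[] args) {
        System.out.println(pathDifference("/data/in", "/data/in/2024/01/file.txt"));
        System.out.println(pathDifference("/data/in", "/data/in/file.txt").isEmpty());
    }
}
```

      In this sketch a file directly under the root yields an empty string, since no intermediate directories separate the two.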
    • getConfiguration

      protected org.apache.hadoop.conf.Configuration getConfiguration()
    • getFileSystem

      protected org.apache.hadoop.fs.FileSystem getFileSystem()
    • getUserGroupInformation

      protected org.apache.hadoop.security.UserGroupInformation getUserGroupInformation()
    • isAllowExplicitKeytab

      boolean isAllowExplicitKeytab()
    • isLocalFileSystemAccessDenied

      boolean isLocalFileSystemAccessDenied()
    • isFileSystemAccessDenied

      protected boolean isFileSystemAccessDenied(URI fileSystemUri)
    • getNormalizedPath

      protected org.apache.hadoop.fs.Path getNormalizedPath(ProcessContext context, PropertyDescriptor property)
    • getNormalizedPath

      protected org.apache.hadoop.fs.Path getNormalizedPath(String rawPath)
    • getNormalizedPath

      protected org.apache.hadoop.fs.Path getNormalizedPath(ProcessContext context, PropertyDescriptor property, FlowFile flowFile)
    • getNormalizedPath

      private org.apache.hadoop.fs.Path getNormalizedPath(String rawPath, Optional<String> propertyName)