Class AbstractHadoopProcessor
java.lang.Object
org.apache.nifi.components.AbstractConfigurableComponent
org.apache.nifi.processor.AbstractSessionFactoryProcessor
org.apache.nifi.processor.AbstractProcessor
org.apache.nifi.processors.hadoop.AbstractHadoopProcessor
- All Implemented Interfaces:
ClassloaderIsolationKeyProvider, ConfigurableComponent, Processor
@RequiresInstanceClassLoading(cloneAncestorResources=true)
public abstract class AbstractHadoopProcessor
extends AbstractProcessor
implements ClassloaderIsolationKeyProvider
This is a base class that is helpful when building processors interacting with HDFS.
As of Apache NiFi 1.5.0, the Relogin Period property is no longer used in the configuration of a Hadoop processor.
Due to changes made to SecurityUtil.loginKerberos(Configuration, String, String), which this class uses to authenticate a principal with Kerberos, Hadoop components no longer attempt relogins explicitly. For more information, see the documentation for SecurityUtil.loginKerberos(Configuration, String, String).
- See Also:
-
Nested Class Summary
Nested Classes
protected static class AbstractHadoopProcessor.ValidationResources
-
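Field Summary
Fields
static final String ABSOLUTE_HDFS_PATH_ATTRIBUTE
static final PropertyDescriptor ADDITIONAL_CLASSPATH_RESOURCES
private static final String ALLOW_EXPLICIT_KEYTAB
static final PropertyDescriptor COMPRESSION_CODEC
private static final String DENY_LFS_ACCESS
private static final String DENY_LFS_EXPLANATION
static final PropertyDescriptor DIRECTORY
private static final HdfsResources EMPTY_HDFS_RESOURCES
static final PropertyDescriptor HADOOP_CONFIGURATION_RESOURCES
static final String HADOOP_FILE_URL_ATTRIBUTE
private final AtomicReference<HdfsResources> hdfsResources
static final PropertyDescriptor KERBEROS_CREDENTIALS_SERVICE
static final PropertyDescriptor KERBEROS_RELOGIN_PERIOD
(package private) static final PropertyDescriptor KERBEROS_USER_SERVICE
private File kerberosConfigFile
protected KerberosProperties kerberosProperties
private static final Pattern LOCAL_FILE_SYSTEM_URI
private static final String NORMALIZE_ERROR_WITH_PROPERTY
private static final String NORMALIZE_ERROR_WITHOUT_PROPERTY
protected List<PropertyDescriptor> properties
private static final Object RESOURCES_LOCK
protected static final String TARGET_HDFS_DIR_CREATED_ATTRIBUTE
private final AtomicReference<AbstractHadoopProcessor.ValidationResources> validationResourceHolder
-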
Constructor Summary
Constructors
AbstractHadoopProcessor()
-
Method Summary
final void abstractOnScheduled(ProcessContext context) - If your subclass also has an @OnScheduled-annotated method and needs hdfsResources in that method, be sure to call super.abstractOnScheduled(context).
final void abstractOnStopped
protected void checkHdfsUriForTimeout(org.apache.hadoop.conf.Configuration config)
protected Collection<ValidationResult> customValidate(ValidationContext validationContext)
protected org.apache.hadoop.io.compress.CompressionCodec getCompressionCodec(ProcessContext context, org.apache.hadoop.conf.Configuration configuration) - Returns the configured CompressionCodec, or null if none is configured.
getConfigLocations(PropertyContext context)
protected org.apache.hadoop.conf.Configuration getConfiguration()
private static org.apache.hadoop.conf.Configuration getConfigurationFromResources(org.apache.hadoop.conf.Configuration config, List<String> locations)
protected org.apache.hadoop.fs.FileSystem getFileSystem()
protected org.apache.hadoop.fs.FileSystem getFileSystem(org.apache.hadoop.conf.Configuration config) - This method exists so that unit tests can override it and avoid waiting several minutes for UDP packets to be received.
protected org.apache.hadoop.fs.FileSystem getFileSystemAsUser(org.apache.hadoop.conf.Configuration config, org.apache.hadoop.security.UserGroupInformation ugi)
protected org.apache.hadoop.conf.Configuration getHadoopConfigurationForValidation(List<String> locations)
protected KerberosProperties getKerberosProperties(File kerberosConfigFile)
private KerberosUser getKerberosUser(ProcessContext context)
protected org.apache.hadoop.fs.Path getNormalizedPath(String rawPath)
private org.apache.hadoop.fs.Path getNormalizedPath(String rawPath, Optional<String> propertyName)
protected org.apache.hadoop.fs.Path getNormalizedPath(ProcessContext context, PropertyDescriptor property)
protected org.apache.hadoop.fs.Path getNormalizedPath(ProcessContext context, PropertyDescriptor property, FlowFile flowFile)
static String getPathDifference(org.apache.hadoop.fs.Path root, org.apache.hadoop.fs.Path child) - Returns the portion of the child path that lies between the root path and the file name, excluding both.
protected List<PropertyDescriptor> getSupportedPropertyDescriptors()
protected org.apache.hadoop.security.UserGroupInformation getUserGroupInformation()
protected void init(ProcessorInitializationContext context)
private void interruptStatisticsThread(org.apache.hadoop.fs.FileSystem fileSystem)
(package private) boolean isAllowExplicitKeytab()
protected boolean isFileSystemAccessDenied(URI fileSystemUri)
(package private) boolean isLocalFileSystemAccessDenied()
protected void preProcessConfiguration(org.apache.hadoop.conf.Configuration config, ProcessContext context) - This method is called after the Configuration has been created but before the FileSystem is created, allowing subclasses to take further action on the Configuration before the FileSystem is created.
(package private) HdfsResources resetHDFSResources(List<String> resourceLocations, ProcessContext context)
protected Collection<ValidationResult> validateFileSystem(org.apache.hadoop.conf.Configuration configuration)
Methods inherited from class org.apache.nifi.processor.AbstractProcessor
onTrigger, onTrigger
Methods inherited from class org.apache.nifi.processor.AbstractSessionFactoryProcessor
getControllerServiceLookup, getIdentifier, getLogger, getNodeTypeProvider, getRelationships, initialize, isConfigurationRestored, isScheduled, toString, updateConfiguredRestoredTrue, updateScheduledFalse, updateScheduledTrue
Methods inherited from class org.apache.nifi.components.AbstractConfigurableComponent
equals, getPropertyDescriptor, getPropertyDescriptors, getSupportedDynamicPropertyDescriptor, hashCode, onPropertyModified, validate
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface org.apache.nifi.components.ConfigurableComponent
getPropertyDescriptor, getPropertyDescriptors, onPropertyModified, validate
Methods inherited from interface org.apache.nifi.processor.Processor
isStateful, migrateProperties, migrateRelationships
-
Field Details
-
ALLOW_EXPLICIT_KEYTAB
- See Also:
-
DENY_LFS_ACCESS
- See Also:
-
DENY_LFS_EXPLANATION
-
LOCAL_FILE_SYSTEM_URI
-
NORMALIZE_ERROR_WITH_PROPERTY
- See Also:
-
NORMALIZE_ERROR_WITHOUT_PROPERTY
- See Also:
-
HADOOP_CONFIGURATION_RESOURCES
-
DIRECTORY
-
COMPRESSION_CODEC
-
KERBEROS_RELOGIN_PERIOD
-
ADDITIONAL_CLASSPATH_RESOURCES
-
KERBEROS_CREDENTIALS_SERVICE
-
KERBEROS_USER_SERVICE
-
ABSOLUTE_HDFS_PATH_ATTRIBUTE
- See Also:
-
HADOOP_FILE_URL_ATTRIBUTE
- See Also:
-
TARGET_HDFS_DIR_CREATED_ATTRIBUTE
- See Also:
-
RESOURCES_LOCK
-
EMPTY_HDFS_RESOURCES
-
kerberosProperties
-
properties
-
kerberosConfigFile
-
hdfsResources
-
validationResourceHolder
-
-
Constructor Details
-
AbstractHadoopProcessor
public AbstractHadoopProcessor()
-
-
Method Details
-
init
- Overrides:
init
in class AbstractSessionFactoryProcessor
-
getKerberosProperties
-
getSupportedPropertyDescriptors
- Overrides:
getSupportedPropertyDescriptors
in class AbstractConfigurableComponent
-
getClassloaderIsolationKey
- Specified by:
getClassloaderIsolationKey
in interface ClassloaderIsolationKeyProvider
-
customValidate
- Overrides:
customValidate
in class AbstractConfigurableComponent
-
validateFileSystem
protected Collection<ValidationResult> validateFileSystem(org.apache.hadoop.conf.Configuration configuration)
-
getHadoopConfigurationForValidation
protected org.apache.hadoop.conf.Configuration getHadoopConfigurationForValidation(List<String> locations) throws IOException
- Throws:
IOException
-
abstractOnScheduled
public final void abstractOnScheduled(ProcessContext context) throws IOException
If your subclass also has an @OnScheduled-annotated method and needs hdfsResources in that method, be sure to call super.abstractOnScheduled(context).
- Throws:
IOException
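The advice above describes a calling convention rather than an algorithm. The following plain-Java sketch shows its shape under stated assumptions: SketchHadoopBase and SketchHdfsProcessor are hypothetical stand-ins, a String replaces both ProcessContext and HdfsResources, and the @OnScheduled annotations are omitted so the code compiles without NiFi on the classpath. It is not the NiFi implementation.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for the base class; no NiFi or Hadoop types are used.
abstract class SketchHadoopBase {
    // Resources created at schedule time; null until abstractOnScheduled runs.
    private final AtomicReference<String> resources = new AtomicReference<>();

    // In NiFi this method would carry @OnScheduled. It is final, so subclasses cannot override it.
    public final void abstractOnScheduled(String context) {
        resources.set("resources-for-" + context);
    }

    protected String getResources() {
        return resources.get();
    }
}

// Hypothetical subclass with its own schedule-time method that needs the base resources.
class SketchHdfsProcessor extends SketchHadoopBase {
    private String resourcesSeenAtSchedule;

    // In NiFi this method would also carry @OnScheduled.
    public void onScheduled(String context) {
        // Initialize the base-class resources first so they are available on the next line.
        super.abstractOnScheduled(context);
        resourcesSeenAtSchedule = getResources();
    }

    String seenResources() {
        return resourcesSeenAtSchedule;
    }
}
```

Because abstractOnScheduled is final, the subclass cannot override it; it can only invoke it, which is why the explicit super call is what guarantees the resources exist before the subclass uses them.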
-
getConfigLocations
-
abstractOnStopped
-
interruptStatisticsThread
private void interruptStatisticsThread(org.apache.hadoop.fs.FileSystem fileSystem) throws NoSuchFieldException, IllegalAccessException
-
getConfigurationFromResources
private static org.apache.hadoop.conf.Configuration getConfigurationFromResources(org.apache.hadoop.conf.Configuration config, List<String> locations) throws IOException
- Throws:
IOException
-
resetHDFSResources
HdfsResources resetHDFSResources(List<String> resourceLocations, ProcessContext context) throws IOException
- Throws:
IOException
-
getKerberosUser
-
preProcessConfiguration
protected void preProcessConfiguration(org.apache.hadoop.conf.Configuration config, ProcessContext context)
This method is called after the Configuration has been created but before the FileSystem is created, allowing subclasses to take further action on the Configuration before the FileSystem is created.
- Parameters:
config - the Configuration that will be used to create the FileSystem
context - the context that can be used to retrieve additional values
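preProcessConfiguration is a template-method hook: the base class fixes the order (build the configuration, call the hook, create the file system) and subclasses customize only the middle step. The sketch below illustrates that ordering with hypothetical names (SketchConfigBase, SketchConfigTweaker); a Map stands in for org.apache.hadoop.conf.Configuration, a String for ProcessContext, and the returned String for the FileSystem. The fs.defaultFS key is used purely as an illustrative setting.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: Map replaces Configuration, String replaces ProcessContext and FileSystem.
abstract class SketchConfigBase {
    // Mirrors the documented order: build the configuration, run the hook, then create the file system.
    final String createFileSystem(Map<String, String> settings, String context) {
        Map<String, String> config = new HashMap<>(settings);
        preProcessConfiguration(config, context); // the hook runs before the "file system" exists
        return "filesystem[" + config.getOrDefault("fs.defaultFS", "file:///") + "]";
    }

    // Default hook: leaves the configuration unchanged.
    protected void preProcessConfiguration(Map<String, String> config, String context) {
    }
}

// Hypothetical subclass that adjusts the configuration before the file system is created.
class SketchConfigTweaker extends SketchConfigBase {
    @Override
    protected void preProcessConfiguration(Map<String, String> config, String context) {
        config.put("fs.defaultFS", "hdfs://" + context);
    }
}
```

Making createFileSystem final keeps the ordering in the base class, so a subclass cannot accidentally create the file system before its adjustments are applied.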
-
getFileSystem
protected org.apache.hadoop.fs.FileSystem getFileSystem(org.apache.hadoop.conf.Configuration config) throws IOException
This method exists so that unit tests can override it and avoid waiting several minutes for UDP packets to be received.
- Parameters:
config - the configuration to use
- Returns:
the FileSystem that is created for the given Configuration
- Throws:
IOException - if unable to create the FileSystem
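The reason given for getFileSystem is the classic "overridable factory method" test seam. A minimal sketch, assuming hypothetical classes SketchFsClient and SketchFsClientForTest, with Strings standing in for Configuration and FileSystem: production code obtains the file system only through the protected method, so a test subclass can replace it and never touch the network.

```java
// Hypothetical sketch: Strings stand in for Configuration and FileSystem.
class SketchFsClient {
    // Production factory: the real method builds a FileSystem, which may block on network timeouts.
    protected String getFileSystem(String config) {
        throw new IllegalStateException("would contact a real cluster using " + config);
    }

    // Code under test obtains the file system only through the overridable factory method.
    final String listRoot(String config) {
        return "listing / on " + getFileSystem(config);
    }
}

// Test double: overriding the factory keeps the test fast and offline.
class SketchFsClientForTest extends SketchFsClient {
    @Override
    protected String getFileSystem(String config) {
        return "fake-fs(" + config + ")";
    }
}
```

In a test, new SketchFsClientForTest().listRoot("cfg") exercises the calling code while the slow production factory is never reached.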
-
getFileSystemAsUser
protected org.apache.hadoop.fs.FileSystem getFileSystemAsUser(org.apache.hadoop.conf.Configuration config, org.apache.hadoop.security.UserGroupInformation ugi) throws IOException
- Throws:
IOException
-
checkHdfsUriForTimeout
protected void checkHdfsUriForTimeout(org.apache.hadoop.conf.Configuration config) throws IOException
- Throws:
IOException
-
getCompressionCodec
protected org.apache.hadoop.io.compress.CompressionCodec getCompressionCodec(ProcessContext context, org.apache.hadoop.conf.Configuration configuration)
Returns the configured CompressionCodec, or null if none is configured.
- Parameters:
context - the ProcessContext
configuration - the Hadoop Configuration
- Returns:
CompressionCodec or null
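Because getCompressionCodec returns null when no codec is configured, callers need an explicit null branch. The sketch below shows that caller-side pattern only; SketchCodecs, the codec names, the "NONE" value, and the extension map are hypothetical (loosely echoing the idea of a codec's default file extension), and plain Strings replace Hadoop's CompressionCodec.

```java
import java.util.Map;

// Hypothetical sketch: codec names are Strings, and "NONE" is assumed to mean "no codec configured".
final class SketchCodecs {
    private static final Map<String, String> EXTENSIONS = Map.of("GZIP", ".gz", "BZIP", ".bz2");

    // Returns the configured codec name, or null if none is configured (same contract as documented).
    static String getCompressionCodec(String configured) {
        if (configured == null || configured.equals("NONE")) {
            return null;
        }
        return configured;
    }

    // Caller side: a null codec means the output is written uncompressed, with no extra extension.
    static String outputName(String baseName, String configured) {
        String codec = getCompressionCodec(configured);
        if (codec == null) {
            return baseName;
        }
        return baseName + EXTENSIONS.getOrDefault(codec, "");
    }
}
```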
-
getPathDifference
public static String getPathDifference(org.apache.hadoop.fs.Path root, org.apache.hadoop.fs.Path child)
Returns the portion of the child path that lies between the root path and the file name, excluding both.
- Parameters:
root - the path to relativize from
child - the path to relativize
- Returns:
the relative path
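To make the described result concrete, here is an analogue written against java.nio.file.Path rather than org.apache.hadoop.fs.Path. SketchPaths is a hypothetical class, and this is an illustration of the documented behavior (directories between root and the child's file name), not the NiFi implementation. For root /data/in and child /data/in/a/b/file.txt it yields a/b; for a file directly under root it yields an empty string.

```java
import java.nio.file.Path;

// Hypothetical sketch using java.nio.file.Path instead of org.apache.hadoop.fs.Path.
final class SketchPaths {
    // Returns the directories between root and the child's file name, joined with '/'.
    static String getPathDifference(Path root, Path child) {
        Path parent = child.getParent();
        if (parent == null || !parent.startsWith(root)) {
            throw new IllegalArgumentException(child + " is not below " + root);
        }
        if (parent.equals(root)) {
            return ""; // the child sits directly inside root
        }
        StringBuilder result = new StringBuilder();
        // Iterating a Path yields its name elements one at a time.
        for (Path name : root.relativize(parent)) {
            if (result.length() > 0) {
                result.append('/');
            }
            result.append(name);
        }
        return result.toString();
    }
}
```

Joining name elements with '/' explicitly, instead of calling toString() on the relativized path, keeps the result independent of the platform's separator.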
-
getConfiguration
protected org.apache.hadoop.conf.Configuration getConfiguration()
-
getFileSystem
protected org.apache.hadoop.fs.FileSystem getFileSystem()
-
getUserGroupInformation
protected org.apache.hadoop.security.UserGroupInformation getUserGroupInformation()
-
isAllowExplicitKeytab
boolean isAllowExplicitKeytab()
-
isLocalFileSystemAccessDenied
boolean isLocalFileSystemAccessDenied()
-
isFileSystemAccessDenied
-
getNormalizedPath
protected org.apache.hadoop.fs.Path getNormalizedPath(ProcessContext context, PropertyDescriptor property)
-
getNormalizedPath
-
getNormalizedPath
protected org.apache.hadoop.fs.Path getNormalizedPath(ProcessContext context, PropertyDescriptor property, FlowFile flowFile)
-
getNormalizedPath
-