Class Kernel
- All Implemented Interfaces:
Cloneable
To write a new kernel, a developer extends the Kernel class and overrides the Kernel.run() method. To execute this kernel, the developer creates a new instance of it and calls Kernel.execute(int globalSize) with a suitable 'global size'. At runtime Aparapi will attempt to convert the Kernel.run() method (and any method called directly or indirectly by Kernel.run()) into OpenCL for execution on GPU devices made available via the OpenCL platform.
Note that Kernel.run() is not called directly. Instead, the Kernel.execute(int globalSize) method will cause the overridden Kernel.run() method to be invoked once for each value in the range 0...globalSize.
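The dispatch model above can be sketched in plain Java. The class and interface names below are hypothetical illustrations, not Aparapi's actual scheduler: this is a sequential analogue of what execute(globalSize) means, namely invoking run() once per global id.

```java
// Hypothetical sequential analogue of Kernel.execute(globalSize).
// Aparapi may instead run these as OpenCL work-items or pooled Java threads;
// this sketch only illustrates the "run() once per global id" contract.
public class SequentialDispatchSketch {
    interface MiniKernel {
        void run(int globalId);
    }

    static void execute(int globalSize, MiniKernel kernel) {
        for (int gid = 0; gid < globalSize; gid++) {
            kernel.run(gid); // one invocation per value in 0..globalSize-1
        }
    }

    public static void main(String[] args) {
        int[] values = {1, 2, 3, 4};
        int[] squares = new int[values.length];
        execute(values.length, gid -> squares[gid] = values[gid] * values[gid]);
        System.out.println(java.util.Arrays.toString(squares)); // [1, 4, 9, 16]
    }
}
```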
On the first call to Kernel.execute(int _globalSize), Aparapi will determine the EXECUTION_MODE of the kernel. This decision is made dynamically based on two factors:
- Whether OpenCL is available (appropriate drivers are installed and the OpenCL and Aparapi dynamic libraries are included on the system path).
- Whether the bytecode of the run() method (and every method that can be called directly or indirectly from the run() method) can be converted into OpenCL.
Below is an example Kernel that calculates the square of a set of input values.
class SquareKernel extends Kernel{
   private int values[];
   private int squares[];

   public SquareKernel(int values[]){
      this.values = values;
      squares = new int[values.length];
   }

   public void run() {
      int gid = getGlobalID();
      squares[gid] = values[gid]*values[gid];
   }

   public int[] getSquares(){
      return(squares);
   }
}
To execute this kernel, first create a new instance of it and then call execute(Range _range).
int[] values = new int[1024];
// fill values array
Range range = Range.create(values.length); // create a range 0..1024
SquareKernel kernel = new SquareKernel(values);
kernel.execute(range);
When execute(Range) returns, all the executions of Kernel.run() have completed and the results are available in the squares array.
int[] squares = kernel.getSquares();
for (int i = 0; i < values.length; i++) {
System.out.printf("%4d %4d %8d\n", i, values[i], squares[i]);
}
A different approach to creating kernels that avoids extending Kernel is to write an anonymous inner class:
final int[] values = new int[1024];
// fill the values array
final int[] squares = new int[values.length];
final Range range = Range.create(values.length);
Kernel kernel = new Kernel(){
public void run() {
int gid = getGlobalID();
squares[gid] = values[gid]*values[gid];
}
};
kernel.execute(range);
for (int i = 0; i < values.length; i++) {
System.out.printf("%4d %4d %8d\n", i, values[i], squares[i]);
}
- Version:
- Alpha, 21/09/2010
- Author:
- gfrost AMD Javalabs
-
Nested Class Summary
Nested Classes:
- static @interface Kernel.Constant: We can use this Annotation to 'tag' intended constant buffers.
- class
- static enum Kernel.EXECUTION_MODE: Deprecated.
- final class Kernel.KernelState: This class is for internal Kernel state management.
- static @interface Kernel.Local: We can use this Annotation to 'tag' intended local buffers.
- static @interface: Annotation which can be applied to either a getter (with the usual Java bean naming convention relative to an instance field), or to any method with void return type, which prevents both the method body and any calls to the method being emitted in the generated OpenCL.
- static @interface Kernel.PrivateMemorySpace: We can use this Annotation to 'tag' __private (unshared) array fields. -
Field Summary
Fields -
Constructor Summary
Constructors -
Method Summary
- void addExecutionModes(Kernel.EXECUTION_MODE... platforms): Deprecated.
- void cancelMultiPass(): Invoking this method flags that once the current pass is complete, execution should be abandoned.
- void cleanUpArrays(): Frees the bulk of the resources used by this kernel, by setting array sizes in non-primitive KernelArgs to 1 (0 size is prohibited) and invoking kernel execution on a zero size range.
- Kernel clone(): When using a Java Thread Pool, Aparapi uses clone to copy the initial instance to each thread.
- Kernel compile(Device _device): Force pre-compilation of the kernel for a given device, without executing it.
- Kernel compile(String _entrypoint, Device _device): Force pre-compilation of the kernel for a given device, without executing it.
- void dispose(): Release any resources associated with this Kernel.
- Kernel execute(int _range): Start execution of _range kernels.
- Kernel execute(int _range, int _passes): Start execution of _passes iterations over the _range of kernels.
- Kernel execute(Range _range): Start execution of _range kernels.
- Kernel execute(Range _range, int _passes): Start execution of _passes iterations of _range kernels.
- Kernel execute(String _entrypoint, ...) (two overloads): Start execution of globalSize kernels for the given entrypoint.
- void executeFallbackAlgorithm(Range _range, int _passId): If hasFallbackAlgorithm() has been overridden to return true, this method should be overridden so as to apply a single pass of the kernel's logic to the entire _range.
- Kernel get(...): Enqueue a request to return this buffer from the GPU; overloads exist for boolean[], byte[], char[], double[], float[], int[] and long[] arrays of one, two and three dimensions.
- double getAccumulatedExecutionTime(): Determine the total execution time of all previous Kernel.execute(range) calls for all threads that ran this kernel for the device used in the last kernel execution.
- double getAccumulatedExecutionTimeAllThreads(Device device): Determine the total execution time of all produced profile reports from all threads that executed the current kernel on the specified device.
- double getAccumulatedExecutionTimeCurrentThread(Device device): Determine the total execution time of all previous kernel executions called from the current thread that executed the current kernel on the specified device.
- int getCancelState()
- double getConversionTime(): Determine the time taken to convert bytecode to OpenCL for the first Kernel.execute(range) call.
- int getCurrentPass()
- Kernel.EXECUTION_MODE getExecutionMode(): Deprecated.
- double getExecutionTime(): Determine the execution time of the previous Kernel.execute(range) called from the last thread that ran and executed on the most recently used device.
- int[] getKernelCompileWorkGroupSize(Device device): Retrieves the specified work-group size in the compiled kernel for the specified device or intermediate language for the device.
- long getKernelLocalMemSizeInUse(Device device): Retrieves the amount of local memory used in the specified device by this kernel instance.
- int getKernelMaxWorkGroupSize(Device device): Retrieves the maximum work-group size allowed for this kernel when running on the specified device.
- long getKernelMinimumPrivateMemSizeInUsePerWorkItem(Device device): Retrieves the minimum private memory in use per work item for this kernel instance and the specified device.
- int getKernelPreferredWorkGroupSizeMultiple(Device device): Retrieves the preferred work-group size multiple in the specified device for this kernel instance.
- Kernel.KernelState getKernelState()
- static String getMappedMethodName(ClassModel.ConstantPool.MethodReferenceEntry _methodReferenceEntry)
- getProfileReportCurrentThread(Device device): Retrieves the most recent complete report available for the current thread calling this method for the current kernel instance and executed on the given device.
- getProfileReportLastThread(Device device): Retrieves a profile report for the last thread that executed this kernel on the given device.
- final Device getTargetDevice()
- boolean hasFallbackAlgorithm(): False by default.
- boolean isAllowDevice(Device _device)
- boolean isAutoCleanUpArrays()
- boolean isExecuting()
- boolean isExplicit(): For dev purposes (we should remove this for production) determine whether this Kernel uses explicit memory management.
- static boolean isMappedMethod(ClassModel.ConstantPool.MethodReferenceEntry methodReferenceEntry)
- static boolean isOpenCLDelegateMethod(ClassModel.ConstantPool.MethodReferenceEntry methodReferenceEntry)
- boolean isRunningCL()
- Kernel put(...): Tag this array so that it is explicitly enqueued before the kernel is executed; overloads exist for boolean[], byte[], char[], double[], float[], int[] and long[] arrays of one, two and three dimensions.
- void registerProfileReportObserver(IProfileReportObserver observer): Registers a new profile report observer to receive profile reports as they're produced.
- abstract void run(): The entry point of a kernel.
- void setAutoCleanUpArrays(boolean autoCleanUpArrays): Property which, if true, enables automatic calling of cleanUpArrays() following each execution.
- void setExecutionMode(Kernel.EXECUTION_MODE _executionMode): Deprecated.
- void setExecutionModeWithoutFallback(Kernel.EXECUTION_MODE _executionMode)
- void setExplicit(boolean _explicit): For dev purposes (we should remove this for production) allow us to define that this Kernel uses explicit memory management.
- void setFallbackExecutionMode(): Deprecated.
- String toString()
- static boolean usesAtomic32(ClassModel.ConstantPool.MethodReferenceEntry methodReferenceEntry)
- static boolean usesAtomic64(ClassModel.ConstantPool.MethodReferenceEntry methodReferenceEntry)
-
Field Details
-
LOCAL_SUFFIX
We can use this suffix to 'tag' intended local buffers. So either name the buffer
int[] buffer_$local$ = new int[1024];
or use the Annotation form
@Local int[] buffer = new int[1024];
- See Also:
-
CONSTANT_SUFFIX
We can use this suffix to 'tag' intended constant buffers. So either name the buffer
int[] buffer_$constant$ = new int[1024];
or use the Annotation form
@Constant int[] buffer = new int[1024];
- See Also:
-
PRIVATE_SUFFIX
We can use this suffix to 'tag' __private buffers. So either name the buffer
int[] buffer_$private$32 = new int[32];
or use the Annotation form
@PrivateMemorySpace(32) int[] buffer = new int[32];
- See Also:
-
-
Constructor Details
-
Kernel
public Kernel()
-
-
Method Details
-
run
public abstract void run()
The entry point of a kernel. Every kernel must override this method.
-
hasFallbackAlgorithm
public boolean hasFallbackAlgorithm()
False by default. In the event that all preferred devices fail to execute a kernel, it is possible to supply an alternate (possibly non-parallel) execution algorithm by overriding this method to return true, and overriding executeFallbackAlgorithm(Range, int) with the alternate algorithm. -
executeFallbackAlgorithm
If hasFallbackAlgorithm() has been overridden to return true, this method should be overridden so as to apply a single pass of the kernel's logic to the entire _range.
This is not normally required, as fallback to JavaDevice.THREAD_POOL will implement the algorithm in parallel. However, in the event that thread pool execution may be prohibitively slow, this method might implement a "quick and dirty" approximation to the desired result (for example, a simple box blur as opposed to a Gaussian blur in an image processing application). -
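The contract described above can be sketched in plain Java. The class names below are hypothetical (not Aparapi internals): a subclass opts in by returning true from hasFallbackAlgorithm() and supplies a cheaper single-pass algorithm that runs only after the preferred path fails.

```java
// Hypothetical sketch of the fallback contract (plain Java, not Aparapi code):
// after the preferred execution path fails, the runner consults
// hasFallbackAlgorithm() and, if true, applies one pass over the whole range.
public class FallbackSketch {
    static abstract class Task {
        boolean hasFallbackAlgorithm() { return false; }

        void executeFallbackAlgorithm(int rangeSize, int passId) {
            throw new UnsupportedOperationException("no fallback supplied");
        }

        abstract void runPreferred(int rangeSize) throws Exception;

        final void execute(int rangeSize) {
            try {
                runPreferred(rangeSize);
            } catch (Exception deviceFailure) {
                if (hasFallbackAlgorithm()) {
                    // single pass of the task's logic over the entire range
                    executeFallbackAlgorithm(rangeSize, 0);
                } else {
                    throw new RuntimeException(deviceFailure);
                }
            }
        }
    }

    // A task whose preferred path always fails; the fallback stands in for a
    // "quick and dirty" approximation of the real algorithm.
    static class FailingTask extends Task {
        boolean fallbackRan = false;

        @Override boolean hasFallbackAlgorithm() { return true; }

        @Override void executeFallbackAlgorithm(int rangeSize, int passId) {
            fallbackRan = true;
        }

        @Override void runPreferred(int rangeSize) throws Exception {
            throw new Exception("all preferred devices failed");
        }
    }
}
```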
cancelMultiPass
public void cancelMultiPass()
Invoking this method flags that once the current pass is complete, execution should be abandoned. Due to the complexity of intercommunication between Java (or C) and executing OpenCL, this is the best we can do for general cancellation of execution at present. OpenCL 2.0 should introduce pipe mechanisms which will support mid-pass cancellation easily.
Note that in the case of thread-pool/pure Java execution we could already do better, using Thread.interrupt() (and/or other means) to abandon execution mid-pass. However, at present this is not attempted.
- See Also:
-
getCancelState
public int getCancelState() -
getCurrentPass
public int getCurrentPass()
- See Also:
-
isExecuting
public boolean isExecuting()
- See Also:
-
clone
When using a Java Thread Pool, Aparapi uses clone to copy the initial instance to each thread.
If you choose to override clone() you are responsible for delegating to super.clone();
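The delegation requirement can be illustrated with ordinary Cloneable Java. This is a generic sketch with a hypothetical class name, not Aparapi internals:

```java
// A clone() override that delegates to super.clone(), as the contract requires.
public class CloneSketch implements Cloneable {
    int[] buffer = new int[16];

    @Override
    public CloneSketch clone() {
        try {
            // Delegate to super.clone() first, then deep-copy mutable state
            // so each thread's copy owns its own buffer.
            CloneSketch copy = (CloneSketch) super.clone();
            copy.buffer = buffer.clone();
            return copy;
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // unreachable: we implement Cloneable
        }
    }
}
```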
-
getKernelState
-
registerProfileReportObserver
Registers a new profile report observer to receive profile reports as they're produced. This is the recommended method when the client application wants a single observer to receive all the execution profiles for the current kernel instance, across all devices and all client threads running the kernel.
Note1: A report will be generated by a thread that finishes executing a kernel. In multithreaded execution environments it is up to the observer implementation to handle thread safety.
Note2: To cancel the report subscription, just set the observer to null.
- Parameters:
observer - the observer instance that will receive the profile reports
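The observer contract (reports delivered by whichever thread finishes a run; null cancels the subscription) can be sketched generically. The class and interface names below are hypothetical, not Aparapi's IProfileReportObserver API:

```java
// Generic sketch of the observer contract: register to receive reports from
// whichever thread finishes a pass; registering null cancels the subscription.
public class ObserverSketch {
    interface ReportObserver {
        void receivedReport(String report);
    }

    private volatile ReportObserver observer; // volatile: set/read across threads

    void registerObserver(ReportObserver o) {
        observer = o; // null cancels
    }

    void finishPass(String report) {
        ReportObserver o = observer;
        if (o != null) {
            o.receivedReport(report); // called on the finishing thread
        }
    }
}
```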
-
getProfileReportLastThread
Retrieves a profile report for the last thread that executed this kernel on the given device. A report will only be available if at least one thread executed the kernel on the device.
Note1: If the profile report is intended to be kept in memory, the object should be cloned with ProfileReport.clone().
- Parameters:
device - the relevant device where the kernel executed
- Returns:
- the profiling report for the current most recent execution
- null, if no profiling report is available for such thread
- See Also:
-
getProfileReportCurrentThread
Retrieves the most recent complete report available for the current thread calling this method for the current kernel instance and executed on the given device.
Note1: If the profile report is intended to be kept in memory, the object should be cloned with ProfileReport.clone().
Note2: If the thread didn't execute this kernel on the specified device, it will return null.
- Parameters:
device - the relevant device where the kernel executed
- Returns:
- the profiling report for the current most recent execution
- null, if no profiling report is available for such thread
- See Also:
-
getExecutionTime
public double getExecutionTime()
Determine the execution time of the previous Kernel.execute(range) called from the last thread that ran and executed on the most recently used device.
Note1: This is kept for backwards compatibility only; usage of either getProfileReportLastThread(Device) or registerProfileReportObserver(IProfileReportObserver) is encouraged instead.
Note2: Calling this method is not recommended when using more than a single thread to execute the same kernel, or when running kernels on more than one device concurrently.
Note that for the first call this will include the conversion time.
- Returns:
- The time spent executing the kernel (ms)
- NaN, if no profile report is available
- See Also:
-
getConversionTime
public double getConversionTime()
Determine the time taken to convert bytecode to OpenCL for the first Kernel.execute(range) call.
Note1: This is kept for backwards compatibility only; usage of either getProfileReportLastThread(Device) or registerProfileReportObserver(IProfileReportObserver) is encouraged instead.
Note2: Calling this method is not recommended when using more than a single thread to execute the same kernel, or when running kernels on more than one device concurrently.
Note that for the first call this will include the conversion time.
- Returns:
- The time spent preparing the kernel for execution using GPU
- NaN, if no profile report is available
- See Also:
-
getAccumulatedExecutionTimeCurrentThread
Determine the total execution time of all previous kernel executions called from the current thread, calling this method, that executed the current kernel on the specified device.
Note1: This is the recommended method to retrieve the accumulated execution time for a single current thread, even when doing multithreading for the same kernel and device.
Note that this will include the initial conversion time.
- Parameters:
device - the device of interest where the kernel executed
- Returns:
- The total time spent executing the kernel (ms)
- NaN, if no profiling information is available
- See Also:
-
getAccumulatedExecutionTimeAllThreads
Determine the total execution time of all produced profile reports from all threads that executed the current kernel on the specified device.
Note1: This is the recommended method to retrieve the accumulated execution time, even when doing multithreading for the same kernel and device.
Note that this will include the initial conversion time.
- Parameters:
device - the device of interest where the kernel executed
- Returns:
- The total time spent executing the kernel (ms)
- NaN, if no profiling information is available
- See Also:
-
getAccumulatedExecutionTime
public double getAccumulatedExecutionTime()
Determine the total execution time of all previous Kernel.execute(range) calls for all threads that ran this kernel for the device used in the last kernel execution.
Note1: This is kept for backwards compatibility only; usage of getAccumulatedExecutionTimeAllThreads(Device) is encouraged instead.
Note2: Calling this method is not recommended when using more than a single thread to execute the same kernel on multiple devices concurrently.
Note that this will include the initial conversion time.
- Returns:
- The total time spent executing the kernel (ms)
- NaN, if no profiling information is available
- See Also:
-
execute
Start execution of _range kernels.
When kernel.execute(globalSize) is invoked, Aparapi will schedule the execution of globalSize kernels. If the execution mode is GPU, then the kernels will execute as OpenCL code on the GPU device. Otherwise, if the mode is JTP, the kernels will execute as a pool of Java threads on the CPU.
- Parameters:
_range
- The number of Kernels that we would like to initiate.
-
toString
-
execute
Start execution of _range kernels.
When kernel.execute(_range) is invoked, Aparapi will schedule the execution of _range kernels. If the execution mode is GPU, then the kernels will execute as OpenCL code on the GPU device. Otherwise, if the mode is JTP, the kernels will execute as a pool of Java threads on the CPU.
Since the addition of the new Range class, this method offers backward compatibility and merely defers to return (execute(Range.create(_range), 1));.
- Parameters:
_range
- The number of Kernels that we would like to initiate.
-
execute
Start execution of _passes iterations of _range kernels.
When kernel.execute(_range, _passes) is invoked, Aparapi will schedule the execution of _range kernels. If the execution mode is GPU, then the kernels will execute as OpenCL code on the GPU device. Otherwise, if the mode is JTP, the kernels will execute as a pool of Java threads on the CPU.
- Parameters:
_passes - The number of passes to make
- Returns:
- The Kernel instance (this) so we can chain calls to put(arr).execute(range).get(arr)
-
execute
Start execution of _passes iterations over the _range of kernels.
When kernel.execute(_range) is invoked, Aparapi will schedule the execution of _range kernels. If the execution mode is GPU, then the kernels will execute as OpenCL code on the GPU device. Otherwise, if the mode is JTP, the kernels will execute as a pool of Java threads on the CPU.
Since the addition of the new Range class, this method offers backward compatibility and merely defers to return (execute(Range.create(_range), 1));.
- Parameters:
_range
- The number of Kernels that we would like to initiate.
-
execute
Start execution of globalSize kernels for the given entrypoint.
When kernel.execute("entrypoint", globalSize) is invoked, Aparapi will schedule the execution of globalSize kernels. If the execution mode is GPU, then the kernels will execute as OpenCL code on the GPU device. Otherwise, if the mode is JTP, the kernels will execute as a pool of Java threads on the CPU.
- Parameters:
_entrypoint - the name of the method we wish to use as the entrypoint to the kernel
- Returns:
- The Kernel instance (this) so we can chain calls to put(arr).execute(range).get(arr)
-
execute
Start execution of globalSize kernels for the given entrypoint.
When kernel.execute("entrypoint", globalSize) is invoked, Aparapi will schedule the execution of globalSize kernels. If the execution mode is GPU, then the kernels will execute as OpenCL code on the GPU device. Otherwise, if the mode is JTP, the kernels will execute as a pool of Java threads on the CPU.
- Parameters:
_entrypoint - the name of the method we wish to use as the entrypoint to the kernel
- Returns:
- The Kernel instance (this) so we can chain calls to put(arr).execute(range).get(arr)
-
compile
Force pre-compilation of the kernel for a given device, without executing it.
- Parameters:
_device - the device for which the kernel is to be compiled
- Returns:
- the Kernel instance (this) so we can chain calls
- Throws:
CompileFailedException
- if compilation failed for some reason
-
compile
Force pre-compilation of the kernel for a given device, without executing it.
- Parameters:
_entrypoint - the name of the method we wish to use as the entrypoint to the kernel
_device - the device for which the kernel is to be compiled
- Returns:
- the Kernel instance (this) so we can chain calls
- Throws:
CompileFailedException
- if compilation failed for some reason
-
getKernelMinimumPrivateMemSizeInUsePerWorkItem
public long getKernelMinimumPrivateMemSizeInUsePerWorkItem(Device device) throws QueryFailedException
Retrieves the minimum private memory in use per work item for this kernel instance and the specified device.
- Parameters:
device - the device where the kernel is intended to run
- Returns:
- the number of bytes used per work item
- Throws:
QueryFailedException
- if the query couldn't complete
-
getKernelLocalMemSizeInUse
Retrieves the amount of local memory used in the specified device by this kernel instance.
- Parameters:
device - the device where the kernel is intended to run
- Returns:
- the number of bytes of local memory in use for the specified device and current kernel
- Throws:
QueryFailedException
- if the query couldn't complete
-
getKernelPreferredWorkGroupSizeMultiple
Retrieves the preferred work-group size multiple in the specified device for this kernel instance.
- Parameters:
device - the device where the kernel is intended to run
- Returns:
- the preferred work group multiple
- Throws:
QueryFailedException
- if the query couldn't complete
-
getKernelMaxWorkGroupSize
Retrieves the maximum work-group size allowed for this kernel when running on the specified device.
- Parameters:
device - the device where the kernel is intended to run
- Returns:
- the maximum work-group size allowed for this kernel on the specified device
- Throws:
QueryFailedException
- if the query couldn't complete
-
getKernelCompileWorkGroupSize
Retrieves the specified work-group size in the compiled kernel for the specified device or intermediate language for the device.
- Parameters:
device - the device where the kernel is intended to run
- Returns:
- the work-group size specified when the kernel was compiled for the device
- Throws:
QueryFailedException
- if the query couldn't complete
-
isAutoCleanUpArrays
public boolean isAutoCleanUpArrays() -
setAutoCleanUpArrays
public void setAutoCleanUpArrays(boolean autoCleanUpArrays)
Property which, if true, enables automatic calling of cleanUpArrays() following each execution. -
cleanUpArrays
public void cleanUpArrays()
Frees the bulk of the resources used by this kernel, by setting array sizes in non-primitive KernelArgs to 1 (0 size is prohibited) and invoking kernel execution on a zero size range. Unlike dispose(), this does not prohibit further invocations of this kernel, as sundry resources such as OpenCL queues are not freed by this method.
This allows a "dormant" Kernel to remain in existence without undue strain on GPU resources, which may be strongly preferable to disposing a Kernel and recreating another one later, as creation/use of a new Kernel (specifically creation of its associated OpenCL context) is expensive.
Note that where the underlying array field is declared final, for obvious reasons it is not resized to zero.
-
dispose
public void dispose()
Release any resources associated with this Kernel.
When the execution mode is CPU or GPU, Aparapi stores some OpenCL resources in a data structure associated with the kernel instance. The dispose() method must be called to release these resources.
If execute(int _globalSize) is called after dispose() is called, the results are undefined. -
isRunningCL
public boolean isRunningCL() -
getTargetDevice
-
isAllowDevice
- Returns:
- true by default; may be overridden to allow vetoing of a device or devices by a given Kernel instance.
-
getExecutionMode
Deprecated. See Kernel.EXECUTION_MODE
Return the current execution mode. Before a Kernel executes, this return value will be the execution mode as determined by the setting of the EXECUTION_MODE enumeration. By default, this setting is either GPU if OpenCL is available on the target system, or JTP otherwise. This default setting can be changed by calling setExecutionMode().
After a Kernel executes, the return value will be the mode in which the Kernel actually executed.
- Returns:
- The current execution mode.
- See Also:
-
setExecutionMode
Deprecated. See Kernel.EXECUTION_MODE
Set the execution mode.
This should be regarded as a request. The real mode will be determined at runtime based on the availability of OpenCL and the characteristics of the workload.
- Parameters:
_executionMode - the requested execution mode.
- See Also:
-
setExecutionModeWithoutFallback
-
setFallbackExecutionMode
Deprecated. -
getMappedMethodName
public static String getMappedMethodName(ClassModel.ConstantPool.MethodReferenceEntry _methodReferenceEntry) -
isMappedMethod
public static boolean isMappedMethod(ClassModel.ConstantPool.MethodReferenceEntry methodReferenceEntry) -
isOpenCLDelegateMethod
public static boolean isOpenCLDelegateMethod(ClassModel.ConstantPool.MethodReferenceEntry methodReferenceEntry) -
usesAtomic32
public static boolean usesAtomic32(ClassModel.ConstantPool.MethodReferenceEntry methodReferenceEntry) -
usesAtomic64
public static boolean usesAtomic64(ClassModel.ConstantPool.MethodReferenceEntry methodReferenceEntry) -
setExplicit
public void setExplicit(boolean _explicit)
For dev purposes (we should remove this for production) allow us to define that this Kernel uses explicit memory management.
- Parameters:
_explicit
- (true if we want explicit memory management)
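The fluent put(arr).execute(range).get(arr) chaining used with explicit memory management can be sketched generically. The class below is a hypothetical stand-in, not Aparapi's implementation; the real methods enqueue OpenCL buffer transfers:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the explicit transfer pattern: put() tags host arrays
// for copying before execution, execute() flushes them, and get() stands in
// for blocking until results are copied back. Each call returns 'this' so that
// put(arr).execute(range).get(arr) chains in the 'fluent' style.
public class ExplicitTransferSketch {
    private final Set<int[]> pendingWrites = new HashSet<>();

    ExplicitTransferSketch put(int[] array) {
        pendingWrites.add(array); // tag: enqueue host-to-device copy
        return this;
    }

    ExplicitTransferSketch execute(int range) {
        pendingWrites.clear();    // stand-in for flushing buffers and running
        return this;
    }

    ExplicitTransferSketch get(int[] array) {
        return this;              // stand-in for blocking device-to-host copy
    }

    int pendingCount() { return pendingWrites.size(); }
}
```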
-
isExplicit
public boolean isExplicit()
For dev purposes (we should remove this for production) determine whether this Kernel uses explicit memory management.
- Returns:
- (true if the kernel is using explicit memory management)
-
put
Tag this array so that it is explicitly enqueued before the kernel is executed. Identical overloads exist for boolean[], byte[], char[], double[], float[], int[] and long[] arrays of one, two and three dimensions.
- Parameters:
array
- Returns:
- This kernel so that we can use the 'fluent' style API
-
get
Enqueue a request to return this buffer from the GPU. This method blocks until the array is available.
- Parameters:
  - array
- Returns:
  - This kernel, so that we can use the 'fluent' style API
-
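Putting put and get together, a typical explicit-mode round trip might be sketched as follows (kernel body and names are illustrative): the buffer is sent to the device once, the kernel executes many times with no intermediate transfers, and get() blocks until the final results are back on the host.

```java
import com.aparapi.Kernel;
import com.aparapi.Range;

public class ExplicitGetExample {
    public static void main(String[] args) {
        final int[] data = new int[1024];

        Kernel kernel = new Kernel() {
            @Override
            public void run() {
                int gid = getGlobalId();
                data[gid] = data[gid] + 1;
            }
        };

        kernel.setExplicit(true);
        kernel.put(data); // transfer to the device once

        Range range = Range.create(data.length);
        for (int pass = 0; pass < 100; pass++) {
            kernel.execute(range); // no host/device transfers between passes
        }

        kernel.get(data); // blocks until 'data' is available on the host
    }
}
```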
getProfileInfo
Get the profiling information from the last successful call to Kernel.execute().
- Returns:
  - A list of ProfileInfo records
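A minimal sketch of inspecting the profiling records (assuming profiling has been enabled up front, e.g. via the Aparapi enableProfiling system property; we rely only on each record's toString() rather than any particular accessor):

```java
import com.aparapi.Kernel;
import com.aparapi.ProfileInfo;
import com.aparapi.Range;

import java.util.List;

public class ProfileExample {
    public static void main(String[] args) {
        final int[] data = new int[1024];
        Kernel kernel = new Kernel() {
            @Override
            public void run() {
                data[getGlobalId()]++;
            }
        };
        kernel.execute(Range.create(data.length));

        // One ProfileInfo record per profiled stage of the last execute().
        List<ProfileInfo> profile = kernel.getProfileInfo();
        if (profile != null) {
            for (ProfileInfo info : profile) {
                System.out.println(info);
            }
        }
    }
}
```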
-
addExecutionModes
Deprecated. See Kernel.EXECUTION_MODE.
Sets the possible fallback path for execution modes. For example, addExecutionModes(GPU, CPU, JTP) will try to use the GPU; if that fails, it will fall back to OpenCL CPU, and finally it will try JTP.
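The fallback path described above might be set up as in this sketch (the kernel body is illustrative; note this API is deprecated in favour of the KernelManager device-preference mechanism):

```java
import com.aparapi.Kernel;
import com.aparapi.Range;

public class FallbackExample {
    public static void main(String[] args) {
        final int[] data = new int[64];
        Kernel kernel = new Kernel() {
            @Override
            public void run() {
                data[getGlobalId()] = getGlobalId();
            }
        };

        // Deprecated fallback-path API: try the GPU first, then OpenCL CPU,
        // then the Java Thread Pool.
        kernel.addExecutionModes(Kernel.EXECUTION_MODE.GPU,
                                 Kernel.EXECUTION_MODE.CPU,
                                 Kernel.EXECUTION_MODE.JTP);

        kernel.execute(Range.create(data.length));
    }
}
```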
-
hasNextExecutionMode
Deprecated.
- Returns:
  - whether there is another execution path we can try
-
tryNextExecutionMode
Deprecated. See Kernel.EXECUTION_MODE.
Try the next execution path in the list; if there aren't any more, give up.
-
invalidateCaches
public static void invalidateCaches()
-
EXECUTION_MODE
Deprecated. It is no longer recommended that EXECUTION_MODEs are used, as a more sophisticated Device preference mechanism is in place; see KernelManager.