public abstract class Kernel extends Object implements Cloneable
To write a new kernel, a developer extends the Kernel
class and overrides the Kernel.run()
method.
To execute this kernel, the developer creates a new instance of it and calls Kernel.execute(int globalSize)
with a suitable 'global size'. At runtime
Aparapi will attempt to convert the Kernel.run()
method (and any method called directly or indirectly
by Kernel.run()
) into OpenCL for execution on GPU devices made available via the OpenCL platform.
Note that Kernel.run()
is not called directly. Instead,
the Kernel.execute(int globalSize)
method will cause the overridden Kernel.run()
method to be invoked once for each global ID from 0 up to, but not including, globalSize.
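The invocation pattern described above can be modelled in plain Java. This is only a sketch of the semantics and does not use Aparapi: the class `ExecuteModel` and its methods are hypothetical names, and a parallel stream stands in for whatever device actually runs the kernel.

```java
import java.util.Arrays;
import java.util.function.IntConsumer;
import java.util.stream.IntStream;

// Plain-Java model of "run() is invoked once per global ID"; hypothetical names, no Aparapi involved.
class ExecuteModel {
    // Invokes body once for every global ID in [0, globalSize), possibly in parallel.
    static void executeAll(int globalSize, IntConsumer body) {
        IntStream.range(0, globalSize).parallel().forEach(body);
    }

    // Squares each input value, one invocation per element.
    static int[] squareAll(int[] values) {
        int[] squares = new int[values.length];
        // Each ID writes only its own slot, so no synchronization is needed.
        executeAll(values.length, gid -> squares[gid] = values[gid] * values[gid]);
        return squares;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(squareAll(new int[] {1, 2, 3, 4})));
    }
}
```

Because every invocation touches a distinct array index, the order in which IDs are processed does not affect the result, which is the property that lets the same body run on a GPU or on a thread pool.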
On the first call to Kernel.execute(int _globalSize)
, Aparapi will determine the EXECUTION_MODE of the kernel.
This decision is made dynamically based on two factors: whether OpenCL is available on the target system, and whether the run()
method (and every method that can be called directly or indirectly from the run()
method) can be converted into OpenCL.
Below is an example Kernel that calculates the square of a set of input values.
    class SquareKernel extends Kernel {
        private int values[];
        private int squares[];

        public SquareKernel(int values[]) {
            this.values = values;
            squares = new int[values.length];
        }

        public void run() {
            int gid = getGlobalID();
            squares[gid] = values[gid] * values[gid];
        }

        public int[] getSquares() {
            return (squares);
        }
    }
To execute this kernel, first create a new instance of it and then call execute(Range _range)
.
    int[] values = new int[1024];
    // fill values array
    Range range = Range.create(values.length); // create a range 0..1024
    SquareKernel kernel = new SquareKernel(values);
    kernel.execute(range);
When execute(Range)
returns, all the executions of Kernel.run()
have completed and the results are available in the squares
array.
    int[] squares = kernel.getSquares();
    for (int i = 0; i < values.length; i++) {
        System.out.printf("%4d %4d %8d\n", i, values[i], squares[i]);
    }
A different approach to creating kernels that avoids extending Kernel is to write an anonymous inner class:
    final int[] values = new int[1024];
    // fill the values array
    final int[] squares = new int[values.length];
    final Range range = Range.create(values.length);

    Kernel kernel = new Kernel() {
        public void run() {
            int gid = getGlobalID();
            squares[gid] = values[gid] * values[gid];
        }
    };
    kernel.execute(range);

    for (int i = 0; i < values.length; i++) {
        System.out.printf("%4d %4d %8d\n", i, values[i], squares[i]);
    }
Modifier and Type | Class and Description |
---|---|
static interface |
Kernel.Constant
We can use this Annotation to 'tag' intended constant buffers.
|
class |
Kernel.Entry |
static class |
Kernel.EXECUTION_MODE
Deprecated.
It is no longer recommended that
EXECUTION_MODE s are used, as a more sophisticated Device
preference mechanism is in place, see KernelManager . Though setExecutionMode(EXECUTION_MODE)
is still honored, the default EXECUTION_MODE is now Kernel.EXECUTION_MODE.AUTO , which indicates that the KernelManager
will determine execution behaviours.
The execution mode ENUM enumerates the possible modes of executing a kernel. One can request a mode of execution using the values below, and query a kernel after it first executes to determine how it executed. Aparapi supports 5 execution modes. Default is GPU.
To request that a kernel is executed in a specific mode, call setExecutionMode(EXECUTION_MODE) before the kernel first executes:
    int[] values = new int[1024];
    // fill values array
    SquareKernel kernel = new SquareKernel(values);
    kernel.setExecutionMode(Kernel.EXECUTION_MODE.JTP);
    kernel.execute(values.length);
Alternatively, the property com.aparapi.executionMode can be set when the application is launched:
    java -classpath ....;aparapi.jar -Dcom.aparapi.executionMode=GPU MyApplication
|
class |
Kernel.KernelState
This class is for internal Kernel state management
|
static interface |
Kernel.Local
We can use this Annotation to 'tag' intended local buffers.
|
static interface |
Kernel.NoCL
Annotation which can be applied to either a getter (with usual java bean naming convention relative to an instance field), or to any method
with void return type, which prevents both the method body and any calls to the method being emitted in the generated OpenCL.
|
static interface |
Kernel.PrivateMemorySpace
We can use this Annotation to 'tag' __private (unshared) array fields.
|
Modifier and Type | Field and Description |
---|---|
static String |
CONSTANT_SUFFIX
We can use this suffix to 'tag' intended constant buffers.
|
static String |
LOCAL_SUFFIX
We can use this suffix to 'tag' intended local buffers.
|
static String |
PRIVATE_SUFFIX
We can use this suffix to 'tag' __private buffers.
|
Constructor and Description |
---|
Kernel() |
Modifier and Type | Method and Description |
---|---|
void |
addExecutionModes(Kernel.EXECUTION_MODE... platforms)
Deprecated.
See
Kernel.EXECUTION_MODE .
Sets the possible fallback path for execution modes. For example, setExecutionFallbackPath(GPU,CPU,JTP) will try to use the GPU; if that fails, it will fall back to OpenCL CPU, and finally it will try JTP. |
void |
cancelMultiPass()
Invoking this method flags that once the current pass is complete execution should be abandoned.
|
void |
cleanUpArrays()
Frees the bulk of the resources used by this kernel, by setting array sizes in non-primitive
KernelArg s to 1 (0 size is prohibited) and invoking kernel
execution on a zero size range. |
Kernel |
clone()
When using a Java Thread Pool Aparapi uses clone to copy the initial instance to each thread.
|
void |
dispose()
Release any resources associated with this Kernel.
|
Kernel |
execute(int _range)
Start execution of
_range kernels. |
Kernel |
execute(int _range,
int _passes)
Start execution of
_passes iterations over the _range of kernels. |
Kernel |
execute(Range _range)
Start execution of
_range kernels. |
Kernel |
execute(Range _range,
int _passes)
Start execution of
_passes iterations of _range kernels. |
Kernel |
execute(String _entrypoint,
Range _range)
Start execution of
globalSize kernels for the given entrypoint. |
Kernel |
execute(String _entrypoint,
Range _range,
int _passes)
Start execution of
globalSize kernels for the given entrypoint. |
void |
executeFallbackAlgorithm(Range _range,
int _passId)
If
hasFallbackAlgorithm() has been overridden to return true, this method should be overridden so as to
apply a single pass of the kernel's logic to the entire _range. |
Kernel |
get(boolean[] array)
Enqueue a request to return this buffer from the GPU.
|
Kernel |
get(boolean[][] array)
Enqueue a request to return this buffer from the GPU.
|
Kernel |
get(boolean[][][] array)
Enqueue a request to return this buffer from the GPU.
|
Kernel |
get(byte[] array)
Enqueue a request to return this buffer from the GPU.
|
Kernel |
get(byte[][] array)
Enqueue a request to return this buffer from the GPU.
|
Kernel |
get(byte[][][] array)
Enqueue a request to return this buffer from the GPU.
|
Kernel |
get(char[] array)
Enqueue a request to return this buffer from the GPU.
|
Kernel |
get(char[][] array)
Enqueue a request to return this buffer from the GPU.
|
Kernel |
get(char[][][] array)
Enqueue a request to return this buffer from the GPU.
|
Kernel |
get(double[] array)
Enqueue a request to return this buffer from the GPU.
|
Kernel |
get(double[][] array)
Enqueue a request to return this buffer from the GPU.
|
Kernel |
get(double[][][] array)
Enqueue a request to return this buffer from the GPU.
|
Kernel |
get(float[] array)
Enqueue a request to return this buffer from the GPU.
|
Kernel |
get(float[][] array)
Enqueue a request to return this buffer from the GPU.
|
Kernel |
get(float[][][] array)
Enqueue a request to return this buffer from the GPU.
|
Kernel |
get(int[] array)
Enqueue a request to return this buffer from the GPU.
|
Kernel |
get(int[][] array)
Enqueue a request to return this buffer from the GPU.
|
Kernel |
get(int[][][] array)
Enqueue a request to return this buffer from the GPU.
|
Kernel |
get(long[] array)
Enqueue a request to return this buffer from the GPU.
|
Kernel |
get(long[][] array)
Enqueue a request to return this buffer from the GPU.
|
Kernel |
get(long[][][] array)
Enqueue a request to return this buffer from the GPU.
|
double |
getAccumulatedExecutionTime()
Determine the total execution time of all previous Kernel.execute(range) calls.
|
int |
getCancelState() |
double |
getConversionTime()
Determine the time taken to convert bytecode to OpenCL for first Kernel.execute(range) call.
|
int |
getCurrentPass() |
Kernel.EXECUTION_MODE |
getExecutionMode()
Deprecated.
See
Kernel.EXECUTION_MODE
Return the current execution mode. Before a Kernel executes, this return value will be the execution mode as determined by the setting of the EXECUTION_MODE enumeration. By default, this setting is either GPU if OpenCL is available on the target system, or JTP otherwise. This default setting can be changed by calling setExecutionMode(). After a Kernel executes, the return value will be the mode in which the Kernel actually executed. |
double |
getExecutionTime()
Determine the execution time of the previous Kernel.execute(range) call.
|
Kernel.KernelState |
getKernelState() |
static String |
getMappedMethodName(ClassModel.ConstantPool.MethodReferenceEntry _methodReferenceEntry) |
List<ProfileInfo> |
getProfileInfo()
Get the profiling information from the last successful call to Kernel.execute().
|
Device |
getTargetDevice() |
boolean |
hasFallbackAlgorithm()
False by default.
|
boolean |
hasNextExecutionMode()
Deprecated.
|
static void |
invalidateCaches() |
boolean |
isAllowDevice(Device _device) |
boolean |
isAutoCleanUpArrays() |
boolean |
isExecuting() |
boolean |
isExplicit()
For dev purposes (we should remove this for production) determine whether this Kernel uses explicit memory management
|
static boolean |
isMappedMethod(ClassModel.ConstantPool.MethodReferenceEntry methodReferenceEntry) |
static boolean |
isOpenCLDelegateMethod(ClassModel.ConstantPool.MethodReferenceEntry methodReferenceEntry) |
boolean |
isRunningCL() |
Kernel |
put(boolean[] array)
Tag this array so that it is explicitly enqueued before the kernel is executed
|
Kernel |
put(boolean[][] array)
Tag this array so that it is explicitly enqueued before the kernel is executed
|
Kernel |
put(boolean[][][] array)
Tag this array so that it is explicitly enqueued before the kernel is executed
|
Kernel |
put(byte[] array)
Tag this array so that it is explicitly enqueued before the kernel is executed
|
Kernel |
put(byte[][] array)
Tag this array so that it is explicitly enqueued before the kernel is executed
|
Kernel |
put(byte[][][] array)
Tag this array so that it is explicitly enqueued before the kernel is executed
|
Kernel |
put(char[] array)
Tag this array so that it is explicitly enqueued before the kernel is executed
|
Kernel |
put(char[][] array)
Tag this array so that it is explicitly enqueued before the kernel is executed
|
Kernel |
put(char[][][] array)
Tag this array so that it is explicitly enqueued before the kernel is executed
|
Kernel |
put(double[] array)
Tag this array so that it is explicitly enqueued before the kernel is executed
|
Kernel |
put(double[][] array)
Tag this array so that it is explicitly enqueued before the kernel is executed
|
Kernel |
put(double[][][] array)
Tag this array so that it is explicitly enqueued before the kernel is executed
|
Kernel |
put(float[] array)
Tag this array so that it is explicitly enqueued before the kernel is executed
|
Kernel |
put(float[][] array)
Tag this array so that it is explicitly enqueued before the kernel is executed
|
Kernel |
put(float[][][] array)
Tag this array so that it is explicitly enqueued before the kernel is executed
|
Kernel |
put(int[] array)
Tag this array so that it is explicitly enqueued before the kernel is executed
|
Kernel |
put(int[][] array)
Tag this array so that it is explicitly enqueued before the kernel is executed
|
Kernel |
put(int[][][] array)
Tag this array so that it is explicitly enqueued before the kernel is executed
|
Kernel |
put(long[] array)
Tag this array so that it is explicitly enqueued before the kernel is executed
|
Kernel |
put(long[][] array)
Tag this array so that it is explicitly enqueued before the kernel is executed
|
Kernel |
put(long[][][] array)
Tag this array so that it is explicitly enqueued before the kernel is executed
|
abstract void |
run()
The entry point of a kernel.
|
void |
setAutoCleanUpArrays(boolean autoCleanUpArrays)
Property which if true enables automatic calling of
cleanUpArrays() following each execution. |
void |
setExecutionMode(Kernel.EXECUTION_MODE _executionMode)
Deprecated.
See
Kernel.EXECUTION_MODE
Set the execution mode. This should be regarded as a request. The real mode will be determined at runtime based on the availability of OpenCL and the characteristics of the workload. |
void |
setExecutionModeWithoutFallback(Kernel.EXECUTION_MODE _executionMode) |
void |
setExplicit(boolean _explicit)
For dev purposes (we should remove this for production) allow us to define that this Kernel uses explicit memory management
|
void |
setFallbackExecutionMode()
Deprecated.
|
String |
toString() |
void |
tryNextExecutionMode()
Deprecated.
See
Kernel.EXECUTION_MODE .
Tries the next execution path in the list; if there aren't any more, gives up. |
static boolean |
usesAtomic32(ClassModel.ConstantPool.MethodReferenceEntry methodReferenceEntry) |
static boolean |
usesAtomic64(ClassModel.ConstantPool.MethodReferenceEntry methodReferenceEntry) |
public static final String LOCAL_SUFFIX
int[] buffer_$local$ = new int[1024];
Or use the Annotation form
@Local int[] buffer = new int[1024];
public static final String CONSTANT_SUFFIX
int[] buffer_$constant$ = new int[1024];
Or use the Annotation form
@Constant int[] buffer = new int[1024];
public static final String PRIVATE_SUFFIX
So either name the buffer
int[] buffer_$private$32 = new int[32];
Or use the Annotation form
@PrivateMemorySpace(32) int[] buffer = new int[32];
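The suffix form encodes the intended memory space in the field name itself. The sketch below shows how such names could be classified; it is an illustration only, not Aparapi code. The class `BufferNaming` and its methods are hypothetical, and the suffix strings are assumptions inferred from the examples above.

```java
// Hypothetical helper illustrating the naming convention; not part of Aparapi.
class BufferNaming {
    // Suffix values inferred from the field-name examples (assumption).
    static final String LOCAL_SUFFIX = "_$local$";
    static final String CONSTANT_SUFFIX = "_$constant$";
    static final String PRIVATE_SUFFIX = "_$private$";

    // Returns "local", "constant", "private", or "unmarked" for a field name.
    static String memorySpaceOf(String fieldName) {
        if (fieldName.endsWith(LOCAL_SUFFIX)) return "local";
        if (fieldName.endsWith(CONSTANT_SUFFIX)) return "constant";
        if (fieldName.contains(PRIVATE_SUFFIX)) return "private";
        return "unmarked";
    }

    // Extracts the size that follows the private suffix, or -1 if it is absent or malformed.
    static int privateSizeOf(String fieldName) {
        int at = fieldName.lastIndexOf(PRIVATE_SUFFIX);
        if (at < 0) return -1;
        String digits = fieldName.substring(at + PRIVATE_SUFFIX.length());
        try {
            return Integer.parseInt(digits);
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(memorySpaceOf("buffer_$local$"));
        System.out.println(privateSizeOf("buffer_$private$32"));
    }
}
```

The private form differs from the other two in that the array size trails the suffix, mirroring the size argument of the annotation form.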
public abstract void run()
Every kernel must override this method.
public boolean hasFallbackAlgorithm()
False by default. To supply an alternate algorithm, override this method to return true and override executeFallbackAlgorithm(Range, int)
with the alternate algorithm.
public void executeFallbackAlgorithm(Range _range, int _passId)
If hasFallbackAlgorithm()
has been overridden to return true, this method should be overridden so as to
apply a single pass of the kernel's logic to the entire _range.
This is not normally required, as fallback to JavaDevice.THREAD_POOL
will implement the algorithm in parallel. However
in the event that thread pool execution may be prohibitively slow, this method might implement a "quick and dirty" approximation
to the desired result (for example, a simple box-blur as opposed to a gaussian blur in an image processing application).
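As a rough illustration of the kind of logic a fallback might contain, the sketch below applies one pass of a simple 3-tap box blur to an entire range in a single sequential loop. It is plain Java and does not use Aparapi; the class `BoxBlurFallback` and its methods are hypothetical, and a 1-D array stands in for image data.

```java
import java.util.Arrays;

// Hypothetical stand-in for the body of a fallback algorithm; no Aparapi types are used.
class BoxBlurFallback {
    // Applies one pass of a 3-tap box blur to every index in [0, rangeSize).
    static void executeFallbackPass(float[] in, float[] out, int rangeSize) {
        for (int gid = 0; gid < rangeSize; gid++) {
            int lo = Math.max(0, gid - 1);             // clamp the window at the left edge
            int hi = Math.min(rangeSize - 1, gid + 1); // clamp the window at the right edge
            float sum = 0f;
            for (int i = lo; i <= hi; i++) {
                sum += in[i];
            }
            out[gid] = sum / (hi - lo + 1);
        }
    }

    // Convenience wrapper that blurs the whole array and returns the result.
    static float[] blur(float[] in) {
        float[] out = new float[in.length];
        executeFallbackPass(in, out, in.length);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(blur(new float[] {3f, 6f, 9f, 12f})));
    }
}
```

The point of the design is that the whole range is handled in one call, rather than once per global ID as run() would be.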
public void cancelMultiPass()
Note that in the case of thread-pool/pure java execution we could do better already, using Thread.interrupt() (and/or other means) to abandon execution mid-pass. However at present this is not attempted.
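The between-pass cancellation behaviour can be modelled as follows. This is a plain-Java sketch under stated assumptions, not Aparapi's implementation: the class `MultiPassModel` and the interface `PassBody` are hypothetical, and the cancel flag is checked only after a pass has finished.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Plain-Java model of multi-pass execution with cancellation between passes; hypothetical names.
class MultiPassModel {
    interface PassBody {
        void run(int gid, int passId);
    }

    private final AtomicBoolean cancelRequested = new AtomicBoolean(false);

    // Flags that execution should stop once the current pass has finished.
    void cancelMultiPass() {
        cancelRequested.set(true);
    }

    // Runs up to `passes` passes over [0, rangeSize) and returns the number of passes completed.
    int execute(int rangeSize, int passes, PassBody body) {
        int completed = 0;
        for (int passId = 0; passId < passes; passId++) {
            for (int gid = 0; gid < rangeSize; gid++) {
                body.run(gid, passId); // the current pass always runs to completion
            }
            completed++;
            if (cancelRequested.get()) {
                break; // the flag is only consulted between passes
            }
        }
        return completed;
    }

    public static void main(String[] args) {
        MultiPassModel model = new MultiPassModel();
        int done = model.execute(4, 5, (gid, passId) -> {
            if (passId == 1 && gid == 0) {
                model.cancelMultiPass();
            }
        });
        System.out.println("passes completed: " + done);
    }
}
```

Abandoning execution mid-pass, as the note above suggests might be possible for thread-pool execution, would require checking the flag inside the inner loop instead.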
public int getCancelState()
public int getCurrentPass()
KernelRunner.getCurrentPass()
public boolean isExecuting()
KernelRunner.isExecuting()
public Kernel clone()
If you choose to override clone()
you are responsible for delegating to super.clone();
public Kernel.KernelState getKernelState()
public double getExecutionTime()
getConversionTime(), getAccumulatedExecutionTime()
public double getAccumulatedExecutionTime()
getExecutionTime(), getConversionTime()
public double getConversionTime()
getExecutionTime(), getAccumulatedExecutionTime()
public Kernel execute(Range _range)
Start execution of _range kernels.
When kernel.execute(globalSize) is invoked, Aparapi will schedule the execution of globalSize kernels. If the execution mode is GPU then
the kernels will execute as OpenCL code on the GPU device. Otherwise, if the mode is JTP, the kernels will execute as a pool of Java threads on the CPU.
_range - The number of Kernels that we would like to initiate.
public Kernel execute(int _range)
Start execution of _range kernels.
When kernel.execute(_range) is invoked, Aparapi will schedule the execution of _range kernels. If the execution mode is GPU then
the kernels will execute as OpenCL code on the GPU device. Otherwise, if the mode is JTP, the kernels will execute as a pool of Java threads on the CPU.
Since adding the new Range class this method offers backward compatibility and merely defers to execute(Range.create(_range), 1).
_range - The number of Kernels that we would like to initiate.
public Kernel execute(Range _range, int _passes)
Start execution of _passes iterations of _range kernels.
When kernel.execute(_range, _passes) is invoked, Aparapi will schedule the execution of _range kernels. If the execution mode is GPU then
the kernels will execute as OpenCL code on the GPU device. Otherwise, if the mode is JTP, the kernels will execute as a pool of Java threads on the CPU.
_passes - The number of passes to make
public Kernel execute(int _range, int _passes)
Start execution of _passes iterations over the _range of kernels.
When kernel.execute(_range, _passes) is invoked, Aparapi will schedule the execution of _range kernels. If the execution mode is GPU then
the kernels will execute as OpenCL code on the GPU device. Otherwise, if the mode is JTP, the kernels will execute as a pool of Java threads on the CPU.
Since adding the new Range class this method offers backward compatibility and merely defers to execute(Range.create(_range), 1).
_range - The number of Kernels that we would like to initiate.
public Kernel execute(String _entrypoint, Range _range)
Start execution of globalSize kernels for the given entrypoint.
When kernel.execute("entrypoint", globalSize) is invoked, Aparapi will schedule the execution of globalSize kernels. If the execution mode is GPU then
the kernels will execute as OpenCL code on the GPU device. Otherwise, if the mode is JTP, the kernels will execute as a pool of Java threads on the CPU.
_entrypoint - is the name of the method we wish to use as the entrypoint to the kernel
public Kernel execute(String _entrypoint, Range _range, int _passes)
Start execution of globalSize kernels for the given entrypoint.
When kernel.execute("entrypoint", globalSize) is invoked, Aparapi will schedule the execution of globalSize kernels. If the execution mode is GPU then
the kernels will execute as OpenCL code on the GPU device. Otherwise, if the mode is JTP, the kernels will execute as a pool of Java threads on the CPU.
_entrypoint - is the name of the method we wish to use as the entrypoint to the kernel
public boolean isAutoCleanUpArrays()
public void setAutoCleanUpArrays(boolean autoCleanUpArrays)
Property which if true enables automatic calling of cleanUpArrays() following each execution.
public void cleanUpArrays()
Frees the bulk of the resources used by this kernel, by setting array sizes in non-primitive KernelArgs to 1 (0 size is prohibited) and invoking kernel
execution on a zero size range. Unlike dispose(), this does not prohibit further invocations of this kernel, as sundry resources such as OpenCL queues are
not freed by this method.
This allows a "dormant" Kernel to remain in existence without undue strain on GPU resources, which may be strongly preferable to disposing a Kernel and recreating another one later, as creation/use of a new Kernel (specifically creation of its associated OpenCL context) is expensive.
Note that where the underlying array field is declared final, for obvious reasons it is not resized to zero.
public void dispose()
When the execution mode is CPU
or GPU
, Aparapi stores some OpenCL resources in a data structure associated with the kernel instance. The
dispose()
method must be called to release these resources.
If execute(int _globalSize)
is called after dispose()
is called the results are undefined.
public boolean isRunningCL()
public final Device getTargetDevice()
public boolean isAllowDevice(Device _device)
@Deprecated public Kernel.EXECUTION_MODE getExecutionMode()
Kernel.EXECUTION_MODE
Return the current execution mode. Before a Kernel executes, this return value will be the execution mode as determined by the setting of the EXECUTION_MODE enumeration. By default, this setting is either GPU if OpenCL is available on the target system, or JTP otherwise. This default setting can be changed by calling setExecutionMode().
After a Kernel executes, the return value will be the mode in which the Kernel actually executed.
setExecutionMode(EXECUTION_MODE)
@Deprecated public void setExecutionMode(Kernel.EXECUTION_MODE _executionMode)
Kernel.EXECUTION_MODE
Set the execution mode.
This should be regarded as a request. The real mode will be determined at runtime based on the availability of OpenCL and the characteristics of the workload.
_executionMode - the requested execution mode.
getExecutionMode()
public void setExecutionModeWithoutFallback(Kernel.EXECUTION_MODE _executionMode)
@Deprecated public void setFallbackExecutionMode()
Kernel.EXECUTION_MODE
public static String getMappedMethodName(ClassModel.ConstantPool.MethodReferenceEntry _methodReferenceEntry)
public static boolean isMappedMethod(ClassModel.ConstantPool.MethodReferenceEntry methodReferenceEntry)
public static boolean isOpenCLDelegateMethod(ClassModel.ConstantPool.MethodReferenceEntry methodReferenceEntry)
public static boolean usesAtomic32(ClassModel.ConstantPool.MethodReferenceEntry methodReferenceEntry)
public static boolean usesAtomic64(ClassModel.ConstantPool.MethodReferenceEntry methodReferenceEntry)
public void setExplicit(boolean _explicit)
_explicit - (true if we want explicit memory management)
public boolean isExplicit()
public Kernel put(long[] array)
public Kernel put(long[][] array)
public Kernel put(long[][][] array)
public Kernel put(double[] array)
public Kernel put(double[][] array)
public Kernel put(double[][][] array)
public Kernel put(float[] array)
public Kernel put(float[][] array)
public Kernel put(float[][][] array)
public Kernel put(int[] array)
public Kernel put(int[][] array)
public Kernel put(int[][][] array)
public Kernel put(byte[] array)
public Kernel put(byte[][] array)
public Kernel put(byte[][][] array)
public Kernel put(char[] array)
public Kernel put(char[][] array)
public Kernel put(char[][][] array)
public Kernel put(boolean[] array)
public Kernel put(boolean[][] array)
public Kernel put(boolean[][][] array)
public Kernel get(long[] array)
public Kernel get(long[][] array)
public Kernel get(long[][][] array)
public Kernel get(double[] array)
public Kernel get(double[][] array)
public Kernel get(double[][][] array)
public Kernel get(float[] array)
public Kernel get(float[][] array)
public Kernel get(float[][][] array)
public Kernel get(int[] array)
public Kernel get(int[][] array)
public Kernel get(int[][][] array)
public Kernel get(byte[] array)
public Kernel get(byte[][] array)
public Kernel get(byte[][][] array)
public Kernel get(char[] array)
public Kernel get(char[][] array)
public Kernel get(char[][][] array)
public Kernel get(boolean[] array)
public Kernel get(boolean[][] array)
public Kernel get(boolean[][][] array)
public List<ProfileInfo> getProfileInfo()
@Deprecated public void addExecutionModes(Kernel.EXECUTION_MODE... platforms)
Deprecated. See Kernel.EXECUTION_MODE.
Sets the possible fallback path for execution modes. For example, setExecutionFallbackPath(GPU,CPU,JTP) will try to use the GPU; if that fails, it will fall back to OpenCL CPU, and finally it will try JTP.
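The fallback order described above amounts to trying each mode in turn until one works. The following plain-Java sketch models that idea only; it does not use Aparapi, and the class `FallbackPathModel`, its `Mode` enum, and the `canRun` predicate are hypothetical stand-ins.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Plain-Java model of walking a fallback path; hypothetical names, no Aparapi involved.
class FallbackPathModel {
    enum Mode { GPU, CPU, JTP }

    // Returns the first mode in the path for which canRun succeeds, or null if none does.
    static Mode firstWorking(List<Mode> path, Predicate<Mode> canRun) {
        for (Mode mode : path) {
            if (canRun.test(mode)) {
                return mode;
            }
        }
        return null; // every mode in the path failed
    }

    public static void main(String[] args) {
        List<Mode> path = Arrays.asList(Mode.GPU, Mode.CPU, Mode.JTP);
        // Pretend that neither OpenCL mode is usable on this machine.
        System.out.println(firstWorking(path, mode -> mode == Mode.JTP));
    }
}
```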
@Deprecated public boolean hasNextExecutionMode()
Deprecated. See Kernel.EXECUTION_MODE.
@Deprecated public void tryNextExecutionMode()
Deprecated. See Kernel.EXECUTION_MODE.
Tries the next execution path in the list; if there aren't any more, gives up.
public static void invalidateCaches()
Copyright © 2016 Syncleus. All rights reserved.