Class Kernel
- All Implemented Interfaces:
Cloneable
To write a new kernel, a developer extends the Kernel class and overrides the Kernel.run() method. To execute this kernel, the developer creates a new instance of it and calls Kernel.execute(int globalSize) with a suitable 'global size'. At runtime Aparapi will attempt to convert the Kernel.run() method (and any method called directly or indirectly by Kernel.run()) into OpenCL for execution on GPU devices made available via the OpenCL platform.
Note that Kernel.run() is not called directly. Instead, the Kernel.execute(int globalSize) method will cause the overridden Kernel.run() method to be invoked once for each value in the range 0...globalSize.
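The dispatch model above can be sketched in plain Java. The class and interface names below are hypothetical illustrations, not Aparapi's actual scheduler: this is a sequential analogue of what execute(globalSize) means, namely invoking run() once per global id.

```java
// Hypothetical sequential analogue of Kernel.execute(globalSize).
// Aparapi may instead run these as OpenCL work-items or pooled Java threads;
// this sketch only illustrates the "run() once per global id" contract.
public class SequentialDispatchSketch {
    interface MiniKernel {
        void run(int globalId);
    }

    static void execute(int globalSize, MiniKernel kernel) {
        for (int gid = 0; gid < globalSize; gid++) {
            kernel.run(gid); // one invocation per value in 0..globalSize-1
        }
    }

    public static void main(String[] args) {
        int[] values = {1, 2, 3, 4};
        int[] squares = new int[values.length];
        execute(values.length, gid -> squares[gid] = values[gid] * values[gid]);
        System.out.println(java.util.Arrays.toString(squares)); // [1, 4, 9, 16]
    }
}
```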
On the first call to Kernel.execute(int _globalSize), Aparapi will determine the EXECUTION_MODE of the kernel. This decision is made dynamically based on two factors:
- Whether OpenCL is available (appropriate drivers are installed and the OpenCL and Aparapi dynamic libraries are included on the system path).
- Whether the bytecode of the run() method (and every method that can be called directly or indirectly from the run() method) can be converted into OpenCL.
Below is an example Kernel that calculates the square of a set of input values.
class SquareKernel extends Kernel{
   private int values[];
   private int squares[];

   public SquareKernel(int values[]){
      this.values = values;
      squares = new int[values.length];
   }

   public void run() {
      int gid = getGlobalID();
      squares[gid] = values[gid]*values[gid];
   }

   public int[] getSquares(){
      return(squares);
   }
}
To execute this kernel, first create a new instance of it and then call execute(Range _range).
int[] values = new int[1024];
// fill values array
Range range = Range.create(values.length); // create a range 0..1024
SquareKernel kernel = new SquareKernel(values);
kernel.execute(range);
When execute(Range) returns, all the executions of Kernel.run() have completed and the results are available in the squares array.
int[] squares = kernel.getSquares();
for (int i = 0; i < values.length; i++) {
System.out.printf("%4d %4d %8d\n", i, values[i], squares[i]);
}
A different approach to creating kernels that avoids extending Kernel is to write an anonymous inner class:
final int[] values = new int[1024];
// fill the values array
final int[] squares = new int[values.length];
final Range range = Range.create(values.length);
Kernel kernel = new Kernel(){
public void run() {
int gid = getGlobalID();
squares[gid] = values[gid]*values[gid];
}
};
kernel.execute(range);
for (int i = 0; i < values.length; i++) {
System.out.printf("%4d %4d %8d\n", i, values[i], squares[i]);
}
- Version:
- Alpha, 21/09/2010
- Author:
- gfrost AMD Javalabs
-
Nested Class Summary
Nested Classes:
- static @interface Kernel.Constant: We can use this Annotation to 'tag' intended constant buffers.
- class
- static enum Kernel.EXECUTION_MODE: Deprecated.
- final class Kernel.KernelState: This class is for internal Kernel state management.
- static @interface Kernel.Local: We can use this Annotation to 'tag' intended local buffers.
- static @interface: Annotation which can be applied to either a getter (with the usual Java bean naming convention relative to an instance field), or to any method with void return type, which prevents both the method body and any calls to the method being emitted in the generated OpenCL.
- static @interface Kernel.PrivateMemorySpace: We can use this Annotation to 'tag' __private (unshared) array fields. -
Field Summary
Fields -
Constructor Summary
Constructors -
Method Summary
- void addExecutionModes(Kernel.EXECUTION_MODE... platforms): Deprecated.
- void cancelMultiPass(): Invoking this method flags that once the current pass is complete, execution should be abandoned.
- void cleanUpArrays(): Frees the bulk of the resources used by this kernel, by setting array sizes in non-primitive KernelArgs to 1 (0 size is prohibited) and invoking kernel execution on a zero size range.
- Kernel clone(): When using a Java Thread Pool, Aparapi uses clone to copy the initial instance to each thread.
- Kernel compile(Device _device): Force pre-compilation of the kernel for a given device, without executing it.
- Kernel compile(String _entrypoint, Device _device): Force pre-compilation of the kernel for a given device, without executing it.
- void dispose(): Release any resources associated with this Kernel.
- Kernel execute(int _range): Start execution of _range kernels.
- Kernel execute(int _range, int _passes): Start execution of _passes iterations over the _range of kernels.
- Kernel execute(Range _range): Start execution of _range kernels.
- Kernel execute(Range _range, int _passes): Start execution of _passes iterations of _range kernels.
- Kernel execute(String _entrypoint, ...) (two overloads): Start execution of globalSize kernels for the given entrypoint.
- void executeFallbackAlgorithm(Range _range, int _passId): If hasFallbackAlgorithm() has been overridden to return true, this method should be overridden so as to apply a single pass of the kernel's logic to the entire _range.
- Kernel get(...): Enqueue a request to return this buffer from the GPU; overloads exist for boolean[], byte[], char[], double[], float[], int[] and long[] arrays of one, two and three dimensions.
- double getAccumulatedExecutionTime(): Determine the total execution time of all previous Kernel.execute(range) calls for all threads that ran this kernel for the device used in the last kernel execution.
- double getAccumulatedExecutionTimeAllThreads(Device device): Determine the total execution time of all produced profile reports from all threads that executed the current kernel on the specified device.
- double getAccumulatedExecutionTimeCurrentThread(Device device): Determine the total execution time of all previous kernel executions called from the current thread that executed the current kernel on the specified device.
- int getCancelState()
- double getConversionTime(): Determine the time taken to convert bytecode to OpenCL for the first Kernel.execute(range) call.
- int getCurrentPass()
- Kernel.EXECUTION_MODE getExecutionMode(): Deprecated.
- double getExecutionTime(): Determine the execution time of the previous Kernel.execute(range) called from the last thread that ran and executed on the most recently used device.
- int[] getKernelCompileWorkGroupSize(Device device): Retrieves the specified work-group size in the compiled kernel for the specified device or intermediate language for the device.
- long getKernelLocalMemSizeInUse(Device device): Retrieves the amount of local memory used in the specified device by this kernel instance.
- int getKernelMaxWorkGroupSize(Device device): Retrieves the maximum work-group size allowed for this kernel when running on the specified device.
- long getKernelMinimumPrivateMemSizeInUsePerWorkItem(Device device): Retrieves the minimum private memory in use per work item for this kernel instance and the specified device.
- int getKernelPreferredWorkGroupSizeMultiple(Device device): Retrieves the preferred work-group size multiple in the specified device for this kernel instance.
- Kernel.KernelState getKernelState()
- static String getMappedMethodName(ClassModel.ConstantPool.MethodReferenceEntry _methodReferenceEntry)
- getProfileReportCurrentThread(Device device): Retrieves the most recent complete report available for the current thread calling this method for the current kernel instance and executed on the given device.
- getProfileReportLastThread(Device device): Retrieves a profile report for the last thread that executed this kernel on the given device.
- final Device getTargetDevice()
- boolean hasFallbackAlgorithm(): False by default.
- boolean isAllowDevice(Device _device)
- boolean isAutoCleanUpArrays()
- boolean isExecuting()
- boolean isExplicit(): For dev purposes (we should remove this for production) determine whether this Kernel uses explicit memory management.
- static boolean isMappedMethod(ClassModel.ConstantPool.MethodReferenceEntry methodReferenceEntry)
- static boolean isOpenCLDelegateMethod(ClassModel.ConstantPool.MethodReferenceEntry methodReferenceEntry)
- boolean isRunningCL()
- Kernel put(...): Tag this array so that it is explicitly enqueued before the kernel is executed; overloads exist for boolean[], byte[], char[], double[], float[], int[] and long[] arrays of one, two and three dimensions.
- void registerProfileReportObserver(IProfileReportObserver observer): Registers a new profile report observer to receive profile reports as they're produced.
- abstract void run(): The entry point of a kernel.
- void setAutoCleanUpArrays(boolean autoCleanUpArrays): Property which, if true, enables automatic calling of cleanUpArrays() following each execution.
- void setExecutionMode(Kernel.EXECUTION_MODE _executionMode): Deprecated.
- void setExecutionModeWithoutFallback(Kernel.EXECUTION_MODE _executionMode)
- void setExplicit(boolean _explicit): For dev purposes (we should remove this for production) allow us to define that this Kernel uses explicit memory management.
- void setFallbackExecutionMode(): Deprecated.
- String toString()
- static boolean usesAtomic32(ClassModel.ConstantPool.MethodReferenceEntry methodReferenceEntry)
- static boolean usesAtomic64(ClassModel.ConstantPool.MethodReferenceEntry methodReferenceEntry)
-
Field Details
-
LOCAL_SUFFIX
We can use this suffix to 'tag' intended local buffers. So either name the buffer
int[] buffer_$local$ = new int[1024];
or use the Annotation form
@Local int[] buffer = new int[1024];
- See Also:
-
CONSTANT_SUFFIX
We can use this suffix to 'tag' intended constant buffers. So either name the buffer
int[] buffer_$constant$ = new int[1024];
or use the Annotation form
@Constant int[] buffer = new int[1024];
- See Also:
-
PRIVATE_SUFFIX
We can use this suffix to 'tag' __private buffers. So either name the buffer
int[] buffer_$private$32 = new int[32];
or use the Annotation form
@PrivateMemorySpace(32) int[] buffer = new int[32];
- See Also:
-
-
Constructor Details
-
Kernel
public Kernel()
-
-
Method Details
-
run
public abstract void run()
The entry point of a kernel. Every kernel must override this method.
-
hasFallbackAlgorithm
public boolean hasFallbackAlgorithm()
False by default. In the event that all preferred devices fail to execute a kernel, it is possible to supply an alternate (possibly non-parallel) execution algorithm by overriding this method to return true, and overriding executeFallbackAlgorithm(Range, int) with the alternate algorithm. -
executeFallbackAlgorithm
If hasFallbackAlgorithm() has been overridden to return true, this method should be overridden so as to apply a single pass of the kernel's logic to the entire _range.
This is not normally required, as fallback to JavaDevice.THREAD_POOL will implement the algorithm in parallel. However, in the event that thread pool execution may be prohibitively slow, this method might implement a "quick and dirty" approximation to the desired result (for example, a simple box blur as opposed to a Gaussian blur in an image processing application). -
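The contract described above can be sketched in plain Java. The class names below are hypothetical (not Aparapi internals): a subclass opts in by returning true from hasFallbackAlgorithm() and supplies a cheaper single-pass algorithm that runs only after the preferred path fails.

```java
// Hypothetical sketch of the fallback contract (plain Java, not Aparapi code):
// after the preferred execution path fails, the runner consults
// hasFallbackAlgorithm() and, if true, applies one pass over the whole range.
public class FallbackSketch {
    static abstract class Task {
        boolean hasFallbackAlgorithm() { return false; }

        void executeFallbackAlgorithm(int rangeSize, int passId) {
            throw new UnsupportedOperationException("no fallback supplied");
        }

        abstract void runPreferred(int rangeSize) throws Exception;

        final void execute(int rangeSize) {
            try {
                runPreferred(rangeSize);
            } catch (Exception deviceFailure) {
                if (hasFallbackAlgorithm()) {
                    // single pass of the task's logic over the entire range
                    executeFallbackAlgorithm(rangeSize, 0);
                } else {
                    throw new RuntimeException(deviceFailure);
                }
            }
        }
    }

    // A task whose preferred path always fails; the fallback stands in for a
    // "quick and dirty" approximation of the real algorithm.
    static class FailingTask extends Task {
        boolean fallbackRan = false;

        @Override boolean hasFallbackAlgorithm() { return true; }

        @Override void executeFallbackAlgorithm(int rangeSize, int passId) {
            fallbackRan = true;
        }

        @Override void runPreferred(int rangeSize) throws Exception {
            throw new Exception("all preferred devices failed");
        }
    }
}
```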
cancelMultiPass
public void cancelMultiPass()
Invoking this method flags that once the current pass is complete, execution should be abandoned. Due to the complexity of intercommunication between Java (or C) and executing OpenCL, this is the best we can do for general cancellation of execution at present. OpenCL 2.0 should introduce pipe mechanisms which will support mid-pass cancellation easily.
Note that in the case of thread-pool/pure Java execution we could already do better, using Thread.interrupt() (and/or other means) to abandon execution mid-pass. However, at present this is not attempted.
- See Also:
-
getCancelState
public int getCancelState() -
getCurrentPass
public int getCurrentPass()
- See Also:
-
isExecuting
public boolean isExecuting()
- See Also:
-
clone
When using a Java Thread Pool, Aparapi uses clone to copy the initial instance to each thread.
If you choose to override clone() you are responsible for delegating to super.clone();
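The delegation requirement can be illustrated with ordinary Cloneable Java. This is a generic sketch with a hypothetical class name, not Aparapi internals:

```java
// A clone() override that delegates to super.clone(), as the contract requires.
public class CloneSketch implements Cloneable {
    int[] buffer = new int[16];

    @Override
    public CloneSketch clone() {
        try {
            // Delegate to super.clone() first, then deep-copy mutable state
            // so each thread's copy owns its own buffer.
            CloneSketch copy = (CloneSketch) super.clone();
            copy.buffer = buffer.clone();
            return copy;
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // unreachable: we implement Cloneable
        }
    }
}
```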
-
getKernelState
-
registerProfileReportObserver
Registers a new profile report observer to receive profile reports as they're produced. This is the recommended method when the client application wants a single observer to receive all the execution profiles for the current kernel instance, across all devices and all client threads running the kernel.
Note1: A report will be generated by a thread that finishes executing a kernel. In multithreaded execution environments it is up to the observer implementation to handle thread safety.
Note2: To cancel the report subscription, just set the observer to null.
- Parameters:
observer - the observer instance that will receive the profile reports
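The observer contract (reports delivered by whichever thread finishes a run; null cancels the subscription) can be sketched generically. The class and interface names below are hypothetical, not Aparapi's IProfileReportObserver API:

```java
// Generic sketch of the observer contract: register to receive reports from
// whichever thread finishes a pass; registering null cancels the subscription.
public class ObserverSketch {
    interface ReportObserver {
        void receivedReport(String report);
    }

    private volatile ReportObserver observer; // volatile: set/read across threads

    void registerObserver(ReportObserver o) {
        observer = o; // null cancels
    }

    void finishPass(String report) {
        ReportObserver o = observer;
        if (o != null) {
            o.receivedReport(report); // called on the finishing thread
        }
    }
}
```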
-
getProfileReportLastThread
Retrieves a profile report for the last thread that executed this kernel on the given device. A report will only be available if at least one thread executed the kernel on the device.
Note1: If the profile report is intended to be kept in memory, the object should be cloned with ProfileReport.clone().
- Parameters:
device - the relevant device where the kernel executed
- Returns:
- the profiling report for the current most recent execution
- null, if no profiling report is available for such thread
- See Also:
-
getProfileReportCurrentThread
Retrieves the most recent complete report available for the current thread calling this method for the current kernel instance and executed on the given device.
Note1: If the profile report is intended to be kept in memory, the object should be cloned with ProfileReport.clone().
Note2: If the thread didn't execute this kernel on the specified device, it will return null.
- Parameters:
device - the relevant device where the kernel executed
- Returns:
- the profiling report for the current most recent execution
- null, if no profiling report is available for such thread
- See Also:
-
getExecutionTime
public double getExecutionTime()
Determine the execution time of the previous Kernel.execute(range) called from the last thread that ran and executed on the most recently used device.
Note1: This is kept for backwards compatibility only; usage of either getProfileReportLastThread(Device) or registerProfileReportObserver(IProfileReportObserver) is encouraged instead.
Note2: Calling this method is not recommended when using more than a single thread to execute the same kernel, or when running kernels on more than one device concurrently.
Note that for the first call this will include the conversion time.
- Returns:
- The time spent executing the kernel (ms)
- NaN, if no profile report is available
- See Also:
-
getConversionTime
public double getConversionTime()
Determine the time taken to convert bytecode to OpenCL for the first Kernel.execute(range) call.
Note1: This is kept for backwards compatibility only; usage of either getProfileReportLastThread(Device) or registerProfileReportObserver(IProfileReportObserver) is encouraged instead.
Note2: Calling this method is not recommended when using more than a single thread to execute the same kernel, or when running kernels on more than one device concurrently.
Note that for the first call this will include the conversion time.
- Returns:
- The time spent preparing the kernel for execution using GPU
- NaN, if no profile report is available
- See Also:
-
getAccumulatedExecutionTimeCurrentThread
Determine the total execution time of all previous kernel executions called from the current thread, calling this method, that executed the current kernel on the specified device.
Note1: This is the recommended method to retrieve the accumulated execution time for a single current thread, even when doing multithreading for the same kernel and device.
Note that this will include the initial conversion time.
- Parameters:
device - the device of interest where the kernel executed
- Returns:
- The total time spent executing the kernel (ms)
- NaN, if no profiling information is available
- See Also:
-
getAccumulatedExecutionTimeAllThreads
Determine the total execution time of all produced profile reports from all threads that executed the current kernel on the specified device.
Note1: This is the recommended method to retrieve the accumulated execution time, even when doing multithreading for the same kernel and device.
Note that this will include the initial conversion time.
- Parameters:
device - the device of interest where the kernel executed
- Returns:
- The total time spent executing the kernel (ms)
- NaN, if no profiling information is available
- See Also:
-
getAccumulatedExecutionTime
public double getAccumulatedExecutionTime()
Determine the total execution time of all previous Kernel.execute(range) calls for all threads that ran this kernel for the device used in the last kernel execution.
Note1: This is kept for backwards compatibility only; usage of getAccumulatedExecutionTimeAllThreads(Device) is encouraged instead.
Note2: Calling this method is not recommended when using more than a single thread to execute the same kernel on multiple devices concurrently.
Note that this will include the initial conversion time.
- Returns:
- The total time spent executing the kernel (ms)
- NaN, if no profiling information is available
- See Also:
-
execute
Start execution of _range kernels.
When kernel.execute(globalSize) is invoked, Aparapi will schedule the execution of globalSize kernels. If the execution mode is GPU, then the kernels will execute as OpenCL code on the GPU device. Otherwise, if the mode is JTP, the kernels will execute as a pool of Java threads on the CPU.
- Parameters:
_range
- The number of Kernels that we would like to initiate.
-
toString
-
execute
Start execution of _range kernels.
When kernel.execute(_range) is invoked, Aparapi will schedule the execution of _range kernels. If the execution mode is GPU, then the kernels will execute as OpenCL code on the GPU device. Otherwise, if the mode is JTP, the kernels will execute as a pool of Java threads on the CPU.
Since the addition of the new Range class, this method offers backward compatibility and merely defers to return (execute(Range.create(_range), 1));.
- Parameters:
_range
- The number of Kernels that we would like to initiate.
-
execute
Start execution of _passes iterations of _range kernels.
When kernel.execute(_range, _passes) is invoked, Aparapi will schedule the execution of _range kernels. If the execution mode is GPU, then the kernels will execute as OpenCL code on the GPU device. Otherwise, if the mode is JTP, the kernels will execute as a pool of Java threads on the CPU.
- Parameters:
_passes - The number of passes to make
- Returns:
- The Kernel instance (this) so we can chain calls to put(arr).execute(range).get(arr)
-
execute
Start execution of _passes iterations over the _range of kernels.
When kernel.execute(_range) is invoked, Aparapi will schedule the execution of _range kernels. If the execution mode is GPU, then the kernels will execute as OpenCL code on the GPU device. Otherwise, if the mode is JTP, the kernels will execute as a pool of Java threads on the CPU.
Since the addition of the new Range class, this method offers backward compatibility and merely defers to return (execute(Range.create(_range), 1));.
- Parameters:
_range
- The number of Kernels that we would like to initiate.
-
execute
Start execution of globalSize kernels for the given entrypoint.
When kernel.execute("entrypoint", globalSize) is invoked, Aparapi will schedule the execution of globalSize kernels. If the execution mode is GPU, then the kernels will execute as OpenCL code on the GPU device. Otherwise, if the mode is JTP, the kernels will execute as a pool of Java threads on the CPU.
- Parameters:
_entrypoint - the name of the method we wish to use as the entrypoint to the kernel
- Returns:
- The Kernel instance (this) so we can chain calls to put(arr).execute(range).get(arr)
-
execute
Start execution of globalSize kernels for the given entrypoint.
When kernel.execute("entrypoint", globalSize) is invoked, Aparapi will schedule the execution of globalSize kernels. If the execution mode is GPU, then the kernels will execute as OpenCL code on the GPU device. Otherwise, if the mode is JTP, the kernels will execute as a pool of Java threads on the CPU.
- Parameters:
_entrypoint - the name of the method we wish to use as the entrypoint to the kernel
- Returns:
- The Kernel instance (this) so we can chain calls to put(arr).execute(range).get(arr)
-
compile
Force pre-compilation of the kernel for a given device, without executing it.
- Parameters:
_device - the device for which the kernel is to be compiled
- Returns:
- the Kernel instance (this) so we can chain calls
- Throws:
CompileFailedException
- if compilation failed for some reason
-
compile
Force pre-compilation of the kernel for a given device, without executing it.
- Parameters:
_entrypoint - the name of the method we wish to use as the entrypoint to the kernel
_device - the device for which the kernel is to be compiled
- Returns:
- the Kernel instance (this) so we can chain calls
- Throws:
CompileFailedException
- if compilation failed for some reason
-
getKernelMinimumPrivateMemSizeInUsePerWorkItem
public long getKernelMinimumPrivateMemSizeInUsePerWorkItem(Device device) throws QueryFailedException
Retrieves the minimum private memory in use per work item for this kernel instance and the specified device.
- Parameters:
device - the device where the kernel is intended to run
- Returns:
- the number of bytes used per work item
- Throws:
QueryFailedException
- if the query couldn't complete
-
getKernelLocalMemSizeInUse
Retrieves the amount of local memory used in the specified device by this kernel instance.
- Parameters:
device - the device where the kernel is intended to run
- Returns:
- the number of bytes of local memory in use for the specified device and current kernel
- Throws:
QueryFailedException
- if the query couldn't complete
-
getKernelPreferredWorkGroupSizeMultiple
Retrieves the preferred work-group size multiple in the specified device for this kernel instance.
- Parameters:
device - the device where the kernel is intended to run
- Returns:
- the preferred work group multiple
- Throws:
QueryFailedException
- if the query couldn't complete
-
getKernelMaxWorkGroupSize
Retrieves the maximum work-group size allowed for this kernel when running on the specified device.
- Parameters:
device - the device where the kernel is intended to run
- Returns:
- the maximum work-group size allowed for this kernel on the specified device
- Throws:
QueryFailedException
- if the query couldn't complete
-
getKernelCompileWorkGroupSize
Retrieves the specified work-group size in the compiled kernel for the specified device or intermediate language for the device.
- Parameters:
device - the device where the kernel is intended to run
- Returns:
- the work-group size specified when the kernel was compiled for the device
- Throws:
QueryFailedException
- if the query couldn't complete
-
isAutoCleanUpArrays
public boolean isAutoCleanUpArrays() -
setAutoCleanUpArrays
public void setAutoCleanUpArrays(boolean autoCleanUpArrays)
Property which, if true, enables automatic calling of cleanUpArrays() following each execution. -
cleanUpArrays
public void cleanUpArrays()
Frees the bulk of the resources used by this kernel, by setting array sizes in non-primitive KernelArgs to 1 (0 size is prohibited) and invoking kernel execution on a zero size range. Unlike dispose(), this does not prohibit further invocations of this kernel, as sundry resources such as OpenCL queues are not freed by this method.
This allows a "dormant" Kernel to remain in existence without undue strain on GPU resources, which may be strongly preferable to disposing a Kernel and recreating another one later, as creation/use of a new Kernel (specifically creation of its associated OpenCL context) is expensive.
Note that where the underlying array field is declared final, for obvious reasons it is not resized to zero.
-
dispose
public void dispose()
Release any resources associated with this Kernel.
When the execution mode is CPU or GPU, Aparapi stores some OpenCL resources in a data structure associated with the kernel instance. The dispose() method must be called to release these resources.
If execute(int _globalSize) is called after dispose() is called, the results are undefined. -
isRunningCL
public boolean isRunningCL() -
getTargetDevice
-
isAllowDevice
- Returns:
- true by default; may be overridden to allow vetoing of a device or devices by a given Kernel instance.
-
getExecutionMode
Deprecated. See Kernel.EXECUTION_MODE
Return the current execution mode. Before a Kernel executes, this return value will be the execution mode as determined by the setting of the EXECUTION_MODE enumeration. By default, this setting is either GPU if OpenCL is available on the target system, or JTP otherwise. This default setting can be changed by calling setExecutionMode().
After a Kernel executes, the return value will be the mode in which the Kernel actually executed.
- Returns:
- The current execution mode.
- See Also:
-
setExecutionMode
Deprecated. See Kernel.EXECUTION_MODE
Set the execution mode.
This should be regarded as a request. The real mode will be determined at runtime based on the availability of OpenCL and the characteristics of the workload.
- Parameters:
_executionMode - the requested execution mode.
- See Also:
-
setExecutionModeWithoutFallback
-
setFallbackExecutionMode
Deprecated. -
getMappedMethodName
public static String getMappedMethodName(ClassModel.ConstantPool.MethodReferenceEntry _methodReferenceEntry) -
isMappedMethod
public static boolean isMappedMethod(ClassModel.ConstantPool.MethodReferenceEntry methodReferenceEntry) -
isOpenCLDelegateMethod
public static boolean isOpenCLDelegateMethod(ClassModel.ConstantPool.MethodReferenceEntry methodReferenceEntry) -
usesAtomic32
public static boolean usesAtomic32(ClassModel.ConstantPool.MethodReferenceEntry methodReferenceEntry) -
usesAtomic64
public static boolean usesAtomic64(ClassModel.ConstantPool.MethodReferenceEntry methodReferenceEntry) -
setExplicit
public void setExplicit(boolean _explicit)
For dev purposes (we should remove this for production) allow us to define that this Kernel uses explicit memory management.
- Parameters:
_explicit
- (true if we want explicit memory management)
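The fluent put(arr).execute(range).get(arr) chaining used with explicit memory management can be sketched generically. The class below is a hypothetical stand-in, not Aparapi's implementation; the real methods enqueue OpenCL buffer transfers:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the explicit transfer pattern: put() tags host arrays
// for copying before execution, execute() flushes them, and get() stands in
// for blocking until results are copied back. Each call returns 'this' so that
// put(arr).execute(range).get(arr) chains in the 'fluent' style.
public class ExplicitTransferSketch {
    private final Set<int[]> pendingWrites = new HashSet<>();

    ExplicitTransferSketch put(int[] array) {
        pendingWrites.add(array); // tag: enqueue host-to-device copy
        return this;
    }

    ExplicitTransferSketch execute(int range) {
        pendingWrites.clear();    // stand-in for flushing buffers and running
        return this;
    }

    ExplicitTransferSketch get(int[] array) {
        return this;              // stand-in for blocking device-to-host copy
    }

    int pendingCount() { return pendingWrites.size(); }
}
```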
-
isExplicit
public boolean isExplicit()
For dev purposes (we should remove this for production) determine whether this Kernel uses explicit memory management.
- Returns:
- (true if the kernel is using explicit memory management)
-
put
Tag this array so that it is explicitly enqueued before the kernel is executed. Identical overloads exist for boolean[], byte[], char[], double[], float[], int[] and long[] arrays of one, two and three dimensions.
- Parameters:
array
- Returns:
- This kernel so that we can use the 'fluent' style API
-
get
Enqueue a request to return this buffer from the GPU. This method blocks until the array is available.
- Parameters:
  - array
- Returns:
  - This kernel, so that we can use the 'fluent' style API
-
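Putting put and get together, a typical explicit-mode round trip might be sketched as follows (kernel body and names are illustrative): the buffer is sent to the device once, the kernel executes many times with no intermediate transfers, and get() blocks until the final results are back on the host.

```java
import com.aparapi.Kernel;
import com.aparapi.Range;

public class ExplicitGetExample {
    public static void main(String[] args) {
        final int[] data = new int[1024];

        Kernel kernel = new Kernel() {
            @Override
            public void run() {
                int gid = getGlobalId();
                data[gid] = data[gid] + 1;
            }
        };

        kernel.setExplicit(true);
        kernel.put(data); // transfer to the device once

        Range range = Range.create(data.length);
        for (int pass = 0; pass < 100; pass++) {
            kernel.execute(range); // no host/device transfers between passes
        }

        kernel.get(data); // blocks until 'data' is available on the host
    }
}
```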
getProfileInfo
Get the profiling information from the last successful call to Kernel.execute().
- Returns:
  - A list of ProfileInfo records
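A minimal sketch of inspecting the profiling records (assuming profiling has been enabled up front, e.g. via the Aparapi enableProfiling system property; we rely only on each record's toString() rather than any particular accessor):

```java
import com.aparapi.Kernel;
import com.aparapi.ProfileInfo;
import com.aparapi.Range;

import java.util.List;

public class ProfileExample {
    public static void main(String[] args) {
        final int[] data = new int[1024];
        Kernel kernel = new Kernel() {
            @Override
            public void run() {
                data[getGlobalId()]++;
            }
        };
        kernel.execute(Range.create(data.length));

        // One ProfileInfo record per profiled stage of the last execute().
        List<ProfileInfo> profile = kernel.getProfileInfo();
        if (profile != null) {
            for (ProfileInfo info : profile) {
                System.out.println(info);
            }
        }
    }
}
```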
-
addExecutionModes
Deprecated. See Kernel.EXECUTION_MODE.
Sets the possible fallback path for execution modes. For example, addExecutionModes(GPU, CPU, JTP) will try to use the GPU; if that fails, it will fall back to OpenCL CPU, and finally it will try JTP.
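The fallback path described above might be set up as in this sketch (the kernel body is illustrative; note this API is deprecated in favour of the KernelManager device-preference mechanism):

```java
import com.aparapi.Kernel;
import com.aparapi.Range;

public class FallbackExample {
    public static void main(String[] args) {
        final int[] data = new int[64];
        Kernel kernel = new Kernel() {
            @Override
            public void run() {
                data[getGlobalId()] = getGlobalId();
            }
        };

        // Deprecated fallback-path API: try the GPU first, then OpenCL CPU,
        // then the Java Thread Pool.
        kernel.addExecutionModes(Kernel.EXECUTION_MODE.GPU,
                                 Kernel.EXECUTION_MODE.CPU,
                                 Kernel.EXECUTION_MODE.JTP);

        kernel.execute(Range.create(data.length));
    }
}
```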
-
hasNextExecutionMode
Deprecated.
- Returns:
  - whether there is another execution path we can try
-
tryNextExecutionMode
Deprecated. See Kernel.EXECUTION_MODE.
Try the next execution path in the list; if there aren't any more, give up.
-
invalidateCaches
public static void invalidateCaches()
-
EXECUTION_MODE
Deprecated. It is no longer recommended that EXECUTION_MODEs are used, as a more sophisticated Device preference mechanism is in place; see KernelManager.