Class CuDNNFunctionOptimizations
- java.lang.Object
  - org.nd4j.autodiff.samediff.optimize.optimizations.BaseOptimizerSet
    - org.nd4j.autodiff.samediff.optimize.optimizations.CuDNNFunctionOptimizations
- All Implemented Interfaces:
OptimizerSet
public class CuDNNFunctionOptimizations extends BaseOptimizerSet
Nested Class Summary
Nested Classes
Modifier and Type: static class
Class: CuDNNFunctionOptimizations.CudnnConv2dNCHWtoNHWCConversion
Description: For Tensor Cores we want NHWC layout. See Section 7.3.1 of https://docs.nvidia.com/deeplearning/sdk/dl-performance-guide/index.html#tensor-layout: "Layout choice has an effect on performance, as convolutions implemented for Tensor Cores require NHWC layout and are fastest when input tensors are laid out in NHWC." "To maximize performance, we recommend using NHWC tensor layout." As for the weights format, the cuDNN docs are vague, but TensorFlow (TF) uses NCHW+OIHW or NHWC+OHWI.
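The layout pairs named above (NCHW+OIHW and NHWC+OHWI) differ only in where the channel axis sits: both conversions move the second axis to the end, i.e. the axis permutation (0, 2, 3, 1). The following is a minimal, library-free sketch of that reordering on flat row-major buffers. The class `LayoutSketch` and its method names are hypothetical illustrations and are not part of ND4J or of this optimizer's actual implementation.

```java
import java.util.Arrays;

// Hypothetical helper, not part of ND4J: illustrates the (0, 2, 3, 1) axis
// permutation that maps NCHW -> NHWC (activations) and OIHW -> OHWI (weights).
final class LayoutSketch {

    /** Reorders a flat row-major 4D buffer from axes (d0, d1, d2, d3) to (d0, d2, d3, d1). */
    static float[] permute0231(float[] src, int d0, int d1, int d2, int d3) {
        if (src.length != d0 * d1 * d2 * d3) {
            throw new IllegalArgumentException("Buffer length does not match the given shape");
        }
        float[] dst = new float[src.length];
        for (int a = 0; a < d0; a++) {
            for (int b = 0; b < d1; b++) {
                for (int c = 0; c < d2; c++) {
                    for (int d = 0; d < d3; d++) {
                        int from = ((a * d1 + b) * d2 + c) * d3 + d; // offset in (d0, d1, d2, d3)
                        int to = ((a * d2 + c) * d3 + d) * d1 + b;   // offset in (d0, d2, d3, d1)
                        dst[to] = src[from];
                    }
                }
            }
        }
        return dst;
    }

    /** Activations: [N, C, H, W] -> [N, H, W, C]. */
    static float[] nchwToNhwc(float[] x, int n, int c, int h, int w) {
        return permute0231(x, n, c, h, w);
    }

    /** Weights: [O, I, kH, kW] -> [O, kH, kW, I]. */
    static float[] oihwToOhwi(float[] weights, int o, int i, int kh, int kw) {
        return permute0231(weights, o, i, kh, kw);
    }

    public static void main(String[] args) {
        // One image, two channels, 2x2 spatial size; channel 1 values are offset by 10.
        float[] nchw = {0, 1, 2, 3, 10, 11, 12, 13};
        float[] nhwc = nchwToNhwc(nchw, 1, 2, 2, 2);
        // prints [0.0, 10.0, 1.0, 11.0, 2.0, 12.0, 3.0, 13.0]
        System.out.println(Arrays.toString(nhwc));
    }
}
```

In NHWC the channel values for each pixel end up adjacent in memory, which is the arrangement the quoted guide recommends for Tensor Core convolutions. With ND4J arrays the same reordering would normally be expressed as a permute of the array's axes rather than a manual copy.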
Field Summary
Fields
Modifier and Type: protected static boolean
Field: isCudaBackend
Constructor Summary
Constructors
Constructor: CuDNNFunctionOptimizations()