static class |
CuDNNFunctionOptimizations.CudnnConv2dNCHWtoNHWCConversion
https://docs.nvidia.com/deeplearning/sdk/dl-performance-guide/index.html#tensor-layout
For Tensor Cores we want NHWC layout, per Section 7.3.1 of the guide linked above:
"Layout choice has an effect on performance, as convolutions implemented for Tensor Cores require NHWC layout and are fastest when input tensors are laid out in NHWC."
"To maximize performance, we recommend using NHWC tensor layout."
As for the weights format, the cuDNN docs are vague, but TensorFlow (TF) pairs NCHW activations with OIHW weights, or NHWC activations with OHWI weights.
|
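To make the layout change concrete, here is a minimal sketch of what an NCHW to NHWC reordering means for a flat, row-major buffer. The class `LayoutSketch` and its methods are hypothetical names introduced only for illustration. They are not part of this library or of cuDNN, and this is not necessarily how the optimization itself performs the conversion. The same permutation turns OIHW weights into OHWI weights, with (O, I) taking the place of (N, C).

```java
/** Hypothetical helper, for illustration only; not part of the library or cuDNN. */
public final class LayoutSketch {
    private LayoutSketch() {}

    /**
     * Copies a flat row-major NCHW buffer into a new flat row-major NHWC buffer.
     * n = batch size, c = channels, h = height, w = width.
     */
    public static float[] nchwToNhwc(float[] src, int n, int c, int h, int w) {
        if (src.length != n * c * h * w) {
            throw new IllegalArgumentException("Buffer length does not match n*c*h*w");
        }
        float[] dst = new float[src.length];
        for (int in = 0; in < n; in++) {
            for (int ic = 0; ic < c; ic++) {
                for (int ih = 0; ih < h; ih++) {
                    for (int iw = 0; iw < w; iw++) {
                        // Offset in NCHW: channel varies slower than the spatial dimensions.
                        int from = ((in * c + ic) * h + ih) * w + iw;
                        // Offset in NHWC: channel is the fastest-varying dimension.
                        int to = ((in * h + ih) * w + iw) * c + ic;
                        dst[to] = src[from];
                    }
                }
            }
        }
        return dst;
    }

    /** OIHW to OHWI is the same permutation, with (O, I) in place of (N, C). */
    public static float[] oihwToOhwi(float[] src, int o, int i, int h, int w) {
        return nchwToNhwc(src, o, i, h, w);
    }
}
```

For a single 1x2 image with two channels, the NCHW buffer `{1, 2, 3, 4}` (channel 0 holds 1 and 2, channel 1 holds 3 and 4) becomes `{1, 3, 2, 4}` in NHWC, since the two channel values of each pixel now sit next to each other.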