Blame Arnon

Matching SM architectures (CUDA arch and CUDA gencode) for various NVIDIA cards

I’ve seen some confusion regarding NVIDIA’s nvcc sm flags and what they’re used for:
When compiling with NVCC, the arch flag (‘-arch’) specifies the name of the NVIDIA GPU architecture that the CUDA files will be compiled for.
The gencode flag (‘-gencode’) lets you generate code for additional architectures, and can be repeated once for each architecture you want to target.

When should different ‘gencodes’ or ‘cuda arch’ be used?

When you compile CUDA code, you should always pass exactly one ‘-arch’ flag, matching the GPU cards you use most. This avoids a delay at startup, because the GPU code is generated at compile time rather than at run time.
If none of your flags produces binary code for the GPU the program actually runs on, the CUDA driver’s JIT compiler generates the GPU code at run time from the PTX embedded in the binary, provided the binary contains PTX at all.

Sometimes you also want the same binary to run on other GPU generations. In those cases, add more ‘-gencode’ flags, one per architecture.

First, find out which GPU and which CUDA version you have.
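For the CUDA version half of that check, a minimal sketch: the helper below pulls the release number out of the text that `nvcc --version` prints. The function names `parse_cuda_release` and `installed_cuda_release` are hypothetical, the sample string is an illustrative banner line rather than captured output, and the sketch assumes the banner contains the wording `release X.Y`.

```python
import re
import subprocess


def parse_cuda_release(version_text):
    """Return (major, minor) from `nvcc --version` text, or None if absent."""
    match = re.search(r"release\s+(\d+)\.(\d+)", version_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_cuda_release():
    """Run `nvcc --version` and parse it; returns None if nvcc cannot be run."""
    try:
        result = subprocess.run(
            ["nvcc", "--version"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_cuda_release(result.stdout)


# Illustrative banner line (an assumption about the usual wording).
sample = "Cuda compilation tools, release 9.0, V9.0.176"
print(parse_cuda_release(sample))
```

For the GPU half, `nvidia-smi -L` lists the installed cards by name, which you can then look up in the list below.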

Supported SM and Gencode variations

Below are the supported SM variations, with sample cards from each generation.

Supported on CUDA 7 and later

  • Fermi (CUDA 3.2 and later, deprecated in CUDA 8 and removed in CUDA 9):
    • SM20 or SM_20, compute_20 – Older cards such as GeForce 400, 500, 600, GT-630
  • Kepler (CUDA 5 and later):
    • SM30 or SM_30, compute_30 – Kepler architecture (generic – Tesla K40/K80, GeForce 700, GT-730)
      Adds support for unified memory programming
    • SM35 or SM_35, compute_35 – More specific Tesla K40
      Adds support for dynamic parallelism. Shows no real benefit over SM30 in my experience.
    • SM37 or SM_37, compute_37 – More specific Tesla K80
      Adds a few more registers. Shows no real benefit over SM30 in my experience.
  • Maxwell (CUDA 6 and later):
    • SM50 or SM_50, compute_50 – Tesla/Quadro M series
    • SM52 or SM_52, compute_52 – Quadro M6000, GeForce 900, GTX-970, GTX-980, GTX Titan X
    • SM53 or SM_53, compute_53 – Tegra (Jetson) TX1 / Tegra X1
  • Pascal (CUDA 8 and later):
    • SM60 or SM_60, compute_60 – GP100/Tesla P100 – DGX-1 (Generic Pascal)
    • SM61 or SM_61, compute_61 – GTX 1080, GTX 1070, GTX 1060, GTX 1050, GTX 1030, Titan Xp, Tesla P40, Tesla P4
    • SM62 or SM_62, compute_62 – Drive-PX2, Tegra (Jetson) TX2, Denver-based GPU
  • Volta (CUDA 9 and later):
    • SM70 or SM_70, compute_70 – Tesla V100
    • SM71 or SM_71, compute_71 – probably not implemented
    • SM72 or SM_72, compute_72 – currently unknown
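
The list above can be turned into a small lookup. This is only a sketch transcribed from the list: `SM_TABLE` and `sms_for_cuda` are hypothetical names, Fermi's first release (CUDA 3.2) is rounded down to major version 3, Fermi is treated as unavailable from CUDA 9 on (matching the CUDA 9 sample flags further down, which drop SM_20), and SM71/SM72 are left out because their status is not known here.

```python
# (sm, first CUDA major version, CUDA major version that dropped it or None).
# Values are transcribed from the list above, not from NVIDIA documentation.
SM_TABLE = [
    (20, 3, 9),     # Fermi
    (30, 5, None),  # Kepler
    (35, 5, None),
    (37, 5, None),
    (50, 6, None),  # Maxwell
    (52, 6, None),
    (53, 6, None),
    (60, 8, None),  # Pascal
    (61, 8, None),
    (62, 8, None),
    (70, 9, None),  # Volta
]


def sms_for_cuda(cuda_major):
    """Return the SM numbers the list above associates with a CUDA major version."""
    return [
        sm
        for sm, first, dropped in SM_TABLE
        if first <= cuda_major and (dropped is None or cuda_major < dropped)
    ]


print(sms_for_cuda(8))
print(sms_for_cuda(9))
```

With this table, CUDA 8 still includes SM 20 but not SM 70, while CUDA 9 is the reverse.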

Sample flags

According to NVIDIA:

The arch= clause of the -gencode= command-line option to nvcc specifies the front-end compilation target and must always be a PTX version. The code= clause specifies the back-end compilation target and can either be cubin or PTX or both. Only the back-end target version(s) specified by the code= clause will be retained in the resulting binary; at least one must be PTX to provide Volta compatibility.
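
The rule quoted above can be sketched as a small helper. This is an illustration, not part of nvcc: `gencode_flags` is a hypothetical name. It emits one flag with a binary target (`code=sm_XX`) per architecture, then one final flag with a PTX target (`code=compute_XX`) for the newest architecture, on the assumption that the newest PTX is the one you want retained.

```python
def gencode_flags(sm_versions):
    """Build -gencode flags: binary code per SM, plus PTX for the newest one."""
    targets = sorted(set(sm_versions))
    if not targets:
        raise ValueError("need at least one SM version")
    # One binary (cubin) target per requested architecture.
    flags = [f"-gencode=arch=compute_{sm},code=sm_{sm}" for sm in targets]
    newest = targets[-1]
    # code=compute_XX keeps PTX in the binary, so the driver can JIT-compile
    # it for GPUs newer than any SM listed here.
    flags.append(f"-gencode=arch=compute_{newest},code=compute_{newest}")
    return flags


# Print the flags in the same backslash-continued layout used in this post.
print(" \\\n ".join(gencode_flags([20, 30, 50, 52])))
```

Duplicates are removed and the input is sorted, so the PTX flag always follows the binary flag for the highest SM number passed in.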

Sample flags for generation on CUDA 7 for maximum compatibility:

-arch=sm_30 \
 -gencode=arch=compute_20,code=sm_20 \
 -gencode=arch=compute_30,code=sm_30 \
 -gencode=arch=compute_50,code=sm_50 \
 -gencode=arch=compute_52,code=sm_52 \
 -gencode=arch=compute_52,code=compute_52

Sample flags for generation on CUDA 8 for maximum compatibility:

-arch=sm_30 \
 -gencode=arch=compute_20,code=sm_20 \
 -gencode=arch=compute_30,code=sm_30 \
 -gencode=arch=compute_50,code=sm_50 \
 -gencode=arch=compute_52,code=sm_52 \
 -gencode=arch=compute_60,code=sm_60 \
 -gencode=arch=compute_61,code=sm_61 \
 -gencode=arch=compute_61,code=compute_61

Sample flags for generation on CUDA 9 for maximum compatibility. Note the removed SM_20:

-arch=sm_30 \
 -gencode=arch=compute_30,code=sm_30 \
 -gencode=arch=compute_50,code=sm_50 \
 -gencode=arch=compute_52,code=sm_52 \
 -gencode=arch=compute_60,code=sm_60 \
 -gencode=arch=compute_61,code=sm_61 \
 -gencode=arch=compute_62,code=sm_62 \
 -gencode=arch=compute_70,code=sm_70 \
 -gencode=arch=compute_70,code=compute_70


  • I cannot find any hard information on the term “SM62” on the web.
    At least some are speculating that it is meant for Tegra.

    What are your sources for your statements on “SM62”?

  • Yan says:

    Hi,

    Then what happens if I only use the following at compile time
    -gencode arch=compute_20,code=\"sm_20,compute_20\"

    but run the compiled code on a 5.0 card? The JIT compiler will generate the GPU code, but is it going to compile with
    -gencode arch=compute_50,code=\"sm_50,compute_50\"

    I’ve been searching the web, but couldn’t find anything. Please advise.

    Thanks,

    Ian

    • Arnon Shimoni says:

      Hey Ian
      If you’re compiling for a 5.0 card, the second option you suggested is better. If you have to have cross-compatibility, I’d recommend the first.

  • jg says:

    Thank you, very useful, what about sm_37 ?

    • Arnon Shimoni says:

      `sm_37` is for the Tesla K80 cards, but our experience proves that it’s not effective to compile for it specifically. sm_30 gives the same results and is better if you also have K40s or similar.

  • LostWorld says:

    kindly help me to find SM for GTX950 and compute_????

  • Mandar Gogate says:

    Thank you. 🙂