Ticket #28 (new defect)

Opened 4 years ago

Last modified 4 years ago

reduced performance of small types

Reported by: tmcdonell
Owned by:
Priority: minor
Milestone:
Component: CUDA backend
Version: 0.8.1.0
Keywords:
Cc:

Description

CUDA devices do not coalesce global memory transfers of 8- and 16-bit types. Without having to provide alternate skeletons that process multiple elements per thread (vec4 and vec2 types respectively), we may be able to promote these accesses to 32-bit transactions and mask off the irrelevant data. Similar issues exist with shared memory bank conflicts.
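A minimal sketch of the read side of this idea, not the backend's actual skeleton code: each thread still handles one 8-bit element, but loads the aligned 32-bit word that contains it and masks off the other three bytes. The names extract_u8 and widen_u8 are hypothetical. The sketch assumes the input buffer is 4-byte aligned and padded to a multiple of four bytes, and that bytes are laid out little-endian. Results are written as 32-bit values, so promoting narrow writes is not addressed here.

```cuda
#include <cstddef>
#include <cstdint>

// Extract byte `lane` (0..3) from a 32-bit word.
// Assumes little-endian layout: lane 0 is the least significant byte.
__host__ __device__ inline uint8_t extract_u8(uint32_t word, unsigned lane)
{
    return static_cast<uint8_t>((word >> (8u * lane)) & 0xFFu);
}

// Hypothetical one-element-per-thread kernel. `in_words` is the 8-bit input
// array viewed as 32-bit words; `n` is the number of 8-bit elements.
__global__ void widen_u8(const uint32_t* __restrict__ in_words,
                         uint32_t* __restrict__ out,
                         size_t n)
{
    size_t i = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
    if (i < n) {
        // Four neighbouring threads read the same 32-bit word.
        uint32_t word = in_words[i / 4];
        // Keep only this thread's byte and discard the rest.
        out[i] = extract_u8(word, static_cast<unsigned>(i % 4));
    }
}
```

A launch might look like widen_u8<<<(n + 255) / 256, 256>>>(in_words, out, n). The same shift-and-mask approach would apply to 16-bit types with two elements per word.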

Change History

Changed 4 years ago by tmcdonell

  • owner tmcdonell deleted

Changed 4 years ago by chak

  • version changed from 0.8.0.0 to 0.8.1.0