id	summary	reporter	owner	description	type	status	priority	milestone	component	version	resolution	keywords	cc
28	reduced performance of small types	tmcdonell		CUDA devices do not coalesce global memory transfers of 8- and 16-bit types. Without providing alternate skeletons that process multiple elements per thread (vec4 and vec2 types respectively), we may be able to promote these accesses to 32-bit transactions and mask off the irrelevant data. Similar issues exist for shared memory bank conflicts.	defect	new	minor		CUDA backend	0.8.1.0			
