Soft processors are an important tool in the Field Programmable Gate Array (FPGA) designer's toolkit, and Single Instruction Multiple Data (SIMD) organizations are an efficient means of exploiting the parallelism of FPGAs. However, state-of-the-art SIMD processors are hindered by the additional logic complexity resulting from dynamic features. By minimizing such features, it is possible to design soft processors that are efficient yet still flexible enough to operate within an application domain. To this end, we propose a family of instruction-set-programmable, multi-issue, wide-SIMD soft cores. The template is based on a highly static Transport Triggered Architecture (TTA) and a design-time-customizable shuffle unit, which together minimize inefficient dynamic features while keeping the cores compiler-programmable. The cores are evaluated on the PYNQ-Z1 board against the ARM Cortex-A9 hard processor system with NEON vector extensions. The proposed cores achieve up to a 2.4x performance improvement over the ARM and fit SIMD units up to 1024 bits wide onto the relatively small FPGA while still operating above 100 MHz. The scalability of TTA enables state-of-the-art vector widths. The multicore scalability of the template is preliminarily tested with a 14-core design on an XCZU9EG FPGA customized for real-time convolutional neural network inference.