summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/AMDGPU/fmed3.ll
Commit message (Collapse)AuthorAgeFilesLines
* AMDGPU: Don't form fmed3 if it will require materializationMatt Arsenault2018-09-181-0/+62
| | | | | | | If there is a single use constant, it can be folded into the min/max, but not into med3. llvm-svn: 342443
* DAG: Enhance isKnownNeverNaNMatt Arsenault2018-08-031-41/+32
| | | | | | | | | | | | Add a parameter for testing specifically for sNaNs - at least one instruction pattern on AMDGPU needs to check specifically for this. Also handle more cases, and add a target hook for custom nodes, similar to the hooks for known bits. llvm-svn: 338910
* AMDGPU/GCN: Bring processors in sync with AMDGPUUsageKonstantin Zhuravlyov2017-12-081-2/+2
| | | | | | | | | | | | - Add gfx704 - Change bonaire to gfx704 - Remove gfx804 - Remove gfx901 - Remove gfx903 Differential Revision: https://reviews.llvm.org/D40046 llvm-svn: 320194
* AMDGPU: Fix -enable-var-scope violationsMatt Arsenault2017-11-121-7/+7
| | | | llvm-svn: 318004
* AMDGPU: Start selecting global instructionsMatt Arsenault2017-07-291-75/+75
| | | | llvm-svn: 309470
* AMDGPU: Allow SIShrinkInstructions to work in non-SSAMatt Arsenault2017-07-101-2/+2
| | | | | | | | Immediates can be folded as long as the immediate is a vreg. Also undo commuting instructions if it didn't fold an immediate. llvm-svn: 307575
* [AMDGPU] Narrow lshl from 64 to 32 bit if possibleStanislav Mekhanoshin2017-05-221-4/+4
| | | | | | | | | Turn expensive 64 bit shift into 32 bit if shift does not overflow int: shl (ext x) => zext (shl x) Differential Revision: https://reviews.llvm.org/D33367 llvm-svn: 303569
* AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernelMatt Arsenault2017-03-211-42/+42
| | | | | | | | | | | | Currently the default C calling convention functions are treated the same as compute kernels. Make this explicit so the default calling convention can be changed to a non-kernel. Converted with perl -pi -e 's/define void/define amdgpu_kernel void/' on the relevant test directories (and undoing in one place that actually wanted a non-kernel). llvm-svn: 298444
* AMDGPU: Use v_med3_{f16|i16|u16}Matt Arsenault2017-02-271-4/+83
| | | | llvm-svn: 296401
* AMDGPU: Generalize matching of v_med3_f32Matt Arsenault2017-01-311-6/+732
| | | | | | | | | | I think this is safe as long as no inputs are known to ever be nans. Also add an intrinsic for fmed3 to be able to handle all safe math cases. llvm-svn: 293598
* DAG: Consider nnan in isKnownNeverNaNMatt Arsenault2017-01-181-0/+16
| | | | llvm-svn: 292328
* AMDGPU: Remove superfluous string attributes from testsMatt Arsenault2016-07-111-9/+9
| | | | | | Also fix v_mac.ll not testing right thing for fneg llvm-svn: 275129
* AMDGPU: Run SIFoldOperands after PeepholeOptimizerMatt Arsenault2016-04-141-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PeepholeOptimizer cleans up redundant copies, which makes the operand folding more effective. shader-db stats: Totals: SGPRS: 34200 -> 34336 (0.40 %) VGPRS: 22118 -> 21655 (-2.09 %) Code Size: 632144 -> 633460 (0.21 %) bytes LDS: 11 -> 11 (0.00 %) blocks Scratch: 10240 -> 11264 (10.00 %) bytes per wave Max Waves: 8822 -> 8918 (1.09 %) Wait states: 0 -> 0 (0.00 %) Totals from affected shaders: SGPRS: 7704 -> 7840 (1.77 %) VGPRS: 5169 -> 4706 (-8.96 %) Code Size: 234444 -> 235760 (0.56 %) bytes LDS: 2 -> 2 (0.00 %) blocks Scratch: 0 -> 1024 (0.00 %) bytes per wave Max Waves: 1188 -> 1284 (8.08 %) Wait states: 0 -> 0 (0.00 %) Increases: SGPRS: 35 (0.01 %) VGPRS: 1 (0.00 %) Code Size: 59 (0.02 %) LDS: 0 (0.00 %) Scratch: 1 (0.00 %) Max Waves: 48 (0.02 %) Wait states: 0 (0.00 %) Decreases: SGPRS: 26 (0.01 %) VGPRS: 54 (0.02 %) Code Size: 68 (0.03 %) LDS: 0 (0.00 %) Scratch: 0 (0.00 %) Max Waves: 4 (0.00 %) Wait states: 0 (0.00 %) llvm-svn: 266378
* AMDGPU: Match fmed3 patterns with legacy fmin/fmaxMatt Arsenault2016-01-281-0/+23
| | | | llvm-svn: 259090
* AMDGPU: Match some med3 patternsMatt Arsenault2016-01-281-0/+131
llvm-svn: 259089
OpenPOWER on IntegriCloud