path: root/llvm/test/CodeGen/AMDGPU/clamp.ll
Commit message | Author | Age | Files | Lines
* AMDGPU: Remove dx10-clamp from subtarget features | Matt Arsenault | 2019-03-29 | 1 | -3/+3
  Since this can be set with s_setreg*, it should not be a subtarget property. Set a default based on the calling convention, and introduce a new amdgpu-dx10-clamp attribute to override this if desired. Also introduce a new amdgpu-ieee attribute to match. The values need to match to allow inlining. I think it is OK for the caller's dx10-clamp attribute to override the callee, but there doesn't appear to be the infrastructure to do this currently without defining the attribute in the generic Attributes.td. Eventually the calling convention lowering will need to insert a mode switch somewhere for these.
  llvm-svn: 357302
* DAG: Change behavior of fminnum/fmaxnum nodes | Matt Arsenault | 2018-10-22 | 1 | -5/+15
  Introduce new versions that follow the IEEE semantics to help with legalization that may need quieted inputs. There are some regressions from inserting unnecessary canonicalizes when these are matched from fast math fcmp + select which should be fixed in a future commit.
  llvm-svn: 344914
* AMDGPU: Don't form fmed3 if it will require materialization | Matt Arsenault | 2018-09-18 | 1 | -2/+2
  If there is a single use constant, it can be folded into the min/max, but not into med3.
  llvm-svn: 342443
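  As a rough illustration of why the two forms are interchangeable value-wise, the sketch below compares a median-of-three with a min/max pair. The function names `med3` and `clamp_min_max` are hypothetical, NaN handling is deliberately ignored, and nothing here models instruction encoding or constant folding, which is what the commit is actually about.

```python
def med3(a, b, c):
    """Median of three values (NaN inputs are not considered in this sketch)."""
    return sorted((a, b, c))[1]


def clamp_min_max(x, lo, hi):
    """Clamp x to [lo, hi] with a max followed by a min; assumes lo <= hi."""
    return min(max(x, lo), hi)


# For lo <= hi, both forms produce the same value for every x.
for x in (-2.0, 0.0, 0.25, 1.0, 3.0):
    assert med3(x, 0.0, 1.0) == clamp_min_max(x, 0.0, 1.0)
```

  Since the results agree, choosing between the forms comes down to cost, which is where the commit's point about folding a single-use constant applies.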
* [AMDGPU] Preliminary patch for divergence driven instruction selection. Immediate selection predicate changed | Alexander Timofeev | 2018-09-11 | 1 | -1/+1
  Differential revision: https://reviews.llvm.org/D51734
  Reviewers: rampitec
  llvm-svn: 341928
* AMDGPU: Use splat vectors for undefs when folding canonicalize | Matt Arsenault | 2018-08-12 | 1 | -0/+32
  If one of the elements is undef, use the canonicalized constant from the other element instead of 0. Splat vectors are more useful for other optimizations, such as matching vector clamps. This was breaking on clamps of half3 from the undef 4th component.
  llvm-svn: 339512
* DAG: Enhance isKnownNeverNaN | Matt Arsenault | 2018-08-03 | 1 | -5/+29
  Add a parameter for testing specifically for sNaNs - at least one instruction pattern on AMDGPU needs to check specifically for this. Also handle more cases, and add a target hook for custom nodes, similar to the hooks for known bits.
  llvm-svn: 338910
* AMDGPU: Replace i64 add/sub lowering | Matt Arsenault | 2017-11-15 | 1 | -1/+2
  Use VOP3 add/addc like usual. This has some tradeoffs. Inline immediates fold a little better, but other constants are worse off. SIShrinkInstructions could be made smarter to handle these cases. This allows us to avoid selecting scalar adds where we need to track the carry in scc and replace its users. This makes it easier to use the carryless VALU adds.
  llvm-svn: 318340
* AMDGPU: Do not fold clamp instructions when sources are different | Matt Arsenault | 2017-10-05 | 1 | -0/+22
  Patch by hakzsam (Samuel Pitoiset)
  llvm-svn: 314951
* AMDGPU: Select clamp pattern with v2f16 | Matt Arsenault | 2017-08-30 | 1 | -34/+190
  llvm-svn: 312087
* [AMDGPU] Remove getBidirectionalReasonRank | Stanislav Mekhanoshin | 2017-03-11 | 1 | -2/+2
  This method inverts the Reason field of a scheduling candidate. It compares RegCritical and RegExcess correctly, but everything else is broken. In fact it can prefer a weaker reason such as Weak over RegCritical because Weak > -RegCritical. The CandReason enum is properly sorted, so just remove the artificial ranking.
  Differential Revision: https://reviews.llvm.org/D30557
  llvm-svn: 297536
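  The ranking problem described in this commit can be reproduced with a small sketch. Everything below is an assumption made for illustration: the enum is a hypothetical three-entry subset with made-up numeric values (smaller meaning stronger), and `inverted_rank` is a guess at the shape of the removed helper based only on the "Weak > -RegCritical" remark, not the actual LLVM code.

```python
from enum import IntEnum


class CandReason(IntEnum):
    # Hypothetical subset, sorted strongest-first (smaller value = stronger).
    REG_EXCESS = 1
    REG_CRITICAL = 2
    WEAK = 3


def inverted_rank(reason):
    """Assumed shape of the removed ranking: register reasons are negated."""
    if reason in (CandReason.REG_EXCESS, CandReason.REG_CRITICAL):
        return -int(reason)
    return int(reason)


def pick_by_rank(a, b):
    """Prefer the candidate reason with the higher inverted rank."""
    return a if inverted_rank(a) > inverted_rank(b) else b


def pick_by_enum(a, b):
    """Prefer the stronger reason using the enum order directly."""
    return min(a, b)


# RegExcess vs RegCritical: the inverted rank happens to get this right.
assert pick_by_rank(CandReason.REG_CRITICAL, CandReason.REG_EXCESS) == CandReason.REG_EXCESS
# Weak vs RegCritical: Weak wins because Weak > -RegCritical, which is the bug.
assert pick_by_rank(CandReason.WEAK, CandReason.REG_CRITICAL) == CandReason.WEAK
# Comparing the sorted enum values directly picks the stronger reason.
assert pick_by_enum(CandReason.WEAK, CandReason.REG_CRITICAL) == CandReason.REG_CRITICAL
```

  Mixing negated and non-negated values makes the comparison inconsistent across groups, so relying on the already-sorted enum is the simpler choice.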
* AMDGPU: Use clamp with f64 | Matt Arsenault | 2017-02-22 | 1 | -6/+3
  llvm-svn: 295908
* AMDGPU: Fold FP clamp as modifier bit | Matt Arsenault | 2017-02-22 | 1 | -9/+6
  The manual is unclear on the details of this. It's not clear to me if denormals are not allowed with clamp, or if that is only omod. Not allowing denorms for fp16 or fp64 isn't useful so I also question if that is really a restriction. Same with whether this is valid without IEEE mode enabled.
  llvm-svn: 295905
* AMDGPU: Redefine clamp node as clamp 0.0-1.0 | Matt Arsenault | 2017-02-21 | 1 | -0/+535
  Change implementation to use max instead of add. min/max/med3 do not flush denormals regardless of the mode, so it is OK to use it whether or not they are enabled. Also allow using clamp with f16, and use knowledge of dx10_clamp.
  llvm-svn: 295788
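  A minimal sketch of a 0.0-1.0 clamp built from max and min follows. The function name `clamp01` and its `dx10_clamp` flag are hypothetical, and the NaN behaviour (NaN becomes 0.0 when the flag is set, and passes through otherwise) is an assumption about what the dx10_clamp mode means, not something stated in the commit message.

```python
import math


def clamp01(x, dx10_clamp=True):
    """Clamp x to [0.0, 1.0] using max then min.

    Assumption: with dx10_clamp set, a NaN input yields 0.0;
    without it, the NaN is passed through unchanged.
    """
    if math.isnan(x):
        return 0.0 if dx10_clamp else x
    return min(max(x, 0.0), 1.0)


assert clamp01(-1.0) == 0.0
assert clamp01(0.5) == 0.5
assert clamp01(2.0) == 1.0
assert clamp01(float("nan")) == 0.0
assert math.isnan(clamp01(float("nan"), dx10_clamp=False))
```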