path: root/llvm/lib/Target/R600/AMDGPUISelLowering.cpp
Commit message | Author | Age | Files | Lines
* R600 -> AMDGPU rename | Tom Stellard | 2015-06-13 | 1 | -2866/+0
  llvm-svn: 239657
* R600: Switch to using generic min / max nodes. | Matt Arsenault | 2015-06-09 | 1 | -26/+19
  llvm-svn: 239377
* R600/SI: Fix some cases for load / store of half | Matt Arsenault | 2015-06-04 | 1 | -0/+13
  Mostly argument loads were producing broken zextloads from an FP type.
  llvm-svn: 239049
* R600: Add comments to subword private address load lowering code | Jan Vesely | 2015-05-26 | 1 | -0/+13
  v2: Use C++ comments and end with periods
  Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
  Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com>
  llvm-svn: 238228
* Add target hook to allow merging stores of nonzero constants | Matt Arsenault | 2015-05-24 | 1 | -0/+6
  On GPU targets, materializing constants is cheap and stores are expensive, so only doing this for zero vectors was silly.
  Most of the new testcases aren't optimally merged, and are for later improvements.
  llvm-svn: 238108
* R600/SI: Remove explicit m0 operand from v_interp instructions | Tom Stellard | 2015-05-12 | 1 | -0/+3
  Instead add m0 as an implicit operand. This helps avoid spills of the m0 register in some cases.
  llvm-svn: 237140
* R600/SI: Remove explicit m0 operand from s_sendmsg | Tom Stellard | 2015-05-12 | 1 | -0/+1
  Instead add m0 as an implicit operand. This allows us to avoid using the M0Reg register class and eliminates a number of unnecessary spills when using s_sendmsg instructions.
  This impacts one shader in the shader-db:
    SGPRS: 48 -> 40 (-16.67 %)
    VGPRS: 112 -> 108 (-3.57 %)
    Code Size: 40132 -> 38796 (-3.33 %) bytes
    LDS: 0 -> 0 (0.00 %) blocks
    Scratch: 2048 -> 0 (-100.00 %) bytes per wave
  llvm-svn: 237133
* Change getTargetNodeName() to produce compiler warnings for missing cases, fix them | Matthias Braun | 2015-05-07 | 1 | -2/+10
  llvm-svn: 236775
* Reinstate revisions r234755, r234759, r234760 | Jan Vesely | 2015-04-30 | 1 | -0/+10
  changes:
    Don't apply on hexagon and NVPTX since they no longer claim to support UADDO/USUBO
    Add location to getConstant
    Drop comment about the ops being turned into expand
  llvm-svn: 236240
* Reapply r235977 "[DebugInfo] Add debug locations to constant SD nodes" | Sergey Dmitrouk | 2015-04-28 | 1 | -99/+109
  [DebugInfo] Add debug locations to constant SD nodes
  This adds debug location to constant nodes of Selection DAG and updates all places that create constants to pass debug locations (see PR13269).
  Can't guarantee that all locations are correct, but in a lot of cases choice is obvious, so most of them should be. At least all tests pass.
  Tests for these changes do not cover everything, instead just check it for SDNodes, ARM and AArch64 where it's easy to get incorrect locations on constants.
  This is not complete fix as FastISel contains workaround for wrong debug locations, which drops locations from instructions on processing constants, but there isn't currently a way to use debug locations from constants there as llvm::Constant doesn't cache it (yet). Although this is a bit different issue, not directly related to these changes.
  Differential Revision: http://reviews.llvm.org/D9084
  llvm-svn: 235989
* Revert "[DebugInfo] Add debug locations to constant SD nodes" | Daniel Jasper | 2015-04-28 | 1 | -109/+99
  This breaks a test: http://bb.pgr.jp/builders/cmake-llvm-x86_64-linux/builds/23870
  llvm-svn: 235987
* [DebugInfo] Add debug locations to constant SD nodes | Sergey Dmitrouk | 2015-04-28 | 1 | -99/+109
  This adds debug location to constant nodes of Selection DAG and updates all places that create constants to pass debug locations (see PR13269).
  Can't guarantee that all locations are correct, but in a lot of cases choice is obvious, so most of them should be. At least all tests pass.
  Tests for these changes do not cover everything, instead just check it for SDNodes, ARM and AArch64 where it's easy to get incorrect locations on constants.
  This is not complete fix as FastISel contains workaround for wrong debug locations, which drops locations from instructions on processing constants, but there isn't currently a way to use debug locations from constants there as llvm::Constant doesn't cache it (yet). Although this is a bit different issue, not directly related to these changes.
  Differential Revision: http://reviews.llvm.org/D9084
  llvm-svn: 235977
* R600: Correctly lower CONCAT_VECTOR nodes with more than 2 operands | Tom Stellard | 2015-04-23 | 1 | -4/+2
  llvm-svn: 235662
* Revert revisions r234755, r234759, r234760 | Jan Vesely | 2015-04-13 | 1 | -10/+0
  Revert "Remove default in fully-covered switch (to fix Clang -Werror -Wcovered-switch-default)"
  Revert "R600: Add carry and borrow instructions. Use them to implement UADDO/USUBO"
  Revert "LegalizeDAG: Try to use Overflow operations when expanding ADD/SUB"
  Using overflow operations fails CodeGen/Generic/2011-07-07-ScheduleDAGCrash.ll on hexagon, nvptx, and r600. Revert while I investigate.
  llvm-svn: 234768
* R600: Add carry and borrow instructions. Use them to implement UADDO/USUBO | Jan Vesely | 2015-04-13 | 1 | -0/+10
  v2: tighten the sub64 tests
  v3: rename to CARRY/BORROW
  v4: fixup test cmdline
      add known bits computation
      use sign extend instead of sub 0,x
      better add test
  v5: remove redundant break
      move lowering to separate functions
      fix comments
  Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
  Reviewers: arsenm
  llvm-svn: 234759
* R600: Make FMIN/MAXNUM legal on all asics | Jan Vesely | 2015-04-12 | 1 | -0/+2
  v2: Add tests
  Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
  reviewer: arsenm
  llvm-svn: 234716
* R600: remove manual BFE optimization | Jan Vesely | 2015-04-12 | 1 | -8/+2
  Fixed since r233079
  Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
  reviewer: arsenm
  llvm-svn: 234715
* R600/SI: Expand fract to floor, then only select V_FRACT on CI | Marek Olsak | 2015-03-24 | 1 | -3/+0
  V_FRACT is buggy on SI. R600-specific code is left intact.
  v2: drop the multiclass, use complex VOP3 patterns
  llvm-svn: 233075
* R600/SI: Remove v_sub_f64 pseudo | Matt Arsenault | 2015-02-20 | 1 | -0/+3
  The expansion code does the same thing. Since the operands were not defined with the correct types, this has the side effect of fixing operand folding since the expanded pseudo would never use SGPRs or inline immediates.
  llvm-svn: 230072
* R600: Use new fmad node. | Matt Arsenault | 2015-02-20 | 1 | -1/+7
  This enables a few useful combines that used to only use fma.
  Also since v_mad_f32 apparently does not support denormals, disable the existing cases that are custom handled if they are requested.
  llvm-svn: 230071
* R600/SI: Fix implicit vcc operand to v_div_fmas_* | Matt Arsenault | 2015-02-14 | 1 | -3/+2
  This should allow finally fixing the f64 fdiv implementation.
  Test is disabled for VI since there seems to be a problem with one of the buffer load instructions on it.
  llvm-svn: 229236
* R600/SI: Make more store operations legal | Tom Stellard | 2015-02-04 | 1 | -3/+0
  v2i32, i32, trunc i32 to i16, and trunc i32 to i8 stores are legal for all address spaces. We had marked them as custom in order to lower them for the private address space, but this is no longer necessary.
  This enables lowering of misaligned stores of these types in the DAGLegalizer.
  llvm-svn: 228189
* R600: Don't promote i64 stores to v2i32 during DAG legalization | Tom Stellard | 2015-02-04 | 1 | -3/+0
  We take care of this during instruction selection now. This fixes a potential infinite loop when lowering misaligned stores.
  llvm-svn: 228188
* Reuse a bunch of cached subtargets and remove getSubtarget calls | Eric Christopher | 2015-01-30 | 1 | -7/+4
  without a Function argument.
  llvm-svn: 227638
* Move DataLayout back to the TargetMachine from TargetSubtargetInfo | Eric Christopher | 2015-01-26 | 1 | -2/+2
  derived classes.
  Since global data alignment, layout, and mangling is often based on the DataLayout, move it to the TargetMachine. This ensures that global data is going to be laid out and mangled consistently if the subtarget changes on a per function basis. Prior to this all targets(*) have had subtarget dependent code moved out and onto the TargetMachine.
  *One target hasn't been migrated as part of this change: R600. The R600 port has, as a subtarget feature, the size of pointers and this affects global data layout. I've currently hacked in a FIXME to enable progress, but the port needs to be updated to either pass the 64-bitness to the TargetMachine, or fix the DataLayout to avoid subtarget dependent features.
  llvm-svn: 227113
* R600/SI: Move i64 -> v2i32 load promotion into AMDGPUDAGToDAGISel::Select() | Tom Stellard | 2015-01-23 | 1 | -3/+0
  We used to do this promotion during DAG legalization, but this caused an infinite loop in ExpandUnalignedLoad() because it assumed that i64 loads were legal if i64 was a legal type.
  It also seems better to report i64 loads as legal, since they actually are and we were just promoting them to simplify our tablegen files.
  llvm-svn: 226945
* R600: Try to use lower types for 64bit division if possible | Jan Vesely | 2015-01-22 | 1 | -12/+38
  v2: add and enable tests for SI
  Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
  Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com>
  llvm-svn: 226881
* R600: Simplify LowerUDIVREM | Jan Vesely | 2015-01-22 | 1 | -19/+11
  optimizations can handle removing the Hi part operations. The generated code is identical for R600, ~10% icount reduction for SI
  v2: rebase
  Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
  Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com>
  llvm-svn: 226879
* R600/SI: Custom lower fround | Matt Arsenault | 2015-01-21 | 1 | -10/+113
  This fixes it for SI. It also removes the pattern used previously for Evergreen for f32. I'm not sure if the new R600 output is better or not, but it uses 1 fewer instructions if BFI is available.
  llvm-svn: 226682
* Implement new way of expanding extloads. | Matt Arsenault | 2015-01-14 | 1 | -18/+8
  Now that the source and destination types can be specified, allow doing an expansion that doesn't use an EXTLOAD of the result type. Try to do a legal extload to an intermediate type and extend that if possible.
  This generalizes the special case custom lowering of extloads R600 has been using to work around this problem.
  This also happens to fix a bug that would incorrectly use more aligned loads than should be used.
  llvm-svn: 225925
* R600: Implement getRecipEstimate | Matt Arsenault | 2015-01-13 | 1 | -0/+23
  This requires a new hook to prevent expanding sqrt in terms of rsqrt and reciprocal. v_rcp_f32, v_rsq_f32, and v_sqrt_f32 are all the same rate, so this expansion would just double the number of instructions and cycles.
  llvm-svn: 225828
* R600: Implement getRsqrtEstimate | Matt Arsenault | 2015-01-13 | 1 | -0/+18
  Only do for f32 since I'm unclear on both what this is expecting for the refinement steps in terms of accuracy, and what f64 instruction actually provides.
  llvm-svn: 225827
* R600: Make cttz / ctlz cheap to speculate | Matt Arsenault | 2015-01-13 | 1 | -0/+12
  Speculating things is generally good. SI+ has instructions for these for 32-bit values. This is still probably better even with the expansion for 64-bit values, although it is odd that this callback doesn't have the size as a parameter.
  llvm-svn: 225822
* [SelectionDAG] Allow targets to specify legality of extloads' result | Ahmed Bougacha | 2015-01-08 | 1 | -13/+16
  type (in addition to the memory type).
  The *LoadExt* legalization handling used to only have one type, the memory type. This forced users to assume that as long as the extload for the memory type was declared legal, and the result type was legal, the whole extload was legal.
  However, this isn't always the case. For instance, on X86, with AVX, this is legal:
    v4i32 load, zext from v4i8
  but this isn't:
    v4i64 load, zext from v4i8
  Whereas v4i64 is (arguably) legal, even without AVX2.
  Note that the same thing was done a while ago for truncstores (r46140), but I assume no one needed it yet for extloads, so here we go.
  Calls to getLoadExtAction were changed to add the value type, found manually in the surrounding code. Calls to setLoadExtAction were mechanically changed, by wrapping the call in a loop, to match previous behavior. The loop iterates over the MVT subrange corresponding to the memory type (FP vectors, etc...). I also pulled neighboring setTruncStoreActions into some of the loops; those shouldn't make a difference, as the additional types are illegal. (e.g., i128->i1 truncstores on PPC.)
  No functional change intended.
  Differential Revision: http://reviews.llvm.org/D6532
  llvm-svn: 225421
* R600/SI: Add class intrinsic | Matt Arsenault | 2015-01-06 | 1 | -0/+5
  llvm-svn: 225305
* R600: Remove outdated comment | Matt Arsenault | 2014-12-19 | 1 | -4/+0
  llvm-svn: 224648
* R600/SI: Only form min/max with 1 use. | Matt Arsenault | 2014-12-19 | 1 | -1/+1
  If the condition is used for something else, this increases the number of instructions.
  llvm-svn: 224646
* R600: Fix min/max matching problems with unordered compares | Matt Arsenault | 2014-12-12 | 1 | -42/+43
  The returned operand needs to be permuted for the unordered compares. Also fix incorrectly producing fmin_legacy / fmax_legacy for f64, which don't exist.
  llvm-svn: 224094
* Add target hook for whether it is profitable to reduce load widths | Matt Arsenault | 2014-12-12 | 1 | -0/+23
  Add an option to disable optimization to shrink truncated larger type loads to smaller type loads. On SI this prevents using scalar load instructions in some cases, since there are no scalar extloads.
  llvm-svn: 224084
* R600/SI: Update instruction conversions for VI | Marek Olsak | 2014-12-07 | 1 | -1/+19
  There are 3 changes:
  - Convert 32-bit S_LSHL/LSHR/ASHR to their V_*REV variants for VI
  - Lower RSQ_CLAMP for VI
  - Don't generate MIN/MAX_LEGACY on VI
  llvm-svn: 223604
* R600/SI: Use ZeroOrNegativeOneBooleanContent | Matt Arsenault | 2014-11-26 | 1 | -0/+3
  This sort of doesn't matter since the setcc type is i1, but this previously was using the default UndefinedBooleanContent. This makes it more consistent with R600.
  This enables more optimizations which typically give up on UndefinedBooleanContent. For example, there is already a special case target DAG combine for setcc + sext which can be eliminated in favor of what the generic DAG combiner can do if it assumes boolean values are sign extended. Since -1 is an inline immediate, using it is basically free and the backend already uses it when a boolean value is needed in a wider type.
  llvm-svn: 222850
* R600: Fix assert on copy of an i1 on pre-SI | Matt Arsenault | 2014-11-23 | 1 | -1/+2
  i1 is not a legal type on Evergreen, so this combine proceeded and tried to produce a bitcast between i1 and i8.
  llvm-svn: 222630
* R600: Permute operands when selecting legacy min/max | Matt Arsenault | 2014-11-15 | 1 | -6/+9
  This gets the correct NaN behavior based on the compare type the hardware uses. This now passes the new piglit test I have for this on SI.
  Add stricter tests for the operand order.
  llvm-svn: 222079
* R600: Fix 64-bit integer division | Tom Stellard | 2014-11-15 | 1 | -2/+2
  This fixes a failure in one of the oclconform tests.
  Patch by: Jan Vesely
  llvm-svn: 222073
* R600: Factor i64 UDIVREM lowering into its own function | Tom Stellard | 2014-11-15 | 1 | -0/+81
  This is so it could potentially be used by SI. However, the current implementation does not always produce correct results, so the IntegerDivisionPass is being used instead.
  llvm-svn: 222072
* R600/SI: Combine min3/max3 instructions | Matt Arsenault | 2014-11-14 | 1 | -0/+6
  llvm-svn: 222032
* R600/SI: Match integer min / max instructions | Matt Arsenault | 2014-11-14 | 1 | -21/+69
  llvm-svn: 222015
* R600/SI: Fix fmin_legacy / fmax_legacy matching for SI | Matt Arsenault | 2014-11-13 | 1 | -19/+50
  select_cc is expanded on SI, so this was never matched.
  llvm-svn: 221941
* We can get the TLOF from the TargetMachine - so constructor no longer requires TargetLoweringObjectFile to be passed. | Aditya Nandakumar | 2014-11-13 | 1 | -1/+1
  llvm-svn: 221926
* R600: Error on initializer for LDS. | Matt Arsenault | 2014-11-13 | 1 | -2/+21
  Also give a proper error for other address spaces.
  llvm-svn: 221917