path: root/llvm/lib/Target
Commit message Author Age Files Lines
* [AMDGPU] fix ds_swizzle_b32 opcode for VI (bz 28371)Valery Pykhtin2016-07-082-1/+21
| | | | | | Differential Revision: http://reviews.llvm.org/D22049 llvm-svn: 274852
* [AArch64] Macro fusion of simple ALU ops with branches for Broadcom's VulcanPankaj Gode2016-07-081-0/+1
| | | | | | | | | | Support for the macro fusion of simple ALU ops with branches for the Vulcan sub-target. Patch by Meador Inge <meadori@gmail.com> Differential Revision: http://reviews.llvm.org/D22042 llvm-svn: 274837
* [X86][SSE] Accept any shuffle mask that is all zeroesSimon Pilgrim2016-07-081-0/+7
| | | | | | Until we have a better way to extract constants through bitcasted build vectors (and how to handle undefs of partial lanes etc.) at least accept build vectors that are all zeroes. llvm-svn: 274833
* [AVX512] Remove and autoupgrade a duplicate set of 512-bit masked shift ↵Craig Topper2016-07-082-14/+1
| | | | | | | | intrinsics. I'm not sure if clang ever used these builtin names or not. llvm-svn: 274827
* AMDGPU: Move si_mask_branch register operand to be a useMatt Arsenault2016-07-083-6/+8
| | | | llvm-svn: 274818
* AMDGPU: Cleanup. Use definesRegister instead of manual loopMatt Arsenault2016-07-081-6/+2
| | | | | | | Also this will be more precise since it will check exec_lo/exec_hi writes. llvm-svn: 274817
* ARM: support high registers in __builtin_longjmp on WoASaleem Abdulrasool2016-07-082-4/+34
| | | | | | | | | | Windows on ARM uses a pure thumb-2 environment. This means that it can select a high register when doing a __builtin_longjmp. We would use a tLDRi which would truncate the register to a low register. Use a t2LDRi12 to get the full register file access. Tweak the code to just load into PC, as that is an interworking branch on all supported cores anyways. llvm-svn: 274815
* [lanai] Use peephole optimizer to generate more conditional ALU operations.Jacques Pienaar2016-07-0715-364/+707
| | | | | | | | | | | | | | | | | Summary: * Similar to the ARM backend, use the peephole optimizer to generate more conditional ALU operations; * Add predicated type with default always true to RR instructions in LanaiInstrInfo.td; * Move LanaiSetflagAluCombiner into optimizeCompare; * The ASM parser can currently only handle explicitly specified CC, so specify ".t" (true) where needed in the ASM test; * Remove unused MachineOperand flags; Reviewers: eliben Subscribers: aemerson Differential Revision: http://reviews.llvm.org/D22072 llvm-svn: 274807
* Recommit r274692 - [X86] Transform setcc + movzbl into xorl + setccMichael Kuperstein2016-07-074-1/+192
| | | | | | | | | | | xorl + setcc is generally the preferred sequence due to the partial register stall setcc + movzbl suffers from. As a bonus, it also encodes one byte smaller. This fixes PR28146. The original commit tried inserting an 8bit-subreg into a GR32 (not GR32_ABCD) which was not appreciated by fast regalloc on 32-bit. llvm-svn: 274802
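To illustrate the transform described in this commit, the sketch below shows a comparison that yields a 0/1 integer, which is the typical source of a setcc. The function name isLess is made up for this example, and the assembly in the comments is an AT&T-syntax approximation for an x86-64 System V target; real output depends on the compiler version and flags.

```cpp
#include <cassert>

// Hypothetical example function: returns 1 when a < b, otherwise 0.
//
// Sequence before the change (setcc writes only %al, so the result
// has to be zero-extended afterwards):
//     cmpl   %esi, %edi
//     setl   %al
//     movzbl %al, %eax
//
// Sequence after the change (the register is cleared first; the xorl
// must precede the compare because it clobbers EFLAGS):
//     xorl   %eax, %eax
//     cmpl   %esi, %edi
//     setl   %al
int isLess(int a, int b) { return a < b; }
```

The one-byte saving mentioned in the message comes from xorl %eax, %eax encoding in two bytes where movzbl %al, %eax needs three.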
* [AArch64] Change the preferred alignment for char and short to word alignment.Chad Rosier2016-07-071-2/+2
| | | | | | | | | | The commit reinstates r273279, which was informally approved. Original Review: http://reviews.llvm.org/D21414 This reverts commit ca632c91aaa7cafc50942f890c49f727a046ace1. llvm-svn: 274790
* Revert r274692 to check whether this is what breaks windows selfhost.Michael Kuperstein2016-07-074-189/+1
| | | | llvm-svn: 274771
* NVPTX: Remove the legacy ptx intrinsicsJustin Bogner2016-07-073-136/+86
| | | | | | | | | | | | - Rename the ptx.read.* intrinsics to nvvm.read.ptx.sreg.* - some but not all of these registers were already accessible via the nvvm name. - Rename ptx.bar.sync nvvm.bar.sync, to match nvvm.bar0. There's a fair amount of code motion here, but it's all very mechanical. llvm-svn: 274769
* Revert "[AArch64] Change the preferred alignment for char and short to word ↵Chad Rosier2016-07-071-2/+2
| | | | | | | | alignment" This reverts commit r273279 as the change was not properly approved. llvm-svn: 274768
* [SystemZ] Fix regression when handling conditional callsZhan Jun Liau2016-07-071-2/+2
| | | | | | | | | | | | | | | Summary: A regression showed up in node.js when handling conditional calls. Fix the regression by recognizing external symbols as a possible operand type in CallJG. Reviewers: koriakin Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D22054 llvm-svn: 274761
* [AMDGPU] fix ds_write_src2 encoding (bz26027)Valery Pykhtin2016-07-072-2/+16
| | | | | | Differential revision: http://reviews.llvm.org/D22041 llvm-svn: 274756
* Don't crash trying to relax 32 loads on COFF.Rafael Espindola2016-07-071-0/+1
| | | | | | Fixes pr28452. llvm-svn: 274754
* [ARM] Do not test for CPUs, use SubtargetFeatures. Also remove 1 flagDiana Picus2016-07-074-12/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a follow-up for r273544. The end goal is to get rid of the isSwift / isCortexXY / isWhatever methods. This commit also removes a command line flag that isn't used in any of the tests: check-vmlx-hazards. It can be replaced easily with the mattr mechanism, since this is now a subtarget feature. There is still some work left regarding FeatureExpandMLx. In the past MLx expansion was enabled for subtargets with hasVFP2(), until r129775 [1] switched from that to isCortexA9, without too much justification. In spite of that, the code performing MLx expansion still contains calls to isSwift/isLikeA9, although the results of those are pretty clear given that we're only enabling it for the A9. We should try to enable it for all targets that have FeatureHasVMLxHazards, as it seems to be closely related to that behaviour, and if that is possible try to clean up the MLx expansion pass from all calls to isWhatever. This will require some performance testing, so it will be done in another patch. [1] http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20110418/119725.html Differential Revision: http://reviews.llvm.org/D21798 llvm-svn: 274742
* Use the class version of getPointerTy rather than getting back toEric Christopher2016-07-071-23/+22
| | | | | | ourselves via a call through the DAG. llvm-svn: 274721
* Use the class definition for useSoftFloat.Eric Christopher2016-07-071-4/+4
| | | | llvm-svn: 274720
* Rename argument for consistency.Eric Christopher2016-07-072-22/+22
| | | | llvm-svn: 274717
* Remove the plumbing for isDarwinABI from EmitTailCallLoadFPAndRetAddr.Eric Christopher2016-07-072-9/+6
| | | | llvm-svn: 274716
* Use the MachineFunction that we've already queried for in the function.Eric Christopher2016-07-071-4/+2
| | | | llvm-svn: 274715
* Remove the plumbing for isDarwinABI from the PrepareTailCall hierarchy.Eric Christopher2016-07-071-10/+8
| | | | llvm-svn: 274714
* Remove the plumbing of 64-bitness from PrepareTailCall and functionsEric Christopher2016-07-071-13/+13
| | | | | | called by it. llvm-svn: 274711
* Sink call to get the MachineFunction into EmitTailCallStoreFPAndRetAddrEric Christopher2016-07-071-10/+7
| | | | | | and remove the argument. llvm-svn: 274710
* Remove unnecessary subtarget parameters in PPCTargetLowering.Eric Christopher2016-07-072-31/+26
| | | | llvm-svn: 274709
* fix documentation comment. NFC.Junmo Park2016-07-061-2/+1
| | | | llvm-svn: 274704
* Minor code cleanup. NFC.Junmo Park2016-07-061-1/+1
| | | | llvm-svn: 274702
* [X86] Transform setcc + movzbl into xorl + setccMichael Kuperstein2016-07-064-1/+189
| | | | | | | | | | | xorl + setcc is generally the preferred sequence due to the partial register stall setcc + movzbl suffers from. As a bonus, it also encodes one byte smaller. This fixes PR28146. Differential Revision: http://reviews.llvm.org/D21774 llvm-svn: 274692
* AArch64: Change modeling of zero cycle zeroing.Matthias Braun2016-07-062-24/+48
| | | | | | | | | | | | | | | | | On CPUs with the zero cycle zeroing feature enabled "movi v.2d" should be used to zero a vector register. This was previously done at instruction selection time, however the register coalescer sometimes widened multiple vregs to the Q width because of that leading to extra spills. This patch leaves the decision on how to zero a register to the AsmPrinter phase where it doesn't affect register allocation anymore. This patch also sets isAsCheapAsAMove=1 on FMOVS0, FMOVD0. This fixes http://llvm.org/PR27454, rdar://25866262 Differential Revision: http://reviews.llvm.org/D21826 llvm-svn: 274686
* AArch64: Replace a RegScavenger instance with LivePhysRegsMatthias Braun2016-07-061-14/+14
| | | | | | | | | | | | | | findScratchNonCalleeSaveRegister() just needs a simple liveness analysis, use LivePhysRegs for that as it is simpler and does not depend on the kill flags. This commit adds a convenience function available() to LivePhysRegs: This function returns true if the given register is not reserved and neither the register nor any of its aliases are alive. Differential Revision: http://reviews.llvm.org/D21865 llvm-svn: 274685
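The rule stated for available() can be modelled in a few lines. This is only a sketch under assumptions: LivenessModel, its fields, and sampleModel are hypothetical names invented here, and registers are plain integers rather than LLVM's register types.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

using Reg = unsigned;

// Hypothetical stand-in for a liveness tracker; not the LLVM class.
struct LivenessModel {
  std::set<Reg> reserved;                   // registers that may never be used
  std::set<Reg> live;                       // registers currently alive
  std::map<Reg, std::vector<Reg>> aliases;  // overlapping registers

  // True if r is not reserved and neither r nor any alias of r is alive.
  bool available(Reg r) const {
    if (reserved.count(r) || live.count(r))
      return false;
    auto it = aliases.find(r);
    if (it == aliases.end())
      return true;
    for (Reg a : it->second)
      if (live.count(a))
        return false;
    return true;
  }
};

// Example state: register 0 is live, 31 is reserved, 32 aliases 0.
LivenessModel sampleModel() {
  LivenessModel m;
  m.reserved = {31};
  m.live = {0};
  m.aliases = {{32, {0}}};
  return m;
}
```

With this state, register 5 is available, while 0 (live), 31 (reserved), and 32 (alias of a live register) are not.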
* Add initial support for R_386_GOT32X.Rafael Espindola2016-07-066-9/+29
| | | | | | This adds it only for movl mov@GOT(%reg), %reg. llvm-svn: 274678
* [NVPTX] Add sm_60, sm_61, sm_62 targets to LLVM.Justin Lebar2016-07-061-1/+13
| | | | | | | | | | Reviewers: tra Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D22068 llvm-svn: 274674
* NVPTX: Replace uses of cuda.syncthreads with nvvm.barrier0Justin Bogner2016-07-062-4/+1
| | | | | | | Everywhere where cuda.syncthreads or __syncthreads is used, use the properly namespaced nvvm.barrier0 instead. llvm-svn: 274664
* NVPTX: Make the llvm.nvvm.shfl intrinsics and builtin names consistentJustin Bogner2016-07-061-8/+8
| | | | | | | The intrinsics here use nvvm, but the builtins and tablegen variable names were using ptx. Stick to the modern names here. llvm-svn: 274662
* [x86] fix cost of SINT_TO_FP for i32 --> float (PR21356, PR28434)Sanjay Patel2016-07-061-1/+1
| | | | | | | | | | | | | This is "cvtdq2ps" which does not appear to be particularly slow on any CPU according to Agner's tables. Choosing "5" as a cost here as suggested in: https://llvm.org/bugs/show_bug.cgi?id=21356 ...but it seems very conservative given that the instruction is fully pipelined, and I think these costs are supposed to model throughput. Note that related costs are also most likely too high, but this fixes PR21356 and partly fixes PR28434. llvm-svn: 274658
* [X86] Sort cast cost tables. NFC.Michael Kuperstein2016-07-061-124/+123
| | | | | | | Cast cost tables are now sorted, for each cast type, lexicographically on [source base type, source vector width, dest base type, base vector width]. llvm-svn: 274653
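A four-part lexicographic ordering like the one described can be expressed with std::tie. The struct and field names below are hypothetical, type names are plain strings for simplicity, and the fourth key is read here as the destination vector width, which is an assumption about the message's wording.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical table entry; the real tables use LLVM's own types.
struct CastEntry {
  std::string srcBase;  // source base type, e.g. "i32"
  unsigned srcWidth;    // source vector width
  std::string dstBase;  // destination base type
  unsigned dstWidth;    // destination vector width (assumed fourth key)
};

// Compare entries key by key, in the order listed above.
bool entryLess(const CastEntry &a, const CastEntry &b) {
  return std::tie(a.srcBase, a.srcWidth, a.dstBase, a.dstWidth) <
         std::tie(b.srcBase, b.srcWidth, b.dstBase, b.dstWidth);
}

// Return a sorted copy of the entries.
std::vector<CastEntry> sortedEntries(std::vector<CastEntry> entries) {
  std::sort(entries.begin(), entries.end(), entryLess);
  return entries;
}
```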
* [SystemZ] Remove AND mask of bottom 6 bits when result is used for shift/rotateElliot Colp2016-07-062-1/+55
| | | | | | | | | | On SystemZ, shift and rotate instructions only use the bottom 6 bits of the shift/rotate amount. Therefore, if the amount is ANDed with an immediate mask that has all of the bottom 6 bits set, we can remove the AND operation entirely. Differential Revision: http://reviews.llvm.org/D21854 llvm-svn: 274650
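The reasoning behind this change can be checked with a small model. The helper names are hypothetical; shiftLeftLow6 merely imitates an instruction that reads only the bottom 6 bits of its shift amount, as the message describes.

```cpp
#include <cassert>
#include <cstdint>

// Models a shift that ignores everything above the low 6 bits of the amount.
constexpr uint64_t shiftLeftLow6(uint64_t value, uint64_t amount) {
  return value << (amount & 63);
}

// An AND with `mask` is redundant before such a shift when the mask
// keeps all of the bottom 6 bits.
constexpr bool maskIsRedundant(uint64_t mask) {
  return (mask & 63) == 63;
}
```

For example, a mask of 0xFF keeps the bottom 6 bits, so shiftLeftLow6(v, n & 0xFF) equals shiftLeftLow6(v, n) for every n, while a mask of 0x1F clears bit 5 and must be kept.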
* [X86][SSE] Fixed typo in insertps lowering.Simon Pilgrim2016-07-061-1/+1
| | | | | | | | We were checking for 2 insertions (which is caught earlier in the pattern matching loop) instead of the case where we have no insertions. Turns out this code never fires as we always try to lower to insertps after trying to lower to blendps, which would catch these cases - I'm about to make some changes to support combining to insertps which could cause this to fire so I don't want to remove it. llvm-svn: 274648
* Ensure all uses of permute instructions feed vector storesKit Barton2016-07-061-0/+20
| | | | | | | | | | | | | | | There is a problem in VSXSwapRemoval where it is incorrectly removing permute instructions. In this case, the permute is feeding both a vector store and also a non-store instruction. In this case, the permute cannot be removed. The fix is to simply look at all the uses of the vector register defined by the permute and ensure that all the uses are vector store instructions. This problem was reported in PR 27735 (https://llvm.org/bugs/show_bug.cgi?id=27735). Test case based on the original problem reported. Phabricator Review: http://reviews.llvm.org/D21802 llvm-svn: 274645
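The fix amounts to an "all uses" check instead of an "any use" check. The sketch below shows that shape only; UseKind and canRemovePermute are hypothetical names, and the real pass walks machine instructions rather than a vector of tags.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical classification of each instruction that reads the
// register defined by the permute.
enum class UseKind { VectorStore, Other };

// The permute may be removed only if every use is a vector store.
bool canRemovePermute(const std::vector<UseKind> &uses) {
  return std::all_of(uses.begin(), uses.end(),
                     [](UseKind k) { return k == UseKind::VectorStore; });
}
```

A permute feeding one store and one non-store instruction, the case from the bug report, yields false.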
* [TTI] The cost model should not assume vector casts get completely scalarizedMichael Kuperstein2016-07-061-0/+2
| | | | | | | | | | | | | | | | The cost model should not assume vector casts get completely scalarized, since on targets that have vector support, the common case is a partial split up to the legal vector size. So, when a vector cast gets split, the resulting casts end up legal and cheap. Instead of pessimistically assuming scalarization, base TTI can use the costs the concrete TTI provides for the split vector, plus a fudge factor to account for the cost of the split itself. This fudge factor is currently 1 by default, except on AMDGPU where inserts and extracts are considered free. Differential Revision: http://reviews.llvm.org/D21251 llvm-svn: 274642
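One plausible reading of this cost rule is sketched below. Splitting into halves, the factor of two, and the parameter names are assumptions of the sketch and are not taken from the commit message; only the idea of "legal cast cost plus a fudge factor per split" comes from the text.

```cpp
#include <cassert>

// Hypothetical cost of casting a vector of numElts elements when the
// widest legal vector holds legalElts elements and a legal cast costs
// legalCastCost. Each split adds splitCost (the "fudge factor").
unsigned castCost(unsigned numElts, unsigned legalElts,
                  unsigned legalCastCost, unsigned splitCost = 1) {
  if (numElts <= legalElts)
    return legalCastCost;  // already legal: use the target's cost
  // Split in two halves and cost each half recursively.
  return splitCost +
         2 * castCost(numElts / 2, legalElts, legalCastCost, splitCost);
}
```

With a legal width of 4 elements and a legal cast cost of 1, a 16-element cast costs 7 under the default fudge factor, and 4 when splits are free, as the message says they are considered on AMDGPU.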
* fix typo; NFCSanjay Patel2016-07-061-1/+1
| | | | llvm-svn: 274636
* Re-commit of 274613.Elena Demikhovsky2016-07-063-42/+89
| | | | | | | The prev commit failed on compilation. A minor change in one pattern in lib/Target/X86/X86InstrAVX512.td fixes the failure. llvm-svn: 274626
* [ARM] Do not test for CPUs, use SubtargetFeatures. Also remove 2 flags.Diana Picus2016-07-064-14/+20
| | | | | | | | | | | | | | | This is a follow-up for r273544. The end goal is to get rid of the isSwift / isCortexXY / isWhatever methods. This commit also removes two command-line flags that weren't used in any of the tests: widen-vmovs and swift-partial-update-clearance. The former may be easily replaced with the mattr mechanism, but the latter may not (as it is a subtarget property, and not a proper feature). Differential Revision: http://reviews.llvm.org/D21797 llvm-svn: 274620
* [ARM] Do not test for CPUs, use SubtargetFeatures (Part 3). NFCIDiana Picus2016-07-065-7/+40
| | | | | | | | | | | This is a follow-up for r273544 and r273853. The end goal is to get rid of the isSwift / isCortexXY / isWhatever methods. This commit also marks them as obsolete. Differential Revision: http://reviews.llvm.org/D21796 llvm-svn: 274616
* Reverted 274613 due to compilation failure. Elena Demikhovsky2016-07-063-89/+42
| | | | llvm-svn: 274615
* AVX-512: Optimization for patterns with i1 scalar typeElena Demikhovsky2016-07-063-42/+89
| | | | | | | | | | | | The patch removes redundant kmov instructions (not all, we still have a lot of work here) and redundant "and" instructions after "setcc". I use "AssertZero" marker between X86ISD::SETCC node and "truncate" to eliminate extra "and $1" instruction. I also changed zext, aext and trunc patterns in the .td file. This allows removing extra "kmov" instructions. This patch fixes https://llvm.org/bugs/show_bug.cgi?id=28173. Fast ISEL mode is not supported correctly for AVX-512. ICMP/FCMP scalar instruction should return result in k-reg. It will be fixed in one of the next patches. I redirected handling of "cmp" to the DAG builder mode. (The code looks worse in one specific test case, but without this fix the new patch fails). Differential revision: http://reviews.llvm.org/D21956 llvm-svn: 274613
* AMDGPU: Fix return of non-void-returning shadersNicolai Haehnle2016-07-061-6/+4
| | | | | | | | | | | | | | | | | Summary: Since "AMDGPU: Fix verifier errors in SILowerControlFlow", the logic that ensures that a non-void-returning shader falls off the end of the last basic block was effectively disabled, since SI_RETURN is now used. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96731 Reviewers: arsenm, tstellarAMD Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: http://reviews.llvm.org/D21975 llvm-svn: 274612
* AArch64: try to fix optimized build failure.Tim Northover2016-07-051-1/+2
| | | | | | | | | I think the Ops filled out by Regex::match contain pointers into the temporary std::string returned by StringRef::upper. Its lifetime is extended by the call to match, but only until the end of that call (not to the uses of Ops later on). llvm-svn: 274586
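The lifetime problem suspected in this message can be reproduced in miniature. Everything below is a hypothetical stand-in: upper imitates a function returning a temporary std::string, and firstToken imitates a matcher whose results point into its input. The unsafe form appears only in a comment.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Returns a new upper-cased string (a temporary at the call site).
std::string upper(const std::string &s) {
  std::string out(s);
  for (char &c : out)
    c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
  return out;
}

// A non-owning slice; it is valid only while the pointed-to string lives.
struct Slice {
  const char *data;
  std::size_t size;
};

// Hypothetical matcher: the result points into `text`.
Slice firstToken(const std::string &text) {
  std::size_t end = text.find(' ');
  if (end == std::string::npos)
    end = text.size();
  return Slice{text.data(), end};
}

std::string firstTokenUpper(const std::string &input) {
  // Problematic form (not executed here):
  //   Slice tok = firstToken(upper(input));
  //   ... use tok ...  // the temporary is destroyed before this line
  std::string upperCopy = upper(input);  // named, so it outlives the slice
  Slice tok = firstToken(upperCopy);
  return std::string(tok.data, tok.size);
}
```

Binding the upper-cased string to a named variable keeps it alive for as long as the slice is used, which is the kind of change the message describes.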
* [X86][AVX2] Simplified BROADCAST combining to avoid repeated matching attemptsSimon Pilgrim2016-07-051-12/+9
| | | | llvm-svn: 274583