summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
...
* [X86] Make sure the prologue does not clobber EFLAGS when it lives accross it.Quentin Colombet2015-12-021-43/+26
| | | | | | | | This is a superset of the fix done in r254448. This fixes PR25607. llvm-svn: 254478
* AArch64: fix 128-bit shiftsTim Northover2015-12-021-33/+54
| | | | | | | | | | | | | | We mustn't introduce a shift of exactly 64-bits for any inputs, since that's an UNDEF value (and worse, it's not what you want with the natural Arch64 implementation). The generated code is pretty horrific, but I couldn't come up with an obviously better alternative (if the amount is constant EXTR could help). Turns out 128-bit shifts are just nasty. rdar://22491037 llvm-svn: 254475
* AMDGPU: Error on addrspacecasts that aren't actually implementedMatt Arsenault2015-12-011-4/+7
| | | | llvm-svn: 254469
* AMDGPU: Implement isNoopAddrSpaceCastMatt Arsenault2015-12-012-0/+13
| | | | llvm-svn: 254468
* ARM: Change ArchCheck field to uint64_tMatthias Braun2015-12-011-1/+1
| | | | | | | | The values in this field are compared against getAvailableFeatures() which returns an uint64_t. This was causing problems in an internal branch. llvm-svn: 254462
* AMDGPU: Disallow flat_scr in SI assemblerMatt Arsenault2015-12-011-3/+24
| | | | llvm-svn: 254459
* AMDGPU: Optimize VOP2 operand legalizationMatt Arsenault2015-12-013-45/+149
| | | | | | | | | | | | | | | | | | Don't use commuteInstruction, and don't commute if doing so will not improve legality. Skip the more complex checks for literal operands and constant bus restrictions, which are not a concern for VOP2 instructions because src1 does not accept SGPRs or constants and few implicitly read vcc. This gets called quite a few times and the attempts at commuting are a significant fraction of the time spent in SIFixSGPRCopies, so it's somewhat worthwhile to optimize. With this patch and others leading up to it, this reduces the compile time of SIFixSGPRCopies on some of the LuxMark 2 kernels from ~8ms to ~5ms on my system. llvm-svn: 254452
* [X86] Make sure the prologue does not clobber EFLAGS when it lives accross it.Quentin Colombet2015-12-011-1/+32
| | | | | | This fixes PR25629. llvm-svn: 254448
* Fix Thumb1 epilogue generationArtyom Skrobov2015-12-011-12/+55
| | | | | | | | | | | | | | Summary: This had been broken for a very long time, but nobody noticed until D14357 enabled shrink-wrapping by default. Reviewers: jroelofs, qcolombet Subscribers: tyomitch, llvm-commits, rengolin Differential Revision: http://reviews.llvm.org/D14986 llvm-svn: 254444
* [AArch64] Fix a corner case in BitFeild selectWeiming Zhao2015-12-011-5/+11
| | | | | | | | | | | | | | | | | Summary: When not useful bits, BitWidth becomes 0 and APInt will not be happy. See https://llvm.org/bugs/show_bug.cgi?id=25571 We can just mark the operand as IMPLICIT_DEF is none bits of it is used. Reviewers: t.p.northover, jmolloy Subscribers: gberry, jmolloy, mgrang, aemerson, llvm-commits, rengolin Differential Revision: http://reviews.llvm.org/D14803 llvm-svn: 254440
* AMDGPU: Report extractelement as free in cost modelMatt Arsenault2015-12-012-0/+13
| | | | | | | | | | | | The cost for scalarized operations is computed as N * (scalar operation cost + 1 extractelement + 1 insertelement). This partially fixes inflating the cost of scalarized operations since every operation is scalarized and free. I don't think we want any cost asociated with scalarization, but for now insertelement is still counted. I'm not sure if we should pretend that insertelement is also free, or add a way to compute a custom scalarization cost. llvm-svn: 254438
* AMDGPU/SI: Remove REGISTER_STORE/REGISTER_LOAD code which is now deadTom Stellard2015-12-013-81/+0
| | | | | | | | | | Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15050 llvm-svn: 254427
* AMDGPU: Use the default strings for data emission directivesTom Stellard2015-12-011-7/+0
| | | | | | | | | | | | | | Summary: This makes the assembly output look nicer and there is no reason to have custom strings for these. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D14671 llvm-svn: 254426
* [x86] add a convenience method to check for FMA capability; NFCISanjay Patel2015-12-012-5/+4
| | | | llvm-svn: 254425
* AVX-512: fixed asm string of vsqrtss Elena Demikhovsky2015-12-011-2/+2
| | | | | | (vvsqrtss was generated before) llvm-svn: 254411
* [mips][microMIPS] Implement RECIP.fmt, RINT.fmt, ROUND.L.fmt, ROUND.W.fmt, ↵Hrvoje Varga2015-12-015-26/+169
| | | | | | | | SEL.fmt, SELEQZ.fmt, SELNEQZ.fmt and CLASS.fmt Differential Revision: http://reviews.llvm.org/D13885 llvm-svn: 254405
* Introduce new @llvm.get.dynamic.area.offset.i{32, 64} intrinsics.Yury Gribov2015-12-016-0/+60
| | | | | | | | | | | | | | | The @llvm.get.dynamic.area.offset.* intrinsic family is used to get the offset from native stack pointer to the address of the most recent dynamic alloca on the caller's stack. These intrinsics are intendend for use in combination with @llvm.stacksave and @llvm.restore to get a pointer to the most recent dynamic alloca. This is useful, for example, for AddressSanitizer's stack unpoisoning routines. Patch by Max Ostapenko. Differential Revision: http://reviews.llvm.org/D14983 llvm-svn: 254404
* [AArch64] Add ARMv8.2-A Statistical Profiling ExtensionOliver Stannard2015-12-019-1/+170
| | | | | | | | | | | | The Statistical Profiling Extension is an optional extension to ARMv8.2-A. Since it is an optional extension, I have added the FeatureSPE subtarget feature to control it. The assembler-visible parts of this extension are the new "psb csync" instruction, which is equivalent to "hint #17", and a number of system registers. Differential Revision: http://reviews.llvm.org/D15021 llvm-svn: 254401
* [ARM] Add ARMv8.2-A to TargetParserOliver Stannard2015-12-014-1/+15
| | | | | | | | | | | | Add ARMv8.2-A to TargetParser, so that it can be used by the clang command-line options and the .arch directive. Most testing of this will be done in clang, checking that the command-line options that this enables work. Differential Revision: http://reviews.llvm.org/D15037 llvm-svn: 254400
* [ARM] Add subtarget features for ARMv8.2-AOliver Stannard2015-12-014-3/+20
| | | | | | | | | | | | | | This adds subtarget features for ARMv8.2-A, which builds on (and requires the features from) ARMv8.1-A. Most assembler-visible features of ARMv8.2-A are system instructions, and are all required parts of the architecture, so just depend on the HasV8_2aOps subtarget feature. There is also one large, optional feature, which adds 16-bit floating point versions of all existing floating-point instructions (VFP and SIMD), this is represented by the FeatureFullFP16 subtarget feature. Differential Revision: http://reviews.llvm.org/D15036 llvm-svn: 254399
* [X86] Fix patterns for memory forms of FP FSUBR and FDIVR. They need to have ↵Craig Topper2015-12-011-39/+69
| | | | | | | | memory on the left hand side of the fsub/fdiv operations in their patterns. Not sure how to test this. I noticed by inspection in the isel tables where the same pattern tried to produce DIV and DIVR or SUB and SUBR. llvm-svn: 254388
* [X86] Use range-based for loops. NFCCraig Topper2015-12-011-6/+6
| | | | llvm-svn: 254387
* [X86] Use array_lengthof instead of calculating manually. Also change index ↵Craig Topper2015-12-011-7/+7
| | | | | | types to size_t to match. llvm-svn: 254386
* [Hexagon] Use std::begin() and std::end() instead of doing the same ↵Craig Topper2015-12-011-2/+1
| | | | | | manually. NFC llvm-svn: 254385
* [Hexagon] Use array_lengthof and const correct and type correct the array ↵Craig Topper2015-12-011-7/+5
| | | | | | and array size. NFC llvm-svn: 254384
* Use array_lengthof instead of manually calculating it. NFCCraig Topper2015-12-011-2/+2
| | | | llvm-svn: 254383
* [Hexagon] Use ArrayRef to avoid needing to calculate an array size. ↵Craig Topper2015-12-011-16/+11
| | | | | | Interestingly the original code may have had a bug because it was passing the byte size of a uint16_t array instead of the number of entries. llvm-svn: 254382
* [ARM] Use range-based for loops to avoid the need for calculating an array ↵Craig Topper2015-12-011-6/+5
| | | | | | size that I would have otherwise cconverted to array_lengthof. NFC llvm-svn: 254381
* Replace all weight-based interfaces in MBB with probability-based ↵Cong Hou2015-12-015-13/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | interfaces, and update all uses of old interfaces. (This is the second attempt to submit this patch. The first caused two assertion failures and was reverted. See https://llvm.org/bugs/show_bug.cgi?id=25687) The patch in http://reviews.llvm.org/D13745 is broken into four parts: 1. New interfaces without functional changes (http://reviews.llvm.org/D13908). 2. Use new interfaces in SelectionDAG, while in other passes treat probabilities as weights (http://reviews.llvm.org/D14361). 3. Use new interfaces in all other passes. 4. Remove old interfaces. This patch is 3+4 above. In this patch, MBB won't provide weight-based interfaces any more, which are totally replaced by probability-based ones. The interface addSuccessor() is redesigned so that the default probability is unknown. We allow unknown probabilities but don't allow using it together with known probabilities in successor list. That is to say, we either have a list of successors with all known probabilities, or all unknown probabilities. In the latter case, we assume each successor has 1/N probability where N is the number of successors. An assertion checks if the user is attempting to add a successor with the disallowed mixed use as stated above. This can help us catch many misuses. All uses of weight-based interfaces are now updated to use probability-based ones. Differential revision: http://reviews.llvm.org/D14973 llvm-svn: 254377
* Revert r254348: "Replace all weight-based interfaces in MBB with ↵Hans Wennborg2015-12-015-7/+13
| | | | | | | | | | probability-based interfaces, and update all uses of old interfaces." and the follow-up r254356: "Fix a bug in MachineBlockPlacement that may cause assertion failure during BranchProbability construction." Asserts were firing in Chromium builds. See PR25687. llvm-svn: 254366
* Squelch unused variable warning in SIRegisterInfo.cpp.Matt Arsenault2015-12-011-1/+2
| | | | | | Patch by Justin Lebar llvm-svn: 254362
* Replace all weight-based interfaces in MBB with probability-based ↵Cong Hou2015-12-015-13/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | interfaces, and update all uses of old interfaces. The patch in http://reviews.llvm.org/D13745 is broken into four parts: 1. New interfaces without functional changes (http://reviews.llvm.org/D13908). 2. Use new interfaces in SelectionDAG, while in other passes treat probabilities as weights (http://reviews.llvm.org/D14361). 3. Use new interfaces in all other passes. 4. Remove old interfaces. This patch is 3+4 above. In this patch, MBB won't provide weight-based interfaces any more, which are totally replaced by probability-based ones. The interface addSuccessor() is redesigned so that the default probability is unknown. We allow unknown probabilities but don't allow using it together with known probabilities in successor list. That is to say, we either have a list of successors with all known probabilities, or all unknown probabilities. In the latter case, we assume each successor has 1/N probability where N is the number of successors. An assertion checks if the user is attempting to add a successor with the disallowed mixed use as stated above. This can help us catch many misuses. All uses of weight-based interfaces are now updated to use probability-based ones. Differential revision: http://reviews.llvm.org/D14973 llvm-svn: 254348
* [X86][FMA4] Prefer FMA4 to FMASimon Pilgrim2015-11-301-3/+4
| | | | | | | | | | | | | | | | We currently output FMA instructions on targets which support both FMA4 + FMA (i.e. later Bulldozer CPUS bdver2/bdver3/bdver4). This patch flips this so FMA4 is preferred; this is for several reasons: 1 - FMA4 is non-destructive reducing the need for mov instructions. 2 - Its more straighforward to commute and fold inputs (although the recent work on FMA has reduced this difference). 3 - All supported targets have FMA4 performance equal or better to FMA - Piledriver (bdver2) in particular has half the throughput when executing FMA instructions. Its looks like no future AMD processor lines will support FMA4 after the Bulldozer series so we're not causing problems for later CPUs. Differential Revision: http://reviews.llvm.org/D14997 llvm-svn: 254339
* AMDGPU: Fix unused functionMatt Arsenault2015-11-301-5/+0
| | | | llvm-svn: 254333
* AMDGPU: Error if too many user SGPRs usedMatt Arsenault2015-11-302-0/+8
| | | | llvm-svn: 254332
* AMDGPU: Rework how private buffer passed for HSAMatt Arsenault2015-11-309-102/+594
| | | | | | | | | | | | | | | | If we know we have stack objects, we reserve the registers that the private buffer resource and wave offset are passed and use them directly. If not, reserve the last 5 SGPRs just in case we need to spill. After register allocation, try to pick the next available registers instead of the last SGPRs, and then insert copies from the inputs to the reserved registers in the progloue. This also only selectively enables all of the input registers which are really required instead of always enabling them. llvm-svn: 254331
* AMDGPU: Rename enums to be consistent with HSA code object terminologyMatt Arsenault2015-11-305-50/+49
| | | | llvm-svn: 254330
* AMDGPU: Remove SIPrepareScratchRegsMatt Arsenault2015-11-3012-249/+123
| | | | | | | | | | | | | | | | | | | | | | It does not work because of emergency stack slots. This pass was supposed to eliminate dummy registers for the spill instructions, but the register scavenger can introduce more during PrologEpilogInserter, so some would end up left behind if they were needed. The potential for spilling the scratch resource descriptor and offset register makes doing something like this overly complicated. Reserve registers to use for the resource descriptor and use them directly in eliminateFrameIndex. Also removes creating another scratch resource descriptor when directly selecting scratch MUBUF instructions. The choice of which registers are reserved is temporary. For now it attempts to pick the next available registers after the user and system SGPRs. llvm-svn: 254329
* AMDGPU: Use assert zext for workgroup sizesMatt Arsenault2015-11-302-10/+24
| | | | llvm-svn: 254328
* [ARM] For old thumb ISA like v4t, we cannot use PC directly in pop.Quentin Colombet2015-11-301-18/+5
| | | | | | Fix the epilogue emission to account for that. llvm-svn: 254325
* [X86] Add RIP to GR64_TCW64David Majnemer2015-11-301-1/+1
| | | | | | | | | The MachineVerifier wants to check that the register operands of an instruction belong to the instruction's register class. RIP-relative control flow instructions violated this by referencing RIP. While this was fixed for SysV, it was never fixed for Win64. llvm-svn: 254315
* Enable shrink wrapping for PPC64Kit Barton2015-11-301-6/+14
| | | | | | | | | | | Re-enable shrink wrapping for PPC64 Little Endian. One minor modification to PPCFrameLowering::findScratchRegister was necessary to handle fall-thru blocks (blocks with no terminator) correctly. Tested with all LLVM test, clang tests, and the self-hosting build, with no problems found. PHabricator: http://reviews.llvm.org/D14778 llvm-svn: 254314
* [WebAssembly] Fix a few minor compiler warnings. NFC.Dan Gohman2015-11-301-7/+7
| | | | llvm-svn: 254311
* fix formatting; NFCSanjay Patel2015-11-301-6/+7
| | | | llvm-svn: 254310
* [Hexagon] NFC Reordering headers.Colin LeMahieu2015-11-301-1/+1
| | | | llvm-svn: 254307
* AMDGPU: Don't reserve SCRATCH_PTR input registerMatt Arsenault2015-11-301-12/+4
| | | | | | This hasn't been doing anything since using relocations was added. llvm-svn: 254304
* Silencing a 32-bit to 64-bit implicit conversion warning; NFC.Aaron Ballman2015-11-301-1/+1
| | | | llvm-svn: 254302
* [mips][microMIPS] Implement LBUX, LHX, LWX, MAQ_S[A].W.PHL, MAQ_S[A].W.PHR, ↵Hrvoje Varga2015-11-303-11/+75
| | | | | | | | MFHI, MFLO, MTHI and MTLO instructions Differential Revision: http://reviews.llvm.org/D14436 llvm-svn: 254297
* [mips][microMIPS] Fix issue with offset operand of BALC and BC instructionsZoran Jovanovic2015-11-304-2/+50
| | | | | | | Value of offset operand for microMIPS BALC and BC instructions is currently shifted 2 bits, but it should be 1 bit. Differential Revision: http://reviews.llvm.org/D14770 llvm-svn: 254296
* [mips][microMIPS] Implement PRECR.QB.PH, PRECR_SRA[_R].PH.W, PRECRQ.PH.W, ↵Zlatko Buljan2015-11-303-7/+41
| | | | | | | | PRECRQ.QB.PH, PRECRQU_S.QB.PH and PRECRQ_RS.PH.W instructions Differential Revision: http://reviews.llvm.org/D14605 llvm-svn: 254291
OpenPOWER on IntegriCloud