path: root/llvm/lib/Target
Commit message | Author | Age | Files | Lines
* [Hexagon] Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes (NFC). | Eugene Zelenko | 2017-08-01 | 25 | -398/+562
  llvm-svn: 309746
* [AArch64] Rewrite stack frame handling for win64 vararg functions | Martin Storsjo | 2017-08-01 | 1 | -22/+30
  The previous attempt, which made do with a single offset in computeCalleeSaveRegisterPairs, wasn't quite enough. It only worked as long as CombineSPBump == true (since the offset would be adjusted later in fixupCalleeSaveRestoreStackOffset).

  Instead, include the size of the fixed stack area used for win64 varargs in the calculations in emitPrologue/emitEpilogue. The stack consists mainly of three parts:
  - AFI->getLocalStackSize()
  - AFI->getCalleeSavedStackSize()
  - FixedObject

  Most of the places in the code which previously used CSStackSize now use PrologueSaveSize instead, which is the sum of the latter two, while some cases which need exactly the middle one use AFI->getCalleeSavedStackSize() explicitly instead of a local variable.

  In addition to moving the offsetting into emitPrologue/emitEpilogue (which fixes functions with CombineSPBump == false), also set the frame pointer to point to the right location, where the frame pointer and link register are actually stored. In addition to the prologue/epilogue, this also requires changes to resolveFrameIndexReference.

  Add tests for a function that keeps a frame pointer and another one that uses a VLA.

  Differential Revision: https://reviews.llvm.org/D35919
  llvm-svn: 309744
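The size bookkeeping described in this commit message can be sketched as plain arithmetic. This is only an illustration, not the LLVM code: the class, its field names, and the byte counts are hypothetical, and only the relationship "PrologueSaveSize is the sum of the callee-saved size and FixedObject" comes from the message.

```python
from dataclasses import dataclass


@dataclass
class FrameSizes:
    # Hypothetical stand-ins for the three parts named in the commit message.
    local_stack_size: int         # cf. AFI->getLocalStackSize()
    callee_saved_stack_size: int  # cf. AFI->getCalleeSavedStackSize()
    fixed_object: int             # fixed area for win64 varargs

    @property
    def prologue_save_size(self) -> int:
        # "the sum of the latter two" parts
        return self.callee_saved_stack_size + self.fixed_object

    @property
    def total_size(self) -> int:
        # All three parts together
        return self.local_stack_size + self.prologue_save_size


# Made-up byte counts, purely for illustration.
frame = FrameSizes(local_stack_size=32, callee_saved_stack_size=16, fixed_object=64)
print(frame.prologue_save_size, frame.total_size)
```

Keeping the middle term available on its own mirrors the remark that some call sites need exactly the callee-saved size rather than the combined value.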
* AMDGPU: Fix handling of div_scale with undef inputs | Matt Arsenault | 2017-08-01 | 1 | -1/+55
  The src0 register must match src1 or src2, but if these were undefined they could end up using different implicit_defed virtual registers. Force these to use one undef vreg or pick the defined other register.

  Also fixes producing invalid nodes without the right number of inputs when src2 is undef.

  llvm-svn: 309743
* AMDGPU: Initial implementation of calls | Matt Arsenault | 2017-08-01 | 18 | -14/+574
  Includes a hack to fix the type selected for the GlobalAddress of the function, which will be fixed by changing the default datalayout to use generic pointers for 0.

  llvm-svn: 309732
* [AMDGPU] Put a function used only inside assert() under NDEBUG. | Davide Italiano | 2017-08-01 | 1 | -0/+4
  llvm-svn: 309723
* [lanai] Add getIntImmCost in LanaiTargetTransformInfo. | Jacques Pienaar | 2017-08-01 | 1 | -0/+27
  Add simple int immediate cost function.

  llvm-svn: 309721
* [X86][SSE] Added missing vector logic intrinsic schedules | Simon Pilgrim | 2017-08-01 | 1 | -10/+6
  Improves atom scheduler test coverage (to make it easier to upgrade them for PR32431).

  Merged SSE_VEC_BIT_ITINS_P + SSE_BIT_ITINS_P as we were interchanging between them.

  llvm-svn: 309715
* [X86] Use BEXTR/BEXTRI for 64-bit 'and' with a large mask | Craig Topper | 2017-08-01 | 1 | -5/+36
  Summary: The 64-bit 'and' with immediate instruction only supports a 32-bit immediate. So for larger constants we have to load the constant into a register first. If the immediate happens to be a mask we can use the BEXTRI instruction to perform the masking. We already do something similar using the BZHI instruction from the BMI2 instruction set.

  Reviewers: RKSimon, spatel
  Reviewed By: RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D36129
  llvm-svn: 309706
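A small model of the idea in this commit: when a 64-bit AND constant is a low-bit mask that does not fit a sign-extended 32-bit immediate, the same result can be obtained by a bit-field extract starting at bit 0. The helper names below are hypothetical, and the exact conditions the real isel pattern checks are an assumption. The control-word layout (start in bits 7:0, length in bits 15:8) follows the documented BEXTR encoding.

```python
def is_low_mask(imm: int) -> bool:
    """True if imm is a non-empty run of ones starting at bit 0, e.g. 0x3FF."""
    return imm != 0 and (imm & (imm + 1)) == 0


def fits_simm32(imm: int) -> bool:
    """True if the 64-bit value imm is representable as a sign-extended imm32."""
    signed = imm - (1 << 64) if imm >= (1 << 63) else imm
    return -(1 << 31) <= signed < (1 << 31)


def bextr_control_for_and(imm: int):
    """Return a BEXTR control word equivalent to 'x & imm', or None.

    None means the sketch would keep a plain AND (the immediate fits,
    or it is not a low-bit mask).
    """
    if not is_low_mask(imm) or fits_simm32(imm):
        return None
    start, length = 0, imm.bit_length()
    return (length << 8) | start


def bextr(src: int, control: int) -> int:
    """Reference model of BEXTR on a 64-bit source."""
    start = control & 0xFF
    length = (control >> 8) & 0xFF
    return (src >> start) & ((1 << length) - 1)


mask40 = (1 << 40) - 1
control = bextr_control_for_and(mask40)
print(hex(control))
```

For the 40-bit mask above, extracting 40 bits from bit 0 gives the same value as the AND, without first materializing the constant in a register.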
* [X86][SSE] Added missing PACKSS/PACKUS intrinsic schedules | Simon Pilgrim | 2017-08-01 | 3 | -8/+10
  Improves atom scheduler test coverage (to make it easier to upgrade them for PR32431).

  Checked on Agner that these actually match the UNPACK schedules, but better to include a separate class.

  llvm-svn: 309701
* [X86][SSSE3] Added missing PHADDS/PHSUBS/PSIGN intrinsic schedules | Simon Pilgrim | 2017-08-01 | 1 | -2/+2
  llvm-svn: 309699
* [AVX-512] Don't use unmasked VMOVDQU8/16 for 8-bit or 16-bit element stores even when BWI instructions are supported. Always use VMOVDQA32/VMOVDQU32. | Craig Topper | 2017-08-01 | 1 | -13/+29
  We were already using the 32 bit element opcode if BWI isn't enabled, but there's no reason to change opcode if we have BWI. We will still use the 8/16 opcodes for masked stores though.

  This allows us to use the aligned opcode when we can, which makes our test output more consistent between different modes. It also reduces the number of isel patterns we need.

  This is a slight inconsistency with loads, which default to 64 bit element opcodes. I'll probably rectify that in a future patch.

  Differential Revision: https://reviews.llvm.org/D35978
  llvm-svn: 309693
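The selection rule this message describes for stores of 8-bit and 16-bit elements can be written down as a small decision function. This is a sketch of the stated policy only: the function name is hypothetical, the returned strings are base mnemonics without LLVM's register-width suffixes, and it deliberately does not cover 32/64-bit elements, which the message does not discuss.

```python
def select_narrow_store_opcode(elt_bits: int, aligned: bool, masked: bool) -> str:
    """Pick a store mnemonic for vectors of 8-bit or 16-bit elements."""
    if elt_bits not in (8, 16):
        raise ValueError("this sketch only models 8-bit and 16-bit elements")
    if masked:
        # Masked stores still need an opcode matching the element size.
        return f"VMOVDQU{elt_bits}"
    # Unmasked stores use the 32-bit element opcode, so the aligned
    # form can be chosen whenever the address is known to be aligned.
    return "VMOVDQA32" if aligned else "VMOVDQU32"


print(select_narrow_store_opcode(8, aligned=True, masked=False))
```

Because no aligned 8/16-bit element form exists, routing unmasked stores through the 32-bit opcodes is what lets the alignment information survive in the output.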
* [Mips] Fix for BBIT octeon instruction | Strahinja Petrovic | 2017-08-01 | 1 | -1/+7
  This patch enables control flow optimization for variations of the BBIT instruction. In this case the optimization removes an unnecessary branch after the BBIT instruction.

  Differential Revision: https://reviews.llvm.org/D35359
  llvm-svn: 309679
* [Hexagon] Convert HVX vector constants of i1 to i8 | Krzysztof Parzyszek | 2017-08-01 | 1 | -0/+36
  Certain operations require vectors of i1 values. However, for Hexagon architecture compatibility, they need to be represented as vectors of i8.

  Patch by Suyog Sarda.

  llvm-svn: 309677
* AMDGPU/GlobalISel: Add support for amdgpu_vs calling convention | Tom Stellard | 2017-08-01 | 1 | -4/+24
  Reviewers: arsenm
  Reviewed By: arsenm
  Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, llvm-commits, t-tye
  Differential Revision: https://reviews.llvm.org/D35916
  llvm-svn: 309675
* [AVX-512] Add unmasked subvector inserts and extract to the execution domain tables. | Craig Topper | 2017-07-31 | 1 | -0/+24
  llvm-svn: 309632
* [X86][MMX] Added custom lowering action for MMX SELECT (PR30418) | Konstantin Belochapka | 2017-07-31 | 1 | -0/+13
  Fix for pr30418 - error in backend: Cannot select: t17: x86mmx = select_cc t2, Constant:i64<0>, t7, t8, seteq:ch

  Differential Revision: https://reviews.llvm.org/D34661
  llvm-svn: 309614
* [AVX-512] Remove patterns that select vmovdqu8/16 for unmasked loads. Prefer vmovdqa64/vmovdqu64 instead. | Craig Topper | 2017-07-31 | 1 | -11/+18
  These were taking priority over the aligned load instructions since there is no vmovdqa8/16. I don't think there is really a difference between aligned and unaligned on newer CPUs, so I don't think it matters which instructions we use.

  But with this change we reduce the size of the isel table a little, and we allow the alignment information to pass through to the evex->vex pass and produce the same output as avx/avx2 in some cases.

  I also generally dislike patterns rooted in a bitcast, which these were.

  Differential Revision: https://reviews.llvm.org/D35977
  llvm-svn: 309589
* Strip trailing whitespace. NFCI. | Simon Pilgrim | 2017-07-31 | 1 | -7/+7
  llvm-svn: 309584
* Fix typo in comment. | Simon Pilgrim | 2017-07-31 | 1 | -1/+1
  llvm-svn: 309583
* [GISel]: Support Widening G_ICMP's destination operand. | Aditya Nandakumar | 2017-07-31 | 2 | -6/+10
  Updated AArch64 to widen destination to s32.

  https://reviews.llvm.org/D35737
  Reviewed by Tim

  llvm-svn: 309579
* Do not recombine FMA when that is not needed. | Amaury Sechet | 2017-07-31 | 1 | -4/+16
  Summary: As per title. This creates useless recombines.

  Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D33848
  llvm-svn: 309578
* Exclude more unused functions from release build. | Florian Hahn | 2017-07-31 | 1 | -0/+4
  llvm-svn: 309576
* [Cost] Rename getReductionCost() to getArithmeticReductionCost(), NFC. | Alexey Bataev | 2017-07-31 | 2 | -4/+5
  llvm-svn: 309563
* Guard print() functions only used by dump() functions. | Florian Hahn | 2017-07-31 | 3 | -1/+9
  Summary: Since r293359, most dump() functions are only defined when `!defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)` holds. print() functions only used by dump() functions are now unused in release builds, generating lots of warnings. This patch only defines some print() functions if they are used.

  Reviewers: MatzeB
  Reviewed By: MatzeB
  Subscribers: arsenm, mzolotukhin, nhaehnle, llvm-commits
  Differential Revision: https://reviews.llvm.org/D35949
  llvm-svn: 309553
* [X86][AVX512] Add masked MOVS[S|D] patterns | Guy Blank | 2017-07-31 | 1 | -0/+16
  Added patterns to recognize that an AND with 1 on the mask of a scalar masked move is not needed, since only the lowest bit is relevant for the instruction.

  Differential Revision: https://reviews.llvm.org/D35897
  llvm-svn: 309546
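Why the AND with 1 is redundant can be seen from a tiny model of a scalar masked move. The function below is a hypothetical stand-in for the instruction's select behavior, not LLVM code: it consults only bit 0 of the mask, so pre-masking the mask with 1 cannot change the result.

```python
def masked_scalar_move(mask: int, src: float, passthru: float) -> float:
    """Model of a masked MOVSS/MOVSD: only bit 0 of the mask is consulted."""
    return src if (mask & 1) else passthru


# The explicit "mask & 1" applied before the move is a no-op for every mask value.
redundant = all(
    masked_scalar_move(m & 1, 1.5, -2.0) == masked_scalar_move(m, 1.5, -2.0)
    for m in range(256)
)
print(redundant)
```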
* [PowerPC] Change method names; NFC | Hiroshi Inoue | 2017-07-31 | 1 | -24/+25
  Changed method names based on the discussion in https://reviews.llvm.org/D34986:
  getInt64 -> selectI64Imm,
  getInt64Count -> selectI64ImmInstrCount.

  llvm-svn: 309541
* [X86] Add pattern to use bzhi for 64-bit 'and' with a mask when there is a load involved. | Craig Topper | 2017-07-31 | 1 | -0/+4
  We already had a pattern without load, but with a load we were falling back to a regular 'and' due to pattern complexity priority.

  llvm-svn: 309535
* [x86][inline-asm][ms-compat] legalize the use of "jc/jz short <op>" | Coby Tayree | 2017-07-30 | 1 | -1/+2
  MS ignores the keyword "short" when used after a jc/jz instruction; LLVM ought to do the same.

  Test: D35893
  Differential Revision: https://reviews.llvm.org/D35892
  llvm-svn: 309509
* [X86] Add addsub intrinsics to the intrinsic lowering table so we have a single set of isel patterns. | Craig Topper | 2017-07-30 | 2 | -48/+24
  llvm-svn: 309502
* [AArch64] Tie source and destination operands for AESMC/AESIMC. | Florian Hahn | 2017-07-29 | 3 | -1/+43
  Summary: Most CPUs implementing AES fusion require instruction pairs of the form

    AESE Vn, _
    AESMC Vn, Vn

  and

    AESD Vn, _
    AESIMC Vn, Vn

  The constraint is added to AES(I)MC instructions which use the result of an AES(E|D) instruction by using AES(I)MCTrr pseudo instructions, which constrain source and destination registers to be the same.

  A nice side effect of this change is that now all possible pairs are scheduled back-to-back on the exynos-m1 for the misched-fusion-aes.ll test case.

  I had to update aes_load_store. The version I added initially was very reduced and, with the new constraint, AESE/AESMC could not be scheduled back-to-back. I updated the test to be more realistic and still expose the same scheduling problem as the initial test case.

  Reviewers: t.p.northover, rengolin, evandro, kristof.beyls, silviu.baranga
  Reviewed By: t.p.northover, evandro
  Subscribers: aemerson, javed.absar, llvm-commits
  Differential Revision: https://reviews.llvm.org/D35299
  llvm-svn: 309495
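The pair shape required for fusion can be expressed as a simple predicate. This is an illustrative check only: instructions are modeled as hypothetical (opcode, destination, source) tuples, which is not how LLVM represents machine instructions.

```python
# Which second instruction may fuse with which first instruction.
FUSIBLE = {"AESE": "AESMC", "AESD": "AESIMC"}


def is_fusible_pair(first, second) -> bool:
    """Check the 'AESE Vn, _ ; AESMC Vn, Vn' shape (and the AESD/AESIMC one).

    Each instruction is a (opcode, dst, src) tuple of strings.
    """
    op1, dst1, _ = first
    op2, dst2, src2 = second
    return FUSIBLE.get(op1) == op2 and dst2 == dst1 and src2 == dst1


print(is_fusible_pair(("AESE", "v0", "v1"), ("AESMC", "v0", "v0")))
```

Tying the source and destination of AES(I)MC is what guarantees the last two conditions hold after register allocation, instead of leaving them to chance.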
* [AArch64] Use 8 bytes as preferred function alignment on Cortex-A53. | Florian Hahn | 2017-07-29 | 1 | -1/+3
  Summary: This change gives a 0.25% speedup on execution time, a 0.82% improvement in benchmark scores and a 0.20% increase in binary size on a Cortex-A53. These numbers are the geomean results on a wide range of benchmarks from the test-suite and a range of proprietary suites.

  Reviewers: t.p.northover, aadg, silviu.baranga, mcrosier, rengolin
  Reviewed By: rengolin
  Subscribers: grimar, davide, aemerson, rengolin, javed.absar, kristof.beyls, llvm-commits
  Differential Revision: https://reviews.llvm.org/D35568
  llvm-svn: 309494
* [SelectionDAG][X86] CombineBT - more aggressively determine demanded bits | Simon Pilgrim | 2017-07-29 | 1 | -12/+8
  This patch is in 2 parts:

  1 - replace combineBT's use of SimplifyDemandedBits (hasOneUse only) with SelectionDAG::GetDemandedBits to more aggressively determine the lower bits used by BT.

  2 - update SelectionDAG::GetDemandedBits to support ANY_EXTEND - if the demanded bits are only in the non-extended portion, then peek through and demand from the source value and then ANY_EXTEND that if we found a match.

  Differential Revision: https://reviews.llvm.org/D35896
  llvm-svn: 309486
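The phrase "the lower bits used by BT" can be made concrete with a small model. With a register bit index, x86 BT takes the index modulo the operand width, so for a power-of-two width only the low log2(width) bits of the index matter. The function names here are hypothetical, and the sketch models the instruction's semantics rather than the DAG combine itself.

```python
def bt(value: int, index: int, width: int = 32) -> int:
    """Model of BT with a register index: test bit (index mod width) of value."""
    return (value >> (index % width)) & 1


def demanded_index_mask(width: int) -> int:
    """Mask of the index bits BT actually reads, for a power-of-two width."""
    return width - 1


value = 0xDEADBEEF
# Discarding the undemanded upper index bits never changes the result.
same = all(
    bt(value, idx) == bt(value, idx & demanded_index_mask(32))
    for idx in range(1024)
)
print(same)
```

This is why an index computation feeding BT can often be simplified: anything that only affects the upper index bits is dead.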
* AMDGPU: Remove deadcode from AMDGPUInstPrinter | Tom Stellard | 2017-07-29 | 3 | -28/+1
  Reviewers: arsenm
  Reviewed By: arsenm
  Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, llvm-commits, t-tye
  Differential Revision: https://reviews.llvm.org/D36034
  llvm-svn: 309477
* AMDGPU: Move INDIRECT_BASE_ADDR definition out of common files | Tom Stellard | 2017-07-29 | 3 | -3/+1
  Summary: This is only used by R600.

  Reviewers: arsenm
  Reviewed By: arsenm
  Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, llvm-commits, t-tye
  Differential Revision: https://reviews.llvm.org/D35926
  llvm-svn: 309476
* [MachineOutliner] NFC: Change IsTailCall to a call class + frame class | Jessica Paquette | 2017-07-29 | 4 | -160/+221
  This commit
  - Removes IsTailCall and replaces it with a target-defined unsigned
  - Refactors getOutliningCallOverhead and getOutliningFrameOverhead so that they don't use IsTailCall
  - Adds a call class + frame class classification to OutlinedFunction and Candidate respectively

  This accomplishes a couple of things. Firstly, we don't need the notion of *tail call* in the general outlining algorithm. Secondly, we can now have different "outlining classes" for each candidate within a set of candidates. This will make it easy to add new ways to outline sequences for certain targets and dynamically choose an appropriate cost model for a sequence depending on the context that the sequence lives in.

  Ultimately, this should get us closer to being able to do something like, say, avoid saving the link register when outlining AArch64 instructions.

  llvm-svn: 309475
* AMDGPU: Make areMemAccessesTriviallyDisjoint more aware of segment flat | Matt Arsenault | 2017-07-29 | 2 | -1/+9
  Checking the encoding is insufficient since now there can be global or scratch instructions.

  llvm-svn: 309472
* AMDGPU: Teach isLegalAddressingMode about global_* instructions | Matt Arsenault | 2017-07-29 | 2 | -16/+25
  Also refine the flat check to respect the flat-for-global feature; the constant fallback should check global handling, not specifically MUBUF.

  llvm-svn: 309471
* AMDGPU: Start selecting global instructions | Matt Arsenault | 2017-07-29 | 3 | -7/+107
  llvm-svn: 309470
* [Hexagon] Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes (NFC). | Eugene Zelenko | 2017-07-29 | 8 | -216/+279
  llvm-svn: 309469
* Remove the unused offset from DBG_VALUE (NFC) | Adrian Prantl | 2017-07-28 | 1 | -2/+3
  Followup to r309426.

  rdar://problem/33580047

  llvm-svn: 309450
* [Hexagon] Formatting changes, NFC | Krzysztof Parzyszek | 2017-07-28 | 1 | -66/+49
  llvm-svn: 309442
* AMDGPU: Look through a bitcast user of an out argument | Matt Arsenault | 2017-07-28 | 1 | -16/+101
  This allows handling of a lot more of the interesting cases in Blender. Most of the large functions that are unlikely to be inlined have this pattern.

  This is a special case for what clang emits for OpenCL 3 element vectors. Annoyingly, these are emitted as <3 x elt>* pointers, but accessed as <4 x elt>* operations. This also needs to handle cases where a struct containing a single vector is used.

  llvm-svn: 309419
* AMDGPU: Add pass to replace out arguments | Matt Arsenault | 2017-07-28 | 4 | -0/+381
  It is better to return arguments directly in registers if we are making a call rather than introducing expensive stack usage. In one sample compile of one of Blender's many kernel variants, this fires on about 20 different functions. Future improvements may be to recognize simple cases where the pointer is indexing a small array. This also fails when the store to the out argument is in a separate block from the return, which happens in a few of the Blender functions. This should also probably be using MemorySSA, which might help with that.

  I'm not sure this is correct as a FunctionPass, but MemoryDependenceAnalysis seems to not work with a ModulePass.

  I'm also not sure where it should run. I think it should run before DeadArgumentElimination, so maybe either EP_CGSCCOptimizerLate or EP_ScalarOptimizerLate.

  llvm-svn: 309416
* GlobalISel: map 128-bit values to an FPR by default. | Tim Northover | 2017-07-28 | 1 | -1/+2
  Eventually we may want to allow a pair of GPRs but absolutely nothing in the entire world is ready for that yet.

  llvm-svn: 309404
* AMDGPU: Annotate implicitarg.ptr usage | Matt Arsenault | 2017-07-28 | 6 | -6/+32
  We need to pass something to functions for this to work. It isn't derivable just from the kernarg segment pointer because the implicit arguments are placed after the kernel arguments.

  Also fixes missing test for the intrinsic.

  llvm-svn: 309398
* [AArch64] Standardize suffixes for LSE Atomics mnemonics (NFCI) | Joel Jones | 2017-07-28 | 3 | -130/+130
  This NFC changeset standardizes the suffixes used for LSE Atomics instructions. It changes the existing suffixes - 'b', 'h', 's', 'd' - to the existing standard 'B', 'H', 'W' and 'X'.

  This changeset is the result of the code review discussion for D35319.

  Patch by: steleman
  Differential Revision: https://reviews.llvm.org/D35927
  llvm-svn: 309384
* [ARM] Add the option to directly access TLS pointer | Strahinja Petrovic | 2017-07-28 | 3 | -1/+16
  This patch enables a choice of how the thread local storage pointer is accessed (like '-mtp' in gcc).

  Differential Revision: https://reviews.llvm.org/D34408
  llvm-svn: 309381
* [MachineOutliner] NFC: Split up getOutliningBenefit | Jessica Paquette | 2017-07-28 | 4 | -247/+250
  This is some more cleanup in preparation for some actual functional changes. This splits getOutliningBenefit into two cost functions: getOutliningCallOverhead and getOutliningFrameOverhead. These functions return the number of instructions that would be required to call a specific function and the number of instructions that would be required to construct a frame for a specific function. The actual outlining benefit logic is moved into the outliner, which calls these functions.

  The goal of refactoring getOutliningBenefit is to:
  - Get us closer to getting rid of the IsTailCall flag
  - Further split up "target-specific" things and "general algorithm" things

  llvm-svn: 309356
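One way the outliner could combine the two overheads into a benefit figure is sketched below. The formula is an assumption made for illustration (the message does not spell it out): it compares the instruction count of keeping every occurrence inline against the count after outlining, where each occurrence becomes a call and the sequence is emitted once with its frame. All names and numbers are hypothetical.

```python
def outlining_benefit(occurrences: int, sequence_length: int,
                      call_overhead: int, frame_overhead: int) -> int:
    """Instructions saved by outlining a repeated sequence (assumed model).

    call_overhead and frame_overhead play the roles of the values returned
    by getOutliningCallOverhead and getOutliningFrameOverhead.
    """
    not_outlined = occurrences * sequence_length
    outlined = occurrences * call_overhead + sequence_length + frame_overhead
    # Clamp at zero: a negative value just means "not worth outlining".
    return max(not_outlined - outlined, 0)


# A 10-instruction sequence seen 4 times, with a 1-instruction call and frame.
print(outlining_benefit(4, 10, call_overhead=1, frame_overhead=1))
```

Under this model the target only supplies the two overhead numbers, while the comparison itself stays in the general algorithm, which matches the stated goal of separating target-specific from general logic.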
* ARMFrameLowering: Only set ExtraCSSpill for actually unused registers. | Matthias Braun | 2017-07-28 | 1 | -9/+18
  The code assumed that unclobbered/unspilled callee saved registers are unused in the function. This is not true for callee saved registers that are also used to pass parameters such as swiftself.

  rdar://33401922

  llvm-svn: 309350
* [X86] Fix latent bug in sibcall eligibility logic | Reid Kleckner | 2017-07-28 | 1 | -0/+7
  The X86 tail call eligibility logic was correct when it was written, but the addition of inalloca and argument copy elision broke its assumptions. It was assuming that fixed stack objects were immutable.

  Currently, we aim to emit a tail call if no arguments have to be re-arranged in memory. This code would trace the outgoing argument values back to check if they are loads from an incoming stack object. If the stack argument is immutable, then we won't need to store it back to the stack when we tail call.

  Fortunately, stack objects track their mutability, so we can just make the obvious check to fix the bug.

  This was http://crbug.com/749826

  llvm-svn: 309343
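The "obvious check" can be illustrated with a toy model of the eligibility test for one stack-passed argument. The data structure and function below are hypothetical, and the offset comparison is an assumption about what "no re-arranging in memory" means. The only part taken from the message is that the traced-back incoming stack object must also be immutable.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FixedStackObject:
    offset: int      # position of the incoming argument slot
    immutable: bool  # whether the function may have written to the slot


def arg_stays_in_place(loaded_from: Optional[FixedStackObject],
                       outgoing_offset: int) -> bool:
    """True if an outgoing stack argument needs no store before a sibcall.

    loaded_from is the incoming stack object the value was traced back to,
    or None if the value does not come from such a load.
    """
    if loaded_from is None:
        return False
    if loaded_from.offset != outgoing_offset:
        return False
    # The fix: a mutable slot (e.g. inalloca or an elided argument copy)
    # may no longer hold the original value, so it cannot be reused as-is.
    return loaded_from.immutable


print(arg_stays_in_place(FixedStackObject(offset=8, immutable=False), 8))
```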