summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
...
* [AArch64] fold 'isPositive' vector integer operations (PR26819)Sanjay Patel2016-03-031-1/+30
| | | | | | | | | | | | | | | | This is one of the cases shown in: https://llvm.org/bugs/show_bug.cgi?id=26819 Shift and negate is what InstCombine prefers to produce (and I tried to make it do more of that in http://reviews.llvm.org/rL262424 ), so we should recognize that pattern as something that might come from autovectorization even if it's unlikely to be produced from C NEON intrinsics. The patch is based on the x86 equivalent: http://reviews.llvm.org/rL262036 Differential Revision: http://reviews.llvm.org/D17834 llvm-svn: 262623
* AVX512: Combine AND + TESTM instructions .Igor Breger2016-03-033-8/+25
| | | | | | Differential Revision: http://reviews.llvm.org/D17844 llvm-svn: 262621
* [X86][SSE] Improve vector ZERO_EXTEND by combining to ZERO_EXTEND_VECTOR_INREGSimon Pilgrim2016-03-031-9/+24
| | | | | | | | Generalise the existing SIGN_EXTEND to SIGN_EXTEND_VECTOR_INREG combine to support zero extension as well and get rid of a lot of unnecessary ANY_EXTEND + mask patterns. Differential Revision: http://reviews.llvm.org/D17691 llvm-svn: 262599
* Revert "[ARM] Merging 64-bit divmod lib calls into one"Renato Golin2016-03-031-9/+0
| | | | | | This reverts commit r262507, which broke some ARM buildbots. llvm-svn: 262594
* [LLVM][AVX512] PSRLWI Chnage imm8 to intMichael Zuckerman2016-03-031-3/+3
| | | | | | Differential Revision: http://reviews.llvm.org/D17753 llvm-svn: 262592
* AMDGPU: Insert two S_NOP instructions for every high level source statement.Tom Stellard2016-03-034-0/+110
| | | | | | | | | | | | | | Patch by: Konstantin Zhuravlyov Summary: Tools, such as debugger, need to pause execution based on user input (i.e. breakpoint). In order to do this, two S_NOP instructions are inserted for each high level source statement: one before first isa instruction of high level source statement, and one after last isa instruction of high level source statement. Further, debugger may replace S_NOP instructions with S_TRAP instructions based on user input. Reviewers: tstellarAMD, arsenm Subscribers: echristo, dblaikie, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17454 llvm-svn: 262579
* AMDGPU/SI: Don't try to move scratch wave offset when there are no free SGPRsTom Stellard2016-03-031-3/+15
| | | | | | | | | | | | | | Summary: When there were no free SGPRs, we were trying to move this value into some of the reserved registers which was causing a segmentation fault. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17590 llvm-svn: 262577
* [X86] Enable forwarding bool arguments in tail calls (PR26305)Hans Wennborg2016-03-031-0/+20
| | | | | | | | | The code was previously not able to track a boolean argument at a call site back to the formal argument of the caller. Differential Revision: http://reviews.llvm.org/D17786 llvm-svn: 262575
* [PPCVSXFMAMutate] Temporarily disable this passTim Shen2016-03-031-2/+8
| | | | llvm-svn: 262573
* [X86] Don't give catch objects a displacement of zeroDavid Majnemer2016-03-033-1/+23
| | | | | | | | | | | | | | | | | | Catch objects with a displacement of zero do not initialize a catch object. The displacement is relative to %rsp at the end of the function's prologue for x86_64 targets. If we place an object at the top-of-stack, we will end up wit a displacement of zero resulting in our catch object remaining uninitialized. Address this by creating our catch objects as fixed objects. We will ensure that the UnwindHelp object is created after the catch objects so that no catch object will have a displacement of zero. Differential Revision: http://reviews.llvm.org/D17823 llvm-svn: 262546
* AMDGPU: Simplify boolean conditional return statementsMatt Arsenault2016-03-027-44/+19
| | | | | | Patch by Richard Thomson llvm-svn: 262536
* [ARM] Merging 64-bit divmod lib calls into oneRenato Golin2016-03-021-0/+9
| | | | | | | | | | | | | | | | | When div+rem calls on the same arguments are found, the ARM back-end merges the two calls into one __aeabi_divmod call for up to 32-bits values. However, for 64-bit values, which also have a lib call (__aeabi_ldivmod), it wasn't merging the calls, and thus calling ldivmod twice and spilling the temporary results, which generated pretty bad code. This patch legalises 64-bit lib calls for divmod, so that now all the spilling and the second call are gone. It also relaxes the DivRem combiner a bit on the legal type check, since it was already checking for isLegalOrCustom on every value, so the extra check for isTypeLegal was redundant. This patch fixes PR17193 (and a long time FIXME in the tests). llvm-svn: 262507
* Revert "[X86] Elide references to _chkstk for dynamic allocas"Reid Kleckner2016-03-021-29/+9
| | | | | | | | | | | | | | | This reverts commit r262370. It turns out there is code out there that does sequences of allocas greater than 4K: http://crbug.com/591404 The goal of this change was to improve the code size of inalloca call sequences, but we got tangled up in the mess of dynamic allocas. Instead, we should come back later with a separate MI pass that uses dominance to optimize the full sequence. This should also be able to remove the often unneeded stacksave/stackrestore pairs around the call. llvm-svn: 262505
* ARM: Introduce conservative load/store optimization modeMatthias Braun2016-03-021-0/+34
| | | | | | | | | | | | | | | | | | | | | | | | Most of the time ARM has the CCR.UNALIGN_TRP bit set to false which means that unaligned loads/stores do not trap and even extensive testing will not catch these bugs. However the multi/double variants are not affected by this bit and will still trap. In effect a more aggressive load/store optimization will break existing (bad) code. These bugs do not necessarily manifest in the broken code where the misaligned pointer is formed but often later in perfectly legal code where it is accessed. This means recompiling system libraries (which have no alignment bugs) with a newer compiler will break existing applications (with alignment bugs) that worked before. So (under protest) I implemented this safe mode which limits the formation of multi/double operations to cases that are not affected by user code (stack operations like spills/reloads) or cases where the normal operations trap anyway (floating point load/stores). It is disabled by default. Differential Revision: http://reviews.llvm.org/D17015 llvm-svn: 262504
* [AArch64] Enable non-leaf frame pointer elimination.Geoff Berry2016-03-022-9/+7
| | | | | | | | | | | | | | | | | | | | Summary: This change enables frame pointer elimination in non-leaf functions. The -fomit-frame-pointer option still needs to be used when compiling via clang (or an equivalent method of not setting the 'no-frame-pointer-elim*' function attributes if generating llvm IR via some other method) to take advantage of this optimization. This change should be NFC when compiling via clang without -fomit-frame-pointer. Reviewers: t.p.northover Subscribers: aemerson, rengolin, tberghammer, qcolombet, llvm-commits, danalbert, mcrosier, srhines Differential Revision: http://reviews.llvm.org/D17730 llvm-svn: 262495
* [LLVM][AVX512]PSRAWI Change imm8 to int.Michael Zuckerman2016-03-021-9/+9
| | | | | | Differential Revision: http://reviews.llvm.org/D17705 llvm-svn: 262480
* [X86][SSE] Lower 128-bit MOVDDUP with existing VBROADCAST mechanismsSimon Pilgrim2016-03-021-39/+51
| | | | | | | | | | | | We have a number of useful lowering strategies for VBROADCAST instructions (both from memory and register element 0) which the 128-bit form of the MOVDDUP instruction can make use of. This patch tweaks lowerVectorShuffleAsBroadcast to enable it to broadcast 2f64 args using MOVDDUP as well. It does require a slight tweak to the lowerVectorShuffleAsBroadcast mechanism as the existing MOVDDUP lowering uses isShuffleEquivalent which can match binary shuffles that can lower to (unary) broadcasts. Differential Revision: http://reviews.llvm.org/D17680 llvm-svn: 262478
* Revert "[AMDGPU] table-driven parser/printer for amd_kernel_code_t structure ↵Nikolay Haustov2016-03-024-370/+0
| | | | | | | | fields" Build failure with clang. llvm-svn: 262477
* Revert "[AMDGPU] Using table-driven amd_kernel_code_t field parser in ↵Nikolay Haustov2016-03-022-8/+157
| | | | | | | | assembler." Build failure with clang. llvm-svn: 262475
* [AMDGPU] Using table-driven amd_kernel_code_t field parser in assembler.Nikolay Haustov2016-03-022-157/+8
| | | | | | | | | | complementary patch to table-driven amd_kernel_code_t field parser/printer utility. lit tests passed. Patch by: Valery Pykhtin Differential Revision: http://reviews.llvm.org/D17151 llvm-svn: 262474
* [AMDGPU] table-driven parser/printer for amd_kernel_code_t structure fieldsNikolay Haustov2016-03-024-0/+370
| | | | | | | | | | | | | | | | | | This is going to be used in .hsatext disassembler and can be used in current assembler parser (lit tests passed on parsing). Code using this helpers isn't included in this patch. Benefits: unified approach fast field name lookup on parsing Later I would like to enhance some of the field naming/syntax using this code. Patch by: Valery Pykhtin Differential Revision: http://reviews.llvm.org/D17150 llvm-svn: 262473
* [X86] Remove unnecessary call to isReg from emitter's DestMem handling for ↵Craig Topper2016-03-021-7/+5
| | | | | | VEX prefix. The operand is always a register. NFC llvm-svn: 262468
* [X86] Make X86MCCodeEmitter::DetermineREXPrefix locate operands more like ↵Craig Topper2016-03-021-54/+50
| | | | | | how VEX prefix handling does. llvm-svn: 262467
* [X86] Permit reading of the FLAGS register without it being previously definedDavid Majnemer2016-03-022-3/+8
| | | | | | | | | | | We modeled the RDFLAGS{32,64} operations as "using" {E,R}FLAGS. While technically correct, this is not be desirable for folks who want to examine aspects of the FLAGS register which are not related to computation like whether or not CPUID is a valid instruction. Differential Revision: http://reviews.llvm.org/D17782 llvm-svn: 262465
* [X86] Remove assertion I accidentally left in.Craig Topper2016-03-021-1/+0
| | | | llvm-svn: 262464
* [X86] Be more structured about how we capture the register number when it is ↵Craig Topper2016-03-021-41/+39
| | | | | | | | | | encoded in bits 7:4 of the immediate. For some instructions the register is not the last operand and the immediate handling had to detect this and hardcode the index to find it. It also required CurOp to be pointing at the last operand handled in the Form switch whereas for any instruction it would be pointing at the next operand. Now we just capture the value in the Form switch when we know exactly where it is and the CurOp pointer can behave normally. llvm-svn: 262462
* [X86] Use MCPhysReg and uint16_t for static arrays of registers and opcodes ↵Craig Topper2016-03-025-16/+16
| | | | | | respectively should reduce size tiny bit. NFC llvm-svn: 262458
* AMDGPU: Fix bug 26659.Matt Arsenault2016-03-021-1/+1
| | | | | | | | Fix checking the same instruction twice instead of the second branch that uses vccz. I don't think this matters currently because s_branch_vccnz is always used currently. llvm-svn: 262457
* AMDGPU: Cleanup suggested in bug 23960Matt Arsenault2016-03-021-6/+3
| | | | llvm-svn: 262456
* Bug 20810: Use report_fatal_error instead of unreachableMatt Arsenault2016-03-021-6/+6
| | | | llvm-svn: 262455
* [NFC] Convert tabs to spaces.Colin LeMahieu2016-03-011-2/+2
| | | | llvm-svn: 262411
* AArch64: Reenable CompleteModel for A53, A57 and Kryo modelsMatthias Braun2016-03-013-3/+3
| | | | | | The fixes in r262393 completed them as well. llvm-svn: 262408
* [Hexagon] Modifying r262258 to only be in effect in the hand assembler path, ↵Colin LeMahieu2016-03-012-14/+18
| | | | | | not the integrated assembler. llvm-svn: 262400
* AArch64: Add missing schedinfo, check completeness for cycloneMatthias Braun2016-03-018-13/+41
| | | | | | | | | This adds some missing generic schedule info definitions, enables completeness checking for cyclone and fixes a typo uncovered by that. Differential Revision: http://reviews.llvm.org/D17748 llvm-svn: 262393
* [Power9] Implement new vector compare, extract, insert instructionsKit Barton2016-03-012-0/+96
| | | | | | | | | | | | | | | | | | This change implements the following vector operations: - Vector Compare Not Equal - vcmpneb(.) vcmpneh(.) vcmpnew(.) - vcmpnezb(.) vcmpnezh(.) vcmpnezw(.) - Vector Extract Unsigned - vextractub vextractuh vextractuw vextractd - vextublx vextubrx vextuhlx vextuhrx vextuwlx vextuwrx - Vector Insert - vinsertb vinserth vinsertw vinsertd 26 instructions. Phabricator: http://reviews.llvm.org/D15916 llvm-svn: 262392
* [x86] use getBitcast()Sanjay Patel2016-03-011-20/+20
| | | | | | | | This isn't quite NFC because some of the SDLocs may change which could cause scheduling differences. But no regression tests are affected and there is no functional change intended. llvm-svn: 262391
* Revert "[AArch64] Fix isLegalAddImmediate() to return true for valid ↵Geoff Berry2016-03-011-2/+2
| | | | | | | | | | negative values." Revert r262248 in an attempt to fix the clang-native-aarch64-full bot and to investigate a performance regression in SingleSource/Benchmarks/CoyoteBench/huffbench llvm-svn: 262388
* Revert "[mips] Promote the result of SETCC nodes to GPR width."Vasileios Kalintiris2016-03-0114-550/+418
| | | | | | | | | This reverts commit r262316. It seems that my change breaks an out-of-tree chromium buildbot, so I'm reverting this in order to investigate the situation further. llvm-svn: 262387
* New file to track implementation status of new POWER9 instructionsKit Barton2016-03-011-0/+442
| | | | llvm-svn: 262386
* TableGen: Check scheduling models for completenessMatthias Braun2016-03-0119-3/+31
| | | | | | | | | | | | | | | | | | | | | | TableGen checks at compiletime that for scheduling models with "CompleteModel = 1" one of the following holds: - Is marked with the hasNoSchedulingInfo flag - The instruction is a subclass of Sched - There are InstRW definitions in the scheduling model Typical steps necessary to complete a model: - Ensure all pseudo instructions that are expanded before machine scheduling (usually everything handled with EmitYYY() functions in XXXTargetLowering). - If a CPU does not support some instructions mark the corresponding resource unsupported: "WriteRes<WriteXXX, []> { let Unsupported = 1; }". - Add missing scheduling information. Differential Revision: http://reviews.llvm.org/D17747 llvm-svn: 262384
* [NVPTX] Annotate param loads/stores as mayLoad/mayStore.Justin Lebar2016-03-012-56/+68
| | | | | | | | | | | | | | | | | | Summary: Tablegen was unable to determine that param loads/stores were actually reading or writing from memory. I think this isn't a problem in practice for param stores, because those occur in a block right before we make our call. But param loads don't have to at the very beginning of a function, so should be annotated as mayLoad so we don't incorrectly optimize them. Reviewers: jholewinski Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D17471 llvm-svn: 262381
* [NVPTX] Remove workaround for tablegen crash in NVPTXInstrInfo.td.Justin Lebar2016-03-011-28/+7
| | | | | | | | | | | | Summary: Looks like this was caused by a typo. Reviewers: jholewinski Subscribers: jholewinski, llvm-commits, tra Differential Revision: http://reviews.llvm.org/D17357 llvm-svn: 262380
* [NVPTX] Use different, convergent MIs for convergent calls.Justin Lebar2016-03-013-49/+56
| | | | | | | | | | | | | | | | | | | | | | | Summary: Calls sometimes need to be convergent. This is already handled at the LLVM IR level, but it also needs to be handled at the MI level. Ideally we'd propagate convergence from instructions, down through the selection DAG, and into MIs. But this is Hard, and would affect optimizations in the SDNs -- right now only SDNs with two operands have any flags at all. Instead, here's a much simpler hack: Add new opcodes for NVPTX for convergent calls, and generate these when lowering convergent LLVM calls. Reviewers: jholewinski Subscribers: jholewinski, chandlerc, joker.eph, jhen, tra, llvm-commits Differential Revision: http://reviews.llvm.org/D17423 llvm-svn: 262373
* [NVPTX] Nix hack used to emit '{' and '}' for NVPTX calls.Justin Lebar2016-03-011-9/+3
| | | | | | | | | | | | Summary: Tablegen understands backslash as an escape char; that's sufficient. Reviewers: jholewinski Subscribers: llvm-commits, tra, jholewinski Differential Revision: http://reviews.llvm.org/D17432 llvm-svn: 262372
* [NVPTX] Reformat NVPTXInstrInfo.td, and add additional comments.Justin Lebar2016-03-011-1418/+1400
| | | | | | | | | | | | | | | Summary: Also simplify some of the embedded C++ logic. No functional changes. Reviewers: jholewinski Subscribers: llvm-commits, tra, jholewinski Differential Revision: http://reviews.llvm.org/D17354 llvm-svn: 262371
* [X86] Elide references to _chkstk for dynamic allocasDavid Majnemer2016-03-011-9/+29
| | | | | | | | | | | | | | | The _chkstk function is called by the compiler to probe the stack in an order consistent with Windows' expectations. However, it is possible to elide the call to _chkstk and manually adjust the stack pointer if we can prove that the allocation is fixed size and smaller than the probe size. This shrinks chrome.dll, chrome_child.dll and chrome.exe by a cummulative ~133 KB. Differential Revision: http://reviews.llvm.org/D17679 llvm-svn: 262370
* fix function names; NFCSanjay Patel2016-03-011-157/+151
| | | | llvm-svn: 262367
* DAGCombiner: Turn extract of bitcasted integer into truncateMatt Arsenault2016-03-011-7/+12
| | | | | | | This reduces the number of bitcast nodes and generally cleans up the DAG when bitcasting between integers and vectors everywhere. llvm-svn: 262358
* AMDGPU/SI: Implement DS_PERMUTE/DS_BPERMUTE Instruction Definitions and ↵Changpeng Fang2016-03-013-0/+32
| | | | | | | | | | | | | | | | Intrinsics Summary: This patch impleemnts DS_PERMUTE/DS_BPERMUTE instruction definitions and intrinsics, which are new since VI. Reviewers: tstellarAMD, arsenm Subscribers: llvm-commits, arsenm Differential Revision: http://reviews.llvm.org/D17614 llvm-svn: 262356
* [LLVM][AVX512] PSRL{DI|QI} Change imm8 to intMichael Zuckerman2016-03-011-6/+6
| | | | | | Differential Revision: http://reviews.llvm.org/D17713 llvm-svn: 262353
OpenPOWER on IntegriCloud