path: root/llvm
Commit message | Author | Age | Files | Lines
* merge vector stores into wider vector stores and fix AArch64 misaligned access TLI hook (PR21711) | Sanjay Patel | 2015-09-25 | 4 | -18/+79
This is a redo of D7208 (r227242, http://llvm.org/viewvc/llvm-project?view=revision&revision=227242).
The patch was reverted because an AArch64 target could enter an infinite loop after the change in DAGCombiner to merge vector stores. That happened because AArch64's allowsMisalignedMemoryAccesses() wasn't telling the truth: it reported all unaligned memory accesses as fast, but then split some 128-bit unaligned accesses in performSTORECombine() because they are slow.
This patch attempts to fix the problem in AArch64's allowsMisalignedMemoryAccesses() while preserving the existing (perhaps questionable) lowering behavior.
The x86 test shows that store merging works as intended for a target with fast 32-byte unaligned stores.
Differential Revision: http://reviews.llvm.org/D12635
llvm-svn: 248622
* PrologueEpilogInserter: Fix missing live-ins when savepoint equals restorepoint | Matthias Braun | 2015-09-25 | 2 | -3/+172
The algorithm would not modify the live-in list of blocks below the save block point, which is correct unless that block also happens to be a restore point.
Also fixes the benign issue of live-in registers being added twice in some cases.
The testcase is based on a test submitted by Kit Barton.
Differential Revision: http://reviews.llvm.org/D13176
llvm-svn: 248620
* AMDGPU/SI: Use .hsatext section instead of .text for HSA | Tom Stellard | 2015-09-25 | 20 | -10/+258
Reviewers: arsenm, grosbach, rafael
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D12424
llvm-svn: 248619
* MCAsmInfo: Allow targets to specify when the .section directive should be omitted | Tom Stellard | 2015-09-25 | 3 | -6/+16
Summary: The default behavior is to omit the .section directive for .text, .data, and sometimes .bss, but some targets may want to omit this directive for other sections too. The AMDGPU backend will use this to emit a simplified syntax for section switches.
For example, if the section directive is not omitted (current behavior), section switches to .hsatext will be printed like this:
    .section .hsatext,#alloc,#execinstr,#write
This is actually wrong, because .hsatext has some custom STT_* flags, which MC doesn't know how to print or parse.
If the section directive is omitted (made possible by this commit), section switches will be printed like this:
    .hsatext
The motivation for this patch is to make it possible to emit sections with custom STT_* flags without having to teach MC about all the target-specific STT_* flags.
Reviewers: rafael, grosbach
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12423
llvm-svn: 248618
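A minimal C++ sketch of the idea, assuming a per-target predicate that decides whether a section switch is printed in the short form. `AsmInfoModel`, `HSAAsmInfoModel`, `omitSectionDirective`, and `printSectionSwitch` are hypothetical names for illustration, not the MCAsmInfo API; only the flag string comes from the message above.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified model of a per-target "omit .section" hook (not LLVM's MCAsmInfo).
struct AsmInfoModel {
  virtual ~AsmInfoModel() = default;
  // Default: only well-known sections are switched to with the bare name.
  virtual bool omitSectionDirective(const std::string &name) const {
    return name == ".text" || name == ".data";
  }
};

// A target that also wants the bare-name form for its custom section.
struct HSAAsmInfoModel : AsmInfoModel {
  bool omitSectionDirective(const std::string &name) const override {
    return name == ".hsatext" || AsmInfoModel::omitSectionDirective(name);
  }
};

// Print a section switch in whichever form the target asks for.
std::string printSectionSwitch(const AsmInfoModel &mai, const std::string &name,
                               const std::string &flags) {
  if (mai.omitSectionDirective(name))
    return name;                            // short form, e.g. ".hsatext"
  return ".section " + name + "," + flags;  // long form with explicit flags
}
```

Keeping the decision in an overridable predicate means generic printing code never has to understand target-specific flags; it only asks the target which form to emit.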
* MachineBasicBlock: Factor out common code into isReturnBlock() | Matthias Braun | 2015-09-25 | 8 | -17/+16
llvm-svn: 248617
* Revert two SCEV changes that caused test failures in clang. | Sanjoy Das | 2015-09-25 | 4 | -357/+1
r248606: "[SCEV] Exploit A < B => (A+K) < (B+K) when possible"
r248608: "[SCEV] Teach isLoopBackedgeGuardedByCond to exploit trip counts."
llvm-svn: 248614
* ADCE: Fix typo in file comment. NFC | Justin Bogner | 2015-09-25 | 1 | -1/+1
llvm-svn: 248613
* PeepholeOptimizer: Remove redundant copies | Matt Arsenault | 2015-09-25 | 3 | -27/+117
If a virtual register is copied and another copy of the same register was already seen, replace the new copy with the previous one. This only handles the simplest cases for now.
This pattern shows up because of various AMDGPU operand restrictions that require inserting copies depending on the register class of the operands.
llvm-svn: 248611
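The transformation described above can be sketched on a toy, single-basic-block representation. This is not PeepholeOptimizer's code: `Inst`, `removeRedundantCopies`, and the integer register numbering are hypothetical, registers are assumed to be in SSA form, and subregister indices are ignored. It needs C++17 for the structured binding.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical instruction: a COPY (dst = uses[0]) or some other instruction
// that reads `uses` and defines `dst`. Registers are plain ints in SSA form.
struct Inst {
  bool isCopy;
  int dst;
  std::vector<int> uses;
};

// Within one basic block, drop every COPY whose source was already copied
// earlier and redirect later uses of its result to the earlier copy's result.
// Returns the number of copies removed.
int removeRedundantCopies(std::vector<Inst> &block) {
  std::map<int, int> firstCopyOf;  // source vreg -> vreg defined by its first copy
  std::map<int, int> replacement;  // removed copy's dst -> surviving copy's dst
  std::vector<Inst> out;
  int removed = 0;
  for (Inst inst : block) {
    // Rewrite operands that name the result of a copy we already deleted.
    for (int &u : inst.uses) {
      auto it = replacement.find(u);
      if (it != replacement.end())
        u = it->second;
    }
    if (inst.isCopy) {
      assert(!inst.uses.empty() && "a COPY has exactly one source");
      auto [it, inserted] = firstCopyOf.emplace(inst.uses[0], inst.dst);
      if (!inserted) {  // an earlier copy of the same source exists
        replacement[inst.dst] = it->second;
        ++removed;
        continue;       // drop this copy
      }
    }
    out.push_back(inst);
  }
  block = std::move(out);
  return removed;
}
```

Restricting the sketch to one block keeps it correct without dominance checks: in straight-line code an earlier copy is always available at every later use.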
* Simplify code. NFC. | Chad Rosier | 2015-09-25 | 1 | -6/+1
llvm-svn: 248610
* more space; NFC | Sanjay Patel | 2015-09-25 | 1 | -0/+1
llvm-svn: 248609
* [SCEV] Teach isLoopBackedgeGuardedByCond to exploit trip counts. | Sanjoy Das | 2015-09-25 | 3 | -2/+55
Summary: If the trip count of a specific backedge is `N`, then we know that backedge is effectively guarded by the condition `{0,+,1} u< N`. This change teaches SCEV to use this condition to prove things in `isLoopBackedgeGuardedByCond`.
Depends on D12948
Depends on D12949
Reviewers: atrick, reames, majnemer, hfinkel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12950
llvm-svn: 248608
* [SCEV] Extract helper function from isImpliedCond; NFC | Sanjoy Das | 2015-09-25 | 2 | -0/+15
Summary: This new helper routine will be used in a subsequent change.
Reviewers: hfinkel
Subscribers: hfinkel, sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D12949
llvm-svn: 248607
* [SCEV] Exploit A < B => (A+K) < (B+K) when possible | Sanjoy Das | 2015-09-25 | 3 | -0/+303
Summary: This change teaches SCEV's `isImpliedCond` two new identities:
    A u< B u< -C => (A + C) u< (B + C)
    A s< B s< INT_MIN - C => (A + C) s< (B + C)
While these are useful on their own, they're really intended to support D12950.
Reviewers: atrick, reames, majnemer, nlewycky, hfinkel
Subscribers: aadg, sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D12948
llvm-svn: 248606
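The two identities can be checked exhaustively at a small bit width. The sketch below brute-forces every 8-bit value, assuming two's-complement wrapping arithmetic (as for an LLVM IR `add` without `nsw`/`nuw`) and reading `A u< B u< -C` as the conjunction `A u< B` and `B u< -C`. It illustrates why the side conditions matter; it is not SCEV's implementation, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Truncate to 8 bits and reinterpret as signed (modular conversion: guaranteed
// in C++20 and what mainstream compilers do in earlier language modes).
static std::int8_t wrapS8(int v) {
  return static_cast<std::int8_t>(static_cast<std::uint8_t>(v));
}

// A u< B u< -C  =>  (A + C) u< (B + C), all arithmetic modulo 2^8.
bool unsignedIdentityHolds() {
  for (int a = 0; a < 256; ++a)
    for (int b = 0; b < 256; ++b)
      for (int c = 0; c < 256; ++c) {
        const std::uint8_t negC = static_cast<std::uint8_t>(-c);
        if (a < b && b < negC &&
            !(static_cast<std::uint8_t>(a + c) < static_cast<std::uint8_t>(b + c)))
          return false;  // counterexample found
      }
  return true;
}

// A s< B s< INT_MIN - C  =>  (A + C) s< (B + C), with INT_MIN = -128 and wrapping math.
bool signedIdentityHolds() {
  for (int a = -128; a < 128; ++a)
    for (int b = -128; b < 128; ++b)
      for (int c = -128; c < 128; ++c) {
        const std::int8_t bound = wrapS8(-128 - c);
        if (a < b && b < bound && !(wrapS8(a + c) < wrapS8(b + c)))
          return false;  // counterexample found
      }
  return true;
}
```

Dropping the bound breaks the unsigned rule: with A = 0, B = 200, C = 100 we have A u< B, but A + C = 100 while B + C wraps to 44. The `-C` and `INT_MIN - C` bounds are exactly what rule out that wraparound.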
* AMDGPU: Add some more tests for literal operands | Matt Arsenault | 2015-09-25 | 2 | -6/+231
llvm-svn: 248600
* AMDGPU: Make getNamedOperandIdx declaration readonly | Matt Arsenault | 2015-09-25 | 2 | -0/+3
This matches how it is defined in the generated implementation.
llvm-svn: 248598
* [AArch64] Add support for generating pre- and post-index load/store pairs. | Chad Rosier | 2015-09-25 | 3 | -47/+301
llvm-svn: 248593
* AMDGPU: Disable some passes that are not meaningful | Matt Arsenault | 2015-09-25 | 1 | -3/+15
Don't run passes related to stack maps, garbage collection, or exceptions, since these aren't useful for GPUs.
There might be a few more to turn off that I'm less sure about (e.g. ShrinkWrapping) or that I'm not sure how to disable (SafeStack and StackProtector).
llvm-svn: 248591
* AMDGPU: Handle i64->v2i32 loads/stores in PreprocessISelDAG | Matt Arsenault | 2015-09-25 | 2 | -52/+80
This fixes a select error when the i64 source was also bitcasted to v2i32 in the original source. Instead of awkwardly trying to select the modified source value and the store, replace before isel begins.
Uses a worklist to avoid possible problems from mutating the DAG, although it seems to work OK without it.
llvm-svn: 248589
* AMDGPU: Fix recomputing dominator tree unnecessarily | Matt Arsenault | 2015-09-25 | 6 | -1/+19
SIFixSGPRCopies does not modify the CFG, but the dominator tree was being recomputed before running SIFoldOperands.
llvm-svn: 248587
* AMDGPU: Re-justify workaround and fix worked around problem | Matt Arsenault | 2015-09-25 | 2 | -56/+65
When buffer resource descriptors were built, the upper two components of the descriptor were first composed into a 64-bit register because legalizeOperands assumed all operands had the same register class.
Fix that problem, but keep the workaround. I'm not sure anything is actually emitting such a REG_SEQUENCE now.
If multiple resource descriptors are set up with different base pointers, this is copied with a single s_mov_b64. We probably should fix this better by recognizing a pair of s_mov_b32 later, but for now delete the dead code.
llvm-svn: 248585
* AMDGPU: Don't create REG_SEQUENCE with SGPR dest and VGPR sources | Matt Arsenault | 2015-09-25 | 1 | -6/+15
This avoids needing to re-legalize the new REG_SEQUENCE.
llvm-svn: 248584
* AMDGPU: Fix not adding exec to defs of cmpx instruction pseudos | Matt Arsenault | 2015-09-25 | 1 | -0/+2
This was only set on the final _si/_vi versions, not on the pseudos that most of codegen sees.
No test since these instructions aren't used yet.
llvm-svn: 248583
* AMDGPU: Improve accuracy of instruction rates for VOPC | Matt Arsenault | 2015-09-25 | 4 | -56/+83
These were all using the default 32-bit VALU write class, but the i64/f64 compares are half rate.
I'm not sure this is really correct, because they are still using the write to VALU write class, even though they really write to the SALU.
llvm-svn: 248582
* [GlobalsAA] Teach GlobalsAA about nocapture | James Molloy | 2015-09-25 | 4 | -2/+182
Arguments to function calls marked "nocapture" can be marked as non-escaping. However, nocapture is defined in terms of the lifetime of the callee, and if the callee can directly or indirectly recurse to the caller, the semantics of nocapture are invalid.
Therefore, we eagerly discover which SCC each function belongs to, and can later check whether the callee and caller of a callsite belong to the same SCC, in which case there could be recursion.
This means that we can't be so optimistic in getModRefInfo(ImmutableCallsite): previously we assumed all call arguments never aliased with an escaping global. Now we need to check, because a global could now be passed as an argument but still not escape.
This also solves a related conformance problem: MemCpyOptimizer can turn non-escaping stores of globals into calls to intrinsics like llvm.memcpy/llvm.memset. This confuses GlobalsAA, which knows the global can't escape and so returns NoModRef when queried, when obviously a memcpy/memset call does indeed reference and modify its arguments.
This fixes PR24800, PR24801, and PR24802.
llvm-svn: 248576
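The SCC test described above can be sketched with Tarjan's algorithm on a hypothetical adjacency-list call graph. This is not GlobalsAA's code (LLVM has its own SCC machinery); `computeSCCs` and `argumentMayBeTreatedAsNonEscaping` are illustrative names, and the predicate encodes only the rule stated in the message: a nocapture argument is trusted only when caller and callee are in different SCCs.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical call graph: function i calls every function in calls[i].
// Returns an SCC id for each function, computed with Tarjan's algorithm.
std::vector<int> computeSCCs(const std::vector<std::vector<int>> &calls) {
  const int n = static_cast<int>(calls.size());
  std::vector<int> index(n, -1), low(n, 0), scc(n, -1), stack;
  std::vector<bool> onStack(n, false);
  int nextIndex = 0, nextSCC = 0;
  std::function<void(int)> visit = [&](int v) {
    index[v] = low[v] = nextIndex++;
    stack.push_back(v);
    onStack[v] = true;
    for (int w : calls[v]) {
      if (index[w] == -1) {        // unvisited callee: recurse
        visit(w);
        low[v] = std::min(low[v], low[w]);
      } else if (onStack[w]) {     // back edge into the current SCC
        low[v] = std::min(low[v], index[w]);
      }
    }
    if (low[v] == index[v]) {      // v is the root of an SCC: pop its members
      int w;
      do {
        w = stack.back();
        stack.pop_back();
        onStack[w] = false;
        scc[w] = nextSCC;
      } while (w != v);
      ++nextSCC;
    }
  };
  for (int v = 0; v < n; ++v)
    if (index[v] == -1)
      visit(v);
  return scc;
}

// Rule from the commit message: only trust nocapture when the callee cannot
// recurse back into the caller, i.e. they are in different SCCs.
bool argumentMayBeTreatedAsNonEscaping(const std::vector<int> &scc, int caller,
                                       int callee, bool calleeParamIsNocapture) {
  return calleeParamIsNocapture && scc[caller] != scc[callee];
}
```

A directly self-recursive function is its own SCC, so a call from a function to itself is correctly treated as possibly recursive.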
* ARM: make -Asserts,-Werror=unused-variable build happy | Saleem Abdulrasool | 2015-09-25 | 1 | -4/+4
The value was only used in an assertion. Sink the variable usage into the assertion.
llvm-svn: 248562
* ARM: address WoA division limitation | Saleem Abdulrasool | 2015-09-25 | 5 | -44/+196
We now emit the compiler-generated divide-by-zero check that was needed for the MSVC routines. We construct a pseudo-instruction for the DBZ check, as the operation requires splitting up the BB.
For the 64-bit operations, we need to custom expand the node, as we need to insert the DBZ check and then emit the libcall to the appropriate name. Because this is target specific, it seemed better to reproduce the expansion operation from the target-agnostic type legalization rather than sink this there to avoid the duplication.
The division library calls now match MSVC semantically.
llvm-svn: 248561
* AMDGPU: Remove unused includes | Matt Arsenault | 2015-09-25 | 1 | -6/+0
llvm-svn: 248553
* [LangRef] Unbreak the docs Sphinx build. | Sanjoy Das | 2015-09-25 | 1 | -2/+2
r248551 introduced some breakage due to incorrectly terminated ``literals``.
llvm-svn: 248552
* [Bitcode][Asm] Teach LLVM to read and write operand bundles. | Sanjoy Das | 2015-09-24 | 10 | -14/+650
Summary: This also adds the first set of tests for operand bundles.
The optimizer has not been audited to ensure that it does the right thing with operand bundles.
Depends on D12456.
Reviewers: reames, chandlerc, majnemer, dexonsmith, kmod, JosephTremoulet, rnk, bogner
Subscribers: maksfb, llvm-commits
Differential Revision: http://reviews.llvm.org/D12457
llvm-svn: 248551
* Restore test coverage for other than ELFOSABI_NONE | Ed Maste | 2015-09-24 | 1 | -1/+3
Add a FreeBSD test to restore testing of ELF OSABI values other than ELFOSABI_NONE after r248534.
Differential Revision: http://reviews.llvm.org/D13146
llvm-svn: 248550
* Fix typo | Matt Arsenault | 2015-09-24 | 1 | -1/+1
llvm-svn: 248549
* [AArch64] Improve the readability of the ld/st optimization pass. NFC. | Chad Rosier | 2015-09-24 | 1 | -4/+4
In this context, MI is an add/sub instruction, not a load/store.
llvm-svn: 248540
* [X86][SSE2] Fix zero/any extension shuffles that don't start from the first element | Simon Pilgrim | 2015-09-24 | 2 | -7/+9
Fix for D12561: we weren't correctly ensuring that the base element for extension was moved to start on a boundary suitable for UNPCKL/H.
llvm-svn: 248536
* Use ELFOSABI_NONE instead of ELFOSABI_LINUX. | Rafael Espindola | 2015-09-24 | 3 | -7/+4
There doesn't seem to be a difference, and ELFOSABI_NONE seems to be far more common:
    * Linux doesn't care when loading and puts ELFOSABI_NONE on core dumps.
    * Gold and bfd ld produce files with ELFOSABI_NONE.
    * Gold and bfd ld seem to ignore EI_OSABI other than for FreeBSD.
    * Gas puts ELFOSABI_NONE in most .o files.
llvm-svn: 248534
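For reference, the field discussed above is a single byte in the ELF identification array. The sketch below reads it. The byte offset (EI_OSABI = 7) and the values 0 (ELFOSABI_NONE), 3 (ELFOSABI_LINUX, also spelled ELFOSABI_GNU), and 9 (ELFOSABI_FREEBSD) come from the ELF specification; `osabiName` and the constant names are hypothetical, chosen so they do not clash with macros from `<elf.h>`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// e_ident[EI_OSABI] is byte 7 of the ELF header.
constexpr std::size_t kEIOSABI = 7;
constexpr std::uint8_t kOSABINone = 0;     // ELFOSABI_NONE (System V)
constexpr std::uint8_t kOSABILinux = 3;    // ELFOSABI_LINUX / ELFOSABI_GNU
constexpr std::uint8_t kOSABIFreeBSD = 9;  // ELFOSABI_FREEBSD

// Name the OS/ABI recorded in an ELF identification array (at least 16 bytes).
std::string osabiName(const std::uint8_t *ident) {
  assert(ident != nullptr);
  switch (ident[kEIOSABI]) {
  case kOSABINone:
    return "ELFOSABI_NONE";
  case kOSABILinux:
    return "ELFOSABI_LINUX";
  case kOSABIFreeBSD:
    return "ELFOSABI_FREEBSD";
  default:
    return "other";
  }
}
```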
* AMDGPU: Add s_dcache_* instructions | Matt Arsenault | 2015-09-24 | 12 | -14/+218
llvm-svn: 248533
* AMDGPU: Add cache invalidation instructions. | Matt Arsenault | 2015-09-24 | 9 | -6/+127
These are necessary for implementing mem_fence for OpenCL 2.0.
The VI assembler tests are disabled since the assembler seems to be using the wrong encoding or opcode.
llvm-svn: 248532
* AMDGPU: Run mubuf assembler test for CI | Matt Arsenault | 2015-09-24 | 1 | -102/+102
llvm-svn: 248531
* [AArch64] The paired post-increment store instruction has an output register. | Chad Rosier | 2015-09-24 | 1 | -2/+2
The pre- and post-increment versions both update the base register, but the post- version was defined incorrectly.
There is no test case as we don't currently generate these instructions, but I plan on changing that in the near future.
llvm-svn: 248528
* [IR] Add operand bundles to CallInst and InvokeInst. | Sanjoy Das | 2015-09-24 | 10 | -74/+483
Summary: This change teaches `CallInst`s and `InvokeInst`s to maintain a set of operand bundles as part of their operands. `CallInst`s and `InvokeInst`s with operand bundles co-allocate some space before their `Use` array to hold meta information about which of their operands are part of an operand bundle.
The strings corresponding to the bundle tags are interned into `LLVMContextImpl::BundleTagCache`.
This change does not include any parsing / bitcode support. That's the next change.
Depends on D12455.
Reviewers: reames, chandlerc, majnemer, dexonsmith, kmod, JosephTremoulet, rnk, bogner
Subscribers: MatzeB, sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D12456
llvm-svn: 248527
* [ARM] Handle +t2dsp feature as an ArchExtKind in ARMTargetParser.def | Artyom Skrobov | 2015-09-24 | 15 | -116/+113
Currently, the availability of DSP instructions (ACLE 6.4.7) is handled in a hand-rolled tricky condition block in tools/clang/lib/Basic/Targets.cpp, with a FIXME attached.
This patch changes the handling of +t2dsp to be in line with other architecture extensions.
Following a revert of r248152 and new review comments, this patch also includes renaming FeatureDSPThumb2 -> FeatureDSP, hasThumb2DSP() -> hasDSP(), etc.
The spelling of "t2dsp" is preserved, pending a further investigation of its possible external usage.
Differential Revision: http://reviews.llvm.org/D12937
llvm-svn: 248519
* dsymutil: Fix the condition to distinguish module imports from definitions. | Adrian Prantl | 2015-09-24 | 4 | -1/+41
llvm-svn: 248512
* [ValueTracking] Teach isKnownNonZero a new trick | James Molloy | 2015-09-24 | 2 | -0/+31
If the shift amount is a constant and all of the bits shifted out are known to be zero, then, if X is known non-zero, at least one non-zero bit must remain.
llvm-svn: 248508
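The reasoning above can be confirmed by brute force at 8 bits. The sketch covers both a left shift (which discards the top S bits) and a logical right shift (which discards the bottom S bits) as an illustration; the message does not say which shift kinds the patch handles, and `shiftPreservesNonZero` is a hypothetical name.

```cpp
#include <cassert>

// For every non-zero 8-bit X and constant shift amount S in [0, 8): if the bits
// the shift discards are all zero, the shifted result must still be non-zero.
bool shiftPreservesNonZero() {
  for (unsigned x = 1; x < 256; ++x) {
    for (unsigned s = 0; s < 8; ++s) {
      unsigned shlLost = x >> (8 - s);       // top S bits (0 when S == 0)
      unsigned lshrLost = x & ((1u << s) - 1); // bottom S bits
      unsigned shl = (x << s) & 0xFFu;         // 8-bit shl result
      unsigned lshr = x >> s;                  // 8-bit lshr result
      if (shlLost == 0 && shl == 0)
        return false;  // would contradict the rule
      if (lshrLost == 0 && lshr == 0)
        return false;
    }
  }
  return true;
}
```

The "shifted-out bits are zero" condition is essential: X = 0x80 is non-zero, yet shifting it left by one in 8 bits yields 0 because the only set bit is discarded.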
* [objdump] Make iterator operator* return a reference. | Benjamin Kramer | 2015-09-24 | 1 | -1/+1
This is closer to the expected behavior of an iterator and avoids awkward warnings from clang's -Wrange-loop-analysis below.
llvm-svn: 248497
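A minimal illustration of an iterator whose `operator*` returns a reference, as the commit describes. `SymbolList` and `sumAddresses` are hypothetical; the objdump iterator itself is not shown in the message. With a reference-returning `operator*`, `const auto &` in a range-based for loop binds to the element itself rather than to a temporary copy, which is the kind of pattern clang's -Wrange-loop-analysis inspects.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical container whose iterator hands out references into its storage.
struct SymbolList {
  std::vector<int> addrs;

  struct iterator {
    using iterator_category = std::forward_iterator_tag;
    using value_type = int;
    using difference_type = std::ptrdiff_t;
    using pointer = const int *;
    using reference = const int &;

    const int *p;
    reference operator*() const { return *p; }  // a reference, not a copy
    iterator &operator++() { ++p; return *this; }
    bool operator==(const iterator &o) const { return p == o.p; }
    bool operator!=(const iterator &o) const { return p != o.p; }
  };

  iterator begin() const { return {addrs.data()}; }
  iterator end() const { return {addrs.data() + addrs.size()}; }
};

// `const auto &a` binds directly to each stored element.
long sumAddresses(const SymbolList &list) {
  long total = 0;
  for (const auto &a : list)
    total += a;
  return total;
}
```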
* Regression Test: Deletes redundant/invalid test. | Mohammad Shahid | 2015-09-24 | 1 | -242/+0
Removes the absdiff_expand.ll regression test file, which is invalid.
Differential Revision: http://reviews.llvm.org/D11678
llvm-svn: 248493
* [mips] Use PredicateControl for the MSA ASE instructions. NFC. | Daniel Sanders | 2015-09-24 | 3 | -22/+23
Reviewers: vkalintiris
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13092
llvm-svn: 248486
* Codegen: Fix llvm.*absdiff semantics. | Mohammad Shahid | 2015-09-24 | 4 | -26/+245
Fixes the overflow case of the llvm.*absdiff intrinsics and updates the tests and LangRef.rst accordingly.
Differential Revision: http://reviews.llvm.org/D11678
llvm-svn: 248483
* [InstCombine] Recognize another bswap idiom. | Charlie Turner | 2015-09-24 | 2 | -6/+22
Summary: The byte-swap recognizer can now notice that this
```
#include <stdint.h>

uint32_t bswap(uint32_t x) {
  x = (x & 0x0000FFFF) << 16 | (x & 0xFFFF0000) >> 16;
  x = (x & 0x00FF00FF) << 8 | (x & 0xFF00FF00) >> 8;
  return x;
}
```
is a bswap.
Fixes PR23863.
Reviewers: nlewycky, hfinkel, hans, jmolloy, rengolin
Subscribers: majnemer, rengolin, llvm-commits
Differential Revision: http://reviews.llvm.org/D12637
llvm-svn: 248482
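One way to convince yourself that the two-step idiom quoted in this commit really is a byte swap is to compare it against a direct reference implementation. The C++ sketch below does that; `bswapIdiom` and `bswapReference` are hypothetical names, and the check says nothing about how InstCombine's recognizer matches the pattern.

```cpp
#include <cassert>
#include <cstdint>

// The idiom from the commit message: swap the 16-bit halves, then swap the
// bytes within each half.
std::uint32_t bswapIdiom(std::uint32_t x) {
  x = (x & 0x0000FFFFu) << 16 | (x & 0xFFFF0000u) >> 16;
  x = (x & 0x00FF00FFu) << 8 | (x & 0xFF00FF00u) >> 8;
  return x;
}

// A straightforward byte-by-byte reference swap to compare against.
std::uint32_t bswapReference(std::uint32_t x) {
  return (x >> 24) | ((x >> 8) & 0x0000FF00u) | ((x << 8) & 0x00FF0000u) |
         (x << 24);
}
```

The first line moves each 16-bit half into the other half's position; the second swaps adjacent bytes inside each half, so together every byte ends up mirrored.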
* Introduce target hook for optimizing register copies | Matt Arsenault | 2015-09-24 | 14 | -158/+281
Allow a target to do something other than search for copies that will avoid cross register bank copies.
Implement for SI by only rewriting the most basic copies, so it should look through anything like a subregister extract.
I'm not entirely satisfied with this because it seems like eliminating a reg_sequence that isn't fully used should work generically for all targets without them having to override something. However, it seems to be tricky to have a simple implementation of this without rewriting to invalid kinds of subregister copies on some targets.
I'm not sure if there is currently a generic way to easily check if a subregister index would be valid for the current use. The current set of TargetRegisterInfo::get*Class functions don't quite behave like I would expect (e.g. getSubClassWithSubReg returns the maximal register class rather than the minimal), so I'm not sure how to make the generic test keep searching if SrcRC:SrcSubReg is a valid replacement for DefRC:DefSubReg.
Making the default implementation check for simple copies breaks a variety of ARM and x86 tests by producing illegal subregister uses. The ARM tests are not actually changed since it should still be using the same sharesSameRegisterFile implementation; this just relaxes them to not check for specific registers.
llvm-svn: 248478
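A heavily simplified sketch of the hook pattern described above: a virtual predicate with a generic default that a target overrides. The message does not give the hook's exact signature, so `RegClass`, `RegInfoModel`, `SIRegInfoModel`, and `shouldRewriteCopySrc` are hypothetical names, with parameters mirroring the DefRC:DefSubReg and SrcRC:SrcSubReg pairs the message mentions. The default loosely models the "same register file" check, and the SI override reads "only the most basic copies" as "no subregister index on either side"; that reading is an assumption, not the actual SI code.

```cpp
#include <cassert>

// Hypothetical register class: just the register file/bank it belongs to.
struct RegClass {
  int bank;
};

struct RegInfoModel {
  virtual ~RegInfoModel() = default;
  // Default policy: a copy source may be rewritten if it stays in the same
  // register file. Subregister indices are ignored here (0 = whole register).
  virtual bool shouldRewriteCopySrc(RegClass defRC, unsigned /*defSubReg*/,
                                    RegClass srcRC, unsigned /*srcSubReg*/) const {
    return defRC.bank == srcRC.bank;
  }
};

// Target override: only rewrite the most basic copies, i.e. whole-register
// copies within the same register file.
struct SIRegInfoModel : RegInfoModel {
  bool shouldRewriteCopySrc(RegClass defRC, unsigned defSubReg,
                            RegClass srcRC, unsigned srcSubReg) const override {
    return defSubReg == 0 && srcSubReg == 0 &&
           RegInfoModel::shouldRewriteCopySrc(defRC, defSubReg, srcRC, srcSubReg);
  }
};
```

Putting the policy behind a virtual predicate lets the generic copy-rewriting search stay unchanged while each target decides how aggressive it may be.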
* AMDGPU: Return after instruction is processed. | Matt Arsenault | 2015-09-24 | 1 | -0/+4
llvm-svn: 248476
* AMDGPU: Remove another unnecessary check from commuteInstruction | Matt Arsenault | 2015-09-24 | 1 | -5/+0
llvm-svn: 248475