path: root/llvm/lib/Target
Commit message (Author, Age, Files, Lines)
* remove 'FeatureSlowUAMem' from AMD CPUs based on 10H micro-arch or later (Sanjay Patel, 2015-08-21, 1 file, -11/+7)
| See discussion in D12154 (http://reviews.llvm.org/D12154), AMD Software Optimization Guides for 10H/12H/15H/16H, and Agner Fog's experimental data.
| llvm-svn: 245733
* [x86] invert logic for attribute 'FeatureFastUAMem' (Sanjay Patel, 2015-08-21, 5 files, -89/+98)
| This is a 'no functional change intended' patch. It removes one FIXME, but adds several more.
| Motivation: the FeatureFastUAMem attribute may be too general. It is used to determine if a misaligned memory access of any size under 32 bytes is 'fast'. From the added FIXME comments, however, you can see that we're not consistent about this. Changing the name of the attribute makes the logic holes easier to see.
| Changing this to a 'slow' attribute also means we don't have to add an explicit 'fast' attribute to new chips; fast unaligned accesses have been standard for several generations of CPUs now.
| Differential Revision: http://reviews.llvm.org/D12154
| llvm-svn: 245729
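The effect of flipping the attribute's polarity can be pictured with a small standalone model. This is only a sketch: `SubtargetModel` and `isUnalignedMemAccessFast` are hypothetical names chosen for illustration and are not the X86 backend's actual classes or queries.

```cpp
#include <cassert>

// Hypothetical stand-in for a subtarget's feature bits (not LLVM's X86Subtarget).
struct SubtargetModel {
  // With a 'slow' attribute the default value already describes a modern chip,
  // so a new CPU definition needs no explicit flag.
  bool SlowUAMem = false;
};

// Assumed query: the attribute only speaks about accesses under 32 bytes,
// so the sketch rejects anything outside that range.
inline bool isUnalignedMemAccessFast(const SubtargetModel &ST, unsigned Bytes) {
  assert(Bytes > 0 && Bytes < 32 && "attribute only covers accesses under 32 bytes");
  return !ST.SlowUAMem;
}
```

Under this model only the older chips carry a flag, which matches the motivation given in the commit message.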
* [x86] enable machine combiner reassociations for 128-bit vector min/max (Sanjay Patel, 2015-08-21, 1 file, -0/+8)
| llvm-svn: 245715
* Fix typo - symetric -> symmetric. (Eric Christopher, 2015-08-21, 1 file, -1/+1)
| llvm-svn: 245705
* [Sparc] Support user-specified stack object overalignment. (James Y Knight, 2015-08-21, 4 files, -27/+116)
| Note: I do not implement a base pointer, so it's still impossible to have dynamic realignment AND dynamic alloca in the same function.
| This also moves the code for determining the frame index reference into getFrameIndexReference, where it belongs, instead of inline in eliminateFrameIndex.
| [Begin long-winded screed]
| Now, stack realignment for Sparc is actually a silly thing to support, because the Sparc ABI has no need for it -- unlike the situation on x86, the stack is ALWAYS aligned to the required alignment for the CPU instructions: 8 bytes on sparcv8, and 16 bytes on sparcv9.
| However, LLVM unfortunately implements user-specified overalignment using stack realignment support, so for now, I'm going to go along with that tradition. GCC instead treats objects which have an alignment specification greater than the maximum CPU-required alignment for the target as a separate block of stack memory, with their own virtual base pointer (which gets aligned). Doing it that way avoids needing to implement per-target support for stack realignment, except for the targets which *actually* have an ABI-specified stack alignment which is too small for the CPU's requirements.
| Further unfortunately in LLVM, the default canRealignStack for all targets effectively returns true, even though implementing realignment is something a target needs to do specifically. So, the previous behavior on Sparc was to silently ignore the user's specified stack alignment. Ugh.
| Yet MORE unfortunate, if a target actually does return false from canRealignStack, that also causes the user-specified alignment to be *silently ignored*, rather than emitting an error.
| (I started looking into fixing that last problem, but it broke a bunch of tests, because LLVM actually *depends* on having it silently ignored: some architectures (e.g. non-linux i386) have smaller stack alignment than spilled-register alignment. But, the fact that a register needs spilling is not known until within the register allocator. And by that point, the decision to not reserve the frame pointer has been frozen in place. And without a frame pointer, stack realignment is not possible. So, canRealignStack() returns false, and needsStackRealignment() then returns false, assuming everyone can just go on their merry way assuming the alignment requirements were probably just suggestions after all. Sigh...)
| Differential Revision: http://reviews.llvm.org/D12208
| llvm-svn: 245668
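As a rough illustration of what realignment means for a downward-growing stack, the sketch below rounds a stack pointer down to a requested power-of-two alignment. The function name `realignStackDown` is hypothetical and this is not the Sparc backend's code. It only shows why the adjustment is not known at compile time: the number of bytes skipped depends on the incoming stack pointer, which is why a frame pointer is needed to find the original frame again.

```cpp
#include <cassert>
#include <cstdint>

// Round a downward-growing stack pointer down to a power-of-two alignment.
// Hypothetical helper for illustration only.
inline std::uint64_t realignStackDown(std::uint64_t SP, std::uint64_t Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0 &&
         "alignment must be a power of two");
  return SP & ~(Align - 1);
}
```

For example, a stack pointer of 0x1008 realigned to 64 bytes moves down by 8 bytes, while 0x1040 does not move at all.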
* SparcAsmParser.cpp: Appease msc x86. (NAKAMURA Takumi, 2015-08-21, 1 file, -1/+1)
| llvm-svn: 245661
* AArch64: Fix cmp;ccmp ordering (Matthias Braun, 2015-08-20, 1 file, -3/+10)
| When producing conditional compare sequences for OR operations we need to negate the operands and the finally tested flags. However, negating the finally tested flags is equivalent to a logical negation of all previously emitted expressions. There was a case missing where we have to order OR expressions so they get emitted first.
| This fixes http://llvm.org/PR24459
| llvm-svn: 245641
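The negation described above is De Morgan's law applied to the condition tree. A minimal sketch of the identity, written as plain boolean code rather than as AArch64 instruction selection (the function name is hypothetical):

```cpp
#include <cassert>

// a || b expressed as a conjunction of negated operands whose final result
// is negated again: the same shape a CMP;CCMP chain uses for an OR.
inline bool orViaNegatedAnd(bool A, bool B) {
  return !(!A && !B);
}
```

Because the final negation applies to everything evaluated before it, the order in which sub-expressions are emitted matters, which is the case the commit says was missing.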
* AArch64: Do not create CCMP on multiple users. (Matthias Braun, 2015-08-20, 1 file, -1/+1)
| Creating CMP;CCMP sequences from and/or trees does not gain us anything if the and/or tree is materialized to a GP register anyway. While most of the code already checked for hasOneUse() there was one important case missing.
| llvm-svn: 245640
* [WebAssembly] Mark more operators as Expand. (Dan Gohman, 2015-08-20, 1 file, -0/+26)
| llvm-svn: 245636
* [X86] Look for scalar through one bitcast when lowering to VBROADCAST. (Ahmed Bougacha, 2015-08-20, 2 files, -0/+24)
| Fixes PR23464: one way to use the broadcast intrinsics is:
|   _mm256_broadcastw_epi16(_mm_cvtsi32_si128(*(int*)src));
| We don't currently fold this, but now that we use native IR for the intrinsics (r245605), we can look through one bitcast to find the broadcast scalar.
| Differential Revision: http://reviews.llvm.org/D10557
| llvm-svn: 245613
* [NVPTX] truncating 64-bit to 32-bit is free (Jingyue Wu, 2015-08-20, 1 file, -0/+8)
| Summary: Add an LSR test that exercises isTruncateFree. Without this change, LSR creates another indvar representing the truncated value.
| Reviewers: jholewinski, eliben
| Subscribers: jholewinski, llvm-commits
| Differential Revision: http://reviews.llvm.org/D12058
| llvm-svn: 245611
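A hook like isTruncateFree tells optimizations such as LSR that a truncation costs nothing, so they can reuse the wide value instead of keeping a second induction variable. The sketch below models that decision on plain bit widths. `isTruncateFreeModel` and `truncate64To32` are hypothetical names, and the real hook operates on LLVM types rather than integers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the target hook: only the 64-bit to 32-bit case
// named in the commit title is reported as free.
inline bool isTruncateFreeModel(unsigned SrcBits, unsigned DstBits) {
  return SrcBits == 64 && DstBits == 32;
}

// Truncation itself just keeps the low 32 bits of the 64-bit value.
inline std::uint32_t truncate64To32(std::uint64_t V) {
  return static_cast<std::uint32_t>(V);
}
```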
* [X86] Replace avx2 broadcast intrinsics with native IR. (Ahmed Bougacha, 2015-08-20, 1 file, -86/+30)
| Since r245605, the clang headers don't use these anymore. r245165 updated some of the tests already; update the others, add an autoupgrade, remove the intrinsics, and clean up the definitions.
| Differential Revision: http://reviews.llvm.org/D10555
| llvm-svn: 245606
* [ARM] Don't try and custom lower a vNi64 SETCC. (James Molloy, 2015-08-20, 1 file, -0/+6)
| It won't go well. We've already marked 64-bit SETCCs as non-Custom, but it's just possible that a SETCC has a legal result type but an illegal operand type. If this happens, bail out before we create unselectable nodes.
| Fixes PR24292. I tried to create a testcase but in 99% of cases we can't trigger this - not surprising that this bug has been latent since 2009.
| llvm-svn: 245577
* [Sparc]: correct the 'set' synthetic instruction (Douglas Katzman, 2015-08-20, 1 file, -5/+37)
| Differential Revision: http://reviews.llvm.org/D12194
| llvm-svn: 245575
* [X86] Fix FBLD and FBSTP (Marina Yatsina, 2015-08-20, 1 file, -2/+2)
| FBLD and FBSTP should receive TBYTE because they are defined as:
|   FBLD m80
|   FBSTP m80
| Differential Revision: http://reviews.llvm.org/D11748
| llvm-svn: 245553
* [X86] Fix bug in COMISD and COMISS definition in td files (Marina Yatsina, 2015-08-20, 2 files, -6/+6)
| COMISD should receive QWORD because it is defined as:
|   (V)COMISD xmm1, xmm2/m64
| COMISS should receive DWORD because it is defined as:
|   (V)COMISS xmm1, xmm2/m32
| Differential Revision: http://reviews.llvm.org/D11712
| llvm-svn: 245551
* [X86] Fix the (shl (and (setcc_c), c1), c2) -> (and setcc_c, (c1 << c2)) fold (David Majnemer, 2015-08-20, 1 file, -12/+28)
| We didn't check for the necessary preconditions before folding a mask/shift into a single mask.
| This fixes PR24516.
| llvm-svn: 245544
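The commit message does not list the preconditions. One property the identity clearly relies on is that the setcc_c operand is either all zeros or all ones. The sketch below writes out both sides of the fold on 32-bit values so that this can be checked. The function names are hypothetical and this is not the backend's DAG-combine code.

```cpp
#include <cassert>
#include <cstdint>

// Left-hand side of the fold: (X & C1) << C2.
inline std::uint32_t shiftOfMasked(std::uint32_t X, std::uint32_t C1, unsigned C2) {
  assert(C2 < 32 && "shift amount must be less than the bit width");
  return (X & C1) << C2;
}

// Right-hand side of the fold: X & (C1 << C2).
inline std::uint32_t maskedByShifted(std::uint32_t X, std::uint32_t C1, unsigned C2) {
  assert(C2 < 32 && "shift amount must be less than the bit width");
  return X & (C1 << C2);
}
```

With X equal to 0 or 0xFFFFFFFF the two sides agree, but with an arbitrary value such as X = 1, C1 = 1, C2 = 4 they differ (16 versus 0), so the fold is not valid for general operands.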
* [PowerPC] Fix value type on XVCMPEQDP for v2f64 comparisons (Hal Finkel, 2015-08-20, 1 file, -3/+4)
| XVCMPEQDP is used for VSX v2f64 equality comparisons, but the value type needs to be v2i64 (as that's the corresponding SETCC type).
| Fixes PR24225.
| llvm-svn: 245535
* [PowerPC] Fix the int2fp(fp2int(x)) DAGCombine to ignore ppc_fp128 (Hal Finkel, 2015-08-20, 1 file, -0/+3)
| This DAGCombine was creating custom SDAG nodes with an illegal ppc_fp128 operand type because it was triggering on f64/f32 int2fp(fp2int(ppc_fp128 x)), but shouldn't (it should only apply to f32/f64 types). The result was a crash.
| llvm-svn: 245530
* [x86] enable machine combiner reassociations for scalar double-precision min/max (Sanjay Patel, 2015-08-19, 1 file, -0/+4)
| llvm-svn: 245506
* [x86] enable machine combiner reassociations for scalar single-precision maximums (Sanjay Patel, 2015-08-19, 1 file, -0/+2)
| llvm-svn: 245504
* [AArch64][FastISel] Don't fold shifts with UB. (Juergen Ributzka, 2015-08-19, 1 file, -13/+38)
| We are already falling back to SelectionDAG when encountering a shift with UB. This adds the same checks for shifts with UB that get folded into arithmetic or logical operations.
| This fixes rdar://problem/22345295.
| llvm-svn: 245499
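In LLVM IR a shift whose amount is greater than or equal to the operand's bit width has no defined result, so a folding step has to reject it before encoding the amount into another instruction. A minimal sketch of such a guard follows. The helper name is hypothetical and is not taken from the FastISel source.

```cpp
#include <cassert>
#include <cstdint>

// Returns true when a constant shift amount is in range for the operand
// width, i.e. when the shift has a defined result and may be folded.
inline bool isShiftAmountDefined(unsigned BitWidth, std::uint64_t Amount) {
  assert(BitWidth > 0 && "operand width must be non-zero");
  return Amount < BitWidth;
}
```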
* [X86] Emit more efficient >= comparisons against 0 (David Majnemer, 2015-08-19, 1 file, -1/+49)
| We don't do a great job with >= comparisons against zero when the result is used as an i8.
| Given something like:
|   void f(long long LL, bool *B) {
|     *B = LL >= 0;
|   }
| We used to generate:
|   shrq $63, %rdi
|   xorb $1, %dil
|   movb %dil, (%rsi)
| Now we generate:
|   testq %rdi, %rdi
|   setns (%rsi)
| Differential Revision: http://reviews.llvm.org/D12136
| llvm-svn: 245498
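Both instruction sequences compute the same predicate, which can be checked in portable code. The sketch below mirrors the old sequence (shift the sign bit down, then flip it) and the new one (test the sign directly). The function names are hypothetical, and the code assumes a 64-bit `long long`.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

static_assert(sizeof(long long) == 8, "sketch assumes a 64-bit long long");

// Old sequence: shrq $63 isolates the sign bit, xorb $1 inverts it.
inline bool nonNegativeViaShift(long long LL) {
  std::uint64_t Sign = static_cast<std::uint64_t>(LL) >> 63;
  return (Sign ^ 1) != 0;
}

// New sequence: testq sets the sign flag, setns stores its complement.
inline bool nonNegativeViaSignTest(long long LL) {
  return !(LL < 0);
}
```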
* [WebAssembly] Use the default alignment for SIMD types. (Dan Gohman, 2015-08-19, 1 file, -2/+2)
| Previously WebAssembly's datalayout string had -v128:8:128. This had been an attempt to declare a certain level of support for unaligned SIMD accesses. However, clang makes its own determinations for SIMD alignment that are independent of the datalayout string, so this wasn't actually meaningful.
| llvm-svn: 245494
* [Sparc]: asm-only support for the ldstub instruction. (Douglas Katzman, 2015-08-19, 1 file, -0/+11)
| llvm-svn: 245485
* Temporary fix for the self-host failures introduced by rL244921. (Nemanja Ivanovic, 2015-08-19, 1 file, -1/+2)
| This revision has introduced an issue that only affects a bootstrapped compiler when it is printing the ASM. I am working on resolving the issue, but in the meantime, I'm disabling the legalization of the scalar_to_vector operation for v2i64 and the associated testing until I can get this fixed.
| llvm-svn: 245481
* [PeepholeOptimizer] Look through PHIs to find additional register sources (Bruno Cardoso Lopes, 2015-08-19, 1 file, -1/+2)
| Reintroduce r245442. Remove an overly conservative assertion introduced in r245442. We could change the assertion to use `shareSameRegisterFile` instead, but at that point in `insertPHI` we have already lost the original Def subreg to check against. So drop the assertion completely.
| Original commit message:
| - Teaches the ValueTracker in the PeepholeOptimizer to look through PHI instructions.
| - Add findNextSourceAndRewritePHI method to look into multiple sources returned by the ValueTracker and rewrite PHIs with new sources.
| With these changes we can find more register sources and rewrite more copies to allow coalescing of bitcast instructions. Hence, we eliminate unnecessary VR64 <-> GR64 copies in x86, but it could be extended to other archs by marking "isBitcast" on target specific instructions. The x86 example follows:
|   A:
|     psllq %mm1, %mm0
|     movd %mm0, %r9
|     jmp C
|   B:
|     por %mm1, %mm0
|     movd %mm0, %r9
|     jmp C
|   C:
|     movd %r9, %mm0
|     pshufw $238, %mm0, %mm0
| Becomes:
|   A:
|     psllq %mm1, %mm0
|     jmp C
|   B:
|     por %mm1, %mm0
|     jmp C
|   C:
|     pshufw $238, %mm0, %mm0
| Differential Revision: http://reviews.llvm.org/D11197
| rdar://problem/20404526
| llvm-svn: 245479
* [SPARC] Enable writing to floating-point-state register. (Douglas Katzman, 2015-08-19, 3 files, -0/+27)
| llvm-svn: 245475
* [AArch64] Improve short-form diags on long-form Match_InvalidOperand. (Ahmed Bougacha, 2015-08-19, 1 file, -10/+18)
| Since r244955, we try to use the short-form ErrorInfo when both tries failed, and the long-form match failed on a suffix operand. However, this means we sometimes mix ErrorInfo and MatchResult (one manifestation of this being PR24498). Instead, restore both.
| llvm-svn: 245469
* Revert "[AArch64] Simplify/refactor code to ease code review. NFC." (Renato Golin, 2015-08-19, 1 file, -32/+18)
| This reverts commit r245443, as it broke AArch64 test-suite tramp3d with the assert: Reg && "Null register has no regunits".
| llvm-svn: 245455
* x32. Fixes a bug in x32 exception handling. (Derek Schuff, 2015-08-19, 1 file, -1/+1)
| This patch updates the X86 lowering so that the Exception Pointer and Selector are 64-bit wide only if Subtarget.isTarget64BitLP64.
| Patch by João Porto
| Reviewers: dschuff, rnk
| Differential Revision: http://reviews.llvm.org/D12111
| llvm-svn: 245454
* x32. Fixes jmp %reg in x32 (JF Bastien, 2015-08-19, 1 file, -0/+21)
| x32 has 32-bit pointers; x86-64 can't jmp %r32. This patch addresses this issue by explicitly zero-extending brind's target to 64-bits.
| Author: jpp
| Reviewers: jfb, dschuff, pavel.v.chupin
| Subscribers: llvm-commits
| Differential revision: http://reviews.llvm.org/D12112
| llvm-svn: 245452
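The zero-extension described above is a widening of a 32-bit address to 64 bits with the upper half cleared. A minimal sketch in plain C++ is shown here. `zeroExtendTarget` is a hypothetical name and is not part of the X86 lowering code.

```cpp
#include <cassert>
#include <cstdint>

// Widen a 32-bit branch target to 64 bits, clearing the upper 32 bits so the
// value is usable by an indirect jump through a 64-bit register.
inline std::uint64_t zeroExtendTarget(std::uint32_t Target32) {
  std::uint64_t Wide = static_cast<std::uint64_t>(Target32);
  assert((Wide >> 32) == 0 && "upper half must be clear after zero-extension");
  return Wide;
}
```

Addresses with the top bit set, such as 0x80000000, are the interesting case: zero-extension keeps them in the low 4 GiB, whereas a sign-extension would not.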
* [Sparc] Rename LoadASR and StoreASR from r245360 to *ASI, as was intended. (James Y Knight, 2015-08-19, 1 file, -10/+10)
| llvm-svn: 245450
* Revert "[PeepholeOptimizer] Look through PHIs to find additional register sources" (Bruno Cardoso Lopes, 2015-08-19, 1 file, -2/+1)
| Revert r245442 while investigating a fix. An assertion was hit in http://lab.llvm.org:8080/green/job/clang-stage1-configure-RA_build/11380
| llvm-svn: 245446
* [SPARC] Fix BooleanContents, so that select of a trunc doesn't eliminate the trunc. (James Y Knight, 2015-08-19, 1 file, -0/+8)
| Differential Revision: http://reviews.llvm.org/D10442
| llvm-svn: 245444
* [AArch64] Simplify/refactor code to ease code review. NFC. (Chad Rosier, 2015-08-19, 1 file, -18/+32)
| llvm-svn: 245443
* [PeepholeOptimizer] Look through PHIs to find additional register sources (Bruno Cardoso Lopes, 2015-08-19, 1 file, -1/+2)
| Reapply r243486.
| - Teaches the ValueTracker in the PeepholeOptimizer to look through PHI instructions.
| - Add findNextSourceAndRewritePHI method to look into multiple sources returned by the ValueTracker and rewrite PHIs with new sources.
| With these changes we can find more register sources and rewrite more copies to allow coalescing of bitcast instructions. Hence, we eliminate unnecessary VR64 <-> GR64 copies in x86, but it could be extended to other archs by marking "isBitcast" on target specific instructions. The x86 example follows:
|   A:
|     psllq %mm1, %mm0
|     movd %mm0, %r9
|     jmp C
|   B:
|     por %mm1, %mm0
|     movd %mm0, %r9
|     jmp C
|   C:
|     movd %r9, %mm0
|     pshufw $238, %mm0, %mm0
| Becomes:
|   A:
|     psllq %mm1, %mm0
|     jmp C
|   B:
|     por %mm1, %mm0
|     jmp C
|   C:
|     pshufw $238, %mm0, %mm0
| Differential Revision: http://reviews.llvm.org/D11197
| rdar://problem/20404526
| llvm-svn: 245442
* [ARM] Add instruction selection patterns for vmin/vmax (Silviu Baranga, 2015-08-19, 2 files, -6/+24)
| Summary: The mid-end was generating vector smin/smax/umin/umax nodes, but we were using vbsl to generate the code. This adds the vmin/vmax patterns and a test to check that we are now generating vmin/vmax instructions.
| Reviewers: rengolin, jmolloy
| Subscribers: aemerson, rengolin, llvm-commits
| Differential Revision: http://reviews.llvm.org/D12105
| llvm-svn: 245439
* Map %fprs to %asr6 in the Sparc assembler parser. (Joerg Sonnenberger, 2015-08-19, 1 file, -0/+7)
| llvm-svn: 245437
* Revert "[X86] Widen the 'AND' mask if doing so shrinks the encoding size" (Tobias Grosser, 2015-08-19, 1 file, -61/+2)
| This reverts commit 245169 which miscompiles MultiSource/Applications/siod from LNT.
| llvm-svn: 245432
* [X86] Do not lower scalar sdiv/udiv to a shifts + mul sequence when optimizing for minsize (Michael Kuperstein, 2015-08-19, 2 files, -0/+13)
| There are some cases where the mul sequence is smaller, but for the most part, using a div is preferable. This does not apply to vectors, since x86 doesn't have vector idiv, and a vector mul/shifts sequence ought to be smaller than a scalarized division.
| Differential Revision: http://reviews.llvm.org/D12082
| llvm-svn: 245431
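For readers unfamiliar with the shifts + mul sequence, the sketch below shows the general idea for one divisor: unsigned 32-bit division by 3 done with a widening multiply and a shift. It illustrates the technique only. The constants the compiler actually picks depend on the divisor and type, and `udiv3ViaMul` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned 32-bit division by 3 as a multiply and a shift.
// 0xAAAAAAAB is ceil(2^33 / 3); the 64-bit product is shifted right by 33.
inline std::uint32_t udiv3ViaMul(std::uint32_t X) {
  std::uint64_t Product = static_cast<std::uint64_t>(X) * 0xAAAAAAABull;
  return static_cast<std::uint32_t>(Product >> 33);
}
```

The multiply form is usually faster than a hardware divide but needs a 32-bit immediate plus extra instructions, which is why a size-optimizing build may prefer the plain div.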
* [TLI] Refactor "is integer division cheap" queries. (Michael Kuperstein, 2015-08-19, 3 files, -11/+0)
| This removes the isPow2SDivCheap() query, as it is not currently used in any meaningful way. isIntDivCheap() no longer relies on a state variable (as all in-tree targets set it to false), but the interface allows querying based on the type optimization level. NFC.
| Differential Revision: http://reviews.llvm.org/D12082
| llvm-svn: 245430
* MIR Serialization: Serialize the operand's bit mask target flags. (Alex Lorenz, 2015-08-18, 2 files, -0/+39)
| This commit adds support for bit mask target flag serialization to the MIR printer and the MIR parser. It also adds support for the machine operand's target flag serialization to the AArch64 target.
| Reviewers: Duncan P. N. Exon Smith
| llvm-svn: 245383
* use TLI.allowsMemoryAccess() to check if memory accesses are fast; NFCI (Sanjay Patel, 2015-08-18, 1 file, -6/+11)
| This consolidates use of isUnalignedMem32Slow() in one place.
| There is a slight change in logic although I'm not sure that it would ever come up in the real world: we were assuming that an alignment of the type size is always fast; now, we actually check the data layout to confirm that.
| llvm-svn: 245382
* Load/store instructions for floating points with address space require SparcV9. (Joerg Sonnenberger, 2015-08-18, 1 file, -19/+39)
| To properly handle this, define the *a instructions as separate instruction classes by refactoring the LoadA and StoreA multiclasses. Move the instruction tests into the sparcv9 file to test the difference.
| llvm-svn: 245360
* [WinEH] Calculate state numbers for the new EH representation (David Majnemer, 2015-08-18, 2 files, -17/+38)
| State numbers are calculated by performing a walk from the innermost funclet to the outermost funclet. Rudimentary support for the new EH constructs has been added to the assembly printer, just enough to test the new machinery.
| Differential Revision: http://reviews.llvm.org/D12098
| llvm-svn: 245331
* MachineRegisterInfo: Introduce isPhysRegUsed() (Matthias Braun, 2015-08-18, 1 file, -6/+3)
| This method checks whether a physical register or any of its aliases are used in the function.
| Using this function in SIRegisterInfo::findUnusedReg() should also fix this reported failure:
|   http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20150803/292143.html
|   http://reviews.llvm.org/rL242173#inline-533
| The report doesn't come with a testcase and I don't know enough about AMDGPU to create one myself.
| llvm-svn: 245329
* use minSize wrapper; NFCI (Sanjay Patel, 2015-08-18, 1 file, -1/+1)
| These were missed when other uses were switched over: http://llvm.org/viewvc/llvm-project?view=revision&revision=243994
| llvm-svn: 245311
* [AArch64] Simplify the logic for computing in bounds offset. NFC. (Chad Rosier, 2015-08-18, 1 file, -10/+6)
| llvm-svn: 245307
* [mips] Expand JAL instructions when PIC is enabled. (Daniel Sanders, 2015-08-18, 1 file, -0/+126)
| Summary: This is the correct way to handle JAL instructions when PIC is enabled.
| Patch by Toma Tabacu
| Reviewers: seanbruno, tomatabacu
| Subscribers: brooks, seanbruno, emaste, llvm-commits
| Differential Revision: http://reviews.llvm.org/D6231
| llvm-svn: 245305