summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen
Commit message (Collapse)AuthorAgeFilesLines
...
* [mips][microMIPS] Add CodeGen support for SUBU16, SUB, SUBU, DSUB and DSUBU ↵Zlatko Buljan2016-04-271-37/+77
| | | | | | | | instructions Differential Revision: http://reviews.llvm.org/D16676 llvm-svn: 267694
* [mips][microMIPS] Add CodeGen support for SLL16, SRL16, SLL, SLLV, SRA, ↵Zlatko Buljan2016-04-274-4/+112
| | | | | | | | SRAV, SRL and SRLV instructions Differential Revision: http://reviews.llvm.org/D17989 llvm-svn: 267693
* [ppc64] fix bug in prologue that mfocrf's cr operand should be explict state ↵Chuang-Yu Cheng2016-04-271-1/+1
| | | | | | | | | | | | instead of implicit This fixes PR27414 Reviewers: kbarton mgrang tjablin http://reviews.llvm.org/D19255 llvm-svn: 267660
* [X86] Don't assume that MMX extractelts are from index 0.Ahmed Bougacha2016-04-271-0/+28
| | | | | | | It's probably the case for all 3 MMX users out there, but with hand-crafted IR, you can trigger selection failures. Fix that. llvm-svn: 267652
* [X86] Re-enable MMX i32 extractelt combine.Ahmed Bougacha2016-04-271-0/+15
| | | | | | | | | This effectively adds back the extractelt combine removed by r262358: the direct case can still occur (because x86_mmx is special, see r262446), but it's the indirect case that's now superseded by the generic combine. llvm-svn: 267651
* Detects the SAD pattern on X86 so that much better code will be emitted once ↵Cong Hou2016-04-271-0/+973
| | | | | | | | the pattern is matched. Differential revision: http://reviews.llvm.org/D14840 llvm-svn: 267649
* [X86] Make sure it is safe to clobber EFLAGS, if need be, when choosingQuentin Colombet2016-04-261-0/+44
| | | | | | | | | | | | | | | the prologue. Do not use basic blocks that have EFLAGS live-in as prologue if we need to realign the stack. Realigning the stack uses AND instruction and this clobbers EFLAGS. An other alternative would have been to save and restore EFLAGS around the stack realignment code, but this is likely inefficient. Fixes PR27531. llvm-svn: 267634
* [X86] Replace -mcpu with -mattr in several testsMitch Bodart2016-04-264-5/+5
| | | | | | Differential Revision: http://reviews.llvm.org/D19568 llvm-svn: 267629
* [MachineBasicBlock] Take advantage of the partially dead information.Quentin Colombet2016-04-261-2/+1
| | | | | | | Thanks to that information we wouldn't lie on a register being live whereas it is not. llvm-svn: 267622
* [MachineInstrBundle] Improvement the recognition of dead definitions.Quentin Colombet2016-04-261-2/+1
| | | | | | | Now, it is possible to know that partial definitions are dead definitions and recognize that clobbered registers are also dead. llvm-svn: 267621
* [Tail duplication] Handle source registers with subregistersKrzysztof Parzyszek2016-04-261-0/+67
| | | | | | | | | | | | | | When a block is tail-duplicated, the PHI nodes from that block are replaced with appropriate COPY instructions. When those PHI nodes contained use operands with subregisters, the subregisters were dropped from the COPY instructions, resulting in incorrect code. Keep track of the subregister information and use this information when remapping instructions from the duplicated block. Differential Revision: http://reviews.llvm.org/D19337 llvm-svn: 267583
* Swift Calling Convention: use %RAX for sret.Manman Ren2016-04-261-19/+24
| | | | | | | We don't need to copy the sret argument into %rax upon return. rdar://25671494 llvm-svn: 267579
* tests: tweak MIR for ARM tests to correct MI issuesSaleem Abdulrasool2016-04-262-5/+7
| | | | | | | | | The Machine Instruction Verifier flagged some issues in the serialized MIR. Adjust the input to correct them. Fixes the remaining portion of PR27480. llvm-svn: 267578
* test: remove some bleeding whitespaceSaleem Abdulrasool2016-04-262-57/+57
| | | | | | Kill bleeding whitespace. NFC llvm-svn: 267577
* [CodeGenPrepare] use branch weight metadata to decide if a select should be ↵Sanjay Patel2016-04-261-3/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | turned into a branch This is part of solving PR27344: https://llvm.org/bugs/show_bug.cgi?id=27344 CGP should undo the SimplifyCFG transform for the same reason that earlier patches have used this same mechanism: it's possible that passes between SimplifyCFG and CGP may be able to optimize the IR further with a select in place. For the TLI hook default, >99% taken or not taken is chosen as the default threshold for a highly predictable branch. Even the most limited HW branch predictors will be correct on this branch almost all the time, so even a massive mispredict penalty perf loss would be overcome by the win from all the times the branch was predicted correctly. As a follow-up, we could make the default target hook less conservative by using the SchedMachineModel's MispredictPenalty. Or we could just let targets override the default by implementing the hook with that and other target-specific options. Note that trying to statically determine mispredict rates for close-to-balanced profile weight data is generally impossible if the HW is sufficiently advanced. Ie, 50/50 taken/not-taken might still be 100% predictable. Finally, note that this patch as-is will not solve PR27344 because the current __builtin_unpredictable() branch weight default values are 4 and 64. A proposal to change that is in D19435. Differential Revision: http://reviews.llvm.org/D19488 llvm-svn: 267572
* [AMDGPU] Reserve VGPRs for trap handler usage if instructedKonstantin Zhuravlyov2016-04-261-0/+37
| | | | | | Differential Revision: http://reviews.llvm.org/D19235 llvm-svn: 267563
* [X86] PR27502: Fix the LEA optimization pass.Andrey Turetskiy2016-04-262-1/+21
| | | | | | | | Handle MachineBasicBlock as a memory displacement operand in the LEA optimization pass. Differential Revision: http://reviews.llvm.org/D19409 llvm-svn: 267551
* [PowerPC] Add support for llvm.thread.pointerMarcin Koscielnicki2016-04-261-0/+17
| | | | | | Differential Revision: http://reviews.llvm.org/D19304 llvm-svn: 267546
* [SPARC] [SSP] Add support for LOAD_STACK_GUARD.Marcin Koscielnicki2016-04-261-0/+33
| | | | | | | | This fixes PR22248 on sparc. Differential Revision: http://reviews.llvm.org/D19386 llvm-svn: 267545
* [SPARC] Add support for llvm.thread.pointer.Marcin Koscielnicki2016-04-261-0/+11
| | | | | | Differential Revision: http://reviews.llvm.org/D19387 llvm-svn: 267544
* [AArch64] Expand v1i64 and v2i64 ctlz.Craig Topper2016-04-261-0/+28
| | | | | | The default is legal, which results in 'Cannot select' errors. llvm-svn: 267522
* [ARM] Expand vector ctlz_zero_undef so it becomes ctlz.Craig Topper2016-04-261-0/+62
| | | | | | The default is Legal, which results in 'Cannot select' errors. llvm-svn: 267521
* [ARM] Expand v1i64 and v2i64 ctlz.Craig Topper2016-04-261-0/+16
| | | | | | The default is legal, which results in 'Cannot select' errors. llvm-svn: 267520
* Pass the test file in through stdin instead of by filename.Richard Trieu2016-04-261-1/+1
| | | | | | | | When passed in via filename, this test will fail if the path to the test has the strings "f1" and "f2" in somewhere. Pass the file through stdin to prevent test failures due to coincidences in path names. llvm-svn: 267517
* [WebAssembly] Account for implicit operands when computing operand indices.Dan Gohman2016-04-261-1/+7
| | | | llvm-svn: 267511
* [X86] Use LivePhysRegs in X86FixupBWInsts.Ahmed Bougacha2016-04-263-14/+19
| | | | | | | | | Kill-flags, which computeRegisterLiveness uses, are not reliable. LivePhysRegs is. Differential Revision: http://reviews.llvm.org/D19472 llvm-svn: 267495
* [Sparc] Fix double-float fabs and fneg on little endian CPUs.James Y Knight2016-04-252-77/+49
| | | | | | | | | | | | | | | | The SparcV8 fneg and fabs instructions interestingly come only in a single-float variant. Since the sign bit is always the topmost bit no matter what size float it is, you simply operate on the high subregister, as if it were a single float. However, the layout of double-floats in the float registers is reversed on little-endian CPUs, so that the high bits are in the second subregister, rather than the first. Thus, this expansion must check the endianness to use the correct subregister. llvm-svn: 267489
* ARM: put extern __thread stubs in a special section.Tim Northover2016-04-251-0/+17
| | | | | | | The linker needs to know that the symbols are thread-local to do its job properly. llvm-svn: 267473
* Re-apply r267206 with a fix for the encoding problem: when the immediate ofQuentin Colombet2016-04-251-5/+52
| | | | | | | | | | | | | | | | | | log2(Mask) is smaller than 32, we must use the 32-bit variant because the 64-bit variant cannot encode it. Therefore, set the subreg part accordingly. [AArch64] Fix optimizeCondBranch logic. The opcode for the optimized branch does not depend on the size of the activate bits in the AND masks, but the AND opcode itself. Indeed, we need to use a X or W variant based on the AND variant not based on whether the mask fits into the related variant. Otherwise, we may end up using the W variant of the optimized branch for 64-bit register inputs! This fixes the last make check verifier issues for AArch64: PR27479. llvm-svn: 267465
* AMDGPU: Implement addrspacecastMatt Arsenault2016-04-253-22/+274
| | | | llvm-svn: 267452
* AMDGPU: Add queue ptr intrinsicMatt Arsenault2016-04-252-0/+30
| | | | llvm-svn: 267451
* [Hexagon] Register save/restore functions do not follow regular conventionsKrzysztof Parzyszek2016-04-251-0/+72
| | | | | | Do not mark them as modifying any of the volatile registers by default. llvm-svn: 267433
* add tests for potential CGP transform (PR27344)Sanjay Patel2016-04-251-0/+32
| | | | llvm-svn: 267426
* [PR27390] [CodeGen] Reject indexed loads in CombinerDAG.Marcin Koscielnicki2016-04-251-0/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | visitAND, when folding and (load) forgets to check which output of an indexed load is involved, happily folding the updated address output on the following testcase: target datalayout = "e-m:e-i64:64-n32:64" target triple = "powerpc64le-unknown-linux-gnu" %typ = type { i32, i32 } define signext i32 @_Z8access_pP1Tc(%typ* %p, i8 zeroext %type) { %b = getelementptr inbounds %typ, %typ* %p, i64 0, i32 1 %1 = load i32, i32* %b, align 4 %2 = ptrtoint i32* %b to i64 %3 = and i64 %2, -35184372088833 %4 = inttoptr i64 %3 to i32* %_msld = load i32, i32* %4, align 4 %zzz = add i32 %1, %_msld ret i32 %zzz } Fix this by checking ResNo. I've found a few more places that currently neglect to check for indexed load, and tightened them up as well, but I don't have test cases for them. In fact, they might not be triggerable at all, at least with current targets. Still, better safe than sorry. Differential Revision: http://reviews.llvm.org/D19202 llvm-svn: 267420
* [mips][microMIPS] Revert commit r267137Hrvoje Varga2016-04-254-10/+2
| | | | | | Commit r267137 was the reason for failing tests in LLVM test suite. llvm-svn: 267419
* [x86] auto-generate checks for cmov testsSanjay Patel2016-04-251-14/+32
| | | | llvm-svn: 267417
* [WinEH] Update SplitAnalysis::computeLastSplitPoint to cope with multiple EH ↵David Majnemer2016-04-251-0/+67
| | | | | | | | | | | | | | | | | | | successors We didn't have logic to correctly handle CFGs where there was more than one EH-pad successor (these are novel with WinEH). There were situations where a register was live in one exceptional successor but not another but the code as written would only consider the first exceptional successor it found. This resulted in split points which were insufficiently early if an invoke was present. This fixes PR27501. N.B. This removes getLandingPadSuccessor. llvm-svn: 267412
* [ARM] Add support for the X asm constraintSilviu Baranga2016-04-252-0/+178
| | | | | | | | | | | | | | | | | | Summary: This patch adds support for the X asm constraint. To do this, we lower the constraint to either a "w" or "r" constraint depending on the operand type (both constraints are supported on ARM). Fixes PR26493 Reviewers: t.p.northover, echristo, rengolin Subscribers: joker.eph, jgreenhalgh, aemerson, rengolin, llvm-commits Differential Revision: http://reviews.llvm.org/D19061 llvm-svn: 267411
* [AMDGPU][llvm-mc] s_getreg/setreg* - Add hwreg(...) syntax.Artem Tamazov2016-04-251-1/+1
| | | | | | | | | | | | | Added hwreg(reg[,offset,width]) syntax. Default offset = 0, default width = 32. Possibility to specify 16-bit immediate kept. Added out-of-range checks. Disassembling is always to hwreg(...) format. Tests updated/added. Differential Revision: http://reviews.llvm.org/D19329 llvm-svn: 267410
* [PowerPC] [PR27387] Disallow r0 for ADD8TLS.Marcin Koscielnicki2016-04-251-0/+43
| | | | | | | | | | | ADD8TLS, a variant of add instruction used for initial-exec TLS, currently accepts r0 as a source register. While add itself supports r0 just fine, linker can relax it to a local-exec sequence, converting it to addi - which doesn't support r0. Differential Revision: http://reviews.llvm.org/D19193 llvm-svn: 267388
* Fixing wrong mask size error. From __mmask8 to __mmask16.Michael Zuckerman2016-04-251-5/+5
| | | | | | | Was reviewed over the shoulder by AsafBadouh. Connected to review http://reviews.llvm.org/D19195. llvm-svn: 267379
* [X86] Add a complete set of tests for all operand sizes of cttz/ctlz with ↵Craig Topper2016-04-251-6/+123
| | | | | | and without zero undef being lowered to bsf/bsr. llvm-svn: 267373
* [X86][AVX] Added PR24935 test caseSimon Pilgrim2016-04-241-0/+39
| | | | llvm-svn: 267362
* ARM: fix __chkstk Frame Setup on WoASaleem Abdulrasool2016-04-244-9/+9
| | | | | | | | | | | | This corrects the MI annotations for the stack adjustment following the __chkstk invocation. We were marking the original SP usage as a Def rather than Kill. The (new) assigned value is the definition, the original reference is killed. Adjust the ISelLowering to mark Kills and FrameSetup as well. This partially resolves PR27480. llvm-svn: 267361
* [X86][SSE] Added SSSE3/AVX/AVX2 BITREVERSE testsSimon Pilgrim2016-04-241-52/+14603
| | | | | | Codegen is pretty bad at the moment but could use PSHUFB quite efficiently llvm-svn: 267347
* [X86][XOP] Fixed VPPERM permute op decoding (PR27472).Simon Pilgrim2016-04-241-1/+1
| | | | | | Fixed issue with VPPERM target shuffle mask decoding that was incorrectly masking off the 3-bit permute op with a 2-bit mask. llvm-svn: 267346
* [X86][SSE] Improved support for decoding target shuffle masks through bitcastsSimon Pilgrim2016-04-242-13/+3
| | | | | | | | Reused the ability to split constants of a type wider than the shuffle mask to work with masks generated from scalar constants transfered to xmm. This fixes an issue preventing PSHUFB target shuffle masks decoding rematerialized scalar constants and also exposes the XOP VPPERM bug described in PR27472. llvm-svn: 267343
* [SystemZ] [SSP] Add support for LOAD_STACK_GUARD.Marcin Koscielnicki2016-04-241-0/+35
| | | | | | | | | | | | | | This fixes PR22248 on s390x. The previous attempt at this was D19101, which was before LOAD_STACK_GUARD existed. Compared to the previous version, this always emits a rather ugly block of 4 instructions, involving a thread pointer load that can't be shared with other potential users. However, this is necessary for SSP - spilling the guard value (or thread pointer used to load it) is counter to the goal, since it could be overwritten along with the frame it protects. Differential Revision: http://reviews.llvm.org/D19363 llvm-svn: 267340
* [X86][SSE] Demonstrate issue with decoding shuffle masks that have been ↵Simon Pilgrim2016-04-242-0/+37
| | | | | | | | lowered as rematerialized constants on scalar unit Found whilst investigating PR27472 llvm-svn: 267339
* [MachineCombiner] Support for floating-point FMA on ARM64 (re-commit r267098)Gerolf Hoflehner2016-04-242-0/+264
| | | | | | | | | | | | | | | | | | | The original patch caused crashes because it could derefence a null pointer for SelectionDAGTargetInfo for targets that do not define it. Evaluates fmul+fadd -> fmadd combines and similar code sequences in the machine combiner. It adds support for float and double similar to the existing integer implementation. The key features are: - DAGCombiner checks whether it should combine greedily or let the machine combiner do the evaluation. This is only supported on ARM64. - It gives preference to throughput over latency: the heuristic used is to combine always in loops. The targets decides whether the machine combiner should optimize for throughput or latency. - Supports for fmadd, f(n)msub, fmla, fmls patterns - On by default at O3 ffast-math llvm-svn: 267328
OpenPOWER on IntegriCloud