summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen
Commit message (Collapse)AuthorAgeFilesLines
* [ARM] Check for correct HW div when lowering divmodDiana Picus2017-04-181-0/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For subtargets that use the custom lowering for divmod, e.g. gnueabi, we used to check if the subtarget has hardware divide and then lower to a div-mul-sub sequence if true, or to a libcall if false. However, judging by the usage of hasDivide vs hasDivideInARMMode, it seems that hasDivide only refers to Thumb. For instance, in the ARMTargetLowering constructor, the code that specifies whether to use libcalls for (S|U)DIV looks like this: bool hasDivide = Subtarget->isThumb() ? Subtarget->hasDivide() : Subtarget->hasDivideInARMMode(); In the case of divmod for arm-gnueabi, using only hasDivide() to determine what to do means that instead of lowering to __aeabi_idivmod to get the remainder, we lower to div-mul-sub and then further lower the div to __aeabi_idiv. Even worse, if we have hardware divide in ARM but not in Thumb, we generate a libcall instead of using it (this is not an issue in practice since AFAICT none of the cores that we support have hardware divide in ARM but not Thumb). This patch fixes the code dealing with custom lowering to take into account the mode (Thumb or ARM) when deciding whether or not hardware division is available. Differential Revision: https://reviews.llvm.org/D32005 llvm-svn: 300536
* [GlobalISel] Support vector-of-pointers in LLTKristof Beyls2017-04-181-0/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes PR32471. As comment 10 on that bug report highlights (https://bugs.llvm.org//show_bug.cgi?id=32471#c10), there are quite a few different defendable design tradeoffs that could be made, including not representing pointers at all in LLT. I decided to go for representing vector-of-pointer as a concept in LLT, while keeping the size of the LLT type 64 bits (this is an increase from 48 bits before). My rationale for keeping pointers explicit is that on some targets probably it's very handy to have the distinction between pointer and non-pointer (e.g. 68K has a different register bank for pointers IIRC). If we keep a scalar pointer, it probably is easiest to also have a vector-of-pointers to keep LLT relatively conceptually clean and orthogonal, while we don't have a very strong reason to break that orthogonality. Once we gain more experience on the use of LLT, we can of course reconsider this direction. Rejecting vector-of-pointer types in the IRTranslator is also an option to avoid the crash reported in PR32471, but that is only a very short-term solution; also needs quite a bit of code tweaks in places, and is probably fragile. Therefore I didn't consider this the best option. llvm-svn: 300535
* Change the testcase tail-merge-after-mbp.ll to tail-merge-after-mbp.mirHaicheng Wu2017-04-172-94/+105
| | | | | | Differential Revision: https://reviews.llvm.org/D32037 llvm-svn: 300506
* [WebAssembly] Fix WebAssemblyOptimizeReturned after r300367Jacob Gravelle2017-04-171-0/+31
| | | | | | | | | | | | | | | | Summary: Refactoring changed paramHasAttr(1 + i) to paramHasAttr(0), fix that to paramHasAttr(i). Add more tests to WebAssemblyOptimizeReturned that catch that regression. Reviewers: dschuff Subscribers: jfb, sbc100, llvm-commits Differential Revision: https://reviews.llvm.org/D32136 llvm-svn: 300502
* AMDGPU: Use MachineRegisterInfo to find max used registerMatt Arsenault2017-04-173-16/+51
| | | | | | | | | | Avoid looping through program to determine register counts. This avoids needing to look at regmask operands. Also fixes some counting errors with flat_scr when there are no stack objects. llvm-svn: 300482
* AArch64: put nonlazybind special handling behind a flag for now.Tim Northover2017-04-171-1/+9
| | | | | | | | It's basically a terrible idea anyway but objc_msgSend gets emitted like that. We can decide on a better way to deal with it in the unlikely event that anyone actually uses it. llvm-svn: 300474
* AArch64: support nonlazybindTim Northover2017-04-171-0/+32
| | | | | | | | It's almost certainly not a good idea to actually use it in most cases (there's a pretty large code size overhead on AArch64), but we can't do those experiments until it's supported. llvm-svn: 300462
* [X86] Remove special handling for 16 bit for A asm constraints.Benjamin Kramer2017-04-161-1/+8
| | | | | | | | | | Our 16 bit support is assembler-only + the terrible hack that is .code16gcc. Simply using 32 bit registers does the right thing for the latter. Fixes PR32681. llvm-svn: 300429
* Use correct registers for "A" inline asm constraintDimitry Andric2017-04-151-0/+35
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: In PR32594, inline assembly using the 'A' constraint on x86_64 causes llvm to crash with a "Cannot select" stack trace. This is because `X86TargetLowering::getRegForInlineAsmConstraint` hardcodes that 'A' means the EAX and EDX registers. However, on x86_64 it means the RAX and RDX registers, and on 16-bit x86 (ia16?) it means the old AX and DX registers. Add new register classes in `X86RegisterInfo.td` to support these cases, and amend the logic in `getRegForInlineAsmConstraint` to cope with different subtargets. Also add a test case, derived from PR32594. Reviewers: craig.topper, qcolombet, RKSimon, ab Reviewed By: ab Subscribers: ab, emaste, royger, llvm-commits Differential Revision: https://reviews.llvm.org/D31902 llvm-svn: 300404
* [AMDGPU] set read_only access qualifier for pointersStanislav Mekhanoshin2017-04-141-0/+33
| | | | | | | | | | If a kernel's pointer argument is known to be readonly set access qualifier accordingly. This allows RT not to flush caches before dispatches. Differential Revision: https://reviews.llvm.org/D32091 llvm-svn: 300362
* [X86][SSE] Update MOVNTDQA non-temporal loads to generic implementation (LLVM)Simon Pilgrim2017-04-144-33/+51
| | | | | | | | | | MOVNTDQA non-temporal aligned vector loads can be correctly represented using generic builtin loads, allowing us to remove the existing x86 intrinsics. Clang companion patch: D31766. Differential Revision: https://reviews.llvm.org/D31767 llvm-svn: 300325
* Fix for PR#30562: Selection DAG error: Detected cycle in SelectionDAG.Andrew V. Tischenko2017-04-141-0/+22
| | | | | | Patch by Dinar Temirbulatov llvm-svn: 300314
* This patch closes PR#32216: Better testing of schedule model instruction ↵Andrew V. Tischenko2017-04-142-595/+631
| | | | | | | | latencies/throughputs. The details are here: https://reviews.llvm.org/D30941 llvm-svn: 300311
* [AMDGPU] added SIInstrInfo::getAddNoCarry() helperStanislav Mekhanoshin2017-04-141-119/+146
| | | | | | | | Addressed rest of post submit comments from D31993. Differential Revision: https://reviews.llvm.org/D32057 llvm-svn: 300288
* [AArch64] Avoid partial register writes on lane 0 of BUILD_VECTOR for i8/i16/f16Adam Nemet2017-04-133-4/+85
| | | | | | | | | | | | This further improves Ahmed's change in rL299482. See the new comment for the rationale. The patch recovers most of the regression for bzip2 after D31965. We're down to +2.68% from +6.97%. Differential Revision: https://reviews.llvm.org/D32028 llvm-svn: 300276
* AMDGPU/GFX9: Do not use v_pack_b32_f16 when packingKonstantin Zhuravlyov2017-04-135-57/+37
| | | | | | Differential Revision: https://reviews.llvm.org/D31819 llvm-svn: 300275
* [bpf] Fix memory offset check for loads and storesAlexei Starovoitov2017-04-131-0/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the offset cannot fit into the instruction, an addition to the pointer is emitted before the actual access. However, BPF offsets are 16-bit but LLVM considers them to be, for the matter of this check, to be 32-bit long. This causes the following program: int bpf_prog1(void *ign) { volatile unsigned long t = 0x8983984739ull; return *(unsigned long *)((0xffffffff8fff0002ull) + t); } To generate the following (wrong) code: 0: 18 01 00 00 39 47 98 83 00 00 00 00 89 00 00 00 r1 = 590618314553ll 2: 7b 1a f8 ff 00 00 00 00 *(u64 *)(r10 - 8) = r1 3: 79 a1 f8 ff 00 00 00 00 r1 = *(u64 *)(r10 - 8) 4: 79 10 02 00 00 00 00 00 r0 = *(u64 *)(r1 + 2) 5: 95 00 00 00 00 00 00 00 exit Fix it by changing the offset check to 16-bit. Patch by Nadav Amit <nadav.amit@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Differential Revision: https://reviews.llvm.org/D32055 llvm-svn: 300269
* [AMDGPU] Combine DS operations with offsets bigger than byteStanislav Mekhanoshin2017-04-131-0/+385
| | | | | | | | | In many cases ds operations can be combined even if offsets do not fit into 8 bit encoding. What it takes is to adjust base address. Differential Revision: https://reviews.llvm.org/D31993 llvm-svn: 300227
* [Hexagon] Unxfail passing tests Krzysztof Parzyszek2017-04-132-4/+0
| | | | | | | r300198 fixed a problem that caused two tests to be xfailed. Unxfail these tests now, since they are passing. llvm-svn: 300203
* AMDGPU : Fix common dominator of two incoming blocks terminates with uniform ↵Wei Ding2017-04-124-6/+65
| | | | | | | | branch issue. Differential Revision: http://reviews.llvm.org/D31350 llvm-svn: 300142
* AMDGPU: Fix invalid copies when copying i1 to phys regMatt Arsenault2017-04-121-0/+36
| | | | | | | Insert a VReg_1 virtual register so the i1 workaround pass can handle it. llvm-svn: 300113
* [AMDGPU] Generate range metadata for workitem idStanislav Mekhanoshin2017-04-1214-55/+141
| | | | | | | | | If workgroup size is known inform llvm about range returned by local id and local size queries. Differential Revision: https://reviews.llvm.org/D31804 llvm-svn: 300102
* [GlobalIsel][X86] support G_CONSTANT selection.Igor Breger2017-04-123-0/+224
| | | | | | | | | | | | | | Summary: [GlobalISel][X86] support G_CONSTANT selection. Add regbank select tests. Reviewers: zvi, guyblank Reviewed By: guyblank Subscribers: llvm-commits, dberris, rovka, kristof.beyls Differential Revision: https://reviews.llvm.org/D31974 llvm-svn: 300057
* [AMDGPU] SDWA: make pass globalSam Kolton2017-04-121-18/+10
| | | | | | | | | | | | Summary: Remove checks for basic blocks. Reviewers: vpykhtin, rampitec, arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D31935 llvm-svn: 300040
* [AMDGPU] Add a new pass to insert waitcnts. Leave under an option for testing.Kannan Narayanan2017-04-122-5/+0
| | | | | | Based on comments in https://reviews.llvm.org/D31161. llvm-svn: 300023
* AMDGPU: Insert wait at start of callee functionsMatt Arsenault2017-04-112-1/+26
| | | | llvm-svn: 300000
* AMDGPU: Refactor SIMachineFunctionInfo slightlyMatt Arsenault2017-04-111-7/+7
| | | | | | Prepare for handling non-entry functions. llvm-svn: 299999
* AMDGPU: Refactor argument loweringMatt Arsenault2017-04-111-2/+3
| | | | | | | Split into smaller functions and prepare for handling non-entry functions. llvm-svn: 299998
* AMDGPU: Fix folding reg_sequence into copy to phys regMatt Arsenault2017-04-111-0/+13
| | | | | | | This was producing an illegal reg_sequence defining a physical register with virtual register inputs. llvm-svn: 299997
* [DAGCombine] Add more test cases for shuffle of splat. NFC.Zvi Rackover2017-04-111-0/+56
| | | | | | Tests added contain splat-masks with undef elements. llvm-svn: 299988
* [x86] Relax the check in areLoadsFromSameBasePtrEaswaran Raman2017-04-111-4/+4
| | | | | | | | | Check if the scale operand is identical (doesn't have to be 1) and do not check the chaain operand. Differential revision: https://reviews.llvm.org/D31833 llvm-svn: 299986
* MIR: Allow parsing of empty machine functionsJustin Bogner2017-04-118-41/+17
| | | | | | | | | | | | If you run llc -stop-after=codegenprepare and feed the resulting MIR to llc -start-after=codegenprepare, you'll have an empty machine function since we haven't run any isel yet. Of course, this only works if the MIRParser believes you that this is okay. This is essentially a revert of r241862 with a fix for the problem it was papering over. llvm-svn: 299975
* [X86] Create the correct ADC/SBB SDNode when lowering add.Davide Italiano2017-04-111-0/+27
| | | | | | Differential Revision: https://reviews.llvm.org/D31911 llvm-svn: 299973
* [AMDGPU] Add A5 to data layout for amdgiz environmentYaxun Liu2017-04-112-2/+2
| | | | | | Differential Revision: https://reviews.llvm.org/D31589 llvm-svn: 299964
* GlobalISel: Allow legalizing G_FADD to a libcallDiana Picus2017-04-112-6/+112
| | | | | | | | | Use the same handling in the generic legalizer code as for the other libcalls (G_FREM, G_FPOW). Enable it on ARM for float and double so we can test it. llvm-svn: 299931
* [GlobalISel] LegalizerInfo: Enable legalization of non-power-of-2 typesVolkan Keles2017-04-112-0/+37
| | | | | | | | | | | | | | Summary: Legalize only if the type is marked as Legal or Custom. If not, return Unsupported as LegalizerHelper is not able to handle non-power-of-2 types right now. Reviewers: qcolombet, aditya_nandakumar, dsanders, t.p.northover, kristof.beyls, javed.absar, ab Reviewed By: kristof.beyls, ab Subscribers: dberris, rovka, igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D31711 llvm-svn: 299929
* [SelectionDAG] Check CALLSEQ_BEGIN nodes in DelayForLiveRegsSam Parker2017-04-111-0/+136
| | | | | | | | | | | | | | | | | A fix for the bug reported in PR30911. The issue arises when multiple CALLSEQ_BEGIN nodes are unscheduled as the last node to be unscheduled will gain access to the CallResource register. But when a node is being picked, only CALLSEQ_END nodes are checked against the CallResource and have their chains evaluated. This then means that other CALLSEQ_BEGIN nodes can be scheduled before the existing call sequence has been finalised. This patch adds a check against the FrameSetup nodes in DelayForLiveRegs to prevent this from happening. Differential Revision: https://reviews.llvm.org/D31536 llvm-svn: 299926
* [PowerPC] multiply-with-overflow might use the CTR registerHal Finkel2017-04-111-0/+34
| | | | | | | | | | | | Check the legality of ISD::[US]MULO to see whether Intrinsic::[us]mul_with_overflow will legalize into a function call (and, thus, will use the CTR register). Fixes PR32485. Patch by Tim Neumann! Differential Revision: https://reviews.llvm.org/D31790 llvm-svn: 299910
* [ARM, x86] add tests to show possible improvement for bool math; NFCSanjay Patel2017-04-102-0/+64
| | | | llvm-svn: 299897
* CodeGen: BlockPlacement: Don't always tail-duplicate with no other successor.Kyle Butt2017-04-102-2/+55
| | | | | | | | | | The math works out where it can actually be counter-productive. The probability calculations correctly handle the case where the alternative is 0 probability, rely on those calculations. Includes a test case that demonstrates the problem. llvm-svn: 299892
* CodeGen: BlockPlacement: Minor probability changes.Kyle Butt2017-04-102-1/+42
| | | | | | | Qin may be large, and Succ may be more frequent than BB. Take these both into account when deciding if tail-duplication is profitable. llvm-svn: 299891
* CodeGen: BranchFolding: Merge identical blocks, even if they are short.Kyle Butt2017-04-101-0/+41
| | | | | | | | Merging identical blocks when it doesn't reduce fallthrough. It is common for the blocks created from critical edge splitting to be identical. We would like to merge these blocks whenever doing so would not reduce fallthrough. llvm-svn: 299890
* Add address space mangling to lifetime intrinsicsMatt Arsenault2017-04-1053-355/+355
| | | | | | In preparation for allowing allocas to have non-0 addrspace. llvm-svn: 299876
* [X86][MMX] Add fast-isel support for MMX non-temporal writesSimon Pilgrim2017-04-101-1/+1
| | | | | | Differential Revision: https://reviews.llvm.org/D31754 llvm-svn: 299852
* [ARM] GlobalISel: Support G_FPOW for float and doubleDiana Picus2017-04-102-3/+114
| | | | | | Legalize to a libcall. llvm-svn: 299841
* AMDGPU: Actually write nops for writeNopDataMatt Arsenault2017-04-081-0/+87
| | | | | | | Before this was just writing 0s, which ends up looking like a v_cndmask_b32 v0, s0, v0, vcc. Write out an encoded s_nop instead. llvm-svn: 299816
* [ARM] Prefer BIC over BFC in ARM mode.Eli Friedman2017-04-077-19/+25
| | | | | | | | | | | | BIC is generally faster, and it can put the output in a different register from the input. We already do this in Thumb2 mode; not sure why the equivalent fix never got applied to ARM mode. Differential Revision: https://reviews.llvm.org/D31797 llvm-svn: 299803
* [GlobalISel]: Fix bug where we can report GISelFailure on erased instructionsAditya Nandakumar2017-04-071-0/+8
| | | | | | | | | | | | | The original instruction might get legalized and erased and expanded into intermediate instructions and the intermediate instructions might fail legalization. This end up in reporting GISelFailure on the erased instruction. Instead report GISelFailure on the intermediate instruction which failed legalization. Reviewed by: ab llvm-svn: 299802
* [AArch64] Allow global register asm("x18") or asm("w18") under -ffixed-x18Petr Hosek2017-04-072-0/+28
| | | | | | | | | | | | When using -ffixed-x18, the x18 (or w18) register can safely be used with the "global register variable" GCC extension, but the backend fails to recognize it. Patch by Roland McGrath. Differential Revision: https://reviews.llvm.org/D31793 llvm-svn: 299799
* Revert "[SelectionDAG] Enable target specific vector scalarization of calls ↵Simon Dardis2017-04-074-1697/+24
| | | | | | | | | | | | | and returns" This reverts commit r299766. This change appears to have broken the MIPS buildbots. Reverting while I investigate. Revert "[mips] Remove usage of debug only variable (NFC)" This reverts commit r299769. Follow up commit. llvm-svn: 299788
OpenPOWER on IntegriCloud