path: root/llvm/test/CodeGen
Commit message | Author | Age | Files | Lines
...
* Revert r305465: [X86][AVX512] Improve lowering of AVX512 compare intrinsics (remove redundant shift left+right instructions). | Simon Pilgrim | 2017-06-15 | 4 | -13490/+50
  This is causing Windows buildbot failures.
  llvm-svn: 305470
* [X86][AVX512] Improve lowering of AVX512 compare intrinsics (remove redundant shift left+right instructions). | Ayman Musa | 2017-06-15 | 4 | -50/+13490
  AVX512 compare instructions return v*i1 types. In cases where the number of elements in the returned value is less than 8, clang adds zeroes to get a mask of v8i1 type. Later on it's replaced with CONCAT_VECTORS, which is then lowered to many DAG nodes including insert/extract element and shift right/left nodes.
  The fact that AVX512 compare instructions put the result in a k register and zero all its upper bits allows us to remove the extra nodes simply by copying the result to the required register class.
  When lowering, identify these cases and transform them into an INSERT_SUBVECTOR node (marked legal), then catch this pattern in the instruction selection phase and transform it into one AVX512 cmp instruction.
  Differential Revision: https://reviews.llvm.org/D33188
  llvm-svn: 305465
* [ARM] GlobalISel: Add support for i32 modulo | Diana Picus | 2017-06-15 | 2 | -0/+96
  Add support for modulo for targets that have hardware division and for those that don't. When hardware division is not available, we have to choose the correct libcall to use. This is generally straightforward, except for AEABI.
  The AEABI variant is trickier than the other libcalls because it returns { quotient, remainder }, instead of just one value like the other libcalls that we've seen so far. Therefore, we need to use custom lowering for it. However, we don't want to have too much special code, so we refactor the target-independent code in the legalizer by adding a helper for replacing an instruction with a libcall. This helper is used by the legalizer itself when dealing with simple calls, and also by the custom ARM legalization for the more complicated AEABI divmod calls.
  llvm-svn: 305459
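The point about the AEABI libcall returning a pair can be illustrated with a small Python model. This is only a sketch of the idea, not the legalizer code: `aeabi_idivmod_model` and `lower_srem` are hypothetical names, and the model assumes C-style division that truncates toward zero.

```python
def aeabi_idivmod_model(numerator: int, denominator: int) -> tuple:
    """Hypothetical model of a divmod libcall returning (quotient, remainder)."""
    # Assume C-style signed division, which truncates toward zero.
    quotient = abs(numerator) // abs(denominator)
    if (numerator < 0) != (denominator < 0):
        quotient = -quotient
    remainder = numerator - quotient * denominator
    return quotient, remainder


def lower_srem(numerator: int, denominator: int) -> int:
    """Modulo via the pair-returning call: keep the remainder, drop the quotient."""
    _, remainder = aeabi_idivmod_model(numerator, denominator)
    return remainder
```

A libcall that returns a single value can be substituted for the instruction directly; here the caller has to unpack the pair first, which is the extra step the custom lowering has to account for.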
* [ARM] GlobalISel: Lower only homogeneous struct args | Diana Picus | 2017-06-15 | 2 | -158/+45
  Lowering mixed struct args, params and returns used G_INSERT, which is a bit more convoluted to support through the entire pipeline. Since they don't occur that often in practice, it's probably wiser to leave them out until later.
  Meanwhile, we can lower homogeneous structs using G_MERGE_VALUES, which has good support in the legalizer. These occur e.g. as the return of __aeabi_idivmod, so it's nice to be able to support them.
  llvm-svn: 305458
* [AArch64] Enable FeatureFuseAES for the generic processor model. | Florian Hahn | 2017-06-15 | 1 | -36/+41
  Summary: Scheduling AESE/AESMC and AESD/AESIMC instruction pairs back-to-back gives a double-digit speedup on benchmarks using those instructions on Cortex-A processors. In GCC, this optimization is part of the generic processor model as well.
  This change should not have a major performance impact on processors that do not optimize AES instruction pairs, although I only had access to Cortex-A processors for benchmarking.
  Reviewers: rengolin, kristof.beyls, javed.absar, evandro, silviu.baranga, MatzeB, mcrosier, joelkevinjones, joel_k_jones, bmakam, t.p.northover
  Reviewed By: evandro
  Subscribers: sbaranga, aemerson, llvm-commits
  Differential Revision: https://reviews.llvm.org/D33836
  llvm-svn: 305457
* [mips][microMIPS] Extending size reduction pass with ADDIUSP and ADDIUR1SP | Zoran Jovanovic | 2017-06-15 | 1 | -0/+17
  Author: milena.vujosevic.janicic
  Reviewers: sdardis
  The patch extends the size reduction pass for microMIPS. The following instructions are examined and transformed, if possible:
    ADDIU instruction is transformed into 16-bit instruction ADDIUSP
    ADDIU instruction is transformed into 16-bit instruction ADDIUR1SP
  Differential Revision: https://reviews.llvm.org/D33887
  llvm-svn: 305455
* Revert "[ARM] Support constant pools in data when generating execute-only code." | Alexandros Lamprineas | 2017-06-14 | 1 | -50/+0
  This reverts commit 3a204faa093c681a1e96c5e0622f50649b761ee0.
  I've upset a buildbot which runs the address sanitizer:
    ERROR: AddressSanitizer: stack-use-after-scope lib/Target/ARM/ARMISelLowering.cpp:2690
  That Twine variable is used illegally.
  llvm-svn: 305390
* [mips] Fix multiprecision arithmetic. | Simon Dardis | 2017-06-14 | 6 | -233/+444
  For multiprecision arithmetic on MIPS, rather than using ISD::ADDE / ISD::ADDC, get SelectionDAG to break down the operation into ISD::ADDs and ISD::SETCCs.
  For MIPS, only the DSP ASE has a carry flag, so in the general case it is not useful to directly support ISD::{ADDE, ADDC, SUBE, SUBC} nodes.
  Also improve the generated code in such cases for targets with TargetLoweringBase::ZeroOrOneBooleanContent by directly using the result of the comparison node rather than using it in selects. Similarly for ISD::SUBE / ISD::SUBC.
  Address optimization breakage by moving the generation of MIPS-specific integer multiply-accumulate nodes to before legalization.
  This resolves PR32713 and PR33424.
  Thanks to Simonas Kazlauskas and Pirama Arumuga Nainar for reporting the issue!
  Reviewers: slthakur
  Differential Revision: https://reviews.llvm.org/D33494
  llvm-svn: 305389
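As a rough illustration of the ADD plus SETCC breakdown described in the [mips] multiprecision entry above, the following Python sketch adds two 64-bit values held as 32-bit halves without a carry flag. The function name and the 32-bit word size are assumptions for illustration; this models the arithmetic only, not the SelectionDAG code.

```python
MASK32 = 0xFFFFFFFF  # assumed word size for this sketch


def add64_via_add_setcc(a_lo: int, a_hi: int, b_lo: int, b_hi: int) -> tuple:
    """Add two 64-bit values given as (lo, hi) 32-bit halves; returns (lo, hi)."""
    # Plain ADD on the low words, wrapping at 32 bits.
    lo = (a_lo + b_lo) & MASK32
    # Unsigned less-than compare (the SETCC): the low add wrapped iff lo < a_lo.
    # With zero-or-one booleans the compare result is usable as the carry directly.
    carry = int(lo < a_lo)
    # Plain ADDs on the high words, folding in the carry.
    hi = (a_hi + b_hi + carry) & MASK32
    return lo, hi
```

The subtraction case mentioned in the entry is analogous, with the borrow derived from an unsigned compare of the operands.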
* [ARM] Support constant pools in data when generating execute-only code. | Alexandros Lamprineas | 2017-06-14 | 1 | -0/+50
  The ARM backend asserts against constant pool lowering when it generates execute-only code in order to prevent the generation of constant pools in the text section. It appears that target-independent optimizations might generate DAG nodes that represent constant pools. By lowering such nodes as global addresses we don't violate the semantics of execute-only code, and it is also guaranteed that execute-only behaves correctly with the position-independent addressing modes that support execute-only code.
  Differential Revision: https://reviews.llvm.org/D33773
  llvm-svn: 305387
* Align definition of DW_OP_plus with DWARF spec [3/3] | Florian Hahn | 2017-06-14 | 2 | -4/+4
  Summary: This patch is part of 3 patches that together form a single patch, but must be introduced in stages in order not to break things.
  The way that LLVM interprets DW_OP_plus in DIExpression nodes is basically that of the DW_OP_plus_uconst operator since LLVM expects an unsigned constant operand. This unnecessarily restricts the DW_OP_plus operator, preventing it from being used to describe the evaluation of runtime values on the expression stack. These patches try to align the semantics of DW_OP_plus and DW_OP_minus with that of the DWARF definition, which pops two elements off the expression stack, performs the operation and pushes the result back on the stack.
  This is done in three stages:
  • The first patch (LLVM) adds support for DW_OP_plus_uconst.
  • The second patch (Clang) changes all its uses from DW_OP_plus to DW_OP_plus_uconst.
  • The third patch (LLVM) changes the semantics of DW_OP_plus and DW_OP_minus to be in line with their DWARF meaning. This patch includes the bitcode upgrade from legacy DIExpressions.
  Patch by Sander de Smalen.
  Reviewers: echristo, pcc, aprantl
  Reviewed By: aprantl
  Subscribers: fhahn, javed.absar, aprantl, llvm-commits
  Differential Revision: https://reviews.llvm.org/D33894
  llvm-svn: 305386
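The difference between the two operators in the DW_OP_plus entry above can be sketched with a toy stack evaluator. This is a minimal Python model under stated assumptions: expressions are flat lists of operator names and integer operands, only four operators are handled, and `evaluate` is a hypothetical helper, not LLVM's DIExpression handling.

```python
def evaluate(expr: list, initial: int) -> int:
    """Evaluate a toy DWARF-style expression; `initial` is the starting stack entry."""
    stack = [initial]
    i = 0
    while i < len(expr):
        op = expr[i]
        if op == "DW_OP_plus_uconst":
            # Takes an inline unsigned constant operand and adds it to the top entry.
            stack.append(stack.pop() + expr[i + 1])
            i += 2
        elif op == "DW_OP_constu":
            # Pushes an inline unsigned constant.
            stack.append(expr[i + 1])
            i += 2
        elif op == "DW_OP_plus":
            # Pops two entries and pushes their sum; no inline operand.
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs + rhs)
            i += 1
        elif op == "DW_OP_minus":
            # Pops two entries and pushes (second entry - top entry).
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs - rhs)
            i += 1
        else:
            raise ValueError(f"unsupported operator in this sketch: {op}")
    return stack[-1]
```

In this model, `["DW_OP_plus_uconst", 8]` and `["DW_OP_constu", 8, "DW_OP_plus"]` compute the same address, but only the second form lets the right-hand operand come from an arbitrary value already on the stack.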
* [mips] Fix machine verifier errors in the long branch pass | Simon Dardis | 2017-06-14 | 1 | -20/+20
  This patch fixes two systemic machine verifier errors in the long branch pass. The first is the incorrect basic block successors and the second was the incorrect construction of several jump instructions.
  This partially resolves PR27458 and the associated PR32146.
  Reviewers: slthakur
  Differential Revision: https://reviews.llvm.org/D33378
  llvm-svn: 305382
* Revert r304907 as it is causing some failures that I cannot reproduce. | Nemanja Ivanovic | 2017-06-14 | 5 | -567/+6
  Reverting this until a test case can be provided to aid the investigation.
  llvm-svn: 305372
* [globalisel][legalizer] G_LOAD/G_STORE NarrowScalar should not emit G_GEP x, 0. | Daniel Sanders | 2017-06-13 | 1 | -6/+2
  Summary: When legalizing G_LOAD/G_STORE using NarrowScalar, we should avoid emitting
    %0 = G_CONSTANT ty 0
    %1 = G_GEP %x, %0
  since it's cheaper to not emit the redundant instructions than it is to fold them away later.
  Reviewers: qcolombet, t.p.northover, ab, rovka, aditya_nandakumar, kristof.beyls
  Reviewed By: qcolombet
  Subscribers: javed.absar, llvm-commits, igorb
  Differential Revision: https://reviews.llvm.org/D32746
  llvm-svn: 305340
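The idea in the NarrowScalar entry above, skipping the address computation for the part at offset 0, can be sketched as follows. This Python model emits pseudo-instruction strings only; the register names, the string syntax, and `narrow_load` itself are hypothetical and do not reflect real MIR output or the legalizer's API.

```python
def narrow_load(base_reg: str, total_bits: int, narrow_bits: int) -> list:
    """Split one wide load into narrow loads, returning pseudo-instruction strings."""
    insts = []
    step = narrow_bits // 8
    for index, offset in enumerate(range(0, total_bits // 8, step)):
        if offset == 0:
            # The first part lives at the base address itself, so no
            # G_CONSTANT 0 / G_GEP pair is emitted for it.
            addr = base_reg
        else:
            insts.append(f"%off{index} = G_CONSTANT {offset}")
            insts.append(f"%addr{index} = G_GEP {base_reg}, %off{index}")
            addr = f"%addr{index}"
        insts.append(f"%part{index} = G_LOAD {addr}")
    return insts
```

For a 64-bit load narrowed to 32 bits, this yields four pseudo-instructions instead of six, which is the saving the entry describes.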
* [Hexagon] Generate store-immediate instructions for stack objects | Krzysztof Parzyszek | 2017-06-13 | 1 | -0/+86
  Store-immediate instructions have a non-extendable offset. Since the actual offset for a stack object is not known until much later, only generate these stores when the stack size (at the time of instruction selection) is small.
  llvm-svn: 305305
* [Hexagon] Generate multiply-high instruction in isel | Krzysztof Parzyszek | 2017-06-13 | 1 | -0/+27
  llvm-svn: 305302
* [Hexagon] Don't kill live registers when creating mux out of tfr | Krzysztof Parzyszek | 2017-06-13 | 1 | -0/+15
  When a mux instruction is created from a pair of complementary conditional transfers, it can be placed at the location of either the earlier or the later of the transfers. Since it will use the operands of the original transfers, putting it in the earlier location may hoist a kill of a source register that was originally further down. Make sure the kill flag is removed if the register is still used afterwards.
  llvm-svn: 305300
* [MIPS] BuildCondBr should preserve MO flags | Simon Dardis | 2017-06-13 | 1 | -0/+26
  While simplifying branches in the MachineInstr representation, the routine BuildCondBr must preserve flags on register MachineOperands. In particular, it must preserve the <undef> flag.
  This fixes a bug that is unlikely to occur in any real scenario, but which bugpoint is likely to introduce.
  Patch By Nick Johnson!
  Reviewers: ahatanak, sdardis
  Differential Revision: https://reviews.llvm.org/D34041
  llvm-svn: 305290
* [Hexagon] Stop pmpy recognition when shift conversion fails | Krzysztof Parzyszek | 2017-06-13 | 1 | -0/+48
  The conversion of shifts from right shifts to left shifts may fail. In such a case, the pmpy recognition cannot proceed.
  llvm-svn: 305289
* [ARM] Add scheduling classes for VFNM[AS] | Oliver Stannard | 2017-06-13 | 1 | -0/+38
  The VFNM[AS] instructions did not have scheduling information attached, which was causing assertion failures with the Cortex-A57 scheduling model and -fp-contract=fast, because the Cortex-A57 sched model claims to be complete.
  Differential Revision: https://reviews.llvm.org/D34139
  llvm-svn: 305288
* [AVX-512] Mark masked VPCMP instructions as commutable. | Craig Topper | 2017-06-13 | 1 | -0/+13
  llvm-svn: 305276
* [AVX-512] Mark masked version of vpcmpeq as being commutable. | Craig Topper | 2017-06-13 | 1 | -0/+14
  llvm-svn: 305275
* [X86] Add masked integer compare instructions to load folding tables. | Craig Topper | 2017-06-13 | 1 | -0/+28
  llvm-svn: 305274
* AMDGPU/GlobalISel: Mark 32-bit G_ADD as legal | Tom Stellard | 2017-06-12 | 1 | -0/+22
  Reviewers: arsenm
  Reviewed By: arsenm
  Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, llvm-commits, t-tye
  Differential Revision: https://reviews.llvm.org/D33992
  llvm-svn: 305232
* AArch64: don't try to emit an add (shifted reg) for SP. | Tim Northover | 2017-06-12 | 1 | -0/+288
  The "Add/sub (shifted reg)" instructions use the 31 encoding for xzr and wzr rather than the SP, so we need to use different variants.
  Situations where this actually comes up are rare enough (see test-case) that I think falling back to DAG is fine.
  llvm-svn: 305230
* [PowerPC] Match vec_revb builtins to P9 instructions. | Tony Jiang | 2017-06-12 | 1 | -0/+54
  Power9 has instructions that will reverse the bytes within an element for all sizes (half-word, word, double-word and quad-word). These can be used for the vec_revb builtins in altivec.h. However, we implement these to match vector shuffle nodes as that will cover both the builtins and vector shuffles that occur in the SDAG through other means.
  Differential Revision: https://reviews.llvm.org/D33690
  llvm-svn: 305214
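To make the link between byte reversal and shuffles in the vec_revb entry above concrete, here is a small Python sketch that builds the byte-shuffle mask corresponding to reversing the bytes within each element of a 16-byte vector. The helper names are hypothetical, and the mask uses plain memory-order byte indices, ignoring any endianness-specific numbering a real backend would need.

```python
def revb_shuffle_mask(element_bytes: int, vector_bytes: int = 16) -> list:
    """Byte-shuffle mask that reverses the bytes inside each element."""
    mask = []
    for base in range(0, vector_bytes, element_bytes):
        # Within one element, pick source bytes from last to first.
        mask.extend(range(base + element_bytes - 1, base - 1, -1))
    return mask


def apply_shuffle(data: bytes, mask: list) -> bytes:
    """Result byte i is taken from source byte mask[i]."""
    return bytes(data[index] for index in mask)
```

Matching on masks of this shape, rather than on the builtin, is what lets the same instruction be selected for shuffles that reach the SDAG by other routes.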
* [Power9] Added support for the modsw, moduw, modsd, modud hardware instructions. | Tony Jiang | 2017-06-12 | 1 | -0/+263
  Note that if we need the result of both the divide and the modulo then we compute the modulo based on the result of the divide and not using the new hardware instruction.
  Commit on behalf of STEFAN PINTILIE.
  Differential Revision: https://reviews.llvm.org/D33940
  llvm-svn: 305210
* [x86] regenerate checks with update_llc_test_checks.py | Sanjay Patel | 2017-06-12 | 28 | -160/+33
  The dream of a unified check-line auto-generator for all phases of compilation is dead. The llc script has already diverged to be better at its goal, so having 2 scripts that do almost the same thing is just causing confusion.
  We can rip out the llc ability in update_test_checks.py next and rename it, so it will be clear that we have one script for llc check auto-generation and another for opt.
  llvm-svn: 305206
* [SelectionDAG] Allow sin/cos -> sincos optimization on GNU triples w/ just -fno-math-errno | Geoff Berry | 2017-06-12 | 5 | -90/+261
  Summary: This change enables the sin(x) cos(x) -> sincos(x) optimization on GNU target triples. This optimization was being inhibited when -ffast-math wasn't set because sincos in GLibC does not set errno, while sin and cos do. However, this optimization will only run if the attributes on the sin/cos calls include readnone, which is how clang represents the fact that it doesn't care about the errno values set by these functions (via the -fno-math-errno flag).
  Reviewers: hfinkel, bogner
  Subscribers: mcrosier, javed.absar, llvm-commits, paul.redmond
  Differential Revision: https://reviews.llvm.org/D32921
  llvm-svn: 305204
* AMDGPU: Teach isLegalAddressingMode about flat offsets | Matt Arsenault | 2017-06-12 | 1 | -7/+116
  Also fix reporting r+r as a valid addressing mode without offsets.
  llvm-svn: 305203
* [x86] regenerate checks with update_llc_test_checks.py | Sanjay Patel | 2017-06-12 | 8 | -121/+192
  The dream of a unified check-line auto-generator for all phases of compilation is dead. The llc script has already diverged to be better at its goal, so having 2 scripts that do almost the same thing is just causing confusion for newcomers. I plan to fix up more x86 tests in a next commit. We can rip out the llc ability in update_test_checks.py after that.
  llvm-svn: 305202
* AMDGPU: Start selecting flat instruction offsets | Matt Arsenault | 2017-06-12 | 3 | -54/+174
  llvm-svn: 305201
* AMDGPU: Start adding offset fields to flat instructions | Matt Arsenault | 2017-06-12 | 8 | -66/+66
  llvm-svn: 305194
* StackColoring: smarter check for slot overlap | Than McIntosh | 2017-06-12 | 1 | -0/+64
  Summary: The old check for slot overlap treated 2 slots `S` and `T` as overlapping if there existed a CFG node in which both of the slots could possibly be active. That is overly conservative and caused stack blowups in Rust programs. Instead, check whether there is a single CFG node in which both of the slots are possibly active *together*.
  Fixes PR32488.
  Patch by Ariel Ben-Yehuda <ariel.byd@gmail.com>
  Reviewers: thanm, nagisa, llvm-commits, efriedma, rnk
  Reviewed By: thanm
  Subscribers: dotdash
  Differential Revision: https://reviews.llvm.org/D31583
  llvm-svn: 305193
* [AVX-512] Add VPCONFLICT and VPLZCNT to load folding tables. | Craig Topper | 2017-06-12 | 2 | -2/+110
  llvm-svn: 305180
* [x86] use vperm2f128 rather than vinsertf128 when there's a chance to fold a 32-byte load | Sanjay Patel | 2017-06-11 | 2 | -41/+23
  I was looking closer at the x86 test diffs in D33866, and the first change seems like it shouldn't happen in the first place. So this patch will resolve that.
  Using Agner's tables and AMD docs, vperm2f128 and vinsertf128 have identical timing for any given CPU model, so we should be able to interchange those without affecting perf. But as we can see in some of the diffs here, using vperm2f128 allows load folding, so we should take that opportunity to reduce code size and register pressure.
  A secondary advantage is making AVX1 and AVX2 codegen more similar. Given that vperm2f128 was introduced with AVX1, we should be selecting it in all of the same situations that we would with AVX2. If there's some reason that an AVX1 CPU would not want to use this instruction, that should be fixed up in a later pass.
  Differential Revision: https://reviews.llvm.org/D33938
  llvm-svn: 305171
* [DAGCombine] Make sure we check the ResNo from UADDO before combining | Amaury Sechet | 2017-06-11 | 1 | -0/+24
  Summary: UADDO has 2 results, and one must check the result number before doing any kind of combine. Without it, the transform is invalid.
  Reviewers: joerg
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D34088
  llvm-svn: 305162
* [X86][SSE] Extended PR32368 to SSE/AVX1/AVX2 | Simon Pilgrim | 2017-06-10 | 1 | -8/+142
  llvm-svn: 305154
* [X86][AVX512] Added test case for PR32368 | Simon Pilgrim | 2017-06-10 | 1 | -0/+19
  llvm-svn: 305153
* AMDGPU: Fix ISA Version Definitions. | Wei Ding | 2017-06-10 | 1 | -0/+13
  Differential Revision: http://reviews.llvm.org/D28531
  llvm-svn: 305137
* [PowerPC] add memcmp test with one constant operand and equality cmp; NFC | Sanjay Patel | 2017-06-09 | 1 | -3/+29
  llvm-svn: 305131
* [AArch64] Add fallback in FastISel fp16 conversions | I-Jui (Ray) Sung | 2017-06-09 | 1 | -0/+131
  Summary:
  - Fix assertion failures on F16 to/from int types in FastISel by falling back to regular ISel
  - Add a testcase of various conversion cases with FastISel (-O0)
  Reviewers: kristof.beyls, jmolloy, SjoerdMeijer
  Reviewed By: SjoerdMeijer
  Subscribers: SjoerdMeijer, llvm-commits, srhines, pirama, aemerson, rengolin, javed.absar, kristof.beyls
  Differential Revision: https://reviews.llvm.org/D33734
  llvm-svn: 305127
* [AMDGPU] Add intrinsics for alignbit and alignbyte instructions | Stanislav Mekhanoshin | 2017-06-09 | 1 | -0/+23
  Differential Revision: https://reviews.llvm.org/D34046
  llvm-svn: 305098
* [X86][SSE] Add support for PACKSS nodes to faux shuffle extraction | Simon Pilgrim | 2017-06-09 | 1 | -273/+265
  If the inputs won't saturate during packing then we can treat the PACKSS as a truncation shuffle.
  llvm-svn: 305091
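The condition in the PACKSS entry above can be checked with a scalar model. This Python sketch assumes a 16-bit to 8-bit signed pack for illustration (the function names are hypothetical) and compares signed saturation with plain truncation.

```python
def saturate_i16_to_i8(value: int) -> int:
    """Signed saturation: clamp into the i8 range, as a saturating pack does per lane."""
    return max(-128, min(127, value))


def truncate_i16_to_i8(value: int) -> int:
    """Plain truncation: keep the low 8 bits and reinterpret them as signed."""
    low = value & 0xFF
    return low - 0x100 if low >= 0x80 else low
```

For every input already inside the i8 range the two functions agree, so the pack moves bytes exactly as a truncating shuffle would; outside that range they diverge, which is why the transform needs to know the inputs cannot saturate.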
* Reland "[SelectionDAG] Enable target specific vector scalarization of calls and returns" | Simon Dardis | 2017-06-09 | 4 | -24/+1697
  By target hookifying getRegisterType, getNumRegisters, getVectorBreakdown, backends can request that LLVM scalarize vector types for calls and returns.
  The MIPS vector ABI requires that vector arguments and returns are passed in integer registers. With SelectionDAG's new hooks, the MIPS backend can now handle LLVM-IR with vector types in calls and returns. E.g. 'call @foo(<4 x i32> %4)'.
  Previously these cases would be scalarized for the MIPS O32/N32/N64 ABI for calls and returns if vector types were not legal. If vector types were legal, a single 128-bit vector argument would be assigned to a single 32-bit / 64-bit integer register.
  By teaching the MIPS backend to inspect the original types, it can now implement the MIPS vector ABI which requires a particular method of scalarizing vectors.
  Previously, the MIPS backend relied on clang to scalarize types such as "call @foo(<4 x float> %a)" into "call @foo(i32 inreg %1, i32 inreg %2, i32 inreg %3, i32 inreg %4)".
  This patch enables the MIPS backend to take either form for vector types.
  The previous version of this patch had a "conditional move or jump depends on uninitialized value".
  Reviewers: zoran.jovanovic, jaydeep, vkalintiris, slthakur
  Differential Revision: https://reviews.llvm.org/D27845
  llvm-svn: 305083
* [AMDGPU] Fix for issue in alloca to vector promotion pass | David Stuttard | 2017-06-09 | 1 | -0/+131
  Summary: Alloca promotion pass was not dealing with non-canonical input.
  Added some additional checks so the pass simply backs off from forms it can't deal with (non-canonical).
  Also added some test cases in non-canonical form to check that it no longer crashes.
  Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tpr, t-tye
  Differential Revision: https://reviews.llvm.org/D31710
  llvm-svn: 305079
* Prevent RemoveDeadNodes from deleting an already deleted node. | Nirav Dave | 2017-06-09 | 1 | -0/+83
  This protects against assertion errors like PR32659, which occur when a replacement deletes a node after it has been added to the list argument of RemoveDeadNodes. The specific failure from PR32659 does not currently happen, but it is still potentially possible. The underlying cause is that the callers of the change function build up a list of nodes to delete after having moved their uses, and it is possible that a move of a later node will cause a previously deleted node to be deleted.
  Reviewers: bkramer, spatel, davide
  Reviewed By: spatel
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D33731
  llvm-svn: 305070
* [ARM] Add scheduling info for VFMS | Oliver Stannard | 2017-06-09 | 1 | -5/+86
  The scalar VFMS instructions did not have scheduling information attached (but VFMA did), which was causing assertion failures with the Cortex-A57 scheduling model and -fp-contract=fast.
  Differential Revision: https://reviews.llvm.org/D34040
  llvm-svn: 305064
* RegAllocPBQP: Do not assign reserved physical register | Matthias Braun | 2017-06-08 | 1 | -0/+35
  (0) RegAllocPBQP: Since getRawAllocationOrder() may return a collection that includes reserved physical registers, iterate to find an un-reserved physical register.
  (1) VirtRegMap: Enforce the invariant: "no reserved physical registers" in assignVirt2Phys(). Previously, this was checked only after the fact in VirtRegRewriter::rewrite.
  (2) MachineVerifier: updated the test per MatzeB's review.
  (3) +testcase
  Patch by Nick Johnson <Nicholas.Paul.Johnson@deshawresearch.com>!
  Differential Revision: https://reviews.llvm.org/D33947
  llvm-svn: 305016
* [Hexagon] Skip mux generation when predicate register is undefined | Krzysztof Parzyszek | 2017-06-08 | 1 | -0/+27
  llvm-svn: 305014
* [CGP] don't expand a memcmp with nobuiltin attribute | Sanjay Patel | 2017-06-08 | 1 | -6/+15
  This matches the behavior used in the SDAG when expanding memcmp.
  For reference, we're intentionally treating the earlier fortified call transforms differently after:
    https://bugs.llvm.org/show_bug.cgi?id=23093
    https://reviews.llvm.org/rL233776
  One motivation for not transforming nobuiltin calls is that it can interfere with sanitizers:
    https://reviews.llvm.org/D19781
    https://reviews.llvm.org/D19801
  Differential Revision: https://reviews.llvm.org/D34043
  llvm-svn: 305007