path: root/llvm/lib/Target
* AMDGPU/GlobalISel: Mark 32-bit float constants as legal (Tom Stellard, 2017-05-26; 2 files, -0/+6)
  Reviewers: arsenm
  Reviewed By: arsenm
  Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D33212
  llvm-svn: 304003

* [AMDGPU] SDWA: add disassembler support for GFX9 (Sam Kolton, 2017-05-26; 5 files, -31/+113)
  Summary: Added decoder methods and tests
  Reviewers: vpykhtin, artem.tamazov, dp
  Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye
  Differential Revision: https://reviews.llvm.org/D33545
  llvm-svn: 303999
* [ARM] Fix lowering of misaligned memcpy/memset (John Brawn, 2017-05-26; 1 file, -6/+0)
  Currently getOptimalMemOpType returns i32 for large enough sizes without checking for
  alignment, leading to poor code generation when misaligned accesses aren't permitted, as we
  generate a word store and later split it up into byte stores. This means we inadvertently go
  over the MaxStoresPerMemcpy limit, and for memset we splat the memset value into a word and
  then immediately split it up again. Fix this by leaving it up to FindOptimalMemOpLowering to
  figure out which type to use, but also fix a bug there where it wasn't correctly checking
  whether misaligned memory accesses are allowed.
  Differential Revision: https://reviews.llvm.org/D33442
  llvm-svn: 303990
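  Purely as an illustration (not taken from the commit), the kind of source that exercises this
  path is a small fixed-size memcpy/memset whose destination the compiler cannot assume to be
  word-aligned; the struct and function names below are hypothetical.

    #include <cstdint>
    #include <cstring>

    // Hypothetical packed struct: 'payload' starts at offset 1, so it is not
    // 4-byte aligned.
    struct Frame {
      uint8_t tag;
      uint8_t payload[16];
    } __attribute__((packed));

    void fill(Frame &f, const uint8_t *src) {
      // On targets where misaligned word stores are not permitted, expanding
      // these with i32 stores previously led to the stores being split into
      // byte stores later, going over the MaxStoresPerMemcpy limit.
      std::memcpy(f.payload, src, 16);
      std::memset(f.payload + 8, 0, 8);
    }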
* The fix for PR22004: X86AsmParser.cpp asserts: OperandStack.size() > 1 && "Too few operands." (Andrew V. Tischenko, 2017-05-26; 1 file, -2/+3)
  llvm-svn: 303985

* Fix signedness of constant. NFC. (Nirav Dave, 2017-05-26; 1 file, -5/+5)
  llvm-svn: 303980

* [PPC] Add text for assert. (Tim Shen, 2017-05-25; 1 file, -1/+1)
  llvm-svn: 303940

* [PPC] Fix atomics lowering in DAG lowering. (Tim Shen, 2017-05-25; 1 file, -1/+3)
  I forgot to forward the chain, causing some missing instruction dependencies. The test crashes
  the compiler without this patch. Inspired by the test case, D33519 also tries to remove the
  extra sync.
  Differential Revision: https://reviews.llvm.org/D33573
  llvm-svn: 303931
* PPC: Correct Size for GETtlsADDR (Kyle Butt, 2017-05-25; 1 file, -1/+3)
  PPC::GETtlsADDR is lowered to a branch and a nop by the assembly printer. Its size was
  incorrectly marked as 4; correct it to 8. The incorrect size can cause incorrect branch
  relaxation in PPCBranchSelector under the right conditions.
  llvm-svn: 303904
* Revert r303859, CodeGen/AMDGPU/llvm.amdgcn.s.getpc.ll fails on bots. (Nico Weber, 2017-05-25; 1 file, -3/+1)
  llvm-svn: 303902

* [AArch64]: add 'a' inline asm operand modifier. (Manoj Gupta, 2017-05-25; 1 file, -1/+4)
  Summary: This is used in the Linux kernel, and effectively just means "print an address".
  This brings back r193593.
  Reviewed by: Renato Golin
  Reviewers: t.p.northover, rengolin, richard.barton.arm, kristof.beyls
  Subscribers: aemerson, javed.absar, llvm-commits, eraman
  Differential Revision: https://reviews.llvm.org/D33558
  llvm-svn: 303901

* [AMDGPU] add intrinsic for s_getpc (Tim Corringham, 2017-05-25; 1 file, -1/+3)
  Summary: The s_getpc instruction is exposed as intrinsic llvm.amdgcn.s.getpc.
  Reviewers: arsenm
  Reviewed By: arsenm
  Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye
  Differential Revision: https://reviews.llvm.org/D32862
  llvm-svn: 303859

* [X86] Adding vpopcntd and vpopcntq instructions (Oren Ben Simhon, 2017-05-25; 7 files, -0/+59)
  AVX512_VPOPCNTDQ is a new feature set that was published by Intel. The patch represents the
  LLVM side of the addition of two new intrinsic-based instructions (vpopcntd and vpopcntq).
  Differential Revision: https://reviews.llvm.org/D33169
  llvm-svn: 303858
* [PowerPC] Fix a performance bug for PPC::XXSLDWI. (Tony Jiang, 2017-05-24; 4 files, -4/+96)
  There are some VectorShuffle nodes in the SDAG which can be selected to the XXSLDWI
  instruction; this patch recognizes them and does the selection to improve PPC performance.
  llvm-svn: 303822
* [AArch64] Prevent nested ADDs from address calc in splitStoreSplat. NFC (Nirav Dave, 2017-05-24; 1 file, -2/+12)
  In preparation for late-stage store merging.
  llvm-svn: 303800

* P9: D-form vector load/store. (Zaara Syeda, 2017-05-24; 1 file, -18/+34)
  Differential Revision: https://reviews.llvm.org/D33248
  llvm-svn: 303780

* Revert r291254: [AArch64] Reduce vector insert/extract cost for Falkor (Matthew Simpson, 2017-05-24; 1 file, -1/+0)
  The default vector insert/extract cost is more profitable on Falkor than the reduced cost.
  llvm-svn: 303771
* [AMDGPU] Prevent too large store merges in AMDGPU Subtargets. NFCI. (Nirav Dave, 2017-05-24; 4 files, -0/+24)
  Various address spaces on the SI and R600 subtargets have stricter limits on memory access
  size than other address spaces. Use the canMergeStoresTo predicate to prevent the DAGCombiner
  from creating these stores, as they will be split up during legalization.
  llvm-svn: 303767
* [MSP430] Fix PR33050: Don't use ADD16ri to lower FrameIndex. (Vadzim Dambrouski, 2017-05-24; 3 files, -3/+8)
  Use the ADDframe pseudo instruction instead. This fixes a machine verifier error and will help
  to fix PR32146.
  Differential Revision: https://reviews.llvm.org/D33452
  llvm-svn: 303758
* Revert "AMDGPU: Fold CI-specific complex SMRD patterns into existing complex ↵Marek Olsak2017-05-244-18/+51
| | | | | | | | | | | patterns" This reverts commit e065977c4b5f68ab845400b256f6a3822b1325fa. It doesn't work. S_LOAD_DWORD_IMM_ci and friends aren't selected by any of the patterns, so it was putting 32-bit literals into the 8-bit field. llvm-svn: 303754
* [Hexagon] Fix comment in HexagonPacketizer::runOnMachineFunctionKrzysztof Parzyszek2017-05-241-2/+2
| | | | | | | | Patch by Wei-Ren Chen. Differential Revision: https://reviews.llvm.org/D33439 llvm-svn: 303745
* [LoopVectorizer] Let target prefer scalar addressing computations. (Jonas Paulsson, 2017-05-24; 1 file, -0/+1)
  The loop vectorizer usually vectorizes any instruction it can and then extracts the elements
  for a scalarized use. On SystemZ, all elements containing addresses must be extracted into
  address registers (GRs). Since this extraction is not free, it is better to have the address
  in a suitable register to begin with. Forcing address arithmetic instructions and loads of
  addresses to be scalar after vectorization has two benefits:
  * No need to extract the register
  * LSR optimizations trigger (LSR isn't handling vector addresses currently)
  Benchmarking shows improvements on SystemZ with this new behaviour. Any other target could try
  this by returning false in the new hook prefersVectorizedAddressing().
  Review: Renato Golin, Elena Demikhovsky, Ulrich Weigand
  https://reviews.llvm.org/D32422
  llvm-svn: 303744
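  As an illustrative sketch of the motivation (not code from the patch), consider a loop whose
  loaded values are themselves addresses: if the address computations were vectorized, each
  element would have to be extracted back into a scalar address register before the dependent
  load. The names below are hypothetical.

    #include <cstddef>

    // Hypothetical kernel: rows[i] is a pointer, so each loaded value is an
    // address that must end up in a scalar (address) register before the
    // dependent load can execute.
    void gather_first_column(double *dst, double *const *rows, std::size_t n) {
      for (std::size_t i = 0; i < n; ++i)
        dst[i] = rows[i][0];
    }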
* [SystemZ] Fix register modelling in expandLoadStackGuard() (Jonas Paulsson, 2017-05-24; 1 file, -16/+14)
  EXPENSIVE_CHECKS found this bug (https://bugs.llvm.org/show_bug.cgi?id=33047), which this
  patch fixes. The EAR instruction defines a GR32, not a GR64.
  Review: Ulrich Weigand
  llvm-svn: 303743

* Strip trailing whitespace. NFCI. (Simon Pilgrim, 2017-05-24; 1 file, -2/+2)
  llvm-svn: 303736
* [ARM] Remove ThumbTargetMachines. (NFC) (Florian Hahn, 2017-05-24; 3 files, -114/+15)
  Summary: Thumb code generation is controlled by ARMSubtarget, and the concrete
  ThumbLETargetMachine and ThumbBETargetMachine are not needed. Eric Christopher suggested
  removing the unneeded target machines in https://reviews.llvm.org/D33287.
  I think it still makes sense to keep separate TargetMachines for big and little endian, as we
  probably do not want different endianness for different functions in a single compilation
  unit. The MIPS backend has two separate TargetMachines for big and little endian as well.
  Reviewers: echristo, rengolin, kristof.beyls, t.p.northover
  Reviewed By: echristo
  Subscribers: aemerson, javed.absar, arichardson, llvm-commits
  Differential Revision: https://reviews.llvm.org/D33318
  llvm-svn: 303733
* [ARM] Add VLDx/VSTx sched defs for machine-schedulers. NFCI (Javed Absar, 2017-05-24; 5 files, -337/+292)
  This patch adds missing scheduling definitions for Neon VLDx/VSTx instructions. This will make
  it easier and faster to write schedulers for ARM sub-targets in the future. Existing models
  are not affected by this patch.
  Reviewed by: Renato Golin, Diana Picus
  Differential Revision: https://reviews.llvm.org/D33120
  llvm-svn: 303717
* [MSP430] Add subtarget features for hardware multiplier. (Vadzim Dambrouski, 2017-05-23; 4 files, -25/+54)
  Also add more processors to make the -mcpu option behave similarly to gcc.
  Differential Revision: https://reviews.llvm.org/D33335
  llvm-svn: 303695

* [AMDGPU] Add INDIRECT_BASE_ADDR to R600_Reg32 class (PR33045) (Simon Pilgrim, 2017-05-23; 1 file, -1/+1)
  This fixes 17 of the 41 -verify-machineinstrs test failures identified in PR33045.
  Differential Revision: https://reviews.llvm.org/D33451
  llvm-svn: 303691
* AMDGPU/SI: Move the local memory usage related checking after calling convention checking in PromoteAlloca (Changpeng Fang, 2017-05-23; 1 file, -99/+114)
  Summary: Promoting Alloca to Vector and Promoting Alloca to LDS are two independent ways of
  handling Alloca and should not affect each other. As a result, we should not give up promoting
  to vector if there is not enough LDS. This patch factors out the local memory usage checking
  and moves it after the calling convention checking.
  Reviewer: arsenm
  Differential Revision: http://reviews.llvm.org/D33139
  llvm-svn: 303684
* [AArch64][Falkor] Refine sched details for LSLfast/ASRfast. (Geoff Berry, 2017-05-23; 4 files, -40/+189)
  llvm-svn: 303682
* [AMDGPU] Combine and (srl) into shl (bfe) (Stanislav Mekhanoshin, 2017-05-23; 3 files, -11/+40)
  Perform DAG combine:
    and (srl x, c), mask => shl (bfe x, nb + c, mask >> nb), nb
  where nb is the number of trailing zeroes in mask. This replaces two instructions with two
  others, and BFE is generally the more expensive one. However, this is only done if we are
  selecting a byte or word at an aligned boundary, which results in a proper SDWA operand
  pattern, and only if SDWA is supported.
  TODO: improve the SDWA pass to actually convert this pattern. It is not done now because we
  have an immediate in the instruction, which has to be moved into a VGPR.
  Differential Revision: https://reviews.llvm.org/D33455
  llvm-svn: 303681
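  A quick standalone sanity check (an illustration, not code from the patch) of the
  shift-and-mask identity the combine relies on, treating bfe simply as extracting the masked
  field after the wider shift; the constants are hypothetical.

    #include <cassert>
    #include <cstdint>

    int main() {
      const uint32_t x    = 0xDEADBEEFu;
      const uint32_t c    = 8;        // original srl amount
      const uint32_t mask = 0xFF00u;  // mask with nb = 8 trailing zeroes
      const uint32_t nb   = 8;

      uint32_t before = (x >> c) & mask;
      // Extract the field at offset nb + c with the narrowed mask, then shift
      // it back into place: the same value the original srl/and pair produced.
      uint32_t after = ((x >> (nb + c)) & (mask >> nb)) << nb;
      assert(before == after);
      return 0;
    }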
* [AArch64][Falkor] Fix sched details for FMOV of WZR/XZR. (Geoff Berry, 2017-05-23; 2 files, -6/+8)
  llvm-svn: 303680

* [ARM] Temporarily disable globals promotion to constant pools to prevent miscompilation (Oleg Ranevskyy, 2017-05-23; 1 file, -1/+1)
  Summary: A temporary workaround for PR32780 - rematerialized instructions accessing the same
  promoted global through different constant pool entries.
  The patch turns off the globals promotion optimization, leaving all its code in place, so that
  it can be easily turned on once PR32780 is fixed. Since this is a miscompilation issue causing
  generation of misbehaving code, and the problem is very subtle, the patch might be valuable
  enough to get into 4.0.1.
  Reviewers: efriedma, jmolloy
  Reviewed By: efriedma
  Subscribers: aemerson, javed.absar, llvm-commits, rengolin, asl, tstellar
  Differential Revision: https://reviews.llvm.org/D33446
  llvm-svn: 303679

* [DAG] Add AddressSpace parameter to canMergeStoresTo. NFC. (Nirav Dave, 2017-05-23; 1 file, -1/+1)
  llvm-svn: 303673

* AMDGPU: Fold CI-specific complex SMRD patterns into existing complex patterns (Marek Olsak, 2017-05-23; 4 files, -51/+18)
  This is just a cleanup. Also, it adds checking that ByteCount is aligned to 4.
  Reviewers: arsenm, nhaehnle, tstellarAMD
  Subscribers: kzhuravl, wdng, yaxunl, tony-tye
  Differential Revision: https://reviews.llvm.org/D28994
  llvm-svn: 303658
* [AMDGPU] Convert shl (add) into add (shl) (Stanislav Mekhanoshin, 2017-05-23; 2 files, -2/+43)
  Perform the combine:
    shl (or|add x, c2), c1 => or|add (shl x, c1), (c2 << c1)
  This allows folding a constant into an address in some cases, as well as eliminating the
  second shift if the expression is used as an address and the second shift is the result of a
  GEP.
  Differential Revision: https://reviews.llvm.org/D33432
  llvm-svn: 303641
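  A standalone sanity check (illustrative, not from the patch) of the arithmetic behind the
  rewrite, using hypothetical constants:

    #include <cassert>
    #include <cstdint>

    int main() {
      const uint64_t x  = 0x12345678u;
      const uint64_t c2 = 24;  // constant added before the shift (e.g. a GEP offset)
      const uint64_t c1 = 3;   // shift amount (e.g. log2 of the element size)
      // With wrapping unsigned arithmetic, shifting after the add equals adding
      // the pre-shifted constant, which is what lets the constant fold into an
      // addressing-mode offset.
      assert(((x + c2) << c1) == ((x << c1) + (c2 << c1)));
      // The "or" form of the pattern is used when the or is known to behave
      // like an add, i.e. x and the constant share no set bits.
      return 0;
    }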
* [mips] Remove unused class field. NFC (Simon Atanasyan, 2017-05-23; 2 files, -5/+0)
  llvm-svn: 303639

* [mips] Change type of MipsSubtarget ctor arguments s/std::string/StringRef/. NFC (Simon Atanasyan, 2017-05-23; 2 files, -5/+4)
  llvm-svn: 303638
* [AMDGPU] SDWA: Add assembler support for GFX9 (Sam Kolton, 2017-05-23; 13 files, -64/+552)
  Summary: Added separate pseudo and real instructions for GFX9 SDWA instructions. Currently
  supported only in the assembler. Depends on D32493.
  Reviewers: vpykhtin, artem.tamazov
  Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye
  Differential Revision: https://reviews.llvm.org/D33132
  llvm-svn: 303620
* [AArch64] Make instruction fusion more aggressive. (Florian Hahn, 2017-05-23; 2 files, -1/+14)
  Summary: This patch makes instruction fusion more aggressive by
  * adding artificial edges between the successors of FirstSU and SecondSU, similar to
    BaseMemOpClusterMutation::clusterNeighboringMemOps.
  * updating PostGenericScheduler::tryCandidate to keep clusters together, similar to
    GenericScheduler::tryCandidate.
  This change increases the number of AES instruction pairs generated on Cortex-A57 and
  Cortex-A72. This doesn't change code at all in most benchmarks or general code, but we've seen
  improvement on kernels using AESE/AESMC and AESD/AESIMC.
  Reviewers: evandro, kristof.beyls, t.p.northover, silviu.baranga, atrick, rengolin, MatzeB
  Reviewed By: evandro
  Subscribers: aemerson, rengolin, MatzeB, javed.absar, llvm-commits
  Differential Revision: https://reviews.llvm.org/D33230
  llvm-svn: 303618
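  For illustration only (not part of the commit), an AES round written with the ACLE crypto
  intrinsics emits the back-to-back AESE/AESMC pair that this scheduling change tries to keep
  adjacent; the function name and build flags are assumptions.

    // Build for AArch64 with the crypto extension, e.g. -march=armv8-a+crypto.
    #include <arm_neon.h>

    uint8x16_t aes_round(uint8x16_t block, uint8x16_t round_key) {
      uint8x16_t s = vaeseq_u8(block, round_key); // AESE: AddRoundKey, SubBytes, ShiftRows
      return vaesmcq_u8(s);                       // AESMC: MixColumns, fusible with the AESE above
    }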
* [GlobalISel][X86] G_LOAD/G_STORE vec256/512 support (Igor Breger, 2017-05-23; 3 files, -0/+38)
  Summary: mark G_LOAD/G_STORE vec256/512 legal for AVX/AVX512. Implement instruction selection.
  Reviewers: zvi, guyblank
  Reviewed By: zvi
  Subscribers: rovka, llvm-commits, kristof.beyls
  Differential Revision: https://reviews.llvm.org/D33268
  llvm-svn: 303617
* [AArch64] Fix PR33100. (Akira Hatanaka, 2017-05-23; 1 file, -7/+10)
  This commit fixes a bug introduced in r301019 where optimizeLogicalImm would replace a logical
  node's immediate operand that was CSE'd and was also an operand of another node. This commit
  fixes the bug by replacing the logical node instead of its immediate operand.
  rdar://problem/32295276
  llvm-svn: 303607
* [Hexagon] Fix definitions of vector predicate loads and stores (Krzysztof Parzyszek, 2017-05-22; 1 file, -22/+17)
  This fixes http://llvm.org/PR33048.
  llvm-svn: 303572
* [AMDGPU] Narrow lshl from 64 to 32 bit if possible (Stanislav Mekhanoshin, 2017-05-22; 1 file, -11/+33)
  Turn an expensive 64-bit shift into a 32-bit one if the shift does not overflow int:
    shl (ext x) => zext (shl x)
  Differential Revision: https://reviews.llvm.org/D33367
  llvm-svn: 303569
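  A standalone sanity check (illustrative, not from the patch) of the narrowing rule: when the
  32-bit shift cannot overflow, shifting before or after the zero-extension yields the same
  64-bit value, so the cheaper 32-bit shift can be used. The constants are hypothetical.

    #include <cassert>
    #include <cstdint>

    int main() {
      const uint32_t x = 0x0001ABCDu;
      const unsigned c = 4;
      assert((x >> (32 - c)) == 0);  // precondition: the shift does not overflow 32 bits
      uint64_t wide   = static_cast<uint64_t>(x) << c;  // shl (ext x)
      uint64_t narrow = static_cast<uint64_t>(x << c);  // zext (shl x)
      assert(wide == narrow);
      return 0;
    }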
* [AMDGPU] Fix incorrect register usage tracking in GCNUpwardTracker (Valery Pykhtin, 2017-05-22; 2 files, -62/+86)
  Differential revision: https://reviews.llvm.org/D33289
  llvm-svn: 303548

* [mips] Support micromips attribute passed by front-end (Simon Atanasyan, 2017-05-22; 1 file, -0/+9)
  This patch adds handling of the `micromips` and `nomicromips` attributes passed by the
  front-end. The patch depends on D33363.
  Differential revision: https://reviews.llvm.org/D33364
  llvm-svn: 303545
* Re-apply r302416: [ARM] Clear the constant pool cache on explicit .ltorg directives (James Molloy, 2017-05-22; 1 file, -0/+1)
  Re-applying now that PR32825, which was raised against the commit this fixed up, is known to
  have also been fixed by this commit.
  Original commit message:
  Multiple ldr pseudoinstructions with the same constant value will reuse the same constant pool
  entry. However, if the constant pool is explicitly flushed with a .ltorg directive, we should
  not try to reference constants in the previous pool any longer, since they may be out of
  range.
  This fixes assembling hand-written assembler source which repeatedly loads the same constant
  value across a binary size larger than the pc-relative fixup range for ldr instructions (4096
  bytes). Such assembler source already uses explicit .ltorg directives to emit constant pools
  at regular intervals. However, if we try to reuse constants emitted in earlier pools, they end
  up out of range.
  This makes the output of the testcase match what binutils gas does (prior to this patch, it
  would fail to assemble).
  Differential Revision: https://reviews.llvm.org/D32847
  llvm-svn: 303540
* [MIPS] Add support to match more patterns for DINS instruction (Strahinja Petrovic, 2017-05-22; 1 file, -25/+62)
  This patch adds support for recognizing more patterns that match the DINS instruction.
  Differential Revision: https://reviews.llvm.org/D31465
  llvm-svn: 303537
* Revert "[ARM] Clear the constant pool cache on explicit .ltorg directives"James Molloy2017-05-221-1/+0
| | | | | | | | This reverts commit r302416. This was a fixup for r286006, which has now been reverted so this doesn't apply (either in concept or in code). This commit itself has no problems, but the underlying issue it was fixing has now disappeared from the codebase. llvm-svn: 303536
* [GlobalISel][X86] Fix G_TRUNC instruction selection. (Igor Breger, 2017-05-21; 1 file, -9/+15)
  Updated tests with the -verify-machineinstrs flag. This fixes 3 tests that failed with the
  machine verifier enabled and are listed in PR27481.
  llvm-svn: 303502
* PPC: eliminate more compare instructions using record-form instructions (Hiroshi Inoue, 2017-05-21; 1 file, -6/+44)
  The PPC backend eliminates compare instructions by using record-form instructions in
  PPCInstrInfo::optimizeCompareInstr, which is called from the peephole optimization pass. This
  patch improves this optimization to eliminate more compare instructions in two types of
  common case.

  - Comparison against a constant 1 or -1
  The record-form instructions set the CR bit based on a signed comparison against 0, so the
  current implementation does not exploit the record-form instruction for comparison against a
  non-zero constant. This patch enables record-form optimization for a constant of 1 or -1 if
  possible; it changes the condition "greater than -1" into "greater than or equal to 0" and
  "less than 1" into "less than or equal to 0". With this patch, the compare can be eliminated
  in the following code sequence, as an example:

    uint64_t a, b;
    if ((a | b) & 0x8000000000000000ull) { ... }
    else { ... }

  - andi. for 32-bit comparison on PPC64
  Since record-form instructions execute a 64-bit signed comparison, we are limited in
  eliminating 32-bit comparisons, i.e. with cmplwi, using the record form. The original
  implementation already has such checks, but andi. was not recognized as an instruction which
  performs an implicit zero extension and hence is safe to convert into record form when used
  for an equality check:

    %1 = and i32 %a, 10
    %2 = icmp ne i32 %1, 0
    br i1 %2, label %foo, label %bar

  In this simple example, LLVM generates andi. + cmplwi + beq on PPC64. This patch makes it
  possible to eliminate the cmplwi in this case. I added andi. to the optimization targets if it
  is safe to do so.
  Differential Revision: https://reviews.llvm.org/D30081
  llvm-svn: 303500