path: root/llvm/lib
Commit message (Author, Date, Files changed, Lines -/+)
* [X86][SSE] Use VSEXT/VZEXT constant folding for SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG (Simon Pilgrim, 2017-02-11, 1 file, -1/+6)
  Preparatory step for PR31712.
  llvm-svn: 294874
* [X86][SSE] Improve VSEXT/VZEXT constant folding. (Simon Pilgrim, 2017-02-11, 1 file, -11/+18)
  Generalize VSEXT/VZEXT constant folding to work with any target constant bits source, not just BUILD_VECTOR.
  llvm-svn: 294873
* [X86][SSE] Add early-out when trying to match blend shuffle. NFCI. (Simon Pilgrim, 2017-02-11, 1 file, -3/+4)
  llvm-svn: 294864
* [TargetLowering] check for sign-bit comparisons in SimplifyDemandedBits (Sanjay Patel, 2017-02-11, 1 file, -0/+19)
  I don't know if anything other than x86 vectors is affected by this change, but this may allow us to remove target-specific intrinsics for blendv* (vector selects). The simplification arises from the fact that blendv* instructions only use the sign bit when deciding which vector element to choose for the destination vector. The mechanism to fold VSELECT into SHRUNKBLEND nodes already exists in x86 lowering; this demanded-bits change just enables the transform to fire more often.
  The original motivation starts with a bug for DSE of masked stores that seems completely unrelated, but I've explained the likely steps in this series here: https://llvm.org/bugs/show_bug.cgi?id=11210
  Differential Revision: https://reviews.llvm.org/D29687
  llvm-svn: 294863
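  A minimal scalar sketch of why only the sign bit is demanded (illustrative C++, assuming the _mm_blendv_ps-style operand order; not code from the patch):

      #include <cstdint>

      // Per-lane model of a blendv-style vector select: the result depends only on
      // the sign bit of the mask element, so the lower mask bits are not demanded.
      static int32_t blendv_lane(int32_t mask, int32_t a, int32_t b) {
        return (mask < 0) ? b : a; // mask < 0 is exactly "sign bit set"
      }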
* Fix indentation in X86ISelLowering. NFC (Amaury Sechet, 2017-02-11, 1 file, -8/+8)
  llvm-svn: 294859
* [AVX-512] Add VPMINS/MINU/MAXS/MAXU instructions to load folding tables. (Craig Topper, 2017-02-11, 1 file, -0/+136)
  llvm-svn: 294858
* [X86] Improve alphabetizing of load folding tables. NFC (Craig Topper, 2017-02-11, 1 file, -18/+18)
  llvm-svn: 294857
* [X86][SSE] Convert getTargetShuffleMaskIndices to use getTargetConstantBitsFromNode. (Simon Pilgrim, 2017-02-11, 1 file, -75/+25)
  Removes duplicate constant extraction code in getTargetShuffleMaskIndices.
  getTargetConstantBitsFromNode: adds support for VZEXT_MOVL(SCALAR_TO_VECTOR) and fails if the caller doesn't support undef bits.
  llvm-svn: 294856
* [X86] Merge repeated getScalarValueSizeInBits calls. NFCI. (Simon Pilgrim, 2017-02-11, 1 file, -7/+7)
  llvm-svn: 294852
* NewGVN: Reverse sense of this test to make it clearer (Daniel Berlin, 2017-02-11, 1 file, -5/+7)
  llvm-svn: 294851
* NewGVN: Add missing initialization of NumFuncArgs lost due to bad merge. (Daniel Berlin, 2017-02-11, 1 file, -0/+1)
  llvm-svn: 294850
* NewGVN: Rank and order commutative operands consistently. (Daniel Berlin, 2017-02-11, 1 file, -2/+40)
  llvm-svn: 294849
* [X86][3DNow!] Enable PFSUB<->PFSUBR commutation (Simon Pilgrim, 2017-02-11, 2 files, -2/+14)
  llvm-svn: 294847
* [X86][3DNow!] Enable commutation for PFADD/PFMUL/PFCMPEQ/PAVGUSB/PMULHRW (Simon Pilgrim, 2017-02-11, 1 file, -8/+10)
  All commutations confirmed to give identical results; note that PFMAX/PFMIN do not.
  PFSUB<->PFSUBR should be commutable as well.
  llvm-svn: 294846
* NewGVN: Clean up how we handle the INITIAL class so that everything in it is dead or unreachable, as it should be. (Daniel Berlin, 2017-02-11, 1 file, -16/+38)
  This also makes the leader of INITIAL undef, enabling us to handle irreducibility properly.
  Summary: This lets us verify, more than we do now, that we didn't screw up value numbering.
  Reviewers: davide
  Subscribers: Prazek, llvm-commits
  Differential Revision: https://reviews.llvm.org/D29842
  llvm-svn: 294844
* Fix "left shift of negative value -1" introduced by r294805 (Vitaly Buka, 2017-02-11, 1 file, -1/+1)
  llvm-svn: 294843
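  The class of bug being fixed, as a hedged sketch (the exact expression in the Hexagon V62 code from r294805 may differ): left-shifting a negative value is undefined behaviour in C++, and the usual fix is to shift an unsigned value instead.

      #include <cstdint>

      uint32_t shifted_mask(unsigned n) { // assumes n < 32
        // return -1 << n;              // UB: left shift of negative value (UBSan diagnostic)
        return ~0u << n;                // well-defined: shift an unsigned all-ones value
      }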
* Move symbols from the global namespace into (anonymous) namespaces. NFC. (Benjamin Kramer, 2017-02-11, 9 files, -30/+25)
  llvm-svn: 294837
* [AVX-512] Add VPINSRB/W/D/Q instructions to load folding tables. (Craig Topper, 2017-02-11, 1 file, -0/+4)
  llvm-svn: 294830
* [AVX-512] Fix apparent typo in instruction name VMOVSSDrr_REV->VMOVSDZrr_REV. (Craig Topper, 2017-02-11, 1 file, -1/+1)
  llvm-svn: 294829
* [AVX-512] Add VPSADBW instructions to load folding tables. (Craig Topper, 2017-02-11, 1 file, -0/+3)
  llvm-svn: 294827
* [X86] Don't base domain decisions on VEXTRACTF128/VINSERTF128 if only AVX1 is available. (Craig Topper, 2017-02-11, 1 file, -4/+19)
  The execution dependency pass seems to like using FP instructions, even when most of the consuming code is integer, if a vextractf128 instruction produced the register. Without AVX2 we don't have the corresponding integer instruction available.
  This patch suppresses the domain on these instructions to GenericDomain if AVX2 is not supported, so that they are ignored by domain fixing. If AVX2 is supported we'll report the correct domain and allow them to switch between integer and fp.
  Overall I think this produces better results in the modified test cases.
  llvm-svn: 294824
* Address Mehdi's post-commit review comments on r294795. (Peter Collingbourne, 2017-02-11, 1 file, -0/+4)
  llvm-svn: 294822
* Fix PR23384 (under "-lsr-insns-cost" option) (Evgeny Stupachenko, 2017-02-11, 1 file, -4/+57)
  Summary: The patch adds the number of instructions generated by a solution to the LSR cost, under the "-lsr-insns-cost" option.
  Reviewers: qcolombet, hfinkel
  Differential Revision: http://reviews.llvm.org/D28307
  From: Evgeny Stupachenko <evstupac@gmail.com>
  llvm-svn: 294821
* [ARM] Make f16 interleaved accesses expensive. (Ahmed Bougacha, 2017-02-11, 1 file, -1/+2)
  There are no vldN/vstN f16 variants, even with +fullfp16. We could use the i16 variants, but, in practice, even with +fullfp16, the f16 sequence leading to the i16 shuffle usually gets scalarized. We'd need to improve our support for f16 codegen before getting there.
  Teach the cost model to consider f16 interleaved operations as expensive. Otherwise, we are all but guaranteed to end up with a large block of scalarized vector code.
  llvm-svn: 294819
* [ARM] Don't lower f16 interleaved accesses. (Ahmed Bougacha, 2017-02-11, 1 file, -0/+14)
  There are no vldN/vstN f16 variants, even with +fullfp16. We could use the i16 variants, but, in practice, even with +fullfp16, the f16 sequence leading to the i16 shuffle usually gets scalarized. We'd need to improve our support for f16 codegen before getting there.
  Reject f16 interleaved accesses. If we try to emit the f16 intrinsics, we'll just end up with a selection failure.
  llvm-svn: 294818
* [LSR] Recommit: Allow formula containing Reg for SCEVAddRecExpr related with outerloop. (Wei Mi, 2017-02-11, 1 file, -6/+5)
  The recommit includes some changes of testcases. No functional change to the patch.
  In RateRegister of existing LSR, if a formula contains a Reg which is a SCEVAddRecExpr, and this SCEVAddRecExpr's loop is an outer loop, the formula will be marked as Loser and dropped.
  Suppose we have an IR in which %for.body is the outer loop and %for.body2 is the inner loop. LSR only handles inner loops now, so only %for.body2 will be handled. Using the logic above, a formula like reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) will be dropped no matter what, because reg({1,+, %size}<%for.body>) is a SCEVAddRecExpr-type reg related with the outer loop. Only a formula like reg(%array) + 1*reg({{1,+, %size}<%for.body>,+,1}<nuw><nsw><%for.body2>) will be kept, because the SCEVAddRecExpr related with the outer loop is folded into the initial value of the SCEVAddRecExpr related with the current loop.
  But in some cases we do need to share the basic induction variable reg{0,+,1}<%for.body2> among LSR Uses to reduce the final total number of induction variables used by LSR, so we don't want to drop the formula reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) unconditionally.
  From the existing comment, it tries to avoid considering multiple-level loops at the same time. However, existing LSR only handles the innermost loop, so for any SCEVAddRecExpr with a loop other than the current loop, it is invariant and simple to handle, and the formula doesn't have to be dropped.
  Differential Revision: https://reviews.llvm.org/D26429
  llvm-svn: 294814
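  A hedged source-level picture of the nested-loop shape described above (names and element type chosen for illustration; the actual testcases may differ):

      // %for.body is the outer loop (i), %for.body2 the inner loop (j). The address
      // array + (1 + i*size) + j splits into an inner-loop-invariant part,
      // {1,+,size}<%for.body>, plus the inner induction variable {0,+,1}<%for.body2>,
      // which is the kind of formula LSR is now allowed to keep.
      void zero_rows(char *array, long size, long rows, long cols) {
        for (long i = 0; i < rows; ++i)     // %for.body
          for (long j = 0; j < cols; ++j)   // %for.body2
            array[1 + i * size + j] = 0;
      }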
* [MC] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). (Eugene Zelenko, 2017-02-11, 11 files, -103/+83)
  llvm-svn: 294813
* [PM] Fix a bug in how I ported LoopDeletion to the new PM. (Chandler Carruth, 2017-02-11, 1 file, -21/+14)
  This was marking the loop for deletion after the loop was deleted. This almost works, except that when we do any kind of debug logging it starts reading the name of the loop from deleted memory or otherwise blowing up. This can fail in a bunch of ways. I recently added a test that *always* does this, and it started failing on the sanitizer bots.
  The fix is to mark the loop as deleted in the loop PM infrastructure before we remove the loop. We can do this by passing the updater into the routine. That also lets us simplify a bunch of other interface components here for a net win.
  llvm-svn: 294810
* [WebAssembly] Remove old experimental disassembler code. (Dan Gohman, 2017-02-11, 1 file, -84/+2)
  Remove support for disassembling an old experimental wasm binary format, which is no longer in use anywhere.
  llvm-svn: 294809
* [LTO] Share the optimization remarks setup between Thin/Full LTO. (Davide Italiano, 2017-02-10, 3 files, -41/+29)
  llvm-svn: 294807
* [Hexagon] Introduce Hexagon V62 (Krzysztof Parzyszek, 2017-02-10, 18 files, -61/+4032)
  llvm-svn: 294805
* IR: Function summary extensions for whole-program devirtualization pass. (Peter Collingbourne, 2017-02-10, 3 files, -27/+191)
  The summary information includes all uses of llvm.type.test and llvm.type.checked.load intrinsics that can be used to devirtualize calls, including any constant arguments for virtual constant propagation.
  Differential Revision: https://reviews.llvm.org/D29734
  llvm-svn: 294795
* [InstCombine] Move class into anonymous namespace. NFC. (Benjamin Kramer, 2017-02-10, 1 file, -0/+2)
  This is necessary to avoid warnings from GCC:
  InstCombineLoadStoreAlloca.cpp:238:7: error: 'PointerReplacer' declared with greater visibility than the type of its field 'PointerReplacer::IC'
  llvm-svn: 294794
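  The general shape of such a fix, sketched with placeholder members (the real class lives in InstCombineLoadStoreAlloca.cpp and holds an InstCombiner reference):

      namespace {

      // Wrapping the class in an anonymous namespace gives it internal linkage, so it
      // no longer has greater visibility than its internal-linkage field.
      class PointerReplacer {
      public:
        explicit PointerReplacer(int &State) : State(State) {}

      private:
        int &State; // stand-in for the InstCombiner reference in the real code
      };

      } // end anonymous namespace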
* [lib/LTO] Rework optimization remarks setup. (Davide Italiano, 2017-02-10, 1 file, -16/+19)
  This makes this code much more similar to what ThinLTO is using (also API-wise), so now we can probably use a single code path instead of copying stuff around.
  llvm-svn: 294792
* [PPC] Silence warning in Release builds. (Benjamin Kramer, 2017-02-10, 1 file, -2/+1)
  llvm-svn: 294791
* [InstCombine] Silence unused variable warning in Release builds. (Benjamin Kramer, 2017-02-10, 1 file, -0/+2)
  llvm-svn: 294788
* Revert r294532, it caused PR31935 (Nico Weber, 2017-02-10, 1 file, -141/+9)
  llvm-svn: 294787
* Fix invalid addrspacecast due to combining alloca with global var (Yaxun Liu, 2017-02-10, 2 files, -7/+120)
  For function-scope variables with a large initialization list, the FE usually generates a global variable to hold the initializer, then generates a memcpy intrinsic to initialize the alloca. InstCombiner::visitAllocaInst identifies such allocas which are accessed only by reading, and replaces them with the global variable. This is done by casting the global variable to the type of the alloca and replacing all references.
  However, when the global variable is in a different address space which is disjoint with addr space 0 (e.g. for IR generated from OpenCL, a global variable cannot be in the private addr space, i.e. addr space 0), casting the global variable to addr space 0 results in invalid IR for certain targets (e.g. amdgpu).
  To fix this issue, when the global variable is not in addr space 0, instead of casting it to addr space 0, this patch chases down the uses of the alloca until reaching the load instructions, then replaces the load from the alloca with a load from the global variable. If a bitcast or GEP is encountered during the chase, a new bitcast or GEP based on the global variable is generated and used in the load instructions.
  Differential Revision: https://reviews.llvm.org/D27283
  llvm-svn: 294786
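  A hedged source-level example of the pattern described (illustrative only; the actual OpenCL testcases differ):

      // A function-scope constant with a large initializer: the front end emits a
      // global holding the initializer plus a memcpy into the alloca. When that global
      // lives in a disjoint address space (as in OpenCL), it must not be addrspacecast
      // to address space 0; the fix rewrites the loads to read the global directly.
      int lookup(int i) {
        const int table[8] = {10, 20, 30, 40, 50, 60, 70, 80};
        return table[i & 7];
      }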
* Fix a silly syntax error. (Tim Shen, 2017-02-10, 1 file, -2/+2)
  llvm-svn: 294783
* Encode duplication factor from loop vectorization and loop unrolling to discriminator. (Dehao Chen, 2017-02-10, 4 files, -10/+32)
  Summary: This patch starts the implementation as discussed in the following RFC: http://lists.llvm.org/pipermail/llvm-dev/2016-October/106532.html
  When an optimization duplicates code in a way that scales down the execution count of a basic block, we record the duplication factor as part of the discriminator, so that the offline profile tool can find the duplication factor and collect the accurate execution frequency of the corresponding source code. Two important optimizations that fall into this category are loop vectorization and loop unrolling; this patch records the duplication factor for these two optimizations.
  The recording is guarded by the flag encode-duplication-in-discriminators, which is off by default.
  Reviewers: probinson, aprantl, davidxl, hfinkel, echristo
  Reviewed By: hfinkel
  Subscribers: mehdi_amini, anemet, mzolotukhin, llvm-commits
  Differential Revision: https://reviews.llvm.org/D26420
  llvm-svn: 294782
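  A small hedged example of what the duplication factor records (illustrative code, not from the patch):

      // If this loop is vectorized or unrolled by 4, the body's source line executes
      // once per 4 original iterations; recording a duplication factor of 4 in the
      // discriminator lets the offline profile tool scale sampled counts back to
      // accurate per-source-line execution frequencies.
      long sum(const int *a, long n) {
        long s = 0;
        for (long i = 0; i < n; ++i)
          s += a[i];
        return s;
      }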
* [XRay] Implement powerpc64le xray. (Tim Shen, 2017-02-10, 5 files, -3/+104)
  Summary: powerpc64 big-endian is not supported, but I believe that most logic can be shared, except for xray_powerpc64.cc.
  Also add a function InvalidateInstructionCache to xray_util.h, which is copied from llvm/Support/Memory.cpp. I'm not sure if I need to add a unittest, and I don't know how.
  Reviewers: dberris, echristo, iteratee, kbarton, hfinkel
  Subscribers: mehdi_amini, nemanjai, mgorny, llvm-commits
  Differential Revision: https://reviews.llvm.org/D29742
  llvm-svn: 294781
* [Hexagon] Remove unused .td files (Krzysztof Parzyszek, 2017-02-10, 7 files, -2572/+0)
  llvm-svn: 294775
* [X86] Bitcast subvector before broadcasting it. (Ahmed Bougacha, 2017-02-10, 1 file, -1/+10)
  Since r274013, we've been looking through bitcasts on broadcast inputs. In the scalar-folding case (from a load, build_vector, or sc2vec), the input type didn't matter, as we'd simply bitcast the resulting scalar back.
  However, when broadcasting a 128-bit-lane-aligned element, we create an EXTRACT_SUBVECTOR. Use proper types, by creating an extract_subvector of the original input type.
  llvm-svn: 294774
* Yet another fix so llvm-objdump picks a good CPU for Mach-O files, in this case for CPU_SUBTYPE_ARM64_ALL (Kevin Enderby, 2017-02-10, 1 file, -0/+2)
  For this cpusubtype it should default to a cyclone CPU to give proper disassembly without a -mcpu= flag.
  rdar://27767188
  llvm-svn: 294771
* GlobalISel: drop lifetime intrinsics during translation. (Tim Northover, 2017-02-10, 1 file, -0/+8)
  We don't use them yet and they just cause problems.
  llvm-svn: 294770
* [libFuzzer] Use stoull instead of stol to ensure 64 bits. (Marcos Pividori, 2017-02-10, 1 file, -2/+2)
  Differential revision: https://reviews.llvm.org/D29831
  llvm-svn: 294769
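  Why stol was not enough, as a brief sketch (not libFuzzer's actual flag-parsing code): std::stol returns long, which is only 32 bits on LLP64 targets such as 64-bit Windows, while std::stoull always yields at least 64 bits.

      #include <cstdint>
      #include <string>

      uint64_t parseSeedValue(const std::string &S) {
        // std::stol(S) would throw std::out_of_range for values above 2^31-1 where
        // long is 32-bit; std::stoull parses the full 64-bit range.
        return static_cast<uint64_t>(std::stoull(S));
      }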
* [ARM] Fix incorrect mask bits in MSR encoding for write_register intrinsic (John Brawn, 2017-02-10, 1 file, -10/+6)
  In the encoding of system registers in the M-class MSR instruction, the mask bits should be 2 for registers that don't take a _<bits> qualifier (the instruction is unpredictable otherwise), and should also be 2 if the register takes a _<bits> qualifier but it's not present, as no _<bits> is an alias for _nzcvq.
  Differential Revision: https://reviews.llvm.org/D29828
  llvm-svn: 294762
* [LV] Remove type restriction for vector phi creation (Matthew Simpson, 2017-02-10, 1 file, -2/+1)
  We previously only created a vector phi node for an induction variable if its type matched the type of the canonical induction variable.
  Differential Revision: https://reviews.llvm.org/D29776
  llvm-svn: 294755
* [Hexagon] Replace instruction definitions with auto-generated ones (Krzysztof Parzyszek, 2017-02-10, 37 files, -12829/+48409)
  llvm-svn: 294753
* Move some error handling down to MCStreamer. (Rafael Espindola, 2017-02-10, 11 files, -30/+25)
  This makes sure we get the same redefinition rules regardless of who is printing (asm parser, codegen) and to what (asm, obj).
  This fixes an unintentional regression in r293936.
  llvm-svn: 294752