path: root/llvm/lib
Commit message | Author | Age | Files | Lines
* Revert "Teach the IR verifier to reject conflicting debug info for function arguments." | Adrian Prantl | 2017-02-23 | 1 | -38/+0
  This reverts commit r295749 while investigating PR32042. It looks like this check uncovered a problem in the frontend that needs to be fixed before the check can be enabled again.
  llvm-svn: 296005
* [DAG] add convenience function to get -1 constant; NFCI | Sanjay Patel | 2017-02-23 | 1 | -32/+15
  llvm-svn: 296004
* [Reassociate] Add negated value of negative constant to the Duplicates list. | Chad Rosier | 2017-02-23 | 1 | -4/+4
  In OptimizeAdd, we scan the operand list to see if there are any common factors between operands that can be factored out to reduce the number of multiplies (e.g., 'A*A+A*B*C+D' -> 'A*(A+B*C)+D'). For each operand of the operand list, we only consider unique factors (which is tracked by the Duplicate set).
  Now if we find a factor that is a negative constant, we add the negated value as a factor as well, because we can percolate the negate out. However, we mistakenly don't add this negated constant to the Duplicates set.
  Consider the expression A*2*-2 + B. Obviously, nothing to factor.
  For the added value A*2*-2 we over count 2 as a factor without this change, which causes the assert reported in PR30256. The problem is that this code is assuming that all the multiply operands of the add are already reassociated. This change avoids the issue by making OptimizeAdd tolerate multiplies which haven't been completely optimized; this sort of works, but we're doing wasted work: we'll end up revisiting the add later anyway.
  Another possible approach would be to enforce RPO iteration order more strongly. If we have RedoInsts, we process them immediately in RPO order, rather than waiting until we've finished processing the whole function. Intuitively, it seems like the natural approach: reassociation works on expression trees, so the optimization only works in one direction. That said, I'm not sure how practical that is given the current Reassociate; the "optimal" form for an expression depends on its use list (see all the uses of "user_back()"), so Reassociate is really an iterative optimization of sorts, so any changes here would probably get messy.
  PR30256
  Differential Revision: https://reviews.llvm.org/D30228
  llvm-svn: 296003
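
The over-counting described in the Reassociate entry above can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the pass's code: factors are modeled as strings (a symbol such as `A` or an integer literal such as `-2`), and the names `parseConstant`, `countFactors` and `trackNegated` are hypothetical. With `trackNegated` set to false, the negated constant bypasses the duplicate check, so `A*2*-2` reports the factor `2` twice.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: a factor is a symbol ("A") or an integer literal ("-2").
// Returns true and sets `value` only if the whole string is an integer.
bool parseConstant(const std::string &factor, long &value) {
  try {
    std::size_t used = 0;
    value = std::stol(factor, &used);
    return used == factor.size();
  } catch (const std::exception &) {
    return false; // Not a number: treat it as a symbol.
  }
}

// Counts each unique factor of one multiply once. A negative constant also
// contributes its negation. When trackNegated is false, the negated value
// skips the duplicate check, which models the over-count described above.
std::map<std::string, int> countFactors(const std::vector<std::string> &mulOperands,
                                        bool trackNegated) {
  std::map<std::string, int> occurrences;
  std::set<std::string> duplicates;
  for (const std::string &factor : mulOperands) {
    if (!duplicates.insert(factor).second)
      continue; // Already counted for this multiply.
    ++occurrences[factor];

    long value = 0;
    if (!parseConstant(factor, value) || value >= 0)
      continue;
    const std::string negated = std::to_string(-value);
    if (trackNegated && !duplicates.insert(negated).second)
      continue; // The negated constant was already seen as a factor.
    ++occurrences[negated];
  }
  return occurrences;
}
```

For the operands `{"A", "2", "-2"}`, the model counts `2` twice without tracking and once with it, which matches the behavior the commit message describes.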
* Use base discriminator in sample pgo profile matching. | Dehao Chen | 2017-02-23 | 1 | -7/+8
  Summary: The discriminator has been encoded, and only the base discriminator should be used during profile matching.
  Reviewers: dblaikie, davidxl
  Reviewed By: dblaikie, davidxl
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D30218
  llvm-svn: 295999
* [Hexagon] Avoid IMPLICIT_DEFs as new-value producers | Krzysztof Parzyszek | 2017-02-23 | 1 | -0/+2
  llvm-svn: 295997
* [LazyMachineBFI] Reimplement with getAnalysisIfAvailable | Adam Nemet | 2017-02-23 | 2 | -21/+44
  Since LoopInfo is not available in machine passes as universally as in IR passes, using the same approach for OptimizationRemarkEmitter as we did for IR will run LoopInfo and DominatorTree unnecessarily. (LoopInfo is not used lazily by ORE.)
  To fix this, I am modifying the approach I took in D29836. LazyMachineBFI now uses its client passes including MachineBFI itself that are available or otherwise compute them on the fly.
  So for example GreedyRegAlloc, since it's already using MBFI, will reuse that instance. On the other hand, AsmPrinter in Justin's patch will generate DT, LI and finally BFI on the fly.
  (I am of course wondering now if the simplicity of this approach is even preferable in IR. I will do some experiments.)
  Testing is provided by an updated version of D29837 which requires Justin's patch to bring ORE to the AsmPrinter.
  Differential Revision: https://reviews.llvm.org/D30128
  llvm-svn: 295996
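
The "reuse it if available, otherwise compute it on the fly" idea in the LazyMachineBFI entry above can be sketched outside LLVM. Everything below is a hypothetical toy, not the LLVM pass API: `FrequencyInfo`, `computeFrequencyInfo` and `LazyFrequencyInfo` are illustrative names, and a null pointer stands in for an analysis that is not available.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for an expensive analysis result.
struct FrequencyInfo {
  int id;
};

// Counts how often the expensive computation actually runs.
int computeCalls = 0;

std::unique_ptr<FrequencyInfo> computeFrequencyInfo() {
  ++computeCalls;
  return std::unique_ptr<FrequencyInfo>(new FrequencyInfo{computeCalls});
}

// Wrapper that prefers an already-available result and only computes
// (once, on first use) when none was handed in.
class LazyFrequencyInfo {
public:
  explicit LazyFrequencyInfo(const FrequencyInfo *available)
      : available_(available) {}

  const FrequencyInfo &get() {
    if (available_)
      return *available_; // Reuse the instance a client already has.
    if (!owned_)
      owned_ = computeFrequencyInfo(); // Compute on the fly, cache it.
    assert(owned_ && "computation must produce a result");
    return *owned_;
  }

private:
  const FrequencyInfo *available_;
  std::unique_ptr<FrequencyInfo> owned_;
};
```

A client that already holds a result passes its address and pays nothing extra; a client with nothing available passes `nullptr` and triggers a single computation on the first `get()`.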
* [AddressSanitizer] Add PS4 offset | Filipe Cabecinhas | 2017-02-23 | 1 | -3/+7
  llvm-svn: 295994
* [InstCombine] use loop instead of recursion to peek through FPExt; NFCI | Sanjay Patel | 2017-02-23 | 1 | -6/+4
  llvm-svn: 295992
* [InstCombine] use 'match' to reduce code; NFCI | Sanjay Patel | 2017-02-23 | 1 | -11/+9
  llvm-svn: 295991
* AMDGPU/SI: Fix trunc i16 pattern | Jan Vesely | 2017-02-23 | 2 | -6/+5
  Hit on ASICs that support 16bit instructions.
  Differential Revision: https://reviews.llvm.org/D30281
  llvm-svn: 295990
* Strip trailing whitespace. | Simon Pilgrim | 2017-02-23 | 1 | -8/+8
  llvm-svn: 295989
* [Hexagon] Patterns for CTPOP, BSWAP and BITREVERSE | Krzysztof Parzyszek | 2017-02-23 | 3 | -23/+16
  llvm-svn: 295981
* [ARM] GlobalISel: Lower call returns | Diana Picus | 2017-02-23 | 1 | -11/+52
  Introduce a common ValueHandler for call returns and formal arguments, and inherit two different versions for handling the differences (at the moment the only difference is the way physical registers are marked as used).
  llvm-svn: 295973
* [SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong result | Alexey Bataev | 2017-02-23 | 1 | -13/+21
  Summary: If the same value is used several times as an extra value, SLP vectorizer takes it into account only once instead of the actual number of uses.
  For example:
  ```
  int val = 1;
  for (int y = 0; y < 8; y++) {
    for (int x = 0; x < 8; x++) {
      val = val + input[y * 8 + x] + 3;
    }
  }
  ```
  We have 2 extra arguments: `1` - initial value of horizontal reduction and `3`, which is added 8*8 times to the reduction. Before the patch we added `1` to the reduction value and added `3` once, though it must be added 64 times.
  Reviewers: mkuper, mzolotukhin
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D30262
  llvm-svn: 295972
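
The arithmetic in the SLP entry above can be checked with a small standalone model. This is a sketch under stated assumptions rather than the vectorizer's logic: `scalarReduction` is the loop from the commit message, while `reductionWithExtras` and its `countEveryUse` flag are hypothetical names that model adding the extra value `3` either once (the reported bug) or once per use.

```cpp
#include <array>
#include <cassert>
#include <numeric>

constexpr int kSize = 8;
using Input = std::array<int, kSize * kSize>;

// The scalar loop from the commit message: the reference result.
int scalarReduction(const Input &input) {
  int val = 1;
  for (int y = 0; y < kSize; y++) {
    for (int x = 0; x < kSize; x++) {
      val = val + input[y * kSize + x] + 3;
    }
  }
  return val;
}

// Hypothetical model of a horizontal reduction plus its extra arguments:
// the initial value 1 is added once, and 3 is added either once per use
// (countEveryUse == true) or only a single time (the reported bug).
int reductionWithExtras(const Input &input, bool countEveryUse) {
  const int reduced = std::accumulate(input.begin(), input.end(), 0);
  const int usesOfThree = countEveryUse ? kSize * kSize : 1;
  return reduced + 1 + 3 * usesOfThree;
}
```

For an all-zero input the scalar loop yields 1 + 64*3 = 193, which the model matches only when every use of `3` is counted; counting it once gives 4.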
* [ARM] GlobalISel: Lower call parameters in regs | Diana Picus | 2017-02-23 | 1 | -15/+39
  Add support for lowering calls with parameters that can fit into regs. Use the same ValueHandler that we used for function returns, but rename it to match its new, extended purpose.
  llvm-svn: 295971
* [X86][AVX] Disable VCVTSS2SD & VCVTSD2SS memory folding and fix the register class of their first input when creating node in fast-isel. | Ayman Musa | 2017-02-23 | 2 | -6/+7
  (Quick fix to buildbot failure after rL295940 commit).
  llvm-svn: 295970
* [mips][ias] Further relax operands of certain assembly instructions | Simon Dardis | 2017-02-23 | 3 | -80/+84
  This patch adjusts the most relaxed predicate of immediate operands to accept immediate forms such as ~(0xf0000000|0x000f00000). Previously these forms would be accepted by GAS and rejected by IAS.
  This partially resolves PR/30383.
  Thanks to Sean Bruno for reporting the issue!
  Reviewers: slthakur, seanbruno
  Differential Revision: https://reviews.llvm.org/D29218
  llvm-svn: 295965
* Fix assertion failure in ARMConstantIslandPass. | Kristof Beyls | 2017-02-23 | 1 | -0/+1
  The ARMConstantIslandPass didn't have support for handling accesses to constant island objects through ARM::t2LDRBpci instructions. This adds support for that.
  This fixes PR31997.
  llvm-svn: 295964
* Fix signed/unsigned comparison warning on MSVC | Simon Pilgrim | 2017-02-23 | 1 | -1/+1
  llvm-svn: 295962
* Revert "[SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong" | Alexey Bataev | 2017-02-23 | 1 | -21/+14
  This reverts commit 7c5141e577d9efd1c8e3087566a38ce6b3a41a84.
  llvm-svn: 295957
* [SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong result | Alexey Bataev | 2017-02-23 | 1 | -14/+21
  Summary: If the same value is used several times as an extra value, SLP vectorizer takes it into account only once instead of the actual number of uses.
  For example:
  ```
  int val = 1;
  for (int y = 0; y < 8; y++) {
    for (int x = 0; x < 8; x++) {
      val = val + input[y * 8 + x] + 3;
    }
  }
  ```
  We have 2 extra arguments: `1` - initial value of horizontal reduction and `3`, which is added 8*8 times to the reduction. Before the patch we added `1` to the reduction value and added `3` once, though it must be added 64 times.
  Reviewers: mkuper, mzolotukhin
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D30262
  llvm-svn: 295956
* Revert "[SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong" | Alexey Bataev | 2017-02-23 | 1 | -19/+13
  This reverts commit d83c81ee6a8dea662808ac22b396d1bb0595c89d.
  llvm-svn: 295951
* [SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong result | Alexey Bataev | 2017-02-23 | 1 | -13/+19
  Summary: If the same value is used several times as an extra value, SLP vectorizer takes it into account only once instead of the actual number of uses.
  For example:
  ```
  int val = 1;
  for (int y = 0; y < 8; y++) {
    for (int x = 0; x < 8; x++) {
      val = val + input[y * 8 + x] + 3;
    }
  }
  ```
  We have 2 extra arguments: `1` - initial value of horizontal reduction and `3`, which is added 8*8 times to the reduction. Before the patch we added `1` to the reduction value and added `3` once, though it must be added 64 times.
  Reviewers: mkuper, mzolotukhin
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D30262
  llvm-svn: 295949
* [X86][AVX512] Remove VCVTSS2SDZ & VCVTSD2SSZ from memory folding tables as they introduce new read dependency when folding. | Ayman Musa | 2017-02-23 | 1 | -4/+0
  (Quick fix to buildbot fail).
  llvm-svn: 295946
* [X86][AVX512] Change VCVTSS2SD and VCVTSD2SS node types to keep consistency between VEX/EVEX versions. | Ayman Musa | 2017-02-23 | 3 | -26/+74
  AVX versions of the converts work on f32/f64 types, while AVX512 versions work on vectors.
  Differential Revision: https://reviews.llvm.org/D29988
  llvm-svn: 295940
* LoadStoreVectorizer: Split even sized illegal chains properly | Matt Arsenault | 2017-02-23 | 3 | -3/+42
  Implement isLegalToVectorizeLoadChain for AMDGPU to avoid producing private address spaces accesses that will need to be split up later. This was doing the wrong thing in the case where the queried chain was an even number of elements.
  A possible <4 x i32> store was being split into
    store <2 x i32>
    store i32
    store i32
  rather than
    store <2 x i32>
    store <2 x i32>
  when legal.
  llvm-svn: 295933
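
The intended outcome in the LoadStoreVectorizer entry above, two `<2 x i32>` stores instead of one vector store and two scalar stores, can be expressed as a simple greedy split. This is a minimal sketch and not the pass's algorithm: `splitChain` and `maxLegalWidth` are hypothetical names, and the legality query is reduced to a single maximum element count.

```cpp
#include <cassert>
#include <vector>

// Splits a chain of `chainLength` elements into pieces no wider than
// `maxLegalWidth`, always taking the widest legal piece that still fits.
// Hypothetical helper for illustration only.
std::vector<unsigned> splitChain(unsigned chainLength, unsigned maxLegalWidth) {
  assert(maxLegalWidth > 0 && "need a nonzero legal width");
  std::vector<unsigned> pieces;
  while (chainLength > 0) {
    const unsigned piece =
        chainLength < maxLegalWidth ? chainLength : maxLegalWidth;
    pieces.push_back(piece);
    chainLength -= piece;
  }
  return pieces;
}
```

With four elements and a legal width of two, the split is two pieces of two elements each; a three-element chain still ends with a single scalar piece.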
* [X86][IR] In AutoUpgrade, check explicitly for xop.vpcmov and xop.vpcmov.256 instead of anything starting with xop.vpcmov | Craig Topper | 2017-02-23 | 1 | -1/+2
  There were some older intrinsics that only existed for less than a month in 2012 that still exist in some out of tree test files that start with this string, but aren't able to be handled by the current upgrade code and fire an assert. Now we'll go back to treating them as not intrinsics at all and just passing them through to output.
  Fixes PR32041, sort of.
  llvm-svn: 295930
* AMDGPU: Replace disabled exp inputs with undef | Matt Arsenault | 2017-02-23 | 1 | -0/+28
  llvm-svn: 295914
* AMDGPU: Add another BFE pattern | Matt Arsenault | 2017-02-23 | 3 | -39/+52
  This is the pattern that falls out of the instruction's definition if offset == 0.
  llvm-svn: 295912
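
The BFE entry above does not spell out the pattern, so the following is only a sketch of a generic unsigned bitfield extract, assuming the usual "shift right by offset, then mask to width" definition rather than exact hardware semantics. The names `bfeU32` and `lowBits` are hypothetical. With `offset == 0` the shift disappears and the extract reduces to a plain mask of the low bits.

```cpp
#include <cassert>
#include <cstdint>

// Generic unsigned bitfield extract: `width` bits starting at bit `offset`.
// Assumes 0 < width < 32 and offset < 32 to keep the shifts well defined.
std::uint32_t bfeU32(std::uint32_t src, unsigned offset, unsigned width) {
  assert(offset < 32 && width > 0 && width < 32);
  return (src >> offset) & ((std::uint32_t{1} << width) - 1);
}

// What the extract reduces to when offset == 0: a mask of the low bits.
std::uint32_t lowBits(std::uint32_t src, unsigned width) {
  assert(width > 0 && width < 32);
  return src & ((std::uint32_t{1} << width) - 1);
}
```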
* AMDGPU: Use clamp with f64 | Matt Arsenault | 2017-02-22 | 3 | -7/+11
  llvm-svn: 295908
* Revert r295868 because it breaks a different SLP lit test. | Michael Kuperstein | 2017-02-22 | 1 | -18/+13
  llvm-svn: 295906
* AMDGPU: Fold FP clamp as modifier bit | Matt Arsenault | 2017-02-22 | 6 | -6/+89
  The manual is unclear on the details of this. It's not clear to me if denormals are not allowed with clamp, or if that is only omod. Not allowing denorms for fp16 or fp64 isn't useful so I also question if that is really a restriction. Same with whether this is valid without IEEE mode enabled.
  llvm-svn: 295905
* AMDGPU : Update TrapCode based on Trap Handler ABI. | Wei Ding | 2017-02-22 | 4 | -13/+17
  Differential Revision: http://reviews.llvm.org/D30232
  llvm-svn: 295904
* [libFuzzer] Update traces hooks test after r293741 | Justin Bogner | 2017-02-22 | 1 | -5/+3
  This test now passes on darwin.
  llvm-svn: 295902
* [libFuzzer] Mark a test that infinite loops as unsupported | Justin Bogner | 2017-02-22 | 3 | -5/+11
  We need to investigate this, but for now it just causes too much headache when trying to run these tests.
  llvm-svn: 295900
* AMDGPU: Add replacement bfe intrinsics | Matt Arsenault | 2017-02-22 | 2 | -0/+79
  llvm-svn: 295899
* [InstCombine] don't try SimplifyDemandedInstructionBits from add/sub because it's slow and unlikely to succeed | Sanjay Patel | 2017-02-22 | 1 | -8/+0
  Notably, no regression tests change when we remove these calls, and these are expensive calls.
  The motivation comes from the general acknowledgement that the compiler is getting slower:
  http://lists.llvm.org/pipermail/llvm-dev/2017-January/109188.html
  http://lists.llvm.org/pipermail/llvm-dev/2016-December/108279.html
  And specifically the test case attached to PR32037:
  https://bugs.llvm.org//show_bug.cgi?id=32037
  Profiling the middle-end (opt) part of the compile:
  $ ./opt -O2 row_common.bc -o /dev/null
  ...visitAdd and visitSub are near the top of the instcombine list, and the calls to SimplifyDemandedInstructionBits() are high within each of those. Those calls account for 1%+ of the opt time in either debug or release profiles. And that's the rough win I see from this patch when testing opt built release from r295864 on an iMac with Haswell 4GHz (model 4790K).
  It seems unlikely that we'd be able to eliminate add/sub or change their operands given that add/sub normally affect all bits, and the PR32037 example shows no IR difference after this change using -O2.
  Also worth noting - the code comment in visitAdd:
  // This handles stuff like (X & 254)+1 -> (X&254)|1
  ...isn't true. That transform is handled later with a call to haveNoCommonBitsSet().
  Differential Revision: https://reviews.llvm.org/D30270
  llvm-svn: 295898
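
The `(X & 254)+1 -> (X&254)|1` rewrite mentioned in the InstCombine entry above holds because the two operands never share a set bit, so the addition cannot carry. The sketch below checks this on concrete values. It is an illustration only: `noCommonBits` and `addMatchesOrWhenLowBitClear` are hypothetical helpers, not LLVM's `haveNoCommonBitsSet()`, which reasons about known bits of IR values.

```cpp
#include <cassert>
#include <cstdint>

// True when no bit is set in both operands, so a + b cannot carry and
// therefore equals a | b.
bool noCommonBits(std::uint32_t a, std::uint32_t b) {
  return (a & b) == 0;
}

// Exhaustively checks (x & 254) + 1 == (x & 254) | 1 for 16-bit inputs.
bool addMatchesOrWhenLowBitClear() {
  for (std::uint32_t x = 0; x <= 0xFFFFu; ++x) {
    const std::uint32_t masked = x & 254u;
    if (!noCommonBits(masked, 1u) || masked + 1u != (masked | 1u))
      return false;
  }
  return true;
}
```

When the operands do share a bit, for example 3 and 1, the sum is 4 while the bitwise OR is 3, so the rewrite would be wrong without the no-common-bits condition.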
* [CodeGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). | Eugene Zelenko | 2017-02-22 | 5 | -74/+160
  llvm-svn: 295893
* [Hexagon] Implement @llvm.readcyclecounter() | Krzysztof Parzyszek | 2017-02-22 | 6 | -9/+34
  llvm-svn: 295892
* AMDGPU: Don't add emergency stack slot if all spills are SGPR->VGPR | Matt Arsenault | 2017-02-22 | 1 | -36/+55
  This should avoid reporting any stack needs to be allocated in the case where no stack is truly used. An unused stack slot is still left around in other cases where there are real stack objects but no spilling occurs.
  llvm-svn: 295891
* PredicateInfo: Support switch statements | Daniel Berlin | 2017-02-22 | 2 | -36/+110
  Summary: Depends on D29606 and D29682
  Makes us pass GVN's edge.ll (we also will pass a few other testcases they just need cleaning up).
  Thoughts on the Predicate* hierarchy of classes especially welcome :) (it's not clear to me how best to organize it, and currently, the getBlock* seems ... uglier than maybe wasting a field somewhere or something).
  Reviewers: davide
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D29747
  llvm-svn: 295889
* Move updating functions to MemorySSAUpdater. | Daniel Berlin | 2017-02-22 | 4 | -131/+110
  Add updater to passes that now need it. Move around code in MemorySSA to expose needed functions.
  Summary: Mostly cleanup
  Reviewers: george.burgess.iv
  Subscribers: llvm-commits, Prazek
  Differential Revision: https://reviews.llvm.org/D30221
  llvm-svn: 295887
* [LSR] Canonicalize formula and put recursive Reg related with current loop in ScaledReg. | Wei Mi | 2017-02-22 | 1 | -39/+83
  After rL294814, LSR formula can have multiple SCEVAddRecExprs inside of its BaseRegs. Previous canonicalization will swap the first SCEVAddRecExpr in BaseRegs with ScaledReg. But now we want to swap the SCEVAddRecExpr Reg related with current loop with ScaledReg. Otherwise, we may generate code like this: RegA + lsr.iv + RegB, where loop invariant parts RegA and RegB are not grouped together and cannot be promoted outside of loop. With this patch, it will ensure lsr.iv to be generated later in the expr: RegA + RegB + lsr.iv, so that RegA + RegB can be promoted outside of loop.
  Differential Revision: https://reviews.llvm.org/D26781
  llvm-svn: 295884
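
The benefit of grouping the invariant parts in the LSR entry above can be seen in ordinary source form. This is a sketch under stated assumptions, not LSR output: `regA`, `regB` and the induction variable `iv` are plain integers standing in for the registers named in the commit message, and the two hypothetical functions differ only in where the loop-invariant sum is evaluated.

```cpp
#include <cassert>

// Expression shaped as (RegA + iv) + RegB: the invariant parts are split
// around the induction variable, so two additions happen per iteration.
long sumUngrouped(long regA, long regB, int tripCount) {
  long total = 0;
  for (long iv = 0; iv < tripCount; ++iv)
    total += (regA + iv) + regB;
  return total;
}

// Expression shaped as (RegA + RegB) + iv: the invariant sum is computed
// once before the loop, leaving one addition per iteration.
long sumGrouped(long regA, long regB, int tripCount) {
  const long invariant = regA + regB; // Hoisted out of the loop.
  long total = 0;
  for (long iv = 0; iv < tripCount; ++iv)
    total += invariant + iv;
  return total;
}
```

Both functions return the same value; the grouped form simply makes the hoistable subexpression explicit.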
* [RDF] Support for partial structural aliases in RegisterAggr | Krzysztof Parzyszek | 2017-02-22 | 2 | -61/+67
  llvm-svn: 295883
* [Support] Re-add the special OSX flags on mmap. | Zachary Turner | 2017-02-22 | 1 | -0/+19
  The problem appears to be that these flags can only be used when mapping a file for read-only, not for readwrite. So we do that here.
  llvm-svn: 295880
* [Hexagon] Add intrinsics for masked vector stores | Krzysztof Parzyszek | 2017-02-22 | 1 | -0/+19
  Patch by Harsha Jagasia.
  llvm-svn: 295879
* AMDGPU: Don't look at chain users when adjusting writemask | Matt Arsenault | 2017-02-22 | 1 | -0/+4
  Fixes not adjusting using new intrinsics with chains.
  llvm-svn: 295878
* AMDGPU: Always allocate emergency stack slot at offset 0 | Matt Arsenault | 2017-02-22 | 1 | -5/+19
  This allows us to ensure that 0 is never a valid pointer to a user object, and ensures that the offset is always legal without needing a register to access it. This comes at the cost of usable offsets and wasted stack space.
  llvm-svn: 295877
* AMDGPU: Change exp with compr bit printing | Matt Arsenault | 2017-02-22 | 1 | -3/+11
  llvm-svn: 295873
* Revert "AMDGPU : Update TrapCode based on Trap Handler ABI." | Wei Ding | 2017-02-22 | 4 | -16/+12
  This reverts commit r295867.
  llvm-svn: 295871