path: root/llvm/lib
Commit message | Author | Date | Files | Lines
...
* [ValueTracking] Remove dead code from an old experiment | Philip Reames | 2016-03-03 | 1 | -208/+2
  This experiment was originally about trying to use facts implied by dominating conditions to infer more precise known bits. While the compile time was found to be acceptable on several large code bases, we never found sufficiently profitable examples to justify turning on the code by default. Given this, it's time to abandon the experiment.

  Several folks have commented that they've found this useful for experimentation, but nothing has come of those experiments. Given how easy the patch is to apply, there's no reason to leave the code in tree.

  For anyone interested in further investigation in this area, I recommend finding the summary email I sent on one of the original review threads. In particular, I now believe the use-list based approach is strictly worse than the dom-tree-walking approach.

  llvm-svn: 262646
* [InstCombine] transform bitcasted bitwise logic ops with constants (PR26702) | Sanjay Patel | 2016-03-03 | 1 | -7/+28
  Given that we're not actually reducing the instruction count in the included regression tests, I think we would call this a canonicalization step. The motivation comes from the example in PR26702: https://llvm.org/bugs/show_bug.cgi?id=26702

  If we hoist the bitwise logic ahead of the bitcast, the previously unoptimizable example of:

    define <4 x i32> @is_negative(<4 x i32> %x) {
      %lobit = ashr <4 x i32> %x, <i32 31, i32 31, i32 31, i32 31>
      %not = xor <4 x i32> %lobit, <i32 -1, i32 -1, i32 -1, i32 -1>
      %bc = bitcast <4 x i32> %not to <2 x i64>
      %notnot = xor <2 x i64> %bc, <i64 -1, i64 -1>
      %bc2 = bitcast <2 x i64> %notnot to <4 x i32>
      ret <4 x i32> %bc2
    }

  simplifies to the expected:

    define <4 x i32> @is_negative(<4 x i32> %x) {
      %lobit = ashr <4 x i32> %x, <i32 31, i32 31, i32 31, i32 31>
      ret <4 x i32> %lobit
    }

  Differential Revision: http://reviews.llvm.org/D17583

  llvm-svn: 262645
* Fix breakage caused by r262636 | Easwaran Raman | 2016-03-03 | 1 | -1/+1
  Use LLVM_ATTRIBUTE_UNUSED instead of __attribute__((unused)).

  llvm-svn: 262643
* [SCEV] Prove no-overflow via constant ranges | Sanjoy Das | 2016-03-03 | 1 | -0/+41
  Exploit ScalarEvolution::getRange's newly acquired smartness (since r262438) by using that to infer nsw and nuw when possible.

  llvm-svn: 262639
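
  A hypothetical example of the kind of case this enables (illustrative IR, not from the committed tests): if getRange bounds the recurrence below to [0, 100), SCEV can infer nuw/nsw on the increment even though the IR carries no wrap flags.

    define i32 @count() {
    entry:
      br label %loop
    loop:
      %i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
      ; getRange can prove %i stays in [0, 100), so the add below
      ; cannot wrap and may be marked both nuw and nsw
      %i.next = add i32 %i, 1
      %cmp = icmp ult i32 %i.next, 100
      br i1 %cmp, label %loop, label %exit
    exit:
      ret i32 %i.next
    }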
* [SCEV] Be less eager about demoting sexts to zexts | Sanjoy Das | 2016-03-03 | 1 | -4/+5
  After r262438 we can have provably positive NSW SCEV expressions whose zero extensions cannot be simplified (since r262438 makes SCEV better at computing constant ranges). This means demoting sexts of positive add recurrences eagerly can result in an unsimplified zero extension where we could have had a simplified sign extension. This change fixes the issue by teaching SCEV to demote a sext of a positive SCEV expression to a zext only if the sext could not be simplified.

  llvm-svn: 262638
* [ConstantRange] Generalize makeGuaranteedNoWrapRegion to work on ranges | Sanjoy Das | 2016-03-03 | 1 | -12/+20
  This will be used in a later patch to ScalarEvolution. Right now only the unit tests exercise the newly added code.

  llvm-svn: 262637
* Infrastructure for PGO enhancements in inliner | Easwaran Raman | 2016-03-03 | 5 | -45/+221
  This patch provides the following infrastructure for PGO enhancements in the inliner:
    - Enable the use of block-level profile information in the inliner
    - Incremental update of block frequency information during inlining
    - Update the function entry counts of callees when they get inlined into callers

  Differential Revision: http://reviews.llvm.org/D16381

  llvm-svn: 262636
* [X86][AVX] Better support for the variable mask form of VPERMILPD/VPERMILPS | Simon Pilgrim | 2016-03-03 | 3 | -19/+35
  The variable mask form of VPERMILPD/VPERMILPS was only partially implemented, with much of it still performed as an intrinsic. This patch properly defines the instructions in terms of X86ISD::VPERMILPV, permitting the opcode to be easily combined as a target shuffle.

  Differential Revision: http://reviews.llvm.org/D17681

  llvm-svn: 262635
* Use LineLocation instead of CallsiteLocation to index callsite profile | Dehao Chen | 2016-03-03 | 4 | -54/+30
  Summary: With a discriminator, LineLocation can uniquely identify a callsite without the need to specify the callee name. Remove the callee function name from the key, and put it in the value (FunctionSamples).

  Reviewers: davidxl, dnovillo

  Subscribers: llvm-commits

  Differential Revision: http://reviews.llvm.org/D17827

  llvm-svn: 262634
* [X86] Tidied up 256-bit -> 2 x 128-bit vector shift extraction | Simon Pilgrim | 2016-03-03 | 1 | -14/+2
  lowerShift was manually splitting BUILD_VECTOR cases when it could just call Extract128BitVector which does this anyway.

  llvm-svn: 262633
* [X86] Pulled out repeated code testing for constant vector shift amount. NFCI. | Simon Pilgrim | 2016-03-03 | 1 | -8/+6
  llvm-svn: 262631
* MCU target has its own ABI; however, the X86 interrupt handler calling convention overrides this ABI | Amjad Aboud | 2016-03-03 | 1 | -1/+3
  Fixed the ordering to check first for the X86 interrupt handler and then for the MCU target.

  Differential Revision: http://reviews.llvm.org/D17801

  llvm-svn: 262628
* [X86] Don't assume that shuffle non-mask operands start at #0 | Ahmed Bougacha | 2016-03-03 | 2 | -32/+68
  That's not the case for VPERMV/VPERMV3, which cover all possible combinations (the C intrinsics use a different order; the AVX vs AVX512 intrinsics are different still).

  Since r246981 ("AVX-512: Lowering for 512-bit vector shuffles"), VPERMV is recognized in getTargetShuffleMask. This breaks assumptions in most callers, as they expect the non-mask operands to start at index 0. VPERMV has the mask as operand #0; VPERMV3 has it in the middle.

  Instead of the faulty assumption, have getTargetShuffleMask return its operands as well. One alternative we considered was to change the operand order of VPERMV, but we agreed to stick to the instruction order, as there is more AVX512 weirdness to cover (vpermt2/vpermi2 in particular).

  Differential Revision: http://reviews.llvm.org/D17041

  llvm-svn: 262627
* [LoopUtils, LV] Fix PR26734 | Matthew Simpson | 2016-03-03 | 1 | -1/+1
  The vectorization of first-order recurrences (r261346) caused PR26734. When detecting these recurrences, we need to ensure that the previous value is actually defined inside the loop. This patch includes the fix and test case.

  llvm-svn: 262624
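
  A hypothetical reduction of the failure mode (illustrative IR, not the PR26734 testcase): the backedge value of %p below is a function argument, i.e. defined outside the loop, so the phi must not be classified as a first-order recurrence.

    define void @f(i32* %a, i32 %x, i32 %n) {
    entry:
      br label %loop
    loop:
      %i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
      ; the value flowing around the backedge is %x, an argument,
      ; not an instruction defined inside the loop
      %p = phi i32 [ 0, %entry ], [ %x, %loop ]
      %gep = getelementptr i32, i32* %a, i32 %i
      %v = load i32, i32* %gep
      %sum = add i32 %v, %p
      store i32 %sum, i32* %gep
      %i.next = add i32 %i, 1
      %cmp = icmp slt i32 %i.next, %n
      br i1 %cmp, label %loop, label %exit
    exit:
      ret void
    }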
* [AArch64] fold 'isPositive' vector integer operations (PR26819) | Sanjay Patel | 2016-03-03 | 1 | -1/+30
  This is one of the cases shown in: https://llvm.org/bugs/show_bug.cgi?id=26819

  Shift and negate is what InstCombine prefers to produce (and I tried to make it do more of that in http://reviews.llvm.org/rL262424), so we should recognize that pattern as something that might come from autovectorization even if it's unlikely to be produced from C NEON intrinsics.

  The patch is based on the x86 equivalent: http://reviews.llvm.org/rL262036

  Differential Revision: http://reviews.llvm.org/D17834

  llvm-svn: 262623
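
  A hedged sketch of the shift-and-negate pattern in question (illustrative IR, not the committed test): ashr by 31 followed by xor with -1 yields all-ones exactly for non-negative elements, i.e. a single vector compare of %x against -1.

    define <4 x i32> @is_nonnegative(<4 x i32> %x) {
      %lobit = ashr <4 x i32> %x, <i32 31, i32 31, i32 31, i32 31>
      ; %not is 0 for negative elements and -1 for non-negative ones:
      ; equivalent to icmp sgt <4 x i32> %x, <i32 -1, i32 -1, i32 -1, i32 -1>
      %not = xor <4 x i32> %lobit, <i32 -1, i32 -1, i32 -1, i32 -1>
      ret <4 x i32> %not
    }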
* AVX512: Combine AND + TESTM instructions | Igor Breger | 2016-03-03 | 3 | -8/+25
  Differential Revision: http://reviews.llvm.org/D17844

  llvm-svn: 262621
* [AVR] Add calling convention parser tokens | Dylan McKay | 2016-03-03 | 4 | -0/+9
  Summary: Adds the 'avr_intrcc' and 'avr_signalcc' IR calling convention tokens to the parser.

  Reviewers: arsenm

  Subscribers: dylanmckay, llvm-commits

  Differential Revision: http://reviews.llvm.org/D16348

  llvm-svn: 262600
* [X86][SSE] Improve vector ZERO_EXTEND by combining to ZERO_EXTEND_VECTOR_INREG | Simon Pilgrim | 2016-03-03 | 2 | -10/+42
  Generalise the existing SIGN_EXTEND to SIGN_EXTEND_VECTOR_INREG combine to support zero extension as well and get rid of a lot of unnecessary ANY_EXTEND + mask patterns.

  Differential Revision: http://reviews.llvm.org/D17691

  llvm-svn: 262599
* Revert "[ARM] Merging 64-bit divmod lib calls into one"Renato Golin2016-03-032-11/+1
| | | | | | This reverts commit r262507, which broke some ARM buildbots. llvm-svn: 262594
* [LLVM][AVX512] PSRLWI: change imm8 to int | Michael Zuckerman | 2016-03-03 | 1 | -3/+3
  Differential Revision: http://reviews.llvm.org/D17753

  llvm-svn: 262592
* [BranchFolding] Change function name related to merging MMOs. NFC | Junmo Park | 2016-03-03 | 1 | -7/+5
  Summary: Removing MMOs is not our preferred behavior anymore.

  Reviewers: mcrosier, reames

  Differential Revision: http://reviews.llvm.org/D17668

  llvm-svn: 262580
* AMDGPU: Insert two S_NOP instructions for every high-level source statement | Tom Stellard | 2016-03-03 | 4 | -0/+110
  Patch by: Konstantin Zhuravlyov

  Summary: Tools such as debuggers need to pause execution based on user input (e.g. a breakpoint). In order to do this, two S_NOP instructions are inserted for each high-level source statement: one before the first ISA instruction of the statement, and one after the last ISA instruction of the statement. Further, the debugger may replace the S_NOP instructions with S_TRAP instructions based on user input.

  Reviewers: tstellarAMD, arsenm

  Subscribers: echristo, dblaikie, arsenm, llvm-commits

  Differential Revision: http://reviews.llvm.org/D17454

  llvm-svn: 262579
* AMDGPU/SI: Don't try to move scratch wave offset when there are no free SGPRs | Tom Stellard | 2016-03-03 | 1 | -3/+15
  Summary: When there were no free SGPRs, we were trying to move this value into some of the reserved registers, which was causing a segmentation fault.

  Reviewers: arsenm

  Subscribers: arsenm, llvm-commits

  Differential Revision: http://reviews.llvm.org/D17590

  llvm-svn: 262577
* [X86] Enable forwarding bool arguments in tail calls (PR26305) | Hans Wennborg | 2016-03-03 | 1 | -0/+20
  The code was previously not able to track a boolean argument at a call site back to the formal argument of the caller.

  Differential Revision: http://reviews.llvm.org/D17786

  llvm-svn: 262575
* [PPCVSXFMAMutate] Temporarily disable this pass | Tim Shen | 2016-03-03 | 1 | -2/+8
  llvm-svn: 262573
* [MBP] Rename a confusing variable and add clarifying comments | Philip Reames | 2016-03-03 | 1 | -19/+24
  Was discussed as part of http://reviews.llvm.org/D17830.

  llvm-svn: 262571
* [MBP] Avoid placing random blocks between loop preheader and header | Philip Reames | 2016-03-03 | 1 | -1/+2
  If we have a loop with a rarely taken path, we will prune that from the blocks which get added as part of the loop chain. The problem is that we weren't then recognizing the loop chain as schedulable when considering the preheader when forming the function chain. We'd then fall to various non-predecessors before finally scheduling the loop chain (as if the CFG were unnatural). The net result was that there could be lots of garbage between a loop preheader and the loop, even though we could have directly fallen into the loop. It also meant we separated hot code with regions of colder code.

  The particular reason for the rejection of the loop chain was that we were scanning the predecessors of the header, seeing the backedge, and believing that was a globally more important predecessor (true), but forgetting to account for the fact that the backedge predecessor was already part of the existing loop chain (oops!).

  Differential Revision: http://reviews.llvm.org/D17830

  llvm-svn: 262547
* [X86] Don't give catch objects a displacement of zero | David Majnemer | 2016-03-03 | 5 | -25/+70
  Catch objects with a displacement of zero do not initialize a catch object. The displacement is relative to %rsp at the end of the function's prologue for x86_64 targets. If we place an object at the top of the stack, we will end up with a displacement of zero, resulting in our catch object remaining uninitialized.

  Address this by creating our catch objects as fixed objects. We will ensure that the UnwindHelp object is created after the catch objects so that no catch object will have a displacement of zero.

  Differential Revision: http://reviews.llvm.org/D17823

  llvm-svn: 262546
* AMDGPU: Simplify boolean conditional return statements | Matt Arsenault | 2016-03-02 | 7 | -44/+19
  Patch by Richard Thomson.

  llvm-svn: 262536
* [MBP] Remove overly verbose debug output | Philip Reames | 2016-03-02 | 1 | -5/+2
  llvm-svn: 262531
* Explode store of arrays in InstCombine | Amaury Sechet | 2016-03-02 | 1 | -1/+33
  Summary: This is the last step toward supporting aggregate memory access in InstCombine. This explodes stores of arrays into a series of stores for each element, allowing them to be optimized.

  Reviewers: joker.eph, reames, hfinkel, majnemer, mgrang

  Subscribers: llvm-commits

  Differential Revision: http://reviews.llvm.org/D17828

  llvm-svn: 262530
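
  Conceptually (a hedged before/after sketch, not the committed test), the transform rewrites a first-class aggregate store as per-element scalar stores:

    ; before: a single aggregate store
    store [2 x i32] %v, [2 x i32]* %p

    ; after: one scalar store per element
    %e0 = extractvalue [2 x i32] %v, 0
    %e1 = extractvalue [2 x i32] %v, 1
    %p0 = getelementptr inbounds [2 x i32], [2 x i32]* %p, i32 0, i32 0
    %p1 = getelementptr inbounds [2 x i32], [2 x i32]* %p, i32 0, i32 1
    store i32 %e0, i32* %p0
    store i32 %e1, i32* %p1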
* [MBP] Adjust debug output to be more focused and approachable | Philip Reames | 2016-03-02 | 1 | -18/+9
  llvm-svn: 262522
* Unpack arrays of all sizes in InstCombine | Amaury Sechet | 2016-03-02 | 1 | -5/+38
  Summary: This is another step toward improving FCA support. This unpacks a load of an array into a series of loads of the array's elements.

  Reviewers: chandlerc, joker.eph, majnemer, reames, hfinkel

  Subscribers: llvm-commits

  Differential Revision: http://reviews.llvm.org/D15890

  llvm-svn: 262521
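
  The load-side counterpart, again as a rough sketch: the aggregate load becomes element loads whose results are reassembled with insertvalue.

    ; before: a single aggregate load
    %v = load [2 x i32], [2 x i32]* %p

    ; after: element loads rebuilt into the aggregate
    %p0 = getelementptr inbounds [2 x i32], [2 x i32]* %p, i32 0, i32 0
    %p1 = getelementptr inbounds [2 x i32], [2 x i32]* %p, i32 0, i32 1
    %e0 = load i32, i32* %p0
    %e1 = load i32, i32* %p1
    %v0 = insertvalue [2 x i32] undef, i32 %e0, 0
    %v1 = insertvalue [2 x i32] %v0, i32 %e1, 1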
* Really fix ASAN leak/etc issues with MemorySSA unittests | Daniel Berlin | 2016-03-02 | 1 | -3/+2
  llvm-svn: 262519
* [libFuzzer] add -Werror for libFuzzer build rule | Kostya Serebryany | 2016-03-02 | 1 | -1/+1
  llvm-svn: 262517
* Revert "Fix ASAN detected errors in code and test" (it was not meant to be ↵Daniel Berlin2016-03-021-2/+3
| | | | | | | | committed yet) This reverts commit 890bbccd600ba1eb050353d06a29650ad0f2eb95. llvm-svn: 262512
* Fix ASAN detected errors in code and test | Daniel Berlin | 2016-03-02 | 1 | -3/+2
  llvm-svn: 262511
* [ARM] Merging 64-bit divmod lib calls into one | Renato Golin | 2016-03-02 | 2 | -1/+11
  When div+rem calls on the same arguments are found, the ARM back-end merges the two calls into one __aeabi_divmod call for up to 32-bit values. However, for 64-bit values, which also have a lib call (__aeabi_ldivmod), it wasn't merging the calls, and thus was calling ldivmod twice and spilling the temporary results, which generated pretty bad code.

  This patch legalises 64-bit lib calls for divmod, so that now all the spilling and the second call are gone. It also relaxes the DivRem combiner a bit on the legal type check: since it was already checking for isLegalOrCustom on every value, the extra check for isTypeLegal was redundant.

  This patch fixes PR17193 (and a long time FIXME in the tests).

  llvm-svn: 262507
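
  For illustration (hypothetical IR, assuming the legalization described above): a matching 64-bit div/rem pair like the one below should now lower to a single __aeabi_ldivmod call on ARM rather than two calls with spills in between.

    define i64 @both(i64 %a, i64 %b) {
      ; the quotient and the remainder are produced by one
      ; __aeabi_ldivmod libcall after this change
      %q = sdiv i64 %a, %b
      %r = srem i64 %a, %b
      %s = add i64 %q, %r
      ret i64 %s
    }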
* Revert "[X86] Elide references to _chkstk for dynamic allocas"Reid Kleckner2016-03-021-29/+9
| | | | | | | | | | | | | | | This reverts commit r262370. It turns out there is code out there that does sequences of allocas greater than 4K: http://crbug.com/591404 The goal of this change was to improve the code size of inalloca call sequences, but we got tangled up in the mess of dynamic allocas. Instead, we should come back later with a separate MI pass that uses dominance to optimize the full sequence. This should also be able to remove the often unneeded stacksave/stackrestore pairs around the call. llvm-svn: 262505
* ARM: Introduce conservative load/store optimization mode | Matthias Braun | 2016-03-02 | 1 | -0/+34
  Most of the time ARM has the CCR.UNALIGN_TRP bit set to false which means that unaligned loads/stores do not trap and even extensive testing will not catch these bugs. However the multi/double variants are not affected by this bit and will still trap. In effect a more aggressive load/store optimization will break existing (bad) code.

  These bugs do not necessarily manifest in the broken code where the misaligned pointer is formed but often later in perfectly legal code where it is accessed. This means recompiling system libraries (which have no alignment bugs) with a newer compiler will break existing applications (with alignment bugs) that worked before.

  So (under protest) I implemented this safe mode which limits the formation of multi/double operations to cases that are not affected by user code (stack operations like spills/reloads) or cases where the normal operations trap anyway (floating point load/stores). It is disabled by default.

  Differential Revision: http://reviews.llvm.org/D17015

  llvm-svn: 262504
* SelectionDAG: Use correctly sized allocation functions for SDNodes | Justin Bogner | 2016-03-02 | 1 | -116/+86
  The placement new calls here were all calling the allocation function in RecyclingAllocator/Recycler for SDNode, instead of the function for the specific subclass we were constructing. Since this particular allocator always overallocates, it more or less worked, but it would hide what we're actually doing from any memory tools. Also, if you tried to change this allocator to something like a BumpPtrAllocator or MallocAllocator, the compiler would crash horribly all the time.

  Part of llvm.org/PR26808.

  llvm-svn: 262500
* [AArch64] Enable non-leaf frame pointer elimination | Geoff Berry | 2016-03-02 | 2 | -9/+7
  Summary: This change enables frame pointer elimination in non-leaf functions. The -fomit-frame-pointer option still needs to be used when compiling via clang (or an equivalent method of not setting the 'no-frame-pointer-elim*' function attributes if generating llvm IR via some other method) to take advantage of this optimization. This change should be NFC when compiling via clang without -fomit-frame-pointer.

  Reviewers: t.p.northover

  Subscribers: aemerson, rengolin, tberghammer, qcolombet, llvm-commits, danalbert, mcrosier, srhines

  Differential Revision: http://reviews.llvm.org/D17730

  llvm-svn: 262495
* [AA] Hoist the logic to reformulate various AA queries in terms of other parts of the AA interface out of the base class of every single AA result object | Chandler Carruth | 2016-03-02 | 11 | -68/+161
  Because this logic reformulates the query in terms of some other aspect of the API, it would easily cause O(n^2) query patterns in alias analysis. These could in turn be magnified further based on the number of call arguments, and then further based on the number of AA queries made for a particular call. This ended up causing problems for Rust that were actually noticeable enough to get a bug (PR26564) and probably other places as well.

  When originally re-working the AA infrastructure, the desire was to regularize the pattern of refinement without losing any generality. While I think it was successful, that is clearly proving to be too costly. And the cost is needless: we gain no actual improvement for this generality of making a direct query to TBAA actually be able to re-use some other alias analysis's refinement logic for one of the other APIs, or some such.

  In short, this is entirely wasted work. To the extent possible, delegation to other API surfaces should be done at the aggregation layer so that we can avoid re-walking the aggregation. In fact, this significantly simplifies the logic as we no longer need to smuggle the aggregation layer into each alias analysis (or the TargetLibraryInfo into each alias analysis just so we can form argument memory locations!).

  However, we also have some delegation logic inside of BasicAA and some of it even makes sense. When the delegation logic is baking in specific knowledge of aliasing properties of the LLVM IR, as opposed to simply reformulating the query to utilize a different alias analysis interface entry point, it makes a lot of sense to restrict that logic to a different layer such as BasicAA. So one aspect of the delegation that was in every AA base class is that when we don't have operand bundles, we re-use function AA results as a fallback for callsite alias results. This relies on the IR properties of calls and functions w.r.t. aliasing, and so seems a better fit to BasicAA. I've lifted the logic up to that point where it seems to be a natural fit. This still does a bit of redundant work (we query function attributes twice, once via the callsite and once via the function AA query) but it is *exactly* twice here, no more.

  The end result is that all of the delegation logic is hoisted out of the base class and into either the aggregation layer when it is a pure retargeting to a different API surface, or into BasicAA when it relies on the IR's aliasing properties. This should fix the quadratic query pattern reported in PR26564, although I don't have a stand-alone test case to reproduce it. It also seems general goodness. Now the numerous AAs that don't need target library info don't carry it around and depend on it.

  I think I can even rip out the general access to the aggregation layer and only expose that in BasicAA as it is the only place where we re-query in that manner. However, this is a non-trivial change to the AA infrastructure so I want to get some additional eyes on this before it lands. Sadly, it can't wait long because we should really cherry-pick this into 3.8 if we're going to go this route.

  Differential Revision: http://reviews.llvm.org/D17329

  llvm-svn: 262490
* [LLVM][AVX512] PSRAWI: change imm8 to int | Michael Zuckerman | 2016-03-02 | 1 | -9/+9
  Differential Revision: http://reviews.llvm.org/D17705

  llvm-svn: 262480
* [X86][SSE] Lower 128-bit MOVDDUP with existing VBROADCAST mechanisms | Simon Pilgrim | 2016-03-02 | 1 | -39/+51
  We have a number of useful lowering strategies for VBROADCAST instructions (both from memory and register element 0) which the 128-bit form of the MOVDDUP instruction can make use of.

  This patch tweaks lowerVectorShuffleAsBroadcast to enable it to broadcast 2f64 args using MOVDDUP as well. It does require a slight tweak to the lowerVectorShuffleAsBroadcast mechanism as the existing MOVDDUP lowering uses isShuffleEquivalent which can match binary shuffles that can lower to (unary) broadcasts.

  Differential Revision: http://reviews.llvm.org/D17680

  llvm-svn: 262478
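
  For reference, a minimal example (illustrative, not the committed test) of a v2f64 splat of lane 0 that the broadcast path should now lower with the 128-bit MOVDDUP:

    define <2 x double> @dup_lo(<2 x double> %x) {
      ; splat of element 0; with SSE3 this can become a single movddup
      %s = shufflevector <2 x double> %x, <2 x double> undef, <2 x i32> zeroinitializer
      ret <2 x double> %s
    }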
* Revert "[AMDGPU] table-driven parser/printer for amd_kernel_code_t structure ↵Nikolay Haustov2016-03-024-370/+0
| | | | | | | | fields" Build failure with clang. llvm-svn: 262477
* Revert "[AMDGPU] Using table-driven amd_kernel_code_t field parser in ↵Nikolay Haustov2016-03-022-8/+157
| | | | | | | | assembler." Build failure with clang. llvm-svn: 262475
* [AMDGPU] Using table-driven amd_kernel_code_t field parser in assembler | Nikolay Haustov | 2016-03-02 | 2 | -157/+8
  Complementary patch to the table-driven amd_kernel_code_t field parser/printer utility. Lit tests passed.

  Patch by: Valery Pykhtin

  Differential Revision: http://reviews.llvm.org/D17151

  llvm-svn: 262474
* [AMDGPU] table-driven parser/printer for amd_kernel_code_t structure fields | Nikolay Haustov | 2016-03-02 | 4 | -0/+370
  This is going to be used in the .hsatext disassembler and can be used in the current assembler parser (lit tests passed on parsing). Code using these helpers isn't included in this patch.

  Benefits:
    - unified approach
    - fast field name lookup on parsing

  Later I would like to enhance some of the field naming/syntax using this code.

  Patch by: Valery Pykhtin

  Differential Revision: http://reviews.llvm.org/D17150

  llvm-svn: 262473
* libfuzzer: fix compiler warnings | Dmitry Vyukov | 2016-03-02 | 2 | -6/+12
  - unused sigaction/setitimer result (used in assert)
  - unchecked fscanf return value
  - signed/unsigned comparison

  llvm-svn: 262472