path: root/llvm/lib
Commit message | Author | Age | Files | Lines
...
* tidy up comments; NFC | Sanjay Patel | 2015-09-28 | 1 | -7/+7
    llvm-svn: 248750
* add a FIXME for a CPU model check that should have an attribute instead | Sanjay Patel | 2015-09-28 | 1 | -1/+2
    llvm-svn: 248746
* move one-use check under the comment that describes it; NFCI | Sanjay Patel | 2015-09-28 | 1 | -3/+2
    llvm-svn: 248745
* [SCEV] Don't crash on pointer comparisons | Sanjoy Das | 2015-09-28 | 1 | -2/+1
    `ScalarEvolution::isImpliedCondOperandsViaNoOverflow` tries to cast the operand type of the comparison it is given to an `IntegerType`. This is incorrect because it could actually be simplifying a comparison between two pointers. Switch it to using `getTypeSizeInBits` instead, which does the right thing for both pointers and integers.

    Fixed PR24956.

    llvm-svn: 248743
* AMDGPU: Factor switch into separate function | Matt Arsenault | 2015-09-28 | 2 | -21/+30
    llvm-svn: 248742
* AMDGPU: Fix splitting x16 SMRD loads | Matt Arsenault | 2015-09-28 | 1 | -2/+2
    When used recursively, this would set the kill flag on the intermediate step from first splitting x16 to x8.

    llvm-svn: 248741
* AMDGPU: Fix moving SMRD loads with literal offsets on CI | Matt Arsenault | 2015-09-28 | 1 | -3/+9
    llvm-svn: 248740
* AMDGPU: Fix splitting SMRD with large offset | Matt Arsenault | 2015-09-28 | 1 | -1/+1
    The splitting of > 4 dword SMRD instructions if using an offset in an SGPR instead of an immediate was not setting the destination register, resulting in an instruction missing an operand which would assert later.

    Test will be included in a following commit which fixes a related issue.

    llvm-svn: 248739
* Improved the interface of methods commuting operands, improved X86-FMA3 mem-folding&coalescing. | Andrew Kaylor | 2015-09-28 | 13 | -174/+340
    Patch by Slava Klochkov (vyacheslav.n.klochkov@intel.com)

    Differential Revision: http://reviews.llvm.org/D11370

    llvm-svn: 248735
* [GlobalOpt] Sort members of llvm.used deterministically | Sean Silva | 2015-09-28 | 1 | -1/+2
    Patch by Jake VanAdrighem!

    Summary: Fix the way we sort the llvm.used and llvm.compiler.used members. This bug seems to have been introduced in rL183756 through a set of improper casts to GlobalValue*. In subsequent patches this problem was missed and transformed into a getName call on a ConstantExpr.

    Reviewers: silvas
    Subscribers: silvas, llvm-commits
    Differential Revision: http://reviews.llvm.org/D12851

    llvm-svn: 248728
* Improve performance of SimplifyInstructionsInBlock | Fiona Glaser | 2015-09-28 | 1 | -12/+60
    1. Use a worklist, not a recursive approach, to avoid needless revisitation and being repeatedly forced to jump back to the start of the BB if a handle is invalidated.
    2. Only insert operands to the worklist if they become unused after a dead instruction is removed, so we don't have to visit them again in most cases.
    3. Use a SmallSetVector to track the worklist.
    4. Instead of pre-initting the SmallSetVector like in DeadCodeEliminationPass, only put things into the worklist if they have to be revisited after the first run-through. This minimizes how much the actual SmallSetVector gets used, which saves a lot of time.

    llvm-svn: 248727
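A minimal sketch of the worklist idea in points 1-4 above, using a toy instruction model rather than LLVM's classes. `ToyInst`, `ToySetVector`, and `removeDeadInstructions` are hypothetical names invented for this illustration; `ToySetVector` only stands in for `SmallSetVector`, and the sketch covers dead-instruction removal only, not the simplification step of the real function.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Toy instruction (hypothetical; not llvm::Instruction).
struct ToyInst {
  std::vector<std::size_t> Operands; // indices of the instructions this one uses
  bool HasSideEffects = false;       // e.g. a store or a call; never removed
  unsigned NumUses = 0;              // number of live users
  bool Erased = false;
};

// Insertion-ordered set without duplicates (stand-in for SmallSetVector).
class ToySetVector {
  std::vector<std::size_t> Order;
  std::unordered_set<std::size_t> Seen;

public:
  bool insert(std::size_t V) {
    if (!Seen.insert(V).second)
      return false; // already queued
    Order.push_back(V);
    return true;
  }
  bool empty() const { return Order.empty(); }
  std::size_t pop_back_val() {
    std::size_t V = Order.back();
    Order.pop_back();
    Seen.erase(V);
    return V;
  }
};

// Erases Block[Idx] if it is unused and side-effect free. Operands whose use
// count drops to zero are queued, so only they are revisited (point 2).
static bool eraseIfDead(std::vector<ToyInst> &Block, std::size_t Idx,
                        ToySetVector &WorkList) {
  ToyInst &I = Block[Idx];
  if (I.Erased || I.HasSideEffects || I.NumUses != 0)
    return false;
  I.Erased = true;
  for (std::size_t Op : I.Operands) {
    assert(Block[Op].NumUses > 0 && "use counts out of sync");
    if (--Block[Op].NumUses == 0)
      WorkList.insert(Op);
  }
  return true;
}

// Returns the number of instructions removed from the block.
unsigned removeDeadInstructions(std::vector<ToyInst> &Block) {
  for (ToyInst &I : Block)
    I.NumUses = 0;
  for (const ToyInst &I : Block)
    if (!I.Erased)
      for (std::size_t Op : I.Operands)
        ++Block[Op].NumUses;

  ToySetVector WorkList; // starts empty, not pre-filled (point 4)
  unsigned Removed = 0;
  // First run-through: one linear pass over the block.
  for (std::size_t Idx = 0; Idx < Block.size(); ++Idx)
    Removed += eraseIfDead(Block, Idx, WorkList);
  // Revisit only what the first pass made dead.
  while (!WorkList.empty())
    Removed += eraseIfDead(Block, WorkList.pop_back_val(), WorkList);
  return Removed;
}
```

In this model the worklist stays empty unless a removal actually frees an operand, which is the property point 4 relies on.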
* [mips][p5600] Added P5600 processor and initial scheduler. | Daniel Sanders | 2015-09-28 | 4 | -0/+405
    Summary: The P5600 is an out-of-order, superscalar implementation of the MIPS32R5 architecture.

    The scheduler has a few missing details (see the 'Tricky Instructions' section) and some quirks of the P5600 are deliberately omitted due to implementation difficulty and low chance of significant benefit (e.g. the predicate on P5600WriteEitherALU).

    However, testing on SingleSource is showing significant performance benefits on some apps (seven in the 10-30% range) and only one significant regression (12%) when -pre-RA-sched=linearize is given. Without -pre-RA-sched=linearize the results are more variable. Some do even better (up to 55% improvement) but increased numbers of copies are slowing others down (up to 12%).

    Overall, the scheduler as it currently stands is a 2.4% win with -pre-RA-sched=linearize and a 2.7% win without -pre-RA-sched=linearize. I'm sure we can improve on this further.

    For completeness, the FPGA this was tested on shows some failures with and without the P5600 scheduler. These appear to be scheduling related, since the two test runs have fairly different sets of failing tests even after accounting for other factors (e.g. spurious connection failures); however, it's not P5600 specific, since we also get some for the generic scheduler.

    Reviewers: vkalintiris
    Subscribers: mpf, llvm-commits, atrick, vkalintiris
    Differential Revision: http://reviews.llvm.org/D12193

    llvm-svn: 248725
* Introduce !align metadata for load instruction | Artur Pilipenko | 2015-09-28 | 2 | -0/+9
    Reviewed By: hfinkel
    Differential Revision: http://reviews.llvm.org/D12853

    llvm-svn: 248721
* [InstSimplify] Fold simple known implications to true | Philip Reames | 2015-09-28 | 1 | -0/+47
    This was split off of http://reviews.llvm.org/D13040 to make it easier to test the correctness of the implication logic. For the moment, this only handles a single easy case which shows up when eliminating and combining range checks. In the (near) future, I plan to extend this for other cases which show up in range checks, but I wanted to make those changes incrementally once the framework was in place.

    At the moment, the implication logic will be used by three places. One in InstSimplify (this review) and two in SimplifyCFG (http://reviews.llvm.org/D13040 & http://reviews.llvm.org/D13070). Can anyone think of other locations this style of reasoning would make sense?

    Differential Revision: http://reviews.llvm.org/D13074

    llvm-svn: 248719
* [LoopReroll] Ignore debug intrinsics | Weiming Zhao | 2015-09-28 | 1 | -1/+20
    Originally, debug intrinsics and annotation intrinsics could prevent the loop from being rerolled; now they are ignored.

    Differential Revision: http://reviews.llvm.org/D13150

    llvm-svn: 248718
* [WebAssembly] Support for direct call and call_indirect. | Dan Gohman | 2015-09-28 | 1 | -6/+8
    llvm-svn: 248716
* [mips] Handling of immediates bigger than 16 bits | Zoran Jovanovic | 2015-09-28 | 2 | -0/+115
    Differential Revision: http://reviews.llvm.org/D10539

    llvm-svn: 248706
* [ARM] Avoid redundant checks for isThumb1Only() after supportsTailCall() | Artyom Skrobov | 2015-09-28 | 2 | -25/+25
    supportsTailCall() has two callers. Both of them double-check isThumb1Only(), and refuse to proceed with tail-calling in that case. Therefore, it makes sense to move this check to ARMSubtarget::initSubtargetFeatures, where SupportsTailCall is initialized; and to eliminate the extra checks at the call sites.

    Following a review comment, added an "assert(supportsTailCall())" in IsEligibleForTailCall.

    NFC.

    llvm-svn: 248703
* [DAGCombine] Fix getStoreMergeAndAliasCandidates's AA-enabled chain walking | Hal Finkel | 2015-09-28 | 1 | -0/+2
    When AA is being used, non-aliasing stores are canonicalized to use the same chain, and DAGCombiner::getStoreMergeAndAliasCandidates can take advantage of this by looking only at users of a store's chain operand. However, user iteration is not result-number specific, so we need to check that the use is as a chain operand, and not via some other operand. It is certainly possible to have another potentially-aliasing store, which shares the first's base pointer, and uses the first's chain's node via some other operand.

    Failure to catch this situation caused, at least in the included test case, an assert later because the relative sequence-number ordering caused later replacement to create a cycle in the DAG.

    llvm-svn: 248698
* Remove 'const' from some ArrayRefs. ArrayRefs are already immutable. NFC | Craig Topper | 2015-09-28 | 4 | -9/+9
    llvm-svn: 248693
* AsmWriter: Print the argument names in declarations while debugging | Justin Bogner | 2015-09-27 | 1 | -23/+31
    When llvm declarations have argument names, it's helpful to actually print those names when debugging. Arguably, it'd be nice to print them all the time, but that would mean the IR we output wouldn't round trip through bitcode, which doesn't store the names.

    Make the various print() methods in AsmWriter optionally print "for debug" and set that flag in the dump() methods. The only thing this does differently for now is print the argument names in declarations.

    llvm-svn: 248692
* Silence clang warning: variable ‘Status’ set but not used. | Yaron Keren | 2015-09-27 | 1 | -1/+1
    llvm-svn: 248691
* [SCEV] identical instructions don't compute equal values | Sanjoy Das | 2015-09-27 | 1 | -1/+8
    Before this change `HasSameValue` would return true for distinct `alloca` instructions if they happened to be allocating the same type (`alloca` instructions are not specified as reading memory). This change adds an explicit whitelist of instruction types for which "identical" instructions compute the same value.

    Fixes PR24952.

    llvm-svn: 248690
* [InstCombine] fold zexts and constants into a phi (PR24766) | Sanjay Patel | 2015-09-27 | 2 | -0/+70
    This is one step towards solving PR24766: https://llvm.org/bugs/show_bug.cgi?id=24766

    We were not producing the same IR for these two C functions because the store to the temp bool causes extra zexts:

        #include <stdbool.h>

        bool switchy(char x1, char x2, char condition) {
          bool conditionMet = false;
          switch (condition) {
          case 0: conditionMet = (x1 == x2); break;
          case 1: conditionMet = (x1 <= x2); break;
          }
          return conditionMet;
        }

        bool switchy2(char x1, char x2, char condition) {
          switch (condition) {
          case 0: return (x1 == x2);
          case 1: return (x1 <= x2);
          }
          return false;
        }

    As noted in the code comments, this test case manages to avoid the more general existing phi optimizations where there are only 2 phi inputs or where there are no constant phi args mixed in with the casts ops. It seems like a corner case, but if we don't catch it, then I don't think we can get SimplifyCFG to further optimize towards the canonical form for this function shown in the bug report.

    Differential Revision: http://reviews.llvm.org/D12866

    llvm-svn: 248689
* [EH] Create removeUnwindEdge utility | Joseph Tremoulet | 2015-09-27 | 3 | -99/+69
    Summary: Factor the code that rewrites invokes to calls and rewrites WinEH terminators to their "unwind to caller" equivalents into a helper in Utils/Local, and use it in the three places I'm aware of that need to do this.

    Reviewers: andrew.w.kaylor, majnemer, rnk
    Subscribers: llvm-commits
    Differential Revision: http://reviews.llvm.org/D13152

    llvm-svn: 248677
* [BranchProbability] Manually round the floating point output. | Benjamin Kramer | 2015-09-26 | 1 | -28/+7
    llvm::format compiles down to snprintf which has no defined rounding for floating point arguments, and MSVC has implemented it differently from what the BSD libcs and glibc do. Try to emulate the glibc rounding behavior to avoid changing tests.

    While there, simplify the code a bit and move trivial methods inline.

    llvm-svn: 248665
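As an illustration of rounding by hand instead of relying on the C library's floating point formatting, the sketch below prints a ratio as a percentage using integer arithmetic only. The name `formatPercent` and the round-half-to-even policy are assumptions made for this example; it rounds the exact ratio N/D rather than a `double`, so it is not claimed to match the committed code or glibc digit for digit.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: returns N/D as a percentage with two decimals,
// e.g. formatPercent(1, 3) yields "33.33%".
std::string formatPercent(uint32_t N, uint32_t D) {
  assert(D != 0 && "denominator must be nonzero");
  // Work in hundredths of a percent: N/D * 100 * 100.
  uint64_t Scaled = uint64_t(N) * 10000;
  uint64_t Q = Scaled / D; // truncated quotient
  uint64_t R = Scaled % D; // remainder decides the rounding direction
  // Round to nearest, ties to even, with no dependence on snprintf.
  uint64_t Twice = 2 * R;
  if (Twice > D || (Twice == D && (Q & 1)))
    ++Q;
  std::string Frac = std::to_string(Q % 100);
  if (Frac.size() < 2)
    Frac.insert(0, "0"); // keep two fractional digits
  return std::to_string(Q / 100) + "." + Frac + "%";
}
```

Because every step is integer arithmetic, the output is the same on any conforming compiler and C library, which is the portability property the commit is after.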
* AMDGPU: Remove hasPostISelHook from most instructions | Matt Arsenault | 2015-09-26 | 2 | -12/+19
    Since this is only needed for VOP3 and a few other special case instructions, stop setting it on everything.

    llvm-svn: 248657
* AMDGPU: Switch over reg class size instead of checking all super classes | Matt Arsenault | 2015-09-26 | 2 | -22/+36
    This gets isSGPRClass out of my profile of SIFixSGPRCopies.

    llvm-svn: 248656
* AMDGPU: Don't handle invalid reg classes in helper functions | Matt Arsenault | 2015-09-26 | 1 | -6/+0
    No tests hit these and it would be better to have checks like this explicit where they are used.

    llvm-svn: 248655
* AMDGPU: address -Winconsistent-missing-override | Saleem Abdulrasool | 2015-09-26 | 1 | -1/+1
    Add missing override. NFC.

    llvm-svn: 248652
* AMDGPU: Set CopyCost of register classes | Matt Arsenault | 2015-09-26 | 1 | -8/+32
    These require multiple mov instructions to copy, but the default value is that 1 instruction is needed. I'm not sure if this actually changes anything.

    llvm-svn: 248651
* [Bug 24848] Use range metadata to constant fold comparisons between two values | Chen Li | 2015-09-26 | 1 | -0/+26
    Summary: This is the second part of fixing bug 24848 https://llvm.org/bugs/show_bug.cgi?id=24848. If both operands of a comparison have range metadata, they should be used to constant fold the comparison.

    Reviewers: sanjoy, hfinkel
    Subscribers: llvm-commits
    Differential Revision: http://reviews.llvm.org/D13177

    llvm-svn: 248650
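A rough sketch of the folding idea, assuming each operand carries a single half-open, non-wrapping unsigned range [Lo, Hi), which is the interval shape `!range` metadata describes. `ToyRange`, `Fold`, and `foldULT` are hypothetical names for this illustration and do not reflect the actual implementation, which is not shown in this log.

```cpp
#include <cassert>
#include <cstdint>

// Half-open, non-wrapping unsigned range [Lo, Hi) (toy model).
struct ToyRange {
  uint64_t Lo;
  uint64_t Hi;
};

// Result of trying to fold a comparison from ranges alone.
enum class Fold { AlwaysFalse, AlwaysTrue, Unknown };

// Tries to decide "A u< B" for every A in RA and every B in RB.
Fold foldULT(ToyRange RA, ToyRange RB) {
  assert(RA.Lo < RA.Hi && RB.Lo < RB.Hi && "ranges must be non-empty");
  if (RA.Hi - 1 < RB.Lo) // max(A) < min(B): the comparison is always true
    return Fold::AlwaysTrue;
  if (RA.Lo >= RB.Hi - 1) // min(A) >= max(B): the comparison is always false
    return Fold::AlwaysFalse;
  return Fold::Unknown; // the ranges overlap, so nothing can be concluded
}
```

For example, with A in [0, 4) and B in [4, 8) the comparison folds to true, while overlapping ranges such as [0, 10) and [5, 20) leave it unfolded.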
* AMDGPU: VOP3b definition cleanups | Matt Arsenault | 2015-09-26 | 2 | -18/+25
    llvm-svn: 248647
* AMDGPU: Fix sched model for VOP2b instructions | Matt Arsenault | 2015-09-26 | 1 | -6/+7
    Trying to use the version with the explicit output operand would complain because of the missing WriteSALU. I'm not sure why it doesn't complain about this with the implicit VCC def.

    llvm-svn: 248646
* [WebAssembly] Rename several functions and types according to the new spec. | Dan Gohman | 2015-09-26 | 10 | -158/+158
    llvm-svn: 248644
* [ARM] Don't generate clrex for pre-v7 targets. | Ahmed Bougacha | 2015-09-26 | 1 | -0/+2
    Since r248294, we emit clrex, but it doesn't exist on v6.

    llvm-svn: 248640
* [SCEV] Reapply 'Teach isLoopBackedgeGuardedByCond to exploit trip counts' | Sanjoy Das | 2015-09-25 | 1 | -0/+16
    Summary: If the trip count of a specific backedge is `N`, then we know that backedge is effectively guarded by the condition `{0,+,1} u< N`. This change teaches SCEV to use this condition to prove things in `isLoopBackedgeGuardedByCond`.

    Depends on D12948
    Depends on D12949

    The original checkin, r248608, had to be backed out due to an issue with an ObjCXX unit test. That issue is now fixed, so re-landing.

    Reviewers: atrick, reames, majnemer, hfinkel
    Subscribers: llvm-commits
    Differential Revision: http://reviews.llvm.org/D12950

    llvm-svn: 248638
* [SCEV] Reapply 'Exploit A < B => (A+K) < (B+K) when possible' | Sanjoy Das | 2015-09-25 | 1 | -0/+143
    Summary: This change teaches SCEV's `isImpliedCond` two new identities:

        A u< B u< -C          =>  (A + C) u< (B + C)
        A s< B s< INT_MIN - C =>  (A + C) s< (B + C)

    While these are useful on their own, they're really intended to support D12950.

    The original checkin, r248606, had to be backed out due to an issue with an ObjCXX unit test. That issue is now fixed, so re-landing.

    Reviewers: atrick, reames, majnemer, nlewycky, hfinkel
    Subscribers: aadg, sanjoy, llvm-commits
    Differential Revision: http://reviews.llvm.org/D12948

    llvm-svn: 248637
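The two identities can be sanity-checked by brute force at a small bit width. The sketch below enumerates all 8-bit values with wrapping arithmetic; the function names are hypothetical, and checking 8 bits is only an illustration, not a proof for every width. The `UseBound` flag drops the `B u< -C` premise to show that the unsigned identity fails without it.

```cpp
#include <cassert>
#include <cstdint>

// Interprets the low 8 bits of V as a two's-complement signed value.
static int asSigned8(unsigned V) {
  V &= 0xFFu;
  return V >= 128 ? int(V) - 256 : int(V);
}

// Checks  A u< B u< -C  =>  (A + C) u< (B + C)  over all 8-bit A, B, C.
// With UseBound == false the "B u< -C" premise is omitted.
bool unsignedIdentityHolds(bool UseBound) {
  for (unsigned A = 0; A < 256; ++A)
    for (unsigned B = 0; B < 256; ++B)
      for (unsigned C = 0; C < 256; ++C) {
        unsigned NegC = (0u - C) & 0xFFu; // -C modulo 2^8
        bool Premise = A < B && (!UseBound || B < NegC);
        bool Conclusion = ((A + C) & 0xFFu) < ((B + C) & 0xFFu);
        if (Premise && !Conclusion)
          return false; // counterexample found
      }
  return true;
}

// Checks  A s< B s< INT_MIN - C  =>  (A + C) s< (B + C)  over all 8-bit values.
bool signedIdentityHolds() {
  const unsigned IntMin = 0x80u; // bit pattern of the 8-bit INT_MIN
  for (unsigned A = 0; A < 256; ++A)
    for (unsigned B = 0; B < 256; ++B)
      for (unsigned C = 0; C < 256; ++C) {
        int Bound = asSigned8(IntMin - C); // INT_MIN - C with wraparound
        bool Premise = asSigned8(A) < asSigned8(B) && asSigned8(B) < Bound;
        bool Conclusion = asSigned8(A + C) < asSigned8(B + C);
        if (Premise && !Conclusion)
          return false; // counterexample found
      }
  return true;
}
```

Without the bound, A = 0, B = 255, C = 1 is a counterexample: B + C wraps to 0, so (A + C) u< (B + C) becomes 1 u< 0.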
* LivePhysRegs: Fix live-outs of return blocks | Matthias Braun | 2015-09-25 | 1 | -2/+10
    I realized that the live-out set computed for the return block is missing the callee saved registers (the non-pristine ones to be exact). This only affects the liveness computed for instructions inside the function epilogue which currently none of the LivePhysRegs users in llvm cares about, so this is just a drive-by fix without a testcase.

    Differential Revision: http://reviews.llvm.org/D13180

    llvm-svn: 248636
* [InstCombine] match De Morgan's Law hidden by zext ops (PR22723) | Sanjay Patel | 2015-09-25 | 1 | -5/+25
    This is a fix for PR22723: https://llvm.org/bugs/show_bug.cgi?id=22723

    My first attempt at this was to change what I thought was the root problem:

        xor (zext i1 X to i32), 1 --> zext (xor i1 X, true) to i32

    ...but we create the opposite pattern in InstCombiner::visitZExt(), so infinite loop!

    My next idea was to fix the matchIfNot() implementation in PatternMatch, but that would mean potentially returning a different size for the match than what was input. I think this would require all users of m_Not to check the size of the returned match, so I abandoned that idea.

    I settled on just fixing the exact case presented in the PR. This patch does allow the 2 functions in PR22723 to compile identically (x86):

        bool test(bool x, bool y) { return !x | !y; }
        bool test(bool x, bool y) { return !x || !y; }
        ...
        andb %sil, %dil
        xorb $1, %dil
        movb %dil, %al
        retq

    Differential Revision: http://reviews.llvm.org/D12705

    llvm-svn: 248634
* Use fixed-point representation for BranchProbability. | Cong Hou | 2015-09-25 | 1 | -5/+51
    BranchProbability now is represented by its numerator and denominator in uint32_t type. This patch changes this representation into a fixed point that is represented by the numerator in uint32_t type and a constant denominator 1<<31. This is quite similar to the representation of BlockMass in BlockFrequencyInfoImpl.h. There are several pros and cons of this change:

    Pros:
    1. It uses only half the space of the current one.
    2. Some operations are much faster, like plus, subtraction, comparison, and scaling by an integer.

    Cons:
    1. Constructing a probability using arbitrary numerator and denominator needs additional calculations.
    2. It is a little less precise than before as we use a fixed denominator. For example, 1 - 1/3 may not be exactly identical to 1 / 3 (this will lead to many BranchProbability unit test failures). This should not matter when we only use it for branch probability. If we use it like a rational value for some precise calculations we may need another construct like ValueRatio.

    One important reason for this change is that we propose to store branch probabilities instead of edge weights in MachineBasicBlock. We also want clients to use probability instead of weight when adding successors to a MBB. The current BranchProbability takes more space, which may be a concern.

    Differential revision: http://reviews.llvm.org/D12603

    llvm-svn: 248633
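A minimal sketch of such a fixed-point probability, under the assumption that construction rescales to the constant denominator with round-to-nearest. `ToyProbability`, `kDenominator`, and the saturating add are choices made for this illustration and are not taken from the patch itself.

```cpp
#include <cassert>
#include <cstdint>

// Constant denominator 1 << 31, as described in the commit message.
constexpr uint32_t kDenominator = 1u << 31;

// Toy fixed-point probability: only the numerator is stored (4 bytes).
class ToyProbability {
  uint32_t N;

  struct RawTag {};
  ToyProbability(uint32_t Raw, RawTag) : N(Raw) {}

public:
  // Con 1: an arbitrary Num/Den needs extra work to rescale and round.
  ToyProbability(uint32_t Num, uint32_t Den) {
    assert(Den != 0 && Num <= Den && "probability must lie in [0, 1]");
    N = static_cast<uint32_t>((uint64_t(Num) * kDenominator + Den / 2) / Den);
  }

  uint32_t raw() const { return N; }

  // Pro 2: addition and comparison touch only the numerators.
  ToyProbability operator+(ToyProbability RHS) const {
    uint64_t Sum = uint64_t(N) + RHS.N;
    if (Sum > kDenominator)
      Sum = kDenominator; // saturate at probability 1
    return ToyProbability(static_cast<uint32_t>(Sum), RawTag());
  }
  bool operator<(ToyProbability RHS) const { return N < RHS.N; }
  bool operator==(ToyProbability RHS) const { return N == RHS.N; }

  // Scaling an integer is a multiply and a shift (result is truncated).
  uint64_t scale(uint32_t Count) const { return (uint64_t(Count) * N) >> 31; }
};
```

The precision loss from con 2 is visible here: `ToyProbability(1, 3) + ToyProbability(1, 3)` has raw numerator 1431655766, while `ToyProbability(2, 3)` has 1431655765, so the two compare unequal.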
* SelectionDAGDumper: Print simple operands inline. | Matthias Braun | 2015-09-25 | 1 | -22/+37
    Print simple operands inline instead of their pointer/value number. Simple operands are SDNodes without predecessors like Constant(FP), Register, UNDEF. This unifies the behaviour with dumpr() which was already doing this.

    Previously:

        t0: ch = EntryToken
        t1: i64 = Register %vreg0
        t2: i64,ch = CopyFromReg t0, t1
        t3: i64 = Constant<1>
        t4: i64 = add t2, t3
        t5: i64 = Constant<2>
        t6: i64 = add t2, t5
        t10: i64 = undef
        t11: i8,ch = load t0, t2, t10<LD1[%tmp81]>
        t12: i8,ch = load t0, t4, t10<LD1[%tmp10]>
        t13: i8,ch = load t0, t6, t10<LD1[%tmp12]>

    Now:

        t0: ch = EntryToken
        t2: i64,ch = CopyFromReg t0, Register:i64 %vreg0
        t4: i64 = add t2, Constant:i64<1>
        t6: i64 = add t2, Constant:i64<2>
        t11: i8,ch = load<LD1[%tmp81]> t0, t2, undef:i64
        t12: i8,ch = load<LD1[%tmp10]> t0, t4, undef:i64
        t13: i8,ch = load<LD1[%tmp12]> t0, t6, undef:i64

    Differential Revision: http://reviews.llvm.org/D12567

    llvm-svn: 248628
* AMDGPU: Construct new buffer instruction when moving SMRD | Matt Arsenault | 2015-09-25 | 2 | -31/+39
    It's easier to understand creating a full instruction than the current situation where sometimes a new instruction is created and sometimes it is awkwardly mutated in place.

    llvm-svn: 248627
* DAGCombiner: Check if store is volatile first | Matt Arsenault | 2015-09-25 | 1 | -3/+3
    This is the simpler check. NFC.

    llvm-svn: 248625
* TargetRegisterInfo: Introduce PrintLaneMask. | Matthias Braun | 2015-09-25 | 9 | -26/+25
    This makes it more convenient to print lane masks and leads to more uniform printing.

    llvm-svn: 248624
* TargetRegisterInfo: Add typedef unsigned LaneBitmask and use it where appropriate; NFC | Matthias Braun | 2015-09-25 | 12 | -88/+90
    llvm-svn: 248623
* merge vector stores into wider vector stores and fix AArch64 misaligned access TLI hook (PR21711) | Sanjay Patel | 2015-09-25 | 2 | -14/+47
    This is a redo of D7208 ( r227242 - http://llvm.org/viewvc/llvm-project?view=revision&revision=227242 ).

    The patch was reverted because an AArch64 target could infinite loop after the change in DAGCombiner to merge vector stores. That happened because AArch64's allowsMisalignedMemoryAccesses() wasn't telling the truth. It reported all unaligned memory accesses as fast, but then split some 128-bit unaligned accesses up in performSTORECombine() because they are slow.

    This patch attempts to fix the problem in AArch64's allowsMisalignedMemoryAccesses() while preserving existing (perhaps questionable) lowering behavior.

    The x86 test shows that store merging is working as intended for a target with fast 32-byte unaligned stores.

    Differential Revision: http://reviews.llvm.org/D12635

    llvm-svn: 248622
* PrologueEpilogInserter: Fix missing live-ins when savepoint equals restorepoint | Matthias Braun | 2015-09-25 | 1 | -3/+6
    The algorithm would not modify the live-in list of blocks below the save block point which is correct unless it happens to be a restore point at the same time.

    Also fixes the benign issue of live-in registers being added twice in some cases.

    The testcase is based on a test submitted by Kit Barton.

    Differential Revision: http://reviews.llvm.org/D13176

    llvm-svn: 248620
* AMDGPU/SI: Use .hsatext section instead of .text for HSA | Tom Stellard | 2015-09-25 | 17 | -9/+197
    Reviewers: arsenm, grosbach, rafael
    Subscribers: arsenm, llvm-commits
    Differential Revision: http://reviews.llvm.org/D12424

    llvm-svn: 248619
* MCAsmInfo: Allow targets to specify when the .section directive should be omitted | Tom Stellard | 2015-09-25 | 2 | -6/+7
    Summary: The default behavior is to omit the .section directive for .text, .data, and sometimes .bss, but some targets may want to omit this directive for other sections too.

    The AMDGPU backend will use this to emit a simplified syntax for section switches. For example, if the section directive is not omitted (current behavior), section switches to .hsatext will be printed like this:

        .section .hsatext,#alloc,#execinstr,#write

    This is actually wrong, because .hsatext has some custom STT_* flags, which MC doesn't know how to print or parse.

    If the section directive is omitted (made possible by this commit), section switches will be printed like this:

        .hsatext

    The motivation for this patch is to make it possible to emit sections with custom STT_* flags without having to teach MC about all the target specific STT_* flags.

    Reviewers: rafael, grosbach
    Subscribers: llvm-commits
    Differential Revision: http://reviews.llvm.org/D12423

    llvm-svn: 248618