path: root/llvm/lib
Commit message | Author | Age | Files | Lines
* [SimplifyLibCalls] Correctly set the is_zero_undef flag for llvm.cttzDavide Italiano2015-08-131-1/+1
  If <src> is non-zero we can safely set the flag to true, and this results in
  less code generated for, e.g., ffs(x) + 1 on FreeBSD.
  Thanks to majnemer for suggesting the fix and reviewing.

  Code generated before the patch was applied:

     0: 0f bc c7             bsf    %edi,%eax
     3: b9 20 00 00 00       mov    $0x20,%ecx
     8: 0f 45 c8             cmovne %eax,%ecx
     b: 83 c1 02             add    $0x2,%ecx
     e: b8 01 00 00 00       mov    $0x1,%eax
    13: 85 ff                test   %edi,%edi
    15: 0f 45 c1             cmovne %ecx,%eax
    18: c3                   retq

  Code generated after the patch was applied:

     0: 0f bc cf             bsf    %edi,%ecx
     3: 83 c1 02             add    $0x2,%ecx
     6: 85 ff                test   %edi,%edi
     8: b8 01 00 00 00       mov    $0x1,%eax
     d: 0f 45 c1             cmovne %ecx,%eax
    10: c3                   retq

  It seems we can still use cmove and save another 'test' instruction, but
  that can be tackled separately.

  Differential Revision: http://reviews.llvm.org/D11989

  llvm-svn: 244947
* MIR Parser: Extract the code that parses the alignment into a new method. NFC.Alex Lorenz2015-08-131-5/+13
| | | | | | | | This commit extracts the code that parses the memory operand's alignment into a new method named 'parseAlignment' so that it can be reused when parsing the basic block's alignment attribute. llvm-svn: 244945
* MIR Parser: Rename the method 'diagFromLLVMAssemblyDiag'. NFC.Alex Lorenz2015-08-131-6/+7
| | | | | | | | | This commit renames the method 'diagFromLLVMAssemblyDiag' to 'diagFromBlockStringDiag'. This method will be used when converting diagnostics from other YAML block strings, and not just the LLVM module block string, so the new name should reflect that. llvm-svn: 244943
* [SeparateConstOffsetFromGEP] strengthen the inbounds attributeJingyue Wu2015-08-131-4/+9
  We used to be over-conservative about preserving inbounds. Actually, the
  second GEP (which applies the constant offset) can inherit the inbounds
  attribute of the original GEP, because the resultant pointer is equivalent
  to that of the original GEP. For example,

    x = GEP inbounds a, i+5
      =>
    y = GEP a, i               // inbounds removed
    x = GEP inbounds y, 5      // inbounds preserved

  llvm-svn: 244937
* Remove and forbid raw_svector_ostream::flush() calls.Yaron Keren2015-08-1311-23/+0
| | | | | | | | | | After r244870 flush() will only compare two null pointers and return, doing nothing but wasting run time. The call is not required any more as the stream and its SmallString are always in sync. Thanks to David Blaikie for reviewing. llvm-svn: 244928
* Fix GCC warning: extra `;' [-Wpedantic].Nick Lewycky2015-08-131-1/+1
| | | | llvm-svn: 244924
* Scalar to vector conversions using direct movesNemanja Ivanovic2015-08-133-2/+89
| | | | | | | | | | | | This patch corresponds to review: http://reviews.llvm.org/D11471 It improves the code generated for converting a scalar to a vector value. With direct moves from GPRs to VSRs, we no longer require expensive stack operations for this. Subsequent patches will handle the reverse case and more general operations between vectors and their scalar elements. llvm-svn: 244921
* [ARM] FMINNAN/FMAXNAN of f64 are not legal.James Molloy2015-08-131-2/+0
| | | | | | | | This was my error. We've got f32 marked as legal because they're simulated using a v2f32 instruction, but there's no equivalent for f64. This will get test coverage imminently when D12015 lands. llvm-svn: 244916
* [ARM] Allow vmin/vmax of scalars to be emitted without UseNEONForFP.James Molloy2015-08-131-2/+2
| | | | | | | | This overrides the default to more closely resemble the hand-crafted matching logic in ISelLowering. It makes sense, as there is no VFP equivalent of vmin or vmax, to use them when they're available even if in general VFP ops should be preferred. This should be NFC. llvm-svn: 244915
* [DeadStoreElimination] remove a redundant store even if the load is in a different blockErik Eckstein2015-08-131-10/+71
  DeadStoreElimination does eliminate a store if it stores a value which was
  loaded from the same memory location. So far this worked only if the store
  is in the same block as the load. Now we can also handle stores which are
  in a different block than the load. Example:

    define i32 @test(i1, i32*) {
    entry:
      %l2 = load i32, i32* %1, align 4
      br i1 %0, label %bb1, label %bb2
    bb1:
      br label %bb3
    bb2:
      ; This store is redundant
      store i32 %l2, i32* %1, align 4
      br label %bb3
    bb3:
      ret i32 0
    }

  Differential Revision: http://reviews.llvm.org/D11854

  llvm-svn: 244901
* [mips][mcjit] Calculate correct addend for HI16 and PCHI16 relocPetar Jovanovic2015-08-133-9/+65
| | | | | | | | | | | Previously, for O32 ABI we did not calculate correct addend for R_MIPS_HI16 and R_MIPS_PCHI16 relocations. This patch fixes that. Patch by Vladimir Radosavljevic. Differential Revision: http://reviews.llvm.org/D11186 llvm-svn: 244897
* [WinEHPrepare] Update demotion logicJoseph Tremoulet2015-08-131-165/+209
| | | | | | | | | | | | | | | | | | | | | | | | Summary: Update the demotion logic in WinEHPrepare to avoid creating new cleanups by walking predecessors as necessary to insert stores for EH-pad PHIs. Also avoid creating stores for EH-pad PHIs that have no uses. The store/load placement is still pretty naive. Likely future improvements (at least for optimized compiles) include: - Share loads for related uses as possible - Coalesce non-interfering use/def-related PHIs - Store at definition point rather than each PHI pred for non-interfering lifetimes. Reviewers: rnk, majnemer Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D11955 llvm-svn: 244894
* [SystemZ] Support large LLVM IR struct return valuesUlrich Weigand2015-08-133-4/+18
| | | | | | | | | | | | | | | | | | | | | | | Recent mesa/llvmpipe crashes on SystemZ due to a failed assertion when attempting to compile a routine with a return type of { <4 x float>, <4 x float>, <4 x float>, <4 x float> } on a system without vector instruction support. This is because after legalizing the vector type, we get a return value consisting of 16 floats, which cannot all be returned in registers. Usually, what should happen in this case is that the target's CanLowerReturn routine rejects the return type, in which case SelectionDAG falls back to implementing a structure return in memory via implicit reference. However, the SystemZ target never actually implemented any CanLowerReturn routine, and thus would accept any struct return type. This patch fixes the crash by implementing CanLowerReturn. As a side effect, this also handles fp128 return values, fixing a todo that was noted in SystemZCallingConv.td. llvm-svn: 244889
* Remove raw_svector_ostream::resync and users. It's no-op after r244870.Yaron Keren2015-08-133-7/+0
| | | | llvm-svn: 244888
* [InstCombinePHI] Partial simplification of identity operations.Charlie Turner2015-08-131-0/+115
  Consider this code:

    BB:
      %i = phi i32 [ 0, %if.then ], [ %c, %if.else ]
      %add = add nsw i32 %i, %b
      ...

  In this common case the add can be moved to the %if.else basic block,
  because adding zero is an identity operation. If we go through the %if.then
  branch it's always a win, because the add is not executed; if not, the
  number of instructions stays the same.

  This pattern applies also to other instructions like sub, shl, shr,
  ashr | 0, mul, sdiv, div | 1.

  Patch by Jakub Kuderski!

  llvm-svn: 244887
* Revert "[LIR] Start leveraging the fundamental guarantees of a loop..."Renato Golin2015-08-131-15/+12
| | | | | | | This reverts commit r244879, as it broke the test-suite on SingleSource/Regression/C/2004-03-15-IndirectGoto in AArch64. llvm-svn: 244885
* Revert "[LIR] Handle access to AliasAnalysis the same way as the other analysis in LoopIdiomRecognize."Renato Golin2015-08-131-5/+3
  This reverts commit r244880, as it broke the test-suite on
  SingleSource/Regression/C/2004-03-15-IndirectGoto in AArch64.

  llvm-svn: 244884
* Test Commit.Ashutosh Nema2015-08-131-1/+1
| | | | llvm-svn: 244883
* [ARM] Reorganise and simplify thumb-1 load/store selectionJohn Brawn2015-08-132-169/+92
| | | | | | | | | | | | | | | | Other than PC-relative loads/store the patterns that match the various load/store addressing modes have the same complexity, so the order that they are matched is the order that they appear in the .td file. Rearrange the instruction definitions in ARMInstrThumb.td, and make use of AddedComplexity for PC-relative loads, so that the instruction matching order is the order that results in the simplest selection logic. This also makes register-offset load/store be selected when it should, as previously it was only selected for too-large immediate offsets. Differential Revision: http://reviews.llvm.org/D11800 llvm-svn: 244882
* [LIR] Handle access to AliasAnalysis the same way as the other analysisChandler Carruth2015-08-131-3/+5
| | | | | | | in LoopIdiomRecognize. This is what started me staring at this code. Now migrating it with the new AA stuff will be trivial. llvm-svn: 244880
* [LIR] Start leveraging the fundamental guarantees of a loop inChandler Carruth2015-08-131-12/+15
| | | | | | | | | | | simplified form to remove redundant checks and simplify the code for popcount recognition. We don't actually need to handle all of these cases. I've left a FIXME for one in particular until I finish inspecting to make sure we don't actually *rely* on the predicate in any way. llvm-svn: 244879
* [LIR] Handle the LoopInfo the same as all the other analyses. No utilityChandler Carruth2015-08-131-3/+3
| | | | | | really in breaking pattern just for this analysis. llvm-svn: 244878
* [InstCombine] SSE/AVX vector shifts demanded shift amount bitsSimon Pilgrim2015-08-131-27/+84
  Most SSE/AVX (non-constant) vector shift instructions only use the lower
  64 bits of the 128-bit shift-amount vector operand; this patch calls
  SimplifyDemandedVectorElts to optimize for this.

  I had to refactor some of my recent InstCombiner work on the vector shifts
  to avoid quite a bit of duplicate code; it means that SimplifyX86immshift
  now (re)decodes the type of shift.

  Differential Revision: http://reviews.llvm.org/D11938

  llvm-svn: 244872
* Modify raw_svector_ostream to use its SmallString without additional buffering.Yaron Keren2015-08-131-62/+5
| | | | | | | | | This is faster and avoids the stream and SmallString state synchronization issue. resync() is a no-op and may be safely deleted. I'll do so in a follow-up commit. Reviewed by Rafael Espindola. llvm-svn: 244870
* [LoopUnswitch] Check OptimizeForSize before traversing over all basic blocks in current loopChen Li2015-08-131-7/+6
  Summary: This patch moves the check of OptimizeForSize before traversing
  over all basic blocks in the current loop. If OptimizeForSize is set to
  true, no non-trivial unswitch is ever allowed. Therefore, the early exit
  will help reduce compilation time. This patch should be NFC.

  Reviewers: reames, weimingz, broune

  Subscribers: llvm-commits

  Differential Revision: http://reviews.llvm.org/D11997

  llvm-svn: 244868
* [CodeGen] Mark the promoted FCOPYSIGN result FP_ROUND as TRUNCating.Ahmed Bougacha2015-08-131-1/+8
| | | | | | | | | | | | | Now that we can properly promote mismatched FCOPYSIGNs (r244858), we can mark the FP_ROUND on the result as truncating, to expose folding. FCOPYSIGN doesn't change anything but the sign bit, so (fp_round (fcopysign (fpext a), b)) is equivalent to (modulo the sign bit): (fp_round (fpext a)) which is a no-op. llvm-svn: 244862
* [AArch64] Also custom-lowering mismatched vector/f16 FCOPYSIGN.Ahmed Bougacha2015-08-131-11/+5
| | | | | | | | | We can lower them using our cool tricks if we fpext/fptrunc the second input, like we do for f32/f64. Follow-up to r243924, r243926, and r244858. llvm-svn: 244860
* [CodeGen] Assert on getNode(FP_EXTEND) with a smaller dst type.Ahmed Bougacha2015-08-131-0/+2
| | | | | | This would have caught the problem in r244858. llvm-svn: 244859
* [CodeGen] When Promoting, don't extend the 2nd FCOPYSIGN operand.Ahmed Bougacha2015-08-131-1/+1
| | | | | | | | | | | We don't care about its type, and there's even a combine that'll fold away the FP_EXTEND if we let it run. However, until it does, we'll have something broken like: (f32 (fp_extend (f64 v))) Scalar f16 follow-up to r243924. llvm-svn: 244858
* [CodeGen] Simplify getNode(*EXT/TRUNC) type size assert. NFC.Ahmed Bougacha2015-08-131-8/+8
| | | | | | | | We already check that vectors have the same number of elements, we don't need to use the scalar types explicitly: comparing the size of the whole vector is enough. llvm-svn: 244857
* There is only one saver of strings.Rafael Espindola2015-08-133-4/+4
| | | | llvm-svn: 244854
* [LIR] Make the LoopIdiomRecognize pass get analyses essentially the sameChandler Carruth2015-08-131-40/+6
| | | | | | | way as every other pass. This simplifies the code quite a bit and is also more idiomatic! <ba-dum!> llvm-svn: 244853
* [LIR] Remove the dedicated class for popcount recognition and sink theChandler Carruth2015-08-131-392/+343
| | | | | | | | | | | | | | | | | | | | | code into methods on LoopIdiomRecognize. This simplifies the code somewhat and also makes it much easier to move the analyses around. Ultimately, the separate class wasn't providing significant value over methods -- it contained the precondition basic block and the current loop. The current loop is already available and the precondition block wasn't needed everywhere and is easy to pass around. In several cases I just moved things to be static functions because they already accepted most of their inputs as arguments. This doesn't fix the way we manage analyses yet, that will be the next patch, but it already makes the code over 50 lines shorter. No functionality changed. llvm-svn: 244851
* Return ErrorOr from FileOutputBuffer::create. NFC.Rafael Espindola2015-08-131-7/+4
| | | | llvm-svn: 244848
* [LIR] Move all the helpers to be private and re-order the methods inChandler Carruth2015-08-131-46/+55
| | | | | | a way that groups things logically. No functionality changed. llvm-svn: 244845
* [LIR] Remove the 'LIRUtils' abstraction which was unnecessary and addingChandler Carruth2015-08-121-51/+18
| | | | | | | | | | | | | | | | | | | | | | | | complexity. There is only one function that was called from multiple locations, and that was 'getBranch' which has a reasonable one-line spelling already: dyn_cast<BranchInst>(BB->getTerminator()). We could make this shorter, but it doesn't seem to add much value. Instead, we should avoid calling it so many times on the same basic blocks, but that will be in a subsequent patch. The other functions are only called in one location, so inline them there, and take advantage of this to use direct early exit and reduce indentation. This makes it much more clear what is being tested for, and in fact makes it clear now to me that there are simpler ways to do this work. However, this patch just does the mechanical inlining. I'll clean up the functionality of the code to leverage loop simplified form more effectively in a follow-up. Despite lots of early line breaks due to early-exit, this is still shorter than it was before. llvm-svn: 244841
* [LIR] Run clang-format over LoopIdiomRecognize in preparation forChandler Carruth2015-08-121-227/+219
| | | | | | | | | | | | a significant code cleanup here. The handling of analyses in this pass is overly complex and can be simplified significantly, but the right way to do that is to simplify all of the code not just the analyses, and that'll require pretty extensive edits that would be noisy with formatting changes mixed into them. llvm-svn: 244828
* [PM/AA] Remove the AliasDebugger pass.Chandler Carruth2015-08-123-132/+0
| | | | | | | | | | | | | | | | | | This debugger was designed to catch places where the old update API was failing to be used correctly. As I've removed the update API, it no longer serves any purpose. We can introduce new debugging aid passes around any future work w.r.t. updating AAs. Note that I've updated the documentation here, but really I need to rewrite the documentation to carefully spell out the ideas around stateful AA and how things are changing in the AA world. However, I'm hoping to do that as a follow-up to the refactoring of the AA infrastructure to work in both old and new pass managers so that I can write the documentation specific to that world. Differential Revision: http://reviews.llvm.org/D11984 llvm-svn: 244825
* [RewriteStatepointsForGC] Avoid using unrelocated pointers after safepointsPhilip Reames2015-08-121-0/+31
| | | | | | | | | | | | | | | | To be clear: this is an *optimization* not a correctness change. CodeGenPrep likes to duplicate icmps feeding branch instructions to take advantage of x86's ability to fuse many comparison/branch patterns into a single micro-op and to reduce the need for materializing i1s into general registers. PlaceSafepoints likes to place safepoint polls right at the end of basic blocks (immediately before terminators) when inserting entry and backedge safepoints. These two heuristics interact in a somewhat unfortunate way where the branch terminating the original block will be controlled by a condition driven by unrelocated pointers. This forces the register allocator to keep both the relocated and unrelocated values of the pointers feeding the icmp alive over the safepoint poll. One simple fix would have been to just adjust PlaceSafepoints to move one back in the basic block, but you can reach similar cases as a result of LICM or other hoisting passes. As a result, doing a post insertion fixup seems to be more robust. I considered doing this in CodeGenPrep itself, but having to update the live sets of already rewritten safepoints gets complicated fast. In particular, you can't just use def/use information because by moving the icmp, we're extending the live range of its inputs potentially. Instead, this patch teaches RewriteStatepointsForGC to make the required adjustments before making the relocations explicit in the IR. This change really highlights the fact that RSForGC is a CodeGenPrep-like pass which is performing target specific lowering. In the long run, we may even want to combine the two though this would require a lot more smarts to be integrated into RSForGC first. We currently rely on being able to run a set of cleanup passes post rewriting because the IR RSForGC generates is pretty damn ugly. Differential Revision: http://reviews.llvm.org/D11819 llvm-svn: 244821
* MIR Parser: Allow the MI IR references to reference global values.Alex Lorenz2015-08-121-0/+3
| | | | | | | This commit fixes a bug where MI parser couldn't resolve the named IR references that referenced named global values. llvm-svn: 244817
* MIR Serialization: Serialize the fixed stack pseudo source values.Alex Lorenz2015-08-122-1/+14
| | | | llvm-svn: 244816
* NFC. Convert comments in MachineBasicBlock.cpp into new style.Cong Hou2015-08-121-35/+23
| | | | llvm-svn: 244815
* MIR Parser: Move the parsing of fixed stack object indices into new method. NFCAlex Lorenz2015-08-121-2/+11
| | | | | | | | | This commit moves the code that parses the frame indices for the fixed stack objects from the method 'parseFixedStackObjectOperand' to a new method named 'parseFixedStackFrameIndex', so that it can be reused when parsing fixed stack pseudo source values. llvm-svn: 244814
* MIR Serialization: Serialize the jump table pseudo source values.Alex Lorenz2015-08-124-1/+9
| | | | llvm-svn: 244813
* MIR Serialization: Serialize the GOT pseudo source values.Alex Lorenz2015-08-124-1/+10
| | | | llvm-svn: 244809
* [RewriteStatepointsForGC] Handle extractelement fully in the base pointer algorithmPhilip Reames2015-08-121-61/+96
  When rewriting the IR such that base pointers are available for every live
  pointer, we potentially need to duplicate instructions to propagate the
  base. The original code had only handled PHI and Select under the belief
  those were the only instructions which would need to be duplicated. When I
  added support for vector instructions, I'd added a collection of hacks for
  ExtractElement which caught most of the common cases. Of course, I then
  found the one test case my hacks couldn't cover. :)

  This change removes all of the early hacks for extractelement. By defining
  extractelement as a BDV (rather than trying to look through it), we can
  extend the rewriting algorithm to duplicate the extract as needed. Note
  that a couple of peephole optimizations were left in for the moment,
  because while we now handle extractelement as a first class citizen, we're
  not yet handling insertelement. That change will follow in the near future.

  llvm-svn: 244808
* MIR Serialization: Serialize the stack pseudo source values.Alex Lorenz2015-08-124-1/+9
| | | | llvm-svn: 244806
* fix typo; NFCSanjay Patel2015-08-121-1/+1
| | | | llvm-svn: 244805
* MIR Serialization: Serialize the constant pool pseudo source values.Alex Lorenz2015-08-124-16/+60
| | | | llvm-svn: 244803
* Fix missing space in libfuzzer's help text.Lenny Maiorani2015-08-121-1/+1
| | | | llvm-svn: 244800