path: root/llvm/test
Commit message · Author · Age · Files · Lines
* This commit contains a few changes that had to go in together.Nadav Rotem2012-04-016-14/+28
|
| 1. Simplify xor/and/or (bitcast(A), bitcast(B)) -> bitcast(op (A,B))
|    (and also scalar_to_vector).
|
| 2. Xor/and/or are indifferent to the swizzle operation (shuffle of one src).
|    Simplify xor/and/or (shuff(A), shuff(B)) -> shuff(op (A, B))
|
| 3. Optimize swizzles of shuffles: shuff(shuff(x, y), undef) -> shuff(x, y).
|
| 4. Fix an X86ISelLowering optimization which was very bitcast-sensitive.
|
| Code which was previously compiled to this:
|
|   movd    (%rsi), %xmm0
|   movdqa  .LCPI0_0(%rip), %xmm2
|   pshufb  %xmm2, %xmm0
|   movd    (%rdi), %xmm1
|   pshufb  %xmm2, %xmm1
|   pxor    %xmm0, %xmm1
|   pshufb  .LCPI0_1(%rip), %xmm1
|   movd    %xmm1, (%rdi)
|   ret
|
| Now compiles to this:
|
|   movl    (%rsi), %eax
|   xorl    %eax, (%rdi)
|   ret
|
| llvm-svn: 153848
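The first two simplifications rest on the fact that bitwise operations act on each bit independently, so they commute with any pure rearrangement of the bits. The following is a minimal Python sketch of that property, not the DAG combiner code itself; the helper names (`shuffle`, `xor_vec`, `bitcast_to_i32`) and the sample values are assumptions made for illustration.

```python
# Illustrative model only: vectors are lists of bytes, a "swizzle" is a
# single-source shuffle, and a "bitcast" reinterprets 4 bytes as one i32.

def shuffle(vec, mask):
    """Single-source shuffle: result[i] = vec[mask[i]]."""
    return [vec[i] for i in mask]

def xor_vec(a, b):
    """Lane-wise xor of two equally sized vectors."""
    return [x ^ y for x, y in zip(a, b)]

def bitcast_to_i32(vec):
    """Reinterpret a 4 x i8 vector as one little-endian 32-bit integer."""
    return int.from_bytes(bytes(vec), "little")

a = [0x12, 0x34, 0x56, 0x78]
b = [0xF0, 0x0F, 0xAA, 0x55]
mask = [3, 0, 2, 1]

# xor(shuff(A), shuff(B)) == shuff(xor(A, B))
shuffled_then_xored = xor_vec(shuffle(a, mask), shuffle(b, mask))
xored_then_shuffled = shuffle(xor_vec(a, b), mask)

# xor(bitcast(A), bitcast(B)) == bitcast(xor(A, B))
xor_of_bitcasts = bitcast_to_i32(a) ^ bitcast_to_i32(b)
bitcast_of_xor = bitcast_to_i32(xor_vec(a, b))
```

The same argument holds for `and` and `or`, which is why the two shuffles and the vector xor in the old output can collapse to a single scalar `xorl`.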
* Add instruction itinerary for the PPC64 A2 core.Hal Finkel2012-04-011-0/+33
| | | | | | | This adds a full itinerary for IBM's PPC64 A2 embedded core. These cores form the basis for the CPUs in the new IBM BG/Q supercomputer. llvm-svn: 153842
* Add some more testing to cover the remaining two cases whereChandler Carruth2012-04-011-0/+45
| | | | | | always-inlining is disabled: recursive functions and indirectbr. llvm-svn: 153833
* Fix a pretty scary bug I introduced into the always inliner withChandler Carruth2012-04-011-0/+38
| | | | | | | | | | a single missing character. Somehow, this had gone untested. I've added tests for returns-twice logic specifically with the always-inliner that would have caught this, and fixed the bug. Thanks to Matt for the careful review and spotting this!!! =D llvm-svn: 153832
* Replace four tiny tests with various uses of grep and not with a singleChandler Carruth2012-04-015-46/+42
| | | | | | test and FileCheck. llvm-svn: 153831
* Add a triple to the test.Rafael Espindola2012-03-311-1/+1
| | | | llvm-svn: 153818
* Teach CodeGen's version of computeMaskedBits to understand the range metadata.Rafael Espindola2012-03-311-0/+14
|
| This is the CodeGen equivalent of r153747. I tested that there is no
| noticeable performance difference with any combination of -O0/-O2/-g
| when compiling gcc as a single compilation unit.
|
| llvm-svn: 153817
* Initial commit for the rewrite of the inline cost analysis to operateChandler Carruth2012-03-315-55/+153
| on a per-callsite walk of the called function's instructions, in
| breadth-first order over the potentially reachable set of basic blocks.
|
| This is a major shift in how inline cost analysis works to improve the
| accuracy and rationality of inlining decisions. A brief outline of the
| algorithm this moves to:
|
| - Build a simplification mapping based on the callsite arguments to the
|   function arguments.
| - Push the entry block onto a worklist of potentially-live basic blocks.
| - Pop the first block off of the *front* of the worklist (for
|   breadth-first ordering) and walk its instructions using a custom
|   InstVisitor.
| - For each instruction's operands, re-map them based on the
|   simplification mappings available for the given callsite.
| - Compute any simplification possible of the instruction after
|   re-mapping, and store that back into the simplification mapping.
| - Compute any bonuses, costs, or other impacts of the instruction on the
|   cost metric.
| - When the terminator is reached, replace any conditional value in the
|   terminator with any simplifications from the mapping we have, and add
|   any successors which are not proven to be dead from these
|   simplifications to the worklist.
| - Pop the next block off of the front of the worklist, and repeat.
| - As soon as the cost of inlining exceeds the threshold for the
|   callsite, stop analyzing the function in order to bound cost.
|
| The primary goal of this algorithm is to perfectly handle dead code
| paths. We do not want any code in trivially dead code paths to impact
| inlining decisions. The previous metric was *extremely* flawed here, and
| would always subtract the average cost of two successors of
| a conditional branch when it was proven to become an unconditional
| branch at the callsite. There was no handling of wildly different costs
| between the two successors, which would cause inlining when the path
| actually taken was too large, and no inlining when the path actually
| taken was trivially simple. There was also no handling of the code
| *path*, only the immediate successors. These problems vanish completely
| now. See the added regression tests for the shiny new features -- we
| skip recursive function calls, SROA-killing instructions, and high cost
| complex CFG structures when dead at the callsite being analyzed.
|
| Switching to this algorithm required refactoring the inline cost
| interface to accept the actual threshold rather than simply returning
| a single cost. The resulting interface is pretty bad, and I'm planning
| to do lots of interface cleanup after this patch.
|
| Several other refactorings fell out of this, but I've tried to minimize
| them for this patch. =/ There is still more cleanup that can be done
| here. Please point out anything that you see in review.
|
| I've worked really hard to try to mirror at least the spirit of all of
| the previous heuristics in the new model. It's not clear that they are
| all correct any more, but I wanted to minimize the change in this single
| patch, it's already a bit ridiculous. One heuristic that is *not* yet
| mirrored is to allow inlining of functions with a dynamic alloca *if*
| the caller has a dynamic alloca. I will add this back, but I think the
| most reasonable way requires changes to the inliner itself rather than
| just the cost metric, and so I've deferred this for a subsequent patch.
| The test case is XFAIL-ed until then.
|
| As mentioned in the review mail, this seems to make Clang run about 1%
| to 2% faster in -O0, but makes its binary size grow by just under 4%.
| I've looked into the 4% growth, and it can be fixed, but requires
| changes to other parts of the inliner.
|
| llvm-svn: 153812
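The outline in this commit message can be sketched as a small breadth-first walk. The Python below is a deliberately simplified model, not LLVM's implementation: each block carries one precomputed cost instead of being visited instruction by instruction, the "simplification mapping" is just a dict from argument names to known booleans, and `inline_cost`, `cfg`, and the block layout are all hypothetical names chosen for the example.

```python
from collections import deque

def inline_cost(blocks, entry, known_args, threshold):
    """Walk the callee breadth-first for one call site.

    Returns (cost, within_threshold). Blocks that are provably dead given
    `known_args` are never visited, so they contribute nothing to the cost.
    """
    worklist = deque([entry])
    seen = {entry}
    cost = 0
    while worklist:
        name = worklist.popleft()            # pop from the front: breadth-first
        block = blocks[name]
        cost += block["cost"]
        if cost > threshold:
            return cost, False               # stop early to bound analysis work
        term = block["term"]
        if term[0] == "br":
            successors = [term[1]]
        elif term[0] == "condbr":
            _, cond, if_true, if_false = term
            if cond in known_args:           # condition folds at this call site
                successors = [if_true if known_args[cond] else if_false]
            else:
                successors = [if_true, if_false]
        else:                                # "ret": no successors
            successors = []
        for succ in successors:
            if succ not in seen:
                seen.add(succ)
                worklist.append(succ)
    return cost, True

# Hypothetical callee: a cheap path and an expensive path behind one flag.
cfg = {
    "entry": {"cost": 1, "term": ("condbr", "flag", "cheap", "huge")},
    "cheap": {"cost": 2, "term": ("ret",)},
    "huge": {"cost": 500, "term": ("ret",)},
}
```

With `flag` known to be true at the call site, `inline_cost(cfg, "entry", {"flag": True}, 10)` only pays for `entry` and `cheap`; with no known arguments the walk reaches `huge` and gives up as soon as the threshold is crossed.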
* Clean up the naming in this test. Someone pointed this out in review atChandler Carruth2012-03-311-3/+3
| | | | | | one point, and I forgot to go back and clean it up. Sorry about that. =/ llvm-svn: 153801
* FileCheck-ize this test, and generally tidy it up prior to changingChandler Carruth2012-03-311-21/+26
| | | | | | things around. llvm-svn: 153799
* Correctly vectorize powi.Hal Finkel2012-03-311-0/+44
| | | | | | | | The powi intrinsic requires special handling because it always takes a single integer power regardless of the result type. As a result, we can vectorize only if the powers are equal. Fixes PR12364. llvm-svn: 153797
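The constraint described here can be modelled in a few lines. This Python sketch is an illustration under stated assumptions, not the vectorizer's code: `vectorize_powi` is a hypothetical helper, each scalar call is represented as a `(base, power)` pair, and `**` stands in for the intrinsic.

```python
def vectorize_powi(calls):
    """Fuse scalar powi calls into one vector call, if that is legal.

    `calls` is a list of (base, power) pairs. A vector powi still takes a
    single scalar integer power, so fusing is only possible when every call
    uses the same power. Returns the vector result, or None when the calls
    must stay scalar.
    """
    powers = {power for _, power in calls}
    if len(powers) != 1:
        return None                      # differing (or no) powers: do not fuse
    (power,) = powers
    return [base ** power for base, _ in calls]
```

For example, `[(2.0, 3), (3.0, 3)]` can be fused because both calls raise to the power 3, while `[(2.0, 3), (3.0, 2)]` cannot.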
* ARM assembler should prefer non-aliased encoding of cmp.Jim Grosbach2012-03-301-2/+5
| | | | | | | | When an immediate is both a value [t2_]so_imm and a [t2_]so_imm_neg, we want to use the non-negated form to make sure we prefer the normal encoding, not the aliased encoding via the negation of, e.g., 'cmp.w'. llvm-svn: 153770
* ARM encoding for VSWP got the second operand incorrect.Jim Grosbach2012-03-301-0/+7
| | | | | | | | | Make the non-tied register operand names line up with what the base class encoding handler expects. rdar://11157236 llvm-svn: 153766
* ARM integrated assembler should fix encoding choice for add/sub imm.Jim Grosbach2012-03-301-0/+8
| | | | | | | | | For 'adds r2, r2, #56' outside of an IT block, the 16-bit encoding T2 can be used for this syntax. Prefer the narrow encoding when possible. rdar://11156277 llvm-svn: 153759
* ARM assembly parsing needs to be paranoid about negative immediates.Jim Grosbach2012-03-301-1/+4
| | | | | | | | Make sure to treat immediates as unsigned when doing relative comparisons. rdar://11153621 llvm-svn: 153753
* Ensure conditional BL instructions for ARM are given the fixup ↵James Molloy2012-03-303-2/+28
| | | | | | | | fixup_arm_condbranch. Patch by Tim Northover! llvm-svn: 153737
* ARM target should allow codegenprep to duplicate ret instructions to enable ↵Evan Cheng2012-03-301-0/+42
| | | | | | tailcall opt. rdar://11140249 llvm-svn: 153717
* Testcase for r153710.Bill Wendling2012-03-301-0/+35
| | | | llvm-svn: 153711
* Add testcase for r153705Bill Wendling2012-03-301-0/+59
| | | | llvm-svn: 153706
* Change the constant in this testcase so that it results in a constant poolLang Hames2012-03-291-3/+3
| | | | | | load. llvm-svn: 153704
* Revert r153694. It was causing failures in the buildbots.Bill Wendling2012-03-2948-1075/+0
| | | | llvm-svn: 153701
* Filecheck-ize this test so that it actually tests something reasonable.Chandler Carruth2012-03-291-2/+13
| | | | llvm-svn: 153697
* Re-factored RuntimeDyld.Danil Malyshev2012-03-2948-0/+1075
| | | | | | Added ExecutionEngine/MCJIT tests. llvm-svn: 153694
* ARM assembly 'cmp lr, #0' should not encode using 'cmn'.Jim Grosbach2012-03-291-0/+2
| | | | | | | | | The CMP->CMN alias was matching for an immediate of zero when it should only match for negative values. rdar://11129224 llvm-svn: 153689
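The rule this fix enforces can be written down as a tiny selection function. The sketch below is a simplified Python model with a hypothetical name (`choose_compare`); it ignores whether the immediate is actually encodable and only shows which mnemonic is picked.

```python
def choose_compare(imm):
    """Pick the mnemonic and immediate for 'cmp rN, #imm'.

    'cmp rN, #-k' may be assembled as 'cmn rN, #k', but only for strictly
    negative immediates. Zero must stay a plain cmp, which is the case the
    alias was wrongly matching.
    """
    if imm < 0:
        return ("cmn", -imm)
    return ("cmp", imm)
```

Under this model `cmp lr, #0` stays `("cmp", 0)`, while `cmp lr, #-4` becomes `("cmn", 4)`.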
* The shuffle scheduler is only available in asserts build - make misched-new.llLang Hames2012-03-291-0/+1
| | | | | | testcase require asserts. llvm-svn: 153687
* Make x86 REP_MOV* and REP_STO instructions use the correct operand sizes in ↵Lang Hames2012-03-291-2/+3
| | | | | | 64-bit mode. llvm-svn: 153680
* Expand FREM.Akira Hatanaka2012-03-291-0/+13
| | | | llvm-svn: 153671
* Don't PRE compares.Jakob Stoklund Olesen2012-03-291-0/+68
| | | | | | | | | | | | CodeGenPrepare sinks compare instructions down to their uses to prevent live flags and predicate registers across basic blocks. PRE of a compare instruction prevents that, forcing the i1 compare result into a general purpose register. That is usually more expensive than the redundant compare PRE was trying to eliminate in the first place. llvm-svn: 153657
* For X86, change load/dec-or-inc/store into dec-or-inc, respectively.Joel Jones2012-03-292-67/+179
|
| This is a code change to add support for changing instruction sequences
| of the form:
|
|   load
|   inc/dec of 8/16/32/64 bits
|   store
|
| into the appropriate X86 inc/dec through memory instruction:
|
|   inc[qlwb] / dec[qlwb]
|
| The checks that were in X86DAGToDAGISel::Select(SDNode *Node)>>ISD::STORE
| have been extracted to isLoadIncOrDecStore and reworked to use the better
| named wrappers for getOperand(unsigned) (e.g. getOffset()) and replaced
| Chain.getNode() with LoadNode. The comments have also been expanded.
|
| llvm-svn: 153635
* Reverted to revision 153616 to unblock buildJoel Jones2012-03-292-179/+67
| | | | llvm-svn: 153623
* For X86, change load/dec-or-inc/store into dec-or-inc, respectively.Joel Jones2012-03-292-67/+179
|
| This is a code change to add support for changing instruction sequences
| of the form:
|
|   load
|   inc/dec of 8/16/32/64 bits
|   store
|
| into the appropriate X86 inc/dec through memory instruction:
|
|   inc[qlwb] / dec[qlwb]
|
| The checks that were in X86DAGToDAGISel::Select(SDNode *Node)>>ISD::STORE
| have been extracted to isLoadIncOrDecStore and reworked to use the better
| named wrappers for getOperand(unsigned) (e.g. getOffset()) and replaced
| Chain.getNode() with LoadNode. The comments have also been expanded.
|
| llvm-svn: 153617
* Don't kill the base register when expanding strd.Jakob Stoklund Olesen2012-03-281-0/+15
|
| When an strd instruction doesn't get the registers it wants, it can be
| expanded into two str instructions. Make sure the first str doesn't kill
| the base register in the case where the base and data registers are
| identical:
|
|   t2STRi12 %R0<kill>, %R0, 4, pred:14, pred:%noreg
|   t2STRi12 %R2<kill>, %R0, 8, pred:14, pred:%noreg
|
| <rdar://problem/11101911>
|
| llvm-svn: 153611
* Handle intrinsics in GlobalsModRef. Fixes pr12351.Rafael Espindola2012-03-281-0/+33
| | | | llvm-svn: 153604
* Spill DPair registers, not just QPR.Jakob Stoklund Olesen2012-03-281-1/+15
| | | | | | | | | The arm_neon intrinsics can create virtual registers from the DPair register class which allows both even-odd and odd-even D-register pairs. This fixes PR12389. llvm-svn: 153603
* Revert r153521 as it's causing large regressions on the nightly testers.Chad Rosier2012-03-281-15/+0
|
| Original commit message for r153521 (aka r153423):
|
| Use the new range metadata in computeMaskedBits and add a new optimization to
| instruction simplify that lets us remove an and when loading a boolean value.
|
| llvm-svn: 153587
* GlobalOpt: If we have an inbounds GEP from a ConstantAggregateZero global ↵Benjamin Kramer2012-03-281-0/+11
| | | | | | that we just determined to be constant, replace all loads from it with a zero value. llvm-svn: 153576
* Fixup VST1.32 with writeback instruction. Also re-factor non-writeback version.Richard Barton2012-03-281-0/+8
| | | | llvm-svn: 153573
* Switch to WeakVHs in the value mapper, and aggressively prune dead basicChandler Carruth2012-03-281-10/+0
| | | | | | | | | blocks in the function cloner. This removes the last case of trivially dead code that I've been seeing in the wild getting inlined, analyzed, re-inlined, optimized, only to be deleted. Nukes a FIXME from the cleanup tests. llvm-svn: 153572
* Fix the output of the DW_TAG_friend tag to include DW_AT_friendEric Christopher2012-03-281-0/+47
| | | | | | | | and not the rest of the member tag. Fixes PR11695 llvm-svn: 153570
* Fix test case.Akira Hatanaka2012-03-281-0/+2
| | | | llvm-svn: 153555
* Add a test for the previous commit. Also, remove two tests that wereEric Christopher2012-03-273-117/+31
| | | | | | | testing a) the wrong behavior or b) something that I'm already testing in the new test. llvm-svn: 153525
* Reapply r153423; the original commit was fine. The failing test, distray, had Chad Rosier2012-03-271-0/+15
| undefined behavior, which Rafael was kind enough to fix.
|
| Original commit message for r153423:
|
| Use the new range metadata in computeMaskedBits and add a new optimization to
| instruction simplify that lets us remove an and when loading a boolean value.
|
| llvm-svn: 153521
* Post-ra LICM should take care not to hoist an instruction that would clobber aEvan Cheng2012-03-271-0/+59
| | | | | | | | register that's read by the preheader terminator. rdar://11095580 llvm-svn: 153492
* ARM has a peephole optimization which looks for a def / use pair. The defEvan Cheng2012-03-261-0/+33
| produces a 32-bit immediate which is consumed by the use. It tries to
| fold the immediate by breaking it into two parts and folding them into the
| immediate fields of two uses. e.g.
|
|   movw    r2, #40885
|   movt    r3, #46540
|   add     r0, r0, r3
| =>
|   add.w   r0, r0, #3019898880
|   add.w   r0, r0, #30146560
|
| However, this transformation is incorrect if the user produces a flag. e.g.
|
|   movw    r2, #40885
|   movt    r3, #46540
|   adds    r0, r0, r3
| =>
|   add.w   r0, r0, #3019898880
|   adds.w  r0, r0, #30146560
|
| Note the adds.w may not set the carry flag even if the original sequence
| would.
|
| rdar://11116189
|
| llvm-svn: 153484
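The carry-flag problem can be reproduced with plain 32-bit arithmetic. The Python below is a sketch that models only the result and the carry flag; the function names and the chosen value of `r0` (`0x4C000000`) are assumptions picked to trigger the mismatch, while the two immediates are the ones from the commit message.

```python
MASK32 = 0xFFFFFFFF
HI = 3019898880   # 0xB4000000, first half of the split immediate
LO = 30146560     # 0x01CC0000, second half of the split immediate

def add32(a, b):
    """32-bit add: returns (wrapped result, carry-out)."""
    total = (a & MASK32) + (b & MASK32)
    return total & MASK32, total > MASK32

def adds_single(r0, imm):
    """Original sequence: one flag-setting add of the whole immediate."""
    return add32(r0, imm)

def adds_split(r0, hi, lo):
    """Folded sequence: add.w (flags untouched), then adds.w."""
    partial, _dropped_carry = add32(r0, hi)   # carry here is never recorded
    return add32(partial, lo)                 # flags come from this step only

single = adds_single(0x4C000000, HI + LO)
split = adds_split(0x4C000000, HI, LO)
```

Both sequences leave the same value in the register, but the carry generated by the first `add.w` is lost, so `single` and `split` disagree on the flag.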
* SCEV fix: Handle loop invariant loads.Andrew Trick2012-03-261-0/+47
| | | | | | Fixes PR11882: NULL dereference in ComputeLoadConstantCompareExitLimit. llvm-svn: 153480
* Unit test for PR11950: LSR crash.Andrew Trick2012-03-261-0/+49
| | | | llvm-svn: 153472
* Revert r153423 as this is causing failures on our internal nightly testers.Chad Rosier2012-03-261-15/+0
| | | | | | | | Original commit message: Use the new range metadata in computeMaskedBits and add a new optimization to instruction simplify that lets us remove an and when loading a boolean value. llvm-svn: 153452
* [tsan] treat vtable pointer updates in a special way (requires tbaa); fix a ↵Kostya Serebryany2012-03-262-0/+14
| | | | | | bug (forgot to return true after instrumenting); make sure the tsan tests are run llvm-svn: 153448
* Remove stale CBackend tests.Benjamin Kramer2012-03-2652-627/+0
| | | | llvm-svn: 153433
* Use the new range metadata in computeMaskedBits and add a new optimization toRafael Espindola2012-03-261-0/+15
| instruction simplify that lets us remove an and when loading a boolean value.
|
| llvm-svn: 153423
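The idea behind this optimization can be sketched as follows: if a loaded value is known to lie in a range such as [0, 2), its upper bits are known to be zero, and an `and` that only clears those bits does nothing. This Python model is an assumption-laden illustration, not LLVM's computeMaskedBits; it handles only non-wrapping unsigned ranges, and `known_zero_mask` and `and_is_redundant` are hypothetical names.

```python
def known_zero_mask(lo, hi, bits=8):
    """Bits guaranteed zero for any x with lo <= x < hi (unsigned, no wrap)."""
    assert 0 <= lo < hi <= (1 << bits)
    all_ones = (1 << bits) - 1
    max_val = hi - 1
    possibly_set = (1 << max_val.bit_length()) - 1   # bits max_val may occupy
    return all_ones & ~possibly_set

def and_is_redundant(lo, hi, mask, bits=8):
    """True if `x & mask == x` for every x in [lo, hi)."""
    all_ones = (1 << bits) - 1
    cleared_by_mask = all_ones & ~mask
    # The and is a no-op when every bit it clears is already known zero.
    return (cleared_by_mask & ~known_zero_mask(lo, hi, bits)) == 0
```

For a boolean stored in a byte with range [0, 2), `known_zero_mask(0, 2)` is `0xFE`, so `x & 1` can be dropped; for a range of [0, 4) the same mask would clear a bit that may be set, so the `and` has to stay.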