summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
* Don't PRE compares.Jakob Stoklund Olesen2012-03-291-1/+8
| | | | | | | | | | | | CodeGenPrepare sinks compare instructions down to their uses to prevent live flags and predicate registers across basic blocks. PRE of a compare instruction prevents that, forcing the i1 compare result into a general purpose register. That is usually more expensive than the redundant compare PRE was trying to eliminate in the first place. llvm-svn: 153657
* Replace assert(0) with llvm_unreachable to avoid warnings about dropping off ↵Benjamin Kramer2012-03-291-6/+5
| | | | | | the end of a non-void function in Release builds. llvm-svn: 153643
* Add support for objc property decls according to the page at:Eric Christopher2012-03-294-3/+42
| | | | | | | | | | http://llvm.org/docs/SourceLevelDebugging.html#objcproperty including type and DECL. Expand the metadata needed accordingly. rdar://11144023 llvm-svn: 153639
* Only allow symbolic names for (v)cmpss/sd/ps/pd encodings 8-31 to be used ↵Craig Topper2012-03-291-12/+13
| | | | | | with 'v' version of instructions. llvm-svn: 153636
* For X86, change load/dec-or-inc/store into dec-or-inc, respectively.Joel Jones2012-03-291-34/+94
| | | | | | | | | | | | | | | | | This is a code change to add support for changing instruction sequences of the form: load inc/dec of 8/16/32/64 bits store into the appropriate X86 inc/dec through memory instruction: inc[qlwb] / dec[qlwb] The checks that were in X86DAGToDAGISel::Select(SDNode *Node)>>ISD::STORE have been extracted to isLoadIncOrDecStore and reworked to use the better named wrappers for getOperand(unsigned) (e.g. getOffset()) and replaced Chain.getNode() with LoadNode. The comments have also been expanded. llvm-svn: 153635
* Reverted to revision 153616 to unblock buildJoel Jones2012-03-291-94/+34
| | | | llvm-svn: 153623
* For X86, change load/dec-or-inc/store into dec-or-inc, respectively.Joel Jones2012-03-291-34/+94
| | | | | | | | | | | | | | | | | This is a code change to add support for changing instruction sequences of the form: load inc/dec of 8/16/32/64 bits store into the appropriate X86 inc/dec through memory instruction: inc[qlwb] / dec[qlwb] The checks that were in X86DAGToDAGISel::Select(SDNode *Node)>>ISD::STORE have been extracted to isLoadIncOrDecStore and reworked to use the better named wrappers for getOperand(unsigned) (e.g. getOffset()) and replaced Chain.getNode() with LoadNode. The comments have also been expanded. llvm-svn: 153617
* Enable machine code verification in the entire code generator.Jakob Stoklund Olesen2012-03-282-10/+5
| | | | | | | | | | Some targets still mess up the liveness information, but that isn't verified after MRI->invalidateLiveness(). The verifier can still check other useful things like register classes and CFG, so it should be enabled after all passes. llvm-svn: 153615
* Enable machine code verification after PreSched2 passes.Jakob Stoklund Olesen2012-03-282-2/+4
| | | | | | | | | | The late scheduler depends on accurate liveness information if it is breaking anti-dependencies, so we should be able to verify it. Relax the terminator checking in the machine code verifier so it can handle the basic blocks created by if conversion. llvm-svn: 153614
* Don't kill the base register when expanding strd.Jakob Stoklund Olesen2012-03-281-0/+4
| | | | | | | | | | | | | | When an strd instruction doesn't get the registers it wants, it can be expanded into two str instructions. Make sure the first str doesn't kill the base register in the case where the base and data registers are identical: t2STRi12 %R0<kill>, %R0, 4, pred:14, pred:%noreg t2STRi12 %R2<kill>, %R0, 8, pred:14, pred:%noreg <rdar://problem/11101911> llvm-svn: 153611
* Preserve implicit defs in ARMLoadStoreOptimizer.Jakob Stoklund Olesen2012-03-282-4/+23
| | | | | | | | | | | When a number of sub-register VLRDS instructions are combined into a VLDM, preserve any super-register implicit defs. This is required to keep the register scavenger and machine code verifier happy. Enable machine code verification after ARMLoadStoreOptimizer. ARM/2012-01-26-CopyPropKills.ll was failing because of this. llvm-svn: 153610
* Move getPointerToNamedFunction() from JIT/MCJIT to JITMemoryManager.Danil Malyshev2012-03-2810-332/+208
| | | | llvm-svn: 153607
* Handle intrinsics in GlobalsModRef. Fixes pr12351.Rafael Espindola2012-03-281-0/+6
| | | | llvm-svn: 153604
* Spill DPair registers, not just QPR.Jakob Stoklund Olesen2012-03-282-6/+6
| | | | | | | | | The arm_neon intrinsics can create virtual registers from the DPair register class which allows both even-odd and odd-even D-register pairs. This fixes PR12389. llvm-svn: 153603
* Also verify after ExpandPostRAPseudos.Jakob Stoklund Olesen2012-03-281-1/+1
| | | | llvm-svn: 153599
* Enable machine code verification after the late machine optimization passes.Jakob Stoklund Olesen2012-03-281-3/+3
| | | | | | | Branch folding invalidates liveness and disables liveness verification on some targets. llvm-svn: 153597
* Skip liveness verification when MRI->tracksLiveness() is false.Jakob Stoklund Olesen2012-03-281-105/+112
| | | | | | | | | Extract the liveness verification into its own method. This makes it possible to run the machine code verifier after liveness information is no longer required to be valid. llvm-svn: 153596
* Revert r153516: "Invalidate liveness in Thumb2ITBlockPass."Jakob Stoklund Olesen2012-03-282-8/+0
| | | | | | | | | | | | | | Revert r153519: "ARMLoadStoreOptimizer invalidates register liveness." These patches caused miscompilations in povray by turning off branch folding's updating of live-in lists. It turns out the the late scheduler depends on the live-in lists, even if it doesn't need correct kill flags. <rdar://problem/11139228> llvm-svn: 153593
* Allow removeLiveIn to be called with a register that isn't live-in.Jakob Stoklund Olesen2012-03-281-2/+2
| | | | | | | | | This avoids the silly double search: if (isLiveIn(Reg)) removeLiveIn(Reg); llvm-svn: 153592
* Revert r153521 as it's causing large regressions on the nightly testers.Chad Rosier2012-03-282-41/+0
| | | | | | | | Original commit message for r153521 (aka r153423): Use the new range metadata in computeMaskedBits and add a new optimization to instruction simplify that lets us remove an and when loding a boolean value. llvm-svn: 153587
* Fixed commuteInstructions bug where if its called pre-regalloc the subreg ↵Pete Cooper2012-03-281-6/+15
| | | | | | indices weren't commuted llvm-svn: 153579
* GlobalOpt: If we have an inbounds GEP from a ConstantAggregateZero global ↵Benjamin Kramer2012-03-281-0/+6
| | | | | | that we just determined to be constant, replace all loads from it with a zero value. llvm-svn: 153576
* Add another note about a missed compare with nsw arithmetic instcombine.Benjamin Kramer2012-03-281-0/+7
| | | | llvm-svn: 153574
* Fixup VST1.32 with writeback instruction. Also re-factor non-writeback version.Richard Barton2012-03-281-21/+13
| | | | llvm-svn: 153573
* Switch to WeakVHs in the value mapper, and aggressively prune dead basicChandler Carruth2012-03-281-3/+23
| | | | | | | | | blocks in the function cloner. This removes the last case of trivially dead code that I've been seeing in the wild getting inlined, analyzed, re-inlined, optimized, only to be deleted. Nukes a FIXME from the cleanup tests. llvm-svn: 153572
* More debug output.Eric Christopher2012-03-281-1/+2
| | | | llvm-svn: 153571
* Fix the output of the DW_TAG_friend tag to include DW_AT_friendEric Christopher2012-03-282-8/+16
| | | | | | | | and not the rest of the member tag. Fixes PR11695 llvm-svn: 153570
* Turn off post-RA scheduler by default.Akira Hatanaka2012-03-281-1/+1
| | | | llvm-svn: 153557
* Fix 80-column violation.Chad Rosier2012-03-281-2/+2
| | | | llvm-svn: 153556
* Turn on post register allocation scheduler.Akira Hatanaka2012-03-284-0/+22
| | | | llvm-svn: 153554
* Sort relocation entries before they are written out to a file. MIPS ABIAkira Hatanaka2012-03-281-0/+103
| | | | | | | imposes a constraint that GOT16 referring to a local symbol or HI16 has to be followed immediately by a matching LO16 relocation. llvm-svn: 153553
* Emit all directives except for ".cprestore" during asm printing rather than emitAkira Hatanaka2012-03-287-151/+188
| | | | | | | | | | | | | | | | them as machine instructions. Directives ".set noat" and ".set at" are now emitted only at the beginning and end of a function except in the case where they are emitted to enclose .cpload with an immediate operand that doesn't fit in 16-bit field or unaligned load/stores. Also, make the following changes: - Remove function isUnalignedLoadStore and use a switch-case statement to determine whether an instruction is an unaligned load or store. - Define helper function CreateMCInst which generates an instance of an MCInst from an opcode and a list of operands. llvm-svn: 153552
* Mark flag neverHasSideEffects of pattern-less instructions that do not haveAkira Hatanaka2012-03-281-0/+5
| | | | | | any side effects. llvm-svn: 153551
* Add a note about a cute little fabs optimization.Benjamin Kramer2012-03-271-0/+5
| | | | llvm-svn: 153543
* Add two missed instcombines related to compares with nsw arithmetic.Benjamin Kramer2012-03-271-0/+12
| | | | llvm-svn: 153542
* Remove trailing white space.Akira Hatanaka2012-03-271-1/+1
| | | | llvm-svn: 153536
* Use a SmallVector and linear lookup instead of a DenseSet - SourceMap valuesLang Hames2012-03-271-11/+16
| | | | | | | will always be tiny sets, so DenseSet is overkill (SmallSet won't work as we need iteration support). llvm-svn: 153529
* Add member EmitNOAT and its setter and getter functions to class ↵Akira Hatanaka2012-03-271-1/+6
| | | | | | | | | MipsFunctionInfo. If EmitNOAT is true, directives ".set noat" and ".set at" are emitted at the beginning and end of a function. llvm-svn: 153528
* Use DW_AT_low_pc for a single entry point into a routine.Eric Christopher2012-03-271-3/+3
| | | | | | Fixes PR10105 llvm-svn: 153524
* Reapply r153423; the original commit was fine. The failing test, distray, had Chad Rosier2012-03-272-0/+41
| | | | | | | | | | undefined behavior, which Rafael was kind enough to fix. Original commit message for r153423: Use the new range metadata in computeMaskedBits and add a new optimization to instruction simplify that lets us remove an and when loding a boolean value. llvm-svn: 153521
* ARMLoadStoreOptimizer invalidates register liveness.Jakob Stoklund Olesen2012-03-271-0/+4
| | | | | | | | | | This pass tries to update kill flags, but there are still many bugs. Passes after the load/store optimizer don't need accurate liveness, so don't even try. <rdar://problem/11101911> llvm-svn: 153519
* Print SSA and liveness tracking flags in MF::print().Jakob Stoklund Olesen2012-03-271-1/+7
| | | | llvm-svn: 153518
* Branch folding may invalidate liveness.Jakob Stoklund Olesen2012-03-271-2/+9
| | | | | | | | Branch folding can use a register scavenger to update liveness information when required. Don't do that if liveness information is already invalid. llvm-svn: 153517
* Invalidate liveness in Thumb2ITBlockPass.Jakob Stoklund Olesen2012-03-271-0/+4
| | | | llvm-svn: 153516
* fix what looks like a real logic bug, found by PVS-Studio (part of PR12357)Chris Lattner2012-03-271-2/+2
| | | | llvm-svn: 153513
* Add an MRI::tracksLiveness() flag.Jakob Stoklund Olesen2012-03-272-1/+6
| | | | | | | | | | | | | | | | | | | | Late optimization passes like branch folding and tail duplication can transform the machine code in a way that makes it expensive to keep the register liveness information up to date. There is a fuzzy line between register allocation and late scheduling where the liveness information degrades. The MRI::tracksLiveness() flag makes the line clear: While true, liveness information is accurate, and can be used for register scavenging. Once the flag is false, liveness information is not accurate, and can only be used as a hint. Late passes generally don't need the liveness information, but they will sometimes use the register scavenger to help update it. The scavenger enforces strict correctness, and we have to spend a lot of code to update register liveness that may never be used. llvm-svn: 153511
* Make a seemingly tiny change to the inliner and fix the generated codeChandler Carruth2012-03-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | size bloat. Unfortunately, I expect this to disable the majority of the benefit from r152737. I'm hopeful at least that it will fix PR12345. To explain this requires... quite a bit of backstory I'm afraid. TL;DR: The change in r152737 actually did The Wrong Thing for linkonce-odr functions. This change makes it do the right thing. The benefits we saw were simple luck, not any actual strategy. Benchmark numbers after a mini-blog-post so that I've written down my thoughts on why all of this works and doesn't work... To understand what's going on here, you have to understand how the "bottom-up" inliner actually works. There are two fundamental modes to the inliner: 1) Standard fixed-cost bottom-up inlining. This is the mode we usually think about. It walks from the bottom of the CFG up to the top, looking at callsites, taking information about the callsite and the called function and computing th expected cost of inlining into that callsite. If the cost is under a fixed threshold, it inlines. It's a touch more complicated than that due to all the bonuses, weights, etc. Inlining the last callsite to an internal function gets higher weighth, etc. But essentially, this is the mode of operation. 2) Deferred bottom-up inlining (a term I just made up). This is the interesting mode for this patch an r152737. Initially, this works just like mode #1, but once we have the cost of inlining into the callsite, we don't just compare it with a fixed threshold. First, we check something else. Let's give some names to the entities at this point, or we'll end up hopelessly confused. We're considering inlining a function 'A' into its callsite within a function 'B'. We want to check whether 'B' has any callers, and whether it might be inlined into those callers. If so, we also check whether inlining 'A' into 'B' would block any of the opportunities for inlining 'B' into its callers. We take the sum of the costs of inlining 'B' into its callers where that inlining would be blocked by inlining 'A' into 'B', and if that cost is less than the cost of inlining 'A' into 'B', then we skip inlining 'A' into 'B'. Now, in order for #2 to make sense, we have to have some confidence that we will actually have the opportunity to inline 'B' into its callers when cheaper, *and* that we'll be able to revisit the decision and inline 'A' into 'B' if that ever becomes the correct tradeoff. This often isn't true for external functions -- we can see very few of their callers, and we won't be able to re-consider inlining 'A' into 'B' if 'B' is external when we finally see more callers of 'B'. There are two cases where we believe this to be true for C/C++ code: functions local to a translation unit, and functions with an inline definition in every translation unit which uses them. These are represented as internal linkage and linkonce-odr (resp.) in LLVM. I enabled this logic for linkonce-odr in r152737. Unfortunately, when I did that, I also introduced a subtle bug. There was an implicit assumption that the last caller of the function within the TU was the last caller of the function in the program. We want to bonus the last caller of the function in the program by a huge amount for inlining because inlining that callsite has very little cost. Unfortunately, the last caller in the TU of a linkonce-odr function is *not* the last caller in the program, and so we don't want to apply this bonus. If we do, we can apply it to one callsite *per-TU*. Because of the way deferred inlining works, when it sees this bonus applied to one callsite in the TU for 'B', it decides that inlining 'B' is of the *utmost* importance just so we can get that final bonus. It then proceeds to essentially force deferred inlining regardless of the actual cost tradeoff. The result? PR12345: code bloat, code bloat, code bloat. Another result is getting *damn* lucky on a few benchmarks, and the over-inlining exposing critically important optimizations. I would very much like a list of benchmarks that regress after this change goes in, with bitcode before and after. This will help me greatly understand what opportunities the current cost analysis is missing. Initial benchmark numbers look very good. WebKit files that exhibited the worst of PR12345 went from growing to shrinking compared to Clang with r152737 reverted. - Bootstrapped Clang is 3% smaller with this change. - Bootstrapped Clang -O0 over a single-source-file of lib/Lex is 4% faster with this change. Please let me know about any other performance impact you see. Thanks to Nico for reporting and urging me to actually fix, Richard Smith, Duncan Sands, Manuel Klimek, and Benjamin Kramer for talking through the issues today. llvm-svn: 153506
* Prune some includesCraig Topper2012-03-2730-33/+6
| | | | llvm-svn: 153502
* Remove unnecessary llvm:: qualificationsCraig Topper2012-03-2718-261/+261
| | | | llvm-svn: 153500
* Pass the llvm IR pointer value and offset to the constructor ofAkira Hatanaka2012-03-271-9/+13
| | | | | | | | | | | | MachinePointerInfo when getStore is called to create a node that stores an argument passed in register to the stack. Without this change, the post RA scheduler will fail to discover the dependencies between the stores instructions and the instructions that load from a structure passed by value. The link to the related discussion is here: http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-March/048055.html llvm-svn: 153499
OpenPOWER on IntegriCloud