path: root/llvm/lib/CodeGen
Commit message | Author | Age | Files | Lines
* DAGCombiner: simplify by using condition variables; NFC | Matthias Braun | 2015-01-13 | 2 | -18/+15
  llvm-svn: 225836
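For readers unfamiliar with the idiom named in the commit title above, a minimal sketch of a C++ condition variable (a variable declared inside the `if` condition) follows. This is not the code from the commit; `lookup`, `valueOrZeroVerbose`, and `valueOrZero` are hypothetical names chosen for illustration.

```cpp
#include <map>
#include <string>

// Hypothetical helper: returns a pointer to the stored value, or nullptr.
const int *lookup(const std::map<std::string, int> &table,
                  const std::string &key) {
  auto it = table.find(key);
  return it == table.end() ? nullptr : &it->second;
}

// Before: the pointer stays in scope after the check that needs it.
int valueOrZeroVerbose(const std::map<std::string, int> &table,
                       const std::string &key) {
  const int *v = lookup(table, key);
  if (v)
    return *v;
  return 0;
}

// After: the variable is declared in the condition, so its scope is
// limited to the if statement. Behavior is unchanged.
int valueOrZero(const std::map<std::string, int> &table,
                const std::string &key) {
  if (const int *v = lookup(table, key))
    return *v;
  return 0;
}
```

Both functions return the same result; the second form only narrows the scope of `v`, which is why such a change can be marked NFC (no functional change).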
* R600: Implement getRecipEstimate | Matt Arsenault | 2015-01-13 | 2 | -1/+3
  This requires a new hook to prevent expanding sqrt in terms of rsqrt and reciprocal. v_rcp_f32, v_rsq_f32, and v_sqrt_f32 are all the same rate, so this expansion would just double the number of instructions and cycles.
  llvm-svn: 225828
* [StackMaps] Use CurrentFnSymForSize | Hal Finkel | 2015-01-13 | 1 | -1/+1
  When computing the call-site offset, use AP.CurrentFnSymForSize instead of AP.CurrentFnSym. There should be no change for other targets, but this is necessary for generating valid expressions for PPC64/ELF.
  llvm-svn: 225807
* [StackMaps] Mark in CallLoweringInfo when lowering a patchpoint | Hal Finkel | 2015-01-13 | 3 | -4/+7
  While, generally speaking, the process of lowering arguments for a patchpoint is the same as lowering a regular indirect call, on some targets it may not be exactly the same. Targets may not, for example, want to add additional register dependencies that apply only to making cross-DSO calls through linker stubs, may not want to load additional registers out of function descriptors, and may not want to add additional side-effect-causing instructions that cannot be removed later with the call itself being generated.
  The PowerPC target will use this in a future commit (for all of the reasons stated above).
  llvm-svn: 225806
* [StackMaps] Allow the target to pre-process the live-out mask | Hal Finkel | 2015-01-13 | 1 | -0/+2
  Some targets, PowerPC for example, have pseudo-registers (such as that used to represent the rounding mode), that don't have DWARF register numbers or a register class. These are used only for internal dependency tracking, and should not appear in the recorded live-outs. This adds a callback allowing the target to pre-process the live-out mask in order to remove these kinds of registers so that the StackMaps code does not complain about them and/or attempt to include them in the output.
  This will be used by the PowerPC target in a future commit.
  llvm-svn: 225805
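The callback described above can be pictured with a small standalone sketch. It does not use the real LLVM interfaces: `LiveOutMask`, `TargetHooks`, `adjustLiveOutMask`, and the register numbering are all assumptions made up for this example.

```cpp
#include <bitset>
#include <cstddef>

// Hypothetical model: one bit per register, bit set means "live out".
constexpr std::size_t NumRegs = 8;
using LiveOutMask = std::bitset<NumRegs>;

// Assumed numbering: register 7 is a pseudo-register (for example a
// rounding-mode register) with no DWARF number or register class.
constexpr std::size_t PseudoRoundingMode = 7;

struct TargetHooks {
  virtual ~TargetHooks() = default;
  // Default: leave the mask untouched, so other targets see no change.
  virtual void adjustLiveOutMask(LiveOutMask &) const {}
};

struct PseudoRegTarget : TargetHooks {
  // Target-specific pre-processing: drop the pseudo-register.
  void adjustLiveOutMask(LiveOutMask &mask) const override {
    mask.reset(PseudoRoundingMode);
  }
};

// The stack-map code calls the hook before recording live-outs.
LiveOutMask recordedLiveOuts(const TargetHooks &target, LiveOutMask mask) {
  target.adjustLiveOutMask(mask);
  return mask;
}
```

With the default hook the mask passes through unchanged; with the overriding target the pseudo-register bit is cleared before anything is recorded.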
* Added TLI hook for isFPExtFree. Some of the FMA combine heuristics are now guarded with that hook. | Olivier Sallenave | 2015-01-13 | 1 | -63/+70
  llvm-svn: 225795
* Peephole opt needs optimizeSelect() to keep track of newly created MIs | Mehdi Amini | 2015-01-13 | 1 | -4/+13
  Peephole optimizer is scanning a basic block forward. At some point it needs to answer the question "given a pointer to an MI in the current BB, is it located before or after the current instruction".
  To perform this, it keeps a set of the MIs already seen during the scan; if an MI is not in the set, it is assumed to be after. It means that newly created MIs have to be inserted in the set as well.
  This commit passes the set as an argument to the target-dependent optimizeSelect() so that it can properly update the set with the (potentially) newly created MIs.
  llvm-svn: 225772
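A minimal sketch of the bookkeeping described above, under the assumption that instructions are identified by pointer. `Instr`, `ForwardScan`, and its member names are hypothetical and not taken from the peephole optimizer.

```cpp
#include <unordered_set>

// Hypothetical stand-in for a machine instruction.
struct Instr {
  int id;
};

class ForwardScan {
  // Instructions already visited by the forward scan, plus any
  // instructions created during the scan.
  std::unordered_set<const Instr *> seen;

public:
  // Called for each instruction as the scan moves forward.
  void visit(const Instr *i) { seen.insert(i); }

  // Called by a transformation that creates a new instruction before
  // the current position; without this, the query below would wrongly
  // report the new instruction as "after".
  void noteCreated(const Instr *i) { seen.insert(i); }

  // "Is this instruction located before the current one?"
  // Anything not in the set is assumed to come later.
  bool isBeforeCurrent(const Instr *i) const { return seen.count(i) != 0; }
};
```

Passing the set (here, the `ForwardScan` object) into the transformation is what lets target-specific code keep the before/after query accurate.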
* Rename llvm.recoverframeallocation to llvm.framerecover | Reid Kleckner | 2015-01-13 | 1 | -3/+3
  This name is less descriptive, but it sort of puts things in the 'llvm.frame...' namespace, relating it to frameallocate and frameaddress. It also avoids using "allocate" and "allocation" together.
  llvm-svn: 225752
* Add the llvm.frameallocate and llvm.recoverframeallocation intrinsics | Reid Kleckner | 2015-01-13 | 5 | -0/+90
  These intrinsics allow multiple functions to share a single stack allocation from one function's call frame. The function with the allocation may only perform one allocation, and it must be in the entry block.
  Functions accessing the allocation call llvm.recoverframeallocation with the function whose frame they are accessing and a frame pointer from an active call frame of that function.
  These intrinsics are very difficult to inline correctly, so the intention is that they be introduced rarely, or at least very late during EH preparation.
  Reviewers: echristo, andrew.w.kaylor
  Differential Revision: http://reviews.llvm.org/D6493
  llvm-svn: 225746
* Combine fcmp + select to fminnum / fmaxnum if no nans and legal | Matt Arsenault | 2015-01-13 | 1 | -0/+59
  Also require unsafe FP math for now since there isn't a way to test for signed zeros.
  llvm-svn: 225744
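To see why the combine above needs the "no NaNs" condition, the following sketch compares a compare-and-select minimum with the standard library's `std::fmin`. It is an illustration in plain C++, not DAGCombiner code; `selectMin` and `agreesWithFmin` are hypothetical names.

```cpp
#include <cmath>

// The fcmp + select pattern written as source code: (a < b) ? a : b.
double selectMin(double a, double b) { return a < b ? a : b; }

// True when the select pattern and fmin produce equal values.
bool agreesWithFmin(double a, double b) {
  return selectMin(a, b) == std::fmin(a, b);
}
```

For ordinary non-NaN, non-zero inputs the two agree. They diverge when the second operand is NaN: the comparison is false, so the select returns the NaN, while `std::fmin` returns the non-NaN operand. Signed zeros are a separate concern, since `-0.0 < 0.0` is false and the select pattern therefore cannot order the two zeros.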
* Debug Info: Move support for constants into DwarfExpression. | Adrian Prantl | 2015-01-13 | 4 | -37/+65
  Move the declaration of DebugLocDwarfExpression into DwarfExpression.h because it needs to be accessed from AsmPrinterDwarf.cpp and DwarfDebug.cpp.
  NFC.
  llvm-svn: 225734
* Make DwarfExpression store the AsmPrinter instead of the TargetMachine. | Adrian Prantl | 2015-01-12 | 4 | -17/+26
  NFC.
  llvm-svn: 225731
* remove extra semicolon | Adrian Prantl | 2015-01-12 | 1 | -1/+1
  llvm-svn: 225730
* musttail: Only set the inreg flag for fastcall and vectorcall | Reid Kleckner | 2015-01-12 | 1 | -3/+16
  Otherwise we'll attempt to forward ECX, EDX, and EAX for cdecl and stdcall thunks, leaving us with no scratch registers for indirect call targets.
  Fixes PR22052.
  llvm-svn: 225729
* Run clang-format on the parts of AsmPrinterDwarf where it improves the readability. | Adrian Prantl | 2015-01-12 | 1 | -12/+10
  llvm-svn: 225726
* Debug Info: Add a virtual destructor to DwarfExpression. | Adrian Prantl | 2015-01-12 | 1 | -0/+1
  Thanks Chandler for noticing!
  llvm-svn: 225724
* Untwine this expression. Thanks to David for noticing! | Adrian Prantl | 2015-01-12 | 1 | -1/+1
  llvm-svn: 225720
* Debug Info: Implement DwarfUnit::addRegisterOpPiece() using DwarfExpression. | Adrian Prantl | 2015-01-12 | 2 | -57/+4
  NFC.
  llvm-svn: 225717
* Debug Info: Implement DwarfUnit::addRegisterOffset using DwarfExpression. | Adrian Prantl | 2015-01-12 | 5 | -16/+60
  No functional change.
  llvm-svn: 225707
* Debug info: Factor out the creation of DWARF expressions from AsmPrinter into a new class DwarfExpression that can be shared between AsmPrinter and DwarfUnit. | Adrian Prantl | 2015-01-12 | 4 | -136/+251
  This is the first step towards unifying the two entirely redundant implementations of DWARF expression emission in DwarfUnit and AsmPrinter.
  Almost no functional change. Testcases were updated because asm comments that used to be on two lines now appear on the same line, which is actually preferable.
  llvm-svn: 225706
* RegisterCoalescer: Turn some impossible conditions into asserts | Matthias Braun | 2015-01-12 | 1 | -17/+11
  This is a fixed version of reverted r225500. It fixes the too early if() continue; of the last patch and adds a comment to the unorthodox loop.
  llvm-svn: 225652
* [SimplifyLibCalls] Factor out fortified libcall handling. | Ahmed Bougacha | 2015-01-12 | 1 | -20/+10
  This lets us remove CGP duplicate.
  Differential Revision: http://reviews.llvm.org/D6541
  llvm-svn: 225640
* Revert r225500, it leads to infinite loops. | Joerg Sonnenberger | 2015-01-10 | 1 | -9/+15
  llvm-svn: 225590
* Recommit r224935 with a fix for the ObjC++/AArch64 bug that that revision introduced. | Lang Hames | 2015-01-09 | 1 | -54/+0
  A test case for the bug was already committed in r225385.
  Patch by Rafael Espindola.
  llvm-svn: 225534
* RegisterCoalescer: Fix removeCopyByCommutingDef with subreg liveness | Matthias Braun | 2015-01-09 | 1 | -1/+3
  The code that eliminated additional coalescable copies in removeCopyByCommutingDef() used MergeValueNumberInto() which internally may merge A into B or B into A. In this case A and B had different Def points, so we have to reset ValNo.Def to the intended one after merging.
  llvm-svn: 225503
* RegisterCoalescer: Some cleanup in removeCopyByCommutingDef(), NFC | Matthias Braun | 2015-01-09 | 1 | -15/+19
  llvm-svn: 225502
* RegisterCoalescer: No need to set kill flags, they are recomputed later anyway | Matthias Braun | 2015-01-09 | 1 | -2/+0
  llvm-svn: 225501
* RegisterCoalescer: Turn some impossible conditions into asserts | Matthias Braun | 2015-01-09 | 1 | -15/+9
  llvm-svn: 225500
* [DAGCombine] Remainder of fix to r225380 (More FMA folding opportunities) | Hal Finkel | 2015-01-09 | 1 | -10/+24
  As pointed out by Aditya (and Owen), when we elide an FP extend to form an FMA, we need to extend the incoming operands so that the resulting node will really be legal. This is currently enabled only for PowerPC, and it happens to work there regardless, but this should fix the functionality for everyone else should anyone else wish to use it.
  llvm-svn: 225492
* Partial fix to r225380 (More FMA folding opportunities) | Hal Finkel | 2015-01-09 | 1 | -96/+95
  As pointed out by Aditya (and Owen), there are two things wrong with this code. First, it adds patterns which elide FP extends when forming FMAs, and that might not be profitable on all targets (it belongs behind the pre-existing aggressive-FMA-formation flag). This is fixed by this change.
  Second, the resulting nodes might have operands of different types (the extensions need to be re-added). That will be fixed in the follow-up commit.
  llvm-svn: 225485
* [MachineLICM] A command-line option to hoist even cheap instructions | Hal Finkel | 2015-01-08 | 1 | -1/+6
  Add a command-line option to enable hoisting even cheap instructions (in low-register-pressure situations). This is turned off by default, but has proved useful for testing purposes.
  llvm-svn: 225470
* CodeGen: Use handy new-fangled post-increment, NFC | Duncan P. N. Exon Smith | 2015-01-08 | 1 | -1/+1
  Drive-by cleanup; I noticed this when reviewing the patch that became r225466.
  llvm-svn: 225468
* CodeGen: Use range-based for loops, NFC | Duncan P. N. Exon Smith | 2015-01-08 | 1 | -5/+5
  Patch by Ramkumar Ramachandra!
  llvm-svn: 225466
* Masked Load/Store - fixed a bug in type legalization. | Elena Demikhovsky | 2015-01-08 | 3 | -3/+107
  llvm-svn: 225441
* Fix include ordering, NFC. | Michael Kuperstein | 2015-01-08 | 1 | -1/+1
  llvm-svn: 225439
* Move SPAdj logic from PEI into the targets (NFC) | Michael Kuperstein | 2015-01-08 | 2 | -11/+34
  PEI tries to keep track of how much starting or ending a call sequence adjusts the stack pointer by, so that it can resolve frame-index references. Currently, it takes a very simplistic view of how SP adjustments are done - both FrameStartOpcode and FrameDestroyOpcode adjust it exactly by the amount written in its first argument.
  This view is in fact incorrect for some targets (e.g. due to stack re-alignment, or because it may want to adjust the stack pointer in multiple steps). However, that doesn't cause breakage, because most targets (the only in-tree exception appears to be 32-bit ARM) rely on being able to simplify the call frame pseudo-instructions earlier, so this code is never hit.
  Moving the computation into TargetInstrInfo allows targets to override the way the adjustment is computed if they need to have a non-zero SPAdj.
  Differential Revision: http://reviews.llvm.org/D6863
  llvm-svn: 225437
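The move described above follows a common pattern: a default computation in a base class that targets may override. The sketch below illustrates that pattern only. `CallFramePseudo`, `InstrInfoSketch`, `spAdjust`, the sign convention, and the 16-byte alignment are assumptions for this example and are not the TargetInstrInfo interface.

```cpp
#include <cstdint>

// Hypothetical model of a call-frame pseudo-instruction: either the
// start ("setup") or the end ("destroy") of a call sequence, with the
// amount taken from its first argument.
struct CallFramePseudo {
  bool isSetup;
  std::int64_t amount;
};

struct InstrInfoSketch {
  virtual ~InstrInfoSketch() = default;

  // Default, simplistic view: the stack pointer moves by exactly the
  // written amount. Sign convention (assumed here): positive on setup,
  // negative on destroy.
  virtual std::int64_t spAdjust(const CallFramePseudo &p) const {
    return p.isSetup ? p.amount : -p.amount;
  }
};

// A target that re-aligns the stack can override the computation.
struct AlignedTarget : InstrInfoSketch {
  std::int64_t spAdjust(const CallFramePseudo &p) const override {
    // Round the amount up to an assumed 16-byte stack alignment.
    std::int64_t aligned = (p.amount + 15) / 16 * 16;
    return p.isSetup ? aligned : -aligned;
  }
};
```

Generic code that tracks the running adjustment only calls `spAdjust`, so it does not need to know how a particular target rounds or splits its adjustments.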
* [RegAllocGreedy] Introduce a late pass to repair broken hints. | Quentin Colombet | 2015-01-08 | 3 | -2/+212
  A broken hint is a copy where both ends are assigned different colors. When a variable gets evicted in the neighborhood of such copies, it is likely we can reconcile some of them.

  ** Context **
  Copies are inserted during the register allocation via splitting. These split points are required to relax the constraints on the allocation problem. When such a point is inserted, both ends of the copy would not share the same color with respect to the current allocation problem. When variables get evicted, the allocation problem becomes different and some split points may not be required anymore. However, the related variables may already have been colored.
  This usually shows up in the assembly with a pattern like this:
    def A
    ...
    save A to B
    def A
    use A
    restore A from B
    ...
    use B
  Whereas we could simply have done:
    def B
    ...
    def A
    use A
    ...
    use B

  ** Proposed Solution **
  A variable having a broken hint is marked for late recoloring if and only if selecting a register for it evicts another variable. Indeed, if no eviction happens it is pointless to look for recoloring opportunities, as it means the situation was the same as the initial allocation problem where we had to break the hint.
  Finally, when everything has been allocated, we look for recoloring opportunities for all the identified candidates. The recoloring is performed very late to rely on accurate copy cost (all involved variables are allocated).
  The recoloring is simple, unlike the last chance recoloring. It propagates the color of the broken hint to all its copy-related variables. If the color is available for them, the recoloring uses it, otherwise it gives up on that hint even if a more complex coloring would have worked.
  The recoloring happens only if it is profitable. The profitability is evaluated using the expected frequency of the copies of the currently recolored variable with a) its current color and b) with the target color. If a) is greater than or equal to b), then it is profitable and the recoloring happens.

  ** Example **
  Consider the following example:
    BB1:
      a =
      b =
    BB2:
      ...
      = b
      = a
  Let us assume b gets split:
    BB1:
      a =
      b =
    BB2:
      c = b
      ...
      d = c
      = d
      = a
  Because of how the allocation works, b, c, and d may be assigned different colors. Now, if a gets evicted to make room for c, assuming b and d were assigned to something different than a, we end up with:
    BB1:
      a =
      st a, SpillSlot
      b =
    BB2:
      c = b
      ...
      d = c
      = d
      e = ld SpillSlot
      = e
  It is likely that we can assign the same register for b, c, and d, getting rid of 2 copies.

  ** Performances **
  Both ARM64 and x86_64 show performance improvements of up to 3% for the llvm-testsuite + externals with Os and O3. There are a few regressions too that come from the (in)accuracy of the block frequency estimate.
  <rdar://problem/18312047>
  llvm-svn: 225422
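The profitability rule stated in the commit message above can be sketched as a small cost comparison. This is one possible reading under stated assumptions, not the RegAllocGreedy implementation: `CopyHint`, `brokenCopyCost`, and `isProfitableRecoloring` are hypothetical names, and the cost of a color is modeled as the summed frequency of the copies that stay broken (other end has a different color) under that color.

```cpp
#include <vector>

// Hypothetical model of a copy involving the variable being recolored:
// how often the copy is expected to execute, and the color (register)
// assigned to the variable at the other end of the copy.
struct CopyHint {
  double frequency;
  int otherColor;
};

// Expected cost of the copies that remain real copies if the variable
// is given `color`. A copy whose both ends share a color is free.
double brokenCopyCost(const std::vector<CopyHint> &copies, int color) {
  double cost = 0.0;
  for (const CopyHint &copy : copies)
    if (copy.otherColor != color)
      cost += copy.frequency;
  return cost;
}

// Recolor only if a) the cost with the current color is greater than
// or equal to b) the cost with the target color.
bool isProfitableRecoloring(const std::vector<CopyHint> &copies,
                            int currentColor, int targetColor) {
  return brokenCopyCost(copies, currentColor) >=
         brokenCopyCost(copies, targetColor);
}
```

For instance, a variable with a hot copy to color 1 and a cold copy to color 2 is worth moving from color 2 to color 1, but not the other way around.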
* [SelectionDAG] Allow targets to specify legality of extloads' result type (in addition to the memory type). | Ahmed Bougacha | 2015-01-08 | 5 | -33/+40
  The *LoadExt* legalization handling used to only have one type, the memory type. This forced users to assume that as long as the extload for the memory type was declared legal, and the result type was legal, the whole extload was legal.
  However, this isn't always the case. For instance, on X86, with AVX, this is legal:
    v4i32 load, zext from v4i8
  but this isn't:
    v4i64 load, zext from v4i8
  Whereas v4i64 is (arguably) legal, even without AVX2.
  Note that the same thing was done a while ago for truncstores (r46140), but I assume no one needed it yet for extloads, so here we go.
  Calls to getLoadExtAction were changed to add the value type, found manually in the surrounding code.
  Calls to setLoadExtAction were mechanically changed, by wrapping the call in a loop, to match previous behavior. The loop iterates over the MVT subrange corresponding to the memory type (FP vectors, etc...). I also pulled neighboring setTruncStoreActions into some of the loops; those shouldn't make a difference, as the additional types are illegal. (e.g., i128->i1 truncstores on PPC.)
  No functional change intended.
  Differential Revision: http://reviews.llvm.org/D6532
  llvm-svn: 225421
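The idea of keying extload legality on both the result type and the memory type can be illustrated with a small lookup table. This is a standalone sketch, not the TargetLowering data structure: `VT`, `Action`, `ExtLoadTable`, and the choice of `Expand` as the default for unset pairs are assumptions made for the example.

```cpp
#include <map>
#include <utility>

// Hypothetical subset of value types and legalization actions.
enum class VT { v4i8, v4i32, v4i64 };
enum class Action { Legal, Expand };

class ExtLoadTable {
  // Keyed on (result value type, memory type) rather than on the
  // memory type alone.
  std::map<std::pair<VT, VT>, Action> actions;

public:
  void set(VT valueType, VT memType, Action action) {
    actions[std::make_pair(valueType, memType)] = action;
  }

  // Pairs that were never set default to Expand in this sketch.
  Action get(VT valueType, VT memType) const {
    auto it = actions.find(std::make_pair(valueType, memType));
    return it == actions.end() ? Action::Expand : it->second;
  }
};
```

With a table keyed only on the memory type, the two v4i8 extloads from the commit message could not be given different answers; with the pair key, each result type gets its own entry.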
* RegisterCoalescer: Do not remove IMPLICIT_DEFS if they are required for subranges. | Matthias Braun | 2015-01-08 | 1 | -1/+7
  The register coalescer used to remove implicit_defs when they are covered by the main range anyway. With subreg liveness tracking we can't do that anymore in places where the IMPLICIT_DEF is required as the beginning of a subregister live range.
  llvm-svn: 225416
* RegisterCoalescer: Fix valuesIdentical() in some subrange merge cases. | Matthias Braun | 2015-01-07 | 1 | -90/+81
  I got confused and assumed SrcIdx/DstIdx of the CoalescerPair is a subregister index in SrcReg/DstReg, but they are actually subregister indices of the coalesced register that get you back to SrcReg/DstReg when applied.
  Fixed the bug, improved comments and simplified code accordingly.
  Testcase by Tom Stellard!
  llvm-svn: 225415
* LiveInterval: Implement feedback by Quentin Colombet. | Matthias Braun | 2015-01-07 | 1 | -25/+32
  llvm-svn: 225413
* Update a comment. | Adrian Prantl | 2015-01-07 | 1 | -0/+2
  llvm-svn: 225399
* [CodeGen] Use MVT iterator_ranges in legality loops. NFC intended. | Ahmed Bougacha | 2015-01-07 | 1 | -19/+14
  A few loops do trickier things than just iterating on an MVT subset, so I'll leave them be for now.
  Follow-up of r225387.
  llvm-svn: 225392
* More FMA folding opportunities. | Olivier Sallenave | 2015-01-07 | 1 | -1/+133
  llvm-svn: 225380
* Debug info: Allow aggregate types to be described by constants. | Adrian Prantl | 2015-01-07 | 1 | -2/+5
  llvm-svn: 225378
* Test commit | Olivier Sallenave | 2015-01-07 | 1 | -0/+1
  llvm-svn: 225368
* Add a missing file from 225365 | Philip Reames | 2015-01-07 | 1 | -0/+54
  llvm-svn: 225366
* Introduce an example statepoint GC strategy | Philip Reames | 2015-01-07 | 4 | -0/+50
  This change includes the most basic possible GCStrategy for a GC which is using the statepoint lowering code. At the moment, this GCStrategy doesn't really do much - aside from actually generate correct stackmaps that is - but I went ahead and added a few extra correctness checks as proof of concept. It's mostly here to provide documentation on how to do one, and to provide a point for various optimization legality hooks I'd like to add going forward. (For context, see the TODOs in InstCombine around gc.relocate.)
  Most of the validation logic added here as proof of concept will soon move in to the Verifier. That move is dependent on http://reviews.llvm.org/D6811
  There was discussion in the review thread about addrspace(1) being reserved for something. I'm going to follow up on a separate llvmdev thread. If needed, I'll update all the code at once.
  Note that I am deliberately not making a GCStrategy required to use gc.statepoints with this change. I want to give folks out of tree - including myself - a chance to migrate. In a week or two, I'll make having a GCStrategy be required for gc.statepoints. To this end, I added the gc tag to one of the test cases but not others.
  Differential Revision: http://reviews.llvm.org/D6808
  llvm-svn: 225365
* New method SDep::isNormalMemoryOrBarrier() in ScheduleDAGInstrs.cpp. | Jonas Paulsson | 2015-01-07 | 1 | -5/+6
  Used to iterate over previously added memory dependencies in adjustChainDeps() and iterateChainSucc().
  SDep::isCtrl() was previously used in these places, which also gave anti and output edges. The code may be worse if these are followed, because MIsNeedChainEdge() will conservatively return true since a non-memory instruction has no memory operands, and a false chain dep will be added. It is also unnecessary since all memory accesses of interest will be reached by memory dependencies, and there is a budget limit for the number of edges traversed.
  This problem was found on an out-of-tree target with enabled alias analysis. No test case for an in-tree target has been found.
  Reviewed by Hal Finkel.
  llvm-svn: 225351
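The difference between the old and new predicates described above can be shown with a simplified model of dependency kinds. This sketch is an approximation written for illustration; the enum layout and the `Dep` struct are assumptions and do not reproduce the SDep class.

```cpp
// Simplified, hypothetical model of scheduling dependency edges.
enum class DepKind { Data, Anti, Output, Order };
enum class OrderKind { Barrier, MayAliasMem, MustAliasMem, Artificial };

struct Dep {
  DepKind kind;
  OrderKind order; // only meaningful when kind == DepKind::Order

  // Old filter: everything that is not a data dependency, which also
  // lets anti and output edges through.
  bool isCtrl() const { return kind != DepKind::Data; }

  // Ordering edge that represents a (may- or must-alias) memory
  // dependency.
  bool isNormalMemory() const {
    return kind == DepKind::Order && (order == OrderKind::MayAliasMem ||
                                      order == OrderKind::MustAliasMem);
  }

  bool isBarrier() const {
    return kind == DepKind::Order && order == OrderKind::Barrier;
  }

  // New, narrower filter: only memory dependencies and barriers.
  bool isNormalMemoryOrBarrier() const {
    return isNormalMemory() || isBarrier();
  }
};
```

In this model an anti edge satisfies `isCtrl()` but not `isNormalMemoryOrBarrier()`, which is the kind of edge the commit message says should no longer be followed when walking memory dependencies.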
* Fix typos in comment and option help texts. | Jonas Paulsson | 2015-01-07 | 1 | -3/+3
  For -enable-aa-sched-mi and -use-tbaa-in-sched-mi.
  llvm-svn: 225350