summaryrefslogtreecommitdiffstats
path: root/llvm/lib/CodeGen
Commit message (Collapse)AuthorAgeFilesLines
* Reset the TopRPTracker's position in ScheduleDAGMILive::initQueuesKrzysztof Parzyszek2016-04-281-5/+11
| | | | | | | | | | | | | | | | | | | ScheduleDAGMI::initQueues changes the RegionBegin to the first non-debug instruction. Since it does not track register pressure, it does not affect any RP trackers. ScheduleDAGMILive inherits initQueues from ScheduleDAGMI, and it does reset the TopTPTracker in its schedule method. Any derived, target-specific scheduler will need to do it as well, but the TopRPTracker is only exposed as a "const" object to derived classes. Without the ability to modify the tracker directly, this leaves a derived scheduler with a potential of having the TopRPTracker out-of-sync with the CurrentTop. The symptom of the problem: void llvm::ScheduleDAGMILive::scheduleMI(llvm::SUnit *, bool): Assertion `TopRPTracker.getPos() == CurrentTop && "out of sync"' failed. Differential Revision: http://reviews.llvm.org/D19438 llvm-svn: 267918
* Debug Info: Restore the pre-r240853 behavior for DWARF2 bitfields.Adrian Prantl2016-04-281-24/+10
| | | | | | | | | The DWARF2 specification of DW_AT_bit_offset is ambiguous for little-endian machines, but by restoring to the old behavior we match what debuggers expect and what other popular compilers generate. llvm-svn: 267896
* Debug info: Support DWARF4 bitfields via DW_AT_data_bit_offset.Adrian Prantl2016-04-281-28/+30
| | | | | | | | | | | | | The DWARF2 specification of DW_AT_bit_offset was written from the perspective of a big-endian machine with unclear semantics for other systems. DWARF4 deprecated DW_AT_bit_offset and introduced a new attribute DW_AT_data_bit_offset that simply counts the number of bits from the beginning of the containing entity regardless of endianness. After this patch LLVM emits DW_AT_bit_offset for DWARF 2 or 3 and DW_AT_data_bit_offset when DWARF 4 or later is requested. llvm-svn: 267895
* [CodeGen] Default CTTZ_ZERO_UNDEF/CTLZ_ZERO_UNDEF to Expand in ↵Craig Topper2016-04-281-0/+4
| | | | | | TargetLoweringBase. This is what the majority of the targets want and removes a bunch of code. Set it to Legal explicitly in the few cases where that's the desired behavior. llvm-svn: 267853
* CodeGen: Add DetectDeadLanes pass.Matthias Braun2016-04-284-0/+534
| | | | | | | | | | | | | | | | | | | | The DetectDeadLanes pass performs a dataflow analysis of used/defined subregister lanes across COPY instructions and instructions that will get lowered to copies. It detects dead definitions and uses reading undefined values which are obscured by COPY and subregister usage. These dead definitions cause trouble in the register coalescer which cannot deal with definitions suddenly becoming dead after coalescing COPY instructions. For now the pass only adds dead and undef flags to machine operands. It should be possible to extend it in the future to remove the dead instructions and redo the analysis for the affected virtual registers. Differential Revision: http://reviews.llvm.org/D18427 llvm-svn: 267851
* LiveIntervalAnalysis: Fix handleMove() using wrong value numbersMatthias Braun2016-04-281-2/+1
| | | | | | | | | | handleMove() was incorrectly swapping two value numbers. This was missed before because the problem only occured when moving subregister definitions and needed -verify-machineinstrs to be detected. I cannot add a testcase as long as I cannot reapply r260905/r260806. llvm-svn: 267840
* [ImplicitNullChecks] Properly update the live-in of the block of the memory ↵Quentin Colombet2016-04-271-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | operation. We basically replace: HoistBB: cond_br NullBB, NotNullBB NullBB: ... NotNullBB: <reg> = load into HoistBB <reg> = load_faulting_op NullBB uncond_br NotNullBB NullBB: ... NotNullBB: ## <reg> is now live-in of NotNullBB ... This partially fixes the machine verifier error for test/CodeGen/X86/implicit-null-check.ll, but it still fails because of the implicit CFG structure. llvm-svn: 267817
* Fix build failure under NDEBUG.Than McIntosh2016-04-271-0/+4
| | | | llvm-svn: 267774
* [CodeGenPrepare] Don't sink a cast past its userDavid Majnemer2016-04-271-0/+5
| | | | | | | | | | The sink cast machinery is supposed to sink casts as close to their user as possible. However, an EH pad is the first instruction in it's basic block. Don't sink if the user is an EH pad. This fixes PR27536. llvm-svn: 267767
* Refactor debugging code, NFC.Than McIntosh2016-04-271-31/+30
| | | | | | | | | | | | | | | | | Summary: Refactor debugging routines to reduce code duplication. Remove a couple of #include's that were not needed. Don't require MachineDominator as a prereq for this pass (not needed). These changes split off from http://reviews.llvm.org/D18827. Reviewers: wmi, gbiv, qcolombet Subscribers: llvm-commits, davidxl, jevinskie Differential Revision: http://reviews.llvm.org/D18992 llvm-svn: 267766
* [DAGCombiner] Follow coding convention for function name (NFC)Gerolf Hoflehner2016-04-271-2/+2
| | | | llvm-svn: 267745
* Revert r267649, it caused PR27539.Nico Weber2016-04-271-11/+7
| | | | llvm-svn: 267723
* Detects the SAD pattern on X86 so that much better code will be emitted once ↵Cong Hou2016-04-271-7/+11
| | | | | | | | the pattern is matched. Differential revision: http://reviews.llvm.org/D14840 llvm-svn: 267649
* [MachineInstrBundle] Actually set the PartialDeadDef flag only when the registerQuentin Colombet2016-04-271-1/+1
| | | | | | | | | is defined! The users were checking the proper thing (Defined + PartialDeadDef), but the information may have been wrong for other use cases, so fix that. llvm-svn: 267641
* [MachineBasicBlock] Take advantage of the partially dead information.Quentin Colombet2016-04-261-2/+9
| | | | | | | Thanks to that information we wouldn't lie on a register being live whereas it is not. llvm-svn: 267622
* [MachineInstrBundle] Improvement the recognition of dead definitions.Quentin Colombet2016-04-261-3/+7
| | | | | | | Now, it is possible to know that partial definitions are dead definitions and recognize that clobbered registers are also dead. llvm-svn: 267621
* [CodeGen] Add getBuildVector and getSplatBuildVector helpers. NFCI.Ahmed Bougacha2016-04-262-40/+25
| | | | | | Differential Revision: http://reviews.llvm.org/D17176 llvm-svn: 267606
* [Tail duplication] Handle source registers with subregistersKrzysztof Parzyszek2016-04-261-34/+80
| | | | | | | | | | | | | | When a block is tail-duplicated, the PHI nodes from that block are replaced with appropriate COPY instructions. When those PHI nodes contained use operands with subregisters, the subregisters were dropped from the COPY instructions, resulting in incorrect code. Keep track of the subregister information and use this information when remapping instructions from the duplicated block. Differential Revision: http://reviews.llvm.org/D19337 llvm-svn: 267583
* [CodeGenPrepare] use branch weight metadata to decide if a select should be ↵Sanjay Patel2016-04-262-11/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | turned into a branch This is part of solving PR27344: https://llvm.org/bugs/show_bug.cgi?id=27344 CGP should undo the SimplifyCFG transform for the same reason that earlier patches have used this same mechanism: it's possible that passes between SimplifyCFG and CGP may be able to optimize the IR further with a select in place. For the TLI hook default, >99% taken or not taken is chosen as the default threshold for a highly predictable branch. Even the most limited HW branch predictors will be correct on this branch almost all the time, so even a massive mispredict penalty perf loss would be overcome by the win from all the times the branch was predicted correctly. As a follow-up, we could make the default target hook less conservative by using the SchedMachineModel's MispredictPenalty. Or we could just let targets override the default by implementing the hook with that and other target-specific options. Note that trying to statically determine mispredict rates for close-to-balanced profile weight data is generally impossible if the HW is sufficiently advanced. Ie, 50/50 taken/not-taken might still be 100% predictable. Finally, note that this patch as-is will not solve PR27344 because the current __builtin_unpredictable() branch weight default values are 4 and 64. A proposal to change that is in D19435. Differential Revision: http://reviews.llvm.org/D19488 llvm-svn: 267572
* [CodeGenPrepare] don't convert an unpredictable select into control flowSanjay Patel2016-04-261-1/+2
| | | | | | | Suggested in the review of D19488: http://reviews.llvm.org/D19488 llvm-svn: 267504
* [PR27390] [CodeGen] Reject indexed loads in CombinerDAG.Marcin Koscielnicki2016-04-252-3/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | visitAND, when folding and (load) forgets to check which output of an indexed load is involved, happily folding the updated address output on the following testcase: target datalayout = "e-m:e-i64:64-n32:64" target triple = "powerpc64le-unknown-linux-gnu" %typ = type { i32, i32 } define signext i32 @_Z8access_pP1Tc(%typ* %p, i8 zeroext %type) { %b = getelementptr inbounds %typ, %typ* %p, i64 0, i32 1 %1 = load i32, i32* %b, align 4 %2 = ptrtoint i32* %b to i64 %3 = and i64 %2, -35184372088833 %4 = inttoptr i64 %3 to i32* %_msld = load i32, i32* %4, align 4 %zzz = add i32 %1, %_msld ret i32 %zzz } Fix this by checking ResNo. I've found a few more places that currently neglect to check for indexed load, and tightened them up as well, but I don't have test cases for them. In fact, they might not be triggerable at all, at least with current targets. Still, better safe than sorry. Differential Revision: http://reviews.llvm.org/D19202 llvm-svn: 267420
* [WinEH] Update SplitAnalysis::computeLastSplitPoint to cope with multiple EH ↵David Majnemer2016-04-252-14/+12
| | | | | | | | | | | | | | | | | | | successors We didn't have logic to correctly handle CFGs where there was more than one EH-pad successor (these are novel with WinEH). There were situations where a register was live in one exceptional successor but not another but the code as written would only consider the first exceptional successor it found. This resulted in split points which were insufficiently early if an invoke was present. This fixes PR27501. N.B. This removes getLandingPadSuccessor. llvm-svn: 267412
* [MachineCombiner] Support for floating-point FMA on ARM64 (re-commit r267098)Gerolf Hoflehner2016-04-243-3/+26
| | | | | | | | | | | | | | | | | | | The original patch caused crashes because it could derefence a null pointer for SelectionDAGTargetInfo for targets that do not define it. Evaluates fmul+fadd -> fmadd combines and similar code sequences in the machine combiner. It adds support for float and double similar to the existing integer implementation. The key features are: - DAGCombiner checks whether it should combine greedily or let the machine combiner do the evaluation. This is only supported on ARM64. - It gives preference to throughput over latency: the heuristic used is to combine always in loops. The targets decides whether the machine combiner should optimize for throughput or latency. - Supports for fmadd, f(n)msub, fmla, fmls patterns - On by default at O3 ffast-math llvm-svn: 267328
* [CodeGen] Teach DAG combine to fold select_cc seteq X, 0, sizeof(X), ↵Craig Topper2016-04-241-0/+35
| | | | | | | | ctlz_zero_undef(X) -> ctlz(X). InstCombine already does this for IR and X86 pattern matches this during isel. A follow up commit will remove the X86 patterns to allow this to be tested. llvm-svn: 267325
* DebugInfo: Remove MDString-based type referencesDuncan P. N. Exon Smith2016-04-234-35/+14
| | | | | | | | | | | | | | | | | | | | | | | | Eliminate DITypeIdentifierMap and make DITypeRef a thin wrapper around DIType*. It is no longer legal to refer to a DICompositeType by its 'identifier:', and DIBuilder no longer retains all types with an 'identifier:' automatically. Aside from the bitcode upgrade, this is mainly removing logic to resolve an MDString-based reference to an actualy DIType. The commits leading up to this have made the implicit type map in DICompileUnit's 'retainedTypes:' field superfluous. This does not remove DITypeRef, DIScopeRef, DINodeRef, and DITypeRefArray, or stop using them in DI-related metadata. Although as of this commit they aren't serving a useful purpose, there are patchces under review to reuse them for CodeView support. The tests in LLVM were updated with deref-typerefs.sh, which is attached to the thread "[RFC] Lazy-loading of debug info metadata": http://lists.llvm.org/pipermail/llvm-dev/2016-April/098318.html llvm-svn: 267296
* replace duplicated static functions for profile metadata access with ↵Sanjay Patel2016-04-231-25/+2
| | | | | | BranchInst member function; NFCI llvm-svn: 267295
* [CodeGen] When promoting CTTZ operations to larger type, don't insert a ↵Craig Topper2016-04-231-9/+11
| | | | | | select to detect if the input is zero to return the original size instead of the extended size. Instead just set the first bit in the zero extended part. llvm-svn: 267280
* Re-commit optimization bisect support (r267022) without new pass manager ↵Andrew Kaylor2016-04-2215-15/+18
| | | | | | | | | | support. The original commit was reverted because of a buildbot problem with LazyCallGraph::SCC handling (not related to the OptBisect handling). Differential Revision: http://reviews.llvm.org/D19172 llvm-svn: 267231
* Introduce llvm.load.relative intrinsic.Peter Collingbourne2016-04-224-0/+89
| | | | | | | | | | | | | | | | | | | This intrinsic takes two arguments, ``%ptr`` and ``%offset``. It loads a 32-bit value from the address ``%ptr + %offset``, adds ``%ptr`` to that value and returns it. The constant folder specifically recognizes the form of this intrinsic and the constant initializers it may load from; if a loaded constant initializer is known to have the form ``i32 trunc(x - %ptr)``, the intrinsic call is folded to ``x``. LLVM provides that the calculation of such a constant initializer will not overflow at link time under the medium code model if ``x`` is an ``unnamed_addr`` function. However, it does not provide this guarantee for a constant initializer folded into a function body. This intrinsic can be used to avoid the possibility of overflows when loading from such a constant. Differential Revision: http://reviews.llvm.org/D18367 llvm-svn: 267223
* TLI: Only iterate over integer vector typesMatt Arsenault2016-04-221-4/+3
| | | | | | Instead of iterating over all vectors and skipping integers. llvm-svn: 267220
* DAGCombiner: Relax alignment restriction when changing store typeMatt Arsenault2016-04-221-10/+14
| | | | | | If the target allows the alignment, this should be OK. llvm-svn: 267217
* CodeGen: Use PLT relocations for relative references to unnamed_addr functions.Peter Collingbourne2016-04-222-5/+46
| | | | | | | | | | | | | The relative vtable ABI (PR26723) needs PLT relocations to refer to virtual functions defined in other DSOs. The unnamed_addr attribute means that the function's address is not significant, so we're allowed to substitute it with the address of a PLT entry. Also includes a bonus feature: addends for COFF image-relative references. Differential Revision: http://reviews.llvm.org/D17938 llvm-svn: 267211
* DAGCombiner: Relax alignment restriction when changing load typeMatt Arsenault2016-04-221-3/+4
| | | | | | If the target allows the alignment, this should still be OK. llvm-svn: 267209
* MachineScheduler: Move code to initialize a Candidate out of tryCandidate(); NFCMatthias Braun2016-04-221-33/+33
| | | | llvm-svn: 267191
* MachineScheduler: Limit the size of the ready list.Matthias Braun2016-04-221-1/+10
| | | | | | | | | Avoid quadratic complexity in unusually large basic blocks by limiting the size of the ready lists. Differential Revision: http://reviews.llvm.org/D19349 llvm-svn: 267189
* PostRAHazardRecocgnizer: Fix unused-private-field warningTom Stellard2016-04-221-1/+0
| | | | llvm-svn: 267160
* CodeGen: Add a stand-alone hazard recognizer passTom Stellard2016-04-223-0/+103
| | | | | | | | | | | | | | | | Summary: This new pass allows targets to use the hazard recognizer without having to also run one of the schedulers. This is useful when compiling with optimizations disabled for targets that still need noop hazards to be handled correctly. Reviewers: hfinkel, atrick Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18594 llvm-svn: 267156
* Fix -Wunused-variable in non-asserts build.Eric Liu2016-04-221-0/+1
| | | | llvm-svn: 267128
* Revert r267098 - [MachineCombiner] Support for floating-point FMA on ARM64Daniel Sanders2016-04-223-27/+4
| | | | | | It introduced buildbot failures on clang-cmake-mips, clang-ppc64le-linux, among others. llvm-svn: 267127
* Revert "Initial implementation of optimization bisect support."Vedant Kumar2016-04-2215-18/+15
| | | | | | | | This reverts commit r267022, due to an ASan failure: http://lab.llvm.org:8080/green/job/clang-stage2-cmake-RgSan_check/1549 llvm-svn: 267115
* AMDGPU/SI: add llvm.amdgcn.ps.live intrinsicNicolai Haehnle2016-04-221-0/+11
| | | | | | | | | | | | | | | | | | | | | | | Summary: This intrinsic returns true if the current thread belongs to a live pixel and false if it belongs to a pixel that we are executing only for derivative computation. It will be used by Mesa to implement gl_HelperInvocation. Note that for pixels that are killed during the shader, this implementation also returns true, but it doesn't matter because those pixels are always disabled in the EXEC mask. This unearthed a corner case in the instruction verifier, which complained about a v_cndmask 0, 1, exec, exec<imp-use> instruction. That's stupid but correct code, so make the verifier accept it as such. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19191 llvm-svn: 267102
* [MachineCombiner] Support for floating-point FMA on ARM64Gerolf Hoflehner2016-04-223-4/+27
| | | | | | | | | | | | | | | | Evaluates fmul+fadd -> fmadd combines and similar code sequences in the machine combiner. It adds support for float and double similar to the existing integer implementation. The key features are: - DAGCombiner checks whether it should combine greedily or let the machine combiner do the evaluation. This is only supported on ARM64. - It gives preference to throughput over latency: the heuristic used is to combine always in loops. The targets decides whether the machine combiner should optimize for throughput or latency. - Supports for fmadd, f(n)msub, fmla, fmls patterns - On by default at O3 ffast-math llvm-svn: 267098
* Fix more -Wunused-variable in non-asserts build.David Blaikie2016-04-211-3/+2
| | | | llvm-svn: 267077
* Fix some -Wunused-variable warnings in non-asserts builds.David Blaikie2016-04-211-5/+6
| | | | llvm-svn: 267073
* Improve error message reporting for MachineFunctionPropertiesDerek Schuff2016-04-212-2/+4
| | | | | | | | When printing the properties required by a pass, only print the properties that are set, and not those that are clear (only properties that are set are verified, clear properties are "don't-care"). llvm-svn: 267070
* [MachineBasicBlock] Make the pass argument truly mandatory whenQuentin Colombet2016-04-214-10/+10
| | | | | | | | | | | | | | splitting edges. MachineBasicBlock::SplitCriticalEdges will crash if a nullptr would have been passed for the Pass argument. Do not allow that by turning this argument into a reference. The alternative would have been to make the Pass a truly optional argument, but although this is easy to do, I was afraid users using it like this would not be aware the livness information, dominator tree and such would silently be broken. llvm-svn: 267052
* [MachineBasicBlock] Refactor SplitCriticalEdge to expose a query API.Quentin Colombet2016-04-211-27/+39
| | | | | | | | | Introduce canSplitCriticalEdge, so that clients can now query whether or not a critical edge can be split without actually needing to split it. This may be useful when gathering information for cost models for instance. llvm-svn: 267046
* [RegisterBankInfo] Change the API for the verify methods.Quentin Colombet2016-04-213-10/+15
| | | | | | | Return bool instead of void so that it is natural to put the calls into asserts. llvm-svn: 267033
* LegalizeDAG: Move unaligned load/store expansion to TLIMatt Arsenault2016-04-212-310/+304
| | | | | | | | When custom lowered, this is not called if the store is custom lowered. Move it to be a utility function so targets can easily expand unaligned accesses when custom lowering. llvm-svn: 267029
* [RegisterBankInfo] Change the representation of the partial mappings.Quentin Colombet2016-04-212-33/+26
| | | | | | | | | Instead of holding a mask, hold two value: the start index and the length of the mapping. This is a more compact representation, although less powerful. That being said, arbitrary masks would not have worked for the generic so do not allow them in the first place. llvm-svn: 267025
OpenPOWER on IntegriCloud