path: root/llvm/lib/CodeGen
Commit message | Author | Date | Files | Lines
* Improve the successor list update in TailDuplication.cpp. (Cong Hou, 2015-12-15; 1 file, -6/+6)
  This patch improves a temporary fix in r255530 so that we can normalize the successor list without triggering assertion failures in the tail duplication pass.
  llvm-svn: 255638
* Type legalizer for masked gather and scatter intrinsics. (Elena Demikhovsky, 2015-12-15; 4 files, -83/+259)
  Full type legalizer that works with all vector lengths, from 2 to 16, for i32, i64, float and double. This intrinsic, for example
    void @llvm.masked.scatter.v2f32(<2 x float> %data, <2 x float*> %ptrs, i32 align, <2 x i1> %mask)
  requires type widening for the data and type promotion for the mask.
  Differential Revision: http://reviews.llvm.org/D13633
  llvm-svn: 255629
* [ShrinkWrapping] Do not choose restore point inside loops. (Quentin Colombet, 2015-12-15; 1 file, -5/+21)
  The post-dominance property is not sufficient to guarantee that a restore point inside a loop is safe. E.g.:
    while (1) {
      Save
      Restore
      if (...)
        break;
      use/def CSRs
    }
  All the uses/defs of CSRs are dominated by Save and post-dominated by Restore. However, the CSR uses are still reachable after Restore is executed and before Save is executed again.
  This fixes PR25824.
  llvm-svn: 255613
* [X86] Part 2 to fix x86-64 fp128 calling convention. (Chih-Hung Hsieh, 2015-12-14; 1 file, -1/+2)
  Part 1 was submitted in http://reviews.llvm.org/D15134. Changes in this part:
  * X86RegisterInfo.td, X86RecognizableInstr.cpp: Add FR128 register class.
  * X86CallingConv.td: Pass f128 values in XMM registers or on stack.
  * X86InstrCompiler.td, X86InstrInfo.td, X86InstrSSE.td: Add instruction selection patterns for f128.
  * X86ISelLowering.cpp: When target has MMX registers, configure MVT::f128 in FR128RegClass, with TypeSoftenFloat action, and custom actions for some opcodes. Add missed cases of MVT::f128 in places that handle f32, f64, or vector types. Add TODO comment to support f128 type in inline assembly code.
  * SelectionDAGBuilder.cpp: Fix infinite loop when f128 type can have VT == TLI.getTypeToTransformTo(Ctx, VT).
  * Add unit tests for x86-64 fp128 type.
  Differential Revision: http://reviews.llvm.org/D11438
  llvm-svn: 255558
* [Packetizer] Add AliasAnalysis as a parameter to the packetizer. (Krzysztof Parzyszek, 2015-12-14; 1 file, -7/+11)
  This will make the dependence graph more accurate if an alias analysis is provided. If nullptr is specified in its place, the behavior will remain as it is currently.
  llvm-svn: 255540
* Remove the successor probabilities normalization in tail duplication pass. (Cong Hou, 2015-12-14; 1 file, -1/+0)
  The normalization may cause assertion failures on SystemZ and some out-of-tree tests. The root cause is that unknown probabilities are materialized into known ones by calling getSuccProbability(), which is then used to add another successor to the same MBB, which results in mixed known and unknown probabilities. But currently those mixed probabilities cannot be normalized. I will compose another patch to fix the root issue.
  llvm-svn: 255530
* [IR] Remove terminatepad. (David Majnemer, 2015-12-14; 4 files, -50/+2)
  It turns out that terminatepad gives little benefit over a cleanuppad which calls the termination function. This is not sufficient to implement fully generic filters, but MSVC doesn't support them, which makes terminatepad a little over-designed.
  Depends on D15478.
  Differential Revision: http://reviews.llvm.org/D15479
  llvm-svn: 255522
* FastISel needs to remove dead code when it bails out. (Paul Robinson, 2015-12-14; 1 file, -2/+32)
  When FastISel fails to translate an instruction it hands off code generation to SelectionDAG. Before it does so, it may have generated local value instructions to feed phi nodes in successor blocks. These instructions will then be generated again by SelectionDAG, causing duplication and less efficient code, including extra spill instructions.
  Patch by Wolfgang Pieb!
  Differential Revision: http://reviews.llvm.org/D11768
  llvm-svn: 255520
* AMDGPU: Use generic bitreverse intrinsic. (Matt Arsenault, 2015-12-14; 1 file, -1/+24)
  Also fix a bug in vector legalization for bitreverse.
  llvm-svn: 255512
* getParent() ^ 3 == getModule(); NFCI. (Sanjay Patel, 2015-12-14; 2 files, -3/+3)
  llvm-svn: 255511
* Fix a type issue in r255455. Should not use unsigned type as std::abs()'s template type. (Cong Hou, 2015-12-13; 1 file, -1/+1)
  llvm-svn: 255461
* Replace <cstdint> by llvm/Support/DataTypes.h for the typedef of uint64_t. NFC. (Cong Hou, 2015-12-13; 1 file, -1/+1)
  llvm-svn: 255458
* Add the missing header file <cstdint> needed by uint64_t. (Cong Hou, 2015-12-13; 1 file, -0/+1)
  llvm-svn: 255457
* Normalize MBB's successors' probabilities in several locations. (Cong Hou, 2015-12-13; 5 files, -10/+36)
  This patch adds some missing calls to MBB::normalizeSuccProbs() in several locations where it should be called. Those places were found by checking whether the sum of successors' probabilities is approximately one in the MachineBlockPlacement pass with some instrumented code (not in this patch).
  Differential revision: http://reviews.llvm.org/D15259
  llvm-svn: 255455
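  As a rough sketch of the pattern this adds (illustrative only; the surrounding pass code and variable names are assumptions, not from the patch):
    // After editing a block's successor list with explicit probabilities,
    // re-normalize so the successor probabilities sum to one again.
    MBB->addSuccessor(TargetMBB, BranchProbability::getUnknown());
    // ... possibly more successor additions or removals ...
    MBB->normalizeSuccProbs();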
* Partially fix memcpy / memset / memmove lowering in SelectionDAG construction if address space != 0. (Manuel Jacob, 2015-12-12; 2 files, -22/+18)
  Summary: Previously SelectionDAGBuilder asserted that the pointer operands of memcpy / memset / memmove intrinsics are in address space < 256. This assert implicitly assumed the X86 backend, where all address spaces < 256 are equivalent to address space 0 from the code generator's point of view. On some targets (R600 and NVPTX) several address spaces < 256 have a target-defined meaning, so this assert made little sense for these targets.
  This patch removes this wrong assertion and adds extra checks before lowering these intrinsics to library calls. If a pointer operand can't be cast to address space 0 without changing semantics, a fatal error is reported to the user.
  The new behavior should be valid for all targets that give address spaces != 0 a target-specified meaning (NVPTX, R600, X86). NVPTX lowers big or variable-sized memory intrinsics before SelectionDAG construction. All other memory intrinsics are inlined (the threshold is set very high for this target). R600 doesn't support memcpy / memset / memmove library calls (previously the illegal emission of a call to such a library function triggered an error somewhere in the code generator). X86 now emits inline loads and stores for address spaces 256 and 257 up to the same threshold that is used for address space 0 and reports a fatal error otherwise.
  I call this a "partial fix" because there are still cases that can't be lowered. A fatal error is reported in these cases.
  Reviewers: arsenm, theraven, compnerd, hfinkel
  Subscribers: hfinkel, llvm-commits, alex
  Differential Revision: http://reviews.llvm.org/D7241
  llvm-svn: 255441
* [IR] Reformulate LLVM's EH funclet IR. (David Majnemer, 2015-12-12; 6 files, -1449/+428)
  While we have successfully implemented a funclet-oriented EH scheme on top of LLVM IR, our scheme has some notable deficiencies:
  - catchendpad and cleanupendpad are necessary in the current design but they are difficult to explain to others, even to seasoned LLVM experts.
  - catchendpad and cleanupendpad are optimization barriers. They cannot be split and force all potentially throwing call-sites to be invokes. This has a noticeable effect on the quality of our code generation.
  - catchpad, while similar in some aspects to invoke, is fairly awkward. It is unsplittable, starts a funclet, and has control flow to other funclets.
  - The nesting relationship between funclets is currently a property of control flow edges. Because of this, we are forced to carefully analyze the flow graph to see if there might potentially exist illegal nesting among funclets. While we have logic to clone funclets when they are illegally nested, it would be nicer if we had a representation which forbade them upfront.
  Let's clean this up a bit by doing the following:
  - Instead, make catchpad more like cleanuppad and landingpad: no control flow, just a bunch of simple operands; catchpad would be splittable.
  - Introduce catchswitch, a control flow instruction designed to model the constraints of funclet-oriented EH.
  - Make funclet scoping explicit by having funclet instructions consume the token produced by the funclet which contains them.
  - Remove catchendpad and cleanupendpad. Their presence can be inferred implicitly using coloring information.
  N.B. The state numbering code for the CLR has been updated but the veracity of its output cannot be spoken for. An expert should take a look to make sure the results are reasonable.
  Reviewers: rnk, JosephTremoulet, andrew.w.kaylor
  Differential Revision: http://reviews.llvm.org/D15139
  llvm-svn: 255422
* SelectionDAG: Match min/max if the scalar operation is legal. (Matt Arsenault, 2015-12-11; 2 files, -10/+42)
  llvm-svn: 255388
* Revert r248483, r242546, r242545, and r242409 - absdiff intrinsics. (Hal Finkel, 2015-12-11; 6 files, -56/+1)
  After much discussion, ending here:
    http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151123/315620.html
  it has been decided that, instead of having the vectorizer directly generate special absdiff and horizontal-add intrinsics, we'll recognize the relevant reduction patterns during CodeGen. Accordingly, these intrinsics are not needed (the operations they represent can be pattern matched, as is already done in some backends). Thus, we're backing these out in favor of the current development work.
    r248483 - Codegen: Fix llvm.*absdiff semantic.
    r242546 - [ARM] Use [SU]ABSDIFF nodes instead of intrinsics for VABD/VABA
    r242545 - [AArch64] Use [SU]ABSDIFF nodes instead of intrinsics for ABD/ABA
    r242409 - [Codegen] Add intrinsics 'absdiff' and corresponding SDNodes for absolute difference operation
  llvm-svn: 255387
* CodeGen: Redo analyzePhysRegs() and computeRegisterLiveness(). (Matthias Braun, 2015-12-11; 3 files, -48/+43)
  computeRegisterLiveness() was broken in that it reported dead for a register even if a subregister was alive. I assume this was because the results of analyzePhysRegs() are hard to understand with respect to subregisters.
  This commit:
  - Changes the results of analyzePhysRegs (=struct PhysRegInfo) to be clearly understandable, and also renames the fields to avoid silent breakage of third-party code (and improve the grammar).
  - Fixes all (two) users of computeRegisterLiveness() in llvm by reenabling it and removing workarounds for the bug.
  This fixes http://llvm.org/PR24535 and http://llvm.org/PR25033.
  Differential Revision: http://reviews.llvm.org/D15320
  llvm-svn: 255362
* CXX_FAST_TLS calling convention: target independent portion. (Manman Ren, 2015-12-11; 1 file, -1/+36)
  The access function has a short entry and a short exit, and the initialization block is only run the first time. To improve the performance, we want to have a short frame at the entry and exit.
  We explicitly handle most of the CSRs via copies. Only the CSRs that are not handled via copies will be in CSR_SaveList. Frame lowering and prologue/epilogue insertion will generate a short frame in the entry and exit according to CSR_SaveList. The majority of the CSRs will be handled by the register allocator, which will try to spill and reload them in the initialization block.
  We add CSRsViaCopy; it will be explicitly handled during lowering.
  1) We first set FunctionLoweringInfo->SplitCSR if conditions are met (the target supports it for the given calling convention and the function has only return exits). We also call TLI->initializeSplitCSR to perform initialization.
  2) We call TLI->insertCopiesSplitCSR to insert copies from CSRsViaCopy to virtual registers at the beginning of the entry block and copies from virtual registers to CSRsViaCopy at the beginning of the exit blocks.
  3) We also need to make sure the explicit copies will not be eliminated.
  rdar://problem/23557469
  Differential Revision: http://reviews.llvm.org/D15340
  llvm-svn: 255353
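  A minimal sketch of how a backend might opt in (illustrative only; the target class name is hypothetical and the exact hook signature is an assumption, not taken from this patch):
    // Hypothetical target: advertise split-CSR handling only for CXX_FAST_TLS
    // access functions, so the generic code sets FunctionLoweringInfo->SplitCSR
    // and later calls initializeSplitCSR / insertCopiesSplitCSR.
    bool MyTargetLowering::supportSplitCSR(MachineFunction *MF) const {
      return MF->getFunction()->getCallingConv() == CallingConv::CXX_FAST_TLS &&
             MF->getFunction()->hasFnAttribute(Attribute::NoUnwind);
    }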
* Fix (bitcast (fabs x)), (bitcast (fneg x)) and (bitcast (fcopysign cst, x)) combines for ppc_fp128, since signbit computation is more complicated. (Eric Christopher, 2015-12-10; 1 file, -0/+68)
  Discussion thread: http://lists.llvm.org/pipermail/llvm-dev/2015-November/092863.html
  Patch by Tim Shen!
  llvm-svn: 255305
* Delete a duplicate branch in IfConversion.cpp. NFC. (Cong Hou, 2015-12-10; 1 file, -9/+0)
  llvm-svn: 255291
* [DAGCombiner] Fix PR25763 - vector comparison constant folding + sign-extension. (Simon Pilgrim, 2015-12-10; 1 file, -5/+8)
  PR25763 demonstrated an issue with D14683 - vector comparison constant folding only works for i1 results, so we need to split off the sign-extension of the result to the required type. Luckily this can be done with the existing type legalization code.
  llvm-svn: 255289
* remove duplicated comments and don't repeat function names in comments; NFC. (Sanjay Patel, 2015-12-10; 1 file, -142/+83)
  llvm-svn: 255257
* [PostRA scheduling] Allow a target to do scheduling when it wants post RA. (Jonas Paulsson, 2015-12-10; 1 file, -5/+8)
  SystemZ needs to do its scheduling after branch relaxation, which can only happen after block placement, and therefore the standard PostRAScheduler point in the pass sequence is too early.
  TargetMachine::targetSchedulesPostRAScheduling() is a new method that, when it returns true, signals that the target will insert the final scheduling pass on its own.
  Reviewed by Hal Finkel
  llvm-svn: 255234
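  A sketch of how a target might use the new hook (illustrative only; the class name is hypothetical):
    class MyTargetMachine : public LLVMTargetMachine {
      // ...
      // Tell the generic pipeline not to add the standard PostRAScheduler;
      // this target inserts its own scheduling pass after branch relaxation.
      bool targetSchedulesPostRAScheduling() const override { return true; }
    };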
* RegisterPressure: Factor out liveness dead-def detection logic; NFCI. (Matthias Braun, 2015-12-10; 1 file, -40/+43)
  Detecting additional dead-defs without a dead flag that are only visible through liveness information should be part of the register operand collection, not intertwined with the register pressure update logic.
  llvm-svn: 255192
* PeepholeOptimizer: Ignore dead implicit defs. (Dan Gohman, 2015-12-10; 1 file, -0/+6)
  Target-specific instructions may have uninteresting physreg clobbers, for target-specific reasons. The peephole pass doesn't need to concern itself with such defs, as long as they're implicit and marked as dead.
  llvm-svn: 255182
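  The kind of guard this describes, as an illustrative sketch (not the actual patch; names are assumptions):
    for (const MachineOperand &MO : MI.operands()) {
      if (MO.isReg() && MO.isDef() &&
          TargetRegisterInfo::isPhysicalRegister(MO.getReg()) &&
          !(MO.isImplicit() && MO.isDead()))
        return false; // a genuine physreg clobber; give up on this candidate
    }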
* use range-based for loops; NFCI. (Sanjay Patel, 2015-12-09; 1 file, -6/+4)
  llvm-svn: 255171
* Fix cycle in selection DAG introduced by extractelement legalization. (Robert Lougher, 2015-12-09; 1 file, -0/+11)
  During selection DAG legalization, extractelement is replaced with a load instruction. To do this, a temporary store to the stack is used unless an existing store is found that can be re-used. If re-using a store, the chain going out of the store must be replaced by the one going out of the new load (this ensures that any stores that must take place after the store happen after the load, else the value might be overwritten before it is loaded).
  The problem is that if the extractelement index is dependent on the store, replacing the chain will introduce a cycle in the selection DAG (the load uses the index, and by replacing the chain we will make the index dependent on the load).
  To fix this, if the index is dependent on the store, the store is skipped. This is conservative, as we may end up creating an unnecessary extra store to the stack. However, the situation is not expected to occur very often.
  Differential Revision: http://reviews.llvm.org/D15330
  llvm-svn: 255114
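  A sketch of the guard described above (variable names are assumptions, not from the patch):
    // If the index value (transitively) depends on this store, re-using the
    // store and splicing chains would create a cycle in the DAG; fall back
    // to a fresh temporary stack slot instead.
    if (StoreNode->isPredecessorOf(Idx.getNode()))
      continue;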
* Revert "Implement a new pass - LiveDebugValues - to compute the set of live DEBUG_VALUEs at each basic block and insert them. Reviewed and accepted at: http://reviews.llvm.org/D11933" (Mehdi Amini, 2015-12-09; 4 files, -405/+0)
  This reverts commit r255096. It broke the bots:
    http://lab.llvm.org:8080/green/job/clang-stage1-cmake-RA-incremental_check/16378/
  From: Mehdi Amini <mehdi.amini@apple.com>
  llvm-svn: 255101
* Implement a new pass - LiveDebugValues - to compute the set of live DEBUG_VALUEs at each basic block and insert them. (Vikram TV, 2015-12-09; 4 files, -0/+405)
  Reviewed and accepted at: http://reviews.llvm.org/D11933
  llvm-svn: 255096
* [CGP] Reimplement r255055 a different way. (Reid Kleckner, 2015-12-08; 1 file, -0/+4)
  llvm-svn: 255070
* Revert "[CGP] Check that we have an insert point before moving llvm.dbg.value around" (Reid Kleckner, 2015-12-08; 1 file, -8/+4)
  This reverts commit r255055. Breakage has been reported.
  llvm-svn: 255063
* [CGP] Check that we have an insert point before moving llvm.dbg.value around. (Reid Kleckner, 2015-12-08; 1 file, -4/+8)
  llvm-svn: 255055
* AsmPrinter: Use emitGlobalConstantFP to emit elements of constant data. (Justin Bogner, 2015-12-08; 1 file, -16/+4)
  It's strange to duplicate the logic for emitting FP values into emitGlobalConstantDataSequential, and it's even stranger that we end up printing the verbose assembly comments differently between the two paths. Just call into emitGlobalConstantFP rather than crudely duplicating its logic.
  llvm-svn: 254988
* fix return values to match bool return type; NFC. (Sanjay Patel, 2015-12-07; 1 file, -2/+2)
  llvm-svn: 254968
* AVX-512: Fixed masked load / store instruction selection for KNL. (Elena Demikhovsky, 2015-12-07; 1 file, -1/+4)
  Patterns were missing for the KNL target for <8 x i32>, <8 x float> masked load/store.
  This intrinsic comes with all legal types:
    <8 x float> @llvm.masked.load.v8f32(<8 x float>* %addr, i32 align, <8 x i1> %mask, <8 x float> %passThru)
  but still requires lowering, because VMASKMOVPS, VMASKMOVDQU32 work with 512-bit vectors only. All data operands should be widened to a 512-bit vector. The mask operand should be widened to v16i1 with zeroes.
  Differential Revision: http://reviews.llvm.org/D15265
  llvm-svn: 254909
* Replace uint16_t with the MCPhysReg typedef in many places. A lot of physical register arrays already use this typedef. (Craig Topper, 2015-12-05; 4 files, -12/+14)
  llvm-svn: 254843
* Normalize successors' probabilities when building MBBs for jump table. (Cong Hou, 2015-12-05; 1 file, -0/+2)
  llvm-svn: 254837
* ScheduleDAGInstrs: Move LiveIntervals field to ScheduleDAGMI. (Matthias Braun, 2015-12-04; 1 file, -2/+1)
  Now that ScheduleDAGInstrs doesn't need it anymore, we can move the field down the class hierarchy to ScheduleDAGMI.
  llvm-svn: 254759
* Revert "[BranchFolding] Merge MMOs during tail merge" (Rafael Espindola, 2015-12-04; 1 file, -26/+16)
  This reverts commit r254694. It broke bootstrap.
  llvm-svn: 254700
* [BranchFolding] Merge MMOs during tail merge. (Junmo Park, 2015-12-04; 1 file, -16/+26)
  Summary: If we remove the MMOs from load/store instructions, they are treated as volatile. This makes other optimization passes unhappy, e.g. Load/Store Optimization. So it looks better to merge them, not remove them.
  Reviewers: gberry, mcrosier
  Subscribers: gberry, llvm-commits
  Differential Revision: http://reviews.llvm.org/D14797
  llvm-svn: 254694
* (no commit message) (Junmo Park, 2015-12-04; 1 file, -1/+1)
  llvm-svn: 254686
* ScheduleDAGInstrs: Rework schedule graph builder. (Matthias Braun, 2015-12-04; 1 file, -66/+161)
  Re-committing with a change that avoids undefined uses getting put into the VRegUses list.
  The new algorithm remembers the uses encountered while walking backwards until a matching def is found. Contrary to the previous version this:
  - Works without LiveIntervals being available
  - Allows to increase the precision to subregisters/lanemasks (not used for now)
  The changes in the AMDGPU tests are necessary because the R600 scheduler is not stable with respect to the order of nodes in the ready queues.
  Differential Revision: http://reviews.llvm.org/D9068
  llvm-svn: 254683
* raw_ostream: << operator for callables with raw_ostream argument. (Matthias Braun, 2015-12-04; 3 files, -76/+64)
  This is a revised version of r254655 which uses a Printable wrapper class to avoid ambiguous overload problems.
  Differential Revision: http://reviews.llvm.org/D14348
  llvm-svn: 254681
* Emit function alias to data as a function symbol. (Evgeniy Stepanov, 2015-12-04; 1 file, -0/+5)
  CFI emits jump slots for indirect functions as a byte array constant, and declares function-typed aliases to these constants. This change fixes AsmPrinter to emit these aliases as function symbols and not data symbols.
  llvm-svn: 254674
* CodeGen peephole: fold redundant phys reg copies. (JF Bastien, 2015-12-03; 1 file, -12/+132)
  Code generation often exposes redundant physical register copies through virtual registers such as:
    %vreg = COPY %PHYSREG
    ...
    %PHYSREG = COPY %vreg
  There are cases where no intervening clobber of %PHYSREG occurs, and the later copy could therefore be removed. In some cases this further allows us to remove the initial copy.
  This patch contains a motivating example which comes from the x86 build of Chrome, specifically cc::ResourceProvider::UnlockForRead uses libstdc++'s implementation of hash_map. That example has two tests live at the same time, and after machine sinking LLVM has confused itself enough and thinks spilling EFLAGS is a great idea even though it's never restored and the comparison results are both live.
  Before this patch we have:
    DEC32m %RIP, 1, %noreg, <ga:@L>, %noreg, %EFLAGS<imp-def>
    %vreg1<def> = COPY %EFLAGS; GR64:%vreg1
    %EFLAGS<def> = COPY %vreg1; GR64:%vreg1
    JNE_1 <BB#1>, %EFLAGS<imp-use>
  Both copies are useless. This patch tries to eliminate the later copy in a generic manner.
  dec is especially confusing to LLVM when compared with sub.
  I wrote this patch to treat all physical registers generically, but only remove redundant copies of non-allocatable physical registers because the allocatable ones caused issues (e.g. when calling conventions weren't properly modeled) and should be handled later by the register allocator anyways.
  The following tests used to fail when the patch also replaced allocatable registers:
    CodeGen/X86/StackColoring.ll
    CodeGen/X86/avx512-calling-conv.ll
    CodeGen/X86/copy-propagation.ll
    CodeGen/X86/inline-asm-fpstack.ll
    CodeGen/X86/musttail-varargs.ll
    CodeGen/X86/pop-stack-cleanup.ll
    CodeGen/X86/preserve_mostcc64.ll
    CodeGen/X86/tailcallstack64.ll
    CodeGen/X86/this-return-64.ll
  This happens because COPY has other special meaning for e.g. dependency breakage and x87 FP stack.
  Note that all other backends' tests pass.
  Reviewers: qcolombet
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D15157
  llvm-svn: 254665
* AsmPrinter: Simplify emitting FP elements in sequential data. NFC. (Justin Bogner, 2015-12-03; 1 file, -26/+15)
  Use APFloat APIs here rather than manually type-punning through unions.
  llvm-svn: 254664
* Revert "raw_ostream: << operator for callables with raw_stream argument" (Matthias Braun, 2015-12-03; 3 files, -62/+76)
  This commit provoked "error C2593: 'operator <<' is ambiguous" on MSVC.
  This reverts commit r254655.
  llvm-svn: 254661
* raw_ostream: << operator for callables with raw_stream argument. (Matthias Braun, 2015-12-03; 3 files, -76/+62)
  This allows easier construction of print helpers. Example:
    Printable PrintLaneMask(unsigned LaneMask) {
      return Printable([LaneMask](raw_ostream &OS) {
        OS << format("%08X", LaneMask);
      });
    }
    // Usage:
    OS << PrintLaneMask(Mask);
  Differential Revision: http://reviews.llvm.org/D14348
  llvm-svn: 254655