summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
...
* [PPC] Fix code generation for bswap(int32) followed by store16Guozhi Wei2017-03-021-2/+10
| | | | | | | | | | | | | | | | | | | This patch fixes pr32063. Current code in PPCTargetLowering::PerformDAGCombine can transform bswap store into a single PPCISD::STBRX instruction. but it doesn't consider the case that the operand size of bswap may be larger than store size. When it occurs, we need 2 modifications, 1 For the last operand of PPCISD::STBRX, we should not use DAG.getValueType(N->getOperand(1).getValueType()), instead we should use cast<StoreSDNode>(N)->getMemoryVT(). 2 Before PPCISD::STBRX, we need to shift the original operand of bswap to the right side. Differential Revision: https://reviews.llvm.org/D30362 llvm-svn: 296811
* [Support] Move Stream library from MSF -> Support.Zachary Turner2017-03-0236-56/+56
| | | | | | | | | | After several smaller patches to get most of the core improvements finished up, this patch is a straight move and header fixup of the source. Differential Revision: https://reviews.llvm.org/D30266 llvm-svn: 296810
* [AArch64] Extend redundant copy elimination pass to handle non-zero stores.Chad Rosier2017-03-021-76/+212
| | | | | | | | | | | | | | | This patch extends the current functionality of the AArch64 redundant copy elimination pass to handle non-zero cases such as: BB#0: cmp x0, #1 b.eq .LBB0_1 .LBB0_1: orr x0, xzr, #0x1 ; <-- redundant copy; x0 known to hold #1. Differential Revision: https://reviews.llvm.org/D29344 llvm-svn: 296809
* [DAG] improve documentation comments; NFCSanjay Patel2017-03-022-90/+48
| | | | llvm-svn: 296808
* [MSP430] Add SRet support to MSP430 targetVadzim Dambrouski2017-03-024-24/+85
| | | | | | | | | | | | | This patch adds support for struct return values to the MSP430 target backend. It also reverses the order of argument and return registers in the calling convention to bring it into closer alignment with the published EABI from TI. Patch by Andrew Wygle (awygle). Differential Revision: https://reviews.llvm.org/D29069 llvm-svn: 296807
* [NVPTX] Reduce amount of boilerplate code used to select load instruction ↵Artem Belevich2017-03-021-1781/+587
| | | | | | | | | | | | | | opcode. Make opcode selection code for the load instruction a bit easier to read and maintain. This patch also catches number of f16 load/store variants that were not handled before. Differential Revision: https://reviews.llvm.org/D30513 llvm-svn: 296785
* [NVPTX] Added missing LDU/LDG intrinsics for f16.Artem Belevich2017-03-022-2/+19
| | | | | | Differential Revision: https://reviews.llvm.org/D30512 llvm-svn: 296784
* Fix some Wdocumentation warningsSimon Pilgrim2017-03-021-2/+2
| | | | llvm-svn: 296783
* [X86][MMX] Fixed i32 extraction on 32-bit targetsSimon Pilgrim2017-03-021-0/+10
| | | | | | MMX extraction often ends up as extract_i32(bitcast_v2i32(extract_i64(bitcast_v1i64(x86mmx v), 0)), 0) which fails to simplify on 32-bit targets as i64 isn't legal llvm-svn: 296782
* Cast to the right type on Windows.Vassil Vassilev2017-03-021-1/+1
| | | | llvm-svn: 296778
* [Hexagon] Skip blocks that define vector predicate registers in early-ifKrzysztof Parzyszek2017-03-021-2/+8
| | | | llvm-svn: 296777
* Reland r296442 with modifications reverted in r296463.Vassil Vassilev2017-03-022-0/+24
| | | | | | | | | | | | | | Original commit message: "Allow externally dlopen-ed libraries to be registered as permanent libraries. This is also useful in cases when llvm is in a shared library. First we dlopen the llvm shared library and then we register it as a permanent library in order to keep the JIT and other services working. Patch reviewed by Vedant Kumar (D29955)!" llvm-svn: 296774
* [Hexagon] Properly handle 'q' constraint in 128-byte vector modeKrzysztof Parzyszek2017-03-021-22/+10
| | | | llvm-svn: 296772
* [PowerPC][ELFv2ABI] Allocate parameter area on-demand to reduce stack frame sizeNemanja Ivanovic2017-03-021-5/+43
| | | | | | | | | | | | | | This patch reduces the stack frame size by not allocating the parameter area if it is not required. In the current implementation LowerFormalArguments_64SVR4 already handles the parameter area, but LowerCall_64SVR4 does not (when calculating the stack frame size). What this patch does is make LowerCall_64SVR4 consistent with LowerFormalArguments_64SVR4. Committing on behalf of Hiroshi Inoue. Differential Revision: https://reviews.llvm.org/D29881 llvm-svn: 296771
* The patch turns on epilogue unroll for loops with constant recurency start.Evgeny Stupachenko2017-03-021-1/+44
| | | | | | | | | | | | | | | | | Summary: Set unroll remainder to epilog if a loop contains a phi with constant parameter: loop: pn = phi [Const, PreHeader], [pn.next, Latch] ... Reviewer: hfinkel Differential Revision: http://reviews.llvm.org/D27004 From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 296770
* [DAGCombiner] avoid assertion when folding binops with opaque constantsSanjay Patel2017-03-021-3/+4
| | | | | | | | | | | | | This bug was introduced with: https://reviews.llvm.org/rL296699 There may be a way to loosen the restriction, but for now just bail out on any opaque constant. The tests show that opacity is target-specific. This goes back to cost calculations in ConstantHoisting based on TTI->getIntImmCost(). llvm-svn: 296768
* fix typo in comment; NFCSanjay Patel2017-03-021-1/+1
| | | | llvm-svn: 296760
* Re-apply "[GVNHoist] Move GVNHoist to function simplification part of pipeline."Geoff Berry2017-03-021-2/+2
| | | | | | | | | This re-applies r289696, which caused TSan perf regression, which has since been addressed in separate changes (see PR for details). See PR31382. llvm-svn: 296759
* GlobalISel: record correct stack usage for signext parameters.Tim Northover2017-03-021-3/+7
| | | | | | | | | | | | | The CallingConv.td rules allocate 8 bytes for these kinds of arguments on AAPCS targets, but we were only recording the smaller amount. The difference is theoretical on AArch64 because we don't actually store more than the smaller amount, but it's still much better to have these two components in agreement. Based on Diana Picus's ARM equivalent patch (where it matters a lot more). llvm-svn: 296754
* [InstCombine] Avoid faulty combines of select-cmp-brBjorn Pettersson2017-03-021-3/+5
| | | | | | | | | | | | | | | | | | | | | | Summary: When InstCombine is optimizing certain select-cmp-br patterns it replaces the result of the select in uses outside of the basic block containing the select. This is only legal if the path from the select to the outside use is disjoint from all other paths out from the originating basic block. The problem found was that InstCombiner::replacedSelectWithOperand did not consider the case when both edges out from the br pointed to the same label. In that case the paths aren't disjoint and the transformation is illegal. This patch avoids the faulty rewrites by verifying that there is a single flow to the successor where we want to replace uses. Reviewers: llvm-commits, spatel, majnemer Differential Revision: https://reviews.llvm.org/D30455 llvm-svn: 296752
* [ARM/AArch64] Update costs for interleaved accesses with wide typesMatthew Simpson2017-03-022-4/+8
| | | | | | | | | After r296750, we're able to match interleaved accesses having types wider than 128 bits. This patch updates the associated TTI costs. Differential Revision: https://reviews.llvm.org/D29675 llvm-svn: 296751
* [ARM/AArch64] Support wide interleaved accessesMatthew Simpson2017-03-022-93/+264
| | | | | | | | | | | | | This patch teaches (ARM|AArch64)ISelLowering.cpp to match illegal vector types to interleaved access intrinsics as long as the types are multiples of the vector register width. A "wide" access will now be mapped to multiple interleave intrinsics similar to the way in which non-interleaved accesses with illegal types are legalized into multiple accesses. I'll update the associated TTI costs (in getInterleavedMemoryOpCost) as a follow-on. Differential Revision: https://reviews.llvm.org/D29466 llvm-svn: 296750
* Do not leak OpenedHandles.Vassil Vassilev2017-03-021-10/+2
| | | | llvm-svn: 296748
* [LV] Considier non-consecutive but vectorizable accesses for VF selectionMatthew Simpson2017-03-021-3/+10
| | | | | | | | | | | | | | | | When computing the smallest and largest types for selecting the maximum vectorization factor, we currently ignore loads and stores of pointer types if the memory access is non-consecutive. We do this because such accesses must be scalarized regardless of vectorization factor, and thus shouldn't be considered when determining the factor. This patch makes this check less aggressive by also considering non-consecutive accesses that may be vectorized, such as interleaved accesses. Because we don't know at the time of the check if an accesses will certainly be vectorized (this is a cost model decision given a particular VF), we consider all accesses that can potentially be vectorized. Differential Revision: https://reviews.llvm.org/D30305 llvm-svn: 296747
* Do not verify MachimeDominatorTree if it is not calculatedSerge Pavlov2017-03-021-14/+14
| | | | | | | | | | | | | | | | If dominator tree is not calculated or is invalidated, set corresponding pointer in the pass state to nullptr. Such pointer value will indicate that operations with dominator tree are not allowed. In particular, it allows to skip verification for such pass state. The dominator tree is not calculated if the machine dominator pass was skipped, it occures in the case of entities with linkage available_externally. The change fixes some test fails observed when expensive checks are enabled. Differential Revision: https://reviews.llvm.org/D29280 llvm-svn: 296742
* Fix typo. NFCIXin Tong2017-03-021-1/+1
| | | | llvm-svn: 296735
* LTO: When creating a local cache, create the cache directory if it does not ↵Peter Collingbourne2017-03-021-2/+5
| | | | | | | | already exist. Differential Revision: https://reviews.llvm.org/D30519 llvm-svn: 296726
* LiveRegMatrix: Fix some subreg interference checksMatthias Braun2017-03-021-5/+8
| | | | | | | | Surprisingly, one of the three interference checks in LiveRegMatrix was using the main live range instead of the apropriate subregister range resulting in unnecessarily conservative results. llvm-svn: 296722
* Revert r296708; causing test failures on ARM hosts.Eli Friedman2017-03-021-18/+12
| | | | | | | | | | | | | | | Original commit message: [ARM] Fix insert point for store rescheduling. In ARMPreAllocLoadStoreOpt::RescheduleOps, LastOp should be the last operation which we want to merge. If we break out of the loop because an operation has the wrong offset, we shouldn't use that operation as LastOp. This patch fixes some cases where we would sink stores for no reason. llvm-svn: 296718
* Remove spurious use of LLVM_FALLTHROUGH (NFC)Paul Robinson2017-03-011-43/+17
| | | | llvm-svn: 296713
* [DAGCombiner] mulhi + 1 never overflow.Amaury Sechet2017-03-011-0/+13
| | | | | | | | | | | | | | | Summary: This can be used to optimize large multiplications after legalization. Depends on D29565 Reviewers: mkuper, spatel, RKSimon, zvi, bkramer, aaboud, craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29587 llvm-svn: 296711
* [GlobalISel] Add a way for targets to enable GISel.Ahmed Bougacha2017-03-013-4/+22
| | | | | | | | | | | | | | | | | | | | | | | Until now, we've had to use -global-isel to enable GISel. But using that on other targets that don't support it will result in an abort, as we can't build a full pipeline. Additionally, we want to experiment with enabling GISel by default for some targets: we can't just enable GISel by default, even among those target that do have some support, because the level of support varies. This first step adds an override for the target to explicitly define its level of support. For AArch64, do that using a new command-line option (I know..): -aarch64-enable-global-isel-at-O=<N> Where N is the opt-level below which GISel should be used. Default that to -1, so that we still don't enable GISel anywhere. We're not there yet! While there, remove a couple LLVM_UNLIKELYs. Building the pipeline is such a cold path that in practice that shouldn't matter at all. llvm-svn: 296710
* [ARM] Fix insert point for store rescheduling.Eli Friedman2017-03-011-12/+18
| | | | | | | | | | | | | In ARMPreAllocLoadStoreOpt::RescheduleOps, LastOp should be the last operation which we want to merge. If we break out of the loop because an operation has the wrong offset, we shouldn't use that operation as LastOp. This patch fixes some cases where we would sink stores for no reason. Differential Revision: https://reviews.llvm.org/D30124 llvm-svn: 296708
* [ARM] Check correct instructions for load/store rescheduling.Eli Friedman2017-03-011-1/+1
| | | | | | | | | | | | | | | | | | | | This code starts from the high end of the sorted vector of offsets, and works backwards: it tries to find contiguous offsets, process them, then pops them from the end of the vector. Most of the code agrees with this order of processing, but one loop doesn't: it instead processes elements from the low end of the vector (which are nodes with unrelated offsets). Fix that loop to process the correct elements. This has a few implications. One, we don't incorrectly return early when processing multiple groups of offsets in the same block (which allows rescheduling prera-ldst-insertpt.mir). Two, we pick the correct insert point for loads, so they're correctly sorted (which affects the scheduling of vldm-liveness.ll). I think it might also impact some of the heuristics slightly. Differential Revision: https://reviews.llvm.org/D30368 llvm-svn: 296701
* [DAGCombiner] fold binops with constant into select-of-constantsSanjay Patel2017-03-011-0/+112
| | | | | | | | | | | | | | | | | | This is part of the ongoing attempt to improve select codegen for all targets and select canonicalization in IR (see D24480 for more background). The transform is a subset of what is done in InstCombine's FoldOpIntoSelect(). I first noticed a regression in the x86 avx512-insert-extract.ll tests with a patch that hopes to convert more selects to basic math ops. This appears to be a general missing DAG transform though, so I added tests for all standard binops in rL296621 (PowerPC was chosen semi-randomly; it has scripted FileCheck support, but so do ARM and x86). The poor output for "sel_constants_shl_constant" is tracked with: https://bugs.llvm.org/show_bug.cgi?id=32105 Differential Revision: https://reviews.llvm.org/D30502 llvm-svn: 296699
* [Constant Hoisting] Avoid inserting instructions before EH padsReid Kleckner2017-03-011-2/+10
| | | | | | | | | | | | | Now that terminators can be EH pads, this code needs to iterate over the immediate dominators of the EH pad to find a valid insertion point. Fix for PR32107 Patch by Robert Olliff! Differential Revision: https://reviews.llvm.org/D30511 llvm-svn: 296698
* [DebugInfo] [DWARFv5] Unique abbrevs for DIEs with different implicit_const ↵Victor Leschuk2017-03-011-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | values Take DW_FORM_implicit_const attribute value into account when profiling DIEAbbrevData. Currently if we have two similar types with implicit_const attributes and different values we end up with only one abbrev in .debug_abbrev section. For example consider two structures: S1 with implicit_const attribute ATTR and value VAL1 and S2 with implicit_const ATTR and value VAL2. The .debug_abbrev section will contain only 1 related record: [N] DW_TAG_structure_type DW_CHILDREN_yes DW_AT_ATTR DW_FORM_implicit_const VAL1 // .... This is incorrect as struct S2 (with VAL2) will use abbrev record with VAL1. With this patch we will have two different abbreviations here: [N] DW_TAG_structure_type DW_CHILDREN_yes DW_AT_ATTR DW_FORM_implicit_const VAL1 // .... [M] DW_TAG_structure_type DW_CHILDREN_yes DW_AT_ATTR DW_FORM_implicit_const VAL2 // .... llvm-svn: 296691
* [DAGCombiner] Remove non-ascii character and reflow comment.Benjamin Kramer2017-03-011-5/+4
| | | | llvm-svn: 296690
* LIU:::Query: Query LiveRange instead of LiveInterval; NFCMatthias Braun2017-03-013-16/+15
| | | | | | | | | - We only need the information from the base class, not the additional details in the LiveInterval class. - Spread more `const` - Some code cleanup llvm-svn: 296684
* Elide argument copies during instruction selectionReid Kleckner2017-03-014-29/+281
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Avoids tons of prologue boilerplate when arguments are passed in memory and left in memory. This can happen in a debug build or in a release build when an argument alloca is escaped. This will dramatically affect the code size of x86 debug builds, because X86 fast isel doesn't handle arguments passed in memory at all. It only handles the x86_64 case of up to 6 basic register parameters. This is implemented by analyzing the entry block before ISel to identify copy elision candidates. A copy elision candidate is an argument that is used to fully initialize an alloca before any other possibly escaping uses of that alloca. If an argument is a copy elision candidate, we set a flag on the InputArg. If the the target generates loads from a fixed stack object that matches the size and alignment requirements of the alloca, the SelectionDAG builder will delete the stack object created for the alloca and replace it with the fixed stack object. The load is left behind to satisfy any remaining uses of the argument value. The store is now dead and is therefore elided. The fixed stack object is also marked as mutable, as it may now be modified by the user, and it would be invalid to rematerialize the initial load from it. Supersedes D28388 Fixes PR26328 Reviewers: chandlerc, MatzeB, qcolombet, inglorion, hans Subscribers: igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D29668 llvm-svn: 296683
* [APInt] Optimize APInt creation from uint64_tCraig Topper2017-03-011-0/+1
| | | | | | | | | | | | | | | | | Summary: This patch moves the clearUnusedBits calls into the two different initialization paths for APInt from a uint64_t. This allows the compiler to better optimize the clearing of the unused bits for the single word case. And it puts the clearing for the multi word case into the initSlowCase function to save code. In the common case of initializing with 0 this allows the clearing to be completely optimized out for the single word case. On my local x86 build this is showing a ~45kb reduction in the size of the opt binary. Reviewers: RKSimon, hans, majnemer, davide, MatzeB Reviewed By: hans Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30486 llvm-svn: 296677
* LIU::Query: Remove always false member+getter; NFCMatthias Braun2017-03-011-2/+0
| | | | llvm-svn: 296675
* Improve scheduling with branch coalescingNemanja Ivanovic2017-03-014-0/+764
| | | | | | | | | | | This patch adds a MachineSSA pass that coalesces blocks that branch on the same condition. Committing on behalf of Lei Huang. Differential Revision: https://reviews.llvm.org/D28249 llvm-svn: 296670
* [DAG] Prevent Stale nodes from entering worklistNirav Dave2017-03-011-4/+10
| | | | | | | | | | | | | | | Add check that deleted nodes do not get added to worklist. This can occur when a node's operand is simplified to an existing node. This fixes PR32108. Reviewers: jyknight, hfinkel, chandlerc Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30506 llvm-svn: 296668
* [RDF] Replace {} with explicit constructor, since not all compilers like itKrzysztof Parzyszek2017-03-011-1/+1
| | | | llvm-svn: 296666
* NewGVN: Add debug counter for value numberingDaniel Berlin2017-03-011-5/+15
| | | | llvm-svn: 296665
* [DWARF] Print leading zeros in type signaturePaul Robinson2017-03-011-2/+2
| | | | llvm-svn: 296663
* [RDF] Add recursion limit to getAllReachingDefsRecKrzysztof Parzyszek2017-03-013-9/+40
| | | | | | | For large programs this function can take significant amounts of time. Let it abort gracefully when the program is too complex. llvm-svn: 296662
* Alphabetize some cases (NFC)Paul Robinson2017-03-011-12/+12
| | | | llvm-svn: 296655
* Revert r296575 "[SLP] Fixes the bug due to absence of in order uses of ↵Hans Wennborg2017-03-012-88/+78
| | | | | | | | scalars which needs to be available" It caused miscompiles, e.g. in Chromium (PR32109). llvm-svn: 296654
OpenPOWER on IntegriCloud