path: root/llvm/lib
Commit message | Author | Age | Files | Lines
* Enable ICP for AutoFDO. | Dehao Chen | 2017-06-27 | 1 | -2/+3
| | | | | | | | | | | | | | Summary: AutoFDO should have ICP enabled. Reviewers: davidxl Reviewed By: davidxl Subscribers: sanjoy, mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D34662 llvm-svn: 306429
* [ProfData] Make the method threadsafe | Xinliang David Li | 2017-06-27 | 1 | -2/+3
| | | | llvm-svn: 306428
* [X86][AsmParser][MS-compatibility] Binary/Unary operators enhancements | Coby Tayree | 2017-06-27 | 1 | -37/+75
| | | | | | | | | | | Introducing MOD binary operator https://msdn.microsoft.com/en-us/library/hha180wt.aspx Enhancing unary operators NEG and NOT, to support more complex patterns Differential Revision: https://reviews.llvm.org/D33876 llvm-svn: 306425
* [DWARF] NFC: Make string-offset handling more like address-table handling; | Paul Robinson | 2017-06-27 | 2 | -12/+3
| | | | | | do the indirection and relocation all in the same method. llvm-svn: 306418
* Updated and extended the information about each instruction in HSW and SNB to include the following data: | Gadi Haber | 2017-06-27 | 2 | -1634/+5225
| • static latency
| • number of uOps of which the instruction consists
| • all ports used by the instruction
|
| Reviewers: RKSimon zvi aymanmus m_zuckerman
| Differential Revision: https://reviews.llvm.org/D33897
| llvm-svn: 306414
* [AMDGPU] SDWA: several fixes for V_CVT and VOPC instructions | Sam Kolton | 2017-06-27 | 6 | -33/+43
| Summary:
| 1. Instruction V_CVT_U32_F32 allows an omod operand (see SIInstrInfo.td:1435). In fact this operand shouldn't be allowed here. This fix checks whether the SDWA pseudo instruction has an OMod operand and, if so, copies it.
| 2. There were several problems with support of VOPC instructions in the SDWA peephole pass.
|
| Reviewers: tstellar, arsenm, vpykhtin, airlied, kzhuravl
| Subscribers: wdng, nhaehnle, yaxunl, dstuttard, tpr, sarnex, t-tye
| Differential Revision: https://reviews.llvm.org/D34626
| llvm-svn: 306413
* [AArch64] Update successor probabilities after ccmp-conversion | Matthew Simpson | 2017-06-27 | 1 | -4/+44
| | | | | | | | | | | | | This patch modifies the conditional compares pass so that it keeps successor probabilities up-to-date after the conversion. Previously, successor probabilities were being normalized to a uniform distribution, even though they may have been heavily biased prior to the conversion (e.g., if one of the edges was the back edge of a loop). This loss of information affected passes later in the pipeline. Differential Revision: https://reviews.llvm.org/D34109 llvm-svn: 306412
* [LoopUnrollRuntime] Use SCEV exit count for calculating trip count. NFCI | Anna Thomas | 2017-06-27 | 1 | -1/+5
| | | | | | | Instead of getBackEdgeTakenCount, use getExitCount on the latch exiting block (which is proven to be the only exiting block in the loop to be unrolled). llvm-svn: 306410
* [mips] Add instruction aliases for ds(r|l)l. | Simon Dardis | 2017-06-27 | 2 | -3/+21
| | | | | | | Add the instruction aliases for ds(r|l)l for the two operand alias of ds(r|l)lv and the aliases ds(r|l)l with the three register operands. llvm-svn: 306405
* [SelectionDAG] set dereferenceable flag in MergeConsecutiveStores to fix assertion failure | Hiroshi Inoue | 2017-06-27 | 1 | -2/+12
| When SelectionDAG merges consecutive stores and loads in MergeConsecutiveStores, it does not set the dereferenceable flag for the created load instruction. This results in an assertion failure if SelectionDAG commonizes this load instruction with other load instructions, and it may also miss optimization opportunities.
|
| This patch sets the dereferenceable flag for the newly created load instruction if all the load instructions to be merged are dereferenceable.
|
| Differential Revision: https://reviews.llvm.org/D34679
| llvm-svn: 306404
* Recommitting rL305465 after fixing bug in TableGen in rL306251 & rL306371 | Ayman Musa | 2017-06-27 | 2 | -25/+715
| [X86][AVX512] Improve lowering of AVX512 compare intrinsics (remove redundant shift left+right instructions).
|
| AVX512 compare instructions return v*i1 types. In cases where the number of elements in the returned value is less than 8, clang adds zeroes to get a mask of v8i1 type. Later on it's replaced with CONCAT_VECTORS, which then is lowered to many DAG nodes including insert/extract element and shift right/left nodes. The fact that AVX512 compare instructions put the result in a k register and zero all its upper bits allows us to remove the extra nodes simply by copying the result to the required register class.
|
| When lowering, identify these cases and transform them into an INSERT_SUBVECTOR node (marked legal), then catch this pattern in the instruction selection phase and transform it into one avx512 cmp instruction.
|
| Differential Revision: https://reviews.llvm.org/D33188
| llvm-svn: 306402
* fix trivial typos, NFC | Hiroshi Inoue | 2017-06-27 | 3 | -6/+6
| | | | | | succesor -> successor llvm-svn: 306393
* [ARM] GlobalISel: Support G_SELECT for pointers | Diana Picus | 2017-06-27 | 1 | -0/+1
| | | | | | All we need to do is mark it as legal, otherwise it's just like s32. llvm-svn: 306390
* [globalisel][tablegen] Add support for EXTRACT_SUBREG. | Daniel Sanders | 2017-06-27 | 1 | -2/+7
| | | | | | | | | | | | | | | | Summary: After this patch, we finally have test cases that require multiple instruction emission. Depends on D33590 Reviewers: ab, qcolombet, t.p.northover, rovka, kristof.beyls Subscribers: javed.absar, llvm-commits, igorb Differential Revision: https://reviews.llvm.org/D33596 llvm-svn: 306388
* [mips] Refine the condition for when to use CALL16 vs a GOT displacement. | Simon Dardis | 2017-06-27 | 1 | -2/+6
| | | | | | | | | | | | | | Borrow from the logic for 'jal' in MipsAsmParser::processInstruction and add the extra condition of bypassing CALL16 if the destination symbol is an ELF symbol with STB_LOCAL binding. Patch by: John Baldwin Reviewers: sdardis Differential Revision: https://reviews.llvm.org/D33999 llvm-svn: 306387
* [ARM] GlobalISel: Support G_SELECT for i32 | Diana Picus | 2017-06-27 | 3 | -0/+65
| * Mark as legal for (s32, i1, s32, s32)
| * Map everything into GPRs
| * Select to two instructions: a CMP of the condition against 0, to set the flags, and a MOVCCr to select between the two inputs based on the flags that we've just set
|
| llvm-svn: 306382
* Recommitting 306331. | Ayal Zaks | 2017-06-27 | 1 | -287/+300
| Undoing revert 306338 after fixing the bug: add metadata to the load instead of the reverse shuffle added to it, retaining the original ValueMap implementation.
|
| llvm-svn: 306381
* [SROA] Fix PR32902 by more carefully propagating !nonnull metadata. | Chandler Carruth | 2017-06-27 | 1 | -2/+13
| This is based heavily on the work done in D34285. I mostly wanted to do test cleanup for the author to save them some time, but I had a really hard time understanding why it was so hard to write better test cases for these issues.
|
| The problem is that because SROA does a second rewrite of the loads and because we *don't* propagate !nonnull for non-pointer loads, we first introduced invalid !nonnull metadata and then stripped it back off just in time to avoid most ways of this PR manifesting.
|
| Moving to the more careful utility only fixes this by changing the predicate to look at the new load's type rather than the target type. However, that *does* fix the bug, and the utility is much nicer including adding range metadata to model the nonnull property after a conversion to an integer.
|
| However, we have bigger problems because we don't actually propagate *range* metadata, and the utility to do this extracted from instcombine isn't really in good shape to do this currently. It *only* handles the case of copying range metadata from an integer load to a pointer load. It doesn't even handle the trivial cases of propagating from one integer load to another when they are the same width! This utility will need to be beefed up prior to using in this location to get the metadata to fully survive. And even then, we need to go and teach things to turn the range metadata into an assume the way we do with nonnull so that when we *promote* an integer we don't lose the information.
|
| All of this will require a new test case that looks kind-of like `preserve-nonnull.ll` does here but focuses on range metadata. It will also likely require more testing because it needs to correctly handle changes to the integer width, especially as SROA actively tries to change the integer width!
|
| Last but not least, I'm a little worried about hooking the range metadata up here because the instcombine logic for converting from a range metadata *to* a nonnull metadata node seems broken in the face of non-zero address spaces where null is not mapped to the integer `0`. So that probably needs to get fixed with test cases both in SROA and in instcombine to cover it.
|
| But this *does* extract the core PR fix from D34285 of preventing the !nonnull metadata from being propagated in a broken state just long enough to feed into promotion and crash value tracking.
|
| On D34285 there is some discussion of zero-extend handling because it isn't necessary. First, the new load size covers all of the non-undef (i.e., possibly initialized) bits. This may even extend past the original alloca if loading those bits could produce valid data. The only way it's valid for us to zero-extend an integer load in SROA is if the original code had a zero extend or those bits were undef. And we get to assume things like undef *never* satisfies nonnull, so non-undef bits can participate here. No need to special-case the zero-extend handling, it just falls out correctly.
|
| The original credit goes to Ariel Ben-Yehuda! I'm mostly landing this to save a few rounds of trivial edits fixing style issues and test case formulation.
|
| Differential Revision: D34285
| llvm-svn: 306379
* AMDGPU: M0 operands to spill/restore opcodes are dead | Nicolai Haehnle | 2017-06-27 | 1 | -2/+2
| | | | | | | | | | | | | | | | | Summary: With scalar stores, M0 is clobbered and therefore marked as implicitly defined. However, it is also dead. This fixes an assertion when the Greedy Register Allocator decides to optimize a spill/restore pair away again (via tryHintsRecoloring). Reviewers: arsenm Subscribers: qcolombet, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D33319 llvm-svn: 306375
* Fixed the warning introduced by r306289 to make ubuntu-gcc7.1-werror bot green. | Galina Kistanova | 2017-06-27 | 1 | -1/+1
| | | | llvm-svn: 306369
* [Reassociate] Make sure EraseInst sets MadeChange | Mikael Holmen | 2017-06-27 | 1 | -0/+2
| | | | | | | | | | | | | | | | | | | | | Summary: EraseInst didn't report that it made IR changes through MadeChange. It is essential that changes to the IR are reported correctly, since for example ReassociatePass::run() will indicate that all analyses are preserved otherwise. And the CGPassManager determines if the CallGraph is up-to-date based on status from InstructionCombiningPass::runOnFunction(). Reviewers: craig.topper, rnk, davide Reviewed By: rnk, davide Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D34616 llvm-svn: 306368
* [PowerPC] set optimization level in SelectionDAGISel | Hiroshi Inoue | 2017-06-27 | 3 | -6/+8
| | | | | | | | | PowerPC backend does not pass the current optimization level to SelectionDAGISel and so SelectionDAGISel works with the default optimization level regardless of the current optimization level. This patch makes the PowerPC backend set the optimization level correctly. Differential Revision: https://reviews.llvm.org/D34615 llvm-svn: 306367
* [AVR] Migrate to new MCAsmBackend applyFixup and processFixupValue | Leslie Zhai | 2017-06-27 | 2 | -28/+26
| | | | | | | | | | | | Reviewers: rafael, dylanmckay, jroelofs, meadori Reviewed By: rafael, meadori Subscribers: meadori, llvm-commits Differential Revision: https://reviews.llvm.org/D34551 llvm-svn: 306359
* [CFLAA] Move a common function to the header to reduce duplication. | Davide Italiano | 2017-06-27 | 2 | -24/+2
| | | | | | Differential Revision: https://reviews.llvm.org/D34660 llvm-svn: 306354
* ScheduleDAGInstrs: Fix fixupKills() adding too many kill flags. | Matthias Braun | 2017-06-27 | 1 | -1/+1
| Remove invalid shortcut in fixupKills(): A register needs to be marked live even when we are not adding a kill flag. This is because a partially live register must not get a kill flag, but it still needs to be fully marked live when walking backwards.
|
| llvm-svn: 306352
* [CFLAA] Use raw pointers instead of Optional<Pointer>. NFC. | Davide Italiano | 2017-06-27 | 1 | -9/+9
| | | | | | | | | Using Optional<> here doesn't seem to be terribly valuable, but this is not the main point of this change. The change enables us to merge the (now) two identical copies of parentFunctionOfValue() that Steensgaard's and Andersens' provide. llvm-svn: 306351
* [CFLAA] Change FunctionHandle to be common to Steensgaard's and Andersens' | Davide Italiano | 2017-06-26 | 2 | -3/+3
| | | | | | Differential Revision: https://reviews.llvm.org/D34638 llvm-svn: 306348
* DAGCombine: Make sure we only eliminate trunc/extend when the scales of truncation and extension match. | Wolfgang Pieb | 2017-06-26 | 1 | -5/+9
| This fixes PR33368.
|
| Reviewer: rksimon
| Differential Revision: https://reviews.llvm.org/D34069
| llvm-svn: 306345
* revert r306336 for breaking ppc test. | Dehao Chen | 2017-06-26 | 1 | -1/+1
| | | | llvm-svn: 306344
* [CodeGen] Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes (NFC). | Eugene Zelenko | 2017-06-26 | 6 | -70/+136
| llvm-svn: 306341
* [Coverage] Improve readability by using a struct. NFC. | Vedant Kumar | 2017-06-26 | 1 | -22/+20
| | | | llvm-svn: 306340
* reverting 306331. | Ayal Zaks | 2017-06-26 | 1 | -293/+286
| Causes TBAA metadata to be generated on reverse shuffles, investigating.
|
| llvm-svn: 306338
* Enable vectorizer-maximize-bandwidth by default. | Dehao Chen | 2017-06-26 | 1 | -1/+1
| Summary:
| vectorizer-maximize-bandwidth is generally useful in terms of performance. I've tested the impact of changing this to default on speccpu benchmarks on sandybridge machines. The result shows non-negative impact:
|
| spec/2006/fp/C++/444.namd         26.84  -0.31%
| spec/2006/fp/C++/447.dealII       46.19  +0.89%
| spec/2006/fp/C++/450.soplex       42.92  -0.44%
| spec/2006/fp/C++/453.povray       38.57  -2.25%
| spec/2006/fp/C/433.milc           24.54  -0.76%
| spec/2006/fp/C/470.lbm            41.08  +0.26%
| spec/2006/fp/C/482.sphinx3        47.58  -0.99%
| spec/2006/int/C++/471.omnetpp     22.06  +1.87%
| spec/2006/int/C++/473.astar       22.65  -0.12%
| spec/2006/int/C++/483.xalancbmk   33.69  +4.97%
| spec/2006/int/C/400.perlbench     33.43  +1.70%
| spec/2006/int/C/401.bzip2         23.02  -0.19%
| spec/2006/int/C/403.gcc           32.57  -0.43%
| spec/2006/int/C/429.mcf           40.35  +0.27%
| spec/2006/int/C/445.gobmk         26.96  +0.06%
| spec/2006/int/C/456.hmmer         24.4   +0.19%
| spec/2006/int/C/458.sjeng         27.91  -0.08%
| spec/2006/int/C/462.libquantum    57.47  -0.20%
| spec/2006/int/C/464.h264ref       46.52  +1.35%
|
| geometric mean                           +0.29%
|
| The regression on 453.povray seems real, but is due to secondary effects as all hot functions are bit-identical with and without the flag.
|
| I started this patch to consult upstream opinions on this. It will be greatly appreciated if the community can help test the performance impact of this change on other architectures so that we can decide if this should be target-dependent.
|
| Reviewers: hfinkel, mkuper, davidxl, chandlerc
| Reviewed By: chandlerc
| Subscribers: rengolin, sanjoy, javed.absar, bjope, dorit, magabari, RKSimon, llvm-commits, mzolotukhin
| Differential Revision: https://reviews.llvm.org/D33341
| llvm-svn: 306336
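The commit message reports a "geometric mean" of the per-benchmark changes but does not say how it was computed. A minimal sketch of one common convention, assumed here rather than taken from the commit: treat each percentage change p as the ratio 1 + p/100, take the geometric mean of the ratios, and convert back to a percentage. The function name `geomeanPercent` is hypothetical.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper: geometric mean of per-benchmark percentage changes.
// Each change p (in percent) is treated as the ratio 1 + p/100. This is an
// assumed convention, not something stated in the commit message.
// Inputs are expected to be greater than -100 (a ratio must stay positive).
double geomeanPercent(const std::vector<double> &percentChanges) {
  if (percentChanges.empty())
    return 0.0; // no data: report "no change"
  double logSum = 0.0;
  for (double p : percentChanges)
    logSum += std::log1p(p / 100.0); // log(1 + p/100), accurate for small p
  double meanLog = logSum / static_cast<double>(percentChanges.size());
  return std::expm1(meanLog) * 100.0; // convert the mean ratio back to percent
}
```

Averaging in log space means a doubling and a halving cancel out exactly, which is why the geometric mean is usually preferred over the arithmetic mean for speedup ratios.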
* Fix the bug when handling shufflevector for aarch64. | Dehao Chen | 2017-06-26 | 1 | -2/+3
| Summary: This fixes https://bugs.llvm.org/show_bug.cgi?id=33600
|
| Reviewers: mssimpso, davidxl, Carrot
| Reviewed By: mssimpso
| Subscribers: aemerson, rengolin, sanjoy, javed.absar, llvm-commits, kristof.beyls
| Differential Revision: https://reviews.llvm.org/D34641
| llvm-svn: 306334
* RenameIndependentSubregs: Fix iterator problem | Matt Arsenault | 2017-06-26 | 1 | -0/+3
| | | | | | | | | | Fixes bug 33597. Use of substituteRegister in the tied operand case messes up the register use iterator, causing some uses to be left unprocessed. llvm-svn: 306333
* [LV] Changing the interface of ValueMap, NFC. | Ayal Zaks | 2017-06-26 | 1 | -286/+293
| | | | | | | | | | | | Instead of providing access to the internal MapStorage holding all Values associated with a given Key, used for setting or resetting them all together, ValueMap keeps its MapStorage internal; its new interface allows getting, setting or resetting a single Value, per part or per part-and-lane. Follows the discussion in https://reviews.llvm.org/D32871. Differential Revision: https://reviews.llvm.org/D34473 llvm-svn: 306331
* AArch64: legalize G_EXTRACT operations. | Tim Northover | 2017-06-26 | 5 | -7/+76
| | | | | | | This is the dual problem to legalizing G_INSERTs so most of the code and testing was cribbed from there. llvm-svn: 306328
* [DWARF] NFC: Give DwarfFormat a 1-byte base type. | Paul Robinson | 2017-06-26 | 1 | -2/+2
| | | | | | | In particular this reduces DWARFFormParams from 64 to 32 bits; pass it around by value. llvm-svn: 306324
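The size reduction described in this commit can be illustrated with a small sketch. The types below are hypothetical stand-ins, not the LLVM definitions; the field widths (16-bit version, 8-bit address size) are assumptions chosen so that the struct with a 1-byte enum fits in 32 bits.

```cpp
#include <cstdint>

// Enum without a fixed underlying type: the compiler picks one (commonly int).
enum WideFormat { WIDE_DWARF32, WIDE_DWARF64 };

// The same enumerators with an explicit 1-byte base type.
enum NarrowFormat : std::uint8_t { NARROW_DWARF32, NARROW_DWARF64 };

// Hypothetical stand-in for a "form parameters" struct: a version, an
// address size, and the format. Field widths are assumptions.
template <typename FormatT> struct FormParams {
  std::uint16_t Version;
  std::uint8_t AddrSize;
  FormatT Format;
};

static_assert(sizeof(NarrowFormat) == 1, "enum has a 1-byte base type");
static_assert(sizeof(FormParams<NarrowFormat>) == 4,
              "2 + 1 + 1 bytes, no padding needed");
```

On typical ABIs `sizeof(FormParams<WideFormat>)` is 8, because the `int`-sized enum forces padding after `AddrSize`. At 4 bytes the struct is cheap to pass by value, which matches the motivation given in the commit message.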
* AArch64: remove all kill flags when extending register liveness. | Tim Northover | 2017-06-26 | 1 | -1/+7
| | | | | | | | | | | | When we forward a stored value to a load and eliminate it entirely we need to make sure the liveness of the register is maintained all the way to its use. Previously we only cleared liveness on the store doing the forwarding, but there could be other killing uses in between. We already do the right thing when the load has to be converted into something else, it was just this one path that skipped it. llvm-svn: 306318
* [DWARF] NFC: Collect info used by DWARFFormValue into a helper. | Paul Robinson | 2017-06-26 | 5 | -150/+99
| | | | | | | | | | | Some forms have sizes that depend on the DWARF version, DWARF format (32/64-bit), or the size of an address. Collect these into a struct to simplify passing them around. Require callers to provide one when they query a form's size. Differential Revision: http://reviews.llvm.org/D34570 llvm-svn: 306315
* [GVN] Recommit the patch "Add phi-translate support in scalarpre". | Wei Mi | 2017-06-26 | 1 | -28/+158
| The recommit fixes three bugs: The first one is to use CurrentBlock instead of PREInstr's Parent as param of performScalarPREInsertion because the Parent of a clone instruction may be uninitialized. The second one is to stop PRE when CurrentBlock to its predecessor is a backedge and an operand of CurInst is defined inside of CurrentBlock. The same value defined inside of the loop in the last iteration cannot be regarded as available. The third one is an out-of-bound array access in a flipped if guard.
|
| Right now scalarpre doesn't have phi-translate support, so it will miss some simple pre opportunities. Like the following testcase, current scalarpre cannot recognize the last "a * b" is fully redundant because a and b used by the last "a * b" expr are both defined by phis.
|
| long a[100], b[100], g1, g2, g3;
| __attribute__((pure)) long goo();
|
| void foo(long a, long b, long c, long d) {
|   g1 = a * b;
|   if (__builtin_expect(g2 > 3, 0)) {
|     a = c;
|     b = d;
|     g2 = a * b;
|   }
|   g3 = a * b; // fully redundant.
| }
|
| The patch adds phi-translate support in scalarpre. This is only a temporary solution before the newpre based on newgvn is available.
|
| llvm-svn: 306313
* AMDGPU: Setup SP/FP in callee function prolog/epilog | Matt Arsenault | 2017-06-26 | 3 | -2/+78
| | | | llvm-svn: 306312
* Replace trivial use of external rc.exe by writing our own .res file. | Eric Beckmann | 2017-06-26 | 2 | -17/+10
| This patch removes the dependency on the external rc.exe tool by writing a simple .res file using our own library. In this patch I also added an explicit definition for the .res file magic. Furthermore, I added a unittest for embedded manifests and fixed a bug exposed by the test.
|
| llvm-svn: 306311
* [SystemZ] Fix missing emergency spill slot corner case | Ulrich Weigand | 2017-06-26 | 1 | -2/+15
| | | | | | | | | | | | | | | | We sometimes need emergency spill slots for the register scavenger. This may be the case when code needs to access a stack slot that has an offset of 4096 or more relative to the stack pointer. To make that determination, processFunctionBeforeFrameFinalized currently simply checks the total stack frame size of the current function. But this is not enough, since code may need to access stack slots in the caller's stack frame as well, in particular incoming arguments stored on the stack. This commit fixes the problem by taking argument slots into account. llvm-svn: 306305
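The difference between the old and new checks can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in the patch: the function names are hypothetical, the 4096 limit is taken from the commit message, and the actual computation in the backend may include additional terms.

```cpp
#include <cstdint>

// Stack offsets of 4096 or more need a scratch register, per the commit
// message; anything below is treated as directly reachable.
constexpr std::uint64_t kDirectReachLimit = 4096;

// Old behaviour as described: look only at the current function's frame size.
bool frameSizeOnlyCheck(std::uint64_t frameSize) {
  return frameSize >= kDirectReachLimit;
}

// Sketch of the fixed behaviour: incoming stack arguments live in the
// caller's frame, beyond the current frame, so the furthest offset the
// code may touch is the frame size plus the largest incoming-argument
// offset. Both parameters are hypothetical inputs for illustration.
bool needsEmergencySpillSlots(std::uint64_t frameSize,
                              std::uint64_t maxIncomingArgOffset) {
  std::uint64_t maxReach = frameSize + maxIncomingArgOffset;
  return maxReach >= kDirectReachLimit;
}
```

For example, a 4000-byte frame passes the frame-size-only check, yet an incoming argument 200 bytes into the caller's frame sits at offset 4200 from the stack pointer, which is the corner case the commit describes.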
* [inline asm] dot operator while using imm generates wrong ir + asm - llvm part | Marina Yatsina | 2017-06-26 | 1 | -2/+1
| Inline asm dot operator while using imm generates wrong ir and asm
| This also fixes bugzilla 32987: https://bugs.llvm.org//show_bug.cgi?id=32987
|
| The clang part of the review that contains the test can be found here: https://reviews.llvm.org/D33040
|
| commit on behalf of zizhar
|
| Differential Revision: https://reviews.llvm.org/D33039
| llvm-svn: 306300
* [X86][AVX-512] Don't raise inexact in ceil, floor, round, trunc. | Ahmed Bougacha | 2017-06-26 | 1 | -12/+12
| | | | | | | | | | | | The non-AVX-512 behavior was changed in r248266 to match N1778 (C bindings for IEEE-754 (2008)), which defined the four functions to not raise the inexact exception ("rint" is still defined as raising it). Update the AVX-512 lowering of these functions to match that: it should not be different. llvm-svn: 306299
* AMDGPU/GlobalISel: Mark 32-bit G_SHL as legal | Tom Stellard | 2017-06-26 | 1 | -0/+2
| | | | | | | | | | | | Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D34589 llvm-svn: 306298
* [x86] transform vector inc/dec to use -1 constant (PR33483) | Sanjay Patel | 2017-06-26 | 1 | -0/+32
| Convert vector increment or decrement to sub/add with an all-ones constant:
|
| add X, <1, 1...> --> sub X, <-1, -1...>
| sub X, <1, 1...> --> add X, <-1, -1...>
|
| The all-ones vector constant can be materialized using a pcmpeq instruction that is commonly recognized as an idiom (has no register dependency), so that's better than loading a splat 1 constant.
|
| AVX512 uses 'vpternlogd' for 512-bit vectors because there is apparently no better way to produce 512 one-bits.
|
| The general advantages of this lowering are:
| 1. pcmpeq has lower latency than a memop on every uarch I looked at in Agner's tables, so in theory, this could be better for perf, but...
| 2. That seems unlikely to affect any OOO implementation, and I can't measure any real perf difference from this transform on Haswell or Jaguar, but...
| 3. It doesn't look like it from the diffs, but this is an overall size win because we eliminate 16 - 64 constant bytes in the case of a vector load. If we're broadcasting a scalar load (which might itself be a bug), then we're replacing a scalar constant load + broadcast with a single cheap op, so that should always be smaller/better too.
| 4. This makes the DAG/isel output more consistent - we use pcmpeq already for padd x, -1 and psub x, -1, so we should use that form for +1 too because we can.
|
| If there's some reason to favor a constant load on some CPU, let's make the reverse transform for all of these cases (either here in the DAG or in a later machine pass).
|
| This should fix:
| https://bugs.llvm.org/show_bug.cgi?id=33483
|
| Differential Revision: https://reviews.llvm.org/D34336
| llvm-svn: 306289
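The arithmetic identity behind this transform can be checked with a scalar model. The sketch below is not the DAG combine itself: it models a 4-lane vector of 32-bit integers with `std::array`, and all names (`Vec4`, `splat`, `incViaSub`, `decViaAdd`) are hypothetical. It shows that adding a splat of 1 equals subtracting a splat of all-ones (-1) under wraparound arithmetic, and vice versa.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Scalar model of a 4-lane vector of 32-bit integers (illustrative only).
using Vec4 = std::array<std::uint32_t, 4>;

// Broadcast one value to every lane.
Vec4 splat(std::uint32_t v) { return {v, v, v, v}; }

// Lane-wise wrapping add.
Vec4 add(const Vec4 &a, const Vec4 &b) {
  Vec4 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = a[i] + b[i];
  return r;
}

// Lane-wise wrapping subtract.
Vec4 sub(const Vec4 &a, const Vec4 &b) {
  Vec4 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = a[i] - b[i];
  return r;
}

// add X, <1, 1...>  -->  sub X, <-1, -1...>
Vec4 incViaSub(const Vec4 &x) { return sub(x, splat(0xFFFFFFFFu)); }

// sub X, <1, 1...>  -->  add X, <-1, -1...>
Vec4 decViaAdd(const Vec4 &x) { return add(x, splat(0xFFFFFFFFu)); }
```

Because unsigned arithmetic wraps modulo 2^32, the two forms agree for every lane value, including the overflow cases 0 and 0xFFFFFFFF.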
* [Hexagon] Handle cases when the aligned stack pointer is missing | Krzysztof Parzyszek | 2017-06-26 | 2 | -9/+19
| | | | llvm-svn: 306288
* [SystemZ] Add a check against zero before calling getTestUnderMaskCond() | Jonas Paulsson | 2017-06-26 | 1 | -0/+2
| | | | | | | | | | | | | | Csmith discovered that this function can be called with a zero argument, in which case an assert for this triggered. This patch also adds a guard before the other call to this function since it was missing, although the test only covers the case where it was discovered. Reduced test case attached as CodeGen/SystemZ/int-cmp-54.ll. Review: Ulrich Weigand llvm-svn: 306287