summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
...
* [DWARF] - Use DWARFAddressRange struct instead of uint64_t pair for ↵George Rimar2017-05-164-19/+18
| | | | | | | | | | | DWARFAddressRangesVector. Recommit of r303159 "[DWARF] - Use DWARFAddressRange struct instead of uint64_t pair for DWARFAddressRangesVector" All places were shitched to use DWARFAddressRange now. Suggested during review of D33184. llvm-svn: 303163
* Revert r303159 "[DWARF] - Use DWARFAddressRange struct instead of uint64_t ↵George Rimar2017-05-162-8/+10
| | | | | | | | | pair for DWARFAddressRangesVector." Something went wrong, it broke BB. http://green.lab.llvm.org/green//job/clang-stage1-cmake-RA-incremental_build/38477/consoleFull#-200034420049ba4694-19c4-4d7e-bec5-911270d8a58c llvm-svn: 303162
* [DWARF] - Use DWARFAddressRange struct instead of uint64_t pair for ↵George Rimar2017-05-162-10/+8
| | | | | | | | DWARFAddressRangesVector. Suggested during review of D33184. llvm-svn: 303159
* [LTO] Print time-passes information at conclusion of LTO codegenJames Henderson2017-05-163-0/+15
| | | | | | | | | | | | | | | | | | | The information collected when requested by -time-passes is only printed when llvm_shutdown is called at the moment. This means that when linking against the LTO library dynamically and using the C interface, it is not possible to see the timing information, because llvm_shutdown cannot be called. This change modifies the LTO code generation functions for both regular LTO and thin LTO to explicitly print and reset the timing information. I have tested that this works with our proprietary linker. However, as this relies on a specific method of building and linking against the LTO library, I'm not sure how or if this can be tested in the LLVM testsuite. Reviewed by: mehdi_amini Differential Revision: https://reviews.llvm.org/D32803 llvm-svn: 303152
* [SCEV] Fix sorting order for AddRecExprsMax Kazantsev2017-05-161-16/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The existing sorting order in defined CompareSCEVComplexity sorts AddRecExprs by loop depth, but does not pay attention to dominance of loops. This can lead us to the following buggy situation: for (...) { // loop1 op1 = {A,+,B} } for (...) { // loop2 op2 = {A,+,B} S = add op1, op2 } In this case there is no guarantee that in operand list of S the op2 comes before op1 (loop depth is the same, so they will be sorted just lexicographically), so we can incorrectly treat S as a recurrence of loop1, which is wrong. This patch changes the sorting logic so that it places the dominated recs before the dominating recs. This ensures that when we pick the first recurrency in the operands order, it will be the bottom-most in terms of domination tree. The attached test set includes some tests that produce incorrect SCEV estimations and crashes with oldlogic. Reviewers: sanjoy, reames, apilipenko, anna Reviewed By: sanjoy Subscribers: llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D33121 llvm-svn: 303148
* [CorrelatedValuePropagation] Don't use -> to call a static method of ↵Craig Topper2017-05-161-6/+4
| | | | | | ConstantRange. NFC llvm-svn: 303147
* NewGVN: Use StoreExpression StoredValue instead of looking it up again, ↵Daniel Berlin2017-05-161-6/+5
| | | | | | since it was already looked up when it was created llvm-svn: 303144
* NewGVN: Formatting fixesDaniel Berlin2017-05-161-2/+3
| | | | llvm-svn: 303143
* Revert "[NewGVN] Replace predicate info leftovers."Davide Italiano2017-05-161-4/+0
| | | | | | It's breaking the bots. llvm-svn: 303142
* [NewGVN] Replace predicate info leftovers.Davide Italiano2017-05-161-0/+4
| | | | | | | | Fixes PR32945. Differential Revision: https://reviews.llvm.org/D33226 llvm-svn: 303141
* AMDGPUCodeGen: Fix warnings in r303111. [-Wunused-variable]NAKAMURA Takumi2017-05-162-2/+4
| | | | llvm-svn: 303137
* IR: Give function GlobalValue::getRealLinkageName() a less misleading name: ↵Peter Collingbourne2017-05-1611-20/+20
| | | | | | | | | | | | dropLLVMManglingEscape(). This function gives the wrong answer on some non-ELF platforms in some cases. The function that does the right thing lives in Mangler.h. To try to discourage people from using this function, give it a different name. Differential Revision: https://reviews.llvm.org/D33162 llvm-svn: 303134
* [ShrinkWrapping] Handle restores on no-return pathsFrancis Visoiu Mistrih2017-05-151-2/+8
| | | | | | | | | | | | | | | | | | | Shrink-wrapping uses post-dominators to find a restore point that post-dominates all the uses of CSR / stack. The way dominator trees are modeled in LLVM today is that unreachable blocks are not present in a generic dominator tree, so, an unreachable node is dominated by anything: include/llvm/Support/GenericDomTree.h:467. Since for post-dominators, a no-return block is considered "unreachable", calling findNearestCommonDominator on an unreachable node A and a non-unreachable node B, will return B, which can be false. If we find such node, we bail out since there is no good restore point available. rdar://problem/30186931 llvm-svn: 303130
* [libFuzzer] fix tests on WindowsKostya Serebryany2017-05-151-0/+1
| | | | llvm-svn: 303128
* Fix memory leakXinliang David Li2017-05-151-0/+4
| | | | llvm-svn: 303126
* [libFuzzer] improve the afl driver and it's tests. Make it possible to run ↵Kostya Serebryany2017-05-153-13/+77
| | | | | | individual inputs with afl driver llvm-svn: 303125
* [AMDGPU] Kill now unused phiInfoElementGetDebugLoc(). NFCI.Davide Italiano2017-05-151-5/+0
| | | | llvm-svn: 303122
* [APInt] Simplify a for loop initialization based on the fact that 'n' is ↵Craig Topper2017-05-151-1/+1
| | | | | | known to be 1 by an earlier 'if'. llvm-svn: 303120
* [IR] Fix some Clang-tidy modernize-use-using warnings; other minor fixes (NFC).Eugene Zelenko2017-05-155-67/+108
| | | | llvm-svn: 303119
* AArch64: use linker-private symbols for globals in MachO.Tim Northover2017-05-152-0/+11
| | | | | | | | We don't use section-relative relocations on AArch64, so all symbols must be at least visible to the linker (i.e. properly global or l_whatever, but not L_whatever). llvm-svn: 303118
* PR32288: Describe a bool parameter's DWARF location with a simple registerDavid Blaikie2017-05-151-28/+23
| | | | | | | | | | | There's no need (& a bit incorrect) to mask off the high bits of the register reference when describing a simple bool value. Reviewers: aprantl Differential Revision: https://reviews.llvm.org/D31062 llvm-svn: 303117
* [SLP] Enable 64-bit wide vectorization on AArch64Adam Nemet2017-05-155-1/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ARM Neon has native support for half-sized vector registers (64 bits). This is beneficial for example for 2D and 3D graphics. This patch adds the option to lower MinVecRegSize from 128 via a TTI in the SLP Vectorizer. *** Performance Analysis This change was motivated by some internal benchmarks but it is also beneficial on SPEC and the LLVM testsuite. The results are with -O3 and PGO. A negative percentage is an improvement. The testsuite was run with a sample size of 4. ** SPEC * CFP2006/482.sphinx3 -3.34% A pretty hot loop is SLP vectorized resulting in nice instruction reduction. This used to be a +22% regression before rL299482. * CFP2000/177.mesa -3.34% * CINT2000/256.bzip2 +6.97% My current plan is to extend the fix in rL299482 to i16 which brings the regression down to +2.5%. There are also other problems with the codegen in this loop so there is further room for improvement. ** LLVM testsuite * SingleSource/Benchmarks/Misc/ReedSolomon -10.75% There are multiple small SLP vectorizations outside the hot code. It's a bit surprising that it adds up to 10%. Some of this may be code-layout noise. * MultiSource/Benchmarks/VersaBench/beamformer/beamformer -8.40% The opt-viewer screenshot can be seen at F3218284. We start at a colder store but the tree leads us into the hottest loop. * MultiSource/Applications/lambda-0.1.3/lambda -2.68% * MultiSource/Benchmarks/Bullet/bullet -2.18% This is using 3D vectors. * SingleSource/Benchmarks/Shootout-C++/Shootout-C++-lists +6.67% Noise, binary is unchanged. * MultiSource/Benchmarks/Ptrdist/anagram/anagram +4.90% There is an additional SLP in the cold code. The test runs for ~1sec and prints out over 2000 lines. This is most likely noise. * MultiSource/Applications/aha/aha +1.63% * MultiSource/Applications/JM/lencod/lencod +1.41% * SingleSource/Benchmarks/Misc/richards_benchmark +1.15% Differential Revision: https://reviews.llvm.org/D31965 llvm-svn: 303116
* Revert r302678 "[AArch64] Enable use of reduction intrinsics."Hans Wennborg2017-05-154-99/+269
| | | | | | | | | | | | | | | | | | This caused PR33053. Original commit message: > The new experimental reduction intrinsics can now be used, so I'm enabling this > for AArch64. We will need this for SVE anyway, so it makes sense to do this for > NEON reductions as well. > > The existing code to match shufflevector patterns are replaced with a direct > lowering of the reductions to AArch64-specific nodes. Tests updated with the > new, simpler, representation. > > Differential Revision: https://reviews.llvm.org/D32247 llvm-svn: 303115
* [asan] Better workaround for gold PR19002.Evgeniy Stepanov2017-05-151-2/+11
| | | | | | See the comment for more details. Test in a follow-up CFE commit. llvm-svn: 303113
* Re-submit AMDGPUMachineCFGStructurizer.Jan Sjodin2017-05-157-12/+3245
| | | | | | Differential Revision: https://reviews.llvm.org/D23209 llvm-svn: 303111
* AArch64: diagnose unrecognized features in .cpu directive.Tim Northover2017-05-151-2/+17
| | | | | | | We were silently ignoring any features we couldn't match up, which led to errors in an inline asm block missing the conventional "\n\t". llvm-svn: 303108
* [NewGVN] Remove unused setDefiningExpr(). NFCI.Davide Italiano2017-05-151-1/+0
| | | | llvm-svn: 303107
* [InstCombine] restrict icmp fold with 2 sdiv exact operands (PR32949)Sanjay Patel2017-05-151-2/+9
| | | | | | | | | | | | | | | | This is the InstCombine counterpart to D32954. I added some comments about the code duplication in: rL302436 Alive-based verification: http://rise4fun.com/Alive/dPw This is a 2nd fix for the problem reported in: https://bugs.llvm.org/show_bug.cgi?id=32949 Differential Revision: https://reviews.llvm.org/D32970 llvm-svn: 303105
* [InstSimplify] restrict icmp fold with 2 sdiv exact operands (PR32949)Sanjay Patel2017-05-151-2/+11
| | | | | | | | | | | | | | | | | | | | These folds were introduced with https://reviews.llvm.org/rL127064 as part of solving: https://bugs.llvm.org/show_bug.cgi?id=9343 As shown here: http://rise4fun.com/Alive/C8 ...however, the sdiv exact case needs a stronger predicate. I opted for duplicated code instead of adding another fallthrough because I think that's easier to read (and edit in case we need/want to restrict/loosen the predicates any more). This should fix: https://bugs.llvm.org/show_bug.cgi?id=32949 https://bugs.llvm.org/show_bug.cgi?id=32948 Differential Revision: https://reviews.llvm.org/D32954 llvm-svn: 303104
* The patch adds CTLZ idiom recognition.Evgeny Stupachenko2017-05-151-1/+294
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The following loops should be recognized: i = 0; while (n) { n = n >> 1; i++; body(); } use(i); And replaced with builtin_ctlz(n) if body() is empty or for CPUs that have CTLZ instruction converted to countable: for (j = 0; j < builtin_ctlz(n); j++) { n = n >> 1; i++; body(); } use(builtin_ctlz(n)); Reviewers: rengolin, joerg Differential Revision: http://reviews.llvm.org/D32605 From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 303102
* [NewGVN] Fix verification of MemoryPhis in verifyMemoryCongruency().Davide Italiano2017-05-151-0/+13
| | | | | | | | | | | | verifyMemoryCongruency() filters out trivially dead MemoryDef(s), as we find them immediately dead, before moving from TOP to a new congruence class. This fixes the same problem for PHI(s) skipping MemoryPhis if all the operands are dead. Differential Revision: https://reviews.llvm.org/D33044 llvm-svn: 303100
* [AArch64][Falkor] Fix sched details for FMOVGeoff Berry2017-05-152-4/+6
| | | | llvm-svn: 303099
* Revert 303091.Jan Sjodin2017-05-157-3380/+12
| | | | llvm-svn: 303098
* Add support for handling ifuncs to GlobalValue::getBaseObjectTeresa Johnson2017-05-151-1/+1
| | | | | | | | | | | | | | | | | Summary: All GlobalIndirectSymbol types (not just GlobalAlias) should return their base object. Without this patch LTO would warn "Unable to determine comdat of alias!" for an ifunc. Reviewers: pcc Subscribers: mehdi_amini, inglorion, llvm-commits Differential Revision: https://reviews.llvm.org/D33202 llvm-svn: 303096
* [SCEV] Use copy initialization of APInts instead of direct initialization.Craig Topper2017-05-151-6/+6
| | | | | | This is based on post commit feed back from r302769. llvm-svn: 303092
* Add AMDGPUMachineCFGStructurizer.Jan Sjodin2017-05-157-12/+3380
| | | | | | Differential Revision: https://reviews.llvm.org/D23209 llvm-svn: 303091
* [InstCombine] use m_OneUse to reduce code; NFCISanjay Patel2017-05-151-6/+7
| | | | llvm-svn: 303090
* [libFuzzer] fix a warning from Wunreachable-code-loop-increment reported by ↵Kostya Serebryany2017-05-151-1/+1
| | | | | | Christian Holler. This also fixes a logical bug, which however does not affect the libFuzzer's ability too much (I wasn't able to create a differentiating test) llvm-svn: 303087
* CodeGen: BlockPlacement: Increase tail duplication size for O3.Kyle Butt2017-05-151-3/+27
| | | | | | | | | | | | | | | | | | | | At O3 we are more willing to increase size if we believe it will improve performance. The current threshold for tail-duplication of 2 instructions is conservative, and can be relaxed at O3. Benchmark results: llvm test-suite: 6% improvement in aha, due to duplication of loop latch 3% improvement in hexxagon 2% slowdown in lpbench. Seems related, but couldn't completely diagnose. Internal google benchmark: Produces 4% improvement on internal google protocol buffer serialization benchmarks. Differential-Revision: https://reviews.llvm.org/D32324 llvm-svn: 303084
* [NVPTX] Don't flag StoreParam/LoadParam memory chain operands as ↵Simon Pilgrim2017-05-151-3/+3
| | | | | | | | | | | | | | ReadMem/WriteMem (PR32146) Follow up to D33147 NVPTXTargetLowering::LowerCall was trusting the default argument values. Fixes another 17 of the NVPTX '-verify-machineinstrs with EXPENSIVE_CHECKS' errors in PR32146. Differential Revision: https://reviews.llvm.org/D33189 llvm-svn: 303082
* [AArch64] Enable FeatureFuseAES on Cortex-A72.Florian Hahn2017-05-151-0/+1
| | | | | | | | This patch enables fusing dependent AESE/AESMC and AESD/AESIMC instruction pairs on Cortex-A72, as recommended in the Software Optimization Guide, section 4.10. llvm-svn: 303073
* [AMDGPU][MC] Corrected several VI opcodes to avoid printing _e64Dmitry Preobrazhensky2017-05-151-11/+22
| | | | | | | | | | See bug 32936: https://bugs.llvm.org//show_bug.cgi?id=32936 Reviewers: artem.tamazov, vpykhtin Differential Revision: https://reviews.llvm.org/D33123 llvm-svn: 303070
* [AMDGPU][MC] Removed V_MQSAD_U16_U8Dmitry Preobrazhensky2017-05-151-3/+0
| | | | | | | | | | | | This instruction does not really exist See Bug 33018: https://bugs.llvm.org//show_bug.cgi?id=33018 Reviewers: vpykhtin, artem.tamazov Differential Revision: https://reviews.llvm.org/D33126 llvm-svn: 303055
* [ARM] Mark LEApcrel instructions as isAsCheapAsAMoveJohn Brawn2017-05-153-3/+3
| | | | | | | | | | | Doing this means that if an LEApcrel is used in two places we will rematerialize instead of generating two MOVs. This is particularly useful for printfs using the same format string, where we want to generate an address into a register that's going to get corrupted by the call. Differential Revision: https://reviews.llvm.org/D32858 llvm-svn: 303054
* [ARM] Mark LEApcrel as not having side effectsJohn Brawn2017-05-151-2/+2
| | | | | | | | | | | | | | | Doing this lets us hoist it out of loops, and I've also marked it as rematerializable the same as the thumb1 and thumb2 counterparts. It looks like it being marked as such was just a mistake, as the commit that made that change only mentions LEApcrelJT and in thumb1 and thumb2 only the LEApcrelJT instructions were marked as having side-effects, so it looks like the intent was to only mark LEApcrelJT as having side-effects but LEApcrel was accidentally marked as such also. Differential Revision: https://reviews.llvm.org/D32857 llvm-svn: 303053
* [DWARF] - Speedup handling of relocations in DWARFContextInMemory.George Rimar2017-05-151-4/+17
| | | | | | | | | | | | | | | | | | | I am working on a speedup of building .gdb_index in LLD and noticed that relocations that are proccessed in DWARFContextInMemory often uses the same symbol in a row. This patch introduces caching to reduce the relocations proccessing time. For benchmark, I took debug LLC binary objects configured with -ggnu-pubnames and linked it using LLD. Link time without --gdb-index is about 4,45s. Link time with --gdb-index: a) Without patch: 19,16s b) With patch: 15,52s That means time spent on --gdb-index in this configuration is 19,16s - 4,45s = 14,71s (without patch) vs 15,52s - 4,45s = 11,07s (with patch). Differential revision: https://reviews.llvm.org/D31136 llvm-svn: 303051
* [X86] Relocate code of replacement of subtarget unsupported masked memory ↵Ayman Musa2017-05-155-546/+667
| | | | | | | | | | | | | | intrinsics to run also on -O0 option. Currently, when masked load, store, gather or scatter intrinsics are used, we check in CodeGenPrepare pass if the subtarget support these intrinsics, if not we replace them with scalar code - this is a functional transformation not an optimization (not optional). CodeGenPrepare pass does not run when the optimization level is set to CodeGenOpt::None (-O0). Functional transformation should run with all optimization levels, so here I created a new pass which runs on all optimization levels and does no more than this transformation. Differential Revision: https://reviews.llvm.org/D32487 llvm-svn: 303050
* [NVPTX] Don't rely on default arguments to ↵Simon Pilgrim2017-05-151-3/+8
| | | | | | | | SelectionDAG::getMemIntrinsicNode. NFC. NFC followup to D33147, this explicitly sets all the arguments (instead of relying on the defaults) to SelectionDAG::getMemIntrinsicNode to help identify -verify-machineinstrs issues. llvm-svn: 303047
* [RegisterBankInfo] Remove overly-agressive assertsTom Stellard2017-05-151-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | Summary: We were asserting in RegisterBankInfo if RBI.copyCost() returns UINT_MAX. This is OK for RegBankSelect::Mode::Fast since we only try one instruction mapping and can't recover from this, but for RegBankSelect::Mode::Greedy we will be considering multiple instruction mappings, so we can recover if we see a UNIT_MAX copy cost. The copy cost for one pair of register banks in the AMDGPU backend will be UNIT_MAX, so this patch will prevent AMDGPU tests from breaking. Reviewers: ab, qcolombet, t.p.northover, dsanders Reviewed By: qcolombet Subscribers: tpr, llvm-commits Differential Revision: https://reviews.llvm.org/D33144 llvm-svn: 303043
* MCObjectStreamer : fail with a diagnostic when emitting an out of range value.Arnaud A. de Grandmaison2017-05-151-0/+5
| | | | | | | | | We were previously silently emitting bogus data in release mode, making it very hard to diagnose the error, or crashing with an assert in debug mode. A proper diagnostic is now always emitted when the value to be emitted is out of range. llvm-svn: 303041
OpenPOWER on IntegriCloud