path: root/llvm/lib
Commit message (Author, Date, Files changed, Lines -deleted/+added)
...
* [LSV] Use make_range, and reformat a DEBUG message. NFC (Justin Lebar, 2016-07-19, 1 file, -12/+15)

  Summary: The DEBUG message was hard to read because two Values were being printed on the same line with only the delimiter "aliases". This change makes us print each Value on its own line.

  Reviewers: asbirlea
  Subscribers: llvm-commits, arsenm, mzolotukhin
  Differential Revision: https://reviews.llvm.org/D22533
  llvm-svn: 276055

* [LSV] Nix two global (ish) variables in the LoadStoreVectorizer. NFC (Justin Lebar, 2016-07-19, 1 file, -10/+12)

  Reviewers: asbirlea
  Subscribers: mzolotukhin, llvm-commits, arsenm
  Differential Revision: https://reviews.llvm.org/D22532
  llvm-svn: 276054

* [libFuzzer] extend the messages printed by afl_driver (Kostya Serebryany, 2016-07-19, 1 file, -4/+12)

  llvm-svn: 276052

* AMDGPU: Change fdiv lowering based on !fpmath metadata (Matt Arsenault, 2016-07-19, 8 files, -49/+227)

  If 2.5 ulp of error is acceptable, denormals are not required, and the division isn't a reciprocal (which is already handled), replace it with a faster fdiv expansion. Simplify the lowering tests by using per-function subtarget features.

  llvm-svn: 276051

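  A minimal sketch of the metadata this lowering keys on (my illustration; the function and values are hypothetical, not from the commit). The !fpmath node attaches the maximum acceptable error, in ulps, to the fdiv:

  ```
  ; Hypothetical example: !0 permits 2.5 ulp of error on this fdiv,
  ; so the AMDGPU backend may pick the faster, less precise expansion.
  define float @fast_div(float %x, float %y) {
    %d = fdiv float %x, %y, !fpmath !0
    ret float %d
  }

  !0 = !{float 2.500000e+00}
  ```
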
* Fix unused variable (Daniel Berlin, 2016-07-19, 1 file, -2/+1)

  llvm-svn: 276050

* Make GVN Hoisting obey optnone/bisect. (Paul Robinson, 2016-07-19, 1 file, -0/+2)

  Differential Revision: http://reviews.llvm.org/D22545
  llvm-svn: 276048

* Make MemorySSA::dominates/locallyDominates constant time (Daniel Berlin, 2016-07-19, 1 file, -16/+36)

  Summary: Make MemorySSA::dominates/locallyDominates constant time.

  Reviewers: george.burgess.iv, gberry
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D22527
  llvm-svn: 276046

* Add AIX support to Path.inc, Host.h, and CMake. (Chandler Carruth, 2016-07-19, 1 file, -2/+3)

  Patch by Andrew Paprocki!

  Differential Revision: https://reviews.llvm.org/D18359
  llvm-svn: 276045

* RegScavenging: Add scavengeRegisterBackwards() (Matthias Braun, 2016-07-19, 2 files, -108/+240)

  This is a variant of scavengeRegister() that works for enterBasicBlockEnd()/backward(). The benefit of the backward mode is that it is not affected by incomplete kill flags. This patch also changes PrologEpilogInserter::doScavengeFrameVirtualRegs() to use the register scavenger in backwards mode.

  Differential Revision: http://reviews.llvm.org/D21885
  llvm-svn: 276044

* RegisterScavenger: Introduce backward() mode. (Matthias Braun, 2016-07-19, 1 file, -23/+84)

  This adds two pieces:

  - RegisterScavenger::enterBasicBlockEnd(), which behaves like enterBasicBlock() but starts tracking at the end of the basic block.
  - A RegisterScavenger::backward() method. It is subtly different from the existing unprocess() method, which only considers uses with the kill flag set: if a value is dead at the end of a basic block but has its last use inside that block, unprocess() will fail to mark it as live. However, we cannot change/fix this behaviour because unprocess() needs to perform the exact reverse operation of forward().

  Differential Revision: http://reviews.llvm.org/D21873
  llvm-svn: 276043

* [AArch64] Properly validate the reciprocal estimation. (Evandro Menezes, 2016-07-19, 1 file, -0/+6)

  Add check for legal data types when expanding into a Newton series.

  Differential Revision: https://reviews.llvm.org/D22267
  llvm-svn: 276041

* [InstCombine] fold add(zext(xor X, C), C) --> sext X when C is INT_MIN in the source type (Sanjay Patel, 2016-07-19, 1 file, -0/+10)

  The pattern may look more obviously like a sext if written as:

      define i32 @g(i16 %x) {
        %zext = zext i16 %x to i32
        %xor = xor i32 %zext, 32768
        %add = add i32 %xor, -32768
        ret i32 %add
      }

  We already have that fold in visitAdd().

  Differential Revision: https://reviews.llvm.org/D22477
  llvm-svn: 276035

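  A minimal sketch of the result (my illustration, not part of the commit message): after the fold, @g above collapses to a single sign extension.

  ```
  ; Hypothetical folded form of @g above.
  define i32 @g(i16 %x) {
    %sext = sext i16 %x to i32
    ret i32 %sext
  }
  ```
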
* Attempt to appease MSVC buildbots. (George Burgess IV, 2016-07-19, 1 file, -8/+10)

  Broken by r276026.

  llvm-svn: 276032

* [AMDGPU] Remove spurious line (should've been removed in r276029). (Davide Italiano, 2016-07-19, 1 file, -3/+0)

  llvm-svn: 276030

* [AMDGPU] Remove dead code. (Davide Italiano, 2016-07-19, 1 file, -25/+0)

  LGTM'd by Matt Arsenault.

  llvm-svn: 276029

* [CFLAA] Add some interproc. analysis to CFLAnders. (George Burgess IV, 2016-07-19, 2 files, -8/+188)

  This patch adds function summary support to CFLAnders. It also comes with a lot of tests! Woohoo!

  Patch by Jia Chen.

  Differential Revision: https://reviews.llvm.org/D22450
  llvm-svn: 276026

* Next step along the way to getting good error messages for bad archives. (Kevin Enderby, 2016-07-19, 1 file, -35/+46)

  This step builds on Lang Hames' work to change Archive::child_iterator for better interoperation with Error/Expected. Building on that, it is now possible to return an error message when the size field of an archive contains non-decimal characters.

  llvm-svn: 276025

* [CFLAA] Teach CFLAnders to distinguish reads from writes. (George Burgess IV, 2016-07-19, 1 file, -50/+95)

  This patch adds more specific edges to CFLAndersAliasAnalysis. The goal of these edges is to give us more information about *how* two values that MayAlias alias. With this, we can now tell cases like

      a = b; // ergo, a may alias b

  apart from

      a = c;
      b = c; // so, a may alias b, but only because they were both assigned to c.

  ...And others.

  Patch by Jia Chen.

  Differential Revision: https://reviews.llvm.org/D22429
  llvm-svn: 276023

* Use posix_fallocate instead of ftruncate. (Rafael Espindola, 2016-07-19, 1 file, -0/+9)

  This makes sure that space is actually available. With this change, running lld on a full file system causes it to exit with "failed to open foo: No space left on device" instead of crashing with SIGBUS.

  llvm-svn: 276017

* [tsan] Don't instrument __llvm_gcov_global_state_pred or __llvm_gcda* (Vedant Kumar, 2016-07-19, 1 file, -2/+3)

  r274801 did not go far enough to allow gcov+tsan to cooperate. With this commit it's possible to run the following code without false positives:

      std::thread T1(fib), T2(fib);
      T1.join();
      T2.join();

  llvm-svn: 276015

* ARM: move feature for Thumb2 pkhbt/pkhtb onto architectures. (Tim Northover, 2016-07-19, 1 file, -32/+14)

  There's not much functional change, but it really is an architectural feature (on v6T2, v7A, v7R and v7EM) rather than something each CPU implements individually. The main functional change is the default behaviour you get when specifying only "-triple".

  llvm-svn: 276013

* [GlobalISel] Mark newly-created gvregs as having a bank. (Ahmed Bougacha, 2016-07-19, 2 files, -3/+10)

  Also verify that we never try to set the size of a vreg associated with a register class. Report an error when we encounter that in MIR. Fix a testcase that hit that error and had a size for no reason.

  llvm-svn: 276012

* [GlobalISel] Simplify more RegClassOrRegBank is+get. NFC. (Ahmed Bougacha, 2016-07-19, 1 file, -5/+3)

  llvm-svn: 276011

* [FunctionAttrs] Correct the safety analysis for inference of 'returned' (David Majnemer, 2016-07-19, 1 file, -0/+51)

  We skipped over ReturnInsts which didn't return an argument, which would lead us to incorrectly conclude that an argument returned by another ReturnInst was 'returned'.

  This reverts commit r275756.

  This fixes PR28610.

  llvm-svn: 276008

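  To see the bug, consider a hedged sketch (a hypothetical function of my own, not taken from the commit) in which only one return path yields the argument; inferring 'returned' for %x here would be unsound:

  ```
  define i32 @f(i32 %x, i1 %c) {
  entry:
    br i1 %c, label %then, label %else
  then:
    ret i32 %x      ; returns the argument
  else:
    ret i32 0       ; does not return the argument, so %x is not 'returned'
  }
  ```
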
* [SCCP] Improve assert messages. NFCI. (Davide Italiano, 2016-07-19, 1 file, -4/+6)

  I've been hitting those already while working on SCCP, and I think it'd be useful to provide a more explanatory diagnostic.

  llvm-svn: 276007

* [libFuzzer] properly intercept memmem (Kostya Serebryany, 2016-07-19, 2 files, -2/+15)

  llvm-svn: 276006

* [DSE] Add additional debug output. NFC. (Chad Rosier, 2016-07-19, 1 file, -0/+2)

  llvm-svn: 276005

* [RegionPass] Some minor cleanups (David Majnemer, 2016-07-19, 1 file, -4/+2)

  No functional change is intended.

  llvm-svn: 276000

* [LoopPass] Some minor cleanups (David Majnemer, 2016-07-19, 1 file, -7/+5)

  No functional change is intended.

  llvm-svn: 275999

* [DSE] Add additional debug output. NFC. (Chad Rosier, 2016-07-19, 1 file, -0/+3)

  llvm-svn: 275991

* [InstCombine] Enable cast-folding in logic(cast(icmp), cast(icmp)) (Tobias Grosser, 2016-07-19, 1 file, -2/+8)

  Summary: Currently, InstCombine is already able to fold expressions of the form `logic(cast(A), cast(B))` to the simpler form `cast(logic(A, B))`, where logic designates one of `and`/`or`/`xor`. This transformation is implemented in `foldCastedBitwiseLogic()` in InstCombineAndOrXor.cpp. However, this optimization will not be performed if both `A` and `B` are `icmp` instructions.

  The decision to preclude casts of `icmp` instructions originates in r48715 in combination with r261707, and can be best understood by the title of the former one:

  > Transform (zext (or (icmp), (icmp))) to (or (zext icmp), (zext icmp)) if at least one of the (zext icmp) can be transformed to eliminate an icmp.

  Apparently, it introduced a transformation that is the reverse of the one done in `foldCastedBitwiseLogic()`. Its purpose is to expose pairs of `zext icmp` that would subsequently be optimized by `transformZExtICmp()` in InstCombineCasts.cpp. Therefore, in order to avoid an endless loop of switching back and forth between these two transformations, the one in `foldCastedBitwiseLogic()` has been restricted to exclude `icmp` instructions, which is mirrored in the responsible check:

  `if ((!isa<ICmpInst>(Cast0Src) || !isa<ICmpInst>(Cast1Src)) && ...`

  This check seems to sort out more cases than necessary because:

  - the reverse transformation is obviously done for `or` instructions only
  - not every `zext icmp` pair is necessarily the result of this reverse transformation

  Therefore we now remove this check and replace it with a more fine-grained one in `shouldOptimizeCast()` that rejects only those `logic(zext(icmp), zext(icmp))` that would be able to be optimized by `transformZExtICmp()`, which also avoids the mentioned endless loop. That means we are now able to also simplify expressions of the form `logic(cast(icmp), cast(icmp))` to `cast(logic(icmp, icmp))` (`cast` being an arbitrary `CastInst`).

  As an example, consider the following IR snippet

  ```
  %1 = icmp sgt i64 %a, %b
  %2 = zext i1 %1 to i8
  %3 = icmp slt i64 %a, %c
  %4 = zext i1 %3 to i8
  %5 = and i8 %2, %4
  ```

  which would now be transformed to

  ```
  %1 = icmp sgt i64 %a, %b
  %2 = icmp slt i64 %a, %c
  %3 = and i1 %1, %2
  %4 = zext i1 %3 to i8
  ```

  This issue became apparent when experimenting with the programming language Julia, which makes use of LLVM. Currently, Julia lowers its `Bool` datatype to LLVM's `i8` (also see https://github.com/JuliaLang/julia/pull/17225). In fact, the above IR example is the lowered form of the Julia snippet `(a > b) & (a < c)`. As shown above, this may introduce `zext` operations, casting between `i1` and `i8`, which could for example hinder ScalarEvolution and Polly on certain code.

  Reviewers: grosser, vtjnash, majnemer
  Subscribers: majnemer, llvm-commits
  Differential Revision: https://reviews.llvm.org/D22511
  Contributed-by: Matthias Reisinger
  llvm-svn: 275989

* AMDGPU: Only use legal inline immediates with kill pseudo (Matt Arsenault, 2016-07-19, 5 files, -3/+15)

  Only whether the value is negative or positive matters, so use a constant that doesn't require an instruction to materialize. These should really just emit the write to exec directly, but for now stick with the kill pseudo-terminator.

  llvm-svn: 275988

* [X86][SSE] Reimplement SSE fp2si conversion intrinsics instead of using generic IR (Simon Pilgrim, 2016-07-19, 3 files, -25/+33)

  D20859 and D20860 attempted to replace the SSE (V)CVTTPS2DQ and VCVTTPD2DQ truncating conversions with generic IR instead. It turns out that the behaviour of these intrinsics is different enough from generic IR that this will cause problems: INF/NAN/out-of-range values are guaranteed to result in a 0x80000000 value, which plays havoc with constant folding, which converts them to either zero or UNDEF. This is also an issue with the scalar implementations (which were already generic IR and what I was trying to match).

  This patch changes both scalar and packed versions back to using x86-specific builtins. It also deals with the other scalar conversion cases that are runtime rounding-mode dependent and can have similar issues with constant folding.

  A companion clang patch is at D22105.

  Differential Revision: https://reviews.llvm.org/D22106
  llvm-svn: 275981

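  A hedged sketch of the distinction (my illustration; the function is hypothetical, and I'm assuming the standard llvm.x86.sse2.cvttps2dq intrinsic): the target-specific builtin keeps the documented saturation to 0x80000000, which generic fptosi IR does not guarantee.

  ```
  declare <4 x i32> @llvm.x86.sse2.cvttps2dq(<4 x float>)

  define <4 x i32> @trunc_convert(<4 x float> %v) {
    ; INF/NaN/out-of-range lanes are defined to produce 0x80000000 here,
    ; whereas a generic fptosi would leave those lanes undefined.
    %r = call <4 x i32> @llvm.x86.sse2.cvttps2dq(<4 x float> %v)
    ret <4 x i32> %r
  }
  ```
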
* [ARM] Refactor Thumb2 Mul and Mla instr descs (Sam Parker, 2016-07-19, 1 file, -327/+143)

  Recommitting after r274347 was reverted. This patch introduces some classes to refactor the 3 and 4 register Thumb2 multiplication instruction descriptions, plus improved tests for some of those instructions.

  Differential Revision: https://reviews.llvm.org/D21929
  llvm-svn: 275979

* [AArch64] PredictableSelectIsExpensive for Vulcan. (Pankaj Gode, 2016-07-19, 1 file, -0/+1)

  Adding PredictableSelectIsExpensive for Vulcan.

  Differential Revision: https://reviews.llvm.org/D22448
  llvm-svn: 275978

* Add support for tlsldm assembler operator to ARM target (Peter Smith, 2016-07-19, 1 file, -0/+3)

  The standard local dynamic model for TLS on ARM systems needs two relocations:

  - R_ARM_TLS_LDM32 (module idx)
  - R_ARM_TLS_LDO32 (offset of object from origin of module TLS block)

  In GNU style assembler we use symbol(tlsldm) and symbol(tlsldo) to produce these relocations. llvm-mc for ARM supports symbol(tlsldo) but does not support symbol(tlsldm). This patch wires up the existing symbol(tlsldm) to R_ARM_TLS_LDM32.

  TLS for ARM is defined in "Addenda to, and Errata in, the ABI for the ARM Architecture".

  Differential Revision: https://reviews.llvm.org/D22461
  llvm-svn: 275977

* [mips][ias] R_MIPS_GOT_(PAGE|OFST) do not need symbols (Daniel Sanders, 2016-07-19, 1 file, -4/+4)

  Reviewers: sdardis
  Subscribers: dsanders, llvm-commits, sdardis
  Differential Revision: https://reviews.llvm.org/D22458
  llvm-svn: 275968

* [mips] Correct label prefixes for N32 and N64. (Daniel Sanders, 2016-07-19, 2 files, -3/+13)

  Summary: N32 and N64 follow the standard ELF conventions (.L) whereas O32 uses its own ($). This fixes the majority of object differences between -fintegrated-as and -fno-integrated-as.

  Reviewers: sdardis
  Subscribers: dsanders, sdardis, llvm-commits
  Differential Revision: https://reviews.llvm.org/D22412
  llvm-svn: 275967

* [mips] Recognise the triple used by Debian stretch for mips64el. (Daniel Sanders, 2016-07-19, 1 file, -0/+2)

  Summary: The triple used for this distribution is mips64el-linux-gnuabi64.

  Reviewers: sdardis
  Subscribers: sdardis, llvm-commits
  Differential Revision: https://reviews.llvm.org/D22406
  llvm-svn: 275966

* [InstCombine] Minor cleanup of cast simplification code [NFC] (Tobias Grosser, 2016-07-19, 3 files, -60/+76)

  Summary: This patch cleans up parts of InstCombine to raise its compliance with the LLVM coding standards and to increase its readability. The changes, and the rationale for them, are summarized in the following:

  - Rename `ShouldOptimizeCast()` to `shouldOptimizeCast()`, since functions should start with a lower-case letter.
  - Move `shouldOptimizeCast()` from InstCombineCasts.cpp to InstCombineAndOrXor.cpp, since it's only used there.
  - Simplify the interface of `shouldOptimizeCast()`.
  - Minor code-style adaptations in `shouldOptimizeCast()`.
  - Remove the documentation on the function definition of `shouldOptimizeCast()`, since it just repeats the documentation on its declaration. Also enhance the documentation on its declaration with more information describing its intended use, and make it doxygen-compliant.
  - Change a comment in `foldCastedBitwiseLogic()` from `fold (logic (cast A), (cast B)) -> (cast (logic A, B))` to `fold logic(cast(A), cast(B)) -> cast(logic(A, B))`, since the surrounding comments use this format.
  - Remove the comment `Only do this if the casts both really cause code to be generated.` in `foldCastedBitwiseLogic()`, since it just repeats parts of the documentation of `shouldOptimizeCast()` and does not help to improve readability.
  - Simplify the interface of `isEliminableCastPair()`.
  - Remove the documentation on the function definition of `isEliminableCastPair()`, which only contained obvious statements about its implementation. Instead, add more general doxygen-compliant documentation to its declaration.
  - Rename parameter `DoXform` of `transformZExtIcmp()` to `DoTransform` to make its intention clearer.
  - Move the documentation of `transformZExtIcmp()` from its definition to its declaration, and make it doxygen-compliant.

  Reviewers: vtjnash, grosser
  Subscribers: majnemer, llvm-commits
  Differential Revision: https://reviews.llvm.org/D22449
  Contributed-by: Matthias Reisinger
  llvm-svn: 275964

* AVX-512: Fixed BT instruction selection. (Elena Demikhovsky, 2016-07-19, 2 files, -25/+59)

  The condition expression (a >> n) & 1 is converted to a "bt a, n" instruction. This works on all Intel targets, but on AVX-512 it was broken because the expression is modified to (truncate (a >> n) to i1). I added the new sequence (truncate (a >> n) to i1) to the BT pattern.

  Differential Revision: https://reviews.llvm.org/D22354
  llvm-svn: 275950

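  A sketch of the two IR shapes involved (my own hypothetical functions, not from the patch), both of which test bit %n of %a and should now select to BT:

  ```
  define i1 @bt_and(i32 %a, i32 %n) {
    %shr = lshr i32 %a, %n
    %and = and i32 %shr, 1          ; classic (a >> n) & 1 form
    %tobool = icmp ne i32 %and, 0
    ret i1 %tobool
  }

  define i1 @bt_trunc(i32 %a, i32 %n) {
    %shr = lshr i32 %a, %n
    %bit = trunc i32 %shr to i1     ; the truncate form added to the BT pattern
    ret i1 %bit
  }
  ```
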
* [AVX512] Give priority to EVEX encoded PSHUFB over the VEX versions. (Craig Topper, 2016-07-19, 1 file, -6/+16)

  llvm-svn: 275942

* [X86] Remove superfluous parameter from a multiclass. All instantiations passed the same value. (Craig Topper, 2016-07-19, 1 file, -9/+8)

  llvm-svn: 275941

* [MemorySSA] Update to the new shiny walker. (George Burgess IV, 2016-07-19, 1 file, -307/+830)

  This patch updates MemorySSA's use-optimizing walker to be more accurate and, in some cases, faster.

  Essentially, this changed our core walking algorithm from a cache-as-you-go DFS to an iteratively expanded DFS, with all of the caching happening at the end. Said expansion happens when we hit a Phi, P; we'll try to do the smallest amount of work possible to see if optimizing above that Phi is legal in the first place. If so, we'll expand the search to see if we can optimize to the next phi, etc.

  An iteratively expanded DFS lets us potentially quit earlier (because we don't assume that we can optimize above all phis) than our old walker. Additionally, because we don't cache as we go, we can now optimize above loops.

  As an added bonus, this patch adds a ton of verification (if EXPENSIVE_CHECKS are enabled), so finding bugs is easier.

  Differential Revision: https://reviews.llvm.org/D21777
  llvm-svn: 275940

* [X86] Rename VINSERTzrr to use a capital Z to match other instructions. NFC (Craig Topper, 2016-07-19, 2 files, -4/+4)

  llvm-svn: 275939

* Retry: [llvm-profdata] Speed up merging by using a thread pool (Vedant Kumar, 2016-07-19, 1 file, -0/+8)

  Add a "-j" option to llvm-profdata to control the number of threads used. Auto-detect NumThreads when it isn't specified, and avoid spawning threads when they wouldn't be beneficial.

  I tested this patch using a raw profile produced by clang (147MB). Here is the time taken to merge 4 copies together on my laptop:

      No thread pool: 112.87s user 5.92s system 97% cpu 2:01.08 total
      With 2 threads: 134.99s user 26.54s system 164% cpu 1:33.31 total

  Changes since the initial commit:

  - When handling odd-length inputs, call ThreadPool::wait() before merging the last profile. Should fix a race/off-by-one (see r275937).

  Differential Revision: https://reviews.llvm.org/D22438
  llvm-svn: 275938

* Revert "[llvm-profdata] Speed up merging by using a thread pool" (Vedant Kumar, 2016-07-19, 1 file, -8/+0)

  This reverts commit r275921. It broke the ppc64be bot:

  http://lab.llvm.org:8011/builders/clang-ppc64be-linux-multistage/builds/3537

  I'm not sure why it broke, but based on the output, it looks like an off-by-one (one profile left un-merged).

  llvm-svn: 275937

* Recommit the patch "Use uniforms set to populate VecValuesToIgnore". (Wei Mi, 2016-07-19, 1 file, -49/+40)

  Instructions in the uniform set will not have vector versions, so add them to VecValuesToIgnore. For induction variables, those used only in uniform instructions or consecutive-pointer instructions have already been added to VecValuesToIgnore above. For induction variables used only in uniform instructions or non-consecutive/non-gather-scatter pointer instructions, the related phi and update will also be added to the VecValuesToIgnore set.

  The change will make the vector register-usage estimation less conservative.

  Differential Revision: https://reviews.llvm.org/D20474

  The recommit fixed the testcase global_alias.ll.

  llvm-svn: 275936

* AMDGPU/SI: Fix SI scheduler refcount issue (Matt Arsenault, 2016-07-19, 1 file, -0/+3)

  Without this fix, releaseSuccessors could, when InOrOutBlock is false, release SUs outside the scheduled BasicBlock.

  Patch by Axel Davy

  llvm-svn: 275935

* AMDGPU: Expand register indexing pseudos in custom inserter (Matt Arsenault, 2016-07-19, 8 files, -300/+451)

  This is to help move SILowerControlFlow to before regalloc. There are a couple of tradeoffs with this. The complete CFG is visible to more passes, the loop body avoids an extra copy of m0, vcc isn't required, and immediate offsets can be shrunk into s_movk_i32.

  The disadvantage is that the register allocator doesn't understand that the single lane's vector is dead within the loop body, so an extra register is used to outlive the loop block when expanding the VGPR -> m0 loop.

  This also now results in worse waitcnt insertion before the loop instead of after it for pending operations at the point of the indexing, but that should be fixed by future improvements to cross-block waitcnt insertion.

  v_movreld_b32's operands are now modeled more correctly since vdst is not a true output. This is kind of a hack to treat vdst as a use operand. Extra checking is required in the verifier since I can't seem to get TableGen to emit an implicit operand for a virtual register.

  llvm-svn: 275934