summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
* [AArch64][SVE] Asm: Support for LAST(A|B) and CLAST(A|B) instructions.Sander de Smalen2018-07-112-0/+148
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The LASTB and LASTA instructions extract the last active element, or element after the last active, from the source vector. The added variants are: Scalar: last(a|b) w0, p0, z0.b last(a|b) w0, p0, z0.h last(a|b) w0, p0, z0.s last(a|b) x0, p0, z0.d SIMD & FP Scalar: last(a|b) b0, p0, z0.b last(a|b) h0, p0, z0.h last(a|b) s0, p0, z0.s last(a|b) d0, p0, z0.d The CLASTB and CLASTA conditionally extract the last or element after the last active element from the source vector. The added variants are: Scalar: clast(a|b) w0, p0, w0, z0.b clast(a|b) w0, p0, w0, z0.h clast(a|b) w0, p0, w0, z0.s clast(a|b) x0, p0, x0, z0.d SIMD & FP Scalar: clast(a|b) b0, p0, b0, z0.b clast(a|b) h0, p0, h0, z0.h clast(a|b) s0, p0, s0, z0.s clast(a|b) d0, p0, d0, z0.d Vector: clast(a|b) z0.b, p0, z0.b, z1.b clast(a|b) z0.h, p0, z0.h, z1.h clast(a|b) z0.s, p0, z0.s, z1.s clast(a|b) z0.d, p0, z0.d, z1.d Please refer to the architecture specification for more details on the semantics of the added instructions. llvm-svn: 336783
* [llvm-readobj] Add -hex-dump (-x) optionPaul Semel2018-07-112-0/+36
| | | | | | Differential Revision: https://reviews.llvm.org/D48281 llvm-svn: 336782
* [SelectionDAG] Add constant buildvector support to isKnownNeverZeroSimon Pilgrim2018-07-112-6/+9
| | | | | | This allows us to use SelectionDAG::isKnownNeverZero in DAGCombiner::visitREM (visitSDIVLike/visitUDIVLike handle the checking for constants). llvm-svn: 336779
* [mips] Remove dead code. NFCSimon Atanasyan2018-07-115-38/+0
| | | | llvm-svn: 336777
* [DAGCombiner] Support non-uniform X%C -> X-(X/C)*C foldsSimon Pilgrim2018-07-111-1/+4
| | | | | | | | | | First stage in PR38057 - support non-uniform constant vectors in the combine to reuse the division-by-constant logic. We can definitely do better for srem pow2 remainders (and avoid that extra multiply....) but this at least helps keep everything on the vector unit. Differential Revision: https://reviews.llvm.org/D48975 llvm-svn: 336774
* [DAGCombiner] Add (urem X, -1) -> select(X == -1, 0, x) foldSimon Pilgrim2018-07-111-0/+6
| | | | llvm-svn: 336773
* [TableGen] Add missing std::moves to fix build failure.Simon Tatham2018-07-111-7/+7
| | | | | | | | | | gcc 4.7 seems to disagree with gcc 5.3 about whether you need to say 'return std::move(thing)' instead of just 'return thing'. All the json::Arrays and json::Objects that I was implicitly turning into json::Values by returning them from functions now have explicit std::move wrappers, so hopefully 4.7 will be happy now. llvm-svn: 336772
* [TableGen] Add a general-purpose JSON backend.Simon Tatham2018-07-112-0/+190
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The aim of this backend is to output everything TableGen knows about the record set, similarly to the default -print-records backend. But where -print-records produces output in TableGen's input syntax (convenient for humans to read), this backend produces it as structured JSON data, which is convenient for loading into standard scripting languages such as Python, in order to extract information from the data set in an automated way. The output data contains a JSON representation of the variable definitions in output 'def' records, and a few pieces of metadata such as which of those definitions are tagged with the 'field' prefix and which defs are derived from which classes. It doesn't dump out absolutely every piece of knowledge it _could_ produce, such as type information and complicated arithmetic operator nodes in abstract superclasses; the main aim is to allow consumers of this JSON dump to essentially act as new backends, and backends don't generally need to depend on that kind of data. The new backend is implemented as an EmitJSON() function similar to all of llvm-tblgen's other EmitFoo functions, except that it lives in lib/TableGen instead of utils/TableGen on the basis that I'm expecting to add it to clang-tblgen too in a future patch. To test it, I've written a Python script that loads the JSON output and tests properties of it based on comments in the .td source - more or less like FileCheck, except that the CHECK: lines have Python expressions after them instead of textual pattern matches. Reviewers: nhaehnle Reviewed By: nhaehnle Subscribers: arichardson, labath, mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D46054 llvm-svn: 336771
* [WebAssembly] Only call llvm::value::dump() in debug build.Eric Liu2018-07-111-0/+2
| | | | | | | This fixes compile error in r336759. llvm::value::dump is not available in released build. llvm-svn: 336770
* [X86] The TEST instruction is eliminated when BSF/TZCNT is usedCraig Topper2018-07-112-0/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: These changes cover the PR#31399. Now the ffs(x) function is lowered to (x != 0) ? llvm.cttz(x) + 1 : 0 and it corresponds to the following llvm code: %cnt = tail call i32 @llvm.cttz.i32(i32 %v, i1 true) %tobool = icmp eq i32 %v, 0 %.op = add nuw nsw i32 %cnt, 1 %add = select i1 %tobool, i32 0, i32 %.op and x86 asm code: bsfl %edi, %ecx addl $1, %ecx testl %edi, %edi movl $0, %eax cmovnel %ecx, %eax In this case the 'test' instruction can't be eliminated because the 'add' instruction modifies the EFLAGS, namely, ZF flag that is set by the 'bsf' instruction when 'x' is zero. We now produce the following code: bsfl %edi, %ecx movl $-1, %eax cmovnel %ecx, %eax addl $1, %eax Patch by Ivan Kulagin Reviewers: davide, craig.topper, spatel, RKSimon Reviewed By: craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D48765 llvm-svn: 336768
* Revert r336760: "[ORC] Add unit tests for the reexports utility that were..."Lang Hames2018-07-111-1/+1
| | | | | | | This patch broke a few buildbots. I will investigate and re-apply when I have a fix. llvm-svn: 336767
* [X86] Remove some composite MOVSS/MOVSD isel patterns.Craig Topper2018-07-112-31/+0
| | | | | | | | | | | | These patterns looked for a MOVSS/SD followed by a scalar_to_vector. Or a scalar_to_vector followed by a load. In both cases we emitted a MOVSS/SD for the MOVSS/SD part, a REG_CLASS for the scalar_to_vector, and a MOVSS/SD for the load. But we have patterns that do each of those 3 things individually so there's no reason to build large patterns. Most of the test changes are just reorderings. The one test that had a meaningful change is pr30430.ll and it appears to be a regression. But its doing -O0 so I think it missed a lot of opportunities and was just getting lucky before. llvm-svn: 336762
* [ORC] Add unit tests for the reexports utility that were left out of r336741,Lang Hames2018-07-111-1/+1
| | | | | | and fix a bug that these exposed. llvm-svn: 336760
* [WebAssembly] Add pass to infer prototypes for prototype-less functionsSam Clegg2018-07-114-0/+149
| | | | | | | | See https://bugs.llvm.org/show_bug.cgi?id=35385 Differential Revision: https://reviews.llvm.org/D48471 llvm-svn: 336759
* [Power9] Add remaining __flaot128 builtin support for FMA round to oddStefan Pintilie2018-07-111-3/+12
| | | | | | | | | | | | | | Implement this as it is done on GCC: __float128 a, b, c, d; a = __builtin_fmaf128_round_to_odd (b, c, d); // generates xsmaddqpo a = __builtin_fmaf128_round_to_odd (b, c, -d); // generates xsmsubqpo a = - __builtin_fmaf128_round_to_odd (b, c, d); // generates xsnmaddqpo a = - __builtin_fmaf128_round_to_odd (b, c, -d); // generates xsnmsubpqp Differential Revision: https://reviews.llvm.org/D48218 llvm-svn: 336754
* [ARM] Treat cmn immediates as legal in isLegalICmpImmediate.Eli Friedman2018-07-101-4/+6
| | | | | | | | | | | | | The original code attempted to do this, but the std::abs() call didn't actually do anything due to implicit type conversions. Fix the type conversions, and perform the correct check for negative immediates. This probably has very little practical impact, but it's worth fixing just to avoid confusion in the future, I think. Differential Revision: https://reviews.llvm.org/D48907 llvm-svn: 336742
* [ORC] Generalize alias materialization to support re-exports (i.e. aliasing ofLang Hames2018-07-101-40/+138
| | | | | | | | | symbols in another VSO). Also fixes a bug where chained aliases within a single VSO would deadlock on materialization. llvm-svn: 336741
* Sort includes + include a missing `extern "C"` headerGeorge Burgess IV2018-07-101-1/+2
| | | | | | | | | | If we don't include Initialization.h, `LLVMInitializeAggressiveInstCombiner` won't see its `extern "C"` decl. This causes sadness, name mangling, and linker errors. Reported on the mailing lists by Vladimir Vissoultchev. Thanks! llvm-svn: 336736
* [X86] Remove AddedComplexity from all patterns that use X86vzmovl as their root.Craig Topper2018-07-104-183/+139
| | | | | | | | Some added 20 and some added 15. Its unclear when to use which value and whether they are required at all. This patch removes them all. If we start finding real world issues we may need to add them back with proper tests. llvm-svn: 336735
* Fix -Wmismatched-tags warningRichard Trieu2018-07-101-1/+1
| | | | | | class -> struct in forward declaration. llvm-svn: 336733
* [X86] Teach X86InstrInfo::commuteInstructionImpl to use MOVSD/MOVSS for ↵Craig Topper2018-07-101-1/+21
| | | | | | | | | | BLEND under optsize when the immediate allows it. Isel currently emits movss/movsd a lot of the time and an accidental double commute turns it into a blend. Ideally we'd select blend directly in isel under optspeed and not rely on the double commute to create blend. llvm-svn: 336731
* [X86] Remove X86ISD::MOVLPS and X86ISD::MOVLPD. NFCICraig Topper2018-07-105-90/+8
| | | | | | | | | | | | These ISD nodes try to select the MOVLPS and MOVLPD instructions which are special load only instructions. They load data and merge it into the lower 64-bits of an XMM register. They are logically equivalent to our MOVSD node plus a load. There was only one place in X86ISelLowering that used MOVLPD and no places that selected MOVLPS. The one place that selected MOVLPD had to choose between it and MOVSD based on whether there was a load. But lowering is too early to tell if the load can really be folded. So in isel we have patterns that use MOVSD for MOVLPD if we can't find a load. We also had patterns that select the MOVLPD instruction for a MOVSD if we can find a load, but didn't choose the MOVLPD ISD opcode for some reason. So it seems better to just standardize on MOVSD ISD opcode and manage MOVSD vs MOVLPD instruction with isel patterns. llvm-svn: 336728
* [AMDGPU] Fix layering issue with AMDGPUHSAMetadataStreamer (NFC)Scott Linder2018-07-105-2/+2
| | | | llvm-svn: 336722
* [ThinLTO] Use std::map to get determistic imports filesTeresa Johnson2018-07-103-7/+18
| | | | | | | | | | | | | | | | | | | | | | Summary: I noticed that the .imports files emitted for distributed ThinLTO backends do not have consistent ordering. This is because StringMap iteration order is not guaranteed to be deterministic. Since we already have a std::map with this information, used when emitting the individual index files (ModuleToSummariesForIndex), use it for the imports files as well. This issue is likely causing some unnecessary rebuilds of the ThinLTO backends in our distributed build system as the imports files are inputs to those backends. Reviewers: pcc, steven_wu, mehdi_amini Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D48783 llvm-svn: 336721
* [X86] Remove dead SDNode object from X86InstrFragmentsSIMD.td. NFCCraig Topper2018-07-101-1/+0
| | | | | | It points to an opcode that doesn't exist. llvm-svn: 336720
* [X86] Remove AddedComplexity from register form of NOT. NFCICraig Topper2018-07-101-3/+0
| | | | | | | | I believe isProfitableToFold will stop the load folding that this was intended to overcome. Given an (xor load, -1), isProfitableToFold will see that the immediate can be folded with the xor using a one byte immediate since it can be sign extended. It doesn't know about NOT, but the one byte immediate check is enough to stop the fold. llvm-svn: 336712
* [X86] Remove AddedComplexity from MMX_X86movw2d patterns.Craig Topper2018-07-101-9/+6
| | | | | | There were only 3 patterns with this node as a root and they all the same AddedComplexity. So this doesn't really do anything. llvm-svn: 336711
* [AMDGPU] Refactor HSAMetadataStream::emitKernel (NFC)Scott Linder2018-07-105-122/+148
| | | | | | | | Move all metadata construction into AMDGPUHSAMetadataStreamer. Differential Revision: https://reviews.llvm.org/D48176 llvm-svn: 336707
* [GlobalISel][X86_64] Support for G_SITOFPAlexander Ivchenko2018-07-102-0/+18
| | | | | | The instruction selection is automatically handled by tablegen llvm-svn: 336703
* [Evaluator] Examine alias when evaluating function callEugene Leviant2018-07-101-2/+16
| | | | | | This fixes PR38120 llvm-svn: 336702
* [DAGCombiner] Add special case fast paths for udiv x,1 and udiv x,-1Simon Pilgrim2018-07-101-0/+9
| | | | | | udiv x,-1 was going down the (slow) BuildUDIV route resulting in unnecessary shifts. llvm-svn: 336701
* Revert "[AccelTable] Provide abstraction for emitting DWARF5 accelerator ↵Jonas Devlieghere2018-07-101-56/+18
| | | | | | | | | tables." This reverts r336529 because an alternative approach turned out to be a better fit for dsymuil. llvm-svn: 336698
* AMDGPU: Make hidden argument metadata consistent withKonstantin Zhuravlyov2018-07-102-31/+46
| | | | | | | | amdgpu-implicitarg-num-bytes attribute Differential Revision: https://reviews.llvm.org/D49096 llvm-svn: 336697
* [InstCombine] allow flag propagation when using safe constantSanjay Patel2018-07-101-2/+3
| | | | | | | This corresponds with the code for the single binop pattern added in rL336684. llvm-svn: 336696
* [gcov] Fix ABI when calling llvm_gcov_... routines from instrumentation codeUlrich Weigand2018-07-101-12/+55
| | | | | | | | | | | | | | | | | | | | | | | The llvm_gcov_... routines in compiler-rt are regular C functions that need to be called using the proper C ABI for the target. The current code simply calls them using plain LLVM IR types. Since the type are mostly simple, this happens to just work on certain targets. But other targets still need special handling; in particular, it may be necessary to sign- or zero-extended sub-word values to comply with the ABI. This caused gcov failures on SystemZ in particular. Now the very same problem was already fixed for the llvm_profile_ calls here: https://reviews.llvm.org/D21736 This patch uses the same method to fix the llvm_gcov_ calls, in particular calls to llvm_gcda_start_file, llvm_gcda_emit_function, and llvm_gcda_emit_arcs. Reviewed By: marco-c Differential Revision: https://reviews.llvm.org/D49134 llvm-svn: 336692
* [MC] Add interface to finish pending labels.Jonas Devlieghere2018-07-101-1/+1
| | | | | | | | | | | When manually finishing the object writer in dsymutil, it's possible that there are pending labels that haven't been resolved. This results in an assertion when the assembler tries to fixup a label that doesn't have an address yet. Differential revision: https://reviews.llvm.org/D49131 llvm-svn: 336688
* [InstCombine] safely allow non-commutative binop identity constant foldsSanjay Patel2018-07-101-8/+11
| | | | | | | | | | | | | | | | This was originally intended with D48893, but as discussed there, we have to make the folds safe from producing extra poison. This should give the single binop folds the same capabilities as the existing folds for 2-binops+shuffle. LLVM binary opcode review: there are a total of 18 binops. There are 7 commutative binops (add, mul, and, or, xor, fadd, fmul) which we already fold. We're able to fold 6 more opcodes with this patch (shl, lshr, ashr, fdiv, udiv, sdiv). There are no folds for srem/urem/frem AFAIK. We don't bother with sub/fsub with constant operand 1 because those are canonicalized to add/fadd. 7 + 6 + 3 + 2 = 18. llvm-svn: 336684
* Support -fdebug-prefix-map in llvm-mc. This is useful to omit thePaul Robinson2018-07-103-4/+35
| | | | | | | | | | | debug compilation dir when compiling assembly files with -g. Part of PR38050. Patch by Siddhartha Bagaria! Differential Revision: https://reviews.llvm.org/D48988 llvm-svn: 336680
* [InstCombine] drop poison flags when shuffle mask undef propagates to constantSanjay Patel2018-07-101-0/+5
| | | | llvm-svn: 336679
* [AArch64][SVE] Asm: Support for predicated unary operations.Sander de Smalen2018-07-102-14/+35
| | | | | | | | | | | | | | | | | | | | | This patch adds support for the following instructions: CLS (Count Leading Sign bits) CLZ (Count Leading Zeros) CNT (Count non-zero bits) CNOT (Logically invert boolean condition in vector) NOT (Bitwise invert vector) FABS (Floating-point absolute value) FNEG (Floating-point negate) All operations are predicated and unary, e.g. clz z0.s, p0/m, z1.s - CLS, CLZ, CNT, CNOT and NOT have variants for 8, 16, 32 and 64 bit elements. - FABS and FNEG have variants for 16, 32 and 64 bit elements. llvm-svn: 336677
* Reapply "AMDGPU: Force inlining if LDS global address is used"Matt Arsenault2018-07-103-26/+95
| | | | | | This reverts commit r336623 llvm-svn: 336675
* [InstCombine] allow more shuffle-binop folds with safe constantsSanjay Patel2018-07-101-10/+13
| | | | | | | | | | | | The case with 2 variables is more complicated than the case where we eliminate the shuffle entirely because a shuffle with an undef mask element creates an undef result. I'm not aware of any current analysis/transform that recognizes that undef propagating to a div/rem/shift, but we have to guard against the possibility. llvm-svn: 336668
* [DebugInfo][LoopVectorize] Preserve DL in induction PHI and AddAnastasis Grammenos2018-07-101-0/+2
| | | | | | Differential Revision: https://reviews.llvm.org/D48968 llvm-svn: 336667
* [DAGCombiner] visitREM - call visitSDIVLike/visitUDIVLike directly to avoid ↵Simon Pilgrim2018-07-101-12/+9
| | | | | | | | recursive combining. As suggested by @efriedma on D48975 use the visitSDIVLike/visitUDIVLike functions introduced at rL336656. llvm-svn: 336664
* [Hexagon] Add implicit uses even when untied explicit uses are presentKrzysztof Parzyszek2018-07-101-2/+6
| | | | | | | | | | | | | | | | | | An explicit untied use is not sufficient to maintain liveness of a register redefined in a predicated instruction. For example %1 = COPY %0 ... %1 = A2_paddif %2, %1, 1 could become $r1 = COPY $r0 ... $r1 = A2_paddif $p0, $r1, 1 and later $r1 = COPY $r0 ;; this is not really dead! ... $r1 = A2_paddif $p0, $r0, 1 llvm-svn: 336662
* [LowerSwitch] Fixed faulty PHI nodesKarl-Johan Karlsson2018-07-101-3/+11
| | | | | | | | | | | | | | | | | | | | | Summary: Fixed two cases of where PHI nodes need to be updated by lowerswitch. When lowerswitch find out that the switch default branch is not reachable it remove the old default and replace it with the most popular block from the cases, but it forget to update the PHI nodes in the default block. The PHI nodes also need to be updated when the switch is replaced with a single branch. Reviewers: hans, reames, arsenm Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D47203 llvm-svn: 336659
* [Support] Harded JSON against invalid UTF-8.Sam McCall2018-07-101-4/+45
| | | | | | | | | Parsing invalid UTF-8 input is now a parse error. Creating JSON values from invalid UTF-8 now triggers an assertion, and (in no-assert builds) substitutes the unicode replacement character. Strings retrieved from json::Value are always valid UTF-8. llvm-svn: 336657
* [DAGCombiner] Split SDIV/UDIV optimization expansions from the rest of the ↵Simon Pilgrim2018-07-101-15/+44
| | | | | | | | combines. NFCI. As suggested by @efriedma on D48975, this patch separates the BuildDiv/Pow2 style optimizations from the rest of the visitSDIV/visitUDIV to make it easier to reuse the combines and will allow us to avoid some rather nasty node recursive combining in visitREM. llvm-svn: 336656
* [PM/Unswitch] Fix unused variable in r336646.Chandler Carruth2018-07-101-0/+1
| | | | llvm-svn: 336647
* [PM/Unswitch] Fix a collection of closely related issues with trivialChandler Carruth2018-07-101-17/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | switch unswitching. The core problem was that the way we handled unswitching trivial exit edges through the default successor of a switch. For some reason I thought the right way to do this was to add a block containing unreachable and point the default successor at this block. In retrospect, this has an amazing number of problems. The first issue is the one that this pass has always worked around -- we have to *detect* such edges and avoid unswitching them again. This seemed pretty easy really. You juts look for an edge to a block containing unreachable. However, this pattern is woefully unsound. So many things can break it. The amazing thing is that I found a test case where *simple-loop-unswitch itself* breaks this! When we do a *non-trivial* unswitch of a switch we will end up splitting this exit edge. The result will be a default successor that is an exit and terminates in ... a perfectly normal branch. So the first test case that I started trying to fix is added to the nontrivial test cases. This is a ridiculous example that did just amazing things previously. With just unswitch, it would create 10+ copies of this stuff stamped out. But if you combine it *just right* with a bunch of other passes (like simplify-cfg, loop rotate, and some LICM) you can get it to do this infinitely. Or at least, I never got it to finish. =[ This, in turn, uncovered another related issue. When we are manipulating these switches after doing a trivial unswitch we never correctly updated PHI nodes to reflect our edits. As soon as I started changing how these edges were managed, it became obvious there were more issues that I couldn't realistically leave unaddressed, so I wrote more test cases around PHI updates here and ensured all of that works now. And this, in turn, required some adjustment to how we collect and manage the exit successor when it is the default successor. That showed a clear bug where we failed to include it in our search for the outer-most loop reached by an unswitched exit edge. This was actually already tested and the test case didn't work. I (wrongly) thought that was due to SCEV failing to analyze the switch. In fact, it was just a simple bug in the code that skipped the default successor. While changing this, I handled it correctly and have updated the test to reflect that we now get precise SCEV analysis of trip counts for the outer loop in one of these cases. llvm-svn: 336646
OpenPOWER on IntegriCloud