summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
* [InstCombine] allow more shuffle-binop folds with safe constantsSanjay Patel2018-07-101-10/+13
| | | | | | | | | | | | The case with 2 variables is more complicated than the case where we eliminate the shuffle entirely because a shuffle with an undef mask element creates an undef result. I'm not aware of any current analysis/transform that recognizes that undef propagating to a div/rem/shift, but we have to guard against the possibility. llvm-svn: 336668
* [DebugInfo][LoopVectorize] Preserve DL in induction PHI and AddAnastasis Grammenos2018-07-101-0/+2
| | | | | | Differential Revision: https://reviews.llvm.org/D48968 llvm-svn: 336667
* [DAGCombiner] visitREM - call visitSDIVLike/visitUDIVLike directly to avoid ↵Simon Pilgrim2018-07-101-12/+9
| | | | | | | | recursive combining. As suggested by @efriedma on D48975 use the visitSDIVLike/visitUDIVLike functions introduced at rL336656. llvm-svn: 336664
* [Hexagon] Add implicit uses even when untied explicit uses are presentKrzysztof Parzyszek2018-07-101-2/+6
| | | | | | | | | | | | | | | | | | An explicit untied use is not sufficient to maintain liveness of a register redefined in a predicated instruction. For example %1 = COPY %0 ... %1 = A2_paddif %2, %1, 1 could become $r1 = COPY $r0 ... $r1 = A2_paddif $p0, $r1, 1 and later $r1 = COPY $r0 ;; this is not really dead! ... $r1 = A2_paddif $p0, $r0, 1 llvm-svn: 336662
* [LowerSwitch] Fixed faulty PHI nodesKarl-Johan Karlsson2018-07-101-3/+11
| | | | | | | | | | | | | | | | | | | | | Summary: Fixed two cases of where PHI nodes need to be updated by lowerswitch. When lowerswitch find out that the switch default branch is not reachable it remove the old default and replace it with the most popular block from the cases, but it forget to update the PHI nodes in the default block. The PHI nodes also need to be updated when the switch is replaced with a single branch. Reviewers: hans, reames, arsenm Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D47203 llvm-svn: 336659
* [Support] Harded JSON against invalid UTF-8.Sam McCall2018-07-101-4/+45
| | | | | | | | | Parsing invalid UTF-8 input is now a parse error. Creating JSON values from invalid UTF-8 now triggers an assertion, and (in no-assert builds) substitutes the unicode replacement character. Strings retrieved from json::Value are always valid UTF-8. llvm-svn: 336657
* [DAGCombiner] Split SDIV/UDIV optimization expansions from the rest of the ↵Simon Pilgrim2018-07-101-15/+44
| | | | | | | | combines. NFCI. As suggested by @efriedma on D48975, this patch separates the BuildDiv/Pow2 style optimizations from the rest of the visitSDIV/visitUDIV to make it easier to reuse the combines and will allow us to avoid some rather nasty node recursive combining in visitREM. llvm-svn: 336656
* [PM/Unswitch] Fix unused variable in r336646.Chandler Carruth2018-07-101-0/+1
| | | | llvm-svn: 336647
* [PM/Unswitch] Fix a collection of closely related issues with trivialChandler Carruth2018-07-101-17/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | switch unswitching. The core problem was that the way we handled unswitching trivial exit edges through the default successor of a switch. For some reason I thought the right way to do this was to add a block containing unreachable and point the default successor at this block. In retrospect, this has an amazing number of problems. The first issue is the one that this pass has always worked around -- we have to *detect* such edges and avoid unswitching them again. This seemed pretty easy really. You juts look for an edge to a block containing unreachable. However, this pattern is woefully unsound. So many things can break it. The amazing thing is that I found a test case where *simple-loop-unswitch itself* breaks this! When we do a *non-trivial* unswitch of a switch we will end up splitting this exit edge. The result will be a default successor that is an exit and terminates in ... a perfectly normal branch. So the first test case that I started trying to fix is added to the nontrivial test cases. This is a ridiculous example that did just amazing things previously. With just unswitch, it would create 10+ copies of this stuff stamped out. But if you combine it *just right* with a bunch of other passes (like simplify-cfg, loop rotate, and some LICM) you can get it to do this infinitely. Or at least, I never got it to finish. =[ This, in turn, uncovered another related issue. When we are manipulating these switches after doing a trivial unswitch we never correctly updated PHI nodes to reflect our edits. As soon as I started changing how these edges were managed, it became obvious there were more issues that I couldn't realistically leave unaddressed, so I wrote more test cases around PHI updates here and ensured all of that works now. And this, in turn, required some adjustment to how we collect and manage the exit successor when it is the default successor. That showed a clear bug where we failed to include it in our search for the outer-most loop reached by an unswitched exit edge. This was actually already tested and the test case didn't work. I (wrongly) thought that was due to SCEV failing to analyze the switch. In fact, it was just a simple bug in the code that skipped the default successor. While changing this, I handled it correctly and have updated the test to reflect that we now get precise SCEV analysis of trip counts for the outer loop in one of these cases. llvm-svn: 336646
* [X86][SSE] Prefer BLEND(SHL(v,c1),SHL(v,c2)) over MUL(v, c3)Simon Pilgrim2018-07-101-8/+25
| | | | | | | | | | Now that rL336250 has landed, we should prefer 2 immediate shifts + a shuffle blend over performing a multiply. Despite the increase in instructions, this is quicker (especially for slow v4i32 multiplies), avoid loads and constant pool usage. It does mean however that we increase register pressure. The code size will go up a little but by less than what we save on the constant pool data. This patch also adds support for v16i16 to the BLEND(SHIFT(v,c1),SHIFT(v,c2)) combine, and also prevents blending on pre-SSE41 shifts if it would introduce extra blend masks/constant pool usage. Differential Revision: https://reviews.llvm.org/D48936 llvm-svn: 336642
* [X86] Use IsProfitableToFold to block vinsertf128rm in favor of ↵Craig Topper2018-07-102-1/+5
| | | | | | | | insert_subreg instead of artifically increasing pattern complexity to give priority. This is a much more direct way to solve the issue than just giving extra priority. llvm-svn: 336639
* [X86] Remove some seemingly unnecessary patterns.Craig Topper2018-07-101-4/+0
| | | | | | | | We're missing the EVEX equivalents of these patterns and seem to get along fine. I think we end up with X86vzload for the obvious IR cases that would produce this DAG. llvm-svn: 336638
* [X86] Correct vfixupimm load patterns to look for an integer load, not a ↵Craig Topper2018-07-101-21/+20
| | | | | | | | floating point load bitcasted to integer. DAG combine wouldn't let a floating point load bitcasted to integer exist. It would just be an integer load. llvm-svn: 336626
* [X86] Remove FloatVT from X86VectorVTInfo in X86InstrAVX512.tdCraig Topper2018-07-101-18/+8
| | | | | | The only places it was used where places where VT was the same as FloatVT. So switch those uses to VT and drop it. llvm-svn: 336624
* Revert "AMDGPU: Force inlining if LDS global address is used"Vlad Tsyrklevich2018-07-103-95/+26
| | | | | | | This reverts commit r336587, it was causing test failures on the sanitizer bots. llvm-svn: 336623
* [DWARF][NFC] Refactor range list emission to use a static helperWolfgang Pieb2018-07-101-57/+59
| | | | | | | | | | | This is prep for DWARF v5 range list emission. Emission of a single range list is moved to a static helper function. Reviewer: jdevlieghere Differential Revision: https://reviews.llvm.org/D49098 llvm-svn: 336621
* [InstCombine] allow more shuffle folds using safe constantsSanjay Patel2018-07-093-25/+50
| | | | | | | | | | getSafeVectorConstantForBinop() was calling getBinOpIdentity() assuming that the constant we wanted was operand 1 (RHS). That's wrong, but I don't think we could expose a bug or even a suboptimal fold from that because the callers have other guards for any binop that would have been affected. llvm-svn: 336617
* [WebAssembly] Support for binary atomic RMW instructionsHeejin Ahn2018-07-093-6/+444
| | | | | | | | | | | | | | | | | | Summary: This adds support for binary atomic read-modify-write instructions: add, sub, and, or, xor, and xchg. This does not yet support translations of some of LLVM IR atomicrmw instructions (nand, max, min, umax, and umin) that do not have a direct counterpart in wasm instructions. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D49088 llvm-svn: 336615
* llvm: Add support for "-fno-delete-null-pointer-checks"Manoj Gupta2018-07-0916-71/+167
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Support for this option is needed for building Linux kernel. This is a very frequently requested feature by kernel developers. More details : https://lkml.org/lkml/2018/4/4/601 GCC option description for -fdelete-null-pointer-checks: This Assume that programs cannot safely dereference null pointers, and that no code or data element resides at address zero. -fno-delete-null-pointer-checks is the inverse of this implying that null pointer dereferencing is not undefined. This feature is implemented in LLVM IR in this CL as the function attribute "null-pointer-is-valid"="true" in IR (Under review at D47894). The CL updates several passes that assumed null pointer dereferencing is undefined to not optimize when the "null-pointer-is-valid"="true" attribute is present. Reviewers: t.p.northover, efriedma, jyknight, chandlerc, rnk, srhines, void, george.burgess.iv Reviewed By: efriedma, george.burgess.iv Subscribers: eraman, haicheng, george.burgess.iv, drinkcat, theraven, reames, sanjoy, xbolva00, llvm-commits Differential Revision: https://reviews.llvm.org/D47895 llvm-svn: 336613
* Use StringRef instead of `const char *`.Rui Ueyama2018-07-091-1/+1
| | | | | | | | | | I don't think there's a need to use `const char *`. In most (probably all?) cases, we need a length of a name later, so discarding a length will lead to a wasted effort. Differential Revision: https://reviews.llvm.org/D49046 llvm-svn: 336612
* Make llvm.objectsize more conservative with nullGeorge Burgess IV2018-07-091-1/+8
| | | | | | | | | | In non-zero address spaces, we were reporting that an object at `null` always occupies zero bytes. This is incorrect in many cases, so just return `unknown` in those cases for now. Differential Revision: https://reviews.llvm.org/D48860 llvm-svn: 336611
* [ORC] Rename MaterializationResponsibility::delegate to replace and add a newLang Hames2018-07-092-3/+20
| | | | | | | | | | | | | delegate method (and unit test). The name 'replace' better captures what the old delegate method did: it returned materialization responsibility for a set of symbols to the VSO. The new delegate method delegates responsibility for a set of symbols to a new MaterializationResponsibility instance. This can be used to split responsibility between multiple threads, or multiple materialization methods. llvm-svn: 336603
* [Power9] Add __float128 builtins for Rounding OperationsStefan Pintilie2018-07-092-0/+22
| | | | | | | | | | | | | | | Added __float128 support for a number of rounding operations: trunc rint nearbyint round floor ceil Differential Revision: https://reviews.llvm.org/D48415 llvm-svn: 336601
* [WebAssembly] Improve readability of load/stores and tests. NFC.Heejin Ahn2018-07-092-99/+89
| | | | | | | | | | | | | | | | | Summary: - Changed variable/function names to be more consistent - Improved comments in test files - Added more tests - Fixed a few typos - Misc. cosmetic changes Reviewers: dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D49087 llvm-svn: 336598
* [Power9] [LLVM] Add __float128 support for trunc to double round to oddStefan Pintilie2018-07-091-1/+4
| | | | | | | | | Add support for this builtin: double builtin_truncf128_round_to_odd(float128) Differential Revision: https://reviews.llvm.org/D48483 llvm-svn: 336595
* RenameIndependentSubregs: Fix handling of undef tied operandsMark Searles2018-07-091-6/+10
| | | | | | | | | Ensure that, if updating a tied operand pair, to only update that pair. Differential Revision: https://reviews.llvm.org/D49052 llvm-svn: 336593
* [globalisel][irtranslator] Add support for atomicrmw and (strong) cmpxchgDaniel Sanders2018-07-092-1/+224
| | | | | | | | | | | | | | | | | | | | Summary: This patch adds support for the atomicrmw instructions and the strong cmpxchg instruction to the IRTranslator. I've left out weak cmpxchg because LangRef.rst isn't entirely clear on what difference it makes to the backend. As far as I can tell from the code, it only matters to AtomicExpandPass which is run at the LLVM-IR level. Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar, volkan, javed.absar Reviewed By: qcolombet Subscribers: kristof.beyls, javed.absar, igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D40092 llvm-svn: 336589
* [AMDGPU][Waitcnt] fix "comparison of integers of different signs" build errorMark Searles2018-07-091-1/+1
| | | | | | | | | | | | | | | | | | Build error on Android; reported by and fix provided by (thanks) by Mauro Rossi <issor.oruam@gmail.com> Fixes the following building error: external/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1903:61: error: comparison of integers of different signs: 'typename iterator_traits<__wrap_iter<MachineBasicBlock **> >::difference_type' (aka 'int') and 'unsigned int' [-Werror,-Wsign-compare] BlockWaitcntProcessedSet.end(), &MBB) < Count)) { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~~ 1 error generated. Differential Revision: https://reviews.llvm.org/D49089 llvm-svn: 336588
* AMDGPU: Force inlining if LDS global address is usedMatt Arsenault2018-07-093-26/+95
| | | | | | | | | | These won't work for the forseeable future. These aren't allowed from OpenCL, but IPO optimizations can make them appear. Also directly set the attributes on functions, regardless of the linkage rather than cloning functions like before. llvm-svn: 336587
* [X86][TLI] DAGCombine: Unfold variable bit-clearing mask to two shifts.Roman Lebedev2018-07-093-0/+74
| | | | | | | | | | | | | | | | | | | | | Summary: This adds a reverse transform for the instcombine canonicalizations that were added in D47980, D47981. As discussed later, that was worse at least for the code size, and potentially for the performance, too. https://rise4fun.com/Alive/Zmpl Reviewers: craig.topper, RKSimon, spatel Reviewed By: spatel Subscribers: reames, llvm-commits Differential Revision: https://reviews.llvm.org/D48768 llvm-svn: 336585
* [Power9] Add __float128 builtins for Round To OddStefan Pintilie2018-07-092-7/+32
| | | | | | | | | | | | GCC has builtins for these round to odd instructions: __float128 __builtin_sqrtf128_round_to_odd (__float128) __float128 __builtin_{add,sub,mul,div}f128_round_to_odd (__float128, __float128) __float128 __builtin_fmaf128_round_to_odd (__float128, __float128, __float128) Differential Revision: https://reviews.llvm.org/D47550 llvm-svn: 336578
* [DebugInfo] Change default value of FDEPointerEncodingMaksim Panchenko2018-07-091-1/+1
| | | | | | | | | | | | | | | | Summary: If the encoding is not specified in CIE augmentation string, then it should be DW_EH_PE_absptr instead of DW_EH_PE_omit. Reviewers: ruiu, MaskRay, plotfi, rafauler Reviewed By: MaskRay Subscribers: rafauler, JDevlieghere, llvm-commits Differential Revision: https://reviews.llvm.org/D49000 llvm-svn: 336577
* [SelectionDAG] Add VT consistency checks to the creation of ISD::FMA.Craig Topper2018-07-091-0/+3
| | | | | | This is similar to what is done for binops. I don't know if this would have helped us catch the bug fixed in r336566 earlier or not, but I figured it couldn't hurt. llvm-svn: 336576
* [LoopInfo] Port loop exit interfaces from Loop to LoopBaseDiego Caballero2018-07-091-63/+0
| | | | | | | | | | | | | | This patch ports hasDedicatedExits, getUniqueExitBlocks and getUniqueExitBlock in Loop to LoopBase so that they can be used from other LoopBase sub-classes. Reviewers: chandlerc, sanjoy, hfinkel, fhahn Reviewed By: chandlerc Differential Revision: https://reviews.llvm.org/D48817 llvm-svn: 336572
* [X86] In combineFMA, make sure we bitcast the result of isFNEG back the ↵Craig Topper2018-07-091-1/+2
| | | | | | | | expected type before creating the new FMA node. Previously, we were creating malformed SDNodes, but nothing noticed because the type constraints prevented isel from noticing. llvm-svn: 336566
* [InstCombine] avoid extra poison when moving shift above shuffleSanjay Patel2018-07-091-8/+5
| | | | | | | | | | As discussed in D49047 / D48987, shift-by-undef produces poison, so we can't use undef vector elements in that case.. Note that we need to extend this for poison-generating flags, and there's a proposal to create poison from FMF in D47963, llvm-svn: 336562
* [BitcodeReader] Infer the correct runtime preemption for GlobalValueSteven Wu2018-07-091-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: To allow bitcode built by old compiler to pass the current verifer, BitcodeReader needs to auto infer the correct runtime preemption from linkage and visibility for GlobalValues. Since llvm-6.0 bitcode already contains the new field but can be incorrect in some cases, the attribute needs to be recomputed all the time in BitcodeReader. This will make all the GVs has dso_local marked correctly if read from bitcode, and it should still allow the verifier to catch mistakes in optimization passes. This should fix PR38009. Reviewers: sfertile, vsk Reviewed By: vsk Subscribers: dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D49039 llvm-svn: 336560
* [InstCombine] generalize safe vector constant utilitySanjay Patel2018-07-093-30/+23
| | | | | | | | | | | This is almost NFC, but there could be some case where the original code had undefs in the constants (rather than just the shuffle mask), and we'll use safe constants rather than undefs now. The FIXME noted in foldShuffledBinop() is already visible in existing tests, so correcting that is the next step. llvm-svn: 336558
* [X86] Remove some patterns that include a bitcast of a floating point load ↵Craig Topper2018-07-092-10/+0
| | | | | | | | to an integer type. DAG combine should have converted the type of the load. llvm-svn: 336557
* [X86] Remove some patterns that seems to be unreachable.Craig Topper2018-07-092-12/+0
| | | | | | | | These patterns mapped (v2f64 (X86vzmovl (v2f64 (scalar_to_vector FR64:$src)))) to a MOVSD and an zeroing XOR. But the complexity of a pattern for (v2f64 (X86vzmovl (v2f64))) that selects MOVQ is artificially and hides this MOVSD pattern. Weirder still, the SSE version of the pattern was explicitly blocked on SSE41, but yet we had copied it to AVX and AVX512. llvm-svn: 336556
* [X86] Remove some seemingly unnecessary AddedComplexity lines.Craig Topper2018-07-091-8/+4
| | | | | | Looking at the generated tables this didn't seem to make an obvious difference in pattern priority. llvm-svn: 336555
* [VPlan][LV] Introduce condition bit in VPBlockBaseDiego Caballero2018-07-095-24/+70
| | | | | | | | | | | | | | | This patch introduces a VPValue in VPBlockBase to represent the condition bit that is used as successor selector when a block has multiple successors. This information wasn't necessary until now, when we are about to introduce outer loop vectorization support in VPlan code gen. Reviewers: fhahn, rengolin, mkuper, hfinkel, mssimpso Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D48814 llvm-svn: 336554
* [AArch64][SVE] Asm: Support for CNT(B|H|W|D) and CNTP instructions.Sander de Smalen2018-07-092-13/+72
| | | | | | | | | | | | | | | | | | This patch adds support for the following instructions: CNTB CNTH - Determine the number of active elements implied by CNTW CNTD the named predicate constant, multiplied by an immediate, e.g. cnth x0, vl8, #16 CNTP - Count active predicate elements, e.g. cntp x0, p0, p1.b counts the number of active elements in p1, predicated by p0, and stores the result in x0. llvm-svn: 336552
* [CVP] Handle calls with void return value. No need to create CVPLattice ↵Xin Tong2018-07-091-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | state for it. Summary: Tests: 10 Metric: compile_time Program unpatch-result patch-result diff Bullet/bullet 32.39 30.54 -5.7% SPASS/SPASS 18.14 17.25 -4.9% mafft/pairlocalalign 12.10 11.64 -3.8% ClamAV/clamscan 19.21 19.63 2.2% 7zip/7zip-benchmark 49.55 48.85 -1.4% kimwitu++/kc 15.68 15.87 1.2% lencod/lencod 21.13 21.34 1.0% consumer-typeset/consumer-typeset 13.65 13.62 -0.2% tramp3d-v4/tramp3d-v4 29.88 29.92 0.1% sqlite3/sqlite3 18.48 18.46 -0.1% unpatch-result patch-result diff count 10.000000 10.000000 10.000000 mean 23.022000 22.712400 -0.011671 std 11.362831 11.094183 0.027338 min 12.104000 11.640000 -0.057298 25% 16.299000 16.214000 -0.032282 50% 18.844000 19.048000 -0.001350 75% 27.689000 27.774000 0.007752 max 49.552000 48.852000 0.021861 I also tested only this pass by concatenating all the code from the llvm/lib/Analysis/ folder and do clang -g followed by opt. I get close to 20% speedup for the pass. I expect a majority of the gain come from skipping the dbg intrinsics. Before patch (opt -time-passes -called-value-propagation): ============ ===-------------------------------------------------------------------------=== ... Pass execution timing report ... ===-------------------------------------------------------------------------=== Total Execution Time: 3.8303 seconds (3.8279 wall clock) ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name --- 2.0768 ( 57.3%) 0.0990 ( 48.0%) 2.1757 ( 56.8%) 2.1757 ( 56.8%) Bitcode Writer 0.8444 ( 23.3%) 0.0600 ( 29.1%) 0.9044 ( 23.6%) 0.9044 ( 23.6%) Called Value Propagation 0.7031 ( 19.4%) 0.0472 ( 22.9%) 0.7502 ( 19.6%) 0.7478 ( 19.5%) Module Verifier 3.6242 (100.0%) 0.2062 (100.0%) 3.8303 (100.0%) 3.8279 (100.0%) Total After patch (opt -time-passes -called-value-propagation): ============ ===-------------------------------------------------------------------------=== ... Pass execution timing report ... ===-------------------------------------------------------------------------=== Total Execution Time: 3.6605 seconds (3.6579 wall clock) ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name --- 2.0716 ( 59.7%) 0.0990 ( 52.5%) 2.1705 ( 59.3%) 2.1706 ( 59.3%) Bitcode Writer 0.7144 ( 20.6%) 0.0300 ( 15.9%) 0.7444 ( 20.3%) 0.7444 ( 20.4%) Called Value Propagation 0.6859 ( 19.8%) 0.0596 ( 31.6%) 0.7455 ( 20.4%) 0.7429 ( 20.3%) Module Verifier 3.4719 (100.0%) 0.1886 (100.0%) 3.6605 (100.0%) 3.6579 (100.0%) Total Reviewers: davide, mssimpso Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49078 llvm-svn: 336551
* [Power9] Add __float128 support for compare operationsStefan Pintilie2018-07-093-2/+75
| | | | | | | | Added handling for the select f128. Differential Revision: https://reviews.llvm.org/D48294 llvm-svn: 336548
* [AArch64][SVE] Asm: Support for remaining shift instructions.Sander de Smalen2018-07-092-26/+127
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch completes support for shifts, which include: - LSL - Logical Shift Left - LSLR - Logical Shift Left, Reversed form - LSR - Logical Shift Right - LSRR - Logical Shift Right, Reversed form - ASR - Arithmetic Shift Right - ASRR - Arithmetic Shift Right, Reversed form - ASRD - Arithmetic Shift Right for Divide In the following variants: - Predicated shift by immediate - ASR, LSL, LSR, ASRD e.g. asr z0.h, p0/m, z0.h, #1 (active lanes of z0 shifted by #1) - Unpredicated shift by immediate - ASR, LSL*, LSR* e.g. asr z0.h, z1.h, #1 (all lanes of z1 shifted by #1, stored in z0) - Predicated shift by vector - ASR, LSL*, LSR* e.g. asr z0.h, p0/m, z0.h, z1.h (active lanes of z0 shifted by z1, stored in z0) - Predicated shift by vector, reversed form - ASRR, LSLR, LSRR e.g. lslr z0.h, p0/m, z0.h, z1.h (active lanes of z1 shifted by z0, stored in z0) - Predicated shift left/right by wide vector - ASR, LSL, LSR e.g. lsl z0.h, p0/m, z0.h, z1.d (active lanes of z0 shifted by wide elements of vector z1) - Unpredicated shift left/right by wide vector - ASR, LSL, LSR e.g. lsl z0.h, z1.h, z2.d (all lanes of z1 shifted by wide elements of z2, stored in z0) *Variants added in previous patches. llvm-svn: 336547
* [InstCombine] fix shuffle-of-binops transform to avoid poison/undef Sanjay Patel2018-07-091-21/+52
| | | | | | | | | | | | | | | | | | | | As noted in D48987, there are many different ways for this transform to go wrong. In particular, the poison potential for shifts means we have to more careful with those ops. I added tests to make that behavior visible for all of the different cases that I could find. This is a partial fix. To make this review easier, I did not make changes for the single binop pattern (handled in foldSelectShuffleWith1Binop()). I also left out some potential optimizations noted with TODO comments. I'll follow-up once we're confident that things are correct here. The goal is to correct all marked FIXME tests to either avoid the shuffle transform or do it safely. Note that distinguishing when the shuffle mask contains undefs and using getBinOpIdentity() allows for some improvements to div/rem patterns, so there are wins along with the missed opportunities and fixes. Differential Revision: https://reviews.llvm.org/D49047 llvm-svn: 336546
* [mips] Addition of the [d]rem and [d]remu instructionsStefan Maksimovic2018-07-093-25/+116
| | | | | | | | | | | | | Related to http://reviews.llvm.org/D15772 Depends on http://reviews.llvm.org/D16889 Adds [D]REM[U] instructions. Patch By: Srdjan Obucina Contributions from: Simon Dardis Differential Revision: https://reviews.llvm.org/D17036 llvm-svn: 336545
* [AArch64][SVE] Asm: Support for TBL instruction.Sander de Smalen2018-07-092-0/+35
| | | | | | | | | | | Support for SVE's TBL instruction for programmable table lookup/permute using vector of element indices, e.g. tbl z0.d, { z1.d }, z2.d stores elements from z1, indexed by elements from z2, into z0. llvm-svn: 336544
* [Support] Make JSON handle doubles and int64s losslesslySam McCall2018-07-091-16/+26
| | | | | | | | | | | | | | | | | | | Summary: This patch adds a new "integer" ValueType, and renames Number -> Double. This allows us to preserve the full precision of int64_t when parsing integers from the wire, or constructing from an integer. The API is unchanged, other than giving asInteger() a clearer contract. In addition, always output doubles with enough precision that parsing will reconstruct the same double. Reviewers: simon_tatham Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D46209 llvm-svn: 336541
OpenPOWER on IntegriCloud