summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
...
* AMDGPU/GlobalISel: Handle flat/global G_ATOMIC_CMPXCHGMatt Arsenault2019-10-259-76/+97
| | | | | | | | Custom lower this to a target instruction with the merge operands. I think it might be better to directly select this and emit a REG_SEQUENCE, but this would be more work since it would require splitting the tablegen patterns for these cases from the other atomics.
* AMDGPU: Fix the broken dominator tree when creating waterfall loop for ↵Changpeng Fang2019-10-251-2/+2
| | | | | | | | | | | | | resource descriptor Summary: In loadSRsrcFromVGPR, if MBB is the same as Succ, Remiander is not the immediate dominator of Succ. Reviewer: arsenm Differential Revision: https://reviews.llvm.org/D69358
* Revert "Add an instruction marker field to the ExtraInfo in MachineInstrs."Amy Huang2019-10-257-114/+117
| | | | | Reverting commit b85b4e5a6f8579c137fecb59a4d75d7bfb111f79 due to some buildbot failures/ out of memory errors.
* [LLD][ThinLTO] Handle GUID collision in import global processingTeresa Johnson2019-10-251-5/+11
| | | | | | | | | | | | | | | | | | | | | | Summary: If there are a GUID collision between two globals checking the summarylist from the import index to make assumption can be dangerous. Do not assume that a GlobalValue that has a GlobalVarSummary actually is a GlobalVariable as it can be another GlobalValue with the same GUID that the summary is connected to. Patch by Joel Klinghed (the_jk@opera.com) Reviewers: evgeny777, tejohnson Reviewed By: tejohnson Subscribers: tejohnson, dblaikie, MaskRay, mehdi_amini, inglorion, hiraditya, steven_wu, dexonsmith, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67322
* [Alignment][NFC] getMemoryOpCost uses MaybeAlignGuillaume Chatelet2019-10-2515-57/+69
| | | | | | | | | | | | | | | Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: nemanjai, hiraditya, kbarton, MaskRay, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69307
* [AMDGPU] Fold AGPR reg_sequence initializersStanislav Mekhanoshin2019-10-251-22/+131
| | | | Differential Revision: https://reviews.llvm.org/D69413
* [AMDGPU] Disallow dpp combining for dpp instructions without Src2 operand ↵vpykhtin2019-10-251-1/+2
| | | | | | (when Src2 is required) Differential revision: https://reviews.llvm.org/D69430
* [X86] Add a check for SSE2 to the top of combineReductionToHorizontal.Craig Topper2019-10-251-0/+4
| | | | Without this, we can create a PSADBW node that isn't legal.
* [DAGCombiner] widen zext of popcount based on target supportSanjay Patel2019-10-251-0/+12
| | | | | | | | | | | | | | | | zext (ctpop X) --> ctpop (zext X) This is a prerequisite step for canonicalizing in the other direction (narrow the popcount) in IR - PR43688: https://bugs.llvm.org/show_bug.cgi?id=43688 I'm not sure if any other targets are affected, but I found a missing fold for PPC, so added tests based on that. The reason we widen all the way to 64-bit in these tests is because the initial DAG looks something like this: t5: i8 = ctpop t4 t6: i32 = zero_extend t5 <-- created based on IR, but unused node? t7: i64 = zero_extend t5 Differential Revision: https://reviews.llvm.org/D69127
* AMDGPU/GlobalISel: Legalize FDIV16Austin Kerbow2019-10-253-2/+44
| | | | | | | | | | | | Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, volkan, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69347
* Add an instruction marker field to the ExtraInfo in MachineInstrs.Amy Huang2019-10-257-117/+114
| | | | | | | | | | | | | | | | | | Summary: Add instruction marker to MachineInstr ExtraInfo. This does almost the same thing as Pre/PostInstrSymbols, except that it doesn't create a label until printing instructions. This allows for labels to be put around instructions that are deleted/duplicated somewhere. Also undo the workaround in r375137. Reviewers: rnk Subscribers: MatzeB, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69136
* [SLP] adjust code comment; NFCSanjay Patel2019-10-251-1/+1
| | | | (check commit access)
* [APInt] Add saturating left-shift opsRoman Lebedev2019-10-251-0/+19
| | | | | | | | | | | | | | | | Summary: There are `*_ov()` functions already, so at least for consistency it may be good to also have saturating variants. These may or may not be needed for `ConstantRange`'s `shlWithNoWrap()` Reviewers: spatel, nikic Reviewed By: nikic Subscribers: hiraditya, dexonsmith, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69398
* [APInt] Add saturating multiply opsRoman Lebedev2019-10-251-0/+21
| | | | | | | | | | | | | | | | Summary: There are `*_ov()` functions already, so at least for consistency it may be good to also have saturating variants. These may or may not be needed for `ConstantRange`'s `mulWithNoWrap()` Reviewers: spatel, nikic Reviewed By: nikic Subscribers: hiraditya, dexonsmith, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69397
* [CodeGen][SelectionDAG] Fix tiny bug in ExpandIntRes_UADDSUBOItay Bookstein2019-10-251-9/+22
| | | | | | | | | | | | | | | | | Summary: Ternary expression checks for ISD::ADD instead of ISD::UADDO inside DAGTypeLegalizer::ExpandIntRes_UADDSUBO. This means the ternary expression will evaluate to ISD::SUBCARRY for both ISD::UADDO and ISD::USUBO nodes. Targets are likely to implement both, so impact will be very limited in practice. Reviewers: bogner, lebedev.ri Reviewed By: lebedev.ri Subscribers: lebedev.ri, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68123
* [RISCV] Add support for half-precision floatsLuís Marques2019-10-251-1/+6
| | | | | | | | | Complete fp16 support by ensuring that load extension / truncate store operations are properly expanded. Reviewers: asb, lenary Reviewed By: lenary Differential Revision: https://reviews.llvm.org/D69246
* [MIPS GlobalISel] Select MSA vector generic and builtin fsqrtPetar Avramovic2019-10-252-7/+17
| | | | | | | | | | | | | selectImpl is able to select G_FSQRT when we set bank for vector operands to fprb. Add detailed tests. Note: G_FSQRT is generated from llvm-ir intrinsics llvm.sqrt.*, and at the moment MIPS is not able to generate this intrinsic for vector type (some targets generate vector llvm.sqrt.* from calls to a builtin function). __builtin_msa_fsqrt_<format> will be transformed into G_FSQRT in legalizeIntrinsic and selected in the same way. Differential Revision: https://reviews.llvm.org/D69376
* [yaml2obj, obj2yaml] - Add support for SHT_NOTE sections.georgerim2019-10-252-8/+102
| | | | | | | | | | | | | | | | | | | | | SHT_NOTE is the section that consists of namesz, descsz, type, name + padding, desc + padding data. This patch teaches yaml2obj, obj2yaml to dump and parse them. This patch implements the section how it is described here: https://docs.oracle.com/cd/E23824_01/html/819-0690/chapter6-18048.html Which says: "For 64–bit objects and 32–bit objects, each entry is an array of 4-byte words in the format of the target processor" The official specification is different http://www.sco.com/developers/gabi/latest/ch5.pheader.html#note_section And says: "n 64-bit objects (files with e_ident[EI_CLASS] equal to ELFCLASS64), each entry is an array of 8-byte words in the format of the target processor. In 32-bit objects (files with e_ident[EI_CLASS] equal to ELFCLASS32), each entry is an array of 4-byte words in the format of the target processor" Since LLVM uses the first, 32-bit way, this patch follows it. Differential revision: https://reviews.llvm.org/D68983
* Fix a variable typo in LiveDebugValues [NFC]David Stenberg2019-10-251-2/+2
|
* [PowerPC] [Peephole] fold frame offset by using index form to save add.czhengsz2019-10-254-0/+246
| | | | | | | | | | | | | | | | renamable $x6 = ADDI8 $x1, -80 ;;; 0 is replaced with -80 renamable $x6 = ADD8 killed renamable $x6, renamable $x5 STW killed renamable $r3, 4, killed renamable $x6 :: (store 4 into %ir.14, !tbaa !2) After PEI there is a peephole opt opportunity to combine above -80 in ADDI8 with 4 in the STW to eliminate unnecessary ADD8. Expected result: renamable $x6 = ADDI8 $x1, -76 STWX killed renamable $r3, renamable $x5, killed renamable $x6 :: (store 4 into %ir.6, !tbaa !2) Reviewed by: stefanp Differential Revision: https://reviews.llvm.org/D66329
* [LiveDebugValues] Small code clean up; NFCDjordje Todorovic2019-10-251-18/+11
|
* [X86][GISel] Remove unneeded custom selection code for handling shifts.Craig Topper2019-10-241-78/+0
|
* [SCEV] Expose and use maximum constant exit counts for individual loop exitsPhilip Reames2019-10-242-6/+18
| | | | | | | | We were already going to all of the trouble of computing maximum constant exit counts for each loop exit, we might as well expose them through the API. The change in IndVars is mostly to demonstrate that the wired up code works, but it als very slightly strengthens the transform. The strengthened case is rather narrow though: it requires one exactly analyzeable exit, one imprecisely analyzeable exit (with the upper bound less than the precise one), and one unanalyzeable exit. I coudn't construct a reasonably stable test case. This does increase the memory usage of the BackedgeTakenCount by a factor of 2 in the worst case. I also noticed the loop in IndVars is O(#Exits ^ 2). This doesn't change with this patch. A future patch will cache this result inside of SCEV to avoid requering.
* Fix Clang -Wcovered-switch-default warning by moving llvm_unreachable ↵David Blaikie2019-10-241-5/+2
| | | | default to after the switch
* [SCEV] Start reworking backedge taken count APIs to unify max handling [NFC]Philip Reames2019-10-241-13/+21
| | | | This is a first step in figuring out a proper API for maximum (non constant) exit counts. This may evolve a bit as we get experience with the API needs; suggestions very welcome. This patch just tried to provide a framework that we can later add maximum too in a clean and obvious way.
* Always flush pending errors in MCAsmParserJoerg Sonnenberger2019-10-251-4/+3
| | | | This has become visible with the --fatal-warnings support.
* Test commit access via gitPhilip Reames2019-10-241-1/+0
|
* Try harder to fix GCC 5.3 buildHans Wennborg2019-10-241-1/+1
| | | | | | | | | | (This time verified locally.) It was failing with: llvm/lib/MC/XCOFFObjectWriter.cpp:168:56: error: array must be initialized with a brace-enclosed initializer std::array<Section *const, 2> Sections = {&Text, &BSS}; ^
* Fix cppcheck shadow variable warning. NFCI.Simon Pilgrim2019-10-241-6/+6
|
* Follow up on D69112, fix build break for skipping field initializationjasonliu2019-10-241-2/+2
| | | | | Clang emit warning for skipping field initialization. Add {} to fix it. This is a patch that fixes issue introduced in https://reviews.llvm.org/D69112
* Fix MSVC "switch statement contains 'default' but no 'case' labels" warning. ↵Simon Pilgrim2019-10-241-7/+4
| | | | NFCI.
* Revert "Disable exit-on-SIGPIPE in lldb"Vedant Kumar2019-10-242-16/+1
| | | | | | | This reverts commit 32ce14e55e5a99dd99c3b4fd4bd0ccaaf2948c30. In post-commit review, Pavel pointed out that there's a simpler way to ignore SIGPIPE in lldb that doesn't rely on llvm's handlers.
* [ObjC][ARC] Check whether the return and parameter types of the old andAkira Hatanaka2019-10-241-1/+22
| | | | | | | | | | | | new functions are compatible before upgrading a function call to an intrinsic call. Sometimes users insert calls to ARC runtime functions that are not compatible with the corresponding intrinsic functions (for example, 'i8* @objc_storeStrong' instead of 'void @objc_storeStrong'). Don't upgrade those calls. rdar://problem/56447127
* [AMDGPU] Fix mfma scheduling crashStanislav Mekhanoshin2019-10-241-1/+6
| | | | | | | An SUnit can be neither intruction not SDNode. It is all null if represents a nop. Fixed a crash on using SU->getInstr(). Differential Revision: https://reviews.llvm.org/D69395
* Speculative build fix for GCC 5.3.0Hans Wennborg2019-10-241-1/+1
| | | | | | | | It was failing with llvm/lib/MC/XCOFFObjectWriter.cpp:168:53: error: array must be initialized with a brace-enclosed initializer std::array<Section *const, 2> Sections{&Text, &BSS}; ^
* [NFC] Remove redundant linesdfukalov2019-10-241-4/+0
| | | | | | | | | | | | Reviewers: rampitec Reviewed By: rampitec Subscribers: arsenm, jvesely, nhaehnle, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69375
* [InstCombine] Fold one-use variable into assertBenjamin Kramer2019-10-241-2/+1
| | | | Avoids warnings in Release builds. NFC.
* [NFC][XCOFF][AIX] Serialize object file writing for each CsectGroupjasonliu2019-10-241-129/+163
| | | | | | | | | | | | | | | | | Summary: Right now we handle each CsectGroup(ProgramCodeCsects, BSSCsects) individually when assigning indices, writing symbol table, and writing section raw data. However, there is already a pattern there, and we could common up those actions for every CsectGroup. This will make adding new CsectGroup(Read Write data, Read only data, TC/TOC, mergeable string) easier, and less error prone. Reviewed by: sfertile, daltenty, DiggerLin Approved by: daltenty Differential Revision: https://reviews.llvm.org/D69112
* [InstCombine] Known-bits optimization for ARM MVE VADC.Simon Tatham2019-10-241-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The MVE VADC instruction reads and writes the carry bit at bit 29 of the FPSCR register. The corresponding ACLE intrinsic is specified to work with an integer in which the carry bit is stored at bit 0. So if a user writes a code sequence in C that passes the carry from one VADC to the next, like this, s0 = vadcq_u32(a0, b0, &carry); s1 = vadcq_u32(a1, b1, &carry); then clang will generate IR for each of those operations that shifts the carry bit up into bit 29 before the VADC, and after it, shifts it back down and masks off all but the low bit. But in this situation what you really wanted was two consecutive VADC instructions, so that the second one directly reads the value left in FPSCR by the first, without wasting several instructions on pointlessly clearing the other flag bits in between. This commit explains to InstCombine that the other bits of the flags operand don't matter, and adds a test that demonstrates that all the code between the two VADC instructions can be optimized away as a result. Reviewers: dmgreen, miyuki, ostannard Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67162
* [ARM] Add IR intrinsics for MVE VLD[24] and VST[24].Simon Tatham2019-10-242-0/+97
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The VST2 and VST4 instructions take two or four vector registers as input, and store part of each register to memory in an interleaved pattern. They come in variants indicating which part of each register they store (VST20 and VST21; VST40 to VST43 inclusive); the intention is that issuing each of those variants in turn has the combined effect of loading or storing the whole set of registers to a memory block of equal size. The corresponding VLD2 and VLD4 instructions load from memory in the same interleaved format: each one overwrites only part of its output register set, and again, the idea is that if you use VLD4{0,1,2,3} or VLD2{0,1} together, you end up having written to the whole of each register. I've implemented the stores and loads quite differently. The loads were easiest to implement as a single intrinsic that expands to all four VLD4x instructions or both VLD2x, delivering four complete output registers. (Implementing each individual load as a separate instruction taking four input registers to partially overwrite is possible in theory, but pointless, and when I tried it, I found it would need extra work to get the register allocation not to be horrible.) Since that intrinsic delivers multiple outputs, it has to be instruction-selected in custom C++. But the store instructions are easier to model individually, because they don't overwrite any register at all and you can write a DAG Isel pattern in Tablegen for each one. Hence, my new intrinsic `int_arm_mve_vld4q` expands to four load instructions, delivers four full output vectors, and is handled by C++ code, whereas `int_arm_mve_vst4q` expands to just one store instruction, takes four input vectors and a constant indicating which lanes to store, and is handled entirely in Tablegen. (And similarly for vld2q/vst2q.) This is asymmetric, but it was the easiest way to do each one. Reviewers: dmgreen, miyuki, ostannard Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68700
* [ARM] Add some sample IR MVE intrinsics with C++ isel.Simon Tatham2019-10-241-0/+173
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds some initial example IR intrinsics for MVE instructions that deliver multiple output values, and hence, have to be instruction- selected by custom C++ code instead of Tablegen patterns. I've added the writeback gather load instructions (taking a vector of base addresses and a single common offset, returning a vector of loaded values and an updated vector of base addresses); one example from the long shift family (taking and returning a 64-bit value in two GPRs); and the VADC instruction (which propagates a carry bit from each vector-lane addition to the next, taking an input carry flag in FPSCR and outputting the final one in FPSCR as well). To support the VPT-predicated forms of these instructions, I've written some helper functions to add the cluster of MVE predicate operands to the end of a MachineInstr. `AddMVEPredicateToOps` is used when the instruction actually is predicated (so it takes a predicate mask argument), and `AddEmptyMVEPredicateToOps` is for when the instruction is unpredicated (so it fills in $noreg for the mask). Each one comes in a form suitable for `vpred_n`, and one for `vpred_r` which takes the extra 'inactive' parameter. For VADC, the representation of the carry flag in the IR intrinsic is a word intended to be moved directly to and from `FPSCR_nzcvqc`, i.e. with the carry flag in bit 29 of the word. (The user-facing ACLE intrinsic will want it to be in bit 0, but I'll do that on the clang side.) Reviewers: dmgreen, miyuki, ostannard Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68699
* [ARM] Begin adding IR intrinsics for MVE instructions.Simon Tatham2019-10-242-59/+157
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit, together with the next few, will add a representative sample of the kind of IR intrinsics that we'll need in order to implement the user-facing ACLE intrinsics for MVE. Supporting all of them will take more work; the intention of this initial series of commits is to implement an intrinsic or two from lots of different categories, as examples and proofs of concept. This initial commit introduces a small number of IR intrinsics for instructions simple enough that they can use Tablegen ISel patterns: the predicated versions of the VADD and VSUB instructions (both integer and FP), VMIN and VMAX, and the float->half VCVT instruction (predicated and unpredicated). When using VPT-predicated instructions in automatic code generation, it will be convenient to specify the predicate value as a vector of the appropriate number of i1. To make it easy to specify all sizes of an instruction in one go and give each one the matching predicate vector type, I've added a system of Tablegen informational records describing MVE's vector types: each one gives the underlying LLVM IR ValueType (which may not be the same if the MVE vector is of explicitly signed or unsigned integers) and an appropriate vNi1 to use as the predicate vector. (Also, those info records include the usual encoding for the types, so that as we add associations between each instruction encoding and one of the new `MVEVectorVTInfo` records, we can remove some of the existing template parameters and replace them with references to the vector type info's fields.) The user-facing ACLE intrinsics will receive a predicate mask as a 16-bit integer, so I've also provided a pair of intrinsics i2v and v2i, to convert between an integer and a vector of i1 by just changing the register class. Reviewers: dmgreen, miyuki, ostannard Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67158
* [AMDGPU] Skip additional folding on the same operand.Michael Liao2019-10-241-7/+19
| | | | | | | | | | Reviewers: rampitec, arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69355
* Fix compilation warning on the trailing whitespace. NFC.Michael Liao2019-10-241-1/+1
|
* [MIPS GlobalISel] Select MSA vector generic and builtin fabsPetar Avramovic2019-10-242-3/+7
| | | | | | | | | | | | | | selectImpl is able to select G_FABS when we set bank for vector operands to fprb. Add detailed tests. Note: G_FABS is generated from llvm-ir intrinsics llvm.fabs.*, and at the moment MIPS is not able to generate this intrinsic for vector type (some targets generate vector llvm.fabs.* from calls to a builtin function). We can handle fabs using __builtin_msa_fmax_a_<format> and passing same vector as both arguments. __builtin_msa_fmax_a_<format> will be directly selected into FMAX_A_<format> in legalizeIntrinsic. Differential Revision: https://reviews.llvm.org/D69346
* Hide implementation details in anonymous namespaces. NFC.Benjamin Kramer2019-10-242-1/+3
|
* [MIPS GlobalISel] MSA vector generic and builtin fadd, fsub, fmul, fdivPetar Avramovic2019-10-242-3/+28
| | | | | | | | | | | Select vector G_FADD, G_FSUB, G_FMUL and G_FDIV for MIPS32 with MSA. We have to set bank for vector operands to fprb and selectImpl will do the rest. __builtin_msa_fadd_<format>, __builtin_msa_fsub_<format>, __builtin_msa_fmul_<format> and __builtin_msa_fdiv_<format> will be transformed into G_FADD, G_FSUB, G_FMUL and G_FDIV in legalizeIntrinsic respectively and selected in the same way. Differential Revision: https://reviews.llvm.org/D69340
* [MIPS GlobalISel] MSA vector generic and builtin sdiv, srem, udiv, uremPetar Avramovic2019-10-242-6/+32
| | | | | | | | | | | Select vector G_SDIV, G_SREM, G_UDIV and G_UREM for MIPS32 with MSA. We have to set bank for vector operands to fprb and selectImpl will do the rest. __builtin_msa_div_s_<format>, __builtin_msa_mod_s_<format>, __builtin_msa_div_u_<format> and __builtin_msa_mod_u_<format> will be transformed into G_SDIV, G_SREM, G_UDIV and G_UREM in legalizeIntrinsic respectively and selected in the same way. Differential Revision: https://reviews.llvm.org/D69333
* [AMDGPU] Allow folding of sgpr to vgpr copyStanislav Mekhanoshin2019-10-231-2/+3
| | | | | | | | Potentially sgpr to sgpr copy should also be possible. That is however trickier because we may end up with a wrong register class at use because of xm0/xexec permutations. Differential Revision: https://reviews.llvm.org/D69280
* [Hexagon] Fix typo. NFCShoaib Meenai2019-10-231-1/+1
| | | | Testing git push access.
OpenPOWER on IntegriCloud