path: root/llvm/lib/CodeGen
Commit message | Author | Age | Files | Lines
...
* [Codegen] (X & (C l>>/<< Y)) ==/!= 0 --> ((X <</l>> Y) & C) ==/!= 0 fold | Roman Lebedev | 2019-07-24 | 1 | -0/+79
  Summary: This was originally reported in D62818. https://rise4fun.com/Alive/oPH
  InstCombine does the opposite fold, in the hope that the `C l>>/<< Y` expression will be hoisted out of a loop if `Y` is invariant and `X` is not. But as seen from the diffs here, if it didn't get hoisted, the produced assembly is almost universally worse.
  Much like with my recent "hoist add/sub by/from const" patches, we should get an almost universal win if we hoist the constant: there is almost always an "and/test by imm" instruction, but "shift of imm" not so much, so we may avoid having to materialize the immediate, and thus need one less register. And since we now shift not by a constant, but by something else, the live-range of that something else may reduce.
  Special care needs to be applied not to disturb the x86 `BT` / hexagon `tstbit` instruction pattern, and not to get into an endless combine loop.
  Reviewers: RKSimon, efriedma, t.p.northover, craig.topper, spatel, arsenm
  Reviewed By: spatel
  Subscribers: hiraditya, MaskRay, wuzish, xbolva00, nikic, nemanjai, jvesely, wdng, nhaehnle, javed.absar, tpr, kristof.beyls, jsji, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D62871
  llvm-svn: 366955
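The equivalence behind this fold can be checked by brute force. The following is only a sketch, not code from the patch: it models the shifts in an assumed 8-bit width (`W`, `shl`, `lshr` and `fold_is_equivalent` are illustrative names) and restricts `Y` to in-range shift amounts.

```python
W = 8                      # illustrative bit width, not taken from the patch
MASK = (1 << W) - 1


def shl(v, y):
    """Logical left shift in W-bit arithmetic."""
    return (v << y) & MASK


def lshr(v, y):
    """Logical right shift in W-bit arithmetic."""
    return (v & MASK) >> y


def fold_is_equivalent():
    """Check both directions of the fold for every X, C and in-range Y."""
    for x in range(1 << W):
        for c in range(1 << W):
            for y in range(W):
                # (X & (C l>> Y)) == 0  <=>  ((X << Y) & C) == 0
                if ((x & lshr(c, y)) == 0) != ((shl(x, y) & c) == 0):
                    return False
                # (X & (C << Y)) == 0  <=>  ((X l>> Y) & C) == 0
                if ((x & shl(c, y)) == 0) != ((lshr(x, y) & c) == 0):
                    return False
    return True
```

Calling `fold_is_equivalent()` walks every 8-bit combination; the `!=` form of the comparison follows by negating both sides.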
* [GlobalISel] Support for inlining memcpy, memset and memmove calls. | Amara Emerson | 2019-07-24 | 1 | -0/+505
  This introduces a new family of combiner helper routines that re-use the target specific cost model from SelectionDAG, and generate inline implementations of the memcpy family of intrinsics. The combines are only enabled at optimization levels higher than -O0, and give very substantial performance improvements.
  Differential Revision: https://reviews.llvm.org/D65167
  llvm-svn: 366951
* [Remarks] Add support for serializing metadata for every remark streamer | Francis Visoiu Mistrih | 2019-07-24 | 1 | -48/+13
  This allows every serializer format to implement metaSerializer() and return the corresponding meta serializer.
  llvm-svn: 366946
* [AArch64][GlobalISel] Fix a crash during s128 G_ICMP legalization due to r366317. | Amara Emerson | 2019-07-24 | 1 | -4/+4
  r366317 added a legalization for s128 G_ICMP narrow scalar which tried to hard code the result type of the new legalized G_SELECT. Change this to instead use the type of the original G_ICMP result and allow the target to legalize it if necessary later.
  llvm-svn: 366943
* [Remarks][NFC] Rename remarks::Serializer to remarks::RemarkSerializer | Francis Visoiu Mistrih | 2019-07-24 | 1 | -3/+3
  llvm-svn: 366939
* Fix signed/unsigned comparison warning. NFCI. | Simon Pilgrim | 2019-07-24 | 1 | -1/+1
  llvm-svn: 366935
* [DAGCombine] matchBinOpReduction - add partial reduction matching | Simon Pilgrim | 2019-07-24 | 1 | -7/+32
  This patch adds support for recognizing cases where a larger vector type is being used to reduce just the elements in the lower subvector,
  e.g. an <8 x i32> reduction pattern in a <16 x i32> vector:
    <4,5,6,7,u,u,u,u,u,u,u,u,u,u,u,u>
    <2,3,u,u,u,u,u,u,u,u,u,u,u,u,u,u>
    <1,u,u,u,u,u,u,u,u,u,u,u,u,u,u,u>
  matchBinOpReduction returns the lower extracted subvector in such cases, assuming isExtractSubvectorCheap accepts the extraction.
  I've only enabled it for X86 reduction sums so far. I intend to enable it for the bitop/minmax cases in future patches, and eventually I think it's worth turning it on all the time. This is mainly just a case of ensuring calls to matchBinOpReduction don't make assumptions on the vector width based on the original vector extraction.
  Fixes the x86 partial reduction sum cases in PR33758 and PR42023.
  Differential Revision: https://reviews.llvm.org/D65047
  llvm-svn: 366933
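To see why the three shuffle masks above reduce only the lower eight lanes, here is a small model of the shuffle-and-add pyramid. It is an illustration under assumptions, not the DAG matcher itself: lists stand in for vectors, `None` stands in for an undef lane, and every helper name is hypothetical.

```python
UNDEF = None  # stands in for the 'u' (undef) lanes in the masks above


def shuffle(vec, mask):
    """Single-input shuffle; undef mask lanes produce an undef (None) lane."""
    return [UNDEF if m is UNDEF else vec[m] for m in mask]


def add(a, b):
    """Lane-wise add; any undef input makes the resulting lane undef."""
    return [UNDEF if x is UNDEF or y is UNDEF else x + y for x, y in zip(a, b)]


def pad(prefix, width=16):
    """Extend a mask prefix with undef lanes up to the full vector width."""
    return prefix + [UNDEF] * (width - len(prefix))


def partial_reduce_add(vec):
    """Reduce the low 8 lanes of a 16-lane vector into lane 0."""
    for mask in (pad([4, 5, 6, 7]), pad([2, 3]), pad([1])):
        vec = add(vec, shuffle(vec, mask))
    return vec[0]
```

Lanes 8 to 15 are never read by any mask, which is what lets the reduction be treated as one over the extracted lower subvector.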
* [SelectionDAG] makeEquivalentMemoryOrdering - early out for equal chains (PR42727) | Simon Pilgrim | 2019-07-24 | 1 | -1/+1
  If we are already using the same chain for the old/new memory ops then just return.
  Fixes PR42727 which had getLoad() reusing an existing node.
  llvm-svn: 366922
* [SDAG] convert (sub x, 1) to (add x, -1) in ctpop expansion; NFC | Sanjay Patel | 2019-07-24 | 1 | -3/+3
  We canonicalize to the add form, so create that directly for efficiency.
  llvm-svn: 366914
* [TargetLowering] SimplifyMultipleUseDemandedBits - add VECTOR_SHUFFLE support. | Simon Pilgrim | 2019-07-23 | 1 | -0/+23
  If all the demanded elts are from one operand and are inline, then we can use the operand directly.
  The changes are mainly from SSE41 targets which have blendvpd but not cmpgtq, allowing the v2i64 comparison to be simplified as we only need the signbit from alternate v4i32 elements.
  llvm-svn: 366817
* [TargetLowering] Add SimplifyMultipleUseDemandedBits | Simon Pilgrim | 2019-07-23 | 1 | -1/+128
  This patch introduces the DAG version of SimplifyMultipleUseDemandedBits, which attempts to peek through ops (mainly and/or/xor so far) that don't contribute to the demandedbits/elts of a node - which means we can do this even in cases where we have multiple uses of an op, which normally requires us to demand all bits/elts. The intention is to remove a similar function - SelectionDAG::GetDemandedBits - once SimplifyMultipleUseDemandedBits has matured.
  The InstCombine version of SimplifyMultipleUseDemandedBits can constant fold, which I haven't added here yet, and so far I've only wired this up to some basic binops (and/or/xor/add/sub/mul) to demonstrate its use.
  We do see a couple of regressions that need to be addressed:
    AMDGPU unsigned dot product codegen retains an AND mask (for ZERO_EXTEND) that it previously removed (but otherwise the dotproduct codegen is a lot better).
    X86/AVX2 has poor handling of vector ANY_EXTEND/ANY_EXTEND_VECTOR_INREG - it prematurely gets converted to ZERO_EXTEND_VECTOR_INREG.
  The code owners have confirmed it's ok for these cases to be fixed up in future patches.
  Differential Revision: https://reviews.llvm.org/D63281
  llvm-svn: 366799
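The idea of peeking through an op that does not contribute to the demanded bits can be shown with a deliberately simplified model. This sketch assumes one operand is a known constant and works on plain 4-bit integers; the real DAG routine reasons about known bits of arbitrary operands, and the names `peek_through`, `apply_op` and `check_all` are hypothetical.

```python
W = 4                      # small illustrative width to keep the check cheap
MASK = (1 << W) - 1


def apply_op(op, x, c):
    """Evaluate the binary op being looked through."""
    return {"and": x & c, "or": x | c, "xor": x ^ c}[op]


def peek_through(op, x, c, demanded):
    """Return x if op(x, c) can be replaced by x for the demanded bits, else None."""
    if op == "and" and (demanded & ~c & MASK) == 0:
        return x          # every demanded bit is preserved by the mask
    if op in ("or", "xor") and (demanded & c) == 0:
        return x          # the constant never touches a demanded bit
    return None


def check_all():
    """Confirm the replacement agrees with the original op on all demanded bits."""
    for op in ("and", "or", "xor"):
        for x in range(1 << W):
            for c in range(1 << W):
                for demanded in range(1 << W):
                    r = peek_through(op, x, c, demanded)
                    if r is not None and (r & demanded) != (apply_op(op, x, c) & demanded):
                        return False
    return True
```

Because the original node is left in place for its other users, this kind of bypass stays valid even when the op has multiple uses.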
* [DAGCombiner] Make ShrinkLoadReplaceStoreWithStore return an SDValue instead of an SDNode*. NFCI | Craig Topper | 2019-07-23 | 1 | -9/+8
  The function was calling getNode() on an SDValue to return and the caller turned the result back into an SDValue. So just return the original SDValue to avoid this.
  llvm-svn: 366779
* [DAGCombiner] Use SDNode::isOperandOf to simplify some code. NFCI | Craig Topper | 2019-07-23 | 1 | -7/+1
  llvm-svn: 366778
* Move variable out from debug only section. | Richard Trieu | 2019-07-23 | 1 | -2/+0
  MFI is no longer just needed for an assert. Move it out of the debug only section to allow non-assert builds to be able to find it.
  llvm-svn: 366773
* [Statepoints] Fix a bug in statepoint lowering for functions w/no-realign-stack | Philip Reames | 2019-07-22 | 1 | -1/+8
  We were silently using the ABI alignment for all of the stores generated for deopt and gc values. We'd gotten the alignment of the stack slot itself properly reduced (via MachineFrameInfo's clamping), but having the MMO on the store incorrect was enough for us to generate an aligned store to an unaligned location.
  The simplest fix would have been to just pass the alignment to the helper function, but once we do that, the helper function doesn't really help. So, inline it and directly call the MMO version of DAG.getStore with a properly constructed MMO.
  Note that there's a separate performance possibility here. Even if we *can* realign stacks, we probably don't *want to* if all of the stores are in slowpaths. But that's a later patch, if at all. :)
  llvm-svn: 366765
* Stubs out TLOF for AIX and add support for common vars in assembly output. | Sean Fertile | 2019-07-22 | 1 | -0/+55
  Stubs out a TargetLoweringObjectFileXCOFF class, implementing only SelectSectionForGlobal for common symbols. Also adds an override of EmitGlobalVariable in PPCAIXAsmPrinter which adds a number of defensive errors and adds support for emitting common globals.
  llvm-svn: 366727
* TableGen: Support physical register inputs > 255 | Matt Arsenault | 2019-07-22 | 1 | -1/+4
  This was truncating register values that didn't fit in unsigned char.
  Switch AMDGPU sendmsg intrinsics to using a tablegen pattern.
  llvm-svn: 366695
* Added address-space mangling for stack related intrinsics | Christudasan Devadasan | 2019-07-22 | 2 | -3/+6
  Modified the following 3 intrinsics: int_addressofreturnaddress, int_frameaddress & int_sponentry.
  Reviewed By: arsenm
  Differential Revision: https://reviews.llvm.org/D64561
  llvm-svn: 366679
* [IPRA][ARM] Make use of the "returned" parameter attribute | Oliver Stannard | 2019-07-22 | 1 | -0/+2
  ARM has code to recognise uses of the "returned" function parameter attribute which guarantee that the value passed to the function in r0 will be returned in r0 unmodified. IPRA replaces the regmask on call instructions, so needs to be told about this to avoid reverting the optimisation.
  Differential revision: https://reviews.llvm.org/D64986
  llvm-svn: 366669
* [GISel]: Attach missing range metadata while translating G_LOADs | Aditya Nandakumar | 2019-07-21 | 1 | -2/+3
  https://reviews.llvm.org/D65048
  Attach range information to G_LOAD when only defining one register.
  reviewed by: arsenm
  llvm-svn: 366656
* [Codegen][SelectionDAG] X u% C == 0 fold: non-splat vector improvements | Roman Lebedev | 2019-07-20 | 1 | -35/+132
  Summary: Four things here:
  1. Generalize the fold to handle non-splat divisors. Reasonably trivial.
  2. Unban power-of-two divisors. I don't see any reason why they should be illegal.
     * There is no ban in Hacker's Delight
     * I think the ban came from the same bug that caused the miscompile in the base patch - in `floor((2^W - 1) / D)` we were dividing by `D0` instead of `D`, and we **were** ensuring that `D0` is not `1`, which made sense.
  3. Unban `1` divisors. I no longer believe Hacker's Delight actually says that the fold is invalid for `D = 0`. Further considerations:
     * We know that
       * `(X u% 1) == 0` can be constant-folded to `1`,
       * `(X u% 1) != 0` can be constant-folded to `0`,
     * Also, we know that
       * `X u<= -1` can be constant-folded to `1`,
       * `X u> -1` can be constant-folded to `0`,
     * https://godbolt.org/z/7jnZJX https://rise4fun.com/Alive/oF6p
     * We know we will end up with the following: `(setule/setugt (rotr (mul N, P), K), Q)`
     * Therefore, for given new DAG nodes and comparison predicates (`ule`/`ugt`), we will still produce the correct answer if: `Q` is an all-ones constant; and both `P` and `K` are *anything* other than `undef`.
     * The fold will indeed produce `Q = all-ones`.
  4. Try to re-splat the `P` and `K` vectors - we don't care about their values for the lanes where divisor was `1`.
  Reviewers: RKSimon, hermord, craig.topper, spatel, xbolva00
  Reviewed By: RKSimon
  Subscribers: hiraditya, javed.absar, dexonsmith, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D63963
  llvm-svn: 366637
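The `(setule (rotr (mul N, P), K), Q)` form can be tried out on scalars. The sketch below is a scalar, 8-bit model written for illustration only; it assumes the usual Hacker's Delight construction (`D = D0 * 2^K` with `D0` odd, `P` the multiplicative inverse of `D0` modulo `2^W`, `Q = floor((2^W - 1) / D)`), and all function names are hypothetical.

```python
W = 8                      # illustrative bit width
MOD = 1 << W


def rotr(v, k):
    """Rotate a W-bit value right by k bits."""
    k %= W
    return ((v >> k) | (v << (W - k))) & (MOD - 1)


def inverse_mod_pow2(d0):
    """Multiplicative inverse of an odd d0 modulo 2**W (Newton iteration)."""
    inv = d0                      # d0 * d0 == 1 (mod 8), so 3 bits are already right
    for _ in range(W.bit_length()):
        inv = (inv * (2 - d0 * inv)) % MOD   # each step doubles the correct bits
    return inv


def fold_constants(d):
    """Split D = D0 * 2**K (D0 odd) and derive P, K, Q for the comparison."""
    k = (d & -d).bit_length() - 1     # number of trailing zero bits of D
    d0 = d >> k
    p = inverse_mod_pow2(d0)
    q = (MOD - 1) // d                # note: divides by D, not D0
    return p, k, q


def is_multiple(n, d):
    """N u% D == 0 rewritten as (rotr (mul N, P), K) u<= Q."""
    p, k, q = fold_constants(d)
    return rotr((n * p) % MOD, k) <= q
```

For `d = 1` this gives `P = 1`, `K = 0` and `Q = 255`, so the comparison is always true, matching the constant-fold of `(X u% 1) == 0`; power-of-two divisors give `P = 1` and reduce to a rotate and compare.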
* LiveIntervals: Fix handleMove asserting on BUNDLE | Matt Arsenault | 2019-07-19 | 1 | -1/+4
  The top-level BUNDLE instruction should behave as an ordinary instruction. It is supposed to have all relevant registers as implicit operands. Moving it should work as any other instruction. I believe the assert intended to avoid moving instructions inside bundles.
  llvm-svn: 366605
* Revert "Use the MachineBasicBlock symbol for a callbr target" | Nick Desaulniers | 2019-07-19 | 1 | -7/+2
  This reverts commit r366523/ccbffefccaff42b0d094c9ef0f49fc3e8c8456ea.
  Two regressions were immediately reported:
  - https://github.com/ClangBuiltLinux/linux/issues/614
  - https://github.com/ClangBuiltLinux/linux/issues/615
  Reported-by: nathanchance
  llvm-svn: 366600
* DAG: Handle dbg_value for arguments split into multiple subregs | Matt Arsenault | 2019-07-19 | 1 | -23/+52
  This was handled previously for arguments split due to not fitting in an MVT. This was dropping the register for argument registers split due to TLI::getRegisterTypeForCallingConv.
  llvm-svn: 366574
* [MachineCSE][MachinePRE] Avoid hoisting code from code regions into hot BBs. | Kai Luo | 2019-07-19 | 1 | -0/+25
  Summary: Current PRE hoists common computations into CMBB = DT->findNearestCommonDominator(MBB, MBB1). However, if CMBB is in a hot loop body, we might get performance degradation.
  Differential Revision: https://reviews.llvm.org/D64394
  llvm-svn: 366570
* [IPRA] Don't rely on non-exact function definitions | Oliver Stannard | 2019-07-19 | 1 | -1/+5
  If a function definition is not exact, then the linker could select a differently-compiled version of it, which could use different registers.
  https://reviews.llvm.org/D64909
  llvm-svn: 366557
* Use the MachineBasicBlock symbol for a callbr target | Bill Wendling | 2019-07-19 | 1 | -2/+7
  Summary: Inline asm doesn't use labels when compiled as an object file. Therefore, we shouldn't create one for the (potential) callbr destination. Instead, use the symbol for the MachineBasicBlock.
  Reviewers: nickdesaulniers, craig.topper
  Reviewed By: nickdesaulniers
  Subscribers: xbolva00, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D64888
  llvm-svn: 366523
* [GlobalISel] Translate calls to memcpy et al to G_INTRINSIC_W_SIDE_EFFECTs and legalize later. | Amara Emerson | 2019-07-19 | 2 | -42/+83
  I plan on adding memcpy optimizations in the GlobalISel pipeline, but we can't do that unless we delay lowering to actual function calls. This patch changes the translator to generate G_INTRINSIC_W_SIDE_EFFECTS for these functions, and then has each target specify, using the new custom legalizer hook for intrinsics, that it wants them expanded to a libcall.
  Differential Revision: https://reviews.llvm.org/D64895
  llvm-svn: 366516
* CodeGen: Allow !associated metadata to point to aliases. | Peter Collingbourne | 2019-07-18 | 1 | -2/+2
  This is a small extension of !associated, mostly useful for the implementation convenience of instrumentation passes that RAUW globals with aliases, such as LowerTypeTests.
  Differential Revision: https://reviews.llvm.org/D64951
  llvm-svn: 366502
* [COFF] Change a variable type to be const in the HeapAllocSite map. | Amy Huang | 2019-07-18 | 4 | -5/+7
  llvm-svn: 366479
* [DAGCombine] Pull getSubVectorSrc helper out of narrowInsertExtractVectorBinOp. NFCI. | Simon Pilgrim | 2019-07-18 | 1 | -22/+22
  NFC step towards reusing this in other EXTRACT_SUBVECTOR combines.
  llvm-svn: 366435
* Changes to display code view debug info type records in hex format | Nilanjana Basu | 2019-07-17 | 1 | -1/+1
  llvm-svn: 366390
* Adding inline comments to code view type record directives for better readability | Nilanjana Basu | 2019-07-17 | 1 | -2/+15
  llvm-svn: 366372
* [PEI] Don't re-allocate a pre-allocated stack protector slot | Francis Visoiu Mistrih | 2019-07-17 | 2 | -2/+27
  The LocalStackSlotPass pre-allocates a stack protector and makes sure that it comes before the local variables on the stack.
  We need to make sure that later during PEI we don't re-allocate a new stack protector slot. If that happens, the new stack protector slot will end up being **after** the local variables that it should be protecting.
  Therefore, we would have two slots assigned for two different stack protectors, one at the top of the stack, and one at the bottom. Since PEI will overwrite the assigned slot for the stack protector, the load that is used to compare the value of the stack protector will use the slot assigned by PEI, which is wrong.
  For this, we need to check if the object is pre-allocated, and re-use that pre-allocated slot.
  Differential Revision: https://reviews.llvm.org/D64757
  llvm-svn: 366371
* [CodeGen][NFC] Simplify checks for stack protector index checking | Francis Visoiu Mistrih | 2019-07-17 | 2 | -13/+11
  Use `hasStackProtectorIndex()` instead of `getStackProtectorIndex() >= 0`.
  llvm-svn: 366369
* GlobalISel: Handle widenScalar of arbitrary G_MERGE_VALUES sources | Matt Arsenault | 2019-07-17 | 2 | -48/+87
  Extract the sources to the GCD of the original size and target size, padding with implicit_def as necessary.
  Also fix the case where the requested source type is wider than the original result type. This was ignoring the type, and just using the destination. Do the operation in the requested type and truncate back.
  llvm-svn: 366367
* GlobalISel: Handle more cases for widenScalar of G_MERGE_VALUES | Matt Arsenault | 2019-07-17 | 1 | -4/+23
  Use an anyext to the requested type for the leftover operand to produce a slightly wider type, and then truncate the final merge.
  I have another implementation almost ready which handles arbitrary widens, but I think it produces worse code in this example (which I think is 90% due to not folding redundant copies or folding out implicit_def users), so I wanted to add this as a baseline first.
  llvm-svn: 366366
* Basic codegen for MTE stack tagging. | Evgeniy Stepanov | 2019-07-17 | 1 | -0/+13
  Implement IR intrinsics for stack tagging. Generated code is very unoptimized for now.
  Two special intrinsics, llvm.aarch64.irg.sp and llvm.aarch64.tagp, are used to implement a tagged stack frame pointer in a virtual register.
  Differential Revision: https://reviews.llvm.org/D64172
  llvm-svn: 366360
* [AsmPrinter] Make the encoding of call sites in .gcc_except_table configurable and use for RISC-V | Alex Bradbury | 2019-07-17 | 3 | -6/+28
  The original behavior was to always emit the offsets to each call site in the call site table as uleb128 values, however on some architectures (e.g. RISCV) these uleb128 offsets into the code cannot always be resolved until link time (because relaxation will invalidate any calculated offsets), and there are no appropriate relocations for uleb128 values. As a consequence it needs to be possible to specify an alternative.
  This also switches RISCV to use DW_EH_PE_udata4 for call site encodings in .gcc_except_table.
  Differential Revision: https://reviews.llvm.org/D63415
  Patch by Edward Jones.
  llvm-svn: 366329
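The difference between the two encodings is easiest to see side by side. This is a generic sketch of ULEB128 versus a fixed 4-byte unsigned field, not code from the patch; the little-endian byte order in `encode_udata4` is an assumption made for the example.

```python
import struct


def encode_uleb128(value):
    """Encode a non-negative integer as ULEB128 (7 payload bits per byte)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)   # high bit set: more bytes follow
        else:
            out.append(byte)          # high bit clear: last byte
            return bytes(out)


def encode_udata4(value):
    """Encode as a fixed 4-byte unsigned value (little-endian assumed here)."""
    return struct.pack("<I", value)
```

An offset of 127 takes one ULEB128 byte while 128 takes two, so the size of the table entry depends on a value that linker relaxation can still change; the fixed-width form always occupies four bytes.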
* [RISCV] Set correct encodings for DWARF exception handling | Alex Bradbury | 2019-07-17 | 1 | -0/+8
  This patch sets correct encodings for DWARF exception handling for RISC-V (other than call site encoding, which must be udata4 rather than uleb128 and is handled by D63415). This has the same intent as D63409, except this version matches GCC/binutils behaviour which uses the same encodings regardless of PIC/non-PIC and medlow/medany code model.
  llvm-svn: 366327
* [MIPS GlobalISel] ClampScalar and select pointer G_ICMP | Petar Avramovic | 2019-07-17 | 1 | -0/+36
  Add narrowScalar to half of original size for G_ICMP. ClampScalar G_ICMP's operands 2 and 3 to s32. Select G_ICMP for pointers for MIPS32. Pointer compare is the same as for integers; it is enough to declare them as a legal type.
  Differential Revision: https://reviews.llvm.org/D64856
  llvm-svn: 366317
* GlobalISel: Add overload of handleAssignments with CCState | Matt Arsenault | 2019-07-16 | 1 | -2/+11
  AMDGPU needs to allocate special argument registers separately from the user function argument list, so needs direct control over the CCState.
  The ArgLocs argument is only really necessary because CCState doesn't allow access to it.
  llvm-svn: 366279
* DWARF: Skip zero column for inline call sites | David Blaikie | 2019-07-16 | 1 | -1/+2
  D64033 <https://reviews.llvm.org/D64033> added DW_AT_call_column for inline sites. However, that change wasn't aware of "-gno-column-info". To avoid adding column info when "-gno-column-info" is used, DW_AT_call_column is now only added when we have a non-zero column (when "-gno-column-info" is used, the column will be zero).
  Patch by Wenlei He!
  Differential Revision: https://reviews.llvm.org/D64784
  llvm-svn: 366264
* [Strict FP] Allow more relaxed scheduling | Ulrich Weigand | 2019-07-16 | 1 | -10/+21
  Reimplement scheduling constraints for strict FP instructions in ScheduleDAGInstrs::buildSchedGraph to allow for more relaxed scheduling. Specifically, allow one strict FP instruction to be scheduled across another, as long as it is not moved across any global barrier.
  Differential Revision: https://reviews.llvm.org/D64412
  Reviewed By: cameron.mcinally
  llvm-svn: 366222
* [Remarks][NFC] Combine ParserFormat and SerializerFormat | Francis Visoiu Mistrih | 2019-07-16 | 1 | -0/+1
  It's useless to have both.
  llvm-svn: 366216
* [DAGCombiner] fold (addcarry (xor a, -1), b, c) -> (subcarry b, a, !c) and flip carry. | Amaury Sechet | 2019-07-16 | 1 | -16/+28
  Summary: As per title. DAGCombiner only matches the special case where b = 0; this patch extends the pattern to match any value of b.
  Depends on D57302
  Reviewers: hfinkel, RKSimon, craig.topper
  Subscribers: llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D59208
  llvm-svn: 366214
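The identity in the title can be confirmed exhaustively on a small width. The sketch below is an arithmetic model under assumptions, not the combiner code: it takes `addcarry(a, b, c)` to compute `a + b + c` with a carry out and `subcarry(a, b, c)` to compute `a - b - c` with a borrow out, in an assumed 8-bit width, and the helper names are illustrative.

```python
W = 8                      # illustrative bit width
MASK = (1 << W) - 1


def addcarry(a, b, carry_in):
    """Return (sum, carry_out) of a + b + carry_in in W bits."""
    total = a + b + carry_in
    return total & MASK, total >> W


def subcarry(a, b, borrow_in):
    """Return (difference, borrow_out) of a - b - borrow_in in W bits."""
    total = a - b - borrow_in
    return total & MASK, 1 if total < 0 else 0


def fold_holds():
    """addcarry(~a, b, c) equals subcarry(b, a, !c) with the carry output flipped."""
    for a in range(1 << W):
        for b in range(1 << W):
            for c in (0, 1):
                s, carry = addcarry(a ^ MASK, b, c)   # xor with -1 is bitwise not
                d, borrow = subcarry(b, a, 1 - c)
                if s != d or carry != 1 - borrow:
                    return False
    return True
```

The reason it works is that `~a = 2^W - 1 - a`, so `~a + b + c = 2^W + (b - a - (1 - c))`: the low `W` bits agree, and the add carries out exactly when the subtraction does not borrow.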
* Fix parameter name comments using clang-tidy. NFC. | Rui Ueyama | 2019-07-16 | 9 | -24/+24
  This patch applies clang-tidy's bugprone-argument-comment tool to LLVM, clang and lld source trees. Here is how I created this patch:

    $ git clone https://github.com/llvm/llvm-project.git
    $ cd llvm-project
    $ mkdir build
    $ cd build
    $ cmake -GNinja -DCMAKE_BUILD_TYPE=Debug \
        -DLLVM_ENABLE_PROJECTS='clang;lld;clang-tools-extra' \
        -DCMAKE_EXPORT_COMPILE_COMMANDS=On -DLLVM_ENABLE_LLD=On \
        -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ ../llvm
    $ ninja
    $ parallel clang-tidy -checks='-*,bugprone-argument-comment' \
        -config='{CheckOptions: [{key: StrictMode, value: 1}]}' -fix \
        ::: ../llvm/lib/**/*.{cpp,h} ../clang/lib/**/*.{cpp,h} ../lld/**/*.{cpp,h}

  llvm-svn: 366177
* [WebAssembly] Rename except_ref type to exnref | Heejin Ahn | 2019-07-15 | 1 | -1/+1
  Summary: We agreed to rename `except_ref` to `exnref` for consistency with other reference types in https://github.com/WebAssembly/exception-handling/issues/79. This also renames WebAssemblyInstrExceptRef.td to WebAssemblyInstrRef.td in order to use the file for other reference types in future.
  Reviewers: dschuff
  Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, jfb, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D64703
  llvm-svn: 366145
* GlobalISel: Implement narrowScalar for vector extract/insert indexes | Matt Arsenault | 2019-07-15 | 1 | -0/+11
  llvm-svn: 366113
* [PowerPC] Support fp128 libcalls | Fangrui Song | 2019-07-15 | 1 | -0/+28
  On PowerPC, IEEE 754 quadruple-precision libcall names use "kf" instead of "tf". In libgcc, libgcc/config/rs6000/float128-sed converts TF names to KF names. This patch implements its 24 substitution rules.
  Reviewed By: hfinkel
  Differential Revision: https://reviews.llvm.org/D64282
  llvm-svn: 366039