bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	Basic codegen for MTE stack tagging.	Evgeniy Stepanov	2019-07-17	1	-0/+13
\| \| \| \| \| \| \| \| \| \| \| \|	Implement IR intrinsics for stack tagging. Generated code is very unoptimized for now. Two special intrinsics, llvm.aarch64.irg.sp and llvm.aarch64.tagp are used to implement a tagged stack frame pointer in a virtual register. Differential Revision: https://reviews.llvm.org/D64172 llvm-svn: 366360
*	[DAGCombiner] fold (addcarry (xor a, -1), b, c) -> (subcarry b, a, !c) and ↵	Amaury Sechet	2019-07-16	1	-16/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	flip carry. Summary: As per title. DAGCombiner only matches the special case where b = 0; this patch extends the pattern to match any value of b. Depends on D57302 Reviewers: hfinkel, RKSimon, craig.topper Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59208 llvm-svn: 366214
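The fold above rests on a two's-complement identity that can be checked exhaustively at a small bit width. The following is a minimal sketch, not the DAGCombiner code: `addcarry`, `subcarry`, `check_fold`, and the 4-bit `WIDTH` are hypothetical names chosen for illustration, and the helpers model the carry and borrow semantics as assumed here.

```python
WIDTH = 4                      # assumed small width so the check can be exhaustive
MASK = (1 << WIDTH) - 1


def addcarry(a, b, carry_in):
    """Return (sum, carry_out) of a + b + carry_in at WIDTH bits."""
    total = a + b + carry_in
    return total & MASK, total >> WIDTH


def subcarry(a, b, borrow_in):
    """Return (difference, borrow_out) of a - b - borrow_in at WIDTH bits."""
    diff = a - b - borrow_in
    return diff & MASK, int(diff < 0)


def check_fold():
    """Check (addcarry (xor a, -1), b, c) == (subcarry b, a, !c) with flipped carry."""
    for a in range(MASK + 1):
        for b in range(MASK + 1):
            for c in (0, 1):
                total, carry = addcarry(a ^ MASK, b, c)
                diff, borrow = subcarry(b, a, 1 - c)
                if total != diff or carry != 1 - borrow:
                    return False
    return True
```

The value results agree, and the carry out of the add is the inverse of the borrow out of the subtract, which is why the commit title says the carry is flipped.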
*	Fix parameter name comments using clang-tidy. NFC.	Rui Ueyama	2019-07-16	2	-16/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch applies clang-tidy's bugprone-argument-comment tool to LLVM, clang and lld source trees. Here is how I created this patch: $ git clone https://github.com/llvm/llvm-project.git $ cd llvm-project $ mkdir build $ cd build $ cmake -GNinja -DCMAKE_BUILD_TYPE=Debug \ -DLLVM_ENABLE_PROJECTS='clang;lld;clang-tools-extra' \ -DCMAKE_EXPORT_COMPILE_COMMANDS=On -DLLVM_ENABLE_LLD=On \ -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ ../llvm $ ninja $ parallel clang-tidy -checks='-*,bugprone-argument-comment' \ -config='{CheckOptions: [{key: StrictMode, value: 1}]}' -fix \ ::: ../llvm/lib/**/*.{cpp,h} ../clang/lib/**/*.{cpp,h} ../lld/**/*.{cpp,h} llvm-svn: 366177
*	[DAGCombine] narrowExtractedVectorBinOp - wrap subvector extraction in ↵	Simon Pilgrim	2019-07-12	1	-9/+11
\| \| \| \| \| \| \| \|	helper. NFCI. First step towards supporting 'free' subvector extractions other than concat_vectors. llvm-svn: 365896
*	[DAGCombine] narrowInsertExtractVectorBinOp - add CONCAT_VECTORS support	Simon Pilgrim	2019-07-11	1	-4/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	We already split extract_subvector(binop(insert_subvector(v,x),insert_subvector(w,y))) -> binop(x,y). This patch adds support for extract_subvector(binop(concat_vectors(),concat_vectors())) cases as well. In particular this means we don't have to wait for X86 lowering to convert concat_vectors to insert_subvector chains, which helps avoid some cases where demandedelts/combine calls occur too late to split large vector ops. The fast-isel-store.ll load folding regression is annoying but I don't think is that critical. Differential Revision: https://reviews.llvm.org/D63653 llvm-svn: 365785
*	OpaquePtr: use byval accessor instead of inspecting pointer type. NFC.	Tim Northover	2019-07-11	1	-3/+2
\| \| \| \| \| \| \|	The accessor can deal with both "byval(ty)" and "ty* byval" forms seamlessly. llvm-svn: 365769
*	[SDAG] commute setcc operands to match a subtract	Sanjay Patel	2019-07-10	1	-0/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If we have: R = sub X, Y P = cmp Y, X ...then flipping the operands in the compare instruction can allow using a subtract that sets compare flags. Motivated by diffs in D58875 - not sure if this changes anything there, but this seems like a good thing independent of that. There's a more involved version of this transform already in IR (in instcombine although that seems misplaced to me) - see "swapMayExposeCSEOpportunities()". Differential Revision: https://reviews.llvm.org/D63958 llvm-svn: 365711
*	Move three folds for FADD, FSUB and FMUL in the DAG combiner away from ↵	Michael Berg	2019-07-10	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Unsafe to more aligned checks that reflect context Summary: Unsafe does not map well alone for each of these three cases as it is missing NoNan context when accessed directly with clang. I have migrated the fold guards to reflect the expectations of handling nan and zero contexts directly (NoNan, NSZ) and some tests with it. Unsafe does include NSZ, however there is already precedent for using the target option directly to reflect that context. Reviewers: spatel, wristow, hfinkel, craig.topper, arsenm Reviewed By: arsenm Subscribers: michele.scandale, wdng, javed.absar Differential Revision: https://reviews.llvm.org/D64450 llvm-svn: 365679
*	[TargetLowering] support BlockAddress as "i" inline asm constraint	Nick Desaulniers	2019-07-10	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This allows passing address of labels to inline assembly "i" input constraints. Fixes pr/42502. Reviewers: ostannard Reviewed By: ostannard Subscribers: void, echristo, nathanchance, ostannard, javed.absar, hiraditya, llvm-commits, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D64167 llvm-svn: 365664
*	[DAGCombine] visitINSERT_SUBVECTOR - use uint64_t subvector index. NFCI.	Simon Pilgrim	2019-07-10	1	-1/+1
\| \| \| \| \| \|	Keep the uint64_t type from getZExtValue() to stop truncation/extension overflow warnings in MSVC in subvector index math. llvm-svn: 365621
*	Fix const/non-const lambda return type warning. NFCI.	Simon Pilgrim	2019-07-10	1	-1/+1
\| \| \| \|	llvm-svn: 365613
*	[X86][AMDGPU][DAGCombiner] Move call to allowsMemoryAccess into ↵	Craig Topper	2019-07-09	1	-15/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	isLoadBitCastBeneficial/isStoreBitCastBeneficial to allow X86 to bypass it Basically the problem is that X86 doesn't set the Fast flag from allowsMemoryAccess on certain CPUs due to slow unaligned memory subtarget features. This prevents bitcasts from being folded into loads and stores. But all vector loads and stores of the same width are the same cost on X86. This patch merges the allowsMemoryAccess call into isLoadBitCastBeneficial to allow X86 to skip it. Differential Revision: https://reviews.llvm.org/D64295 llvm-svn: 365549
*	[DAGCombine] LoadedSlice - keep getOffsetFromBase() uint64_t offset. NFCI.	Simon Pilgrim	2019-07-09	1	-1/+1
\| \| \| \| \| \|	Keep the uint64_t type from getOffsetFromBase() to stop truncation/extension overflow warnings in MSVC in alignment math. llvm-svn: 365504
*	OpaquePtr: add Type parameter to Loads analysis API.	Tim Northover	2019-07-09	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \|	This makes the functions in Loads.h require a type to be specified independently of the pointer Value so that when pointers have no structure other than address-space, it can still do its job. Most callers had an obvious memory operation handy to provide this type, but SROA and ArgumentPromotion were doing more complicated analysis. They get updated to merge the properties of the various instructions they were considering. llvm-svn: 365468
*	[SelectionDAG] Simplify some calls to getSetCCResultType. NFC	Bjorn Pettersson	2019-07-09	3	-8/+4
\| \| \| \| \| \| \| \|	DAGTypeLegalizer and SelectionDAGLegalize have helper functions wrapping the call to TLI.getSetCCResultType(...). Use those helpers in more places. llvm-svn: 365456
*	[LegalizeTypes] Fix saturation bug for smul.fix.sat	Bjorn Pettersson	2019-07-09	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Make sure we use SETGE instead of SETGT when checking if the sign bit is zero at SMULFIXSAT expansion. The faulty expansion occurred when doing "expand" of SMULFIXSAT and the scale was exactly matching the size of the smaller type. For example doing i64 Z = SMULFIXSAT X, Y, 32 and expanding X/Y/Z into using two i32 values. The problem was that we sometimes did not saturate to min when overflowing. Here is an example using Q3.4 numbers: Consider that we are multiplying X and Y. X = 0x80 (-8.0 as Q3.4) Y = 0x20 (2.0 as Q3.4) To avoid loss of precision we do a widening multiplication, getting a 16 bit result Z = 0xF000 (-16.0 as Q7.8) To detect negative overflow we should check if the five most significant bits in Z are less than -1. Assume that we name the 4 most significant bits as HH and the next 4 bits as HL. Then we can do the check by examining if (HH < -1) or (HH == -1 && "sign bit in HL is zero"). The fault was that we have been doing the check as (HH < -1) or (HH == -1 && HL > 0) instead of (HH < -1) or (HH == -1 && HL >= 0). In our example HH is -1 and HL is 0, so the old code did not trigger saturation and simply truncated the result to 0x00 (0.0). With the bugfix we instead detect that we should saturate to min, and the result will be set to 0x80 (-8.0). Reviewers: leonardchan, bevinh Reviewed By: leonardchan Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64331 llvm-svn: 365455
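The Q3.4 example in the message above can be replayed in a few lines. This is a sketch under stated assumptions, not the LegalizeTypes expansion: `to_signed` and `smul_fix_sat_q3_4` are hypothetical helpers, the `fixed` flag switches between the old (`HL > 0`) and corrected (`HL >= 0`) negative-overflow check, and the positive-overflow check is assumed here by symmetry rather than taken from the commit.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def smul_fix_sat_q3_4(x, y, fixed=True):
    """Multiply two 8-bit Q3.4 bit patterns with saturation; return an 8-bit pattern."""
    z = (to_signed(x, 8) * to_signed(y, 8)) & 0xFFFF   # widened 16-bit product (Q7.8)
    hh = to_signed(z >> 12, 4)                          # 4 most significant bits
    hl = to_signed(z >> 8, 4)                           # next 4 bits
    if fixed:
        neg_overflow = hh < -1 or (hh == -1 and hl >= 0)   # corrected check
    else:
        neg_overflow = hh < -1 or (hh == -1 and hl > 0)    # old, faulty check
    pos_overflow = hh > 0 or (hh == 0 and hl < 0)          # assumed by symmetry
    if neg_overflow:
        return 0x80                                     # saturate to min (-8.0)
    if pos_overflow:
        return 0x7F                                     # saturate to max
    return (z >> 4) & 0xFF                              # drop 4 fractional bits
```

With `X = 0x80` and `Y = 0x20`, the old check returns `0x00` and the corrected check returns `0x80`, matching the values quoted in the commit message.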
*	Fixing @llvm.memcpy not honoring volatile.	Guillaume Chatelet	2019-07-09	1	-19/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is explicitly not addressing target-specific code, or calls to memcpy. Summary: https://bugs.llvm.org/show_bug.cgi?id=42254 Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63215 llvm-svn: 365449
*	Standardize on MSVC behavior for triples with no environment	Reid Kleckner	2019-07-08	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This makes it so that IR files using triples without an environment work out of the box, without normalizing them. Typically, the MSVC behavior is more desirable. For example, it tends to enable things like constant merging, use of associative comdats, etc. Addresses PR42491 Reviewers: compnerd Subscribers: hiraditya, dexonsmith, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64109 llvm-svn: 365387
*	[TargetLowering] SimplifyDemandedBits - just call computeKnownBits for ↵	Simon Pilgrim	2019-07-08	1	-23/+3
\| \| \| \| \| \| \| \| \| \|	BUILD_VECTOR cases. Don't do this locally, computeKnownBits does this better (and can handle non-constant cases as well). A next step would be to actually simplify non-constant elements - building on what we already do in SimplifyDemandedVectorElts. llvm-svn: 365309
*	[DAGCombine] convertBuildVecZextToZext - remove duplicate getOpcode() call. ↵	Simon Pilgrim	2019-07-06	1	-1/+1
\| \| \| \| \| \|	NFCI. llvm-svn: 365269
*	[DAGCombiner] Don't combine (addcarry (uaddo X, Y), 0, Carry) -> (addcarry ↵	Craig Topper	2019-07-04	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	X, Y, Carry) if the Carry comes from the uaddo. Summary: The uaddo won't be removed and the addcarry will still be dependent on the uaddo. So we'll just increase the use count of X and Y and potentially require a COPY. Reviewers: spatel, RKSimon, deadalnix Reviewed By: RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64190 llvm-svn: 365149
*	[CodeGen] Make branch funnels pass the machine verifier	Francis Visoiu Mistrih	2019-07-03	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We previously marked all the tests with branch funnels as `-verify-machineinstrs=0`. This is an attempt to fix it. 1) `ICALL_BRANCH_FUNNEL` has no defs. Mark it as `let OutOperandList = (outs)` 2) After that we hit an assert: ``` Assertion failed: (Op.getValueType() != MVT::Other && Op.getValueType() != MVT::Glue && "Chain and glue operands should occur at end of operand list!"), function AddOperand, file /Users/francisvm/llvm/llvm/lib/CodeGen/SelectionDAG/InstrEmitter.cpp, line 461. ``` The chain operand was added at the beginning of the operand list. Move that to the end. 3) After that we hit another verifier issue in the pseudo expansion where the registers used in the cmps and jmps are not added to the livein lists. Add the `EFLAGS` to all the new MBBs that we create. PR39436 Differential Review: https://reviews.llvm.org/D54155 llvm-svn: 365058
*	Use getAllOnesConstants instead of -1 in DAGCombiner. NFC	Amaury Sechet	2019-07-03	1	-1/+1
\| \| \| \|	llvm-svn: 365054
*	[DAGCombine] More diamond carry pattern optimization.	Amaury Sechet	2019-07-03	1	-27/+92
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This diff improves the capability of DAGCombine to generate linear carry propagation in the presence of a diamond pattern. It is now able to match a large variety of different patterns rather than some hardcoded ones. Arguably, the codegen in test cases is not better, but this is to be expected. The goal of this transformation is more about canonicalisation than actual optimisation. Reviewers: hfinkel, RKSimon, craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D57302 llvm-svn: 365051
*	[SelectionDAG] Propagate alias metadata to target intrinsic nodes	James Molloy	2019-07-03	2	-6/+8
\| \| \| \| \| \| \| \|	When a target intrinsic has been determined to touch memory, we construct a MachineMemOperand during SDAG construction. In this case, we should propagate AAMDNodes metadata to the MachineMemOperand where available. Differential revision: https://reviews.llvm.org/D64131 llvm-svn: 365043
*	[Codegen][X86][AArch64][ARM][PowerPC] Inc-of-add vs sub-of-not (PR42457)	Roman Lebedev	2019-07-03	1	-0/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This is the backend part of [[ https://bugs.llvm.org/show_bug.cgi?id=42457 \| PR42457 ]]. In middle-end, we'd want to prefer the form with two adds - D63992, but as this diff shows, not every target will prefer that pattern. Out of 4 targets for which I added tests, all seem to be ok with inc-of-add for scalars, but only X86 prefers that same pattern for vectors. Here I'm adding a new TLI hook, always defaulting to the inc-of-add, but adding AArch64,ARM,PowerPC overrides to prefer inc-of-add only for scalars. Reviewers: spatel, RKSimon, efriedma, t.p.northover, hfinkel Reviewed By: efriedma Subscribers: nemanjai, javed.absar, kristof.beyls, kbarton, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64090 llvm-svn: 365010
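The two forms named in the title above compute the same value, which is why the choice between them can be left to a per-target hook. A minimal sketch of the equivalence, assuming 8-bit wrapping arithmetic; `inc_of_add` and `sub_of_not` are hypothetical names for the two shapes, not LLVM functions.

```python
def inc_of_add(x, y, width=8):
    """(x + y) + 1, wrapped to `width` bits."""
    mask = (1 << width) - 1
    return (x + y + 1) & mask


def sub_of_not(x, y, width=8):
    """x - (~y), wrapped to `width` bits."""
    mask = (1 << width) - 1
    return (x - (~y & mask)) & mask
```

Since `~y` equals `-y - 1` in two's complement, `x - ~y` expands to `x + y + 1`, so the two functions agree for every input.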
*	[NFC][TargetLowering] Some preparatory cleanups around 'prepareUREMEqFold()' ↵	Roman Lebedev	2019-07-02	1	-17/+18
\| \| \| \| \| \|	from D63963 llvm-svn: 364921
*	[DAGCombiner] Exploiting more about the transformation of ↵	Zi Xuan Wu	2019-07-02	1	-4/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	TransformFPLoadStorePair function For a given floating point load / store pair, if the load value isn't used by any other operations, then consider transforming the pair to integer load / store operations if the target deems the transformation profitable. We can exploit this further when there are other operation nodes with a chain operand between the load/store pair, so long as we keep the original chain ordering. We only replace the register used to load/store from float to integer. I only add a test case for ARM because the TLI.isDesirableToTransformToIntegerOp hook is only enabled in the ARM target. Differential Revision: https://reviews.llvm.org/D60601 llvm-svn: 364883
*	[SelectionDAG] Do minnum->minimum at legalization time instead of building time	Benjamin Kramer	2019-07-01	2	-16/+17
\| \| \| \| \| \| \| \|	The SDAGBuilder behavior stems from the days when we didn't have fast math flags available in SDAG. We do now and doing the transformation in the legalizer has the advantage that it also works for vector types. llvm-svn: 364743
*	[SelectionDAG] Use the memory VT instead of result VT for FoldingSet ↵	Craig Topper	2019-06-30	1	-3/+2
\| \| \| \| \| \| \| \| \|	profiling in getMaskedLoad/getMaskedStore. This matches what is done by the Profile function. Otherwise CSE won't work properly. llvm-svn: 364717
*	[CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM case) (try 3)	Roman Lebedev	2019-06-27	1	-0/+109
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: I'm submitting a new revision since i don't understand how to reclaim/reopen/take over the existing one, D50222. There is no such action in "Add Action" menu... This implements an optimization described in Hacker's Delight 10-17: when `C` is constant, the result of `X % C == 0` can be computed more cheaply without actually calculating the remainder. The motivation is discussed here: https://bugs.llvm.org/show_bug.cgi?id=35479. This is a recommit, the original commit rL364563 was reverted in rL364568 because test-suite detected miscompile - the new comparison constant 'Q' was being computed incorrectly (we divided by `D0` instead of `D`). Original patch D50222 by @hermord (Dmytro Shynkevych) Notes: - In principle, it's possible to also handle the `X % C1 == C2` case, as discussed on bugzilla. This seems to require an extra branch on overflow, so I refrained from implementing this for now. - An explicit check for when the `REM` can be reduced to just its LHS is included: the `X % C` == 0 optimization breaks `test1` in `test/CodeGen/X86/jump_sign.ll` otherwise. I hadn't managed to find a better way to not generate worse output in this case. - The `test/CodeGen/X86/jump_sign.ll` regresses, and is being fixed by a followup patch D63390. Reviewers: RKSimon, craig.topper, spatel, hermord, xbolva00 Reviewed By: RKSimon, xbolva00 Subscribers: dexonsmith, kristina, xbolva00, javed.absar, llvm-commits, hermord Tags: #llvm Differential Revision: https://reviews.llvm.org/D63391 llvm-svn: 364600
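The remainder-free divisibility test described above can be sketched as follows. This is an illustration of the Hacker's Delight technique under stated assumptions, not the TargetLowering code: `build_urem_eq_zero_test` is a hypothetical helper, `odd_part` and `limit` play the roles of `D0` and `Q` from the message, and `limit` is computed from the full divisor `D`, which is the detail the recommit corrected. It relies on `pow(..., -1, m)`, available in Python 3.8 and later.

```python
def build_urem_eq_zero_test(divisor, width=32):
    """Return a function computing `x % divisor == 0` without a remainder operation."""
    if not 0 < divisor < (1 << width):
        raise ValueError("divisor must be a nonzero value that fits in `width` bits")
    mask = (1 << width) - 1
    shift = (divisor & -divisor).bit_length() - 1   # number of trailing zero bits
    odd_part = divisor >> shift                     # D0: divisor with factors of 2 removed
    inverse = pow(odd_part, -1, 1 << width)         # multiplicative inverse of D0 mod 2^width
    limit = mask // divisor                         # Q: uses the full divisor D, not D0

    def is_multiple(x):
        product = (x * inverse) & mask
        # Rotate right by `shift`; any nonzero low bits land in the high bits
        # and push the value above `limit`.
        rotated = ((product >> shift) | (product << (width - shift))) & mask
        return rotated <= limit

    return is_multiple
```

For example, `build_urem_eq_zero_test(6, width=8)` accepts exactly the multiples of 6 in the range 0 to 255.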
*	Revert "[CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM ↵	Roman Lebedev	2019-06-27	1	-107/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	case) (try 2)" Appears to break test-suite on http://lab.llvm.org:8011/builders/clang-cmake-x86_64-sde-avx512-linux/builds/23790 FAIL: burg.execution_time FAIL: spiff.execution_time FAIL: employ.execution_time FAIL: llu.execution_time FAIL: gramschmidt.execution_time FAIL: fdtd-apml.execution_time This reverts commit r364563. llvm-svn: 364568
*	[CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM case) (try 2)	Roman Lebedev	2019-06-27	1	-0/+107
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: I'm submitting a new revision since i don't understand how to reclaim/reopen/take over the existing one, D50222. There is no such action in "Add Action" menu... Original patch D50222 by @hermord (Dmytro Shynkevych) This implements an optimization described in Hacker's Delight 10-17: when `C` is constant, the result of `X % C == 0` can be computed more cheaply without actually calculating the remainder. The motivation is discussed here: https://bugs.llvm.org/show_bug.cgi?id=35479. Original patch author: @hermord (Dmytro Shynkevych)! Notes: - In principle, it's possible to also handle the `X % C1 == C2` case, as discussed on bugzilla. This seems to require an extra branch on overflow, so I refrained from implementing this for now. - An explicit check for when the `REM` can be reduced to just its LHS is included: the `X % C` == 0 optimization breaks `test1` in `test/CodeGen/X86/jump_sign.ll` otherwise. I hadn't managed to find a better way to not generate worse output in this case. - The `test/CodeGen/X86/jump_sign.ll` regresses, and is being fixed by a followup patch D63390. Reviewers: RKSimon, craig.topper, spatel, hermord, xbolva00 Reviewed By: RKSimon, xbolva00 Subscribers: xbolva00, javed.absar, llvm-commits, hermord Tags: #llvm Differential Revision: https://reviews.llvm.org/D63391 llvm-svn: 364563
*	[TargetLowering] SimplifyDemandedVectorElts - add shift/rotate support.	Simon Pilgrim	2019-06-27	1	-0/+18
\| \| \| \|	llvm-svn: 364548
*	[TargetLowering] SimplifyDemandedBits - use DemandedElts to better identify ↵	Simon Pilgrim	2019-06-27	1	-11/+21
\| \| \| \| \| \|	partial splat shift amounts llvm-svn: 364541
*	[ISEL][X86] Tracking of registers that forward call arguments	Djordje Todorovic	2019-06-27	1	-2/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	While lowering calls, collect info about registers that forward arguments into following function frame. We store such info into the MachineFunction of the call. This is used very late when dumping DWARF info about call site parameters. ([9/13] Introduce the debug entry values.) Co-authored-by: Ananth Sowda <asowda@cisco.com> Co-authored-by: Nikola Prica <nikola.prica@rt-rk.com> Co-authored-by: Ivan Baev <ibaev@cisco.com> Differential Revision: https://reviews.llvm.org/D60715 llvm-svn: 364516
*	[X86] X86DAGToDAGISel::matchBitExtract(): pattern b: truncation awareness	Roman Lebedev	2019-06-26	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: (Not so) boringly identical to pattern a (D62786) Not yet sure how to deal with the last pattern c. Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62793 llvm-svn: 364418
*	[DAGCombine] visitEXTRACT_SUBVECTOR - add TODO for ↵	Simon Pilgrim	2019-06-26	1	-0/+1
\| \| \| \| \| \| \| \|	extract_subvector(bitcast()) support We support 'big to little' (e.g. extract_subvector(v16i8 bitcast(v2i64))) but not 'little to big' cases (e.g. extract_subvector(v2i64 bitcast(v16i8))) llvm-svn: 364405
*	Teach the DAGCombine to fold this pattern (c1 and c2 are constants).	QingShan Zhang	2019-06-26	1	-2/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	// fold (sext (select cond, c1, c2)) -> (select cond, sext c1, sext c2) // fold (zext (select cond, c1, c2)) -> (select cond, zext c1, zext c2) // fold (aext (select cond, c1, c2)) -> (select cond, sext c1, sext c2) Sign extend the operands if it is any_extend, to keep the signedness of the operands so that the other combine rules still apply. The any_extend is handled as zero extend for constants. i.e. t1: i8 = select t0, Constant:i8<-1>, Constant:i8<0> t2: i64 = any_extend t1 --> t3: i64 = select t0, Constant:i64<-1>, Constant:i64<0> --> t4: i64 = sign_extend_inreg t3 Differential Revision: https://reviews.llvm.org/D63318 llvm-svn: 364382
*	[DAGCombine] combineRepeatedFPDivisors - recognize -1.0 / X as a reciprocal	Simon Pilgrim	2019-06-25	1	-2/+2
\| \| \| \| \| \|	Fixes issue identified by @nemanjai (Nemanja Ivanovic) in D62963 / rL363040 - infinite loop due to GetNegatedExpression fighting combineRepeatedFPDivisors resulting in fneg(fdiv(x,splat)) -> fneg(fmul(x,1.0/splat)) -> fmul(x,-1.0/splat) -> fmul(x,(-1.0 * 1.0)/splat) ...... llvm-svn: 364326
*	[SDAG] expand ctpop != 1	Sanjay Patel	2019-06-25	1	-11/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Change the generic ctpop expansion to more efficiently handle a check for not-a-power-of-two value: (ctpop x) != 1 --> (x == 0) \|\| ((x & x-1) != 0) This is the inverted predicate sibling pattern that was added with: D63004 This should have been done before I changed IR canonicalization to favor this form with: rL364246 ...so if this requires revert/changing, the earlier commit may also need to be modified. llvm-svn: 364319
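The expansion above replaces a population count with two cheap bit tests. A minimal sketch of the equivalence on unsigned values; `ctpop`, `not_power_of_two`, and `is_power_of_two` are hypothetical helper names, and the second function shows the non-inverted sibling pattern as assumed here.

```python
def ctpop(x):
    """Population count of a non-negative integer."""
    return bin(x).count("1")


def not_power_of_two(x):
    """Expanded form of (ctpop x) != 1."""
    return x == 0 or (x & (x - 1)) != 0


def is_power_of_two(x):
    """Expanded form of the sibling predicate (ctpop x) == 1."""
    return x != 0 and (x & (x - 1)) == 0
```

Clearing the lowest set bit with `x & (x - 1)` leaves zero only when at most one bit was set, so the extra `x == 0` test is what separates zero from a true power of two.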
*	[TargetLowering] SimplifyDemandedBits - add ANY_EXTEND_VECTOR_INREG support	Simon Pilgrim	2019-06-25	1	-2/+18
\| \| \| \| \| \| \| \|	Add 'lowest' demanded elt -> bitcast fold to all *_EXTEND_VECTOR_INREG cases. Reapplies rL363856. llvm-svn: 364311
*	[TargetLowering] SimplifyDemandedBits ZERO_EXTEND_VECTOR_INREG -> ↵	Simon Pilgrim	2019-06-25	1	-6/+4
\| \| \| \| \| \| \| \| \| \| \| \|	ANY_EXTEND_VECTOR_INREG Simplify ZERO_EXTEND_VECTOR_INREG if the extended bits are not required. Matches what we already do for ZERO_EXTEND. Reapplies rL363850 but now with legality checks added at rL364290 llvm-svn: 364303
*	[SDAG] improve expansion of ctpop+setcc	Sanjay Patel	2019-06-25	1	-11/+14
\| \| \| \| \| \| \| \| \|	This should not cause any visible change in output, but it's more efficient because we were producing non-canonical 'sub x, 1' and 'setcc ugt x, 0'. As mentioned in the TODO, we should also be handling the inverse predicate. llvm-svn: 364302
*	[TargetLowering] SimplifyDemandedBits SIGN_EXTEND_VECTOR_INREG -> ↵	Simon Pilgrim	2019-06-25	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \|	ANY/ZERO_EXTEND_VECTOR_INREG Simplify SIGN_EXTEND_VECTOR_INREG if the extended bits are not required/known zero. Matches what we already do for SIGN_EXTEND. Reapplies rL363802 but now with legality checks added at rL364290 llvm-svn: 364299
*	[VectorLegalizer] ↵	Simon Pilgrim	2019-06-25	1	-0/+26
\| \| \| \| \| \| \| \| \| \| \| \|	ExpandANY_EXTEND_VECTOR_INREG/ExpandZERO_EXTEND_VECTOR_INREG - widen source vector The *_EXTEND_VECTOR_INREG opcodes were relaxed back around rL346784 to support source vector widths that are smaller than the output - it looks like the legalizers were never updated to account for this. This patch inserts the smaller source vector into an undef vector of the same width of the result before performing the shuffle+bitcast to correctly handle this. Part of the yak shaving to solve the crashes from rL364264 and rL364272 llvm-svn: 364295
*	[TargetLowering] SimplifyDemandedBits - legal checks for SIGN/ZERO_EXTEND -> ↵	Simon Pilgrim	2019-06-25	1	-6/+15
\| \| \| \| \| \| \| \| \| \|	ZERO/ANY_EXTEND As part of the fix for rL364264 + rL364272 - limit the *_EXTEND conversion to !TLO.LegalOperations \|\| isOperationLegal cases. We'll improve X86 legality in future commits. llvm-svn: 364290
*	[Codegen] TargetLowering::SimplifySetCC(): omit urem when possible	Roman Lebedev	2019-06-25	1	-0/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This addresses the regression that is being exposed by D50222 in `test/CodeGen/X86/jump_sign.ll` The missing fold, at least partially, looks trivial: https://rise4fun.com/Alive/Zsln i.e. if we are comparing with zero, and comparing the `urem`-by-non-power-of-two, and the `urem` is of something that may at most have a single bit set (or no bits set at all), the `urem` is not needed. Reviewers: RKSimon, craig.topper, xbolva00, spatel Reviewed By: xbolva00, spatel Subscribers: xbolva00, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63390 llvm-svn: 364286
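The missing fold described above can be checked by brute force at a small bit width. This is a sketch under stated assumptions, not the SimplifySetCC code: `has_at_most_one_bit`, `is_power_of_two`, and `fold_holds` are hypothetical helpers, and the check treats all values as unsigned.

```python
def has_at_most_one_bit(x):
    """True when x is zero or has exactly one bit set."""
    return (x & (x - 1)) == 0


def is_power_of_two(c):
    """True when c is 1, 2, 4, 8, ..."""
    return c != 0 and (c & (c - 1)) == 0


def fold_holds(width=8):
    """Check that (x urem c) == 0 reduces to x == 0 under the fold's conditions."""
    for x in range(1 << width):
        if not has_at_most_one_bit(x):
            continue
        for c in range(1, 1 << width):
            if is_power_of_two(c):
                continue
            if (x % c == 0) != (x == 0):
                return False
    return True
```

The reasoning is that a divisor that is not a power of two can never divide a single set bit, so the remainder is zero only when the dividend itself is zero.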
*	Revert r363802, r363850, and r363856 "[TargetLowering] SimplifyDemandedBits..."	Craig Topper	2019-06-25	1	-26/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts the following patches. "[TargetLowering] SimplifyDemandedBits SIGN_EXTEND_VECTOR_INREG -> ANY/ZERO_EXTEND_VECTOR_INREG" "[TargetLowering] SimplifyDemandedBits ZERO_EXTEND_VECTOR_INREG -> ANY_EXTEND_VECTOR_INREG" "[TargetLowering] SimplifyDemandedBits - add ANY_EXTEND_VECTOR_INREG support" We can end up with an any_extend_vector_inreg with a 256 bit result type and a 128 bit input type. This is allowed by the ISD opcode, but the generic operation legalizer is only able to expand cases where the total vector width is the same. The X86 backend creates these mismatched cases for zext_vec_inreg/sext_vec_inreg. The SimplifyDemandedBits changes are allowing those nodes to become aext_vec_inreg. For the zext/sext cases, the X86 backend has Custom handling and never lets them get to the generic legalizer. We need to do the same for aext_vec_inreg. llvm-svn: 364264
*	[CodeGen] Add missing vector type legalization for ctlz_zero_undef	Roland Froese	2019-06-24	1	-0/+2
\| \| \| \| \| \| \| \| \|	Widen vector result type for ctlz_zero_undef and cttz_zero_undef the same as ctlz and cttz. Differential Revision: https://reviews.llvm.org/D63463 llvm-svn: 364221