bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	[X86][NFC] Generalize the naming of "Retpoline Thunks" and related code to ↵	Scott Constable	2020-06-24	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	"Indirect Thunks" There are applications for indirect call/branch thunks other than retpoline for Spectre v2, e.g., https://software.intel.com/security-software-guidance/software-guidance/load-value-injection Therefore it makes sense to refactor X86RetpolineThunks as a more general capability. Differential Revision: https://reviews.llvm.org/D76810
*	[X86] Turn FP_ROUND/STRICT_FP_ROUND into X86ISD::VFPROUND/STRICT_VFPROUND ↵	Craig Topper	2020-01-11	1	-0/+4
\| \| \| \|	during PreprocessISelDAG to remove some duplicate isel patterns.
*	[X86] Remove dead code from X86DAGToDAGISel::Select that is no longer needed ↵	Craig Topper	2020-01-11	1	-28/+0
\| \| \| \|	now that we don't mutate strict fp nodes. NFC
*	[X86] Simplify code by removing an unreachable condition. NFCI	Craig Topper	2020-01-10	1	-12/+2
\| \| \| \| \| \|	For X87<->SSE conversions, the SSE type is always smaller than the X87 type. So we can always use the smallest type for the memory type.
*	[X86] Preserve fpexcept property when turning strict_fp_extend and ↵	Craig Topper	2020-01-10	1	-4/+34
\| \| \| \| \| \| \| \| \| \| \|	strict_fp_round into stack operations. We use the stack for X87 fp_round and for moving from SSE f32/f64 to X87 f64/f80. Or from X87 f64/f80 to SSE f32/f64. Note for the SSE<->X87 conversions the conversion always happens in the X87 domain. The load/store ops in the X87 instructions are able to signal exceptions.
*	[X86] Use ReplaceAllUsesWith instead of ReplaceAllUsesOfValueWith to ↵	Craig Topper	2020-01-10	1	-12/+2
\| \| \| \|	simplify some code. NFCI
*	[X86] Fix an 8 bit testb being selected when folding a volatile i32 load ↵	Amara Emerson	2020-01-06	1	-0/+11
\| \| \| \| \| \|	pattern. Differential Revision: https://reviews.llvm.org/D71581
*	add strict float for round operation	Liu, Chen3	2020-01-01	1	-5/+28
\| \| \| \|	Differential Revision: https://reviews.llvm.org/D72026
*	[SelectionDAG] Disallow indirect "i" constraint	Fangrui Song	2019-12-29	1	-4/+0
\| \| \| \| \| \| \| \| \|	This allows us to delete InlineAsm::Constraint_i workarounds in SelectionDAGISel::SelectInlineAsmMemoryOperand overrides and TargetLowering::getInlineAsmMemConstraint overrides. They were introduced to X86 in r237517 to prevent crashes for constraints like "=*imr". They were later copied to other targets.
*	[X86] Add STRICT versions of CVTTP2SI, CVTTP2UI, CMPM, and CMPP.	Craig Topper	2019-12-24	1	-11/+12
\| \| \| \|	Differential Revision: https://reviews.llvm.org/D71850
*	[FPEnv][X86] More strict int <-> FP conversion fixes	Ulrich Weigand	2019-12-23	1	-7/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix several additional problems with the int <-> FP conversion logic both in common code and in the X86 target. In particular: - The STRICT_FP_TO_UINT expansion emits a floating-point compare. This compare can raise exceptions and therefore needs to be a strict compare. I've made it signaling (even though quiet would also be correct) as signaling is the more usual default for an LT. This code exists both in common code and in the X86 target. - The STRICT_UINT_TO_FP expansion algorithm was incorrect for strict mode: it emitted two STRICT_SINT_TO_FP nodes and then used a select to choose one of the results. This can cause spurious exceptions by the STRICT_SINT_TO_FP that ends up not chosen. I've fixed the algorithm to use only a single STRICT_SINT_TO_FP instead. - The !isStrictFPEnabled logic in DoInstructionSelection would sometimes do the wrong thing because it calls getOperationAction using the result VT. But for some opcodes, including [SU]INT_TO_FP, getOperationAction needs to be called using the operand VT. - Remove some (obsolete) code in X86DAGToDAGISel::Select that would mutate STRICT_FP_TO_[SU]INT to non-strict versions unnecessarily. Reviewed by: craig.topper Differential Revision: https://reviews.llvm.org/D71840
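The STRICT_UINT_TO_FP point above can be illustrated outside the compiler. The following is a minimal sketch, not the code from D71840: it assumes a 64-bit unsigned input and a `double` result, and the function name `uintToFpViaSingleSint` is hypothetical. The idea is to select on the integer side (which cannot raise FP exceptions) and perform only one signed conversion.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: convert u64 -> double using exactly ONE signed
// int-to-fp conversion, so no discarded conversion can raise a spurious
// floating-point exception.
double uintToFpViaSingleSint(std::uint64_t x) {
  // Values with the top bit set do not fit in int64_t.
  const bool needsHalving = (x >> 63) != 0;

  // Halve such values, folding the dropped low bit back in as a sticky bit
  // so the single conversion below still rounds the same way.
  const std::uint64_t halved = (x >> 1) | (x & 1);

  // Select on the integer input, then convert once.
  const std::int64_t input =
      static_cast<std::int64_t>(needsHalving ? halved : x);
  const double converted = static_cast<double>(input);

  // Doubling only changes the exponent here, so it is exact.
  return needsHalving ? converted + converted : converted;
}
```

Compared with converting twice and selecting between the two floating-point results, this shape moves the choice before the conversion, which appears to be the property the commit message is after.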
*	Enable STRICT_FP_TO_SINT/UINT on X86 backend	Liu, Chen3	2019-12-19	1	-4/+20
\| \| \| \| \| \|	This patch is mainly for custom lowering the vector operation. Differential Revision: https://reviews.llvm.org/D71592
*	[X86] Add a simple hack to IsProfitableToFold to prevent vselect+strict fp ↵	Craig Topper	2019-12-18	1	-0/+6
\| \| \| \| \| \| \| \|	operations from being folded into masked instructions. We really need to update the isel patterns to prevent this, but that requires some tablegen de-tangling. So this hack will work for correctness in the short term.
*	[IR] Split out target specific intrinsic enums into separate headers	Reid Kleckner	2019-12-11	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This has two main effects: - Optimizes debug info size by saving 221.86 MB of obj file size in a Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7MB of object file size. - Incremental step towards decoupling target intrinsics. The enums are still compact, so adding and removing a single target-specific intrinsic will trigger a rebuild of all of LLVM. Assigning distinct target id spaces is potential future work. Part of PR34259 Reviewers: efriedma, echristo, MaskRay Reviewed By: echristo, MaskRay Differential Revision: https://reviews.llvm.org/D71320
*	[FPEnv][X86] Constrained FCmp intrinsics enabling on X86	Wang, Pengfei	2019-12-11	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \|	Summary: This is a follow-up to D69281; it enables X86 backend support for FP comparison. Reviewers: uweigand, kpn, craig.topper, RKSimon, cameron.mcinally, andrew.w.kaylor Subscribers: hiraditya, llvm-commits, annita.zhang, LuoYuanke, LiuChen3 Tags: #llvm Differential Revision: https://reviews.llvm.org/D70582
*	add support for strict operation fpextend/fpround/fsqrt on X86 backend	Liu, Chen3	2019-12-10	1	-8/+0
\| \| \| \|	Differential Revision: https://reviews.llvm.org/D71184
*	Add strict fp support for instructions fadd/fsub/fmul/fdiv	Liu, Chen3	2019-12-06	1	-3/+1
\| \| \| \|	Differential Revision: https://reviews.llvm.org/D68757
*	Add support for lowering 32-bit/64-bit pointers	Amy Huang	2019-12-04	1	-4/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This follows a previous patch that changes the X86 datalayout to represent mixed-size pointers (32-bit sext, 32-bit zext, and 64-bit) with address spaces (https://reviews.llvm.org/D64931). This patch implements the address space cast lowering to the corresponding sign extension, zero extension, or truncate instructions. Related to https://bugs.llvm.org/show_bug.cgi?id=42359 Reviewers: rnk, craig.topper, RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69639
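The three casts described above can be modelled on plain integers. This is only an illustrative sketch, assuming pointers are represented as raw 32-bit and 64-bit values; the function names are hypothetical and do not correspond to anything in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a cast from a 32-bit sign-extended pointer to a
// 64-bit pointer: the value is sign-extended. The uint32 -> int32 step
// relies on two's-complement conversion (guaranteed since C++20).
constexpr std::uint64_t widenSignedPtr(std::uint32_t p) {
  return static_cast<std::uint64_t>(
      static_cast<std::int64_t>(static_cast<std::int32_t>(p)));
}

// Hypothetical model of a cast from a 32-bit zero-extended pointer to a
// 64-bit pointer: the value is zero-extended.
constexpr std::uint64_t widenUnsignedPtr(std::uint32_t p) {
  return static_cast<std::uint64_t>(p);
}

// Hypothetical model of a cast from a 64-bit pointer to either 32-bit
// flavour: the value is truncated to its low 32 bits.
constexpr std::uint32_t narrowPtr(std::uint64_t p) {
  return static_cast<std::uint32_t>(p);
}
```

The two widening flavours only differ for values with bit 31 set, which is why the address space has to carry the signedness.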
*	[X86] Add strict fp support for operations of X87 instructions	Craig Topper	2019-11-26	1	-2/+10
\| \| \| \| \| \| \| \| \| \|	This is a follow-up to D68854. This patch adds basic operations of X87 instructions, including +, -, *, /, fp extensions and fp truncations. Patch by Chen Liu(LiuChen3) Differential Revision: https://reviews.llvm.org/D68857
*	[X86] Mark vector STRICT_FADD/STRICT_FSUB as Legal and add mutation to ↵	Craig Topper	2019-11-21	1	-0/+2
\| \| \| \| \| \| \|	X86ISelDAGToDAG. This prevents LegalizeVectorOps from scalarizing them. We'll need to remove the X86 mutation code when we add isel patterns.
*	[PGO][PGSO] DAG.shouldOptForSize part.	Hiroshi Yamauchi	2019-11-21	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: (Split off of D67120) SelectionDAG::shouldOptForSize changes for profile guided size optimization. Reviewers: davidxl Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70095
*	[SelectionDAG][X86] Mutate strictFP nodes to non-strict in ↵	Craig Topper	2019-11-20	1	-0/+7
\| \| \| \| \| \| \| \| \| \|	DoInstructionSelection when the node is marked Expand rather than when it is not Legal. This allows operations that are marked Custom, but have some type combinations that are legal to get past this code. Add custom mutation code to X86's Select function for the nodes that don't have isel patterns yet.
*	[X86] Add a 'break;' to the end of the last case in a switch to avoid ↵	Craig Topper	2019-11-18	1	-0/+2
\| \| \| \|	surprising the next person to add a case after this one. NFC
*	Revert r373172 "[X86] Add custom isel logic to match VPTERNLOG from 2 logic ↵	Craig Topper	2019-10-01	1	-79/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ops." This seems to be causing some performance regressions that I'm trying to investigate. One thing that stands out is that this transform can increase the live range of the operands of the earlier logic op. This can be bad for register allocation. If there are two logic op inputs we should really combine the one that is closest, but SelectionDAG doesn't have a good way to do that. Maybe we need to do this as a basic block transform in Machine IR. llvm-svn: 373401
*	[X86] Add a VBROADCAST_LOAD ISD opcode representing a scalar load ↵	Craig Topper	2019-10-01	1	-11/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	broadcasted to a vector. Summary: This adds the ISD opcode and a DAG combine to create it. There are probably some places where we can directly create it, but I'll leave that for future work. This updates all of the isel patterns to look for this new node. I had to add a few additional isel patterns for aligned extloads which we should probably fix with a DAG combine or something. This does mean that the broadcast load folding for avx512 can no longer match a broadcasted aligned extload. There's still some work to do here for combining a broadcast of a broadcast_load. We also need to improve extractelement or demanded vector elements of a broadcast_load. I'll try to get those done before I submit this patch. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68198 llvm-svn: 373349
*	[X86] Add custom isel logic to match VPTERNLOG from 2 logic ops.	Craig Topper	2019-09-29	1	-1/+79
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There's room for improvement here, but this is a decent starting point. There are a few minor regressions in the vector-rotate tests, where we are now forming a vpternlog from an and before we get a chance to form it for a bitselect that we were matching previously. This results in an AND and an ANDN feeding the vpternlog where previously we just had an AND after the vpternlog. I think we can probably DAG combine the AND with the bitselect to get back to similar codegen. llvm-svn: 373172
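For readers unfamiliar with how two logic ops collapse into one VPTERNLOG, the following sketch shows the usual way to derive the 8-bit immediate: evaluate the combined expression on the truth-table columns 0xF0, 0xCC and 0xAA. This is not the code from r373172; the enum, the helper names and the bitwise software model `ternlog8` are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

enum class LogicOp { And, Or, Xor };

// Truth-table columns for the first, second and third VPTERNLOG operand.
constexpr std::uint8_t kColA = 0xF0, kColB = 0xCC, kColC = 0xAA;

constexpr std::uint8_t applyOp(LogicOp op, std::uint8_t l, std::uint8_t r) {
  return op == LogicOp::And  ? static_cast<std::uint8_t>(l & r)
         : op == LogicOp::Or ? static_cast<std::uint8_t>(l | r)
                             : static_cast<std::uint8_t>(l ^ r);
}

// Immediate for the expression (A inner B) outer C.
constexpr std::uint8_t ternlogImm(LogicOp inner, LogicOp outer) {
  return applyOp(outer, applyOp(inner, kColA, kColB), kColC);
}

// Software model of one byte lane: each result bit is looked up in imm
// using the index (a_bit << 2) | (b_bit << 1) | c_bit.
constexpr std::uint8_t ternlog8(std::uint8_t a, std::uint8_t b,
                                std::uint8_t c, std::uint8_t imm) {
  std::uint8_t result = 0;
  for (int bit = 0; bit < 8; ++bit) {
    const unsigned idx = (((a >> bit) & 1u) << 2) | (((b >> bit) & 1u) << 1) |
                         ((c >> bit) & 1u);
    result = static_cast<std::uint8_t>(result | (((imm >> idx) & 1u) << bit));
  }
  return result;
}
```

With this model, `ternlog8(a, b, c, ternlogImm(LogicOp::And, LogicOp::Or))` equals `(a & b) | c` for any byte inputs, which is the kind of two-op pattern the commit title refers to.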
*	[X86] Move bitselect matching to vpternlog into X86ISelDAGToDAG.cpp	Craig Topper	2019-09-29	1	-0/+53
\| \| \| \| \| \| \| \| \| \| \| \|	This allows us to reduce the use count on the condition node before the match. This enables load folding for that operand without relying on the peephole pass. This will be improved on for broadcast load folding in a subsequent commit. This still requires a bunch of isel patterns for vXi16/vXi8 types though. llvm-svn: 373156
*	[X86] Canonicalize all zeroes vector to RHS in X86DAGToDAGISel::tryVPTESTM.	Craig Topper	2019-09-23	1	-3/+9
\| \| \| \|	llvm-svn: 372544
*	[X86] Remove SETEQ/SETNE canonicalization code from LowerIntVSETCC_AVX512 to ↵	Craig Topper	2019-09-23	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \|	prevent an infinite loop. The attached test case would previously loop infinitely after r365711. I'm going to move this to X86ISelDAGToDAG.cpp to get the setcc to match VPTEST in 32-bit mode in a follow up commit. llvm-svn: 372543
*	[X86] X86DAGToDAGISel::matchBEXTRFromAndImm(): if can't use BEXTR, fallback ↵	Roman Lebedev	2019-09-22	1	-12/+48
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	to BZHI if profitable (PR43381) Summary: PR43381 notes that while we are good at matching `(X >> C1) & C2` as BEXTR/BEXTRI, we only do that if we either have BEXTRI (TBM), or if BEXTR is marked as being fast (`-mattr=+fast-bextr`). In all other cases we don't match. But that is mainly only true for AMD CPUs. However, for all the CPUs for which we have sched models, the BZHI is always fast (or the sched models are all bad.) So if we decide that it's unprofitable to emit BEXTR/BEXTRI, we should consider falling back to BZHI if it is available, and follow up with the shift. While it's really tempting to do something because it's cool, it is wise to first think whether it actually makes sense to do. We shouldn't just use BZHI because we can, but only if it is beneficial. In particular, it isn't really worth it if the input is a register, mask is small, or we can fold a load. But it is worth it if the mask does not fit into 32 bits. (careful, I don't know much about Intel CPUs, my choice of `-mcpu` may be bad here) Thus we manage to fold a load: https://godbolt.org/z/Er0OQz Or if we'd end up using BZHI anyways because the mask is large: https://godbolt.org/z/dBJ_5h But this isn't actually profitable in the general case, e.g. here we'd increase microop count (the register renaming is free, mca does not model that there it seems) https://godbolt.org/z/k6wFoz Likewise, not worth it if we just get load folding: https://godbolt.org/z/1M1deG https://bugs.llvm.org/show_bug.cgi?id=43381 Reviewers: RKSimon, craig.topper, davezarzycki, spatel Reviewed By: craig.topper, davezarzycki Subscribers: andreadb, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67875 llvm-svn: 372532
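The equivalence that makes the BZHI fallback possible can be checked with a small software model. This is a sketch under assumptions, not the selection code from D67875: `bzhi64`, `bextr64` and `extractViaBzhi` are hypothetical helpers that mimic the instructions on 64-bit values, with the mask `C2` taken to be `len` contiguous low bits and `C1` called `start`.

```cpp
#include <cassert>
#include <cstdint>

// Model of BZHI: clear all bits at positions >= n (no-op when n >= 64).
constexpr std::uint64_t bzhi64(std::uint64_t x, unsigned n) {
  return n >= 64 ? x : x & ((std::uint64_t{1} << n) - 1);
}

// Model of BEXTR: extract `len` bits starting at bit `start`.
constexpr std::uint64_t bextr64(std::uint64_t x, unsigned start,
                                unsigned len) {
  return start >= 64 ? 0 : bzhi64(x >> start, len);
}

// The fallback shape: zero everything above the field with BZHI first,
// then follow up with the shift. Precondition: start < 64.
constexpr std::uint64_t extractViaBzhi(std::uint64_t x, unsigned start,
                                       unsigned len) {
  return bzhi64(x, start + len) >> start;
}
```

For example, extracting 12 bits at offset 36 from `0x123456789ABCDEF0` yields `0x567` through either path, matching the plain `(x >> 36) & 0xFFF`. Whether the two-instruction form is actually profitable is the heuristic question the commit message discusses; the model only shows the results agree.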
*	Reapply r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics"	Matt Arsenault	2019-09-19	1	-7/+5
\| \| \| \| \| \| \| \| \|	This reverts r372314, reapplying r372285 and the commits which depend on it (r372286-r372293, and r372296-r372297) This was missing one switch to getTargetConstant in an untested case. llvm-svn: 372338
*	Revert r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics"	Hans Wennborg	2019-09-19	1	-5/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This broke the Chromium build, causing it to fail with e.g. fatal error: error in backend: Cannot select: t362: v4i32 = X86ISD::VSHLI t392, Constant:i8<15> See llvm-commits thread of r372285 for details. This also reverts r372286, r372287, r372288, r372289, r372290, r372291, r372292, r372293, r372296, and r372297, which seemed to depend on the main commit. > Encode them directly as an imm argument to G_INTRINSIC. > > Since intrinsics can now define what parameters are required to be > immediates, avoid using registers for them. Intrinsics could > potentially want a constant that isn't a legal register type. Also, > since G_CONSTANT is subject to CSE and legalization, transforms could > potentially obscure the value (and create extra work for the > selector). The register bank of a G_CONSTANT is also meaningful, so > this could throw off future folding and legalization logic for AMDGPU. > > This will be much more convenient to work with than needing to call > getConstantVRegVal and checking if it may have failed for every > constant intrinsic parameter. AMDGPU has quite a lot of intrinsics with > immarg operands, many of which need inspection during lowering. Having > to find the value in a register is going to add a lot of boilerplate > and waste compile time. > > SelectionDAG has always provided TargetConstant for constants which > should not be legalized or materialized in a register. The distinction > between Constant and TargetConstant was somewhat fuzzy, and there was > no automatic way to force usage of TargetConstant for certain > intrinsic parameters. They were both ultimately ConstantSDNode, and it > was inconsistently used. It was quite easy to mis-select an > instruction requiring an immediate. 
For SelectionDAG, start emitting > TargetConstant for these arguments, and using timm to match them. > > Most of the work here is to clean up target handling of constants. Some > targets process intrinsics through intermediate custom nodes, which > need to preserve TargetConstant usage to match the intrinsic > expectation. Pattern inputs now need to distinguish whether a constant > is merely compatible with an operand or whether it is mandatory. > > The GlobalISelEmitter needs to treat timm as a special case of a leaf > node, similar to MachineBasicBlock operands. This should also enable > handling of patterns for some G_ instructions with immediates, like > G_FENCE or G_EXTRACT. > > This does include a workaround for a crash in GlobalISelEmitter when > ARM tries to use "imm" in an output with a "timm" pattern source. llvm-svn: 372314
*	GlobalISel: Don't materialize immarg arguments to intrinsics	Matt Arsenault	2019-09-19	1	-7/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Encode them directly as an imm argument to G_INTRINSIC. Since intrinsics can now define what parameters are required to be immediates, avoid using registers for them. Intrinsics could potentially want a constant that isn't a legal register type. Also, since G_CONSTANT is subject to CSE and legalization, transforms could potentially obscure the value (and create extra work for the selector). The register bank of a G_CONSTANT is also meaningful, so this could throw off future folding and legalization logic for AMDGPU. This will be much more convenient to work with than needing to call getConstantVRegVal and checking if it may have failed for every constant intrinsic parameter. AMDGPU has quite a lot of intrinsics with immarg operands, many of which need inspection during lowering. Having to find the value in a register is going to add a lot of boilerplate and waste compile time. SelectionDAG has always provided TargetConstant for constants which should not be legalized or materialized in a register. The distinction between Constant and TargetConstant was somewhat fuzzy, and there was no automatic way to force usage of TargetConstant for certain intrinsic parameters. They were both ultimately ConstantSDNode, and it was inconsistently used. It was quite easy to mis-select an instruction requiring an immediate. For SelectionDAG, start emitting TargetConstant for these arguments, and using timm to match them. Most of the work here is to clean up target handling of constants. Some targets process intrinsics through intermediate custom nodes, which need to preserve TargetConstant usage to match the intrinsic expectation. Pattern inputs now need to distinguish whether a constant is merely compatible with an operand or whether it is mandatory. 
The GlobalISelEmitter needs to treat timm as a special case of a leaf node, similar to MachineBasicBlock operands. This should also enable handling of patterns for some G_ instructions with immediates, like G_FENCE or G_EXTRACT. This does include a workaround for a crash in GlobalISelEmitter when ARM tries to use "imm" in an output with a "timm" pattern source. llvm-svn: 372285
*	[X86] X86DAGToDAGISel::tryFoldLoad - assert root/parent pointers are ↵	Simon Pilgrim	2019-09-17	1	-0/+1
\| \| \| \| \| \| \| \|	non-null. NFCI. Silences a static analyzer warning. llvm-svn: 372118
*	[X86] Updated target specific selection dag code to conservatively check for ↵	Philip Reames	2019-09-10	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	isAtomic in addition to isVolatile See D66309 for context. This is the first sweep of x86 target specific code to add isAtomic bailouts where appropriate. The intention here is to have the switch from AtomicSDNode to LoadSDNode/StoreSDNode be close to NFC; that is, I'm not looking to allow additional optimizations at this time. Sorry for the lack of tests. As discussed in the review, most of these are vector tests (for which atomicity is not well defined) and I couldn't figure out how to exercise the anyextend cases which aren't vector specific. Differential Revision: https://reviews.llvm.org/D66322 llvm-svn: 371547
*	[X86] X86DAGToDAGISel::combineIncDecVector(): call getSplatBuildVector() ↵	Roman Lebedev	2019-09-08	1	-3/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	manually As reported in post-commit review of r370327, there is some case where the code crashes. As discussed with Craig Topper, the problem is that getConstant() internally calls getSplatBuildVector(), so we don't insert the constant itself. If we do that manually we're good. llvm-svn: 371346
*	[X86] Use MOVSX by default instead of CBW to extend i8 to AX for i8 sdivrem.	Craig Topper	2019-09-06	1	-5/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We can use a MOVSX16 here then rely on FixupBWInst to change to MOVSX32 if the upper bits are dead. With a special case to not promote if it could be turned into CBW. Then we can rely on X86MCInstLower to turn the MOVSX into CBW very late if register allocation worked out. Using MOVSX gives an opportunity to use the MOVSX as both a copy and a sign extend since the input and output register aren't tied together. Differential Revision: https://reviews.llvm.org/D67192 llvm-svn: 371243
*	[X86] Use MOVZX16rr8/MOVZXrm8 when extending input for i8 udivrem.	Craig Topper	2019-09-06	1	-3/+3
\| \| \| \| \| \| \| \|	We can rely on X86FixupBWInsts to turn these into MOVZX32. This simplifies a follow up commit to use MOVSX for i8 sdivrem with a late optimization to use CBW when register allocation works out. llvm-svn: 371242
*	[X86][CodeGen][NFC] Delay `combineIncDecVector()` from DAGCombine to ↵	Roman Lebedev	2019-08-29	1	-0/+48
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	X86DAGToDAGISel Summary: We were previously doing it in DAGCombine. But we also want to do `sub %x, C` -> `add %x, (sub 0, C)` for vectors in DAGCombine. So if we had `sub %x, -1`, we'll transform it to `add %x, 1`, which `combineIncDecVector()` will immediately transform back into `sub %x, -1`, and here we go again... I've marked this as NFC since not a single test changes, but since that 'changes' DAGCombine, probably this isn't fully NFC. Reviewers: RKSimon, craig.topper, spatel Reviewed By: craig.topper Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62327 llvm-svn: 370327
*	[X86] Teach -Os immediate sharing code to not count constant uses that will ↵	Craig Topper	2019-08-25	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \|	become INC/DEC. INC/DEC don't use an immediate so we don't need to count it. We also shouldn't use the custom isel for it. Fixes PR42998. llvm-svn: 369863
*	[X86] Manually reimplement getTargetInsertSubreg in ↵	Craig Topper	2019-08-16	1	-2/+6
\| \| \| \| \| \| \| \| \| \|	X86DAGToDAGISel::matchBitExtract so we can call insertDAGNode on the target constant. This is needed to maintain the topological sort order. Fixes PR42992. llvm-svn: 369084
*	[X86] Add llvm_unreachable to a switch that covers all expected values.	Craig Topper	2019-08-14	1	-0/+1
\| \| \| \|	llvm-svn: 368857
*	[x86] try harder to form LEA from ADD to avoid flag conflicts (PR40483)	Sanjay Patel	2019-07-18	1	-0/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	LEA doesn't affect flags, so use it more liberally to replace an ADD when we know that the ADD operands affect flags. In the motivating example from PR40483: https://bugs.llvm.org/show_bug.cgi?id=40483 ...this lets us avoid duplicating a math op just to avoid flag conflict. As mentioned in the TODO comments, this heuristic can be extended to fire more often if that leads to more improvements. Differential Revision: https://reviews.llvm.org/D64707 llvm-svn: 366431
*	[X86] Add custom isel to select ADD/SUB/OR/XOR/AND to their non-immediate ↵	Craig Topper	2019-07-04	1	-1/+103
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	forms under optsize when the immediate has additional users. Summary: We attempt to prevent folding immediates with multiple users under optsize. But we only do this from store nodes and X86ISD::ADD/SUB/XOR/OR/AND patterns. We don't do it for ISD::ADD/SUB/XOR/OR/AND even though we count them as users when deciding whether to fold into other nodes. This leads to situations where we block folding to a compare for example, but still fold into an AND or OR as seen in PR27202. Unfortunately touching the isel patterns in tablegen for the ISD::ADD/SUB/XOR/OR/AND opcodes will cause the patterns to be unusable for fast isel. And we don't have a way to make a fast isel only pattern. To work around this, this patch adds custom isel in front of the isel table that will select the non-immediate forms if the immediate has additional users. This may create some issues for ANDN and NOT matching. And there's room for improvement with unsigned 32 immediates on 64-bit AND. This patch needs more thorough test cases, but I wanted to get feedback on the direction. Please send me any other test cases you've seen in the wild. I think we probably have the same issue with the immediate matching when we fold RMW from X86ISD::ADD/SUB/XOR/OR/AND. And our TEST immediate shrinking logic. Our cost modeling for immediates that can fit in a sign extended 8-bit immediate on a 16/32/64 bit operation is completely wrong. I also wonder if we should update the ConstantHoisting cost model and block folding for "opaque" constants. But of course constants can still be created by DAG combine and lowering optimizations. Fixes PR27202 Reviewers: spatel, RKSimon, andreadb Reviewed By: RKSimon Subscribers: jsji, hiraditya, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59909 llvm-svn: 365163
*	[X86] Add PreprocessISelDAG support for turning ISD::FP_TO_SINT/UINT into ↵	Craig Topper	2019-07-02	1	-0/+21
\| \| \| \| \| \|	X86ISD::CVTTP2SI/CVTTP2UI and to reduce the number of isel patterns. llvm-svn: 364887
*	[X86] Remove (vzext_movl (scalar_to_vector (load))) matching code from ↵	Craig Topper	2019-06-27	1	-17/+0
\| \| \| \| \| \| \| \|	selectScalarSSELoad. I think this will be turning into vzext_load during DAG combine. llvm-svn: 364499
*	[X86] Teach selectScalarSSELoad to not narrow volatile loads.	Craig Topper	2019-06-27	1	-5/+7
\| \| \| \|	llvm-svn: 364498
*	[X86][Codegen] X86DAGToDAGISel::matchBitExtract(): consistently capture ↵	Roman Lebedev	2019-06-26	1	-7/+6
\| \| \| \| \| \|	lambdas by value llvm-svn: 364420
*	[X86] X86DAGToDAGISel::matchBitExtract(): pattern c: truncation awareness	Roman Lebedev	2019-06-26	1	-8/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The one thing of note here is that the 'bitwidth' constant (32/64) was previously pessimistic. Given `x & (-1 >> (C - z))`, we were taking `C` to be `bitwidth(x)`, but in reality we want `(-1 >> (C - z))` pattern to mean "low z bits must be all-ones". And for that, `C` should be `bitwidth(-1 >> (C - z))`, i.e. of the shift operation itself. Last pattern D does not seem to exhibit any of these truncation issues. Although it has the opposite problem - if we extract low bits (no shift) from i64, and then truncate to i32, then we fail to shrink this 64-bit extraction into 32-bit extraction. Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62806 llvm-svn: 364419
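The point about which bit width `C` refers to can be made concrete with a short sketch. It assumes the mask `-1 >> (C - z)` is computed as a 32-bit shift and then zero-extended before being ANDed with a 64-bit `x`; the helper names are hypothetical and the code is not taken from D62806.

```cpp
#include <cassert>
#include <cstdint>

// Low-z-bits mask built from a 32-bit shift: C is 32 because that is the
// width of the shift operation itself. Precondition: 1 <= z <= 32.
constexpr std::uint32_t lowBitsMask32(unsigned z) {
  return ~std::uint32_t{0} >> (32 - z);
}

// The same mask built from a 64-bit shift: here C is 64.
// Precondition: 1 <= z <= 64.
constexpr std::uint64_t lowBitsMask64(unsigned z) {
  return ~std::uint64_t{0} >> (64 - z);
}

// x is 64 bits wide, but the mask comes from a 32-bit shift that is
// zero-extended. The number of extracted bits is governed by the shift's
// width (32), not by bitwidth(x) (64).
constexpr std::uint64_t extractLowBits(std::uint64_t x, unsigned z) {
  return x & std::uint64_t{lowBitsMask32(z)};
}
```

Reading `C` as `bitwidth(x)` in the last function would demand a `64 - z` shift amount that the 32-bit shift never has, so a pattern keyed to the shift's own width recognises this case where the older check would not.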
*	[X86] X86DAGToDAGISel::matchBitExtract(): pattern b: truncation awareness	Roman Lebedev	2019-06-26	1	-5/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: (Not so) boringly identical to pattern a (D62786) Not yet sure how to deal with the last pattern c. Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62793 llvm-svn: 364418