Summary:
This is an alternate approach to D57970.
Currently funclets reuse the same stack slots that are used in the
parent function for saving callee-saved xmm registers. If the parent
function modifies a callee-saved xmm register before an exception is
thrown, the catch handler will overwrite the original saved value.
This patch allocates space in the funclet's stack frame for saving
callee-saved xmm registers and uses RSP instead of RBP to access that memory.
Reviewers: andrew.w.kaylor, LuoYuanke, annita.zhang, craig.topper,
RKSimon
Subscribers: rnk, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63396
Signed-off-by: pengfei <pengfei.wang@intel.com>
llvm-svn: 367088
All these args can be cheaply recomputed and it makes it much easier to use the function as a quick helper.
llvm-svn: 367014
Summary:
Add a new method which tries to compute the target address referenced by an operand.
This patch supports x86_64 RIP-relative addressing for now.
It is necessary to print referenced symbol names in llvm-objdump.
Reviewers: andreadb, MaskRay, grosbach, jgalenson, craig.topper
Reviewed By: MaskRay, craig.topper
Subscribers: bcain, rupprecht, jhenderson, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63847
llvm-svn: 366987
Summary:
This was originally reported in D62818.
https://rise4fun.com/Alive/oPH
InstCombine does the opposite fold, in the hope that the `C l>>/<< Y` expression
will be hoisted out of a loop if `Y` is invariant and `X` is not.
But as the diffs here show, if it doesn't get hoisted,
the resulting assembly is almost universally worse.
Much like with my recent "hoist add/sub by/from const" patches,
hoisting the constant should be an almost universal win:
there is almost always an "and/test by imm" instruction,
but "shift of imm" not so much, so we may avoid having to
materialize the immediate, and thus need one less register.
And since we now shift by something other than a constant,
the live range of that something else may shrink.
Special care is needed not to disturb the x86 `BT` / hexagon `tstbit`
instruction patterns, and not to get into an endless combine loop.
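As a hedged illustration (my own example, not from the patch's tests), this is the
kind of variable bit test the fold targets:

  // Written as (X & (C >> Y)) != 0, the shifted constant has to be computed
  // at runtime; after hoisting the constant this becomes
  // ((X << Y) & C) != 0, where C can be folded into an AND/TEST immediate
  // (or matched as x86 `BT` / hexagon `tstbit`).
  bool bit_set(unsigned x, unsigned y) {
    return (x & (0x80000000u >> y)) != 0;
  }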
Reviewers: RKSimon, efriedma, t.p.northover, craig.topper, spatel, arsenm
Reviewed By: spatel
Subscribers: hiraditya, MaskRay, wuzish, xbolva00, nikic, nemanjai, jvesely, wdng, nhaehnle, javed.absar, tpr, kristof.beyls, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62871
llvm-svn: 366955
This patch adds support for recognizing cases where a larger vector type is being used to reduce just the elements in the lower subvector:
e.g. <8 x i32> reduction pattern in a <16 x i32> vector:
<4,5,6,7,u,u,u,u,u,u,u,u,u,u,u,u>
<2,3,u,u,u,u,u,u,u,u,u,u,u,u,u,u>
<1,u,u,u,u,u,u,u,u,u,u,u,u,u,u,u>
matchBinOpReduction returns the lower extracted subvector in such cases, assuming isExtractSubvectorCheap accepts the extraction.
I've only enabled it for X86 reduction sums so far. I intend to enable it for the bitop/minmax cases in future patches, and eventually I think it's worth turning it on all the time. This is mainly just a case of ensuring calls to matchBinOpReduction don't make assumptions about the vector width based on the original vector extraction.
Fixes the x86 partial reduction sum cases in PR33758 and PR42023.
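In scalar terms the masks above are the usual log2 shuffle reduction applied only to
lanes 0-7 of the 16-lane vector; a small C++ model of that (illustrative only):

  // Each step mirrors one shuffle mask above: fold lanes {4..7} onto {0..3},
  // then {2,3} onto {0,1}, then lane 1 onto lane 0. Lanes 8-15 are never read.
  int reduce_low8(int v[16]) {
    for (int step = 4; step >= 1; step /= 2)
      for (int i = 0; i < step; ++i)
        v[i] += v[i + step];
    return v[0];
  }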
Differential Revision: https://reviews.llvm.org/D65047
llvm-svn: 366933
the shuffle mask by commuting, just commute the mask and swap V1/V2.
LegalizeDAG tries to legalize the DAG by legalizing nodes before their operands.
If we create a new node, we end up legalizing it after its operands.
This prevents some of the optimizations that can be done when the
operand is a build_vector since the build_vector will have been
legalized to something else.
Differential Revision: https://reviews.llvm.org/D65132
llvm-svn: 366835
directly in v16i8 with the correct 0x00 or 0xFF elements rather than using another VT and bitcasting it.
The build_vector will become a constant pool load. By using the
desired type initially, it ensures we don't generate a bitcast
of the constant pool load which will need to be folded with
the load.
While experimenting with another patch, I noticed that when the
load type and the constant pool type don't match, then
SimplifyDemandedBits can't handle it. While we should probably
fix that, this was a simple way to fix the issue I saw.
llvm-svn: 366732
This patch enables us to find the source load for each element, splitting it into a Load and a ByteOffset, and attempts to recognise consecutive loads that are in fact from the same source load.
A helper function, findEltLoadSrc, recurses to find a LoadSDNode and determines the element's byte offset within it. When attempting to match consecutive loads, byte-offsetted loads are then matched against a previous load that has already been confirmed to be a consecutive match.
Next step towards PR16739 - after this we just need to account for shuffling/repeated elements to create a vector load + shuffle.
Fixes the out-of-bounds load assert identified in rL366501.
Differential Revision: https://reviews.llvm.org/D64551
llvm-svn: 366681
Modified the following 3 intrinsics:
int_addressofreturnaddress,
int_frameaddress & int_sponentry.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D64561
llvm-svn: 366679
narrowing handling. NFCI.
Move the narrowing of SUBV_BROADCAST to where we handle all the other opcodes.
llvm-svn: 366660
As detailed on PR42674, we can reduce a vXi8 down until we have the final <8 x i8>, and then use PSADBW with zero to sum those values. We then extract the bottom i8, discarding any overflow from the upper bits of the i16 result.
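A hedged sketch of the kind of source reduction this applies to (assuming only the
low 8 bits of the total are kept):

  #include <cstdint>

  // The accumulator is a byte, so any carry above bit 7 is discarded; the
  // vXi8 reduction can therefore be done as PSADBW against zero (an i16
  // horizontal sum) followed by taking the bottom i8.
  uint8_t sum_bytes(const uint8_t *p, int n) {
    uint8_t sum = 0;
    for (int i = 0; i < n; ++i)
      sum += p[i];
    return sum;
  }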
llvm-svn: 366636
Summary:
For split-stack, if the nested argument (i.e. R10) is not used, there is no need to save/restore it in the prologue.
Reviewers: thanm
Reviewed By: thanm
Subscribers: mstorsjo, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64673
llvm-svn: 366569
and legalize later.
I plan on adding memcpy optimizations in the GlobalISel pipeline, but we can't
do that unless we delay lowering to actual function calls. This patch changes
the translator to generate G_INTRINSIC_W_SIDE_EFFECTS for these functions, and
then have each target use the new custom legalizer hook for intrinsics to
specify that it wants them expanded into libcalls.
Differential Revision: https://reviews.llvm.org/D64895
llvm-svn: 366516
This reverts r366441 (git commit 48104ef7c9c653bbb732b66d7254957389fea337)
This causes clang to fail to compile some file in Skia. Reduction soon.
llvm-svn: 366501
This patch enables us to find the source load for each element, splitting it into a Load and a ByteOffset, and attempts to recognise consecutive loads that are in fact from the same source load.
A helper function, findEltLoadSrc, recurses to find a LoadSDNode and determines the element's byte offset within it. When attempting to match consecutive loads, byte-offsetted loads are then matched against a previous load that has already been confirmed to be a consecutive match.
Next step towards PR16739 - after this we just need to account for shuffling/repeated elements to create a vector load + shuffle.
Differential Revision: https://reviews.llvm.org/D64551
llvm-svn: 366441
LEA doesn't affect flags, so use it more liberally to replace an ADD when
we know that the ADD operands affect flags.
In the motivating example from PR40483:
https://bugs.llvm.org/show_bug.cgi?id=40483
...this lets us avoid duplicating a math op just to avoid a flag conflict.
As mentioned in the TODO comments, this heuristic can be extended to
fire more often if that leads to more improvements.
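A contrived C++ sketch of the situation (not the PR40483 reproducer): the flags from a
test on one of the add's operands are still needed after the add, so emitting the add
as LEA avoids clobbering them or duplicating the math:

  // The sum and the sign test of 'a' both feed the result. An ADD would
  // overwrite EFLAGS between the test and the conditional move; LEA computes
  // a + b without touching EFLAGS.
  long pick(long a, long b) {
    long sum = a + b;
    return a < 0 ? b : sum;
  }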
Differential Revision: https://reviews.llvm.org/D64707
llvm-svn: 366431
I'm not convinced the code this calls is properly vetted for
vXi1 vectors. Experimental vector widening legalization testing
for D55251 is now hitting an assertion failure inside
EltsFromConsecutiveLoads. This is occurring from a v2i1 load
having a store size different than its VT size. Hopefully
this commit will keep such issues from happening.
llvm-svn: 366405
min-legal-vector-width=256 is in effect.
This started triggering an assertion after r364718 when we made
these Custom under AVX2.
llvm-svn: 366382
This is part of what is requested by PR42023:
https://bugs.llvm.org/show_bug.cgi?id=42023
There's an extension needed for FP add, but exactly how we would specify
that using flags is not clear to me, so I left that as a TODO.
We're still missing patterns for partial reductions when the input vector
is 256-bit or 512-bit, but I think that's a failure of vector narrowing.
If we can reduce the widths, then this matching should work on those tests.
Differential Revision: https://reviews.llvm.org/D64760
llvm-svn: 366268
Type legalization can take care of this. This gives DAG combine
a little more time with the original types.
llvm-svn: 366182
This patch applies clang-tidy's bugprone-argument-comment tool
to LLVM, clang and lld source trees. Here is how I created this
patch:
$ git clone https://github.com/llvm/llvm-project.git
$ cd llvm-project
$ mkdir build
$ cd build
$ cmake -GNinja -DCMAKE_BUILD_TYPE=Debug \
-DLLVM_ENABLE_PROJECTS='clang;lld;clang-tools-extra' \
-DCMAKE_EXPORT_COMPILE_COMMANDS=On -DLLVM_ENABLE_LLD=On \
-DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ ../llvm
$ ninja
$ parallel clang-tidy -checks='-*,bugprone-argument-comment' \
-config='{CheckOptions: [{key: StrictMode, value: 1}]}' -fix \
::: ../llvm/lib/**/*.{cpp,h} ../clang/lib/**/*.{cpp,h} ../lld/**/*.{cpp,h}
llvm-svn: 366177
We mostly avoid sub with immediate but there are a couple of cases that can create them. One is the add 128, %rax -> sub -128, %rax trick in isel. The other is when a SUB immediate gets created for a compare where both the flags and the subtract value are used. If we are unable to linearize the SelectionDAG to satisfy the flag user and the sub result user from the same instruction, we will clone the sub immediate for the two uses. The one that produces flags will eventually become a compare. The other will have its flag output dead, and could then be considered for LEA creation.
I added additional test cases to add.ll to show that the sub -128 trick gets converted to LEA, and a case where we don't need to convert it.
This showed up in the current codegen for PR42571.
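For reference, a minimal example of the isel trick mentioned above (my own illustration,
not one of the add.ll tests):

  // "x + 128" does not fit ADD's sign-extended 8-bit immediate (range
  // -128..127), but "x - (-128)" fits SUB's imm8 form, so isel prefers the
  // SUB. When that SUB's flag output ends up dead, it is now a candidate
  // for conversion to LEA, e.g. lea 128(%rdi), %rax.
  long add128(long x) { return x + 128; }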
Differential Revision: https://reviews.llvm.org/D64574
llvm-svn: 366151
inttofp (trunc (extelt X, 0)) --> inttofp (extelt (bitcast X), 0)
We have pseudo-vectorization of scalar int to FP casts, so this tries to
make that more likely by replacing a truncate with a bitcast. I didn't see
any test diffs starting from 'uitofp', so I left that as a TODO. We can't
match only the shorter trunc+extract pattern because there's an opposing
transform somewhere, so we would loop forever. Waiting to try this during
lowering is another possibility.
A motivating case is shown in PR39975 and included in the test diffs here:
https://bugs.llvm.org/show_bug.cgi?id=39975
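A minimal intrinsics-based sketch of the pattern (assuming SSE2/x86-64; not a test
case from the patch):

  #include <immintrin.h>

  // sitofp (trunc (extractelement <2 x i64> %v, 0)): the truncate of the
  // extracted scalar can instead be expressed as extracting element 0 of the
  // vector bitcast to <4 x i32>, keeping the int-to-FP cast "vectorized".
  float low_elt_to_float(__m128i v) {
    long long wide = _mm_cvtsi128_si64(v); // extract element 0 as i64
    int narrow = static_cast<int>(wide);   // trunc i64 -> i32
    return static_cast<float>(narrow);     // sitofp i32 -> float
  }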
Differential Revision: https://reviews.llvm.org/D64710
llvm-svn: 366098
out of range.
I think we only turn out-of-range shifts into undef when
all elements are out of range or the shift amount is a splat that is
out of range. I'm not sure which; I didn't check.
During lowering we can split a shift where some elements
are out of range into multiple shifts. This can create a
new shift with a splat shift amount that is out of range.
This patch returns undef for this case.
Fixes PR42615.
Differential Revision: https://reviews.llvm.org/D64699
llvm-svn: 366096
formed. NFCI.
While we don't make any assumptions about the actual mask, assert that the expected mask only contains valid mask element values.
llvm-svn: 366066
size of the result type. Use them to improve the codegen of v2f32 loads/stores with SSE1 only.
Summary:
SSE1 only supports v4f32, but it does have instructions like movlps/movhps that load/store 64 bits of memory.
This patch breaks the connection between the node VT of the vzext_load/vextract_store patterns and the memory VT. Enabling a v4f32 node with a 64-bit memory VT. I've used i64 as the memory VT here. I've written the PatFrag predicate to just check the store size not the specific VT. I think the VT will only matter for CSE purposes. We could use v2f32, but if we want to start using these operations in more places a simple integer type might make the most sense.
I'd like to maybe use this same thing for SSE2 and later as well, but that will need more work to be supported by EltsFromConsecutiveLoads to avoid regressing lit tests. I'd maybe also like to combine bitcasts with these load/stores nodes now that the types are disconnected. And I'd also like to consider canonicalizing (scalar_to_vector + load) to vzext_load.
If you want I can split the mechanical tablegen stuff where I added the 32/64 off from the sse1 change.
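For illustration, the 64-bit load/store being modelled, written with SSE1 intrinsics
(my own sketch):

  #include <xmmintrin.h>

  // movlps loads/stores the low two f32 lanes (64 bits) of a v4f32 register,
  // even though v4f32 is SSE1's only legal vector type; the vzext_load /
  // vextract_store patterns now give these a 64-bit memory VT.
  void copy_two_floats(float *dst, const float *src) {
    __m128 v = _mm_loadl_pi(_mm_setzero_ps(), reinterpret_cast<const __m64 *>(src));
    _mm_storel_pi(reinterpret_cast<__m64 *>(dst), v);
  }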
Reviewers: spatel, RKSimon
Reviewed By: RKSimon
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64528
llvm-svn: 366034
added in r365287.
This was copy/pasted from above and I forgot to change it. We just
need the default offset of 0 here.
Fixes PR42616.
llvm-svn: 366011
llvm-svn: 365998
optimizeCompareInstr. NFCI
llvm-svn: 365946
We can use the C flag from NEG to detect that the input was zero.
Really we could probably use the Z flag too. But C matches what
we'd do for usubo 0, X.
Haven't found a test case for this due to the usubo formation
in CGP. But I verified that if I comment out the CGP code, this
transformation catches some of the same cases.
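For context, the flag behaviour being relied on, shown on a trivial sketch:

  // NEG computes 0 - x and sets CF iff x was non-zero (CF is the borrow of
  // the subtraction, i.e. exactly the overflow of usubo 0, x), so "x == 0"
  // can be answered by checking that NEG left CF clear.
  bool is_zero(unsigned x) { return x == 0; }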
llvm-svn: 365929
A build failure was found on the SystemZ platform.
This reverts commit 9e7e73578e54cd22b3c7af4b54274d743b6607cc.
llvm-svn: 365886
Follow up to D58597, where it was noted that the commuted ISD::SUB variant
was having problems with lack of combines.
See also D63958 where we untangled setcc/sub pairs.
Differential Revision: https://reviews.llvm.org/D58875
llvm-svn: 365791
Summary:
As of binutils 2.32, ld.bfd reports a bogus TLS relaxation error when the GD/LD
code sequence using R_X86_64_GOTPCREL (instead of R_X86_64_GOTPCRELX) is
relaxed to IE/LE (binutils PR24784). gold and lld are fine.
In gcc/config/i386/i386.md, there is a configure-time check of as/ld
support and the GOT relaxation will not be used if as/ld doesn't support
it:
  if (flag_plt || !HAVE_AS_IX86_TLS_GET_ADDR_GOT)
    return "call\t%P2";
  return "call\t{*%p2@GOT(%1)|[DWORD PTR %p2@GOT[%1]]}";
In clang, -DENABLE_X86_RELAX_RELOCATIONS=OFF is the default. The ld.bfd
bogus error can be reproduced with:
  thread_local int a;
  int main() { return a; }
  clang -fno-plt -fpic a.cc -fuse-ld=bfd
GOTPCRELX only gained good support in 2016, which is relatively recent.
It is also difficult to conditionally default to
-DENABLE_X86_RELAX_RELOCATIONS=ON because of cross-compilation concerns. So
work around the ld.bfd bug by only using the GOT sequence when GOTPCRELX is enabled.
Reviewers: dalias, hjl.tools, nikic, rnk
Reviewed By: nikic
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64304
llvm-svn: 365752
We use the convert-to-three-address functions to do the conversion, but
converting an 8-bit or 16-bit instruction causes them to create a virtual
register, which can't be done after register allocation, where this pass runs.
I've switched the pass completely to a whitelist of instructions
that can be converted to LEA, instead of the previous blacklist, which
was incorrect. This avoids surprises if we enhance the three-address
conversion functions to handle additional instructions in the future.
Fixes PR42565.
llvm-svn: 365720
Fixes similar issues to r352306.
llvm-svn: 365705
Unfortunately subo formation in CGP prevents obvious ways of
testing this.
But we already have BLSI in here and the flag behavior is
well understood.
Might become more useful if we improve PR42571.
llvm-svn: 365702
llvm-svn: 365697
Determine the element/load size calculations earlier and assert that they are whole bytes in size.
llvm-svn: 365674
We've already checked that each element is the correct contributory size for VT when we inspect the elements for Undef/Zero/Load.
llvm-svn: 365656
size. NFCI.
This renames the type so it doesn't sound like it's based off the load size - as we're moving towards supporting combining loads of different sizes.
llvm-svn: 365655
NFCI.
llvm-svn: 365628
Don't bother checking for LDBase != null - it should be (and we assert that it is).
llvm-svn: 365622
Cache the LoadSDNode nodes so we can easily map to/from the element index instead of packing them together - this will be useful for future patches for PR16739 etc.
llvm-svn: 365620
This patch checks to see if the vector element loads are based off a dereferenceable pointer that covers the entire vector width, in which case we don't need to have element loads at both extremes of the vector width - just the start (base pointer) of it.
Another step towards partial vector loads...
Differential Revision: https://reviews.llvm.org/D64205
llvm-svn: 365614
extending loads.
This seems to fix a failure reported by Jordan Rupprecht, but we
don't have a reduced test case yet.
llvm-svn: 365589
This should prevent doing this on pre-sse4.1 targets or for 256
bit vectors without avx2.
I don't know of a failure from this. Op legalization will probably
take care of it, but it seemed better to be safe.
llvm-svn: 365577
isLoadBitCastBeneficial/isStoreBitCastBeneficial to allow X86 to bypass it
Basically the problem is that X86 doesn't set the Fast flag from
allowsMemoryAccess on certain CPUs due to slow unaligned memory
subtarget features. This prevents bitcasts from being folded into
loads and stores. But all vector loads and stores of the same width
are the same cost on X86.
This patch merges the allowsMemoryAccess call into isLoadBitCastBeneficial to allow X86 to skip it.
Differential Revision: https://reviews.llvm.org/D64295
llvm-svn: 365549
llvm-svn: 365540
Dump the DWARF information about call sites and call site parameters into
debug info sections.
The patch also provides an interface for interpreting instructions
that could load the values of call site parameters, in order to generate
DWARF about those parameters.
([13/13] Introduce the debug entry values.)
Co-authored-by: Ananth Sowda <asowda@cisco.com>
Co-authored-by: Nikola Prica <nikola.prica@rt-rk.com>
Co-authored-by: Ivan Baev <ibaev@cisco.com>
Differential Revision: https://reviews.llvm.org/D60716
llvm-svn: 365467
Summary:
This makes it so that IR files using triples without an environment work
out of the box, without normalizing them.
Typically, the MSVC behavior is more desirable. For example, it tends to
enable things like constant merging, use of associative comdats, etc.
Addresses PR42491
Reviewers: compnerd
Subscribers: hiraditya, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64109
llvm-svn: 365387