bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[globalisel] Add G_SEXT_INREG	Daniel Sanders	2019-08-09	10	-4/+205
	Summary: Targets often have instructions that can sign-extend certain cases faster than the equivalent shift-left/arithmetic-shift-right. Such cases can be identified by matching a shift-left/shift-right pair but there are some issues with this in the context of combines. For example, suppose you can sign-extend 8-bit up to 32-bit with a target extend instruction.
		%1:_(s32) = G_SHL %0:_(s32), i32 24 # (I've inlined the G_CONSTANT for brevity)
		%2:_(s32) = G_ASHR %1:_(s32), i32 24
		%3:_(s32) = G_ASHR %2:_(s32), i32 1
	would reasonably combine to:
		%1:_(s32) = G_SHL %0:_(s32), i32 24
		%2:_(s32) = G_ASHR %1:_(s32), i32 25
	which no longer matches the special case. If your shifts and extend are equal cost, this would break even as a pair of shifts but if your shift is more expensive than the extend then it's cheaper as:
		%2:_(s32) = G_SEXT_INREG %0:_(s32), i32 8
		%3:_(s32) = G_ASHR %2:_(s32), i32 1
	It's possible to match the shift-pair in ISel and emit an extend and ashr. However, this is far from the only way to break this shift pair and make it hard to match the extends. Another example is that with the right known-zeros, this:
		%1:_(s32) = G_SHL %0:_(s32), i32 24
		%2:_(s32) = G_ASHR %1:_(s32), i32 24
		%3:_(s32) = G_MUL %2:_(s32), i32 2
	can become:
		%1:_(s32) = G_SHL %0:_(s32), i32 24
		%2:_(s32) = G_ASHR %1:_(s32), i32 23
	All upstream targets have been configured to lower it to the current G_SHL,G_ASHR pair but will likely want to make it legal in some cases to handle their faster cases.
	To follow-up: Provide a way to legalize based on the constant. At the moment, I'm thinking that the best way to achieve this is to provide the MI in LegalityQuery but that opens the door to breaking core principles of the legalizer (legality is not context sensitive). That said, it's worth noting that looking at other instructions and acting on that information doesn't violate this principle in itself. It's only a violation if, at the end of legalization, a pass that checks legality without being able to see the context would say an instruction might not be legal. That's a fairly subtle distinction so to give a concrete example, saying %2 in:
		%1 = G_CONSTANT 16
		%2 = G_SEXT_INREG %0, %1
	is legal is in violation of that principle if the legality of %2 depends on %1 being constant and/or being 16. However, legalizing to either:
		%2 = G_SEXT_INREG %0, 16
	or:
		%1 = G_CONSTANT 16
		%2:_(s32) = G_SHL %0, %1
		%3:_(s32) = G_ASHR %2, %1
	depending on whether %1 is constant and 16 does not violate that principle since both outputs are genuinely legal.
	Reviewers: bogner, aditya_nandakumar, volkan, aemerson, paquette, arsenm Subscribers: sdardis, jvesely, wdng, nhaehnle, rovka, kristof.beyls, javed.absar, hiraditya, jrtc27, atanasyan, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61289 llvm-svn: 368487
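	The equivalence this message relies on can be checked with ordinary integer arithmetic. The following is a minimal sketch, not code from the patch: the function names are hypothetical, and it models a 32-bit G_SEXT_INREG both as the shift pair and as a direct sign extension, plus the combined G_SHL 24 / G_ASHR 25 form from the first example.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend the low `bits` bits of x using the shift pair
// (the G_SHL / G_ASHR form). Hypothetical helper, not an LLVM API.
int32_t sextInRegViaShifts(int32_t x, unsigned bits) {
  assert(bits >= 1 && bits <= 32);
  const unsigned shift = 32 - bits;
  // Shift left on the unsigned value to avoid signed-overflow issues.
  const uint32_t shifted = static_cast<uint32_t>(x) << shift;
  // Right shift of a negative signed value is arithmetic on mainstream
  // compilers (and guaranteed to be so since C++20).
  return static_cast<int32_t>(shifted) >> shift;
}

// Sign-extend the low `bits` bits of x directly, the way a dedicated
// extend instruction would. Hypothetical helper, not an LLVM API.
int32_t sextInRegDirect(int32_t x, unsigned bits) {
  assert(bits >= 1 && bits <= 32);
  if (bits == 32)
    return x;
  const uint32_t mask = (uint32_t{1} << bits) - 1;
  const uint32_t signBit = uint32_t{1} << (bits - 1);
  const uint32_t low = static_cast<uint32_t>(x) & mask;
  // Flipping and then subtracting the sign bit propagates it upwards.
  return static_cast<int32_t>((low ^ signBit) - signBit);
}

// The combined form from the first example: G_SHL 24 followed by G_ASHR 25.
int32_t shl24Ashr25(int32_t x) {
  return static_cast<int32_t>(static_cast<uint32_t>(x) << 24) >> 25;
}
```

	For any input, `sextInRegDirect(x, 8) >> 1` and `shl24Ashr25(x)` should agree, which is why the combine is valid even though its result no longer looks like a sign extension.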
*	Remove variable only used in an assert.	Eric Christopher	2019-08-09	1	-2/+1
	llvm-svn: 368486
*	[X86] Remove custom handling for extloads from LowerLoad.	Craig Topper	2019-08-09	1	-183/+1
	We don't appear to need this with widening legalization. llvm-svn: 368479
*	[CodeGen] Require a name for a block addr target	Bill Wendling	2019-08-09	1	-1/+1
	Summary: A block address may be used in inline assembly, in which case it requires a name so that the asm parser has something to parse. Creating a name for every block address is a large hammer, but is necessary because at the point when a temp symbol is created we don't necessarily know if it's used in inline asm. This ensures that it exists regardless. Reviewers: nickdesaulniers, craig.topper Subscribers: nathanchance, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65352 llvm-svn: 368478
*	[MC] Don't recreate a label if it's already used	Bill Wendling	2019-08-09	3	-1/+10
	Summary: This patch keeps track of MCSymbols created for blocks that were referenced in inline asm. It prevents creating a new symbol which doesn't refer to the block. Inline asm may have a reference to a label. The asm parser, however, doesn't recognize it as a label and tries to create a new symbol. The result is that instead of the original symbol (e.g. ".Ltmp0") the parser replaces it in the inline asm with the new one (e.g. ".Ltmp00") without updating it in the symbol table. So the machine basic block retains the "old" symbol (".Ltmp0"), but the inline asm uses the new one (".Ltmp00"). Reviewers: nickdesaulniers, craig.topper Subscribers: nathanchance, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65304 llvm-svn: 368477
*	[InstCombine] Refactor optimizeExp2() (NFC)	Evandro Menezes	2019-08-09	1	-31/+19
	Refactor `LibCallSimplifier::optimizeExp2()` to use the new `emitBinaryFloatFnCall()` version that fetches the function name from TLI. llvm-svn: 368457
*	[Transforms] Add an emitBinaryFloatFnCall() version that fetches the function name from TLI	Evandro Menezes	2019-08-09	1	-9/+35
	Add the counterpart to a similar function for single operands. Differential revision: https://reviews.llvm.org/D65976 llvm-svn: 368453
*	Print reasonable representations of type names in llvm-nm, readelf and readobj	Sunil Srivastava	2019-08-09	1	-1/+10
	For type values that do not have proper names, print a reasonable representation in llvm-nm, llvm-readobj and llvm-readelf, matching GNU tools. Fixes PR41713. Differential Revision: https://reviews.llvm.org/D65537 llvm-svn: 368451
*	[Transforms] Rename hasUnaryFloatFn() and getUnaryFloatFn() (NFC)	Evandro Menezes	2019-08-09	3	-23/+19
	Rename `hasUnaryFloatFn()` to `hasFloatFn()` and `getUnaryFloatFn()` to `getFloatFnName()`. llvm-svn: 368449
*	[DAGCombiner] remove redundant fold for X*1.0; NFC	Sanjay Patel	2019-08-09	1	-4/+0
	This is handled at node creation time (similar to X/1.0) after: rL357029 (no fast-math-flags needed) llvm-svn: 368443
*	[MachinePipeliner] Avoid indeterminate order in FuncUnitSorter	Jinsong Ji	2019-08-09	1	-1/+1
	Summary: This is exposed by adding a new testcase in PowerPC in https://reviews.llvm.org/rL367732 The testcase got different output on different platforms, hence breaking buildbots. The problem is that we get a different FuncUnitOrder in calculateResMII. The root cause is: 1. Two MachineInstrs might get the SAME priority (MFUsx) from minFuncUnits. 2. The current comparison operator() will return `MFUs1 > MFUs2`. 3. We use iterators for MachineInstr, so the input to FuncUnitSorter might be different on different platforms due to the iterator nature. So for two MIs with the same MFU, their order actually depends on the iterator order, which is platform (implementation) dependent. This is risky, and may cause cross-compiling problems. The fix is to make sure we assign a deterministic order when they are equal. Reviewers: bcahoon, hfinkel, jmolloy Subscribers: nemanjai, hiraditya, MaskRay, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65992 llvm-svn: 368441
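	The general technique described above, breaking ties on a stable key so that the result does not depend on input order, can be sketched as follows. This is an illustration under assumptions, not the actual FuncUnitSorter change: `Item`, `DeterministicSorter` and `sortedIds` are hypothetical names, and `id` stands in for whatever platform-independent key is available.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for an instruction with a computed priority.
struct Item {
  unsigned priority; // may be equal for different items
  unsigned id;       // stable, platform-independent key
};

// Comparator that never leaves two distinct items unordered:
// higher priority first, then lower id as the tie-breaker.
struct DeterministicSorter {
  bool operator()(const Item &a, const Item &b) const {
    if (a.priority != b.priority)
      return a.priority > b.priority;
    return a.id < b.id;
  }
};

// Sort a copy of the input and return the ids in their final order.
std::vector<unsigned> sortedIds(std::vector<Item> items) {
  std::sort(items.begin(), items.end(), DeterministicSorter{});
  std::vector<unsigned> ids;
  for (const Item &item : items)
    ids.push_back(item.id);
  return ids;
}
```

	With the tie-breaker in place, feeding the same items in a different order yields the same sequence of ids; with only the priority comparison, the relative order of equal-priority items would depend on how they arrived.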
*	Title: Loop Cache Analysis	Whitney Tsang	2019-08-09	4	-0/+628
	Summary: Implement a new analysis to estimate the number of cache lines required by a loop nest. The analysis is largely based on the following paper: Compiler Optimizations for Improving Data Locality By: Steve Carr, Katherine S. McKinley, Chau-Wen Tseng http://www.cs.utexas.edu/users/mckinley/papers/asplos-1994.pdf
	The analysis considers temporal reuse (accesses to the same memory location) and spatial reuse (accesses to memory locations within a cache line). For simplicity the analysis considers memory accesses in the innermost loop in a loop nest, and thus determines the number of cache lines used when the loop L in loop nest LN is placed in the innermost position.
	The result of the analysis can be used to drive several transformations. As an example, loop interchange could use it to determine which loops in a perfect loop nest should be interchanged to maximize cache reuse. Similarly, loop distribution could be enhanced to take into consideration cache reuse between arrays when distributing a loop to eliminate vectorization inhibiting dependencies.
	The general approach taken to estimate the number of cache lines used by the memory references in the inner loop of a loop nest is:
	1. Partition memory references that exhibit temporal or spatial reuse into reference groups.
	2. For each loop L in the loop nest LN:
		a. Compute the cost of the reference group
		b. Compute the 'cache cost' of the loop nest by summing up the reference groups costs
	For further details of the algorithm please refer to the paper.
	Authored By: etiotto Reviewers: hfinkel, Meinersbur, jdoerfert, kbarton, bmahjour, anemet, fhahn Reviewed By: Meinersbur Subscribers: reames, nemanjai, MaskRay, wuzish, Hahnfeld, xusx595, venkataramanan.kumar.llvm, greened, dmgreen, steleman, fhahn, xblvaOO, Whitney, mgorny, hiraditya, mgrang, jsji, llvm-commits Tag: LLVM Differential Revision: https://reviews.llvm.org/D63459 llvm-svn: 368439
*	[X86][SSE] Swap X86ISD::BLENDV inputs with an inverted selection mask (PR42825)	Simon Pilgrim	2019-08-09	1	-0/+6
	As discussed on PR42825, if we are inverting the selection mask we can just swap the inputs and avoid the inversion. Differential Revision: https://reviews.llvm.org/D65522 llvm-svn: 368438
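	The identity behind this fold can be illustrated with a scalar, bitwise model of a blend. This is a simplified sketch, not the X86 lowering code: the real X86ISD::BLENDV selects per vector element from the mask, whereas the hypothetical helpers below select per bit, which is enough to show that inverting the mask is the same as swapping the inputs.

```cpp
#include <cassert>
#include <cstdint>

// Bitwise select: take bits from `a` where the mask bit is 1 and
// from `b` where it is 0. Hypothetical scalar model of a blend.
uint32_t blend(uint32_t mask, uint32_t a, uint32_t b) {
  return (a & mask) | (b & ~mask);
}

// Blend with an explicitly inverted mask (costs an extra NOT).
uint32_t blendInvertedMask(uint32_t mask, uint32_t a, uint32_t b) {
  return blend(~mask, a, b);
}

// Same result without the inversion: swap the two inputs instead.
uint32_t blendSwappedInputs(uint32_t mask, uint32_t a, uint32_t b) {
  return blend(mask, b, a);
}
```

	Both variants compute `(a & ~mask) | (b & mask)`, so the second form saves the mask inversion.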
*	[GlobalOpt] prevent crashing on large integer types (PR42932)	Sanjay Patel	2019-08-09	1	-2/+4
	This is a minimal fix (copy the predicate for the assert) to prevent the crashing seen in: https://bugs.llvm.org/show_bug.cgi?id=42932 ...when converting a constant integer of arbitrary width to uint64_t. Differential Revision: https://reviews.llvm.org/D65970 llvm-svn: 368437
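	The pattern of checking the conversion's precondition before converting, instead of letting an assertion fire, can be sketched in isolation. The code below is an assumption-laden illustration and does not use LLVM's APInt: `WideInt`, `activeWords` and `tryZExtValue` are hypothetical names for an arbitrary-width integer stored as little-endian 64-bit words.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical arbitrary-width unsigned integer: little-endian 64-bit words.
struct WideInt {
  std::vector<uint64_t> words;
};

// Number of words up to and including the highest non-zero word.
std::size_t activeWords(const WideInt &value) {
  std::size_t n = value.words.size();
  while (n > 0 && value.words[n - 1] == 0)
    --n;
  return n;
}

// Convert to uint64_t only when the value fits; report failure otherwise
// so the caller can skip the transform rather than crash.
bool tryZExtValue(const WideInt &value, uint64_t &out) {
  if (activeWords(value) > 1)
    return false; // needs more than 64 bits
  out = value.words.empty() ? 0 : value.words[0];
  return true;
}
```

	A caller would test the return value and bail out of the optimization when it is false, which mirrors the idea of copying the assert's predicate into a runtime check.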
*	[Mips][Codegen] Fix fast-isel mixing of FGR64 and AFGR64 registers	Simon Atanasyan	2019-08-09	1	-2/+8
	Fast-isel was picking the AFGR64 register class for processing call arguments when the +fp64 option was used. We simply check whether the +fp64 option is used and pick the appropriate register. Patch by Mirko Brkusanin. Differential Revision: https://reviews.llvm.org/D65886 llvm-svn: 368433
*	[MCA] Add flag -show-encoding to llvm-mca.	Andrea Di Biagio	2019-08-09	2	-0/+38
	Flag -show-encoding enables the printing of instruction encodings as part of the instruction info view.
	Example (with flags -mtriple=x86_64-- -mcpu=btver2):
	Instruction Info:
	[1]: #uOps
	[2]: Latency
	[3]: RThroughput
	[4]: MayLoad
	[5]: MayStore
	[6]: HasSideEffects (U)
	[7]: Encoding Size
	[1]    [2]    [3]    [4]    [5]    [6]    [7]    Encodings:     Instructions:
	 1      2     1.00                         4     c5 f0 59 d0    vmulps   %xmm0, %xmm1, %xmm2
	 1      4     1.00                         4     c5 eb 7c da    vhaddps  %xmm2, %xmm2, %xmm3
	 1      4     1.00                         4     c5 e3 7c e3    vhaddps  %xmm3, %xmm3, %xmm4
	In this example, column Encoding Size is the size in bytes of the instruction encoding. Column Encodings reports the actual instruction encodings as byte sequences in hex (objdump style). The computation of encodings is done by a utility class named mca::CodeEmitter. In future, I plan to expose the CodeEmitter to the instruction builder, so that information about instruction encoding sizes can be used by the simulator. That would be a first step towards simulating the throughput from the decoders in the hardware frontend.
	Differential Revision: https://reviews.llvm.org/D65948 llvm-svn: 368432
*	[AArch64] Set pref. func. align to 8 bytes on Neoverse E1 & Cortex-A65	Pablo Barrio	2019-08-09	1	-0/+2
	Summary: The Arm Neoverse E1 and Cortex-A65 Software Optimization Guide [1][2], Section "4.7 Branch instruction alignment" state: "It is preferable for branch targets, including subroutine entry points, to be placed on aligned 64-bit boundaries to maximize instruction fetch efficiency." This patch sets the preferred function alignment on Neoverse E1 and Cortex-A65 to 2^3=8B. This was already the case in some Cortex-A CPUs such as Cortex-A53. [1] https://developer.arm.com/docs/swog466751/latest/arm-neoversetm-e1-core-software-optimization-guide [2] https://developer.arm.com/docs/swog010045/latest/arm-cortex-a65-core-software-optimization-guide Reviewers: dmgreen, fhahn, samparker Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65937 llvm-svn: 368431
*	AArch64: support TLS on Darwin platforms in GlobalISel.	Tim Northover	2019-08-09	1	-4/+34
	All TLS access on Darwin is in the "general dynamic" form where we call a function to resolve the address, so implementation is pretty simple. llvm-svn: 368418
*	GlobalISel: pack various parameters for lowerCall into a struct.	Tim Northover	2019-08-09	10	-115/+94
	I've now needed to add an extra parameter to this call twice recently. Not only is the signature getting extremely unwieldy, but just updating all of the callsites and implementations is a pain. Putting the parameters in a struct sidesteps both issues. llvm-svn: 368408
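	The refactoring idea, replacing a long parameter list with a single struct so that new options do not touch every signature, can be sketched as follows. All names here (`CallParams`, `describeCall`, the individual fields) are hypothetical and chosen for illustration; they are not the types or members introduced by the patch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical parameter bundle. Adding a new field later only affects
// the code that reads or sets it, not every override and call site.
struct CallParams {
  std::string callee;
  std::vector<int> args;
  bool isVarArg = false; // defaulted, so existing callers stay unchanged
};

// Hypothetical hook taking the bundle instead of a growing argument list.
std::string describeCall(const CallParams &params) {
  std::string out =
      params.callee + "(" + std::to_string(params.args.size()) + " args)";
  if (params.isVarArg)
    out += " vararg";
  return out;
}
```

	Callers fill in only the fields they care about, and a later field with a default value compiles without changes to existing callers or implementations.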
*	[ARM][ParallelDSP] Replace SExt uses	Sam Parker	2019-08-09	1	-3/+5
	As loads are combined and widened, we replaced their sext users' operands whereas we should have been replacing the uses of the sext. I've added a load of tests, with only a few of them originally causing assertion failures; the rest improve pattern coverage. Differential Revision: https://reviews.llvm.org/D65740 llvm-svn: 368404
*	[InstSimplify] Report "Changed" also when only deleting dead instructions	Bjorn Pettersson	2019-08-09	1	-0/+1
	Summary: Make sure that we report that changes have been made by InstSimplify also in situations when only trivially dead instructions have been removed. If for example a call is removed the call graph must be updated. The bug seems to have been introduced by llvm-svn r367173 (commit 02b9e45a7e4b81), since the code in question was rewritten in that commit. Reviewers: spatel, chandlerc, foad Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65973 llvm-svn: 368401
*	[X86] Remove code that expands truncating stores from combineStore.	Craig Topper	2019-08-09	1	-76/+1
	We shouldn't form trunc stores that need to be expanded now that we are using widening legalization. llvm-svn: 368400
*	[X86] Remove stale FIXME from combineMaskedStore. NFC	Craig Topper	2019-08-09	1	-4/+0
	I believe PR34584 was tracking that FIXME, but it's since been closed and a test case was added. llvm-svn: 368397
*	[X86] Remove DAG combine expansion of extending masked load and truncating masked store.	Craig Topper	2019-08-09	1	-181/+24
	The only way to generate these was through promoting legalization of narrow vectors, but we widen those types now. So we shouldn't produce these nodes. llvm-svn: 368396
*	[X86] Remove handler for (U/S)(ADD/SUB)SAT from ReplaceNodeResults. Remove TypeWidenVector check from code that handles X86ISD::VPMADDWD and X86ISD::AVG.	Craig Topper	2019-08-09	1	-9/+4
	More unneeded code since we now legalize narrow vectors by widening. llvm-svn: 368395
*	[X86] Remove ISD::SETCC handling from ReplaceNodeResults.	Craig Topper	2019-08-09	1	-27/+0
	This is no longer needed since we widen v2i32 instead of promoting. llvm-svn: 368394
*	[X86] Simplify ISD::LOAD handling in ReplaceNodeResults and ISD::STORE handling in LowerStore now that v2i32 is widened to v4i32.	Craig Topper	2019-08-09	1	-12/+10
	llvm-svn: 368390
*	[X86] Merge v2f32 and v2i32 gather/scatter handling in ReplaceNodeResults/LowerMSCATTER now that v2i32 is also widened like v2f32.	Craig Topper	2019-08-09	1	-86/+12
	llvm-svn: 368389
*	[X86] Remove now unreachable handling for f64->v2i32/v4i16/v8i8 bitcasts from ReplaceNodeResults.	Craig Topper	2019-08-09	1	-14/+0
	We rely on the generic type legalizer for this now. llvm-svn: 368388
*	[X86] Simplify ReplaceNodeResults handling for FP_TO_SINT/UINT for vectors to only handle widening.	Craig Topper	2019-08-09	1	-44/+10
	llvm-svn: 368387
*	[X86] Simplify ReplaceNodeResults handling for SIGN_EXTEND/ZERO_EXTEND/TRUNCATE for vectors to only handle widening.	Craig Topper	2019-08-09	1	-4/+5
	llvm-svn: 368386
*	[X86] Simplify ReplaceNodeResults handling for UDIV/UREM/SDIV/SREM for vectors to only handle widening.	Craig Topper	2019-08-09	1	-12/+3
	llvm-svn: 368385
*	[X86] Remove vector promotion handling from the ReplaceNodeResults ISD::MUL handling code.	Craig Topper	2019-08-09	1	-28/+14
	We now widen illegal vector types so we don't need this anymore. llvm-svn: 368384
*	DebugInfo/DWARF: Provide some (pretty half-hearted) error handling access when parsing units	David Blaikie	2019-08-09	1	-17/+25
	This isn't the most robust error handling API, but does allow clients to opt-in to getting Errors they can handle. I suspect the long-term solution would be to move away from the lazy unit parsing and have an explicit step that parses the unit and then allows access to the other APIs that require a parsed unit. llvm-dwarfdump could be expanded to use this (or newer/better API) to demonstrate the benefit of it - but for now lld will use this in a follow-up cl which ensures lld can exit non-zero on errors like this (& provide more descriptive diagnostics including which object file the error came from). (error access to later errors when parsing nested DIEs would be good too - but, again, exposing that without it being a hassle for every consumer may be tricky) llvm-svn: 368377
*	Change the return type of UpgradeARCRuntimeCalls to void	Akira Hatanaka	2019-08-08	1	-9/+5
	Nothing is using the function return. llvm-svn: 368367
*	Remove else-after-return	David Blaikie	2019-08-08	1	-3/+3
	llvm-svn: 368364
*	Linker: Add support for GlobalIFunc.	Peter Collingbourne	2019-08-08	2	-64/+77
	GlobalAlias and GlobalIFunc ought to be treated the same by the IR linker, so we can generalize the code to be in terms of their common base class GlobalIndirectSymbol. Differential Revision: https://reviews.llvm.org/D55046 llvm-svn: 368357
*	[LICM] Support unary FNeg in LICM	Cameron McInally	2019-08-08	1	-1/+2
	Differential Revision: https://reviews.llvm.org/D65908 llvm-svn: 368350
*	[X86] Improve codegen of v8i64->v8i16 and v16i32->v16i8 truncate with avx512vl, avx512bw, min-legal-vector-width<=256 and prefer-vector-width=256	Craig Topper	2019-08-08	1	-1/+22
	Under this configuration we'll want to split the v8i64 or v16i32 into two vectors. The default legalization will try to truncate each of those 256-bit pieces one step to 128-bit, concatenate those, then truncate one more time from the new 256 to 128 bits. With this patch we now truncate the two splits to 64-bits then concatenate those. We have to do this two different ways depending on whether we have widening legalization enabled. Without widening legalization we have to manually construct X86ISD::VTRUNC to prevent the ISD::TRUNCATE with a narrow result being promoted to 128 bits with a larger element type than what we want followed by something like a pshufb to grab the lower half of each element to finish the job. With widening legalization we just get the right thing. When we switch to widening by default we can just delete the other code path. Differential Revision: https://reviews.llvm.org/D65626 llvm-svn: 368349
*	[SelectionDAG][X86] Move setcc mask splitting for mload/mstore/mgather/mscatter from DAGCombiner to the type legalizer.	Craig Topper	2019-08-08	3	-282/+38
	We may be able to look to how VSELECT is handled to further improve this, but this appears to be neutral or an improvement on the test cases we have. llvm-svn: 368344
*	[LegalizeTypes] Remove SplitVSETCC helper and just call SplitVecRes_SETCC.	Craig Topper	2019-08-08	1	-18/+1
	llvm-svn: 368343
*	[MBP] Disable aggressive loop rotate in plain mode	Guozhi Wei	2019-08-08	1	-36/+80
	Patch https://reviews.llvm.org/D43256 introduced more aggressive loop layout optimization which depends on profile information. If profile information is not available, the statically estimated profile information (generated by BranchProbabilityInfo.cpp) is used. If the user program doesn't behave as BranchProbabilityInfo.cpp expects, the layout may be worse. To be conservative this patch restores the original layout algorithm in plain mode. But the user can still try the aggressive layout optimization with -force-precise-rotation-cost=true. Differential Revision: https://reviews.llvm.org/D65673 llvm-svn: 368339
*	[llvm-mc] Add reportWarning() to MCContext	Brian Cain	2019-08-08	3	-10/+22
	Adding reportWarning() to MCContext, so that it can be used from the Hexagon assembler backend. llvm-svn: 368327
*	[X86] Make CMPXCHG16B feature imply CMPXCHG8B feature.	Craig Topper	2019-08-08	1	-1/+2
	This fixes znver1 so that it properly enables CMPXCHG8B. We can probably remove explicit CMPXCHG8B from CPUs that also have CMPXCHG16B, but keeping this simple to allow cherry pick to 9.0. Fixes PR42935. llvm-svn: 368324
*	[AArch64] Do not emit '#' before immediates in inline asm	Pirama Arumuga Nainar	2019-08-08	1	-2/+1
	Summary: The A64 assembly language does not require the '#' character to introduce constant immediate operands. Avoid the '#' since the AArch64 asm parser does not accept '#' before the lane specifier and rejects the following: __asm__ ("fmla v2.4s, v0.4s, v1.s[%0]" :: "I"(0x1)) Fix a test to not expect the '#' and add a new test case with the above asm. Fixes: https://github.com/android-ndk/ndk/issues/1036 Reviewers: peter.smith, kristof.beyls Subscribers: javed.absar, hiraditya, llvm-commits, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D65550 llvm-svn: 368320
*	[ObjC][ARC] Upgrade calls to ARC runtime functions to intrinsic calls if the bitcode has the arm64 retainAutoreleasedReturnValue marker	Akira Hatanaka	2019-08-08	2	-14/+69
	The ARC middle-end passes stopped optimizing or transforming bitcode that has been compiled with old compilers after we started emitting calls to ARC runtime functions as intrinsic calls instead of normal function calls in the front-end and made changes to teach the ARC middle-end passes about those intrinsics (see r349534). This patch converts calls to ARC runtime functions that are not intrinsic functions to intrinsic function calls if the bitcode has the arm64 retainAutoreleasedReturnValue marker. Checking for the presence of the marker is necessary to make sure we aren't changing ARC function calls that were originally MRR message sends (see r349952). rdar://problem/53280660 Differential Revision: https://reviews.llvm.org/D65902 llvm-svn: 368311
*	[X86] XFormVExtractWithShuffleIntoLoad - handle shuffle mask scaling	Simon Pilgrim	2019-08-08	1	-13/+27
	If the target shuffle mask is from a wider type, attempt to scale the mask so that the extraction can attempt to peek through. Fixes the regression mentioned in rL368307 llvm-svn: 368308
*	[X86] SimplifyDemandedVectorElts - attempt to recombine target shuffle using DemandedElts mask	Simon Pilgrim	2019-08-08	1	-0/+17
	If we don't demand all elements, then attempt to combine to a simpler shuffle. At the moment we can only do this if Depth == 0 as combineX86ShufflesRecursively uses Depth to track whether the shuffle has really changed or not - we'll need to change this before we can properly start merging combineX86ShufflesRecursively into SimplifyDemandedVectorElts. The insertps-combine.ll regression is because XFormVExtractWithShuffleIntoLoad can't see through shuffles of different widths - this will be fixed in a follow-up commit. llvm-svn: 368307
*	Enable assembly output of local commons for AIX	David Tenty	2019-08-08	7	-8/+45
	Summary: This patch enables assembly output of local commons for AIX using .lcomm directives. Adds an EmitXCOFFLocalCommonSymbol to MCStreamer so we can emit the AIX version of .lcomm assembly directives, which include a csect name. Handle the case of BSS locals in PPCAIXAsmPrinter by using EmitXCOFFLocalCommonSymbol. Adds a test for generating .lcomm on AIX targets. Reviewers: cebowleratibm, hubert.reinterpretcast, Xiangling_L, jasonliu, sfertile Reviewed By: sfertile Subscribers: wuzish, nemanjai, hiraditya, kbarton, MaskRay, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64825 llvm-svn: 368306
*	[ARM] Add support for MVE pre and post inc loads and stores	David Green	2019-08-08	3	-19/+273
	This adds pre- and post- increment and decrements for MVE loads and stores. It uses the builtin pre and post load/store detection, unlike Neon. Loads are selected with the code in tryT2IndexedLoad, stores are selected with tablegen patterns. The immediates have a +/-7bit range, multiplied by the size of the element. Differential Revision: https://reviews.llvm.org/D63840 llvm-svn: 368305