llvm-svn: 289492
Differential Revision: https://reviews.llvm.org/D27582
llvm-svn: 289486
Power8 has MTVSRWZ but no LXSIBZX/LXSIHZX, so moving 1 or 2 bytes to a VSR through MTVSRWZ is much faster than storing the extended value to the stack and reloading it with LXSIWZX.
This patch fixes pr31144.
Differential Revision: https://reviews.llvm.org/D27287
llvm-svn: 289473
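To illustrate the change above, here is a minimal C++ sketch (the function name is hypothetical): extending a byte load into a floating-point value previously went through a stack slot and LXSIWZX; with this change it moves through a GPR and MTVSRWZ instead.

    // Hypothetical example: converting a 1-byte load to double.
    double byte_to_double(const unsigned char *p) {
        return static_cast<double>(*p);  // lbz + mtvsrwz, no stack round-trip
    }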
instructions.
DWARF specifies that "line 0" really means "no appropriate source
location" in the line table. By default, use this for branch targets
and some other cases that have no specified source location, to
prevent inheriting unfortunate line numbers from physically preceding
instructions (which might be from completely unrelated source).
The updated patch allows enabling or suppressing this behavior for all
unspecified source locations.
Differential Revision: http://reviews.llvm.org/D24180
llvm-svn: 289468
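A hedged illustration of the behavior described above (not taken from the patch itself): compiler-generated branch targets often have no source line of their own.

    void zero(int *a, int n) {
        for (int i = 0; i < n; ++i)
            a[i] = 0;
        // The loop's exit and back-edge branch targets have no natural
        // source line; emitting them as "line 0" keeps them from
        // inheriting the line of whatever instruction precedes them.
    }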
llvm-svn: 289443
llvm-svn: 289438
Fixes some missed constant folding opportunities and allows us to combine shuffles that end with a logical bit shift.
llvm-svn: 289429
PMULDQ returns the 64-bit result of the signed multiplication of the lower 32 bits of its vXi64 vector inputs; we can lower to it if the sign bits stretch that far.
Differential Revision: https://reviews.llvm.org/D27657
llvm-svn: 289426
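A minimal sketch of the legality check this implies, using the standard SelectionDAG helper (the function name is illustrative):

    #include "llvm/CodeGen/SelectionDAG.h"
    using namespace llvm;

    // If both vXi64 operands carry more than 32 sign bits, each 64-bit
    // lane is a sign-extended 32-bit value, so the multiply can be
    // lowered to PMULDQ of the low halves.
    static bool canUsePMULDQ(SelectionDAG &DAG, SDValue LHS, SDValue RHS) {
      return DAG.ComputeNumSignBits(LHS) > 32 &&
             DAG.ComputeNumSignBits(RHS) > 32;
    }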
Pre-commit as discussed on D27657
llvm-svn: 289425
X86ISD::VZEXT_LOAD opcode.
Disable the peephole pass on some of the tests that no longer require it to fold scalar intrinsics properly.
llvm-svn: 289424
Summary:
These intrinsic instructions are all selected from intrinsics that have well-defined behavior for where the upper bits come from; it's not the same place as the lower bits.
As you can see, we were suppressing load folding for these instructions in some cases. In none of those cases was the separate load helping to avoid a partial dependency on the destination register, so we should just allow the load to be folded.
Only foldMemoryOperand was suppressing folding for these. They all have patterns for folding sse_load_f32/f64 that aren't gated with OptForSize, but sse_load_f32/f64 doesn't allow 128-bit vector loads; it only allows scalar_to_vector and vzmovl of scalar loads to match. There's no reason we can't allow a 128-bit vector load to be narrowed, so I would like to fix sse_load_f32/f64 to allow that. If I do, it changes some of these same test cases to fold the load too.
Reviewers: spatel, zvi, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D27611
llvm-svn: 289419
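A hedged example of the kind of code affected (the function name is hypothetical; the scalar convert below is representative of intrinsics whose upper bits come from a different operand than the lower bits):

    #include <immintrin.h>

    // The conversion only reads the low float, so the load of *p can be
    // folded into cvtss2sd rather than staying a separate movss.
    __m128d cvt_from_mem(__m128d a, const float *p) {
        return _mm_cvtss_sd(a, _mm_load_ss(p));
    }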
llvm-svn: 289407
has multiple uses (for v4i64 and v4f64).
When the load node that the broadcast instruction broadcasts from has multiple uses, it cannot be folded into the broadcast.
A fallback pattern is added to catch these cases and provide another solution.
Differential Revision: https://reviews.llvm.org/D27661
llvm-svn: 289404
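A hedged example of the multi-use case (names are illustrative; assumes an AVX target): the loaded value feeds both the broadcast and a scalar use, so the load cannot be folded into vbroadcastsd and the fallback pattern broadcasts from the register instead.

    #include <immintrin.h>

    __m256d splat_and_keep(const double *p, double *out) {
        double d = *p;            // the load has a second, scalar use
        *out = d + 1.0;
        return _mm256_set1_pd(d); // broadcast of the same loaded value
    }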
Summary:
This change adds some verification in the IR verifier around struct path
TBAA metadata.
Other than some basic sanity checks (e.g. we get constant integers where
we expect constant integers), this checks:
- That by the time a struct access tuple `(base-type, offset)` is
"reduced" to a scalar base type, the offset is `0`. For instance, in
C++ you can't start from, say `("struct-a", 16)`, and end up with
`("int", 4)` -- by the time the base type is `"int"`, the offset
better be zero. In particular, a variant of this invariant is needed
for `llvm::getMostGenericTBAA` to be correct.
- That there are no cycles in a struct path.
- That struct type nodes have their offsets listed in an ascending
order.
- That when generating the struct access path, you eventually reach the
access type listed in the tbaa tag node.
Reviewers: dexonsmith, chandlerc, reames, mehdi_amini, manmanren
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D26438
llvm-svn: 289402
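A hedged C++ illustration of the first invariant (the field offsets are typical, not guaranteed):

    struct S {
        int    i;  // offset 0
        double d;  // offset 8 on common ABIs
    };

    // The struct-path tag for s->d starts from ("S", 8) and steps
    // through S's field list to ("double", 0): by the time the base
    // type is the scalar "double", the remaining offset must be zero.
    double load_d(S *s) { return s->d; }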
llvm-svn: 289400
We have found that -- when the selected subarchitecture has a scheduling model
and we are not optimizing for size -- the machine-instruction combiner uses a
too-simple algorithm to compute the cost of one of the two alternatives [before
and after running a combining pass on a section of code], and therefore it throws
away the combination results too often.
This fix can help any ISA whose instructions can be combined and for which at
least one subarchitecture has a scheduling model. As of now, it is only known to
definitely affect AArch64 subarchitectures with a scheduling model.
Regression tested on AMD64/GNU-Linux; the new test case fails on an unpatched
compiler and passes on a patched compiler.
Patch by Abe Skolnik and Sebastian Pop.
llvm-svn: 289399
llvm-svn: 289398
The regcall calling convention passes mask-type arguments in x86 GPR registers.
The review includes the changes required to support v32i1, v16i1 and v8i1.
Differential Revision: https://reviews.llvm.org/D27148
llvm-svn: 289383
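A hedged example (assuming an AVX-512BW target with mask types from <immintrin.h>; the function is illustrative): with regcall, the v32i1 mask arguments below are passed in GPRs rather than in memory.

    #include <immintrin.h>

    __attribute__((regcall)) __mmask32 combine_masks(__mmask32 a,
                                                     __mmask32 b) {
        return a & b;  // masks arrive and return in x86 GPRs
    }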
This adds CodeGen tests for the AVR C calling convention.
llvm-svn: 289369
llvm-svn: 289362
being able to constant fold them in InstCombineCalls like we do for 128/256-bit.
llvm-svn: 289350
able to constant fold it in InstCombineCalls like we do for 128/256-bit.
llvm-svn: 289344
llvm-svn: 289342
select around the unmasked avx1 intrinsics.
llvm-svn: 289340
vcvttps2uqq when AVX512DQ and AVX512VL are available.
llvm-svn: 289335
from 'large element' scalar/vector to 'small element' vector.
Extension to D27129, which already supported bitcasts from 'small element' vector to 'large element' scalar/vector types.
llvm-svn: 289329
llvm-svn: 289327
There was a bug where we would hit an assertion if 'Q' was used as a
constraint.
I also removed hardcoded register names to prefer regexes so the tests
don't break when the register allocator changes.
llvm-svn: 289325
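A hedged sketch of the trigger (the asm body is intentionally empty; only the target-specific 'Q' memory constraint matters, and the function is illustrative):

    void touch(char *p) {
        // Using the 'Q' memory constraint used to hit an assertion in
        // the backend; it should simply constrain how *p is addressed.
        __asm__ volatile("" :: "Q"(*p));
    }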
This seems to have caused failures on the buildbot.
llvm-svn: 289324
Summary: This gets rid of the hardcoded 'r0' that was used previously.
Reviewers: asl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D27567
llvm-svn: 289322
This would previously trigger an assertion error in AVRISelDAGToDAG.
llvm-svn: 289321
The users of the addrspacecast were having their types incorrectly
changed, producing invalid bitcasts between address spaces.
llvm-svn: 289307
Since 32-bit instructions with 32-bit input immediate behavior
are used to materialize 16-bit constants in 32-bit registers
for 16-bit instructions, determining the legality based
on the size is incorrect. Change operands to have the size
specified in the type.
Also adds a workaround for a disassembler bug that
produces an immediate MCOperand for an operand that
is supposed to be OPERAND_REGISTER.
The assembler appears to accept out-of-bounds immediates and truncate
them, but this seems to be an issue for 32-bit operands already.
llvm-svn: 289306
Some of the immediates need to be printed differently
eventually.
llvm-svn: 289291
llvm-svn: 289279
llvm-svn: 289272
llvm-svn: 289265
Summary:
This frees 2 additional scalar registers.
These are results from all of my 3 patches combined:
Polaris:
Spilled SGPRs: 2231 -> 1517 (-32.00 %)
Tonga:
Spilled SGPRs: 3829 -> 2608 (-31.89 %)
Spilled VGPRs: 100 -> 84 (-16.00 %)
Tonga even spills SGPRs via VGPRs to scratch. That's a compute shader
limited to 64 VGPRs.
Reviewers: tstellarAMD
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye
Differential Revision: https://reviews.llvm.org/D27151
llvm-svn: 289262
Summary: This frees 2 scalar registers.
Reviewers: tstellarAMD
Subscribers: qcolombet, arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye
Differential Revision: https://reviews.llvm.org/D27150
llvm-svn: 289261
Summary:
There is no point in setting SGPRS=104, because VI allocates SGPRs
in multiples of 16, so 104 -> 112. That enables us to use all 102 SGPRs
for general purposes.
Reviewers: tstellarAMD
Subscribers: qcolombet, arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye
Differential Revision: https://reviews.llvm.org/D27149
llvm-svn: 289260
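A quick sketch of the granularity arithmetic (using llvm::alignTo; the helper function is illustrative):

    #include "llvm/Support/MathExtras.h"

    // VI hands out SGPRs in granules of 16, so requesting 104 already
    // reserves alignTo(104, 16) == 112 of them; we may as well allow
    // all of them to be used.
    unsigned allocatedSGPRs(unsigned Requested) {
      return llvm::alignTo(Requested, 16);
    }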
test/CodeGen/MIR should contain tests that intend to test the MIR
printing or parsing. Tests that test something else should be in
test/CodeGen/TargetName even when they are written in .mir.
As a rule of thumb, only tests using "llc -run-pass none" should be in
test/CodeGen/MIR.
llvm-svn: 289254
Reapplied with fix for PR31323 - X86 SSE2 vXi16 multiplies for illegal types were creating CONCAT_VECTORS nodes with vector inputs that might not total the number of elements in the result type.
llvm-svn: 289232
llvm-svn: 289231
Revision: https://reviews.llvm.org/D26547
llvm-svn: 289227
UseAA is enabled."
This reverts commit r289221, which appears to be triggering an assertion.
llvm-svn: 289226
enabled.
Retrying after fixing overly aggressive load-store forwarding optimization.
Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through a
simplified store-merging search which only checks for parallel stores
through the chain subgraph. This is cleaner, as it separates the
non-interfering loads/stores from the store-merging logic.
When merging stores, search up the chain through a single load, and
find all possible stores by looking down through a load and a
TokenFactor to all stores visited. This improves the quality of the
output SelectionDAG and generally the output CodeGen (with some
exceptions).
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across
code paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seemed sufficient to not cause regressions in
tests.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes, as memory operations are now
reorderable. Some tests relying on the order were changed to use
volatile memory operations.
Noteworthy tests:
CodeGen/AArch64/argument-blocks.ll -
It's not entirely clear what the test_varargs_stackalign test is
supposed to be asserting, but the new code looks right.
CodeGen/AArch64/arm64-memset-inline.ll -
CodeGen/AArch64/arm64-stur.ll -
CodeGen/ARM/memset-inline.ll -
The backend now generates *worse* code due to store merging
succeeding, as we do not do a 16-byte constant-zero store efficiently.
CodeGen/AArch64/merge-store.ll -
Improved, but there still seems to be an extraneous vector insert
from an element to itself?
CodeGen/PowerPC/ppc64-align-long-double.ll -
Worse code emitted in this case, due to the improved store->load
forwarding.
CodeGen/X86/dag-merge-fast-accesses.ll -
CodeGen/X86/MergeConsecutiveStores.ll -
CodeGen/X86/stores-merging.ll -
CodeGen/Mips/load-store-left-right.ll -
Restored correct merging of non-aligned stores
CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
Improved. Correctly merges buffer_store_dword calls
CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
Improved. Sidesteps loading a stored value and
merges two stores
CodeGen/X86/pr18023.ll -
This test has been removed, as it was asserting incorrect
behavior. Non-volatile stores *CAN* be moved past volatile loads,
and now are.
CodeGen/X86/vector-idiv.ll -
CodeGen/X86/vector-lzcnt-128.ll -
It's basically impossible to tell what these tests are actually
testing. But, looks like the code got better due to the memory
operations being recognized as non-aliasing.
CodeGen/X86/win32-eh.ll -
Both loads of the securitycookie are now merged.
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel
Differential Revision: https://reviews.llvm.org/D14834
llvm-svn: 289221
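A hedged C++ illustration of the core effect (the function is illustrative): the four adjacent stores below are parallel through the chain subgraph and can now be recognized and merged into one wider store.

    void zero4(int *p) {
        p[0] = 0;  // with less conservative aliasing, these four
        p[1] = 0;  // consecutive 32-bit stores become candidates
        p[2] = 0;  // for merging into a single 128-bit store
        p[3] = 0;
    }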
Summary:
These instructions technically do read from memory, but the memory
is considered to be out of bounds for normal load/store instructions.
shader-db stats:
SGPRS: 1416075 -> 1413323 (-0.19 %)
VGPRS: 867413 -> 863935 (-0.40 %)
Spilled SGPRs: 1409 -> 1354 (-3.90 %)
Spilled VGPRs: 63 -> 63 (0.00 %)
Private memory VGPRs: 880 -> 880 (0.00 %)
Scratch size: 2648 -> 2632 (-0.60 %) dwords per thread
Code Size: 37889052 -> 37897340 (0.02 %) bytes
LDS: 2147 -> 2147 (0.00 %) blocks
Max Waves: 279243 -> 280369 (0.40 %)
Wait states: 0 -> 0 (0.00 %)
Reviewers: nhaehnle, mareko, arsenm
Subscribers: kzhuravl, wdng, yaxunl, tony-tye
Differential Revision: https://reviews.llvm.org/D27593
llvm-svn: 289219
This fixes the build.
llvm-svn: 289201
Adds support for bitcasting a little-endian 'small element' vector to a 'large element' scalar/vector (e.g. v16i8 to v4i32, or v2i32 to i64), which is required for PR30845. We extract the known bits for each 'small element' part and concatenate the results together.
We can add support for big-endian and 'large element' scalar/vector to 'small element' vector bitcasting once we have test cases for them.
Differential Revision: https://reviews.llvm.org/D27129
llvm-svn: 289200
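A hedged sketch of the concatenation step for a little-endian v2i32 -> i64 bitcast (simplified from the real DAG code; names are illustrative):

    #include "llvm/ADT/APInt.h"
    using llvm::APInt;

    // Element 0 supplies bits [31:0] and element 1 bits [63:32] of the
    // i64 result, so the per-element known-zero masks concatenate.
    APInt concatKnownZero(const APInt &Lo, const APInt &Hi) {
      return (Hi.zext(64) << 32) | Lo.zext(64);
    }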
This reverts commit r288916 as it is currently causing a crasher in
Halide. Reproducer on llvm.org/PR31323. While it might be that halide is
generating invalid IR, llc shouldn't crash.
llvm-svn: 289194