bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	[X86] Print all register forms of x87 fadd/fsub/fdiv/fmul as having two ↵	Craig Topper	2019-02-04	5	-12/+12
	arguments where one is %st. All of these instructions consume one encoded register and the other register is %st. They either write the result to %st or the encoded register. Previously we printed both arguments when the encoded register was written, and we printed one argument when the result was written to %st. For the stack popping forms the encoded register is always the destination and we didn't print both operands. This was inconsistent with gcc and objdump and just makes the output assembly code harder to read. This patch changes things to always print both operands, making us consistent with gcc and objdump. The parser should still be able to handle the single register forms just as it did before. This also matches the GNU assembler behavior. llvm-svn: 353061
*	[Intrinsic] Unsigned Fixed Point Multiplication Intrinsic	Leonard Chan	2019-02-04	1	-0/+393
	Add an intrinsic that takes two unsigned integers, with their scale provided as the third argument, and performs fixed-point multiplication on them. This is a part of implementing fixed-point arithmetic in clang, where some of the more complex operations will be implemented as intrinsics. Differential Revision: https://reviews.llvm.org/D55625 llvm-svn: 353059
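As a rough illustration of what unsigned fixed-point multiplication computes, here is a Python sketch. It assumes both operands are `width`-bit unsigned integers with `scale` fractional bits and that low bits are simply truncated; as far as I know the LangRef leaves the rounding direction unspecified, and overflow is not modelled here. `umul_fix` and `to_real` are hypothetical helper names, not LLVM code.

```python
def umul_fix(a: int, b: int, scale: int, width: int) -> int:
    """Multiply two unsigned fixed-point values that share `scale` fractional bits.

    Sketch only: the exact product is formed at full precision, shifted right
    by `scale` (truncating), and the low `width` bits are kept. Overflow of
    the result is not diagnosed.
    """
    mask = (1 << width) - 1
    product = (a & mask) * (b & mask)  # exact product, up to 2 * width bits
    return (product >> scale) & mask


def to_real(x: int, scale: int) -> float:
    """Interpret an unsigned fixed-point value as a real number."""
    return x / (1 << scale)


# 4-bit values with 2 fractional bits: 0b0110 is 1.5 and 0b1000 is 2.0.
r = umul_fix(0b0110, 0b1000, scale=2, width=4)
print(bin(r), to_real(r, 2))  # 0b1100 3.0
```

Forming the product at double width before shifting is what keeps the fractional bits from being lost early; with scale 2, 0.75 * 0.75 (3 * 3) yields 2, i.e. 0.5, because 0.5625 is truncated.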
*	[GlobalISel] Add IRTranslator support for G_FFLOOR	Jessica Paquette	2019-02-04	1	-0/+8
	Follow-up to https://reviews.llvm.org/D57484. Adds G_FFLOOR to translateKnownIntrinsic and updates arm64-irtranslator.ll. Differential Revision: https://reviews.llvm.org/D57485 llvm-svn: 353058
*	[GlobalISel] Introduce a generic floating point floor opcode, G_FFLOOR	Jessica Paquette	2019-02-04	1	-1/+4
	This introduces a generic opcode for floating point floor, working towards selecting @llvm.floor. Differential Revision: https://reviews.llvm.org/D57484 llvm-svn: 353057
*	[X86][SSE] SimplifyDemandedBitsForTargetNode - PCMPGT(0,X) sign mask	Simon Pilgrim	2019-02-04	6	-48/+10
	For PCMPGT(0, X) patterns where we only demand the sign bit (e.g. BLENDV or MOVMSK), we can use X directly. Differential Revision: https://reviews.llvm.org/D57667 llvm-svn: 353051
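Why this is safe can be checked exhaustively on a single 8-bit lane. The Python model below is a sketch with hypothetical helper names (`pcmpgt_lane`, `sign_bit`); it only models the signed greater-than compare of one lane, not the DAG combine itself.

```python
def pcmpgt_lane(a: int, b: int, bits: int = 8) -> int:
    """Model one PCMPGT lane: all-ones if a > b (signed compare), else zero."""
    def signed(v: int) -> int:
        v &= (1 << bits) - 1
        return v - (1 << bits) if v >> (bits - 1) else v
    return (1 << bits) - 1 if signed(a) > signed(b) else 0


def sign_bit(v: int, bits: int = 8) -> int:
    return (v >> (bits - 1)) & 1


# 0 > x (signed) holds exactly when x is negative, i.e. when x's sign bit is
# set, so the sign bit of PCMPGT(0, x) always equals the sign bit of x.
for x in range(256):
    assert sign_bit(pcmpgt_lane(0, x)) == sign_bit(x)
print("sign bit of PCMPGT(0, x) == sign bit of x for all 8-bit x")
```

Users such as BLENDV and MOVMSK only read the top bit of each lane, so when that is all that is demanded, the compare against zero adds nothing.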
*	AMDGPU/GlobalISel: Legalize select for v4s16	Matt Arsenault	2019-02-04	1	-1/+206
	Also add some more select tests to help show future legalization changes. llvm-svn: 353045
*	[DAGCombine] Add ADD(SUB,SUB) combines	Simon Pilgrim	2019-02-04	2	-27/+10
	Noticed while investigating PR40483, and fixes the basic test case from the bug - but not a more general case. We're pretty weak at dealing with ADD/SUB combines compared to the SimplifyAssociativeOrCommutative/SimplifyUsingDistributiveLaws abilities that InstCombine can manage. llvm-svn: 353044
*	[AsmPrinter] Remove hidden flag -print-schedule.	Andrea Di Biagio	2019-02-04	48	-129220/+2425
	This patch removes hidden codegen flag -print-schedule, effectively reverting the logic originally committed as r300311 (https://llvm.org/viewvc/llvm-project?view=revision&revision=300311). Flag -print-schedule was originally introduced by r300311 to address PR32216 (https://bugs.llvm.org/show_bug.cgi?id=32216). That bug was about adding "Better testing of schedule model instruction latencies/throughputs". These days, we can use llvm-mca to test scheduling models. So there is no longer a need for flag -print-schedule in LLVM. The main use case for PR32216 is now addressed by llvm-mca. Flag -print-schedule is mainly used for debugging purposes, and it is only actually used by x86 specific tests. We already have extensive (latency and throughput) tests under "test/tools/llvm-mca" for X86 processor models. That means most (if not all) existing -print-schedule tests for X86 are redundant. When flag -print-schedule was first added to LLVM, several files had to be modified; a few APIs gained new arguments (see for example method MCAsmStreamer::EmitInstruction), and MCSubtargetInfo/TargetSubtargetInfo gained a couple of getSchedInfoStr() methods. Method getSchedInfoStr() had to originally work for both MCInst and MachineInstr. The original implementation of getSchedInfoStr() introduced a subtle layering violation (reported as PR37160 and then fixed/worked-around by r330615). In retrospect, that new API could have been designed better. We can always query MCSchedModel to get the latency and throughput. More importantly, the "sched-info" string should not have been generated by the subtarget. Note, r317782 fixed an issue where "print-schedule" didn't work very well in the presence of inline assembly. That commit is also reverted by this change. Differential Revision: https://reviews.llvm.org/D57244 llvm-svn: 353043
*	[X86] Add a couple of missed ADD combine tests	Simon Pilgrim	2019-02-04	1	-0/+42
	Noticed while investigating PR40483 llvm-svn: 353042
*	[ARM] Mark 255 and 65535 as cheap for Thumb1 "And"	David Green	2019-02-04	1	-16/+10
	This prevents Constant Hoisting from pulling the constant out of the block, allowing us to still produce LDRH/UXTH nodes. LDRB/UXTB (255) is already cheap by the default getIntImmCost, but I've added it for clarity. Differential Revision: https://reviews.llvm.org/D57671 llvm-svn: 353040
*	[ARM] Add testcases for D57671. NFC	David Green	2019-02-04	1	-0/+165
	llvm-svn: 353039
*	Recommit r352660 "[X86] Mark EMMS and FEMMS as clobbering MM0-7 and ST0-7."	Craig Topper	2019-02-04	1	-86/+42
	We now print ST0 as 'st' when generating the clobber list for MS inline assembly in clang. This matches what the gcc reg name list expects. Original commit message: This fixes the test case in PR35982 by preventing MMX instructions that read MM0-7 from being moved below EMMS/FEMMS by the post-RA scheduler. However, as discussed in Bugzilla, this is not a complete fix. There is still the possibility of reordering in IR or by the pre-RA scheduler. Differential Revision: https://reviews.llvm.org/D57298 llvm-svn: 353016
*	[X86] Print %st(0) as %st when it's implicit to the instruction. Continue ↵	Craig Topper	2019-02-04	11	-229/+229
	printing it as %st(0) when it's encoded in the instruction. This is a step back from the change I made in r352985. This appears to be more consistent with gcc and objdump behavior. llvm-svn: 353015
*	[X86] Regenerate test to drop 'End function' comments and some other regex ↵	Craig Topper	2019-02-04	1	-12/+2
	updates. llvm-svn: 353014
*	Revert r352985 "[X86] Print %st(0) as %st to match what gcc inline asm uses ↵	Craig Topper	2019-02-04	16	-318/+328
	as the clobber name to make MS inline asm work correctly" Looking into gcc and objdump behavior more, this was overly aggressive. If the register is encoded in the instruction we should print %st(0); if it's implicit we should print %st. I'll be making a more directed change in a future patch. llvm-svn: 353013
*	[NFC] Make vector types legal in UREM test	Simon Pilgrim	2019-02-03	2	-271/+99
	As discussed in D50222, this changes the vector types in tests required for that revision to ones legal for X86. Patch by @hermord (Dmytro Shynkevych) Differential Revision: https://reviews.llvm.org/D56372 llvm-svn: 353004
*	[PowerPC] adjust test for uaddo change in rL353001	Sanjay Patel	2019-02-03	1	-2/+1
	We don't need a mtctr/bctr for this test now; a regular conditional branch is fine. llvm-svn: 353002
*	[CGP] adjust target constraints for forming uaddo	Sanjay Patel	2019-02-03	2	-33/+30
	There are 2 changes visible here: 1. There's no reason to limit this transform based on the number of condition registers. That diff allows PPC to produce slightly better (dot-instructions should be generally good) code. Note: someone that cares about PPC codegen might want to look closer at that output because it seems like we could still improve this. 2. We (probably?) should not bother trying to form uaddo (or other overflow ops) when there's no target support for such an op. This goes beyond checking whether the op is expanded because both PPC and AArch64 show better codegen for standard types regardless of whether the op is legal/custom. llvm-svn: 353001
*	[X86][AVX] Support shuffle combining for VBROADCAST with smaller vector sources	Simon Pilgrim	2019-02-03	2	-17/+14
	getTargetShuffleMask can only do this safely if we're extracting the lowest subvector from a vector of the same result type. llvm-svn: 352999
*	[PatternMatch] add special-case uaddo matching for increment-by-one (2nd try)	Sanjay Patel	2019-02-03	3	-15/+15
	This is the most important uaddo problem mentioned in PR31754: https://bugs.llvm.org/show_bug.cgi?id=31754 ...but that was overcome in x86 codegen with D57637. That patch also corrects the inc vs. add regressions seen with the previous attempt at this. Still, we want to make this matcher complete, so we can potentially canonicalize the pattern even if it's an 'add 1' operation. Pattern matching, however, shouldn't assume that we have canonicalized IR, so we match 4 commuted variants of uaddo. There's also a test with a crazy type to show that the existing CGP transform based on this matcher is not limited by target legality checks. I'm not sure if the Hexagon diff means the test is no longer testing what it intended to test, but that should be solvable in a follow-up. Differential Revision: https://reviews.llvm.org/D57516 llvm-svn: 352998
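The increment-by-one special case can be sketched on a toy expression form. Everything below (the tuple encoding, `match_uaddo_increment`) is hypothetical and only mirrors the idea of matching commuted variants; it is not LLVM's PatternMatch API. The assumption that the four variants are `(a + 1) == 0`, `(1 + a) == 0` and their operand-swapped comparisons is based on the description above.

```python
from typing import Optional, Tuple, Union

# Toy expression encoding (hypothetical, not LLVM's PatternMatch API):
#   ("add", x, y) and ("icmp_eq", lhs, rhs); ints are constants and
#   strings are opaque SSA values.
Expr = Union[int, str, tuple]


def _add_one_operand(e: Expr) -> Optional[Expr]:
    """If e is (add X, 1) or (add 1, X), return X, else None."""
    if isinstance(e, tuple) and e[0] == "add":
        _, x, y = e
        if y == 1:
            return x
        if x == 1:
            return y
    return None


def match_uaddo_increment(cmp: Expr) -> Optional[Tuple[Expr, Expr]]:
    """Match (a + 1) == 0, (1 + a) == 0, 0 == (a + 1) and 0 == (1 + a).

    On success return (the add expression, a); otherwise None.
    """
    if not (isinstance(cmp, tuple) and cmp[0] == "icmp_eq"):
        return None
    _, lhs, rhs = cmp
    for add, other in ((lhs, rhs), (rhs, lhs)):
        if isinstance(other, int) and other == 0:
            a = _add_one_operand(add)
            if a is not None:
                return add, a
    return None


# Soundness of the special case: in n-bit arithmetic a + 1 wraps to 0
# exactly when the unsigned add overflows, i.e. when a == 2**n - 1.
assert all((((a + 1) % 256) == 0) == (a + 1 > 255) for a in range(256))

forms = [
    ("icmp_eq", ("add", "a", 1), 0),
    ("icmp_eq", ("add", 1, "a"), 0),
    ("icmp_eq", 0, ("add", "a", 1)),
    ("icmp_eq", 0, ("add", 1, "a")),
]
print([match_uaddo_increment(f) is not None for f in forms])  # [True, True, True, True]
```

Matching all commuted forms, rather than one canonical form, is what lets the matcher work on IR that has not been canonicalized yet.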
*	[X86][AVX] Support shuffle combining for VPMOVZX with smaller vector sources	Simon Pilgrim	2019-02-03	1	-8/+19
	llvm-svn: 352997
*	[X86][AVX] More aggressively simplify BROADCAST source operand	Simon Pilgrim	2019-02-03	5	-46/+17
	Aim to use scalar source or lowest 128-bit vector directly. We're still missing some VZMOVL_LOAD combines. llvm-svn: 352994
*	[x86] add CGP uaddo test with weird type; NFC	Sanjay Patel	2019-02-03	1	-0/+19
	There's probably no reason to try this transform for an obviously unsupported op. llvm-svn: 352993
*	[PowerPC] add tests for saturating add; NFC	Sanjay Patel	2019-02-03	1	-0/+787
	This is copied from the existing test files for x86/AArch. llvm-svn: 352987
*	[X86] Print %st(0) as %st to match what gcc inline asm uses as the clobber ↵	Craig Topper	2019-02-03	16	-328/+318
	name to make MS inline asm work correctly Summary: When calculating clobbers for MS style inline assembly we fail if the asm clobbers stack top because we print st(0) and try to pass it through the gcc register name check. This was found when I attempted to make emms/femms clobber all ST registers. If you use emms/femms in MS inline asm we would try to use st(0) as the clobber name, but clang would think that wasn't a valid clobber name. This also matches what objdump disassembly prints. It's also what is printed by gcc -S. Reviewers: RKSimon, rnk, efriedma, spatel, andreadb, lebedev.ri Reviewed By: rnk Subscribers: eraman, gbedwell, lebedev.ri, llvm-commits Differential Revision: https://reviews.llvm.org/D57621 llvm-svn: 352985
*	[X86] Lower ISD::UADDO to use the Z flag instead of C flag when the RHS is a ↵	Craig Topper	2019-02-03	5	-45/+67
	constant 1 to encourage INC formation. Summary: Add an additional combine to combineCarryThroughADD to reverse it back to the C flag to avoid regressions. I believe this catches the cases that D57547 got. Reviewers: RKSimon, spatel Reviewed By: spatel Subscribers: javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57637 llvm-svn: 352984
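Why the Z flag is enough here: for x + 1, the unsigned carry out is set exactly when the result wraps to zero, and x86 INC updates ZF but leaves CF unchanged, so an overflow check that reads ZF can be satisfied by INC. The Python model below is a sketch of 8-bit flag behaviour only (OF, SF and the other flags are omitted); the helper names are hypothetical.

```python
def add_flags(a: int, b: int, bits: int = 8) -> dict:
    """Result, CF and ZF of an n-bit ADD (other flags omitted)."""
    full = a + b
    res = full & ((1 << bits) - 1)
    return {"res": res, "CF": int(full >> bits != 0), "ZF": int(res == 0)}


def inc_flags(a: int, old_cf: int, bits: int = 8) -> dict:
    """INC updates ZF from the result but leaves CF as it was (old_cf)."""
    res = (a + 1) & ((1 << bits) - 1)
    return {"res": res, "CF": old_cf, "ZF": int(res == 0)}


# For every 8-bit x: the carry out of x + 1 equals the zero flag of the
# result, and INC computes that zero flag no matter what CF held before.
for x in range(256):
    add = add_flags(x, 1)
    assert add["CF"] == add["ZF"]
    for stale_cf in (0, 1):
        assert inc_flags(x, stale_cf)["ZF"] == add["CF"]
print("ZF after INC reproduces the carry of x + 1 for all 8-bit x")
```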
*	GlobalISel: Implement widenScalar for G_UNMERGE_VALUES	Matt Arsenault	2019-02-03	2	-23/+260
	For the scalar case only. Also move the similar G_MERGE_VALUES handling to a separate function and clean up to make them look more similar. llvm-svn: 352979
*	GlobalISel: Implement widenScalar for G_EXTRACT vector sources	Matt Arsenault	2019-02-02	1	-0/+132
	Handle the basic element extract case. llvm-svn: 352978
*	AMDGPU/GlobalISel: Legalize icmp for pointer types	Matt Arsenault	2019-02-02	1	-0/+175
	llvm-svn: 352976
*	AMDGPU/GlobalISel: Legalize constant for pointer types	Matt Arsenault	2019-02-02	1	-0/+84
	llvm-svn: 352975
*	AMDGPU/GlobalISel: Legalize select for pointer types	Matt Arsenault	2019-02-02	1	-76/+514
	llvm-svn: 352974
*	GlobalISel: Legalization for inttoptr/ptrtoint	Matt Arsenault	2019-02-02	2	-28/+323
	llvm-svn: 352973
*	[X86] Add another test case for PR40539. NFC	Craig Topper	2019-02-02	1	-0/+37
	llvm-svn: 352967
*	[X86][AVX] Enable INSERT_SUBVECTOR(SRC0, SHUFFLE(SRC1)) shuffle combining	Simon Pilgrim	2019-02-02	10	-240/+398
	Push the insert_subvector up through the shuffle operands to help find more cross-lane shuffles. This exposes a couple of minor issues that will be fixed shortly: missed broadcast folds (we have a mixture of vzext_load lengths that need cleaning up), and in combine-sdiv.ll an AVX1 SimplifyDemandedVectorElts failure (hits max depth due to a couple of extra bitcasts). llvm-svn: 352963
*	[BPF] [BTF] Process FileName with absolute path correctly	Yonghong Song	2019-02-02	1	-0/+81
	In IR, sometimes the following attributes for DIFile may be generated: filename: /home/yhs/test.c, directory: /tmp. The /tmp may represent the working directory of the compilation process. In such cases, since filename is an absolute path, the directory should be ignored by BTF. The filename alone is enough to get the source. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 352952
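The path rule described above can be sketched in a few lines of Python, assuming POSIX-style paths. `btf_source_path` is a hypothetical name; this is not the function used in the BPF backend.

```python
import posixpath


def btf_source_path(directory: str, filename: str) -> str:
    """Pick the source path to record for a DIFile (hypothetical helper).

    If `filename` is already absolute, `directory` (typically the
    compilation working directory) is ignored; otherwise the two are joined.
    """
    if posixpath.isabs(filename):
        return filename
    return posixpath.join(directory, filename)


print(btf_source_path("/tmp", "/home/yhs/test.c"))  # /home/yhs/test.c
print(btf_source_path("/tmp", "test.c"))            # /tmp/test.c
```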
*	[AutoUpgrade] Fix AutoUpgrade for x86.seh.recoverfp	Mandeep Singh Grang	2019-02-02	1	-0/+11
	Summary: This fixes the bug in https://reviews.llvm.org/D56747#inline-502711. Reviewers: efriedma Reviewed By: efriedma Subscribers: javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57614 llvm-svn: 352945
*	Revert "[BPF] [BTF] Process FileName with absolute path correctly"	Yonghong Song	2019-02-01	1	-83/+0
	This reverts commit r352939. Some tests failed. Revert to unblock others. llvm-svn: 352941
*	[BPF] [BTF] Process FileName with absolute path correctly	Yonghong Song	2019-02-01	1	-0/+83
	In IR, sometimes the following attributes for DIFile may be generated: filename: /home/yhs/test.c, directory: /tmp. The /tmp may represent the working directory of the compilation process. In such cases, since filename is an absolute path, the directory should be ignored by BTF. The filename alone is enough to get the source. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 352939
*	[WebAssembly] Add codegen support for the import_field attribute	Dan Gohman	2019-02-01	1	-1/+2
	This adds the LLVM side of https://reviews.llvm.org/D57602 -- the import_field attribute. See that patch for details. Differential Revision: https://reviews.llvm.org/D57603 llvm-svn: 352931
*	[COFF, ARM64] Fix localaddress to handle stack realignment and variable size ↵	Mandeep Singh Grang	2019-02-01	2	-76/+262
	objects Summary: This fixes localaddress to use the correct stack registers for SEH when stack realignment is needed or when variable size objects are present. Reviewers: rnk, efriedma, ssijaric, TomTan Reviewed By: rnk, efriedma Subscribers: javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D57183 llvm-svn: 352923
*	[X86][AVX] Add VMOVDDUP-VPBROADCASTQ execution domain mapping	Simon Pilgrim	2019-02-01	26	-298/+257
	Noticed in D57514. Differential Revision: https://reviews.llvm.org/D57519 llvm-svn: 352922
*	[AMDGPU] Mark test functions with hidden visibility	Scott Linder	2019-02-01	8	-70/+70
	Prepare for a future patch which affects codegen for calls to preemptible functions. Differential Revision: https://reviews.llvm.org/D57605 llvm-svn: 352920
*	[opaque pointer types] Pass value type to LoadInst creation.	James Y Knight	2019-02-01	1	-16/+12
	This cleans up all LoadInst creation in LLVM to explicitly pass the value type rather than deriving it from the pointer's element-type. Differential Revision: https://reviews.llvm.org/D57172 llvm-svn: 352911
*	[DWARF v5] Fix DWARF emitter and consumer to produce/expect a uleb for a ↵	Wolfgang Pieb	2019-02-01	1	-3/+3
	location description's length. Reviewer: davide, JDevliegere Differential Revision: https://reviews.llvm.org/D57550 llvm-svn: 352889
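For reference, ULEB128 (the encoding the emitter and consumer now agree on for the length) stores 7 bits per byte, low-order group first, with the high bit set on every byte except the last. The Python functions below are a plain sketch with hypothetical names; they are not LLVM's own LEB128 helpers.

```python
def encode_uleb128(value: int) -> bytes:
    """Encode a non-negative integer as ULEB128 (7 bits per byte, low first)."""
    if value < 0:
        raise ValueError("ULEB128 encodes non-negative values only")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_uleb128(data: bytes, offset: int = 0) -> tuple:
    """Decode a ULEB128 value; return (value, offset just past it)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, offset


print(encode_uleb128(624485).hex())  # e58e26
```

A short location description therefore costs a single length byte, while a fixed-size field would not adapt to larger lengths.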
*	[AMDGPU] Fix for vector element insertion	Tim Corringham	2019-02-01	6	-34/+42
	Summary: Incorrect code was generated when lowering insertelement operations for vectors with 8 or 16 bit elements. The value being inserted was not adjusted for the position of the element within the 32 bit word, so only the low element within each 32 bit word could receive the intended value. Fixed by simply replicating the value to each element of a congruent vector before the mask and or operation used to update the intended element. A number of affected LIT tests have been updated appropriately. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: llvm-commits, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Tags: #llvm Differential Revision: https://reviews.llvm.org/D57588 llvm-svn: 352885
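A scalar Python model of the fix, assuming 8- or 16-bit elements packed into one 32-bit word. The function names are hypothetical and this is not the AMDGPU lowering code itself; it only contrasts the faulty mask-and-or with the replicate-then-mask-and-or approach described above.

```python
def insert_elt_fixed(word: int, value: int, idx: int, elt_bits: int = 8) -> int:
    """Insert `value` into element `idx` of a packed 32-bit word.

    The value is first replicated into every element (a splat), so the
    mask can select the destination element before OR-ing it in.
    """
    assert 32 % elt_bits == 0 and 0 <= idx < 32 // elt_bits
    elt_mask = (1 << elt_bits) - 1
    value &= elt_mask
    splat = 0
    for i in range(32 // elt_bits):
        splat |= value << (i * elt_bits)
    mask = elt_mask << (idx * elt_bits)
    return (word & ~mask & 0xFFFFFFFF) | (splat & mask)


def insert_elt_buggy(word: int, value: int, idx: int, elt_bits: int = 8) -> int:
    """Faulty variant: the value is neither shifted nor replicated, so only
    element 0 can receive it; other elements are just cleared."""
    mask = ((1 << elt_bits) - 1) << (idx * elt_bits)
    return (word & ~mask & 0xFFFFFFFF) | (value & mask)


print(hex(insert_elt_fixed(0x11223344, 0xAB, 2)))  # 0x11ab3344
print(hex(insert_elt_buggy(0x11223344, 0xAB, 2)))  # 0x11003344
```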
*	[X86][SSE] Use PSLLDQ/PSRLDQ to mask out zeroable ends of a shuffle	Simon Pilgrim	2019-02-01	4	-168/+148
	As suggested on PR40318, this patch uses PSLLDQ/PSRLDQ to lower shuffles to zero out the ends of a vector, leaving a sequential inner section. For pre-SSSE3 we do this for shuffles with zeros at either end (requiring up to 3 shifts), but once PSHUFB is available I've limited this to shuffles with a single zeroable end (2 shifts). Differential Revision: https://reviews.llvm.org/D56784 llvm-svn: 352883
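A byte-level Python model of how whole-register byte shifts can zero both ends of a 16-byte vector while keeping an inner run in place: two shifts when only one end is zeroable, three when both are. The helper names are hypothetical; the model only covers PSLLDQ/PSRLDQ semantics for an inner section that stays at its original position and ignores the backend's shuffle-matching logic.

```python
from typing import List, Tuple


def pslldq(v: List[int], n: int) -> List[int]:
    """Byte shift toward higher indices, zero-filling the low n bytes."""
    return [0] * n + v[: len(v) - n] if n < len(v) else [0] * len(v)


def psrldq(v: List[int], n: int) -> List[int]:
    """Byte shift toward lower indices, zero-filling the high n bytes."""
    return v[n:] + [0] * n if n < len(v) else [0] * len(v)


def keep_inner(v: List[int], lo: int, hi: int) -> Tuple[List[int], List[Tuple[str, int]]]:
    """Zero the bytes outside [lo, hi) using at most three byte shifts."""
    assert len(v) == 16 and 0 <= lo < hi <= 16
    shifts = []
    x = v
    if hi < 16:
        x = pslldq(x, 16 - hi)  # push bytes >= hi off the top
        shifts.append(("pslldq", 16 - hi))
    down = (16 - hi) + lo
    if down:
        x = psrldq(x, down)  # inner section now starts at byte 0
        shifts.append(("psrldq", down))
    if lo:
        x = pslldq(x, lo)  # move it back up to byte lo
        shifts.append(("pslldq", lo))
    return x, shifts


out, used = keep_inner(list(range(1, 17)), 3, 10)
print(out)   # [0, 0, 0, 4, 5, 6, 7, 8, 9, 10, 0, 0, 0, 0, 0, 0]
print(used)  # [('pslldq', 6), ('psrldq', 9), ('pslldq', 3)]
```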
*	[TargetLowering] try harder to determine undef elements of vector binops	Sanjay Patel	2019-02-01	1	-23/+0
	This might be the start of tracking all vector element constants generally if we take it to its logical conclusion, but let's stop here and make sure this is correct/beneficial so far. The affected tests require a convoluted path before they get simplified currently because we don't call SimplifyDemandedVectorElts() from binops directly and don't modify the binop operands directly in SimplifyDemandedVectorElts(). That's why the tests all have a trailing shuffle to induce a chain reaction of transforms. So something like this is happening: 1. Improve the knowledge of undefs in the binop via a SimplifyDemandedVectorElts() call that originates from a shuffle. 2. Transfer that undef knowledge back to the shuffle mask user as more undef lanes. 3. Combine the modified shuffle by calling SimplifyDemandedVectorElts() again. 4. Translate the improved shuffle mask as undemanded lanes of build vector constants causing those to become full undef constants. 5. Simplify the binop now that it has a full undef operand. As we can see from the unchanged 'and' and 'or' tests, tracking undefs alone isn't a full solution. We would need to track zero and all-ones constants to improve those opcodes. We'd probably need to track NaN for FP ops too (assuming we don't have fast-math-flags set). Differential Revision: https://reviews.llvm.org/D57066 llvm-svn: 352880
*	[X86][AVX] Combine INSERT_SUBVECTOR(SRC0, ↵	Simon Pilgrim	2019-02-01	1	-9/+16
	BITCAST(SHUFFLE(EXTRACT_SUBVECTOR(SRC1))) Enable peeking through one-use bitcasts to the subvector shuffle. This still depends on the subvector being the same scalar size, but D57514 has already helped with the more tricky patterns. llvm-svn: 352879
*	[AArch64] Optimize floating point materialization	Adhemerval Zanella	2019-02-01	7	-46/+110
	This patch changes isFPImmLegal to return whether the value can be encoded as the immediate operand of a logical instruction, in addition to checking whether it fits the immediate field of fmov. This optimizes some floating-point materialization, including values used in isinf lowering. Reviewed By: rengolin, efriedma, evandro Differential Revision: https://reviews.llvm.org/D57044 llvm-svn: 352866
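To make the two checks concrete, here is a Python sketch assuming the standard AArch64 encodings: FMOV (immediate) covers +/-(16 + m)/16 * 2^e for m in 0..15 and e in -3..4, and a 64-bit logical immediate is a replicated 2/4/.../64-bit element holding a rotated run of ones. The helper names are hypothetical and this is not the code in isFPImmLegal. It illustrates why +infinity, used when lowering isinf, is not an FMOV immediate while its bit pattern 0x7FF0000000000000 is a valid logical immediate, so it can presumably be built with one integer instruction and then moved to an FP register.

```python
import math
import struct

# Values encodable by the AArch64 FMOV (immediate) 8-bit field.
FMOV_IMM_VALUES = {
    s * (16 + m) / 16 * 2.0 ** e
    for s in (1.0, -1.0)
    for m in range(16)
    for e in range(-3, 5)
}


def is_fmov_imm(x: float) -> bool:
    return x in FMOV_IMM_VALUES


def is_logical_imm64(x: int) -> bool:
    """True if x is a 64-bit AArch64 logical (bitmask) immediate."""
    mask64 = (1 << 64) - 1
    x &= mask64
    if x == 0 or x == mask64:
        return False  # all-zeros and all-ones are not encodable
    size = 64
    # Shrink to the smallest element size that still replicates across x.
    while size > 2:
        half = size // 2
        lo_mask = (1 << half) - 1
        if (x & lo_mask) != ((x >> half) & lo_mask):
            break
        size = half
    emask = (1 << size) - 1
    elem = x & emask
    # Some rotation of the element must have the form 0b0...01...1.
    for r in range(size):
        rot = ((elem >> r) | (elem << (size - r))) & emask
        if (rot & (rot + 1)) == 0:
            return True
    return False


def double_bits(x: float) -> int:
    """Raw IEEE-754 bit pattern of a double."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


inf_bits = double_bits(math.inf)
print(hex(inf_bits), is_fmov_imm(math.inf), is_logical_imm64(inf_bits))
# 0x7ff0000000000000 False True
```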
*	[X86][BdVer2] Transfer delays from the integer to the floating point unit.	Roman Lebedev	2019-02-01	4	-16/+16
	Summary: I'm unable to find this number in the "AMD SOG for family 15h". llvm-exegesis measures the latencies of these instructions as `2`, which matches the latencies specified in "AMD SOG for family 15h". However, if we look at Agner, Microarchitecture, "AMD Bulldozer, Piledriver, Steamroller and Excavator pipeline", "Data delay between different execution domains", the int->ivec transfer is listed as `8`..`10`cy of additional latency. Also, Agner's "Instruction tables", for Piledriver, lists their latencies as `12`, which is consistent with `2cy` from exegesis / AMD SOG + `10cy` transfer delay. An additional data point comes from the fact that Agner's "Instruction tables", for Jaguar, lists their latencies as `8`; and "AMD SOG for family 16h" does state the `+6cy` int->ivec delay, which is consistent with instr latency of `1` or `2`. Reviewers: andreadb, RKSimon, craig.topper Reviewed By: andreadb Subscribers: gbedwell, courbet, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57300 llvm-svn: 352861