bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	[X86] Tag BMI/BMI2/TBM instructions scheduler classes	Simon Pilgrim	2017-12-07	1	-16/+16
\| \| \| \| \| \|	Put these under UNARY/BINOP ALU itinerary classes for now - seems to be a good average value llvm-svn: 320064
*	[X86] Tag LZCNT/TZCNT instructions scheduler classes	Simon Pilgrim	2017-12-07	1	-18/+24
\| \| \| \| \| \|	Tagged as IMUL instructions for a reasonable approximation (ALU tends to be a lot faster) - POPCNT is currently tagged as FAdd which I think should be replaced with IMUL as well llvm-svn: 320051
*	[X86] Tag RDRAND/RDSEED instruction scheduler classes	Simon Pilgrim	2017-12-07	1	-8/+12
\| \| \| \|	llvm-svn: 320045
*	[X86] Tag CLFLUSHOPT with same scheduling behaviour as CLFLUSH	Simon Pilgrim	2017-11-28	1	-2/+3
\| \| \| \|	llvm-svn: 319253
*	Control-Flow Enforcement Technology - Shadow Stack support (LLVM side)	Oren Ben Simhon	2017-11-26	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Shadow stack solution introduces a new stack for return addresses only. The HW has a Shadow Stack Pointer (SSP) that points to the next return address. If we return to a different address, an exception is triggered. The shadow stack is managed using a series of intrinsics that are introduced in this patch as well as the new register (SSP). The intrinsics are mapped to new instruction set that implements CET mechanism. The patch also includes initial infrastructure support for IBT. For more information, please see the following: https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf Differential Revision: https://reviews.llvm.org/D40223 Change-Id: I4daa1f27e88176be79a4ac3b4cd26a459e88fed4 llvm-svn: 318996
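The return-address check described above can be pictured with a small toy model. This is only a sketch: `ToyMachine`, `ControlProtectionFault`, and the idea that a call pushes onto both stacks are illustrative assumptions, not the intrinsics or the SSP register added by the patch.

```python
class ControlProtectionFault(Exception):
    """Raised by this toy model when the two return addresses disagree."""


class ToyMachine:
    def __init__(self):
        self.data_stack = []    # ordinary stack: return addresses live next to program data
        self.shadow_stack = []  # shadow stack: holds return addresses only

    def call(self, return_address):
        # Assumption: a call records the return address on both stacks.
        self.data_stack.append(return_address)
        self.shadow_stack.append(return_address)

    def ret(self):
        # A return pops both copies and faults if they differ.
        target = self.data_stack.pop()
        expected = self.shadow_stack.pop()
        if target != expected:
            raise ControlProtectionFault(
                f"return to {target:#x}, expected {expected:#x}")
        return target


machine = ToyMachine()
machine.call(0x401000)
normal_return = machine.ret()          # copies match, so the return proceeds

machine.call(0x401000)
machine.data_stack[-1] = 0xDEADBEEF    # simulate an overwritten return address
try:
    machine.ret()
    tampered_detected = False
except ControlProtectionFault:
    tampered_detected = True
```

In the model, corrupting only the ordinary stack is enough to trigger the fault, which is the property the commit message attributes to the hardware mechanism.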
*	[x86][icelake]GFNI	Coby Tayree	2017-11-26	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	galois field arithmetic (GF(2^8)) insns: gf2p8affineinvqb gf2p8affineqb gf2p8mulb Differential Revision: https://reviews.llvm.org/D40373 llvm-svn: 318993
*	[X86] Add separate intrinsics for scalar FMA4 instructions.	Craig Topper	2017-11-25	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: These instructions zero the non-scalar part of the lower 128-bits which makes them different than the FMA3 instructions which pass through the non-scalar part of the lower 128-bits. I've only added fmadd because we should be able to derive all other variants using operand negation in the intrinsic header like we do for AVX512. I think there are still some missed negate folding opportunities with the FMA4 instructions in light of this behavior difference that I hadn't noticed before. I've split the tests so that we can use different intrinsics for scalar testing between the two. I just copied the tests split the RUN lines and changed out the scalar intrinsics. fma4-fneg-combine.ll is a new test to make sure we negate the fma4 intrinsics correctly though there are a couple TODOs in it. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D39851 llvm-svn: 318984
*	Avoid unecessary opsize byte in segment move to memory	Nirav Dave	2017-11-21	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Segment moves to memory are always 16-bit. Remove invalid 32 and 64 bit variants. Recommiting with missing clang inline assembly test change. Fixes PR34478. Reviewers: rnk, craig.topper Subscribers: llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D39847 llvm-svn: 318797
*	[X86][LWP] Add missing LWP itinerary class to lwpins instructions	Simon Pilgrim	2017-11-21	1	-2/+2
\| \| \| \| \| \|	It's on all other LWP instructions, but I missed it from lwpins despite its similar scheduling behaviour. llvm-svn: 318751
*	[x86][icelake]BITALG	Coby Tayree	2017-11-21	1	-0/+1
\| \| \| \| \| \| \|	vpopcnt{b,w} Differential Revision: https://reviews.llvm.org/D40213 llvm-svn: 318748
*	[x86][icelake]VNNI	Coby Tayree	2017-11-21	1	-0/+1
\| \| \| \| \| \| \| \| \|	Introducing Vector Neural Network Instructions, consisting of: vpdpbusd{s} vpdpwssd{s} Differential Revision: https://reviews.llvm.org/D40208 llvm-svn: 318746
*	[x86][icelake]vbmi2	Coby Tayree	2017-11-21	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \|	introducing vbmi2, consisting of vpcompress{b,w} vpexpand{b,w} vpsh{l,r}d{w,d,q} vpsh{l,r}dv{w,d,q} Differential Revision: https://reviews.llvm.org/D40206 llvm-svn: 318745
*	[x86][icelake]vpclmulqdq introduction	Coby Tayree	2017-11-21	1	-0/+3
\| \| \| \| \| \| \|	an icelake promotion of pclmulqdq Differential Revision: https://reviews.llvm.org/D40101 llvm-svn: 318741
*	[x86][icelake]VAES introduction	Coby Tayree	2017-11-21	1	-0/+2
\| \| \| \| \| \| \|	an icelake promotion of AES Differential Revision: https://reviews.llvm.org/D40078 llvm-svn: 318740
*	Revert r318678 to fix Clang test	Richard Trieu	2017-11-21	1	-2/+2
\| \| \| \| \| \|	r318678 caused the Clang test CodeGen/ms-inline-asm.c to start failing. llvm-svn: 318710
*	[X86] Avoid unecessary opsize byte in segment move to memory	Nirav Dave	2017-11-20	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Segment moves to memory are always 16-bit. Remove invalid 32 and 64 bit variants. Fixes PR34478. Reviewers: rnk, craig.topper Subscribers: llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D39847 llvm-svn: 318678
*	[X86] Make FeatureAVX512 imply FeatureF16C.	Craig Topper	2017-11-06	1	-1/+0
\| \| \| \| \| \| \| \| \| \|	The EVEX to VEX pass is already assuming this is true under AVX512VL. We had special patterns to use zmm instructions if VLX and F16C weren't available. Instead just make AVX512 imply F16C to make the EVEX to VEX behavior explicitly legal and remove the extra patterns. All known CPUs with AVX512 have F16C so this should be safe for now. llvm-svn: 317521
*	[X86] Make sure we don't create locked inc/dec instructions when the carry ↵	Craig Topper	2017-10-30	1	-0/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	flag is being used. Summary: INC/DEC don't update the carry flag so we need to make sure we don't try to use it. This patch introduces new X86ISD opcodes for locked INC/DEC. Teaches lowerAtomicArithWithLOCK to emit these nodes if INC/DEC is not slow or the function is being optimized for size. An additional flag is added that allows the INC/DEC to be disabled if the caller determines that the carry flag is being requested. The test_sub_1_cmp_1_setcc_ugt test is currently showing this bug. The other test case changes are recovering cases that were regressed in r316860. This should fully fix PR35068 finishing the fix started in r316860. Reviewers: RKSimon, zvi, spatel Reviewed By: zvi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D39411 llvm-svn: 316913
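The flag difference behind this fix can be sketched with a small model of 32-bit arithmetic. The function names are hypothetical and only the carry flag (CF) is modelled; this is not code from the patch.

```python
MASK32 = 0xFFFFFFFF


def add_one(value, carry_in):
    """ADD r32, 1: CF is set when the addition wraps past 2**32 - 1."""
    return (value + 1) & MASK32, 1 if value == MASK32 else 0


def inc(value, carry_in):
    """INC r32: same result as ADD 1, but CF keeps its previous value."""
    return (value + 1) & MASK32, carry_in


def sub_one(value, carry_in):
    """SUB r32, 1: CF is set when the subtraction borrows (value was 0)."""
    return (value - 1) & MASK32, 1 if value == 0 else 0


def dec(value, carry_in):
    """DEC r32: same result as SUB 1, but CF keeps its previous value."""
    return (value - 1) & MASK32, carry_in
```

For example, `sub_one(0, 0)` reports a borrow while `dec(0, 0)` does not, so any later code that reads CF (such as an unsigned comparison folded into the subtraction) would see the wrong answer if DEC were substituted.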
*	[X86] Change RDRAND to use PS instead of TB.	Craig Topper	2017-10-23	1	-3/+3
\| \| \| \| \| \|	Should be no functional change for now. A future disassembler change will prevent disassembling with 0xf2/0xf3. llvm-svn: 316339
*	[X86] Add RDPID instruction for assembler and disassembler.	Craig Topper	2017-10-23	1	-3/+3
\| \| \| \|	llvm-svn: 316332
*	[X86] Remove the SlowBTMem feature flag entirely	Craig Topper	2017-10-15	1	-13/+6
\| \| \| \| \| \|	Turns out we have no patterns on the instructions that were using this feature flag for other reasons. These instructions are slow on all modern CPUs so it seems unlikely that we will spend any effort supporting these instructions going forward. So we might as well just kill off the feature flag and just fix up the comments. llvm-svn: 315862
*	[X86] Add CLWB intrinsic. llvm part	Craig Topper	2017-10-12	1	-2/+2
\| \| \| \|	llvm-svn: 315613
*	[X86] Add new attribute to X86 instructions to enable marking them as "not ↵	Ayman Musa	2017-10-08	1	-24/+30
\| \| \| \| \| \| \| \| \| \| \|	memory foldable" This attribute will be used in a tablegen backend that generated the X86 memory folding tables which will be added in a future pass. Instructions with this attribute unset will be excluded from the full set of X86 instructions available for the pass. Differential Revision: https://reviews.llvm.org/D38027 llvm-svn: 315171
*	[X86] Change register&memory TEST instructions from MRMSrcMem to MRMDstMem	Craig Topper	2017-10-01	1	-8/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Intel documentation shows the memory operand as the first operand. But we currently treat it as the second operand. Conceptually the order doesn't matter since it doesn't write memory. We have aliases to parse with the operands in either order and the isel matching is commutable. For the register&register form order does matter for the assembly parser. PR22995 was previously filed and fixed by changing the register&register form from MRMSrcReg to MRMDestReg to match gas. Ideally the memory form should match by using MRMDestMem. I believe this supersedes D38025 which was trying to switch the register&register form back to pre-PR22995. Reviewers: aymanmus, RKSimon, zvi Reviewed By: aymanmus Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38120 llvm-svn: 314639
*	Revert r314249 "Recommit r314151 "[X86] Make all the NOREX CodeGenOnly ↵	Craig Topper	2017-09-27	1	-7/+10
\| \| \| \| \| \| \| \|	instructions into postRA pseudos like the NOREX version of TEST.""" This caused PR34751 llvm-svn: 314339
*	[X86][AsmParser] fix PR32035	Coby Tayree	2017-09-27	1	-0/+2
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D37473 llvm-svn: 314295
*	Recommit r314151 "[X86] Make all the NOREX CodeGenOnly instructions into ↵	Craig Topper	2017-09-26	1	-10/+7
\| \| \| \| \| \| \| \|	postRA pseudos like the NOREX version of TEST."" The late MOV8rr_NOREX that caused the crash has been removed. llvm-svn: 314249
*	Revert "[X86] Make all the NOREX CodeGenOnly instructions into postRA ↵	Benjamin Kramer	2017-09-26	1	-7/+10
\| \| \| \| \| \| \| \|	pseudos like the NOREX version of TEST." Makes llc crash. This reverts commit r314151. llvm-svn: 314199
*	[X86] Make all the NOREX CodeGenOnly instructions into postRA pseudos like ↵	Craig Topper	2017-09-25	1	-10/+7
\| \| \| \| \| \|	the NOREX version of TEST. llvm-svn: 314151
*	[X86] Remove isel checks for immediate size on floating point compare and ↵	Craig Topper	2017-09-20	1	-8/+0
\| \| \| \| \| \| \| \|	xop compare instructions. NFCI If these checks fail we end up not selecting an instruction at all. So we are already relying on the immediate being checked upstream of isel. So doing the check in isel is just bloat to the isel table. Interestingly, we didn't check on the AVX512 version of the instructions anyway. llvm-svn: 313724
*	[X86] Add NoAVX predicates to the patterns for the legacy encoded PCLMUL and ↵	Craig Topper	2017-09-16	1	-0/+1
\| \| \| \| \| \| \| \|	AES instructions. Previously we were just relying on pattern order to define precedence. Which works, but isn't the best way. llvm-svn: 313471
*	[X86] Move matching of (and (srl/sra, C), (1<<C) - 1) to BEXTR/BEXTRI ↵	Craig Topper	2017-09-12	1	-22/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	instruction to custom isel Recognizing this pattern during DAG combine hides information about the 'and' and the shift from other combines. I think it should be recognized at isel so it's as late as possible. But it can't be done with table based isel because you need to be able to look at both immediates. This patch moves it to custom isel in X86ISelDAGToDAG.cpp. This does break a couple tests in tbm_patterns because we are now emitting an and_flag node or (cmp and, 0) that we don't recognize yet. We already had this problem for several other TBM patterns so I think this is fine and we can address all of them together. I've also fixed a bug where the combine to BEXTR was preventing us from using a trick of zero extending AH to handle extracts of bits 15:8. We might still want to use BEXTR if it enables load folding. But honestly I hope we narrowed the load instead before we got to isel. I think we should probably also support matching BEXTR from (srl/srl (and mask << C), C). But that should be a different patch. Differential Revision: https://reviews.llvm.org/D37592 llvm-svn: 313054
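Why the match needs both immediates can be sketched as follows. The BEXTR control word packs the start bit into bits 7:0 and the field length into bits 15:8, so the shift amount and the mask must be inspected together. The helper names are hypothetical and this is not the code in X86ISelDAGToDAG.cpp.

```python
def bextr(src, control, width=32):
    """Model of BEXTR: extract `length` bits of `src` starting at bit `start`."""
    start = control & 0xFF
    length = (control >> 8) & 0xFF
    if start >= width:
        return 0
    return (src >> start) & ((1 << length) - 1) & ((1 << width) - 1)


def match_shift_and_mask(shift, mask):
    """Return a BEXTR control word for (x >> shift) & mask, or None.

    The mask must be a run of ones starting at bit 0, i.e. (1 << n) - 1.
    """
    if mask == 0 or (mask & (mask + 1)) != 0:
        return None
    return shift | (mask.bit_length() << 8)


control = match_shift_and_mask(4, 0xFF)      # start = 4, length = 8
extracted = bextr(0x12345678, control)       # same value as (0x12345678 >> 4) & 0xFF
```

A mask such as `0xF0` is rejected because it is not of the form `(1 << n) - 1`, which is the shape the commit title names.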
*	[X86] Don't disable slow INC/DEC if optimizing for size	Craig Topper	2017-09-09	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Just because INC/DEC is a little slow on some processors doesn't mean we shouldn't prefer it when optimizing for size. This appears to match gcc behavior. Reviewers: chandlerc, zvi, RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D37177 llvm-svn: 312866
*	[X86] Introduce a new td file to hold patterns some of the non instruction ↵	Craig Topper	2017-09-06	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \|	patterns from SSE and AVX512 This patch moves some of similar non-instruction patterns from X86InstrSSE.td and X86InstrAVX512.td to a common file. This is intended as a starting point. There are many other optimization patterns that exist in both files that we could move here. Differential Revision: https://reviews.llvm.org/D37455 llvm-svn: 312649
*	[X86] Add output register to BTC/BTR/BTS instructions.	Craig Topper	2017-09-03	1	-24/+24
\| \| \| \|	llvm-svn: 312432
*	[X86] Finish the subtarget and predicate implementation of CLWB.	Craig Topper	2017-08-29	1	-0/+4
\| \| \| \| \| \|	We don't have an intrinsic implemented for this instruction yet, but it looked odd that we were missing the accessor method from the subtarget. llvm-svn: 312064
*	Mark Knights Landing as having slow two memory operand instructions	Craig Topper	2017-08-29	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Knights Landing, because it is Atom derived, has slow two memory operand instructions. Mark the Knights Landing CPU model accordingly. Patch by David Zarzycki. Reviewers: craig.topper Reviewed By: craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D37224 llvm-svn: 311979
*	[X86] Allow xacquire/xrelease prefixes	Coby Tayree	2017-08-21	1	-0/+4
\| \| \| \| \| \| \|	Allow those prefixes on assembly code Differential Revision: https://reviews.llvm.org/D36845 llvm-svn: 311309
*	[X86] Use BEXTR/BEXTRI for 64-bit 'and' with a large mask	Craig Topper	2017-08-01	1	-5/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The 64-bit 'and' with immediate instruction only supports a 32-bit immediate. So for larger constants we have to load the constant into a register first. If the immediate happens to be a mask we can use the BEXTRI instruction to perform the masking. We already do something similar using the BZHI instruction from the BMI2 instruction set. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D36129 llvm-svn: 309706
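The decision described above can be sketched as a small classifier. The strategy labels and function names are hypothetical, constants are assumed to be unsigned 64-bit values, and the real selection logic in LLVM considers more cases than this.

```python
def fits_sign_extended_imm32(constant):
    """True if a 64-bit constant equals a 32-bit immediate after sign extension."""
    signed = constant - (1 << 64) if constant >= (1 << 63) else constant
    return -(1 << 31) <= signed < (1 << 31)


def low_mask_length(constant):
    """Number of ones if the constant is (1 << n) - 1, else None."""
    if constant != 0 and (constant & (constant + 1)) == 0:
        return constant.bit_length()
    return None


def plan_and64(constant):
    """Pick a (hypothetical) lowering strategy for a 64-bit 'and' with a constant."""
    if fits_sign_extended_imm32(constant):
        return ("and-imm32", constant)
    length = low_mask_length(constant)
    if length is not None:
        # BEXTR control word: start bit 0, field length in bits 15:8.
        return ("bextr", length << 8)
    return ("materialize-then-and", constant)
```

With these assumptions, a 40-bit mask such as `0xFFFFFFFFFF` maps to a single extract with control `40 << 8`, while an arbitrary wide constant still needs to be loaded into a register first.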
*	[X86] Add pattern to use bzhi for 64-bit 'and' with a mask when there is a ↵	Craig Topper	2017-07-31	1	-0/+4
\| \| \| \| \| \| \| \|	load involved. We already had a pattern without load, but with a load we were falling back to a regular 'and' due to pattern complexity priority. llvm-svn: 309535
*	[X86] Add nopq instruction which is a rex encoded version of nopl for gas ↵	Craig Topper	2017-07-22	1	-0/+4
\| \| \| \| \| \|	compatibility. llvm-svn: 308818
*	[X86] Add register form of NOPL and NOPW for assembler/disassembler.	Craig Topper	2017-07-22	1	-0/+5
\| \| \| \| \| \|	Fixes PR32805. llvm-svn: 308817
*	[X86] Adding FoldGenRegForm helper field (for memory folding tables tableGen ↵	Ayman Musa	2017-05-28	1	-7/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	backend) to X86Inst class and set its value for the relevant instructions. Some register-register instructions can be encoded in 2 different ways, this happens when 2 register operands can be folded (separately). For example if we look at the MOV8rr and MOV8rr_REV, both instructions perform exactly the same operation, but are encoded differently. Here is the relevant information about these instructions from Intel's 64-ia-32-architectures-software-developer-manual:
Opcode   Instruction     Op/En   64-Bit Mode   Compat/Leg Mode   Description
8A /r    MOV r8,r/m8     RM      Valid         Valid             Move r/m8 to r8.
88 /r    MOV r/m8,r8     MR      Valid         Valid             Move r8 to r/m8.
Here we can see that in order to enable the folding of the output and input registers, we had to define 2 "encodings", and as a result we got 2 move 8-bit register-register instructions. In the X86 backend, we define both of these instructions, usually one has a regular name (MOV8rr) while the other has "_REV" suffix (MOV8rr_REV), must be marked with isCodeGenOnly flag and is not emitted from CodeGen. Automatically generating the memory folding tables relies on matching encodings of instructions, but in these cases where we want to map both memory forms of the mov 8-bit (MOV8rm & MOV8mr) to MOV8rr (not to MOV8rr_REV) we have to somehow point from the MOV8rr_REV to the "regular" appropriate instruction which in this case is MOV8rr. This field enables this "pointing" mechanism - which is used in the TableGen backend for generating memory folding tables. Differential Revision: https://reviews.llvm.org/D32683 llvm-svn: 304087
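The two encodings of the same register-to-register move can be made concrete with a short encoder sketch. It covers only the four low byte registers without a REX prefix; the function names are hypothetical and do not correspond to LLVM APIs.

```python
# Register numbers used in the ModRM byte for the low 8-bit registers.
REG8 = {"al": 0, "cl": 1, "dl": 2, "bl": 3}


def modrm(mod, reg, rm):
    """Pack the ModRM byte: mod in bits 7:6, reg in bits 5:3, r/m in bits 2:0."""
    return (mod << 6) | (reg << 3) | rm


def encode_mov8_mr(dst, src):
    """88 /r, MOV r/m8, r8: destination sits in the r/m field, source in reg."""
    return bytes([0x88, modrm(0b11, REG8[src], REG8[dst])])


def encode_mov8_rm(dst, src):
    """8A /r, MOV r8, r/m8: destination sits in the reg field, source in r/m."""
    return bytes([0x8A, modrm(0b11, REG8[dst], REG8[src])])


mr_form = encode_mov8_mr("al", "bl")   # 88 D8
rm_form = encode_mov8_rm("al", "bl")   # 8A C3
```

Both byte sequences perform `mov al, bl`; they differ only in which operand occupies the r/m field, which is the slot a memory operand could be folded into.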
*	[X86] Adding vpopcntd and vpopcntq instructions	Oren Ben Simhon	2017-05-25	1	-0/+2
\| \| \| \| \| \| \| \| \|	AVX512_VPOPCNTDQ is a new feature set that was published by Intel. The patch represents the LLVM side of the addition of two new intrinsic based instructions (vpopcntd and vpopcntq). Differential Revision: https://reviews.llvm.org/D33169 llvm-svn: 303858
*	[globalisel][tablegen] Demote OptForSize/OptForMinSize/ForCodeSize to ↵	Daniel Sanders	2017-05-19	1	-3/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	per-function predicates. Summary: This causes them to be re-computed more often than necessary but resolves objections that were raised post-commit on r301750. Reviewers: qcolombet, ab, t.p.northover, rovka, kristof.beyls Reviewed By: qcolombet Subscribers: igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D32861 llvm-svn: 303418
*	[X86] Add more patterns for BZHI isel	Craig Topper	2017-05-09	1	-0/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds more patterns that a reasonable person might write that can be compiled to BZHI. This adds support for (~0U >> (32 - b)) & a; and a << (32 - b) >> (32 - b); This was inspired by the code in APInt::clearUnusedBits. This can pass an index of 32 to the bzhi instruction which a quick test of Haswell hardware shows will not mask any bits. Though the description text in the Intel manual says the "index is saturated to OperandSize-1". The pseudocode in the same manual indicates no bits will be zeroed for this case. I think this is still missing cases where the subtract portion is an 8-bit operation. Differential Revision: https://reviews.llvm.org/D32616 llvm-svn: 302549
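The two source idioms can be checked against a model of BZHI with 32-bit arithmetic emulated in Python. This is a sketch: `bzhi32` follows the behaviour the message reports for Haswell (an index of 32 masks nothing), the function names are hypothetical, and `b` is restricted to 1..32 because `b = 0` would require a shift by 32, which C leaves undefined for a 32-bit operand.

```python
MASK32 = 0xFFFFFFFF


def bzhi32(src, index):
    """Model of 32-bit BZHI: clear bits [31:index]; index >= 32 clears nothing."""
    if index >= 32:
        return src & MASK32
    return src & ((1 << index) - 1)


def mask_form(a, b):
    """(~0U >> (32 - b)) & a, for 1 <= b <= 32."""
    return (MASK32 >> (32 - b)) & a


def shift_form(a, b):
    """a << (32 - b) >> (32 - b), with the left shift truncated to 32 bits."""
    return ((a << (32 - b)) & MASK32) >> (32 - b)


# Exhaustive over b for one sample value: all three agree.
sample = 0xDEADBEEF
all_agree = all(
    mask_form(sample, b) == shift_form(sample, b) == bzhi32(sample, b)
    for b in range(1, 33)
)
```

The `b = 32` case is the one the message calls out: both idioms leave the value unchanged, so the instruction must also leave it unchanged for the patterns to be valid.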
*	Add extra operand to CALLSEQ_START to keep frame part set up previously	Serge Pavlov	2017-05-09	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Using arguments with attribute inalloca creates problems for verification of machine representation. This attribute instructs the backend that the argument is prepared in stack prior to CALLSEQ_START..CALLSEQ_END sequence (see http://llvm.org/docs/InAlloca.htm for details). Frame size stored in CALLSEQ_START in this case does not count the size of this argument. However CALLSEQ_END still keeps total frame size, as caller can be responsible for cleanup of entire frame. So CALLSEQ_START and CALLSEQ_END keep different frame size and the difference is treated by MachineVerifier as stack error. Currently there is no way to distinguish this case from actual errors. This patch adds additional argument to CALLSEQ_START and its target-specific counterparts to keep size of stack that is set up prior to the call frame sequence. This argument allows MachineVerifier to calculate actual frame size associated with frame setup instruction and correctly process the case of inalloca arguments. The changes made by the patch are: - Frame setup instructions get the second mandatory argument. It affects all targets that use frame pseudo instructions and touched many files although the changes are uniform. - Access to frame properties are implemented using special instructions rather than calls getOperand(N).getImm(). For X86 and ARM such replacement was made previously. - Changes that reflect appearance of additional argument of frame setup instruction. These involve proper instruction initialization and methods that access instruction arguments. - MachineVerifier retrieves frame size using method, which reports sum of frame parts initialized inside frame instruction pair and outside it. The patch implements approach proposed by Quentin Colombet in https://bugs.llvm.org/show_bug.cgi?id=27481#c1. It fixes 9 tests failed with machine verifier enabled and listed in PR27481. Differential Revision: https://reviews.llvm.org/D32394 llvm-svn: 302527
*	Strip trailing whitespace. NFCI.	Simon Pilgrim	2017-05-04	1	-3/+3
\| \| \| \|	llvm-svn: 302192
*	[X86][LWP] Add llvm support for LWP instructions (reapplied).	Simon Pilgrim	2017-05-03	1	-0/+59
\| \| \| \| \| \| \| \| \| \|	This patch adds support for the LightWeight Profiling (LWP) instructions which are available on all AMD Bulldozer class CPUs (bdver1 to bdver4). Reapplied - this time without changing line endings of existing files. Differential Revision: https://reviews.llvm.org/D32769 llvm-svn: 302041
*	Revert rL302028 due to accidental line ending changes.	Simon Pilgrim	2017-05-03	1	-59/+0
\| \| \| \|	llvm-svn: 302038