bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	[NFC] fix trivial typos in comments	Hiroshi Inoue	2018-07-18	2	-4/+4
\| \| \| \|	llvm-svn: 337351
*	Fix build failures from r337347, found by clang	Justin Hibbits	2018-07-18	3	-15/+6
\| \| \| \| \| \| \| \| \| \| \|	* Delete a no-longer-used override, and mark the other getRegisterTypeForCallingConv() as override. * SPE only supports i32, not i64, as the internal type, so simply remove the type check, so that DestReg and Opc are provably always set. GCC 6.4 did not warn about either of the above. llvm-svn: 337350
*	[X86] Remove patterns that mix X86ISD::MOVLHPS/MOVHLPS with v2i64/v2f64 types.	Craig Topper	2018-07-18	2	-33/+0
\| \| \| \| \| \|	X86ISD::MOVLHPS/MOVHLPS should now be emitted with SSE1 only. This means that the v2i64/v2f64 types would be illegal, so we don't need these patterns. llvm-svn: 337349
*	[X86] Generate v2f64 X86ISD::UNPCKL/UNPCKH instead of ↵	Craig Topper	2018-07-18	2	-4/+17
\| \| \| \| \| \| \| \| \| \| \| \|	X86ISD::MOVLHPS/MOVHLPS for unary v2f64 {0,0} and {1,1} shuffles with SSE2. I'm trying to restrict the MOVLHPS/MOVHLPS ISD nodes to SSE1 only. With SSE2 we can use unpcks. I believe this will allow some patterns to be cleaned up to require fewer bitcasts. I've put in an odd isel hack to still select MOVHLPS instruction from the unpckh node to avoid changing tests and because movhlps is a shorter encoding. Ideally we'd do execution domain switching on this, but the operands are in the wrong order and are tied. We might be able to try a commute in the domain switching using custom code. We already support domain switching for UNPCKLPD and MOVLHPS. llvm-svn: 337348
*	Introduce codegen for the Signal Processing Engine	Justin Hibbits	2018-07-18	18	-614/+1323
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The Signal Processing Engine (SPE) is found on NXP/Freescale e500v1, e500v2, and several e200 cores. This adds support targeting the e500v2, as this is more common than the e500v1, and is in SoCs still on the market. This patch is very intrusive because the SPE is binary incompatible with the traditional FPU. After discussing with others, the cleanest solution was to make both SPE and FPU features on top of a base PowerPC subset, so all FPU instructions are now wrapped with HasFPU predicates. Supported by this are: * Code generation following the SPE ABI at the LLVM IR level (calling conventions) * Single- and Double-precision math at the level supported by the APU. Still to do: * Vector operations * SPE intrinsics As this changes the Callee-saved register list order, one test, which tests the precise generated code, was updated to account for the new register order. Reviewed by: nemanjai Differential Revision: https://reviews.llvm.org/D44830 llvm-svn: 337347
*	Complete the SPE instruction set patterns	Justin Hibbits	2018-07-18	6	-225/+562
\| \| \| \| \| \| \| \| \|	This is the lead-up to having SPE codegen. Add the rest of the instructions, along with MC tests. Differential Revision: https://reviews.llvm.org/D44829 llvm-svn: 337346
*	Add PowerPC e500(v2) core scheduler and directives.	Justin Hibbits	2018-07-18	7	-220/+497
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D44828 llvm-svn: 337345
*	[X86] Remove the vector alignment requirement from the patterns added in ↵	Craig Topper	2018-07-17	1	-2/+4
\| \| \| \| \| \| \| \|	r337320. The resulting instruction will only load 64 bits so alignment isn't required. llvm-svn: 337334
*	[X86] Add patterns for folding full vector load into MOVHPS and MOVLPS with ↵	Craig Topper	2018-07-17	2	-16/+25
\| \| \| \| \| \|	SSE1 only. llvm-svn: 337320
*	[x86/SLH] Flesh out the data-invariant instruction table a bit based on ↵	Chandler Carruth	2018-07-17	1	-7/+37
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	feedback from Craig. Summary: The only thing he suggested that I've skipped here is the double-wide multiply instructions. Multiply is an area I'm nervous about there being some hidden data-dependent behavior, and it doesn't seem important for any benchmarks I have, so skipping it and sticking with the minimal multiply support that matches what I know is widely used in existing crypto libraries. We can always add double-wide multiply when we have clarity from vendors about its behavior and guarantees. I've tried to at least cover the fundamentals here with tests, although I've not tried to cover every width or permutation. I can add more tests where folks think it would be helpful. Reviewers: craig.topper Subscribers: sanjoy, mcrosier, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D49413 llvm-svn: 337308
*	[WebAssembly] Update WebAssemblyLowerEmscriptenEHSjLj to handle separate ↵	Sam Clegg	2018-07-17	1	-38/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	compilation Previously we were assuming whole program compilation. Now that separate compilation is a thing we need to update this pass. Firstly, it can no longer assert on the existence of malloc and free. These functions might not be in the current translation unit. If we need them then we will generate imports for them. Secondly, the global helper function we create should be marked as weak since we will be generating a separate copy in each translation unit. Finally, the names of the symbols used must be unique and fixed since they need to agree across translation units. Differential Revision: https://reviews.llvm.org/D49263 llvm-svn: 337301
*	[X86] Remove some standalone patterns in favor of the patterns in the MOVLPD ↵	Craig Topper	2018-07-17	2	-20/+2
\| \| \| \| \| \| \| \|	instruction definitions. Previously we passed 'null_frag' into the instruction definition. The multiclass is shared with MOVHPD which doesn't use null_frag. It turns out by passing X86Movsd it produces patterns equivalent to some standalone patterns. llvm-svn: 337299
*	[AArch64][SVE]: Integer multiply-add/subtract instructions.	Sander de Smalen	2018-07-17	2	-0/+69
\| \| \| \| \| \| \| \| \| \|	This patch adds support for the following instructions: MLA mul-add, writing addend (Zda = Zda + Zn * Zm) MLS mul-sub, writing addend (Zda = Zda + -Zn * Zm) MAD mul-add, writing multiplicand (Zdn = Za + Zdn * Zm) MSB mul-sub, writing multiplicand (Zdn = Za + -Zdn * Zm) llvm-svn: 337293
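The four formulas above can be sketched element-wise as follows. This is only an illustration of the arithmetic: the function names, the list-based vectors, and the `bits` wrap-around parameter are assumptions of the sketch, and predication is ignored.

```python
def _wrap(value, bits):
    # Keep the low `bits` bits, modelling fixed-width integer elements (assumption).
    return value & ((1 << bits) - 1)

def mla(zda, zn, zm, bits=32):
    # Zda = Zda + Zn * Zm (result overwrites the addend)
    return [_wrap(a + n * m, bits) for a, n, m in zip(zda, zn, zm)]

def mls(zda, zn, zm, bits=32):
    # Zda = Zda + -Zn * Zm (result overwrites the addend)
    return [_wrap(a - n * m, bits) for a, n, m in zip(zda, zn, zm)]

def mad(zdn, zm, za, bits=32):
    # Zdn = Za + Zdn * Zm (result overwrites the multiplicand)
    return [_wrap(a + dn * m, bits) for dn, m, a in zip(zdn, zm, za)]

def msb(zdn, zm, za, bits=32):
    # Zdn = Za + -Zdn * Zm (result overwrites the multiplicand)
    return [_wrap(a - dn * m, bits) for dn, m, a in zip(zdn, zm, za)]
```

MLA and MAD compute the same value; they differ only in which source operand is overwritten by the result, and likewise for MLS and MSB.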
*	[Mips][FastISel] Fix handling of icmp with i1 type	Petar Jovanovic	2018-07-17	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \|	The Mips FastISel back-end does not extend i1 values while lowering icmp. Ensure that we bail into DAG ISel when handling this case. Patch by Dragan Mladjenovic. Differential Revision: https://reviews.llvm.org/D49290 llvm-svn: 337288
*	[AArch64][SVE] Asm: FP fused multiply-add/subtract instructions.	Sander de Smalen	2018-07-17	2	-0/+119
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds support for the following instructions: FMLA mul-add, writing addend (Zda = Zda + Zn * Zm) FNMLA negated mul-add, writing addend (Zda = -Zda + -Zn * Zm) FMLS mul-sub, writing addend (Zda = Zda + -Zn * Zm) FNMLS negated mul-sub, writing addend (Zda = -Zda + Zn * Zm) FMAD mul-add, writing multiplicand (Zdn = Za + Zdn * Zm) FNMAD negated mul-add, writing multiplicand (Zdn = -Za + -Zdn * Zm) FMSB mul-sub, writing multiplicand (Zdn = Za + -Zdn * Zm) FNMSB negated mul-sub, writing multiplicand (Zdn = -Za + Zdn * Zm) llvm-svn: 337282
*	[AArch64][SVE] Asm: Support for predicated FP operations (FP immediate)	Sander de Smalen	2018-07-17	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch completes support for the following floating point instructions that take FP immediates: FADD* (addition) FSUB (subtract) FSUBR (subtract reverse form) FMUL* (multiplication) FMAX* (maximum) FMAXNM (maximum number) FMIN (minimum) FMINNM (minimum number) All operations are predicated and take a FP immediate operand, e.g. fadd z0.h, p0/m, z0.h, #0.5 fmin z0.s, p0/m, z0.s, #1.0 ^___________^ (tied) * Instructions added in a previous patch. llvm-svn: 337272
*	[LLVM-C] Add target triple normalization to the C API.	whitequark	2018-07-17	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	rL333307 was introduced to remove automatic target triple normalization when calling sys::getDefaultTargetTriple(), arguing that users of the latter already called Triple::normalize() if necessary. However, users of the C API currently have no way of doing target triple normalization. This patch introduces an LLVMNormalizeTargetTriple function to the C API which wraps Triple::normalize() and can be used on the result of LLVMGetDefaultTargetTriple to achieve the same effect. Differential Revision: https://reviews.llvm.org/D49414 Reviewed By: whitequark llvm-svn: 337263
*	[AArch64][SVE] Asm: Support for predicated FP operations.	Sander de Smalen	2018-07-17	2	-0/+42
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds support for the following floating point instructions: FABD (absolute difference) FADD (addition) FSUB (subtract) FSUBR (subtract reverse form) FDIV (divide) FDIVR (divide reverse form) FMAX (maximum) FMAXNM (maximum number) FMIN (minimum) FMINNM (minimum number) FSCALE (adjust exponent) FMULX (multiply extended) All operations are predicated and binary form, e.g. fadd z0.h, p0/m, z0.h, z1.h ^___________^ (tied) Supporting 16, 32 and 64-bit FP elements. llvm-svn: 337259
*	[DAGCombiner] Call SimplifyDemandedVectorElts from EXTRACT_VECTOR_ELT	Simon Pilgrim	2018-07-17	1	-10/+26
\| \| \| \| \| \| \| \|	If we are only extracting vector elements via EXTRACT_VECTOR_ELT(s) we may be able to use SimplifyDemandedVectorElts to avoid unnecessary vector ops. Differential Revision: https://reviews.llvm.org/D49262 llvm-svn: 337258
*	[AArch64][SVE] Asm: Support for SPLICE instruction.	Sander de Smalen	2018-07-17	2	-0/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The SPLICE instruction splices two vectors into one vector using a predicate. It copies the active elements from the first vector, and then fills the remaining elements with the low-numbered elements from the second vector. The instruction has the following form, e.g. splice z0.b, p0, z0.b, z1.b for 8-bit elements. It also supports 16, 32 and 64-bit elements. llvm-svn: 337253
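A minimal sketch of the SPLICE behaviour described above, using Python lists for the vectors and a list of booleans for the predicate. The function name is hypothetical. One assumption goes beyond the text: the sketch copies the contiguous run from the first to the last active element, which is my understanding of the architectural definition and matches the description whenever the active elements are contiguous.

```python
def splice(pg, zdn, zm):
    """Splice two equal-length vectors under predicate pg (illustrative only)."""
    active = [i for i, p in enumerate(pg) if p]
    if not active:
        # No active elements: the result is filled entirely from the second vector.
        return list(zm)
    # Assumption: take the run from the first to the last active element.
    segment = zdn[active[0]:active[-1] + 1]
    # Fill the remaining lanes with the low-numbered elements of the second vector.
    return (segment + list(zm))[:len(zdn)]
```

For example, with `pg = [False, True, True, False]`, the two middle elements of the first vector land in lanes 0 and 1, and lanes 2 and 3 take the first two elements of the second vector.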
*	[AArch64][SVE] Asm: Support for EXT instruction.	Sander de Smalen	2018-07-17	2	-0/+21
\| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds an instruction that allows extracting a vector from a pair of vectors, given an immediate index that describes the element position to extract from. The instruction has the following assembly: ext z0.b, z0.b, z1.b, #imm where #imm is an immediate between 0 and 255. llvm-svn: 337251
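The extraction described above can be pictured as a sliding window over the concatenation of the two source vectors. The sketch below models byte vectors as Python lists; the function name is hypothetical, and restricting `imm` to be smaller than the vector length is an assumption of the sketch, not something stated in the commit message.

```python
def ext(zdn, zm, imm):
    """Extract len(zdn) bytes starting at byte index imm from the pair zdn:zm."""
    vl = len(zdn)
    if len(zm) != vl:
        raise ValueError("both source vectors must have the same length")
    if not 0 <= imm <= 255:
        raise ValueError("imm must be between 0 and 255")
    if imm >= vl:
        # Assumption: out-of-range positions are simply rejected in this sketch.
        raise ValueError("this sketch only models imm < vector length")
    # Window of vl bytes over the concatenated pair, starting at imm.
    return (list(zdn) + list(zm))[imm:imm + vl]
```

With `imm = 0` the result is the first vector unchanged; larger values shift in low-numbered bytes of the second vector.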
*	[X86] Properly qualify some MOVSS/MOVSD patterns with OptSize.	Craig Topper	2018-07-17	1	-12/+13
\| \| \| \| \| \|	These are integer versions of patterns that I already fixed for floating point. llvm-svn: 337240
*	[Sparc] Do not depend on icc for ta 1	Daniel Cederman	2018-07-17	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	The ta instruction will always trap, regardless of the value of the integer condition codes. TRAPri is marked as using icc, so we cannot use a pattern for TRAPri to implement ta 1, as verify-machineinstrs can complain that icc is not defined. Instead we implement ta 1 the same way as ta 5. llvm-svn: 337236
*	[X86] Add full set of patterns for turning ceil/floor/trunc/rint/nearbyint ↵	Craig Topper	2018-07-17	1	-178/+197
\| \| \| \| \| \| \| \|	into rndscale with loads, broadcast, and masking. This amounts to a pretty ridiculous number of patterns. Ideally we'd canonicalize the X86ISD::VRNDSCALE earlier to reuse those patterns. I briefly looked into doing that, but some strict FP operations could still get converted to rint and nearbyint during isel. It's probably still worthwhile to look into. This patch is meant as a starting point to work from. llvm-svn: 337234
*	[X86] Add a missing FMA3 scalar intrinsic pattern.	Craig Topper	2018-07-16	1	-0/+7
\| \| \| \| \| \|	This allows us to use 231 form to fold an insertelement on the add input to the fma. There is technically no software intrinsic that can use this until AVX512F, but it can be manually built up from other intrinsics. llvm-svn: 337223
*	[WebAssembly] Remove ELF file support.	Sam Clegg	2018-07-16	20	-341/+55
\| \| \| \| \| \| \| \| \|	This support was partial and temporary. Now that we have wasm object file support it's no longer needed. Differential Revision: https://reviews.llvm.org/D48744 llvm-svn: 337222
*	[AMDGPU] Support a fdot2 pattern.	Farhana Aleen	2018-07-16	6	-1/+88
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Optimize fma((float)S0.x, (float)S1.x, fma((float)S0.y, (float)S1.y, z)) -> fdot2((v2f16)S0, (v2f16)S1, (float)z) Author: FarhanaAleen Reviewed By: rampitec, b-sumner Subscribers: AMDGPU Differential Revision: https://reviews.llvm.org/D49146 llvm-svn: 337198
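The rewrite above relies on the nested multiply-adds computing a two-element dot product plus an accumulator. A small sketch of that equivalence, with hypothetical function names, plain Python floats standing in for the f16/f32 values, and rounding differences between fused and unfused arithmetic ignored:

```python
def fma(a, b, c):
    # Stand-in for a fused multiply-add; Python rounds the product separately (assumption).
    return a * b + c

def nested_fma(s0, s1, z):
    # fma(S0.x, S1.x, fma(S0.y, S1.y, z))
    return fma(s0[0], s1[0], fma(s0[1], s1[1], z))

def fdot2(s0, s1, z):
    # Two-element dot product plus accumulator: S0.x*S1.x + S0.y*S1.y + z
    return s0[0] * s1[0] + s0[1] * s1[1] + z
```

For inputs whose intermediate results are exactly representable, both forms return the same value.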
*	[x86/SLH] Completely rework how we sink post-load hardening past data	Chandler Carruth	2018-07-16	1	-24/+184
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	invariant instructions to be both more correct and much more powerful. While testing, I continued to find issues with sinking post-load hardening. Unfortunately, it was amazingly hard to create any useful tests of this because we were mostly sinking across copies and other loading instructions. The fact that we couldn't sink past normal arithmetic was really a big oversight. So first, I've ported roughly the same set of instructions from the data invariant loads to also have their non-loading varieties understood to be data invariant. I've also added a few instructions that came up so often it again made testing complicated: inc, dec, and lea. With this, I was able to shake out a few nasty bugs in the validity checking. We need to restrict to hardening single-def instructions with defined registers that match a particular form: GPRs that don't have a NOREX constraint directly attached to their register class. The (tiny!) test case included catches all of the issues I was seeing (once we can sink the hardening at all) except for the NOREX issue. The only test I have there is horrible. It is large, inexplicable, and doesn't even produce an error unless you try to emit encodings. I can keep looking for a way to test it, but I'm out of ideas really. Thanks to Ben for giving me at least a sanity-check review. I'll follow up with Craig to go over this more thoroughly post-commit, but without it SLH crashes everywhere so landing it for now. Differential Revision: https://reviews.llvm.org/D49378 llvm-svn: 337177
*	[mips] Eliminate the usage of hasStdEnc in MipsPat.	Simon Atanasyan	2018-07-16	7	-161/+206
\| \| \| \| \| \| \| \| \| \| \|	Instead, the pattern is tagged with the correct predicate when it is declared. Some patterns have been duplicated as necessary. Patch by Simon Dardis. Differential revision: https://reviews.llvm.org/D48365 llvm-svn: 337171
*	[MIPS GlobalISel] Select instructions to load and store i32 on stack	Petar Jovanovic	2018-07-16	3	-2/+88
\| \| \| \| \| \| \| \| \| \| \|	Add code for selection of G_LOAD, G_STORE, G_GEP, G_FRAMEINDEX and G_CONSTANT. Support loads and stores of i32 values. Patch by Petar Avramovic. Differential Revision: https://reviews.llvm.org/D48957 llvm-svn: 337168
*	[X86][AArch64][DAGCombine] Unfold 'check for [no] signed truncation' pattern	Roman Lebedev	2018-07-16	2	-0/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: [[ https://bugs.llvm.org/show_bug.cgi?id=38149 \| PR38149 ]] As discussed in https://reviews.llvm.org/D49179#1158957 and later, the IR for 'check for [no] signed truncation' pattern can be improved: https://rise4fun.com/Alive/gBf ^ that pattern will be produced by Implicit Integer Truncation sanitizer, https://reviews.llvm.org/D48958 https://bugs.llvm.org/show_bug.cgi?id=21530 in signed case, therefore it is probably a good idea to improve it. But the IR-optimal pattern does not lower efficiently, so we want to undo it. This handles the simple pattern. There is a second pattern with predicate and constants inverted. NOTE: we do not check uses here; we always do the transform. Reviewers: spatel, craig.topper, RKSimon, javed.absar Reviewed By: spatel Subscribers: kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D49266 llvm-svn: 337166
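The log entry does not spell out the two forms of the check, so the sketch below rests on an assumption: that the compact form is an add followed by an unsigned compare, `(x + 2^(k-1)) u< 2^k`, and the unfolded form is `sext(trunc(x)) == x`. Function names and the bit-width parameters are hypothetical. The code checks that the two forms agree for every 16-bit value truncated to 8 bits.

```python
def fits_via_sext(x, bits, kept):
    """Unfolded form: sign-extend the low `kept` bits and compare with x."""
    low = x & ((1 << kept) - 1)
    if low & (1 << (kept - 1)):
        low -= 1 << kept                      # interpret the kept bits as signed
    return (low & ((1 << bits) - 1)) == x     # re-encode as a `bits`-wide value

def fits_via_add(x, bits, kept):
    """Compact form (assumed): (x + 2^(kept-1)) u< 2^kept, with wrap-around."""
    biased = (x + (1 << (kept - 1))) & ((1 << bits) - 1)
    return biased < (1 << kept)

def forms_agree(bits=16, kept=8):
    # Exhaustively compare both predicates over all `bits`-wide values.
    return all(fits_via_sext(x, bits, kept) == fits_via_add(x, bits, kept)
               for x in range(1 << bits))
```

The "no truncation" answer is true exactly when the value lies in the signed range of the narrower type, for example -128 to 127 when keeping 8 bits.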
*	[Sparc] Use the correct encoding for ta 3	Daniel Cederman	2018-07-16	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The old encoding generated a "tn %g1 + 3" instruction instead of the expected "ta 3". Reviewers: venkatra, jyknight Reviewed By: jyknight Subscribers: fedor.sergeev, jrtc27, llvm-commits Differential Revision: https://reviews.llvm.org/D49171 llvm-svn: 337165
*	[Sparc] Use the names .rem and .urem instead of __modsi3 and __umodsi3	Daniel Cederman	2018-07-16	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: These are the names used in libgcc. Reviewers: venkatra, jyknight, ekedaigle Reviewed By: jyknight Subscribers: joerg, fedor.sergeev, jrtc27, llvm-commits Differential Revision: https://reviews.llvm.org/D48915 llvm-svn: 337164
*	[Sparc] Generate ta 1 for the @llvm.debugtrap intrinsic	Daniel Cederman	2018-07-16	2	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Software trap number one is the trap used for breakpoints in the Sparc ABI. Reviewers: jyknight, venkatra Reviewed By: jyknight Subscribers: fedor.sergeev, jrtc27, llvm-commits Differential Revision: https://reviews.llvm.org/D48637 llvm-svn: 337163
*	[x86/SLH] Fix a bug where we would try to post-load harden non-GPRs.	Chandler Carruth	2018-07-16	1	-13/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Found cases that hit the assert I added. This patch factors the validity checking into a nice helper routine and calls it when deciding to harden post-load, and asserts it when doing so later. I've added tests for the various ways of loading a floating point type, as well as loading all vector permutations. Even though many of these go to identical instructions, it seems good to somewhat comprehensively test them. I'm confident there will be more fixes needed here, I'll try to add tests each time as I get this predicate adjusted. llvm-svn: 337160
*	[x86/SLH] Extract another small helper function, add better comments and	Chandler Carruth	2018-07-16	1	-23/+34
\| \| \| \| \| \|	use better terminology. NFC. llvm-svn: 337157
*	[AMDGPU][Waitcnt] Re-apply fix "comparison of integers of different signs" ↵	Mark Searles	2018-07-16	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	build error" Re-apply "[AMDGPU][Waitcnt] fix "comparison of integers of different signs" build error"" ( fe0a456510131f268e388c4a18a92f575c0db183 ), which was inadvertently reverted via 2b2ee080f0164485562593b1b87291a48cea4a9a . llvm-svn: 337156
*	run post-RA hazard recognizer pass late	Mark Searles	2018-07-16	2	-4/+8
\| \| \| \| \| \| \| \| \| \| \| \| \|	Memory legalizer, waitcnt, and shrink passes can perturb the instructions, which means that the post-RA hazard recognizer pass should run after them. Otherwise, one of those passes may invalidate the work done by the hazard recognizer. Note that this has the adverse side effect that any consecutive S_NOP 0's, emitted by the hazard recognizer, will not be shrunk into a single S_NOP <N>. This should be addressed in a follow-on patch. Differential Revision: https://reviews.llvm.org/D49288 llvm-svn: 337154
*	Revert "[AMDGPU][Waitcnt] fix "comparison of integers of different signs" ↵	Mark Searles	2018-07-16	1	-1/+1
\| \| \| \| \| \| \| \|	build error" This reverts commit fe0a456510131f268e388c4a18a92f575c0db183. llvm-svn: 337153
*	[X86] Merge the FR128 and VR128 regclass since they have identical spill and ↵	Craig Topper	2018-07-16	7	-298/+328
\| \| \| \| \| \| \| \| \| \|	alignment characteristics. This unfortunately requires a bunch of bitcasts to be added to SUBREG_TO_REG, COPY_TO_REGCLASS, and instructions in output patterns. Otherwise tablegen seems to default to picking f128 and then we fail when something tries to get the register class for f128 which isn't always valid. The test changes are because we were previously mixing fr128 and vr128 due to constrainRegClass finding FR128 first and passes like live range shrinking weren't handling that well. llvm-svn: 337147
*	[x86/SLH] Fix an unused variable warning in release builds after	Chandler Carruth	2018-07-16	1	-0/+1
\| \| \| \| \| \|	r337144. llvm-svn: 337145
*	[x86/SLH] Teach speculative load hardening to correctly harden the	Chandler Carruth	2018-07-16	2	-17/+92
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	indices used by AVX2 and AVX-512 gather instructions. The index vector is hardened by broadcasting the predicate state into a vector register and then or-ing. We don't even have to worry about EFLAGS here. I've added a test for all of the gather intrinsics to make sure that we don't miss one. A particularly interesting creation is the gather prefetch, which needs to be marked as potentially "loading" to get the correct behavior. It's a memory access in many ways, and is actually relevant for SLH. Based on discussion with Craig in review, I've moved it to be `mayLoad` and `mayStore` rather than generic side effects. This matches how we model other prefetch instructions. Many thanks to Craig for the review here. Differential Revision: https://reviews.llvm.org/D49336 llvm-svn: 337144
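The broadcast-and-or step described above can be sketched as follows. The sketch assumes the usual convention that the predicate state is all-zeros on a correctly predicted path and all-ones after a misprediction; that convention, the 64-bit lane width, and the function name are assumptions of the sketch rather than details given in the commit message.

```python
LANE_MASK = (1 << 64) - 1  # assumed 64-bit index lanes

def harden_gather_indices(indices, predicate_state):
    """OR a broadcast copy of the predicate state into every index lane."""
    broadcast = [predicate_state & LANE_MASK] * len(indices)
    return [(i | p) & LANE_MASK for i, p in zip(indices, broadcast)]
```

With a zero predicate state the indices pass through unchanged. With an all-ones state every lane becomes all-ones, so a misspeculated gather no longer uses attacker-influenced indices.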
*	[x86/SLH] Extract one of the bits of logic to its own function. NFC.	Chandler Carruth	2018-07-15	1	-43/+48
\| \| \| \| \| \| \|	This is just a refactoring to start cleaning up the code here and make it more readable and approachable. llvm-svn: 337138
*	[X86] Add custom execution domain fixing for 128/256-bit integer logic ↵	Craig Topper	2018-07-15	1	-0/+85
\| \| \| \| \| \| \| \| \| \| \| \|	operations with AVX512F, but not AVX512DQ. AVX512F only has integer domain logic instructions. AVX512DQ added FP domain logic instructions. Execution domain fixing runs before EVEX->VEX. So if we have AVX512F and not AVX512DQ we fail to do execution domain switching of the logic operations. This leads to mismatches in execution domain and more test differences. This patch adds custom domain fixing that switches EVEX integer logic operations to VEX fp logic operations if XMM16-31 are not used. llvm-svn: 337137
*	[X86] Add load patterns for cases where we select X86Movss/X86Movsd to blend ↵	Craig Topper	2018-07-15	1	-0/+32
\| \| \| \| \| \| \| \|	instructions. This allows us to fold the load during isel without waiting for the peephole pass to do it. llvm-svn: 337136
*	[X86] Use 128-bit blends instead of vmovss/vmovsd for 512-bit vzmovl patterns ↵	Craig Topper	2018-07-15	1	-12/+39
\| \| \| \| \| \|	to match AVX. llvm-svn: 337135
*	[X86] Use 128-bit ops for 256-bit vzmovl patterns.	Craig Topper	2018-07-15	1	-10/+17
\| \| \| \| \| \| \| \|	128-bit ops implicitly zero the upper bits. This should address the comment about domain crossing for the integer version without AVX2 since we can use a 128-bit VBLENDW without AVX2. The only bad thing I see here is that we failed to reuse a vxorps in some of the tests, but I think that's an already known issue. llvm-svn: 337134
*	[llvm-mca][BtVer2] teach how to identify false dependencies on partially written	Andrea Di Biagio	2018-07-15	1	-1/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	registers. The goal of this patch is to improve the throughput analysis in llvm-mca for the case where instructions perform partial register writes. On x86, partial register writes are quite difficult to model, mainly because different processors tend to implement different register merging schemes in hardware. When the code contains partial register writes, the IPC (instructions per cycle) estimated by llvm-mca tends to diverge quite significantly from the observed IPC (using perf). Modern AMD processors (at least, from Bulldozer onwards) don't rename partial registers. Quoting Agner Fog's microarchitecture.pdf: " The processor always keeps the different parts of an integer register together. For example, AL and AH are not treated as independent by the out-of-order execution mechanism. An instruction that writes to part of a register will therefore have a false dependence on any previous write to the same register or any part of it." This patch is a first important step towards improving the analysis of partial register updates. It changes the semantics of RegisterFile descriptors in tablegen, and teaches llvm-mca how to identify false dependences in the presence of partial register writes (for more details: see the new code comments in include/Target/TargetSchedule.h - class RegisterFile). This patch doesn't address the case where a write to a part of a register is followed by a read from the whole register. On Intel chips, high8 registers (AH/BH/CH/DH) can be stored in separate physical registers. However, a later (dirty) read of the full register (example: AX/EAX) triggers a merge uOp, which adds extra latency (and potentially affects the pipe usage).
This is a very interesting article on the subject with a very informative answer from Peter Cordes: https://stackoverflow.com/questions/45660139/how-exactly-do-partial-registers-on-haswell-skylake-perform-writing-al-seems-to In future, the definition of RegisterFile can be extended with extra information that may be used to identify delays caused by merge opcodes triggered by a dirty read of a partial write. Differential Revision: https://reviews.llvm.org/D49196 llvm-svn: 337123
*	[AVR] Document some public functions	Dylan McKay	2018-07-15	1	-0/+2
\| \| \| \|	llvm-svn: 337122
*	[X86] Add some optsize patterns for 256-bit X86vzmovl.	Craig Topper	2018-07-15	1	-0/+19
\| \| \| \| \| \|	These patterns use VMOVSS/SD. Without optsize we use BLENDI instead. llvm-svn: 337119