path: root/llvm/lib/Target/X86
Commit message | Author | Age | Files | Lines
...
* Fixing another -Wunused-variable warning, this time in release builds without asserts. NFC. (Aaron Ballman, 2014-12-27, 1 file, -3/+3)
  llvm-svn: 224889
* Removing a variable that is set but never used, to silence a -Wunused-but-set-variable warning; NFC. (Aaron Ballman, 2014-12-27, 1 file, -4/+0)
  llvm-svn: 224888
* [x86] Prevent instruction selection of AVX512 cmp.ps/pd/ss/sd intrinsics with illegal immediates. (Craig Topper, 2014-12-27, 1 file, -15/+18)
  Forgot to do this when I did SSE/SSE2/AVX/AVX2.
  llvm-svn: 224887
* [x86] Assert on invalid immediates in the instruction printer for cmp.ps/pd/ss/sd instead of truncating the immediate. (Craig Topper, 2014-12-27, 2 files, -4/+8)
  The assembly parser and instruction selection shouldn't generate invalid immediates.
  llvm-svn: 224886
* [x86] Prevent llvm.x86.cmp.ps/pd/ss/sd from being selected with bad immediates. (Craig Topper, 2014-12-27, 2 files, -26/+33)
  The frontend now checks this when the builtin is used. This will allow the instruction printer to not have to deal with invalid immediates on these instructions.
  llvm-svn: 224885
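  For context on the three commits above: the condition code on these compares is an immediate that is 3 bits wide (8 predicates) in the legacy SSE encoding and 5 bits wide (32 predicates) under VEX/EVEX. A minimal C sketch of legal usage, assuming AVX and <immintrin.h> (the predicate macro comes from the Intel intrinsics, not from these commits):

      #include <immintrin.h>

      __m128 compare_lt(__m128 a, __m128 b) {
          /* _CMP_LT_OS = 1: less-than, ordered, signaling. Any value
           * outside 0..31 (or 0..7 for the legacy SSE cmpps form) is
           * exactly the kind of immediate these patches reject. */
          return _mm_cmp_ps(a, b, _CMP_LT_OS);
      }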
* [FastIsel][X86] Fix invalid register replacement for bool args (Keno Fischer, 2014-12-27, 1 file, -28/+28)
  Summary: Consider the following IR:

      %3 = load i8* undef
      %4 = trunc i8 %3 to i1
      %5 = call %jl_value_t.0* @foo(..., i1 %4, ...)
      ret %jl_value_t.0* %5

  Bools (that are the result of direct truncs) are lowered as whatever the argument to the trunc was, plus an "and 1", causing the part of the MBB responsible for this argument to look something like this:

      %vreg8<def,tied1> = AND8ri %vreg7<kill,tied0>, 1, %EFLAGS<imp-def>; GR8:%vreg8,%vreg7

  Later, when the load is lowered, it will insert

      %vreg15<def> = MOV8rm %vreg14, 1, %noreg, 0, %noreg; mem:LD1[undef] GR8:%vreg15 GR64:%vreg14

  but remember to (at the end of isel) replace vreg7 by vreg15. Now for the bug: in fast isel lowering, we mistakenly mark vreg8 as the result of the load instead of the trunc. This adds a fixup to have vreg8 replaced by whatever the result of the load is as well, so we end up with

      %vreg15<def,tied1> = AND8ri %vreg15<kill,tied0>, 1, %EFLAGS<imp-def>; GR8:%vreg15

  which is an SSA violation and causes problems later down the road. This fixes PR21557.
  Test Plan: The test case from PR21557 is added to the test suite.
  Reviewers: ributzka
  Reviewed By: ributzka
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D6245
  llvm-svn: 224884
* [X86] Add the debug registers DR8-DR15 so we can assemble and disassemble references to them. (Craig Topper, 2014-12-26, 3 files, -11/+25)
  llvm-svn: 224862
* [X86] Don't fail disassembly if REX.R/REX.B is used on an MMX register. (Craig Topper, 2014-12-26, 2 files, -6/+9)
  Similar to the fix that avoided failing to disassemble CR9-CR15 references.
  llvm-svn: 224861
* Teach disassembler to handle illegal immediates on (v)cmpps/pd/ss/sd instructions. (Craig Topper, 2014-12-26, 4 files, -61/+75)
  Instead of rejecting, we'll just generate the _alt forms that don't try to alter the mnemonic. While I'm here, merge some common code in the instruction printers for the condition code replacement and fix the mask on SSE to be 3 bits instead of 4.
  llvm-svn: 224846
* Use MCPhysReg for table of register encodings. (Craig Topper, 2014-12-26, 1 file, -3/+3)
  llvm-svn: 224845
* Masked Load/Store - Changed the order of parameters in intrinsics. (Elena Demikhovsky, 2014-12-25, 1 file, -3/+3)
  No functional changes. The documentation is coming.
  llvm-svn: 224829
* [X86] Remove the single AdSize indicator and replace it with separate AdSize16/32/64 flags. (Craig Topper, 2014-12-24, 5 files, -93/+107)
  This removes a hardcoded list of instructions in the CodeEmitter. Eventually I intend to remove the predicates on the affected instructions, since in any given mode two of them are valid if we supported addr32/addr16 prefixes in the assembler.
  llvm-svn: 224809
* AVX-512: Added FMA instructions, intrinsics and tests for KNL and SKX targets (Elena Demikhovsky, 2014-12-23, 3 files, -81/+101)
  By Asaf Badouh. http://reviews.llvm.org/D6456
  llvm-svn: 224764
* AVX-512: BLENDM - fixed encoding of the broadcast version (Elena Demikhovsky, 2014-12-23, 2 files, -2/+3)
  Added more intrinsics and encoding tests.
  llvm-svn: 224760
* X86: Don't over-align combined loads. (Jim Grosbach, 2014-12-23, 1 file, -8/+3)
  When combining consecutive loads+inserts into a single vector load, we should keep the alignment of the base load. Doing otherwise can, and does, lead to using overly aligned instructions. In the included test case, for example, using a 32-byte vmovaps on a 16-byte aligned value. Oops.
  rdar://19190968
  llvm-svn: 224746
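  To make the constraint concrete, a hedged C sketch (function name and scenario are illustrative, not from the patch): when two 16-byte loads are merged into one 32-byte load but the base pointer is only 16-byte aligned, the merged load has to stay unaligned, because an aligned 32-byte vmovaps faults on a pointer that is not 32-byte aligned.

      #include <immintrin.h>

      __m256 merged_load(const float *p) {  /* p: 16-byte aligned only */
          /* vmovups: correct, assumes nothing beyond the base alignment */
          return _mm256_loadu_ps(p);
          /* _mm256_load_ps(p) would emit vmovaps and raise #GP here */
      }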
* Make musttail more robust for vector types on x86 (Reid Kleckner, 2014-12-22, 2 files, -100/+107)
  Previously I tried to plug musttail into the existing vararg lowering code. That turned out to be a mistake, because non-vararg calls use significantly different register lowering, even on x86. For example, AVX vectors are usually passed in registers to normal functions and in memory to vararg functions. Now musttail uses a completely separate lowering.
  Hopefully this can be used as the basis for non-x86 perfect forwarding.
  Reviewers: majnemer
  Differential Revision: http://reviews.llvm.org/D6156
  llvm-svn: 224745
* [x86] Add vector @llvm.ctpop intrinsic custom lowering (Bruno Cardoso Lopes, 2014-12-22, 1 file, -0/+152)
  Currently, when ctpop is supported for scalar types, the expansion of @llvm.ctpop.vXiY uses vector element extractions, insertions and individual calls to @llvm.ctpop.iY. When not, expansion with bit-math operations is used for the scalar calls.

  Local Haswell measurements show that we can improve vector @llvm.ctpop.vXiY expansion in some cases by using a vector parallel bit twiddling approach, based on:

      v = v - ((v >> 1) & 0x55555555);
      v = (v & 0x33333333) + ((v >> 2) & 0x33333333);
      v = (v + (v >> 4)) & 0xF0F0F0F;
      v = v + (v >> 8);
      v = v + (v >> 16);
      v = v & 0x0000003F;

  (from http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel)

  When scalar ctpop isn't supported, the approach above performs better for v2i64, v4i32, v4i64 and v8i32 (see numbers below). And even when scalar ctpop is supported, this approach performs ~2x better for v8i32. Here, x86_64 implies -march=corei7-avx without ctpop and x86_64h includes ctpop support with -march=core-avx2.

      == [x86_64h - new] v8i32: 0.661685  v4i32: 0.514678  v4i64: 0.652009  v2i64: 0.324289
      == [x86_64h - old] v8i32: 1.29578   v4i32: 0.528807  v4i64: 0.65981   v2i64: 0.330707
      == [x86_64  - new] v8i32: 1.003     v4i32: 0.656273  v4i64: 1.11711   v2i64: 0.754064
      == [x86_64  - old] v8i32: 2.34886   v4i32: 1.72053   v4i64: 1.41086   v2i64: 1.0244

  More work for other vector types will come next.
  llvm-svn: 224725
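  For reference, a minimal SSE2 sketch of the same bit-twiddling applied per 32-bit lane (an illustration of the technique, not the exact DAG the lowering emits):

      #include <emmintrin.h>

      static __m128i popcnt_epi32(__m128i v) {
          const __m128i m1 = _mm_set1_epi32(0x55555555);
          const __m128i m2 = _mm_set1_epi32(0x33333333);
          const __m128i m4 = _mm_set1_epi32(0x0F0F0F0F);
          /* pairwise bit counts */
          v = _mm_sub_epi32(v, _mm_and_si128(_mm_srli_epi32(v, 1), m1));
          /* nibble counts */
          v = _mm_add_epi32(_mm_and_si128(v, m2),
                            _mm_and_si128(_mm_srli_epi32(v, 2), m2));
          /* byte counts */
          v = _mm_and_si128(_mm_add_epi32(v, _mm_srli_epi32(v, 4)), m4);
          /* horizontal sum within each 32-bit lane */
          v = _mm_add_epi32(v, _mm_srli_epi32(v, 8));
          v = _mm_add_epi32(v, _mm_srli_epi32(v, 16));
          return _mm_and_si128(v, _mm_set1_epi32(0x3F));
      }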
* AVX-512: Added all forms of BLENDM instructions, intrinsics, and encoding tests for AVX-512F and SKX instructions. (Elena Demikhovsky, 2014-12-22, 3 files, -55/+120)
  llvm-svn: 224707
* [X86] Add hasSideEffects = 0 to CALLpcrel16. (Craig Topper, 2014-12-21, 1 file, -4/+5)
  This matches what is inferred from patterns for the 32-bit version.
  llvm-svn: 224692
* [X86] Swap operand order in Intel syntax on a bunch of aliases. (Craig Topper, 2014-12-20, 1 file, -18/+18)
  llvm-svn: 224687
* [X86] Swap operand order of imul aliases in Intel syntax. Also disable printing of the alias instead of the real instruction. (Craig Topper, 2014-12-20, 1 file, -6/+6)
  llvm-svn: 224686
* [X86] Remove '*' from asm strings in far call/jump aliases for Intel syntax. (Craig Topper, 2014-12-20, 1 file, -11/+11)
  llvm-svn: 224685
* [X86] Don't swap the order of segment and offset in immediate form of far call/jump in Intel syntax. (Craig Topper, 2014-12-20, 1 file, -4/+4)
  llvm-svn: 224684
* [X86] Immediate forms of far call/jump are not valid in x86-64. (Craig Topper, 2014-12-20, 1 file, -16/+20)
  llvm-svn: 224678
* Masked load and store codegen - fixed 128-bit vectors (Elena Demikhovsky, 2014-12-19, 3 files, -20/+71)
  The codegen failed on 128-bit types on AVX2. I added patterns in td files and tests.
  llvm-svn: 224647
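  On AVX2, 128-bit masked loads and stores lower to the vpmaskmov family. A rough C sketch of the behavior being exercised (the function and mask choice are illustrative, not from the patch): lanes whose mask sign bit is clear read as zero on load and are left untouched on store.

      #include <immintrin.h>

      void copy_three_ints(const int *src, int *dst) {
          /* lanes 0..2 active, lane 3 inactive */
          __m128i mask = _mm_set_epi32(0, -1, -1, -1);
          __m128i v = _mm_maskload_epi32(src, mask);   /* lane 3 reads 0 */
          _mm_maskstore_epi32(dst, mask, v);           /* lane 3 untouched */
      }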
* Add the ExceptionHandling::MSVC enumeration (Reid Kleckner, 2014-12-19, 2 files, -5/+5)
  It is intended to be used for a family of personality functions that have similar IR preparation requirements. Typically when interoperating with MSVC personality functions, bits of functionality need to be outlined from the main function into helper functions. There is also usually more than one landing pad per invoke, which does not match the LLVM IR landingpad representation.
  None of this is implemented yet. This change just adds a new enum that is active for *-windows-msvc and delegates to the EH removal preparation pass. No functionality change for other targets.
  llvm-svn: 224625
* Model sqrtss as a binary operation with one source operand tied to the destination (PR14221) (Sanjay Patel, 2014-12-19, 1 file, -58/+12)
  This is a continuation of r167064 (http://llvm.org/viewvc/llvm-project?view=revision&revision=167064). That patch started to fix PR14221 (http://llvm.org/bugs/show_bug.cgi?id=14221), but it was not completed.
  Differential Revision: http://reviews.llvm.org/D6330
  llvm-svn: 224624
* [AVX512] Enable FP arithmetic lowering for AVX512VL subsets. (Robert Khasanov, 2014-12-18, 4 files, -2/+105)
  Added RegOp2MemOpTable4 to transform the 4th operand from register to memory in merge-masked versions of instructions. Added lowering tests.
  llvm-svn: 224516
* [X86] Use correct opsize on indirect call and jump aliases. (Craig Topper, 2014-12-18, 1 file, -4/+4)
  llvm-svn: 224497
* [X86] Don't use PS prefix on LDMXCSR/STMXCSR. (Craig Topper, 2014-12-18, 1 file, -6/+8)
  As near as I can tell, prefixes are ignored on these instructions except for a comment in the Intel docs about 0xF3. The binutils disassembler seems to ignore prefixes on these instructions. Our disassembler still doesn't distinguish PS and "no prefix" well enough for this to make a functional change, but it helps with experiments I'm doing on a potential new disassembler table builder.
  llvm-svn: 224496
* [X86] Remove unnecessary 'In64BitMode' predicate for instructions that already indicate use of REX.W. (Craig Topper, 2014-12-18, 1 file, -14/+11)
  llvm-svn: 224495
* [DAGCombine] Slightly improve lowering of BUILD_VECTOR into a shuffle. (Michael Kuperstein, 2014-12-17, 2 files, -0/+12)
  This handles the case of a BUILD_VECTOR being constructed out of elements extracted from a vector twice the size of the result vector. Previously this was always scalarized. Now, we try to construct a shuffle node that feeds on extract_subvectors.
  This fixes PR15872 and provides a partial fix for PR21711.
  Differential Revision: http://reviews.llvm.org/D6678
  llvm-svn: 224429
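  A C analogue of the improvement, assuming AVX (names are illustrative): gathering the even elements of an 8-wide vector into a 4-wide one used to cost four scalar extracts plus four inserts, and can instead be a subvector extract plus one shuffle.

      #include <immintrin.h>

      __m128 even_elements(__m256 v) {
          __m128 lo = _mm256_castps256_ps128(v);    /* elements 0..3, free */
          __m128 hi = _mm256_extractf128_ps(v, 1);  /* elements 4..7 */
          /* pick elements 0,2 of lo and 0,2 of hi -> 0,2,4,6 of v */
          return _mm_shuffle_ps(lo, hi, _MM_SHUFFLE(2, 0, 2, 0));
      }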
* [CodeGenPrepare] Reapply r224351 with a fix for the assertion failure: (Quentin Colombet, 2014-12-17, 1 file, -1/+1)
  The type promotion helper does not support vector types, so it does not kick in in such cases.

  Original commit message:
  [CodeGenPrepare] Move sign/zero extensions near loads using type promotion.
  This patch extends the optimization in CodeGenPrepare that moves a sign/zero extension near a load when the target can combine them. The optimization may promote any operations between the extension and the load to make that possible. Although this optimization may be beneficial for all targets, in particular AArch64, this is enabled for X86 only as I have not benchmarked it for other targets yet.

  ** Context **
  Most targets feature extended loads, i.e., loads that perform a zero or sign extension for free. In that context it is interesting to expose such a pattern in CodeGenPrepare so that the instruction selection pass can form such loads. Sometimes, this pattern is blocked because of instructions between the load and the extension. When those instructions are promotable to the extended type, we can expose this pattern.

  ** Motivating Example **
  Let us consider an example:

      define void @foo(i8* %addr1, i32* %addr2, i8 %a, i32 %b) {
        %ld = load i8* %addr1
        %zextld = zext i8 %ld to i32
        %ld2 = load i32* %addr2
        %add = add nsw i32 %ld2, %zextld
        %sextadd = sext i32 %add to i64
        %zexta = zext i8 %a to i32
        %addza = add nsw i32 %zexta, %zextld
        %sextaddza = sext i32 %addza to i64
        %addb = add nsw i32 %b, %zextld
        %sextaddb = sext i32 %addb to i64
        call void @dummy(i64 %sextadd, i64 %sextaddza, i64 %sextaddb)
        ret void
      }

  As it is, this IR generates the following assembly on x86_64:

      [...]
      movzbl (%rdi), %eax   # zero-extended load
      movl (%rsi), %esi     # plain load
      addl %eax, %esi       # 32-bit add
      movslq %esi, %rdi     # sign extend the result of add
      movzbl %dl, %edx      # zero extend the first argument
      addl %eax, %edx       # 32-bit add
      movslq %edx, %rsi     # sign extend the result of add
      addl %eax, %ecx       # 32-bit add
      movslq %ecx, %rdx     # sign extend the result of add
      [...]

  The throughput of this sequence is 7.45 cycles on Ivy Bridge according to IACA. Now, by promoting the additions to form more extended loads we would generate:

      [...]
      movzbl (%rdi), %eax   # zero-extended load
      movslq (%rsi), %rdi   # sign-extended load
      addq %rax, %rdi       # 64-bit add
      movzbl %dl, %esi      # zero extend the first argument
      addq %rax, %rsi       # 64-bit add
      movslq %ecx, %rdx     # sign extend the second argument
      addq %rax, %rdx       # 64-bit add
      [...]

  The throughput of this sequence is 6.15 cycles on Ivy Bridge according to IACA. This kind of sequence happens a lot on code using 32-bit indexes on 64-bit architectures. Note: the throughput numbers are similar on Sandy Bridge and Haswell.

  ** Proposed Solution **
  To avoid the penalty of all these sign/zero extensions, we merge them into the loads at the beginning of the chain of computation by promoting the whole chain of computation to the extended type. The promotion is done if and only if we do not introduce new extensions, i.e., if we do not degrade the code quality. To achieve this, we extend the existing "move ext to load" optimization with the promotion mechanism introduced to match larger patterns for addressing mode (r200947). The idea of this extension is to perform the following transformation:

      ext(promotableInst1(...(promotableInstN(load))))
      => promotedInst1(...(promotedInstN(ext(load))))

  The promotion mechanism in that optimization is enabled by a new TargetLowering switch, which is off by default. In other words, by default, the optimization performs the "move ext to load" optimization as it was before this patch.

  ** Performance **
  Configuration: x86_64, Ivy Bridge fixed at 2900MHz running OS X 10.10.
  Tested optimization levels: O3/Os. Tests: llvm-testsuite + externals.
  Results:
  - No regression beside noise.
  - Improvements: CINT2006/473.astar ~2%, Benchmarks/PAQ8p ~2%, Misc/perlin ~3%.
  The results are consistent for both O3 and Os.
  <rdar://problem/18310086>
  llvm-svn: 224402
* Revert "[CodeGenPrepare] Move sign/zero extensions near loads using type promotion." (Reid Kleckner, 2014-12-17, 1 file, -1/+1)
  This reverts commit r224351. It causes assertion failures when building ICU.
  llvm-svn: 224397
* [X86][SSE] Vector double -> float conversion memory folding (cvtpd2ps) (Simon Pilgrim, 2014-12-16, 1 file, -0/+3)
  Added a missing memory folding relationship for the (V)CVTPD2PS instruction - we can safely fold these for stack reloads.
  Differential Revision: http://reviews.llvm.org/D6663
  llvm-svn: 224383
* x86-32: PUSHF/POPF use/def EFLAGS (JF Bastien, 2014-12-16, 1 file, -7/+12)
  Summary: As a side-quest for D6629, jvoung pointed out that I should use -verify-machineinstrs and this found a bug in x86-32's handling of EFLAGS for PUSHF/POPF. This patch fixes the use/def, and adds -verify-machineinstrs to all x86 tests which contain 'EFLAGS'. One exception: this patch leaves inline-asm-fpstack.ll as-is because it fails -verify-machineinstrs in a way unrelated to EFLAGS. This patch also modifies cmpxchg-clobber-flags.ll along the lines of what D6629 already does by also testing i386.
  Test Plan: ninja check
  Reviewers: t.p.northover, jvoung
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D6687
  llvm-svn: 224359
* [CodeGenPrepare] Move sign/zero extensions near loads using type promotion. (Quentin Colombet, 2014-12-16, 1 file, -1/+1)
  Original application of the patch, reverted in r224397 and reapplied as r224402 above; the commit message is identical to the one quoted there in full.
  llvm-svn: 224351
* [AVX512] Enable integer arithmetic lowering for AVX512BW/VL subsets. (Robert Khasanov, 2014-12-16, 2 files, -1/+6)
  Added lowering tests.
  llvm-svn: 224349
* combine consecutive subvector 16-byte loads into one 32-byte load (Sanjay Patel, 2014-12-16, 2 files, -0/+44)
  This is a fix for PR21709 (http://llvm.org/bugs/show_bug.cgi?id=21709). When we have 2 consecutive 16-byte loads that are merged into one 32-byte vector, we can use a single 32-byte load instead. But we don't do this for SandyBridge / IvyBridge because they have slower 32-byte memops. We also don't bother using 32-byte *integer* loads on a machine that only has AVX1 (btver2) because those operands would have to be split in half anyway since there is no support for 32-byte integer math ops.
  Differential Revision: http://reviews.llvm.org/D6492
  llvm-svn: 224344
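  A sketch of the pattern this combine targets, in C with AVX intrinsics (the function name is illustrative):

      #include <immintrin.h>

      __m256 load_32_bytes(const float *p) {
          __m128 lo = _mm_loadu_ps(p);      /* bytes 0..15  */
          __m128 hi = _mm_loadu_ps(p + 4);  /* bytes 16..31 */
          /* On most AVX targets this whole sequence can become a single
           * 32-byte vmovups, i.e. _mm256_loadu_ps(p); on Sandy Bridge /
           * Ivy Bridge the two 16-byte loads are kept instead. */
          return _mm256_insertf128_ps(_mm256_castps128_ps256(lo), hi, 1);
      }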
* [AVX512] Add a comment for avx512_broadcast_pat multiclass (Robert Khasanov, 2014-12-16, 1 file, -0/+3)
  llvm-svn: 224341
* X86: Added FeatureVectorUAMem for all AVX architectures. (Elena Demikhovsky, 2014-12-16, 2 files, -16/+10)
  According to the AVX specification: "Most arithmetic and data processing instructions encoded using the VEX prefix and performing memory accesses have more flexible memory alignment requirements than instructions that are encoded without the VEX prefix. Specifically, with the exception of explicitly aligned 16 or 32 byte SIMD load/store instructions, most VEX-encoded, arithmetic and data processing instructions operate in a flexible environment regarding memory address alignment, i.e. VEX-encoded instruction with 32-byte or 16-byte load semantics will support unaligned load operation by default. Memory arguments for most instructions with VEX prefix operate normally without causing #GP(0) on any byte-granularity alignment (unlike Legacy SSE instructions)." The same holds for AVX-512.
  This change does not affect anything right now, because only the "memop pattern fragment" depends on FeatureVectorUAMem and it is not used in AVX patterns. All AVX patterns are based on the "unaligned load" anyway.
  llvm-svn: 224330
* x86: Emit LOCK prefix after DATA16 (JF Bastien, 2014-12-15, 1 file, -4/+6)
  Summary: x86 allows either ordering for the LOCK and DATA16 prefixes, but using GCC+GAS leads to different code generation than using LLVM. This change matches the order that GAS emits the x86 prefixes when a semicolon isn't used in inline assembly (see the tc-i386.c comment before define LOCK_PREFIX), and helps simplify tooling that operates on the instruction's byte sequence (such as NaCl's validator). This change shouldn't have any performance impact.
  Test Plan: ninja check
  Reviewers: craig.topper, jvoung
  Subscribers: jfb, llvm-commits
  Differential Revision: http://reviews.llvm.org/D6630
  llvm-svn: 224283
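  The kind of code where the prefix order shows up, as a hedged C sketch (the function is illustrative): a 16-bit locked read-modify-write needs both the operand-size prefix (0x66) and the LOCK prefix (0xF0); after this change LLVM emits 0x66 before 0xF0, matching GAS, so byte-level tooling sees the same encoding from both assemblers.

      void locked_inc16(unsigned short *p) {
          /* encodes with both DATA16 (0x66) and LOCK (0xF0) prefixes */
          __asm__ __volatile__("lock addw $1, %0" : "+m"(*p));
      }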
* [X86] Also pretty-print shuffle mask for INSERTPS rm variants. (Ahmed Bougacha, 2014-12-15, 1 file, -3/+7)
  llvm-svn: 224260
* [X86] Break false dependencies before partial register updates when the source operand is in memory (Michael Kuperstein, 2014-12-15, 1 file, -0/+20)
  Adds the various "rm" instruction variants into the list of instructions that have a partial register update. Also adds all variants of SQRTSD that were missing in the original list.
  Differential Revision: http://reviews.llvm.org/D6620
  llvm-svn: 224246
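  To illustrate the problem (a hedged sketch; the register choice is arbitrary): the rm form of sqrtsd writes only the low 64 bits of its destination, so it carries a false dependency on whatever last wrote that register. Zeroing the register first, which is the idiom the backend inserts, breaks the chain.

      double sqrt_from_mem(const double *p) {
          double r;
          __asm__("xorps  %%xmm0, %%xmm0\n\t" /* dependency-breaking idiom */
                  "sqrtsd %1, %%xmm0\n\t"     /* rm form: partial update   */
                  "movsd  %%xmm0, %0"
                  : "=m"(r) : "m"(*p) : "xmm0");
          return r;
      }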
* AVX-512: Added EXPAND instructions and intrinsics. (Elena Demikhovsky, 2014-12-15, 4 files, -15/+150)
  llvm-svn: 224241
* Loop Vectorizer minor changes in the code (Elena Demikhovsky, 2014-12-14, 1 file, -5/+5)
  Some comments, function names, indentation.
  Reviewed here: http://reviews.llvm.org/D6527
  llvm-svn: 224218
* [AVX512] Enabling bit logic lowering (Robert Khasanov, 2014-12-12, 2 files, -0/+9)
  Added lowering tests.
  llvm-svn: 224132
* [AVX512] Enabling MIN/MAX lowering. (Robert Khasanov, 2014-12-12, 2 files, -4/+19)
  Added lowering tests.
  llvm-svn: 224127
* [AVX512] Minor fix in lowering pattern for broadcast instructions. (Robert Khasanov, 2014-12-12, 1 file, -6/+5)
  No functional change.
  llvm-svn: 224122
* remove function names from comments; NFC (Sanjay Patel, 2014-12-11, 1 file, -29/+23)
  llvm-svn: 224080