bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[X86][SSE] Share AVX1/AVX2 shuffle tests with AVX512 where possible	Simon Pilgrim	2015-11-17	2	-323/+504
	llvm-svn: 253379
*	[WinEH] Move WinEHFuncInfo from MachineModuleInfo to MachineFunction	Reid Kleckner	2015-11-17	2	-6/+4
	Summary: Now that there is a one-to-one mapping from MachineFunction to WinEHFuncInfo, we don't need to use a DenseMap to select the right WinEHFuncInfo for the current funclet. The main challenge here is that X86WinEHStatePass is an IR pass that doesn't have access to the MachineFunction. I gave it its own WinEHFuncInfo object that it uses to calculate state numbers, which it then throws away. As long as nobody creates or removes EH pads between this pass and SDAG construction, we will get the same state numbers. The other thing X86WinEHStatePass does is to mark the EH registration node. Instead of communicating which alloca was the registration through WinEHFuncInfo, I added the llvm.x86.seh.ehregnode intrinsic. This intrinsic generates no code and simply marks the alloca in use. Reviewers: JCTremoulet Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D14668 llvm-svn: 253378
*	Lower statepoints with multi-def targets.	Pat Gavlin	2015-11-17	1	-1/+19
	Statepoint lowering currently expects that the target method of a statepoint only defines a single value. This precludes using statepoints with ABIs that return values in multiple registers (e.g. the SysV AMD64 ABI). This change adds support for lowering statepoints with multi-def targets. llvm-svn: 253339
*	Use TargetRegisterInfo for printing MachineOperand register comments	Dan Gohman	2015-11-17	2	-2/+2
	Several places in AsmPrinter.cpp print comments describing MachineOperand registers using MCRegisterInfo, which uses MCOperand-oriented names. This doesn't work for targets that use virtual registers exclusively, as WebAssembly does, since virtual registers are represented and printed differently. This patch preserves what seems to be the spirit of r229978, avoiding the use of TM.getSubtargetImpl(), while still using MachineOperand-oriented printing for MachineOperands. Differential Revision: http://reviews.llvm.org/D14709 llvm-svn: 253338
*	AVX512 : regenerate the test file against trunk.	Igor Breger	2015-11-17	1	-117/+478
	Differential Revision: http://reviews.llvm.org/D14742 llvm-svn: 253321
*	Drop prelink support.	Rafael Espindola	2015-11-17	4	-12/+12
	The way prelink used to work was:
	* The compiler decides if a given section only has relocations that are known to point to the same DSO. If so, it names it .data.rel.ro.local<something>.
	* The static linker puts all of these together.
	* The prelinker program assigns addresses to each library and resolves the local relocations.
	There are many problems with this:
	* It is incompatible with address space randomization.
	* The information passed by the compiler is redundant. The linker knows if a given relocation is in the same DSO or not. It could sort by that if so desired.
	* There are newer ways of speeding up DSO (gnu hash for example).
	* Even if we want to implement this again in the compiler, the previous implementation is pretty broken. It talks about relocations that are "resolved by the static linker". If they are resolved, there are none left for the prelinker. What one needs to track is if an expression will require only dynamic relocations that point to the same DSO.
	At this point it looks like the prelinker is an historical curiosity. For example, Fedora has retired it because it failed to build for two releases (http://pkgs.fedoraproject.org/cgit/prelink.git/commit/?id=eb43100a8331d91c801ee3dcdb0a0bb9babfdc1f)
	This patch removes support for it. That is, it stops printing the ".local" sections.
	llvm-svn: 253280
*	[WinEH] Don't let UnwindHelp alias the return address	Reid Kleckner	2015-11-16	2	-0/+50
	On top of that, don't bother allocating and initializing UnwindHelp if we don't have any funclets. Currently we always use RBP as our frame pointer when funclets are present, so this change makes it impossible to come here without any fixed stack objects. Fixes PR25533. llvm-svn: 253245
*	AVX512: Implemented encoding and intrinsics for VMOVSHDUP/VMOVSLDUP instructions.	Igor Breger	2015-11-16	3	-0/+167
	Differential Revision: http://reviews.llvm.org/D14322 llvm-svn: 253185
*	Revert r253160.	Igor Breger	2015-11-15	3	-167/+0
	It introduced a layering violation. Reproducible with BUILD_SHARED_LIBS=ON. llvm-svn: 253163
*	AVX512: Implemented encoding and intrinsics for VMOVSHDUP/VMOVSLDUP instructions.	Igor Breger	2015-11-15	3	-0/+167
	Differential Revision: http://reviews.llvm.org/D14322 llvm-svn: 253160
*	[X86][SSE] Fixed arch/triple and regenerated results.	Simon Pilgrim	2015-11-14	2	-21/+75
	Tidy up before diffs from new patch. llvm-svn: 253144
*	[X86][SSE] Added extra vector truncation tests	Simon Pilgrim	2015-11-14	1	-0/+201
	Baseline comparison to D14588 llvm-svn: 253132
*	[ShrinkWrapping] Disable the optimization for functions with sanitize like attribute.	Quentin Colombet	2015-11-14	1	-0/+40
	Even if the target supports shrink-wrapping, the prologue and epilogue must not move because a crash can happen anywhere and sanitizers need to be able to unwind from the PC of the crash. llvm-svn: 253116
*	[WinEH] Fix ESP management with 32-bit __CxxFrameHandler3	Reid Kleckner	2015-11-13	3	-8/+6
	The C++ EH personality automatically restores ESP from the C++ EH registration node after a catchret. I mistakenly thought it was like SEH, which does not restore ESP. It makes sense for C++ EH to differ from SEH here because SEH does not use funclets for catches, and does not allow catching inside of finally. C++ EH may need to unwind through multiple catch funclets and eventually catchret to some outer funclet. Therefore, the runtime has to keep track of which ESP to use with catchret, rather than having the compiler reload it manually. llvm-svn: 253084
*	[X86][SSE] Combine UNPCKL with vector_shuffle into UNPCKH to save one instruction for sext from v16i8 to v16i16 and v8i16 to v8i32.	Cong Hou	2015-11-13	2	-15/+9
	This patch is enabling combining UNPCKL with vector_shuffle that moves the upper half of a vector into the lower half, into a UNPCKH instruction. For example:
	  t2: v16i8 = vector_shuffle<8,9,10,11,12,13,14,15,u,u,u,u,u,u,u,u> t1, undef:v16i8
	  t3: v16i8 = X86ISD::UNPCKL undef:v16i8, t2
	will be combined to:
	  t3: v16i8 = X86ISD::UNPCKH undef:v16i8, t1
	Differential revision: http://reviews.llvm.org/D14399
	llvm-svn: 253067
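The equivalence claimed in the example above can be checked with a small lane-level model. This is a minimal sketch, assuming a single 128-bit vector of 16 byte lanes, with `None` standing in for an undef lane. The helper names `shuffle`, `unpckl` and `unpckh` are hypothetical and are not LLVM APIs.

```python
# Lane-level model of the combine, assuming one 128-bit vector of 16 bytes.
# None represents an undef lane. All helper names are illustrative only.

def shuffle(mask, vec):
    """Pick lanes of vec by index; a None index yields an undef lane."""
    return [None if m is None else vec[m] for m in mask]

def unpckl(a, b):
    """Interleave the low halves: a0, b0, a1, b1, ..."""
    out = []
    for i in range(len(a) // 2):
        out += [a[i], b[i]]
    return out

def unpckh(a, b):
    """Interleave the high halves: a8, b8, a9, b9, ... (for 16 lanes)."""
    out = []
    for i in range(len(a) // 2, len(a)):
        out += [a[i], b[i]]
    return out

t1 = list(range(100, 116))          # 16 distinct lane values
undef = [None] * 16

# Original pattern: move the upper half down, then unpack the low half.
t2 = shuffle(list(range(8, 16)) + [None] * 8, t1)
before = unpckl(undef, t2)

# Combined pattern: unpack the high half of t1 directly.
after = unpckh(undef, t1)
```

Both sequences produce the same lanes, which is why the shuffle can be dropped.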
*	Add missing triple to WinEH test case	Reid Kleckner	2015-11-13	1	-1/+1
	llvm-svn: 253062
*	[WinEH] Make UnwindHelp a fixed stack object allocated after XMM CSRs	Reid Kleckner	2015-11-13	6	-13/+71
	Now the offset of UnwindHelp in our EH tables and the offset that we store to in the prologue agree. llvm-svn: 253059
*	X86-FMA3: Implemented commute transformations FMA*_Int instructions.	Vyacheslav Klochkov	2015-11-13	2	-48/+303
	It made it possible to apply the memory folding optimization for the 2nd operand of FMA*_Int instructions. Reviewer: Quentin Colombet Differential Revision: http://reviews.llvm.org/D14550 llvm-svn: 252973
*	specify triple and tighten checks using update_llc_test_checks.py	Sanjay Patel	2015-11-12	1	-61/+62
	llvm-svn: 252962
*	[ShrinkWrap] Make sure we do not mess up with EH funclet lowering.	Quentin Colombet	2015-11-12	1	-1/+3
	ShrinkWrapping does not understand exception handling constraints for now, so make sure we do not mess with them by aborting on functions that use EH funclets. llvm-svn: 252917
*	[SDAG] Introduce a new BITREVERSE node along with a corresponding LLVM intrinsic	James Molloy	2015-11-12	1	-0/+22
	Several backends have instructions to reverse the order of bits in an integer. Conceptually matching such patterns is similar to @llvm.bswap, and it was mentioned in http://reviews.llvm.org/D14234 that it would be best if these patterns were matched in InstCombine instead of reimplemented in every different target. This patch introduces an intrinsic @llvm.bitreverse.i* that operates similarly to @llvm.bswap. For plumbing purposes there is also a new ISD node ISD::BITREVERSE, with simple expansion and promotion support. The intention is that InstCombine's BSWAP detection logic will be extended to support BITREVERSE too, and @llvm.bitreverse intrinsics emitted (if the backend supports lowering it efficiently). llvm-svn: 252878
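For readers unfamiliar with the operation, the following is a minimal reference model of what reversing the bits of a fixed-width integer means, next to a byte swap for comparison. It is a sketch only: the function names `bitreverse` and `bswap` are chosen for illustration, and the loop says nothing about how the patch expands ISD::BITREVERSE.

```python
# Reference model of bit reversal versus byte swapping on fixed-width integers.
# Function names are illustrative; this is not the LLVM expansion.

def bitreverse(value: int, width: int) -> int:
    """Reverse the order of the low `width` bits of `value`."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)  # move the lowest bit to the top
        value >>= 1
    return result

def bswap(value: int, width: int) -> int:
    """Reverse the byte order; `width` must be a multiple of 8."""
    return int.from_bytes(value.to_bytes(width // 8, "little"), "big")

reversed_16 = bitreverse(0x1234, 16)
swapped_16 = bswap(0x1234, 16)
```

Bit reversal is its own inverse, so applying it twice returns the original value.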
*	LegalizeDAG: Fix and improve FCOPYSIGN/FABS legalization	Matthias Braun	2015-11-12	1	-4/+1
	- Factor out code to query and modify the sign bit of a floating-point value as an integer. This also works if none of the target's integer types is big enough to hold all bits of the floating-point value.
	- Legalize FABS(x) as FCOPYSIGN(x, 0.0) if FCOPYSIGN is available, otherwise perform bit manipulation on the sign bit. The previous code used "x >u 0 ? x : -x" which is incorrect for x being -0.0! It also takes 34 instructions on ARM Cortex-M4. With this patch we only require 5:
	    vldr d0, LCPI0_0
	    vmov r2, r3, d0
	    lsrs r2, r3, #31
	    bfi r1, r2, #31, #1
	    bx lr
	  (This could be further improved if the compiler would recognize that r2, r3 is zero).
	- Only lower FCOPYSIGN(x, y) = sign(x) ? -FABS(x) : FABS(x) if FABS is available otherwise perform bit manipulation on the sign bit.
	- Perform the sign(x) test by masking out the sign bit and comparing with 0 rather than shifting the sign bit to the highest position and testing for "<s 0". For x86 copysignl (on 80bit values) this gets us:
	    testl $32768, %eax
	  rather than:
	    shlq $48, %rax
	    sets %al
	    testb %al, %al
	Differential Revision: http://reviews.llvm.org/D11172
	llvm-svn: 252839
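The sign-bit manipulation described above can be illustrated on IEEE-754 binary64 values. This is a minimal sketch, assuming 64-bit doubles. The helper names (`to_bits`, `fabs_bits`, `copysign_bits`, `sign_is_set`) are hypothetical and only model the arithmetic, not the legalizer code.

```python
import math
import struct

SIGN_BIT = 1 << 63  # sign bit of an IEEE-754 binary64 value

def to_bits(x: float) -> int:
    """Reinterpret a double as a 64-bit unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def from_bits(bits: int) -> float:
    """Reinterpret a 64-bit unsigned integer as a double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def fabs_bits(x: float) -> float:
    """FABS by clearing the sign bit."""
    return from_bits(to_bits(x) & ~SIGN_BIT)

def copysign_bits(magnitude: float, sign: float) -> float:
    """FCOPYSIGN by combining magnitude bits with the other value's sign bit."""
    return from_bits((to_bits(magnitude) & ~SIGN_BIT) | (to_bits(sign) & SIGN_BIT))

def sign_is_set(x: float) -> bool:
    """Sign test by masking the sign bit and comparing with 0."""
    return (to_bits(x) & SIGN_BIT) != 0
```

Because only the sign bit is touched, zeros and NaNs need no special handling: `fabs_bits(-0.0)` yields `+0.0`, and `sign_is_set` can tell `-0.0` from `+0.0` even though the two compare equal.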
*	[TLS on Darwin] use a different mask for tls calls on x86-64.	Manman Ren	2015-11-12	1	-0/+28
	Calls involved in thread-local variable lookup save more registers than normal calls. rdar://problem/23073171 llvm-svn: 252837
*	[WinEH] Don't forward branches across empty EH pad BBs	Reid Kleckner	2015-11-11	1	-0/+53
	For really simple SEH catchpads, we tried to forward the invoke unwind edge across the empty block. llvm-svn: 252822
*	[WinEH] Only generate UnwindHelp slot for MSVCXX	Joseph Tremoulet	2015-11-11	1	-6/+6
	Summary: Other personalities don't use this special frame slot. Reviewers: majnemer, andrew.w.kaylor, rnk Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D14580 llvm-svn: 252778
*	[X86] Replace LEAs with INC/DEC when profitable	Michael Kuperstein	2015-11-11	2	-1/+35
	If possible and profitable, replace lea %reg, 1(%reg) and lea %reg, -1(%reg) with inc %reg and dec %reg respectively. Patch by: anton.nadolsky@intel.com Differential Revision: http://reviews.llvm.org/D14059 llvm-svn: 252722
*	[WinEH] Insert the MBB for EH_RESTORE after the catchret	Reid Kleckner	2015-11-10	1	-0/+46
	Inserting it before the target block could be bad: we might already have a fallthrough edge to it. llvm-svn: 252670
*	[X86] Do not try to custom-lower sitofp/fptosi in soft-float mode	Michael Kuperstein	2015-11-10	1	-7/+125
	Differential Revision: http://reviews.llvm.org/D14495 llvm-svn: 252621
*	add 'MustReduceDepth' as an objective/cost-metric for the MachineCombiner	Sanjay Patel	2015-11-10	1	-5/+6
	This is one of the problems noted in PR25016: https://llvm.org/bugs/show_bug.cgi?id=25016 and: http://lists.llvm.org/pipermail/llvm-dev/2015-October/090998.html
	The spilling problem is independent and not addressed by this patch.
	The MachineCombiner was doing reassociations that don't improve or even worsen the critical path. This is caused by inclusion of the "slack" factor when calculating the critical path of the original code sequence. If we don't add that, then we have a more conservative cost comparison of the old code sequence vs. a new sequence. The more liberal calculation must be preserved, however, for the AArch64 MULADD patterns because benchmark regressions were observed without that.
	The two failing test cases now have identical asm that does what we want:
	    a + b + c + d ---> (a + b) + (c + d)
	Differential Revision: http://reviews.llvm.org/D13417
	llvm-svn: 252616
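Why the reassociated form is preferable can be seen by comparing the depth of the two expression trees. This is a simplified sketch that assumes every add has unit latency and ignores the "slack" computation the commit discusses. The nested-tuple representation and the `depth` and `count_adds` helpers are hypothetical.

```python
# Compare dependency-chain depth of two association orders of a + b + c + d.
# Assumes unit latency per add; tuples (lhs, rhs) stand for one add each.

def depth(expr) -> int:
    """Length of the longest chain of dependent adds."""
    if isinstance(expr, str):      # a leaf operand is available immediately
        return 0
    lhs, rhs = expr
    return 1 + max(depth(lhs), depth(rhs))

def count_adds(expr) -> int:
    """Total number of add operations in the tree."""
    if isinstance(expr, str):
        return 0
    lhs, rhs = expr
    return 1 + count_adds(lhs) + count_adds(rhs)

serial = ((("a", "b"), "c"), "d")      # ((a + b) + c) + d
balanced = (("a", "b"), ("c", "d"))    # (a + b) + (c + d)
```

Both forms use three adds, but the balanced form has a critical path of two adds instead of three, since `a + b` and `c + d` are independent of each other.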
*	AVX512 : Implemented encoding and DAG lowering for VMOVHPS/PD and VMOVLPS/PD instructions.	Igor Breger	2015-11-10	2	-0/+56
	Differential Revision: http://reviews.llvm.org/D14492 llvm-svn: 252592
*	Support for emitting inline stack probes	Andy Ayers	2015-11-10	2	-1/+143
	For CoreCLR on Windows, stack probes must be emitted as inline sequences that probe successive stack pages between the current stack limit and the desired new stack pointer location. This implements support for the inline expansion on x64. For in-body alloca probes, expansion is done during instruction lowering. For prolog probes, a stub call is initially emitted during prolog creation, and expanded after epilog generation, to avoid complications that arise when introducing new machine basic blocks during prolog and epilog creation. Added a new test case, modified an existing one to exclude non-x64 coreclr (for now). Add test case Fix tests llvm-svn: 252578
*	[WinEH] Don't emit CATCHRET from visitCatchPad	David Majnemer	2015-11-09	5	-12/+8
	Instead, emit a CATCHPAD node which will get selected to a target specific sequence. llvm-svn: 252528
*	specify triple so Windows bots won't be sad	Sanjay Patel	2015-11-09	2	-2/+2
	llvm-svn: 252519
*	[x86] try harder to match bitwise 'or' into an LEA	Sanjay Patel	2015-11-09	3	-25/+15
	The motivation for this patch starts with the epic fail example in PR18007: https://llvm.org/bugs/show_bug.cgi?id=18007 ...unfortunately, this patch makes no difference for that case, but it solves some simpler cases. We'll get there some day. :)
	The current 'or' matching code was using computeKnownBits() via isBaseWithConstantOffset() -> MaskedValueIsZero(), but that's an unnecessarily limited use. We can do more by copying the logic in ValueTracking's haveNoCommonBitsSet(), so we can treat the 'or' as if it was an 'add'. There's a TODO comment here because we should lift the bit-checking logic into a helper function, so it's not duplicated in DAGCombiner.
	An example of the better LEA matching:
	    leal (%rdi,%rdi), %eax
	    andl $1, %esi
	    orl %esi, %eax
	Becomes:
	    andl $1, %esi
	    leal (%rsi,%rdi,2), %eax
	Differential Revision: http://reviews.llvm.org/D13956
	llvm-svn: 252515
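The reason an 'or' may be treated as an 'add' is that the two agree whenever the operands share no set bits, because no carries can occur. The sketch below models that check with known-zero masks and replays the LEA example on 32-bit values. The names `no_common_bits`, `before` and `after` are hypothetical, and the mask test only approximates the idea behind haveNoCommonBitsSet().

```python
MASK32 = 0xFFFFFFFF

def no_common_bits(known_zero_a: int, known_zero_b: int, width: int = 32) -> bool:
    """True if every bit position is known to be zero in at least one operand."""
    full = (1 << width) - 1
    return (known_zero_a | known_zero_b) & full == full

def before(rdi: int, rsi: int) -> int:
    """leal (%rdi,%rdi), %eax ; andl $1, %esi ; orl %esi, %eax"""
    eax = (rdi + rdi) & MASK32
    esi = rsi & 1
    return eax | esi

def after(rdi: int, rsi: int) -> int:
    """andl $1, %esi ; leal (%rsi,%rdi,2), %eax"""
    esi = rsi & 1
    return (esi + rdi * 2) & MASK32

# rdi + rdi always has bit 0 clear; rsi & 1 has every bit except bit 0 clear.
can_fold = no_common_bits(0x00000001, 0xFFFFFFFE)
```

Since the known-zero masks cover all 32 bits between them, the 'or' can be folded into the address computation of a single LEA.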
*	[WinEH] Tweak funclet prologue/epilogue insertion to pass verifier	Reid Kleckner	2015-11-09	4	-8/+8
	For some reason we'd never run MachineVerifier on WinEH code, and you explicitly have to ask for it with llc. I added it to a few test cases to get some coverage. Fixes PR25461. llvm-svn: 252512
*	[WinEH] Update PHIs of CATCHRET successors	David Majnemer	2015-11-08	1	-0/+34
	The TailDuplication machine pass ran across a malformed CFG: a PHI node referred to its predecessor's predecessor instead of its predecessor. This occurred because we split the edge in X86ISelLowering when we processed the CATCHRET but forgot to do something about the PHI nodes. This fixes PR25444. llvm-svn: 252413
*	[X86] Fold (trunc (i32 (zextload i16))) into vbroadcast.	Ahmed Bougacha	2015-11-06	2	-12/+4
	When matching non-LSB-extracting truncating broadcasts, we now insert the necessary SRL. If the scalar resulted from a load, the SRL will be folded into it, creating a narrower, offset, load. However, i16 loads aren't Desirable, so we get i16->i32 zextloads. We already catch i16 aextloads; catch these as well. llvm-svn: 252363
*	[X86] SRL non-LSB extracts when folding to truncating broadcasts.	Ahmed Bougacha	2015-11-06	4	-58/+110
	Now that we recognize this, we can support it instead of bailing out. That is, we can fold:
	    (v8i16 (shufflevector (v8i16 (bitcast (v4i32 (build_vector X, Y, ...)))), <1,1,...,1>))
	into:
	    (v8i16 (vbroadcast (i16 (trunc (srl Y, 16)))))
	llvm-svn: 252362
*	[X86] Don't fold non-LSB extracts into truncating broadcasts.	Ahmed Bougacha	2015-11-06	4	-0/+396
	We used to incorrectly assume that the offset we're extracting from was a multiple of the element size. So, we'd fold:
	    (v8i16 (shufflevector (v8i16 (bitcast (v4i32 (build_vector X, Y, ...)))), <1,1,...,1>))
	into:
	    (v8i16 (vbroadcast (i16 (trunc Y))))
	whereas we should have extracted the higher bits from X. Instead, bail out if the assumption doesn't hold.
	llvm-svn: 252361
*	[ShrinkWrapping] Teach shrink-wrapping how to analyze RegMask.	Quentin Colombet	2015-11-06	1	-0/+59
	Previously we were conservatively assuming that RegMask operands clobber callee saved registers. llvm-svn: 252341
*	Improved the operands commute transformation for X86-FMA3 instructions.	Andrew Kaylor	2015-11-06	2	-12/+515
	All 3 operands of FMA3 instructions are commutable now. Patch by Slava Klochkov Reviewers: Quentin Colombet(qcolombet), Ahmed Bougacha(ab). Differential Revision: http://reviews.llvm.org/D13269 llvm-svn: 252335
*	[WinEH] Mark funclet entries and exits as clobbering all registers	Reid Kleckner	2015-11-06	2	-0/+177
	Summary: In this implementation, LiveIntervalAnalysis invents a few register masks on basic block boundaries that preserve no registers. The nice thing about this is that it prevents the prologue inserter from thinking it needs to spill all XMM CSRs, because it doesn't see any explicit physreg defs in the MI. Reviewers: MatzeB, qcolombet, JosephTremoulet, majnemer Subscribers: MatzeB, llvm-commits Differential Revision: http://reviews.llvm.org/D14407 llvm-svn: 252318
*	Bring r252305 back with a test fix.	Rafael Espindola	2015-11-06	1	-2/+2
	We now create the .eh_frame section early, just like every other special section. This means that the special flags are visible in code that explicitly asks for ".eh_frame". llvm-svn: 252313
*	DI: Reverse direction of subprogram -> function edge.	Peter Collingbourne	2015-11-05	24	-60/+60
	Previously, subprograms contained a metadata reference to the function they described. Because most clients need to get or set a subprogram for a given function rather than the other way around, this created unneeded inefficiency. For example, many passes needed to call the function llvm::makeSubprogramMap() to build a mapping from functions to subprograms, and the IR linker needed to fix up function references in a way that caused quadratic complexity in the IR linking phase of LTO. This change reverses the direction of the edge by storing the subprogram as function-level metadata and removing DISubprogram's function field. Since this is an IR change, a bitcode upgrade has been provided. Fixes PR23367. An upgrade script for textual IR for out-of-tree clients is attached to the PR. Differential Revision: http://reviews.llvm.org/D14265 llvm-svn: 252219
*	[WinEH] Fix funclet prologues with stack realignment	Reid Kleckner	2015-11-05	1	-0/+77
	We already had a test for this for 32-bit SEH catchpads, but those don't actually create funclets. We had a bug that only appeared in funclet prologues, where we would establish EBP and ESI as our FP and BP, and then downstream prologue code would overwrite them. While I was at it, I fixed Win64+funclets+stackrealign. This issue doesn't come up as often there due to the ABI requiring 16 byte stack alignment, but now we can rest easy that AVX and WinEH will work well together =P. llvm-svn: 252210
*	Add cfi instr for CFA calculation when movpc is expanded to call and pop	Petar Jovanovic	2015-11-05	1	-0/+42
	This fixes the issue of wrong CFA calculation in the following case:
	    0x08048400 <+0>: push %ebx
	    0x08048401 <+1>: sub $0x8,%esp
	    0x08048404 <+4>: call 0x8048409 <test+9>
	    0x08048409 <+9>: pop %eax
	    0x0804840a <+10>: add $0x1bf7,%eax
	    0x08048410 <+16>: mov %eax,%ebx
	    0x08048412 <+18>: call 0x80483f0 <bar>
	    0x08048417 <+23>: add $0x8,%esp
	    0x0804841a <+26>: pop %ebx
	    0x0804841b <+27>: ret
	The highlighted instructions (the call at <+4> and the pop at <+9>) are a product of the movpc instruction. The call instruction changes the stack pointer, and the pop instruction restores its value. However, the rule for computing CFA is not updated and is wrong on the pop instruction. So, e.g. backtrace in gdb does not work when on the pop instruction.
	This adds cfi instructions for both call and pop instructions. The cfi_adjust_cfa_offset instruction is used with the appropriate offset for setting the rules to calculate CFA correctly.
	Patch by Violeta Vukobrat.
	Differential Revision: http://reviews.llvm.org/D14021
	llvm-svn: 252176
*	revert rev. 252153 due to build failure on ubuntu	Asaf Badouh	2015-11-05	1	-75/+0
	[X86][AVX512] add comi with Sae llvm-svn: 252154
*	[X86][AVX512] add comi with Sae	Asaf Badouh	2015-11-05	1	-0/+75
	add builtin_ia32_vcomisd and builtin_ia32_vcomisd Differential Revision: http://reviews.llvm.org/D14331 llvm-svn: 252153
*	[x86] Teach the shrink-wrapping hooks to do the proper thing with Win64.	Quentin Colombet	2015-11-04	1	-0/+122
	Win64 has some strict requirements for the epilogue. As a result, we disable shrink-wrapping for Win64 unless the block that gets the epilogue is already an exit block. Fixes PR24193. llvm-svn: 252088
*	[X86][SSE] Add general memory folding for (V)INSERTPS instruction	Simon Pilgrim	2015-11-04	5	-13/+62
	This patch improves the memory folding of the inserted float element for the (V)INSERTPS instruction. The existing implementation occurs in the DAGCombiner and relies on the narrowing of a whole vector load into a scalar load (and then converted into a vector) to (hopefully) allow folding to occur later on. Not only has this proven problematic for debug builds, it also prevents other memory folds (notably stack reloads) from happening. This patch removes the old implementation and moves the folding code to the X86 foldMemoryOperand handler. A new private 'special case' function - foldMemoryOperandCustom - has been added to deal with memory folding of instructions that can't just use the lookup tables - (V)INSERTPS is the first of several that could be done. It also tweaks the memory operand folding code with an additional pointer offset that allows existing memory addresses to be modified, in this case to convert the vector address to the explicit address of the scalar element that will be inserted. Unlike the previous implementation we now set the insertion source index to zero. Although this is ignored for the (V)INSERTPSrm version, anything that relied on shuffle decodes (such as unfolding of insertps loads) was incorrectly calculating the source address - I've added a test for this at insertps-unfold-load-bug.ll Differential Revision: http://reviews.llvm.org/D13988 llvm-svn: 252074