bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	R600/SI: Default to no single precision denormals.	Matt Arsenault	2014-07-14	1	-1/+1
\| \| \| \|	llvm-svn: 213017
*	CodeGen: Stick constant pool entries in COMDAT sections for WinCOFF	David Majnemer	2014-07-14	2	-5/+55
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	COFF lacks a feature that other object file formats support: mergeable sections. To work around this, MSVC sticks constant pool entries in special COMDAT sections so that each constant is in its own section. This permits unused constants to be dropped and it also allows duplicate constants in different translation units to get merged together. This fixes PR20262. Differential Revision: http://reviews.llvm.org/D4482 llvm-svn: 213006
*	[DAGCombiner] Add more rules to combine shuffle vector dag nodes.	Andrea Di Biagio	2014-07-14	1	-0/+373
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch teaches the DAGCombiner how to fold a pair of shuffles according to rules: 1. shuffle(shuffle A, B, M0), B, M1) -> shuffle(A, B, M2) 2. shuffle(shuffle A, B, M0), A, M1) -> shuffle(A, B, M3) The new rules would only trigger if the resulting shuffle has legal type and legal mask. Added test 'combine-vec-shuffle-3.ll' to verify that DAGCombiner correctly folds shuffles on x86 when the resulting mask is legal. Also added some negative cases to verify that we avoid introducing illegal shuffles. llvm-svn: 213001
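The two folding rules above amount to composing shuffle masks. The following is a minimal Python sketch of that composition, not the DAGCombiner code itself: `shuffle`, `fold_masks`, and the `UNDEF` sentinel are hypothetical names, and the legality checks on the resulting type and mask that the commit describes are not modeled.

```python
UNDEF = -1  # hypothetical sentinel for an undefined mask element


def shuffle(a, b, mask):
    """Pick lanes from the concatenation of a and b; UNDEF lanes yield None."""
    both = list(a) + list(b)
    return [None if m == UNDEF else both[m] for m in mask]


def fold_masks(m0, m1, outer_is_b):
    """Compose masks for shuffle(shuffle(A, B, m0), X, m1), where X is B or A.

    Returns a mask M such that shuffle(A, B, M) gives the same lanes.
    """
    n = len(m0)
    out = []
    for m in m1:
        if m == UNDEF:
            out.append(UNDEF)
        elif m < n:
            out.append(m0[m])   # lane comes from the inner shuffle
        elif outer_is_b:
            out.append(m)       # lane comes from B: same index in concat(A, B)
        else:
            out.append(m - n)   # lane comes from A: rebase into A's index range
    return out


a, b = [10, 11, 12, 13], [20, 21, 22, 23]
m0, m1 = [0, 5, 2, 7], [1, 4, 3, 6]
inner = shuffle(a, b, m0)

m2 = fold_masks(m0, m1, outer_is_b=True)    # rule 1
m3 = fold_masks(m0, m1, outer_is_b=False)   # rule 2
print(m2, shuffle(a, b, m2) == shuffle(inner, b, m1))
print(m3, shuffle(a, b, m3) == shuffle(inner, a, m1))
```

In both rules the folded shuffle reads only from A and B, which is why a single mask is enough to replace the pair.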
*	Unify the lowering of arguments during SjLj prepare.	Bill Wendling	2014-07-14	1	-1/+1
\| \| \| \| \| \| \|	The 'select true, %arg, undef' instruction can be used for both aggregate and non-aggregate arguments. llvm-svn: 212967
*	X86: correct 64-bit atomics on 32-bit	Saleem Abdulrasool	2014-07-14	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We would emit a libcall for a 64-bit atomic on x86 after SVN r212119. This was due to the misuse of hasCmpxchg16 to indicate if cmpxchg8b was supported on a 32-bit target. They were added at different times and would result in the border condition being mishandled. This fixes the border case to emit the cmpxchg8b instruction for 64-bit atomic operations on x86 at the cost of restoring a long-standing bug in the codegen. We emit a cmpxchg8b on all x86 targets even where the CPU does not support this instruction (pre-Pentium CPUs). Although this bug should be fixed, this was present prior to SVN r212119 and this change, so this is not really introducing a regression. llvm-svn: 212956
*	X86: remove temporary atomicrmw used during lowering.	Tim Northover	2014-07-14	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \|	We construct a temporary "atomicrmw xchg" instruction when lowering atomic stores for widths that aren't supported natively. This isn't on the top-level worklist though, so it won't be removed automatically and we have to do it ourselves once that itself has been lowered. Thanks Saleem for pointing this out! llvm-svn: 212948
*	[mips] For the FP64A ABI, odd-numbered double-precision moves must not use ↵	Daniel Sanders	2014-07-14	2	-48/+324
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	mtc1/mfc1. Summary: This is because, under FP64A, the hardware will redirect 32-bit reads/writes from/to odd-numbered registers to the upper 32-bits of the corresponding even register. In effect, simulating FR=0 mode when FR=0 mode is not available. Unfortunately, we have to make the decision to avoid mfc1/mtc1 before register allocation so we currently do this for even registers too. FPXX has a similar requirement on 32-bit architectures that lack mfhc1/mthc1 so this patch also handles the affected moves from the FPU for FPXX too. Moves to the FPU were supported by an earlier commit. Differential Revision: http://reviews.llvm.org/D4484 llvm-svn: 212938
*	[mips] Use MFHC1 when it is available (MIPS32r2 and later) for both FP32 and ↵	Daniel Sanders	2014-07-14	3	-161/+169
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	FP64 moves Summary: This is similar to r210771 which did the same thing for MTHC1. Also corrected MTHC1_D32 and MTHC1_D64 which used AFGR64 and FGR64 on the wrong definitions. Differential Revision: http://reviews.llvm.org/D4483 llvm-svn: 212936
*	AArch64: remove unnecessary pseudo-instruction.	Tim Northover	2014-07-14	1	-2/+2
\| \| \| \| \| \| \|	Sufficiently twisted use of TableGen lets us write patterns directly for f16 (as an i16 promoted to i32) -> f32 conversion. llvm-svn: 212933
*	[mips] Expand BuildPairF64 to a spill and reload when the O32 FPXX ABI is	Sasa Stankovic	2014-07-14	2	-1/+142
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	enabled and mthc1 and dmtc1 are not available (e.g. on MIPS32r1) This prevents the upper 32-bits of a double precision value from being moved to the FPU with mtc1 to an odd-numbered FPU register. This is necessary to ensure that the code generated executes correctly regardless of the current FPU mode. MIPS32r2 and above continues to use mtc1/mthc1, while MIPS-IV and above continue to use dmtc1. Differential Revision: http://reviews.llvm.org/D4465 llvm-svn: 212930
*	Support lowering of empty aggregates.	Bill Wendling	2014-07-14	1	-0/+31
\| \| \| \| \| \| \| \| \| \| \| \| \|	This crash was pretty common while compiling Rust for iOS (armv7). Reason - SjLj preparation step was lowering aggregate arguments as ExtractValue + InsertValue. ExtractValue has assertion which checks that there is some data in value, which is not true in case of empty (no fields) structures. Rust uses them quite extensively so this patch uses a 'select true, %val, undef' instruction to lower the argument. Patch by Valerii Hiora. llvm-svn: 212922
*	[DAGCombiner] Fix a crash caused by a missing check for legal type when ↵	Andrea Di Biagio	2014-07-13	1	-0/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	trying to fold shuffles. Verify that DAGCombiner does not crash when trying to fold a pair of shuffles according to rule (added at r212539): (shuffle (shuffle A, Undef, M0), Undef, M1) -> (shuffle A, Undef, M2) The DAGCombiner avoids folding shuffles if the resulting shuffle dag node is not legal for the target. That means, the resulting shuffle must have legal type and legal mask. Before, the DAGCombiner only called method 'TargetLowering::isShuffleMaskLegal' to check if it was "safe" to fold according to the above-mentioned rule. However, this caused a crash in the x86 backend since method 'isShuffleMaskLegal' always expects to be called on a legal vector type. llvm-svn: 212915
*	R600: Run more tests with promote alloca disabled.	Matt Arsenault	2014-07-13	4	-22/+57
\| \| \| \| \| \| \|	Re-run tests changed in r211110 to test both paths. Also fix broken check line. llvm-svn: 212895
*	R600: Run private-memory test with and without alloca promote	Matt Arsenault	2014-07-13	1	-24/+33
\| \| \| \| \| \| \|	The unpromoted path still needs to be tested since we can't always promote to using LDS. llvm-svn: 212894
*	AArch64: add support for llvm.aarch64.hint intrinsic	Saleem Abdulrasool	2014-07-12	1	-0/+67
\| \| \| \| \| \| \| \| \| \| \|	This adds a llvm.aarch64.hint intrinsic to mirror the llvm.arm.hint in order to support the various hint intrinsic functions in the ACLE. Add an optional pattern field that permits the subclass to specify the pattern that matches the selection. The intrinsic pattern is set as mayLoad, mayStore, so overload the value for the definition of the hint instruction. llvm-svn: 212883
*	R600: Add missing tests for some intrinsics	Matt Arsenault	2014-07-12	7	-5/+109
\| \| \| \|	llvm-svn: 212870
*	[PowerPC] Fix invalid displacement created by LocalStackAlloc	Ulrich Weigand	2014-07-11	1	-0/+71
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit fixes a bug in PPCRegisterInfo::isFrameOffsetLegal that could result in the LocalStackAlloc pass creating an MI instruction out-of-range displacement: %vreg17<def> = LD 33184, %vreg31; mem:LD8[%g](align=32) %G8RC:%vreg17 G8RC_and_G8RC_NOX0:%vreg31 (In final assembler output the top bits are stripped off, resulting in a negative offset loading from below the stack pointer.) Common code expects the isFrameOffsetLegal routine to verify whether adding a given offset to the offset already present in the instruction results in a valid displacement. However, on PowerPC the routine did not take the already present instruction offset into account. This commit fixes isFrameOffsetLegal to add the instruction offset, and updates a local caller (needsFrameBaseReg) to no longer add the instruction offset itself before calling isFrameOffsetLegal. Reviewed by Hal Finkel. llvm-svn: 212832
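The bug described above can be illustrated with a small Python sketch. It assumes a signed 16-bit displacement field; the function names are hypothetical stand-ins for the behaviour before and after the fix, and the split of 33184 into an existing offset of 1000 plus an added offset of 32184 is invented for illustration.

```python
def fits_simm16(value):
    """True if value fits in a signed 16-bit displacement field."""
    return -32768 <= value <= 32767


def wrap_simm16(value):
    """Model stripping the top bits: keep 16 bits and reinterpret as signed."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def is_frame_offset_legal_old(instr_offset, offset):
    # Old behaviour: the offset already in the instruction is ignored.
    return fits_simm16(offset)


def is_frame_offset_legal_fixed(instr_offset, offset):
    # Fixed behaviour: check the displacement that will actually be encoded.
    return fits_simm16(instr_offset + offset)


instr_offset, offset = 1000, 32184  # hypothetical split of 33184
print(is_frame_offset_legal_old(instr_offset, offset))
print(is_frame_offset_legal_fixed(instr_offset, offset))
print(wrap_simm16(instr_offset + offset))
```

The old check accepts the pair because each part fits on its own, while the sum wraps to a negative displacement, matching the "loading from below the stack pointer" symptom.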
*	R600/SI: Use i32 vectors for resources and samplers	Marek Olsak	2014-07-11	3	-86/+86
\| \| \| \| \| \| \| \|	This affects new intrinsics only. What surprises me is that v32i8 still works. llvm-svn: 212831
*	R600/SI: add sample and image intrinsics exposing all instruction fields	Marek Olsak	2014-07-11	3	-0/+627
\| \| \| \| \| \| \| \| \| \| \|	We need the intrinsics with offsets, so why not just add them all. The R128 parameter will also be useful for reducing SGPR usage. GL_ARB_image_load_store also adds some image GLSL modifiers like "coherent", so Mesa will probably translate those to slc, glc, etc. When LLVM 3.5 is released, I'll switch Mesa to these new intrinsics. llvm-svn: 212830
*	ARM: Allow __fp16 as a function arg or return type for AArch64	Oliver Stannard	2014-07-11	1	-0/+14
\| \| \| \| \| \| \|	ACLE 2.0 allows __fp16 to be used as a function argument or return type. This enables this for AArch64. llvm-svn: 212812
*	[X86] Fix the inversion of low and high bits for the lowering of MUL_LOHI.	Quentin Colombet	2014-07-11	1	-3/+0
\| \| \| \| \| \| \| \|	Also add a few comments. <rdar://problem/17581756> llvm-svn: 212808
*	R600: Implement float to long/ulong	Jan Vesely	2014-07-10	3	-49/+386
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use alg. from LegalizeDAG.cpp Move Expand setting to SIISelLowering v2: Extend existing tests instead of creating new ones v3: use separate LowerFPTOSINT function v4: use TargetLowering::expandFP_TO_SINT add comment about using FP_TO_SINT for uints Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <tom@stellard.net> llvm-svn: 212773
*	[mips] Emit two CFI offset directives per double precision SDC1/LDC1	Zoran Jovanovic	2014-07-10	2	-0/+54
\| \| \| \| \| \| \|	instead of just one for FR=1 registers Differential Revision: http://reviews.llvm.org/D4310 llvm-svn: 212769
*	Extend the test coverage in combine-vec-shuffle-2.ll adding some negative tests.	Andrea Di Biagio	2014-07-10	1	-0/+89
\| \| \| \| \| \| \| \| \|	Add test cases where we don't expect to trigger the combine optimizations introduced at revision 212748. No functional change intended. llvm-svn: 212756
*	Revert "Revert r212640, "Add trunc (select c, a, b) -> select c (trunc a), ↵	Matt Arsenault	2014-07-10	3	-2/+45
\| \| \| \| \| \| \| \|	(trunc b) combine."" Don't try to convert the select condition type. llvm-svn: 212750
*	[DAG] Further improve the logic in DAGCombiner that folds a pair of shuffles ↵	Andrea Di Biagio	2014-07-10	2	-1/+167
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	into a single shuffle if the resulting mask is legal. This patch teaches the DAGCombiner how to fold shuffles according to the following new rules: 1. shuffle(shuffle(x, y), undef) -> x 2. shuffle(shuffle(x, y), undef) -> y 3. shuffle(shuffle(x, y), undef) -> shuffle(x, undef) 4. shuffle(shuffle(x, y), undef) -> shuffle(y, undef) The backend avoids combining shuffles according to rules 3. and 4. if the resulting shuffle does not have a legal mask. This is to avoid introducing illegal shuffles that are potentially expanded into a sub-optimal sequence of target specific dag nodes during vector legalization. Added test case combine-vec-shuffle-2.ll to verify that we correctly trigger the new rules when combining shuffles. llvm-svn: 212748
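One way to read the four rules is as a classification of the composed mask. The Python sketch below is an illustration under assumptions, not the DAGCombiner logic: `classify_fold` and `UNDEF` are hypothetical names, an identity mask with undefined lanes is treated as foldable to the bare operand, and target mask legality (which gates rules 3 and 4) is not checked.

```python
UNDEF = -1  # hypothetical sentinel for an undefined mask element


def is_identity(mask):
    """True if every defined lane i selects element i."""
    return all(m == UNDEF or m == i for i, m in enumerate(mask))


def classify_fold(m0, m1):
    """Classify shuffle(shuffle(x, y, m0), undef, m1) against rules 1-4."""
    n = len(m0)
    # Lanes of m1 that point at the undef operand (index >= n) stay undefined.
    composed = [UNDEF if (m == UNDEF or m >= n) else m0[m] for m in m1]
    defined = [m for m in composed if m != UNDEF]
    uses_x = any(m < n for m in defined)
    uses_y = any(m >= n for m in defined)
    if uses_x == uses_y:
        return ("no fold", None)  # mixes x and y, or has no defined lanes
    if uses_x:
        return ("x", None) if is_identity(composed) else ("shuffle(x, undef)", composed)
    rebased = [m if m == UNDEF else m - n for m in composed]
    return ("y", None) if is_identity(rebased) else ("shuffle(y, undef)", rebased)


m0 = [0, 5, 2, 7]
print(classify_fold(m0, [0, UNDEF, 2, UNDEF]))  # rule 1
print(classify_fold(m0, [UNDEF, 1, UNDEF, 3]))  # rule 2
print(classify_fold(m0, [2, 0, 2, 0]))          # rule 3
print(classify_fold(m0, [1, 1, 3, 3]))          # rule 4
```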
*	[X86] Mark pseudo instruction TEST8ri_NOEREX as hasSideEffects=0.	Akira Hatanaka	2014-07-10	1	-0/+20
\| \| \| \| \| \| \| \| \|	Also, add a case clause in X86InstrInfo::shouldScheduleAdjacent to enable macro-fusion. <rdar://problem/15680770> llvm-svn: 212747
*	[mips] Added FPXX modeless calling convention.	Zoran Jovanovic	2014-07-10	2	-0/+82
\| \| \| \| \| \|	Differential Revision: http://reviews.llvm.org/D4293 llvm-svn: 212726
*	AArch64: correctly fast-isel i8 & i16 multiplies	Tim Northover	2014-07-10	1	-0/+40
\| \| \| \| \| \| \| \|	We were asking for a register for type i8 or i16 which caused an assert. rdar://problem/17620015 llvm-svn: 212718
*	[mips] Add support for -modd-spreg/-mno-odd-spreg	Daniel Sanders	2014-07-10	1	-0/+54
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: When -mno-odd-spreg is in effect, 32-bit floating point values are not permitted in odd FPU registers. The option also prohibits 32-bit and 64-bit floating point comparison results from being written to odd registers. This option has three purposes: * It allows support for certain MIPS implementations such as loongson-3a that do not allow the use of odd registers for single precision arithmetic. * When using -mfpxx, -mno-odd-spreg is the default and this allows us to statically check that code is compliant with the O32 FPXX ABI since mtc1/mfc1 instructions to/from odd registers are guaranteed not to appear for any reason. Once this has been established, the user can then re-enable -modd-spreg to regain the use of all 32 single-precision registers. * When using -mfp64 and -mno-odd-spreg together, an O32 extension named O32 FP64A is used as the ABI. This is intended to provide almost all functionality of an FR=1 processor but can also be executed on a FR=0 core with the assistance of a hardware compatibility mode which emulates FR=0 behaviour on an FR=1 processor. * Added '.module oddspreg' and '.module nooddspreg' each of which update the .MIPS.abiflags section appropriately * Moved setFpABI() call inside emitDirectiveModuleFP() so that the caller doesn't have to remember to do it. * MipsABIFlags now calculates the flags1 and flags2 member on demand rather than trying to maintain them in the same format they will be emitted in. There is one portion of the -mfp64 and -mno-odd-spreg combination that is not implemented yet. Moves to/from odd-numbered double-precision registers must not use mtc1. I will fix this in a follow-up. Differential Revision: http://reviews.llvm.org/D4383 llvm-svn: 212717
*	[x86,SDAG] Introduce any- and sign-extend-vector-inreg nodes analogous	Chandler Carruth	2014-07-10	1	-0/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	to the zero-extend-vector-inreg node introduced previously for the same purpose: manage the type legalization of widened extend operations, especially to support the experimental widening mode for x86. I'm adding both because sign-extend is expanded in terms of any-extend with shifts to propagate the sign bit. This removes the last fundamental scalarization from vec_cast2.ll (a test case that hit many really bad edge cases for widening legalization), although the trunc tests in that file still appear scalarized because the shuffle legalization is scalarizing. Funny thing, I've been working on that. Some initial experiments with this and SSE2 scenarios are showing moderately good behavior already for sign extension. Still some work to do on the shuffle combining on X86 before we're generating optimal sequences, but avoiding scalarization is a huge step forward. llvm-svn: 212714
*	llvm/test/CodeGen/X86/shift-parts.ll: FileCheck-ize. (from r212640)	NAKAMURA Takumi	2014-07-10	1	-1/+3
\| \| \| \|	llvm-svn: 212709
*	Revert r212640, "Add trunc (select c, a, b) -> select c (trunc a), (trunc b) ↵	NAKAMURA Takumi	2014-07-10	2	-40/+3
\| \| \| \| \| \| \| \|	combine." This caused miscompilation on, at least, x86-64. SExt(i1 cond) confused other optimizations. llvm-svn: 212708
*	[x86] Add another combine that is particularly useful for the new vector	Chandler Carruth	2014-07-10	1	-3/+11
\| \| \| \| \| \| \| \| \| \|	shuffle lowering: match shuffle patterns equivalent to an unpcklwd or unpckhwd instruction. This allows us to use generic lowering code for v8i16 shuffles and match the unpack pattern late. llvm-svn: 212705
*	Make it possible for ints/floats to return different values from ↵	Daniel Sanders	2014-07-10	2	-80/+124
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	getBooleanContents() Summary: On MIPS32r6/MIPS64r6, floating point comparisons return 0 or -1 but integer comparisons return 0 or 1. Updated the various uses of getBooleanContents. Two simplifications had to be disabled when float and int boolean contents differ: - ScalarizeVecRes_VSELECT except when the kind of boolean contents is trivially discoverable (i.e. when the condition of the VSELECT is a SETCC node). - visitVSELECT (select C, 0, 1) -> (xor C, 1). Come to think of it, this one could test for the common case of 'C' being a SETCC too. Preserved existing behaviour for all other targets and updated the affected MIPS32r6/MIPS64r6 tests. This also fixes the pi benchmark where the 'low' variable was counting in the wrong direction because it thought it could simply add the result of the comparison. Reviewers: hfinkel Reviewed By: hfinkel Subscribers: hfinkel, jholewinski, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D4389 llvm-svn: 212697
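A short Python sketch shows why the `(select C, 0, 1) -> (xor C, 1)` simplification depends on the boolean representation. The helper names are hypothetical; the sketch only compares the two expressions for a "true" value of 1 (integer comparisons) and -1 (floating point comparisons on MIPS32r6/MIPS64r6, per the commit).

```python
def select(cond, if_true, if_false):
    """Model of a select node: any nonzero condition counts as true."""
    return if_true if cond != 0 else if_false


def xor_fold_is_valid(true_value):
    """Check (select C, 0, 1) == (xor C, 1) for C in {0, true_value}."""
    return all(select(c, 0, 1) == (c ^ 1) for c in (0, true_value))


print(xor_fold_is_valid(1))    # booleans are 0 or 1
print(xor_fold_is_valid(-1))   # booleans are 0 or -1 (all ones)
```

With a true value of -1, `xor C, 1` yields -2 rather than 0, so the fold has to be skipped when the condition's boolean contents are not known to be 0 or 1.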
*	[x86] Expand the target DAG combining for PSHUFD nodes to be able to	Chandler Carruth	2014-07-10	1	-4/+1
\| \| \| \| \| \| \| \| \| \|	combine into half-shuffles through unpack instructions that expand the half to a whole vector without messing with the dword lanes. This fixes some redundant instructions in splat-like lowerings for v16i8, which are now getting to be really nice. llvm-svn: 212695
*	[x86] Tweak the v16i8 single input special case lowering for shuffles	Chandler Carruth	2014-07-10	1	-11/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	that splat i8s into i16s. Previously, we would try much too hard to arrange a sequence of i8s in one half of the input such that we could unpack them into i16s and shuffle those into place. This isn't always going to be a cheaper i8 shuffle than our other strategies. The case where it is always going to be cheaper is when we can arrange all the necessary inputs into one half using just i16 shuffles. It happens that viewing the problem this way also makes it much easier to produce an efficient set of shuffles to move the inputs into one half and then unpack them. With this, our splat code gets one step closer to being not terrible with the new experimental lowering strategy. It also exposes two combines missing which I will add next. llvm-svn: 212692
*	[x86] Initial improvements to the new shuffle lowering for v16i8	Chandler Carruth	2014-07-10	1	-17/+21
\| \| \| \| \| \| \| \| \| \| \| \| \|	shuffles specifically for cases where a small subset of the elements in the input vector are actually used. This is specifically targeted at improving the shuffles generated for trunc operations, but also helps out splat-like operations. There is still some really low-hanging fruit here that I want to address but this is a huge step in the right direction. llvm-svn: 212680
*	[AArch64]Fix an assertion failure in DAG Combiner about concating 2 ↵	Hao Liu	2014-07-10	1	-1/+15
\| \| \| \| \| \|	build_vector. llvm-svn: 212677
*	R600/SI: Add support for llvm.convert.{to\|from}.fp16	Matt Arsenault	2014-07-10	2	-0/+28
\| \| \| \|	llvm-svn: 212676
*	Recommit r212203: Don't try to construct debug LexicalScopes hierarchy for ↵	David Blaikie	2014-07-09	2	-38/+133
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	functions that do not have top level debug information. Reverted by Eric Christopher (Thanks!) in r212203 after Bob Wilson reported LTO issues. Duncan Exon Smith and Aditya Nandakumar helped provide a reduced reproduction, though the failure wasn't too hard to guess, and even easier with the example to confirm. The assertion that the subprogram metadata associated with an llvm::Function matches the scope data referenced by the DbgLocs on the instructions in that function is not valid under LTO. In LTO, a C++ inline function might exist in multiple CUs and the subprogram metadata nodes will refer to the same llvm::Function. In this case, depending on the order of the CUs, the first instance of the subprogram metadata may not be the one referenced by the instructions in that function and the assertion will fail. A test case (test/DebugInfo/cross-cu-linkonce-distinct.ll) is added, the assertion removed and a comment added to explain this situation. Original commit message: If a function isn't actually in a CU's subprogram list in the debug info metadata, ignore all the DebugLocs and don't try to build scopes, track variables, etc. While this is possibly a minor optimization, it's also a correctness fix for an incoming patch that will add assertions to LexicalScopes and the debug info verifier to ensure that all scope chains lead to debug info for the current function. Fix up a few test cases that had broken/incomplete debug info that could violate this constraint. Add a test case where this occurs by design (inlining a debug-info-having function in an attribute nodebug function - we want this to work because /if/ the nodebug function is then inlined into a debug-info-having function, it should be fine (and will work fine - we just stitch the scopes up as usual), but should the inlining not happen we need to not assert fail either). llvm-svn: 212649
*	Add trunc (select c, a, b) -> select c (trunc a), (trunc b) combine.	Matt Arsenault	2014-07-09	2	-3/+40
\| \| \| \| \| \|	Do this if the truncate is free and the select is legal. llvm-svn: 212640
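The combine relies on truncation commuting with select. The Python sketch below checks that identity on a few sample values, modelling `trunc` to 8 bits as a bit mask; the function names are hypothetical, and the profitability conditions from the commit (free truncate, legal select) are not modeled.

```python
def trunc8(value):
    """Model truncation to an 8-bit integer by keeping the low 8 bits."""
    return value & 0xFF


def select(cond, if_true, if_false):
    """Model of a select node with a boolean condition."""
    return if_true if cond else if_false


def before_combine(c, a, b):
    return trunc8(select(c, a, b))


def after_combine(c, a, b):
    return select(c, trunc8(a), trunc8(b))


samples = [(True, 0x1234, 0xABCD), (False, 0x1234, 0xABCD), (True, 0xFF00, 0x00FF)]
print(all(before_combine(*s) == after_combine(*s) for s in samples))
```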
*	AArch64: Better codegen for storing to __fp16.	Jim Grosbach	2014-07-09	1	-0/+126
\| \| \| \| \| \| \| \| \| \| \| \| \|	Storing will generally be immediately preceded by rounding from an f32 or f64, so make sure to match those patterns directly to convert into the FPR16 register class directly rather than going through the integer GPRs. This also eliminates an extra step in the convert-from-f64 path which was first converting to f32 and then to f16 from there. rdar://17594379 llvm-svn: 212638
*	[x86] Fix a bug in my new zext-vector-inreg DAG trickery where we were	Chandler Carruth	2014-07-09	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	not widening the input type to the node sufficiently to let the ext take place in a register. This would in turn result in a mysterious bitcast assertion failure downstream. First change here is to add back the helpful assert I had in an earlier version of the code to catch this immediately. Next change is to add support to the type legalization to detect when we have widened the operand either too little or too much (for whatever reason) and find a size-matched legal vector type to convert it to first. This can also fail so we get a new fallback path, but that seems OK. With this, we no longer crash on vec_cast2.ll when using widening. I've also added the CHECK lines for the zero-extend cases here. We still need to support sign-extend and trunc (or something) to get plausible code for the other two thirds of this test which is one of the regression tests that showed the most scalarization when widening was force-enabled. Slowly closing in on widening being a viable legalization strategy without it resorting to scalarization at every turn. =] llvm-svn: 212614
*	X86: When lowering v8i32 himuls use the correct shuffle masks for AVX2.	Benjamin Kramer	2014-07-09	1	-19/+23
\| \| \| \| \| \| \| \| \| \|	Turns out my trick of using the same masks for SSE4.1 and AVX2 didn't work out as we have to blend two vectors. While there remove unnecessary cross-lane moves from the shuffles so the backend can lower it to palignr instead of vperm. Fixes PR20118, a miscompilation of vector sdiv by constant on AVX2. llvm-svn: 212611
*	[x86] Add a ZERO_EXTEND_VECTOR_INREG DAG node and use it when widening	Chandler Carruth	2014-07-09	1	-0/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	vector types to be legal and a ZERO_EXTEND node is encountered. When we use widening to legalize vector types, extend nodes are a real challenge. Either the input or output is likely to be legal, but in many cases not both. As a consequence, we don't really have any way to represent this situation and the prior code in the widening legalization framework would just scalarize the extend operation completely. This patch introduces a new DAG node to represent doing a zero extend of a vector "in register". The core of the idea is to allow legal but different vector types in the input and output. The output vector must have fewer lanes but wider elements. The operation is defined to zero extend the low elements of the input to the size of the output elements, and drop all of the high elements which don't have a corresponding lane in the output vector. It also includes generic expansion of this node in terms of blending a zero vector into the high elements of the vector and bitcasting across. This in turn yields extremely nice code for x86 SSE2 when we use the new widening legalization logic in conjunction with the new shuffle lowering logic. There is still more to do here. We need to support sign extension, any extension, and potentially int-to-float conversions. My current plan is to continue using similar synthetic nodes to model each of these transitions with generic lowering code for each one. However, with this patch LLVM already reaches performance parity with GCC for the core C loops of the x264 code (assuming you disable the hand-written assembly versions) when compiling for SSE2 and SSE3 architectures and enabling the new widening and lowering logic for vectors. Differential Revision: http://reviews.llvm.org/D4405 llvm-svn: 212610
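The semantics described above, and the generic expansion via blending in zeros and bitcasting, can be sketched in Python for byte-sized input lanes. This is an illustration under assumptions: the function names are hypothetical, input lanes are unsigned 8-bit values, and the bitcast is modelled with little-endian byte order.

```python
import struct


def zext_vector_inreg(lanes, out_count):
    """Zero extend the low out_count lanes; the high lanes are dropped."""
    return [x & 0xFF for x in lanes[:out_count]]


def zext_via_blend_and_bitcast(lanes, out_count):
    """Same result via interleaving zero bytes and reinterpreting the bytes."""
    ratio = len(lanes) // out_count          # output width / input width
    blended = []
    for x in lanes[:out_count]:
        blended.append(x)                    # low byte of the wide lane
        blended.extend([0] * (ratio - 1))    # zero bytes blended in above it
    fmt = {2: "H", 4: "I", 8: "Q"}[ratio]    # unsigned 16/32/64-bit lanes
    return list(struct.unpack("<%d%s" % (out_count, fmt), bytes(blended)))


v16i8 = list(range(200, 216))
print(zext_vector_inreg(v16i8, 4))           # v16i8 -> v4i32
print(zext_via_blend_and_bitcast(v16i8, 4))
print(zext_via_blend_and_bitcast(v16i8, 8))  # v16i8 -> v8i16
```

Both forms keep the total vector size unchanged, which is what lets the input and output be different but legal vector types.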
*	[mips][mips64r6] Correct select patterns that have the condition or ↵	Daniel Sanders	2014-07-09	4	-79/+79
\| \| \| \| \| \| \| \| \| \|	true/false values backwards Summary: This bug caused SingleSource/Regression/C/uint64_to_float and SingleSource/UnitTests/2002-05-02-CastTest3 to fail (among others). Differential Revision: http://reviews.llvm.org/D4388 llvm-svn: 212608
*	[mips][mips64r6] Correct cond names in the cmp.cond.[ds] instructions	Daniel Sanders	2014-07-09	3	-37/+37
\| \| \| \| \| \| \| \| \| \|	Summary: It seems we accidentally read the wrong column of the table in the MIPS64r6 spec and used the names for c.cond.fmt instead of cmp.cond.fmt. Differential Revision: http://reviews.llvm.org/D4387 llvm-svn: 212607
*	[mips][mips64r6] Use JALR for indirect branches instead of JR (which is not ↵	Daniel Sanders	2014-07-09	2	-11/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	available on MIPS32r6/MIPS64r6) Summary: This completes the change to use JALR instead of JR on MIPS32r6/MIPS64r6. Reviewers: jkolek, vmedic, zoran.jovanovic, dsanders Reviewed By: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D4269 llvm-svn: 212605
*	[mips][mips64r6] Use JALR for returns instead of JR (which is not available ↵	Daniel Sanders	2014-07-09	3	-34/+80
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	on MIPS32r6/MIPS64r6) Summary: RET, and RET_MM have been replaced by a pseudo named PseudoReturn. In addition a version with a 64-bit GPR named PseudoReturn64 has been added. Instruction selection for a return matches RetRA, which is expanded post register allocation to PseudoReturn/PseudoReturn64. During MipsAsmPrinter, these PseudoReturn/PseudoReturn64 are emitted as: - (JALR64 $zero, $rs) on MIPS64r6 - (JALR $zero, $rs) on MIPS32r6 - (JR_MM $rs) on microMIPS - (JR $rs) otherwise On MIPS32r6/MIPS64r6, 'jr $rs' is an alias for 'jalr $zero, $rs'. To aid development and review (specifically, to ensure all cases of jr are updated), these aliases are temporarily named 'r6.jr' instead of 'jr'. A follow up patch will change them back to the correct mnemonic. Added (JALR $zero, $rs) to MipsNaClELFStreamer's definition of an indirect jump, and removed it from its definition of a call. Note: I haven't accounted for MIPS64 in MipsNaClELFStreamer since it doesn't appear to account for any MIPS64-specifics. The return instruction created as part of eh_return expansion is now expanded using expandRetRA() so we use the right return instruction on MIPS32r6/MIPS64r6 ('jalr $zero, $rs'). Also, fixed a misuse of isABI_N64() to detect 64-bit wide registers in expandEhReturn(). Reviewers: jkolek, vmedic, mseaborn, zoran.jovanovic, dsanders Reviewed By: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D4268 llvm-svn: 212604
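The emission choice listed above can be written as a small decision function. This Python sketch only mirrors the four cases in the order the commit lists them; the function and parameter names are hypothetical and do not correspond to MipsAsmPrinter's actual interface.

```python
def lower_pseudo_return(is_mips64r6, is_mips32r6, in_micromips):
    """Return (opcode, operands) for PseudoReturn/PseudoReturn64."""
    if is_mips64r6:
        return ("JALR64", ["$zero", "$rs"])
    if is_mips32r6:
        return ("JALR", ["$zero", "$rs"])
    if in_micromips:
        return ("JR_MM", ["$rs"])
    return ("JR", ["$rs"])


print(lower_pseudo_return(True, False, False))
print(lower_pseudo_return(False, True, False))
print(lower_pseudo_return(False, False, True))
print(lower_pseudo_return(False, False, False))
```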