bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	[DAGCombiner] Don't add volatile or indexed stores to ChainedStores	Junmo Park	2016-01-28	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \|	Summary: findBetterNeighborChains does not handle volatile or indexed stores. However, it did not check when adding stores to ChainedStores. Reviewers: arsenm Differential Revision: http://reviews.llvm.org/D16463 llvm-svn: 259024
*	Tidied up TRUNC combine code. NFC.	Simon Pilgrim	2016-01-23	1	-9/+5
\| \| \| \| \| \|	Make use of DAG.getBitcast and use clang-format to reduce number of lines (and make it more readable). llvm-svn: 258644
*	[SelectionDAG] Fold more offsets into GlobalAddresses	Dan Gohman	2016-01-22	1	-73/+77
\| \| \| \| \| \| \| \|	This reapplies r258296 and r258366, and also fixes an existing bug in SelectionDAG.cpp's isMemSrcFromString, neglecting to account for the offset in a GlobalAddressSDNode, which is uncovered by those patches. llvm-svn: 258482
*	Revert "[SelectionDAG] Fold more offsets into GlobalAddresses"	Reid Kleckner	2016-01-22	1	-77/+73
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts r258296 and the follow up r258366. With this change, we miscompiled the following program on Windows:
	#include <string>
	#include <iostream>
	static const char kData[] = "asdf jkl;";
	int main() {
	  std::string s(kData + 3, sizeof(kData) - 3);
	  std::cout << s << '\n';
	}
	llvm-svn: 258465
*	[SelectionDAG] Fold more offsets into GlobalAddresses	Dan Gohman	2016-01-20	1	-73/+77
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	SelectionDAG previously missed opportunities to fold constants into GlobalAddresses in several areas. For example, given `(add (add GA, c1), y)`, it would often reassociate to `(add (add GA, y), c1)`, missing the opportunity to create `(add GA+c, y)`. This isn't often visible on targets such as X86 which effectively reassociate adds in their complex address-mode folding logic, however it is currently visible on WebAssembly since it currently has very simple address mode folding code that doesn't reassociate anything. This patch fixes this by making SelectionDAG fold offsets into GlobalAddresses at the same times that it folds constants together, so that it doesn't miss any opportunities to perform such folding. Differential Revision: http://reviews.llvm.org/D16090 llvm-svn: 258296
*	[DAGCombiner] don't dereference an operand that doesn't exist (PR26070)	Sanjay Patel	2016-01-08	1	-12/+13
\| \| \| \| \| \| \| \| \| \| \| \| \|	The bug was introduced with changes for x86-64 fp128: http://reviews.llvm.org/rL254653 I don't know why an x86 change is here, so I'll follow up in: http://reviews.llvm.org/D15134 Should fix: https://llvm.org/bugs/show_bug.cgi?id=26070 llvm-svn: 257200
*	Test commit access - add a blank line in comment.	Tim Shen	2016-01-08	1	-0/+1
\| \| \| \|	llvm-svn: 257192
*	[SelectionDAGBuilder] Set NoUnsignedWrap for inbounds gep and load/store offsets.	Dan Gohman	2016-01-06	1	-1/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In an inbounds getelementptr, when an index produces a constant non-negative offset to add to the base, the add can be assumed to not have unsigned overflow. This relies on the assumption that addresses can't occupy more than half the address space, which isn't possible in C because it wouldn't be possible to represent the difference between the start of the object and one-past-the-end in a ptrdiff_t. Setting the NoUnsignedWrap flag is theoretically useful in general, and is specifically useful to the WebAssembly backend, since it permits stronger constant offset folding. Differential Revision: http://reviews.llvm.org/D15544 llvm-svn: 256890
*	Fix (bitcast (fabs x)), (bitcast (fneg x)) and (bitcast (fcopysign cst, x)) combines for ppc_fp128, since signbit computation is more complicated.	Eric Christopher	2015-12-10	1	-0/+68
\| \| \| \| \| \| \| \| \| \| \| \|	Discussion thread: http://lists.llvm.org/pipermail/llvm-dev/2015-November/092863.html Patch by Tim Shen! llvm-svn: 255305
*	fix return values to match bool return type; NFC	Sanjay Patel	2015-12-07	1	-2/+2
\| \| \| \|	llvm-svn: 254968
*	[X86] Part 1 to fix x86-64 fp128 calling convention.	Chih-Hung Hsieh	2015-12-03	1	-1/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Almost all these changes are conditioned and only apply to the new x86-64 f128 type configuration, which will be enabled in a follow up patch. They are required together to make new f128 work. If there is any error, we should fix or revert them as a whole. These changes should have no impact to current configurations.
	* Relax type legalization checks to accept new f128 type configuration, whose TypeAction is TypeSoftenFloat, not TypeLegal, but also has TLI.isTypeLegal true.
	* Relax GetSoftenedFloat to return in some cases f128 type SDValue, which is TLI.isTypeLegal but not "softened" to i128 node.
	* Allow customized FABS, FNEG, FCOPYSIGN on new f128 type configuration, to generate optimized bitwise operators for libm functions.
	* Enhance related Lower* functions to handle f128 type.
	* Enhance DAGTypeLegalizer::run, SoftenFloatResult, and related functions to keep new f128 type in register, and convert f128 operators to library calls.
	* Fix Combiner, Emitter, Legalizer routines that did not handle f128 type.
	* Add ExpandConstant to handle i128 constants, ExpandNode to handle ISD::Constant node.
	* Add one more parameter to getCommonSubClass and firstCommonClass, to guarantee that returned common sub class will contain the specified simple value type. This extra parameter is used by EmitCopyFromReg in InstrEmitter.cpp.
	* Fix infinite loop in getTypeLegalizationCost when f128 is the value type.
	* Fix printOperand to handle null operand.
	* Enhance ISD::BITCAST node to handle f128 constant.
	* Expand new f128 type for BR_CC, SELECT_CC, SELECT, SETCC nodes.
	* Enhance X86AsmPrinter to emit f128 values in comments.
	Differential Revision: http://reviews.llvm.org/D15134 llvm-svn: 254653
*	Use a lambda instead of std::bind and std::mem_fn I introduced in r254242. NFC	Craig Topper	2015-11-29	1	-2/+3
\| \| \| \|	llvm-svn: 254260
*	[SelectionDAG] Use std::any_of instead of a manually coded loop. NFC	Craig Topper	2015-11-29	1	-8/+4
\| \| \| \|	llvm-svn: 254242
*	Expose isXxxConstant() functions from SelectionDAGNodes.h (NFC)	Artyom Skrobov	2015-11-25	1	-20/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Many target lowerings copy-paste the code to test SDValues for known constants. This code can instead be shared in SelectionDAG.cpp, and reused in the targets. Reviewers: MatzeB, andreadb, tstellarAMD Subscribers: arsenm, jyknight, llvm-commits Differential Revision: http://reviews.llvm.org/D14945 llvm-svn: 254085
*	Remove duplicate getValueType() calls. NFCI.	Simon Pilgrim	2015-11-22	1	-2/+2
\| \| \| \|	llvm-svn: 253823
*	[DAGCombiner] Bugfix for lost chain dependency.	Jonas Paulsson	2015-11-21	1	-13/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	When MergeConsecutiveStores() combines two loads and two stores into wider loads and stores, the chain users of both of the original loads must be transferred to the new load, because it may be that a chain user only depends on one of the loads. New test case: test/CodeGen/SystemZ/dag-combine-01.ll Reviewed by James Y Knight. Bugzilla: https://llvm.org/bugs/show_bug.cgi?id=25310#c6 llvm-svn: 253779
*	X86: More efficient legalization of wide integer compares	Hans Wennborg	2015-11-19	1	-0/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In particular, this makes the code for 64-bit compares on 32-bit targets much more efficient. Example:
	define i32 @test_slt(i64 %a, i64 %b) {
	entry:
	  %cmp = icmp slt i64 %a, %b
	  br i1 %cmp, label %bb1, label %bb2
	bb1:
	  ret i32 1
	bb2:
	  ret i32 2
	}
	Before this patch:
	test_slt:
	  movl 4(%esp), %eax
	  movl 8(%esp), %ecx
	  cmpl 12(%esp), %eax
	  setae %al
	  cmpl 16(%esp), %ecx
	  setge %cl
	  je .LBB2_2
	  movb %cl, %al
	.LBB2_2:
	  testb %al, %al
	  jne .LBB2_4
	  movl $1, %eax
	  retl
	.LBB2_4:
	  movl $2, %eax
	  retl
	After this patch:
	test_slt:
	  movl 4(%esp), %eax
	  movl 8(%esp), %ecx
	  cmpl 12(%esp), %eax
	  sbbl 16(%esp), %ecx
	  jge .LBB1_2
	  movl $1, %eax
	  retl
	.LBB1_2:
	  movl $2, %eax
	  retl
	Differential Revision: http://reviews.llvm.org/D14496 llvm-svn: 253572
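The cmpl/sbbl pair in the "after" listing can be read as a subtract-with-borrow across the two 32-bit halves of each operand. The following is a minimal C++ sketch of that idea, not the code the patch emits. The function name `slt64_via_halves` is hypothetical, and the sketch assumes that `>>` on a negative `int64_t` is an arithmetic shift (guaranteed since C++20, and the usual behaviour of mainstream compilers before that).

```cpp
#include <cstdint>

// Hypothetical helper: signed 64-bit "a < b" computed from 32-bit halves,
// mirroring a cmp on the low halves followed by an sbb on the high halves.
bool slt64_via_halves(std::int64_t a, std::int64_t b) {
    // Low halves are compared as unsigned values; this models the borrow
    // (carry flag) that "cmpl" produces when a_lo < b_lo.
    std::uint32_t a_lo = static_cast<std::uint32_t>(a);
    std::uint32_t b_lo = static_cast<std::uint32_t>(b);
    std::int64_t borrow = (a_lo < b_lo) ? 1 : 0;

    // High halves carry the sign (assumes arithmetic right shift).
    std::int64_t a_hi = a >> 32;
    std::int64_t b_hi = b >> 32;

    // "sbbl" computes a_hi - b_hi - borrow. Evaluating it in 64 bits cannot
    // overflow, so the sign of the result is the condition "jl"/"jge" tests.
    return (a_hi - b_hi - borrow) < 0;
}
```

A single borrow-propagating subtraction replaces the two independent compares and the flag merging seen in the "before" listing.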
*	[DAGCombiner] Improve zextload optimization.	Geoff Berry	2015-11-11	1	-22/+72
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Don't fold (zext (and (load x), cst)) -> (and (zextload x), (zext cst)) if (and (load x) cst) will match as a zextload already and has additional users. For example, the following IR:
	  %load = load i32, i32* %ptr, align 8
	  %load16 = and i32 %load, 65535
	  %load64 = zext i32 %load16 to i64
	  store i32 %load16, i32* %dst1, align 4
	  store i64 %load64, i64* %dst2, align 8
	used to produce the following aarch64 code:
	  ldr w8, [x0]
	  and w9, w8, #0xffff
	  and x8, x8, #0xffff
	  str w9, [x1]
	  str x8, [x2]
	but with this change produces the following aarch64 code:
	  ldrh w8, [x0]
	  str w8, [x1]
	  str x8, [x2]
	Reviewers: resistor, mcrosier Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D14340 llvm-svn: 252789
*	Add target preference for GatherAllAliases max depth	Matt Arsenault	2015-11-11	1	-1/+1
\| \| \| \|	llvm-svn: 252775
*	add a SelectionDAG method to check if no common bits are set in two nodes; NFCI	Sanjay Patel	2015-11-09	1	-16/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This was suggested in: http://reviews.llvm.org/D13956 and is a follow-on to: http://reviews.llvm.org/rL252515 http://reviews.llvm.org/rL252519 This lets us remove logically equivalent/duplicated code from DAGCombiner and X86ISelDAGToDAG. A corresponding function for IR instructions already exists in ValueTracking. llvm-svn: 252539
*	DAGCombiner: Check shouldReduceLoadWidth before combining (and (load), x) -> extload	Tom Stellard	2015-11-06	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \|	Reviewers: resistor, arsenm Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D13805 llvm-svn: 252349
*	Fix two issues in MergeConsecutiveStores:	James Y Knight	2015-11-02	1	-2/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	1) PR25154. This is basically a repeat of PR18102, which was fixed in r200201, and broken again by r234430. The latter changed which of the store nodes was merged into from the first to the last. Thus, we now also need to prefer merging a later store at a given address into the target node, instead of an earlier one. 2) While investigating that, I also realized I'd introduced a bug in r236850. There, I removed a check for alignment -- not realizing that nothing except the alignment check was ensuring that none of the stores were overlapping! This is a really bogus way to ensure there's no aliased stores. A better solution to both of these issues is likely to always use the code added in the 'if (UseAA)' branches which rearrange the chain based on a more principled analysis. I'll look into whether that can be used always, but in the interest of getting things back to working, I think a minimal change makes sense. llvm-svn: 251816
*	Use the 'arcp' fast-math-flag when combining repeated FP divisors	Sanjay Patel	2015-10-27	1	-5/+11
\| \| \| \| \| \| \| \| \| \| \| \|	This is a usage of the IR-level fast-math-flags now that they are propagated to SDNodes. This was originally part of D8900. Removing the global 'enable-unsafe-fp-math' checks will require auto-upgrade and possibly other changes. Differential Revision: http://reviews.llvm.org/D9708 llvm-svn: 251450
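As a rough illustration of what combining repeated FP divisors means, the sketch below contrasts two divisions by the same value with a single reciprocal followed by two multiplications. The names `Quotients`, `divide_twice`, and `divide_via_reciprocal` are hypothetical and this is plain C++, not the DAGCombiner code. The two forms can differ in the last bit for general divisors, which is why such a rewrite is only valid when reciprocal math is allowed.

```cpp
// Hypothetical pair of results for two values divided by one divisor.
struct Quotients {
    double q0;
    double q1;
};

// Straightforward form: two divisions by the same divisor.
Quotients divide_twice(double x, double y, double d) {
    return {x / d, y / d};
}

// Reciprocal form: one division, then one multiplication per use.
// This is only an approximation of the form above in general.
Quotients divide_via_reciprocal(double x, double y, double d) {
    double r = 1.0 / d;
    return {x * r, y * r};
}
```

With a power-of-two divisor such as 4.0 the reciprocal is exact, so both forms agree bit for bit; with a divisor such as 3.0 they may not.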
*	Fix llc crash processing S/UREM for -Oz builds caused by rL250825.	Steve King	2015-10-27	1	-5/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	When taking the remainder of a value divided by a constant, visitREM() attempts to convert the REM to a longer but faster sequence of instructions. This conversion calls combine() on a speculative DIV instruction. Commit rL250825 may cause this combine() to return a DIVREM, corrupting nearby nodes. Flow eventually hits unreachable(). This patch adds a test case and a check to prevent visitREM() from trying to convert the REM instruction in cases where a DIVREM is possible. See http://reviews.llvm.org/D14035 llvm-svn: 251373
*	[DAGCombiner] Generalize masking of constant rotates.	Simon Pilgrim	2015-10-24	1	-5/+10
\| \| \| \| \| \| \| \|	We don't need a mask of a rotation result to be a constant splat - any constant scalar/vector can be usefully folded. Followup to D13851. llvm-svn: 251197
*	[X86][XOP] Add support for lowering vector rotations	Simon Pilgrim	2015-10-24	1	-55/+55
\| \| \| \| \| \| \| \| \| \|	This patch adds support for lowering to the XOP VPROT / VPROTI vector bit rotation instructions. This has required changes to the DAGCombiner rotation pattern matching to support vector types - so far I've only changed it to support splat vectors, but generalising this further is feasible in the future. Differential Revision: http://reviews.llvm.org/D13851 llvm-svn: 251188
*	[X86] - Catch extra combine opportunities for redundant imuls.	Zia Ansari	2015-10-22	1	-8/+92
\| \| \| \| \| \| \| \| \| \| \| \|	When we fold "mul ((add x, c1), c2)" -> "add ((mul x, c2), c1*c2)", we bail if (add x, c1) has multiple users which would result in an extra add instruction. In such cases, this patch adds a check to see if we can eliminate a multiply instruction in exchange for the extra add. I also added the capability of doing the existing optimization with non-splatted vectors (splatted also works). Differential Revision: http://reviews.llvm.org/D13740 llvm-svn: 251028
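The fold rests on distributivity, (x + c1) * c2 = x * c2 + c1 * c2, which also holds for wrapping unsigned arithmetic. A minimal sketch of the two forms, with hypothetical function names and assuming `uint32_t` is not narrower than `int` (true on common platforms):

```cpp
#include <cstdint>

// Original form: multiply the sum by a constant.
std::uint32_t mul_of_add(std::uint32_t x, std::uint32_t c1, std::uint32_t c2) {
    return (x + c1) * c2;
}

// Folded form: c1 * c2 is a compile-time constant when c1 and c2 are
// constants, so only one multiply and one add remain at run time.
std::uint32_t add_of_mul(std::uint32_t x, std::uint32_t c1, std::uint32_t c2) {
    return x * c2 + c1 * c2;
}
```

Both functions return the same value for every input, including inputs where the intermediate sum or product wraps around.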
*	Combining DIV+REM->DIVREM doesn't belong in LegalizeDAG; move it over into DAGCombiner.	Artyom Skrobov	2015-10-20	1	-18/+95
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: In addition to moving the code over, this patch amends the DIV,REM -> DIVREM combining to run on all affected nodes at once: if the nodes are converted to DIVREM one at a time, then the resulting DIVREM may get legalized by the backend into something target-specific that we won't be able to recognize and correlate with the remaining nodes. The motivation is to "prepare terrain" for D13862: when we set DIV and REM to be legalized to libcalls, instead of the DIVREM, we otherwise lose the ability to combine them together. To prevent this, we need to take the DIV,REM -> DIVREM combining out of the lowering stage. Reviewers: RKSimon, eli.friedman, rengolin Subscribers: john.brawn, rengolin, llvm-commits Differential Revision: http://reviews.llvm.org/D13733 llvm-svn: 250825
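At the source level, the DIV,REM -> DIVREM idea corresponds to computing quotient and remainder of the same operands in one operation rather than two. The sketch below uses the standard `std::div` as an analogy. `QuotRem` and both function names are hypothetical, and this is not how the DAG node is built.

```cpp
#include <cstdlib>

// Hypothetical result pair.
struct QuotRem {
    int quot;
    int rem;
};

// Separate operations, analogous to an independent DIV node and REM node.
QuotRem div_and_rem_separately(int n, int d) {
    return {n / d, n % d};
}

// One combined operation, analogous to a single DIVREM node that yields
// both results.
QuotRem div_and_rem_combined(int n, int d) {
    std::div_t r = std::div(n, d);
    return {r.quot, r.rem};
}
```

Both forms truncate toward zero, so they agree for negative operands as well.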
*	A doccomment for CombineTo, and some NFC refactorings	Artyom Skrobov	2015-10-14	1	-39/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Caching SDLoc(N), instead of recreating it in every single function call, keeps the code denser, and allows long lines to be unwrapped. Reviewers: sunfish, atrick, sdmitrouk Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D13726 llvm-svn: 250305
*	Merge DAGCombiner::visitSREM and DAGCombiner::visitUREM (NFC)	Artyom Skrobov	2015-10-14	1	-66/+34
\| \| \| \| \| \| \| \| \| \| \| \|	Summary: The two implementations had more code in common than not. Reviewers: sunfish, MatzeB, sdmitrouk Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D13724 llvm-svn: 250302
*	DAGCombiner: Don't stop finding better chain on 2 aliases	Matt Arsenault	2015-10-13	1	-4/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The comment says this was stopped because it was unlikely to be profitable. This is not true if you want to combine vector loads with multiple components. For a simple case that looks like
	  t0 = load t0 ...
	  t1 = load t0 ...
	  t2 = load t0 ...
	  t3 = load t0 ...
	  t4 = store t0:1, t0:1
	  t5 = store t4, t1:0
	  t6 = store t5, t2:0
	  t7 = store t6, t3:0
	We want to get all of these stores onto a chain that is a TokenFactor of these N loads. This mostly solves the AMDGPU merge-stores.ll regressions with -combiner-alias-analysis for merging vector stores of vector loads. llvm-svn: 250138
*	DAGCombiner: Combine extract_vector_elt from build_vector	Matt Arsenault	2015-10-12	1	-5/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This basic combine was surprisingly missing. AMDGPU legalizes many operations in terms of 32-bit vector components, so not doing this results in many extra copies and subregister extracts that need to be cleaned up later. InstCombine already does this for the hasOneUse case. The target hook is to fix a handful of tests which break (e.g. ARM/vmov.ll) which turn from a vector materialize repeated immediate instruction to a constant vector load with more scalar copies from it. llvm-svn: 250129
*	[SelectionDAG] Add common vector constant folding helper function	Simon Pilgrim	2015-10-12	1	-63/+5
\| \| \| \| \| \| \| \| \| \| \| \|	We have a number of functions that implement constant folding of vectors (unary and binary ops) in near identical manners (and the differences don't appear to be critical). This patch introduces a common implementation (SelectionDAG::FoldConstantVectorArithmetic) and calls this in both the unary and binary op cases. After this initial patch I intend to begin enabling vector constant folding for a wider number of opcodes in SelectionDAG::getNode(). Differential Revision: http://reviews.llvm.org/D13665 llvm-svn: 250118
*	[DAGCombiner] Improved FMA combine support for vectors	Simon Pilgrim	2015-10-11	1	-33/+36
\| \| \| \| \| \| \| \|	Enabled constant canonicalization for all constants. Improved combining of constant vectors. llvm-svn: 249993
*	[DAGCombiner] Tidyup FMINNUM/FMAXNUM constant folding	Simon Pilgrim	2015-10-11	1	-14/+14
\| \| \| \| \| \| \| \|	Enable constant folding for vector splats as well as scalars. Enable constant canonicalization for all scalar and vector constants. llvm-svn: 249978
*	[DAGCombiner] Generalize FADD constant combines to work with vectors	Simon Pilgrim	2015-10-03	1	-16/+17
\| \| \| \| \| \| \| \|	Updated the FADD combines to work with vectors as well as scalars. Differential Revision: http://reviews.llvm.org/D13416 llvm-svn: 249251
*	[DAGCombiner] Merge SIGN_EXTEND_INREG vector constant folding methods. NFCI.	Simon Pilgrim	2015-10-03	1	-24/+4
\| \| \| \| \| \| \| \| \| \|	visitSIGN_EXTEND_INREG calls SelectionDAG::getNode to constant fold scalar constants but handles vector constants itself, despite getNode being capable of dealing with them. This required a minor change to the getNode implementation to actually deal with cases where the scalars of a BUILD_VECTOR were wider integers than the vector type - which was the only extra ability of the visitSIGN_EXTEND_INREG implementation. No codegen intended and all existing tests remain the same. llvm-svn: 249236
*	[DAGCombine] Fix getStoreMergeAndAliasCandidates's AA-enabled chain walking	Hal Finkel	2015-09-28	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When AA is being used, non-aliasing stores are canonicalized to use the same chain, and DAGCombiner::getStoreMergeAndAliasCandidates can take advantage of this by looking only at users of a store's chain operand. However, user iteration is not result-number specific, so we need to check that the use is as a chain operand, and not via some other operand. It is certainly possible to have another potentially-aliasing store, which shares the first's base pointer, and uses the first's chain's node via some other operand. Failure to catch this situation caused, at least in the included test case, an assert later because the relative sequence-number ordering caused later replacement to create a cycle in the DAG. llvm-svn: 248698
*	DAGCombiner: Check if store is volatile first	Matt Arsenault	2015-09-25	1	-3/+3
\| \| \| \| \| \|	This is the simpler check. NFC. llvm-svn: 248625
*	merge vector stores into wider vector stores and fix AArch64 misaligned access TLI hook (PR21711)	Sanjay Patel	2015-09-25	1	-11/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a redo of D7208 ( r227242 - http://llvm.org/viewvc/llvm-project?view=revision&revision=227242 ). The patch was reverted because an AArch64 target could infinite loop after the change in DAGCombiner to merge vector stores. That happened because AArch64's allowsMisalignedMemoryAccesses() wasn't telling the truth. It reported all unaligned memory accesses as fast, but then split some 128-bit unaligned accesses up in performSTORECombine() because they are slow. This patch attempts to fix the problem in AArch64's allowsMisalignedMemoryAccesses() while preserving existing (perhaps questionable) lowering behavior. The x86 test shows that store merging is working as intended for a target with fast 32-byte unaligned stores. Differential Revision: http://reviews.llvm.org/D12635 llvm-svn: 248622
*	Use new TokenFactor chain when merging stores	Matt Arsenault	2015-09-24	1	-5/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If the stores are storing values from loads which partially alias the stores, we could end up placing the merged loads and stores on the same chain which has the potential to break. Each store may have a different chain dependency on only some of the original loads. Create a new TokenFactor to capture all of the required dependencies of the stores rather than assuming all stores can use the same chain. The testcase is a situation where this happens, although it does not have an observable change from this. The DAG nodes just happened to not be reordered before despite this missing chain dependency. This is based on an off-list report for an out of tree target which regressed due to r246307 and I haven't managed to find a case where the nodes do end up reordered with an in tree target. llvm-svn: 248468
*	[DAGCombiner] Improve FMA support for interpolation patterns	Simon Pilgrim	2015-09-21	1	-0/+89
\| \| \| \| \| \| \| \| \| \|	This patch adds support for combining patterns such as (FMUL(FADD(1.0, x), y)) and (FMUL(FSUB(x, 1.0), y)) to their FMA equivalents. This is useful in particular for linear interpolation cases such as (FADD(FMUL(x, t), FMUL(y, FSUB(1.0, t)))) Differential Revision: http://reviews.llvm.org/D13003 llvm-svn: 248210
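The patterns named above have direct fused-multiply-add equivalents, which can be sketched with the standard `std::fma`. The function names are hypothetical and the exact node forms the combiner produces are an assumption here; the algebra is simply (1 + x) * y = x * y + y and (x - 1) * y = x * y - y.

```cpp
#include <cmath>

// (1.0 + x) * y rewritten as x * y + y.
double one_plus_x_times_y(double x, double y) {
    return std::fma(x, y, y);
}

// (x - 1.0) * y rewritten as x * y - y.
double x_minus_one_times_y(double x, double y) {
    return std::fma(x, y, -y);
}

// Linear interpolation x * t + y * (1.0 - t) as two chained FMAs:
// y * (1.0 - t) becomes fma(-t, y, y), and x * t is then added to it.
double lerp_fma(double x, double y, double t) {
    return std::fma(x, t, std::fma(-t, y, y));
}
```

Because an FMA rounds once rather than twice, results can differ slightly from the unfused expressions for inputs that are not exactly representable.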
*	[DAGCombiner] Tidy up FMA combine helpers. NFCI.	Simon Pilgrim	2015-09-21	1	-25/+21
\| \| \| \| \| \|	Based on feedback for D13003. llvm-svn: 248206
*	Fix accidentally committed debug printing	Matt Arsenault	2015-09-21	1	-14/+1
\| \| \| \|	llvm-svn: 248190
*	DAGCombiner: Replace store of FP constant after attempting store merges	Matt Arsenault	2015-09-21	1	-10/+10
\| \| \| \| \| \| \| \| \|	If storing multiple FP constants, some subset of the stores would be replaced with integers due to visit order, so MergeConsecutiveStores would only partially merge these. llvm-svn: 248169
*	Factor replacement of stores of FP constants into new function	Matt Arsenault	2015-09-21	1	-72/+104
\| \| \| \|	llvm-svn: 248168
*	Use makeArrayRef or None to avoid unnecessarily mentioning the ArrayRef type extra times. NFC	Craig Topper	2015-09-21	1	-1/+1
\| \| \| \| \| \|	llvm-svn: 248140
*	propagate fast-math-flags on DAG nodes	Sanjay Patel	2015-09-16	1	-99/+129
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	After D10403, we had FMF in the DAG but disabled by default. Nick reported no crashing errors after some stress testing, so I enabled them at r243687. However, Escha soon notified us of a bug not covered by any in-tree regression tests: if we don't propagate the flags, we may fail to CSE DAG nodes because differing FMF causes them to not match. There is one test case in this patch to prove that point. This patch hopes to fix or leave a 'TODO' for all of the in-tree places where we create nodes that are FMF-capable. I did this by putting an assert in SelectionDAG.getNode() to find any FMF-capable node that was being created without FMF ( D11807 ). I then ran all regression tests and test-suite and confirmed that everything passes. This patch exposes remaining work to get DAG FMF to be fully functional: (1) add the flags to non-binary nodes such as FCMP, FMA and FNEG; (2) add the flags to intrinsics; (3) use the flags as conditions for transforms rather than the current global settings. Differential Revision: http://reviews.llvm.org/D12095 llvm-svn: 247815
*	[DAGCombine] Truncate BUILD_VECTOR operators if necessary when constant folding vectors	Silviu Baranga	2015-09-10	1	-11/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The BUILD_VECTOR node will truncate its operators to match the type. We need to take this into account when constant folding - we need to perform a truncation before constant folding the elements. This is because the upper bits can change the result, depending on the operation type (for example this is the case for min/max). This change also adds a regression test. Reviewers: jmolloy Subscribers: jmolloy, llvm-commits Differential Revision: http://reviews.llvm.org/D12697 llvm-svn: 247265
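To see why the upper bits matter for min/max, consider an 8-bit vector element whose constant is held in a wider 32-bit value. The sketch below is a simplified model with hypothetical function names, not the SelectionDAG code: folding an unsigned min on the wide values and truncating afterwards can give a different answer from truncating first.

```cpp
#include <algorithm>
#include <cstdint>

// Fold on the wide values, then truncate the result to the 8-bit element.
// This ignores the implicit truncation of the operands.
std::uint8_t umin_fold_wide(std::uint32_t a, std::uint32_t b) {
    return static_cast<std::uint8_t>(std::min(a, b));
}

// Truncate each operand to the 8-bit element type first, then fold.
std::uint8_t umin_fold_truncated(std::uint32_t a, std::uint32_t b) {
    return std::min(static_cast<std::uint8_t>(a),
                    static_cast<std::uint8_t>(b));
}
```

For operands 0x100 and 0x01 the wide fold yields 1, while truncating first turns 0x100 into 0 and yields 0, which is the value the element-wise operation would actually produce.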
*	check for fastness before merging in DAGCombiner::MergeConsecutiveStores()	Sanjay Patel	2015-09-03	1	-11/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use and check the 'IsFast' optional parameter to TLI.allowsMemoryAccess() any time we have a merged access candidate. Without this patch, we were generating unaligned 16-byte (SSE) memops for x86 targets where those accesses are slow. This change was mentioned in: http://reviews.llvm.org/D10662 and http://reviews.llvm.org/D10905 and will help solve PR21711. Differential Revision: http://reviews.llvm.org/D12573 llvm-svn: 246771