bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	[gc.statepoint] Change gc.statepoint intrinsic's return type to token type instead of i32 type	Chen Li	2015-12-26	40	-224/+224
	Summary: This patch changes the gc.statepoint intrinsic's return type from i32 to the token type. Using token types can prevent LLVM from merging different gc.statepoint nodes into PHI nodes, which would cause further problems with gc relocations. The patch also changes how gc.relocate and gc.result look for their corresponding gc.statepoint on the unwind path. The current implementation uses the selector value extracted from a { i8*, i32 } landingpad as a hook to find the gc.statepoint, while the patch directly uses a token type landingpad (http://reviews.llvm.org/D15405) to find the gc.statepoint. Reviewers: sanjoy, JosephTremoulet, pgavlin, igor-laevsky, mjacob Subscribers: reames, mjacob, sanjoy, llvm-commits Differential Revision: http://reviews.llvm.org/D15662 llvm-svn: 256443
*	[InstCombine] transform more extract/insert pairs into shuffles (PR2109)	Sanjay Patel	2015-12-24	1	-16/+10
	This is an extension of the shuffle combining from r203229: http://reviews.llvm.org/rL203229
	The idea is to widen a short input vector with undef elements so the existing shuffle transform for extract/insert can kick in. The motivation is to finally solve PR2109: https://llvm.org/bugs/show_bug.cgi?id=2109
	For that example, the IR becomes:
	  %1 = bitcast <2 x i32>* %P to <2 x float>*
	  %ld1 = load <2 x float>, <2 x float>* %1, align 8
	  %2 = shufflevector <2 x float> %ld1, <2 x float> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	  %i2 = shufflevector <4 x float> %A, <4 x float> %2, <4 x i32> <i32 0, i32 1, i32 4, i32 5>
	  ret <4 x float> %i2
	And x86 SSE output improves from:
	  movq (%rdi), %xmm1 ## xmm1 = mem[0],zero
	  movdqa %xmm1, %xmm2
	  shufps $229, %xmm2, %xmm2 ## xmm2 = xmm2[1,1,2,3]
	  shufps $48, %xmm0, %xmm1 ## xmm1 = xmm1[0,0],xmm0[3,0]
	  shufps $132, %xmm1, %xmm0 ## xmm0 = xmm0[0,1],xmm1[0,2]
	  shufps $32, %xmm0, %xmm2 ## xmm2 = xmm2[0,0],xmm0[2,0]
	  shufps $36, %xmm2, %xmm0 ## xmm0 = xmm0[0,1],xmm2[2,0]
	  retq
	To the almost optimal:
	  movhpd (%rdi), %xmm0
	Note: There's a tension in the existing transform related to generating arbitrary shufflevector masks. We avoid that in other places in InstCombine because we're scared that codegen can't handle strange masks, but it looks like we're ok with producing those here. I purposely chose weird insert/extract indexes for the regression tests to see the effect in these cases. For PowerPC+Altivec, AArch64, and X86+SSE/AVX, I think the codegen is equal or better for these examples.
	Differential Revision: http://reviews.llvm.org/D15096 llvm-svn: 256394
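	The widening step described above can be modelled outside LLVM. The following is a minimal Python sketch, not the InstCombine code: the helper names (shufflevector, widen_with_undef, insert_extract_chain) and the sample values are assumptions made for illustration, and None stands in for an undef lane.

```python
UNDEF = None  # stand-in for an IR undef lane (assumption of this sketch)


def shufflevector(a, b, mask):
    """Model of shufflevector: mask indices select from the concatenation a ++ b."""
    both = list(a) + list(b)
    return [UNDEF if i is UNDEF else both[i] for i in mask]


def widen_with_undef(v, width):
    """Widen a short vector to `width` lanes, padding with undef lanes."""
    mask = list(range(len(v))) + [UNDEF] * (width - len(v))
    return shufflevector(v, [UNDEF] * len(v), mask)


def insert_extract_chain(a, ld):
    """The extractelement/insertelement pairs the transform starts from."""
    out = list(a)
    out[2] = ld[0]  # insertelement (extractelement ld, 0) into lane 2
    out[3] = ld[1]  # insertelement (extractelement ld, 1) into lane 3
    return out


a = [10.0, 11.0, 12.0, 13.0]  # plays the role of %A
ld = [1.0, 2.0]               # plays the role of %ld1

widened = widen_with_undef(ld, 4)                    # like %2
combined = shufflevector(a, widened, [0, 1, 4, 5])   # like %i2
```

	Both routes produce the same four lanes, which is why one shuffle with mask <0, 1, 4, 5> can replace the chain of extract/insert pairs.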
*	[OperandBundles] Have TailCallElim play nice with operand bundles	David Majnemer	2015-12-23	1	-0/+10
	A call site's use of a Value might not correspond to an argument operand but to a bundle operand. This fixes PR25928. llvm-svn: 256328
*	[OperandBundles] Have InstCombine play nice with operand bundles	David Majnemer	2015-12-23	1	-0/+11
	Don't assume a call's use corresponds to an argument operand; it might correspond to a bundle operand. llvm-svn: 256327
*	[OperandBundles] Have DeadArgElim play nice with operand bundles	David Majnemer	2015-12-23	1	-0/+12
	A call site's use of a Value might not correspond to an argument operand but to a bundle operand. llvm-svn: 256326
*	[RS4GC] Fix base pair printing for constants.	Manuel Jacob	2015-12-23	2	-2/+2
	Previously, "%" + name of the value was printed for each derived and base pointer. This is correct for instructions, but wrong for e.g. globals. llvm-svn: 256305
*	Also add unnamed_addr to functions.	Rafael Espindola	2015-12-22	1	-0/+6
	llvm-svn: 256281
*	Delete dead GlobalAliases.	Rafael Espindola	2015-12-22	3	-3/+6
	llvm-svn: 256276
*	[BPI] Replace weights by probabilities in BPI.	Cong Hou	2015-12-22	1	-1/+1
	This patch removes all weight-related interfaces from BPI and replaces them with probability versions. With this patch, we won't use edge weights anymore in either IR or MC passes. Edge probability is a better representation in terms of CFG update and validation. Differential revision: http://reviews.llvm.org/D15519 llvm-svn: 256263
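	To illustrate why probabilities are easier to validate than raw weights, here is a minimal Python sketch. It is not the BPI interface: weights_to_probabilities is a hypothetical helper, and exact fractions are used only to keep the arithmetic obvious.

```python
from fractions import Fraction


def weights_to_probabilities(weights):
    """Normalize per-successor edge weights into probabilities that sum to 1."""
    total = sum(weights)
    if total == 0:
        # No information: fall back to an even split (an assumption of this sketch).
        return [Fraction(1, len(weights))] * len(weights)
    return [Fraction(w, total) for w in weights]


probs = weights_to_probabilities([3, 1])
```

	With probabilities, a block's outgoing edges can be checked with a single invariant (they sum to 1), whereas weights are only meaningful relative to their siblings.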
*	Remove deprecated llvm.experimental.gc.result.{int,float,ptr} intrinsics.	Manuel Jacob	2015-12-22	1	-2/+2
	Summary: These were deprecated 11 months ago when a generic llvm.experimental.gc.result intrinsic, which works for all types, was added. Reviewers: sanjoy, reames Subscribers: sanjoy, chenli, llvm-commits Differential Revision: http://reviews.llvm.org/D15719 llvm-svn: 256262
*	[RS4GC] Fix crash in the case that a live variable has a constant base.	Manuel Jacob	2015-12-22	2	-0/+39
	Summary: Previously, RS4GC crashed in CreateGCRelocates() because it assumed that every base is also in the array of live variables, which isn't true if a live variable has a constant base. This change fixes the crash by making sure CreateGCRelocates() won't try to relocate a live variable with a constant base. This would be unnecessary anyway because anything with a constant base won't move. Reviewers: reames Subscribers: llvm-commits, sanjoy Differential Revision: http://reviews.llvm.org/D15556 llvm-svn: 256252
*	Determine callee's hotness and adjust threshold based on that. NFC.	Easwaran Raman	2015-12-22	2	-0/+78
	This uses the same criteria used in CFE's CodeGenPGO to identify hot and cold callees and uses values of inlinehint-threshold and inlinecold-threshold respectively as the thresholds for such callees. Differential Revision: http://reviews.llvm.org/D15245 llvm-svn: 256222
*	[safestack] Add option for non-TLS unsafe stack pointer.	Evgeniy Stepanov	2015-12-22	1	-0/+5
	This patch adds an option, -safe-stack-no-tls, for using normal storage instead of thread-local storage for the unsafe stack pointer. This can be useful when SafeStack is applied to an operating system kernel. http://reviews.llvm.org/D15673 Patch by Michael LeMay. llvm-svn: 256221
*	[cfi] Fix LowerBitSets on 32-bit targets.	Evgeniy Stepanov	2015-12-21	1	-0/+21
	This code attempts to truncate IntPtrTy to i32, which may be the same type. llvm-svn: 256205
*	Nonnull elements in OperandBundleCallSites are not all Instructions	Sanjoy Das	2015-12-19	1	-0/+36
	`CloneAndPruneIntoFromInst` sometimes RAUW's dead instructions with `undef` before erasing them (to avoid deleting instructions that still have uses). This changes the `WeakVH` in `OperandBundleCallSites` to hold an `undef`, and we need to guard against this eventuality in `llvm::InlineFunction`. llvm-svn: 256110
*	[Deopt bundles] Fix a test case	Sanjoy Das	2015-12-19	1	-1/+1
	The `CHECK-NOT` line was incorrect, and would not have caught a breakage. llvm-svn: 256109
*	Remove double blanks. NFC.	Manuel Jacob	2015-12-19	1	-7/+7
	llvm-svn: 256100
*	[RS4GC] Remove an overly strong assertion	Philip Reames	2015-12-19	1	-0/+35
	As shown by the included test case, it's reasonable to end up with constant references during base pointer calculation. The code actually handled this case just fine; we only had the assert to help isolate problems under the belief that constant references shouldn't be present in IR generated by managed frontends. This turned out to be wrong on two fronts: 1) Manuel Jacob is working on a language with constant references, and 2) we found a case where the optimizer does create them in practice. llvm-svn: 256079
*	Clean up the processing of dbg.value in various places	Keno Fischer	2015-12-19	1	-0/+52
	Summary: First up is instcombine, where in the dbg.declare -> dbg.value conversion, the llvm.dbg.value needs to be called on the actual loaded value, rather than the address (since the whole point of this transformation is to be able to get rid of the alloca). Further, now that that's cleaned up, we can remove a hack in the backend that would add an implicit OP_deref if the argument to dbg.value was an alloca. This stems from before the existence of DIExpression and is no longer necessary since the deref can be expressed explicitly. Now, in order to make sure that the tests pass with this change, we need to correct the printing of DEBUG_VALUE comments to take into account the expression, which wasn't taken into account before. Unfortunately, for both these changes, there were a number of incorrect test cases (mostly the wrong number of DW_OP_derefs, but also a couple where the test itself was broken more badly). aprantl and I have gone through and adjusted these test cases in order to make them pass with these fixes and in some cases to make sure they're actually testing what they are meant to test. Reviewers: aprantl Subscribers: dsanders Differential Revision: http://reviews.llvm.org/D14186 llvm-svn: 256077
*	AMDGPU: Switch barrier intrinsics to using convergent	Matt Arsenault	2015-12-19	2	-0/+36
	noduplicate prevents unrolling of small loops that happen to have barriers in them. If a loop has a barrier in it, it is OK to duplicate it for the unroll. llvm-svn: 256075
*	[NaryReassociate] allow candidate to have a different type	Jingyue Wu	2015-12-18	1	-0/+17
	Summary: If Candidate may have a different type from GEP, we should bitcast or pointer cast it to GEP's type so that the later RAUW doesn't complain. Added a test in nary-gep.ll Reviewers: tra, meheff Subscribers: mcrosier, llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D15618 llvm-svn: 256035
*	[WinEH] Update LCSSA to handle catchswitch with handlers inside and outside a loop	Andrew Kaylor	2015-12-18	1	-0/+95
	Differential Revision: http://reviews.llvm.org/D15630 llvm-svn: 256005
*	[InstCombine] Extend peephole DSE to handle unordered atomics	Philip Reames	2015-12-17	1	-0/+113
	This extends the same line of reasoning used in EarlyCSE with http://reviews.llvm.org/D15352 to the DSE implementation in InstCombine. Key points:
	* We only remove unordered or simple stores.
	* The loads producing values consumed by dead stores don't influence whether the store is dead.
	Differential Revision: http://reviews.llvm.org/D15354 llvm-svn: 255932
*	[EarlyCSE] DSE of atomic unordered stores	Philip Reames	2015-12-17	1	-0/+74
	The rules for removing trivially dead stores are a lot less complicated than loads. Since we know the later store post dominates the former and the former dominates the latter, unless the former has side effects other than the actual store, we can remove it. One slightly surprising thing is that we can freely remove atomic stores, even if the later one isn't atomic. There's no guarantee the atomic one was ever visible. For the moment, we don't handle DSE of ordered atomic stores. We could extend the same chain of reasoning to them, but the catch is we'd then have to model the ordering effect without a store instruction. Since our fences are stronger than our operation orderings, simply using a fence isn't an obvious win. This arguably calls for a refinement in our fence specification, but that's (much) later work. Differential Revision: http://reviews.llvm.org/D15352 llvm-svn: 255914
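	The rule in this message can be sketched on a toy instruction list. This is a minimal Python model, not the EarlyCSE implementation: the tuple encoding of operations, the ordering names, and the function eliminate_dead_stores are all assumptions for illustration. An earlier store is dropped only when the very next memory operation is a store to the same address and the earlier store is non-atomic or unordered.

```python
# Orderings whose stores this sketch is willing to delete (assumed names).
REMOVABLE = {"notatomic", "unordered"}


def eliminate_dead_stores(ops):
    """Drop a store that is immediately overwritten by a later store.

    ops is a list of tuples: ("store", addr, value, ordering) or ("load", addr).
    Any non-store operation conservatively ends the candidate window.
    """
    out = []
    last_store = None  # index in `out` of the most recent store, if still a candidate
    for op in ops:
        if op[0] == "store":
            addr = op[1]
            if last_store is not None:
                prev = out[last_store]
                if prev[1] == addr and prev[3] in REMOVABLE:
                    del out[last_store]  # earlier store can never be observed
            out.append(op)
            last_store = len(out) - 1
        else:
            out.append(op)
            last_store = None  # a read (or anything else) may observe the store
    return out


dead = eliminate_dead_stores([
    ("store", "p", 1, "unordered"),
    ("store", "p", 2, "notatomic"),
])
ordered = eliminate_dead_stores([
    ("store", "p", 1, "seq_cst"),
    ("store", "p", 2, "notatomic"),
])
observed = eliminate_dead_stores([
    ("store", "p", 1, "notatomic"),
    ("load", "p"),
    ("store", "p", 2, "notatomic"),
])
```

	The second case keeps both stores because removing an ordered atomic store would also remove its ordering effect, which is the limitation the message describes.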
*	[ThinLTO] Metadata linking for imported functions	Teresa Johnson	2015-12-17	2	-0/+72
	Summary: Second patch split out from http://reviews.llvm.org/D14752. Maps metadata as a post-pass from each module when importing is complete, suturing up final metadata to the temporary metadata left on the imported instructions. This entails saving the mapping from bitcode value id to temporary metadata in the importing pass, and from bitcode value id to final metadata during the metadata linking postpass. Depends on D14825. Reviewers: dexonsmith, joker.eph Subscribers: davidxl, llvm-commits, joker.eph Differential Revision: http://reviews.llvm.org/D14838 llvm-svn: 255909
*	[NFC] Update horizontal reduction test cases.	Charlie Turner	2015-12-16	2	-2/+2
	These testcases no longer need to specify -slp-vectorize-hor, since it was enabled by default in r252733. llvm-svn: 255783
*	[SimplifyCFG] Don't create unnecessary PHIs	James Molloy	2015-12-16	1	-0/+215
	In conditional store merging, we were creating PHIs when we didn't need to. If the value to be predicated isn't defined in the block we're predicating, then it doesn't need a PHI at all (because we only deal with triangles and diamonds, any value not in the predicated BB must dominate the predicated BB). This fixes a large code size increase in some benchmarks in a popular embedded benchmark suite. Now with a fix (and fixed tests) for the conformance issue seen in Chromium. llvm-svn: 255767
*	[EarlyCSE] DSE of stores which write back loaded values	Philip Reames	2015-12-16	1	-0/+74
	Extend EarlyCSE with an additional style of dead store elimination. If we write back a value just read from that memory location, we can eliminate the store under the assumption that the value hasn't changed. I'm implementing this mostly because I noticed the omission when looking at the code. It seemed strange to have InstCombine have a peephole which was more powerful than EarlyCSE. :) Differential Revision: http://reviews.llvm.org/D15397 llvm-svn: 255739
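	The write-back case can be sketched the same way. This is a minimal Python model under stated assumptions, not the EarlyCSE code: the tuple encoding and the function eliminate_writeback_stores are hypothetical, and every other store is treated as possibly aliasing.

```python
def eliminate_writeback_stores(ops):
    """Drop stores that write back a value just loaded from the same address.

    ops is a list of tuples: ("load", result_name, addr) or ("store", value_name, addr).
    """
    out = []
    available = {}  # addr -> name of the value known to equal the current contents
    for op in ops:
        if op[0] == "load":
            _, name, addr = op
            available[addr] = name
            out.append(op)
        elif op[0] == "store":
            _, value, addr = op
            if available.get(addr) == value:
                continue  # memory already holds this value: the store is a no-op
            available.clear()  # conservatively assume the store may alias anything
            available[addr] = value
            out.append(op)
        else:
            available.clear()  # unknown operation: forget everything
            out.append(op)
    return out


simple = eliminate_writeback_stores([
    ("load", "%v", "p"),
    ("store", "%v", "p"),
])
clobbered = eliminate_writeback_stores([
    ("load", "%v", "p"),
    ("store", "%x", "q"),
    ("store", "%v", "p"),
])
```

	The second sequence keeps its final store because the intervening store to q might have changed the contents of p.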
*	[IR] Add support for floating point atomic loads and stores	Philip Reames	2015-12-16	1	-0/+82
	This patch allows atomic loads and stores of floating point to be specified in the IR and adds an adapter to allow them to be lowered via existing backend support for the bitcast-to-equivalent-integer idiom. Previously, the only way to specify an atomic float operation was to bitcast the pointer to an i32, load the value as an i32, then bitcast to a float. At its most basic, this patch simply moves this expansion step to the point we start lowering to the backend. This patch does not add canonicalization rules to convert the bitcast idioms to the appropriate atomic loads. I plan to do that in the future, but for now, let's simply add the support. I'd like to get instruction selection working through at least one backend (x86-64) without the bitcast conversion before canonicalizing into this form. Similarly, I haven't yet added the target hooks to opt out of the lowering step I added to AtomicExpand. I figured it would make more sense to add those once at least one backend (x86) was ready to actually opt out. As you can see from the included tests, the generated code quality is not great. I plan on submitting some patches to fix this, but help from others along that line would be very welcome. I'm not super familiar with the backend and my ramp up time may be material. Differential Revision: http://reviews.llvm.org/D15471 llvm-svn: 255737
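	The bitcast-to-equivalent-integer idiom mentioned above amounts to moving a float through memory as its 32-bit pattern. A minimal Python sketch of that idea follows; AtomicI32Cell and the helper functions are hypothetical names for this illustration, and a lock stands in for the hardware's atomic integer access.

```python
import struct
import threading


def float_to_i32_bits(x):
    """Reinterpret a single-precision float as its 32-bit integer pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def i32_bits_to_float(bits):
    """Reinterpret a 32-bit integer pattern as a single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


class AtomicI32Cell:
    """Toy memory cell that only supports atomic 32-bit integer access."""

    def __init__(self):
        self._lock = threading.Lock()
        self._bits = 0

    def store(self, bits):
        with self._lock:
            self._bits = bits & 0xFFFFFFFF

    def load(self):
        with self._lock:
            return self._bits


def atomic_store_float(cell, x):
    cell.store(float_to_i32_bits(x))  # bitcast float -> i32, then integer store


def atomic_load_float(cell):
    return i32_bits_to_float(cell.load())  # integer load, then bitcast i32 -> float


cell = AtomicI32Cell()
atomic_store_float(cell, 1.5)
```

	The round trip is exact because only the bit pattern moves; no numeric conversion takes place.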
*	Cross-DSO control flow integrity (LLVM part).	Evgeniy Stepanov	2015-12-15	1	-0/+88
	An LTO pass that generates a __cfi_check() function that validates a call based on a hash of the call-site-known type and the target pointer. llvm-svn: 255693
*	[LoopVectorizer] Refine loop vectorizer's register usage calculator by ignoring specific instructions.	Cong Hou	2015-12-15	2	-1/+72
	(This is the third attempt to check in this patch; the first two are r255454 and r255460. The previously failing test file reg-usage.ll is now moved to the test/Transform/LoopVectorize/X86 directory with target datalayout and target triple indicated.) LoopVectorizationCostModel::calculateRegisterUsage() is used to estimate the register usage for specific VFs. However, it takes into account many instructions that won't be vectorized, such as induction variables, GetElementPtr instructions, etc. This makes the loop vectorizer too conservative when choosing VF. In this patch, the induction variables that won't be vectorized plus GetElementPtr instructions will be added to the ValuesToIgnore set so that their register usage won't be considered any more. Differential revision: http://reviews.llvm.org/D15177 llvm-svn: 255691
*	[SimplifyCFG] allow speculation of exactly one expensive instruction (PR24818)	Sanjay Patel	2015-12-15	2	-43/+23
	This is the last general step to allow more IR-level speculation with a safety harness in place in CodeGenPrepare. The intent is to restore the behavior enabled by: http://reviews.llvm.org/rL228826 but prevent bad performance such as: https://llvm.org/bugs/show_bug.cgi?id=24818
	Earlier patches in this sequence:
	D12882 (disable SimplifyCFG speculation for expensive instructions)
	D13297 (have CGP despeculate expensive ops)
	D14630 (have CGP despeculate special versions of cttz/ctlz)
	As shown in the test cases, we only have two instructions currently affected: ctz for some x86 and fdiv generally. Allowing exactly one expensive instruction is a bit of a hack, but it lines up with what is currently implemented in CGP. If we make the despeculation more general in CGP, we can make the speculation here more liberal. A follow-up patch will adjust the cost for sqrt and possibly other typically expensive math intrinsics (currently everything is cheap by default). GPU targets would likely want to override those expensive default costs (just as they probably should already override the cost of div/rem) because just about any math is cheaper than control-flow on those targets.
	Differential Revision: http://reviews.llvm.org/D15213 llvm-svn: 255660
*	AMDGPU: mark ldexp LibCalls as unavailable	Nicolai Hahnle	2015-12-15	1	-10/+9
	Summary: The LibCallSimplifier will turn llvm.exp2.* intrinsics into ldexp* libcalls which do not make sense with the AMDGPU backend. In the long run, we'll want an llvm.ldexp.* intrinsic to properly make use of this optimization, but this works around the problem for now. See also: http://reviews.llvm.org/D14327 (suggested llvm.ldexp.* implementation) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92709 Reviewers: arsenm, tstellarAMD Differential Revision: http://reviews.llvm.org/D14990 llvm-svn: 255658
*	Instcombine: destructor loads of structs that do not contain padding	Mehdi Amini	2015-12-15	3	-84/+108
	For non padded structs, we can just proceed and deaggregate them. We don't want to do this when there is padding in the struct, so as not to lose information about this padding (the subsequent passes would then try hard to preserve the padding, which is undesirable). Also update extractvalue.ll and cast.ll so that they use structs with padding. Remove the FIXME in the extractvalue of load case as the non padded case is handled when processing the load, and we don't want to do it on the padded case. Patch by: Amaury SECHET <deadalnix@gmail.com> Differential Revision: http://reviews.llvm.org/D14483 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 255600
*	[PGO] make profile prefix even shorter and more readable	Xinliang David Li	2015-12-15	8	-39/+39
	llvm-svn: 255586
*	[PGO] Shorten profile symbol prefixes	Xinliang David Li	2015-12-14	8	-39/+39
	Profile symbols have long prefixes which waste space and create pressure for the linker. This patch shortens the prefixes to minimal length without losing verbosity. Differential Revision: http://reviews.llvm.org/D15503 llvm-svn: 255575
*	Revert "Don't create unnecessary PHIs"	Reid Kleckner	2015-12-14	2	-200/+4
	This reverts commit r255489. It causes test failures in Chromium and does not appear to respect the AlternativeV parameter. llvm-svn: 255562
*	add fast-math-flags to 'call' instructions (PR21290)	Sanjay Patel	2015-12-14	6	-22/+22
	This patch adds optional fast-math-flags (the same that apply to fmul/fadd/fsub/fdiv/frem/fcmp) to call instructions in IR. Follow-up patches would use these flags in LibCallSimplifier, add support to clang, and extend FMF to the DAG for calls.
	Motivating example:
	  %y = fmul fast float %x, %x
	  %z = tail call float @sqrtf(float %y)
	We'd like to be able to optimize sqrt(x*x) into fabs(x). We do this today using a function-wide attribute for unsafe-math, but we really want to trigger on the instructions themselves:
	  %z = tail call fast float @sqrtf(float %y)
	because in an LTO build it's possible that calls with fast semantics have been inlined into a function with non-fast semantics.
	The code changes and tests are based on the recent commits that added "notail": http://reviews.llvm.org/rL252368 and added FMF to fcmp: http://reviews.llvm.org/rL241901
	Differential Revision: http://reviews.llvm.org/D14707 llvm-svn: 255555
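	A short numeric check shows why sqrt(x*x) --> fabs(x) needs fast-math permission rather than being valid in general. This is a minimal Python sketch using double precision (the message's example uses single-precision sqrtf); sqrt_of_square is a hypothetical helper name.

```python
import math


def sqrt_of_square(x):
    """Evaluate sqrt(x*x) with strict IEEE semantics, as written in the source."""
    return math.sqrt(x * x)


big = 1e200    # x*x overflows to +inf in double precision
tiny = 1e-200  # x*x underflows to 0.0 in double precision

overflowed = sqrt_of_square(big)
underflowed = sqrt_of_square(tiny)
```

	For moderate inputs the two forms agree, but at the extremes the strict result is inf or 0.0 while fabs(x) returns the input magnitude, so the rewrite changes observable results unless the call itself carries the fast flag.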
*	[IR] Remove terminatepad	David Majnemer	2015-12-14	5	-134/+8
	It turns out that terminatepad gives little benefit over a cleanuppad which calls the termination function. This is not sufficient to implement fully generic filters but MSVC doesn't support them which makes terminatepad a little over-designed. Depends on D15478. Differential Revision: http://reviews.llvm.org/D15479 llvm-svn: 255522
*	[InstCombine] fold trunc ([lshr] (bitcast vector) ) --> extractelement (PR25543)	Sanjay Patel	2015-12-14	1	-11/+7
	This is a fix for PR25543: https://llvm.org/bugs/show_bug.cgi?id=25543
	The idea is to take the existing fold of:
	  bitcast ( trunc ( lshr ( bitcast X))) --> extractelement (bitcast X)
	( http://reviews.llvm.org/rL112232 )
	And break it into less specific transforms so we'll catch more cases such as the example in the bug report:
	  bitcast ( trunc ( lshr ( bitcast X))) --> bitcast ( extractelement (bitcast X)) --> extractelement (bitcast X)
	Enabling patches for this change:
	http://reviews.llvm.org/rL255399 (combine bitcasts)
	http://reviews.llvm.org/rL255433 (canonicalize extractelement(bitcast X))
	Differential Revision: http://reviews.llvm.org/D15392 llvm-svn: 255504
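	The equivalence behind this fold can be checked on concrete bytes. The following minimal Python sketch assumes a little-endian layout and a <4 x i8> vector bitcast to i32; the helper names are hypothetical and this is not the InstCombine logic itself.

```python
import struct


def bitcast_v4i8_to_i32_le(elems):
    """Model bitcasting <4 x i8> to i32 on a little-endian target."""
    return struct.unpack("<I", bytes(elems))[0]


def trunc_lshr(elems, index):
    """trunc (lshr (bitcast vector), 8 * index) to i8."""
    wide = bitcast_v4i8_to_i32_le(elems)
    return (wide >> (8 * index)) & 0xFF


def extractelement(elems, index):
    """extractelement vector, index."""
    return elems[index]


v = [0x11, 0x22, 0x33, 0x44]
wide = bitcast_v4i8_to_i32_le(v)
```

	On a big-endian target the lane order within the integer is reversed, so the shift amount maps to a different element index.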
*	Don't create unnecessary PHIs	James Molloy	2015-12-14	2	-4/+200
	In conditional store merging, we were creating PHIs when we didn't need to. If the value to be predicated isn't defined in the block we're predicating, then it doesn't need a PHI at all (because we only deal with triangles and diamonds, any value not in the predicated BB must dominate the predicated BB). This fixes a large code size increase in some benchmarks in a popular embedded benchmark suite. llvm-svn: 255489
*	Revert r255460, which still causes test failures on some platforms.	Cong Hou	2015-12-13	2	-70/+1
	Further investigation on the failures is ongoing. llvm-svn: 255463
*	[LoopVectorizer] Refine loop vectorizer's register usage calculator by ignoring specific instructions.	Cong Hou	2015-12-13	2	-1/+70
	(This is the second attempt to check in this patch: REQUIRES: asserts is added to reg-usage.ll now.) LoopVectorizationCostModel::calculateRegisterUsage() is used to estimate the register usage for specific VFs. However, it takes into account many instructions that won't be vectorized, such as induction variables, GetElementPtr instructions, etc. This makes the loop vectorizer too conservative when choosing VF. In this patch, the induction variables that won't be vectorized plus GetElementPtr instructions will be added to the ValuesToIgnore set so that their register usage won't be considered any more. Differential revision: http://reviews.llvm.org/D15177 llvm-svn: 255460
*	Revert r255454 as it leads to several test failures on buildbots.	Cong Hou	2015-12-13	2	-69/+1
	llvm-svn: 255456
*	[LoopVectorizer] Refine loop vectorizer's register usage calculator by ignoring specific instructions.	Cong Hou	2015-12-13	2	-1/+69
	LoopVectorizationCostModel::calculateRegisterUsage() is used to estimate the register usage for specific VFs. However, it takes into account many instructions that won't be vectorized, such as induction variables, GetElementPtr instructions, etc. This makes the loop vectorizer too conservative when choosing VF. In this patch, the induction variables that won't be vectorized plus GetElementPtr instructions will be added to the ValuesToIgnore set so that their register usage won't be considered any more. Differential revision: http://reviews.llvm.org/D15177 llvm-svn: 255454
*	[PGO] Stop using invalid char in instr variable names.	Xinliang David Li	2015-12-12	1	-2/+2
	Before the patch, a -fprofile-instr-generate compile fails if no integrated-as is specified and the file contains any static functions (the -S output is also invalid). This is the second try. The fix in this patch is very localized. Only the names of profile symbols with internal linkage are fixed up, while the initializers of name symbols are not changed. This means there is no format change nor version bump. llvm-svn: 255434
*	[InstCombine] canonicalize (bitcast (extractelement X)) --> (extractelement(bitcast X))	Sanjay Patel	2015-12-12	1	-2/+2
	This change was discussed in D15392. It allows us to remove the fold that was added in: http://reviews.llvm.org/r255261 ...and it will allow us to generalize this fold: http://reviews.llvm.org/rL112232 while preserving the order of bitcast + extract that it produces and testing shows is better handled by the backend. Note that the existing check for "isVectorTy()" wasn't strong enough in general and specifically because: x86_mmx. It's not a vector, but it's not vectorizable either. So here we check VectorType::isValidElementType() directly before proceeding with the transform. llvm-svn: 255433
*	Move catchpad-phi-cast.ll to the X86 specific subdirectory	David Majnemer	2015-12-12	1	-0/+0
	It is X86 specific and will not be properly exercised unless LLVM is built with the X86 target. llvm-svn: 255426
*	[IR] Reformulate LLVM's EH funclet IR	David Majnemer	2015-12-12	11	-249/+260
	While we have successfully implemented a funclet-oriented EH scheme on top of LLVM IR, our scheme has some notable deficiencies:
	- catchendpad and cleanupendpad are necessary in the current design but they are difficult to explain to others, even to seasoned LLVM experts.
	- catchendpad and cleanupendpad are optimization barriers. They cannot be split and force all potentially throwing call-sites to be invokes. This has a noticeable effect on the quality of our code generation.
	- catchpad, while similar in some aspects to invoke, is fairly awkward. It is unsplittable, starts a funclet, and has control flow to other funclets.
	- The nesting relationship between funclets is currently a property of control flow edges. Because of this, we are forced to carefully analyze the flow graph to see if there might potentially exist illegal nesting among funclets. While we have logic to clone funclets when they are illegally nested, it would be nicer if we had a representation which forbade them upfront.
	Let's clean this up a bit by doing the following:
	- Make catchpad more like cleanuppad and landingpad: no control flow, just a bunch of simple operands; catchpad would be splittable.
	- Introduce catchswitch, a control flow instruction designed to model the constraints of funclet oriented EH.
	- Make funclet scoping explicit by having funclet instructions consume the token produced by the funclet which contains them.
	- Remove catchendpad and cleanupendpad. Their presence can be inferred implicitly using coloring information.
	N.B. The state numbering code for the CLR has been updated but the veracity of its output cannot be spoken for. An expert should take a look to make sure the results are reasonable.
	Reviewers: rnk, JosephTremoulet, andrew.w.kaylor Differential Revision: http://reviews.llvm.org/D15139 llvm-svn: 255422
*	[InstCombine] allow any pair of bitcasts to be combined	Sanjay Patel	2015-12-12	1	-12/+6
	This change is discussed in D15392 and should allow us to effectively revert: http://llvm.org/viewvc/llvm-project?view=revision&revision=255261 if we canonicalize bitcasts ahead of extracts. It should be safe to convert any pair of bitcasts into a single bitcast, however, it was mentioned here: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20110829/127089.html that we're not allowed to bitcast from an x86_mmx to some other types, but I'm not seeing any failures from that, and we have regression tests in CodeGen/X86 that appear to cover all of those cases. Some day we'll get to remove that MMX wart from LLVM IR completely? Differential Revision: http://reviews.llvm.org/D15468 llvm-svn: 255399