path: root/llvm/lib/Transforms
Commits, newest first. Each entry: subject (author, date; files changed, lines -removed/+added).
* Revert r224739: Debug info: Teach SROA how to update debug info for fragmented variables. (Chandler Carruth, 2014-12-23; 1 file, -30/+1)
  This caused codegen to start crashing when we built somewhat large programs with debug info and optimizations. 'check-msan' hit it, and I suspect a bootstrap would as well. I mailed a test case to the review thread.
  llvm-svn: 224750
* Remove dynamic allocation/indirection from GCOVBlocks owned by GCOVFunction (David Blaikie, 2014-12-22; 1 file, -22/+25)
  Since these are all created in the DenseMap before they are referenced, there's no problem with pointer validity by the time it's required. This removes another use of DeleteContainerSeconds/manual memory management which I'm cleaning up from time to time.
  llvm-svn: 224744
* [SROA] Lift the logic for traversing the alloca slices one partition at a time into a partition iterator and a Partition class. (Chandler Carruth, 2014-12-22; 1 file, -157/+303)
  There is a lot of knock-on simplification that this enables, largely stemming from having a Partition object to refer to in lots of helpers. I've only done a minimal amount of that because enough stuff is changing as-is in this commit. This shouldn't change any observable behavior. I've worked hard to preserve the *exact* traversal semantics which were originally present even though some of them make no sense. I'll be changing some of this in subsequent commits now that the logic is carefully factored into a reusable place.
  The primary motivation for this change is to break the rewriting into phases in order to support more intelligent rewriting. For example, I'm planning to change how split loads and stores are rewritten to remove the significant overuse of integer bit packing in the resulting code and allow more effective secondary splitting of aggregates. For any of this to work, they have to share the exact traversal logic.
  llvm-svn: 224742
* [LCSSA] Handle PHI insertion in disjoint loops (Bruno Cardoso Lopes, 2014-12-22; 3 files, -10/+44)
  Take two disjoint loops L1 and L2. LoopSimplify fails to simplify some loops (e.g. when indirect branches are involved). In such situations, it can happen that an exit for L1 is the header of L2. Thus, when we create PHIs in one such exit we are also inserting PHIs into the L2 header. This could break LCSSA form for L2 because these inserted PHIs can also have uses in L2 exits, which are never handled in the current implementation. Provide a fix for this corner case and test that we don't assert/crash on it.
  Differential Revision: http://reviews.llvm.org/D6624
  rdar://problem/19166231
  llvm-svn: 224740
* Debug info: Teach SROA how to update debug info for fragmented variables. (Adrian Prantl, 2014-12-22; 1 file, -1/+30)
  This allows us to generate debug info for extremely advanced code such as
    typedef struct { long int a; int b; } S;
    int foo(S s) { return s.b; }
  which at -O1 on x86_64 is codegen'd into
    define i32 @foo(i64 %s.coerce0, i32 %s.coerce1) #0 {
      ret i32 %s.coerce1, !dbg !24
    }
  With this patch we emit the following debug info for this:
    TAG_formal_parameter [3]
      AT_location(0x00000000
        0x0000000000000000 - 0x0000000000000006: rdi, piece 0x00000008, rsi, piece 0x00000004
        0x0000000000000006 - 0x0000000000000008: rdi, piece 0x00000008, rax, piece 0x00000004)
      AT_name("s")
      AT_decl_file("/Volumes/Data/llvm/_build.ninja.release/test.c")
  Thanks to chandlerc, dblaikie, and echristo for their feedback on all previous iterations of this patch!
  llvm-svn: 224739
* InstCombine: Squash an icmp+select into bitwise arithmetic (David Majnemer, 2014-12-20; 1 file, -6/+24)
  (X & INT_MIN) == 0 ? X ^ INT_MIN : X  into  X | INT_MIN
  (X & INT_MIN) != 0 ? X ^ INT_MIN : X  into  X & INT_MAX
  This fixes PR21993.
  llvm-svn: 224676
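  The equivalence behind the first fold can be checked in plain C++ -- a standalone sketch of the before/after shapes (using unsigned and an explicit sign-bit constant to keep everything well defined), not the InstCombine code itself:

    #include <cassert>

    // Branchy form the combine recognizes (0x80000000u is the sign bit) ...
    unsigned before_eq(unsigned x) {
        return (x & 0x80000000u) == 0 ? x ^ 0x80000000u : x;
    }
    // ... and the single bitwise op it is squashed into: X | INT_MIN.
    unsigned after_eq(unsigned x) { return x | 0x80000000u; }

    int main() {
        for (unsigned x : {0u, 1u, 0x7fffffffu, 0x80000000u, 0xffffffffu})
            assert(before_eq(x) == after_eq(x));
    }

  Either the select picks X (sign bit already set, so OR-ing it is a no-op) or it picks X ^ INT_MIN (sign bit clear, so the XOR and the OR set it identically).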
* [SROA] Run clang-format over the entire SROA pass as I wrote it before much of the glory of clang-format, and now any time I touch it I risk introducing formatting changes as part of a functional commit. (Chandler Carruth, 2014-12-20; 1 file, -157/+138)
  Also, clang-format is *way* better at formatting my code than I am. Most of this is a huge improvement although I reverted a couple of places where I hit a clang-format bug with lambdas that has been filed but not (fully) fixed.
  llvm-svn: 224666
* [BBVectorize] Remove two more redundant assignments. (Tilmann Scheller, 2014-12-19; 1 file, -2/+0)
  Found by the Clang static analyzer.
  llvm-svn: 224590
* [BBVectorize] Remove redundant assignment. (Tilmann Scheller, 2014-12-19; 1 file, -1/+0)
  Found by the Clang static analyzer.
  llvm-svn: 224589
* Reapply: [InstCombine] Fix visitSwitchInst to use right operand types for sub cstexpr (Bruno Cardoso Lopes, 2014-12-19; 1 file, -3/+10)
  visitSwitchInst generates SUB constant expressions to recompute the switch condition. When truncating the condition to a smaller type, SUB expressions should use the previous type (before trunc) for both operands. Also fix the code to return the modified switch when only the truncation is performed. This fixes an assertion crash.
  Differential Revision: http://reviews.llvm.org/D6644
  rdar://problem/19191835
  llvm-svn: 224588
* [LoopVectorize] Remove redundant assignment. (Tilmann Scheller, 2014-12-19; 1 file, -1/+0)
  Found by the Clang static analyzer.
  llvm-svn: 224587
* Use -0.0 when creating an fneg instruction (Sanjay Patel, 2014-12-19; 1 file, -1/+1)
  Backends recognize (-0.0 - X) as the canonical form for fneg and produce better code. E.g., ppc64 with 0.0:
    lis r2, ha16(LCPI0_0)
    lfs f0, lo16(LCPI0_0)(r2)
    fsubs f1, f0, f1
    blr
  vs. -0.0:
    fneg f1, f1
    blr
  Differential Revision: http://reviews.llvm.org/D6723
  llvm-svn: 224583
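  The reason 0.0 cannot serve as the canonical form is the sign of zero: 0.0 - (+0.0) rounds to +0.0, while a true negation of +0.0 must yield -0.0. A standalone C++ check of that corner case (illustrative only, assumes default IEEE rounding and no -ffast-math):

    #include <cassert>
    #include <cmath>

    int main() {
        double x = 0.0;
        assert(!std::signbit(0.0 - x));   // 0.0 - X loses the sign flip on +0.0
        assert(std::signbit(-0.0 - x));   // -0.0 - X negates every input, incl. +0.0
    }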
* Revert "[InstCombine] Fix visitSwitchInst to use right operand types for sub ↵Bruno Cardoso Lopes2014-12-191-4/+2
| | | | | | | | | | | | | cstexpr" Reverts commit r224574 to appease buildbots: The visitSwitchInst generates SUB constant expressions to recompute the switch condition. When truncating the condition to a smaller type, SUB expressions should use the previous type (before trunc) for both operands. This fixes an assertion crash. llvm-svn: 224576
* [InstCombine] Fix visitSwitchInst to use right operand types for sub cstexpr (Bruno Cardoso Lopes, 2014-12-19; 1 file, -2/+4)
  visitSwitchInst generates SUB constant expressions to recompute the switch condition. When truncating the condition to a smaller type, SUB expressions should use the previous type (before trunc) for both operands. This fixes an assertion crash.
  Differential Revision: http://reviews.llvm.org/D6644
  rdar://problem/19191835
  llvm-svn: 224574
* Rename MapValue(Metadata*) to MapMetadata() (Duncan P. N. Exon Smith, 2014-12-19; 3 files, -18/+18)
  Instead of reusing the name `MapValue()` when mapping `Metadata`, use `MapMetadata()`. The old name doesn't make much sense after the `Metadata`/`Value` split.
  llvm-svn: 224566
* fix formatting; NFC (Sanjay Patel, 2014-12-18; 1 file, -8/+4)
  llvm-svn: 224542
* [Msan] Generalize instrumentation code to support FreeBSD mapping (Viktor Kutuzov, 2014-12-18; 1 file, -27/+106)
  Differential Revision: http://reviews.llvm.org/D6666
  llvm-svn: 224514
* [SROA] Cleanup - remove the use of std::mem_fun_ref nonsense and use a lambda now that we have them. (Chandler Carruth, 2014-12-18; 1 file, -1/+3)
  llvm-svn: 224500
* [sanitizer] allow -fsanitize-coverage=N w/ -fsanitize=leak, llvm part (Kostya Serebryany, 2014-12-17; 1 file, -4/+2)
  llvm-svn: 224463
* Revert 224119 "This patch recognizes (+ (+ v0, v1) (+ v2, v3)), reorders them for bundling into vector of loads, and vectorizes it." (Suyog Sarda, 2014-12-17; 1 file, -24/+2)
  This was re-ordering floating-point data, resulting in a mismatch in output.
  llvm-svn: 224424
* Strength reduce intrinsics with overflow into regular arithmetic operations if possible. (Erik Eckstein, 2014-12-17; 3 files, -0/+61)
  Some intrinsics, like s/uadd.with.overflow and umul.with.overflow, are already strength reduced. This change adds other arithmetic intrinsics: s/usub.with.overflow, smul.with.overflow. It completes the work on PR20194.
  llvm-svn: 224417
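  At the source level, the strength reduction means that when the operands provably cannot overflow (here because they were widened from a narrower type), the intrinsic's {result, overflow} pair collapses to the plain operation plus a constant false. A hedged C++ sketch using the clang/gcc overflow builtins, not the InstCombine code itself:

    #include <cassert>
    #include <utility>

    // What ssub.with.overflow computes: a {result, did-overflow} pair.
    std::pair<int, bool> sub_with_overflow(int a, int b) {
        int r;
        bool ov = __builtin_ssub_overflow(a, b, &r);
        return {r, ov};
    }

    // Operands widened from short can never overflow an int subtraction,
    // so the pair strength-reduces to a plain sub and a constant false.
    std::pair<int, bool> reduced(short a, short b) {
        return {int(a) - int(b), false};
    }

    int main() {
        assert(sub_with_overflow(5, -7) == reduced(5, -7));
    }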
* [sanitizer] prevent function call merging for sanitizer-coverage callbacks (Kostya Serebryany, 2014-12-16; 1 file, -0/+7)
  llvm-svn: 224372
* Masked Load and Store Intrinsics in loop vectorizer. (Elena Demikhovsky, 2014-12-16; 1 file, -21/+100)
  The loop vectorizer optimizes loops containing conditional memory accesses by generating masked load and store intrinsics. This decision is target dependent.
  http://reviews.llvm.org/D6527
  llvm-svn: 224334
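  The kind of source loop this targets looks like the following (function and array names are illustrative); without masked intrinsics the conditional access forces scalarization or blocks vectorization entirely:

    // The 'if' becomes a vector compare producing a lane mask; the guarded
    // load and store become llvm.masked.load / llvm.masked.store on targets
    // that support them (the target-dependent decision mentioned above).
    void saturate(int *out, const int *in, const int *trigger, int n) {
        for (int i = 0; i < n; ++i)
            if (trigger[i] > 0)
                out[i] = in[i] + 50;
    }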
* Sink store based on alias analysis (Elena Demikhovsky, 2014-12-15; 2 files, -41/+37)
  Patch by Ella Bolshinsky.
  Alias analysis is used to determine whether a given instruction is a barrier for store sinking. For two identical stores, the instructions following them in both basic blocks are checked to determine whether they are sinking barriers.
  http://reviews.llvm.org/D6420
  llvm-svn: 224247
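  A hypothetical before/after sketch of the sinking (the __restrict qualifiers supply, at the source level, the no-alias fact that the AA queries establish in IR):

    // Before: both arms store *p identically, then touch *q afterwards.
    void before(int *__restrict p, int *__restrict q, bool c) {
        if (c) { *p = 42; *q = 1; }
        else   { *p = 42; *q = 2; }
    }
    // After: AA proves the *q stores are not barriers (they cannot touch *p),
    // so the identical *p stores sink to the join point.
    void after(int *__restrict p, int *__restrict q, bool c) {
        if (c) { *q = 1; } else { *q = 2; }
        *p = 42;
    }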
* Loop Vectorizer minor changes in the code - some comments, function names, indentation. (Elena Demikhovsky, 2014-12-14; 1 file, -3/+3)
  Reviewed here: http://reviews.llvm.org/D6527
  llvm-svn: 224218
* More code format fixes from r224133, NFC (Steven Wu, 2014-12-12; 1 file, -2/+1)
  llvm-svn: 224140
* Restructure code from r224097. NFC (Steven Wu, 2014-12-12; 1 file, -12/+12)
  llvm-svn: 224133
* [Reassociate] Use dbgs() instead of errs(). (Chad Rosier, 2014-12-12; 1 file, -2/+2)
  llvm-svn: 224125
* This patch recognizes (+ (+ v0, v1) (+ v2, v3)), reorders them for bundling into vector of loads, and vectorizes it. (Suyog Sarda, 2014-12-12; 1 file, -2/+24)
  Test case:
    float hadd(float* a) {
        return (a[0] + a[1]) + (a[2] + a[3]);
    }
  AArch64 assembly before patch:
    ldp s0, s1, [x0]
    ldp s2, s3, [x0, #8]
    fadd s0, s0, s1
    fadd s1, s2, s3
    fadd s0, s0, s1
    ret
  AArch64 assembly after patch:
    ldp d0, d1, [x0]
    fadd v0.2s, v0.2s, v1.2s
    faddp s0, v0.2s
    ret
  Reviewed link: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20141208/248531.html
  llvm-svn: 224119
* Fix another infinite loop in InstCombine (Steven Wu, 2014-12-12; 1 file, -9/+12)
  Summary: InstCombine goes into an infinite loop on the added testcase because it generates instructions that it can then optimize again. Fix by not optimizing frem if the optimized type is the same as the original type.
  rdar://problem/19150820
  Reviewers: majnemer
  Differential Revision: http://reviews.llvm.org/D6634
  llvm-svn: 224097
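  The shrinking being guarded relies on frem being exact: when both operands are exactly representable in the narrower type, frem computed in double equals frem computed in float, so the operation can be narrowed -- but if the "narrowed" type equals the original one, the rewrite makes no progress and must bail out rather than hand InstCombine the same instruction again. A standalone check of the underlying identity (illustrative, not the pass code):

    #include <cassert>
    #include <cmath>

    int main() {
        float a = 10.25f, b = 3.0f;
        float  narrow = std::fmod(a, b);                  // frem in float
        double wide   = std::fmod(double(a), double(b));  // frem in double
        assert(double(narrow) == wide);  // fmod is exact, so narrowing is safe
    }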
* [ASan] Change fake stack and local variables handling. (Alexey Samsonov, 2014-12-11; 1 file, -44/+104)
  This commit changes the way we get fake stack from the ASan runtime (to find use-after-return errors) and the way we represent local variables:
  - The __asan_stack_malloc function now returns a pointer to the newly allocated fake stack frame, or NULL if the frame cannot be allocated. It doesn't take a pointer to the real stack as an input argument; it is calculated inside the runtime.
  - The __asan_stack_free function doesn't take a pointer to the real stack as an input argument. Now this function is never called if the fake stack frame wasn't allocated.
  - The __asan_init version is bumped to reflect changes in the ABI.
  - The new flag "-asan-stack-dynamic-alloca" allows storing all the function's local variables in a dynamic alloca, instead of the static one. It reduces the stack space usage in use-after-return mode (the dynamic alloca will not be called if the local variables are stored in a fake stack), and improves the debug info quality for local variables (they will not be described relative to %rbp/%rsp, which are assumed to be clobbered by function calls). This flag is turned off by default for now, but I plan to turn it on after more testing.
  llvm-svn: 224062
* [InstCombine][X86] Improved folding of calls to Intrinsic::x86_sse4a_insertqi. (Andrea Di Biagio, 2014-12-11; 1 file, -1/+16)
  This patch teaches the instruction combiner how to fold a call to 'insertqi' if the 'length field' (3rd operand) is set to zero, and if the sum of field 'length' and 'bit index' (4th operand) is bigger than 64. From the AMD64 Architecture Programmer's Manual:
  1. If the sum of the bit index + length field is greater than 64, then the results are undefined;
  2. A value of zero in the field length is defined as a length of 64.
  This patch improves the existing combining logic for intrinsic 'insertqi' by adding extra checks to address both point 1 and point 2.
  Differential Revision: http://reviews.llvm.org/D6583
  llvm-svn: 224054
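  Those two manual rules amount to a small legality check on insertqi's immediate operands; a hedged sketch with hypothetical helper names, not the InstCombine code:

    #include <cassert>
    #include <cstdint>

    struct InsertQImm { uint8_t length, index; };   // 3rd and 4th operands

    unsigned effectiveLength(InsertQImm op) {
        return op.length == 0 ? 64 : op.length;     // rule 2: length 0 means 64
    }
    bool resultUndefined(InsertQImm op) {
        return effectiveLength(op) + op.index > 64; // rule 1: reaches past bit 63
    }

    int main() {
        assert(effectiveLength({0, 0}) == 64);  // full 64-bit insert
        assert(resultUndefined({0, 8}));        // 64 + 8 > 64 -> undefined
        assert(!resultUndefined({56, 8}));      // fits exactly
    }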
* The inliner needs to fix up debug information for llvm.dbg.declare, not only for llvm.dbg.value. (Michael Kuperstein, 2014-12-11; 1 file, -0/+6)
  Patch by Amjad Aboud
  Differential Revision: http://reviews.llvm.org/D6525
  llvm-svn: 224015
* Refactor creation of overflow result tuples in InstCombineCalls. (Erik Eckstein, 2014-12-11; 2 files, -57/+30)
  Extract the creation of overflow result tuples into a separate function. NFC.
  llvm-svn: 224006
* Rename static function "map" to be more descriptive and to avoid potential confusion with the std::map type. (Kaelyn Takata, 2014-12-09; 1 file, -5/+5)
  llvm-svn: 223853
* Remove redundant variable. (Michael Zolotukhin, 2014-12-09; 1 file, -4/+2)
  Tested by adding assert(LoopVectorPreHeader == VecPreheader) on the LLVM test suite and SPECs.
  llvm-svn: 223847
* Revert r223764 which taught instcombine about integer-based element extraction patterns. (Chandler Carruth, 2014-12-09; 1 file, -349/+41)
  This is causing Clang to miscompile itself for 32-bit x86 somehow, and likely also on ARM and PPC. I really don't know how, but reverting now that I've confirmed this is actually the culprit. I have a reproduction as well and so should be able to restore this shortly.
  This reverts commit r223764. Original commit log follows:
  Teach instcombine to canonicalize "element extraction" from a load of an integer and "element insertion" into a store of an integer into actual element extraction, element insertion, and vector loads and stores.
  Previously various parts of LLVM (including instcombine itself) would introduce integer loads and stores into the code as a way of opaquely loading and storing "bits". In some cases (such as a memcpy of a std::complex<float> object) we will eventually end up using those bits in non-integer types. In order for SROA to effectively promote the allocas involved, it splits these "store a bag of bits" integer loads and stores up into the constituent parts. However, for non-alloca loads and stores which remain, it uses integer math to recombine the values into a large integer to load or store.
  All of this would be "fine", except that it forces LLVM to go through integer math to combine and split up values. While this makes perfect sense for integers (and in fact is critical for bitfields to end up lowering efficiently) it is *terrible* for non-integer types, especially floating point types. We have a much more canonical way of representing the act of concatenating the bits of two SSA values in LLVM: a vector and insertelement. This patch teaches InstCombine to use this representation.
  With this patch applied, LLVM will no longer introduce integer math into the critical path of every loop over std::complex<float> operations such as those that make up the hot path of ... oh, most HPC code, Eigen, and any other heavy linear algebra library.
  For the record, I looked *extensively* at fixing this in other parts of the compiler, but it just doesn't work:
  - We really do want to canonicalize memcpy and other bit-motion to integer loads and stores. SSA values are tremendously more powerful than "copy" intrinsics. Not doing this regresses massive amounts of LLVM's scalar optimizer.
  - We really do need to split up integer loads and stores of this form in SROA or every memcpy of a trivially copyable struct will prevent SSA formation of the members of that struct. It essentially turns off SROA.
  - The closest alternative is to actually split the loads and stores when partitioning with SROA, but this has all of the downsides historically discussed of splitting up loads and stores -- the wide-store information is fundamentally lost. We would also see performance regressions for bitfield-heavy code and other places where the integers aren't really intended to be split without seemingly arbitrary logic to treat integers totally differently.
  - We *can* effectively fix this in instcombine, so it isn't that hard of a choice to make IMO.
  llvm-svn: 223813
* Remove unneeded curly braces. (Frederic Riss, 2014-12-09; 1 file, -4/+2)
  llvm-svn: 223809
* Reorder the code to avoid inserting at the beginning of a vector. (Frederic Riss, 2014-12-09; 1 file, -1/+1)
  As per dblaikie's suggestion, thanks!
  llvm-svn: 223808
* IR: Split Metadata from Value (Duncan P. N. Exon Smith, 2014-12-09; 17 files, -148/+243)
  Split `Metadata` away from the `Value` class hierarchy, as part of PR21532. Assembly and bitcode changes are in the wings, but this is the bulk of the change for the IR C++ API.
  I have a follow-up patch prepared for `clang`. If this breaks other sub-projects, I apologize in advance :(. Help me compile it on Darwin and I'll try to fix it. FWIW, the errors should be easy to fix, so it may be simpler to just fix it yourself.
  This breaks the build for all metadata-related code that's out-of-tree. Rest assured the transition is mechanical and the compiler should catch almost all of the problems.
  Here's a quick guide for updating your code:
  - `Metadata` is the root of a class hierarchy with three main classes: `MDNode`, `MDString`, and `ValueAsMetadata`. It is distinct from the `Value` class hierarchy. It is typeless -- i.e., instances do *not* have a `Type`.
  - `MDNode`'s operands are all `Metadata *` (instead of `Value *`).
  - `TrackingVH<MDNode>` and `WeakVH` referring to metadata can be replaced with `TrackingMDNodeRef` and `TrackingMDRef`, respectively. If you're referring solely to resolved `MDNode`s -- post graph construction -- just use `MDNode*`.
  - `MDNode` (and the rest of `Metadata`) have only limited support for `replaceAllUsesWith()`. As long as an `MDNode` is pointing at a forward declaration -- the result of `MDNode::getTemporary()` -- it maintains a side map of its uses and can RAUW itself. Once the forward declarations are fully resolved, RAUW support is dropped on the ground. This means that uniquing collisions on changing operands cause nodes to become "distinct". (This already happened fairly commonly, whenever an operand went to null.) If you're constructing complex (non self-reference) `MDNode` cycles, you need to call `MDNode::resolveCycles()` on each node (or on a top-level node that somehow references all of the nodes). Also, don't do that. Metadata cycles (and the RAUW machinery needed to construct them) are expensive.
  - An `MDNode` can only refer to a `Constant` through a bridge called `ConstantAsMetadata` (one of the subclasses of `ValueAsMetadata`). As a side effect, accessing an operand of an `MDNode` that is known to be, e.g., `ConstantInt`, takes three steps: first, cast from `Metadata` to `ConstantAsMetadata`; second, extract the `Constant`; third, cast down to `ConstantInt`. The eventual goal is to introduce `MDInt`/`MDFloat`/etc. and have metadata schema owners transition away from using `Constant`s when the type isn't important (and they don't care about referring to `GlobalValue`s). In the meantime, I've added transitional API to the `mdconst` namespace that matches semantics with the old code, in order to avoid adding the error-prone three-step equivalent to every call site. If your old code was:
      MDNode *N = foo();
      bar(isa<ConstantInt>(N->getOperand(0)));
      baz(cast<ConstantInt>(N->getOperand(1)));
      bak(cast_or_null<ConstantInt>(N->getOperand(2)));
      bat(dyn_cast<ConstantInt>(N->getOperand(3)));
      bay(dyn_cast_or_null<ConstantInt>(N->getOperand(4)));
    you can trivially match its semantics with:
      MDNode *N = foo();
      bar(mdconst::hasa<ConstantInt>(N->getOperand(0)));
      baz(mdconst::extract<ConstantInt>(N->getOperand(1)));
      bak(mdconst::extract_or_null<ConstantInt>(N->getOperand(2)));
      bat(mdconst::dyn_extract<ConstantInt>(N->getOperand(3)));
      bay(mdconst::dyn_extract_or_null<ConstantInt>(N->getOperand(4)));
    and when you transition your metadata schema to `MDInt`:
      MDNode *N = foo();
      bar(isa<MDInt>(N->getOperand(0)));
      baz(cast<MDInt>(N->getOperand(1)));
      bak(cast_or_null<MDInt>(N->getOperand(2)));
      bat(dyn_cast<MDInt>(N->getOperand(3)));
      bay(dyn_cast_or_null<MDInt>(N->getOperand(4)));
  - A `CallInst` -- specifically, intrinsic instructions -- can refer to metadata through a bridge called `MetadataAsValue`. This is a subclass of `Value` where `getType()->isMetadataTy()`. `MetadataAsValue` is the *only* class that can legally refer to a `LocalAsMetadata`, which is a bridged form of non-`Constant` values like `Argument` and `Instruction`. It can also refer to any other `Metadata` subclass.
  (I'll break all your testcases in a follow-up commit, when I propagate this change to assembly.)
  llvm-svn: 223802
* Correctly handle complex location expressions in replaceDbgDeclareForAlloca() (Frederic Riss, 2014-12-09; 1 file, -2/+2)
  replaceDbgDeclareForAlloca() replaces an alloca by a value storing the address of what was the alloca. If there is a dbg.declare corresponding to that alloca, we need to lower it to a dbg.value describing the additional dereference operation to be performed to get to the underlying variable. This is done by adding a DW_OP_deref to the complex location part of the location description. This deref was added to the end of the operation list, which is wrong. The expression applies to what is described by the dbg.{declare,value}, and as we are changing this, we need to apply the DW_OP_deref as the first operation in the list.
  Part of the fix for rdar://19162268.
  llvm-svn: 223799
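  In effect the fix is one of operand ordering within the DWARF expression; a minimal standalone sketch (0x06 is the DW_OP_deref opcode, the other values stand in for an existing complex-expression op):

    #include <cassert>
    #include <cstdint>
    #include <vector>

    int main() {
        const uint64_t DW_OP_deref = 0x06;
        std::vector<uint64_t> expr = {0x93 /* DW_OP_piece */, 8};
        // The old, wrong behavior appended the deref (expr.push_back(...)),
        // dereferencing the *result* of the expression. It must instead be
        // the first operation, applied to the variable's address:
        expr.insert(expr.begin(), DW_OP_deref);
        assert(expr.front() == DW_OP_deref);
    }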
* Revert "Move function to obtain branch weights into the BranchInst class. NFC."Juergen Ributzka2014-12-091-6/+26
| | | | | | This reverts commit r223784 and copies the 'ExtractBranchMetadata' to CodeGenPrepare. llvm-svn: 223795
* Move function to obtain branch weights into the BranchInst class. NFC. (Juergen Ributzka, 2014-12-09; 1 file, -26/+6)
  Make this function available to other parts of LLVM.
  llvm-svn: 223784
* Teach instcombine to canonicalize "element extraction" from a load of an integer and "element insertion" into a store of an integer into actual element extraction, element insertion, and vector loads and stores. (Chandler Carruth, 2014-12-09; 1 file, -41/+349)
  Previously various parts of LLVM (including instcombine itself) would introduce integer loads and stores into the code as a way of opaquely loading and storing "bits". In some cases (such as a memcpy of a std::complex<float> object) we will eventually end up using those bits in non-integer types. In order for SROA to effectively promote the allocas involved, it splits these "store a bag of bits" integer loads and stores up into the constituent parts. However, for non-alloca loads and stores which remain, it uses integer math to recombine the values into a large integer to load or store.
  All of this would be "fine", except that it forces LLVM to go through integer math to combine and split up values. While this makes perfect sense for integers (and in fact is critical for bitfields to end up lowering efficiently) it is *terrible* for non-integer types, especially floating point types. We have a much more canonical way of representing the act of concatenating the bits of two SSA values in LLVM: a vector and insertelement. This patch teaches InstCombine to use this representation.
  With this patch applied, LLVM will no longer introduce integer math into the critical path of every loop over std::complex<float> operations such as those that make up the hot path of ... oh, most HPC code, Eigen, and any other heavy linear algebra library.
  For the record, I looked *extensively* at fixing this in other parts of the compiler, but it just doesn't work:
  - We really do want to canonicalize memcpy and other bit-motion to integer loads and stores. SSA values are tremendously more powerful than "copy" intrinsics. Not doing this regresses massive amounts of LLVM's scalar optimizer.
  - We really do need to split up integer loads and stores of this form in SROA or every memcpy of a trivially copyable struct will prevent SSA formation of the members of that struct. It essentially turns off SROA.
  - The closest alternative is to actually split the loads and stores when partitioning with SROA, but this has all of the downsides historically discussed of splitting up loads and stores -- the wide-store information is fundamentally lost. We would also see performance regressions for bitfield-heavy code and other places where the integers aren't really intended to be split without seemingly arbitrary logic to treat integers totally differently.
  - We *can* effectively fix this in instcombine, so it isn't that hard of a choice to make IMO.
  Differential Revision: http://reviews.llvm.org/D6548
  llvm-svn: 223764
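  The std::complex<float> scenario called out above looks like this at the source level (illustrative function; the memcpy is what gets canonicalized to an i64 load/store, and the subsequent float reads are the "element extraction"):

    #include <complex>
    #include <cstring>

    float sum_parts(const std::complex<float> *src) {
        std::complex<float> tmp;
        std::memcpy(&tmp, src, sizeof tmp);  // an i64-sized "bag of bits" copy
        // Previously recombined via integer shifts and truncs; with this
        // patch the two floats become vector element extractions instead.
        return tmp.real() + tmp.imag();
    }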
* InstrProf: An intrinsic and lowering for instrumentation based profiling (Justin Bogner, 2014-12-08; 3 files, -0/+311)
  Introduce the ``llvm.instrprof_increment`` intrinsic and the ``-instrprof`` pass. These provide the infrastructure for writing counters for profiling, as in clang's ``-fprofile-instr-generate``.
  The implementation of the instrprof pass is ported directly out of the CodeGenPGO classes in clang, and with the followup in clang that rips that code out to use these new intrinsics this ends up being NFC.
  Doing the instrumentation this way opens some doors in terms of improving the counter performance. For example, this will make it simple to experiment with alternate lowering strategies, and allows us to try handling profiling specially in some optimizations if we want to.
  Finally, this drastically simplifies the frontend and puts all of the lowering logic in one place.
  llvm-svn: 223672
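  Conceptually, the lowering turns each ``llvm.instrprof_increment`` call into an in-place bump of one slot in a per-function counter array -- roughly this shape (names, sizes, and layout are illustrative, not the pass's actual output):

    #include <cstdint>

    namespace {
    uint64_t Counters[2];             // sized by the intrinsic's num_counters
    void increment(uint32_t index) {  // the intrinsic's index operand
        Counters[index] += 1;
    }
    }

    int foo(int x) {
        increment(0);                 // entry counter
        if (x > 0) {
            increment(1);             // 'then' branch counter
            return 1;
        }
        return 0;
    }

    int main() { return foo(1) == 1 ? 0 : 1; }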
* LLVMInstrumentation requires MC since r223532. (NAKAMURA Takumi, 2014-12-06; 1 file, -1/+1)
  llvm-svn: 223573
* Utils: Style cleanups, NFC (Duncan P. N. Exon Smith, 2014-12-06; 1 file, -7/+7)
  llvm-svn: 223556
* Utils: Avoid RAUW on metadata in CloneFunction() (Duncan P. N. Exon Smith, 2014-12-06; 1 file, -4/+4)
  llvm-svn: 223555
* Recommit of r223513 and r223514. (Kuba Brecka, 2014-12-05; 1 file, -34/+48)
  Reviewed at http://reviews.llvm.org/D6488
  llvm-svn: 223532
* Reverting r223513 and r223514. (Kuba Brecka, 2014-12-05; 1 file, -48/+34)
  llvm-svn: 223520