path: root/llvm/test/Transforms/InstCombine
* [InstCombine] Remove unnecessary variable indexing into single-element arrays
  Hal Finkel | 2015-02-20 | 2 files changed, -5/+51

  This change addresses a deficiency pointed out in PR22629. To copy from the
  bug report:

  [from the bug report]
  Consider this code:

    int f(int x) {
      int a[] = {12};
      return a[x];
    }

  GCC knows to optimize this to:

    movl $12, %eax
    ret

  The code generated by recent Clang at -O3 is:

    movslq %edi, %rax
    movl .L_ZZ1fiE1a(,%rax,4), %eax
    retq

    .L_ZZ1fiE1a:
    .long 12 # 0xc
  [end from the bug report]

  This definitely seems worth fixing. I've also seen this kind of code before
  (as the base case of generic vector wrapper templates with one element).

  The general idea is to look at the GEP feeding a load or a store, which has
  some variable as its first non-zero index, and determine if that index must
  be zero (or else an out-of-bounds access would occur). We can do this for
  allocas and globals with constant initializers where we know the maximum size
  of the underlying object. When we find such a GEP, we create a new one for
  the memory access with that first variable index replaced with a constant
  zero. Even if we can't eliminate the memory access (and sometimes we can't),
  it is still useful because it removes unnecessary indexing calculations.

  llvm-svn: 229959
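  A minimal before/after sketch of the new fold in LLVM IR (2015-era
  typed-pointer syntax; all value names here are invented for illustration):

    ; before: variable index into a one-element alloca
    %a = alloca [1 x i32]
    %p = getelementptr inbounds [1 x i32]* %a, i64 0, i64 %x
    %v = load i32* %p

    ; after: any in-bounds access forces %x to be zero
    %p2 = getelementptr inbounds [1 x i32]* %a, i64 0, i64 0
    %v2 = load i32* %p2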
* [InstCombine] Do not insert a GEP instruction before a landingpad instruction.
  Akira Hatanaka | 2015-02-18 | 1 file changed, -0/+44

  InstCombiner::visitGetElementPtrInst was using getFirstNonPHI to compute the
  insertion point, which caused the verifier to complain when a GEP was
  inserted before a landingpad instruction. This commit fixes it to use
  getFirstInsertionPt instead.

  rdar://problem/19394964
  llvm-svn: 229619
* InstCombine: fold more cases of (fp_to_u/sint (u/sint_to_fp val))
  Mehdi Amini | 2015-02-16 | 1 file changed, -0/+110

  Fixes radar 15486701.

  From: Fiona Glaser <fglaser@apple.com>
  llvm-svn: 229437
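  One representative fold from this family, sketched in LLVM IR (hypothetical
  names; the actual patch covers more cases and checks stricter conditions):
  when every value of the integer type is exactly representable in the FP type,
  the round trip is the identity.

    %f = sitofp i16 %x to float
    %i = fptosi float %f to i16
    ; every i16 fits exactly in float's 24-bit significand, so the pair
    ; folds away and uses of %i are replaced by %x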
* Tests: reformat sitofp.ll and use FileCheck
  Mehdi Amini | 2015-02-16 | 1 file changed, -20/+39

  From: Fiona Glaser <fglaser@apple.com>
  llvm-svn: 229436
* InstCombine: propagate deref via new addDereferenceableAttr
  Ramkumar Ramachandra | 2015-02-14 | 1 file changed, -0/+20

  The "dereferenceable" attribute cannot be added via .addAttribute(), since it
  also expects a size in bytes. AttrBuilder#addAttribute or
  AttributeSet#addAttribute is wrapped by the classes Function, InvokeInst, and
  CallInst; add corresponding wrappers for AttrBuilder#addDereferenceableAttr.

  Having done this, propagate the dereferenceable attribute via gc.relocate,
  adding a test to exercise it. Note that -datalayout is required during
  execution over and above -instcombine, because InstCombine only optionally
  requires DataLayoutPass.

  Differential Revision: http://reviews.llvm.org/D7510
  llvm-svn: 229265
* [InstCombine] When canonicalizing gep indices, prefer zext when possible
  Philip Reames | 2015-02-14 | 1 file changed, -0/+61

  If we know that the sign bit of a value being sign extended is zero, we can
  use a zero extension instead. This is motivated by the fact that zero
  extensions are generally cheaper on x86 (and most other architectures?). We
  already apply a similar transform in DAGCombine; this just extends that to
  the IR level.

  This comes up when we eagerly canonicalize gep indices to the width of a
  machine register (i64 on x86_64). To do so, we insert sign extensions (sext)
  to promote smaller types.

  Differential Revision: http://reviews.llvm.org/D7255
  llvm-svn: 229189
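  A sketch of the resulting canonicalization in LLVM IR (hypothetical names):

    %ext = sext i32 %idx to i64
    %p = getelementptr inbounds i8* %base, i64 %ext
    ; when the sign bit of %idx is known zero, the promotion becomes:
    %ext2 = zext i32 %idx to i64
    %p2 = getelementptr inbounds i8* %base, i64 %ext2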
* [InstCombine] Fix regression introduced at r227197.
  Andrea Di Biagio | 2015-02-13 | 1 file changed, -0/+27

  This patch fixes a problem I accidentally introduced in an instruction
  combine on select instructions added at r227197. That revision taught the
  instruction combiner how to fold a cttz/ctlz followed by an icmp plus select
  into a single cttz/ctlz with the 'is_zero_undef' flag cleared.

  However, the new rule added at r227197 would have produced wrong results in
  the case where a cttz/ctlz with the 'is_zero_undef' flag cleared was followed
  by a zero-extend or truncate. In that case, the folded instruction would have
  been inserted in a wrong location, thus leaving the CFG in an inconsistent
  state.

  This patch fixes the problem and adds two reproducible test cases to the
  existing test 'InstCombine/select-cmp-cttz-ctlz.ll'.

  llvm-svn: 229124
* [InstCombine] Fix a bug when combining `icmp` from `ptrtoint`
  Michael Liao | 2015-02-13 | 1 file changed, -1/+22

  - First, there's a crash when we try to combine the pointers into `icmp`
    directly by creating a `bitcast`, which is invalid if the two pointers are
    from different address spaces.
  - Second, it's not always appropriate to cast one pointer to another if they
    are from different address spaces, as that is not a no-op cast.

  Instead, we only combine `icmp` from `ptrtoint` if the two pointers are in
  the same address space.

  llvm-svn: 229063
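  A sketch of the allowed case in LLVM IR (hypothetical names): both pointers
  live in the same address space, so the compare can be done on the pointers
  directly.

    %pi = ptrtoint i8* %p to i64
    %qi = ptrtoint i8* %q to i64
    %c = icmp eq i64 %pi, %qi
    ; same address space, so this can fold to:
    %c2 = icmp eq i8* %p, %q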
* [IC] Fix a bug with the instcombine canonicalizing of loads and propagating of metadata.
  Chandler Carruth | 2015-02-13 | 1 file changed, -0/+19

  We were propagating !nonnull metadata even when the newly formed load is no
  longer of a pointer type. This is clearly broken and results in LLVM failing
  the verifier and aborting. This patch just restricts the propagation of
  !nonnull metadata to when we actually have a pointer type.

  This bug report and the initial version of this patch were provided by
  Charles Davis! Many thanks for finding this!

  We still need to add logic to round-trip the metadata correctly if we combine
  from pointer types to integer types and then back by using range metadata for
  the integer type loads. But this is the minimal and safe version of the
  patch, which is important so we can backport it into 3.6.

  llvm-svn: 229029
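  A sketch of the constraint in LLVM IR (hypothetical names and types):
  !nonnull is only meaningful on a load of pointer type, so a load rewritten at
  integer type must drop it.

    %p = load i8** %pp, !nonnull !0        ; fine: pointer-typed load
    ; if canonicalization rewrites this as an integer load, keeping
    ; !nonnull would be invalid, so the new load carries no such metadata:
    %pp.int = bitcast i8** %pp to i64*
    %i = load i64* %pp.int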
* InstCombine: Allow folding of xor into icmp by changing the predicate for vectors
  Benjamin Kramer | 2015-02-12 | 1 file changed, -0/+6

  The loop vectorizer can create this pattern.

  llvm-svn: 228954
* Revert r228556: InstCombine: propagate nonNull through assume
  Chandler Carruth | 2015-02-10 | 1 file changed, -37/+0

  This commit isn't using the correct context, and is transforming calls that
  are operands to loads rather than calls that are operands to an icmp feeding
  into an assume. I've replied on the original review thread with a very
  reduced test case and some thoughts on how to rework this.

  llvm-svn: 228677
* InstCombine: propagate nonNull through assume
  Ramkumar Ramachandra | 2015-02-09 | 1 file changed, -0/+37

  Make assume (load (call|invoke) != null) set the nonNull return attribute for
  the call and invoke. Also include tests.

  Differential Revision: http://reviews.llvm.org/D7107
  llvm-svn: 228556
* InstCombine: Combine select sequences into a single select
  Matthias Braun | 2015-02-06 | 1 file changed, -0/+28

  Normalize:

    select(C0, select(C1, a, b), b) -> select((C0 & C1), a, b)
    select(C0, a, select(C1, a, b)) -> select((C0 | C1), a, b)

  This normal form may enable further combines on the And/Or and shortens paths
  for the values. Many targets prefer the other form but can go back easily in
  CodeGen.

  Differential Revision: http://reviews.llvm.org/D7399
  llvm-svn: 228409
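  The first rule as a concrete LLVM IR sketch (hypothetical names):

    %inner = select i1 %c1, i32 %a, i32 %b
    %outer = select i1 %c0, i32 %inner, i32 %b
    ; %outer is %a only when both conditions hold, so this becomes:
    %cond = and i1 %c0, %c1
    %outer2 = select i1 %cond, i32 %a, i32 %b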
* Move EH personality type classification to Analysis/LibCallSemantics.h
  Reid Kleckner | 2015-01-28 | 1 file changed, -0/+52

  Summary: Also add enum types for __C_specific_handler and _CxxFrameHandler3,
  for which we know a few things.

  Reviewers: majnemer
  Subscribers: llvm-commits

  Differential Revision: http://reviews.llvm.org/D7214
  llvm-svn: 227284
* [SimplifyLibCalls] Don't confuse strcpy_chk for stpcpy_chk.
  Ahmed Bougacha | 2015-01-27 | 6 files changed, -121/+155

  This was introduced in a faulty refactoring (r225640, mea culpa): the tests
  weren't testing the return values, so, for both __strcpy_chk and __stpcpy_chk,
  we would return the end of the buffer (matching stpcpy) instead of the
  beginning (for strcpy).

  The root cause was the prefix "__" being ignored when comparing, which made
  us always pick LibFunc::stpcpy_chk. Pass the LibFunc::Func directly to avoid
  this kind of error. Also, make the testcases as explicit as possible to
  prevent this.

  The now-useful testcases expose another, entangled, stpcpy problem, with the
  further simplification. This was introduced in a refactoring (r225640) to
  match the original behavior. However, this leads to problems when successive
  simplifications generate several similar instructions, none of which are
  removed by the custom replaceAllUsesWith. For instance, InstCombine (the main
  user) doesn't erase the instruction in its custom RAUW. When trying to
  simplify, say, __stpcpy_chk:
  - first, an stpcpy is created (fortified simplifier),
  - second, a memcpy is created (normal simplifier), but the stpcpy call isn't
    removed,
  - third, InstCombine later revisits the instructions, and simplifies the
    first stpcpy to a memcpy. We now have two memcpys.

  llvm-svn: 227250
* [InstCombine] Teach how to fold a select into a cttz/ctlz with the 'is_zero_undef' flag.
  Andrea Di Biagio | 2015-01-27 | 2 files changed, -2/+304

  This patch teaches the Instruction Combiner how to fold a cttz/ctlz followed
  by an icmp plus select into a single cttz/ctlz with the 'is_zero_undef' flag
  cleared.

  Added test InstCombine/select-cmp-cttz-ctlz.ll.

  llvm-svn: 227197
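  A sketch of the fold in LLVM IR (hypothetical names):

    %ct = call i32 @llvm.cttz.i32(i32 %x, i1 true)
    %cmp = icmp eq i32 %x, 0
    %sel = select i1 %cmp, i32 32, i32 %ct
    ; cttz with is_zero_undef=false already returns 32 for %x == 0,
    ; so the icmp plus select collapse into:
    %sel2 = call i32 @llvm.cttz.i32(i32 %x, i1 false)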
* [PM] Port instcombine to the new pass manager!
  Chandler Carruth | 2015-01-24 | 1 file changed, -0/+1

  This is exciting as this is a much more involved port. This is a complex,
  existing transformation pass. All of the core logic is shared between both
  old and new pass managers. Only the access to the analyses is separate
  because the actual techniques are separate. This also uses a bunch of
  different and interesting analyses and is the first time where we need to use
  an analysis across an IR layer.

  This also paves the way to expose instcombine utility functions. I've got a
  static function that implements the core pass logic over a function which
  might be mildly interesting, but more interesting is likely exposing a
  routine which just uses instructions *already in* the worklist and combines
  until empty.

  I've switched one of my favorite instcombine tests to run with both as well
  to make sure this keeps working.

  llvm-svn: 226987
* [canonicalize] Teach InstCombine to canonicalize loads which are only ever stored to always use a legal integer type if one is available.
  Chandler Carruth | 2015-01-22 | 2 files changed, -3/+53

  Regardless of whether this particular type is good or bad, it ensures we
  don't get weird differences in generated code (and resulting performance)
  from "equivalent" patterns that happen to end up using a slightly different
  type.

  After some discussion on llvmdev it seems everyone generally likes this
  canonicalization. However, there may be some parts of LLVM that handle it
  poorly and need to be fixed. I have at least verified that this doesn't
  impede GVN and instcombine's store-to-load forwarding powers in any obvious
  cases. Subtle cases are exactly what we need to flush out if they remain.

  Also note that this IR pattern should already be hitting LLVM from Clang at
  least because it is exactly the IR which would be produced if you used memcpy
  to copy a pointer or floating point between memory instead of a variable.

  llvm-svn: 226781
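  A sketch of the canonicalization in LLVM IR (hypothetical names, assuming i32
  is a legal integer type on the target):

    %v = load float* %src
    store float %v, float* %dst
    ; the loaded value is only ever stored, so it can be moved as bits:
    %src.i = bitcast float* %src to i32*
    %dst.i = bitcast float* %dst to i32*
    %v.i = load i32* %src.i
    store i32 %v.i, i32* %dst.i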
* InstCombine: Don't strip bitcasts off of callsites marked 'thunk'
  David Majnemer | 2015-01-21 | 1 file changed, -0/+11

  The return type of a thunk is meaningless; we just want the arguments and
  return value to be forwarded.

  llvm-svn: 226708
* For PR21145: recognise a builtin call to a known deallocation function even if it's defined in the current module.
  Richard Smith | 2015-01-15 | 1 file changed, -4/+23

  Clang generates this situation for the C++14 sized deallocation functions,
  because it generates a weak definition in case one isn't provided by the C++
  runtime library.

  llvm-svn: 226069
* IR: Move MDLocation into place
  Duncan P. N. Exon Smith | 2015-01-14 | 2 files changed, -6/+6

  This commit moves `MDLocation`, finishing off PR21433. There's an
  accompanying clang commit for frontend testcases. I'll attach the testcase
  upgrade script I used to PR21433 to help out-of-tree frontends/backends.

  This changes the schema for `DebugLoc` and `DILocation` from:

    !{i32 3, i32 7, !7, !8}

  to:

    !MDLocation(line: 3, column: 7, scope: !7, inlinedAt: !8)

  Note that empty fields (line/column: 0 and inlinedAt: null) don't get printed
  by the assembly writer.

  llvm-svn: 226048
* InstCombine: Don't take A-B<0 into A<B if A-B has other uses
  David Majnemer | 2015-01-14 | 1 file changed, -0/+15

  This fixes PR22226.

  llvm-svn: 226023
* [SimplifyLibCalls] Don't try to simplify indirect calls.
  Ahmed Bougacha | 2015-01-14 | 1 file changed, -0/+13

  It turns out, all callsites of the simplifier are guarded by a check for
  CallInst::getCalledFunction (i.e., to make sure the callee is direct). This
  check wasn't done when trying to further optimize a simplified fortified
  libcall, introduced by a refactoring in r225640.

  Fix that, add a testcase, and document the requirement.

  llvm-svn: 225895
* Fix fcmp + fabs instcombines when using the intrinsic
  Matt Arsenault | 2015-01-08 | 1 file changed, -0/+82

  This was only handling the libcall. This is another example of why only the
  intrinsic should ever be used when it exists.

  llvm-svn: 225465
* Fix using wrong intrinsic in test
  Matt Arsenault | 2015-01-06 | 1 file changed, -9/+9

  This is a leftover from renaming the intrinsic. It's surprising the unknown
  llvm. intrinsic wasn't rejected.

  llvm-svn: 225304
* Convert fcmp with 0.0 from casted integers to icmp
  Matt Arsenault | 2015-01-06 | 1 file changed, -0/+454

  This is already handled in general when it is known the conversion can't lose
  bits with smaller integer types casted into wider floating point types.

  This pattern happens somewhat often in GPU programs that cast workitem
  intrinsics to float, which are often compared with 0. Specifically handle the
  special case of compares with zero, which should also be known to not lose
  information.

  I had a more general version of this which allows equality compares if the
  casted float is exactly representable in the integer, but I'm not 100%
  confident that is always correct.

  Also fold cases that aren't integers to true / false.

  llvm-svn: 225265
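  A sketch of the zero-compare case in LLVM IR (hypothetical names): the sign
  of the value survives the conversion exactly, so no information is lost.

    %f = sitofp i32 %x to float
    %cmp = fcmp olt float %f, 0.000000e+00
    ; becomes:
    %cmp2 = icmp slt i32 %x, 0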
* InstCombine: Bitcast call arguments from/to pointer/integer type
  David Majnemer | 2015-01-06 | 2 files changed, -4/+43

  Try harder to get rid of bitcast'd calls by ptrtoint/inttoptr'ing arguments
  and return values when DataLayout says it is safe to do so.

  llvm-svn: 225254
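  A minimal sketch of the idea in LLVM IR (hypothetical signature, assuming
  DataLayout reports pointers and i64 as the same width):

    %r = call i64 bitcast (i8* (i64)* @f to i64 (i64)*)(i64 %x)
    ; becomes a direct call plus a cast of the return value:
    %p = call i8* @f(i64 %x)
    %r2 = ptrtoint i8* %p to i64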
* InstCombine: match can find ConstantExprs, don't assume we have a Value
  David Majnemer | 2015-01-04 | 1 file changed, -0/+9

  We assumed the output of a match was a Value; this would cause us to assert
  because we would fail a cast<>. Instead, use a helper in the Operator family
  to hide the distinction between Value and Constant.

  This fixes PR22087.

  llvm-svn: 225127
* InstCombine: Detect when llvm.umul.with.overflow always overflows
  David Majnemer | 2015-01-02 | 1 file changed, -0/+13

  We know overflow always occurs if both ~LHSKnownZero * ~RHSKnownZero and
  LHSKnownOne * RHSKnownOne overflow.

  llvm-svn: 225077
* InstCombine: fsub nsz 0, X ==> fsub nsz -0.0, X
  Sanjay Patel | 2014-12-31 | 1 file changed, -0/+8

  Some day the backend may handle instruction-level fast math flags and make
  this transform unnecessary, but it's still better practice to use the
  canonical representation of fneg when possible (use a -0.0).

  This is a partial fix for PR20870 (http://llvm.org/bugs/show_bug.cgi?id=20870).
  See also http://reviews.llvm.org/D6723.

  Differential Revision: http://reviews.llvm.org/D6731
  llvm-svn: 225050
* InstCombine: try to transform A-B < 0 into A < B
  David Majnemer | 2014-12-31 | 1 file changed, -0/+36

  We are allowed to move the 'B' to the right hand side if we can prove there
  is no signed overflow and if the comparison itself is signed.

  llvm-svn: 225034
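  A sketch of the transform in LLVM IR (hypothetical names); the nsw flag is
  what proves the absence of signed overflow:

    %sub = sub nsw i32 %a, %b
    %cmp = icmp slt i32 %sub, 0
    ; with no signed overflow, A - B < 0 holds exactly when A < B:
    %cmp2 = icmp slt i32 %a, %b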
* Carry facts about nullness and undef across GC relocation
  Philip Reames | 2014-12-29 | 1 file changed, -0/+52

  This change implements four basic optimizations:
  - If a relocated value isn't used, it doesn't need to be relocated.
  - If the value being relocated is null, relocation doesn't change that.
    (Technically, this might be collector specific. I don't know of one which
    it doesn't work for though.)
  - If the value being relocated is undef, the relocation is meaningless.
  - If the value being relocated was known nonnull, the relocated pointer also
    isn't null. (Since it points to the same source language object.)

  I outlined other planned work in comments.

  Differential Revision: http://reviews.llvm.org/D6600
  llvm-svn: 224968
* Loading from null is valid outside of addrspace 0
  Philip Reames | 2014-12-29 | 1 file changed, -0/+20

  This patch fixes a miscompile where we were assuming that loading from null
  is undefined and thus we could assume it doesn't happen. This transform is
  perfectly legal in address space 0, but is not necessarily legal in other
  address spaces.

  We really should introduce a hook to control this property on a per-target,
  per-address-space basis. We may be losing valuable optimizations in some
  address spaces by being too conservative.

  Original patch by Thomas P Raoux (submitted to llvm-commits); tests and
  formatting fixes by me.

  llvm-svn: 224961
* InstCombine: Infer nuw for multiplies
  David Majnemer | 2014-12-26 | 2 files changed, -6/+18

  A multiply cannot unsigned wrap if the two operands have, between them, at
  least as many leading zero bits as the bitwidth.

  llvm-svn: 224849
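  A concrete instance in LLVM IR (hypothetical names): each lshr leaves at
  least 16 leading zero bits, and 16 + 16 >= 32, so the i32 product cannot
  wrap.

    %a = lshr i32 %x, 16
    %b = lshr i32 %y, 16
    %m = mul i32 %a, %b
    ; becomes:
    %m2 = mul nuw i32 %a, %b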
* InstCombine: Infer nsw for multiplies
  David Majnemer | 2014-12-26 | 1 file changed, -0/+12

  We already utilize this logic for reducing overflow intrinsics; it makes
  sense to reuse it for normal multiplies as well.

  llvm-svn: 224847
* [ValueTracking] Move GlobalAlias handling to be after the max depth check in computeKnownBits()
  Michael Kuperstein | 2014-12-23 | 1 file changed, -0/+24

  GlobalAlias handling used to be after GlobalValue handling, which meant it
  was, in practice, dead code. r220165 moved GlobalAlias handling to be before
  GlobalValue handling, but also moved it to be before the max depth check,
  causing an assert due to a recursion depth limit violation.

  This moves GlobalAlias handling forward to where it's safe, and changes the
  GlobalValue handling to only look at GlobalObjects.

  Differential Revision: http://reviews.llvm.org/D6758
  llvm-svn: 224765
* This should have been part of r224676.
  David Majnemer | 2014-12-20 | 1 file changed, -2/+2

  llvm-svn: 224677
* InstCombine: Squash an icmp+select into bitwise arithmetic
  David Majnemer | 2014-12-20 | 1 file changed, -0/+33

    (X & INT_MIN) == 0 ? X ^ INT_MIN : X  -->  X | INT_MIN
    (X & INT_MIN) != 0 ? X ^ INT_MIN : X  -->  X & INT_MAX

  This fixes PR21993.

  llvm-svn: 224676
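  The first pattern as an LLVM IR sketch (hypothetical names; -2147483648 is
  INT_MIN for i32):

    %and = and i32 %x, -2147483648
    %cmp = icmp eq i32 %and, 0
    %xor = xor i32 %x, -2147483648
    %sel = select i1 %cmp, i32 %xor, i32 %x
    ; either arm yields %x with the sign bit set, so:
    %sel2 = or i32 %x, -2147483648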
* Reapply: [InstCombine] Fix visitSwitchInst to use right operand types for sub cstexpr
  Bruno Cardoso Lopes | 2014-12-19 | 1 file changed, -0/+30

  The visitSwitchInst generates SUB constant expressions to recompute the
  switch condition. When truncating the condition to a smaller type, SUB
  expressions should use the previous type (before trunc) for both operands.
  Also, fix the code to return the modified switch when only the truncation is
  performed. This fixes an assertion crash.

  Differential Revision: http://reviews.llvm.org/D6644
  rdar://problem/19191835
  llvm-svn: 224588
* use -0.0 when creating an fneg instruction
  Sanjay Patel | 2014-12-19 | 1 file changed, -1/+1

  Backends recognize (-0.0 - X) as the canonical form for fneg and produce
  better code. E.g., ppc64 with 0.0:

    lis r2, ha16(LCPI0_0)
    lfs f0, lo16(LCPI0_0)(r2)
    fsubs f1, f0, f1
    blr

  vs. -0.0:

    fneg f1, f1
    blr

  Differential Revision: http://reviews.llvm.org/D6723
  llvm-svn: 224583
* Revert "[InstCombine] Fix visitSwitchInst to use right operand types for sub ↵Bruno Cardoso Lopes2014-12-191-30/+0
| | | | | | | | | | | | | cstexpr" Reverts commit r224574 to appease buildbots: The visitSwitchInst generates SUB constant expressions to recompute the switch condition. When truncating the condition to a smaller type, SUB expressions should use the previous type (before trunc) for both operands. This fixes an assertion crash. llvm-svn: 224576
* [InstCombine] Fix visitSwitchInst to use right operand types for sub cstexpr
  Bruno Cardoso Lopes | 2014-12-19 | 1 file changed, -0/+30

  The visitSwitchInst generates SUB constant expressions to recompute the
  switch condition. When truncating the condition to a smaller type, SUB
  expressions should use the previous type (before trunc) for both operands.
  This fixes an assertion crash.

  Differential Revision: http://reviews.llvm.org/D6644
  rdar://problem/19191835
  llvm-svn: 224574
* Strength reduce intrinsics with overflow into regular arithmetic operations if possible.
  Erik Eckstein | 2014-12-17 | 1 file changed, -12/+91

  Some intrinsics, like s/uadd.with.overflow and umul.with.overflow, are
  already strength reduced. This change adds other arithmetic intrinsics:
  s/usub.with.overflow, smul.with.overflow. It completes the work on PR20194.

  llvm-svn: 224417
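  A sketch of one such reduction in LLVM IR (hypothetical names): when value
  tracking proves the subtraction cannot overflow, the intrinsic becomes plain
  arithmetic and the overflow bit becomes the constant false.

    %res = call { i32, i1 } @llvm.ssub.with.overflow.i32(i32 %a, i32 %b)
    %val = extractvalue { i32, i1 } %res, 0
    %ovf = extractvalue { i32, i1 } %res, 1
    ; becomes:
    %val2 = sub nsw i32 %a, %b
    ; ...with uses of %ovf replaced by false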
* IR: Make metadata typeless in assembly
  Duncan P. N. Exon Smith | 2014-12-15 | 11 files changed, -74/+74

  Now that `Metadata` is typeless, reflect that in the assembly. These are the
  matching assembly changes for the metadata/value split in r223802.

  - Only use the `metadata` type when referencing metadata from a call
    intrinsic -- i.e., only when it's used as a `Value`.
  - Stop pretending that `ValueAsMetadata` is wrapped in an `MDNode` when
    referencing it from call intrinsics.

  So, assembly like this:

    define @foo(i32 %v) {
      call void @llvm.foo(metadata !{i32 %v}, metadata !0)
      call void @llvm.foo(metadata !{i32 7}, metadata !0)
      call void @llvm.foo(metadata !1, metadata !0)
      call void @llvm.foo(metadata !3, metadata !0)
      call void @llvm.foo(metadata !{metadata !3}, metadata !0)
      ret void, !bar !2
    }
    !0 = metadata !{metadata !2}
    !1 = metadata !{i32* @global}
    !2 = metadata !{metadata !3}
    !3 = metadata !{}

  turns into this:

    define @foo(i32 %v) {
      call void @llvm.foo(metadata i32 %v, metadata !0)
      call void @llvm.foo(metadata i32 7, metadata !0)
      call void @llvm.foo(metadata i32* @global, metadata !0)
      call void @llvm.foo(metadata !3, metadata !0)
      call void @llvm.foo(metadata !{!3}, metadata !0)
      ret void, !bar !2
    }
    !0 = !{!2}
    !1 = !{i32* @global}
    !2 = !{!3}
    !3 = !{}

  I wrote an upgrade script that handled almost all of the tests in llvm and
  many of the tests in cfe (even handling many `CHECK` lines). I've attached it
  (or will attach it in a moment if you're speedy) to PR21532 to help everyone
  update their out-of-tree testcases.

  This is part of PR21532.

  llvm-svn: 224257
* ValueTracking: Don't recurse too deeply in computeKnownBitsFromAssume
  David Majnemer | 2014-12-12 | 1 file changed, -0/+18

  Respect the MaxDepth recursion limit; doing otherwise will trigger an assert
  in computeKnownBits.

  This fixes PR21891.

  llvm-svn: 224168
* Fix another infinite loop in InstCombine
  Steven Wu | 2014-12-12 | 1 file changed, -0/+12

  Summary: InstCombine infinite-loops on the added testcase. This is because
  InstCombine is generating instructions that can be optimized by itself. Fix
  by not optimizing frem if the optimized type is the same as the original
  type.

  rdar://problem/19150820

  Reviewers: majnemer

  Differential Revision: http://reviews.llvm.org/D6634
  llvm-svn: 224097
* [InstCombine][X86] Improved folding of calls to Intrinsic::x86_sse4a_insertqi.
  Andrea Di Biagio | 2014-12-11 | 1 file changed, -0/+27

  This patch teaches the instruction combiner how to fold a call to 'insertqi'
  if the 'length' field (3rd operand) is set to zero, and if the sum of the
  'length' and 'bit index' (4th operand) fields is bigger than 64.

  From the AMD64 Architecture Programmer's Manual:
  1. If the sum of the bit index + length field is greater than 64, then the
     results are undefined;
  2. A value of zero in the field length is defined as a length of 64.

  This patch improves the existing combining logic for intrinsic 'insertqi',
  adding extra checks to address both point 1. and point 2.

  Differential Revision: http://reviews.llvm.org/D6583
  llvm-svn: 224054
* ConstantFold, InstSimplify: undef >>a x can be either -1 or 0, choose 0
  David Majnemer | 2014-12-10 | 1 file changed, -2/+2

  Zero is usually a nicer constant to have than -1.

  llvm-svn: 223969
* Revert r223764 which taught instcombine about integer-based element extraction patterns.
  Chandler Carruth | 2014-12-09 | 1 file changed, -210/+0

  This is causing Clang to miscompile itself for 32-bit x86 somehow, and likely
  also on ARM and PPC. I really don't know how, but reverting now that I've
  confirmed this is actually the culprit. I have a reproduction as well and so
  should be able to restore this shortly.

  This reverts commit r223764. Original commit log follows:

  Teach instcombine to canonicalize "element extraction" from a load of an
  integer and "element insertion" into a store of an integer into actual
  element extraction, element insertion, and vector loads and stores.

  Previously various parts of LLVM (including instcombine itself) would
  introduce integer loads and stores into the code as a way of opaquely loading
  and storing "bits". In some cases (such as a memcpy of a std::complex<float>
  object) we will eventually end up using those bits in non-integer types. In
  order for SROA to effectively promote the allocas involved, it splits these
  "store a bag of bits" integer loads and stores up into the constituent parts.
  However, for non-alloca loads and stores which remain, it uses integer math
  to recombine the values into a large integer to load or store.

  All of this would be "fine", except that it forces LLVM to go through integer
  math to combine and split up values. While this makes perfect sense for
  integers (and in fact is critical for bitfields to end up lowering
  efficiently), it is *terrible* for non-integer types, especially floating
  point types. We have a much more canonical way of representing the act of
  concatenating the bits of two SSA values in LLVM: a vector and insertelement.
  This patch teaches InstCombine to use this representation.

  With this patch applied, LLVM will no longer introduce integer math into the
  critical path of every loop over std::complex<float> operations such as those
  that make up the hot path of ... oh, most HPC code, Eigen, and any other
  heavy linear algebra library.

  For the record, I looked *extensively* at fixing this in other parts of the
  compiler, but it just doesn't work:

  - We really do want to canonicalize memcpy and other bit-motion to integer
    loads and stores. SSA values are tremendously more powerful than "copy"
    intrinsics. Not doing this regresses massive amounts of LLVM's scalar
    optimizer.
  - We really do need to split up integer loads and stores of this form in SROA
    or every memcpy of a trivially copyable struct will prevent SSA formation
    of the members of that struct. It essentially turns off SROA.
  - The closest alternative is to actually split the loads and stores when
    partitioning with SROA, but this has all of the downsides historically
    discussed of splitting up loads and stores -- the wide-store information is
    fundamentally lost. We would also see performance regressions for
    bitfield-heavy code and other places where the integers aren't really
    intended to be split without seemingly arbitrary logic to treat integers
    totally differently.
  - We *can* effectively fix this in instcombine, so it isn't that hard of a
    choice to make IMO.

  llvm-svn: 223813
* Removal Of Duplicate Test Cases and Addition Of Missing Check Statements
  Sonam Kumari | 2014-12-09 | 1 file changed, -21/+15

  llvm-svn: 223768