path: root/llvm/lib
Commit message | Author | Age | Files | Lines
* [WebAssembly] Add a cast to void to fix an unused private member warning, ↵Dan Gohman2017-02-161-1/+3
| | | | | | for now. llvm-svn: 295327
* [X86] Remove local areOnlyUsersOf helper and use SDNode::areOnlyUsersOf instead.Simon Pilgrim2017-02-161-9/+1
| | | | llvm-svn: 295326
* [ARM] GlobalISel: Select floating point loadsDiana Picus2017-02-161-10/+31
| | | | llvm-svn: 295321
* Revert r295314 "[DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in ↵Artur Pilipenko2017-02-161-19/+8
| | | | | | | | load combine" This change causes some AMDGPU and PowerPC tests to fail. llvm-svn: 295316
* [DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combineArtur Pilipenko2017-02-161-8/+19
| | | | | | | | | | Support {a|s}ext, {a|z|s}ext load nodes as a part of load combine patterns. Reviewed By: filcab Differential Revision: https://reviews.llvm.org/D29591 llvm-svn: 295314
* [ARM] GlobalISel: Select G_SEQUENCE and G_EXTRACTDiana Picus2017-02-161-0/+78
| | | | | | | | Since they're only used for passing around double precision floating point values into the general purpose registers, we'll lower them to VMOVDRR and VMOVRRD. llvm-svn: 295310
* [ARM] GlobalISel: Select double G_FADD and copiesDiana Picus2017-02-161-6/+29
| | | | | | Just use VADDD if available, bail out if not. llvm-svn: 295309
* [ARM] GlobalISel: Assert that we don't use the FPR bank if we don't have VFPDiana Picus2017-02-161-0/+12
| | | | llvm-svn: 295308
* [ARM] GlobalISel: Add reg bank mappings for G_SEQUENCE and G_EXTRACTDiana Picus2017-02-161-0/+26
| | | | | | | Support G_SEQUENCE and G_EXTRACT as needed for passing double precision floating point values in the soft-fp float mode. llvm-svn: 295306
* [ARM] GlobalISel: Make the FPR bank 64-bit wideDiana Picus2017-02-162-5/+22
| | | | | | | Also add mappings for single and double precision FP, and use them for G_FADD and G_LOAD. llvm-svn: 295302
* [ARM] GlobalISel: Legalize 64-bit G_FADD and G_LOADDiana Picus2017-02-161-0/+7
| | | | | | | | For now we just mark them as legal all the time and let the other passes bail out if they can't handle it. In the future, we'll want to move more of the brains into the legalizer. llvm-svn: 295300
* [ARM] GlobalISel: Lower double precision FP argsDiana Picus2017-02-162-8/+85
| | | | | | | | | | | | | | For the hard float calling convention, we just use the D registers. For the soft-fp calling convention, we use the R registers and move values to/from the D registers by means of G_SEQUENCE/G_EXTRACT. While doing so, we make sure to honor the endianness of the target, since the CCAssignFn doesn't do that for us. For pure soft float targets, we still bail out because we don't support the libcalls yet. llvm-svn: 295295
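As an illustration of the soft-fp case described above, the following Python sketch splits a double into the two 32-bit halves that would travel in a pair of R registers and then reassembles it, which is roughly what G_EXTRACT and G_SEQUENCE express. The function names are hypothetical, and the rule for which half lands in the lower-numbered register is an assumption made for this sketch; none of it is taken from the patch.

```python
import struct


def split_double(value, big_endian=False):
    """Return (first_reg, second_reg): the two 32-bit halves of a double."""
    bits = struct.unpack("<Q", struct.pack("<d", value))[0]
    lo = bits & 0xFFFFFFFF
    hi = bits >> 32
    # Assumption: the lower-numbered register holds the half that would sit
    # at the lower memory address, so the order flips with endianness.
    return (hi, lo) if big_endian else (lo, hi)


def join_double(first_reg, second_reg, big_endian=False):
    """Rebuild the double from two 32-bit register values."""
    lo, hi = (second_reg, first_reg) if big_endian else (first_reg, second_reg)
    bits = (hi << 32) | lo
    return struct.unpack("<d", struct.pack("<Q", bits))[0]
```

The point of the sketch is only that the register order has to be chosen by the lowering code itself, since the commit message notes that the CCAssignFn does not do this.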
* [AVX-512][InstCombine] Teach InstCombine to optimize 512-bit packss/packus ↵Craig Topper2017-02-162-4/+9
| | | | | | intrinsics like it does 128/256-bit. llvm-svn: 295294
* [AVX-512] Remove masked packss/packus intrinsics and autoupgrade to unmasked ↵Craig Topper2017-02-162-12/+44
| | | | | | | | intrinsics with select instructions. For 512-bit add new unmasked intrinsics. The new 512-bit unmasked intrinsics will make it easy to handle these with the SSE/AVX intrinsics in InstCombine where we currently have a TODO. llvm-svn: 295290
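The auto-upgrade described above can be pictured as an unmasked pack followed by a per-element select. The Python sketch below is a toy model under stated assumptions: it ignores the 128-bit lane structure of the real instructions, packs 16-bit values to 8 bits only, and uses hypothetical function names that are not the intrinsic names.

```python
def saturate_i8(x):
    """Clamp a signed integer to the signed 8-bit range."""
    return max(-128, min(127, x))


def packss(a, b):
    """Unmasked pack: saturate the elements of a, then those of b."""
    return [saturate_i8(x) for x in a] + [saturate_i8(x) for x in b]


def select(mask, on_true, on_false):
    """Pick on_true[i] where bit i of mask is set, otherwise on_false[i]."""
    return [t if (mask >> i) & 1 else f
            for i, (t, f) in enumerate(zip(on_true, on_false))]


def masked_packss(a, b, passthru, mask):
    """Masked form expressed as unmasked pack plus select."""
    return select(mask, packss(a, b), passthru)
```

Once the mask lives in a separate select, the pack itself has the same shape at every vector width, so one piece of simplification logic can cover all of them.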
* Split WinCOFFObjectWriter::writeSection.Rui Ueyama2017-02-161-28/+39
| | | | llvm-svn: 295276
* Split WinCOFFObjectWriter::writeObject function.Rui Ueyama2017-02-161-160/+183
| | | | llvm-svn: 295273
* AMDGPU: Remove llvm.SI.sendmsgMatt Arsenault2017-02-162-6/+3
| | | | llvm-svn: 295270
* AMDGPU: Remove SI_fs_constant and SI_fs_interp intrinsicsMatt Arsenault2017-02-163-50/+3
| | | | | | Update test uses with expansion in terms of new intrinsics. llvm-svn: 295269
* Remove useless local variable.Rui Ueyama2017-02-161-9/+4
| | | | llvm-svn: 295268
* Rename variables to match the LLVM style.Rui Ueyama2017-02-161-94/+97
| | | | llvm-svn: 295265
* [X86] Re-enable conditional tail calls and fix PR31257.Hans Wennborg2017-02-166-2/+193
| | | | | | | | | | | This reverts r294348, which removed support for conditional tail calls due to the PR above. It fixes the PR by marking live registers as implicitly used and defined by the now predicated tailcall. This is similar to how IfConversion predicates instructions. Differential Revision: https://reviews.llvm.org/D29856 llvm-svn: 295262
* PMB: Add an importing WPD pass to the start of the ThinLTO backend pipeline.Peter Collingbourne2017-02-151-1/+15
| | | | | | Differential Revision: https://reviews.llvm.org/D30008 llvm-svn: 295260
* GlobalISel: legalize va_arg on AArch64.Tim Northover2017-02-154-0/+95
| | | | | | | | Uses a Custom implementation because the slot sizes being a multiple of the pointer size isn't really universal, even for the architectures that do have a simple "void *" va_list. llvm-svn: 295255
* GlobalISel: support translating va_argTim Northover2017-02-151-0/+12
| | | | | | | Since (say) i128 and [16 x i8] map to the same type in generic MIR, we also need to attach the required alignment info. llvm-svn: 295254
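To see why the alignment has to travel with va_arg, consider a simple pointer-style va_list. The Python sketch below is a model under assumptions, not the AArch64 lowering: the helper names and the 8-byte default slot size are hypothetical.

```python
def align_up(addr, align):
    """Round addr up to the next multiple of align (a power of two)."""
    return (addr + align - 1) & ~(align - 1)


def va_arg_step(list_ptr, size, align, slot_size=8):
    """Return (address of the argument, updated va_list pointer)."""
    arg_addr = align_up(list_ptr, align)
    # Assumption: each argument occupies a whole number of slots.
    next_ptr = arg_addr + align_up(size, slot_size)
    return arg_addr, next_ptr
```

With the list pointer at 0x1008, a 16-byte value aligned to 16 is read from 0x1010, while a 16-byte value aligned to 1 is read from 0x1008, so two values of the same size cannot be fetched correctly from the size alone.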
* Implement intrinsic mangling for literal struct types.Daniel Berlin2017-02-152-6/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixes PR 31921 Summary: Predicateinfo requires an ugly workaround to try to avoid literal struct types due to the intrinsic mangling not being implemented. This workaround actually does not work in all cases (you can hit the assert by bootstrapping with -print-predicateinfo), and can't be made to work without DFS'ing the type (i.e. copying getMangledStr and using a version that detects if it would crash). Rather than do that, I just implemented the mangling. It seems simple, since they are unified structurally. Looking at the overloaded-mangling testcase we have, it actually turns out the gc intrinsics will *also* crash if you try to use a literal struct. Thus, the testcase added fails before this patch, and works after, without needing to resort to predicateinfo. Reviewers: chandlerc, davide Subscribers: llvm-commits, sanjoy Differential Revision: https://reviews.llvm.org/D29925 llvm-svn: 295253
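Because literal struct types are identified purely by their structure, a mangled name can be built by recursing over the element types. The Python sketch below shows the idea with a toy scheme: the "sl_" prefix and trailing "s" are illustrative assumptions and are not claimed to match the strings the patch emits, and the tuple-based type representation is hypothetical.

```python
def mangle(ty):
    """Mangle a type given as a str (scalar) or a tuple (literal struct)."""
    if isinstance(ty, tuple):
        # Recurse over the elements; the closing marker keeps nested
        # structs distinguishable from a flat list of the same scalars.
        return "sl_" + "".join(mangle(elem) for elem in ty) + "s"
    return ty
```

In this scheme `("i32", ("i8",))` and `("i32", "i8")` mangle to different strings, which is the property an overloaded intrinsic name needs.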
* AMDGPU: Remove dead node definitionsMatt Arsenault2017-02-151-10/+0
| | | | llvm-svn: 295247
* Fix typosMatt Arsenault2017-02-152-2/+2
| | | | llvm-svn: 295246
* AMDGPU: Consolidate sendmsg/sendmsghalt handling and testsMatt Arsenault2017-02-151-7/+4
| | | | llvm-svn: 295244
* [Support] Fix some Clang-tidy modernize and Include What You Use warnings; ↵Eugene Zelenko2017-02-153-23/+34
| | | | | | other minor fixes (NFC). llvm-svn: 295243
* DAG: Do not scalarize fsub if fneg is legalMatt Arsenault2017-02-151-0/+15
| | | | | | Tests will be included with a future commit. llvm-svn: 295242
* Re-apply r295110 and r295144 with a fix for the ASan issue.Peter Collingbourne2017-02-151-98/+156
| | | | llvm-svn: 295241
* AMDGPU: Replace assert with report_fatal_errorMatt Arsenault2017-02-151-1/+2
| | | | | | Also use a more refined condition. llvm-svn: 295239
* [GlobalObject] Fix setSection("")Keno Fischer2017-02-151-1/+3
| | | | | | | | | | | | | Summary: In rL291613, the section name was interned in LLVMContext. However, this broke the ability to remove the section from a GlobalObject, because it tried to intern empty strings, which is not allowed. Fix that and add an appropriate regression test. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D29795 llvm-svn: 295238
* [InstCombine] improve formatting; NFCSanjay Patel2017-02-151-6/+3
| | | | llvm-svn: 295237
* AssumptionCache: Disable the verifier by default, move it behind a hidden ↵Peter Collingbourne2017-02-151-5/+15
| | | | | | | | | | | | | | cl::opt and verify from releaseMemory(). This is a short term solution to the problem that many passes currently fail to update the assumption cache. In the long term the verifier should not be controllable with a flag. We should either fix all passes to correctly update the assumption cache and enable the verifier unconditionally or somehow arrange for the assumption list to be updated automatically by passes. Differential Revision: https://reviews.llvm.org/D30003 llvm-svn: 295236
* [X86][SSE] Don't call EltsFromConsecutiveLoads if any element is missing.Simon Pilgrim2017-02-151-4/+11
| | | | | | Minor performance speedup - if any call to getShuffleScalarElt fails to get a result, don't bother calling for the remaining elements as EltsFromConsecutiveLoads will fail anyhow. llvm-svn: 295235
* AddressSanitizer: don't track swifterror memory addressesArnold Schwaighofer2017-02-151-3/+12
| | | | | | | | | | | | | | They are register promoted by ISel and so it makes no sense to treat them as memory. Inserting calls to the thread sanitizer would also generate invalid IR. You would hit: "swifterror value can only be loaded and stored from, or as a swifterror argument!" llvm-svn: 295230
* [AArch64] Make am_ldrlit an iPTR - not OtherVT - operand. NFC-ish.Ahmed Bougacha2017-02-151-1/+1
| | | | | | | | | | | am_ldrlit diverged from am_brcond in r207105, but kept the OtherVT operand type. It made sense for branch targets, as those are represented as MVT::Other in SDAG. But loads operate on pointers. This shouldn't have an observable effect on any in-tree code, but helps make the patterns consistent for external users. llvm-svn: 295229
* [OptDiag] Pass const Values/Types to Argument. NFC.Ahmed Bougacha2017-02-151-2/+2
| | | | llvm-svn: 295228
* [LTO] Add ability to emit assembly to new LTO APITobias Edler von Koch2017-02-152-2/+2
| | | | | | | | | | | | | | | | Summary: Add a field to LTO::Config, CGFileType, to select the file type to emit (object or assembly). This is useful for testing and to implement -save-temps. Reviewers: tejohnson, mehdi_amini, pcc Reviewed By: mehdi_amini Subscribers: davide, llvm-commits Differential Revision: https://reviews.llvm.org/D29475 llvm-svn: 295226
* Codegen: Make chains from trellis-shaped CFGsKyle Butt2017-02-151-17/+293
| | | | Lay out trellis-shaped CFGs optimally.

A trellis of the shape below:

  A     B
  |\   /|
  | \ / |
  |  X  |
  | / \ |
  |/   \|
  C     D

would be laid out A; B->C ; D by the current layout algorithm. Now we identify trellises and lay them out either A->C; B->D or A->D; B->C. This scales with an increasing number of predecessors. A trellis is a group of 2 or more predecessor blocks that all have the same successors.

Because of this we can tail duplicate to extend existing trellises.

As an example consider the following CFG:

    B   D   F   H
   / \ / \ / \ / \
  A---C---E---G---Ret

Where A,C,E,G are all small (Currently 2 instructions).

The CFG preserving layout is then A,B,C,D,E,F,G,H,Ret.

The current code will copy C into B, E into D and G into F and yield the layout A,C,B(C),E,D(E),F(G),G,H,ret

  define void @straight_test(i32 %tag) {
  entry:
    br label %test1
  test1: ; A
    %tagbit1 = and i32 %tag, 1
    %tagbit1eq0 = icmp eq i32 %tagbit1, 0
    br i1 %tagbit1eq0, label %test2, label %optional1
  optional1: ; B
    call void @a()
    br label %test2
  test2: ; C
    %tagbit2 = and i32 %tag, 2
    %tagbit2eq0 = icmp eq i32 %tagbit2, 0
    br i1 %tagbit2eq0, label %test3, label %optional2
  optional2: ; D
    call void @b()
    br label %test3
  test3: ; E
    %tagbit3 = and i32 %tag, 4
    %tagbit3eq0 = icmp eq i32 %tagbit3, 0
    br i1 %tagbit3eq0, label %test4, label %optional3
  optional3: ; F
    call void @c()
    br label %test4
  test4: ; G
    %tagbit4 = and i32 %tag, 8
    %tagbit4eq0 = icmp eq i32 %tagbit4, 0
    br i1 %tagbit4eq0, label %exit, label %optional4
  optional4: ; H
    call void @d()
    br label %exit
  exit:
    ret void
  }

Here is the layout after D27742:

  straight_test:                  # @straight_test
  ; ... Prologue elided
  ; BB#0:                         # %entry ; A (merged with test1)
  ; ... More prologue elided
          mr 30, 3
          andi. 3, 30, 1
          bc 12, 1, .LBB0_2
  ; BB#1:                         # %test2 ; C
          rlwinm. 3, 30, 0, 30, 30
          beq 0, .LBB0_3
          b .LBB0_4
  .LBB0_2:                        # %optional1 ; B (copy of C)
          bl a
          nop
          rlwinm. 3, 30, 0, 30, 30
          bne 0, .LBB0_4
  .LBB0_3:                        # %test3 ; E
          rlwinm. 3, 30, 0, 29, 29
          beq 0, .LBB0_5
          b .LBB0_6
  .LBB0_4:                        # %optional2 ; D (copy of E)
          bl b
          nop
          rlwinm. 3, 30, 0, 29, 29
          bne 0, .LBB0_6
  .LBB0_5:                        # %test4 ; G
          rlwinm. 3, 30, 0, 28, 28
          beq 0, .LBB0_8
          b .LBB0_7
  .LBB0_6:                        # %optional3 ; F (copy of G)
          bl c
          nop
          rlwinm. 3, 30, 0, 28, 28
          beq 0, .LBB0_8
  .LBB0_7:                        # %optional4 ; H
          bl d
          nop
  .LBB0_8:                        # %exit ; Ret
          ld 30, 96(1)            # 8-byte Folded Reload
          addi 1, 1, 112
          ld 0, 16(1)
          mtlr 0
          blr

The tail-duplication has produced some benefit, but it has also produced a trellis which is not laid out optimally. With this patch, we improve the layouts of such trellises, and decrease the cost calculation for tail-duplication accordingly.

This patch produces the layout A,C,E,G,B,D,F,H,Ret. This layout does have back edges, which is a negative, but it has a bigger compensating positive, which is that it handles the case where there are long strings of skipped blocks much better than the original layout. Both layouts handle runs of executed blocks equally well. Branch prediction also improves if there is any correlation between subsequent optional blocks.

Here is the resulting concrete layout:

  straight_test:                  # @straight_test
  ; BB#0:                         # %entry ; A (merged with test1)
          mr 30, 3
          andi. 3, 30, 1
          bc 12, 1, .LBB0_4
  ; BB#1:                         # %test2 ; C
          rlwinm. 3, 30, 0, 30, 30
          bne 0, .LBB0_5
  .LBB0_2:                        # %test3 ; E
          rlwinm. 3, 30, 0, 29, 29
          bne 0, .LBB0_6
  .LBB0_3:                        # %test4 ; G
          rlwinm. 3, 30, 0, 28, 28
          bne 0, .LBB0_7
          b .LBB0_8
  .LBB0_4:                        # %optional1 ; B (Copy of C)
          bl a
          nop
          rlwinm. 3, 30, 0, 30, 30
          beq 0, .LBB0_2
  .LBB0_5:                        # %optional2 ; D (Copy of E)
          bl b
          nop
          rlwinm. 3, 30, 0, 29, 29
          beq 0, .LBB0_3
  .LBB0_6:                        # %optional3 ; F (Copy of G)
          bl c
          nop
          rlwinm. 3, 30, 0, 28, 28
          beq 0, .LBB0_8
  .LBB0_7:                        # %optional4 ; H
          bl d
          nop
  .LBB0_8:                        # %exit

Differential Revision: https://reviews.llvm.org/D28522 llvm-svn: 295223
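The trellis definition in this commit message (two or more predecessor blocks that all have the same successors) can be sketched as a grouping over successor sets. The Python code below is a minimal model and not the MachineBlockPlacement implementation: the function name, the dictionary-based CFG, and the requirement of at least two shared successors are assumptions made for illustration.

```python
from collections import defaultdict


def find_trellises(successors):
    """Group blocks that share an identical successor set.

    successors maps a block name to an iterable of its successor names.
    Returns a sorted list of (predecessors, shared successors) pairs.
    """
    groups = defaultdict(list)
    for block, succs in successors.items():
        key = frozenset(succs)
        # Assumption: a trellis needs at least two shared successors.
        if len(key) >= 2:
            groups[key].append(block)
    return sorted(
        (sorted(preds), sorted(key))
        for key, preds in groups.items()
        if len(preds) >= 2
    )
```

For the four-block example in the message, `find_trellises({"A": ["C", "D"], "B": ["C", "D"], "C": [], "D": []})` groups A and B over the shared successors C and D. Choosing between A->C; B->D and A->D; B->C would then be a separate step that this sketch does not attempt.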
* include function name in dot filenameXinliang David Li2017-02-154-8/+9
| | | | | | Differential Revision: http://reviews.llvm.org/D29975 llvm-svn: 295220
* ThreadSanitizer: don't track swifterror memory addressesArnold Schwaighofer2017-02-151-0/+7
| | | | | | | | | | | | | | They are register promoted by ISel and so it makes no sense to treat them as memory. Inserting calls to the thread sanitizer would also generate invalid IR. You would hit: "swifterror value can only be loaded and stored from, or as a swifterror argument!" llvm-svn: 295215
* [DAG] Don't try to create an INSERT_SUBVECTOR with an illegal sourceMichael Kuperstein2017-02-151-1/+7
| | | | | | | | | | | | We currently can't legalize those, but we should really not be creating them in the first place, since legalization would probably look similar to the way we legalize CONCAT_VECTORS - basically replace the INSERT with a BUILD. This fixes PR311956. Differential Revision: https://reviews.llvm.org/D29961 llvm-svn: 295213
* [X86][SSE] Propagate undef upper elements from scalar_to_vector during ↵Simon Pilgrim2017-02-151-1/+7
| | | | | | | | shuffle combining Only do this for integer types currently - load folding of float types (in particular insertps) often fails with this. llvm-svn: 295208
* [AMDGPU] Revert failed schedulingStanislav Mekhanoshin2017-02-153-37/+106
| | | | | | | | | | | | This patch reverts a region's scheduling to its original, untouched state if we have decreased occupancy. In addition, it switches to using the TargetRegisterInfo occupancy callback for pressure limits instead of gradually increasing limits which were just passed by. We are going to stay with the best schedule so we do not need to tolerate worsened scheduling anymore. Differential Revision: https://reviews.llvm.org/D29971 llvm-svn: 295206
* Revert "[JumpThreading] Thread through guards"Anna Thomas2017-02-152-189/+15
| | | | | | | | | This reverts commit r294617. We fail on an assert while trying to get a condition from an unconditional branch. llvm-svn: 295200
* [InlineFunction] use getFunction(); NFCSanjay Patel2017-02-151-3/+3
| | | | llvm-svn: 295185
* [InlineFunction] use getCaller(); NFCISanjay Patel2017-02-151-3/+2
| | | | llvm-svn: 295181
* [InlineFunction] use range-for loop; NFCISanjay Patel2017-02-151-10/+8
| | | | llvm-svn: 295179