path: root/llvm/test
Commit message | Author | Age | Files | Lines
...
* RenameIndependentSubregs: Fix iterator problem | Matt Arsenault | 2017-06-26 | 1 | -0/+67
  Fixes bug 33597. Use of substituteRegister in the tied operand case messes up the register use iterator, causing some uses to be left unprocessed.
  llvm-svn: 306333
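The bug class here is mutating a container while iterating over it. The sketch below is a hypothetical Python analogy, not LLVM's MachineRegisterInfo API: a plain list stands in for a register's use list, each use is a (register, instruction) pair, and removing the current use mid-iteration makes the loop skip the next one. Iterating over a snapshot visits every use.

```python
# Hypothetical analogy: a Python list stands in for a register's use list,
# and each use is a (register, instruction) pair.

def rewrite_uses_buggy(uses, old_reg, new_reg):
    """Rewrite uses of old_reg while iterating the list being mutated (buggy)."""
    processed = []
    for use in uses:
        if use[0] == old_reg:
            uses.remove(use)                # unlinks the use, shifting later elements
            uses.append((new_reg, use[1]))  # re-links it under the new register
            processed.append(use[1])
    return processed

def rewrite_uses_fixed(uses, old_reg, new_reg):
    """Same rewrite, but iterate over a snapshot so no use is skipped."""
    processed = []
    for use in list(uses):
        if use[0] == old_reg:
            uses.remove(use)
            uses.append((new_reg, use[1]))
            processed.append(use[1])
    return processed

print(rewrite_uses_buggy([("r1", "a"), ("r1", "b"), ("r1", "c")], "r1", "r2"))  # ['a', 'c']
print(rewrite_uses_fixed([("r1", "a"), ("r1", "b"), ("r1", "c")], "r1", "r2"))  # ['a', 'b', 'c']
```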
* [WebAssembly] Add more support for weak symbols | Sam Clegg | 2017-06-26 | 3 | -0/+71
  Add weak symbol tests to MC.
  Add symbol flags to output of `llvm-readobj -t`.
  Differential Revision: https://reviews.llvm.org/D34635
  llvm-svn: 306330
* AArch64: legalize G_EXTRACT operations. | Tim Northover | 2017-06-26 | 3 | -8/+92
  This is the dual problem to legalizing G_INSERTs so most of the code and testing was cribbed from there.
  llvm-svn: 306328
* AArch64: remove all kill flags when extending register liveness. | Tim Northover | 2017-06-26 | 1 | -0/+19
  When we forward a stored value to a load and eliminate it entirely we need to make sure the liveness of the register is maintained all the way to its use. Previously we only cleared liveness on the store doing the forwarding, but there could be other killing uses in between.
  We already do the right thing when the load has to be converted into something else; it was just this one path that skipped it.
  llvm-svn: 306318
* [X86][SSE] Check SSE2/SSE3 codegen tests on i686 and x86_64 | Simon Pilgrim | 2017-06-26 | 2 | -184/+456
  llvm-svn: 306314
* [GVN] Recommit the patch "Add phi-translate support in scalarpre". | Wei Mi | 2017-06-26 | 3 | -4/+135
  The recommit fixes three bugs:
  The first one is to use CurrentBlock instead of PREInstr's Parent as param of performScalarPREInsertion because the Parent of a clone instruction may be uninitialized.
  The second one is to stop PRE when the edge between CurrentBlock and its predecessor is a backedge and an operand of CurInst is defined inside CurrentBlock. The same value defined inside the loop in the last iteration cannot be regarded as available.
  The third one is an out-of-bounds array access in a flipped if guard.
  Right now scalarpre doesn't have phi-translate support, so it will miss some simple PRE opportunities. For example, in the following test case, the current scalarpre cannot recognize that the last "a * b" is fully redundant, because the a and b used by the last "a * b" expression are both defined by phis.
    long a[100], b[100], g1, g2, g3;
    __attribute__((pure)) long goo();
    void foo(long a, long b, long c, long d) {
      g1 = a * b;
      if (__builtin_expect(g2 > 3, 0)) {
        a = c;
        b = d;
        g2 = a * b;
      }
      g3 = a * b;      // fully redundant.
    }
  The patch adds phi-translate support in scalarpre. This is only a temporary solution before the newpre based on newgvn is available.
  llvm-svn: 306313
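A toy model of phi translation may help. To ask whether `a * b` at the merge block is available in a given predecessor, each operand defined by a phi is replaced by that phi's incoming value for the predecessor, and the translated expression is looked up among the expressions available there. The block names, the dict-based phi table, and `translate`/`fully_redundant` are hypothetical illustrations of the idea applied to the test case above, not GVN's actual data structures.

```python
# Hypothetical model of the test case above. Phi nodes in the merge block:
# phi name -> {predecessor block: incoming value}.
PHIS = {
    "a.merge": {"entry": "a", "if.then": "c"},
    "b.merge": {"entry": "b", "if.then": "d"},
}

# Expressions already computed (available) at the end of each predecessor.
AVAILABLE = {
    "entry": {("mul", "a", "b")},    # g1 = a * b
    "if.then": {("mul", "c", "d")},  # g2 = a * b, after a = c and b = d
}

def translate(expr, pred):
    """Phi-translate expr from the merge block into predecessor pred."""
    op, *operands = expr
    return (op, *(PHIS[v][pred] if v in PHIS else v for v in operands))

def fully_redundant(expr, preds):
    """True if the phi-translated expr is available in every predecessor."""
    return all(translate(expr, p) in AVAILABLE[p] for p in preds)

expr = ("mul", "a.merge", "b.merge")  # g3 = a * b at the merge point
preds = ["entry", "if.then"]
print(translate(expr, "if.then"))                   # ('mul', 'c', 'd')
print(all(expr in AVAILABLE[p] for p in preds))     # False: no translation
print(fully_redundant(expr, preds))                 # True: with translation
```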
* AMDGPU: Setup SP/FP in callee function prolog/epilog | Matt Arsenault | 2017-06-26 | 2 | -3/+32
  llvm-svn: 306312
* [llvm-pdbutil] Add a mode to `bytes` for dumping split debug chunks. | Zachary Turner | 2017-06-26 | 1 | -0/+12
  llvm-svn: 306309
* [SystemZ] Fix missing emergency spill slot corner case | Ulrich Weigand | 2017-06-26 | 1 | -0/+76
  We sometimes need emergency spill slots for the register scavenger. This may be the case when code needs to access a stack slot that has an offset of 4096 or more relative to the stack pointer.
  To make that determination, processFunctionBeforeFrameFinalized currently simply checks the total stack frame size of the current function. But this is not enough, since code may need to access stack slots in the caller's stack frame as well, in particular incoming arguments stored on the stack.
  This commit fixes the problem by taking argument slots into account.
  llvm-svn: 306305
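The decision this fix changes can be sketched as a predicate. This is a hypothetical Python model, not the SystemZ backend code: it assumes the limit comes from a 12-bit unsigned displacement (offsets 0 to 4095), that the caller's frame sits directly above the current frame, and it models the fix as also considering the offset of each incoming stack argument. The function and parameter names are illustrative.

```python
MAX_UINT12 = 4095  # largest offset an unsigned 12-bit displacement can encode

def needs_emergency_spill_slot(frame_size, incoming_arg_offsets=()):
    """Hypothetical model: may some stack access need an SP offset >= 4096?

    frame_size: estimated size in bytes of the current function's frame.
    incoming_arg_offsets: offsets of incoming stack arguments within the
    caller's frame, assumed to sit directly above the current frame.
    """
    max_offset = frame_size  # the old check looked only at this
    for arg_offset in incoming_arg_offsets:
        # Reaching an incoming argument from SP means crossing our whole frame.
        max_offset = max(max_offset, frame_size + arg_offset)
    return max_offset > MAX_UINT12

print(needs_emergency_spill_slot(4000))         # False
print(needs_emergency_spill_slot(4000, [160]))  # True: 4000 + 160 = 4160
```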
* [X86][SSE] Add combine tests for PMULDQ/PMULUDQ | Simon Pilgrim | 2017-06-26 | 1 | -0/+110
  Found several missed optimizations while investigating replacing _mm_mul_epi32/_mm_mul_epu32 with generic implementations.
  llvm-svn: 306302
* [X86][AVX-512] Don't raise inexact in ceil, floor, round, trunc. | Ahmed Bougacha | 2017-06-26 | 1 | -8/+8
  The non-AVX-512 behavior was changed in r248266 to match N1778 (C bindings for IEEE-754 (2008)), which defined the four functions to not raise the inexact exception ("rint" is still defined as raising it).
  Update the AVX-512 lowering of these functions to match that: it should not be different.
  llvm-svn: 306299
* AMDGPU/GlobalISel: Mark 32-bit G_SHL as legal | Tom Stellard | 2017-06-26 | 1 | -0/+18
  Reviewers: arsenm
  Reviewed By: arsenm
  Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D34589
  llvm-svn: 306298
* [X86] Add test case for PR15981 | Simon Pilgrim | 2017-06-26 | 1 | -0/+59
  llvm-svn: 306296
* [x86] transform vector inc/dec to use -1 constant (PR33483) | Sanjay Patel | 2017-06-26 | 20 | -1605/+1714
  Convert vector increment or decrement to sub/add with an all-ones constant:
    add X, <1, 1...> --> sub X, <-1, -1...>
    sub X, <1, 1...> --> add X, <-1, -1...>
  The all-ones vector constant can be materialized using a pcmpeq instruction that is commonly recognized as an idiom (has no register dependency), so that's better than loading a splat 1 constant.
  AVX512 uses 'vpternlogd' for 512-bit vectors because there is apparently no better way to produce 512 one-bits.
  The general advantages of this lowering are:
  1. pcmpeq has lower latency than a memop on every uarch I looked at in Agner's tables, so in theory, this could be better for perf, but...
  2. That seems unlikely to affect any OOO implementation, and I can't measure any real perf difference from this transform on Haswell or Jaguar, but...
  3. It doesn't look like it from the diffs, but this is an overall size win because we eliminate 16 - 64 constant bytes in the case of a vector load. If we're broadcasting a scalar load (which might itself be a bug), then we're replacing a scalar constant load + broadcast with a single cheap op, so that should always be smaller/better too.
  4. This makes the DAG/isel output more consistent: we use pcmpeq already for padd x, -1 and psub x, -1, so we should use that form for +1 too because we can.
  If there's some reason to favor a constant load on some CPU, let's make the reverse transform for all of these cases (either here in the DAG or in a later machine pass).
  This should fix:
  https://bugs.llvm.org/show_bug.cgi?id=33483
  Differential Revision: https://reviews.llvm.org/D34336
  llvm-svn: 306289
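The rewrite relies on two's-complement wraparound: adding 1 and subtracting -1 (all ones) give the same lane result at every element width, and likewise for decrement. A quick check in Python, modeling each vector lane as an unsigned integer masked to its width (the helper names are illustrative, not LLVM APIs):

```python
def all_ones(bits):
    # pcmpeq of a register with itself sets every lane to all ones, i.e. -1.
    return (1 << bits) - 1

def lane_add(x, y, bits):
    return (x + y) & all_ones(bits)

def lane_sub(x, y, bits):
    return (x - y) & all_ones(bits)

def inc_dec_rewrite_holds(bits, values=None):
    """Check add X,1 == sub X,-1 and sub X,1 == add X,-1 for each lane value."""
    minus_one = all_ones(bits)
    if values is None:
        values = range(1 << bits)  # exhaustive for small lane widths
    return all(
        lane_add(x, 1, bits) == lane_sub(x, minus_one, bits)
        and lane_sub(x, 1, bits) == lane_add(x, minus_one, bits)
        for x in values
    )

print(inc_dec_rewrite_holds(8))   # True
print(inc_dec_rewrite_holds(16))  # True
edges = [0, 1, (1 << 31) - 1, 1 << 31, (1 << 32) - 1]
print(inc_dec_rewrite_holds(32, edges))  # True
```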
* [Hexagon] Handle cases when the aligned stack pointer is missing | Krzysztof Parzyszek | 2017-06-26 | 2 | -0/+81
  llvm-svn: 306288
* [SystemZ] Add a check against zero before calling getTestUnderMaskCond() | Jonas Paulsson | 2017-06-26 | 1 | -0/+20
  Csmith discovered that this function can be called with a zero argument, in which case an assert for this triggered.
  This patch also adds a guard before the other call to this function since it was missing, although the test only covers the case where it was discovered.
  Reduced test case attached as CodeGen/SystemZ/int-cmp-54.ll.
  Review: Ulrich Weigand
  llvm-svn: 306287
* [X86][LLVM][test]Expanding Supports lowerInterleavedStore() in X86InterleavedAccess test. | Michael Zuckerman | 2017-06-26 | 2 | -0/+76
  Adding a base test (to trunk) for Store stride=4 vf=32.
  llvm-svn: 306286
* This reverts commit r306272. | Serguei Katkov | 2017-06-26 | 2 | -54/+4
  Revert "[MBP] do not rotate loop if it creates extra branch"
  It breaks the sanitizer build bots. Need to fix this.
  llvm-svn: 306276
* fix trivial typo in comment, NFC | Hiroshi Inoue | 2017-06-26 | 1 | -1/+1
  llvm-svn: 306274
* [MBP] do not rotate loop if it creates extra branch | Serguei Katkov | 2017-06-26 | 2 | -4/+54
  This is a final fix for the corner case of PR32214. Actually, this is not really a corner case in general.
  We should not do a loop rotation if it creates an additional branch.
  Consider the case where we have a loop chain H, M, B, C, where:
    H - header with a viable fallthrough from the pre-header and an exit from the loop
    M - some middle block
    B - backedge to the header, but also with an exit from the loop
    C - some cold block of the loop
  Suppose H is determined to be the best exit. If we rotate the loop to M, B, C, H, we can introduce an extra branch.
  Let's compute the change in the number of branches:
    +1 branch from pre-header to header
    -1 branch from header to exit
    +1 branch from header to middle block, if there is such a branch
    -1 branch from cold block to header, if there is one
  So if C is not a predecessor of H, we introduce an extra branch.
  This change prohibits rotation of the loop if both of the following are true:
    1) The best exit has the next element in the chain as a successor.
    2) The last element in the chain is not a predecessor of the first element of the chain.
  Reviewers: iteratee, xur
  Reviewed By: iteratee
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D34271
  llvm-svn: 306272
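The branch-count bookkeeping above can be written out directly. This hypothetical Python sketch (the names are illustrative, not MachineBlockPlacement's API) sums the four terms listed in the message for rotating H, M, B, C to M, B, C, H, and checks that the change is positive exactly when both prohibition conditions hold.

```python
def rotation_branch_delta(header_branches_to_middle, cold_is_pred_of_header):
    """Net change in branch count when rotating H, M, B, C to M, B, C, H."""
    delta = 0
    delta += 1  # +1 branch from pre-header to header
    delta -= 1  # -1 branch from header to exit
    if header_branches_to_middle:
        delta += 1  # +1 branch from header to middle block
    if cold_is_pred_of_header:
        delta -= 1  # -1 branch from cold block to header
    return delta

def should_rotate(best_exit_succ_is_next, last_is_pred_of_first):
    """Skip rotation when conditions 1) and 2) from the message both hold."""
    return not (best_exit_succ_is_next and not last_is_pred_of_first)

for a in (False, True):
    for b in (False, True):
        # Only (True, False) gives delta 1 and should_rotate False.
        print(a, b, rotation_branch_delta(a, b), should_rotate(a, b))
```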
* AMDGPU: Partially fix implicit.buffer.ptr intrinsic handling | Matt Arsenault | 2017-06-26 | 2 | -0/+59
  This should not be treated as a different version of private_segment_buffer. These are distinct things with different uses and register classes, and this requires the function argument info to have more context about the function's type and environment.
  Also add missing test coverage for the intrinsic, and emit an error for HSA.
  This also uncovers that the intrinsic is broken unless there happen to be stack objects.
  llvm-svn: 306264
* [LoopSimplify] Re-instate r306081 with a bug fix w.r.t. indirectbr. | Chandler Carruth | 2017-06-25 | 4 | -3/+81
  This was reverted in r306252, but I already had the bug fixed and was just trying to form a test case.
  The original commit factored the logic for forming dedicated exits inside of LoopSimplify into a helper that could be used elsewhere and with an approach that required fewer intermediate data structures. See that commit for full details including the change to the statistic, etc.
  The code looked fine to me and my reviewers, but in fact didn't handle indirectbr correctly: it left the 'InLoopPredecessors' vector dirty. If you have code that looks *just* right, you can end up leaking these predecessors into a subsequent rewrite, and crash deep down when trying to update PHI nodes for predecessors that don't exist.
  I've added an assert that makes the bug much more obvious, and then changed the code to reliably clear the vector so we don't get this bug again in some other form as the code changes.
  I've also added a test case that *does* manage to catch this while also giving some nice positive coverage in the face of indirectbr. The real code that found this came out of what I think is CPython's interpreter loop, but any code with really "creative" interpreter loops mixing indirectbr and other exit paths could manage to tickle the bug.
  It was hard to reduce the original test case because in addition to having a particular pattern of IR, the whole thing depends on the order of the predecessors, which in turn depends on use list order. The test case added here was designed so that in multiple different predecessor orderings it should always end up going down the same path and tripping the same bug. I hope. At least, it tripped it for me without manipulating the use list order, which is better than anything bugpoint could do...
  llvm-svn: 306257
* [LoopSimplify] Improve a test for loop simplify minorly. NFC. | Chandler Carruth | 2017-06-25 | 1 | -12/+150
  I did some basic testing while looking for a bug in my recent change to loop simplify and even though it didn't find the bug it seems like a useful improvement anyways.
  llvm-svn: 306256
* Revert "[LoopSimplify] Factor the logic to form dedicated exits into a utility." | Daniel Jasper | 2017-06-25 | 3 | -0/+3
  This leads to a segfault. Chandler already has a test case and should be able to recommit with a fix soon.
  llvm-svn: 306252
* [X86] Add test case for PR15705 | Simon Pilgrim | 2017-06-25 | 1 | -0/+48
  llvm-svn: 306246
* [InstCombine] add (sext i1 X), 1 --> zext (not X) | Sanjay Patel | 2017-06-25 | 1 | -8/+4
  http://rise4fun.com/Alive/i8Q
  A narrow bitwise logic op is obviously better than math for value tracking, and zext is better than sext. Typically, the 'not' will be folded into an icmp predicate.
  The IR difference would even survive through codegen for x86, so we would see worse code:
  https://godbolt.org/g/C14HMF
    one_or_zero(int, int):                      # @one_or_zero(int, int)
            xorl    %eax, %eax
            cmpl    %esi, %edi
            setle   %al
            retq
    one_or_zero_alt(int, int):                  # @one_or_zero_alt(int, int)
            xorl    %ecx, %ecx
            cmpl    %esi, %edi
            setg    %cl
            movl    $1, %eax
            subl    %ecx, %eax
            retq
  llvm-svn: 306243
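Because X is an i1, the fold can be checked exhaustively at any result width. A small Python model, assuming N-bit wrapping arithmetic for the wider type (the helpers are illustrative, not LLVM APIs):

```python
def sext_i1(x, bits):
    # Sign-extending an i1: true becomes all ones (-1), false stays 0.
    return (1 << bits) - 1 if x else 0

def zext_i1(x):
    # Zero-extending an i1: true becomes 1, false stays 0.
    return 1 if x else 0

def fold_holds(bits):
    """Check add (sext i1 X), 1 == zext (not X) at the given integer width."""
    mask = (1 << bits) - 1
    return all(
        ((sext_i1(x, bits) + 1) & mask) == zext_i1(x ^ 1)
        for x in (0, 1)
    )

print([fold_holds(b) for b in (8, 16, 32, 64)])  # [True, True, True, True]
```

The two assembly listings show the same equivalence at the machine level: `1 - (a > b)` (setg followed by subl) equals `(a <= b)` (a single setle).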
* AVX-512: Fixed a crash during legalization of <3 x i8> type | Elena Demikhovsky | 2017-06-25 | 1 | -0/+31
  The compiler fails with an assertion during legalization of SETCC for <3 x i8> operands. The result is extended to <4 x i8> and then truncated to <4 x i1>. It does not happen on AVX2, because the final result of SETCC is <4 x i32>.
  Differential Revision: https://reviews.llvm.org/D34503
  llvm-svn: 306242
* [GlobalISel][X86] Support vector type G_EXTRACT selection. | Igor Breger | 2017-06-25 | 2 | -0/+207
  Summary: Support vector type G_EXTRACT selection. For now, G_EXTRACT is marked as legal for any type, so there is nothing to do in the legalizer.
  Split from https://reviews.llvm.org/D33665
  Reviewers: qcolombet, t.p.northover, zvi, guyblank
  Reviewed By: guyblank
  Subscribers: guyblank, rovka, llvm-commits, kristof.beyls
  Differential Revision: https://reviews.llvm.org/D33957
  llvm-svn: 306240
* [AVX2] [TTI CostModel] Add cost of interleaved loads/stores for AVX2 | Dorit Nuzman | 2017-06-25 | 2 | -0/+183
  The cost of an interleaved access was only implemented for AVX512. For other X86 targets an overly conservative Base cost was returned, resulting in avoiding vectorization where it is actually profitable to vectorize.
  This patch starts to add costs for AVX2 for the most prominent cases of interleaved accesses (stride 3,4 chars, for now).
  Note 1: Improvements of up to ~4x were observed in some of EEMBC's rgb workloads. There is also a known issue of 15-30% degradations on some of these workloads, associated with an interleaved access followed by type promotion/widening; the resulting shuffle sequence is currently inefficient and will be improved by a series of patches that extend the X86InterleavedAccess pass (such as D34601 and more to follow).
  Note 2: The costs in this patch do not reflect port pressure penalties, which can be very dominant in the case of interleaved accesses since most of the shuffle operations are restricted to a single port. Further tuning, which may incorporate these considerations, will be done on top of the upcoming improved shuffle sequences (that is, along with the above-mentioned work to extend the X86InterleavedAccess pass).
  Differential Revision: https://reviews.llvm.org/D34023
  llvm-svn: 306238
* [PGO] Implementate profile counter regiser promotion | Xinliang David Li | 2017-06-25 | 3 | -0/+222
  Differential Revision: http://reviews.llvm.org/D34085
  llvm-svn: 306231
* fix trivial typos in comment, NFC | Hiroshi Inoue | 2017-06-24 | 1 | -1/+1
  llvm-svn: 306211
* fix trivial typos in comment, NFC | Hiroshi Inoue | 2017-06-24 | 1 | -2/+2
  dereferencable -> dereferenceable
  llvm-svn: 306210
* [SelectionDAG] set dereferenceable flag when expanding memcpy/memmove | Hiroshi Inoue | 2017-06-24 | 1 | -0/+74
  When SelectionDAG expands a memcpy (or memmove) call into a sequence of load and store instructions, it disregards the dereferenceable flag even if the source pointer is known to be dereferenceable. This results in an assertion failure if SelectionDAG commonizes a load instruction generated for memcpy with another load instruction for the source pointer.
  This patch makes SelectionDAG set the dereferenceable flag for the load instructions properly to avoid the assertion failure.
  Differential Revision: https://reviews.llvm.org/D34467
  llvm-svn: 306209
* Add missing %s to RUN line. | Rafael Espindola | 2017-06-24 | 1 | -1/+1
  llvm-svn: 306199
* Test the object file creation too. | Rafael Espindola | 2017-06-24 | 1 | -0/+10
  This should *really* be a llvm-mc test, but the parser is broken.
  See PR33579 for the parser bug.
  llvm-svn: 306198
* [InstCombine] Don't replace allocas with smaller globals | Vitaly Buka | 2017-06-24 | 1 | -0/+31
  Summary: InstCombine replaces large allocas with small constant globals, causing buffer overflows on valid code; see PR33372.
  This fix permits this optimization only if the global is dereferenceable for the alloca's size.
  Fixes PR33372
  Reviewers: eugenis, majnemer, chandlerc
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D34311
  llvm-svn: 306194
* Update constants in complex-return test to prevent reduction to smaller constants | Nirav Dave | 2017-06-24 | 1 | -1/+1
  llvm-svn: 306192
* [llvm-pdbutil] Dump raw bytes of module symbols and debug chunks. | Zachary Turner | 2017-06-23 | 1 | -0/+73
  llvm-svn: 306179
* ARM: move some logic from processFixupValue to applyFixup. | Rafael Espindola | 2017-06-23 | 1 | -9/+27
  processFixupValue is called on every relaxation iteration. applyFixup is only called once at the very end. applyFixup is then the correct place to do last-minute changes and value checks.
  While here, do proper range checks again for fixup_arm_thumb_bl. We used to do it, but dropped it because of thumb2. We now do it again, but use the thumb2 range.
  llvm-svn: 306177
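For reference, a range check of the kind described might look like the following. This is a hypothetical Python sketch, not the ARM backend's code; it assumes the Thumb-2 BL encoding, whose immediate is a signed 25-bit, halfword-aligned byte offset, giving a reach of -16777216 to +16777214 bytes.

```python
THUMB2_BL_MIN = -(1 << 24)     # -16777216 bytes
THUMB2_BL_MAX = (1 << 24) - 2  # +16777214 bytes

def thumb2_bl_offset_ok(offset):
    """Return True if a BL byte offset fits the (assumed) Thumb-2 range."""
    # Offsets must be halfword aligned and within the signed 25-bit range.
    return offset % 2 == 0 and THUMB2_BL_MIN <= offset <= THUMB2_BL_MAX

print(thumb2_bl_offset_ok(16777214))  # True
print(thumb2_bl_offset_ok(16777216))  # False: out of range
print(thumb2_bl_offset_ok(3))         # False: not halfword aligned
```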
* Reland r306095: [mips] Fix reg positions in the aui/daui instructions | Petar Jovanovic | 2017-06-23 | 6 | -9/+9
  After fixing a failing test in the lld test suite (r306173), reland r306095.
  Original commit message:
  [mips] Fix register positions in the aui/daui instructions
  Swapped the position of the rt and rs register in the aui/daui instructions for mips32r6 and mips64r6. With this change, the format of the generated instructions complies with specifications and GCC.
  Patch by Milos Stojanovic.
  llvm-svn: 306174
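To make the field swap concrete, here is a hypothetical Python encoder for `aui rt, rs, immediate` (rt = rs + (immediate << 16)). It assumes the MIPS32 Release 6 layout: major opcode 0b001111 (shared with LUI, which is AUI with rs = 0), `rs` in bits 25-21, `rt` in bits 20-16, and the immediate in bits 15-0. The assembly syntax lists rt first, while the encoding puts rs in the higher field, which is the kind of ordering that is easy to get backwards.

```python
AUI_OPCODE = 0b001111  # assumed R6 major opcode; LUI is AUI with rs = 0

def encode_aui(rt, rs, imm):
    """Encode 'aui rt, rs, imm' as a 32-bit word (assumed R6 field layout)."""
    assert 0 <= rt < 32 and 0 <= rs < 32 and 0 <= imm < (1 << 16)
    return (AUI_OPCODE << 26) | (rs << 21) | (rt << 16) | imm

print(hex(encode_aui(2, 0, 0x1234)))  # 0x3c021234, same word as 'lui $2, 0x1234'
print(hex(encode_aui(2, 3, 0x10)))    # 0x3c620010 for 'aui $2, $3, 0x10'
```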
* [llvm-readobj] Fix COFF RVA table dumping bug | Reid Kleckner | 2017-06-23 | 2 | -0/+6
  We would return an error in getVaPtr if the RVA table being dumped was the last data in the .rdata section. Avoid the issue by subtracting one from the offset and adding it back to get an open interval again.
  llvm-svn: 306171
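The off-by-one is easiest to see with half-open section ranges. In this hypothetical Python model (not the llvm-readobj code), a section maps RVAs in [start, start + size) to file offsets, so an end pointer that lands exactly on the section's end is rejected. Mapping the last byte of the table and adding one back, as the message describes, sidesteps that.

```python
class Section:
    def __init__(self, rva, size, file_offset):
        self.rva, self.size, self.file_offset = rva, size, file_offset

def rva_to_offset(sections, rva):
    """Map an RVA to a file offset, or raise if no section contains it."""
    for s in sections:
        if s.rva <= rva < s.rva + s.size:
            return s.file_offset + (rva - s.rva)
    raise ValueError(f"RVA {rva:#x} not in any section")

def table_bounds(sections, table_rva, table_size):
    """Return the [begin, end) file offsets of a non-empty RVA table."""
    begin = rva_to_offset(sections, table_rva)
    # Mapping table_rva + table_size directly fails when the table ends exactly
    # at the section's end, so map the last byte and add one back.
    end = rva_to_offset(sections, table_rva + table_size - 1) + 1
    return begin, end

rdata = Section(rva=0x2000, size=0x100, file_offset=0x800)
print([hex(x) for x in table_bounds([rdata], 0x20F0, 0x10)])  # ['0x8f0', '0x900']
```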
* [llvm-pdbutil] Dump raw bytes of type and id records. | Zachary Turner | 2017-06-23 | 1 | -0/+27
  llvm-svn: 306167
* [llvm-pdbutil] Dump raw bytes of various DBI stream subsections. | Zachary Turner | 2017-06-23 | 1 | -0/+59
  llvm-svn: 306160
* [MSP430] Fix data layout string. | Vadzim Dambrouski | 2017-06-23 | 2 | -1/+58
  Summary: Without this patch some types have incorrect size and/or alignment according to the MSP430 EABI.
  Reviewers: asl, awygle
  Reviewed By: asl
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D34561
  llvm-svn: 306159
* Add bitcast store-merge test. | Nirav Dave | 2017-06-23 | 1 | -2/+29
  llvm-svn: 306158
* [llvm-pdbutil] Show what blocks a stream occupies. | Zachary Turner | 2017-06-23 | 1 | -17/+34
  This is useful when you want to look at a specific chunk of a stream or look for discontinuities, and you need to know the list of blocks occupied by a stream.
  llvm-svn: 306150
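In an MSF/PDB file, a stream's data is stored in fixed-size blocks listed in the stream's block map, so the byte at stream offset `o` lives in block `blocks[o // block_size]`. The hypothetical Python helpers below (not llvm-pdbutil code) translate a stream offset into an absolute file offset and report where consecutive blocks are not contiguous in the file.

```python
def stream_offset_to_file_offset(blocks, block_size, offset):
    """Translate an offset within a stream to an absolute file offset."""
    block_index = blocks[offset // block_size]
    return block_index * block_size + offset % block_size

def discontinuities(blocks):
    """Return positions i where blocks[i] does not directly follow blocks[i-1]."""
    return [i for i in range(1, len(blocks)) if blocks[i] != blocks[i - 1] + 1]

blocks = [5, 6, 7, 12, 13]  # hypothetical block list for one stream
print(stream_offset_to_file_offset(blocks, 4096, 3 * 4096 + 10))  # 49162
print(discontinuities(blocks))                                    # [3]
```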
* [llvm-pdbutil] Dump raw bytes of pdb name map. | Zachary Turner | 2017-06-23 | 1 | -0/+10
  This patch dumps the raw bytes of the pdb name map which contains the mapping of stream name to stream index for the string table and other reserved streams.
  llvm-svn: 306148
* [llvm-pdbutil] Add the ability to dump raw bytes from the file. | Zachary Turner | 2017-06-23 | 2 | -6/+21
  Normally we can only make sense of the content of a PDB in terms of streams and blocks, but in some cases it may be useful to dump bytes at a specific absolute file offset. For example, you may know that some interesting data is at a particular location and want to see the surrounding data.
  llvm-svn: 306146
* Revert "[Hexagon] Handle decreasing of stack alignment in frame lowering" | Krzysztof Parzyszek | 2017-06-23 | 1 | -51/+0
  This breaks passing of aligned function arguments.
  llvm-svn: 306145
* [AArch64] Prefer Bcc to CBZ/CBNZ/TBZ/TBNZ when NZCV flags can be set for "free". | Chad Rosier | 2017-06-23 | 7 | -20/+189
  This patch contains a pass that transforms CBZ/CBNZ/TBZ/TBNZ instructions into a conditional branch (Bcc), when the NZCV flags can be set for "free". This is preferred on targets that have more flexibility when scheduling Bcc instructions as compared to CBZ/CBNZ/TBZ/TBNZ (assuming all other variables are equal). This can reduce register pressure and is also the default behavior for GCC.
  A few examples:
    add w8, w0, w1       -> cmn w0, w1        ; CMN is an alias of ADDS.
    cbz w8, .LBB_2       -> b.eq .LBB0_2      ; single def/use of w8 removed.
    add w8, w0, w1       -> adds w8, w0, w1   ; w8 has multiple uses.
    cbz w8, .LBB1_2      -> b.eq .LBB1_2
    sub w8, w0, w1       -> subs w8, w0, w1   ; w8 has multiple uses.
    tbz w8, #31, .LBB6_2 -> b.ge .LBB6_2
  In looking at all current sub-target machine descriptions, this transformation appears to be either positive or neutral.
  Differential Revision: https://reviews.llvm.org/D34220.
  llvm-svn: 306144
OpenPOWER on IntegriCloud