summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/ARM/reg_sequence.ll
Commit message (Collapse)AuthorAgeFilesLines
* [ARM] tighten test checks; NFCSanjay Patel2019-04-171-12/+12
| | | | llvm-svn: 358594
* [ARM] make test checks more thorough; NFCSanjay Patel2019-04-171-4/+16
| | | | | | | | | | This will change with the proposal in D60214. Unfortunately, the triple is not supported for auto-generation via script, and the multiple RUN lines have diffs on this test, but I can't tell exactly what is required by this test. PR7162 was an assert/crash, so hopefully, this is good enough. llvm-svn: 358587
* [DAGCombiner] loosen restrictions for moving shuffles after vector binopSanjay Patel2019-04-031-1/+1
| | | | | | | | | | | | There are 3 changes to make this correspond to the same transform in instcombine: 1. Remove the legality check - we can't create anything less legal than we started with. 2. Ease the use restriction, so we only bail out if both operands have >1 use. 3. Ease the use restriction for binops with a repeated operand (eg, mul x, x). As discussed in D60150, there's a scalarization opportunity that will be made easier by allowing this transform more generally. llvm-svn: 357580
* [ARM] preserve test intent by removing undefSanjay Patel2018-05-171-4/+4
| | | | | | | | | | | | | | | | We need to clean up the DAG floating-point undef logic. This process is similar to how we handled integer undef logic in https://reviews.llvm.org/D43141. And as we did there, I'm trying to reduce the patch by changing tests that would probably become meaningless once we correct FP undef folding. Follow-up to: https://reviews.llvm.org/rL332538 ...because that change wasn't enough. llvm-svn: 332637
* [ARM] preserve test intent by removing undefSanjay Patel2018-05-161-8/+8
| | | | | | | | | | | | We need to clean up the DAG floating-point undef logic. This process is similar to how we handled integer undef logic in D43141. And as we did there, I'm trying to reduce the patch by changing tests that would probably become meaningless once we correct FP undef folding. llvm-svn: 332538
* [ARM][NEON] Use address space in vld([1234]|[234]lane) and ↵Jeroen Ketema2015-09-301-32/+32
| | | | | | | | | | | | | | | | | | | | | vst([1234]|[234]lane) instructions This commit changes the interface of the vld[1234], vld[234]lane, and vst[1234], vst[234]lane ARM neon intrinsics and associates an address space with the pointer that these intrinsics take. This changes, e.g., <2 x i32> @llvm.arm.neon.vld1.v2i32(i8*, i32) to <2 x i32> @llvm.arm.neon.vld1.v2i32.p0i8(i8*, i32) This change ensures that address spaces are fully taken into account in the ARM target during lowering of interleaved loads and stores. Differential Revision: http://reviews.llvm.org/D12985 llvm-svn: 248887
* [opaque pointer type] Add textual IR support for explicit type parameter to ↵David Blaikie2015-02-271-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | load instruction Essentially the same as the GEP change in r230786. A similar migration script can be used to update test cases, though a few more test case improvements/changes were required this time around: (r229269-r229278) import fileinput import sys import re pat = re.compile(r"((?:=|:|^)\s*load (?:atomic )?(?:volatile )?(.*?))(| addrspace\(\d+\) *)\*($| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$)") for line in sys.stdin: sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line)) Reviewers: rafael, dexonsmith, grosser Differential Revision: http://reviews.llvm.org/D7649 llvm-svn: 230794
* [opaque pointer type] Add textual IR support for explicit type parameter to ↵David Blaikie2015-02-271-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | getelementptr instruction One of several parallel first steps to remove the target type of pointers, replacing them with a single opaque pointer type. This adds an explicit type parameter to the gep instruction so that when the first parameter becomes an opaque pointer type, the type to gep through is still available to the instructions. * This doesn't modify gep operators, only instructions (operators will be handled separately) * Textual IR changes only. Bitcode (including upgrade) and changing the in-memory representation will be in separate changes. * geps of vectors are transformed as: getelementptr <4 x float*> %x, ... ->getelementptr float, <4 x float*> %x, ... Then, once the opaque pointer type is introduced, this will ultimately look like: getelementptr float, <4 x ptr> %x with the unambiguous interpretation that it is a vector of pointers to float. * address spaces remain on the pointer, not the type: getelementptr float addrspace(1)* %x ->getelementptr float, float addrspace(1)* %x Then, eventually: getelementptr float, ptr addrspace(1) %x Importantly, the massive amount of test case churn has been automated by same crappy python code. I had to manually update a few test cases that wouldn't fit the script's model (r228970,r229196,r229197,r229198). The python script just massages stdin and writes the result to stdout, I then wrapped that in a shell script to handle replacing files, then using the usual find+xargs to migrate all the files. update.py: import fileinput import sys import re ibrep = re.compile(r"(^.*?[^%\w]getelementptr inbounds )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))") normrep = re.compile( r"(^.*?[^%\w]getelementptr )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))") def conv(match, line): if not match: return line line = match.groups()[0] if len(match.groups()[5]) == 0: line += match.groups()[2] line += match.groups()[3] line += ", " line += match.groups()[1] line += "\n" return line for line in sys.stdin: if line.find("getelementptr ") == line.find("getelementptr inbounds"): if line.find("getelementptr inbounds") != line.find("getelementptr inbounds ("): line = conv(re.match(ibrep, line), line) elif line.find("getelementptr ") != line.find("getelementptr ("): line = conv(re.match(normrep, line), line) sys.stdout.write(line) apply.sh: for name in "$@" do python3 `dirname "$0"`/update.py < "$name" > "$name.tmp" && mv "$name.tmp" "$name" rm -f "$name.tmp" done The actual commands: From llvm/src: find test/ -name *.ll | xargs ./apply.sh From llvm/src/tools/clang: find test/ -name *.mm -o -name *.m -o -name *.cpp -o -name *.c | xargs -I '{}' ../../apply.sh "{}" From llvm/src/tools/polly: find test/ -name *.ll | xargs ./apply.sh After that, check-all (with llvm, clang, clang-tools-extra, lld, compiler-rt, and polly all checked out). The extra 'rm' in the apply.sh script is due to a few files in clang's test suite using interesting unicode stuff that my python script was throwing exceptions on. None of those files needed to be migrated, so it seemed sufficient to ignore those cases. Reviewers: rafael, dexonsmith, grosser Differential Revision: http://reviews.llvm.org/D7636 llvm-svn: 230786
* ARM & AArch64: make use of common cmpxchg idioms after expansionTim Northover2014-05-301-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | The C and C++ semantics for compare_exchange require it to return a bool indicating success. This gets mapped to LLVM IR which follows each cmpxchg with an icmp of the value loaded against the desired value. When lowered to ldxr/stxr loops, this extra comparison is redundant: its results are implicit in the control-flow of the function. This commit makes two changes: it replaces that icmp with appropriate PHI nodes, and then makes sure earlyCSE is called after expansion to actually make use of the opportunities revealed. I've also added -{arm,aarch64}-enable-atomic-tidy options, so that existing fragile tests aren't perturbed too much by the change. Many of them either rely on undef/unreachable too pervasively to be restored to something well-defined (particularly while making sure they test the same obscure assert from many years ago), or depend on a particular CFG shape, which is disrupted by SimplifyCFG. rdar://problem/16227836 llvm-svn: 209883
* ARM: use LLVM IR to represent the vshrn operationTim Northover2014-02-101-3/+5
| | | | | | | | | | vshrn is just the combination of a right shift and a truncate (and the limits on the immediate value actually mean the signedness of the shift doesn't matter). Using that representation allows us to get rid of an ARM-specific intrinsic, share more code with AArch64 and hopefully get better code out of the mid-end optimisers. llvm-svn: 201085
* Revert "Tests: Be less dependent on a specific schedule/regalloc"Matthias Braun2013-10-111-2/+2
| | | | | | | | | This reverts r192454 Apparently FileCheck isn't as smart as I though and does not enforce a topological order between variable defs+uses. llvm-svn: 192472
* Tests: Be less dependent on a specific schedule/regallocMatthias Braun2013-10-111-2/+2
| | | | llvm-svn: 192454
* ARM: implement some simple f64 materializations.Tim Northover2013-08-201-3/+2
| | | | | | | | Previously we used a const-pool load for virtually all 64-bit floating values. Actually, we can get quite a few common values (including 0.0, 1.0) via "vmov" instructions of one stripe or another. llvm-svn: 188773
* Mass update to CodeGen tests to use CHECK-LABEL for labels corresponding to ↵Stephen Lin2013-07-141-10/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | function definitions for more informative error messages. No functionality change and all updated tests passed locally. This update was done with the following bash script: find test/CodeGen -name "*.ll" | \ while read NAME; do echo "$NAME" if ! grep -q "^; *RUN: *llc.*debug" $NAME; then TEMP=`mktemp -t temp` cp $NAME $TEMP sed -n "s/^define [^@]*@\([A-Za-z0-9_]*\)(.*$/\1/p" < $NAME | \ while read FUNC; do sed -i '' "s/;\(.*\)\([A-Za-z0-9_-]*\):\( *\)$FUNC: *\$/;\1\2-LABEL:\3$FUNC:/g" $TEMP done sed -i '' "s/;\(.*\)-LABEL-LABEL:/;\1-LABEL:/" $TEMP sed -i '' "s/;\(.*\)-NEXT-LABEL:/;\1-NEXT:/" $TEMP sed -i '' "s/;\(.*\)-NOT-LABEL:/;\1-NOT:/" $TEMP sed -i '' "s/;\(.*\)-DAG-LABEL:/;\1-DAG:/" $TEMP mv $TEMP $NAME fi done llvm-svn: 186280
* Make ARMAsmPrinter generate the correct alignment specifier syntax in ↵Kristof Beyls2013-02-221-2/+2
| | | | | | | | | instructions. The Printer will now print instructions with the correct alignment specifier syntax, like vld1.8 {d16}, [r0:64] llvm-svn: 175884
* Enable the new coalescer algorithm by default.Jakob Stoklund Olesen2012-09-271-1/+0
| | | | | | | The new coalescer is better at merging values into unused vector lanes, improving NEON code. llvm-svn: 164794
* Try to make these tests more portable.Evan Cheng2012-09-201-2/+2
| | | | llvm-svn: 164320
* Use vld1 / vst2 for unaligned v2f64 load / store. e.g. Use vld1.16 for 2-byteEvan Cheng2012-09-181-3/+3
| | | | | | | | | aligned address. Based on patch by David Peixotto. Also use vld1.64 / vst1.64 with 128-bit alignment to take advantage of alignment hints. rdar://12090772, rdar://12238782 llvm-svn: 164089
* This commit contains a few changes that had to go in together.Nadav Rotem2012-04-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1. Simplify xor/and/or (bitcast(A), bitcast(B)) -> bitcast(op (A,B)) (and also scalar_to_vector). 2. Xor/and/or are indifferent to the swizzle operation (shuffle of one src). Simplify xor/and/or (shuff(A), shuff(B)) -> shuff(op (A, B)) 3. Optimize swizzles of shuffles: shuff(shuff(x, y), undef) -> shuff(x, y). 4. Fix an X86ISelLowering optimization which was very bitcast-sensitive. Code which was previously compiled to this: movd (%rsi), %xmm0 movdqa .LCPI0_0(%rip), %xmm2 pshufb %xmm2, %xmm0 movd (%rdi), %xmm1 pshufb %xmm2, %xmm1 pxor %xmm0, %xmm1 pshufb .LCPI0_1(%rip), %xmm1 movd %xmm1, (%rdi) ret Now compiles to this: movl (%rsi), %eax xorl %eax, (%rdi) ret llvm-svn: 153848
* ARM VLDR/VSTR instructions don't need a size suffix.Jim Grosbach2011-11-141-2/+2
| | | | | | | Canonicallize on the non-suffixed form, but continue to accept assembly that has any correctly sized type suffix. llvm-svn: 144583
* Simplify some uses of utohexstr.Benjamin Kramer2011-11-071-1/+1
| | | | | | As a side effect hex is printed lowercase instead of uppercase now. llvm-svn: 144013
* Remove VMOVDneon and VMOVQ, which are just aliases for VORR. This continues ↵Owen Anderson2011-07-151-5/+5
| | | | | | to simplify the path towards an auto-generated disassembler. llvm-svn: 135290
* Fix ARM tests to be register allocator independent.Jakob Stoklund Olesen2011-03-311-10/+12
| | | | llvm-svn: 128680
* Making use of VFP / NEON floating point multiply-accumulate / subtraction isEvan Cheng2010-12-051-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | difficult on current ARM implementations for a few reasons. 1. Even though a single vmla has latency that is one cycle shorter than a pair of vmul + vadd, a RAW hazard during the first (4? on Cortex-a8) can cause additional pipeline stall. So it's frequently better to single codegen vmul + vadd. 2. A vmla folowed by a vmul, vmadd, or vsub causes the second fp instruction to stall for 4 cycles. We need to schedule them apart. 3. A vmla followed vmla is a special case. Obvious issuing back to back RAW vmla + vmla is very bad. But this isn't ideal either: vmul vadd vmla Instead, we want to expand the second vmla: vmla vmul vadd Even with the 4 cycle vmul stall, the second sequence is still 2 cycles faster. Up to now, isel simply avoid codegen'ing fp vmla / vmls. This works well enough but it isn't the optimial solution. This patch attempts to make it possible to use vmla / vmls in cases where it is profitable. A. Add missing isel predicates which cause vmla to be codegen'ed. B. Make sure the fmul in (fadd (fmul)) has a single use. We don't want to compute a fmul and a fmla. C. Add additional isel checks for vmla, avoid cases where vmla is feeding into fp instructions (except for the #3 exceptional case). D. Add ARM hazard recognizer to model the vmla / vmls hazards. E. Add a special pre-regalloc case to expand vmla / vmls when it's likely the vmla / vmls will trigger one of the special hazards. Work in progress, only A+B are enabled. llvm-svn: 120960
* Two sets of changes. Sorry they are intermingled.Evan Cheng2010-11-031-1/+0
| | | | | | | | | | | | | 1. Fix pre-ra scheduler so it doesn't try to push instructions above calls to "optimize for latency". Call instructions don't have the right latency and this is more likely to use introduce spills. 2. Fix if-converter cost function. For ARM, it should use instruction latencies, not # of micro-ops since multi-latency instructions is completely executed even when the predicate is false. Also, some instruction will be "slower" when they are predicated due to the register def becoming implicit input. rdar://8598427 llvm-svn: 118135
* putback r116983 and fix simple-fp-encoding.ll testsAndrew Trick2010-10-211-1/+2
| | | | llvm-svn: 116992
* Revert r116983, which is breaking all the buildbots.Owen Anderson2010-10-211-2/+1
| | | | llvm-svn: 116987
* Add missing scheduling itineraries for transfers between core registers and ↵Evan Cheng2010-10-211-1/+2
| | | | | | VFP registers. llvm-svn: 116983
* Correct some load / store instruction itinerary mistakes:Evan Cheng2010-10-091-1/+1
| | | | | | | | 1. Cortex-A8 load / store multiplies can only issue on ALU0. 2. Eliminate A8_Issue, A8_LSPipe will correctly limit the load / store issues. 3. Correctly model all vld1 and vld2 variants. llvm-svn: 116134
* Change register allocation order for ARM VFP and NEON registers to put theBob Wilson2010-10-081-17/+17
| | | | | | | | | | | | | | | | callee-saved registers at the end of the lists. Also prefer to avoid using the low registers that are in register subclasses required by certain instructions, so that those registers will more likely be available when needed. This change makes a huge improvement in spilling in some cases. Thanks to Jakob for helping me realize the problem. Most of this patch is fixing the testsuite. There are quite a few places where we're checking for specific registers. I changed those to wildcards in places where that doesn't weaken the tests. The spill-q.ll and thumb2-spill-q.ll tests stopped spilling with this change, so I added a bunch of live values to force spills on those tests. llvm-svn: 116055
* Convert VLD1 and VLD2 instructions to use pseudo-instructions untilBob Wilson2010-09-021-1/+1
| | | | | | after regalloc. llvm-svn: 112825
* Add alignment arguments to all the NEON load/store intrinsics.Bob Wilson2010-08-271-32/+33
| | | | | | | Update all the tests using those intrinsics and add support for auto-upgrading bitcode files with the old versions of the intrinsics. llvm-svn: 112271
* Replace some NEON vmovl intrinsic that I missed earlier.Bob Wilson2010-08-201-4/+2
| | | | llvm-svn: 111696
* Use a target-specific VMOVIMM DAG node instead of BUILD_VECTOR to representBob Wilson2010-07-131-1/+1
| | | | | | NEON VMOV-immediate instructions. This simplifies some things. llvm-svn: 108275
* Print "dregpair" NEON operands with a space between them, for readability andBob Wilson2010-07-091-2/+2
| | | | | | consistency with other instructions that have lists of register operands. llvm-svn: 107944
* Reenable DAG combining for vector shuffles. It looks like it was temporarilyBob Wilson2010-07-091-1/+0
| | | | | | | | disabled and then never turned back on again. Adjust some tests, one because this change avoids an unnecessary instruction, and the other to make it continue testing what it was intended to test. llvm-svn: 107941
* Eliminate the other half of the BRCOND optimization, and updateDan Gohman2010-06-241-2/+2
| | | | | | as many tests as possible. llvm-svn: 106749
* Remove arm_apcscc from the test files. It is the default and doing thisRafael Espindola2010-06-171-4/+4
| | | | | | matches what llvm-gcc and clang now produce. llvm-svn: 106221
* Fix some latency computation bugs: if the use is not a machine opcode do not ↵Evan Cheng2010-05-281-3/+4
| | | | | | just return zero. llvm-svn: 105061
* Change ARM scheduling default to list-hybrid if the target supports floating ↵Evan Cheng2010-05-211-6/+6
| | | | | | point instructions (and is not using soft float). llvm-svn: 104307
* TwoAddressInstructionPass doesn't really know how to merge live intervals whenJakob Stoklund Olesen2010-05-191-0/+17
| | | | | | | | lowering REG_SEQUENCE instructions. Insert copies for REG_SEQUENCE sources not killed to avoid breaking later passes. llvm-svn: 104146
* Fix PR7162: Use source register classes and sub-indices to determine the ↵Evan Cheng2010-05-181-0/+38
| | | | | | correct register class of the definitions of REG_SEQUENCE. llvm-svn: 104050
* Fix PR7175. Insert copies of a REG_SEQUENCE source if it is used by other ↵Evan Cheng2010-05-171-0/+35
| | | | | | REG_SEQUENCE instructions. llvm-svn: 103994
* Fix PR7156. If the sources of a REG_SEQUENCE are all IMPLICIT_DEF's. Replace ↵Evan Cheng2010-05-171-0/+46
| | | | | | it with an IMPLICIT_DEF rather than deleting it or else it would be left without a def. llvm-svn: 103984
* Careful with reg_sequence coalescing to not to overwrite sub-register indices.Evan Cheng2010-05-171-0/+42
| | | | llvm-svn: 103971
* Turn on -neon-reg-sequence by default.Evan Cheng2010-05-171-0/+170
Using NEON load / store multiple instructions will no longer create gobs of vmov of D registers! llvm-svn: 103960
OpenPOWER on IntegriCloud