path: root/llvm/test/CodeGen/X86/2012-04-26-sdglue.ll
* [X86] Make getZeroVector return floating point vectors in their native type on SSE2 and later. (Craig Topper, 2019-09-08; 1 file, -2/+2)

  isel used to require zero vectors to be canonicalized to a single type to
  minimize the number of patterns needed to match. This is no longer required.
  I plan to do this to integers too, but floating point was simpler to start
  with. Integer has a complication where v32i16/v64i8 aren't legal when the
  other 512-bit integer types are.

  llvm-svn: 371325
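  For illustration, a minimal sketch (not from the commit) of the kind of
  zero-vector test this affects; the CHECK line assumes plain SSE2 selection:

    define <4 x float> @zero_v4f32() {
    ; CHECK-LABEL: zero_v4f32:
    ; CHECK: xorps %xmm0, %xmm0
      ret <4 x float> zeroinitializer
    }

  The generated instruction is unchanged; what the commit changes is that the
  zero build_vector is now typed v4f32 directly instead of being bitcast from
  a canonical integer zero vector.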
* [DAGCombiner] narrow vector binops when extraction is cheap (Sanjay Patel, 2018-10-30; 1 file, -12/+8)

  Narrowing vector binops came up in the demanded bits discussion in D52912.
  I don't think we're going to be able to do this transform in IR as a
  canonicalization because of the risk of creating unsupported widths for
  vector ops, but we already have a DAG TLI hook to allow what I was hoping
  for: isExtractSubvectorCheap(). This is currently enabled for x86, ARM,
  and AArch64 (although only x86 has existing regression test diffs).

  This is artificially limited to not look through bitcasts because there
  are so many test diffs already, but that's marked with a TODO and is a
  small follow-up.

  Differential Revision: https://reviews.llvm.org/D53784

  llvm-svn: 345602
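  A hedged sketch of the transform in IR terms (illustrative only; the actual
  combine runs on the SelectionDAG): when isExtractSubvectorCheap() says the
  extract is free, the extract is hoisted above the binop so the operation
  runs at the narrower width.

    ; before: wide add, then extract the low half
    %w = add <8 x i32> %a, %b
    %r = shufflevector <8 x i32> %w, <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>

    ; after: extract first, then add at the narrower width
    %a.lo = shufflevector <8 x i32> %a, <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
    %b.lo = shufflevector <8 x i32> %b, <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
    %r = add <4 x i32> %a.lo, %b.lo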
* Remove various uses of undef in the X86 test suite, as patterns involving undef can collapse them. NFC (Amaury Sechet, 2018-06-04; 1 file, -17/+18)

  llvm-svn: 333961
* Revert "Remove various use of undef in the X86 test suite as patern ↵Amaury Sechet2018-06-041-18/+17
| | | | | | | | involving undef can collapse them. NFC" This reverts commit f0e85c194ae5e87476bc767304470dec85b6774f. llvm-svn: 333953
* Remove various uses of undef in the X86 test suite, as patterns involving undef can collapse them. NFC (Amaury Sechet, 2018-06-04; 1 file, -17/+18)

  llvm-svn: 333950
* [x86] preserve test intent by removing undef (Sanjay Patel, 2018-05-17; 1 file, -8/+7)

  We need to clean up the DAG floating-point undef logic. This process is
  similar to how we handled integer undef logic in
  https://reviews.llvm.org/D43141. And as we did there, I'm trying to reduce
  the patch by changing tests that would probably become meaningless once we
  correct FP undef folding.

  llvm-svn: 332640
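  An illustrative sketch (assumed, not from the commit) of the kind of edit
  this implies: replace an undef operand with a real value so the operation
  survives once FP undef folding is corrected.

    ; fragile: the whole fadd may fold away once undef propagation is fixed
    %r0 = fadd <4 x float> %x, undef

    ; intent-preserving: use a defined operand
    %r1 = fadd <4 x float> %x, %y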
* [CodeGen] Unify MBB reference format in both MIR and debug output (Francis Visoiu Mistrih, 2017-12-04; 1 file, -1/+1)

  As part of the unification of the debug format and the MIR format, print
  MBB references as '%bb.5'. The MIR printer prints the IR name of a MBB
  only for block definitions.

  * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)->getNumber\(\)/" << printMBBReference(*\1)/g'
  * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)\.getNumber\(\)/" << printMBBReference(\1)/g'
  * find . \( -name "*.txt" -o -name "*.s" -o -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#([0-9]+)/%bb.\1/g'
  * grep -nr 'BB#' and fix

  Differential Revision: https://reviews.llvm.org/D40422

  llvm-svn: 319665
* [X86] SET0 to use XMM registers where possible PR26018 PR32862 (Dinar Temirbulatov, 2017-07-27; 1 file, -1/+2)

  Differential Revision: https://reviews.llvm.org/D35839

  llvm-svn: 309298
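  A hedged sketch of the effect (assumed output, not from the commit): under
  VEX encoding, writing an XMM register zeroes the upper bits of the
  containing YMM/ZMM register, so a 128-bit xor is enough to materialize a
  wider zero and has a shorter encoding.

    define <8 x float> @zero_v8f32() {
    ; CHECK-LABEL: zero_v8f32:
    ; CHECK: vxorps %xmm0, %xmm0, %xmm0
      ret <8 x float> zeroinitializer
    }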
* [X86][SSE] Added support for combining target shuffles to (V)PSHUFD/VPERMILPD/VPERMILPS immediate permutes (Simon Pilgrim, 2016-06-28; 1 file, -3/+3)

  This patch allows target shuffles to be combined to single input immediate
  permute instructions - (V)PSHUFD/VPERMILPD/VPERMILPS - allowing more
  general pattern matching than what we currently do and improves the
  likelihood of memory folding compared to existing patterns, which tend to
  reuse the input in multiple arguments.

  Further permute instructions (V)PSHUFLW/(V)PSHUFHW/(V)PERMQ/(V)PERMPD may
  be added in the future, but it's proven tricky to create test cases for
  them so far. (V)PSHUFLW/(V)PSHUFHW is already handled quite well in
  combineTargetShuffle, so it may be that removing some of that code may
  allow us to perform more of the combining in one place without
  duplication.

  Differential Revision: http://reviews.llvm.org/D21148

  llvm-svn: 273999
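  An illustrative sketch (assumed CHECK output, not from the commit): a
  single-input shuffle whose mask fits in an 8-bit immediate can be matched
  to one immediate permute.

    define <4 x float> @reverse(<4 x float> %x) {
    ; CHECK-LABEL: reverse:
    ; CHECK: vpermilps $27, %xmm0, %xmm0
      %r = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
      ret <4 x float> %r
    }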
* [X86] Regenerated sdglue test checks (Simon Pilgrim, 2016-04-11; 1 file, -12/+24)

  llvm-svn: 265927
* [X86, AVX] use blends instead of insert128 with index 0 (Sanjay Patel, 2015-03-19; 1 file, -2/+2)

  Another case of x86-specific shuffle strength reduction: avoid generating
  insert*128 instructions with index 0 because they are slower than their
  non-lane-changing blend equivalents.

  Shuffle lowering already catches most of these cases, but the zero vector
  case and some other paths such as in the modified test in
  vector-shuffle-256-v32.ll were getting through.

  Differential Revision: http://reviews.llvm.org/D8366

  llvm-svn: 232773
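  A hedged sketch of the pattern in question (illustrative IR and assumed
  instruction choices, not from the commit): inserting a 128-bit value into
  lane 0 of a 256-bit vector can be done with a blend, which avoids the
  slower lane-crossing insert.

    ; slower:  vinsertf128 $0, %xmm1, %ymm0, %ymm0
    ; faster:  vblendps $15, %ymm1, %ymm0, %ymm0
    define <8 x float> @insert_lo(<8 x float> %a, <4 x float> %b) {
      %wide = shufflevector <4 x float> %b, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
      %r = shufflevector <8 x float> %wide, <8 x float> %a, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>
      ret <8 x float> %r
    }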
* [opaque pointer type] Add textual IR support for explicit type parameter to load instruction (David Blaikie, 2015-02-27; 1 file, -2/+2)

  Essentially the same as the GEP change in r230786.

  A similar migration script can be used to update test cases, though a few
  more test case improvements/changes were required this time around
  (r229269-r229278):

    import fileinput
    import sys
    import re

    pat = re.compile(r"((?:=|:|^)\s*load (?:atomic )?(?:volatile )?(.*?))(| addrspace\(\d+\) *)\*($| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$)")

    for line in sys.stdin:
        sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line))

  Reviewers: rafael, dexonsmith, grosser

  Differential Revision: http://reviews.llvm.org/D7649

  llvm-svn: 230794
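  The syntax change itself, sketched for a scalar load:

    ; old textual IR: the result type is implied by the pointer operand
    %v = load i32* %p
    ; new textual IR: the result type is an explicit first parameter
    %v = load i32, i32* %p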
* [opaque pointer type] Add textual IR support for explicit type parameter to getelementptr instruction (David Blaikie, 2015-02-27; 1 file, -1/+1)

  One of several parallel first steps to remove the target type of pointers,
  replacing them with a single opaque pointer type. This adds an explicit
  type parameter to the gep instruction so that when the first parameter
  becomes an opaque pointer type, the type to gep through is still available
  to the instructions.

  * This doesn't modify gep operators, only instructions (operators will be
    handled separately).

  * Textual IR changes only. Bitcode (including upgrade) and changing the
    in-memory representation will be in separate changes.

  * geps of vectors are transformed as:
      getelementptr <4 x float*> %x, ...
      -> getelementptr float, <4 x float*> %x, ...
    Then, once the opaque pointer type is introduced, this will ultimately
    look like:
      getelementptr float, <4 x ptr> %x
    with the unambiguous interpretation that it is a vector of pointers to
    float.

  * address spaces remain on the pointer, not the type:
      getelementptr float addrspace(1)* %x
      -> getelementptr float, float addrspace(1)* %x
    Then, eventually:
      getelementptr float, ptr addrspace(1) %x

  Importantly, the massive amount of test case churn has been automated by
  the same crappy python code. I had to manually update a few test cases
  that wouldn't fit the script's model (r228970, r229196, r229197, r229198).
  The python script just massages stdin and writes the result to stdout; I
  then wrapped that in a shell script to handle replacing files, then used
  the usual find+xargs to migrate all the files.

  update.py:

    import fileinput
    import sys
    import re

    ibrep = re.compile(r"(^.*?[^%\w]getelementptr inbounds )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")
    normrep = re.compile(r"(^.*?[^%\w]getelementptr )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")

    def conv(match, line):
        if not match:
            return line
        line = match.groups()[0]
        if len(match.groups()[5]) == 0:
            line += match.groups()[2]
        line += match.groups()[3]
        line += ", "
        line += match.groups()[1]
        line += "\n"
        return line

    for line in sys.stdin:
        if line.find("getelementptr ") == line.find("getelementptr inbounds"):
            if line.find("getelementptr inbounds") != line.find("getelementptr inbounds ("):
                line = conv(re.match(ibrep, line), line)
        elif line.find("getelementptr ") != line.find("getelementptr ("):
            line = conv(re.match(normrep, line), line)
        sys.stdout.write(line)

  apply.sh:

    for name in "$@"
    do
      python3 `dirname "$0"`/update.py < "$name" > "$name.tmp" && mv "$name.tmp" "$name"
      rm -f "$name.tmp"
    done

  The actual commands:

    From llvm/src:
      find test/ -name *.ll | xargs ./apply.sh
    From llvm/src/tools/clang:
      find test/ -name *.mm -o -name *.m -o -name *.cpp -o -name *.c | xargs -I '{}' ../../apply.sh "{}"
    From llvm/src/tools/polly:
      find test/ -name *.ll | xargs ./apply.sh

  After that, check-all (with llvm, clang, clang-tools-extra, lld,
  compiler-rt, and polly all checked out).

  The extra 'rm' in the apply.sh script is due to a few files in clang's
  test suite using interesting unicode stuff that my python script was
  throwing exceptions on. None of those files needed to be migrated, so it
  seemed sufficient to ignore those cases.

  Reviewers: rafael, dexonsmith, grosser

  Differential Revision: http://reviews.llvm.org/D7636

  llvm-svn: 230786
* [x86] Enable the new vector shuffle lowering by default. (Chandler Carruth, 2014-10-04; 1 file, -1/+1)

  Update the entire regression test suite for the new shuffles. Remove most
  of the old testing, which was devoted to the old shuffle lowering path and
  is no longer really relevant. Also remove a few other random tests that
  only really exercised shuffles, and only incidentally or without any
  interesting aspects to them.

  Benchmarking that I have done shows a few small regressions with this on
  LNT, zero measurable regressions on real, large applications, and for
  several benchmarks where the loop vectorizer fires in the hot path it
  shows 5% to 40% improvements for SSE2 and SSE3 code running on Sandy
  Bridge machines. Running on AMD machines shows even more dramatic
  improvements. When using newer ISA vector extensions the gains are much
  more modest, but the code is still better on the whole. There are a few
  regressions being tracked (PR21137, PR21138, PR21139) but by and large
  this is expected to be a win for x86 generated code performance.

  It is also more correct than the code it replaces. I have fuzz tested this
  extensively with ISA extensions up through AVX2 and found no crashes or
  miscompiles (yet...). The old lowering had a few miscompiles and crashers
  after a somewhat smaller amount of fuzz testing.

  There is one significant area where the new code path lags behind and that
  is in AVX-512 support. However, there was *extremely little* support for
  that already, so this isn't a significant step backwards, and the new
  framework will probably make it easier to implement lowering that uses the
  full power of AVX-512's table-based shuffle+blend (IMO).

  Many thanks to Quentin, Andrea, Robert, and others for benchmarking
  assistance. Thanks to Adam and others for help with AVX-512. Thanks to
  Hal, Eric, and *many* others for answering my incessant questions about
  how the backend actually works. =]

  I will leave the old code path in the tree until the 3 PRs above are at
  least resolved to folks' satisfaction. Then I will rip it (and 1000s of
  lines of code) out. =] I don't expect this flag to stay around for very
  long. It may not survive next week.

  llvm-svn: 219046
* Enable MI Sched for x86. (Andrew Trick, 2013-10-15; 1 file, -2/+2)

  This changes the SelectionDAG scheduling preference to source order. Soon,
  the SelectionDAG scheduler can be bypassed, saving a nice chunk of compile
  time.

  Performance differences that result from this change are often a
  consequence of register coalescing. The register coalescer is far from
  perfect. Bugs can be filed for deficiencies.

  On x86 SandyBridge/Haswell, the source order schedule is often preserved,
  particularly for small blocks. Register pressure is generally improved
  over the SD scheduler's ILP mode. However, we are still able to handle
  large blocks that require latency hiding, unlike the SD scheduler's BURR
  mode. MI scheduler also attempts to discover the critical path in
  single-block loops and adjust heuristics accordingly.

  The MI scheduler relies on the new machine model. This is currently
  unimplemented for AVX, so we may not be generating the best code yet.

  Unit tests are updated so they don't depend on SD scheduling heuristics.

  llvm-svn: 192750
* Mass update to CodeGen tests to use CHECK-LABEL for labels corresponding to function definitions for more informative error messages. (Stephen Lin, 2013-07-14; 1 file, -1/+1)

  No functionality change and all updated tests passed locally.

  This update was done with the following bash script:

    find test/CodeGen -name "*.ll" | \
    while read NAME; do
      echo "$NAME"
      if ! grep -q "^; *RUN: *llc.*debug" $NAME; then
        TEMP=`mktemp -t temp`
        cp $NAME $TEMP
        sed -n "s/^define [^@]*@\([A-Za-z0-9_]*\)(.*$/\1/p" < $NAME | \
        while read FUNC; do
          sed -i '' "s/;\(.*\)\([A-Za-z0-9_-]*\):\( *\)$FUNC: *\$/;\1\2-LABEL:\3$FUNC:/g" $TEMP
        done
        sed -i '' "s/;\(.*\)-LABEL-LABEL:/;\1-LABEL:/" $TEMP
        sed -i '' "s/;\(.*\)-NEXT-LABEL:/;\1-NEXT:/" $TEMP
        sed -i '' "s/;\(.*\)-NOT-LABEL:/;\1-NOT:/" $TEMP
        sed -i '' "s/;\(.*\)-DAG-LABEL:/;\1-DAG:/" $TEMP
        mv $TEMP $NAME
      fi
    done

  llvm-svn: 186280
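  An illustrative sketch (not from the commit) of the resulting check style:
  CHECK-LABEL makes FileCheck re-anchor at each function label, so a failure
  is reported against the right function instead of bleeding into the next
  one.

    define i32 @inc(i32 %x) {
    ; CHECK-LABEL: inc:
    ; CHECK: addl
      %r = add i32 %x, 1
      ret i32 %r
    }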
* Revert "Temporarily enable MI-Sched on X86."Andrew Trick2013-06-251-2/+2
| | | | | | This reverts commit 98a9b72e8c56dc13a2617de84503a3d78352789c. llvm-svn: 184823
* Temporarily enable MI-Sched on X86. (Andrew Trick, 2013-06-24; 1 file, -2/+2)

  Sorry for the unit test churn. I'll try to make the change permanent next
  time.

  llvm-svn: 184705
* Merge AVX_SET0PSY/AVX_SET0PDY/AVX2_SET0 into a single post-RA pseudo. (Craig Topper, 2012-08-28; 1 file, -1/+1)

  llvm-svn: 162738
* Fix a regression from r147481. This combine should only happen if there is a single use. (Chad Rosier, 2012-05-07; 1 file, -1/+2)

  rdar://11360370

  llvm-svn: 156316
* Reapply r155668: Fix the SD scheduler to avoid gluing the same node twice. (Andrew Trick, 2012-04-28; 1 file, -0/+46)

  This time, also fix the caller of AddGlue to properly handle incomplete
  chains. AddGlue had failure modes, but shamefully hid them from its
  caller. Its luck ran out.

  Fixes rdar://11314175: BuildSchedUnits assert.

  llvm-svn: 155749
* Temporarily revert r155668: Fix the SD scheduler to avoid gluing. (Andrew Trick, 2012-04-27; 1 file, -46/+0)

  This definitely caused a regression with ARM -mno-thumb.

  llvm-svn: 155743
* Fix the SD scheduler to avoid gluing the same node twice. (Andrew Trick, 2012-04-26; 1 file, -0/+46)

  DAGCombine strangeness may result in multiple loads from the same offset,
  and both may try to glue themselves to another load. We could insist that
  the redundant loads glue themselves to each other, but the better fix is
  to bail out from bad gluing at the time we detect it.

  Fixes rdar://11314175: BuildSchedUnits assert.

  llvm-svn: 155668