summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/X86/MergeConsecutiveStores.ll
Commit message (Collapse)AuthorAgeFilesLines
* [X86] Use LivePhysRegs in X86FixupBWInsts.Ahmed Bougacha2016-04-261-6/+9
| | | | | | | | | Kill-flags, which computeRegisterLiveness uses, are not reliable. LivePhysRegs is. Differential Revision: http://reviews.llvm.org/D19472 llvm-svn: 267495
* [X86] More test updates to support fixup-byte-word-insts optimizationKevin B. Smith2016-02-221-3/+6
| | | | | | | either on or off. Differential Revisions: http://reviews.llvm.org/D17458 llvm-svn: 261505
* Fix two issues in MergeConsecutiveStores:James Y Knight2015-11-021-3/+7
| | | | | | | | | | | | | | | | | | | | | | 1) PR25154. This is basically a repeat of PR18102, which was fixed in r200201, and broken again by r234430. The latter changed which of the store nodes was merged into from the first to the last. Thus, we now also need to prefer merging a later store at a given address into the target node, instead of an earlier one. 2) While investigating that, I also realized I'd introduced a bug in r236850. There, I removed a check for alignment -- not realizing that nothing except the alignment check was ensuring that none of the stores were overlapping! This is a really bogus way to ensure there's no aliased stores. A better solution to both of these issues is likely to always use the code added in the 'if (UseAA)' branches which rearrange the chain based on a more principled analysis. I'll look into whether that can be used always, but in the interest of getting things back to working, I think a minimal change makes sense. llvm-svn: 251816
* merge vector stores into wider vector stores and fix AArch64 misaligned ↵Sanjay Patel2015-09-251-4/+2
| | | | | | | | | | | | | | | | | | | | | | access TLI hook (PR21711) This is a redo of D7208 ( r227242 - http://llvm.org/viewvc/llvm-project?view=revision&revision=227242 ). The patch was reverted because an AArch64 target could infinite loop after the change in DAGCombiner to merge vector stores. That happened because AArch64's allowsMisalignedMemoryAccesses() wasn't telling the truth. It reported all unaligned memory accesses as fast, but then split some 128-bit unaligned accesses up in performSTORECombine() because they are slow. This patch attempts to fix the problem in AArch's allowsMisalignedMemoryAccesses() while preserving existing (perhaps questionable) lowering behavior. The x86 test shows that store merging is working as intended for a target with fast 32-byte unaligned stores. Differential Revision: http://reviews.llvm.org/D12635 llvm-svn: 248622
* use CHECK-LABEL for more precisionSanjay Patel2015-09-011-7/+7
| | | | llvm-svn: 246547
* remove unnecessary/conflicting target infoSanjay Patel2015-09-011-3/+0
| | | | llvm-svn: 246514
* fixed test to specify triple rather than arch and CPUSanjay Patel2015-09-011-2/+2
| | | | llvm-svn: 246513
* Add some tests based on PR21711Sanjay Patel2015-06-161-0/+61
| | | | | | | | These were originally added in r227242, but that patch was reverted because it caused a failure on AArch64. llvm-svn: 239860
* Fix alignment checks in MergeConsecutiveStores.James Y Knight2015-05-081-8/+3
| | | | | | | | | | | | | | | 1) check whether the alignment of the memory is sufficient for the *merged* store or load to be efficient. Not doing so can result in some ridiculously poor code generation, if merging creates a vector operation which must be aligned but isn't. 2) DON'T check that the alignment of each load/store is equal. If you're merging 2 4-byte stores, the first *might* have 8-byte alignment, but the second certainly will have 4-byte alignment. We do want to allow those to be merged. llvm-svn: 236850
* [opaque pointer type] Add textual IR support for explicit type parameter to ↵David Blaikie2015-02-271-25/+25
| | | | | | | | | | | | | | | | | | | | | | | | load instruction Essentially the same as the GEP change in r230786. A similar migration script can be used to update test cases, though a few more test case improvements/changes were required this time around: (r229269-r229278) import fileinput import sys import re pat = re.compile(r"((?:=|:|^)\s*load (?:atomic )?(?:volatile )?(.*?))(| addrspace\(\d+\) *)\*($| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$)") for line in sys.stdin: sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line)) Reviewers: rafael, dexonsmith, grosser Differential Revision: http://reviews.llvm.org/D7649 llvm-svn: 230794
* [opaque pointer type] Add textual IR support for explicit type parameter to ↵David Blaikie2015-02-271-95/+95
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | getelementptr instruction One of several parallel first steps to remove the target type of pointers, replacing them with a single opaque pointer type. This adds an explicit type parameter to the gep instruction so that when the first parameter becomes an opaque pointer type, the type to gep through is still available to the instructions. * This doesn't modify gep operators, only instructions (operators will be handled separately) * Textual IR changes only. Bitcode (including upgrade) and changing the in-memory representation will be in separate changes. * geps of vectors are transformed as: getelementptr <4 x float*> %x, ... ->getelementptr float, <4 x float*> %x, ... Then, once the opaque pointer type is introduced, this will ultimately look like: getelementptr float, <4 x ptr> %x with the unambiguous interpretation that it is a vector of pointers to float. * address spaces remain on the pointer, not the type: getelementptr float addrspace(1)* %x ->getelementptr float, float addrspace(1)* %x Then, eventually: getelementptr float, ptr addrspace(1) %x Importantly, the massive amount of test case churn has been automated by same crappy python code. I had to manually update a few test cases that wouldn't fit the script's model (r228970,r229196,r229197,r229198). The python script just massages stdin and writes the result to stdout, I then wrapped that in a shell script to handle replacing files, then using the usual find+xargs to migrate all the files. update.py: import fileinput import sys import re ibrep = re.compile(r"(^.*?[^%\w]getelementptr inbounds )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))") normrep = re.compile( r"(^.*?[^%\w]getelementptr )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))") def conv(match, line): if not match: return line line = match.groups()[0] if len(match.groups()[5]) == 0: line += match.groups()[2] line += match.groups()[3] line += ", " line += match.groups()[1] line += "\n" return line for line in sys.stdin: if line.find("getelementptr ") == line.find("getelementptr inbounds"): if line.find("getelementptr inbounds") != line.find("getelementptr inbounds ("): line = conv(re.match(ibrep, line), line) elif line.find("getelementptr ") != line.find("getelementptr ("): line = conv(re.match(normrep, line), line) sys.stdout.write(line) apply.sh: for name in "$@" do python3 `dirname "$0"`/update.py < "$name" > "$name.tmp" && mv "$name.tmp" "$name" rm -f "$name.tmp" done The actual commands: From llvm/src: find test/ -name *.ll | xargs ./apply.sh From llvm/src/tools/clang: find test/ -name *.mm -o -name *.m -o -name *.cpp -o -name *.c | xargs -I '{}' ../../apply.sh "{}" From llvm/src/tools/polly: find test/ -name *.ll | xargs ./apply.sh After that, check-all (with llvm, clang, clang-tools-extra, lld, compiler-rt, and polly all checked out). The extra 'rm' in the apply.sh script is due to a few files in clang's test suite using interesting unicode stuff that my python script was throwing exceptions on. None of those files needed to be migrated, so it seemed sufficient to ignore those cases. Reviewers: rafael, dexonsmith, grosser Differential Revision: http://reviews.llvm.org/D7636 llvm-svn: 230786
* Revert r227242 - Merge vector stores into wider vector stores (PR21711).Quentin Colombet2015-01-271-58/+0
| | | | | | | This commit creates infinite loop in DAG combine for in the LLVM test-suite for aarch64 with mcpu=cylcone (just having neon may be enough to expose this). llvm-svn: 227272
* Merge vector stores into wider vector stores (PR21711)Sanjay Patel2015-01-271-0/+58
| | | | | | | | | | | | | | | | | | | | This patch resolves part of PR21711 ( http://llvm.org/bugs/show_bug.cgi?id=21711 ). The 'f3' test case in that report presents a situation where we have two 128-bit stores extracted from a 256-bit source vector. Instead of producing this: vmovaps %xmm0, (%rdi) vextractf128 $1, %ymm0, 16(%rdi) This patch merges the 128-bit stores into a single 256-bit store: vmovups %ymm0, (%rdi) Differential Revision: http://reviews.llvm.org/D7208 llvm-svn: 227242
* merge consecutive stores of extracted vector elements (PR21711)Sanjay Patel2015-01-221-0/+59
| | | | | | | | | | | | | | | | | | This is a 2nd try at the same optimization as http://reviews.llvm.org/D6698. That patch was checked in at r224611, but reverted at r225031 because it caused a failure outside of the regression tests. The cause of the crash was not recognizing consecutive stores that have mixed source values (loads and vector element extracts), so this patch adds a check to bail out if any store value is not coming from a vector element extract. This patch also refactors the shared logic of the constant source and vector extracted elements source cases into a helper function. Differential Revision: http://reviews.llvm.org/D6850 llvm-svn: 226845
* Revert "merge consecutive stores of extracted vector elements"Alexey Samsonov2014-12-311-33/+0
| | | | | | | This reverts commit r224611. This change causes crashes in X86 DAG->DAG Instruction Selection. llvm-svn: 225031
* merge consecutive stores of extracted vector elementsSanjay Patel2014-12-191-0/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a path to DAGCombiner::MergeConsecutiveStores() to combine multiple scalar stores when the store operands are extracted vector elements. This is a partial fix for PR21711 ( http://llvm.org/bugs/show_bug.cgi?id=21711 ). For the new test case, codegen improves from: vmovss %xmm0, (%rdi) vextractps $1, %xmm0, 4(%rdi) vextractps $2, %xmm0, 8(%rdi) vextractps $3, %xmm0, 12(%rdi) vextractf128 $1, %ymm0, %xmm0 vmovss %xmm0, 16(%rdi) vextractps $1, %xmm0, 20(%rdi) vextractps $2, %xmm0, 24(%rdi) vextractps $3, %xmm0, 28(%rdi) vzeroupper retq To: vmovups %ymm0, (%rdi) vzeroupper retq Patch reviewed by Nadav Rotem. Differential Revision: http://reviews.llvm.org/D6698 llvm-svn: 224611
* fix typo, add spaces; NFCSanjay Patel2014-12-161-37/+37
| | | | llvm-svn: 224384
* Add the ability to use GEPs for address sinking in CGPHal Finkel2014-04-121-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | The current memory-instruction optimization logic in CGP, which sinks parts of the address computation that can be adsorbed by the addressing mode, does this by explicitly converting the relevant part of the address computation into IR-level integer operations (making use of ptrtoint and inttoptr). For most targets this is currently not a problem, but for targets wishing to make use of IR-level aliasing analysis during CodeGen, the use of ptrtoint/inttoptr is a problem for two reasons: 1. BasicAA becomes less powerful in the face of the ptrtoint/inttoptr 2. In cases where type-punning was used, and BasicAA was used to override TBAA, BasicAA may no longer do so. (this had forced us to disable all use of TBAA in CodeGen; something which we can now enable again) This (use of GEPs instead of ptrtoint/inttoptr) is not currently enabled by default (except for those targets that use AA during CodeGen), and so aside from some PowerPC subtargets and SystemZ, there should be no change in behavior. We may be able to switch completely away from the ptrtoint/inttoptr sinking on all targets, but further testing is required. I've doubled-up on a number of existing tests that are sensitive to the address sinking behavior (including some store-merging tests that are sensitive to the order of the resulting ADD operations at the SDAG level). llvm-svn: 206092
* Update to more CodeGen tests to use CHECK-LABEL for labels corresponding to ↵Stephen Lin2013-07-181-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | function definitions for more informative error messages. No functionality change. All changes were made by the following bash script: find test/CodeGen -name "*.ll" | \ while read NAME; do echo "$NAME" grep -q "^; *RUN: *llc.*debug" $NAME && continue grep -q "^; *RUN:.*llvm-objdump" $NAME && continue grep -q "^; *RUN: *opt.*" $NAME && continue TEMP=`mktemp -t temp` cp $NAME $TEMP sed -n "s/^define [^@]*@\([A-Za-z0-9_]*\)(.*$/\1/p" < $NAME | \ while read FUNC; do sed -i '' "s/;\([A-Za-z0-9_-]*\)\([A-Za-z0-9_-]*\):\( *\)$FUNC[:]* *\$/;\1\2-LABEL:\3$FUNC:/g" $TEMP done sed -i '' "s/;\(.*\)-LABEL-LABEL:/;\1-LABEL:/" $TEMP sed -i '' "s/;\(.*\)-NEXT-LABEL:/;\1-NEXT:/" $TEMP sed -i '' "s/;\(.*\)-NOT-LABEL:/;\1-NOT:/" $TEMP sed -i '' "s/;\(.*\)-DAG-LABEL:/;\1-DAG:/" $TEMP mv $TEMP $NAME done This script catches a superset of the cases caught by the script associated with commit r186280. It initially found some false positives due to unusual constructs in a minority of tests; all such cases were disambiguated first in commit r186621. llvm-svn: 186624
* Merge load/store sequences with adresses: base + index + offsetArnold Schwaighofer2013-04-011-0/+96
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We would also like to merge sequences that involve a variable index like in the example below. int index = *idx++ int i0 = c[index+0]; int i1 = c[index+1]; b[0] = i0; b[1] = i1; By extending the parsing of the base pointer to handle dags that contain a base, index, and offset we can handle examples like the one above. The dag for the code above will look something like: (load (i64 add (i64 copyfromreg %c) (i64 signextend (i8 load %index)))) (load (i64 add (i64 copyfromreg %c) (i64 signextend (i32 add (i32 signextend (i8 load %index)) (i32 1))))) The code that parses the tree ignores the intermediate sign extensions. However, if there is a sign extension it needs to be on all indexes. (load (i64 add (i64 copyfromreg %c) (i64 signextend (add (i8 load %index) (i8 1)))) vs (load (i64 add (i64 copyfromreg %c) (i64 signextend (i32 add (i32 signextend (i8 load %index)) (i32 1))))) radar://13536387 llvm-svn: 178483
* Dont merge consecutive loads/stores into vectors when noimplicitfloat is used.Nadav Rotem2013-02-141-0/+34
| | | | llvm-svn: 175190
* On Sandybridge split unaligned 256bit stores into two xmm-sized stores. Nadav Rotem2013-01-191-1/+1
| | | | llvm-svn: 172894
* When merging connsecutive stores, use vectors to store the constant zero.Nadav Rotem2012-10-041-3/+35
| | | | llvm-svn: 165267
* A DAGCombine optimization for mergeing consecutive stores to memory. The ↵Nadav Rotem2012-10-031-0/+273
| | | | | | | | | | | | | | | | | | | | | optimization is not profitable in many cases because modern processors perform multiple stores in parallel and merging stores prior to merging requires extra work. We handle two main cases: 1. Store of multiple consecutive constants: q->a = 3; q->4 = 5; In this case we store a single legal wide integer. 2. Store of multiple consecutive loads: int a = p->a; int b = p->b; q->a = a; q->b = b; In this case we load/store either ilegal vector registers or legal wide integer registers. llvm-svn: 165125
* Revert r164910 because it causes failures to several phase2 builds.Nadav Rotem2012-09-301-150/+0
| | | | llvm-svn: 164911
* A DAGCombine optimization for merging consecutive stores. This optimization ↵Nadav Rotem2012-09-301-0/+150
| | | | | | | | | | | | | | | | | | | is not profitable in many cases because moden processos can store multiple values in parallel, and preparing the consecutive store requires some work. We only handle these cases: 1. Consecutive stores where the values and consecutive loads. For example: int a = p->a; int b = p->b; q->a = a; q->b = b; 2. Consecutive stores where the values are constants. Foe example: q->a = 4; q->b = 5; llvm-svn: 164910
* Speculatively revert commit 164885 (nadav) in the hope of ressurecting a pile ofDuncan Sands2012-09-291-150/+0
| | | | | | | | | | | | | | | | | | | | buildbots. Original commit message: A DAGCombine optimization for merging consecutive stores. This optimization is not profitable in many cases because moden processos can store multiple values in parallel, and preparing the consecutive store requires some work. We only handle these cases: 1. Consecutive stores where the values and consecutive loads. For example: int a = p->a; int b = p->b; q->a = a; q->b = b; 2. Consecutive stores where the values are constants. Foe example: q->a = 4; q->b = 5; llvm-svn: 164890
* A DAGCombine optimization for merging consecutive stores. This optimization ↵Nadav Rotem2012-09-291-0/+150
is not profitable in many cases because moden processos can store multiple values in parallel, and preparing the consecutive store requires some work. We only handle these cases: 1. Consecutive stores where the values and consecutive loads. For example: int a = p->a; int b = p->b; q->a = a; q->b = b; 2. Consecutive stores where the values are constants. Foe example: q->a = 4; q->b = 5; llvm-svn: 164885
OpenPOWER on IntegriCloud