path: root/llvm/lib
Commit message | Author | Age | Files | Lines
...
* [DAGCombiner] Teach DAG combine that inserting an extract_subvector result into the same location of an undef vector can just use the original input to the extract. | Craig Topper | 2017-02-13 | 1 | -0/+6
  llvm-svn: 294932
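The combine described in r294932 can be illustrated with a small model. This is only a sketch, not the DAGCombiner code: vectors are Python lists, `UNDEF` stands in for an undef lane, and the helper names (`extract_subvector`, `insert_subvector`, `refines`) are hypothetical. It shows that inserting an extracted slice back into the same position of an all-undef vector yields a value that the original input can legally replace.

```python
UNDEF = None  # stands in for an undef lane in this model


def extract_subvector(vec, idx, width):
    """Return `width` lanes of `vec` starting at lane `idx`."""
    return vec[idx:idx + width]


def insert_subvector(base, sub, idx):
    """Return a copy of `base` with `sub` written at lane `idx`."""
    out = list(base)
    out[idx:idx + len(sub)] = sub
    return out


def refines(original, replacement):
    """True if `replacement` agrees with `original` on every defined lane."""
    return len(original) == len(replacement) and all(
        o is UNDEF or o == r for o, r in zip(original, replacement))


x = [10, 11, 12, 13, 14, 15, 16, 17]
undef_vec = [UNDEF] * len(x)
combined = insert_subvector(undef_vec, extract_subvector(x, 4, 4), 4)
```

Every defined lane of `combined` matches `x`, and the remaining lanes are undef, so using `x` directly is a valid replacement in this model.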
* [X86] Genericize the handling of INSERT_SUBVECTOR from an EXTRACT_SUBVECTOR to support 512-bit vectors with 128-bit or 256-bit subvectors. | Craig Topper | 2017-02-13 | 1 | -21/+18
  We now detect that both the extract and insert indices are non-zero and convert to a shuffle. This will be lowered as a blend for 256-bit vectors or as a vshuf operation for 512-bit vectors.
  llvm-svn: 294931
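To make the "convert to a shuffle" step concrete, here is a minimal sketch under assumed semantics: lists model vectors, and the shuffle mask indexes into the concatenation of the destination and source vectors. The function names are hypothetical and do not correspond to anything in X86ISelLowering.

```python
def insert_of_extract(dst, src, extract_idx, insert_idx, width):
    """Model of INSERT_SUBVECTOR(dst, EXTRACT_SUBVECTOR(src, extract_idx), insert_idx)."""
    out = list(dst)
    out[insert_idx:insert_idx + width] = src[extract_idx:extract_idx + width]
    return out


def shuffle_mask(num_elts, extract_idx, insert_idx, width):
    """Mask over concat(dst, src): indices below num_elts read dst, the rest read src."""
    mask = list(range(num_elts))
    for k in range(width):
        mask[insert_idx + k] = num_elts + extract_idx + k
    return mask


def shuffle(dst, src, mask):
    """Apply a two-input shuffle mask."""
    both = list(dst) + list(src)
    return [both[m] for m in mask]


dst = list(range(100, 108))
src = list(range(200, 208))
mask = shuffle_mask(8, 4, 4, 4)
```

When the extract and insert indices are equal, as in `mask` above, every lane reads the same position from one of the two inputs, which is the shape of a blend.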
* [DAGCombiner] Remove the half vector width check for the combine of EXTRACT_SUBVECTOR from an INSERT_SUBVECTOR. | Craig Topper | 2017-02-12 | 1 | -4/+3
  This gives more parallelism opportunities for AVX-512 when dealing with 128-bit extracts from 512-bit vectors.
  llvm-svn: 294930
* [X86] Don't let LowerEXTRACT_SUBVECTOR call getNode for EXTRACT_SUBVECTOR. | Craig Topper | 2017-02-12 | 1 | -5/+7
  This results in the simplifications inside of getNode running while we're legalizing nodes popped off the worklist during the final DAG combine. This basically makes a DAG-combine-like operation occur during this legalize step, but we don't handle something quite the same way. I think we don't recursively add the removed nodes to the DAG combiner worklist.
  llvm-svn: 294929
* NewGVN: Reverse order of congruence class elimination to maximize trivial deadness | Daniel Berlin | 2017-02-12 | 1 | -2/+2
  llvm-svn: 294926
* NewGVN: Use shouldSwapOperands in one more place | Daniel Berlin | 2017-02-12 | 1 | -1/+1
  llvm-svn: 294925
* [TargetLowering] fix SETCC SETLT folding with FP types | Sanjay Patel | 2017-02-12 | 1 | -9/+13
  The bug was introduced with:
  https://reviews.llvm.org/rL294863
  ...and manifests as a selection failure in x86, but that's actually another bug.

  This fix prevents wrong codegen with -0.0, but in the more common case when we have NSZ and NNAN (-ffast-math), we should still be able to fold this setcc/compare.
  llvm-svn: 294924
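The -0.0 problem mentioned in r294924 comes down to the difference between "is less than zero" and "has the sign bit set". The sketch below is an illustration only, assuming IEEE-754 doubles; the helper names are hypothetical and the code does not reproduce the TargetLowering logic.

```python
import struct


def sign_bit_set(x):
    """Inspect the raw IEEE-754 encoding of a double and test bit 63."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return (bits >> 63) == 1


def fp_less_than_zero(x):
    """Ordinary floating-point comparison against zero."""
    return x < 0.0


# The two predicates agree on ordinary values but disagree on negative zero.
disagreements = [v for v in (-1.5, -0.0, 0.0, 2.0)
                 if sign_bit_set(v) != fp_less_than_zero(v)]
```

Negative zero has its sign bit set yet does not compare less than zero, so treating an FP `SETLT x, 0` as a pure sign-bit test is only safe when signed zeros (and NaNs) can be ignored, which is what the NSZ and NNAN flags express.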
* Revert accidental commit titled "testing" | Daniel Berlin | 2017-02-12 | 1 | -1/+1
  This reverts commit r294919
  llvm-svn: 294923
* NewGVN: Apply the fast math flags fix in r267113 to NewGVN as well. | Daniel Berlin | 2017-02-12 | 1 | -23/+26
  llvm-svn: 294922
* PredicateInfo: Handle critical edges | Daniel Berlin | 2017-02-12 | 1 | -63/+107
  Summary:
  This adds support for placing predicateinfo such that it affects critical edges.

  This fixes the issues mentioned by Nuno on the mailing list.

  Depends on D29519

  Reviewers: davide, nlopes
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D29606
  llvm-svn: 294921
* NewGVN: Fix missed call that should be to shouldSwapOperands | Daniel Berlin | 2017-02-12 | 1 | -1/+0
  llvm-svn: 294920
* testing | Daniel Berlin | 2017-02-12 | 1 | -1/+2
  llvm-svn: 294919
* [X86] Fix typo in function name. NFCI. | Simon Pilgrim | 2017-02-12 | 1 | -2/+2
  convertBitVectorToUnsiged -> convertBitVectorToUnsigned
  llvm-svn: 294914
* [AVX-512] Add various EVEX move instructions to load folding tables using the VEX equivalents as a guide. | Craig Topper | 2017-02-12 | 1 | -4/+10
  llvm-svn: 294908
* [AVX-512] Add VMOV64toSDZrm CodeGenOnly instruction based on the same instruction from AVX/SSE. | Craig Topper | 2017-02-12 | 1 | -0/+4
  I can't prove that we can select this instruction or the AVX/SSE version, but I'm adding it for consistency for now so I can continue matching the load folding tables.
  llvm-svn: 294907
* [X86] Fix a couple of instruction names to use 'mr' instead of 'rm' to indicate they are stores. | Craig Topper | 2017-02-12 | 1 | -2/+2
  The AVX-512 version was already named with 'mr'.
  llvm-svn: 294906
* [AVX-512] Add VPEXTRD/Q to load folding tables. | Craig Topper | 2017-02-12 | 1 | -0/+2
  llvm-svn: 294905
* [X86][SSE] Update argument names to match function name. NFCI. | Simon Pilgrim | 2017-02-12 | 1 | -12/+13
  The target shuffle match function arguments were using the term 'Ops' but the function names referred to them as 'Inputs' - use 'Inputs' consistently.
  llvm-svn: 294900
* [InstCombine] fold icmp sgt/slt (add nsw X, C2), C --> icmp sgt/slt X, (C - C2) | Sanjay Patel | 2017-02-12 | 1 | -6/+16
  I found one special case of this transform for 'slt 0', so I removed that and added the general transform.

  Alive code to check correctness:

    Name: slt_no_overflow
    Pre: WillNotOverflowSignedSub(C1, C2)
    %a = add nsw i8 %x, C2
    %b = icmp slt %a, C1
      =>
    %b = icmp slt %x, C1 - C2

    Name: sgt_no_overflow
    Pre: WillNotOverflowSignedSub(C1, C2)
    %a = add nsw i8 %x, C2
    %b = icmp sgt %a, C1
      =>
    %b = icmp sgt %x, C1 - C2

  http://rise4fun.com/Alive/MH

  Differential Revision: https://reviews.llvm.org/D29774
  llvm-svn: 294898
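The Alive proof above can be cross-checked by brute force for i8. The following is a sketch rather than a substitute for Alive: it models `add nsw` by skipping inputs where the addition would overflow (poison), models the `WillNotOverflowSignedSub` precondition with a range check, and samples a subset of constants to keep the run short. All names are hypothetical.

```python
I8_MIN, I8_MAX = -128, 127


def fits_i8(value):
    """True if `value` is representable as a signed 8-bit integer."""
    return I8_MIN <= value <= I8_MAX


def check_fold(c1, c2):
    """Check both folds for every i8 x; None means the precondition fails."""
    if not fits_i8(c1 - c2):       # Pre: WillNotOverflowSignedSub(C1, C2)
        return None
    for x in range(I8_MIN, I8_MAX + 1):
        if not fits_i8(x + c2):    # add nsw overflows: result is poison, skip
            continue
        if ((x + c2) < c1) != (x < (c1 - c2)):   # slt case
            return False
        if ((x + c2) > c1) != (x > (c1 - c2)):   # sgt case
            return False
    return True


# Sampled constants, including the boundary values.
constants = sorted({-128, -127, -1, 0, 1, 126, 127} | set(range(-128, 128, 17)))
results = [check_fold(c1, c2) for c1 in constants for c2 in constants]
```

No sampled pair produces `False`; pairs such as `(C1, C2) = (-128, 1)` return `None` because `C1 - C2` does not fit in i8, which is the case the precondition excludes.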
* [ValueTracking] use nonnull argument attribute to eliminate null checks | Sanjay Patel | 2017-02-12 | 1 | -5/+17
  Enhancing value tracking's analysis of null-ness was suggested in D27855, so here's a first attempt at that.

  This is part of solving:
  https://llvm.org/bugs/show_bug.cgi?id=28430

  Differential Revision: https://reviews.llvm.org/D28204
  llvm-svn: 294897
* [X86][AVX2] Add support for combining target shuffles to VPMOVZX | Simon Pilgrim | 2017-02-12 | 1 | -6/+11
  Initial 256-bit vector support - 512-bit support requires extra checks for AVX512BW support (PMOVZXBW) that will be handled in a future patch.
  llvm-svn: 294896
* AMDGPU::expandMemIntrinsicUses(): Fix an uninitialized variable. This function returned true or undef. | NAKAMURA Takumi | 2017-02-12 | 1 | -1/+1
  llvm-svn: 294895
* [LV/LoopAccess] Check statically if an unknown dependence distance can be proven larger than the loop-count | Dorit Nuzman | 2017-02-12 | 1 | -6/+78
  This fixes PR31098: Try to resolve statically data-dependences whose compile-time-unknown distance can be proven larger than the loop-count, instead of resorting to runtime dependence checking (which is not always possible).

  For vectorization it is sufficient to prove that the dependence distance is >= VF. But in some cases we can prune unknown dependence distances early, even before selecting the VF, and without a runtime test, by comparing the distance against the loop iteration count. Since the vectorized code will be executed only if LoopCount >= VF, proving distance >= LoopCount also guarantees that distance >= VF. This check is also equivalent to the Strong SIV Test.

  Reviewers: mkuper, anemet, sanjoy
  Differential Revision: https://reviews.llvm.org/D28044
  llvm-svn: 294892
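The reasoning in r294892 can be exercised on a toy loop `a[i + d] = a[i] + 1`. This sketch is an assumption-laden model, not the LoopAccessAnalysis code: the "vectorized" loop simply performs all loads of a chunk of `vf` iterations before any of its stores, and the names `scalar_loop`, `vectorized_loop` and `provably_independent` are hypothetical.

```python
def scalar_loop(a, n, d):
    """Reference semantics: iterations run strictly in order."""
    a = list(a)
    for i in range(n):
        a[i + d] = a[i] + 1
    return a


def vectorized_loop(a, n, d, vf):
    """Process vf iterations at a time: all loads happen before all stores."""
    a = list(a)
    assert n % vf == 0  # keep the sketch simple: no scalar remainder loop
    for base in range(0, n, vf):
        loaded = [a[base + k] + 1 for k in range(vf)]
        for k in range(vf):
            a[base + k + d] = loaded[k]
    return a


def provably_independent(distance, loop_count):
    """The static check from the commit message: distance >= LoopCount."""
    return distance >= loop_count


n, vf = 8, 4
far = (scalar_loop([0] * 16, n, 8), vectorized_loop([0] * 16, n, 8, vf))
near = (scalar_loop([0] * 9, n, 1), vectorized_loop([0] * 9, n, 1, vf))
```

With a distance of 8 and 8 iterations the two loops agree, as the static check predicts. With a distance of 1, which is below the vector factor, they diverge, which is the situation the dependence analysis has to rule out.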
* AVX-512: Fixed DWARF register numbers for XMM16-31 | Elena Demikhovsky | 2017-02-12 | 1 | -16/+16
  The reference is here:
  https://software.intel.com/sites/default/files/article/402129/mpx-linux64-abi.pdf
  llvm-svn: 294890
* [PM] Add devirtualization-based iteration utility into the new PM's default pipeline. | Chandler Carruth | 2017-02-12 | 1 | -1/+10
  A clang with this patch built with ASan and asserts can build all of the test-suite as well, so it seems to not uncover any latent problems.

  Differential Revision: https://reviews.llvm.org/D29853
  llvm-svn: 294888
* [PM] Enable GlobalsAA in the new PM's pipeline by default. | Chandler Carruth | 2017-02-12 | 1 | -14/+6
  All the invalidation issues and bugs in this seem to be fixed; it has survived a full build of the test suite plus SPEC with asserts and ASan enabled on the Clang binary used.

  Differential Revision: https://reviews.llvm.org/D29815
  llvm-svn: 294887
* [lib/LTO] Initial support for optimization remarks in the new API. | Davide Italiano | 2017-02-12 | 1 | -0/+6
  llvm-svn: 294882
* [X86] Move code for using blendi for insert_subvector out to an isel pattern. | Craig Topper | 2017-02-11 | 2 | -27/+53
  This gives the DAG combiner more opportunity to optimize without needing to dig through the blend.
  llvm-svn: 294876
* [DAGCombiner] Make the combine of INSERT_SUBVECTOR into a CONCAT_VECTOR more generic to support larger concats. | Craig Topper | 2017-02-11 | 1 | -16/+9
  llvm-svn: 294875
* [X86][SSE] Use VSEXT/VZEXT constant folding for SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG | Simon Pilgrim | 2017-02-11 | 1 | -1/+6
  Preparatory step for PR31712
  llvm-svn: 294874
* [X86][SSE] Improve VSEXT/VZEXT constant folding. | Simon Pilgrim | 2017-02-11 | 1 | -11/+18
  Generalize VSEXT/VZEXT constant folding to work with any target constant bits source, not just BUILD_VECTOR.
  llvm-svn: 294873
* [X86][SSE] Add early-out when trying to match blend shuffle. NFCI. | Simon Pilgrim | 2017-02-11 | 1 | -3/+4
  llvm-svn: 294864
* [TargetLowering] check for sign-bit comparisons in SimplifyDemandedBits | Sanjay Patel | 2017-02-11 | 1 | -0/+19
  I don't know if anything other than x86 vectors is affected by this change, but this may allow us to remove target-specific intrinsics for blendv* (vector selects).

  The simplification arises from the fact that blendv* instructions only use the sign-bit when deciding which vector element to choose for the destination vector. The mechanism to fold VSELECT into SHRUNKBLEND nodes already exists in x86 lowering; this demanded bits change just enables the transform to fire more often.

  The original motivation starts with a bug for DSE of masked stores that seems completely unrelated, but I've explained the likely steps in this series here:
  https://llvm.org/bugs/show_bug.cgi?id=11210

  Differential Revision: https://reviews.llvm.org/D29687
  llvm-svn: 294863
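The observation that blendv* only looks at the sign bit can be shown with a small integer model. This is a sketch under stated assumptions: lanes are signed 8-bit values held in Python ints, `blendv` picks the second operand when the mask lane's sign bit is set, and the function names are hypothetical rather than taken from the x86 backend.

```python
def blendv(mask, a, b):
    """Per lane: pick b when the mask lane's sign bit (bit 7) is set, else a."""
    return [bi if (m & 0x80) else ai for m, ai, bi in zip(mask, a, b)]


def compare_lt_zero(x):
    """Vector 'x < 0' producing all-ones (-1) or all-zeros lanes."""
    return [-1 if xi < 0 else 0 for xi in x]


x = [-5, 3, -128, 0, 127, -1, 64, 100]
a = list(range(8))
b = list(range(10, 18))

with_compare = blendv(compare_lt_zero(x), a, b)
without_compare = blendv(x, a, b)
```

Because only the sign bit of each mask lane is demanded, the explicit comparison against zero contributes nothing and `x` can feed the blend directly in this integer model.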
* Fix indentation in X86ISelLowering. NFC | Amaury Sechet | 2017-02-11 | 1 | -8/+8
  llvm-svn: 294859
* [AVX-512] Add VPMINS/MINU/MAXS/MAXU instructions to load folding tables. | Craig Topper | 2017-02-11 | 1 | -0/+136
  llvm-svn: 294858
* [X86] Improve alphabetizing of load folding tables. NFC | Craig Topper | 2017-02-11 | 1 | -18/+18
  llvm-svn: 294857
* [X86][SSE] Convert getTargetShuffleMaskIndices to use getTargetConstantBitsFromNode. | Simon Pilgrim | 2017-02-11 | 1 | -75/+25
  Removes duplicate constant extraction code in getTargetShuffleMaskIndices.

  getTargetConstantBitsFromNode - adds support for VZEXT_MOVL(SCALAR_TO_VECTOR) and fails if the caller doesn't support undef bits.
  llvm-svn: 294856
* [X86] Merge repeated getScalarValueSizeInBits calls. NFCI. | Simon Pilgrim | 2017-02-11 | 1 | -7/+7
  llvm-svn: 294852
* NewGVN: Reverse sense of this test to make it clearer | Daniel Berlin | 2017-02-11 | 1 | -5/+7
  llvm-svn: 294851
* NewGVN: Add missing initialization of NumFuncArgs lost due to bad merge. | Daniel Berlin | 2017-02-11 | 1 | -0/+1
  llvm-svn: 294850
* NewGVN: Rank and order commutative operands consistently. | Daniel Berlin | 2017-02-11 | 1 | -2/+40
  llvm-svn: 294849
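The idea of ordering commutative operands consistently can be sketched as follows. The ranking used here (constants, then arguments, then instructions, with names as a tie-breaker) is an assumption made for illustration; the log does not state the ranking NewGVN actually uses, and `rank` and `canonical_expression` are hypothetical names.

```python
COMMUTATIVE = {"add", "mul", "and", "or", "xor"}


def rank(operand):
    """Hypothetical ranking: constants first, then arguments, then instructions."""
    kind, name = operand
    order = {"const": 0, "arg": 1, "inst": 2}
    return (order[kind], name)


def canonical_expression(opcode, lhs, rhs):
    """Swap operands of commutative opcodes so the lower-ranked one comes first."""
    if opcode in COMMUTATIVE and rank(lhs) > rank(rhs):
        lhs, rhs = rhs, lhs
    return (opcode, lhs, rhs)


e1 = canonical_expression("add", ("inst", "%t"), ("arg", "%a"))
e2 = canonical_expression("add", ("arg", "%a"), ("inst", "%t"))
```

Both operand orders map to the same key, so a value-numbering table keyed on the canonical form would treat `add %t, %a` and `add %a, %t` as one expression, while non-commutative opcodes such as `sub` are left untouched.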
* [X86][3DNow!] Enable PFSUB<->PFSUBR commutation | Simon Pilgrim | 2017-02-11 | 2 | -2/+14
  llvm-svn: 294847
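The PFSUB<->PFSUBR commutation relies on the two instructions being mirror images: PFSUB computes destination minus source and PFSUBR computes source minus destination. A minimal sketch, with vectors modelled as Python lists of floats and hypothetical function names:

```python
def pfsub(dst, src):
    """Model of PFSUB: each lane becomes dst - src."""
    return [d - s for d, s in zip(dst, src)]


def pfsubr(dst, src):
    """Model of PFSUBR (reversed subtract): each lane becomes src - dst."""
    return [s - d for d, s in zip(dst, src)]


a = [1.5, -2.0]
b = [0.25, 8.0]
```

Swapping the operands and switching to the other opcode gives the same lanes, which is what lets the commutation swap the register and memory operands.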
* [X86][3DNow!] Enable commutation for PFADD/PFMUL/PFCMPEQ/PAVGUSB/PMULHRW | Simon Pilgrim | 2017-02-11 | 1 | -8/+10
  All commutations confirmed to give identical results. Note that PFMAX/PFMIN do not.

  PFSUB<->PFSUBR should be commutable as well.
  llvm-svn: 294846
* NewGVN: Clean up how we handle the INITIAL class so that everything in it is dead or unreachable, as it should be. | Daniel Berlin | 2017-02-11 | 1 | -16/+38
  This also makes the leader of INITIAL undef, enabling us to handle irreducibility properly.

  Summary: This lets us verify, more than we do now, that we didn't screw up value numbering.

  Reviewers: davide
  Subscribers: Prazek, llvm-commits
  Differential Revision: https://reviews.llvm.org/D29842
  llvm-svn: 294844
* Fix "left shift of negative value -1" introduced by r294805 | Vitaly Buka | 2017-02-11 | 1 | -1/+1
  llvm-svn: 294843
* Move symbols from the global namespace into (anonymous) namespaces. NFC. | Benjamin Kramer | 2017-02-11 | 9 | -30/+25
  llvm-svn: 294837
* [AVX-512] Add VPINSRB/W/D/Q instructions to load folding tables. | Craig Topper | 2017-02-11 | 1 | -0/+4
  llvm-svn: 294830
* [AVX-512] Fix apparent typo in instruction name VMOVSSDrr_REV->VMOVSDZrr_REV. | Craig Topper | 2017-02-11 | 1 | -1/+1
  llvm-svn: 294829
* [AVX-512] Add VPSADBW instructions to load folding tables. | Craig Topper | 2017-02-11 | 1 | -0/+3
  llvm-svn: 294827
* [X86] Don't base domain decisions on VEXTRACTF128/VINSERTF128 if only AVX1 is available. | Craig Topper | 2017-02-11 | 1 | -4/+19
  It seems the execution dependency pass likes to use FP instructions when most of the consuming code is integer if a vextractf128 instruction produced the register. Without AVX2 we don't have the corresponding integer instruction available.

  This patch suppresses the domain on these instructions to GenericDomain if AVX2 is not supported so that they are ignored by domain fixing. If AVX2 is supported we'll report the correct domain and allow them to switch between integer and fp.

  Overall I think this produces better results in the modified test cases.
  llvm-svn: 294824