path: root/llvm/lib/Target
...
* [X86][SSE] Pull out variable shuffle mask combine logic. NFCI. (Simon Pilgrim, 2017-09-27, 1 file, -10/+13)
  Hopefully this will make it easier to vary the combine depth threshold per-target.
  llvm-svn: 314337
* [X86] Rewrite the zero vector checks in lowerV2X128VectorShuffle to use the Zeroable APInt (Craig Topper, 2017-09-27, 1 file, -23/+10)
  We already have zeroable bits in an APInt. We might as well use that instead of checking for an all-zero BUILD_VECTOR.
  Differential Revision: https://reviews.llvm.org/D37950
  llvm-svn: 314332
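  A sketch of what the Zeroable-based check can look like (hand-written illustration; the variable names and the exact predicate are assumptions, not the committed diff):

      // Instead of matching an all-zeros BUILD_VECTOR operand, consult the
      // precomputed per-element Zeroable mask, e.g. "is the entire low
      // 128-bit half of the result known zeroable?"
      bool LowHalfZeroable =
          Zeroable.extractBits(NumElts / 2, 0).isAllOnesValue();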
* [X86] In combineLoopSADPattern, pad result with zeros and use a full-size add instead of using a smaller add and inserting (Craig Topper, 2017-09-27, 1 file, -10/+7)
  In some cases the resulting psadbw is narrower than the type of the add that started the match. Currently in these cases we use a smaller add and insert the result. If we instead combine the psadbw with zeros and use the full-size add, we can take advantage of the implicit zeroing we get if we emit a narrower move before the add. In a future patch, I want to make isel aware that the psadbw itself already zeroed the upper bits, and remove the move entirely.
  Differential Revision: https://reviews.llvm.org/D37453
  llvm-svn: 314331
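  For reference, a scalar model of one PSADBW lane (illustrative only, not code from this patch); it shows why the psadbw result is always zero in its upper bits, which is what the full-size add exploits:

      #include <cstdint>
      #include <cstdlib>

      // One 64-bit lane of PSADBW: the sum of absolute differences of
      // eight byte pairs, zero-extended into a 64-bit result.
      uint64_t psadbw_lane(const uint8_t a[8], const uint8_t b[8]) {
        uint64_t sum = 0;
        for (int i = 0; i < 8; ++i)
          sum += static_cast<uint64_t>(std::abs(int(a[i]) - int(b[i])));
        return sum; // at most 8 * 255, so the upper bits are always zero
      }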
* [AArch64][Falkor] Ignore SP based loads in HW prefetch fixups (Geoff Berry, 2017-09-27, 1 file, -1/+6)
  Reviewers: mcrosier
  Subscribers: aemerson, rengolin, javed.absar, kristof.beyls
  Differential Revision: https://reviews.llvm.org/D38301
  llvm-svn: 314319
* [SimplifyCFG] add a struct to house optional folds (PR34603) (Sanjay Patel, 2017-09-27, 1 file, -1/+1)
  This was intended to be no-functional-change, but it's not - there's a test diff. So I thought I should stop here and post it as-is to see if this looks like what was expected, based on the discussion in PR34603: https://bugs.llvm.org/show_bug.cgi?id=34603
  Notes:
  1. The test improvement occurs because the existing 'LateSimplifyCFG' marker is not carried through the recursive calls to 'SimplifyCFG()->SimplifyCFGOpt().run()->SimplifyCFG()'. The parameter isn't passed down, so we pick up the default value from the function signature after the first level. I assumed that was a bug, so I've passed 'Options' down in all of the 'SimplifyCFG' calls.
  2. I split 'LateSimplifyCFG' into 2 bits: ConvertSwitchToLookupTable and KeepCanonicalLoops. This would theoretically allow us to differentiate the transforms controlled by those params independently (a sketch of the resulting struct follows this entry).
  3. We could stash the optional AssumptionCache pointer and 'LoopHeaders' pointer in the struct too. I just stopped here to minimize the diffs.
  4. Similarly, I stopped short of messing with the pass manager layer. I have another question that could wait for the follow-up: why is the new pass manager creating the pass with LateSimplifyCFG set to true no matter where in the pipeline it's creating SimplifyCFG passes?
       // Create an early function pass manager to cleanup the output of the frontend.
       EarlyFPM.addPass(SimplifyCFGPass());
     -->
       /// \brief Construct a pass with the default thresholds and switch optimizations.
       SimplifyCFGPass::SimplifyCFGPass()
           : BonusInstThreshold(UserBonusInstThreshold), LateSimplifyCFG(true) {}
     <-- switches get converted to lookup tables and loops may not be in canonical form
     If this is unintended, then it's possible that the current behavior of dropping the 'LateSimplifyCFG' setting via recursion was masking this bug.
  Differential Revision: https://reviews.llvm.org/D38138
  llvm-svn: 314308
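  A rough illustration of note 2 (a sketch only; the defaults and the threshold field are assumptions based on the quoted constructor, not the committed code):

      // Sketch of a struct housing the optional SimplifyCFG folds.
      struct SimplifyCFGOptions {
        int BonusInstThreshold = 1;
        bool ConvertSwitchToLookupTable = false;
        bool KeepCanonicalLoops = true;
      };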
* [PowerPC] eliminate unconditional branch to the next instruction (Hiroshi Inoue, 2017-09-27, 1 file, -0/+14)
  This patch makes analyzeBranch eliminate an unconditional branch to the next instruction. After basic blocks are re-organized by optimizers such as machine block placement, a BB may end with an unconditional branch to the next (fallthrough) BB. This patch removes such redundant branch instructions.
  Differential Revision: https://reviews.llvm.org/D37730
  llvm-svn: 314297
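  A minimal sketch of the shape of such a check inside a target's analyzeBranch, using the generic MachineBasicBlock API (illustrative, not the committed PPC code):

      // If the block ends in an unconditional branch whose target is the
      // very next block in layout order, the branch is redundant.
      MachineInstr &LastMI = MBB.back();
      if (LastMI.isUnconditionalBranch()) {
        MachineBasicBlock *Target = LastMI.getOperand(0).getMBB();
        if (MBB.isLayoutSuccessor(Target))
          LastMI.eraseFromParent(); // fall through instead of branching
      }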
* [X86][AsmParser] fix PR32035 (Coby Tayree, 2017-09-27, 1 file, -0/+2)
  Differential Revision: https://reviews.llvm.org/D37473
  llvm-svn: 314295
* [X86][AVX] Improve (i4 bitcast (v4i1 x)) handling for 256-bit vector compare results (Simon Pilgrim, 2017-09-27, 1 file, -1/+1)
  As commented on D37849 and rL313547, AVX1 targets were missing a chance to use vmovmskpd for v4f64/v4i64 results for bool vector bitcasts.
  llvm-svn: 314293
* [ARM] isTruncateFree fix (Sam Parker, 2017-09-27, 1 file, -6/+6)
  I implemented isTruncateFree in rL313533; this patch fixes the logic to match my comment, as the previous logic was too general. Now the only truncates that are free are i64 -> i32.
  Differential Revision: https://reviews.llvm.org/D38234
  llvm-svn: 314280
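  Roughly, the intended predicate reads like this sketch (hand-written to illustrate the i64 -> i32 rule; not the committed ARM code):

      // Only truncating a 64-bit scalar to 32 bits is free on ARM: it just
      // takes the low register of the GPR pair. Everything else costs an op.
      bool isTruncateFreeSketch(EVT SrcVT, EVT DstVT) {
        if (SrcVT.isVector() || DstVT.isVector())
          return false;
        return SrcVT == MVT::i64 && DstVT == MVT::i32;
      }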
* [X86] Fix SJLJ struct offsets for x86_64 (Martin Storsjo, 2017-09-27, 1 file, -2/+2)
  This is necessary, but not sufficient, for having working SJLJ exception handling on x86_64.
  Differential Revision: https://reviews.llvm.org/D38254
  llvm-svn: 314277
* [X86] Remove erroneous callsite offsetting in SJLJ landing pads (Martin Storsjo, 2017-09-27, 1 file, -6/+2)
  The callsite value is already stored indexed from 0 in the _Unwind_Context struct. When accessed via the functions _Unwind_GetIP and _Unwind_SetIP, the value is indexed from 1, but those functions handle the offsetting. When reading directly from the struct here, we shouldn't subtract 1.
  This matches the code generated by the ARM target, where SJLJ exception handling is used by default on iOS. This makes clang-built object files for 32-bit x86 mingw work when linked with libgcc/libstdc++.
  Differential Revision: https://reviews.llvm.org/D38251
  llvm-svn: 314276
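  The off-by-one being removed can be pictured with a small mock (the struct layout is hypothetical; the real SjLj _Unwind_Context is opaque to callers):

      // Hypothetical mock: the call-site slot is stored 0-indexed, and only
      // the accessor pair applies the +/-1 bias.
      struct MockSjLjContext {
        int CallSiteIndex; // stored indexed from 0
      };
      int Unwind_GetIP(const MockSjLjContext &C) { return C.CallSiteIndex + 1; }
      void Unwind_SetIP(MockSjLjContext &C, int IP) { C.CallSiteIndex = IP - 1; }
      // Code reading CallSiteIndex directly must NOT subtract 1 again;
      // doing so was the erroneous offsetting this patch removes.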
* [X86] Use extract128BitVector in LowerMULH so we can extract from constant build vectors (Craig Topper, 2017-09-27, 1 file, -5/+6)
  llvm-svn: 314274
* [AArch64][Falkor] Fix bug in Falkor prefetcher fix pass (Geoff Berry, 2017-09-26, 1 file, -3/+8)
  Summary: In rare cases, loads that were marked as strided but don't get prefetched could cause a crash if they occurred in a loop with other colliding loads.
  Reviewers: mcrosier
  Subscribers: aemerson, rengolin, javed.absar, kristof.beyls
  Differential Revision: https://reviews.llvm.org/D38261
  llvm-svn: 314252
* [AArch64][Falkor] Fix correctness bug in Falkor prefetcher fix pass and correct some opcode tag computations (Geoff Berry, 2017-09-26, 1 file, -49/+52)
  Summary: This addresses a correctness bug for LD[1234]*_POST opcodes that have the prefetcher fix applied to them: the base register was not being written back from the temp after being incremented, so it would appear to never be incremented.
  Also, fix some opcode tag computations based on updated HW details to get better tag avoidance and thus better prefetcher performance.
  Reviewers: mcrosier
  Subscribers: aemerson, rengolin, javed.absar, kristof.beyls
  Differential Revision: https://reviews.llvm.org/D38256
  llvm-svn: 314251
* [X86] Fix register class name in a comment. NFC (Craig Topper, 2017-09-26, 1 file, -1/+1)
  llvm-svn: 314250
* Recommit r314151 "[X86] Make all the NOREX CodeGenOnly instructions into postRA pseudos like the NOREX version of TEST." (Craig Topper, 2017-09-26, 4 files, -27/+37)
  The late MOV8rr_NOREX that caused the crash has been removed.
  llvm-svn: 314249
* [X86] Don't emit X86::MOV8rr_NOREX from X86InstrInfo::copyPhysReg (Craig Topper, 2017-09-26, 1 file, -7/+5)
  This hook is called after register allocation with two physical registers. We don't need a separate instruction at that time to force register class constraints. I left in the assert though. We also have a fatal error in X86MCCodeEmitter if we ever encode an H-reg and a REX prefix.
  llvm-svn: 314248
* [X86] Fix typo in comment. NFC (Craig Topper, 2017-09-26, 1 file, -1/+1)
  llvm-svn: 314247
* [PowerPC] Reverting sequence of patches for elimination of comparison instructions (Nemanja Ivanovic, 2017-09-26, 1 file, -1061/+0)
  In the past while, I've committed a number of patches in the PowerPC back end aimed at eliminating comparison instructions. However, this causes some failures in proprietary source, and these issues are not observed in SPEC or any open source packages I've been able to run. As a result, I'm pulling the entire series and will refactor it to:
  - Have a single entry point for easy control
  - Have fine-grained control over which patterns we transform
  A side effect of this is that test cases for these patches (and those modified by them) are XFAIL-ed. This is a temporary measure, as it is counter-productive to remove/modify these test cases and then have to modify them again when the refactored patch is recommitted.
  The failure will be investigated in parallel to the refactoring effort, and the recommit will either have a fix for it or will leave this transformation off by default until the problem is resolved.
  llvm-svn: 314244
* [X86][LLVM] Expanding support for lowerInterleavedStore() in X86InterleavedAccess (VF{8|16|32} stride 3) (Michael Zuckerman, 2017-09-26, 1 file, -3/+139)
  This patch expands the support of lowerInterleavedStore to {8|16|32}xi8 stride 3. LLVM creates suboptimal shuffle code-gen for AVX2. Overall, this patch is a specific fix for the pattern (Stride=3, VF={8|16|32}). It is part two of two patches and covers the store (interleaved) side.
  The patch goal is to optimize the following sequence:
    a0 a1 a2 a3 a4 a5 a6 a7
    b0 b1 b2 b3 b4 b5 b6 b7
    c0 c1 c2 c3 c4 c5 c6 c7
  into
    a0 b0 c0 a1 b1 c1 a2 b2 c2 a3 b3 c3 a4 b4 c4 a5 b5 c5 a6 b6 c6 a7 b7 c7
  Reviewers: zvi, guyblank, dorit, Ayal
  Differential Revision: https://reviews.llvm.org/D37117
  Change-Id: I56ced8bcbea809a37654060771911ade20246ccc
  llvm-svn: 314234
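  A scalar reference for the stride-3 store pattern being targeted (illustrative only; the patch itself emits shuffle sequences, not a loop):

      #include <cstddef>
      #include <cstdint>

      // Interleave three source rows a/b/c of n elements each into one
      // stride-3 output buffer: out = a0 b0 c0 a1 b1 c1 ...
      void interleave3(const uint8_t *a, const uint8_t *b, const uint8_t *c,
                       uint8_t *out, size_t n) {
        for (size_t i = 0; i < n; ++i) {
          out[3 * i + 0] = a[i];
          out[3 * i + 1] = b[i];
          out[3 * i + 2] = c[i];
        }
      }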
* [NVPTX] added match.{any,all}.sync instructions, intrinsics & builtins (Artem Belevich, 2017-09-26, 4 files, -0/+93)
  Differential Revision: https://reviews.llvm.org/D38191
  llvm-svn: 314223
* [X86] Add support for v16i32 UMUL_LOHI/SMUL_LOHI (Craig Topper, 2017-09-26, 1 file, -17/+20)
  Summary: This patch extends the v8i32/v4i32 custom lowering to support v16i32.
  Reviewers: zvi, RKSimon
  Reviewed By: RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D38274
  llvm-svn: 314221
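  For context, the per-lane semantics of UMUL_LOHI (a scalar model, not the lowering itself; SMUL_LOHI is the signed analogue):

      #include <cstdint>

      // One i32 lane of UMUL_LOHI: a full 32x32->64 multiply split into
      // its low and high halves.
      void umul_lohi32(uint32_t a, uint32_t b, uint32_t &lo, uint32_t &hi) {
        uint64_t wide = static_cast<uint64_t>(a) * b;
        lo = static_cast<uint32_t>(wide);
        hi = static_cast<uint32_t>(wide >> 32);
      }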
* [Hexagon] Fix a typo: #ifndef DEBUG -> #ifndef NDEBUG (Krzysztof Parzyszek, 2017-09-26, 1 file, -1/+1)
  llvm-svn: 314216
* [Hexagon] Fix initialization of HexagonSubtarget (Krzysztof Parzyszek, 2017-09-26, 2 files, -38/+18)
  Make sure that "initializeSubtargetDependencies" sets all members that InstrInfo and the like may depend on.
  llvm-svn: 314214
* [X86][XOP] Merge rotation opcodes with AVX512 equivalents. NFCI. (Simon Pilgrim, 2017-09-26, 5 files, -26/+19)
  The XOP rotations act as ROTL with positive values and ROTR with negative values, which means that we can treat them all as ROTL with an unsigned modulo. We already check that we're only trying to lower as ROTL for XOP rotations.
  Differential Revision: https://reviews.llvm.org/D37949
  llvm-svn: 314207
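  The identity being relied on, in scalar form (a sketch; the byte width is chosen only for brevity): rotating right by n equals rotating left by -n reduced modulo the bit width.

      #include <cstdint>

      // Rotate an 8-bit value left, reducing the amount modulo 8. A
      // "negative" amount reinterpreted as unsigned becomes a right
      // rotation: rotl8(x, -3) == rotr(x, 3), since unsigned(-3) & 7 == 5.
      uint8_t rotl8(uint8_t x, int amt) {
        unsigned a = static_cast<unsigned>(amt) & 7;
        return static_cast<uint8_t>((x << a) | (x >> ((8 - a) & 7)));
      }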
* [x86] fix PR29061 (Coby Tayree, 2017-09-26, 1 file, -6/+8)
  https://bugs.llvm.org/show_bug.cgi?id=29061
  Don't try referencing REX-needed registers when not in 64-bit mode. This aligns with GCC.
  Differential Revision: https://reviews.llvm.org/D37801
  llvm-svn: 314203
* Revert "[X86] Make all the NOREX CodeGenOnly instructions into postRA ↵Benjamin Kramer2017-09-264-37/+27
| | | | | | | | pseudos like the NOREX version of TEST." Makes llc crash. This reverts commit r314151. llvm-svn: 314199
* [X86] Finishing broadcastf32x2 and broadcasti32x2 intrinsics lowering to IR, llvm side (Uriel Korach, 2017-09-26, 1 file, -10/+0)
  Removing the X86 broadcast(f/i)32x2 intrinsics from llvm; adding autoUpgrade support; moving matching tests from avx512dq-intrinsics.ll to avx512dq-intrinsics-upgrade.ll and from avx512dqvl-intrinsics.ll to avx512dqvl-intrinsics-upgrade.ll.
  Differential Revision: https://reviews.llvm.org/D38220
  llvm-svn: 314195
* [AVR] Prefer BasicBlock::getIterator over Function::begin() (Dylan McKay, 2017-09-26, 1 file, -1/+1)
  Thanks to Eli Friedman for the suggestion.
  llvm-svn: 314182
* [AVR] When lowering shifts into loops, put newly generated MBBs in the same spot as the original MBB (Dylan McKay, 2017-09-26, 1 file, -2/+4)
  Discovered in avr-rust/rust#62: https://github.com/avr-rust/rust/issues/62
  Patch by Gergo Erdi.
  llvm-svn: 314180
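  The usual shape of this placement in a custom inserter looks roughly like the sketch below (block names are made up, not the AVR backend's):

      // Insert the new blocks immediately after the original block in the
      // MachineFunction's layout instead of appending them at the end.
      MachineFunction *MF = BB->getParent();
      MachineFunction::iterator I = std::next(BB->getIterator());
      MF->insert(I, LoopBB);
      MF->insert(I, ExitBB);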
* [AVR] Use 1-byte alignment for all data types (Dylan McKay, 2017-09-26, 1 file, -1/+1)
  This was an oversight in the original backend data layout. The AVR architecture does not have the concept of unaligned loads - all loads/stores from all addresses are aligned to one byte.
  Discovered in avr-rust issue #64: https://github.com/avr-rust/rust/issues/64
  Patch by Gergo Erdi.
  llvm-svn: 314179
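  In DataLayout terms this means 8-bit ABI alignment for every scalar type; the string below is an assumed illustration of that shape, not necessarily the exact string committed:

      // Hypothetical AVR-style data layout: little-endian, 16-bit pointers
      // with byte alignment, and all integer/float types byte-aligned too.
      static const char *AVRDataLayoutSketch =
          "e-p:16:8-i8:8-i16:8-i32:8-i64:8-f32:8-f64:8-n8";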
* Revert r312724 ("[ARM] Remove redundant vcvt patterns.") (Eli Friedman, 2017-09-25, 1 file, -0/+14)
  It leads to some improvements, but also a regression for the simple case, so it's not clearly a good idea. test/CodeGen/ARM/vcvt.ll now has test coverage to show the difference.
  Ultimately, the right solution is probably to custom-lower fp-to-int conversions, to something like ARMISD::VCVT_F32_S32 plus a bitcast. It's hard to do the right thing when the implicit bitcast isn't visible to DAG transforms.
  llvm-svn: 314169
* X86: remove R12 from CSR on Windows x64 SwiftCC (Saleem Abdulrasool, 2017-09-25, 2 files, -20/+21)
  R12 is used for the SwiftError parameter. It is no longer a CSR, as it is used to transfer the SwiftError, and the caller must preserve it if they need to.
  llvm-svn: 314165
* [X86] Don't select anyext GR32->GR64 to SUBREG_TO_REG. Use INSERT_SUBREG instead. (Craig Topper, 2017-09-25, 1 file, -1/+1)
  As far as I know, SUBREG_TO_REG states that the upper bits are 0. But if we are just converting the GR32 with no checks, then we have no reason to say the upper bits are 0.
  I don't really know how to test this today since I can't find anything that looks that closely at SUBREG_TO_REG. The test changes here seem to be some perturbation of register allocation.
  Differential Revision: https://reviews.llvm.org/D38001
  llvm-svn: 314152
* [X86] Make all the NOREX CodeGenOnly instructions into postRA pseudos like the NOREX version of TEST (Craig Topper, 2017-09-25, 4 files, -27/+37)
  llvm-svn: 314151
* [Hexagon] Avoid unused variable warnings in Release builds (Benjamin Kramer, 2017-09-25, 1 file, -2/+4)
  No functionality change intended.
  llvm-svn: 314143
* Revert "[NVPTX] added match.{any,all}.sync instructions, intrinsics & ↵Justin Lebar2017-09-254-92/+0
| | | | | | | | | | | | | | | builtins.", rL314135. Causing assertion failures on macos: > Assertion failed: (Num < NumOperands && "Invalid child # of SDNode!"), > function getOperand, file > /Users/buildslave/jenkins/workspace/clang-stage1-cmake-RA-incremental/llvm/include/llvm/CodeGen/SelectionDAGNodes.h, > line 835. http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-incremental/42739/testReport/LLVM/CodeGen_NVPTX/surf_read_cuda_ll/ llvm-svn: 314142
* [X86] [ASM INTEL SYNTAX] fix for incorrect assembler code generation when x86-asm-syntax=intel (PR34617) (Konstantin Belochapka, 2017-09-25, 1 file, -0/+1)
  Fix for incorrect code generation when x86-asm-syntax=intel.
  Differential Revision: https://reviews.llvm.org/D37945
  llvm-svn: 314140
* [SelectionDAG] Teach simplifyDemandedBits to handle shifts by constant splat vectors (Craig Topper, 2017-09-25, 1 file, -0/+6)
  This teaches simplifyDemandedBits to handle constant splat vector shifts. This required changing some uses of getZExtValue to getLimitedValue, since we can't rely on legalization using getShiftAmountTy for the shift amount.
  I believe there may have been a bug in the ((X << C1) >>u ShAmt) handling where we didn't check if the inner shift was too large. I've fixed that here.
  I had to add new patterns to ARM because the zext/sext that the patterns were trying to look for got turned into an any_extend with this patch. Happy to split that out too, but not sure how to test without this change.
  Differential Revision: https://reviews.llvm.org/D37665
  llvm-svn: 314139
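  The shape of the splat-aware handling, sketched with the generic isConstOrConstSplat helper and APInt::getLimitedValue as the message describes (an illustration, not the committed diff):

      // Accept a shift amount that is a scalar constant or a splat of one.
      // getLimitedValue clamps oversized amounts, whereas getZExtValue
      // would assert on a value wider than 64 bits.
      if (ConstantSDNode *SA = isConstOrConstSplat(Op.getOperand(1))) {
        unsigned BitWidth = Op.getScalarValueSizeInBits();
        unsigned ShAmt = SA->getAPIntValue().getLimitedValue(BitWidth);
        if (ShAmt >= BitWidth)
          return false; // out-of-range shift; nothing to simplify
        // ... proceed with the demanded-bits computation using ShAmt ...
      }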
* [Hexagon] Better determination of register classes in bit tracker (Krzysztof Parzyszek, 2017-09-25, 4 files, -16/+68)
  Add two callbacks to MachineEvaluator, so that specific implementations can specify more details about register classes:
  - composeWithSubRegIndex(RC, Idx): provide the register class for a register from RC used in conjunction with a subregister index Idx.
  - getPhysRegBitWidth(Reg): provide the size in bits of the given physical register.
  llvm-svn: 314136
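  Declared roughly like this (a sketch reconstructed from the names above; the exact signatures in the tree may differ):

      // The two new MachineEvaluator hooks described above.
      struct MachineEvaluator {
        // Register class for a register from RC accessed via subreg index Idx.
        virtual const TargetRegisterClass &
        composeWithSubRegIndex(const TargetRegisterClass &RC,
                               unsigned Idx) const;
        // Size in bits of the given physical register.
        virtual uint16_t getPhysRegBitWidth(unsigned Reg) const;
      };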
* [NVPTX] added match.{any,all}.sync instructions, intrinsics & builtins (Artem Belevich, 2017-09-25, 4 files, -0/+92)
  Differential Revision: https://reviews.llvm.org/D38191
  llvm-svn: 314135
* [Hexagon] Make getHexagonSubRegIndex take reference instead of pointer (Krzysztof Parzyszek, 2017-09-25, 5 files, -16/+17)
  llvm-svn: 314134
* [AVX-512] Replace large number of explicit patterns that check for insert_subvector with zero after masked compares with fewer patterns that use a predicate (Craig Topper, 2017-09-25, 3 files, -632/+116)
  This replaces the large number of patterns that handle every possible case of zeroing after a masked compare with a few simpler patterns that use a predicate to check for a masked-compare producer. This is similar to what we do for detecting free GR32->GR64 zero extends and free xmm->ymm/zmm zero extends.
  This shrinks the isel table from ~590k to ~531k, a roughly 10% reduction in size.
  Differential Revision: https://reviews.llvm.org/D38217
  llvm-svn: 314133
* ARM: One more fix for swifterror CSR set (Arnold Schwaighofer, 2017-09-25, 2 files, -3/+11)
  We use a differently ordered CSR set if the frame pointer is pushed. Add a matching ..._SwiftError version.
  llvm-svn: 314128
* [ARM] Fix -Wdangling-else warning (Benjamin Kramer, 2017-09-25, 1 file, -8/+4)
  A ternary is clearer here. No functionality change.
  llvm-svn: 314123
* ARM: Use the proper swifterror CSR list on platforms other than darwin (Arnold Schwaighofer, 2017-09-25, 2 files, -2/+11)
  Noticed by inspection.
  llvm-svn: 314121
* [X86][LLVM] Expanding support for lowerInterleavedStore() in X86InterleavedAccess (VF8 stride 4) (Michael Zuckerman, 2017-09-25, 1 file, -1/+46)
  This patch expands the support of lowerInterleavedStore to 8xi8 stride 4. LLVM creates suboptimal shuffle code-gen for AVX2. Overall, this patch is a specific fix for the pattern (Stride=4, VF=8), and we plan to include more patterns in the future.
  The patch goal is to optimize the following sequence: at the end of the computation, we have xmm2, xmm0, xmm12 and xmm3, each holding 8 chars:
    c0, c1, ..., c7
    m0, m1, ..., m7
    y0, y1, ..., y7
    k0, k1, ..., k7
  These need to be transposed/interleaved and stored like so:
    c0 m0 y0 k0 c1 m1 y1 k1 c2 m2 y2 k2 c3 m3 y3 k3 ...
  Reviewers: DavidKreitzer, Farhana, zvi, igorb, guyblank, RKSimon, Ayal
  Differential Revision: https://reviews.llvm.org/D36058
  Change-Id: I3cc5c2ca5d6318901c192a4428493b99ef424c32
  llvm-svn: 314109
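  The same idea as the stride-3 entry above, now with four channels; a scalar reference (illustrative only):

      #include <cstddef>
      #include <cstdint>

      // Transpose four 8-element rows (c/m/y/k) into stride-4 pixel order:
      // out = c0 m0 y0 k0 c1 m1 y1 k1 ...
      void interleave4x8(const uint8_t c[8], const uint8_t m[8],
                         const uint8_t y[8], const uint8_t k[8],
                         uint8_t out[32]) {
        for (size_t i = 0; i < 8; ++i) {
          out[4 * i + 0] = c[i];
          out[4 * i + 1] = m[i];
          out[4 * i + 2] = y[i];
          out[4 * i + 3] = k[i];
        }
      }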
* [PowerPC] Eliminate compares - add i64 sext/zext handling for SETLT/SETGT (Nemanja Ivanovic, 2017-09-25, 1 file, -2/+76)
  As mentioned in https://reviews.llvm.org/D33718, this simply adds another pattern to the compare elimination sequence and is committed without a differential review.
  llvm-svn: 314106
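  In scalar terms (an illustration of the node semantics, not the PPC patterns themselves), zext of a SETLT result yields 0 or 1 while sext yields 0 or -1:

      #include <cstdint>

      // zext(i1 setlt(a, b)) -> 0 or 1; sext(i1 setlt(a, b)) -> 0 or -1.
      int64_t zext_setlt(int64_t a, int64_t b) { return a < b ? 1 : 0; }
      int64_t sext_setlt(int64_t a, int64_t b) { return a < b ? -1 : 0; }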
* [AArch64] Add basic support for Qualcomm's Saphira CPU (Chad Rosier, 2017-09-25, 3 files, -0/+21)
  llvm-svn: 314105
* Adding missing feature to goldmont (Michael Zuckerman, 2017-09-25, 1 file, -1/+2)
  Change-Id: I1ddc619169fae6a56308deef8dae5db3da702cf4
  llvm-svn: 314103