path: root/llvm/test/CodeGen/X86
Commit message, Author, Age, Files, Lines
* [MachineBlockPlacement] Let the target optimize the branches at the end.Quentin Colombet2016-05-021-4/+4
| | | | | | | | | | | | | | | After the layout of the basic blocks is set, the target may be able to get rid of unconditional branches to fallthrough blocks that the generic code does not catch. This happens any time TargetInstrInfo::AnalyzeBranch is not able to analyze all the branches involved in the terminator sequence, while still understanding a few of them. In such a situation, AnalyzeBranch can directly modify the branches if it has been instructed to do so. This patch takes advantage of that. llvm-svn: 268328
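The cleanup described above can be pictured with a toy model. This is only a sketch under stated assumptions: blocks are plain tuples, `drop_fallthrough_jumps` is a hypothetical helper, and nothing here models AnalyzeBranch or partially analyzable terminator sequences.

```python
# Toy model (not LLVM code): each block is (name, terminator), where the
# terminator is ("jmp", target) or None for "falls through or returns".
def drop_fallthrough_jumps(layout):
    """Remove an unconditional jump whose target is the next block in layout."""
    result = []
    for i, (name, term) in enumerate(layout):
        next_name = layout[i + 1][0] if i + 1 < len(layout) else None
        if term is not None and term[0] == "jmp" and term[1] == next_name:
            term = None  # redundant: control reaches the target by falling through
        result.append((name, term))
    return result


layout = [
    ("entry", ("jmp", "body")),  # body is next in layout: jump is redundant
    ("body", ("jmp", "exit")),   # cold sits in between: jump must stay
    ("cold", ("jmp", "exit")),   # exit is next in layout: jump is redundant
    ("exit", None),
]
cleaned = drop_fallthrough_jumps(layout)
```

Only the jumps whose target immediately follows in the final layout disappear, which is why this kind of cleanup has to run after block placement is fixed.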
* [X86] Model FAULTING_LOAD_OP as a terminator and branch.Quentin Colombet2016-05-021-24/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | This operation may branch to the handler block and we do not want it to happen anywhere within the basic block. Moreover, by marking it "terminator and branch" the machine verifier does not wrongly assume (because of AnalyzeBranch not knowing better) the branch is analyzable. Indeed, the target was seeing only the unconditional branch and not the faulting load op and thought it was a simple unconditional block. The machine verifier was complaining because of that and moreover, other optimizations could have performed wrong transformations! In the process, simplify the representation of the handler block in the faulting load op. Now, we directly reference the handler block instead of using a label. This has the benefits of: 1. MC knows how to issue a label for a BB, so leave that to it. 2. Accessing the target BB from its label is painful, whereas it is direct from a MBB operand. Note: The 2-byte offset in implicit-null-check.ll comes from the fact that the unconditional jumps are not removed anymore, as the whole terminator sequence is not analyzable anymore. Will fix it in a subsequent commit. llvm-svn: 268327
* [X86][AVX2] Added 128-bit wide shuffle testSimon Pilgrim2016-05-021-0/+14
| | | | | | Demonstrate missing 128-bit wide shuffle combine support llvm-svn: 268290
* Enable the X86 call frame optimization for the 64-bit targets that allow it.David L Kreitzer2016-05-021-0/+193
| | | | | | | | Fixes PR27241. Differential Revision: http://reviews.llvm.org/D19688 llvm-svn: 268227
* [AVX512] VPACKUSWB/VPACKSSWB should not be encoded with EVEX.W=1. While ↵Craig Topper2016-05-011-12/+12
| | | | | | there fix the execution domain for VPACKSSDW/VPACKUSDW. llvm-svn: 268200
* getelementptr instruction, support index vector of EVT.Igor Breger2016-05-011-0/+9
| | | | | | Differential Revision: http://reviews.llvm.org/D19775 llvm-svn: 268195
* Change AVX512 broadcastsd/ss patterns interaction with spilling. New ↵Igor Breger2016-05-013-0/+231
| | | | | | | | implementation takes a scalar register and generates a vector without COPY_TO_REGCLASS (turn it into a VR128 register). The issue is that during register allocation we may spill a scalar value using 128-bit loads and stores, wasting cache bandwidth. Differential Revision: http://reviews.llvm.org/D19579 llvm-svn: 268190
* [AVX512] Prefer AVX512 VPACK instructions over AVX/AVX2 instructions when ↵Craig Topper2016-05-011-8/+8
| | | | | | VLX and BWI are supported. llvm-svn: 268189
* [MBP] Use Function::optForSize() instead of checking OptimizeForSize directly.Haicheng Wu2016-04-291-0/+28
| | | | | | Fix a FIXME. Disable loop alignment if compiled with -Oz now. llvm-svn: 268121
* Differential Revision: http://reviews.llvm.org/D19733Sriraman Tallam2016-04-294-12/+29
| | | | llvm-svn: 268106
* DAGCombiner: Reduce truncated shl widthMatt Arsenault2016-04-291-0/+14
| | | | llvm-svn: 268094
* [X86] Enable the post-RA-scheduler for clang's default 32-bit cpu.Mitch Bodart2016-04-271-0/+40
| | | | | | | | | For compilations with no explicit cpu specified, this exhibits nice gains on Silvermont, with neutral performance on big cores. Differential Revision: http://reviews.llvm.org/D19138 llvm-svn: 267809
* [X86][FastISel] Make sure we use the right register class when we select stores.Quentin Colombet2016-04-271-3/+3
| | | | llvm-svn: 267806
* [X86] Fix the lowering of TLS calls.Quentin Colombet2016-04-272-4/+9
| | | | | | | | | The callseq_end node must be glued with the TLS calls, otherwise, the generic code will miss the uses of the returned value and will mark it dead. Moreover, the TLSCall 64-bit pseudo must not set an implicit use on RDI: the pseudo uses the symbol address at this point, not RDI, and the lowering will do the right thing. llvm-svn: 267797
* [X86]: Quit promoting 16 bit loads to 32 bit.Kevin B. Smith2016-04-271-1/+2
| | | | | | Differential Revision: http://reviews.llvm.org/D19592 llvm-svn: 267773
* Revert r267649, it caused PR27539.Nico Weber2016-04-271-973/+0
| | | | llvm-svn: 267723
* [X86] Don't assume that MMX extractelts are from index 0.Ahmed Bougacha2016-04-271-0/+28
| | | | | | | It's probably the case for all 3 MMX users out there, but with hand-crafted IR, you can trigger selection failures. Fix that. llvm-svn: 267652
* [X86] Re-enable MMX i32 extractelt combine.Ahmed Bougacha2016-04-271-0/+15
| | | | | | | | | This effectively adds back the extractelt combine removed by r262358: the direct case can still occur (because x86_mmx is special, see r262446), but it's the indirect case that's now superseded by the generic combine. llvm-svn: 267651
* Detects the SAD pattern on X86 so that much better code will be emitted once ↵Cong Hou2016-04-271-0/+973
| | | | | | | | the pattern is matched. Differential revision: http://reviews.llvm.org/D14840 llvm-svn: 267649
* [X86] Make sure it is safe to clobber EFLAGS, if need be, when choosingQuentin Colombet2016-04-261-0/+44
| | | | | | | | | | | | | | | the prologue. Do not use basic blocks that have EFLAGS live-in as prologue if we need to realign the stack. Realigning the stack uses an AND instruction and this clobbers EFLAGS. Another alternative would have been to save and restore EFLAGS around the stack realignment code, but this is likely inefficient. Fixes PR27531. llvm-svn: 267634
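A small sketch of the constraint, assuming a simplified model in which a block's live-ins are a set of register names. `realign_down` and `can_host_prologue` are hypothetical helpers for illustration, not functions from the X86 backend.

```python
def realign_down(sp, alignment):
    """Round a stack pointer down to a power-of-two alignment.

    This is the arithmetic an AND with -alignment performs; on x86 the
    AND instruction also rewrites EFLAGS as a side effect.
    """
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return sp & -alignment


def can_host_prologue(live_ins, needs_stack_realignment):
    """A block is rejected if realignment would clobber a live EFLAGS."""
    return not (needs_stack_realignment and "EFLAGS" in live_ins)


aligned = realign_down(0x7FFC1234, 32)
```

With these inputs `aligned` is `0x7FFC1220`, and a block with EFLAGS live-in is only rejected when realignment is actually needed.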
* [X86] Replace -mcpu with -mattr in several testsMitch Bodart2016-04-264-5/+5
| | | | | | Differential Revision: http://reviews.llvm.org/D19568 llvm-svn: 267629
* [MachineBasicBlock] Take advantage of the partially dead information.Quentin Colombet2016-04-261-2/+1
| | | | | | | Thanks to that information, we no longer claim that a register is live when it is not. llvm-svn: 267622
* [MachineInstrBundle] Improve the recognition of dead definitions.Quentin Colombet2016-04-261-2/+1
| | | | | | | Now, it is possible to know that partial definitions are dead definitions and recognize that clobbered registers are also dead. llvm-svn: 267621
* Swift Calling Convention: use %RAX for sret.Manman Ren2016-04-261-19/+24
| | | | | | | We don't need to copy the sret argument into %rax upon return. rdar://25671494 llvm-svn: 267579
* [CodeGenPrepare] use branch weight metadata to decide if a select should be ↵Sanjay Patel2016-04-261-3/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | turned into a branch This is part of solving PR27344: https://llvm.org/bugs/show_bug.cgi?id=27344 CGP should undo the SimplifyCFG transform for the same reason that earlier patches have used this same mechanism: it's possible that passes between SimplifyCFG and CGP may be able to optimize the IR further with a select in place. For the TLI hook default, >99% taken or not taken is chosen as the default threshold for a highly predictable branch. Even the most limited HW branch predictors will be correct on this branch almost all the time, so even a massive mispredict penalty perf loss would be overcome by the win from all the times the branch was predicted correctly. As a follow-up, we could make the default target hook less conservative by using the SchedMachineModel's MispredictPenalty. Or we could just let targets override the default by implementing the hook with that and other target-specific options. Note that trying to statically determine mispredict rates for close-to-balanced profile weight data is generally impossible if the HW is sufficiently advanced. I.e., 50/50 taken/not-taken might still be 100% predictable. Finally, note that this patch as-is will not solve PR27344 because the current __builtin_unpredictable() branch weight default values are 4 and 64. A proposal to change that is in D19435. Differential Revision: http://reviews.llvm.org/D19488 llvm-svn: 267572
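The threshold arithmetic above can be checked with a short sketch. The names `PREDICTABLE_THRESHOLD` and `is_highly_predictable` are hypothetical and do not correspond to the actual TLI hook; only the ">99%" default and the 4/64 weights come from the message.

```python
from fractions import Fraction

# Assumption: the ">99% taken or not taken" default, kept as an exact fraction.
PREDICTABLE_THRESHOLD = Fraction(99, 100)


def is_highly_predictable(true_weight, false_weight,
                          threshold=PREDICTABLE_THRESHOLD):
    """Return True if one direction is taken more often than the threshold."""
    total = true_weight + false_weight
    if total == 0:
        return False  # no profile data, nothing to conclude
    likely = Fraction(max(true_weight, false_weight), total)
    return likely > threshold


# Weights of 64 and 4 give 64/68, roughly 94.1%, which is below 99%.
with_default_weights = is_highly_predictable(64, 4)
```

This is why weights of 4 and 64 do not trigger the transform: the likelier direction only reaches about 94.1%.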
* [X86] PR27502: Fix the LEA optimization pass.Andrey Turetskiy2016-04-262-1/+21
| | | | | | | | Handle MachineBasicBlock as a memory displacement operand in the LEA optimization pass. Differential Revision: http://reviews.llvm.org/D19409 llvm-svn: 267551
* [X86] Use LivePhysRegs in X86FixupBWInsts.Ahmed Bougacha2016-04-263-14/+19
| | | | | | | | | Kill-flags, which computeRegisterLiveness uses, are not reliable. LivePhysRegs is. Differential Revision: http://reviews.llvm.org/D19472 llvm-svn: 267495
* add tests for potential CGP transform (PR27344)Sanjay Patel2016-04-251-0/+32
| | | | llvm-svn: 267426
* [x86] auto-generate checks for cmov testsSanjay Patel2016-04-251-14/+32
| | | | llvm-svn: 267417
* [WinEH] Update SplitAnalysis::computeLastSplitPoint to cope with multiple EH ↵David Majnemer2016-04-251-0/+67
| | | | | | | | | | | | | | | | | | | successors We didn't have logic to correctly handle CFGs where there was more than one EH-pad successor (these are novel with WinEH). There were situations where a register was live in one exceptional successor but not another but the code as written would only consider the first exceptional successor it found. This resulted in split points which were insufficiently early if an invoke was present. This fixes PR27501. N.B. This removes getLandingPadSuccessor. llvm-svn: 267412
* Fixing wrong mask size error. From __mmask8 to __mmask16.Michael Zuckerman2016-04-251-5/+5
| | | | | | | Was reviewed over the shoulder by AsafBadouh. Connected to review http://reviews.llvm.org/D19195. llvm-svn: 267379
* [X86] Add a complete set of tests for all operand sizes of cttz/ctlz with ↵Craig Topper2016-04-251-6/+123
| | | | | | and without zero undef being lowered to bsf/bsr. llvm-svn: 267373
* [X86][AVX] Added PR24935 test caseSimon Pilgrim2016-04-241-0/+39
| | | | llvm-svn: 267362
* [X86][SSE] Added SSSE3/AVX/AVX2 BITREVERSE testsSimon Pilgrim2016-04-241-52/+14603
| | | | | | Codegen is pretty bad at the moment but could use PSHUFB quite efficiently llvm-svn: 267347
* [X86][XOP] Fixed VPPERM permute op decoding (PR27472).Simon Pilgrim2016-04-241-1/+1
| | | | | | Fixed issue with VPPERM target shuffle mask decoding that was incorrectly masking off the 3-bit permute op with a 2-bit mask. llvm-svn: 267346
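To illustrate the kind of decoding error described, here is a sketch that assumes the VPPERM selector byte keeps the source byte index in bits 4:0 and the permute op in bits 7:5, with op 4 meaning "write 0x00". The helper `decode_permute_op` is hypothetical and is not the LLVM decoder.

```python
def decode_permute_op(selector, op_mask=0x7):
    """Extract the permute op from bits 7:5 of a selector byte."""
    return (selector >> 5) & op_mask


def decode_source_index(selector):
    """Extract the source byte index from bits 4:0 of a selector byte."""
    return selector & 0x1F


selector = (4 << 5) | 3                        # op 4 (zero the byte), source byte 3
correct_op = decode_permute_op(selector)       # 3-bit mask keeps op 4
buggy_op = decode_permute_op(selector, 0x3)    # 2-bit mask drops the top bit
```

With a 2-bit mask, op 4 is read back as op 0 (plain copy of the source byte), so a byte that should be zeroed looks like an ordinary shuffle element.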
* [X86][SSE] Improved support for decoding target shuffle masks through bitcastsSimon Pilgrim2016-04-242-13/+3
| | | | | | Reused the ability to split constants of a type wider than the shuffle mask to work with masks generated from scalar constants transferred to xmm. This fixes an issue preventing PSHUFB target shuffle masks decoding rematerialized scalar constants and also exposes the XOP VPPERM bug described in PR27472. llvm-svn: 267343
* [X86][SSE] Demonstrate issue with decoding shuffle masks that have been ↵Simon Pilgrim2016-04-242-0/+37
| | | | | | | | lowered as rematerialized constants on scalar unit Found whilst investigating PR27472 llvm-svn: 267339
* [X86] Fix patterns that turn cmove/cmovne+ctlz/cttz into lzcnt/tzcnt ↵Craig Topper2016-04-241-209/+0
| | | | | | instructions. Only one of the conditions should be valid for each pattern, not both. Update tests accordingly. llvm-svn: 267311
* DebugInfo: Remove MDString-based type referencesDuncan P. N. Exon Smith2016-04-232-14/+14
| | | | | | | | | | | | | | | | | | | | | | | | Eliminate DITypeIdentifierMap and make DITypeRef a thin wrapper around DIType*. It is no longer legal to refer to a DICompositeType by its 'identifier:', and DIBuilder no longer retains all types with an 'identifier:' automatically. Aside from the bitcode upgrade, this is mainly removing logic to resolve an MDString-based reference to an actual DIType. The commits leading up to this have made the implicit type map in DICompileUnit's 'retainedTypes:' field superfluous. This does not remove DITypeRef, DIScopeRef, DINodeRef, and DITypeRefArray, or stop using them in DI-related metadata. Although as of this commit they aren't serving a useful purpose, there are patches under review to reuse them for CodeView support. The tests in LLVM were updated with deref-typerefs.sh, which is attached to the thread "[RFC] Lazy-loading of debug info metadata": http://lists.llvm.org/pipermail/llvm-dev/2016-April/098318.html llvm-svn: 267296
* [X86][XOP] Added VPPERM -> BLEND-WITH-ZERO TestSimon Pilgrim2016-04-231-0/+9
| | | | | | Currently failing due to poor blend matching, found whilst investigating PR27472 llvm-svn: 267282
* [CodeGen] When promoting CTTZ operations to larger type, don't insert a ↵Craig Topper2016-04-231-58/+3
| | | | | | select to detect if the input is zero to return the original size instead of the extended size. Instead just set the first bit in the zero extended part. llvm-svn: 267280
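The trick can be checked exhaustively for a small width. This sketch assumes an 8-bit value promoted to 32 bits; `cttz` and `promoted_cttz` are hypothetical helpers that model the arithmetic only, not the legalizer code.

```python
def cttz(value, width):
    """Count trailing zeros of a width-bit value; returns width for zero."""
    value &= (1 << width) - 1
    if value == 0:
        return width
    return (value & -value).bit_length() - 1


def promoted_cttz(value, narrow=8, wide=32):
    """CTTZ of a narrow value computed in a wider type, without a select."""
    extended = value & ((1 << narrow) - 1)  # zero extension to the wide type
    extended |= 1 << narrow                 # set the first bit of the extended part
    return cttz(extended, wide)


zero_result = promoted_cttz(0)
```

Because bit 8 is always set, a zero input yields 8, the original width, and no nonzero input is affected since its lowest set bit sits below bit 8.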
* Differential Revision: http://reviews.llvm.org/D19040Sriraman Tallam2016-04-222-4/+123
| | | | llvm-svn: 267229
* DAGCombiner: Relax alignment restriction when changing store typeMatt Arsenault2016-04-221-1/+1
| | | | | | If the target allows the alignment, this should be OK. llvm-svn: 267217
* CodeGen: Use PLT relocations for relative references to unnamed_addr functions.Peter Collingbourne2016-04-222-0/+35
| | | | | | | | | | | | | The relative vtable ABI (PR26723) needs PLT relocations to refer to virtual functions defined in other DSOs. The unnamed_addr attribute means that the function's address is not significant, so we're allowed to substitute it with the address of a PLT entry. Also includes a bonus feature: addends for COFF image-relative references. Differential Revision: http://reviews.llvm.org/D17938 llvm-svn: 267211
* DAGCombiner: Relax alignment restriction when changing load typeMatt Arsenault2016-04-223-5/+5
| | | | | | If the target allows the alignment, this should still be OK. llvm-svn: 267209
* Emit code16 in assembly in 16-bit modeNirav Dave2016-04-221-0/+20
| | | | | | | | | | | | | | | Summary: When generating assembly using -m16 we must explicitly mark it as 16-bit. Emit .code16 at beginning of file. Fixes wrong results when using -fno-integrated-as. Reviewers: dwmw2 Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D19392 llvm-svn: 267152
* [AVX512] Teach lowering to use vplzcntd/q to implement 128/256-bit ↵Craig Topper2016-04-223-2/+317
| | | | | | CTTZ_ZERO_UNDEF even without VLX support. We can just extend to 512-bits and extract like we do for CTLZ. llvm-svn: 267100
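One common identity for expressing a trailing-zero count through a leading-zero count is `cttz(x) = (width - 1) - lzcnt(x & -x)` for nonzero `x`. The sketch below checks that identity on scalars; it is an assumption that the lowering uses this exact sequence, and the widening to 512 bits and extraction are not modeled.

```python
def lzcnt(value, width=32):
    """Count leading zeros of a width-bit value; returns width for zero."""
    value &= (1 << width) - 1
    return width - value.bit_length()


def cttz_zero_undef(value, width=32):
    """Trailing-zero count via lzcnt; the result for zero is left undefined."""
    value &= (1 << width) - 1
    if value == 0:
        raise ValueError("result is undefined for zero input")
    lowest = value & -value  # isolate the lowest set bit
    return (width - 1) - lzcnt(lowest, width)


example = cttz_zero_undef(0b101000)
```

Isolating the lowest set bit first is what makes the leading-zero count depend only on the position of that bit.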
* DAGCombiner: Reduce 64-bit BFE pattern to pattern on 32-bit componentMatt Arsenault2016-04-211-2/+2
| | | | | | | If the extracted bits are restricted to the upper half or lower half, this can be truncated. llvm-svn: 267024
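The reduction can be sketched on plain integers. `bfe64` and `bfe64_via_32` are hypothetical helpers; the real combine operates on DAG nodes, and this only demonstrates why the narrowing is valid when the field stays within one half.

```python
MASK32 = 0xFFFFFFFF


def bfe64(value, offset, width):
    """Reference 64-bit bitfield extract: width bits starting at offset."""
    return (value >> offset) & ((1 << width) - 1)


def bfe64_via_32(value, offset, width):
    """Same extract, using a 32-bit half when the field fits inside it."""
    lo = value & MASK32
    hi = (value >> 32) & MASK32
    field_mask = (1 << width) - 1
    if offset + width <= 32:
        return (lo >> offset) & field_mask         # entirely in the lower half
    if offset >= 32:
        return (hi >> (offset - 32)) & field_mask  # entirely in the upper half
    return bfe64(value, offset, width)             # straddles both halves


x = 0x123456789ABCDEF0
upper_field = bfe64_via_32(x, 40, 8)
lower_field = bfe64_via_32(x, 4, 12)
```

A field that straddles bit 32 falls back to the full 64-bit extract, which matches the restriction stated in the message.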
* [AVX512] Add CTTZ support for v8i64 and v16i32 vectors.Craig Topper2016-04-211-182/+166
| | | | llvm-svn: 266968
* [X86] Fix vector-tzcnt-512 test to disable CDI while enabling BWI for one of ↵Craig Topper2016-04-211-18/+134
| | | | | | the runs. Update check patterns accordingly. llvm-svn: 266967