path: root/llvm/lib/Target/X86
Commit message (Author, Age, Files, Lines)
* Removed an old comment, NFC (Elena Demikhovsky, 2015-09-08, 1 file, -2/+0)
| | | | llvm-svn: 247006
* compilation issue, NFC (Elena Demikhovsky, 2015-09-08, 1 file, -3/+3)
| | | | llvm-svn: 246983
* fixed compilation issue, NFC. (Elena Demikhovsky, 2015-09-08, 1 file, -3/+3)
| | | | llvm-svn: 246982
* AVX-512: Lowering for 512-bit vector shuffles. (Elena Demikhovsky, 2015-09-08, 4 files, -68/+324)
| | | | | | | | Vector types: <8 x 64>, <16 x 32>, <32 x 16> float and integer. Differential Revision: http://reviews.llvm.org/D10683 llvm-svn: 246981
* Sink COFF.h MC include into .cpp files (Reid Kleckner, 2015-09-03, 1 file, -0/+1)
| | | | | | | | This prevents MC clients from getting COFF.h, which conflicts with winnt.h macros. Also a minor IWYU cleanup. Now the only public headers including COFF.h are in Object, and they actually need it. llvm-svn: 246784
* [x86] enable machine combiner reassociations for scalar 'xor' insts (Sanjay Patel, 2015-09-03, 1 file, -0/+4)
| | | | llvm-svn: 246781
* AVX512: Implemented encoding and intrinsics for vplzcntq, vplzcntd, vpconflictq, vpconflictd (Igor Breger, 2015-09-03, 5 files, -117/+54)
| | | | | | | | | | Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D11931 llvm-svn: 246750
* [X86] Require 32-byte alignment for 32-byte VMOVNTs. (Ahmed Bougacha, 2015-09-02, 1 file, -2/+4)
| | | | | | | | | | | | | | | | We used to accept (and even test, and generate) 16-byte alignment for 32-byte nontemporal stores, but they require 32-byte alignment, per SDM. Found by inspection. Instead of hardcoding 16 in the patfrag, check for natural alignment. Also fix the autoupgrade and the various tests. Also, use explicit -mattr instead of -mcpu: I stared at the output several minutes wondering why I get 2x movntps for the unaligned case (which is the ideal output, but needs some work: see FIXME), until I remembered corei7-avx implies +slow-unaligned-mem-32. llvm-svn: 246733
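The natural-alignment rule from the commit above can be sketched in plain C++. This is an illustration only: `isNaturallyAlignedStore` and `checkNontemporalAlignment` are hypothetical names, not the TableGen fragment that the commit actually changes.

```cpp
#include <cassert>

// Hypothetical helper (not LLVM code): a nontemporal store is accepted only
// when its alignment is at least its own size, instead of a hardcoded 16.
bool isNaturallyAlignedStore(unsigned alignBytes, unsigned storeSizeBytes) {
  return alignBytes >= storeSizeBytes;
}

// Small self-check covering the cases named in the commit message.
void checkNontemporalAlignment() {
  assert(isNaturallyAlignedStore(16, 16));  // 16-byte store, 16-byte aligned
  assert(!isNaturallyAlignedStore(16, 32)); // 32-byte store, formerly accepted
  assert(isNaturallyAlignedStore(32, 32));  // 32-byte store, naturally aligned
}
```

Comparing against the store size rather than a constant means one predicate covers both the 16-byte and 32-byte cases.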
* [X86] Cleanup nontemporal fragments. NFCI. (Ahmed Bougacha, 2015-09-02, 1 file, -15/+6)
| | | | | | | | We can chain other fragments to avoid repeating conditions. This also fixes a potential bug (that realistically can't happen), where we would match indexed nontemporal stores for i32/i64. llvm-svn: 246719
* [x86] fix allowsMisalignedMemoryAccesses() for 8-byte and smaller accesses (Sanjay Patel, 2015-09-02, 1 file, -5/+13)
| | | | | | | | | | | | | | | | | This is a continuation of the fix from: http://reviews.llvm.org/D10662 and discussion in: http://reviews.llvm.org/D12154 Here, we distinguish slow unaligned SSE (128-bit) accesses from slow unaligned scalar (64-bit and under) accesses. Other lowering (eg, getOptimalMemOpType) assumes that unaligned scalar accesses are always ok, so this changes allowsMisalignedMemoryAccesses() to match that behavior. Differential Revision: http://reviews.llvm.org/D12543 llvm-svn: 246658
* [X86][AVX512VLBW] add support in byte shift and SAD (Asaf Badouh, 2015-09-02, 4 files, -7/+83)
| | | | | | | | | add byte shift left/right add SAD - compute sum of absolute differences Differential Revision: http://reviews.llvm.org/D12479 llvm-svn: 246654
* AVX512: Implemented encoding and intrinsics for VGETMANTPD/S, VGETMANTSD/S instructions (Igor Breger, 2015-09-02, 5 files, -17/+63)
| | | | | | | | | | Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D11593 llvm-svn: 246642
* AVX512: Implemented encoding and intrinsics for vshufps/d. (Igor Breger, 2015-09-02, 3 files, -44/+36)
| | | | | | | | Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D11709 llvm-svn: 246640
* AVX-512: store <4 x i1> and <2 x i1> values in memory (Elena Demikhovsky, 2015-09-02, 1 file, -0/+5)
| | | | | | | | Enabled DAG pattern lowering for SKX with DQI predicate. Differential Revision: http://reviews.llvm.org/D12550 llvm-svn: 246625
* [CodeGen] Fix FREM on 32-bit MSVC on x86 (Vedant Kumar, 2015-09-02, 1 file, -1/+11)
| | | | | | | | Patch by Dylan McKay! Differential Revision: http://reviews.llvm.org/D12099 llvm-svn: 246615
* rename "slow-unaligned-mem-under-32" to "slow-unaligned-mem-16" (NFCI) (Sanjay Patel, 2015-09-01, 5 files, -53/+59)
| | | | | | | | | | | | | | | This is a follow-on suggested by: http://reviews.llvm.org/D12154 ( http://reviews.llvm.org/rL245729 ) http://reviews.llvm.org/D10662 ( http://reviews.llvm.org/rL245075 ) This makes the attribute name match most of the existing lowering logic and regression test expectations. But the current use of this attribute is inconsistent; see the FIXME comment for "allowsMisalignedMemoryAccesses()". That change will result in functional changes and should be coming soon. llvm-svn: 246585
* AVX512: Implemented intrinsics for valign. (Igor Breger, 2015-09-01, 1 file, -0/+8)
| | | | | | Differential Revision: http://reviews.llvm.org/D12526 llvm-svn: 246551
* [x86] enable machine combiner reassociations for scalar 'or' insts (Sanjay Patel, 2015-08-31, 1 file, -0/+4)
| | | | llvm-svn: 246481
* X86: Fix FastISel SSESelect register class (Matthias Braun, 2015-08-31, 1 file, -3/+9)
| | | | | | | | | X86FastISel has been using the wrong register class for VBLENDVPS which produces a VR128 and needs an extra copy to the target register. The problem was already hit by the existing test cases when using > llvm-lit -Dllc="llc -verify-machineinstr" llvm-svn: 246461
* AVX512: ktest implementation (Igor Breger, 2015-08-31, 4 files, -14/+16)
| | | | | | | | Added tests for encoding. Differential Revision: http://reviews.llvm.org/D11979 llvm-svn: 246439
* AVX512: Implemented encoding and intrinsics for vdbpsadbw (Igor Breger, 2015-08-31, 5 files, -1/+15)
| | | | | | | | Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D12491 llvm-svn: 246436
* AVX512: kadd implementation (Igor Breger, 2015-08-31, 1 file, -2/+4)
| | | | | | | | Added tests for encoding. Differential Revision: http://reviews.llvm.org/D11973 llvm-svn: 246432
* AVX512: Implemented encoding and intrinsics for vpalignr (Igor Breger, 2015-08-31, 4 files, -34/+92)
| | | | | | | | Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D12270 llvm-svn: 246428
* [MIR Serialization] static -> static const in getSerializable*MachineOperandTargetFlags (Hal Finkel, 2015-08-30, 1 file, -1/+1)
| | | | | | | Make the arrays 'static const' instead of just 'static'. Post-commit review comment from Roman Divacky on IRC. NFC. llvm-svn: 246376
* [X86] NFC: Clean up and clang-format a few lines (Vedant Kumar, 2015-08-28, 1 file, -5/+5)
| | | | llvm-svn: 246340
* [x86] enable machine combiner reassociations for scalar 'and' insts (Sanjay Patel, 2015-08-28, 1 file, -1/+5)
| | | | llvm-svn: 246300
* [WinEH] Add some support for code generating catchpad (Reid Kleckner, 2015-08-27, 6 files, -0/+52)
| | | | | | | We can now run 32-bit programs with empty catch bodies. The next step is to change PEI so that we get funclet prologues and epilogues. llvm-svn: 246235
* [ms-inline-asm] Relax assertion around funky identifiers slightly (Reid Kleckner, 2015-08-26, 1 file, -6/+8)
| | | | | | | A corresponding clang change will make it so that clang can consume part of an assembler token. The assembler treats '.' as an identifier character while clang does not, so its view of the token stream is a little different. llvm-svn: 246089
* Expose hasLiveCondCodeDef as a member function of the X86InstrInfo class. NFC (Andrew Kaylor, 2015-08-26, 2 files, -1/+5)
| | | | | | | | | This takes the existing static function hasLiveCondCodeDef and makes it a member function of the X86InstrInfo class. This is a useful utility function that an upcoming change would like to use. NFC. Patch by: Kevin B. Smith Differential Revision: http://reviews.llvm.org/D12371 llvm-svn: 246073
* [llvm-mc] Ignore opcode size prefix in 64-bit CALL disassembly (Vedant Kumar, 2015-08-26, 1 file, -0/+41)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a fix for disassembling unusual instruction sequences in 64-bit mode w.r.t the CALL rel16 instruction. It might be desirable to move the check somewhere else, but it essentially mimics the special case handling with JCXZ in 16-bit mode. The current behavior accepts the opcode size prefix and causes the call's immediate to stop disassembling after 2 bytes. When debugging sequences of instructions with this pattern, the disassembler output becomes extremely unreliable and essentially useless (if you jump midway into what lldb thinks is a unified instruction, you'll lose %rip). So we ignore the prefix and consume all 4 bytes when disassembling a 64-bit mode binary. Note: in Vol. 2A 3-99 the Intel spec states that CALL rel16 is N.S. N.S. is defined as: Indicates an instruction syntax that requires an address override prefix in 64-bit mode and is not supported. Using an address override prefix in 64-bit mode may result in model-specific execution behavior. (Vol. 2A 3-7) Since 0x66 is an operand override prefix we should be OK (although we may want to warn about 0x67 prefixes to 0xe8). On the CPUs I tested with, they all ignore the 0x66 prefix in 64-bit mode. Patch by Matthew Barney! Differential Revision: http://reviews.llvm.org/D9573 llvm-svn: 246038
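The decoding rule from the commit above can be modelled as a small function. This is a sketch under assumptions, not the LLVM disassembler: `Mode`, `callImmediateBytes`, and `checkCallImmediateBytes` are hypothetical names, and the 64-bit case simply encodes the behaviour the commit message describes (ignore the 0x66 prefix and consume all 4 bytes).

```cpp
#include <cassert>

enum class Mode { Bits16, Bits32, Bits64 };

// Hypothetical model: number of immediate bytes consumed after a near
// CALL (opcode 0xE8), given the decoding mode and whether a 0x66
// operand-size prefix precedes the opcode.
unsigned callImmediateBytes(Mode mode, bool hasOpSizePrefix) {
  switch (mode) {
  case Mode::Bits64:
    return 4; // per the commit: the prefix is ignored in 64-bit mode
  case Mode::Bits32:
    return hasOpSizePrefix ? 2 : 4; // prefix selects the 16-bit form
  case Mode::Bits16:
    return hasOpSizePrefix ? 4 : 2; // prefix selects the 32-bit form
  }
  return 4; // not reached; keeps compilers quiet about a missing return
}

// Self-check for the 64-bit case the commit is about.
void checkCallImmediateBytes() {
  assert(callImmediateBytes(Mode::Bits64, true) == 4);
  assert(callImmediateBytes(Mode::Bits64, false) == 4);
}
```

Consuming a fixed 4 bytes in 64-bit mode keeps the next instruction boundary stable whether or not the prefix is present.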
* FastISel: Factor out common code; NFC intended (Matthias Braun, 2015-08-26, 1 file, -29/+5)
| | | | | | | This should be no functional change but for the record: For three cases in X86FastISel this will change the order in which the FalseMBB and TrueMBB of a conditional branch are added to the successor/predecessor lists. llvm-svn: 245997
* Make variable argument intrinsics behave correctly in a Win64 CC function. (Charles Davis, 2015-08-25, 1 file, -10/+18)
| | | | | | | | | | | | | | | | Summary: This change makes the variable argument intrinsics, `llvm.va_start` and `llvm.va_copy`, and the `va_arg` instruction behave as they do on Windows inside a `CallingConv::X86_64_Win64` function. It's needed for a Clang patch I have to add support for GCC's `__builtin_ms_va_list` constructs. Reviewers: nadav, asl, eugenis CC: llvm-commits Differential Revision: http://llvm-reviews.chandlerc.com/D1622 llvm-svn: 245990
* make fast unaligned memory accesses implicit with SSE4.2 or SSE4a (Sanjay Patel, 2015-08-25, 1 file, -0/+7)
| | | | | | | | | | | | | | | | | | | | | | This is a follow-on from the discussion in http://reviews.llvm.org/D12154. This change allows memset/memcpy to use SSE or AVX memory accesses for any chip that has generally fast unaligned memory ops. A motivating use case for this change is a clang invocation that doesn't explicitly set the CPU, but does target a feature that we know only exists on a CPU that supports fast unaligned memops. For example: $ clang -O1 foo.c -mavx This resolves a difference in lowering noted in PR24449: https://llvm.org/bugs/show_bug.cgi?id=24449 Before this patch, we used different store types depending on whether the example can be lowered as a memset or not. Differential Revision: http://reviews.llvm.org/D12288 llvm-svn: 245950
* [X86] Remove references to _ftol2 (Michael Kuperstein, 2015-08-25, 5 files, -54/+0)
| | | | | | | As of r245924, _ftol2 is no longer used for fptoui on MS platforms. Remove the dead code associated with it. llvm-svn: 245925
* [X86] Fix fptoui conversions (Michael Kuperstein, 2015-08-25, 2 files, -69/+143)
| | | | | | | | | | | | | | | This fixes two issues in x86 fptoui lowering. 1) Makes conversions from f80 go through the right path on AVX-512. 2) Implements an inline sequence for fptoui i64 instead of a library call. This improves performance by 6X on SSE3+ and 3X otherwise. Incidentally, it also removes the use of ftol2 for fptoui, which was wrong to begin with, as ftol2 converts to a signed i64, producing wrong results for values >= 2^63. Patch by: mitch.l.bodart@intel.com Differential Revision: http://reviews.llvm.org/D11316 llvm-svn: 245924
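The commit above does not spell out its inline sequence, so the following is only a sketch of one common way to build an unsigned 64-bit conversion from a signed one. It also shows why a signed-only conversion is wrong for values >= 2^63. `fptouiViaSigned` and `checkFptoui` are hypothetical names, and the input is assumed to lie in [0, 2^64).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not the sequence LLVM emits): converts a double in
// [0, 2^64) to uint64_t using only signed conversions.
uint64_t fptouiViaSigned(double x) {
  const double twoPow63 = 9223372036854775808.0; // 2^63, exact as a double
  if (x < twoPow63)
    return static_cast<uint64_t>(static_cast<int64_t>(x));
  // Shift the value into signed range, convert, then restore the top bit.
  const int64_t low = static_cast<int64_t>(x - twoPow63);
  return static_cast<uint64_t>(low) ^ (uint64_t{1} << 63);
}

// Self-check on both sides of the 2^63 boundary.
void checkFptoui() {
  assert(fptouiViaSigned(3.7) == 3);
  assert(fptouiViaSigned(9223372036854775808.0) == (uint64_t{1} << 63));
}
```

Inputs below 2^63 take the direct signed path. Larger inputs are rebased by 2^63 first, because a single signed conversion cannot represent them.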
* Pass function attributes instead of boolean in isIntDivCheap(). (Steve King, 2015-08-25, 2 files, -2/+4)
| | | | llvm-svn: 245921
* MachineBasicBlock: Add liveins() method returning an iterator_range (Matthias Braun, 2015-08-24, 2 files, -15/+9)
| | | | llvm-svn: 245895
* [X86] Add support for mmword memory operand size for Intel-syntax x86 assembly (Michael Zuckerman, 2015-08-24, 1 file, -1/+1)
| | | | | | Differential Revision: http://reviews.llvm.org/D12151 llvm-svn: 245835
* first commit to llvm (Michael Zuckerman, 2015-08-24, 1 file, -0/+1)
| | | | llvm-svn: 245825
* [x86] enable machine combiner reassociations for 256-bit vector min/max (Sanjay Patel, 2015-08-21, 1 file, -0/+4)
| | | | llvm-svn: 245735
* remove 'FeatureSlowUAMem' from AMD CPUs based on 10H micro-arch or later (Sanjay Patel, 2015-08-21, 1 file, -11/+7)
| | | | | | | See discussion in D12154 ( http://reviews.llvm.org/D12154 ), AMD Software Optimization Guides for 10H/12H/15H/16H, and Agner Fog's experimental data. llvm-svn: 245733
* [x86] invert logic for attribute 'FeatureFastUAMem' (Sanjay Patel, 2015-08-21, 5 files, -89/+98)
| | | | | | | | | | | | | | | | This is a 'no functional change intended' patch. It removes one FIXME, but adds several more. Motivation: the FeatureFastUAMem attribute may be too general. It is used to determine if any sized misaligned memory access under 32-bytes is 'fast'. From the added FIXME comments, however, you can see that we're not consistent about this. Changing the name of the attribute makes it clearer to see the logic holes. Changing this to a 'slow' attribute also means we don't have to add an explicit 'fast' attribute to new chips; fast unaligned accesses have been standard for several generations of CPUs now. Differential Revision: http://reviews.llvm.org/D12154 llvm-svn: 245729
* [x86] enable machine combiner reassociations for 128-bit vector min/max (Sanjay Patel, 2015-08-21, 1 file, -0/+8)
| | | | llvm-svn: 245715
* Fix typo - symetric -> symmetric. (Eric Christopher, 2015-08-21, 1 file, -1/+1)
| | | | llvm-svn: 245705
* [X86] Look for scalar through one bitcast when lowering to VBROADCAST. (Ahmed Bougacha, 2015-08-20, 2 files, -0/+24)
| | | | | | | | | | | | | | Fixes PR23464: one way to use the broadcast intrinsics is: _mm256_broadcastw_epi16(_mm_cvtsi32_si128(*(int*)src)); We don't currently fold this, but now that we use native IR for the intrinsics (r245605), we can look through one bitcast to find the broadcast scalar. Differential Revision: http://reviews.llvm.org/D10557 llvm-svn: 245613
* [X86] Replace avx2 broadcast intrinsics with native IR. (Ahmed Bougacha, 2015-08-20, 1 file, -86/+30)
| | | | | | | | | | Since r245605, the clang headers don't use these anymore. r245165 updated some of the tests already; update the others, add an autoupgrade, remove the intrinsics, and cleanup the definitions. Differential Revision: http://reviews.llvm.org/D10555 llvm-svn: 245606
* [X86] Fix FBLD and FBSTP (Marina Yatsina, 2015-08-20, 1 file, -2/+2)
| | | | | | | | | | FBLD and FBSTP should receive TBYTE because it is defined as FBLD m80 FBSTP m80 Differential Revision: http://reviews.llvm.org/D11748 llvm-svn: 245553
* [X86] Fix bug in COMISD and COMISS definition in td files (Marina Yatsina, 2015-08-20, 2 files, -6/+6)
| | | | | | | | | | | | COMISD should receive QWORD because it is defined as (V)COMISD xmm1, xmm2/m64 COMISS should receive DWORD because it is defined as (V)COMISS xmm1, xmm2/m32 Differential Revision: http://reviews.llvm.org/D11712 llvm-svn: 245551
* [X86] Fix the (shl (and (setcc_c), c1), c2) -> (and setcc_c, (c1 << c2)) fold (David Majnemer, 2015-08-20, 1 file, -12/+28)
| | | | | | | | | We didn't check for the necessary preconditions before folding a mask/shift into a single mask. This fixes PR24516. llvm-svn: 245544
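The commit above does not say which preconditions were missing, so the following is only a general illustration of why a mask/shift fold of this shape needs one. The rewrite is valid when the masked value is all-zeros or all-ones, which is the kind of value a setcc_c node stands for, and it can fail for an arbitrary value. `beforeFold`, `afterFold`, and `checkFold` are hypothetical names, and the shift amount is assumed to be less than 32.

```cpp
#include <cassert>
#include <cstdint>

// (x & c1) << c2: the expression before the fold.
uint32_t beforeFold(uint32_t x, uint32_t c1, unsigned c2) {
  return (x & c1) << c2;
}

// x & (c1 << c2): the expression after the fold.
uint32_t afterFold(uint32_t x, uint32_t c1, unsigned c2) {
  return x & (c1 << c2);
}

// Self-check: the two forms agree for 0 and all-ones, but not in general.
void checkFold() {
  assert(beforeFold(0u, 3u, 1) == afterFold(0u, 3u, 1));
  assert(beforeFold(0xFFFFFFFFu, 3u, 1) == afterFold(0xFFFFFFFFu, 3u, 1));
  assert(beforeFold(3u, 3u, 1) != afterFold(3u, 3u, 1)); // 6 versus 2
}
```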
* [x86] enable machine combiner reassociations for scalar double-precision min/max (Sanjay Patel, 2015-08-19, 1 file, -0/+4)
| | | | llvm-svn: 245506