llvm-svn: 247006
llvm-svn: 246983
llvm-svn: 246982
Vector types: <8 x 64>, <16 x 32>, <32 x 16> float and integer.
Differential Revision: http://reviews.llvm.org/D10683
llvm-svn: 246981
This prevents MC clients from getting COFF.h, which conflicts with
winnt.h macros. Also a minor IWYU cleanup. Now the only public headers
including COFF.h are in Object, and they actually need it.
llvm-svn: 246784
llvm-svn: 246781
vpconflictq, vpconflictd
Added tests for intrinsics and encoding.
Differential Revision: http://reviews.llvm.org/D11931
llvm-svn: 246750
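For context, the conflict-detection instructions are exposed in C/C++ through
immintrin.h; a minimal usage sketch (assumes an AVX-512 CD target, e.g.
compiling with -mavx512cd):

#include <immintrin.h>

// Each result lane holds a bitmask of the earlier lanes that contain the
// same value, which is what vpconflictd/vpconflictq compute.
__m512i conflicts32(__m512i v) { return _mm512_conflict_epi32(v); }
__m512i conflicts64(__m512i v) { return _mm512_conflict_epi64(v); }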
We used to accept (and even test, and generate) 16-byte alignment
for 32-byte nontemporal stores, but they require 32-byte alignment,
per the SDM. Found by inspection.
Instead of hardcoding 16 in the PatFrag, check for natural alignment.
Also fix the autoupgrade and the various tests.
Also, use explicit -mattr instead of -mcpu: I stared at the output
for several minutes wondering why I got 2x movntps for the unaligned
case (which is the ideal output, but needs some work: see FIXME),
until I remembered that corei7-avx implies +slow-unaligned-mem-32.
llvm-svn: 246733
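A minimal sketch of the natural-alignment check described above, written
against the SelectionDAG API of this era (illustrative only; the committed
change lives in a TableGen PatFrag, not in C++ like this):

#include "llvm/CodeGen/SelectionDAGNodes.h"

// A nontemporal store needs alignment equal to the full store width
// (32 bytes for a 32-byte vector), not a hardcoded 16.
static bool hasNaturalAlignment(const llvm::StoreSDNode *St) {
  return St->getAlignment() >= St->getMemoryVT().getStoreSize();
}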
We can chain other fragments to avoid repeating conditions.
This also fixes a potential bug (that realistically can't happen),
where we would match indexed nontemporal stores for i32/i64.
llvm-svn: 246719
This is a continuation of the fix from:
http://reviews.llvm.org/D10662
and discussion in:
http://reviews.llvm.org/D12154
Here, we distinguish slow unaligned SSE (128-bit) accesses from slow unaligned
scalar (64-bit and under) accesses. Other lowering (e.g., getOptimalMemOpType)
assumes that unaligned scalar accesses are always OK, so this changes
allowsMisalignedMemoryAccesses() to match that behavior.
Differential Revision: http://reviews.llvm.org/D12543
llvm-svn: 246658
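A simplified sketch of the distinction being drawn; the flag and helper names
below are hypothetical stand-ins, not the actual X86ISelLowering code:

// Scalar misaligned accesses are always allowed (matching the assumption in
// getOptimalMemOpType); 128-bit SSE accesses are allowed but may be slow.
struct Subtarget {
  bool SlowUnalignedMem16; // hypothetical flag: slow unaligned SSE accesses
};

static bool allowsMisalignedAccesses(const Subtarget &ST, unsigned BitWidth,
                                     bool *Fast) {
  if (BitWidth <= 64) {
    if (Fast)
      *Fast = true;                   // scalar: always treated as OK
  } else if (BitWidth == 128) {
    if (Fast)
      *Fast = !ST.SlowUnalignedMem16; // vector: legal, but report slowness
  } else if (Fast) {
    *Fast = false;
  }
  return true;                        // x86 tolerates misalignment in general
}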
Add byte shift left/right.
Add SAD (compute sum of absolute differences).
Differential Revision: http://reviews.llvm.org/D12479
llvm-svn: 246654
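Once AVX512BW is enabled these map to the usual intrinsics; a small sketch
(names per immintrin.h, compile with e.g. -mavx512bw):

#include <immintrin.h>

// Byte shifts operate within each 128-bit lane; the count is an immediate.
__m512i shl4(__m512i v) { return _mm512_bslli_epi128(v, 4); }
__m512i shr4(__m512i v) { return _mm512_bsrli_epi128(v, 4); }

// SAD: sums of absolute differences of unsigned bytes, accumulated per qword.
__m512i sad(__m512i a, __m512i b) { return _mm512_sad_epu8(a, b); }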
instructions
Added tests for intrinsics and encoding.
Differential Revision: http://reviews.llvm.org/D11593
llvm-svn: 246642
Added tests for intrinsics and encoding.
Differential Revision: http://reviews.llvm.org/D11709
llvm-svn: 246640
Enabled DAG pattern lowering for SKX with DQI predicate.
Differential Revision: http://reviews.llvm.org/D12550
llvm-svn: 246625
Patch by Dylan McKay!
Differential Revision: http://reviews.llvm.org/D12099
llvm-svn: 246615
This is a follow-on suggested by:
http://reviews.llvm.org/D12154 ( http://reviews.llvm.org/rL245729 )
http://reviews.llvm.org/D10662 ( http://reviews.llvm.org/rL245075 )
This makes the attribute name match most of the existing lowering logic
and regression test expectations.
But the current use of this attribute is inconsistent; see the FIXME
comment for "allowsMisalignedMemoryAccesses()". That change will
result in functional changes and should be coming soon.
llvm-svn: 246585
Differential Revision: http://reviews.llvm.org/D12526
llvm-svn: 246551
llvm-svn: 246481
X86FastISel has been using the wrong register class for VBLENDVPS, which
produces a VR128 and needs an extra copy to the target register. The
problem was already hit by the existing test cases when using
> llvm-lit -Dllc="llc -verify-machineinstrs"
llvm-svn: 246461
Added tests for encoding.
Differential Revision: http://reviews.llvm.org/D11979
llvm-svn: 246439
Added tests for intrinsics and encoding.
Differential Revision: http://reviews.llvm.org/D12491
llvm-svn: 246436
Added tests for encoding.
Differential Revision: http://reviews.llvm.org/D11973
llvm-svn: 246432
Added tests for intrinsics and encoding.
Differential Revision: http://reviews.llvm.org/D12270
llvm-svn: 246428
getSerializable*MachineOperandTargetFlags
Make the arrays 'static const' instead of just 'static'. Post-commit review
comment from Roman Divacky on IRC. NFC.
llvm-svn: 246376
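The change is the standard C++ lookup-table idiom; a generic sketch with
made-up flag names (the real tables map target-operand flags to their
serialized names):

#include <utility>

const char *flagName(unsigned Flag) {
  // 'static const' lets the table be placed in read-only storage and shared
  // across calls; plain 'static' would leave it writable.
  static const std::pair<unsigned, const char *> Flags[] = {
      {1, "mo-example-lo"}, {2, "mo-example-hi"}}; // hypothetical entries
  for (const auto &F : Flags)
    if (F.first == Flag)
      return F.second;
  return "<unknown>";
}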
llvm-svn: 246340
llvm-svn: 246300
We can now run 32-bit programs with empty catch bodies. The next step
is to change PEI so that we get funclet prologues and epilogues.
llvm-svn: 246235
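The shape of program this enables, as a trivial C++ sketch:

void mayThrow(); // assume this can raise a C++ exception

// A 32-bit Windows (MSVC-environment) build of this now runs correctly;
// funclet prologues/epilogues in PEI are the next step mentioned above.
void run() {
  try {
    mayThrow();
  } catch (...) {
  }
}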
A corresponding clang change will make it so that clang can consume part
of an assembler token. The assembler treats '.' as an identifier
character while clang does not, so its view of the token stream is a
little different.
llvm-svn: 246089
This takes the existing static function hasLiveCondCodeDef and makes it a
member function of the X86InstrInfo class. This is a useful utility function
that an upcoming change would like to use. NFC.
Patch by: Kevin B. Smith
Differential Revision: http://reviews.llvm.org/D12371
llvm-svn: 246073
This is a fix for disassembling unusual instruction sequences in 64-bit
mode w.r.t. the CALL rel16 instruction. It might be desirable to move the
check somewhere else, but it essentially mimics the special-case
handling of JCXZ in 16-bit mode.
The current behavior accepts the opcode size prefix and causes the
call's immediate to stop disassembling after 2 bytes. When debugging
sequences of instructions with this pattern, the disassembler output
becomes extremely unreliable and essentially useless (if you jump midway
into what lldb thinks is a unified instruction, you'll lose %rip). So we
ignore the prefix and consume all 4 bytes when disassembling a 64-bit
mode binary.
Note: in Vol. 2A 3-99 the Intel spec states that CALL rel16 is N.S. N.S.
is defined as:
Indicates an instruction syntax that requires an address override
prefix in 64-bit mode and is not supported. Using an address
override prefix in 64-bit mode may result in model-specific
execution behavior. (Vol. 2A 3-7)
Since 0x66 is an operand-size override prefix, we should be OK (although we
may want to warn about 0x67 prefixes to 0xe8). All the CPUs I tested
ignore the 0x66 prefix in 64-bit mode.
Patch by Matthew Barney!
Differential Revision: http://reviews.llvm.org/D9573
llvm-svn: 246038
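To make the byte pattern concrete, an illustrative encoding of the sequence
in question:

// 0x66 is the operand-size override and 0xE8 is CALL rel32. In 64-bit mode
// the disassembler now ignores the prefix and consumes the full 4-byte
// immediate rather than stopping after 2 bytes.
static const unsigned char PrefixedCall[] = {
    0x66, 0xE8, 0x78, 0x56, 0x34, 0x12}; // 66 call rel32 (imm = 0x12345678)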
This should be no functional change, but for the record: for three cases
in X86FastISel this will change the order in which the FalseMBB and
TrueMBB of a conditional branch are added to the successor/predecessor
lists.
llvm-svn: 245997
Summary:
This change makes the variable argument intrinsics, `llvm.va_start` and
`llvm.va_copy`, and the `va_arg` instruction behave as they do on Windows
inside a `CallingConv::X86_64_Win64` function. It's needed for a Clang patch
of mine that adds support for GCC's `__builtin_ms_va_list` constructs.
Reviewers: nadav, asl, eugenis
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D1622
llvm-svn: 245990
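A hedged sketch of the GCC construct in question; this compiles with GCC on
x86-64 Linux today, and with Clang once the companion patch mentioned above
lands:

// ms_abi forces the Win64 calling convention for this one function, and the
// __builtin_ms_va_* intrinsics walk its Win64-style variadic area.
int __attribute__((ms_abi)) sum_ints(int count, ...) {
  __builtin_ms_va_list ap;
  __builtin_ms_va_start(ap, count);
  int total = 0;
  for (int i = 0; i < count; ++i)
    total += __builtin_va_arg(ap, int);
  __builtin_ms_va_end(ap);
  return total;
}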
This is a follow-on from the discussion in http://reviews.llvm.org/D12154.
This change allows memset/memcpy to use SSE or AVX memory accesses for any chip that has
generally fast unaligned memory ops.
A motivating use case for this change is a clang invocation that doesn't explicitly set
the CPU, but does target a feature that we know only exists on a CPU that supports fast
unaligned memops. For example:
$ clang -O1 foo.c -mavx
This resolves a difference in lowering noted in PR24449:
https://llvm.org/bugs/show_bug.cgi?id=24449
Before this patch, we used different store types depending on whether the
example could be lowered as a memset or not.
Differential Revision: http://reviews.llvm.org/D12288
llvm-svn: 245950
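A hypothetical foo.c for that invocation: no CPU is named, but -mavx by
itself now licenses wide unaligned vector stores for the memset:

#include <string.h>

// With -mavx plus this patch, the zeroing below can be emitted as unaligned
// 32-byte vector stores instead of narrower scalar stores.
void zero_buffer(char *p) {
  memset(p, 0, 128);
}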
As of r245924, _ftol2 is no longer used for fptoui on MS platforms.
Remove the dead code associated with it.
llvm-svn: 245925
This fixes two issues in x86 fptoui lowering.
1) Makes conversions from f80 go through the right path on AVX-512.
2) Implements an inline sequence for fptoui i64 instead of a library
call. This improves performance by 6X on SSE3+ and 3X otherwise.
Incidentally, it also removes the use of _ftol2 for fptoui, which was
wrong to begin with, as _ftol2 converts to a signed i64, producing
wrong results for values >= 2^63.
Patch by: mitch.l.bodart@intel.com
Differential Revision: http://reviews.llvm.org/D11316
llvm-svn: 245924
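The user-visible case, as a small sketch: a plain double to unsigned 64-bit
conversion, which the old _ftol2 path mishandled for inputs at or above 2^63:

#include <stdint.h>

// With the inline sequence, large inputs such as 1.5e19 (>= 2^63) convert
// correctly; _ftol2 produced a signed result and got them wrong.
uint64_t to_u64(double d) {
  return (uint64_t)d;
}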
llvm-svn: 245921
llvm-svn: 245895
Differential Revision: http://reviews.llvm.org/D12151
llvm-svn: 245835
llvm-svn: 245825
llvm-svn: 245735
See discussion in D12154 ( http://reviews.llvm.org/D12154 ), AMD Software
Optimization Guides for 10H/12H/15H/16H, and Agner Fog's experimental data.
llvm-svn: 245733
This is a 'no functional change intended' patch. It removes one FIXME, but adds several more.
Motivation: the FeatureFastUAMem attribute may be too general. It is used to
determine whether a misaligned memory access of any size under 32 bytes is
'fast'. From the added FIXME comments, however, you can see that we're not
consistent about this. Changing the name of the attribute makes the logic
holes easier to see.
Changing this to a 'slow' attribute also means we don't have to add an explicit 'fast' attribute
to new chips; fast unaligned accesses have been standard for several generations of CPUs now.
Differential Revision: http://reviews.llvm.org/D12154
llvm-svn: 245729
llvm-svn: 245715
llvm-svn: 245705
Fixes PR23464: one way to use the broadcast intrinsics is:
_mm256_broadcastw_epi16(_mm_cvtsi32_si128(*(int*)src));
We don't currently fold this, but now that we use native IR for
the intrinsics (r245605), we can look through one bitcast to find
the broadcast scalar.
Differential Revision: http://reviews.llvm.org/D10557
llvm-svn: 245613
Since r245605, the clang headers don't use these anymore.
r245165 updated some of the tests already; update the others, add
an autoupgrade, remove the intrinsics, and clean up the definitions.
Differential Revision: http://reviews.llvm.org/D10555
llvm-svn: 245606
FBLD and FBSTP should receive TBYTE because they are defined as
FBLD m80
FBSTP m80
Differential Revision: http://reviews.llvm.org/D11748
llvm-svn: 245553
COMISD should receive QWORD because it is defined as
(V)COMISD xmm1, xmm2/m64
COMISS should receive DWORD because it is defined as
(V)COMISS xmm1, xmm2/m32
Differential Revision: http://reviews.llvm.org/D11712
llvm-svn: 245551
We didn't check for the necessary preconditions before folding a
mask/shift into a single mask.
This fixes PR24516.
llvm-svn: 245544
llvm-svn: 245506