| Commit message | Author | Age | Files | Lines |
- Fix PR5145 and turn on the tests for 8-bit atomic ops
llvm-svn: 164358
- Rewrite most atomic instructions using templates, for both better
maintenance and future extensions such as HLE in TSX.
llvm-svn: 164357
- Rewrite/merge pseudo-atomic instruction emitters to address the
  following issues:
  * Reduce one unnecessary load in the spin-loop.
    Previously the spin-loop looked like:
      thisMBB:
      newMBB:
        ld t1 = [bitinstr.addr]
        op t2 = t1, [bitinstr.val]
        not t3 = t2 (if Invert)
        mov EAX = t1
        lcs dest = [bitinstr.addr], t3 [EAX is implicit]
        bz newMBB
        fallthrough --> nextMBB
    The 'ld' at the beginning of newMBB should be lifted out of the loop,
    as lcs (or CMPXCHG on x86) loads the current memory value into
    EAX. The refined loop is:
      thisMBB:
        EAX = LOAD [MI.addr]
      mainMBB:
        t1 = OP [MI.val], EAX
        LCMPXCHG [MI.addr], t1, [EAX is implicitly used & defined]
        JNE mainMBB
      sinkMBB:
  * Remove immopc since, so far, all pseudo-atomic instructions have an
    all-register form only; there is no immediate operand.
  * Remove unnecessary attributes/modifiers in the pseudo-atomic instruction
    td.
  * Fix issues in PR13458.
- Add comprehensive tests on atomic ops on various data types.
  NOTE: Some of them are turned off due to missing functionality.
- Revise tests due to the new spin-loop generated.
llvm-svn: 164281
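The refined loop has the same shape as a compare-and-swap retry loop written against std::atomic. The C++ sketch below only illustrates that shape and is not the X86 emitter itself; the helper name atomic_fetch_nand is hypothetical. It performs the single load before the loop, and relies on compare_exchange_weak writing the current memory value back into `expected` on failure, much as CMPXCHG leaves it in EAX, so no reload is needed inside the loop.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper: atomically replaces mem with ~(mem & val) (the
// "op + Invert" case) and returns the value seen before the update.
uint32_t atomic_fetch_nand(std::atomic<uint32_t>& mem, uint32_t val) {
    // thisMBB: the only explicit load, hoisted out of the loop.
    uint32_t expected = mem.load(std::memory_order_relaxed);
    uint32_t desired;
    do {
        // mainMBB: t1 = OP [MI.val], EAX, followed by the inversion.
        desired = ~(expected & val);
        // LCMPXCHG + JNE: on failure, compare_exchange_weak stores the current
        // memory value into `expected`, so the next iteration needs no reload.
    } while (!mem.compare_exchange_weak(expected, desired,
                                        std::memory_order_seq_cst,
                                        std::memory_order_relaxed));
    // sinkMBB: `expected` holds the value that was successfully replaced.
    return expected;
}
```

For example, starting from 0xF0F0F0F0 and applying NAND with 0xFF00FF00 returns 0xF0F0F0F0 and leaves 0x0FFF0FFF in memory.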
This was only an issue if sse is disabled.
llvm-svn: 163967
llvm-svn: 163835
- Enhance the fix to PR12312 to support wider integers, such as 256-bit
integers. If more than one fully evaluated vector is found, POR them
first, followed by the final PTEST.
llvm-svn: 163832
- BlockAddress has no support for the BA + offset form, and there is no way to
propagate that offset into a machine operand;
- Add BA + offset support and a new interface 'getTargetBlockAddress' to
simplify forming target block addresses;
- All targets are modified to use the new interface, and the X86 backend is
enhanced to support BA + offset addressing.
llvm-svn: 163743
llvm-svn: 163682
functions. No functional change.
llvm-svn: 163596
llvm-svn: 163547
- Fix a remaining issue of PR11674 as well
llvm-svn: 163528
- If a boolean value is generated from CMOV and tested as a boolean value,
simplify the use of the test result by referencing the original condition.
The RDRAND intrinsic is one such case.
llvm-svn: 163516
undefined or zeroinitializer.
I've added the "zeroinitializer" case in this patch.
llvm-svn: 163506
llvm-svn: 163473
llvm-svn: 163463
llvm-svn: 163461
FFLOOR of v4f32 to Expand for ARM. v2f64 was already correct.
llvm-svn: 163458
Added generation of the VPSHUFB instruction for <32 x i8> vector shuffles when possible.
llvm-svn: 163312
llvm-svn: 163295
lowering and patterns. This makes it consistent with the incoming DAG nodes from the DAG builder.
llvm-svn: 163293
llvm-svn: 163258
- CodeGenPrepare pass for identifying div/rem ops
- Backend specifies the type mapping using addBypassSlowDivType
- Enabled only for Intel Atom at O2, mapping 32-bit division to 8-bit
- Replace IDIV with instructions that test its value and use DIVB if the value
is positive and less than 256.
- When both the quotient and the remainder of a divide are used, a DIV
and a REM instruction will be present in the IR. In the non-Atom case
they are both lowered to IDIVs, and CSE removes the redundant IDIV instruction,
using the quotient and remainder from the first IDIV. However,
due to this optimization CSE is not able to eliminate redundant
IDIV instructions because they are located in different basic blocks.
This is overcome by calculating both the quotient (DIV) and remainder (REM)
in each basic block that is inserted by the optimization and reusing the result
values when a subsequent DIV or REM instruction uses the same operands.
- Test cases check for the presence of the optimization when calculating
either the quotient, the remainder, or both.
Patch by Tyler Nowicki!
llvm-svn: 163150
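A scalar C++ sketch of the bypass, under stated assumptions: it treats the check as a single OR-and-mask test requiring both the dividend and the divisor to fit in 8 unsigned bits, and it assumes b != 0 and no INT32_MIN / -1 overflow. The names DivRem and bypass_div are hypothetical. Both paths produce the quotient and the remainder together, which is what lets a later DIV or REM on the same operands reuse them.

```cpp
#include <cstdint>

// Hypothetical pair of results, computed together on either path.
struct DivRem {
    int32_t quot;
    int32_t rem;
};

// Sketch only: assumes b != 0 and that a / b does not overflow.
DivRem bypass_div(int32_t a, int32_t b) {
    // One test covers both operands: any bit above bit 7 (including the sign
    // bit) set in either operand forces the slow path.
    uint32_t bits = static_cast<uint32_t>(a) | static_cast<uint32_t>(b);
    if ((bits & 0xFFFFFF00u) == 0) {
        // Fast path: an 8-bit unsigned divide (DIVB) yields both results.
        uint8_t a8 = static_cast<uint8_t>(a);
        uint8_t b8 = static_cast<uint8_t>(b);
        return {static_cast<int32_t>(a8 / b8), static_cast<int32_t>(a8 % b8)};
    }
    // Slow path: the full 32-bit signed divide (IDIV).
    return {a / b, a % b};
}
```

With this shape, 200 / 7 takes the fast path, while 1000 / 7 and -200 / 7 fall back to the 32-bit divide; the results are the same either way.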
of 4.
Since this specific shuffle is widely used in many workloads, this gives a ~10% performance improvement on them.
shufflevector <8 x float> %A, <8 x float> %B, <8 x i32> <i32 0, i32 8, i32 2, i32 10, i32 4, i32 12, i32 6, i32 14>
Before:
vmovaps (%rdx), %ymm0
vshufps $8, %ymm0, %ymm0, %ymm0
vmovaps (%rcx), %ymm1
vshufps $8, %ymm0, %ymm1, %ymm1
vunpcklps %ymm0, %ymm1, %ymm0
After:
vmovaps (%rcx), %ymm0
vmovsldup (%rdx), %ymm1
vblendps $85, %ymm0, %ymm1, %ymm0
llvm-svn: 163134
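To check why VMOVSLDUP followed by VBLENDPS with immediate $85 (0x55) computes this shuffle, here is a scalar C++ model. It is a sketch and the helper names are hypothetical. It assumes %rcx points at %A and %rdx points at %B, and it models VBLENDPS in Intel operand order, where a set immediate bit selects the second source. Even lanes come straight from A, and odd lanes get the even lane of B just below them.

```cpp
#include <array>

using V8f = std::array<float, 8>;

// Reference semantics of the IR shuffle: indices 0..7 pick from A, 8..15 from B.
V8f shuffle_ref(const V8f& a, const V8f& b) {
    const int mask[8] = {0, 8, 2, 10, 4, 12, 6, 14};
    V8f r{};
    for (int i = 0; i < 8; ++i)
        r[i] = mask[i] < 8 ? a[mask[i]] : b[mask[i] - 8];
    return r;
}

// VMOVSLDUP model: copy each even element into the odd slot above it.
V8f movsldup(const V8f& v) {
    V8f r{};
    for (int i = 0; i < 8; i += 2) r[i] = r[i + 1] = v[i];
    return r;
}

// VBLENDPS model (Intel order): bit i set selects src2[i], clear selects src1[i].
V8f blendps(const V8f& src1, const V8f& src2, unsigned imm) {
    V8f r{};
    for (int i = 0; i < 8; ++i) r[i] = ((imm >> i) & 1u) ? src2[i] : src1[i];
    return r;
}

// The new lowering: 0x55 sets bits 0, 2, 4, 6, so even lanes come from A.
V8f shuffle_lowered(const V8f& a, const V8f& b) {
    return blendps(movsldup(b), a, 0x55);
}
```

For A = {0, ..., 7} and B = {10, ..., 17}, both functions produce {0, 10, 2, 12, 4, 14, 6, 16}.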
llvm-svn: 163053
output chain is correctly set up.
As an example, if the original load must happen before later stores, we need
to make sure the constructed VZEXT_LOAD is constrained to be before the stores.
rdar://11457792
llvm-svn: 163036
- In addition to the undefined case, if V2 is a zero vector, skip the second PSHUFB
and the POR as well, since PSHUFB zeroes elements with negative indices.
Patch by Sriram Murali <sriram.murali@intel.com>
llvm-svn: 163018
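Why the second PSHUFB and the POR can be dropped shows up in a scalar model of PSHUFB. A control byte with its high bit set (a "negative" index) produces zero, so shuffling an all-zero V2 contributes nothing to the OR. The C++ below is a simplified 128-bit sketch. shuffle2 and its 0..31 mask encoding are hypothetical and only mirror the general two-PSHUFB-plus-POR lowering.

```cpp
#include <array>
#include <cstdint>

using V16u8 = std::array<uint8_t, 16>;

// 128-bit PSHUFB model: the low four bits of each control byte select a source
// byte, unless the control byte's high bit is set, which yields zero.
V16u8 pshufb(const V16u8& src, const V16u8& ctrl) {
    V16u8 r{};
    for (int i = 0; i < 16; ++i)
        r[i] = (ctrl[i] & 0x80) ? uint8_t{0} : src[ctrl[i] & 0x0F];
    return r;
}

// Hypothetical two-input byte shuffle: mask[i] in 0..15 picks from v1,
// 16..31 picks from v2. Lanes owned by the other input get 0x80 (zeroed).
V16u8 shuffle2(const V16u8& v1, const V16u8& v2, const std::array<int, 16>& mask) {
    const uint8_t kZeroLane = 0x80;
    V16u8 m1{}, m2{};
    for (int i = 0; i < 16; ++i) {
        m1[i] = mask[i] < 16 ? static_cast<uint8_t>(mask[i]) : kZeroLane;
        m2[i] = mask[i] < 16 ? kZeroLane : static_cast<uint8_t>(mask[i] - 16);
    }
    V16u8 lo = pshufb(v1, m1);
    V16u8 hi = pshufb(v2, m2);  // all zero when v2 is a zero vector
    V16u8 r{};
    for (int i = 0; i < 16; ++i) r[i] = static_cast<uint8_t>(lo[i] | hi[i]);  // POR
    return r;
}
```

When v2 is all zeros, `hi` is all zeros too, so the result equals the first PSHUFB alone.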
llvm-svn: 162999
llvm-svn: 162892
align with FMA3.
llvm-svn: 162829
llvm-svn: 162805
llvm-svn: 162780
- Add a target-specific DAG optimization to recognize a PTEST-able pattern.
Such a pattern is an OR'd tree with X86ISD::OR as the root node. When the
X86ISD::OR node has only its flag result used as a boolean value and
all its leaves are extracted from the same vector, it can be folded into an
X86ISD::PTEST node.
llvm-svn: 162735
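As a scalar illustration (not the DAG combine itself), the C++ sketch below models the matched shape and PTEST's ZF result. It assumes every lane of the vector appears as a leaf of the OR tree, and it models only ZF, not CF. The function names are hypothetical. With both PTEST operands set to the same vector, ZF is set exactly when the OR of all lanes is zero.

```cpp
#include <array>
#include <cstdint>

using V4u32 = std::array<uint32_t, 4>;

// The matched shape: an OR tree whose leaves are every lane of one vector,
// where only "is the result zero?" is consumed.
bool or_tree_is_zero(const V4u32& v) {
    return ((v[0] | v[1]) | (v[2] | v[3])) == 0;
}

// Model of PTEST's ZF output only: ZF = ((a AND b) == 0) across all 128 bits.
// The real instruction also sets CF, which is not modeled here.
bool ptest_zf(const V4u32& a, const V4u32& b) {
    uint32_t acc = 0;
    for (int i = 0; i < 4; ++i) acc |= a[i] & b[i];
    return acc == 0;
}
```

Since v AND v equals v, `ptest_zf(v, v)` and `or_tree_is_zero(v)` agree for every v, which is what makes the fold valid under the stated assumption.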
SelectionDAGBuilder.
llvm-svn: 162661
llvm-svn: 162534
llvm-svn: 162214
this allows for better code generation.
Added a new DAGCombine transformation to convert FMAX and FMIN to FMAXC and
FMINC, which are commutative.
For example:
movaps %xmm0, %xmm1
movsd LC(%rip), %xmm0
minsd %xmm1, %xmm0
becomes:
minsd LC(%rip), %xmm0
llvm-svn: 162187
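Plain FMIN/FMAX cannot simply be commuted because MINSD is not symmetric. It returns its second operand whenever the comparison is false, and that includes NaN inputs and comparing +0.0 with -0.0. The scalar C++ model below is a sketch. The helper name minsd is hypothetical, and operands are in Intel order, so AT&T `minsd %xmm1, %xmm0` corresponds to `minsd(xmm0, xmm1)`. Swapping the operands changes the result unless NaNs and signed zeros can be ignored.

```cpp
#include <cmath>

// MINSD model (Intel order: minsd first, second). The first operand is
// returned only if it compares strictly less than the second; an unordered
// compare (either input NaN) or equal inputs such as +0.0 and -0.0 yield the
// second operand. std::isnan / std::signbit from <cmath> can observe this.
double minsd(double first, double second) {
    return first < second ? first : second;
}
```

For example, `minsd(NaN, 1.0)` is 1.0 but `minsd(1.0, NaN)` is NaN, and `minsd(0.0, -0.0)` is -0.0 while `minsd(-0.0, 0.0)` is +0.0.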
better compare/branch code.
llvm-svn: 162172
functional change intended.
llvm-svn: 162166
llvm-svn: 162164
arithmetic instructions. However, when small data types are used, a truncate
node appears between the SETCC node and the arithmetic operation. This patch
adds support for this pattern.
Before:
xorl %esi, %edi
testb %dil, %dil
setne %al
ret
After:
xorb %dil, %sil
setne %al
ret
rdar://12081007
llvm-svn: 162160
llvm-svn: 162089
reduce to only a single call to it, thus allowing it to be inlined by the compiler.
llvm-svn: 162088
llvm-svn: 161902
- FP_EXTEND only supports extending from vectors with matching element counts.
This results in the scalarization of extending to v2f64 from v2f32,
which will be legalized to v4f32, not matching v2f64.
- Add X86-specific VFPEXT supporting extending from v4f32 to v2f64.
- Add a BUILD_VECTOR lowering helper to recover the original
extension from v4f32 to v2f64.
- The test case is enhanced to include different vector widths.
llvm-svn: 161894
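A minimal C++ sketch of what the widened extension computes, assuming the <2 x float> input has already been widened to <4 x float> with the meaningful values in the low two lanes. vfpext_low is a hypothetical name, and the model describes a CVTPS2PD-style conversion rather than the LLVM node itself.

```cpp
#include <array>

// Hypothetical model: lanes 0 and 1 hold the original <2 x float> values after
// widening; lanes 2 and 3 are padding and are ignored by the extension.
std::array<double, 2> vfpext_low(const std::array<float, 4>& v) {
    return {static_cast<double>(v[0]), static_cast<double>(v[1])};
}
```

Because only the low two lanes are read, the element-count mismatch between v4f32 and v2f64 no longer forces scalarization.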
llvm-svn: 161860
Reduces compiled code size a little bit.
llvm-svn: 161859
putting a couple of if conditions in a better order.
llvm-svn: 161746
llvm-svn: 161745
llvm-svn: 161743
there are no legal 64-bit vector types that could be used as inputs to a 128-bit concat_vectors. Remove a target specific SDNode and its patterns that become unused as a result.
llvm-svn: 161742