| Commit message (Collapse) | Author | Age | Files | Lines |
| ... | |
| |
|
|
|
|
|
|
| |
- Enhance the fix to PR12312 to support wider integers, such as 256-bit
integers. If more than one fully evaluated vector is found, POR them
first, followed by the final PTEST.
llvm-svn: 163832
|
| |
|
|
|
|
|
|
|
|
|
| |
- BlockAddress has no support for the BA + offset form and there is no way to
propagate that offset into a machine operand;
- Add BA + offset support and a new interface 'getTargetBlockAddress' to
simplify target block address forming;
- All targets are modified to use the new interface and the X86 backend is enhanced to
support BA + offset addressing.
llvm-svn: 163743
|
| |
|
|
| |
llvm-svn: 163682
|
| |
|
|
|
|
| |
functions. No functional change.
llvm-svn: 163596
|
| |
|
|
| |
llvm-svn: 163547
|
| |
|
|
|
|
| |
- Fix a remaining issue of PR11674 as well
llvm-svn: 163528
|
| |
|
|
|
|
|
|
| |
- If a boolean value is generated from CMOV and tested as a boolean value,
simplify the use of the test result by referencing the original condition.
The RDRAND intrinsic is one such case.
llvm-svn: 163516
|
| |
|
|
|
|
|
|
| |
undefined or zeroinitializer.
I've added the "zeroinitializer" case in this patch.
llvm-svn: 163506
|
| |
|
|
| |
llvm-svn: 163473
|
| |
|
|
| |
llvm-svn: 163463
|
| |
|
|
| |
llvm-svn: 163461
|
| |
|
|
|
|
| |
FFLOOR of v4f32 to Expand for ARM. v2f64 was already correct.
llvm-svn: 163458
|
| |
|
|
|
|
| |
Added generation of the VPSHUFB instruction for <32 x i8> vector shuffle when possible.
llvm-svn: 163312
|
| |
|
|
| |
llvm-svn: 163295
|
| |
|
|
|
|
| |
lowering and patterns. This makes it consistent with the incoming DAG nodes from the DAG builder.
llvm-svn: 163293
|
| |
|
|
| |
llvm-svn: 163258
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- CodeGenPrepare pass for identifying div/rem ops
- Backend specifies the type mapping using addBypassSlowDivType
- Enabled only for Intel Atom with O2 32-bit -> 8-bit
- Replace IDIV with instructions which test its value and use DIVB if the value
is positive and less than 256.
- In the case when both the quotient and remainder of a divide are used, a DIV
and a REM instruction will be present in the IR. In the non-Atom case
they are both lowered to IDIVs and CSE removes the redundant IDIV instruction,
using the quotient and remainder from the first IDIV. However,
with this optimization CSE is not able to eliminate redundant
IDIV instructions because they are located in different basic blocks.
This is overcome by calculating both the quotient (DIV) and remainder (REM)
in each basic block that is inserted by the optimization and reusing the result
values when a subsequent DIV or REM instruction uses the same operands.
- Test cases check for the presence of the optimization when calculating
either the quotient, remainder, or both.
Patch by Tyler Nowicki!
llvm-svn: 163150
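The fast-path idea described in the commit above can be sketched at the source level. This is only an illustration under stated assumptions: the real change operates on LLVM IR in CodeGenPrepare, `DivRem` and `bypassSlowDiv` are hypothetical names, and testing both operands with a single OR and mask is an assumption about how "positive and less than 256" might be checked. The sketch computes quotient and remainder together in each branch, mirroring the point about reusing results within a block.

```cpp
#include <cstdint>

// Hypothetical result pair for illustration; not an LLVM type.
struct DivRem {
  std::int32_t quot;
  std::int32_t rem;
};

// Computes a / b and a % b, taking an 8-bit path when both operands
// are in [0, 255]. As with the plain operators, b must be nonzero.
DivRem bypassSlowDiv(std::int32_t a, std::int32_t b) {
  const std::uint32_t bits =
      static_cast<std::uint32_t>(a) | static_cast<std::uint32_t>(b);
  if ((bits & ~0xFFu) == 0) {
    // Fast path: no bit above bit 7 is set in either operand, so an
    // unsigned 8-bit divide gives the same quotient and remainder.
    const auto na = static_cast<std::uint8_t>(a);
    const auto nb = static_cast<std::uint8_t>(b);
    return {na / nb, na % nb};
  }
  // Slow path: full-width signed divide.
  return {a / b, a % b};
}
```

Both branches return the same values as the full-width operators; only the width of the divide differs.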
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
of 4.
Since this specific shuffle is widely used in many workloads, we gain ~10% performance on them.
shufflevector <8 x float> %A, <8 x float> %B, <8 x i32> <i32 0, i32 8, i32 2, i32 10, i32 4, i32 12, i32 6, i32 14>
vmovaps (%rdx), %ymm0
vshufps $8, %ymm0, %ymm0, %ymm0
vmovaps (%rcx), %ymm1
vshufps $8, %ymm0, %ymm1, %ymm1
vunpcklps %ymm0, %ymm1, %ymm0
vmovaps (%rcx), %ymm0
vmovsldup (%rdx), %ymm1
vblendps $85, %ymm0, %ymm1, %ymm0
llvm-svn: 163134
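As a rough check of why the three-instruction sequence above matches the shufflevector mask, the following scalar model may help. It is a sketch, not LLVM code: `shuffleVector`, `movsldup`, `blend` and `evenInterleave` are hypothetical helpers, and it assumes %A is the vector loaded from (%rcx) and %B the one loaded from (%rdx).

```cpp
#include <array>
#include <cstddef>

using V8 = std::array<float, 8>;

// Scalar model of shufflevector: indices 0..7 pick from a, 8..15 from b.
V8 shuffleVector(const V8& a, const V8& b, const std::array<int, 8>& mask) {
  V8 r{};
  for (std::size_t i = 0; i < 8; ++i)
    r[i] = mask[i] < 8 ? a[mask[i]] : b[mask[i] - 8];
  return r;
}

// Model of VMOVSLDUP: every odd lane receives a copy of the even lane below it.
V8 movsldup(const V8& v) {
  V8 r{};
  for (std::size_t i = 0; i < 8; ++i)
    r[i] = v[i & ~std::size_t{1}];
  return r;
}

// Model of VBLENDPS: bit i of imm set picks lane i from src2, else from src1.
V8 blend(const V8& src1, const V8& src2, unsigned imm) {
  V8 r{};
  for (std::size_t i = 0; i < 8; ++i)
    r[i] = ((imm >> i) & 1u) ? src2[i] : src1[i];
  return r;
}

// movsldup on b, then blend with immediate 85 (0b01010101) against a.
V8 evenInterleave(const V8& a, const V8& b) {
  return blend(movsldup(b), a, 85u);
}
```

With immediate 85, even lanes come from `a` and odd lanes from the duplicated even lanes of `b`, which is the mask <0, 8, 2, 10, 4, 12, 6, 14>.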
|
| |
|
|
| |
llvm-svn: 163053
|
| |
|
|
|
|
|
|
|
|
|
| |
output chain is correctly setup.
As an example, if the original load must happen before later stores, we need
to make sure the constructed VZEXT_LOAD is constrained to be before the stores.
rdar://11457792
llvm-svn: 163036
|
| |
|
|
|
|
|
|
|
| |
- In addition to the undefined case, if V2 is a zero vector, skip the 2nd PSHUFB and the POR
as well, since PSHUFB zeroes elements with negative indices.
Patch by Sriram Murali <sriram.murali@intel.com>
llvm-svn: 163018
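The reasoning in the commit above can be illustrated with a scalar model of PSHUFB. This is a sketch with hypothetical helper names (`pshufb`, `shuffleTwoInputs`, `shuffleZeroV2`), not the actual lowering code; it assumes a 16-byte shuffle whose mask uses 0..15 for V1 and 16..31 for V2.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using V16 = std::array<std::uint8_t, 16>;
using Ctl16 = std::array<std::int8_t, 16>;

// Model of PSHUFB: a negative control byte (bit 7 set) produces zero,
// otherwise the low four bits select a source byte.
V16 pshufb(const V16& src, const Ctl16& ctl) {
  V16 r{};
  for (std::size_t i = 0; i < 16; ++i)
    r[i] = ctl[i] < 0 ? std::uint8_t{0} : src[ctl[i] & 0x0F];
  return r;
}

// General two-input byte shuffle: PSHUFB on each input, then POR.
V16 shuffleTwoInputs(const V16& v1, const V16& v2,
                     const std::array<int, 16>& mask) {
  Ctl16 c1{}, c2{};
  for (std::size_t i = 0; i < 16; ++i) {
    const bool fromV1 = mask[i] < 16;
    c1[i] = fromV1 ? static_cast<std::int8_t>(mask[i]) : std::int8_t{-128};
    c2[i] = fromV1 ? std::int8_t{-128}
                   : static_cast<std::int8_t>(mask[i] - 16);
  }
  const V16 lo = pshufb(v1, c1);
  const V16 hi = pshufb(v2, c2);
  V16 r{};
  for (std::size_t i = 0; i < 16; ++i)
    r[i] = static_cast<std::uint8_t>(lo[i] | hi[i]);
  return r;
}

// When V2 is all zeros, one PSHUFB is enough: lanes that would read V2
// get a negative control byte and come out as zero anyway.
V16 shuffleZeroV2(const V16& v1, const std::array<int, 16>& mask) {
  Ctl16 c1{};
  for (std::size_t i = 0; i < 16; ++i)
    c1[i] = mask[i] < 16 ? static_cast<std::int8_t>(mask[i])
                         : std::int8_t{-128};
  return pshufb(v1, c1);
}
```

For a zero V2 the second PSHUFB contributes only zero bytes, so the POR leaves the first result unchanged.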
|
| |
|
|
| |
llvm-svn: 162999
|
| |
|
|
| |
llvm-svn: 162892
|
| |
|
|
|
|
| |
align with FMA3.
llvm-svn: 162829
|
| |
|
|
| |
llvm-svn: 162805
|
| |
|
|
| |
llvm-svn: 162780
|
| |
|
|
|
|
|
|
|
|
| |
- Add a target-specific DAG optimization to recognize a PTEST-able pattern.
Such a pattern is an OR'd tree with X86ISD::OR as the root node. When an
X86ISD::OR node has only its flag result used as a boolean value and
all its leaves are extracted from the same vector, it can be folded into an
X86ISD::PTEST node.
llvm-svn: 162735
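A scalar model may make the equivalence behind this fold easier to see. It is a sketch only: the 128-bit vector is modelled as two 64-bit lanes, and `anyBitSetViaOrTree`, `ptestZF` and `anyBitSetViaPtest` are hypothetical names rather than anything in the backend.

```cpp
#include <array>
#include <cstdint>

using V2x64 = std::array<std::uint64_t, 2>;

// The pattern being matched: extract every lane, OR them together,
// and use only the "is the result nonzero" flag.
bool anyBitSetViaOrTree(const V2x64& v) {
  return (v[0] | v[1]) != 0;
}

// Model of the zero flag produced by PTEST a, b:
// ZF is set when (a AND b) is zero in every bit.
bool ptestZF(const V2x64& a, const V2x64& b) {
  return (a[0] & b[0]) == 0 && (a[1] & b[1]) == 0;
}

// Testing a vector against itself answers the same question in one step.
bool anyBitSetViaPtest(const V2x64& v) {
  return !ptestZF(v, v);
}
```

Because `v AND v` is just `v`, the flag from `ptestZF(v, v)` carries the same information as the flag of the OR tree.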
|
| |
|
|
|
|
| |
SelectionDAGBuilder.
llvm-svn: 162661
|
| |
|
|
| |
llvm-svn: 162534
|
| |
|
|
| |
llvm-svn: 162214
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
this allows for better code generation.
Added a new DAGCombine transformation to convert FMAX and FMIN to FMAXC and
FMINC, which are commutative.
For example:
movaps %xmm0, %xmm1
movsd LC(%rip), %xmm0
minsd %xmm1, %xmm0
becomes:
minsd LC(%rip), %xmm0
llvm-svn: 162187
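The opening line of this commit message is cut off, so the following is offered as background rather than as the author's own explanation. The hardware MINSD is not commutative in general: when either operand is NaN, or both are zero, it returns its second (source) operand. A small scalar model, with hypothetical names `x86Minsd` and `commutesFor`, shows where operand order matters and where it does not.

```cpp
#include <cmath>

// Scalar model of MINSD dst, src: the result is dst only when dst < src;
// in every other case (including NaN operands and equal zeros) it is src.
double x86Minsd(double dst, double src) {
  return dst < src ? dst : src;
}

// True when swapping the operands gives a bitwise-equivalent result
// (same NaN-ness, same value, same sign of zero).
bool commutesFor(double a, double b) {
  const double x = x86Minsd(a, b);
  const double y = x86Minsd(b, a);
  if (std::isnan(x) || std::isnan(y))
    return std::isnan(x) && std::isnan(y);
  return x == y && std::signbit(x) == std::signbit(y);
}
```

For ordinary finite inputs the operands can be swapped freely, which is what lets a memory operand be folded as in the example above; NaN and signed-zero inputs are the cases where the order is observable.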
|
| |
|
|
|
|
| |
better compare/branch code.
llvm-svn: 162172
|
| |
|
|
|
|
| |
functional change intended.
llvm-svn: 162166
|
| |
|
|
| |
llvm-svn: 162164
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
arithmetic instructions. However, when small data types are used, a truncate
node appears between the SETCC node and the arithmetic operation. This patch
adds support for this pattern.
Before:
xorl %esi, %edi
testb %dil, %dil
setne %al
ret
After:
xorb %dil, %sil
setne %al
ret
rdar://12081007
llvm-svn: 162160
|
| |
|
|
| |
llvm-svn: 162089
|
| |
|
|
|
|
| |
reduce to only a single call to it thus allowing it to be inlined by the compiler.
llvm-svn: 162088
|
| |
|
|
| |
llvm-svn: 161902
|
| |
|
|
|
|
|
|
|
|
|
|
| |
- FP_EXTEND only supports extending from vectors with matching elements.
This results in the scalarization of extending to v2f64 from v2f32,
which will be legalized to v4f32, not matching v2f64.
- Add X86-specific VFPEXT supporting extending from v4f32 to v2f64.
- Add a BUILD_VECTOR lowering helper to recover the original
extending from v4f32 to v2f64.
- The test case is enhanced to include different vector widths.
llvm-svn: 161894
|
| |
|
|
| |
llvm-svn: 161860
|
| |
|
|
|
|
| |
Reduces compiled code size a little bit.
llvm-svn: 161859
|
| |
|
|
|
|
| |
putting a couple of if conditions in a better order.
llvm-svn: 161746
|
| |
|
|
| |
llvm-svn: 161745
|
| |
|
|
| |
llvm-svn: 161743
|
| |
|
|
|
|
| |
there are no legal 64-bit vector types that could be used as inputs to a 128-bit concat_vectors. Remove a target specific SDNode and its patterns that become unused as a result.
llvm-svn: 161742
|
| |
|
|
|
|
| |
integer type not an FP type.
llvm-svn: 161738
|
| |
|
|
|
|
| |
SSE42. It was already called for the same under SSE2.
llvm-svn: 161737
|
| |
|
|
| |
llvm-svn: 161734
|
| |
|
|
|
|
| |
actions. Compiles to smaller code.
llvm-svn: 161733
|
| |
|
|
|
|
|
|
| |
- FCMOV only supports a subset of X86 conditions. Skip boolean
simplification if the X86 condition is not valid for FCMOV.
- Add a minimal test case for PR13577.
llvm-svn: 161732
|