summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/X86
Commit message (Collapse)AuthorAgeFilesLines
* [X86] Add target combine rules for horizontal add/sub.Andrea Di Biagio2014-06-091-0/+376
| | | | | | | | | | | | | | | | | | | | This patch adds new target specific combine rules to identify horizontal add/sub idioms from BUILD_VECTOR dag nodes. This patch also teaches the DAGCombiner how to canonicalize sequences of insert_vector_elt dag nodes according to the following rule: (insert_vector_elt (insert_vector_elt A, I0), I1) -> (insert_vecto_elt (insert_vector_elt A, I1), I0) This new canonicalization rule only triggers if the inner insert_vector dag node has exactly one use; also, both indices must be known constants, and I1 < I0. This last rule made it possible to write a simpler algorithm to identify horizontal add/sub patterns because now we don't have to worry about the ordering of insert_vector_elt dag nodes. llvm-svn: 210477
* llvm/test/CodeGen/X86/2014-05-29-factorial.ll: Relax an expression to match ↵NAKAMURA Takumi2014-06-091-2/+2
| | | | | | Win32 x64. llvm-svn: 210471
* [X86] Avoid emitting unnecessary test instructions.Andrea Di Biagio2014-06-091-0/+24
| | | | | | | | | | | | | This patch teaches the backend how to check for the 'NoSignedWrap' flag on binary operations to improve the emission of 'test' instructions. If the result of a binary operation is known not to overflow we know that resetting the Overflow flag is unnecessary and so we can avoid emitting the test instruction. Patch by Marcello Maggioni. llvm-svn: 210468
* [DAG] Expose NoSignedWrap, NoUnsignedWrap and Exact flags to SelectionDAG.Andrea Di Biagio2014-06-091-0/+20
| | | | | | | | | | | | | This patch modifies SelectionDAGBuilder to construct SDNodes with associated NoSignedWrap, NoUnsignedWrap and Exact flags coming from IR BinaryOperator instructions. Added a new SDNode type called 'BinaryWithFlagsSDNode' to allow accessing nsw/nuw/exact flags during codegen. Patch by Marcello Maggioni. llvm-svn: 210467
* Fix typosAlp Toker2014-06-071-3/+3
| | | | llvm-svn: 210401
* X86: Don't turn shifts into ands if there's another use that may not check ↵Benjamin Kramer2014-06-061-0/+13
| | | | | | | | for equality. Fixes PR19964. llvm-svn: 210371
* Fixed a bug in lowering shuffle_vectors to insertpsFilipe Cabecinhas2014-06-062-2/+15
| | | | | | | | | | | | | | Summary: We were being too strict and not accounting for undefs. Added a test case and fixed another one where we improved codegen. Reviewers: grosbach, nadav, delena Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D4039 llvm-svn: 210361
* Add a new attribute called 'jumptable' that creates jump-instruction tables ↵Tom Roeder2014-06-053-0/+351
| | | | | | | | | | | | for functions marked with this attribute. It includes a pass that rewrites all indirect calls to jumptable functions to pass through these tables. This also adds backend support for generating the jump-instruction tables on ARM and X86. Note that since the jumptable attribute creates a second function pointer for a function, any function marked with jumptable must also be marked with unnamed_addr. llvm-svn: 210280
* Revert r209381 as it isn't a local variable. Add a testcase so thatEric Christopher2014-06-031-0/+23
| | | | | | we know next time this happens. llvm-svn: 210127
* Allow alias to point to an arbitrary ConstantExpr.Rafael Espindola2014-06-032-2/+15
| | | | | | | | | | | | | | | | | | | | | This patch changes GlobalAlias to point to an arbitrary ConstantExpr and it is up to MC (or the system assembler) to decide if that expression is valid or not. This reduces our ability to diagnose invalid uses and how early we can spot them, but it also lets us do things like @test5 = alias inttoptr(i32 sub (i32 ptrtoint (i32* @test2 to i32), i32 ptrtoint (i32* @bar to i32)) to i32*) An important implication of this patch is that the notion of aliased global doesn't exist any more. The alias has to encode the information needed to access it in its metadata (linkage, visibility, type, etc). Another consequence to notice is that getSection has to return a "const char *". It could return a NullTerminatedStringRef if there was such a thing, but when that was proposed the decision was to just uses "const char*" for that. llvm-svn: 210062
* [X86] Fix checked arithmetic for i8 on X86.Andrea Di Biagio2014-06-021-0/+24
| | | | | | | | | | | When lowering a ISD::BRCOND into a test+branch, make sure that we always use the correct condition code to emit the test operation. This fixes PR19858: "i8 checked mul is wrong on x86". Patch by Keno Fisher! llvm-svn: 210032
* Make blend tests more specificFilipe Cabecinhas2014-05-313-8/+17
| | | | | | | Following the lead set by r209324, I'm making these tests match the whole instruction, so we can be sure we're lowering them correctly. llvm-svn: 209947
* [X86] Add two combine rules to simplify dag nodes introduced during type ↵Andrea Di Biagio2014-05-302-27/+281
| | | | | | | | | | | | | | | | | | | | | | | legalization when promoting nodes with illegal vector type. This patch teaches the backend how to simplify/canonicalize dag node sequences normally introduced by the backend when promoting certain dag nodes with illegal vector type. This patch adds two new combine rules: 1) fold (shuffle (bitcast (BINOP A, B)), Undef, <Mask>) -> (shuffle (BINOP (bitcast A), (bitcast B)), Undef, <Mask>) 2) fold (BINOP (shuffle (A, Undef, <Mask>)), (shuffle (B, Undef, <Mask>))) -> (shuffle (BINOP A, B), Undef, <Mask>). Both rules are only triggered on the type-legalized DAG. In particular, rule 1. is a target specific combine rule that attempts to sink a bitconvert into the operands of a binary operation. Rule 2. is a target independet rule that attempts to move a shuffle immediately after a binary operation. llvm-svn: 209930
* Convert a vselect into a concat_vector if possibleFilipe Cabecinhas2014-05-301-0/+14
| | | | | | | | | | | | | | | | | | | | | Summary: If both vector args to vselect are concat_vectors and the condition is constant and picks half a vector from each argument, convert the vselect into a concat_vectors. Added a test. The ConvertSelectToConcatVector is assuming it doesn't get vselects with arguments of, for example, <undef, undef, true, true>. Those get taken care of in the checks above its call. Reviewers: nadav, delena, grosbach, hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D3916 llvm-svn: 209929
* Separate the check for blend shuffle_vector masksFilipe Cabecinhas2014-05-301-2/+2
| | | | | | | | | | | | | | | | | Summary: Separate the check for blend shuffle_vector masks into isBlendMask. This function will also be used to check if a vector shuffle is legal. No change in functionality was intended, but we ended up improving codegen on two tests, which were being (more) optimized only if the resulting shuffle was legal. Reviewers: nadav, delena, andreadb Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D3964 llvm-svn: 209923
* [pr19636] Fix known bit computation in urem instruction with power of two.Rafael Espindola2014-05-301-0/+14
| | | | | | Patch by Andrey Kuharev. llvm-svn: 209902
* [X86] Move test from r209863 to CodeGen/X86Adam Nemet2014-05-291-0/+41
| | | | | | We should only run this if X86 is in the targets. llvm-svn: 209866
* [X86] Remove AVX1 vbroadcast intrinsicsAdam Nemet2014-05-291-24/+0
| | | | | | | | | | | | | | | | | | | | | The corresponding CFE patch replaces these intrinsics with vector initializers in avxintrin.h. This patch removes the LLVM intrinsics from the backend. We now stop lowering at X86ISD::VBROADCAST custom node rather than lowering that further to the intrinsics. The patch only changes VBROADCASTS* and leaves VBROADCAST[FI]128 to continue to use intrinsics. As explained in the CFE patch, the reason is that we currently don't generate as good code for them without the intrinsics. CodeGen/X86/avx-vbroadcast.ll already provides coverage for this change. It checks that for a series of insertelements we generate the appropriate vbroadcast instruction. Also verified that there was no assembly change in the test-suite before and after this patch. llvm-svn: 209864
* Added tests for shufflevector lowering to blend instrs.Filipe Cabecinhas2014-05-293-0/+69
| | | | | | | | | | | | | | | | | | | | These tests ensure that a change I will propose in clang works as expected. Summary: Added tests for the generation of blend+immediate instructions from a shufflevector. These tests were proposed along with a patch that was dropped. I'm committing the tests anyway to protect against possible regressions in codegen. Reviewers: nadav, bkramer Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D3600 llvm-svn: 209853
* [x86] Fold extract_vector_elt of a load into the Load's address computation.Michael J. Spencer2014-05-291-1/+19
| | | | | | | | An address only use of an extract element of a load can be simplified to a load. Without this the result of the extract element is spilled to the stack so that an address is available. llvm-svn: 209788
* [pr19844] Add thread local mode to aliases.Rafael Espindola2014-05-282-2/+18
| | | | | | | | | | This matches gcc's behavior. It also seems natural given that aliases contain other properties that govern how it is accessed (linkage, visibility, dll storage). Clang still has to be updated to expose this feature to C. llvm-svn: 209759
* Convert some X86 blendv* intrinsics into IR.Filipe Cabecinhas2014-05-273-0/+66
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Implemented an InstCombine transformation that takes a blendv* intrinsic call and translates it into an IR select, if the mask is constant. This will eventually get lowered into blends with immediates if possible, or pblendvb (with an option to further optimize if we can transform the pblendvb into a blend+immediate instruction, depending on the selector). It will also enable optimizations by the IR passes, which give up on sight of the intrinsic. Both the transformation and the lowering of its result to asm got shiny new tests. The transformation is a bit convoluted because of blendvp[sd]'s definition: Its mask is a floating point value! This forces us to convert it and get the highest bit. I suppose this happened because the mask has type __m128 in Intel's intrinsic and v4sf (for blendps) in gcc's builtin. I will send an email to llvm-dev to discuss if we want to change this or not. Reviewers: grosbach, delena, nadav Differential Revision: http://reviews.llvm.org/D3859 llvm-svn: 209643
* Just check the entire string.Rafael Espindola2014-05-262-64/+64
| | | | | | Thanks to David Blaikie for the suggestion. llvm-svn: 209610
* Emit data or code export directives based on the type.Rafael Espindola2014-05-251-0/+4
| | | | | | | | | | | | | | | | | | | | | | Currently we look at the Aliasee to decide what type of export directive to use. It seems better to use the type of the alias directly. This is similar to how we handle the alias having the same address but other attributes (linkage, visibility) from the aliasee. With this patch it is now possible to do things like target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-pc-windows-msvc" @foo = global [6 x i8] c"\B8*\00\00\00\C3", section ".text", align 16 @f = dllexport alias i32 (), [6 x i8]* @foo !llvm.module.flags = !{!0} !0 = metadata !{i32 6, metadata !"Linker Options", metadata !1} !1 = metadata !{metadata !2, metadata !3} !2 = metadata !{metadata !"/DEFAULTLIB:libcmt.lib"} !3 = metadata !{metadata !"/DEFAULTLIB:oldnames.lib"} llvm-svn: 209600
* Make these CHECKs a bit more strict.Rafael Espindola2014-05-252-62/+62
| | | | | | The " at the end of the line makes sure we matched the entire directive. llvm-svn: 209599
* Use alias linkage and visibility to decide tls access mode.Rafael Espindola2014-05-231-2/+2
| | | | | | | | | | | | | | | | | This matches both what we do for the non-thread case and what gcc does. With this patch clang would match gcc's behaviour in static __thread int a = 42; extern __thread int b __attribute__((alias("a"))); int *f(void) { return &a; } int *g(void) { return &b; } if not for pr19843. Manually writing the IL does produce the same access modes. It is also a step in the direction of fixing pr19844. llvm-svn: 209543
* Remove unused CHECK linesNico Rieck2014-05-231-4/+0
| | | | llvm-svn: 209539
* Convert test to use FileCheck.Rafael Espindola2014-05-231-1/+5
| | | | llvm-svn: 209528
* [X86] Improve the lowering of BITCAST from MVT::f64 to MVT::v4i16/MVT::v8i8.Andrea Di Biagio2014-05-223-82/+157
| | | | | | | | | | | | | This patch teaches the x86 backend how to efficiently lower ISD::BITCAST dag nodes from MVT::f64 to MVT::v4i16 (and vice versa), and from MVT::f64 to MVT::v8i8 (and vice versa). This patch extends the logic from revision 208107 to also handle MVT::v4i16 and MVT::v8i8. Also, this patch correctly propagates Undef values when performing the widening of a vector (example: when widening from v2i32 to v4i32, the upper 64bits of the resulting vector are 'undef'). llvm-svn: 209451
* Segmented stacks: omit __morestack call when there's no frame.Tim Northover2014-05-221-7/+34
| | | | | | Patch by Florian Zeitz llvm-svn: 209436
* [X86] Fix a bug in the lowering of BLENDI introduced in r209043.Quentin Colombet2014-05-212-9/+41
| | | | | | | | | | | | | | | | | | | ISD::VSELECT mask uses 1 to identify the first argument and 0 to identify the second argument. On the other hand, BLENDI uses 0 to identify the first argument and 1 to identify the second argument. Fix the generation of the blend mask to account for this difference. The bug did not show up with r209043, because we were not checking for the actual arguments of the blend instruction! This commit also fixes the test cases. Note: The same mask works for the BLENDr variant because the arguments are swapped during instruction selection (see the BLENDXXrr patterns). <rdar://problem/16975435> llvm-svn: 209324
* Move the function and data section flags into the options struct andEric Christopher2014-05-201-37/+37
| | | | | | | | | | make the functions to set them non-static. Move and rename the llvm specific backend options to avoid conflicting with the clang option. Paired with a backend commit to update. llvm-svn: 209238
* [LSR] Canonicalize reg1 + ... + regN into reg1 + ... + 1*regN.Quentin Colombet2014-05-202-7/+10
| | | | | | | | | | | | | | This commit introduces a canonical representation for the formulae. Basically, as soon as a formula has more that one base register, the scaled register field is used for one of them. The register put into the scaled register is preferably a loop variant. The commit refactors how the formulae are built in order to produce such representation. This yields a more accurate, but still perfectible, cost model. <rdar://problem/16731508> llvm-svn: 209230
* Avoids DCE on write_registerRenato Golin2014-05-201-0/+3
| | | | llvm-svn: 209222
* Legalizer: Make bswap promotion safe for vectors.Benjamin Kramer2014-05-201-0/+17
| | | | llvm-svn: 209202
* DebugInfo: Assume all subprogram DIEs have been created before any abstract ↵David Blaikie2014-05-191-1/+1
| | | | | | | | | | | | | | subprograms are constructed. Since we visit the whole list of subprograms for each CU at module start, this is clearly true - don't test for the case, just assert it. A few old test cases seemed to have incomplete subprogram lists, but any attempt to reproduce them shows full subprogram lists that even include entities that have been completely inlined and the out of line definition removed. llvm-svn: 209178
* [X86] Add ISel patterns to improve the selection of TZCNT and LZCNT.Andrea Di Biagio2014-05-191-0/+447
| | | | | | | | | | Instructions TZCNT (requires BMI1) and LZCNT (requires LZCNT), always provide the operand size as output if the input operand is zero. We can take advantage of this knowledge during instruction selection stage in order to simplify a few corner case. llvm-svn: 209159
* Added more insertps optimizationsFilipe Cabecinhas2014-05-193-2/+221
| | | | | | | | | | | | | | | | | | | | Summary: When inserting an element that's coming from a vector load or a broadcast of a vector (or scalar) load, combine the load into the insertps instruction. Added PerformINSERTPSCombine for the case where we need to fix the load (load of a vector + insertps with a non-zero CountS). Added patterns for the broadcasts. Also added tests for SSE4.1, AVX, and AVX2. Reviewers: delena, nadav, craig.topper Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D3581 llvm-svn: 209156
* SDAG: Legalize vector BSWAP into a shuffle if the shuffle is legal but the ↵Benjamin Kramer2014-05-191-6/+114
| | | | | | | | | | bswap not. - On ARM/ARM64 we get a vrev because the shuffle matching code is really smart. We still unroll anything that's not v4i32 though. - On X86 we get a pshufb with SSSE3. Required more cleverness in isShuffleMaskLegal. - On PPC we get a vperm for v8i16 and v4i32. v2i64 is unrolled. llvm-svn: 209123
* Change the blend tests to AVX, not AVX2.Filipe Cabecinhas2014-05-191-1/+1
| | | | llvm-svn: 209107
* [x86] Fix a bad predicate I spotted by inspection -- pshufhw and pshuflwChandler Carruth2014-05-171-1/+1
| | | | | | | | | | were added in SSE2, no SSSE3. Found this while auditing all uses of SSSE3 in the X86 target. I don't actually expect this to make a significant difference on anything and I don't have any detailed test cases but I updated the existing test cases that already covered some of this code path. llvm-svn: 209056
* Implemented special cases for PerformVSELECTCombine.Filipe Cabecinhas2014-05-161-5/+5
| | | | | | | | | | vselects with constant masks, after legalization, will get turned into specialized shuffle_vectors so they can be matched to blend+imm instructions. Fixed some tests. llvm-svn: 209044
* Lower vselects into X86ISD::BLENDI when appropriate.Filipe Cabecinhas2014-05-165-18/+45
| | | | | | | | | | | | | | | | LowerVSELECT will, if possible, generate a X86ISD::BLENDI DAG node if the condition is constant and we can emit that instruction, given the subtarget. This is not enough for all cases. An additional SELECTCombine optimization will be committed. Fixed tests that were expecting variable blends but where a blend+imm can be generated. Added test where we can't emit blend+immediate. Added avx2 blend+imm tests. llvm-svn: 209043
* Fix most of PR10367.Rafael Espindola2014-05-163-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch changes the design of GlobalAlias so that it doesn't take a ConstantExpr anymore. It now points directly to a GlobalObject, but its type is independent of the aliasee type. To avoid changing all alias related tests in this patches, I kept the common syntax @foo = alias i32* @bar to mean the same as now. The cases that used to use cast now use the more general syntax @foo = alias i16, i32* @bar. Note that GlobalAlias now behaves a bit more like GlobalVariable. We know that its type is always a pointer, so we omit the '*'. For the bitcode, a nice surprise is that we were writing both identical types already, so the format change is minimal. Auto upgrade is handled by looking through the casts and no new fields are needed for now. New bitcode will simply have different types for Alias and Aliasee. One last interesting point in the patch is that replaceAllUsesWith becomes smart enough to avoid putting a ConstantExpr in the aliasee. This seems better than checking and updating every caller. A followup patch will delete getAliasedGlobal now that it is redundant. Another patch will add support for an explicit offset. llvm-svn: 209007
* DebugInfo: Assume the CU's Subprogram list only contains definitions.David Blaikie2014-05-161-1/+1
| | | | | | | | | | | DIBuilder maintains this invariant and the current DwarfDebug code could end up doing weird things if it contained declarations (such as putting the definition DIE inside a CU that contained the declaration - this doesn't seem like a good idea, so rather than adding logic to handle this case we'll just ban in for now & cross that bridge if we come to it later). llvm-svn: 209004
* llvm/test/CodeGen/X86/combine-sse41-intrinsics.ll: Add explicit triple.NAKAMURA Takumi2014-05-151-1/+1
| | | | llvm-svn: 208897
* [X86] Teach the backend how to fold SSE4.1/AVX/AVX2 blend intrinsics.Andrea Di Biagio2014-05-153-0/+414
| | | | | | | | | | | | | Added target specific combine rules to fold blend intrinsics according to the following rules: 1) fold(blend A, A, Mask) -> A; 2) fold(blend A, B, <allZeros>) -> A; 3) fold(blend A, B, <allOnes>) -> B. Added two new tests to verify that the new folding rules work for all the optimized blend intrinsics. llvm-svn: 208895
* Rename ComputeMaskedBits to computeKnownBits. "Masked" has beenJay Foad2014-05-141-1/+1
| | | | | | inappropriate since it lost its Mask parameter in r154011. llvm-svn: 208811
* X86: If we have an instruction that sets a flag and a zero test on the input ↵Benjamin Kramer2014-05-141-1/+75
| | | | | | | | | | | | | | | | | | | | | of that instruction try to eliminate the test. For example tzcntl %edi, %ebx testl %edi, %edi je .label can be rewritten into tzcntl %edi, %ebx jb .label A minor complication is that tzcnt sets CF instead of ZF when the input is zero, we have to rewrite users of the flags from ZF to CF. Currently we recognize patterns using lzcnt, tzcnt and popcnt. Differential Revision: http://reviews.llvm.org/D3454 llvm-svn: 208788
* [CGP] r205941 changed the logic, so that a cast happens *before* 'Result' isJoey Gouly2014-05-131-0/+14
| | | | | | | | | | | compared to 'AddrMode.BaseReg'. In the case that 'AddrMode.BaseReg' is nullptr, 'Result' will also be nullptr, so the cast causes an assertion. We should use dyn_cast_or_null here to check 'Result' is not null and it is an instruction. Bug found by Mats Petersson, and I reduced his IR to get a test case. llvm-svn: 208705
OpenPOWER on IntegriCloud