summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/X86
Commit message (Collapse)AuthorAgeFilesLines
* X86 memcpy lowering: use "rep movs" even when esi is used as base pointerHans Wennborg2014-03-182-3/+10
| | | | | | | | | | | | | For functions where esi is used as base pointer, we would previously fall back from lowering memcpy with "rep movs" because that clobbers esi. With this patch, we just store esi in another physical register, and restore it afterwards. This adds a little bit of register preassure, but the more efficient memcpy should be worth it. Differential Revision: http://llvm-reviews.chandlerc.com/D2968 llvm-svn: 204174
* Fix test lsr-normalization.ll broken in r204161.Michael Zolotukhin2014-03-181-5/+5
| | | | llvm-svn: 204166
* Add stride normalization to SCEV Normalize/Denormalize transformation.Michael Zolotukhin2014-03-181-1/+4
| | | | llvm-svn: 204161
* [DAGCombiner] teach how to simplify xor/and/or nodes according to the ↵Andrea Di Biagio2014-03-182-0/+255
| | | | | | | | | | | | | | following rules: 1) (AND (shuf (A, C, Mask), shuf (B, C, Mask)) -> shuf (AND (A, B), C, Mask) 2) (OR (shuf (A, C, Mask), shuf (B, C, Mask)) -> shuf (OR (A, B), C, Mask) 3) (XOR (shuf (A, C, Mask), shuf (B, C, Mask)) -> shuf (XOR (A, B), V_0, Mask) 4) (AND (shuf (C, A, Mask), shuf (C, B, Mask)) -> shuf (C, AND (A, B), Mask) 5) (OR (shuf (C, A, Mask), shuf (C, B, Mask)) -> shuf (C, OR (A, B), Mask) 6) (XOR (shuf (C, A, Mask), shuf (C, B, Mask)) -> shuf (V_0, XOR (A, B), Mask) llvm-svn: 204160
* Make DAGCombiner work on vector bitshifts with constant splat vectors.Matt Arsenault2014-03-172-24/+155
| | | | llvm-svn: 204071
* [VectorLegalizer/X86] Don't unvectorize fp_to_uint for v8f32->v8i16Adam Nemet2014-03-171-0/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | Rather than LegalizeAction::Expand, this needs LegalizeAction::Promote to get promoted to fp_to_sint v8f32->v8i32. This is a legal operation on AVX. For that to work properly, we also need to teach the legalizer about the specific promotion required here. The default vector promotion uses bitcasting to a vector type of the same total size. We want to promote the vector element type, effectively widening the operation and then truncating the result. This is analogous to the current logic of how int_to_fp is promoted. The change also factors out some code from the int_to_fp promotion code to ValueType::widenIntegerVectorElementType. This is now shared between int_to_fp and fp_to_int. There is no longer need for the custom lowering of fp_to_sint f32->v8i16 in X86. It can now go through the new target-independent fp_to_*int promotion logic. I also checked that no other target uses Promote for these ops yet, so there shouldn't be any unexpected change in behavior. Fixes <rdar://problem/16202247> llvm-svn: 204058
* [X86] New and improved VZeroUpperInserter optimization.Lang Hames2014-03-171-9/+28
| | | | | | | | | | | | | | | | | - Adds support for inserting vzerouppers before tail-calls. This is enabled implicitly by having MachineInstr::copyImplicitOps preserve regmask operands, which allows VZeroUpperInserter to see where tail-calls use vector registers. - Fixes a bug that caused the previous version of this optimization to miss some vzeroupper insertion points in loops. (Loops-with-vector-code that followed loops-without-vector-code were mistakenly overlooked by the previous version). - New algorithm never revisits instructions. Fixes <rdar://problem/16228798> llvm-svn: 204021
* Re-add checks that were in this testcase before it was converted to dwarfdump.Adrian Prantl2014-03-141-0/+6
| | | | llvm-svn: 203981
* Remove the linker_private and linker_private_weak linkages.Rafael Espindola2014-03-133-21/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These linkages were introduced some time ago, but it was never very clear what exactly their semantics were or what they should be used for. Some investigation found these uses: * utf-16 strings in clang. * non-unnamed_addr strings produced by the sanitizers. It turns out they were just working around a more fundamental problem. For some sections a MachO linker needs a symbol in order to split the section into atoms, and llvm had no idea that was the case. I fixed that in r201700 and it is now safe to use the private linkage. When the object ends up in a section that requires symbols, llvm will use a 'l' prefix instead of a 'L' prefix and things just work. With that, these linkages were already dead, but there was a potential future user in the objc metadata information. I am still looking at CGObjcMac.cpp, but at this point I am convinced that linker_private and linker_private_weak are not what they need. The objc uses are currently split in * Regular symbols (no '\01' prefix). LLVM already directly provides whatever semantics they need. * Uses of a private name (start with "\01L" or "\01l") and private linkage. We can drop the "\01L" and "\01l" prefixes as soon as llvm agrees with clang on L being ok or not for a given section. I have two patches in code review for this. * Uses of private name and weak linkage. The last case is the one that one could think would fit one of these linkages. That is not the case. The semantics are * the linker will merge these symbol by *name*. * the linker will hide them in the final DSO. Given that the merging is done by name, any of the private (or internal) linkages would be a bad match. They allow llvm to rename the symbols, and that is really not what we want. From the llvm point of view, these objects should really be (linkonce|weak)(_odr)?. For now, just keeping the "\01l" prefix is probably the best for these symbols. If we one day want to have a more direct support in llvm, IMHO what we should add is not a linkage, it is just a hidden_symbol attribute. It would be applicable to multiple linkages. For example, on weak it would produce the current behavior we have for objc metadata. On internal, it would be equivalent to private (and we should then remove private). llvm-svn: 203866
* Add -mtriple=x86_64-linux to this test case to fix the build bots.5Kevin Enderby2014-03-131-1/+1
| | | | | | The original commit was r203829. llvm-svn: 203844
* Fix for http://llvm.org/bugs/show_bug.cgi?id=18590Ekaterina Romanova2014-03-131-0/+83
| | | | | | | This patch fixes the bug in peephole optimization that folds a load which defines one vreg into the one and only use of that vreg. With debug info, a DBG_VALUE that referenced the vreg considered to be a use, preventing the optimization. The fix is to ignore DBG_VALUE's during the optimization, and undef a DBG_VALUE that references a vreg that gets removed. Patch by Trevor Smigiel! llvm-svn: 203829
* CodeGenPrep: sink extends of illegal types into use block.Manuel Jacob2014-03-131-0/+32
| | | | | | | | | | | | | | | | | | | Summary: This helps the instruction selector to lower an i64 * i64 -> i128 multiplication into a single instruction on targets which support it. This is an update of D2973 which was reverted because of a bug reported as PR19084. Reviewers: t.p.northover, chapuni Reviewed By: t.p.northover CC: llvm-commits, alex, chapuni Differential Revision: http://llvm-reviews.chandlerc.com/D3021 llvm-svn: 203797
* AVX-512: masked load/store + intrinsics for them.Elena Demikhovsky2014-03-131-0/+16
| | | | llvm-svn: 203790
* [X86] Add peephole for masked rotate amountAdam Nemet2014-03-121-0/+8
| | | | | | | | | | | | | | | | Extend what's currently done for shift because the HW performs this masking implicitly: (rotl:i32 x, (and y, 31)) -> (rotl:i32 x, y) I use the newly factored out multiclass that was only supporting shifts so far. For testing I extended my testcase for the new rotation idiom. <rdar://problem/15295856> llvm-svn: 203718
* Reject alias to undefined symbols in the verifier.Rafael Espindola2014-03-122-6/+1
| | | | | | | | | | | | | | | On ELF and COFF an alias is just another name for a position in the file. There is no way to refer to a position in another file, so an alias to undefined is meaningless. MachO currently doesn't support aliases. The spec has a N_INDR, which when implemented will have a different set of restrictions. Adding support for it shouldn't be harder than any other IR extension. For now, having the IR represent what is actually possible with current tools makes it easier to fix the design of GlobalAlias. llvm-svn: 203705
* X86: Don't generate 64-bit movd after cmpneqsd in 32-bit mode (PR19059)Hans Wennborg2014-03-111-2/+26
| | | | | | | | | This fixes the bug where we would bitcast the 64-bit floating point result of cmpneqsd to a 64-bit integer even on 32-bit targets. Differential Revision: http://llvm-reviews.chandlerc.com/D3009 llvm-svn: 203581
* IR: add a second ordering operand to cmpxhg for failureTim Northover2014-03-1111-32/+32
| | | | | | | | | | | | | | | The syntax for "cmpxchg" should now look something like: cmpxchg i32* %addr, i32 42, i32 3 acquire monotonic where the second ordering argument gives the required semantics in the case that no exchange takes place. It should be no stronger than the first ordering constraint and cannot be either "release" or "acq_rel" (since no store will have taken place). rdar://problem/15996804 llvm-svn: 203559
* X86: Enable ISel of 16-bit MOVBE instructions.Jim Grosbach2014-03-111-12/+33
| | | | | | | | | | | | | | | | | When the MOVBE instructions are available, use them for 16-bit endian swapping as well as for 32 and 64 bit. The patterns were already present on the instructions, but weren't being matched because the operation was unconditionally marked to 'Expand.' Change that to be conditional on whether the MOVBE instructions are available. Use 'rolw' to implement the in-register version (32 and 64 bit have the dedicated 'bswap' instruction for that). Patch by Louis Gerbarg <lgg@apple.com>. rdar://15479984 llvm-svn: 203524
* Fix undefined behavior in vector shift tests.Matt Arsenault2014-03-112-28/+28
| | | | | | These were all shifting the same amount as the bitwidth. llvm-svn: 203519
* Revert r203230, "CodeGenPrep: sink extends of illegal types into use block."NAKAMURA Takumi2014-03-091-32/+0
| | | | | | It choked i686 stage2. llvm-svn: 203386
* IR: Change inalloca's grammar a bitDavid Majnemer2014-03-095-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The grammar for LLVM IR is not well specified in any document but seems to obey the following rules: - Attributes which have parenthesized arguments are never preceded by commas. This form of attribute is the only one which ever has optional arguments. However, not all of these attributes support optional arguments: 'thread_local' supports an optional argument but 'addrspace' does not. Interestingly, 'addrspace' is documented as being a "qualifier". What constitutes a qualifier? I cannot find a definition. - Some attributes use a space between the keyword and the value. Examples of this form are 'align' and 'section'. These are always preceded by a comma. - Otherwise, the attribute has no argument. These attributes do not have a preceding comma. Sometimes an attribute goes before the instruction, between the instruction and it's type, or after it's type. 'atomicrmw' has 'volatile' between the instruction and the type while 'call' has 'tail' preceding the instruction. With all this in mind, it seems most consistent for 'inalloca' on an 'inalloca' instruction to occur before between the instruction and the type. Unlike the current formulation, there would be no preceding comma. The combination 'alloca inalloca' doesn't look particularly appetizing, perhaps a better spelling of 'inalloca' is down the road. llvm-svn: 203376
* Update comment from r203315 based on reviewAdam Nemet2014-03-081-1/+1
| | | | llvm-svn: 203361
* DebugInfo: further improvements to test following up on r203329David Blaikie2014-03-081-7/+7
| | | | llvm-svn: 203337
* DebugInfo: Fix test fallout from r203323David Blaikie2014-03-081-1/+1
| | | | | | Will fix this harder in a moment. llvm-svn: 203329
* [DAGCombiner] Recognize another rotation idiomAdam Nemet2014-03-071-0/+126
| | | | | | | | | | | | | | | | This is the new idiom: x<<(y&31) | x>>((0-y)&31) which is recognized as: x ROTL (y&31) The change refines matchRotateSub. In Neg & (OpSize - 1) == (OpSize - Pos) & (OpSize - 1), if Pos is Pos' & (OpSize - 1) we can just use Pos' instead of Pos. llvm-svn: 203315
* ISel: Make VSELECT selection terminate in cases where the condition type has toArnold Schwaighofer2014-03-071-0/+14
| | | | | | | | | | | | | | | be split and the result type widened. When the condition of a vselect has to be split it makes no sense widening the vselect and thereby widening the condition. We end up in an endless loop of widening (vselect result type) and splitting (condition mask type) doing this. Instead, split both the condition and the vselect and widen the result. I ran this over the test suite with i686 and mattr=+sse and saw no regressions. Fixes PR18036. llvm-svn: 203311
* CodeGenPrep: sink extends of illegal types into use block.Tim Northover2014-03-071-0/+32
| | | | | | | | | This helps the instruction selector to lower an i64 * i64 -> i128 multiplication into a single instruction on targets which support it. Patch by Manuel Jacob. llvm-svn: 203230
* Replace PROLOG_LABEL with a new CFI_INSTRUCTION.Rafael Espindola2014-03-071-4/+4
| | | | | | | | | | | | | | | | | | | | | | | The old system was fairly convoluted: * A temporary label was created. * A single PROLOG_LABEL was created with it. * A few MCCFIInstructions were created with the same label. The semantics were that the cfi instructions were mapped to the PROLOG_LABEL via the temporary label. The output position was that of the PROLOG_LABEL. The temporary label itself was used only for doing the mapping. The new CFI_INSTRUCTION has a 1:1 mapping to MCCFIInstructions and points to one by holding an index into the CFI instructions of this function. I did consider removing MMI.getFrameInstructions completelly and having CFI_INSTRUCTION own a MCCFIInstruction, but MCCFIInstructions have non trivial constructors and destructors and are somewhat big, so the this setup is probably better. The net result is that we don't create temporary labels that are never used. llvm-svn: 203204
* Remove shouldEmitUsedDirectiveFor.Rafael Espindola2014-03-061-1/+5
| | | | | | Clang now uses llvm.compiler.used for these cases. llvm-svn: 203174
* Convert test to FileCheck.Rafael Espindola2014-03-061-3/+5
| | | | llvm-svn: 203173
* [X86] Teach the DAGCombiner how to fold a OR of two shufflevector nodes.Andrea Di Biagio2014-03-061-0/+267
| | | | | | | | | | | | | | | | | | | | | | This patch teaches the DAGCombiner how to fold a binary OR between two shufflevector into a single shuffle vector when possible. The rules are: 1. fold (or (shuf A, V_0, MA), (shuf B, V_0, MB)) -> (shuf A, B, Mask1) 2. fold (or (shuf A, V_0, MA), (shuf B, V_0, MB)) -> (shuf B, A, Mask2) The DAGCombiner can take advantage of the fact that OR is commutative and compute two possible shuffle masks (Mask1 and Mask2) for the resulting shuffle node. Before folding a dag according to either rule 1 or 2, DAGCombiner verifies that the resulting shuffle mask is legal for the target. DAGCombiner would firstly try to fold according to 1.; If not possible then it will try to fold according to 2. If both Mask1 and Mask2 are illegal then we conservatively don't fold the OR instruction. llvm-svn: 203156
* Always print the implicit .text at the start of an asm file.Rafael Espindola2014-03-052-2/+11
| | | | | | | | | | | | | | | | | Before llvm-mc would print it, but llc was assuming that it would produce another section changing directive before one was needed. That assumption is false with inline asm. Fixes PR19049. Another option would be to always create the section, but in the asm printer avoid printing sections changes during initialization. That would work, but * We do use the fact that llvm-mc prints it in testing. The tests can be changed if needed. * A quick poll on IRC suggest that most developers prefer the implicit .text to be printed. llvm-svn: 203001
* Lower AVX v4i64->v4i32 truncate to one shuffle.Cameron McInally2014-03-051-3/+5
| | | | llvm-svn: 202996
* Make stackmap machineinstrs clobber the scratch regs too.Andrew Trick2014-03-051-2/+22
| | | | | | | | | | | | | Patchpoints already did this. Doing it for stackmaps is a convenience for the runtime in the event that it needs to scratch register to patch or perform a runtime call thunk. Unlike patchpoints, we just assume the AnyRegCC calling convention. This is the only language and target independent calling convention specific to stackmaps so makes sense. Although the calling convention is not currently used to select the scratch registers. llvm-svn: 202943
* Check for dynamic allocas and inline asm that clobbers sp before buildingHans Wennborg2014-03-052-1/+43
| | | | | | | | | | | | | | | | | | | selection dag (PR19012) In X86SelectionDagInfo::EmitTargetCodeForMemcpy we check with MachineFrameInfo to make sure that ESI isn't used as a base pointer register before we choose to emit rep movs (which clobbers esi). The problem is that MachineFrameInfo wouldn't know about dynamic allocas or inline asm that clobbers the stack pointer until SelectionDAGBuilder has encountered them. This patch fixes the problem by checking for such things when building the FunctionLoweringInfo. Differential Revision: http://llvm-reviews.chandlerc.com/D2954 llvm-svn: 202930
* AVX-512: Fixed extract_vector_elt for v8i1 vectorElena Demikhovsky2014-03-021-6/+18
| | | | llvm-svn: 202624
* SpillPlacement: fix a bug in iterate.Manman Ren2014-02-281-0/+292
| | | | | | | | | | | Inside iterate, we scan backwards then scan forwards in a loop. When iteration is not zero, the last node was just updated so we can skip it. But when iteration is zero, we can't skip the last node. For the testing case, fixing this will save a spill and move register copies from hot path to cold path. llvm-svn: 202557
* Test commitAdam Nemet2014-02-281-0/+1
| | | | llvm-svn: 202528
* Stop test/CodeGen/X86/v4i32load-crash.ll targeting non-X86-64 targets.Daniel Sanders2014-02-271-2/+3
| | | | | | | | | | | | | Summary: Fixes an issue where a test attempts to use -mcpu=x86-64 on non-X86-64 targets. This triggers an assertion in the MIPS backend since it doesn't know what ABI to use by default for unrecognized processors. CC: llvm-commits, rafael Differential Revision: http://llvm-reviews.chandlerc.com/D2877 llvm-svn: 202369
* Add a limit to the heuristic that register allocates instructions in local ↵Andrew Trick2014-02-261-1/+1
| | | | | | | | | | | | order. This handles pathological cases in which we see 2x increase in spill code for large blocks (~50k instructions). I don't have a unit test for this behavior. Fixes rdar://16072279. llvm-svn: 202304
* Lower unsigned vsetcc to psubus in certain casesQuentin Colombet2014-02-262-12/+101
| | | | | | | | | | | | | | | | | | | | | The current approach to lower a vsetult is to flip the sign bit of the operands, swap the operands and then use a (signed) pcmpgt. psubus (unsigned saturating subtract) can be used to emulate a vsetult more efficiently: + case ISD::SETULT: { + // If the comparison is against a constant we can turn this into a + // setule. With psubus, setule does not require a swap. This is + // beneficial because the constant in the register is no longer + // destructed as the destination so it can be hoisted out of a loop. I also enable lowering via psubus in a few other cases where it's clearly beneficial: setule and setuge if minu/maxu cannot be used. rdar://problem/14338765 Patch by Adam Nemet <anemet@apple.com>. llvm-svn: 202301
* Store a DataLayout in Module.Rafael Espindola2014-02-251-2/+1
| | | | | | | | | | | | | | Now that DataLayout is not a pass, store one in Module. Since the C API expects to be able to get a char* to the datalayout description, we have to keep a std::string somewhere. This patch keeps it in Module and also uses it to represent modules without a DataLayout. Once DataLayout is mandatory, we should probably move the string to DataLayout itself since it won't be necessary anymore to represent the special case of a module without a DataLayout. llvm-svn: 202190
* AVX-512: Fixed encoding of VPTESTMQElena Demikhovsky2014-02-231-0/+14
| | | | llvm-svn: 201980
* Make test more resilient against scheduling decisions.Benjamin Kramer2014-02-221-3/+3
| | | | | | Should bring the atom buildbots back to life. llvm-svn: 201951
* llvm/test/CodeGen/X86/shift-pcmp.ll: Tweak to appease FileCheck. ↵NAKAMURA Takumi2014-02-221-1/+1
| | | | | | | | | | | | | | | | "CHECK-LABEL" doesn't identify labels magically and CHECK-LABEL behaves free from other contexts. For targeting pecoff, ".def foo" appears before ".short 32". .def foo; ... .LCPI0_0: .short 32 foo: CHECK-LABEL seeks not from ".short 32" but from the top of the input. llvm-svn: 201931
* [DAGCombiner] PCMP* sets its result to all ones or zeros so we can AND with theQuentin Colombet2014-02-211-0/+30
| | | | | | | | | | | | | | shifted mask rather than masking and shifting separately. The patch adds this transformation to the DAGCombiner:   (shl (and (setcc:i8v16 ...) N01C) N1C) -> (and (setcc:i8v16 ...) N01C<<N1C) <rdar://problem/16054492> Patch by Adam Nemet <anemet@apple.com> llvm-svn: 201906
* AVX-512: added a lit test for truncate operationElena Demikhovsky2014-02-201-0/+13
| | | | llvm-svn: 201763
* Add back r201608, r201622, r201624 and r201625Rafael Espindola2014-02-195-4/+83
| | | | | | | | | | | | | | r201608 made llvm corretly handle private globals with MachO. r201622 fixed a bug in it and r201624 and r201625 were changes for using private linkage, assuming that llvm would do the right thing. They all got reverted because r201608 introduced a crash in LTO. This patch includes a fix for that. The issue was that TargetLoweringObjectFile now has to be initialized before we can mangle names of private globals. This is trivially true during the normal codegen pipeline (the asm printer does it), but LTO has to do it manually. llvm-svn: 201700
* Fix AVX512 vector sqrt assembly strings.Cameron McInally2014-02-191-0/+18
| | | | llvm-svn: 201681
* Revert r201622 and r201608.Daniel Jasper2014-02-195-83/+4
| | | | | | | This causes the LLVMgold plugin to segfault. More information on the replies to r201608. llvm-svn: 201669
OpenPOWER on IntegriCloud