summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen
Commit message (Collapse)AuthorAgeFilesLines
* [PPC64LE] Teach swap optimization about the doubleword splat idiomBill Schmidt2015-07-021-0/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With a previous patch, the VSX swap optimization is able to recognize the doubleword load-splat idiom that can be implemented using lxvdsx. However, that does not cover a doubleword splat where the source is a register. We can implement this using xxspltd (a special form of xxpermdi). This patch teaches the swap optimization pass about this idiom. As a prerequisite, it also permits swap optimization to succeed for all forms of SUBREG_TO_REG. Previously we were conservative and only allowed SUBREG_TO_REG when it copied a full register. However, on reflection any form of SUBREG_TO_REG is safe in and of itself, so long as an unsafe operation is not performed on its result. In particular, a widening SUBREG_TO_REG often occurs as an input to a doubleword splat idiom, particularly in auto-vectorized code. The doubleword splat idiom is an XXPERMDI operation where both source registers are identical, and the selection mask is either 0 (splat the first element) or 3 (splat the second element). To determine whether the registers are identical, we use the existing mechanism for looking through "copy-like" operations. That mechanism has a side effect of marking the XXPERMDI operation as using a physical register, which would invalidate its presence in a swap-optimized region. This is correct for the form of XXPERMDI that performs a swap and hence would be removed, but is not what we want for a doubleword-splat variety of XXPERMDI. Therefore we reset the physical-register flag on the XXPERMDI when it represents a splat. A simple test case is added to verify that we generate the splat and that we also remove the xxswapd instructions that would otherwise be associated with the load and store of another operand. llvm-svn: 241285
* Reapply r240291: Fix shl folding in DAG combiner.Pawel Bylica2015-07-021-0/+9
| | | | | | | | The code responsible for shl folding in the DAGCombiner was assuming incorrectly that all constants are less than 64 bits. This patch simply changes the way values are compared. It has been reverted previously because of some problems with comparing APInt with raw uint64_t. That has been fixed/changed with r241204. llvm-svn: 241254
* [TwoAddressInstructionPass] Try 3 Addr Conversion After Commuting.Quentin Colombet2015-07-013-9/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | TwoAddressInstructionPass stops after a successful commuting but 3 Addr conversion might be good for some cases. Consider: int foo(int a, int b) { return a + b; } Before this commit, we emit: addl %esi, %edi movl %edi, %eax ret After this commit, we try 3 Addr conversion: leal (%rsi,%rdi), %eax ret Patch by Volkan Keles <vkeles@apple.com>! Differential Revision: http://reviews.llvm.org/D10851 llvm-svn: 241206
* Test for specific output in lit testMatthias Braun2015-07-011-1/+18
| | | | llvm-svn: 241200
* [NVPTX] expand extload/truncstore for vectors of floatsJingyue Wu2015-07-011-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: According to PTX ISA: For convenience, ld, st, and cvt instructions permit source and destination data operands to be wider than the instruction-type size, so that narrow values may be loaded, stored, and converted using regular-width registers. For example, 8-bit or 16-bit values may be held directly in 32-bit or 64-bit registers when being loaded, stored, or converted to other types and sizes. The operand type checking rules are relaxed for bit-size and integer (signed and unsigned) instruction types; floating-point instruction types still require that the operand type-size matches exactly, unless the operand is of bit-size type. So, the ISA does not support load with extending/store with truncatation for floating numbers. This is reflected in setting the loadext/truncstore actions to expand in the code for floating numbers, but vectors of floating numbers are not taken care of. As a result, loading a vector of floats followed by a fp_extend may be combined by DAGCombiner to a extload, and the extload may be lowered to NVPTXISD::LoadV2 with extending information. However, NVPTXISD::LoadV2 does not perform extending, and no extending instructions are inserted. Finally, PTX instructions with mismatched types are generated, like ld.v2.f32 {%fd3, %fd4}, [%rd2] This patch adds the correct actions for vectors of floats, so DAGCombiner would not create loads with extending, and correct code is generated. Patched by Gang Hu. Test Plan: Test case attached. Reviewers: jingyue Reviewed By: jingyue Subscribers: llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D10876 llvm-svn: 241191
* [NVPTX] Move NVPTXPeephole after NVPTXPrologEpilogPassJingyue Wu2015-07-011-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Offset of frame index is calculated by NVPTXPrologEpilogPass. Before that the correct offset of stack objects cannot be obtained, which leads to wrong offset if there are more than 2 frame objects. This patch move NVPTXPeephole after NVPTXPrologEpilogPass. Because the frame index is already replaced by %VRFrame in NVPTXPrologEpilogPass, we check VRFrame register instead, and try to remove the VRFrame if there is no usage after NVPTXPeephole pass. Patched by Xuetian Weng. Test Plan: Strengthened test/CodeGen/NVPTX/local-stack-frame.ll to check the offset calculation based on SP and SPL. Reviewers: jholewinski, jingyue Reviewed By: jingyue Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D10853 llvm-svn: 241185
* [PPC64LE] Enable missing lxvdsx optimization, and related swap optimizationBill Schmidt2015-07-011-6/+288
| | | | | | | | | | | | | | | | | | | | | | | When adding little-endian vector support for PowerPC last year, I inadvertently disabled an optimization that recognizes a load-splat idiom and generates the lxvdsx instruction. This patch moves the offending logic so lxvdsx is once again generated. This pattern is frequently generated by the vectorizer for scalar loads of an effective constant. Previously the lxvdsx instruction was wrongly listed as lane-sensitive for the VSX swap optimization (since both doublewords are identical, swaps are safe). This patch fixes this as well, so that vectorized code using lxvdsx can now have swaps removed from the computation. There is an existing test (@test50) in test/CodeGen/PowerPC/vsx.ll that checks for the missing optimization. However, vsx.ll was only being tested for POWER7 with big-endian code generation. I've added a little-endian RUN statement and expected LE code generation for all the tests in vsx.ll to give us a bit better VSX coverage, including what's needed for this patch. llvm-svn: 241183
* add a cl::opt override for TargetLoweringBase's JumpIsExpensiveSanjay Patel2015-07-011-11/+20
| | | | | | | | | | | | | This patch is not intended to change existing codegen behavior for any target. It just exposes the JumpIsExpensive setting on the command-line to allow for easier testing and emergency overrides. Also, change the existing regression test to use FileCheck, explicitly specify the jump-is-expensive option, and use more precise checks. Differential Revision: http://reviews.llvm.org/D10846 llvm-svn: 241179
* [SEH] Don't assert if the parent function lacks a personalityReid Kleckner2015-07-011-0/+33
| | | | | | | | The EH code might have been deleted as unreachable and the personality pruned while the filter is still present. Currently I'm hitting this at -O0 due to the clang bug PR24009. llvm-svn: 241170
* AVX-512: Implemented missing encoding for FMA scalar instructionsIgor Breger2015-07-011-1/+30
| | | | | | | | Added tests for encoding Differential Revision: http://reviews.llvm.org/D10865 llvm-svn: 241159
* [SEH] Add new intrinsics for recovering and restoring parent framesReid Kleckner2015-06-302-31/+44
| | | | | | | | | | | | | | | | | | | | | | | | | The incoming EBP value established by the runtime is actually a pointer to the end of the EH registration object, and not the true parent function frame pointer. Clang doesn't need llvm.x86.seh.exceptioninfo anymore because we know that the exception info pointer is at a fixed offset from this incoming EBP. The llvm.x86.seh.recoverfp intrinsic takes an EBP value provided by the EH runtime and returns a pointer that is usable with llvm.framerecover. The llvm.x86.seh.restoreframe intrinsic is inserted by the 32-bit specific preparation pass in blocks targetted by the EH runtime. It re-establishes any physical registers used by the parent function to address the stack, such as the frame, base, and stack pointers. Neither of these intrinsics correctly handle stack realignment prologues yet, but it's possible to add that later. Reviewers: majnemer Differential Revision: http://reviews.llvm.org/D10848 llvm-svn: 241125
* [FaultMaps] Let the frontend pre-select implicit null check candidates.Sanjoy Das2015-06-302-5/+23
| | | | | | | | | | | | | | | | | | Summary: This change introduces a !make.implicit metadata that allows the frontend to pre-select the set of explicit null checks that will be considered for transformation into implicit null checks. The reason for not using profiling data instead of !make.implicit is explained in the change to `FaultMaps.rst`. Reviewers: atrick, reames, pgavlin, JosephTremoulet Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D10824 llvm-svn: 241116
* Fixes a bug with __builtin_vsx_lxvdw4x on Little Endian systemsNemanja Ivanovic2015-06-301-0/+25
| | | | llvm-svn: 241108
* COFF: Do not assign linker-weak symbols to selectany comdat sections.Peter Collingbourne2015-06-301-0/+9
| | | | | | | | | | | | | | It is mandatory to specify a comdat in order to receive comdat semantics for a symbol. We were previously getting this wrong in -function-sections mode; linker-weak symbols were being emitted in a selectany comdat. This change causes such symbols to use a noduplicates comdat instead, fixing the inconsistency. Also correct an inaccuracy in the docs. Differential Revision: http://reviews.llvm.org/D10828 llvm-svn: 241103
* [NVPTX] Fix issue introduced in D10321Jingyue Wu2015-06-301-0/+18
| | | | | | | | | | | | | | | | | | | | | | Summary: Really check if %SP is not used in other places, instead of checking only exact one non-dbg use. Patched by Xuetian Weng. Test Plan: @foo4 in test/CodeGen/NVPTX/local-stack-frame.ll, create a case that SP will appear twice. Reviewers: jholewinski, jingyue Reviewed By: jingyue Subscribers: llvm-commits, sfantao, jholewinski Differential Revision: http://reviews.llvm.org/D10844 llvm-svn: 241099
* MIR Serialization: Serialize MBB successors.Alex Lorenz2015-06-303-0/+116
| | | | | | | | | | | | | This commit implements serialization of the machine basic block successors. It uses a YAML flow sequence that contains strings that have the MBB references. The MBB references in those strings use the same syntax as the MBB machine operands in the machine instruction strings. Reviewers: Duncan P. N. Exon Smith Differential Revision: http://reviews.llvm.org/D10699 llvm-svn: 241093
* Force relocation mode to be default, regardless of what is passed to the ↵Samuel Antao2015-06-301-0/+15
| | | | | | backend. llvm-svn: 241081
* [X86] Fix a bug in WIN_FTOL_32/64 handling.Michael Kuperstein2015-06-301-0/+22
| | | | | | | | | | Duplicating an FP register "as itself" is a bad idea, since it violates the invariant that every FP register is mapped to at most one FPU stack slot. Use the scratch FP register instead. This fixes PR23957. llvm-svn: 241069
* [X86] Add FXSR intrinsicsMichael Kuperstein2015-06-302-0/+50
| | | | | | Add intrinsics for the FXSR instructions (FXSAVE/FXSAVE64/FXRSTOR/FXRSTOR64) llvm-svn: 241049
* RegisterCoalescer: Cleanup empty subranges after shrinkToUses()Matthias Braun2015-06-301-0/+20
| | | | | | | | A call to removeEmptySubranges() is necessary after every operation that potentially removes all segments from a subregister range; this case in the register coalescer was missing. llvm-svn: 241027
* Teach LTOModule to emit linker flags for dllexported symbols, plus interface ↵Peter Collingbourne2015-06-292-65/+66
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cleanup. This change unifies how LTOModule and the backend obtain linker flags for globals: via a new TargetLoweringObjectFile member function named emitLinkerFlagsForGlobal. A new function LTOModule::getLinkerOpts() returns the list of linker flags as a single concatenated string. This change affects the C libLTO API: the function lto_module_get_*deplibs now exposes an empty list, and lto_module_get_*linkeropts exposes a single element which combines the contents of all observed flags. libLTO should never have tried to parse the linker flags; it is the linker's job to do so. Because linkers will need to be able to parse flags in regular object files, it makes little sense for libLTO to have a redundant mechanism for doing so. The new API is compatible with the old one. It is valid for a user to specify multiple linker flags in a single pragma directive like this: #pragma comment(linker, "/defaultlib:foo /defaultlib:bar") The previous implementation would not have exposed either flag via lto_module_get_*deplibs (as the test in TargetLoweringObjectFileCOFF::getDepLibFromLinkerOpt was case sensitive) and would have exposed "/defaultlib:foo /defaultlib:bar" as a single flag via lto_module_get_*linkeropts. This may have been a bug in the implementation, but it does give us a chance to fix the interface. Differential Revision: http://reviews.llvm.org/D10548 llvm-svn: 241010
* ARM: add correct kill flags when combining stm instructionsTim Northover2015-06-291-0/+43
| | | | | | | | | When the store sequence being combined actually stores the base register, we should not mark it as killed until the end. rdar://21504262 llvm-svn: 241003
* X86: Rework inline asm integer register specification.Matthias Braun2015-06-292-0/+143
| | | | | | | | | | | | | | | | | | | | | | | This is a new version of http://reviews.llvm.org/D10260. It turned out that when you specify an integer register in inline asm on x86 you get the register of the required type size back. That means that X86TargetLowering::getRegForInlineAsmConstraint() has to accept any of the integer registers and adapt its size to the given target size which may be any 8/16/32/64 bit sized type. Surprisingly that means given a constraint of "{ax}" and a type of MVT::F32 we need to return X86::EAX. This change makes this face explicit, the previous code seemed like working by accident because there it never returned an error once a register was found. On the other hand this rewrite allows to actually return errors for invalid situations like requesting an integer register for an i128 type. Related to rdar://21042280 Differential Revision: http://reviews.llvm.org/D10813 llvm-svn: 241002
* [FaultMaps] Fix test case.Sanjoy Das2015-06-291-16/+1
| | | | | | | implicit-null-check-negative.ll had a missing 2>&1. Fix this, and remove an incorrect test case that this exposes. llvm-svn: 240998
* [DAGCombiner] Fix & simplify constant folding of sext/zext.Pawel Bylica2015-06-291-0/+92
| | | | | | | | | | | | | | | | Summary: This patch fixes the cases of sext/zext constant folding in DAG combiner where constans do not fit 64 bits. The fix simply removes un$ Test Plan: New regression test included. Reviewers: RKSimon Reviewed By: RKSimon Subscribers: RKSimon, llvm-commits Differential Revision: http://reviews.llvm.org/D10607 llvm-svn: 240991
* MIR Serialization: Serialize the register mask machine operands.Alex Lorenz2015-06-291-0/+43
| | | | | | | | | | | | | | | | | This commit implements serialization of the register mask machine operands. This commit serializes only the call preserved register masks that are defined by a target, it doesn't serialize arbitrary register masks. This commit also extends the TargetRegisterInfo class and TableGen so that the users of TRI can get the list of all the call preserved register masks and their names. Reviewers: Duncan P. N. Exon Smith Differential Revision: http://reviews.llvm.org/D10673 llvm-svn: 240966
* AVX-512: all forms of SCATTER instruction on SKX,Elena Demikhovsky2015-06-291-0/+241
| | | | | | encoding, intrinsics and tests. llvm-svn: 240936
* [ARM]: Extend -mfpu options for half-precision and vfpv3xdJaved Absar2015-06-291-1/+14
| | | | | | | | | | | | | | | Some of the the permissible ARM -mfpu options, which are supported in GCC, are currently not present in llvm/clang.This patch adds the options: 'neon-fp16', 'vfpv3-fp16', 'vfpv3-d16-fp16', 'vfpv3xd' and 'vfpv3xd-fp16. These are related to half-precision floating-point and single precision. Reviewers: rengolin, ranjeet.singh Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D10645 llvm-svn: 240930
* AVX-512: Implemented missing encoding and intrinsics for FMA instructionsIgor Breger2015-06-293-303/+1308
| | | | | | | | Added tests for DAG lowering ,encoding and intrinsics Differential Revision: http://reviews.llvm.org/D10796 llvm-svn: 240926
* AMDGPU/SI: Fix extra space when printing v_div_fmas_*Matt Arsenault2015-06-281-2/+2
| | | | llvm-svn: 240911
* [x86][AVX512]Asaf Badouh2015-06-282-0/+74
| | | | | | | | | | | Add vscalef support include encoding and intrinsics review: http://reviews.llvm.org/D10730 llvm-svn: 240906
* AVX-512: Added all SKX forms of GATHER instructions.Elena Demikhovsky2015-06-281-96/+411
| | | | | | | Added intrinsics. Added encoding and tests. llvm-svn: 240905
* [SDAG] Now that we have a way to communicate the exact bit on sdiv use it to ↵Benjamin Kramer2015-06-271-1/+12
| | | | | | | | | | | | | | | | | simplify sdiv by a constant. We had a hack in SDAGBuilder in place to work around this but now we can avoid that. Call BuildExactSDIV from BuildSDIV so DAGCombiner can perform this trick automatically. The added check in DAGCombiner is necessary to prevent exact sdiv by pow2 from regressing as the target-specific pow2 lowering is not aware of exact bits yet. This is mostly covered by existing tests. One side effect is that we get the better lowering for exact vector sdivs now too :) llvm-svn: 240891
* llvm/test/CodeGen/X86/xor.ll: Appease Win32 targets since r240796.NAKAMURA Takumi2015-06-271-1/+1
| | | | | | | | | %struct.ref_s = type { %union.v, i16, i16 } %union.v = type { i64 } It seems %struct.ref_s is incompatible in tail padding. llvm-svn: 240874
* MIR Serialization: Serialize global address machine operands.Alex Lorenz2015-06-263-0/+105
| | | | | | | | | | | | This commit serializes the global address machine operands. This commit doesn't serialize the operand's offset and target flags, it serializes only the global value reference. Reviewers: Duncan P. N. Exon Smith Differential Revision: http://reviews.llvm.org/D10671 llvm-svn: 240851
* [NVPTX] noop when kernel pointers are already globalJingyue Wu2015-06-261-1/+12
| | | | | | | | | | | | | | | | Summary: Some front ends make kernel pointers global already. In that case, handlePointerParams does nothing. Test Plan: more tests in lower-kernel-ptr-arg.ll Reviewers: grosser Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D10779 llvm-svn: 240849
* AMDPGU/SI: Use correct resource descriptors for VI on HSATom Stellard2015-06-261-3/+8
| | | | | | | | | | Summary: We need to set MTYPE = 2 for VI shaders when targeting the HSA runtime. Reviewers: arsenm Differential Revision: http://reviews.llvm.org/D10777 llvm-svn: 240841
* AMDGPU/SI: Update amd_kernel_code_t definition and add assembler supportTom Stellard2015-06-261-4/+2
| | | | | | | | | | Reviewers: arsenm Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D10772 llvm-svn: 240839
* [Verifier] Verify invokes of intrinsicsPhilip Reames2015-06-261-2/+2
| | | | | | | | | | We support invoking a subset of llvm's intrinsics, but the verifier didn't account for this. We had previously added a special case to verify invokes of statepoints. By generalizing the code in terms of CallSite, we can verify invokes of other intrinsics as well. Interestingly, this found one test case which was invalid. Note: I'm deliberately leaving the naming change from CI to CS to a follow up change. That will happen shortly, I just wanted to reduce the diff to make it clear what was happening with this one. Differential Revision: http://reviews.llvm.org/D10118 llvm-svn: 240836
* AMDGPU/SI: Set ELF OS/ABI to ELFOSABI_AMDGPU_HSATom Stellard2015-06-261-0/+1
| | | | | | | | | | Reviewers: arsenm, rafael Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D10708 llvm-svn: 240832
* AMDGPU/SI: Add hsa code object directivesTom Stellard2015-06-261-0/+15
| | | | | | | | | | Reviewers: arsenm Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D10757 llvm-svn: 240831
* AMDGPU/SI: There are no implicit kernel args in the amdhsa ABITom Stellard2015-06-261-0/+1
| | | | | | | | | | Reviewers: arsenm Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D10706 llvm-svn: 240830
* AMDGPU/SI: Emit amd_kernel_code_t in EmitFunctionBodyStart()Tom Stellard2015-06-261-1/+3
| | | | | | | | | | | | | | Summary: This way the function symbol points to the start of amd_kernel_code_t rather than the start of the function. Reviewers: arsenm Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D10705 llvm-svn: 240829
* AMDGPU: really don't commute REV opcodes if the target variant doesn't existMarek Olsak2015-06-261-0/+33
| | | | | | | | | | | | | | If pseudoToMCOpcode failed, we would return the original opcode, so operands would be swapped, but the instruction would remain the same. It resulted in LSHLREV a, b ---> LSHLREV b, a. This fixes Glamor text rendering and piglit/arb_sample_shading-builtin-gl-sample-mask on VI. This is a candidate for stable branches. v2: the test was simplified by Tom Stellard llvm-svn: 240824
* Add missing builtins to the PPC back end for ABI compliance (vol. 1)Nemanja Ivanovic2015-06-261-0/+165
| | | | | | | | | | | This patch corresponds to review: http://reviews.llvm.org/D10638 This is the back end portion of patch http://reviews.llvm.org/D10637 It just adds the code gen and intrinsic functions necessary to support that patch to the back end. llvm-svn: 240820
* Revert "Revert r240762 "[X86] Cleanup ↵David Majnemer2015-06-261-0/+13
| | | | | | | | | | | X86WindowsTargetObjectFile::getSectionForConstant"" This reverts commit r240793 while fixing how we handle array constant pool entries. This fixes PR23966. llvm-svn: 240811
* [ARM] Cortex-R5 is not VFPOnlySPJaved Absar2015-06-261-1/+1
| | | | | | | | | | | | | This patch fixes the error in ARM.td which stated that Cortex-R5 floating point unit can do only single precision, when it can do double as well. Reviewers: rengolin Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D10769 llvm-svn: 240799
* [DAGCombine] Fix demanded bits computation for exact shifts.Benjamin Kramer2015-06-261-0/+19
| | | | | | Fixes a miscompilation of MultiSource/Benchmarks/MallocBench/gs llvm-svn: 240796
* MIR Serialization: Serialize machine basic block operands.Alex Lorenz2015-06-2618-18/+277
| | | | | | | | | | | | | | | | | | | | | | | | | This commit serializes machine basic block operands. The machine basic block operands use the following syntax: %bb.<id>[.<name>] This commit also modifies the YAML representation for the machine basic blocks - a new, required field 'id' is added to the MBB YAML mapping. The id is used to resolve the MBB references to the actual MBBs. And while the name of the MBB can be included in a MBB reference, this name isn't used to resolve MBB references - as it's possible that multiple MBBs will reference the same BB and thus they will have the same name. If the name is specified, the parser will verify that it is equal to the name of the MBB with the specified id. Reviewers: Duncan P. N. Exon Smith Differential Revision: http://reviews.llvm.org/D10608 llvm-svn: 240792
* [DAGCombiner] Preserve the exact bit when simplifying SRA to SRL.Benjamin Kramer2015-06-261-0/+10
| | | | | | Allows more aggressive folding of ashr/shl pairs. llvm-svn: 240788
OpenPOWER on IntegriCloud