summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen
Commit message (Collapse)AuthorAgeFilesLines
* [X86] Add D32039/PR31357 tests to show current BSWAP codegenSimon Pilgrim2017-04-192-0/+255
| | | | llvm-svn: 300672
* [X86][SSE] Add scheduling latency/throughput tests for (most) SSE2 instructionsSimon Pilgrim2017-04-191-0/+6039
| | | | llvm-svn: 300671
* Revert "ARMFrameLowering: Reserve emergency spill slot for large arguments"Renato Golin2017-04-191-94/+0
| | | | | | This reverts commit r300639, as it broke self-hosting on ARM. PR32709. llvm-svn: 300668
* [GlobalISel][X86] Split select tests. NFC.Igor Breger2017-04-197-444/+455
| | | | llvm-svn: 300666
* [ARM] GlobalISel: Add support for G_MULDiana Picus2017-04-194-1/+326
| | | | | | | | Support G_MUL, very similar to G_ADD and G_SUB. The only difference is in the instruction selector, where we have to select either MUL or MULv5 depending on the target. llvm-svn: 300665
* [GlobalISel] Support vector-of-pointers in LLTKristof Beyls2017-04-191-0/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes PR32471. As comment 10 on that bug report highlights (https://bugs.llvm.org//show_bug.cgi?id=32471#c10), there are quite a few different defendable design tradeoffs that could be made, including not representing pointers at all in LLT. I decided to go for representing vector-of-pointer as a concept in LLT, while keeping the size of the LLT type 64 bits (this is an increase from 48 bits before). My rationale for keeping pointers explicit is that on some targets probably it's very handy to have the distinction between pointer and non-pointer (e.g. 68K has a different register bank for pointers IIRC). If we keep a scalar pointer, it probably is easiest to also have a vector-of-pointers to keep LLT relatively conceptually clean and orthogonal, while we don't have a very strong reason to break that orthogonality. Once we gain more experience on the use of LLT, we can of course reconsider this direction. Rejecting vector-of-pointer types in the IRTranslator is also an option to avoid the crash reported in PR32471, but that is only a very short-term solution; also needs quite a bit of code tweaks in places, and is probably fragile. Therefore I didn't consider this the best option. llvm-svn: 300664
* ARMFrameLowering: Reserve emergency spill slot for large argumentsMatthias Braun2017-04-191-0/+94
| | | | | | | | | | | | We need to reserve an emergency spill slot in cases with large argument types that could overflow immediate offsets for FP relative address calculations. rdar://31317893 Differential Revision: https://reviews.llvm.org/D31643 llvm-svn: 300639
* [x86] add tests for potential andn optimization; NFCSanjay Patel2017-04-181-2/+40
| | | | llvm-svn: 300617
* [X86] Keep EXTRACT_VECTOR_ELT result type as f128 for Android x86_64.Chih-Hung Hsieh2017-04-182-0/+59
| | | | | | | | | | Android x86_64 target uses f128 type and stores f128 values in %xmm* registers. SoftenFloatRes_EXTRACT_VECTOR_ELT should not convert result value from f128 to i128. Differential Revision: http://reviews.llvm.org/D32102 llvm-svn: 300583
* [X86][SSE] Add scheduling latency/throughput tests for (most) SSE1 instructionsSimon Pilgrim2017-04-181-0/+2415
| | | | llvm-svn: 300576
* [DAG] Improve store merge candidate pruning.Nirav Dave2017-04-182-12/+4
| | | | | | | | | | | | | | Remove non-consecutive stores from store merge candidate search as they cannot be merged and will prevent us from finding subsequent mergeable store cases. Reviewers: jyknight, bogner, javed.absar, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D32086 llvm-svn: 300561
* Add base-index-based store merge testNirav Dave2017-04-181-0/+31
| | | | llvm-svn: 300559
* Add store Merge test.Nirav Dave2017-04-181-0/+25
| | | | llvm-svn: 300551
* [ARM] Add hardware build attributes in assemblerOliver Stannard2017-04-181-234/+227
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the assembler, we should emit build attributes based on the target selected with command-line options. This matches the GNU assembler's behaviour. We only do this for build attributes which describe the hardware that is expected to be available, not the ones that describe ABI compatibility. This is done by moving some of the attribute emission code to ARMTargetStreamer, so that it can be shared between the assembly and code-generation code paths. Since the assembler only creates a MCSubtargetInfo, not an ARMSubtarget, the code had to be changed to check raw features, and not use the convenience functions in ARMSubtarget. If different attributes are later specified using the .eabi_attribute directive, then they will take precedence, as happens when the same .eabi_attribute is specified twice. This must be enabled by an option, because we don't want to do this when parsing inline assembly. The attributes would match the ones emitted at the start of the file, so wouldn't actually change the emitted object file, but the extra directives would be added to every inline assembly block when emitting assembly, which we'd like to avoid. The majority of the changes in the build-attributes.ll test are just re-ordering the directives, because the hardware attributes are now emitted before the ABI ones. However, I did fix one bug which I spotted: Tag_CPU_arch_profile was not being emitted for v6M. Differential revision: https://reviews.llvm.org/D31812 llvm-svn: 300547
* [ARM] GlobalISel: Add support for G_SUBDiana Picus2017-04-185-0/+329
| | | | | | | Support G_SUB throughout the GlobalISel pipeline. It is exactly the same as G_ADD, nothing fancy. llvm-svn: 300546
* Revert "[GlobalISel] Support vector-of-pointers in LLT"Kristof Beyls2017-04-181-16/+0
| | | | | | | | | | | | | | | | This reverts r300535 and r300537. The newly added tests in test/CodeGen/AArch64/GlobalISel/arm64-fallback.ll produces slightly different code between LLVM versions being built with different compilers. E.g., dependent on the compiler LLVM is built with, either one of the following can be produced: remark: <unknown>:0:0: unable to legalize instruction: %vreg0<def>(p0) = G_EXTRACT_VECTOR_ELT %vreg1, %vreg2; (in function: vector_of_pointers_extractelement) remark: <unknown>:0:0: unable to legalize instruction: %vreg2<def>(p0) = G_EXTRACT_VECTOR_ELT %vreg1, %vreg0; (in function: vector_of_pointers_extractelement) Non-determinism like this is clearly a bad thing, so reverting this until I can find and fix the root cause of the non-determinism. llvm-svn: 300538
* [ARM] Check for correct HW div when lowering divmodDiana Picus2017-04-181-0/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For subtargets that use the custom lowering for divmod, e.g. gnueabi, we used to check if the subtarget has hardware divide and then lower to a div-mul-sub sequence if true, or to a libcall if false. However, judging by the usage of hasDivide vs hasDivideInARMMode, it seems that hasDivide only refers to Thumb. For instance, in the ARMTargetLowering constructor, the code that specifies whether to use libcalls for (S|U)DIV looks like this: bool hasDivide = Subtarget->isThumb() ? Subtarget->hasDivide() : Subtarget->hasDivideInARMMode(); In the case of divmod for arm-gnueabi, using only hasDivide() to determine what to do means that instead of lowering to __aeabi_idivmod to get the remainder, we lower to div-mul-sub and then further lower the div to __aeabi_idiv. Even worse, if we have hardware divide in ARM but not in Thumb, we generate a libcall instead of using it (this is not an issue in practice since AFAICT none of the cores that we support have hardware divide in ARM but not Thumb). This patch fixes the code dealing with custom lowering to take into account the mode (Thumb or ARM) when deciding whether or not hardware division is available. Differential Revision: https://reviews.llvm.org/D32005 llvm-svn: 300536
* [GlobalISel] Support vector-of-pointers in LLTKristof Beyls2017-04-181-0/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes PR32471. As comment 10 on that bug report highlights (https://bugs.llvm.org//show_bug.cgi?id=32471#c10), there are quite a few different defendable design tradeoffs that could be made, including not representing pointers at all in LLT. I decided to go for representing vector-of-pointer as a concept in LLT, while keeping the size of the LLT type 64 bits (this is an increase from 48 bits before). My rationale for keeping pointers explicit is that on some targets probably it's very handy to have the distinction between pointer and non-pointer (e.g. 68K has a different register bank for pointers IIRC). If we keep a scalar pointer, it probably is easiest to also have a vector-of-pointers to keep LLT relatively conceptually clean and orthogonal, while we don't have a very strong reason to break that orthogonality. Once we gain more experience on the use of LLT, we can of course reconsider this direction. Rejecting vector-of-pointer types in the IRTranslator is also an option to avoid the crash reported in PR32471, but that is only a very short-term solution; also needs quite a bit of code tweaks in places, and is probably fragile. Therefore I didn't consider this the best option. llvm-svn: 300535
* Change the testcase tail-merge-after-mbp.ll to tail-merge-after-mbp.mirHaicheng Wu2017-04-172-94/+105
| | | | | | Differential Revision: https://reviews.llvm.org/D32037 llvm-svn: 300506
* [WebAssembly] Fix WebAssemblyOptimizeReturned after r300367Jacob Gravelle2017-04-171-0/+31
| | | | | | | | | | | | | | | | Summary: Refactoring changed paramHasAttr(1 + i) to paramHasAttr(0), fix that to paramHasAttr(i). Add more tests to WebAssemblyOptimizeReturned that catch that regression. Reviewers: dschuff Subscribers: jfb, sbc100, llvm-commits Differential Revision: https://reviews.llvm.org/D32136 llvm-svn: 300502
* AMDGPU: Use MachineRegisterInfo to find max used registerMatt Arsenault2017-04-173-16/+51
| | | | | | | | | | Avoid looping through program to determine register counts. This avoids needing to look at regmask operands. Also fixes some counting errors with flat_scr when there are no stack objects. llvm-svn: 300482
* AArch64: put nonlazybind special handling behind a flag for now.Tim Northover2017-04-171-1/+9
| | | | | | | | It's basically a terrible idea anyway but objc_msgSend gets emitted like that. We can decide on a better way to deal with it in the unlikely event that anyone actually uses it. llvm-svn: 300474
* AArch64: support nonlazybindTim Northover2017-04-171-0/+32
| | | | | | | | It's almost certainly not a good idea to actually use it in most cases (there's a pretty large code size overhead on AArch64), but we can't do those experiments until it's supported. llvm-svn: 300462
* [X86] Remove special handling for 16 bit for A asm constraints.Benjamin Kramer2017-04-161-1/+8
| | | | | | | | | | Our 16 bit support is assembler-only + the terrible hack that is .code16gcc. Simply using 32 bit registers does the right thing for the latter. Fixes PR32681. llvm-svn: 300429
* Use correct registers for "A" inline asm constraintDimitry Andric2017-04-151-0/+35
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: In PR32594, inline assembly using the 'A' constraint on x86_64 causes llvm to crash with a "Cannot select" stack trace. This is because `X86TargetLowering::getRegForInlineAsmConstraint` hardcodes that 'A' means the EAX and EDX registers. However, on x86_64 it means the RAX and RDX registers, and on 16-bit x86 (ia16?) it means the old AX and DX registers. Add new register classes in `X86RegisterInfo.td` to support these cases, and amend the logic in `getRegForInlineAsmConstraint` to cope with different subtargets. Also add a test case, derived from PR32594. Reviewers: craig.topper, qcolombet, RKSimon, ab Reviewed By: ab Subscribers: ab, emaste, royger, llvm-commits Differential Revision: https://reviews.llvm.org/D31902 llvm-svn: 300404
* [AMDGPU] set read_only access qualifier for pointersStanislav Mekhanoshin2017-04-141-0/+33
| | | | | | | | | | If a kernel's pointer argument is known to be readonly set access qualifier accordingly. This allows RT not to flush caches before dispatches. Differential Revision: https://reviews.llvm.org/D32091 llvm-svn: 300362
* [X86][SSE] Update MOVNTDQA non-temporal loads to generic implementation (LLVM)Simon Pilgrim2017-04-144-33/+51
| | | | | | | | | | MOVNTDQA non-temporal aligned vector loads can be correctly represented using generic builtin loads, allowing us to remove the existing x86 intrinsics. Clang companion patch: D31766. Differential Revision: https://reviews.llvm.org/D31767 llvm-svn: 300325
* Fix for PR#30562: Selection DAG error: Detected cycle in SelectionDAG.Andrew V. Tischenko2017-04-141-0/+22
| | | | | | Patch by Dinar Temirbulatov llvm-svn: 300314
* This patch closes PR#32216: Better testing of schedule model instruction ↵Andrew V. Tischenko2017-04-142-595/+631
| | | | | | | | latencies/throughputs. The details are here: https://reviews.llvm.org/D30941 llvm-svn: 300311
* [AMDGPU] added SIInstrInfo::getAddNoCarry() helperStanislav Mekhanoshin2017-04-141-119/+146
| | | | | | | | Addressed rest of post submit comments from D31993. Differential Revision: https://reviews.llvm.org/D32057 llvm-svn: 300288
* [AArch64] Avoid partial register writes on lane 0 of BUILD_VECTOR for i8/i16/f16Adam Nemet2017-04-133-4/+85
| | | | | | | | | | | | This further improves Ahmed's change in rL299482. See the new comment for the rationale. The patch recovers most of the regression for bzip2 after D31965. We're down to +2.68% from +6.97%. Differential Revision: https://reviews.llvm.org/D32028 llvm-svn: 300276
* AMDGPU/GFX9: Do not use v_pack_b32_f16 when packingKonstantin Zhuravlyov2017-04-135-57/+37
| | | | | | Differential Revision: https://reviews.llvm.org/D31819 llvm-svn: 300275
* [bpf] Fix memory offset check for loads and storesAlexei Starovoitov2017-04-131-0/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the offset cannot fit into the instruction, an addition to the pointer is emitted before the actual access. However, BPF offsets are 16-bit but LLVM considers them to be, for the matter of this check, to be 32-bit long. This causes the following program: int bpf_prog1(void *ign) { volatile unsigned long t = 0x8983984739ull; return *(unsigned long *)((0xffffffff8fff0002ull) + t); } To generate the following (wrong) code: 0: 18 01 00 00 39 47 98 83 00 00 00 00 89 00 00 00 r1 = 590618314553ll 2: 7b 1a f8 ff 00 00 00 00 *(u64 *)(r10 - 8) = r1 3: 79 a1 f8 ff 00 00 00 00 r1 = *(u64 *)(r10 - 8) 4: 79 10 02 00 00 00 00 00 r0 = *(u64 *)(r1 + 2) 5: 95 00 00 00 00 00 00 00 exit Fix it by changing the offset check to 16-bit. Patch by Nadav Amit <nadav.amit@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Differential Revision: https://reviews.llvm.org/D32055 llvm-svn: 300269
* [AMDGPU] Combine DS operations with offsets bigger than byteStanislav Mekhanoshin2017-04-131-0/+385
| | | | | | | | | In many cases ds operations can be combined even if offsets do not fit into 8 bit encoding. What it takes is to adjust base address. Differential Revision: https://reviews.llvm.org/D31993 llvm-svn: 300227
* [Hexagon] Unxfail passing tests Krzysztof Parzyszek2017-04-132-4/+0
| | | | | | | r300198 fixed a problem that caused two tests to be xfailed. Unxfail these tests now, since they are passing. llvm-svn: 300203
* AMDGPU : Fix common dominator of two incoming blocks terminates with uniform ↵Wei Ding2017-04-124-6/+65
| | | | | | | | branch issue. Differential Revision: http://reviews.llvm.org/D31350 llvm-svn: 300142
* AMDGPU: Fix invalid copies when copying i1 to phys regMatt Arsenault2017-04-121-0/+36
| | | | | | | Insert a VReg_1 virtual register so the i1 workaround pass can handle it. llvm-svn: 300113
* [AMDGPU] Generate range metadata for workitem idStanislav Mekhanoshin2017-04-1214-55/+141
| | | | | | | | | If workgroup size is known inform llvm about range returned by local id and local size queries. Differential Revision: https://reviews.llvm.org/D31804 llvm-svn: 300102
* [GlobalIsel][X86] support G_CONSTANT selection.Igor Breger2017-04-123-0/+224
| | | | | | | | | | | | | | Summary: [GlobalISel][X86] support G_CONSTANT selection. Add regbank select tests. Reviewers: zvi, guyblank Reviewed By: guyblank Subscribers: llvm-commits, dberris, rovka, kristof.beyls Differential Revision: https://reviews.llvm.org/D31974 llvm-svn: 300057
* [AMDGPU] SDWA: make pass globalSam Kolton2017-04-121-18/+10
| | | | | | | | | | | | Summary: Remove checks for basic blocks. Reviewers: vpykhtin, rampitec, arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D31935 llvm-svn: 300040
* [AMDGPU] Add a new pass to insert waitcnts. Leave under an option for testing.Kannan Narayanan2017-04-122-5/+0
| | | | | | Based on comments in https://reviews.llvm.org/D31161. llvm-svn: 300023
* AMDGPU: Insert wait at start of callee functionsMatt Arsenault2017-04-112-1/+26
| | | | llvm-svn: 300000
* AMDGPU: Refactor SIMachineFunctionInfo slightlyMatt Arsenault2017-04-111-7/+7
| | | | | | Prepare for handling non-entry functions. llvm-svn: 299999
* AMDGPU: Refactor argument loweringMatt Arsenault2017-04-111-2/+3
| | | | | | | Split into smaller functions and prepare for handling non-entry functions. llvm-svn: 299998
* AMDGPU: Fix folding reg_sequence into copy to phys regMatt Arsenault2017-04-111-0/+13
| | | | | | | This was producing an illegal reg_sequence defining a physical register with virtual register inputs. llvm-svn: 299997
* [DAGCombine] Add more test cases for shuffle of splat. NFC.Zvi Rackover2017-04-111-0/+56
| | | | | | Tests added contain splat-masks with undef elements. llvm-svn: 299988
* [x86] Relax the check in areLoadsFromSameBasePtrEaswaran Raman2017-04-111-4/+4
| | | | | | | | | Check if the scale operand is identical (doesn't have to be 1) and do not check the chaain operand. Differential revision: https://reviews.llvm.org/D31833 llvm-svn: 299986
* MIR: Allow parsing of empty machine functionsJustin Bogner2017-04-118-41/+17
| | | | | | | | | | | | If you run llc -stop-after=codegenprepare and feed the resulting MIR to llc -start-after=codegenprepare, you'll have an empty machine function since we haven't run any isel yet. Of course, this only works if the MIRParser believes you that this is okay. This is essentially a revert of r241862 with a fix for the problem it was papering over. llvm-svn: 299975
* [X86] Create the correct ADC/SBB SDNode when lowering add.Davide Italiano2017-04-111-0/+27
| | | | | | Differential Revision: https://reviews.llvm.org/D31911 llvm-svn: 299973
* [AMDGPU] Add A5 to data layout for amdgiz environmentYaxun Liu2017-04-112-2/+2
| | | | | | Differential Revision: https://reviews.llvm.org/D31589 llvm-svn: 299964
OpenPOWER on IntegriCloud