path: root/llvm/test/CodeGen
Commit message | Author | Age | Files | Lines
* [x86] autoupgrade and remove AVX2 integer min/max intrinsics | Sanjay Patel | 2016-06-16 | 2 | -39/+170
  This will (hopefully very temporarily) break clang. The clang side of this should be the next commit.
  llvm-svn: 272932
* dos2unix this test. NFC. | Rafael Espindola | 2016-06-16 | 1 | -45/+45
  llvm-svn: 272928
* remove old FileCheck lines that are no longer used | Sanjay Patel | 2016-06-16 | 1 | -18/+0
  llvm-svn: 272921
* [DAG] Remove redundant FMUL in Newton-Raphson SQRT code | Sanjay Patel | 2016-06-16 | 2 | -15/+66
  When calculating a square root using Newton-Raphson with two constants, a naive implementation is to use five multiplications (four muls to calculate reciprocal square root and another one to calculate the square root itself). However, after some reassociation and CSE the same result can be obtained with only four multiplications. Unfortunately, there's no reliable way to do such a reassociation in the back-end. So, the patch modifies NR code itself so that it directly builds optimal code for SQRT and doesn't rely on any further reassociation.
  Patch by Nikolai Bozhenov!
  Differential Revision: http://reviews.llvm.org/D21127
  llvm-svn: 272920
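The multiplication count described in this commit can be illustrated with a small scalar sketch. It assumes the usual two-constant form of the refinement step, E' = (E * -0.5) * (A * E * E + -3.0); the function names are hypothetical and this is not the DAG combiner code itself.

```python
import math

def rsqrt_step(a, e):
    """One Newton-Raphson step refining e ~ 1/sqrt(a): 4 multiplications."""
    ae = a * e                  # mul 1
    aee = ae * e                # mul 2
    lhs = e * -0.5              # mul 3
    return lhs * (aee + -3.0)   # mul 4

def sqrt_naive(a, e):
    """Refine the reciprocal square root, then multiply by a: 5 multiplications."""
    return a * rsqrt_step(a, e)  # mul 5

def sqrt_direct(a, e):
    """Scale by a*e instead of e in the final step: 4 multiplications."""
    ae = a * e                  # mul 1
    aee = ae * e                # mul 2
    lhs = ae * -0.5             # mul 3 (reuses a*e, so no extra multiply by a)
    return lhs * (aee + -3.0)   # mul 4

a, e = 2.0, 0.7                 # 0.7 is a rough estimate of 1/sqrt(2)
naive = sqrt_naive(a, e)
direct = sqrt_direct(a, e)
```

Both functions compute the same refined value; the direct form simply reuses the product `a * e` that the step already needs.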
* Don't print (PLT) on arm. | Rafael Espindola | 2016-06-16 | 5 | -25/+25
  The R_ARM_PLT32 relocation is deprecated and is not produced by MC. This means that the code being deleted is dead from the .o point of view and was making the .s more confusing.
  llvm-svn: 272909
* [x86] autoupgrade and remove SSE2/SSE41 integer min/max intrinsics | Sanjay Patel | 2016-06-16 | 2 | -7/+141
  Follow-up to:
  http://reviews.llvm.org/rL272806
  http://reviews.llvm.org/rL272807
  llvm-svn: 272907
* [mips][mips16] Fix machine verifier errors about incorrect register classes on load/stores. | Daniel Sanders | 2016-06-16 | 4 | -66/+17
  Summary: [ls][bh] and [ls][bh]u cannot use sp-relative addresses and must therefore lower frameindex nodes such that there is a copy to a CPU16Regs register. This is now done consistently using a separate addressing mode that does not permit frameindex nodes.
  As part of this I've had to remove an optimization that reduced the number of instructions needed to work around the lack of sp-relative addresses on [ls][bh] and [ls][bh]u. This optimization used one of the eight CPU16Regs registers as a copy of the stack pointer and its implementation was the root cause of many of the register vs register class mismatches.
  lw/sw can use sp-relative addresses but we ought to ensure that we use the correct version of lw/sw internally for things like IAS. This is not currently the case and this change does not fix this. However, this change does clean it up sufficiently well to fix the machine verifier failures.
  Also removed irrelevant functions from stchar.ll.
  Reviewers: sdardis
  Subscribers: dsanders, sdardis, llvm-commits
  Differential Revision: http://reviews.llvm.org/D21062
  llvm-svn: 272882
* [llvm-objdump] Support detection of feature bits from the object and implement this for Mips. | Daniel Sanders | 2016-06-16 | 3 | -4/+4
  Summary: The Mips implementation only covers the feature bits described by the ELF e_flags so far. Mips stores additional feature bits such as MSA in the .MIPS.abiflags section.
  Also fixed a small bug this revealed where microMIPS wouldn't add the EF_MIPS_MICROMIPS flag when using -filetype=obj.
  Reviewers: echristo, rafael
  Subscribers: rafael, mehdi_amini, dsanders, sdardis, llvm-commits
  Differential Revision: http://reviews.llvm.org/D21125
  llvm-svn: 272880
* [mips][micromips] Implement DCLO, DCLZ, DROTR, DROTR32 and DROTRV instructions | Hrvoje Varga | 2016-06-16 | 2 | -15/+23
  Differential Revision: http://reviews.llvm.org/D16917
  llvm-svn: 272876
* AArch64: allow MOV (imm) alias to be printed | Tim Northover | 2016-06-16 | 31 | -123/+119
  The backend has been around for years, it's pretty ridiculous that we can't even use the preferred form for printing "MOV" aliases. Unfortunately, TableGen can't handle the complex predicates when printing so it's a bunch of nasty C++. Oh well.
  llvm-svn: 272865
* AMDGPU: Disable scheduling in some slow tests | Matt Arsenault | 2016-06-16 | 2 | -4/+4
  Disabling the pre-RA scheduler on large-work-group-registers causes it to be ~50% slower.
  llvm-svn: 272860
* [x86, SSE] update packed FP compare tests for direct translation from builtin to IR | Sanjay Patel | 2016-06-15 | 3 | -53/+97
  The clang side of this was r272840: http://reviews.llvm.org/rL272840
  A follow-up step would be to auto-upgrade and remove these LLVM intrinsics completely.
  Differential Revision: http://reviews.llvm.org/D21269
  llvm-svn: 272841
* [x86] delete unnecessary function declarations | Sanjay Patel | 2016-06-15 | 1 | -4/+0
  Missed this in r272806, r272807.
  llvm-svn: 272834
* AArch64: stop trying to use 32-bit MOVZs when expanding patchpoints. | Tim Northover | 2016-06-15 | 1 | -3/+10
  Of course the assembly was right but because the opcode was MOVZWi it was encoded as "movz w16, #65535, lsl #32" which is an unallocated encoding and would go horribly wrong on a CPU.
  No idea how this bug survived this long. It seems nobody is using that aspect of patchpoints.
  llvm-svn: 272831
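Why a shift of 32 is invalid on a W register can be shown with a short sketch. It relies on the architectural rule that MOVZ shifts are multiples of 16 that fit inside the destination register (0 or 16 for 32-bit registers; 0, 16, 32 or 48 for 64-bit ones). The helper names are hypothetical and this is not the patchpoint expansion code.

```python
def movz_shift_is_encodable(reg_bits, shift):
    """A MOVZ/MOVK shift must be a multiple of 16 that stays inside the register."""
    return shift % 16 == 0 and 0 <= shift < reg_bits

def movz_movk_chunks(value, reg_bits=64):
    """Split value into (imm16, shift) pairs, lowest chunk first."""
    return [((value >> shift) & 0xFFFF, shift) for shift in range(0, reg_bits, 16)]

# The chunk at shift 32 only exists for a 64-bit (X) register.
chunks = movz_movk_chunks(0x0000FFFF12345678)
```

Materializing the chunk at shift 32 therefore needs the 64-bit form of the instruction, which matches the fix described in the commit message.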
* [x86] add folds for x86 vector compare nodes (PR27924) | Sanjay Patel | 2016-06-15 | 1 | -7/+3
  Ideally, we can get rid of most x86 LLVM intrinsics by transforming them to IR (and some of that happened with http://reviews.llvm.org/rL272807), but it doesn't cost much to have some simple folds in the backend too while we're working on that and as a backstop.
  This fixes: https://llvm.org/bugs/show_bug.cgi?id=27924
  Differential Revision: http://reviews.llvm.org/D21356
  llvm-svn: 272828
* [X86]: Updated r272801 to promote 16 bit compares with immediate operand | Kevin B. Smith | 2016-06-15 | 2 | -2/+4
  to 32 bits. This is in response to a comment by Eli Friedman.
  llvm-svn: 272814
* [x86, SSE] remove the GCCBuiltins from the integer min/max intrinsics | Sanjay Patel | 2016-06-15 | 2 | -32/+36
  This allows us to emit native IR in Clang (next commit). Also, update the intrinsic tests to show that codegen already knows how to handle the IR that Clang will soon produce.
  llvm-svn: 272806
* [X86]: Quit promoting 8 and 16 bit compares to 32 bit. | Kevin B. Smith | 2016-06-15 | 8 | -68/+73
  Differential Revision: http://reviews.llvm.org/D21144
  llvm-svn: 272801
* [X86]: Improve Liveness checking for X86FixupBWInsts.cpp | Kevin B. Smith | 2016-06-15 | 1 | -0/+37
  Differential Revision: http://reviews.llvm.org/D21085
  llvm-svn: 272797
* Reverting r272778 because there's an assertion | Ranjeet Singh | 2016-06-15 | 1 | -8/+0
  failure when running the test CodeGen/ARM/intrinsics-coprocessor.ll
  llvm-svn: 272791
* [mips] Missing test case | Simon Dardis | 2016-06-15 | 1 | -0/+54
  Add missing testcase from r272666.
  llvm-svn: 272784
* [ARM] Add support for mrrc/mrrc2 intrinsics. | Ranjeet Singh | 2016-06-15 | 1 | -0/+8
  Differential Revision: http://reviews.llvm.org/D21178
  llvm-svn: 272778
* [mips] Removed invalid test from o32_cc.ll | Daniel Sanders | 2016-06-15 | 1 | -1/+0
  MIPS32R1 cannot implement a 64-bit FPU because this was introduced in MIPS32R2.
  llvm-svn: 272769
* [mips][msa] Fix register/register-class mismatches in emitINSERT_DF_VIDX(). | Daniel Sanders | 2016-06-15 | 1 | -6/+18
  Reviewers: sdardis
  Subscribers: dsanders, sdardis, llvm-commits
  Differential Revision: http://reviews.llvm.org/D21068
  llvm-svn: 272765
* [mips][microMIPS] Add CodeGen support for AND*, OR16, OR*, XOR*, NOT16 and NOR instructions | Zlatko Buljan | 2016-06-15 | 10 | -43/+1586
  Differential Revision: http://reviews.llvm.org/D16719
  llvm-svn: 272764
* [AVX512] Fix BLENDM lowering patterns. Operands should be swapped to match SELECT behavior. | Igor Breger | 2016-06-15 | 10 | -253/+147
  Use BLENDM instead of masked move instruction.
  Differential Revision: http://reviews.llvm.org/D21001
  llvm-svn: 272763
* AMDGPU: Fix MUBUF offset bugs affecting llvm.amdgcn.buffer.* intrinsics | Nicolai Haehnle | 2016-06-15 | 3 | -22/+39
  Summary: This fixes two related bugs. First, the generic optimization passes unfortunately generate negative constant offsets but the hardware treats SOffset as an unsigned value.
  Second, there is a hardware bug on SI and CI, where address clamping in MUBUF instructions does not work correctly when SOffset is larger than the buffer size. This patch works around this bug by never using SOffset.
  An alternative workaround would be to do the clamping manually when SOffset is too large, but generating the required code sequence during instruction selection would be rather involved, and in any case the resulting code would probably be worse.
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96360
  Reviewers: arsenm, tstellarAMD
  Subscribers: arsenm, llvm-commits, kzhuravl
  Differential Revision: http://reviews.llvm.org/D21326
  llvm-svn: 272761
* Don't force SP-relative addressing for statepoints | Sanjoy Das | 2016-06-15 | 1 | -0/+13
  Summary: ... when the offset is not statically known.
  Prioritize addresses relative to the stack pointer in the stackmap, but fall back gracefully to other modes of addressing if the offset to the stack pointer is not a known constant.
  Patch by Oscar Blumberg!
  Reviewers: sanjoy
  Subscribers: llvm-commits, majnemer, rnk, sanjoy, thanm
  Differential Revision: http://reviews.llvm.org/D21259
  llvm-svn: 272756
* Remove the ScalarReplAggregates pass | David Majnemer | 2016-06-15 | 1 | -1/+1
  Nearly all the changes to this pass have been done while maintaining and updating other parts of LLVM. LLVM has had another pass, SROA, which has superseded ScalarReplAggregates for quite some time.
  Differential Revision: http://reviews.llvm.org/D21316
  llvm-svn: 272737
* AMDGPU: Run pointer optimization passes | Matt Arsenault | 2016-06-15 | 5 | -76/+83
  llvm-svn: 272736
* Fix a test case to match its intention | Xinliang David Li | 2016-06-14 | 1 | -2/+2
  llvm-svn: 272733
* Set machine block placement hot prob threshold for both static and runtime profile. | Dehao Chen | 2016-06-14 | 1 | -0/+78
  Summary: With runtime profile, we have more confidence in branch probability, thus during basic block layout, we set a lower hot prob threshold so that blocks can be laid out optimally.
  Reviewers: djasper, davidxl
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D20991
  llvm-svn: 272729
* [x86] add current codegen tests for PR27924 | Sanjay Patel | 2016-06-14 | 1 | -0/+51
  llvm-svn: 272714
* IR: Introduce local_unnamed_addr attribute. | Peter Collingbourne | 2016-06-14 | 2 | -8/+8
  If a local_unnamed_addr attribute is attached to a global, the address is known to be insignificant within the module. It is distinct from the existing unnamed_addr attribute in that it only describes a local property of the module rather than a global property of the symbol.
  This attribute is intended to be used by the code generator and LTO to allow the linker to decide whether the global needs to be in the symbol table. It is possible to exclude a global from the symbol table if three things are true:
  - This attribute is present on every instance of the global (which means that the normal rule that the global must have a unique address can be broken without being observable by the program by performing comparisons against the global's address)
  - The global has linkonce_odr linkage (which means that each linkage unit must have its own copy of the global if it requires one, and the copy in each linkage unit must be the same)
  - It is a constant or a function (which means that the program cannot observe that the unique-address rule has been broken by writing to the global)
  Although this attribute could in principle be computed from the module contents, LTO clients (i.e. linkers) will normally need to be able to compute this property as part of symbol resolution, and it would be inefficient to materialize every module just to compute it.
  See:
  http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160509/356401.html
  http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160516/356738.html
  for earlier discussion.
  Part of the fix for PR27553.
  Differential Revision: http://reviews.llvm.org/D20348
  llvm-svn: 272709
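The three conditions listed in this commit message can be read as a single predicate over every instance of a global. The sketch below is only an illustration: `GlobalInstance` and `can_omit_from_symbol_table` are hypothetical names, not LLVM or linker APIs.

```python
from dataclasses import dataclass

@dataclass
class GlobalInstance:
    # Hypothetical record for one module's view of a global; not an LLVM type.
    local_unnamed_addr: bool
    linkage: str
    is_constant_or_function: bool

def can_omit_from_symbol_table(instances):
    """True only if every instance satisfies all three conditions."""
    return bool(instances) and all(
        g.local_unnamed_addr
        and g.linkage == "linkonce_odr"
        and g.is_constant_or_function
        for g in instances
    )

ok = [GlobalInstance(True, "linkonce_odr", True), GlobalInstance(True, "linkonce_odr", True)]
one_missing_attr = [GlobalInstance(True, "linkonce_odr", True), GlobalInstance(False, "linkonce_odr", True)]
```

A single instance without the attribute is enough to keep the global in the symbol table, which is why the check ranges over all instances rather than one module.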
* [X86] Reduce the width of multiplication when its operands are extended from i8 or i16 | Wei Mi | 2016-06-14 | 1 | -0/+864
  For <N x i32> type mul, pmuludq will be used for targets without SSE41, which often introduces many extra pack and unpack instructions in vectorized loop body because pmuludq generates <N/2 x i64> type value. However when the operands of <N x i32> mul are extended from smaller size values like i8 and i16, the type of mul may be shrunk to use pmullw + pmulhw/pmulhuw instead of pmuludq, which generates better code. For targets with SSE41, pmulld is supported so no shrinking is needed.
  Differential Revision: http://reviews.llvm.org/D20931
  llvm-svn: 272694
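The idea behind the shrinking can be checked on a single lane: when both operands fit in 16 bits, the low and high 16-bit halves of the product together give the full 32-bit result. The sketch below models only the unsigned (zero-extended) case with scalar arithmetic; the function names are hypothetical stand-ins for what pmullw and pmulhuw compute per lane.

```python
def mullo16(a, b):
    """Low 16 bits of the product of two 16-bit values (per-lane pmullw)."""
    return (a * b) & 0xFFFF

def mulhi16_unsigned(a, b):
    """High 16 bits of the unsigned 16x16 product (per-lane pmulhuw)."""
    return ((a * b) >> 16) & 0xFFFF

def mul32_from_halves(a, b):
    """Rebuild the 32-bit product from its two 16-bit halves."""
    return (mulhi16_unsigned(a, b) << 16) | mullo16(a, b)

samples = [(0, 0), (1, 0xFFFF), (200, 255), (0x1234, 0xABCD), (0xFFFF, 0xFFFF)]
results = [mul32_from_halves(a, b) for a, b in samples]
```

The signed case would use the signed high-half multiply (pmulhw) instead; it is left out here to keep the sketch short.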
* Fix BSS global handling in AsmPrinter | Nirav Dave | 2016-06-14 | 1 | -0/+18
  Change EmitGlobalVariable to check final assembler section is in BSS before using .lcomm/.comm directive. This prevents globals from being put into .bss erroneously when -data-sections is used.
  This fixes PR26570.
  Reviewers: echristo, rafael
  Subscribers: llvm-commits, mehdi_amini
  Differential Revision: http://reviews.llvm.org/D21146
  llvm-svn: 272674
* [mips] Optimize stack pointer adjustments. | Simon Dardis | 2016-06-14 | 3 | -20/+32
  Instead of always using addu to adjust the stack pointer when the size is out of the range of an addiu instruction, use subu so that a smaller constant can be generated.
  This can give savings of ~3 instructions whenever a function has a stack frame whose size is out of range of an addiu instruction.
  This change may break some naive stack unwinders.
  Partially resolves PR/26291.
  Thanks to David Chisnall for reporting the issue.
  Reviewers: dsanders, vkalintiris
  Differential Review: http://reviews.llvm.org/D21321
  llvm-svn: 272666
* [Thumb] Fix off-by-one error in r272007 | James Molloy | 2016-06-14 | 1 | -0/+8
  We can only generate immediates up to #510 with a MOV+ADD, not #511, because there's no such instruction as add #256.
  Found by Oliver Stannard and csmith!
  llvm-svn: 272665
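The #510 bound follows from both instructions taking an 8-bit unsigned immediate: 255 + 255 = 510. A minimal sketch of that arithmetic, assuming immediates in the range 0..255 for both the MOV and the ADD; `split_mov_add` is a hypothetical helper, not code from the patch.

```python
IMM8_MAX = 255  # assumed largest immediate for both MOV and ADD

def split_mov_add(imm):
    """Return (mov_imm, add_imm), each in 0..255, or None if imm does not fit."""
    if not 0 <= imm <= 2 * IMM8_MAX:
        return None
    mov_imm = min(imm, IMM8_MAX)
    return mov_imm, imm - mov_imm

largest = split_mov_add(510)    # both halves at their maximum
too_big = split_mov_add(511)    # would need add #256, which does not exist
```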
* [mips][atomics] Fix atomic instruction descriptions and uses. | Simon Dardis | 2016-06-14 | 1 | -10/+20
  PR27458 highlights that the MIPS backend does not have well formed MIR for atomic operations (among other errors).
  This patch adds, expands and corrects the LL/SC descriptions and uses for MIPS(64).
  Reviewers: dsanders, vkalintiris
  Differential Review: http://reviews.llvm.org/D19719
  llvm-svn: 272655
* [X86][SSE4A] Added patterns for nontemporal stores of scalar float/doubles using MOVNTSD/MOVNTSS | Simon Pilgrim | 2016-06-14 | 1 | -13/+44
  llvm-svn: 272651
* [mips] MIPS32/64 itineraries | Simon Dardis | 2016-06-14 | 2 | -12/+12
  Itineraries for some pre MIPSR6 and EVA instructions. Some pseudo expanded instructions are marked as having no scheduling info.
  Reviewers: dsanders, vkalintiris
  Differential Review: http://reviews.llvm.org/D20418
  llvm-svn: 272648
* [mips][dsp] Fix use without def on DSPCtrl registers read by rddsp intrinsic. | Daniel Sanders | 2016-06-14 | 1 | -1/+2
  Reviewers: sdardis
  Subscribers: dsanders, sdardis, llvm-commits
  Differential Revision: http://reviews.llvm.org/D21063
  llvm-svn: 272647
* [mips][msa] copyPhysReg() should not set RegState::Define on result of CTCMSA. | Daniel Sanders | 2016-06-14 | 1 | -2/+2
  Summary: The machine verifier reports 'Explicit operand marked as def' when it is manually specified even though it agrees with the operand info.
  Reviewers: sdardis
  Subscribers: dsanders, sdardis, llvm-commits
  Differential Revision: http://reviews.llvm.org/D21065
  llvm-svn: 272646
* [SelectionDAG] Remove exit-on-error flag from test (PR27765) | Diana Picus | 2016-06-14 | 2 | -2/+17
  The exit-on-error flag in the ARM test is necessary in order to avoid an unreachable in the DAGTypeLegalizer, when trying to expand a physical register. We can also avoid this situation by introducing a bitcast early on, where the invalid scalar-to-vector conversion is detected.
  We also add a test for PowerPC, which goes through a similar code path in the SelectionDAGBuilder.
  Fixes PR27765.
  Differential Revision: http://reviews.llvm.org/D21061
  llvm-svn: 272644
* re-generate the tests using the update_llc_test_checks.py script | Igor Breger | 2016-06-14 | 8 | -684/+1137
  llvm-svn: 272643
* [AVX512] Use MOVZX32 instead of MOVZ16 for loading single v8/v4/v2/v1 masks when KMOVB is not available. | Craig Topper | 2016-06-14 | 4 | -12/+12
  This has better behavior with respect to partial register stalls since it won't need to preserve the upper 16-bits of the GPR.
  llvm-svn: 272626
* [AVX512] Add patterns for zero-extending a mask that use the def of KMOVW/KMOVB without going through an EXTRACT_SUBREG and a MOVZX. | Craig Topper | 2016-06-14 | 2 | -7/+0
  llvm-svn: 272625
* [AVX512] Add tests for zero extending masks that show an unnecessary movzx instruction. | Craig Topper | 2016-06-14 | 1 | -26/+778
  A followup patch will remove that instruction, but adding the tests first to make the change more obvious.
  llvm-svn: 272624
* Move previously added test case to the right location | Sanjoy Das | 2016-06-13 | 1 | -17/+0
  In rL272580 I accidentally added a test case to test/CodeGen when test/Transforms/DeadStoreElimination/ is a better place for it.
  llvm-svn: 272581
* Fix AAResults::callCapturesBefore for operand bundles | Sanjoy Das | 2016-06-13 | 1 | -0/+17
  Summary: AAResults::callCapturesBefore would previously ignore operand bundles. It was possible for a later instruction to miss its memory dependency on a call site that would only access the pointer through a bundle.
  Patch by Oscar Blumberg!
  Reviewers: sanjoy
  Differential Revision: http://reviews.llvm.org/D21286
  llvm-svn: 272580