summaryrefslogtreecommitdiffstats
path: root/llvm/test
Commit message (Collapse)AuthorAgeFilesLines
* [x86] add tests for insertelement; NFCSanjay Patel2017-10-101-0/+1178
| | | | llvm-svn: 315312
* [mips] Partially fix PR34391Simon Dardis2017-10-102-0/+73
| | | | | | | | | | | | | | | | | | | | | | | | Previously, the parsing of the 'subu $reg, ($reg,) imm' relied on a parser which also rendered the operand to the instruction. In some cases the general parser could construct an MCExpr which was not a MCConstantExpr which MipsAsmParser was expecting. Address this by altering the special handling to cope with unexpected inputs and fine-tune the handling of cases where an register name that is not available in the current ABI is regarded as not a match for the custom parser but also not as an outright error. Also enforces the binutils restriction that only constants are accepted. This partially resolves PR34391. Thanks to Ed Maste for reporting the issue! Reviewers: nitesh.jain, arichardson Differential Revision: https://reviews.llvm.org/D37476 llvm-svn: 315310
* [DAGCombine] Fix for shuffle to vector extend for non power 2 vectorsDavid Stuttard2017-10-101-0/+32
| | | | | | | | | | | | | | | | | | | | | Summary: See https://llvm.org/PR33743 for more details It seems that for non-power of 2 vector sizes, the algorithm can produce non-matching sizes for input and result causing an assert. This usually isn't a problem as the isAnyExtend check will weed these out, but in some cases (most often with lots of undefined values for the mask indices) it can pass this check for non power of 2 vectors. Adding in an extra check that ensures that bit size will match for the result and input (as required) Subscribers: nhaehnle Differential Revision: https://reviews.llvm.org/D35241 llvm-svn: 315307
* [ARM, Asm] Harden GNU LDRD/STRD aliases against invalid inputsOliver Stannard2017-10-105-9/+87
| | | | | | | | | | | | | | | | | | | Previously, the code that implemented the GNU assembler aliases for the LDRD and STRD instructions (where the second register is omitted) assumed that the input was a valid instruction. This caused assertion failures for every example in ldrd-strd-gnu-bad-inst.s. This improves this code so that it bails out if the instruction is not in the expected format, the check bails out, and the asm parser is run on the unmodified instruction. It also relaxes the alias on thumb targets, so that unaligned pairs of registers can be used. The restriction that Rt must be even-numbered only applies to the ARM versions of these instructions. Differential revision: https://reviews.llvm.org/D36732 llvm-svn: 315305
* [ARM, Asm] Add diagnostics for floating-point register operandsOliver Stannard2017-10-106-50/+84
| | | | | | | | | | | | | | | This adds diagnostic strings for the ARM floating-point register classes, which will be used when these classes are expected by the assembler, but the provided operand is not valid. One of these, DPR, requires C++ code to select the correct error message, as that class contains different registers depending on the FPU. The rest can all have their diagnostic strings stored in the tablegen decription of them. Differential revision: https://reviews.llvm.org/D36693 llvm-svn: 315304
* [ARM, Asm] Add diagnostics for general-purpose register operandsOliver Stannard2017-10-1014-81/+143
| | | | | | | | | | | | | | | This adds diagnostic strings for the ARM general-purpose register classes, which will be used when these classes are expected by the assembler, but the provided operand is not valid. One of these, rGPR, requires C++ code to select the correct error message, as that class contains different registers in pre-v8 and v8 targets. The rest can all have their diagnostic strings stored in the tablegen description of them. Differential revision: https://reviews.llvm.org/D36692 llvm-svn: 315303
* AMDGPU: Split MUBUF offset into aligned componentsNicolai Haehnle2017-10-103-25/+25
| | | | | | | | | | | | | | | | | | | | Summary: Atomic buffer operations do not work (and trap on gfx9) when the components are unaligned, even if their sum is aligned. Previously, we generated an offset of 4156 without an SGPR by splitting it as 4095 + 61 (immediate + inline constant). The highest offset for which we can do this correctly is 4156 = 4092 + 64. Fixes dEQP-GLES31.functional.ssbo.atomic.* Reviewers: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D37850 llvm-svn: 315302
* Revert "[llvm-dwarfdump] Print type names in DW_AT_type DIEs"Jonas Devlieghere2017-10-1021-76/+76
| | | | | | This reverts commit r315297. llvm-svn: 315299
* [llvm-dwarfdump] Print type names in DW_AT_type DIEsJonas Devlieghere2017-10-1021-76/+76
| | | | | | | | | | This patch adds printing for DW_AT_type DIEs like it is already the case for DW_AT_specification DIEs. This is a rather naive approach and only a start. We should have pretty printers for different languages. Differential revision: https://reviews.llvm.org/D36993 llvm-svn: 315297
* [AsmParser] Add DiagnosticString to register classes in tablegenOliver Stannard2017-10-102-14/+14
| | | | | | | | | | | | | | | | | | | | | | | | | This allows a DiagnosticType and/or DiagnosticString to be associated with a RegisterClass in tablegen, so that we can emit diagnostics in the assembler when a register operand is incorrect. DiagnosticType creates a predictable enum value, which gets returned as the error code when an operand does not match, and can be used by the assembly parser to map to a user-facing diagnostic. DiagnosticString creates an anonymous enum value (currently based on the tablegen class name), and a function to map from enum values to strings will be generated. Both of these work the same was as they do for AsmOperand. This isn't used by any targets yet, but has one (positive) side-effect. It improves the diagnostic codes returned by validateOperandClass - we always want to emit the diagnostic that relates to the expected operand class, but this wasn't always being done when the expected and actual classes were completely different (token/register/custom). This causes a few AArch64 diagnostics to be improved, as Match_InvalidOperand was being returned instead of a specific diagnostic type. Differential revision: https://reviews.llvm.org/D36691 llvm-svn: 315295
* [X86][SKYLAKE] Update regression test to differentiate between HASWELL and ↵Gadi Haber2017-10-106-6/+314
| | | | | | | | | | | | | | | SKYLAKE scheduling.<NFC> NFC. Updated 6 regression tests to differentiate between HASWELL and SKYLAKE scheduling information. The fix is in preparation of a patch to update the information of the Skylake Client scheduling to include the appropriate load and store latencies. Reviewers: zvi, RKSimon Differential Revision: https://reviews.llvm.org/D38685 Change-Id: Ifc6b98d9eaf266913698f24c766fd994fc977555 llvm-svn: 315291
* [SCCP] Propagate integer range info for parameters in IPSCCP.Florian Hahn2017-10-101-0/+117
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This updates the SCCP solver to use of the ValueElement lattice for parameters, which provides integer range information. The range information is used to remove unneeded icmp instructions. For the following function, f() can be optimized to `ret i32 2` with this change source_filename = "sccp.c" target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-unknown-linux-gnu" ; Function Attrs: norecurse nounwind readnone uwtable define i32 @main() local_unnamed_addr #0 { entry: %call = tail call fastcc i32 @f(i32 1) %call1 = tail call fastcc i32 @f(i32 47) %add3 = add nsw i32 %call, %call1 ret i32 %add3 } ; Function Attrs: noinline norecurse nounwind readnone uwtable define internal fastcc i32 @f(i32 %x) unnamed_addr #1 { entry: %c1 = icmp sle i32 %x, 100 %cmp = icmp sgt i32 %x, 300 %. = select i1 %cmp, i32 1, i32 2 ret i32 %. } attributes #1 = { noinline } Reviewers: davide, sanjoy, efriedma, dberlin Reviewed By: davide, dberlin Subscribers: mcrosier, gberry, mssimpso, dberlin, llvm-commits Differential Revision: https://reviews.llvm.org/D36656 llvm-svn: 315288
* Fix for PR34888.Nemanja Ivanovic2017-10-101-0/+121
| | | | | | | | | | The issue is that we assume operand zero of the input to the add instruction is a register. In this case, the input comes from inline assembly and operand zero is not a register thereby causing a crash. The code will bail anyway if the input instruction doesn't have the right opcode. So do that check first and let short-circuiting prevent the crash. llvm-svn: 315285
* Re-land "[MergeICmps] Disable mergeicmps if the target does not want to ↵Clement Courbet2017-10-106-85/+215
| | | | | | | | | | handle memcmp expansion." (fixed stability issues) This reverts commit d6492333d3b478a1d88163315002022f8d5e58dc. llvm-svn: 315281
* [AVX512] Add patterns to commute integer comparison instructions during isel.Craig Topper2017-10-101-106/+54
| | | | | | This enables broadcast loads to be commuted and allows normal loads to be folded without the peephole pass. llvm-svn: 315274
* Renable r314928Xinliang David Li2017-10-107-0/+528
| | | | | | | | | | | Eliminate inttype phi with inttoptr/ptrtoint. This version fixed a bug in finding the matching phi -- the order of the incoming blocks may be different (triggered in self build on Windows). A new test case is added. llvm-svn: 315272
* [MC] Properly diagnose badly scoped .cfi_ directivesReid Kleckner2017-10-101-0/+18
| | | | | | | | | | Removes two report_fatal_errors. Implement this by removing EmitCFICommon, and do the checking in getCurrentDwarfFrameInfo. Have the callers check for null before dereferencing it. llvm-svn: 315264
* Give a test a tripleReid Kleckner2017-10-101-1/+1
| | | | llvm-svn: 315263
* [SEH] Use reportError instead of report_fatal_error for bad directivesReid Kleckner2017-10-102-2/+97
| | | | | | | | | | This makes the .seh_ directives slightly more usable from standalone assembly files. This removes a large number of report_fatal_errors and recovers from the error by ignoring the directive. llvm-svn: 315262
* [MC] Suppress .Lcfi labels when emitting textual assemblyReid Kleckner2017-10-1091-2952/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This suppresses the generation of .Lcfi labels in our textual assembler. It was annoying that this generated cascading .Lcfi labels: llc foo.ll -o - | llvm-mc | llvm-mc After three trips through MCAsmStreamer, we'd have three labels in the output when none are necessary. We should only bother creating the labels and frame data when making a real object file. This supercedes D38605, which moved the entire .seh_ implementation into MCObjectStreamer. This has the advantage that we do more checking when emitting textual assembly, as a minor efficiency cost. Outputting textual assembly is not performance critical, so this shouldn't matter. Reviewers: majnemer, MatzeB Subscribers: qcolombet, nemanjai, javed.absar, eraman, hiraditya, JDevlieghere, llvm-commits Differential Revision: https://reviews.llvm.org/D38638 llvm-svn: 315259
* [GISel]: Fix generation of illegal COPYs during CallLoweringAditya Nandakumar2017-10-099-73/+122
| | | | | | | | | | | We end up creating COPY's that are either truncating/extending and this should be illegal. https://reviews.llvm.org/D37640 Patch for X86 and ARM by igorb, rovka llvm-svn: 315240
* [X86] Unsigned saturation subtraction canonicalization [the backend part]Zvi Rackover2017-10-091-236/+161
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: On behalf of julia.koval@intel.com The patch transforms canonical version of unsigned saturation, which is sub(max(a,b),a) or sub(a,min(a,b)) to special psubus insturuction on targets, which support it(8bit and 16bit uints). umax(a,b) - b -> subus(a,b) a - umin(a,b) -> subus(a,b) There is also extra case handled, when right part of sub is 32 bit and can be truncated, using UMIN(this transformation was discussed in https://reviews.llvm.org/D25987). The example of special case code: ``` void foo(unsigned short *p, int max, int n) { int i; unsigned m; for (i = 0; i < n; i++) { m = *--p; *p = (unsigned short)(m >= max ? m-max : 0); } } ``` Max in this example is truncated to max_short value, if it is greater than m, or just truncated to 16 bit, if it is not. It is vaid transformation, because if max > max_short, result of the expression will be zero. Here is the table of types, I try to support, special case items are bold: | Size | 128 | 256 | 512 | ----- | ----- | ----- | ----- | i8 | v16i8 | v32i8 | v64i8 | i16 | v8i16 | v16i16 | v32i16 | i32 | | **v8i32** | **v16i32** | i64 | | | **v8i64** Reviewers: zvi, spatel, DavidKreitzer, RKSimon Reviewed By: zvi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D37534 llvm-svn: 315237
* [SLP] Add test for reversed load, NFC.Alexey Bataev2017-10-091-0/+30
| | | | llvm-svn: 315232
* [globalisel] Add support for ValueType operands in patterns.Daniel Sanders2017-10-091-32/+53
| | | | | | | | | | | | It's rare but there are a small number of patterns like this: (set i64:$dst, (add i64:$src1, i64:$src2)) These should be equivalent to register classes except they shouldn't check for a specific register bank. This doesn't occur in AArch64/ARM/X86 but does occasionally come up in other in-tree targets such as BPF. llvm-svn: 315226
* [dsymutil] Emit valid debug locations when no symbol flags are setFrancis Ricci2017-10-093-0/+37
| | | | | | | | | | | | | | | | | Summary: swiftc emits symbols without flags set, which led dsymutil to ignore them when searching for global symbols, causing dwarf location data to be omitted. Xcode's dsymutil handles this case correctly, and emits valid location data. Add this functionality to llvm-dsymutil by allowing parsing of symbols with no flags set. Reviewers: aprantl, friss, JDevlieghere Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38587 llvm-svn: 315218
* [SLP] Test for wrongly vectorized set of extractelements, NFC.Alexey Bataev2017-10-091-0/+30
| | | | llvm-svn: 315217
* [llvm-rc] Have the tokenizer discard single & block comments.Zachary Turner2017-10-092-0/+17
| | | | | | | | | | This allows rc files to have comments. Eventually we should just use clang's c preprocessor, but that's a bit larger effort for minimal gain, and this is straightforward. Differential Revision: https://reviews.llvm.org/D38651 llvm-svn: 315207
* [DAG] combine assertsexts around a truncSanjay Patel2017-10-097-52/+73
| | | | | | | This was a suggested follow-up to: D37017 / https://reviews.llvm.org/rL313577 llvm-svn: 315206
* [AArch64] Improve codegen for inverted overflow checking intrinsicsAmara Emerson2017-10-091-0/+138
| | | | | | | | | | | | | | E.g. if we have a (xor(overflow-bit), 1) where overflow-bit comes from an intrinsic like llvm.sadd.with.overflow then we can kill the xor and use the inverted condition code for the CSEL. rdar://28495949 Reviewed By: kristof.beyls Differential Revision: https://reviews.llvm.org/D38160 llvm-svn: 315205
* [x86] regenerate test checks; NFCSanjay Patel2017-10-091-90/+115
| | | | llvm-svn: 315204
* [AArch64] fix typos in test assertionsSanjay Patel2017-10-091-2/+2
| | | | llvm-svn: 315203
* [X86] Enable extended comparison predicate support for SETUEQ/SETONE when ↵Craig Topper2017-10-095-96/+40
| | | | | | | | | | targeting AVX instructions. We believe that despite AMD's documentation, that they really do support all 32 comparision predicates under AVX. Differential Revision: https://reviews.llvm.org/D38609 llvm-svn: 315201
* [X86][SSE] Add test case for PR27708Simon Pilgrim2017-10-081-0/+142
| | | | llvm-svn: 315186
* [X86] Regenerate fast-isel-select-pseudo-cmov.ll to prepare for D38609.Craig Topper2017-10-081-64/+213
| | | | llvm-svn: 315184
* [X86] getTargetConstantBitsFromNode - add support for decoding scalar constantsSimon Pilgrim2017-10-081-6/+4
| | | | llvm-svn: 315182
* [X86] Prefer MOVSS/SD over BLENDI during legalization. Remove BLENDI ↵Craig Topper2017-10-081-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | versions of scalar arithmetic patterns Summary: We currently disable some converting of shuffles to MOVSS/MOVSD during legalization if SSE41 is enabled. But later during shuffle combining we go back to prefering MOVSS/MOVSD. Additionally we have patterns that look for BLENDIs to detect scalar arithmetic operations. I believe due to the combining using MOVSS/MOVSD these are unnecessary. Interestingly, we still codegen blend instructions even though lowering/isel emit movss/movsd instructions. Turns out machine CSE commutes them to blend, and then commuting those blends back into blends that are equivalent to the original movss/movsd. This patch fixes the inconsistency in legalization to prefer MOVSS/MOVSD. The one test change was caused by this change. The problem is that we have integer types and are mostly selecting integer instructions except for the shufps. This shufps forced the execution domain, but the vpblendw couldn't have its domain changed with a naive instruction swap. We could fix this by special casing VPBLENDW based on the immediate to widen the element type. The rest of the patch is removing all the excess scalar patterns. Long term we should probably add isel patterns to make MOVSS/MOVSD emit blends directly instead of relying on the double commute. We may also want to consider emitting movss/movsd for optsize. I also wonder if we should still use the VEX encoded blendi instructions even with AVX512. Blends have better throughput, and that may outweigh the register constraint. Reviewers: RKSimon, zvi Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38023 llvm-svn: 315181
* [AArch64][GlobalISel] Add a test case for G_PHI of p0 instruction selection.Amara Emerson2017-10-081-0/+45
| | | | llvm-svn: 315179
* [AArch64][GlobalISel] Add a test case for G_PHI of p0 regbank selection.Amara Emerson2017-10-081-0/+42
| | | | llvm-svn: 315178
* [AArch64][GlobalISel] Make G_PHI of p0 types legal.Amara Emerson2017-10-081-0/+51
| | | | | | Differential Revision: https://reviews.llvm.org/D38621 llvm-svn: 315177
* [X86][XOP] Add XOP oddshuffles testsSimon Pilgrim2017-10-081-0/+266
| | | | | | XOP codegen is often different to generic AVX - thank you vpperm! llvm-svn: 315176
* [X86][SKX] Adding the scheduling information for the SKX target.Gadi Haber2017-10-0813-5523/+5525
| | | | | | | | | | | | | | | | | | Adding the scheduling information for the SkylakeServer (SKX) target. This patch adds the instruction scheduling information for the SkylakeServer (SKX) architecture target by adding the file X86SchedSkylakeServer.td located under the X86 Target. We used the scheduling information retrieved from the Skylake architects in order to create the file. The scheduling information includes latency, number of micro-Ops and used ports by each SKL instruction. The patch continues the scheduling replacement and insertion effort started with the SNB target in r310792, the HSW target in r311879 and the SkylakeClient (SKL) target in rL313613. Please expect some performance fluctuations due to code alignment effects. Reviewers: zvi, RKSimon, craig.topper, chandlerc, aymanmu Differential Revision: https://reviews.llvm.org/D38443 Change-Id: I5c228fcc09e9e5a99b6116e62b356c4f9b971185 llvm-svn: 315175
* [X86] Stop LowerSIGN_EXTEND_AVX512 from creating v8i16/v16i16/v16i8 vselects ↵Craig Topper2017-10-081-0/+225
| | | | | | | | with a v8i1/v16i1 condition when BWI is not available. Some of the tests in vector-shuffle-v1.ll would get into an infinite loop without this. llvm-svn: 315172
* [X86] If we see an insert of a bitcast into zero vector, canonicalize it to ↵Craig Topper2017-10-081-5/+0
| | | | | | | | move the bitcast to the other side of the insert. This improves detection of zeroing of upper bits during isel. llvm-svn: 315161
* [X86][SSE] Match bitcasted BUILD_VECTOR of constants for v2i64 shifts on ↵Simon Pilgrim2017-10-071-6/+0
| | | | | | | | 64-bit targets (PR34855) Extension to rL315155, generate constant shifts on 64-bits as well as 32-bits. llvm-svn: 315156
* [X86][SSE] Match bitcasted v4i32 BUILD_VECTORS for v2i64 shifts on 64-bit ↵Simon Pilgrim2017-10-071-0/+38
| | | | | | | | targets (PR34855) We were already doing this for 32-bit targets, but we can generate these on 64-bits as well. llvm-svn: 315155
* [X86] Add X86ISD::CMOV to computeKnownBitsForTargetNode and ↵Craig Topper2017-10-072-5/+2
| | | | | | | | | | | | | | | | ComputeNumSignBitsForTargetNode. Summary: Implementations based on ISD::SELECT. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38663 llvm-svn: 315153
* [InstSimplify] add tests to show we can do better at folding poison; NFCSanjay Patel2017-10-071-0/+58
| | | | llvm-svn: 315152
* [X86][SSE] Improve shuffling combining with horizontal operationsSimon Pilgrim2017-10-072-137/+47
| | | | | | | | | | Recognise cases when we can merge the shuffles with their horizontal (HADD/HSUB/PACK) instruction inputs. Replaces an older implementation which performed some of this during lowering, expanding an existing target shuffle combine stage instead. Differential Revision: https://reviews.llvm.org/D38506 llvm-svn: 315150
* [MachineOutliner] Disable outlining from LinkOnceODRs by defaultJessica Paquette2017-10-071-8/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Say you have two identical linkonceodr functions, one in M1 and one in M2. Say that the outliner outlines A,B,C from one function, and D,E,F from another function (where letters are instructions). Now those functions are not identical, and cannot be deduped. Locally to M1 and M2, these outlining choices would be good-- to the whole program, however, this might not be true! To mitigate this, this commit makes it so that the outliner sees linkonceodr functions as unsafe to outline from. It also adds a flag, -enable-linkonceodr-outlining, which allows the user to specify that they want to outline from such functions when they know what they're doing. Changing this handles most code size regressions in the test suite caused by competing with linker dedupe. It also doesn't have a huge impact on the code size improvements from the outliner. There are 6 tests that regress > 5% from outlining WITH linkonceodrs to outlining WITHOUT linkonceodrs. Overall, most tests either improve or are not impacted. Not outlined vs outlined without linkonceodrs: https://hastebin.com/raw/qeguxavuda Not outlined vs outlined with linkonceodrs: https://hastebin.com/raw/edepoqoqic Outlined with linkonceodrs vs outlined without linkonceodrs: https://hastebin.com/raw/awiqifiheb Numbers generated using compare.py with -m size.__text. Tests run for AArch64 with -Oz -mllvm -enable-machine-outliner -mno-red-zone. llvm-svn: 315136
* [InstCombine] use correct type when propagating constant condition in ↵Sanjay Patel2017-10-061-0/+16
| | | | | | simplifyDivRemOfSelectWithZeroOp (PR34856) llvm-svn: 315130
OpenPOWER on IntegriCloud