path: root/llvm/lib/Target
Commit message | Author | Age | Files | Lines
* [x86] eliminate redundant shuffle of horizontal math ops when both inputs ↵Sanjay Patel2017-09-011-1/+39
| | | | | | | | | | | | | | | are the same This is limited to a set of patterns based on the example in PR34111: https://bugs.llvm.org/show_bug.cgi?id=34111 ...but as I was investigating this, I saw that horizontal patterns can go wrong in many, many other ways that would not be handled by this patch. Each data type may even be handled differently in the DAG after starting with the same basic IR pattern, so even proper IR canonicalization won't fix it all. Differential Revision: https://reviews.llvm.org/D37357 llvm-svn: 312379
* [AMDGPU] Prevent infinite recursion in DAG.computeKnownBits()Stanislav Mekhanoshin2017-09-011-2/+2
| | | | | | Differential Revision: https://reviews.llvm.org/D37392 llvm-svn: 312364
* AMDGPU: Add ds_{read|write}_addtid_b32 definitionsMatt Arsenault2017-09-012-0/+13
| | | | llvm-svn: 312349
* AMDGPU: Add most d16 load/store instruction definitionsMatt Arsenault2017-09-015-15/+147
| | | | | | | Doesn't include the tied operand necessary for the loads, but is enough for the assembler to work. llvm-svn: 312347
* [WebAssembly] Update relocation names to match specSam Clegg2017-09-011-3/+3
| | | | | | | | Summary: See https://github.com/WebAssembly/tool-conventions/blob/master/Linking.md Differential Revision: https://reviews.llvm.org/D37385 llvm-svn: 312342
* AMDGPU: IMPLICIT_DEFs and DBG_VALUEs do not contribute to wait statesNicolai Haehnle2017-09-011-4/+9
| | | | | | | | | | | | | | | Summary: This fixes a bug that was exposed on gfx9 in various GL45-CTS.shaders.loops.*_iterations.select_iteration_count_fragment tests, e.g. GL45-CTS.shaders.loops.do_while_uniform_iterations.select_iteration_count_fragment Reviewers: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D36193 llvm-svn: 312337
* [ARM] GlobalISel: Support ROPI global variablesDiana Picus2017-09-011-2/+14
| | | | | | | In the ROPI relocation model, read-only variables are accessed relative to the PC. We use the (MOV|LDRLIT)_ga_pcrel pseudoinstructions for this. llvm-svn: 312323
* [ARM] Add 2-operand assembly aliases for Thumb1 ADD/SUBOliver Stannard2017-09-011-0/+6
| | | | | | | | | | | | | | | | This adds 2-operand assembly aliases for these instructions: add r0, r1 => add r0, r0, r1 sub r0, r1 => sub r0, r0, r1 Previously this syntax was only accepted for Thumb2 targets, where the wide versions of the instructions were used. This patch allows the 2-operand syntax to be used for Thumb1 targets, and selects the narrow encoding when it is used for Thumb2 targets. Differential revision: https://reviews.llvm.org/D37377 llvm-svn: 312321
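The syntactic rewrite behind these aliases can be sketched in a few lines. This is an illustration only, written in Python rather than the TableGen the patch itself uses; the helper name `expand_two_operand` and the string-based parsing are hypothetical, and encoding selection (narrow vs. wide) is not modelled.

```python
def expand_two_operand(asm: str) -> str:
    """Expand 'add rD, rM' / 'sub rD, rM' into the 3-operand form.

    Hypothetical helper: models only the rewrite
    'op rD, rM => op rD, rD, rM' quoted in the commit message.
    """
    mnemonic, _, rest = asm.strip().partition(" ")
    operands = [op.strip() for op in rest.split(",")]
    if mnemonic in ("add", "sub") and len(operands) == 2:
        dest, src = operands
        # The destination register doubles as the first source operand.
        return f"{mnemonic} {dest}, {dest}, {src}"
    # Anything else is passed through unchanged (whitespace-trimmed).
    return asm.strip()


print(expand_two_operand("add r0, r1"))
print(expand_two_operand("sub r0, r1"))
```

Three-operand input such as `add r0, r1, r2` is returned as-is, since only the 2-operand spelling needs expanding.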
* Move static helper into ARMTargetLowering. NFCDiana Picus2017-09-012-1/+3
| | | | | | | This exposes the isReadOnly(GlobalValue *) in the ARMTargetLowering so we can make use of it in GlobalISel as well. llvm-svn: 312320
* [AVX512] Suppress duplicate register only FMA patterns.Craig Topper2017-09-011-30/+40
| | | | | | | | Previously we generated a register only pattern for each of the 3 instruction forms, but they are all identical as far as isel is concerned. So drop the others and just keep the 213 version. This removes 2968 bytes from the isel table. llvm-svn: 312313
* [X86] Remove unused multiclass.Craig Topper2017-09-011-17/+0
| | | | llvm-svn: 312312
* [X86] Simplify some multiclasses by inheriting from similar ones. NFCCraig Topper2017-09-011-17/+9
| | | | llvm-svn: 312311
* [X86] Add a couple TODOs to the PMADD52 instructions about missing commuting ↵Craig Topper2017-09-011-0/+3
| | | | | | opportunities. llvm-svn: 312310
* [X86] Add isel patterns for memory forms of FMA3 intrinsic instructionsCraig Topper2017-09-011-0/+17
| | | | llvm-svn: 312309
* [X86] Remove unnecessary COPY_TO_REGCLASS(VR128) from the output patterns ↵Craig Topper2017-09-011-4/+4
| | | | | | for FMA intrinsics. The instructions are already defined as writing a VR128 register. llvm-svn: 312308
* AMDGPU: Fold clamp modifier for packed instructionsMatt Arsenault2017-08-316-20/+73
| | | | llvm-svn: 312297
* [Analysis] Fix some Clang-tidy modernize-use-using and Include What You Use ↵Eugene Zelenko2017-08-311-4/+19
| | | | | | warnings; other minor fixes. Also affected in files (NFC). llvm-svn: 312289
* [WebAssembly] Refactor load ISel tablegen patterns into classesDerek Schuff2017-08-312-327/+215
| | | | | | | | | Not all of these will be able to be used by atomics because tablegen, but it still seems like a good change by itself. Differential Revision: https://reviews.llvm.org/D37345 llvm-svn: 312287
* [X86] Don't pull carry through X86ISD::ADD carryin, -1 if we can't guarantee ↵Craig Topper2017-08-311-22/+45
| | | | | | | | | | | | | | | | | | we're really using the carry flag from the add. Prior to this patch we had a DAG combine that tried to bypass an X86ISD::ADD with -1 being added to the carry flag of some previous operation. We would then pass the carry flag directly to the user. But this is only safe if the user is looking for the carry flag and not the zero flag. So we need to only do this combine in a context where we know what flag the consumer is using. Fixes PR34381. Differential Revision: https://reviews.llvm.org/D37317 llvm-svn: 312285
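Why the combine is safe for a carry-flag consumer but not a zero-flag consumer can be checked with plain 32-bit arithmetic. The sketch below models only the flag behaviour of an unsigned add, not the DAG combine itself; `add_flags` is a hypothetical helper and the 32-bit width is an assumption chosen for illustration.

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF  # -1 as a 32-bit unsigned value


def add_flags(a: int, b: int) -> Tuple[int, bool, bool]:
    """Return (result, carry_flag, zero_flag) for a 32-bit unsigned add."""
    total = (a & MASK32) + (b & MASK32)
    result = total & MASK32
    return result, total > MASK32, result == 0


# An earlier operation whose flags we would like to reuse: 0 + 0.
_, cf_prev, zf_prev = add_flags(0, 0)

# Materialize its carry as 0/1 and add -1, as in the pattern being combined.
_, cf_add, zf_add = add_flags(int(cf_prev), MASK32)

print("carry flags agree:", cf_add == cf_prev)
print("zero flags agree: ", zf_add == zf_prev)
```

Adding -1 to a 0/1 carry value regenerates the same carry flag, so a carry consumer can read the earlier flags directly. The zero flag of the add is set only when the carry was 1, which says nothing about the earlier operation's zero flag; in this example the two disagree.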
* AMDGPU: Turn int pack pattern into build_vectorMatt Arsenault2017-08-312-1/+18
| | | | | | | | | | build_vector is a more useful canonical form when pattern matching packed operations, so turn shift into high element into a build_vector. Should show no change for now. llvm-svn: 312282
* Revert r311525: "[XRay][CodeGen] Use PIC-friendly code in XRay sleds; remove ↵Daniel Jasper2017-08-311-38/+35
| | | | | | synthetic references in .text" Breaks builds internally. Will forward repro instructions to author. llvm-svn: 312243
* AMD family 17h (znver1) scheduler model update.Ashutosh Nema2017-08-311-3/+1550
| | | | | | | | | | | | | | | | | | | | Summary: This patch enables the following: 1) Regex based Instruction itineraries for integer instructions. 2) The instructions are grouped as per the nature of the instructions (move, arithmetic, logic, Misc, Control Transfer). 3) FP instructions and their itineraries are added which includes values for SSE4A, BMI, BMI2 and SHA instructions. Patch by Ganesh Gopalasubramanian Reviewers: RKSimon, craig.topper Subscribers: vprasad, shivaram, ddibyend, andreadb, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D36617 llvm-svn: 312237
* [GlobalISel][X86] Refactor X86LegalizerInfo. NFC.Igor Breger2017-08-311-45/+10
| | | | llvm-svn: 312234
* [AArch64] v8.3-a complex number supportSam Parker2017-08-315-0/+282
| | | | | | | | | | | | | New instructions are added to AArch32 and AArch64 to aid floating-point multiplication and addition of complex numbers, where the complex numbers are packed in a vector register as a pair of elements. The Imaginary part of the number is placed in the more significant element, and the Real part of the number is placed in the less significant element. Differential Revision: https://reviews.llvm.org/D36792 llvm-svn: 312228
* [ARM] Reverse PostRASched subtarget feature logicSam Parker2017-08-314-24/+17
| | | | | | | | | | | | Replace the UsePostRAScheduler SubtargetFeature with DisablePostRAScheduler, which is then used by Swift and Cyclone. This patch maintains enabling PostRA scheduling for other Thumb2 capable cores and/or for functions which are being compiled in Arm mode. Differential Revision: https://reviews.llvm.org/D37055 llvm-svn: 312226
* [AArch64] IDSAR6 register assembler supportSam Parker2017-08-311-0/+3
| | | | | | | | | | The IDSAR6 system register has been introduced to identify the v8.3-a Javascript data type conversion and v8.2-a dot product support. Differential Revision: https://reviews.llvm.org/D37068 llvm-svn: 312225
* [AArch64] Support COFF linker directivesMartin Storsjo2017-08-311-0/+24
| | | | | | | | | | This is similar to what was done for ARM in SVN r269574; the code and the test are straight copypaste to the corresponding AArch64 code and test directory. Differential revision: https://reviews.llvm.org/D37204 llvm-svn: 312223
* Temporarily revert "Update branch coalescing to be a PowerPC specific pass"Eric Christopher2017-08-314-785/+0
| | | | | | | | From comments and code review it wasn't intended to be enabled by default yet. This reverts commit r311588. llvm-svn: 312214
* AMDGPU: Don't assert in TTI with fp32 denorms enabledMatt Arsenault2017-08-311-3/+25
| | | | | | Also refine for f16 and rcp cases. llvm-svn: 312213
* AMDGPU: Use set for tracked registersMatt Arsenault2017-08-311-20/+23
| | | | | | | | | | | | | | | The majority of the time spent in the pass is checking for register reads. Rather than searching all of the defined registers for uses in each instruction, use a set of defined registers and check the operands of the instruction. This process still is algorithmically not great, but with the additional trick of skipping the analysis for addresses with one use, this brings one slow testcase into a reasonable range. llvm-svn: 312206
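The change in lookup direction can be sketched as follows. This is an illustration under simplifying assumptions, not the pass's actual code: an instruction is modelled as a tuple of the register names it reads, and both function names are hypothetical.

```python
from typing import List, Set, Tuple

# Hypothetical model: an instruction is just the registers it reads.
Uses = Tuple[str, ...]


def reads_tracked_scan(tracked: List[str], uses: Uses) -> bool:
    """Old shape: for every tracked register, search the instruction's uses."""
    return any(reg in uses for reg in tracked)


def reads_tracked_set(tracked: Set[str], uses: Uses) -> bool:
    """New shape: walk the operands once and probe a set of tracked registers."""
    return any(reg in tracked for reg in uses)


defined = ["v0", "v1", "v2", "v3"]
instr = ("v7", "v2")
print(reads_tracked_scan(defined, instr), reads_tracked_set(set(defined), instr))
```

Both versions return the same answer. The second does work proportional to the instruction's operand count rather than to the number of tracked registers, which is the direction of the improvement the commit message describes.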
* [X86] Remove some code from fast isel that is no longer needed with i1 being ↵Craig Topper2017-08-301-31/+0
| | | | | | an illegal type. llvm-svn: 312190
* [ARM] Replace fixed-size SmallSet with a bitset.Benjamin Kramer2017-08-301-30/+30
| | | | | | It's smaller. No functionality change. llvm-svn: 312180
* AMDGPU: Correct operand types for v_mad_mix*Matt Arsenault2017-08-304-13/+37
| | | | | | | | | | | | These aren't really packed instructions, so the default op_sel_hi should be 0 since this indicates a conversion. The operand types are scalar values that behave similar to an f16 scalar that may be converted to f32. Doesn't change the default printing for op_sel_hi, just the parsing. llvm-svn: 312179
* [ARM] Use Swift error registers on non-Darwin targetsBrian Gesiak2017-08-301-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Remove a check for `ARMSubtarget::isTargetDarwin` when determining whether to use Swift error registers, so that Swift errors work properly on non-Darwin ARM32 targets (specifically Android). Before this patch, generated code would save and restores ARM register r8 at the entry and returns of a function that throws. As r8 is used as a virtual return value for the object being thrown, this gets overwritten by the restore, and calling code is unable to catch the error. In turn this caused Swift code that used `do`/`try`/`catch` to work improperly on Android ARM32 targets. Addresses Swift bug report https://bugs.swift.org/browse/SR-5438. Patch by John Holdsworth. Reviewers: manmanren, rjmccall, aschwaighofer Reviewed By: aschwaighofer Subscribers: srhines, aschwaighofer, aemerson, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D35835 llvm-svn: 312164
* [X86] Remove unneeded AVX512 check from fast isel.Craig Topper2017-08-301-2/+1
| | | | | | This is no longer necessary now that i1 is illegal. llvm-svn: 312146
* [WebAssembly] Add target feature for atomicsDerek Schuff2017-08-309-11/+31
| | | | | | | | | | Summary: This tracks the WebAssembly threads feature proposal at https://github.com/WebAssembly/threads/blob/master/proposals/threads/Overview.md Differential Revision: https://reviews.llvm.org/D37300 llvm-svn: 312145
* [AVX512] Don't use 32-bit elements version of AND/OR/XOR/ANDN during isel ↵Craig Topper2017-08-301-32/+34
| | | | | | | | | | | | unless we're matching a masked op or broadcast Selecting 32-bit element logical ops without a select or broadcast requires matching a bitconvert on the inputs to the and. But that's a weird thing to rely on. It's entirely possible that one of the inputs doesn't have a bitcast and one does. Since there's no functional difference, just remove the extra patterns and save some isel table size. Differential Revision: https://reviews.llvm.org/D36854 llvm-svn: 312138
* [GlobalISel][X86] Support variadic function call.Igor Breger2017-08-301-4/+26
| | | | | | | | | | | | | | Summary: Support variadic function call. Port the implementation from X86FastISel. Reviewers: zvi, guyblank, oren_ben_simhon Reviewed By: guyblank Subscribers: rovka, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D37261 llvm-svn: 312130
* fix more typos; NFCSanjay Patel2017-08-301-2/+2
| | | | llvm-svn: 312120
* fix typos; NFCSanjay Patel2017-08-301-15/+15
| | | | llvm-svn: 312119
* [MIPS] Add support to match more patterns for BBIT instructionStrahinja Petrovic2017-08-301-0/+15
| | | | | | | | This patch supports one more pattern for bbit0 and bbit1 instructions. The CBranchBitNum class is expanded so it can take a 32-bit immediate. Differential Revision: https://reviews.llvm.org/D36222 llvm-svn: 312111
* [AArch64] allow v4f16 types when FullFP16 is supportedSjoerd Meijer2017-08-301-57/+53
| | | | | | | | | | Support for scalars was committed in r311154, this adds support for allowing v4f16 vector types (thus avoiding conversions from/to single precision for these types). Differential Revision: https://reviews.llvm.org/D37145 llvm-svn: 312104
* [AVX512] Correct isel patterns to support selecting masked ↵Craig Topper2017-08-302-33/+59
| | | | | | | | | | | | | | | | | | | | | | | vbroadcastf32x2/vbroadcasti32x2 Summary: This patch adjusts the patterns to make the result type of the broadcast node vXf64/vXi64. Then adds a bitcast to vXi32 after that. Intrinsic lowering was also adjusted to generate this new pattern. Fixes PR34357 We should probably just drop the intrinsic entirely and use native IR, but I'll leave that for a future patch. Any idea what instruction we should be lowering the floating point 128-bit result version of this pattern to? There's a 128-bit v2i32 integer broadcast but not an fp one. Reviewers: aymanmus, zvi, igorb Reviewed By: aymanmus Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D37286 llvm-svn: 312101
* [AVX512] Use 256-bit extract instructions for extracting bits [255:128] from ↵Craig Topper2017-08-301-0/+58
| | | | | | | | | | a 512-bit register This enables the use of a smaller encoding by using a VEX instruction when possible. Differential Revision: https://reviews.llvm.org/D37092 llvm-svn: 312100
* [X86] Apply SlowIncDec feature to Sandybridge/Ivybridge CPUs as wellCraig Topper2017-08-301-2/+2
| | | | | | | | Currently we start applying this on Haswell and newer. I don't believe anything changed in the Haswell architecture to make this the right cutoff point. The partial flag handling around this has been roughly the same since Sandybridge. Differential Revision: https://reviews.llvm.org/D37250 llvm-svn: 312099
* [X86] Provide a separate feature bit for macro fusion support instead of ↵Craig Topper2017-08-304-14/+35
| | | | | | | | | | | | | | | | | | | | | basing it on the AVX flag Summary: Currently we determine if macro fusion is supported based on the AVX flag as a proxy for the processor being Sandy Bridge. This is really strange as now AMD supports AVX. It also means if the user explicitly disables AVX we disable macro fusion. This patch adds an explicit macro fusion feature. I've also enabled it for the generic 64-bit CPU (which doesn't have AVX). This is probably another candidate for being in the MI layer, but for now I at least wanted to correct the overloading of the AVX feature. Reviewers: spatel, chandlerc, RKSimon, zvi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D37280 llvm-svn: 312097
* AMDGPU: Don't look for DS merge candidates with one use addressMatt Arsenault2017-08-301-3/+10
| | | | | | | | | | | The merge is only possible if the base address register is the same for the two instructions. If there is only the one use, there's no point in doing an expensive forward scan checking for memory interference looking for a merge candidate. This gives a significant improvement in one extreme testcase. The code to do the scan is still algorithmically terrible, so this is still the slowest pass in that example. llvm-svn: 312096
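The early exit described above can be sketched like this. It is a simplified model, not the pass's implementation: instructions are hypothetical `(opcode, base_register)` pairs, a "use" is counted as an appearance in that list, and the memory-interference checks of the real forward scan are omitted.

```python
from collections import Counter
from typing import List, Tuple


def find_merge_pairs(instrs: List[Tuple[str, str]]) -> List[Tuple[int, int]]:
    """Pair up instructions that share a base address register."""
    use_count = Counter(base for _, base in instrs)
    pairs = []
    paired = set()
    for i, (_, base) in enumerate(instrs):
        # Cheap check first: with a single use of the base register there
        # can be no partner, so the forward scan is skipped entirely.
        if i in paired or use_count[base] == 1:
            continue
        for j in range(i + 1, len(instrs)):
            if j not in paired and instrs[j][1] == base:
                pairs.append((i, j))
                paired.update((i, j))
                break
    return pairs


example = [("ds_read", "v0"), ("ds_read", "v1"),
           ("ds_read", "v0"), ("ds_read", "v2")]
print(find_merge_pairs(example))
```

In this example only the two instructions based on `v0` are scanned and paired; the instructions based on `v1` and `v2` are rejected by the use-count check before any scan starts.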
* [AMDGPU] Use v_max_f* for fcanonicalizeStanislav Mekhanoshin2017-08-302-3/+33
| | | | | | | | | | If denorms are not flushed we can use max instead of multiplication by 1. For double that is simply faster, while for float and half it is shorter, because mul uses constant bus and VOP3. Differential Revision: https://reviews.llvm.org/D36856 llvm-svn: 312095
* AMDGPU: Select clamp pattern with v2f16Matt Arsenault2017-08-302-15/+36
| | | | llvm-svn: 312087
* [X86] Finish the subtarget and predicate implementation of CLWB.Craig Topper2017-08-292-0/+5
| | | | | | We don't have an intrinsic implemented for this instruction yet, but it looked odd that we were missing the accessor method from the subtarget. llvm-svn: 312064