summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
* [CodeGen] Fix some Clang-tidy modernize-use-using and Include What You Use ↵Eugene Zelenko2017-06-061-6/+14
| | | | | | warnings; other minor fixes (NFC). llvm-svn: 304839
* Fix PR23384 (part 3 of 3)Evgeny Stupachenko2017-06-062-0/+13
| | | | | | | | | | | | | Summary: The patch makes instruction count the highest priority for LSR solution for X86 (previously registers had highest priority). Reviewers: qcolombet Differential Revision: http://reviews.llvm.org/D30562 From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 304824
* [WebAssembly] MC: Refactor relocation handlingSam Clegg2017-06-061-0/+7
| | | | | | | | | | | | | The change cleans up and unifies the handling of relocation entries in WasmObjectWriter. Type index relocation no longer need to be handled separately. The only externally visible change should be that type index relocations are no longer grouped at the end. Differential Revision: https://reviews.llvm.org/D33918 llvm-svn: 304816
* AMDGPU/NFC: Move amdgpu code object metadata to supportKonstantin Zhuravlyov2017-06-063-617/+2
| | | | | | Differential Revision: https://reviews.llvm.org/D31437 llvm-svn: 304812
* [Atomics][LoopIdiom] Recognize unordered atomic memcpyAnna Thomas2017-06-062-0/+4
| | | | | | | | | | | | | | | | | | | | | | Summary: Expanding the loop idiom test for memcpy to also recognize unordered atomic memcpy. The only difference for recognizing an unordered atomic memcpy and instead of a normal memcpy is that the loads and/or stores involved are unordered atomic operations. Background: http://lists.llvm.org/pipermail/llvm-dev/2017-May/112779.html Patch by Daniel Neilson! Reviewers: reames, anna, skatkov Reviewed By: reames, anna Subscribers: llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D33243 llvm-svn: 304806
* [AMDGPU] Return correct value from SDWA passStanislav Mekhanoshin2017-06-061-1/+2
| | | | | | Differential Revision: https://reviews.llvm.org/D33927 llvm-svn: 304805
* [mips] Add madd4 subtarget featurePetar Jovanovic2017-06-065-10/+24
| | | | | | | | | | | Addition of a feature and a predicate used to control generation of madd.fmt and similar instructions. Patch by Stefan Maksimovic. Differential Revision: https://reviews.llvm.org/D33400 llvm-svn: 304801
* [X86][AVX1] Split 256-bit vector non-temporal FastISel loads to keep it ↵Simon Pilgrim2017-06-061-0/+6
| | | | | | | | non-temporal (PR32744) Extension to D33728 llvm-svn: 304798
* AMDGPU/GlobalISel: Mark 32-bit G_ICMP as legalTom Stellard2017-06-061-0/+3
| | | | | | | | | | | | Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D33890 llvm-svn: 304797
* Sort the remaining #include lines in include/... and lib/....Chandler Carruth2017-06-06238-343/+333
| | | | | | | | | | | | | | | | | | | | | | | | | I did this a long time ago with a janky python script, but now clang-format has built-in support for this. I fed clang-format every line with a #include and let it re-sort things according to the precise LLVM rules for include ordering baked into clang-format these days. I've reverted a number of files where the results of sorting includes isn't healthy. Either places where we have legacy code relying on particular include ordering (where possible, I'll fix these separately) or where we have particular formatting around #include lines that I didn't want to disturb in this patch. This patch is *entirely* mechanical. If you get merge conflicts or anything, just ignore the changes in this patch and run clang-format over your #include lines in the files. Sorry for any noise here, but it is important to keep these things stable. I was seeing an increasing number of patches with irrelevant re-ordering of #include lines because clang-format was used. This patch at least isolates that churn, makes it easy to skip when resolving conflicts, and gets us to a clean baseline (again). llvm-svn: 304787
* [ARM] Add curly braces around switch case [NFC] Peter Smith2017-06-061-1/+2
| | | | | | | | | | | | | My previous commit r304702 introduced a new case into a switch statement. This case defined a variable but I forgot to add the curly brackets around the case to limit the scope. This change puts the curly braces back in so that the next person that adds a case doesn't get a build failure. Thanks to avieira for the spot. Differential Revision: https://reviews.llvm.org/D33931 llvm-svn: 304785
* [llvm] Remove double semicolonsMandeep Singh Grang2017-06-065-5/+5
| | | | | | | | | | | | Reviewers: craig.topper, arsenm, mehdi_amini Reviewed By: mehdi_amini Subscribers: mehdi_amini, wdng, nhaehnle, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D33924 llvm-svn: 304767
* [x86] Revert the X86FoldTablesEmitter due to more miscompiles.Chandler Carruth2017-06-062-3/+3398
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In testing, we've found yet another miscompile caused by the new tables. And this one is even less clear how to fix (we could teach it to fold a 16-bit load instead of the 32-bit load it wants, or block folding entirely). Also, the approach to excluding instructions seems increasingly to not scale well. I have left a more detailed analysis on the review log for the original patch (https://reviews.llvm.org/D32684) along with suggested path forward. I will land an additional test case that I wrote which covers the code that was miscompiling (folding into the output of `pextrw`) in a subsequent commit to keep this a pure revert. For each commit reverted here, I've restricted the revert to the non-test code touching the x86 fold table emission until the last commit where I did revert the test updates. This means the *new* test cases added for `insertps` and `xchg` remain untouched (and continue to pass). Reverted commits: r304540: [X86] Don't fold into memory operands into insertps in the ... r304347: [TableGen] Adapt more places to getValueAsString now ... r304163: [X86] Don't fold away the memory operand of an xchg. r304123: Don't capture a temporary std::string in a StringRef. r304122: Resubmit "[X86] Adding new LLVM TableGen backend that ..." Original commit was in r304088, and after a string of fixes was reverted previously in r304121 to fix build bots, and then re-landed in r304122. llvm-svn: 304762
* AMDGPU: Remove deprecated and unused elf definitionsKonstantin Zhuravlyov2017-06-055-144/+0
| | | | | | Differential Revision: https://reviews.llvm.org/D33689 llvm-svn: 304737
* [AMDGPU] Fix uninit'ed var (RevisitLoop)Mark Searles2017-06-051-1/+1
| | | | | | Differential Revision: https://reviews.llvm.org/D33907 llvm-svn: 304729
* [X86][SSE41] Non-temporal loads shouldn't be folded if it can be avoided ↵Simon Pilgrim2017-06-051-2/+6
| | | | | | | | | | (PR32743) Missed SSE41 non-temporal load case in previous commit Differential Revision: https://reviews.llvm.org/D33728 llvm-svn: 304722
* [X86][AVX1] Split 256-bit vector non-temporal loads to keep it non-temporal ↵Simon Pilgrim2017-06-051-6/+18
| | | | | | | | (PR32744) Differential Revision: https://reviews.llvm.org/D33728 llvm-svn: 304718
* [X86][SSE] Non-temporal loads shouldn't be folded if it can be avoided (PR32743)Simon Pilgrim2017-06-051-9/+24
| | | | | | Differential Revision: https://reviews.llvm.org/D33728 llvm-svn: 304717
* [ARM] GlobalISel: Constrain callee register on indirect callsDiana Picus2017-06-051-1/+10
| | | | | | | | | | | | | When lowering calls, we generate instructions with machine opcodes rather than generic ones. Therefore, we need to constrain the register classes of the operands. Also enable the machine verifier on the arm-irtranslator.ll test, since that would've caught this issue. Fixes (part of) PR32146. llvm-svn: 304712
* Add support for #pragma clang sectionJaved Absar2017-06-051-0/+14
| | | | | | | | | | | | | | | This patch provides a means to specify section-names for global variables, functions and static variables, using #pragma directives. This feature is only defined to work sensibly for ELF targets. One can specify section names as: #pragma clang section bss="myBSS" data="myData" rodata="myRodata" text="myText" One can "unspecify" a section name with empty string e.g. #pragma clang section bss="" data="" text="" rodata="" Reviewers: Roger Ferrer, Jonathan Roelofs, Reid Kleckner Differential Revision: https://reviews.llvm.org/D33413 llvm-svn: 304704
* [ARM] Support fixup for Thumb2 modified immediatePeter Smith2017-06-054-3/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | This change adds a new fixup fixup_t2_so_imm for the t2_so_imm_asmoperand "T2SOImm". The fixup permits code such as: .L1: sub r3, r3, #.L2 - .L1 .L2: to assemble in Thumb2 as well as in ARM state. The operand predicate isT2SOImm() explicitly doesn't match expressions containing :upper16: and :lower16: as expressions with these operators must match the movt and movw instructions. The test mov r0, foo2 in thumb2-diagnostics is moved to a new file as the fixup delays the error message till after the assembler has quit due to the other errors. As the mov instruction shares the t2_so_imm_asmoperand mov instructions with a non constant expression now match t2MOVi rather than t2MOVi16 so the error message is slightly different. Fixes PR28647 Differential Revision: https://reviews.llvm.org/D33492 llvm-svn: 304702
* [AMDGPU] Fix SIFoldOperands crash with clampStanislav Mekhanoshin2017-06-051-1/+2
| | | | | | | | | Fixes bug #33302. Pass did not account that Src1 of max instruction can be an immediate. Differential Revision: https://reviews.llvm.org/D33884 llvm-svn: 304696
* [X86][SSE] Change BUILD_VECTOR interleaving ordering to improve ↵Simon Pilgrim2017-06-041-18/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | coalescing/combine opportunities We currently generate BUILD_VECTOR as a tree of UNPCKL shuffles of the same type: e.g. for v4f32: Step 1: unpcklps 0, 2 ==> X: <?, ?, 2, 0> : unpcklps 1, 3 ==> Y: <?, ?, 3, 1> Step 2: unpcklps X, Y ==> <3, 2, 1, 0> The issue is because we are not placing sequential vector elements together early enough, we fail to recognise many combinable patterns - consecutive scalar loads, extractions etc. Instead, this patch unpacks progressively larger sequential vector elements together: e.g. for v4f32: Step 1: unpcklps 0, 2 ==> X: <?, ?, 1, 0> : unpcklps 1, 3 ==> Y: <?, ?, 3, 2> Step 2: unpcklpd X, Y ==> <3, 2, 1, 0> This does mean that we are creating UNPCKL shuffle of different value types, but the relevant combines that benefit from this are quite capable of handling the additional BITCASTs that are now included in the shuffle tree. Differential Revision: https://reviews.llvm.org/D33864 llvm-svn: 304688
* [AMDGPU] Untangle SDWA pass from SIShrinkInstructionsStanislav Mekhanoshin2017-06-032-26/+67
| | | | | | | | | | | | Remove dependency of SDWA pass on SIShrinkInstructions. The goal is to move SDWA even higher in the stack to avoid second run of MachineLICM, MachineCSE and SIFoldOperands. Also added handling to preserve original src modifiers. Differential Revision: https://reviews.llvm.org/D33860 llvm-svn: 304665
* [X86][SSE] Add SCALAR_TO_VECTOR(PEXTRW/PEXTRB) support to faux shuffle combiningSimon Pilgrim2017-06-031-9/+31
| | | | | | Generalized existing SCALAR_TO_VECTOR(EXTRACT_VECTOR_ELT) code to support AssertZext + PEXTRW/PEXTRB cases as well. llvm-svn: 304659
* AMDGPU/GlobalISel: Mark 1-bit integer constants as legalTom Stellard2017-06-031-0/+5
| | | | | | | | | | | | | | | | Summary: These are mostly legal, but will probably need special lowering for some cases. Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D33791 llvm-svn: 304628
* [AMDGPU] Preserve operand order in SIFoldOperandsStanislav Mekhanoshin2017-06-031-3/+18
| | | | | | | | | SIFoldOperands can commute operands even if no folding was done. This change is to preserve IR is no folding was done. Differential Revision: https://reviews.llvm.org/D33802 llvm-svn: 304625
* [AMDGPU] V_DIV_FIXUP_F16 is not a commutable operationStanislav Mekhanoshin2017-06-031-1/+2
| | | | | | Differential Revision: https://reviews.llvm.org/D33808 llvm-svn: 304619
* [x86] simplify code for vector icmp pred transforms; NFCISanjay Patel2017-06-021-31/+19
| | | | | | Organizing by transform is smaller and easier to read than a squashed switch with fall-throughs. llvm-svn: 304611
* [X86] Correctly broadcast NaN-like integers as float on AVX.Ahmed Bougacha2017-06-021-11/+13
| | | | | | | | | | | | | | | | | | | | | | | | Since r288804, we try to lower build_vectors on AVX using broadcasts of float/double. However, when we broadcast integer values that happen to have a NaN float bitpattern, we lose the NaN payload, thereby changing the integer value being broadcast. This is caused by ConstantFP::get, to which we pass the splat i32 as a float (by bitcasting it using bitsToFloat). ConstantFP::get takes a double parameter, so we end up lossily converting a single-precision NaN to double-precision. Instead, avoid any kinds of conversions by directly building an APFloat from the splatted APInt. Note that this also fixes another piece of code (broadcast of subvectors), that currently isn't susceptible to the same problem. Also note that we could really just use APInt and ConstantInt throughout: the constant pool type doesn't matter much. Still, for consistency, use the appropriate type. llvm-svn: 304590
* [x86] fix formatting; NFCISanjay Patel2017-06-021-8/+9
| | | | llvm-svn: 304576
* AMDGPU: Register AMDGPUAlwaysInlineMatt Arsenault2017-06-023-3/+10
| | | | llvm-svn: 304574
* AMDGPU: Make auto waitcnt before barrier a featureKonstantin Zhuravlyov2017-06-025-8/+16
| | | | | | Differential Revision: https://reviews.llvm.org/D33793 llvm-svn: 304571
* Tidy up a bit of r304516, use SmallVector::assign rather than for loopDavid Blaikie2017-06-021-2/+2
| | | | | | | | | | | | | | | | This might give a few better opportunities to optimize these to memcpy rather than loops - also a few minor cleanups (StringRef-izing, templating (to avoid std::function indirection), etc). The SmallVector::assign(iter, iter) could be improved with the use of SFINAE, but the (iter, iter) ctor and append(iter, iter) need it to and don't have it - so, workaround it for now rather than bothering with the added complexity. (also, as noted in the added FIXME, these assign ops could potentially be optimized better at least for non-trivially-copyable types) llvm-svn: 304566
* AMDGPUAnnotateUniformValue should always treat volatile loads as divergentAlexander Timofeev2017-06-022-1/+2
| | | | llvm-svn: 304554
* [AArch64][Falkor] Model immediate forwarding.Geoff Berry2017-06-021-13/+28
| | | | llvm-svn: 304552
* [AMDGPU] Turn on the new waitcnt insertion pass. Adjust tests.Mark Searles2017-06-021-1/+1
| | | | | | | | | -enable-si-insert-waitcnts=1 becomes the default -enable-si-insert-waitcnts=0 to use old pass Differential Revision: https://reviews.llvm.org/D33730 llvm-svn: 304551
* [mips][microMIPS] Extending size reduction pass with LBU16, LHU16, SB16 and SH16Zoran Jovanovic2017-06-021-0/+57
| | | | | | | | | | | | | | Author: milena.vujosevic.janicic Reviewers: sdardis The patch extends size reduction pass for MicroMIPS. The following instructions are examined and transformed, if possible: LBU instruction is transformed into 16-bit instruction LBU16 LHU instruction is transformed into 16-bit instruction LHU16 SB instruction is transformed into 16-bit instruction SB16 SH instruction is transformed into 16-bit instruction SH16 Differential Revision: https://reviews.llvm.org/D33091 llvm-svn: 304550
* [Hexagon] Return 0 from getDotNewPredOp when .new opcode does not existKrzysztof Parzyszek2017-06-021-3/+1
| | | | | | | This allows using this function to test if an instruction can be converted to a .new form. llvm-svn: 304549
* [ARM] GlobalISel: Support struct params/returnsDiana Picus2017-06-021-3/+11
| | | | | | | | | | | | Very very similar to the support for arrays. As with arrays, we don't support returning large structs that wouldn't fit in R0-R3. Most front-ends would likely use sret arguments for that anyway. The only significant difference is that when splitting a struct, we need to make sure we set the correct original alignment on each member, otherwise it may get split incorrectly between stack and registers. llvm-svn: 304536
* [ARM] Cortex-A57 scheduling model for ARM backend (AArch32)Javed Absar2017-06-027-11/+1914
| | | | | | | | | | | | | | | This patch implements the Cortex-A57 scheduling model. The main code is in ARMScheduleA57.td, ARMScheduleA57WriteRes.td. Small changes in cpp,.h files to support required scheduling predicates. Scheduling model implemented according to: http://infocenter.arm.com/help/topic/com.arm.doc.uan0015b/Cortex_A57_Software_Optimization_Guide_external.pdf. Patch by : Andrew Zhogin (submitted on his behalf, as requested). Rewiewed by: Renato Golin, Diana Picus, Javed Absar, Kristof Beyls. Differential Revision: https://reviews.llvm.org/D28152 llvm-svn: 304530
* [WebAssembly] MC: Fix references to undefined externals in data sectionSam Clegg2017-06-021-3/+0
| | | | | | | | | | | | Undefined externals don't need to have a size or an offset. This was broken by r303915. Added a test for this case. This fixes the "Compile LLVM Torture (o)" step on the wasm waterfall. Differential Revision: https://reviews.llvm.org/D33803 llvm-svn: 304505
* [CodeGen] Fix some Clang-tidy modernize-use-using and Include What You Use ↵Eugene Zelenko2017-06-011-2/+5
| | | | | | warnings; other minor fixes (NFC). llvm-svn: 304495
* [AMDGPU] Fix kernel arg segment size for amdgizclYaxun Liu2017-06-011-1/+2
| | | | | | Differential Revision: https://reviews.llvm.org/D33307 llvm-svn: 304482
* [Hexagon] Fix dependence check in the packetizerKrzysztof Parzyszek2017-06-013-189/+35
| | | | | | | An incorrect check in the packetizer lead to an attempt to convert an unconditional branch to a .new (conditional) form. llvm-svn: 304442
* [Hexagon] Handle long-running simplification loop in idiom recognitionKrzysztof Parzyszek2017-06-011-3/+5
| | | | | | | | | | | | | The initial assumption was that the simplification would converge to a fixed point relatvely quickly. Turns out that there are legitimate situa- tions where the complexity of the code causes it to take a large number of iterations. Two main changes: - Instead of aborting upon hitting the limit, simply return nullptr. - Reduce the limit to 10,000 from 100,000. llvm-svn: 304441
* Remove ADDC, ADDE, SUBC, SUBE and SETCCE support from the X86 backend, use ↵Amaury Sechet2017-06-013-64/+0
| | | | | | | | | | | | | | | | | the CARRY ops instead. Summary: As per title. This cleanup some technical debt. Depends on D33374 Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D33390 llvm-svn: 304435
* AMDGPU: Remove error on call in AsmPrinterMatt Arsenault2017-06-011-29/+26
| | | | | | | Partial revert of r301938 which is making it harder to split patches up. llvm-svn: 304418
* AMDGPU: Set high getCSRFirstUseCostMatt Arsenault2017-06-011-0/+6
| | | | llvm-svn: 304416
* [ARM] Create relocations for Thumb functions calling ARM fns in ELF.Florian Hahn2017-06-011-0/+9
| | | | | | | | | | | | | | | | Summary: Without using a fixup in this case, BL will be used instead of BLX to call internal ARM functions from Thumb functions. Reviewers: rafael, t.p.northover, peter.smith, kristof.beyls Reviewed By: peter.smith Subscribers: srhines, echristo, aemerson, rengolin, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D33436 llvm-svn: 304413
OpenPOWER on IntegriCloud