summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
* [X86] Correctly broadcast NaN-like integers as float on AVX.Ahmed Bougacha2017-06-021-11/+13
| | | | | | | | | | | | | | | | | | | | | | | | Since r288804, we try to lower build_vectors on AVX using broadcasts of float/double. However, when we broadcast integer values that happen to have a NaN float bitpattern, we lose the NaN payload, thereby changing the integer value being broadcast. This is caused by ConstantFP::get, to which we pass the splat i32 as a float (by bitcasting it using bitsToFloat). ConstantFP::get takes a double parameter, so we end up lossily converting a single-precision NaN to double-precision. Instead, avoid any kinds of conversions by directly building an APFloat from the splatted APInt. Note that this also fixes another piece of code (broadcast of subvectors), that currently isn't susceptible to the same problem. Also note that we could really just use APInt and ConstantInt throughout: the constant pool type doesn't matter much. Still, for consistency, use the appropriate type. llvm-svn: 304590
* [x86] fix formatting; NFCISanjay Patel2017-06-021-8/+9
| | | | llvm-svn: 304576
* AMDGPU: Register AMDGPUAlwaysInlineMatt Arsenault2017-06-023-3/+10
| | | | llvm-svn: 304574
* AMDGPU: Make auto waitcnt before barrier a featureKonstantin Zhuravlyov2017-06-025-8/+16
| | | | | | Differential Revision: https://reviews.llvm.org/D33793 llvm-svn: 304571
* Tidy up a bit of r304516, use SmallVector::assign rather than for loopDavid Blaikie2017-06-021-2/+2
| | | | | | | | | | | | | | | | This might give a few better opportunities to optimize these to memcpy rather than loops - also a few minor cleanups (StringRef-izing, templating (to avoid std::function indirection), etc). The SmallVector::assign(iter, iter) could be improved with the use of SFINAE, but the (iter, iter) ctor and append(iter, iter) need it to and don't have it - so, workaround it for now rather than bothering with the added complexity. (also, as noted in the added FIXME, these assign ops could potentially be optimized better at least for non-trivially-copyable types) llvm-svn: 304566
* AMDGPUAnnotateUniformValue should always treat volatile loads as divergentAlexander Timofeev2017-06-022-1/+2
| | | | llvm-svn: 304554
* [AArch64][Falkor] Model immediate forwarding.Geoff Berry2017-06-021-13/+28
| | | | llvm-svn: 304552
* [AMDGPU] Turn on the new waitcnt insertion pass. Adjust tests.Mark Searles2017-06-021-1/+1
| | | | | | | | | -enable-si-insert-waitcnts=1 becomes the default -enable-si-insert-waitcnts=0 to use old pass Differential Revision: https://reviews.llvm.org/D33730 llvm-svn: 304551
* [mips][microMIPS] Extending size reduction pass with LBU16, LHU16, SB16 and SH16Zoran Jovanovic2017-06-021-0/+57
| | | | | | | | | | | | | | Author: milena.vujosevic.janicic Reviewers: sdardis The patch extends size reduction pass for MicroMIPS. The following instructions are examined and transformed, if possible: LBU instruction is transformed into 16-bit instruction LBU16 LHU instruction is transformed into 16-bit instruction LHU16 SB instruction is transformed into 16-bit instruction SB16 SH instruction is transformed into 16-bit instruction SH16 Differential Revision: https://reviews.llvm.org/D33091 llvm-svn: 304550
* [Hexagon] Return 0 from getDotNewPredOp when .new opcode does not existKrzysztof Parzyszek2017-06-021-3/+1
| | | | | | | This allows using this function to test if an instruction can be converted to a .new form. llvm-svn: 304549
* [ARM] GlobalISel: Support struct params/returnsDiana Picus2017-06-021-3/+11
| | | | | | | | | | | | Very very similar to the support for arrays. As with arrays, we don't support returning large structs that wouldn't fit in R0-R3. Most front-ends would likely use sret arguments for that anyway. The only significant difference is that when splitting a struct, we need to make sure we set the correct original alignment on each member, otherwise it may get split incorrectly between stack and registers. llvm-svn: 304536
* [ARM] Cortex-A57 scheduling model for ARM backend (AArch32)Javed Absar2017-06-027-11/+1914
| | | | | | | | | | | | | | | This patch implements the Cortex-A57 scheduling model. The main code is in ARMScheduleA57.td, ARMScheduleA57WriteRes.td. Small changes in cpp,.h files to support required scheduling predicates. Scheduling model implemented according to: http://infocenter.arm.com/help/topic/com.arm.doc.uan0015b/Cortex_A57_Software_Optimization_Guide_external.pdf. Patch by : Andrew Zhogin (submitted on his behalf, as requested). Rewiewed by: Renato Golin, Diana Picus, Javed Absar, Kristof Beyls. Differential Revision: https://reviews.llvm.org/D28152 llvm-svn: 304530
* [WebAssembly] MC: Fix references to undefined externals in data sectionSam Clegg2017-06-021-3/+0
| | | | | | | | | | | | Undefined externals don't need to have a size or an offset. This was broken by r303915. Added a test for this case. This fixes the "Compile LLVM Torture (o)" step on the wasm waterfall. Differential Revision: https://reviews.llvm.org/D33803 llvm-svn: 304505
* [CodeGen] Fix some Clang-tidy modernize-use-using and Include What You Use ↵Eugene Zelenko2017-06-011-2/+5
| | | | | | warnings; other minor fixes (NFC). llvm-svn: 304495
* [AMDGPU] Fix kernel arg segment size for amdgizclYaxun Liu2017-06-011-1/+2
| | | | | | Differential Revision: https://reviews.llvm.org/D33307 llvm-svn: 304482
* [Hexagon] Fix dependence check in the packetizerKrzysztof Parzyszek2017-06-013-189/+35
| | | | | | | An incorrect check in the packetizer lead to an attempt to convert an unconditional branch to a .new (conditional) form. llvm-svn: 304442
* [Hexagon] Handle long-running simplification loop in idiom recognitionKrzysztof Parzyszek2017-06-011-3/+5
| | | | | | | | | | | | | The initial assumption was that the simplification would converge to a fixed point relatvely quickly. Turns out that there are legitimate situa- tions where the complexity of the code causes it to take a large number of iterations. Two main changes: - Instead of aborting upon hitting the limit, simply return nullptr. - Reduce the limit to 10,000 from 100,000. llvm-svn: 304441
* Remove ADDC, ADDE, SUBC, SUBE and SETCCE support from the X86 backend, use ↵Amaury Sechet2017-06-013-64/+0
| | | | | | | | | | | | | | | | | the CARRY ops instead. Summary: As per title. This cleanup some technical debt. Depends on D33374 Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D33390 llvm-svn: 304435
* AMDGPU: Remove error on call in AsmPrinterMatt Arsenault2017-06-011-29/+26
| | | | | | | Partial revert of r301938 which is making it harder to split patches up. llvm-svn: 304418
* AMDGPU: Set high getCSRFirstUseCostMatt Arsenault2017-06-011-0/+6
| | | | llvm-svn: 304416
* [ARM] Create relocations for Thumb functions calling ARM fns in ELF.Florian Hahn2017-06-011-0/+9
| | | | | | | | | | | | | | | | Summary: Without using a fixup in this case, BL will be used instead of BLX to call internal ARM functions from Thumb functions. Reviewers: rafael, t.p.northover, peter.smith, kristof.beyls Reviewed By: peter.smith Subscribers: srhines, echristo, aemerson, rengolin, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D33436 llvm-svn: 304413
* [X86] Match bitcast of vxi1 to pmovmskZvi Rackover2017-06-011-1/+107
| | | | | | | | | | | | | | | | | | | Summary: Add an early combine to match patterns such as: (i16 bitcast (v16i1 x)) -> (i16 movmsk (v16i8 sext (v16i1 x))) This combine needs to happen early enough before type-legalization scalarizes the result of the setcc. Reviewers: igorb, craig.topper, RKSimon Subscribers: delena, llvm-commits Differential Revision: https://reviews.llvm.org/D33311 llvm-svn: 304406
* Do not legalize large setcc with setcce, introduce setcccarry and do it with ↵Amaury Sechet2017-06-012-0/+27
| | | | | | | | | | | | | | | | | usubo/setcccarry. Summary: This is a continuation of the work started in D29872 . Passing the carry down as a value rather than as a glue allows for further optimizations. Introducing setcccarry makes the use of addc/subc unecessary and we can start the removal process. This patch only introduce the optimization strictly required to get the same level of optimization as was available before nothing more. Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D33374 llvm-svn: 304404
* Remove ISD::SETCC match from combineX86ADD. It's done improperly and doesn't ↵Amaury Sechet2017-06-011-2/+1
| | | | | | work. llvm-svn: 304403
* Add LiveRangeShrink pass to shrink live range within BB.Dehao Chen2017-05-311-0/+1
| | | | | | | | | | | | | | Summary: LiveRangeShrink pass moves instruction right after the definition with the same BB if the instruction and its operands all have more than one use. This pass is inexpensive and guarantees optimal live-range within BB. Reviewers: davidxl, wmi, hfinkel, MatzeB, andreadb Reviewed By: MatzeB, andreadb Subscribers: hiraditya, jyknight, sanjoy, skatkov, gberry, jholewinski, qcolombet, javed.absar, krytarowski, atrick, spatel, RKSimon, andreadb, MatzeB, mehdi_amini, mgorny, efriedma, davide, dberlin, llvm-commits Differential Revision: https://reviews.llvm.org/D32563 llvm-svn: 304371
* Added LLVM_FALLTHROUGH to address warning: this statement may fall through. NFC.Galina Kistanova2017-05-311-0/+1
| | | | llvm-svn: 304355
* X86FloatingPoint: Fix livein listsMatthias Braun2017-05-311-15/+21
| | | | | | | | | | | | After transforming FP to ST registers: - Do not add the ST register to the livein lists, they are reserved so we do not need to track their liveness. - Remove the FP registers from the livein lists, they don't have defs or uses anymore and so are not live. - (The setKillFlags() call is moved to an earlier place as it relies on the FP registers still being present in the livein list.) llvm-svn: 304342
* X86FloatingPoint: Add some static assert, cleanup; NFCMatthias Braun2017-05-311-2/+6
| | | | llvm-svn: 304341
* Added LLVM_FALLTHROUGH to address warning: this statement may fall through. NFC.Galina Kistanova2017-05-311-0/+4
| | | | llvm-svn: 304332
* [BPF] Correct the file name of the -gen-asm-matcher output file to not start ↵Craig Topper2017-05-311-1/+1
| | | | | | with X86. llvm-svn: 304324
* TargetMachine: Indicate whether machine verifier passes.Matthias Braun2017-05-3110-1/+37
| | | | | | | | | | | | | This adds a callback to the LLVMTargetMachine that lets target indicate that they do not pass the machine verifier checks in all cases yet. This is intended to be a temporary measure while the targets are fixed allowing us to enable the machine verifier by default with EXPENSIVE_CHECKS enabled! Differential Revision: https://reviews.llvm.org/D33696 llvm-svn: 304320
* [PowerPC] Correctly specify the cache line size for Power 7, 8 and 9.Sean Fertile2017-05-311-3/+12
| | | | | | | | | | Fixes PPCTTIImpl::getCacheLineSize() returning the wrong cache line size for newer ppc processors. Commiting on behalf of Stefan Pintilie. Differential Revision: https://reviews.llvm.org/D33656 llvm-svn: 304317
* [PPC] Inline expansion of memcmpZaara Syeda2017-05-313-0/+10
| | | | | | | | | | | | | | | This patch does an inline expansion of memcmp. It changes the memcmp library call into an inline expansion when the size is known at compile time and is under a target specified threshold. This expansion is implemented in CodeGenPrepare and expands into straight line code. The target specifies a maximum load size and the expansion works by using this size to load the two sources, compare, and exit early if a difference is found. It also has a special case when the memcmp result is used in a compare to zero equality. Differential Revision: https://reviews.llvm.org/D28637 llvm-svn: 304313
* Added LLVM_FALLTHROUGH to address warning: this statement may fall through. NFC.Galina Kistanova2017-05-311-5/+13
| | | | llvm-svn: 304312
* [AMDGPU] Fix bugs in new waitcnt pass. Add test.Mark Searles2017-05-311-4/+22
| | | | | | | | | | | - new waitcnt pass remains off by default; -enable-si-insert-waitcnts=1 to enable it - fix handling of PERMUTE ops - fix insertion of waitcnt instrs at function begin/end ( port of analogous code that was added to old waitcnt pass ) - add new test Differential Revision: https://reviews.llvm.org/D33114 llvm-svn: 304311
* [AMDGPU][MC] New syntax for ds_swizzle_b32 offsetDmitry Preobrazhensky2017-05-318-5/+505
| | | | | | | | | | See Bug 28601: https://bugs.llvm.org//show_bug.cgi?id=28601 Reviewers: artem.tamazov, vpykhtin Differential Revision: https://reviews.llvm.org/D33542 llvm-svn: 304309
* [AArch64] Enable FeatureFuseAES on Cortex-A53.Florian Hahn2017-05-311-0/+1
| | | | | | It improves performance on Cortex-A53. llvm-svn: 304307
* [AArch64] Enable FeatureFuseAES on Cortex-A73.Florian Hahn2017-05-311-0/+1
| | | | | | It improves performance on Cortex-A73. llvm-svn: 304304
* [PowerPC] Fix a performance bug for PPC::XXPERMDI.Tony Jiang2017-05-314-13/+110
| | | | | | | | | | There are some VectorShuffle Nodes in SDAG which can be selected to XXPERMDI Instruction, this patch recognizes them and does the selection to improve the PPC performance. Differential Revision: https://reviews.llvm.org/D33404 llvm-svn: 304298
* [PowerPC] Eliminate integer compare instructions - vol. 3Nemanja Ivanovic2017-05-311-9/+64
| | | | | | | | | This patch builds upon https://reviews.llvm.org/rL302810 to add handling for the 64-bit SETEQ patterns. Differential Revision: https://reviews.llvm.org/D33369 llvm-svn: 304286
* [AVR] Fix a big in shift operator lowering; Authored by Dr. Gergo ErdiDylan McKay2017-05-311-2/+2
| | | | | | | When generating code for a shift loop, check the shift amount against the literal value 0, not R0 llvm-svn: 304284
* [AVR] CPIRdK can only work with r16..r31; Authored by Dr. Gergo ErdiDylan McKay2017-05-311-1/+1
| | | | | | (https://github.com/avr-rust/rust/issues/50) llvm-svn: 304283
* [PowerPC] Eliminate integer compare instructions - vol. 2Nemanja Ivanovic2017-05-314-12/+230
| | | | | | | | | | | | This patch builds upon https://reviews.llvm.org/rL302810 to add handling for bitwise logical operations in general purpose registers. The idea is to keep the values in GPRs as long as possible - only extracting them to a condition register bit when no further operations are to be done. Differential Revision: https://reviews.llvm.org/D31851 llvm-svn: 304282
* X86FrameLowering: No need to mark FP as live-in everywhereMatthias Braun2017-05-311-7/+2
| | | | | | | | The frame pointer (when used as frame pointer) is a reserved register. We do not track liveness of reserved registers and hence do not need to add them to the basic block livein lists. llvm-svn: 304274
* ARM: Fix cmpxchg O0 expansionMatthias Braun2017-05-311-59/+73
| | | | | | | | | | | | | | | | This is the equivalent of r304048 for ARM: - Rewrite livein calculation to use the computeLiveIns() helper function. This is slightly less efficient but easier to reason about and doesn't unnecessarily add pristine and reserved registers[1] - Zero the status register at the beginning of the loop to make sure it has a defined value. - Remove kill flags of values that need to stay alive throughout the loop. [1] An upcoming commit of mine will tighten the MachineVerifier to catch these. llvm-svn: 304267
* ARM: Do not add reserved registers to block livein lists; NFCMatthias Braun2017-05-312-6/+8
| | | | llvm-svn: 304266
* [CodeGen] Fix some Clang-tidy modernize-use-using and Include What You Use ↵Eugene Zelenko2017-05-311-8/+16
| | | | | | warnings; other minor fixes (NFC). llvm-svn: 304265
* Add latency info for Exynos interleaved Load/Store instructions.Abderrazek Zaafrani2017-05-311-5/+335
| | | | llvm-svn: 304259
* TargetPassConfig: Keep a reference to an LLVMTargetMachine; NFCMatthias Braun2017-05-3017-37/+37
| | | | | | | | | | | TargetPassConfig is not useful for targets that do not use the CodeGen library, so we may just as well store a pointer to an LLVMTargetMachine instead of just to a TargetMachine. While at it, also change the constructor to take a reference instead of a pointer as the TM must not be nullptr. llvm-svn: 304247
* Revert "This patch closes PR28513: an optimization of multiplication by ↵Vedant Kumar2017-05-301-79/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | different constants. It's implemented on DAG combiner level." This reverts commit r304209. I think this change is responsible for a tablgen failure in stage2 builds: http://green.lab.llvm.org/green/job/clang-stage2-configure-Rthinlto_build/2171/ I reproduced the failure locally (without ThinLTO), reverted the commit, rebuilt the stage1 clang, rebuilt the stage2 llvm-tblgen tool, and found that the crash disappears when the commit is reverted. Here is the stack trace: FAILED: lib/Target/ARM/ARMGenRegisterBank.inc.tmp cd /Volumes/Builds/pz-master-stage2-RA/lib/Target/ARM && /Volumes/Builds/pz-master-stage2-RA/bin/llvm-tblgen -gen-register-bank -I /Users/vk/llvm/lib/Target/ARM -I /Users/vk/llvm/include -I /Users/vk/llvm/lib/Target /Users/vk/llvm/lib/Target/ARM/ARM.td -o /Volumes /Builds/pz-master-stage2-RA/lib/Target/ARM/ARMGenRegisterBank.inc.tmp 0 llvm-tblgen 0x0000000106fc9568 llvm::sys::PrintStackTrace(llvm::raw_ostream&) + 40 1 llvm-tblgen 0x0000000106fc9be6 SignalHandler(int) + 422 2 libsystem_platform.dylib 0x00000001076a7fba _sigtramp + 26 3 libsystem_platform.dylib 0x00007fff58deb468 _sigtramp + 1366570184 4 llvm-tblgen 0x0000000106e89cc7 llvm::CodeGenRegBank::getCompositeSubRegIndex(llvm::CodeGenSubRegIndex*, llvm::CodeGenSubRegIndex*) + 615 5 llvm-tblgen 0x0000000106e88be6 llvm::CodeGenRegister::computeSubRegs(llvm::CodeGenRegBank&) + 2182 6 llvm-tblgen 0x0000000106e8e9f0 llvm::CodeGenRegBank::CodeGenRegBank(llvm::RecordKeeper&) + 2192 7 llvm-tblgen 0x0000000106f384a1 llvm::EmitRegisterBank(llvm::RecordKeeper&, llvm::raw_ostream&) + 65 8 llvm-tblgen 0x0000000106f72c64 (anonymous namespace)::LLVMTableGenMain(llvm::raw_ostream&, llvm::RecordKeeper&) + 1172 9 llvm-tblgen 0x0000000106fcb15f llvm::TableGenMain(char*, bool (*)(llvm::raw_ostream&, llvm::RecordKeeper&)) + 3599 10 llvm-tblgen 0x0000000106f727a6 main + 134 11 libdyld.dylib 0x000000010733c6a5 start + 1 Stack dump: 0. Program arguments: /Volumes/Builds/pz-master-stage2-RA/bin/llvm-tblgen -gen-register-bank -I /Users/vk/llvm/lib/Target/ARM -I /Users/vk/llvm/include -I /Users/vk/llvm/lib/Target /Users/vk/llvm/lib/Target/ARM/ARM.td -o /Volumes/Builds/pz-master-stage2-RA/lib/Target/ARM/ARMGenRegisterBank.inc.tmp /bin/sh: line 1: 41986 Segmentation fault: 11 /Volumes/Builds/pz-master-stage2-RA/bin/llvm-tblgen -gen-register-bank -I /Users/vk/llvm/lib/Target/ARM -I /Users/vk/llvm/include -I /Users/vk/llvm/lib/Target /Users/vk/llvm/lib/Target/ARM/ARM.td -o /Volumes/Builds/pz -master-stage2-RA/lib/Target/ARM/ARMGenRegisterBank.inc.tmp llvm-svn: 304231
OpenPOWER on IntegriCloud