summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
...
* Revert "Merge branch 'arcpatch-D55094'"David Callahan2019-01-141-3/+3
| | | | | | | | | This reverts commit a9788dd6587d67c856df74eedff5a6ad34ce8320, reversing changes made to f1309ffebf718d16aec4fab83380556c660e2825. unintended merge pushed llvm-svn: 351095
* [x86] lower extracted add/sub to horizontal vector mathSanjay Patel2019-01-141-19/+55
| | | | | | | | | | | | | | add (extractelt (X, 0), extractelt (X, 1)) --> extractelt (hadd X, X), 0 This is the integer sibling to D56011. There's an additional restriction to only to do this transform in the case where we don't have extra extracts from the source vector. Without that, we can fail to match larger horizontal patterns that are more beneficial than this minimal case. An improvement to the more general h-op lowering may allow us to remove the restriction here in a follow-up. llvm-svn: 351093
* Merge branch 'arcpatch-D55094'David Callahan2019-01-141-3/+3
| | | | llvm-svn: 351092
* Revert "[VFS] Allow multiple RealFileSystem instances with independent CWDs."Amara Emerson2019-01-141-73/+30
| | | | | | This reverts commit r351079, r351069 and r351050 as it broken the greendragon bots on macOS. llvm-svn: 351091
* cmake: Don't install plugins used for examples or testsTom Stellard2019-01-141-1/+1
| | | | | | | | | | | | | | | | Summary: This patch drops install targets for LLVMHello.so, TestPlugin.so, and BugpointPasses.so. Reviewers: chandlerc, beanz, thakis, philip.pfaffe Reviewed By: chandlerc Subscribers: SquallATF, mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D55965 llvm-svn: 351087
* [WebAssembly] Remove old intrinsicsDan Gohman2019-01-141-29/+0
| | | | | | | | | This removes the old grow_memory and mem.grow-style intrinsics, leaving just the memory.grow-style intrinsics. Differential Revision: https://reviews.llvm.org/D56645 llvm-svn: 351084
* Reapply r345008 "Split MachinePipeliner code into header and cpp files"Adrian Prantl2019-01-141-595/+18
| | | | | | | | | | | | | | Split MachinePipeliner code into header and cpp files to allow inheritance from SwingSchedulerDAG. This reapplies https://reviews.llvm.org/D56084 after moving the implementation of the dump functions into the .cpp files. This fixes a linker error when building with Clang modules enables and local submodule visibility disabled. Original patch by Lama Saba <lama.saba@intel.com>! llvm-svn: 351077
* Remove NameLen argument from newly-introduced IR C APIs.James Y Knight2019-01-141-6/+5
| | | | | | | | | | | | | | | | | | | | | Normally, changing the function signatures of C APIs is disallowed, but as these two are brand new last week, and haven't been released yet, it is okay in this instance. As per discussion in D56556, we will not add NameLen arguments to IR building APIs, for the following reasons: 1. We do not want to deprecate all of the IR building APIs, just to add a NameLen argument to each one. 2. Consistency is important, so adding it just to new ones is unfortunate. 3. The IR names are completely optional, useful for readability of IR only. There is no value in ever supporting nul bytes. Differential Revision: https://reviews.llvm.org/D56669 llvm-svn: 351076
* Reland "Refactor GetRegistersForValue. NFCI."Nirav Dave2019-01-141-55/+44
| | | | | | Remove over-strictification class membership check. llvm-svn: 351074
* [DAGCombiner] Add (sub_sat x, x) -> 0 combineSimon Pilgrim2019-01-141-0/+4
| | | | llvm-svn: 351073
* [DAGCombiner] Enable sub saturation constant foldingSimon Pilgrim2019-01-142-1/+8
| | | | llvm-svn: 351072
* [DAGCombiner] Add add/sub saturation undef handlingSimon Pilgrim2019-01-142-0/+14
| | | | | | | | Match ConstantFolding.cpp: (add_sat x, undef) -> -1 (sub_sat x, undef) -> 0 llvm-svn: 351070
* [VFS] Fix unused variable warning. NFCSam McCall2019-01-141-1/+1
| | | | llvm-svn: 351069
* [DAGCombiner] Enable add saturation constant foldingSimon Pilgrim2019-01-142-2/+5
| | | | llvm-svn: 351060
* [mips] Optimize shifts for types larger than GPR size (mips2/mips3)Aleksandar Beserminji2019-01-144-0/+118
| | | | | | | | | | | | | With this patch, shifts are lowered to optimal number of instructions necessary to shift types larger than the general purpose register size. This resolves PR/32293. Thanks to Kyle Butt for reporting the issue! Differential Revision: https://reviews.llvm.org/D56320 llvm-svn: 351059
* [DebugInfo] Remove un-necessary logic from HoistThenElseCodeToIfJeremy Morse2019-01-141-12/+1
| | | | | | | | | | | | | | | | | Following PR39807, the way in which SimplifyCFG hoists common code on branch paths was fixed in r347782. However this left extra code hanging around HoistThenElseCodeToIf that wasn't necessary and needlessly complicated matters -- we no longer need to look up through the 'if' basic block to find a location for hoisted 'select' insts, we can instead use the location chosen by applyMergedLocation. This patch deletes that extra logic, and updates a regression test to reflect the new logic (selects get the merged location, not a previous insts location). Differential Revision: https://reviews.llvm.org/D55272 llvm-svn: 351058
* [DAGCombiner] Add add saturation constant folding tests.Simon Pilgrim2019-01-141-2/+3
| | | | | | Exposes an issue with sadd_sat for computeOverflowKind, so I've disabled it for now. llvm-svn: 351057
* [ARM GlobalISel] Import MOVi32imm into GlobalISelDiana Picus2019-01-141-1/+14
| | | | | | | | | | | | | | | | Make it possible for TableGen to produce code for selecting MOVi32imm. This allows reasonably recent ARM targets to select a lot more constants than before. We achieve this by adding GISelPredicateCode to arm_i32imm. It's impossible to use the exact same code for both DAGISel and GlobalISel, since one uses "Subtarget->" and the other "STI." to refer to the subtarget. Moreover, in GlobalISel we don't have ready access to the MachineFunction, so we need to add a bit of code for obtaining it from the instruction that we're selecting. This is also the reason why it needs to remain a PatLeaf instead of the more specific IntImmLeaf. llvm-svn: 351056
* [SelectionDAG] Add type sanity assertions for add/sub saturation node creation.Simon Pilgrim2019-01-141-0/+4
| | | | llvm-svn: 351055
* [AMDGPU] Add support for TFE/LWE in image intrinsics. 2nd tryDavid Stuttard2019-01-1413-57/+563
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | TFE and LWE support requires extra result registers that are written in the event of a failure in order to detect that failure case. The specific use-case that initiated these changes is sparse texture support. This means that if image intrinsics are used with either option turned on, the programmer must ensure that the return type can contain all of the expected results. This can result in redundant registers since the vector size must be a power-of-2. This change takes roughly 6 parts: 1. Modify the instruction defs in tablegen to add new instruction variants that can accomodate the extra return values. 2. Updates to lowerImage in SIISelLowering.cpp to accomodate setting TFE or LWE (where the bulk of the work for these instruction types is now done) 3. Extra verification code to catch cases where intrinsics have been used but insufficient return registers are used. 4. Modification to the adjustWritemask optimisation to account for TFE/LWE being enabled (requires extra registers to be maintained for error return value). 5. An extra pass to zero initialize the error value return - this is because if the error does not occur, the register is not written and thus must be zeroed before use. Also added a new (on by default) option to ensure ALL return values are zero-initialized that is required for sparse texture support. 6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO for this to re-enable and handle correctly). There's an additional fix now to avoid a dmask=0 For an image intrinsic with tfe where all result channels except tfe were unused, I was getting an image instruction with dmask=0 and only a single vgpr result for tfe. That is incorrect because the hardware assumes there is at least one vgpr result, plus the one for tfe. Fixed by forcing dmask to 1, which gives the desired two vgpr result with tfe in the second one. The TFE or LWE result is returned from the intrinsics using an aggregate type. Look in the test code provided to see how this works, but in essence IR code to invoke the intrinsic looks as follows: %v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15, i32 %s, <8 x i32> %rsrc, i32 1, i32 0) %v.vec = extractvalue {<4 x float>, i32} %v, 0 %v.err = extractvalue {<4 x float>, i32} %v, 1 This re-submit of the change also includes a slight modification in SIISelLowering.cpp to work-around a compiler bug for the powerpc_le platform that caused a buildbot failure on a previous submission. Differential revision: https://reviews.llvm.org/D48826 Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda Work around for ppcle compiler bug Change-Id: Ie284cf24b2271215be1b9dc95b485fd15000e32b llvm-svn: 351054
* [VFS] Allow multiple RealFileSystem instances with independent CWDs.Sam McCall2019-01-141-30/+73
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Previously only one RealFileSystem instance was available, and its working directory is shared with the process. This doesn't work well for multithreaded programs that want to work with relative paths - the vfs::FileSystem is assumed to provide the working directory, but a thread cannot control this exclusively. The new vfs::createPhysicalFileSystem() factory copies the process's working directory initially, and then allows it to be independently modified. This implementation records the working directory path, and glues it to relative paths to provide the correct absolute path to the sys::fs:: functions. This will give different results in unusual situations (e.g. the CWD is moved). The main alternative is the use of openat(), fstatat(), etc to ask the OS to resolve paths relative to a directory handle which can be kept open. This is more robust. There are two reasons not to do this initially: 1. these functions are not available on all supported Unixes, and are somewhere between difficult and unavailable on Windows. So we need a path-based fallback anyway. 2. this would mean also adding support at the llvm::sys::fs level, which is a larger project. My clearest idea is an OS-specific `BaseDirectory` object that can be optionally passed to functions there. Eventually this could be backed by either paths or a fd where openat() is supported. This is a large project, and demonstrating here that a path-based fallback works is a useful prerequisite. There is some subtlety to the path-manipulation mechanism: - when setting the working directory, both Specified=makeAbsolute(path) and Resolved=realpath(path) are recorded. These may differ in the presence of symlinks. - getCurrentWorkingDirectory() and makeAbsolute() use Specified - this is similar to the behavior of $PWD and sys::path::current_path - IO operations like openFileForRead use Resolved. This is similar to the behavior of an openat() based implementation, that doesn't see changes in symlinks. There may still be combinations of operations and FS states that yield unhelpful behavior. This is hard to avoid with symlinks and FS abstractions :( The caching behavior of the current working directory is removed in this patch. getRealFileSystem() is now specified to link to the process CWD, so the caching is incorrect. The user who needed this so far is clangd, which will immediately switch to createPhysicalFileSystem(). Reviewers: ilya-biryukov, bkramer, labath Subscribers: ioeric, kadircet, kristina, llvm-commits Differential Revision: https://reviews.llvm.org/D56545 llvm-svn: 351050
* Replace "no-frame-pointer-*" function attributes with "frame-pointer"Francis Visoiu Mistrih2019-01-144-16/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Part of the effort to refactoring frame pointer code generation. We used to use two function attributes "no-frame-pointer-elim" and "no-frame-pointer-elim-non-leaf" to represent three kinds of frame pointer usage: (all) frames use frame pointer, (non-leaf) frames use frame pointer, (none) frame use frame pointer. This CL makes the idea explicit by using only one enum function attribute "frame-pointer" Option "-frame-pointer=" replaces "-disable-fp-elim" for tools such as llc. "no-frame-pointer-elim" and "no-frame-pointer-elim-non-leaf" are still supported for easy migration to "frame-pointer". tests are mostly updated with // replace command line args ‘-disable-fp-elim=false’ with ‘-frame-pointer=none’ grep -iIrnl '\-disable-fp-elim=false' * | xargs sed -i '' -e "s/-disable-fp-elim=false/-frame-pointer=none/g" // replace command line args ‘-disable-fp-elim’ with ‘-frame-pointer=all’ grep -iIrnl '\-disable-fp-elim' * | xargs sed -i '' -e "s/-disable-fp-elim/-frame-pointer=all/g" Patch by Yuanfang Chen (tabloid.adroit)! Differential Revision: https://reviews.llvm.org/D56351 llvm-svn: 351049
* [MIPS GlobalISel] Add pre legalizer combiner passPetar Avramovic2019-01-144-0/+101
| | | | | | | | | | Introduce GlobalISel pre legalizer pass for MIPS. It will be used to cope with instructions that require combining before legalization. Differential Revision: https://reviews.llvm.org/D56269 llvm-svn: 351046
* [BasicBlockUtils] Generalize DeleteDeadBlock to deal with multiple dead blocksMax Kazantsev2019-01-141-36/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Utility function `DeleteDeadBlock` expects that all predecessors of a block being deleted are already deleted, with the exception of single-block loop. It makes it hard to use for deletion of a set of blocks that may contain cyclic dependencies. The is no correct order of invocations of this function that does not produce dangling pointers on already deleted blocks. This patch introduces a generalized version of this function `DeleteDeadBlocks` that allows us to remove multiple blocks at once, even if there are cycles among them. The only requirement is that no block being deleted should have a predecessor that is not being deleted. The logic of `DeleteDeadBlocks` is following: for each block create relevant DT updates; remove all instructions (replace with undef if needed); replace terminator with unreacheable; apply DT updates; for each block delete block; Therefore, `DeleteDeadBlock` becomes a particular case of the general algorithm called for a single block. Differential Revision: https://reviews.llvm.org/D56120 Reviewed By: skatkov llvm-svn: 351045
* Add support for prefix-only CLI optionsThomas Preud'homme2019-01-141-5/+14
| | | | | | | | | | | | | | | | | | | | | | Summary: Add support for options that always prefix their value, giving an error if the value is in the next argument or if the option is given a value assignment (ie. opt=val). This is the desired behavior for the -D option of FileCheck for instance. Copyright: - Linaro (changes in version 2 of revision D55940) - GraphCore (changes in later versions and introduced when creating D56549) Reviewers: jdenny Subscribers: llvm-commits, probinson, kristina, hiraditya, JonChesterfield Differential Revision: https://reviews.llvm.org/D56549 llvm-svn: 351038
* [X86] Remove mask parameter from avx512 pmultishiftqb intrinsics. Use select ↵Craig Topper2019-01-142-6/+13
| | | | | | | | in IR instead. Fixes PR40259 llvm-svn: 351035
* [X86] Update type profile for DBPSADBW to indicate the immediate is an i8 ↵Craig Topper2019-01-141-1/+1
| | | | | | | | not just any int. Removes some type checks from X86GenDAGISel.inc llvm-svn: 351033
* [X86] Remove unused intrinsic handlers. NFCCraig Topper2019-01-142-39/+2
| | | | llvm-svn: 351032
* [X86] Remove FPCLASS intrinsic handler. Use INTR_TYPE_2OP instead. NFCCraig Topper2019-01-142-14/+7
| | | | llvm-svn: 351031
* [X86] Remove mask parameter from vpshufbitqmb intrinsics. Change result to a ↵Craig Topper2019-01-143-42/+19
| | | | | | | | | | vXi1 vector. The input mask can be represented with an AND in IR. Fixes PR40258 llvm-svn: 351028
* [DAGCombiner] If add_sat(x,y) can't overflow -> add(x,y)Simon Pilgrim2019-01-131-0/+4
| | | | | NOTE: We need more powerful signed overflow detection in computeOverflowKind llvm-svn: 351026
* Fix unused variable warning. NFCI.Simon Pilgrim2019-01-131-1/+0
| | | | llvm-svn: 351025
* [DAGCombiner] Some very basic add/sub saturation combines.Simon Pilgrim2019-01-131-0/+64
| | | | | | Handle combines with zero and constant canonicalization for adds. llvm-svn: 351024
* [LegalizeDAG] Remove 'NeedInvert' code from expansion of BR_CC. Replace with ↵Craig Topper2019-01-131-4/+1
| | | | | | | | | | | | an assert. I accidentally triggered this code while doing some experiments and it doesn't look lke it could possibly work. It calls 'getNOT' on a node that should be a CondCode. I think to do this right we would need to swap the branch target and the fallthrough target. But that's not easy to do. Or we could create an explicit SetCC and feed that into a new BR_CC? llvm-svn: 351022
* [X86] Rename overly verbose method; NFCNikita Popov2019-01-133-8/+5
| | | | | | As suggested on D56636. llvm-svn: 351021
* [X86] Add more ISD nodes to handle masked versions of ↵Craig Topper2019-01-135-12/+152
| | | | | | | | | | VCVT(T)PD2DQZ128/VCVT(T)PD2UDQZ128 which only produce 2 result elements and zeroes the upper elements. We can't represent this properly with vselect like we normally do. We also have to update the instruction definition to use a VK2WM mask instead of VK4WM to represent this. Fixes another case from PR34877 llvm-svn: 351018
* [X86] Add X86ISD::VMFPROUND to handle the masked case of VCVTPD2PSZ128 which ↵Craig Topper2019-01-135-18/+89
| | | | | | | | | | only produces 2 result elements and zeroes the upper elements. We can't represent this properly with vselect like we normally do. We also have to update the instruction definition to use a VK2WM mask instead of VK4WM to represent this. Fixes another case from PR34877. llvm-svn: 351017
* Give helper classes/functions local linkage. NFC.Benjamin Kramer2019-01-128-4/+14
| | | | llvm-svn: 351016
* [X86] More aggressive shuffle mask widening in combineExtractWithShuffleSimon Pilgrim2019-01-121-0/+9
| | | | | | Use demanded extract index to set most of the shuffle mask to undef, making it easier to widen and peek through. llvm-svn: 351013
* [LoopVectorizer] give more advice in remark about failure to vectorize callSanjay Patel2019-01-121-3/+23
| | | | | | | | | | Something like this is requested by: https://bugs.llvm.org/show_bug.cgi?id=40265 ...and it seems like a common enough case that we should acknowledge it. Differential Revision: https://reviews.llvm.org/D56551 llvm-svn: 351010
* [DAGCombiner] fold insert_subvector of insert_subvectorSanjay Patel2019-01-121-0/+8
| | | | | | | | | | | | | | | | | | | This pattern: t33: v8i32 = insert_subvector undef:v8i32, t35, Constant:i64<0> t21: v16i32 = insert_subvector undef:v16i32, t33, Constant:i64<0> ...shows up in PR33758: https://bugs.llvm.org/show_bug.cgi?id=33758 ...although this patch doesn't make any difference to the final result on that yet. In the affected tests here, it looks like it just makes RA wiggle. But we might as well squash this to prevent it interfering with other pattern-matching. Differential Revision: https://reviews.llvm.org/D56604 llvm-svn: 351008
* Use getShiftAmountTy for shift amounts.Simon Pilgrim2019-01-121-1/+2
| | | | llvm-svn: 351005
* [ORC][MIPS] Fill delay-slot after `jr` instructionSimon Atanasyan2019-01-121-5/+5
| | | | | | | | | | MIPS `jr` instruction uses a delay-slot. To escape execution of arbitrary instruction we should either fill the delay-slot by `nop` instruction or swap `jr` instruction and logically preceding instruction. This fix implements the second method to generate a bit more effective code. llvm-svn: 351001
* [ORC][MIPS] Setup t9 register and call function through this registerSimon Atanasyan2019-01-121-11/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | MIPS ABI states that every function must be called through jalr $t9. In other words, a function expect that t9 register points to the beginning of its code. A function uses this register to calculate offset to the Global Offset Table and save it to the `gp` register. ``` lui $gp, %hi(_gp_disp) addiu $gp, %lo(_gp_disp) addu $gp, $gp, $t9 ``` If `t9` and as a result `$gp` point to the wrong place the following code loads incorrect value from GOT and passes control to invalid code. ``` lw $v0,%call16(foo)($gp) jalr $t9 ``` OrcMips32 and OrcMips64 writeResolverCode methods pass control to the resolved address, but do not setup `$t9` before the call. The `t9` holds value of the beginning of `resolver` code so any attempts to call routines via GOT failed. This change fixes the problem. The `OrcLazy/hidden-visibility.ll` test starts to pass correctly. Before the change it fails on MIPS because the `exitOnLazyCallThroughFailure` called from the resolver code could not call libc routine `exit` via GOT. Differential Revision: http://reviews.llvm.org/D56058 llvm-svn: 351000
* [X86] Improve vXi64 ISD::ABS codegen with SSE41+Simon Pilgrim2019-01-121-0/+9
| | | | | | | | Make use of vblendvpd to select on the signbit Differential Revision: https://reviews.llvm.org/D56544 llvm-svn: 350999
* [X86][AARCH64] Improve ISD::ABS supportSimon Pilgrim2019-01-124-6/+56
| | | | | | | | This patch takes some of the code from D49837 to allow us to enable ISD::ABS support for all SSE vector types. Differential Revision: https://reviews.llvm.org/D56544 llvm-svn: 350998
* Reapply "[DemandedBits] Use SetVector for Worklist"Nikita Popov2019-01-121-7/+6
| | | | | | | | | | | | | | | | DemandedBits currently uses a simple vector for the worklist, which means that instructions may be inserted multiple times into it. Especially in combination with the deep lattice, this may cause instructions too be recomputed very often. To avoid this, switch to a SetVector. Reapplying with a smaller number of inline elements in the SmallSetVector, to avoid running into the SmallDenseMap issue described in D56455. Differential Revision: https://reviews.llvm.org/D56362 llvm-svn: 350997
* [X86] Remove X86ISD::SELECT as its no longer used by any of our intrinsic ↵Craig Topper2019-01-123-3/+1
| | | | | | lowering. llvm-svn: 350995
* [X86] Add ISD node for masked version of CVTPS2PH.Craig Topper2019-01-125-22/+59
| | | | | | | | | | The 128-bit input produces 64-bits of output and fills the upper 64-bits with 0. The mask only applies to the lower elements. But we can't represent this with a vselect like we normally do. This also avoids the need to have a special X86ISD::SELECT when avx512bw isn't enabled since vselect v8i16 isn't legal there. Fixes another instruction for PR34877. llvm-svn: 350994
* [RISCV] Introduce codegen patterns for RV64M-only instructionsAlex Bradbury2019-01-122-5/+52
| | | | | | | | | | | | | | | | | | As discussed on llvm-dev <http://lists.llvm.org/pipermail/llvm-dev/2018-December/128497.html>, we have to be careful when trying to select the *w RV64M instructions. i32 is not a legal type for RV64 in the RISC-V backend, so operations have been promoted by the time they reach instruction selection. Information about whether the operation was originally a 32-bit operations has been lost, and it's easy to write incorrect patterns. Similarly to the variable 32-bit shifts, a DAG combine on ANY_EXTEND will produce a SIGN_EXTEND if this is likely to result in sdiv/udiv/urem being selected (and so save instructions to sext/zext the input operands). Differential Revision: https://reviews.llvm.org/D53230 llvm-svn: 350993
OpenPOWER on IntegriCloud