path: root/llvm/lib
Commit message (Author, Age, Files, Lines changed)
...
* AMDGPU: Split extload/zextload local load patterns (Matt Arsenault, 2019-07-08, 4 files, -32/+57)
  This will help remove the custom load predicates, allowing the GlobalISel emitter to handle them.
  llvm-svn: 365398
* Add parentheses to silence warning. (Bill Wendling, 2019-07-08, 1 file, -6/+6)
  llvm-svn: 365394
* Standardize on MSVC behavior for triples with no environment (Reid Kleckner, 2019-07-08, 7 files, -22/+14)
  Summary: This makes it so that IR files using triples without an environment work out of the box, without normalizing them. Typically, the MSVC behavior is more desirable. For example, it tends to enable things such as constant merging and use of associative comdats.
  Addresses PR42491
  Reviewers: compnerd
  Subscribers: hiraditya, dexonsmith, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D64109
  llvm-svn: 365387
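  As an illustration (a sketch, not taken from the patch; the query names are from llvm/ADT/Triple.h and the triple string is hypothetical), a Windows triple with no environment component now answers the MSVC queries:

    #include "llvm/ADT/Triple.h"
    #include <cassert>

    int main() {
      // No environment component after the OS.
      llvm::Triple T("x86_64-pc-windows");
      assert(T.isOSWindows());
      // isWindowsMSVCEnvironment() also covers the unspecified-environment
      // case, which is the behavior this change standardizes on for IR files.
      assert(T.isWindowsMSVCEnvironment());
    }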
* [InstCombine] fold insertelement into splat of same scalar (Sanjay Patel, 2019-07-08, 1 file, -0/+37)
  Forming the canonical splat shuffle improves analysis and may allow follow-on transforms (although some possibilities are missing, as shown in the test diffs). The backend generically turns these patterns into build_vector, so there should be no codegen regressions. All targets are expected to be able to lower splats efficiently.
  llvm-svn: 365379
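  The canonical splat shuffle mentioned above is the same form IRBuilder emits; a minimal sketch (helper name hypothetical):

    #include "llvm/IR/IRBuilder.h"
    using namespace llvm;

    // Build the canonical splat: insert the scalar into element 0 of an
    // undef vector, then shufflevector with an all-zeros mask.
    Value *buildCanonicalSplat(IRBuilder<> &B, Value *Scalar, unsigned NumElts) {
      return B.CreateVectorSplat(NumElts, Scalar);
    }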
* AMDGPU: Fix unused variable in release build (Matt Arsenault, 2019-07-08, 1 file, -3/+3)
  llvm-svn: 365378
* Teach the symbolizer lib to symbolize objects directly. (Yuanfang Chen, 2019-07-08, 3 files, -23/+51)
  Currently, the symbolizer lib can only symbolize a file on disk. This patch teaches it to symbolize in-memory objects. llvm-objdump needs this to support archive disassembly with source info.
  https://bugs.llvm.org/show_bug.cgi?id=41871
  Reviewed by: jhenderson, grimar, MaskRay
  Differential Revision: https://reviews.llvm.org/D63521
  llvm-svn: 365376
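  A sketch of the new in-memory entry point (error handling trimmed; the ObjectFile overload is the one this patch adds):

    #include "llvm/DebugInfo/Symbolize/Symbolize.h"
    #include "llvm/Object/ObjectFile.h"
    #include "llvm/Support/raw_ostream.h"
    using namespace llvm;

    void printLineInfo(const object::ObjectFile &Obj, uint64_t Addr) {
      symbolize::LLVMSymbolizer Symbolizer;
      // Symbolize the object directly, not a path on disk.
      Expected<DILineInfo> Info = Symbolizer.symbolizeCode(
          Obj, {Addr, object::SectionedAddress::UndefSection});
      if (!Info) {
        consumeError(Info.takeError());
        return;
      }
      outs() << Info->FileName << ":" << Info->Line << "\n";
    }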
* AMDGPU: Fix stray typing (Matt Arsenault, 2019-07-08, 1 file, -1/+1)
  llvm-svn: 365373
* AMDGPU: Make s34 the FP register (Matt Arsenault, 2019-07-08, 6 files, -142/+436)
  Make the FP register callee saved. This is tricky because now the FP needs to be spilled in the prolog relative to the incoming SP register, rather than the frame register used throughout the rest of the function.
  I don't like how this bypasses the standard mechanism for CSR spills just to get the correct insert point. I may look for a better solution, since all CSR VGPRs may also need to have all lanes activated. Another option might be to make getFrameIndexReference change the base register if the frame index is a CSR, and then try to figure out the right insertion point in emitProlog.
  If there is a free VGPR lane available for SGPR spilling, try to use it for the FP. If that would require introducing a new VGPR spill, try to use a free call-clobbered SGPR. Only fall back to introducing a new VGPR spill as a last resort.
  This also doesn't attempt to handle SGPR spilling with scalar stores.
  llvm-svn: 365372
* RegUsageInfoCollector: Don't iterate all regs for every reg class (Matt Arsenault, 2019-07-08, 1 file, -31/+6)
  This is extremely slow on AMDGPU, which has a lot of physical registers and a lot of register classes. determineCalleeSaves, via MachineRegisterInfo::isPhysRegUsed, already added all of the super registers to the saved set.
  llvm-svn: 365370
* AMDGPU: Move DEBUG_TYPE definition below includes (Matt Arsenault, 2019-07-08, 1 file, -2/+2)
  llvm-svn: 365369
* Keep the order of the basic blocks in the cloned loop as the original loop (Whitney Tsang, 2019-07-08, 1 file, -24/+25)
  Summary: Do the cloning in two steps: first allocate all the new loops, then clone the basic blocks in the same order as the original loop.
  Reviewer: Meinersbur, fhahn, kbarton, hfinkel
  Reviewed By: hfinkel
  Subscribers: hfinkel, hiraditya, llvm-commits
  Tag: https://reviews.llvm.org/D64224
  Differential Revision:
  llvm-svn: 365366
* [SCEV] Fix for PR42397. SCEVExpander wrongly adds nsw to shl instruction. (Denis Bakhvalov, 2019-07-08, 1 file, -2/+6)
  Change-Id: I76c9f628c092ae3e6e78ebdaf55cec726e25d692
  llvm-svn: 365363
* Revert "[BPF] add new intrinsics preserve_{array,union,struct}_access_index" (Yonghong Song, 2019-07-08, 1 file, -1/+0)
  This reverts commit r365352. Test ThinLTO/X86/lazyload_metadata.ll failed. Revert the commit and fix the issue at the same time.
  llvm-svn: 365360
* [BPF] add new intrinsics preserve_{array,union,struct}_access_index (Yonghong Song, 2019-07-08, 1 file, -0/+1)
  For background on the BPF CO-RE project, please refer to http://vger.kernel.org/bpfconf2019.html. In summary, BPF CO-RE intends to compile bpf programs adjustable on struct/union layout change so the same program can run on multiple kernels, with adjustment before loading based on native kernel structures.
  In order to do this, we need to keep track of GEP (getelementptr) instruction base and result debuginfo types, so we can adjust on the host based on kernel BTF info. Capturing such information as an IR optimization is hard, as various optimizations may have tweaked the GEP, and once a union is replaced by a structure it is impossible to track the field index for union member accesses.
  Three intrinsic functions, preserve_{array,union,struct}_access_index, are introduced:
    addr = preserve_array_access_index(base, index, dimension)
    addr = preserve_union_access_index(base, di_index)
    addr = preserve_struct_access_index(base, gep_index, di_index)
  Here:
    base: the base pointer for the array/union/struct access.
    index: the last access index for array, the same for IR/DebugInfo layout.
    dimension: the array dimension.
    gep_index: the access index based on IR layout.
    di_index: the access index based on user/debuginfo types.
  For example, consider the following test case:
    $ cat test.c
    struct sk_buff {
      int i;
      int b1:1;
      int b2:2;
      union {
        struct {
          int o1;
          int o2;
        } o;
        struct {
          char flags;
          char dev_id;
        } dev;
        int netid;
      } u[10];
    };

    static int (*bpf_probe_read)(void *dst, int size, const void *unsafe_ptr) = (void *) 4;

    #define _(x) (__builtin_preserve_access_index(x))

    int bpf_prog(struct sk_buff *ctx) {
      char dev_id;
      bpf_probe_read(&dev_id, sizeof(char), _(&ctx->u[5].dev.dev_id));
      return dev_id;
    }

    $ clang -target bpf -O2 -g -emit-llvm -S -mllvm -print-before-all test.c >& log
  The generated IR looks like below:
    ...
    define dso_local i32 @bpf_prog(%struct.sk_buff*) #0 !dbg !15 {
      %2 = alloca %struct.sk_buff*, align 8
      %3 = alloca i8, align 1
      store %struct.sk_buff* %0, %struct.sk_buff** %2, align 8, !tbaa !45
      call void @llvm.dbg.declare(metadata %struct.sk_buff** %2, metadata !43, metadata !DIExpression()), !dbg !49
      call void @llvm.lifetime.start.p0i8(i64 1, i8* %3) #4, !dbg !50
      call void @llvm.dbg.declare(metadata i8* %3, metadata !44, metadata !DIExpression()), !dbg !51
      %4 = load i32 (i8*, i32, i8*)*, i32 (i8*, i32, i8*)** @bpf_probe_read, align 8, !dbg !52, !tbaa !45
      %5 = load %struct.sk_buff*, %struct.sk_buff** %2, align 8, !dbg !53, !tbaa !45
      %6 = call [10 x %union.anon]* @llvm.preserve.struct.access.index.p0a10s_union.anons.p0s_struct.sk_buffs(%struct.sk_buff* %5, i32 2, i32 3), !dbg !53, !llvm.preserve.access.index !19
      %7 = call %union.anon* @llvm.preserve.array.access.index.p0s_union.anons.p0a10s_union.anons([10 x %union.anon]* %6, i32 1, i32 5), !dbg !53
      %8 = call %union.anon* @llvm.preserve.union.access.index.p0s_union.anons.p0s_union.anons(%union.anon* %7, i32 1), !dbg !53, !llvm.preserve.access.index !26
      %9 = bitcast %union.anon* %8 to %struct.anon.0*, !dbg !53
      %10 = call i8* @llvm.preserve.struct.access.index.p0i8.p0s_struct.anon.0s(%struct.anon.0* %9, i32 1, i32 1), !dbg !53, !llvm.preserve.access.index !34
      %11 = call i32 %4(i8* %3, i32 1, i8* %10), !dbg !52
      %12 = load i8, i8* %3, align 1, !dbg !54, !tbaa !55
      %13 = sext i8 %12 to i32, !dbg !54
      call void @llvm.lifetime.end.p0i8(i64 1, i8* %3) #4, !dbg !56
      ret i32 %13, !dbg !57
    }
    !19 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "sk_buff", file: !3, line: 1, size: 704, elements: !20)
    !26 = distinct !DICompositeType(tag: DW_TAG_union_type, scope: !19, file: !3, line: 5, size: 64, elements: !27)
    !34 = distinct !DICompositeType(tag: DW_TAG_structure_type, scope: !26, file: !3, line: 10, size: 16, elements: !35)
  Note that the @llvm.preserve.{struct,union}.access.index calls have metadata llvm.preserve.access.index attached to the instructions to provide struct/union debuginfo type information.
  For &ctx->u[5].dev.dev_id:
    . The "%6 = ..." represents struct member "u" with index 2 for IR layout and index 3 for DI layout.
    . The "%7 = ..." represents array subscript "5".
    . The "%8 = ..." represents union member "dev" with index 1 for DI layout.
    . The "%10 = ..." represents struct member "dev_id" with index 1 for both IR and DI layout.
  Basically, by traversing the use-def chain recursively for the 3rd argument of bpf_probe_read() and examining all preserve_*_access_index calls, the debuginfo struct/union/array access indices can be recovered.
  The intrinsics also contain enough information to regenerate code for the IR layout. For array and structure intrinsics, the proper GEP can be constructed. For union intrinsics, replacing all uses of "addr" with "base" should be enough.
  Signed-off-by: Yonghong Song <yhs@fb.com>
  Differential Revision: https://reviews.llvm.org/D61810
  llvm-svn: 365352
* [WebAssembly] tablegen: distinguish float/int immediate operands. (Wouter van Oortmerssen, 2019-07-08, 2 files, -8/+24)
  Summary: Before, they were one category of operands, which could cause crashes in nonsensical combinations, e.g. "f32.const symbol". Now these are forced to be an error.
  Reviewers: dschuff
  Subscribers: sbc100, jgravelle-google, aheejin, sunfish, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D64039
  llvm-svn: 365351
* AMDGPU: Remove mubuf specific PatFrags (Matt Arsenault, 2019-07-08, 1 file, -27/+13)
  These are identical to the *_global PatFrags, and will only create more work to get the GlobalISel importer to handle them.
  llvm-svn: 365350
* AMDGPU: Move waitcnt intrinsic to instruction definition pattern (Matt Arsenault, 2019-07-08, 2 files, -12/+3)
  llvm-svn: 365349
* Add, and infer, a nofree function attribute (Brian Homerding, 2019-07-08, 1 file, -5/+0)
  Removing dead code left over from the refactor.
  Reviewers: jdoerfert
  Differential Revision: https://reviews.llvm.org/D49165
  llvm-svn: 365345
* GlobalISel: Convert some build functions to using SrcOp/DstOp (Matt Arsenault, 2019-07-08, 3 files, -53/+62)
  llvm-svn: 365343
* [InstCombine] canonicalize insert+splat to/from element 0 of vector (Sanjay Patel, 2019-07-08, 1 file, -0/+38)
  We recognize a splat from element 0 in (VectorUtils) llvm::getSplatValue() and also in ShuffleVectorInst::isZeroEltSplatMask(), so this converts to that form for better matching. The backend generically turns these patterns into build_vector, so there should be no codegen difference.
  llvm-svn: 365342
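  Both recognizers named above can be exercised with a small check; a sketch (helper name hypothetical):

    #include "llvm/Analysis/VectorUtils.h"
    #include "llvm/IR/Instructions.h"
    using namespace llvm;

    // After canonicalization, splat-like insert+shuffle patterns are
    // matched uniformly by either recognizer.
    bool isRecognizedSplat(Value *V) {
      if (getSplatValue(V))            // VectorUtils: splat from element 0
        return true;
      if (auto *Shuf = dyn_cast<ShuffleVectorInst>(V))
        return Shuf->isZeroEltSplat(); // all-zeros shuffle mask
      return false;
    }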
* [Bitcode][NFC] Remove unused variable from BitcodeAnalyzer (Francis Visoiu Mistrih, 2019-07-08, 1 file, -2/+0)
  llvm-svn: 365340
* Teach the IRBuilder about fadd and friends. (Kevin P. Neal, 2019-07-08, 2 files, -12/+59)
  The IRBuilder has calls to create floating point instructions like fadd. It does not have calls to create constrained versions of them. This patch adds support for constrained creation of fadd, fsub, fmul, fdiv, and frem.
  Reviewed by: John McCall, Sanjay Patel
  Approved by: John McCall
  Differential Revision: https://reviews.llvm.org/D53157
  llvm-svn: 365339
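  A hedged sketch of driving the constrained builders (current upstream spells this CreateConstrainedFPBinOp; the exact signature has evolved since this commit):

    #include "llvm/IR/IRBuilder.h"
    #include "llvm/IR/Intrinsics.h"
    using namespace llvm;

    // Emits llvm.experimental.constrained.fadd, which carries explicit
    // rounding-mode and exception-behavior operands, instead of a bare fadd.
    Value *emitConstrainedFAdd(IRBuilder<> &B, Value *L, Value *R) {
      return B.CreateConstrainedFPBinOp(
          Intrinsic::experimental_constrained_fadd, L, R);
    }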
* Add, and infer, a nofree function attribute (Brian Homerding, 2019-07-08, 11 files, -19/+116)
  This patch adds a function attribute, nofree, to indicate that a function does not, directly or indirectly, call a memory-deallocation function (e.g., free, C++'s operator delete).
  Reviewers: jdoerfert
  Differential Revision: https://reviews.llvm.org/D49165
  llvm-svn: 365336
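  A small sketch of producing and querying the attribute from C++ (helper names hypothetical):

    #include "llvm/IR/Function.h"
    using namespace llvm;

    // nofree promises the function does not deallocate memory, directly
    // or transitively; the inference in this patch sets it where provable.
    void markNoFree(Function &F) { F.addFnAttr(Attribute::NoFree); }

    bool mayFreeMemory(const Function &F) {
      return !F.hasFnAttribute(Attribute::NoFree);
    }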
* [X86] ISD::INSERT_SUBVECTOR - use uint64_t index. NFCI. (Simon Pilgrim, 2019-07-08, 1 file, -4/+4)
  Keep the uint64_t type from getConstantOperandVal to stop truncation/extension overflow warnings in MSVC in subvector index math.
  llvm-svn: 365328
* [Float2Int] Add support for unary FNeg to Float2Int (Cameron McInally, 2019-07-08, 1 file, -0/+14)
  Differential Revision: https://reviews.llvm.org/D63941
  llvm-svn: 365324
* [MIPS GlobalISel] Register bank select for G_LOAD. Select i64 load (Petar Avramovic, 2019-07-08, 3 files, -10/+100)
  Select gprb or fprb when the loaded value is used by either a copy to a physical register or an instruction with only one mapping available for that use operand. A load of integer s64 is handled with narrowScalar when the mapping is applied; the produced artifacts are combined away. Manually set gprb on all register operands of instructions created during narrowScalar.
  Differential Revision: https://reviews.llvm.org/D64269
  llvm-svn: 365323
* [MIPS GlobalISel] Register bank select for G_STORE. Select i64 store (Petar Avramovic, 2019-07-08, 3 files, -5/+344)
  Select gprb or fprb when the stored value is defined by either a copy from a physical register or an instruction with only one mapping available for that def operand. A store of integer s64 is handled with narrowScalar when the mapping is applied; the produced artifacts are combined away. Manually set gprb on all register operands of instructions created during narrowScalar.
  Differential Revision: https://reviews.llvm.org/D64268
  llvm-svn: 365322
* [AMDGPU][MC] Corrected parsing of FLAT offset modifier (Dmitry Preobrazhensky, 2019-07-08, 7 files, -98/+114)
  Summary of changes:
  - simplified handling of FLAT offset: offset_s13 and offset_u12 have been replaced with flat_offset;
  - provided information about error position for pre-gfx9 targets;
  - improved error handling.
  Reviewers: artem.tamazov, arsenm, rampitec
  Differential Revision: https://reviews.llvm.org/D64244
  llvm-svn: 365321
* GlobalISel: widenScalar for G_BUILD_VECTOR (Matt Arsenault, 2019-07-08, 1 file, -0/+19)
  llvm-svn: 365320
* [TargetLowering] SimplifyDemandedBits - just call computeKnownBits for BUILD_VECTOR cases. (Simon Pilgrim, 2019-07-08, 1 file, -23/+3)
  Don't do this locally; computeKnownBits does it better (and can handle non-constant cases as well). A next step would be to actually simplify non-constant elements, building on what we already do in SimplifyDemandedVectorElts.
  llvm-svn: 365309
* [ARM] Relax constraints on operands of VQxDMLxDH instructions (Mikhail Maltsev, 2019-07-08, 2 files, -17/+6)
  Summary: According to a recently updated Armv8-M spec (https://static.docs.arm.com/ddi0553/bh/DDI0553B_h_armv8m_arm.pdf), the 32-bit width versions of the following instructions:
  * VQDMLADH
  * VQDMLADHX
  * VQRDMLADH
  * VQRDMLADHX
  * VQDMLSDH
  * VQDMLSDHX
  * VQRDMLSDH
  * VQRDMLSDHX
  are no longer unpredictable when their output register is the same as one of the input registers. This patch updates the assembler parser and the corresponding tests, and also removes @earlyclobber from the instruction constraints.
  Reviewers: simon_tatham, ostannard, dmgreen, SjoerdMeijer, samparker
  Reviewed By: simon_tatham
  Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D64250
  llvm-svn: 365306
* [RISCV] Specify registers used in DWARF exception handling (Alex Bradbury, 2019-07-08, 2 files, -0/+20)
  Defines RISCV registers for getExceptionPointerRegister() and getExceptionSelectorRegister().
  Differential Revision: https://reviews.llvm.org/D63411
  Patch by Edward Jones. Modified by Alex Bradbury to add CHECK lines to exception-pointer-register.ll.
  llvm-svn: 365301
* [ARM] Fix null pointer dereference in CodeGen/ARM/Windows/stack-protector-msvc.ll.test after D64292/r365283 (Fangrui Song, 2019-07-08, 1 file, -6/+8)
  CLI.CS may not be set.
  llvm-svn: 365299
* [AMDGPU] Use a named predicate instead of a magic number. (Jay Foad, 2019-07-08, 1 file, -4/+3)
  Reviewers: arsenm
  Reviewed By: arsenm
  Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D64201
  llvm-svn: 365294
* [X86] Allow execution domain fixing to turn SHUFPD into SHUFPS. (Craig Topper, 2019-07-08, 1 file, -0/+14)
  This can help with code size on SSE targets where SHUFPD requires a 0x66 prefix and SHUFPS doesn't.
  llvm-svn: 365293
* [X86] Make movsd commutable to shufpd with a 0x02 immediate on pre-SSE4.1 targets. (Craig Topper, 2019-07-08, 2 files, -15/+41)
  This can help avoid a copy or enable load folding. On SSE4.1 targets we can commute it to blendi instead. I had to make shufpd with a 0x02 immediate commutable as well, since we expect commuting to be reversible.
  llvm-svn: 365292
* [RISCV] Support z and i operand modifiers (Alex Bradbury, 2019-07-08, 1 file, -9/+27)
  Differential Revision: https://reviews.llvm.org/D57792
  Patch by James Clarke.
  llvm-svn: 365291
* [X86] Add MOVSDrr->MOVLPDrm entry to load folding table. Add custom handling to turn UNPCKLPDrr->MOVHPDrm when the load is under-aligned. (Craig Topper, 2019-07-08, 2 files, -1/+19)
  If the load is aligned, we can turn UNPCKLPDrr into UNPCKLPDrm.
  llvm-svn: 365287
* [llvm-bcanalyzer] Refactor and move to libLLVMBitReader (Francis Visoiu Mistrih, 2019-07-08, 2 files, -0/+978)
  This allows us to use the analyzer from unit tests.
  * Refactor the interface to use proper error handling for most functions after JF's work.
  * Move everything into a BitstreamAnalyzer class.
  * Move that to Bitcode/BitcodeAnalyzer.h.
  Differential Revision: https://reviews.llvm.org/D64116
  llvm-svn: 365286
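  With the analyzer in a library, a unit test can drive it roughly like this (a sketch; the class name follows the header it landed in, and any BCDumpOptions fields beyond the output stream are assumptions):

    #include "llvm/Bitcode/BitcodeAnalyzer.h"
    #include "llvm/Support/raw_ostream.h"
    using namespace llvm;

    Error dumpBitcodeStats(StringRef BitcodeBytes) {
      BitcodeAnalyzer BA(BitcodeBytes);
      BCDumpOptions Opts(outs());
      // analyze() walks the bitstream, recording per-block/record stats.
      if (Error E = BA.analyze(Opts))
        return E;
      BA.printStats(Opts);
      return Error::success();
    }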
* [ARM] Add support for MSVC stack cookie checking (Martin Storsjo, 2019-07-07, 2 files, -0/+34)
  Heavily based on the same for AArch64, from SVN r346469.
  Differential Revision: https://reviews.llvm.org/D64292
  llvm-svn: 365283
* [X86] Make sure load isn't volatile before shrinking it in MOVDDUP isel patterns. (Craig Topper, 2019-07-07, 2 files, -5/+5)
  llvm-svn: 365275
* [CodeGen] Add larger vector types for i32 and f32 (David Majnemer, 2019-07-07, 1 file, -1/+25)
  Some out-of-tree backends require larger vector types. Since maintaining the changes out of tree is difficult due to the many manual changes needed when adding a new type, we are adding them even if no in-tree backend currently uses them.
  Differential Revision: https://reviews.llvm.org/D64141
  Patch by Thomas Raoux!
  llvm-svn: 365274
* [X86] SimplifyDemandedVectorEltsForTargetNode - fix shadow variable warning. NFCI. (Simon Pilgrim, 2019-07-06, 1 file, -3/+3)
  Fixes cppcheck warning.
  llvm-svn: 365271
* [X86] LowerBuildVectorv16i8 - pull out repeated getOperand() call. NFCI. (Simon Pilgrim, 2019-07-06, 1 file, -3/+3)
  llvm-svn: 365270
* [DAGCombine] convertBuildVecZextToZext - remove duplicate getOpcode() call. NFCI. (Simon Pilgrim, 2019-07-06, 1 file, -1/+1)
  llvm-svn: 365269
* [X86] Remove patterns from MOVLPSmr and MOVHPSmr instructions. (Craig Topper, 2019-07-06, 3 files, -25/+37)
  These patterns are the same as the MOVLPDmr and MOVHPDmr patterns, but with a bitcast at the end. We can just select the PD instruction and let execution domain fixing switch to PS.
  llvm-svn: 365267
* [X86] Add patterns to select MOVLPDrm from MOVSD+load and MOVHPD from UNPCKL+load. (Craig Topper, 2019-07-06, 1 file, -0/+14)
  These narrow the load, so we can only do it if the load isn't volatile. There are also tests in vector-shuffle-128-v4.ll that this should support, but we don't seem to fold bitcast+load on pre-SSE4.2 targets due to the slow unaligned mem 16 flag.
  llvm-svn: 365266
* [IRBuilder] Introduce helpers for and/or of multiple values at once (Philip Reames, 2019-07-06, 3 files, -25/+10)
  We had versions of this code scattered around, so consolidate into one location. Not strictly NFC since the order of intermediate results may change in some places, but since these operations are associative, this should not change results.
  llvm-svn: 365259
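  A sketch of the consolidated helper (the ArrayRef overload this change introduces; helper name hypothetical):

    #include "llvm/IR/IRBuilder.h"
    using namespace llvm;

    // Combine a list of i1 checks in one call instead of a manual chain;
    // the intermediate association is an implementation detail, hence the
    // "not strictly NFC" note above.
    Value *combineChecks(IRBuilder<> &B, ArrayRef<Value *> Checks) {
      return B.CreateAnd(Checks);
    }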
* [RegisterCoalescer] Fix an overzealous assert (Quentin Colombet, 2019-07-06, 1 file, -1/+8)
  Although removeCopyByCommutingDef deals with full copies, it is still possible to copy undef lanes, and thus we wouldn't have a value number for these lanes.
  This fixes PR40215.
  llvm-svn: 365256
* RegUsageInfoCollector: Skip AMDGPU entry point functions (Matt Arsenault, 2019-07-05, 1 file, -6/+42)
  I'm not sure whether it's worth adding a hook to disable the pass for an arbitrary function. This pass is taking up to 5% of compile time in tiny programs by iterating through all of the physical registers in every register class. This pass should be rewritten in terms of regunits. For now, skip doing anything for entry point functions. The vast majority of functions in the real world aren't callable, so just not running this will give the majority of the benefit.
  llvm-svn: 365255