summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
* [X86] Don't form extloads in combineExtInVec unless the load extension is legal.Craig Topper2019-07-091-7/+9
| | | | | | | | | | This should prevent doing this on pre-sse4.1 targets or for 256 bit vectors without avx2. I don't know of a failure from this. Op legalization will probably take care of, but seemed better to be safe. llvm-svn: 365577
* AMDGPU/GlobalISel: Fix legality for G_BUILD_VECTORMatt Arsenault2019-07-091-7/+4
| | | | llvm-svn: 365575
* [AMDGPU] gfx908 v_pk_fmac_f16 supportStanislav Mekhanoshin2019-07-092-4/+10
| | | | | | Differential Revision: https://reviews.llvm.org/D64433 llvm-svn: 365573
* GlobalISel: Combine unmerge of merge with intermediate castMatt Arsenault2019-07-091-3/+9
| | | | | | | This eliminates some illegal intermediate vectors when operations are scalarized. llvm-svn: 365566
* [Profile] Support raw/indexed profiles larger than 4GBVedant Kumar2019-07-091-2/+2
| | | | | | rdar://45955976 llvm-svn: 365565
* [AMDGPU] gfx908 mAI instructions, MC partStanislav Mekhanoshin2019-07-0919-18/+674
| | | | | | Differential Revision: https://reviews.llvm.org/D64446 llvm-svn: 365563
* [SLP] Optimize getSpillCost(); NFCINikita Popov2019-07-091-6/+10
| | | | | | | | | | | | For a given set of live values, the spill cost will always be the same for each call. Compute the cost once and multiply it by the number of calls. (I'm not sure this spill cost modeling makes sense if there are multiple calls, as the spill cost will likely be shared across calls in that case. But that's how it currently works.) llvm-svn: 365552
* hwasan: Improve precision of checks using short granule tags.Peter Collingbourne2019-07-092-17/+152
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A short granule is a granule of size between 1 and `TG-1` bytes. The size of a short granule is stored at the location in shadow memory where the granule's tag is normally stored, while the granule's actual tag is stored in the last byte of the granule. This means that in order to verify that a pointer tag matches a memory tag, HWASAN must check for two possibilities: * the pointer tag is equal to the memory tag in shadow memory, or * the shadow memory tag is actually a short granule size, the value being loaded is in bounds of the granule and the pointer tag is equal to the last byte of the granule. Pointer tags between 1 to `TG-1` are possible and are as likely as any other tag. This means that these tags in memory have two interpretations: the full tag interpretation (where the pointer tag is between 1 and `TG-1` and the last byte of the granule is ordinary data) and the short tag interpretation (where the pointer tag is stored in the granule). When HWASAN detects an error near a memory tag between 1 and `TG-1`, it will show both the memory tag and the last byte of the granule. Currently, it is up to the user to disambiguate the two possibilities. Because this functionality obsoletes the right aligned heap feature of the HWASAN memory allocator (and because we can no longer easily test it), the feature is removed. Also update the documentation to cover both short granule tags and outlined checks. Differential Revision: https://reviews.llvm.org/D63908 llvm-svn: 365551
* [PoisonChecking] Flesh out complete todo list for full coveragePhilip Reames2019-07-091-8/+24
| | | | | Note: I don't actually plan to implement all of the cases at the moment, I'm just documenting them for completeness. There's a couple of cases left which are practically useful for me in debugging loop transforms, and I'll probably stop there for the moment. llvm-svn: 365550
* [X86][AMDGPU][DAGCombiner] Move call to allowsMemoryAccess into ↵Craig Topper2019-07-095-24/+31
| | | | | | | | | | | | | | | | isLoadBitCastBeneficial/isStoreBitCastBeneficial to allow X86 to bypass it Basically the problem is that X86 doesn't set the Fast flag from allowsMemoryAccess on certain CPUs due to slow unaligned memory subtarget features. This prevents bitcasts from being folded into loads and stores. But all vector loads and stores of the same width are the same cost on X86. This patch merges the allowsMemoryAccess call into isLoadBitCastBeneficial to allow X86 to skip it. Differential Revision: https://reviews.llvm.org/D64295 llvm-svn: 365549
* Fix build error for VC STL, use llvm::make_uniqueReid Kleckner2019-07-091-1/+1
| | | | llvm-svn: 365548
* [AMDGPU] gfx908 register file changesStanislav Mekhanoshin2019-07-096-50/+621
| | | | | | Differential Revision: https://reviews.llvm.org/D64438 llvm-svn: 365546
* [PoisonCheker] Support for out of bounds operands on shifts + ↵Philip Reames2019-07-091-1/+41
| | | | | | | | insert/extractelement These are sources of poison which don't come from flags, but are clearly documented in the LangRef. Left off support for scalable vectors for the moment, but should be easy to add if anyone is interested. llvm-svn: 365543
* Boilerplate for producing XCOFF object files from the PowerPC backend.Sean Fertile2019-07-0917-3/+373
| | | | | | | | | | Stubs out a number of the classes needed to produce a new object file format (XCOFF) for the powerpc-aix target. For testing input is an empty module which produces an object file with just a file header. Differential Revision: https://reviews.llvm.org/D61694 llvm-svn: 365541
* [X86] LowerToHorizontalOp - use count_if to count non-UNDEF ops. NFCI.Simon Pilgrim2019-07-091-5/+2
| | | | llvm-svn: 365540
* [PoisonChecking] Add validation rules for "exact" on sdiv/udivPhilip Reames2019-07-091-0/+18
| | | | | | As directly stated in the LangRef, no ambiguity here... llvm-svn: 365538
* [ThinLTO] only emit used or referenced CFI records to indexBob Haarman2019-07-091-8/+22
| | | | | | | | | | | | | | | | | | | | | Summary: We emit CFI_FUNCTION_DEFS and CFI_FUNCTION_DECLS to distributed ThinLTO indices to implement indirect function call checking. This change causes us to only emit entries for functions that are either defined or used by the module we're writing the index for (instead of all functions in the combined index), which can make the indices substantially smaller. Fixes PR42378. Reviewers: pcc, vitalybuka, eugenis Subscribers: mehdi_amini, hiraditya, dexonsmith, arphaman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63887 llvm-svn: 365537
* Add a transform pass to make the executable semantics of poison explicit in ↵Philip Reames2019-07-095-0/+288
| | | | | | | | | | | | | | | | the IR Implements a transform pass which instruments IR such that poison semantics are made explicit. That is, it provides a (possibly partial) executable semantics for every instruction w.r.t. poison as specified in the LLVM LangRef. There are obvious parallels to the sanitizer tools, but this pass is focused purely on the semantics of LLVM IR, not any particular source language. The target audience for this tool is developers working on or targetting LLVM from a frontend. The idea is to be able to take arbitrary IR (with the assumption of known inputs), and evaluate it concretely after having made poison semantics explicit to detect cases where either a) the original code executes UB, or b) a transform pass introduces UB which didn't exist in the original program. At the moment, this is mostly the framework and still needs to be fleshed out. By reusing existing code we have decent coverage, but there's a lot of cases not yet handled. What's here is good enough to handle interesting cases though; for instance, one of the recent LFTR bugs involved UB being triggered by integer induction variables with nsw/nuw flags would be reported by the current code. (See comment in PoisonChecking.cpp for full explanation and context) Differential Revision: https://reviews.llvm.org/D64215 llvm-svn: 365536
* Try to appease the Windows build bots.Sean Fertile2019-07-091-4/+12
| | | | | | | Several of the conditonal operators commited in llvm-svn: 365524 fail to compile on the windows buildbots. Converting to an if and early return to try to fix. llvm-svn: 365535
* [BPF] Fix a typo in the file nameYonghong Song2019-07-092-1/+1
| | | | | | | | Fixed the file name from BPFAbstrctMemberAccess.cpp to BPFAbstractMemberAccess.cpp. Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 365532
* [AMDGPU] gfx908 targetStanislav Mekhanoshin2019-07-097-1/+104
| | | | | | Differential Revision: https://reviews.llvm.org/D64429 llvm-svn: 365525
* [Object][XCOFF] Add support for 64-bit file header and section header dumping.Sean Fertile2019-07-095-150/+260
| | | | | | | | | | | Adds a readobj dumper for 32-bit and 64-bit section header tables, and extend support for the file-header dumping to include 64-bit object files. Also refactors the binary file parsing to be done in a helper function in an attempt to cleanup error handeling. Differential Revision: https://reviews.llvm.org/D63843 llvm-svn: 365524
* Revert "[HardwareLoops] NFC - move hardware loop checking code to ↵Jinsong Ji2019-07-092-44/+34
| | | | | | | | isHardwareLoopProfitable()" This reverts commit d95557306585404893d610784edb3e32f1bfce18. llvm-svn: 365520
* [AMDGPU] Created a sub-register class for the return address operand in the ↵Christudasan Devadasan2019-07-093-12/+15
| | | | | | | | | | | | | | return instruction. Function return instruction lowering, currently uses the fixed register pair s[30:31] for holding the return address. It can be any SGPR pair other than the CSRs. Created an SGPR pair sub-register class exclusive of the CSRs, and used this regclass while lowering the return instruction. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D63924 llvm-svn: 365512
* [RISCV] Fix ICE in isDesirableToCommuteWithShiftSam Elliott2019-07-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: There was an error being thrown from isDesirableToCommuteWithShift in some tests. This was tracked down to the method being called before legalisation, with an extended value type, not a machine value type. In the case I diagnosed, the error was only hit with an instruction sequence involving `i24`s in the add and shift. `i24` is not a Machine ValueType, it is instead an Extended ValueType which was causing the issue. I have added a test to cover this case, and fixed the error in the callback. Reviewers: asb, luismarques Reviewed By: asb Subscribers: hiraditya, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64425 llvm-svn: 365511
* [AArch64][GlobalISel] Optimize conditional branches followed by ↵Amara Emerson2019-07-092-0/+64
| | | | | | | | | | | | | | unconditional branches If we have an icmp->brcond->br sequence where the brcond just branches to the next block jumping over the br, while the br takes the false edge, then we can modify the conditional branch to jump to the br's target while inverting the condition of the incoming icmp. This means we can eliminate the br as an unconditional branch to the fallthrough block. Differential Revision: https://reviews.llvm.org/D64354 llvm-svn: 365510
* [mips] Show error in case of using FP64 mode on pre MIPS32R2 CPUSimon Atanasyan2019-07-091-0/+5
| | | | llvm-svn: 365508
* [DAGCombine] LoadedSlice - keep getOffsetFromBase() uint64_t offset. NFCI.Simon Pilgrim2019-07-091-1/+1
| | | | | | Keep the uint64_t type from getOffsetFromBase() to stop truncation/extension overflow warnings in MSVC in alignment math. llvm-svn: 365504
* [BPF] Support for compile once and run everywhereYonghong Song2019-07-0910-55/+1268
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduction ============ This patch added intial support for bpf program compile once and run everywhere (CO-RE). The main motivation is for bpf program which depends on kernel headers which may vary between different kernel versions. The initial discussion can be found at https://lwn.net/Articles/773198/. Currently, bpf program accesses kernel internal data structure through bpf_probe_read() helper. The idea is to capture the kernel data structure to be accessed through bpf_probe_read() and relocate them on different kernel versions. On each host, right before bpf program load, the bpfloader will look at the types of the native linux through vmlinux BTF, calculates proper access offset and patch the instruction. To accommodate this, three intrinsic functions preserve_{array,union,struct}_access_index are introduced which in clang will preserve the base pointer, struct/union/array access_index and struct/union debuginfo type information. Later, bpf IR pass can reconstruct the whole gep access chains without looking at gep itself. This patch did the following: . An IR pass is added to convert preserve_*_access_index to global variable who name encodes the getelementptr access pattern. The global variable has metadata attached to describe the corresponding struct/union debuginfo type. . An SimplifyPatchable MachineInstruction pass is added to remove unnecessary loads. . The BTF output pass is enhanced to generate relocation records located in .BTF.ext section. Typical CO-RE also needs support of global variables which can be assigned to different values to different hosts. For example, kernel version can be used to guard different versions of codes. This patch added the support for patchable externals as well. Example ======= The following is an example. struct pt_regs { long arg1; long arg2; }; struct sk_buff { int i; struct net_device *dev; }; #define _(x) (__builtin_preserve_access_index(x)) static int (*bpf_probe_read)(void *dst, int size, const void *unsafe_ptr) = (void *) 4; extern __attribute__((section(".BPF.patchable_externs"))) unsigned __kernel_version; int bpf_prog(struct pt_regs *ctx) { struct net_device *dev = 0; // ctx->arg* does not need bpf_probe_read if (__kernel_version >= 41608) bpf_probe_read(&dev, sizeof(dev), _(&((struct sk_buff *)ctx->arg1)->dev)); else bpf_probe_read(&dev, sizeof(dev), _(&((struct sk_buff *)ctx->arg2)->dev)); return dev != 0; } In the above, we want to translate the third argument of bpf_probe_read() as relocations. -bash-4.4$ clang -target bpf -O2 -g -S trace.c The compiler will generate two new subsections in .BTF.ext, OffsetReloc and ExternReloc. OffsetReloc is to record the structure member offset operations, and ExternalReloc is to record the external globals where only u8, u16, u32 and u64 are supported. BPFOffsetReloc Size struct SecLOffsetReloc for ELF section #1 A number of struct BPFOffsetReloc for ELF section #1 struct SecOffsetReloc for ELF section #2 A number of struct BPFOffsetReloc for ELF section #2 ... BPFExternReloc Size struct SecExternReloc for ELF section #1 A number of struct BPFExternReloc for ELF section #1 struct SecExternReloc for ELF section #2 A number of struct BPFExternReloc for ELF section #2 struct BPFOffsetReloc { uint32_t InsnOffset; ///< Byte offset in this section uint32_t TypeID; ///< TypeID for the relocation uint32_t OffsetNameOff; ///< The string to traverse types }; struct BPFExternReloc { uint32_t InsnOffset; ///< Byte offset in this section uint32_t ExternNameOff; ///< The string for external variable }; Note that only externs with attribute section ".BPF.patchable_externs" are considered for Extern Reloc which will be patched by bpf loader right before the load. For the above test case, two offset records and one extern record will be generated: OffsetReloc records: .long .Ltmp12 # Insn Offset .long 7 # TypeId .long 242 # Type Decode String .long .Ltmp18 # Insn Offset .long 7 # TypeId .long 242 # Type Decode String ExternReloc record: .long .Ltmp5 # Insn Offset .long 165 # External Variable In string table: .ascii "0:1" # string offset=242 .ascii "__kernel_version" # string offset=165 The default member offset can be calculated as the 2nd member offset (0 representing the 1st member) of struct "sk_buff". The asm code: .Ltmp5: .Ltmp6: r2 = 0 r3 = 41608 .Ltmp7: .Ltmp8: .loc 1 18 9 is_stmt 0 # t.c:18:9 .Ltmp9: if r3 > r2 goto LBB0_2 .Ltmp10: .Ltmp11: .loc 1 0 9 # t.c:0:9 .Ltmp12: r2 = 8 .Ltmp13: .loc 1 19 66 is_stmt 1 # t.c:19:66 .Ltmp14: .Ltmp15: r3 = *(u64 *)(r1 + 0) goto LBB0_3 .Ltmp16: .Ltmp17: LBB0_2: .loc 1 0 66 is_stmt 0 # t.c:0:66 .Ltmp18: r2 = 8 .loc 1 21 66 is_stmt 1 # t.c:21:66 .Ltmp19: r3 = *(u64 *)(r1 + 8) .Ltmp20: .Ltmp21: LBB0_3: .loc 1 0 66 is_stmt 0 # t.c:0:66 r3 += r2 r1 = r10 .Ltmp22: .Ltmp23: .Ltmp24: r1 += -8 r2 = 8 call 4 For instruction .Ltmp12 and .Ltmp18, "r2 = 8", the number 8 is the structure offset based on the current BTF. Loader needs to adjust it if it changes on the host. For instruction .Ltmp5, "r2 = 0", the external variable got a default value 0, loader needs to supply an appropriate value for the particular host. Compiling to generate object code and disassemble: 0000000000000000 bpf_prog: 0: b7 02 00 00 00 00 00 00 r2 = 0 1: 7b 2a f8 ff 00 00 00 00 *(u64 *)(r10 - 8) = r2 2: b7 02 00 00 00 00 00 00 r2 = 0 3: b7 03 00 00 88 a2 00 00 r3 = 41608 4: 2d 23 03 00 00 00 00 00 if r3 > r2 goto +3 <LBB0_2> 5: b7 02 00 00 08 00 00 00 r2 = 8 6: 79 13 00 00 00 00 00 00 r3 = *(u64 *)(r1 + 0) 7: 05 00 02 00 00 00 00 00 goto +2 <LBB0_3> 0000000000000040 LBB0_2: 8: b7 02 00 00 08 00 00 00 r2 = 8 9: 79 13 08 00 00 00 00 00 r3 = *(u64 *)(r1 + 8) 0000000000000050 LBB0_3: 10: 0f 23 00 00 00 00 00 00 r3 += r2 11: bf a1 00 00 00 00 00 00 r1 = r10 12: 07 01 00 00 f8 ff ff ff r1 += -8 13: b7 02 00 00 08 00 00 00 r2 = 8 14: 85 00 00 00 04 00 00 00 call 4 Instructions #2, #5 and #8 need relocation resoutions from the loader. Signed-off-by: Yonghong Song <yhs@fb.com> Differential Revision: https://reviews.llvm.org/D61524 llvm-svn: 365503
* [HardwareLoops] NFC - move hardware loop checking code to ↵Chen Zheng2019-07-092-34/+44
| | | | | | | | isHardwareLoopProfitable() Differential Revision: https://reviews.llvm.org/D64197 llvm-svn: 365497
* [MIPS GlobalISel] Register bank select for G_PHI. Select i64 phiPetar Avramovic2019-07-093-4/+63
| | | | | | | | | | | | | | | Select gprb or fprb when def/use register operand of G_PHI is used/defined by either: copy to/from physical register or instruction with only one mapping available for that use/def operand. Integer s64 phi is handled with narrowScalar when mapping is applied, produced artifacts are combined away. Manually set gprb to all register operands of instructions created during narrowScalar. Differential Revision: https://reviews.llvm.org/D64351 llvm-svn: 365494
* [MIPS GlobalISel] Regbanks for G_SELECT. Select i64, f32 and f64 selectPetar Avramovic2019-07-092-9/+35
| | | | | | | | | | | | | | | | | | Select gprb or fprb when def/use register operand of G_SELECT is used/defined by either: copy to/from physical register or instruction with only one mapping available for that use/def operand. Integer s64 select is handled with narrowScalar when mapping is applied, produced artifacts are combined away. Manually set gprb to all register operands of instructions created during narrowScalar. For selection of floating point s32 or s64 select it is enough to set fprb of appropriate size and selectImpl will do the rest. Differential Revision: https://reviews.llvm.org/D64350 llvm-svn: 365492
* AMDGPU/GlobalISel: Legalize more concat_vectorsMatt Arsenault2019-07-091-13/+16
| | | | llvm-svn: 365488
* AMDGPU/GlobalISel: Improve regbankselect for icmp s16Matt Arsenault2019-07-091-5/+10
| | | | | | Account for 64-bit scalar eq/ne when available. llvm-svn: 365487
* AMDGPU/GlobalISel: Make s16 G_ICMP legalMatt Arsenault2019-07-091-2/+8
| | | | llvm-svn: 365486
* AMDGPU/GlobalISel: Select G_SUBMatt Arsenault2019-07-092-7/+15
| | | | llvm-svn: 365484
* AMDGPU/GlobalISel: Select G_UNMERGE_VALUESMatt Arsenault2019-07-092-0/+47
| | | | llvm-svn: 365483
* AMDGPU/GlobalISel: Select G_MERGE_VALUESMatt Arsenault2019-07-094-18/+77
| | | | llvm-svn: 365482
* [CodeGen] AccelTable - remove non-constexpr (MSVC) Atom defsSimon Pilgrim2019-07-091-20/+0
| | | | | | Now that we've dropped VS2015 support (D64326) we can enable the constexpr variables on MSVC builds as VS2017+ correctly handles them llvm-svn: 365477
* [mips] Implement sge/sgeu pseudo instructionsSimon Atanasyan2019-07-093-0/+147
| | | | | | | | | | The `sge/sgeu Dst, Src1, Src2/Imm` pseudo instructions set register `Dst` to 1 if register `Src1` is greater than or equal `Src2/Imm` and to 0 otherwise. Differential Revision: https://reviews.llvm.org/D64314 llvm-svn: 365476
* [mips] Implement sgt/sgtu pseudo instructions with immediate operandSimon Atanasyan2019-07-093-0/+86
| | | | | | | | | The `sgt/sgtu Dst, Src1, Src2/Imm` pseudo instructions set register `Dst` to 1 if register `Src1` is greater than `Src2/Imm` and to 0 otherwise. Differential Revision: https://reviews.llvm.org/D64313 llvm-svn: 365475
* [NFC][AsmPrinter] Fix the formatting for the rL365467Djordje Todorovic2019-07-092-23/+21
| | | | | | | In addition, fix the build failure for the 'unused' variable. The variable was used inside the 'LLVM_DEBUG()'. llvm-svn: 365469
* OpaquePtr: add Type parameter to Loads analysis API.Tim Northover2019-07-0911-36/+91
| | | | | | | | | | | | | This makes the functions in Loads.h require a type to be specified independently of the pointer Value so that when pointers have no structure other than address-space, it can still do its job. Most callers had an obvious memory operation handy to provide this type, but a SROA and ArgumentPromotion were doing more complicated analysis. They get updated to merge the properties of the various instructions they were considering. llvm-svn: 365468
* [DwarfDebug] Dump call site debug infoDjordje Todorovic2019-07-0914-40/+495
| | | | | | | | | | | | | | | | | | | Dump the DWARF information about call sites and call site parameters into debug info sections. The patch also provides an interface for the interpretation of instructions that could load values of a call site parameters in order to generate DWARF about the call site parameters. ([13/13] Introduce the debug entry values.) Co-authored-by: Ananth Sowda <asowda@cisco.com> Co-authored-by: Nikola Prica <nikola.prica@rt-rk.com> Co-authored-by: Ivan Baev <ibaev@cisco.com> Differential Revision: https://reviews.llvm.org/D60716 llvm-svn: 365467
* [RISCV] Fix RISCVTTIImpl::getIntImmCost for immediates where ↵Alex Bradbury2019-07-091-1/+3
| | | | | | | | | | | | | | | getMinSignedBits() > 64 APInt::getSExtValue will assert if getMinSignedBits() > 64. This can happen, for instance, if examining an i128. Avoid this assertion by checking Imm.getMinSignedBits() <= 64 before doing getTLI()->isLegalAddImmediate(Imm.getSExtValue()). We could directly check getMinSignedBits() <= 12 but it seems better to reuse the isLegalAddImmediate helper for this. Differential Revision: https://reviews.llvm.org/D64390 llvm-svn: 365462
* [SelectionDAG] Simplify some calls to getSetCCResultType. NFCBjorn Pettersson2019-07-093-8/+4
| | | | | | | | DAGTypeLegalizer and SelectionDAGLegalize has helper functions wrapping the call to TLI.getSetCCResultType(...). Use those helpers in more places. llvm-svn: 365456
* [LegalizeTypes] Fix saturation bug for smul.fix.satBjorn Pettersson2019-07-091-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Make sure we use SETGE instead of SETGT when checking if the sign bit is zero at SMULFIXSAT expansion. The faulty expansion occured when doing "expand" of SMULFIXSAT and the scale was exactly matching the size of the smaller type. For example doing i64 Z = SMULFIXSAT X, Y, 32 and expanding X/Y/Z into using two i32 values. The problem was that we sometimes did not saturate to min when overflowing. Here is an example using Q3.4 numbers: Consider that we are multiplying X and Y. X = 0x80 (-8.0 as Q3.4) Y = 0x20 (2.0 as Q3.4) To avoid loss of precision we do a widening multiplication, getting a 16 bit result Z = 0xF000 (-16.0 as Q7.8) To detect negative overflow we should check if the five most significant bits in Z are less than -1. Assume that we name the 4 most significant bits as HH and the next 4 bits as HL. Then we can do the check by examining if (HH < -1) or (HH == -1 && "sign bit in HL is zero"). The fault was that we have been doing the check as (HH < -1) or (HH == -1 && HL > 0) instead of (HH < -1) or (HH == -1 && HL >= 0). In our example HH is -1 and HL is 0, so the old code did not trigger saturation and simply truncated the result to 0x00 (0.0). With the bugfix we instead detect that we should saturate to min, and the result will be set to 0x80 (-8.0). Reviewers: leonardchan, bevinh Reviewed By: leonardchan Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64331 llvm-svn: 365455
* Fixing @llvm.memcpy not honoring volatile.Guillaume Chatelet2019-07-091-19/+24
| | | | | | | | | | | | | | | | This is explicitly not addressing target-specific code, or calls to memcpy. Summary: https://bugs.llvm.org/show_bug.cgi?id=42254 Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63215 llvm-svn: 365449
* Revert r364515 and r364524Jeremy Morse2019-07-091-85/+2
| | | | | | | Jordan reports on llvm-commits a performance regression with r364515, backing the patch out while it's investigated. llvm-svn: 365448
* Reland "[LiveDebugValues] Emit the debug entry values"Djordje Todorovic2019-07-093-18/+127
| | | | | | | | | | | | | | | | Emit replacements for clobbered parameters location if the parameter has unmodified value throughout the funciton. This is basic scenario where we can use the debug entry values. ([12/13] Introduce the debug entry values.) Co-authored-by: Ananth Sowda <asowda@cisco.com> Co-authored-by: Nikola Prica <nikola.prica@rt-rk.com> Co-authored-by: Ivan Baev <ibaev@cisco.com> Differential Revision: https://reviews.llvm.org/D58042 llvm-svn: 365444
OpenPOWER on IntegriCloud