summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
...
* [SelectionDAGBuilder] Fuse inline asm input operand loops passes. NFCI.Nirav Dave2019-01-241-13/+9
| | | | llvm-svn: 352053
* [X86] Add missing isReg() guards in FixupSetCCs pass.Nirav Dave2019-01-241-2/+2
| | | | llvm-svn: 352051
* [TTI] Add generic SADDO/SSUBO costsSimon Pilgrim2019-01-241-0/+10
| | | | | | Added x86 scalar sadd_with_overflow/ssub_with_overflow costs. llvm-svn: 352045
* [TTI] Add generic UADDO/USUBO costsSimon Pilgrim2019-01-241-3/+14
| | | | | | | | Added x86 scalar uadd_with_overflow/usub_with_overflow costs. Differential Revision: https://reviews.llvm.org/D56907 llvm-svn: 352043
* Revert "[HotColdSplitting] Get DT and PDT from the pass manager."Florian Hahn2019-01-241-32/+9
| | | | | | | | | This reverts commit a6982414edf315c39ae93f3c3322476217119e99 (llvm-svn: 352036), because it causes a memory leak in the pass manager. Failing bot http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/10351/steps/check-llvm%20asan/logs/stdio llvm-svn: 352041
* [MIPS GlobalISel] Select zero extending and sign extending loadPetar Avramovic2019-01-243-2/+37
| | | | | | | | | Select zero extending and sign extending load for MIPS32. Use size from MachineMemOperand to determine number of bytes to load. Differential Revision: https://reviews.llvm.org/D57099 llvm-svn: 352038
* [MIPS GlobalISel] Combine extending loadsPetar Avramovic2019-01-241-0/+11
| | | | | | | | | | | | | | Use CombinerHelper to combine extending load instructions. G_LOAD combined with G_ZEXT, G_SEXT or G_ANYEXT gives G_ZEXTLOAD, G_SEXTLOAD or G_LOAD with same type as def of extending instruction respectively. Similarly G_ZEXTLOAD combined with G_ZEXT gives G_ZEXTLOAD and G_SEXTLOAD combined with G_SEXT gives G_SEXTLOAD with same type as def of extending instruction. Differential Revision: https://reviews.llvm.org/D56914 llvm-svn: 352037
* [HotColdSplitting] Get DT and PDT from the pass manager.Florian Hahn2019-01-241-9/+32
| | | | | | | | | | | | | | | | | | | Instead of manually computing DT and PDT, we can get the from the pass manager, which ideally has them already cached. With the new pass manager, we could even preserve DT/PDT on a per function basis in a module pass. I think this also addresses the TODO about re-using the computed DTs for BFI. IIUC, GetBFI will fetch the DT from the pass manager and when we will fetch the cached version later. Reviewers: vsk, hiraditya, tejohnson, thegameg, sebpop Reviewed By: vsk Differential Revision: https://reviews.llvm.org/D57092 llvm-svn: 352036
* Reapply: [mips] Handle MipsMCExpr sub-expression for the MEK_DTPREL tagSimon Atanasyan2019-01-242-8/+11
| | | | | | | | | | | | | | | | | | | | | This reapplies commit r351987 with a failed test fix. Now the test accepts both DW_OP_GNU_push_tls_address and DW_OP_form_tls_address opcode. Original commit message: ``` This is a fix for a regression introduced by the rL348194 commit. In that change new type (MEK_DTPREL) of MipsMCExpr expression was added, but in some places of the code this type of expression considered as unexpected. This change fixes the bug. The MEK_DTPREL type of expression is used for marking TLS DIEExpr only and contains a regular sub-expression. Where we need to handle the expression, we retrieve the sub-expression and handle it in a common way. ``` llvm-svn: 352034
* [SystemZ] Remember to reset the NoPHIs property on MF in createPHIsForSelects()Jonas Paulsson2019-01-241-0/+2
| | | | | | | | After creating new PHI instructions during isel pseudo expansion, the NoPHIs property of MF should be reset in case it was previously set. Review: Ulrich Weigand llvm-svn: 352030
* [X86] Update SelectionDAGDumper to print the extension type and expanding ↵Craig Topper2019-01-241-0/+30
| | | | | | flag for masked loads. Add truncating and compressing for masked stores. llvm-svn: 352029
* [X86] Add test cases for opportunities to fold a truncate and a masked store ↵Craig Topper2019-01-241-1/+1
| | | | | | into a truncating masked store. llvm-svn: 352027
* [LoopSimplifyCFG] Fix inconsistency in live blocks markupMax Kazantsev2019-01-241-2/+3
| | | | | | | | | | | | | | | | | | | | | | When we choose whether or not we should mark block as dead, we have an inconsistent logic in markup of live blocks. - We take candidate IF its terminator branches on constant AND it is immediately in current loop; - We mark successor live IF its terminator doesn't branch by constant OR it branches by constant and the successor is its always taken block. What we are missing here is that when the terminator branches on a constant but is not taken as a candidate because is it not immediately in the current loop, we will mark only one (always taken) successor as live. Therefore, we do NOT do the actual folding but may NOT mark one of the successors as live. So the result of markup is wrong in this case, and we may then hit various asserts. Thanks Jordan Rupprech for reporting this! Differential Revision: https://reviews.llvm.org/D57095 Reviewed By: rupprecht llvm-svn: 352024
* DebugInfo: Use assembly label arithmetic for address pool size for easier ↵David Blaikie2019-01-242-9/+17
| | | | | | | | | reading/editing Recommits 350048, 350050 That broke buildbots because of some typos in the test case. llvm-svn: 352019
* Revert "[RISCV] Set isAsCheapAsAMove for ADDI, ORI, XORI, LUI"Ana Pazos2019-01-243-18/+3
| | | | | | | | This reverts commit ccfb060ecb5d7e18ea729455660484d576bde2cc. Some tests need to to fixed before reapplying this commit. llvm-svn: 352014
* [RISCV] Set isAsCheapAsAMove for ADDI, ORI, XORI, LUIAna Pazos2019-01-243-3/+18
| | | | | | | | | | | | | | | | | Summary: Affected instructions: PseudoLI simplest form (ADDI with X0) ALU operations with immediate (they do not set status flag - ADDI, ORI, XORI) Reviewers: asb Reviewed By: asb Subscribers: shiva0217, rkruppe, kito-cheng, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, zzheng, edward-jones, mgrang, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei Differential Revision: https://reviews.llvm.org/D56526 llvm-svn: 352010
* [RISCV] Set isReMaterializable for ORI, XORIAna Pazos2019-01-241-0/+4
| | | | | | | | | | | | Reviewers: asb Reviewed By: asb Subscribers: asb, rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei Differential Revision: https://reviews.llvm.org/D57069 llvm-svn: 352008
* [Sanitizers] UBSan unreachable incompatible with ASan in the presence of ↵Julian Lettner2019-01-2410-3/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | `noreturn` calls Summary: UBSan wants to detect when unreachable code is actually reached, so it adds instrumentation before every `unreachable` instruction. However, the optimizer will remove code after calls to functions marked with `noreturn`. To avoid this UBSan removes `noreturn` from both the call instruction as well as from the function itself. Unfortunately, ASan relies on this annotation to unpoison the stack by inserting calls to `_asan_handle_no_return` before `noreturn` functions. This is important for functions that do not return but access the the stack memory, e.g., unwinder functions *like* `longjmp` (`longjmp` itself is actually "double-proofed" via its interceptor). The result is that when ASan and UBSan are combined, the `noreturn` attributes are missing and ASan cannot unpoison the stack, so it has false positives when stack unwinding is used. Changes: # UBSan now adds the `expect_noreturn` attribute whenever it removes the `noreturn` attribute from a function # ASan additionally checks for the presence of this attribute Generated code: ``` call void @__asan_handle_no_return // Additionally inserted to avoid false positives call void @longjmp call void @__asan_handle_no_return call void @__ubsan_handle_builtin_unreachable unreachable ``` The second call to `__asan_handle_no_return` is redundant. This will be cleaned up in a follow-up patch. rdar://problem/40723397 Reviewers: delcypher, eugenis Tags: #sanitizers Differential Revision: https://reviews.llvm.org/D56624 llvm-svn: 352003
* Update entry count for cold callsDavid Callahan2019-01-242-37/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Profile sample files include the number of times each entry or inlined call site is sampled. This is translated into the entry count metadta on functions. When sample data is being read, if a call site that was inlined in the sample program is considered cold and not inlined, then the entry count of the out-of-line functions does not reflect the current compilation. In this patch, we note call sites where the function was not inlined and as a last action of the sample profile loading, we update the called function's entry count to reflect the calls from these call sites which are not included in the profile file. Reviewers: danielcdh, wmi, Kader, modocache Reviewed By: wmi Subscribers: davidxl, eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D52845 llvm-svn: 352001
* Revert "[mips] Handle MipsMCExpr sub-expression for the MEK_DTPREL tag"Amara Emerson2019-01-242-11/+8
| | | | | | This reverts commit r351987 as it broke some bots. llvm-svn: 351998
* [llvm] Clarify responsiblity of some of DILocation discriminator APIsMircea Trofin2019-01-244-5/+5
| | | | | | | | | | | | | | | | | | | | | Summary: Renamed setBaseDiscriminator to cloneWithBaseDiscriminator, to match similar APIs. Also changed its behavior to copy over the other discriminator components, instead of eliding them. Renamed cloneWithDuplicationFactor to cloneByMultiplyingDuplicationFactor, which more closely matches what this API does. Reviewers: dblaikie, wmi Reviewed By: dblaikie Subscribers: zzheng, llvm-commits Differential Revision: https://reviews.llvm.org/D56220 llvm-svn: 351996
* [ADT] Notify ilist traits about in-list transfersReid Kleckner2019-01-232-3/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Previously no client of ilist traits has needed to know about transfers of nodes within the same list, so as an optimization, ilist doesn't call transferNodesFromList in that case. However, now there are clients that want to use ilist traits to cache instruction ordering information to optimize dominance queries of instructions in the same basic block. This change updates the existing ilist traits users to detect in-list transfers and do nothing in that case. After this change, we can start caching instruction ordering information in LLVM IR data structures. There are two main ways to do that: - by putting an order integer into the Instruction class - by maintaining order integers in a hash table on BasicBlock I plan to implement and measure both, but I wanted to commit this change first to enable other out of tree ilist clients to implement this optimization as well. Reviewers: lattner, hfinkel, chandlerc Subscribers: hiraditya, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D57120 llvm-svn: 351992
* [LV][VPlan] Change to implement VPlan based predication forHideki Saito2019-01-237-6/+419
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | VPlan-native path Context: Patch Series #2 for outer loop vectorization support in LV using VPlan. (RFC: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html). Patch series #2 checks that inner loops are still trivially lock-step among all vector elements. Non-loop branches are blindly assumed as divergent. Changes here implement VPlan based predication algorithm to compute predicates for blocks that need predication. Predicates are computed for the VPLoop region in reverse post order. A block's predicate is computed as OR of the masks of all incoming edges. The mask for an incoming edge is computed as AND of predecessor block's predicate and either predecessor's Condition bit or NOT(Condition bit) depending on whether the edge from predecessor block to the current block is true or false edge. Reviewers: fhahn, rengolin, hsaito, dcaballe Reviewed By: fhahn Patch by Satish Guggilla, thanks! Differential Revision: https://reviews.llvm.org/D53349 llvm-svn: 351990
* hwasan: Read shadow address from ifunc if we don't need a frame record.Peter Collingbourne2019-01-231-10/+21
| | | | | | | | | | | | | This saves a cbz+cold call in the interceptor ABI, as well as a realign in both ABIs, trading off a dcache entry against some branch predictor entries and some code size. Unfortunately the functionality is hidden behind a flag because ifunc is known to be broken on static binaries on Android. Differential Revision: https://reviews.llvm.org/D57084 llvm-svn: 351989
* [mips] Handle MipsMCExpr sub-expression for the MEK_DTPREL tagSimon Atanasyan2019-01-232-8/+11
| | | | | | | | | | | | | | This is a fix for a regression introduced by the rL348194 commit. In that change new type (MEK_DTPREL) of MipsMCExpr expression was added, but in some places of the code this type of expression considered as unexpected. This change fixes the bug. The MEK_DTPREL type of expression is used for marking TLS DIEExpr only and contains a regular sub-expression. Where we need to handle the expression, we retrieve the sub-expression and handle it in a common way. llvm-svn: 351987
* Revert r351938 "[ARM] Alter the register allocation order for minsize on Thumb2"Reid Kleckner2019-01-231-27/+4
| | | | | | | This change caused fatal backend errors when compiling a file in libvpx for Android. llvm-svn: 351979
* [DEBUGINFO, NVPTX] Enable support for the debug info on NVPTX target.Alexey Bataev2019-01-234-13/+19
| | | | | | | | Enable full support for the debug info. Differential revision: https://reviews.llvm.org/D46189 llvm-svn: 351974
* Revert "[DEBUGINFO, NVPTX] Enable support for the debug info on NVPTX target."Alexey Bataev2019-01-234-18/+12
| | | | | | | This reverts commit r351972. Some pieces of the patch was not applied correctly. llvm-svn: 351973
* [DEBUGINFO, NVPTX] Enable support for the debug info on NVPTX target.Alexey Bataev2019-01-234-12/+18
| | | | | | | | | Enable full support for the debug info. Recommit to fix the emission of the not required closing brace. Differential revision: https://reviews.llvm.org/D46189 llvm-svn: 351972
* Revert "[DEBUGINFO, NVPTX] Enable support for the debug info on NVPTX target."Haojian Wu2019-01-233-8/+12
| | | | | | | | | | | | | | | | | | | | | | This reverts commit r351846. This patch may generate illegal assembly code, see ``` $ ./bin/clang -cc1 -triple nvptx64-nvidia-cuda -aux-triple x86_64-grtev4-linux-gnu -S -disable-free -disable-llvm-verifier -discard-value-names -main-file-name new.cc -mrelocation-model pic -pic-level 2 -mthread-model posix -fmerge-all-constants -mdisable-fp-elim -relaxed-aliasing -no-integrated-as -mpie-copy-relocations -munwind-tables -fcuda-is-device -target-feature +ptx60 -target-cpu sm_35 -dwarf-column-info -debug-info-kind=line-directives-only -dwarf-version=2 -debugger-tuning=gdb -o empty.s -x cuda empty.cc $ cat empty.s // // Generated by LLVM NVPTX Back-End // .version 6.0 .target sm_35 .address_size 64 } ``` llvm-svn: 351966
* [MC][X86] Correctly model additional operand latency caused by transfer ↵Andrea Di Biagio2019-01-2316-8/+68
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | delays from the integer to the floating point unit. This patch adds a new ReadAdvance definition named ReadInt2Fpu. ReadInt2Fpu allows x86 scheduling models to accurately describe delays caused by data transfers from the integer unit to the floating point unit. ReadInt2Fpu currently defaults to a delay of zero cycles (i.e. no delay) for all x86 models excluding BtVer2. That means, this patch is only a functional change for the Jaguar cpu model only. Tablegen definitions for instructions (V)PINSR* have been updated to account for the new ReadInt2Fpu. That read is mapped to the the GPR input operand. On Jaguar, int-to-fpu transfers are modeled as a +6cy delay. Before this patch, that extra delay was added to the opcode latency. In practice, the insert opcode only executes for 1cy. Most of the actual latency is actually contributed by the so-called operand-latency. According to the AMD SOG for family 16h, (V)PINSR* latency is defined by expression f+1, where f is defined as a forwarding delay from the integer unit to the fpu. When printing instruction latency from MCA (see InstructionInfoView.cpp) and LLC (only when flag -print-schedule is speified), we now need to account for any extra forwarding delays. We do this by checking if scheduling classes declare any negative ReadAdvance entries. Quoting a code comment in TargetSchedule.td: "A negative advance effectively increases latency, which may be used for cross-domain stalls". When computing the instruction latency for the purpose of our scheduling tests, we now add any extra delay to the formula. This avoids regressing existing codegen and mca schedule tests. It comes with the cost of an extra (but very simple) hook in MCSchedModel. Differential Revision: https://reviews.llvm.org/D57056 llvm-svn: 351965
* Fix indentation. NFCI.Simon Pilgrim2019-01-231-13/+13
| | | | llvm-svn: 351958
* [IR] Match intrinsic parameter by scalar/vectorwidthSimon Pilgrim2019-01-231-11/+14
| | | | | | | | | | | | | | This patch replaces the existing LLVMVectorSameWidth matcher with LLVMScalarOrSameVectorWidth. The matching args must be either scalars or vectors with the same number of elements, but in either case the scalar/element type can differ, specified by LLVMScalarOrSameVectorWidth. I've updated the _overflow intrinsics to demonstrate this - allowing it to return a i1 or <N x i1> overflow result, matching the scalar/vectorwidth of the other (add/sub/mul) result type. The masked load/store/gather/scatter intrinsics have also been updated to use this, although as we specify the reference type to be llvm_anyvector_ty we guarantee the mask will be <N x i1> so no change in behaviour Differential Revision: https://reviews.llvm.org/D57090 llvm-svn: 351957
* [Hexagon] Remove incorrect bit negationKrzysztof Parzyszek2019-01-231-1/+1
| | | | llvm-svn: 351956
* [AArch64] Fix out of bounds strlenBenjamin Kramer2019-01-231-2/+2
| | | | | | | | | CFIInst is not zero-terminated. This is one of more annoying functional differences between StringRef and ArrayRef. Found by asan. llvm-svn: 351955
* Move saturated arithmetic intrinsics to other integer intrinsics. NFCI.Simon Pilgrim2019-01-231-4/+4
| | | | | | They were in the floating point group. llvm-svn: 351953
* [AMDGPU] With XNACK, cannot clause a load with result coalesced with operandTim Renouf2019-01-231-0/+11
| | | | | | | | | | | | | | | | | | Summary: With XNACK, an smem load whose result is coalesced with an operand (thus it overwrites its own operand) cannot appear in a clause, because some other instruction might XNACK and restart the whole clause. The clause breaker already realized that an smem that overwrites an operand cannot appear in a clause, and broke the clause. The problem that this commit fixes is that the SIFormMemoryClauses optimization formed a bundle with early clobber, which caused the earlier code that set up the coalesced operand to be removed as dead. Differential Revision: https://reviews.llvm.org/D57008 Change-Id: I703c4d5b0bf7d6060222bec491f45c18bb3c0016 llvm-svn: 351950
* [HotColdSplitting] Remove unused SSAUpdater.h include (NFC).Florian Hahn2019-01-231-1/+0
| | | | llvm-svn: 351945
* [ARM] Alter the register allocation order for minsize on Thumb2David Green2019-01-231-4/+27
| | | | | | | | | | | | | | | Currently in Arm code, we allocate LR first, under the assumption that it needs to be saved anyway. Unfortunately this has the disadvantage that it will require any instructions using it to be the longer thumb2 instructions, not the shorter thumb1 ones. This switches the order when we are optimising for minsize, returning to the default order so that more lower registers can be used. It can end up requiring more pushed registers, but on average produces smaller code. Differential Revision: https://reviews.llvm.org/D56008 llvm-svn: 351938
* [ARM][CGP] Check trunc type before replacingSam Parker2019-01-231-7/+13
| | | | | | | | | | | | In the last stage of type promotion, we replace any zext that uses a new trunc with the operand of the trunc. This is okay when we only allowed one type to be optimised, but now its the case that the trunc maybe needed to produce a more narrow type than the one we were optimising for. So we need to check this before doing the replacement. Differential Revision: https://reviews.llvm.org/D57041 llvm-svn: 351935
* [DAGCombine] Enable more pre-indexed storesSam Parker2019-01-231-1/+7
| | | | | | | | | | | The current check in CombineToPreIndexedLoadStore is too conversative, preventing a pre-indexed store when the base pointer is a predecessor of the value being stored. Instead, we should check the pointer operand of the store. Differential Revision: https://reviews.llvm.org/D56719 llvm-svn: 351933
* [SLH] AArch64: correctly pick temporary register to mask SPKristof Beyls2019-01-231-57/+118
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As part of speculation hardening, the stack pointer gets masked with the taint register (X16) before a function call or before a function return. Since there are no instructions that can directly mask writing to the stack pointer, the stack pointer must first be transferred to another register, where it can be masked, before that value is transferred back to the stack pointer. Before, that temporary register was always picked to be x17, since the ABI allows clobbering x17 on any function call, resulting in the following instruction pattern being inserted before function calls and returns/tail calls: mov x17, sp and x17, x17, x16 mov sp, x17 However, x17 can be live in those locations, for example when the call is an indirect call, using x17 as the target address (blr x17). To fix this, this patch looks for an available register just before the call or terminator instruction and uses that. In the rare case when no register turns out to be available (this situation is only encountered twice across the whole test-suite), just insert a full speculation barrier at the start of the basic block where this occurs. Differential Revision: https://reviews.llvm.org/D56717 llvm-svn: 351930
* [SystemZ] Handle DBG_VALUE instructions in two places in backend.Jonas Paulsson2019-01-232-6/+10
| | | | | | | | | | | | | | | | | Two backend optimizations failed to handle cases when compiled with -g, due to failing to consider DBG_VALUE instructions. This was in SystemZTargetLowering::emitSelect() and SystemZElimCompare::getRegReferences(). This patch makes sure that DBG_VALUEs are recognized so that they do not affect these optimizations. Tests for branch-on-count, load-and-trap and consecutive selects. Review: Ulrich Weigand https://reviews.llvm.org/D57048 llvm-svn: 351928
* [IRCE] Support narrow latch condition for wide range checksMax Kazantsev2019-01-231-11/+44
| | | | | | | | | | | | | | | | | | | | This patch relaxes restrictions on types of latch condition and range check. In current implementation, they should match. This patch allows to handle wide range checks against narrow condition. The motivating example is the following: int N = ... for (long i = 0; (int) i < N; i++) { if (i >= length) deopt; } In this patch, the option that enables this support is turned off by default. We'll wait until it is switched to true. Differential Revision: https://reviews.llvm.org/D56837 Reviewed By: reames llvm-svn: 351926
* [Pipeliner] Add two pragmas to control software pipelining optimizationBrendon Cahoon2019-01-231-7/+76
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | #pragma clang loop pipeline(disable) Disable SWP optimization for the next loop. “disable” is the only possible value. #pragma clang loop pipeline_initiation_interval(number) Set value of initiation interval for SWP optimization to specified number value for the next loop. Number is the positive value greater than 0. These pragmas could be used for debugging or reducing compile time purposes. It is possible to disable SWP for concrete loops to save compilation time or to find bugs by not doing SWP to certain loops. It is possible to set value of initiation interval to concrete number to save compilation time by not doing extra pipeliner passes or to check created schedule for specific initiation interval. That is llvm part of the fix Clang part of fix: https://reviews.llvm.org/D55710 Patch by Alexey Lapshin! Differential Revision: https://reviews.llvm.org/D56403 llvm-svn: 351923
* hwasan: Move memory access checks into small outlined functions on aarch64.Peter Collingbourne2019-01-235-24/+172
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Each hwasan check requires emitting a small piece of code like this: https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html#memory-accesses The problem with this is that these code blocks typically bloat code size significantly. An obvious solution is to outline these blocks of code. In fact, this has already been implemented under the -hwasan-instrument-with-calls flag. However, as currently implemented this has a number of problems: - The functions use the same calling convention as regular C functions. This means that the backend must spill all temporary registers as required by the platform's C calling convention, even though the check only needs two registers on the hot path. - The functions take the address to be checked in a fixed register, which increases register pressure. Both of these factors can diminish the code size effect and increase the performance hit of -hwasan-instrument-with-calls. The solution that this patch implements is to involve the aarch64 backend in outlining the checks. An intrinsic and pseudo-instruction are created to represent a hwasan check. The pseudo-instruction is register allocated like any other instruction, and we allow the register allocator to select almost any register for the address to check. A particular combination of (register selection, type of check) triggers the creation in the backend of a function to handle the check for specifically that pair. The resulting functions are deduplicated by the linker. The pseudo-instruction (really the function) is specified to preserve all registers except for the registers that the AAPCS specifies may be clobbered by a call. To measure the code size and performance effect of this change, I took a number of measurements using Chromium for Android on aarch64, comparing a browser with inlined checks (the baseline) against a browser with outlined checks. Code size: Size of .text decreases from 243897420 to 171619972 bytes, or a 30% decrease. Performance: Using Chromium's blink_perf.layout microbenchmarks I measured a median performance regression of 6.24%. The fact that a perf/size tradeoff is evident here suggests that we might want to make the new behaviour conditional on -Os/-Oz. But for now I've enabled it unconditionally, my reasoning being that hwasan users typically expect a relatively large perf hit, and ~6% isn't really adding much. We may want to revisit this decision in the future, though. I also tried experimenting with varying the number of registers selectable by the hwasan check pseudo-instruction (which would result in fewer variants being created), on the hypothesis that creating fewer variants of the function would expose another perf/size tradeoff by reducing icache pressure from the check functions at the cost of register pressure. Although I did observe a code size increase with fewer registers, I did not observe a strong correlation between the number of registers and the performance of the resulting browser on the microbenchmarks, so I conclude that we might as well use ~all registers to get the maximum code size improvement. My results are below: Regs | .text size | Perf hit -----+------------+--------- ~all | 171619972 | 6.24% 16 | 171765192 | 7.03% 8 | 172917788 | 5.82% 4 | 177054016 | 6.89% Differential Revision: https://reviews.llvm.org/D56954 llvm-svn: 351920
* MemoryBlock: Do not automatically extend a given size to a multiple of page ↵Rui Ueyama2019-01-232-7/+9
| | | | | | | | | | | | | | | | | | | | | | | size. Previously, MemoryBlock automatically extends a requested buffer size to a multiple of page size because (I believe) doing it was thought to be harmless and with that you could get more memory (on average 2KiB on 4KiB-page systems) "for free". That programming interface turned out to be error-prone. If you request N bytes, you usually expect that a resulting object returns N for `size()`. That's not the case for MemoryBlock. Looks like there is only one place where we take the advantage of allocating more memory than the requested size. So, with this patch, I simply removed the automatic size expansion feature from MemoryBlock and do it on the caller side when needed. MemoryBlock now always returns a buffer whose size is equal to the requested size. Differential Revision: https://reviews.llvm.org/D56941 llvm-svn: 351916
* [CodeView] Allow empty types in member functionsJosh Stone2019-01-231-1/+4
| | | | | | | | | | | | | | | | | | | | | Summary: `CodeViewDebug::lowerTypeMemberFunction` used to default to a `Void` return type if the function's type array was empty. After D54667, it started blindly indexing the 0th item for the return type, which fails in `getOperand` for empty arrays if assertions are enabled. This patch restores the `Void` return type for empty type arrays, and adds a test generated by Rust in line-only debuginfo mode. Reviewers: zturner, rnk Reviewed By: rnk Subscribers: hiraditya, JDevlieghere, llvm-commits Differential Revision: https://reviews.llvm.org/D57070 llvm-svn: 351910
* Fixed isReMaterializable setting for LUI instruction.Ana Pazos2019-01-221-1/+2
| | | | llvm-svn: 351895
* [HotColdSplit] Calculate BFI lazily to reduce compile-time, NFCVedant Kumar2019-01-221-5/+12
| | | | | | | | | | The splitting pass does not need BFI unless the Module actually has a profile summary. Do not calcualte BFI unless the summary is present. For the sqlite3 amalgamation, this reduces time spent in the splitting pass from 0.4% of the total to under 0.1%. llvm-svn: 351894
OpenPOWER on IntegriCloud