bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	[llvm-rc] Support EXSTYLE statement.	Martin Storsjo	2018-11-29	7	-2/+37
\| \| \| \| \| \| \| \|	Patch by Jacek Caban! Differential Revision: https://reviews.llvm.org/D55020 llvm-svn: 347858
*	[llvm-mca][MC] Add the ability to declare which processor resources model ↵	Andrea Di Biagio	2018-11-29	6	-16/+137
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	load/store queues (PR36666). This patch adds the ability to specify via tablegen which processor resources are load/store queue resources. A new tablegen class named MemoryQueue can be optionally used to mark resources that model load/store queues. Information about the load/store queue is collected at 'CodeGenSchedule' stage, and analyzed by the 'SubtargetEmitter' to initialize two new fields in struct MCExtraProcessorInfo named `LoadQueueID` and `StoreQueueID`. Those two fields are identifiers for buffered resources used to describe the load queue and the store queue. Field `BufferSize` is interpreted as the number of entries in the queue, while the number of units is a throughput indicator (i.e. number of available pickers for loads/stores). At construction time, LSUnit in llvm-mca checks for the presence of extra processor information (i.e. MCExtraProcessorInfo) in the scheduling model. If that information is available, and fields LoadQueueID and StoreQueueID are set to a value different than zero (i.e. the invalid processor resource index), then LSUnit initializes its LoadQueue/StoreQueue based on the BufferSize value declared by the two processor resources. With this patch, we more accurately track dynamic dispatch stalls caused by the lack of LS tokens (i.e. load/store queue full). This is also shown by the differences in two BdVer2 tests. Stalls that were previously classified as generic SCHEDULER FULL stalls, are not correctly classified either as "load queue full" or "store queue full". About the differences in the -scheduler-stats view: those differences are expected, because entries in the load/store queue are not released at instruction issue stage. Instead, those are released at instruction executed stage. This is the main reason why for the modified tests, the load/store queues gets full before PdEx is full. Differential Revision: https://reviews.llvm.org/D54957 llvm-svn: 347857
*	Reapply "[llvm-mca] Return the total number of cycles from method ↵	Andrea Di Biagio	2018-11-28	3	-6/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pipeline::run()." This reapplies r347767 (originally reviewed at: https://reviews.llvm.org/D55000) with a fix for the missing std::move of the Error returned by the call to Pipeline::runCycle(). Below is the original commit message from r347767. If a user only cares about the overall latency, then the best/quickest way is to change method Pipeline::run() so that it returns the total number of cycles to the caller. When the simulation pipeline is run, the number of cycles (or an error) is returned from method Pipeline::run(). The advantage is that no hardware event listener is needed for computing that latency. So, the whole process should be faster (and simpler - at least for that particular use case). llvm-svn: 347795
*	Revert [llvm-mca] Return the total number of cycles from method Pipeline::run().	Andrea Di Biagio	2018-11-28	3	-9/+5
\| \| \| \| \| \|	This reverts commits 347767. llvm-svn: 347775
*	[llvm-mca] Return the total number of cycles from method Pipeline::run().	Andrea Di Biagio	2018-11-28	3	-5/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	If a user only cares about the overall latency, then the best/quickest way is to change method Pipeline::run() so that it returns the total number of cycles to the caller. When the simulation pipeline is run, the number of cycles (or an error) is returned from method Pipeline::run(). The advantage is that no hardware event listener is needed for computing that latency. So, the whole process should be faster (and simpler - at least for that particular use case). llvm-svn: 347767
*	[llvm-objcopy] Hook up the -V alias to --version, output "GNU strip"	Martin Storsjo	2018-11-28	3	-0/+4
\| \| \| \| \| \| \| \| \| \| \|	This allows libtool to detect the presence of llvm-strip and use it with the options --strip-debug and --strip-unneeded. Also hook up the -V alias for objcopy. Differential Revision: https://reviews.llvm.org/D54936 llvm-svn: 347731
*	[yaml2obj] Treat COFF/ARM64 as a 64 bit architecture	Martin Storsjo	2018-11-27	1	-1/+2
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D54935 llvm-svn: 347703
*	[cfi] Help sanstats to find binary if they are not at the original location	Vitaly Buka	2018-11-26	1	-1/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: By default sanstats search binaries at the same location where they were when stats was collected. Sometime you can not print report immediately or you need to move post-processing to another workstation. To support this use-case when original binary is missing sanstats will fall-back to directory with sanstats file. Reviewers: pcc Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D53857 llvm-svn: 347601
*	[cfi] Make sanstats print address of the check	Vitaly Buka	2018-11-26	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \|	Summary: Help with off-line symbolization or other type debugging. Reviewers: pcc Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D53606 llvm-svn: 347600
*	[llvm-mca] Add support for instructions with a variadic number of operands.	Andrea Di Biagio	2018-11-25	1	-18/+81
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	By default, llvm-mca conservatively assumes that a register operand from the variadic sequence is both a register read and a register write. That is because MCInstrDesc doesn't describe extra variadic operands; we don't have enough dataflow information to tell which register operands from the variadic sequence is a definition, and which is a use instead. However, if a variadic instruction is flagged 'mayStore' (but not 'mayLoad'), and it has no 'unmodeledSideEffects', then llvm-mca (very) optimistically assumes that any register operand in the variadic sequence is a register read only. Conversely, if a variadic instruction is marked as 'mayLoad' (but not 'mayStore'), and it has no 'unmodeledSideEffects', then llvm-mca optimistically assumes that any extra register operand is a register definition only. These assumptions work quite well for variadic load/store multiple instructions defined by the ARM backend. llvm-svn: 347522
*	[llvm-mca] InstrBuilder: warnings for call/ret instructions are only ↵	Andrea Di Biagio	2018-11-24	2	-4/+14
\| \| \| \| \| \|	reported once. llvm-svn: 347514
*	[llvm-mca] Refactor some of the logic in InstrBuilder, and add a ↵	Andrea Di Biagio	2018-11-23	2	-77/+118
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	verifyOperands method. With this change, InstrBuilder emits an error if the MCInst sequence contains an instruction with a variadic opcode, and a non-zero number of variadic operands. Currently we don't know how to correctly analyze variadic opcodes. The problem with variadic operands is that there is no information for them in the opcode descriptor (i.e. MCInstrDesc). That means, we don't know which variadic operands are defs, and which are uses. In future, we could try to conservatively assume that any extra register operands is both a register use and a register definition. This patch fixes a subtle bug in the evaluation of read/write operands for ARM VLD1 with implicit index update. Added test vld1-index-update.s llvm-svn: 347503
*	Revert r347490 as it breaks address sanitizer builds	Luke Cheeseman	2018-11-23	3	-58/+41
\| \| \| \|	llvm-svn: 347499
*	[llvm-mca][View] Improved Retire Control Unit Statistics.	Andrea Di Biagio	2018-11-23	3	-16/+59
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	RetireControlUnitStatistics now reports extra information about the ROB and the avg/maximum number of entries consumed over the entire simulation. Example: Retire Control Unit - number of cycles where we saw N instructions retired: [# retired], [# cycles] 0, 109 (17.9%) 1, 102 (16.7%) 2, 399 (65.4%) Total ROB Entries: 64 Max Used ROB Entries: 35 ( 54.7% ) Average Used ROB Entries per cy: 32 ( 50.0% ) Documentation in llvm/docs/CommandGuide/llvmn-mca.rst has been updated to reflect this change. llvm-svn: 347493
*	Revert r343341	Luke Cheeseman	2018-11-23	3	-41/+58
\| \| \| \| \| \| \|	- Cannot reproduce the build failure locally and the build logs have been deleted. llvm-svn: 347490
*	[llvm-mca] LSUnit: use a SmallSet to model load/store queues. NFCI	Andrea Di Biagio	2018-11-22	2	-25/+32
\| \| \| \| \| \| \| \| \| \|	Also, try to minimize the number of queries to the memory queues to speedup the analysis. On average, this change gives a small 2% speedup. For memcpy-like kernels, the speedup is up to 5.5%. llvm-svn: 347469
*	[llvm-mca] Use a SmallVector instead of std::vector to track register ↵	Andrea Di Biagio	2018-11-22	2	-9/+11
\| \| \| \| \| \| \| \| \| \|	reads/writes. NFCI This avoids a heap allocation most of the times. This patch gives a small but consistent 3% speedup on a release build (up to ~5% on a debug build). llvm-svn: 347464
*	[llvm-mca] Fix an invalid memory read introduced by r346487.	Andrea Di Biagio	2018-11-22	3	-16/+49
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes an invalid memory read introduced by r346487. Before this patch, partial register write had to query the latency of the dependent full register write by calling a method on the full write descriptor. However, if the full write is from an already retired instruction, chances are that the EntryStage already reclaimed its memory. In some parial register write tests, valgrind was reporting an invalid memory read. This change fixes the invalid memory access problem. Writes are now responsible for tracking dependent partial register writes, and notify them in the event of instruction issued. That means, partial register writes no longer need to query their associated full write to check when they are ready to execute. Added test X86/BtVer2/partial-reg-update-7.s llvm-svn: 347459
*	[llvm-size] Use empty() and range-based for loop. NFC	Fangrui Song	2018-11-22	1	-5/+5
\| \| \| \|	llvm-svn: 347441
*	[llvm-exegesis][NFC] Some code style cleanup	Jinsong Ji	2018-11-20	1	-201/+238
\| \| \| \| \| \| \| \| \| \| \| \| \|	Apply review comments of https://reviews.llvm.org/D54185 to other target as well, specifically: 1. make anonymous namespaces as small as possible, avoid using static inside anonymous namespaces 2. Add missing header to some files 3. GetLoadImmediateOpcodem-> getLoadImmediateOpcode 4. Fix typo Differential Revision: https://reviews.llvm.org/D54343 llvm-svn: 347309
*	[DebugInfo] DISubprogram flags get their own flags word. NFC.	Paul Robinson	2018-11-19	1	-5/+6
\| \| \| \| \| \| \| \| \| \| \| \| \|	This will hold flags specific to subprograms. In the future we could potentially free up scarce bits in DIFlags by moving subprogram-specific flags from there to the new flags word. This patch does not change IR/bitcode formats, that will be done in a follow-up. Differential Revision: https://reviews.llvm.org/D54597 llvm-svn: 347239
*	[llvm-nm] Fix use-after-free for MachOUniversalBinaries	Francis Visoiu Mistrih	2018-11-19	1	-1/+2
\| \| \| \| \| \| \| \| \|	MachOObjectFile::getHostArch() returns a temporary, and getArchName returns a StringRef pointing to a temporary std::string. No tests since it doesn't trigger any errors except with the sanitizers. llvm-svn: 347230
*	[llvm-exegesis][NFC] More tests for ExegesisTarget::fillMemoryOperands().	Clement Courbet	2018-11-19	2	-7/+9
\| \| \| \| \| \| \| \| \| \|	Reviewers: gchatelet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54304 llvm-svn: 347209
*	Subject: [PATCH] [CodeGen] Add pass to combine interleaved loads.	Martin Elshuber	2018-11-19	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch defines an interleaved-load-combine pass. The pass searches for ShuffleVector instructions that represent interleaved loads. Matches are converted such that they will be captured by the InterleavedAccessPass. The pass extends LLVMs capabilities to use target specific instruction selection of interleaved load patterns (e.g.: ld4 on Aarch64 architectures). Differential Revision: https://reviews.llvm.org/D52653 llvm-svn: 347208
*	[llvm-exegesis] (+final perf overview) ↵	Roman Lebedev	2018-11-19	2	-3/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	InstructionBenchmarkClustering::rangeQuery(): reserve for the upper bound of Neighbors Summary: As it was pointed out in D54388+D54390, the maximal size of `Neighbors` is known, it will contain at most Points_.size() minus one (the center of the cluster) While that is the upper bound, meaning in the most cases, the actual count will be much smaller, since D54390 made the allocation persistent, we no longer have to worry about overly-optimistically `reserve()`ing. Old: (D54393) ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs): 6553.167456 task-clock (msec) # 1.000 CPUs utilized ( +- 0.21% ) ... 6.5547 +- 0.0134 seconds time elapsed ( +- 0.20% ) ``` New: ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs): 6315.057872 task-clock (msec) # 0.999 CPUs utilized ( +- 0.24% ) ... 6.3187 +- 0.0160 seconds time elapsed ( +- 0.25% ) ``` And that is another -~4%. Since this is the last (as of this moment) patch in this patch series, it is a good time to summarize: Old: (svn trunk, as stated in D54381) ``` $ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null real 0m24.884s user 0m24.099s sys 0m0.785s ``` So these patches, on a given benchmark, has decreased llvm-exegesis analysis time by 74.62%. There surely is more room for further improvements. D54514 may improve thins by -11.5% more (relative to this patch). Parallelization may improve things further significantly, too. Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn Reviewed By: courbet, MaskRay Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54415 llvm-svn: 347204
*	[llvm-exegesis] Move InstructionBenchmarkClustering::isNeighbour() into header	Roman Lebedev	2018-11-19	2	-12/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Old: (D54390) ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs): 7432.421721 task-clock (msec) # 1.000 CPUs utilized ( +- 0.15% ) ... 7.4336 +- 0.0115 seconds time elapsed ( +- 0.15% ) ``` New: ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs): 6569.936144 task-clock (msec) # 1.000 CPUs utilized ( +- 0.22% ) ... 6.5711 +- 0.0143 seconds time elapsed ( +- 0.22% ) ``` And another -12%. You'd think it would be `inline`d anyway, but no! :) Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn Reviewed By: courbet, MaskRay Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54393 llvm-svn: 347203
*	[llvm-exegesis] InstructionBenchmarkClustering::rangeQuery(): write into ↵	Roman Lebedev	2018-11-19	2	-7/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	llvm::SmallVectorImpl& output parameter Summary: I do believe this is the correct fix. We call `rangeQuery()` very often. And many times it's output vector is large (tens of thousands entries), so small-size-opt won't help. Old: (D54389) ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs): 7934.528363 task-clock (msec) # 1.000 CPUs utilized ( +- 0.19% ) ... 7.9354 +- 0.0148 seconds time elapsed ( +- 0.19% ) ``` New: ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs): 7383.793440 task-clock (msec) # 1.000 CPUs utilized ( +- 0.47% ) ... 7.3868 +- 0.0340 seconds time elapsed ( +- 0.46% ) ``` And another -7%. And that isn't even the good bit yet. Old: * calls to allocation functions: 2081419 * temporary allocations: 219658 (10.55%) * bytes allocated in total (ignoring deallocations): 4.31 GB New: * calls to allocation functions: 1880295 (-10%) * temporary allocations: 18758 (1%) (-91% sic) * bytes allocated in total (ignoring deallocations): 545.15 MB (-88% sic) Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn Reviewed By: courbet, MaskRay Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54390 llvm-svn: 347202
*	[llvm-exegesis] InstructionBenchmarkClustering::dbScan(): replace ↵	Roman Lebedev	2018-11-19	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	std::vector<> with std::deque<> in llvm::SetVector<> Summary: Old: (D54388) ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs): 8606.323981 task-clock (msec) # 1.000 CPUs utilized ( +- 0.11% ) ... 8.60773 +- 0.00978 seconds time elapsed ( +- 0.11% ) ``` New: ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs): 7971.403653 task-clock (msec) # 1.000 CPUs utilized ( +- 0.14% ) ... 7.9728 +- 0.0113 seconds time elapsed ( +- 0.14% ) ``` Another -~7%. Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn Reviewed By: courbet, RKSimon Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54389 llvm-svn: 347201
*	[llvm-exegesis] InstructionBenchmarkClustering::rangeQuery(): use ↵	Roman Lebedev	2018-11-19	2	-3/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	llvm::SmallVector<size_t, 0> for storage. Summary: Old: (D54383) ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs): 9098.781978 task-clock (msec) # 1.000 CPUs utilized ( +- 0.16% ) ... 9.1015 +- 0.0148 seconds time elapsed ( +- 0.16% ) ``` New: ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs): 8553.352480 task-clock (msec) # 1.000 CPUs utilized ( +- 0.12% ) ... 8.5539 +- 0.0105 seconds time elapsed ( +- 0.12% ) ``` So another -6%. That is because the `SmallVector` doubles it size when reallocating, which is great here, since we can't `reserve()` since we can't know how many `Neighbors` we will have. Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54388 llvm-svn: 347200
*	[llvm-exegesis] Analysis: writeMeasurementValue(): don't alloc string for ↵	Roman Lebedev	2018-11-19	1	-1/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	double each time. Summary: Test data: 500kLOC of benchmark.yaml, 23Mb. (that is a subset of the actual uops benchmark i was trying to analyze!) Old time: (D54382) ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs): 9024.354355 task-clock (msec) # 1.000 CPUs utilized ( +- 0.18% ) ... 9.0262 +- 0.0161 seconds time elapsed ( +- 0.18% ) ``` New time: ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs): 8996.541057 task-clock (msec) # 0.999 CPUs utilized ( +- 0.19% ) ... 9.0045 +- 0.0172 seconds time elapsed ( +- 0.19% ) ``` -~0.3%, not that much. But this isn't the important part. Old: * calls to allocation functions: 2109712 * temporary allocations: 33112 * bytes allocated in total (ignoring deallocations): 4.43 GB New: * calls to allocation functions: 2095345 (-0.68%) * temporary allocations: 18745 (-43.39% !!!) * bytes allocated in total (ignoring deallocations): 4.31 GB (-2.71%) Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn Reviewed By: courbet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54383 llvm-svn: 347199
*	[llvm-exegesis] Analysis::writeSnippet(): be smarter about memory allocations.	Roman Lebedev	2018-11-19	1	-5/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Test data: 500kLOC of benchmark.yaml, 23Mb. (that is a subset of the actual uops benchmark i was trying to analyze!) Old time: (D54381) ``` $ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null real 0m10.487s user 0m9.745s sys 0m0.740s ``` New time: ``` $ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null real 0m9.599s user 0m8.824s sys 0m0.772s ``` Not that much, around -9%. But that is not the good part yet, again. Old: * calls to allocation functions: 3347676 * temporary allocations: 277818 * bytes allocated in total (ignoring deallocations): 10.52 GB New: * calls to allocation functions: 2109712 (-36%) * temporary allocations: 33112 (-88%) * bytes allocated in total (ignoring deallocations): 4.43 GB (-58% sic) Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn Reviewed By: courbet, MaskRay Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54382 llvm-svn: 347198
*	[llvm-exegesis] InstructionBenchmarkClustering::dbScan(): use ↵	Roman Lebedev	2018-11-19	1	-3/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	llvm::SetVector<> instead of ILLEGAL std::unordered_set<> Summary: Test data: 500kLOC of benchmark.yaml, 23Mb. (that is a subset of the actual uops benchmark i was trying to analyze!) Old time: ``` $ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null real 0m24.884s user 0m24.099s sys 0m0.785s ``` New time: ``` $ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null real 0m10.469s user 0m9.797s sys 0m0.672s ``` So -60%. And that isn't the good bit yet. Old: * calls to allocation functions: 106560180 (yes, 107 million allocations.) * bytes allocated in total (ignoring deallocations): 12.17 GB New: * calls to allocation functions: 3347676 (-96.86%) (just 3 mil) * bytes allocated in total (ignoring deallocations): 10.52 GB (~2GB less) --- Two points i want to raise: * `std::unordered_set<>` should not have been used there in the first place. It is banned by the https://llvm.org/docs/ProgrammersManual.html#other-set-like-container-options * There is no tests, so i'm not fully sure this is correct. Since it was unordered set, i guess there are zero restrictions on the order, and anything will be ok? * I tried other containers suggested in https://llvm.org/docs/ProgrammersManual.html#set-like-containers-std-set-smallset-setvector-etc, this `llvm::SetVector<>` seems to be best here. Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn Reviewed By: courbet Subscribers: kristina, bobsayshilol, tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54381 llvm-svn: 347197
*	[llvm-objdump] Print a blank row at the end of sections	Xing GUO	2018-11-17	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: When using option `-x` (--all-headers), it will print `Sections`, `Symbol Table`, `Program Header` ... `Sections` and `Symbol Table` will be connected together. Before: ``` Sections: Idx Name Size Address Type 0 00000000 0000000000000000 ... 29 .shstrtab 0000011a 0000000000000000 SYMBOL TABLE: ... ``` After: ``` Sections: Idx Name Size Address Type 0 00000000 0000000000000000 ... 29 .shstrtab 0000011a 0000000000000000 SYMBOL TABLE: ... ``` Reviewers: Higuoxing Reviewed By: Higuoxing Subscribers: llvm-commits, jhenderson Differential Revision: https://reviews.llvm.org/D54665 llvm-svn: 347135
*	Use llvm::copy. NFC	Fangrui Song	2018-11-17	1	-5/+5
\| \| \| \|	llvm-svn: 347126
*	[llvm-objcopy] Use llvm::all_of and rename the variables "Segment" to avoid ↵	Fangrui Song	2018-11-17	1	-6/+6
\| \| \| \| \| \|	confusion with the type of the same name llvm-svn: 347123
*	[llvm-objdump] Use `auto` declaration in typecasting	Xing GUO	2018-11-15	1	-16/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: According to `MaskRay`, use `auto` for type inference, according to coding standards. Delete some comments, because these comments can be easily inferred from codes. Reviewers: jhenderson, MaskRay Reviewed By: jhenderson Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54573 llvm-svn: 346946
*	[WebAssembly] Add support for dylink section in object format	Sam Clegg	2018-11-14	2	-2/+25
\| \| \| \| \| \| \| \|	See https://github.com/WebAssembly/tool-conventions/blob/master/DynamicLinking.md. Differential Revision: https://reviews.llvm.org/D54490 llvm-svn: 346880
*	[llvm-objdump] Improve ELF file type checking statements (D54509)	Xing GUO	2018-11-14	1	-12/+6
\| \| \| \|	llvm-svn: 346851
*	[WebAssembly] Add support for the event section	Heejin Ahn	2018-11-14	3	-18/+51
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This adds support for the 'event section' specified in the exception handling proposal. (This was named 'exception section' first, but later renamed to 'event section' to take possibilities of other kinds of events into consideration. But currently we only store exception info in this section.) The event section is added between the global section and the export section. This is for ease of validation per request of the V8 team. This patch: - Creates the event symbol type, which is a weak symbol - Makes 'throw' instruction take the event symbol '__cpp_exception' - Adds relocation support for events - Adds WasmObjectWriter / WasmObjectFile (Reader) support - Adds obj2yaml / yaml2obj support - Adds '.eventtype' printing support Reviewers: dschuff, sbc100, aardappel Subscribers: jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D54096 llvm-svn: 346825
*	Make dsymutil more robust when parsing load commands.	Adrian Prantl	2018-11-13	1	-14/+19
\| \| \| \| \| \|	rdar://problem/45883463 llvm-svn: 346815
*	[llvm-objcopy] Rename --keep to --keep-section.	Jordan Rupprecht	2018-11-13	5	-9/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: llvm-objcopy/strip support `--keep` (for sections) and `--keep-symbols` (for symbols). For consistency and clarity, rename `--keep` to `--keep-section`. In fact, for GNU compatability, -K is --keep-symbol, so it's weird that the alias `-K` is not the same as the short-ish `--keep`. Reviewers: jakehehrlich, jhenderson, alexshap, MaskRay, espindola Reviewed By: jakehehrlich, MaskRay Subscribers: emaste, arichardson, llvm-commits Differential Revision: https://reviews.llvm.org/D54477 llvm-svn: 346782
*	[IR] Add a dedicated FNeg IR Instruction	Cameron McInally	2018-11-13	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \|	The IEEE-754 Standard makes it clear that fneg(x) and fsub(-0.0, x) are two different operations. The former is a bitwise operation, while the latter is an arithmetic operation. This patch creates a dedicated FNeg IR Instruction to model that behavior. Differential Revision: https://reviews.llvm.org/D53877 llvm-svn: 346774
*	[libObject] Fix getDesc for Elf_Note_Impl	Jake Ehrlich	2018-11-13	1	-31/+30
\| \| \| \| \| \| \|	This change fixes a bug in Elf_Note_Impl in which Elf_Word was used where uint8_t should have been used. llvm-svn: 346724
*	[llvm-objcopy] Don't copy Config when processing --keep	Fangrui Song	2018-11-12	1	-1/+1
\| \| \| \|	llvm-svn: 346717
*	[llvm-readelf] Make llvm-readelf more compatible with GNU readelf.	Jordan Rupprecht	2018-11-12	7	-79/+125
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This change adds a bunch of options that GNU readelf supports. There is one breaking change when invoked as `llvm-readobj`, and three breaking changes when invoked as `llvm-readelf`: - Add --all (implies --file-header, --program-headers, etc.) - [Breaking] -a is --all instead of --arm-attributes - Add --file-header as an alias for --file-headers - Replace --sections with --sections-headers, keeping --sections as an alias for it - Add --relocs as an alias for --relocations - Add --dynamic as an alias for --dynamic-table - Add --segments as an alias for --program-headers - Add --section-groups as an alias for --elf-section-groups - Add --dyn-syms as an alias for --dyn-symbols - Add --syms as an alias for --symbols - Add --histogram as an alias for --elf-hash-histogram - [Breaking] When invoked as `llvm-readelf`, -s is --symbols instead of --sections - [Breaking] When invoked as `llvm-readelf`, -t is no longer an alias for --symbols Reviewers: MaskRay, phosek, mcgrathr, jhenderson Reviewed By: MaskRay, jhenderson Subscribers: sbc100, aheejin, edd, jhenderson, silvas, echristo, compnerd, kristina, javed.absar, kristof.beyls, llvm-commits, Bigcheese Differential Revision: https://reviews.llvm.org/D54124 llvm-svn: 346685
*	[llvm-mca] Correctly update the resource strategy for processor resources ↵	Andrea Di Biagio	2018-11-12	1	-1/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	with multiple units. When looking at the tests committed by Roman at r346587, I noticed that numbers reported by the resource pressure for PdAGU01 were wrong. In particular, according to the aut-generated CHECK lines in tests memcpy-like-test.s and store-throughput.s, resource pressure for PdAGU01 was not uniformly distributed among the two AGEN pipes. It turns out that the reason why pressure was not correctly distributed, was because the "resource selection strategy" object associated with PdAGU01 was not correctly updated on the event of AGEN pipe used. As a result, llvm-mca was not simulating a round-robin pipeline allocation for PdAGU01. Instead, PdAGU1 was always prioritized over PdAGU0. This patch fixes the issue; now processor resource strategy objects for resources declaring multiple units, are correctly notified in the event of "resource used". llvm-svn: 346650
*	[newpm] Fix r346645: Missing consume of the Error return by the pipeline parser	Philip Pfaffe	2018-11-12	1	-2/+3
\| \| \| \|	llvm-svn: 346649
*	Add an OptimizerLast EP	Philip Pfaffe	2018-11-12	1	-0/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: It turns out that we need an OptimizerLast PassBuilder extension point after all. I missed the relevance of this EP the first time. By legacy PM magic, function passes added at this EP get added to the last _Function_ PM, which is a feature we lost when dropping this EP for the new PM. A key difference between this and the legacy PassManager's OptimizerLast callback is that this extension point is not triggered at O0. Extensions to the O0 pipeline should append their passes to the end of the overall pipeline. Differential Revision: https://reviews.llvm.org/D54374 llvm-svn: 346645
*	[llvm-nm] Use WithColor for error reporting	Jonas Devlieghere	2018-11-11	1	-7/+8
\| \| \| \| \| \|	Use helpers from Support/WithError.h to print errors. llvm-svn: 346624
*	[llvm-objdump] Use WithColor for error reporting	Jonas Devlieghere	2018-11-11	3	-68/+95
\| \| \| \| \| \|	Use helpers from Support/WithError.h to print errors. llvm-svn: 346623