| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
When no scope qualifier is specified, allow completing index symbols
from any scope and insert proper automatically. This is still experimental and
hidden behind a flag.
Things missing:
- Scope proximity based scoring.
- FuzzyFind supports weighted scopes.
Reviewers: sammccall
Reviewed By: sammccall
Subscribers: kbobyrev, ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D52364
llvm-svn: 343248
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: sammccall
Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D52611
llvm-svn: 343247
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* With the current implementation, `sizeof(std::vector<Chunk>)` is added
twice to the `Dex` memory estimate which is incorrect
* `Dex` logs memory usage estimation before `BackingDataSize` is set and
hence the log report excludes size of the external `SymbolSlab` which is
coupled with `Dex` instance
Reviewed By: ioeric
Differential Revision: https://reviews.llvm.org/D52503
llvm-svn: 343117
|
|
|
|
|
|
|
|
|
|
|
|
| |
Because `PostingList` objects are compressed, it is now impossible to
see elements other than the current one and the documentation doesn't
match implementation anymore.
Reviewed By: ioeric
Differential Revision: https://reviews.llvm.org/D52545
llvm-svn: 343116
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Interface is in one file, implementation in two as they have little in common.
A couple of ad-hoc YAML functions left exposed:
- symbol -> YAML I expect to keep for tools like dexp
- YAML -> symbol is used for the MR-style indexer, I think we can eliminate
this (merge-on-the-fly, else use a different serialization)
Reviewers: kbobyrev
Subscribers: mgorny, ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D52453
llvm-svn: 342999
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For consistency, functional-style code pieces are replaced with their
simple counterparts to improve readability.
Also, file headers are fixed to comply with LLVM Coding Standards.
`static` member of anonymous namespace is not marked `static` anymore,
because it is redundant.
Reviewed By: sammccall
Differential Revision: https://reviews.llvm.org/D52466
llvm-svn: 342974
|
|
|
|
|
|
| |
Some compilers fail to parse struct default member initializer.
llvm-svn: 342970
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch implements Variable-length Byte compression of `PostingList`s
to sacrifice some performance for lower memory consumption.
`PostingList` compression and decompression was extensively tested using
fuzzer for multiple hours and runnning significant number of realistic
`FuzzyFindRequests`. AddressSanitizer and UndefinedBehaviorSanitizer
were used to ensure the correct behaviour.
Performance evaluation was conducted with recent LLVM symbol index (292k
symbols) and the collection of user-recorded queries (7751
`FuzzyFindRequest` JSON dumps):
| Metrics | Before| After | Change (%)
| ----- | ----- | ----- | -----
| Memory consumption (posting lists only), MB | 54.4 | 23.5 | -60%
| Time to process queries, sec | 7.70 | 9.4 | +25%
Reviewers: sammccall, ioeric
Reviewed By: sammccall
Differential Revision: https://reviews.llvm.org/D52300
llvm-svn: 342965
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
`Dex` should utilize `FuzzyFindRequest.RestrictForCodeCompletion` flags
and omit symbols not meant for code completion when asked for it.
The measurements below were conducted with setting
`FuzzyFindRequest.RestrictForCodeCompletion` to `true` (so that it's
more realistic). Sadly, the average latency goes down, I suspect that is
mostly because of the empty queries where the number of posting lists is
critical.
| Metrics | Before | After | Relative difference
| ----- | ----- | ----- | -----
| Cumulative query latency (7000 `FuzzyFindRequest`s over LLVM static index) | 6182735043 ns | 7202442053 ns | +16%
| Whole Index size | 81.24 MB | 81.79 MB | +0.6%
Out of 292252 symbols collected from LLVM codebase 136926 appear to be
restricted for code completion.
Reviewers: ioeric
Differential Revision: https://reviews.llvm.org/D52357
llvm-svn: 342866
|
|
|
|
| |
llvm-svn: 342505
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
We can use cl::ResetCommandLineParser() to support different types of
command-lines, as long as we're careful about option lifetimes.
(I tried using subcommands, but the error messages were bad)
I found a mostly-reasonable pattern to isolate the fiddly parts.
Added -scope and -limit flags to the `find` command to demonstrate.
(Note that scope support seems to be broken in dex?)
Fixed symbol lookup to parse symbol IDs.
Caveats:
- with command help (e.g. `find -help`), you also get some spam
about required arguments. This is a bug in llvm::cl, which prints
these to errs() rather than the designated stream.
Reviewers: kbobyrev
Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D51989
llvm-svn: 342456
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch abstracts `PostingList` interface and reuses existing
implementation. It will be used later to test different `PostingList`
representations.
No functionality change is introduced, this patch is mostly refactoring
so that the following patches could focus on functionality while not
being too hard to review.
Reviewed By: sammccall, ioeric
Differential Revision: https://reviews.llvm.org/D51982
llvm-svn: 342155
|
|
|
|
|
|
| |
%s/MaxCandidateCount/Limit/g after rL342138.
llvm-svn: 342143
|
|
|
|
|
|
|
|
|
|
|
| |
As discussed during D51860 review, it is better to use `llvm::Optional`
here as it has clear semantics which reflect intended behavior.
Reviewed By: sammccall
Differential Revision: https://reviews.llvm.org/D52028
llvm-svn: 342138
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
`AND( AND( Child ) ... )` -> `AND( Child ... )`
`AND( OR( Child ) ... )` -> `AND( Child ... )`
This simple optimization results in 5-6% performance improvement in the
benchmark with 2000 serialized `FuzzyFindRequest`s.
Reviewed By: ilya-biryukov
Differential Revision: https://reviews.llvm.org/D52016
llvm-svn: 342124
|
|
|
|
|
|
| |
Without this, builds with `-DSHARED_LIB=ON` fail.
llvm-svn: 342037
|
|
|
|
|
|
|
|
| |
Reviewed By: sammccall, ilya-biryukov
Differential Revision: https://reviews.llvm.org/D51628
llvm-svn: 342025
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, `SymbolIndex::estimateMemoryUsage()` returns the "overhead"
estimate, i.e. the estimate of the Index data structure excluding
backing data (such as Symbol Slab and Reference Slab). This patch
propagates information about paired data size where necessary.
Reviewed By: ioeric, sammccall
Differential Revision: https://reviews.llvm.org/D51539
llvm-svn: 341800
|
|
|
|
|
|
|
|
|
|
| |
Also, cleanup some redundant includes.
Reviewed By: sammccall
Differential Revision: https://reviews.llvm.org/D51774
llvm-svn: 341784
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the current element is already beyond advanceTo()'s DocID, just
return instead of doing binary search. This simple optimization saves up
to 6-7% performance,
Reviewed By: ilya-biryukov
Differential Revision: https://reviews.llvm.org/D51802
llvm-svn: 341781
|
|
|
|
|
|
| |
Get rid of "../" and "../../".
llvm-svn: 341645
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch sets URI schemes of Dex to SymbolCollector's default schemes
in case callers tried to pass empty list of schemes. This was the case
for initialization in Clangd main and was a reason of incorrect
behavior.
Also, it fixes a bug with missed `continue;` after spotting invalid URI
scheme conversion.
llvm-svn: 341552
|
|
|
|
|
|
|
|
|
|
| |
Code health: prevent implicit conversions to user-defined types.
Reviewed By: sammccall
Differential Revision: https://reviews.llvm.org/D51690
llvm-svn: 341543
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch introduces `PathURI` Search Token kind and utilizes it to
uprank symbols which are defined in files with small distance to the
directory where the fuzzy find request is coming from (e.g. files user
is editing).
Reviewed By: ioeric
Reviewers: ioeric, sammccall
Differential Revision: https://reviews.llvm.org/D51481
llvm-svn: 341542
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
A few things that I noticed while merging the SwapIndex patch:
- SymbolOccurrences and particularly SymbolOccurrenceSlab are unwieldy names,
and these names appear *a lot*. Ref, RefSlab, etc seem clear enough
and read/format much better.
- The asymmetry between SymbolSlab and RefSlab (build() vs freeze()) is
confusing and irritating, and doesn't even save much code.
Avoiding RefSlab::Builder was my idea, but it was a bad one; add it.
- DenseMap<SymbolID, ArrayRef<Ref>> seems like a reasonable compromise for
constructing MemIndex - and means many less wasted allocations than the
current DenseMap<SymbolID, vector<Ref*>> for FileIndex, and none for
slabs.
- RefSlab::find() is not actually used for anything, so we can throw
away the DenseMap and keep the representation much more compact.
- A few naming/consistency fixes: e.g. Slabs,Refs -> Symbols,Refs.
Reviewers: ioeric
Subscribers: ilya-biryukov, MaskRay, jkorous, mgrang, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D51605
llvm-svn: 341368
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This is now handled by a wrapper class SwapIndex, so MemIndex/DexIndex can be
immutable and focus on their job.
Old and busted:
I have a MemIndex, which holds a shared_ptr<vector<Symbol*>>, which keeps the
symbol slab alive. I update by calling build(shared_ptr<vector<Symbol*>>).
New hotness: I have a SwapIndex, which holds a unique_ptr<SymbolIndex>, which
holds a MemIndex, which holds a shared_ptr<void>, which keeps backing
data alive.
I update by building a new MemIndex and calling SwapIndex::reset().
Reviewers: kbobyrev, ioeric
Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, mgrang, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D51422
llvm-svn: 341318
|
|
|
|
| |
llvm-svn: 341273
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Use consistent assertion messages in iterators implementations
* Silence a bunch of clang-tidy warnings: use `emplace_back` instead of
`push_back` where possible, make sure arguments have the same name in
header and implementation file, use for loop over ranges where possible
Reviewed by: ioeric
Differential Revision: https://reviews.llvm.org/D51528
llvm-svn: 341190
|
|
|
|
| |
llvm-svn: 341066
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch introduces iterator cost concept to improve the performance
of Dex query iterators (mainly, AND iterator). Benchmarks show that the
queries become ~10% faster.
Before
```
-------------------------------------------------------
Benchmark Time CPU Iteration
-------------------------------------------------------
DexAdHocQueries 5883074 ns 5883018 ns 117
DexRealQ 959904457 ns 959898507 ns 1
```
After
```
-------------------------------------------------------
Benchmark Time CPU Iteration
-------------------------------------------------------
DexAdHocQueries 5238403 ns 5238361 ns 130
DexRealQ 873275207 ns 873269453 ns 1
```
Reviewed by: sammccall
Differential Revision: https://reviews.llvm.org/D51310
llvm-svn: 341057
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Stop using `$$$` (empty) trigram and generating a posting list with all
items. Since TRUE iterator is already implemented and correctly inserted
when there are no real trigram posting lists, this is a valid
transformation.
Benchmarks show that this simple change allows ~30% speedup on dataset
of real completion queries.
Before
```
-------------------------------------------------------
Benchmark Time CPU Iterations
-------------------------------------------------------
DexAdHocQueries 5640321 ns 5640265 ns 120
DexRealQ 939835603 ns 939830296 ns 1
```
After
```
-------------------------------------------------------
Benchmark Time CPU Iterations
-------------------------------------------------------
DexAdHocQueries 3452014 ns 3451987 ns 203
DexRealQ 667455912 ns 667455750 ns 1
```
Reviewed by: ilya-biryukov
Differential Revision: https://reviews.llvm.org/D51287
llvm-svn: 340729
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch introduces LIMIT iterator, which is very important for
improving the quality of search query. LIMIT iterators can be applied on
top of BOOST iterators to prevent populating query request with a huge
number of low-quality symbols.
Reviewed by: sammccall
Differential Revision: https://reviews.llvm.org/D51029
llvm-svn: 340605
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch prints information about built index size estimation to
verbose logs. This is useful for optimizing memory usage of DexIndex and
comparisons with MemIndex.
Reviewed by: sammccall
Differential Revision: https://reviews.llvm.org/D51154
llvm-svn: 340601
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch introduces BOOST iterator - a substantial block for efficient
and high-quality symbol retrieval. The concept of boosting allows
performing computationally inexpensive scoring on the query side so that
the final (expensive) scoring can only be applied on the items with the
highest preliminary score while eliminating the need to score too many
items.
Reviewed by: ilya-biryukov
Differential Revision: https://reviews.llvm.org/D50970
llvm-svn: 340409
|
|
|
|
|
|
|
| |
The wrong diff that was uploaded to Phabricator was building the wrong
index.
llvm-svn: 340388
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds hidden Clangd flag ("use-dex-index") which replaces
(currently) default `MemIndex` with `DexIndex` for the static index.
Reviewed by: ioeric
Differential Revision: https://reviews.llvm.org/D50897
llvm-svn: 340262
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch is a proof-of-concept Dex index implementation. It has
several flaws, which don't allow replacing static MemIndex yet, such as:
* Not being able to handle queries of small size (less than 3 symbols);
a way to solve this is generating trigrams of smaller size and having
such incomplete trigrams in the index structure.
* Speed measurements: while manually editing files in Vim and requesting
autocompletion gives an impression that the performance is at least
comparable with the current static index, having actual numbers is
important because we don't want to hurt the users and roll out slow
code. Eric (@ioeric) suggested that we should only replace MemIndex as
soon as we have the evidence that this is not a regression in terms of
performance. An approach which is likely to be successful here is to
wait until we have benchmark library in the LLVM core repository, which
is something I have suggested in the LLVM mailing lists, received
positive feedback on and started working on. I will add a dependency as
soon as the suggested patch is out for a review (currently there's at
least one complication which is being addressed by
https://github.com/google/benchmark/pull/649). Key performance
improvements for iterators are sorting by cost and the limit iterator.
* Quality measurements: currently, boosting iterator and two-phase
lookup stage are not implemented, without these the quality is likely to
be worse than the current implementation can yield. Measuring quality is
tricky, but another suggestion in the offline discussion was that the
drop-in replacement should only happen after Boosting iterators
implementation (and subsequent query enhancement).
The proposed changes do not affect Clangd functionality or performance,
`DexIndex` is only used in unit tests and not in production code.
Reviewed by: ioeric
Differential Revision: https://reviews.llvm.org/D50337
llvm-svn: 340175
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Proposed changes:
* Cleanup comments in `clangd/index/dex/Iterator.h`: Vim's `gq`
formatting added redundant spaces instead of newlines in few
places
* Few comments in `OrIterator` are wrong
* Use `EXPECT_TRUE(Condition)` instead of
`EXPECT_THAT(Condition, true)` (same with `EXPECT_FALSE`)
* Don't expose `dump()` method to the public by misplacing
`private:`
This patch does not affect functionality.
Reviewed by: ioeric
Differential Revision: https://reviews.llvm.org/D50956
llvm-svn: 340157
|
|
|
|
|
|
|
|
|
|
|
| |
This patch introduces TRUE Iterator which efficiently handles posting
lists containing all items within `[0, Size)` range.
Reviewed by: ioeric
Differential Revision: https://reviews.llvm.org/D50955
llvm-svn: 340155
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch improves `dex::Iterator` string representation by
incorporating the information about the element which is currently being
pointed to by the `DocumentIterator`.
Reviewed by: ioeric
Differential Revision: https://reviews.llvm.org/D50689
llvm-svn: 339877
|
|
|
|
|
|
|
|
|
|
|
| |
This patch improves readability of multiple files in clang-tools-extra
and enforces LLVM Coding Guidelines.
Reviewed by: ioeric
Differential Revision: https://reviews.llvm.org/D50707
llvm-svn: 339687
|
|
|
|
| |
llvm-svn: 339557
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch handles trigram generation "short" identifiers and queries.
Trigram generator produces incomplete trigrams for short names so that
the same query iterator API can be used to match symbols which don't
have enough symbols to form a trigram and correctly handle queries which
also are not sufficient for generating a full trigram.
Reviewed by: ioeric
Differential revision: https://reviews.llvm.org/D50517
llvm-svn: 339548
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch modifies `consume` function to allow retrieval of limited
number of symbols. This is the "cheap" implementation of top-level
limiting iterator. In the future we would like to have a complete limit
iterator implementation to insert it into the query subtrees, but in the
meantime this version would be enough for a fully-functional
proof-of-concept Dex implementation.
Reviewers: ioeric, ilya-biryukov
Reviewed by: ioeric
Differential Revision: https://reviews.llvm.org/D50500
llvm-svn: 339426
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The original Dex Iterators patch (https://reviews.llvm.org/rL338017)
caused problems for Clang 3.6 and Clang 3.7 due to the compiler bug
which prevented inferring template parameter (`Size`) in create(And|Or)?
functions. It was reverted in https://reviews.llvm.org/rL338054.
In this revision the mentioned helper functions were replaced with
variadic templated versions.
Proposed changes were tested on multiple compiler versions, including
Clang 3.6 which originally caused the failure.
llvm-svn: 338116
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts two revisions:
* https://reviews.llvm.org/rL338017
* https://reviews.llvm.org/rL338028
They caused crash for Clang 3.6 & Clang 3.7 buildbots, it was
reported by Jeremy Morse.
llvm-svn: 338054
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch introduces three essential types of query iterators:
`DocumentIterator`, `AndIterator`, `OrIterator`. It provides a
convenient API for query tree generation and serves as a building block
for the next generation symbol index - Dex. Currently, many
optimizations are missed to improve code readability and to serve as the
reference implementation. Potential improvements are briefly mentioned
in `FIXME`s and will be addressed in the following patches.
Dex RFC in the mailing list:
http://lists.llvm.org/pipermail/clangd-dev/2018-July/000022.html
Iterators, their applications and potential extensions are explained in
detail in the design proposal:
https://docs.google.com/document/d/1C-A6PGT6TynyaX4PXyExNMiGmJ2jL1UwV91Kyx11gOI/edit#heading=h.903u1zon9nkj
Reviewers: ioeric, sammccall, ilya-biryukov
Subscribers: cfe-commits, klimek, jfb, mgrang, mgorny, MaskRay, jkorous,
arphaman
Differential Revision: https://reviews.llvm.org/D49546
llvm-svn: 338017
|
|
This patch introduces the core building block of the next-generation
Clangd symbol index - Dex. Search tokens are the keys in the inverted
index and represent a characteristic of a specific symbol: examples of
search token types (Token Namespaces) are
* Trigrams - these are essential for unqualified symbol name fuzzy
search * Scopes for filtering the symbols by the namespace * Paths, e.g.
these can be used to uprank symbols defined close to the edited file
This patch outlines the generic for such token namespaces, but only
implements trigram generation.
The intuition behind trigram generation algorithm is that each extracted
trigram is a valid sequence for Fuzzy Matcher jumps, proposed
implementation utilize existing FuzzyMatcher API for segmentation and
trigram extraction.
However, trigrams generation algorithm for the query string is different
from the previous one: it simply yields sequences of 3 consecutive
lowercased valid characters (letters, digits).
Dex RFC in the mailing list:
http://lists.llvm.org/pipermail/clangd-dev/2018-July/000022.html
The trigram generation techniques are described in detail in the
proposal:
https://docs.google.com/document/d/1C-A6PGT6TynyaX4PXyExNMiGmJ2jL1UwV91Kyx11gOI/edit#heading=h.903u1zon9nkj
Reviewers: sammccall, ioeric, ilya-biryukovA
Subscribers: cfe-commits, klimek, mgorny, MaskRay, jkorous, arphaman
Differential Revision: https://reviews.llvm.org/D49591
llvm-svn: 337901
|