Code Search Capabilities
Learn and understand more about Sourcegraph's Code Search features and core functionality.
Powerful, flexible queries
Sourcegraph code search performs full-text searches and supports both regular expression and exact queries. By default, Sourcegraph searches across all your repositories. Our search query syntax allows for advanced queries, such as searching over any branch or commit, narrowing searches by programming language or file pattern, and more.
See the query syntax and query language reference documentation for a comprehensive overview of supported syntax.
Data freshness
Searches scoped to specific repositories are always up-to-date. Sourcegraph automatically fetches repository contents with any user action specific to the repository and makes new commits and branches available for searching and browsing immediately.
Unscoped search results over large repository sets may trail the latest default-branch revisions by some interval of time. This interval depends on the number of repositories and the computational resources devoted to search indexing.
Max file size
By default, files larger than 1 MB are excluded from search results. Use the search.largeFiles site configuration setting to specify files to be indexed and searched regardless of size. Whatever you set in search.largeFiles, Sourcegraph continues to ignore binary files, even those smaller than the size limit.
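The rule above can be modeled in a few lines. This is only a sketch of the decision as described here, not Sourcegraph's implementation: the byte value used for "1 MB", the `is_searchable` helper, the `*.sql` pattern, and the use of Python's `fnmatch` for pattern matching are all assumptions made for illustration.

```python
from fnmatch import fnmatch

# Assumption: "1 MB" is treated as 1,048,576 bytes; the exact threshold
# is not stated in the text above.
MAX_FILE_SIZE_BYTES = 1 << 20


def is_searchable(path, size_bytes, is_binary, large_file_patterns=()):
    """Model the rule: binary files are never searched, and large files
    are searched only when a search.largeFiles pattern matches them."""
    if is_binary:
        return False
    if size_bytes <= MAX_FILE_SIZE_BYTES:
        return True
    # fnmatch stands in for Sourcegraph's own pattern matching here.
    return any(fnmatch(path, pattern) for pattern in large_file_patterns)


patterns = ["*.sql"]  # hypothetical search.largeFiles value
print(is_searchable("dump.sql", 5 << 20, False, patterns))  # True
print(is_searchable("trace.log", 5 << 20, False, patterns))  # False
print(is_searchable("image.sql", 100, True, patterns))  # False
```

The binary check comes first so that no pattern can override it, which mirrors the statement that binary files are ignored regardless of the setting.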
Language-aware structural code search
Sourcegraph supports advanced code search for specifically matching patterns inside code structures, like function parameters and loop bodies.
See the structural search documentation for a detailed explanation of this search mode.
Commit diff search
Search over commit diffs using type:diff to see how your codebase has changed over time. This is often used to find changes to particular functions, classes, or areas of the codebase when debugging.
You can also search within commit diffs on multiple branches by specifying the branches in a repo: field after the @ sign. After the @, separate Git refs with :, specify Git ref globs by prefixing them with *, and exclude a commit reachable from a ref by prefixing it with ^. Diff searches can be further narrowed down with parameters that filter by author and time.
See the query syntax documentation for a comprehensive list of supported parameters.
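As a rough illustration of the revision syntax described above, the following sketch assembles a repo: filter from plain refs, ref globs, and excluded refs. The `repo_filter` helper, the repository name, the refs, the author name, and the search term are all hypothetical; only the separators and prefixes come from the description above.

```python
def repo_filter(repo, refs=(), ref_globs=(), excluded_refs=()):
    """Build a repo: filter with an optional revision list after the @ sign."""
    revisions = list(refs)
    revisions += ["*" + glob for glob in ref_globs]    # globs get a * prefix
    revisions += ["^" + ref for ref in excluded_refs]  # exclusions get a ^ prefix
    if not revisions:
        return "repo:" + repo
    # Individual entries are separated with ":".
    return "repo:" + repo + "@" + ":".join(revisions)


# Hypothetical repository, refs, author, and search term.
query = " ".join([
    repo_filter(
        r"^github\.com/example/project$",
        refs=["main"],
        ref_globs=["refs/heads/release/*"],
        excluded_refs=["v1.0.0"],
    ),
    "type:diff",
    "author:alice",
    "parseConfig",
])
print(query)
# repo:^github\.com/example/project$@main:*refs/heads/release/*:^v1.0.0 type:diff author:alice parseConfig
```

The resulting string can be pasted into the search input; check the query syntax documentation for the exact filter names your instance supports.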
Commit message search
Searching over commit messages is supported in Sourcegraph by adding type:commit to your search query. Separately, you can also use the message:"any string" parameter to filter type:diff searches for a given commit message. Commit message searches can be narrowed down further with filters such as author and time.
See our query syntax documentation for a comprehensive list of supported parameters.
Symbol search
Searching for symbols makes it easier to find specific functions, variables, and more. Use the type:symbol filter to search for symbol results. Symbol results also appear in typeahead suggestions, so you can jump directly to symbols by name. Symbol search uses Zoekt when the commit is indexed; otherwise, it uses the symbols service.
Saved searches
Saved searches let you save and describe search queries so you can easily monitor the results on an ongoing basis. You can create a saved search for anything, including diffs and commits across all branches of your repositories. Saved searches can be an early warning system for common problems in your code and a way to monitor best practices, the progress of refactors, etc.
See the saved searches documentation for instructions for setting up and configuring saved searches.
Suggestions
As you type a query, the menu below will contain suggestions based on the query. Use the keyboard or mouse to select a suggestion. For example, if your query is repo:foo file:\.js$ hello, the suggestions will consist of the list of files that match your query.
You can also type in the partial name of a repository or filename to quickly jump to it. For example, typing in just foo would show you a list of repositories (first) and files with names containing foo.
Search contexts
Search contexts help you search the code you care about on Sourcegraph. A search context represents a set of repositories at specific revisions on a Sourcegraph instance that will be targeted by search queries by default.
Every search on Sourcegraph uses a search context. Search contexts can be defined with the contexts selector shown in the search input, or entered directly in a search query.
If you currently use version contexts, you can automatically convert your existing version contexts to search contexts. We recommend migrating to search contexts for a more intuitive, powerful search experience and the latest improvements and updates.
See the search contexts documentation to learn how to use and create search contexts.
Multi-branch indexing
The most common branch to search is your default branch. To speed up this common operation, Sourcegraph maintains an index of the source code on your default branch. Some organizations have other branches that are regularly searched. To speed up search for those branches, Sourcegraph can be configured to index up to 64 branches per repository.
Your site admin can configure indexed branches in site configuration under the experimentalFeatures.search.index.branches setting. For example:
"experimentalFeatures": {
  "search.index.branches": {
    "github.com/sourcegraph/sourcegraph": ["3.15", "develop"],
    "github.com/sourcegraph/src-cli": ["next"]
  }
}
Indexing multiple branches increases Sourcegraph's resource requirements (particularly memory). The indexer deduplicates documents between branches, so the size of your index grows with the number of unique documents. Refer to our resource estimator to estimate whether additional resources are required.
Exclude files and directories
You can exclude files and directories from search by adding the file .sourcegraph/ignore to the root directory of your repository. Sourcegraph interprets each line in the ignore file as a glob pattern. Files or directories matching those patterns will not show up in the search results.
The ignore file is tied to a commit. This means that if you committed an ignore file to a feature branch but not to your default branch, then only search results for the feature branch will be filtered, while the default branch will show all results.
For example:
// .sourcegraph/ignore
// lines starting with "//" are comments and are ignored
// empty lines are ignored, too
// ignore the directory node_modules/
node_modules/
// ignore the directory src/data/
src/data/
// ** matches all characters, while * matches all characters except /
// ignore all JSON files
**.json
// ignore all JSON files at the root of the repository
*.json
// ignore all JSON files within the directory data/
data/**.json
// ignore all data folders
data/
**/data/
// ignore all files that start with numbers
[0-9]*.*
**/[0-9]*.*
Our syntax closely follows what is documented in the Linux Documentation Project. However, we distinguish between * and **: while ** matches all characters, * matches all characters except the path separator /.
Note that invalid glob patterns cause an error, and searches over commits containing a broken ignore file will not return any results.
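To make the difference between * and ** concrete, here is a small sketch that translates the two wildcards into a regular expression. It is not Sourcegraph's matcher: the `ignore_pattern_to_regex` and `matches` helpers are hypothetical, and the sketch ignores character classes such as [0-9], comments, and any special handling of directory patterns that end in /.

```python
import re


def ignore_pattern_to_regex(pattern):
    """Translate the two wildcards described above into a regex.

    ** matches any characters, while * matches any characters except "/".
    Everything else is treated as a literal character.
    """
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")      # may cross directory boundaries
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")   # stops at the path separator
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def matches(pattern, path):
    # fullmatch anchors the pattern to the whole path.
    return ignore_pattern_to_regex(pattern).fullmatch(path) is not None


print(matches("*.json", "package.json"))          # True
print(matches("*.json", "src/config.json"))       # False
print(matches("**.json", "src/config.json"))      # True
print(matches("data/**.json", "data/a/b.json"))   # True
```

This shows why *.json only covers JSON files at the root of the repository, while **.json covers JSON files at any depth.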
Shard merging
Shard merging is a feature of Zoekt that combines smaller index files, or shards, into one larger file, a compound shard. This can reduce memory costs for Zoekt webserver. This feature is particularly useful for customers with many small and rarely updated repositories, and can result in a significant reduction in memory. Shard merging can be enabled by setting SRC_ENABLE_SHARD_MERGING="1" for Zoekt indexserver.
Shard merging can be fine-tuned by setting environment variables for Zoekt indexserver:
Env Variable | Description | Default |
---|---|---|
SRC_VACUUM_INTERVAL | Run vacuum this often, specified as a duration | 24 hours |
SRC_MERGE_INTERVAL | Run merge this often, specified as a duration | 8 hours |
SRC_MERGE_TARGET_SIZE | The target size of compound shards in MiB | 2000 |
SRC_MERGE_MIN_SIZE | The minimum size of a compound shard in MiB | 1800 |
SRC_MERGE_MIN_AGE | The time since the last commit in days. Shards with newer commits are excluded from merging. | 7 |
SRC_MERGE_MAX_PRIORITY | The maximum priority a shard can have to be considered for merging, specified as a float value | 100.0 |
When repositories receive updates, Zoekt reindexes them and tombstones their old index data. As a result, compound shards can shrink and be dismantled into individual shards once they reach a critical minimum size. These individual shards are then considered for future merge operations. Shard merging can be monitored via the "Compound shards" panel in Zoekt's Grafana dashboard.
RE2 Regular Expressions
The Sourcegraph search language supports RE2 syntax. If you're used to tools like Perl, which uses PCRE syntax, you may notice that some features, such as backreferences and lookarounds, are missing from RE2. We chose RE2 for a few reasons:
- It makes it possible to build worst-case linear evaluation engines, which is very desirable for building a production-ready regex search engine.
- It's well-supported in Go, allowing us to take advantage of a rich ecosystem (notably including Zoekt).
- Our API and tooling make it straightforward to use Sourcegraph with other tools that provide facilities not built in to the search language.
As an example of how you can use Sourcegraph tooling with other tools, we can use jq (which supports Perl-style regexes) along with src to post-filter search results. In this case, we want to use backreferences to find Go functions that take a single pointer argument and return a non-pointer of the same type as the input.
re2_regex='func \w+\(\w+ \*\w+\) \w+'
pcre2_regex='func \w+\(\w+ \*(\w+)\) \1'
src search --json --stream -- "/$re2_regex/" \
  | jq '
    # Filter to only content events
    select(.type == "content")
    # Flatten to a single object per match
    | {content: .chunkMatches[].content} + del(.chunkMatches)
    # Select only matches that match the PCRE regex
    | select(.content | test($ARGS.positional[0]))
  ' --args "$pcre2_regex"
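The same two-stage idea can be tried locally without a Sourcegraph instance. The sketch below applies both patterns from the example above to a few made-up Go signatures. Python's `re` module is not RE2; it is used here only because it supports backreferences, and the sample lines are hypothetical.

```python
import re

# Broad pattern: uses only syntax that RE2 also accepts (no backreference).
BROAD = re.compile(r"func \w+\(\w+ \*\w+\) \w+")
# Narrow pattern: \1 requires the return type to repeat the pointed-to type.
NARROW = re.compile(r"func \w+\(\w+ \*(\w+)\) \1")

# Hypothetical search results, one matched line each.
lines = [
    "func Clone(c *Config) Config {",
    "func Name(u *User) string {",
    "func Reset(c *Config) {",
]

# Stage 1: what a search using the RE2-style pattern would return.
broad_hits = [line for line in lines if BROAD.search(line)]
# Stage 2: post-filter those results with the backreference pattern.
narrow_hits = [line for line in broad_hits if NARROW.search(line)]

print(broad_hits)   # ['func Clone(c *Config) Config {', 'func Name(u *User) string {']
print(narrow_hits)  # ['func Clone(c *Config) Config {']
```

The broad pattern keeps the server-side search cheap, and the stricter check runs only on the much smaller set of results.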
Other search tips
- When viewing a file or directory, press the y key to expand the URL to its canonical form (with the full 40-character Git commit SHA).
- To share a link to a multi-line range in a file, click on the starting line number and shift-click on the ending line number (in the left-hand gutter).
- The dynamic filter currently returns up to 40 results under the repo filter and up to 40 results under other filters (80 in total).