GitHub
Site admins can sync Git repositories hosted on GitHub.com and GitHub Enterprise with Sourcegraph so that users can search and navigate the repositories.
There are two ways to connect with GitHub: using a GitHub App or using an access token. Both are described below.
Supported versions
- GitHub.com
- GitHub Enterprise v2.10 and newer
Using a GitHub App
NOTE: Feature supported in Sourcegraph 5.1+
To create a GitHub App and connect it to Sourcegraph:
- Go to Site admin > Repositories > Github Apps on Sourcegraph.
- Click Create GitHub App.
- Enter a name for your app (it must be unique across your GitHub instance) and the URL of your GitHub instance.
  You may optionally specify an organization to register the app with. If no organization is specified, the app will be owned by the account of the user who creates it on GitHub. This is the default.
  You may also optionally set the App visibility to public. A GitHub App must be made public if you wish to install it on multiple organizations or user accounts. The default is private.
  The GitHub App will require the following permissions:
  - Contents (Repository contents, commits, branches, downloads, releases, and merges): read
  - Emails (Manage a user's email addresses): read
  - Members (Organization members and teams): read
  - Metadata (Search repositories, list collaborators, and access repository metadata): read
- When you click Create GitHub App, you will be redirected to GitHub to confirm the details of the App to be created.
- To complete the setup on GitHub, you will be asked to review the App permissions and select which repositories the App can access before installing it in a namespace. The default is All repositories. Any repositories that you choose to omit will not be able to be synced by Sourcegraph. You can change this later.
- Click Install. Once complete, you will be redirected back to Sourcegraph, where you will now be able to view and manage the details of your new GitHub App from within Sourcegraph.
- Sourcegraph needs to map Sourcegraph users to GitHub users. Click Reveal secret to get the JSON configuration for the auth provider and copy/paste it into the "auth.providers" section of your site configuration.
- Click Add connection under your new installation to create a code host connection to GitHub with this App installation. By default, it will sync all repositories the App can access within the namespace where it was installed. Repository permission enforcement will also be turned on by default.
  You can now select repositories to sync or see more configuration options in the configuration section.
- (Optional) If you want to sync repositories from other organization or user namespaces and your GitHub App is set to public visibility, you can create additional installations with Add installation.
NOTE: If you are using Batch Changes, you can create a GitHub App to perform commit signing (Beta).
Multiple installations
The initial GitHub App setup will only install the App on the organization or user account that you registered it with. If your code is spread across multiple organizations or user accounts, you will need to create additional installations for each namespace that you want Sourcegraph to sync repositories from.
By default, Sourcegraph creates a private GitHub App, which only allows the App to be installed on the same organization or user account that it was created in. If you did not set the App to public visibility during creation, you will need to change the visibility to public before you can install it in other namespaces. For security considerations, see GitHub's documentation on private vs public apps.
Once public, the App can be installed in additional namespaces either from Sourcegraph or from GitHub.
Installing from Sourcegraph
- Go to Site admin > Repositories > Github Apps and click Edit on the App you want to install in another namespace. You'll be taken to the App details page.
- Click Add installation. You will be redirected to GitHub to pick which other organization to install the App on and finish the installation process.
NOTE: Only organization owners can install GitHub Apps on an organization. If you are not an owner, you will need to ask an owner to install the App for you.
- As before, you will be asked to review the App permissions and select which repositories the App can access before installing it in a namespace. Once you click Install and the setup completes, you will be redirected back to Sourcegraph, where you will now see your additional installation listed.
- To sync repositories from this installation, click Add connection under your new installation.
Installing from GitHub
- Go to the GitHub App page. You can get here easily from Sourcegraph by clicking View in GitHub for the App you want to install in another namespace.
- Click Configure, or go to App settings > Install App, and select the organization or user account you want to install the App on.
- As before, you will be asked to review the App permissions and select which repositories the App can access before installing it in a namespace. Once you click Install and the setup completes, you will be redirected back to Sourcegraph.
- GitHub App installations will be automatically synced in the background. Return to Site admin > Repositories > Github Apps and click Edit on the App you added the new installation for. You'll be taken to the App details page. Once synced, you will see the new installation listed.
- To sync repositories from this installation, click Add connection under your new installation.
Uninstalling an App
You can uninstall a GitHub App from a namespace or remove it altogether at any time.
To remove an installation in a single namespace, click View in GitHub for the installation you want to remove. If you are able to administer Apps in this namespace, you will see Uninstall "[APP NAME]" in the "Danger zone" at the bottom of the page. Click Uninstall to remove the App from this namespace. Sourcegraph will periodically sync installations in the background. It may temporarily throw errors related to the missing installation until the sync completes. You can check the GitHub App details page to confirm the installation has been removed.
To remove an App entirely, go to Site admin > Repositories > Github Apps and click Remove for the App you want to remove. You will be prompted to confirm you want to remove the App from Sourcegraph. Once removed from the Sourcegraph side, Sourcegraph will no longer communicate with your GitHub instance via the App unless explicitly reconnected. However, the App will still exist on GitHub unless you also delete it there manually.
GitHub App token use
Sourcegraph uses the tokens from GitHub Apps in the following ways:
Installation access tokens
Installation access tokens are short-lived, non-refreshable tokens that give Sourcegraph access to the repositories the GitHub App has been given access to. Sourcegraph uses these tokens to clone repositories and to determine which users should be able to view a repository. These tokens expire after 1 hour.
User access tokens
These are OAuth tokens that Sourcegraph receives when a user signs into Sourcegraph using the configured GitHub App. Sourcegraph uses these tokens to link the user's Sourcegraph account to their GitHub account, as well as determine which repositories a user should be able to access. These tokens are refreshable, and by default they expire after 8 hours. Sourcegraph refreshes the user tokens as required.
Custom Certificates
NOTE: Feature supported in Sourcegraph 5.1.5+
If you are using a self-signed certificate for your GitHub Enterprise instance, configure tls.external under experimentalFeatures in the Site configuration with your certificate(s).
{
  "experimentalFeatures": {
    "tls.external": {
      "certificates": [
        "-----BEGIN CERTIFICATE-----\n..."
      ]
    }
  }
}
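Each certificate has to be embedded as a single JSON string, with its line breaks escaped as \n. The following is a minimal sketch of producing that configuration text from PEM data. The function name build_tls_external_config and the placeholder certificate are illustrative assumptions, not part of Sourcegraph.

```python
import json


def build_tls_external_config(pem_certificates):
    """Return site configuration text for experimentalFeatures.tls.external.

    pem_certificates is a list of PEM-encoded certificate strings
    (hypothetical input; read them from your own certificate files).
    """
    config = {
        "experimentalFeatures": {
            "tls.external": {
                "certificates": list(pem_certificates),
            }
        }
    }
    # json.dumps escapes the newlines inside each PEM string as \n.
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    # Placeholder only; substitute the real contents of your certificate file.
    placeholder_pem = (
        "-----BEGIN CERTIFICATE-----\n<base64 data>\n-----END CERTIFICATE-----\n"
    )
    print(build_tls_external_config([placeholder_pem]))
```

The printed text can be merged into the site configuration by hand; the sketch does not talk to Sourcegraph.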
Using an access token
To connect GitHub to Sourcegraph with an access token:
- Go to Site admin > Manage code hosts.
- Select GitHub.
- Configure the connection to GitHub using the action buttons above the text field. Additional fields can be added using Cmd/Ctrl+Space for auto-completion. See the configuration documentation below.
- Press Add repositories.
In this example, the kubernetes public repository on GitHub is added by selecting Add a single repository and replacing <owner>/<repository> with kubernetes/kubernetes:
{
  "url": "https://github.com",
  "token": "<access token>",
  "orgs": [],
  "repos": [
    "kubernetes/kubernetes"
  ]
}
GitHub API access
GitHub requires a token in order to access their API. There are different types of tokens that can be supplied. When using GitHub Apps, this is handled automatically by Sourcegraph.
- GitHub App installation access token: An installation access token is created automatically when you install a GitHub App. Do not set this token in the code host connection configuration. This token gives Sourcegraph the same level of access to repositories as the GitHub App installation.
- Personal access token: This gives Sourcegraph the same level of access to repositories as the account that created the token. If you don't want to mix your personal repositories with your organization's repositories, you could add an entry to the exclude array, or you can use a machine user token or a fine-grained access token.
- Fine-grained access token: Allows scoping access tokens to specific repositories with specific permissions. Consult the table below for the required permissions.
- Machine user token: Generates a token for a machine user that is affiliated with an organization instead of a user account.
Personal access token scopes
No token scopes are required if you only want to sync public repositories and don't want to use any of the following features. Otherwise, the following token scopes are required for specific features:
Feature | Required token scopes |
---|---|
Sync private repositories | repo |
Sync repository permissions | repo |
Batch changes | repo, read:org, user:email, read:discussion, and workflow (learn more) |
WARNING: In addition to the prerequisite token scopes, the account attached to the token must actually have the same level of access to the relevant resources that you are trying to grant. For example:
- If read access to repositories is required, the token must have repo scope and the token's account must have read access to the relevant repositories. This can happen by being directly granted read access to repositories, being on a team with read access to the repository, and so on.
- If write access to repositories is required, the token must have repo scope and the token's account must have write access to all repositories. This can happen by being added as a direct contributor, being on a team with write access to the repository, being an admin for the repository's organization, and so on.
- If write access to organizations is required, the token must have write:org scope and the token's account must have write access for all organizations. This can happen by being an admin in all relevant organizations.

Learn more about how the GitHub API is used and what level of access is required in the corresponding feature documentation.
Fine-grained access token permissions
Fine-grained tokens can access public repositories, but can only access the private repositories of the account they are scoped to.
When creating your fine-grained access token, select the following permissions depending on the purpose of the token:
Feature | Required token permissions |
---|---|
Sync private repositories | Repository permissions: Contents - Access: Read-only |
Sync repository permissions | Repository permissions: Contents - Access: Read-only |
Batch changes | Unsupported |
WARNING: Fine-grained tokens don't support the repositoryQuery code host connection option or batch changes. Both of these features rely on GitHub's GraphQL API, which is unsupported by fine-grained access tokens.
Private repositories
To clone and search private repositories, we need a GitHub access token with the required scopes and at least read access to the relevant private repositories.
For more details, see GitHub API access.
Selecting repositories to sync
There are four fields for configuring which repositories are mirrored/synchronized:
- repos: A list of repositories in owner/name format. The order determines the order in which we sync repository metadata and is safe to change.
- orgs: A list of organizations (every repository belonging to the organization will be cloned).
- repositoryQuery: A list of strings with three pre-defined options (public, affiliated, none, none of which are subject to result limitations), and/or a GitHub advanced search query. Note: There is an existing limitation that requires the latter, GitHub advanced search queries, to return less than 1000 results. See this issue for ongoing work to address this limitation.
- exclude: A list of repositories to exclude, which takes precedence over the repos, orgs, and repositoryQuery fields.
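The way these four fields combine can be pictured as a union of the three inclusion sources with exclude applied last. The sketch below is a simplified model of that precedence rule, not Sourcegraph's implementation: the function name select_repositories is hypothetical, and it only models exclusion by repository name.

```python
def select_repositories(repos, org_repos, query_results, excluded_names):
    """Model which repositories end up mirrored.

    repos:          names listed in the "repos" field
    org_repos:      names found by expanding the "orgs" field
    query_results:  names returned by the "repositoryQuery" entries
    excluded_names: names listed in the "exclude" field
    """
    # Results from every inclusion source are combined.
    candidates = set(repos) | set(org_repos) | set(query_results)
    # exclude takes precedence over all three inclusion sources.
    return sorted(candidates - set(excluded_names))


if __name__ == "__main__":
    print(
        select_repositories(
            repos=["kubernetes/kubernetes"],
            org_repos=["golang/go", "golang/tools"],
            query_results=["facebook/react"],
            excluded_names=["golang/tools"],
        )
    )
```

A repository named in both repos and exclude is therefore not mirrored.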
Rate limits
Always include a token in a configuration for a GitHub.com URL to avoid being denied service by GitHub's unauthenticated rate limits. If you don't want to automatically synchronize repositories from the account associated with your personal access token, you can create a token without a repo scope for the purposes of bypassing rate limit restrictions only.
When Sourcegraph hits a rate limit imposed by GitHub, Sourcegraph waits the appropriate amount of time specified by GitHub before retrying the request. This can be several minutes in extreme cases.
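To illustrate what "the appropriate amount of time specified by GitHub" means, here is a sketch of how a client could derive the wait from GitHub's rate limit response headers (x-ratelimit-remaining, x-ratelimit-reset, and retry-after). It is an assumption-laden illustration rather than Sourcegraph's actual logic, and the function name seconds_until_retry is hypothetical.

```python
import time


def seconds_until_retry(headers, now=None):
    """Return how many seconds to wait before retrying a GitHub API request."""
    now = time.time() if now is None else now
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}

    # retry-after, when present, gives the wait directly in seconds.
    if "retry-after" in normalized:
        return max(0.0, float(normalized["retry-after"]))

    # If requests remain in the current window, no wait is needed.
    remaining = int(normalized.get("x-ratelimit-remaining", "1"))
    if remaining > 0:
        return 0.0

    # x-ratelimit-reset is the time the window resets, in UTC epoch seconds.
    reset_at = float(normalized.get("x-ratelimit-reset", now))
    return max(0.0, reset_at - now)
```

For example, a response with no remaining requests and a reset time five minutes away yields a wait of about 300 seconds.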
GitHub Enterprise Server rate limits
Rate limiting may not be enabled by default. To check and verify the current rate limit settings, you may make a request to the /rate_limit endpoint like this:
$ curl -s https://<github-enterprise-url>/api/v3/rate_limit -H "Authorization: Bearer <token>"
{
  "message": "Rate limiting is not enabled.",
  "documentation_url": "https://docs.github.com/enterprise/3.3/rest/reference/rate-limit#get-rate-limit-status-for-the-authenticated-user"
}
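If you script this check, the response body can be classified as in the sketch below. It assumes that an instance with rate limiting enabled returns GitHub's usual shape, with a resources.core object containing limit and remaining. The function name describe_rate_limit is hypothetical.

```python
import json


def describe_rate_limit(response_body):
    """Summarize the JSON body returned by the /rate_limit endpoint."""
    data = json.loads(response_body)

    # GitHub Enterprise Server returns this message when rate limiting is off.
    if data.get("message") == "Rate limiting is not enabled.":
        return "disabled"

    # Assumed shape when enabled: {"resources": {"core": {"limit": ..., "remaining": ...}}}
    core = data.get("resources", {}).get("core")
    if core is None:
        return "unknown"
    return f"enabled: {core['remaining']}/{core['limit']} core requests remaining"
```

Pass it the output of the curl command above, for example by saving the response to a file and reading it back.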
Internal rate limits
See Internal rate limits.
Repository permissions
Prerequisite for configuring repository permission syncing: Add GitHub as an authentication provider.
Then, add or edit the GitHub connection as described above and include the authorization field:
{
  // ...
  "authorization": {}
}
This needs to be done for every GitHub code host connection if there is more than one configured.
Repo-centric permission syncing is done by calling the list repository collaborators GitHub API endpoint. To call this API endpoint correctly, we need a GitHub access token with the required scopes and read and write access to all relevant repositories.
IMPORTANT: We strongly recommend configuring both read and write access to associated repositories for permission syncing due to GitHub's token scope requirements. Without write access, there will be a conflict between user-centric sync and repo-centric sync. In that case, disable repo-centric permission sync (supported in Sourcegraph 5.0.4+).
IMPORTANT: Optional, but strongly recommended - continue with configuring webhooks for permissions.
NOTE: It can take some time to complete a full cycle of repository permissions sync if you have a large number of users or repositories. See sync duration time for more information.
Internal repositories
GitHub Enterprise has internal repositories in addition to the usual public and private repositories. Depending on how your organization structure is configured, you may want to make these internal repositories available to everyone on your Sourcegraph instance without relying on permission syncs. To mark all internal repositories as public, add the following field to the authorization field:
{
  // ...
  "authorization": {
    "markInternalReposAsPublic": true
  }
}
If you would like internal repositories to remain private, but you're experiencing issues where user permission syncs aren't granting access to internal repositories, you can add the following field instead:
{
  // ...
  "authorization": {
    "syncInternalRepoPermissions": true
  }
}
NOTE: An explanation of repository visibility options in GitHub Enterprise:
- public: Only index public GitHub Enterprise repositories visible to all users. This excludes private and internal repos.
- private: Index both public and private GitHub Enterprise repositories. This allows accessing private repos the token has access to.
- internal: Include GitHub Enterprise internal repositories in addition to public/private repos. Internal repos are only visible to org members.
Trigger permissions sync from GitHub webhooks
To trigger permission syncs from GitHub webhooks, see the documentation on configuring webhooks for permissions for GitHub.
Teams and organizations permissions caching
NOTE: This is an experimental feature.
WARNING: The following section is experimental and might no longer work properly on newer Sourcegraph versions (4.0 and later). Prefer configuring webhooks for permissions instead.
The GitHub code host connection can leverage caching mechanisms to reduce the number of API calls used when syncing permissions. This can significantly reduce the amount of time it takes to perform a full cycle of permissions sync due to reduced instances of being rate limited by the code host, and is useful for code hosts with very large numbers of users and repositories.
Sourcegraph can leverage caching of GitHub team and organization permissions.
NOTE: You should only try this if your GitHub setup makes extensive use of GitHub teams and organizations to distribute access to repositories and your number of users * avg_repositories is greater than 250,000 (which roughly corresponds to the scale at which GitHub rate limits might become an issue).
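As a quick worked example of that threshold, the sketch below multiplies the number of users by the average number of repositories each user can access and compares the product with 250,000. The function name and the sample figures are illustrative assumptions.

```python
def caching_worth_trying(users, avg_repositories, threshold=250_000):
    """Return True when users * avg_repositories exceeds the suggested threshold."""
    return users * avg_repositories > threshold


if __name__ == "__main__":
    # 1,000 users with 300 repositories each: 300,000 > 250,000
    print(caching_worth_trying(1_000, 300))
    # 500 users with 100 repositories each: 50,000 <= 250,000
    print(caching_worth_trying(500, 100))
```

The threshold is only a rough guide; the other condition in the note (extensive use of teams and organizations) still applies.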
This caching behaviour can be enabled via the authorization.groupsCacheTTL field:
{
  "url": "https://github.example.com",
  "token": "$PERSONAL_ACCESS_TOKEN",
  "authorization": {
    "groupsCacheTTL": 72, // hours
  }
}
In the corresponding authorization provider in site configuration, the allowGroupsPermissionsSync field must be set as well for the correct auth scopes to be requested from users:
{
  // ...
  "auth.providers": [
    {
      "type": "github",
      "url": "https://github.example.com",
      "allowGroupsPermissionsSync": true,
    }
  ]
}
A token that has the required scopes and both read and write access to all relevant repositories and organizations is needed to fetch repository permissions and team memberships. Read-only access will not work with cached permissions sync, but will work with careful configuration for regular GitHub permissions sync.
When enabling this feature, we currently recommend a default groupsCacheTTL of 72 (hours, or 3 days). A lower value can be set if your teams and organizations change frequently, though the chosen value must be at least several hours for the cache to be leveraged in the event of being rate-limited (which takes an hour to recover from).
Cache invalidation happens automatically on certain webhook events, so it is recommended to configure webhook support when using cached permissions sync. Caches can also be manually invalidated if necessary.
Manually invalidate caches
To force a bypass of caches during a sync, you can manually queue users or repositories for sync with the invalidateCaches option via the Sourcegraph GraphQL API:
mutation {
  scheduleUserPermissionsSync(user: "userid", options: {invalidateCaches: true}) {
    alwaysNil
  }
}
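The mutation above can also be sent from a script. The sketch below only builds the HTTP request and does not send it. It assumes the Sourcegraph GraphQL API is served at /.api/graphql and accepts an "Authorization: token ..." header; the instance URL, access token, user ID, and the function name build_invalidate_request are placeholders.

```python
import json
import urllib.request


def build_invalidate_request(instance_url, access_token, user_id):
    """Build (but do not send) a request that queues a user permissions sync."""
    # json.dumps quotes and escapes the user ID as a string literal.
    query = (
        "mutation { scheduleUserPermissionsSync(user: %s, "
        "options: {invalidateCaches: true}) { alwaysNil } }" % json.dumps(user_id)
    )
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        instance_url.rstrip("/") + "/.api/graphql",  # assumed API path
        data=body,
        headers={
            "Authorization": "token " + access_token,  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_invalidate_request(
        "https://sourcegraph.example.com", "<access token>", "userid"
    )
    print(request.full_url)
    print(request.data.decode("utf-8"))
```

To actually run the mutation, pass the returned request to urllib.request.urlopen with a real site-admin access token.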
User authentication
To configure GitHub as an authentication provider (which will enable sign-in via GitHub), see the authentication documentation.
Webhooks
Using the webhooks property on the external service has been deprecated.
Please consult the webhooks documentation in order to configure webhooks.
Configuration
GitHub connections support the following configuration options, which are specified in the JSON editor in the site admin "Manage code hosts" area.
admin/external_service/github.schema.json
{
// If non-null, enforces GitHub repository permissions. This requires that there is an item in the [site configuration json](https://sourcegraph.com/docs/admin/config/site_config#auth-providers) `auth.providers` field, of type "github" with the same `url` field as specified in this `GitHubConnection`.
"authorization": null,
// TLS certificate of the GitHub Enterprise instance. This is only necessary if the certificate is self-signed or signed by an internal CA. To get the certificate run `openssl s_client -connect HOST:443 -showcerts < /dev/null 2> /dev/null | openssl x509 -outform PEM`. To escape the value into a JSON string, you may want to use a tool like https://json-escape-text.now.sh.
"certificate": null,
// Other example values:
// - "-----BEGIN CERTIFICATE-----\n..."
// Only used to override the cloud_default column from a config file specified by EXTSVC_CONFIG_FILE
"cloudDefault": false,
// When set to true, this external service will be chosen as our 'Global' GitHub service. Only valid on Sourcegraph.com. Only one service can have this flag set.
"cloudGlobal": false,
// A list of repositories to never mirror from this GitHub instance. Takes precedence over "orgs", "repos", and "repositoryQuery" configuration.
//
// Supports excluding by name ({"name": "owner/name"}) or by ID ({"id": "MDEwOlJlcG9zaXRvcnkxMTczMDM0Mg=="}).
//
// Note: ID is the GitHub GraphQL ID, not the GitHub database ID. eg: "curl https://api.github.com/repos/vuejs/vue | jq .node_id"
"exclude": null,
// Other example values:
// - [{"forks":true}]
// - [
// {
// "name": "owner/name"
// },
// {
// "id": "MDEwOlJlcG9zaXRvcnkxMTczMDM0Mg=="
// }
// ]
// - [
// {
// "name": "vuejs/vue"
// },
// {
// "name": "php/php-src"
// },
// {
// "pattern": "^topsecretorg/.*"
// }
// ]
// - [
// {
// "size": ">= 1GB",
// "stars": "< 100"
// }
// ]
// If non-null, this is a GitHub App connection with some additional properties.
"gitHubAppDetails": null,
// The type of Git URLs to use for cloning and fetching Git repositories on this GitHub instance.
//
// If "http", Sourcegraph will access GitHub repositories using Git URLs of the form http(s)://github.com/myteam/myproject.git (using https: if the GitHub instance uses HTTPS).
//
// If "ssh", Sourcegraph will access GitHub repositories using Git URLs of the form git@github.com:myteam/myproject.git. See the documentation for how to provide SSH private keys and known_hosts: https://sourcegraph.com/docs/admin/repo/auth#repositories-that-need-http-s-or-ssh-authentication.
"gitURLType": "http",
// DEPRECATED: The installation ID of the GitHub App.
"githubAppInstallationID": null,
// Deprecated and ignored field which will be removed entirely in the next release. GitHub repositories can no longer be enabled or disabled explicitly. Configure repositories to be mirrored via "repos", "exclude" and "repositoryQuery" instead.
"initialRepositoryEnablement": null,
// An array of organization names identifying GitHub organizations whose repositories should be mirrored on Sourcegraph.
"orgs": null,
// Other example values:
// - ["name"]
// - [
// "kubernetes",
// "golang",
// "facebook"
// ]
// Whether the code host connection is in a pending state.
"pending": false,
// Rate limit applied when making background API requests to GitHub.
"rateLimit": {
"enabled": true,
"requestsPerHour": 5000
},
// An array of repository "owner/name" strings specifying which GitHub or GitHub Enterprise repositories to mirror on Sourcegraph.
"repos": null,
// Other example values:
// - ["owner/name"]
// - [
// "kubernetes/kubernetes",
// "golang/go",
// "facebook/react"
// ]
// The pattern used to generate the corresponding Sourcegraph repository name for a GitHub or GitHub Enterprise repository. In the pattern, the variable "{host}" is replaced with the GitHub host (such as github.example.com), and "{nameWithOwner}" is replaced with the GitHub repository's "owner/path" (such as "myorg/myrepo").
//
// For example, if your GitHub Enterprise URL is https://github.example.com and your Sourcegraph URL is https://src.example.com, then a repositoryPathPattern of "{host}/{nameWithOwner}" would mean that a GitHub repository at https://github.example.com/myorg/myrepo is available on Sourcegraph at https://src.example.com/github.example.com/myorg/myrepo.
//
// It is important that the Sourcegraph repository name generated with this pattern be unique to this code host. If different code hosts generate repository names that collide, Sourcegraph's behavior is undefined.
"repositoryPathPattern": "{host}/{nameWithOwner}",
// An array of strings specifying which GitHub or GitHub Enterprise repositories to mirror on Sourcegraph. The valid values are:
//
// - `public` mirrors all public repositories for GitHub Enterprise and is the equivalent of `none` for GitHub
//
// - `internal` mirrors all internal repositories for GitHub Enterprise and is the equivalent of `none` for GitHub
//
// - `affiliated` mirrors all repositories affiliated with the configured token's user:
// - Private repositories with read access
// - Public repositories owned by the user or their orgs
// - Public repositories with write access
//
// - `none` mirrors no repositories (except those specified in the `repos` configuration property or added manually)
//
// - All other values are executed as a GitHub advanced repository search as described at https://github.com/search/advanced. Example: to sync all repositories from the "sourcegraph" organization including forks the query would be "org:sourcegraph fork:true".
//
// If multiple values are provided, their results are unioned.
//
// If you need to narrow the set of mirrored repositories further (and don't want to enumerate it with a list or query set as above), create a new bot/machine user on GitHub or GitHub Enterprise that is only affiliated with the desired repositories.
"repositoryQuery": [
"none"
],
// A GitHub personal access token. Create one for GitHub.com at https://github.com/settings/tokens/new?description=Sourcegraph (for GitHub Enterprise, replace github.com with your instance's hostname). See https://sourcegraph.com/docs/admin/external_service/github#github-api-token-and-access for which scopes are required for which use cases.
"token": null,
// URL of a GitHub instance, such as https://github.com or https://github-enterprise.example.com.
"url": null,
// Other example values:
// - "https://github.com"
// - "https://github-enterprise.example.com"
// An array of configurations defining existing GitHub webhooks that send updates back to Sourcegraph.
"webhooks": null
// Other example values:
// - [
// {
// "org": "yourorgname",
// "secret": "webhook-secret"
// }
// ]
}
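The repositoryPathPattern option in the configuration above substitutes two variables into a template. The sketch below models that substitution as plain string replacement; it illustrates the documented behaviour and is not Sourcegraph's code. The function name sourcegraph_repo_name is hypothetical.

```python
def sourcegraph_repo_name(pattern, host, name_with_owner):
    """Expand {host} and {nameWithOwner} in a repositoryPathPattern."""
    return (
        pattern
        .replace("{host}", host)
        .replace("{nameWithOwner}", name_with_owner)
    )


if __name__ == "__main__":
    # Default pattern: the repository name includes the code host.
    print(sourcegraph_repo_name(
        "{host}/{nameWithOwner}", "github.example.com", "myorg/myrepo"
    ))
    # Custom prefix: names from this connection are grouped under "devs/".
    print(sourcegraph_repo_name(
        "devs/{nameWithOwner}", "github.example.com", "myorg/myrepo"
    ))
```

Patterns that drop {host} are shorter but make collisions between code hosts more likely, which is why the generated names must stay unique to each code host.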
Default branch
Sourcegraph displays search results from the default branch of a repository when no revision: parameter is specified. If you'd like the search results to be displayed from another branch by default, you may change a repo's default branch on the GitHub repo settings page. If this is not an option, consider using search contexts instead.
Troubleshooting
Hitting GitHub Search API rate limit with repositoryQuery
When Sourcegraph syncs repositories configured via repositoryQuery, it consumes the GitHub search API rate limit, which is lower than the normal rate limit. The affiliated, public, and none special values, however, trigger normal API requests instead of search API requests. internal is also a special value, but it uses the GitHub Search API to list all internal repositories.
When the search rate limit quota is exhausted, an error like failed to list GitHub repositories for search: page=..., searchString=\"...\" can be found in the logs. To work around this, try reducing the frequency of repository syncing by setting a higher value (in minutes) for repoListUpdateInterval in your Sourcegraph site config.
repositoryQuery is the only repo syncing method that consumes GitHub search API quota, so if setting repoListUpdateInterval doesn't work, consider switching your syncing method to another option, like orgs, or using one of the special values described above.
"repositoryQuery": ["public"] does not return archived status of a repo
The repositoryQuery option "public" is useful because it lets Sourcegraph sync all public repositories. However, it does not return whether or not a repo is archived. This can result in archived repos appearing in normal search. You can see an example of what is returned by the GitHub API for a query to "public" here.
If you would like to sync all public repositories while omitting archived repos, consider generating a GitHub token with access to only public repositories, then use repositoryQuery with the affiliated option and an exclude entry with archived set to true, as seen in the example below:
{
  "url": "https://github.example.com",
  "gitURLType": "http",
  "repositoryPathPattern": "devs/{nameWithOwner}",
  "repositoryQuery": [
    "affiliated"
  ],
  "token": "TOKEN_WITH_PUBLIC_ACCESS",
  "exclude": [
    {
      "archived": true
    }
  ]
}