Maven repositories - short intro

Maven completely relies on repositories to do the work. There are remote repositories (plural!) and local repository (singular!). The remote repositories are (from the developer's point of view) "remote", since he has no direct influence on them/it, while local repositories are completely under his control and used directly in build cycle performed by him.

Furthermore, the local repository and remote repository is not the same and are not interchangeable by any means (like deep-filecopy and republishing it on some HTTP server)! These two kinds of repositories shares same "deep" hierarchy and most of the content (JARs, POMs,...) but a local repository is not usable as remote repository, since the metadata information is lost (at least transformed and/or renamed) in it!

Maven uses centralized repositories to provide global access to needed and described artifacts found in project's POM. Furthermore, these remote repositories are heavily used in artifact and maven plugin (which are plain artifacts too) resolution. Even the "clean" goal on fresh maven2 installation will fetch a few plugins (at least the maven-clean-plugin) and it's dependencies from central repository, since maven2 is shipped slim, without plugins!

Repositories for Maven2 are configured at Maven2 level (in settings.xml) and per project level (in POM).

The remote repositories

As it was stated on the beginning, you may have as much remote repositories as you like, but Maven2 comes preconfigured to use the "central" remote repository (see below) as default. The reason why would you like more then one remote repository is simple: you can separate artifacts, you can separate build or even you can create separate environments (these reasons will may be unclear for a common OSS developer).

Remote repositories are accessed mostly over plain HTTP transport, but it is not restricted to it. There are examples to repositories published over HTTPS, FTP or even over HTTPS with client key authentication too (2SSL). For the latter, there are sample deployments using Proximity.

The central remote repository

The central repository is an alias for a "Maven Repository Switchboard" named repository, which is found in maven-project-2.0.x.jar, deployed with Maven2. It points to a symbolic address "http://repo1.maven.org/maven2" which is simply an alias for "http://www.ibiblio.org/maven2/" (as of 2006. 06. 21.). The need for this alias were justified by recent ibiblio and/or codehaus blackout, when the alias was redirected to "http://repo.mergere.com/maven2/" as a quick solution, since all maven2 users without maven-proxy between them and repo "central" was doomed to failing builds!

The central repository is configured to not hold snapshots (not like Maven1 repository found on ibiblio!) -- a fact that is a way too much ignored (will see later when overrriding reposes why).

The local repository

The local repository is per default created in $HOME/.m2/repository directory. It will by time grow as new artifacts are fetched from remote repositories and as the developer installs artifacts into it locally. Simple formulation would be: the local repository will contain the union of all artifacts and their dependencies ever built with Maven2 using the given local repository.

Repository aggregation

Please read the the Aggregation of proxies may not be your friend article!

As we said already, Proximity aggregates multiple repositories. What does it mean actually? Well, first of all, you don't need to reference four (as provided in default config) different remote repositories (IT nightmare), it is enough only to reference Proximity.

Aggregation does brings some difficulties: what happens if some path (eg. /ant/ant/1.6.5/ant-1.6.5.pom) have defined contents in more then one repository? Proximity solves this by "who serve first - wins" algorithm. When Maven client issues a HTTP GET command with absolute URL (which is computed from artifact's groupId, artifactId and version), Proximity start looking for that item by iterating over his repositories. This is the reason why is so important the repository order in configuration.

Artifact "spoofing"

By placing repository A with similar content before repository B in repository list, we do an intentional spoof of artifacts from B with artifacts from A. This brings endless possibilities (but possibilities to errors too) to us.

Metadata merging

Beside this. for Maven2 to function properly, proper metadata (maven-metadata.xml) is very important. But metadata falls in a special category as opposed to those "general" artifacts descibed before. Spoofing is not always a good solution for them. Imagine a case, when you have configured the following repositories:

  • inhouse, hosted
  • central, proxied from http://repo1.maven.org/maven2
  • apache-snapshot, proxied from http://people.apache.org/maven-snapshot-repository

Let's say, you request artifact Doxia core. You can check yourself, but the /org/apache/maven/doxia/doxia-core/maven-metadata.xml file exists in both, central and apache-snapshot repositories. Which one should be served to Maven if requested?

Since this file is Maven metadata file, it should be merged. The merge operation within Proximity depends on repository's groupId. If the repositories are in the same group, the metadata will be merged, otherwise spoofed.

Using these techniques, we can fight against various "unstable" repositories, or simply "defend" a temporary malfunction (like dissapearance of Jakarta VFS unreleased version). If we would depend on it, we are able with Proximity to "spoof" it back into place at company level without to be doomed to failing builds until the remote repository that is proxied by Proximity is brought back to its original state.