5.13 File fetching

Overview

The preceding section described how module graphs are constructed. It touched on fetching modules without going into the specifics. This section explains how Deno retrieves files.
Both the main module and its imports are files. Each one must be obtained from its source, which may be the local disk or a remote location reached over HTTP or HTTPS. A module can also be embedded in a data URL or come from the NPM package repository. Understanding how Deno obtains these files, whether they sit on the same machine or on a server elsewhere, gives insight into its module resolution and loading procedures.

Functionality

In Deno, a file can be fetched from four main places:
  1. Local Source:
    • The file is stored locally, on your computer's disk.
    • For example, you might have a file saved on your computer that you want to use in your Deno program.
  2. Remote Source:
    • The file is not on your computer, but it is available on the internet.
    • You can access remote files through HTTP or HTTPS, just like when you browse websites, or through NPM, a package manager for JavaScript.
  3. Data URL:
    • Instead of fetching the file from a regular location, you get it from a data URL, which is a web standard.
    • A data URL embeds the file's content directly in the URL itself. This can be useful for small files or pieces of data.
  4. Cache:
    • The cache is a storage area where Deno keeps files that it has already fetched. If you ask for a file that is in the cache, Deno can return it without downloading it again.
    • Think of it as a saved copy of a file that Deno keeps at hand.
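As a rough illustration of the first three sources, the sketch below classifies a specifier by its URL scheme. The `SourceKind` enum and the `classify` function are hypothetical names invented for this example and are not part of Deno's code base. The cache does not appear here because it is not a scheme: it is consulted while a fetch is being carried out.

```rust
/// Where a module's bytes come from. These variants are illustrative only
/// and do not correspond to types in Deno's source.
#[derive(Debug, PartialEq)]
enum SourceKind {
    Local,
    Remote,
    DataUrl,
    /// Anything this sketch does not model.
    Other,
}

/// Classify a specifier by the scheme that precedes the first ':'.
fn classify(specifier: &str) -> SourceKind {
    let scheme = specifier.split(':').next().unwrap_or("");
    match scheme {
        "file" => SourceKind::Local,
        "http" | "https" => SourceKind::Remote,
        "data" => SourceKind::DataUrl,
        _ => SourceKind::Other,
    }
}
```

For example, `classify("file:///tmp/main.ts")` yields `SourceKind::Local`, while `classify("https://example.com/mod.ts")` yields `SourceKind::Remote`.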
In the code, fetching from the cache is part of the remote fetching process. We list it separately to avoid confusion: even when a file is already in the cache, Deno still goes through the path it uses for remote files, and the cache lookup is a small detour within that path.
The file fetcher is responsible for obtaining a file when given a module specifier. It hides where the file comes from and always returns the requested file through the same interface, so callers do not have to deal with the details of file retrieval.
Here is the source of the main fetch function:
pub async fn fetch(
  &self,
  specifier: &ModuleSpecifier,
  permissions: PermissionsContainer,
) -> Result<File, AnyError> {
  debug!("FileFetcher::fetch() - specifier: {}", specifier);
  self.fetch_with_accept(specifier, permissions, None).await
}

pub async fn fetch_with_accept(
  &self,
  specifier: &ModuleSpecifier,
  permissions: PermissionsContainer,
  maybe_accept: Option<&str>,
) -> Result<File, AnyError> {
  let scheme = get_validated_scheme(specifier)?;
  permissions.check_specifier(specifier)?;
  if let Some(file) = self.cache.get(specifier) {
    Ok(file)
  } else if scheme == "file" {
    // we do not in memory cache files, as this would prevent files on the
    // disk changing effecting things like workers and dynamic imports.
    fetch_local(specifier)
  } else if scheme == "data" {
    self.fetch_data_url(specifier)
  } else if scheme == "blob" {
    self.fetch_blob_url(specifier).await
  } else if !self.allow_remote {
    Err(custom_error(
      "NoRemote",
      format!("A remote specifier was requested: \"{specifier}\", but --no-remote is specified."),
    ))
  } else {
    let result = self
      .fetch_remote(
        specifier,
        permissions,
        10,
        maybe_accept.map(String::from),
      )
      .await;
    if let Ok(file) = &result {
      self.cache.insert(specifier.clone(), file.clone());
    }
    result
  }
}
The code is quite straightforward. After validating the scheme and checking permissions, it returns the file from the in-memory cache if one is present. Otherwise it dispatches on the URL scheme: file:// is read from disk by fetch_local, data: and blob: URLs are handled by fetch_data_url and fetch_blob_url, and the remaining remote schemes, http:// and https://, go to fetch_remote. If --no-remote is specified, a remote specifier produces an error instead.
Files fetched from a remote location get one additional step: a successful result is stored in the in-memory cache. Fetching a file remotely takes far longer than reading it from disk, so keeping a copy lets later fetches of the same specifier return quickly.
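The caching behavior described above can be sketched in a few lines. This is a simplified model under stated assumptions, not Deno's implementation: `SketchFetcher`, `download`, and the `network_calls` counter are hypothetical, specifiers are plain strings, and the "download" is simulated rather than performed.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a file fetcher. `network_calls` counts how many
/// simulated downloads happened so the effect of the cache is observable.
struct SketchFetcher {
    cache: HashMap<String, String>,
    network_calls: u32,
}

impl SketchFetcher {
    fn new() -> Self {
        Self { cache: HashMap::new(), network_calls: 0 }
    }

    /// Simulated download; a real fetcher would issue an HTTP request here.
    fn download(&mut self, specifier: &str) -> String {
        self.network_calls += 1;
        format!("// source of {specifier}")
    }

    fn fetch(&mut self, specifier: &str) -> Result<String, String> {
        // 1. Serve from the in-memory cache when possible.
        if let Some(hit) = self.cache.get(specifier) {
            return Ok(hit.clone());
        }
        // 2. Local files are returned directly and never memoized.
        if specifier.starts_with("file:") {
            return Ok(format!("// read from disk: {specifier}"));
        }
        // 3. Remote files are downloaded once, then remembered.
        if specifier.starts_with("http:") || specifier.starts_with("https:") {
            let source = self.download(specifier);
            self.cache.insert(specifier.to_string(), source.clone());
            return Ok(source);
        }
        Err(format!("unsupported scheme: {specifier}"))
    }
}
```

Fetching the same https:// specifier twice through one `SketchFetcher` leaves `network_calls` at 1, while a file:// specifier never ends up in `cache`.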

Local fetch

If the specifier uses the file:// scheme, the file is read directly from local storage instead of over the internet. The function that does this in Deno is fetch_local:
fn fetch_local(specifier: &ModuleSpecifier) -> Result<File, AnyError> {
  let local = specifier.to_file_path().map_err(|_| {
    uri_error(format!("Invalid file path.\n Specifier: {specifier}"))
  })?;
  let bytes = fs::read(local)?;
  let charset = text_encoding::detect_charset(&bytes).to_string();
  let source = get_source_from_bytes(bytes, Some(charset))?;
  let media_type = MediaType::from_specifier(specifier);
  Ok(File {
    maybe_types: None,
    media_type,
    source: source.into(),
    specifier: specifier.clone(),
    maybe_headers: None,
  })
}
The fetch_local function converts the specifier to a file path, reads the bytes from disk, detects their character set, and decodes them into source text. The media type is derived from the specifier. Local files obtained this way are not stored in a cache: the function reads the file anew each time it is called, so the most up-to-date version is always returned.
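The "read anew on every call" property is easy to demonstrate with the standard library alone. The sketch below is an approximation under stated assumptions: `read_local` and `media_type` are hypothetical helpers, paths are used instead of file:// specifiers, bytes are decoded as UTF-8 rather than through charset detection, and the media type names are plain strings chosen for illustration.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Read a local module from disk. Nothing is memoized, so every call
/// observes the file's current contents.
fn read_local(path: &Path) -> io::Result<String> {
    let bytes = fs::read(path)?;
    // Assumption: decode as UTF-8, replacing invalid sequences. The real
    // function detects the charset before decoding.
    Ok(String::from_utf8_lossy(&bytes).into_owned())
}

/// Guess a media type from the file extension (hypothetical helper).
fn media_type(path: &Path) -> &'static str {
    match path.extension().and_then(|ext| ext.to_str()) {
        Some("ts") => "TypeScript",
        Some("js") => "JavaScript",
        Some("json") => "Json",
        _ => "Unknown",
    }
}
```

If a file is rewritten between two calls to `read_local`, the second call returns the new contents, which is the behavior the comment in fetch_with_accept is protecting for workers and dynamic imports.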

Remote fetch

If the file lives on a different server, it is obtained with an HTTP client, over either plain HTTP or the more secure HTTPS. Because the file is pulled from another machine, this is slower and more resource-intensive than reading from disk.
To reduce that cost, Deno employs a caching mechanism. Once a file has been fetched for the first time, it is saved in a cache. If the same file is needed again, Deno can retrieve it from the cache rather than fetching it anew from the remote source, which improves performance and avoids repeated remote fetches.
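Before reading the full source, it may help to see the cache decision in isolation. The sketch below models three situations: use the cache if possible, bypass it, or refuse to touch the network. `CachePolicy`, `Outcome`, and `fetch_remote_sketch` are hypothetical names for this example, and the cache is a plain map instead of Deno's on-disk HTTP cache. Only the "cache only" case is taken directly from the source that follows, where it corresponds to `CacheSetting::Only` and the --cached-only flag.

```rust
use std::collections::HashMap;

/// Simplified cache policy (hypothetical; Deno's own setting has more cases).
#[derive(Clone, Copy, PartialEq)]
enum CachePolicy {
    /// Use a cached copy when one exists.
    Use,
    /// Ignore cached copies and download again.
    Reload,
    /// Never touch the network.
    Only,
}

#[derive(Debug, PartialEq)]
enum Outcome {
    FromCache(String),
    Downloaded(String),
    NotCached,
}

fn fetch_remote_sketch(
    url: &str,
    policy: CachePolicy,
    cache: &mut HashMap<String, String>,
) -> Outcome {
    // Consult the cache unless the policy says to bypass it.
    if policy != CachePolicy::Reload {
        if let Some(body) = cache.get(url) {
            return Outcome::FromCache(body.clone());
        }
    }
    // A miss under the cache-only policy is an error, not a download.
    if policy == CachePolicy::Only {
        return Outcome::NotCached;
    }
    // Simulated download, stored so the next lookup hits the cache.
    let body = format!("// downloaded from {url}");
    cache.insert(url.to_string(), body.clone());
    Outcome::Downloaded(body)
}
```

With an empty cache, `CachePolicy::Only` yields `Outcome::NotCached`; after one call with `CachePolicy::Use`, later calls return `Outcome::FromCache`. The actual fetch_remote source follows.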
fn fetch_remote(
  &self,
  specifier: &ModuleSpecifier,
  permissions: PermissionsContainer,
  redirect_limit: i64,
  maybe_accept: Option<String>,
) -> Pin<Box<dyn Future<Output = Result<File, AnyError>> + Send>> {
  debug!("FileFetcher::fetch_remote() - specifier: {}", specifier);
  if redirect_limit < 0 {
    return futures::future::err(custom_error("Http", "Too many redirects."))
      .boxed();
  }

  if let Err(err) = permissions.check_specifier(specifier) {
    return futures::future::err(err).boxed();
  }

  if self.should_use_cache(specifier) {
    match self.fetch_cached(specifier, redirect_limit) {
      Ok(Some(file)) => {
        return futures::future::ok(file).boxed();
      }
      Ok(None) => {}
      Err(err) => {
        return futures::future::err(err).boxed();
      }
    }
  }

  if self.cache_setting == CacheSetting::Only {
    return futures::future::err(custom_error(
      "NotCached",
      format!(
        "Specifier not found in cache: \"{specifier}\", --cached-only is specified."
      ),
    ))
    .boxed();
  }

  let mut maybe_progress_guard = None;
  if let Some(pb) = self.progress_bar.as_ref() {
    maybe_progress_guard = Some(pb.update(specifier.as_str()));
  } else {
    log::log!(
      self.download_log_level,
      "{} {}",
      colors::green("Download"),
      specifier
    );
  }

  let maybe_etag = self
    .http_cache
    .cache_item_key(specifier)
    .ok()
    .and_then(|key| self.http_cache.read_metadata(&key).ok().flatten())
    .and_then(|metadata| metadata.headers.get("etag").cloned());
  let maybe_auth_token = self.auth_tokens.get(specifier);
  let specifier = specifier.clone();
  let client = self.http_client.clone();
  let file_fetcher = self.clone();

  // A single pass of fetch either yields code or yields a redirect, server
  // error causes a single retry to avoid crashing hard on intermittent failures.
  async fn handle_request_or_server_error(
    retried: &mut bool,
    specifier: &Url,
    err_str: String,
  ) -> Result<(), AnyError> {
    // Retry once, and bail otherwise.
    if !*retried {
      *retried = true;
      log::debug!("Import '{}' failed: {}. Retrying...", specifier, err_str);
      tokio::time::sleep(std::time::Duration::from_millis(50)).await;
      Ok(())
    } else {
      Err(generic_error(format!(
        "Import '{}' failed: {}",
        specifier, err_str
      )))
    }
  }

  async move {
    let mut retried = false;
    let result = loop {
      let result = match fetch_once(
        &client,
        FetchOnceArgs {
          url: specifier.clone(),
          maybe_accept: maybe_accept.clone(),
          maybe_etag: maybe_etag.clone(),
          maybe_auth_token: maybe_auth_token.clone(),
          maybe_progress_guard: maybe_progress_guard.as_ref(),
        },
      )
      .await?
      {
        FetchOnceResult::NotModified => {
          let file = file_fetcher.fetch_cached(&specifier, 10)?.unwrap();
          Ok(file)
        }
        FetchOnceResult::Redirect(redirect_url, headers) => {
          file_fetcher.http_cache.set(&specifier, headers, &[])?;
          file_fetcher
            .fetch_remote(
              &redirect_url,
              permissions,
              redirect_limit - 1,
              maybe_accept,
            )
            .await
        }
        FetchOnceResult::Code(bytes, headers) => {
          file_fetcher
            .http_cache
            .set(&specifier, headers.clone(), &bytes)?;
          let file =
            file_fetcher.build_remote_file(&specifier, bytes, &headers)?;
          Ok(file)
        }
        FetchOnceResult::RequestError(err) => {
          handle_request_or_server_error(&mut retried, &specifier, err)
            .await?;
          continue;
        }
        FetchOnceResult::ServerError(status) => {
          handle_request_or_server_error(
            &mut retried,
            &specifier,
            status.to_string(),
          )
          .await?;
          continue;
        }
      };
      break result;
    };

    drop(maybe_progress_guard);
    result
  }
  .boxed()
}
The fetch_remote function is recursive because an HTTP request may be answered with a redirect, which asks the client to retrieve the resource from a different URL. Whenever that happens, the function calls itself with the redirect URL. Separately, a loop inside the function retries the request once, after a short pause, if it fails with a request error or a server error.
The following steps outline the remote fetching process in more detail:
  1. Check for Redirect Limit: The function first checks whether the redirect limit has been exhausted. If the remaining limit has dropped below zero, it returns a "Too many redirects." error.
  2. Cache Check: If the cache may be used for this specifier and the resource is present in it, the function returns the cached copy, which avoids a redundant download. If --cached-only is specified and the resource is not cached, the function returns an error instead of going to the network.
  3. HTTP Request: The function issues an HTTP request to fetch the file from the remote server.
  4. Redirection Handling: If the server responds with a redirect, indicating that the resource is at a different location, the function calls itself with the redirect URL and a redirect limit reduced by one. This recursion lets it follow a chain of redirects until the final resource is reached.
  5. Non-Redirection Response: If no redirect occurs, the function stores the response in the HTTP cache and returns the fetched file.
The fetch_remote function is initially called with a redirect limit of 10, so it can follow up to ten consecutive redirects before giving up.
While a remote fetch is in progress, Deno gives feedback to the developer. When no progress bar is configured, it logs the familiar "Download" message followed by the specifier, which indicates that a remote resource is being retrieved.
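The redirect handling can be reduced to a small recursive function. The sketch below is a model under stated assumptions rather than Deno's code: the "server" is a hypothetical in-memory map from URL to reply, `Reply` and `fetch_following_redirects` are names invented for this example, and errors are plain strings. It keeps the same budget rule as the source: the call fails once the limit drops below zero, and each redirect reduces the limit by one.

```rust
use std::collections::HashMap;

/// What the simulated server returns for one URL.
enum Reply {
    Redirect(String),
    Code(String),
}

/// Follow redirects recursively, giving up once the budget goes negative.
fn fetch_following_redirects(
    url: &str,
    redirect_limit: i64,
    server: &HashMap<String, Reply>,
) -> Result<String, String> {
    if redirect_limit < 0 {
        return Err("Too many redirects.".to_string());
    }
    match server.get(url) {
        // A redirect triggers a recursive call with a smaller budget.
        Some(Reply::Redirect(next)) => {
            fetch_following_redirects(next, redirect_limit - 1, server)
        }
        // Any other reply ends the recursion.
        Some(Reply::Code(source)) => Ok(source.clone()),
        None => Err(format!("not found: {url}")),
    }
}
```

Starting with a limit of 10, a URL that redirects to itself fails with "Too many redirects." instead of recursing forever, while a single redirect to a URL that serves code returns that code.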