Installing Docker

Computational Reproducibility

Docker is, at least in theory, a really neat way to manage your environment. Like computers, occasionally that’s true and it’s really handy. More often than not, you’ll wonder why this needed to be this complicated as you cry curled up in a ball on the floor at 1 in the morning (case in point: https://docs.docker.com/build/architecture/#install-buildx). Below I’ll walk you through my use case, ecosystem, and how I eventually got it off the ground.

The futility of documentation

Part of my struggle is that of the “early” adopter. As I sit here in March of 2023, some setup steps on Windows that were critical at the start of the year are no longer needed, including the transition from WSL to WSL2 and the off-brand distros that I believe became unnecessary once systemd support was added. What I have now works for the most part, so I’m unwilling to break it to test these changes, but I’m sure in 6 months something else will have changed anyway.

Having spent the day unsuccessfully getting Docker to install (it would load an image once and then never start again), here is the process that finally worked for me, based on this post and the Windows WSL docs. See the intro note above related to my issues here.
~~1) Reset any previous WSL install following https://pureinfotech.com/reset-wsl2-linux-distro-windows-10/~~
~~2) We need a special version of WSL that has the systemctl command; I chose distrod.~~
~~3) Install ubuntu with focal~~
~~4) reboot pc~~
~~5) !Success~~

Edit: Case in point. I rebuilt my environment 3 months later and it’s 4 lines. Why do I even bother…

My use case & ecosystem

For 90% of what I want to do, a VM of OSGeo would be the most direct and accessible way to run Python and Linux applications. So why am I so tied to getting Docker to run on Windows? Because photogrammetry/SfM is a memory-intensive process, and every layer above the bare metal adds overhead and costs memory; containers, even complex Docker images, avoid a large portion of the overhead a full VM carries. It also keeps installs tidy so I don’t step all over my home OS. And because my name is Jim and I’m a silly goose who does things the hard way. Finally, there’s no way to scale an analysis if you have to click on or put eyes on every piece of data you use, so at that level almost everything needs to be a function at the terminal, and a functional dockered approach makes horizontal scaling fairly trivial for the cloud wizards.

Installing Docker

Last built: 03/2025 on Windows 11. Official docs

Windows only, enable wsl

In administrative PowerShell: wsl --install

Misc.: WSL maintenance

```{bash}
sudo apt update && sudo apt upgrade
```

Install buildx?

Does not appear to be strictly necessary at this point but the warning does indicate this step will be required sooner or later.

Install docker

```{bash}
sudo apt-get update
# install from apt OR snap, not both: two installs can conflict
sudo apt install docker.io
# sudo snap install docker
```

Testing a “Hello world”

```{bash}
docker --version
docker run hello-world
```

General working pattern

Pull the repo, which will have a file (likely called “Dockerfile”, no extension) that specifies the docker build. The command to create that image is typically: docker build -f Dockerfile -t <image_name>:<number> <path/to/repository>
Connect:

```{bash}
docker run \
    --rm \
    -it \
    --name 'Broken_boi' \
    -v /<codebase_on_disk>/:/<codebase_in_docker> \
    <image_name>:<number>
```
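The run pattern above can be wrapped in a small shell function so the same flags get reused across images. This is only a minimal sketch and not a standard Docker tool: the function name `docker_run_cmd`, its argument order, the default container name `dev`, and the example paths are all assumptions made up for illustration. It prints the command instead of running it, so you can check it first.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Docker): print the `docker run` command
# for an image with one bind mount, following the pattern above.
# Usage: docker_run_cmd <image:tag> <host_dir> <container_dir> [name]
docker_run_cmd() {
  local image="$1" host_dir="$2" container_dir="$3" name="${4:-dev}"
  printf 'docker run --rm -it --name %s -v %s:%s %s\n' \
    "$name" "$host_dir" "$container_dir" "$image"
}

# Example with made-up paths: prints the command without running it.
docker_run_cmd geodev:1.0 /mnt/g/code /code
```

Paths containing spaces are not quoted by this sketch, so it is best kept to simple mount points.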

Quick Cheatsheet

List all containers (running and stopped): docker ps -a
Stop and remove all containers: docker ps -aq | xargs -r docker stop | xargs -r docker rm
Build from a Dockerfile (from within the repo folder): docker build -f Dockerfile -t name:version_number /path/to/repo
Start an instance: docker run -it --rm -p 8888:8888 -v /mnt/g:/home/rstudio/g pangeo:1.0 jupyter lab --ip 0.0.0.0
Attach to an instance: docker attach <name>
Shut down an instance: docker kill <name>
List the images available: docker image ls
Remove all stopped containers: docker container prune -f
Remove an image: docker rmi <image_name>
Remove all dangling (untagged) images: docker rmi $(docker images --filter "dangling=true" -q --no-trunc)
Remove everything unused (stopped containers, unused networks and images, build cache): docker system prune --all
Restart Docker with:
- sudo systemctl restart containerd.service
- sudo systemctl restart docker.socket
- sudo systemctl restart docker.service
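The "stop and remove all containers" one-liner from the cheatsheet can complain when there is nothing to remove. Below is a minimal sketch of the same idea as a function that exits quietly in that case. The name `docker_clean_all` and the message text are my own assumptions, not anything Docker ships.

```shell
#!/usr/bin/env bash
# Hypothetical helper: stop and remove every container, but do nothing
# (instead of erroring) when there are no containers at all.
docker_clean_all() {
  local ids
  ids="$(docker ps -aq)"      # IDs of all containers, running or stopped
  if [ -z "$ids" ]; then
    echo "no containers to remove"
    return 0
  fi
  # $ids is left unquoted on purpose so each ID becomes its own argument
  docker stop $ids && docker rm $ids
}

# Usage (destructive, so only call it when you mean it):
#   docker_clean_all
```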

Running Docker containers in VS Code

Launch VS Code on your local GFE and install these extensions:
- Remote-SSH
- Remote-SSH: Editing Conf
- Docker
- Python

Select the Remote Window status bar at the bottom left corner of the VS Code GUI: Open SSH configuration file -> C:\<user>\.ssh and update the following:

```{md}
Host <VM name>
    HostName <VM name>
    User <firstname.lastname>
    # generate this key in Moba from the VM terminal
    IdentityFile ~/.ssh/id_rsa
```

Select Remote Window status bar -> Connect to Host. You should see a list of all available Hosts from the config file. A new GUI will open and you should see the SSH connection in the Remote Window status bar. From this GUI, opening a terminal will connect to the VM and opening a file/folder will reference the local VM directory. Running a new docker container or attaching to an existing container in the terminal will connect to the container and you should be able to run files and lines of code from the editor in the container.

Accessing Jupyter Notebook in docker image

```{bash}
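# on the host: start the container and publish port 8080
docker run --rm -it --name <name> -v <path to repo>:/repo/ -v <path to data>:/data/ --expose 8080 -p 8080:8080 <image:version>
# inside the container: the port must match the one published above
jupyter-notebook --no-browser --ip 0.0.0.0 --port=8080 --allow-root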
```

Adding your user to the docker group

```{bash}
# check /etc/group to see if the docker group exists
cat /etc/group | grep docker

# create a docker group if it does not exist
sudo groupadd docker

# add yourself to the docker group (log out and back in for it to take effect)
sudo usermod -aG docker $(whoami)
```
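After adding yourself to the group, it is easy to forget that the current shell does not pick up the change until you log in again. Here is a minimal sketch of a check for that. The function name `has_group` and the messages are assumptions of mine; the only real commands used are `id -nG` and `newgrp`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if group name $2 appears in the
# space-separated group list $1 (the format printed by `id -nG`).
has_group() {
  case " $1 " in
    *" $2 "*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Check the groups of the current shell session.
if has_group "$(id -nG)" docker; then
  echo "current shell is in the docker group"
else
  echo "not in the docker group yet: log out and back in, or run 'newgrp docker'"
fi
```

Padding both strings with spaces keeps a group such as `dockerx` from matching `docker` by accident.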

Images and code

Python

more minimal: https://github.com/datawire/hello-world-python

From FIM

git clone https://github.com/NOAA-OWP/inundation-mapping.git
Build Docker Image : docker build -f Dockerfile -t owpfim:1.0 <path/to/inundation-mapping>
Run the image: docker run --rm -it --name <name> -v <path>/:/foss_fim -v <path>/:/outputs -v <path>/:/fim_temp -v <path>:/data owpfim:1.0
Create FIM group on host machine: groupadd -g 1370800178 fim
Change group ownership of repo (needs to be redone when a new file occurs in the repo): chgrp -R fim <path/to/repository>
jupyter-notebook --no-browser --ip 0.0.0.0 --port=8787 --allow-root

From PANGEO

git clone https://github.com/pangeo-data/pangeo-docker-images.git
Build Docker Image : docker build -f Dockerfile -t pangeo:1.0 /path/to/pangeo-docker-images
Launch a notebook: docker run -it --rm -p 8888:8888 pangeo:1.0 jupyter lab --ip 0.0.0.0

From R Docker (& personal dev env - geodev)

git clone https://github.com/rocker-org/rocker-versioned2.git
[optional] extend the Dockerfile to install desired packages.
From top of repo, build Docker Image:
- Mine: docker build -f ./dockerfiles/geodev.Dockerfile -t geodev:1.0 /path/to/rocker-versioned2
- RStudio: docker build -f ./dockerfiles/rstudio_4.3.1.Dockerfile -t rstudio:1.0 /path/to/rocker-versioned2
Launch the image: docker run -it --rm -p 8787:8787 -p 4200:4200 -e PASSWORD=YOURNEWPASSWORD -e ROOT=TRUE geodev:1.0
Point your browser to localhost:<port> (http://localhost:8787 in our example, the port exposed in the Dockerfile and published with -p in the run command)
- username: rstudio
- password: YOURNEWPASSWORD

Restarting rstudio-server

I break RStudio often and have a habit of not saving. Unlike a local env, if you just close without having saved you may lose changes. First, SAVE OFTEN. Second, one potential solution is to “log” into the container as root and restart the daemon. The operating system for rocker/rstudio:4.2.2 is Ubuntu and the steps below should work for Ubuntu/Debian.

Log into the container, changing the container ID/name accordingly: docker exec -it rstudio_server /bin/bash
List services: service --status-all
Restart RStudio: service rstudio-server restart
Exit and log back into RStudio
Hopefully the RStudio Server page is responsive when you visit it again and you are ok. And SAVE MORE OFTEN!
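The log in, restart, log out steps above can be collapsed into a single call from the host. This is a minimal sketch that assumes the container's default user is allowed to restart the service (true when it runs as root); the function name `restart_rstudio` is hypothetical and the default container name is simply the one used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: restart rstudio-server inside a running container
# without opening an interactive shell first.
# Usage: restart_rstudio [container_name]
restart_rstudio() {
  local container="${1:-rstudio_server}"
  docker exec "$container" service rstudio-server restart
}

# Usage:
#   restart_rstudio              # uses the default name rstudio_server
#   restart_rstudio my_container
```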

Aside: Site Building with Quarto

Important

Anyone know how to get reticulate to work in docker?

Some misc. setup:
- Extension installation for revealjs
  - I like the ability to increase font when folks complain that the text is too small to see¹ : quarto add gadenbuie/revealjs-text-resizer
  - I tend to overuse QR codes when I need to deploy them, so to make that aspect a little more seamless I add: quarto install extension jmbuhr/quarto-qrcode
  - Quizzes are an important tool in the learning process: https://github.com/parmsam/quarto-quiz
  - Font awesome helpers: quarto add quarto-ext/fontawesome
  - And even more flair to express myself: quarto add ArthurData/quarto-confetti
Quarto docs
Copy zotero lib to classes > quarto preview --host 0.0.0.0 --port 4200 --no-browser > http://localhost:4200/ > quarto render > git push docs
Quarto-revealjs helpers

Aside: Building R packages

See https://r-pkgs.org/whole-game.html#write-the-first-function, https://yonicd.github.io/sinew/articles/motivation.html, https://github.com/jthomasmock/pkg-building, and https://hilaryparker.com/2014/04/29/writing-an-r-package-from-scratch/

I have no intention of taking a deep dive into this (docs | cheatsheet) but in general you should:

Make a new package in a new folder, separate functions out.
Build your README.Rmd using this template:

```{md}
---
output: github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

#`#`#`{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  eval = FALSE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
#`#`#`  <!-- remove the # -->

markdown here...
```

Attribute with: withr::with_dir(getwd(), usethis::use_mit_license(name = "Cornholio"))
Make HTML with: usethis::use_pkgdown()
Append missing namespace with: sinew::pretty_namespace(getwd(),overwrite = TRUE)
Make first cut headers with:

```{r}
usethis::use_pkgdown()
usethis::use_pkgdown_github_pages()
sinew::sinew_opts$set(markdown_links = TRUE)
sinew::makeOxyFile(input = getwd(), overwrite = TRUE, verbose = FALSE)
```

You may need to add imports manually, like so:

```{r}
#' @import magrittr
#' @import data.table
#' @importFrom foreach `%do%`
#' @importFrom foreach `%dopar%`

marco <- function(in_value=TRUE) {
  # sinew::moga(file.path(getwd(),"R/hello.R"),overwrite = TRUE)
  # devtools::document()
  # pkgdown::build_site(new_process=TRUE)
  # devtools::load_all()
  # 
  # marco(in_value=TRUE)
  
  ## -- Start --
  
  print(" ⚠ WARNING WIZARD ⚠  ")
  print("                     ")
  print("    ⚠              ")
  print(" (∩｀-´)⊃━☆ﾟ.*･｡ﾟ  " )

  return(TRUE)
}
```

As we iterate we:
- Test functions with: devtools::load_all()
- Make new headers with: sinew::moga(file.path(getwd(),"R/hello.R"),overwrite = TRUE) and copy output into the file.
- Recreate .Rd files with: devtools::document()
- Delete the markdown version of the readme and run pkgdown::build_site(new_process = TRUE) or press the “knit” button.
Remove the docs line from .gitignore and push, then publish that folder as the GitHub Pages site. If you are pushing to a newly created empty repo, that will look something like:

```{bash}
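git init
git remote add origin https://github.com/JimColl/RRASSLER.git
# stage everything first, otherwise the commit has nothing in it
git add .
git commit -m "first commit"
# rename the current branch to main, whatever it was called
git branch -M main
git push --set-upstream origin main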
```

Footnotes

¹ as they sit in the back of the room looking down at their phone