I am writing this post mainly so that I do not forget these steps, and because the best way to keep your notes is to share them with others. Besides that, posts like this do not easily disappear. I also benefit from your feedback, which I find really useful for self-improvement, so your views and comments are much appreciated.
Running data pipelines in a virtual environment on a system like Ubuntu offers several benefits for data scientists and developers. First, it provides a sandboxed, isolated environment in which dependencies and packages do not conflict, which leads to stable and reproducible results. Second, it allows easy testing and debugging without affecting the system’s core functionality. Third, it makes it easier to deploy pipelines across different machines and cloud platforms. Lastly, it simplifies collaboration, because teams can share a consistent development environment. Overall, virtual environments help data professionals streamline pipeline development.
Setting up the Ubuntu VM on your local machine
1. Search for Windows PowerShell on your local machine and open it.
2. Type the ‘ssh’ command to check that it is installed. SSH stands for Secure Shell, and it is a cryptographic network protocol used for secure remote access and communication over unsecured networks. It provides a secure channel between two devices, typically a client and a server, allowing them to exchange data and execute commands securely.
3. ‘dir’ lists the contents of the current directory and ‘mkdir’ creates a folder. Both work in PowerShell; in a Linux shell, the usual equivalent of ‘dir’ is ‘ls’.
4. Type ‘wsl’ to see if it is set up. It is required for the installation. WSL stands for Windows Subsystem for Linux. It is a compatibility layer in Windows that allows users to run a Linux distribution directly on a Windows machine. With WSL, you can access and use a Linux terminal and run Linux command-line utilities alongside your Windows applications.
5. ‘wsl --list --online’ lists the valid distributions that are available to install on your system.
6. To install the desired distribution, use ‘wsl --install -d Ubuntu-22.04’ (replace Ubuntu-22.04 with the version you want to install).
7. After the installation, you may or may not be asked to reboot the laptop to complete the installation. But before that, if this is your first time, you may also be asked to set up a Unix username and password. Then the process should be complete.
8. To confirm the installation, type ‘uname -a’ inside the Ubuntu shell; it prints the kernel and system details.
9. In your PowerShell CLI, type ‘wsl -l -v’ to see which distributions are installed and whether each one is running or stopped. To relaunch the virtual machine, type ‘wsl -d Ubuntu-22.04’ (use the name of the machine you want to launch).
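The check in step 8 can also be scripted from inside the Ubuntu shell. The sketch below is a heuristic, not an official test: it assumes that WSL kernels carry the word "microsoft" in their release string, which is what ‘uname -r’ normally reports under WSL. The function name is_wsl is made up for this post.

```shell
# Succeeds when the given kernel release string looks like a WSL kernel.
# Assumption: WSL kernel releases contain "microsoft" (in any letter case).
is_wsl() {
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    *microsoft*) return 0 ;;
    *) return 1 ;;
  esac
}

# Apply the check to the kernel this shell is running on.
if is_wsl "$(uname -r)"; then
  echo "Running inside WSL: $(uname -r)"
else
  echo "Not a WSL kernel: $(uname -r)"
fi
```

If the second message appears, you are most likely in a regular Linux shell or a container rather than in the WSL distribution.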
Working with Docker
- Search for Docker Desktop, download it, and follow the installation instructions. They are pretty straightforward.
Validate Docker on Windows from the command line using PowerShell
- Use the ‘docker run hello-world’ command to check that the installation completed successfully.
- ‘docker images’ lists the images stored on your machine.
- ‘docker ps -a’ lists all containers, including the stopped ones, with their run info.
- ‘docker rm container_id’ removes a stopped container; if it is still running, use ‘docker stop container_id’ first.
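Before running the commands above, it can help to check that Docker is actually reachable. The following is a minimal sketch for the Ubuntu shell, assuming the docker CLI is exposed there (for example through Docker Desktop's WSL integration). The helper names have_cmd and docker_status are made up for this post; ‘docker info’ is used only as a quick way to see whether the daemon answers.

```shell
# Succeeds when the named command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Prints a one-line status describing whether Docker is usable.
docker_status() {
  if ! have_cmd docker; then
    echo "docker: not installed or not on PATH"
  elif ! docker info >/dev/null 2>&1; then
    echo "docker: installed, but the daemon is not reachable (is Docker Desktop running?)"
  else
    echo "docker: ready"
  fi
}

docker_status
```

Only when the status is "docker: ready" is it worth moving on to ‘docker run hello-world’.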
Bringing all of them together
1. Running ‘docker run -i -t --rm ubuntu bash’ starts an Ubuntu-based Docker container with an interactive bash shell. Thanks to ‘--rm’, the container is deleted automatically once we close it.
2. Inside the container, ‘free -h’ shows the memory info in human-readable units, along with some other information that is useful when working locally on your machine.
Installation of Python and distutils on Windows using WSL Ubuntu
- For development purposes, Python 3.9 will be used, so the version we are looking for here is Python 3.9 for Ubuntu.
- In the Ubuntu environment in your PowerShell window, copy and paste the commands below one by one:
1. Update the package list and install the prerequisites:
sudo apt update
sudo apt install software-properties-common
2. Add the deadsnakes PPA to your system's sources list:
sudo add-apt-repository ppa:deadsnakes/ppa
3. Update the package list again:
sudo apt update
4. Install Python 3.9:
sudo apt install python3.9
5. Verify that the installation was successful by typing:
python3.9 --version
To create a virtual environment for this Python version, first check whether ‘python3.9 -m venv <environment_name>’ works. If it does not, run ‘sudo apt install python3-distutils -y’ to make sure that you have the necessary setup for the virtual environment, then re-run the command above to verify the result.
Quick side note: if, like me, you get this error:
Error: Command '['/home/home_folder/p39-venv/bin/python3.9', '-Im', 'ensurepip', '--upgrade', '--default-pip']' returned non-zero exit status 1.
use the command below to install the remaining packages needed for creating the Python virtual environment:
sudo apt-get install python3.9-dev python3.9-venv
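The error above comes from the ensurepip step, so one way to check up front whether an interpreter is ready is to try importing the two modules involved. This is a sketch under that assumption; the function name venv_ready and the PY variable are made up for this post.

```shell
# Succeeds when the given interpreter can import both venv and ensurepip.
# It also fails when the interpreter itself is not installed.
venv_ready() {
  "$1" -c 'import venv, ensurepip' >/dev/null 2>&1
}

# Interpreter to check; defaults to the python3.9 installed earlier.
PY="${PY:-python3.9}"

if venv_ready "$PY"; then
  echo "$PY can create virtual environments with pip"
else
  echo "$PY is missing venv or ensurepip (or is not installed)"
fi
```

If the second message is printed, installing the packages above and running the check again should show whether the problem is solved.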
Next, proceed with the virtual environment creation:
python3.9 -m venv p39-venv
Then activation:
source p39-venv/bin/activate
Make sure that you have the correct Python version in the virtual environment by running ‘python --version’.
Use the commands below to deactivate and delete the virtual environment you created.
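Besides the version, you can confirm that the environment is really active. The activate script sets the VIRTUAL_ENV variable to the path of the environment, so a small check like the one below works; the function name active_venv is made up for this post.

```shell
# Prints the path of the active virtual environment, if there is one.
# Relies on VIRTUAL_ENV, which the venv activate script exports.
active_venv() {
  if [ -n "${VIRTUAL_ENV:-}" ]; then
    echo "active: $VIRTUAL_ENV"
  else
    echo "no virtual environment is active"
  fi
}

active_venv
```

After ‘source p39-venv/bin/activate’, the printed path should end in p39-venv.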
deactivate
rm -rf <venv_name>
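Since ‘rm -rf’ deletes whatever path it is given, a guarded version can be safer. The sketch below only deletes a directory that contains pyvenv.cfg, the marker file that ‘python -m venv’ writes at the root of an environment. The function name remove_venv is made up for this post.

```shell
# Deletes a directory only if it looks like a virtual environment,
# meaning it contains the pyvenv.cfg file created by python -m venv.
remove_venv() {
  if [ -f "$1/pyvenv.cfg" ]; then
    rm -rf -- "$1"
    echo "removed $1"
  else
    echo "refusing to delete $1: no pyvenv.cfg found" >&2
    return 1
  fi
}
```

For the environment from this post, the call would be ‘remove_venv p39-venv’, run after ‘deactivate’.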