Python in Visual Studio Code. Working with Python in Visual Studio Code, using the Microsoft Python extension, is simple, fun, and productive.The extension makes VS Code an excellent Python editor, and works on any operating system with a variety of Python interpreters. Python in Visual Studio Code. Working with Python in Visual Studio Code, using the Microsoft Python extension, is simple, fun, and productive. The extension makes VS Code an excellent Python editor, and works on any operating system with a variety of Python interpreters. It leverages all of VS Code's power to provide auto complete. A Visual Studio Code extension to convert the current editor selection from JSON to CSV, and vice versa. After installation, you'll find two new commands in the Command Palette (F1 by default): Convert JSON to CSV Convert CSV to JSON.
This tutorial demonstrates using Visual Studio Code and the Microsoft Python extension with common data science libraries to explore a basic data science scenario. Specifically, using passenger data from the Titanic, you will learn how to set up a data science environment, import and clean data, create a machine learning model for predicting survival on the Titanic, and evaluate the accuracy of the generated model.
Prerequisites
The following installations are required for the completion of the tutorial. If you do not have them already, install them prior to beginning.
The Python extension for VS Code from the Visual Studio Marketplace. For additional details on installing extensions, see Extension Marketplace. The Python extension is named Python and published by Microsoft.
Note: If you already have the full Anaconda distribution installed, you don't need to install Miniconda. Alternatively, if you'd prefer not to use Anaconda or Miniconda, you can create a Python virtual environment and install the packages needed for the tutorial using pip. If you go this route, you will need to install the following packages: pandas, jupyter, seaborn, scikit-learn, keras, and tensorflow.
Set up a data science environment
Visual Studio Code and the Python extension provide a great editor for data science scenarios. With native support for Jupyter notebooks combined with Anaconda, it's easy to get started. In this section, you will create a workspace for the tutorial, create an Anaconda environment with the data science modules needed for the tutorial, and create a Jupyter notebook that you'll use for creating a machine learning model.
Begin by creating an Anaconda environment for the data science tutorial. Open an Anaconda command prompt and run
conda create -n myenv python=3.7 pandas jupyter seaborn scikit-learn keras tensorflow
to create an environment named myenv. For additional information about creating and managing Anaconda environments, see the Anaconda documentation.Next, create a folder in a convenient location to serve as your VS Code workspace for the tutorial, name it
hello_ds
.Open the project folder in VS Code by running VS Code and using the File > Open Folder command.
Once VS Code launches, open the Command Palette (View > Command Palette or ⇧⌘P (Windows, Linux Ctrl+Shift+P)). Then select the Python: Select Interpreter command:
The Python: Select Interpreter command presents the list of available interpreters that VS Code was able to locate automatically (your list will vary from the one shown below; if you don't see the desired interpreter see Configuring Python environments). From the list, select the Anaconda environment you created, which should include the text 'myenv': conda.
With the environment and VS Code setup, the final step is to create the Jupyter notebook that will be used for the tutorial. Open the Command Palette (⇧⌘P (Windows, Linux Ctrl+Shift+P)) and select Jupyter: Create New Blank Jupyter Notebook.
Note: Alternatively, from the VS Code File Explorer, you can use the New File icon to create a Notebook file named
hello.ipynb
.Use the Save icon on the main notebook toolbar to save the notebook with the filename
hello
.After your file is created, you should see the open Jupyter notebook in the native notebook editor. For additional information about native Jupyter notebook support, see this section of the documentation.
Prepare the data
This tutorial uses the Titanic dataset available on OpenML.org, which is obtained from Vanderbilt University's Department of Biostatistics at http://biostat.mc.vanderbilt.edu/DataSets. The Titanic data provides information about the survival of passengers on the Titanic, as well as characteristics about the passengers such as age and ticket class. Using this data, the tutorial will establish a model for predicting whether a given passenger would have survived the sinking of the Titanic. This section shows how to load and manipulate data in your Jupyter notebook.
To begin, download the Titanic data from OpenML.org as a csv file named
data.csv
and save it to thehello_ds
folder that you created in the previous section.In VS Code, open the
hello_ds
folder and the Jupyter notebook (hello.ipynb
), by going to File > Open Folder.Within your Jupyter notebook begin by importing the pandas and numpy libraries, two common libraries used for manipulating data, and loading the Titanic data into a pandas DataFrame. To do so, copy the below code into the first cell of the notebook. For additional guidance about working with Jupyter notebooks in VS Code, see the Working with Jupyter Notebooks documentation.
Now, run the cell using the Run cell icon or the Shift+Enter shortcut.
After the cell finishes running, you can view the data that was loaded using the variable explorer and data viewer. First click on the chart icon in the notebook's upper toolbar, then the data viewer icon to the right of the
data
variable. For additional information about the data set, refer to this document about how it was constructed.You can then use the data viewer to view, sort, and filter the rows of data. After reviewing the data, it can then be helpful to graph some aspects of it to help visualize the relationships between the different variables.
Before the data can be graphed though, you need to make sure that there aren't any issues with it. If you look at the Titanic csv file, one thing you'll notice is that a question mark ('?') was used to designate cells where data wasn't available.
While Pandas can read this value into a DataFrame, the result for a column like Age is that its data type will be set to Object instead of a numeric data type, which is problematic for graphing.
This problem can be corrected by replacing the question mark with a missing value that pandas is able to understand. Add the following code to the next cell in your notebook to replace the question marks in the age and fare columns with the numpy NaN value. Notice that we also need to update the column's data type after replacing the values.
Tip: To add a new cell you can use the insert cell icon that's in the bottom left corner of an existing cell. Alternatively, you can also use the Esc to enter command mode, followed by the B key.
Note: If you ever need to see the data type that has been used for a column, you can use the DataFrame dtypes attribute.
Now that the data is in good shape, you can use seaborn and matplotlib to view how certain columns of the dataset relate to survivability. Add the following code to the next cell in your notebook and run it to see the generated plots.
Note: To better view details on the graphs, you can open them in plot viewer by hovering over the upper left corner of the graph and clicking the button that appears.
These graphs are helpful in seeing some of the relationships between survival and the input variables of the data, but it's also possible to use pandas to calculate correlations. To do so, all the variables used need to be numeric for the correlation calculation and currently gender is stored as a string. To convert those string values to integers, add and run the following code.
Now, you can analyze the correlation between all the input variables to identify the features that would be the best inputs to a machine learning model. The closer a value is to 1, the higher the correlation between the value and the result. Use the following code to correlate the relationship between all variables and survival.
Looking at the correlation results, you'll notice that some variables like gender have a fairly high correlation to survival, while others like relatives (sibsp = siblings or spouse, parch = parents or children) seem to have little correlation.
Let's hypothesize that sibsp and parch are related in how they affect survivability, and group them into a new column called 'relatives' to see whether the combination of them has a higher correlation to survivability. To do this, you will check if for a given passenger, the number of sibsp and parch is greater than 0 and, if so, you can then say that they had a relative on board.
Use the following code to create a new variable and column in the dataset called
relatives
and check the correlation again.You'll notice that in fact when looked at from the standpoint of whether a person had relatives, versus how many relatives, there is a higher correlation with survival. With this information in hand, you can now drop from the dataset the low value sibsp and parch columns, as well as any rows that had NaN values, to end up with a dataset that can be used for training a model.
Note: Although age had a low direct correlation, it was kept because it seems reasonable that it might still have correlation in conjunction with other inputs.
Train and evaluate a model
With the dataset ready, you can now begin creating a model. For this section you'll use the scikit-learn library (as it offers some useful helper functions) to do pre-processing of the dataset, train a classification model to determine survivability on the Titanic, and then use that model with test data to determine its accuracy.
A common first step to training a model is to divide up the dataset into training and validation data. This allows you to use a portion of the data to train the model and a portion of the data to test the model. If you used all your data to train the model, you wouldn't have a way to estimate how well it would actually perform against data the model has not yet seen. A benefit of the scikit-learn library is that it provides a method specifically for splitting a dataset into training and test data.
Add and run a cell with the following code to the notebook to split up the data.
Next, you'll normalize the inputs such that all features are treated equally. For example, within the dataset the values for age range from ~0-100, while gender is only a 1 or 0. By normalizing all the variables, you can ensure that the ranges of values are all the same. Use the following code in a new code cell to scale the input values.
There are a number of different machine learning algorithms that you could choose from to model the data and scikit-learn provides support for a number of them, as well as a chart to help select the one that's right for your scenario. For now, use the Naïve Bayes algorithm, a common algorithm for classification problems. Add a cell with the following code to create and train the algorithm.
With a trained model, you can now try it against the test data set that was held back from training. Add and run the following code to predict the outcome of the test data and calculate the accuracy of the model.
Looking at the result of the test data, you'll see that the trained algorithm had a ~75% success rate at estimating survival.
(Optional) Use a neural network to increase accuracy
A neural network is a model that uses weights and activation functions, modeling aspects of human neurons, to determine an outcome based on provided inputs. Unlike the machine learning algorithm you looked at previously, neural networks are a form of deep learning wherein you don't need to know an ideal algorithm for your problem set ahead of time. It can be used for many different scenarios and classification is one of them. For this section, you'll use the Keras library with TensorFlow to construct the neural network, and explore how it handles the Titanic dataset.
The first step is to import the required libraries and to create the model. In this case, you'll use a Sequential neural network, which is a layered neural network wherein there are multiple layers that feed into each other in sequence.
After defining the model, the next step is to add the layers of the neural network. For now, let's keep things simple and just use three layers. Add the following code to create the layers of the neural network.
- The first layer will be set to have a dimension of 5, since you have 5 inputs: sex, pclass, age, relatives, and fare.
- The last layer must output 1, since you want a 1-dimensional output indicating whether a passenger would survive.
- The middle layer was kept at 5 for simplicity, although that value could have been different.
The rectified linear unit (relu) activation function is used as a good general activation function for the first two layers, while the sigmoid activation function is required for the final layer as the output you want (of whether a passenger survives or not) needs to be scaled in the range of 0-1 (the probability of a passenger surviving).
You can also look at the summary of the model you built with this line of code:
Once the model is created, it needs to be compiled. As part of this, you need to define what type of optimizer will be used, how loss will be calculated, and what metric should be optimized for. Add the following code to build and train the model. You'll notice that after training the accuracy is ~80%.
Note: This step may take anywhere from a few seconds to a few minutes to run depending on your machine.
With the model built and trained its now time to see how it performs against the test data.
Similar to the training, you'll notice that you were able to get close to 80% accuracy in predicting survival of passengers. This result was better than the 75% accuracy from the Naive Bayes Classifier tried previously.
Next steps
Now that you're familiar with the basics of performing machine learning within Visual Studio Code, here are some other Microsoft resources and tutorials to check out.
- Learn more about working with Jupyter Notebooks in Visual Studio Code (video).
- Get started with Azure Machine Learning for VS Code to deploy and optimize your model using the power of Azure.
- Find additional data to explore on Azure Open Data Sets.
In this tutorial, you use Python 3 to create the simplest Python 'Hello World' application in Visual Studio Code. By using the Python extension, you make VS Code into a great lightweight Python IDE (which you may find a productive alternative to PyCharm).
This tutorial introduces you to VS Code as a Python environment, primarily how to edit, run, and debug code through the following tasks:
- Write, run, and debug a Python 'Hello World' Application
- Learn how to install packages by creating Python virtual environments
- Write a simple Python script to plot figures within VS Code
This tutorial is not intended to teach you Python itself. Once you are familiar with the basics of VS Code, you can then follow any of the programming tutorials on python.org within the context of VS Code for an introduction to the language.
If you have any problems, feel free to file an issue for this tutorial in the VS Code documentation repository.
Prerequisites
To successfully complete this tutorial, you need to first setup your Python development environment. Specifically, this tutorial requires:
- VS Code
- VS Code Python extension
- Python 3
Install Visual Studio Code and the Python Extension
If you have not already done so, install VS Code.
Next, install the Python extension for VS Code from the Visual Studio Marketplace. For additional details on installing extensions, see Extension Marketplace. The Python extension is named Python and it's published by Microsoft.
Install a Python interpreter
Along with the Python extension, you need to install a Python interpreter. Which interpreter you use is dependent on your specific needs, but some guidance is provided below.
Windows
Install Python from python.org. You can typically use the Download Python button that appears first on the page to download the latest version.
Note: If you don't have admin access, an additional option for installing Python on Windows is to use the Microsoft Store. The Microsoft Store provides installs of Python 3.7, Python 3.8, and Python 3.9. Be aware that you might have compatibility issues with some packages using this method.
For additional information about using Python on Windows, see Using Python on Windows at Python.org
macOS
The system install of Python on macOS is not supported. Instead, an installation through Homebrew is recommended. To install Python using Homebrew on macOS use brew install python3
at the Terminal prompt.
Note On macOS, make sure the location of your VS Code installation is included in your PATH environment variable. See these setup instructions for more information.
Linux
The built-in Python 3 installation on Linux works well, but to install other Python packages you must install pip
with get-pip.py.
Other options
Data Science: If your primary purpose for using Python is Data Science, then you might consider a download from Anaconda. Anaconda provides not just a Python interpreter, but many useful libraries and tools for data science.
Windows Subsystem for Linux: If you are working on Windows and want a Linux environment for working with Python, the Windows Subsystem for Linux (WSL) is an option for you. If you choose this option, you'll also want to install the Remote - WSL extension. For more information about using WSL with VS Code, see VS Code Remote Development or try the Working in WSL tutorial, which will walk you through setting up WSL, installing Python, and creating a Hello World application running in WSL.
Verify the Python installation
To verify that you've installed Python successfully on your machine, run one of the following commands (depending on your operating system):
Linux/macOS: open a Terminal Window and type the following command:
Windows: open a command prompt and run the following command:
If the installation was successful, the output window should show the version of Python that you installed.
Note You can use the py -0
command in the VS Code integrated terminal to view the versions of python installed on your machine. The default interpreter is identified by an asterisk (*).
Start VS Code in a project (workspace) folder
Using a command prompt or terminal, create an empty folder called 'hello', navigate into it, and open VS Code (code
) in that folder (.
) by entering the following commands:
Note: If you're using an Anaconda distribution, be sure to use an Anaconda command prompt.
By starting VS Code in a folder, that folder becomes your 'workspace'. VS Code stores settings that are specific to that workspace in .vscode/settings.json
, which are separate from user settings that are stored globally.
Alternately, you can run VS Code through the operating system UI, then use File > Open Folder to open the project folder.
Select a Python interpreter
Python is an interpreted language, and in order to run Python code and get Python IntelliSense, you must tell VS Code which interpreter to use.
From within VS Code, select a Python 3 interpreter by opening the Command Palette (⇧⌘P (Windows, Linux Ctrl+Shift+P)), start typing the Python: Select Interpreter command to search, then select the command. You can also use the Select Python Environment option on the Status Bar if available (it may already show a selected interpreter, too):
The command presents a list of available interpreters that VS Code can find automatically, including virtual environments. If you don't see the desired interpreter, see Configuring Python environments.
Note: When using an Anaconda distribution, the correct interpreter should have the suffix ('base':conda)
, for example Python 3.7.3 64-bit ('base':conda)
.
Selecting an interpreter sets the python.pythonPath
value in your workspace settings to the path of the interpreter. To see the setting, select File > Preferences > Settings (Code > Preferences > Settings on macOS), then select the Workspace Settings tab.
Note: If you select an interpreter without a workspace folder open, VS Code sets python.pythonPath
in your user settings instead, which sets the default interpreter for VS Code in general. The user setting makes sure you always have a default interpreter for Python projects. The workspace settings lets you override the user setting.
Create a Python Hello World source code file
From the File Explorer toolbar, select the New File button on the hello
folder:
Name the file hello.py
, and it automatically opens in the editor:
By using the .py
file extension, you tell VS Code to interpret this file as a Python program, so that it evaluates the contents with the Python extension and the selected interpreter.
Note: The File Explorer toolbar also allows you to create folders within your workspace to better organize your code. You can use the New folder button to quickly create a folder.
Now that you have a code file in your Workspace, enter the following source code in hello.py
:
When you start typing print
, notice how IntelliSense presents auto-completion options.
IntelliSense and auto-completions work for standard Python modules as well as other packages you've installed into the environment of the selected Python interpreter. It also provides completions for methods available on object types. For example, because the msg
variable contains a string, IntelliSense provides string methods when you type msg.
:
Feel free to experiment with IntelliSense some more, but then revert your changes so you have only the msg
variable and the print
call, and save the file (⌘S (Windows, Linux Ctrl+S)).
For full details on editing, formatting, and refactoring, see Editing code. The Python extension also has full support for Linting.
Run Hello World
It's simple to run hello.py
with Python. Just click the Run Python File in Terminal play button in the top-right side of the editor.
The button opens a terminal panel in which your Python interpreter is automatically activated, then runs python3 hello.py
(macOS/Linux) or python hello.py
(Windows):
There are three other ways you can run Python code within VS Code:
Right-click anywhere in the editor window and select Run Python File in Terminal (which saves the file automatically):
Select one or more lines, then press Shift+Enter or right-click and select Run Selection/Line in Python Terminal. This command is convenient for testing just a part of a file.
From the Command Palette (⇧⌘P (Windows, Linux Ctrl+Shift+P)), select the Python: Start REPL command to open a REPL terminal for the currently selected Python interpreter. In the REPL, you can then enter and run lines of code one at a time.
Configure and run the debugger
Let's now try debugging our simple Hello World program.
First, set a breakpoint on line 2 of hello.py
by placing the cursor on the print
call and pressing F9. Alternately, just click in the editor's left gutter, next to the line numbers. When you set a breakpoint, a red circle appears in the gutter.
Next, to initialize the debugger, press F5. Since this is your first time debugging this file, a configuration menu will open from the Command Palette allowing you to select the type of debug configuration you would like for the opened file.
Note: VS Code uses JSON files for all of its various configurations; launch.json
is the standard name for a file containing debugging configurations.
These different configurations are fully explained in Debugging configurations; for now, just select Python File, which is the configuration that runs the current file shown in the editor using the currently selected Python interpreter.
The debugger will stop at the first line of the file breakpoint. The current line is indicated with a yellow arrow in the left margin. If you examine the Local variables window at this point, you will see now defined msg
variable appears in the Local pane.
A debug toolbar appears along the top with the following commands from left to right: continue (F5), step over (F10), step into (F11), step out (⇧F11 (Windows, Linux Shift+F11)), restart (⇧⌘F5 (Windows, Linux Ctrl+Shift+F5)), and stop (⇧F5 (Windows, Linux Shift+F5)).
The Status Bar also changes color (orange in many themes) to indicate that you're in debug mode. The Python Debug Console also appears automatically in the lower right panel to show the commands being run, along with the program output.
To continue running the program, select the continue command on the debug toolbar (F5). The debugger runs the program to the end.
Tip Debugging information can also be seen by hovering over code, such as variables. In the case of msg
, hovering over the variable will display the string Hello world
in a box above the variable.
You can also work with variables in the Debug Console (If you don't see it, select Debug Console in the lower right area of VS Code, or select it from the ... menu.) Then try entering the following lines, one by one, at the > prompt at the bottom of the console:
Select the blue Continue button on the toolbar again (or press F5) to run the program to completion. 'Hello World' appears in the Python Debug Console if you switch back to it, and VS Code exits debugging mode once the program is complete.
If you restart the debugger, the debugger again stops on the first breakpoint.
To stop running a program before it's complete, use the red square stop button on the debug toolbar (⇧F5 (Windows, Linux Shift+F5)), or use the Run > Stop debugging menu command.
For full details, see Debugging configurations, which includes notes on how to use a specific Python interpreter for debugging.
Tip: Use Logpoints instead of print statements: Developers often litter source code with print
statements to quickly inspect variables without necessarily stepping through each line of code in a debugger. In VS Code, you can instead use Logpoints. A Logpoint is like a breakpoint except that it logs a message to the console and doesn't stop the program. For more information, see Logpoints in the main VS Code debugging article.
Install and use packages
Let's now run an example that's a little more interesting. In Python, packages are how you obtain any number of useful code libraries, typically from PyPI. For this example, you use the matplotlib
and numpy
packages to create a graphical plot as is commonly done with data science. (Note that matplotlib
cannot show graphs when running in the Windows Subsystem for Linux as it lacks the necessary UI support.)
Return to the Explorer view (the top-most icon on the left side, which shows files), create a new file called standardplot.py
, and paste in the following source code:
Tip: If you enter the above code by hand, you may find that auto-completions change the names after the as
keywords when you press Enter at the end of a line. To avoid this, type a space, then Enter.
Next, try running the file in the debugger using the 'Python: Current file' configuration as described in the last section.
Unless you're using an Anaconda distribution or have previously installed the matplotlib
package, you should see the message, 'ModuleNotFoundError: No module named 'matplotlib'. Such a message indicates that the required package isn't available in your system.
To install the matplotlib
package (which also installs numpy
as a dependency), stop the debugger and use the Command Palette to run Terminal: Create New Integrated Terminal (⌃⇧` (Windows, Linux Ctrl+Shift+`)). This command opens a command prompt for your selected interpreter.
Vs Code Csv Editor
A best practice among Python developers is to avoid installing packages into a global interpreter environment. You instead use a project-specific virtual environment
that contains a copy of a global interpreter. Once you activate that environment, any packages you then install are isolated from other environments. Such isolation reduces many complications that can arise from conflicting package versions. To create a virtual environment and install the required packages, enter the following commands as appropriate for your operating system:
Note: For additional information about virtual environments, see Environments.
Create and activate the virtual environment
Note: When you create a new virtual environment, you should be prompted by VS Code to set it as the default for your workspace folder. If selected, the environment will automatically be activated when you open a new terminal.
For Windows
If the activate command generates the message 'Activate.ps1 is not digitally signed. You cannot run this script on the current system.', then you need to temporarily change the PowerShell execution policy to allow scripts to run (see About Execution Policies in the PowerShell documentation):
For macOS/Linux
Select your new environment by using the Python: Select Interpreter command from the Command Palette.
Install the packages
Rerun the program now (with or without the debugger) and after a few moments a plot window appears with the output:
Once you are finished, type
deactivate
in the terminal window to deactivate the virtual environment.
For additional examples of creating and activating a virtual environment and installing packages, see the Django tutorial and the Flask tutorial.
Next steps
Vs Code Csv Delete Column
You can configure VS Code to use any Python environment you have installed, including virtual and conda environments. You can also use a separate environment for debugging. For full details, see Environments.
To learn more about the Python language, follow any of the programming tutorials listed on python.org within the context of VS Code.
To learn to build web apps with the Django and Flask frameworks, see the following tutorials:
There is then much more to explore with Python in Visual Studio Code:
- Editing code - Learn about autocomplete, IntelliSense, formatting, and refactoring for Python.
- Linting - Enable, configure, and apply a variety of Python linters.
- Debugging - Learn to debug Python both locally and remotely.
- Testing - Configure test environments and discover, run, and debug tests.
- Settings reference - Explore the full range of Python-related settings in VS Code.