Automate PowerPoint Presentation Report with Python | by Cornellius Yudha Wijaya | Jan, 2023


Photo by Nghia Nguyen on Unsplash

Data people and business people constantly work hand in hand. One of their most common shared activities is building PowerPoint report presentations that summarize all the important messages.

Working on these PowerPoint reports every day can feel like too much for data people, who would rather focus on more important things.

This article discusses how we could develop a PowerPoint presentation with Python and automate the reporting process. Let’s get into it.

To help us automate the PowerPoint presentation report, we would use the Python package called python-pptx. It is a Python package developed to create and update PowerPoint files.

To start using the package, we first need to install it with the following command.

pip install python-pptx

After that, let’s understand the basics of the package used to create our PowerPoint presentation.

The first thing we need to do is set up the Presentation class, the root object that contains everything else (slides, images, tables, etc.). We can open an existing PowerPoint file or start a new one. For this example, we will create a new presentation and fill it ourselves.

Before creating the presentation, we need to understand the Slide object concept. Our presentation would need a slide, and every slide is based on the theme slide layout.

The basic presentation theme from PowerPoint contains nine different layouts, such as Title, Title and Content, Blank, etc. Each layout provides predefined areas that we can fill with text, images, bullet points, and so on.

In python-pptx, these slide layouts are available as prs.slide_layouts[0] through prs.slide_layouts[8]. The order of the layouts is listed in the python-pptx documentation. Let’s try to create two empty slides with different layouts.

from pptx import Presentation

prs = Presentation()
title_slide_layout = prs.slide_layouts[0]
slide1 = prs.slides.add_slide(title_slide_layout)

bullet_slide_layout = prs.slide_layouts[1]
slide2 = prs.slides.add_slide(bullet_slide_layout)

prs.save('test.pptx')

Image by Author

The result is a PowerPoint file with two different slides. The slides are still empty, so the next step is to fill them with content. To do that, we need to understand the concept of a Shape.

A Shape is anything that can be shown on a slide: text, pictures, tables, and so on. All the available Shape objects are listed in the documentation.

Additionally, we need to understand the concept of Placeholders for adding content to our PowerPoint file. A Placeholder is itself a Shape object, and a slide’s placeholders can be accessed using the following code.

for shape in slide1.placeholders:
    print('{} {}'.format(shape.placeholder_format.idx, shape.name))

Image by Author

The above output shows that our first slide has two placeholders: the Title and the Subtitle. We can put text into each of these placeholders.

title = slide1.shapes.title
subtitle = slide1.placeholders[1]

title.text = "This is an example text"
subtitle.text = "Amazing!"

prs.save('test.pptx')

Image by Author

Our slide is now updated with the new text in the Title and Subtitle. It’s also possible to add other objects to a slide outside of the placeholders, such as a Text Box. To do that, we could use the following code.

from pptx.util import Inches

title2 = slide2.shapes.title
title2.text = 'This is a Textbox example'

left = top = width = height = Inches(3)
txBox = slide2.shapes.add_textbox(left, top, width, height)
tf = txBox.text_frame

tf.text = "This is text inside a textbox as a sample"

prs.save('test.pptx')

Image by Author

With the code above, we create a 3-inch by 3-inch textbox on the second slide and place its top-left corner 3 inches from the left and top edges of the slide, which puts it roughly in the middle.

There are still many things that we could add to our slide, including:

Table

title_only_slide_layout = prs.slide_layouts[5]
slide3 = prs.slides.add_slide(title_only_slide_layout)

shapes3 = slide3.shapes

shapes3.title.text = 'Creating Table With Python'

rows = cols = 3
left = Inches(1)
top = Inches(2)
width = Inches(4.0)
height = Inches(1)

table = shapes3.add_table(rows, cols, left, top, width, height).table

# set column widths
table.columns[0].width = Inches(2.0)
table.columns[1].width = Inches(4.0)
table.columns[2].width = Inches(2.0)

# write column headings
table.cell(0, 0).text = 'Column 1'
table.cell(0, 1).text = 'Column 2'
table.cell(0, 2).text = 'Column 3'

# write body cells
table.cell(1, 0).text = 'Text 1 Example'
table.cell(1, 1).text = 'Text 2 Example'
table.cell(1, 2).text = 'Text 3 Example'

prs.save('test.pptx')

Image by Author

Chart

import seaborn as sns
from pptx.chart.data import CategoryChartData
from pptx.enum.chart import XL_CHART_TYPE

# load the mpg sample data used for the chart
df = sns.load_dataset('mpg')

slide4 = prs.slides.add_slide(title_only_slide_layout)

slide4.shapes.title.text = 'Creating Chart with Python'

chart_data = CategoryChartData()
mean_group = list(df['origin'].unique())
mean_group_res = tuple(df.groupby('origin')['mpg'].mean()[mean_group].values)
chart_data.categories = mean_group
chart_data.add_series('Mean by Origin', mean_group_res)

# add chart to slide --------------------
x, y, cx, cy = Inches(2), Inches(2), Inches(6), Inches(4.5)
slide4.shapes.add_chart(
    XL_CHART_TYPE.COLUMN_CLUSTERED, x, y, cx, cy, chart_data
)

Image by Author

Image

slide5 = prs.slides.add_slide(title_only_slide_layout)

slide5.shapes.title.text = 'Add Image with Python'

# numeric_only=True skips the text columns (origin, name); needs pandas >= 1.5
img = sns.heatmap(df.corr(numeric_only=True), annot=True).get_figure()
img.savefig('heatmap1.png')

left = top = Inches(3)
height = Inches(4)
pic = slide5.shapes.add_picture('heatmap1.png', left, top, height=height)

prs.save('test.pptx')

Image by Author

There is still much more we can do with python-pptx; refer to the documentation to learn more.

With python-pptx, we can turn the manual report into an automated one. What remains is a pipeline that triggers the report generation.

Here is an example of report automation with GitHub Actions.

First, we must set up the repository for all our files in GitHub.

Image by Author

After initiating our repository, we can clone it to a local machine or any preferred environment. Let’s prepare a few files required in our environment.

Requirement text file

We will use a Python environment, so we need to state which packages the process requires. Create a text file called requirement.txt and list the packages inside it; the script below imports pandas, seaborn, and python-pptx (imported as pptx).

Image by Author

Python Script

We need a Python script to create our report every time new data is incoming.

First, we need to prepare the data. For our example, I would save the mpg sample data in the data folder. You could always change the dataset to the other data.

import seaborn as sns
df = sns.load_dataset('mpg')
# write into the existing data folder, where the script expects the file
df.to_csv('data/mpg.csv', index=False)

Next, we need to prepare the script. I would use the following code to create the PowerPoint report.

import pandas as pd
import seaborn as sns
from pptx import Presentation
from pptx.util import Inches

df = pd.read_csv('data/mpg.csv')
prs = Presentation()
title_slide_layout = prs.slide_layouts[0]
title_only_slide_layout = prs.slide_layouts[5]

slide1 = prs.slides.add_slide(title_slide_layout)

title = slide1.shapes.title
subtitle = slide1.placeholders[1]

title.text = "Trying out PowerPoint Automation"
subtitle.text = "With python-pptx and GitHub action!"

slide2 = prs.slides.add_slide(title_only_slide_layout)
slide2.shapes.title.text = 'Add Image with Python'

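# numeric_only=True skips the text columns (origin, name); needs pandas >= 1.5
img = sns.heatmap(df.corr(numeric_only=True), annot=True).get_figure()
img.savefig('graph/heatmap1.png')  # the graph folder must already exist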

left = Inches(3)
top = Inches(3)
height = Inches(4)
pic = slide2.shapes.add_picture('graph/heatmap1.png', left, top, height=height)

prs.save('report.pptx')

The code above reads the mpg file from the data folder and creates a heatmap image that is saved in the graph folder. The output of the script is the report.pptx file that we wanted.

GitHub Action

We want a specific trigger that runs our script above. Using GitHub Actions, we can trigger the script execution whenever there are changes in our repository.

First, we need to create a folder called .github/workflows in our repository. Within that folder, we need to create a YAML file. You could call it anything, but I set it as automate_report.yml.

Inside the YAML file, we could input the following code.

name: Automate PowerPoint Report When changes happen
on: [push]
jobs:
  run:
    runs-on: [ubuntu-latest]
    steps:
      - uses: actions/checkout@v2
      - name: 'Start the Process'
        env:
          repo_token: ${{ secrets.GITHUB_TOKEN }}
        run: |
          # Your processing code start here
          pip install -r requirement.txt
          python script.py
      - name: Commit new file
        uses: devops-infra/action-commit-push@master
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          commit_message: Report Automations

The code above would execute the Python script whenever new changes happen in our repository. Then the script output (report file) would be committed to our repository.

Additionally, don’t forget to change the repository settings so that GitHub Actions has write permission.

Putting it together

Ultimately, we would end up with the following structure for our GitHub repository.

Image by Author

Push your changes to the new repository and check whether the action runs successfully.

Image by Author

You will see the new report file in your repository once the workflow has run successfully. You can pull it to your local machine to get the report. If you want to see the full code, you can access it here.

You could still do many automation variations, such as sending the report via email, running the trigger using AWS Lambda, scheduling it with Airflow, etc.
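As one illustration of the email variation, the sketch below uses only the Python standard library to build a message with the report attached. The helper name build_report_email, the addresses, and the stand-in bytes are all assumptions made for this example, and the message is built but not sent.

```python
from email.message import EmailMessage

# MIME subtype registered for .pptx files.
PPTX_SUBTYPE = "vnd.openxmlformats-officedocument.presentationml.presentation"


def build_report_email(report_bytes, filename, sender, recipient):
    """Build (but do not send) an email with the report attached."""
    msg = EmailMessage()
    msg["Subject"] = "Automated PowerPoint report"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("The latest report is attached.")
    msg.add_attachment(
        report_bytes,
        maintype="application",
        subtype=PPTX_SUBTYPE,
        filename=filename,
    )
    return msg


# Hypothetical addresses and stand-in bytes, for illustration only.
message = build_report_email(
    b"stand-in for the bytes of report.pptx",
    "report.pptx",
    "sender@example.com",
    "recipient@example.com",
)
# Sending would go through smtplib.SMTP(...).send_message(message),
# with a server and credentials that depend on your own setup.
```

In a real pipeline, the bytes would come from reading report.pptx in binary mode, and the credentials would be stored as repository secrets rather than in the script.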

Be creative and develop the automation in the way that you need.


