Extract text from Word files (docx) simply

If you want to extract the text content of a Word file there are a few solutions to do this in Python. Unfortunately most of these solutions have dependencies or need to run an external command in a subprocess or are heavy/complex, using an office suite, etc. I find that the best solution among those in the Stackoverflow page is python-docx. But using it bring two dependencies: python-docx itself and lxml. Installing python-docx is not a big problem. Unfortunately lxml is sometimes hard to install or, at the minimum, requires compilation.

To avoid that, inspired by python-docx, I created a simple function to extract text from .docx files that do not require dependencies, using only the standard library. So it’s easy to incorporate it in any Python project.

Is there any way to improve it?

Comments

Using a Raspberry Pi as an air exchanger controller

The Margarita project

This project’s goal is to build a controller for my house’s air exchanger that will optimize its utilization. It will take into account exterior and interior temperature and humidity to decide what the exchanger should do. In a second time I will add a web/phone interface to access and set the controller. In fact, I have a lot of other ideas and goals, but better to not make too many promises!

The problem and the current situation

I suffer from strong allergies so, a few years ago, I added to my house an air exchanger with HEPA filter, a Venmar/VanEE THH 1.0, to purify the inside air. Unfortunately the controllers that can work with this model are really limited (completely manual with only theses modes stop/slow/fast/recycle) or more targeted at managing the relative humidity. Not to mention that they are all pretty ugly. I really don’t have any difficulties to understand why Tony Fadell created the Nest thermostat!

What is needed for an exchanger targeted at purifying the air is to run it just enough to replace all the house’s air every day and maintaining a good humidity level at the same time. Not too dry in winter, not too humid in summer.

In fact, a human at rest will consume around 400 cubic feet of air a day (0.27 CFM). We are 5 humans living in my house. Running the exchanger at slow speed (45 CFM) at least 45 min. each day should exit all the CO2 we have created. It takes 12 hours at this speed to change all the house’s air volume (30,000 cu ft). So, if I change all house’s air every day, CO2 level should be kept low enough. At the beginning I was thinking about hooking also a CO2 meter to my controller but, with these facts, I think I just need to set my controller to run the exchanger at least 45 min. a day from exterior air and all will be fine on the CO2 side (I know that my calculations don’t take in account some variables…).

My current solution is to use my costly Aube thermostat (TH146-P-U) that control my heating system and set it to control also the exchanger. I can set it to run the exchanger at a minimum of x minutes every hour and to run it more if there’s too much humidity. But it’s not possible to choose the fan speed, it’s always set at slow. I can neither control the recycling mode.

Connecting the exchanger to this thermostat was not easy. There’s no information on the 4 wires coming out from the exchanger (I called directly the manufacturer to ask for information but without success). These 4 wires come from a micro-controller on the exchanger board. I did some investigations and found that when 2 of the 4 wires are connected together the exchanger stop. So I added a relay between the thermostat relay and the exchanger to reverse the relay action. Unfortunately, with this setup, if I want to run the exchanger at full speed I still need to go in the basement and turn the button on the machine.

About a year after I did this set up, our house had a significant water infiltration that probably affected the exchanger. After this event the full speed was not working anymore. But I didn’t use the full speed really often so I can’t be sure if it’s the water or how I wired the controller that did the harm. But I know that the problem is definitely in the micro-controller. All the rest of the board is working properly.

So, this left me with two options:

  • Change the exchanger board for around $150 and keep my current setup.
  • Keep the old board and hook it to a custom controller that will fulfill my needs better that my current setup.

I chose the second option.

Choosing the controller hardware

The first time I heard about the Raspberry Pi I immediately saw that was the perfect fit for my controller project. It would be easy for me to program the controller in Python, create a Phone interface by running a web server and interface it with the exchanger and some sensors with the GPIO. So, I bought one.

Just after buying it I stumbled on the BeagleBone Black and it definitely look like another really good candidate for my project. The price is mostly the same ($45 for the BBB vs $35 for RPi, but you need to buy an SD card for the later). Some advantages of the BBB are more IO (some analogs!) and a faster CPU. Having flash memory instead of the SD card can be an advantage or disadvantage, it depends of the use. For me, it doesn’t really matter. Some disadvantages are a less powerful video output (it doesn’t matter for my project, there’s no display) and a smaller community. Makezine did a more thoughtful comparison.

In my case a micro-controller, like the Arduino, is not a good match because I need to be able to run a web server. I found only a few micro-controllers that can run a web server like the Atmega88, but the learning curve and complexity is a big disadvantage for no apparent advantage in return in my case.

Setting up the Raspberry Pi

Choosing a Linux distribution

There are many Linux distributions for the Raspberry Pi. I chose Occidentalis 0.2 from Adafruit. It’s derived from Raspbian. For me the good things about Occidentalis are:

  • Like Raspbian, it’s based on Debian, my preferred Linux distro.
  • Unlike Raspbian, the SSH daemon is already activated.
  • The avahi daemon is also already activated (Bonjour client).
  • Many IO kernel modules are included.

The most important things for me are SSH and Bonjour. I don’t have any display that I can plug to the RPi, so I need to be able to do everything via SSH from the beginning. With Bonjour I don’t have to find/guess its IP address, I can just connect to raspberrypi.local.

Installing the distro

I used a Mac to install the distro on the SD card. I followed these instructions except that I used the Occidentalis 0.2 image.

Expanding the SD card

The SD card partition will be the same size as the distro image size instead of the card size. It’s a good idea to expand the root partition to fill the SD card. It’s easy to do so by choosing the expand_rootfs menu option of the raspi-config script.

Upgrading the system

I think it’s good idea to upgrade the OS:

$ sudo apt-get update
$ sudo apt-get upgrade

Adjusting the memory

There are two things to adjust (at least in my case):

  1. By default my RPi was using only 256 MB of the 512 MB. I had to upgrade the firmware to access all the memory:

    $ sudo rpi-update
    
  2. Some memory is used by the GPU. It’s possible to change how much memory the GPU will use. In my case, as I will not use an external display, so I set it to the minimum:

    $ sudo raspi-config
    

Then choose the Memory Split option in the Advanced Options option. I set the GPU memory to 16 MB, which is the minimum.

It’s also possible to set the memory to be split dynamically (CMA).

Connecting it directly to a Mac

As I use the Occidentalis distro, I simply have to connect like this:

$ ssh pi@rasbpberrypi.local

Unfortunately I don’t have a WiFi dongle on my RPi and my router is far from my main computer. But I would like to have the RPi sitting next to my main computer to ease development. A solution is to connect the Ethernet cable directly in my Mac and share the Mac Internet (WiFi) connection. By sharing the Internet connection, the Mac will create a subnet on 192.168.2.x and will assign dynamically an address to the RPi. So, by sharing the Internet connection, you get two things: a subnet with an address for the RPi so you can now connect to the RPi from your Mac, and Internet access to the RPi. No need to turn on IPV6 on the RPi or to install avahi-autoipd, as others suggested.

Unfortunately that did not work for me on the first try and many tries after, until I realized that my LAN was already using the 192.168.2.x subnet! I switched my LAN to another subnet and everything started to work. If you can’t change the LAN subnet, another solution is to change the default address used by the Mac Internet sharing.

Setting up a proper Python environment

I will use mainly Python to develop my application. To work comfortably I set up a few things Python related (the usual suspects: pip, virtualenv, IPython):

$ sudo easy_install pip
$ sudo pip install virtualenv
$ mkdir venv
$ cd venv
$ virtualenv --system-site-packages margarita
$ cd
$ source venv/margarita/bin/activate
$ sudo pip install ipython

I also installed vim because vim.tiny, installed by default, lacks too many features:

$ sudo apt-get install vim

That’s enough for now. Next time I will write about reading the sensors and controlling the relays.

Comments