Script: Python script to modify strings in a file

I recently migrated a bunch of MoinMoin wiki pages from a very old version.

Although the migration scripts ran okay, they apparently didn’t account for the link syntax change between the versions (or, if they did, they malfunctioned).

After some research, I concluded that a script was a better fit than plain sed, because each replacement has to retain the original link text and also add a new string derived from it.

The following Python script does just that.

A link in the old syntax looks like this:

["Old link"]

After conversion, the same link reads:

[[Old_link|Old link]]
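
As a rough sketch, the mapping above can also be written as one regular-expression substitution whose replacement is computed from the matched text. The names OLD_LINK and convert_links are made up for this illustration, and the sketch assumes that a link never spans more than one line.

```python
import re

# Matches ["link text"]. The text is captured non-greedily so that
# several links on the same line are handled one at a time.
OLD_LINK = re.compile(r'\["(.*?)"\]')


def convert_links(line):
    """Rewrite every ["Old link"] in line as [[Old_link|Old link]]."""
    def rewrite(match):
        text = match.group(1)
        target = text.replace(" ", "_")  # page names use underscores
        return "[[" + target + "|" + text + "]]"
    return OLD_LINK.sub(rewrite, line)


print(convert_links('See ["Old link"] and ["Other page"].'))
# prints: See [[Old_link|Old link]] and [[Other_page|Other page]].
```

Passing a function as the replacement is what makes it easy to build the new string from the old one, which is the part that is awkward to do with a fixed replacement string.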

The script contents:

# find /var/www/moin/wiki/data/pages -iname text_html -type f -exec rm -f {} \;
# find /var/www/moin/wiki/data/pages -iname pagelinks -type f -exec rm -f {} \;

import fileinput
import os
import re

# raw string, so the backslashes in the Windows path are taken literally
for root, dirs, files in os.walk(r'C:\Documents and Settings\mbrown\Desktop\pages'):

    for name in files:
        filename = os.path.join(root, name)
        # inplace=1 sends everything printed below back into the file being read
        pages = fileinput.FileInput(filename,inplace=1)
        for line in pages:
            if ("[\"" in line) and ("\"]" in line):
                linkcount = line.count("[\"")
                for i in range(0,linkcount):
                    #converted links no longer contain [" so always take the first one left
                    start = line.find("[\"")
                    end = line.find("\"]", start+2)
                    if start == -1 or end == -1:
                        break
                    linktext = line[start+2:end]
                    newtarget = linktext.replace(' ','_')
                    newlinktext = "[[" + newtarget + "|" + linktext + "]]"
                    #splice by position so that only this link is changed
                    line = line[:start] + newlinktext + line[end+2:]

            if line.endswith("\n"):
                line = line[:-1] #print adds its own newline, so remove the existing one
            print(line) #parenthesized so it runs under both Python 2 and 3
        pages.close()
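
Because the script edits the wiki pages in place, it may be worth keeping a copy of each original file. The sketch below is one way to do that, using the backup parameter of fileinput.FileInput. The function name rewrite_in_place, the transform argument, and the ".orig" suffix are assumptions made for this example, not part of the script above.

```python
import fileinput
import sys


def rewrite_in_place(path, transform, backup=".orig"):
    """Apply transform to each line of path, keeping the original as path + backup."""
    pages = fileinput.FileInput(path, inplace=True, backup=backup)
    try:
        for line in pages:
            # while inplace is active, stdout is redirected into the file
            sys.stdout.write(transform(line))
    finally:
        # restores stdout and closes the file even if transform fails
        pages.close()
```

Here transform is any function that takes one line and returns the rewritten line, including its trailing newline. If something goes wrong, the untouched page is still there under the same name with ".orig" appended.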