More Tools
Copyright © Software Carpentry 2010
This work is licensed under the Creative Commons Attribution License
See http://software-carpentry.org/license.html for more information.
Regular Expressions
Thousands of papers and theses written in LaTeX

    Granger's work on graphs \cite{dd-gr2007,gr2009},
    particularly ones obeying Snape's Inequality
    \cite{ snape87 } (but see \cite{quirrell89}),
    has opened up new lines of research. However,
    studies at Unseen University \cite{stibbons2002,
      stibbons2008} highlight several dangers.
    ⋮

All share a common bibliography
Want to see how often citations appear together
First step: extract citation sets from documents
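To make the end goal concrete, here is a minimal sketch of how citation sets might be turned into co-occurrence counts once they have been extracted. It is written in Python 3 syntax; the function name `count_pairs` and the three sample sets are hypothetical, not part of the slides.

```python
from collections import Counter
from itertools import combinations


def count_pairs(citation_sets):
    """Count how often each unordered pair of labels appears in the same document."""
    pairs = Counter()
    for labels in citation_sets:
        # Sort so that ('a', 'b') and ('b', 'a') are counted as one pair.
        for pair in combinations(sorted(labels), 2):
            pairs[pair] += 1
    return pairs


# Hypothetical citation sets, one per document.
docs = [
    {'gr2009', 'snape87', 'quirrell89'},
    {'gr2009', 'snape87'},
    {'stibbons2002'},
]
counts = count_pairs(docs)
print(counts[('gr2009', 'snape87')])  # 2
```

The rest of the deck is about producing the sets that a function like this would consume.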
Citations enclosed in \cite{…}

    Granger's work on graphs \cite{dd-gr2007,gr2009},
    particularly ones obeying Snape's Inequality
    \cite{ snape87 } (but see \cite{quirrell89}),
    has opened up new lines of research. However,
    studies at Unseen University \cite{stibbons2002,
      stibbons2008} highlight several dangers.
    ⋮

Multiple labels separated by commas
May be white space (including line breaks)
And multiple citations per line
Idea #1: capture everything in cite{…} in a group

    print re.search('cite{(.+)}', 'a \\cite{X} b').groups()
    ('X',)

What about multiple citations?

    print re.search('cite{(.+)}', 'a \\cite{X} b \\cite{Y} c').groups()
    ('X} b \\cite{Y',)

Matching is greedy: '.+' takes as much text as it can, so the group runs on to the last '}'
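As a side-by-side comparison, the sketch below contrasts the greedy `.+` with the non-greedy `.+?` qualifier from Python's `re` module. The slides do not use `.+?` and take a different route; it appears here only to show what "greedy" means. The code uses Python 3 syntax and raw strings, which differ from the Python 2 style on the slides.

```python
import re

text = 'a \\cite{X} b \\cite{Y} c'

# Greedy: '.+' runs on to the last '}' in the string.
greedy = re.search(r'cite{(.+)}', text).group(1)

# Non-greedy: '.+?' stops at the first '}' that lets the match succeed.
lazy = re.search(r'cite{(.+?)}', text).group(1)

print(repr(greedy))  # 'X} b \\cite{Y'
print(repr(lazy))    # 'X'
```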
Idea #2: match everything inside '{}' except '}'
Use '[^}]' to negate the set containing only '}'

    print re.search('cite{([^}]+)}', 'a \\cite{X} b').groups()
    ('X',)

What about multiple citations?

    print re.search('cite{([^}]+)}', 'a \\cite{X} b \\cite{Y} c').groups()
    ('X',)

Need to extract all matches, not just the first
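One further property of the negated set is worth a small sketch: unlike `.`, which by default does not match a newline, `[^}]` matches any character except `}`, so it copes with citations that wrap across lines. The sample string is made up for illustration, and the code uses Python 3 syntax.

```python
import re

text = 'see \\cite{stibbons2002,\n  stibbons2008} and \\cite{snape87}'

# '.' does not match a newline by default, so this pattern cannot match the
# first citation (which spans two lines) and finds the second one instead.
dot = re.search(r'cite{(.+)}', text)

# '[^}]' matches any character except '}', including '\n'.
negated = re.search(r'cite{([^}]+)}', text)

print(repr(dot.group(1)))      # 'snape87'
print(repr(negated.group(1)))  # 'stibbons2002,\n  stibbons2008'
```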
Idea #3: use re.findall instead of re.search
"A programmer is only as good as her knowledge of her language's libraries."

    print re.findall('cite{([^}]+)}', 'a \\cite{X} b \\cite{Y} c')
    ['X', 'Y']

What about spaces?

    print re.findall('cite{([^}]+)}', 'a \\cite{ X} b \\cite{Y } c')
    [' X', 'Y ']
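The following sketch, in Python 3 syntax, shows how `re.findall` behaves with and without a capturing group, and how `re.finditer` can be used when the position of each match is also needed. `re.finditer` is not used in the slides; it is included only as a related library call.

```python
import re

text = 'a \\cite{X} b \\cite{Y} c'

# With one capturing group, findall returns the contents of that group.
labels = re.findall(r'cite{([^}]+)}', text)

# Without a group, findall returns the whole matched text.
whole = re.findall(r'cite{[^}]+}', text)

# finditer yields match objects, which also record where each group starts.
positions = [m.start(1) for m in re.finditer(r'cite{([^}]+)}', text)]

print(labels)     # ['X', 'Y']
print(whole)      # ['cite{X}', 'cite{Y}']
print(positions)  # [8, 19]
```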
Could tidy this up after matching using string.strip()
Let's modify the pattern instead

    print re.findall('cite{\\s*([^}]+)\\s*}', 'a \\cite{ X} b \\cite{Y } c')
    ['X', 'Y ']

Still capturing the space after 'Y': the greedy '[^}]+' consumes it before the trailing '\\s*' gets a chance
Match the word-to-nonword transition ('\\b') as well

    print re.findall('cite{\\s*\\b([^}]+)\\b\\s*}', 'a \\cite{ X} b \\cite{Y } c')
    ['X', 'Y']
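The two options discussed here can be compared directly. The sketch below, in Python 3 syntax, strips each match after the fact and then uses the `\b` pattern; on this input both give the same labels.

```python
import re

text = 'a \\cite{ X} b \\cite{Y } c'

# Option 1: keep the simple pattern and strip each match afterwards.
stripped = [label.strip() for label in re.findall(r'cite{([^}]+)}', text)]

# Option 2: let the pattern discard the padding by anchoring on word boundaries.
bounded = re.findall(r'cite{\s*\b([^}]+)\b\s*}', text)

print(stripped)  # ['X', 'Y']
print(bounded)   # ['X', 'Y']
```

One design consequence to keep in mind: `\b` only matches next to a word character (letter, digit, or underscore), so the second pattern assumes every label begins and ends with one. A citation whose label started or ended with punctuation would not match at all.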
What about multiple labels in a single citation?

    print re.findall('cite{\\s*\\b([^}]+)\\b\\s*}', '\\cite{X,Y} ')
    ['X,Y']
    print re.findall('cite{\\s*\\b([^}]+)\\b\\s*}', '\\cite{X, Y, Z} ')
    ['X, Y, Z']

Separating the labels inside the pattern itself can actually be done, but it's very complex
Use re.split() to break matches on '\\s*,\\s*'
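A short sketch of the splitting step on its own, in Python 3 syntax. The variable names are chosen for illustration; the third input is a made-up example of a label list that wraps across a line break.

```python
import re

SPLIT = r'\s*,\s*'  # a comma with optional white space on either side

two = re.split(SPLIT, 'X,Y')
three = re.split(SPLIT, 'X, Y, Z')
wrapped = re.split(SPLIT, 'stibbons2002,\n  stibbons2008')

print(two)      # ['X', 'Y']
print(three)    # ['X', 'Y', 'Z']
print(wrapped)  # ['stibbons2002', 'stibbons2008']
```

Because `\s` also matches newlines, the same pattern handles labels separated by a line break.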
    # Start with a working skeleton.
    def get_citations(text):
        '''Return the set of all citation tags found in a block of text.'''
        return set()

    if __name__ == '__main__':
        test = '''\
    Granger's work on graphs \cite{dd-gr2007,gr2009},
    particularly ones obeying Snape's Inequality
    \cite{ snape87 } (but see \cite{quirrell89}),
    has opened up new lines of research. However,
    studies at Unseen University \cite{stibbons2002,
      stibbons2008} highlight several dangers.'''
        print get_citations(test)

    set([])
    import re

    CITE = 'cite{\\s*\\b([^}]+)\\b\\s*}'
    SPLIT = '\\s*,\\s*'

    def get_citations(text):
        '''Return the set of all citation tags found in a block of text.'''
        result = set()
        # Each match is the full contents of one \cite{...}.
        match = re.findall(CITE, text)
        if match:
            for citation in match:
                # Break the contents into individual labels.
                cites = re.split(SPLIT, citation)
                for c in cites:
                    result.add(c)
        return result
    import re

    # Compile the patterns once so they can be reused on every call.
    CITE = re.compile('cite{\\s*\\b([^}]+)\\b\\s*}')
    SPLIT = re.compile('\\s*,\\s*')

    def get_citations(text):
        '''Return the set of all citation tags found in a block of text.'''
        result = set()
        match = CITE.findall(text)
        if match:
            for citations in match:
                label_list = SPLIT.split(citations)
                for label in label_list:
                    result.add(label)
        return result
    # Now test it all out.
    if __name__ == '__main__':
        test = '''\
    Granger's work on graphs \cite{dd-gr2007,gr2009},
    particularly ones obeying Snape's Inequality
    \cite{ snape87 } (but see \cite{quirrell89}),
    has opened up new lines of research. However,
    studies at Unseen University \cite{stibbons2002,
      stibbons2008} highlight several dangers.'''
        print get_citations(test)

    set(['gr2009', 'stibbons2002', 'dd-gr2007', 'stibbons2008', 'snape87', 'quirrell89'])
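For readers on Python 3, the same function might be written more compactly as below. This is a sketch of one possible variant, not the version from the slides: it assumes raw strings for the patterns and the sample text, and replaces the nested loops with a set comprehension. The `if match:` check is dropped because `findall` returns an empty list when nothing matches.

```python
import re

CITE = re.compile(r'cite{\s*\b([^}]+)\b\s*}')
SPLIT = re.compile(r'\s*,\s*')


def get_citations(text):
    """Return the set of all citation labels found in a block of text."""
    return {label
            for group in CITE.findall(text)   # contents of each \cite{...}
            for label in SPLIT.split(group)}  # individual labels


sample = r'''Granger's work on graphs \cite{dd-gr2007,gr2009},
particularly ones obeying Snape's Inequality \cite{ snape87 }
(but see \cite{quirrell89}), has opened up new lines of research.
However, studies at Unseen University \cite{stibbons2002,
  stibbons2008} highlight several dangers.'''

print(sorted(get_citations(sample)))
# ['dd-gr2007', 'gr2009', 'quirrell89', 'snape87', 'stibbons2002', 'stibbons2008']
```

Sorting the result before printing gives a stable order, since sets themselves are unordered.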
Created by Greg Wilson, June 2010