diffpy.structure.parsers package
Conversion plugins for various structure formats.
The recognized structure formats are defined by subclassing StructureParser. By convention, the parser class for a format is named P_<format> and is defined in the module p_<format>.py. The parser classes should override the parseLines() and toLines() methods of StructureParser. Every structure parser needs to be registered in the parser_index dictionary of the parser_index_mod module.
For normal usage it should be sufficient to use the routines provided in this module.
- Content:
StructureParser: base class for a concrete Parser
parser_index: dictionary of known structure formats
getParser: factory for Parser at given format
inputFormats: list of available input formats
outputFormats: list of available output formats
- diffpy.structure.parsers.getParser(format, **kw)[source]
Return Parser instance for a given structure format.
- Parameters:
format (str) – String with the format name, see parser_index_mod.
**kw (dict) – Keyword arguments passed to the Parser init function.
- Returns:
Parser instance for the given format.
- Return type:
Parser
- Raises:
StructureFormatError – When the format is not defined.
- diffpy.structure.parsers.inputFormats()[source]
Return list of implemented input structure formats.
- diffpy.structure.parsers.outputFormats()[source]
Return list of implemented output structure formats.
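The factory pattern described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the package's implementation: REGISTRY, P_demo, UnknownFormatError and get_parser are hypothetical stand-ins for parser_index, a concrete parser class, StructureFormatError and getParser.

```python
# Hypothetical stand-ins; none of these names are part of diffpy.structure.
class UnknownFormatError(Exception):
    """Raised when no parser is registered for a format name."""


class P_demo:
    """Toy parser class that only stores its keyword arguments."""

    format = "demo"

    def __init__(self, **kw):
        self.options = kw


# Registry mapping a format name to the class that handles it.
REGISTRY = {"demo": P_demo}


def get_parser(fmt, **kw):
    """Return a new parser instance for fmt, forwarding kw to its init."""
    try:
        parser_class = REGISTRY[fmt]
    except KeyError:
        raise UnknownFormatError(f"no parser for format {fmt!r}") from None
    return parser_class(**kw)


parser = get_parser("demo", eps=1e-5)
print(parser.format, parser.options)
```

Keeping the lookup in one function means callers only need a format name; an unregistered name fails early with a dedicated exception.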
Submodules
diffpy.structure.parsers.p_rawxyz module
Parser for raw XYZ file format.
Raw XYZ is a 3 or 4 column text file with Cartesian coordinates of atoms and an optional first column for atom types.
- class diffpy.structure.parsers.p_rawxyz.P_rawxyz[source]
Bases:
StructureParser
StructureParser subclass for the RAWXYZ format.
- format
Format name, default “rawxyz”.
- Type:
str
- parseLines(lines)[source]
Parse list of lines in RAWXYZ format.
- Parameters:
lines (list of str) – List of lines in RAWXYZ format.
- Returns:
Parsed structure instance.
- Return type:
- Raises:
StructureFormatError – Invalid RAWXYZ format.
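To make the 3 or 4 column layout concrete, here is a minimal sketch of a raw XYZ reader. It is not the P_rawxyz implementation: parse_rawxyz is a hypothetical helper, it returns plain tuples instead of a Structure, it raises ValueError instead of StructureFormatError, and skipping blank lines is an assumption.

```python
def parse_rawxyz(lines):
    """Return a list of (element, (x, y, z)) tuples from raw XYZ lines.

    The element is an empty string when the optional first column is absent.
    """
    atoms = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines are ignored
        if len(fields) == 3:
            element, coords = "", fields
        elif len(fields) == 4:
            element, coords = fields[0], fields[1:]
        else:
            raise ValueError(f"line {number}: expected 3 or 4 columns")
        try:
            xyz = tuple(float(value) for value in coords)
        except ValueError:
            raise ValueError(f"line {number}: non-numeric coordinate") from None
        atoms.append((element, xyz))
    return atoms


print(parse_rawxyz(["C 0.0 0.0 0.0", "0.5 0.5 0.5"]))
```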
diffpy.structure.parsers.structureparser module
Definition of StructureParser, a base class for specific parsers.
- class diffpy.structure.parsers.structureparser.StructureParser[source]
Bases:
object
Base class for all structure parsers.
- format
Format name of particular parser.
- Type:
str
- filename
Path to structure file that is read or written.
- Type:
str
- parseLines(lines)[source]
Create Structure instance from a list of lines.
Return Structure object or raise StructureFormatError exception.
Note
This method has to be overridden in a derived class.
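The override contract can be illustrated with a small stand-alone sketch. LineParser and UpperCaseParser are hypothetical classes, and the parse() helper that splits a string and delegates to parseLines() is an assumption modeled on the parse(s) and parseLines(lines) pair documented for the concrete parsers.

```python
class LineParser:
    """Hypothetical base class: subclasses supply parseLines()."""

    format = "base"

    def __init__(self):
        self.filename = None

    def parseLines(self, lines):
        # The base class has no format of its own, so it cannot parse.
        raise NotImplementedError("override parseLines() in a subclass")

    def parse(self, text):
        # Split the text into lines and delegate to the subclass hook.
        return self.parseLines(text.splitlines())


class UpperCaseParser(LineParser):
    """Toy subclass whose 'parsed result' is the upper-cased lines."""

    format = "upper"

    def parseLines(self, lines):
        return [line.upper() for line in lines]


print(UpperCaseParser().parse("first\nsecond"))
```

Only the line-level hook differs between formats, so string and file entry points can live once in the base class.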
diffpy.structure.parsers.p_cif module
Parser for basic CIF file format.
- diffpy.structure.parsers.p_cif.rx_float
Constant regular expression for leading_float().
- Type:
re.Pattern
- diffpy.structure.parsers.p_cif.symvec
Helper dictionary for getSymOp().
- Type:
dict
Note
References: https://www.iucr.org/resources/cif
- class diffpy.structure.parsers.p_cif.P_cif(eps=None)[source]
Bases:
StructureParser
Simple parser for CIF structure format.
Reads Structure from the first block containing _atom_site_label key. Following blocks, if any, are ignored.
- Parameters:
eps (float, Optional) – Fractional coordinates cutoff for duplicate positions. When None, use the default for ExpandAsymmetricUnit: 1.0e-5.
- format
Structure format name.
- Type:
str
- ciffile
Instance of CifFile from PyCifRW.
- Type:
CifFile
- spacegroup
Instance of SpaceGroup used for symmetry expansion.
- Type:
SpaceGroup
- eps
Resolution in fractional coordinates for non-equal positions. Used for expansion of asymmetric unit.
- Type:
float
- eau
Instance of ExpandAsymmetricUnit from SymmetryUtilities.
- Type:
ExpandAsymmetricUnit
- asymmetric_unit
List of Atom instances for the original asymmetric unit in the CIF file.
- Type:
list
- labelindex
Dictionary mapping unique atom label to index of Atom in self.asymmetric_unit.
- Type:
dict
- anisotropy
Dictionary mapping unique atom label to displacement anisotropy resolved at that site.
- Type:
dict
- cif_sgname
Space group name obtained by looking up the value of the _space_group_name_Hall, _symmetry_space_group_name_Hall, _space_group_name_H-M_alt, or _symmetry_space_group_name_H-M items. None when none of these items is defined.
- Type:
str or None
- BtoU = 0.012665147955292222
Conversion factor from B values to U values.
- Type:
float
- parse(s)[source]
Create Structure instance from a string in CIF format.
- Parameters:
s (str) – A string in CIF format.
- Returns:
Structure instance.
- Return type:
Structure
- Raises:
StructureFormatError – When the data do not constitute a valid CIF format.
- parseFile(filename)[source]
Create Structure from an existing CIF file.
- Parameters:
filename (str) – Path to structure file.
- Returns:
Structure instance.
- Return type:
Structure
- Raises:
StructureFormatError – When the data do not constitute a valid CIF format.
IOError – When the file cannot be opened.
- parseLines(lines)[source]
Parse list of lines in CIF format.
- Parameters:
lines (list) – List of strings stripped of line terminator.
- Returns:
Structure instance.
- Return type:
Structure
- Raises:
StructureFormatError – When the data do not constitute a valid CIF format.
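The BtoU value listed among the P_cif attributes equals 1/(8π²), which follows from the standard relation B = 8π²U between the two isotropic displacement parameters. The short check below is a sketch; B_TO_U and b_to_u are illustrative names, not part of the package.

```python
import math

# Conversion factor from B to U, from the relation B = 8 * pi**2 * U.
B_TO_U = 1.0 / (8.0 * math.pi ** 2)


def b_to_u(b_value):
    """Convert an isotropic B value to the equivalent U value."""
    return b_value * B_TO_U


# The computed factor agrees with the documented constant.
print(math.isclose(B_TO_U, 0.012665147955292222, rel_tol=1e-12))
print(b_to_u(1.0))
```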
- diffpy.structure.parsers.p_cif.getParser(eps=None)[source]
Return new parser object for CIF format.
- Parameters:
eps (float, Optional) – Fractional coordinates cutoff for duplicate positions. When None, use the default for ExpandAsymmetricUnit: 1.0e-5.
- Returns:
Instance of P_cif.
- Return type:
P_cif
- diffpy.structure.parsers.p_cif.getSymOp(s)[source]
Create SpaceGroups.SymOp instance from a string.
- Parameters:
s (str) – Formula for equivalent coordinates, for example 'x,1/2-y,1/2+z'.
- Returns:
Instance of SymOp.
- Return type:
SymOp
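To show what a formula such as 'x,1/2-y,1/2+z' encodes, the sketch below splits it into a 3x3 rotation matrix and a translation vector. It is an illustration only, not the getSymOp implementation: parse_symop is a hypothetical helper, it returns plain lists instead of a SymOp, and it accepts only integer or fractional constants (no decimals).

```python
import re
from fractions import Fraction

AXES = {"x": 0, "y": 1, "z": 2}
# One signed term: either an axis letter or a constant such as 1/2.
TERM = re.compile(r"([+-]?)(\d+(?:/\d+)?|[xyz])")


def parse_symop(formula):
    """Return (rotation, translation) for an 'x,1/2-y,1/2+z' style string."""
    parts = formula.lower().replace(" ", "").split(",")
    if len(parts) != 3:
        raise ValueError("expected three comma-separated components")
    rotation = [[0, 0, 0] for _ in range(3)]
    translation = [Fraction(0)] * 3
    for row, part in enumerate(parts):
        pos = 0
        while pos < len(part):
            match = TERM.match(part, pos)
            if match is None:
                raise ValueError(f"cannot parse component {part!r}")
            sign = -1 if match.group(1) == "-" else 1
            token = match.group(2)
            if token in AXES:
                rotation[row][AXES[token]] += sign
            else:
                translation[row] += sign * Fraction(token)
            pos = match.end()
    return rotation, translation


rotation, translation = parse_symop("x,1/2-y,1/2+z")
print(rotation)
print([str(t) for t in translation])
```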
- diffpy.structure.parsers.p_cif.leading_float(s, d=0.0)[source]
Extract the first float from a string and ignore trailing characters.
Useful for extracting values from “value(std)” syntax.
- Parameters:
s (str) – The string to be scanned for floating point value.
d (float, Optional) – The default value when s is “.” or “?”, which in CIF format stands for inapplicable and unknown, respectively.
- Returns:
The extracted floating point value.
- Return type:
float
- Raises:
ValueError – When string does not start with a float.
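The behavior described for leading_float() can be approximated with a short regular-expression sketch. The pattern and the name leading_float_sketch are assumptions for illustration; the actual rx_float pattern used by the module may differ.

```python
import re

# Assumed pattern: optional sign, digits with optional fraction, optional exponent.
FLOAT_PREFIX = re.compile(r"[-+]?(\d+(\.\d*)?|\.\d+)([eE][-+]?\d+)?")


def leading_float_sketch(s, default=0.0):
    """Return the float at the start of s, ignoring trailing characters."""
    text = s.strip()
    if text in (".", "?"):
        # CIF markers for inapplicable and unknown values.
        return default
    match = FLOAT_PREFIX.match(text)
    if match is None:
        raise ValueError(f"no leading float in {s!r}")
    return float(match.group())


print(leading_float_sketch("1.234(5)"))
print(leading_float_sketch("?", default=0.0))
```

Matching only at the start of the string is what lets a "value(std)" entry yield the value while the parenthesized uncertainty is dropped.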
diffpy.structure.parsers.p_auto module
Parser for automatic file format detection.
This Parser does not provide the toLines() method.
- class diffpy.structure.parsers.p_auto.P_auto(**kw)[source]
Bases:
StructureParser
Parser with automatic detection of structure format.
This parser attempts to automatically detect the format of a given structure file and parse it accordingly. When successful, it sets its format attribute to the detected structure format.
- Parameters:
**kw (dict) – Keyword arguments for the structure parser.
- format
Detected structure format. Initially set to “auto” and updated after successful detection of the structure format.
- Type:
str
- pkw
Keyword arguments passed to the parser.
- Type:
dict
- parse(s)[source]
Detect format and create Structure instance from a string.
Set format attribute to the detected file format.
- Parameters:
s (str) – String with structure data.
- Returns:
Structure object.
- Return type:
Structure
- Raises:
- parseFile(filename)[source]
Detect format and create Structure instance from an existing file.
Set format attribute to the detected file format.
- Parameters:
filename (str) – Path to structure file.
- Returns:
Structure object.
- Return type:
Structure
- Raises:
StructureFormatError – If the structure format is unknown or invalid.
IOError – If the file cannot be read.
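One common way to implement automatic detection is to try candidate parsers in turn and keep the first that succeeds. The sketch below shows that approach with two toy parsers; it is not the P_auto implementation, and the trial order, FormatError, AutoParser and both parse functions are assumptions made for illustration.

```python
class FormatError(Exception):
    """Raised when lines do not match a format."""


def parse_xyz_like(lines):
    # Toy XYZ check: atom count, title line, then one record per atom.
    try:
        count = int(lines[0])
    except (IndexError, ValueError):
        raise FormatError("missing atom count") from None
    records = [line.split() for line in lines[2:] if line.strip()]
    if len(records) != count:
        raise FormatError("atom count does not match the records")
    return records


def parse_raw_like(lines):
    # Toy raw XYZ check: every non-blank line has 3 or 4 columns.
    records = [line.split() for line in lines if line.strip()]
    if not records or any(len(r) not in (3, 4) for r in records):
        raise FormatError("expected 3 or 4 columns")
    return records


CANDIDATES = [("xyz", parse_xyz_like), ("rawxyz", parse_raw_like)]


class AutoParser:
    def __init__(self):
        self.format = "auto"  # updated after successful detection

    def parseLines(self, lines):
        for name, parse in CANDIDATES:
            try:
                result = parse(lines)
            except FormatError:
                continue  # try the next candidate format
            self.format = name
            return result
        raise FormatError("unknown or invalid structure format")


auto = AutoParser()
auto.parseLines(["1", "title", "C 0 0 0"])
print(auto.format)
```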
diffpy.structure.parsers.p_pdffit module
Parser for PDFfit structure format.
- class diffpy.structure.parsers.p_pdffit.P_pdffit[source]
Bases:
StructureParser
Parser for PDFfit structure format.
- format
Format name, default “pdffit”.
- Type:
str
- ignored_lines
List of lines ignored during parsing.
- Type:
list
- stru
Structure instance used for PDFfit input or output.
- Type:
- parseLines(lines)[source]
Parse list of lines in PDFfit format.
- Parameters:
lines (list of str) – List of lines in PDFfit format.
- Returns:
Parsed structure instance.
- Return type:
- Raises:
StructureFormatError – File not in PDFfit format.
diffpy.structure.parsers.p_xcfg module
Parser for extended CFG format used by AtomEye.
- diffpy.structure.parsers.p_xcfg.AtomicMass
Dictionary of atomic masses for elements.
- Type:
dict
- class diffpy.structure.parsers.p_xcfg.P_xcfg[source]
Bases:
StructureParser
Parser for AtomEye extended CFG format.
- format
Format name, default “xcfg”.
- Type:
str
- cluster_boundary = 2
Width of the boundary around the corners of a non-periodic cluster, used to avoid periodic boundary condition (PBC) effects in AtomEye.
- Type:
int
- parseLines(lines)[source]
Parse list of lines in XCFG format.
- Parameters:
lines (list of str) – List of lines in XCFG format.
- Returns:
Parsed structure instance.
- Return type:
- Raises:
StructureFormatError – Invalid XCFG format.
- toLines(stru)[source]
Convert Structure stru to a list of lines in XCFG atomeye format.
- Parameters:
stru (Structure) – Structure to be converted.
- Returns:
List of lines in XCFG format.
- Return type:
list of str
- Raises:
StructureFormatError – Cannot convert empty structure to XCFG format.
diffpy.structure.parsers.parser_index_mod module
Index of recognized structure formats, their IO capabilities and associated modules where they are defined.
- diffpy.structure.parsers.parser_index_mod.parser_index
Dictionary of recognized structure formats. The keys are format names and the values are dictionaries with the following keys:
module (str) – Name of the module that defines the parser class.
file_extension (str) – File extension for the format, including the leading dot.
file_pattern (str) – File pattern for the format, using ‘|’ as separator for multiple patterns.
has_input (bool) – True if the parser can read the format.
has_output (bool) – True if the parser can write the format.
- Type:
dict
Note
Plugins for new structure formats need to be added to the parser_index dictionary in this module.
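The entry layout described above can be sketched with a small example dictionary. The two formats, their module names and their patterns are invented for illustration and do not appear in parser_index; formats_with and formats_for_filename are hypothetical helpers showing how such an index could drive inputFormats(), outputFormats() and filename matching.

```python
from fnmatch import fnmatchcase

# Invented entries that follow the documented key layout.
EXAMPLE_INDEX = {
    "demo_in": {
        "module": "p_demo_in",
        "file_extension": ".din",
        "file_pattern": "*.din",
        "has_input": True,
        "has_output": False,
    },
    "demo_io": {
        "module": "p_demo_io",
        "file_extension": ".dio",
        "file_pattern": "*.dio|*.DIO",
        "has_input": True,
        "has_output": True,
    },
}


def formats_with(flag):
    """Return sorted format names whose entry has the given flag set."""
    return sorted(name for name, info in EXAMPLE_INDEX.items() if info[flag])


def formats_for_filename(filename):
    """Return sorted format names whose '|'-separated patterns match."""
    return sorted(
        name
        for name, info in EXAMPLE_INDEX.items()
        if any(fnmatchcase(filename, pattern)
               for pattern in info["file_pattern"].split("|"))
    )


print(formats_with("has_input"))
print(formats_with("has_output"))
print(formats_for_filename("sample.DIO"))
```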
diffpy.structure.parsers.p_pdb module
Basic parser for PDB structure format.
- class diffpy.structure.parsers.p_pdb.P_pdb[source]
Bases:
StructureParser
Simple parser for PDB format.
The parser understands the following PDB records: TITLE, CRYST1, SCALE1, SCALE2, SCALE3, ATOM, SIGATM, ANISOU, SIGUIJ, TER, HETATM, END.
- format
Format name, default “pdb”.
- Type:
str
- atomLines(stru, idx)[source]
Build ATOM records and possibly SIGATM, ANISOU or SIGUIJ records for atom number idx of structure stru.
- orderOfRecords = ['HEADER', 'OBSLTE', 'TITLE', 'CAVEAT', 'COMPND', 'SOURCE', 'KEYWDS', 'EXPDTA', 'AUTHOR', 'REVDAT', 'SPRSDE', 'JRNL', 'REMARK', 'REMARK', 'REMARK', 'REMARK', 'DBREF', 'SEQADV', 'SEQRES', 'MODRES', 'HET', 'HETNAM', 'HETSYN', 'FORMUL', 'HELIX', 'SHEET', 'TURN', 'SSBOND', 'LINK', 'HYDBND', 'SLTBRG', 'CISPEP', 'SITE', 'CRYST1', 'ORIGX1', 'ORIGX2', 'ORIGX3', 'SCALE1', 'SCALE2', 'SCALE3', 'MTRIX1', 'MTRIX2', 'MTRIX3', 'TVECT', 'MODEL', 'ATOM', 'SIGATM', 'ANISOU', 'SIGUIJ', 'TER', 'HETATM', 'ENDMDL', 'CONECT', 'MASTER', 'END']
Ordered list of PDB record labels.
- Type:
list
- parseLines(lines)[source]
Parse list of lines in PDB format.
- Parameters:
lines (list of str) – List of lines in PDB format.
- Returns:
Parsed structure instance.
- Return type:
- Raises:
StructureFormatError – Invalid PDB record.
- toLines(stru)[source]
Convert Structure stru to a list of lines in PDB format.
- Parameters:
stru (Structure) – Structure to be converted.
- Returns:
List of lines in PDB format.
- Return type:
list of str
- validRecords = {'ANISOU': None, 'ATOM': None, 'AUTHOR': None, 'CAVEAT': None, 'CISPEP': None, 'COMPND': None, 'CONECT': None, 'CRYST1': None, 'DBREF': None, 'END': None, 'ENDMDL': None, 'EXPDTA': None, 'FORMUL': None, 'HEADER': None, 'HELIX': None, 'HET': None, 'HETATM': None, 'HETNAM': None, 'HETSYN': None, 'HYDBND': None, 'JRNL': None, 'KEYWDS': None, 'LINK': None, 'MASTER': None, 'MODEL': None, 'MODRES': None, 'MTRIX1': None, 'MTRIX2': None, 'MTRIX3': None, 'OBSLTE': None, 'ORIGX1': None, 'ORIGX2': None, 'ORIGX3': None, 'REMARK': None, 'REVDAT': None, 'SCALE1': None, 'SCALE2': None, 'SCALE3': None, 'SEQADV': None, 'SEQRES': None, 'SHEET': None, 'SIGATM': None, 'SIGUIJ': None, 'SITE': None, 'SLTBRG': None, 'SOURCE': None, 'SPRSDE': None, 'SSBOND': None, 'TER': None, 'TITLE': None, 'TURN': None, 'TVECT': None}
Dictionary of PDB record labels.
- Type:
dict
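PDB records are fixed-column text: the record label occupies the first six columns, and a CRYST1 record stores the cell lengths in columns 7 to 33 and the cell angles in columns 34 to 54. The sketch below writes and re-reads such a record. The helper names are hypothetical, the space group and Z fields are omitted, and this is not the P_pdb implementation.

```python
def record_label(line):
    """Return the record label stored in columns 1-6 of a PDB line."""
    return line[:6].strip()


def format_cryst1(a, b, c, alpha, beta, gamma):
    """Build a CRYST1 line holding only the six cell parameters."""
    return "CRYST1%9.3f%9.3f%9.3f%7.2f%7.2f%7.2f" % (a, b, c, alpha, beta, gamma)


def parse_cryst1(line):
    """Return [a, b, c, alpha, beta, gamma] from a CRYST1 line."""
    if record_label(line) != "CRYST1":
        raise ValueError("not a CRYST1 record")
    # Three 9-character length fields start at column 7 (index 6).
    lengths = [float(line[6 + 9 * i:15 + 9 * i]) for i in range(3)]
    # Three 7-character angle fields start at column 34 (index 33).
    angles = [float(line[33 + 7 * i:40 + 7 * i]) for i in range(3)]
    return lengths + angles


line = format_cryst1(52.0, 58.6, 61.9, 90.0, 90.0, 90.0)
print(repr(line))
print(parse_cryst1(line))
```

Slicing by column rather than splitting on whitespace matters because adjacent fixed-width fields may touch without a separating space.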
diffpy.structure.parsers.p_discus module
Parser for DISCUS structure format.
- class diffpy.structure.parsers.p_discus.P_discus[source]
Bases:
StructureParser
Parser for DISCUS structure format. The parser fails on molecule and generator records.
- format
File format name, default “discus”.
- Type:
str
- nl
Line number of the current line being parsed.
- Type:
int
- lines
List of lines from the input file.
- Type:
list of str
- line
Current line being parsed.
- Type:
str
- stru
Structure being parsed.
- Type:
- ignored_lines
List of lines that were ignored during parsing.
- Type:
list of str
- cell_read
True if cell record processed.
- Type:
bool
- ncell_read
True if ncell record processed.
- Type:
bool
- parseLines(lines)[source]
Parse list of lines in DISCUS format.
- Parameters:
lines (list of str) – List of lines from the input file.
- Returns:
Parsed PDFFitStructure instance.
- Return type:
PDFFitStructure
- Raises:
StructureFormatError – If the file is not in DISCUS format.
diffpy.structure.parsers.p_xyz module
Parser for XYZ file format, where
First line gives number of atoms.
Second line has optional title.
Remaining lines contain element, x, y, z.
- class diffpy.structure.parsers.p_xyz.P_xyz[source]
Bases:
StructureParser
Parser for standard XYZ structure format.
- format
Format name, default “xyz”.
- Type:
str
- parseLines(lines)[source]
Parse list of lines in XYZ format.
- Parameters:
lines (list of str) – List of lines in XYZ format.
- Returns:
Parsed structure instance.
- Return type:
- Raises:
StructureFormatError – Invalid XYZ format.
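The three-part XYZ layout described for this module (atom count, optional title, then element and coordinates) can be read with a few lines of plain Python. This is a sketch under assumptions, not the P_xyz implementation: parse_xyz is a hypothetical helper, it returns a title and tuples instead of a Structure, it raises ValueError instead of StructureFormatError, and it treats a count mismatch as an error.

```python
def parse_xyz(lines):
    """Return (title, atoms) where atoms is a list of (element, (x, y, z))."""
    if len(lines) < 2:
        raise ValueError("XYZ data needs a count line and a title line")
    try:
        count = int(lines[0].split()[0])
    except (IndexError, ValueError):
        raise ValueError("first line must give the number of atoms") from None
    title = lines[1].strip()
    atoms = []
    for number, line in enumerate(lines[2:], start=3):
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines are ignored
        if len(fields) < 4:
            raise ValueError(f"line {number}: expected element, x, y, z")
        try:
            xyz = tuple(float(value) for value in fields[1:4])
        except ValueError:
            raise ValueError(f"line {number}: non-numeric coordinate") from None
        atoms.append((fields[0], xyz))
    if len(atoms) != count:
        raise ValueError(f"expected {count} atoms, found {len(atoms)}")
    return title, atoms


title, atoms = parse_xyz(["2", "water fragment", "O 0 0 0", "H 0.96 0 0"])
print(title, atoms)
```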