diffpy.structure.parsers package

Submodules

diffpy.structure.parsers.p_auto module

Parser for automatic file format detection.

This parser does not provide the toLines() method.

class diffpy.structure.parsers.p_auto.P_auto(**kw)

Bases: StructureParser

Parser with automatic detection of structure format.

This parser attempts to automatically detect the format of a given structure file and parse it accordingly. When successful, it sets its format attribute to the detected structure format.

Parameters:

**kw (dict) – Keyword arguments for the structure parser.

format

Detected structure format. Initially set to “auto” and updated after successful detection of the structure format.

Type:

str

pkw

Keyword arguments passed to the parser.

Type:

dict

parse(s)

Detect format and create Structure instance from a string.

Set format attribute to the detected file format.

Parameters:

s (str) – String with structure data.

Returns:

Structure object.

Return type:

Structure

Raises:

StructureFormatError

parseFile(filename)

Detect format and create Structure instance from an existing file.

Set format attribute to the detected file format.

Parameters:

filename (str) – Path to structure file.

Returns:

Structure object.

Return type:

Structure

Raises:
  • StructureFormatError – If the structure format is unknown or invalid.

  • IOError – If the file cannot be read.

parseLines(lines)

Detect format and create Structure instance from a list of lines.

Set format attribute to the detected file format.

Parameters:

lines (list) – List of lines with structure data.

Returns:

Structure object.

Return type:

Structure

Raises:

StructureFormatError

diffpy.structure.parsers.p_auto.getParser(**kw)

Return a new instance of the automatic parser.

Parameters:

**kw (dict) – Keyword arguments for the structure parser.

Returns:

Instance of P_auto.

Return type:

P_auto
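The detection strategy of P_auto can be pictured as trial and error: hand the input to each candidate parser and keep the first result that does not raise StructureFormatError, recording that format name. The sketch below is self-contained and does not import diffpy; the two toy formats, the PARSERS registry, and AutoParser are hypothetical stand-ins, and the order in which the real P_auto tries formats is not reproduced here.

```python
class StructureFormatError(Exception):
    """Local stand-in for diffpy.structure's StructureFormatError."""


def parse_ints(s):
    """Toy format 'ints': whitespace-separated integers only."""
    try:
        return [int(w) for w in s.split()]
    except ValueError as exc:
        raise StructureFormatError(str(exc)) from exc


def parse_floats(s):
    """Toy format 'floats': whitespace-separated floating point numbers."""
    try:
        return [float(w) for w in s.split()]
    except ValueError as exc:
        raise StructureFormatError(str(exc)) from exc


# Hypothetical registry: format name -> parse function, in trial order.
PARSERS = {"ints": parse_ints, "floats": parse_floats}


class AutoParser:
    """Try each registered format in turn and remember the first that works."""

    def __init__(self, parsers):
        self.parsers = parsers  # ordered mapping: format name -> parse function
        self.format = "auto"    # replaced by the detected format name

    def parse(self, s):
        errors = []
        for fmt, parse_func in self.parsers.items():
            try:
                result = parse_func(s)
            except StructureFormatError as exc:
                errors.append(f"{fmt}: {exc}")
                continue
            self.format = fmt
            return result
        raise StructureFormatError("unknown or invalid format\n" + "\n".join(errors))
```

For example, AutoParser(PARSERS).parse("1.5 2") fails as "ints", succeeds as "floats", and leaves format set to "floats". Collecting the per-format error messages makes the final failure easier to diagnose.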

diffpy.structure.parsers.p_cif module

Parser for basic CIF file format.

diffpy.structure.parsers.p_cif.rx_float

Constant regular expression for leading_float().

Type:

re.Pattern

diffpy.structure.parsers.p_cif.symvec

Helper dictionary for getSymOp().

Type:

dict

class diffpy.structure.parsers.p_cif.P_cif(eps=None)

Bases: StructureParser

Simple parser for CIF structure format.

Reads the Structure from the first block containing the _atom_site_label key. Subsequent blocks, if any, are ignored.

Parameters:

eps (float, optional) – Fractional-coordinate cutoff for duplicate positions. When None, the ExpandAsymmetricUnit default of 1.0e-5 is used.

format

Structure format name.

Type:

str

ciffile

Instance of CifFile from PyCifRW.

Type:

CifFile

stru

Structure instance used for CIF input or output.

Type:

Structure

spacegroup

Instance of SpaceGroup used for symmetry expansion.

Type:

SpaceGroup

eps

Resolution in fractional coordinates for non-equal positions. Used for expansion of asymmetric unit.

Type:

float

eau

Instance of ExpandAsymmetricUnit from SymmetryUtilities.

Type:

ExpandAsymmetricUnit

asymmetric_unit

List of Atom instances for the original asymmetric unit in the CIF file.

Type:

list

labelindex

Dictionary mapping unique atom label to index of Atom in self.asymmetric_unit.

Type:

dict

anisotropy

Dictionary mapping unique atom label to displacement anisotropy resolved at that site.

Type:

dict

cif_sgname

Space group name obtained by looking up the value of the _space_group_name_Hall, _symmetry_space_group_name_Hall, _space_group_name_H-M_alt, and _symmetry_space_group_name_H-M items. None when none of these items is defined.

Type:

str or None

BtoU = 0.012665147955292222

Conversion factor from B values to U values.

Type:

float
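The listed value of BtoU equals 1/(8π²), which follows from the standard relation B = 8π²U between isotropic displacement parameters. A minimal sketch of the conversion, where b_to_u is a hypothetical helper name:

```python
import math

# Isotropic displacement parameters obey B = 8 * pi**2 * U, so U = B / (8 * pi**2).
BtoU = 1.0 / (8 * math.pi ** 2)


def b_to_u(b_iso):
    """Convert an isotropic B value to U (both in square angstroms)."""
    return b_iso * BtoU
```

Multiplying by the constant rather than dividing by 8π² each time simply avoids recomputing the factor.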

parse(s)

Create Structure instance from a string in CIF format.

Parameters:

s (str) – A string in CIF format.

Returns:

Structure instance.

Return type:

Structure

Raises:

StructureFormatError – When the data do not constitute a valid CIF format.

parseFile(filename)

Create Structure from an existing CIF file.

Parameters:

filename (str) – Path to structure file.

Returns:

Structure instance.

Return type:

Structure

Raises:
  • StructureFormatError – When the data do not constitute a valid CIF format.

  • IOError – When the file cannot be opened.

parseLines(lines)

Parse list of lines in CIF format.

Parameters:

lines (list) – List of strings stripped of line terminators.

Returns:

Structure instance.

Return type:

Structure

Raises:

StructureFormatError – When the data do not constitute a valid CIF format.

toLines(stru)

Convert Structure to a list of lines in basic CIF format.

Parameters:

stru (Structure) – The structure to be converted.

Returns:

List of lines in basic CIF format.

Return type:

list

diffpy.structure.parsers.p_cif.getParser(eps=None)

Return new parser object for CIF format.

Parameters:

eps (float, optional) – Fractional-coordinate cutoff for duplicate positions. When None, the ExpandAsymmetricUnit default of 1.0e-5 is used.

Returns:

Instance of P_cif.

Return type:

P_cif

diffpy.structure.parsers.p_cif.getSymOp(s)

Create SpaceGroups.SymOp instance from a string.

Parameters:

s (str) – Formula for equivalent coordinates, for example 'x,1/2-y,1/2+z'.

Returns:

Instance of SymOp.

Return type:

SymOp
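A formula such as 'x,1/2-y,1/2+z' encodes a rotation matrix R and a translation vector t acting on fractional coordinates. The sketch below decodes one with the standard library only. parse_symop and apply_symop are hypothetical helpers, not getSymOp itself, which returns a SymOp object rather than a plain (R, t) pair; this sketch also only handles coefficients of ±1 on x, y and z.

```python
import re
from fractions import Fraction

# One signed term: an axis symbol or a constant such as 1/2 or 0.25.
_TERM = re.compile(r"([+-]?)(x|y|z|\d+/\d+|\d+(?:\.\d+)?)")
_AXIS = {"x": 0, "y": 1, "z": 2}


def parse_symop(s):
    """Return (R, t) such that a fractional position v maps to R v + t."""
    exprs = s.lower().replace(" ", "").split(",")
    if len(exprs) != 3:
        raise ValueError(f"expected three comma-separated components: {s!r}")
    R = [[0, 0, 0] for _ in range(3)]
    t = [Fraction(0)] * 3
    for i, expr in enumerate(exprs):
        if not expr:
            raise ValueError(f"empty component in {s!r}")
        pos = 0
        while pos < len(expr):
            m = _TERM.match(expr, pos)
            # Every term after the first must carry an explicit + or - sign.
            if m is None or (pos > 0 and not m.group(1)):
                raise ValueError(f"cannot parse component {expr!r}")
            sign = -1 if m.group(1) == "-" else 1
            token = m.group(2)
            if token in _AXIS:
                R[i][_AXIS[token]] += sign
            else:
                t[i] += sign * Fraction(token)
            pos = m.end()
    return R, t


def apply_symop(R, t, v):
    """Apply the operation (R, t) to the fractional coordinates v."""
    return [sum(R[i][j] * v[j] for j in range(3)) + t[i] for i in range(3)]
```

Using Fraction keeps translations such as 1/2 exact, so symmetry-equivalent positions can be compared without floating point round-off.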

diffpy.structure.parsers.p_cif.leading_float(s, d=0.0)

Extract the first float from a string and ignore trailing characters.

Useful for extracting values from “value(std)” syntax.

Parameters:
  • s (str) – The string to be scanned for a floating-point value.

  • d (float, optional) – The default value returned when s is “.” or “?”, which in CIF format stand for inapplicable and unknown, respectively.

Returns:

The extracted floating point value.

Return type:

float

Raises:

ValueError – When string does not start with a float.
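The behavior described above can be sketched with a regular expression anchored at the start of the string. leading_float_sketch and _RX_FLOAT are hypothetical names, and the pattern is an assumption of this sketch; the actual rx_float pattern is not reproduced here.

```python
import re

# Assumed pattern: optional sign, digits with optional fraction, optional exponent.
_RX_FLOAT = re.compile(r"[-+]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][-+]?\d+)?")


def leading_float_sketch(s, d=0.0):
    """Return the float at the start of s, ignoring e.g. a '(std)' suffix."""
    s = s.strip()
    if s in (".", "?"):
        return d  # CIF placeholders for inapplicable and unknown values
    m = _RX_FLOAT.match(s)
    if m is None:
        raise ValueError(f"{s!r} does not start with a float")
    return float(m.group())
```

With this sketch, "1.234(5)" yields 1.234 and the standard uncertainty in parentheses is discarded.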

diffpy.structure.parsers.p_discus module

Parser for DISCUS structure format.

class diffpy.structure.parsers.p_discus.P_discus

Bases: StructureParser

Parser for DISCUS structure format. The parser fails on molecule and generator records.

format

File format name, default “discus”.

Type:

str

nl

Line number of the current line being parsed.

Type:

int

lines

List of lines from the input file.

Type:

list of str

line

Current line being parsed.

Type:

str

stru

Structure being parsed.

Type:

PDFFitStructure

ignored_lines

List of lines that were ignored during parsing.

Type:

list of str

cell_read

True if cell record processed.

Type:

bool

ncell_read

True if ncell record processed.

Type:

bool

parseLines(lines)

Parse list of lines in DISCUS format.

Parameters:

lines (list of str) – List of lines from the input file.

Returns:

Parsed PDFFitStructure instance.

Return type:

PDFFitStructure

Raises:

StructureFormatError – If the file is not in DISCUS format.

toLines(stru)

Convert Structure stru to a list of lines in DISCUS format.

Parameters:

stru (Structure) – Structure to be converted.

Returns:

List of lines in DISCUS format.

Return type:

list of str

diffpy.structure.parsers.p_discus.getParser()

Return new parser object for DISCUS format.

Returns:

Instance of P_discus.

Return type:

P_discus

diffpy.structure.parsers.p_pdb module

Basic parser for PDB structure format.

class diffpy.structure.parsers.p_pdb.P_pdb

Bases: StructureParser

Simple parser for PDB format.

The parser understands the following PDB records: TITLE, CRYST1, SCALE1, SCALE2, SCALE3, ATOM, SIGATM, ANISOU, SIGUIJ, TER, HETATM, END.

format

Format name, default “pdb”.

Type:

str

atomLines(stru, idx)

Build ATOM records and possibly SIGATM, ANISOU or SIGUIJ records for atom number idx of structure stru.

cryst1Lines(stru)

Build lines corresponding to CRYST1 record.
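PDB records are fixed-column text. The sketch below formats a CRYST1 line following the column layout of the PDB format specification. cryst1_line is a hypothetical helper, its defaults spacegroup="P 1" and z=1 are assumptions of this sketch, and the exact output of cryst1Lines is not reproduced here.

```python
def cryst1_line(a, b, c, alpha, beta, gamma, spacegroup="P 1", z=1):
    """Format a PDB CRYST1 record.

    Columns: 1-6 record name, 7-15 a, 16-24 b, 25-33 c (9.3f),
    34-40 alpha, 41-47 beta, 48-54 gamma (7.2f), 55 blank,
    56-66 space group (left-justified), 67-70 Z value.
    """
    return "%-6s%9.3f%9.3f%9.3f%7.2f%7.2f%7.2f %-11s%4d" % (
        "CRYST1", a, b, c, alpha, beta, gamma, spacegroup, z)
```

Because every field has a fixed width, the resulting line is always 70 characters long for values that fit their columns.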

orderOfRecords = ['HEADER', 'OBSLTE', 'TITLE', 'CAVEAT', 'COMPND', 'SOURCE', 'KEYWDS', 'EXPDTA', 'AUTHOR', 'REVDAT', 'SPRSDE', 'JRNL', 'REMARK', 'REMARK', 'REMARK', 'REMARK', 'DBREF', 'SEQADV', 'SEQRES', 'MODRES', 'HET', 'HETNAM', 'HETSYN', 'FORMUL', 'HELIX', 'SHEET', 'TURN', 'SSBOND', 'LINK', 'HYDBND', 'SLTBRG', 'CISPEP', 'SITE', 'CRYST1', 'ORIGX1', 'ORIGX2', 'ORIGX3', 'SCALE1', 'SCALE2', 'SCALE3', 'MTRIX1', 'MTRIX2', 'MTRIX3', 'TVECT', 'MODEL', 'ATOM', 'SIGATM', 'ANISOU', 'SIGUIJ', 'TER', 'HETATM', 'ENDMDL', 'CONECT', 'MASTER', 'END']

Ordered list of PDB record labels.

Type:

list

parseLines(lines)

Parse list of lines in PDB format.

Parameters:

lines (list of str) – List of lines in PDB format.

Returns:

Parsed structure instance.

Return type:

Structure

Raises:

StructureFormatError – Invalid PDB record.

titleLines(stru)

Build lines corresponding to TITLE record.

toLines(stru)

Convert Structure stru to a list of lines in PDB format.

Parameters:

stru (Structure) – Structure to be converted.

Returns:

List of lines in PDB format.

Return type:

list of str

validRecords = {'ANISOU': None, 'ATOM': None, 'AUTHOR': None, 'CAVEAT': None, 'CISPEP': None, 'COMPND': None, 'CONECT': None, 'CRYST1': None, 'DBREF': None, 'END': None, 'ENDMDL': None, 'EXPDTA': None, 'FORMUL': None, 'HEADER': None, 'HELIX': None, 'HET': None, 'HETATM': None, 'HETNAM': None, 'HETSYN': None, 'HYDBND': None, 'JRNL': None, 'KEYWDS': None, 'LINK': None, 'MASTER': None, 'MODEL': None, 'MODRES': None, 'MTRIX1': None, 'MTRIX2': None, 'MTRIX3': None, 'OBSLTE': None, 'ORIGX1': None, 'ORIGX2': None, 'ORIGX3': None, 'REMARK': None, 'REVDAT': None, 'SCALE1': None, 'SCALE2': None, 'SCALE3': None, 'SEQADV': None, 'SEQRES': None, 'SHEET': None, 'SIGATM': None, 'SIGUIJ': None, 'SITE': None, 'SLTBRG': None, 'SOURCE': None, 'SPRSDE': None, 'SSBOND': None, 'TER': None, 'TITLE': None, 'TURN': None, 'TVECT': None}

Dictionary of PDB record labels.

Type:

dict
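The keys of validRecords are the distinct labels of orderOfRecords (where REMARK appears several times) and every value is None, which is the shape dict.fromkeys produces. The sketch below assumes that relationship, uses an abbreviated record list, and shows how such a dictionary could be used to check the record name in columns 1-6 of a PDB line; record_name and is_valid_record are hypothetical helpers.

```python
# Abbreviated copy of orderOfRecords; the full list is longer.
order_of_records = ["HEADER", "TITLE", "REMARK", "REMARK", "CRYST1",
                    "ATOM", "SIGATM", "ANISOU", "SIGUIJ", "TER", "HETATM", "END"]

# Duplicate labels collapse and every value is None, as in validRecords.
valid_records = dict.fromkeys(order_of_records)


def record_name(line):
    """Return the record label of a PDB line (columns 1-6, right-stripped)."""
    return line[:6].rstrip()


def is_valid_record(line):
    """True if the line starts with a known record label."""
    return record_name(line) in valid_records
```

A dictionary gives constant-time membership tests while dict.fromkeys still preserves the first-seen order of the labels.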

diffpy.structure.parsers.p_pdb.getParser()

Return new parser object for PDB format.

Returns:

Instance of P_pdb.

Return type:

P_pdb

diffpy.structure.parsers.p_pdffit module

Parser for PDFfit structure format.

class diffpy.structure.parsers.p_pdffit.P_pdffit

Bases: StructureParser

Parser for PDFfit structure format.

format

Format name, default “pdffit”.

Type:

str

ignored_lines

List of lines ignored during parsing.

Type:

list

stru

Structure instance used for PDFfit input or output.

Type:

PDFFitStructure

parseLines(lines)

Parse list of lines in PDFfit format.

Parameters:

lines (list of str) – List of lines in PDFfit format.

Returns:

Parsed structure instance.

Return type:

Structure

Raises:

StructureFormatError – File not in PDFfit format.

toLines(stru)

Convert Structure stru to a list of lines in PDFfit format.

Parameters:

stru (Structure) – Structure to be converted.

Returns:

List of lines in PDFfit format.

Return type:

list of str

diffpy.structure.parsers.p_pdffit.getParser()

Return new parser object for PDFfit format.

Returns:

Instance of P_pdffit.

Return type:

P_pdffit

diffpy.structure.parsers.p_rawxyz module

Parser for raw XYZ file format.

Raw XYZ is a 3- or 4-column text file with Cartesian coordinates of atoms and an optional first column for atom types.

class diffpy.structure.parsers.p_rawxyz.P_rawxyz

Bases: StructureParser

Parser for RAWXYZ format, implemented as a StructureParser subclass.

format

Format name, default “rawxyz”.

Type:

str

parseLines(lines)

Parse list of lines in RAWXYZ format.

Parameters:

lines (list of str) – List of lines in RAWXYZ format.

Returns:

Parsed structure instance.

Return type:

Structure

Raises:

StructureFormatError – Invalid RAWXYZ format.

toLines(stru)

Convert Structure stru to a list of lines in RAWXYZ format.

Parameters:

stru (Structure) – Structure to be converted.

Returns:

List of lines in RAWXYZ format.

Return type:

list of str

diffpy.structure.parsers.p_rawxyz.getParser()

Return new parser object for RAWXYZ format.

Returns:

Instance of P_rawxyz.

Return type:

P_rawxyz
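The raw XYZ layout described at the start of this module, three coordinate columns optionally preceded by an atom-type column, can be read with a few lines of standard Python. parse_rawxyz is a hypothetical helper; skipping blank lines and '#' comments and requiring a consistent column count are assumptions of this sketch, not documented behavior of P_rawxyz.

```python
def parse_rawxyz(lines):
    """Return a list of (element, (x, y, z)); element is None for 3-column data."""
    records = []
    ncols = None
    for lineno, line in enumerate(lines, start=1):
        words = line.split()
        if not words or words[0].startswith("#"):
            continue  # assumption: blank lines and '#' comments are skipped
        if ncols is None:
            ncols = len(words)  # the first data line fixes the column count
            if ncols not in (3, 4):
                raise ValueError(f"line {lineno}: expected 3 or 4 columns")
        if len(words) != ncols:
            raise ValueError(f"line {lineno}: inconsistent number of columns")
        element = words[0] if ncols == 4 else None
        x, y, z = (float(w) for w in words[-3:])  # Cartesian coordinates
        records.append((element, (x, y, z)))
    return records
```

Taking the last three columns as coordinates lets the same code handle both the 3- and 4-column variants.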

diffpy.structure.parsers.p_xcfg module

Parser for the extended CFG format used by AtomEye.

diffpy.structure.parsers.p_xcfg.AtomicMass

Dictionary of atomic masses for elements.

Type:

dict

class diffpy.structure.parsers.p_xcfg.P_xcfg

Bases: StructureParser

Parser for AtomEye extended CFG format.

format

Format name, default “xcfg”.

Type:

str

cluster_boundary = 2

Width of the boundary around the corners of a non-periodic cluster, used to avoid periodic boundary condition (PBC) effects in AtomEye.

Type:

int

parseLines(lines)

Parse list of lines in XCFG format.

Parameters:

lines (list of str) – List of lines in XCFG format.

Returns:

Parsed structure instance.

Return type:

Structure

Raises:

StructureFormatError – Invalid XCFG format.

toLines(stru)

Convert Structure stru to a list of lines in AtomEye XCFG format.

Parameters:

stru (Structure) – Structure to be converted.

Returns:

List of lines in XCFG format.

Return type:

list of str

Raises:

StructureFormatError – Cannot convert empty structure to XCFG format.

diffpy.structure.parsers.p_xcfg.getParser()

Return new parser object for XCFG format.

Returns:

Instance of P_xcfg.

Return type:

P_xcfg

diffpy.structure.parsers.p_xyz module

Parser for XYZ file format, where

  • First line gives number of atoms.

  • Second line has optional title.

  • Remaining lines contain element, x, y, z.
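The three-part XYZ layout listed above (atom count, title, then one element and x, y, z per line) can be sketched with the standard library only. parse_xyz and to_xyz_lines are hypothetical helpers; how P_xyz treats extra columns or malformed counts is not assumed here.

```python
def parse_xyz(lines):
    """Return (title, atoms) where atoms is a list of (element, (x, y, z))."""
    if len(lines) < 2:
        raise ValueError("XYZ data needs a count line and a title line")
    try:
        natoms = int(lines[0].split()[0])
    except (IndexError, ValueError):
        raise ValueError("first line must give the number of atoms") from None
    title = lines[1].strip()
    atoms = []
    for line in lines[2:2 + natoms]:
        element, x, y, z = line.split()[:4]  # ignore any extra columns
        atoms.append((element, (float(x), float(y), float(z))))
    if len(atoms) != natoms:
        raise ValueError(f"expected {natoms} atoms, found {len(atoms)}")
    return title, atoms


def to_xyz_lines(title, atoms):
    """Inverse of parse_xyz: count line, title line, one line per atom."""
    out = [str(len(atoms)), title]
    out += [f"{el} {x:.6g} {y:.6g} {z:.6g}" for el, (x, y, z) in atoms]
    return out
```

Writing and re-reading a structure with these two helpers round-trips it, which is a convenient check for any parser pair of this kind.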

class diffpy.structure.parsers.p_xyz.P_xyz

Bases: StructureParser

Parser for standard XYZ structure format.

format

Format name, default “xyz”.

Type:

str

parseLines(lines)

Parse list of lines in XYZ format.

Parameters:

lines (list of str) – List of lines in XYZ format.

Returns:

Parsed structure instance.

Return type:

Structure

Raises:

StructureFormatError – Invalid XYZ format.

toLines(stru)

Convert Structure stru to a list of lines in XYZ format.

Parameters:

stru (Structure) – Structure to be converted.

Returns:

List of lines in XYZ format.

Return type:

list of str

diffpy.structure.parsers.p_xyz.getParser()

Return new parser object for XYZ format.

Returns:

Instance of P_xyz.

Return type:

P_xyz

diffpy.structure.parsers.parser_index_mod module

Index of recognized structure formats, their IO capabilities and associated modules where they are defined.

diffpy.structure.parsers.parser_index_mod.parser_index

Dictionary of recognized structure formats. The keys are format names and the values are dictionaries with the following keys:

module (str)

Name of the module that defines the parser class.

file_extension (str)

File extension for the format, including the leading dot.

file_pattern (str)

File pattern for the format, using ‘|’ to separate multiple patterns.

has_input (bool)

True if the parser can read the format.

has_output (bool)

True if the parser can write the format.

Type:

dict

Note

Plugins for new structure formats need to be added to the parser_index dictionary in this module.
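An entry in parser_index is a plain dictionary with the keys listed above. The sketch below shows two hypothetical entries (the "alpha" and "beta" formats and their modules are invented for illustration, not copied from parser_index_mod) and a hypothetical helper that matches a file name against the ‘|’-separated file_pattern values. Whether diffpy itself uses file_pattern this way is not assumed.

```python
import fnmatch

# Hypothetical entries using the documented keys; the real ones live in
# diffpy.structure.parsers.parser_index_mod.parser_index.
parser_index = {
    "alpha": {
        "module": "p_alpha",
        "file_extension": ".alp",
        "file_pattern": "*.alp|*.alpha",
        "has_input": True,
        "has_output": False,
    },
    "beta": {
        "module": "p_beta",
        "file_extension": ".bet",
        "file_pattern": "*.bet",
        "has_input": True,
        "has_output": True,
    },
}


def formats_matching(filename):
    """Return the format names whose file_pattern matches filename."""
    matches = []
    for fmt, info in parser_index.items():
        patterns = info["file_pattern"].split("|")
        # fnmatchcase keeps matching case-sensitive on every platform.
        if any(fnmatch.fnmatchcase(filename, p) for p in patterns):
            matches.append(fmt)
    return matches
```

Keeping the metadata in one dictionary means a new plugin only needs a new entry plus its module, which is the registration step the note above asks for.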

diffpy.structure.parsers.structureparser module

Definition of StructureParser, a base class for specific parsers.

class diffpy.structure.parsers.structureparser.StructureParser

Bases: object

Base class for all structure parsers.

format

Format name of particular parser.

Type:

str

filename

Path to structure file that is read or written.

Type:

str

parse(s)

Create Structure instance from a string.

parseFile(filename)

Create Structure instance from an existing file.

parseLines(lines)

Create Structure instance from a list of lines.

Return Structure object or raise StructureFormatError exception.

Note

This method must be overridden in a derived class.

toLines(stru)

Convert Structure stru to a list of lines.

Return list of strings.

Note

This method must be overridden in a derived class.

tostring(stru)

Convert Structure instance to a string.
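The base-class contract is that derived parsers supply parseLines() and toLines(), while parse(), parseFile() and tostring() build on them. The self-contained sketch below mimics that division of labor without importing diffpy. BaseParser, the toy P_labels format, and the way parse() splits lines and tostring() joins them are assumptions of this sketch, not the library's code.

```python
class StructureFormatError(Exception):
    """Local stand-in for diffpy.structure's StructureFormatError."""


class BaseParser:
    """Minimal stand-in for the StructureParser contract (not diffpy's code)."""

    format = None

    def parseLines(self, lines):
        raise NotImplementedError("must be overridden in a derived class")

    def toLines(self, stru):
        raise NotImplementedError("must be overridden in a derived class")

    def parse(self, s):
        # Assumption: split on line breaks and delegate to parseLines().
        return self.parseLines(s.splitlines())

    def parseFile(self, filename):
        self.filename = filename
        with open(filename) as fp:
            return self.parse(fp.read())

    def tostring(self, stru):
        # Assumption: join toLines() output with newlines, ending with one.
        return "\n".join(self.toLines(stru)) + "\n"


class P_labels(BaseParser):
    """Hypothetical format: one 'label x y z' record per line."""

    format = "labels"

    def parseLines(self, lines):
        stru = []
        for line in lines:
            words = line.split()
            if not words:
                continue
            try:
                if len(words) != 4:
                    raise ValueError("expected 4 fields")
                xyz = tuple(float(w) for w in words[1:])
            except ValueError as exc:
                raise StructureFormatError(f"bad record: {line!r}") from exc
            stru.append((words[0], xyz))
        return stru

    def toLines(self, stru):
        return [f"{label} {x:g} {y:g} {z:g}" for label, (x, y, z) in stru]
```

Only the two format-specific methods had to be written in P_labels; string and file handling come from the base class, following the P_<format> naming convention.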

Module contents

Conversion plugins for various structure formats.

The recognized structure formats are defined by subclassing StructureParser. By convention, these classes are named P_<format> and are defined in modules named p_<format>.py. Parser classes should override the parseLines() and toLines() methods of StructureParser. Every structure parser must be registered in the parser_index dictionary of the parser_index_mod module.

For normal usage it should be sufficient to use the routines provided in this module.

Content:
  • StructureParser: base class for a concrete Parser

  • parser_index: dictionary of known structure formats

  • getParser: factory returning a Parser for a given format

  • inputFormats: list of available input formats

  • outputFormats: list of available output formats

diffpy.structure.parsers.getParser(format, **kw)

Return Parser instance for a given structure format.

Parameters:
  • format (str) – String with the format name, see parser_index_mod.

  • **kw (dict) – Keyword arguments passed to the Parser init function.

Returns:

Parser instance for the given format.

Return type:

Parser

Raises:

StructureFormatError – When the format is not defined.
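The factory behavior described above (look up a format name, forward keyword arguments to the parser constructor, raise StructureFormatError for unknown names) can be sketched with a plain dictionary. The registry, P_demo, and get_parser are hypothetical; the lookup diffpy performs through parser_index is replaced here by the dictionary.

```python
class StructureFormatError(Exception):
    """Local stand-in for diffpy.structure's StructureFormatError."""


class P_demo:
    """Hypothetical parser whose constructor accepts an eps keyword."""

    def __init__(self, eps=None):
        self.format = "demo"
        self.eps = 1.0e-5 if eps is None else eps


# Hypothetical registry: format name -> callable returning a new parser.
_factories = {"demo": P_demo}


def get_parser(fmt, **kw):
    """Return a new parser for fmt, forwarding keyword arguments to it."""
    try:
        factory = _factories[fmt]
    except KeyError:
        raise StructureFormatError(f"no parser defined for {fmt!r} format") from None
    return factory(**kw)
```

Forwarding **kw unchanged lets format-specific options, such as the eps cutoff accepted by P_cif, pass through a single generic entry point.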

diffpy.structure.parsers.inputFormats()

Return list of implemented input structure formats.

diffpy.structure.parsers.outputFormats()

Return list of implemented output structure formats.
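Given the has_input and has_output flags stored in parser_index, lists like those returned by inputFormats() and outputFormats() can be derived by filtering. The index below is hypothetical, the functions are illustrative stand-ins, and the results are sorted only to make the sketch deterministic; whether the real functions sort their output is not assumed.

```python
# Hypothetical index carrying only the capability flags of parser_index.
index = {
    "alpha": {"has_input": True, "has_output": False},
    "beta": {"has_input": True, "has_output": True},
    "gamma": {"has_input": False, "has_output": True},
}


def input_formats(parser_index):
    """Names of formats that can be read."""
    return sorted(f for f, info in parser_index.items() if info["has_input"])


def output_formats(parser_index):
    """Names of formats that can be written."""
    return sorted(f for f, info in parser_index.items() if info["has_output"])
```

A format such as auto detection, which reads but does not provide toLines(), would appear only in the input list under this scheme.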