home *** CD-ROM | disk | FTP | other *** search
- CSV Files
- =========
- What are CSV files?
- ===================
- CSV stands for "comma-separated value" and is the name given to data files
- in which each item of data is separated from the next by a comma. Such files
- are widely used to transfer data from one application to another, e.g. from
- a database to a spreadsheet or from a RISC OS database to a PC database. The
- standard for such files does not appear to be very rigorously defined and
- the following variants may be encountered:-
-
- (1) Data which is non-numeric (e.g. ordinary text items such as names and
- addresses) are usually enclosed in quotation marks ("") whereas numeric
- items are not. This makes it possible for an application which is reading a
- CSV file to distinguish between numbers and strings (which might of course
- contain numerals) and also allows the comma to be used as a character within
- a string without being mistaken for a data separator. (Addresses often have
- commas in them, e.g. 112, Keighley Road.)
-
- (2) Null data items are usually included and can be located by looking for
- two commas with either nothing in between them or with only two quotation
- marks between them, i.e. ,, or ,"",. Each line of data (record) in such a
- file will always contain the same number of data items (fields). Nulls may,
- however, be omitted entirely in which case the number of fields per record
- will vary.
-
- (3) The way in which each record is terminated tends to vary from one system
- to another. The last item of data in a record is not followed by a comma but
- by an invisible character or pair of characters called the line terminator.
- On RISC OS systems this is the "line-feed" character (ASCII value 10) and on
- PCs it is usually the "carriage-return" character (ASCII value 13). You
- might, however, encounter CSV files in which BOTH these characters are used,
- i.e. LF CR or CR LF.
-
- (4) The separator isn't always even a comma. The "TAB" character (ASCII
- value 9) is often used and such files are called TSV files, but characters
- such as the semicolon are sometimes used too.
-
- (5) Some applications which accept a CSV file as input expect the first
- record to contain the names of the fields which comprise the subsequent
- records, e.g. if each record consists of a name and a four-part address this
- "header" record might read "NAME","STREET","TOWN","COUNTY","POSTCODE".
-
- How does Powerbase use CSV files?
- =================================
- Powerbase is able to save data in the form of true CSV files or files using
- some other data separator. If the separator is indeed a comma then the file
- created is of type &dfe and its icon displays the letters "CSV". If some
- other separator, such as TAB, is used (see (4) above) then the file is of
- the ordinary "text" type, i.e. &fff. Files of these types may also be used
- to enter data into a Powerbase database. In what follows we will, for
- convenience, refer to them as "CSV files" whether or not the separator is a
- comma. The choices on the main menu which control these actions are 'Export
- CSV' and 'CSV options'.
-
- CSV options
- ===========
- This leads to a dialogue box which allows you to specify all the file
- characteristics described in the previous section. Thus, selection of the
- first three option buttons causes a saved file to have, respectively:-
-
- (a) quotes round non-numeric fields (see (1) above)
- (b) a header record specifying the field names (see (5) above)
- (c) null fields included (see (2) above).
-
- Note that the "field names" referred to in (b) are, by default, the TAGS of
- the corresponding Powerbase fields, but may be changed to the DESCRIPTORS by
- altering the settings in the 'Print options' dialogue box.
-
- The field separator and record terminator may be changed by means of menus
- which appear when you click on the right-pointing arrows. Each of these
- menus also allows you to define your own separator and/or terminator of one
- or two printing characters.
-
- Two buttons in this dialogue box have not yet been referred to. Their
- function is described later (Using CSV files to import data).
-
- Export CSV
- ==========
- Having set up your options as described above, creating a CSV file is very
- like printing a list (see 'Printing'). First highlight the fields to be
- exported by clicking on each with ADJUST. Remember that the order in which
- the fields are highlighted is important. Next follow the 'Export CSV' option
- to the dialogue box and type in the search formula to determine which
- records are exported. Finally, enter the name of the file and drag the file
- icon to a filer window. The hourglass appears showing the progress of the
- operation.
-
- Experiment with saving CSV files with different settings of the CSV options
- and then loading the resulting files into Edit to examine them.
-
- Using CSV files to import data
- ==============================
- If you drag a CSV file onto the record display of an open database the
- options window (see CSV options) appears with a changed title and some
- additional icons, one of which displays the pathname of the CSV file. The
- same thing happens with a suitably-constituted text file (see above: 'How
- does Powerbase use CSV files'). Please note, however, that there are other
- circumstances under which a text file might be dropped on the record window.
- A properly-written script file (see 'ScriptFile') would be recognised as
- such, but a text file dropped onto the appropriate type of External field
- (i.e. a 'Text' or 'Text Block') field would become linked to that field
- instead of being treated like a CSV file. If you are importing data from a
- plain text file and your record contains fields of the aforementioned types
- be sure to drop the file on the window background, not on the external
- field. You are strongly advised to use proper CSV files if at all possible.
-
- There is an option to display each record as it is imported. The process is
- faster with this option turned off but much less informative. Most people
- like to know how things are progressing.
-
- Another option causes any trailing spaces in the imported data-fields to be
- stripped. Some database programs appear to pad all fields to their maximum
- length by adding spaces to the end of the data where necessary. If the
- program doesn't allow you to get rid of these spaces when exporting data in
- CSV format they can be a nuisance when the file is imported into Powerbase.
- Queries, for instance, may not work properly unless you use the "contains"
- comparator ({) rather than "equals". You can easily recognise this situation
- since the caret will always be at the far right of the field even though the
- visible characters nowhere near fill the field. Setting the "Strip spaces"
- switch before importing the file overcomes the problem.
-
- Clicking on 'GO' will make Powerbase try to create new Powerbase records
- from the data. If you decide not to do this you should click on 'Cancel'. If
- you do wish to import the data there are some important points to consider:-
-
- (1) The options need to be correctly set up. This means that if the CSV file
- originally came from a Powerbase application the settings in the options
- window should be exactly the same as they were when the file was exported.
- The exception to this is the 'Quotes' button which is greyed out on import
- because Powerbase doesn't need it. If the CSV file came from a PC or another
- RISC OS application you might have to load it into Edit to find out what
- separator and terminator are used, whether it uses quotes etc.
-
- (2) Will the data be read into the correct Powerbase fields? If no fields on
- the record screen are highlighted (i.e. with ADJUST) and the CSV file does
- not contain a header record then the answer is as follows:-
-
- (a) The first CSV field will be read into the first WRITABLE
- Powerbase field or Check box field. Non-writable fields such as
- Text blocks, Pictures, Buttons or fields which are merely labels
- will be ignored. The next CSV field will be read into the second
- Powerbase field and so on.
-
- (b) If the end of the CSV record is reached before all the writable
- fields have been filled then the next CSV record will start a new
- Powerbase record, i.e. the reading won't get out of step. It does not
- matter, therefore, if the CSV file omits null items.
-
- (c) If all writable fields are filled in before reaching the end of
- the CSV record Powerbase ignores the remainder of the latter and
- skips to the beginning of the next CSV record before starting a new
- Powerbase record. This is also to keep the operation in step.
-
- There will be occasions when you don't want to fill the Powerbase fields
- sequentially as just described. There are two ways of making the process
- more specific:-
-
- (a) If you highlight Powerbase fields with ADJUST before starting the
- import the data will be read only into the highlighted fields. The
- order in which the fields are filled is the order in which you
- highlighted them. Paragraphs (b) and (c) above apply in cases where
- the CSV record has more or fewer fields than are highlighted on the
- Powerbase record. In other words the operation proceeds as if the
- highlighted fields were the only ones in the record and as if they
- actually occurred in the order in which they were highlighted.
-
- (b) If the CSV file has a header which contains the tags or
- descriptors of fields in the Powerbase record then the import behaves
- exactly as if those fields were highlighted. The 'Print options'
- dialogue box must correctly reflect whether the tags or descriptors
- of fields are used. Note that you should not use both a header AND
- highlighting. Note also that there is nothing to prevent you adding
- a header manually to a CSV file which did not originate from a
- Powerbase application.
-
- What if the data won't fit?
- ===========================
- This is really two questions. If the database doesn't contain enough free
- records to hold all the imported CSV data you may get a "Database full when
- reading CSV file" error. There are two ways of avoiding this. The obvious
- one is to make sure the database is big enough before you start. If it isn't
- you can expand it via the 'Change length' dialogue box accessible from the
- 'Utilities' submenu on the icon bar menu. That dialogue box also has an
- 'Increment for expansion' which, if greater than zero, causes the database
- to expand automatically whenever it becomes full. Only if the database
- becomes full when the increment is set to zero should the aforementioned
- error occur.
-
- The second sense in which data might not fit is if an item of data is too
- long for the database field into which you are trying to import it. When a
- CSV file is being read Powerbase maintains a file called 'TooBig' inside the
- database's 'PrintJobs' directory. Anything read from the CSV file which
- won't fit in the target field is written to this file together with
- information about where it was intended to go. No writable Powerbase field
- may be longer than 246 characters and if an item of imported data exceeds
- this a note will be made in the 'TooBig' file advising you to define an
- External field (Text Block or Text) for such data.
-
- Advanced features of CSV import
- ===============================
- When you import a CSV file a new Powerbase record is normally created to
- accommodate each CSV record read from the file. You may also, subject to
- certain restrictions, use a CSV file to enter data into EXISTING Powerbase
- records.
-
- The CSV options window contains a button which, if selected, causes each
- record of a saved CSV file to include the primary key of the Powerbase
- record. The key is placed at the beginning of the record. Try creating CSV
- files with and without this option set and compare the files using Edit. If
- a file containing primary keys is dragged into a Powerbase record window
- Powerbase will attempt to locate the records with the same primary keys as
- the records in the CSV file. If the key is not found the CSV record is
- simply ignored. If it is found then the CSV data goes into the same record,
- overwriting any data which the target fields already contain.
-
- The operation described needs using with care. It is only really useful for
- transferring data between Powerbase databases which have the same primary
- key field. It can also lead to trouble if the primary key isn't unique,
- since the record located by Powerbase and into which the CSV data is read
- might not be the correct one. You have been warned!
-
- Finally, there is one option button in the CSV options window which only
- becomes active when the 'Header' button is selected. This is called 'With
- field data'. When selected the header of a saved CSV file contains not only
- the field names but also the field lengths and types. Again it is suggested
- that you create such a file and look at it in Edit. Each item in the header
- begins with a number. This is the defined field length in characters, i.e.
- the maximum allowed length in the Powerbase record field. Next comes a ¤
- character which separates the length from the field name. Another ¤
- separates the field name from the concluding number which is the field type.
- (The type numbers may be inspected by looking at the 'ValStrings' file in
- the Powerbase directory.)
-
- A file created in this form is NOT meant to be dropped on the window of an
- open database. It should be dropped on the Powerbase icon on the icon bar
- when no database is open. Powerbase will attempt to convert the file into a
- functioning database. All the fields will be ranged on the left of the
- record window, one beneath another and the primary key will consist of the
- first four characters of the first field. Don't try to create databases
- containing Check box, Button or External fields using this method; it's only
- meant for the most basic type of database.
-
-
-
-
-
-