RISC DISC 1

home *** CD-ROM | disk | FTP | other *** search

/ RISC DISC 1 / RISC_DISC_1.iso / pd_share / apps / powerbase / Docs / CSVFiles < prev next >

Wrap

Text File | 1994-10-01 | 13.6 KB | 248 lines

CSV Files ========= What are CSV files? =================== CSV stands for "comma-separated value" and is the name given to data files in which each item of data is separated from the next by a comma. Such files are widely used to transfer data from one application to another, e.g. from a database to a spreadsheet or from a RISC OS database to a PC database. The standard for such files does not appear to be very rigorously defined and the following variants may be encountered:- (1) Data which is non-numeric (e.g. ordinary text items such as names and addresses) are usually enclosed in quotation marks ("") whereas numeric items are not. This makes it possible for an application which is reading a CSV file to distinguish between numbers and strings (which might of course contain numerals) and also allows the comma to be used as a character within a string without being mistaken for a data separator. (Addresses often have commas in them, e.g. 112, Keighley Road.) (2) Null data items are usually included and can be located by looking for two commas with either nothing in between them or with only two quotation marks between them, i.e. ,, or ,"",. Each line of data (record) in such a file will always contain the same number of data items (fields). Nulls may, however, be omitted entirely in which case the number of fields per record will vary. (3) The way in which each record is terminated tends to vary from one system to another. The last item of data in a record is not followed by a comma but by an invisible character or pair of characters called the line terminator. On RISC OS systems this is the "line-feed" character (ASCII value 10) and on PCs it is usually the "carriage-return" character (ASCII value 13). You might, however, encounter CSV files in which BOTH these characters are used, i.e. LF CR or CR LF. (4) The separator isn't always even a comma. The "TAB" character (ASCII value 9) is often used and such files are called TSV files, but characters such as the semicolon are sometimes used too. (5) Some applications which accept a CSV file as input expect the first record to contain the names of the fields which comprise the subsequent records, e.g. if each record consists of a name and a four-part address this "header" record might read "NAME","STREET","TOWN","COUNTY","POSTCODE". How does Powerbase use CSV files? ================================= Powerbase is able to save data in the form of true CSV files or files using some other data separator. If the separator is indeed a comma then the file created is of type &dfe and its icon displays the letters "CSV". If some other separator, such as TAB, is used (see (4) above) then the file is of the ordinary "text" type, i.e. &fff. Files of these types may also be used to enter data into a Powerbase database. In what follows we will, for convenience, refer to them as "CSV files" whether or not the separator is a comma. The choices on the main menu which control these actions are 'Export CSV' and 'CSV options'. CSV options =========== This leads to a dialogue box which allows you to specify all the file characteristics described in the previous section. Thus, selection of the first three option buttons causes a saved file to have, respectively:- (a) quotes round non-numeric fields (see (1) above) (b) a header record specifying the field names (see (5) above) (c) null fields included (see (2) above). Note that the "field names" referred to in (b) are, by default, the TAGS of the corresponding Powerbase fields, but may be changed to the DESCRIPTORS by altering the settings in the 'Print options' dialogue box. The field separator and record terminator may be changed by means of menus which appear when you click on the right-pointing arrows. Each of these menus also allows you to define your own separator and/or terminator of one or two printing characters. Two buttons in this dialogue box have not yet been referred to. Their function is described later (Using CSV files to import data). Export CSV ========== Having set up your options as described above, creating a CSV file is very like printing a list (see 'Printing'). First highlight the fields to be exported by clicking on each with ADJUST. Remember that the order in which the fields are highlighted is important. Next follow the 'Export CSV' option to the dialogue box and type in the search formula to determine which records are exported. Finally, enter the name of the file and drag the file icon to a filer window. The hourglass appears showing the progress of the operation. Experiment with saving CSV files with different settings of the CSV options and then loading the resulting files into Edit to examine them. Using CSV files to import data ============================== If you drag a CSV file onto the record display of an open database the options window (see CSV options) appears with a changed title and some additional icons, one of which displays the pathname of the CSV file. The same thing happens with a suitably-constituted text file (see above: 'How does Powerbase use CSV files'). Please note, however, that there are other circumstances under which a text file might be dropped on the record window. A properly-written script file (see 'ScriptFile') would be recognised as such, but a text file dropped onto the appropriate type of External field (i.e. a 'Text' or 'Text Block') field would become linked to that field instead of being treated like a CSV file. If you are importing data from a plain text file and your record contains fields of the aforementioned types be sure to drop the file on the window background, not on the external field. You are strongly advised to use proper CSV files if at all possible. There is an option to display each record as it is imported. The process is faster with this option turned off but much less informative. Most people like to know how things are progressing. Another option causes any trailing spaces in the imported data-fields to be stripped. Some database programs appear to pad all fields to their maximum length by adding spaces to the end of the data where necessary. If the program doesn't allow you to get rid of these spaces when exporting data in CSV format they can be a nuisance when the file is imported into Powerbase. Queries, for instance, may not work properly unless you use the "contains" comparator ({) rather than "equals". You can easily recognise this situation since the caret will always be at the far right of the field even though the visible characters nowhere near fill the field. Setting the "Strip spaces" switch before importing the file overcomes the problem. Clicking on 'GO' will make Powerbase try to create new Powerbase records from the data. If you decide not to do this you should click on 'Cancel'. If you do wish to import the data there are some important points to consider:- (1) The options need to be correctly set up. This means that if the CSV file originally came from a Powerbase application the settings in the options window should be exactly the same as they were when the file was exported. The exception to this is the 'Quotes' button which is greyed out on import because Powerbase doesn't need it. If the CSV file came from a PC or another RISC OS application you might have to load it into Edit to find out what separator and terminator are used, whether it uses quotes etc. (2) Will the data be read into the correct Powerbase fields? If no fields on the record screen are highlighted (i.e. with ADJUST) and the CSV file does not contain a header record then the answer is as follows:- (a) The first CSV field will be read into the first WRITABLE Powerbase field or Check box field. Non-writable fields such as Text blocks, Pictures, Buttons or fields which are merely labels will be ignored. The next CSV field will be read into the second Powerbase field and so on. (b) If the end of the CSV record is reached before all the writable fields have been filled then the next CSV record will start a new Powerbase record, i.e. the reading won't get out of step. It does not matter, therefore, if the CSV file omits null items. (c) If all writable fields are filled in before reaching the end of the CSV record Powerbase ignores the remainder of the latter and skips to the beginning of the next CSV record before starting a new Powerbase record. This is also to keep the operation in step. There will be occasions when you don't want to fill the Powerbase fields sequentially as just described. There are two ways of making the process more specific:- (a) If you highlight Powerbase fields with ADJUST before starting the import the data will be read only into the highlighted fields. The order in which the fields are filled is the order in which you highlighted them. Paragraphs (b) and (c) above apply in cases where the CSV record has more or fewer fields than are highlighted on the Powerbase record. In other words the operation proceeds as if the highlighted fields were the only ones in the record and as if they actually occurred in the order in which they were highlighted. (b) If the CSV file has a header which contains the tags or descriptors of fields in the Powerbase record then the import behaves exactly as if those fields were highlighted. The 'Print options' dialogue box must correctly reflect whether the tags or descriptors of fields are used. Note that you should not use both a header AND highlighting. Note also that there is nothing to prevent you adding a header manually to a CSV file which did not originate from a Powerbase application. What if the data won't fit? =========================== This is really two questions. If the database doesn't contain enough free records to hold all the imported CSV data you may get a "Database full when reading CSV file" error. There are two ways of avoiding this. The obvious one is to make sure the database is big enough before you start. If it isn't you can expand it via the 'Change length' dialogue box accessible from the 'Utilities' submenu on the icon bar menu. That dialogue box also has an 'Increment for expansion' which, if greater than zero, causes the database to expand automatically whenever it becomes full. Only if the database becomes full when the increment is set to zero should the aforementioned error occur. The second sense in which data might not fit is if an item of data is too long for the database field into which you are trying to import it. When a CSV file is being read Powerbase maintains a file called 'TooBig' inside the database's 'PrintJobs' directory. Anything read from the CSV file which won't fit in the target field is written to this file together with information about where it was intended to go. No writable Powerbase field may be longer than 246 characters and if an item of imported data exceeds this a note will be made in the 'TooBig' file advising you to define an External field (Text Block or Text) for such data. Advanced features of CSV import =============================== When you import a CSV file a new Powerbase record is normally created to accommodate each CSV record read from the file. You may also, subject to certain restrictions, use a CSV file to enter data into EXISTING Powerbase records. The CSV options window contains a button which, if selected, causes each record of a saved CSV file to include the primary key of the Powerbase record. The key is placed at the beginning of the record. Try creating CSV files with and without this option set and compare the files using Edit. If a file containing primary keys is dragged into a Powerbase record window Powerbase will attempt to locate the records with the same primary keys as the records in the CSV file. If the key is not found the CSV record is simply ignored. If it is found then the CSV data goes into the same record, overwriting any data which the target fields already contain. The operation described needs using with care. It is only really useful for transferring data between Powerbase databases which have the same primary key field. It can also lead to trouble if the primary key isn't unique, since the record located by Powerbase and into which the CSV data is read might not be the correct one. You have been warned! Finally, there is one option button in the CSV options window which only becomes active when the 'Header' button is selected. This is called 'With field data'. When selected the header of a saved CSV file contains not only the field names but also the field lengths and types. Again it is suggested that you create such a file and look at it in Edit. Each item in the header begins with a number. This is the defined field length in characters, i.e. the maximum allowed length in the Powerbase record field. Next comes a ¤ character which separates the length from the field name. Another ¤ separates the field name from the concluding number which is the field type. (The type numbers may be inspected by looking at the 'ValStrings' file in the Powerbase directory.) A file created in this form is NOT meant to be dropped on the window of an open database. It should be dropped on the Powerbase icon on the icon bar when no database is open. Powerbase will attempt to convert the file into a functioning database. All the fields will be ranged on the left of the record window, one beneath another and the primary key will consist of the first four characters of the first field. Don't try to create databases containing Check box, Button or External fields using this method; it's only meant for the most basic type of database.