- Comments: Gated by NETNEWS@AUVM.AMERICAN.EDU
- Path: sparky!uunet!paladin.american.edu!auvm!BARUCH.BITNET!TEJERA
- Message-ID: <SAS-L%92122114531958@UGA.CC.UGA.EDU>
- Newsgroups: bit.listserv.sas-l
- Date: Mon, 21 Dec 1992 14:44:46 EST
- Reply-To: Philip Tejera <TEJERA@BARUCH.BITNET>
- Sender: "SAS(r) Discussion" <SAS-L@UGA.BITNET>
- From: Philip Tejera <TEJERA@BARUCH.BITNET>
- Subject: Re: PROC SORT NODUPLICATES is no good
- In-Reply-To: Message of Wed, 16 Dec 1992 19:48:41 EST from <76350.1604@COMPUSERVE.COM>
- Lines: 51
-
- On Wed, 16 Dec 1992 19:48:41 EST Andy Norton said:
- >..
- >SUMMARY: Phillip Tejera has not addressed the "adjacency" issue
- >..
- Gee, Andy, I didn't realize I had committed such a crime! :-)
-
- My purpose was more general. I thought the so-called "adjacency"
- issue was more than adequately covered in the SAS v5 Basics manual,
- the SAS v6 Procedures manual, and previous discussions on the list.
- I had no quarrel with your contention that the Language and Procedures
- manual was inaccurate. My question is, having shown the inaccuracy, why do
- you continue to want to believe the L&P manual?
-
- Incidentally, despite your protests, your example did have only
- one key! It was clearly not unique. I was heartened, however, that in
- your responses to Derek and me it was clear you took back most of what
- you said in your first posting. My main objection was to your use of
- Proc Sort with by _all_; . In your most recent posting you agree:
-
- > ... I sort by _ALL_ to make
- >them adjacent, not for any other reason. Yes this is expensive, but I
- >need to make them adjacent in order to delete exact duplicates.
- >
- >Note: in real life I don't really ever do this. I keep keys, and
- >remove duplicates on those keys. But ...
-
- If you don't do it, why do you recommend it to SAS-L's world-wide
- readership? Get real!
-
- My point in discussing the concept of a unique key or set of keys was
- to provide a basis for clarifying the context of the issue. If you
- have observations that are supposed to be unique as to key, but in
- fact have duplicates, erroneous data have crept into your study.
- The Noduplicates option of Proc Sort is one conservative way to clean
- the data in the course of sorting it.
-
- It is no news that it is NOT GUARANTEED to remove all exact duplicates.
- But I strongly disagree with your procedure for eliminating duplicate
- keys using Nodupkey and the sort by _all_; .
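
To make the "adjacency" point concrete, here is a minimal sketch in Python. It is not SAS and not how Proc Sort is implemented; it only simulates the general idea of sorting on a key and then dropping a record when it is identical to the one just kept. The function name and the sample (id, score) records are made up for illustration.

```python
def sort_and_drop_adjacent_duplicates(records, key):
    """Sort on `key` (stable), then drop a record only when it is
    identical to the record kept immediately before it."""
    ordered = sorted(records, key=key)  # Python's sort is stable
    kept = []
    for rec in ordered:
        if kept and kept[-1] == rec:
            continue  # exact duplicate of the adjacent record
        kept.append(rec)
    return kept


# Hypothetical (id, score) records: id is the sort key.
records = [
    (1, 10),
    (1, 20),
    (1, 10),  # exact duplicate of the first record, but not adjacent to it
    (2, 30),
    (2, 30),  # exact duplicate, adjacent after sorting
]

by_key = sort_and_drop_adjacent_duplicates(records, key=lambda r: r[0])
print(by_key)
```

In this sketch the second (2, 30) is dropped, but the second (1, 10) survives because (1, 20) sits between the two copies after the sort on id alone.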
-
- If I had occasion to sort the data, I would use the Noduplicates option;
- this would be efficient, since I would already be doing the sort.
- But more importantly, I would run Proc Univariate or Proc Freq on the unique
- key or set of keys, SO AS TO VERIFY THAT THEY ARE INDEED UNIQUE. Lines
- or cells with a count greater than one would immediately identify the
- problem observations. Having identified them, I could then print them
- out, or better, examine them interactively. Having done sufficient
- checking, it is then a simple matter to use SAS's Delete or Where
- statements to eliminate the erroneous observations.
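
The checking procedure described above can be sketched as follows. This is Python standing in for a frequency count on the key, not SAS code; the function names and the sample (id, value) records are hypothetical, and the choice of which observations are erroneous is left to the analyst, as in the text.

```python
from collections import Counter


def duplicated_keys(records, key):
    """Count how often each key value occurs; return those seen more than once."""
    counts = Counter(key(r) for r in records)
    return {k: n for k, n in counts.items() if n > 1}


def records_with_keys(records, key, keys):
    """Pull out every record whose key is in `keys`, for inspection."""
    return [r for r in records if key(r) in keys]


# Hypothetical (id, value) records; id is supposed to be unique.
records = [
    (101, "A"),
    (102, "B"),
    (101, "A"),  # exact duplicate
    (103, "C"),
    (102, "D"),  # same key, different data: needs a human decision
]

get_id = lambda r: r[0]
dups = duplicated_keys(records, get_id)
suspects = records_with_keys(records, get_id, dups)

print(dups)
print(suspects)
```

The count is taken over the whole data set, so no sort is needed and it does not matter where the repeated keys sit. Note that id 102 appears twice with different values, a case that removing exact duplicates alone would never flag.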
-
- And, of course, this does not require the notorious "adjacency".
-