Which one is better and when

10 months ago by

I was coding part (a) of exercise 1 on Exercise sheet 1.
There is a simple pythonic way to calculate F(x):

import numpy as np
x = np.log(np.sort(numbers))  # sorted values, on a log scale
y = np.log(np.arange(len(numbers), 0, -1) / len(numbers))  # log of n/n, (n-1)/n, ..., 1/n

However, this should not work when we have equal x values. (Right?) Therefore I wrote the following, less pythonic, solution:

for i in range(1,len(numbers)):

Which one is better? Do we have to worry about equal values in general, or is it task-dependent?

Could you expand on "should not work when we have equal x values"?
written 10 months ago by Dr Damon Wischik  
Of course. I might have misunderstood the definition of F(x), though, so I might be wrong as well. As I understand it, the first code does the following:
x will be the log of the sorted numbers: a list that looks like
[small numbers,..,larger ones,..,large numbers]
However, x might contain duplicates, e.g.
[smallV1, smallV1, smallV1, .., large numbers]
where smallV1 is the same value each time.
The y list is always the same, whatever the data (shown here before the log is taken):
[1.0, (len(numbers)-1)/len(numbers), (len(numbers)-2)/len(numbers), .. and so on]
So for equal x values we get different F(x) (i.e. y) values, which contradicts the general definition of a function, assuming F(x) is a proper function.
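
To make the point above concrete, here is a minimal sketch with a made-up sample (the values are an assumption, not data from the exercise sheet). It leaves out the log for readability and groups the y values by x, so any x that receives more than one y shows up:

```python
import numpy as np

# Hypothetical sample in which the value 2.0 appears three times.
numbers = np.array([5.0, 2.0, 2.0, 9.0, 2.0])

# Same construction as the first snippet, without the log.
x = np.sort(numbers)
y = np.arange(len(numbers), 0, -1) / len(numbers)

# Collect every y value that was paired with each x value.
ys_per_x = {}
for xi, yi in zip(x, y):
    ys_per_x.setdefault(float(xi), []).append(float(yi))

# Keep only the x values that ended up with more than one y.
duplicated = {xi: ys for xi, ys in ys_per_x.items() if len(ys) > 1}
print(duplicated)
```

With this sample, only the repeated value 2.0 is reported, with three different y values attached to it.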

P. S.
Sorry for layout of explanation, I'm in a hurry.
written 10 months ago by Dobrik Georgiev  
You're right that a proper function F(x) isn't allowed to have multiple values for a single x. But Question 1(a) is asking for a plot -- and, for the purposes of plotting, it doesn't hurt to have an improper function! matplotlib.pyplot.plot() just draws a line through the points you give it, and it doesn't mind duplicates. Another tip: look at matplotlib.pyplot.step, which can draw a step function like the one in Section 2.2 of lecture notes.
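
As a rough illustration of the two plotting calls mentioned above, the sketch below draws the same points with plot and with step. The sample values and the output filename are made up, and the choice of where="pre" assumes F(x) means the fraction of samples greater than or equal to x, which is what the question's code computes; the lecture notes may define it differently.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical sample with a repeated value.
numbers = np.array([5.0, 2.0, 2.0, 9.0, 2.0])
x = np.sort(numbers)
y = np.arange(len(numbers), 0, -1) / len(numbers)

fig, ax = plt.subplots()
# plot() joins the points in order; duplicate x values give a vertical segment.
ax.plot(x, y, marker="o", label="plot")
# step() with where="pre" holds y[i] constant on the interval (x[i-1], x[i]].
ax.step(x, y, where="pre", label="step")
ax.set_xscale("log")
ax.set_yscale("log")
ax.set_xlabel("x")
ax.set_ylabel("F(x)")
ax.legend()
fig.savefig("empirical_dist.png")  # hypothetical output filename
```

Setting log scales on the axes has the same visual effect as taking np.log of x and y before plotting.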
written 10 months ago by Damon Wischik  

1 Answer

10 months ago by
You're right, the second one handles equal values while the first one doesn't. However, there are a number of ways to 'pythonify' your second implementation:

empDistX = []
empDistY = []
numbers = sorted(data) # sorted() is more pythonic, and works on any iterable
length = len(numbers)
for i, x in enumerate(numbers): # get the index and element in one go
    if x not in empDistX: # list membership test incurs a performance hit
                          # but is more in the python style
        empDistX.append(x) # record each distinct value once
        empDistY.append(float(length - i) / length) # fraction of samples >= x

I believe we do have to worry about equal values - in general and in this specific case.
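
If duplicates do need collapsing, one possible vectorised alternative uses np.unique. This is only a sketch: the function name empirical_tail and the sample values are made up, and it assumes F(x) is the fraction of samples greater than or equal to x, as in the code above.

```python
import numpy as np

def empirical_tail(numbers):
    """Return the distinct sorted values and the fraction of samples >= each."""
    values = np.sort(np.asarray(numbers, dtype=float))
    n = len(values)
    # return_index gives the position of the first occurrence of each distinct
    # value; in a sorted array, everything from that position on is >= it.
    unique_x, first_idx = np.unique(values, return_index=True)
    return unique_x, (n - first_idx) / n

x, y = empirical_tail([5.0, 2.0, 2.0, 9.0, 2.0])
print(x)
print(y)
```

Each distinct x now has exactly one y, and the list membership test inside the loop is avoided.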

Alright, that's part 1a done... part 1b!